Enterprise Technology By Phil Edholm

Instant Messaging - Is Text a Waystation to Voice?

As I discussed in earlier blog posts, both text and the act of typing are limited in their ability to convey information and support collaboration. A recent event in a totally separate field made me wonder whether a change is already underway.

Being somewhat of a car enthusiast, I subscribe to Road & Track. In the December 2007 issue, there is a review of the new Ford Focus. One hot feature of the new Focus is called Sync, a technology that lets the driver operate Bluetooth cell phones and iPods through the car using speech recognition. The reviewer waxes lyrical about using his voice to access songs on his iPod or Zune (it is, after all, a Microsoft-based technology).

What really got me thinking, though, was this line: "For instance, Sync can read aloud any text message the driver receives, even translating such common abbreviations as LOL to 'Laugh Out Loud'... On a safety note, responses to text messages can be made only when the Focus is stationary."

This got me thinking about both technology and hypocrisy. On the technology side, what we are doing is taking text messaging, which started because it was cheap and could run in the spare capacity of phone networks without impacting the real revenue from calls, and turning it into an intercom. Other than the thrill of passing notes in class, which text messaging obviously gives, and the ability to do it while appearing to be doing something else, isn't it more logical simply to record a voice snippet and send it to the receiver, who can listen to it? It would sure seem to me that for many situations this makes more sense. And, with a very small Bluetooth headset, I can even hear the comments while in a meeting.

I am not devaluing text messaging, but merely pointing out that it might be better to make text a representation of speech rather than treating text as the basis and converting it into speech. The message would then actually be an audio clip that could be listened to or converted to text.

The next point is a bit of a rant on hypocrisy: why can't you dictate a voice-to-text reply while the car is moving? I can fully see why texting with a keyboard is a real problem while driving (as are eating, applying make-up, reading, and a number of other activities I see every day on my commute), but saying "...reply to message... Bob, I agree and we should proceed" in response to a message from Bob does not reduce my ability to drive any more than talking with someone in the passenger seat. Further, not enabling a voice interface to send messages actually encourages the driver to pick up the text device and enter the response through the keypad, which is much worse than speaking. If the goal is to eliminate distractions, we should do a number of other things: ban all speaking in cars, eliminate cup-holders to eliminate beverage distractions, eliminate all drive-through fast food lanes, and so on. The notion that talking on a phone or voice messaging has a greater impact on driving than these other activities is absurd, as is the idea of banning them. I fully agree that all activities in the car should be "hands-free", leaving the driver to use both hands for the vital task of driving. We have a very good personal friend who suffered a life-long spinal disability when she looked down at the radio and hit a bridge abutment. Perhaps radios that have physical controls should be banned, allowing only radios that have speech control? While I know we can do this with technology for radios, telephony, and messaging, I do not know how you can do hands-free drinking, eating, make-up, or even holding hands.

I think it is time we realized that technology that enables people to be productive while in a vehicle is not the enemy; the enemy is the lack of technology that would reduce our need to use our hands to manipulate the physical environment. So voice calling and now voice messaging are, in fact, the heroes here.

Trackbacks/Pings

  1. […] Edholm has a post on how voice has far more potential as an effective instant-messaging tool than text - something that will, no doubt, rile all those Twitter users. […]

  2. […] While I agree with the law, I think this is the right answer, not banning cellular phones altogether in cars. For more commentary on that subject, please see this previous post. […]

Comments

  1. I did not mean to devalue text, but was asking a basic question: if we translate text to speech to listen and speech to text to record, why not just send the speech?

    Maybe the Twitter users would become something new if there were simple speech messaging with text conversion when required... It sure would be easier for my mother, who has macular degeneration.

  2. To me it is situational. In some locations, and from some people, I prefer to receive text; in other situations, I prefer voice. As well, the area I am in or the device I am using may not support one or the other. So, IMO, the capability should be flexible.

    This leads me to think about Cell Jammers. A really useful jammer would be able to selectively block voice in favour of text, automatically putting everyone’s phone on “meeting mode”.

  3. That is interesting. It is related to using context as part of availability. For example, user velocity could be useful in determining a mode: if you are traveling over 20 miles an hour, video would not be available. Of course, you would need some way to distinguish a train from a car, or a driver from a passenger... Maybe the jammer is not a jammer at all but an environmental context input. It could be used in conjunction with other factors to determine both availability and mode preferences.

  4. I am available in a consulting role for a fee :)

  5. This kind of technology gives me an idea for how voice could be transmitted in telecommunications. Maybe we could use speech-to-text at one end, transmit the text, and use text-to-speech at the other end. This could save bandwidth. Maybe in the next generation of speech technology, the “color” of speech, such as intonation and “feeling”, could be translated to text as well, so that the speech rebuilt from the text would be closer to the original speech.

  6. David brings up a great point I had not considered: is text better for bandwidth savings? Is it worth doing the speech-to-text and text-to-speech conversions to save bandwidth, especially if we have to add emotion cues and would lose the sender's actual voice?

    Seeing the bandwidth benefit requires some calculations comparing the bandwidth needed for speech and for text.

    To do this, I recorded the following phrase on the audio recorder in my PC: “Now is the time for all good men to come to the aid of their country.” The audio recording took 5.25 seconds. Assuming PCM at 64 Kbps, that is about 336 Kbits, or 42 Kbytes. There are 68 characters, resulting in 68 bytes of text. So the relative ratio is very large: the speech is almost 620 times larger. However, if we compress this to 8 Kbps voice (easy to do as it is not real time; we do this in our mail systems with virtually no loss in MOS scores), the difference drops to about 78 times.

    I think the key question is whether this is truly meaningful in the larger scheme of things for different networks.

    If we assume the average text message is 20 characters and the average person sends 100 IMs per day, then the total text transmitted is 2000 bytes plus IP/Ethernet overhead of about 6000 bytes, for a total of 8000 bytes. To send the same content as compressed voice would take 156 Kbytes per day of payload and about 175 Kbytes per day with overhead (as the packets are much larger, the IP/Ethernet headers are a much smaller percentage of the total). Because of the overhead (assuming no header compression), the ratio is actually roughly 20 to 1 in this example. While 175 Kbytes per day is a big deal on a 2G or 2.5G wireless network, on a LAN, a broadband home connection, a wireless LAN, or a 3G cellular system it is virtually negligible. It is equivalent to a few browser screens, a few emails with attachments, or a few seconds of video. Taken for a cell site, assuming an average of 100 users per cell doing all of their texting in 8 hours, the average bandwidth per user is just under 50 bits per second (175,000 bytes per day × 8 bits, divided by 8 × 60 × 60 seconds, is about 49 bits per second). With 100 active users in a cell, this translates to about 5 Kbps of total messaging traffic on a continual basis, a very small percentage of available capacity in 3G and negligible in 4G. Assuming a 1 Mbps channel speed for the cell site, this is about 0.5% of available bandwidth. As it is non-real-time traffic, it can be relegated to a best-effort level of service. While the text demand of 250 bps (5 Kbps/20) is obviously much lower, in the overall world of graphics, phone calls, and video, the difference is probably not even rounding error.

    So my conclusion would be that speech-based IM does not really have a major bandwidth issue, and it has the benefit of being the original voice, dripping with all the sarcasm and subtle emotions that can only be adequately reproduced with speech (my wife reminding me I am 45 minutes late coming home to dinner!!!).
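
The arithmetic in the last comment can be re-run with a short script. This is only a rough sketch: the variable and function names are made up for illustration, the 60-byte per-message header is inferred from the comment's 6000 bytes of overhead across 100 messages, and the 175,000 bytes per day of voice traffic is taken from the comment as given rather than derived.

```python
# Rough check of the speech-vs-text bandwidth estimate from the comment.
# All inputs are the comment's own assumptions; names are illustrative only.

PHRASE = "Now is the time for all good men to come to the aid of their country"
DURATION_S = 5.25          # measured length of the recording, in seconds
PCM_BPS = 64_000           # uncompressed PCM voice
COMPRESSED_BPS = 8_000     # compressed, non-real-time voice


def audio_bytes(duration_s: float, bitrate_bps: int) -> float:
    """Size of an audio clip in bytes for a given duration and bit rate."""
    return duration_s * bitrate_bps / 8


# Single-phrase comparison (the closing period is not counted, as in the comment).
text_bytes = len(PHRASE)                                  # 68
pcm_bytes = audio_bytes(DURATION_S, PCM_BPS)              # 42,000
compressed_bytes = audio_bytes(DURATION_S, COMPRESSED_BPS)
pcm_ratio = pcm_bytes / text_bytes                        # "almost 620"
compressed_ratio = compressed_bytes / text_bytes          # about 77; the comment uses 78

# Daily traffic per user.
CHARS_PER_MSG = 20
MSGS_PER_DAY = 100
HEADER_BYTES_PER_MSG = 60          # assumption: 6000 bytes overhead / 100 messages
VOICE_BYTES_PER_DAY = 175_000      # taken from the comment, overhead included

text_bytes_per_day = MSGS_PER_DAY * (CHARS_PER_MSG + HEADER_BYTES_PER_MSG)
voice_to_text_ratio = VOICE_BYTES_PER_DAY / text_bytes_per_day

# Cell-site load: 100 users, all messaging squeezed into an 8-hour window.
USERS_PER_CELL = 100
ACTIVE_SECONDS = 8 * 60 * 60
CELL_CAPACITY_BPS = 1_000_000

per_user_bps = VOICE_BYTES_PER_DAY * 8 / ACTIVE_SECONDS
cell_bps = per_user_bps * USERS_PER_CELL
cell_share = cell_bps / CELL_CAPACITY_BPS

if __name__ == "__main__":
    print(f"text: {text_bytes} bytes, PCM: {pcm_bytes:.0f} bytes "
          f"({pcm_ratio:.0f}x), compressed: {compressed_bytes:.0f} bytes "
          f"({compressed_ratio:.0f}x)")
    print(f"daily text: {text_bytes_per_day} bytes, "
          f"voice/text ratio: {voice_to_text_ratio:.1f}")
    print(f"per user: {per_user_bps:.1f} bps, cell: {cell_bps:.0f} bps, "
          f"share of 1 Mbps: {cell_share:.2%}")
```

Changing the constants at the top (for example a longer average message, or a different cell capacity) shows how sensitive the conclusion is to each assumption.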
