When is something irrelevant?
In a posting a little while back I commented on the potential of Audio IM replacing Text IM. In the comments, David brought up a great point I had not thoroughly considered: is text better than audio/speech because of the bandwidth savings? Is converting speech to text and then text back to speech worth it for the savings in network load, especially if we have to add emotion cues and would lose the sender's actual voice?
While I responded in the comments on that posting, I thought some of you might miss the thought process this stimulated and some of the resulting analysis. I have long maintained that things that consume less than 5% of the total available capacity of a system are essentially irrelevant. This applies in two ways. For traffic like VoIP, giving it absolute priority and no discard assures the SLA, and this can be done without regard to consequence if the VoIP is a small percentage of the total. On the other hand, traffic such as IM on a data network is so small that it has negligible impact from a cost perspective. What is interesting is that while network and computing capacity grow exponentially, some traffic grows only linearly (minutes of phone calls, for example, depend on average minutes per day and the number of served customers, both of which grow by relatively low percentages). For some traffic, exponential growth is an impossibility. Eventually these services drop below the "irrelevance" threshold and are no longer of technical or economic consequence in network planning. This does not mean they are unimportant, only that they are irrelevant from a capacity and operational planning perspective.
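To make the threshold argument concrete, here is a minimal sketch of linearly growing traffic measured against exponentially growing capacity. The function name and every number in the usage line (a 40% starting share, 50% annual capacity growth, traffic adding 5% of its starting volume each year) are hypothetical values chosen for illustration, not measurements from any real network.

```python
def years_until_irrelevant(initial_share, capacity_growth, linear_growth,
                           threshold=0.05, max_years=200):
    """Return the first whole year in which a service's share of capacity
    falls below `threshold`, or None if that does not happen in `max_years`.

    Assumptions (illustrative only):
      - capacity compounds by `capacity_growth` each year (exponential)
      - traffic adds `linear_growth` of its starting volume each year (linear)
    """
    if capacity_growth <= 0:
        raise ValueError("capacity_growth must be positive")
    for year in range(max_years + 1):
        traffic = 1 + linear_growth * year            # linear growth
        capacity = (1 + capacity_growth) ** year      # exponential growth
        share = initial_share * traffic / capacity
        if share < threshold:
            return year
    return None


# Hypothetical example: a service using 40% of capacity today.
years = years_until_irrelevant(initial_share=0.40,
                               capacity_growth=0.50,
                               linear_growth=0.05)
print(years)
```

Under these made-up rates the service slips under the 5% line within a handful of years; slower capacity growth simply pushes the crossover further out.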
The question that David raised got me thinking about text versus voice messaging in this light. While we would all agree that text messaging is probably irrelevant in its demands on modern networks, what about Audio IM? Is the requirement sufficiently larger that it will change the operational dynamics of the network? Answering that requires some calculations on the difference in bandwidth between speech and text.
To do this, I recorded the following phrase on the audio recorder in my PC: “Now is the time for all good men to come to the aid of their country.” The audio recording took 5.25 seconds. Assuming PCM at 64 Kbps, that would be about 336 Kbits, or 42 Kbytes. There are 68 characters, resulting in 68 bytes of text. So the relative ratio is very large: the speech is almost 620 times larger. However, if we compress this to 8 Kbps voice (easy to do as it is not real time; we do this in our mail systems with virtually no loss in MOS scores), the difference drops to about 78 times.
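The arithmetic above can be checked with a short sketch. It assumes one byte per character, counts the phrase without its closing period (which is what gives 68 characters), and treats "Kbps" as 1,000 bits per second; the helper name `audio_bytes` is made up for this example.

```python
# Phrase from the post, without the closing period (68 characters).
PHRASE = "Now is the time for all good men to come to the aid of their country"
DURATION_S = 5.25  # length of the recording, in seconds


def audio_bytes(duration_s, bitrate_bps):
    """Size in bytes of a recording at a constant bit rate."""
    return duration_s * bitrate_bps / 8


text_bytes = len(PHRASE)                          # one byte per character
pcm_bytes = audio_bytes(DURATION_S, 64_000)       # 64 Kbps PCM
compressed_bytes = audio_bytes(DURATION_S, 8_000) # 8 Kbps compressed voice

print(f"text:       {text_bytes} bytes")
print(f"PCM:        {pcm_bytes:.0f} bytes ({pcm_bytes / text_bytes:.0f}x text)")
print(f"compressed: {compressed_bytes:.0f} bytes "
      f"({compressed_bytes / text_bytes:.0f}x text)")
```

The exact ratios come out near 618 and 77, which match the rounded figures in the text.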
I think the key question is whether this is truly meaningful in the larger scheme of things for different networks.
If we assume the average text message is 20 characters and the average person sends 100 IMs per day, then the total text transmitted is 2,000 bytes plus IP/Ethernet overhead of about 6,000 bytes, for a total of 8,000 bytes. To send the same content as compressed voice would take 156 Kbytes per day of payload and about 175 Kbytes per day with overhead (as the packets are much larger, the IP/Ethernet headers are a much smaller percentage of the total). Because of the overhead (assuming no header compression), the ratio is actually only about 20 to 1 in this example. While 175 Kbytes per day is a big deal on a 2G or 2.5G wireless network, on a LAN, a broadband home connection, a wireless LAN or a 3G cellular system it is virtually negligible. It is equivalent to a few browser screens, a few emails with attachments or a few seconds of video.
Taken for a cell site, assuming an average of 100 users per cell doing all of their messaging in 8 hours, the average bandwidth per user is just under 50 bits per second (175,000 bytes per day × 8 bits / (8 hours × 3,600 seconds) ≈ 49 bps). With 100 active users in a cell, this translates to about 5 Kbps of total messaging traffic on a continual basis, a very small percentage of available capacity in 3G and negligible in 4G. Assuming a 1 Mbps channel speed for the cell site, this is about 0.5% of available bandwidth. As it is non-real-time traffic, it can be relegated to a best-effort level of service. While the text demand of 250 bps (5 Kbps/20) is obviously much lower, in the overall world of graphics, phone calls, and video, the difference is probably not even rounding error.
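The per-user and per-cell figures can be reproduced with the sketch below. All inputs are the post's own assumptions; the 60 bytes of header overhead per text message is simply the post's 6,000 bytes divided by 100 messages, and the 175,000 bytes of voice traffic with overhead is taken as given rather than derived. The constant and function names are invented for this example.

```python
# Inputs taken from the post's assumptions.
MSGS_PER_DAY = 100
CHARS_PER_MSG = 20
TEXT_OVERHEAD_PER_MSG = 60           # bytes; 6,000 bytes / 100 messages
VOICE_TO_TEXT_RATIO = 78             # compressed voice vs. text payload
VOICE_BYTES_WITH_OVERHEAD = 175_000  # per user per day, as stated in the post
USERS_PER_CELL = 100
ACTIVE_HOURS = 8
CELL_CAPACITY_BPS = 1_000_000        # 1 Mbps channel


def average_bps(bytes_per_day, active_hours):
    """Average bit rate if the day's traffic is spread over the active hours."""
    return bytes_per_day * 8 / (active_hours * 3600)


text_payload = MSGS_PER_DAY * CHARS_PER_MSG
text_total = text_payload + MSGS_PER_DAY * TEXT_OVERHEAD_PER_MSG
voice_payload = text_payload * VOICE_TO_TEXT_RATIO

per_user_bps = average_bps(VOICE_BYTES_WITH_OVERHEAD, ACTIVE_HOURS)
cell_bps = per_user_bps * USERS_PER_CELL
cell_share = cell_bps / CELL_CAPACITY_BPS

print(f"text per day:  {text_total} bytes")
print(f"voice payload: {voice_payload} bytes")
print(f"voice/text:    {VOICE_BYTES_WITH_OVERHEAD / text_total:.1f} to 1")
print(f"per user:      {per_user_bps:.1f} bps")
print(f"per cell:      {cell_bps:.0f} bps ({cell_share:.2%} of capacity)")
```

The unrounded voice-to-text ratio is closer to 22 to 1, and the cell load lands just under 5 Kbps, so the rounded numbers in the text hold up.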
So my conclusion would be that speech-based IM does not really have a major bandwidth issue, and it has the benefit of being the original voice, dripping with all the sarcasm and subtle emotions that can only be adequately reproduced with speech (my wife reminding me I am 45 minutes late coming home to dinner!!!).