As you probably already know, G.711 is a high-quality voice codec supported by Asterisk as well as by many other open source and commercial VoIP platforms. G.711 uses logarithmic PCM (pulse code modulation), a standard that dates back to 1972. It is pretty much the norm for IP telephony wherever there is enough bandwidth, so most IP phones come preset to this codec (either µ-law or a-law, see below). Whether you can use it depends on your available bandwidth: with IP headers it needs close to 80 kbit/s per active line. Even on the cheapest kind of broadband connection you can usually still get at least two phone calls going with this codec (or one call and one on hold with music).

Watch the upstream speed on ADSL (and on satellite links as well): it is the limiting factor when you decide which codec to use.
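To make the bandwidth figure concrete, here is a rough back-of-the-envelope calculation in Python. It assumes the common 20 ms packetisation interval and 40 bytes of IPv4, UDP and RTP headers per packet, and it ignores link-layer overhead (Ethernet, PPPoE, ATM). The function names and the 256 kbit/s upstream figure are only illustrative.

```python
def g711_ip_bandwidth_kbps(ptime_ms=20, header_bytes=40):
    """Approximate one-way IP bandwidth of a single G.711 call, in kbit/s.

    Assumptions (not taken from the G.711 standard): 20 ms of audio per
    packet and 40 bytes of IPv4 + UDP + RTP headers, no link-layer overhead.
    """
    payload_bytes = 8000 * ptime_ms // 1000   # 8000 samples/s, 1 byte per sample
    packets_per_second = 1000 / ptime_ms
    return (payload_bytes + header_bytes) * 8 * packets_per_second / 1000


def max_calls(upstream_kbps, per_call_kbps):
    """How many simultaneous calls fit into the given upstream capacity."""
    return int(upstream_kbps // per_call_kbps)


per_call = g711_ip_bandwidth_kbps()
print(per_call)                  # 80.0
print(max_calls(256, per_call))  # 3 (hypothetical 256 kbit/s upstream)
```

On a real line the usable figure will be lower, because the link layer adds its own overhead on top of the IP packet.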

Now let's look at the difference between the µ-law and a-law versions of the G.711 voice codec, and at which one is better for your particular setup.


1) µ-law (pronounced mu-law; the µ is the Greek letter mu, often typed as u) and a-law operate in the same way: both sample audio at 8 kHz and, taking the low-frequency and anti-alias filters into account, give the standard telephony frequency response of 300 Hz to 3400 Hz.

2) In a-law the audio is sampled with 13-bit resolution, and in µ-law with 14-bit resolution. Both then logarithmically compress the amplitude down to 8 bits. 8 bits at 8 kHz gives a 64 kbit/s output stream from the DSP.
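The logarithmic compression can be sketched with the continuous companding curves that G.711 approximates. This is only an illustration: the real codec works on integer samples with a piecewise-linear (segmented) version of these curves, and the function names below are made up for this example. The constants µ = 255 and A = 87.6 are the ones G.711 uses; the input is assumed to be normalised to the range -1 to 1.

```python
import math

MU = 255.0   # mu-law parameter
A = 87.6     # a-law parameter


def mu_law_compress(x):
    """Continuous mu-law curve: maps x in [-1, 1] to y in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def a_law_compress(x):
    """Continuous a-law curve: linear near zero, logarithmic above 1/A."""
    ax = abs(x)
    if ax < 1.0 / A:
        y = A * ax / (1.0 + math.log(A))
    else:
        y = (1.0 + math.log(A * ax)) / (1.0 + math.log(A))
    return math.copysign(y, x)


# 8 bits per sample at 8000 samples per second
print(8 * 8000)   # 64000 bit/s

for x in (0.01, 0.1, 1.0):
    print(x, round(mu_law_compress(x), 3), round(a_law_compress(x), 3))
```

Both curves spend a large share of the output range on quiet input: a sample at 1% of full scale already lands at more than a fifth of the µ-law output range.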

The reason the amplitudes are compressed to 8 bits is that the human ear responds to sound in a logarithmic fashion. Most people simply won’t notice the quantisation introduced by amplitude compression in a voice conversation.
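A small experiment illustrates why the logarithmic approach works. The sketch below compares a plain uniform 8-bit quantiser with "compress, quantise to 8 bits, expand again", using the continuous µ-law curve (µ = 255) on samples normalised to the range -1 to 1. It is not the bit-exact G.711 algorithm, and all names are illustrative.

```python
import math

MU = 255.0
LEVELS = 256  # 8 bits


def compress(x):
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def expand(y):
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def quantize(v):
    """Uniform 256-level quantiser on [-1, 1], returning the cell midpoint."""
    step = 2.0 / LEVELS
    index = min(LEVELS - 1, max(0, math.floor((v + 1.0) / step)))
    return -1.0 + (index + 0.5) * step


def uniform_8bit(x):
    return quantize(x)


def companded_8bit(x):
    return expand(quantize(compress(x)))


for x in (0.001, 0.01, 0.1, 0.9):
    err_uniform = abs(uniform_8bit(x) - x) / x
    err_companded = abs(companded_8bit(x) - x) / x
    print(f"x={x}: uniform {err_uniform:.1%}, companded {err_companded:.1%}")
```

For quiet samples the uniform quantiser is badly off, while the relative error of the companded path stays comparatively small. The price is coarser steps for loud samples, where the ear is least likely to notice.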

When G.711 (µ-law/a-law) is received, it is expanded back to its original 13 or 14 bits. This way you get up to 14 bits of dynamic range using only 8-bit values. The difference between µ-law and a-law lies in the parameters that define the logarithmic function (the chords and, from memory, the exponent).
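The chords show up directly in the 8-bit codeword: one sign bit, three bits selecting the chord (segment), and four bits for the step within that chord. Below is a minimal sketch of a µ-law encoder and decoder in Python, modelled on the widely circulated reference-style C implementations that work on 16-bit linear samples (the 14-bit value scaled up by four). The function names are made up here, and the code is meant as an illustration rather than a verified bit-exact implementation of the standard.

```python
BIAS = 0x84    # 132, added before the chord is determined
CLIP = 32635   # largest magnitude that still fits in 15 bits after adding BIAS


def linear_to_ulaw(sample):
    """Encode a 16-bit signed linear sample as an 8-bit mu-law codeword."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Chord 0..7: index of the highest set bit of (magnitude >> 7).
    chord = (magnitude >> 7).bit_length() - 1
    step = (magnitude >> (chord + 3)) & 0x0F
    # Codewords are transmitted with all bits inverted.
    return ~(sign | (chord << 4) | step) & 0xFF


def ulaw_to_linear(code):
    """Expand an 8-bit mu-law codeword back to a 16-bit linear sample."""
    code = ~code & 0xFF
    chord = (code >> 4) & 0x07
    step = code & 0x0F
    magnitude = (((step << 3) + BIAS) << chord) - BIAS
    return -magnitude if code & 0x80 else magnitude


print(hex(linear_to_ulaw(1000)))   # 0xce
print(ulaw_to_linear(0xCE))        # 988
```

The sample 1000 comes back as 988: within each chord the step size doubles, so the absolute error grows with amplitude while the relative error stays roughly constant.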

3) Europe, Australia and most of the rest of the world use a-law (the European version of the companding law), while Japan and North America use µ-law. The ITU G.711 standard describes the parameters for both µ-law (ulaw in Asterisk config files) and a-law (alaw in Asterisk config files), and at the back of the document it provides a translation table for about 40 values that need to be mapped to ensure a good translation between the two.

Appendices I and II define PLC (packet loss concealment, a way of handling packet loss on packet networks) and comfort noise generation, which is used together with voice activity detection to reduce bandwidth during silence.

Which G.711 to choose?

They are both very similar, but subjective tests show a-law to sound slightly better under non-optimal conditions (packet loss, delays), so choose the a-law version if you can.
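In Asterisk the codec preference is set with disallow/allow lines. Here is a minimal sketch, assuming the classic chan_sip driver and its sip.conf file; adapt it to your own channel driver, and check which variant your provider and the far end actually use before settling on the order.

```
[general]
disallow=all   ; start from an empty codec list
allow=alaw     ; preferred: G.711 a-law
allow=ulaw     ; fallback: G.711 µ-law
```

Codecs are offered in the order they are allowed, so a-law is tried first and µ-law remains available for peers that only speak that variant.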


P.S. Here is some good reading for you: an in-depth analysis of µ-law and a-law, including a description of the algorithms (it’s a PDF brochure available for download).

Quick overview:

Human Acoustics and the Telephone Network

By classifying according to their mode of excitation, speech sounds can be broken into three distinct classes of phonemes, where a phoneme is defined as the smallest unit of speech that distinguishes one utterance from another. The three classes of phonemes are voiced, unvoiced, and plosives. Voiced phonemes are considered deterministic in nature. They are produced by forcing air through the glottis with the tension of the vocal cords adjusted so that they vibrate in a relaxed oscillation. This produces quasi-periodic pulses of air which excite the vocal tract.

Examples of voiced phonemes are the vowels, fricatives /v/ and /z/, and stop consonants /b/, /d/, and /g/.

Unvoiced phonemes are generated by forming a constriction at some point in the vocal tract and forcing air through the constriction at a high enough velocity to produce turbulence. As a result, unvoiced phonemes are considered random in nature. Examples of unvoiced phonemes are the nasal consonants /m/ and /n/, fricatives /f/ and /s/, and stop consonants /p/, /t/, and /k/.

Similar in nature to unvoiced sounds, plosive sounds result from making a complete closure of the vocal tract, building up pressure behind the closure, and abruptly releasing it, such as the /ch/ phoneme.

Naturally occurring speech signals are composed of combinations of voiced, unvoiced and plosive phonemes. For example, contained in Figure 1 is the speech signal ‘goat’, which contains two voiced phonemes /g/ and /oa/, followed by a partial closure of the vocal tract, and then an unvoiced phoneme, /t/. The /g/, /oa/, and /t/ occur approximately at samples 3400-3900, 3900-5400, and 6300-6900, respectively.

Each phoneme class brings its own stress to the telephone system. In general, the peak to peak amplitude of voiced phonemes is approximately ten times that of unvoiced and plosive phonemes, as clearly illustrated in Figure 1. As a result, the telephone system must provide for a large range of signal amplitudes.

Although lower in amplitude, unvoiced and plosive phonemes contain more information and thus higher entropy than voiced phonemes. Thus, the telephone system must provide higher resolution for lower amplitude signals.

In addition to the tasks presented by the speech signal, the telephone network is also subject to bandwidth restrictions with respect to the human speech and auditory ranges.

The speech bandwidth for most adults is approximately 10 kHz. In contrast, the maximum auditory range of humans is 20 kHz. This maximum auditory range is usually limited to young children; instead, the typical hearing bandwidth for most adults is 15 kHz.

Of the speech and auditory bandwidths, the telephone network restricts transmission to a 3 kHz portion, from 0.3 to 3.3 kHz.

This frequency range is believed to coincide with the region of greatest intelligible speech, retaining only the first three formant frequencies of the sampled speech signal. This reduced bandwidth is then surrounded by unused space from 0 to 0.3 kHz and from 3.3 to 4 kHz. This unused space, known as the guard band, provides a buffer against conversation interference. Summing the transmission and guard bands, the telephone network has a total bandwidth of 4 kHz.

In summary, the telephone system must provide adequate quality for small amplitude signals consisting of unvoiced phonemes. Concurrently, the telephone system must provide for transmission of a wide range of signal amplitudes, due to the occasional occurrence of high energy voiced phonemes. The accomplishment of these concurrent tasks, within a limited bandwidth, may be achieved via Pulse Code Modulation and companding, as discussed in the following section….

Complete PDF brochure available for download
