Practical Troubleshooting of G729 Codec in a VoIP Network

JOURNAL OF TELECOMMUNICATIONS, VOLUME 27, ISSUE 2, OCTOBER 2014


Yuri Ritvin

Abstract— In the modern telecom arena VoIP networks have become ubiquitous, and their share keeps growing as they bring more advanced services to end users. While the advantages of VoIP systems are undisputed, their deployment poses multiple challenges and requires a high level of technical expertise from the personnel involved in such projects. This paper discusses aspects of proper G729 codec deployment, describing the pertinent problems and the troubleshooting approach successfully used by the author to solve them.

Index Terms— G729, codec, VoIP, RFC2833, troubleshooting, sniffer, QoS


1 INTRODUCTION

VoIP (Voice over Internet Protocol) communications have become common today as they replace legacy telephony services in a global technological shift towards an "All IP network", in which packet-switched networks will eventually supersede circuit-switched networks. IP (Internet Protocol) technology, however, poses multiple challenges to achieving good voice quality. These challenges are intrinsic to the nature of IP networks and come down to a few major factors:

- packet delay (also known as latency)
- jitter (variation between the delays of different packets of the same voice stream)
- packet loss (packets from a sender that were never received by the conversation counterpart).

As a threshold for sound VoIP call quality, the following values of the factors mentioned above are commonly accepted:

- delay: 150 ms (one way, per ITU-T recommendation G.114, see [1])
- jitter: 30 ms
- packet loss: 1 %
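The thresholds listed above can be turned into a simple acceptance check. The following is a minimal sketch in Python; the names CallMetrics and quality_violations are hypothetical and chosen only for illustration, and the measured values are assumed to come from whatever monitoring tool is in use.

```python
from dataclasses import dataclass
from typing import List

# Threshold values quoted in the text (assumed to be upper limits).
MAX_ONE_WAY_DELAY_MS = 150.0
MAX_JITTER_MS = 30.0
MAX_PACKET_LOSS_PCT = 1.0


@dataclass
class CallMetrics:
    """Measured values for one direction of a call (hypothetical container)."""
    one_way_delay_ms: float
    jitter_ms: float
    packet_loss_pct: float


def quality_violations(m: CallMetrics) -> List[str]:
    """Return a description of every threshold that the metrics exceed."""
    problems = []
    if m.one_way_delay_ms > MAX_ONE_WAY_DELAY_MS:
        problems.append(f"delay {m.one_way_delay_ms} ms exceeds {MAX_ONE_WAY_DELAY_MS} ms")
    if m.jitter_ms > MAX_JITTER_MS:
        problems.append(f"jitter {m.jitter_ms} ms exceeds {MAX_JITTER_MS} ms")
    if m.packet_loss_pct > MAX_PACKET_LOSS_PCT:
        problems.append(f"packet loss {m.packet_loss_pct} % exceeds {MAX_PACKET_LOSS_PCT} %")
    return problems


if __name__ == "__main__":
    for problem in quality_violations(CallMetrics(200.0, 12.0, 0.2)):
        print(problem)
```

An empty result means that all three factors are within the commonly accepted limits; it does not cover the additional factors such as echo or out-of-sequence packets.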

There are also additional influencing factors, such as out-of-sequence packets (when the network is affected by insufficient QoS), jitter buffer misconfiguration (which differs between particular VoIP appliances) and echo. To achieve the desired voice quality, all these factors have to be taken into account during the planning phase of each VoIP project, because the technological decisions made at that phase have a critical impact on the deployment results. Along with the technological considerations, it is important to work with experienced professionals on such deployments: even if plenty of bandwidth is reserved in the network for VoIP, the voice quality can still be affected by improper setup of the pertinent hardware, buggy software versions and non-optimal configurations.

One of the key questions to address during the planning phase is which codec to use in the VoIP system. The answer to this question has an essential impact on the scope of the effort needed to ensure the desired voice quality.

2 VOICE CODEC SELECTION

There are many codecs that can be used in VoIP systems (see Appendix A for the list of codecs), but the most common one is G711 (described in [2]), the default codec in the majority of VoIP implementations.

----------------
Yuri Ritvin is the founder of YRI CORP, a professional services company in the fields of telecommunication, VoIP, security, databases and systems. He is an internationally recognized telecommunication expert with over 20 years of experience and the author of a few patents, among them the patent-pending CareOneCall project that will drastically change the future of the 911 system by bringing together modern telecom technologies, the health care industry and public safety systems. He has an M.Sc. degree in electrical engineering and holds numerous professional certifications including Cisco CCNA, CCNP, Unix Administrator, Oracle and MySQL DBA.
----------------

There are 2 versions of this codec, G711a and G711u, and both have the same bit rate of 64 Kbit/s. G711u is used in the USA, while G711a is used mostly in Europe. The advantages of this codec are high voice quality, with a Mean Opinion Score (MOS) of 4.1 (on a scale from 1 to 5), simplicity of deployment, ubiquity and availability on every vendor platform. But the quality comes at the expense of high bandwidth utilization per call: a G711 call consumes 87.2 Kbit/s at minimum. This number is made up of the voice payload, which is 64 Kbit/s as produced by the DSP (Digital Signal Processor) that converts analog voice into digital, and 23.2 Kbit/s of network overhead, the "vehicle" that transports the G711 voice payload over the network. The calculation is based on the G711 sampling rate of 8,000 samples per second with a sample size of 8 bits, which comes to 64,000 bits per second, or 8,000 bytes per second since a byte is equal to 8 bits. The standard packetization rate of the G711 codec is 50 pps (packets per second), meaning that each packet carries 20 ms of voice and the voice payload per packet is 8,000 bytes / 50 = 160 bytes. This payload is encapsulated into a network packet that adds the overhead mentioned above, consisting of the "envelopes" of Layer 2 (Ethernet in most cases), Layer 3 (IP), Layer 4 (UDP) and the service protocol (RTP, Real Time Transport Protocol) that provides time stamps and sequence numbers for the packets. The entire packet size is dissected in Table 1 below, together with the calculation of the total bandwidth consumption at 50 pps (packets per second).

Component | Size per packet (bytes)
Ethernet header | 18
IP header | 20
UDP header | 8
RTP header | 12
Total overhead per packet | 58
Voice payload | 160
Total VoIP packet | 218
Total bandwidth consumption per second (bits) | 218 x 8 x 50 = 87,200

Table 1

Along with the voice payload (aka "media"), VoIP traffic includes signaling, which is responsible for call session establishment, media parameters negotiation, call session maintenance and teardown. The most popular signaling protocol today is SIP, and SIP traffic is on average about 5% of the total media traffic. Some additional bandwidth is also consumed by the RTCP protocol, a "companion" of RTP that is responsible for monitoring and reporting network conditions during the established voice session, essentially providing feedback on the quality of service (QoS). RTCP traffic volume is also around 5% of the media traffic (according to [4], §6.2). So, in a network with an available bandwidth of 100 Mbit/s dedicated to VoIP usage it is possible to place 1042 simultaneous G711 calls (100,000,000 / (87,200 x 1.1) = 1042).
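The arithmetic behind the 87.2 Kbit/s and 1042-call figures can be reproduced with a short Python sketch. The function names are hypothetical; the header sizes are the ones given in Table 1, and the 10% allowance is the sum of the roughly 5% SIP and 5% RTCP shares mentioned above.

```python
# Per-packet header sizes in bytes, as listed in Table 1.
ETHERNET_BYTES = 18
IP_BYTES = 20
UDP_BYTES = 8
RTP_BYTES = 12
OVERHEAD_BYTES = ETHERNET_BYTES + IP_BYTES + UDP_BYTES + RTP_BYTES


def call_bandwidth_bps(payload_bytes: int, packets_per_second: int) -> int:
    """Bits per second consumed by one voice stream, headers included."""
    return (OVERHEAD_BYTES + payload_bytes) * 8 * packets_per_second


def max_calls(link_bps: int, per_call_bps: int, control_percent: int = 10) -> int:
    """Whole calls that fit into the link, adding control_percent on top of
    the media for signaling and RTCP (integer arithmetic, rounded down)."""
    return (link_bps * 100) // (per_call_bps * (100 + control_percent))


if __name__ == "__main__":
    g711 = call_bandwidth_bps(payload_bytes=160, packets_per_second=50)
    print(g711)                          # 87200
    print(max_calls(100_000_000, g711))  # 1042
```

Integer arithmetic is used in max_calls so that the result is rounded down without floating-point surprises.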

For a VoIP provider it will be just half of that number (521) if all media, whether the voice itself or fax, goes through the VoIP provider's premises, since each end-to-end call consists of 2 call legs (as shown in Figure 1): one, aka leg A, from a subscriber (a caller) into the VoIP provider's softswitch and another, aka leg B, from that softswitch to the actual call destination (a callee).

Bandwidth is in most cases a pricey asset, so to reduce its exhaustion or to constrain the need for its expansion, voice codecs were introduced that consume much less bandwidth than G711. There are many of them (see Appendix A), some more popular, some less. Many factors have influenced the adoption of codecs in the industry, such as complexity of implementation, availability of source code, licensing fees and others. One of the bandwidth-saving codecs that gained wide popularity is G729, with a bit rate of 8 Kbit/s, which is 8 times less than the 64 Kbit/s of G711.

Figure 1


3 G729 SPECIFICS

The G729 codec is a good choice for networks where bandwidth cannot be easily increased, whatever the reason: limitations of network hardware capabilities, availability of leased lines, prohibitive cost, etc. The low bit rate is achieved with a patented audio data compression algorithm that by default compresses digital voice in frames of 10 ms duration. It is officially described in [3] as coding of speech at 8 Kbit/s using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP).

The G729 voice quality score (MOS) is 3.92, which is slightly lower than that of G711 but is still considered good enough (aka toll quality). The bandwidth saving, however, is significant: G729 consumes only 31.2 Kbit/s (see Table 2 below) in comparison to G711, which takes a minimum of 87.2 Kbit/s (as shown in Table 1 above). Capacity-wise, the same bandwidth of 100 Mbit/s can accommodate 2913 simultaneous G729 calls (100,000,000 / (31,200 x 1.1) = 2913), which is 2.8 times more than with G711.

Although the default frame size of the G729 codec is 10 ms, the packet time frame may differ depending on the particular implementation: 10 ms, 20 ms or 30 ms. Table 2 shows the case of a 20 ms frame size, which corresponds to 50 pps (packets per second).

Component | Size per packet (bytes)
Ethernet header | 18
IP header | 20
UDP header | 8
RTP header | 12
Total overhead per packet | 58
Voice payload | 20
Total VoIP packet | 78
Total bandwidth consumption per second (bits) | 78 x 8 x 50 = 31,200

Table 2
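Because the 58 bytes of headers are paid once per packet, the per-call bandwidth of G729 depends noticeably on the packet time frame. The sketch below, with a hypothetical function name, repeats the Table 2 calculation for the 10, 20 and 30 ms cases; it assumes the same header sizes as in Table 2 and the 8 Kbit/s codec bit rate.

```python
OVERHEAD_BYTES = 58       # Ethernet + IP + UDP + RTP, as in Table 2
G729_BIT_RATE_BPS = 8000  # codec bit rate of 8 Kbit/s


def g729_bandwidth_bps(ptime_ms: int) -> float:
    """Bandwidth of one G729 stream for a given packet time frame in ms."""
    payload_bytes = G729_BIT_RATE_BPS * ptime_ms / 1000 / 8  # 10 bytes per 10 ms
    packets_per_second = 1000 / ptime_ms
    return (OVERHEAD_BYTES + payload_bytes) * 8 * packets_per_second


if __name__ == "__main__":
    for ptime in (10, 20, 30):
        print(ptime, "ms:", round(g729_bandwidth_bps(ptime)), "bit/s")
```

For 20 ms the function returns 31,200 bit/s, matching Table 2; shorter packets cost more bandwidth and longer packets less, at the price of higher packetization delay.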

There are a few variations of G729:

1) G729 original.
2) G729A or annex A - a simplification of G729 that is compatible with G729. The algorithm is less complex, but produces lower voice quality.
3) G729B or annex B - provides silence suppression and is not compatible with the previous ones.
4) G729AB - essentially G729A with silence suppression; only compatible with G729B.

There are also G729 versions with bit rates of 6.4 Kbit/s (annex D) and 11.8 Kbit/s (annex E).

When 2 VoIP peers negotiate media capabilities during the call establishment phase, they have to agree on the codec version to use in order to communicate with good quality. When the SIP protocol is used for signaling, this negotiation is carried in the SDP portion of the SIP message body (SDP stands for Session Description Protocol). Each codec is specified there in the media description and media attribute fields by a number corresponding to an RTP payload type, and for the G729 codec this number is 18. This number, however, is the same for all versions of the G729 codec regardless of their annex. The difference is indicated in another media attribute: annexb=no (for G729/G729A) or annexb=yes (for G729B).

Figure 2 shows these media attributes in an actual call trace. The absence of the annexb attribute in the SDP part of a SIP message body is interpreted by some vendors as a declaration of the G729A version, while others consider it a declaration of G729B, so it is important to include this attribute to indicate the desired version explicitly.
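When many traces have to be reviewed, the annexb declaration can be extracted from the SDP programmatically. The following Python sketch is illustrative only: the function name g729_annexb and the sample SDP body are made up for this example (the addresses come from a documentation range), and it assumes that the attribute is carried in an "a=fmtp:18" line, which is how it normally appears in SDP.

```python
from typing import Optional

# Hypothetical SDP body, not taken from the trace in Figure 2.
SAMPLE_SDP = """v=0
o=- 0 0 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 18 101
a=rtpmap:18 G729/8000
a=fmtp:18 annexb=no
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-15
a=ptime:20
"""


def g729_annexb(sdp: str) -> Optional[str]:
    """Return the annexb value declared for payload type 18,
    or None when the attribute is absent (the ambiguous case)."""
    for line in sdp.splitlines():
        line = line.strip()
        if not line.startswith("a=fmtp:"):
            continue
        fmt, _, params = line[len("a=fmtp:"):].partition(" ")
        if fmt != "18":
            continue  # parameters of another payload type
        for param in params.replace(";", " ").split():
            name, _, value = param.partition("=")
            if name.lower() == "annexb":
                return value.lower()
    return None


if __name__ == "__main__":
    print(g729_annexb(SAMPLE_SDP))
```

A result of None flags exactly the situation described above, where the two peers may interpret the missing attribute differently.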

Figure 2


4 LICENSING CONSIDERATION

G729 includes patents from several companies and is licensed by Sipro Lab Telecom, the authorized Intellectual Property Licensing Administrator for G729 technology. OEM vendors sell G729 licenses at different prices depending on the number of channels requested by end users. The retail price of a single channel license is $10. For wholesale cases there are discounts: the more licenses requested, the less each costs. For licensing purposes, a single channel is any connection to a softswitch that activates the codec processing. This can be a call session that requires transcoding between G729 and any other codec, or a call that needs an IVR (Interactive Voice Response) session. In the latter case 2 licenses are required for the call: the first for the initial caller's channel and the second for the IVR channel. When both call parties, caller and callee, use the same codec, there is no need to activate a codec license on the softswitch; this case is known as a pass-through call.

5 PROBLEMS

The problems of G729 codec deployment stem from the abundance of codec variations described earlier and from discrepancies in how different vendors understand the codec implementation, even when the same version is declared by both peers participating in a call session. When all configurations look good but the actual voice quality is bad (voice distortion or garbling, very low volume, choppy voice or breaking up), the reason is not always easy to identify. In addition to voice, special attention should be given to the choice of the DTMF method, since the voice stream compression applied by the G729 codec distorts inband DTMF tones. Two DTMF methods are compatible with G729: RFC2833 (RTP events, described in [5]) and SIP INFO (described in [6]). Some vendors prefer RFC2833 (RFC2833 is the common name for the RTP events method even though RFC2833 itself has been superseded by RFC4733, http://tools.ietf.org/html/rfc4733), while others recommend using SIP INFO with their equipment. The proper method is chosen during actual interoperability tests in the field and, in case of problems, after conducting the necessary troubleshooting activities. In some cases a problem appears without apparent reason in a system that has been working properly for a long time. If such a problem persists or can be reproduced, it is a "good" situation for troubleshooting, since the "culprit" can be caught during the troubleshooting session.
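With the RFC2833 method each DTMF digit travels as a small RTP event payload rather than as audio, which is why it survives G729 compression. As an illustration, the Python sketch below decodes the 4-byte event payload as laid out in [5]: one byte of event code, one byte holding the end bit, a reserved bit and a 6-bit volume, and a 16-bit duration in timestamp units. The names DtmfEvent and parse_telephone_event are hypothetical, and the sample bytes are invented for the example.

```python
import struct
from typing import NamedTuple

# Event codes 0-15 map to the DTMF symbols in this order.
DTMF_SYMBOLS = "0123456789*#ABCD"


class DtmfEvent(NamedTuple):
    digit: str
    end: bool       # True on the packet(s) that close the event
    volume: int     # power level, expressed as a positive number
    duration: int   # in RTP timestamp units


def parse_telephone_event(payload: bytes) -> DtmfEvent:
    """Decode the first 4 bytes of an RTP telephone-event payload."""
    if len(payload) < 4:
        raise ValueError("telephone-event payload must be at least 4 bytes")
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    if event >= len(DTMF_SYMBOLS):
        raise ValueError(f"event {event} is not a DTMF digit")
    return DtmfEvent(DTMF_SYMBOLS[event], bool(flags & 0x80), flags & 0x3F, duration)


if __name__ == "__main__":
    # Invented sample: digit 5, end bit set, volume 10, duration 800.
    print(parse_telephone_event(bytes([5, 0x8A, 0x03, 0x20])))
```

In a trace this payload appears in RTP packets whose payload type is the dynamic number negotiated for telephone-event in the SDP, not in packets with payload type 18.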

6 TROUBLESHOOTING

Undoubtedly, the best sources of information for troubleshooting VoIP systems are packet traces collected with network sniffers. The traces have to include both media and signaling. Some softswitches allow easy trace collection from the command line, while others require attaching an external network sniffer; where feasible, a trace can also be taken at some point on the network path, for instance on a firewall or on an access switch with port mirroring. Additional sources of valuable information are application and system logs. When a softswitch has a rich logging capability, the log should be set to maximum verbosity in order to catch the errors that can shed light on codec interoperability problems. For effective troubleshooting the traces should be taken simultaneously on both sides of the call channel, which of course requires cooperation of the parties on both sides of the SIP trunk. Analysis of the collected traces will show the actual call processing.

Figure 3


A good choice for such analysis is the Wireshark network sniffer. When the trace is opened in Wireshark, the VoIP calls are extracted by selecting the "VoIP Calls" option from the "Telephony" menu (see Figure 3). From the list of VoIP calls, the call flow diagram is displayed by highlighting a particular call and then clicking the "Flow" button at the bottom (see Figure 4).

Figure 4

Figure 5


In the call flow diagram the entire call session is presented in graphical form (see Figure 5), where the signaling part and the media part are labeled as SIP and RTP respectively, and DTMFs are marked explicitly. Reviewing the call flow diagram is a good starting point for finding the cause of the problem. Clicking on any arrow in the call flow diagram opens the corresponding packet in the details pane of the main Wireshark screen (see Figure 6) and allows a deep analysis of the packet content.

In particular, the Synchronization Source (SSRC) identifier makes it possible to correlate the corresponding voice stream with the right direction at the next troubleshooting step, RTP stream analysis. To get to that step, choose "Telephony" from the Wireshark menu, then "RTP" and from there "Stream Analysis" (see Figure 7). The Stream Analysis window (see Figure 8) has 2 tabs: Forward direction and Reversed direction.
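The fields that Wireshark shows for an RTP packet, including the Synchronization Source identifier, sit in the fixed 12-byte RTP header defined in [4]. For readers who post-process raw captures, the following Python sketch decodes that header; the names RtpHeader and parse_rtp_header are hypothetical, and header extensions and CSRC entries are deliberately ignored.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    marker: bool
    payload_type: int   # 18 for G729
    sequence: int
    timestamp: int
    ssrc: int           # identifies one voice stream (one direction)


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed part of an RTP header from a UDP payload."""
    if len(packet) < 12:
        raise ValueError("RTP packet is shorter than the 12-byte fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


if __name__ == "__main__":
    # Synthetic packet: version 2, payload type 18, 20 bytes of dummy payload.
    sample = struct.pack("!BBHII", 0x80, 18, 1000, 160000, 0x1234ABCD) + bytes(20)
    header = parse_rtp_header(sample)
    print(header.payload_type, hex(header.ssrc))
```

Grouping packets by the ssrc value separates the two directions of a call, which corresponds to the Forward and Reversed tabs of the Stream Analysis window.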

Figure 6

Figure 7


In each direction the first thing to analyze is the "Delta" time between packets in ms. It should be as close as possible to the voice packets' frame size per the negotiated packetization time of the codec. The latter is shown in the SDP portion of the SIP message body in the media attribute called ptime, for instance ptime=20 (see Figure 2). Meticulous attention should be paid to deviations of the delta, starting from the packet that has the maximum delta (Max delta), as indicated in the voice stream summary at the bottom of the Stream Analysis window (see Figure 8). The output in the Stream Analysis window can be sorted in ascending or descending order by any column. Sorting it in descending order by the Delta column makes it possible to assess how stable the voice stream was, whether there were packets with an abnormal delta and whether the number of such packets was considerable.

In problematic cases the abnormality is noticeable (see Figure 9): the expected frame size of the RTP packets (packetization time, ptime) is 20 ms, but the analysis shows many packets with a delta close to 40 ms, twice the expected value, meaning that chunks of the conversation had a packetization rate of 25 pps instead of the expected 50 pps. This means that the ptime negotiated between the call parties (ptime=20) is not maintained by the source softswitch, which causes voice quality deterioration on the destination softswitch or softphone. This finding should lead to validation of the softswitch configuration, in particular to checking the manufacturer's recommendations for such a case in the configuration and operation manual or wiki. For instance, for a popular softswitch such as FreeSWITCH, the recommendation is to add the following statement to the configuration file vars.xml: <X-PRE-PROCESS cmd="set" data="rtp_manual_rtp_bugs=IGNORE_MARK_BIT"/>
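The delta review described above can also be done outside the Wireshark GUI once the packet arrival times of one stream have been exported. The Python sketch below flags the packets whose delta deviates from the negotiated ptime and lists the worst ones first, mirroring the descending sort by the Delta column. The function name and the 5 ms tolerance are assumptions made for this example, not values from Wireshark or from any vendor.

```python
from typing import List, Tuple


def abnormal_deltas(
    arrival_ms: List[float],
    ptime_ms: float = 20.0,
    tolerance_ms: float = 5.0,  # assumed tolerance, tune to the network
) -> List[Tuple[int, float]]:
    """Return (packet index, delta) pairs whose inter-arrival delta differs
    from ptime_ms by more than tolerance_ms, largest delta first."""
    flagged = []
    for i in range(1, len(arrival_ms)):
        delta = arrival_ms[i] - arrival_ms[i - 1]
        if abs(delta - ptime_ms) > tolerance_ms:
            flagged.append((i, delta))
    flagged.sort(key=lambda item: item[1], reverse=True)
    return flagged


if __name__ == "__main__":
    # Invented arrival times: two gaps of 40 ms in a 20 ms stream.
    arrivals = [0.0, 20.0, 40.0, 80.0, 100.0, 140.0]
    for index, delta in abnormal_deltas(arrivals):
        print(f"packet {index}: delta {delta} ms")
```

A long list dominated by deltas near twice the ptime points to the packetization mismatch discussed above, whereas a few isolated outliers are more likely ordinary network jitter.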

Figure 8


After changing vars.xml the service should be restarted, for example with "service freeswitch restart". A discrepancy in the packetization rate between source and destination, as in the depicted case, is one of the major reasons for bad voice quality, but, as mentioned before, other factors have an impact too, for instance the size of the jitter buffer. Usually a jitter buffer is applied on the receiving side, but sometimes this is overlooked during the integration phase and a jitter buffer is also set on a softswitch in order to eliminate choppy voice. For the G729 codec this introduces extra latency beyond the acceptable limit, because G729 has built-in compression delays of 10 ms on each side of the call channel, during encoding and during decoding. So the delay budget of the G729 codec is 20 ms less than that of G711, for instance, and this should be taken into consideration. When the jitter buffer size is misconfigured, many packets are dropped by the jitter buffer and the call is perceived as breaking up and garbled.

Another cause of problematic voice quality is out-of-sequence delivery of RTP packets. In general, each packet in the network can take its own path; this is one of the tenets of packet switching as opposed to circuit switching, where all communication related to a particular call goes via a strictly predefined path. Packets of the same voice stream may be sent via different paths (different routers) because of network congestion, network convergence or multi-path load balancing on the routers in the path between the call parties. In any case, the packets that arrive out of sequence are dropped during voice stream reconstruction by the receiving party, which negatively impacts the voice quality. The solution in such cases is to strive for a network QoS agreement with the VoIP provider by establishing an MPLS circuit or installing a dedicated leased line (such as a fiber circuit).
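Out-of-sequence delivery can be quantified from the RTP sequence numbers in a trace. The Python sketch below counts the packets that arrive after a packet with a higher sequence number has already been seen; it accounts for the wrap-around of the 16-bit sequence counter. The function name is hypothetical, and the sequence lists are invented for the example.

```python
from typing import List, Optional


def count_out_of_order(seq_numbers: List[int]) -> int:
    """Count packets that arrive behind the highest sequence number seen so
    far, treating sequence numbers as 16-bit values that wrap at 65536."""
    late = 0
    highest: Optional[int] = None
    for seq in seq_numbers:
        if highest is None:
            highest = seq
            continue
        # Forward distance from the highest number seen, modulo 2^16.
        diff = (seq - highest) & 0xFFFF
        if diff < 0x8000:
            highest = seq   # in order (a duplicate leaves highest unchanged)
        else:
            late += 1       # arrived after a newer packet
    return late


if __name__ == "__main__":
    print(count_out_of_order([1, 2, 4, 3, 5]))       # one late packet
    print(count_out_of_order([65534, 65535, 0, 1]))  # wrap-around, none late
```

A non-zero count in one direction only suggests that the reordering happens on that path, which is useful evidence when discussing QoS with the provider.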

Figure 9


ABBREVIATIONS AND ACRONYMS

Acronym Description

VoIP Voice over Internet Protocol

MOS Mean Opinion Score (measure of the quality of human speech at the destination end of the circuit, value of 5 is excellent, value of 1 is bad)

DSP Digital Signal Processor

IP Internet Protocol

UDP User Datagram Protocol

RTP Real Time Transport Protocol

RTCP RTP Control Protocol

ITP Internet Telephony Provider

SIP Session Initiation Protocol

SIP trunk VoIP communication line of specific capacity as defined by agreement with ITP

QoS Quality of Service

pps packets per second

CS-ACELP Conjugate-Structure Algebraic Code Excited Linear Prediction

SDP Session Description Protocol

OEM Original Equipment Manufacturer

IVR Interactive Voice Response

DTMF Dual Tone Multi Frequency

APPENDIX A. VOICE CODECS LIST

Number | Standard by | Description | Bit rate (kb/s) | Sampling rate (kHz) | Frame size (ms) | MOS (Mean Opinion Score)
G.711 | ITU-T | Pulse code modulation (PCM) | 64 | 8 | Sampling | 4.1
G.722.1 | ITU-T | Coding at 24 and 32 kbit/s for hands-free operation in systems with low frame loss | 24/32 | 16 | 20 |
G.722.2 AMR-WB | ITU-T | Adaptive Multi-Rate Wideband Codec (AMR-WB) | 23.85/23.05/19.85/18.25/15.85/14.25/12.65/8.85/6.6 | 16 | 20 |
G.723.1 | ITU-T | Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s | 5.3/6.3 | 8 | 30 | 3.8-3.9
G.726 | ITU-T | 40, 32, 24, 16 kbit/s adaptive differential pulse code modulation (ADPCM) | 16/24/32/40 | 8 | Sampling | 3.85
G.727 | ITU-T | 5-, 4-, 3- and 2-bit/sample embedded (ADPCM) | var. | | Sampling |
G.728 | ITU-T | Coding of speech at 16 kbit/s using low-delay CELP | 16 | 8 | 2.5 | 3.61
G.729 | ITU-T | Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear-prediction (CS-ACELP) | 8 | 8 | 10 | 3.92
G.729.1 | ITU-T | ... using CS-ACELP | 8/12/14/16/18/20/22/24/26/28/30/32 | 8 | 10 |
GSM 06.10 | ETSI | Regular Pulse Excitation Long-Term Predictor (RPE-LTP) | 13 | 8 | 22.5 |
LPC10 | USA Government | Linear-predictive codec | 2.4 | 8 | 22.5 |
Speex | | | 2.15-24.6 (NB), 4-44.2 (WB) | 8, 16, 32 | 30 (NB), 34 (WB) |
iLBC | | | 13.3 | 8 | 30 |
DoD CELP | Department of Defense (DoD) USA Government | | 4.8 | | 30 |
EVRC | 3GPP2 | Enhanced Variable Rate CODEC | 9.6/4.8/1.2 | 8 | 20 |
DVI | Interactive Multimedia Association (IMA) | DVI4 uses an adaptive delta pulse code modulation (ADPCM) | 32 | Variable | Sampling |
L16 | | Uncompressed audio data samples | 128 | Variable | Sampling |
SILK | Skype | | From 6 to 40 | Variable | 20 |

REFERENCES

[1] ITU-T Recommendation G.114, http://www.itu.int/rec/T-REC-G.114-200305-I
[2] ITU-T Recommendation G.711, http://www.itu.int/rec/T-REC-G.711-198811-I/en
[3] ITU-T Recommendation G.729, http://www.itu.int/rec/T-REC-G.729-201206-I/en
[4] RTP: A Transport Protocol for Real-Time Applications, https://www.ietf.org/rfc/rfc3550.txt
[5] RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals, http://tools.ietf.org/html/rfc2833, http://tools.ietf.org/html/rfc4733
[6] The SIP INFO Method, http://www.ietf.org/rfc/rfc2976.txt
