  • Multiple Descriptions Coding in MELP Coder for Voice over IP

    M. Saidi, B. Boudraa
    Laboratoire de communication parlée et de traitement du signal (LCPTS), Université USTHB, Algiers, Algeria

    [email protected], [email protected]

    M. Bouzid, M. Boudraa
    Laboratoire de communication parlée et de traitement du signal (LCPTS), Université USTHB, Algiers, Algeria

    [email protected], [email protected]

    Abstract: In VoIP systems, CELP coders such as G.729 are commonly used because they offer good speech quality in the absence of packet loss. However, harmonic coders such as MELP may be a good alternative for VoIP owing to their higher resilience to packet loss. In this paper we examine the problem of packet loss in VoIP applications using MELP coders. We present a packetization scheme based on Multiple Description Coding (MDC) applied to the MELP coder. Each packet contains information from two MELP coders operating at 2.4 and 1.2 kbps respectively. The packetization uses 135 bits every 22.5 ms, corresponding to a total rate of 6 kbps. The results show that, under typical VoIP operating conditions, the method performs well and outperforms CELP coders operating without MDC at 8 kbps.

    Keywords: VoIP, MELP, MDC, packet loss.

    I. INTRODUCTION

    Voice over Internet Protocol (VoIP) is achieved by sending and receiving packets. At the receiver, some packets are missing because of delay, congestion or transmission errors. This packet loss degrades the quality of service (QoS) at the receiver. Because voice is transmitted in real time, the receiver cannot request retransmission of lost packets: the additional transfer time would be too large. Loss concealment techniques are therefore used at the transmitter or the receiver in order to recover the lost packets. These techniques are called redundant descriptions [1]. Among them, Multiple Description Coding (MDC), based on information redundancy, increases robustness against packet loss. The goal is to maintain a certain quality and intelligibility of speech as the packet loss rate increases. The Real-time Transport Protocol (RTP) is often used.

    In this work, we have developed and implemented a frame-by-frame packetization scheme applied to a harmonic coder designed for voice over IP. This packetization is applied to a mixed excitation linear prediction (MELP) coder [2]. Each packet contains the main information on the current frame and on the frames that follow it. A received packet can recover up to four lost frames.

    This paper is organized as follows. First, we describe voice communication over IP; then we propose a method for recovering lost packets. Next, a packetization scheme is proposed and the packet recovery process is presented. Finally, simulation results are shown and evaluated.

    II. VOICE OVER INTERNET PROTOCOL (VOIP)

    It is widely accepted that VoIP is an emerging Internet technology that will dominate the field of voice communications. It has begun to establish itself as a viable alternative to traditional voice communication systems. In this technique, the analog call is digitized, compressed and then split into packets for transmission over an IP network. At the other end, the process is reversed: because packets travelling over the Internet take different paths and frequently arrive out of order, they are first stored in buffers to be re-sequenced, then decompressed and converted into a sound signal before being routed back through ordinary telephone equipment [5, 6]. Figure 1 shows the block diagram of a voice communication system over an IP network.

    Fig. 1: Block diagram of the transmission of voice over IP (blocks: original speech, coding, packetization, IP network, de-packetization, decoding, synthetic speech)

    978-1-4673-1591-3/12/$31.00 ©2012 IEEE

  • III. OUR PROPOSED METHOD

    In this work, we propose a method for combating packet loss efficiently. The method is based on the MDC technique, which introduces redundancy: we code the signal as two descriptions carried in the same packet. The first uses a MELP coder and encodes the current frame Fn at 2.4 kbps, while the second uses another MELP encoder running at 1.2 kbps to encode the three frames Fn+1, Fn+2 and Fn+3 that follow Fn. Fn is used to reconstruct the signal with good quality, while Fn+1, Fn+2 and Fn+3 are used to recover any lost packets. The second description thus helps to reconstruct the speech when one, two, three or even four successive packets are lost. This redundant information does not have the same quality as that extracted from Fn, because it is coarsely quantized; it only helps to reconstruct intelligible speech when packets are lost. The packetization scheme is shown in Figure 2. Note that MELP 2.4 operates on a frame of 22.5 ms, while MELP 1.2 operates on a 67.5 ms frame [3, 4].

    Fig. 2: Packetization using two descriptions
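
The two-description layout can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Packet` and `packetize` are hypothetical, and frame indices stand in for the actual MELP bitstreams, which would come from real 2.4 kbps and 1.2 kbps encoders. The sizes of 54 and 81 bits are those given for the two coders in this paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

FINE_BITS = 54    # MELP 2.4 kbps description of one 22.5 ms frame
COARSE_BITS = 81  # MELP 1.2 kbps description of three frames (67.5 ms)


@dataclass(frozen=True)
class Packet:
    """Hypothetical MDC packet: fine F_n plus coarse F_n+1..F_n+3."""
    index: int                    # n, index of the current frame
    fine: int                     # frame coded at 2.4 kbps
    coarse: Tuple[int, int, int]  # frames coded jointly at 1.2 kbps

    @property
    def size_bits(self) -> int:
        return FINE_BITS + COARSE_BITS


def packetize(num_frames: int) -> List[Packet]:
    """Build one packet for every frame that has three successors available."""
    return [
        Packet(index=n, fine=n, coarse=(n + 1, n + 2, n + 3))
        for n in range(num_frames - 3)
    ]


packets = packetize(8)
for p in packets:
    print(p.index, p.fine, p.coarse, p.size_bits)
```

Because each packet needs the three frames that follow the current one, the sketch stops three frames before the end of the input; how the last frames of a call are handled is not specified in the text.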

    IV. PACKETIZATION

    MELP coder parameters are given in Table I. The two coders encode the fundamental frequency (pitch), the aperiodicity flag, five band voicing decisions, two gains corresponding to the energy of the two half-frames, ten LPC coefficients converted into LSFs, and the spectral amplitudes of ten pitch harmonics.

    A fine description at 2.4 kbps is required in order to provide good speech quality in error-free conditions. The configuration also includes a coarse description at 1.2 kbps with reasonable quality, which can recover up to three successive packet losses in longer bursts. The packetization scheme is shown in Figure 2. A packet contains the output of both MELP coders and is coded using 135 bits (54 + 81), corresponding to a rate of 6 kbps. Hence, forming a packet requires four successive frames of 22.5 ms each. In a transmission without packet loss, only the first 54 bits are used to decode the signal. A packet is attributed to the current frame, coded by MELP 2.4 (Figure 2). This causes a delay of 22.5 ms. Note that forming and sending the first packet requires a delay of 90 ms. Afterwards, a packet is sent every 22.5 ms.
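
The rate and delay figures quoted above follow from simple arithmetic, written out here as a worked check. The symbols R and D are notation introduced only for this illustration and do not appear in the paper.

```latex
\begin{align*}
R_{\text{packet}} &= \frac{(54 + 81)\ \text{bits}}{22.5\ \text{ms}}
                   = \frac{135\ \text{bits}}{0.0225\ \text{s}}
                   = 6000\ \text{bit/s} = 6\ \text{kbps}, \\
R_{\text{redundancy}} &= \frac{81\ \text{bits}}{0.0225\ \text{s}} = 3600\ \text{bit/s}, \\
D_{\text{first packet}} &= 4 \times 22.5\ \text{ms} = 90\ \text{ms}.
\end{align*}
```

The 1.2 kbps description is retransmitted in every packet, so it accounts for 3600 of the 6000 bit/s on the wire even though the coder itself produces only 1200 bit/s.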

    General parameters     MELP 2.4 kbps            MELP 1.2 kbps
    Sampling frequency     8 kHz                    8 kHz
    Frame size             180 samples (22.5 ms)    3 × 180 samples (67.5 ms)
    Frame rate             44.44 frames/s           14.8148 frames/s

    Bits per frame         MELP 2.4 kbps    MELP 1.2 kbps
    Voicing mode           V      N/V       VVV    UVV, VUV    VVU    UUV, UVU, VUU    UUU
    10 LSFs                25     25        43     43          39     43               27
    Pitch                  7      7         12     12          12     12               12
    10 Fourier amplitudes  8      -         8      8           8      8                0
    5 bands of voicing     4      -         6      4           4      2                0
    2 gains                8      8         10     10          10     10               10
    Flag                   1      0         1      1           1      1                0
    Protection             -      13        0      2           6      4                31
    Synchronisation        1      1         1      1           1      1                1
    Total bits per frame   54               81
    Total bit rate         54 × 44.44 = 2400 bps    81 × 14.8148 = 1200 bps

    Table I: Bit allocation of the MELP 2.4 kbps and 1.2 kbps encoders
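
As a consistency check on Table I, the short Python sketch below sums the bits of each column and recomputes the two bit rates from the frame sizes. The dictionary names and the labels "mode 1" to "mode 5" (the five MELP 1.2 kbps columns, left to right) are chosen here for illustration, and a dash in the table is counted as 0 bits.

```python
SAMPLE_RATE_HZ = 8000

# Bits per parameter, in the row order of Table I:
# LSFs, pitch, Fourier amplitudes, bands of voicing, gains,
# flag, protection, synchronisation. A dash counts as 0 bits.
MELP_2400 = {
    "V":   [25, 7, 8, 4, 8, 1, 0, 1],
    "N/V": [25, 7, 0, 0, 8, 0, 13, 1],
}
# The five MELP 1.2 kbps mode columns, left to right.
MELP_1200 = {
    "mode 1": [43, 12, 8, 6, 10, 1, 0, 1],
    "mode 2": [43, 12, 8, 4, 10, 1, 2, 1],
    "mode 3": [39, 12, 8, 4, 10, 1, 6, 1],
    "mode 4": [43, 12, 8, 2, 10, 1, 4, 1],
    "mode 5": [27, 12, 0, 0, 10, 0, 31, 1],
}


def bit_rate(bits_per_frame: int, samples_per_frame: int) -> float:
    """Bit rate in bit/s for bits_per_frame emitted every samples_per_frame samples."""
    return bits_per_frame * SAMPLE_RATE_HZ / samples_per_frame


totals_2400 = {mode: sum(bits) for mode, bits in MELP_2400.items()}
totals_1200 = {mode: sum(bits) for mode, bits in MELP_1200.items()}
rate_2400 = bit_rate(54, 180)
rate_1200 = bit_rate(81, 3 * 180)

print(totals_2400, rate_2400)
print(totals_1200, rate_1200)
```

Every column adds up to the stated frame size (54 or 81 bits), which is why the protection field grows as fewer bits are needed for voiced-only parameters.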

    [Fig. 2 layout: one packet carries Fn coded by MELP 2.4 kbps and Fn+1, Fn+2, Fn+3 coded by MELP 1.2 kbps]

  • V. PACKET RECOVERY

    Figure 3 shows how our MDC system recovers lost packets. Using this scheme, three successive lost packets can be easily recovered. Even a fourth frame can be retrieved by extrapolation [2]. From left to right, the figure shows the loss of a single packet, of 2 packets, of 3 packets and finally of 4 packets.

    Explanation: The first case corresponds to a single packet loss (frame F2). The current packet is lost, so the MELP 2.4 description cannot provide good speech quality. The system falls back to the previously received packet, which contains information on the lost current frame. This frame is then reconstructed with coarse quality using the 81 bits of the MELP 1.2 coder, i.e. the bits describing the three frames F2, F3 and F4.

    The second case corresponds to two successive lost packets, namely F4 and F5 (Figure 3). As in the first case, the signal cannot be reconstructed using MELP 2.4, so we proceed with recovery using the previous packet that was received correctly.

    In the third case, three successive packets corresponding to F7, F8 and F9 are lost. The signal is decoded from the last received packet.

    In the last case, four successive packets (F11, F12, F13 and F14) are lost. F11, F12 and F13 are recovered directly from packet No. 10, since this packet contains both the information on frame F10 at 2.4 kbps and the information on the successive frames F11, F12 and F13 coded at 1.2 kbps. Frame F14 is retrieved using an extrapolation method as in [2].
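
The four cases can be summarised as a decision rule per frame, sketched below in Python. This is an illustration under stated assumptions rather than the authors' decoder: the function `recovery_plan` and its labels are hypothetical, no actual MELP decoding or extrapolation is performed, and frames in bursts longer than four packets are simply marked "lost" because the text does not say how they are handled.

```python
from typing import Dict, Optional, Set, Tuple

REDUNDANCY_DEPTH = 3  # packet k also carries frames k+1..k+3 at 1.2 kbps


def recovery_plan(
    num_frames: int, received: Set[int]
) -> Dict[int, Tuple[str, Optional[int]]]:
    """Map each frame index (1-based) to (decoding mode, source packet)."""
    plan: Dict[int, Tuple[str, Optional[int]]] = {}
    for n in range(1, num_frames + 1):
        if n in received:
            plan[n] = ("fine", n)  # decode the 54-bit 2.4 kbps description
            continue
        # Most recent earlier packet whose coarse description covers frame n.
        donor = next(
            (k for k in range(n - 1, n - REDUNDANCY_DEPTH - 1, -1) if k in received),
            None,
        )
        if donor is not None:
            plan[n] = ("coarse", donor)  # decode from the 81-bit description
        elif (n - REDUNDANCY_DEPTH - 1) in received:
            # Fourth frame of a burst: no description left, extrapolate.
            plan[n] = ("extrapolated", None)
        else:
            plan[n] = ("lost", None)  # assumption: longer bursts are not handled
    return plan


# Loss pattern combining the four cases described above.
received_packets = {1, 3, 6, 10, 15}
plan = recovery_plan(15, received_packets)
for frame, (mode, source) in plan.items():
    suffix = f" from packet {source}" if source is not None else ""
    print(f"F{frame}: {mode}{suffix}")
```

With this loss pattern, F2 is decoded from packet 1, F4 and F5 from packet 3, F7 to F9 from packet 6, F11 to F13 from packet 10, and F14 is left to extrapolation.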

    Fig. 3: Process of recovery of lost packets based on the proposed MDC (four panels, left to right: loss of one, two, three and four successive packets; each frame lasts 22.5 ms; received frames are decoded at 2.4 kbps, lost frames at 1.2 kbps from the last received packet, and the fourth lost frame by extrapolation)

  • VI. SIMULATION RESULTS AND EVALUATION

    First, we evaluate the performance of the two MELP coders implemented separately; the aim is to quantify the perceptual quality of these coders before they are used with MDC. A second evaluation is then conducted using MDC.

    A. EVALUATION CORPUS

    To assess and validate our method, we used a multilingual corpus combining Arabic, French and English. The first recording is composed of phonetically balanced Arabic sentences [7] developed in our laboratory. This corpus contains a total of 60 sentences: 10 sentences spoken by each of 3 male and 3 female speakers. For French and English, we used the well-known phonetically balanced texts "La bise et le soleil" and "The wind and sun".

    B. EVALUATION OF CODERS

    The performance of the two MELP coders implemented separately was evaluated using ITU-T (International Telecommunication Union) Recommendation P.862 [8], known as PESQ (Perceptual Evaluation of Speech Quality). This recommendation describes an objective method for predicting the subjective quality of telephony and voice coders. It is intended to evaluate the influence of factors such as packet loss, variable delay and distortion due to channel errors, which are poorly evaluated by conventional methods. PESQ compares a reference (original) signal with the degraded version obtained by synthesizing this reference or after transmission. The results are shown in Table II.

    To assess the MDC technique in simulation, we applied different packet loss rates using a random loss model [9] and tested the robustness of the MELP coder with MDC. We found an improvement in the robustness of the system against packet loss. The results of the objective tests are shown in Figure 5. They show that our method, based on a loss concealment technique for VoIP applications, significantly raises quality for loss rates of up to 30%. These results show that multiple description coding provides a significant improvement in speech quality, especially as the loss rate increases. Figures 6, 7 and 8 give sample results showing the reconstructed signal after packet loss when one, two and three consecutive frames were lost, respectively. We observe the correction of the lost frames in each case. In voiced regions, the voicing is preserved after correction.

    VII. CONDUCT OF TESTS

    This section presents the simulations used to perform our tests. We simulated various losses to introduce degradation in the synthetic signal. These losses were simulated randomly using a function that follows a uniform distribution.
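
A loss pattern of this kind can be generated as in the Python sketch below. It is only one plausible reading of the description: each packet is dropped independently when a uniform draw falls below the target loss rate. The function names, the seed and the packet count are assumptions made for the example, not values from the paper.

```python
import random
from typing import List


def simulate_loss(num_packets: int, loss_rate: float, seed: int = 0) -> List[bool]:
    """Return one flag per packet: True if received, False if lost.

    Each packet is dropped independently when a uniform draw in [0, 1)
    falls below loss_rate.
    """
    rng = random.Random(seed)
    return [rng.random() >= loss_rate for _ in range(num_packets)]


def longest_burst(received: List[bool]) -> int:
    """Length of the longest run of consecutive lost packets."""
    longest = current = 0
    for ok in received:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest


pattern = simulate_loss(num_packets=10_000, loss_rate=0.30)
measured_rate = pattern.count(False) / len(pattern)
print(f"measured loss rate: {measured_rate:.3f}")
print(f"longest burst: {longest_burst(pattern)} packets")
```

The longest burst is worth monitoring because the scheme described in this paper covers at most four consecutive lost packets; independent losses at high rates will occasionally exceed that.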

    Coder    MELP 2.4 kbps    MELP 1.2 kbps
    PESQ     3.20             2.71

    Table II: Results of objective tests of two MELP coders

    Fig. 4: Results of objective tests of two MELP coders (block diagram; blocks and signals: original signal, coder, simulation of loss, decoder, signal with loss, synthetic signal corrected by MDC, reference signal, evaluation, PESQ score)

  • Fig. 5: Results of objective tests using PESQ (x-axis: packet loss (%), 0 to 30; y-axis: PESQ, 0 to 5; curves: original signal, MELP, MELP with MDC, G.729)

    Fig. 6: Sample output showing the correction made by the MDC (panels: original signal, MELP signal at 2.4 kbps with packet loss, MELP signal with MDC; x-axis: number of samples; y-axis: amplitude)

  • Fig. 7: Sample output showing the correction made by the MDC (panels: original signal, synthetic signal at 2.4 kbps with loss, synthetic signal with MDC; x-axis: number of samples; y-axis: amplitude)

    Fig. 8: Sample output showing the correction made by the MDC (panels: original signal, synthetic signal at 2.4 kbps with loss, synthetic signal with MDC; x-axis: number of samples; y-axis: amplitude)

  • VIII. CONCLUSION

    In this work, we have presented an original method using two harmonic MELP coders. The first transmits speech encoded at 2.4 kbps over an IP network. The second, operating at 1.2 kbps, is added to the first in the same packet in order to compensate for lost packets, using the technique known as multiple description coding (MDC). The results of our simulations for a VoIP application show that this concealment method enhances quality for loss rates of up to 30%. The proposed concealment system is effective and provides a significant improvement in speech quality, especially when packet loss exceeds 15%. The redundancy added by MDC maintains good speech quality across the packet loss rates tested. In this study, we obtained an encoder that operates at 135 bits per 22.5 ms frame, corresponding to a total rate of 6 kbps. We therefore recommend this solution as an advantageous replacement for the CELP G.729 standard currently used in VoIP applications, which has a bit rate of 8 kbps without MDC.

    REFERENCES

    [1] M. Rui and F. Labeau. Error-Resilient Multiple Description Coding. Proceedings of the IEEE, Vol. 56, No. 8, 2008.

    [2] E. Orozco, E. Orozco, and A. M. Kondoz. Multiple Description Coding for Voice over IP using Sinusoidal Speech Coding. In Proc. of IEEE-ICASSP, 2006.

    [3] A. McCree, K. Truong, E. B. George, T. P. Barnwell, V. Viswanathan. A 2.4 kbits/s MELP Coder Candidate for the New U.S. Federal Standard. In Proc. of IEEE-ICASSP, 1996.

    [4] T. Wang, K. Koishida, V. Cuperman and A. Gersho. A 1200 bps Speech Coder Based on MELP. In Proc. IEEE Inter. Conf. Acoustics, Speech and Signal Processing, 2000.

    [5] A. Nagle. Enrichissement de la Conférence audio en Voix sur IP au travers de l'amélioration de la qualité et de la spatialisation sonore. Paris Tech, France, 2008.

    [6] L. Ouakil et G. Pujolle. Téléphonie sur IP. Edition groupe Eyrolles, 2008.

    [7] M. Boudraa, B. Boudraa, B. Guérin. Twenty lists of Ten Arabic Sentences for assessment. Acustica, vol. 86, pp. 870-882, 2000.

    [8] ITU-T. Perceptual evaluation for speech quality (PESQ), an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs. Recommendation P.862, International Telecommunications Union, 2001.

    [9] M. Yajnik, S. Moon, J. Kurose, and D. Towsley. Measurement and modeling of the temporal dependence in packet loss. In INFOCOM 99, Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies, Proceedings, IEEE, Vol. 1, pp. 345-352, 1999.
