Improving Perceived Speech Quality for Wireless
VoIP By Cross-Layer Designs
By Zhuoqun Li
This dissertation is submitted to the University of Plymouth
in partial fulfilment of the award of
Master of Research in Network System Engineering
Supervisor
Prof. Emmanuel C. Ifeachor
School of Computing, Communication and Electronics
University of Plymouth
September 2003
ABSTRACT
Providing VoIP services with satisfactory speech quality in the wireless/mobile
Internet is difficult because of impairment factors introduced in the wireless channel,
such as packet errors, delay and jitter. Effective packet error recovery mechanisms
such as Automatic Repeat on reQuest (ARQ) in wireless networks are important because
they can reduce packet loss due to bit errors. This dissertation focuses on using
cross-layer techniques to improve the performance of ARQ, and hence the
perceived speech quality of Wireless VoIP, which may be difficult to achieve within the layered
protocol structure. The research for this project was carried out in two
steps:
First, we use an objective measure of perceived conversational speech quality
(MOSc) as a metric to evaluate the performance of three current retransmission
schemes (i.e. No Retransmission, Speech Property-Based Retransmission and Full
Retransmission). Our findings indicate that the performance of the retransmission
mechanisms is a function of both wireless link quality and delay introduced in the
wireline network. We also propose a perceived speech quality driven retransmission
mechanism, which can automatically switch to the most suitable retransmission
scheme according to QoS parameters reported from different layers.
Next, we investigate the problems introduced by the retransmission procedures of the
Stop and Wait ARQ protocol in a Wireless VoIP system. We then propose a cross-layer
framework in which 1) the retransmission procedure of the link layer ARQ
protocol is constrained by the available playout delay; 2) in the playout delay
estimation, delivery delay in the wireless channel and in the wireline network is estimated
separately, and the delivery delay in the wireless channel is constrained to avoid delay
accumulation in the transmitting queue; 3) if the retransmission procedure is
terminated prematurely, received noisy copies of a speech packet are combined
to reduce the damaged part and finally played out at the application layer.
Simulation results show that these cross-layer designs improve the performance
of the Stop and Wait ARQ protocol and hence significantly enhance the perceptual
speech quality of a wireless VoIP system.
TABLE OF CONTENTS
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGEMENTS
CHAPTER 1
INTRODUCTION
1.1 VoIP and Its Application in Wireless Internet
1.2 Motivation
1.2.1 Impairment factors of wireless VoIP speech quality
1.2.2 Packet error concealment techniques
1.2.3 Cross-layer designs
1.2.4 Problem statement
1.3 Aims and Objectives
1.4 Thesis Contributions
1.5 Organization of the Thesis
CHAPTER 2
BACKGROUND THEORIES
2.1 Speech Quality Evaluations
2.1.1 Objective Speech Quality Measurement
2.1.2 PESQ
2.1.3 E-Model
2.1.4 Conversational speech quality evaluation
2.2 Adaptive Playout Buffer
2.3 Automatic Repeat upon reQuest (ARQ)
CHAPTER 3
PERCEIVED SPEECH QUALITY DRIVEN RETRANSMISSION MECHANISM
3.1 Introduction
3.2 Related Works
3.2.1 Speech property-based retransmission mechanisms
3.2.2 Measuring conversational speech quality
3.2.3 Adaptive jitter buffer and retransmission jitters
3.3 Simulation System Description
3.4 Performance Comparison of Current Retransmission Schemes
3.5 Perceived Speech Quality Driven Retransmission Scheme
3.6 Summary
CHAPTER 4
PLAYOUT DELAY CONSTRAINED ARQ and ARQ AWARE PLAYOUT BUFFER
4.1 Introduction
4.2 The Cross-Layer Design
4.2.1 System model
4.2.2 Playout delay constrained ARQ
4.2.3 ARQ aware playout buffer
4.2.3.1 Queue model
4.2.3.2 ARQ aware playout buffer
4.3 Simulation Model and Experimental Results
4.3.1 Wireless channel model
4.3.2 Voice traffic model
4.3.3 Speech quality evaluation
4.3.4 Simulation results and analysis
4.4 Summary
CHAPTER 5
DISCUSSIONS, SUGGESTIONS for FURTHER WORKS, and CONCLUSIONS
5.1 Discussions
5.2 Suggestions for Further Works
5.3 Conclusions
REFERENCES
APPENDICES
[APPENDIX A] ns-2 Extensions for ARQ Retry Limit Control
[APPENDIX B] ns-2 Simulation Script for Per Packet Control of ARQ
[APPENDIX C] C code for Majority-Logic Packet Combining
[APPENDIX D] List of Items Included in the Appended CD
[APPENDIX E] Published Papers
LIST OF FIGURES
Figure 1-1 VoIP Protocol Architecture
Figure 1-2 the Wireless VoIP system overview
Figure 1-3 the Basic model of cross-layer designs
Figure 2-1 Basic Structure of Perceptual Evaluation of Speech Quality
Figure 2-2 Schematic diagram for MOSc measurement
Figure 2-3 Timing associated with packet i
Figure 3-1 Simulation Environment
Figure 3-2 Overall packet loss rate comparison
Figure 3-3 Buffered Retx delay comparison
Figure 3-4 MOSc comparison with 175ms network delay
Figure 3-5 MOSc comparison with packet error probability 0.001
Figure 3-6 Perceived speech quality driven Retx scheme pseudo code
Figure 4-1 Stop and Wait ARQ
Figure 4-2 the Cross-layer design system model
Figure 4-3 Block diagram of the playout delay constraint ARQ with packet combining
Figure 4-4 Timing associated with Packet
Figure 4-5 the Simulation Model
Figure 4-6 Overall packet losses comparison
Figure 4-7 End-to-end delays with different inter-arrival delay
Figure 4-8 End-to-end delay comparison
Figure 4-9 Conversational MOS comparison
Figure 5-1 Perceived speech quality driven packet error recovery scheduler
LIST OF TABLES
Table 2-1 MOS scale
Table 3-1 Average voiced packets losses with fast-exp playout buffer
ACKNOWLEDGEMENTS
I would like to express my sincere and deep gratitude to my supervisor, Professor
Emmanuel C. Ifeachor, who gave me the opportunity to undertake the
Master of Research. His continuous advice and encouragement throughout this study are
acknowledged and greatly appreciated.
I also had the opportunity to work with researchers in the Centre for Signal
Processing and Multimedia Communications. I would like to thank them for their
friendliness and support. Special thanks go to Ms. Lingfen Sun and Mr. ZiZhi Qiao
for their valuable comments and suggestions. Without their support, this thesis would
not have been possible.
I would like to acknowledge all my classmates in MRes/MSc NSE and CE&SP
for their generous help and inspiration. With them, I really enjoyed the past year
at the University of Plymouth.
On the personal side, I would like to thank my parents for their unending love
and support.
Improving Perceived Speech Quality for Wireless VoIP by Cross-layer designs
CHAPTER 1
INTRODUCTION
1.1 VoIP and Its Application in Wireless Internet
Packet switched networks such as the Internet have developed very rapidly over the
past decades. The advantages of packet switched networks, such as efficiency and
flexibility, are making them the eventual replacement for traditional circuit switched
networks such as the Public Switched Telephone Network (PSTN). VoIP (Voice over Internet
Protocol or Voice over Packet) is one of the success stories among applications of
packet networks. Generally, a VoIP service is the real-time delivery of packetized voice
traffic across packet switched networks such as the Internet. It offers lower
communication costs and acceptable speech quality compared with traditional
telephone networks.
Recently, wireless/mobile communication has been growing rapidly and
providing more and more convenient services. It is no surprise that there is great
demand to add voice service to wireless IP networks and wireless handsets. Wireless
VoIP services can be provided in a Wireless Local Area Network (WLAN), e.g. an IEEE
802.11 [1] network, or in a third generation (3G) mobile network, e.g. WCDMA [2]. The
protocol stack for transmitting VoIP traffic in wireline and wireless networks is
presented in Figure 1-1.
MRes Thesis –University of Plymouth 1
In the application layer, VoIP is supported by RTP (Real-time Transport Protocol) [3].
RTP provides a way to deliver delay-sensitive real-time data. The services provided
by RTP include payload type identification, sequence numbering, timestamping and
delivery monitoring. RTP applications typically run on top of UDP, which does
not guarantee Quality of Service (QoS) but incurs lower overhead [4].
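As a rough illustration of the RTP services listed above, the following sketch packs and parses the 12-byte fixed RTP header that carries the payload type, sequence number and timestamp. The field layout follows the RTP specification; the function names, the example field values and the use of payload type 0 (PCMU, i.e. G.711 mu-law) are illustrative assumptions and are not taken from this study.

```python
import struct

RTP_VERSION = 2  # version field defined by the RTP specification


def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=False):
    """Build the 12-byte fixed RTP header (no padding, extension or CSRC list)."""
    if not 0 <= payload_type <= 127:
        raise ValueError("payload type must fit in 7 bits")
    byte0 = RTP_VERSION << 6                    # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | payload_type   # marker bit + payload type
    return struct.pack(
        "!BBHII",                # network byte order: 1 + 1 + 2 + 4 + 4 bytes
        byte0,
        byte1,
        seq & 0xFFFF,            # 16-bit sequence number (wraps around)
        timestamp & 0xFFFFFFFF,  # 32-bit media timestamp
        ssrc & 0xFFFFFFFF,       # 32-bit synchronization source identifier
    )


def unpack_rtp_header(header):
    """Parse the fixed RTP header back into a dictionary of fields."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", header[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


if __name__ == "__main__":
    # Hypothetical packet: first 20 ms G.711 frame (160 samples at 8 kHz).
    hdr = pack_rtp_header(payload_type=0, seq=1, timestamp=160, ssrc=0x1234)
    print(len(hdr), unpack_rtp_header(hdr))
```

The receiver can use the sequence number to detect loss and reordering, and the timestamp to schedule playout, which is why these fields matter for the playout buffer discussed later.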
RTCP (RTP Control Protocol) is the control protocol associated with RTP.
RTCP monitors the quality of service and conveys information about the participants
in an on-going session [3]. After a voice sample is digitised and compressed, it is
packed as the payload of an IP packet, along with IP addresses for the purposes of
routing in IP networks. In the link layer, IP packets carrying speech data are encapsulated
in frames and supported by IEEE 802.3 [4] or 802.11 for wireline and
wireless networks respectively. Both of these link layer protocols provide services such
as framing, error control and flow control.
Application Layer: RTP, RTCP
Transport Layer: UDP
Network Layer: IP
Data Link Layer: IEEE 802.3, IEEE 802.11x
Figure 1-1 VoIP Protocol Architecture
Figure 1-2 describes a VoIP system implemented in the wireless Internet. Speech
is an analog signal that varies slowly in time (with bandwidth not exceeding 4 kHz).
As depicted in Figure 1-2, the speech source alternates between talking and silence
periods, whose durations are typically considered to be exponentially distributed. Before
being transmitted over packet switched networks, the analog speech signal has to be
digitised at the sender; the reverse process is performed at the receiver. The
digitisation process is composed of sampling, quantization and encoding.
Many encoding techniques have been developed and standardized by the ITU.
The basic encoder is ITU G.711, which samples the voice signal at 8 kHz and
generates 8 bits per sample. Code Excited Linear Prediction (CELP) based encoders
provide rate reduction (e.g. 8 kbps for G.729, 5.3 and 6.4 kbps for G.723.1) at the
expense of lower quality, additional complexity and encoding delay [5]. For
wireless/mobile communication, variable-rate codecs have been developed, e.g.
AMR [6] and EVRC [7].
The encoded speech is then packetized into packets of equal size. Each such
packet includes the headers of the various protocol layers (e.g. RTP 12 bytes, UDP 8
bytes, IP 20 bytes and 802.11 34 bytes) and a payload comprising the encoded
speech for a certain duration, which depends on the codec deployed (e.g. 20 ms for an AMR
12.2k frame).
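To make the cost of these headers concrete, the short sketch below computes the payload size of one speech frame and the share of each packet taken up by headers. The header sizes and codec rates are the ones quoted above (G.711 at 8 kHz with 8 bits per sample gives 64 kbps); the 20 ms packetization interval for G.711 and G.729, the rounding of the AMR payload up to whole bytes, and all function names are assumptions made for illustration only.

```python
import math

# Header sizes quoted in the text, in bytes.
HEADER_BYTES = {"RTP": 12, "UDP": 8, "IP": 20, "802.11": 34}


def payload_bytes(bit_rate_bps, frame_ms):
    """Encoded speech bytes produced in frame_ms at bit_rate_bps, rounded up."""
    return math.ceil(bit_rate_bps * frame_ms / 8000)


def packet_size(bit_rate_bps, frame_ms):
    """Return payload, header and total bytes, and the header share of a packet."""
    payload = payload_bytes(bit_rate_bps, frame_ms)
    headers = sum(HEADER_BYTES.values())
    total = payload + headers
    return {
        "payload": payload,
        "headers": headers,
        "total": total,
        "header_share": headers / total,
    }


if __name__ == "__main__":
    # Codec bit rates in bits per second; 20 ms per packet is an assumption
    # for G.711 and G.729 (the text gives 20 ms only for AMR 12.2k).
    codecs = {"G.711": 8000 * 8, "G.729": 8000, "AMR 12.2k": 12200}
    for name, rate in codecs.items():
        info = packet_size(rate, 20)
        print(f"{name}: {info['payload']} B payload, {info['total']} B total, "
              f"{info['header_share']:.0%} headers")
```

Under these assumptions the 74 bytes of headers outweigh the speech payload for the low-rate codecs, which is one reason why every retransmitted packet is comparatively expensive on the wireless link.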
In this study, the Wireless VoIP system is considered in a last-hop scenario. In this
case, voice streams have to traverse wireline networks before they reach the access
point, which is the junction of a wireline network and the wireless channel.
[Figure 1-2 block diagram. Sender: Speech Source (alternating Talk and Silence periods), Encoder, Packetizer. Network: Internet, Access Point. Receiver: Playout Buffer, Depacketizer, Decoder.]
Figure 1-2 the Wireless VoIP system overview
As the voice packets are sent over IP networks and the wireless channel, they incur
variable delay and possibly loss. To provide smooth playout at the
receiver, a playout buffer is used to compensate for the delay variations. Packets are held
for a later playout time in order to ensure that enough packets are buffered to be
played out continuously. Any packet arriving after its scheduled playout time is
discarded. There are two types of playout algorithms: fixed and adaptive. A fixed
playout scheme schedules the playout of packets so that the end-to-end delay
(including both network and buffering delay) is the same for all packets. Fixed jitter buffers
cannot adapt readily to changes in network delays and as a result are not practical in
real VoIP applications. Adaptive playout schemes are more common in VoIP systems.
An adaptive playout buffer can adjust the playout delay for each talkspurt, hence it is more
suitable for time-varying IP networks. The scheduled playout delay is a tradeoff between
buffer losses and end-to-end delay, and it is important to select the value so as to
maximize the quality of voice communications. A large playout delay decreases
packet loss due to late arrivals but hinders interactivity between the communicating
parties, while a small playout delay improves interactivity but causes higher buffer
losses and degrades the speech quality.
The playout buffer delivers a continuous stream of packets with fixed intervals to
the depacketiser, whose responsibility is to extract speech data from the payload and
feed them to the decoder. The main function of the decoder is to reconstruct speech
signals. Some decoders may implement packet loss concealment (PLC) methods that
produce replacements for lost data packets. Having been depacketized and decoded,
speech signals are finally played out by the VoIP end devices.
1.2 Motivation
1.2.1 Impairment factors of wireless VoIP speech quality
Perceived speech quality of VoIP is defined subjectively, as perceived
by the end users. Despite the cost-saving benefits of VoIP, providing acceptable perceived
speech quality is the key to the success of VoIP services. Currently, IP telephony still
cannot provide fully satisfactory quality because of the many impairment factors introduced in
the transmission path over IP networks. When VoIP is applied in wireless/mobile IP
networks, because of the unreliability of wireless channel performance and the
uncertainty of the mobility of wireless handsets, the speech quality is degraded
further. Many correlated impairment factors may seriously affect
the perceived speech quality of Wireless VoIP. In this study, the main impairment
factors are identified as packet loss, bit errors, end-to-end delay, jitter and coding.
Packet Loss
Packet loss is a major impairment factor. It causes more noticeable degradation in
voice quality than any other impairment factor. During their journey through the
interconnected IP networks, speech packets may be lost due to router buffer overflow or network
link congestion. In addition, VoIP applications are carried over the
connectionless protocol UDP, which means that speech packets may travel over different
paths in the IP networks before they arrive at the destination. This results in some
speech packets arriving out of sequence and being discarded at the receiver. Lost packets
may be reconstructed by the decoder from related information, but it is impossible to
completely recover the speech information carried by the lost packets.
Bit Error
Bit error is not really a problem for VoIP in wireline networks, as it does not
happen very often. However, if wireless channels are included in the path traversed by
speech packets, bit errors become a challenging problem. In the wireless environment,
the digital signal is exposed to absorption, scattering, interference and multipath
fading. All these effects contribute to the Signal to Noise Ratio (SNR) at the
receiver and hence determine the Bit Error Rate (BER). For packet
communications, the result of bit errors is packet loss if the whole packet is covered
by a checksum. However, if a partial checksum is used specifically for VoIP
applications, speech packets that contain bit errors in the payload are still decoded and
played out. In this case, the effect of bit errors on the perceived speech quality is
determined by the positions and number of bit errors.
End-to-end delay
Delay does not directly cause any reduction in speech information but affects the
interactive nature of conversations. The end-to-end delay encompasses: a. the delay
incurred in encoding and decoding; b. the delay incurred in packetization; c. the delay
incurred in the path from the sender to the receiver (e.g. transmission time over IP
networks, queuing delays in network elements, propagation and retransmission time
in the wireless channel); d. the delay incurred in the playout buffer. For natural hearing,
delays lower than 100 ms cannot really be noticed by most users, while delays between 100 ms and
300 ms begin to affect conversation interactivity [9]. Longer delays are obvious
to the user and make conversation impossible.
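The delay budget described above can be sketched as a simple sum of the four components (a. to d.) followed by a comparison with the 100 ms and 300 ms thresholds quoted from [9]. The class name, the category labels and the example values are hypothetical; only the component list and the two thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass
class DelayBudget:
    """One-way delay components in milliseconds, following items a. to d."""
    codec_ms: float           # a. encoding and decoding
    packetization_ms: float   # b. packetization
    path_ms: float            # c. wireline path plus wireless (re)transmission
    playout_buffer_ms: float  # d. playout buffer

    def total_ms(self):
        return (self.codec_ms + self.packetization_ms
                + self.path_ms + self.playout_buffer_ms)


def classify_delay(total_ms):
    """Map an end-to-end delay to the three regions described in the text."""
    if total_ms < 100:
        return "unnoticeable"
    if total_ms <= 300:
        return "reduced interactivity"
    return "unacceptable"


if __name__ == "__main__":
    # Hypothetical example values, not measurements from this study.
    budget = DelayBudget(codec_ms=25, packetization_ms=20,
                         path_ms=120, playout_buffer_ms=60)
    print(budget.total_ms(), classify_delay(budget.total_ms()))
```

Because the components add up, every retransmission on the wireless link and every millisecond of playout buffering draws on the same budget, which is the tension the later chapters address.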
Jitter
Jitter is defined as a variation in the delay of received packets. At the sending side,
packets are sent in a continuous stream with the packets being spaced evenly
apart. Due to network congestion, improper queuing, or configuration errors, the
interval between adjacent packets changes constantly, hence the delay of each
packet can vary instead of remaining constant. Jitter can make the voice very annoying
to the listener. Removing jitter requires collecting packets and holding them long
enough to allow the slowest packets to arrive in time to be played in the correct
sequence, re-sequencing them if necessary. This job is normally performed by the playout
buffer, which maintains constant packet intervals at the expense of additional playout
delay or packet losses due to late arrival.
Coding
In the process of transforming the analog speech signal into digital bit streams, some
codecs also use compression techniques to remove redundant or less important speech
information, as a way to reduce the transmission bandwidth requirement while preserving
perceptually important voice signals. This procedure leads to a certain amount of speech
information being lost and hence affects the speech quality perceived by the user at the receiving
side. For Wireless VoIP, speech quality can also be affected by the error-correction
mechanism used by codecs.
1.2.2 Packet error concealment techniques
Packet error due to packet loss or bit error has been a critical impairment factor for
the perceived speech quality of Wireless VoIP. Many packet error concealment
techniques have been developed and improved with great effort. But these techniques
are far from perfect and may not even work properly in new communication
environments such as the growing wireless/mobile Internet. Some of the main packet
error recovery methods are described hereafter:
Forward Error Correction
Forward Error Correction (FEC) [11] enables lost data to be recovered at the
receiver without further reference to the sender. Both the original data and the
redundant information are transmitted to the receiver. There are two kinds of
redundant information: that which is independent of the media stream and that which
depends on it. Media-independent FEC does not need to know the original data type. In
media-independent FEC, the original data together with some redundant data are
transmitted to the receiver. In media-dependent (media-specific) FEC, if an original data
packet is lost, redundant data packets, which are related to the specific media, are used
to recover the loss. Usually, the redundant packet is produced using a lower-bandwidth
encoding method than the primary encoding, which results in lower quality
than the original one. The costs of using FEC are reduced bandwidth efficiency
and increased end-to-end delay, because the redundant information is transmitted after
the packet it protects.
Interleaving
Interleaving has been widely used in mobile networks to distribute burst frame
errors over several channels. In VoIP applications, if the size of a data unit produced at a
time by a coder is smaller than the allowed payload size of a packet, then a few data
units may be combined into a single packet. However, in order to reduce the effects of packet
loss, or of burst bit errors in the wireless environment, the original data units
are not combined in the same sequential order as produced by the coder; instead they
are interleaved by the transmitter. The resulting small gaps correspond
typically to speech intervals considerably shorter than a phoneme. Therefore,
humans are able to mentally interpolate the gaps, and speech intelligibility is
not decreased.
UDP Lite
UDP Lite [15] is designed for applications that prefer to have damaged
data delivered rather than discarded by the network. For VoIP over wireless, it is not
necessary to discard speech frames that contain only a few bit errors. In the IP layer, the
IP header has no checksum covering the IP payload. However, the UDP checksum covers
the entire datagram including media payload. In fact, in real network applications, it is
the application layer, not the transport layer, that knows best what should be verified by
the checksum. UDP Lite provides a checksum with optional partial coverage.
Automatic Retransmission reQuest
In Automatic Retransmission reQuest (ARQ) [16], when the receiver cannot correctly
receive a packet, the sender retransmits it, possibly several times. ARQ-based schemes
mainly consist of three parts: a. lost data detection by the receiver or by the sender
(timeout); b. acknowledgment strategy: the receiver sends acknowledgments that
indicate which data are received or which data are missing; c. retransmission strategy:
it determines which data are retransmitted by the sender. Although it is robust and
efficient against burst losses, ARQ also brings a series of problems to real-time
applications with delay constraints.
1.2.3 Cross-layer designs
IP networks have been successfully supported by the layered protocol
architecture since their early development stage. However, for real-time
applications such as Wireless VoIP, the layered architecture may prevent them from
adapting readily to instantaneous changes in the communication environment and
consequently can seriously impact their performance. Examples of system
performance degradation due to lack of co-operation among different layers have
been given in [18]. Corresponding solutions for the problems introduced by the
layered protocol architecture have been developed and named the cross-layer approach
or cross-layer design. The objective of cross-layer designs is to achieve efficient QoS
support and network resource allocation by joint-layer techniques, such as QoS
knowledge sharing and QoS mechanism cooperation among different layers (see
Figure 1-3). The system performance of future networks may be enhanced by such
cross-layer designs between PHY, MAC and higher layer protocols.
[Figure 1-3 diagram labels: QoS information mapping and Joint-Layer QoS techniques]
Figure 1-3 the Basic model of cross-layer designs
Cross-layer designs have been addressed in much recent literature.
Krishnamachari et al. [19] proposed a cross-layer framework to enhance the
performance of video streaming. This framework can adaptively optimize link layer
ARQ, application layer FEC and packetization according to wireless channel
conditions. In [20], a cross-layer design was developed to control transmissions of
video streams over wireless links based on information about prefetched video (application
layer), signal strength and multiple access interference (physical layer).
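To give a feel for what adapting link layer ARQ to channel conditions can mean, the sketch below picks the smallest retry limit that meets a target residual loss rate for a given packet error probability. It is not the algorithm of [19] or [20]; it assumes independent packet errors on each attempt, and the function names and the default cap of 7 retries are hypothetical.

```python
def residual_loss(p_error, retries):
    """Probability that a packet is still lost after 1 + retries attempts."""
    return p_error ** (retries + 1)


def expected_attempts(p_error, retries):
    """Mean number of transmissions per packet with a capped retry count."""
    if p_error >= 1.0:
        return float(retries + 1)
    return (1.0 - p_error ** (retries + 1)) / (1.0 - p_error)


def min_retries_for_target(p_error, target_loss, max_retries=7):
    """Smallest retry limit meeting target_loss, or None if the cap is too low."""
    if not 0.0 <= p_error <= 1.0:
        raise ValueError("p_error must be a probability")
    for retries in range(max_retries + 1):
        if residual_loss(p_error, retries) <= target_loss:
            return retries
    return None


if __name__ == "__main__":
    # Hypothetical channel states: packet error probability per attempt.
    for p in (0.01, 0.2, 0.5):
        n = min_retries_for_target(p, target_loss=0.01)
        print(p, n, expected_attempts(p, n) if n is not None else None)
```

A worse channel needs a higher retry limit and more transmissions per packet, and each extra attempt adds delay, so a loss target alone is not a sufficient criterion for a delay-sensitive service such as VoIP.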
1.2.4 Problem statement
In this dissertation, we raise the following research questions regarding the
improvement of perceived speech quality for Wireless VoIP by a cross-layer approach.
What are the impairment factors of Wireless VoIP applications?
What are the pros and cons of ARQ mechanisms? Is the performance of a Wireless
VoIP system improved by ARQ mechanisms in terms of perceived speech quality?
How can current ARQ schemes be optimized to improve speech quality? And how can
real-time network and wireless channel QoS parameters be mapped into ARQ
protocol optimization?
What are the effects of the interactions between ARQ mechanisms and other
components of the Wireless VoIP system? How can these effects be handled if they
are negative?
How can other packet error concealment technologies be used together with ARQ? Or
how can ARQ be used as a complementary mechanism for other packet error concealment
technologies?
How can a cross-layer framework be established in which the QoS
techniques located in different layers are optimized through a joint-layer analysis? And how can
a profile of real-time predicted speech quality and QoS parameters
collected from different layers be established, and eventually serve as the
scheduler of a cross-layer framework?
Bearing these questions in mind, we have reviewed a large body of related literature and
carried out research toward the corresponding solutions.
1.3 Aims and Objectives
The aim of this project is to develop and evaluate a cross-layer framework to
improve perceived speech quality for Wireless VoIP systems. This framework is
expected to utilize QoS parameters from multiple layers and to optimize QoS techniques
located in different layers based on a joint-layer analysis, and consequently to achieve
efficient and significant speech quality improvement, which may be very hard or even
impossible for single-layer approaches.
1.4 Thesis Contributions
The contributions of this dissertation are listed hereafter:
We identify the impairment factors for perceived speech quality of Wireless VoIP
and specifically focus on the impact of ARQ mechanisms. We use an objective
measure of perceived conversational speech quality (MOSc) as a metric to
evaluate the performance of three current retransmission schemes, namely no
retransmission, Speech Property-Based (SPB) [21] retransmission and full
retransmission, while considering the impact of retransmission jitter. Our
findings indicate that the performance of the retransmission mechanisms is a
function of both wireless link quality and delay introduced in the wireline
network. In addition, SPB retransmission, which is supposed to protect only
perceptually important speech frames, may not achieve the expected performance
because it introduces too much jitter.
We propose a new perceived speech quality driven retransmission mechanism [22]
which may be used to improve speech quality for wireless VoIP (in terms of the
objective mean opinion score) by switching between no retransmission and full
retransmission according to different communication conditions. Through
simulations, we show that the proposed method can achieve an optimum MOSc
compared to no retransmission, full retransmission and SPB retransmission, and that it
can also achieve retransmission efficiency similar to that of SPB retransmission
while avoiding the implementation complexity of obtaining the speech property information
that is necessary for SPB retransmission.
We propose a cross-layer design in which 1) the retransmission procedure of the link
layer Automatic Repeat on reQuest (ARQ) protocol is constrained by the available
delay budget estimated by the application level playout buffer; 2) if the
retransmission procedure is terminated prematurely, the received noisy copies of a
speech packet are presented to the application layer and finally played out; 3) in the
playout delay estimation, delivery delay in the wireless channel is estimated
separately and constrained to avoid delay accumulation in the transmitting queue.
The simulation results show that the perceptual speech quality of a wireless VoIP
system can be significantly enhanced, since retransmission delay, playout buffer
losses, queuing delay and losses are reduced by this design.
1.5 Organization of the Thesis
The rest of this dissertation is organized as follows. Chapter 2 provides an
introduction to some basic theories related to this project, such as speech quality
evaluation, adaptive playout buffers and the Automatic Repeat reQuest (ARQ)
protocol. In Chapter 3, we look at the impairment factors introduced by ARQ schemes,
and introduce a perceived speech quality driven retransmission scheme to achieve
optimum conversational speech quality. In Chapter 4, we consider problems
introduced by an ARQ protocol when it works with other components of a Wireless
VoIP system (e.g. transmitting queue, adaptive playout buffer) in the layered protocol
architecture, and propose a cross-layer design as a solution for the presented problems.
Finally, in Chapter 5 we discuss the research outcome of this project, present
extensions and ideas for future work, and conclude the thesis.
CHAPTER 2
BACKGROUND THEORIES
2.1 Speech Quality Evaluations
2.1.1 Objective Speech Quality Measurement
In voice communications, the mean opinion score (MOS) provides a numerical
measure of the quality of human speech at the receiving end. MOS indicates the
speech quality perceived by the listener and can range from 1 (bad) to 5 (excellent) as
presented in Table 2-1. A number of methods are available to
measure the speech quality of a VoIP system. Basically, speech quality measurements can
be divided into two categories: subjective measurements and objective measurements.
Subjective speech quality measurement requires a large group of people to
take part in the test. It is time consuming, unrepeatable and expensive. Compared with
subjective tests, objective tests are repeatable, automatic and do not suffer from
environmental effects.
The most popular objective measurements are Perceptual Evaluation of
Speech Quality (PESQ) [23] and the E-model [24]. PESQ is categorized as an
intrusive speech quality measurement, as it requires the original speech signal together with the
degraded one to perform the quality evaluation. The E-model, in contrast, is categorized as a
non-intrusive speech quality measurement, as it is parameter-based and does
not require the original speech signal.
Quality Scale   Score   Listening Effort Scale
Excellent       5       No effort required
Good            4       No appreciable effort required
Fair            3       Moderate effort required
Poor            2       Considerable effort required
Bad             1       No meaning understood with reasonable effort
Table 2-1 MOS scale
2.1.2 PESQ
PESQ was specifically developed to be applicable to end-to-end voice quality
testing under real network conditions. The result of comparing the reference and
degraded signals is a quality score. The simplified system model of PESQ is given in
Figure 2-1. It consists of three key modules: a time alignment module, a perceptual
transform module and a cognition/judgment module. The time alignment module
synchronizes the degraded signal with the reference signal. The perceptual transform
module transforms the signal into a psychophysical representation that approximates
human perception. The cognition/judgment module maps the difference between the
original (reference) signal and the distorted (degraded) signal into an estimated perceptual
distortion, which is then further mapped onto the Mean Opinion Score (MOS) scale.
[Figure 2-1 Basic Structure of Perceptual Evaluation of Speech Quality. Blocks: Original Speech, Distorted Speech, Time Alignment Module, Perceptual Transform Module (one per signal), Cognition/Judgment Module, Estimated Distortion]
The evaluated results given by PESQ have been calibrated using a large database
of subjective tests. PESQ takes into account signal degradation such as coding
distortions, errors, packet losses, delay and variable delay, and filtering with transfer
function equalization, time alignment, and a new algorithm for averaging distortions
over time. However, PESQ does not take into account the subjective effect of level
changes in the network, echo, and the effect of round-trip delay on conversation.
2.1.3 E-Model
The E-Model is a computational model, standardized by ITU-T in [24][27][28]. It
uses transmission parameters to predict the subjective speech quality of packetized
voice. The E-Model has proven to be useful as a transmission-planning tool for assessing
the combined effects of variations in several transmission parameters that affect the
conversational quality of telephony [24]. The primary output from the E-Model is the
"Rating Factor" R, and R can be further transformed to give estimates of customer
opinion by mapping it to the MOS scale.
The E-Model equation for the "Rating Factor" is
$$R = R_0 - I_s - I_d - I_e + A$$
This equation results in an R factor between 0 and 100. The components of R are:
R0, the base R value (noise level); Is, representing the effects of impairments
occurring simultaneously with the speech signal; Id, representing the effects of
impairments caused by delay; Ie, representing the effects of
"equipment" such as DCME or Voice over IP networks; A, the advantage factor, used
to compensate for the allowance users make for poor quality when given some
additional convenience (e.g. 0 for wireline and 10 for GSM).
Delay impairment Id
The Id factor models the quality degradation due to one-way or “mouth-to-ear”
delay. Id can be computed from the one-way delay as [29]:
$$I_d = 0.024\,T_a + 0.11\,(T_a - 177.3)\,H(T_a - 177.3)$$
where
$$H(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 & \text{if } x \ge 0 \end{cases}$$
and Ta represents the one-way (or "mouth-to-ear") delay in milliseconds.
Equipment impairment Ie
The equipment impairment Ie captures the distortion of the original voice signal due to
low-rate codecs and to packet losses in both the network and the playout buffer.
Currently, the E-Model can only cope with speech distortion introduced by a limited
set of codecs, e.g. G.729 or G.723.
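Mapping R factor into MOS scale
We can map R into the MOS scale by the following equations [24]:
$$\mathrm{MOS} = \begin{cases} 1 & \text{if } R \le 0 \\ 1 + 0.035R + R(R-60)(100-R) \times 7 \times 10^{-6} & \text{if } 0 < R < 100 \\ 4.5 & \text{if } R \ge 100 \end{cases}$$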
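As a minimal sketch of the delay impairment and the R-to-MOS mapping given in this section, the following Python functions evaluate both formulas directly. The function names are illustrative only and are not part of the E-Model standard or of the tools used in this work.

```python
def delay_impairment(ta_ms: float) -> float:
    """Delay impairment Id for a one-way delay Ta given in milliseconds."""
    # H(x) is the step function: 0 for x < 0, 1 for x >= 0.
    step = 1.0 if ta_ms - 177.3 >= 0 else 0.0
    return 0.024 * ta_ms + 0.11 * (ta_ms - 177.3) * step


def r_to_mos(r: float) -> float:
    """Map an R factor onto the MOS scale."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


if __name__ == "__main__":
    for ta in (50, 150, 250):
        print(f"Ta = {ta} ms -> Id = {delay_impairment(ta):.2f}")
    for r in (50, 70, 90):
        print(f"R = {r} -> MOS = {r_to_mos(r):.2f}")
```

Below 177.3 ms the delay impairment grows slowly (0.024 per millisecond); above it the second term adds a further 0.11 per millisecond, which is why long one-way delays are penalized much more heavily.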
2.1.4 Conversational speech quality evaluation
[Figure 2-2 Schematic diagram for MOSc measurement. Blocks: Reference speech, Encoder, Loss process (driven by loss trace data), Decoder, Degraded speech, PESQ (MOS), MOS->R conversion (Ie), Delay model (driven by delay trace data, giving Id), E-Model concepts, MOSc]
Perceived speech quality during a VoIP conversation can be expressed as a
conversational Mean Opinion Score (MOSc). MOSc values can be obtained by
subjective listening tests or by objective evaluation methods, such as the E-Model. As
described in Section 2.1.3, the E-Model consists of very complicated equations and is
not applicable to some impairment factors, such as some codecs or bit errors in the
payload. A prediction method for perceived conversational speech quality has been
proposed in [29]. The schematic diagram of this method is illustrated in Figure 2-2.
In this method, the MOS index produced by PESQ is first transformed to the R scale
by
$$R_{pesq} = 3.026x^3 - 25.314x^2 + 87.060x - 57.336$$
where x represents the MOS index from PESQ.
The equipment impairment factor Ie can then be computed as Ie = R0 - Rpesq. Together with the delay
impairment factor Id, we obtain the R value as R = R0 - Id - Ie, and finally get MOSc from
R according to the standard E-Model equations. Hence, the impairments of delay,
packet loss, coding and bit error can all be represented in the evaluated value of
MOSc.
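The combination of PESQ and E-Model concepts described in this section can be sketched in Python as follows. This is an illustrative sketch, not the implementation used in this work: the function names are hypothetical, and the default R0 = 93.2 is an assumed value because the text does not state the one used. Since R = R0 - Id - (R0 - Rpesq) simplifies to Rpesq - Id, the choice of R0 does not affect the result.

```python
R0_ASSUMED = 93.2  # assumed base R value; it cancels out of the final result


def pesq_mos_to_r(x: float) -> float:
    """Transform a PESQ MOS index x to the R scale (cubic fit from the text)."""
    return 3.026 * x**3 - 25.314 * x**2 + 87.060 * x - 57.336


def delay_impairment(ta_ms: float) -> float:
    """Delay impairment Id for a one-way delay in milliseconds."""
    step = 1.0 if ta_ms - 177.3 >= 0 else 0.0
    return 0.024 * ta_ms + 0.11 * (ta_ms - 177.3) * step


def r_to_mos(r: float) -> float:
    """Standard E-Model mapping from R to MOS."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


def conversational_mos(pesq_mos: float, one_way_delay_ms: float,
                       r0: float = R0_ASSUMED) -> float:
    """MOSc from a PESQ score (loss/codec effects) and the end-to-end delay."""
    ie = r0 - pesq_mos_to_r(pesq_mos)        # equipment impairment
    i_d = delay_impairment(one_way_delay_ms)  # delay impairment
    return r_to_mos(r0 - i_d - ie)


if __name__ == "__main__":
    for delay in (100, 200, 300):
        print(delay, "ms ->", round(conversational_mos(3.5, delay), 2))
```

For a fixed PESQ score, the resulting MOSc falls as the one-way delay grows, which is the behaviour that a listening-only measure cannot capture.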
2.2 Adaptive Playout Buffer
A playout buffer can be fixed or adaptive. In a fixed playout buffer, the playout
delay for a packet stream is preset before a conversation begins. A fixed playout
buffer therefore cannot readily adapt to time-varying network conditions and may result in
poor speech quality. For this reason, adaptive playout buffers are considered. A lot of
work has been done in developing adaptive playout buffer algorithms to achieve the
best balance between playout delay and packet losses in the playout buffer. Recent work
addressing the problem specifically for the Internet can be found in
[30][31][32][33]. In this section, we briefly review some playout buffer algorithms
from this literature. The details of the application of adaptive playout buffers in our
Wireless VoIP system can be found in Chapters 3 and 4.
[Figure 2-3 Timing associated with packet i. Labels: sender, receiver, t_i, a_i, p_i, n_i, b_i, d_i]
In [30], Ramjee et al. proposed four algorithms ('exp-avg', 'fast-exp', 'min-delay'
and 'spk-delay') to adjust the playout delay according to the estimated network delay
performance. These algorithms estimate the mean $\hat{d}_i$ and variation $\hat{v}_i$ of the network delay
on the arrival of the ith packet. The playout delay is adjusted at the beginning of
each talkspurt. Let $t_i$ be the timestamp of packet i, which is the first packet in a
talkspurt; the playout time $p_i$ is computed as
$$p_i = t_i + \hat{d}_i + \mu \cdot \hat{v}_i$$
where µ is a constant. The playout time $p_j$ for the subsequent packets j in the same
talkspurt is computed as $p_j = p_i + t_j - t_i$ (see Figure 2-3 for the related timing
notations).
In these four algorithms $\hat{v}_i$ is given by
$$\hat{v}_i = \alpha \cdot \hat{v}_{i-1} + (1-\alpha) \cdot |\hat{d}_i - n_i|$$
but they differ in the computation of $\hat{d}_i$.
1) exponential-average (exp-avg): In this algorithm, the mean delay $\hat{d}_i$ is estimated
through an exponentially weighted average [30]:
$$\hat{d}_i = \alpha \cdot \hat{d}_{i-1} + (1-\alpha) \cdot n_i$$
where $n_i$ is the one-way delay of the ith packet. The value of α is chosen to be
0.998002 in [30].
2) fast exponential-average (fast-exp): This algorithm is a modified version of exp-avg.
fast-exp computes the weighted mean $\hat{d}_i$ as [30]:
$$\hat{d}_i = \begin{cases} \beta \hat{d}_{i-1} + (1-\beta)\, n_i & : n_i > \hat{d}_{i-1} \\ \alpha \hat{d}_{i-1} + (1-\alpha)\, n_i & : n_i \le \hat{d}_{i-1} \end{cases}$$
where α and β are constant values, satisfying 0 < β < α < 1. In [30] α = 0.998002
and β = 0.750000; this allows fast-exp to adapt more quickly to increases in the delays $n_i$.
3) minimum delay (min-delay): This algorithm is more aggressive in minimizing
delays. It uses the minimum delay of all packets received in the current talkspurt. Let
$S_i$ be this set of delays [30]:
$$\hat{d}_i = \min_{j \in S_i} \{ n_j \}$$
4) spike delay detection (spk-delay): This algorithm focuses on spikes, a spike being
a sudden and large increase in delay over a sequence of packets. spk-delay
usually obtains the playout delay using the same equation as exp-avg, except that α is set to
0.875 in [wan]. During a spike, however, spk-delay uses the following
$$\hat{d}_i = \hat{d}_{i-1} + n_i - n_{i-1}$$
to catch up with the sudden increase in delays.
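The update rules of the classical algorithms above can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the code of [30]: the function names are illustrative, the spike detection logic of spk-delay is omitted because its details are not given here, and MU = 4 is an assumed value for the constant µ (it is the value used later in the simulations of Chapter 3).

```python
ALPHA = 0.998002  # smoothing factor reported for [30]
BETA = 0.75       # smaller weight used by fast-exp while the delay increases
MU = 4.0          # assumed multiplier of the variation term


def exp_avg(d_prev: float, n_i: float, alpha: float = ALPHA) -> float:
    """exp-avg: exponentially weighted mean of the one-way delay."""
    return alpha * d_prev + (1.0 - alpha) * n_i


def fast_exp(d_prev: float, n_i: float,
             alpha: float = ALPHA, beta: float = BETA) -> float:
    """fast-exp: weight the old estimate by beta when the new delay is larger."""
    weight = beta if n_i > d_prev else alpha
    return weight * d_prev + (1.0 - weight) * n_i


def min_delay(talkspurt_delays) -> float:
    """min-delay: minimum delay observed in the current talkspurt."""
    return min(talkspurt_delays)


def update_variation(v_prev: float, d_i: float, n_i: float,
                     alpha: float = ALPHA) -> float:
    """Variation estimate shared by the four algorithms."""
    return alpha * v_prev + (1.0 - alpha) * abs(d_i - n_i)


def playout_time(t_i: float, d_i: float, v_i: float, mu: float = MU) -> float:
    """Playout time of the first packet of a talkspurt."""
    return t_i + d_i + mu * v_i


if __name__ == "__main__":
    d_slow = d_fast = 100.0
    for n in (100.0, 180.0, 180.0, 180.0):  # delay jumps from 100 ms to 180 ms
        d_slow, d_fast = exp_avg(d_slow, n), fast_exp(d_fast, n)
    print("exp-avg:", round(d_slow, 2), "fast-exp:", round(d_fast, 2))
```

Running the example shows the design difference between the two estimators: after a jump in delay, exp-avg barely moves, while fast-exp closes most of the gap within a few packets.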
We also present here some more complex algorithms, which have been developed
based on the four classical algorithms described above.
5) window: This algorithm is proposed in [31]. Like spk-delay, it is intended to detect spikes.
During a spike, the delay of the first packet in the spike is used as the playout delay. After
the spike, the playout delay is chosen by finding the delay corresponding to the qth
quantile of the distribution of the last N (10,000 in [31]) packets received by the
receiver.
6) adaptive: In [32], Sun et al. proposed an 'adaptive' algorithm to adapt to
different networks. The 'adaptive' algorithm switches between min-delay and fast-exp
depending on whether $\hat{d}_i$ is higher than a delay threshold (e.g. 150 ms) or not.
7) E-MOS: Fujimoto et al. [33] proposed a playout buffer algorithm called E-MOS.
The E-MOS algorithm models the delay distribution with the Pareto distribution. The
Pareto distribution of delay is integrated with the packet loss ratio in a function Q(d) to
model the impact of delay and packet loss on speech quality, which is represented by
MOS. When a packet is received, E-MOS uses the measured one-way delay to update
the Pareto distribution. Then, the value of d that maximizes the speech quality Q(d)
is chosen as the playout delay.
2.3 Automatic Repeat reQuest (ARQ)
Automatic Repeat reQuest (ARQ) is an error-control system in which a request for
re-transmission is generated by the receiver when an error in transmission is detected.
A very basic ARQ scheme includes only error detecting and retransmission
capabilities. If a packet is found to have errors after decoding, this packet is discarded
and a retransmission is requested from the source. The source then retransmits an exact
copy of that packet. This process may be repeated indefinitely, but normally an upper
bound on the number of retransmissions is set. If errors still persist after the maximum
number of allowed retransmissions is reached, the higher layer will have to decide how
the situation is to be handled. For the retransmission procedures using ARQ, the three
most popular schemes are [16]:
Stop and Wait (SW)
In SW-ARQ, the sender, after delivering the first copy of a packet in its buffer, is
blocked until a positive acknowledgement (ACK) is received or the timeout expires.
In the first case, the sender drops the successfully delivered packet from the buffer and transmits the next
packet, while in the second case, the sender simply retransmits the same packet.
Go Back N (GBN)
The sender continuously transmits packets stored in its buffer, until a Negative
ACK (NACK) is received. In this case, sender stops the transmission of a new packet,
pulls back to the packet erroneously received, and retransmits a complete sequence of
N packets, starting with NACKed packet, where N is the number of packets
transmitted within an average round trip time.
Selective Repeat (SR)
In this case sender continuously transmits packets stored in its buffer. Whenever a
NACK is received, sender stops the transmission of a new packet, pulls back to the
packet erroneously received, retransmits only it and begins the transmission of a new
packet. It is worth noticing that, in this case, the retransmission of successfully
received packet following the corrupted packet is avoided, thus allowing better
efficiency.
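To make the Stop and Wait behaviour concrete, the following Python sketch simulates SW-ARQ with an upper bound on the number of retransmissions. It is a simplified illustration under stated assumptions, not a model of any particular MAC layer: errors are independent with a fixed packet error probability, ACKs are never lost, a timeout is represented simply by the absence of an ACK, and the name `stop_and_wait` and its parameters are hypothetical.

```python
import random


def stop_and_wait(num_packets: int, per: float, retx_limit: int, seed: int = 0):
    """Simulate SW-ARQ over a channel with independent packet errors.

    Returns a tuple (delivered, dropped, transmissions).
    """
    rng = random.Random(seed)
    delivered = dropped = transmissions = 0
    for _ in range(num_packets):
        attempts = 0
        acked = False
        # One initial transmission plus at most retx_limit retransmissions;
        # the sender is blocked on this packet until it is ACKed or given up.
        while attempts <= retx_limit and not acked:
            attempts += 1
            acked = rng.random() >= per  # packet survives with probability 1 - per
        transmissions += attempts
        if acked:
            delivered += 1
        else:
            dropped += 1  # the limit is reached; the higher layer must cope
    return delivered, dropped, transmissions


if __name__ == "__main__":
    print(stop_and_wait(num_packets=10000, per=0.1, retx_limit=6))
```

Because the sender is blocked while a packet is being retried, every retransmission also delays all packets queued behind it, which is the source of the retransmission delay and jitter discussed in the following chapters.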
CHAPTER 3
PERCEIVED SPEECH QUALITY DRIVEN
RETRANSMISSION MECHANISM
3.1 Introduction
Quality of Service (QoS) support for voice over IP (VoIP) in wireless/mobile
networks is an important issue for technical and commercial reasons. However,
speech quality for VoIP suffers from high packet loss rates and other impairments in
the wireless link. Retransmission mechanisms, such as automatic repeat request
(ARQ), have been incorporated in wireless and cellular networks to retransmit lost
packets to improve performance in data transmission over wireless. In wireless
networks such as 802.11b [1], the retransmission mechanism is a simple Stop & Wait
algorithm and is implemented at the Medium Access Control (MAC) layer, in which each
transmitted packet must be acknowledged before the next packet can be sent. If in a
certain timeout period an acknowledgement is not received by the sender of a frame,
the sender will retransmit the frame until a maximal retransmission limit is reached.
When the wireless link quality is poor, retransmission of MAC frames can effectively
recover corrupted packets that contain bit errors.
However, excessive delays may be introduced by retransmission schemes that
have significant adverse effects on real-time applications such as VoIP, which are
sensitive to delay. A simple retransmission scheme can therefore negatively affect
perceived speech quality in VoIP. There exists a tradeoff between packet loss and
delay in a variety of retransmission schemes. Improved retransmission mechanisms
such as Speech Property-Based ARQ (SPB-ARQ) [21] and the Hybrid loss recovery
scheme [34] have been proposed to reduce speech distortions by protecting packets
that are perceptually more relevant. However, the effect of these schemes on
speech quality has only been assessed in terms of listening-only quality, which
does not consider the impact of delay that is important for
conversation and interactivity. Further, these schemes do not consider the impact of
retransmission jitters. Since adaptive jitter buffers would discard inappropriately
retransmitted packets, the character of the retransmission jitters introduced by different
retransmission schemes should be considered.
The primary aim of the study reported is to investigate new retransmission
mechanisms to improve speech quality for wireless VoIP. In this study, we use a
perceived conversational speech quality assessment method [29] to evaluate the
performance of current retransmission mechanisms (No retransmission, Full
retransmission, SPB retransmission) instead of listening-only method or individual
network parameters (e.g. packet loss and delay). We also present a new retransmission
policy, which can adapt to the most suitable retransmission mechanism, depending on
the wireless link quality and network delay conditions. The ultimate aim of this
perceived speech quality driven policy is to achieve optimum speech quality (in terms
of the conversational Mean Opinion Score MOSc) in the face of network impairment
factors and wireless channel situations, while considering the coupling effect of
retransmission jitters and adaptive jitter buffers.
3.2 Related Works
3.2.1 Speech property-based retransmission mechanisms
Speech Property-Based QoS control schemes are based on the fact that some
voice frames are perceptually more important than others when encoded speech is
transferred through packet networks. Recent experimental results [35] show that in
some popular codecs used in wireless applications (e.g. AMR) the position of a frame
loss has a significant influence on the perceived speech quality. In such codecs, frame
loss concealment techniques are used to interpolate the parameters for the lost frames
from the parameters of the previous frames. Lost voice frames at the beginning of a
talkspurt will be concealed using the decoding information of previous unvoiced
frames. However, because voiced sounds always have a higher energy than unvoiced
sounds, concealment of these frames with unvoiced frames that have lower energy
will cause a serious degradation in speech quality. Moreover, at the unvoiced/voiced
transition stage, it is difficult for the decoder to correctly conceal the loss of voiced
frames using the filter coefficients and the excitation for an unvoiced sound,
especially when burst loss occurs or the frame size grows.
To maximize the perceptual quality at the receiving end, perceptually important
voice packets may be protected by giving them a high priority, with the unimportant
packets handled as 'best-effort'. SPB retransmission, a retransmission scheme that
protects only the perceptually important speech frames, is presented in [21] [34].
Experimental results reported in [21] show that SPB retransmission can provide
better speech quality (assessed by EMBSD) than the No retransmission scheme, which does
not retransmit any packet. In [34], SPB retransmission was shown to be more efficient
in reducing retransmission delays than Full retransmission, which retransmits every
unacknowledged (unACKed) packet.
3.2.2 Measuring conversational speech quality
In previous studies [21][34], the assessment of retransmission schemes was
performed using the EMBSD algorithm, which only considers the distortion caused
by packet loss. However, in practice both packet loss and delay are crucial in voice
conversation and long retransmission delays (e.g. due to long network delay) would
seriously impact speech quality. The E-model was introduced by the ITU as a non-intrusive
quality assessment method to obtain a measure of voice quality. Unfortunately, the
E-model is only applicable to a limited number of codecs, which at present does not
include the AMR codec. In our simulation, we employed the conversational MOS [29]
to quantify the performance of different retransmission schemes. In the conversational
speech quality evaluation (see Chapter 2), the ITU PESQ is first used to quantify the
impact of packet loss on speech quality. The result of this is then converted to the
equipment impairment Ie. The average end-to-end delay effect, Id, is then calculated.
The E-model is then used to obtain a measure of the speech quality, MOSc, based on
Ie and Id (see Figure 3-1).
3.2.3 Adaptive jitter buffer and retransmission jitters
In VoIP applications, jitters are compensated for in the receiver by a jitter buffer.
The size of a jitter buffer can be fixed or adjustable. Fixed jitter buffers cannot adapt
readily to changes in network delays and as a result are not practical in real VoIP
applications. In our study, we investigated fast-exp, one of the classical adaptive jitter
buffer algorithms proposed in [30]. By using a smaller weighting factor as delays
increase, the fast-exp algorithm can quickly adapt to the increases while avoiding
discarding of too many packets. It estimates the current mean network delay (denoted
as $\hat{d}_i$) and the current variance of network delay (denoted as $\hat{v}_i$) when a packet arrives.
The mean delay estimation equation is given by:
$$\hat{d}_i = \begin{cases} \beta \hat{d}_{i-1} + (1-\beta)\, n_i & : n_i > \hat{d}_{i-1} \\ \alpha \hat{d}_{i-1} + (1-\alpha)\, n_i & : n_i \le \hat{d}_{i-1} \end{cases}$$
where $n_i$ is the network delay of the ith packet, β = 0.75 and α = 0.998002. The
following equation is used to estimate $\hat{v}_i$:
$$\hat{v}_i = \alpha \hat{v}_{i-1} + (1-\alpha)\,|\hat{d}_i - n_i|$$
At the beginning of
a talkspurt, the adaptive jitter buffer changes the playout delay using the
equation $D = \hat{d}_i + \mu \cdot \hat{v}_i$, where D is the playout delay and µ is a constant that
can be selected from 1 to 20. We set µ to be 4 in our simulation. It should be noted
that for VoIP over wireless, the network delay consists of delays introduced by the
wireline network and the wireless link. Jitters can be introduced by network
congestion in the wireline network or by retransmissions/propagation in the wireless
links. In view of the fact that most jitter buffer algorithms were proposed for
compensation of network congestion jitters, it should be valuable to investigate the
impact of retransmission jitters for VoIP over wireless.
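The interaction between a fast-exp jitter buffer and an occasional retransmission can be illustrated with the following Python sketch. It is a simplified illustration under several assumptions, not the simulation code of this study: the function name and input format are hypothetical, the initial estimates are taken from the first packet, and a packet is treated as discarded whenever its network delay exceeds the playout delay of its talkspurt.

```python
ALPHA = 0.998002  # slow smoothing factor
BETA = 0.75       # fast factor, used when the delay increases
MU = 4.0          # multiplier of the delay variation


def count_late_packets(packets, alpha=ALPHA, beta=BETA, mu=MU):
    """Count packets that miss their playout deadline.

    `packets` is a list of (network_delay_ms, starts_talkspurt) tuples.
    """
    d_hat, v_hat = packets[0][0], 0.0  # assumed initial estimates
    playout_delay = d_hat
    late = 0
    for n_i, starts_talkspurt in packets:
        # fast-exp update of the mean, then update of the variation estimate.
        # d + (1 - w) * (n - d) is algebraically w * d + (1 - w) * n.
        w = beta if n_i > d_hat else alpha
        d_hat = d_hat + (1.0 - w) * (n_i - d_hat)
        v_hat = alpha * v_hat + (1.0 - alpha) * abs(d_hat - n_i)
        if starts_talkspurt:
            # The playout delay D is only changed at the start of a talkspurt.
            playout_delay = d_hat + mu * v_hat
        if n_i > playout_delay:
            late += 1  # arrives after its playout time and is discarded
    return late


if __name__ == "__main__":
    # One talkspurt on a good link: 20 ms delays and a single packet that
    # needed retransmissions and arrives after 60 ms.
    trace = [(20.0, True)] + [(20.0, False)] * 9 + [(60.0, False)] + [(20.0, False)] * 9
    print(count_late_packets(trace))
```

With a steady low delay the playout delay stays low, so the single delayed packet in the example trace misses its deadline even though the link delivered it successfully.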
3.3 Simulation System Description
Our study is based on the network simulator ns-2 [36], in which we simulated a last-hop
wireless scenario. Both the IEEE 802.11 and the Ethernet protocol stacks are
implemented in the simulator. A two-way Bernoulli error model was inserted to
simulate the wireless link transmission errors. In 802.11, if the packet size exceeds
the Maximum Transmission Unit (e.g. 1500 bytes for WaveLan) the packet will be
fragmented. Since we set the packet size to 71 bytes, carrying one 12.2 kbit/s AMR speech
frame per RTP packet, the impact of fragmentation is avoided.
The simulation system is given in Figure 3-1. In our simulation, the original
speech file is first encoded by the AMR codec and then analyzed to extract the speech
marking information (voiced/unvoiced) for each packet. The speech marking
information is used with network delay and wireless link quality to control the
retransmission policy. The error model determines whether a packet is corrupted or
not according to the packet error probability (PER). The base station (BS) will neither
send an ACK to the sender for a corrupted packet nor present it to the higher layer. If the
MAC layer of the sender has not received an acknowledgement for a packet, it will
retransmit the packet until the packet is ACKed or it reaches the limit of
retransmission attempts (we will denote Retransmission as Retx in the rest of this
Chapter). In our simulation, we set the Retx attempts limit to 6 for both SPB Retx and
Full Retx. In the receiver, the received speech packets are fed to an adaptive jitter
buffer and subsequently decoded to recover the degraded speech file that is used to
obtain a measure of speech quality.
[Figure 3-1 Simulation Environment. Mobile Host: Original Speech, AMR Encoder, Speech Marking, Retx. Limit Control, RTP/UDP/IP/MAC/PHY. Access Point. Fixed Host: RTP/UDP/IP/Ethernet, Adaptive Playout Buffer, AMR Decoder, Degraded Speech. Inputs: Network Delay, PER. Speech Quality Evaluation: PESQ (MOS/Ie), End-to-end Delay (Id), E-Model, MOSc]
In our study, we used the combined PESQ and E-Model method to evaluate the
conversational speech quality as described in Chapter 2. The performance index was
obtained by averaging the results computed with this method for
each 20 seconds of the speech file.
The following simulation results were obtained by averaging the results of 50
simulations with different random seeds to avoid the impact of packet loss locations.
The three simulated retransmission schemes are SPB Retx, Full Retx and No Retx.
Table 3-1 gives the average number of voiced packet losses when transmitting
73000 speech packets in our simulated wireless network with these schemes. For
simplicity, we only simulated the wireless link for the purpose of this study, so only
the wireless link (Retx limit exceeded) and the adaptive jitter buffer account for the
packet losses. In Table 3-1, most of the losses of voiced packets in Full Retx or SPB
Retx are caused by the jitter buffer. As we deployed a Bernoulli error model in our
simulation, most of the retransmitted packets can be successfully received by the
receiver. If the burstiness of packet errors is considered, there should be more losses of
voiced packets in the Full Retx or SPB Retx scheme.
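To see why so few packets exhaust the retransmission limit under a Bernoulli error model, consider the following worked estimate. It assumes independent errors with probability p per transmission, ignores ACK losses, and assumes that the limit L = 6 counts retransmissions after the first attempt (the text does not say whether the first transmission is included):

$$P_{\text{drop}} = p^{\,L+1}, \qquad E[\text{transmissions}] = \sum_{k=0}^{L} p^{k} = \frac{1 - p^{\,L+1}}{1 - p}$$

For p = 0.1 and L = 6 this gives $P_{\text{drop}} = 10^{-7}$ and about 1.11 transmissions per packet on average. Over 73000 packets the expected number of packets dropped at the link is therefore well below one, which is consistent with the observation that the remaining losses under Full Retx and SPB Retx come mainly from the jitter buffer.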
Table 3-1 Average voiced packet losses with fast-exp playout buffer
PER       No Retx   SPB Retx   Full Retx
0.0001    15        53         29
0.0005    36        54         27
0.0008    61        51         26
0.001     69        47         22
0.003     144       28         17
0.005     241       22         13
0.01      474       13         9
0.05      2344      42         16
0.10      4678      931        159
It seems very straightforward that SPB Retx should be better than No Retx and at
least the same as Full Retx with regard to the performance of protecting voiced frames.
However, in Table 3-1, we can see that Full Retx always has fewer voiced packet
losses than SPB Retx, while No Retx has the fewest lost voiced packets when link quality is good
(packet error probability lower than 0.0005). In fact, in the fast-exp algorithm, the
estimated playout delay increases as the number of retransmission jitters
increases. When link quality is good, the estimated playout delay stays at a low level, so
occasionally retransmitted packets and packets adjacent to them are discarded
by the jitter buffer due to the jitters they introduce. In the No Retx scheme, however, a
corrupted packet does not affect the packets that follow it. That is why it has the fewest packet
losses when link quality is very good. On the other hand, in SPB Retx, unvoiced
packets are not retransmitted, hence the estimated playout delay cannot reflect the current
wireless link situation when link quality becomes worse. In Full Retx, by contrast, every
unACKed packet is retransmitted, which helps the adaptive jitter buffer to
estimate the playout delay for the next talkspurt. That is why the adaptive jitter buffer
discards more packets in SPB Retx than in Full Retx.
3.4 Performance Comparison of Current Retransmission Schemes
Figure 3-2 and Figure 3-3 give the overall packet loss rate and buffered
retransmission delay comparisons. In Figure 3-2, we can see that Full Retx keeps the
packet loss rate at a low level at the expense of a higher delay, as plotted in Figure 3-3,
because every unACKed packet is retransmitted. Interestingly, when link
quality is not too bad (packet error probability up to 0.01), the packet loss rate of the Full
Retx scheme decreases as link quality becomes worse. In fact, as we
mentioned before, with worse link quality, more retransmissions help the jitter buffer to
estimate the playout delay more accurately. However, when link quality is very good
(packet error probability up to 0.0005), No Retx obtains the lowest packet loss rate
because it does not introduce any jitter and few packets are corrupted due to bit errors.
As a compromise, the packet loss rate and Retx delay of SPB Retx lie
between those of No Retx and Full Retx.
[Figure 3-2 Overall packet loss rate comparison: Loss Rate (%) versus Packet Error Probability for No Retx, SPB Retx and Full Retx]
[Figure 3-3 Buffered retx delay comparison: Buffered Retx Delay (ms) versus Packet Error Probability for No Retx, SPB Retx and Full Retx]
Using the evaluation method described in Chapter 2, we give a more
straightforward performance comparison in Figure 3-4 and Figure 3-5 for these schemes
with MOSc as the metric. Our evaluation did not consider the packet losses introduced
in the wireline network, in order to focus on the performance of the Retx schemes. However,
we considered network delay in the evaluation. For natural hearing, delays lower than
100 ms cannot really be appreciated, but delays above 150 ms can obviously affect
conversation interactivity [37]. Considering that Retx delays rarely exceed 100 ms, to
clearly reflect the impact of Retx delay, we assume that a 175 ms delay has been
introduced in the wireline network and add it to the end-to-end delay in the MOSc
evaluation. In Figure 3-4, the MOSc of Full Retx is lower than that of No Retx and SPB Retx
when packet error probability is lower than 0.003. That is because the Full Retx scheme
always introduces more Retx delay, while the perceived speech quality is sensitive to
high delay when link quality is good. When packet error probability exceeds 0.003,
the Full Retx scheme becomes the best, as it can greatly reduce the number of corrupted
packets. Figure 3-5 illustrates the performance comparison with different network
delays when packet error probability is 0.001. In Figure 3-5, we can see that when
the delay is lower than 150 ms, Full Retx gets the best MOSc. When the delay is higher than
150 ms, No Retx becomes the best, which confirms that 150 ms is the threshold above
which delay begins to have a severe impact on speech quality. Similar to Figure 3-4, the
performance of SPB Retx is between that of No Retx and Full Retx, but it does not become the best
on either side of the delay threshold.
[Figure 3-4 MOSc comparison with 175ms network delay: MOSc versus Packet Error Probability for Perceived Quality Driven, No Retx, SPB Retx and Full Retx]
[Figure 3-5 MOSc comparison with packet error probability 0.001: MOSc versus Network Delay for Perceived Quality Driven, No Retx, SPB Retx and Full Retx]
3.5 Perceived Speech Quality Driven Retransmission Scheme
Since No Retx and Full Retx each achieve the best MOSc under different link
quality and network delay conditions, we propose a new perceived speech quality
driven retransmission scheme, which switches between these two schemes when the
link quality and network delay change. The pseudo code of the new scheme is shown
in Figure 3-6. Low_Error_Threshold is set to 0.0005 and High_Error_Threshold to
0.003. According to the simulation results, when the packet error probability is
lower than 0.0005, No Retx achieves the best MOSc even if delay is not considered,
whereas Full Retx becomes the best when the packet error probability exceeds 0.003,
even if the network delay is very high. When the packet error probability is
between 0.0005 and 0.003, the decision should be made according to the network delay.
In the proposed scheme, Delay_Threshold is set to 150ms, as this is the threshold
above which delay begins to noticeably affect speech quality. In real applications,
we can convert the Bit Error Rate (BER) to the packet error rate (PER), and the BER
can be obtained from bit errors in the bit pattern series sent from the BS. Network
delay can be estimated by deducting the average MH to BS handoff delay from the
average end-to-end delay, which can be retrieved from the RTP packet header.
The performance of the new perceived speech quality driven scheme is also given in
Figure 3-4 and Figure 3-5 under different packet error probabilities and network
delays. We can see that the curve of the perceived quality driven scheme overlaps
the parts of the No Retx and Full Retx curves where they achieve the best MOSc,
because it switches to the more suitable of the two schemes when communication
conditions change. Since this method only uses Full Retx when it is necessary, it can
also achieve retransmission efficiency similar to SPB Retx while avoiding the
implementation complexity of obtaining the speech property information that SPB
Retx requires.
Figure 3-6 Perceived speech quality driven Retx scheme pseudo code
if (PER < Low_Error_Threshold)
    No_Retx();
else if (PER > High_Error_Threshold)
    Full_Retx();
else {
    if (Network_Delay < Delay_Threshold)
        Full_Retx();
    else
        No_Retx();
}
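The pseudo code in Figure 3-6 can be sketched in Python as follows. This is a minimal illustration rather than the simulation code used in this study: the function name select_retx_scheme and the string labels it returns are assumptions made for the example, while the three thresholds are the values given in Section 3.5.

```python
# Thresholds from Section 3.5; the names mirror the pseudo code in Figure 3-6.
LOW_ERROR_THRESHOLD = 0.0005   # packet error probability
HIGH_ERROR_THRESHOLD = 0.003   # packet error probability
DELAY_THRESHOLD_MS = 150.0     # network delay in milliseconds


def select_retx_scheme(per: float, network_delay_ms: float) -> str:
    """Return "no_retx" or "full_retx" (hypothetical labels) for the current conditions."""
    if per < LOW_ERROR_THRESHOLD:
        return "no_retx"            # link is clean: retransmission only adds delay
    if per > HIGH_ERROR_THRESHOLD:
        return "full_retx"          # link is noisy: recovering packets matters most
    # In between, the network delay decides.
    if network_delay_ms < DELAY_THRESHOLD_MS:
        return "full_retx"
    return "no_retx"
```

For instance, with a packet error probability of 0.001 the sketch selects Full Retx at a 100ms network delay and No Retx at 175ms.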
3.6 Summary
A suitable retransmission scheme is crucial for obtaining the best possible
perceived speech quality in wireless VoIP applications. In this chapter, we
investigated the performance of three retransmission schemes (No Retx, SPB
Retx, Full Retx) with regard to the perceived conversational speech quality. The
impact of retransmission jitter with an adaptive jitter buffer was also considered. The
simulation results show that the performance of these schemes depends on the
network delay and the wireless link quality. Considering that the wireless environment
is variable, we have proposed a perceived speech quality driven retransmission scheme
that adapts to the wireless link quality and network delay conditions. As SPB
Retx is not involved in the new method, the implementation complexity of retrieving
speech property information is avoided. Our results show that the proposed method
achieves an MOSc at least as high as No Retx, Full Retx and SPB Retx, since it
deploys the most suitable scheme when communication conditions change. In this
study, a simplified last hop wireless network was implemented to demonstrate the
wireless voice over IP scenario. Further improvements may be achieved by making
the simulation closer to a real network, e.g. by incorporating a multi-state error
model in the wireless link.
CHAPTER 4
PLAYOUT DELAY CONSTRAINED ARQ
and ARQ AWARE PLAYOUT BUFFER
4.1 Introduction
Due to the unreliable and error-prone nature of wireless channels, assuring
acceptable perceived speech quality has been a challenging task for Wireless VoIP.
Automatic Repeat on reQuest (ARQ) is one of the packet error recovery techniques
for Wireless VoIP and may complement or substitute Forward Error
Correction (FEC) because of its efficiency and simplicity.
Figure 4-1 Stop and Wait ARQ (diagram labels: Tx Queue, Wireless Channel, Rx Buffer; frames n and n+1 with ACKn and ACKn+1; Timer Started, Frame Loss, Timeout, Backoff, Timer Restarted, Timer Stopped)
In ARQ, the sender sends packets or Protocol Data Units (PDUs) consisting of
payload and checksums. According to the result of checksum validation, the receiver
sends back acknowledgment messages (e.g. ACK or NACK) to the sender. The
sender performs packet retransmissions based on such acknowledgments. ARQ
protocols can be categorized into three types: Stop-and-Wait (SW), Go-Back-N
(GBN) and Selective Repeat (SR), which differ in the way they respond to
acknowledgments. The details of these three types of ARQ have been described in
Chapter 2. In this study, we consider the SW-ARQ in the IEEE 802.11 Media Access
Control (MAC) layer [1]. In the 802.11 SW-ARQ, the transmitted packet must be
acknowledged before the next packet can be sent. If an acknowledgement for a packet
is not received by the sender within a certain timeout period, the sender retransmits
this packet until a maximum retry limit is reached. In the Distributed Coordination
Function (DCF) mode of IEEE 802.11, there is a backoff procedure that randomly
defers each retransmission to avoid collisions between multiple transmitters (see
Figure 4-1). With this procedure, corrupted packets may be recovered by the
retransmitted copies.
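The stop-and-wait retransmission loop described above can be illustrated with the following Python sketch. It is a simplified model under stated assumptions, not the 802.11 procedure itself: the function name, the uniform backoff and all parameter values are hypothetical, each transmission fails independently with probability per, and the returned delay counts only ACK timeouts and backoff time.

```python
import random


def send_with_sw_arq(per, retry_limit, timeout_ms, max_backoff_ms, rng):
    """Simulate stop-and-wait delivery of one packet over a lossy link.

    Returns (delivered, attempts, delay_ms). All arguments are hypothetical
    simulation parameters, not constants taken from IEEE 802.11.
    """
    delay_ms = 0.0
    attempts = 0
    while attempts <= retry_limit:           # first transmission + retry_limit retries
        attempts += 1
        if rng.random() >= per:              # this copy (and its ACK) got through
            return True, attempts, delay_ms
        delay_ms += timeout_ms               # sender waits for the ACK timeout
        if attempts <= retry_limit:          # random backoff before the next retry
            delay_ms += rng.uniform(0.0, max_backoff_ms)
    return False, attempts, delay_ms         # retry limit reached, packet dropped


if __name__ == "__main__":
    print(send_with_sw_arq(0.3, 4, 2.0, 5.0, random.Random(1)))
```

The varying number of attempts per packet is what produces the retransmission jitter discussed in this chapter.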
However, ARQ schemes also bring a series of problems that affect the perceived
speech quality. The retransmission procedure may introduce excessive delays: when
packets have to traverse a high delay wireline network before they reach the wireless
part, any retransmissions may be considered unnecessary [22]. In addition, the number
of retransmission attempts varies with the wireless channel quality, which leads to
retransmission jitter.
Further, the layered protocol architecture, which places ARQ and the playout buffer
in different layers, makes matters worse. Firstly, if an adaptive playout buffer is
employed in the Wireless VoIP system, a packet’s delay budget (its playout delay)
is decided at the beginning of each talkspurt. Since the retransmission procedure is
only constrained by a fixed maximum retry limit, a high retry limit that exceeds the
available delay budget may lead to unnecessary retransmissions and postpone
subsequent packets, while a low retry limit may terminate the retransmission
procedure prematurely with enough delay budget left. Secondly, since a transmission
queue exists at the sender, a high mean retransmission delay can make incoming
packets accumulate in the queue, so that queuing delay or losses climb quickly.
Thirdly, in the current protocol stack, packets that fail transport or link layer
checksum validation are discarded, even though noisy voice packets may still be
useful at the upper layer [38].
These problems have been addressed in some previous works. In [39][40][41],
the retransmission procedure is still constrained by a fixed maximum retry limit, but it
can be terminated at a packet’s deadline (e.g. presentation time). Nevertheless, these
works still cannot avoid the premature termination of a retransmission procedure
when there is still some delay budget left for more retry attempts, and they did not
consider the impact of retransmission delays on queuing delays or losses.
In [15], UDP-Lite, a modified UDP protocol with a partial checksum, has been
developed to allow corrupted UDP packets to be reused at the application level.
However, for Wireless VoIP the MAC layer checksum should be made partial as well.
Otherwise, noisy packets would be discarded in the MAC layer and never reach the
upper layers.
We extend these ideas in a cross-layer design for Wireless VoIP, where the
retransmission procedure is only incorporated in the local channel. In our design, link
layer ARQ and the playout buffer cooperate in an integrated framework, in which 1)
the retransmission procedure of a packet is constrained within the available delay
budget; 2) speech data is not covered by the checksum of link layer or transport layer
packets, and a packet combining process is performed to obtain the least noisy packet
from its retransmitted copies; 3) the delivery delay in the wireless channel is estimated
separately and limited to the mean inter-arrival delay of the transmission queue.
Simulation results show that with the help of this design, the simulated Wireless VoIP
system gained considerable performance improvement, at the expense of breaking the
layered protocol architecture.
4.2 The Cross-Layer Design
Figure 4-2 The cross-layer design system model (fixed host stack: RTP, UDP, IP, Ethernet; mobile host stack: RTP, UDP, IP, 802.11 MAC, PHY; access point with incoming queue holding packets a, b, c; playout buffer holding copies a1, a2, a3 and b; packet combining at the playout time, when retransmission is terminated, before the packet goes to the decoder)
4.2.1 System model
The system model of the proposed cross-layer design is shown in Figure 4-2.
We consider the last-hop scenario in an IEEE 802.11 wireless network. Our design
is composed of two correlated components: the playout delay constrained ARQ, in
which playout delays become the stop criterion of the retransmission procedure; and
the ARQ aware playout buffer, which calculates the packet delivery delay for the
wireline and wireless parts separately and constrains the wireless channel delay
budget below the inter-arrival interval of incoming packets to avoid the accumulation
of queuing delay.
As speech data is not covered by the link layer and transport layer checksums, the
playout buffer may receive several noisy versions of a packet. If a correct version of
the packet has not been received by its presentation time, we employ
Majority-Logic packet combining [44] to further reduce the damaged part and then
send the combined version to the decoder. Details of this technique are presented in
Appendix D.
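The general idea of majority-logic combining can be sketched as a bitwise majority vote across the received copies. The Python code below is only an illustration under stated assumptions: the function name is hypothetical, an odd number of equal-length copies is assumed so that every vote is decisive, and the procedure in Appendix D may differ in detail.

```python
def majority_combine(copies):
    """Bitwise majority vote over an odd number of equal-length byte strings."""
    if len(copies) % 2 == 0 or len({len(c) for c in copies}) != 1:
        raise ValueError("need an odd number of equal-length copies")
    threshold = len(copies) // 2 + 1          # votes needed for a bit to be 1
    combined = bytearray(len(copies[0]))
    for pos in range(len(combined)):
        for bit in range(8):
            ones = sum((c[pos] >> bit) & 1 for c in copies)
            if ones >= threshold:
                combined[pos] |= 1 << bit
    return bytes(combined)
```

A bit error survives the vote only if the same bit position is damaged in the majority of the copies, which is why the combined copy tends to be less noisy than any single one.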
The two key components of the cross-layer design are described in the following
subsections.
4.2.2 Playout delay constrained ARQ
Figure 4-3 Block diagram of the playout delay constrained ARQ with packet combining (decision blocks: Received a packet?, Corrupted?, Playout time?, Exist a correct version?; action blocks: Send ACK, Present to upper layer, Wait for packet retransmission, Check received copies of the playout packet, Majority-logic packet combining, Send to Decoder, Terminate current retransmission process; the Application & Link Layer Interface separates the two layers)
The playout delay constrained ARQ is a specific optimization of the current protocol
stack for Wireless VoIP. Its block diagram is given in Figure 4-3. In the receiver, the
802.11 MAC layer presents every received packet to the upper layer, whether it is
corrupted or not. In the application layer, the playout buffer can terminate a packet’s
retransmission procedure at its playout time to avoid unnecessary retransmissions. If
a corrupted packet has not been recovered by the retransmission procedure, the
received noisy copies are combined by the packet combining module to get a more
reliable version, which is then decoded and played out.
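The receiver-side decision taken at a packet’s playout time can be sketched as follows. This is a hedged illustration only: the function name is hypothetical, zlib.crc32 stands in for whatever application level check is used, and the expected checksum is assumed to be available to the receiver.

```python
import zlib


def select_for_playout(copies, expected_crc, combine):
    """Pick the payload to decode when a packet's playout time is reached.

    copies: payload copies (bytes) received so far, possibly corrupted.
    expected_crc: CRC-32 of the original payload (assumed application level check).
    combine: fallback function that merges noisy copies into one payload.
    Returns None when no copy arrived in time, i.e. the packet is lost.
    """
    if not copies:
        return None
    for payload in copies:
        if zlib.crc32(payload) == expected_crc:
            return payload           # a correct version exists
    return combine(copies)           # otherwise decode the combined noisy copies
```

Once this function has been called for a packet, any retransmission still in progress for that packet can be terminated, since later copies would arrive after the playout time.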
We still keep the maximum retry limit in the 802.11 SW-ARQ, but it is set high
enough to avoid premature termination of the retransmission procedure when
there is still delay budget left for more retry attempts. To allow corrupted packets to
be presented from the link layer to the application layer, the link layer and transport
layer checksums have to be made partial (e.g. UDP-Lite), and the mechanisms that
eliminate duplicate PDUs should be turned off for the supported VoIP services.
Further, an application level checksum such as a CRC in the RTP packet should be
enabled so that the application layer can detect a correct packet among several copies.
4.2.3 ARQ aware playout buffer
4.2.3.1 Queue model
Assume there is a per flow transmission queue at the sender with a large enough
queue length, so that queue losses can be ignored and we can focus on the queuing
delay. With the IEEE 802.11 SW-ARQ, the transmission queue can be seen as an
M/M/1 queuing system with Poisson packet arrivals and exponentially distributed
packet departures [45]. Let $a$ be the average inter-arrival delay and $s$
the average packet departure delay. We have $\lambda = 1/a$ and $\mu = 1/s$, where
$\lambda$ and $\mu$ are the mean arrival rate and mean service rate. The mean delay
of a packet in the queue (waiting plus service) can be computed as
$$T_Q = \frac{1}{\mu - \lambda} = \frac{a \cdot s}{a - s}$$
We can deduce that when $s \to a$, $T_Q \to \infty$, which means that if the mean
delivery delay in the wireless channel is not constrained below the mean inter-arrival
delay of incoming packets, $T_Q$ will climb quickly.
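To illustrate how quickly the queuing delay grows, the following Python sketch evaluates the M/M/1 expression $T_Q = a \cdot s / (a - s)$. The function name and the example delays are hypothetical and are not taken from the simulation.

```python
def mean_queue_delay_ms(inter_arrival_ms, service_ms):
    """Mean M/M/1 delay T_Q = a*s / (a - s); unbounded once s >= a."""
    if service_ms >= inter_arrival_ms:
        return float("inf")          # the queue is unstable
    return inter_arrival_ms * service_ms / (inter_arrival_ms - service_ms)


if __name__ == "__main__":
    # Hypothetical values: 30 ms between arrivals, growing delivery delay.
    for s in (10, 20, 25, 29):
        print(s, mean_queue_delay_ms(30, s))
```

With a 30ms inter-arrival delay, a mean delivery delay of 10ms gives a mean delay of 15ms, whereas 29ms gives 870ms, which is the behaviour the ARQ aware playout buffer is designed to avoid.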
4.2.3.2 ARQ aware playout buffer
For Wireless VoIP, the network delay is composed of the delivery delays in the
wireline and wireless parts. In our design, besides adjusting the playout delay for each
talkspurt, the ARQ aware playout buffer is able to estimate the required delivery
delay in the wireless and wireline parts separately. Figure 4-4 gives the timing
notation associated with the playout buffer algorithm.
Since the noisy copies produced in the retransmission procedure are not
discarded, several copies of a packet may exist in the playout buffer. Let $a_i$ be
the receiver timestamp of the first arrived copy of the $i$th packet, and $t_i$ be the
sender timestamp. We can compute the delivery delay in the wireline network for
packet $i$ (denoted by $nw_i$) as $nw_i = a_i - t_i$. Let $r_i$ be the receiver
timestamp of the last arrived copy. The delivery delay in the wireless channel of
packet $i$ (denoted by $nc_i$) can be computed as $nc_i = r_i - a_i$. If no
retransmission is required for packet $i$, $r_i = a_i$. However, recall that the
waiting delay in the transmission queue will climb quickly if the mean delivery
delay in the wireless channel is higher than the mean inter-arrival delay of the
incoming packets (denoted by $\sigma_i$). The playout buffer should therefore limit
$nc_i$ to $\sigma_i$ when $r_i - a_i \ge \sigma_i$. $\sigma_i$ can be estimated as:
)()1( 11 −− −⋅−+⋅= iiiii abs σσασασ
where $\alpha$ is the same constant as used in the estimation of $\hat{v}_i$, and it
is set to 0.99802 in the simulation.
Figure 4-4 Timing associated with Packet (timeline from Sender to Access Point to Receiver, marking $t_i$, $a_i$, $r_i$, the retry attempts, $nw_i$, $nc_i$, $n_i$ and $\hat{d}_i$)
The computing formula for the network delay $n_i$ can be summarized as:
$$n_i = nw_i + nc_i = \begin{cases} nw_i + r_i - a_i, & r_i - a_i < \sigma_i \\ nw_i + \sigma_i, & r_i - a_i \ge \sigma_i \\ nw_i, & r_i = a_i \end{cases}$$
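The three cases of the network delay computation can be folded into a single capped sum, as the following Python sketch shows. The function and argument names are assumptions introduced for illustration, all timestamps are taken to be in the same unit, and the estimate of the mean inter-arrival delay is simply passed in as sigma_i.

```python
def network_delay(t_i, a_i, r_i, sigma_i):
    """Network delay n_i = nw_i + nc_i with the wireless part capped at sigma_i.

    t_i: sender timestamp, a_i: arrival of the first copy,
    r_i: arrival of the last copy, sigma_i: mean inter-arrival delay estimate.
    """
    nw_i = a_i - t_i                     # wireline delivery delay
    nc_i = min(r_i - a_i, sigma_i)       # wireless delay, limited to sigma_i
    return nw_i + nc_i
```

When no retransmission occurs, r_i equals a_i and the sketch returns nw_i, which matches the third case of the formula for any positive sigma_i.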
The ARQ aware playout buffer differs from other algorithms only in the way the
network delay $n_i$ is computed. We can estimate the mean network delay $\hat{d}_i$
according to existing algorithms, e.g. the ‘adaptive’ algorithm proposed in [35]:
$$\text{if } (\hat{d}_i \ge delay\_threshold): \quad \hat{d}_i = \min_{j \in S_i} \{ n_j \}$$
$$\text{else}: \quad \hat{d}_i = \begin{cases} \beta \hat{d}_{i-1} + (1 - \beta) n_i, & n_i > \hat{d}_{i-1} \\ \alpha \hat{d}_{i-1} + (1 - \alpha) n_i, & n_i \le \hat{d}_{i-1} \end{cases}$$
Details of this algorithm can be found in Chapter 2.
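The smoothing branch of the delay estimate can be sketched in Python as below. This is a partial illustration under stated assumptions: it covers only the two-rate exponential average and not the switch to the minimum-delay estimate, the function name is hypothetical, alpha defaults to the value quoted in this section, and the default beta of 0.75 is an assumed value that is not given in this chapter.

```python
def update_delay_estimate(d_prev, n_i, alpha=0.99802, beta=0.75):
    """One step of the two-rate exponential average of the network delay.

    A delay sample above the current estimate is weighted with beta, so the
    estimate rises quickly; otherwise alpha is used and the estimate decays slowly.
    """
    weight = beta if n_i > d_prev else alpha
    return weight * d_prev + (1.0 - weight) * n_i
```

Feeding the capped network delay $n_i$ of the ARQ aware playout buffer into this update is what keeps retransmission spikes from inflating the playout delay.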
4.3 Simulation Model and Experimental Results
As presented in Figure 4-5, the simulation model comprises the following
components: a voice traffic model, an AMR encoder and decoder, a playout buffer,
and a wireless network simulator that integrates the 802.11 SW-ARQ and a simple
Bernoulli bit error model.
Figure 4-5 The simulation model (blocks: Voice Traffic, Encoder, Wireless Network Simulator, Playout Buffer, Decoder, Conversational Speech Quality Evaluation; labels: End-to-end Delay, MOSc)
4.3.1 Wireless channel model
We employed a simple Bernoulli model for bit errors, which lead to packet
corruption in the payload and the packet header. The probability PER of a PHY layer
packet being corrupted by bit errors can be computed as follows:
$$PER = 1 - (1 - BER)^{ph + pl}$$
where BER is the Bit Error Rate and $ph$ is the packet overhead size counted from
the physical level. For our simulations we have used a value of 784 bits for $ph$: 24,
34, 20, 8 and 12 bytes at the PHY, MAC, IP, UDP and RTP layers respectively (no
header compression is used). $pl$ is the payload size, which is set to 32 bytes,
corresponding to an AMR 12.2K voice frame.
Let $\omega$ denote the estimated playout delay, and $R_\omega$ be the corresponding
maximum retry limit constrained by $\omega$. We can also compute the probability
PKR of a packet being recovered after $R_\omega$ retransmissions as
$$PKR = PER^{R_\omega - 1} \cdot (1 - PER)$$
And the probability PHE that bit errors occur in the packet header can be given by:
11)1(1+
⋅−−=pl
BERPHE ph
If a packet always contains bit errors in its header in all $R_\omega$
retransmissions, the speech data carried by this packet cannot be reused. The
probability PLS of this event is:
$$PLS = PHE^{R_\omega}$$
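As a worked illustration of the channel model, the following Python sketch evaluates PER, PKR and PLS. It is only a sketch: the function names are assumptions introduced here, the header and payload sizes are the values quoted in this section, and PHE is passed in as an input rather than recomputed.

```python
HEADER_BITS = 784        # ph: PHY + MAC + IP + UDP + RTP overhead (98 bytes)
PAYLOAD_BITS = 32 * 8    # pl: one AMR 12.2K frame carried in 32 bytes


def packet_error_rate(ber, header_bits=HEADER_BITS, payload_bits=PAYLOAD_BITS):
    """PER = 1 - (1 - BER)^(ph + pl) under the Bernoulli bit error model."""
    return 1.0 - (1.0 - ber) ** (header_bits + payload_bits)


def recovery_probability(per, retry_limit):
    """PKR = PER^(R - 1) * (1 - PER)."""
    return per ** (retry_limit - 1) * (1.0 - per)


def speech_loss_probability(phe, retry_limit):
    """PLS = PHE^R: every one of the R copies has a damaged header."""
    return phe ** retry_limit


if __name__ == "__main__":
    per = packet_error_rate(1e-4)
    print(per, recovery_probability(per, 3))
```

Because PLS falls geometrically with the retry limit, allowing more retry attempts within the playout delay sharply reduces the share of packets whose speech data cannot be reused.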
4.3.2 Voice traffic model
The voice traffic model can be simply represented by the on-off model [48]. The
on-off model assumes a two-state chain, in which one state corresponds to talkspurts
and the other to silence periods. The holding time in each state is assumed to follow
an exponential distribution. In our simulation we selected means of 1.0 sec and 1.5 sec
for the talkspurt state and the silence state respectively, as suggested in [49].
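A minimal Python sketch of such an on-off source is given below. The function name, the tuple layout and the choice to start in the talkspurt state are assumptions made for illustration; the mean holding times default to the values quoted above.

```python
import random


def on_off_periods(total_s, mean_talk_s=1.0, mean_silence_s=1.5, seed=None):
    """Generate alternating talkspurt/silence periods with exponential lengths.

    Returns a list of (state, start_s, end_s) tuples covering [0, total_s].
    """
    rng = random.Random(seed)
    periods, now, talking = [], 0.0, True    # assumed to start with a talkspurt
    while now < total_s:
        mean = mean_talk_s if talking else mean_silence_s
        # expovariate takes the rate, i.e. 1 / mean holding time
        length = min(rng.expovariate(1.0 / mean), total_s - now)
        periods.append(("talk" if talking else "silence", now, now + length))
        now += length
        talking = not talking
    return periods
```

Voice packets would then be generated only inside the periods labelled "talk".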
4.3.3 Speech quality evaluation
In our simulation model, we employed the conversational speech quality
evaluation method [29] to quantify the performance of the different simulation
strategies. This method combines PESQ and the E-Model to measure the perceived
speech quality, and the result is represented by MOSc (Conversational Mean Opinion
Score). The details of this method can be found in Chapter 2. In this method, bit
errors in the payload, packet losses and delay all contribute to the degradation of
the final evaluated speech quality.
4.3.4 Simulation results and analysis
We considered three strategies in the simulation study: Strategy A, SW-ARQ and
the ‘adaptive’ playout buffer without the proposed cross-layer design; Strategy B,
playout delay constrained ARQ with the ‘adaptive’ playout buffer; and Strategy C,
playout delay constrained ARQ with the ARQ aware playout buffer. The simulation
results were obtained by averaging the results of 30 trials with different random
seeds to average out the impact of packet loss or bit error locations. Each trial lasted
200 seconds, corresponding to 10,000 PDUs (one PDU encapsulated one RTP packet).
Figure 4-6 shows the overall packet loss ratio for these strategies.
When BER increases, Strategy A discards many corrupted packets that cannot be fully
recovered before their playout time. Strategies B and C follow the same policy
regarding packet losses: both reuse noisy packets and only discard those packets that
cannot reach the receiver before their playout time. As a result, Strategies B and C
discard only a small percentage of packets compared to Strategy A, even when the
wireless channel is very noisy.
Figure 4-6 Overall packet losses (packet loss ratio (%) versus BER; curves: Strategy A, Strategy B, Strategy C)
Figure 4-7 End-to-end delay vs. inter-arrival delay in Strategy A (end-to-end delay (ms) versus BER; curves: inter-arrival delay 26ms, 28ms, 30ms)
Figure 4-8 End-to-end delay comparison (end-to-end delay (ms) versus BER; curves: Strategy A, Strategy B, Strategy C)
Figure 4-9 Conversational MOS comparison (conversational MOS versus BER; curves: Strategy A, Strategy B, Strategy C)
We also plotted end-to-end delays under different inter-arrival delays and wireless
channel conditions in Figures 4-7 and 4-8, with a fixed 100ms delay in the wireline
network. In Figure 4-7, the delay curves begin to spread at a BER of
$5 \times 10^{-4}$. The curve for the shortest inter-arrival delay (26ms) increases
fastest. This reflects the queue model: the closer the mean inter-arrival delay is to
the delivery delay in the wireless channel, the higher the queuing delay and hence the
end-to-end delay.
In Figure 4-8, we can see that the end-to-end delays of these strategies climb as
BER increases. Strategy B performs slightly better than Strategy A when BER becomes
worse, as Strategy B is able to terminate unnecessary retransmissions. Strategy C
outperforms Strategies A and B with a more stable curve, as it manages to avoid the
accumulation of queuing delay. It should be noted that the delay curves decrease at
some points where the ‘adaptive’ playout buffer switches to the ‘min-delay’
algorithm more frequently.
The performance enhancement achieved by the cross-layer design in terms of
conversational Mean Opinion Score (MOSc) is presented in Figure 4-9. From Figure
4-9, we can see that the curves of Strategy A and B decrease significantly once BER
exceeds $10^{-4}$. At a BER of around $10^{-3}$, Strategy A already reaches 1.0,
which is the worst MOSc. In contrast, Strategy C, or the cross-layer design, still
achieves an MOSc of 3.0 at the same BER.
4.4 Summary
We investigated problems introduced by the IEEE 802.11 SW-ARQ protocol
when it works with other components of a Wireless VoIP system (e.g. transmission
queue, adaptive playout buffer) in the layered protocol architecture, and proposed a
cross-layer design as a solution to these problems. The proposed cross-layer
design is composed of two correlated components: 1) playout delay constrained ARQ,
in which a packet’s playout time is the deadline of its retransmission procedure and,
instead of simply discarding corrupted packets, noisy copies of a packet can be
combined and then played out; 2) ARQ aware playout buffer, in which the requirements
on the delivery delay in the wireless channel (e.g. not to build up queuing delay) are
considered in the playout delay estimation. Through simulations, we showed that the
proposed cross-layer design can improve the perceived speech quality of a Wireless
VoIP system in terms of conversational Mean Opinion Score (MOSc). In our
simulation, the wireless channel errors are represented by a simple Bernoulli error
model. Further improvements may be achieved by making use of multi-state error
models to simulate transmission errors in the wireless channel.
CHAPTER 5
DISCUSSIONS, SUGGESTIONS for
FURTHER WORKS, and CONCLUSIONS
5.1 Discussions
Based on the research carried out in this study, we can now discuss the
research questions raised at the beginning of this dissertation.
a. What are the impairment factors of Wireless VoIP applications?
For VoIP, the impairment factors have been identified as packet loss, delay, jitter
and coding. For Wireless VoIP, bit errors are a further impairment factor. If the
whole packet carrying speech data is covered by checksums (UDP checksum or MAC
checksum), the effect of bit errors perceived at the application level is also packet
loss. However, if a partial checksum is applied to cover only the packet header, the
effect of bit errors can be packet loss or speech distortion, depending on whether the
bit errors fall in the packet header or the payload.
b. What are the pros and cons of ARQ mechanisms? Is the performance of Wireless
VoIP System improved by ARQ mechanisms in terms of perceived speech quality?
Compared to FEC, which requires extra overhead, ARQ is a simple and efficient way
to recover damaged packets. The main problems introduced by ARQ schemes are
retransmission delay and jitter. Normally, the perceived speech quality of a Wireless
VoIP system can be significantly enhanced by using variations of ARQ schemes,
except in some cases, e.g. low BER and high wireline network delay. However, the
use of ARQ schemes should be constrained, i.e. the delay of the retransmission
procedure should be kept within the playout delay or the inter-arrival delay of the
transmission queue at the access point.
c. How to optimize current ARQ schemes to improve speech quality? And how to
map real-time network and wireless channel QoS parameters into ARQ protocol
optimization?
ARQ schemes can be optimized to achieve retransmission efficiency, e.g. by only
retransmitting important speech packets in SPB ARQ or by switching between No
Retransmission and Full Retransmission in the proposed perceived speech quality
driven scheme. Another optimized version of ARQ is the playout delay constrained
ARQ, which can terminate the retransmission procedure of ARQ whenever necessary.
All these optimizations need QoS parameters to make decisions. The QoS parameters
may be obtained from other layers, namely the playout delay from the application
layer, the wireless channel performance from the physical layer and other information
from joint-layer analysis.
d. What are the effects of the interactions between ARQ mechanisms and other
components of the Wireless VoIP system? How to cope with these effects if they are
negative?
One example of interactions between ARQ and other components of the Wireless
VoIP system is the effect of the playout buffer on ARQ. If the retransmission
procedure is only constrained by a fixed maximum retry limit, a high retry limit may
exceed the available delay budget, leading to unnecessary retransmissions and
postponing subsequent packets, whereas with a low retry limit the retransmission
procedure may be terminated prematurely before it reaches the playout time. The
corresponding solution is the proposed playout delay constrained ARQ, in which the
retransmission procedure is bounded by the estimated playout delay.
e. How to make use of other packet error concealment technologies with ARQ? Or how
to use ARQ as a complementary mechanism for other packet error concealment
technologies?
Using ARQ as a complementary mechanism for other packet error recovery
techniques, e.g. FEC, has been addressed in previous works. In this study, we
investigated the performance of a cross-layer design that incorporates ARQ,
majority-logic packet combining and partial checksums. The noisy copies produced
by the retransmission procedure of ARQ can be merged into a least noisy copy
through a packet combining process. We conclude that a hybrid packet recovery
solution can achieve a better performance gain than a single technique, provided that
the available packet error concealment techniques are scheduled appropriately.
f. How to establish a cross-layer framework in which we can optimize the QoS
techniques located in different layers with a joint-layer analysis? And how to establish
a profile of real-time predicted speech quality and QoS parameters collected from
different layers and eventually make this profile the scheduler of a cross-layer
framework?
In this study, we have achieved considerable improvement of speech quality by
simply feeding QoS parameters into the optimization of ARQ schemes with joint-
layer analysis. More work is left for future studies to establish a perceived speech
quality driven cross-layer framework, in which QoS parameters from different layers,
evaluated speech quality fed back from the receiver and the Service Level
Agreement (SLA) contribute to the decisions about which packet error recovery
techniques to use and how to combine them in a mutually aware way.
5.2 Suggestions for Further Works
In this study, the packet error recovery techniques of the cross-layer designs are
driven by network parameters. In future work, we plan to improve the performance
of cross-layer designs by establishing a more sophisticated perceived speech quality
driven closed-loop packet error recovery scheduler. By closed-loop, we mean that the
effects of the strategy issued by the cross-layer design can be fed back and contribute
to the next phase of strategy-making.
We expect the perceived speech quality driven packet error recovery scheduler to
have the following abilities: 1) collect QoS parameters (e.g. BER, end-to-end delay,
packet loss, and bandwidth) from different layers to form a profile of the current
communication environment; 2) produce an optimized packet error recovery strategy,
considering the performance feedback, the state of the current communication
environment and the users’ requirements (e.g. SLA); 3) schedule packet error
recovery techniques according to the decided strategy, where several techniques may
be used at the same time, e.g. link layer ARQ and application level FEC; 4) evaluate
speech quality periodically and send it back with other QoS parameters to the
Scheduler as the input of strategy-making.
Figure 5-1 illustrates the block diagram of the perceived speech quality driven
packet error recovery scheduler. The Scheduler will be composed of three key
components:
Packet error recovery scheduler: The central part of the framework is a real-time
scheduler located in the mobile host or wireless handset. The scheduler takes into
consideration variations in channel error rate, overall packet loss rate, speech
quality feedback etc. and tries to produce an optimal packet error recovery strategy for
the local wireless channel to maximize the perceived speech quality with the available
resources. The packet error recovery strategy may address the question of which
packet error recovery technique should be scheduled: FEC, ARQ, a low coding rate or
a hybrid. The parameters of a specific technique can be provided as well, e.g. the
coding rate or the delay budget for ARQ.
Figure 5-1 Perceived speech quality driven packet error recovery scheduler
[Block diagram. Labelled elements: Speech Source, Encoder, Mobile Host (RTP/UDP/IP/MAC/PHY), Packet Error Recovery Scheduler (FEC, ARQ, ...), Access Point with Booster, Fixed Host (RTP/UDP/IP/Ethernet), Adaptive Playout Buffer, Decoder, Degraded Speech, Perceived Speech Quality Evaluation. Labelled flows: QoS parameters, Error recovery strategy, Feedback (MOSc, end-to-end delay, etc.).]
Booster: a Booster will be patched into the access point (AP). The Booster will
provide per-flow service differentiation and admission control in the distributed
coordination function (DCF). The Booster will also cooperate with the Scheduler to
distinguish wireless channel delivery delay from network delay, and wireless
channel packet losses from network congestion losses. Further, the Booster can be
designed to change its QoS policies according to the packet error recovery strategy
issued by the Scheduler.
Perceived speech quality evaluation and feedback: the perceived speech quality
evaluation module is located in the receiver. This module will evaluate perceived
speech quality at a specified interval and send the results, normally the
conversational Mean Opinion Score (MOSc), back to the sender together with other
QoS parameters such as end-to-end delay. Such feedback information can be carried
by RTCP reports or other forms of in-band signaling. It should be noted that
Figure 5-1 only shows the scenario of a Mobile Host sending out speech traffic. In
fact, a perceived speech quality evaluation module should also be located in the
Mobile Host itself, and feed the evaluated quality indexes to the local Scheduler
when the Mobile Host is receiving speech traffic in a conversation.
Besides these functional considerations, further details such as implementation
complexity and resource requirements will be considered as well.
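The decision step of the Scheduler described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names QosProfile, choose_strategy and predict_mosc are hypothetical, and the thesis does not specify how the predicted quality of each candidate strategy is obtained, so the predictor is passed in as a parameter.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class QosProfile:
    """Cross-layer QoS parameters collected by the Scheduler (field names are assumptions)."""
    ber: float                   # bit error rate reported by the lower layers
    end_to_end_delay_ms: float   # end-to-end delay reported by the receiver
    packet_loss: float           # overall packet loss rate
    mosc: float                  # latest conversational MOS feedback


def choose_strategy(candidates: Iterable[str],
                    profile: QosProfile,
                    predict_mosc: Callable[[str, QosProfile], float]) -> str:
    """Return the candidate strategy with the highest predicted MOSc.

    predict_mosc is a placeholder for whatever quality model is available;
    it is not defined by the thesis.
    """
    return max(candidates, key=lambda strategy: predict_mosc(strategy, profile))
```

In such a sketch the Scheduler would call choose_strategy each time a new feedback report arrives, with candidates such as "ARQ", "FEC" or a hybrid, and pass the result to the layers that carry out the recovery.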
5.3 Conclusions
Perceived speech quality is crucial for the success of Wireless VoIP, a typical
application in the upcoming wireless Internet or “4G”. The factors that impair the
perceived speech quality of a Wireless VoIP system can be summarized as packet loss,
end-to-end delay, jitter, bit errors and coding. In this study, we investigated the
problems introduced by ARQ schemes with regard to perceived speech quality. We
tried to optimize the current ARQ protocol by mapping cross-layer QoS parameters
into the scheduling and configuration of the retransmission procedure in ARQ. We
proposed a perceived speech quality driven retransmission scheme, which can switch
to the most suitable retransmission scheme according to QoS parameters reported
from lower or upper layers. We also developed a cross-layer framework, in which the
retransmission procedure of the ARQ protocol is determined by the available playout
delay and the delivery delay in the wireless channel is constrained in the network
delay estimation. Through simulation results, we showed that these cross-layer
techniques can achieve significant performance gains. However, the work done so far
is far from complete: more effort is required in future studies to move towards an
integrated perceived speech quality driven cross-layer framework.
REFERENCES
[1] IEEE Standards Department, IEEE 802.11 Standard for Wireless LAN, Medium
Access Control (MAC) and Physical Layer (PHY) Specification, 1999
[2] 3GPP2 C.S0001-B, Introduction to cdma2000 Spread Spectrum Systems, MAY
2002
[3] Schulzrinne H., Casner S., Frederick R. and Jacobson V., RFC 1889: RTP: A
Transport Protocol for Real-Time Applications, 1996
[4] Tanenbaum A.S. Computer networks, Prentice-Hall, 1996, ISBN 0-13-394248-1
[5] Thomas J. Kostas et al, Real-Time Voice Over Packet-Switched Networks, IEEE
Network, 12(1): 18-27, January, 1998
[6] 3GPP TS 26.090: Mandatory Speech Codec speech processing functions; AMR
speech codec; Transcoding functions, DEC 1999
[7] 3GPP2 C.S0014-0: Enhanced Variable Rate Codec (EVRC), JAN 1997
[8] S Rudkin, A Grace and M W Whybray, Real-time application on the Internet,
British Telecom Technology Journal,Vol 15,No2,April 1997.
[9] Agilent Technologies, Web ProForum Tutorials, Voice Quality (VQ) in
Converging Telephony and IP networks, http://www.iec.org/tutorials/voqual.pdf
[10] M. Veeraraghavan, N. Cocker, and T. Moors, Support of voice services in IEEE
802.11 wireless LANs, Proc. Infocom, Anchorage, Alaska, April 2001
[11] M. Luby, L. Vicisano, J. Gemmell, L. Rizzo, M. Handley and J. Crowcroft,
RFC3453: The Use of Forward Error Correction (FEC) in Reliable Multicast, DEC
2002
[12] Moo Young Kim, Renat Vafin, Packet-Loss Recovery Techniques for VoIP,
Technical Report, Royal Institute of Technology (KTH), Sweden
[13] Wenyu Jiang, Henning Schulzrinne, Comparison and Optimization of Packet
Loss Repair Methods on VoIP Perceived Quality under Bursty Loss , NOSSDAV 2002
[14] C. S. Perkins, O. Hodson and V. Hardman, A Survey of Packet-Loss Recovery
Techniques for Streaming Audio, IEEE Network Magazine, SEP/OCT 1998.
[15] L. A. Larzon, M. Degermark, and S. Pink, “The UDP Lite Protocol,” Internet
Draft draft-ietf-tsvwg-udp-lite-00.txt, Jan. 2002.
[16] Leon-Garcia and Widjaja, Communication Networks: Fundamental Concepts and
Key Architectures, McGraw-Hill, 2000, ISBN 0070228396
[17] Qian Zhang, Wenwu Zhu, and Ya-Qin Zhang, A Cross-layer Qos-Supporting
Framework for Multimedia Delivery over Wireless Internet, Proc. 12th Packet Video
Workshop (PV2002), Pittsburgh, USA, 2002
[18] Sanjay Shakkottai, Theodore S. Rappaport and Peter C. Karlsson, Cross-layer
Design for Wireless Networks, Technical Report Submitted for Journal Publication,
2003
[19] S. Krishnamachari, M.V. D. Schaar, S. Chor and X. Xu,Video Streaming over
Wireless LANs: A Cross-layer Approach, Proc. Packet Video, Nantes, France, APR
2003
[20] Yo Huh, Ming Hu, Martin Reisslein, and Junshan Zhang, MAI-JSQ: A Cross-
Layer Design for Real-Time Video Streaming in Wireless Networks, Technical
Report Telecommunications Research Center, Dept. of Electrical Eng., Arizona State
University, AUG 2002.
[21] H Sanneck, N Tuong L Le et al, Selective Packet Prioritization for Wireless
Voice over IP, 4th Int Sym Wireless Personal Multimedia Communication, Denmark,
2001
[22] Z.Li, L.Sun, Z.Qiao and E.Ifeachor, Perceived Speech Quality Driven
Retransmission Mechanism for Wireless VoIP, Proc. IEE 3G 2003 pp395-399, London,
UK, JUN 2003
[23] ITU-T Recommendation P.862, Perceptual evaluation of speech quality (PESQ),
an objective method for end-to-end speech quality assessment of narrowband
telephone networks and speech codecs.
[24] ITU-T Recommendation G.107 (05/2000), The E-model, a computational model
for use in transmission planning.
[25] ITU-T Recommendation P.830, Subjective Performance Assessment of
Telephone-band and Wideband Digital Codecs.
[26] Athina P. Markopoulou, Assessing the Quality of Multimedia Communications over
Internet Backbone Networks, PhD thesis, Department of Electrical Engineering,
Stanford University, USA, OCT 2002
[27] ITU-T Recommendation G.108, Application of the E-model: a planning guide,
SEP 1998
[28] ITU-T Recommendation G.113, Transmission impairments due to speech
processing, FEB 2001
[29] Lingfen Sun and Emmanuel Ifeachor, "New Methods for Voice Quality
Evaluation for IP Networks", Proc. of 18th International Teletraffic Congress (ITC18),
Berlin, Germany, SEP 2003
[30] R. Ramjee, J. Kurose, D. Towsley and H. Schulzrinne, 1994, Adaptive
playout mechanisms for packetized audio applications in wide-area networks, Proc. of
IEEE INFOCOM, vol.2, pp.680-688
[31] S. B. Moon, J. Kurose, and D. Towsley, “Packet audio playout delay adjustment:
performance bounds and algorithms,” ACM/Springer Multimedia Systems, vol. 5, pp.
17–28, JAN 1998.
[32] L Sun, E.C.Ifeachor, 2003, Prediction of Perceived Conversational Speech
Quality and Effects of Playout Buffer Algorithms, Proc. of IEEE ICC 2003
[33] Kouhei Fujimoto, Shingo Ata, and Masayuki Murata. Playout control for
streaming applications by statistical delay analysis, Proc. IEEE ICC, vol.8, pp 2337-
2342, JUN 2001.
[34] C.Hoene, I.Carreras, A.Wolisz, 2001, Voice over IP: Improving the Quality over
Wireless LAN by Adopting a Booster Mechanism – An Experiment Approach. Proc.
SPIE 2001 - Voice over IP (VoIP) Technology, pp. 157- Denver, Colorado, USA
[35] L.F.Sun, G.Wade, B.M.Lines and E.C.Ifeachor, 2001, Impact of Packet Loss
Location on Perceived Speech Quality ,Proceedings of 2nd IP-Telephony Workshop
(IPTEL '01), Columbia University, New York, pp.114-122.
[36] The Network Simulator - ns-2, http://www.isi.edu/nsnam/ns/
[37] ITU-T G.114, One-Way Transmission Time, FEB 1999
[38] Florian Hammer, Peter Reichl, Tomas Nordstrom, Gernot Kubin, Corrupted
Speech Data Considered Useful, in Proceeding First ISCA Tutorial and Research
Workshop on Auditory Quality of Systems, Mont Cenis, Germany, April 2003
[39] E. Uhlemann, T. M. Aulin, L. K. Rasmussen and P.-A.Wiberg, “Concatenated
hybrid ARQ - A flexible scheme for wireless real-time communication”, IEEE Real-
Time and Embedded Tech. and Appl. Symp., SEP 2002
[40] Christos Papadopoulos, Gurudatta M.Parulkar, Retransmission-Based Error
Control for Continuous Media Applications, Proc. NOSSDAV, 1996
[41] Guijin Wang, Qian Zhang, Wenwu Zhu and Ya-Qin Zhang, Channel-Adaptive
Error Control for Scalable Video over Wireless Channel, the 7th International
Workshop on Mobile Multimedia Communications (MoMuC), Japan, Oct. 2000
[42] Qingwen Liu, Shengli Zhou, and Georgios B. Giannakis, Cross-Layer
Combining of Adaptive Modulation and Coding with Truncated ARQ over Wireless
Links, IEEE Transactions On Wireless Communications, 2004 (To appear)
[43] Richard Han, David Messerschmitt, A Progressively Reliable Transport
Protocol For Interactive Wireless Multimedia, ACM Multimedia Systems Journal,
MAR 1999
[44] Stephen B.Wicker, Adaptive Rate Error Control Through the Use of Diversity
Combining and Majority-Logic Decoding in a Hybrid-ARQ Protocol, IEEE
Transactions on communications, VOL.39, NO.3, MAR 1991
[45] E. PAGE, Queuing system in OR, the Butterworths Group, 1972, ISBN
0408702370
[46] F.Cali, M.Conti and E.Gregori, “IEEE 802.11 wireless LAN: Capacity analysis
and protocol enhancement”, Proc. IEEE INFOCOM, 1998
[47] J. Rosenberg, L. Qiu and H. Schulzrinne, ‘Integrating Packet FEC into Adaptive
Voice Playout Buffer Algorithms on the Internet’, Proc. of IEEE Infocom 2000, vol.3
pp.1705-1714
[48] P. Brady, ‘A Technique for Investigating On-Off Patterns of Speech’, Bell System
Technical Journal, 44(1):1-22, JAN 1965.
[49] ITU-T Recommendation P.59, Telephone transmission quality objective
measuring apparatus: Artificial conversational speech.
[50] Shyan S.Chakraborty, Erkki Yli-Juuti, and Markku Liinaharja, An ARQ Scheme
with Packet Combining, IEEE Communications Letters, 1998
[51] E.Uhlemann., T.M. Aulin, L.K. Rasmussen and P-Arne Wiberg. Packet
Combining and Doping in Concatenated Hybrid ARQ Schemes Using Iterative
Decoding, Proc. of IEEE WCNC 2003
[52] Wenyu Jiang, Henning Schulzrinne, Comparison and Optimization of Packet
Loss Repair Methods on VoIP Perceived Quality under Bursty Loss, NOSSDAV 2002
APPENDICES
[APPENDIX A] ns-2 Extensions for ARQ Retry Limit Control
/* Modifications in mac-802_11.h */
class Mac802_11 : public Mac {
public:
    Mac802_11(PHY_MIB* p, MAC_MIB *m);
    static int retr;    // retransmission limit, set from Tcl via "retrNo"
    // …
};

// TCL hooks for the simulator
static class Mac802_11Class : public TclClass {
public:
    Mac802_11Class() : TclClass("Mac/802_11") {}
    TclObject* create(int, const char*const*) {
        return (new Mac802_11(&PMIB, &MMIB));
    }
    virtual void bind();
    virtual int method(int argc, const char*const* argv);
} class_mac802_11;

/* Modifications in mac-802_11.cc */
void Mac802_11Class::bind()
{
    // Call to base class bind() must precede add_method()
    TclClass::bind();
    add_method("retrNo");
}

int Mac802_11Class::method(int ac, const char*const* av)
{
    Tcl& tcl = Tcl::instance();
    int argc = ac - 2;
    const char*const* argv = av + 2;

    if (argc == 2) {
        // "Mac/802_11 retrNo": report the current retry limit
        if (strcmp(argv[1], "retrNo") == 0) {
            tcl.resultf("%d", Mac802_11::retr);
            return (TCL_OK);
        }
    } else if (argc == 3) {
        // "Mac/802_11 retrNo <n>": set the value of the static variable here
        if (strcmp(argv[1], "retrNo") == 0) {
            Mac802_11::retr = atoi(argv[2]);
            return (TCL_OK);
        }
    }
    return TclClass::method(ac, av);
}

// Retransmission routines
void Mac802_11::RetransmitDATA()
{
    struct hdr_cmn *ch;
    struct hdr_mac802_11 *mh;
    u_int32_t *rcount, *thresh;

    assert(mhBackoff_.busy() == 0);
    assert(pktTx_);
    assert(pktRTS_ == 0);

    ch = HDR_CMN(pktTx_);
    mh = HDR_MAC802_11(pktTx_);

    /*
     * Broadcast packets don't get ACKed and therefore
     * are never retransmitted.
     */
    if ((u_int32_t)ETHER_ADDR(mh->dh_da) == MAC_BROADCAST) {
        //Packet::free(pktTx_); pktTx_ = 0;
        /*
         * Backoff at end of TX.
         */
        //rst_cw();
        //mhBackoff_.start(cw_, is_idle());
        //return;
        // these lines are commented so ARQ mechanism can be
        // used for any topology
    }

    macmib_->ACKFailureCount++;

    if ((u_int32_t) ch->size() <= macmib_->RTSThreshold) {
        rcount = &ssrc_;
        thresh = &macmib_->ShortRetryLimit;
    } else {
        rcount = &slrc_;
        // thresh must point at a valid limit before it is overwritten;
        // the listing as printed left this assignment commented out
        thresh = &macmib_->LongRetryLimit;
        // set the value of retransmission limit
        *thresh = Mac802_11::retr;
        printf("threshold=%u\n", *thresh);
    }

    (*rcount)++;

    if (*rcount > *thresh) {
        macmib_->FailedCount++;
        /* tell the callback the send operation failed
           before discarding the packet */
        hdr_cmn *ch = HDR_CMN(pktTx_);
        if (ch->xmit_failure_) {
            ch->size() -= ETHER_HDR_LEN11;
            ch->xmit_reason_ = XMIT_REASON_ACK;
            ch->xmit_failure_(pktTx_->copy(), ch->xmit_failure_data_);
        }
        discard(pktTx_, DROP_MAC_RETRY_COUNT_EXCEEDED);
        pktTx_ = 0;
        printf("(%d)DATA discarded: count exceeded\n", sta_seqno_);
        *rcount = 0;
        rst_cw();
    } else {
        struct hdr_mac802_11 *dh;
        dh = HDR_MAC802_11(pktTx_);
        dh->dh_fc.fc_retry = 1;
        sendRTS(ETHER_ADDR(mh->dh_da));
        //printf("(%d)retxing data:%x..sendRTS..\n",index_,pktTx_);
        inc_cw();
        mhBackoff_.start(cw_, is_idle());
    }
}
[APPENDIX B] ns-2 Simulation Script for Per Packet Control of ARQ
# wireless2.tcl
# simulation of a wired-cum-wireless scenario consisting of 2 wired nodes
# connected to a wireless domain through a base-station node.
#==================================================================
# Define options
#==================================================================
set opt(chan)         Channel/WirelessChannel   ;# channel type
set opt(prop)         Propagation/TwoRayGround  ;# radio-propagation model
set opt(netif)        Phy/WirelessPhy           ;# network interface type
set opt(mac)          Mac/802_11                ;# MAC type
set opt(ifq)          Queue/DropTail/PriQueue   ;# interface queue type
set opt(ll)           LL                        ;# link layer type
set opt(ant)          Antenna/OmniAntenna       ;# antenna model
set opt(ifqlen)       25000                     ;# max packet in ifq
set opt(nn)           1                         ;# number of mobilenodes
set opt(adhocRouting) DSDV                      ;# routing protocol
set opt(x)            500                       ;# x coordinate of topology
set opt(y)            500                       ;# y coordinate of topology
set opt(seed)         [lindex $argv 0]          ;# seed for random number gen.
set opt(stop)         20000                     ;# time to stop simulation
set opt(utp1-start)   2.0
set num_wired_nodes   2
set num_bs_nodes      1
# ================================================================
# check for boundary parameters and random seed
if { $opt(x) == 0 || $opt(y) == 0 } {
    puts "No X-Y boundary values given for wireless topology\n"
}
if {$opt(seed) > 0} {
    puts "Seeding Random number generator with $opt(seed)\n"
    ns-random $opt(seed)
}

# create simulator instance
set ns_ [new Simulator]

# uniform packet error model for the wireless link; rate is the 2nd argument
set erate [lindex $argv 1]
puts "erate $erate \n"
proc UniformErr {} {
    global erate
    set em [new ErrorModel]
    $em set rate_ $erate
    $em unit pkt
    $em ranvar [new RandomVariable/Uniform]
    return $em
}
$ns_ node-config -IncomingErrProc UniformErr -OutgoingErrProc UniformErr

# set up for hierarchical routing
$ns_ node-config -addressType hierarchical
AddrParams set domain_num_ 2            ;# number of domains
lappend cluster_num 2 1                 ;# number of clusters in each domain
AddrParams set cluster_num_ $cluster_num
lappend eilastlevel 1 1 2               ;# number of nodes in each cluster
AddrParams set nodes_num_ $eilastlevel  ;# of each domain

set tracefd [open wireless2.tr w]
#set namtrace [open wireless2.nam w]
$ns_ trace-all $tracefd
#$ns_ namtrace-all-wireless $namtrace $opt(x) $opt(y)

# Create topography object
set topo [new Topography]
#set mac80211 [new Mac/802_11]

# define topology
$topo load_flatgrid $opt(x) $opt(y)

# create God
create-god [expr $opt(nn) + $num_bs_nodes]

#create wired nodes
set temp {0.0.0 0.1.0}        ;# hierarchical addresses for wired domain
for {set i 0} {$i < $num_wired_nodes} {incr i} {
    set W($i) [$ns_ node [lindex $temp $i]]
}

# configure for base-station node
$ns_ node-config -adhocRouting $opt(adhocRouting) \
                 -llType $opt(ll) \
                 -macType $opt(mac) \
                 -ifqType $opt(ifq) \
                 -ifqLen $opt(ifqlen) \
                 -antType $opt(ant) \
                 -propType $opt(prop) \
                 -phyType $opt(netif) \
                 -channelType $opt(chan) \
                 -macTrace OFF \
                 -wiredRouting ON \
                 -agentTrace ON \
                 -routerTrace OFF \
                 -topoInstance $topo

#create base-station node
set temp {1.0.0 1.0.1 1.0.2 1.0.3}   ;# hier address to be used for wireless
                                     ;# domain
set BS(0) [$ns_ node [lindex $temp 0]]
$BS(0) random-motion 0               ;# disable random motion

#provide some co-ord (fixed) to base station node
$BS(0) set X_ 1.0
$BS(0) set Y_ 2.0
$BS(0) set Z_ 0.0

#configure for mobilenodes
$ns_ node-config -wiredRouting OFF
for {set j 0} {$j < $opt(nn)} {incr j} {
    set node_($j) [ $ns_ node [lindex $temp \
            [expr $j+1]] ]
    $node_($j) base-station [AddrParams addr2id \
            [$BS(0) node-addr]]
}

#create links between wired and BS nodes
$ns_ duplex-link $W(0) $W(1) 5Mb 2ms DropTail
$ns_ duplex-link $W(1) $BS(0) 5Mb 2ms DropTail
$ns_ duplex-link-op $W(0) $W(1) orient down
$ns_ duplex-link-op $W(1) $BS(0) orient left-down

# setup UDP connection carrying the CBR voice traffic
set udp1 [new Agent/UDP]
$udp1 set class_ 2
set null1 [new Agent/Null]
set cbr1 [new Application/Traffic/CBR]
$cbr1 set packetSize_ 32
$cbr1 set interval_ 0.020
$cbr1 attach-agent $udp1
$cbr1 set maxpkts_ 1          ;# per packet control
$ns_ attach-agent $node_(0) $udp1
$ns_ attach-agent $BS(0) $null1
$ns_ connect $udp1 $null1

# Define initial node position in nam
for {set i 0} {$i < $opt(nn)} {incr i} {
    # 20 defines the node size in nam, must adjust it according to your
    # scenario
    # The function must be called after mobility model is defined
    $ns_ initial_node_pos $node_($i) 5
}

# begin to read in per packet information, i.e. Voiced or Unvoiced
set pattern_file_name abmixed.vo
set pattern_fid [open $pattern_file_name r]
set cbrtime 0.0
set j -1
puts "Reading Speech Property Marking files.............."
while {[eof $pattern_fid]==0} {
    incr j
    gets $pattern_fid current_line
    scan $current_line "%d" voice_flag
    set r($j) $voice_flag
}

# schedule one packet at a time, setting the MAC retry limit before each send
set i 0
while {$i<=$j} {
    $ns_ at [expr $i*0.027] "Mac/802_11 retrNo 9;$cbr1 start"
    incr i
}
set opt(stop) [expr $i*0.02+10000]
$ns_ at $opt(stop) "$cbr1 stop"

# Tell all nodes when the simulation ends
# (loop counter initialised to 0; the printed listing omitted the value)
for {set i 0} {$i < $opt(nn) } {incr i} {
    $ns_ at $opt(stop).00001 "$node_($i) reset";
}
$ns_ at $opt(stop).00002 "$BS(0) reset";
$ns_ at $opt(stop).0002 "puts \"NS EXITING...\" ; $ns_ halt"
$ns_ at $opt(stop).01 "stop"

proc stop {} {
    global ns_ tracefd
    # $ns_ flush-trace
    close $tracefd
    # the nam trace is never opened above, so there is no namtrace to close
    #exec nam wireless2-out.nam &
    exit 0
}

puts "Starting Simulation..."
$ns_ run
[APPENDIX C] C Code for Majority-Logic Packet Combining
Packet combining techniques are used in the decoding process in packet
switched networks with an ARQ protocol. The motivation for packet combining is
that a received packet always contains at least a small amount of useful information.
This information can be used in conjunction with other received copies of the packet
to obtain an estimate of the transmitted data that is more reliable than that obtainable
from any single copy. There are two basic approaches to combining multiple received
packets: code combining and diversity combining.
Diversity combining differs from code combining in that multiple copies of a
packet encoded at rate R are combined bit by bit to create a single codeword from the
original rate R code. Each bit in the resulting packet is made more reliable through the
receipt of multiple copies of that bit. Although it is not as powerful as code combining,
diversity combining is much simpler to implement.
Majority-logic diversity combining is the use of multiple copies of each
transmitted bit in a voting scheme to obtain a single more reliable version of each bit.
Majority-logic packet combining rule
The majority-logic packet combining rule is the simplified majority-logic
decoding rule [44]. Let $J$ be the number of received copies of a packet.
Let $B_{i,k}$, $1 \le k \le J$, be the set of bits with the same position $i$ in the
$J$ packet copies. Let $\eta$ be the number of bits with the value one in the bit set
$B_{i,k}$. If $\eta \ge \lfloor J/2 \rfloor + 1$, $B_i$ in the final combined packet is
determined to have a value of one. If $\eta \le \lfloor (J-1)/2 \rfloor$, $B_i$ is
determined to have a value of zero. It should be noted that if $J$ is even, $\eta$ may
be equal to $J/2$, so that $\lfloor (J-1)/2 \rfloor < \eta < \lfloor J/2 \rfloor + 1$.
In this case, we can either increase $J$ to an odd number through a further
retransmission, or accept a 50 percent risk of error if there is no time for further
retransmissions.
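The rule above can be illustrated with a short Python sketch. It is a minimal illustration, not the C program listed in this appendix: the function name majority_combine is an assumption, and the tied case for even J is returned as None so that the caller can decide whether to wait for another retransmission or to guess.

```python
from typing import List, Optional, Sequence


def majority_combine(copies: Sequence[Sequence[int]]) -> List[Optional[int]]:
    """Combine J received copies of a packet bit by bit.

    Returns one entry per bit position: 1 or 0 when a strict majority
    exists, or None when J is even and the vote is tied (eta == J/2).
    """
    J = len(copies)
    if J == 0:
        raise ValueError("need at least one copy")
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("all copies must have the same length")

    combined: List[Optional[int]] = []
    for i in range(length):
        eta = sum(c[i] for c in copies)   # number of ones at position i
        if eta >= J // 2 + 1:
            combined.append(1)            # strict majority of ones
        elif eta <= (J - 1) // 2:
            combined.append(0)            # strict majority of zeros
        else:
            combined.append(None)         # tie, only possible for even J
    return combined
```

With three copies every position is decided; with two copies a position where the copies disagree is left undecided.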
C code for majority-logic packet combining and producing bit errors in payload:
#ifdef HAVE_CONFIG_H
#include <config.h>
#endif
#include <stdio.h>
#include <stdlib.h>

enum RXFrameType {
    RX_SPEECH_GOOD = 0,
    RX_SPEECH_PROBABLY_DEGRADED,
    RX_SPARE,
    RX_SPEECH_BAD,
    RX_SID_FIRST,
    RX_SID_UPDATE,
    RX_SID_BAD,
    RX_NO_DATA,
    RX_N_FRAMETYPES   /* number of frame types */
};

enum TXFrameType {
    TX_SPEECH = 0,
    TX_SID_FIRST,
    TX_SID_UPDATE,
    TX_NO_DATA,
    TX_N_FRAMETYPES   /* number of frame types */
};

typedef short Word16;
#define SERIAL_SIZE (1+244+4+1)

int main(int argc, char *argv[])
{
    FILE *file_serial, *lossfile, *losspattern;
    Word16 serial[SERIAL_SIZE], serial_noisy[6][SERIAL_SIZE];
    int frame, iCombine, i, j, iseed, iMajority, erase_flag;
    float rm, errate;
    char buf[50];

    if (argc < 6) {
        printf("Usage: crpacket amr_encodedfile loss_pattern_file output_lossfile Error_rate randomseed\n");
        exit(0);
    }
    if ((file_serial = fopen(argv[1], "rb")) == NULL) {
        printf("%s cannot be opened for read\n", argv[1]);
        exit(0);
    }
    if ((lossfile = fopen(argv[3], "wb")) == NULL) {
        printf("%s cannot be opened for write\n", argv[3]);
        exit(0);
    }
    if ((losspattern = fopen(argv[2], "rb")) == NULL) {
        printf("%s cannot be opened for read\n", argv[2]);
        exit(0);
    }
    //iCombine=atoi(argv[3]);
    errate = atof(argv[4]);
    iseed = atoi(argv[5]);
    frame = 0;
    srand48(iseed);

    while (fread(serial, sizeof(Word16), SERIAL_SIZE, file_serial) == SERIAL_SIZE)
    {
        printf("\nframe=%d ", ++frame);
        /* each pattern line holds: erase flag, number of extra copies to combine */
        fgets(buf, 50, losspattern);
        sscanf(buf, "%d %d", &erase_flag, &iCombine);
        if (iCombine < 0 || iCombine > 6) iCombine = 0;

        if (erase_flag == 0)
            serial[0] = TX_NO_DATA;
        else if (iCombine != 0) {
            /* generate iCombine independently corrupted copies of the packet */
            for (j = 0; j < iCombine; j++) {
                for (i = 0; i < SERIAL_SIZE; i++)
                    serial_noisy[j][i] = serial[i];
                for (i = 1; i < SERIAL_SIZE-5; i++) {
                    rm = drand48();
                    if (rm <= errate) serial_noisy[j][i] = !serial_noisy[j][i];
                }
            }
            /* corrupt original packet */
            for (i = 1; i < SERIAL_SIZE-5; i++) {
                rm = drand48();   /* Bernoulli random error */
                if (rm <= errate) serial[i] = !serial[i];
            }
            /* majority-logic packet combining over the iCombine+1 copies */
            for (i = 1; i < SERIAL_SIZE-5; i++) {
                iMajority = 1;    /* serial[] agrees with itself */
                for (j = 0; j < iCombine; j++)
                    if (serial[i] == serial_noisy[j][i]) iMajority++;
                /* flip the bit only when fewer than half of the copies agree
                   with it; the printed threshold (iCombine/2+iCombine%2)
                   missed narrow majorities when the number of copies is odd */
                if (2*iMajority < iCombine+1) {
                    serial[i] = !serial[i];
                    printf("combined ");
                }
            }
        }

        if (fwrite(serial, sizeof(Word16), SERIAL_SIZE, lossfile) != SERIAL_SIZE) {
            fprintf(stderr, "\nerror writing output file: %s\n", argv[3]);
        }
    }

    fflush(lossfile);
    fclose(file_serial);
    fclose(lossfile);
    fclose(losspattern);
    return EXIT_SUCCESS;
}
[APPENDIX D] List of Items Included in the Appended CD
The following items are included in the appended CD:
Thesis
The e-copy of the thesis (Word/PDF)
Papers
Papers published or going to be published (Word/PDF)
References
Papers/Documents referenced in the thesis
Presentation
Slides presented in the MRes Viva
Software
Programs developed for the project, including MATLAB/C++ source code and
scripts, together with related software tools (e.g. AMR codec and PESQ) and data
(e.g. ITU-T speech files).
[APPENDIX E] Published Papers
[1] Z.Li, L.Sun, Z.Qiao and E.Ifeachor, Perceived Speech Quality Driven
Retransmission Mechanism for Wireless VoIP, Proc. IEE 3G 2003 pp395-399, London,
UK, JUN 2003
PERCEIVED SPEECH QUALITY DRIVEN RETRANSMISSION MECHANISM FOR WIRELESS VoIP
Z Li, L Sun, Z Qiao and E Ifeachor Department of Communication and Electronic Engineering
University of Plymouth, Plymouth, U.K.
Abstract—Effective link layer retransmission mechanisms in wireless networks are important as they can reduce packet loss due to bit errors. For wireless voice over IP (VoIP), a key question that needs to be addressed in order to provide the best possible perceived speech quality is how to utilize retransmission schemes to recover corrupted packets whilst avoiding excessive retransmission delays. The contributions of this paper are twofold. First, we use an objective measure of perceived conversational speech quality (MOSc) as a metric to evaluate the performance of three current retransmission schemes (i.e. No Retransmission, Speech Property-Based Retransmission and Full Retransmission), while considering the impact of retransmission jitters. Our findings indicate that the performance of the retransmission mechanisms is a function of both wireless link quality and delay introduced in the wireline network. Second, we propose a new perceived speech quality driven retransmission mechanism which may be used to achieve optimum perceived speech quality for wireless VoIP (in terms of the objective mean opinion score) by switching to the most suitable retransmission scheme under different communication conditions.
I. INTRODUCTION
Quality of Service (QoS) support for voice over IP (VoIP) in wireless/mobile networks is an important issue for technical and commercial reasons. However, speech quality for VoIP suffers from high packet loss rates and other impairments in the wireless link. Retransmission mechanisms, such as automatic repeat request (ARQ), have been incorporated in wireless and cellular networks to retransmit lost packets and so improve the performance of data transmission over wireless links. In wireless networks such as 802.11b [1], the retransmission mechanism is a simple Stop & Wait algorithm implemented at the Medium Access Control (MAC) layer, in which each transmitted packet must be acknowledged before the next packet can be sent. If the sender of a frame does not receive an acknowledgement within a certain timeout period, it retransmits the frame until a maximum retransmission limit is reached. When the wireless link quality is poor, retransmission of MAC frames can effectively recover corrupted packets that contain bit errors.
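The Stop & Wait behaviour described above can be sketched as follows. This is a minimal Python illustration and not the 802.11 MAC implementation: stop_and_wait, retry_limit and the ack_received channel callback are hypothetical names introduced here, and timeouts are abstracted into the callback's return value.

```python
from typing import Callable, List, Tuple


def stop_and_wait(frames: List[str],
                  retry_limit: int,
                  ack_received: Callable[[str, int], bool]) -> List[Tuple[str, int, bool]]:
    """Send frames one at a time, retransmitting until ACKed or the limit is hit.

    ack_received(frame, attempt) stands in for the channel: it returns True
    when the given transmission attempt (1-based) is acknowledged in time.
    Returns (frame, attempts_used, delivered) for every frame.
    """
    log = []
    for frame in frames:
        attempts = 0
        delivered = False
        # one initial transmission plus at most retry_limit retransmissions;
        # the next frame is not sent until this one is ACKed or dropped
        while attempts <= retry_limit and not delivered:
            attempts += 1
            delivered = ack_received(frame, attempts)
        log.append((frame, attempts, delivered))
    return log
```

A larger retry_limit recovers more frames on a poor link, but every extra attempt adds delay before the next frame can be sent, which is the tradeoff examined in the rest of the paper.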
However, retransmission schemes may introduce excessive delays that have significant adverse effects on real-time applications such as VoIP, which are sensitive to delay. A simple retransmission scheme can therefore degrade perceived speech quality in VoIP. There is a tradeoff between packet loss and delay in any retransmission scheme. Improved retransmission mechanisms such as the Hybrid loss recovery scheme [2] and Speech Property-Based ARQ (SPB-ARQ) [3] have been proposed to reduce speech distortion by protecting packets that are perceptually more relevant. However, the evaluation of these schemes is limited to listening-only quality assessment and does not consider the impact of delay, which is important for conversation and interactivity. Further, these schemes do not consider the impact of retransmission jitters. Since adaptive jitter buffers would discard retransmitted packets that arrive too late, the character of the retransmission jitters introduced by different retransmission schemes should be considered.
The primary aim of the study reported in the paper is to investigate new retransmission mechanisms to improve speech quality for wireless VoIP. The contributions of the paper are twofold. First, we propose the use of a perceived conversational speech quality assessment method [4] to evaluate the performance of current retransmission mechanisms (No retransmission, Full retransmission, SPB retransmission) instead of listening-only method or individual network parameters (e.g. packet loss and delay). Second, we present a new retransmission policy, which can adapt to the most suitable retransmission mechanism, depending on the wireless link quality and network delay conditions. The ultimate aim of this perceived speech quality driven policy is to achieve optimum speech quality (in terms of the conversational Mean Opinion Score MOSc) in the face of network impairment factors and wireless channel situations, while considering the coupling effect of retransmission jitters and adaptive jitter buffers.
The paper is organized as follows. In Section II we describe the basic issues and methodology, including retransmission mechanisms, conversational speech quality evaluation and adaptive jitter buffers. Section III describes our simulation system. Simulation results and the proposed perceived speech quality driven retransmission scheme are presented in Section IV. Section V concludes this paper.
II. BASIC ISSUES AND METHODOLOGY
A. Speech Property-based Retransmission Mechanisms
Speech Property-Based QoS control schemes are based on the fact that some voice frames are perceptually more important than others when encoded speech is transferred through packet networks. Recent experimental results [5] show that in some popular codecs used in wireless applications (e.g. AMR) the position of a frame loss has a significant influence on the perceived speech quality. In such codecs, frame loss concealment techniques are used to interpolate the parameters for the lost frames from the parameters of the previous frames. Lost voice frames at the beginning of a talkspurt will be concealed using the decoding information of previous unvoiced frames. However, because voiced sounds have a higher energy than unvoiced sounds, concealment of these frames with unvoiced frames that have lower energy will cause a serious degradation in speech quality. Moreover, at the unvoiced/voiced transition stage, it is difficult for the decoder to correctly conceal the loss of voiced frames using the filter coefficients and the excitation for an unvoiced sound, especially when burst loss occurs or the frame size grows.
To maximise the perceptual quality at the receiving end, perceptually important voice packets may be protected by giving them a high priority, with the unimportant packets handled as 'best-effort'. SPB retransmission, a retransmission scheme that protects only the perceptually important speech frames, is presented in [2][3]. Experimental results reported in [2] show that SPB retransmission provides better speech quality (assessed by EMBSD) than the No retransmission scheme, which does not retransmit any packet. In [3], SPB retransmission was shown to be more efficient in reducing retransmission delays than Full retransmission, which retransmits every unacknowledged (unACKed) packet.
B. Measuring Conversational Speech Quality
In previous studies [2][3], retransmission schemes were assessed using the EMBSD algorithm, which only considers the distortion caused by packet loss. In practice, however, both packet loss and delay are crucial in voice conversation, and long retransmission delays (e.g. on top of a long network delay) can seriously impair speech quality. The E-model [6] was introduced by the ITU as a non-intrusive quality assessment method to obtain a measure of voice quality. Unfortunately, the E-model is only applicable to a limited number of codecs, which at present does not include the AMR codec. In our simulation, we employed a technique that combines PESQ and the E-model to evaluate the performance of the different retransmission schemes. In the combined approach, the ITU PESQ is first used to quantify the impact of packet loss on speech quality. The result is then converted to the equipment impairment Ie. The effect of the average end-to-end delay, Id, is then calculated. Finally, the E-model is used to obtain a measure of the speech quality, MOSc, based on Ie and Id (see Figure 1). Details of the implementation of the combined method are given in [4].
C. Adaptive Jitter Buffer and Retransmission Jitters
In VoIP applications, jitter is compensated for at the receiver by a jitter buffer. The size of a jitter buffer can be fixed or adjustable. Fixed jitter buffers cannot adapt readily to changes in network delay and as a result are not practical in real VoIP applications. In our study, we investigated fast-exp, one of the classical adaptive jitter buffer algorithms proposed in [7]. By using a smaller weighting factor when delays increase, the fast-exp algorithm can adapt quickly to the increases while avoiding discarding too many packets. When a packet arrives, it estimates the current mean network delay (denoted as $\hat{d}_i$) and the current variance of the network delay (denoted as $\hat{v}_i$). The mean delay estimate is given by:
$$\hat{d}_i = \begin{cases} \beta \hat{d}_{i-1} + (1-\beta)\, n_i, & n_i > \hat{d}_{i-1} \\ a\, \hat{d}_{i-1} + (1-a)\, n_i, & n_i \le \hat{d}_{i-1} \end{cases}$$

where $n_i$ is the network delay of the $i$-th packet, $\beta = 0.75$ and $a = 0.99802$. The following equation is used to estimate $\hat{v}_i$:

$$\hat{v}_i = a\, \hat{v}_{i-1} + (1-a)\, |\hat{d}_i - n_i|$$

At the beginning of a talkspurt, the adaptive jitter buffer changes the playout delay using the equation $D = \hat{d}_i + \mu \hat{v}_i$, where $D$ is the playout delay and
$\mu$ is a constant that can be selected from 1 to 20; we set $\mu = 4$ in our simulation. It should be noted that for VoIP over wireless, the network delay consists of the delays introduced by the wireline network and by the wireless link. Jitter can be introduced by congestion in the wireline network or by retransmission and propagation on the wireless link. Since most jitter buffer algorithms were proposed to compensate for congestion jitter, it is worthwhile to investigate the impact of retransmission jitter on VoIP over wireless.
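As an illustration, the fast-exp estimator and the playout rule described in this subsection can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the ns-2 implementation used in the study: the class and function names are hypothetical, the estimator is initialised with the delay of the first packet, and a packet is treated as late (and discarded) when its network delay exceeds the playout delay in force for its talkspurt.

```python
from dataclasses import dataclass

BETA = 0.75    # weighting factor used when the delay is increasing
A = 0.99802    # weighting factor otherwise (value as given in the text)
MU = 4         # variance multiplier used in the simulation


@dataclass
class FastExpEstimator:
    d_hat: float         # running estimate of the mean network delay (ms)
    v_hat: float = 0.0   # running estimate of the delay variation (ms)

    def update(self, n_i: float) -> None:
        """Fold the network delay n_i of a newly arrived packet into the estimates."""
        if n_i > self.d_hat:
            # Delay is rising: the smaller weight BETA tracks the increase quickly.
            self.d_hat = BETA * self.d_hat + (1 - BETA) * n_i
        else:
            self.d_hat = A * self.d_hat + (1 - A) * n_i
        self.v_hat = A * self.v_hat + (1 - A) * abs(self.d_hat - n_i)

    def playout_delay(self) -> float:
        """D = d_hat + MU * v_hat."""
        return self.d_hat + MU * self.v_hat


def simulate(delays, talkspurt_starts):
    """Return (playout delay applied to each packet, indices of late packets).

    Assumption: the playout delay is refreshed only at the first packet of a
    talkspurt, using the estimates accumulated from earlier packets.
    """
    est = FastExpEstimator(d_hat=delays[0])
    playout = est.playout_delay()
    applied, late = [], []
    for i, n_i in enumerate(delays):
        if i in talkspurt_starts:
            playout = est.playout_delay()
        applied.append(playout)
        if n_i > playout:
            late.append(i)   # arrives after its playout time, so it is discarded
        est.update(n_i)
    return applied, late
```

With a single talkspurt and delays of 100, 100, 160 and 100 ms, the third packet (for example an occasionally retransmitted one) is the only one marked late, which is the kind of loss attributed to retransmission jitter later in the paper.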
III. SIMULATION SYSTEM DESCRIPTION
Our study is based on the network simulator ns-2 [8], in which we simulated a last-hop wireless scenario. Both the IEEE 802.11 and the Ethernet protocol stacks are implemented in the simulator. A two-way Bernoulli error model was inserted to simulate transmission errors on the wireless link. In 802.11, a packet is fragmented if its size exceeds the Maximum Transmission Unit (e.g. 1500 bytes for WaveLAN). Since we set the packet size to 71 bytes, which carries one 12.2 kbit/s AMR speech frame per RTP packet, the impact of fragmentation is avoided.
The simulation system is shown in Figure 1. In our simulation, the original speech file is first encoded by the AMR codec and then analyzed to extract the speech marking information (voiced/unvoiced) for each packet. The speech marking information is used together with the network delay and the wireless link quality to control the retransmission policy.

Figure 1 Simulation Environment
[Block diagram showing the mobile host, the base station (BS) and the fixed host. Blocks: original speech, AMR encoder, speech marking, RTP/UDP/IP, MAC with Retx limit control, PHY with PER, Ethernet, network delay, adaptive jitter buffer, AMR decoder, degraded speech; speech quality evaluation with PESQ (MOS/Ie), end-to-end delay (Id) and the E-model producing MOSc.]

The error model determines whether a packet is corrupted according to the packet error probability (PER). The base station (BS) neither sends an ACK to the sender for a corrupted packet nor passes it to the higher layer. If the MAC layer of the sender does not receive an acknowledgement for a packet, it retransmits the packet until the packet is ACKed or the retransmission limit is reached (retransmission is denoted as Retx in the rest of this paper). In our simulation, we set the Retx limit to 6 for both SPB Retx and Full Retx. At the receiver, the received speech packets are fed to an adaptive jitter buffer and subsequently decoded to recover the degraded speech file, which is used to obtain a measure of speech quality.
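The retransmission behaviour described above can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not the ns-2 code used in the study: only corruption of the data packet is modelled (ACK losses are ignored), each attempt is corrupted independently with probability PER, and the function names and scheme labels are hypothetical.

```python
import random

RETX_LIMIT = 6  # Retx limit used in the simulation for SPB Retx and Full Retx


def retx_limit(scheme: str, voiced: bool) -> int:
    """Maximum number of retransmissions allowed for one packet."""
    if scheme == "full":
        return RETX_LIMIT                      # every unACKed packet is retransmitted
    if scheme == "spb":
        return RETX_LIMIT if voiced else 0     # only perceptually important packets
    if scheme == "none":
        return 0
    raise ValueError(f"unknown scheme: {scheme}")


def send_packet(per: float, limit: int, rng: random.Random):
    """Return (delivered, retransmissions_used) for one packet.

    The first attempt plus up to `limit` retransmissions are tried; each
    attempt is corrupted independently with probability `per`.
    """
    for attempt in range(limit + 1):
        if rng.random() >= per:   # this attempt was received correctly
            return True, attempt
    return False, limit


def residual_loss_probability(per: float, limit: int) -> float:
    """Probability that the first attempt and all retransmissions are corrupted."""
    return per ** (limit + 1)
```

Under these assumptions the residual link loss after 6 retransmissions is PER to the power of 7, which is negligible for the PER values studied. This is consistent with the observation in Section IV that, with a Bernoulli error model, most losses under SPB Retx and Full Retx come from the jitter buffer rather than from the link.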
In our study, we used the combined PESQ and E-model method described in Section II-B to evaluate conversational speech quality. The performance index was obtained by averaging the results of this method over each 20-second segment of the speech file.
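The following Python sketch shows one way the combined evaluation could be computed. It is only an illustration under stated assumptions, since the actual procedure is the one detailed in [4]: the R-to-MOS mapping is the standard conversion from ITU-T G.107, Ie is taken as the drop in R implied by the PESQ score relative to the default rating of 93.2, and Id uses a widely quoted piecewise-linear approximation of the delay impairment, which is an assumption and may differ from the calculation in [4]. All function names are hypothetical.

```python
R0 = 93.2  # default E-model rating with no delay or equipment impairment


def r_to_mos(r: float) -> float:
    """Map an E-model rating R to MOS using the ITU-T G.107 conversion."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


def mos_to_r(mos: float) -> float:
    """Invert r_to_mos by bisection on [6.5, 100], where the mapping is increasing."""
    lo, hi = 6.5, 100.0
    if mos <= r_to_mos(lo):
        return lo
    if mos >= 4.5:
        return hi
    for _ in range(60):
        mid = (lo + hi) / 2
        if r_to_mos(mid) < mos:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def delay_impairment(delay_ms: float) -> float:
    """Assumed approximation of Id: linear, with a steeper slope above 177.3 ms."""
    id_value = 0.024 * delay_ms
    if delay_ms > 177.3:
        id_value += 0.11 * (delay_ms - 177.3)
    return id_value


def mos_c(pesq_mos: float, delay_ms: float) -> float:
    """Conversational MOS from a PESQ score (loss effect) and end-to-end delay."""
    ie = R0 - mos_to_r(pesq_mos)               # equipment impairment from PESQ
    r = R0 - ie - delay_impairment(delay_ms)   # combine loss and delay effects
    return r_to_mos(r)
```

For a fixed PESQ score, `mos_c` decreases as the end-to-end delay grows, and it decreases faster once the delay passes the knee of the assumed Id curve, which is the qualitative behaviour the evaluation in Section IV relies on.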
IV. RESULT ANALYSIS AND THE PROPOSED RETRANSMISSION SCHEME
The following simulation results were obtained by averaging the results of 50 simulations with different random seeds, to avoid the impact of particular packet loss locations. The three simulated retransmission schemes are SPB Retx, Full Retx and No Retx.
Table 1 gives the average number of voiced packet losses when transmitting 73000 speech packets over our simulated wireless network with these schemes. For simplicity, we only simulated the wireless link for this study, so only the wireless link (Retx limit exceeded) and the adaptive jitter buffer account for the packet losses. In Table 1, most of the losses of voiced packets under Full Retx or SPB Retx are caused by the jitter buffer: because we deployed a Bernoulli error model in our simulation, most of the retransmitted packets are successfully received. If the burstiness of packet errors were considered, there would be more losses of voiced packets under the Full Retx or SPB Retx scheme.
TABLE 1 Average Voiced Packet Losses with the fast-exp Jitter Buffer

PER       No Retx   SPB Retx   Full Retx
0.0001    15        53         29
0.0005    36        54         27
0.0008    61        51         26
0.001     69        47         22
0.003     144       28         17
0.005     241       22         13
0.01      474       13         9
0.05      2344      42         16
0.10      4678      931        159
It seems straightforward that SPB Retx should be better than No Retx and at least as good as Full Retx at protecting voiced frames. However, Table 1 shows that Full Retx always has fewer voiced packet losses than SPB Retx, while No Retx has the fewest lost voiced packets when link quality is good (packet error probability lower than 0.0005). The reason is that, in the fast-exp algorithm, the estimated playout delay increases with the amount of retransmission jitter. When link quality is good, the estimated playout delay stays at a low level, so occasionally retransmitted packets, and the packets adjacent to them, are discarded by the jitter buffer because of the jitter they introduce. In the No Retx scheme, however, a corrupted packet does not affect the packets that follow it. That is why it has the fewest packet losses when link quality is very good. On the other hand, in SPB Retx, unvoiced packets are not retransmitted, hence the estimated playout delay cannot reflect the current wireless link situation when link quality becomes worse. In Full Retx, every unACKed packet is retransmitted, which helps the adaptive jitter buffer to estimate the playout delay for the next talkspurt. That is why the adaptive jitter buffer discards more packets in SPB Retx than in Full Retx.

Figure 2 and Figure 3 give the overall packet loss rates and buffered retransmission delay comparison. In Figure 2, we can see that Full Retx keeps the packet loss rate at a low level at the expense of higher delay, as plotted in Figure 3, because every unACKed packet is retransmitted. Interestingly, when link quality is not too bad (packet error probability up to 0.01), the packet loss rate of the Full Retx scheme decreases as link quality becomes worse. As mentioned before, with worse link quality, more retransmissions help the jitter buffer to estimate the playout delay more accurately. However, when link quality is very good (packet error probability up to 0.0005), No Retx obtains the best packet loss rate because it does not introduce any jitter and few packets are corrupted by bit errors. As a compromise, the packet loss rate and Retx delay of SPB Retx lie between those of No Retx and Full Retx.

Figure 2 Overall packet loss rate comparison [loss rate (%) versus packet error probability; curves: No Retx, SPB Retx, Full Retx]
Figure 3 Buffered retx delay comparison [buffered Retx delay (ms) versus packet error probability; curves: No Retx, SPB Retx, Full Retx]
Figure 4 MOSc comparison with 175ms network delay [MOSc versus packet error probability; curves: Perceived Quality Driven, No Retx, SPB Retx, Full Retx]
Figure 5 MOSc comparison with packet error probability 0.001 [MOSc versus network delay; curves: Perceived Quality Driven, No Retx, SPB Retx, Full Retx]
Using the evaluation method described in Section II-B, we give a more direct performance comparison of these schemes in Figure 4 and Figure 5, with MOSc as the metric. To focus on the performance of the Retx schemes, our evaluation did not consider packet losses introduced in the wireline network. However, we did consider network delay. In conversation, delays lower than 100ms are hardly noticeable, but delays above 150ms clearly affect conversational interactivity [9]. Since Retx delays rarely exceed 100ms, we assume that a 175ms delay has been introduced in the wireline network and add it to the end-to-end delay in the MOSc evaluation, so that the impact of Retx delay is clearly visible. In Figure 4, the MOSc of Full Retx is lower than that of No Retx and SPB Retx when the packet error probability is lower than 0.003. This is because the Full Retx scheme always introduces more Retx delay, and the perceived speech quality is sensitive to high delay when link quality is good. When the packet error probability exceeds 0.003, Full Retx becomes the best scheme, as it greatly reduces the number of corrupted packets. Figure 5 illustrates the performance comparison for different network delays when the packet error probability is 0.001. When the delay is lower than 150ms, Full Retx achieves the best MOSc. When the delay is higher than 150ms, No Retx becomes the best, which confirms that 150ms is the threshold above which delay begins to have a severe impact on speech quality. As in Figure 4, the performance of SPB Retx lies between that of No Retx and Full Retx, and it is not the best scheme on either side of the delay threshold.
Since No Retx and Full Retx each achieve the best MOSc under different link quality and network delay conditions, we propose a new perceived speech quality driven retransmission scheme, which switches between these two schemes when link quality and network delay change. The pseudo code of the new scheme is shown in Figure 6. Low_Error_Threshold is set to 0.0005 and High_Error_Threshold to 0.003 because, according to the simulation results, No Retx achieves the best MOSc when the packet error probability is lower than 0.0005 even if delay is not considered, whereas Full Retx becomes the best when the packet error probability exceeds 0.003 even if the network delay is very high. When the packet error probability is between 0.0005 and 0.003, the decision is made according to the network delay. In the proposed scheme, Delay_Threshold is set to 150ms, as this is the threshold at which delay begins to clearly affect speech quality. In real applications, the Bit Error Rate (BER) can be converted to PER, and the BER can be obtained from the bit errors in bit pattern series sent from the BS. Network delay can be estimated by deducting the average MH to BS handoff delay from the average end-to-end delay, which can be retrieved from the RTP packet header.
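The decision rule can also be written as a small runnable Python function, using the thresholds stated above. The function names are hypothetical. The BER to PER conversion is an assumption added for illustration, since the text does not specify one: it treats bit errors as independent, so a packet is in error if any of its bits is corrupted, and it uses the 71-byte packet size from Section III as the default.

```python
LOW_ERROR_THRESHOLD = 0.0005
HIGH_ERROR_THRESHOLD = 0.003
DELAY_THRESHOLD_MS = 150.0


def ber_to_per(ber: float, packet_bytes: int = 71) -> float:
    """Assumed conversion: independent bit errors over a packet of packet_bytes."""
    return 1.0 - (1.0 - ber) ** (8 * packet_bytes)


def select_scheme(per: float, network_delay_ms: float) -> str:
    """Choose between No Retx and Full Retx from link quality and network delay."""
    if per < LOW_ERROR_THRESHOLD:
        return "No Retx"       # good link: retransmission jitter costs more than it saves
    if per > HIGH_ERROR_THRESHOLD:
        return "Full Retx"     # poor link: loss dominates, even with high delay
    if network_delay_ms < DELAY_THRESHOLD_MS:
        return "Full Retx"     # delay budget can absorb the Retx delay
    return "No Retx"
```

For example, `select_scheme(0.001, 100.0)` selects Full Retx while `select_scheme(0.001, 175.0)` selects No Retx, matching the behaviour on either side of the 150ms threshold reported for Figure 5.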
The performance of the new perceived speech quality driven scheme is also given in Figure 4 and Figure 5 for different packet error probabilities and network delays. The curve of the perceived quality driven scheme overlaps the parts of the No Retx and Full Retx curves where each achieves the best MOSc, because the scheme switches to the more suitable of No Retx and Full Retx when communication conditions change. Since this method uses Full Retx only when necessary, it can also achieve a retransmission efficiency similar to that of SPB Retx while avoiding the implementation complexity of obtaining the speech property information that SPB Retx requires.
V. CONCLUSION
A suitable retransmission scheme is crucial for obtaining the best possible perceived speech quality in wireless VoIP applications. In this paper, we investigated the performance of three retransmission schemes (No Retx, SPB Retx, Full Retx) with regard to perceived conversational speech quality. The impact of retransmission jitter with an adaptive jitter buffer was also considered. The simulation results show that the performance of these schemes depends on the network delay and the wireless link quality. Considering that the wireless environment is variable, we have proposed a perceived speech quality driven retransmission scheme that adapts to the wireless link quality and network delay conditions. As SPB Retx is not involved in the new method, the implementation complexity of retrieving speech property information is avoided. Our results show that the proposed method achieves the best MOSc among No Retx, Full Retx and SPB Retx, because it deploys the most suitable scheme when communication conditions change. In this study, a simplified last-hop wireless network was implemented to demonstrate a wireless voice over IP scenario. Further improvements may be achieved by making the simulation closer to a real network, e.g. by incorporating a multi-state error model for the wireless link.
if (PER < Low_Error_Threshold)
    No_Retx();
else if (PER > High_Error_Threshold)
    Full_Retx();
else {
    if (Network_Delay < Delay_Threshold)
        Full_Retx();
    else
        No_Retx();
}
Figure 6 Perceived speech quality driven Retx scheme pseudo code

References
[1] IEEE Standards Department, 1999, IEEE 802.11 Standard for Wireless LAN, Medium Access Control (MAC) and Physical Layer (PHY) Specification.
[2] C. Hoene, I. Carreras, A. Wolisz, 2001, Voice over IP: Improving the Quality Over Wireless LAN by Adopting a Booster Mechanism – An Experiment Approach, Proc. SPIE 2001 - Voice Over IP (VoIP) Technology, pp. 157-, Denver, Colorado, USA.
[3] H. Sanneck, N. Tuong L. Le et al., 2001, Selective Packet Prioritization for Wireless Voice over IP, 4th Int. Sym. Wireless Personal Multimedia Communication, Denmark.
[4] L. Sun, E.C. Ifeachor, 2003, Prediction of Perceived Conversational Speech Quality and Effects of Playout Buffer Algorithms, to appear in the Proc. of IEEE ICC 2003.
[5] L.F. Sun, G. Wade, B.M. Lines and E.C. Ifeachor, 2001, Impact of Packet Loss Location on Perceived Speech Quality, Proceedings of 2nd IP-Telephony Workshop (IPTEL '01), Columbia University, New York, pp. 114-122.
[6] ITU-T G.107, The E-model, a computational model for use in transmission planning, May 2000.
[7] R. Ramjee, J. Kurose, D. Towsley and H. Schulzrinne, 1994, Adaptive playout mechanisms for packetized audio applications in wide-area networks, Proc. of IEEE INFOCOM, vol. 2, pp. 680-688.
[8] The Network Simulator - ns-2, available online at http://www.isi.edu/nsnam/ns/
[9] ITU-T G.114, One-Way Transmission Time, Feb 1999.