A real-time QoE methodology for AMR codec voice in mobile network

RESEARCH PAPER. Special Focus on Convergence Communications

SCIENCE CHINA Information Sciences

April 2014, Vol. 57 042308:1–042308:13

doi: 10.1007/s11432-014-5074-z

© Science China Press and Springer-Verlag Berlin Heidelberg 2014 info.scichina.com link.springer.com

LI WenZhi, WANG Jing∗, XING ChengWen, FEI ZeSong & KUANG JingMing

School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China

Received October 26, 2013; accepted December 20, 2013

Abstract This paper studies a general strategy for predicting voice Quality of Experience (QoE) in various mobile networks. In particular, a novel QoE assessment methodology based on data mining is proposed for Adaptive Multi-Rate (AMR) codec voice. The proposed algorithm consists of two parts: the first assesses the speech quality of the fixed rate codec mode (CM) of AMR, and the second is designed for the adaptive rate CM. By measuring basic network parameters that strongly affect speech quality, operators can monitor QoE in real time. Based on measurement data sets from a real mobile network, the QoE prediction strategy is implemented, and the QoE assessment model for AMR codec voice is trained and tested. The numerical results suggest that the correlation coefficient between predicted and true values is greater than 90% and that the root mean squared error is less than 0.5 for both fixed and adaptive rate CM.

Keywords quality of experience (QoE), speech quality, adaptive multi-rate (AMR), objective speech quality

measurement, data mining, multivariate adaptive regression splines (MARS)

Citation Li W Z, Wang J, Xing C W, et al. A real-time QoE methodology for AMR codec voice in mobile

network. Sci China Inf Sci, 2014, 57: 042308(13), doi: 10.1007/s11432-014-5074-z

1 Introduction

Wireless operators are placing increasing emphasis on the Quality of Experience (QoE) of services delivered over telecommunication networks. High QoE helps to improve the perception of services and the operators’ brand value, which correlates closely with users’ loyalty and operators’ income. In particular, convergence networks, which transport voice, data, and video within a single network, are an inevitable trend, so operators must pay more attention to the QoE of the various services in a convergence network. The International Telecommunication Union (ITU) defines QoE as “the overall acceptability of an application or service, as perceived subjectively by the end-user” [1]; QoE thus describes the end user’s real perception of the accessibility, retainability, and integrity of a particular service. It differs from traditional Quality of Service (QoS), whose parameters focus on the network or the service itself rather than on the user’s experience. For example, a coverage ratio of 95% is commonly guaranteed as a traditional QoS metric [2], and Setup Time and Cutoff Call Ratio are used as QoS parameters of voice service [3]. All of these reflect the health of the network or service, but they do not tell operators how users actually perceive the service. It is the user’s experience that matters. Customers will never say what the coverage ratio is; they will just say how awful the service is, which may result from the coverage ratio and other factors. Low QoE leads users to reject the service. For voice service in particular, the speech quality (or voice quality) during a conversation reflects users’ experience the most.

∗Corresponding author (email: [email protected])

In current mobile communication systems, the upper bound of speech quality is usually determined by the performance of the speech coder. Therefore, an advanced source coding scheme called Adaptive Multi-Rate (AMR), recommended in [4–6], is used in most mobile systems, such as Global System for Mobile Communications (GSM), Universal Mobile Telecommunications System (UMTS), and Long Term Evolution. The AMR codec combines speech coding and channel coding, aiming to maintain good speech quality under varying channel conditions. It includes AMR-Wide Band and AMR-Narrow Band (AMR-NB). Taking AMR-NB as an example, there are 14 codec modes (CMs) in total (eight in full rate (FR) and six in half rate (HR)), which are selected adaptively according to the dynamics of the channel; this is called link adaptation. AMR can improve speech quality substantially, because it keeps the network connection robust under poor channel conditions and raises speech quality under good channel conditions. AMR has also been adopted as the standard speech codec by the 3rd Generation Partnership Project (3GPP) to provide a high level of voice service. Therefore, predicting voice service QoE for the AMR codec is of considerable practical significance.

Numerous methodologies for estimating voice service QoE have been studied in existing work, including several industrial standards. These methods fall into two classes: subjective and objective. QoE is commonly measured by subjective assessment: each speech sample is scored with an integer from 1 (worst) to 5 (best) by a number of listeners, and a Mean Opinion Score (MOS) is then calculated [7]. Although subjective assessment can achieve high accuracy, it is time-consuming and requires large investments in equipment and manpower. This makes it unsuitable for monitoring QoE in real time and has prompted the development of objective speech quality assessments.

Objective models can be divided into intrusive (full-reference) and nonintrusive (no-reference) measures, depending on whether the original clean voice is used. No-reference methods can be further classified as speech parameter-based or network parameter-based. Full-reference methods and speech parameter-based methods are limited by their need for a large number of degraded speech samples, which cannot be collected during a call. The widely used Perceptual Evaluation of Speech Quality (PESQ) [8] and its successor, Perceptual Objective Listening Quality Assessment (POLQA), are full-reference methods. POLQA is the next-generation mobile voice quality testing standard, developed to meet the super-wideband requirements of High Definition voice [9]. Refs. [10,11] summarize the standards for measuring speech QoE in detail. Several speech parameter-based intrusive methods are described in [12].

Consequently, for operators, only network parameter-based assessments offer a feasible way to monitor QoE. Nonintrusive network parameter-based measurements are proposed in [13,14], but some of the radio parameters they rely on, such as TxPow and Latency in [13] and Frame Erasure Rate and Length of Erased Frames in [14], cannot be acquired from the operators’ signaling monitoring platform, so operators cannot monitor the QoE of voice service with these two methods. Another network parameter-based measurement for GSM, called Speech Quality Index (SQI), is proposed by Karlsson et al. [15]. SQI is then extended to the AMR codec by Wanstedt et al. [16], but the algorithm proposed in [16] can only assess the fixed rate CM. Recently, machine learning-based objective QoE methodologies have been studied in [17–19]. Specifically, a measurement based on Adaptive Network-based Fuzzy Inference Systems is proposed in [17,18], and a Bayesian Network-based model is used to predict users’ QoE in [19], where the model is trained on data generated by the OPNET simulator for IEEE 802.11g. However, the works in [17,18] consider only the network parameters Rxqual and Rxlev in 2G, and RSCP and Ec/N0 in 3G; to the best of our knowledge, the existing work cannot reveal the overall impact of the radio link on voice QoE.

In summary, for operators, subjective QoE methods are time-consuming, and full-reference objective methods and speech parameter-based methods depend on degraded speech samples that are not available when monitoring voice service QoE. Network parameter-based assessment offers a possible solution, but existing approaches still fall short in three respects: the sufficiency of the network parameters, the computational complexity, and whether the data set used for model training is obtained from an actual network.

Compared with the existing works, the main contributions of this paper are as follows. A general solution based on regression methods is proposed to predict QoE in various mobile networks. In particular, a no-reference, network parameter-based QoE assessment called Radio Speech Quality (RSQ) is proposed, mainly for AMR codec voice. The assessment model consists of two parts: the first estimates the speech quality of the fixed rate CM of AMR, and the second, which builds on the results of the first, is designed for the adaptive rate CM. The first part is also suitable for other fixed rate CMs, such as enhanced full rate, FR, and HR in GSM. The parameters of the proposed model can be acquired in real time, so the measurement itself can also run in real time. Additionally, using a data set from a real network rather than from simulation enhances the accuracy and practicability of the model.

The remainder of this paper is organized as follows. The general description of voice QoE assessment

methodology is given in Section 2. Testbed setup and data collection are described in Section 3. Section 4

presents the AMR speech quality measurement in detail. Result analysis is given in Section 5. Finally,

conclusions are drawn in Section 6.

2 QoE assessment methodology

The proposed QoE prediction methodology allows operators to assess voice service QoE by measuring certain network parameters, or network Key Performance Indicators, in mobile radio access networks. The methodology includes the following three steps.

1) Data Acquisition and Analysis: Measurement data are usually obtained by Drive Test (DT). The test voice calls are conducted according to a measurement scenario that specifies the measurement approach, data format, data size, and so on. Meanwhile, signaling is captured by signal monitoring equipment, and speech samples are recorded at the calling station and the called station, respectively.

Data analysis involves two aspects. First, the network parameters that have a great impact on voice quality must be extracted from the captured signaling data. For example, Received Signal Quality Sub values (RxqualSub) and Received Signal Level Sub values (RxlevSub) are selected in GSM. Second, the number of data sets and the data distribution must be checked roughly to decide whether extra measurements are needed.

2) Model Building and Algorithm Design: After data acquisition and analysis are completed, the QoE prediction algorithm is constructed; this is the data training process shown in Figure 1. a) For the fixed rate CM of AMR, data preprocessing is applied first to describe data characteristics such as the average, variance, and maximum and minimum values. Then, scatter diagrams, box plots, and histograms are plotted to visualize the relationship between the network parameters and the QoE value. Outlier detection is performed at the same time, and abnormal data are removed. Next, Principal Components Analysis (PCA) is applied to reduce the data dimension and overcome multicollinearity. Finally, the mapping model from Principal Components (PCs) to QoE is built by Multivariate Adaptive Regression Splines (MARS). MARS captures the nonlinear relationship between independent and dependent variables well and has low complexity compared with an Artificial Neural Network. b) For the adaptive rate codec, the specific CM is deduced for every time interval, and the corresponding QoE values QoEi (i = 1, . . . , n) are then computed. The final QoE of the AMR codec is evaluated by Multiple Linear Regression (MLR). The independent variables are the weighted sum of QoEi and some new variables related to CMs, such as Forward num and Backward num, which are described in detail in Section 4.

3) QoE Prediction: The QoE assessment method predicts QoE in two ways. The first takes place during model training in Step 2, and the second happens only after the QoE prediction model is established. a) All of the measurement data sets from Step 1 are divided into a training set and a test set. The test set is part of the original data set, without removal of the abnormal data. Prediction on the test set is done after each epoch of training, with the aim of finding the best-performing model. If the testing results do not fit the reference values well, the training process is restarted. b) Once the final model is established, a specific QoE computational expression can be formulated. A predicted QoE value is computed whenever a group of valid network parameters is supplied. This is less time-consuming and has lower complexity than training the prediction model.

Figure 1 Diagram of the proposed algorithm design. [Block diagram. Fixed rate codec: data preprocessing, plot, remove abnormal data in two instances, PCA, mapping model. Adaptive rate codec: codec mode decision, QoE1 to QoEn, weighted sum, new variables related to CMs, MLR, speech quality.]

Figure 2 System architecture of testing platform. [Block diagram with the elements MS A, channel simulator, BTS, BSC, MSC, BSC, BTS, and MS B, the uplink and downlink, the Abis interfaces, the signaling monitoring system, and PESQ.]

3 Data acquisition and analysis

For convenience, the QoE methodology presented in this paper is restricted to the case of the GSM network. The method is also applicable to other mobile systems, such as Time Division-Synchronous Code Division Multiple Access. Besides, we focus on the impact of the air interface, or radio link, on speech quality, because the Core Network and the wired link cause little degradation of voice.

3.1 Data acquisition

The measurement data are collected using the network equipment in China Mobile Labs. The system architecture of the testing platform is shown in Figure 2, and the configuration related to the AMR codec is given in Table 1. Drive Test (DT) terminals are used as Mobile Station A (MS A) and MS B. Note that the connection between MS B and the Base Transceiver Station (BTS) is wired rather than wireless, and as a result the impact on speech quality from the radio link between MS A and BTS is the same as that between MS B and BTS. The use of a channel simulator contributes to the ergodicity of the channel conditions and, consequently, to the data integrity.

During the testing process, MS A, the calling station, repeatedly calls MS B, the called station, and the two talk to each other continuously. Degraded speech samples of the uplink and the downlink, which are obtained at the Abis interface (the interface between the BTS and the Base Station Controller (BSC)) and at the calling station (MS A), respectively, are evaluated with ITU-T P.862.1 MOS-LQO [20] because of its high correlation with the subjective MOS. The corresponding network parameters, such as Rxqual and Rxlev, are acquired via the signal monitoring system, which facilitates the monitoring of network performance together with service quality and keeps the cost of upgrading or modifying current equipment low. Specifically, the network parameters of downlink quality are measured by the MS, packed into a Measurement Report (MR), and sent on the uplink channel to the GSM network, while the radio uplink parameters are measured by the BTS directly. Network parameters related to both uplink and downlink quality are reported to the BSC by the BTS via MR, so they can be obtained at the standardized Abis interface.

Table 1 Parameters configuration

Parameters                  Configuration
Channel mode                Adaptive Full-rate Speech (AFS)
Active Codec Set (ACS)      set1: 4.75 kbps, 5.9 kbps, 7.4 kbps, 12.2 kbps
Initial Codec Mode (ICM)    12.2 kbps
Threshold (THR)             [7 dB, 8.5 dB, 11.5 dB]
Hysteresis (HYST)           [1.5 dB, 1.5 dB, 2 dB]

3.2 Network parameters selection

Network parameters identified as particularly relevant to the resulting speech quality are selected. All candidates must be measurable in the physical layer so that the proposed algorithm can run in real time. For the GSM network, the analysis of network parameters is based on the MR signaling reported to the BSC every 480 ms. Each clean speech sample has a duration of 4.8 s (PESQ recommends that the reference voice contain at least 3.2 s of active speech [21]), which is exactly 10 MR intervals. Therefore, the proposed algorithm assesses speech quality every 4.8 s in real time. The parameters used for the initial predictions, described further below, are as follows. Table 2 shows where each parameter is acquired and how important it is.

1) RxqualSub or RxqualFull: Rxqual is an integer value between 0 and 7, where each value corresponds

to a specific range of Bit Error Rate. The RxqualSub must be used if Discontinuous Transmission (DTX)

is used, otherwise the RxqualFull is preferred due to its higher confidence.

2) RxlevSub or RxlevFull: Rxlev ranging from 0 to 63 is mapped linearly to the received power level

at MS, which ranges from −110 dBm to −47 dBm. The selection criterion of Sub value or Full value is

just the same as that of Rxqual.

3) HO: The number of handovers, including Intracell Handover and Intercell Handover.

4) Codec: The voice coding used in GSM; here it refers specifically to AMR.

5) AMR configuration: Parameters configuration related to AMR codec, described in Table 1.
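The linear Rxlev mapping in item 2 can be written as a one-line conversion. The following is a minimal sketch that takes the range stated above (0 to 63 mapped onto −110 dBm to −47 dBm) at face value; the function name `rxlev_to_dbm` is an illustrative assumption and not part of any signaling interface.

```python
def rxlev_to_dbm(rxlev: int) -> int:
    """Map an Rxlev index (0..63) to a received power level in dBm.

    Assumes the linear mapping stated in the text:
    0 -> -110 dBm and 63 -> -47 dBm, i.e. one index step per dB.
    """
    if not 0 <= rxlev <= 63:
        raise ValueError("Rxlev must lie in 0..63")
    return -110 + rxlev
```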

Note that for each speech sample, one PESQ (MOS-LQO) score is evaluated and 10 network parameter sets are available, because the speech sample lasts 4.8 s while a parameter set is reported every 480 ms. The data matrix of each speech sample, also called a single set of data $D_{\mathrm{single}\,i}$, is

$$D_{\mathrm{single}\,i} = \begin{bmatrix} \mathbf{Rxqual} & \mathbf{Rxlev} & \mathbf{HO} & \mathbf{Codec} & \mathbf{MOS} \end{bmatrix}_{10\times 5}, \tag{1}$$

where $\mathbf{Rxqual}_{10\times 1}$ is the Rxqual vector and $\mathbf{Rxlev}_{10\times 1}$ is the Rxlev vector. Thus, all of the collected observations $D_{\mathrm{total}}$ can be written as

$$D_{\mathrm{total}} = \begin{bmatrix} D_{\mathrm{single}\,1} & D_{\mathrm{single}\,2} & \cdots & D_{\mathrm{single}\,n} \end{bmatrix}^{\mathrm{T}}_{(10n)\times 5}, \tag{2}$$

where n is the total number of collected speech samples, also called the sample capacity.
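As an illustration of the layout in (1) and (2), the sketch below assembles the matrices with NumPy. The function names and the numeric encoding of the Codec column are assumptions made for this example; HO, Codec, and MOS are treated as per-sample values repeated on every row.

```python
import numpy as np

MR_PER_SAMPLE = 10  # 4.8 s speech sample / 480 ms MR interval

def build_single(rxqual, rxlev, ho, codec, mos):
    """Return the 10x5 matrix of (1) for one speech sample.

    rxqual, rxlev: sequences of 10 per-MR values.
    ho, codec, mos: per-sample scalars, repeated on every row
    (the numeric codec code is a hypothetical encoding).
    """
    rxqual = np.asarray(rxqual, dtype=float)
    rxlev = np.asarray(rxlev, dtype=float)
    if rxqual.shape != (MR_PER_SAMPLE,) or rxlev.shape != (MR_PER_SAMPLE,):
        raise ValueError("expected 10 Rxqual and 10 Rxlev values")
    const = np.ones(MR_PER_SAMPLE)
    return np.column_stack([rxqual, rxlev, ho * const, codec * const, mos * const])

def build_total(singles):
    """Stack n single sets row-wise into the (10n)x5 matrix of (2)."""
    return np.vstack(singles)
```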


Table 2 Network parameters

No.   Network parameter   Importance   Acquisition location
                                       Uplink     Downlink
1     Rxqual              Important    Abis       MR
2     Rxlev               Important    Abis       MR
3     Codec               Important    Abis       Abis
4     HO                  Important    Abis       Abis

Table 3 New variable set

No.   Variable name   Description                                                                                   Application
1     Mean            $\mathrm{Mean}_j = \frac{1}{10}\sum_{k=1}^{10} x_j(k)$                                         Rxqual, Rxlev
2     Std             $\mathrm{Std}_j = \sqrt{\frac{1}{9}\sum_{k=1}^{10}\left(x_j(k)-\mathrm{Mean}_j\right)^2}$      Rxqual, Rxlev
3     Max             The maximum value in a single set of data                                                     Rxqual, Rxlev
4     Medi            The median in a single set of data                                                            Rxqual, Rxlev

4 Algorithm design

4.1 Fixed rate CM

After the network parameters (the predictors, in data-mining terms) are selected, further data analysis is performed, including data preprocessing and PCA. Data preprocessing transforms the 10 network parameter sets in Dsingle i into a single set that serves as the input data for training the mapping model. PCA is adopted for data reduction before model training. There are four bit rates in AMR FR ACS set1, as shown in Table 1; we take 4.75 kbps as an example to illustrate the algorithm design for the fixed rate CM.

4.1.1 Data preprocess

First, a new set of variables, described in Table 3, is computed from the Rxqual vector and the Rxlev vector to capture the data fluctuation, average, and so on. All observations can then be written as the matrix

$$\begin{bmatrix} \mathbf{Y}_1 & \cdots & \mathbf{Y}_m & \mathbf{HO}_{n\times 1} & \mathbf{MOS}_{n\times 1} \end{bmatrix}_{n\times(m+2)},$$

where $\mathbf{Y}_i = \begin{bmatrix} Y_{i1} & \cdots & Y_{ij} & \cdots & Y_{in} \end{bmatrix}^{\mathrm{T}}$ and j, ranging from 1 to n, is the speech sample index. The number n denotes the total number of observations, and $\{\mathbf{Y}_i\}_1^m$ is the new variable set indicated in Table 3.

The elements of $\{\mathbf{Y}_i\}_1^m$ may be measured in different units, so standardization is needed to prevent the variables with the largest variances from dominating the first few PCs when PCA is performed. The resulting Z-vectors are

$$\{\mathbf{Z}_i\}_1^m = \left\{ \frac{\mathbf{Y}_i - \mathrm{Mean}(\mathbf{Y}_i)}{\mathrm{Std}(\mathbf{Y}_i)} \right\}_1^m, \tag{3}$$

where

$$\mathbf{Z}_i = \begin{bmatrix} Z_{i1} & \cdots & Z_{ij} & \cdots & Z_{in} \end{bmatrix}^{\mathrm{T}}. \tag{4}$$

Therefore, after the data preprocessing, the original data set can be written as

$$\begin{bmatrix} \mathbf{Z}_1 & \cdots & \mathbf{Z}_m & \mathbf{HO}_{n\times 1} & \mathbf{MOS}_{n\times 1} \end{bmatrix}_{n\times(m+2)}. \tag{5}$$

Note that the preprocessing excludes the variables HO and MOS because they are constant within a single set of data Dsingle i, while the values of Rxqual and Rxlev change continuously.
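The preprocessing described in this subsection can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the four Table 3 variables are computed for both Rxqual and Rxlev, which gives m = 8 here, and the standardization in (3) is assumed to use the sample standard deviation. The function names are hypothetical.

```python
import numpy as np

def summarize(vec):
    """Table 3 variables for one 10-element Rxqual or Rxlev vector."""
    vec = np.asarray(vec, dtype=float)
    return np.array([
        vec.mean(),        # Mean
        vec.std(ddof=1),   # Std with the 1/9 factor (sample standard deviation)
        vec.max(),         # Max
        np.median(vec),    # Medi
    ])

def features(d_single):
    """Collapse a 10x5 single set into one row [Y_1 .. Y_8] (m = 8 assumed).

    Column 0 is taken to be Rxqual and column 1 to be Rxlev, as in (1).
    """
    d_single = np.asarray(d_single, dtype=float)
    return np.concatenate([summarize(d_single[:, 0]), summarize(d_single[:, 1])])

def standardize(Y):
    """Column-wise z-score as in (3); Y has one row per speech sample.

    A column with zero variance would divide by zero and should be dropped first.
    """
    Y = np.asarray(Y, dtype=float)
    return (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
```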

4.1.2 Principal component analysis (PCA)

In our work, PCA is used as a dimension reduction method to reduce the number of predictors and eliminate multicollinearity, which contributes to the low complexity and stability of the mapping model described in the next section. PCA is carried out on the values of the Z-vectors $\{\mathbf{Z}_i\}_1^m$ derived from (3). The first PC is

$$\mathbf{PC}_1 = \begin{bmatrix} PC_{11} & PC_{12} & \cdots & PC_{1n} \end{bmatrix}^{\mathrm{T}}, \tag{6}$$

in which the elements $PC_{1j}$ are defined as

$$\begin{cases}
PC_{11} = Z_{11} \times e_{11} + Z_{21} \times e_{12} + \cdots + Z_{m1} \times e_{1m},\\
\quad\vdots\\
PC_{1j} = Z_{1j} \times e_{11} + Z_{2j} \times e_{12} + \cdots + Z_{mj} \times e_{1m},\\
\quad\vdots\\
PC_{1n} = Z_{1n} \times e_{11} + Z_{2n} \times e_{12} + \cdots + Z_{mn} \times e_{1m}.
\end{cases} \tag{7}$$

Based on these definitions, (7) can be rewritten in a more compact form as

$$\mathbf{PC}_1 = e_{11} \times \mathbf{Z}_1 + \cdots + e_{1i} \times \mathbf{Z}_i + \cdots + e_{1m} \times \mathbf{Z}_m = \mathbf{Z}\mathbf{e}_1, \tag{8}$$

where $\mathbf{Z}$ and $\mathbf{e}_1$ are defined as

$$\mathbf{Z} \triangleq \begin{bmatrix} \mathbf{Z}_1 & \cdots & \mathbf{Z}_i & \cdots & \mathbf{Z}_m \end{bmatrix}, \tag{9}$$

$$\mathbf{e}_1 \triangleq \begin{bmatrix} e_{11} & \cdots & e_{1i} & \cdots & e_{1m} \end{bmatrix}^{\mathrm{T}}. \tag{10}$$

The symbol $\mathbf{Z}$ denotes the preprocessed data computed from the Rxqual vector and the Rxlev vector, and $\mathbf{e}_1$ is the eigenvector corresponding to the maximum eigenvalue of the covariance matrix of $\mathbf{Z}$.

Table 4 Eigenvalues and proportion of variance

Component no.   Eigenvalues   Percentage of variance (%)   Cumulative percentage (%)
1               5.5263        40.32                        40.32
2               3.1079        22.67                        62.99
3               2.2597        16.48                        79.47
4               1.8467        13.47                        92.94
5               0.2991        2.18                         95.13
6               0.2219        1.62                         96.74
7               0.1355        0.99                         97.73

Figure 3 Principal component analysis (PCA). [Panel (a): eigenvalues and explained variance versus PC index. Panel (b): third PC versus first PC, both axes from −1.0 to 1.0.]

Table 4 shows the first seven eigenvalues together with the percentage of the total variance explained by each component, as shown graphically in Figure 3(a). Generally, the first several PCs reflect most of the variability of the predictors. For example, in Figure 3(b), the two sets of points close to the horizontal axis and far from the vertical axis indicate a strong correlation between the corresponding predictors and the first PC. Finally, we extract the first q components according to the variance explained criterion, that is, according to how much of the variability in the variables we would like to explain. For instance, if we want 95% of the variability to be explained in total, components 1–5 must be selected. The PC coefficient matrix is

$$\mathbf{THETA}_{m\times q} = \begin{bmatrix} \mathbf{e}_1 & \cdots & \mathbf{e}_i & \cdots & \mathbf{e}_q \end{bmatrix}. \tag{11}$$

Consequently, the PCs of the original data can be written in matrix form:

$$\mathbf{PC}_{n\times q} = \begin{bmatrix} \mathbf{PC}_1 & \mathbf{PC}_2 & \cdots & \mathbf{PC}_q \end{bmatrix} = \mathbf{Z}_{n\times m} \times \mathbf{THETA}_{m\times q}. \tag{12}$$

Table 5 Mapping relationship between Rxqual and C/I

Parameter   Mapping function
Rxqual      0    1    2    3    4    5    6    7
C/I (dB)    23   19   17   15   13   11   8    4
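A minimal NumPy sketch of the PCA step in (6)–(12) is given below, assuming the standardized matrix Z has one row per speech sample. The function names are illustrative; the eigen-decomposition of the covariance matrix is one standard way to obtain the eigenvectors, and the percentages passed to `choose_q` are those listed in Table 4.

```python
import numpy as np

def pca_fit(Z):
    """Eigen-decompose the covariance matrix of Z (n samples x m variables).

    Returns eigenvalues in descending order and the matching eigenvectors
    as columns, so column 0 plays the role of e_1 in (10).
    """
    cov = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric input, ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

def choose_q(explained_percent, target=95.0):
    """Smallest q whose cumulative explained variance reaches `target` percent."""
    cumulative = np.cumsum(explained_percent)
    q = int(np.searchsorted(cumulative, target)) + 1
    return min(q, len(cumulative))

def project(Z, eigvecs, q):
    """PC = Z x THETA with THETA = [e_1 ... e_q], as in (11) and (12)."""
    theta = eigvecs[:, :q]
    return Z @ theta

# Percentages of variance from Table 4 (first seven components).
table4_percent = [40.32, 22.67, 16.48, 13.47, 2.18, 1.62, 0.99]
q = choose_q(table4_percent, target=95.0)   # five components for a 95% target
```

With the Table 4 percentages, the cumulative sum first reaches 95% at the fifth component, which matches the selection of components 1–5 described above.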

4.1.3 Mapping model

MARS [22] is adopted to further investigate the relationship between the network parameters and the QoE of voice service. A forward stepwise algorithm and a backward algorithm are applied during the regression, and generalized cross-validation [22] is used as the model selection criterion. The general mapping model, by which the predicted speech quality RSQ is calculated, is

$$\mathrm{RSQ} = f\left(\mathbf{x} \mid \{a_k\}_0^M\right) = a_0 + \sum_{k=1}^{M} a_k \mathrm{BF}_k(\mathbf{x}). \tag{13}$$

In (13), $\mathbf{x} = [\,\mathrm{HO}\ \ PC_1\ \ PC_2\ \cdots\ PC_q\,]$ denotes the predictors, which combine the PCA results with the network parameter HO, while the parameter Codec is treated as a categorical variable. The base function $\mathrm{BF}_k(\mathbf{x})$ takes the form of (15), and each of its factors is a hinge function of the form (14). The coefficient $s_{kl}$ takes the values ±1 to distinguish the original function from the reflected function, and $t_{kl}$ is a split point on the corresponding predictor $x_{v(k,l)} \in \mathbf{x}$.

$$(x - t)_+ = \begin{cases} x - t, & \text{if } x > t, \\ 0, & \text{otherwise}, \end{cases} \tag{14}$$

$$\mathrm{BF}_k(\mathbf{x}) = \prod_{l=1}^{K_k} \left[ s_{kl} \times \left(x_{v(k,l)} - t_{kl}\right) \right]_+ = \prod_{l=1}^{K_k} \max\left\{0, \left[ s_{kl} \times \left(x_{v(k,l)} - t_{kl}\right)\right]\right\}. \tag{15}$$

As a result, the final QoE prediction model is formulated as (16a). Here, M is the final number of base functions, and $K_k$ is the number of variables associated with split points in each base function $\mathrm{BF}_k(\mathbf{x})$, which reflects the interaction effects among the corresponding variables. The coefficients $\{a_k\}_0^M$, $t_{kl}$, and $s_{kl}$ are obtained by model training. Eq. (16b) shows the analysis of variance (ANOVA) decomposition of (16a). It groups the base functions that have the same number of variables and thereby further reveals the relationship between speech quality and the network parameters (independent variables). The second term in (16b) is the sum of the base functions of the single variable HO. Similarly, the base functions in the third term include only $PC_n$, while the last sum consists of the base functions that involve two or more variables.

$$\mathrm{RSQ} = a_0 + \sum_{k=1}^{M} a_k \prod_{l=1}^{K_k} \max\left\{0, \left[ s_{kl} \times \left(x_{v(k,l)} - t_{kl}\right)\right]\right\} \tag{16a}$$

$$= a_0 + \underbrace{\sum_{n=1}^{2} b_n \times \max\left\{0, \left[s_n \times (\mathrm{HO} - t_n)\right]\right\} + \sum_{n=1}^{q} c_n \times \max\left\{0, \left[s_n \times (PC_n - t_n)\right]\right\}}_{K_k = 1} + \underbrace{\sum_{K_k=2}^{q+1} \sum_{k} d_k \times \mathrm{BF}_k\left(x_1, \cdots, x_{K_k}\right)}_{K_k \geqslant 2}. \tag{16b}$$
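To make the structure of (14)–(16a) concrete, the sketch below evaluates a MARS-style model once its coefficients are known. It does not perform the forward and backward fitting passes. The coefficients, split points, and the two-element predictor vector are hypothetical values chosen only for illustration; they are not the trained model.

```python
def hinge(value):
    """Positive part of (14), applied to an already shifted and signed value."""
    return value if value > 0 else 0.0

def basis_function(x, factors):
    """Evaluate BF_k(x) of (15).

    factors: list of (s, v, t) with sign s in {+1, -1}, predictor index v,
    and split point t.
    """
    product = 1.0
    for s, v, t in factors:
        product *= hinge(s * (x[v] - t))
    return product

def predict_rsq(x, a0, terms):
    """Evaluate (16a): a0 + sum_k a_k * BF_k(x); terms is a list of (a_k, factors)."""
    return a0 + sum(a_k * basis_function(x, factors) for a_k, factors in terms)

# Hypothetical model, for illustration only (not the trained coefficients).
x = [2.0, 0.5]                                   # x = [HO, PC1]
terms = [
    (-0.2, [(+1, 0, 1.0)]),                      # single-variable term in HO
    (0.4, [(-1, 1, 1.0)]),                       # reflected hinge on PC1
    (1.0, [(+1, 0, 1.0), (-1, 1, 1.0)]),         # two-variable interaction
]
rsq = predict_rsq(x, a0=3.0, terms=terms)
```

Once trained, prediction reduces to a handful of comparisons, multiplications, and additions per sample, which is in line with the low complexity attributed to the model.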

4.2 Adaptive rate CM

For the AMR codec, different CMs from the ACS may be used adaptively during a call according to the channel condition; a detailed description can be found in [23]. It is therefore the speech quality assessment under link adaptation that matters most. An algorithm for the adaptive rate CM is proposed based on the results of Subsection 4.1. However, it requires knowledge of the current codec, that is, the codec from the ACS that is being used.

4.2.1 CM decision approach

As described in [23], the CM decision can be made based on the Carrier to Interference (C/I) ratio together with the threshold and hysteresis values. Because the real-time C/I value cannot be obtained from the signal monitoring system deployed by operators, the C/I value is estimated approximately via the mapping between Rxqual and C/I shown in Table 5.

Figure 2 in [23] illustrates how the codec changes. For example, if Rxqual is equal to 0, we infer that the codec during the past 480 ms is CM 4 (12.2 kbps), because the corresponding C/I value of 23 dB is greater than THR3 + HYST3. The threshold and hysteresis values and the ACS are given in Table 1; CM 4 denotes the highest rate, 12.2 kbps, while CM 1 denotes the 4.75 kbps codec.
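The decision rule can be sketched in code using the values of Table 1 and Table 5. The switching rule below is an assumption about how the thresholds and hysteresis are applied (switch up when C/I exceeds THR + HYST, switch down when C/I falls below THR, possibly by several modes within one interval); the exact behaviour is specified in [23], and the function names are illustrative.

```python
# Table 5: Rxqual -> approximate C/I in dB.
RXQUAL_TO_CI_DB = {0: 23, 1: 19, 2: 17, 3: 15, 4: 13, 5: 11, 6: 8, 7: 4}
# Table 1: thresholds and hysteresis between neighbouring modes (dB).
THR = [7.0, 8.5, 11.5]    # THR1..THR3, boundary between CM i and CM i+1
HYST = [1.5, 1.5, 2.0]    # HYST1..HYST3

def decide_cm(previous_cm: int, rxqual: int) -> int:
    """Infer the codec mode (1..4) for one 480 ms MR interval.

    Assumed rule: move up from CM i to CM i+1 while C/I > THR_i + HYST_i,
    move down from CM i+1 to CM i while C/I < THR_i, otherwise keep the mode.
    """
    ci = RXQUAL_TO_CI_DB[rxqual]
    cm = previous_cm
    while cm < 4 and ci > THR[cm - 1] + HYST[cm - 1]:
        cm += 1
    while cm > 1 and ci < THR[cm - 2]:
        cm -= 1
    return cm

def decide_sequence(rxquals, initial_cm: int = 4):
    """Apply decide_cm to successive MRs, starting from the ICM (12.2 kbps = CM 4)."""
    cms, cm = [], initial_cm
    for rxqual in rxquals:
        cm = decide_cm(cm, rxqual)
        cms.append(cm)
    return cms
```

For Rxqual = 0 the estimated C/I of 23 dB exceeds THR3 + HYST3 = 13.5 dB, so the sketch returns CM 4, in agreement with the example above. For Rxqual = 6 (8 dB) the result depends on the previous mode, which is the effect of the hysteresis.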

4.2.2 Assessment method

The structure of the original observed data matrix is the same as that of the fixed rate CM,

$$\begin{bmatrix} \mathbf{Rxqual} & \mathbf{Rxlev} & \mathbf{HO} & \mathbf{Codec} & \mathbf{MOS} \end{bmatrix}_{10\times 5}. \tag{17}$$

First, the CM decision is performed for the 10 MRs, giving CM values ranging from 1 to 4 (i.e., 4.75–12.2 kbps), denoted as $\{\mathrm{CM}_k\}_1^{10}$. Other variables derived from $\{\mathrm{CM}_k\}_1^{10}$ and used for modeling are listed in Table 6. Then, the 10 MRs in a single set of data are divided into groups (four at most) according to the four CMs. Each group is associated with an evaluated RSQ value, obtained by calling the corresponding mapping function described in (16a) and (16b). A new variable called $\mathrm{RSQ}_w$ is calculated as

$$\mathrm{RSQ}_w = \frac{1}{10}\left[\left(\mathrm{RSQ}_{4.75\mathrm{K}} \times \Delta_1\right) + \left(\mathrm{RSQ}_{5.9\mathrm{K}} \times \Delta_2\right) + \left(\mathrm{RSQ}_{7.4\mathrm{K}} \times \Delta_3\right) + \left(\mathrm{RSQ}_{12.2\mathrm{K}} \times \Delta_4\right)\right]. \tag{18}$$

Therefore, the predictors prepared for modeling are Forward num, Backward num, and $\mathrm{RSQ}_w$, together with HO. Forward num and Backward num give the numbers of CM changes during 4.8 s. They relate to the dynamics of the channel condition and consequently affect speech quality. Finally, the fitting model that assesses the speech quality of the adaptive rate CM is formulated as (19). The coefficients $\{a_k\}_0^4$ are obtained by MLR based on least squares, which is chosen for its low complexity and because of the strong linear relationship known to exist between $\mathrm{RSQ}_w$ and MOS. $\delta_1$ and $\delta_2$, which denote Forward num and Backward num, respectively, take integer values. For example, a $\delta_1$ value of 1 means that the CM changes only once from a low rate to a high rate, and the predicted speech quality changes by $a_2$.

$$\mathrm{RSQ}_{\mathrm{adp}} = f\left(\mathbf{x} \mid \{a_k\}_0^4\right) = a_0 + a_1 \times \mathrm{RSQ}_w + a_2 \times \delta_1 + a_3 \times \delta_2 + a_4 \times \mathrm{HO}. \tag{19}$$
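The quantities in (18) and (19) can be computed as in the following sketch. It assumes that the per-CM RSQ values have already been obtained from the fixed rate mapping functions and are passed in as a dictionary keyed by CM index; the function names and this interface are assumptions made for illustration. A change that skips over a mode is counted as a single change here.

```python
import numpy as np

def count_changes(cms):
    """Return (Forward num, Backward num): upward and downward CM changes."""
    steps = np.diff(np.asarray(cms))
    return int(np.sum(steps > 0)), int(np.sum(steps < 0))

def weighted_rsq(cms, rsq_by_cm):
    """RSQ_w of (18): per-CM RSQ values weighted by the MR counts Delta_i."""
    cms = list(cms)
    total = sum(rsq_by_cm[cm] * cms.count(cm) for cm in set(cms))
    return total / len(cms)

def fit_mlr(X, y):
    """Least-squares fit of (19); returns [a0, a1, a2, a3, a4].

    X has columns [RSQ_w, delta_1, delta_2, HO], one row per speech sample.
    """
    design = np.column_stack([np.ones(len(X)), X])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs
```

For a CM sequence such as [4, 4, 3, 3, 3, 2, 3, 4, 4, 4], the counts are two forward and two backward changes, and the MR counts are Δ2 = 1, Δ3 = 4, and Δ4 = 5.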


Table 6 Variable set derived from $\{\mathrm{CM}_k\}_1^{10}$

No.   Variable name                    Description
1     Forward num, $\delta_1$          When the current codec changes to a higher rate or less robust codec
2     Backward num, $\delta_2$         When the current codec changes to a lower rate or more robust codec
3     MR num, $\{\Delta_i\}_1^4$       The numbers of MRs corresponding to the CMs: 4.75 kbps, 5.9 kbps, 7.4 kbps, 12.2 kbps

Table 7 Size of data set

CM (kbps)       Data size
                Training set   Test set
4.75            599            202
5.9             529            177
7.4             505            168
12.2            404            138
Adaptive rate   469            158

Table 8 Fitting accuracy for training and test sets

CM (kbps)       Model   Training set                    Test set
                        ρ (%)   RMSE    ALMOS (%)       ρ (%)   RMSE    ALMOS (%)
4.75            MARS    91.2    0.195   88.9            91.6    0.199   100
5.9             MARS    94.5    0.200   95.8            93.0    0.232   100
7.4             MARS    92.1    0.285   78.9            92.0    0.271   81.8
12.2            MARS    96.2    0.235   86.0            93.3    0.300   83.3
Adaptive rate   MLR     91.6    0.202   N/A a)          92.5    0.200   N/A a)

a) There are not enough data for model training.

5 Result analysis

5.1 Data size

All of the original data, including network parameters and MOS, are collected in a real network, with the MOS values obtained at the DT terminal and the network parameters acquired from the signal monitoring platform. Outliers are excluded directly from the training and test data sets. The basic settings for the modeling are given first, and the predicted speech quality for the four CMs (4.75 kbps, 5.9 kbps, 7.4 kbps, and 12.2 kbps) is presented in the following section. The original data are randomly separated into two parts, a training set and a test set. The training set is used to fit the model, while the test set is used to assess the accuracy of the finally chosen model. Typically, 75% of the original data are selected for training and the rest for testing [24]. Table 7 lists the sizes of the training and test sets.
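A random 75%/25% split of this kind can be sketched as follows. The function name, the fixed seed, and the rounding rule are assumptions for illustration, so the sketch is not expected to reproduce the exact set sizes in Table 7.

```python
import numpy as np

def split_train_test(n_samples, train_fraction=0.75, seed=0):
    """Randomly split sample indices into training and test index arrays."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    n_train = int(round(train_fraction * n_samples))
    return shuffled[:n_train], shuffled[n_train:]
```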

5.2 Result analysis

Traditionally, prediction performance is measured by Pearson’s correlation coefficient ρ and root mean

squared error (RMSE). They are usually used to interpret the relationship between x and y. Pearson’s

correlation coefficient and RMSE are computed based on the following equations:

ρ =

∑Nk=1 [(xk − x) (yk − y)]√∑N

k=1 (xk − x)2∑N

k=1 (yk − y)2∈ [−1 1] , (20)

RMSE =

√∑Nk=1 (xk − yk)

2

N, (21)

where $\bar{x}$ is the average of $x_k$ and $\bar{y}$ is the average of $y_k$.

Figure 4 Estimation results for the fixed rate CM. Panels (a)–(d) plot PESQ (vertical axis) against predicted MOS (horizontal axis); panel (a) is AFS 4.75 kbps.

Besides, the voice becomes annoying or intolerable when the MOS value is less than 2, leading to a dramatic drop in the QoE of voice service. Operators must then take action to check whether the network is working correctly. Thus, a novel indicator called Accuracy of Low MOS Prediction (ALMOS) is proposed to evaluate the model accuracy when PESQ is less than 2. ALMOS takes the following form:

$$\mathrm{ALMOS} = \frac{\mathrm{QoELess2\ and\ PreQoELess2}}{\mathrm{QoELess2}} \times 100\%. \tag{22}$$

The denominator is the number of speech samples whose PESQ ranges from 1.02 to 2, and the numerator is the number of speech samples whose PESQ and predicted MOS both range from 1.02 to 2.
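The three indicators of Eqs. (20)–(22) can be computed along the following lines. This is a hedged sketch: the function names, the made-up scores, and the reading of "from 1.02 to 2" as the half-open interval [1.02, 2) are assumptions introduced here, not details given in the paper.

```python
import math


def pearson_rho(x, y):
    """Pearson correlation coefficient, following Eq. (20)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Undefined (division by zero) if either input is constant.
    return cov / math.sqrt(var_x * var_y)


def rmse(x, y):
    """Root mean squared error, following Eq. (21)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def almos(pesq, predicted, low=1.02, high=2.0):
    """ALMOS in percent, following Eq. (22).

    Assumption: a score counts as "low" when low <= score < high.
    Returns None when no sample has a low PESQ score.
    """
    def is_low(score):
        return low <= score < high

    low_pairs = [(p, q) for p, q in zip(pesq, predicted) if is_low(p)]
    if not low_pairs:
        return None  # no low-PESQ sample, so the ratio is undefined
    hits = sum(1 for _, q in low_pairs if is_low(q))
    return 100.0 * hits / len(low_pairs)


# Made-up scores for illustration only (not data from the paper).
pesq_scores = [1.5, 1.8, 2.5, 3.0]
predicted_mos = [1.6, 2.2, 2.4, 3.1]
print(pearson_rho(predicted_mos, pesq_scores))
print(rmse(predicted_mos, pesq_scores))
print(almos(pesq_scores, predicted_mos))
```

Returning `None` when the denominator is empty keeps the "no low-quality samples" case distinct from an ALMOS of 0%.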

The fitting accuracy for the training and test sets, in terms of the correlation coefficient ρ, RMSE, and ALMOS, is summarized in Table 8. All of the correlation coefficients are greater than 90%, which shows that high accuracy is achieved for both fixed and adaptive rate modes; that is, the predicted results have a strong positive correlation with the PESQ results. The absolute differences between predicted MOS and PESQ are small, as reflected by RMSE values below 0.3. Also, the high ALMOS values in Table 8 indicate that the proposed algorithm can raise the low speech quality alarm with high confidence when the QoE of voice service is less than 2. Note that N/A in Table 8 means that the corresponding indicator could not be given because there are not enough data for model training, a result of the speech quality improvement brought by link adaptation.

Figure 4 shows the scatter plots of predicted MOS versus PESQ, which make the high prediction accuracy reported in Table 8 visible. The two outer oblique lines in Figure 4 mark a difference between predicted MOS and PESQ of 0.354 or −0.354. The majority of the absolute estimation errors are less than 0.354. Figure 5(a) shows the scatter plot of actual QoE against model prediction for the adaptive CM. We achieve a correlation coefficient of about 92% for the test data set. Figure 5(b) provides a histogram of the difference between predicted MOS and PESQ, also called the residual error. It approximates the normal distribution, meaning that the MLR works well for QoE prediction.

Figure 5 Estimation results for the adaptive rate CM. (a) PESQ versus predicted MOS for AMR FR set1; (b) frequency histogram of residual speech quality.

Figure 6 CDF curves and QoE values for all of the CMs. (a) CDF of the absolute values of residual speech quality for AFS 4.75, 5.9, 7.4, and 12.2 kbps and AFS set1 with MARS, and for AFS 4.75, 5.9, 7.4, and 12.2 kbps with MLR; (b) reference and predicted QoE values over the speech samples of the test set.

Also, we present the Cumulative Distribution Function (CDF) of the absolute values of residual speech quality for the fixed rate CM and the adaptive rate CM in Figure 6(a). For the fixed codec modes, we also give the experimental results of MLR-based QoE measurement. According to Figure 6(a), the CDF curves of the proposed MARS-based method lie above their MLR-based counterparts, i.e., more samples fall in the interval of small residual error for the MARS-based assessment. The reason is that MARS handles nonlinear relationships better than MLR. Taking AFS 4.75 kbps in Figure 6(a) as an example, the probability based on MARS is greater than 0.9 when the absolute value of residual speech quality is 0.3, while the corresponding probability based on MLR is only about 0.8. We also give the comparative results between predicted QoE and the reference value for all of the CMs with about 843 samples in Figure 6(b). It is evident that the predictions follow the changes of the actual QoE values, especially when the actual QoE values are below 2.5 or above 3.5.
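The CDF comparison above amounts to evaluating, for each model, the fraction of test samples whose absolute residual does not exceed a given value. A minimal sketch is given below; the function name `abs_residual_cdf` and the sample scores are hypothetical and chosen only for illustration.

```python
def abs_residual_cdf(predicted, reference, thresholds):
    """Empirical CDF of |predicted - reference| evaluated at each threshold.

    Each returned value is the fraction of samples whose absolute
    residual is less than or equal to the corresponding threshold.
    """
    abs_residuals = [abs(p - r) for p, r in zip(predicted, reference)]
    if not abs_residuals:
        raise ValueError("at least one sample is required")
    n = len(abs_residuals)
    return [sum(1 for e in abs_residuals if e <= t) / n for t in thresholds]


# Made-up scores for illustration only (not data from the paper).
predicted_mos = [2.0, 3.0, 4.0, 1.5]
pesq_scores = [2.1, 3.5, 4.0, 1.1]
print(abs_residual_cdf(predicted_mos, pesq_scores, [0.3, 0.354, 0.9]))
```

Running the same function on the residuals of two models at a common threshold such as 0.3 gives the kind of comparison quoted for AFS 4.75 kbps.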

6 Conclusion

A novel and practical QoE measurement strategy for AMR codec voice service, which takes network parameters as its independent variables, is proposed. With the method proposed in this paper, wireless operators can access the voice service QoE of the monitored users, especially users with high Average Revenue Per User. This result can be exploited to guide network optimization as well as network maintenance directly and effectively. Numerical results based on data sets collected from a real network validate the high accuracy and applicability of the algorithm. It was also shown that the method designed for the AMR codec can be used to estimate QoE in other mobile systems, including UMTS, if appropriate network parameters are selected, such as Received Signal Code Power (RSCP), Block Error Rate, and Signal to Interference Ratio.

Acknowledgements

The preliminary work of this paper was presented at the International Conference on Optical Internet (COIN)

2013. This research work was supported by China National S&T Major Project (Grant No. 2012ZX03001034).

References

1 ITU-T P.10/G.100. Vocabulary and effects of transmission parameters on customer opinion of transmission quality. 2008

2 Zhou Y Q, Liu H, Pan Z G, et al. Two-stage cooperative multicast transmission with optimized power consumption and guaranteed coverage. IEEE J Sel Area Commun, 2013, 99: 1–11

3 ETSI TS 102 250. Speech processing, transmission and quality aspects (STQ); QoS aspects for popular services in GSM and 3G networks, Part 2: definition of quality of service parameters and their computation. 2008

4 3GPP. Technical Specification Group Services and System Aspects; Mandatory Speech CODEC Speech Processing Functions; AMR Speech CODEC; General Description (Release 10). 3GPP TS 26.071. 2011

5 3GPP. Technical Specification Group Services and System Aspects; Mandatory Speech CODEC Speech Processing Functions; Adaptive Multi-rate (AMR) Speech CODEC; Transcoding Functions (Release 11). 3GPP TS 26.090. 2012

6 ITU-T Recommendation G.722.2. Wideband coding of speech at around 16 kbit/s using Adaptive Multi-Rate Wideband (AMR-WB). 2003

7 ITU-T Recommendation P.800. Methods for subjective determination of transmission quality, Geneva. 1996

8 ITU-T Recommendation P.862. Perceptual evaluation of speech quality (PESQ): an objective method for end-to-end speech quality assessment of narrowband telephone networks and speech codecs. Geneva: International Telecommunication Union, 2001

9 ITU-T Recommendation P.863. Perceptual objective listening quality assessment. 2011

10 Kuipers F, Kooij R, De Vleeschauwer D, et al. Techniques for measuring quality of experience. In: 8th International Conference on Wired/Wireless Internet Communications. Berlin/Heidelberg: Springer-Verlag, 2010. 216–227

11 Moller S, Chan W-Y, Cote N, et al. Speech quality estimation: models and trends. IEEE Signal Process Mag, 2011, 28: 18–28

12 Bhatt N, Kosta Y. Overall performance evaluation of adaptive multi rate 06.90 speech codec based on code excited linear prediction algorithm using MATLAB. Int J Speech Technol, 2012, 15: 119–129

13 Dolezalova B, Holub J, Street M. Mobile network voice transmission quality estimation based on radio path parameters. In: Wireless Telecommunications Symposium, Pomona, 2005. 95–99

14 Werner M, Kamps K, Tuisel U, et al. Parameter-based speech quality measures for GSM. Personal Indoor Mob Radio Commun, 2003, 3: 2611–2615

15 Karlsson A, Heikkila G, Minde T B, et al. Radio link parameter based speech quality index-SQI. In: IEEE Workshop on Speech Coding Proceedings, Haikko Manor Porvoo, 1999. 147–149

16 Wanstedt S, Pettersson J, Xianchun T, et al. Development of an objective speech quality measurement model for the AMR codec. In: Proceedings of Workshop on Measurement of Speech and Audio Quality in Networks, 2002. 77–82

17 Pitas C N, Charilas D E, Panagopoulos A D, et al. Adaptive neuro-fuzzy inference models for speech and video quality prediction in real-world mobile communication networks. IEEE Wirel Commun, 2013, 20: 80–88

18 Pitas C N, Charilas D E, Panagopoulos A D, et al. ANFIS-based quality prediction models for AMR-telephony in public 2G/3G mobile networks. In: IEEE Global Communications Conference, Anaheim, 2012. 1728–1732

19 Mitra K, Ahlund C, Zaslavsky A. Performance evaluation of a decision-theoretic approach for quality of experience measurement in mobile and pervasive computing scenarios. In: IEEE Wireless Communications and Networking Conference, Shanghai, 2012. 2418–2423

20 ITU-T Recommendation P.862.1. Mapping function for transforming P.862 raw result scores to MOS-LQO. 2003

21 ITU-T Recommendation P.862.3. Application guide for objective quality measurement based on Recommendations P.862, P.862.1 and P.862.2. 2007

22 Friedman J H. Multivariate adaptive regression splines. Ann Statist, 1991, 19: 1–141

23 3GPP. Technical Specification Group GSM/EDGE; Radio Access Network; Link Adaptation (Release 10). 3GPP TS 45.009. 2011

24 Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. Berlin: Springer, 2009
