RESEARCH PAPER. Special Focus on Convergence Communications
SCIENCE CHINA Information Sciences
April 2014, Vol. 57 042308:1–042308:13
doi: 10.1007/s11432-014-5074-z
© Science China Press and Springer-Verlag Berlin Heidelberg 2014 info.scichina.com link.springer.com
A real-time QoE methodology for AMR codec voice in mobile network
LI WenZhi, WANG Jing∗, XING ChengWen, FEI ZeSong & KUANG JingMing
School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China
Received October 26, 2013; accepted December 20, 2013
Abstract This paper studies a general strategy to predict voice Quality of Experience (QoE) for various
mobile networks. Particularly, based on data-mining for Adaptive Multi-Rate (AMR) codec voice, a novel QoE
assessment methodology is proposed. The proposed algorithm consists of two parts. The first part is devoted
to assessing the speech quality of the fixed rate codec mode (CM) of AMR, while the second is designed for
the adaptive rate CM. By measuring basic network parameters that strongly affect speech quality, operators
can monitor QoE in real time. Based on measurement data sets from a real mobile network, the QoE
prediction strategy is implemented and a QoE assessment model for AMR codec voice is trained and tested.
The numerical results suggest that the correlation coefficient between predicted values and true values
is greater than 90% and the root mean squared error is less than 0.5 for both the fixed and adaptive rate CMs.
Keywords quality of experience (QoE), speech quality, adaptive multi-rate (AMR), objective speech quality
measurement, data mining, multivariate adaptive regression splines (MARS)
Citation Li W Z, Wang J, Xing C W, et al. A real-time QoE methodology for AMR codec voice in mobile
network. Sci China Inf Sci, 2014, 57: 042308(13), doi: 10.1007/s11432-014-5074-z
1 Introduction
Wireless operators are placing increasing emphasis on the Quality of Experience (QoE) of services delivered over
telecommunication networks. High QoE helps to improve the perception of services and
operators' brand value, which correlates closely with users' loyalty and operators' income. In particular,
convergence networks, which carry voice, data, and video within a single network, are an
inevitable trend, so operators must pay more attention to the QoE of the various services in such networks.
According to the definition of QoE by the International Telecommunication Union (ITU), “The overall acceptability
of an application or service, as perceived subjectively by the end-user” [1], QoE describes the end
user's real perception of the accessibility, retainability, and integrity of a particular service. It differs
from traditional Quality of Service (QoS). QoS parameters focus on network or service itself, instead of
user’s experience. For example, most commonly, a coverage ratio of 95% is guaranteed as a traditional
QoS metric [2]. Setup Time and Cutoff Call Ratio are used as QoS parameters of voice service [3]. All
of these can reflect the health of network or service, but operators do not know users’ real perception
of service. It is the user's experience that matters.
∗Corresponding author (email: [email protected])
Li W Z, et al. Sci China Inf Sci April 2014 Vol. 57 042308:2
Customers will never report what the coverage ratio is; they will simply say how awful the service is, which may result from the coverage ratio and other
factors. Low QoE will lead to users’ rejection of service. Especially for voice service, speech quality or
voice quality during a conversation can reflect users’ experience the most.
In the current mobile communication systems, the upper bound of speech quality is usually described
by the performance of speech coder. Therefore, an advanced source coding called Adaptive Multi-Rate
(AMR) recommended by [4–6] is used in most of mobile systems such as Global System for Mobile
Communications (GSM), Universal Mobile Telecommunications System (UMTS), Long Term Evolution,
and so on. AMR codec is a combination of speech coding and channel coding, aiming to maintain good
speech quality under varying channel conditions. It involves AMR-Wide Band and AMR-Narrow Band
(AMR-NB). Taking AMR-NB, for example, there are 14 different codec modes (CMs) (eight modes in
full rate (FR) and six modes in half rate (HR)) in total that will be selected adaptively according to the
dynamics of the channel, which is called link adaptation. AMR can therefore improve speech
quality considerably: it keeps the connection robust under poor channel conditions and
raises speech quality under good channel conditions. AMR is also adopted as the standard speech
codec by 3rd Generation Partnership Project (3GPP) to provide a high voice service level. Therefore, it
will be of profound practical significance to predict voice service QoE for AMR codec.
Numerous methodologies for voice service QoE estimations have been extensively studied in the existing
works, including some industrial standardizations. All of the methods fall into two classes: subjective
methods and objective methods. Commonly, QoE is measured by subjective assessments. Each speech
sample is scored with a specific integer from 1 (worst) to 5 (best) by a number of listeners and then a
Mean Opinion Score (MOS) is calculated finally [7]. Although subjective assessment can achieve high
accuracy, it will be time-consuming and requires large investments in equipment and manpower, making
it unsuitable for monitoring QoE in real time and prompting the development of objective speech quality
assessments.
Objective models can be divided into intrusive measures (full-reference) and nonintrusive measures
(no-reference) with respect to the utilization of original clean voice. No-reference methods can in turn be
broadly classified into speech parameter-based or network parameter-based. Full-reference methods and
speech parameter-based methods are limited by the requirement of a large number of degenerated speech
samples, considering that we cannot collect them during a call. The widely used and elaborate Perceptual
Evaluation of Speech Quality (PESQ) [8] and its evolution version Perceptual Objective Listening Quality
Assessment (POLQA) belong to full-reference methods. POLQA is the next-generation mobile voice
quality testing standard. It is developed for super wideband requirement of High Definition voice [9].
In the existing work, Refs. [10,11] summarize the standardizations for measuring speech QoE in detail.
Several speech parameter-based intrusive methods are described in [12].
As a consequence of the above, for operators, only the network parameter-based assessments provide a
possible solution for monitoring QoE. Nonintrusive network parameter-based measurements are proposed
in [13,14], but some radio parameters such as TxPow and Latency used in [13] and Frame Erasure Rate
and Length of Erased Frames used in [14] are not able to be acquired in signaling monitor platform
for operators, indicating that operators cannot monitor QoE of the voice service by the two methods.
Another network parameter-based measurement for GSM called Speech Quality Index (SQI) is proposed
by Karlsson et al. [15]. Then, SQI is extended to AMR codec by Wanstedt et al. [16], but the algorithm
proposed in [16] can only assess the fixed rate CM. Recently, machine learning-based objective QoE
methodologies have been studied in [17–19]. Specifically, Adaptive Network-based Fuzzy Inference Systems-based
measurement is proposed in [17,18]. Bayesian Network-based model is exploited to predict user’s QoE in
[19] in which the proposed model is trained by data generated from OPNET simulator related to IEEE
802.11g. However, the previous works in [17,18] only delve into the network parameters Rxqual, Rxlev in
2G, and RSCP, Ec/N0 in 3G; to the best of our knowledge, the existing work cannot reveal the overall
impacts of radio link on voice QoE.
Generally, for operators, subjective QoE method is time-consuming. Degenerated speech samples are
essential for full-reference objective methods and speech parameter-based methods. They are not available
for monitoring voice service QoE. Network parameter-based assessment offers a possible solution, but
there are still some deficiencies in the following aspects: the sufficiency of the network parameters, the
computational complexity, and whether the data set used for model training is obtained in actual network.
Compared with the existing works, the main contributions of this paper are listed as follows. A general
solution based on regressive methods to predict QoE in various mobile networks is proposed. In particular,
a no-reference network parameter-based QoE assessment called Radio Speech Quality (RSQ) is proposed
mainly for AMR codec voice. The novel assessment model consists of two parts: the first part is used
to estimate the speech quality of the fixed rate CM of AMR and then the second part based on the first
part’s results is designed for adaptive rate CM. Besides, it is noted that the first part of the proposed
algorithm is also suitable for the other fixed rate CMs such as enhanced full rate, FR, and HR of GSM.
It is also observed that the parameters of the proposed model can be acquired in real time, so that
the measurement itself runs in real time. Additionally, using a data set from a realistic network
rather than from simulation enhances the accuracy and practicability of the model.
The remainder of this paper is organized as follows. The general description of voice QoE assessment
methodology is given in Section 2. Testbed setup and data collection are described in Section 3. Section 4
presents the AMR speech quality measurement in detail. Result analysis is given in Section 5. Finally,
conclusions are drawn in Section 6.
2 QoE assessment methodology
The proposed QoE prediction methodology allows operators to assess voice service QoE by measuring
certain network parameters or network Key Performance Indicators in mobile radio access networks. The
methodology includes the following three steps.
1) Data Acquisition and Analysis: Measurements data are usually obtained by Drive Test (DT). A
measurement scenario describing measurement approach, data format, data size, and so on conducts the
test voice calls. Meanwhile, signaling is captured by signal monitoring equipment. Speech samples are
recorded in calling station and called station, respectively.
Data analysis involves two aspects. First, we must extract network parameters that have great impact
on voice quality from the signaling data captured. For example, Received Signal Quality Sub values
(RxqualSub) and Received Signal Level Sub values (RxlevSub) will be selected in GSM. Second, the
number of data sets as well as data distribution must be checked roughly to decide whether extra
measurements are still needed.
2) Model Building and Algorithm Design: After data acquisition and analysis is completed, QoE
prediction algorithm will be constructed, which is also the data training process shown in Figure 1. a) As
for fixed rate CM of AMR, data preprocessing will be adopted first to describe the data characteristics
such as average, variance, and maximum & minimum value. Then, we plot the scatter diagram, box plot,
and histogram to give visual presentation of relationship between network parameters and QoE value.
Outlier detection is performed and abnormal data are removed at the same time. Principal Components
Analysis (PCA) follows, to reduce the data dimension and overcome multicollinearity.
Finally, mapping model from Principal Components (PCs) to QoE is built by Multivariate Adaptive
Regression Splines (MARS). MARS can reveal the nonlinear relationship among independent variables
and dependent variables well and has low complexity compared with Artificial Neural Network. b) As
for adaptive rate codec, the specific CM is deduced every time interval, then the corresponding QoE
values QoEi (i = 1, . . . , n) can be computed. The final QoE of AMR codec is evaluated by Multiple
Linear Regression (MLR). The independent variables are the weighted sum of QoEi as well as some
new variables related to CMs such as Forward num and Backward num, which are described in detail in
Section 4.
3) QoE Prediction: The QoE assessment method predicts QoE in the following two ways. The first is
done during the process of model training in Step 2, and the second happens only after the QoE prediction
model is established. a) All of the measurements data sets from Step 1 are divided into training set and
test set. The test set is part of original data set without removing the abnormal data. Prediction related
to test set is done after each epoch of training, aiming to find the model of best performance. If the
Figure 1 Diagram of the proposed algorithm design. Fixed rate codec: data preprocessing, plot, remove abnormal data in two instances, PCA, mapping model. Adaptive rate codec: codec mode decision, QoE1 to QoEn, weighted sum together with new variables related to CMs, MLR, speech quality.
Figure 2 System architecture of testing platform. Labels: uplink, downlink, MS A, channel simulator, BTS, BSC, MSC, signaling monitoring system, Abis, PESQ, MS B.
testing results do not fit the reference values well, the training process is restarted. b) Once the final
model is established, a specific QoE computational expression can be formulated. A predicted QoE
value is computed whenever a group of valid network parameters is supplied. This is less time-consuming and has
lower complexity than training the prediction model.
3 Data acquisition and analysis
For convenience, the QoE methodology presented in this paper is restricted to the case of GSM network.
It should be highlighted that the discussed method is also available for other mobile systems such as
Time Division-Synchronous Code Division Multiple Access and so on. Besides, we focus on the impacts
on speech quality from air interface or radio link, because Core Network or wired link causes little
degeneration of voice.
3.1 Data acquisition
The measurement data are collected using the network equipments in China Mobile Labs. The system
architecture of testing platform is shown in Figure 2, and the configuration related to AMR codec is
depicted in Table 1. Also, Drive Test (DT) terminals are used as Mobile Station A (MS A) and MS B.
Note that the connection of MS B and Base Transceiver Station (BTS) is wired rather than wireless,
and as a result the impact on speech quality from radio link between MS A and BTS is the same as
that between MS B and BTS. Also, the utilization of channel simulator contributes to the ergodicity of
channel condition and the data integrity consequently.
During the testing process, MS A denoted as calling station calls MS B considered as called station again
and again, and they talk to each other continuously. Degenerated speech samples of uplink or downlink
which can be obtained in Abis interface (the interface between the BTS and Base Station Controller
Table 1 Parameters configuration
Parameters | Configuration
Channel mode | Adaptive Full-rate Speech (AFS)
Active Codec Set (ACS) | set1: 4.75 kbps, 5.9 kbps, 7.4 kbps, 12.2 kbps
Initial Codec Mode (ICM) | 12.2 kbps
Threshold (THR) | [7 dB, 8.5 dB, 11.5 dB]
Hysteresis (HYST) | [1.5 dB, 1.5 dB, 2 dB]
(BSC)) and calling station (MS A), respectively, are evaluated with the ITU-T P.862.1 MOS-LQO [20] for
its high correlation with the subjective MOS. The corresponding network parameters such as Rxqual and
Rxlev are acquired via signal monitoring system, which facilitates the monitoring of network performance
along with service quality and costs less to upgrade or modify current equipments. Specifically speaking,
network parameters of downlink quality measured by MS and packed into Measurement Report (MR)
are sent on the uplink channel to the GSM network, while the radio uplink parameters are measured by
BTS directly. Network parameters related to both uplink quality and downlink quality are reported to
BSC by BTS via MR, so that we can obtain them at the standardized Abis interface.
3.2 Network parameters selection
Network parameters identified as particularly relevant to the resulting speech quality are selected.
All of the candidates must be measurable in the physical layer to guarantee that the proposed algorithm
runs in real time. As for GSM network, the analysis of network parameters is based on MR signaling
reported to BSC every 480 ms. Each clean speech sample has a duration of 4.8 s (PESQ recommends
that the minimum of active speech in the reference voice is 3.2 s [21]). It is exactly equal to 10 time
intervals of MR. Therefore, the proposed algorithm will assess speech quality every 4.8 s in real time. For
initial predictions, described further below, the parameters that have been used are as follows. Table 2
presents where to acquire the parameters and how important they are.
1) RxqualSub or RxqualFull: Rxqual is an integer value between 0 and 7, where each value corresponds
to a specific range of Bit Error Rate. The RxqualSub must be used if Discontinuous Transmission (DTX)
is used, otherwise the RxqualFull is preferred due to its higher confidence.
2) RxlevSub or RxlevFull: Rxlev ranging from 0 to 63 is mapped linearly to the received power level
at MS, which ranges from −110 dBm to −47 dBm. The selection criterion of Sub value or Full value is
just the same as that of Rxqual.
3) HO: The number of handovers, including Intracell Handover and Intercell Handover.
4) Codec: The voice coding used in GSM; here it refers specifically to AMR.
5) AMR configuration: Parameters configuration related to AMR codec, described in Table 1.
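The linear Rxlev mapping in item 2 and the Sub/Full selection rule in items 1 and 2 can be sketched as follows. This is a minimal illustration; the function names `rxlev_to_dbm` and `pick_measurement` are hypothetical and do not come from the paper.

```python
def rxlev_to_dbm(rxlev):
    """Map an Rxlev index (0..63) linearly onto -110 dBm .. -47 dBm."""
    if not 0 <= rxlev <= 63:
        raise ValueError("Rxlev must lie in 0..63")
    # One index step corresponds to 1 dB: 0 -> -110 dBm, 63 -> -47 dBm.
    return rxlev - 110


def pick_measurement(sub_value, full_value, dtx_used):
    """Use the Sub value when DTX is active, otherwise the Full value."""
    return sub_value if dtx_used else full_value
```

For instance, `rxlev_to_dbm(63)` gives the upper end of the range stated above, −47 dBm.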
Note that for each speech sample, one PESQ (MOS-LQO) score is evaluated and 10 network
parameter sets are available, because the duration of a speech sample is 4.8 s while the parameter set is
reported every 480 ms. The data matrix of each speech sample, also called a single set of data $\mathbf{D}_{\mathrm{single}\,i}$,
is equal to
$$\begin{bmatrix} \mathbf{Rxqual} & \mathbf{Rxlev} & \mathbf{HO} & \mathbf{Codec} & \mathbf{MOS} \end{bmatrix}_{10\times 5}, \tag{1}$$
where $\mathbf{Rxqual}_{10\times 1}$ represents the Rxqual vector and $\mathbf{Rxlev}_{10\times 1}$ the Rxlev vector. Thus,
all of the collected observations $\mathbf{D}_{\mathrm{total}}$ can be written as
$$\begin{bmatrix} \mathbf{D}_{\mathrm{single}\,1} & \mathbf{D}_{\mathrm{single}\,2} & \cdots & \mathbf{D}_{\mathrm{single}\,n} \end{bmatrix}^{T}_{(10n)\times 5}, \tag{2}$$
where n is the total number of collected speech samples, also called the sample capacity.
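As an illustration of (1) and (2), the sketch below assembles the 10×5 matrix of one speech sample and stacks several samples into the (10n)×5 matrix. The helper names and the use of a numeric code for Codec are assumptions made only for this example.

```python
MR_PER_SAMPLE = 10  # 4.8 s speech sample / 480 ms reporting period


def build_single(rxqual, rxlev, ho, codec, mos):
    """Return the 10x5 rows [Rxqual, Rxlev, HO, Codec, MOS] of one sample.

    HO, Codec and MOS are constant within a sample, so they are repeated
    on each of the 10 measurement-report rows.
    """
    if len(rxqual) != MR_PER_SAMPLE or len(rxlev) != MR_PER_SAMPLE:
        raise ValueError("expected 10 measurement reports per sample")
    return [[q, lev, ho, codec, mos] for q, lev in zip(rxqual, rxlev)]


def build_total(singles):
    """Stack n single-sample matrices row-wise into a (10n)x5 matrix."""
    return [row for single in singles for row in single]
```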
Table 2 Network parameters
No. | Network parameter | Importance | Acquisition location (uplink) | Acquisition location (downlink)
1 | Rxqual | Important | Abis | MR
2 | Rxlev | Important | Abis | MR
3 | Codec | Important | Abis | Abis
4 | HO | Important | Abis | Abis
Table 3 New variable set
No. | Variable name | Description | Application
1 | Mean | $\mathrm{Mean}_j = \frac{1}{10}\sum_{k=1}^{10} x_j(k)$ | Rxqual, Rxlev
2 | Std | $\mathrm{Std}_j = \sqrt{\frac{1}{9}\sum_{k=1}^{10}\left(x_j(k)-\mathrm{Mean}_j\right)^2}$ | Rxqual, Rxlev
3 | Max | The maximum value in a single set of data | Rxqual, Rxlev
4 | Medi | The median in a single set of data | Rxqual, Rxlev
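The four Table 3 variables can be computed for one 10-value Rxqual or Rxlev vector as in the sketch below. The function name `window_features` is hypothetical; `statistics.stdev` divides by n − 1, which matches the 1/9 factor in the Std definition for a 10-value window.

```python
import statistics


def window_features(x):
    """Compute the Table 3 variables for one vector of 10 reports."""
    return {
        "Mean": statistics.fmean(x),
        "Std": statistics.stdev(x),    # sample standard deviation (n - 1)
        "Max": max(x),
        "Medi": statistics.median(x),
    }
```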
4 Algorithm design
4.1 Fixed rate CM
Further data analysis, including data preprocessing and PCA, should be performed after the selection
of network parameters or predictors in terms of data mining. Data preprocessing aims to transform the
10 different network parameters sets in Dsingle i into only one set to obtain the input data for mapping
model training. PCA is adopted as a method of data reduction before model training. There are four
different bit rates in AMR FR ACS set1 just as shown in Table 1 and we take the example of 4.75 kbps
to illustrate the specific algorithm design of fixed rate CM.
4.1.1 Data preprocess
First of all, a new set of variables described in Table 3 is computed from the Rxqual vector and Rxlev
vector to explain the data fluctuation, average, and so on. Thus, all of observations can be written as
a matrix: $[\,\mathbf{Y}_1 \cdots \mathbf{Y}_m \;\; \mathbf{HO}_{n\times 1} \;\; \mathbf{MOS}_{n\times 1}\,]_{n\times(m+2)}$, where $\mathbf{Y}_i = [\,Y_{i1} \cdots Y_{ij} \cdots Y_{in}\,]^T$ and j, ranging
from 1 to n, is the speech sample index. The number n denotes the total number of
observations, and $\{\mathbf{Y}_i\}_1^m$ is the new variable set indicated in Table 3.
The units of measurement used for each element of $\{\mathbf{Y}_i\}_1^m$ may differ, so standardization
is needed; otherwise the variables with the largest variances would tend to dominate the first few
PCs when PCA is performed. As a result, the Z-vectors equal
$$\{\mathbf{Z}_i\}_1^m = \left\{\frac{\mathbf{Y}_i - \mathrm{Mean}(\mathbf{Y}_i)}{\mathrm{Std}(\mathbf{Y}_i)}\right\}_1^m, \tag{3}$$
where
$$\mathbf{Z}_i = [\,Z_{i1} \cdots Z_{ij} \cdots Z_{in}\,]^T. \tag{4}$$
Therefore, the original data set can be written as
$$[\,\mathbf{Z}_1 \cdots \mathbf{Z}_m \;\; \mathbf{HO}_{n\times 1} \;\; \mathbf{MOS}_{n\times 1}\,]_{n\times(m+2)} \tag{5}$$
after the data preprocessing. Note that the preprocessing excludes the variables HO and MOS because
they are constant in a single set of data Dsingle i, while the values of Rxqual and Rxlev are changing
continuously.
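The standardization in (3) amounts to a z-score per variable. A minimal sketch follows, assuming the sample standard deviation (n − 1 denominator) is used across the n observations; the paper does not state which denominator applies here, and the function name `standardize` is hypothetical.

```python
import statistics


def standardize(column):
    """Return (Y - Mean(Y)) / Std(Y) for one variable over n observations."""
    mean = statistics.fmean(column)
    std = statistics.stdev(column)  # assumption: sample standard deviation
    if std == 0:
        raise ValueError("a constant column cannot be standardized")
    return [(y - mean) / std for y in column]
```

After this step every variable has zero mean and unit variance, so no single variable dominates the first PCs because of its measurement unit.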
4.1.2 Principal component analysis (PCA)
In our work, PCA is exploited as a dimension reduction method to reduce the number of predictors
and eliminate multicollinearity, which contributes to low complexity and stability of mapping model
Table 4 Eigenvalues and proportion of variance
Component no. | Eigenvalue | Percentage of variance (%) | Cumulative percentage (%)
1 | 5.5263 | 40.32 | 40.32
2 | 3.1079 | 22.67 | 62.99
3 | 2.2597 | 16.48 | 79.47
4 | 1.8467 | 13.47 | 92.94
5 | 0.2991 | 2.18 | 95.13
6 | 0.2219 | 1.62 | 96.74
7 | 0.1355 | 0.99 | 97.73
Figure 3 Principal component analysis (PCA). (a) Eigenvalues and explained variance versus PC index; (b) third PC versus first PC.
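mentioned in the next section. PCA is carried out on the values of the Z-vectors $\{\mathbf{Z}_i\}_1^m$ derived from (3).
The first PC is
$$\mathbf{PC}_1 = [\,PC_{11} \;\; PC_{12} \cdots PC_{1n}\,]^T, \tag{6}$$
in which the elements $PC_{1j}$ are defined as
$$\begin{cases} PC_{11} = Z_{11} e_{11} + Z_{21} e_{12} + \cdots + Z_{m1} e_{1m}, \\ \quad\cdots \\ PC_{1j} = Z_{1j} e_{11} + Z_{2j} e_{12} + \cdots + Z_{mj} e_{1m}, \\ \quad\cdots \\ PC_{1n} = Z_{1n} e_{11} + Z_{2n} e_{12} + \cdots + Z_{mn} e_{1m}. \end{cases} \tag{7}$$
Based on the previous definitions, the equation can be rewritten in a more compact form as
$$\mathbf{PC}_1 = e_{11}\mathbf{Z}_1 + \cdots + e_{1i}\mathbf{Z}_i + \cdots + e_{1m}\mathbf{Z}_m = \mathbf{Z}\mathbf{e}_1, \tag{8}$$
where $\mathbf{Z}$ and $\mathbf{e}_1$ are defined as
$$\mathbf{Z} \triangleq [\,\mathbf{Z}_1 \cdots \mathbf{Z}_i \cdots \mathbf{Z}_m\,], \tag{9}$$
$$\mathbf{e}_1 \triangleq [\,e_{11} \cdots e_{1i} \cdots e_{1m}\,]^T. \tag{10}$$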
The symbol Z denotes the preprocessed data computed from Rxqual vector and Rxlev vector. On the
other hand, e1 is the eigenvector corresponding to the maximum eigenvalue related to covariance matrix
of Z.
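To make (8) concrete, the sketch below computes the covariance matrix of Z, approximates its leading eigenvector e1, and forms PC1 = Z e1. Power iteration is used here only to keep the example dependency-free; it is a substitute chosen for illustration, not the eigen-solver used in the paper, and all function names are hypothetical.

```python
def covariance_matrix(Z):
    """Sample covariance of the columns of Z (n rows, m columns)."""
    n, m = len(Z), len(Z[0])
    means = [sum(row[i] for row in Z) / n for i in range(m)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in Z) / (n - 1)
             for j in range(m)] for i in range(m)]


def leading_eigenvector(C, iterations=500):
    """Approximate the eigenvector of the largest eigenvalue of C."""
    m = len(C)
    e = [float(i + 1) for i in range(m)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        v = [sum(C[i][j] * e[j] for j in range(m)) for i in range(m)]
        norm = sum(x * x for x in v) ** 0.5
        if norm == 0:
            raise ValueError("start vector lies in the null space of C")
        e = [x / norm for x in v]
    return e


def first_pc(Z):
    """Return (PC1, e1) with PC1 = Z e1 as in (8)."""
    e1 = leading_eigenvector(covariance_matrix(Z))
    return [sum(z * w for z, w in zip(row, e1)) for row in Z], e1
```

Power iteration can converge slowly when the two largest eigenvalues are close, and it fails if the start vector is orthogonal to e1; a library eigendecomposition would be the usual choice in practice.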
Table 5 Mapping relationship between Rxqual and C/I
Rxqual | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
C/I (dB) | 23 | 19 | 17 | 15 | 13 | 11 | 8 | 4
Table 4 shows the first seven eigenvalues together with the percentage of the total
variance explained by each component, as shown graphically in Figure 3(a). Generally, the first
several PCs will reflect the variability of predictors. For example, in Figure 3(b), the two sets of points
close to the horizontal axis and far from the vertical axis indicate a strong correlation between
the corresponding predictors and the first PC. Finally, we extract the first q components according to the
variance explained criterion, that is, according to how much of the variability in the
variables we would like to explain. For instance, if we want 95% of the variability to be explained in total, components 1–5 must
be selected. The PC coefficient matrix is defined as
$$\boldsymbol{\Theta}_{m\times q} = [\,\mathbf{e}_1 \cdots \mathbf{e}_i \cdots \mathbf{e}_q\,]. \tag{11}$$
Consequently, the PCs of the original data can be written in matrix form:
$$\mathbf{PC}_{n\times q} = [\,\mathbf{PC}_1 \;\; \mathbf{PC}_2 \cdots \mathbf{PC}_q\,] = \mathbf{Z}_{n\times m}\,\boldsymbol{\Theta}_{m\times q}. \tag{12}$$
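The variance explained criterion can be checked against the percentages listed in Table 4. The sketch below returns the smallest q whose cumulative percentage reaches a target; the function name and the constant name are hypothetical.

```python
# Percentage of variance per component, copied from Table 4.
TABLE4_PERCENT = [40.32, 22.67, 16.48, 13.47, 2.18, 1.62, 0.99]


def components_needed(percent_variance, target=95.0):
    """Smallest q such that the first q components reach the target (%)."""
    cumulative = 0.0
    for q, pct in enumerate(percent_variance, start=1):
        cumulative += pct
        if cumulative >= target:
            return q
    raise ValueError("target not reached by the listed components")
```

With the Table 4 values, a 95% target selects the first five components, in agreement with the text.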
4.1.3 Mapping model
MARS [22] is adopted to further investigate the relationship between network parameters and QoE of
voice service. A forward stepwise algorithm as well as a backward algorithm are conducted during the
regression and generalized cross-validation [22] is used as model selection criterion. The general mapping
model is given in the following equation, by which the predicted speech quality RSQ is calculated.
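$$\mathrm{RSQ} = f\left(\mathbf{x}\,\middle|\,\{a_k\}_0^M\right) = a_0 + \sum_{k=1}^{M} a_k\,\mathrm{BF}_k(\mathbf{x}). \tag{13}$$
In (13), $\mathbf{x} = [\,\mathrm{HO} \;\; PC_1 \;\; PC_2 \cdots PC_q\,]$ denotes the predictors, combining the PCA results and the network
parameter HO, while the parameter Codec is treated as a categorical variable. The elements of $\mathrm{BF}(\mathbf{x})$ in (13)
are given by (14), and the base function $\mathrm{BF}_k(\mathbf{x})$ takes the form of (15). The coefficient $s_{kl}$ takes on
values ±1 to distinguish the original function from the reflected function, and $t_{kl}$ represents a split point
on the corresponding predictor $x_{v(k,l)} \in \mathbf{x}$.
$$(x-t)_+ = \begin{cases} x - t, & \text{if } x > t, \\ 0, & \text{otherwise}, \end{cases} \tag{14}$$
$$\mathrm{BF}_k(\mathbf{x}) = \prod_{l=1}^{K_k}\left[s_{kl}\left(x_{v(k,l)} - t_{kl}\right)\right]_+ = \prod_{l=1}^{K_k}\max\left\{0,\left[s_{kl}\left(x_{v(k,l)} - t_{kl}\right)\right]\right\}. \tag{15}$$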
As a result, the final QoE prediction model is formulated as (16a). Here, M is the final number of base
functions, and Kk indicates the number of variables associated with split points in each base function
$\mathrm{BF}_k(\mathbf{x})$, suggesting the interaction effects among corresponding variables. Coefficients $\{a_k\}_0^M$, $t_{kl}$, and
$s_{kl}$ are obtained by model training. Eq. (16b) shows the analysis of variance (ANOVA) decomposition of
(16a). It puts together the base functions having same number of variables and reveals further relationship
between speech quality and network parameters (independent variables). The second term in (16b) is the
sum of base functions with a single variable HO. Similarly, base functions of the third term only include
PCn, while the last sum consists of base functions that involve two or more variables.
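$$\mathrm{RSQ} = a_0 + \sum_{k=1}^{M} a_k \prod_{l=1}^{K_k}\max\left\{0,\left[s_{kl}\left(x_{v(k,l)} - t_{kl}\right)\right]\right\} \tag{16a}$$
$$= a_0 + \underbrace{\sum_{n=1}^{2} b_n \max\left\{0,\left[s_n\left(\mathrm{HO}-t_n\right)\right]\right\} + \sum_{n=1}^{q} c_n \max\left\{0,\left[s_n\left(PC_n - t_n\right)\right]\right\}}_{K_k=1} + \underbrace{\sum_{K_k=2}^{q+1}\sum_{k} d_k\,\mathrm{BF}_k\left(x_1,\cdots,x_{K_k}\right)}_{K_k\geq 2}. \tag{16b}$$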
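Evaluating a trained model of the form (16a) only requires hinge functions, products, and a weighted sum. The sketch below shows this evaluation; the data layout (tuples of predictor index, sign, and split point) and every coefficient in the usage example are hypothetical and are not values from the paper.

```python
def hinge(u):
    """max{0, u}, the truncated linear function of (14)."""
    return u if u > 0 else 0.0


def basis_function(x, factors):
    """Evaluate BF_k(x) as in (15).

    factors is a list of (v, s, t): predictor index v, sign s (+1 or -1),
    and split point t on predictor x[v].
    """
    value = 1.0
    for v, s, t in factors:
        value *= hinge(s * (x[v] - t))
    return value


def predict_rsq(x, a0, terms):
    """Evaluate (16a); terms is a list of (a_k, factors_k)."""
    return a0 + sum(a * basis_function(x, f) for a, f in terms)


# Hypothetical model with x = [HO, PC1, PC2].
example_terms = [
    (-0.4, [(0, +1, 0.0)]),                 # single-variable term on HO
    (0.3, [(1, -1, 1.0)]),                  # reflected hinge on PC1
    (0.1, [(1, +1, 0.0), (2, +1, 0.0)]),    # PC1 x PC2 interaction
]
example_rsq = predict_rsq([1, 0.5, -0.2], 3.0, example_terms)
```

Because prediction is a short sequence of comparisons and multiplications, it is much cheaper than model training, which is consistent with the real-time goal stated in Section 2.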
4.2 Adaptive rate CM
For AMR codec, it is possible that varying CMs from ACS are used adaptively during a call according to
the channel condition. Detailed description can be found in [23]. Thus, it is the speech quality assessment
under link adaptation that counts for much. An algorithm for the adaptive CM is proposed based on
the results of Subsection 4.1. However, it is necessary to have the knowledge of current codec, the codec
which is being used from the ACS.
4.2.1 CM decision approach
As described in [23], the CM decision can be made exactly from the Carrier to Interference (C/I) ratio
together with the threshold and hysteresis values. Because the real-time C/I value cannot be
obtained from the signal monitoring system deployed by operators, it is estimated approximately via the mapping
relationship between Rxqual and C/I shown in Table 5.
Figure 2 in [23] illustrates the behavior of how codec changes. For example, if the Rxqual is equal to
0, then we would infer that the current codec is CM 4 (12.2 kbps) during the past 480 ms because the
corresponding C/I value 23 dB is greater than THR3 + HYST3. Notice that threshold and hysteresis
values as well as ACS can be seen in Table 1 and CM 4 is defined as the highest rate, that is, 12.2 kbps,
while CM 1 represents the 4.75 kbps codec.
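The decision can be sketched with the Table 1 and Table 5 values as below. The switching rule is an assumption made for illustration: move to a higher-rate CM while the estimated C/I exceeds THR + HYST, and to a lower-rate CM while it falls below THR. The exact behavior is defined in [23], which may differ (for example, in how many steps are allowed per 480 ms period). The function and constant names are hypothetical.

```python
RXQUAL_TO_CI_DB = [23, 19, 17, 15, 13, 11, 8, 4]  # Table 5, index = Rxqual
THR_DB = [7.0, 8.5, 11.5]                          # Table 1: THR1..THR3
HYST_DB = [1.5, 1.5, 2.0]                          # Table 1: HYST1..HYST3


def next_cm(current_cm, rxqual):
    """Infer the CM (1 = 4.75 kbps .. 4 = 12.2 kbps) for one MR period."""
    ci = RXQUAL_TO_CI_DB[rxqual]
    cm = current_cm
    # Assumed rule: step up while C/I is above threshold plus hysteresis.
    while cm < 4 and ci > THR_DB[cm - 1] + HYST_DB[cm - 1]:
        cm += 1
    # Assumed rule: step down while C/I is below the lower threshold.
    while cm > 1 and ci < THR_DB[cm - 2]:
        cm -= 1
    return cm
```

With Rxqual equal to 0 the estimated C/I is 23 dB, which exceeds THR3 + HYST3 = 13.5 dB, so the sketch returns CM 4 as in the example above.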
4.2.2 Assessment method
The structure of the original observed data matrix is just the same as that of fixed rate CM,
$$\begin{bmatrix} \mathbf{Rxqual} & \mathbf{Rxlev} & \mathbf{HO} & \mathbf{Codec} & \mathbf{MOS} \end{bmatrix}_{10\times 5}. \tag{17}$$
First of all, CM decision is performed for 10 MRs, giving the CM values ranging from 1 to 4 (i.e., 4.75–12.2 kbps),
denoted as $\{\mathrm{CM}_k\}_1^{10}$. Other variables derived from $\{\mathrm{CM}_k\}_1^{10}$ used for modeling are listed in
Table 6. Then, the 10 MRs in a single set of data are divided into several groups (four groups at most)
according to the four CMs. Each group is associated with an evaluated RSQ value, obtained by
calling the corresponding mapping function described in (16a) and (16b). A novel variable called $\mathrm{RSQ}_w$
is calculated as
$$\mathrm{RSQ}_w = \frac{1}{10}\left[(\mathrm{RSQ}_{4.75K}\times\Delta_1) + (\mathrm{RSQ}_{5.9K}\times\Delta_2) + (\mathrm{RSQ}_{7.4K}\times\Delta_3) + (\mathrm{RSQ}_{12.2K}\times\Delta_4)\right]. \tag{18}$$
Therefore, the predictors prepared for modeling are Forward num, Backward num, and $\mathrm{RSQ}_w$, together
with HO. Forward num and Backward num count the CM changes during 4.8 s. They
relate to the dynamics of the channel condition and consequently affect speech quality. Finally,
the fitting model to assess the speech quality of the adaptive rate CM is formulated as (19). The
coefficients $\{a_k\}_0^4$ are obtained by MLR based on least squares, chosen for its low complexity and because of the
strong linear relationship between $\mathrm{RSQ}_w$ and MOS. $\delta_1$ and $\delta_2$, denoting Forward num
and Backward num, respectively, take integer values. For example, a $\delta_1$ value of 1 means
that the CM changes only once from a low rate to a high rate, and the speech quality will increase by $a_2$.
$$\mathrm{RSQ}_{adp} = f\left(\mathbf{x}\,\middle|\,\{a_k\}_0^4\right) = a_0 + a_1 \mathrm{RSQ}_w + a_2 \delta_1 + a_3 \delta_2 + a_4 \mathrm{HO}. \tag{19}$$
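The predictors of (18) and (19) can be derived from the 10 inferred CM values as sketched below. The function names are hypothetical, and counting a change between any two consecutive MRs as exactly one forward or backward event (even when it skips a mode) is an assumption; the coefficients would come from the least-squares fit and are not given here.

```python
def adaptive_features(cms, rsq_by_cm):
    """Return (RSQ_w, Forward num, Backward num) for one 4.8 s sample.

    cms is the list of 10 CM indices (1..4); rsq_by_cm maps each CM that
    occurs to the RSQ predicted by its fixed-rate model.
    """
    delta = {cm: cms.count(cm) for cm in (1, 2, 3, 4)}  # MR num per CM
    rsq_w = sum(rsq_by_cm[cm] * n for cm, n in delta.items() if n) / len(cms)
    forward = sum(1 for a, b in zip(cms, cms[1:]) if b > a)   # to higher rate
    backward = sum(1 for a, b in zip(cms, cms[1:]) if b < a)  # to lower rate
    return rsq_w, forward, backward


def rsq_adaptive(a, rsq_w, forward, backward, ho):
    """Evaluate (19) with coefficients a = [a0, a1, a2, a3, a4]."""
    return a[0] + a[1] * rsq_w + a[2] * forward + a[3] * backward + a[4] * ho
```

For example, the sequence 4, 4, 4, 3, 3, 3, 3, 4, 4, 4 contains six MRs at CM 4 and four at CM 3, with one backward and one forward change.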
Table 6 Variable set derived from $\{\mathrm{CM}_k\}_1^{10}$
No. | Variable name | Description
1 | Forward num, $\delta_1$ | When the current codec changes to a higher rate or less robust codec
2 | Backward num, $\delta_2$ | When the current codec changes to a lower rate or more robust codec
3 | MR num, $\{\Delta_i\}_1^4$ | The numbers of MRs corresponding to CMs: 4.75 kbps, 5.9 kbps, 7.4 kbps, 12.2 kbps
Table 7 Size of data set
CM (kbps) | Training set size | Test set size
4.75 | 599 | 202
5.9 | 529 | 177
7.4 | 505 | 168
12.2 | 404 | 138
Adaptive rate | 469 | 158
Table 8 Fitting accuracy for training and test sets
CM (kbps) | Model | Training ρ (%) | Training RMSE | Training ALMOS (%) | Test ρ (%) | Test RMSE | Test ALMOS (%)
4.75 | MARS | 91.2 | 0.195 | 88.9 | 91.6 | 0.199 | 100
5.9 | MARS | 94.5 | 0.200 | 95.8 | 93.0 | 0.232 | 100
7.4 | MARS | 92.1 | 0.285 | 78.9 | 92.0 | 0.271 | 81.8
12.2 | MARS | 96.2 | 0.235 | 86.0 | 93.3 | 0.300 | 83.3
Adaptive rate | MLR | 91.6 | 0.202 | N/A a) | 92.5 | 0.200 | N/A a)
a) There are not enough data for model training.
5 Result analysis
5.1 Data size
All of the original data, including network parameters and MOS, are collected in a real network, with MOS
values obtained in DT terminal and network parameters acquired in signal monitor platform. Outliers
will directly be excluded from training and test data sets. At the beginning, the basic setting parameters
for the modeling are given first. Then, the predicted speech quality will be presented in the following
section with four CMs: 4.75 kbps, 5.9 kbps, 7.4 kbps, and 12.2 kbps. The original data have been
randomly separated into two parts, i.e., training and test sets. The training data set is used to fit the
model, while the test set is used for assessing the accuracy of the finally chosen model. Typically, 75%
of original data would be selected for training and the others for testing [24]. Table 7 shows the detailed
information about the total size of training set and test set.
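A random split of this kind can be sketched as follows. The function name, the fixed seed, and the rounding rule are assumptions for illustration; the exact set sizes in Table 7 come from the authors' own split and are not reproduced by this sketch.

```python
import random


def split_samples(samples, train_fraction=0.75, seed=0):
    """Randomly separate samples into a training set and a test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```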
5.2 Result analysis
Traditionally, prediction performance is measured by Pearson’s correlation coefficient ρ and root mean
squared error (RMSE). They are usually used to interpret the relationship between x and y. Pearson’s
correlation coefficient and RMSE are computed based on the following equations:
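$$\rho = \frac{\sum_{k=1}^{N}\left[(x_k-\bar{x})(y_k-\bar{y})\right]}{\sqrt{\sum_{k=1}^{N}(x_k-\bar{x})^2\,\sum_{k=1}^{N}(y_k-\bar{y})^2}} \in [-1, 1], \tag{20}$$
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{k=1}^{N}(x_k-y_k)^2}{N}}, \tag{21}$$
where $\bar{x}$ is the average of $x_k$ and $\bar{y}$ is the average of $y_k$. Besides, the voice will be annoying or intolerable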
when the MOS value is less than 2, leading to a dramatic drop in the QoE of voice service. Operators must
Figure 4 Estimation results for the fixed rate CM: PESQ versus predicted MOS for (a) AFS 4.75 kbps, (b) AFS 5.9 kbps, (c) AFS 7.4 kbps, and (d) AFS 12.2 kbps.
take action to check whether the network is working correctly. Thus, a novel indicator called Accuracy of
Low MOS Prediction (ALMOS) is proposed to evaluate the model accuracy when PESQ is less than 2.
ALMOS takes the following form:

$$\mathrm{ALMOS} = \frac{\text{QoELess2 and PreQoELess2}}{\text{QoELess2}} \times 100\%. \qquad (22)$$
The denominator is the number of speech samples whose PESQ ranges from 1.02 to 2, and the
numerator is the number of speech samples whose PESQ and predicted MOS both range from 1.02
to 2.
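The three indicators of Eqs. (20) to (22) can be computed as in the sketch below. It is an illustration under stated assumptions, not the authors' implementation: the function names and the sample values are hypothetical, and the low-quality condition is simplified to "value below 2" without checking the lower bound of 1.02.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient, following Eq. (20)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmse(x, y):
    """Root mean squared error, following Eq. (21)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def almos(pesq, predicted, threshold=2.0):
    """Accuracy of Low MOS Prediction in percent, following Eq. (22).

    Assumption: "low" means strictly below the threshold. Returns None
    when no sample has low PESQ (the N/A case mentioned for Table 8).
    """
    low = [(p, q) for p, q in zip(pesq, predicted) if p < threshold]
    if not low:
        return None
    hits = sum(1 for _, q in low if q < threshold)
    return 100.0 * hits / len(low)


# Hypothetical values for illustration only (not measurement data).
pesq_scores = [1.5, 1.8, 2.5, 3.0, 1.2]
predicted_mos = [1.6, 2.1, 2.4, 3.2, 1.1]

rho = pearson(predicted_mos, pesq_scores)
err = rmse(predicted_mos, pesq_scores)
low_mos_accuracy = almos(pesq_scores, predicted_mos)
```

Returning `None` rather than 0 when no low-PESQ sample exists keeps "no data" distinct from "no correct alarm".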
The fitting accuracy for the training set and the test set, in terms of correlation coefficient ρ, RMSE, and ALMOS,
is summarized in Table 8. All of the correlation coefficients are greater than 90%, which shows that high
accuracy is achieved for both fixed and adaptive rate modes; that is, the predicted results have a strong
positive correlation with the PESQ results. The absolute differences between predicted MOS and PESQ
are small, as the RMSE values are less than 0.3. Also, the high values of ALMOS shown in Table 8
indicate that the proposed algorithm can raise the low speech quality alarm with high confidence when the
QoE of voice service is less than 2. Note that the symbol N/A in Table 8 means that the
corresponding indicator could not be given because there are not enough data for model training, a result of the
improvement of speech quality by link adaptation.
Figure 4 shows the scatter plots of predicted MOS versus PESQ, which make the high prediction
accuracy reported in Table 8 visible. The two outer oblique lines in Figure 4 mark where the difference
between predicted MOS and PESQ is 0.354 or −0.354. The majority of the absolute
estimation errors are less than 0.354. Figure 5(a) shows the resulting scatter plot of actual QoE against
model prediction for the adaptive CM. We achieve a correlation coefficient of about 92% for the test data set.
Figure 5(b) provides a histogram of the difference between predicted MOS and PESQ, also called the residual
Figure 5 Estimation results for the adaptive rate CM. (a) Scatter plot of PESQ versus predicted MOS for AMR FR set1; (b) frequency histogram of residual speech quality.
Figure 6 CDF curves and QoE values for all of the CMs. (a) The CDF of residual speech quality (probability versus absolute values of residual speech quality) for AFS 4.75 kbps, 5.9 kbps, 7.4 kbps, and 12.2 kbps with MARS and with MLR, and for AFS set1 with MARS; (b) reference and predicted QoE values (voice service QoE versus speech samples of test set).
error. It approximates a normal distribution, meaning that the MLR works well for QoE prediction.
Also, we present the Cumulative Distribution Function (CDF) of the absolute values of residual speech
quality for the fixed rate CM and the adaptive rate CM in Figure 6(a). For the fixed codec modes, we also give
the experimental results of MLR-based QoE measurement. According to Figure 6(a), the CDF curves of the
proposed MARS-based method lie above those of the MLR-based QoE measurement.
This indicates that more samples fall in the range of small residual error for the MARS-based
assessment. The reason is that MARS captures nonlinear relationships better than MLR. Taking
AFS 4.75 kbps in Figure 6(a) as an example, the probability based on MARS is greater than 0.9 when the
absolute value of residual speech quality is 0.3, while the corresponding probability based on MLR is
only about 0.8. We also give the comparative results between predicted QoE and the reference values for
all of the CMs with about 843 samples in Figure 6(b). It is evident that the predictions follow the
changes of the actual QoE values, especially when the actual QoE values are below 2.5 or above 3.5.
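The CDF comparison discussed above rests on the empirical distribution of absolute residuals. The sketch below shows one way to evaluate it at chosen residual levels. The function name `abs_residual_cdf` and the sample values are hypothetical, and the code does not reproduce the curves of Figure 6(a).

```python
def abs_residual_cdf(pesq, predicted, thresholds):
    """Empirical CDF of |PESQ - predicted MOS| evaluated at each threshold.

    For every threshold t, returns the fraction of samples whose absolute
    residual is less than or equal to t.
    """
    abs_residuals = [abs(p - q) for p, q in zip(pesq, predicted)]
    n = len(abs_residuals)
    return [sum(1 for r in abs_residuals if r <= t) / n for t in thresholds]


# Hypothetical values for illustration only (not measurement data).
pesq_scores = [3.0, 2.0, 4.0, 1.5]
predicted_mos = [3.1, 2.5, 3.8, 1.5]

cdf_values = abs_residual_cdf(pesq_scores, predicted_mos, thresholds=[0.3, 0.6])
```

Evaluating two models on the same test samples and thresholds gives directly comparable values: the model with the larger CDF value at a given threshold has more samples with small residual error.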
6 Conclusion
A novel and practical QoE measurement strategy for AMR codec voice service, which takes network
parameters as its independent variables, is proposed. With the method proposed in this
paper, wireless operators can access the voice service QoE of the monitored users, especially the users
with high Average Revenue Per User. This result can be exploited to guide network optimization as
well as network maintenance directly and effectively. Numerical results based on the data sets
collected from a real network validate the high accuracy and applicability of the algorithm. Also, it was
shown that the method designed for the AMR codec can also be used to estimate the QoE of other mobile
systems, including UMTS, if appropriate network parameters are selected, such as Received Signal
Code Power (RSCP), Block Error Rate, Signal to Interference Ratio, and so on.
Acknowledgements
The preliminary work of this paper was presented at the International Conference on Optical Internet (COIN)
2013. This research work was supported by China National S&T Major Project (Grant No. 2012ZX03001034).
References
1 ITU-T P.10/G.100. Vocabulary and effects of transmission parameters on customer opinion of transmission quality.
2008
2 Zhou Y Q, Liu H, Pan Z G, et al. Two-stage cooperative multicast transmission with optimized power consumption
and guaranteed coverage. IEEE J Sel Area Commun, 2013, 99: 1–11
3 ETSI TS 102 250. Speech processing, transmission and quality aspects (STQ); QoS aspects for popular services in
GSM and 3G networks, Part 2: definition of quality of service parameters and their computation. 2008
4 3GPP. Technical Specification Group Services and System Aspects; Mandatory Speech CODEC Speech Processing
Functions; AMR Speech CODEC; General Description (Release 10). 3GPP TS 26.071. 2011
5 3GPP. Technical Specification Group Services and System Aspects; Mandatory Speech CODEC Speech Processing
Functions; Adaptive Multi-rate (AMR) Speech CODEC; Transcoding Functions (Release 11). 3GPP TS 26.090. 2012
6 ITU-T Recommendation G.722.2. Wideband coding of speech at around 16 kbit/s using Adaptive Multi-Rate Wideband
(AMR-WB). 2003
7 ITU-T Recommendation P.800. Methods for subjective determination of transmission quality, Geneva. 1996
8 ITU-T Recommendation P.862. Perceptual evaluation of speech quality (PESQ): an objective method for end-to-end
speech quality assessment of narrowband telephone networks and speech codecs. Geneva: International
Telecommunication Union, 2001
9 ITU-T Recommendation P.863. Perceptual objective listening quality assessment. 2011
10 Kuipers F, Kooij R, De Vleeschauwer D, et al. Techniques for measuring quality of experience. In: 8th International
Conference on Wired/Wireless Internet Communications. Berlin/Heidelberg: Springer-Verlag, 2010. 216–227
11 Moller S, Chan W-Y, Cote N, et al. Speech quality estimation: models and trends. IEEE Signal Process Mag, 2011,
28: 18–28
12 Bhatt N, Kosta Y. Overall performance evaluation of adaptive multi rate 06.90 speech codec based on code excited
linear prediction algorithm using MATLAB. Int J Speech Technol, 2012, 15: 119–129
13 Dolezalova B, Holub J, Street M. Mobile network voice transmission quality estimation based on radio path parameters.
In: Wireless Telecommunications Symposium, Pomona, 2005. 95–99
14 Werner M, Kamps K, Tuisel U, et al. Parameter-based speech quality measures for GSM. Personal Indoor Mob Radio
Commun, 2003, 3: 2611–2615
15 Karlsson A, Heikkila G, Minde T B, et al. Radio link parameter based speech quality index-SQI. In: IEEE Workshop
on Speech Coding Proceedings, Haikko Manor Porvoo, 1999. 147–149
16 Wanstedt S, Pettersson J, Xianchun T, et al. Development of an objective speech quality measurement model for the
AMR codec. In: Proceedings of Workshop on Measurement of Speech and Audio Quality in Networks, 2002. 77–82
17 Pitas C N, Charilas D E, Panagopoulos A D, et al. Adaptive neuro-fuzzy inference models for speech and video quality
prediction in real-world mobile communication networks. IEEE Wirel Commun, 2013, 20: 80–88
18 Pitas C N, Charilas D E, Panagopoulos A D, et al. ANFIS-based quality prediction models for AMR-telephony in
public 2G/3G mobile networks. In: IEEE Global Communications Conference, Anaheim, 2012. 1728–1732
19 Mitra K, Ahlund C, Zaslavsky A. Performance evaluation of a decision-theoretic approach for quality of experience
measurement in mobile and pervasive computing scenarios. In: IEEE Wireless Communications and Networking
Conference, Shanghai, 2012. 2418–2423
20 ITU-T Recommendation P.862.1. Mapping function for transforming P.862 raw result scores to MOS-LQO. 2003
21 ITU-T Recommendation P.862.3. Application guide for objective quality measurement based on Recommendations
P.862, P.862.1 and P.862.2. 2007
22 Friedman J H. Multivariate adaptive regression splines. Ann Statist, 1991, 19: 1–141
23 3GPP. Technical Specification Group GSM/EDGE; Radio Access Network; Link Adaptation (Release 10). 3GPP TS
45.009. 2011
24 Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction.
2nd ed. Berlin: Springer, 2009