1 © 2006 Nokia AMRWB_depl.ppt / 2006-04-11 / SHy Seminar Presentation: Adaptive Multi-Rate Wideband Speech Codec deployment in 3G Core Network Sergei Hyppenen Supervisor: Professor Sven-Gustav Häggman HELSINKI UNIVERSITY OF TECHNOLOGY 11.04.2006


Seminar Presentation: Adaptive Multi-Rate Wideband Speech Codec deployment in 3G Core Network

Sergei Hyppenen

Supervisor: Professor Sven-Gustav Häggman

HELSINKI UNIVERSITY OF TECHNOLOGY, 11.04.2006


Contents of the presentation

• Abbreviations

• Introduction

• AMR-WB speech codec

• Network architectures: GSM and 3G (Release 4)

• Speech transmission

• TrFO and TFO

• Out-of-Band Transcoder Control in TrFO

• TFO frames

• Lawful interception

• Signal interception simulation

• Test results: Noise floor values

• Test results: MOS quality values

• Conclusions


Abbreviations

• 3G: 3rd Generation

• ACELP: Algebraic Code-Excited Linear Prediction

• AMR-WB: Adaptive Multi-Rate Wideband speech codec

• ATM: Asynchronous Transfer Mode

• BSS: Base Station Subsystem

• CN: Core network

• dB: decibel

• dBov: dB relative to the overload point of the digital system

• DTX: Discontinuous Transmission

• EDGE: Enhanced Data rates for Global Evolution

• G.711: PCM-based coding method with 8 kHz sampling frequency and 8-bit A- or µ-law companding

• GSM: Global System for Mobile Communications

• HR: Half Rate speech codec

• IP: Internet Protocol

• LSB: Least Significant Bit

• MOS: Mean Opinion Score rated 1-5

• NSS: Network Sub-System

• OoBTC: Out-of-Band Transcoder Control

• TC: Transcoder

• TDM: Time Division Multiplexing

• TFO: Tandem Free Operation

• TrFO: Transcoder Free Operation

• UMTS: Universal Mobile Telecommunications System

• VAD: Voice Activity Detection

• WB-PESQ: a tool for quality evaluation [ITU-T: P.862]


Introduction

• Speech contains frequencies up to about 10 kHz

• Current fixed and mobile telecommunication systems operate with a narrow audio bandwidth: 300-3400 Hz (ITU-T G.711)

• 500-3000 Hz is sufficient for understanding

• The sampling frequency used in digital core networks is 8000 Hz → in theory this enables transmitting signals up to 4000 Hz

• Codecs used in mobile systems lower the quality of narrowband speech even further than G.711 does

• The AMR-WB speech codec improves the quality and especially the naturalness of speech

• In EDGE and UMTS all coding modes of AMR-WB will be used; in GSM only the coding modes up to 12.65 kb/s


AMR-WB speech codec

• Processed audio bandwidth: 50-7000 Hz

• Sampling: 16 kHz

• Precision: 14-bit

• Coding model: ACELP

• VAD and DTX

• Bad frame handler

• Bit rates: 6.60, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, 23.85 kb/s

• Coding mode 12.65 kb/s produces better quality than G.711 (64 kb/s)

[Figure: four panels plotted against time: original speech, A-law coded speech, AMR-WB coded speech and HR coded speech]


Network architectures: GSM and 3G (Release 4)

• GSM: the Transcoder (TC) is a part of the Base Station Subsystem (BSS)

• In the core Network Sub-System (NSS), speech signals are transferred in G.711 form

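[Figure: GSM network architecture. Elements: MS (ME + SIM), BTS, BSC and TC in the BSS; MSC, GMSC, VLR, HLR, AuC and EIR in the NSS; O&M / NMS; other PLMN; PSTN/ISDN. Interfaces: Um, Abis, Ater, A.]

[Figure: 3G Release 4 network architecture. Elements: MS and UE; GERAN (BTS, BSC); UTRAN (Node-B, RNC); CN CS Domain (MSC Server, MSS/GCS, MGW with TC); CN PS Domain (SGSN, GGSN); Network Management (NMS); other PLMN; PSTN/ISDN; Internet. Interfaces and protocols: Um, Uu, Abis, Iub, Ater/Iu, Iu, Gb, Mc (H.248), Nb, BICC CS-2 / SIP-T / ISUP, TDM/IP/ATM.]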

• 3G, Release 4: the Core Network (CN) is divided into Packet Switched (PS) and Circuit Switched (CS) domains

• The CS domain is separated into a Control Plane (signaling) and a User Plane (data)

• The TC is moved to the core network, but G.711 is still the most common format for transferring speech in the CN


Speech transmission

• In current telecommunication systems transcoding is performed at least twice

• In core networks speech signals are transferred in narrowband G.711 form, and each one-way connection requires a 64 kb/s channel

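[Figure: GSM speech path in the uplink and downlink directions. Elements: MS, BTS, BSC, TC, MSC, MSC, TC, BSC, BTS, MS. Labels: encoding in the sending MS (EFR / FR / HR, coded signal 22.8 kb/s), Abis and Ater at 16 kb/s, decoding in the TC, G.711 over the A interface (TDM) at 64 kb/s between the MSCs, encoding in the TC, decoding in the receiving MS.]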

• Wideband speech cannot be transferred using the same technique

• It would require connection speeds of 16 kHz * 14 bit = 224 kb/s, which are UNACCEPTABLY HIGH!

• → wideband speech should be transferred only in CODED FORM!
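The bit-rate argument on this slide can be checked with a few lines of arithmetic. The sketch below only multiplies the sampling rates and sample widths quoted in the slides; the helper name `pcm_bit_rate` is illustrative and not taken from the presentation.

```python
# Back-of-the-envelope bit rates for uncompressed (PCM) speech transport.
# All figures come from the slides; the helper name is illustrative only.

def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bit rate in bit/s of a one-way uncompressed PCM stream."""
    return sample_rate_hz * bits_per_sample

narrowband_g711 = pcm_bit_rate(8_000, 8)     # G.711: 8 kHz, 8-bit A-/mu-law
wideband_linear = pcm_bit_rate(16_000, 14)   # AMR-WB input: 16 kHz, 14-bit
amr_wb_max_mode = 23_850                     # highest AMR-WB mode in bit/s

print(narrowband_g711)                       # 64000
print(wideband_linear)                       # 224000
print(wideband_linear / narrowband_g711)     # 3.5
print(amr_wb_max_mode < narrowband_g711)     # True
```

Uncoded wideband speech would need 3.5 times the capacity of a G.711 channel, while even the highest AMR-WB mode stays well below 64 kb/s.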


TrFO and TFO

• Transcoder Free Operation (TrFO) transfers coded speech frames as such, without transcoding, in ATM- and IP-based networks

• Transcoder-free means that the same codec is used on both sides of a connection → Out-of-Band Transcoder Control (OoBTC) is needed

• OoBTC requires the late assignment of a radio traffic channel with forward bearer establishment in CN (see the next slide for details)

• In Tandem Free Operation (TFO) coded frames are embedded into the least significant bits (LSBs) of PCM-based signals

• TFO is used in TDM networks

• The TFO protocol negotiates a common codec with the distant partner by sending messages in-band

• Message bits replace the LSB of every 16th PCM sample

• Once both mobile terminals have switched to a compatible codec, the coded speech frames can be embedded into the PCM-based stream that was decoded from those same frames
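The in-band signaling idea (message bits carried in the LSB of every 16th PCM sample) can be illustrated with a small sketch. This is not the TFO protocol itself: the function names, the choice of the first sample in each group of 16, and the plain-integer treatment of the 8-bit samples are assumptions made for illustration.

```python
from typing import List, Sequence

MESSAGE_STRIDE = 16  # one message bit per 16 PCM samples, as on the slide


def embed_message_bits(pcm: Sequence[int], bits: Sequence[int]) -> List[int]:
    """Return a copy of `pcm` with the LSB of every 16th sample replaced.

    Assumption: the first sample of each group of 16 carries the bit.
    """
    out = list(pcm)
    for i, bit in enumerate(bits):
        pos = i * MESSAGE_STRIDE
        if pos >= len(out):
            break  # not enough samples for the remaining bits
        out[pos] = (out[pos] & 0xFE) | (bit & 1)
    return out


def extract_message_bits(pcm: Sequence[int], count: int) -> List[int]:
    """Read back up to `count` message bits from every 16th sample's LSB."""
    return [pcm[i * MESSAGE_STRIDE] & 1
            for i in range(count) if i * MESSAGE_STRIDE < len(pcm)]


samples = [0xD5] * 64          # 64 identical 8-bit samples (8 ms at 8 kHz)
message = [1, 0, 1, 1]
marked = embed_message_bits(samples, message)

recovered = extract_message_bits(marked, len(message))
changed = sum(a != b for a, b in zip(samples, marked))
print(recovered)               # [1, 0, 1, 1]
print(changed)                 # 1
```

Only one of the 64 samples changes here, because a replaced LSB alters the sample only when the message bit differs from the original bit.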


Out-of-Band Transcoder Control in TrFO

• In TrFO, the codec to be used during the call has to be negotiated before the bearer establishment procedures

[Figure: two message sequence charts between UE, RNC-O, MSC-S-O, MGW-O, MGW-T, MSC-S-T, RNC-T and UE.

Chart 1, "Early assignment of a radio traffic channel with backward bearer establishment in CN". Messages shown: SETUP; IAM + Bearer Information; Paging; SETUP; ← Bearer establishment; Nb UP Initialization →; Bearer establishment →; Iu UP Initialization →; ← Bearer establishment; ← Iu UP Initialization; ALERTING; CONNECT.

Chart 2, "Late assignment of a radio traffic channel with forward bearer establishment in CN". Messages shown: SETUP; IAM; Paging; SETUP; Bearer Information; Bearer establishment →; Nb UP Initialization →; Bearer establishment →; Iu UP Initialization →; ← Bearer establishment; ← Iu UP Initialization; ALERTING; CONNECT.]


TFO frames 1

• When TFO is operational, 1, 2 or 4 LSBs of every 8-bit PCM sample are replaced by TFO frame bits

• TFO frames requiring replacement of 4 LSBs consist of the main frame part (1st and 2nd LSBs) and the extension frame part (3rd and 4th LSBs).

• During transmission through the core network, TFO frames should not be modified by noise suppression, level control or other enhancement algorithms

[Figure: bit layout of the 8k, 16k and 32k TFO frames over 160 PCM samples, with sample bits numbered 8 to 1.
8k TFO frame: 1 LSB per sample, TFO frame length = 160 bits.
16k TFO frame: 2 LSBs per sample, TFO frame length = 320 bits.
32k TFO frame: 4 LSBs per sample, TFO frame length = 640 bits, split into the main frame part and the extension frame part.
The remaining bits of each sample are unaltered.]
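The frame lengths in the figure follow directly from the number of replaced LSBs. The sketch below reproduces that arithmetic and explains the 8k / 16k / 32k names. The 20 ms AMR-WB frame duration is not stated on the slides and is brought in here as background knowledge, and all identifiers are illustrative.

```python
PCM_SAMPLE_RATE_HZ = 8_000      # G.711 sampling rate
SAMPLES_PER_TFO_FRAME = 160     # from the figure: one TFO frame spans 160 samples
AMR_WB_FRAME_MS = 20            # assumption: AMR-WB speech frame duration


def tfo_frame_bits(lsbs_per_sample: int) -> int:
    """Raw size of one TFO frame when `lsbs_per_sample` LSBs are replaced."""
    return SAMPLES_PER_TFO_FRAME * lsbs_per_sample


def tfo_channel_rate_bps(lsbs_per_sample: int) -> int:
    """Capacity stolen from the 64 kb/s PCM channel, in bit/s."""
    return PCM_SAMPLE_RATE_HZ * lsbs_per_sample


frame_ms = 1000 * SAMPLES_PER_TFO_FRAME // PCM_SAMPLE_RATE_HZ
sizes = {n: (tfo_frame_bits(n), tfo_channel_rate_bps(n)) for n in (1, 2, 4)}
print(frame_ms)   # 20
print(sizes)      # {1: (160, 8000), 2: (320, 16000), 4: (640, 32000)}

# Speech bits produced by the 23.85 kb/s AMR-WB mode in one 20 ms frame.
speech_bits_2385 = 23_850 * AMR_WB_FRAME_MS // 1000
fits = {n: tfo_frame_bits(n) >= speech_bits_2385 for n in (1, 2, 4)}
print(speech_bits_2385)   # 477
print(fits)               # {1: False, 2: False, 4: True}
```

By raw capacity alone, the 23.85 kb/s mode only fits into the 32k TFO frame. The comparison ignores the synchronization, control and other overhead bits that a TFO frame also carries.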


TFO frames 2

• TFO frames are different for each codec and, for a multi-rate codec, for each coding mode

• TFO frames contain synchronization bits, control and error correction bits, time alignment bits, spare bits and actual data bits

• Synchronization and control bits are used only in the main part

• On the right is an example of the TFO frames specified for AMR-WB in the 23.85 kb/s coding mode


Lawful interception

• Before an operator may launch a commercial telecommunication network, it has to provide the lawful interception service.

• The quality provided for the authorities has to be the same as or better than the quality provided for the monitored target

• PCM-based intercepted signals are directed to the authorities as such

• Coded signals are converted into PCM form

• What should be done if the intercepted signal contains TFO frames? After all, such a signal is noisy

• The solution is to use the passive TFO protocol

• But how bad is the noise really?


Signal interception simulation

• Theoretical noise floor values were calculated with the assumption that every bit in the signal representation adds 6 dB to the dynamic range of the signal

• The results were verified by sending silence through the testing system

• The MOS quality values of the speech signals were also evaluated using the WB-PESQ tool

• The tests simulated the scheme presented on the right

[Figure: simulated interception scheme. Blocks shown: wideband speech input, Encoder, Radio interface, Decoder, Down-sampler, G.711 converter, Local TFO, Transit network, Distant TFO, Passive TFO, interface towards authorities, output. Link labels: coded, G.711, G.711 (+TFO).]

Signals marked in the figure:

1. Original wideband signal

2. Once transcoded wideband signal (2a and 2b)

3. Pure narrowband G.711 signal

4. Narrowband G.711 signal with possible embedded TFO frames


Test results: Noise floor values

• The linear representation of A-law uses 13 bits and that of µ-law 14 bits. The first bit is the sign bit and is not counted among the effective bits of the representation

• In theory, only about half of the replaced bits actually change value → the measured noise floor values are lower than the calculated ones

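| Law | Corrupted bits in G.711 sample | Corrupted bits in linear values | Effective bits in linear level representations | Unaltered bits in linear values | Noise floor in dBov, calculated (approx) | Noise floor in dBov, measured (exact) |
|---|---|---|---|---|---|---|
| A-law | 0 | 0 | 12 | 12 | -72 | -72.26 |
| A-law | every 16th LSB | | 12 | | | -71.21 |
| A-law | 1 | 2 | 12 | 10 | -60 | -64.77 |
| A-law | 2 | 3 | 12 | 9 | -54 | -59.47 |
| A-law | 4 | 5 | 12 | 7 | -42 | -47.59 |
| µ-law | 0 | 0 | 13 | 13 | -78 | -78.26 |
| µ-law | every 16th LSB | | 13 | | | -76.47 |
| µ-law | 1 | 2 | 13 | 11 | -66 | -74.74 |
| µ-law | 2 | 3 | 13 | 10 | -60 | -66.44 |
| µ-law | 4 | 5 | 13 | 8 | -48 | -51.42 |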
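The "calculated" column can be reproduced from the 6 dB per bit assumption stated on the previous slide. The sketch below takes the effective bit counts and the mapping from corrupted G.711 LSBs to corrupted linear bits directly from the table; the function and constant names are illustrative.

```python
DB_PER_BIT = 6  # slide assumption: each bit adds about 6 dB of dynamic range

# Effective linear bits (sign bit excluded), from the table.
EFFECTIVE_BITS = {"A-law": 12, "mu-law": 13}

# Corrupted G.711 LSBs -> corrupted bits in the linear value, from the table.
CORRUPTED_LINEAR_BITS = {0: 0, 1: 2, 2: 3, 4: 5}


def calculated_noise_floor_dbov(law: str, corrupted_g711_lsbs: int) -> int:
    """Approximate noise floor: -6 dB for every unaltered linear bit."""
    unaltered = EFFECTIVE_BITS[law] - CORRUPTED_LINEAR_BITS[corrupted_g711_lsbs]
    return -DB_PER_BIT * unaltered


calculated = {
    law: [calculated_noise_floor_dbov(law, n) for n in (0, 1, 2, 4)]
    for law in EFFECTIVE_BITS
}
print(calculated["A-law"])    # [-72, -60, -54, -42]
print(calculated["mu-law"])   # [-78, -66, -60, -48]
```

The output matches the calculated values in the table. The measured values are lower, consistent with the remark that only part of the replaced bits actually change.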


Test results: MOS quality values

• The level of the original signals was -26 dBov and their SNR was 45 dB

• Signals decoded from TFO frames (2b) differ slightly from the directly decoded ones (2a), because the TFO protocol needs approximately 1 second to establish a connection. During that time no coded speech frames are sent

| Signal files | Decoded (2a) | G.711 (3) | G.711+TFO (4) | Decoded TFO (2b) |
|---|---|---|---|---|
| T04 | 3.9 | 3.1 | 1.7 | 3.6 |
| T05 | 4.1 | 3.9 | 1.8 | 3.8 |
| T14 | 3.7 | 3.4 | 1.8 | 3.6 |
| T18 | 3.7 | 2.9 | 2.1 | 3.6 |
| Average | 3.9 | 3.3 | 1.9 | 3.7 |
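As a consistency check, the "Average" row can be recomputed from the four per-file scores. The sketch assumes round-half-up to one decimal, which is a guess about how the averages were rounded (the original averages may have been taken from unrounded scores); `Decimal` is used to avoid binary floating-point surprises at the .x5 boundaries.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-file MOS values from the table (files T04, T05, T14, T18).
MOS_SCORES = {
    "Decoded (2a)":     ["3.9", "4.1", "3.7", "3.7"],
    "G.711 (3)":        ["3.1", "3.9", "3.4", "2.9"],
    "G.711+TFO (4)":    ["1.7", "1.8", "1.8", "2.1"],
    "Decoded TFO (2b)": ["3.6", "3.8", "3.6", "3.6"],
}


def average_half_up(scores):
    """Mean of the scores, rounded half-up to one decimal (assumed rounding)."""
    total = sum((Decimal(s) for s in scores), Decimal(0))
    mean = total / Decimal(len(scores))
    return mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


averages = {name: average_half_up(vals) for name, vals in MOS_SCORES.items()}
for name, avg in averages.items():
    print(f"{name}: {avg}")

# MOS recovered by decoding the TFO frames instead of listening to the PCM.
gain = averages["Decoded TFO (2b)"] - averages["G.711+TFO (4)"]
print(gain)   # 1.8
```

Under that rounding assumption the recomputed averages agree with the table: 3.9, 3.3, 1.9 and 3.7.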


Conclusions

• The SNR values of the intercepted signals with AMR-WB-specific TFO frames were 15-25 dB (original signals at -26 dBov) and the MOS grades were below two.

• If the original signals had contained noise from the beginning, as is usual in real phone calls, the quality would have been lower

• With lower signal levels in the tests, -30 and -36 dBov, which correspond to intensive whispering in real-world calls, the results would have been even worse

• → authorities will not be satisfied with the quality of the intercepted signal

• → the passive TFO protocol is indeed needed!
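A rough way to relate these conclusions to the noise floor table is to subtract the measured noise floor from the signal level. This is only an estimate under the assumption that SNR is approximately level minus noise floor; the presentation does not state how its SNR values were obtained, so the numbers need not match the reported 15-25 dB range exactly. The noise floors used are the measured values for 4 replaced LSBs.

```python
# Measured noise floors in dBov with 4 LSBs replaced (from the table).
MEASURED_NOISE_FLOOR_DBOV = {"A-law": -47.59, "mu-law": -51.42}


def estimated_snr_db(signal_level_dbov: float, noise_floor_dbov: float) -> float:
    """Assumed estimate: SNR is the signal level minus the noise floor."""
    return signal_level_dbov - noise_floor_dbov


estimates = {
    level: {
        law: round(estimated_snr_db(level, floor), 2)
        for law, floor in MEASURED_NOISE_FLOOR_DBOV.items()
    }
    for level in (-26, -30, -36)   # signal levels mentioned on the slides
}
for level, by_law in estimates.items():
    print(level, by_law)
```

The estimate falls by one dB for every dB of lower signal level, which is why quieter speech at -30 or -36 dBov would fare worse than the -26 dBov test signals.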