SPEECH CODING Maryam Zebarjad Alessandro Chiumento

SPEECH CODING

Maryam Zebarjad, Alessandro Chiumento

SPEECH PROPERTIES

2 categories: voiced and unvoiced
  Voiced: quasi-periodic in the time domain and harmonically structured in the frequency domain
  Unvoiced: random-like and broadband (like white noise)

Why speech coding?
  Efficient transmission
  Efficient storage

Problem: high quality at the lowest possible bit rate

Performance measures

2 ways of measuring:

Objective
  SNR (long term)
  SEGSNR (short term)

Subjective
  DRT: Diagnostic Rhyme Test
  DAM: Diagnostic Acceptability Measure
  MOS: Mean Opinion Score

4 standards for speech quality:

Broadcast, Network, Communications, Synthetic
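The two objective measures listed above can be sketched in a few lines. This is a minimal illustration, not a standardized procedure: the function names, the 160-sample frame length (20 ms at an assumed 8 kHz sampling rate) and the toy signal are all assumptions made for the example.

```python
import math

def snr_db(reference, decoded):
    """Long-term SNR in dB, computed over the whole signal."""
    signal_energy = sum(x * x for x in reference)
    noise_energy = sum((x - y) ** 2 for x, y in zip(reference, decoded))
    return 10.0 * math.log10(signal_energy / noise_energy)

def segsnr_db(reference, decoded, frame_len=160):
    """Segmental SNR: average of the per-frame SNRs in dB.

    frame_len=160 (20 ms at 8 kHz) is an assumed value.
    """
    frame_snrs = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        stop = start + frame_len
        signal_energy = sum(x * x for x in reference[start:stop])
        noise_energy = sum((x - y) ** 2
                           for x, y in zip(reference[start:stop], decoded[start:stop]))
        # Skip silent or error-free frames, whose SNR in dB is undefined.
        if signal_energy > 0.0 and noise_energy > 0.0:
            frame_snrs.append(10.0 * math.log10(signal_energy / noise_energy))
    if not frame_snrs:
        raise ValueError("no frame with non-zero signal and noise energy")
    return sum(frame_snrs) / len(frame_snrs)

# Toy signal: a loud frame followed by a quiet frame, same absolute error in both.
reference = [1.0] * 160 + [0.1] * 160
decoded = [x + 0.01 for x in reference]
long_term = snr_db(reference, decoded)
segmental = segsnr_db(reference, decoded)
```

On this toy signal the long-term SNR is dominated by the loud frame (about 37 dB), while the segmental SNR weights both frames equally (30 dB). That difference is why the short-term measure tends to track quality in quiet passages more closely.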

Coding Techniques:

WAVEFORM CODERS
  Digitize speech on a sample-by-sample basis. The goal is to have the output waveform closely match the input waveform.
  Scalar and vector quantization
  Sub-band coders
  Transform coders

SINUSOIDAL ANALYSIS-SYNTHESIS
  They rely on the sinusoidal representation of the speech waveform.
  Short-Time Fourier Transform models
  Sinusoidal Transform Coding
  Multiband Excitation Coder

VOCODERS
  Speech-specific coders
  Formant Vocoders
  Channel Vocoders
  LPC Vocoders

Scalar and Vector Quantization

SQ: every sample is mapped into a specific code

Examples: PCM, DPCM, DM, ADPCM, ...
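As one illustration of scalar quantization, here is a minimal DPCM sketch: each sample is predicted from the previous reconstructed sample and only the quantized prediction residual is coded. The fixed first-order predictor coefficient (0.9), the uniform step size and the function names are assumptions for the example. Practical coders also limit the code range, and ADPCM additionally adapts the quantizer.

```python
import math

def dpcm_encode(samples, step=0.05, predictor_coeff=0.9):
    """Return one integer code per sample (quantized prediction residual)."""
    codes = []
    prediction = 0.0
    for x in samples:
        code = int(round((x - prediction) / step))   # uniform scalar quantizer
        codes.append(code)
        # Track the decoder's reconstruction so both sides stay in sync.
        reconstructed = prediction + code * step
        prediction = predictor_coeff * reconstructed
    return codes

def dpcm_decode(codes, step=0.05, predictor_coeff=0.9):
    """Rebuild the waveform from the residual codes."""
    output = []
    prediction = 0.0
    for code in codes:
        reconstructed = prediction + code * step
        output.append(reconstructed)
        prediction = predictor_coeff * reconstructed
    return output

STEP = 0.05
samples = [math.sin(2 * math.pi * n / 40) for n in range(80)]
codes = dpcm_encode(samples, step=STEP)
decoded = dpcm_decode(codes, step=STEP)
max_error = max(abs(x - y) for x, y in zip(samples, decoded))
```

Because the quantizer sits inside the prediction loop, the reconstruction error stays within half a step. The residual codes also span a much smaller range than direct PCM codes of the same signal would, which is where the bit-rate saving comes from.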

VQ: the data (speech) is compressed by encoding it in blocks. The incoming vectors are formed from consecutive data samples or from model parameters.

Examples: VPCM, GS-VQ, A-VQ...
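The block-wise encoding described above can be sketched as a nearest-neighbour search in a codebook. The 2-dimensional, 4-entry codebook below is purely hypothetical, and so are the function names. A real coder would train its codebook on speech data, for example with a clustering algorithm.

```python
def nearest_codeword(vector, codebook):
    """Index of the codeword with the smallest squared Euclidean distance."""
    best_index, best_dist = 0, float("inf")
    for index, codeword in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index

def vq_encode(samples, codebook):
    """Group consecutive samples into vectors and send one index per vector."""
    dim = len(codebook[0])
    usable = len(samples) - len(samples) % dim   # drop an incomplete last vector
    return [nearest_codeword(samples[i:i + dim], codebook)
            for i in range(0, usable, dim)]

def vq_decode(indices, codebook):
    """Look each index up in the codebook and concatenate the codewords."""
    output = []
    for index in indices:
        output.extend(codebook[index])
    return output

# Hypothetical codebook: 4 codewords of dimension 2, i.e. 2 bits per 2 samples.
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]
samples = [0.9, 1.2, -0.8, -1.1, 0.1, -0.2, 0.7, -0.9]
indices = vq_encode(samples, codebook)
reconstructed = vq_decode(indices, codebook)
```

With this codebook every pair of samples costs 2 bits, i.e. 1 bit per sample, which a scalar quantizer could only match with a two-level code.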

Sub-band Coders

Unlike SQ and VQ, these coders rely more on frequency-domain properties of speech.
The signal band is divided into frequency sub-bands using a bank of bandpass filters.
The output of each filter is then sampled (or down-sampled) and encoded.

Example: AT&T, CCITT (G.722),...
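The split, down-sample and encode steps can be illustrated with the simplest possible two-band filter bank. The sketch below uses a Haar filter pair as a stand-in. That is an assumption made for brevity: practical sub-band coders use longer quadrature mirror filters and more careful bit allocation. The step sizes are arbitrary.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)

def analyze(samples):
    """Split into a low and a high band, each down-sampled by 2 (Haar pair).

    An odd trailing sample is dropped.
    """
    starts = range(0, len(samples) - 1, 2)
    low = [(samples[i] + samples[i + 1]) * SCALE for i in starts]
    high = [(samples[i] - samples[i + 1]) * SCALE for i in starts]
    return low, high

def synthesize(low, high):
    """Recombine the two bands into a full-rate signal."""
    output = []
    for l, h in zip(low, high):
        output.append((l + h) * SCALE)
        output.append((l - h) * SCALE)
    return output

def quantize(band, step):
    """Uniform quantization of one band with its own step size."""
    return [round(v / step) * step for v in band]

samples = [math.sin(2 * math.pi * n / 32) for n in range(64)]
low, high = analyze(samples)
# A finer step for the band that carries most of the energy, a coarser one for the other.
decoded = synthesize(quantize(low, 0.02), quantize(high, 0.1))
```

For this slowly varying input almost all of the energy ends up in the low band, so the high band can be quantized coarsely at little cost. Without the quantizers, `synthesize(*analyze(samples))` returns the input up to rounding error.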

Transform Coders

Work on spectral properties of speech (like SBC)

They use unitary transforms whose parameters are quantized at the transmitter and decoded and inverse-transformed at the receiver

The potential for bit-rate reduction in transform coding lies in the fact that unitary transforms tend to generate near-uncorrelated transform components which can be coded independently

Although many transforms can be used (DCT, DFT, WHT, KLT, …), all share the property of unitarity: the inverse of the transform matrix equals its conjugate transpose, $A^{-1} = A^{H}$.

Example: Adaptive Transform Coder (ATC). It employs the DCT and has high performance.
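The transform, quantize and inverse-transform chain can be sketched with an orthonormal DCT-II. The block size, the single fixed step size and the function names are assumptions for the example. An adaptive transform coder would instead vary the bit allocation per coefficient from frame to frame.

```python
import math

def dct_matrix(size):
    """Orthonormal DCT-II matrix; being real and unitary, its inverse is its transpose."""
    rows = []
    for k in range(size):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / size)
        rows.append([scale * math.cos(math.pi * (2 * m + 1) * k / (2 * size))
                     for m in range(size)])
    return rows

def forward(matrix, block):
    """Transform coefficients: matrix times block."""
    return [sum(a * x for a, x in zip(row, block)) for row in matrix]

def inverse(matrix, coeffs):
    """Inverse transform: transpose of the matrix times the coefficients."""
    size = len(coeffs)
    return [sum(matrix[k][m] * coeffs[k] for k in range(size)) for m in range(size)]

def code_block(block, matrix, step):
    """Transmitter quantizes the coefficients; receiver dequantizes and inverts."""
    levels = [round(c / step) for c in forward(matrix, block)]   # integers to transmit
    decoded = inverse(matrix, [q * step for q in levels])
    return levels, decoded

SIZE, STEP = 16, 0.1
matrix = dct_matrix(SIZE)
block = [math.cos(2 * math.pi * n / 16.0 + 0.3) for n in range(SIZE)]
levels, decoded = code_block(block, matrix, STEP)
```

Unitarity means the quantization error energy in the coefficients equals the error energy in the decoded samples. This is why each coefficient can be quantized independently with a predictable effect on the output.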

Speech Coding Using Sinusoidal Analysis – Synthesis Models

These speech coders rely on the sinusoidal representation of the speech waveform.

Speech Analysis-Synthesis Using the Short-Time Fourier Transform

Speech is slowly time-varying (quasi-stationary) and can be modeled by its short-time spectrum.

Analysis expression Synthesis expression

h(n) is the sliding analysis window, whose length is often constrained to about 5 to 20 ms
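The analysis and synthesis expressions did not survive in this transcript. One common discrete form, assumed here rather than taken from the slides, is the filter-bank summation pair $X_k(n) = \sum_m x(m)\,h(n-m)\,e^{-j 2\pi k m / N}$ and $x(n) = \frac{1}{N\,h(0)} \sum_{k=0}^{N-1} X_k(n)\,e^{j 2\pi k n / N}$. It holds when the window is no longer than N samples and h(0) is non-zero. The sketch below evaluates both for a single time index. The Hamming window and the 64-sample length (8 ms at an assumed 8 kHz sampling rate) are illustrative choices.

```python
import cmath
import math

def stft_at(signal, n, window):
    """Short-time spectrum X_k(n), k = 0..N-1, with window[i] playing the role of h(i)."""
    size = len(window)
    spectrum = []
    for k in range(size):
        acc = 0j
        for i in range(size):          # i = n - m, so the window slides with n
            m = n - i
            if 0 <= m < len(signal):
                acc += signal[m] * window[i] * cmath.exp(-2j * math.pi * k * m / size)
        spectrum.append(acc)
    return spectrum

def synth_sample(spectrum, n, window):
    """Recover x(n) from its short-time spectrum (filter-bank summation)."""
    size = len(window)
    total = sum(x_k * cmath.exp(2j * math.pi * k * n / size)
                for k, x_k in enumerate(spectrum))
    return (total / (size * window[0])).real

N = 64   # 8 ms at 8 kHz (assumed sampling rate)
window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (N - 1)) for i in range(N)]
signal = [math.sin(0.2 * t) + 0.3 * math.sin(1.3 * t) for t in range(200)]
n = 100
spectrum = stft_at(signal, n, window)
recovered = synth_sample(spectrum, n, window)
```

The direct double loop is used for clarity only. A practical implementation would compute each frame with an FFT.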

Speech Analysis-Synthesis Using Sinusoidal Transform Coding

The speech is represented by a linear combination of sinusoids with time-varying amplitudes, phases and frequencies:

$s(n) = \sum_{k=1}^{L} A_k \cos(\omega_k n + \phi_k)$

McAulay-Quatieri

The number of sinusoids L is time-varying. The opportunity to reduce the bit rate comes from the fact that voiced speech is highly periodic, so L can be adjusted accordingly.

Furthermore, the statistical properties of the short-time spectrum of unvoiced speech are preserved.
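A minimal sketch of the synthesis side, assuming each component is a triple (A_k, ω_k, φ_k) held constant over one frame and the frame is the sum of A_k cos(ω_k n + φ_k). The helper that places the frequencies on the harmonics of a pitch frequency is an illustrative assumption about how the periodicity of voiced speech can be exploited. The names and parameter values are hypothetical.

```python
import math

def synthesize_frame(components, num_samples):
    """Sum of L sinusoids; components is a list of (amplitude, frequency, phase)."""
    return [sum(a * math.cos(w * n + p) for a, w, p in components)
            for n in range(num_samples)]

def harmonic_components(pitch_period, amplitudes):
    """Voiced frame: frequencies are multiples of the pitch, so only the pitch
    period and one amplitude per harmonic need to be described (phases set to 0)."""
    w0 = 2.0 * math.pi / pitch_period
    return [(a, (k + 1) * w0, 0.0) for k, a in enumerate(amplitudes)]

# L = 3 harmonics of a 40-sample pitch period (values chosen for illustration).
components = harmonic_components(40, [1.0, 0.5, 0.25])
frame = synthesize_frame(components, 120)
```

For a voiced frame the L frequencies collapse to a single pitch value, which is the kind of redundancy a sinusoidal coder can use to lower the bit rate.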

Vocoders

Speech-specific
Low bit rate, but performance degrades for non-speech signals

4 types: Channel, Formant, Homomorphic, LPC

LPC vocoders are divided into 3 categories based on excitation models:
  2-state excitation
  Mixed excitation
  Residual excitation

LPC Vocoder

For p-th order forward linear prediction, the present sample is predicted from a linear combination of p past samples

The prediction parameters are obtained by minimizing the mean square forward prediction error

where

For forward estimation:

The system can be solved using the Levinson-Durbin recursion.
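The equations for this slide are missing from the transcript, so the sketch below fixes its own convention. It assumes the prediction is written $\hat{x}(n) = \sum_{k=1}^{p} a_k x(n-k)$, in which case minimizing the mean square error gives the normal equations $\sum_{k=1}^{p} a_k r(|i-k|) = r(i)$ for $i = 1, \dots, p$, where $r$ is the autocorrelation sequence. Sign conventions differ between texts, and the function names are hypothetical.

```python
def autocorrelation(samples, max_lag):
    """r(0..max_lag) of one analysis frame (no windowing applied here)."""
    count = len(samples)
    return [sum(samples[i] * samples[i + lag] for i in range(count - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations recursively, one order at a time.

    Returns (predictor coefficients a_1..a_p, reflection coefficients,
    final prediction error energy).
    """
    a = [0.0] * (order + 1)            # a[0] is unused
    error = r[0]
    reflections = []
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error                # reflection coefficient of order i
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= (1.0 - k * k)         # error energy shrinks with each order
        reflections.append(k)
    return a[1:], reflections, error

# Autocorrelation of a first-order process with coefficient 0.5.
r = [1.0, 0.5, 0.25]
coeffs, reflections, error = levinson_durbin(r, 2)
```

For this autocorrelation sequence the recursion returns a_1 = 0.5 and a_2 = 0 with a final error energy of 0.75, as expected for a first-order process. The recursion needs on the order of p² operations, against p³ for a general linear solver.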

Workplan

Implementation of:
  LPC Vocoder
  DCT Transform Coder
  DPCM Coder

Comparison of the three methods on specific speech signals