Speech Technologies – Speech Coding
Speech Coding
1. Introduction
2. LPC Vocoder
3. Analysis-by-Synthesis Coding
Speech Coding Classification
1. Waveform Coding
   To reconstruct a signal waveform similar to the original one
   PCM G.711 64 kb/s, ADPCM G.721 32 kb/s, SBC G.722
2. Source Coding
   To reconstruct a signal based on the production model
   LPC Vocoder FS1015 2.4 kb/s, MELP 2.4 kb/s
3. Hybrid Coders - Analysis-by-Synthesis
   Waveform coding based on the production model
   ETSI GSM, CELP G.729
Coders Comparison
1. Bit rate (kb/s)
2. Voice quality: MOS (Mean Opinion Score)
3. Complexity
4. Delay
5. Channel error sensitivity
6. Bandwidth

Coder      Bit rate (kb/s)   MOS              BW (kHz)
CD Audio   1411              5.0              44.1
PCM        64                4.3              8
ADPCM      40, 32, 24, 16    4.2 (32 kb/s)    8
SBC        64, 56, 48        >4.5             16
Comparison
Speech Coding: LPC Vocoder
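LPC Analysis
[Block diagram: the predictor P(z) forms the prediction ŝ(n) from s(n); the prediction error is e(n).]
LPC Synthesis: e(n) drives H(z) = 1/A(z)
$P(z) = -\sum_{i=1}^{p} a_i \cdot z^{-i}$
H(z): Vocal tract transfer function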
LPC Vocoder
Simplification of the excitation in the synthesis:
- Train of periodic impulses for voiced segments
- White Gaussian noise in the unvoiced segments
- Maintenance of the power in the new synthetic excitation
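The simplified excitation above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `synthetic_excitation`, the frame length, the pitch period and the target power are hypothetical values chosen for the example, not parameters of any standard.

```python
import math
import random

def synthetic_excitation(n_samples, voiced, pitch_period, target_power, seed=0):
    """Build one frame of simplified vocoder excitation with a given mean power."""
    rng = random.Random(seed)
    if voiced:
        # Voiced segment: one unit impulse every `pitch_period` samples.
        exc = [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    else:
        # Unvoiced segment: white Gaussian noise.
        exc = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    # Rescale so the synthetic excitation keeps the requested mean power.
    power = sum(x * x for x in exc) / n_samples
    scale = math.sqrt(target_power / power)
    return [scale * x for x in exc]

def mean_power(x):
    """Mean squared value of a sequence."""
    return sum(v * v for v in x) / len(x)

# Hypothetical frame: 180 samples, pitch period of 60 samples, power 0.25.
voiced_frame = synthetic_excitation(180, True, 60, target_power=0.25)
unvoiced_frame = synthetic_excitation(180, False, 60, target_power=0.25)
```

Both frames end up with the same mean power, which is the "maintenance of the power" step; only the fine structure of the excitation differs.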
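LPC Vocoder
[Block diagram. Analysis: LPC analysis of s(n) (predictor P(z), residual r(n)) gives the reflection coefficients and the gain G; pitch and U/V analysis gives F0 and the voiced/unvoiced decision. Synthesis: an impulse train of period 1/F0 (V) or noise (U) is scaled by G and filtered by H(z), built from P(z), to give ŝ(n).]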
LPC10E/FS1015 Vocoder
54 bits/frame
Pitch + U/V -> 7 bits
G -> 5 bits
K1 to K4 -> 5 bits each
K5 to K8 -> 4 bits each
K9 -> 3 bits
K10 -> 2 bits
Fs = 8000 samples/s
54 bits/frame
180 samples/frame (22.5 ms/frame)
54*8000/180 = 2400 bits/s
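The bit-rate arithmetic can be checked with a short script. It is a sketch that uses only the numbers on the slide; the variable names are illustrative. Note that the itemized fields add up to 53 bits, so one of the 54 bits per frame is not itemized on the slide (FS1015 is usually described as carrying one synchronization bit per frame, which is stated here as an assumption).

```python
# Field widths as listed on the slide.
bits = {"pitch_uv": 7, "gain": 5}
k_bits = [5, 5, 5, 5, 4, 4, 4, 4, 3, 2]   # K1..K10, one entry per coefficient

listed_bits = sum(bits.values()) + sum(k_bits)   # itemized bits only
frame_bits = 54                                   # total per frame, from the slide
fs = 8000                                         # samples per second
frame_samples = 180

frame_ms = 1000 * frame_samples / fs              # frame duration in ms
bitrate = frame_bits * fs / frame_samples         # bits per second

print(listed_bits, frame_ms, bitrate)
```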
LPC10E Vocoder
Examples:
- Original signal
- Reconstructed signal LPC10E
- Reconstructed signal LPC10E (satellite radio transmission)
Features:
- Nasality: nasal sounds would need a pole-zero model
- Simple voiced excitation (train of impulses): buzzing
- Frame size: problems with fast transitions (p, t, k…)
MELP: Mixed-Excitation Linear Predictive Vocoder
2400 bps Federal Standard speech coder
The excitation signal is generated by means of a mixture of noise and train of impulses in different frequency bands
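As a rough illustration of mixing noise and impulses per frequency band, the sketch below picks, for each band, either the spectrum of an impulse train or the spectrum of white noise. It is not the MELP algorithm itself: the uniform band split, the hard per-band switch, the naive DFT and all names (`mixed_excitation`, `band_voiced`) are assumptions made for the example.

```python
import cmath
import math
import random

def dft(x):
    """Naive discrete Fourier transform (fine for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft_real(X):
    """Inverse DFT, keeping the real part."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def mixed_excitation(n, pitch_period, band_voiced, seed=0):
    """Per band, use the impulse-train spectrum if voiced, else the noise spectrum."""
    rng = random.Random(seed)
    pulses = [1.0 if t % pitch_period == 0 else 0.0 for t in range(n)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    P, Nz = dft(pulses), dft(noise)
    n_bands = len(band_voiced)
    half = n // 2
    mixed = [0j] * n
    for k in range(n):
        f = min(k, n - k)                      # fold negative frequencies onto 0..n/2
        band = f * n_bands // (half + 1)       # uniform band split (assumption)
        mixed[k] = P[k] if band_voiced[band] else Nz[k]
    return idft_real(mixed)

# Hypothetical frame: two lower bands voiced, two upper bands unvoiced.
exc = mixed_excitation(80, 20, [True, True, False, False])
```

Bins k and n-k always fall in the same band, so the mixed spectrum stays conjugate-symmetric and the result is a real signal.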
MELP: Mixed-Excitation Linear Predictive Vocoder
Examples:
- Original signal "clean"
- LPC-10
- Reconstructed signal MELP "clean"
- Original signal "noisy"
- Reconstructed signal "noisy"
Data rate: 2400 bps (54 bits × 44.44 frames/second)
Sampling rate: 8 kHz
Bit stream format: For each 22.5 ms frame of input speech, the following 54 bits are placed into the bit-stream (in this order)

Description                                Number of bits
Pitch index                                7
Jitter flag                                1
Bandpass voicing decision                  4x1
Gain for second half of frame              5
Gain for first half of frame               3
LSP frequencies (10 line spectrum pairs)   25
Fourier magnitudes (10 harmonics)          8
Sync bit                                   1
Total                                      54
Hybrid Coders
Predictive Coders based on Analysis-by-Synthesis
Hybrid Coders
Depending on the excitation, they are classified into three basic types:
1. Multi-Pulse Excitation (MPE)
2. Regular Pulse Excitation (RPE)
3. Code Excited Linear Prediction (CELP)
Hybrid Coders
CELP: Code Excited Linear Prediction
Short-Time Analysis
Typical values:
- Analysis frame: 25 ms (200 samples)
- Speech frame: 20 ms (160 samples)
- Subframe: 5 ms (40 samples)
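The sample counts above follow from an 8 kHz sampling rate, which is implied by 200 samples per 25 ms. A minimal sketch of the conversion and of splitting a speech frame into subframes is given below; the helper names are illustrative only.

```python
FS = 8000  # Hz, implied by 200 samples per 25 ms

def ms_to_samples(ms, fs=FS):
    """Convert a duration in milliseconds to a number of samples."""
    return int(round(ms * fs / 1000))

analysis = ms_to_samples(25)      # analysis frame
frame = ms_to_samples(20)         # speech frame
subframe = ms_to_samples(5)       # subframe
subframes_per_frame = frame // subframe

def split_subframes(frame_samples, size):
    """Cut one speech frame into consecutive subframes of `size` samples."""
    return [frame_samples[i:i + size] for i in range(0, len(frame_samples), size)]

parts = split_subframes(list(range(frame)), subframe)
```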
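Synthesis Filter
Based on short-time and long-time Linear Prediction
[Block diagram. Analysis: s(n) goes through the short-term analysis stage (prediction P(z) subtracted) to give r(n), then through the long-term analysis stage (prediction PL(z) subtracted) to give rL(n). Synthesis: rL(n) plus the long-term prediction PL(z) gives r̂(n); r̂(n) plus the short-term prediction P(z) gives ŝ(n).]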
Synthesis Filter
Long-term predictor
Estimation of $\hat{r}(n)$:
$\hat{r}(n) = \beta\, r(n-D)$
or
$\hat{r}(n) = \beta_1\, r(n-(D+1)) + \beta_2\, r(n-D) + \beta_3\, r(n-(D-1))$
Parameter estimation, minimum mean square error:
$e(n) = r(n) - \beta\, r(n-D)$
$E = \sum_{n=0}^{N-1} e^2(n) = \sum_{n=0}^{N-1} \left[ r(n) - \beta\, r(n-D) \right]^2$
$\partial E / \partial \beta = 0 \;\Rightarrow\; \beta = \dfrac{\sum_{n=0}^{N-1} r(n)\, r(n-D)}{\sum_{n=0}^{N-1} r^2(n-D)}$
Synthesis Filter
Find the value of D that minimizes the error power E:
$E = \sum_{n=0}^{N-1} r^2(n) - \dfrac{\left[ \sum_{n=0}^{N-1} r(n)\, r(n-D) \right]^2}{\sum_{n=0}^{N-1} r^2(n-D)}$
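The single-tap search can be sketched as an exhaustive loop over candidate delays: for each D, compute the optimal gain and the resulting error power, and keep the D with the smallest E. The function name `ltp_search`, the lag range and the test signal are assumptions chosen for illustration; real coders restrict and quantize both D and the gain.

```python
import random

def ltp_search(r, start, n, d_min, d_max):
    """Return (D, beta, E) minimizing E over D for the block r[start:start+n].

    Requires start >= d_max so that r(n - D) is always available.
    """
    energy = sum(r[start + i] ** 2 for i in range(n))
    best = None
    for d in range(d_min, d_max + 1):
        cross = sum(r[start + i] * r[start + i - d] for i in range(n))
        past = sum(r[start + i - d] ** 2 for i in range(n))
        if past == 0.0:
            continue                          # no past energy: beta undefined
        beta = cross / past                   # optimal gain for this delay
        err = energy - cross * cross / past   # error power E for this delay
        if best is None or err < best[2]:
            best = (d, beta, err)
    return best

# Hypothetical residual: a random pattern repeated every 50 samples, scaled by 0.8.
rng = random.Random(1)
r = [rng.uniform(-1.0, 1.0) for _ in range(50)]
for n in range(50, 200):
    r.append(0.8 * r[n - 50])

D, beta, E = ltp_search(r, start=120, n=40, d_min=20, d_max=99)
```

On this constructed signal the search recovers the delay of 50 samples and a gain close to 0.8, with an error power close to zero.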
Perceptual Weighting Filter
Function: to shape the frequency characteristics of the error being minimized, giving more weight to the frequency regions where the ear is more sensitive and less weight to the regions where it is less sensitive. Based on frequency masking:
- In the formants, more error can be tolerated.
- The filter transfer function will be inversely proportional to the spectral envelope of the speech signal being coded.
- Transfer function proposed: $W(z) = A(z)/A(\gamma^{-1} z)$
- The parameter $\gamma \in [0, 1]$ controls the level of weighting.
- Must be updated jointly with the predictor.
Perceptual Weighting Filter
$W(z) = \dfrac{A(z)}{A(z/\gamma)} = \dfrac{1 - \sum_{k=1}^{P} a_k z^{-k}}{1 - \sum_{k=1}^{P} a_k \gamma^k z^{-k}} = \dfrac{1 - \sum_{k=1}^{P} a_k z^{-k}}{\prod_{k=1}^{P} \left( 1 - \gamma\, p_k z^{-1} \right)}$
where $p_k$ are the roots of $A(z)$ (the poles of $1/A(z)$).
Normally $0.8 \le \gamma \le 0.9$
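A direct way to apply W(z) = A(z)/A(z/γ) is to scale each coefficient a_k by γ^k for the denominator and run the two difference equations sample by sample. The sketch below assumes the convention A(z) = 1 - Σ a_k z^{-k}; the function names and the first-order example coefficient are hypothetical.

```python
def weighted_coeffs(a, gamma):
    """Coefficients of A(z/gamma): a_k is replaced by a_k * gamma**k, k = 1..P."""
    return [ak * gamma ** (k + 1) for k, ak in enumerate(a)]

def perceptual_weight(x, a, gamma=0.85):
    """Filter x through W(z) = A(z) / A(z/gamma), with A(z) = 1 - sum_k a_k z^-k."""
    ag = weighted_coeffs(a, gamma)
    p = len(a)
    y = []
    for n in range(len(x)):
        # Numerator A(z): v(n) = x(n) - sum_k a_k x(n-k)
        v = x[n] - sum(a[k] * x[n - 1 - k] for k in range(p) if n - 1 - k >= 0)
        # Denominator 1/A(z/gamma): y(n) = v(n) + sum_k a_k gamma^k y(n-k)
        out = v + sum(ag[k] * y[n - 1 - k] for k in range(p) if n - 1 - k >= 0)
        y.append(out)
    return y

# Impulse response for a hypothetical first-order predictor a_1 = 0.9.
impulse = [1.0, 0.0, 0.0, 0.0]
h = perceptual_weight(impulse, [0.9], gamma=0.5)
```

With γ = 1 the filter reduces to W(z) = 1 (no weighting), and with γ = 0 it reduces to A(z); values in between move the poles toward the origin and broaden the formant peaks of 1/A(z/γ).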
GSM 06.XX: RPE-LTP
GSM: 1982 "Groupe Spécial Mobile", now "Global System for Mobile communications"
RPE-LTP: Regular Pulse Excitation – Long Term Prediction
SID – Silence Descriptor frame
BFI – Bad Frame Indicator
GSM 06.XX: RPE-LTP
Frame loss:
1) Speech frames
   a) First loss -> repetition of the last good frame
   b) Following losses -> decrease the output level until silence is reached in 320 ms
2) SID frames
   a) First loss -> repetition of the last good frame
   b) Following losses -> decrease the output level until silence is reached in 320 ms
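The substitution-and-muting rule can be pictured as a gain schedule applied to the repeated last good frame. The sketch below assumes a linear ramp and counts the 320 ms (16 frames of 20 ms) from the first lost frame; the slide gives neither the shape of the decrease nor the exact starting point, so both are assumptions, as is the function name.

```python
FRAME_MS = 20
MUTE_MS = 320

def concealment_gains(n_lost):
    """Gain applied to the repeated last good frame for each consecutive lost frame."""
    steps = MUTE_MS // FRAME_MS               # 16 frames to reach silence
    gains = []
    for i in range(n_lost):
        if i == 0:
            gains.append(1.0)                 # first loss: plain repetition
        else:
            # Following losses: linear decrease (assumed shape) down to silence.
            gains.append(max(0.0, 1.0 - i / steps))
    return gains

gains = concealment_gains(20)
```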
GSM 06.XX: RPE-LTP – The coder
GSM 06.XX: RPE-LTP – The decoder
GSM 06.10
For each 160 samples (20 ms):
LAR1, LAR2 -> 6 bits each
LAR3, LAR4 -> 5 bits each
LAR5, LAR6 -> 4 bits each
LAR7, LAR8 -> 3 bits each
Total LARs -> 36 bits

For each 40 samples (5 ms):
Long-term predictor delay -> 7 bits
Long-term predictor gain -> 2 bits
Grid position (k) -> 2 bits
Block amplitude -> 6 bits
Pulse amplitude (13 pulses) -> 3 bits each
Total subframe excitation -> 56 bits

36 + 56·4 = 260 bits / 20 ms
Bitrate = 13 kbps
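The bit budget above can be re-derived with a short script. It uses only the numbers on the slide; the dictionary keys are illustrative names.

```python
FRAME_MS = 20
SUBFRAMES_PER_FRAME = 4

lar_bits = [6, 6, 5, 5, 4, 4, 3, 3]           # LAR1..LAR8
subframe_bits = {
    "ltp_delay": 7,
    "ltp_gain": 2,
    "grid_position": 2,
    "block_amplitude": 6,
    "pulse_amplitudes": 13 * 3,                # 13 pulses, 3 bits each
}

total_lar = sum(lar_bits)
total_subframe = sum(subframe_bits.values())
frame_bits = total_lar + SUBFRAMES_PER_FRAME * total_subframe
bitrate_bps = frame_bits * 1000 // FRAME_MS

print(total_lar, total_subframe, frame_bits, bitrate_bps)
```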
GSM 06.XX
Comfort Noise
SID – Background Acoustic Noise Evaluation
- SID codeword with 95 bits equal to zero
- Over 4 consecutive frames with VAD=0, the means of the LAR parameters and of Xmax are computed
- The regular pulses are replaced by a sequence of random integers uniformly distributed between 1 and 6.
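The two comfort-noise steps, averaging the parameters over the silent frames and replacing the pulses by random integers, can be sketched as follows. The frame representation (a dict with `lar` and `xmax` entries, one Xmax value per frame) and the function names are hypothetical simplifications, not the layout used by the standard.

```python
import random

def average_sid_parameters(frames):
    """Mean LAR vector and mean Xmax over a list of frames with VAD=0."""
    n = len(frames)
    n_lar = len(frames[0]["lar"])
    mean_lar = [sum(f["lar"][i] for f in frames) / n for i in range(n_lar)]
    mean_xmax = sum(f["xmax"] for f in frames) / n
    return mean_lar, mean_xmax

def comfort_noise_pulses(n_pulses=13, seed=None):
    """Random integers uniformly distributed between 1 and 6 (inclusive)."""
    rng = random.Random(seed)
    return [rng.randint(1, 6) for _ in range(n_pulses)]

# Four hypothetical silent frames.
frames = [{"lar": [float(i)] * 8, "xmax": 10.0 * (i + 1)} for i in range(4)]
mean_lar, mean_xmax = average_sid_parameters(frames)
pulses = comfort_noise_pulses(seed=0)
```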
GSM 06.10
Examples:
- Original signal
- Reconstructed signal GSM
- Difference between original and reconstructed (transcoding noise)
- White noise with the same power
- Original signal + white noise (without the perceptual weighting)
GSM 06.10
Original
Reconstructed
GSM tries to keep the same waveform