Methods and algorithms of speech recognition course
Lecture 3
Nikolay V. Karpov
nkarpov(а)hse.ru
Speech Waves in Tube
Derive a theoretical model of how sound waves are affected by the vocal tract
Describe a model for lip radiation
Describe a model for the pulsating glottal waveform during voiced speech
Assemble the components of a simple speech synthesiser
Speech Waves in Tube
Ug and Ul are the volume flow of air at the glottis and lips respectively.
Vocal tract is of length L (typically 15-17 cm in adults)
Multi-Tube Model of Vocal Tract
We model the vocal tract as a tube that has p segments.
Number of tube segments needed: p = 2L/(cT) ≈ 0.001 × fsamp (fsamp in Hz), where c is the speed of sound and T = 1/fsamp is the sample period.
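As a quick numerical illustration of the segment-count formula p = 2L/(cT), the sketch below evaluates it for a 17 cm tract. The function name `num_segments` and the value c = 340 m/s are assumptions made for this example; the slides do not state a value for c.

```python
def num_segments(tract_length_m, fsamp_hz, c=340.0):
    """p = 2L/(cT) with T = 1/fsamp: each segment is half a sample period long.

    c = 340 m/s is an assumed speed of sound, not a value from the slides.
    """
    return 2.0 * tract_length_m * fsamp_hz / c


if __name__ == "__main__":
    for fs in (8000, 16000):
        # For L = 0.17 m this is close to 0.001 * fs.
        print(fs, round(num_segments(0.17, fs), 2))
```

With these assumed values an 8 kHz sample rate needs about 8 segments and 16 kHz about 16.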
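Wave Equations
Here u(x,t) is the volume flow, p(x,t) the pressure, A the cross-sectional area of the tube section, ρ the air density and c the speed of sound.
Mass × Acceleration = Force:
$$\rho \frac{\partial u}{\partial t} = -A \frac{\partial p}{\partial x}$$
Adiabatic Gas Law (from $pV^{\gamma}$ = constant, with $c^2 = \gamma p/\rho$):
$$A \frac{\partial p}{\partial t} = -\rho c^2 \frac{\partial u}{\partial x}$$
These equations are known as the wave equations.
Solution:
$$u(x,t) = u^{+}(t - x/c) - u^{-}(t + x/c)$$
$$p(x,t) = \frac{\rho c}{A}\left[u^{+}(t - x/c) + u^{-}(t + x/c)\right]$$
It is easily verified that this solution satisfies the wave equations for any differentiable functions $u^{+}$ and $u^{-}$.
The two functions $u^{+}$ and $u^{-}$ represent waves travelling in the +ve and −ve directions at velocity c. The actual values of the waves are determined by the boundary conditions at the ends of the tube section.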
Assumptions:
Sound waves are 1-dimensional: true for frequencies < 3-4 kHz, whose wavelengths are long compared to the tube width
No frictional or wall-vibration energy losses
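Sound Waves in a Tube
The acoustic signal is the superposition of two waves: U in the forward direction and V in the reverse direction.
Time for sound to travel along a segment = L/(cp)
Segment length is chosen to correspond to half a sample period: L/p = 0.5cT
Segment Delays
With u, v the forward and reverse waves at the glottis end of a segment and w, x those at its lip end:
$$v(t) = x\!\left(t - \frac{L}{cp}\right), \qquad w(t) = u\!\left(t - \frac{L}{cp}\right)$$
If we take z-transforms, this time delay corresponds to multiplying by $z^{-1/2}$:
$$V(z) = z^{-1/2} X(z); \qquad U(z) = z^{1/2} W(z)$$
In matrix form:
$$\begin{pmatrix} U \\ V \end{pmatrix} = \begin{pmatrix} z^{1/2} & 0 \\ 0 & z^{-1/2} \end{pmatrix} \begin{pmatrix} W \\ X \end{pmatrix} = z^{1/2} \begin{pmatrix} 1 & 0 \\ 0 & z^{-1} \end{pmatrix} \begin{pmatrix} W \\ X \end{pmatrix}$$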
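1-segment transfer function
For a uniform lossless tube of length l and area A, driven at the glottis end by the flow $U_G(\omega)e^{j\omega t}$ and open at the lips:
$$u(x,t) = \frac{\cos[\omega(l - x)/c]}{\cos[\omega l/c]}\, U_G(\omega)\, e^{j\omega t}$$
$$p(x,t) = j\frac{\rho c}{A}\, \frac{\sin[\omega(l - x)/c]}{\cos[\omega l/c]}\, U_G(\omega)\, e^{j\omega t}$$
The transfer function is given by:
$$\frac{U(l,\omega)}{U(0,\omega)} = \frac{1}{\cos(\omega l/c)}$$
This function has poles located at every
$$\omega = \frac{(2n+1)\pi c}{2l}$$
These correspond to the frequencies at which the tube is an odd number of quarter wavelengths long.
For a single segment of length $l = \tfrac{1}{2}cT$, the first pole is at $\frac{c}{4l} = \tfrac{1}{2}F_s$, where $F_s = 1/T$.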
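The pole frequencies of 1/cos(ωl/c) can be tabulated directly. The following is a minimal sketch, assuming c = 340 m/s (a value not given in the slides) and treating the whole vocal tract as one uniform 17 cm tube; the function names are chosen for this example only.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s, assumed value (not given in the slides)


def tube_resonances(length_m, count, c=SPEED_OF_SOUND):
    """First `count` pole frequencies in Hz: F_n = (2n + 1) c / (4 l)."""
    return [(2 * n + 1) * c / (4.0 * length_m) for n in range(count)]


def flow_gain(freq_hz, length_m, c=SPEED_OF_SOUND):
    """|U(l)/U(0)| = 1 / |cos(omega l / c)| for the lossless uniform tube."""
    omega = 2.0 * math.pi * freq_hz
    return 1.0 / abs(math.cos(omega * length_m / c))


if __name__ == "__main__":
    # A 17 cm tube gives resonances near 500, 1500 and 2500 Hz.
    print([round(f) for f in tube_resonances(0.17, 3)])
```

Because the model is lossless, the gain grows without bound as the frequency approaches each pole; a real vocal tract has finite peaks.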
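Segment Junction
At the junction between a segment of area A (waves U, V) and the next segment of area B (waves W, X):
Flow Continuity:
$$U - V = W - X$$
Pressure Continuity:
$$\frac{\rho c}{A}(U + V) = \frac{\rho c}{B}(W + X)$$
In matrix form:
$$\begin{pmatrix} 1 & -1 \\ A^{-1} & A^{-1} \end{pmatrix} \begin{pmatrix} U \\ V \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ B^{-1} & B^{-1} \end{pmatrix} \begin{pmatrix} W \\ X \end{pmatrix}$$
Hence:
$$\begin{pmatrix} U \\ V \end{pmatrix} = \frac{1}{2} \begin{pmatrix} 1 & A \\ -1 & A \end{pmatrix} \begin{pmatrix} 1 & -1 \\ B^{-1} & B^{-1} \end{pmatrix} \begin{pmatrix} W \\ X \end{pmatrix} = \frac{1}{2B} \begin{pmatrix} A+B & A-B \\ A-B & A+B \end{pmatrix} \begin{pmatrix} W \\ X \end{pmatrix}$$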
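Reflection Coefficients
Define the reflection coefficient to be
$$r = \frac{B - A}{B + A}$$
Then
$$\begin{pmatrix} U \\ V \end{pmatrix} = \frac{1}{2B} \begin{pmatrix} A+B & A-B \\ A-B & A+B \end{pmatrix} \begin{pmatrix} W \\ X \end{pmatrix} = \frac{1}{1+r} \begin{pmatrix} 1 & -r \\ -r & 1 \end{pmatrix} \begin{pmatrix} W \\ X \end{pmatrix}$$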
Reflection coefficients always lie in the range ±1
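The junction relations can be checked numerically. The sketch below assumes r = (B − A)/(B + A) with A the area on the (U, V) side and B the area on the (W, X) side; the function names and the example areas are illustrative choices, not values from the slides.

```python
def reflection_coefficient(area_a, area_b):
    """r = (B - A) / (B + A) for a junction from area A to area B."""
    return (area_b - area_a) / (area_b + area_a)


def across_junction(r, w, x):
    """Return (U, V) on the area-A side from (W, X) on the area-B side."""
    scale = 1.0 / (1.0 + r)
    return scale * (w - r * x), scale * (-r * w + x)


area_a, area_b = 1.0, 3.0            # arbitrary example areas
r = reflection_coefficient(area_a, area_b)
u, v = across_junction(r, 1.0, 0.2)  # arbitrary example waves W = 1.0, X = 0.2

# Both continuity conditions should hold up to rounding error.
flow_error = (u - v) - (1.0 - 0.2)
pressure_error = (u + v) / area_a - (1.0 + 0.2) / area_b
```

Since both areas are positive, |B − A| < B + A, which is why r always stays strictly inside ±1 for finite, non-zero areas.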
A3 is large but not infinite: assumption of narrow tube breaks down at this point
A0 is approximately zero: area of glottis opening
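2-Segment Vocal Tract
Assume Vl = 0: no sound reflected back into mouth
Work backwards from lips towards glottis:
◦ Junction: use the reflection matrix
◦ Tube segment: use the delay matrix
$$\begin{pmatrix} U_g \\ V_g \end{pmatrix} = \frac{1}{1+r_0}\begin{pmatrix} 1 & -r_0 \\ -r_0 & 1 \end{pmatrix} z^{1/2}\begin{pmatrix} 1 & 0 \\ 0 & z^{-1} \end{pmatrix} \frac{1}{1+r_1}\begin{pmatrix} 1 & -r_1 \\ -r_1 & 1 \end{pmatrix} z^{1/2}\begin{pmatrix} 1 & 0 \\ 0 & z^{-1} \end{pmatrix} \frac{1}{1+r_2}\begin{pmatrix} 1 & -r_2 \\ -r_2 & 1 \end{pmatrix} \begin{pmatrix} U_l \\ 0 \end{pmatrix}$$
Multiplying out the matrices gives
$$\begin{pmatrix} U_g \\ V_g \end{pmatrix} = \frac{z}{\prod_{k=0}^{2}(1+r_k)} \begin{pmatrix} 1 + (r_0 r_1 + r_1 r_2) z^{-1} + r_0 r_2 z^{-2} \\ -r_0 - (r_1 + r_0 r_1 r_2) z^{-1} - r_2 z^{-2} \end{pmatrix} U_l$$
Vocal Tract Transfer Function
We can ignore Vg: it gets absorbed in the lungs. The vocal tract transfer function is given by the ratio of Ul to Ug:
$$\frac{U_l}{U_g} = \frac{z^{-1}\prod_{k=0}^{2}(1+r_k)}{1 + (r_0 r_1 + r_1 r_2) z^{-1} + r_0 r_2 z^{-2}} = \frac{G z^{-1}}{1 - a_1 z^{-1} - a_2 z^{-2}}$$
with $G = \prod_{k=0}^{2}(1+r_k)$, $a_1 = -(r_0 r_1 + r_1 r_2)$ and $a_2 = -r_0 r_2$.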
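p-segment Vocal Tract
Multiplying together all the matrices for a p-segment vocal tract gives:
$$\begin{pmatrix} U_g \\ V_g \end{pmatrix} = \frac{z^{p/2}}{\prod_{k=0}^{p}(1+r_k)} \begin{pmatrix} 1 & -r_0 \\ -r_0 & 1 \end{pmatrix} \left[ \prod_{k=1}^{p} \begin{pmatrix} 1 & -r_k \\ -r_k z^{-1} & z^{-1} \end{pmatrix} \right] \begin{pmatrix} 1 \\ 0 \end{pmatrix} U_l$$
(the matrix product is taken in order of increasing k, left to right).
This results in a transfer function of the form:
$$\frac{U_l}{U_g} = \frac{G z^{-p/2}}{1 - a_1 z^{-1} - a_2 z^{-2} - \dots - a_p z^{-p}}$$
G is a gain term
$z^{-p/2}$ is the acoustic time delay along the vocal tract
The denominator represents a p-th order all-pole filter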
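The matrix product can be carried out mechanically on polynomials in z⁻¹. The sketch below is one possible way to do it, assuming junction matrices of the form (1/(1+r))[[1, −r], [−r, 1]] and a delay matrix proportional to [[1, 0], [0, z⁻¹]]; the helper names are invented for this example. It returns the gain G and the denominator polynomial, whose coefficient of z⁻ᵏ equals −aₖ in the notation used here.

```python
def poly_add(p, q):
    """Add two polynomials in z^-1 given as coefficient lists."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0.0) + (q[i] if i < len(q) else 0.0)
            for i in range(n)]


def vocal_tract_filter(r):
    """r = [r0, r1, ..., rp]. Returns (G, denom) with denom[k] the z^-k coefficient."""
    u, v = [1.0], [0.0]                      # start at the lips: (Ul, Vl) = (1, 0)
    for k in range(len(r) - 1, -1, -1):      # work back towards the glottis
        rk = r[k]
        # Junction: (u, v) <- (u - rk*v, -rk*u + v); the 1/(1+rk) goes into G.
        u, v = (poly_add(u, [-rk * c for c in v]),
                poly_add([-rk * c for c in u], v))
        if k > 0:
            v = [0.0] + v                    # tube segment: reverse wave gains z^-1
    gain = 1.0
    for rk in r:
        gain *= 1.0 + rk
    return gain, u


if __name__ == "__main__":
    # Arbitrary illustrative reflection coefficients for a 2-segment tract.
    print(vocal_tract_filter([0.9, 0.2, -0.5]))
```

For three reflection coefficients the denominator reduces to 1 + (r0·r1 + r1·r2)z⁻¹ + r0·r2·z⁻², matching the 2-segment result.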
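Lip Radiation
R(z) is the transfer function between airflow at the lips and pressure at the microphone:
$$R(z) = \frac{S(z)}{U_l(z)}$$
where S(z) is the pressure signal at the microphone.
For a lip-opening area of A, acoustic theory predicts a 1st-order high-pass response with a corner frequency, set by c and A, of the order of 5 kHz.
For fsamp < 20 kHz, a good approximation is:
$$R(z) = 1 - z^{-1}, \qquad \left|R(e^{j\omega T})\right| = 2\sin\!\left(\tfrac{1}{2}\omega T\right)$$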
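The first-difference approximation R(z) = 1 − z⁻¹ is easy to evaluate on the unit circle. This is a minimal sketch; the function name and the 16 kHz sample rate are assumptions for illustration.

```python
import cmath
import math


def lip_radiation_gain(freq_hz, fsamp_hz):
    """|R(e^{j omega T})| for R(z) = 1 - z^-1, evaluated directly."""
    theta = 2.0 * math.pi * freq_hz / fsamp_hz   # omega * T
    return abs(1.0 - cmath.exp(-1j * theta))


def lip_radiation_gain_closed_form(freq_hz, fsamp_hz):
    """The same magnitude from 2 sin(omega T / 2), valid for 0 <= f <= fsamp."""
    return 2.0 * math.sin(math.pi * freq_hz / fsamp_hz)


if __name__ == "__main__":
    fs = 16000.0  # assumed sample rate
    ratio = lip_radiation_gain(200.0, fs) / lip_radiation_gain(100.0, fs)
    print(ratio)  # close to 2, i.e. about +6 dB/octave at low frequencies
```

At low frequencies the gain doubles per octave, which is the high-pass behaviour the approximation is meant to capture.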
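Spectrum of Glottal Waveform
“LF Model” (Liljencrants & Fant), with time normalised so that one larynx period has length 1:
$$u_g'(t) = \begin{cases} e^{at}\sin(bt), & 0 \le t < t_e \\ c - d\,e^{-ft}, & t_e \le t < 1 \end{cases}$$
$$u_g(0) = u_g(1) = 0; \qquad u_g(t) \text{ and } u_g'(t) \text{ continuous at } t_e$$
Line spectrum of $u_g$ (approx. −12 dB/octave)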
Larynx Frequency ≈130 Hz
First Vocal tract resonance (formant) ≈1 kHz
There is not necessarily any relation between the larynx frequency and the vocal tract resonances.
Resonances at a multiple of the larynx frequency will be louder (good for singers)
Vowel Waveform
Vocal Tract Shape and Response
Filters
This lecture reviews some well-known facts about filters and introduces some less well-known ones that will be needed later on.
Derive the power response of first-order FIR and IIR filters and relate this to the geometry of the pole-zero diagram.
Relate the bandwidth of a 2nd-order resonance to the geometry of the pole-zero diagram.
Describe the bandwidth expansion transformation of a filter.
Describe the effect of reversing the coefficients of a filter.
Derive expressions for the log frequency response and its average value.
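Linear digital filter
$$y(n) = \sum_{k} h_k\, x(n-k)$$
A system that performs this transformation is called a linear digital filter.
y(n) is the output, x(n) is the input and $h_k$ is the impulse response.
Transfer function
$$H(z) = \frac{Y(z)}{X(z)}; \qquad X(z) = \sum_{n} x(n)\, z^{-n}$$
$$H(z) = \sum_{k} h_k z^{-k}$$
A practical digital filter is a finite system, described by a finite number of coefficients:
$$\sum_{i=0}^{I} a_i\, y(n-i) = \sum_{l=0}^{L} b_l\, x(n-l)$$
With $a_0 = 1$ its transfer function is
$$H(z) = \frac{\sum_{l=0}^{L} b_l z^{-l}}{1 + \sum_{i=1}^{I} a_i z^{-i}} = b_0\, \frac{\prod_{l=1}^{L} (1 - q_l z^{-1})}{\prod_{i=1}^{I} (1 - p_i z^{-1})}$$
where $q_l$ are the zeros and $p_i$ the poles of the filter.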
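Filters, and Signal Flow Graphs
A linear time-invariant system can be characterized by a constant-coefficient difference equation:
$$y(n) = \sum_{k=1}^{N} a_k\, y(n-k) + \sum_{k=0}^{M} b_k\, x(n-k)$$
Such systems can be implemented as signal flow graphs.
Stable filter: every pole $p_i$ of H(z) lies inside the unit circle, $|p_i| < 1;\ 1 \le i \le I$
Minimum phase filter: every zero $q_l$ of H(z) lies inside the unit circle, $|q_l| < 1;\ 1 \le l \le L$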
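A constant-coefficient difference equation translates almost line for line into code. The sketch below assumes the recurrence y(n) = Σₖ₌₁ᴺ aₖ y(n−k) + Σₖ₌₀ᴹ bₖ x(n−k) with zero initial conditions; note that some libraries use the opposite sign for the feedback coefficients. The function name `lti_filter` is an illustrative choice.

```python
def lti_filter(b, a, x):
    """Direct evaluation of y(n) = sum_k a[k-1]*y(n-k) + sum_k b[k]*x(n-k).

    b = [b0, ..., bM] are the feedforward coefficients,
    a = [a1, ..., aN] are the feedback coefficients (a[0] multiplies y(n-1)).
    Samples before n = 0 are taken to be zero.
    """
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc += sum(a[k - 1] * y[n - k] for k in range(1, len(a) + 1) if n - k >= 0)
        y.append(acc)
    return y


if __name__ == "__main__":
    # Impulse response of the single-pole filter y(n) = x(n) + 0.5 y(n-1).
    print(lti_filter([1.0], [0.5], [1.0, 0.0, 0.0, 0.0]))
```

This direct loop costs O(N + M) operations per sample and is meant only to make the recurrence concrete.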
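1st order FIR filter
A general FIR filter has the form
$$y(n) = \sum_{k=0}^{M} b_k\, x(n-k)$$
The first-order case is
$$H(z) = 1 - a z^{-1} \quad\Leftrightarrow\quad y(n) = x(n) - a\,x(n-1)$$
Filter has a single zero at $z = a = r e^{j\theta}$
Frequency response of filter:
$$H(e^{j\omega}) = 1 - a e^{-j\omega}$$
Power response of filter:
$$\left|H(e^{j\omega})\right|^2 = H(e^{j\omega})\,H^{*}(e^{j\omega}) = (1 - a e^{-j\omega})(1 - a^{*} e^{j\omega}) = 1 + r^2 - 2r\cos(\omega - \theta)$$
Example: $a = 0.6 + 0.4j$
$$\left|H(e^{j\omega})\right|^2 = 1.52 - 1.44\cos(\omega - 0.59)$$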
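The closed-form power response of the single-zero filter can be compared with a direct evaluation of |1 − a e^{−jω}|². A minimal sketch follows; the function names are chosen for this example.

```python
import cmath
import math


def fir_power_direct(a, omega):
    """|H(e^{j omega})|^2 for H(z) = 1 - a z^-1, evaluated directly."""
    return abs(1.0 - a * cmath.exp(-1j * omega)) ** 2


def fir_power_closed_form(a, omega):
    """1 + r^2 - 2 r cos(omega - theta) with a = r e^{j theta}."""
    r, theta = abs(a), cmath.phase(a)
    return 1.0 + r * r - 2.0 * r * math.cos(omega - theta)


if __name__ == "__main__":
    a = 0.6 + 0.4j
    # Constants of the worked example: 1 + r^2, 2r and theta.
    print(1.0 + abs(a) ** 2, 2.0 * abs(a), cmath.phase(a))
```

For a = 0.6 + 0.4j the three printed constants round to 1.52, 1.44 and 0.59.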
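Log Frequency Response for |a|<1
We can calculate the log response of the filter:
$$\log\!\left(H(e^{j\omega})\right) = \log\!\left(1 - a e^{-j\omega}\right)$$
If |a|<1 then $|a e^{-j\omega}| < 1$ and we can expand the log as a power series using
$$\log(1 - d) = -d - \frac{d^2}{2} - \frac{d^3}{3} - \dots; \qquad |d| < 1$$
$$\log\!\left(H(e^{j\omega})\right) = -\sum_{n=1}^{\infty} \frac{a^n}{n}\, e^{-jn\omega}$$
$$\log\left|H(e^{j\omega})\right|^2 = -2\sum_{n=1}^{\infty} \frac{r^n}{n}\cos\!\big(n(\omega - \theta)\big); \qquad a = r e^{j\theta}$$
Every term is a cosine with zero mean over ω, so the average of $\log|H(e^{j\omega})|^2$ is 0 if |a|<1.
Figure: first six terms in the summation for a = 0.6 + 0.4j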
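To see how quickly the cosine series converges, the sketch below compares a truncated sum with the exact log power response. It assumes the series log|H|² = −2 Σ (rⁿ/n) cos(n(ω − θ)) for a = r e^{jθ} with |a| < 1; the function names are illustrative.

```python
import cmath
import math


def log_power_exact(a, omega):
    """log |1 - a e^{-j omega}|^2 evaluated directly."""
    return math.log(abs(1.0 - a * cmath.exp(-1j * omega)) ** 2)


def log_power_series(a, omega, terms):
    """Truncated series -2 * sum_{n=1}^{terms} r^n / n * cos(n (omega - theta))."""
    r, theta = abs(a), cmath.phase(a)
    return -2.0 * sum(r ** n / n * math.cos(n * (omega - theta))
                      for n in range(1, terms + 1))


if __name__ == "__main__":
    a = 0.6 + 0.4j
    for terms in (1, 6, 50):
        print(terms, log_power_series(a, 1.0, terms), log_power_exact(a, 1.0))
```

The truncation error after N terms is bounded by 2 Σₙ₍>N₎ rⁿ/n, so convergence slows as the zero approaches the unit circle.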
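Log Frequency Response for |a|>1
If |a|>1, we can rearrange the formula in terms of $a^{-1}$:
$$\log\!\left(H(e^{j\omega})\right) = \log\!\left(1 - a e^{-j\omega}\right) = \log\!\left(-a e^{-j\omega}\,(1 - a^{-1} e^{j\omega})\right) = \log\!\left(-a e^{-j\omega}\right) + \log\!\left(1 - a^{-1} e^{j\omega}\right)$$
Since $|a^{-1} e^{j\omega}| < 1$ we can expand the log as before to obtain
$$\log\left|H(e^{j\omega})\right|^2 = 2\log r - 2\sum_{n=1}^{\infty} \frac{r^{-n}}{n}\cos\!\big(n(\omega - \theta)\big); \qquad a = r e^{j\theta}$$
The average of $\log\left|H(e^{j\omega})\right|^2$ is $2\log|a|$ if |a|>1.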
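The average log power response can be estimated by sampling ω uniformly around the unit circle. This is a sketch under that assumption; the function name, the number of sample points and the example zeros are illustrative choices.

```python
import cmath
import math


def average_log_power(a, points=4096):
    """Mean of log |1 - a e^{-j omega}|^2 over `points` equally spaced frequencies."""
    total = 0.0
    for k in range(points):
        omega = 2.0 * math.pi * k / points
        total += math.log(abs(1.0 - a * cmath.exp(-1j * omega)) ** 2)
    return total / points


if __name__ == "__main__":
    inside = 0.6 + 0.4j                # |a| < 1
    outside = 1.5 * cmath.exp(0.3j)    # |a| > 1
    print(average_log_power(inside))                          # close to 0
    print(average_log_power(outside), 2.0 * math.log(1.5))    # nearly equal
```

A zero inside the unit circle contributes nothing to the average, while one outside contributes 2 log|a|.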
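The log response of an arbitrary filter is just the sum of the log responses of each pole or zero. For a stable filter, all the poles must be within the unit circle. Hence the poles contribute nothing to the average log power response, which is determined only by the gain and by any zeros outside the unit circle.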
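Single pole filter
$$H(z) = \frac{1}{1 - a z^{-1}} \quad\Leftrightarrow\quad y(n) = x(n) + a\,y(n-1)$$
Filter has a single pole at $z = a = r e^{j\theta}$
Power response of filter is given by
$$\left|H(e^{j\omega})\right|^2 = \frac{1}{1 + r^2 - 2r\cos(\omega - \theta)}$$
Peak (at ω = θ) $= (1 - r)^{-2}$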
Pole Pairs
If the filter coefficients are real, any complex zeros or poles will always occur in conjugate pairs.
The response of the filter is the product of the responses of the individual poles. Conjugate pole/zero pairs ensure a symmetric response.
Example: Poles at $0.6 \pm 0.4j = 0.72\,e^{\pm 0.59j}$
$$H(z) = \frac{1}{1 - 2r\cos\theta\, z^{-1} + r^2 z^{-2}} = \frac{1}{1 - 1.2 z^{-1} + 0.52 z^{-2}}$$
Figure: $\left|H(e^{j\omega})\right|^2$ for this pole pair
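The real denominator coefficients of a conjugate pole pair follow directly from the pole's radius and angle. A minimal sketch, with function names chosen for this example:

```python
import cmath
import math


def pole_pair_denominator(pole):
    """Coefficients [1, -2 r cos(theta), r^2] of 1 - 2 r cos(theta) z^-1 + r^2 z^-2."""
    r, theta = abs(pole), cmath.phase(pole)
    return [1.0, -2.0 * r * math.cos(theta), r * r]


def pole_pair_power(pole, omega):
    """|H(e^{j omega})|^2 as the product of the two single-pole responses."""
    z_inv = cmath.exp(-1j * omega)
    h = 1.0 / ((1.0 - pole * z_inv) * (1.0 - pole.conjugate() * z_inv))
    return abs(h) ** 2


if __name__ == "__main__":
    print(pole_pair_denominator(0.6 + 0.4j))   # approximately [1, -1.2, 0.52]
```

Evaluating `pole_pair_power` at ω and −ω gives the same value, which is the symmetry that the conjugate pairing guarantees.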
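Geometrical Interpretation
$$H(z) = \frac{1}{(1 - a z^{-1})(1 - a^{*} z^{-1})}$$
$$|H(z)| = \frac{1}{\left|1 - a z^{-1}\right|\,\left|1 - a^{*} z^{-1}\right|}$$
But since |z|=1, we have
$$\left|1 - a z^{-1}\right| = \left|z^{-1}\right|\,|z - a| = |z - a|$$
This is just the distance between z and a.
For a general filter with zeros $q_l$ and poles $p_i$:
$$H(z) = \frac{b_0}{a_0}\, \frac{\prod_{l=1}^{L} (1 - q_l z^{-1})}{\prod_{i=1}^{I} (1 - p_i z^{-1})}$$
The magnitude response of the filter at a frequency ω is proportional to the product of the distances from the point $e^{j\omega}$ to all the zeros divided by the product of the distances to all the poles. The constant of proportionality is $|b_0/a_0|$.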
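The geometric rule can be checked against a direct evaluation of the transfer function. The sketch below assumes H(z) = g·Π(1 − q z⁻¹)/Π(1 − p z⁻¹), where g stands for the constant of proportionality; the function names and the example zeros and poles are made up for illustration.

```python
import cmath


def magnitude_from_geometry(zeros, poles, gain, omega):
    """|gain| * product of distances to zeros / product of distances to poles."""
    point = cmath.exp(1j * omega)      # the point e^{j omega} on the unit circle
    num = 1.0
    for q in zeros:
        num *= abs(point - q)
    den = 1.0
    for p in poles:
        den *= abs(point - p)
    return abs(gain) * num / den


def magnitude_direct(zeros, poles, gain, omega):
    """|H(e^{j omega})| from the factored transfer function itself."""
    z_inv = cmath.exp(-1j * omega)
    h = complex(gain)
    for q in zeros:
        h *= 1.0 - q * z_inv
    for p in poles:
        h /= 1.0 - p * z_inv
    return abs(h)


if __name__ == "__main__":
    zeros = [0.9j, -0.9j]                    # example values only
    poles = [0.6 + 0.4j, 0.6 - 0.4j]
    print(magnitude_from_geometry(zeros, poles, 2.0, 0.5),
          magnitude_direct(zeros, poles, 2.0, 0.5))
```

The two values agree because |1 − a z⁻¹| = |z − a| whenever |z| = 1, even when the numbers of poles and zeros differ.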
Bandwidth of a Resonance Peak
The bandwidth of a resonance peak is the width of the frequency range over which the magnitude response stays within a factor of √2 (3 dB) of its peak value.
For poles near the unit circle this is approximately 2(1–r) rad/sample = (1–r)/π cycles/sample (normalised frequency):
$$\omega_2 - \omega_1 \approx 2(1 - r)$$
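The quality of the 2(1 − r) approximation can be checked for a single complex pole, ignoring the influence of its conjugate partner. Setting 1 + r² − 2r cos δ equal to 2(1 − r)² gives cos δ = 1 − (1 − r)²/(2r), and the bandwidth is 2δ. The function names below are illustrative.

```python
import math


def half_power_bandwidth(r):
    """Exact -3 dB bandwidth (rad/sample) of a single pole at radius r.

    Valid while 1 - (1 - r)^2 / (2 r) >= -1, i.e. for poles not too far
    from the unit circle.
    """
    return 2.0 * math.acos(1.0 - (1.0 - r) ** 2 / (2.0 * r))


def single_pole_power(r, delta):
    """Power response at a frequency offset delta from the pole angle."""
    return 1.0 / (1.0 + r * r - 2.0 * r * math.cos(delta))


if __name__ == "__main__":
    for r in (0.8, 0.9, 0.98):
        print(r, half_power_bandwidth(r), 2.0 * (1.0 - r))
```

The approximation improves as r approaches 1; at r = 0.98 the exact value and 2(1 − r) differ by roughly one percent.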
Bandwidth expansion
If we have a filter
$$H(z) = \frac{\sum_{l=0}^{L} b_l z^{-l}}{1 + \sum_{i=1}^{I} a_i z^{-i}}$$
we can form a new filter by multiplying coefficients $a_i$ and $b_i$ by $k^i$ for some k < 1:
$$G(z) = H(z/k) = \frac{\sum_{l=0}^{L} b_l k^l z^{-l}}{1 + \sum_{i=1}^{I} a_i k^i z^{-i}}$$
If H(z) has a pole/zero at z0, then G(z) will have one at kz0.
All poles and zeros will be moved inwards by a factor k.
If the bandwidth of a pole of H(z) is b = 2(1–r), then the bandwidth of the corresponding pole in G(z) will be expanded to:
$$2(1 - kr) = b + 2(1 - k)r$$
Example: k = 0.95
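Bandwidth expansion amounts to scaling the coefficient of z⁻ⁱ by kⁱ. The sketch below applies it to a second-order denominator and confirms that the pole moves from z0 to k·z0; the helper `quadratic_pole` and the example coefficients are assumptions for illustration.

```python
import cmath


def expand_bandwidth(coeffs, k):
    """Scale the coefficient of z^-i by k**i, i.e. form G(z) = H(z/k)."""
    return [c * k ** i for i, c in enumerate(coeffs)]


def quadratic_pole(denom):
    """One root of z^2 + c1 z + c2 for a denominator 1 + c1 z^-1 + c2 z^-2."""
    _, c1, c2 = denom
    return (-c1 + cmath.sqrt(c1 * c1 - 4.0 * c2)) / 2.0


if __name__ == "__main__":
    denom = [1.0, -1.2, 0.52]              # poles at 0.6 +/- 0.4j
    expanded = expand_bandwidth(denom, 0.95)
    print(expanded)
    print(abs(quadratic_pole(denom)), abs(quadratic_pole(expanded)))
```

The pole radius shrinks from r to kr, so the approximate bandwidth grows from 2(1 − r) to 2(1 − kr).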
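Coefficient Reversal
If we have a filter
$$H(z) = b_0 + b_1 z^{-1} + \dots + b_p z^{-p}$$
We can form a new filter by conjugating the coefficients and putting them in reverse order:
$$G(z) = b_p^{*} + b_{p-1}^{*} z^{-1} + \dots + b_0^{*} z^{-p} = z^{-p}\, H^{*}\!\left(z^{*-1}\right)$$
If z0 is a zero of H(z) then $z_0^{*-1}$ is a zero of G(z). This is called a reflection in the unit circle.
The frequency response of G(z) is given by:
$$G(e^{j\omega}) = e^{-jp\omega}\, H^{*}(e^{j\omega})$$
Hence G(z) has the same magnitude response as H(z) but a different phase response:
$$\left|G(e^{j\omega})\right| = \left|H(e^{j\omega})\right|$$
$$\operatorname{Arg}\!\left(G(e^{j\omega})\right) = -\operatorname{Arg}\!\left(H(e^{j\omega})\right) - p\omega$$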
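The effect of coefficient reversal on the frequency response can be verified numerically. This is a minimal sketch for an FIR filter with complex coefficients; the function names and the example coefficients are invented for illustration.

```python
import cmath


def reverse_conjugate(b):
    """Conjugate the coefficients and put them in reverse order."""
    return [c.conjugate() for c in reversed(b)]


def freq_response(b, omega):
    """H(e^{j omega}) = sum_n b[n] e^{-j omega n}."""
    return sum(c * cmath.exp(-1j * omega * n) for n, c in enumerate(b))


if __name__ == "__main__":
    b = [1.0 + 0.0j, -0.5 + 0.2j, 0.3 - 0.1j]    # example coefficients, p = 2
    g = reverse_conjugate(b)
    omega = 0.8
    h_val, g_val = freq_response(b, omega), freq_response(g, omega)
    print(abs(h_val), abs(g_val))                 # equal magnitudes
    print(cmath.phase(g_val), cmath.phase(h_val)) # different phases
```

The phases satisfy Arg(G) = −Arg(H) − pω only up to a multiple of 2π, because `cmath.phase` returns the principal value.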