Speech waves in tube and filters

Methods and algorithms of speech recognition course

Lecture 3

Nikolay V. Karpov

nkarpov(а)hse.ru

Speech Waves in Tube

Derive a theoretical model of how sound waves are affected by the vocal tract

Describe a model for lip radiation

Describe a model for the pulsating glottal waveform during voiced speech

Assemble the components of a simple speech synthesiser

Speech Waves in Tube

Ug and Ul are the volume flow of air at the glottis and lips respectively.

Vocal tract is of length L (typically 15-17 cm in adults)

Number of tube segments needed = 2L/(cT) ≈ 0.001 fsamp (with fsamp in Hz)

Multi-Tube Model of Vocal Tract

We model the vocal tract as a tube that has p segments.

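Wave Equations

Here u(x, t) is the volume flow, p(x, t) the pressure, A the cross-sectional area of the tube, ρ the air density and c the speed of sound.

Mass × Acceleration = Force:

$$\rho\,\frac{\partial u}{\partial t} = -A\,\frac{\partial p}{\partial x}$$

Adiabatic Gas Law:

$$A\,\frac{\partial p}{\partial t} = -\rho c^{2}\,\frac{\partial u}{\partial x}$$

These equations are known as the wave equations.

Solution:

$$u(x,t) = u^{+}(t - x/c) - u^{-}(t + x/c)$$

$$p(x,t) = \frac{\rho c}{A}\left[u^{+}(t - x/c) + u^{-}(t + x/c)\right]$$

It is easily verified that this solution satisfies the wave equations for any differentiable functions $u^{+}$ and $u^{-}$.

The two functions $u^{+}$ and $u^{-}$ represent waves travelling in the positive and negative x directions at velocity c. The actual values of the waves are determined by the boundary conditions at the ends of the tube section.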
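The claim that any pair of differentiable travelling waves satisfies the wave equations can be checked numerically. The sketch below is only an illustration: the wave shapes `u_fwd` and `u_rev`, the physical constants and the test point are arbitrary assumptions, and derivatives are approximated by central differences.

```python
import math

RHO = 1.2      # air density in kg/m^3 (assumed value)
C = 340.0      # speed of sound in m/s (assumed value)
AREA = 3e-4    # tube cross-section in m^2 (assumed value)

def u_fwd(s):
    """Arbitrary smooth forward-travelling wave shape (hypothetical)."""
    return math.sin(2000.0 * s)

def u_rev(s):
    """Arbitrary smooth reverse-travelling wave shape (hypothetical)."""
    return 0.5 * math.cos(1300.0 * s)

def flow(x, t):
    # u(x,t) = u+(t - x/c) - u-(t + x/c)
    return u_fwd(t - x / C) - u_rev(t + x / C)

def pressure(x, t):
    # p(x,t) = (rho*c/A) * [u+(t - x/c) + u-(t + x/c)]
    return RHO * C / AREA * (u_fwd(t - x / C) + u_rev(t + x / C))

def ddx(f, x, t, h=1e-6):
    """Central-difference estimate of the partial derivative in x."""
    return (f(x + h, t) - f(x - h, t)) / (2 * h)

def ddt(f, x, t, h=1e-9):
    """Central-difference estimate of the partial derivative in t."""
    return (f(x, t + h) - f(x, t - h)) / (2 * h)

def residuals(x, t):
    """Relative mismatch of the two wave equations at the point (x, t)."""
    lhs1, rhs1 = RHO * ddt(flow, x, t), -AREA * ddx(pressure, x, t)
    lhs2, rhs2 = AREA * ddt(pressure, x, t), -RHO * C**2 * ddx(flow, x, t)
    return abs(lhs1 - rhs1) / abs(rhs1), abs(lhs2 - rhs2) / abs(rhs2)

res_force, res_gas = residuals(0.05, 0.0012)
```

Both residuals should be small compared with 1; changing `u_fwd` or `u_rev` to any other smooth function should not change that.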

Assumptions:

Sound waves are 1-dimensional: true for frequencies < 3-4 kHz, whose wavelengths are long compared to the tube width

No frictional or wall-vibration energy losses

Sound Waves in a Tube

Acoustic signal is the superposition of two waves: U in the forward direction and V in the reverse direction

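Segment Delays

U and V are the forward and reverse waves at the glottis end of a segment; W and X are the forward and reverse waves at its lip end.

Time for sound to travel along a segment = L/(cp)

Segment length is chosen to correspond to half a sample period: L/p = 0.5cT

$$v(t) = x\!\left(t - \frac{L}{cp}\right),\qquad u(t) = w\!\left(t + \frac{L}{cp}\right)$$

If we take z-transforms, this time delay corresponds to multiplying by $z^{-1/2}$:

$$V(z) = z^{-1/2}X(z);\qquad U(z) = z^{1/2}W(z)$$

In matrix form:

$$\begin{pmatrix} U \\ V \end{pmatrix} = \begin{pmatrix} z^{1/2} & 0 \\ 0 & z^{-1/2} \end{pmatrix}\begin{pmatrix} W \\ X \end{pmatrix} = z^{1/2}\begin{pmatrix} 1 & 0 \\ 0 & z^{-1} \end{pmatrix}\begin{pmatrix} W \\ X \end{pmatrix}$$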

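1-segment transfer function

For a single uniform tube of length l, driven at the glottis (x = 0) by a flow $U_G(\Omega)e^{j\Omega t}$ (Ω in rad/s) and open at the lips (x = l, where p = 0), the steady-state solution is

$$u(x,t) = \frac{\cos[\Omega(l-x)/c]}{\cos[\Omega l/c]}\,U_G(\Omega)\,e^{j\Omega t}$$

$$p(x,t) = j\,\frac{\rho c}{A}\,\frac{\sin[\Omega(l-x)/c]}{\cos[\Omega l/c]}\,U_G(\Omega)\,e^{j\Omega t}$$

The transfer function is given by:

$$\frac{U(l,\Omega)}{U(0,\Omega)} = \frac{1}{\cos(\Omega l/c)}$$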

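This function has poles located at every

$$\Omega = \frac{(2n+1)\pi c}{2l},\qquad n = 0, 1, 2, \ldots$$

These correspond to the frequencies at which the tube becomes a quarter wavelength (or an odd multiple of a quarter wavelength) long. The lowest is c/(4l). For a segment of length $l = \tfrac{1}{2}cT = \dfrac{c}{2F_s}$ this gives

$$\frac{c}{4l} = \frac{F_s}{2}$$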
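As a quick worked example of the quarter-wavelength rule, the pole frequencies (2n+1)c/(4l) can be evaluated for a uniform tube the length of a typical adult vocal tract. The value c = 340 m/s and the 17 cm length are assumptions chosen for illustration.

```python
C = 340.0  # speed of sound in m/s (assumed value)

def tube_resonances(length_m, count=3):
    """First `count` pole frequencies (Hz) of 1/cos(Omega*l/c)."""
    # Omega = (2n+1)*pi*c/(2l)  <=>  f = (2n+1)*c/(4l)
    return [(2 * n + 1) * C / (4.0 * length_m) for n in range(count)]

resonances = tube_resonances(0.17)  # uniform 17 cm tube
```

For these values the resonances fall near 500 Hz, 1500 Hz and 2500 Hz, which is the familiar formant pattern of a neutral vowel.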

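Segment Junction

At a junction between a segment of area A (carrying the waves U, V) and a segment of area B (carrying the waves W, X):

Flow Continuity:

$$U - V = W - X$$

Pressure Continuity:

$$\frac{\rho c}{A}(U + V) = \frac{\rho c}{B}(W + X)$$

In matrix form:

$$\begin{pmatrix} 1 & -1 \\ B & B \end{pmatrix}\begin{pmatrix} U \\ V \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ A & A \end{pmatrix}\begin{pmatrix} W \\ X \end{pmatrix}$$

Hence:

$$\begin{pmatrix} U \\ V \end{pmatrix} = \frac{1}{2B}\begin{pmatrix} B & 1 \\ -B & 1 \end{pmatrix}\begin{pmatrix} 1 & -1 \\ A & A \end{pmatrix}\begin{pmatrix} W \\ X \end{pmatrix} = \frac{1}{2B}\begin{pmatrix} B+A & -(B-A) \\ -(B-A) & B+A \end{pmatrix}\begin{pmatrix} W \\ X \end{pmatrix}$$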

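Reflection Coefficients

Define the reflection coefficient to be

$$r = \frac{B - A}{B + A}$$

where A and B are the areas of the segments on the glottis side and the lip side of the junction. Then

$$\begin{pmatrix} U \\ V \end{pmatrix} = \frac{1}{2B}\begin{pmatrix} B+A & -(B-A) \\ -(B-A) & B+A \end{pmatrix}\begin{pmatrix} W \\ X \end{pmatrix} = \frac{1}{1+r}\begin{pmatrix} 1 & -r \\ -r & 1 \end{pmatrix}\begin{pmatrix} W \\ X \end{pmatrix}$$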
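The junction relations can be sanity-checked with a few lines of code. This is a minimal sketch, assuming A is the area on the side carrying U, V and B the area on the side carrying W, X; the numerical areas and wave amplitudes are arbitrary.

```python
def reflection(area_a, area_b):
    """Reflection coefficient r = (B - A) / (B + A)."""
    return (area_b - area_a) / (area_b + area_a)

def junction(r, w, x):
    """Map the waves (W, X) on the B side to (U, V) on the A side."""
    u = (w - r * x) / (1 + r)
    v = (-r * w + x) / (1 + r)
    return u, v

area_a, area_b = 2.0, 6.0          # arbitrary example areas
r = reflection(area_a, area_b)     # 0.5 for these areas
w, x = 1.0, 0.25                   # arbitrary wave amplitudes
u, v = junction(r, w, x)

flow_mismatch = (u - v) - (w - x)                          # flow continuity
pressure_mismatch = area_b * (u + v) - area_a * (w + x)    # pressure continuity
```

Both mismatches should vanish up to rounding, and because areas are positive the value of `r` always stays between -1 and +1.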

Reflection coefficients always lie in the range ±1

A3 is large but not infinite: assumption of narrow tube breaks down at this point

A0 is approximately zero: area of glottis opening

2-Segment Vocal Tract

Assume Vl = 0: no sound reflected back into mouth

Work backwards from lips towards glottis:

◦ Junction: use the reflection matrix

◦ Tube segment: use the delay matrix

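Vocal Tract Transfer Function

Multiplying out the matrices gives

$$\begin{pmatrix} U_g \\ V_g \end{pmatrix} = \frac{z}{(1+r_0)(1+r_1)(1+r_2)}\begin{pmatrix} 1 + (r_0r_1 + r_1r_2)z^{-1} + r_0r_2z^{-2} \\ -\left(r_0 + (r_1 + r_0r_1r_2)z^{-1} + r_2z^{-2}\right) \end{pmatrix} U_l$$

We can ignore Vg: it gets absorbed in the lungs. The vocal tract transfer function is given by the ratio of Ul to Ug:

$$\frac{U_l}{U_g} = \frac{z^{-1}\prod_{k=0}^{2}(1+r_k)}{1 + (r_0r_1 + r_1r_2)z^{-1} + r_0r_2z^{-2}} = \frac{Gz^{-1}}{1 - a_1z^{-1} - a_2z^{-2}}$$

with $G = \prod_{k=0}^{2}(1+r_k)$, $a_1 = -(r_0r_1 + r_1r_2)$ and $a_2 = -r_0r_2$.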
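The closed-form 2-segment result can be cross-checked by chaining the junction and delay matrices numerically at one point on the unit circle. The sketch assumes the matrix conventions used above (junction matrix (1/(1+r))[[1, -r], [-r, 1]], delay matrix diag(z^(1/2), z^(-1/2))); the reflection coefficients and the evaluation point are arbitrary.

```python
import cmath

def junction_matrix(r):
    s = 1.0 / (1.0 + r)
    return [[s, -r * s], [-r * s, s]]

def delay_matrix(z):
    root = cmath.sqrt(z)
    return [[root, 0.0], [0.0, 1.0 / root]]

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def chained_ratio(r0, r1, r2, z):
    """Ul/Ug from glottis junction, delay, junction, delay, lip junction."""
    chain = junction_matrix(r0)
    for m in (delay_matrix(z), junction_matrix(r1),
              delay_matrix(z), junction_matrix(r2)):
        chain = matmul(chain, m)
    # With Vl = 0 the top row gives Ug = chain[0][0] * Ul.
    return 1.0 / chain[0][0]

def closed_form_ratio(r0, r1, r2, z):
    gain = (1 + r0) * (1 + r1) * (1 + r2)
    return gain / z / (1 + (r0 * r1 + r1 * r2) / z + r0 * r2 / z**2)

z = cmath.exp(0.7j)                 # arbitrary point on the unit circle
direct = chained_ratio(0.9, -0.3, 0.7, z)
formula = closed_form_ratio(0.9, -0.3, 0.7, z)
```

The two values should agree to rounding error for any choice of coefficients with |r| < 1.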

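p-segment Vocal Tract

Multiplying together all the matrices for a p-segment vocal tract gives:

$$\begin{pmatrix} U_g \\ V_g \end{pmatrix} = \frac{z^{p/2}}{\prod_{k=0}^{p}(1+r_k)}\begin{pmatrix} 1 & -r_0 \\ -r_0 & 1 \end{pmatrix}\left[\prod_{k=1}^{p}\begin{pmatrix} 1 & -r_k \\ -r_kz^{-1} & z^{-1} \end{pmatrix}\right]\begin{pmatrix} 1 \\ 0 \end{pmatrix} U_l$$

This results in a transfer function of the form:

$$\frac{U_l}{U_g} = \frac{Gz^{-p/2}}{1 - a_1z^{-1} - a_2z^{-2} - \cdots - a_pz^{-p}}$$

G is a gain term

$z^{-p/2}$ is the acoustic time delay along the vocal tract

The denominator represents a p-th order all-pole filter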
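One way to obtain the all-pole denominator for any number of segments is to carry the wave vector back from the lips as a pair of polynomials in z^-1. The function below is a sketch under the conventions above (scalar factors z^(p/2) and 1/(1+r_k) are left out because they only affect the gain and delay); the function name and the example coefficients are illustrative.

```python
def vocal_tract_denominator(refl):
    """Return [1, d1, ..., dp], the denominator in powers of z^-1.

    refl = [r0, r1, ..., rp], ordered from glottis to lips.
    In the notation of the transfer function, a_k = -d_k.
    """
    top, bottom = [1.0], [0.0]        # wave vector at the lips: (Ul, 0)
    for r in reversed(refl[1:]):      # junctions p, p-1, ..., 1
        new_top = [t - r * b for t, b in zip(top, bottom)]
        # the z^-1 of the delay matrix shifts the lower row by one power
        new_bottom = [0.0] + [b - r * t for t, b in zip(top, bottom)]
        top, bottom = new_top + [0.0], new_bottom
    r0 = refl[0]                      # glottis junction has no delay after it
    return [t - r0 * b for t, b in zip(top, bottom)]

denominator = vocal_tract_denominator([0.9, -0.3, 0.7])
```

For two segments this reproduces 1 + (r0 r1 + r1 r2) z^-1 + r0 r2 z^-2, so the example yields coefficients close to [1, -0.48, 0.63].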

Lip Radiation

R(z) is the transfer function between airflow at the lips and pressure at the microphone:

$$R(z) = \frac{S(z)}{U_l(z)}$$

For a lip-opening area of A, acoustic theory predicts a 1st-order high-pass response with a corner frequency, determined by c and A, of about 5 kHz.

For fsamp < 20 kHz, a good approximation is:

$$R(z) = 1 - z^{-1},\qquad \left|R(e^{j\Omega T})\right| = 2\sin\!\left(\tfrac{1}{2}\Omega T\right)$$
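The magnitude of the first-difference approximation can be checked directly. The sketch assumes R(z) = 1 - z^-1 evaluated at z = exp(jΩT), with ΩT between 0 and π.

```python
import cmath
import math

def lip_radiation_gain(omega_t):
    """|R(z)| for R(z) = 1 - z^-1 at z = exp(j*omega_t)."""
    return abs(1 - cmath.exp(-1j * omega_t))

# Largest deviation from 2*sin(omega_t/2) on a grid from 0 to pi
max_error = max(
    abs(lip_radiation_gain(w) - 2 * math.sin(w / 2))
    for w in (k * math.pi / 50 for k in range(51))
)
```

The gain rises from 0 at DC to 2 at half the sampling frequency, which is the high-pass behaviour the approximation is meant to capture below the corner frequency.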

“LF Model” (Liljencrants & Fant)

Spectrum of Glottal Waveform

$$u_g'(t) = \begin{cases} e^{at}\sin(bt), & 0 \le t \le t_e \\ c - d\,e^{-ft}, & t_e < t < 1 \end{cases}$$

$u_g(0) = u_g(1) = 0$; $u_g(t)$ and $u_g'(t)$ continuous at $t_e$

Line spectrum of $u_g$ (approx. –12 dB/octave):

Larynx Frequency ≈130 Hz

First Vocal tract resonance (formant) ≈1 kHz

There is not necessarily any relation between the larynx frequency and the vocal tract resonances.

Resonances at a multiple of the larynx frequency will be louder (good for singers)

Vowel Waveform

Vocal Tract Shape and Response

This lecture reviews some well-known facts about filters and introduces some less well-known ones that will be needed later on.

Derive the power response of first order FIR and IIR filters and relate this to the geometry of the pole-zero diagram.

Relate the bandwidth of a 2nd-order resonance to the geometry of the pole-zero diagram.

Describe the bandwidth expansion transformation of a filter.

Describe the effect of reversing the coefficients of a filter.

Derive expressions for the log frequency response and its average value.

Filters

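Linear digital filter

$$y(n) = \sum_{k} h_k\,x(n-k)$$

A system that performs this transformation is called a linear digital filter.

y(n) – output, x(n) – input, $h_k$ – impulse response

Transfer function

$$H(z) = \frac{Y(z)}{X(z)};\qquad X(z) = \sum_{n} x(n)\,z^{-n}$$

$$H(z) = \sum_{k} h_k\,z^{-k}$$

A digital filter is a finite system: it is described by a difference equation with a finite number of coefficients,

$$\sum_{i=0}^{I} a_i\,y(n-i) = \sum_{l=0}^{L} b_l\,x(n-l)$$

so its transfer function is a rational function of z:

$$H(z) = \frac{\sum_{l=0}^{L} b_l z^{-l}}{\sum_{i=0}^{I} a_i z^{-i}} = \frac{b_0}{a_0}\,z^{I-L}\,\frac{\prod_{l=1}^{L}(z - \beta_l)}{\prod_{i=1}^{I}(z - \alpha_i)}$$

where $\beta_l$ are the zeros and $\alpha_i$ the poles of the filter.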
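The difference equation can be turned into code almost literally. The function below is a minimal sketch that solves Σ a_i y(n-i) = Σ b_l x(n-l) for y(n), assuming zero initial conditions; the name `difference_equation_filter` is illustrative and not taken from any library.

```python
def difference_equation_filter(b, a, x):
    """Filter the sequence x with numerator b and denominator a.

    Solves sum_i a[i]*y(n-i) = sum_l b[l]*x(n-l) for y(n), with
    x(n) = y(n) = 0 for n < 0.
    """
    y = []
    for n in range(len(x)):
        acc = sum(b[l] * x[n - l] for l in range(len(b)) if n - l >= 0)
        acc -= sum(a[i] * y[n - i] for i in range(1, len(a)) if n - i >= 0)
        y.append(acc / a[0])
    return y

# Impulse response of y(n) - 0.5*y(n-1) = x(n)
impulse_response = difference_equation_filter([1.0], [1.0, -0.5], [1.0, 0.0, 0.0, 0.0])
```

Feeding in a unit impulse returns the first samples of the impulse response h_k, here the geometric sequence 1, 0.5, 0.25, 0.125.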

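Filters, and Signal Flow Graphs

A linear time-invariant system can be characterized by a constant-coefficient difference equation:

$$y(n) = \sum_{k=1}^{N} a_k\,y(n-k) + \sum_{k=0}^{M} b_k\,x(n-k)$$

Such systems can be implemented as signal flow graphs.

Stable filter (all poles inside the unit circle): $|\alpha_i| < 1;\ 1 \le i \le I$

Minimum phase filter (all zeros inside the unit circle): $|\beta_l| < 1;\ 1 \le l \le L$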

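1st order FIR filter

General FIR filter:

$$y(n) = \sum_{k=0}^{M} b_k\,x(n-k)$$

First-order case:

$$H(z) = 1 - az^{-1},\qquad y(n) = x(n) - a\,x(n-1)$$

Filter has a single zero at $z = a = re^{j\theta}$

Frequency response of filter:

$$H(e^{j\omega}) = 1 - ae^{-j\omega}$$

Power response of filter:

$$\left|H(e^{j\omega})\right|^2 = H(e^{j\omega})\,H^*(e^{j\omega}) = (1 - ae^{-j\omega})(1 - a^*e^{j\omega}) = 1 - 2r\cos(\omega - \theta) + r^2$$

Example:

$$a = 0.6 + 0.4j:\qquad \left|H(e^{j\omega})\right|^2 = 1.52 - 1.44\cos(\omega - 0.59)$$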
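The numbers in the example follow from r = |a| and θ = arg a, and the closed-form power response can be compared against a direct evaluation of |1 - a e^{-jω}|². The sketch below uses the slide's zero a = 0.6 + 0.4j; the grid of test frequencies is arbitrary.

```python
import cmath
import math

a = 0.6 + 0.4j
r, theta = abs(a), cmath.phase(a)

def power_direct(omega):
    """|H(e^{jw})|^2 evaluated from H(z) = 1 - a z^-1."""
    return abs(1 - a * cmath.exp(-1j * omega)) ** 2

def power_formula(omega):
    """Closed form 1 - 2 r cos(w - theta) + r^2."""
    return 1 - 2 * r * math.cos(omega - theta) + r * r

constant_term = 1 + r * r      # about 1.52
cosine_weight = 2 * r          # about 1.44
max_gap = max(abs(power_direct(w) - power_formula(w))
              for w in (k * 0.1 for k in range(63)))
```

The response is smallest at ω = θ, where the point e^{jω} on the unit circle is closest to the zero.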

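Log Frequency Response for |a|<1

We can calculate the log response of the filter:

$$\log H(e^{j\omega}) = \log(1 - ae^{-j\omega})$$

If |a| < 1 then $|ae^{-j\omega}| < 1$ and we can expand the log as a power series using

$$\log(1 - d) = -d - \frac{d^2}{2} - \frac{d^3}{3} - \cdots,\qquad |d| < 1$$

$$\log H(e^{j\omega}) = -\sum_{n=1}^{\infty}\frac{a^n}{n}\,e^{-jn\omega}$$

$$\log\left|H(e^{j\omega})\right|^2 = -2\sum_{n=1}^{\infty}\frac{r^n}{n}\cos(n(\omega - \theta)),\qquad a = re^{j\theta}$$

First six terms in the summation for a = 0.6 + 0.4j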

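Log Frequency Response for |a|>1

If |a| > 1, we can rearrange the formula in terms of $a^{-1}$:

$$\log H(e^{j\omega}) = \log(1 - ae^{-j\omega}) = \log\!\left(-ae^{-j\omega}(1 - a^{-1}e^{j\omega})\right) = \log(-ae^{-j\omega}) + \log(1 - a^{-1}e^{j\omega})$$

Since $|a^{-1}| < 1$ we can expand the log as before to obtain

$$\log\left|H(e^{j\omega})\right|^2 = 2\log r - 2\sum_{n=1}^{\infty}\frac{r^{-n}}{n}\cos(n(\omega - \theta)),\qquad a = re^{j\theta}$$

The average of $\log\left|H(e^{j\omega})\right|^2$ over ω is $2\log|a|$ if |a| > 1, because every cosine term averages to zero. By the same argument the average is 0 if |a| < 1.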
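Both series expansions and the statement about the average can be verified numerically. This is a sketch for the first-order filter H(z) = 1 - a z^-1; the number of series terms, the number of averaging points and the test values of a are arbitrary choices.

```python
import cmath
import math

def log_power(a, omega):
    """log |1 - a e^{-jw}|^2 evaluated directly."""
    return math.log(abs(1 - a * cmath.exp(-1j * omega)) ** 2)

def log_power_series(a, omega, terms=60):
    """Truncated cosine series, using r^n for |a|<1 and r^-n for |a|>1."""
    r, theta = abs(a), cmath.phase(a)
    if r < 1:
        return -2 * sum(r**n / n * math.cos(n * (omega - theta))
                        for n in range(1, terms + 1))
    return 2 * math.log(r) - 2 * sum(r**(-n) / n * math.cos(n * (omega - theta))
                                     for n in range(1, terms + 1))

def average_log_power(a, points=4096):
    """Mean of log|H|^2 over one period of omega."""
    return sum(log_power(a, 2 * math.pi * k / points)
               for k in range(points)) / points

inside, outside = 0.6 + 0.4j, 1.5 + 1.0j   # |a| < 1 and |a| > 1
avg_inside = average_log_power(inside)
avg_outside = average_log_power(outside)
```

With these values `avg_inside` should be essentially zero and `avg_outside` should match 2 log|a|.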

The log response of an arbitrary filter is just the sum of the log responses of each pole or zero. For a stable filter, all the poles must be within the unit circle. Hence

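Single pole filter

$$H(z) = \frac{1}{1 - az^{-1}},\qquad y(n) = x(n) + a\,y(n-1)$$

Filter has a single pole at $z = a = re^{j\theta}$

Power response of filter is given by

$$\left|H(e^{j\omega})\right|^2 = \frac{1}{1 - 2r\cos(\omega - \theta) + r^2}$$

Peak value of $\left|H(e^{j\omega})\right|^2$ is $(1 - r)^{-2}$, reached at ω = θ.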

If the filter coefficients are real, any complex zeros or poles will always occur in conjugate pairs.

The response of the filter is the product of the responses of the individual poles. Conjugate pole/zero pairs ensure a symmetric response.

Pole Pairs

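Example: Poles at $0.6 \pm 0.4j = 0.72\,e^{\pm 0.59j} = re^{\pm j\theta}$

$$H(z) = \frac{1}{1 - 2r\cos\theta\,z^{-1} + r^2z^{-2}} = \frac{1}{1 - 1.2z^{-1} + 0.52z^{-2}}$$

Plot: power response $\left|H(e^{j\omega})\right|^2$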
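The real denominator coefficients of a conjugate pole pair follow directly from the pole radius and angle. The helper below is a small sketch; its name is illustrative.

```python
import cmath
import math

def pole_pair_denominator(pole):
    """Coefficients [1, c1, c2] of (1 - p z^-1)(1 - p* z^-1)."""
    r, theta = abs(pole), cmath.phase(pole)
    return [1.0, -2 * r * math.cos(theta), r * r]

coeffs = pole_pair_denominator(0.6 + 0.4j)
```

For the poles at 0.6 ± 0.4j this gives coefficients close to [1, -1.2, 0.52], matching the example.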

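Geometrical Interpretation

$$H(z) = \frac{1}{(1 - az^{-1})(1 - a^*z^{-1})},\qquad |H(z)| = \frac{1}{\left|1 - az^{-1}\right|\,\left|1 - a^*z^{-1}\right|}$$

But since |z| = 1 on the unit circle, we have

$$\left|1 - az^{-1}\right| = \left|z^{-1}\right|\,|z - a| = |z - a|$$

This is just the distance between z and a.

For a general filter

$$H(z) = \frac{b_0}{a_0}\,\frac{\prod_{l=1}^{L}(1 - \beta_l z^{-1})}{\prod_{i=1}^{I}(1 - \alpha_i z^{-1})}$$

with zeros $\beta_l$ and poles $\alpha_i$, the magnitude response at a frequency ω is proportional to the product of the distances from the point $e^{j\omega}$ to all the zeros divided by the product of the distances to all the poles. The constant of proportionality is $|b_0/a_0|$.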

Bandwidth of a Resonance Peak

The bandwidth of a resonance peak is the width of the frequency range over which the magnitude response stays within a factor of √2 (3 dB) of its peak value.

For poles near the unit circle this is approximately 2(1–r) rad/sample = (1–r)/π cycles/sample (normalised frequency):

$$\omega_2 - \omega_1 \approx 2(1 - r)$$
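The quality of the 2(1 - r) approximation can be gauged by solving the half-power condition exactly for a single complex pole. The sketch below ignores the influence of any other poles or zeros, which is an assumption that only holds when they are far from the peak.

```python
import math

def half_power_bandwidth(r):
    """Exact -3 dB width (rad/sample) of the peak of 1/|1 - r e^{-jw}|^2.

    Valid for a single pole of radius r close enough to 1 that a
    half-power point exists.
    """
    # Half power where 1 - 2 r cos(d) + r^2 = 2 (1 - r)^2
    cos_d = (1 + r * r - 2 * (1 - r) ** 2) / (2 * r)
    return 2 * math.acos(cos_d)

exact = half_power_bandwidth(0.95)
approx = 2 * (1 - 0.95)
```

For r = 0.95 the exact width is within a few percent of 0.1 rad/sample, and the agreement improves as r approaches 1.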

Bandwidth expansion

If we have a filter (normalised so that $a_0 = 1$)

$$H(z) = \frac{\sum_{l=0}^{L} b_l z^{-l}}{1 + \sum_{i=1}^{I} a_i z^{-i}}$$

we can form a new filter by multiplying the coefficients $a_i$ and $b_l$ by $k^i$ and $k^l$ respectively, for some k < 1:

$$G(z) = H(z/k) = \frac{\sum_{l=0}^{L} b_l k^l z^{-l}}{1 + \sum_{i=1}^{I} a_i k^i z^{-i}}$$

If H(z) has a pole/zero at z0, then G(z) will have one at kz0. All poles and zeros will be moved inwards by a factor k.

If the bandwidth of a pole of H(z) is b = 2(1–r), then the bandwidth of the corresponding pole in G(z) will be expanded to:

$$2(1 - kr) \approx b + 2(1 - k)$$

Example value: k = 0.95
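The effect of scaling the coefficients can be seen on a second-order denominator, where the poles are available from the quadratic formula. The sketch reuses the pole pair 0.6 ± 0.4j and k = 0.95; the helper names are illustrative.

```python
import cmath

def expand_bandwidth(coeffs, k):
    """Multiply the coefficient of z^-i by k**i."""
    return [c * k**i for i, c in enumerate(coeffs)]

def quadratic_roots(coeffs):
    """Roots in z of c0 + c1 z^-1 + c2 z^-2, i.e. of c0 z^2 + c1 z + c2."""
    c0, c1, c2 = coeffs
    disc = cmath.sqrt(c1 * c1 - 4 * c0 * c2)
    return (-c1 + disc) / (2 * c0), (-c1 - disc) / (2 * c0)

k = 0.95
denominator = [1.0, -1.2, 0.52]                  # poles at 0.6 +/- 0.4j
old_poles = quadratic_roots(denominator)
new_poles = quadratic_roots(expand_bandwidth(denominator, k))
```

Each new pole equals k times the corresponding old pole, so the pole radius drops from about 0.72 to about 0.69 and the resonance becomes broader.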

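Coefficient Reversal

If we have a filter

$$H(z) = b_0 + b_1z^{-1} + \cdots + b_pz^{-p}$$

we can form a new filter by conjugating the coefficients and putting them in reverse order:

$$G(z) = b_p^* + b_{p-1}^*z^{-1} + \cdots + b_0^*z^{-p} = z^{-p}\,H^*\!\left(z^{*-1}\right)$$

If $z_0$ is a zero of H(z) then $z_0^{*-1}$ is a zero of G(z). This is called a reflection in the unit circle.

The frequency response of G(z) is given by:

$$G(e^{j\omega}) = e^{-jp\omega}\,H^*(e^{j\omega})$$

Hence G(z) has the same magnitude response as H(z) but a different phase response:

$$\left|G(e^{j\omega})\right| = \left|H(e^{j\omega})\right|$$

$$\operatorname{Arg} G(e^{j\omega}) = -p\omega - \operatorname{Arg} H(e^{j\omega})$$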
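The relation between H and its coefficient-reversed counterpart can be confirmed at any frequency. The coefficients below are arbitrary complex values chosen for illustration.

```python
import cmath

def freq_response(coeffs, omega):
    """Evaluate sum_n coeffs[n] * e^{-j n omega}."""
    return sum(c * cmath.exp(-1j * omega * n) for n, c in enumerate(coeffs))

def reverse_conjugate(coeffs):
    """Conjugate the coefficients and put them in reverse order."""
    return [c.conjugate() for c in reversed(coeffs)]

h = [1.0 + 0.0j, -0.5 + 0.2j, 0.3j]      # arbitrary example filter, p = 2
g = reverse_conjugate(h)
p = len(h) - 1
omega = 0.9                               # arbitrary test frequency
lhs = freq_response(g, omega)
rhs = cmath.exp(-1j * p * omega) * freq_response(h, omega).conjugate()
```

`lhs` and `rhs` should agree, so the two filters share the same magnitude response and differ only in phase.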
