17
1 Digital Speech Processing Digital Speech Processing— Lecture 9 Lecture 9 1 Short Short-Time Fourier Time Fourier Analysis Methods Analysis Methods- Introduction Introduction General Discrete General Discrete-Time Model of Time Model of Speech Production Speech Production 2 Voiced Speech: Voiced Speech: A V P(z)G(z)V(z)R(z) P(z)G(z)V(z)R(z) Unvoiced Speech: Unvoiced Speech: A N N(z)V(z)R(z) N(z)V(z)R(z) Short Short-Time Fourier Analysis Time Fourier Analysis represent signal by sum of sinusoids sum of sinusoids or complex exponentials as it leads to convenient solutions to problems (formant estimation, pitch period estimation, analysis-by-synthesis 3 methods), and insight into the signal itself • such Fourier representations Fourier representations provide convenient means to determine response to a sum of sinusoids for linear systems clear evidence of signal properties that are obscured in the original signal Why STFT for Speech Signals Why STFT for Speech Signals steady state sounds, like vowels, are produced by periodic excitation of a linear system periodic excitation of a linear system => speech spectrum is the product of the excitation spectrum and the vocal tract frequency response speech is a time time varying signal varying signal => need more 4 speech is a time time-varying signal varying signal => need more sophisticated analysis to reflect time varying properties changes occur at syllabic rates (~10 times/sec) over fixed time intervals of 10-30 msec, properties of most speech signals are relatively constant (when is this not the case) Overview of Lecture Overview of Lecture • define time time-varying Fourier transform varying Fourier transform (STFT) analysis method • define synthesis method synthesis method from time-varying FT (filter-bank summation, overlap addition) 5 show how time-varying FT can be viewed in terms of a bank of filters model bank of filters model computation methods computation methods based on using FFT application application to vocoders, spectrum displays, format estimation, pitch period estimation Frequency Domain Processing Frequency Domain Processing 6 Coding Coding: – transform, subband, homomorphic, channel vocoders Restoration/Enhancement/Modification Restoration/Enhancement/Modification: – noise and reverberation removal, helium restoration, time-scale modifications (speed-up and slow-down of speech)

ShortShort--Time Fourier AnalysisTime Fourier Analysis … speech processing... · xn X e e d w wxn Xe Xe 21 for all values of over one complete pe

  • Upload
    vuphuc

  • View
    231

  • Download
    4

Embed Size (px)

Citation preview

Page 1: ShortShort--Time Fourier AnalysisTime Fourier Analysis … speech processing... · xn X e e d w wxn Xe Xe 21 for all values of over one complete pe

1

Digital Speech ProcessingDigital Speech Processing——Lecture 9Lecture 9

1

ShortShort--Time Fourier Time Fourier Analysis MethodsAnalysis Methods--

IntroductionIntroduction

General DiscreteGeneral Discrete--Time Model of Time Model of Speech ProductionSpeech Production

2

Voiced Speech: Voiced Speech:

•• AAVVP(z)G(z)V(z)R(z)P(z)G(z)V(z)R(z)

Unvoiced Speech:Unvoiced Speech:

•• AANNN(z)V(z)R(z)N(z)V(z)R(z)

ShortShort--Time Fourier AnalysisTime Fourier Analysis• represent signal by sum of sinusoidssum of sinusoids or

complex exponentials as it leads to convenient solutions to problems (formant estimation, pitch period estimation, analysis-by-synthesis

3

methods), and insight into the signal itself• such Fourier representationsFourier representations provide

– convenient means to determine response to a sum of sinusoids for linear systems

– clear evidence of signal properties that are obscured in the original signal

Why STFT for Speech SignalsWhy STFT for Speech Signals• steady state sounds, like vowels, are produced

by periodic excitation of a linear systemperiodic excitation of a linear system => speech spectrum is the product of the excitation spectrum and the vocal tract frequency response

• speech is a timetime varying signalvarying signal => need more

4

• speech is a timetime--varying signalvarying signal => need more sophisticated analysis to reflect time varying properties– changes occur at syllabic rates (~10 times/sec)– over fixed time intervals of 10-30 msec, properties of

most speech signals are relatively constant (when is this not the case)

Overview of LectureOverview of Lecture• define timetime--varying Fourier transformvarying Fourier transform (STFT)

analysis method• define synthesis methodsynthesis method from time-varying FT

(filter-bank summation, overlap addition)

5

( p )• show how time-varying FT can be viewed in

terms of a bank of filters modelbank of filters model•• computation methodscomputation methods based on using FFT•• applicationapplication to vocoders, spectrum displays,

format estimation, pitch period estimation

Frequency Domain ProcessingFrequency Domain Processing

6

•• CodingCoding:– transform, subband, homomorphic, channel vocoders

•• Restoration/Enhancement/ModificationRestoration/Enhancement/Modification:– noise and reverberation removal, helium restoration,

time-scale modifications (speed-up and slow-down of speech)

Page 2: ShortShort--Time Fourier AnalysisTime Fourier Analysis … speech processing... · xn X e e d w wxn Xe Xe 21 for all values of over one complete pe

2

Frequency and the Frequency and the DTFTDTFT

0 00

0

2 sinusoids

( ) cos( ) ( ) /where is the (in radians) of the sinusoid the Discrete-Time Fourier Transform ( )

frequency

ω ωωω

= = +

j n j nx n n e e

DTFT

7

( ) ( ) (ω ω∞

=−∞

= =∑j j n

nX e x n e xDTFT { }

{ }12

-1

)

( ) ( ) ( )

where is the of ( )frequency variable

πω ω ω

π

ω

ωπ

ω−

= =∫ j j n j

j

n

x n X e e d X e

X e

DTFT

DTFT and DFT of SpeechDTFT and DFT of Speech

1

The DTFT and the DFT for the infinite duration signal could be calculated (the DTFT) and approximated (the DFT) by the following:

( ) ( ) ( )j j m

mN

X e x m e DTFTω ω∞

=−∞

= ∑

i

8

1(2 / )

0( ) ( ) ( )

Nj N km

mX k x m w m e π

−−

=

=

(2 / )

, 0,1,..., 1

( ) ( )

using a value of =25000 we get the following plot

j

k N

k N

X e DFT

N

ω

ω π=

= −

=

i

2500025000--Point DFT of SpeechPoint DFT of Speech

Mag

nitu

deM

agni

tude

9

Log

Mag

nitu

de (d

B)

Log

Mag

nitu

de (d

B)

ShortShort--Time Fourier Time Fourier Transform (STFT)Transform (STFT)( )( )

10

ShortShort--Time Fourier TransformTime Fourier Transform• speech is not a stationary signalstationary signal, i.e., it

has properties that change with timechange with time• thus a single representationsingle representation based on all

the samples of a speech utterance, for the

11

p p ,most part, has no meaning

• instead, we define a timetime--dependent dependent Fourier transformFourier transform (TDFT or STFT) of speech that changes periodically as the speech properties change over time

Definition of STFTDefinition of STFTˆ ˆ

ˆ

ˆˆ

ˆ ˆ ˆ ( ) ( ) ( ) both and are variables

ˆ ˆ ( ) is a real window which determines the portion of ( )that is used in the computation of ( )

ω ω

ω

ω∞

=−∞

= −

• −

∑j j mn

m

jn

X e x m w n m e n

w n m x nX e

12

Page 3: ShortShort--Time Fourier AnalysisTime Fourier Analysis … speech processing... · xn X e e d w wxn Xe Xe 21 for all values of over one complete pe

3

Short Time Fourier TransformShort Time Fourier Transform

ˆ ˆˆ

STFT is a function of two variables, the time index, , whichis discrete, and the frequency variable, , which is continuous

ˆ( ) ( ) ( )

ˆ ˆ ˆ( ( ) ( )) fixed, variabl

ω ω

ω

ω

∞−

=−∞

= −

= − ⇒

∑j j mn

m

n

X e x m w n m e

x m w n m n DTFT e

13

ˆˆ ˆ ( )ˆ

ˆˆ ˆ

alternative form of STFT (based on change of variables) is

ˆ ( ) ( ) ( )

ˆ( ) ( )

ω ω

ω ω

∞− −

=−∞

∞−

= −

= −

j j n mn

m

j n j m

X e w m x n m e

e x n m w m e

ShortShort--Time Fourier TransformTime Fourier Transform

14

ˆ ˆˆ

ˆ

if we define

ˆ ( ) ( ) ( )

then (

ω ω

=−∞

=−∞

= −

m

j j mn

mj

n

X e x n m w m e

X e ˆ

ˆ ˆˆ ˆ ˆ ˆˆ

) can be expressed as (using )ˆ ( ) ( ) ( ) ( )

ω

ω ω ω ω− −

′ = −

= = + −⎡ ⎤⎣ ⎦j j n j j n

nn

m m

X e e X e e x n m w mDTFT

STFTSTFT--Different Time OriginsDifferent Time Origins

ˆ ˆˆ

the STFT can be viewed as having two different time origins 1. time origin tied to signal ( )

ˆ( ) ( ) ( )

ˆ ˆ ˆ( ) ( ) fi d i bl

ω ω∞

=−∞

= −

⎡ ⎤

∑j j mn

m

x n

X e x m w n m e

DTFT

15

ˆ( ) ( ) , fixed, variable

2. time origin ti

ω= −⎡ ⎤⎣ ⎦x m w n m n DTFT

ˆˆ ˆ ˆˆ

ˆˆ ˆ

ˆˆ

ed to window signal ( )

ˆ( ) ( ) ( )

( )ˆ ˆ ˆ( ) ( ) , fixed, variable

ω ω ω

ω ω

ω ω

∞− −

=−∞

= + −

=

= − +⎡ ⎤⎣ ⎦

∑j j n j mn

m

j n j

j n

w m

X e e x n m w m e

e X ee w m x n m nDTFT

Time Origin for STFTTime Origin for STFT

ˆ [0]m n x= − ⇒

16

ˆ[ ] [ ]Time origin tied towindow w m x n m− +

Interpretations of STFTInterpretations of STFT

1

ˆˆ

ˆˆ

ˆˆ

there are 2 distinct interpretations of ( )ˆ. assume is fixed, then ( ) is simply the normal Fourier

ˆ transform of the sequence ( ) ( ), for ˆfixed , ( ) has the

ω

ω

ω

− − ∞ < < ∞ =>

jn

jn

jn

X e

n X ew n m x m m

n X e same properties as a normal Fouriertransform

17

ˆˆ

ˆˆ ˆˆ

ˆ ˆ2. consider ( ) as a function of the time index with fixed.ˆ Then ( ) is in the form of a convolution of the signal ( )

with

ω

ω ω

ω−

jn

j j nn

X e n

X e x n e

ˆˆ

ˆthe window ( ). This leads to an interpretation in the form of ˆ ˆ linear filtering of the frequency modulated signal ( ) by ( ).

- we will now consider each of these interpretations of th

ω− j n

w nx n e w n

e STFT in a lot more detail

Fourier Transform InterpretationFourier Transform Interpretationˆ

ˆ consider ( ) as the normal Fourier transform of the sequenceˆ ˆ( ) ( ), for fixed .

ˆ the window ( ) slides along the sequence ( ) and defines aˆ new STFT for every value of

w

ω•

− −∞ < < ∞• −

jnX e

w n m x m m nw n m x m

n

hat are the conditions for the existence of the STFT

18

w• hat are the conditions for the existence of the STFTˆ the sequence ( ) ( ) must be absolutely summable for all

ˆ values of ˆ - since | ( ) | (32767 for 16-bit sampling)

-

• −

w n m x mn

x n L1ˆ since | ( ) | (normalized window levels)

- since window duration is usually finiteˆ ˆ ( ) ( ) is absolutely summable for all

• −

w n

w n m x m n

Page 4: ShortShort--Time Fourier AnalysisTime Fourier Analysis … speech processing... · xn X e e d w wxn Xe Xe 21 for all values of over one complete pe

4

Frequencies for STFTFrequencies for STFT

2ˆ ˆ( )ˆ ˆ

the STFT is periodic in with period 2 , i.e.,( ) ( ),

can use any of several frequency variables to express STFT, including

ω ω π

ω π+

= ∀

j j kn nX e X e k

19

gˆˆ (where is the sampling period for ω = ΩT T x

2 2

2 2 1

1

ˆˆ

ˆ ˆˆ ˆ

( )) to represent

analog radian frequency, giving ( )ˆ ˆˆˆ ˆ or to represent normalized frequency (0 )

ˆ or analog frequency (0 ), giving ( ) or (π π

ω π ω π

Ω

= = ≤ ≤

≤ ≤ =

j Tn

j f j Fs n n

m

X e

f FT f

F F / T X e X e )T

Signal Recovery from STFTSignal Recovery from STFTˆ

ˆ

ˆˆ

ˆ since for a given value of , ( ) has the same properties as a normal Fourier transform, we can recover the input sequence exactly

since ( ) is the normal Fourier transform of the windo

ω

ω

jn

jn

n X e

X e

1 ˆ ˆ

wed ˆ sequence ( ) ( ), then

ˆπ

∫ j j

w n m x m

20

12

0 0

ˆˆ ˆ ( ) ( ) ( )

assuming the window satisfies the property that ( ) ( a trivial requirement), then by evaluating the inverse Fourier t

ω ω

π

ωπ −

− =

• ≠

∫ j j mnw n m x m X e e d

w

12 0

ˆˆˆ

ransformˆ when , we obtain

ˆ ˆ ( ) ( )( )

πω ω

π

ωπ −

=

= ∫ j j nn

m n

x n X e e dw

Signal Recovery from STFTSignal Recovery from STFT

12 0

0 0

ˆˆˆ

ˆ ˆˆ ˆ

ˆ ˆ ( ) ( )( )

ˆ with the requirement that ( ) , the sequence ( ) can be recovered exactly from ( ), if ( ) is known

πω ω

π

ω ω

ωπ −

=

• ≠

∫ j j nn

j jn n

x n X e e dw

w x nX e X e

21

ˆfor all values of over one complete peω

ˆˆ

riod- sample-by-sample recovery process

ˆ ˆ - ( ) must be known for every value of and for all ˆcan also recover sequence ( ) ( ) but can't guarantee

that ( ) can be recovered since (

ω ω−i

jnX e n

w n m x mx m w ˆ ) can equal 0−n m

Properties of STFTProperties of STFT

[ ]

ˆˆ

ˆ ˆ ˆ ˆ2ˆ ˆ ˆ ˆ ˆ

ˆ ˆ

ˆ ˆ ˆ( ) [ ( ) ( )] fixed, variable relation to short-time power density function

ˆ ( ) | ( ) | ( ) ( ) ( ) fixed

ˆ ˆ( ) ( ) ( ) ( ) ( )

ω

ω ω ω ω

ω

= −

= = ⋅ =

= − − − + ⇔

jn

j j j jn n n n n

n n

X e w n m x m n

S e X e X e X e R k n

R k w n m x m w n m k x m k S

DTFT

DTFT

ˆ( )ω∞

=−∞∑ j

me

22

[ ]

ˆ

ˆ ˆ

ˆˆ ˆ( )ˆ

ˆ

Relation to regular ( ) (assuming it exists)

( ) ( ) ( )

1( ) ( ) ( )2

ˆ( ) ( ) ( ) ( )

ω

ω ω

πω θ ω θ θ

π

θ θ θ

θπ

= ∞

∞−

=−∞

− − −

− −

= =

=

⎡ ⎤− ↔ ∗⎣ ⎦

∫i

mj

j j m

m

j j j j nn

j j n j

X e

X e x m x m e

X e W e X e e d

w n m x m W e e X e

DTFT

Properties of STFTProperties of STFT

[ ]

12

ˆ

ˆ ˆ

ˆˆ ˆ( )ˆ

assume ( ) exists

( ) ( ) ( )

( ) ( ) ( )

ω

ω ω

πω θ ω θ θ θ

∞−

=−∞

− − −

= =

=

j

j j m

m

j j j j nn

X e

X e x m x m e

X e W e X e e d

DTFT

23

2

1 2

1 22

ˆ

ˆˆ ˆ ˆ( )ˆ

( ) ( ) ( )

limiting caseˆ ˆ ˆ ( ) ( ) ( )

( ) ( ) ( ) ( )

i.e.

π

ω

πω ω θ θ ω

π

π

πδ ω

πδ θ θπ

− −

= −∞ < < ∞ ⇔ =

= − =

n

j

j j j n jn

w n n W e

X e X e e d X e

, we get the same thing no matter where the window is shifted

Alternative Forms of STFTAlternative Forms of STFT

1

ˆˆ

ˆ ˆ ˆˆ ˆ ˆ

ˆ ˆ

ˆˆ ˆ

Alternative forms of ( ). real and imaginary parts

( ) Re ( ) Im ( )

ˆ ˆ( ) ( )

ˆ( ) Re ( )

ω

ω ω ω

ω

ω ω

ω

⎡ ⎤ ⎡ ⎤= +⎣ ⎦ ⎣ ⎦= −

⎡ ⎤= ⎣ ⎦

jn

j j jn n n

n n

jn n

X e

X e X e j X e

a j b

a X e

24

ˆˆ ˆ

( ) ( )

ˆ( ) Im ( )

ˆ when ( ) and ( ) are bot

ωω⎣ ⎦⎡ ⎤= − ⎣ ⎦

• −

n n

jn nb X e

x m w n m

ˆ

ˆ ˆ

ˆ( )ˆ ˆˆ ˆ

ˆˆ ˆ

h real (usually the case)ˆ ˆ ˆ can show that ( ) is symmetric in , and ( ) isˆ anti-symmetric in

2. magnitude and phase ( ) | ( ) |

ˆ can relate | ( ) | and ( )

θ ωω ω

ω

ω ω ωω

θ ω

=

n

n n

jj jn n

jn n

a b

X e X e e

X e ˆ ˆˆ ˆ to ( ) and ( ) ω ωn na b

Page 5: ShortShort--Time Fourier AnalysisTime Fourier Analysis … speech processing... · xn X e e d w wxn Xe Xe 21 for all values of over one complete pe

5

Role of Window in STFTRole of Window in STFT

ˆˆ

ˆˆ

ˆ The window ( ) does the following: 1. chooses portion of ( ) to be analyzed 2. window shape determines the nature of ( )

ˆ Since ( ) (for fixed ) is the normal FT o

ω

ω

−i

i

jn

jn

w n mx m

X e

X e n ˆf ( ) ( ), h if id h l FT' f b h ( ) d ( )

−w n m x m

25

ˆ ˆ

ˆ ˆ

then if we consider the normal FT's of both ( ) and ( ) individually, we get

( ) ( )

( ) ( )

ω ω

ω ω

∞−

=−∞

∞−

=−∞

=

=

j j m

m

j j m

m

x n w n

X e x m e

W e w m e

Role of Window in STFTRole of Window in STFT

ˆˆ ˆ

ˆ then for fixed , the normal Fourier transform of the ˆproduct ( ) ( ) is the convolution of the transforms

ˆof ( ) and ( )ˆ ˆfor fixed , the FT of ( ) is ( ) --thusω ω− −

•−

• − j j n

nw n m x m

w n m x mn w n m W e e

26

, ( ) ( )

12

12

ˆˆ ˆ( )ˆ

ˆˆ ˆ( )ˆ

( ) ( ) ( )

and replacing by gives

( ) ( ) ( )

πω θ θ ω θ

π

πω θ θ ω θ

π

θπ

θ θ

θπ

− − −

+

=

• −

=

j j j n jn

j j j n jn

X e W e e X e d

X e W e e X e d

Interpretation of Role of WindowInterpretation of Role of Window

ˆ ˆˆ

ˆˆ ˆ

ˆ

( ) is the convolution of ( ) with the FT of the shifted

window sequence ( )ˆ ( ) really doesn't have meaning since ( ) varies with time;

ˆ consider ( ) defined for wi

ω ω

ω ω

ω

− −

j jn

j j n

j

X e X e

W e eX e x n

x n ndow duration and extended for all time

27

ˆ to have the same properties then ( ) does exist with properties ˆ that reflect the sound within the window (can also consider ( ) 0

outside the window

ω⇒

=

jX ex n

ˆ

ˆˆ

and define ( ) appropriately--but this is another case)Bottom Line: ( ) is a smoothed version of the FT of the part

ˆ of ( ) that is within the window .

ω

ω

j

jn

X e

X ex n w

Windows in STFTWindows in STFTˆ

ˆ

ˆ

ˆ for ( ) to represent the short-time spectral properties of ( )

inside the window ( ) should be much narrower in frequency than significant spectral regions of ( )--i.e., almost a

ω

θ

ω

=>

jn

j

j

X e x n

W eX e n impulse

in frequencyconsider rectangular and Hamming windows, where width of the•

28

g g main spectral lobe is inversely proportional to window length, and side lobe levels are essentially independent of window length

Rectangular Window: flat window of length L samples; first zero in frequency response occurs at FS/L, with sidelobe levels of -14 dB or lower

Hamming Window: raised cosine window of length Lsamples; first zero in frequency response occurs at 2FS/L, with sidelobe levels of -40 dB or lower

WindowsWindows

29L=2M+1-point Hamming window and its corresponding DTFT

Frequency Responses of Frequency Responses of WindowsWindows

30

Page 6: ShortShort--Time Fourier AnalysisTime Fourier Analysis … speech processing... · xn X e e d w wxn Xe Xe 21 for all values of over one complete pe

6

Effect of Window LengthEffect of Window Length--HWHW

31

Effect of Window LengthEffect of Window Length--HWHW

32

Effect of Window LengthEffect of Window Length--RWRW

33

Effect of Window LengthEffect of Window Length--HWHW

34

Relation to ShortRelation to Short--Time AutocorrelationTime Autocorrelation

ˆˆ

ˆ ˆ ˆ ˆ2 *ˆ ˆ ˆ ˆ

ˆ( ) [ ] [ ]ˆ,

( ) | ( ) | ( ) ( )

is the discrete-time Fourier transform of for each value of then it is seen that is the Fourier transform of

ω

ω ω ω ω

= =

i jn

j j j jn n n n

X e w n m x mn

S e X e X e X e

35

ˆ ˆ( ) [ ] = −nR l w n m x ˆ[ ] [ ] [ ]

which is the short-time autocorrelation function of the previouschapter. Thus the above equations relate the short-time spectrum to the short-time autocorrelation,

=−∞

− − +∑m

m w n l m x m l

ShortShort--Time Autocorrelation and STFTTime Autocorrelation and STFT

36

Page 7: ShortShort--Time Fourier AnalysisTime Fourier Analysis … speech processing... · xn X e e d w wxn Xe Xe 21 for all values of over one complete pe

7

Summary of FT view of STFTSummary of FT view of STFTˆ

ˆ

interpret ( ) as the normal Fourier transform of the sequenceˆ( ) ( ),

properties of this Fourier transform depend on the window frequency resolution of ( ) varies inver

ω

ω

− − ∞ < < ∞•

jn

jn

X ew n m x m m

X e sely with thelength of the window want long windows for high resolution=>

37

length of the window want long windows for high resolution want ( ) to be relatively stationary (non-time-varying) during duration of window for most s

=>x n

table spectrum want short windows

as usual in speech processing, there needs to be a compromise between good temporal resolution (short windows) and good frequency resolution (l

=>

ong windows)

Linear Filtering Linear Filtering Interpretation of STFTInterpretation of STFTpp

38

Linear Filtering InterpretationLinear Filtering Interpretation

( )

1

1

ˆ ˆ

ˆ

ˆ( )

ˆ. modulation-lowpass filter form ( rather than )

( ) ( ) ( )

ˆ( ) ( ) , variable, fixed

( ) ( )

ω ω

ω

πθ θ ω θ

ω

θ

∞−

=−∞

+

= −

= ∗

j j mn

m

j n

j j j n

n n

X e x m e w n m

w n x n e n

W X d

39

22

( )( ) ( )

. bandpass filter-demodulation

θ θ ω θ

π

θπ

+

= ∫ j j j n

n

W e X e e d

X ˆ ˆ ( )

ˆ ˆ

ˆ ˆ

( ) ( ) ( )

( ( ) ) ( )

ˆ[( ( ) ) ( )], variable, fixed

ω ω

ω ω

ω ω ω

∞− −

=−∞

∞−

=−∞

= −

= −

= ∗

j j n m

m

j n j m

mj n j n

e w m x n m e

e w m e x n m

e w n e x n n

Linear Filtering InterpretationLinear Filtering Interpretation

ˆ ˆ

1. modulation-lowpass filter form:

( ) ( ) ( ),

ˆi bl fi d

ω ω∞

=−∞

= −∑j j mn

mX e x m e w n m

40

( )( )( )

variable, fixed

( ) ( )

ˆ( )cos( ) ( )

ˆ( )sin( ) ( )ˆ ˆ( ) ( )

ω

ω

ω

ω

ω ω

−= ∗

= ∗ −

= −

j n

n n

n

x n e w n

x n n w n

j x n n w na jb

Linear Filtering InterpretationLinear Filtering Interpretation

( )ˆ ˆ ˆ

2. bandpass filter-demodulation form

ˆ ( ) ( ) ( ) , variable, fixedω ω ω ω− ⎡ ⎤= ∗⎣ ⎦j j n j n

nX e e w n e x n n

41

complex bandpass filter output modulated by signal if ( ) is lowpass, then filter

is bandpass around all real computation for lower

half structure

ω

θ

θ ω

•=

j n

j

eW e

Linear Filtering InterpretationLinear Filtering InterpretationLowpass filter frequency response

42

Bandpass filter frequency response

Page 8: ShortShort--Time Fourier AnalysisTime Fourier Analysis … speech processing... · xn X e e d w wxn Xe Xe 21 for all values of over one complete pe

8

Linear Filtering InterpretationLinear Filtering Interpretation

ˆ ˆ( )

assume normal FT of ( ) existsˆ ( ) ( ) (recall that is a particular frequency)

( ) ( )ˆ spectrum of ( ) at frequency is shifted to zero frequency;

θ

ω θ ω

ω

ω

− +

↔=>

j

j n j

x nx n X ex n e X e

x n

43

s•

1

ˆ( )

ince the STFT is a convolution, the FT of the STFT is the product of the individual FT's, i.e., ( ) ( ) if ( ) resembles a narrow band lowpass filter, i.e., ( )

for sm

θ ω θ

θ θ

+ ⋅

• =

j j

j j

X e W eW e W e

ˆ ˆ( )

all and is 0 otherwise, then ( ) ( ) ( )θ ω θ ω

θ+ ⋅ ≈j j jX e W e X e

STFT Magnitude OnlySTFT Magnitude Only

for many applications you only need the magnitude of the STFT(not the phase) in such cases, the bandpass filter implementation is

44

1 22 2 /ˆ

less complex, since

ˆ ˆ | ( ) | ( ) ( )

|

ω ω ω⎡ ⎤= +⎣ ⎦

=

jn n nX e a b

1 22 2 /ˆ ˆ ˆ( ) | ( ) ( )ω ω ω⎡ ⎤= +⎣ ⎦j

n n nX e a b

Sampling Rates of STFTSampling Rates of STFT

45

Sampling Rates of STFTSampling Rates of STFT• need to sample STFT in both time and frequency to

produce an unaliased representation from which x(n) can be exactly recovered

• sampling rates lower than the theoretical minimum rate can be used, in either time or frequency, and x(n) can till b tl d f th li d ( d

46

still be exactly recovered from the aliased (under-sampled) short-time transform– this is useful for spectral estimation, pitch estimation, formant

estimation, speech spectrograms, vocoders– for applications where the signal is modified, e.g., speech

enhancement, cannot undersample STFT and still recover modified signal exactly

Sampling Rate in TimeSampling Rate in Time

ˆ

to determine the sampling rate in time, we take a linear filtering view 1. ( ) is the output of a filter with impulse response ( )

2. ( ) is a lowpass response with effective band

ω

ω

•j

nj

X e w n

W e

2

ˆ ˆ

width of Hertz thus the effective bandwidth of ( ) is Hertz ( ) hast b l d t t f l / d t id li i

ω ω• =>j jn n

BX e B X e

B

47

2

0 54 0 46

to be sampled at a rate of samples/second to avoid aliasingExample: Hamming Window

( ) . . cos(= −

B

w n 2 1 0 10

2 10 000

/ ( ))otherwise

(Hz); for 100, , Hz 200 Hz need

rate of 400/sec (every 25 samples) for sampling rate in time

π − ≤ ≤ −=

⇒ ≈ = = => = =>ss

n L n L

FB L F BL

Sampling Rate in FrequencySampling Rate in Frequency2

22 0 1 1

ˆˆ since ( ) is periodic in with period , it is only necessary to sample over an

interval of length need to determine an appropriate finite set of frequencies, / , , ,...,

ω ω ππ

ω π

• = = −

jn

k

X e

k N k Nˆ

ˆ

ˆˆ

ˆˆ

at which ( ) must be specified to exactly recover ( )

use the Fourier transform interpretation of ( )

1. if the window ( ) is time-limited, then the inverse transform of ( )

ω

ω

ω

jn

jn

jn

X e x n

X e

w n X eˆ

is time-limited

2 the sampling theorem requres that we sample ( ) in the frequency dimension at aωjX e

48

ˆ 2. the sampling theorem requres that we sample ( ) in the frequency dimension at a rate of at least twice its ('symmetric') "time width" 3. since the inverse Fou

ωjnX e

ˆˆ

ˆˆ

ˆrier transform of ( ) is the signal ( ) ( ) and this signal is of duration samples (the duration of ( )), then according to the sampling theorem

( ) must be sampled (in fre

ω

ω

−jn

jn

X e x m w n mL w n

X e2 0 1 1 2

ˆ

quency) at the set of frequencies

, , ,..., (where / is the effective width of the window)

in order to exactly recover ( ) from ( ) ω

πω = = −

k

k

jn

k k L LL

x n X e

• thus for a Hamming window of duration L=100 samples, we require that the STFT be evaluated at at least 100 uniformly spaced frequencies around the unit circle

Page 9: ShortShort--Time Fourier AnalysisTime Fourier Analysis … speech processing... · xn X e e d w wxn Xe Xe 21 for all values of over one complete pe

9

“Total” Sampling Rate of STFT“Total” Sampling Rate of STFT• the “total” sampling rate for the STFT is the product of the sampling

rates in time and frequency, i.e.,SR = SR(time) x SR(frequency)

= 2B x L samples/secB = frequency bandwidth of window (Hz)L = time width of window (samples)

• for most windows of interest, B is a multiple of FS/L, i.e.,

49

, p S , ,B = C FS/L (Hz), C=1 for Rectangular Window

C=2 for Hamming WindowSR = 2C FS samples/second

• can define an ‘oversampling rate’ ofSR/ FS = 2C = oversampling rate of STFT as compared to

conventional sampling representation of x(n)for RW, 2C=2; for HW 2C=4 => range of oversampling is 2-4this oversampling gives a very flexible representation of the speech signal

Mathematical Basis for Sampling the STFTMathematical Basis for Sampling the STFT

2 2

2 0 1 1

assume sample in time at ,

ˆ and in frequency at , , ,...,

sample valuesπ π

πω ω

• = = −∞ < < ∞

⎛ ⎞= = = −⎜ ⎟⎝ ⎠

r

k

j k j km

n n rR r

k k NN

50

2 2

2

( ) [ ] [ ]

( )

( ) [ ] (

π π

π

=−∞

= −

=

= +

∑j k j km

N Nr R

m

j krR j kN N

r R

j kN

r R

X e w rR m x m e

e X e

X e x rR m w ( )2

2 2

) set ;

define DFT-type notation

( ) ( ) ( )

π

π π

∞ −

=−∞

′ ′− = + =

= =

∑j km

N

m

j k j krRN N

r r R r

m e m rR m m m

X k X e e X k

Sampling the STFTSampling the STFT

51

Sampling the STFTSampling the STFT

2 2

21

0

0 0 1

DFT Notation

( ) ( ) ( )

let [ ] for (finite duration window with no zero-valued samples)

[ ] [ ] [ ]

π π

π

− −

= =

• − ≠ ≤ ≤ −

= + −∑

j k j krRN N

r r R r

L j kmN

r

X k X e e X k

w m m L

X k x r R m w m e

52

0

0 1( fixed, ) if then (D

=

≤ ≤ −• ≤

m

r k NL N

21

0

1

0 1

FT defined with no aliasing can recover sequence exactly using inverse DFT)

[ ] [ ] [ ]

( fixed, ) if (IDFT defined with no aliasing), then all samples

ca

π−

=

+ − =

≤ ≤ −• ≤

∑N j km

Nr

kx r R m w m X k e

Nr m N

R L

( )n be recovered from [ ] gaps in sequence> ⇒rX k R L

What We Have Learned So FarWhat We Have Learned So Far1 ˆ ˆ

ˆ

ˆˆ

ˆ. ( ) ( ) ( )

ˆ ˆ function of for sampled (looks like a time sequence)ˆˆ function of for sampled (looks like a transform)

( ) (no sampling rate reduction) define

ω ω

ω

ωω ω

∞−

=−∞

= −

=

=

∑ii

j j mn

m

jn

X e x m w n m e

n nn

X e 1 2 3 0

2 ˆ

ˆ ˆd for , , ,...;ˆ ˆ ˆ( ) DTFT ( ) ( ) fixed variableω

ω π

ω

= ≤ ≤

= ⇒⎡ ⎤⎣ ⎦j

n

X e x m w n m n

53

2 ˆ

ˆˆ ˆˆ

. ( ) DTFT ( ) ( ) fixed, variableˆ with time origin tied to ( )

ˆ ˆ ˆ( ) DTFT ( ) ( ) fixed, variable

with time origin tied to (

ω ω

ω

ω−

= − ⇒⎡ ⎤⎣ ⎦

= + − ⇒⎡ ⎤⎣ ⎦

n

j j nn

X e x m w n m n

x nX e e x n m w m n

w3

1

ˆˆ

ˆˆ

ˆ ˆˆ

). Interpretations of ( )

ˆ ˆˆ. fixed, variable; ( ) DTFT ( ) ( ) DFT View

ˆ ˆ 2. variable, fixed; ( ) ( ) ( ) Linear Filtering view

ω

ω

ω ω

ω ω

ω −

= = − ⇒⎡ ⎤⎣ ⎦= = ∗ ⇒

jn

jn

j j nn

mX e

n X e x m w n m

n n X e x n e w n filter bank implementation⇒

What We Have Learned So FarWhat We Have Learned So Far4

12

12 0

5

ˆ ˆˆ

ˆˆ ˆˆ

. Signal Recovery from STFT

ˆ ( ) ( ) ( )

ˆ( ) ( )( )

. Linear Filtering Interpretation

πω ω

π

πω ω

π

ωπ

ωπ

− =

=

j j mn

j j nn

x m w n m X e e d

x n X e e dw

54

ˆ ˆ 1. modulation-lowpass filter ( ) ( ) ( )ω ω−⇒ = ∗j jnX e w n x n e

( )

12

ˆ ˆ( )

ˆ ˆ ˆ

,

ˆ ˆ variable, fixed ( ) ( ) ( )

2. bandpass filer-demodulation ( ) ( ) ( ) ,

ˆ ˆ variable, fixed

πω θ θ ω θ

π

ω ω ω

ω θπ

ω

+

⎡ ⎤⎣ ⎦

= =

⎡ ⎤⇒ = ∗⎣ ⎦=

n

j j j j nn

j j n j nn

n n X e W e X e e d

X e e w n e x n

n n

Page 10: ShortShort--Time Fourier AnalysisTime Fourier Analysis … speech processing... · xn X e e d w wxn Xe Xe 21 for all values of over one complete pe

10

What We Have Learned So FarWhat We Have Learned So Far6

22

. Sampling Rates in Time and Frequency 1. time: ( ) has bandwidth of Hertz samples/sec rate

Hamming Window: (Hz)

2. frequency: ( ) is time limited to samples

j

S

W e B BFBL

w n L

ω ⇒

=

inverse of ( ) isalso time limited need to sample in frequency at twice the (effective)

jnX e ω⇒

55

also time limited need to sample in frequency at twice the (effective) time width of the time-limited sequence frequency samples 3. t

L⇒

⇒2otal Sampling Rate: samples/sec

- frequency bandwidth of the window (Hz) - effective time width of the window (samples) / (Hz) SamplinS

B LBL

B C F L

⋅=== ⋅ ⇒ 2 2

12

g Rate samples/second - for Rectangular Window, - for Hamming Window,

SB L CFC

C

= ⋅ =

==

Spectrographic DisplaysSpectrographic Displays

56

Spectrographic DisplaysSpectrographic Displays• Sound Spectrograph-one of the earliest embodiments of the time-

dependent spectrum analysis techniques– 2-second utterance repeatedly modulates a variable frequency

oscillator, then bandpass filtered, and the average energy at a given time and frequency is measured and used as a crude measure of the STFT

– thus energy is recorded by an ingenious electro-mechanical system on special electrostatic paper called teledeltos paper

57

special electrostatic paper called teledeltos paper– result is a two-dimensional representation of the time-dependent

spectrum-with vertical intensity being spectrum level at a given frequency, and horizontal intensity being spectral level at a given time-with spectrum magnitude being represented by the darkness of the marking

– wide bandpass filters (300 Hz bandwidth) provide good temporal resolution and poor frequency resolution (resolve pitch pulses in time but not in frequency)—called wideband spectrogram

– narrow bandpass filters (45 Hz bandwidth) provide good frequency resolution and poor time resolution (resolve pitch pulses in frequency, but not in time)—called narrowband spectrogram

Conventional Spectrogram (Every Conventional Spectrogram (Every salt breeze comes from the sea)salt breeze comes from the sea)

58

Digital Speech SpectrogramsDigital Speech Spectrograms•• wideband spectrogramwideband spectrogram

•• follows broad spectral peaks (formants) follows broad spectral peaks (formants) over timeover time

•• resolves most individual pitch periods as resolves most individual pitch periods as vertical striations since the IR of the vertical striations since the IR of the analyzing filter is comparable in duration analyzing filter is comparable in duration to a pitch periodto a pitch period

•• what happens for low pitch maleswhat happens for low pitch males——high high pitch femalespitch females

59

pitch femalespitch females

•• for unvoiced speech there are no vertical for unvoiced speech there are no vertical pitch striationspitch striations

•• narrowband spectrogramnarrowband spectrogram•• individual harmonics are resolved in individual harmonics are resolved in voiced regionsvoiced regions

•• formant frequencies are still in evidenceformant frequencies are still in evidence

•• usually can see fundamental frequencyusually can see fundamental frequency

•• unvoiced regions show no strong unvoiced regions show no strong structurestructure

Digital Speech SpectrogramsDigital Speech Spectrograms• Speech Parameters (“This is a test”):

– sampling rate: 16 kHz– speech duration: 1.406 seconds– speaker: male

• Wideband Spectrogram Parameters:– analysis window: Hamming window– analysis window duration: 6 msec (96 samples)– analysis window shift: 0.625 msec (10 samples)– FFT size: 512– dynamic range of spectral log magnitudes: 40 dB

• Narrowband Spectrogram Parameters:– analysis window: Hamming window– analysis window duration: 60 msec (960 samples)– analysis window shift: 6 msec (96 samples)– FFT size: 1024 – dynamic range of spectral log magnitudes: 40 dB

60

Page 11: ShortShort--Time Fourier AnalysisTime Fourier Analysis … speech processing... · xn X e e d w wxn Xe Xe 21 for all values of over one complete pe

11

Digital Speech SpectrogramsDigital Speech SpectrogramsTop Panel: 3 msec (48 samples) window

Second Panel: 6 msec (96

)

61

samples) window

Third Panel: 9 msec (144 sample) window

Fourth Panel: 30 msec (480 sample) window

Spectrogram ComparisonsSpectrogram Comparisons

62

Spectrogram Spectrogram -- MaleMale1024 80 75= = =nfft , ,L Overlap

63

Spectrogram Spectrogram -- FemaleFemale1024 80 75= = =nfft , ,L Overlap

64

Overlap Addition (OLA) Overlap Addition (OLA) MethodMethod

65

Overlap Addition (OLA) MethodOverlap Addition (OLA) Method

/ˆ ˆ

ˆ

based on normal FT interpretation of short-time spectrumˆ( ) ( ) ( ) ( )

can reconstruct by computing IDFT of ( ) anddividing out the window (assumed non-zero

ω

ω

←⎯⎯⎯⎯→ = −

k

k

DFT IDFTjn n

jn

X e y m x m w n m

x(m) X efor all samples)

66

dividing out the window (assumed non zero

ˆ

for all samples) this process gives signal values of for each window

window can be moved by samples and the process repeated

since ( ) is "undersampled" in time, it is highly suscω

• =>

• kjn

L x(m)L

X e eptible to aliasing errors need more robust synthesis procedure=>

Page 12: ShortShort--Time Fourier AnalysisTime Fourier Analysis … speech processing... · xn X e e d w wxn Xe Xe 21 for all values of over one complete pe

12

Overlap Addition (OLA) MethodOverlap Addition (OLA) Method( ) ( )

summation is for overlapping analysis sections for each value of where ( ) is measured, do an inverse FT to give

( ) ( ) ( ) (where is the size of the FT)(

ω ω

ω

⎡ ⎤= ⎢ ⎥

⎣ ⎦•

= −

∑ ∑ k k

k

j j nm

m k

jm

m

y n X e e

m X ey n Lx n w m n Ly ) ( ) ( ) ( )= = −∑ ∑mn y n Lx n w m n

67

10

00

a basic property of the window is

( ) ( ) | ( )

since any set of samples of the window are equivalent (by sampling arguments), then if ( ) is sampled ofte

ωω

==

= =

∑k

k

m m

Njj

nW e W e w n

w n0

0

n enough we get (independent of ) ( ) ( )

( ) ( ) ( ) using overlap-added sections

− =

=

∑ j

mj

nw m n W e

y n Lx n W e

Overlap Addition of Bartlett Overlap Addition of Bartlett and and HannHann WindowsWindows

68

Overlap Addition of Hamming WindowOverlap Addition of Hamming WindowL=128L=128

69

Window SpectraWindow Spectra

70DTFTof Bartlett (triangular), Hann and Hamming windows

Hamming Window SpectraHamming Window Spectra

71

DTFTs of even-length, odd-length and modified odd-to-even length Hamming windows; zeros spaced at 2π/R give perfect

reconstruction using OLA (even-length window)

Overlap Addition (OLA) MethodOverlap Addition (OLA) Method

72

Page 13: ShortShort--Time Fourier AnalysisTime Fourier Analysis … speech processing... · xn X e e d w wxn Xe Xe 21 for all values of over one complete pe

13

Overlap Addition (OLA) MethodOverlap Addition (OLA) Method

• w(n) is an L-point Hamming window with R=L/4

73

• assume x(n)=0 for n<0

• time overlap of 4:1 for HW

• first analysis section begins at n=L/4

Overlap Addition (OLA) MethodOverlap Addition (OLA) Method

• 4-overlapping sections contribute to each interval

• N-point FFT’s done using L speech samples, with N-L

dd d t d t

74

zeros padded at end to allow modifications without significant aliasing effects

• for a given value of ny(n)=x(n)w(R-n)+x(n)w(2R-n)+x(n)w(3R-n)+x(n)w(4R-n)=x(n)[w(R-n)+w(2R-n)+w(3R-n)+w(4R-n)]=x(n) W(ej0)/R

Filter Bank Summation Filter Bank Summation (FBS)(FBS)( )( )

75

Filter Bank SummationFilter Bank Summation

the filter bank interpretation of the STFT shows that for

any frequency , ( ) is a lowpass representation ˆ of the signal in a band centered at ( for FBS)

ωωω

=

kjk n

k

X en n

76

( ) ( )ω ω−= −k kj j nn kX e e x n m w ( )

where ( ) is the lowpass window used at frequency (we have generalized the structure to allow a different lowpass window at each frequency ).

ω

ω

ω

=−∞∑ kj m

m

k k

k

m e

w m

Filter Bank SummationFilter Bank Summation

define a bandpass filter and substitute it in the equation to give

77

equation to give

( ) ( )

( ) ( ) ( )

k

k k

j nk k

j j nn k

m

h n w n e

X e e x n m h m

ω

ω ω∞

=−∞

=

= −∑

Filter Bank Interpretation of Filter Bank Interpretation of STFT (case: wSTFT (case: wkk[n]=w[n])[n]=w[n])

( )

[ ] ( )

[ ] [ ] ( ) ( ) ( )

( ) ( )

k k

k k k

j

j n j nj j

j n j n j nj nk

n n

w n W e

h n w n e H e W e FT e

FT e e e e

ω

ω ωω ω

ω ω ω ωω δ ω ω∞ ∞

− −−

=−∞ =−∞

← − − − →

= ← − − − → = ⊗

= = = −∑ ∑

-ω0 ω0 ω

78

[ ]

( )( ) ( ) ( ) ( )

[ ] ( ) [ ] [ ]

( ) ( ) ( )

k

k k

jj jk

j j nk n

j j jk

H e W e W e

v n X e e x n h n

V e H e X e FT

ω ωω ω

ω ω

ω ω ω

δ ω ω −

= ⊗ − =

= = ⋅ ∗

⎡ ⎤= ⋅ ⊗⎣ ⎦

( )

( )

( )

( ) ( ) ( )

( ) ( ) ( )

( ) ( )

k

k

k

j n

j jk

jjk

j j

e

H e X e

X e W e

X e W e

ω

ω ω

ω ωω

ω ω ω

δ ω ω

δ ω ω

+

⎡ ⎤= ⋅ ⊗ +⎣ ⎦⎡ ⎤= ⋅ ⊗ +⎣ ⎦

= ⋅

ωk-ω0 ωk ωk+ ω0 ω

“single-sided, bandpass”

“lowpass”

Page 14: ShortShort--Time Fourier AnalysisTime Fourier Analysis … speech processing... · xn X e e d w wxn Xe Xe 21 for all values of over one complete pe

14

Filter Bank SummationFilter Bank Summation

thus ( ) is obtained by bandpass filtering ( ) followed by modulation with the complex exponential

f

kjn

j n

X e x nω

ω

79

. We can express this in the form

( ) ( ) ( ) ( )

thus

k

k k

j n

j j nk n k

m

e

y n X e e x n m h m

ω

ω ω

=−∞

= = −

∑ ( ) is the output of a bandpass filter with impulse

response ( )k

k

y nh n

Filter Bank SummationFilter Bank Summation

80

Filter Bank SummationFilter Bank Summation0 1 1

a practical method for reconstructing ( ) from the STFT is as follows

1. assume we know ( ) for a set of frequencies { }, , ,..., 2. assume we have a set of bandpass filters

ω ω

= −kjn k

x n

X e N k NN

0 1 1

with impulse responses

( ) ( ) , , ,..., 3. assume ( ) is an ideal lowpass filter with cutoff frequency

ω

ω= = −kj n

k k

k pk

h n w n e k Nw n

81

( )

- the frequency response of the bandpass filter is

( ) ( )kjjk kH e W e ω ωω −=

Filter Bank SummationFilter Bank Summation

2 0 1 1

consider a set of bandpass filters, uniformly spaced, so that the entire frequency band is covered

, , ,...,

also assume window the same for all channels, i.e.,( ) ( )

πω

= = −

•=

k

N

k k NN

w n w n 0 1 1= −k N

82

( ) ( ),=kw n w n

1 1

0 0

0 1 1

( )

, ,..., if we add together all the bandpass outputs, the composite response is

( ) ( ) ( )

if ( ) is properly sampled in frequency ( ), where is the

ω ωω ω

ω

− −−

= =

= −

= =

• ≥

∑ ∑ k

k

N Njj j

kk k

j

k N

H e H e W e

W e N L L

1

0

1 0( )

window duration, then it can be shown that

( ) ( )ω ω ω−

=

= ∀∑ k

Nj

k

W e wN FBS FormulaFBS Formula

Filter Bank SummationFilter Bank Summation

83

1/N

Filter Bank SummationFilter Bank Summation

/

derivation of FBS formula

( ) ( )

if ( ) is sampled in frequency at uniformly spaced points, the inverse discrete Fourier transform

ω

ω

←⎯⎯⎯→

FT IFT j

j

w n W e

W e N

84

p p ,

of the sampled version of ( ) is ωkjW e

1

0

1

(recall that sampling multiplication convolution aliasing)

( ) ( )

an aliased version of ( ) is obtained.

ω ω− ∞

= =−∞

⇒ ⇔ ⇒

= +

∑ ∑k k

Nj j n

k r

W e e w n rNN

w n

Page 15: ShortShort--Time Fourier AnalysisTime Fourier Analysis … speech processing... · xn X e e d w wxn Xe Xe 21 for all values of over one complete pe

15

Filter Bank SummationFilter Bank Summation

0 0

0

If ( ) is of duration samples, then ( ) , , and no aliasing occurs due to sampling in frequency

of ( ). In this case if we evaluate the aliased formula for we get

ω

•= < ≥

•j

w n Lw n n n L

W en

85

0

1

formula for , we get

=n1

0

0( ) ( )

the FBS formula is seen to be equivalent to the formula above, since (according to the sampling theorem) any

set of uniformly spaced samples of ( ) is adequate

ω

ω

=

=

∑ k

Nj

k

j

W e wN

N W e

Filter Bank SummationFilter Bank Summation

1 1

0 0

0

0

the impulse response of the composite filter bank system is

( ) ( ) ( ) ( ) ( )

thus the composite output is

( ) ( ) ( ) ( ) ( )

ω δ− −

= =

= = =

∑ ∑ k

N Nj n

kk k

h n h n w n e N w n

h N

86

0 ( ) ( ) ( ) ( ) ( ) thus for FBS method, the

= ∗ =•

y n x n h n Nw x n

1 1

0 0

0

reconstructed signal is

( ) ( ) ( ) ( ) ( )

if ( ) is sampled properly in frequency, and is independent of theshape of ( )

ω ω

ω

− −

= =

= = =

∑ ∑ k k

k

N Nj j n

k nk k

jn

y n y n X e e N w x n

X ew n

Filter Bank SummationFilter Bank Summation

87

1/N

2 21 1

0 0

2 21

0

21 ( )

( ) ( ) ( )

( ) ( )

( ) ( )

N N j k j knN N

k nk k

N j km j knN N

k m

N j k n mN

y n y n X e e

x m w n m e e

x m w n m e

π π

π π

π

− −

= =

− −

=

− −

= =

⎡ ⎤= −⎢ ⎥

⎣ ⎦

= −

∑ ∑

∑ ∑

∑ ∑

Filter Bank SummationFilter Bank Summation

88

0

0 0 1

( ) ( ) ( )

( ) ( ) ( )

( ) for if then n

m k

m r

r

x m w n m N n m rN

y n N w rN x n rN

w n n L N L

δ

=

=−∞

=−∞

= − − −

= −

• ≠ ≤ ≤ − ⇒ ≥

∑ ∑

∑ ∑

∑0

00 1 2

eed only term

if then in order for ( ) ( ) you need the condition ( ) , , ,... 'undersampled' representation can still work--at least in theory

ry(n) Nw( )x(n)

N L y n x n w rN r

==

• < = = = ± ±•

Summary of FBS MethodSummary of FBS Method perfect reconstruction of from ( ) is possible using FBS

under the following conditions: 1. is a finite duration filter/window

2. ( ) is sampled properly in both ti

jn

jn

x(n) X e

w(n)

X e

ω

ω

me and frequency

perfect reconstruction of from ( ) is also possible using FBS under the following condition:

kjnx(n) X e ω•

89

FBS under the following condition:

( ) is perfectly bandlimited

To avoid time aliasing, ( ) mustk

j

jn

W e

X e

ω

ω•

2 4

be evaluated at at least uniformly spaced frequencies, where is the window duration -since window of length samples has frequency bandwidth of from

/ (for RW) to / (for HW), the

LL

LL Lπ π

2 0 1 1 bandpass filters in FBS overlap

in frequency since the analysis frequencies are / , , ,...,

there is a way (at least theoretically) for ( ) to be evaluated in non-overlapping bands

kjn

k L k L

X e ω

π = −

and for which can still be exactly recoveredx(n)

FBS Reconstruction in NonFBS Reconstruction in Non--Overlapping BandsOverlapping Bands

2 0 1 1

assume window length for all bands is samples assume the same window is used for equally spaced

frequency bands with analysis frequencies

, , ,...,πω

••

= = −k

LN

k k NN

90

where can be less than • N L assume is an ideal lowpass filter with cutoff frequency

πω

=p

w(n)

N

example with N=6 equally spaced ideal

filters

Page 16: ShortShort--Time Fourier AnalysisTime Fourier Analysis … speech processing... · xn X e e d w wxn Xe Xe 21 for all values of over one complete pe

16

FBS Reconstruction in NonFBS Reconstruction in Non--Overlapping BandsOverlapping Bands

1 1

0 0

1 12

0 0

/

the composite impulse response for the FBS system is

( ) ( ) ( )

defining a composite of the terms being summed as

( )

ω ω

ω π

− −

= =

− −

= =

= =

∑ ∑

∑ ∑

k k

k

N Nj n j n

k k

N Nj n j kn N

k k

h n w n e w n e

p n e e

91

0 0

we get fo= =

•k k

r ( )

( ) ( ) ( ) it is easy to show (Prob. 6.7) that is a periodic train of impulses of the form

( ) ( )

giving for ( ) the expression

( ) ( ) ( )

thus the

δ

δ

=−∞

=−∞

=•

= −

= −

r

r

h n

h n w n p np(n)

p n N n rN

h n

h n N w rN n rN

composite impulse response is the window sequence sampled at intervals of samplesN

FBS Reconstruction in NonFBS Reconstruction in Non--Overlapping BandsOverlapping Bands

1 for ideal LPF we have

sin( / ) sin( )( ) , ( ) ( )

giving ( ) ( )other cases where perfect reconstruction is obtained

= = =

=•

n N rw n w rN rn rN N

h n n

π π δπ π

δ

impulse response of ideal lowpass filter with cutoff frequency π/N

92

other cases where perfect reconstruction is obtained1. ( ) is of finite length and causal (no images)2. ( ) has

•<w n L N

w n

0

0

0

0

0

10 0 1 2

length and has the property ( ) / , for

for ( , , , ,...)

giving ( ) ( ) ( ) ( )

( ) ( ) ( )−

>= =

= = ≠ = ± ±

= = −

= ⇒ = −j r Nj

Nw n N n r N

n rN r r r

h n p n w n n r N

H e e y n x n r Nωω

δ

Summary of FBS ReconstructionSummary of FBS Reconstruction for perfect reconstruction using FBS methods

1. does not need to be either time-limited or frequency-limited to exactly reconstruct

from ( )ω

kjn

w(n)

x(n) X e

93

2. just needs eqw(n) ually spaced zeros, spaced samples apart for theoretically perfect reconstruction

exact reconstruction of the input is possible with a number of frequency channels less than that required by the

•N

sampling theorem key issue is how to design digital filters that match these

criteria•

Practical Implementation of FBSPractical Implementation of FBS

94

FBS and OLA ComparisonsFBS and OLA Comparisons

95

FBS and OLA ComparisonsFBS and OLA Comparisonsduals filter bank summation method overlap addition method

-- one depends on sampling relation in frequency -- one depends on sampling relation in time FBS requires sampling in frequency

• ←⎯⎯⎯→

11 0( )

be such that the window

transform ( ) obeys the relation

( ) ( ) any

ω

ω ω ω−

− =∑ k

j

Nj

W e

W e w

96

0

0

0 ( ) ( ) any

OLA requires that sampling in time be such that the window obeys the relation

( ) ( ) / any

ω=

=

− =

∑ k

k

j

W e wN

w rR n W e R

the key to Short-Time Fourier Analysis is the ability to modify the short-time spectrum (via quantization, noise enhancement, signal enhancement, speed-up/slow-down,etc) and recover an

=−∞

∑r

n

"unaliased" modified signal

Page 17: ShortShort--Time Fourier AnalysisTime Fourier Analysis … speech processing... · xn X e e d w wxn Xe Xe 21 for all values of over one complete pe

17

Overlap Addition (OLA) MethodOverlap Addition (OLA) Method

1

0

0 1

1

assume ( ) sampled with period samples in time

( ) ( ), integer, the Overlap Add Method is based on the summation

( ) ( ) OLA Metho

ω

ω ω

ω ω∞ −

=−∞ =

= ≤ ≤ −

⎡ ⎤= ⎢ ⎥

⎢ ⎥⎣ ⎦∑ ∑

k

k k

k k

jn

j jr rR

Nj j n

rr k

X e R

Y e X e r k N

y n Y e eN

d

for each value of , compute the inverse transform of ( ) giving the sequences

ω• kjrr Y e

97

g g q ( ) ( ) ( ), the signal at time is obtained by summing the values at time

of all the seque

= − − ∞ < < ∞

•ry m x m w rR m m

n nnces, ( ) that overlap at time , giving

( ) ( ) ( ) ( )

if has a bandlimited FT and if ( ) is properly sampled in time (i.e., small enough to avoid aliasing) the

ω

∞ ∞

=−∞ =−∞

= = −

∑ ∑k

r

rr r

jn

y m n

y n y n x n w rR n

w(n) X eR

0

0

n

( ) ( ) / -- independent of , and

( ) ( ) ( ) / -- exact reconstruction of

=−∞

− ≈

=

∑ j

rj

w rR n W e R n

y n x n W e R x(n)