EEL 6586: AUTOMATIC SPEECH PROCESSING Windows Lecture Mark D. Skowronski Computational Neuro-Engineering Lab University of Florida February 10, 2003


EEL 6586: AUTOMATIC SPEECH PROCESSING

Windows Lecture

Mark D. Skowronski Computational Neuro-Engineering Lab

University of Florida

February 10, 2003


No, not MS Windows®…


…not those either!


Speech windows

Speech is NONSTATIONARY


Speech windows

Assume speech is stationary over a ‘short’ window of time.

[Figure: the utterance ‘SEVEN’]


What is a ‘short’ window of time?

• 10 μs: smallest difference detectable by auditory system (localization),
• 3 ms: shortest phoneme (plosive burst),
• 10 ms: glottal pulse period,
• 100 ms: average phoneme duration,
• 4 s: exhale period during speech.

‘Short’ depends on application.


Applications using windows

• Automatic speech recognition,
• Speech coding/decoding,
• Speaker identification,
• Text-to-speech synthesis,
• Noise reduction

Typical window (frame) length: 20-30 ms

Typical frame rate: 100 frames/sec
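
The typical numbers above translate into sample counts as in the following sketch. The sampling rate of 8 kHz, the 25 ms frame length, the noise test signal, and the helper name `frame_signal` are all assumptions made for illustration; they do not come from the lecture.

```python
import numpy as np

def frame_signal(s, fs, frame_ms=25.0, rate_hz=100.0):
    """Slice s into overlapping frames, one frame per row."""
    frame_len = int(round(fs * frame_ms / 1000.0))  # samples per frame
    hop = int(round(fs / rate_hz))                  # samples between frame starts
    s = np.asarray(s, dtype=float)
    if len(s) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(s) - frame_len) // hop
    # Row k holds samples k*hop ... k*hop + frame_len - 1
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return s[idx]

fs = 8000                    # assumed sampling rate (Hz)
s = np.random.randn(fs)      # one second of noise standing in for speech
frames = frame_signal(s, fs)
print(frames.shape)          # (number of frames, samples per frame)
```

With a 25 ms frame and a 10 ms hop, consecutive frames overlap by 15 ms, which is why the frame rate can be 100 frames/sec even though each frame is longer than 10 ms.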


Short-time analysis

$$x(n) = w(n)\,s(n)$$

s(n): entire speech utterance

w(n): window function

x(n): frame of speech

Window function is non-zero for N samples, n=0,…,N-1
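
A minimal sketch of this product, assuming a Hamming window of N = 200 samples and a sinusoid standing in for the utterance s(n) (both choices are illustrative, not taken from the lecture):

```python
import numpy as np

# Stand-in for the entire utterance s(n); a real recording would go here.
s = np.sin(2 * np.pi * 0.05 * np.arange(1000))

N = 200                      # assumed window length in samples
w = np.zeros(len(s))         # w(n) is zero outside n = 0..N-1
w[:N] = np.hamming(N)        # non-zero for exactly N samples

x = w * s                    # x(n) = w(n) s(n): one frame of speech
print(np.count_nonzero(w), x[:N].shape)
```

In practice only the N non-zero samples are stored, and later frames are obtained by shifting the window along the utterance.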


Short-term Fourier Transform

$$X(n,\omega) = \sum_{m} s(m)\,w(n-m)\,e^{-j\omega m}$$

s(m): entire speech utterance

w(m): window function

X(n,ω): STFT of speech at time n

STFT is a smoothed version of original spectrum.
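
One common way to compute a sampled STFT is to window successive frames and take an FFT of each. The sketch below assumes a symmetric Hamming window, frames that start every `hop` samples, and a signal at least N samples long. The function name `stft` and all parameter values are illustrative. Each FFT uses the start of its frame as the time origin, so the magnitudes match the definition while the phases differ by a linear term.

```python
import numpy as np

def stft(s, N=200, hop=80, n_fft=256, window=np.hamming):
    """Return one spectrum per row: frames windowed, zero-padded, then FFT'd."""
    s = np.asarray(s, dtype=float)
    w = window(N)
    starts = range(0, len(s) - N + 1, hop)          # frame start indices
    frames = np.stack([w * s[i:i + N] for i in starts])
    return np.fft.rfft(frames, n=n_fft, axis=1)     # frequencies 0..pi

fs = 8000                                  # assumed sampling rate (Hz)
t = np.arange(fs) / fs
tone = np.cos(2 * np.pi * 1000.0 * t)      # 1 kHz test tone
X = stft(tone)
peak_bin = int(np.argmax(np.abs(X[0])))
print(X.shape, peak_bin * fs / 256)        # peak frequency in Hz
```

The spectrum of each frame shows the tone as a lobe of finite width rather than a single line, which is the smoothing referred to above.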


STFT example

$$x(n) = w(n)\,s(n) \;\Longleftrightarrow\; X(\omega) = W(\omega) * S(\omega)$$

s(n): pure sinewave of infinite length

w(n): rectangular window:

$$w(n) = \begin{cases} 1, & n = 0,\ldots,N-1 \\ 0, & \text{otherwise} \end{cases}$$
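
For this example the convolution can be worked out in closed form. The derivation below is a sketch that assumes the sinewave is s(n) = cos(ω0 n) and that the frequency-domain convolution carries the usual 1/(2π) factor; neither assumption is stated on the slide.

```latex
\begin{aligned}
W(\omega) &= \sum_{n=0}^{N-1} e^{-j\omega n}
           = e^{-j\omega (N-1)/2}\,\frac{\sin(\omega N/2)}{\sin(\omega/2)}, \\
S(\omega) &= \pi\left[\delta(\omega-\omega_0) + \delta(\omega+\omega_0)\right],
           \qquad |\omega| \le \pi, \\
X(\omega) &= \frac{1}{2\pi}\int_{-\pi}^{\pi} W(\theta)\,S(\omega-\theta)\,d\theta
           = \tfrac{1}{2}\left[W(\omega-\omega_0) + W(\omega+\omega_0)\right].
\end{aligned}
```

Each spectral line of the sinewave is replaced by a copy of the window spectrum, with a main lobe of height N/2 and first nulls at ω0 ± 2π/N.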


STFT example

[Figure: |W(ω)| * |S(ω)| = |X(ω)|, with the frequency axis marked at ω0]
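
The same picture can be reproduced numerically. This sketch truncates a cosine to N samples (a rectangular window) and inspects a zero-padded FFT. The values N = 64, ω0 = π/4 and the FFT size are assumptions chosen so that ω0 and the first null fall exactly on FFT bins.

```python
import numpy as np

N = 64                         # assumed window length
n_fft = 4096                   # zero-padded FFT size (fine frequency grid)
omega0 = 2 * np.pi * 0.125     # assumed test frequency (rad/sample)

n = np.arange(N)
x = np.cos(omega0 * n)         # rectangular window = plain truncation
X = np.abs(np.fft.rfft(x, n_fft))

k0 = int(round(omega0 / (2 * np.pi) * n_fft))                       # bin at omega0
k_null = int(round((omega0 + 2 * np.pi / N) / (2 * np.pi) * n_fft)) # first null
print(X[k0], X[k_null])        # main-lobe height and value at the first null
```

The magnitude at ω0 equals N/2 and the spectrum drops to (numerically) zero one bin width 2π/N away, so the single spectral line of the sinewave has been spread into a lobe of width 4π/N.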


Window types

• Rectangular
• Hann (cosine)
• Hamming (raised cosine)
• Blackman
• Kaiser-Bessel

Tradeoff between leakage and blurring


Window tradeoff

• Blurring: main lobe width A
• Leakage: side lobe suppression B

[Figure: window spectrum with main lobe width A and side lobe suppression B marked]


Popular windows

Window          Unit BW   Sidelobe
Rectangle       1         -13 dB
Hann            2         -31 dB
Hamming         2         -43 dB
Blackman        3         -68 dB
Kaiser-Bessel   4         -91 dB
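
Both columns can be estimated numerically from a zero-padded FFT of each window. The sketch below uses NumPy's built-in window definitions, so some values may differ from the table, since published figures depend on the exact coefficients used. The Kaiser parameter beta = 12, the length N = 256, and the helper name `window_metrics` are assumptions for illustration.

```python
import numpy as np

def window_metrics(w, pad=64):
    """Return (main-lobe half width in bins of 2*pi/N, peak side lobe in dB)."""
    N = len(w)
    mag = np.abs(np.fft.rfft(w, pad * N))
    # The first local minimum of |W| marks the edge of the main lobe.
    k0 = int(np.argmax(np.diff(mag) > 0))
    half_width_bins = k0 / pad
    sidelobe_db = 20 * np.log10(mag[k0:].max() / mag[0])
    return half_width_bins, sidelobe_db

N = 256
windows = {
    "Rectangle": np.ones(N),
    "Hann": np.hanning(N),
    "Hamming": np.hamming(N),
    "Blackman": np.blackman(N),
    "Kaiser (beta=12, assumed)": np.kaiser(N, 12.0),
}
for name, w in windows.items():
    bw, sl = window_metrics(w)
    print(f"{name:28s} half-width {bw:5.2f} bins   side lobe {sl:7.1f} dB")
```

The half width is reported in units of 2π/N, which corresponds to the "Unit BW" column if that column is read as main-lobe width relative to the rectangular window.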


Practical issues

• Rule of thumb:
  – Time domain, use Rectangle window
  – Freq domain, use Hamming window

• Why?


Time domain issues

• Tapered windows interfere with correlation in the time domain

20 ms /eh/, male utterance, pitch measurement (normalized autocorrelation).

First side peak lower using Hamming window
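
The effect can be reproduced with a synthetic signal. This sketch replaces the /eh/ recording with a sum of five harmonics of 100 Hz sampled at 8 kHz. The signal, sampling rate, and pitch are all assumptions for illustration, not the lecture's data.

```python
import numpy as np

def norm_autocorr(x):
    """Autocorrelation for lags 0..len(x)-1, normalized so r[0] = 1."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    return r / r[0]

fs = 8000                      # assumed sampling rate (Hz)
f0 = 100.0                     # assumed pitch (Hz)
N = int(0.020 * fs)            # 20 ms frame
n = np.arange(N)
# Synthetic voiced frame: harmonics of f0 with 1/k amplitudes
s = sum(np.cos(2 * np.pi * k * f0 * n / fs) / k for k in range(1, 6))

lag = int(round(fs / f0))      # pitch period in samples
r_rect = norm_autocorr(s)                   # rectangular window
r_hamm = norm_autocorr(np.hamming(N) * s)   # Hamming window
print(r_rect[lag], r_hamm[lag])
```

The peak at the pitch lag is lower with the Hamming window because the taper attenuates the frame edges, so the frame and its shifted copy overlap mostly where one of them is small. A weaker pitch peak is harder to pick out reliably, which supports the rule of thumb for time-domain analysis.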


Frequency domain issues

fs = 12.5 kHz, /eh/, 800 samples, male speaker.

Blurring/Leakage tradeoff evidence: