EEL 6586: AUTOMATIC SPEECH PROCESSING
Windows Lecture
Mark D. Skowronski
Computational Neuro-Engineering Lab
University of Florida
February 10, 2003
No, not MS Windows®…
…not those either!
Speech windows
Speech is NONSTATIONARY
Assume speech is stationary over a ‘short’ window of time.
[Figure: waveform of the utterance ‘SEVEN’]
Speech windows
What is a ‘short’ window of time?
• 10 μs: smallest time difference detectable by the auditory system (localization)
• 3 ms: shortest phoneme (plosive burst)
• 10 ms: glottal pulse period
• 100 ms: average phoneme duration
• 4 s: exhale period during speech
What counts as ‘short’ depends on the application.
Applications using windows
• Automatic speech recognition
• Speech coding/decoding
• Speaker identification
• Text-to-speech synthesis
• Noise reduction
Typical window (frame) length: 20-30 ms
Typical frame rate: 100 frames/s (one frame every 10 ms)
Short-time analysis
$x(n) = w(n)\,s(n)$
s(n): entire speech utterance
w(n): window function
x(n): frame of speech
The window function is non-zero for N samples, n = 0, …, N-1.
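A minimal NumPy sketch of short-time analysis, assuming a 25 ms frame, a 10 ms hop (100 frames/s) and a Hamming window; the function name `frame_and_window` and the noise test signal are illustrative assumptions, not part of the lecture. Row i of the result is x(n) = w(n) s(n + i·hop), i.e. the window slid to the start of frame i.

```python
import numpy as np

def frame_and_window(s, fs, frame_ms=25.0, hop_ms=10.0, window=np.hamming):
    """Split s into overlapping frames and taper each one with w(n).

    Row i holds x_i(n) = w(n) * s(i*hop + n) for n = 0..N-1.
    Trailing samples that do not fill a whole frame are dropped.
    """
    N = int(round(fs * frame_ms / 1000.0))   # samples per frame
    hop = int(round(fs * hop_ms / 1000.0))   # samples between frame starts
    if len(s) < N:
        return np.empty((0, N))
    n_frames = 1 + (len(s) - N) // hop
    # Index matrix: row i = [i*hop, i*hop + 1, ..., i*hop + N - 1]
    idx = hop * np.arange(n_frames)[:, None] + np.arange(N)[None, :]
    w = window(N)
    return s[idx] * w   # broadcasting multiplies every frame by w(n)

fs = 16000                                          # assumed sampling rate (Hz)
s = np.random.default_rng(0).standard_normal(fs)    # 1 s of noise as a stand-in
frames = frame_and_window(s, fs)
print(frames.shape)   # (98, 400)
```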
Short-Time Fourier Transform
$X(n,\omega) = \sum_{m} s(m)\,w(n-m)\,e^{-j\omega m}$
s(m): entire speech utterance
w(m): window function
X(n,ω): STFT of speech at time n
The STFT is a smoothed version of the original spectrum: the speech spectrum convolved with the spectrum of the window.
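The STFT definition can be evaluated term by term for a single ω, or for K equally spaced frequencies ω_k = 2πk/K with one FFT after reindexing the sum. The NumPy sketch below (function names are hypothetical) assumes N-1 ≤ n < len(s), so the whole window lies inside s, and checks that both routes agree.

```python
import numpy as np

def stft_direct(s, w, n, omega):
    """X(n, omega) = sum_m s(m) w(n - m) exp(-j omega m), summed over the
    N values of m where w(n - m) is non-zero (0 <= n - m <= N - 1)."""
    N = len(w)
    if n < N - 1 or n >= len(s):
        raise ValueError("window must lie inside s")
    m = np.arange(n - N + 1, n + 1)
    return np.sum(s[m] * w[n - m] * np.exp(-1j * omega * m))

def stft_fft(s, w, n, K):
    """Same quantity at omega_k = 2*pi*k/K (K >= N), via a single FFT."""
    N = len(w)
    # seg[r] = s(m) w(n - m) with m = n - N + 1 + r, so w is read backwards
    seg = s[n - N + 1 : n + 1] * w[::-1]
    k = np.arange(K)
    # The FFT indexes from r = 0; this phase restores the absolute time m
    phase = np.exp(-1j * 2 * np.pi * k * (n - N + 1) / K)
    return phase * np.fft.fft(seg, K)

rng = np.random.default_rng(1)
s = rng.standard_normal(2000)     # stand-in for a speech utterance
w = np.hamming(256)
n, K = 1000, 512
X = stft_fft(s, w, n, K)
print(np.allclose(X[37], stft_direct(s, w, n, 2 * np.pi * 37 / K)))  # True
```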
STFT example
$x(n) = w(n)\,s(n) \;\Longleftrightarrow\; X(\omega) = W(\omega) * S(\omega)$
s(n): pure sinewave of infinite length
w(n): rectangular window:
$w(n) = \begin{cases} 1, & n = 0, \dots, N-1 \\ 0, & \text{otherwise} \end{cases}$
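For the rectangular window the DTFT has a closed form, |W(ω)| = |sin(ωN/2) / sin(ω/2)|, which peaks at N for ω = 0 and first reaches zero at ω = 2π/N. A quick NumPy check, with N and the FFT size chosen arbitrarily, samples W(ω) on a dense grid by zero-padding the FFT:

```python
import numpy as np

N = 32            # window length (arbitrary choice)
K = 4096          # FFT size; zero-padding samples the DTFT at omega_k = 2*pi*k/K
w = np.ones(N)    # rectangular window: 1 for n = 0..N-1

W_fft = np.abs(np.fft.fft(w, K))
omega = 2 * np.pi * np.arange(K) / K

# Closed form; at omega = 0 the ratio is 0/0, whose limit is N
with np.errstate(invalid="ignore", divide="ignore"):
    W_closed = np.abs(np.sin(omega * N / 2) / np.sin(omega / 2))
W_closed[0] = N

print(np.allclose(W_fft, W_closed))   # True
first_null = K // N                    # omega = 2*pi/N  ->  k = K/N
print(W_fft[first_null] < 1e-9)        # True
```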
STFT example
[Figure: |W(ω)| * |S(ω)| = |X(ω)|, with S(ω) and X(ω) marked at ω0]
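The picture can be checked numerically: windowing a single spectral line reproduces the window's spectrum, centered at ω0. This sketch assumes a complex exponential s(n) = e^{jω0 n} (one line at ω0; a real sinewave adds a second copy at -ω0), a rectangular window, and arbitrary sizes.

```python
import numpy as np

N, K = 64, 1024                     # window length, FFT size (arbitrary)
omega0 = 2 * np.pi * 100.5 / K      # placed between FFT bins on purpose
n = np.arange(N)
x = np.ones(N) * np.exp(1j * omega0 * n)   # x(n) = w(n) s(n), rectangular w

X = np.abs(np.fft.fft(x, K))        # |X(omega_k)| at omega_k = 2*pi*k/K
omega = 2 * np.pi * np.arange(K) / K

# |W(omega - omega0)| for the rectangular window (Dirichlet kernel)
d = omega - omega0
W_shifted = np.abs(np.sin(N * d / 2) / np.sin(d / 2))

print(np.allclose(X, W_shifted))    # True: |X| is |W| shifted to omega0
print(np.argmax(X) in (100, 101))   # True: the peak sits next to omega0
```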
Window types
• Rectangular
• Hann (cosine)
• Hamming (raised cosine)
• Blackman
• Kaiser-Bessel
Choosing a window is a tradeoff between leakage and blurring.
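Apart from Kaiser-Bessel, which is built from a Bessel function, the tapered windows above are short sums of cosines. A sketch of the common symmetric forms for length N, checked against NumPy's `np.hanning`, `np.hamming` and `np.blackman`. The helper name `cosine_window` is hypothetical; NumPy's `np.kaiser(N, beta)` covers the Kaiser window.

```python
import numpy as np

def cosine_window(N, coeffs):
    """w(n) = sum_k (-1)^k a_k cos(2*pi*k*n/(N-1)), n = 0..N-1 (symmetric)."""
    n = np.arange(N)
    return sum((-1) ** k * a * np.cos(2 * np.pi * k * n / (N - 1))
               for k, a in enumerate(coeffs))

# Coefficients a_0, a_1, a_2, ... of each window
WINDOWS = {
    "rectangle": (1.0,),
    "hann": (0.5, 0.5),
    "hamming": (0.54, 0.46),
    "blackman": (0.42, 0.5, 0.08),
}

N = 51
print(np.allclose(cosine_window(N, WINDOWS["hann"]), np.hanning(N)))       # True
print(np.allclose(cosine_window(N, WINDOWS["hamming"]), np.hamming(N)))    # True
print(np.allclose(cosine_window(N, WINDOWS["blackman"]), np.blackman(N)))  # True
```

Adding a cosine term (Hann to Blackman) lowers the sidelobes but widens the main lobe, which is the leakage/blurring tradeoff in code form.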
Window tradeoff
• Blurring: main-lobe width (A)
• Leakage: sidelobe suppression (B)
[Figure: window magnitude spectrum with main-lobe width A and sidelobe level B marked]
Popular windows

Window          Main-lobe half-width (units of 2π/N)   Peak sidelobe
Rectangle       1                                      -13 dB
Hann            2                                      -31 dB
Hamming         2                                      -43 dB
Blackman        3                                      -68 dB
Kaiser-Bessel   4                                      -91 dB
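Both columns can be estimated from a zero-padded FFT of each window. This NumPy sketch (`lobe_stats` is a hypothetical helper; N = 64 is an arbitrary choice) takes the first local minimum of |W| as the main-lobe edge and the largest value beyond it as the peak sidelobe. It covers only the rectangle, Hann and Hamming rows, since the Blackman and Kaiser-Bessel figures depend on the exact variant and parameter used.

```python
import numpy as np

def lobe_stats(w, pad=64):
    """Return (main-lobe half-width in bins of 2*pi/N, peak sidelobe in dB)
    for window w, from a zero-padded FFT (pad = oversampling factor)."""
    N = len(w)
    W = np.abs(np.fft.rfft(w, pad * N))
    W_db = 20 * np.log10(np.maximum(W / W[0], 1e-300))   # 0 dB at DC
    # Walk down the main lobe until the magnitude starts rising again:
    # that first local minimum is taken as the first null.
    k = 1
    while W[k + 1] < W[k]:
        k += 1
    half_width_bins = k / pad
    peak_sidelobe_db = W_db[k:].max()
    return half_width_bins, peak_sidelobe_db

N = 64
stats = {}
for name, w in [("rectangle", np.ones(N)),
                ("hann", np.hanning(N)),
                ("hamming", np.hamming(N))]:
    stats[name] = lobe_stats(w)
    bw, sl = stats[name]
    print(f"{name:10s} {bw:5.2f} bins {sl:7.1f} dB")
```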
Practical issues
• Rule of thumb:
  – Time domain: use a rectangular window
  – Frequency domain: use a Hamming window
• Why?
Time domain issues
• Tapered windows interfere with time-domain correlation: they attenuate samples near the frame edges, which lowers the autocorrelation at longer lags.
20 ms of /eh/ from a male speaker; pitch measurement via normalized autocorrelation.
The first side peak (at the pitch period) is lower with the Hamming window.
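A synthetic sketch of the effect, assuming fs = 8 kHz, a 20 ms frame and a steady 100 Hz "voiced" signal made of five equal harmonics (a stand-in for the /eh/ recording, not the lecture's data). It compares the highest normalized autocorrelation value over candidate pitch lags of 2.5 ms and up, for a rectangular and a Hamming window.

```python
import numpy as np

def normalized_autocorr(x):
    """r(k) / r(0) for lags k = 0..len(x)-1 (biased: no 1/(N-k) correction)."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    return r / r[0]

fs = 8000                  # assumed sampling rate (Hz)
f0 = 100                   # assumed pitch (Hz): period of 80 samples
N = fs * 20 // 1000        # 20 ms frame = 160 samples
n = np.arange(N)
# Synthetic voiced frame: five equal-amplitude harmonics of f0
s = sum(np.cos(2 * np.pi * h * f0 * n / fs) for h in range(1, 6))

lags = np.arange(fs // 400, N)   # candidate periods from 2.5 ms (400 Hz) up
peaks = {}
for name, w in [("rectangle", np.ones(N)), ("hamming", np.hamming(N))]:
    r = normalized_autocorr(w * s)
    k = lags[np.argmax(r[lags])]     # highest side peak in the search range
    peaks[name] = r[k]
    print(f"{name:10s} side peak at lag {k} ({1000 * k / fs:.1f} ms), "
          f"height {r[k]:.2f}")
```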
Frequency domain issues
fs = 12.5 kHz, /eh/, 800 samples, male speaker.
Evidence of the blurring/leakage tradeoff:
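A synthetic stand-in for this comparison uses the slide's fs = 12.5 kHz and 800-sample frame, but a pure tone placed halfway between FFT bins instead of /eh/ (an assumption for illustration). It measures blurring as the -6 dB main-lobe width and leakage as the largest spectral level at least 10 bins away from the tone, for the rectangular and Hamming windows.

```python
import numpy as np

fs, N = 12500, 800          # sampling rate (Hz) and frame length from the slide
pad = 8                     # zero-padding factor: 8 spectrum samples per bin
tone_bin = 40.5             # assumed test tone, halfway between two FFT bins
f_tone = tone_bin * fs / N  # about 633 Hz
n = np.arange(N)
s = np.cos(2 * np.pi * f_tone * n / fs)

def tradeoff(w):
    """Return (-6 dB main-lobe width in bins, worst leakage >= 10 bins away, dB)."""
    X = np.abs(np.fft.rfft(w * s, pad * N))
    X_db = 20 * np.log10(np.maximum(X / X.max(), 1e-300))   # 0 dB at the peak
    bins = np.arange(len(X)) / pad               # frequency in units of fs/N
    width = np.count_nonzero(X_db >= -6.0) / pad # blurring
    far = np.abs(bins - tone_bin) >= 10
    return width, X_db[far].max()                # leakage

results = {name: tradeoff(w) for name, w in
           [("rectangle", np.ones(N)), ("hamming", np.hamming(N))]}
for name, (width, leak) in results.items():
    print(f"{name:10s} -6 dB width {width:.3f} bins, leakage {leak:6.1f} dB")
```

Expect the Hamming window to trade a wider main lobe for much lower leakage, in line with the frequency-domain rule of thumb.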