Communications & Multimedia Signal Processing Speech Recognition in Noise Esfandiar Zavarehei Department of Electronic and Computer Engineering Brunel University 25 May, 2004

Speech Recognition in Noise

Esfandiar Zavarehei

Department of Electronic and Computer Engineering

Brunel University

25 May, 2004

Contents

• The use of formant features in speech recognition
  - Variable-Order LP Formant Tracker with Kalman Filtering
  - Results

• Kalman De-noising
  - Tracking and Filtering the Frequency Trajectories (RASTA)
  - How Kalman Filter is applied to de-noising problem
  - Advantages of Kalman

Variable-Order LP Formant tracker

[Block diagram. Labels: Pre-Processed Speech, LP Model Pole Extraction, Rule-Based Refinement, LP Order Adjustment, Continuity Measurement, Track History, Kalman Filter, Formant Track]

• Higher-order LP modelling for higher resolution

• Continuity criteria for better classification

• Kalman filtering for smoother tracks
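The pole-extraction step can be illustrated with a minimal sketch. It is not the tracker from the slide: the LP order is fixed, and the rule-based refinement, continuity measurement and Kalman smoothing are left out. The function names, the Hamming window and the `max_bw` bandwidth threshold are assumptions made for illustration.

```python
import numpy as np

def lpc_autocorr(frame, order):
    """LP coefficients [1, a1, ..., ap] by the autocorrelation method."""
    w = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    # Autocorrelation lags 0..order of the windowed frame
    r = np.correlate(w, w, mode="full")[len(w) - 1 : len(w) + order]
    # Toeplitz normal equations R a = -r[1..p]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1 : order + 1])
    return np.concatenate(([1.0], a))

def poles_to_formants(a, fs, max_bw=600.0):
    """Formant candidates (frequency, bandwidth) in Hz from the LP poles."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]           # one pole per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)  # pole angle -> frequency
    bws = -np.log(np.abs(roots)) * fs / np.pi   # pole radius -> bandwidth
    keep = bws < max_bw                         # assumed threshold: drop broad poles
    order = np.argsort(freqs[keep])
    return freqs[keep][order], bws[keep][order]
```

Raising `order` gives more poles per frame and therefore finer spectral resolution, at the cost of more spurious candidates that a later refinement stage has to reject.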

Formant Feature (FF) Vectors

• In addition to the pole frequencies, their bandwidths and magnitudes are used as features

• The HMMs are trained on mono-phones

FF vs. MFCC with and without energy component

Mono-phone recognition in Train noise

• FF perform better than MFCC in severely noisy conditions

Robustness of dynamic FF to noise

Mono-phone recognition in Train noise

• Dynamic features are much more robust to noise
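The slides do not say how the dynamic features were computed. A common choice, assumed here, is the regression-based delta over a window of ±N frames. The function name `delta` and the default `N=2` are illustrative only.

```python
import numpy as np

def delta(features, N=2):
    """Regression-based delta features for a (frames, dims) array.

    d[t] = sum_{n=1..N} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..N} n^2),
    with the first and last frames repeated at the edges.
    """
    features = np.asarray(features, dtype=float)
    T = len(features)
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = np.zeros_like(features)
    for n in range(1, N + 1):
        out += n * (padded[N + n : N + n + T] - padded[N - n : N - n + T])
    return out / denom
```

Because each delta is a difference of neighbouring frames, any offset that stays constant over the window cancels out. This is one plausible reason for the robustness reported on this slide.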

The use of the Formants for consonant recognition

Mono-phone recognition in Train noise

• Higher recognition rates than vowels at high SNR

• More sensitive to noise because of their lower energy level

De-noising the speech by filtering frequency trajectories

RelAtive SpecTrA (RASTA) Processing

• Filtering the frequency trajectories of the cubic root of power spectrum using a fixed IIR filter

$$H(z) = 0.1 \times \frac{2 + z^{-1} - z^{-3} - 2z^{-4}}{z^{-4}\,(1 - 0.98\,z^{-1})}$$
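A minimal sketch of this filtering step, assuming the filter on this slide is the classic RASTA band-pass filter with numerator 0.1·(2 + z⁻¹ − z⁻³ − 2z⁻⁴) and a single pole at 0.98. The sketch is causal, so it drops the four-frame advance and the output is delayed by four frames. The function names and the `(frames, bands)` array layout are assumptions.

```python
import numpy as np

def rasta_filter(traj):
    """Causal RASTA band-pass filter along time for one band.

    y[n] = 0.98*y[n-1] + 0.1*(2*x[n] + x[n-1] - x[n-3] - 2*x[n-4])
    """
    b = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])
    x = np.asarray(traj, dtype=float)
    y = np.zeros_like(x)
    for n in range(len(x)):
        acc = 0.98 * y[n - 1] if n > 0 else 0.0
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * x[n - k]
        y[n] = acc
    return y

def rasta(power_spec):
    """Cubic-root compression, then RASTA filtering of every band trajectory."""
    c = np.cbrt(np.asarray(power_spec, dtype=float))
    return np.stack([rasta_filter(c[:, j]) for j in range(c.shape[1])], axis=1)
```

The numerator coefficients sum to zero, so a trajectory that is constant over time is driven towards zero. This is how the filter suppresses slowly varying components.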

The use of FIR filters in RASTA

• Filtering the frequency trajectories of the power spectrum using a bank of non-causal FIR filters

• Not adaptive

• Experimentally derived

Filters’ Impulse Response

Kalman Filtering

• The Kalman filter adaptively updates itself using the noise covariance

How Kalman Filter is applied to de-noising problem

[Block diagram. Labels: Segment Frequency Bin Trajectory, VAD, Noise Modelling, Prior Noise Model and Trajectory Statistics, Spectral Subtraction, Observation, Predictor, Predicted, Error Covariance, Noise Covariance, Mean, Kalman Gain, Estimator, Output, Kalman Filtering, Neighbour Trajectory, Noise Modelling and Updating]
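The predict, gain and estimate loop named in this diagram can be sketched for a single frequency-bin trajectory. This is a simplified illustration, not the system on the slide. It assumes a scalar first-order state model with hypothetical parameters `a` and `q`, VAD flags supplied by the caller, and an observation whose noise mean has already been removed (for example by spectral subtraction). It does not use neighbouring trajectories.

```python
import numpy as np

def kalman_denoise_trajectory(y, is_speech, a=0.95, q=0.05, r0=1.0, alpha=0.95):
    """Scalar Kalman filter along one frequency-bin trajectory.

    y         : noisy observation per frame (noise mean already removed)
    is_speech : boolean VAD flag per frame
    a, q      : assumed AR(1) coefficient and process-noise variance
    r0, alpha : initial noise variance and its smoothing factor
    """
    x_hat = np.zeros(len(y))
    x, p, r = 0.0, 1.0, r0
    for t in range(len(y)):
        # Predictor: propagate the state and its error covariance
        x_pred = a * x
        p_pred = a * a * p + q
        # Noise modelling: refresh the noise variance in noise-only frames
        if not is_speech[t]:
            r = alpha * r + (1.0 - alpha) * y[t] ** 2
        # Kalman gain, then estimator combining prediction and observation
        k = p_pred / (p_pred + r)
        x = x_pred + k * (y[t] - x_pred)
        p = (1.0 - k) * p_pred
        x_hat[t] = x
    return x_hat
```

When the estimated noise variance `r` grows, the gain shrinks and the output leans more on the prediction. When `r` is small, the output follows the observation more closely.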

Advantages of Kalman

• A more informed noise reduction

• Combining the prediction and the observation of the frequency trajectory

• Adaptively updating the noise model while filtering the trajectory (in comparison with RASTA)

• Could (and probably should) be combined with spectral subtraction for improved performance
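For reference, a minimal sketch of basic power spectral subtraction, the stage the last bullet suggests combining with the Kalman filter. The over-subtraction factor `alpha` and spectral floor `beta` are illustrative assumptions, not values taken from the slides.

```python
import numpy as np

def spectral_subtract(power, noise_power, alpha=1.0, beta=0.01):
    """Subtract a noise power estimate, flooring the result at beta * power."""
    power = np.asarray(power, dtype=float)
    cleaned = power - alpha * np.asarray(noise_power, dtype=float)
    # The floor keeps the estimate positive where noise exceeds the observation
    return np.maximum(cleaned, beta * power)
```

In a combined system, the output of this step could serve as the observation fed to the Kalman filter for each frequency trajectory.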