RCC-Mean Subtraction Robust Feature and Compare Various Feature based Methods for Robust Speech Recognition in presence of Telephone Noise

Amin Fazel
Sharif University of Technology
Hossein Sameti, Mohammad T. Manzuri
February 2005
Computer Engineering Department, Sharif University of Technology


Wednesday, February 18, 2005


Outline

• Introduction
• Feature based methods
  – MFCC, RCC, CMN, PLP, RASTA
• Mean Normalization Root Cepstral Coefficients
• Experimental Results
  – Experiment 1 – Sharif CSR and TFARSDAT Database
  – Experiment 2 – HTK CSR and AURORA 2 Database
• Summary


Effect of Noise on ASR

• Two phases in most ASR systems
  – Training
  – Operating (testing)
• Mismatch between the two phases reduces accuracy
• Mismatch occurs because of
  – Environment
    • Microphone, babble, distance, transmission channel
  – Speaker
    • Specific speaker: speed, …
    • Various speakers: gender, age, accent, …


Effect of Noise on ASR

• Noise
  – Additive noise
    • Babble, car, subway
    • Exhibition, office, …
  – Convolutional noise
    • Channel, telephone line
    • Microphone effect
    • Distance of speaker to microphone
  – Others
    • Lombard effect, reflections from buildings

[Diagram: noise divides into stationary and non-stationary]


Effect of Noise on ASR

• Simple model

[Diagram: Clean Speech → Convolutional noise → (+ Additive noise) → Corrupted Speech]

• Robust speech recognition is the study of building speech recognizers that handle mismatched conditions.
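The simple model on this slide can be sketched in a few lines: clean speech is first convolved with a channel response and then has noise added to it. The channel taps, noise level and the random stand-in for speech below are hypothetical values chosen only for illustration; they are not taken from the slides.

```python
import numpy as np

def corrupt(clean, channel, noise):
    """Simple model: corrupted = (clean convolved with channel) + additive noise."""
    filtered = np.convolve(clean, channel)[: len(clean)]   # convolutional noise
    return filtered + noise[: len(clean)]                  # additive noise

rng = np.random.default_rng(0)
s = rng.standard_normal(8000)            # stand-in for one second of clean speech
h = np.array([1.0, -0.5, 0.2])           # hypothetical telephone-like channel
d = 0.1 * rng.standard_normal(8000)      # hypothetical background noise
y = corrupt(s, h, d)
```

With an identity channel and zero noise the function returns the clean signal unchanged, which is the matched condition.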


Robustness Methods

• Signal
  – Speech enhancement
• Feature
  – Robust feature extraction
• Model
  – Change of the model parameters
  – Model training

[Diagram: Training phase: Speech Signal → Feature Extraction → Features → Model Training → Model. Testing phase: Speech Signal → Features → Model.]


Mel-Frequency Cepstral Coefficient

• Compute magnitude-squared of Fourier transform

• Apply triangular frequency weights that represent the effects of peripheral auditory frequency resolution

• Take the log of the outputs (for RCC, take a root instead of the log)

• Compute the cepstrum using the discrete cosine transform

• Smooth by dropping higher-order coefficients
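The five steps above can be sketched for a single frame as follows. This is a minimal illustration, not the front end used in the experiments: the mel-scale formula, the 23 filters, the 256-point FFT, the 8 kHz rate and the 12 retained coefficients are all assumptions, and the helper names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular weights with centers equally spaced on the mel scale."""
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2))
    bin_hz = np.arange(n_fft // 2 + 1) * sample_rate / n_fft
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for j in range(n_filters):
        lo, mid, hi = edges_hz[j], edges_hz[j + 1], edges_hz[j + 2]
        rising = (bin_hz - lo) / (mid - lo)
        falling = (hi - bin_hz) / (hi - mid)
        bank[j] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mfcc(frame, bank, n_ceps=12):
    n_fft = 2 * (bank.shape[1] - 1)
    power = np.abs(np.fft.rfft(frame, n_fft)) ** 2     # magnitude-squared spectrum
    energies = bank @ power + 1e-10                    # triangular weighting (floor avoids log 0)
    log_e = np.log(energies)                           # log compression
    P = len(energies)
    j = np.arange(1, P + 1)
    i = np.arange(1, n_ceps + 1)[:, None]
    basis = np.cos(np.pi * i * (j - 0.5) / P)          # DCT basis, low orders only
    return basis @ log_e                               # higher-order terms are dropped

rate = 8000                                            # assumed telephone-band sampling rate
t = np.arange(200) / rate
frame = np.hamming(200) * np.sin(2 * np.pi * 440.0 * t)
bank = mel_filterbank(23, 256, rate)
coeffs = mfcc(frame, bank)
```

Keeping only the first `n_ceps` rows of the DCT basis is what implements the smoothing step.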


Temporal processing

• Captures the temporal features of the spectral envelope and provides robustness:
  – Delta features: first- and second-order differences; regression
  – Cepstral Mean Subtraction:
    • Normalizes for channel effects and adjusts for spectral slope
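As a rough sketch of the delta features mentioned above, the regression form below estimates the slope of each coefficient over a window of neighboring frames. The window width of 2 and the edge padding by repetition are assumptions made for this example; the slides do not specify them.

```python
import numpy as np

def delta(features, width=2):
    """Regression deltas over +/- width frames; rows are frames, columns are coefficients."""
    T = len(features)
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    num = sum(
        k * (padded[width + k : T + width + k] - padded[width - k : T + width - k])
        for k in range(1, width + 1)
    )
    den = 2.0 * sum(k * k for k in range(1, width + 1))
    return num / den

# A linear ramp has a constant slope, so interior deltas equal that slope.
ramp = np.outer(np.arange(10.0), [1.0, 2.0])
first_order = delta(ramp)
second_order = delta(first_order)     # delta-delta (acceleration) features
```

Applying `delta` twice gives the second-order differences listed on the slide.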


Perceptual Linear Prediction (PLP)

• Compute magnitude-squared of Fourier transform
• Apply triangular frequency weights that represent the effects of peripheral auditory frequency resolution
• Apply compressive nonlinearities
• Compute discrete cosine transform
• Smooth using autoregressive modeling
• Compute the cepstrum using a linear recursion
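The last step, going from autoregressive coefficients to cepstral coefficients by recursion, can be sketched as below. The sign convention (an all-pole model 1/A(z) with A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p) and the function name are assumptions for this example; other texts flip the sign of a.

```python
import numpy as np

def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c[1..n_ceps] of 1 / A(z), where a[0] == 1 and A(z) = sum_k a[k] z^-k.

    Recursion: c[n] = -a[n] - (1/n) * sum_{k=1}^{n-1} k * c[k] * a[n-k].
    The gain term c[0] is omitted.
    """
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += (k / n) * c[k] * a[n - k]
        c[n] = -acc
    return c[1:]

# Single pole at 0.9: the cepstrum is known in closed form, c[n] = 0.9**n / n.
c = lpc_to_cepstrum(np.array([1.0, -0.9]), 4)
```

The single-pole case is a convenient check because its cepstrum has a closed form.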


PLP (Cont.)

• Algorithm

[Diagram: Speech signal → Critical Band Analysis → Equal Loudness Pre-Emphasis → Intensity-Loudness Conversion → Inverse DFT → Find Autoregressive Coefficients → All pole model]


RelAtive SpecTral Analysis

• Makes PLP (and possibly some other short-term-spectrum-based techniques) more robust to linear spectral distortions

• The new spectral estimate is less sensitive to slow variations in the short-term spectrum

• Filters the temporal trajectory of some function of each spectral value, to provide more reliable spectral features

  – This is usually a bandpass filter that maintains the linguistically important spectral envelope modulations (1-16 Hz)
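A minimal sketch of such a band-pass filter, applied along time to each compressed spectral band, is given below. The coefficients are the commonly quoted RASTA filter, 0.1 (2 + z^-1 - z^-3 - 2 z^-4) / (1 - 0.98 z^-1); they and the 100 frames-per-second rate are assumptions here, since the slides do not give the filter used.

```python
import numpy as np

def rasta_filter(trajectories, pole=0.98):
    """Band-pass filter each column (one spectral band over time).

    Assumed transfer function: 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - pole * z^-1).
    """
    numer = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])
    x = np.asarray(trajectories, dtype=float)
    padded = np.vstack([np.zeros((4, x.shape[1])), x])    # zero history before frame 0
    out = np.zeros_like(x)
    prev = np.zeros(x.shape[1])
    for t in range(len(x)):
        fir = sum(numer[k] * padded[t + 4 - k] for k in range(5))
        prev = fir + pole * prev                           # one-pole recursive part
        out[t] = prev
    return out

frames = np.arange(1000)
offset = np.full((1000, 1), 5.0)                           # constant channel offset
speechlike = np.sin(2 * np.pi * 4.0 * frames / 100.0)[:, None]   # 4 Hz modulation at 100 frames/s
y_offset = rasta_filter(offset)
y_speech = rasta_filter(speechlike)
```

Because the numerator taps sum to zero, a constant offset in the log spectrum (a fixed channel) decays to zero, while a 4 Hz modulation passes almost unattenuated.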


RASTA (Cont.)

• Algorithm

[Diagram: SPEECH SIGNAL → SPECTRAL ANALYSIS → Bank of Compressing Static Nonlinearities → Bank of Linear Band pass Filters → Bank of Expanding Static Nonlinearities → OPTIONAL PROCESSING]


RASTA-PLP

• Algorithm


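RCC-Mean Normalization

• Root Cepstral Coefficients (RCC)
  – Derived using root compression rather than log compression on the filterbank energies
• Advantages of RCC over MFCC
  – More immune to noise
  – Faster decoding

$$e[j] = \sum_{k=0}^{N-1} w_j[k]\,\tilde{S}[k], \quad \text{for } j = 1, 2, \ldots, P$$

$$RCC[i] = \sum_{j=1}^{P} (e[j])^{\gamma} \cos\!\left(\frac{i\,(j - 0.5)\,\pi}{P}\right), \quad \text{for } i = 1, 2, \ldots, m$$

where γ is the root-compression exponent.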
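The two RCC formulas can be written directly as matrix products. In this sketch the exponent value 0.1, the 23 filters and the 12 coefficients are assumptions, and the weights and spectrum are placeholders rather than real data.

```python
import numpy as np

def filterbank_energies(power_spectrum, weights):
    """e[j] = sum_k w_j[k] * S[k]; weights has shape (P, N)."""
    return weights @ power_spectrum

def rcc(energies, gamma=0.1, m=12):
    """RCC[i] = sum_j e[j]**gamma * cos(i * (j - 0.5) * pi / P), for i = 1..m."""
    P = len(energies)
    j = np.arange(1, P + 1)
    i = np.arange(1, m + 1)[:, None]
    return np.cos(i * (j - 0.5) * np.pi / P) @ (energies ** gamma)

# Doubling a weak band changes the root-compressed value far less than
# doubling a strong band, whereas the log changes by log(2) in both cases.
gamma = 0.1
low_change = (2e-6) ** gamma - (1e-6) ** gamma
high_change = (2e6) ** gamma - (1e6) ** gamma
flat = rcc(np.ones(23))      # a flat spectrum has no cepstral structure
```

This difference in sensitivity at low energies is one common explanation for the noise immunity claimed above, though the slides do not spell out the argument.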


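RCC-Mean Normalization

• Mean normalization

$$C_{RCC\_MN;i} = C_{y;i} - C_{y;avg}, \quad i = 1, \ldots, N$$

$$C_{y;avg} = \frac{1}{N} \sum_{i=1}^{N} C_{y;i}$$

• If we approximate root with logarithm

$$y(n) = s(n) * h(n) \;\Rightarrow\; C_y = C_s + C_h$$

$$C_{y;avg} = C_{s;avg} + C_{h;avg}, \quad C_{s;avg} \approx 0$$

$$C_{RCC\_MN;i} = (C_{s;i} + C_h) - C_h = C_{s;i}$$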
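The derivation above relies on approximating the root by a logarithm, so with true root compression the channel term cancels only approximately. The sketch below applies mean normalization to root cepstra of synthetic filterbank energies and compares the clean/channel-distorted gap before and after. The energies, channel gains and exponent are hypothetical values, not data from the experiments.

```python
import numpy as np

def rcc(energies, gamma=0.1, m=12):
    """Root cepstra for a (frames, P) array of filterbank energies."""
    P = energies.shape[-1]
    j = np.arange(1, P + 1)
    i = np.arange(1, m + 1)[:, None]
    basis = np.cos(i * (j - 0.5) * np.pi / P)      # shape (m, P)
    return (energies ** gamma) @ basis.T            # shape (frames, m)

def mean_normalize(cepstra):
    """C_{RCC_MN;i} = C_{y;i} - C_{y;avg}, with the average taken over frames."""
    return cepstra - cepstra.mean(axis=0)

rng = np.random.default_rng(0)
e_s = rng.uniform(0.1, 10.0, size=(200, 23))   # hypothetical clean filterbank energies
H = rng.uniform(0.5, 2.0, size=23)             # hypothetical per-band channel gain
e_y = e_s * H                                  # a fixed channel scales each band

raw_gap = np.linalg.norm(rcc(e_y) - rcc(e_s))
mn_gap = np.linalg.norm(mean_normalize(rcc(e_y)) - mean_normalize(rcc(e_s)))
```

Subtracting the mean removes the average channel-induced shift, so `mn_gap` comes out smaller than `raw_gap`; the remainder is the part that the log approximation ignores.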


Experiment 1

• Database
  – TFARSDAT
    • 64 speakers
    • 8 hours of telephone speech data
• ASR
  – Sharif ASR System
    • HMM based
    • Training: Segmental K-means
    • Search: Beam Viterbi


Experiment 1

• Test results

Feature      Accuracy (%)   Correctness (%)
MFCC         54.97          59.32
MFCC_CMS     51.62          56.63
RASTA_PLP    58.38          65.59
RCC          55.67          59.85
RCC_MN       56.89          64.31
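The slides do not define the two columns. Assuming the HTK-style definitions (an assumption; the Sharif system may compute them differently), correctness ignores insertion errors while accuracy penalizes them, which would explain why accuracy is the lower figure in every row. The counts below are made up for illustration.

```python
def correctness(n_ref, deletions, substitutions):
    """Percent correct: (N - D - S) / N, insertions ignored."""
    return 100.0 * (n_ref - deletions - substitutions) / n_ref

def accuracy(n_ref, deletions, substitutions, insertions):
    """Percent accuracy: (N - D - S - I) / N."""
    return 100.0 * (n_ref - deletions - substitutions - insertions) / n_ref

# Hypothetical counts, not taken from the experiments.
corr = correctness(1000, 50, 150)
acc = accuracy(1000, 50, 150, 40)
```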


Experiment 2

• Aurora 2.0
  – Noisy connected digits recognition
  – 4 hours training data, 2 hours test data in 70 noise types/SNR conditions
• HTK
  – HMM based
  – Model for each digit
    • 16 states with 3 Gaussian mixtures


Experiment 2

• Average results on AURORA
  – Each average is taken over the SNR levels of one noise type


Experiment 2

• Subway noise in various SNRs


Experiment 2

• Babble noise in various SNRs


Experiment 2

• Car noise in various SNRs


Experiment 2

• Exhibition noise in various SNRs


Summary

• Various robust features were tested
• RCC_MN was introduced
• In the first experiment
  – RASTA-PLP gave the best results
  – RCC_MN also performed well
• In the second experiment
  – RCC_MN gave the best results


Thanks for your patience!