PCS Research & Advanced Technology Labs Speech Lab How to deal with the noise in real systems? Hsiao-Chun Wu Motorola PCS Research and Advanced Technology Labs, Speech Laboratory [email protected] Phone: (815) 884-3071


PCS Research & Advanced Technology Labs

Speech Lab

How to deal with the noise in real systems?

Hsiao-Chun Wu

Motorola PCS Research and Advanced Technology Labs, Speech Laboratory

[email protected]

Phone: (815) 884-3071


Speech Lab November 14, 2000

Why do we need to study noise?

Noise is everywhere, and it degrades the performance of signal processing in real systems. Because system engineers cannot avoid noise, modern “noise-processing” technology has been researched and designed to overcome it. Many related research areas have emerged as a result, such as signal detection, signal enhancement/noise suppression, and channel equalization.


How to deal with noise? Cut it off!

• Spectral Truncation

– Spectral Subtraction (1989):

$$\tilde{S}(f) = \underbrace{S(f) + N(f)}_{R(f)} - \tilde{N}(f) \approx S(f)$$

• Time Truncation

– Signal Detection:

$$r(t) = n(t), \quad t \in \tilde{T}_{\text{noise}}$$

• Spatial and/or Temporal Filtering

– Equalization:

$$\tilde{s}(t) = w(t) * \underbrace{h(t) * s(t)}_{r(t)} \approx s(t)$$

– Array Signal Separation (Blind Source Separation):

$$\tilde{S}(t) = W(t)\,\underbrace{H(t)\,S(t)}_{R(t)} \approx S(t)$$
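As a minimal sketch of the spectral-truncation idea, assume the magnitude spectrum of each frame is already available (for example from an FFT). The noise spectrum is estimated from frames believed to contain only noise and is then subtracted from a noisy frame. The function names, the `floor` parameter, and the sample values are illustrative assumptions and do not come from the slides.

```python
def estimate_noise(noise_frames):
    """Average the magnitude spectra of frames assumed to contain noise only."""
    count = len(noise_frames)
    bins = len(noise_frames[0])
    return [sum(frame[k] for frame in noise_frames) / count for k in range(bins)]


def spectral_subtract(noisy_mag, noise_mag, floor=0.0):
    """Subtract the noise estimate bin by bin, clamping at floor * |R(f)|."""
    return [max(r - n, floor * r) for r, n in zip(noisy_mag, noise_mag)]


# Two hypothetical noise-only frames with three frequency bins each.
noise_estimate = estimate_noise([[0.4, 1.2, 0.9], [0.6, 0.8, 1.1]])

# One hypothetical noisy frame; the last bin is dominated by noise.
enhanced = spectral_subtract([1.0, 3.0, 0.5], noise_estimate)
```

Clamping at a floor keeps the magnitude non-negative when the noise estimate exceeds the observed spectrum in a bin.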


Session 1. On-line Automatic End-of-speech Detection Algorithm (Time Truncation)

1. Project goal.

2. Review of current methods.

3. Introduction to voice metric based end-of-speech detector.

4. Simulation results.

5. Conclusion.


1. Project Goal:

• Problem

– Digit-dial recognition with unknown digit string length

• Solution 1

– Fixed-length capture window, such as 10 seconds? (inconvenient for users)

• Solution 2

– Dynamic termination of data capture? (needs a robust detection algorithm)


• Research and design a robust dynamic termination mechanism for the speech recognizer.

– A new on-line automatic end-of-speech detection algorithm with low computational complexity.

• Design a more robust front end to improve the recognition accuracy of speech recognizers.

– The new algorithm also avoids extracting features from redundant noise.


2. Review of Current Methods:

Most speech detection algorithms fall into three categories.

• Frame energy detection

– Short-term frame energy (20 msec) can be used for speech/noise classification.

– It is not robust at high background noise levels.

• Zero-crossing rate detection

– Short-term zero-crossing rate can also be used for speech/noise classification.

– It is not robust across a wide variety of noise types.

• Higher-order-spectral detection

– Short-term higher-order spectra can be used for speech/noise classification.

– It is computationally heavy, and its threshold is difficult to predetermine.
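The first two features can be sketched in a few lines. This is a minimal illustration, assuming an 8 kHz sampling rate and non-overlapping 20 msec frames; the sampling rate, the function names, and the test signals are assumptions and are not taken from the slides.

```python
import math

SAMPLE_RATE = 8000                    # assumed sampling rate in Hz
FRAME_LEN = SAMPLE_RATE * 20 // 1000  # 20 msec frame -> 160 samples


def split_into_frames(samples, frame_len):
    """Cut the signal into non-overlapping frames, dropping the remainder."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


# A 200 Hz tone stands in for voiced speech, an all-zero frame for silence.
tone = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(FRAME_LEN)]
silence = [0.0] * FRAME_LEN
```

A classifier would compare each feature against a threshold, and choosing that threshold is where the robustness problems listed above appear.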


3. Introduction to Voice Metric Based End-of-speech Detector:

• End-of-speech detection using voice metric features is based on the Mel-energies. Voice metric features are robust over a wide variety of background noise. The voice-metric based speech/noise classifier was originally applied in the IS-127 CELP speech coder standard. We modify and enhance the voice-metric features to design a new end-of-speech detector for the Motorola voice recognition front end (VR LITE III).


[Figure: voice metric score table]


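[Figure: block diagram of the end-of-speech detector built on the original VR LITE front end. Blocks: raw data, FFT, Mel-spectrum, SNR estimate, pre-S/N classifier, voice metric (voice metric scores), threshold adaptation, post-S/N classifier, EOS buffer, and silence duration threshold. Decision “Speech start?” (yes/no); when the end of speech is detected, data capture stops.]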


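[Figure: flowchart of the front end with end-of-speech detector. The speech input is segmented into frames; for each frame i the detector asks “end of speech?”. If no, processing continues with the next frame i+1; if yes, data capture terminates. The feature vector frame buffer feeds the VR LITE recognition engine.]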
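The frame loop with dynamic termination can be sketched as follows. This is a simplified illustration, assuming that a per-frame speech/noise decision is already available as a boolean and that frames are 20 msec long; the 1.85 second silence threshold is the value quoted for the simulations. The function name and the flag sequences are hypothetical, and the voice-metric classifier itself is not modelled.

```python
import math


def detect_end_of_speech(speech_flags, frame_sec=0.02, silence_sec=1.85):
    """Return the frame index at which data capture stops, or None.

    Capture stops once speech has started and has been followed by an
    uninterrupted run of non-speech frames lasting at least silence_sec.
    """
    frames_needed = math.ceil(round(silence_sec / frame_sec, 9))
    speech_started = False
    silent_run = 0
    for index, is_speech in enumerate(speech_flags):
        if is_speech:
            speech_started = True
            silent_run = 0        # a pause between digits resets the count
        elif speech_started:
            silent_run += 1
            if silent_run >= frames_needed:
                return index
    return None


# Hypothetical flags: leading noise, a digit string, then trailing silence.
flags = [False] * 50 + [True] * 100 + [False] * 200
stop_frame = detect_end_of_speech(flags)
```

Resetting the counter on every speech frame is what lets short pauses between digits pass without ending the capture early.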


[Figure: raw data for the string “2-2-9-1-7-8” in a car at 55 mph, with the end point and the detected end point marked; time marks at 3.78, 4.81, and 6.51 seconds.]


[Figure: string “2-2-9-1-7-8” in a car at 55 mph, time axis in seconds. A false detection falls before the end point (false detection time error); a correct detection falls after the end point (correct detection time error).]


4. Simulation Results: (The simulation is run over the Motorola digit-string database, which includes 16 speakers and 15,166 variable-length digit strings in 7 different conditions. The silence threshold is 1.85 seconds.)

A. Receiver Operating Characteristic (ROC) curve: the ROC curve plots the end-of-speech detection rate against the false (early) detection rate. We compare two methods: (1) the new voice-metric based end-of-speech detector and (2) the old speech/noise flag based end-of-speech detector.


[Figure: ROC curve. x-axis: false detection rate (%); y-axis: detection rate (%).]


B. String-accuracy-convergence (SAC) curve: the SAC curve plots the string recognition accuracy against the false (early) detection rate. We compare two methods: (1) the new voice-metric based end-of-speech detector and (2) the old speech/noise flag based end-of-speech detector.


[Figure: SAC curve. x-axis: false detection rate (%); y-axis: string recognition accuracy (%).]


C. Table of detection results: (This table illustrates the result among the Madison sub-database including data files with 1.85 seconds or more of silence after end of speech.)

| Condition | Average Time Error | Average False Detection Time Error | Average Correct Detection Time Error | False Detection Rate | String Numbers | Total Detection Rate |
|---|---|---|---|---|---|---|
| Overall | 1.98 sec | 1.68 sec | 1.85 sec | 0.47% | 7,418 | 86.08% |
| Office Close-talk | 1.97 sec | 0 sec | 1.93 sec | 0% | 907 | 94.82% |
| Office Arm-length | 1.98 sec | 0 sec | 1.93 sec | 0% | 988 | 93.62% |
| Café Close-talk | 2.17 sec | 0 sec | 2.00 sec | 0% | 1,147 | 81.87% |
| Café Arm-length | 2.31 sec | 0.14 sec | 2.00 sec | 0.11% | 898 | 57.57% |
| Car Idle (HF) | 1.91 sec | 1.02 sec | 1.84 sec | 0.08% | 1,210 | 93.97% |
| Car 35 mph (HF) | 1.93 sec | 0.96 sec | 1.77 sec | 0.71% | 1,130 | 87.61% |
| Car 55 mph (HF) | 1.66 sec | 2.00 sec | 1.59 sec | 2.20% | 1,138 | 89.63% |


(This table illustrates the result over the small database collected by Motorola PCS CSSRL. All digit strings are recorded in a fixed window of 15 seconds.)

| Condition | Average Time Error | Average False Detection Time Error | Average Correct Detection Time Error | False Detection Rate | String Numbers | Total Detection Rate | String Recognition Accuracy (with EOS) | String Recognition Accuracy (without EOS) |
|---|---|---|---|---|---|---|---|---|
| Overall | 1.82 seconds | 0 seconds | 1.82 seconds | 0% | 121 | 96.69% | 50.41% | 29.75% |
| Office Close-talk | 1.85 seconds | 0 seconds | 1.85 seconds | 0% | 21 | 100% | 66.67% | 61.90% |
| Office Arm-length | 1.84 seconds | 0 seconds | 1.84 seconds | 0% | 20 | 100% | 65.00% | 65.00% |
| Café Close-talk | 1.76 seconds | 0 seconds | 1.76 seconds | 0% | 40 | 100% | 40.00% | 15.00% |
| Café Arm-length | 1.85 seconds | 0 seconds | 1.85 seconds | 0% | 40 | 90% | 45.00% | 10.00% |


Analysis of the Simulation Result: Why didn’t EOS detection work well in babble noise?


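Optimal Detection Decision

• Bayes classifier

$$\log[f(s+n \mid x)] \;\underset{H_n}{\overset{H_s}{\gtrless}}\; \log[f(n \mid x)]$$

• Likelihood Ratio Test

$$L(x) \;\underset{H_n}{\overset{H_s}{\gtrless}}\; T_{\text{Bayes}}, \qquad L(x) = \log[f(x \mid s+n)] - \log[f(x \mid n)], \qquad T_{\text{Bayes}} = \log\!\left[\frac{f(n)}{f(s+n)}\right]$$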
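A small sketch of the likelihood ratio test follows, assuming for illustration that both hypotheses are modelled by zero-mean Gaussian densities that differ only in variance. The densities, the variances, and the priors are assumptions chosen to make the example concrete; the slides do not specify them.

```python
import math


def gaussian_log_pdf(x, mean, variance):
    """Log of the Gaussian density at x."""
    return -0.5 * (math.log(2 * math.pi * variance) + (x - mean) ** 2 / variance)


def log_likelihood_ratio(x, noise_var=1.0, speech_plus_noise_var=4.0):
    """L(x) = log f(x | s+n) - log f(x | n) under the assumed densities."""
    return (gaussian_log_pdf(x, 0.0, speech_plus_noise_var)
            - gaussian_log_pdf(x, 0.0, noise_var))


def decide_speech(x, prior_speech=0.5):
    """Choose H_s when L(x) exceeds the Bayes threshold log[P(n) / P(s+n)]."""
    threshold = math.log((1.0 - prior_speech) / prior_speech)
    return log_likelihood_ratio(x) > threshold


loud_sample_is_speech = decide_speech(3.0)    # large amplitude
quiet_sample_is_speech = decide_speech(0.1)   # small amplitude
```

With equal priors the threshold is zero, so the decision reduces to picking the hypothesis with the larger likelihood; a higher speech prior lowers the threshold and makes the detector more willing to declare speech.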


Digit “one” in close-talking mic, quiet office


Digit “one” in handsfree mic, car at 55 mph


Digit “one” in far-talking mic, cafeteria


5. Conclusion:

• The new voice-metric based end-of-speech detector is robust over a wide variety of background noise.

• The new voice-metric based end-of-speech detector adds only a small amount of computational complexity and can be implemented in real time.

• The new voice-metric based end-of-speech detector can improve recognition performance by discarding the extra noise captured by a fixed data capture window.

• The new voice-metric based end-of-speech detector needs further improvement in babble noise environments.


Session 2. Speech Enhancement Algorithms: Blind Source Separation Methods (Spatial and Temporal Filtering)

1. Motivation and research goal.

2. Statement of “blind source separation” problem.

3. Principles of blind source separation.

4. Criteria for blind source separation.

5. Application to blind channel equalization for digital communication systems.

6. Simulation and comparison.

7. Summary and conclusion.


1. Motivation:

• Mimic the human auditory system to differentiate the subject signals from other sounds, such as interfering sources and background noise, for clear recognition of the subject contents.

• ‘One of the most striking facts about our ears is that we have two of them--and yet we hear one acoustic world; only one voice per speaker.’ (E. C. Cherry and W. K. Taylor. Some further experiments on the recognition of speech, with one and two ears. Journal of the Acoustical Society of America, 26:554-559, 1954)

• The ‘‘cocktail party effect’’--the ability to focus one’s listening attention on a single talker among a cacophony of conversations and background noise--has been recognized for some time. This specialized listening ability may be because of characteristics of the human speech production system, the auditory system, or high-level perceptual and language processing.


Research Goal:

Design a preprocessor that runs digital signal processing algorithms for speech enhancement. The input signals are collected through arrays of multiple sensors (microphones). After the embedded signal processing algorithms run, clearly separated signals are available at the output.


[Figure: audio input, processed by blind source separation algorithms, produces the enhanced output.]


2. Problem Statement of Blind Source Separation:

What is “Blind Source Separation”?

[Figure: M source signals (Signal 1 to Signal M) are picked up by N sensors (Sensor 1 to Sensor N) as the received input signals.]

Given the N linearly mixed received input signals, we need to recover the M statistically independent sources as much as possible ($N \ge M$).


Formulation of Blind Source Separation Problem:

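A received signal vector from the array, X(t), is the original source vector S(t) passed through the channel distortion H(t), such that X(t) = H(t) S(t), where

$$X(t) = [x_1(t), \ldots, x_N(t)]^T, \qquad S(t) = [s_1(t), \ldots, s_M(t)]^T$$

and

$$H(t) = \begin{bmatrix} h_{11}(t) & \cdots & h_{1M}(t) \\ \vdots & h_{ij}(t) & \vdots \\ h_{N1}(t) & \cdots & h_{NM}(t) \end{bmatrix}$$

We need to estimate a separator W(t) such that

$$\tilde{S}(t) = [s_1(t), \ldots, s_M(t), 0, \ldots, 0]^T = W(t)\,X(t)$$

where

$$W(t) = \begin{bmatrix} w_{11}(t) & \cdots & w_{1N}(t) \\ \vdots & w_{pq}(t) & \vdots \\ w_{N1}(t) & \cdots & w_{NN}(t) \end{bmatrix}$$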
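To make the mixing model concrete, the sketch below mixes two sources with a fixed 2×2 matrix and separates them again. It assumes an instantaneous (memoryless) mixture with N = M = 2 and a known mixing matrix, so the separator is simply the matrix inverse. In the blind problem H(t) is unknown and W(t) has to be learned from the received signals alone; all values here are hypothetical.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def inverse_2x2(matrix):
    """Invert a 2x2 matrix; raises if the mixture cannot be undone."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if det == 0:
        raise ValueError("mixing matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical mixing matrix H and a few samples of the source vector S(t).
H = [[1.0, 0.5],
     [0.3, 1.0]]
sources = [[1.0, -1.0], [0.5, 2.0], [-0.7, 0.2]]

received = [mat_vec(H, s) for s in sources]     # X(t) = H S(t)
W = inverse_2x2(H)                              # ideal separator
recovered = [mat_vec(W, x) for x in received]   # S~(t) = W X(t)
```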


3. Principles of Blind Source Separation:

The independence measurement: Shannon’s mutual information.

$$I(y_1, y_2, \ldots, y_N) = \sum_{i=1}^{N} H(y_i) - H(y_1, y_2, \ldots, y_N) \ge 0$$

$$I(y_1, y_2, \ldots, y_N) = E\{\log[f_Y(y_1, y_2, \ldots, y_N)]\} - \sum_{i=1}^{N} E\{\log[f_{y_i}(y_i)]\}$$


4. Criteria to Separate Independent Sources:

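• Constrained Entropy (Wu, IJCNN99):

$$J_1 = -\log[\det(W)] - \sum_{i=1}^{N} \log[f(y_i, \cdot, \cdot)]$$

• Hadamard Measure (Wu, ICA99):

$$J_2 = \log\{\det(\operatorname{diag}(E[YY^T]))\} - \log(\det(E[YY^T]))$$

• Frobenius Norm (Wu, NNSP97):

$$J_3 = \left\| E[YY^T] - \operatorname{diag}(E[YY^T]) \right\|_F^2$$

• Quadratic Gaussianity (Wu, NNSP99):

$$J_4 = \int \left[ f_{Y_i}(y_i) - f_G(y_i) \right]^2 dy_i$$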
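The two second-order criteria can be evaluated directly from samples. The sketch below assumes two output channels and estimates E[YY^T] by a sample average; it then computes the Hadamard measure as log det(diag(R)) minus log det(R) and the Frobenius measure as the squared Frobenius norm of the off-diagonal part of R. The function names and the data are hypothetical.

```python
import math


def correlation_matrix(y1, y2):
    """Sample estimate of E[Y Y^T] for two output channels."""
    n = len(y1)
    r11 = sum(a * a for a in y1) / n
    r22 = sum(b * b for b in y2) / n
    r12 = sum(a * b for a, b in zip(y1, y2)) / n
    return [[r11, r12], [r12, r22]]


def hadamard_measure(r):
    """log det(diag(R)) - log det(R); zero only when R is diagonal."""
    det = r[0][0] * r[1][1] - r[0][1] * r[1][0]
    return math.log(r[0][0] * r[1][1]) - math.log(det)


def frobenius_measure(r):
    """Squared Frobenius norm of the off-diagonal part of R."""
    return r[0][1] ** 2 + r[1][0] ** 2


# Uncorrelated outputs versus outputs that still share a common component.
separated = correlation_matrix([1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0])
mixed = correlation_matrix([1.0, -1.0, 1.0, -1.0], [1.0, -0.5, 0.5, -1.0])
```

Both measures are zero for the uncorrelated pair and positive for the correlated pair, which is what makes them usable as costs to minimize. They only capture second-order dependence, so decorrelation at a single lag does not by itself guarantee independence.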


5. Application to Blind Single Channel Equalization for Digital Communication Systems:

We apply the minimization of the modified constrained entropy

$$J_1 = -\log(w_0^N) - \sum_{i=1}^{N} \log[f(y_i, \cdot, \cdot)]$$

to adapt an equalizer w(t) = [w0, w1, ....] for a digital channel h(t). Assume a PAM signal constellation with symbols s(t) = …, passing through a digital channel

$$h(t) = [c(t, 0.11) + 0.8\,c(t-1, 0.11) - 0.4\,c(t-3, 0.11)]\,W_{6T}(t),$$

where

$$c(t, \beta) = \operatorname{sinc}\!\left(\frac{t}{T}\right) \frac{\cos(\pi \beta t / T)}{1 - 4\beta^2 t^2 / T^2}$$

is the raised-cosine function with roll-off factor $\beta$ and

$$W_{6T}(t) = \operatorname{rect}\!\left(\frac{t}{6T}\right)$$

is a rectangular window. The input signal to the equalizer is $x(t) = h(t) * s(t) + n(t)$, where n(t) is the background noise. We applied generalized anti-Hebbian learning to adapt w(t) such that $w(t) * h(t) \approx \delta(t)$.
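The channel described above can be sampled numerically. The sketch below assumes a symbol period T = 1, reads the delays in h(t) as multiples of T, and treats the edges of the rectangular window as inside the window. The symbols fed through the channel are hypothetical binary PAM values, and the noise term and the equalizer adaptation are left out.

```python
import math


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def raised_cosine(t, beta, T=1.0):
    """Raised-cosine pulse c(t, beta) with roll-off factor beta."""
    x = t / T
    denom = 1.0 - (2.0 * beta * x) ** 2
    if abs(denom) < 1e-12:
        # Limit value at |t| = T / (2 beta), where the formula reads 0/0.
        return (math.pi / 4.0) * sinc(1.0 / (2.0 * beta))
    return sinc(x) * math.cos(math.pi * beta * x) / denom


def channel(t, beta=0.11, T=1.0):
    """h(t): three raised-cosine pulses inside a window of width 6T."""
    window = 1.0 if abs(t) <= 3.0 * T else 0.0   # edges included (assumption)
    pulses = (raised_cosine(t, beta, T)
              + 0.8 * raised_cosine(t - T, beta, T)
              - 0.4 * raised_cosine(t - 3.0 * T, beta, T))
    return window * pulses


def convolve(a, b):
    """Full discrete convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


# Symbol-spaced channel taps for t = -3T ... 3T.
taps = [channel(float(k)) for k in range(-3, 4)]

# Hypothetical binary PAM symbols sent through the noiseless channel.
symbols = [1.0, -1.0, 1.0, 1.0]
received = convolve(symbols, taps)
```

At symbol-spaced sampling the raised-cosine pulses vanish at every nonzero multiple of T, so the taps reduce to roughly 1, 0.8, and -0.4 at delays 0, T, and 3T. That is the intersymbol interference the equalizer has to undo.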


[Figure: x-axis: signal-to-noise ratio (dB); y-axis: signal-to-interference ratio (dB).]


[Figure: x-axis: signal-to-noise ratio (dB); y-axis: bit error rate.]


6. Simulation and Comparison:

The simulation results compare our generalized anti-Hebbian learning, the SDIF algorithm, and Lee’s Infomax method (Lee, IJCNN97) on three real recordings downloaded from the Salk Institute, University of California at San Diego.


New VR LITE Frontend: Blind Source Separation + End-of-speech Detection

| Scheme | Average Detection Time Error | Average False Detection Time Error | Average Correct Detection Time Error | Number of Strings | False Detection Rate | Total Detection Rate |
|---|---|---|---|---|---|---|
| EOS only | 0.256 seconds | 0.155 seconds | 0.317 seconds | 14 | 7.14% | 42.86% |
| BSS+EOS | 0.236 seconds | 0.125 seconds | 0.322 seconds | 14 | 7.14% | 50.00% |


7. Conclusion and Future Research:

• The computational complexity of blind source separation needs to be reduced.

• Test BSS for EOS detection with arrays of microphones of the same kind.

• Incorporate other array signal processing techniques (beamformer?) to improve speech detection and recognition.