A noise-estimation algorithm for highly non-stationary environments

1

A noise-estimation algorithm for highly non-stationary environments

Sundarrajan Rangachari, Philipos C. Loizou

Department of Electrical Engineering, University of Texas at Dallas, P.O. Box 830688, EC 33 Richardson, TX 75083-0688, USA

Presenter: Shih-Hsiang( 士翔 )

SPEECH COMMUNICATION Vol. 48(2), 2006

2

Reference Doblinger, G., 1995. Computationally efficient speech enhancement

by spectral minima tracking in subbands. Proc. Eurospeech 2, 1513–1516.

Hirsch, H., Ehrlicher, C., 1995. Noise estimation techniques for robust speech recognition. Proc. IEEE Internat. Conf. on Acoust. Speech Signal Process., 153–156.

Martin, R., 2001. Noise power spectral density estimation based on optimal smoothing and minimum statistics. IEEE Trans. Speech Audio Process. 9 (5), 504–512.

Cohen, I., 2002. Noise estimation by minima controlled recursive averaging for robust speech enhancement. IEEE Signal Process. Lett. 9 (1), 12–15.

Hu, Y., Loizou, P., 2004. Speech enhancement based on wavelet thresholding the multitaper spectrum. IEEE Trans. Speech Audio Process. 12 (1), 59–67.

3

Introduction In most speech-enhancement algorithms, it is make assumed

that an estimate of the noise spectrum is available It is critical for the performance of speech-enhancement algorithms

The noise estimate can have a major impact on the quality of the enhanced signal If the noise estimate is too low, annoying residual noise will be audible If the noise estimate is too high, speech will be distorted

The simplest approach is to estimate and update the noise spectrum during the silent segments of the signal Using a voice activity detection (VAD) algorithm It only work satisfactorily in stationary noise, not work well in more

realistic environments (non-stationary noise) Hence there is a need to update the noise spectrum

continuously over time

4

Proposed noise-estimation algorithmsCompute smooth speech power spectrumCompute smooth speech power spectrum

)()()( ndnxny Let the noisy speech signal in the time domain be denoted as

Noisy speech Clean speech Additive noise

The smoothed power spectrum of noisy speech is computed using the following first-order recursive equation

2),()1(),1(),( kYkPkP

Smooth power spectrum

Smoothing constant

Frame index Frequency index

5

END

),),

ELSE

1),1

11),

THEN ),1 IF

min

minmin

min

kP(λk(λP

,k))P(λk(P(λ,k)(λγPk(P

kP(λ,k)(λP

Proposed noise-estimation algorithmsTracking the minimum of noisy speechTracking the minimum of noisy speech

Local minimum of the noisy speech power spectrum

β and γ are constants which are determined experimentally

The look ahead factor β controls the adaptation time of the local minimum

6

Proposed noise-estimation algorithmsTracking the minimum of noisy speechTracking the minimum of noisy speech

Plot of noisy speech power spectrum and local minimum using (3) for a speech degraded by babble noise at 5 dB SNR at frequency bin k=5

7

),()1(),1(),( kIkpkp pp

Proposed noise-estimation algorithmsSpeech-presence probabilitypeech-presence probability

),(/),(),( min kPkpkSr

Let the ratio of noisy speech power spectrum and its local minimum be defined as

The power spectrum of noisy speech will be nearly equalto its local minimum when speech is absent

Smoothing constant

END

absentspeech 0),(

ELSE

presentspeech 1),(

THEN )(),( IF

kI

kI

kkλSr

The speech-presence probability, P(λ,k), is updated using the following first-order recursion

The above recursion implicitly exploits the correlation for speech presence in adjacent frames

8

Proposed noise-estimation algorithmsSpeech-presence probabilitypeech-presence probability

Top panel: Plot of estimated speech-presence probability based on the ratio Sr(λ,k)Bottom panel: spectrogram of the clean signal.

9

Proposed noise-estimation algorithmsComputing frequency-dependent smoothing constantsComputing frequency-dependent smoothing constants

),()1(),( kpk dds

Using the speech-presence probability estimate, we compute thetime-frequency dependent smoothing factor as follows

constant

Note that αs(λ ,k) take values in the range of αd ≤ αs(λ ,k) ≤ 1

Finally, the noise spectrum estimate is updated as

2),()),(1(),1(),(),( kYkkDkkD ss

10

Proposed noise-estimation algorithms

Plot of true noise spectrum and the estimated noise spectrum using our proposed method for a speech degraded by babble noise at 5 dB SNR and single frequency f = 250 Hz.

11

Minimum statistics (MS) (Martin, 2001)

Comparison with existing algorithmsMMinimum statistics (MS)inimum statistics (MS) (Martin, 2001)(Martin, 2001)

)(1

)(2),(),(

~

DM

DMkQkQ eq

eq

),(~

2)1(1),(min

kQDkB

eq

),()),(,(),(2 kPkQDBk eqN the power spectral densities of the noise signal

Equivalent degrees of freedom

2),()),(1(),1(),(),( kYkkPkkP

22 )1),(/),1((1

1),(

kkPk

N

12

Comparison with existing algorithmsMMinimum statistics (MS)inimum statistics (MS) (Martin, 2001)(Martin, 2001)

Comparison between the noise spectrum estimated using the proposed algorithm (thick line) and Martins (Martin, 2001) (dashed line) algorithm for a sentence corrupted by car noise (t < 1.8 s) followed by a sentence corrupted by multi-talker babble (t > 1.8 s).

13

Comparison with existing algorithmsContinuous minima tracking (Doblinger, 1995)ontinuous minima tracking (Doblinger, 1995) Continuous minima tracking (Doblinger, 1995)

2),()1(),1(),( kYkPkP

END

),),

ELSE

1),1

11),

THEN ),1 IF

kP(λk(λP

,k))P(λk(P(λ,k)(λγPk(P

kP(λ,k)(λP

N

NN

N

Drawback: the noise estimate increases whenever the noisy speech power increases

14

Comparison with existing algorithmsContinuous minima tracking (Doblinger, 1995)ontinuous minima tracking (Doblinger, 1995)

Top panel: Plot of true noise spectrum and estimated noise spectrum using the proposed method Bottom panel: Plot of true noise spectrum and estimated noise spectrum using (Doblinger, 1995)Arrows indicate regions where noise is overestimated.

15

Weighted average technique (Hirsch and Ehrliche , 1995)

Comparison with existing algorithmsWeighted average technique (Hirsch et al., 1995)Weighted average technique (Hirsch et al., 1995)

)1(ˆ)()1()(ˆ kNkXkN iii l-th frame

i-th subbandestimate noise magnitude

spectral magnitude

estimate valuesnegative

stop valuespositive)(ˆ)(:threshold

kNkX ii

It fails when there is a sudden increase in noise level. This will result in a situation where the noisy speech spectrum will never be smaller than the threshold, since the threshold is based on the past noise estimates already very low. Thus, the noiseestimate will not be updated if the noise power remains at that high level

16

Comparison with existing algorithmsWeighted average technique (Hirsch et al., 1995)Weighted average technique (Hirsch et al., 1995)

Comparison of estimated noise spectrum (f = 500 Hz) of proposed method (dashed line) with that of Hirsch and Ehrlicher (1995) (solid line) for a noisy speech of SNR 20 dB (t < 1.8 s) followed by a noisy speech of SNR 5 dB (t > 1.8 s).

17

Comparison with existing algorithmsMinima controlled Recursive Averaging (Cohen,2002)Minima controlled Recursive Averaging (Cohen,2002)

Minima controlled Recursive Averaging (MRCA) (Cohen,2002)

),(),(:),(0 lkDlkYlkH Given two hypotheses

),(),(),(:),(1 lkDlkXlkYlkH speech absence

speech presence

l-th frame

k-th subband

Let λd(k,l)=E[|D(k,l)|2] denote the variance of the noise in the k-th band2

0 ),()1(),(ˆ)1,(ˆ:),(' lkYlklklkH dddd ),(ˆ)1,(ˆ:),('1 lklklkH dd

speech absence

speech presenceSmoothing constant

Let p’(k,l)=p(H’1(k,l)|Y(k,l)) denote the conditional signal presence probability

2

2

),()],(~1[),(ˆ),(~

)),('1(]),()1(),(ˆ[),('),(ˆ)1,(ˆ

lkYlklklk

lkplkYlklkplklk

ddd

ddddd

),(')1(),(~ lkplk ddd where

18


w

wif likYiblkS

2),()(),(

),()1()1,(),( lkSlkSlkS fss

)},(),1,(min{),( minmin lkSlkSlkS

0),(,

1),(,

),(

),(),(

'0

'1

min

lkIH

lkIH

lkS

lkSlkSratio

),()1()1,('ˆ),('ˆ lkIlkplkp pp

Let the local energy of the noisy speech be obtained by smoothing the magnitude squared of its STFT in time and frequency. In frequency, use a window function b whose length is 2w+1

In time, the smoothing is performed by a first order recursive averaging, given by

Track the minimum of the local energy

Speech presence is determined by the ratio between the local energy of the noisy speech and its minimum within a specified time window

The conditional signal presence probability calculated as follow

19


The local minimum in (Cohen, 2002) was found by tracking the minimum of noisy speech over a search window spanning L frames, this has some drawbacks: The minimum is sensitive to outliers The minima tracking may lag by as many as 2L frames

In this paper The estimate of the noise spectrum in the proposed method is not

influenced by the minimum-search window the threshold used in our method for identifying speech presence

/absence regions is frequency dependent while that of Cohen (2002) is fixed for all frequencies

20

Experimental Combined with a Wiener-type speech-enhancement algorith

m (Hu and Loizou, 2004) Estimate the spectral gain function

),(),(

),(),(

kDkC

kCkG

k

)},(),,(),(max{),(2

kvDkDkYkC

where C(λ,k) is the estimated clean speech spectrum compute as follow

where v=0.001 is a small positive number

5SNR

20SNR

20SNR5-

1

)(

dB

dB

dB

max

0

s

SNRdb

k

μmax is the maximum allowable value of μ,which was set to 10

μ0=(1+4 μmax)/5

s=25/(μmax-1)

21

Experimental (cont.) Obtain the enhanced spectrum

Other parameters

αd=0.85, αp=0.2, β=0.8, γ=0.998, η=0.7

),(),(),( kGkYkX where X(λ,k) is the enhanced spectrum

2

1

5

2

2

)(

/FkMF

MFkLF

LFk

k

s

where LF and MF are the bins corresponding to 1 and3 kHz, and Fs is the sampling frequency

22

Experimental ResultSubjective evaluationSubjective evaluation Using formal listening tests

Single noiseSentences were degraded by either multi-talker babble noise or factory noise

Triplet noiseThree different noise types (multi-talker babble, factory noise, and white noise) appear in proper order without any pauses in the middle

The listeners were asked to select from the pair of stimuli presented the sentence which was more natural, easier to listen and free of artifacts

A preference score of 100% would indicate that listeners preferred the proposed method over the other methods all the time

23

Experimental ResultSubjective evaluationSubjective evaluation

due to the fact that proposed noise-estimation algorithm adapts quickly to the highly non-stationary environments

24

Mean squared error between the true noise spectrum and the estimated noise spectrum

Log-likelihood ratio (LLR) measure

autocorrelation matrix of the original (clean) speech frame

Experimental ResultObjective evaluationObjective evaluation

1

02

22

),(

),(),(1 M

k D

k D

k

kkD

MMSE

Txxx

Tyxy

LLR aRa

aRad 10log

true noise power spectrum

estimated noise power spectrum

total frame number

linear prediction coefficient vector of the original (clean) speech frame

linear prediction coefficient vector of the enhanced speech frame

The LLR is a spectral distance measure which mainly models the mismatch between the formants of the original and enhanced signals

25

Experimental ResultObjective evaluationObjective evaluation Segmental SNR

Ll

N

k

N

k

lkD

lkX

LSegSNR 2/

0

2

2/

0

2

),(

),(log

10

the set of frames that contain speech

26

Experimental ResultObjective evaluation (MSE)Objective evaluation (MSE)

The MSE results are not consistent with the preference outcomes, in that lower MSE values did not suggest better preference.

This indicates that the MSE measure might not be a reliable measure for assessing performance of noise-estimation algorithms. 1. this measure is sensitive to outlier values 2. it treats noise overestimation and noise underestimation errors the same

27

Experimental ResultObjective evaluation (LLR and SNR)Objective evaluation (LLR and SNR)

The segmental SNR values and the LLR values shown in Table 3 were found to be more consistent with the subjective evaluation results

28

Experimental Result

Panel A – Clean Speech Panel C – Martins (2001) Panel E - Proposed method Panel B – Noisy Speech Panel D – Cohen (2003)

29

Conclusions The noise estimate was updated continuously in every frame

using time–frequency smoothing factors calculated based on speech-presence probability in each frequency bin of the noisy speech spectrum The speech-presence probability was estimated using the ratio of noisy s

peech power spectrum to its local minimum The update of noise estimate was faster for very rapidly varyin

g non-stationary noise environments

Documents

A noise-estimation algorithm for highly non-stationary environments