Digital Audio Signal Processing Lecture-3: Noise Reduction


Marc Moonen / Alexander Bertrand, Dept. E.E./ESAT-STADIUS, KU Leuven


Version 2013-2014

Overview

• Spectral subtraction for single-microphone noise reduction
  – Single-microphone noise reduction problem
  – Spectral subtraction basics (= spectral filtering)
  – Features: gain functions, implementation, musical noise, …
  – Iterative Wiener filter based on speech modeling

• Multi-channel Wiener filter for multi-microphone noise reduction
  – Multi-microphone noise reduction problem
  – Multi-channel Wiener filter (= spectral + spatial filtering)

• Kalman filter based on speech & noise modeling
  – Kalman filters
  – Kalman filters for noise reduction


Single-Microphone Noise Reduction Problem

• Microphone signal is

$$y[k] = \underbrace{s[k]}_{\text{desired signal contribution}} + \underbrace{n[k]}_{\text{noise contribution}}$$

• Goal: Estimate s[k] based on y[k]
• Applications:
  – Speech enhancement in conferencing, handsfree telephony, hearing aids, …
  – Digital audio restoration
• Will consider speech applications: s[k] = speech signal

[Figure: a desired signal and noise signal(s) are picked up by the microphone as y[k]; the unknown processing block (?) produces the desired signal estimate $\hat{s}[k]$.]


Spectral Subtraction Methods: Basics

• Signal is chopped into `frames’ (e.g. 10..20 msec). With $y[k] = s[k] + n[k]$, the frequency domain representation for each frame is

$$Y_i(\omega) = S_i(\omega) + N_i(\omega) \qquad (i\text{-th frame})$$

• However, the speech signal is an on/off signal, hence
  – some frames have speech + noise, i.e. $Y_i(\omega) = S_i(\omega) + N_i(\omega)$, frame $i \in$ {`speech+noise’ frames}
  – some frames have noise only, i.e. $Y_i(\omega) = 0 + N_i(\omega)$, frame $i \in$ {`noise-only’ frames}

• A speech detection algorithm is needed to distinguish between these 2 types of frames (based on energy/dynamic range/statistical properties, …)
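The frame classification step can be illustrated with a very crude energy-based detector. This is only a sketch under assumptions that are not on the slide: non-overlapping frames, and a hypothetical decision rule that calls a frame noise-only when its energy stays below `threshold_ratio` times the smallest frame energy. Practical speech detectors are considerably more elaborate.

```python
def frame_energies(y, frame_len):
    """Split y into non-overlapping frames and return the mean energy of each."""
    n_frames = len(y) // frame_len
    return [
        sum(v * v for v in y[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def noise_only_flags(energies, threshold_ratio=4.0):
    """Flag a frame as noise-only when its energy is below threshold_ratio
    times the smallest frame energy (hypothetical rule, for illustration)."""
    floor = min(energies)
    return [e < threshold_ratio * floor for e in energies]


# Toy signal: low-level "noise" everywhere, a louder "speech" burst in frame 1.
y = [0.1, -0.1] * 4 + [1.0, -1.0] * 4 + [0.1, -0.1] * 4
energies = frame_energies(y, frame_len=8)
flags = noise_only_flags(energies)  # True marks a noise-only frame
```

Frames flagged `True` would feed the noise statistics, all other frames are treated as speech + noise.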


Spectral Subtraction Methods: Basics

• Definition: $\mu(\omega)$ = average amplitude of noise spectrum

$$\mu(\omega) = E\{|N_i(\omega)|\}$$

• Assumption: noise characteristics change slowly, hence estimate $\mu(\omega)$ by (long-time) averaging over (M) noise-only frames

$$\hat{\mu}(\omega) = \frac{1}{M} \sum_{\text{noise-only frames}} |Y_i(\omega)|$$

• Estimate clean speech spectrum $S_i(\omega)$ (for each frame), using corrupted speech spectrum $Y_i(\omega)$ (for each frame, i.e. short-time estimate) + estimated $\hat{\mu}(\omega)$:

$$\hat{S}_i(\omega) = G_i(\omega)\, Y_i(\omega)$$

based on `gain function’

$$G_i(\omega) = f(\hat{\mu}(\omega), Y_i(\omega))$$
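As a minimal sketch of these two steps, the noise amplitude can be averaged over noise-only frames and a gain then applied bin by bin. The function names and the list-of-complex-bins data layout are assumptions made for illustration. The gain rule is passed in as a function, and the rectified magnitude subtraction rule is used here only as an example.

```python
def estimate_noise_magnitude(noise_frames):
    """mu_hat[n] = (1/M) * sum over the M noise-only frames of |Y_i[n]|."""
    M = len(noise_frames)
    n_bins = len(noise_frames[0])
    return [sum(abs(frame[n]) for frame in noise_frames) / M for n in range(n_bins)]


def apply_gain(Y, mu_hat, gain_fn):
    """S_hat[n] = G[n] * Y[n], with G[n] = gain_fn(mu_hat[n], |Y[n]|)."""
    return [gain_fn(mu, abs(Yn)) * Yn for Yn, mu in zip(Y, mu_hat)]


def magnitude_subtraction(mu, y_abs):
    """Example gain rule: 1 - mu/|Y|, clipped at zero."""
    return max(0.0, 1.0 - mu / y_abs) if y_abs > 0 else 0.0


# Two noise-only frames with two frequency bins each (toy numbers).
noise_frames = [[1 + 0j, 0 + 2j], [3 + 0j, 0 - 2j]]
mu_hat = estimate_noise_magnitude(noise_frames)
S_hat = apply_gain([4 + 0j, 0 + 1j], mu_hat, magnitude_subtraction)
```

In the second bin the observation is weaker than the noise estimate, so the rectified gain sets the bin to zero.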


Spectral Subtraction: Gain Functions

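Gain functions $G_i(\omega) = f(\hat{\mu}(\omega), Y_i(\omega))$:

• Magnitude Subtraction: $G_i(\omega) = 1 - \dfrac{\hat{\mu}(\omega)}{|Y_i(\omega)|}$

• Spectral Subtraction: $G_i(\omega) = \sqrt{1 - \dfrac{\hat{\mu}(\omega)^2}{|Y_i(\omega)|^2}}$

• Wiener Estimation: $G_i(\omega) = 1 - \dfrac{\hat{\mu}(\omega)^2}{|Y_i(\omega)|^2}$

• Maximum Likelihood: $G_i(\omega) = \dfrac{1}{2}\left(1 + \sqrt{1 - \dfrac{\hat{\mu}(\omega)^2}{|Y_i(\omega)|^2}}\right)$

• Non-linear Estimation: $G_i(\omega) = f(\hat{\mu}(\omega), Y_i(\omega))$

• Ephraim-Malah: $G_i(\omega) = f(\mathrm{SNR}_{post}, \mathrm{SNR}_{prio})$ (see next slide)

Ephraim-Malah = most frequently used in practice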
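The four closed-form rules can be compared numerically for a single frequency bin. This is a small sketch: the function name `gain` and the rule labels are illustrative, and clipping negative arguments to zero (half-wave rectification) is applied to every rule here as an assumption.

```python
import math


def gain(rule, mu_hat, y_abs):
    """Gain for one frequency bin from the noise magnitude estimate mu_hat
    and the noisy magnitude y_abs = |Y_i(w)|; negative values are clipped."""
    r = mu_hat / y_abs           # noise-to-observation magnitude ratio
    p = max(0.0, 1.0 - r * r)    # rectified 1 - mu_hat^2 / |Y|^2
    if rule == "magnitude":
        return max(0.0, 1.0 - r)
    if rule == "spectral":
        return math.sqrt(p)
    if rule == "wiener":
        return p
    if rule == "ml":
        return 0.5 * (1.0 + math.sqrt(p))
    raise ValueError(f"unknown rule: {rule}")


gains = {rule: gain(rule, mu_hat=3.0, y_abs=5.0)
         for rule in ("magnitude", "spectral", "wiener", "ml")}
```

For this bin (noise estimate 3, observed magnitude 5) magnitude subtraction attenuates most and maximum likelihood least.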



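• Example 1: Ephraim-Malah Suppression Rule (EMSR)

$$G_i(\omega) = \frac{\sqrt{\pi}}{2} \sqrt{\frac{1}{\mathrm{SNR}_{post}} \cdot \frac{\mathrm{SNR}_{prio}}{1 + \mathrm{SNR}_{prio}}} \;\cdot\; M\!\left[\mathrm{SNR}_{post} \cdot \frac{\mathrm{SNR}_{prio}}{1 + \mathrm{SNR}_{prio}}\right]$$

with:

$$\mathrm{SNR}_{post}(\omega) = \frac{|Y_i(\omega)|^2}{\hat{\mu}(\omega)^2}$$

$$\mathrm{SNR}_{prio}(\omega) = (1-\alpha) \max(\mathrm{SNR}_{post}(\omega) - 1,\, 0) + \alpha\, \frac{|G_{i-1}(\omega)\, Y_{i-1}(\omega)|^2}{\hat{\mu}(\omega)^2}$$

$$M[\theta] = e^{-\theta/2} \left[ (1+\theta)\, I_0\!\left(\tfrac{\theta}{2}\right) + \theta\, I_1\!\left(\tfrac{\theta}{2}\right) \right] \qquad (I_0, I_1 = \text{modified Bessel functions})$$

(skip formulas)

• This corresponds to an MMSE (*) estimation of the speech spectral amplitude $|S_i(\omega)|$ based on observation $Y_i(\omega)$ (estimate equal to $E\{ |S_i(\omega)| \mid Y_i(\omega) \}$), assuming Gaussian a priori distributions for $S_i(\omega)$ and $N_i(\omega)$ [Ephraim & Malah 1984].

• Similar formula for MMSE log-spectral amplitude estimation [Ephraim & Malah 1985].

(*) minimum mean squared error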
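For readers who do want to evaluate the EMSR formulas, a small sketch for a single frequency bin follows. The Bessel functions are computed from their power series so that the example needs only the standard library, which is adequate for moderate arguments (a library routine would be the usual choice). The helper names and the smoothing value `alpha=0.98` are assumptions, not values from the slides.

```python
import math


def bessel_i(order, x, terms=80):
    """Modified Bessel function I_order(x), order 0 or 1, via its power series."""
    term = (x / 2.0) ** order / math.factorial(order)   # k = 0 term
    total = term
    for k in range(1, terms):
        term *= (x / 2.0) ** 2 / (k * (k + order))      # ratio of successive terms
        total += term
    return total


def emsr_gain(snr_post, snr_prio):
    """Ephraim-Malah suppression rule for one frequency bin."""
    ratio = snr_prio / (1.0 + snr_prio)
    theta = snr_post * ratio
    M = math.exp(-theta / 2.0) * ((1.0 + theta) * bessel_i(0, theta / 2.0)
                                  + theta * bessel_i(1, theta / 2.0))
    return (math.sqrt(math.pi) / 2.0) * math.sqrt(ratio / snr_post) * M


def snr_prio_decision_directed(snr_post, prev_clean_power, noise_power, alpha=0.98):
    """(1 - alpha) * max(SNR_post - 1, 0) + alpha * |G_{i-1} Y_{i-1}|^2 / mu_hat^2."""
    return ((1.0 - alpha) * max(snr_post - 1.0, 0.0)
            + alpha * prev_clean_power / noise_power)


g_high = emsr_gain(snr_post=40.0, snr_prio=39.0)
```

At high SNR the EMSR gain comes close to the Wiener-type value SNR_prio / (1 + SNR_prio), which serves as a quick sanity check on the implementation.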


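Spectral Subtraction: Gain Functions

• Example 2: Magnitude Subtraction

  – Signal model:

$$Y_i(\omega) = S_i(\omega) + N_i(\omega) = |Y_i(\omega)|\, e^{j\theta_{y,i}(\omega)}$$

  – Estimation of clean speech spectrum:

$$\hat{S}_i(\omega) = \left[\, |Y_i(\omega)| - \hat{\mu}(\omega) \,\right] e^{j\theta_{y,i}(\omega)} = \underbrace{\left[ 1 - \frac{\hat{\mu}(\omega)}{|Y_i(\omega)|} \right]}_{G_i(\omega)} Y_i(\omega)$$

  – PS: half-wave rectification

$$G_i(\omega) \Leftarrow \max(0,\, G_i(\omega))$$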



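• Example 3: Wiener Estimation

  – Linear MMSE estimation: find linear filter $G_i(\omega)$ to minimize MSE

$$E\{\, | S_i(\omega) - \underbrace{G_i(\omega)\, Y_i(\omega)}_{\hat{S}_i(\omega)} |^2 \,\}$$

  – Solution:

$$G_i(\omega) = \frac{E\{ S_i(\omega)\, Y_i^*(\omega) \}}{E\{ Y_i(\omega)\, Y_i^*(\omega) \}} = \frac{P_{sy,i}(\omega)}{P_{yy,i}(\omega)} \qquad \begin{array}{l} \leftarrow \text{cross-correlation in i-th frame} \\ \leftarrow \text{auto-correlation in i-th frame} \end{array}$$

    Assume speech s[k] and noise n[k] are uncorrelated, then...

$$G_i(\omega) = \frac{P_{ss,i}(\omega)}{P_{yy,i}(\omega)} = \frac{P_{yy,i}(\omega) - P_{nn,i}(\omega)}{P_{yy,i}(\omega)} \approx \frac{|Y_i(\omega)|^2 - \hat{\mu}(\omega)^2}{|Y_i(\omega)|^2} = 1 - \frac{\hat{\mu}(\omega)^2}{|Y_i(\omega)|^2}$$

  – PS: half-wave rectification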


Spectral Subtraction: Implementation

→ Short-time Fourier Transform (= uniform DFT-modulated analysis filter bank)

$$Y[n,i] = \sum_{k=0}^{K-1} h[k] \cdot y[iM - k] \cdot e^{-j 2\pi nk / N}$$

  = estimate for $Y(\omega_n)$ at time i (i-th frame)

  N = number of frequency bins (channels), n = 0..N-1
  M = downsampling factor
  K = frame length
  h[k] = length-K analysis window (= prototype filter)

→ frames with 50%...66% overlap (i.e. 2-, 3-fold oversampling, N = 2M..3M)

→ subband processing:

$$\hat{S}[n,i] = G[n,i] \cdot Y[n,i]$$

→ synthesis bank: matched to analysis bank (see DSP-CIS)

[Figure: y[k] → short-time analysis → Y[n,i] → gain functions (using $\hat{\mu}[n,i]$) → $\hat{S}[n,i]$ → short-time synthesis → $\hat{s}[k]$]
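The analysis / gain / synthesis chain can be sketched with NumPy as a frame-based overlap-add scheme. This is one possible realisation and not necessarily the filter bank implementation meant on the slide. The assumptions are: 50% overlap (M = K/2), N = K, a square-root periodic Hann window for both analysis and synthesis, real input (so `rfft` is used), and rectified magnitude subtraction as the gain function.

```python
import numpy as np


def spectral_subtraction(y, mu_hat, K=256):
    """Frame-wise magnitude subtraction with 50% overlap (hop M = K/2).
    mu_hat holds the noise magnitude estimate for the K//2 + 1 rfft bins."""
    M = K // 2
    # Square-root periodic Hann window: the analysis*synthesis products of
    # overlapping frames add up to one, so a unit gain reproduces the input.
    h = np.sqrt(0.5 * (1.0 - np.cos(2.0 * np.pi * np.arange(K) / K)))
    s_hat = np.zeros(len(y))
    for start in range(0, len(y) - K + 1, M):
        Y = np.fft.rfft(h * y[start:start + K])            # short-time analysis
        mag = np.maximum(np.abs(Y), 1e-12)                 # avoid division by zero
        G = np.maximum(0.0, 1.0 - mu_hat / mag)            # rectified gain
        s_hat[start:start + K] += h * np.fft.irfft(G * Y, n=K)  # synthesis
    return s_hat


rng = np.random.default_rng(0)
y = rng.standard_normal(2048)
passthrough = spectral_subtraction(y, mu_hat=np.zeros(129))
silenced = spectral_subtraction(y, mu_hat=np.full(129, 1e9))
```

With a zero noise estimate the gain is one everywhere and the interior of the signal is reconstructed exactly. The first and last half frame are covered by a single window only and are therefore attenuated.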


Spectral Subtraction: Musical Noise

• Audio demo: car noise (y[k] versus $\hat{s}[k]$ obtained with magnitude subtraction)

• Artifact: musical noise
  What? Short-time estimates of $|Y_i(\omega)|$ fluctuate randomly in noise-only frames, resulting in random gains $G_i(\omega)$
  → statistical analysis shows that broadband noise is transformed into a signal composed of short-lived tones with randomly distributed frequencies (= musical noise)


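Spectral Subtraction: Musical Noise

• Solutions?
  – Magnitude averaging: replace $Y_i(\omega)$ in the calculation of $G_i(\omega)$ by a local average over frames, so that the gain is based on an averaged magnitude while it is applied to the instantaneous spectrum:

$$\hat{S}_i(\omega) = \underbrace{G_i(\omega)}_{\text{average}} \cdot \underbrace{Y_i(\omega)}_{\text{instantaneous}}$$

  – EMSR (Example 1)
  – Augment $G_i(\omega)$ with soft-decision VAD: $G_i(\omega) \Leftarrow P(H_1 \mid Y_i(\omega)) \cdot G_i(\omega)$, where $P(H_1 \mid Y_i(\omega))$ = probability that speech is present, given observation
  – …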
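Magnitude averaging can be sketched as follows: the gain of frame i is computed from a local average of the magnitudes, while it multiplies the instantaneous spectrum. The causal averaging window of `n_avg` frames and the rectified magnitude subtraction rule are assumptions chosen for the illustration.

```python
import numpy as np


def averaged_magnitude_gain(Y_frames, mu_hat, n_avg=3):
    """Y_frames: (frames, bins) complex STFT. The gain of frame i uses |Y|
    averaged over frames i-n_avg+1..i and is applied to the instantaneous Y_i."""
    mags = np.abs(Y_frames)
    S_hat = np.empty_like(Y_frames)
    for i in range(len(Y_frames)):
        local = mags[max(0, i - n_avg + 1):i + 1].mean(axis=0)   # local average
        G = np.maximum(0.0, 1.0 - mu_hat / np.maximum(local, 1e-12))
        S_hat[i] = G * Y_frames[i]                               # instantaneous Y_i
    return S_hat


Y_frames = np.array([[2 + 0j], [4 + 0j], [6 + 0j]])   # 3 frames, 1 bin
S_hat = averaged_magnitude_gain(Y_frames, mu_hat=np.array([1.0]), n_avg=2)
```

Averaging reduces the frame-to-frame fluctuation of the gain, at the price of a slower reaction to speech onsets.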


Spectral Subtraction: Iterative Wiener Filter

• Basic: Wiener filtering based spectral subtraction (Example 3: Wiener Estimation), with (improved) spectra estimation based on parametric models

• Procedure:
  1. Estimate parameters of a speech model from noisy signal y[k]
  2. Using estimated speech parameters, perform noise reduction (e.g. Wiener estimation, Example 3)
  3. Re-estimate parameters of speech model from the speech signal estimate
  4. Iterate 2 & 3


Spectral Subtraction: Iterative Wiener Filter

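[Figure: speech production model. Voiced: pulse train (spacing = pitch period); unvoiced: white noise generator. The selected excitation u[k] is scaled by $g_s$ and fed through an all-pole filter $\dfrac{1}{1 - \sum_{m=1}^{M} \alpha_m e^{-jm\omega}}$ to give the speech signal.]

frequency domain:

$$S(\omega) = \frac{g_s}{1 - \sum_{m=1}^{M} \alpha_m e^{-jm\omega}}\, U(\omega)$$

time domain:

$$s[k] = \sum_{m=1}^{M} \alpha_m\, s[k-m] + g_s\, u[k]$$

$\boldsymbol{\alpha} = [\alpha_1 \ \ldots \ \alpha_M]^T$ = linear prediction parameters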


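Spectral Subtraction: Iterative Wiener Filter

For each frame (vector) y[m] (i = iteration nr.)

1. Estimate $g_{s,i}$ and $\boldsymbol{\alpha}_i = [\alpha_{1,i} \ \ldots \ \alpha_{M,i}]^T$

2. Construct Wiener Filter (Example 3: Wiener Estimation)

$$G(\omega) = \ldots = \frac{P_{ss}(\omega)}{P_{ss}(\omega) + P_{nn}(\omega)}$$

   with:
   • $P_{nn}(\omega)$ estimated during noise-only periods
   • $P_{ss}(\omega) = \dfrac{g_{s,i}^2}{\left| 1 - \sum_{m=1}^{M} \alpha_{m,i}\, e^{-jm\omega} \right|^2}$

3. Filter speech frame y[m], which gives $\hat{s}_i[m]$

Repeat until some error criterion is satisfied.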
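A compact sketch of this loop for a single frame is given below. Several choices are assumptions and not taken from the slides: the autocorrelation method for the linear prediction parameters, a fixed number of iterations instead of an error criterion, frequency-domain filtering with `rfft`, and a noise spectrum `P_nn` that is supplied by the caller.

```python
import numpy as np


def lpc(x, order):
    """Autocorrelation-method estimate of alpha (s[k] ~ sum_m alpha_m s[k-m])
    and of the gain g (square root of the prediction error power)."""
    r = np.array([np.dot(x[:len(x) - m], x[m:]) for m in range(order + 1)]) / len(x)
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    alpha = np.linalg.solve(R, r[1:])
    g = np.sqrt(max(r[0] - np.dot(alpha, r[1:]), 0.0))
    return alpha, g


def ar_psd(alpha, g, n_fft):
    """P_ss(w) = g^2 / |1 - sum_m alpha_m e^{-jmw}|^2 on the rfft frequency grid."""
    A = np.fft.rfft(np.concatenate(([1.0], -alpha)), n=n_fft)
    return g ** 2 / np.abs(A) ** 2


def iterative_wiener(y, P_nn, order=2, n_iter=3):
    """Alternate between AR parameter estimation and Wiener filtering of frame y."""
    s_hat = y.copy()
    Y = np.fft.rfft(y)
    for _ in range(n_iter):
        alpha, g = lpc(s_hat, order)             # steps 1 and 3: (re-)estimate model
        P_ss = ar_psd(alpha, g, len(y))
        G = P_ss / (P_ss + P_nn)                 # step 2: Wiener filter
        s_hat = np.fft.irfft(G * Y, n=len(y))    # filter the noisy frame
    return s_hat, G


rng = np.random.default_rng(1)
frame = rng.standard_normal(256)
s_hat, G = iterative_wiener(frame, P_nn=1.0)
```

Note that the first pass estimates the speech model from the noisy frame itself, and each later pass re-estimates it from the current speech estimate.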







Multi-Microphone Noise Reduction Problem

[Figure: a speech source s[k] and noise source(s) n[k] are picked up by M microphones; the unknown processing block (?) produces (some) speech estimate.]

Microphone signals:

$$y_i[k] = \underbrace{s_i[k]}_{\text{speech part}} + \underbrace{n_i[k]}_{\text{noise part}}, \qquad i = 1 \ldots M$$

Will estimate speech part in microphone 1, i.e. $\hat{s}_1[k]$ (*) (**)

(*) Estimating s[k] is more difficult, would include dereverberation (topic 6), etc.
(**) This is similar to the single-microphone model (Single-Microphone Noise Reduction Problem), where additional microphones (m = 2..M) help to get a better estimate



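• Data model:

$$\mathbf{Y}(\omega) = \begin{bmatrix} Y_1(\omega) \\ Y_2(\omega) \\ \vdots \\ Y_M(\omega) \end{bmatrix} = \begin{bmatrix} H_1(\omega) \\ H_2(\omega) \\ \vdots \\ H_M(\omega) \end{bmatrix} S(\omega) + \begin{bmatrix} N_1(\omega) \\ N_2(\omega) \\ \vdots \\ N_M(\omega) \end{bmatrix} = \mathbf{d}(\omega)\, S(\omega) + \mathbf{N}(\omega)$$

See Lecture-2 on multi-path propagation, with q left out for conciseness.

$H_m(\omega)$ is the complete transfer function from the speech source position to the m-th microphone.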


Multi-Channel Wiener Filter (MWF)

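• Data model:

$$\mathbf{Y}(\omega) = \mathbf{d}(\omega)\, S(\omega) + \mathbf{N}(\omega)$$

• Will use linear filters to obtain speech estimate (as in Lecture-2)

$$\hat{S}_1(\omega) = \sum_{m=1}^{M} F_m^*(\omega)\, Y_m(\omega) = \mathbf{F}^H(\omega)\, \mathbf{Y}(\omega)$$

• Wiener filter (= linear MMSE approach)

$$\min_{\mathbf{F}(\omega)} E\{\, | S_1(\omega) - \mathbf{F}^H(\omega)\, \mathbf{Y}(\omega) |^2 \,\}$$

Note that (unlike in DSP-CIS) the `desired response’ signal $S_1(\omega)$ is unknown here (!), hence the solution will be `unusual’…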



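• Wiener filter solution is (see DSP-CIS)

$$\mathbf{F}(\omega) = \underbrace{E\{ \mathbf{Y}(\omega)\, \mathbf{Y}^H(\omega) \}^{-1}}_{\text{autocorrelation}} \cdot \underbrace{E\{ \mathbf{Y}(\omega)\, S_1^*(\omega) \}}_{\text{crosscorrelation}}$$

$$= \ldots = E\{ \mathbf{Y}(\omega)\, \mathbf{Y}^H(\omega) \}^{-1} \cdot \left( E\{ \mathbf{Y}(\omega)\, Y_1^*(\omega) \} - E\{ \mathbf{N}(\omega)\, N_1^*(\omega) \} \right)$$

  (with $E\{ S(\omega)\, \mathbf{N}^H(\omega) \} = \mathbf{0}$, i.e. speech and noise uncorrelated)

  – $E\{ \mathbf{Y} \mathbf{Y}^H \}$ and $E\{ \mathbf{Y} Y_1^* \}$: compute during speech+noise periods
  – $E\{ \mathbf{N} N_1^* \}$: compute during noise-only periods
  – All quantities can be computed!
  – Special case of this is the single-channel Wiener filter formula (Example 3: Wiener Estimation)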
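For one frequency bin, the solution can be sketched with sample averages in place of the expectations. The function name `mwf_filter`, the snapshot layout (microphones along rows, frames along columns) and the assumption that a voice activity detector has already sorted the frames into the two groups are illustrative choices.

```python
import numpy as np


def mwf_filter(Y_speech_noise, N_noise_only, ref=0):
    """MWF for one frequency bin.
    Y_speech_noise: (M, T1) STFT snapshots from speech+noise frames,
    N_noise_only:   (M, T2) STFT snapshots from noise-only frames.
    Returns F such that S1_hat = F^H Y estimates the speech part in mic `ref`."""
    R_yy = Y_speech_noise @ Y_speech_noise.conj().T / Y_speech_noise.shape[1]
    R_nn = N_noise_only @ N_noise_only.conj().T / N_noise_only.shape[1]
    # E{Y Y_1^*} and E{N N_1^*} are the `ref` columns of the correlation matrices.
    return np.linalg.solve(R_yy, R_yy[:, ref] - R_nn[:, ref])


rng = np.random.default_rng(2)
Y = rng.standard_normal((3, 200)) + 1j * rng.standard_normal((3, 200))
F_no_noise = mwf_filter(Y, np.zeros((3, 10), dtype=complex))
F_only_noise = mwf_filter(Y, Y)
```

Two limiting cases help to check the sketch: without noise the filter simply selects the reference microphone, and when the `speech+noise' data are in fact noise only, the filter is zero.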


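• MWF combines spatial filtering (as in Lecture-2) with single-channel spectral filtering (as in Lecture-3 on single-channel noise reduction):

  if

$$\mathbf{Y}(\omega) = \begin{bmatrix} Y_1(\omega) \\ Y_2(\omega) \\ \vdots \\ Y_M(\omega) \end{bmatrix} = \underbrace{\mathbf{d}(\omega)}_{\text{steering vector}} S(\omega) + \underbrace{\mathbf{N}(\omega)}_{\text{noise}}, \qquad \boldsymbol{\Phi}_{NN}(\omega) = E\{ \mathbf{N}(\omega)\, \mathbf{N}^H(\omega) \}$$

  then…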



• …then it can be shown that

$$\mathbf{F}(\omega) = \underbrace{[\,\ldots\,]}_{\text{scalar}} \cdot \boldsymbol{\Phi}_{NN}^{-1}(\omega)\, \mathbf{d}(\omega)$$

• $\boldsymbol{\Phi}_{NN}^{-1}(\omega)\, \mathbf{d}(\omega)$ represents a spatial filtering (*)

  Compare to superdirective & delay-and-sum beamforming (Lecture-2)
  – Delay-and-sum beamforming maximizes array gain in white noise field
  – Superdirective beamforming maximizes array gain in diffuse noise field
  – MWF maximizes array gain in unknown (!) noise field. MWF is operated without invoking any prior knowledge (steering vector/noise field)! (the secret is in the voice activity detection… (explain))

(*) Note that spatial filtering can improve SNR, spectral filtering never improves SNR (at one frequency)



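• …then it can be shown that (continued)

$$\mathbf{F}(\omega) = \underbrace{\frac{E\{|S(\omega)|^2\} \cdot H_1^*(\omega)}{E\{|S(\omega)|^2\} \cdot \mathbf{d}^H(\omega)\, \boldsymbol{\Phi}_{NN}^{-1}(\omega)\, \mathbf{d}(\omega) + 1}}_{\text{scalar}} \cdot \boldsymbol{\Phi}_{NN}^{-1}(\omega)\, \mathbf{d}(\omega)$$

• The scalar represents an additional `spectral post-filter’, i.e. a single-channel Wiener estimate (Example 3: Wiener Estimation), applied to the output signal of the spatial filter

  (prove it!)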
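One possible route to the `prove it!' exercise is sketched below, under the assumptions already made on these slides (data model $\mathbf{Y} = \mathbf{d} S + \mathbf{N}$, speech and noise uncorrelated, $\boldsymbol{\Phi}_{NN}$ invertible). The shorthand $\sigma_s^2 = E\{|S(\omega)|^2\}$ and the symbols $\mathbf{W}$, $Z$ are introduced here for brevity and are not slide notation; the frequency argument is dropped.

$$
\begin{aligned}
E\{\mathbf{Y}\mathbf{Y}^H\} &= \sigma_s^2\, \mathbf{d}\mathbf{d}^H + \boldsymbol{\Phi}_{NN}, \qquad E\{\mathbf{Y} S_1^*\} = \sigma_s^2\, H_1^*\, \mathbf{d} \quad (S_1 = H_1 S) \\
\left( \boldsymbol{\Phi}_{NN} + \sigma_s^2\, \mathbf{d}\mathbf{d}^H \right)^{-1} \mathbf{d} &= \frac{\boldsymbol{\Phi}_{NN}^{-1} \mathbf{d}}{1 + \sigma_s^2\, \mathbf{d}^H \boldsymbol{\Phi}_{NN}^{-1} \mathbf{d}} \quad \text{(matrix inversion lemma)} \\
\mathbf{F} = E\{\mathbf{Y}\mathbf{Y}^H\}^{-1} E\{\mathbf{Y} S_1^*\} &= \frac{\sigma_s^2\, H_1^*}{\sigma_s^2\, \mathbf{d}^H \boldsymbol{\Phi}_{NN}^{-1} \mathbf{d} + 1}\; \boldsymbol{\Phi}_{NN}^{-1} \mathbf{d}
\end{aligned}
$$

For the post-filter interpretation, normalise the spatial filter as $\mathbf{W} = \boldsymbol{\Phi}_{NN}^{-1}\mathbf{d} / (\mathbf{d}^H \boldsymbol{\Phi}_{NN}^{-1} \mathbf{d})$, so that its output is $Z = \mathbf{W}^H \mathbf{Y} = S + \mathbf{W}^H \mathbf{N}$ with residual noise power $\sigma_v^2 = 1 / (\mathbf{d}^H \boldsymbol{\Phi}_{NN}^{-1} \mathbf{d})$. Then

$$
\hat{S}_1 = \mathbf{F}^H \mathbf{Y} = H_1 \cdot \frac{\sigma_s^2}{\sigma_s^2 + \sigma_v^2} \cdot Z ,
$$

which is the single-channel Wiener gain $\sigma_s^2 / (\sigma_s^2 + \sigma_v^2)$ applied to the spatial filter output, followed by $H_1$ to map the source estimate to the speech part in microphone 1.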



Multi-Channel Wiener Filter: Implementation

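• Implementation with short-time Fourier transform: see Spectral Subtraction: Implementation

• Implementation with time-domain linear filtering:

$$\hat{s}_1[k] = \sum_{i=1}^{M} \mathbf{y}_i^T[k] \cdot \mathbf{f}_i$$

$$\mathbf{y}_i^T[k] = \begin{bmatrix} y_i[k] & y_i[k-1] & \ldots & y_i[k-L+1] \end{bmatrix}, \qquad \mathbf{y}_i[k] = \mathbf{s}_i[k] + \mathbf{n}_i[k]$$

  $\mathbf{f}_i$ = filter coefficients

[Figure: speech source s[k] and noise source(s) n[k]; the microphone signals $y_1[k], \ldots$ are filtered and summed to give $\hat{s}_1[k]$.]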


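• Implementation with time-domain linear filtering:

$$\min_{\mathbf{f}} E\{\, | s_1[k] - \mathbf{y}^T[k] \cdot \mathbf{f} |^2 \,\}$$

$$\mathbf{y}^T[k] = \begin{bmatrix} \mathbf{y}_1^T[k] & \mathbf{y}_2^T[k] & \ldots & \mathbf{y}_M^T[k] \end{bmatrix}, \qquad \mathbf{f}^T = \begin{bmatrix} \mathbf{f}_1^T & \mathbf{f}_2^T & \ldots & \mathbf{f}_M^T \end{bmatrix}$$

Solution is…

$$\mathbf{f} = E\{ \mathbf{y}[k] \cdot \mathbf{y}[k]^T \}^{-1} \cdot E\{ \mathbf{y}[k] \cdot s_1[k] \}$$

$$= E\{ \mathbf{y}[k] \cdot \mathbf{y}[k]^T \}^{-1} \cdot \left( E\{ \mathbf{y}[k] \cdot y_1[k] \} - E\{ \mathbf{n}[k] \cdot n_1[k] \} \right)$$

  – $E\{ \mathbf{y} \mathbf{y}^T \}$ and $E\{ \mathbf{y}\, y_1 \}$: compute during speech+noise periods
  – $E\{ \mathbf{n}\, n_1 \}$: compute during noise-only periods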



• Implementation with time-domain linear filtering:

• Block algorithm:
  For each block
  – Apply voice activity detection
  – Update correlation matrices
  – Recompute filter coefficients
  – Apply filters

• Cheaper stochastic gradient algorithms
• Frequency domain algorithms

Details omitted (see literature)



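Speech Distortion Weighted MWF

(skip this slide)

SDW-MWF is MWF with additional tuning parameter:

• Design criterion for $\mathbf{f}$ (with $\hat{s}_1[k] = \mathbf{y}^T[k] \cdot \mathbf{f}$ and $\mathbf{y}[k] = \mathbf{s}[k] + \mathbf{n}[k]$) can be re-written as

$$\ldots = \min_{\mathbf{f}} \underbrace{E\{\, | s_1[k] - \mathbf{s}^T[k] \cdot \mathbf{f} |^2 \,\}}_{\text{`speech distortion’}} + \underbrace{E\{\, | \mathbf{n}^T[k] \cdot \mathbf{f} |^2 \,\}}_{\text{`residual noise’}}$$

  i.e. speech distortion + residual noise is minimized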


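Speech Distortion Weighted MWF

(skip this slide)

• Design criterion may now be modified to trade off noise reduction against speech distortion:

$$\ldots = \min_{\mathbf{f}} E\{\, | s_1[k] - \mathbf{s}^T[k] \cdot \mathbf{f} |^2 \,\} + \mu \cdot E\{\, | \mathbf{n}^T[k] \cdot \mathbf{f} |^2 \,\}, \qquad \mu \ge 0$$

• Then optimal solution is…

$$\mathbf{f} = \left[ E\{ \mathbf{y}[k] \cdot \mathbf{y}[k]^T \} + (\mu - 1) \cdot E\{ \mathbf{n}[k] \cdot \mathbf{n}[k]^T \} \right]^{-1} \cdot E\{ \mathbf{y}[k] \cdot y_1[k] - \mathbf{n}[k] \cdot n_1[k] \}$$

  i.e. (rather) straightforward modification

• By increasing `mu’, more noise is reduced, at the expense of more speech distortion (which is acceptable to a certain level). $\mu = \infty$ means all emphasis is on noise reduction, speech distortion is ignored (and then $\mathbf{f} = \mathbf{0}$!)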
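A time-domain sketch of the block computation follows, with sample averages replacing the expectations. The helper names, the tap-stacking layout and the assumption that speech+noise and noise-only segments have already been separated by a voice activity detector are illustrative. Setting `mu=1` gives the standard MWF.

```python
import numpy as np


def stack_taps(x, L):
    """x: (M, T) multichannel signal -> (M*L, T-L+1) matrix whose columns are
    y[k] = [y_1[k] ... y_1[k-L+1] ... y_M[k] ... y_M[k-L+1]]^T."""
    M, T = x.shape
    return np.vstack([x[m, L - 1 - l:T - l] for m in range(M) for l in range(L)])


def sdw_mwf(y_speech_noise, n_noise_only, L, mu=1.0):
    """f = (E{y y^T} + (mu-1) E{n n^T})^{-1} (E{y y_1} - E{n n_1})."""
    Ys = stack_taps(y_speech_noise, L)
    Ns = stack_taps(n_noise_only, L)
    R_yy = Ys @ Ys.T / Ys.shape[1]
    R_nn = Ns @ Ns.T / Ns.shape[1]
    # Row 0 of the stacked data is y_1[k], so column 0 holds E{y y_1} and E{n n_1}.
    return np.linalg.solve(R_yy + (mu - 1.0) * R_nn, R_yy[:, 0] - R_nn[:, 0])


rng = np.random.default_rng(3)
y = rng.standard_normal((2, 500))
f_no_noise = sdw_mwf(y, np.zeros((2, 100)), L=2)
f_large_mu = sdw_mwf(y, rng.standard_normal((2, 500)), L=2, mu=1e12)
```

The two calls illustrate the limiting behaviour: without noise the filter passes microphone 1 unchanged, and for very large `mu` the filter shrinks towards zero.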







Kalman Filter

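Given: state space model of a discrete-time MIMO system

$$\mathbf{x}[k+1] = \mathbf{A} \cdot \mathbf{x}[k] + \mathbf{B} \cdot \mathbf{u}[k] + \mathbf{v}[k] \qquad (\mathbf{v}[k] = \text{process noise})$$

$$\mathbf{y}[k] = \mathbf{C} \cdot \mathbf{x}[k] + \mathbf{D} \cdot \mathbf{u}[k] + \mathbf{w}[k] \qquad (\mathbf{w}[k] = \text{measurement noise})$$

with v[k] and w[k]: mutually uncorrelated, zero mean, white noises

$$E\left\{ \begin{bmatrix} \mathbf{v}[k] \\ \mathbf{w}[k] \end{bmatrix} \cdot \begin{bmatrix} \mathbf{v}[k]^H & \mathbf{w}[k]^H \end{bmatrix} \right\} = \begin{bmatrix} \mathbf{Q} & \mathbf{0} \\ \mathbf{0} & \mathbf{R} \end{bmatrix}$$

Then: given A, B, C, D and input/output observations u[k], y[k], k = 1, 2, ...

Kalman filter produces MMSE estimates of internal states x[k], k = 1, 2, ... (= `Wiener filter for dynamic systems’)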


Kalman Filter

Definition: $\hat{\mathbf{x}}[k|l]$ = MMSE-estimate of x[k] using all available data up until time l

• `FILTERING’ = estimate $\hat{\mathbf{x}}[k|k]$

• `PREDICTION’ = estimate $\hat{\mathbf{x}}[k+n|k]$, $n > 0$

• `SMOOTHING’ = estimate $\hat{\mathbf{x}}[k-n|k]$, $n > 0$


Kalman Filter: filtering and 1-step prediction

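Given $\hat{\mathbf{x}}[k|k-1]$ together with error covariance matrix P[k|k-1]

$$\mathbf{P}[k|k-1] = E\{ \tilde{\mathbf{x}}[k|k-1] \cdot \tilde{\mathbf{x}}[k|k-1]^T \}, \qquad \tilde{\mathbf{x}}[k|k-1] = \mathbf{x}[k] - \hat{\mathbf{x}}[k|k-1]$$

Then obtain $\hat{\mathbf{x}}[k|k]$ and $\hat{\mathbf{x}}[k+1|k]$ using u[k], y[k]:

Step 1: Measurement Update

$$\hat{\mathbf{x}}[k|k] = \hat{\mathbf{x}}[k|k-1] + \mathbf{K}[k] \cdot \left( \mathbf{y}[k] - \mathbf{C} \cdot \hat{\mathbf{x}}[k|k-1] - \mathbf{D} \cdot \mathbf{u}[k] \right)$$

$$\mathbf{P}[k|k] = \mathbf{P}[k|k-1] - \mathbf{K}[k] \cdot \mathbf{C} \cdot \mathbf{P}[k|k-1]$$

$$\mathbf{K}[k] = \mathbf{P}[k|k-1] \cdot \mathbf{C}^T \cdot \left( \mathbf{C} \cdot \mathbf{P}[k|k-1] \cdot \mathbf{C}^T + \mathbf{R} \right)^{-1} \qquad (= \text{Kalman Gain})$$

Step 2: Time Update

$$\hat{\mathbf{x}}[k+1|k] = \mathbf{A} \cdot \hat{\mathbf{x}}[k|k] + \mathbf{B} \cdot \mathbf{u}[k]$$

$$\mathbf{P}[k+1|k] = \mathbf{A} \cdot \mathbf{P}[k|k] \cdot \mathbf{A}^T + \mathbf{Q}$$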
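The two update steps translate almost line by line into NumPy. The sketch below assumes real-valued matrices and 1-D state/input/output vectors; the function name and argument order are illustrative.

```python
import numpy as np


def kalman_step(x_pred, P_pred, u, y, A, B, C, D, Q, R):
    """One recursion: (x[k|k-1], P[k|k-1]) -> (x[k|k], P[k|k], x[k+1|k], P[k+1|k])."""
    # Step 1: measurement update
    K = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + R)   # Kalman gain
    x_filt = x_pred + K @ (y - C @ x_pred - D @ u)
    P_filt = P_pred - K @ C @ P_pred
    # Step 2: time update
    x_next = A @ x_filt + B @ u
    P_next = A @ P_filt @ A.T + Q
    return x_filt, P_filt, x_next, P_next


# Scalar toy problem: constant state observed in unit-variance noise.
one, zero = np.array([[1.0]]), np.array([[0.0]])
x_filt, P_filt, x_next, P_next = kalman_step(
    x_pred=np.array([0.0]), P_pred=one, u=np.array([0.0]), y=np.array([2.0]),
    A=one, B=one, C=one, D=zero, Q=zero, R=one)
```

With equal prior and measurement variances the gain is 0.5, so the filtered estimate lands halfway between the prediction (0) and the measurement (2), and the error variance is halved.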


Kalman Filter: smoothing

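Estimate states x[1], x[2], …, x[N] based on data u[k], y[k], k = 1, 2, …, N

How?

1. forward run: apply previous equations for k = 1, 2, …, N

   Result: estimates $\hat{\mathbf{x}}[1|1], \ldots, \hat{\mathbf{x}}[N|N]$

2. backward run: apply following equations for k = N, N-1, …, 1

$$\hat{\mathbf{x}}[k-1|N] = \hat{\mathbf{x}}[k-1|k-1] + \mathbf{F}[k-1] \cdot \left( \hat{\mathbf{x}}[k|N] - \hat{\mathbf{x}}[k|k-1] \right)$$

$$\mathbf{P}[k-1|N] = \mathbf{P}[k-1|k-1] + \mathbf{F}[k-1] \cdot \left( \mathbf{P}[k|N] - \mathbf{P}[k|k-1] \right) \cdot \mathbf{F}[k-1]^T$$

$$\mathbf{F}[k-1] = \mathbf{P}[k-1|k-1] \cdot \mathbf{A}^T \cdot \mathbf{P}[k|k-1]^{-1}$$

   Result: (better) estimates $\hat{\mathbf{x}}[1|N], \ldots, \hat{\mathbf{x}}[N|N]$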
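The forward and backward runs can be sketched together as follows. To keep the example short, the input u[k] is left out (B = D = 0), which is an assumption of this sketch; list index 0 corresponds to k = 1.

```python
import numpy as np


def kalman_smoother(ys, A, C, Q, R, x0, P0):
    """Forward Kalman filter followed by the backward smoothing run for
    x[k+1] = A x[k] + v[k], y[k] = C x[k] + w[k] (no input u, for brevity)."""
    x_pred, P_pred = x0, P0
    xp, Pp, xf, Pf = [], [], [], []            # predicted / filtered quantities
    for y in ys:                                # forward run, k = 1..N
        K = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + R)
        x_filt = x_pred + K @ (y - C @ x_pred)
        P_filt = P_pred - K @ C @ P_pred
        xp.append(x_pred)
        Pp.append(P_pred)
        xf.append(x_filt)
        Pf.append(P_filt)
        x_pred, P_pred = A @ x_filt, A @ P_filt @ A.T + Q
    xs, Ps = xf[:], Pf[:]                       # x[N|N] starts the backward run
    for k in range(len(ys) - 1, 0, -1):         # backward run
        F = Pf[k - 1] @ A.T @ np.linalg.inv(Pp[k])
        xs[k - 1] = xf[k - 1] + F @ (xs[k] - xp[k])
        Ps[k - 1] = Pf[k - 1] + F @ (Ps[k] - Pp[k]) @ F.T
    return np.array(xs), np.array(Ps)


# Constant scalar state (A = 1, Q = 0) observed three times in unit-variance noise.
one, zero = np.array([[1.0]]), np.array([[0.0]])
ys = [np.array([1.0]), np.array([2.0]), np.array([3.0])]
xs, Ps = kalman_smoother(ys, A=one, C=one, Q=zero, R=one,
                         x0=np.array([0.0]), P0=one)
```

For a constant state, every smoothed estimate equals the final filtered estimate, since all measurements are equally informative about every time instant.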


Kalman filter for Speech Enhancement

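• Assume AR model of speech and noise

$$s[k] = \sum_{m=1}^{M} \alpha_m\, s[k-m] + g_s\, u[k]$$

$$n[k] = \sum_{m=1}^{N} \beta_m\, n[k-m] + g_n\, w[k]$$

  u[k], w[k] = zero mean, unit variance, white noise

• Equivalent state-space model is…

$$\mathbf{x}[k+1] = \mathbf{A} \cdot \mathbf{x}[k] + \mathbf{v}[k]$$

$$y[k] = \mathbf{c}^T \cdot \mathbf{x}[k]$$

  y = microphone signal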



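with:

$$\mathbf{x}^T[k] = \begin{bmatrix} s[k-M+1] & \ldots & s[k] & n[k-N+1] & \ldots & n[k] \end{bmatrix}$$

$$\mathbf{A} = \begin{bmatrix} \mathbf{A}_s & \mathbf{0} \\ \mathbf{0} & \mathbf{A}_n \end{bmatrix}; \qquad \mathbf{c}^T = \mathbf{C} = \begin{bmatrix} \underbrace{0 \ \ldots \ 0 \ 1}_{M} & \underbrace{0 \ \ldots \ 0 \ 1}_{N} \end{bmatrix}; \qquad \mathbf{G} = \begin{bmatrix} \mathbf{g}_s & \mathbf{0} \\ \mathbf{0} & \mathbf{g}_n \end{bmatrix}$$

$$\mathbf{A}_s = \begin{bmatrix} 0 & 1 & 0 & \ldots & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & 0 & \ldots & 1 \\ \alpha_M & \alpha_{M-1} & \ldots & & \alpha_1 \end{bmatrix}; \qquad \mathbf{A}_n = \begin{bmatrix} 0 & 1 & 0 & \ldots & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & 0 & \ldots & 1 \\ \beta_N & \beta_{N-1} & \ldots & & \beta_1 \end{bmatrix}$$

$$\mathbf{g}_s^T = \begin{bmatrix} 0 & \ldots & 0 & g_s \end{bmatrix}; \qquad \mathbf{g}_n^T = \begin{bmatrix} 0 & \ldots & 0 & g_n \end{bmatrix}$$

$$\mathbf{v}[k] = \mathbf{G} \cdot \begin{bmatrix} u[k] & w[k] \end{bmatrix}^T, \qquad \mathbf{Q} = \mathbf{G} \cdot \mathbf{G}^T$$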
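Building these matrices from given AR parameters is mechanical. The sketch below uses the state ordering shown above (oldest sample first within each block); the function names are illustrative and not from the slides.

```python
import numpy as np


def companion(coeffs):
    """Shift matrix whose last row holds [a_P ... a_1] for an AR model of order P."""
    P = len(coeffs)
    A = np.zeros((P, P))
    A[:-1, 1:] = np.eye(P - 1)      # shifts the delay line by one sample
    A[-1, :] = coeffs[::-1]         # newest sample = weighted sum of old ones
    return A


def speech_noise_state_space(alpha, g_s, beta, g_n):
    """Return A, c, G, Q for the stacked speech (order M) + noise (order N) model."""
    M, N = len(alpha), len(beta)
    A = np.zeros((M + N, M + N))
    A[:M, :M] = companion(np.asarray(alpha, dtype=float))
    A[M:, M:] = companion(np.asarray(beta, dtype=float))
    c = np.zeros(M + N)
    c[M - 1] = 1.0                  # picks s[k]
    c[-1] = 1.0                     # picks n[k], so that y[k] = s[k] + n[k]
    G = np.zeros((M + N, 2))
    G[M - 1, 0] = g_s
    G[-1, 1] = g_n
    return A, c, G, G @ G.T


A, c, G, Q = speech_noise_state_space(alpha=[0.5, -0.2], g_s=2.0, beta=[0.3], g_n=1.0)
```

These matrices can then be handed to a Kalman filter or smoother, with the measurement noise covariance set to zero (or a small value) because the model has no separate measurement noise term.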



• PS: This was single-microphone case.

  How can this be extended to multi-microphone case?

$$\mathbf{x}[k+1] = \mathbf{A} \cdot \mathbf{x}[k] + \mathbf{v}[k]$$

$$\mathbf{y}[k] = \mathbf{C}^T \cdot \mathbf{x}[k]$$

  Same A, x, v; C = ?



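Iterative algorithm

[Figure: y[k] → split signal in frames → estimate parameters ($\hat{\alpha}_{m,i}, \hat{g}_{s,i}$; $\hat{\beta}_{m,i}, \hat{g}_{n,i}$) → Kalman Smoother or Kalman Filter → $\hat{s}_i[m]$, $\hat{n}_i[m]$ → fed back to the parameter estimation (iterations) → reconstruct signal → $\hat{s}[k]$]

Disadvantages iterative approach:
• complexity
• delay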



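Sequential algorithm

The iteration index is replaced by a time index (no iterations).

[Figure: two coupled blocks with delay elements D.
State Estimator (Kalman Filter): from y[k] and the delayed estimates $\hat{s}[k-1|k-1]$, $\hat{n}[k-1|k-1]$, computes $\hat{s}[k|k]$, $\hat{n}[k|k]$, using the current parameter estimates $\hat{\alpha}_m, \hat{\beta}_m, \hat{g}_s, \hat{g}_n$.
Parameters Estimator (Kalman Filter): from the delayed estimates $\hat{\alpha}_m[k-1|k-1]$, $\hat{\beta}_m[k-1|k-1]$, $\hat{g}_s[k-1|k-1]$, $\hat{g}_n[k-1|k-1]$ and the signal estimates $\hat{s}, \hat{n}$, computes $\hat{\alpha}_m[k|k]$, $\hat{\beta}_m[k|k]$, $\hat{g}_s[k|k]$, $\hat{g}_n[k|k]$.]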


CONCLUSIONS

• Single-channel noise reduction
  – Basic system is spectral subtraction
  – Only spectral filtering, not easily extended to multi-channel case for additional spatial filtering
  – Hence can only exploit differences in spectra between noise and speech signal:
    • noise reduction at expense of speech distortion
    • achievable noise reduction may be limited

• Multi-channel noise reduction
  – Basic system is MWF, possibly extended with speech distortion regularization
  – Provides spectral + spatial filtering (links with beamforming!)

• Kalman filtering
  – Signal model based approach
