Dual Transfer Function GSC and Application to Joint Noise Reduction and Acoustic Echo Cancellation

Gal Reuven

Under the supervision of Sharon Gannot¹ and Israel Cohen²

¹School of Engineering, Bar-Ilan University, Ramat-Gan
²Department of Electrical Engineering, Technion, Haifa

February, 2006

DTF-GSC AND APP. TO JOINT NR AND AEC Motivation

Motivation

• Interference degrades:

– Intelligibility
– Speech compression quality
– Speech recognition rates

[Figure: acoustic scenario with a microphone array, a desired speech signal, a competing speech signal, a noise source, ambient noise, and the speech enhancement system.]

Goal: speech enhancement by a joint interference and noise reduction system



Outline

• Problem presentation

• The DTF-GSC

• Estimation

• Performance analysis and experimental study

• Application: joint AEC and NR

– Cascade schemes
– ETF-GSC scheme



Problem Presentation

• M ≥ 3 microphones

• One desired speech signal

• One directional interference signal

• One directional/ambient noise signal

• Arbitrary acoustic transfer functions (ATFs)

[Figure: microphone array, competing speech signal, noise source, ambient noise, and the speech enhancement system.]

$$z_m(t) = a_m(t) \ast s_1(t) + b_m(t) \ast s_2(t) + n_m(t), \qquad m = 1,\dots,M$$



Time Domain Presentation

$$z_m(t) = a_m(t) \ast s_1(t) + b_m(t) \ast s_2(t) + n_m(t), \qquad m = 1,\dots,M$$

where

$a_m(t)$: the acoustic impulse response from the desired speech source to the m-th microphone

$b_m(t)$: the acoustic impulse response from the nonstationary interference source to the m-th microphone

$s_1(t)$: the desired speech signal

$s_2(t)$: the nonstationary interference signal

$n_m(t)$: the (directional or nondirectional) stationary noise signal at the m-th microphone
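The time-domain model can be simulated directly. The sketch below is a minimal illustration assuming NumPy; the function `simulate_mixture`, the white-noise stand-ins for $s_1(t)$, $s_2(t)$ and $n_m(t)$, and the random decaying impulse responses are all hypothetical choices, not taken from the slides. `np.convolve` computes the linear convolution denoted by $\ast$.

```python
import numpy as np


def simulate_mixture(s1, s2, a, b, noise_std=0.01, seed=0):
    """Return the (M, T) signals z_m(t) = a_m(t)*s1(t) + b_m(t)*s2(t) + n_m(t).

    s1, s2    : desired and interference source signals, length T
    a, b      : (M, L) arrays holding the impulse responses a_m(t) and b_m(t)
    noise_std : std of the stationary noise n_m(t), assumed white and Gaussian
    """
    rng = np.random.default_rng(seed)
    M, T = a.shape[0], len(s1)
    z = np.empty((M, T))
    for m in range(M):
        # Full linear convolution, truncated to the length of the sources
        z[m] = np.convolve(a[m], s1)[:T] + np.convolve(b[m], s2)[:T]
    z += noise_std * rng.standard_normal((M, T))
    return z


# Usage: M = 3 microphones with hypothetical exponentially decaying responses
rng = np.random.default_rng(1)
T, M, L = 8000, 3, 64
s1, s2 = rng.standard_normal(T), rng.standard_normal(T)
decay = np.exp(-np.arange(L) / 10.0)
a = rng.standard_normal((M, L)) * decay
b = rng.standard_normal((M, L)) * decay
z = simulate_mixture(s1, s2, a, b)
print(z.shape)  # (3, 8000)
```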



Frequency Domain Presentation

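STFT:

$$\mathbf{Z}(t,e^{j\omega}) = \mathbf{A}(e^{j\omega})\,S_1(t,e^{j\omega}) + \mathbf{B}(e^{j\omega})\,S_2(t,e^{j\omega}) + \mathbf{N}(t,e^{j\omega})$$

where

$$\mathbf{Z}(t,e^{j\omega}) = \begin{bmatrix} Z_1(t,e^{j\omega}) & Z_2(t,e^{j\omega}) & \cdots & Z_M(t,e^{j\omega}) \end{bmatrix}^T$$

$$\mathbf{A}(e^{j\omega}) = \begin{bmatrix} A_1(e^{j\omega}) & A_2(e^{j\omega}) & \cdots & A_M(e^{j\omega}) \end{bmatrix}^T$$

$$\mathbf{B}(e^{j\omega}) = \begin{bmatrix} B_1(e^{j\omega}) & B_2(e^{j\omega}) & \cdots & B_M(e^{j\omega}) \end{bmatrix}^T$$

$$\mathbf{N}(t,e^{j\omega}) = \begin{bmatrix} N_1(t,e^{j\omega}) & N_2(t,e^{j\omega}) & \cdots & N_M(t,e^{j\omega}) \end{bmatrix}^T$$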



Goal

• Reconstruct the desired speech signal in an environment containing

– Reverberation
– A competing speech signal (double talk)
– Stationary noise

• Applications

– Blind source separation (BSS)
– Acoustic echo cancellation (AEC)

• Methods

– Extend the TF-GSC so that it also places a null in the interference direction

– Exploit the nonstationarity of the desired and interference signals



Dual Source Transfer-Function Generalized Sidelobe Canceller (DTF-GSC)

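[Block diagram of the DTF-GSC: the microphone signals $Z_1,\dots,Z_M$ feed the matched beamformer $\mathbf{W}_0^\dagger$ (output $Y_{\mathrm{MBF}}$) and the blocking matrix $\mathbf{H}^\dagger$ (outputs $U_3,\dots,U_M$); the adaptive filters $G_3,\dots,G_M$ produce $Y_{\mathrm{NC}}$, which is subtracted from $Y_{\mathrm{MBF}}$ to give the output $Y$. All signals are functions of $(t,e^{j\omega})$.]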



Method

Extend the TF-GSC to handle nonstationary interference:

• Matched beamformer (MBF)

Distortionless in the desired direction while blocking the interference direction

• Blocking matrix (BM)

Blocks both the desired and the interference directions

• Adaptive noise canceller (ANC)

Estimates the residual noise at the MBF output using the reference signals produced by the BM



Matched beamformer

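ATF-ratio matched filter:

$$\mathbf{W}_0(e^{j\omega}) = \frac{F(e^{j\omega})}{1-|\rho(e^{j\omega})|^2}\left(\frac{\mathbf{A}(e^{j\omega})}{\|\mathbf{A}(e^{j\omega})\|^2} - \rho(e^{j\omega})\,\frac{\mathbf{B}(e^{j\omega})}{\|\mathbf{A}(e^{j\omega})\|\,\|\mathbf{B}(e^{j\omega})\|}\right)$$

$$\rho(e^{j\omega}) \equiv \frac{\mathbf{B}^\dagger(e^{j\omega})\,\mathbf{A}(e^{j\omega})}{\|\mathbf{A}(e^{j\omega})\|\,\|\mathbf{B}(e^{j\omega})\|}$$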

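It is easily verified (using $\mathbf{A}^\dagger\mathbf{B} = \rho^*\,\|\mathbf{A}\|\,\|\mathbf{B}\|$) that:

• $\mathbf{A}^\dagger(e^{j\omega})\,\mathbf{W}_0(e^{j\omega}) = F(e^{j\omega})$

• $\mathbf{B}^\dagger(e^{j\omega})\,\mathbf{W}_0(e^{j\omega}) = 0$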
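Both constraints can be checked numerically at a single frequency bin. The sketch below assumes NumPy and randomly drawn (hypothetical) ATF vectors; the function name `matched_beamformer` is illustrative. The formula requires that $\mathbf{A}$ and $\mathbf{B}$ are not parallel, so that $|\rho| < 1$.

```python
import numpy as np


def matched_beamformer(A, B, F=1.0):
    """W0 = F / (1 - |rho|^2) * (A / ||A||^2 - rho * B / (||A|| ||B||)) at one bin."""
    nA, nB = np.linalg.norm(A), np.linalg.norm(B)
    rho = np.vdot(B, A) / (nA * nB)  # np.vdot conjugates its first argument: B^H A
    return F / (1 - abs(rho) ** 2) * (A / nA**2 - rho * B / (nA * nB))


rng = np.random.default_rng(0)
M = 4
A = rng.standard_normal(M) + 1j * rng.standard_normal(M)  # hypothetical desired-source ATFs
B = rng.standard_normal(M) + 1j * rng.standard_normal(M)  # hypothetical interference ATFs
F = 0.8 - 0.3j                                             # arbitrary desired response
W0 = matched_beamformer(A, B, F)
print(np.vdot(A, W0))  # A^H W0, equals F up to round-off
print(np.vdot(B, W0))  # B^H W0, zero up to round-off
```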



Blocking Matrix

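$$\mathbf{H}(e^{j\omega}) = \begin{bmatrix} Q_3(e^{j\omega}) & Q_4(e^{j\omega}) & \cdots & Q_M(e^{j\omega}) \\ L_3(e^{j\omega}) & L_4(e^{j\omega}) & \cdots & L_M(e^{j\omega}) \\ 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}$$

$$Q_m(e^{j\omega}) = -\frac{\dfrac{A_2^*}{A_1^*}\dfrac{B_m^*}{B_1^*} - \dfrac{B_2^*}{B_1^*}\dfrac{A_m^*}{A_1^*}}{\dfrac{A_2^*}{A_1^*} - \dfrac{B_2^*}{B_1^*}}, \qquad m = 3,\dots,M$$

$$L_m(e^{j\omega}) = -\frac{\dfrac{A_m^*}{A_1^*} - \dfrac{B_m^*}{B_1^*}}{\dfrac{A_2^*}{A_1^*} - \dfrac{B_2^*}{B_1^*}}, \qquad m = 3,\dots,M$$

where the argument $e^{j\omega}$ of the ATFs on the right-hand sides is omitted for brevity.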




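It is easily verified that:

• $\mathbf{A}^\dagger(e^{j\omega})\,\mathbf{H}(e^{j\omega}) = \mathbf{0}$

• $\mathbf{B}^\dagger(e^{j\omega})\,\mathbf{H}(e^{j\omega}) = \mathbf{0}$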
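The blocking property can also be checked numerically. The sketch below assumes NumPy and random (hypothetical) ATFs at one frequency bin, builds $\mathbf{H}$ from the $Q_m$ and $L_m$ expressions, and verifies that both $\mathbf{A}^\dagger\mathbf{H}$ and $\mathbf{B}^\dagger\mathbf{H}$ vanish. The name `blocking_matrix` is illustrative; the construction needs $A_2^*/A_1^* \neq B_2^*/B_1^*$.

```python
import numpy as np


def blocking_matrix(A, B):
    """M x (M-2) blocking matrix H(e^{jw}) at one frequency bin, for M >= 3."""
    M = len(A)
    alpha = np.conj(A) / np.conj(A[0])  # A*_m / A*_1 for m = 1..M
    beta = np.conj(B) / np.conj(B[0])   # B*_m / B*_1 for m = 1..M
    den = alpha[1] - beta[1]            # A*_2/A*_1 - B*_2/B*_1, assumed nonzero
    H = np.zeros((M, M - 2), dtype=complex)
    for col, m in enumerate(range(2, M)):  # 0-based m = 2..M-1 is m = 3..M
        H[0, col] = -(alpha[1] * beta[m] - beta[1] * alpha[m]) / den  # Q_m
        H[1, col] = -(alpha[m] - beta[m]) / den                       # L_m
        H[m, col] = 1.0                                               # identity block
    return H


rng = np.random.default_rng(0)
M = 5
A = rng.standard_normal(M) + 1j * rng.standard_normal(M)  # hypothetical ATFs
B = rng.standard_normal(M) + 1j * rng.standard_normal(M)
H = blocking_matrix(A, B)
# Largest entries of A^H H and B^H H, both at round-off level
print(np.max(np.abs(A.conj() @ H)), np.max(np.abs(B.conj() @ H)))
```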



Adaptive Noise Canceller

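Normalized LMS:

$$G_m(t+1,e^{j\omega}) = G_m(t,e^{j\omega}) + \mu\,\frac{U_m(t,e^{j\omega})\,Y^*(t,e^{j\omega})}{P_{\mathrm{est}}(t,e^{j\omega})}$$

$$G_m(t+1,e^{j\omega}) \xleftarrow{\ \mathrm{FIR}\ } G_m(t+1,e^{j\omega})$$

for $m = 3,\dots,M$, where the second step constrains each updated filter to a finite impulse response, and

$$P_{\mathrm{est}}(t,e^{j\omega}) = \eta\,P_{\mathrm{est}}(t-1,e^{j\omega}) + (1-\eta)\,\|\mathbf{Z}(t,e^{j\omega})\|^2$$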
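The update can be sketched per STFT frame for all frequency bins at once. This is a minimal sketch assuming NumPy; the function name `anc_nlms_step`, the step size, the smoothing factor and the FIR length are hypothetical, and the FIR constraint is realized in the common way (inverse FFT, truncation to `fir_len` taps, FFT), which is an assumption about how the constraint is implemented. The usage example feeds noise-only frames in which $Y_{\mathrm{MBF}}$ is an exact FIR-filtered mix of the references, and passes the references in place of $\mathbf{Z}$ for the normalization (a simplification).

```python
import numpy as np


def anc_nlms_step(G, U, Y_mbf, Z, P_est, mu=0.1, eta=0.9, fir_len=8):
    """One STFT frame of the NLMS noise canceller; returns (G(t+1), Y(t), P_est(t)).

    G, U  : (M-2, K) filters G_m and blocking-matrix outputs U_m for K bins
    Y_mbf : (K,) matched-beamformer output
    Z     : (M, K) microphone frame, used only for the power normalization
    P_est : (K,) power estimate from the previous frame
    """
    # Y = Y_MBF - Y_NC, with Y_NC = sum_m G_m^* U_m
    Y = Y_mbf - np.sum(np.conj(G) * U, axis=0)
    # P_est(t) = eta * P_est(t-1) + (1 - eta) * ||Z(t)||^2
    P_est = eta * P_est + (1 - eta) * np.sum(np.abs(Z) ** 2, axis=0)
    # G_m(t+1) = G_m(t) + mu * U_m Y^* / P_est
    G = G + mu * U * np.conj(Y) / P_est
    # FIR constraint: keep only the first fir_len taps of each impulse response
    g = np.fft.ifft(G, axis=1)
    g[:, fir_len:] = 0
    return np.fft.fft(g, axis=1), Y, P_est


# Usage: hypothetical noise-only frames with a known 4-tap filter per reference
rng = np.random.default_rng(0)
K, R = 32, 2
g_true = np.zeros((R, K), dtype=complex)
g_true[:, :4] = rng.standard_normal((R, 4)) + 1j * rng.standard_normal((R, 4))
G_true = np.fft.fft(g_true, axis=1)
G, P = np.zeros((R, K), dtype=complex), np.ones(K)
err = []
for t in range(1500):
    U = (rng.standard_normal((R, K)) + 1j * rng.standard_normal((R, K))) / np.sqrt(2)
    Y_mbf = np.sum(np.conj(G_true) * U, axis=0)
    G, Y, P = anc_nlms_step(G, U, Y_mbf, U, P)  # U doubles as Z here
    err.append(np.mean(np.abs(Y) ** 2))
print(err[0], err[-1])
```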



Estimation

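MBF components:

Estimated in a two-step procedure:

• Estimate the ATF ratios $A_m^*(e^{j\omega})/A_1^*(e^{j\omega})$ and $B_m^*(e^{j\omega})/B_1^*(e^{j\omega})$ by exploiting the nonstationarity of the signals

• Calculate $\mathbf{W}_0(e^{j\omega})$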




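An unbiased estimate of $A_m^*(e^{j\omega})/A_1^*(e^{j\omega})$ and $B_m^*(e^{j\omega})/B_1^*(e^{j\omega})$ is obtained by applying LS to

$$\begin{bmatrix} \Phi^{(1)}_{z_m z_1}(e^{j\omega}) \\ \Phi^{(2)}_{z_m z_1}(e^{j\omega}) \\ \vdots \\ \Phi^{(K)}_{z_m z_1}(e^{j\omega}) \end{bmatrix} = \begin{bmatrix} \Phi^{(1)}_{z_1 z_1}(e^{j\omega}) & 1 \\ \Phi^{(2)}_{z_1 z_1}(e^{j\omega}) & 1 \\ \vdots & \vdots \\ \Phi^{(K)}_{z_1 z_1}(e^{j\omega}) & 1 \end{bmatrix} \begin{bmatrix} H_m(e^{j\omega}) \\ \Phi_{u_m z_1}(e^{j\omega}) \end{bmatrix} + \begin{bmatrix} \varepsilon^{(1)}_m(e^{j\omega}) \\ \varepsilon^{(2)}_m(e^{j\omega}) \\ \vdots \\ \varepsilon^{(K)}_m(e^{j\omega}) \end{bmatrix}$$

where the superscript $(k)$ indexes $K$ time segments (a separate set of equations is used for each $m = 2,\dots,M$).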
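The LS step can be illustrated at a single frequency bin. The sketch assumes NumPy, exact (rather than estimated) PSD values per segment, and the convention $\Phi_{xy} = E\{x\,y^*\}$; under that convention the fitted coefficient equals $A_m/A_1$, whose conjugate is the ratio $A_m^*/A_1^*$ used on the slides. The ATF values, segment powers, noise PSDs and the name `estimate_rtf` are hypothetical. In practice each $\Phi^{(k)}$ would be estimated by averaging STFT frames within segment $k$, which introduces the errors $\varepsilon^{(k)}_m$.

```python
import numpy as np


def estimate_rtf(phi_z1z1, phi_zmz1):
    """LS fit of phi_zmz1[k] = H_m * phi_z1z1[k] + phi_umz1 over K segments (one bin)."""
    X = np.column_stack([phi_z1z1, np.ones(len(phi_z1z1))]).astype(complex)
    (h_m, phi_umz1), *_ = np.linalg.lstsq(X, phi_zmz1, rcond=None)
    return h_m, phi_umz1


rng = np.random.default_rng(0)
K = 20
a1, am = 1.0 + 0.5j, -0.3 + 0.8j               # hypothetical ATFs at microphones 1 and m
sigma = rng.uniform(0.1, 5.0, K)              # nonstationary speech power per segment
phi_n11, phi_nm1 = 0.2, 0.05 - 0.02j          # stationary noise auto- and cross-PSDs
phi_z1z1 = sigma * abs(a1) ** 2 + phi_n11     # Phi^{(k)}_{z1 z1}
phi_zmz1 = sigma * am * np.conj(a1) + phi_nm1 # Phi^{(k)}_{zm z1} = E{z_m z_1^*}
h_m, c_m = estimate_rtf(phi_z1z1, phi_zmz1)
print(np.isclose(h_m, am / a1))  # True
```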




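BM components:

The estimation method depends on the type of frame:

• Only one speech signal is active: $A_m^*(e^{j\omega})/A_1^*(e^{j\omega})$ or $B_m^*(e^{j\omega})/B_1^*(e^{j\omega})$ is adapted, and $\mathbf{H}(e^{j\omega})$ is calculated from the ratios

• Double talk: $Q_m(e^{j\omega})$ and $L_m(e^{j\omega})$ are estimated directly by applying LS to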




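$$\begin{bmatrix} \Phi^{(1)}_{z_m z_1}(e^{j\omega}) \\ \Phi^{(2)}_{z_m z_1}(e^{j\omega}) \\ \vdots \\ \Phi^{(K)}_{z_m z_1}(e^{j\omega}) \end{bmatrix} = \begin{bmatrix} \Phi^{(1)}_{z_1 z_1}(e^{j\omega}) & \Phi^{(1)}_{z_2 z_1}(e^{j\omega}) & 1 \\ \Phi^{(2)}_{z_1 z_1}(e^{j\omega}) & \Phi^{(2)}_{z_2 z_1}(e^{j\omega}) & 1 \\ \vdots & \vdots & \vdots \\ \Phi^{(K)}_{z_1 z_1}(e^{j\omega}) & \Phi^{(K)}_{z_2 z_1}(e^{j\omega}) & 1 \end{bmatrix} \begin{bmatrix} -Q_m(e^{j\omega}) \\ -L_m(e^{j\omega}) \\ \Phi_{u_m z_1}(e^{j\omega}) \end{bmatrix} + \begin{bmatrix} \varepsilon^{(1)}_m(e^{j\omega}) \\ \varepsilon^{(2)}_m(e^{j\omega}) \\ \vdots \\ \varepsilon^{(K)}_m(e^{j\omega}) \end{bmatrix}$$

(a separate set of equations is used for each $m = 3,\dots,M$)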
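The double-talk fit can be sketched the same way. The sketch assumes NumPy, exact segment-wise PSD matrices built from hypothetical ATFs and a hypothetical stationary noise cross-PSD matrix, and the convention $\Phi_{xy} = E\{x\,y^*\}$. Under that convention the two fitted coefficients equal $-Q_m^*$ and $-L_m^*$, so the function conjugates them before returning; this conjugation reflects the assumed convention, not something stated on the slides. The check at the end confirms that the resulting column of $\mathbf{H}$ blocks both sources.

```python
import numpy as np


def estimate_q_l(phi_z1z1, phi_z2z1, phi_zmz1):
    """Double-talk LS fit over K segments at one bin; returns (Q_m, L_m).

    Fits phi_zmz1[k] = c1 * phi_z1z1[k] + c2 * phi_z2z1[k] + phi_umz1.
    With Phi_xy = E{x y^*} (assumed), c1 = -Q_m^* and c2 = -L_m^*.
    """
    X = np.column_stack([phi_z1z1, phi_z2z1, np.ones(len(phi_z1z1))]).astype(complex)
    (c1, c2, _), *_ = np.linalg.lstsq(X, phi_zmz1, rcond=None)
    return -np.conj(c1), -np.conj(c2)


rng = np.random.default_rng(0)
M, m, K = 4, 2, 30                                       # 0-based m = 2 is microphone 3
a = rng.standard_normal(M) + 1j * rng.standard_normal(M) # hypothetical desired ATFs
b = rng.standard_normal(M) + 1j * rng.standard_normal(M) # hypothetical interference ATFs
N = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Phi_nn = 0.1 * N @ N.conj().T                            # stationary noise cross-PSD matrix
s1_pow, s2_pow = rng.uniform(0.1, 5.0, (2, K))           # both sources active, varying powers
Phi = [p1 * np.outer(a, a.conj()) + p2 * np.outer(b, b.conj()) + Phi_nn
       for p1, p2 in zip(s1_pow, s2_pow)]                # Phi[k][i, j] = E{z_i z_j^*}
Q, L = estimate_q_l(np.array([P[0, 0] for P in Phi]),
                    np.array([P[1, 0] for P in Phi]),
                    np.array([P[m, 0] for P in Phi]))
h = np.zeros(M, dtype=complex)
h[0], h[1], h[m] = Q, L, 1.0                             # column of H for microphone m
print(abs(np.vdot(a, h)), abs(np.vdot(b, h)))            # both at round-off level
```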



DTF-GSC Performance Analysis

• General expression for the output power spectral density:

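$$\begin{aligned}
\Phi_{yy}(t,e^{j\omega}) = {} & \mathbf{W}_0^\dagger \boldsymbol{\Phi}_{ZZ} \mathbf{W}_0 \\
& - \mathbf{W}_0^\dagger \boldsymbol{\Phi}_{NN} \mathbf{H} \left(\mathbf{H}^\dagger \boldsymbol{\Phi}_{NN} \mathbf{H}\right)^{-1} \mathbf{H}^\dagger \boldsymbol{\Phi}_{ZZ} \mathbf{W}_0 \\
& - \mathbf{W}_0^\dagger \boldsymbol{\Phi}_{ZZ} \mathbf{H} \left(\mathbf{H}^\dagger \boldsymbol{\Phi}_{NN} \mathbf{H}\right)^{-1} \mathbf{H}^\dagger \boldsymbol{\Phi}_{NN} \mathbf{W}_0 \\
& + \mathbf{W}_0^\dagger \boldsymbol{\Phi}_{NN} \mathbf{H} \left(\mathbf{H}^\dagger \boldsymbol{\Phi}_{NN} \mathbf{H}\right)^{-1} \mathbf{H}^\dagger \boldsymbol{\Phi}_{ZZ} \mathbf{H} \left(\mathbf{H}^\dagger \boldsymbol{\Phi}_{NN} \mathbf{H}\right)^{-1} \mathbf{H}^\dagger \boldsymbol{\Phi}_{NN} \mathbf{W}_0
\end{aligned}$$

where, for brevity, the argument $e^{j\omega}$ of $\mathbf{W}_0$ and $\mathbf{H}$ and the arguments $(t,e^{j\omega})$ of $\boldsymbol{\Phi}_{ZZ}$ and $\boldsymbol{\Phi}_{NN}$ are omitted.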

• The output PSD depends on:

– The input signal PSD
– The noise signal PSD
– The ATF ratios of the signals
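Expanding the products shows that this expression equals $(\mathbf{W}_0 - \mathbf{H}\mathbf{G})^\dagger\boldsymbol{\Phi}_{ZZ}(\mathbf{W}_0 - \mathbf{H}\mathbf{G})$ with $\mathbf{G} = (\mathbf{H}^\dagger\boldsymbol{\Phi}_{NN}\mathbf{H})^{-1}\mathbf{H}^\dagger\boldsymbol{\Phi}_{NN}\mathbf{W}_0$, i.e., the ANC at the Wiener solution computed from the noise statistics. The NumPy sketch below evaluates the four terms and checks this identity numerically for random (hypothetical) matrices at one bin and one frame; `output_psd` is an illustrative name.

```python
import numpy as np


def output_psd(W0, H, Phi_zz, Phi_nn):
    """Phi_yy from the four-term expression (one frequency bin, one frame)."""
    Wh, Hh = W0.conj(), H.conj().T
    R = np.linalg.inv(Hh @ Phi_nn @ H)  # (H^H Phi_NN H)^{-1}
    val = (Wh @ Phi_zz @ W0
           - Wh @ Phi_nn @ H @ R @ Hh @ Phi_zz @ W0
           - Wh @ Phi_zz @ H @ R @ Hh @ Phi_nn @ W0
           + Wh @ Phi_nn @ H @ R @ Hh @ Phi_zz @ H @ R @ Hh @ Phi_nn @ W0)
    return val.real  # the middle terms are complex conjugates, so the sum is real


rng = np.random.default_rng(0)
M = 5


def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


W0, H = crandn(M), crandn(M, M - 2)
X = crandn(M, M)
Phi_nn = X @ X.conj().T + np.eye(M)                 # Hermitian positive-definite noise PSD
a = crandn(M)
Phi_zz = Phi_nn + 2.0 * np.outer(a, a.conj())       # noise plus a rank-one speech term
G = np.linalg.solve(H.conj().T @ Phi_nn @ H, H.conj().T @ Phi_nn @ W0)  # Wiener ANC
W = W0 - H @ G
direct = np.vdot(W, Phi_zz @ W).real                # (W0 - HG)^H Phi_ZZ (W0 - HG)
print(np.isclose(output_psd(W0, H, Phi_zz, Phi_nn), direct))  # True
```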




Output power density

• Linear array of 10 microphones

• Delay-only ATFs for speech and noise

• Maintains the desired signal at θ = 90°

• Blocks the directional noise from θ = 120°

• Blocks the interference from θ = 60°



PSD deviation

$$\mathrm{DEV}(t,e^{j\omega}) = \frac{\Phi^{s_1}_{yy}(t,e^{j\omega})}{|F(e^{j\omega})|^2\,|A_1(e^{j\omega})|^2\,\Phi_{s_1 s_1}(t,e^{j\omega})}$$


• Delay-only ATFs for speech

• Directional noise field

• Desired signal from θ = 90°

• Up to 4 dB distortion at frequencies below 3000 Hz

[Figure: $\Phi_{yy}$ [dB] of the desired signal versus frequency [Hz] (0 to 4000) and θ [deg] (87 to 93).]



Noise Reduction

$$\mathrm{NR}(t,e^{j\omega}) = \frac{\Phi^{n}_{yy}(t,e^{j\omega})}{|F(e^{j\omega})|^2\,|D_1(e^{j\omega})|^2\,\Phi_{nn}(t,e^{j\omega})}$$



• Directional noise signal from θ = 120°

• 50 dB attenuation in the noise direction

[Figure: output noise $\Phi_{yy}$ [dB] versus frequency [Hz] (0 to 4000) and θ [deg] (115 to 125).]



Interference Reduction

$$\mathrm{NIR}(t,e^{j\omega}) = \frac{\Phi^{s_2}_{yy}(t,e^{j\omega})}{|F(e^{j\omega})|^2\,|B_1(e^{j\omega})|^2\,\Phi_{s_2 s_2}(t,e^{j\omega})}$$



• Directional noise field

• Interference signal from θ = 60°

• 50 dB attenuation in the interference direction

[Figure: output interference $\Phi_{yy}$ [dB] versus frequency [Hz] (0 to 4000) and θ [deg] (55 to 65).]
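The analysis setup (10-microphone linear array, delay-only ATFs, desired signal at 90°, interference at 60°, directional noise at 120°) can be reproduced in an idealized form to compute DEV, NR and NIR. The sketch assumes NumPy, a speed of sound of 343 m/s, a hypothetical 8 cm microphone spacing, unit source and noise PSDs, $F = 1$, a weak spatially white noise floor (so that $\mathbf{H}^\dagger\boldsymbol{\Phi}_{NN}\mathbf{H}$ is invertible), and an ANC at its Wiener solution. The numbers it prints come from this model only and are not the values shown in the figures.

```python
import numpy as np

C, D_SPACING, M = 343.0, 0.08, 10  # speed of sound [m/s]; spacing [m] is hypothetical


def steering(theta_deg, f):
    """Delay-only ATFs of a uniform linear array, microphone 1 as the reference."""
    tau = np.arange(M) * D_SPACING * np.cos(np.deg2rad(theta_deg)) / C
    return np.exp(-2j * np.pi * f * tau)


def matched_beamformer(A, B, F=1.0):
    nA, nB = np.linalg.norm(A), np.linalg.norm(B)
    rho = np.vdot(B, A) / (nA * nB)
    return F / (1 - abs(rho) ** 2) * (A / nA**2 - rho * B / (nA * nB))


def blocking_matrix(A, B):
    alpha, beta = np.conj(A) / np.conj(A[0]), np.conj(B) / np.conj(B[0])
    den = alpha[1] - beta[1]
    H = np.zeros((len(A), len(A) - 2), dtype=complex)
    for col, m in enumerate(range(2, len(A))):
        H[0, col] = -(alpha[1] * beta[m] - beta[1] * alpha[m]) / den  # Q_m
        H[1, col] = -(alpha[m] - beta[m]) / den                       # L_m
        H[m, col] = 1.0
    return H


def db(x):
    return 10 * np.log10(max(x, 1e-30))  # floor avoids log10(0)


def measures_db(f, eps=1e-3):
    """(DEV, NR, NIR) in dB at frequency f [Hz]."""
    A, B, Dn = steering(90, f), steering(60, f), steering(120, f)
    W0, H = matched_beamformer(A, B), blocking_matrix(A, B)
    # Directional noise from 120 degrees plus a weak white floor (assumed)
    Phi_nn = np.outer(Dn, Dn.conj()) + eps * np.eye(M)
    G = np.linalg.solve(H.conj().T @ Phi_nn @ H, H.conj().T @ Phi_nn @ W0)  # Wiener ANC
    W = W0 - H @ G  # overall beamformer, Y = W^H Z
    dev = db(abs(np.vdot(W, A)) ** 2 / abs(A[0]) ** 2)
    nr = db(np.vdot(W, Phi_nn @ W).real / abs(Dn[0]) ** 2)
    nir = db(abs(np.vdot(W, B)) ** 2 / abs(B[0]) ** 2)
    return dev, nr, nir


for f in (500.0, 1000.0, 2000.0):
    print(f, ["%.1f" % v for v in measures_db(f)])
```

Because $\mathbf{H}$ blocks both speech sources, the ANC can only remove noise; the desired response stays fixed at $F$, so DEV is 0 dB at the exact look direction, and the deviation reported on the slides arises only away from θ = 90°.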



Experimental study

• Speech signal

• Simulated ATFs in two noise fields:

– Directional noise
– Diffuse noise

• Sonograms

• Performance evaluation



Sonograms

[Figure: sonograms, panels (a) to (f), each showing frequency [Hz] (0 to 4000) versus time [sec] (0 to 8), with a color scale from 5 to 50.]



Performance evaluation

Noise and interference reduction (all values in dB) in a directional noise field (first row) and in a diffuse noise field (second row):

Noise field  | Input        | Output of MBF | Output of BM | Output of DTF-GSC
             | S1NR  S1S2R  | S1NR  S1S2R   | S1NR  S2NR   | S1NR  S1S2R
Directional  | 11.3  2.3    | 13.8  16.9    | -3.9  -4.5   | 34.6  12.7
Diffuse      | 12.7  2.3    | 17.4  25      | -3.8  -3.5   | 20.9  22.6



Application: joint noise reduction and echo cancellation

• M ≥ 3 microphones

• One desired speech signal

• One competing speech signal (echo)

• One directional/ambient noise signal

• Arbitrary acoustic transfer functions (ATFs)

[Figure: microphone array, remote speech signal (echo), noise source, ambient noise, and the speech enhancement system.]

$$z_m(t) = a_m(t) \ast s_1(t) + b_m(t) \ast e(t) + n_m(t), \qquad m = 1,\dots,M$$



Cascade schemes

• AEC-BF: multichannel AEC followed by a beamformer

– The beamformer inputs contain less echo
– The multichannel AEC performance deteriorates due to noise

• BF-AEC: beamformer followed by a single-channel AEC

– The AEC input contains less noise
– The beamformer also suppresses echo, although the AEC performs better
– The AEC suffers from fast variations of the echo path caused by the beamformer



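[Block diagram of the AEC-BF cascade: the echo reference $E(t,e^{j\omega})$ is filtered by $G_{E1},\dots,G_{EM}$ and subtracted from $Z_1,\dots,Z_M$ to form $Z_{\mathrm{AEC}1},\dots,Z_{\mathrm{AEC}M}$, which feed a TF-GSC ($\mathbf{W}_0^\dagger$; $\mathbf{H}^\dagger$ with outputs $U_2,\dots,U_M$; noise canceller $G_{N2},\dots,G_{NM}$) producing $Y_{\mathrm{MBF}}$, $Y_{\mathrm{NC}}$ and the output $Y(t,e^{j\omega})$.]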



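[Block diagram of the BF-AEC cascade: $Z_1,\dots,Z_M$ feed a TF-GSC ($\mathbf{W}_0^\dagger$; $\mathbf{H}^\dagger$ with outputs $U_2,\dots,U_M$; noise canceller $G_{N2},\dots,G_{NM}$) with output $Y_{\mathrm{BF}}$; the echo reference $E(t,e^{j\omega})$ is filtered by $G_E(t,e^{j\omega})$ and subtracted from $Y_{\mathrm{BF}}$ to give the output $Y(t,e^{j\omega})$.]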



ETF-GSC scheme

• Matched beamformer (MBF)

– Maintains the desired signal

• Blocking unit (BU)

– Blocks both the desired and the echo signals

• Adaptive noise and echo canceller (ANEC)

– The noise canceller and the echo canceller work in parallel
– The echo reference signal is used to create additional interference reference signals for the ANEC



[Block diagram of the ETF-GSC: inputs $Z_1,\dots,Z_M$ and the echo reference $E(t,e^{j\omega})$; blocks $\mathbf{F}_0^\dagger$ and $\mathbf{H}^\dagger$; blocking-unit outputs $U_2,\dots,U_M$ and $U'_2,\dots,U'_M$; adaptive filters $G_{N2},\dots,G_{NM}$, $G_{E1},\dots,G_{EM}$ and $G_{H2},\dots,G_{HM}$; intermediate signals $Y_{\mathrm{MBF}}$, $Y_{\mathrm{NC}}$, $Y_{\mathrm{EC}}$ and the output $Y(t,e^{j\omega})$.]




• Estimation

– MBF estimation is done as in the TF-GSC (during single talk)
– BM estimation is done as in the TF-GSC (during single talk)
– The noise canceller adapts during noise-only frames
– The echo canceller adapts during echo frames
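The gating of the two cancellers can be sketched per STFT frame. This is a minimal, simplified sketch assuming NumPy: the frame-type labels would come from a detector the slides do not describe, so they are passed in as hypothetical strings; the references `R` stand in for the echo-derived reference signals; and normalizing by the instantaneous reference power (instead of a smoothed estimate) is a simplification. The function name `anec_step` is illustrative and is not the ETF-GSC implementation.

```python
import numpy as np


def anec_step(GN, GE, U, R, Y_mbf, frame_type, mu=0.1, eps=1e-12):
    """One STFT frame of a parallel noise canceller and echo canceller (all K bins).

    GN, U : (Pn, K) noise-canceller filters and noise reference signals
    GE, R : (Pe, K) echo-canceller filters and echo-derived reference signals
    frame_type : hypothetical detector label; "noise" (noise only) adapts GN,
                 "echo" adapts GE, anything else freezes both filters
    """
    # Both cancellers act in parallel on the MBF output
    Y = Y_mbf - np.sum(np.conj(GN) * U, axis=0) - np.sum(np.conj(GE) * R, axis=0)
    if frame_type == "noise":
        GN = GN + mu * U * np.conj(Y) / (np.sum(np.abs(U) ** 2, axis=0) + eps)
    elif frame_type == "echo":
        GE = GE + mu * R * np.conj(Y) / (np.sum(np.abs(R) ** 2, axis=0) + eps)
    return GN, GE, Y


# Usage: echo-only frames generated by a hypothetical true echo-path filter GE_true
rng = np.random.default_rng(0)
K, Pn, Pe = 16, 2, 3
GE_true = rng.standard_normal((Pe, K)) + 1j * rng.standard_normal((Pe, K))
GN, GE = np.zeros((Pn, K), dtype=complex), np.zeros((Pe, K), dtype=complex)
for t in range(2000):
    U = rng.standard_normal((Pn, K)) + 1j * rng.standard_normal((Pn, K))
    R = rng.standard_normal((Pe, K)) + 1j * rng.standard_normal((Pe, K))
    Y_mbf = np.sum(np.conj(GE_true) * R, axis=0)
    GN, GE, Y = anec_step(GN, GE, U, R, Y_mbf, "echo")
print(np.max(np.abs(GE - GE_true)))
```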



Performance evaluation

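Echo suppression and noise reduction (all values in dB):

Input SNR | Input SER | Tested algorithm | Echo suppression: AEC | BF   | Total | Noise reduction: Total
15        | 5         | AEC-BF           | 18.5                  | 2.1  | 20.6  | 23.5
15        | 5         | BF-AEC           | 5.4                   | 6.2  | 11.7  | 24.7
15        | 5         | ETF-GSC          | n/a                   | n/a  | 37.7  | 23.1
5         | 15        | AEC-BF           | 3.9                   | 0.4  | 4.4   | 23.3
5         | 15        | BF-AEC           | 1.7                   | 6.8  | 8.6   | 24.4
5         | 15        | ETF-GSC          | n/a                   | n/a  | 18.1  | 23.0
15        | 15        | AEC-BF           | 12.1                  | 1.0  | 13.1  | 23.9
15        | 15        | BF-AEC           | 4.6                   | 5.8  | 10.4  | 24.5
15        | 15        | ETF-GSC          | n/a                   | n/a  | 29.7  | 23.6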



Sonograms

[Figure: sonograms, panels (a) to (d), each showing frequency [Hz] (0 to 4000) versus time [sec] (0 to 8), with a color scale from 5 to 50.]



Conclusions

• DTF-GSC algorithm

– GSC structure: modified MBF and BM
– New identification procedure for double-talk (DT) frames
– Application: BSS problem of convolutive mixtures with additive noise

• DTF-GSC performance analysis

– General expression for the output power spectral density
– Expected deviation imposed on the desired signal
– Noise reduction
– Interference reduction



• ETF-GSC

– Joint echo cancellation and noise reduction in a reverberant environment

– TF-GSC-based solution: BU and ANEC blocks (with the echo reference signal incorporated)

– Performance evaluation (during double talk) and comparison to cascade schemes



Future Research

• Two nonstationary speech signals in the presence of echo and stationary noise

[Figure: microphone array, echo signal, competing speech signal, noise source, ambient noise, and the speech enhancement system.]



• Speech enhancement using the DTF-GSC and postfiltering

– Less noise reduction is obtained in a diffuse noise field
– Postfiltering: known methods, or methods using the noise reference signals

• DTF-GSC using Relative Transfer Function (RTF) system identification

– Weighted least squares optimization criterion
– Smaller error variance and faster convergence

• Joint noise reduction and echo cancellation using the ETF-GSC and residual echo cancellation

– Residual echo due to misadjusted AEC filters and finite filter length
– A linear prediction error filter removes the short-term correlation of the residual echo
– The whitened residual echo is cancelled by a noise reduction filter
