Dual Transfer Function GSC and Application to Joint Noise Reduction and Acoustic Echo Cancellation
Gal Reuven
Under the supervision of Sharon Gannot¹ and Israel Cohen²
¹School of Engineering, Bar-Ilan University, Ramat-Gan
²Department of Electrical Engineering, Technion, Haifa
February, 2006
DTF-GSC AND APP. TO JOINT NR AND AEC Motivation
Motivation
• Interference degrades:
– Intelligibility
– Speech compression quality
– Speech recognition rates
[Figure: acoustic scenario with a microphone array, a desired speech signal, a competing speech signal, a noise source, ambient noise, and a speech enhancement system.]
Goal: speech enhancement by a joint interference and noise reduction system
Outline
• Problem presentation
• The DTF-GSC
• Estimation
• Performance analysis and experimental study
• Application: joint AEC and NR
– Cascade schemes
– ETF-GSC scheme
Problem Presentation
• M ≥ 3 microphones
• One desired speech signal
• One directional interference signal
• One directional/ambient noise signal
• Arbitrary acoustic transfer functions (ATFs)
[Figure: acoustic scenario with a microphone array, a desired speech signal, a competing speech signal, a noise source, ambient noise, and a speech enhancement system.]
$$z_m(t) = a_m(t) * s_1(t) + b_m(t) * s_2(t) + n_m(t), \qquad m = 1, \ldots, M$$
Time Domain Presentation
$$z_m(t) = a_m(t) * s_1(t) + b_m(t) * s_2(t) + n_m(t); \qquad m = 1, \ldots, M$$
where
$a_m(t)$: the acoustic impulse response from the desired speech source to the m-th microphone
$b_m(t)$: the acoustic impulse response from the nonstationary interference source to the m-th microphone
$s_1(t)$: the desired speech signal
$s_2(t)$: the nonstationary interference signal
$n_m(t)$: the (directional or nondirectional) stationary noise signal at the m-th microphone
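The time-domain model can be made concrete with a short sketch: each microphone signal is the sum of two convolutions and an additive noise term. This is a minimal pure-Python illustration, assuming finite impulse responses and source and noise signals of equal length T (convolution tails beyond T samples are dropped); the function names and toy values are hypothetical.

```python
def convolve(h, x):
    """Full linear convolution (h * x)[t] of two finite sequences."""
    y = [0.0] * (len(h) + len(x) - 1)
    for i, hv in enumerate(h):
        for j, xv in enumerate(x):
            y[i + j] += hv * xv
    return y


def mic_signals(a, b, s1, s2, n):
    """Return [z_1, ..., z_M] with z_m[t] = (a_m * s1)[t] + (b_m * s2)[t] + n_m[t].

    a, b: lists of M impulse responses; s1, s2: source signals;
    n: list of M noise signals. All signals are assumed to share length T.
    """
    T = len(s1)
    z = []
    for a_m, b_m, n_m in zip(a, b, n):
        desired = convolve(a_m, s1)[:T]          # a_m(t) * s1(t), truncated to T samples
        interference = convolve(b_m, s2)[:T]     # b_m(t) * s2(t), truncated to T samples
        z.append([desired[t] + interference[t] + n_m[t] for t in range(T)])
    return z


if __name__ == "__main__":
    # Toy example with M = 2 microphones and hypothetical impulse responses
    a = [[1.0, 0.5], [0.0, 0.8]]
    b = [[0.0, 0.0, 0.3], [0.6]]
    s1 = [1.0, 0.0, 0.0, 0.0]      # impulse from the desired source at t = 0
    s2 = [0.0, 1.0, 0.0, 0.0]      # impulse from the interferer at t = 1
    n = [[0.01] * 4, [0.0] * 4]
    for m, z_m in enumerate(mic_signals(a, b, s1, s2, n), start=1):
        print(f"z_{m} = {z_m}")
```

In the frequency domain (next slide) each convolution becomes, approximately, a per-bin multiplication by the corresponding ATF.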
Frequency Domain Presentation
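STFT:
$$Z(t, e^{j\omega}) = A(e^{j\omega}) S_1(t, e^{j\omega}) + B(e^{j\omega}) S_2(t, e^{j\omega}) + N(t, e^{j\omega})$$
where
$$Z(t, e^{j\omega}) = \begin{bmatrix} Z_1(t, e^{j\omega}) & Z_2(t, e^{j\omega}) & \cdots & Z_M(t, e^{j\omega}) \end{bmatrix}^T$$
$$A(e^{j\omega}) = \begin{bmatrix} A_1(e^{j\omega}) & A_2(e^{j\omega}) & \cdots & A_M(e^{j\omega}) \end{bmatrix}^T$$
$$B(e^{j\omega}) = \begin{bmatrix} B_1(e^{j\omega}) & B_2(e^{j\omega}) & \cdots & B_M(e^{j\omega}) \end{bmatrix}^T$$
$$N(t, e^{j\omega}) = \begin{bmatrix} N_1(t, e^{j\omega}) & N_2(t, e^{j\omega}) & \cdots & N_M(t, e^{j\omega}) \end{bmatrix}^T$$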
Goal
• Reconstruct the desired speech signal in an environment that contains
– Reverberation
– A competing speech signal (double talk)
– Stationary noise
• Applications
– Blind source separation (BSS)
– Acoustic echo cancellation (AEC)
• Methods
– Extend the TF-GSC so that it places a null in the interference direction
– Exploit the nonstationarity of the desired and interference signals
Dual Source Transfer-Function Generalized Sidelobe Canceller (DTF-GSC)
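[Figure: DTF-GSC block diagram. The microphone signals $Z_1, \ldots, Z_M$ feed the MBF $W_0^\dagger$ (output $Y_{\mathrm{MBF}}(t, e^{j\omega})$) and the BM $H^\dagger$ (outputs $U_3, \ldots, U_M$); the ANC filters $G_3, \ldots, G_M$ produce $Y_{\mathrm{NC}}(t, e^{j\omega})$, which is subtracted from $Y_{\mathrm{MBF}}$ to give the output $Y(t, e^{j\omega})$.]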
Method
Extend the TF-GSC (transfer-function GSC) to handle a nonstationary interference
• Matched beamformer (MBF)
– Distortionless in the desired direction while blocking the interference direction
• Blocking matrix (BM)
– Blocks both the desired and the interference directions
• Adaptive noise canceller (ANC)
– Estimates the residual noise at the MBF output using the reference signals produced by the BM
Matched beamformer
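ATF-ratio matched filter:
$$W_0(e^{j\omega}) = \frac{1}{1 - |\rho(e^{j\omega})|^2}\left(\frac{A(e^{j\omega})}{\|A(e^{j\omega})\|^2} - \rho(e^{j\omega})\,\frac{B(e^{j\omega})}{\|A(e^{j\omega})\|\,\|B(e^{j\omega})\|}\right)F(e^{j\omega})$$
$$\rho(e^{j\omega}) \equiv \frac{B^\dagger(e^{j\omega})A(e^{j\omega})}{\|A(e^{j\omega})\|\,\|B(e^{j\omega})\|}$$
It is easily verified that:
• $A^\dagger(e^{j\omega})W_0(e^{j\omega}) = F(e^{j\omega})$
• $B^\dagger(e^{j\omega})W_0(e^{j\omega}) = 0$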
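The two identities can be checked numerically. The sketch below builds $W_0$ for a single frequency bin from arbitrary ATF vectors using plain Python complex arithmetic and verifies $A^\dagger W_0 = F$ and $B^\dagger W_0 = 0$. The ATF values and the helper names (`hdot`, `vnorm`, `mbf_weights`) are hypothetical; the formula itself is the one above.

```python
def hdot(x, y):
    """Hermitian inner product x^H y of two complex vectors (lists)."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


def vnorm(x):
    """Euclidean norm ||x||."""
    return abs(hdot(x, x)) ** 0.5


def mbf_weights(A, B, F=1.0):
    """ATF-ratio matched beamformer W0 for a single frequency bin.

    A, B: ATF vectors (length M) of the desired and interference sources.
    F: desired response, so that A^H W0 = F and B^H W0 = 0.
    Undefined when A and B are collinear (|rho| = 1).
    """
    nA, nB = vnorm(A), vnorm(B)
    rho = hdot(B, A) / (nA * nB)          # rho = B^H A / (||A|| ||B||)
    scale = F / (1.0 - abs(rho) ** 2)
    return [scale * (a / nA ** 2 - rho * b / (nA * nB)) for a, b in zip(A, B)]


if __name__ == "__main__":
    A = [1.0 + 0j, 0.5 + 0.5j, -0.3 + 0.8j, 0.9 - 0.2j]   # hypothetical desired-source ATFs
    B = [1.0 + 0j, -0.7 + 0.1j, 0.4 - 0.6j, 0.2 + 0.9j]   # hypothetical interference ATFs
    W0 = mbf_weights(A, B, F=1.0)
    print("A^H W0 =", hdot(A, W0))   # equals F up to rounding
    print("B^H W0 =", hdot(B, W0))   # ~0: null toward the interference
```

Note the factor $1/(1-|\rho|^2)$: when the two ATF vectors are nearly collinear (e.g., closely spaced sources or low frequencies) the weights grow large, which is one reason the MBF can amplify noise.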
Blocking Matrix
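$$H(e^{j\omega}) = \begin{bmatrix} Q_3(e^{j\omega}) & Q_4(e^{j\omega}) & \cdots & Q_M(e^{j\omega}) \\ L_3(e^{j\omega}) & L_4(e^{j\omega}) & \cdots & L_M(e^{j\omega}) \\ 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}$$
$$Q_m(e^{j\omega}) = -\frac{\dfrac{A_2^*(e^{j\omega})}{A_1^*(e^{j\omega})}\,\dfrac{B_m^*(e^{j\omega})}{B_1^*(e^{j\omega})} - \dfrac{B_2^*(e^{j\omega})}{B_1^*(e^{j\omega})}\,\dfrac{A_m^*(e^{j\omega})}{A_1^*(e^{j\omega})}}{\dfrac{A_2^*(e^{j\omega})}{A_1^*(e^{j\omega})} - \dfrac{B_2^*(e^{j\omega})}{B_1^*(e^{j\omega})}}; \qquad m = 3, \ldots, M$$
$$L_m(e^{j\omega}) = -\frac{\dfrac{A_m^*(e^{j\omega})}{A_1^*(e^{j\omega})} - \dfrac{B_m^*(e^{j\omega})}{B_1^*(e^{j\omega})}}{\dfrac{A_2^*(e^{j\omega})}{A_1^*(e^{j\omega})} - \dfrac{B_2^*(e^{j\omega})}{B_1^*(e^{j\omega})}}; \qquad m = 3, \ldots, M$$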
Blocking Matrix
It is easily verified that:
• $A^\dagger(e^{j\omega})H(e^{j\omega}) = 0$
• $B^\dagger(e^{j\omega})H(e^{j\omega}) = 0$
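A minimal sketch of the blocking matrix for one frequency bin, built directly from the closed-form $Q_m$ and $L_m$ and checked against both blocking conditions. It assumes $A_1, B_1 \neq 0$ and $A_2^*/A_1^* \neq B_2^*/B_1^*$ (otherwise the denominator vanishes); the ATF values and function names are hypothetical.

```python
def hdot(x, y):
    """Hermitian inner product x^H y."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


def blocking_matrix(A, B):
    """M x (M-2) blocking matrix H (list of M rows) for one frequency bin.

    Column m (m = 3..M) is [Q_m, L_m, 0, ..., 1, ..., 0]^T, with the 1 in row m.
    """
    M = len(A)
    a = [x.conjugate() / A[0].conjugate() for x in A]   # a[i] = A_{i+1}^* / A_1^*
    b = [x.conjugate() / B[0].conjugate() for x in B]   # b[i] = B_{i+1}^* / B_1^*
    den = a[1] - b[1]                                   # A_2^*/A_1^* - B_2^*/B_1^*
    H = [[0j] * (M - 2) for _ in range(M)]
    for col, i in enumerate(range(2, M)):     # 0-based row i is microphone m = i + 1
        H[0][col] = -(a[1] * b[i] - b[1] * a[i]) / den   # Q_m
        H[1][col] = -(a[i] - b[i]) / den                 # L_m
        H[i][col] = 1.0 + 0j
    return H


def response(V, H):
    """Row vector V^H H: response of each BM output to a source with ATF vector V."""
    return [hdot(V, [row[col] for row in H]) for col in range(len(H[0]))]


if __name__ == "__main__":
    A = [1.0 + 0j, 0.5 + 0.5j, -0.3 + 0.8j, 0.9 - 0.2j, 0.1 + 0.4j]   # hypothetical
    B = [1.0 + 0j, -0.7 + 0.1j, 0.4 - 0.6j, 0.2 + 0.9j, -0.5 - 0.5j]  # hypothetical
    H = blocking_matrix(A, B)
    print("A^H H =", response(A, H))   # ~[0, 0, 0]
    print("B^H H =", response(B, H))   # ~[0, 0, 0]
```

Because two directions are blocked, only $M-2$ reference signals remain, which is why the DTF-GSC needs $M \geq 3$ microphones.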
Adaptive Noise Canceller
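Normalized LMS:
$$G_m(t+1, e^{j\omega}) = G_m(t, e^{j\omega}) + \mu\,\frac{U_m(t, e^{j\omega})\,Y^*(t, e^{j\omega})}{P_{\mathrm{est}}(t, e^{j\omega})}$$
$$G_m(t+1, e^{j\omega}) \xleftarrow{\ \mathrm{FIR}\ } G_m(t+1, e^{j\omega})$$
for m = 3, . . . , M, where the FIR arrow denotes constraining the updated filter to a finite impulse response, and
$$P_{\mathrm{est}}(t, e^{j\omega}) = \eta\,P_{\mathrm{est}}(t-1, e^{j\omega}) + (1-\eta)\,\|Z(t, e^{j\omega})\|^2$$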
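The update can be sketched for a single frequency bin as follows. Assumptions (ours, for illustration only): the FIR-constraint step is omitted; $Z$, $U$ and $Y_{\mathrm{MBF}}$ are synthesized directly from one noise source instead of being computed through $W_0$ and $H$; the gains, $\mu = 0.1$ and $\eta = 0.9$ are hypothetical. The output is $Y = Y_{\mathrm{MBF}} - G^\dagger U$, which is the error signal whose conjugate drives the update above.

```python
import random


def hdot(x, y):
    """Hermitian inner product x^H y."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


def anc_nlms(frames, mu=0.1, eta=0.9, eps=1e-12):
    """Run the NLMS noise canceller in one frequency bin.

    frames: iterable of (Z, U, Y_MBF), with Z the microphone vector,
    U = [U_3, ..., U_M] the BM outputs and Y_MBF the matched-beamformer output.
    Returns the outputs Y = Y_MBF - G^H U. The FIR constraint is omitted.
    """
    G, p_est, out = None, 0.0, []
    for Z, U, y_mbf in frames:
        if G is None:
            G = [0j] * len(U)
        y = y_mbf - hdot(G, U)                                    # Y = Y_MBF - Y_NC
        p_est = eta * p_est + (1 - eta) * sum(abs(z) ** 2 for z in Z)
        G = [g + mu * u * y.conjugate() / (p_est + eps) for g, u in zip(G, U)]
        out.append(y)
    return out


def noise_only_demo(num_frames=2000, seed=0):
    """Noise-only frames: a single noise source n leaks into Z, U and Y_MBF."""
    rng = random.Random(seed)
    d = [1.0, 0.7 - 0.2j, -0.4 + 0.9j, 0.3 + 0.5j]   # hypothetical noise ATFs (M = 4)
    c = [0.8 + 0.3j, -0.5 + 0.6j]                    # hypothetical noise gains in U_3, U_4
    g = 0.6 - 0.4j                                   # hypothetical residual-noise gain in Y_MBF
    frames = []
    for _ in range(num_frames):
        n = complex(rng.gauss(0, 1), rng.gauss(0, 1))
        frames.append(([dm * n for dm in d], [cm * n for cm in c], g * n))
    return anc_nlms(frames)


if __name__ == "__main__":
    y = noise_only_demo()
    power = lambda seg: sum(abs(v) ** 2 for v in seg) / len(seg)
    print("residual power, first 100 frames:", power(y[:100]))
    print("residual power, last 100 frames: ", power(y[-100:]))
```

Since $P_{\mathrm{est}} \geq (1-\eta)\|Z\|^2$, the effective step size stays bounded even for a sudden burst of reference energy, which is the usual motivation for normalizing the LMS step.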
Estimation
MBF components:
Done in a two-step procedure:
• Estimating $A_m^*(e^{j\omega})/A_1^*(e^{j\omega})$ and $B_m^*(e^{j\omega})/B_1^*(e^{j\omega})$ by exploiting nonstationarity
• Calculating $W_0(e^{j\omega})$
Estimation
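An unbiased estimate of $A_m^*(e^{j\omega})/A_1^*(e^{j\omega})$ and $B_m^*(e^{j\omega})/B_1^*(e^{j\omega})$ is obtained by applying LS to
$$\begin{bmatrix} \Phi^{(1)}_{z_m z_1}(e^{j\omega}) \\ \Phi^{(2)}_{z_m z_1}(e^{j\omega}) \\ \vdots \\ \Phi^{(K)}_{z_m z_1}(e^{j\omega}) \end{bmatrix} = \begin{bmatrix} \Phi^{(1)}_{z_1 z_1}(e^{j\omega}) & 1 \\ \Phi^{(2)}_{z_1 z_1}(e^{j\omega}) & 1 \\ \vdots & \vdots \\ \Phi^{(K)}_{z_1 z_1}(e^{j\omega}) & 1 \end{bmatrix} \begin{bmatrix} H_m(e^{j\omega}) \\ \Phi_{u_m z_1}(e^{j\omega}) \end{bmatrix} + \begin{bmatrix} \varepsilon^{(1)}_m(e^{j\omega}) \\ \varepsilon^{(2)}_m(e^{j\omega}) \\ \vdots \\ \varepsilon^{(K)}_m(e^{j\omega}) \end{bmatrix}$$
where $H_m(e^{j\omega})$ is the ratio being estimated and the superscript $(k)$ indexes the K time segments over which the PSDs are estimated (a separate set of equations is used for m = 2, . . . , M).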
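A minimal sketch of this two-unknown LS fit for one microphone pair and one frequency bin. Assumptions (ours): cross-PSDs use the convention $\Phi_{xy} = E[x^* y]$, under which the slope equals $A_m^*/A_1^*$; only the desired source is active; sensor noise is white and uncorrelated across microphones; PSD estimation errors are modeled as small Gaussian perturbations. All names and numbers are hypothetical. The LS determinant vanishes when $\Phi^{(k)}_{z_1 z_1}$ is the same in every segment, which is exactly why nonstationarity is needed.

```python
import random


def ls_ratio(phi_11, phi_m1):
    """LS fit of phi_m1[k] = h * phi_11[k] + c over K segments; returns (h, c).

    Solves the 2x2 normal equations of the system with rows [phi_11[k], 1].
    """
    K = len(phi_11)
    s_pp = sum(abs(p) ** 2 for p in phi_11)
    s_p = sum(p.conjugate() for p in phi_11)
    s_py = sum(p.conjugate() * y for p, y in zip(phi_11, phi_m1))
    s_y = sum(phi_m1)
    det = s_pp * K - abs(s_p) ** 2        # zero if the segment PSDs never change
    h = (K * s_py - s_p * s_y) / det
    c = (s_pp * s_y - s_p.conjugate() * s_py) / det
    return h, c


def simulate_single_talk(A1, Am, K=30, noise_var=0.1, err=1e-3, seed=1):
    """Synthetic per-segment PSDs with Phi_xy = E[x^* y] and white sensor noise."""
    rng = random.Random(seed)
    phi_11, phi_m1 = [], []
    for _ in range(K):
        sig = rng.uniform(0.2, 4.0)                          # speech PSD varies per segment
        e = complex(rng.gauss(0, err), rng.gauss(0, err))    # PSD estimation error
        phi_11.append(abs(A1) ** 2 * sig + noise_var + 0j)
        phi_m1.append(Am.conjugate() * A1 * sig + e)
    return phi_11, phi_m1


if __name__ == "__main__":
    A1, Am = 0.9 + 0.3j, -0.4 + 0.7j                # hypothetical ATF values
    h, c = ls_ratio(*simulate_single_talk(A1, Am))
    print("estimate:", h, " true A_m^*/A_1^*:", Am.conjugate() / A1.conjugate())
```

In this model the intercept plays the role of $\Phi_{u_m z_1}$ and equals $-(A_m^*/A_1^*)\,\sigma_n^2$, so it absorbs the stationary noise rather than biasing the slope.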
Estimation
BM components:
The estimation method depends on the frame type:
• Single speech signal active: $A_m^*(e^{j\omega})/A_1^*(e^{j\omega})$ or $B_m^*(e^{j\omega})/B_1^*(e^{j\omega})$ is adapted and $H(e^{j\omega})$ is calculated
• Double talk: $Q_m(e^{j\omega})$ and $L_m(e^{j\omega})$ are estimated directly by solving the following set of equations:
Estimation
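$$\begin{bmatrix} \Phi^{(1)}_{z_m z_1}(e^{j\omega}) \\ \Phi^{(2)}_{z_m z_1}(e^{j\omega}) \\ \vdots \\ \Phi^{(K)}_{z_m z_1}(e^{j\omega}) \end{bmatrix} = \begin{bmatrix} \Phi^{(1)}_{z_1 z_1}(e^{j\omega}) & \Phi^{(1)}_{z_2 z_1}(e^{j\omega}) & 1 \\ \Phi^{(2)}_{z_1 z_1}(e^{j\omega}) & \Phi^{(2)}_{z_2 z_1}(e^{j\omega}) & 1 \\ \vdots & \vdots & \vdots \\ \Phi^{(K)}_{z_1 z_1}(e^{j\omega}) & \Phi^{(K)}_{z_2 z_1}(e^{j\omega}) & 1 \end{bmatrix} \times \begin{bmatrix} -Q_m(e^{j\omega}) \\ -L_m(e^{j\omega}) \\ \Phi_{u_m z_1}(e^{j\omega}) \end{bmatrix} + \begin{bmatrix} \varepsilon^{(1)}_m(e^{j\omega}) \\ \varepsilon^{(2)}_m(e^{j\omega}) \\ \vdots \\ \varepsilon^{(K)}_m(e^{j\omega}) \end{bmatrix}$$
(a separate set of equations is used for m = 3, . . . , M)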
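The double-talk system can be checked with exact synthetic PSDs: the LS solution should reproduce the closed-form $Q_m$ and $L_m$ of the blocking matrix. Assumptions (ours): convention $\Phi_{xy} = E[x^* y]$, white sensor noise, both source PSDs varying independently over K segments; ATFs, PSD ranges and function names are hypothetical. The three columns are linearly independent only if both sources are nonstationary and $A_2^*/A_1^* \neq B_2^*/B_1^*$, the same condition that keeps the closed-form expressions finite.

```python
import random


def solve_ls(X, y):
    """Complex least squares min ||X theta - y|| via the normal equations.

    Adequate for tiny, well-conditioned problems; QR is preferable in practice.
    """
    n, K = len(X[0]), len(X)
    aug = [[sum(X[k][i].conjugate() * X[k][j] for k in range(K)) for j in range(n)]
           + [sum(X[k][i].conjugate() * y[k] for k in range(K))] for i in range(n)]
    for col in range(n):                                  # Gaussian elimination
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[r][j] -= f * aug[col][j]
    theta = [0j] * n
    for i in reversed(range(n)):                          # back substitution
        acc = sum(aug[i][j] * theta[j] for j in range(i + 1, n))
        theta[i] = (aug[i][n] - acc) / aug[i][i]
    return theta


def closed_form_QL(A, B, m):
    """Q_m, L_m from the blocking-matrix expressions (m is 1-based, m >= 3)."""
    a = [x.conjugate() / A[0].conjugate() for x in A]
    b = [x.conjugate() / B[0].conjugate() for x in B]
    den = a[1] - b[1]
    return -(a[1] * b[m - 1] - b[1] * a[m - 1]) / den, -(a[m - 1] - b[m - 1]) / den


def double_talk_psds(A, B, K=12, noise_var=0.1, seed=2):
    """Exact per-segment cross-PSDs Phi_{z_i z_1} = E[z_i^* z_1], i = 1..M."""
    rng = random.Random(seed)
    segs = []
    for _ in range(K):
        p1, p2 = rng.uniform(0.1, 3.0), rng.uniform(0.1, 3.0)   # hypothetical source PSDs
        segs.append([A[i].conjugate() * A[0] * p1 + B[i].conjugate() * B[0] * p2
                     + (noise_var if i == 0 else 0.0) for i in range(len(A))])
    return segs


def estimate_QL(segs, m):
    """LS solve of [Phi_z1z1, Phi_z2z1, 1] [-Q_m, -L_m, Phi_umz1]^T = Phi_zmz1."""
    X = [[s[0], s[1], 1.0 + 0j] for s in segs]
    y = [s[m - 1] for s in segs]
    neg_q, neg_l, _ = solve_ls(X, y)
    return -neg_q, -neg_l


if __name__ == "__main__":
    A = [1.0 + 0j, 0.5 + 0.5j, -0.3 + 0.8j, 0.9 - 0.2j]   # hypothetical ATFs
    B = [1.0 + 0j, -0.7 + 0.1j, 0.4 - 0.6j, 0.2 + 0.9j]
    segs = double_talk_psds(A, B)
    for m in range(3, len(A) + 1):
        print(m, estimate_QL(segs, m), closed_form_QL(A, B, m))
```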
DTF-GSC Performance Analysis
• General expression for the output power spectral density (arguments omitted on the right-hand side for brevity: $\Phi_{ZZ}$ and $\Phi_{NN}$ depend on $(t, e^{j\omega})$, while $W_0$ and $H$ depend on $e^{j\omega}$):
$$\begin{aligned} \Phi_{yy}(t, e^{j\omega}) = {} & W_0^\dagger \Phi_{ZZ} W_0 - W_0^\dagger \Phi_{NN} H \left(H^\dagger \Phi_{NN} H\right)^{-1} H^\dagger \Phi_{ZZ} W_0 - W_0^\dagger \Phi_{ZZ} H \left(H^\dagger \Phi_{NN} H\right)^{-1} H^\dagger \Phi_{NN} W_0 \\ & + W_0^\dagger \Phi_{NN} H \left(H^\dagger \Phi_{NN} H\right)^{-1} H^\dagger \Phi_{ZZ} H \left(H^\dagger \Phi_{NN} H\right)^{-1} H^\dagger \Phi_{NN} W_0 \end{aligned}$$
• PSD depends on:
– Input signal PSD
– Noise signal PSD
– Signal ATF ratios
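One way to read the four-term expression (our reading, not stated on the slide): it is the expansion of $(W_0 - HG)^\dagger \Phi_{ZZ} (W_0 - HG)$ with $G = (H^\dagger\Phi_{NN}H)^{-1}H^\dagger\Phi_{NN}W_0$, i.e., the output PSD when the ANC filters sit at the noise-only Wiener solution. The sketch checks this identity numerically for random Hermitian PSD matrices, a random $W_0$ and a random $H$ with M = 4 (so $H^\dagger\Phi_{NN}H$ is 2×2); all inputs and helper names are hypothetical.

```python
import random


def herm(X):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[X[i][j].conjugate() for i in range(len(X))] for j in range(len(X[0]))]


def mm(*Ms):
    """Left-to-right product of several matrices."""
    out = Ms[0]
    for B in Ms[1:]:
        out = [[sum(out[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
               for i in range(len(out))]
    return out


def inv2(X):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = X
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def rand_c(rng):
    return complex(rng.gauss(0, 1), rng.gauss(0, 1))


def rand_psd(rng, M, rank):
    """Random Hermitian positive semidefinite M x M matrix (sum of rank-1 terms)."""
    R = [[0j] * M for _ in range(M)]
    for _ in range(rank):
        v = [[rand_c(rng)] for _ in range(M)]
        vvh = mm(v, herm(v))
        R = [[R[i][j] + vvh[i][j] for j in range(M)] for i in range(M)]
    return R


def psd_four_terms(W0, H, Pzz, Pnn):
    """Phi_yy evaluated term by term, exactly as in the expression above."""
    Wh, Hh = herm(W0), herm(H)
    Kinv = inv2(mm(Hh, Pnn, H))                    # (H^† Phi_NN H)^{-1}
    t1 = mm(Wh, Pzz, W0)
    t2 = mm(Wh, Pnn, H, Kinv, Hh, Pzz, W0)
    t3 = mm(Wh, Pzz, H, Kinv, Hh, Pnn, W0)
    t4 = mm(Wh, Pnn, H, Kinv, Hh, Pzz, H, Kinv, Hh, Pnn, W0)
    return t1[0][0] - t2[0][0] - t3[0][0] + t4[0][0]


def psd_compact(W0, H, Pzz, Pnn):
    """(W0 - H G)^† Phi_ZZ (W0 - H G) with G = (H^† Phi_NN H)^{-1} H^† Phi_NN W0."""
    G = mm(inv2(mm(herm(H), Pnn, H)), herm(H), Pnn, W0)
    HG = mm(H, G)
    e = [[W0[i][0] - HG[i][0]] for i in range(len(W0))]
    return mm(herm(e), Pzz, e)[0][0]


if __name__ == "__main__":
    rng = random.Random(3)
    M = 4
    W0 = [[rand_c(rng)] for _ in range(M)]                       # hypothetical MBF
    H = [[rand_c(rng) for _ in range(M - 2)] for _ in range(M)]  # hypothetical BM
    Pzz, Pnn = rand_psd(rng, M, 6), rand_psd(rng, M, 5)
    print(psd_four_terms(W0, H, Pzz, Pnn), psd_compact(W0, H, Pzz, Pnn))
```

The compact form makes it clear why the output PSD is real and nonnegative, and why it depends on the input PSD, the noise PSD and (through $W_0$ and $H$) the ATF ratios.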
DTF-GSC Performance Analysis
Output power density
• 10-microphone linear array
• Delay-only ATFs for speech and noise
• Maintains the desired signal at θ = 90°
• Blocks directional noise from θ = 120°
• Blocks interference from θ = 60°
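The delay-only setting can be mimicked with steering vectors of a uniform linear array. This sketch shows only the MBF stage (the 120° noise is handled by the ANC, which is not modeled here). The inter-microphone spacing (4 cm), speed of sound (343 m/s), the 1 kHz evaluation frequency and the angle convention (θ measured from the array axis, 90° = broadside) are our assumptions, not values from the slides.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0   # m/s (assumed)
SPACING = 0.04           # hypothetical inter-microphone spacing [m]
NUM_MICS = 10


def hdot(x, y):
    """Hermitian inner product x^H y."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


def steering(theta_deg, freq_hz, M=NUM_MICS, d=SPACING, c=SPEED_OF_SOUND):
    """Delay-only ATF vector of a uniform linear array (first mic is the reference)."""
    omega = 2.0 * math.pi * freq_hz
    tau = d * math.cos(math.radians(theta_deg)) / c      # inter-microphone delay [s]
    return [cmath.exp(-1j * omega * m * tau) for m in range(M)]


def mbf_weights(A, B, F=1.0):
    """ATF-ratio matched beamformer: A^H W0 = F, B^H W0 = 0."""
    nA, nB = abs(hdot(A, A)) ** 0.5, abs(hdot(B, B)) ** 0.5
    rho = hdot(B, A) / (nA * nB)
    scale = F / (1.0 - abs(rho) ** 2)
    return [scale * (a / nA ** 2 - rho * b / (nA * nB)) for a, b in zip(A, B)]


def mbf_gain_db(theta_deg, freq_hz, desired=90.0, interferer=60.0):
    """Power response |W0^H d(theta)|^2 of the MBF alone, in dB."""
    W0 = mbf_weights(steering(desired, freq_hz), steering(interferer, freq_hz))
    p = abs(hdot(W0, steering(theta_deg, freq_hz))) ** 2
    return 10.0 * math.log10(max(p, 1e-30))


if __name__ == "__main__":
    for theta in (60, 90, 120):
        print(f"theta = {theta:3d} deg: {mbf_gain_db(theta, 1000.0):8.1f} dB")
```

At very low frequencies the steering vectors for 90° and 60° become nearly identical, so $1-|\rho|^2$ approaches zero and the MBF gain toward other directions grows; the sketch avoids 0 Hz for that reason.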
PSD deviation
$$\mathrm{DEV}(t, e^{j\omega}) = \frac{\Phi^{s_1}_{yy}(t, e^{j\omega})}{|F(e^{j\omega})|^2\,|A_1(e^{j\omega})|^2\,\Phi_{s_1 s_1}(t, e^{j\omega})}$$
• 10-microphone linear array
• Delay-only ATFs for speech
• Directional noise field
• Desired signal from θ = 90°
• Up to 4 dB distortion at frequencies below 3000 Hz
[Figure: Φyy [dB] versus frequency [Hz] (0 to 4000) and θ [deg] (87 to 93).]
Noise Reduction
$$\mathrm{NR}(t, e^{j\omega}) = \frac{\Phi^{n}_{yy}(t, e^{j\omega})}{|F(e^{j\omega})|^2\,|D_1(e^{j\omega})|^2\,\Phi_{nn}(t, e^{j\omega})}$$
• 10-microphone linear array
• Delay-only ATFs for speech
• Directional noise signal from θ = 120°
• 50 dB attenuation in the noise direction
[Figure: Φyy [dB] versus frequency [Hz] (0 to 4000) and θ [deg] (115 to 125).]
Interference Reduction
$$\mathrm{NIR}(t, e^{j\omega}) = \frac{\Phi^{s_2}_{yy}(t, e^{j\omega})}{|F(e^{j\omega})|^2\,|B_1(e^{j\omega})|^2\,\Phi_{s_2 s_2}(t, e^{j\omega})}$$
• 10-microphone linear array
• Delay-only ATFs for speech
• Directional noise field
• Interference signal from θ = 60°
• 50 dB attenuation in the interference direction
[Figure: Φyy [dB] versus frequency [Hz] (0 to 4000) and θ [deg] (55 to 65).]
Experimental study
• Speech signal
• Simulated ATFs in two noise fields:
– Directional noise
– Diffuse noise
• Sonograms
• Performance evaluation
Sonograms
[Figure: sonograms, panels (a) to (f); axes: time [sec] (0 to 8) and frequency [Hz] (0 to 4000); color scale 5 to 50.]
Performance evaluation
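Noise and interference reduction (all values in dB) in
• directional noise field (top row)
• diffuse noise field (bottom row)

| Noise field | Input S1NR | Input S1S2R | MBF output S1NR | MBF output S1S2R | BM output S1NR | BM output S2NR | DTF-GSC output S1NR | DTF-GSC output S1S2R |
|---|---|---|---|---|---|---|---|---|
| Directional | 11.3 | 2.3 | 13.8 | 16.9 | -3.9 | -4.5 | 34.6 | 12.7 |
| Diffuse | 12.7 | 2.3 | 17.4 | 25 | -3.8 | -3.5 | 20.9 | 22.6 |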
Application: joint noise reduction and echo cancellation
• M ≥ 3 microphones
• One desired speech signal
• One competing speech signal (echo)
• One directional/ambient noise signal
• Arbitrary acoustic transfer functions (ATFs)
[Figure: acoustic scenario with a microphone array, a desired speech signal, a remote speech signal, a noise source, ambient noise, and a speech enhancement system.]
$$z_m(t) = a_m(t) * s_1(t) + b_m(t) * e(t) + n_m(t), \qquad m = 1, \ldots, M$$
Cascade schemes
• AEC-BF: multichannel AEC followed by a beamformer
– The beamformer inputs contain less echo
– The multichannel AEC performance deteriorates due to noise
• BF-AEC: beamformer followed by a single-channel AEC
– The AEC input contains less noise
– The beamformer suppresses echo, although the AEC performs better
– The AEC suffers from fast variations of the echo path caused by the beamformer
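[Figure: AEC-BF cascade block diagram. The echo reference $E(t, e^{j\omega})$ is filtered by $G_{E1}, \ldots, G_{EM}$ and subtracted from $Z_1, \ldots, Z_M$, giving $Z_{\mathrm{AEC}1}, \ldots, Z_{\mathrm{AEC}M}$; these feed a TF-GSC (MBF $W_0^\dagger$ with output $Y_{\mathrm{MBF}}$, BM $H^\dagger$ with outputs $U_2, \ldots, U_M$, noise-canceller filters $G_{N2}, \ldots, G_{NM}$ with output $Y_{\mathrm{NC}}$) whose output is $Y(t, e^{j\omega})$.]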
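[Figure: BF-AEC cascade block diagram. $Z_1, \ldots, Z_M$ feed a TF-GSC (MBF $W_0^\dagger$ with output $Y_{\mathrm{MBF}}$, BM $H^\dagger$ with outputs $U_2, \ldots, U_M$, noise-canceller filters $G_{N2}, \ldots, G_{NM}$ with output $Y_{\mathrm{NC}}$); the echo reference $E(t, e^{j\omega})$, filtered by $G_E(t, e^{j\omega})$, is subtracted from the beamformer output $Y_{\mathrm{BF}}(t, e^{j\omega})$ to give $Y(t, e^{j\omega})$.]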
ETF-GSC scheme
• Matched beamformer (MBF)
– Maintains desired signal
• Blocking unit (BU)
– Blocks both desired and echo signals
• Adaptive noise and echo canceller (ANEC)
– The noise canceller and the echo canceller work in parallel
– The echo reference signal is used to create additional interference reference signals for the ANEC
[Figure: ETF-GSC block diagram. Labels: microphone signals $Z_1, \ldots, Z_M$; MBF $F_0^\dagger$ with output $Y_{\mathrm{MBF}}(t, e^{j\omega})$; BU $H^\dagger$ with outputs $U_2, \ldots, U_M$ and $U'_2, \ldots, U'_M$; echo reference $E(t, e^{j\omega})$; adaptive filters $G_{N2}, \ldots, G_{NM}$, $G_{H2}, \ldots, G_{HM}$ and $G_{E1}, \ldots, G_{EM}$; noise-canceller output $Y_{\mathrm{NC}}$, echo-canceller output $Y_{\mathrm{EC}}$; system output $Y(t, e^{j\omega})$.]
ETF-GSC scheme
• Estimation
– MBF estimation is done as in the TF-GSC (during single talk)
– BM estimation is done as in the TF-GSC (during single talk)
– The noise canceller adapts during noise-only frames
– The echo canceller adapts during echo frames
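The frame-gated adaptation rule can be illustrated with a deliberately simplified single-bin sketch. The slides do not give the ANEC equations, so everything here is an assumption: a noise canceller on the BU outputs and a one-tap echo canceller on the echo reference operate in parallel, $Y = Y_{\mathrm{MBF}} - G_N^\dagger U - g_E^* E$; each is updated by NLMS normalized by its own reference power; frame labels are assumed known. The $U'$ and $G_H$ branches of the ETF-GSC diagram are not modeled, and all gains and names are hypothetical.

```python
import random


def hdot(x, y):
    """Hermitian inner product x^H y."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


def anec(frames, mu=0.1, eta=0.9, eps=1e-12):
    """Parallel noise and echo cancellers in one frequency bin (hypothetical sketch).

    frames: iterable of (label, U, E, Y_MBF); label is "noise", "echo" or
    "double" and only gates adaptation. Output: Y = Y_MBF - G_N^H U - g_E^* E.
    """
    g_n, g_e, p_u, p_e, out = None, 0j, 0.0, 0.0, []
    for label, U, E, y_mbf in frames:
        if g_n is None:
            g_n = [0j] * len(U)
        y = y_mbf - hdot(g_n, U) - g_e.conjugate() * E
        p_u = eta * p_u + (1 - eta) * sum(abs(u) ** 2 for u in U)
        p_e = eta * p_e + (1 - eta) * abs(E) ** 2
        if label == "noise":      # noise canceller adapts in noise-only frames
            g_n = [g + mu * u * y.conjugate() / (p_u + eps) for g, u in zip(g_n, U)]
        elif label == "echo":     # echo canceller adapts in echo frames
            g_e = g_e + mu * E * y.conjugate() / (p_e + eps)
        out.append(y)             # no adaptation in double-talk frames
    return out


def demo(seed=0, n_noise=1000, n_echo=1000, n_double=200):
    """Returns (outputs, desired) for a synthetic noise, echo, double-talk sequence."""
    rng = random.Random(seed)

    def cgauss():
        return complex(rng.gauss(0, 1), rng.gauss(0, 1))

    c = [0.8 + 0.3j, -0.5 + 0.6j]   # hypothetical noise gains at the BU outputs
    g_noise = 0.6 - 0.4j            # hypothetical residual noise gain at the MBF output
    g_echo = -0.3 + 0.9j            # hypothetical residual echo gain at the MBF output
    frames, desired = [], []
    for label, count in (("noise", n_noise), ("echo", n_echo), ("double", n_double)):
        for _ in range(count):
            n = cgauss()                                  # stationary noise, always present
            e = cgauss() if label != "noise" else 0j      # echo reference
            s = cgauss() if label == "double" else 0j     # desired speech (passes the MBF)
            frames.append((label, [ci * n for ci in c], e, s + g_noise * n + g_echo * e))
            desired.append(s)
    return anec(frames), desired


if __name__ == "__main__":
    y, s = demo()
    print("max |Y - S1| in double-talk frames:",
          max(abs(a - b) for a, b in zip(y[-200:], s[-200:])))
```

Freezing both cancellers during double talk keeps the desired speech, which the BU blocks but the MBF passes, from being treated as an error signal.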
Performance evaluation
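| Input SNR [dB] | Input SER [dB] | Tested algorithm | Echo suppression: AEC [dB] | Echo suppression: BF [dB] | Echo suppression: Total [dB] | Noise reduction: Total [dB] |
|---|---|---|---|---|---|---|
| 15 | 5 | AEC-BF | 18.5 | 2.1 | 20.6 | 23.5 |
| 15 | 5 | BF-AEC | 5.4 | 6.2 | 11.7 | 24.7 |
| 15 | 5 | ETF-GSC | – | – | 37.7 | 23.1 |
| 5 | 15 | AEC-BF | 3.9 | 0.4 | 4.4 | 23.3 |
| 5 | 15 | BF-AEC | 1.7 | 6.8 | 8.6 | 24.4 |
| 5 | 15 | ETF-GSC | – | – | 18.1 | 23.0 |
| 15 | 15 | AEC-BF | 12.1 | 1.0 | 13.1 | 23.9 |
| 15 | 15 | BF-AEC | 4.6 | 5.8 | 10.4 | 24.5 |
| 15 | 15 | ETF-GSC | – | – | 29.7 | 23.6 |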
Sonograms
[Figure: sonograms, panels (a) to (d); axes: time [sec] (0 to 8) and frequency [Hz] (0 to 4000); color scale 5 to 50.]
Conclusions
• DTF-GSC algorithm
– GSC structure: modified MBF and BM
– New identification procedure for double-talk (DT) frames
– Application: BSS problem of convolutive mixtures and additive noise
• DTF-GSC performance analysis
– General expression for the output power spectral density
– Expected deviation imposed on the desired signal
– Noise reduction
– Interference reduction
Conclusions
ETF-GSC
– Joint echo cancellation and noise reduction in a reverberant environment
– TF-GSC-based solution: BU and ANEC blocks (reference signal incorporated)
– Performance evaluation (during DT) and comparison to cascade schemes
Future Research
• Dual nonstationary speech signals in the presence of echo and stationary noise
[Figure: acoustic scenario with a microphone array, a desired speech signal, a competing speech signal, an echo signal, a noise source, ambient noise, and a speech enhancement system.]
Future Research
• Speech enhancement using the DTF-GSC and postfiltering
– Less significant noise reduction is obtained in a diffuse noise field
– Postfiltering: known methods or using noise reference signals
• DTF-GSC using relative transfer function (RTF) system identification
– Weighted least squares optimization criterion
– Smaller error variance and faster convergence
• Joint noise reduction and echo cancellation using the ETF-GSC and residual echo cancellation
– Misadjusted AEC filters and finite filter length
– A linear prediction error filter removes the short-term correlation of the residual echo
– The whitened residual echo is cancelled by a noise reduction filter