30/08/2004
Department of Communication Technology
A Comparative Study of Feature-Domain Error Concealment Techniques
for Distributed Speech Recognition
- Robust2004 workshop, Norwich, UK
Zheng-Hua Tan, Børge Lindberg and Paul Dalsgaard
{zt, bli, pd}@kom.aau.dk
Aalborg University, Denmark
Agenda
• Feature-domain EC techniques
  – repetition
  – linear interpolation
  – subvector concealment
• Speech recognition experiments
• Comparative study
  – MFCC features
  – Euclidean and DP distances
  – HMM state durations
Motivation
Why do this work?
• A variety of EC techniques for DSR exist
  – A survey
• Repetition vs. interpolation
  – Which is better?
• What makes an EC technique good for recognition?
EC techniques
Two classes of EC techniques
• Client-based EC
  – e.g. retransmission and forward error correction (FEC)
• Server-based EC (the redundancy in the transmitted signal is exploited)
  – in the model domain
    • weighted Viterbi, missing feature theory
  – in the feature domain
    • insertion-based techniques: splicing, substitution, repetition
    • interpolation-based techniques: linear interpolation
    • subvector concealment
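The two vector-level feature-domain techniques named above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the boolean `lost` flags, and the tie-breaking rule (a lost frame copies the nearer error-free frame, the earlier one on a tie) are assumptions made for the example.

```python
from typing import List, Optional, Tuple

Frame = List[float]


def _good_neighbours(lost: List[bool], i: int) -> Tuple[Optional[int], Optional[int]]:
    """Indices of the closest error-free frames before and after frame i."""
    prev = next((k for k in range(i - 1, -1, -1) if not lost[k]), None)
    nxt = next((k for k in range(i + 1, len(lost)) if not lost[k]), None)
    return prev, nxt


def conceal_repetition(frames: List[Frame], lost: List[bool]) -> List[Frame]:
    """Replace each lost frame by a copy of the nearest error-free frame
    (assumed rule: the earlier frame wins a tie)."""
    out = [list(f) for f in frames]
    for i, bad in enumerate(lost):
        if not bad:
            continue
        prev, nxt = _good_neighbours(lost, i)
        if prev is None and nxt is None:
            continue  # no error-free frame to copy from
        if nxt is None or (prev is not None and i - prev <= nxt - i):
            src = prev
        else:
            src = nxt
        out[i] = list(frames[src])
    return out


def conceal_interpolation(frames: List[Frame], lost: List[bool]) -> List[Frame]:
    """Replace each lost frame by a point on the straight line between the
    surrounding error-free frames; fall back to copying at the edges."""
    out = [list(f) for f in frames]
    for i, bad in enumerate(lost):
        if not bad:
            continue
        prev, nxt = _good_neighbours(lost, i)
        if prev is None or nxt is None:
            src = prev if prev is not None else nxt
            if src is not None:
                out[i] = list(frames[src])
            continue
        w = (i - prev) / (nxt - prev)  # 0 at the previous good frame, 1 at the next
        out[i] = [(1 - w) * a + w * b for a, b in zip(frames[prev], frames[nxt])]
    return out


frames = [[0.0], [9.0], [9.0], [3.0]]   # frames 1 and 2 are corrupted
lost = [False, True, True, False]
repeated = conceal_repetition(frames, lost)
interpolated = conceal_interpolation(frames, lost)
```

Repetition yields a piecewise-constant trajectory, while interpolation yields a straight line between the two error-free frames.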
Subvector concealment
• Observation 1: conventional EC schemes share a common characteristic: they conduct EC at the vector level
• Observation 2: within erroneous vectors, a substantial number of subvectors are often error-free
Hence: subvector-based EC
Subvector concealment (cont.)
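• The ETSI-DSR standard
  – Feature-pair and SVQ: the n'th vector is
    $$V^n = [c_1^n, c_2^n, \ldots, c_{11}^n, c_{12}^n, c_0^n, \log E^n]^T = [[S_0^n]^T, \ldots, [S_5^n]^T, [S_6^n]^T]^T$$
    where each feature-pair (e.g. $c_1^n, c_2^n$) forms one subvector $S_j^n$
  – Frame-pair: $[V^n, V^{n+1}]$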
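Subvector concealment (cont.)
• Buffering matrix
$$
[V^{A}, V^{A+1}, V^{A+2}, \ldots, V^{A+2N-1}, V^{A+2N}, V^{B}] =
\begin{bmatrix}
S_0^{A} & S_0^{A+1} & S_0^{A+2} & \cdots & S_0^{A+2N-1} & S_0^{A+2N} & S_0^{B} \\
S_1^{A} & S_1^{A+1} & S_1^{A+2} & \cdots & S_1^{A+2N-1} & S_1^{A+2N} & S_1^{B} \\
S_2^{A} & S_2^{A+1} & S_2^{A+2} & \cdots & S_2^{A+2N-1} & S_2^{A+2N} & S_2^{B} \\
S_3^{A} & S_3^{A+1} & S_3^{A+2} & \cdots & S_3^{A+2N-1} & S_3^{A+2N} & S_3^{B} \\
S_4^{A} & S_4^{A+1} & S_4^{A+2} & \cdots & S_4^{A+2N-1} & S_4^{A+2N} & S_4^{B} \\
S_5^{A} & S_5^{A+1} & S_5^{A+2} & \cdots & S_5^{A+2N-1} & S_5^{A+2N} & S_5^{B} \\
S_6^{A} & S_6^{A+1} & S_6^{A+2} & \cdots & S_6^{A+2N-1} & S_6^{A+2N} & S_6^{B}
\end{bmatrix}
$$
• Consistency test: subvector $S_j$ in the frame-pair $[V^n, V^{n+1}]$ is inconsistent if
$$
\big(d(S_j^{n}(0) - S_j^{n+1}(0)) > T_j(0)\big) \ \text{OR} \ \big(d(S_j^{n}(1) - S_j^{n+1}(1)) > T_j(1)\big)
$$
  e.g. for the feature-pair $(c_1, c_2)$ in the frame-pair $[V^{A+1}, V^{A+2}]$:
$$
\big(d(c_1^{A+1} - c_1^{A+2}) > T_{c_1}\big) \ \text{OR} \ \big(d(c_2^{A+1} - c_2^{A+2}) > T_{c_2}\big)
$$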
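The consistency test on the buffering matrix can be sketched in a few lines. This is only an illustration under stated assumptions: each vector is stored as a list of two-element subvectors, `d` is taken to be the absolute difference, the frame-pairs are the consecutive column pairs between the first and last column, the first and last columns are treated as error-free, and the threshold values in the example are made up (they are not the ETSI values).

```python
from typing import List, Tuple

Subvector = Tuple[float, float]
Vector = List[Subvector]          # one column of the buffering matrix


def is_inconsistent(s_a: Subvector, s_b: Subvector, t: Tuple[float, float]) -> bool:
    """Consistency test for one subvector across a frame-pair: inconsistent
    if either element differs by more than its threshold."""
    return abs(s_a[0] - s_b[0]) > t[0] or abs(s_a[1] - s_b[1]) > t[1]


def consistency_matrix(buffer: List[Vector],
                       thresholds: List[Tuple[float, float]]) -> List[List[int]]:
    """Return C with C[j][k] = 1 if subvector j of column k is consistent, else 0.

    Columns 0 and -1 are assumed error-free; the columns in between are
    tested pairwise as frame-pairs (1, 2), (3, 4), ...
    """
    n_cols = len(buffer)
    if n_cols < 2 or (n_cols - 2) % 2 != 0:
        raise ValueError("expected two boundary columns plus whole frame-pairs")
    c = [[1] * n_cols for _ in thresholds]
    for k in range(1, n_cols - 1, 2):
        for j, t in enumerate(thresholds):
            if is_inconsistent(buffer[k][j], buffer[k + 1][j], t):
                c[j][k] = c[j][k + 1] = 0   # both frames of the pair are flagged
    return c


# Toy buffer: 2 subvectors per vector, one frame-pair between the boundary columns.
buffer = [
    [(1.0, 1.0), (2.0, 2.0)],
    [(1.0, 1.0), (2.0, 2.0)],
    [(1.1, 1.0), (9.0, 2.0)],   # second subvector jumps: flagged as inconsistent
    [(1.0, 1.0), (2.0, 2.0)],
]
thresholds = [(0.5, 0.5), (0.5, 0.5)]   # hypothetical values
C = consistency_matrix(buffer, thresholds)
```

Because the test compares the two frames of a pair, it cannot tell which of the two is corrupted, so the sketch flags the subvector in both frames.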
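Subvector concealment (cont.)
Consistency matrix and subvector concealment
• Buffering matrix with columns $[V^{A}, V^{A+1}, V^{A+2}, V^{A+3}, V^{A+4}, V^{A+5}, V^{A+6}, V^{A+7}, V^{A+8}, V^{B}]$ and one row per subvector $S_0, \ldots, S_6$
• Consistency matrix (1 for consistent, 0 for inconsistent):
$$
C = \begin{bmatrix}
1 & 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & 0 & 0 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 0 & 1 & 1 & 1
\end{bmatrix}
$$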
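Given a consistency matrix, concealment can be restricted to the flagged subvectors. The slide does not spell out the replacement rule, so the sketch below assumes one: each inconsistent subvector is replaced by the nearest consistent subvector in the same row (the earlier one on a tie). The function name and data layout are likewise assumptions for illustration.

```python
from typing import List, Tuple

Subvector = Tuple[float, float]


def conceal_subvectors(buffer: List[List[Subvector]],
                       c: List[List[int]]) -> List[List[Subvector]]:
    """buffer[k][j] is subvector j of column k; c[j][k] is 1 (consistent) or 0.

    Assumed rule: an inconsistent subvector is replaced by the nearest
    consistent subvector in the same row; consistent ones are left untouched.
    """
    out = [[tuple(s) for s in col] for col in buffer]
    for j, row in enumerate(c):
        good = [k for k, flag in enumerate(row) if flag == 1]
        if not good:
            continue  # nothing reliable to copy from in this row
        for k, flag in enumerate(row):
            if flag == 0:
                src = min(good, key=lambda g: (abs(g - k), g))
                out[k][j] = tuple(buffer[src][j])
    return out


# One subvector per vector, one corrupted frame-pair in the middle.
buffer = [[(1.0, 1.0)], [(5.0, 5.0)], [(6.0, 6.0)], [(2.0, 2.0)]]
c = [[1, 0, 0, 1]]
concealed = conceal_subvectors(buffer, c)
```

Unlike vector-level repetition, the error-free subvectors inside a flagged vector are kept as received.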
Recognition experiments
• Two tasks: Danish digits and city names
• The HTK-based reference recogniser
• Realistic GSM error patterns (EP), by carrier-to-interference (C/I) ratio:
  – EP1: 10 dB
  – EP2: 7 dB
  – EP3: 4 dB
Recognition experiments (cont.)
[Figure: %WER for the three EC techniques (repetition, interpolation, subvector concealment) under error patterns EP1, EP2 and EP3: (a) Danish digits, (b) city names.]
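The slides report %WER without defining it. A common definition is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words; the sketch below assumes that definition and is not necessarily how the reported figures were scored.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """%WER = 100 * (substitutions + deletions + insertions) / reference words,
    computed with a word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # match or substitution
        prev = cur
    return 100.0 * prev[-1] / len(ref)


# Danish digit strings: one substitution in four words.
wer = word_error_rate("en to tre fire", "en to fem fire")
```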
Comparative study - MFCC features
• Random transmission errors with a bit error rate (BER) of 2% are used.
• The original error-free MFCC features are directly compared with the features corrupted by errors but concealed
  – by repetition
  – by interpolation
  – by subvector concealment
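One way to picture random errors at a given BER is to flip each transmitted bit independently with that probability. The sketch below assumes exactly this independent-flip model; the function name and the bit-list representation are illustrative, and the slides do not state how the errors were generated.

```python
import random
from typing import List, Optional


def inject_bit_errors(bits: List[int], ber: float,
                      rng: Optional[random.Random] = None) -> List[int]:
    """Flip each bit independently with probability `ber` (assumed error model)."""
    if not 0.0 <= ber <= 1.0:
        raise ValueError("ber must lie in [0, 1]")
    rng = rng or random.Random()
    return [b ^ 1 if rng.random() < ber else b for b in bits]


rng = random.Random(2004)            # fixed seed so the run is repeatable
clean = [0] * 100_000
noisy = inject_bit_errors(clean, 0.02, rng)
observed_ber = sum(noisy) / len(noisy)   # close to 0.02 for a long bit stream
```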
Comparative study - MFCC features (cont.)
• MFCC c0
• Two observations
Comparative study - MFCC features (cont.)
• Interpolation: straight line – constant value segment – zero value segment
Comparative study - MFCC features (cont.)
• Repetition-generated feature curves have shapes similar to the iMFCC feature, although with some displacement along the time axis.
• However, the dynamic programming (DP) embedded in the Viterbi algorithm makes this displacement relatively irrelevant.
Comparative study - DP distances
– The Euclidean and DP distances between c0 of the error-free MFCC and c0 of the MFCC generated by the different EC techniques, for the word “et”
[Figure: Euclidean and DP distances for repetition, interpolation and subvector concealment.]
– General expectation: interpolation performs better
  • Signal reconstruction vs. speech recognition
  • Euclidean distance vs. DP distance
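The contrast between the two distances can be illustrated on a toy c0 trajectory. The slides do not specify the DP distance, so the sketch assumes a basic dynamic time warping with an absolute-difference local cost and the three standard steps; both the trajectories and the function names are made up for the example.

```python
import math
from typing import Sequence


def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Frame-by-frame distance: any shift along the time axis is penalised."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def dp_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Dynamic time warping distance (assumed form of the DP distance):
    accumulated |x_i - y_j| along the cheapest monotonic alignment."""
    inf = float("inf")
    d = [[inf] * (len(y) + 1) for _ in range(len(x) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            cost = abs(x[i - 1] - y[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(x)][len(y)]


original = [0.0, 0.0, 5.0, 5.0, 0.0, 0.0]
shifted = [0.0, 0.0, 0.0, 5.0, 5.0, 0.0]   # same shape, displaced by one frame

e = euclidean_distance(original, shifted)
w = dp_distance(original, shifted)
```

A curve with the right shape but a small time displacement, as repetition tends to produce, scores poorly under the Euclidean distance yet well under the DP distance, which is closer to what Viterbi decoding tolerates.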
Comparative study - DP distances (cont.)
Over 328 test utterances:
• Number of smaller distances (repetition vs. interpolation)
[Figure: number of smaller distances, Euclidean and DP, for repetition and interpolation.]
• Subvector EC always gives the smallest distance for both measures.
Comparative study - HMM state durations
• Viterbi decoding tracks the HMM state alignment
• The average state durations
• Two facts are observed:
  – repetition vs. interpolation
  – subvector vs. error-free
[Figure: average state duration for repetition, interpolation, subvector concealment and error-free features.]
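Average state durations can be read off a Viterbi state alignment by measuring run lengths. The sketch below assumes the alignment is available as one state label per frame and that the model is left-to-right, so each run of identical labels is one visit to a state; how the authors extracted the alignment is not stated.

```python
from itertools import groupby
from typing import Hashable, List, Sequence, Tuple


def state_runs(alignment: Sequence[Hashable]) -> List[Tuple[Hashable, int]]:
    """Collapse a frame-level state alignment into (state, duration in frames) runs."""
    return [(state, sum(1 for _ in run)) for state, run in groupby(alignment)]


def average_state_duration(alignment: Sequence[Hashable]) -> float:
    """Mean number of frames spent in a state per visit."""
    runs = state_runs(alignment)
    if not runs:
        raise ValueError("alignment is empty")
    return sum(n for _, n in runs) / len(runs)


# Hypothetical alignment: state index for each of nine frames.
alignment = [1, 1, 1, 2, 2, 3, 3, 3, 3]
runs = state_runs(alignment)
avg = average_state_duration(alignment)
```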
Summary
• Three different EC techniques compared
  – the simple repetition technique is as good as or even better than linear interpolation
  – subvector concealment performs best
• Comparative study
  – MFCC features
  – Euclidean and DP distances
  – HMM state durations
Thanks!