Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
Effects of Reverberant Energy on Statistics of SpeechMadhu Shashanka1, Barbara Shinn-Cunningham1, and Martin Cooke2
1Boston University, 2University of Sheffield{ shashanka,shinn} @cns.bu.edu, [email protected]
Anechoic Signal
Anechoic
Bathroom Center 16
Bath Ctr 16
Bathroom Corner 16
Bath Cnr 16
Ping−pong Center 16
PP Ctr 16
Ping−pong Corner 16
PP Cnr 16
Ping−pong Side 16
PP Side 16
Anechoic Signal
Anechoic
Bathroom Center 50
Bath Ctr 50
Bathroom Corner 50
Bath Cnr 50
Ping−pong Center 50
PP Ctr 50
Ping−pong Corner 50
PP Cnr 50
Ping−pong Side 50
PP Side 50
Anechoic Signal
Anechoic
Bathroom Center 100
Bath Ctr 100
Bathroom Corner 100
Bath Cnr 100
Ping−pong Center 100
PP Ctr 100
Ping−pong Corner 100
PP Cnr 100
Ping−pong Side 100
PP Side 100
Anechoic Signal
Anechoic
Bathroom Center 200
Bath Ctr 200
Bathroom Corner 200
Bath Cnr 200
Ping−pong Center 200
PP Ctr 200
Ping−pong Corner 200
PP Cnr 200
Ping−pong Side 200
PP Side 200
Objective
Study how reverberation affects sparsity andstatistics of speech.
Measurements: We measured impulse responses in five dif ferent room/positionconditions and at four dif ferent distancesfrom the sour ce.
• Bathr oom
– Center
– Corner
• PingPong Room
– Center
– Corner
– Side
At every location/condition, responses wer erecorded at two mics (left and right) whichwer e 0.4 m apart.
How does reverb distort a speechsignal?Reverberation smears energy across time. Areverb signal is less sparse in time andfrequency than the anechoic signal. Thewaveforms and spectrograms of an anechoicsignal and a reverb signal for a particularcondition are shown below.
Waveforms
ANECHOIC BR CENTER 2m
Spectrograms
Approach
We compar ed energy dif ferences betweensignals using a time-fr equencyrepresentation (spectr ogram). Reverberantsignals wer e first time-aligned with theanechoic signal and levels wer e corr ected sothat RMS energy of the dir ect sound wasequal to that of the anechoic signal.
−100 −50 0 500
5
10
15
20
Energy (dB)
stinu FT fo eg atn ec r eP
Clean0.16 m0.50 m1.0 m2.0 m
BR CENTER
−100 −50 0 500
5
10
15
20
Energy (dB)
stinu FT fo e gatn e cre P
CleanBr CenterBr CornerPr CenterPr CornerPr Side
2.0 m
Speech and Silence
We consider another measur e of the effectsof reverberation – percentage of TF units forwhich anechoic energy was within ± 3 dB ofreverb energy.
• Speech - if energy is within 20 dB of maxenergy.
• Silence - if energy is less than max energyby more than 20 dB.
Below, we plot these percentages separatelyfor the bathroom and the pingpong room.
Bathroom :
16 50 100 2000
10
20
16 50 100 2000
20
40
)ciohcena(E fo Bd 3 ni htiw )b re ver(E
Speech
16 50 100 2000
5
10
15 Silence
CenterCorner
Pingpong room :
16 50 100 2000
20
40
20
40
60
80
0
20
40
CenterCornerSide
)ciohcena(E f o B d 3 n iht iw ) brev er(E
Speech Silence
Below, we plot the histogram of the energydif ference between S1 and S2 in the anechoicand reverberant cases.To visualize thedif ference more clearly , we also plot thedif ference between the two curves (right).
Energy(S1-S2) Histograms and theirdif ference
−100 −50 0 50 1000
5
10
15
20
Energy(S1) − Energy(S2)
stinu FT fo %
AnechoicReverb
BathroomCenter2.0m
−100 −50 0 50 100−3
−2
−1
0
1
2
3
4
)ciohcenA(% − )br ev e
R(%
Energy(S1) − Energy(S2)
BathroomCenter2.0m
We can also see to what extent energies inthe two signals overlapped. Below, we plotthe percentage of TF units for which energyin speech sample 1 was within 3 dB ofenergy in the other sample.
Bathroom :
16 50 100 20018
19
20
20
25
30
35
( ygrene ni pal rev o hcihw stinu FT
%±
) Bd 3
Speech
18
19
20
21 Silence
CenterCornerAnechoic
Pingpong room :
16 50 100 20018
19
20CenterCornerSideAnechoic
20
25
30
35
19
20
21
( ygrene ni pa lrev o hcihw stinu FT
%±
) Bd 3
Speech Silence
Energy of S2 in the gaps of S1
We consider how much energy of speech sample2 is present in the silence regions of speech sample 1. The plot below shows the energy of speech sample 2 and dark blue regions representTF units where sample 1 has high energy.
Sample 2 in the "silence" of Sample 1
Below, we plot energy histograms.
Silence of S1
−100 −80 −60 −40 −20 00
5
10
15
20
Energy
stinu FT fo %
AnechoicReverb
Energy of S1
−100 −80 −60 −40 −20 00
5
10
15
20
Energy
stinu FT fo %
AnechoicReverb
Energy of S2
Energy of (S1+S2)
−100 −80 −60 −40 −20 00
5
10
15
20
Energy
stinu FT fo %
AnechoicReverb
Silence of S1
−100 −80 −60 −40 −20 00
5
10
15
20
Energy
stinu FT fo %
Silence of S2AnechoicReverb
Reverberation More EnergyBelow are histograms of the energy intime frequency (TF) units.
The relative amount of energy fromreverberation grows with distance.Histogram of TF energy for thebathroom center condition.
The hard-walled bathroom (BR)produces more reverberant energythan a classroom (the ping-pong room, PR). Histogram of TF energy for different room conditions, for a 2-m distance.
Silence is filled in by reverberantenergy, especially in the hard-walled bathroom. As distanceincreases, the effects of reverberationincrease, both for silence and forspeech. However, filling in the silencedestroys the normal modulations that convey speech meaning. Plots show the percentage of TF unitsfor which the energy in the reverberantcondition is within 3 dB of what it was in the clean speech.
16 50 100 200 16 50 100 200
Simultaneous SpeechWe now consider the effect of reverberationon simultaneous speech signals.
We analyzed all TF units togethe, but alsobroke the TF units into two categories:
16 50 100 200 16 50 100 200
16 50 100 200 16 50 100 200
Distance (cm)
Distance (cm)
Distance (cm)
Distance (cm)
All TF units
All TF units
All TF units
All TF units
Histograms of energy of signal 1 (left panel) andsignal 2 (right panel) in the silence regions of signal 1.
Histograms of energy of both signals in silenceregions of signal 1 (left panel) andsignal 2 (right panel).
Acknowledgements1. Tim Streeter for helping with measu-rements of impulse responses.2. Portions of this work were supportedby grants from (a) Air Force Office of Scientific Research(b) National Science Foundation.