DU, C-SIIT 1
Collecting and Transcribing Real Chinese Spontaneous Telephone Speech Corpus
Limin Du, Chair Professor
Director, Center for Speech Interactive Information Technology, Institute of Acoustics, Chinese Academy of Sciences
October 21, 2000
Background
Spontaneous spoken interaction over the telephone is a very promising application; speech recognition systems for telephone use must be built to cope with its variations in acoustics and speaking style
No large-scale Chinese spontaneous telephone speech corpus is available for research
– Simulated telephone speech corpus (1997, C-SIIT, IOA, CAS): microphone speech corpus – pipeline to telephone – telephone speech
– Collecting real telephone speech data seems to be a formidable task: laws, costs
The Chinese-English speech translation (CEST) project, a collaboration between CAS and AT&T (1998-2003), is a strong driving force for this work
Real Telephone Speech Collection
A “dialogue oriented” collection paradigm
– Human-human conversations
– Human-machine dialogues
[Figure: collection setup. Labels: Caller; Real Information Service Center; Hotel Information Desk; Computer-phone; OR; Caller with dialogue card; Simulated Human or Machine Service Agent; Computer; Data storage.]
Speech Data Processing
Sampling
– 8 kHz sampling
– 16-bit A/D quantization
Utterance Segmentation
– One utterance per speaker switch
– Utterances average about 3 seconds in length
Speech Data Transcribing
What to Label? How to Label?
What to Label?
Information about Speakers and Environments
– Speaker’s dialect, mood, gender, speech quality
Transcribing
– Chinese characters
– Pinyin
– Other acoustic event labels: laugh, lip smack, throat clearing, breath, cough, filled pauses, telephone adjusting, background speech, etc.
Time Stamp
– Other acoustic events are bracketed with time stamps automatically when transcribing with a special software tool
Detailed Issues Concerned
Mispronunciation
– Mispronunciation often occurs in daily life. For example, a speaker might read the Chinese character “山” (whose correct pronunciation is “shan1”) as “san2”. In such a case, the associated speech segment is transcribed as “山(san2)” to present both the right text and the real pronunciation
Numbers
– Arabic numerals are a natural way to write numbers, but they cannot be mapped to a single pronunciation. So transcribers are required to transcribe all numbers in Chinese characters
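The mispronunciation convention above can be read back mechanically. The following is a minimal sketch, assuming the annotation always has the shape shown in the slide's example (one Chinese character followed by the spoken pinyin, with a tone digit, in parentheses). The function name and the regular expression are illustrative and not part of the C-SIIT labeling tool.

```python
import re

# Assumed shape, from the slide's example "山(san2)": one Chinese character,
# then the pinyin that was actually spoken, in ASCII or full-width parentheses.
ANNOTATED = re.compile(r"([\u4e00-\u9fff])\s*[(（]\s*([a-z]+[1-5])\s*[)）]")

def split_text_and_pronunciation(transcript):
    """Return (clean_text, overrides).

    clean_text is the transcript with the parenthesised pinyin removed.
    overrides lists (index_in_clean_text, character, spoken_pinyin).
    """
    pieces = []
    overrides = []
    pos = 0
    for match in ANNOTATED.finditer(transcript):
        pieces.append(transcript[pos:match.start()])
        index = sum(len(p) for p in pieces)  # position of the character in clean text
        pieces.append(match.group(1))
        overrides.append((index, match.group(1), match.group(2)))
        pos = match.end()
    pieces.append(transcript[pos:])
    return "".join(pieces), overrides

clean_text, overrides = split_text_and_pronunciation("上山(san2)了")
print(clean_text, overrides)
```

Keeping the character and the spoken pinyin separate lets the same transcript serve both language modeling (right text) and acoustic modeling (real pronunciation).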
Other Acoustic Events
File      Recognition result   Auditory judgment
PAUSE1    AI                   [UH]
PAUSE14   AI                   [UH]
PAUSE12   A                    [UNG]
PAUSE33   KA A                 [UNG]
PAUSE20   ANG                  [UNG]
PAUSE26   ANG                  [UNG]
PAUSE19   AN                   [EN]
PAUSE4    CHA                  [AO]
PAUSE18   GAN                  [UH]
PAUSE21   HE                   [EN]
PAUSE27   NE                   [EN]
PAUSE22   YUN                  [UM]
PAUSE34   LENG                 [UH]
PAUSE15   TONG                 [UH]
Other Acoustic Events (cont.)
File      Recognition result   Auditory judgment
PAUSE31   NONG                 [EN]
PAUSE17   HEN                  [EN]
PAUSE24   EN                   [EN]
Labels
– [AA]
– [AI]
– [EN]
– [UH]
– [AO]
– [SIL]: silent segment
– [NOISE]
– [LAUGH]
– [ANG]
– [BREATH]: breath
– [HESITATION]: hesitation
Transcription Example
<BeginStamp 0>[FILLER]<EndStamp 257> <BeginStamp 260> [NOISE] <EndStamp 928>“ 北京游乐园怎么走”东直门到哪“北京游乐园”北京游乐园是吗“ <BeginStamp 5933> [FILLER]<EndStamp 6250>”<BeginStamp 6228> [FILLER]<EndStamp 6386> 稍等
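The time-stamped event tags in the example can be pulled out with a short script. This is a sketch only: it assumes every event follows the `<BeginStamp N>[LABEL]<EndStamp M>` pattern seen above, and it does not assume any unit for the stamps, since the slides do not state one. The function name is hypothetical.

```python
import re

# Pattern inferred from the example line; optional whitespace around the label.
EVENT = re.compile(r"<BeginStamp\s+(\d+)>\s*\[(\w+)\]\s*<EndStamp\s+(\d+)>")

def extract_events(line):
    """Return (events, text): events as (label, begin, end), text without the tags."""
    events = [(m.group(2), int(m.group(1)), int(m.group(3)))
              for m in EVENT.finditer(line)]
    text = EVENT.sub("", line).strip()
    return events, text

sample = ("<BeginStamp 0>[FILLER]<EndStamp 257> "
          "<BeginStamp 260> [NOISE] <EndStamp 928>稍等")
events, text = extract_events(sample)
for label, begin, end in events:
    print(label, begin, end, end - begin)  # label, begin, end, length in stamp units
print(text)
```

Separating events from text this way makes it easy to count event types or to check for overlapping stamps, such as the two [FILLER] spans near the end of the example.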
How to Label?
Improving transcribers’ efficiency and reducing the possibility of errors
– A labeling tool developed specially for this task
Training transcribers
– Usually our employees, who have assisted speech research for more than one year and have good working records
– Part-time employees, trained by our employees before working
Statistical Results in General
Chinese Spontaneous Telephone Speech Corpus (CSTSC)
# of speakers                    600
# of h-h dialogues               1000
# of h-m dialogues               38
Average duration per dialogue    3.5 minutes
Sampling of speech               8 kHz
Quantization of speech           16 bits
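Taking the figures above at face value, the total size of the corpus can be estimated with a few lines of arithmetic. The sketch assumes uncompressed single-channel PCM and that the 3.5-minute average applies to both dialogue types; neither assumption is stated in the slides, and the function name is illustrative.

```python
SAMPLE_RATE_HZ = 8000      # from the slide
BYTES_PER_SAMPLE = 2       # 16-bit quantization
H_H_DIALOGUES = 1000       # human-human dialogues
H_M_DIALOGUES = 38         # human-machine dialogues
AVG_MINUTES = 3.5          # average duration per dialogue

def corpus_estimate(dialogues, avg_minutes, rate_hz, bytes_per_sample):
    """Return (hours, bytes), assuming mono uncompressed PCM (an assumption)."""
    seconds = dialogues * avg_minutes * 60
    hours = seconds / 3600
    size_bytes = int(seconds * rate_hz * bytes_per_sample)
    return hours, size_bytes

hours, size_bytes = corpus_estimate(
    H_H_DIALOGUES + H_M_DIALOGUES, AVG_MINUTES, SAMPLE_RATE_HZ, BYTES_PER_SAMPLE
)
print(f"{hours:.2f} hours, {size_bytes / 1e9:.2f} GB")
```

Under these assumptions the corpus comes to roughly 60 hours of audio and about 3.5 GB of raw samples.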
Statistical Results in Detail
180 human-human dialogues, 38 human-machine dialogues
Special event    Count   Explanation
Numbers          700     Numbers
Filled pauses    5900    Short non-silence disfluencies, such as [um], [uh], [eh], [ou]
Hesitation       300     Short silence in the context of disfluencies
Laugh            109     Laughter
Breath           98      Breath
Bksound          2300    The caller speaks in an evidently noisy environment
MutiSound        570     The caller and the service agent speak at the same time
Barge_in         68      The speaker barges in on the system’s prompt
Echo             30      The machine’s echo prompt
Noise            2000    Non-speech noise and background speech noise
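To put the counts in the table on a common scale, they can be divided by the number of dialogues analyzed (180 + 38 = 218). The sketch below copies the counts from the table; the dictionary and function names are illustrative.

```python
DIALOGUES = 180 + 38  # human-human plus human-machine dialogues in this sample

EVENT_COUNTS = {
    "Numbers": 700,
    "Filled pauses": 5900,
    "Hesitation": 300,
    "Laugh": 109,
    "Breath": 98,
    "Bksound": 2300,
    "MutiSound": 570,
    "Barge_in": 68,
    "Echo": 30,
    "Noise": 2000,
}

def per_dialogue(counts, dialogues):
    """Average number of occurrences of each event per dialogue."""
    return {name: count / dialogues for name, count in counts.items()}

rates = per_dialogue(EVENT_COUNTS, DIALOGUES)
for name, rate in sorted(rates.items(), key=lambda item: -item[1]):
    print(f"{name:14s} {rate:6.2f}")
```

These are plain averages over the whole sample. Barge_in and Echo presumably occur only in the 38 human-machine dialogues, so a per-type breakdown would need counts that the slides do not give.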
Summary
C-SIIT, CAS started the work to build telephone speech corpora under a very limited budget 3 years ago
The efforts and experiences in collecting a real Chinese telephone speech corpus are introduced
C-SIIT will continue the activity on real Chinese telephone and mobile phone speech corpora and will try its best to release most of the corpora, whether already built, being built, or planned, to the public
Suggestions and comments from all of you are appreciated
Thanks!