Data Collection in Nespole!
Goals, procedures and tools
Susanne Burger (Carnegie Mellon University)
Erica Costantini (University of Trieste)
Recent Advances in Speech Translation Systems
New Idea
Information about Users:
• Acceptance
• Usage
• Behavior
• Wish-list
• Problem solving
• ...
System Information (Dry Run):
• Stability
• Speed
• Bugs
• ...
Speech Material:
• Domain, concept, vocabulary
• Style (human-machine conversation)
• Quality (robustness)
• ...
Why data collection?
Learning by Data
J.T. Hackos, J.C. Redish, User and Task Analysis for interface design, J. Wiley & Sons, 1998.
Learning by Data
1 Mass data from scratch
Artificial Scenario/Environment/Setup
Wizard of Oz
Cooperative User/Actor
Data collection through use of a beta system with increasing realism
2 User-study Data
Analysis, Development, Training, Testing, Evaluation
Beta-System
Data Collection: Planning
Who are the “Data Customers”?
Nespole!:
• ASR
• MT
• Synthesis
• Interface Development
• ...
Type of Collection?
Nespole!:
• Mass Data Collection
• Specific features
• User study
Customer Needs?
Nespole!:
• Audio / Video
• Transcription (levels of transcription)
• Segmentation
Time and Budget
Data Usage?
Nespole!:
• Analysis
• Development
• Training
• Testing
• Evaluation
Mass-Data Collection: Showcase 1
Travel Scenario / H323 Setup, Monolingual
Cooperative Users
Travel + Multimodality, Beta System MT
Unseen Users
Multimodal Experiment
IDEA: NEgotiation through SPOken Language in E-commerce
Nespole! Showcase 1 System
Nespole! Data Collection
Analysis, Development, Training, Testing, Evaluation
Example: Mass-Data Collection (Showcase 1)
Monolingual data collection for system development
“Assembly Line”
Data Collection Procedure
Recording
Scen./Topic
Participants
Environment
Equipment
Data
Scenarios
• Scenario: “story” about users, their work, their environment, how they do tasks, the task they need to do, and all combinations of these elements (*).
• Scenario in Nespole!
Detailed description of:
– the customers’ features (age, marital status…);
– the destination of the travel;
– the objectives and preferences for the holiday (accommodation, sport activities, cultural events…)
(*) J. M. Carroll, Ed., Scenario-Based Design: Envisioning Work and Technology in System Development, New York, J. Wiley & Sons, 1995.
Scenarios
Showcase 1
1. winter holidays in Val di Fiemme
2. all-inclusive tourist package
3. summer vacation in a park
4. castle and lake tours
5. looking for folklore and brochures
Showcase 2a
All-inclusive tourist packages:
1. summer in a hotel or apartment
2. summer in a campsite
3. summer in a hotel or apartment for a family
4. summer in a campsite for a family
5. winter in a hotel or apartment
Showcase 2b
1. script 1: chest_pain_1
2. script 3: chest_pain_2
3. script 2: flu-like syndrome 1
4. script 4: flu-like syndrome 2
(versions 1 and 2 differ in personal data and symptom description)
Scenarios in Nespole!
Scenario example
Situation (Winter Holidays in Val di Fiemme):
• choose a vacation start date after December 10th and how long you want to stay (a weekend, 1 week, 2 weeks)
• you have 2 children (choose 2 ages between 2 and 11) and wife/husband
• you want to travel by car and park it at the hotel
• you already know the road to Val di Fiemme
• you want accommodation in ** or *** hotels in Val di Fiemme with bed & breakfast
• choose two hotels among: Latemar in Molina, Bellavista in Cavalese, Excelsior in Cavalese, Lagorai in Cavalese, Belvedere in Panchia, Bellaria in Predazzo, Cimon in Predazzo, Erica in Tesero, Lucia in Tesero, Montanara in Ziano, Zanon in Ziano
• you want to practice a winter sport (choose your favorite among the following: downhill skiing, cross-country skiing/snowshoeing, ice skating, snowboarding)
Things to ask for:
• prices and how far in advance to book
• types of ski-lifts nearby and their distance from hotel
• existence of cross-country trails and ice skating areas
• details about favorite winter-sport (exact location, prices, possibility of renting equipment)
• type of parking facilities for the car
• possibility of eating in the hotel and prices of dinner and late supper
• daycare and activities for children in the hotel
• special prices for children
Scenario example
Scenario definition in Nespole!
Example: Showcase 1
• analysis of 5000 e-mail messages (in four languages);
• clustering of the e-mails on the basis of the request type;
• selection of e-mails concerning requests that could be discussed in a phone call;
• construction of 21 scenarios;
• selection of 5 scenarios* among the 21 (done by the APT tourist board office manager)
* http://www.is.cs.cmu.edu/nespole/datacoll.html
Participants
CUSTOMERS:
Language: Fluent speaker
Age: Adults
Sex: We tried to balance M & F
Education: University (students or more)
Knowledge in the field: Half from speech labs and half from other labs or departments
Computer literacy: Average-high
Recruitment: Volunteers (invitation)
Reward: Non-paid
Other: Collaborative
AGENTS:
Italian professional agents working at the Trentino tourist office APT
• APT (agent’s site, Italy) records the English client via H323 connection and the Italian agent via headset
• CMU (client’s site, USA) records the Italian agent via H323 connection and the English client via headset
Environment
File .wav (stereo): H323 Eng. customer + Agent (local)
File .wav (stereo): H323 Agent + Eng. customer (local)
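Since each site stores the remote H323 signal and the local headset signal together in one stereo file, downstream tools often need one file per speaker. The following is a minimal sketch of such a split using only the Python standard library. It assumes 16-bit PCM samples and one speaker per channel; the slides state neither the sample format nor which channel carries the H323 signal, so the function name `split_stereo` and the left/right output naming are illustrative only.

```python
import array
import wave


def split_stereo(src, left_path, right_path):
    """Write the two channels of a 16-bit stereo WAV file to two mono files."""
    with wave.open(src, "rb") as wav:
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expected a 16-bit stereo file")
        rate = wav.getframerate()
        samples = array.array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))

    # Stereo frames are interleaved: left, right, left, right, ...
    for path, offset in ((left_path, 0), (right_path, 1)):
        with wave.open(path, "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(2)
            out.setframerate(rate)
            out.writeframes(samples[offset::2].tobytes())
```

Which of the two output files holds the agent and which the customer depends on how the recording software assigned the channels, and has to be checked per site.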
Hardware: PC Pentium 200 and up
Software: Windows NT or Win 98, Total Recorder, NetMeeting 3.01
Microphone: Headset or close microphone
Environment: Quiet office
Equipment
Recording Procedure (customer’s site)
Before the recording session
- Detailed background knowledge of the scenario
- Access to web-pages
- On-line form (to learn more about the role)
During the recording session
- Signing a consent form and providing information about factors possibly affecting the spoken language
- Sitting in front of a computer, wearing a headset
- Pressing the call button in the NetMeeting window (when the customer feels ready)
- After 10 min the customer was urged to finish
Recording: LTI Data Collection Database
Oracle database, accessible online, containing detailed information and descriptions about meetings recorded, demographics of the speakers, transcriptions and audio files
(currently two separate interfaces to enter data into and retrieve data from the database)
File naming conventions
Why?
Confusion with parallel recordings; different types of files concerning the same recording; different languages, types of scenario, locations; stereo vs mono files, etc.
Example from Nespole! file naming conventions
[dia_name] .[extension]
[language] [count] [scenario] [rec_location] [channel] .[extension]
language: e = English, g = German, f = French, i = Italian
count: 000-999
scenario: A = scen1, b = scen2, c = scen3, d = scen4, e = scen5
rec_location: a = APT, g = Grenoble, i = IRST, k = Karlsruhe, p = Pittsburgh
channel: 1 = agent, 2 = customer
extension: wav = audio, spr = speaker_info, rpr = recording protocol, trl = transcription, mar = time stamps
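A naming scheme like this can be decoded mechanically. The sketch below parses a per-channel file name into its fields. It assumes the fields are concatenated without separators and compared case-insensitively, neither of which the slide states explicitly; the function name `parse_name` and the example file name `e025ap1.wav` are made up for illustration.

```python
import re

LANGUAGES = {"e": "English", "g": "German", "f": "French", "i": "Italian"}
SCENARIOS = {"a": "scen1", "b": "scen2", "c": "scen3", "d": "scen4", "e": "scen5"}
LOCATIONS = {"a": "APT", "g": "Grenoble", "i": "IRST", "k": "Karlsruhe", "p": "Pittsburgh"}
CHANNELS = {"1": "agent", "2": "customer"}
EXTENSIONS = {
    "wav": "audio",
    "spr": "speaker_info",
    "rpr": "recording protocol",
    "trl": "transcription",
    "mar": "time stamps",
}

# language, 3-digit count, scenario, recording location, channel, extension
NAME = re.compile(r"^([egfi])(\d{3})([a-e])([agikp])([12])\.(wav|spr|rpr|trl|mar)$")


def parse_name(filename):
    """Decode a per-channel file name, or return None if it does not fit the scheme."""
    match = NAME.match(filename.lower())
    if match is None:
        return None
    language, count, scenario, location, channel, extension = match.groups()
    return {
        "language": LANGUAGES[language],
        "count": int(count),
        "scenario": SCENARIOS[scenario],
        "rec_location": LOCATIONS[location],
        "channel": CHANNELS[channel],
        "type": EXTENSIONS[extension],
    }


# Hypothetical name: English, dialogue 25, scenario 1, recorded in Pittsburgh, agent channel.
print(parse_name("e025ap1.wav"))
```

Names that follow the shorter `[dia_name].[extension]` form are not covered by this pattern and yield `None`.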
Log data: recording protocol
FOR EACH DIALOGUE:
Recording Name, Session No., Project Name, Recording Type, Recording Category, Recording Topic, Recording Description, Recording Scenario, Recording Date, Recorded By, Number of Speakers, Number of Channels, Comments
FOR EACH CHANNEL:
Coding (pcm, A-law, u-law), Number of bits (8/16), byte-order (little-endian, big-endian), rate, mono or stereo, size in bytes, length in ms, medium type, medium brand, medium usage, medium ID, cable ID, mixer brand, mixer settings, channel, speakers
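Several of the per-channel entries can be read directly from the audio file instead of being typed in by hand. The following sketch fills those entries from a WAV header with Python's standard `wave` module. The dictionary keys are hypothetical names, not the database's actual column names, and the module only reads uncompressed PCM, so A-law and u-law files are outside its scope.

```python
import os
import wave


def channel_protocol(path):
    """Derive the audio-format entries of the recording protocol from a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return {
            "coding": "pcm",                # the wave module reads PCM only
            "bits": wav.getsampwidth() * 8,
            "byte_order": "little-endian",  # RIFF/WAV data is little-endian
            "rate": rate,
            "layout": "mono" if wav.getnchannels() == 1 else "stereo",
            "size_bytes": os.path.getsize(path),
            "length_ms": frames * 1000 // rate,
        }
```

The remaining entries (medium, cable, mixer, speakers) describe the physical setup and still have to be logged manually.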
Log data: speaker protocol
MANDATORY DATA:
Native language; Gender; Date of birth; Education; Current occupation; Area of residence during primary years of schooling (until age 12)
NON MANDATORY DATA:
Last Name; First Name; Middle Initial; User Name (only if applicable); Father's and Mother’s Native Language; Accent/Dialect within Native Language (if any); Height (Ft/in or Cm); Weight (Lbs or Kg); Area Of Birth; Area Of Longest Residence; Right or Left Handed; Smoker; Medical Conditions which could Affect Speaker's Voice; Speaker Comments; Email Address; Phone Number
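The split into mandatory and non-mandatory data lends itself to a simple completeness check before a speaker record is entered into the database. This is a minimal sketch; the snake_case field names are invented for the example and do not reflect the actual database schema.

```python
# Mandatory speaker data from the slide, under hypothetical field names.
MANDATORY_FIELDS = (
    "native_language",
    "gender",
    "date_of_birth",
    "education",
    "current_occupation",
    "primary_school_area",  # area of residence during schooling until age 12
)


def missing_mandatory(record):
    """Return the mandatory fields that are absent or blank in a speaker record."""
    return [
        field
        for field in MANDATORY_FIELDS
        if not str(record.get(field) or "").strip()
    ]


speaker = {
    "native_language": "Italian",
    "gender": "F",
    "date_of_birth": "",
    "education": "University",
    "smoker": "no",  # non-mandatory entries are simply ignored by the check
}
print(missing_mandatory(speaker))
```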
Audio Data
Transcription Conventions
Transcription Tool
TRL Files, MAR Files, Voc Lists
...m054_1_0575_QXE_00: if it was , I don't know , in the beginning of the century , I would think so , but .
m054_5_0576_MTY_00: yeah , I mean , +/we d=/+ we don't know a lot <B> about anything .
m054_4_0577_ZMW_00: but +/even/+ I think even if they would have known a little bit more . <B> think about all these chicken farms or things like all this <B> +/k=/+ kind of really <B> terrible <B> behavior against animals , anyway . <B> so , +/I/+ +/I don't think/+ <B> <hes> +/I th=/+ I think as soon as some financial or land things or things like this <B> came into the game , <B> they don't think anymore <Laugh> about <B> animal behavior . this is +/ku=/+ just <B> <Noise> secondary% .
m054_3_0578_AAH_00: <hm>
m054_5_0579_MTY_00: right . <B>
m054_4_0580_ZMW_00: so , <B> this...
Transcription process
Transcription (trl) Conventions
Verbmobil II:
- we are familiar with VMB and we have appropriate tools
- BAS partitur format
- finite/closed system (parsing, filtering, converting)
- line oriented, no formats (one line/turn)
- turn oriented (turn-IDs contain full identification)
- time stamps and trl are in different files linked by turn-ID
(http://www.is.cs.cmu.edu/trl_conventions/)
S. Burger, L. Besacier, P. Coletti, F. Metze and C. Morel, “The NESPOLE! VoIP Dialogue Database”, in Proc. of Eurospeech 2001. Aalborg, Denmark.
-words
-capitalization
-punctuation
-white space
-turn-end
-syntax
-non-grammatical phrases
-broken words
-interrupted words
-acoustically hard to understand
-pauses and breathing
-filled pauses
-acoustically not understandable
-human noise
-word tags
-elements
-rules
Orthography:
- orthographic rules as long as they are non-ambiguous
- no capitalization in sentence-initial position
- vocabulary lists to keep spelling consistent
Content
<*tENG> Foreign Language Turn (JAP, GER, ..)
;.. global Comment
..'.. Apostrophe (reduced word)
..-.. (--) Hyphen (compound word)
$.. spelled Letter
~.. Name
#.. Number
*.. Neologism/Mispronunciation
<*XXX.. Foreign Word (FRA,ITA, ..)
...<L>.. / ..<Z>.. Lengthening
..% Poorly intelligible
..= Articulated Break-off
.._ Interruption of a Word, Left Fragment
_.. Interruption of a Word, Right Fragment
<T_>.. Technical Interruption of a Word, Beginning
..<_T> Technical Interruption of a Word, End
<*T> Technical interruption of a Turn
<*T>t Technical Break-off of a Turn
<!n ..> Comment on Pronunciation
. / ? / , Punctuation
+/.. Beginning of a Repetition/Correction
../+ End of a Repetition/Correction
-/.. Beginning of a False Start
../- End of a False Start
<B> / <A> Respiration
<uh> / <"ah> Filled Pause (Hesitation)
<uhm> / <"ahm> Filled Pause (Hesitation)
<hm> Filled Pause (Hesitation)
<hes> / <h"as> Filled Pause (Hesitation)
<%> Unidentifiable Sound Production
<Smack> / <Schmatzen> Nonverbal Articulatory Sound (sound: smacking)
<Swallow> / <Schlucken> Nonverbal Articulatory Sound (sound: swallowing)
<Throat> / <R"auspern> Nonverbal Articulatory Sound (sound: clearing one's throat)
<Cough> / <Husten> Nonverbal Articulatory Sound (sound: cough)
<Laugh> / <Lachen> Nonverbal Articulatory Sound (sound: laughing)
<Noise> / <Ger"ausch> Nonverbal Articulatory Sound (other sounds)
<#Click> / <#Klicken> Technical Noise
<#Ring> / <#Klingeln> Technical Noise
<#Knock> / <#Klopfen> Technical Noise
<#Mtouch> / <#Mikrobe> Technical Noise
<#Mwind> / <#Mikrowind> Technical Noise
<#Rustle> / <#Rascheln> Technical Noise
<#Squeak> / <#Quietschen> Technical Noise
<#> Technical Noise
<P> Pause during Speech
@n.. Active Interference by a Speaker
..n@ Passively Interfered Speaker
<@n.. Active Interference by Acoustic Events
..n@> Passive Interference of Acoustic Events
<:<..> .. Beginning of Noise Interference
..:> End of Noise Interference
<;..> Local Comment
!KEY!.. Code Word
<PP> Scenario Caused Pause
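Because the annotation is a closed set of markers, a turn can be reduced to its plain word sequence mechanically, for example to build vocabulary lists. The sketch below covers only the subset of markers that occurs in the sample turns of this deck: angle-bracket tags, repetition/correction and false-start spans, break-offs (`..=`), the poor-intelligibility mark (`..%`) and punctuation. Interference, noise-span and word-prefix markers are not handled. The function name `clean_turn` and the choice to drop repaired material by default are assumptions of this example, not part of the conventions.

```python
import re

# "<*T>t" (technical break-off) carries a trailing "t"; all other tags are <...>.
TAG = re.compile(r"<\*T>t\b|<[^<>]*>")
REPAIR_SPAN = re.compile(r"\+/.*?/\+|-/.*?/-")  # whole repetition or false start
REPAIR_MARK = re.compile(r"\+/|/\+|-/|/-")      # just the delimiters
PUNCTUATION = {".", ",", "?"}


def clean_turn(text, drop_repairs=True):
    """Return the plain words of one transcribed turn (text after the turn ID)."""
    text = TAG.sub(" ", text)
    if drop_repairs:
        text = REPAIR_SPAN.sub(" ", text)   # discard the repaired material
    else:
        text = REPAIR_MARK.sub(" ", text)   # keep the words, drop the markers
    words = []
    for token in text.split():
        if token in PUNCTUATION or token.endswith("="):
            continue                        # punctuation and broken-off words
        word = token.rstrip("%")            # strip the poor-intelligibility mark
        if word:
            words.append(word)
    return words


print(clean_turn("yeah , I mean , +/we d=/+ we don't know a lot <B> about anything ."))
```

Whether repaired material should be kept depends on the data customer: an ASR trainer typically wants every spoken word, while an MT developer may prefer the cleaned-up version.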
Why another tool?
Different requirements than before:
- Windows instead of Linux
- Meetings: multiparty transcription
- Transcribers from different backgrounds
At that time (over three years ago) there was no adequate transcription tool
• We studied what the basic requirements would be.
• We asked transcribers what they would find convenient.
• We programmed a beta tool accordingly.
• We are still using this tool (as do several other sites in the meantime).
• We call it TransEdit.
Transcription Tools
TransEdit: transcription tool just for transcribers
• MFC program
• Windows text editor
• click-able buttons for transcription elements
• automatic turn naming and counting
• label editor
• parallel display of multiple audio signals
• easy turn segmentation
• lots of listen functions
• easy handling, no research functions
• “home work” but available for universities (write to: [email protected])
; CDR: 00.00
; TRV: 00.00
; File: e025at
; Last changes made on 09/29/2000
; Transcriber: VLM
; Comments:
;
e025_1_0000_ITL_00: hello ? <P> can you hear me now ?
e025_2_0001_XYZABC_00: hello .
e025_1_0002_ITL_00: hello% . yeah% .
e025_2_0003_XYZABC_00: <uh> yes , I can .
e025_1_0004_ITL_00: yes , okay . <P> so ?
e025_2_0005_XYZABC_00: -/hi I would like/- <P> yes ?
e025_1_0006_ITL_00: yes , can you hear me now ?
e025_2_0007_XYZABC_00: <uh> yes , I can .
e025_1_0008_ITL_00: okay . <B> wonderful . <Laugh> <B> <P> <Smack> <B> so , can I help you ? <B>
e025_2_0009_XYZABC_00: -/all right I would like/- <uh> yes , madam . I would like to schedule a winter vacation <P> in the north of Italy .
e025_1_0010_ITL_00: <hm> <B>
e025_1_0011_ITL_00: yes . <B> would you like t= <*T>t
e025_1_0012_ITL_00: yes . would you like to come here% in summer or during winter ?
e025_2_0013_XYZABC_00: <uh> in winter please .
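Each line of a trl file is one turn whose ID carries its full identification, so a file can be read line by line without any surrounding context. The sketch below splits a turn line into its parts. The field meanings are inferred from the sample above (dialogue ID, speaker/channel number, four-digit turn counter, speaker initials); the meaning of the trailing two-digit field is not stated on the slides, so it is kept as an opaque `suffix`. The names `Turn` and `parse_turn` are illustrative.

```python
import re
from typing import NamedTuple, Optional


class Turn(NamedTuple):
    dialogue: str   # e.g. "e025"
    speaker: int    # channel/speaker number
    counter: int    # running turn number
    initials: str   # speaker initials
    suffix: str     # trailing two digits, meaning not documented here
    text: str       # annotated transcription of the turn


# Stray spaces around the initials are tolerated.
TURN_LINE = re.compile(
    r"^(?P<dialogue>[a-z]\d{3})_(?P<speaker>\d)_(?P<counter>\d{4})"
    r"_\s*(?P<initials>[A-Z]+)\s*_(?P<suffix>\d{2}):\s*(?P<text>.*)$"
)


def parse_turn(line: str) -> Optional[Turn]:
    """Parse one trl line; header comments (";...") and malformed lines give None."""
    match = TURN_LINE.match(line.strip())
    if match is None:
        return None
    return Turn(
        match["dialogue"],
        int(match["speaker"]),
        int(match["counter"]),
        match["initials"],
        match["suffix"],
        match["text"],
    )


print(parse_turn("e025_1_0002_ITL_00: hello% . yeah% ."))
```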
automatic convention check
close check and correction by another transcriber
spell-checking
marker file and trl file cross-check
first pass transcription (but not rough ..)
Data transcription process
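Parts of the automatic convention check can be illustrated with a few lines of code. The sketch below is not the project's actual checker; it tests only three properties that follow from the conventions shown earlier: every turn line starts with a well-formed turn ID, turn counters are consecutive, and repetition/correction markers (`+/ .. /+`), false-start markers (`-/ .. /-`) and angle brackets are balanced within a turn. The turn-ID pattern is inferred from the sample transcripts.

```python
import re

TURN_ID = re.compile(r"^[a-z]\d{3}_\d_(\d{4})_\s*[A-Z]+\s*_\d{2}:")
PAIRS = (("+/", "/+", "repetition/correction"), ("-/", "/-", "false start"))


def check_markers(text):
    """Return messages for unbalanced paired markers in one turn's text."""
    problems = []
    tokens = text.split()
    for opener, closer, label in PAIRS:
        opened = sum(token.startswith(opener) for token in tokens)
        closed = sum(token.endswith(closer) for token in tokens)
        if opened != closed:
            problems.append(f"unbalanced {label} markers")
    depth = 0
    for char in text:
        depth += (char == "<") - (char == ">")
        if depth < 0:
            break
    if depth != 0:
        problems.append("unbalanced angle brackets")
    return problems


def check_transcript(lines):
    """Return (line number, message) pairs for all detected violations."""
    problems = []
    previous = None
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # blank lines and global comments
        match = TURN_ID.match(line)
        if match is None:
            problems.append((number, "missing or malformed turn ID"))
            continue
        counter = int(match.group(1))
        if previous is not None and counter != previous + 1:
            problems.append((number, "turn counter is not consecutive"))
        previous = counter
        for message in check_markers(line[match.end():]):
            problems.append((number, message))
    return problems
```

A check of this kind catches mechanical slips only; the close check by a second transcriber remains necessary for everything that requires listening to the audio.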
Following mass-data collection: Showcase 2a and 2b

                 Showcase 2a                 Showcase 2b
Domain           Tourism                     Medicine
Scenarios        5                           4
Multimodality    Yes                         Yes
Dialogues        66                          56
Participants     12 real APT agents;         3 doctors (in the doctor’s role) and
                 16 simulated customers      7 doctors in the patient’s role
                 per language                per language
Average length   15 mins.                    6 mins.