9/26/08 1
Language Identification – Motivation
• Why study language ID?
  o For multi-lingual NLP tasks, we need to identify the spoken language first
    • QA and information retrieval from multilingual data
    • Automatic dialogue systems: "the international airport of the future" (Hazen and Zue, 1993)
  o Call centers: route an incoming telephone call to a human switchboard operator (crucial in emergency situations)
  o We can learn about differences between languages
• Given a speech segment from an unknown language, decide:
  Language ∈ {English, Spanish, Arabic, ...}
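Automatic language identification using a segment-based approach (Hazen and Zue, 1993)
Language ID – Hazen and Zue's Framework
• Goal: design a general probabilistic framework for LID that combines:
  o Phonotactic (LM), prosodic, and acoustic models
• The most general expression to describe the LID problem, mathematically:
  $$\arg\max_i \Pr(L_i \mid \vec{a}, \vec{f})$$
• Using the chain and conditioning rules, we get this framework:
  $$\arg\max_i \Pr(\vec{a} \mid C_b, S_b, \vec{f}, L_i)\, \Pr(S_b, \vec{f} \mid C_b, L_i)\, \Pr(C_b \mid L_i)\, P(L_i)$$
  o $\Pr(\vec{a} \mid C_b, S_b, \vec{f}, L_i)$: acoustic model
  o $\Pr(S_b, \vec{f} \mid C_b, L_i)$: prosodic model (duration and F0)
  o $\Pr(C_b \mid L_i)$: language model (LM)
  o $P(L_i)$: prior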
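A minimal sketch of how the four factors could be combined at decision time, assuming each model already returns a log-probability per language. The function name `identify_language`, the dictionary keys, and all numbers are illustrative assumptions, not values from the paper.

```python
import math

def identify_language(scores):
    """Pick the language with the highest combined log score.

    scores maps a language name to the log-probabilities of the four
    factors in the framework (acoustic, prosodic, LM, prior).
    """
    def total(parts):
        # Multiplying probabilities is the same as adding their logs.
        return parts["acoustic"] + parts["prosodic"] + parts["lm"] + parts["prior"]

    return max(scores, key=lambda lang: total(scores[lang]))

# Hypothetical log scores for one test utterance (made-up numbers).
example_scores = {
    "English": {"acoustic": -120.0, "prosodic": -35.0, "lm": -48.0, "prior": math.log(0.5)},
    "Spanish": {"acoustic": -118.0, "prosodic": -36.5, "lm": -52.0, "prior": math.log(0.5)},
}
best = identify_language(example_scores)
```

Working in the log domain avoids numerical underflow when the per-frame or per-phone probabilities are very small.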
Acoustic Models
• Hypothesis: languages differ in their spectral distributions
• Two approaches to acoustic modeling:
  o (Hazen and Zue, 1993)
  o (Zissman, 1996)
Prosodic Models
• Hypothesis: languages differ in their prosodic structure
  o Duration, F0 patterns, energy, speaking rate, and rhythm
• Four approaches to prosodic modeling:
  o (Hazen and Zue, 1993)
  o (Zissman, 1996)
  o (Rouas, 2005)
  o (Timoshenko and Hoge, 2007)
Language Models
• Hypothesis: languages differ in their phonetic constraints and inventory
• Training, for each language i:
  o Run a phone recognizer on the training speech, giving phone sequences such as:
    dh uw z hh ih n d uw w ay ey d y aw ao uh jh y eh k oh aa k v hh aw ao n
    f uw v ow z l iy g s m p l k dh n eh g f ey m p l ay ae
    dh iy jh sh p eh ae ey d p sh ua r m ey f ay n z
  o Train a language model λi
Language Models
• Test utterance: run the phone recognizer to obtain the phone sequence $C_b$:
  uw hh ih n d uw w ay ey uh jh y eh k oh v hh aw ao n hh aa m
• Score $C_b$ under each language model: $\Pr(C_b \mid \lambda_i)$
• Four approaches to language modeling:
  o (Hazen and Zue, 1993)
  o (Zissman, 1996)
  o (Kirchhoff and Parandekar, 2001)
  o (Torres-Carrasquillo et al., 2002)
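The train-then-score procedure above can be sketched as follows. This is a simplified illustration, not any of the cited systems: it assumes a bigram model with add-one smoothing (the slides mention trigram and general n-gram models), and the phone strings and language names are toy data.

```python
import math
from collections import Counter

def train_bigram(sequences):
    """Collect bigram counts, context counts, and the symbol set."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for seq in sequences:
        padded = ["<s>"] + seq + ["</s>"]
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, vocab

def log_prob(seq, model, vocab_size):
    """Log Pr(seq | model) with add-one smoothing (an assumption here)."""
    bigrams, contexts, _ = model
    padded = ["<s>"] + seq + ["</s>"]
    return sum(
        math.log((bigrams[(p, c)] + 1) / (contexts[p] + vocab_size))
        for p, c in zip(padded, padded[1:])
    )

def identify(seq, models):
    """Return the language whose model gives the phone sequence the highest score."""
    vocab_size = len(set().union(*(m[2] for m in models.values())))
    return max(models, key=lambda lang: log_prob(seq, models[lang], vocab_size))

# Toy phone sequences standing in for phone-recognizer output.
models = {
    "lang_a": train_bigram([["dh", "uw", "z"], ["dh", "iy", "z"]]),
    "lang_b": train_bigram([["k", "oh", "aa"], ["k", "aa", "oh"]]),
}
guess = identify(["dh", "uw", "z"], models)
```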
Automatic language identification using a segment-based approach (Hazen and Zue, 1993)
• Data: OGI-TS, the Oregon Graduate Institute Multi-Language Telephone Speech database
  o 10 languages, 90 speakers
• Acoustic model:
  o Trained a Gaussian for each of 23 phone classes for each language
  o Features: 14 MFCC coefficients + 14 delta cepstra
• Prosodic model:
  o A duration model: a distribution for each phone
  o F0 model: two histograms for each language (F0 normalized)
• Phonotactic model:
  o 23 phone-class recognizer
  o Trigram LM
• Results (on 10 s test utterances):
  o Combining all models: 47.7% accuracy
Comparison of Four Approaches to Automatic Language Identification of Telephone Speech (Zissman, 1996)
• Goal: compare the performance of 4 LID approaches evaluated on a common corpus (OGI-TS):
  I. GMM acoustics: Gaussian mixture modeling
     • Train 2 GMMs (of 40 mixtures) for each language
     • 1st GMM: 12 MFCC coefficients
     • 2nd GMM: 13 deltas
  II. Three phonotactic approaches
     1. PRLM: single-language phone recognition followed by language-dependent n-gram models
     2. Parallel PRLM: use multiple phone recognizers, each trained on a different language
     3. PPR: language-dependent parallel phone recognition
        • A phone recognizer for each language to be identified
        • The LM is embedded in the HMM
• Results (comparing GMM to LMs):
  o On 3 languages: PPR performs as well as Parallel PRLM, and both achieve the best accuracy (85%)
  o On 10 languages: Parallel PRLM achieves the best accuracy (63%); PRLM: 54%; GMMs: 50%
• Improvements:
  o Prosodic model (duration only: short vs. long phones)
  o Gender model
  o 79% on 11 languages
Language Identification using Gaussian Mixture Model Tokenization (Torres-Carrasquillo et al., 2002)
• Goal: instead of using a phone recognizer in the front end (the PRLM approach), use cluster indexes
• Approach:
  o Train a GMM on the acoustic data of one language only
  o Tokenizer: for each frame, output the index of the highest-scoring Gaussian in the GMM
  o Train a bigram model for each language over these indexes
• Advantages:
  1. No need for manually transcribed data
     • Avoids mislabeled data
     • More data can be added easily
  2. A GMM is less expensive than a phone recognizer, so processing is faster during recognition
  3. Can be combined with the phone language model score to further boost performance
• Results:
  o On 12 languages (30 s test utterances)
  o This approach alone did not outperform the baseline
  o But the combination of 3 approaches is the best (83%)
  o A backend classifier always improves accuracy
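The tokenizer step can be illustrated with a small sketch. It assumes diagonal-covariance components and scores each component by its weighted log density; whether the original system includes the mixture weight in the score is not stated on the slide, and the two-component "GMM" and frames below are made up.

```python
import math

def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def tokenize(frames, components):
    """Map each frame to the index of its highest-scoring mixture component.

    components is a list of (log_weight, mean, var) tuples.
    """
    tokens = []
    for x in frames:
        scores = [lw + log_gaussian(x, m, v) for lw, m, v in components]
        tokens.append(scores.index(max(scores)))
    return tokens

# Hypothetical 2-component GMM over 2-dimensional features.
components = [
    (math.log(0.5), [0.0, 0.0], [1.0, 1.0]),
    (math.log(0.5), [5.0, 5.0], [1.0, 1.0]),
]
frames = [[0.1, -0.2], [4.8, 5.3], [0.4, 0.3]]
tokens = tokenize(frames, components)
```

The resulting index sequence plays the role that the phone string plays in PRLM: a per-language bigram model is trained over it.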
Hierarchical Language Identification based on Automatic Language Clustering (Yin et al., 2007)
• Goal: determine which feature type is most important for distinguishing one language from another
• Instead of fusing all features in one classifier, use a multi-level hierarchical classifier that splits the languages based on one feature type at a time
• Use an agglomerative clustering technique:
  1. Init: one cluster for each language
  2. Iteratively train classifiers on each pair of language clusters:
     • Merge the pair with MAX over features {min accuracy of the classifier on the dev set}
  3. Repeat the process until all languages are in one cluster
• Feature types:
  o MFCC with 7 coefficients (denoted M)
  o Prosodic features: pitch and intensity (P)
  o Concatenation of both (M+P)
• Results on OGI (10 languages, 10 s):
  o The system significantly outperforms all baselines
  o Adds 1.1% to the baseline
  o Accuracy: 91.3%
Phonetic knowledge, phonotactics and perceptual validation for automatic language identification (Adda-Decker et al., 2003)
• Goal: estimate the upper bound of the phonotactic approach by discarding linguistic noise due to recognition errors
  1. Compare language ID performance when trained on phonetic hand-labeled data vs. automatic phone recognizer output
  2. See how well humans perform in identifying languages
• Experiments (BN, 8 languages, 3 h each):
  o C/V sequences, 10 classes, 19 megaphones from 70 multilingual phones
  o Train 5-gram phonotactic models
• Results on hand-labeled data:
  o Test utterances of 10 phones
  o Test utterances of 20 phones: 100% accuracy is obtained
• Results on automatic phonetic annotations:
  o Using PR with 19 classes:
    • 51.9% (10 phones, 0.7 s); 83.7% (40 phones, 3 s); 93.7% (80 phones, 6 s)
  o Using 70 classes, accuracy increased (63.2% for 10 phones)
• Human LID performance on 8 languages:
  o Subjects were trained on 20 seconds per language
  o 14 native French-speaking academics listened to 3 tokens per language, each 1.5 to 2 seconds long
  o The combined correct identification rate is 87.6%; Spanish was the hardest (63%)
Multi-Stream Statistical N-Gram Modeling With Application to Automatic Language ID (Kirchhoff and Parandekar, 2001)
• Goal: use multiple parallel sequences of articulatory phonetic features
  o Voicing, manner of articulation, consonantal place of articulation, nasality, etc.
  o E.g., one stream: <glide, vowel, plosive, vowel, fricative, vowel, plosive, affricative, ...>
• Approach:
  o Acoustic model for each feature
  o Decode each feature group $F_k$ ($k = 1, \ldots, K$) independently
  o Classification: $\arg\max_i P(F_1, \ldots, F_K \mid L_i)$
  o Stream selection: {manner, consonantal place, vowel place, front-back, and rounding}
• Results:
  o OGI-TS corpus, including <3 s utterances
  o The phone-based approach performs better than independent streams
  o With some dependency between streams, feature model >> phone model, on short utterances only:
    • 50.8% → 54.8% (<15 s) and 33.3% → 48% (<3 s)
• Advantages:
  o Unseen phone contexts in the test data
  o 47 feature models vs. 126 phone models → more robust n-gram models
  o Training data for phonetic features can be shared across phones → more robust acoustic models
  o Language-independent nature of phonetic features
Language identification with suprasegmental cues: A study based on speech resynthesis (Ramus and Mehler, 1999)
• Goal: how do newborns separate input utterances from two languages in a bilingual environment?
• Rhythm hypothesis: newborns are able to discriminate languages that have different rhythmic structures
• Intonation hypothesis: the discrimination is made on the basis of intonation and not rhythm
• Stimuli: 20 Japanese and 20 English sentences read by 4 native speakers
• Resynthesis experiments:
  1. saltanaj: intonation, rhythm, and broad phonetic categories were preserved
     • All non-prosodic, lexical, and syntactic information was lost (all phonemes replaced by 6 broad categories)
  2. sasasa: only intonation and rhythm were preserved
  3. aaaa: only the intonation of the original sentences was preserved (F0 interpolated over unvoiced frames)
  4. Flat sasasa: only syllabic rhythm is preserved (constant fundamental frequency)
Language identification with suprasegmental cues: A study based on speech resynthesis (Ramus and Mehler, 1999)
• Perceptual experiment:
  o 64 adult French subjects were told that the utterances were from acoustically modified Sahatu and Moltec
  o Subjects (after passing a training session) were asked to answer S or M
• Results:
  o saltanaj, sasasa, and flat sasasa were identified significantly above chance, but NOT aaaa
  o When the stimuli were presented to 16 English subjects, who were told that one of the languages is English, they could significantly discriminate between aaaa English and aaaa Sahatu
• Conclusion:
  o Syllabic rhythm was a robust cue for discrimination
  o Intonation can be of greater interest to native speakers
Using Speech Rhythm for Acoustic Language Identification (Timoshenko and Hoge, 2007)
• Goal: use rhythm as a feature for language ID
  o Speech rhythm is modeled using the durations of two successive syllables
• Automatic syllabification of the acoustic signal is hard (it is a language-dependent task). Instead, use pseudo-syllables:
  o C^N V, where N ≥ 0
  o duration = |C^N| + |V|
• Approach:
  o A speech utterance is modeled as a sequence of pseudo-syllable durations D = d_1 d_2 … d_N
  o Learn a bigram model over these sequences for each language
  o Given a test utterance, get the duration sequence and test which bigram model provides the highest likelihood
• Results (on 7 languages, 7 s):
  o Rhythm alone with an ANN provides ~32% accuracy
  o The acoustic system alone: 92.08% accuracy
  o The fused system: 92.9% accuracy
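A small sketch of the pseudo-syllable step, assuming the signal has already been segmented into consonantal ("C") and vocalic ("V") stretches with known durations. The input format, the function name, and the decision to drop trailing consonants that have no following vowel are assumptions made for illustration.

```python
def pseudo_syllable_durations(segments):
    """Group C/V segments into C^N V pseudo-syllables (N >= 0).

    segments is a list of (label, duration) pairs with label "C" or "V".
    Returns the duration |C^N| + |V| of each pseudo-syllable.
    """
    durations = []
    pending = 0.0  # accumulated duration of consonants awaiting a vowel
    for label, dur in segments:
        if label == "C":
            pending += dur
        elif label == "V":
            durations.append(pending + dur)
            pending = 0.0
        else:
            raise ValueError(f"unexpected label: {label!r}")
    # Trailing consonants with no following vowel are dropped (an assumption).
    return durations

# Made-up segmentation, durations in seconds.
segments = [("C", 0.05), ("C", 0.04), ("V", 0.10), ("V", 0.08),
            ("C", 0.06), ("V", 0.12), ("C", 0.03)]
durations = pseudo_syllable_durations(segments)
```

The resulting sequence corresponds to D = d_1 d_2 … d_N; modeling it with a bigram would additionally require quantizing the durations, which is not shown here.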
Modeling Long and Short-term prosody for language identification (Rouas, 2005)
• Goal: investigate the efficiency of prosodic features for language ID at the level of pseudo-syllables
• Modeling short- and long-term prosody:
  o Long-term prosody models prosodic movements over several pseudo-syllables
  o Short-term prosody represents prosodic movement inside a pseudo-syllable
• F0 is modeled using the Fujisaki model (phrase and local accentuation):
  o The baseline of the F0 contour is computed by connecting all the local minima (mark the slopes with ups/downs/silence)
  o Subtract it from the original F0 contour. The residue is then approximated using linear regression in each unit (long/short), and the slopes are marked
• Energy: for each pseudo-syllable segment, compute the linear regression of the energy and then mark the slopes
• Duration: mark the units as short/long (what is short/long?)
• They use an N-multigram model to model these sequences
• Results (on 7 languages, 20 s):
  o The long-term prosodic model provides: 41%
  o The short-term prosodic model provides: 63%
  o Merging both models: 71.2%
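The "linear regression, then mark the slopes" step can be sketched as below. This is only an illustration of the idea: the label names, the `tol` threshold, and the sample contours are assumptions, and the slide's ups/downs/silence labels are reduced here to up/down/flat.

```python
def regression_slope(values):
    """Least-squares slope of values against their frame index."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den

def label_slope(values, tol=0.0):
    """Turn a contour inside one unit into a discrete slope label."""
    slope = regression_slope(values)
    if slope > tol:
        return "up"
    if slope < -tol:
        return "down"
    return "flat"

# Hypothetical F0 (or energy) samples for three pseudo-syllables.
units = [[100, 110, 125], [130, 120, 105, 100], [90, 90, 90]]
labels = [label_slope(u) for u in units]
```

A sequence of such labels is the kind of discrete symbol stream that a sequence model can then be trained on.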
Summary of language ID approaches
Paper | Approach | Accuracy | #Languages | Test duration
(Hazen and Zue, 1993) | A + P + PRLM | 47.7% | 10 | 10 s
(Zissman, 1996) | Parallel PRLM + duration + gender-dependent models | 79% | 11 | 10 s
(Kirchhoff and Parandekar, 2001) | PRLM of articulatory phonetic features | 48% | 10 | <3 s
(Torres-Carrasquillo et al., 2002) | Parallel PRLM + GMM-A + tokenizer | 83% | 12 | 30 s
(Adda-Decker et al., 2003) | LM on manual phones | (100%) | 8 (BN) | <3 s
(Rouas, 2005) | Short- + long-term prosody | 71.2% | 7 | 20 s
(Timoshenko and Hoge, 2007) | GMM-A + rhythm | 92.9% | 7 | 7 s
(Yin et al., 2007) | A-GMM + prosodic features in a hierarchical classifier + recent SP techniques | 91.3% | 10 | 10 s
Arabic Dialect Modeling and Analysis – Motivation
• Arabic: Modern Standard Arabic (MSA, formal) and colloquial Arabic (informal or casual)
  o Dialects: Maghrebi, Egyptian, Sudanese, Levantine, Iraqi, Gulf
• Typically, what is spoken is not MSA: it is spontaneous, unconstrained, and not well studied, hence the lack of NLP tools
• One of the main challenges of speech recognition and NLP tasks is to deal with informal data
• Goal: what are the speech cues that make Arabic dialects different?
  o Dialect ID
  o Code switching and mixing
Arabic Dialect Modeling – Topics
1. Rhythmic and Syllabic Structure
   1. (Ramus, 2002)
   2. (Hamdi et al., 2004)
   3. (Hamdi et al., 2005)
2. Intonation and Stress
   1. (Barkat, 1999)
   2. (De Jong and Zawaydeh, 1999)
   3. (Hellmuth and El Zarka, 2007)
3. Handling ASR problems for Dialects
   1. (Kirchhoff and Vergyri, 2004)
   2. (Vergyri et al., 2005)
4. Morphology
   1. (Habash and Rambow, 2006)
Acoustic correlates of linguistic rhythm: Perspectives (Ramus, 2002)
[Figure: (Ramus, 2002) rhythm-based measures used to classify languages as stress-timed, syllable-timed, or mora-timed]
• Goals: compare two recent measures for distinguishing the rhythmic structure of languages
  1. Duration and variability of vocalic and intervocalic intervals: %V, ΔC, and ΔV (Ramus et al., 1999)
  2. Pairwise Variability Index (rPVI and nPVI) of vocalic and non-vocalic intervals (Grabe and Low, 2002)
• Issue: rhythm is, at least in part, a matter of duration, and durations are affected by speaking rate (across languages)
• Conclusion: the nPVI method is robust to variability due to variations in speaking rate
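For reference, the two families of measures can be computed as in the sketch below, using their standard definitions: %V is the proportion of vocalic duration, ΔC and ΔV are standard deviations of interval durations, rPVI is the mean absolute difference between successive intervals, and nPVI normalizes each difference by the local mean. The function names are illustrative, and the use of the population standard deviation is an assumption.

```python
from statistics import pstdev

def percent_v(vocalic, consonantal):
    """%V: share of the total duration taken up by vocalic intervals."""
    return 100.0 * sum(vocalic) / (sum(vocalic) + sum(consonantal))

def delta(intervals):
    """Delta-C or Delta-V: standard deviation of the interval durations."""
    return pstdev(intervals)

def rpvi(intervals):
    """Raw PVI: mean absolute difference between successive intervals."""
    pairs = list(zip(intervals, intervals[1:]))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

def npvi(intervals):
    """Normalized PVI: each difference is divided by the mean of the pair."""
    pairs = list(zip(intervals, intervals[1:]))
    return 100.0 * sum(abs(a - b) / ((a + b) / 2) for a, b in pairs) / len(pairs)

# Made-up interval durations in seconds, and the same utterance at half the rate.
vocalic = [0.1, 0.2, 0.1]
slower = [2 * d for d in vocalic]
same_npvi = abs(npvi(vocalic) - npvi(slower)) < 1e-9
```

Stretching every interval by the same factor doubles rPVI but leaves nPVI unchanged, which illustrates the robustness to speaking rate noted in the conclusion.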
Speech timing and rhythmic structure in Arabic dialects: A comparison of two approaches (Hamdi et al., 2004)
[Figure: rhythm classes of (Ramus, 2002) (stress-timed, syllable-timed, mora-timed), with English, Dutch, and Arabic as examples; (Hamdi et al., 2004) study rhythm across dialects using the Ramus metrics]
• Goal: explore the differences between the rhythmic structures of Arabic dialects
  o Subjects in (Barkat et al., 1999): "Western Arabic sounded faster and jerkier than Eastern Arabic" → speech rhythm
• Question: are there systematic rhythmic distinctions between dialects?
• Dialects and languages: 6 Arabic dialects (3 western and 3 eastern) and 3 other languages
  o 30 sentences per language/dialect, 2.5 s long on average (3 male speakers per language)
• Method: compute %V and ΔC for each language/dialect and compare
• Results:
  o A gradual increase of %V as one moves from West to East (*)
  o ΔC decreases from West to East (**)
  o French has larger vocalic intervals than the other languages and the Arabic dialects
  o ΔC of French is similar to that of the eastern dialects
  o Significant differences between regions (West vs. East) but not between dialects within the same group
  o The average %V of the western dialects is significantly lower than that of the eastern dialects
  o The average ΔC of the western dialects is significantly higher than that of the eastern dialects
• Comparing rhythm methods: high correlation between ΔC and rPVI-C, and between ΔV and rPVI-V
• In the LID framework → prosodic model
Syllable Structure in Spoken Arabic: a comparative investigation (Hamdi et al., 2005)
[Figure: rhythm classes of (Ramus, 2002) (stress-timed, syllable-timed, mora-timed; English, Dutch, Arabic); rhythm across dialects using the Ramus measures (Hamdi et al., 2004); syllabic structure of Arabic dialects and the syllable/rhythm relationship (Hamdi et al., 2005)]
• Goal: compare in detail the syllabic structure of three Arabic dialects (Moroccan, Tunisian, and Lebanese Arabic) to understand their rhythmic tendencies
• Data: 8-10 minutes of spontaneous speech at normal speaking rate for each dialect
• Analysis:
  o Consonant clusters are more frequent in western dialects, especially in Moroccan Arabic
  o CV and CVC are the two dominant types, together 55% in Moroccan, 65% in Tunisian, and 76% in Lebanese; CV is the most frequent
  o CV syllables are much more frequent in Lebanese than in western dialects
  o The Moroccan dialect may include up to 3 consonants in onset position and 2 in the coda
  o Tunisian Arabic syllable complexity lies between that of the Moroccan and the Lebanese dialects
• The findings of this paper support those of (Hamdi et al., 2004):
  o Vowel reduction and short vowel deletion in western dialects → lower %V (*)
  o More complex syllabic structure → higher ΔC (**)
• Conclusion:
  o Within a language, dialects exhibit detectable differences in rhythmic/syllabic characteristics
• In the LID framework → (broad) phonotactic model + prosodic model
Prosody as a Distinctive Feature for the Discrimination of Arabic Dialects (Barkat et al., 1999)
[Figure: F0 and energy differences between Eastern and Western Arabic (Barkat et al., 1999)]
• Goal: test whether prosodic patterns are reliable cues for perceptually discriminating Arabic dialects
  o 4 dialects: Western Arabic (Moroccan and Algerian) and Eastern Arabic (Syrian and Jordanian)
• Data: six "passages" spoken by 4 male speakers → 24 in total for each dialect
• Subjects: 19 native speakers of Western Arabic and 19 non-Arabic speakers
• Experiments:
  1. Baseline perceptual experiment: natural speech, to evaluate the subjects' knowledge and perception of dialects
     o Results: 97% correct identification by the Arabic subjects and 56% (significant) by the non-Arabic subjects
  2. Masking perceptual experiment (buzz sounds), to evaluate the reliability of prosodic information for discrimination
     o Results: 58% correct identification (significant) by the Arabic subjects and 49% (not significant) by the non-Arabic subjects
• Problem: the two experiments were presented to the same subjects in a row
Stress, duration, and intonation in Arabic word-level prosody (de Jong and Zawaydeh, 1999)
[Figure: acoustic-prosodic correlates of lexical stress, shown as F0 over a word carrying lexical stress (de Jong and Zawaydeh, 1999)]
• Goals: examine the prosodic correlates of lexical stress in Ammani-Jordanian Arabic
  o Does Arabic have cues similar to those of English: increased duration, extreme formant values, increased intensity, and F0?
• Data: 10 types of words spoken in five prosodic conditions by 4 speakers (target syllables had /d/ onsets and /a/ in the nucleus)
• Duration:
  o Duration of vowels in stressed syllables >> unstressed syllables, in antepenultimate (word) position only
  o Syllable position is a more consistent determiner of duration across subjects than stress is
• Formant patterns:
  o Stressed /a/ has a systematically higher F1
• F0 patterns:
  o Stress is associated with an increase in F0
  o F0 in penultimate syllables is significantly greater than in antepenultimate syllables
• Relationships between higher-level prosody and word-level effects:
  o Vowel durations for syllables followed by break index 4 are significantly longer than for break indices 2 and 3, with no significant difference between 2 and 3
  o Most speakers use L-L% contours for statements and L-H% for questions
  o High pitch accents commonly occur in statements
• Conclusion: Arabic word-level prosody is remarkably like that of English
  o In the expression of stress, the linkage of pitch accents to stressed syllables, and the occurrence of pre-boundary lengthening
Variation in phonetic realization or in phonological categories? Intonational pitch accents in Egyptian Colloquial Arabic and Egyptian Formal Arabic (Hellmuth and El Zarka, 2007)
[Figure: F0 over a word carrying lexical stress in two registers, Egyptian Colloquial vs. Formal (Hellmuth and El Zarka, 2007)]
• Goal: explore the assumption that formal Arabic will have the intonational characteristics of the speaker's colloquial variety
• Material: 2 Egyptian speakers read ECA and EFA words that share the same stressed syllables. Words were put in sentences and read 3 times (72 target words in total)
• Qualitative analysis:
  o Similarities between ECA and EFA:
    1. Pitch accent on almost every content word in both registers
    2. Accent shape for a content word is mostly the same in both registers
    3. Low plateau between successive H peaks
  o Differences:
    • EFA contains a greater proportion of phrase boundaries
• Quantitative analysis: test the alignment of pitch events to stressed syllables:
  o Relative peak delay, (distance of H peak from the stressed syllable onset) / (stressed syllable duration), varies significantly between registers for speaker A only, not for speaker B
  o For both speakers, in CVV and CVC syllables, H is aligned within the accented syllable in both registers
• Conclusion:
  o Speakers carry over the prosodic events of their mother-tongue dialect
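The relative peak delay measure used in the quantitative analysis is a simple ratio; a sketch with hypothetical argument names and made-up times (in seconds) follows.

```python
def relative_peak_delay(peak_time, syllable_onset, syllable_offset):
    """(distance of the H peak from the stressed syllable onset) / (syllable duration).

    Values between 0 and 1 mean the peak falls inside the stressed syllable;
    values above 1 mean it is reached after the syllable has ended.
    """
    duration = syllable_offset - syllable_onset
    if duration <= 0:
        raise ValueError("syllable offset must follow its onset")
    return (peak_time - syllable_onset) / duration

# Peak halfway through a syllable spanning 0.5 s to 1.0 s.
inside = relative_peak_delay(0.75, 0.5, 1.0)
# Peak reached a quarter of a second after the same syllable ends.
late = relative_peak_delay(1.25, 0.5, 1.0)
```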
Prosodic differences among Arabic dialects
• (Barkat et al., 1999): F0 and energy differences between Eastern and Western Arabic
• (De Jong and Zawaydeh, 1999): acoustic-prosodic correlates of lexical stress
• (Hellmuth and El Zarka, 2007): Egyptian Colloquial vs. Formal registers
ASR for Arabic
• Arabic transcripts typically lack short vowels (small diacritics in Arabic script)
• Multiple valid vocalizations => multiple pronunciations for most words → a challenge for ASR
• Example: قبل (qbl)
  o qabl, qabla, qabli, qablu (before)
  o qabila (accept)
  o qabbala (to kiss)
  o ….
Cross-Dialectal Acoustic Data Sharing for Arabic Speech Recognition (Kirchhoff and Vergyri, 2004)
• Goal: use unvocalized MSA data to improve ASR for Egyptian Conversational Arabic (ECA)
• Motivation: not enough training data for ECA, especially for triphone acoustic models
  o 40% of the CallHome (ECA corpus) triphones also occur in FBIS (MSA corpus)
• Automatic diacritization:
  1. Generate all possible diacritized variants for each word, along with their morphological analyses
  2. Train an unsupervised trigram tagger to assign probabilities to sequences of morphological tags
  3. Use the trained tagger to assign probabilities to all possible diacritizations for a given utterance
  4. Use the weighted diacritizations as pronunciation networks, and use acoustic models trained on ECA to find the most likely diacritization
• Results:
  o Training a system with the pooled data (CallHome + FBIS) did not outperform the baseline
  o When training two independent systems, their ROVER combination outperforms CallHome-only: 0.8% absolute improvement on the dev set and 1.0% on the eval set (0.1 significance level) (accuracy 58.3%)
Development of a Conversational Telephone Speech Recognizer for Levantine Arabic (Vergyri et al., 2005)
• Goal: describe the development of a Levantine speech recognizer, and discuss:
  o Grapheme acoustic models
    • Each acoustic model implicitly models either a long vowel or a consonant with an optional short vowel (obtained by simple orthographic rules)
  o Modeling of short vowels
    1. Generic vowel: add one optional generic vowel phone in all possible positions in the pronunciation
    2. Auto-vowel:
       1. Annotate a subset of the training data
       2. Train a 4-gram language model with hidden events to predict the vowels in all the training data (30% of the words have at least one wrong character)
  o Morphological language modeling
    • Affixations and a subset of POS are identified using "a simple script and knowledge of Levantine"
    • Use an MSA morphological analyzer → train a factored LM
• Results:
  o The combination of auto-vowelized AM + LM outperforms the generic and grapheme models
  o The factored LM improves the accuracy, but not significantly
  o The ROVER combination of systems (grapheme + generic vowel + auto-vowel) is the best: accuracy 53.5%
MAGEAD: A Morphological Analyzer and Generator for the Arabic Dialects (Habash and Rambow, 2006)
• Goal: develop a general framework for a morphological analyzer and generator for the dialects of one language family
[Figure: MAGEAD architecture. Dialect-independent representations (feature values such as tense:FUT, abstract morphemes such as PART:FUT, a class hierarchy Word / Verb (VerbTr, VerbInt) / Noun / ..., and a CFG for ordering) are mapped through concrete morphemes (CM1 ... CMn) and orthographic/phonological rules to the surface form. Example for EGY: [PART:FUT] → Ha+]
• MAGEAD advantages:
  o Very general framework: supporting a new dialect requires specifying concrete morphemes and orthographic and phonological rules for this dialect
  o It can be used without a lexicon or with a partial lexicon
  o It can be used as both analyzer and generator
  o It adds short vowels to the analyzed words (good for ASR)
• Evaluation (on verbs of 3 radicals):
  o The system outperforms the Buckwalter analyzer on MSA (on the mbc verb list)
    • Token precision: 94.9%; recall: 95.8%
  o On Levantine (on all)
    • Context token recall: MSA system on Levantine data: 60.4%; Levantine system: 94.2%
Disfluency Detection – Motivation
• ~10% of spontaneous utterances contain disfluencies (Hindle, 1982)
• One disfluency per 4.6 seconds for radio talk shows (Blackmer and Mitton, 1991)
• Disfluency types:
  o Hesitations: "Ch * Change Strategy"
  o Restarts (or false starts): "It's also * I like it"
  o Fillers (filled and unfilled): "um * Baltimore"
  o Self-repairs (or self-corrections): "I think that you get * it's more strict in Catholic"
• Disfluencies are useful!
  o Disfluencies may facilitate language acquisition by highlighting equivalent classes
  o They sometimes reduce the mental and memory load needed to digest information
  o Getting or keeping the floor
  o For dialogue systems (to pretend real-time performance, keep the turn)
• Disfluencies are an obstacle for NLP tasks:
  o ASR, speech understanding, parsing, QA, and summarization
Disfluency Detection – Topics
1. Disfluency Correction and Identification
   1. (Hindle, 1982)
   2. (Nakatani and Hirschberg, 1994)
   3. (Liu et al., 2003)
   4. (Snover et al., 2004)
2. Modeling Disfluency to Improve ASR
   1. (Stolcke et al., 1999)
   2. (Stouten and Martens, 2004)
3. Human and Disfluency
   1. (Bard and Lickley, 1997)
[Figure: overview of disfluency identification and correction]
• Input 1, text: "Give me airlines flying at uh flying to Boston from San Francisco next …"
  o (Hindle, 1982): textual input with the edit signal annotated
  o (Snover et al., 2004)
• Input 2, A+P: "Give me airlines flying at -- uh flying to Boston from San Francisco next …", with the interruption point marked
  o (Nakatani and Hirschberg, 1994), (Liu et al., 2003): IP identification in repairs
• Output of disfluency correction: "Give me airlines flying to Boston from San Francisco next …"
Deterministic parsing of Syntactic non-fluencies (Hindle, 1982)
• Goal: expunge self-repairs → produce a well-formed syntactic structure that is consistent with the intended meaning
• Editing signal: minimal non-lexical material that a self-repair might insert
• Assumption: the editing signal is phonetically recognizable and equivalent
• Method: integrate correction rules in a parser to specify how much, if anything, to expunge when an edit signal is detected
• Rules:
  o Surface Copy Editor: search for an exact repetition separated by an edit signal; expunge one
  o Category Copy Editor: search for an exact repetition with the same category separated by an edit signal; expunge the first
  o Stack Copy Editor: search for an exact repetition with a similar syntactic constituent separated by an edit signal; expunge the first
• Results on one interview:
  o Of 1512 sentences, 27% had an edit signal and 73% had no edit signal
  o Surface copy: 29% | Category copy: 9% | Stack copy: 27%
  o Removing the edit signal only: 24%
  o Failures: 3% | Remaining unclear and ungrammatical: 2%
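The surface copy rule can be sketched on a flat token list. This is a simplification made for illustration: in the original work the rules are integrated into a deterministic parser, and the `"--"` edit-signal token and the choice of the longest matching repetition are assumptions.

```python
EDIT = "--"  # hypothetical marker for an annotated edit signal

def surface_copy_edit(tokens, edit_signal=EDIT):
    """Expunge the first copy of a word string repeated across an edit signal.

    If no repetition is found, only the edit signal itself is removed.
    """
    out = []
    i = 0
    while i < len(tokens):
        if tokens[i] != edit_signal:
            out.append(tokens[i])
            i += 1
            continue
        after = tokens[i + 1:]
        # Longest k such that the k words before the signal equal the k words after it.
        k = min(len(out), len(after))
        while k > 0 and out[-k:] != after[:k]:
            k -= 1
        if k:
            del out[-k:]  # expunge the first copy
        i += 1  # drop the edit signal itself
    return out

repaired = surface_copy_edit("I was -- I was going home".split())
```

The category and stack copy editors need syntactic category information from the parser and are not covered by this token-level sketch.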
A corpus-based study of repair cues in spontaneous speech (Nakatani and Hirschberg, 1994)
• Goals:
  1. Propose a framework for investigating repairs that divides the repair event into three temporal intervals (RIM)
  2. Identify robust acoustic-prosodic cues in each of these intervals to detect repairs with no reliance on sophisticated understanding of the text
  3. Build a repair IS detector
• Example: "Give me airlines flying to Sa-- [silence] uh [silence] flying to Boston from San Francisco next …"
  o Reparandum: "flying to Sa--", ending at the interruption site (IS)
  o Disfluency Interval: "[silence] uh [silence]"
  o Repair Interval: "flying to Boston"
• Data: 6414 utterances from the ARPA Airline Travel and Information System, 122 speakers; 346 utterances contained at least one repair (5.4%)
A corpus-based study of repair cues in spontaneous speech (Nakatani and Hirschberg, 1994)
• Reparandum:
  o 73.3% of all reparanda end in a word fragment
  o The majority of fragment words are content words and are rarely more than one syllable long; they are sometimes glottalized and sometimes exhibit coarticulatory effects
  o Number of words in the reparandum: non-fragment repairs: 1 word 52%, 2 words 32%; fragment repairs: 1 word 65%, 2 words 23%
• Disfluency Interval:
  o Filled pauses and cue phrases occur in the DI (9.4%), significantly more often in non-fragment repairs than in fragment repairs
  o Speakers take less time to initiate the production of the repair in fragment repairs
  o DI duration for fragment repairs is significantly shorter than for non-fragment repairs
  o Small but reliable increases in F0 and amplitude from the end of the reparandum to the beginning of the repair
• Repair Interval:
  o Phrase boundaries can serve to identify the repair region
  o For 43% of the repairs, the repair offset coincides with a phrase boundary
  o 70% of the remaining repairs have the first phrase boundary after the repair onset at the right edge of a syntactic constituent
A corpus-based study of repair cues in spontaneous speech (Nakatani and Hirschberg, 1994)
• Predicting repairs from acoustic and prosodic cues:
  o Distinguish {IS, fluent phrase boundary, non-repair disfluency, simple word boundary}
  o They considered every word boundary to be a potential repair site
  o Utterances in the test data have at least one IS
  o Feature examples:
    • The duration of the pause between w_i and w_j
    • The occurrence of one or more word fragments within w_i and w_j
• Recall: 86.1%; precision: 91.2%
Automatic Disfluency Identification in Conversational Speech Using Multiple Knowledge Sources (Liu et al., 2003)
• Goal: investigate multiple knowledge sources for identifying reparanda
  1. A decision tree classifier that uses acoustic-prosodic features → posterior probability of IP vs. non-IP between each pair of words
     • Duration + F0 features + voice quality features
  2. POS/word LM with the hidden event "<IP>"
• Results:
  o Prosody model only >> chance performance on downsampled data: recall 77.5%; precision 77.6% (baseline is 50%)
  o On non-downsampled data: word LM & POS LM & prosody >> baseline (96.62%)
    • Accuracy: 98.1%
    • Recall: 56.76%
    • Precision: 81.25%
    • Some degradation on ASR output (using only the word LM: 97.05 vs. 98.01)
    • More repetitions (IPs) are identified by the pattern LM than by the word-based LM
A Lexically-Driven Algorithm for Disfluency Detection (Snover et al., 2004)
• Goal: design a transformation-based learning approach to disfluency detection using primarily lexical features, without the use of extensive prosodic cues
  o Rule example: "change the label of a word with POS X from L1 to L2 if followed by a word with POS Y"
• Task: tag each word in either a reference or an ASR sentence with {filler, edit, fluent}
• Features:
  o Lexemes, speaker identity, and whether the word is followed by a silence
• Training:
  o Input: a time-aligned transcript in which speaker ID, sentence boundaries, edits, fillers, and interruption points are annotated
  o Rules are created by expanding rule templates, which are given as input to the learner, for example:
    • Change the label of word X from L1 to L2
  o The algorithm greedily selects the rule that reduces the error rate the most → 106 rules were learned
• Results (using lexeme error rate):
  o 2 baselines: both systems use prosodic and lexical features
  o No system performs well on ASR transcripts (86-96% for edits)
  o Filled pause identification: REF: ~18%, ASR: 48-57% (comparable to the acoustic/prosodic system)
  o Edits: the acoustic system (59%) significantly outperforms the lexical system (68%) only in CTS
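One greedy step of transformation-based learning, using the rule shape quoted above, might look like the sketch below. The `Rule` class, the helper names, the POS tags, and the toy sentence are all illustrative assumptions; the actual system expands many templates over a large annotated corpus.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    pos: str         # POS of the word whose label may change
    next_pos: str    # POS required on the following word
    from_label: str
    to_label: str

def apply_rule(rule, pos_tags, labels):
    """Return a new label list after one pass of the rule over the sentence."""
    new_labels = list(labels)
    for i in range(len(labels) - 1):
        if (pos_tags[i] == rule.pos and pos_tags[i + 1] == rule.next_pos
                and labels[i] == rule.from_label):
            new_labels[i] = rule.to_label
    return new_labels

def error_count(labels, gold):
    """Number of words whose label differs from the reference annotation."""
    return sum(l != g for l, g in zip(labels, gold))

def best_rule(candidates, pos_tags, labels, gold):
    """Greedy step: the candidate that leaves the fewest labeling errors."""
    return min(candidates, key=lambda r: error_count(apply_rule(r, pos_tags, labels), gold))

# Toy sentence "uh I I want", starting from all-fluent labels.
pos_tags = ["UH", "PRP", "PRP", "VBP"]
labels = ["fluent"] * 4
gold = ["filler", "edit", "fluent", "fluent"]
r_bad = Rule("PRP", "VBP", "fluent", "edit")
r_good = Rule("PRP", "PRP", "fluent", "edit")
chosen = best_rule([r_bad, r_good], pos_tags, labels, gold)
```

In full training, the chosen rule is applied to the corpus and the selection step is repeated until no rule reduces the error further.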
Modeling Disfluency to Improve ASR
• (Stolcke et al., 1999): model the prosody of disfluencies within ASR
• (Stouten and Martens, 2004): FP detector as a front end to ASR
Modeling the Prosody of Hidden Events for Improved Word Recognition (Stolcke et al., 1999)
• Goal: model prosody to improve speech recognition by modifying the language model to represent hidden events:
  o Sentence boundaries and various forms of disfluencies
  o Example: Right <S> I <REP> I don’t <DEL> uh <FP> I’m not really sure
• Notation: W = word sequence; A = acoustic features; F = prosodic features; S = sequence of events
• Decoding:
  $$W^* = \arg\max_W P(W \mid A, F) \approx \arg\max_W \sum_S P(W, S, F)\, P(A \mid W)$$
  o $P(A \mid W)$: standard acoustic models
• Joint model of words, events, and prosody:
  $$P(W, S, F) = P(W, S)\, P(F \mid W, S)$$
  o $P(W, S)$: n-gram of words and events
  o $P(F \mid W, S) \approx \prod_i P(F_i \mid E_i, W)$, with $F_i$ computed from window $i$
• $P(W, S, F)$ is computed with an HMM:
  o States: <word, event> pairs
  o Observations: prosodic features
  o Transition probabilities: n-gram probabilities
  o Emission probabilities: posterior probabilities from a decision tree
Modeling the Prosody of Hidden Events for Improved Word Recognition (Stolcke et al., 1999)
• Prosodic features:
  o Durations of pauses, of final vowels, and of syllable rhymes
• Result: 0.9% significant absolute reduction of word error rate
• Error analysis:
  o Fewer substitutions and insertions, but more deletions
  o The prosodic model reduces errors:
    • On high-frequency words that tend to occur at sentence boundaries
      … that at church to <s> → … that at church too <s>
    • On words that occur around filled pauses
      … to perform in and col weather → … to perform in UH cold weather
Coping with Disfluencies in Spontaneous Speech Recognition (Stouten and Martens, 2004)
• Goal: first detect simple disfluencies, then change the behavior of the search engine and the LM (for Dutch)
• If an FP (filled pause) is detected:
  o FP frame dropping: discard the frames in the FP interval
    • Significant reduction of WER when using the reference only
  o FP probability adaptation: locally raise the probability of entering the FP state
    • Change the probability of the FP arc in the LM when more than 50% of the frames consumed by this arc fall inside a detected FP interval
    • For each FP, 1.04 (auto) to 1.7 (ref) words were corrected; traditional model (modeling the FP as a word): 0.75
• If a word repetition (WR) is identified:
  o WR frame dropping: drop the frames of the repeated word
    • Corrects 0.6% words per WR (ref)
  o WR probability adaptation: raise the probability of paths that include WRs
    • Change the probability of re-entering the word after detecting a repetition
    • Corrects 1.03% words per WR (ref)
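Both FP strategies depend on knowing which frames fall inside detected FP intervals. The sketch below illustrates frame dropping and the 50% overlap test for an arc; the half-open interval convention and the function names are assumptions, and the actual change to the LM probability is not shown.

```python
def drop_fp_frames(frames, fp_intervals):
    """Remove frames whose index falls inside any detected FP interval.

    fp_intervals is a list of (start, end) frame indices, end exclusive.
    """
    def in_fp(i):
        return any(start <= i < end for start, end in fp_intervals)

    return [f for i, f in enumerate(frames) if not in_fp(i)]

def fp_overlap_fraction(arc_start, arc_end, fp_intervals):
    """Fraction of the frames in [arc_start, arc_end) lying in an FP interval."""
    n = arc_end - arc_start
    if n <= 0:
        return 0.0
    covered = sum(
        any(s <= i < e for s, e in fp_intervals)
        for i in range(arc_start, arc_end)
    )
    return covered / n

# Ten dummy frames with one detected filled pause over frames 2..4.
frames = list(range(10))
fp_intervals = [(2, 5)]
kept = drop_fp_frames(frames, fp_intervals)
adapt = fp_overlap_fraction(2, 6, fp_intervals) > 0.5
```

With the overlap fraction available, the probability of the FP arc would be changed only for arcs where more than half of the consumed frames are inside a detected interval.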
On Not Remembering Disfluencies (Bard and Lickley, 1997)
• Goal: examine evidence that failures of memory and perception are involved in the human ability to miss disfluencies
• Stimuli: spontaneous speech utterances with 80 simplex and 16 complex disfluencies
  o 30 recasts (no words from the reparandum are repeated in the repair) and 50 with repeats
• Subjects were instructed to transcribe everything they heard into real words in standard orthography and to be as accurate as possible
• Results:
  1. Listeners had great difficulty in reporting words from reparanda
  2. Recall of words in reparanda was worse in the longest strings
  3. All fluent outcomes were significantly better than any disfluent outcomes
  4. The longer the repair, the lower the recall of words in reparanda
  5. Report rate falls more sharply in repetition disfluencies than in others
  6. Repetition disfluencies are significantly more forgettable than others when they occur in utterances that are already difficult to process because of multiple false starts
  → Repetition deafness helps to expunge disfluencies
• Using multiple regression, repetition and recast disfluencies were found to be subject to somewhat different influences