fadi/candidacy/fadi-candidacy.pdf (9/26/08)

Language Identification – Motivation

•  Given a speech segment from an unknown language:

Language ∈ {English, Spanish, Arabic, ...}

•  Why Study Language ID?
   o  For multi-lingual NLP tasks, we need to identify the language spoken first
      •  QA and Information Retrieval from multilingual data
      •  Automatic dialogue systems: “the International airport of the future” (Hazen and Zue, 1993)
   o  Call centers: Route an incoming telephone call to a human switchboard operator (crucial in emergency situations)
   o  We can learn about differences between languages

Automatic language identification using a segment-based approach (Hazen and Zue, 1993)

•  Goal: Design a general probabilistic framework for LID to combine:
   o  Phonotactic (LM), prosodic, and acoustic models

•  The most general expression to describe the LID problem, mathematically:

$$\arg\max_i \Pr(L_i \mid \vec{a}, \vec{f})$$

•  Using chain and conditioning rules, we can get this framework:

$$\arg\max_i \; \underbrace{\Pr(\vec{a} \mid C_b, S_b, \vec{f}, L_i)}_{\text{acoustic model}} \; \underbrace{\Pr(S_b, \vec{f} \mid C_b, L_i)}_{\text{prosodic model (duration || F0)}} \; \underbrace{\Pr(C_b \mid L_i)}_{\text{LM}} \; \underbrace{\Pr(L_i)}_{\text{prior}}$$
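The product of the four factors is usually evaluated in the log domain, where the argmax becomes a sum of log scores. The following is a minimal sketch, assuming each model already returns a log-probability for the test segment. The function name `identify_language`, the dictionary keys, and all numbers are hypothetical and are not taken from (Hazen and Zue, 1993).

```python
import math


def identify_language(scores):
    """Return the language whose summed log scores are highest.

    scores maps a language name to a dict holding the log-probabilities of
    the acoustic, prosodic, and LM terms plus the log prior (assumed inputs).
    """
    def total(parts):
        return parts["acoustic"] + parts["prosodic"] + parts["lm"] + parts["prior"]

    return max(scores, key=lambda lang: total(scores[lang]))


# Toy log scores for one test segment (illustrative numbers only).
example_scores = {
    "English": {"acoustic": -120.0, "prosodic": -35.0, "lm": -48.0, "prior": math.log(0.5)},
    "Spanish": {"acoustic": -118.0, "prosodic": -37.5, "lm": -52.0, "prior": math.log(0.5)},
}
best = identify_language(example_scores)
```

With equal priors the prior term does not change the decision, so the choice is driven by the three model scores.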

Acoustic Models

•  Hypothesis: Languages differ in their spectral distributions

•  Two approaches to acoustic modeling:
   o  (Hazen and Zue, 1993)
   o  (Zissman, 1996)

Prosodic Models

•  Hypothesis: Languages differ in their prosodic structure
   o  Duration, F0 patterns, energy, speaking rate, and rhythm

•  4 approaches to prosodic modeling:
   o  (Hazen and Zue, 1993)
   o  (Zissman, 1996)
   o  (Rouas, 2005)
   o  (Timoshenko and Hoge, 2007)

Language Models

•  Hypothesis: Languages differ in their phonetic constraints and inventory

•  For each language i:
   o  Run a phone recognizer, e.g.:
      dh uw z hh ih n d uw w ay ey d y aw ao uh jh y eh k oh aa k v hh aw ao n
      f uw v ow z l iy g s m p l k dh n eh g f ey m p l ay ae
      dh iy jh sh p eh ae ey d p sh ua r m ey f ay n z
   o  Train a language model λi

Language Models

•  Test utterance: run the phone recognizer to obtain the phone sequence Cb:
   uw hh ih n d uw w ay ey uh jh y eh k oh v hh aw ao n hh aa m

•  Score Cb under each language model:

$$\Pr(C_b \mid \lambda_i)$$

•  4 approaches to language modeling:
   o  (Hazen and Zue, 1993)
   o  (Zissman, 1996)
   o  (Kirchhoff and Parandekar, 2001)
   o  (Torres-Carrasquillo et al., 2002)
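The train-then-score procedure for the phonotactic models can be sketched as follows. This is an illustrative bigram model with add-one smoothing, which is an assumption made for the sketch (the surveyed systems use their own n-gram orders and smoothing). The names `train_bigram` and `log_prob` and the two toy "languages" are hypothetical.

```python
import math
from collections import Counter


def train_bigram(phone_strings):
    """Count phone bigrams and their contexts in space-separated phone strings."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for s in phone_strings:
        phones = ["<s>"] + s.split() + ["</s>"]
        vocab.update(phones)
        for prev, cur in zip(phones, phones[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, vocab


def log_prob(model, phone_string, vocab_size):
    """Add-one smoothed log Pr(C | lambda_i) of a decoded phone string."""
    bigrams, contexts, _ = model
    phones = ["<s>"] + phone_string.split() + ["</s>"]
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
        for prev, cur in zip(phones, phones[1:])
    )


# Toy training data: recognizer output for two hypothetical languages.
training = {
    "lang_a": ["dh uw z hh ih n d uw", "w ay ey d y aw"],
    "lang_b": ["f uw v ow z l iy", "g s m p l k"],
}
models = {lang: train_bigram(strings) for lang, strings in training.items()}
shared_vocab = set().union(*(m[2] for m in models.values()))

test_utterance = "dh uw z hh ih n"
scores = {lang: log_prob(m, test_utterance, len(shared_vocab)) for lang, m in models.items()}
predicted = max(scores, key=scores.get)
```

A shared vocabulary size is used for smoothing so that the scores of the two models are comparable.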

Automatic language identification using a segment-based approach (Hazen and Zue, 1993)

•  OGI-TS: Oregon Graduate Institute Multi-Language Telephone Speech database
   •  10 languages, 90 speakers

•  Acoustic model:
   o  Trained a Gaussian for each of 23 phone classes for each language
   o  Features: 14 MFCC coefficients + 14 delta cepstra

•  Prosodic model:
   •  A duration model: a distribution for each phone
   •  F0 model: two histograms for each language (F0 normalized)

•  Phonotactic model:
   o  23 phone-class recognizer
   o  Trigram LM

•  Results (on 10 s test utterances):
   •  Combining all models: 47.7% (accuracy)


Comparison of Four Approaches to Automatic Language Identification of Telephone Speech (Zissman, 1996)

•  Goal: Compare the performance of 4 LID approaches evaluated on common corpora (OGI-TS):
   I.  GMM acoustics: Gaussian mixture modeling
       •  Train 2 GMMs (of 40 mixtures) for each language
       •  1st GMM: 12 MFCC coefficients
       •  2nd GMM: 13 deltas
   II.  Three phonotactic approaches
       1.  PRLM: Single-language phone recognition followed by a language-dependent n-gram model
       2.  Parallel PRLM: Use multiple phone recognizers, each trained on a different language
       3.  PPR: Language-dependent parallel phone recognition
           •  A phone recognizer for each language to be identified
           •  The LM is embedded in the HMM

•  Results (comparing GMM to LMs):
   o  On 3 languages: PPR performs as well as Parallel PRLM, and both achieve the best accuracy (85%)
   o  On 10 languages: Parallel PRLM achieves the best accuracy, 63%; PRLM: 54%; GMMs: 50%

•  Improvements:
   o  Prosodic model (duration only: short vs. long phones)
   o  Gender model
   o  79% on 11 languages


Language Identification using Gaussian Mixture Model Tokenization (Torres-Carrasquillo et al., 2002)

•  Goal: Instead of using a phone recognizer in the front-end (PRLM approach), use cluster indexes

•  Approach
   o  Train a GMM on the acoustic data of one language only
   o  Tokenizer: for each frame, output the index of the Gaussian scoring highest in the GMM
   o  Train a bigram model for each language over these indexes

•  Advantages:
   1.  No need for manually transcribed data
       •  Avoids mislabeled data
       •  More data can be added easily
   2.  A GMM is less expensive than phone recognizers, giving faster processing during recognition
   3.  Can be combined with the phone language model score to further boost performance

•  Results:
   o  On 12 languages (test 30 s)
   o  This approach did not outperform the baseline
   o  But the combination of 3 approaches is the best (83%)
   o  A backend classifier always improves accuracy
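The tokenizer step (one Gaussian index per frame) can be sketched as below. This assumes diagonal-covariance components and includes the mixture weight in each component's score; both are assumptions of this sketch, and the two-component toy GMM and the frames are made up for illustration.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def tokenize(frames, gmm):
    """Map each frame to the index of its highest-scoring mixture component.

    gmm is a list of (weight, mean, var) tuples, one per component.
    """
    tokens = []
    for x in frames:
        scores = [math.log(w) + log_gaussian(x, mean, var) for w, mean, var in gmm]
        tokens.append(max(range(len(gmm)), key=scores.__getitem__))
    return tokens


# Hypothetical 2-component GMM over 2-dimensional features.
toy_gmm = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [3.0, 3.0], [1.0, 1.0])]
toy_frames = [[0.1, -0.2], [2.9, 3.1], [3.2, 2.7], [0.0, 0.3]]
tokens = tokenize(toy_frames, toy_gmm)
```

The resulting index sequence plays the role that the phone string plays in PRLM: a per-language bigram model is trained over it.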



Hierarchical Language Identification based on Automatic Language Clustering (Yin et al., 2007)

•  Goal: Which feature type is most important to distinguish one language from another?

•  Instead of fusing all features in one classifier, use a multi-level hierarchical classifier that splits the languages based on one feature type at a time

•  Use an agglomerative clustering technique:
   1.  Init: one cluster for each language
   2.  Iteratively train classifiers on each language cluster pair:
       •  merge the pair with MAX over features {min accuracy of the classifier on the dev set}
   3.  Repeat the process until all languages are in one cluster

•  Feature types:
   o  MFCC with 7 coefficients (denoted as M)
   o  Prosodic features: pitch and intensity (P)
   o  Concatenation of both (M+P)

•  Results on OGI: (10 languages, 10 s)
   o  The system significantly outperforms all baselines
   o  Adds 1.1% to the baseline
   o  Accuracy: 91.3%


Phonetic knowledge, phonotactics and perceptual validation for automatic language identification (Adda-Decker et al., 2003)

•  Goal: Estimate the upper bound of the phonotactic approach by discarding linguistic noise due to recognition errors
   1.  Compare language ID performance trained on phonetic hand-labeled data vs. an automatic phone recognizer
   2.  See how well humans perform in identifying languages

•  Experiments: (BN, 8 languages, 3 h each)
   o  C/V sequences, 10 classes, 19 megaphones from 70 multilingual phones
   o  Train 5-gram phonotactic models

•  Results on hand-labeled data:
   o  Test utterances of 10 phones
   o  Test utterances of 20 phones: 100% accuracy is obtained

•  Results on automatic phonetic annotations:
   o  Using PR with 19 classes:
      •  <51.9%, 10 phones, 0.7 s> <83.7%, 40 phones, 3 s> <93.7%, 80 phones, 6 s>
   o  Using 70 classes, accuracy increased (63.2% for 10 phones)

•  Human LID performance on 8 languages
   o  Subjects were trained on 20 seconds per language
   o  14 native French-speaking academics listened to 3 tokens per language of 1.5 to 2 seconds
   o  The combined correct identification is 87.6%. Spanish was the hardest (63%)


Multi-Stream Statistical N-Gram Modeling With Application to Automatic Language ID (Kirchhoff and Parandekar, 2001)

•  Goal: Use multiple parallel sequences of articulatory phonetic features
   o  Voicing, manner of articulation, consonantal place of articulation, nasality, etc.
   o  E.g., one stream: <glide, vowel, plosive, vowel, fricative, vowel, plosive, affricative ...>

•  Approach
   o  Acoustic model for each feature
   o  Decode each feature group Fi independently
   o  Classification:

$$\arg\max_i P(F_1, \ldots, F_K \mid L_i)$$

   o  Stream selection {manner, consonantal place, vowel place, front-back, and rounding}

•  Results:
   o  OGI-TS corpus, including <3 s utterances
   o  The phone-based approach performs better than independent streams
   o  With some dependency, feature model >> phone model
      •  On short utterances only: 50.8% → 54.8% (<15 s) and 33.3% → 48% (<3 s)

•  Advantages:
   o  Unseen phone context in the test data
   o  47 feature models, but 126 phone models → more robust n-gram models
   o  Training data for phonetic features can be shared across phones → more robust acoustic models
   o  Language-independent nature of phonetic features
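One way to read the classification rule $\arg\max_i P(F_1, \ldots, F_K \mid L_i)$ for the independent-stream case is sketched below. It assumes, as a simplification that is not spelled out on the slide, that the $K$ feature streams are conditionally independent given the language, with each stream scored by its own n-gram model:

$$\arg\max_i P(F_1, \ldots, F_K \mid L_i) \;\approx\; \arg\max_i \prod_{k=1}^{K} P(F_k \mid L_i) \;=\; \arg\max_i \sum_{k=1}^{K} \log P(F_k \mid L_i)$$

Modeling dependencies between streams replaces the plain product with factors that condition a stream on other streams, which is the "with some dependency" configuration reported in the results.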

Language identification with suprasegmental cues: A study based on speech resynthesis (Ramus and Mehler, 1999)

•  Goal: How do newborns separate input utterances from two languages in a bilingual environment?

•  Rhythm hypothesis: Newborns are able to discriminate languages which have different rhythmic structure

•  Intonation hypothesis: The discrimination is on the basis of intonation and not rhythm

•  Stimuli: 20 Japanese and 20 English sentences read by 4 native speakers

•  Resynthesis experiments:
   1.  saltanaj: Intonation, rhythm, and broad phonetic categories were preserved
       •  All non-prosodic, lexical and syntactic information was lost (replace all phonemes by 6 broad categories)
   2.  sasasa: Only intonation and rhythm were preserved
   3.  aaaa: Only the intonation of the original sentences was preserved. Interpolate F0 over unvoiced frames
   4.  Flat sasasa: Syllabic rhythm only is preserved (constant fundamental frequency)


Language identification with suprasegmental cues: A study based on speech resynthesis (Ramus and Mehler, 1999)

•  Perceptual experiment:
   o  64 adult French subjects were told that the utterances were from acoustically modified Sahatu and Moltec
   o  Subjects (after passing a training session) were asked to answer S or M

•  Results:
   o  saltanaj, sasasa, and flat sasasa were identified significantly above chance, but NOT aaaa
   o  When the stimuli were presented to 16 English subjects, who were told that one of the languages was English, they could significantly discriminate between aaaa English and aaaa Sahatu

•  Conclusion
   o  Syllabic rhythm was a robust cue for discrimination
   o  Intonation can be of greater interest to native speakers


Using Speech Rhythm for Acoustic Language Identification (Timoshenko and Hoge, 2007)

•  Goal: Use rhythm as a feature for language ID
   o  Speech rhythm is modeled using the durations of two successive syllables

•  Automatic syllabification of the acoustic signal is hard (it is a language-dependent task). Instead, use pseudo-syllables:
   o  C^N V, where N ≥ 0
   o  duration = |C^N| + |V|

•  Approach:
   o  A speech utterance is modeled as a sequence of pseudo-syllable durations D = d1 d2 … dN
   o  Learn a bigram model over these sequences for each language
   o  Given a test utterance, get the duration sequence, and test which bigram model provides the highest likelihood

•  Results (on 7 languages, 7 s)
   o  Rhythm alone with an ANN provides ~32% accuracy
   o  The acoustic system alone: 92.08% accuracy
   o  The fused system: 92.9% accuracy
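The pseudo-syllable duration sequence D can be sketched as follows, assuming the signal has already been segmented into consonantal ("C") and vocalic ("V") intervals with known durations. The function name, the input format, and the decision to drop trailing consonants that are not followed by a vowel are assumptions of this sketch.

```python
def pseudo_syllable_durations(segments):
    """Group each run of zero or more consonants with the following vowel (C^N V).

    segments is a list of (label, duration_in_seconds) pairs with label "C" or "V".
    Returns one duration |C^N| + |V| per pseudo-syllable.
    """
    durations = []
    current = 0.0
    for label, dur in segments:
        current += dur
        if label == "V":
            # A vowel closes the current pseudo-syllable.
            durations.append(current)
            current = 0.0
    # Trailing consonants without a vowel are dropped in this sketch.
    return durations


toy_segments = [
    ("C", 0.05), ("C", 0.04), ("V", 0.10),  # CCV
    ("V", 0.08),                            # V (N = 0)
    ("C", 0.06), ("V", 0.12),               # CV
    ("C", 0.03),                            # trailing consonant
]
D = pseudo_syllable_durations(toy_segments)
```

A bigram model over such a sequence would additionally require the durations to be quantized or modeled with continuous densities, which is not shown here.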


Modeling Long and Short-term prosody for language identification (Rouas, 2005)

•  Goal: Investigate the efficiency of prosodic features for language ID at the level of pseudo-syllables

•  Modeling short- and long-term prosody
   o  Long-term prosody models prosodic movements over several pseudo-syllables
   o  Short-term prosody represents prosodic movement inside a pseudo-syllable

•  F0 is modeled using the Fujisaki model (phrase and local accentuation)
   o  The baseline of the F0 contour is computed by connecting all the local minima (mark the slopes with ups/downs/silence)
   o  Subtract it from the original F0 contour. The residue is then approximated using linear regression in each unit (long/short), and the slopes are then marked

•  Energy: for each pseudo-syllable segment, compute the linear regression of the energy and then mark the slopes

•  Duration: mark the units as short/long for each (what is short/long?)

•  They use an N-multigram model to model these sequences

•  Results (on 7 languages, 20 s)
   o  The long-term prosodic model provides: 41%
   o  The short-term prosodic model provides: 63%
   o  Merging both models: 71.2%
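The "linear regression, then mark the slope" step used for the energy and F0 residue can be sketched as below. This is an ordinary least-squares slope over equally spaced frames. The label "flat" and the tolerance parameter are assumptions added for the sketch; the slide only mentions ups and downs (and silence for F0).

```python
def regression_slope(values):
    """Least-squares slope of values against their frame index 0..n-1 (needs n >= 2)."""
    n = len(values)
    mean_x = (n - 1) / 2.0
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def mark_slope(values, tol=0.0):
    """Turn one unit's contour into a discrete label for later sequence modeling."""
    s = regression_slope(values)
    if s > tol:
        return "up"
    if s < -tol:
        return "down"
    return "flat"


# Hypothetical per-frame energy values for three pseudo-syllables.
units = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.5], [2.0, 2.0, 2.0]]
labels = [mark_slope(u) for u in units]
```

The resulting label sequence is the kind of discrete input that a sequence model such as the N-multigram model can be trained on.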


Paper | Approach | Accuracy | #Languages | Dur
(Hazen and Zue, 1993) | A + P + PRLM | 47.7% | 10 | 10 s
(Zissman, 1996) | Parallel PRLM + duration + gender-dependent models | 79% | 11 | 10 s
(Kirchhoff and Par., 2001) | PRLM of articulatory phonetic features | 48% | 10 | <3 s
(Torres et al., 2002) | Parallel PRLM + GMM-A + tokenizer | 83% | 12 | 30 s
(Adda et al., 2003) | LM on manual phones | (100%) | 8 BN | <3 s
(Rouas, 2005) | Short + long term prosody | 71.2% | 7 | 20 s
(Timosh. and Hoge, 2007) | GMM-A + rhythm | 92.9% | 7 | 7 s
(Yin et al., 2007) | A-GMM + prosodic features in a hierarchical classifier + recent SP techniques | 91.3% | 10 | 10 s

Arabic Dialect Modeling and Analysis -- Motivation

•  Arabic: Modern Standard Arabic (formal) and colloquial Arabic (informal or casual)
   o  Dialects: Maghrebi, Egyptian, Sudanese, Levantine, Iraqi, Gulf

•  Typically, not MSA: spontaneous, unconstrained, and not well studied, so NLP tools are lacking

•  One of the main challenges of speech recognition and NLP tasks is to deal with informal data

•  Goal: What are the speech cues that make Arabic dialects different?
   o  Dialect ID
   o  Code switching and mixing

Arabic Dialect Modeling – Topics

1.  Rhythmic and Syllabic Structure
    1.  (Ramus, 2002)
    2.  (Hamdi et al., 2004)
    3.  (Hamdi et al., 2005)

2.  Intonation and Stress
    1.  (Barkat, 1999)
    2.  (De Jong and Zawaydeh, 1999)
    3.  (Hellmuth and El Zarka, 2007)

3.  Handling ASR problems for Dialects
    1.  (Kirchhoff and Vergyri, 2004)
    2.  (Vergyri et al., 2005)

4.  Morphology
    1.  (Habash and Rambow, 2006)


Acoustic correlates of linguistic rhythm: Perspectives (Ramus, 2002)

[Figure: (Ramus, 2002) rhythm-based measure to classify stress-timed, syllable-timed, and mora-timed languages]

•  Goals: Compare two recent measures to distinguish the rhythmic structure of languages
   1.  Duration and variability of vocalic and intervocalic intervals: %V, ΔC, and ΔV (Ramus et al., 1999)
   2.  Pairwise Variability Index (rPVI and nPVI) of vocalic and non-vocalic intervals (Grabe and Low, 2002)

•  Issue: Rhythm is, at least in part, a matter of duration, and durations are affected by speaking rate (across languages)

•  Conclusion: The nPVI method is robust to variability due to variations of speaking rate
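The two families of measures can be made concrete with a short sketch. It uses the standard definitions: %V is the share of total duration that is vocalic, ΔC and ΔV are standard deviations of interval durations, rPVI is the mean absolute difference between successive intervals, and nPVI normalizes each difference by the mean of the pair. The population standard deviation and the toy durations are assumptions of the sketch.

```python
from statistics import mean, pstdev


def percent_v(vocalic, consonantal):
    """%V: percentage of total duration taken up by vocalic intervals."""
    return 100.0 * sum(vocalic) / (sum(vocalic) + sum(consonantal))


def delta(intervals):
    """Delta-C or Delta-V: standard deviation of the interval durations."""
    return pstdev(intervals)


def rpvi(intervals):
    """Raw PVI: mean absolute difference between successive intervals."""
    return mean(abs(a - b) for a, b in zip(intervals, intervals[1:]))


def npvi(intervals):
    """Normalized PVI: each difference is divided by the mean of the pair."""
    return 100.0 * mean(
        abs(a - b) / ((a + b) / 2.0) for a, b in zip(intervals, intervals[1:])
    )


# Toy interval durations in seconds (illustrative only).
vocalic = [0.10, 0.20, 0.10]
consonantal = [0.10, 0.10]

# Simulate a speaker who is uniformly twice as slow.
slow_vocalic = [2 * d for d in vocalic]
```

Uniformly stretching all durations doubles rPVI (and ΔV) but leaves nPVI unchanged, which illustrates why the normalized index is less sensitive to speaking rate.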


Speech timing and rhythmic structure in Arabic dialects: A comparison of two approaches (Hamdi et al., 2004)

[Figure: (Hamdi et al., 2004) rhythm across dialects using the Ramus metric; labeled languages: English, Dutch, Arabic]

•  Goal: Explore the differences between the rhythmic structure of Arabic dialects
   o  Subjects in (Barkat et al., 1999): “Western Arabic sounded faster and jerkier than Eastern Arabic” → speech rhythm

•  Question: Are there systematic rhythmic distinctions between dialects?

•  Dialects and languages: 6 Arabic dialects (3 western and 3 eastern) and 3 other languages
   o  30 sentences per language/dialect, of 2.5 s duration on average (3 male speakers per language)

•  Method: Compute %V and ΔC for each language/dialect and compare

•  Results:
   o  A gradual increase of %V as one moves from West to East (*)
   o  ΔC decreases from West to East (**)
   o  French has larger vocalic intervals than the other languages and the Arabic dialects
   o  ΔC of French is similar to that of the eastern dialects
   o  Significant differences between regions (West vs. East) but not between dialects within the same group
   o  The average %V of the western dialects is significantly lower than that of the eastern dialects
   o  The average ΔC of WA is significantly higher than that of EA

•  Comparing rhythm methods: high correlation between ΔC and rPVI-C, and between ΔV and rPVI-V

•  In the LID framework → prosodic model


Syllable Structure in Spoken Arabic: a comparative investigation (Hamdi et al., 2005)

[Figure: syllable/rhythm relationship, linking (Hamdi et al., 2004) rhythm across dialects using the Ramus measures with (Hamdi et al., 2005) syllabic structure of Arabic dialects; labeled languages: English, Dutch, Arabic]

•  Goal: Compare in detail the syllabic structure of three Arabic dialects (Moroccan, Tunisian, and Lebanese Arabic) to understand their rhythmic tendency

•  Data: 8-10 minutes of spontaneous speech at normal speaking rate for each dialect

•  Analysis:
   o  Consonant clusters are more frequent in Western dialects, especially in Moroccan Arabic
   o  CV and CVC are the two dominant types, together: 55% in Moroccan, 65% in Tunisian and 76% in Lebanese -- CV is the most frequent
   o  CV syllables are much more frequent in Lebanese than in western dialects
   o  The Moroccan dialect may include up to 3 consonants in onset position and 2 in the coda
   o  Tunisian Arabic syllable complexity is between the Moroccan and the Lebanese dialects

•  The findings of this paper support those of (Hamdi et al., 2004)
   o  Vowel reduction and short vowel deletion in western dialects → lower %V (*)
   o  More complex syllabic structure → higher ΔC (**)

•  Conclusion:
   o  Within a language, dialects exhibit detectable differences in rhythmic/syllabic characteristics

•  In the LID framework → (broad) phonotactic model + prosodic model




2.  Intonation and Stress of Arabic Dialects
    1.  (Barkat, 1999)
    2.  (De Jong and Zawaydeh, 1999)
    3.  (Hellmuth and El Zarka, 2007)




Prosody as a Distinctive Feature for the Discrimination of Arabic Dialects (Barkat et al., 1999)

[Figure: (Barkat, 1999) F0 and energy differences between Eastern and Western Arabic]

•  Goal: Test whether prosodic patterns are reliable cues for perceptually discriminating Arabic dialects
   o  4 dialects: Western Arabic (Moroccan and Algerian) and Eastern Arabic (Syrian and Jordanian)

•  Data: Six “passages” spoken by 4 male speakers → 24 total for each dialect

•  Subjects: 19 native Western Arabic speakers and 19 non-Arabic speakers

   1.  Baseline perceptual experiment: natural speech, to evaluate the subjects' knowledge and perception of dialects
       o  Results: 97% correct identification by the Arabic subjects and 56% (significant) by the non-Arabic subjects

   2.  Masking perceptual experiment (buzz sounds), to evaluate the reliability of prosodic information for discrimination
       o  Results: 58% correct identification (significant) by the Arabic subjects and 49% (not significant) by the non-Arabic subjects

•  Problem: the two experiments were presented to the same subjects in a row


Stress, duration, and intonation in Arabic word-level prosody (de Jong and Zawaydeh, 1999)

[Figure: (De Jong and Zawaydeh, 1999) acoustic-prosodic correlates of lexical stress; panel labels: F0, Word i, Lexical stress]

•  Goals: Examine the prosodic correlates of Ammani-Jordanian Arabic lexical stress
   o  Does Arabic have similar cues as English? Increase in duration, extreme formant values, increased intensity, and F0

•  Data: 10 types of words spoken in five prosodic conditions by 4 speakers (target syllables had /d/ onsets and /a/ in the nucleus)

•  Duration
   o  Duration of vowels in stressed syllables >> unstressed syllables, in antepenultimate (word) position only
   o  Syllable position is a more consistent determiner of duration across subjects than stress

•  Formant patterns:
   o  Stressed /a/ has a systematically higher F1

•  F0 patterns
   o  Stress is associated with an increase in F0
   o  F0 in penultimate syllables is significantly greater than in antepenultimate syllables

•  Relationships between higher-level prosody and word-level effects:
   o  Vowel durations for syllables followed by break index 4 are significantly longer than for 2 or 3, but there is no significant difference between 2 and 3
   o  Most speakers use L-L% contours for statements and L-H% for questions
   o  High pitch accents commonly occur in statements

•  Conclusion: Arabic word-level prosody is remarkably like that of English
   o  In the expression of stress, the linkage of pitch accents to stressed syllables, and the occurrence of pre-boundary lengthening


Variation in phonetic realization or in phonological categories? Intonational pitch accents in Egyptian Colloquial Arabic and Egyptian Formal Arabic (Hellmuth and El Zarka, 2007)

[Figure: (Hellmuth and El Zarka, 2007) Egyptian Colloquial vs. Formal registers; panel labels: F0, 2nd register F0, Word i, Lexical stress]

•  Goal: Explore the assumption that formal Arabic will have the intonational characteristics of the speaker’s colloquial variety

•  Material: 2 Egyptian speakers read ECA and EFA words that share the same stressed syllables. Words were put in sentences and read 3 times (total 72 target words)

•  Qualitative analysis:
   o  Similarities between ECA and EFA
      1.  Pitch accent on almost every content word in both registers
      2.  Accent shape for a content word is mostly the same for both registers
      3.  Low plateau between successive H peaks
   o  Differences:
      •  EFA contains a greater proportion of phrase boundaries

•  Quantitative analysis: Test the alignment of the pitch event to the stressed syllable:
   o  Relative peak delay: (distance of H peak from the stressed syllable onset) / (stressed syllable duration) varies significantly between registers for speaker A only and not for speaker B
   o  For both speakers, in CVV and CVC syllables, H is aligned within the accented syllable in both registers

•  Conclusion:
   •  Speakers carry over the prosodic events of their mother-tongue dialect


Prosodic differences among Arabic dialects

•  (Barkat et al., 1999): F0 and energy differences between Eastern and Western Arabic
•  (De Jong and Zawaydeh, 1999): Acoustic-prosodic correlates of lexical stress
•  (Hellmuth and El Zarka, 2007): Egyptian Colloquial vs. Formal registers







ASR for Arabic

•  Typically, Arabic transcripts lack short vowels (small diacritics in Arabic script)

•  Multiple valid vocalizations => multiple pronunciations for most words → a challenge for ASR

•  Example: قبل (qbl)
   •  qabl, qabla, qabli, qablu (before)
   •  qabila (accept)
   •  qabbala (to kiss)
   •  ….

Cross-Dialectal Acoustic Data Sharing for Arabic Speech Recognition (Kirchhoff and Vergyri, 2004)

•  Goal: Use unvocalized MSA data to improve ASR for Egyptian Conversational Arabic (ECA)

•  Motivation: Not enough training data for ECA, especially for triphone acoustic models
   o  40% of the CallHome (ECA corpus) triphones also occur in the FBIS (MSA corpus)

•  Automatic diacritization
   1.  Generate all possible diacritized variants for each word, along with their morphological analyses
   2.  Train an unsupervised trigram tagger to assign probabilities to sequences of morphological tags
   3.  Use the trained tagger to assign probabilities to all possible diacritizations for a given utterance
   4.  Use the weighted diacritizations as pronunciation networks and use acoustic models trained on ECA to find the most likely diacritization

•  Results:
   o  Training a system with the pooled data (CallHome + FBIS) did not outperform the baseline
   o  When training two independent systems, the ROVER combination outperforms CallHome-only: 0.8% absolute improvement on the dev set and 1.0% improvement on the eval set (0.1 significance level) (accuracy 58.3%)


Development of a Conversational Telephone Speech Recognizer for Levantine Arabic (Vergyri et al., 2005)

•  Goal: Describe the development of a Levantine speech recognizer, and discuss:
   o  Grapheme acoustic models
      •  Each acoustic model implicitly models either a long vowel or a consonant with an optional short vowel (obtained by simple orthographic rules)
   o  Modeling of short vowels
      1.  Generic vowel: Add one optional generic vowel phone in all possible positions in the pronunciation
      2.  Auto-vowel:
          1.  Annotate a subset of the training data
          2.  Train a 4-gram language model with hidden events to predict the vowels in all training data (30% of the words have at least one wrong character)
   o  Morphological language modeling
      •  Affixations and a subset of POS are identified using “a simple script and knowledge of Levantine”
      •  Use an MSA morphological analyzer → train a factored LM

•  Results:
   o  The combination of auto-vowelized AM + LM outperforms the generic and grapheme models
   o  The factored LM improves the accuracy, but not significantly
   o  ROVER combination of systems (grapheme + generic vowel + auto-vowel): accuracy 53.5%, the best








MAGEAD: A Morphological Analyzer and Generator for the Arabic Dialects (Habash and Rambow, 2006)

•  Goal: Develop a general framework for a morphological analyzer and generator for dialects of one language family

[Figure: MAGEAD architecture. Dialect-independent hierarchy (Word; Verb with VerbTr and VerbInt; Noun; …) mapping features such as tense:FUT to abstract morphemes such as PART:FUT; CFG for ordering; concrete morphemes CM1 … CMn; orthographic/phonological rules; surface form. Example: for EGY: [PART:FUT] → Ha+]

•  MAGEAD advantages:
   o  Very general framework: supporting a new dialect requires specifying concrete morphemes and orthographic and phonological rules for this dialect
   o  It can be used without a lexicon or with a partial lexicon
   o  It can be used as analyzer and generator
   o  It adds short vowels to the analyzed words (good for ASR)

•  Evaluation (on verbs of 3 radicals):
   o  The system outperforms the Buckwalter analyzer on MSA (on mbc verb list)
      •  Token precision: 94.9%; recall: 95.8%
   o  On Levantine (on all)
      •  Context token recall: MSA system on Levantine data: 60.4%; Levantine system: 94.2%

Disfluency Detection – Motivation

•  ~10% of spontaneous utterances contain disfluencies (Hindle, 1982)
•  One disfluency per 4.6 seconds for radio talk shows (Blackmer and Mitton, 1991)

•  Disfluency types:
   o  Hesitations: “Ch * Change Strategy”
   o  Restarts (or false starts): “It’s also * I like it”
   o  Fillers (filled and unfilled): “um * Baltimore”
   o  Self repairs (or self-corrections): “I think that you get * it’s more strict in Catholic”

•  Disfluencies are useful!
   o  Disfluencies may facilitate language acquisition by highlighting equivalent classes
   o  Sometimes they reduce the mental and memory load needed to digest information
   o  Getting or keeping the floor
   o  For dialogue systems (to pretend real-time performance, keep the turn)

•  Disfluencies are an obstacle for NLP tasks:
   o  ASR, speech understanding, parsing, QA, and summarization

Disfluency Detection – Topics

1.  Disfluency Correction and Identification
    1.  (Hindle, 1982)
    2.  (Nakatani and Hirschberg, 1994)
    3.  (Liu et al., 2003)
    4.  (Snover et al., 2004)

2.  Modeling Disfluency to improve ASR
    1.  (Stolcke et al., 1999)
    2.  (Stouten and Martens, 2004)

3.  Human and Disfluency
    1.  (Bard and Lickley, 1997)

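[Figure: overview of disfluency identification and correction
   1.  Text: “Give me airlines flying at uh flying to Boston from San Francisco next …”
   2.  A+P: “Give me airlines flying at -- uh flying to Boston from San Francisco next …”, with the interruption point marked
   Stages shown: IP identification in repairs; textual input with edit signal annotated; disfluency correction
   Works placed in the figure: (Nakatani and Hirschberg, 1994), (Liu et al., 2003), (Snover et al., 2004), (Hindle, 1982)
   Corrected output: “Give me airlines flying to Boston from San Francisco next …”]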

Deterministic parsing of Syntactic non-fluencies (Hindle, 1982)

•  Goal: Expunge self-repairs → produce a well-formed syntactic structure that is consistent with the intended meaning

•  Editing signal: Minimal non-lexical material that a self-repair might insert
   •  Assumption: phonetically recognizable and equivalent

•  Method: Integrate correction rules in a parser to specify how much, if anything, to expunge when an edit signal is detected

•  Rules:
   o  Surface Copy Editor: search for an exact repetition separated by an edit signal; expunge one
   o  Category Copy Editor: search for an exact repetition with the same category separated by an edit signal; expunge the first
   o  Stack Copy Editor: search for an exact repetition with a similar syntactic constituent separated by an edit signal; expunge the first

•  Results on one interview:
   o  1512 sentences: 27% of the sentences had an edit signal, 73% of the sentences had no edit signal
   o  Surface copy: 29% | Category copy: 9% | Stack copy: 27%
   o  Removing the edit signal only: 24%
   o  Failures: 3% | Remaining unclear and ungrammatical: 2%
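The Surface Copy Editor rule can be sketched on a token list as follows. This is an illustration only, not Hindle's parser-integrated implementation: the edit-signal marker "--", the choice to expunge the first copy together with the signal, and the preference for the longest matching repetition are all assumptions of the sketch.

```python
EDIT = "--"  # hypothetical marker for the edit signal in the transcript


def surface_copy_edit(tokens, edit_signal=EDIT):
    """Expunge an exact repetition that is separated by an edit signal.

    For every edit signal, find the longest n such that the n tokens before
    the signal equal the n tokens after it, then delete the first copy and
    the signal. Signals with no surface copy are left untouched.
    """
    out = list(tokens)
    i = 0
    while i < len(out):
        if out[i] == edit_signal:
            max_n = min(i, len(out) - i - 1)
            for n in range(max_n, 0, -1):
                if out[i - n:i] == out[i + 1:i + 1 + n]:
                    del out[i - n:i + 1]
                    i -= n
                    break
            else:
                i += 1  # no surface copy around this signal
        else:
            i += 1
    return out


repaired = surface_copy_edit("well I -- I think so".split())
unchanged = surface_copy_edit("flying at -- flying to Boston".split())
```

The second example has no exact copy adjacent to the signal ("at" vs. "flying"), so it would have to be handled by the category or stack copy rules.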


A corpus-based study of repair cues in spontaneous speech (Nakatani and Hirschberg, 1994)

•  Goals:
   1.  Propose a framework to investigate repairs that divides the repair event into three temporal intervals (RIM)
   2.  Identify robust acoustic-prosodic cues in each of these intervals to detect repairs with no reliance upon sophisticated understanding of the text
   3.  Build a repair IS (interruption site) detector

•  Example: Give me airlines flying to Sa-- [silence] uh [silence] flying to Boston from San Francisco next …
   o  Reparandum | Disfluency Interval | Repair Interval; the interruption site follows the reparandum

•  Data: 6414 utterances from the ARPA Airline Travel and Information System
   o  122 speakers
   o  346 utterances contained at least one repair (5.4%)

•  Reparandum
   •  73.3% of all reparanda end in a word fragment
   •  The majority of fragment words are content words and rarely more than one syllable long; they are sometimes glottalized and sometimes exhibit coarticulatory effects
   •  # words in reparandum: (non-fragment repairs: 1 word, 52%; 2 words, 32%) (fragment repairs: 1 word, 65%; 2 words, 23%)

•  Disfluency Interval
   •  Filled pauses and cue phrases occur in the DI (9.4%), significantly more often in non-fragment repairs than in fragment repairs
   •  Speakers take less time to initiate the production of the repair in fragment repairs
   •  DI duration for fragment repairs is significantly shorter than for non-fragment repairs
   •  Small but reliable increases in F0 and amplitude from the end of the reparandum to the beginning of the repair

•  The Repair Interval
   •  Phrase boundaries can serve to identify the repair region
   •  For 43% of the repairs, the repair offset coincides with a phrase boundary
   •  70% of the remaining have the first phrase boundary after the repair onset at the right edge of a syntactic constituent

•  Predicting repairs from acoustic and prosodic cues
   o  Distinguish {IS, fluent-phrase boundary, non-repair disfluency, simple word boundary}
   o  They considered every word boundary to be a potential repair site
   o  Utterances in the test data have at least one IS
   o  Feature examples:
      •  The duration of the pause between wi and wj
      •  The occurrence of one or more word fragments within wi and wj
   •  Recall: 86.1%; Precision: 91.2%

Automatic Disfluency Identification in Conversational Speech Using Multiple Knowledge Sources (Liu et al., 2003)

•  Goal: Investigate multiple knowledge sources for identifying reparanda
   1.  Decision tree classifier that uses the acoustic-prosodic features → posterior probability: IP vs. non-IP between each pair of words
       •  Duration + F0 features + voice quality features
   2.  POS/Word LM with hidden event “<IP>”

•  Results:
   o  Prosody model only >> chance performance on downsampled data. Recall 77.5%; precision 77.6% (baseline is 50%)
   o  On non-downsampled data: Word-LM & POS-LM & Prosody >> baseline (96.62%)
      •  Accuracy: 98.1%
      •  Recall: 56.76%
      •  Precision: 81.25%
   •  Some degradation on ASR output (using only Word-LM: 97.05 vs. 98.01)
   •  More repetitions (IPs) are identified by the pattern LM than by the word-based LM

A Lexically-Driven Algorithm for Disfluency Detection (Snover et al., 2004)

•  Goal: Design a transformation-based learning approach to disfluency detection using primarily lexical features, without the use of extensive prosodic cues
   o  Rule example: “change the label of word with POS X from L1 to L2 if followed by word with POS Y”

•  Task: Tag each word in either a reference or an ASR sentence with {filler, edit, fluent}

•  Features:
   o  Lexemes, speaker identity, and whether the word is followed by a silence

•  Training:
   o  Input: Time-aligned transcript: speaker id, sentence boundaries, edits, fillers and interruption points are annotated
   o  Rules are created by expanding rule templates, which are given as input to the learner, for example:
      •  Change the label of word X from L1 to L2
   o  The algorithm greedily selects the rule that reduces the error rate the most → 106 rules were learned

•  Results (using lexeme error rate):
   •  2 baselines: both systems using prosodic and lexical features
   •  No system performs well on ASR transcripts (86-96% for edits)
   •  Filled pause identification: REF: ~18%, ASR: 48-57% (comparable to the acoustic/prosodic system)
   •  Edits: The acoustic system (59%) significantly outperforms the lexical system (68%) only in CTS
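The greedy rule selection in transformation-based learning can be sketched as follows. This toy version supports only the single rule shape quoted on the slide (change the label of a word with POS X from L1 to L2 if it is followed by a word with POS Y). The candidate rules, POS tags, and gold labels are made up for illustration and are not from (Snover et al., 2004).

```python
def apply_rule(pos_tags, labels, rule):
    """Apply one rule (pos_x, from_label, to_label, pos_y) to a label sequence."""
    pos_x, from_label, to_label, pos_y = rule
    new_labels = list(labels)
    for i in range(len(pos_tags) - 1):
        if pos_tags[i] == pos_x and labels[i] == from_label and pos_tags[i + 1] == pos_y:
            new_labels[i] = to_label
    return new_labels


def error_count(labels, gold):
    return sum(a != b for a, b in zip(labels, gold))


def learn(pos_tags, labels, gold, candidates):
    """Greedily pick the rule that reduces errors the most until none helps."""
    learned = []
    while True:
        current = error_count(labels, gold)
        scored = [(error_count(apply_rule(pos_tags, labels, r), gold), r) for r in candidates]
        best_err, best = min(scored, key=lambda t: t[0])
        if best_err >= current:
            return learned, labels
        labels = apply_rule(pos_tags, labels, best)
        learned.append(best)


# Toy sentence "uh I I think": every word starts out labeled "fluent".
pos_tags = ["UH", "PRP", "PRP", "VBP"]
initial = ["fluent"] * 4
gold = ["filler", "edit", "fluent", "fluent"]
candidates = [
    ("UH", "fluent", "filler", "PRP"),
    ("PRP", "fluent", "edit", "PRP"),
    ("PRP", "fluent", "edit", "VBP"),
]
rules, final_labels = learn(pos_tags, initial, gold, candidates)
```

The learned rules are kept in order, since at tagging time they are applied in the sequence in which they were selected.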


Modeling Disfluency to Improve ASR

•  (Stolcke et al., 1999): Model the prosody of disfluencies in ASR
•  (Stouten and Martens, 2004): FP detector as a front-end to ASR

Modeling the Prosody of Hidden Events for Improved Word Recognition (Stolcke et al., 1999)

•  Goal: Model prosody to improve speech recognition by modifying the language model to represent hidden events:
   •  sentence boundaries and various forms of disfluencies

•  Example: Right <S> I <REP> I don’t <DEL> uh <FP> I’m not really sure

•  Notation: W = word sequence; A = acoustic features; F = prosodic features; S = sequence of events

$$W^{*} = \arg\max_{W} P(W \mid A, F) \approx \arg\max_{W} \sum_{S} P(W, S, F)\, P(A \mid W)$$

   o  P(A | W): standard acoustic models

$$P(W, S, F) = P(W, S)\, P(F \mid W, S)$$

   o  P(W, S): N-gram of words and events

$$P(F \mid W, S) \approx \prod_{i} P(F_i \mid E_i, W)$$

   o  F_i is computed from window i; E_i is the event at position i in S

•  HMM for computing P(W, S, F):
   •  States: <word, event> pairs
   •  Observations: prosodic features
   •  Transition probabilities: n-gram probabilities
   •  Emission probabilities: posterior probabilities from a decision tree

Modeling the Prosody of Hidden Events for Improved Word Recognition (Stolcke et al., 1999)

•  Prosodic features
   o  Durations of pauses, of final vowels, and of syllable rhymes

•  0.9% significant absolute reduction of word error rate

•  Error analysis:
   •  Fewer substitutions and insertions but more deletions
   •  The prosodic model reduces errors
      •  of high-frequency words that tend to occur at sentence boundaries
         •  … that at church to <s> → … that at church too <s>
      •  that occur around filled pauses
         •  … to perform in and col weather → … to perform in UH cold weather

Coping with Disfluencies in Spontaneous Speech Recognition (Stouten and Martens, 2004)

•  Goal: First detect simple disfluencies and then change the behavior of the search engine and LM (for Dutch)

•  If an FP is detected:
   o  FP frame dropping: discard the frames in the FP interval
      •  Significant reduction of WER when using the reference only
   o  FP probability adaptation: Locally raise the probability of entering the FP state
      •  Change the probability of the FP arc in the LM when more than 50% of the frames consumed by this arc fall inside a detected FP interval
      •  For each FP, 1.04 (auto) to 1.7 (ref) words were corrected per FP; traditional model (modeling FP as a word): 0.75

•  If a word repetition is identified:
   o  WR frame dropping: Drop the frames of the repeated word
      •  Corrects 0.6% words per WR (ref)
   o  WR probability adaptation: Raise the probability of paths that include WRs
      •  Change the probability of reentering the word after detecting a repetition
      •  Corrects 1.03% words per WR (ref)
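The two filled-pause (FP) mechanisms can be sketched on frame indices as below. The sketch assumes the FP detector returns half-open (start, end) frame intervals; the function names and the toy frame sequence are hypothetical, and the actual change to the arc probability is left out because it is specific to the decoder.

```python
def drop_fp_frames(frames, fp_intervals):
    """FP frame dropping: remove every frame inside a detected FP interval.

    fp_intervals is a list of (start, end) frame indices, end exclusive.
    """
    dropped = set()
    for start, end in fp_intervals:
        dropped.update(range(start, end))
    return [f for i, f in enumerate(frames) if i not in dropped]


def mostly_inside_fp(arc_start, arc_end, fp_intervals, threshold=0.5):
    """True when more than `threshold` of an arc's frames lie in FP intervals.

    This is the condition under which the FP arc probability would be adapted.
    """
    n = arc_end - arc_start
    inside = sum(
        1 for i in range(arc_start, arc_end)
        if any(s <= i < e for s, e in fp_intervals)
    )
    return n > 0 and inside / n > threshold


frames = list(range(10))   # stand-ins for 10 feature vectors
fp_intervals = [(3, 6)]    # detector output: frames 3, 4, 5 are a filled pause
kept = drop_fp_frames(frames, fp_intervals)
```

Frame dropping changes the input to the recognizer, whereas probability adaptation keeps all frames and only biases the search, which is why the two are reported separately.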


On Not Remembering Disfluencies (Bard and Lickley, 1997)

•  Goal: Examine evidence that failures of memory and perception are involved in the human ability to miss disfluencies

•  Stimuli: Spontaneous speech utterances with 80 simplex and 16 complex disfluencies
   •  30 recast (no words from the reparandum are repeated in the repair) and 50 with repeats

•  Subjects were instructed to transcribe everything they heard into real words in standard orthography and to be as accurate as possible

•  Results:
   1.  Listeners had great difficulty in reporting words from reparanda
   2.  Recall of words in reparanda was worse in the longest strings
   3.  All fluent outcomes were significantly better than any disfluent outcomes
   4.  The longer the repair, the lower the recall of words in reparanda
   5.  Report rate falls more sharply in repetition disfluencies than in others
   6.  Repetition disfluencies are significantly more forgettable than others when they occur in utterances which are already difficult to process because of multiple false starts
       → Repetition deafness helps to expunge disfluencies

•  Using multiple regression, repetition and recast disfluencies were subject to somewhat different influences
