Deep Neural Networks for Acoustic Modeling in Speech Recognition

Hinton, Geoffrey, et al. "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups." IEEE Signal Processing Magazine 29.6 (2012): 82-97.

Presented by Peidong Wang, 04/04/2016
Content
• Speech Recognition System
• GMM-HMM Model
• Training Deep Neural Networks
• Generative Pretraining
• Experiments
• Discussion
Speech Recognition System
• Goal
• Converting speech to text
• A Mathematical Perspective

w = argmax_w P(w | Y)

or

w = argmax_w P(Y | w) P(w)
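The decision rule above can be illustrated with a toy example (the candidate sentences and probability values below are invented for illustration): pick the word sequence w that maximizes the acoustic likelihood P(Y | w) times the language-model prior P(w).

```python
# Toy illustration of w = argmax_w P(Y | w) P(w); all numbers are invented.
acoustic_likelihood = {"recognize speech": 0.40, "wreck a nice beach": 0.45}
language_prior      = {"recognize speech": 0.30, "wreck a nice beach": 0.02}

def decode(likelihood, prior):
    # Choose the hypothesis with the highest likelihood * prior product.
    return max(likelihood, key=lambda w: likelihood[w] * prior[w])

best = decode(acoustic_likelihood, language_prior)
```

Even though "wreck a nice beach" fits the acoustics slightly better, the language model prior makes "recognize speech" the winning hypothesis.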
GMM-HMM Model
• GMM and HMM
• GMM is short for Gaussian Mixture Model, and HMM is short for Hidden Markov Model.
• Predecessor of DNNs
• Before Deep Neural Networks (DNNs), the most commonly used speech recognition systems consisted of GMMs and HMMs.
GMM-HMM Model
• HMM
• HMM is used to deal with the temporal variability of speech.
• GMM
• GMM is used to represent the relationship between HMM states and the acoustic input.
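As a minimal sketch of the GMM half of this pairing (my own illustration, not code from the paper), the emission probability of an HMM state modeled by a diagonal-covariance GMM can be evaluated like this:

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """log p(x) for a K-component diagonal-covariance Gaussian mixture."""
    x = np.asarray(x, dtype=float)
    D = x.shape[0]
    # Log of each Gaussian's normalizing constant (diagonal covariance).
    log_norm = -0.5 * (D * np.log(2 * np.pi) + np.sum(np.log(variances), axis=1))
    # Log of each Gaussian's exponent term.
    log_exp = -0.5 * np.sum((x - means) ** 2 / variances, axis=1)
    log_comp = np.log(weights) + log_norm + log_exp
    m = np.max(log_comp)                    # log-sum-exp for numerical stability
    return m + np.log(np.sum(np.exp(log_comp - m)))

# A two-component mixture in 2-D (parameters invented for illustration).
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [3.0, 3.0]])
variances = np.ones((2, 2))
ll = gmm_log_likelihood([0.0, 0.0], weights, means, variances)
```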
GMM-HMM Model
• Features
• The features are typically represented by concatenating Mel-frequency cepstral coefficients (MFCCs) or perceptual linear predictive coefficients (PLPs) computed from the raw waveform with their first- and second-order temporal differences.
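The first- and second-order temporal differences ("delta" and "acceleration" features) can be sketched as follows; the simple ±1-frame central difference here is a simplified stand-in for the regression formula usually used in practice:

```python
import numpy as np

def add_deltas(frames):
    """frames: (T, D) array of static features -> (T, 3D) with deltas appended."""
    # First-order difference, with edge padding so T is preserved.
    padded = np.pad(frames, ((1, 1), (0, 0)), mode="edge")
    delta = (padded[2:] - padded[:-2]) / 2.0
    # Second-order difference = difference of the deltas.
    padded_d = np.pad(delta, ((1, 1), (0, 0)), mode="edge")
    delta2 = (padded_d[2:] - padded_d[:-2]) / 2.0
    return np.hstack([frames, delta, delta2])

feats = add_deltas(np.arange(12, dtype=float).reshape(4, 3))
```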
GMM-HMM Model
• Shortcoming
• GMM-HMM models are statistically inefficient for modeling data that lie on or near a nonlinear manifold in the data space.
• For example, modeling the set of points that lie very close to the surface of a sphere.
Training Deep Neural Networks
• Deep Neural Network (DNN)
• A DNN is a feed-forward artificial neural network that has more than one layer of hidden units between its inputs and its outputs.
• With nonlinear activation functions, a DNN is able to model an arbitrary nonlinear function (projection from inputs to outputs). [*]
[*] Added by the presenter.
Training Deep Neural Networks
• Activation Function of the Output Units
• The activation function of the output units is the "softmax" function:

p_j = exp(x_j) / Σ_k exp(x_k)
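A numerically stable softmax can be written in a few lines; subtracting the maximum before exponentiating does not change the result but avoids overflow:

```python
import numpy as np

def softmax(x):
    # Stable softmax: shift by the max so exp never overflows.
    e = np.exp(x - np.max(x))
    return e / np.sum(e)

p = softmax(np.array([2.0, 1.0, 0.1]))
```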
Training Deep Neural Networks
• Objective Function
• When using the softmax output function, the natural objective function (cost function) C is the cross-entropy between the target probabilities d and the outputs of the softmax, p:

C = -Σ_j d_j log p_j
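With a one-hot target, the cross-entropy reduces to the negative log probability of the correct class; a small self-contained sketch:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / np.sum(e)

def cross_entropy(d, p):
    # C = -sum_j d_j log p_j
    return -np.sum(d * np.log(p))

p = softmax(np.array([2.0, 1.0, 0.1]))
d = np.array([1.0, 0.0, 0.0])          # one-hot target
C = cross_entropy(d, p)
```

A well-known consequence (and part of why this pairing is "natural") is that the gradient of C with respect to the softmax inputs is simply p - d.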
Training Deep Neural Networks
• Weight Penalties and Early Stopping
• To reduce overfitting, large weights can be penalized in proportion to their squared magnitude, or learning can simply be terminated at the point at which performance on a held-out validation set starts getting worse.
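The early-stopping criterion can be sketched as a simple patience rule (the validation-error values below are invented, and the `patience` parameter is my own addition, not something specified in the paper):

```python
def early_stop_index(val_errors, patience=2):
    """Return the index of the epoch whose weights should be kept."""
    best, best_i, waited = float("inf"), 0, 0
    for i, err in enumerate(val_errors):
        if err < best:
            best, best_i, waited = err, i, 0   # new best: reset the counter
        else:
            waited += 1
            if waited >= patience:             # no improvement for too long
                break
    return best_i

stop_at = early_stop_index([0.9, 0.7, 0.6, 0.62, 0.65, 0.7])
```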
Training Deep Neural Networks
• Overfitting Reduction
• Generally speaking, there are three methods:
• Weight penalties and early stopping can reduce overfitting, but only by removing much of the modeling power.
• Very large training sets can reduce overfitting, but only by making training very computationally expensive.
• Generative pretraining.
Generative Pretraining
• Purpose
• The multiple layers of feature detectors (the result of this step) can be used as a good starting point for a discriminative "fine-tuning" phase, during which backpropagation through the DNN slightly adjusts the weights and improves the performance.
• In addition, this step can significantly reduce overfitting.
Generative Pretraining
• Restricted Boltzmann Machine (RBM)
• An RBM consists of a layer of stochastic binary "visible" units that represent binary input data, connected to a layer of stochastic binary hidden (latent) units that learn to model significant nonindependencies between the visible units.
• There are undirected connections between visible and hidden units, but no visible-visible or hidden-hidden connections.
Generative Pretraining
• Restricted Boltzmann Machine (RBM) (Cont'd)
• The framework of an RBM is shown below.
From: Slides in CSE 5526 Neural Networks
Generative Pretraining
• Restricted Boltzmann Machine (RBM) (Cont'd)
• An RBM uses a single set of parameters, W, to define the joint probability of a vector of values of the observable variables, v, and a vector of values of the latent variables, h, via an energy function, E:

p(v, h; W) = (1/Z) e^(-E(v, h; W)),   Z = Σ_{v', h'} e^(-E(v', h'; W))

E(v, h) = -Σ_{i ∈ visible} a_i v_i - Σ_{j ∈ hidden} b_j h_j - Σ_{i, j} v_i h_j w_ij
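The energy function and the partition function Z are small enough to compute exactly for a toy RBM; the sketch below (my own illustration, with randomly chosen weights) enumerates all visible/hidden configurations to form Z by brute force:

```python
import numpy as np
from itertools import product

def rbm_energy(v, h, a, b, W):
    # E(v, h) = -a.v - b.h - v.W h  for binary v, h
    return -np.dot(a, v) - np.dot(b, h) - np.dot(v, W @ h)

def partition(a, b, W):
    # Z = sum over all binary configurations of exp(-E); only feasible for toy sizes.
    return sum(
        np.exp(-rbm_energy(np.array(v, float), np.array(h, float), a, b, W))
        for v in product([0, 1], repeat=len(a))
        for h in product([0, 1], repeat=len(b))
    )

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))          # 4 visible units, 3 hidden units
a, b = np.zeros(4), np.zeros(3)
E = rbm_energy(np.array([1., 0., 1., 0.]), np.array([1., 1., 0.]), a, b, W)
```

With all parameters zero, every configuration has energy 0, so Z equals the number of configurations (2^(4+3) = 128), which makes a handy sanity check.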
Generative Pretraining
• Restricted Boltzmann Machine (RBM) (Cont'd)
• The probability that the network assigns to a visible vector, v, is given by summing over all possible hidden vectors:

p(v) = (1/Z) Σ_h e^(-E(v, h))

• The derivative of the log probability of a training set with respect to a weight is surprisingly simple (the angle brackets denote expectations under the corresponding distribution):

(1/N) Σ_{n=1}^{N} ∂ log p(v_n) / ∂w_ij = <v_i h_j>_data - <v_i h_j>_model
Generative Pretraining
• Restricted Boltzmann Machine (RBM) (Cont'd)
• The learning rule is thus as follows:

Δw_ij = ε(<v_i h_j>_data - <v_i h_j>_model)

• A better learning procedure is contrastive divergence (CD), shown below. The subscript "recon" denotes the step in CD in which the states of the visible units are assigned 0 or 1 according to the current states of the hidden units.

Δw_ij = ε(<v_i h_j>_data - <v_i h_j>_recon)
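A single CD-1 weight update can be sketched as below. This is my own minimal illustration, not the paper's code; in particular it uses mean-field probabilities for the reconstruction rather than sampled 0/1 states, a common simplification:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v_data, W, a, b, lr=0.1):
    """One CD-1 step: W moves toward <v h>_data - <v h>_recon."""
    h_prob = sigmoid(v_data @ W + b)                  # p(h = 1 | v_data)
    h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
    v_recon = sigmoid(h_sample @ W.T + a)             # mean-field reconstruction
    h_recon = sigmoid(v_recon @ W + b)
    pos = np.outer(v_data, h_prob)                    # <v h>_data
    neg = np.outer(v_recon, h_recon)                  # <v h>_recon
    return W + lr * (pos - neg)

W = np.zeros((4, 3))
a, b = np.zeros(4), np.zeros(3)
W_new = cd1_update(np.array([1., 1., 0., 0.]), W, a, b)
```

Starting from zero weights, the positive phase pulls the weights of active visible units up and the negative phase pushes the rest down, which is visible in the signs of the updated W.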
Generative Pretraining
• Modeling Real-Valued Data
• Real-valued data, such as MFCCs, are more naturally modeled by linear variables with Gaussian noise, and the RBM energy function can be modified to accommodate such variables, giving a Gaussian-Bernoulli RBM (GRBM):

E(v, h) = Σ_{i ∈ vis} (v_i - a_i)^2 / (2σ_i^2) - Σ_{j ∈ hid} b_j h_j - Σ_{i, j} (v_i / σ_i) h_j w_ij
Generative Pretraining
• Stacking RBMs to Make a Deep Belief Network
• After training an RBM on the data, the inferred states of the hidden units can be used as data for training another RBM that learns to model the significant dependencies between the hidden units of the first RBM.
• This can be repeated as many times as desired to produce many layers of nonlinear feature detectors that represent progressively more complex statistical structure in the data.
Generative Pretraining
• Stacking RBMs to Make a Deep Belief Network (Cont'd)
From: the paper
Generative Pretraining
• Interfacing a DNN with an HMM
• In an HMM framework, the hidden variables denote the states of the phone sequence, and the "visible" variables denote the feature vectors. [*]
[*] Added by the presenter.
From: Gales, Mark, and Steve Young. "The application of hidden Markov models in speech recognition." Foundations and Trends in Signal Processing 1.3 (2008): 195-304.
Generative Pretraining
• Interfacing a DNN with an HMM (Cont'd)
• To compute a Viterbi alignment or to run the forward-backward algorithm within the HMM framework, we require the likelihood p(AcousticInput | HMM state).
• A DNN, however, outputs probabilities of the form p(HMM state | AcousticInput).
Generative Pretraining
• Interfacing a DNN with an HMM (Cont'd)
• The posterior probabilities that the DNN outputs can be converted into scaled likelihoods by dividing them by the frequencies of the HMM states in the forced alignment that is used for fine-tuning the DNN.
• Forced alignment is a procedure used to generate labels for the training process. [*]
[*] Added by the presenter.
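The posterior-to-likelihood conversion follows from Bayes' rule, p(x | s) ∝ p(s | x) / p(s), and is a one-liner; the posterior and prior values below are invented for illustration:

```python
import numpy as np

def scaled_likelihoods(posteriors, state_priors):
    # p(x | s) / p(x) = p(s | x) / p(s): divide DNN posteriors by state priors.
    return posteriors / state_priors

post = np.array([0.7, 0.2, 0.1])     # DNN output for one frame (invented)
priors = np.array([0.5, 0.3, 0.2])   # state frequencies from the forced alignment
scaled = scaled_likelihoods(post, priors)
```

Note how the least frequent state (prior 0.2) gets boosted relative to its raw posterior, which is exactly what matters when training labels are unbalanced.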
Generative Pretraining
• Interfacing a DNN with an HMM (Cont'd)
• All of the likelihoods produced in this way are scaled by the same unknown factor of p(AcousticInput).
• Although this appears to have little effect on some recognition tasks, it can be important for tasks where training labels are highly unbalanced.
Experiments
• Phonetic Classification and Recognition on TIMIT
• The TIMIT dataset is a relatively small dataset that provides a simple and convenient way of testing new approaches to speech recognition.
Experiments
• Phonetic Classification and Recognition on TIMIT (Cont'd)
From: the paper
Experiments
• Bing-Voice-Search Speech Recognition Task
• This task used 24 h of training data with a high degree of acoustic variability caused by noise, music, side-speech, accents, sloppy pronunciation, etc.
• The best DNN-HMM acoustic model achieved a sentence accuracy of 69.6% on the test set, compared with 63.8% for a strong, minimum phone error (MPE)-trained GMM-HMM baseline.
Experiments
• Bing-Voice-Search Speech Recognition Task (Cont'd)
From: the paper
Experiments
• Other Large Vocabulary Tasks
• Switchboard Speech Recognition Task (a corpus containing over 300 h of training data)
• Google Voice Input Speech Recognition Task
• YouTube Speech Recognition Task
• English Broadcast News Speech Recognition Task
Experiments
• Other Large Vocabulary Tasks (Cont'd)
From: the paper
Discussion
• Convolutional DNNs for Phone Classification and Recognition
• Although convolutional models along the temporal dimension achieved good classification results on the TIMIT corpus, applying them to phone recognition is not straightforward.
• This is because temporal variations in speech can be partially handled by the dynamic programming procedure in the HMM component and by hidden trajectory models.
Discussion
• Speeding Up DNNs at Recognition Time
• The time that a DNN-HMM system requires to recognize 1 s of speech can be reduced from 1.6 s to 210 ms, without decreasing recognition accuracy, by quantizing the weights down to 8 bits on a CPU.
• Alternatively, it can be reduced to 66 ms by using a graphics processing unit (GPU).
Discussion
• Alternative Pretraining Methods for DNNs
• It is possible to learn a DNN by starting with a shallow neural net with a single hidden layer. Once this net has been trained discriminatively, a second hidden layer is interposed between the first hidden layer and the softmax output units, and the whole network is again discriminatively trained. This can be continued until the desired number of hidden layers is reached, after which full backpropagation fine-tuning is applied.
Discussion
• Alternative Pretraining Methods for DNNs (Cont'd)
• Purely discriminative training of the whole DNN from random initial weights works well, too.
• Various types of autoencoder with one hidden layer can also be used in the layer-by-layer generative pretraining process.
Discussion
• Alternative Fine-Tuning Methods for DNNs
• Most DBN-DNN acoustic models are fine-tuned by applying stochastic gradient descent with momentum to small minibatches of training cases.
• More sophisticated optimization methods can be used, but it is not clear that they are worthwhile, since the fine-tuning process is typically stopped early to prevent overfitting.
Discussion
• Using DBN-DNNs to Provide Input Features for GMM-HMM Systems
• This class of methods uses neural networks to provide the feature vectors for training the GMM in a GMM-HMM system.
• The most common approach is to train a randomly initialized neural net with a narrow bottleneck middle layer and to use the activations of the bottleneck hidden units as features.
Discussion
• Using DNNs to Estimate Articulatory Features for Detection-Based Speech Recognition
• DBN-DNNs are effective for detecting subphonetic speech attributes (also known as phonological or articulatory features).
Discussion
• Summary
• Most of the gain comes from using DNNs to exploit information in neighboring frames and from modeling tied context-dependent states.
• There is no reason to believe that the optimal types of hidden units or the optimal network architectures are currently being used, and it is highly likely that both the pretraining and fine-tuning algorithms can be modified to reduce the amount of overfitting and the amount of computation.
Thank You!
Investigation of Speech Separation as a Front-End for Noise Robust Speech Recognition

Narayanan, Arun, and DeLiang Wang. "Investigation of speech separation as a front-end for noise robust speech recognition." IEEE/ACM Transactions on Audio, Speech, and Language Processing 22.4 (2014): 826-835.

Presented by Peidong Wang, 04/04/2016
Content
• Introduction
• System Description
• Evaluation Results
• Discussion
Introduction
• Background
• Although automatic speech recognition (ASR) systems have become fairly powerful, the inherent variability of speech can still pose challenges.
• Typically, ASR systems that work well in clean conditions suffer from a drastic loss of performance in the presence of noise.
Introduction
• Feature-Based Methods
• This class of methods focuses on feature extraction or feature normalization.
• Feature-based techniques have the potential to generalize well, but do not always produce the best results.
Introduction
• Two Groups of Feature-Based Methods
• When stereo [*] data is unavailable, prior knowledge about speech and/or noise is used, such as spectral reconstruction based missing feature methods, direct masking methods, and feature enhancement methods.
• When stereo data is available, feature mapping methods and recurrent neural networks have been used.
[*] By stereo we mean noisy and the corresponding clean signals.
Introduction
• Model-Based Methods
• The ASR model parameters are adapted to match the distribution of noisy or enhanced features.
• Model-based methods work well when the underlying assumptions are met, but typically involve significant computational overhead.
• The best performances are usually obtained by combining feature-based and model-based methods.
Introduction
• Supervised Classification Based Speech Separation
• Stereo training data is also used by supervised classification based speech separation algorithms.
• Such algorithms typically estimate the ideal binary mask (IBM), a binary mask defined in the time-frequency (T-F) domain that identifies speech-dominant and noise-dominant T-F units.
• The above method can be extended to the ideal ratio mask (IRM), which represents the ratio of speech to mixture energy.
System Description
• Block Diagram of the Proposed System
From: the paper
System Description
• Addressing Additive Noise and Convolutional Distortion
• The additive noise and the convolutional distortion are dealt with in two separate stages: noise removal followed by channel compensation.
• Noise is removed via T-F masking using the IRM. To compensate for channel mismatch and the errors introduced by masking, we learn a non-linear mapping function that undoes these distortions.
System Description
• Time-Frequency Masking
System Description
• Time-Frequency Masking (Cont'd)
• Here the authors perform T-F masking in the mel-frequency domain, unlike some of the other systems that operate in the gammatone feature domain.
• To obtain the mel-spectrogram of a signal, it is first pre-emphasized and transformed to the linear frequency domain using a 320-channel fast Fourier transform (FFT). A 20 ms Hamming window is used. The 161-dimensional spectrogram is then converted to a 26-channel mel-spectrogram.
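The 161-bin-to-26-channel reduction is done with a triangular mel filterbank; the sketch below builds one with the usual HTK-style hz/mel formulas (assumed here, not taken from the paper, as is the 16 kHz sampling rate):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=26, n_fft=320, sr=16000):
    """Triangular filters mapping n_fft//2 + 1 FFT bins to n_filters mel channels."""
    n_bins = n_fft // 2 + 1                        # 161 bins for a 320-point FFT
    # Filter edges equally spaced on the mel scale, converted back to FFT bins.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        for k in range(lo, mid):                   # rising slope of triangle i
            fb[i, k] = (k - lo) / max(mid - lo, 1)
        for k in range(mid, hi):                   # falling slope of triangle i
            fb[i, k] = (hi - k) / max(hi - mid, 1)
    return fb

fb = mel_filterbank()
# A 161-bin power spectrum `s` is reduced to 26 mel channels by `fb @ s`.
```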
System Description
• Time-Frequency Masking (Cont'd)
• The authors use DNNs to estimate the IRM, as DNNs show good performance and training using stochastic gradient descent scales well compared to other nonlinear discriminative classifiers.
System Description
• Time-Frequency Masking (Cont'd)
• Target Signal
• The ideal ratio mask is defined as the ratio of the clean signal energy to the mixture energy at each time-frequency unit:

IRM(t, f) = 10^(SNR(t, f)/10) / (10^(SNR(t, f)/10) + 1)

SNR(t, f) = 10 log10(X(t, f) / N(t, f))
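Substituting the SNR definition into the mask formula shows the IRM is algebraically just X / (X + N) in the energy domain; a small sketch with invented clean/noise energies:

```python
import numpy as np

def ideal_ratio_mask(X, N):
    # IRM via the SNR form; algebraically equal to X / (X + N).
    snr_db = 10.0 * np.log10(X / N)
    lin = 10.0 ** (snr_db / 10.0)
    return lin / (lin + 1.0)

X = np.array([[4.0, 1.0], [9.0, 1.0]])   # clean T-F energies (invented)
N = np.array([[1.0, 1.0], [1.0, 9.0]])   # noise T-F energies (invented)
mask = ideal_ratio_mask(X, N)
```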
System Description
• Time-Frequency Masking (Cont'd)
• Target Signal
• Rather than estimating the IRM directly, the authors estimate a transformed version of the SNR, using the sigmoidal transformation shown below:

d(t, f) = 1 / (1 + exp(-α(SNR(t, f) - β)))
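The sigmoidal target is a one-liner; the α and β values below are placeholders for illustration, not the settings used in the paper:

```python
import numpy as np

def snr_target(snr_db, alpha=1.0, beta=0.0):
    # d(t, f) = 1 / (1 + exp(-alpha * (SNR - beta))); squashes SNR into (0, 1).
    return 1.0 / (1.0 + np.exp(-alpha * (snr_db - beta)))

d = snr_target(np.array([-20.0, 0.0, 20.0]))
```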
System Description
• Time-Frequency Masking (Cont'd)
• Target Signal
• During testing, the values output from the DNN are mapped back to their corresponding IRM values.
System Description
• Time-Frequency Masking (Cont'd)
• Features
• Feature extraction is performed both at the fullband and the subband level.
• The combination of features, 31-dimensional MFCCs, 13-dimensional RASTA-filtered PLPs, and 15-dimensional amplitude modulation spectrogram (AMS) features, is used.
System Description
• Time-Frequency Masking (Cont'd)
• Features
• The fullband features are derived by splicing together fullband MFCCs and RASTA-PLPs, along with their delta and acceleration components, and subband AMS features.
• The subband features are derived by splicing together subband MFCCs, RASTA-PLPs, and AMS features. Some auxiliary components are also added.
System Description
• Time-Frequency Masking (Cont'd)
• Supervised Learning
• IRM estimation is performed in two stages. In the first stage, multiple DNNs are trained using fullband and subband features. The final estimate is obtained using an MLP that combines the outputs of the fullband and the subband DNNs.
System Description
• Time-Frequency Masking (Cont'd)
• Supervised Learning
• The fullband DNNs would be cognizant of the overall spectral shape of the IRM and the information conveyed by the fullband features, whereas the subband DNNs are expected to be more robust to noise occurring at frequencies outside their passband.
System Description
• Time-Frequency Masking (Cont'd)
From: the paper
System Description
• Feature Mapping
System Description
• Feature Mapping (Cont'd)
• Even after T-F masking, channel mismatch can still significantly impact performance.
• This happens for two reasons. Firstly, the algorithm learns to estimate the ratio mask using mixtures of speech and noise recorded using a single microphone. Secondly, because channel mismatch is convolutional, speech and noise, which now includes both background noise and convolutive noise, are clearly not uncorrelated.
System Description
• Feature Mapping (Cont'd)
• The goal of feature mapping in this work is to learn spectro-temporal correlations that exist in speech to undo the distortions introduced by unseen microphones and the first stage of the algorithm.
System Description
• Feature Mapping (Cont'd)
• Target Signal
• The target is the clean log-mel spectrogram (LMS). The "clean" LMS here corresponds to that obtained from the clean signals recorded using a single microphone in a single filter setting.
System Description
• Feature Mapping (Cont'd)
• Target Signal
• Instead of using the LMS directly as the target, the authors apply a linear transform to limit the target values to the range [0, 1], so that the sigmoidal transfer function can be used for the output layer of the DNN:

X_d(t, f) = [ln(X(t, f)) - min(ln(X(·, f)))] / [max(ln(X(·, f))) - min(ln(X(·, f)))]
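The transform is a per-frequency-channel min-max scaling of the log-mel values, where the min and max are taken over time for each channel f; a minimal sketch:

```python
import numpy as np

def minmax_lms(X):
    """X: (T, F) mel-spectrogram -> log-mel scaled to [0, 1] per channel f."""
    lnX = np.log(X)
    lo = lnX.min(axis=0, keepdims=True)   # min over time, per channel
    hi = lnX.max(axis=0, keepdims=True)   # max over time, per channel
    return (lnX - lo) / (hi - lo)

Xd = minmax_lms(np.array([[1.0, 2.0], [np.e, 8.0], [np.e ** 2, 4.0]]))
```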
System Description
• Feature Mapping (Cont'd)
• Target Signal
• During testing, the output of the DNN is mapped back to the dynamic range of the utterances in the training set.
System Description
• Feature Mapping (Cont'd)
• Features
• The authors use both the noisy and the masked LMS.
• Supervised Learning
• Unlike the DNNs used for IRM estimation, the hidden layers of the DNN for this task use rectified linear units (ReLUs). In addition, the output layer uses sigmoid activations.
System Description
• Feature Mapping (Cont'd)
From: the paper
System Description
• Acoustic Modeling
System Description
• Acoustic Modeling (Cont'd)
• The acoustic models are trained using the Aurora-4 dataset.
• Aurora-4 is a 5000-word closed vocabulary recognition task based on the Wall Street Journal database. The corpus has two training sets, clean and multi-condition, both with 7138 utterances.
System Description
• Acoustic Modeling (Cont'd)
• Gaussian Mixture Models
• The HMMs and the GMMs are initially trained using the clean training set. The clean models are then used to initialize the multi-condition models; both clean and multi-condition models have the same structure and differ only in transition and observation probability densities.
System Description
• Acoustic Modeling (Cont'd)
• Deep Neural Networks
• The authors first align the clean training set to obtain senone labels at each time-frame for all utterances in the training set. DNNs are then trained to predict the posterior probability of senones using either clean features or features extracted from the multi-condition set.
System Description
• Diagonal Feature Discriminant Linear Regression
System Description
• Diagonal Feature Discriminant Linear Regression (Cont'd)
• dFDLR is a semi-supervised feature adaptation technique.
• The motivation for developing dFDLR is to address the problem of generalization to unseen microphone conditions in our dataset, which is where the DNN-HMM systems perform the worst.
System Description
• Diagonal Feature Discriminant Linear Regression (Cont'd)
• To apply dFDLR, we first obtain an initial senone-level labeling for our test utterances using the unadapted models. Features are then transformed to minimize the cross-entropy error in predicting these labels.
• The mathematical expressions are as follows:

Ô_t(f) = w_f · O_t(f) + b_f

min Σ_t E(s_t, D_out(Ô_{t-5}, ..., Ô_{t+5}))
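Structurally, the dFDLR transform is just a per-dimension (diagonal) affine layer inserted at the input, initialized to the identity so that adaptation starts from the unadapted features; a minimal sketch of that layer (my own illustration, without the training loop):

```python
import numpy as np

class DiagonalAffine:
    """O'_t(f) = w_f * O_t(f) + b_f, one scale and bias per feature dimension."""

    def __init__(self, dim):
        self.w = np.ones(dim)    # identity initialization: w_f = 1
        self.b = np.zeros(dim)   # b_f = 0

    def forward(self, O):
        # Element-wise affine transform of one feature frame (or a batch,
        # via broadcasting over the leading axis).
        return self.w * O + self.b

layer = DiagonalAffine(3)
out = layer.forward(np.array([0.5, -1.0, 2.0]))
```

During adaptation only `w` and `b` would be updated by backpropagation while the original DNN weights stay frozen, matching the procedure described on the next slide.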
System Description
• Diagonal Feature Discriminant Linear Regression (Cont'd)
• The parameters can easily be learned within the DNN framework by adding a layer between the input layer and the first hidden layer of the original DNN. After initialization, the standard backpropagation algorithm is run for 10 epochs to learn the parameters of the dFDLR model. During backpropagation, the weights of the original hidden layers are kept unchanged and only the parameters of the dFDLR are updated.
Evaluation Results
From: the paper

Evaluation Results
From: the paper
Discussion
• Several interesting observations can be made from the results presented in the previous section.
• Firstly, the results clearly show that the speech separation front-end does a good job of removing noise and handling channel mismatch.
• Secondly, with no channel mismatch, T-F masking alone works well in removing noise.
Discussion
• Finally, directly performing feature mapping from noisy features to clean features performs reasonably well, but not as well as the proposed front-end.
Thank You!