Deep Neural Networks for Acoustic Modeling in Speech
Recognition
Hinton, Geoffrey, et al. "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups." IEEE Signal Processing Magazine 29.6 (2012): 82-97.
Presented by Peidong Wang, 04/04/2016
1
Content
• Speech Recognition System
• GMM-HMM Model
• Training Deep Neural Networks
• Generative Pretraining
• Experiments
• Discussion
2
Content
• Speech Recognition System
• GMM-HMM Model
• Training Deep Neural Networks
• Generative Pretraining
• Experiments
• Discussion
3
Speech Recognition System
• Goal
• Converting speech to text
• A Mathematical Perspective

w* = argmax_w {P(w | Y)}, or
w* = argmax_w {P(Y | w) P(w)}
4
Content
• Speech Recognition System
• GMM-HMM Model
• Training Deep Neural Networks
• Generative Pretraining
• Experiments
• Discussion
5
GMM-HMM Model
• GMM and HMM
• GMM is short for Gaussian Mixture Model, and HMM is short for Hidden Markov Model.
• Predecessor of DNNs
• Before Deep Neural Networks (DNNs), the most commonly used speech recognition systems consisted of GMMs and HMMs.
6
GMM-HMM Model
• HMM
• The HMM is used to deal with the temporal variability of speech.
• GMM
• The GMM is used to represent the relationship between HMM states and the acoustic input.
7
GMM-HMM Model
• Features
• The features are typically obtained by concatenating Mel-frequency cepstral coefficients (MFCCs) or perceptual linear predictive coefficients (PLPs) computed from the raw waveform with their first- and second-order temporal differences.
8
GMM-HMM Model
• Shortcoming
• GMM-HMM models are statistically inefficient for modeling data that lie on or near a nonlinear manifold in the data space.
• For example, modeling the set of points that lie very close to the surface of a sphere.
9
Content
• Speech Recognition System
• GMM-HMM Model
• Training Deep Neural Networks
• Generative Pretraining
• Experiments
• Discussion
10
Training Deep Neural Networks
• Deep Neural Network (DNN)
• A DNN is a feed-forward artificial neural network that has more than one layer of hidden units between its inputs and its outputs.
• With nonlinear activation functions, a DNN is able to model an arbitrary nonlinear function (a projection from inputs to outputs). [*]
[*] Added by the presenter.
11
Training Deep Neural Networks
• Activation Function of the Output Units
• The activation function of the output units is the "softmax" function.
• The mathematical expression is as follows.

p_j = exp(x_j) / Σ_k exp(x_k)
12
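As an illustration, here is a minimal NumPy sketch of the softmax function (the max-subtraction is a standard numerical-stability detail, not part of the slide's formula):

```python
import numpy as np

def softmax(x):
    # p_j = exp(x_j) / sum_k exp(x_k); subtracting max(x) first
    # avoids overflow and leaves the result unchanged.
    e = np.exp(x - np.max(x))
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))  # outputs sum to 1
```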
Training Deep Neural Networks
• Objective Function
• When using the softmax output function, the natural objective function (cost function) C is the cross-entropy between the target probabilities d and the outputs of the softmax, p.
• The mathematical expression is as follows.

C = −Σ_j d_j log p_j
13
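A minimal sketch of this cost for a single frame, assuming a one-hot target vector (the specific numbers are hypothetical):

```python
import numpy as np

def cross_entropy(d, p):
    # C = -sum_j d_j * log(p_j), with d the target probabilities
    # and p the softmax outputs.
    return -float(np.sum(d * np.log(p)))

d = np.array([0.0, 1.0, 0.0])  # one-hot target for one frame (hypothetical)
p = np.array([0.2, 0.7, 0.1])  # softmax outputs (hypothetical)
C = cross_entropy(d, p)        # equals -log(0.7)
```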
Training Deep Neural Networks
• Weight Penalties and Early Stopping
• To reduce overfitting, large weights can be penalized in proportion to their squared magnitude, or the learning can simply be terminated at the point at which performance on a held-out validation set starts getting worse.
14
Training Deep Neural Networks
• Overfitting Reduction
• Generally speaking, there are three methods.
• Weight penalties and early stopping can reduce overfitting, but only by removing much of the modeling power.
• Very large training sets can reduce overfitting, but only by making training very computationally expensive.
• Generative pretraining
15
Content
• Speech Recognition System
• GMM-HMM Model
• Training Deep Neural Networks
• Generative Pretraining
• Experiments
• Discussion
16
Generative Pretraining
• Purpose
• The multiple layers of feature detectors (the result of this step) can be used as a good starting point for a discriminative "fine-tuning" phase, during which backpropagation through the DNN slightly adjusts the weights and improves the performance.
• In addition, this step can significantly reduce overfitting.
17
Generative Pretraining
• Restricted Boltzmann Machine (RBM)
• An RBM consists of a layer of stochastic binary "visible" units that represent binary input data, connected to a layer of stochastic binary hidden (latent) units that learn to model significant nonindependencies between the visible units.
• There are undirected connections between visible and hidden units, but no visible-visible or hidden-hidden connections.
18
Generative Pretraining
• Restricted Boltzmann Machine (RBM) (Cont'd)
• The framework of an RBM is shown below.
From: Slides in CSE 5526 Neural Networks
19
Generative Pretraining
• Restricted Boltzmann Machine (RBM) (Cont'd)
• An RBM uses a single set of parameters, W, to define the joint probability of a vector of values of the observable variables, v, and a vector of values of the latent variables, h, via an energy function, E.

p(v, h; W) = (1/Z) e^(−E(v, h; W)),  Z = Σ_{v', h'} e^(−E(v', h'; W))

E(v, h) = −Σ_{i ∈ visible} a_i v_i − Σ_{j ∈ hidden} b_j h_j − Σ_{i, j} v_i h_j w_ij
20
Generative Pretraining
• Restricted Boltzmann Machine (RBM) (Cont'd)
• The probability that the network assigns to a visible vector, v, is given by summing over all possible hidden vectors.

p(v) = (1/Z) Σ_h e^(−E(v, h))

• The derivative of the log probability of a training set with respect to a weight is surprisingly simple. The angle brackets denote expectations under the corresponding distribution.

(1/N) Σ_{n=1}^{N} ∂ log p(v_n) / ∂w_ij = ⟨v_i h_j⟩_data − ⟨v_i h_j⟩_model
21
Generative Pretraining
• Restricted Boltzmann Machine (RBM) (Cont'd)
• The learning rule is thus as follows.

Δw_ij = ε(⟨v_i h_j⟩_data − ⟨v_i h_j⟩_model)

• A better learning procedure is contrastive divergence (CD), which is shown below. The subscript "recon" denotes a step in CD when the states of the visible units are assigned 0 or 1 according to the current states of the hidden units.

Δw_ij = ε(⟨v_i h_j⟩_data − ⟨v_i h_j⟩_recon)
22
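The CD update above can be sketched for a toy binary RBM as follows (the layer sizes, learning rate, and single-vector "batch" are illustrative choices, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy RBM: 6 visible units, 4 hidden units (sizes are arbitrary).
n_vis, n_hid = 6, 4
W = 0.01 * rng.normal(size=(n_vis, n_hid))
b = np.zeros(n_hid)          # hidden biases
a = np.zeros(n_vis)          # visible biases
eps = 0.1                    # learning rate

v0 = rng.integers(0, 2, size=n_vis).astype(float)  # one binary training vector

# Positive phase: sample binary hidden states given the data.
h_prob0 = sigmoid(v0 @ W + b)
h0 = (rng.random(n_hid) < h_prob0).astype(float)

# "Reconstruction": set each visible unit to 0 or 1 given the hidden states,
# then recompute the hidden probabilities.
v1 = (rng.random(n_vis) < sigmoid(h0 @ W.T + a)).astype(float)
h_prob1 = sigmoid(v1 @ W + b)

# CD-1 update: eps * (<v_i h_j>_data - <v_i h_j>_recon)
W += eps * (np.outer(v0, h_prob0) - np.outer(v1, h_prob1))
```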
Generative Pretraining
• Modeling Real-Valued Data
• Real-valued data, such as MFCCs, are more naturally modeled by linear variables with Gaussian noise, and the RBM energy function can be modified to accommodate such variables, giving a Gaussian-Bernoulli RBM (GRBM).

E(v, h) = Σ_{i ∈ vis} (v_i − a_i)² / (2σ_i²) − Σ_{j ∈ hid} b_j h_j − Σ_{i, j} (v_i / σ_i) h_j w_ij
23
Generative Pretraining
• Stacking RBMs to Make a Deep Belief Network
• After training an RBM on the data, the inferred states of the hidden units can be used as data for training another RBM that learns to model the significant dependencies between the hidden units of the first RBM.
• This can be repeated as many times as desired to produce many layers of nonlinear feature detectors that represent progressively more complex statistical structure in the data.
24
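The stacking procedure can be sketched as a deterministic up-pass in which each RBM's hidden activations become the "data" for the next RBM (the layer sizes and random weights below are placeholders standing in for actually pretrained RBMs):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical stack of three pretrained RBMs (sizes are arbitrary).
layer_sizes = [39, 512, 512, 512]
weights = [0.01 * rng.normal(size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

data = rng.random((8, layer_sizes[0]))  # 8 frames of fake acoustic features

# Greedy stacking: the inferred hidden activations of one RBM are the
# inputs on which the next RBM would be trained.
x = data
for W in weights:
    x = sigmoid(x @ W)  # deterministic up-pass through each trained RBM
```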
Generative Pretraining
• Stacking RBMs to Make a Deep Belief Network (Cont'd)
From: The paper
25
Generative Pretraining
• Interfacing a DNN with an HMM
• In an HMM framework, the hidden variables denote the states of the phone sequence, and the "visible" variables denote the feature vectors. [*]
[*] Added by the presenter.
From: Gales, Mark, and Steve Young. "The application of hidden Markov models in speech recognition." Foundations and Trends in Signal Processing 1.3 (2008): 195-304.
26
Generative Pretraining
• Interfacing a DNN with an HMM (Cont'd)
• To compute a Viterbi alignment or to run the forward-backward algorithm within the HMM framework, we require the likelihood p(AcousticInput | HMM state).
• A DNN, however, outputs probabilities of the form p(HMM state | AcousticInput).
27
Generative Pretraining
• Interfacing a DNN with an HMM (Cont'd)
• The posterior probabilities that the DNN outputs can be converted into scaled likelihoods by dividing them by the frequencies of the HMM states in the forced alignment that is used for fine-tuning the DNN.
• Forced alignment is a procedure used to generate labels for the training process. [*]
[*] Added by the presenter.
28
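A minimal sketch of this posterior-to-scaled-likelihood conversion (the posteriors and state priors below are made-up numbers, not values from the paper):

```python
import numpy as np

# Hypothetical DNN posteriors p(state | acoustics): 3 frames x 4 HMM states.
posteriors = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.1, 0.1, 0.7],
])

# State priors: relative frequencies of each state in the forced alignment
# used for fine-tuning (hypothetical values).
priors = np.array([0.4, 0.3, 0.2, 0.1])

# Scaled likelihoods: p(acoustics | state) up to the common unknown
# factor p(acoustics), obtained by dividing posteriors by priors.
scaled_likelihoods = posteriors / priors
```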
Generative Pretraining
• Interfacing a DNN with an HMM (Cont'd)
• All of the likelihoods produced in this way are scaled by the same unknown factor of p(AcousticInput).
• Although this appears to have little effect on some recognition tasks, it can be important for tasks where training labels are highly unbalanced.
29
Content
• Speech Recognition System
• GMM-HMM Model
• Training Deep Neural Networks
• Generative Pretraining
• Experiments
• Discussion
30
Experiments
• Phonetic Classification and Recognition on TIMIT
• The TIMIT dataset is a relatively small dataset which provides a simple and convenient way of testing new approaches to speech recognition.
31
Experiments
• Phonetic Classification and Recognition on TIMIT (Cont'd)
From: The paper
32
Experiments
• Bing-Voice-Search Speech Recognition Task
• This task used 24 h of training data with a high degree of acoustic variability caused by noise, music, side-speech, accents, sloppy pronunciation, etc.
• The best DNN-HMM acoustic model achieved a sentence accuracy of 69.6% on the test set, compared with 63.8% for a strong, minimum phone error (MPE)-trained GMM-HMM baseline.
33
Experiments
• Bing-Voice-Search Speech Recognition Task (Cont'd)
From: The paper
34
Experiments
• Other Large Vocabulary Tasks
• Switchboard Speech Recognition Task (a corpus containing over 300 h of training data)
• Google Voice Input Speech Recognition Task
• YouTube Speech Recognition Task
• English Broadcast News Speech Recognition Task
35
Experiments
• Other Large Vocabulary Tasks (Cont'd)
From: The paper
36
Content
• Speech Recognition System
• GMM-HMM Model
• Training Deep Neural Networks
• Generative Pretraining
• Experiments
• Discussion
37
Discussion
• Convolutional DNNs for Phone Classification and Recognition
• Although convolutional models along the temporal dimension achieved good classification results on the TIMIT corpus, applying them to phone recognition is not straightforward.
• This is because temporal variations in speech can be partially handled by the dynamic programming procedure in the HMM component and by hidden trajectory models.
38
Discussion
• Speeding Up DNNs at Recognition Time
• The time that a DNN-HMM system requires to recognize 1 s of speech can be reduced from 1.6 s to 210 ms, without decreasing recognition accuracy, by quantizing the weights down to 8 bits on a CPU.
• Alternatively, it can be reduced to 66 ms by using a graphics processing unit (GPU).
39
Discussion
• Alternative Pretraining Methods for DNNs
• It is possible to learn a DNN by starting with a shallow neural net with a single hidden layer. Once this net has been trained discriminatively, a second hidden layer is interposed between the first hidden layer and the softmax output units, and the whole network is again discriminatively trained. This can be continued until the desired number of hidden layers is reached, after which full backpropagation fine-tuning is applied.
40
Discussion
• Alternative Pretraining Methods for DNNs (Cont'd)
• Purely discriminative training of the whole DNN from random initial weights works well, too.
• Various types of autoencoder with one hidden layer can also be used in the layer-by-layer generative pretraining process.
41
Discussion
• Alternative Fine-Tuning Methods for DNNs
• Most DBN-DNN acoustic models are fine-tuned by applying stochastic gradient descent with momentum to small minibatches of training cases.
• More sophisticated optimization methods can be used, but it is not clear that they are worthwhile, since the fine-tuning process is typically stopped early to prevent overfitting.
42
Discussion
• Using DBN-DNNs to Provide Input Features for GMM-HMM Systems
• This class of methods uses neural networks to provide the feature vectors for the training process of the GMM in a GMM-HMM system.
• The most common approach is to train a randomly initialized neural net with a narrow bottleneck middle layer and to use the activations of the bottleneck hidden units as features.
43
Discussion
• Using DNNs to Estimate Articulatory Features for Detection-Based Speech Recognition
• DBN-DNNs are effective for detecting subphonetic speech attributes (also known as phonological or articulatory features).
44
Discussion
• Summary
• Most of the gain comes from using DNNs to exploit information in neighboring frames and from modeling tied context-dependent states.
• There is no reason to believe that the optimal types of hidden units or the optimal network architectures are currently in use, and it is highly likely that both the pretraining and fine-tuning algorithms can be modified to reduce the amount of overfitting and the amount of computation.
45
Thank You!
46
Investigation of Speech Separation as a Front-End for
Noise Robust Speech Recognition
Narayanan, Arun, and DeLiang Wang. "Investigation of speech separation as a front-end for noise robust speech recognition." IEEE/ACM Transactions on Audio, Speech, and Language Processing 22.4 (2014): 826-835.
Presented by Peidong Wang, 04/04/2016
47
Content
• Introduction
• System Description
• Evaluation Results
• Discussion
48
Content
• Introduction
• System Description
• Evaluation Results
• Discussion
49
Introduction
• Background
• Although automatic speech recognition (ASR) systems have become fairly powerful, the inherent variability of speech can still pose challenges.
• Typically, ASR systems that work well in clean conditions suffer a drastic loss of performance in the presence of noise.
50
Introduction
• Feature-Based Methods
• This class of methods focuses on feature extraction or feature normalization.
• Feature-based techniques have the potential to generalize well, but do not always produce the best results.
51
Introduction
• Two Groups of Feature-Based Methods
• When stereo [*] data is unavailable, prior knowledge about speech and/or noise is used, as in spectral-reconstruction-based missing feature methods, direct masking methods, and feature enhancement methods.
• When stereo data is available, feature mapping methods and recurrent neural networks have been used.
[*] By stereo we mean noisy and the corresponding clean signals.
52
Introduction
•Model-BasedMethods• TheASRmodelparametersareadaptedtomatchthedistributionofnoisyorenhancedfeatures.•Model-basedmethodsworkwellwhentheunderlyingassumptionsaremet,buttypicallyinvolvesignificantcomputationaloverhead.• Thebestperformancesareusuallyobtainedbycombiningfeature-basedandmodel-basedmethods.
53
Introduction
• Supervised Classification Based Speech Separation
• Stereo training data is also used by supervised classification based speech separation algorithms.
• Such algorithms typically estimate the ideal binary mask (IBM): a binary mask defined in the time-frequency (T-F) domain that identifies speech-dominant and noise-dominant T-F units.
• The above method can be extended to the ideal ratio mask (IRM), which represents the ratio of speech energy to mixture energy.
54
Content
• Introduction
• System Description
• Evaluation Results
• Discussion
55
System Description
• Block Diagram of the Proposed System
From: The paper
56
System Description
• Addressing Additive Noise and Convolutional Distortion
• The additive noise and the convolutional distortion are dealt with in two separate stages: noise removal followed by channel compensation.
• Noise is removed via T-F masking using the IRM. To compensate for channel mismatch and the errors introduced by masking, a nonlinear mapping function that undoes these distortions is learned.
57
System Description
• Time-Frequency Masking
58
System Description
• Time-Frequency Masking (Cont'd)
• Here the authors perform T-F masking in the mel-frequency domain, unlike some of the other systems that operate in the gammatone feature domain.
• To obtain the mel-spectrogram of a signal, it is first pre-emphasized and transformed to the linear frequency domain using a 320-channel fast Fourier transform (FFT). A 20 ms Hamming window is used. The 161-dimensional spectrogram is then converted to a 26-channel mel-spectrogram.
59
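The framing and FFT steps of this front-end can be sketched as follows. The 0.97 pre-emphasis coefficient, the 16 kHz sampling rate, and the 10 ms hop are common defaults assumed here (the slide only fixes the 20 ms window and 320-point FFT), and the final 26-channel mel conversion is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
sr = 16000                          # assumed sampling rate
x = rng.normal(size=sr)             # 1 s of fake audio

# Pre-emphasis: boost high frequencies before the FFT (0.97 is a common choice).
x = np.append(x[0], x[1:] - 0.97 * x[:-1])

win = 320                           # 20 ms window at 16 kHz
hop = 160                           # 10 ms hop (assumed)
frames = np.stack([x[i:i + win] for i in range(0, len(x) - win + 1, hop)])
frames = frames * np.hamming(win)

# 320-point FFT -> 161-dimensional magnitude spectrogram per frame,
# which would then be mapped to 26 mel channels by a filterbank.
spec = np.abs(np.fft.rfft(frames, n=320))
```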
System Description
• Time-Frequency Masking (Cont'd)
• The authors use DNNs to estimate the IRM, as DNNs show good performance and training using stochastic gradient descent scales well compared to other nonlinear discriminative classifiers.
60
System Description
• Time-Frequency Masking (Cont'd)
• Target Signal
• The ideal ratio mask is defined as the ratio of the clean signal energy to the mixture energy at each time-frequency unit.
• The mathematical expression is shown below.

IRM(t, f) = 10^(SNR(t, f)/10) / (10^(SNR(t, f)/10) + 1)
SNR(t, f) = 10 log10(X(t, f) / N(t, f))
61
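The two formulas above compose as sketched below; since 10^(SNR/10) = X/N, the mask reduces to X / (X + N), the ratio of speech energy to mixture energy (the energy values here are made up):

```python
import numpy as np

def irm(snr_db):
    # IRM = 10^(SNR/10) / (10^(SNR/10) + 1), which equals X / (X + N).
    r = 10.0 ** (snr_db / 10.0)
    return r / (r + 1.0)

X, N = 4.0, 1.0                    # hypothetical speech and noise energies
snr = 10.0 * np.log10(X / N)       # SNR(t, f) in dB
mask = irm(snr)                    # equals 4 / (4 + 1) = 0.8
```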
System Description
• Time-Frequency Masking (Cont'd)
• Target Signal
• Rather than estimating the IRM directly, the authors estimate a transformed version of the SNR.
• The mathematical expression of the sigmoidal transformation is shown below.

d(t, f) = 1 / (1 + exp(−α(SNR(t, f) − β)))
62
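A sketch of the sigmoidal target; the α and β values here are placeholders chosen only to show the shape of the transform, not the values tuned in the paper:

```python
import numpy as np

def snr_target(snr_db, alpha=1.0, beta=0.0):
    # d(t, f) = 1 / (1 + exp(-alpha * (SNR - beta)));
    # alpha and beta are hypothetical placeholder values here.
    return 1.0 / (1.0 + np.exp(-alpha * (snr_db - beta)))

d = snr_target(np.array([-10.0, 0.0, 10.0]))  # monotone in SNR
```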
System Description
• Time-Frequency Masking (Cont'd)
• Target Signal
• During testing, the values output by the DNN are mapped back to their corresponding IRM values.
63
System Description
• Time-Frequency Masking (Cont'd)
• Features
• Feature extraction is performed both at the full-band and the subband level.
• A combination of features is used: 31-dimensional MFCCs, 13-dimensional RASTA-filtered PLPs, and 15-dimensional amplitude modulation spectrogram (AMS) features.
64
System Description
• Time-Frequency Masking (Cont'd)
• Features
• The full-band features are derived by splicing together full-band MFCCs and RASTA-PLPs, along with their delta and acceleration components, and subband AMS features.
• The subband features are derived by splicing together subband MFCCs, RASTA-PLPs, and AMS features. Some auxiliary components are also added.
65
System Description
• Time-Frequency Masking (Cont'd)
• Supervised Learning
• IRM estimation is performed in two stages. In the first stage, multiple DNNs are trained using full-band and subband features. The final estimate is obtained using an MLP that combines the outputs of the full-band and the subband DNNs.
66
System Description
• Time-Frequency Masking (Cont'd)
• Supervised Learning
• The full-band DNNs would be cognizant of the overall spectral shape of the IRM and the information conveyed by the full-band features, whereas the subband DNNs are expected to be more robust to noise occurring at frequencies outside their passband.
67
System Description
• Time-Frequency Masking (Cont'd)
From: The paper
68
System Description
• Feature Mapping
69
System Description
• Feature Mapping (Cont'd)
• Even after T-F masking, channel mismatch can still significantly impact performance.
• This happens for two reasons. Firstly, the algorithm learns to estimate the ratio mask using mixtures of speech and noise recorded using a single microphone. Secondly, because channel mismatch is convolutional, speech and noise, which now includes both background noise and convolutive noise, are clearly not uncorrelated.
70
System Description
• Feature Mapping (Cont'd)
• The goal of feature mapping in this work is to learn the spectro-temporal correlations that exist in speech, in order to undo the distortions introduced by unseen microphones and by the first stage of the algorithm.
71
System Description
• Feature Mapping (Cont'd)
• Target Signal
• The target is the clean log-mel spectrogram (LMS). The "clean" LMS here corresponds to that obtained from the clean signals recorded using a single microphone in a single filter setting.
72
System Description
• Feature Mapping (Cont'd)
• Target Signal
• Instead of using the LMS directly as the target, the authors apply a linear transform to limit the target values to the range [0, 1], so that the sigmoidal transfer function can be used for the output layer of the DNN.
• The mathematical expression is as follows.

X_d(t, f) = (ln(X(t, f)) − min(ln(X(·, f)))) / (max(ln(X(·, f))) − min(ln(X(·, f))))
73
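The per-channel scaling can be sketched as below. The random "spectrogram" is a stand-in for a real clean LMS, and taking the min/max per channel over a single utterance is an assumption here (the paper may use training-set statistics):

```python
import numpy as np

rng = np.random.default_rng(0)
# Fake clean log-mel spectrogram: 100 frames x 26 mel channels.
lms = np.log(rng.random((100, 26)) + 1e-6)

# Per-channel min-max scaling to [0, 1], matching the sigmoid output range
# of the feature-mapping DNN.
lo = lms.min(axis=0)
hi = lms.max(axis=0)
target = (lms - lo) / (hi - lo)
```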
System Description
• Feature Mapping (Cont'd)
• Target Signal
• During testing, the output of the DNN is mapped back to the dynamic range of the utterances in the training set.
74
System Description
• Feature Mapping (Cont'd)
• Features
• The authors use both the noisy and the masked LMS.
• Supervised Learning
• Unlike the DNNs used for IRM estimation, the hidden layers of the DNN for this task use rectified linear units (ReLUs). In addition, the output layer uses sigmoid activations.
75
System Description
• Feature Mapping (Cont'd)
From: The paper
76
System Description
• Acoustic Modeling
77
System Description
• Acoustic Modeling (Cont'd)
• The acoustic models are trained using the Aurora-4 dataset.
• Aurora-4 is a 5000-word closed vocabulary recognition task based on the Wall Street Journal database. The corpus has two training sets, clean and multi-condition, both with 7138 utterances.
78
System Description
• Acoustic Modeling (Cont'd)
• Gaussian Mixture Models
• The HMMs and the GMMs are initially trained using the clean training set. The clean models are then used to initialize the multi-condition models; both clean and multi-condition models have the same structure and differ only in transition and observation probability densities.
79
System Description
• Acoustic Modeling (Cont'd)
• Deep Neural Networks
• The authors first align the clean training set to obtain senone labels at each time frame for all utterances in the training set. DNNs are then trained to predict the posterior probability of senones using either clean features or features extracted from the multi-condition set.
80
System Description
• Diagonal Feature Discriminant Linear Regression
81
System Description
• Diagonal Feature Discriminant Linear Regression (Cont'd)
• dFDLR is a semi-supervised feature adaptation technique.
• The motivation for developing dFDLR is to address the problem of generalization to unseen microphone conditions in the dataset, which is where the DNN-HMM systems perform the worst.
82
System Description
• Diagonal Feature Discriminant Linear Regression (Cont'd)
• To apply dFDLR, we first obtain an initial senone-level labeling for the test utterances using the unadapted models. Features are then transformed to minimize the cross-entropy error in predicting these labels.
• The mathematical expressions are as follows.

Ô_t(f) = w_f · O_t(f) + b_f
min Σ_t E(s_t, D_out(Ô_{t−5} ... Ô_{t+5}))
83
System Description
• Diagonal Feature Discriminant Linear Regression (Cont'd)
• The parameters can easily be learned within the DNN framework by adding a layer between the input layer and the first hidden layer of the original DNN. After initialization, the standard backpropagation algorithm is run for 10 epochs to learn the parameters of the dFDLR model. During backpropagation, the weights of the original hidden layers are kept unchanged and only the parameters of the dFDLR are updated.
84
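The dFDLR transform itself is just a per-dimension (diagonal) affine map, sketched below at its identity initialization; the frame count and feature dimension are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n_feat = 26
O = rng.normal(size=(50, n_feat))  # test-utterance features, 50 frames

# dFDLR: O_hat_t(f) = w_f * O_t(f) + b_f, one scale and bias per feature
# dimension. Initialized to the identity, then w and b are trained by
# backpropagation while the original DNN weights stay fixed.
w = np.ones(n_feat)
b = np.zeros(n_feat)
O_hat = w * O + b                  # broadcasts over frames
```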
Content
• Introduction
• System Description
• Evaluation Results
• Discussion
85
Evaluation Results
From: The paper
86
Evaluation Results
From: The paper
87
Content
• Introduction
• System Description
• Evaluation Results
• Discussion
88
Discussion
• Several interesting observations can be made from the results presented in the previous section.
• Firstly, the results clearly show that the speech separation front-end does a good job of removing noise and handling channel mismatch.
• Secondly, with no channel mismatch, T-F masking alone works well in removing noise.
89
Discussion
• Finally, directly performing feature mapping from noisy features to clean features performs reasonably well, but not as well as the proposed front-end.
90
Thank You!
91