
  • Deep Neural Networks for Acoustic Modeling in Speech Recognition

    Hinton, Geoffrey, et al. "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups." Signal Processing Magazine, IEEE 29.6 (2012): 82-97.

    Presented by Peidong Wang, 04/04/2016

    1

  • Content

    Speech Recognition System, GMM-HMM Model, Training Deep Neural Networks, Generative Pretraining, Experiments, Discussion

    2

  • Content

    Speech Recognition System, GMM-HMM Model, Training Deep Neural Networks, Generative Pretraining, Experiments, Discussion

    3

  • Speech Recognition System

    Goal: Converting speech to text.

    A Mathematical Perspective

    $\hat{w} = \arg\max_{w} P(w \mid Y)$, or equivalently

    $\hat{w} = \arg\max_{w} P(Y \mid w)\,P(w)$

    4

  • Content

    Speech Recognition System, GMM-HMM Model, Training Deep Neural Networks, Generative Pretraining, Experiments, Discussion

    5

  • GMM-HMM Model

    GMM and HMM: GMM is short for Gaussian Mixture Model, and HMM is short for Hidden Markov Model.

    Predecessor of DNNs: Before Deep Neural Networks (DNNs), the most commonly used speech recognition systems consisted of GMMs and HMMs.

    6

  • GMM-HMM Model

    HMM: The HMM is used to deal with the temporal variability of speech.

    GMM: The GMM is used to represent the relationship between HMM states and the acoustic input.

    7

  • GMM-HMM Model

    Features: The features are typically represented by concatenating Mel-frequency cepstral coefficients (MFCCs) or perceptual linear predictive coefficients (PLPs) computed from the raw waveform, together with their first- and second-order temporal differences.

    8

  • GMM-HMM Model

    Shortcoming: GMM-HMM models are statistically inefficient at modeling data that lie on or near a nonlinear manifold in the data space, for example, the set of points that lie very close to the surface of a sphere.

    9

  • Content

    Speech Recognition System, GMM-HMM Model, Training Deep Neural Networks, Generative Pretraining, Experiments, Discussion

    10

  • Training Deep Neural Networks

    Deep Neural Network (DNN): A DNN is a feed-forward artificial neural network that has more than one layer of hidden units between its inputs and its outputs. With nonlinear activation functions, a DNN is able to model an arbitrary nonlinear function (projection from inputs to outputs).[*]

    [*] Added by the presenter.

    11

  • Training Deep Neural Networks

    Activation Function of the Output Units: The activation function of the output units is the softmax function. The mathematical expression is as follows.

    $p_j = \frac{\exp(x_j)}{\sum_k \exp(x_k)}$

    12
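
    A minimal numpy sketch of the softmax computation above; subtracting the maximum before exponentiating is a standard numerical-stability trick, not something stated in the slide.

```python
import numpy as np

def softmax(x):
    """Softmax over output activations x: p_j = exp(x_j) / sum_k exp(x_k)."""
    e = np.exp(x - np.max(x))   # subtract the max for numerical stability
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
print(p, p.sum())               # probabilities (~0.66, 0.24, 0.10) that sum to 1
```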

  • Training Deep Neural Networks

    Objective Function: When using the softmax output function, the natural objective function (cost function) C is the cross-entropy between the target probabilities d and the outputs of the softmax, p. The mathematical expression is as follows.

    $C = -\sum_j d_j \log p_j$

    13
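
    A small sketch of the cross-entropy cost between target probabilities d and softmax outputs p, continuing the softmax example above; the variable values are made up for illustration.

```python
import numpy as np

def cross_entropy(d, p):
    """C = -sum_j d_j * log(p_j): d are target probabilities, p are softmax outputs."""
    return -np.sum(d * np.log(p + 1e-12))   # small epsilon guards against log(0)

d = np.array([1.0, 0.0, 0.0])               # one-hot target, e.g. the correct HMM state
p = np.array([0.66, 0.24, 0.10])            # softmax outputs for one frame
print(cross_entropy(d, p))                  # ~0.42, the cost for this frame
```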

  • Training Deep Neural Networks

    Weight Penalties and Early Stopping: To reduce overfitting, large weights can be penalized in proportion to their squared magnitude, or learning can simply be terminated at the point at which performance on a held-out validation set starts getting worse.

    14
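
    A schematic sketch of the two ideas above; the penalty coefficient, the patience value, and the simulated validation losses are illustrative assumptions, not the paper's training recipe.

```python
import numpy as np

def penalized_cost(cross_entropy_cost, weights, l2=1e-4):
    """Add an L2 penalty proportional to the squared magnitude of the weights."""
    return cross_entropy_cost + l2 * sum(np.sum(W ** 2) for W in weights)

# Early stopping: train only while the held-out validation loss keeps improving.
# The list below simulates per-epoch validation losses from a real training loop.
val_losses = [2.10, 1.70, 1.50, 1.45, 1.46, 1.48, 1.50]
best_val, patience, bad_epochs = float("inf"), 2, 0
for epoch, val_loss in enumerate(val_losses):
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            print(f"stopping at epoch {epoch}; best validation loss {best_val}")
            break
```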

  • Training Deep Neural Networks

    Overfitting Reduction: Generally speaking, there are three methods. Weight penalties and early stopping can reduce overfitting, but only by removing much of the modeling power. Very large training sets can reduce overfitting, but only by making training very computationally expensive. The third method is generative pretraining.

    15

  • Content

    Speech Recognition System, GMM-HMM Model, Training Deep Neural Networks, Generative Pretraining, Experiments, Discussion

    16

  • Generative Pretraining

    Purpose: The multiple layers of feature detectors (the result of this step) can be used as a good starting point for a discriminative fine-tuning phase, during which backpropagation through the DNN slightly adjusts the weights and improves performance. In addition, this step can significantly reduce overfitting.

    17

  • Generative Pretraining

    Restricted Boltzmann Machine (RBM): An RBM consists of a layer of stochastic binary visible units that represent binary input data, connected to a layer of stochastic binary hidden (latent) units that learn to model significant non-independencies between the visible units. There are undirected connections between visible and hidden units, but no visible-visible or hidden-hidden connections.

    18

  • Generative Pretraining

    Restricted Boltzmann Machine (RBM) (Cont'd): The framework of an RBM is shown below.

    From: Slides in CSE 5526 Neural Networks

    19

  • Generative Pretraining

    Restricted Boltzmann Machine (RBM) (Cont'd): An RBM uses a single set of parameters, W, to define the joint probability of a vector of values of the observable variables, v, and a vector of values of the latent variables, h, via an energy function, E.

    $p(v,h;W) = \frac{1}{Z} e^{-E(v,h;W)}, \quad Z = \sum_{v',h'} e^{-E(v',h';W)}$

    $E(v,h) = -\sum_{i \in \mathrm{visible}} a_i v_i - \sum_{j \in \mathrm{hidden}} b_j h_j - \sum_{i,j} v_i h_j w_{ij}$

    20
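
    A small numpy sketch of the energy and joint probability above for a toy binary RBM; the tiny sizes and random parameters are assumptions chosen so the partition function Z can be enumerated exactly.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
n_v, n_h = 3, 2                        # toy sizes so Z can be enumerated exactly
W = rng.normal(0, 0.1, (n_v, n_h))     # weights w_ij
a = np.zeros(n_v)                      # visible biases a_i
b = np.zeros(n_h)                      # hidden biases b_j

def energy(v, h):
    """E(v,h) = -sum_i a_i v_i - sum_j b_j h_j - sum_ij v_i h_j w_ij."""
    return -a @ v - b @ h - v @ W @ h

# Partition function: Z = sum over all (v', h') of exp(-E(v', h'))
visibles = [np.array(s) for s in product([0, 1], repeat=n_v)]
hiddens = [np.array(s) for s in product([0, 1], repeat=n_h)]
Z = sum(np.exp(-energy(v, h)) for v in visibles for h in hiddens)

v, h = np.array([1, 0, 1]), np.array([1, 0])
print("p(v,h) =", np.exp(-energy(v, h)) / Z)
```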

  • Generative Pretraining

    Restricted Boltzmann Machine (RBM) (Cont'd): The probability that the network assigns to a visible vector, v, is given by summing over all possible hidden vectors.

    $p(v) = \frac{1}{Z} \sum_h e^{-E(v,h)}$

    The derivative of the log probability of a training set with respect to a weight is surprisingly simple. The angle brackets denote expectations under the corresponding distribution.

    $\frac{1}{N} \sum_{n=1}^{N} \frac{\partial \log p(v_n)}{\partial w_{ij}} = \langle v_i h_j \rangle_{\mathrm{data}} - \langle v_i h_j \rangle_{\mathrm{model}}$

    21

  • Generative Pretraining

    Restricted Boltzmann Machine (RBM) (Cont'd): The learning rule is thus as follows.

    $\Delta w_{ij} = \varepsilon\,(\langle v_i h_j \rangle_{\mathrm{data}} - \langle v_i h_j \rangle_{\mathrm{model}})$

    A better learning procedure is contrastive divergence (CD), which is shown below. The subscript recon denotes a step in CD in which the states of the visible units are assigned 0 or 1 according to the current states of the hidden units.

    $\Delta w_{ij} = \varepsilon\,(\langle v_i h_j \rangle_{\mathrm{data}} - \langle v_i h_j \rangle_{\mathrm{recon}})$

    22
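
    A minimal sketch of one CD-1 update for a binary RBM under assumed variable names; the logistic conditionals p(h=1|v) and p(v=1|h) are the standard choices for binary units, and the learning rate and sizes are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v_data, W, a, b, lr=0.1, rng=np.random.default_rng(0)):
    """One CD-1 step: lr * (<v_i h_j>_data - <v_i h_j>_recon), for a minibatch of binary data."""
    # Positive phase: hidden probabilities given the data
    h_prob = sigmoid(v_data @ W + b)
    h_samp = (rng.random(h_prob.shape) < h_prob).astype(float)
    # Reconstruction: visible units assigned 0/1 from the current hidden states
    v_recon = (rng.random(v_data.shape) < sigmoid(h_samp @ W.T + a)).astype(float)
    h_recon = sigmoid(v_recon @ W + b)
    # Parameter updates, averaged over the minibatch
    n = v_data.shape[0]
    W += lr * (v_data.T @ h_prob - v_recon.T @ h_recon) / n
    a += lr * (v_data - v_recon).mean(axis=0)
    b += lr * (h_prob - h_recon).mean(axis=0)
    return W, a, b

rng = np.random.default_rng(1)
v = (rng.random((64, 100)) > 0.5).astype(float)            # a minibatch of binary "frames"
W, a, b = rng.normal(0, 0.01, (100, 50)), np.zeros(100), np.zeros(50)
W, a, b = cd1_update(v, W, a, b)
```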

  • Generative Pretraining

    Modeling Real-Valued Data: Real-valued data, such as MFCCs, are more naturally modeled by linear variables with Gaussian noise, and the RBM energy function can be modified to accommodate such variables, giving a Gaussian-Bernoulli RBM (GRBM).

    $E(v,h) = \sum_{i \in \mathrm{vis}} \frac{(v_i - a_i)^2}{2\sigma_i^2} - \sum_{j \in \mathrm{hid}} b_j h_j - \sum_{i,j} \frac{v_i}{\sigma_i} h_j w_{ij}$

    23

  • Generative Pretraining

    Stacking RBMs to Make a Deep Belief Network: After training an RBM on the data, the inferred states of the hidden units can be used as data for training another RBM that learns to model the significant dependencies between the hidden units of the first RBM. This can be repeated as many times as desired to produce many layers of nonlinear feature detectors that represent progressively more complex statistical structure in the data.

    24
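
    A schematic sketch of greedy layer-wise stacking: each RBM's inferred hidden activations become the training data for the next RBM. The tiny train_rbm helper below is an assumption for illustration (it uses a mean-field CD-1 step instead of sampling, for brevity), as are the layer sizes and random data.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=5, lr=0.1, rng=np.random.default_rng(0)):
    """Very small mean-field CD-1 trainer; returns the weights and hidden biases of one RBM."""
    n_visible = data.shape[1]
    W = rng.normal(0, 0.01, (n_visible, n_hidden))
    a, b = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        h = sigmoid(data @ W + b)
        v_recon = sigmoid(h @ W.T + a)               # mean-field reconstruction
        h_recon = sigmoid(v_recon @ W + b)
        W += lr * (data.T @ h - v_recon.T @ h_recon) / len(data)
        a += lr * (data - v_recon).mean(axis=0)
        b += lr * (h - h_recon).mean(axis=0)
    return W, b

# Greedy layer-wise stacking: hidden activations of one RBM are the training
# data for the next, producing a stack of nonlinear feature detectors.
data = (np.random.default_rng(1).random((100, 20)) > 0.5).astype(float)
layer_sizes, layers, x = [15, 10, 5], [], data
for n_hidden in layer_sizes:
    W, b = train_rbm(x, n_hidden)
    layers.append((W, b))
    x = sigmoid(x @ W + b)      # inferred hidden states feed the next RBM
```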

  • Generative Pretraining

    Stacking RBMs to Make a Deep Belief Network (Cont'd)

    From: the paper

    25

  • Generative Pretraining

    Interfacing a DNN with an HMM: In an HMM framework, the hidden variables denote the states of the phone sequence, and the visible variables denote the feature vectors.[*]

    [*] Added by the presenter

    From: Gales, Mark, and Steve Young. "The application of hidden Markov models in speech recognition." Foundations and Trends in Signal Processing 1.3 (2008): 195-304.

    26

  • Generative Pretraining

    Interfacing a DNN with an HMM (Cont'd): To compute a Viterbi alignment or to run the forward-backward algorithm within the HMM framework, we require the likelihood p(AcousticInput | HMM state). A DNN, however, outputs probabilities of the form p(HMM state | AcousticInput).

    27

  • Generative Pretraining

    Interfacing a DNN with an HMM (Cont'd): The posterior probabilities that the DNN outputs can be converted into scaled likelihoods by dividing them by the frequencies of the HMM states in the forced alignment that is used for fine-tuning the DNN. Forced alignment is a procedure used to generate labels for the training process.[*]

    [*] Added by the presenter

    28
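
    A small sketch of the posterior-to-scaled-likelihood conversion described above; the posterior values and state counts are hypothetical, standing in for DNN outputs and frequencies counted from the forced alignment.

```python
import numpy as np

# Hypothetical DNN outputs for one frame: p(HMM state | acoustic input)
posteriors = np.array([0.70, 0.20, 0.10])

# State frequencies (priors) counted from the forced alignment used for fine-tuning
state_counts = np.array([5000, 3000, 2000])
priors = state_counts / state_counts.sum()

# Scaled likelihoods p(acoustic input | state) / p(acoustic input), used by the HMM decoder
scaled_likelihoods = posteriors / priors
log_scaled = np.log(scaled_likelihoods)   # decoders usually work in the log domain
print(scaled_likelihoods)
```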

  • Generative Pretraining

    Interfacing a DNN with an HMM (Cont'd): All of the likelihoods produced in this way are scaled by the same unknown factor of p(AcousticInput). Although this appears to have little effect on some recognition tasks, it can be important for tasks where the training labels are highly unbalanced.

    29

  • Content

    Speech Recognition System, GMM-HMM Model, Training Deep Neural Networks, Generative Pretraining, Experiments, Discussion

    30

  • Experiments

    Phonetic Classification and Recognition on TIMIT: The TIMIT dataset is a relatively small dataset that provides a simple and convenient way of testing new approaches to speech recognition.

    31

  • Experiments

    Phonetic Classification and Recognition on TIMIT (Cont'd)

    From: the paper

    32

  • Experiments

    Bing Voice Search Speech Recognition Task: This task used 24 h of training data with a high degree of acoustic variability caused by noise, music, side-speech, accents, sloppy pronunciation, etc. The best DNN-HMM acoustic model achieved a sentence accuracy of 69.6% on the test set, compared with 63.8% for a strong, minimum phone error (MPE)-trained GMM-HMM baseline.

    33

  • Experiments

    Bing Voice Search Speech Recognition Task (Cont'd)

    From: the paper

    34

  • Experiments

    Other Large Vocabulary Tasks: Switchboard Speech Recognition Task (a corpus containing over 300 h of training data), Google Voice Input Speech Recognition Task, YouTube Speech Recognition Task, and English Broadcast News Speech Recognition Task.

    35

  • Experiments

    Other Large Vocabulary Tasks (Cont'd)

    From: the paper

    36

  • Content

    Speech Recognition System, GMM-HMM Model, Training Deep Neural Networks, Generative Pretraining, Experiments, Discussion

    37

  • Discussion

    Convolutional DNNs for Phone Classification and Recognition: Although convolutional models along the temporal dimension achieved good classification results on the TIMIT corpus, applying them to phone recognition is not straightforward. This is because temporal variations in speech can be partially handled by the dynamic programming procedure in the HMM component and by hidden trajectory models.

    38

  • Discussion

    Speeding Up DNNs at Recognition Time: The time that a DNN-HMM system requires to recognize 1 s of speech can be reduced from 1.6 s to 210 ms, without decreasing recognition accuracy, by quantizing the weights down to 8 bits and running on a CPU. Alternatively, it can be reduced to 66 ms by using a graphics processing unit (GPU).

    39
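
    A sketch of simple symmetric 8-bit weight quantization of the kind alluded to above; this is a generic per-matrix scheme assumed for illustration, not necessarily the exact one used in the paper.

```python
import numpy as np

def quantize_int8(W):
    """Map float weights to int8 with a single per-matrix scale factor."""
    scale = np.abs(W).max() / 127.0
    W_q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
    return W_q, scale

def dequantize(W_q, scale):
    return W_q.astype(np.float32) * scale

W = np.random.default_rng(0).normal(0, 0.5, (512, 512)).astype(np.float32)
W_q, scale = quantize_int8(W)
print("max abs error:", np.abs(W - dequantize(W_q, scale)).max())
```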

  • Discussion

    Alternative Pretraining Methods for DNNs: It is possible to learn a DNN by starting with a shallow neural net with a single hidden layer. Once this net has been trained discriminatively, a second hidden layer is interposed between the first hidden layer and the softmax output units, and the whole network is again discriminatively trained. This can be continued until the desired number of hidden layers is reached, after which full backpropagation fine-tuning is applied.

    40

  • Discussion

    Alternative Pretraining Methods for DNNs (Cont'd): Purely discriminative training of the whole DNN from random initial weights works well, too. Various types of autoencoder with one hidden layer can also be used in the layer-by-layer generative pretraining process.

    41

  • Discussion

    Alternative Fine-Tuning Methods for DNNs: Most DBN-DNN acoustic models are fine-tuned by applying stochastic gradient descent with momentum to small minibatches of training cases. More sophisticated optimization methods can be used, but it is not clear that they are worthwhile, since the fine-tuning process is typically stopped early to prevent overfitting.

    42

  • Discussion

    Using DBN-DNNs to Provide Input Features for GMM-HMM Systems: This class of methods uses neural networks to provide the feature vectors for training the GMM in a GMM-HMM system. The most common approach is to train a randomly initialized neural net with a narrow bottleneck middle layer and to use the activations of the bottleneck hidden units as features.

    43

  • Discussion

    Using DNNs to Estimate Articulatory Features for Detection-Based Speech Recognition: DBN-DNNs are effective for detecting subphonetic speech attributes (also known as phonological or articulatory features).

    44

  • Discussion

    Summary: Most of the gain comes from using DNNs to exploit information in neighboring frames and from modeling tied context-dependent states. There is no reason to believe that the optimal types of hidden units or the optimal network architectures are being used yet, and it is highly likely that both the pretraining and fine-tuning algorithms can be modified to reduce the amount of overfitting and the amount of computation.

    45

  • Thank You

    46

  • Investigation of Speech Separation as a Front-End for Noise Robust Speech Recognition

    Narayanan, Arun, and DeLiang Wang. "Investigation of speech separation as a front-end for noise robust speech recognition." Audio, Speech, and Language Processing, IEEE/ACM Transactions on 22.4 (2014): 826-835.

    Presented by Peidong Wang, 04/04/2016

    47

  • Content

    Introduction, System Description, Evaluation Results, Discussion

    48

  • Content

    Introduction, System Description, Evaluation Results, Discussion

    49

  • Introduction

    Background: Although automatic speech recognition (ASR) systems have become fairly powerful, the inherent variability of speech can still pose challenges. Typically, ASR systems that work well in clean conditions suffer a drastic loss of performance in the presence of noise.

    50

  • Introduction

    Feature-Based Methods: This class of methods focuses on feature extraction or feature normalization. Feature-based techniques have the potential to generalize well, but do not always produce the best results.

    51

  • Introduction

    Two Groups of Feature-Based Methods: When stereo[*] data is unavailable, prior knowledge about speech and/or noise is used, as in spectral-reconstruction-based missing-feature methods, direct masking methods, and feature enhancement methods. When stereo data is available, feature mapping methods and recurrent neural networks have been used.

    [*] By stereo we mean noisy and the corresponding clean signals.

    52

  • Introduction

    Model-Based Methods: The ASR model parameters are adapted to match the distribution of noisy or enhanced features. Model-based methods work well when the underlying assumptions are met, but typically involve significant computational overhead. The best performances are usually obtained by combining feature-based and model-based methods.

    53

  • Introduction

    Supervised Classification Based Speech Separation: Stereo training data is also used by supervised classification based speech separation algorithms. Such algorithms typically estimate the ideal binary mask (IBM), a binary mask defined in the time-frequency (T-F) domain that identifies speech-dominant and noise-dominant T-F units. This approach can be extended to the ideal ratio mask (IRM), which represents the ratio of speech to mixture energy.

    54

  • Content

    Introduction, System Description, Evaluation Results, Discussion

    55

  • System Description

    Block Diagram of the Proposed System

    From: the paper

    56

  • System Description

    Addressing Additive Noise and Convolutional Distortion: The additive noise and the convolutional distortion are dealt with in two separate stages: noise removal followed by channel compensation. Noise is removed via T-F masking using the IRM. To compensate for channel mismatch and the errors introduced by masking, we learn a nonlinear mapping function that undoes these distortions.

    57

  • System Description

    Time-Frequency Masking

    58

  • System Description

    Time-Frequency Masking (Cont'd): Here the authors perform T-F masking in the mel-frequency domain, unlike some of the other systems that operate in the gammatone feature domain. To obtain the mel-spectrogram of a signal, it is first pre-emphasized and transformed to the linear frequency domain using a 320-channel fast Fourier transform (FFT); a 20 ms Hamming window is used. The 161-dimensional spectrogram is then converted to a 26-channel mel-spectrogram.

    59
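
    A numpy sketch of the mel-spectrogram pipeline described above (pre-emphasis, 20 ms Hamming windows, 320-point FFT giving 161 bins, 26 mel channels). The 16 kHz sampling rate, 10 ms frame shift, 0.97 pre-emphasis coefficient, and the triangular mel filterbank construction are common defaults assumed here, not values given in the slide.

```python
import numpy as np

SR, N_FFT, N_MELS = 16000, 320, 26
WIN = int(0.020 * SR)           # 20 ms Hamming window = 320 samples
HOP = int(0.010 * SR)           # assumed 10 ms frame shift

def hz_to_mel(f):  return 2595.0 * np.log10(1.0 + f / 700.0)
def mel_to_hz(m):  return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank():
    """26 triangular filters spaced evenly on the mel scale over n_fft//2 + 1 = 161 bins."""
    pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(SR / 2.0), N_MELS + 2))
    bins = np.floor((N_FFT + 1) * pts / SR).astype(int)
    fb = np.zeros((N_MELS, N_FFT // 2 + 1))
    for m in range(1, N_MELS + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, c):  fb[m - 1, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi):  fb[m - 1, k] = (hi - k) / max(hi - c, 1)
    return fb

def mel_spectrogram(x):
    x = np.append(x[0], x[1:] - 0.97 * x[:-1])                  # pre-emphasis
    frames = np.array([x[i:i + WIN] * np.hamming(WIN)
                       for i in range(0, len(x) - WIN + 1, HOP)])
    power = np.abs(np.fft.rfft(frames, n=N_FFT)) ** 2            # (T, 161) spectrogram
    return power @ mel_filterbank().T                            # (T, 26) mel-spectrogram

x = np.random.default_rng(0).normal(size=SR)                     # 1 s of dummy audio
print(mel_spectrogram(x).shape)                                   # (T, 26)
```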

  • System Description

    Time-Frequency Masking (Cont'd): The authors use DNNs to estimate the IRM, as DNNs show good performance and training with stochastic gradient descent scales well compared to other nonlinear discriminative classifiers.

    60

  • System Description

    Time-Frequency Masking (Cont'd): Target Signal: The ideal ratio mask is defined as the ratio of the clean signal energy to the mixture energy at each time-frequency unit. The mathematical expression is shown below.

    $\mathrm{IRM}(t,f) = \frac{10^{\mathrm{SNR}(t,f)/10}}{10^{\mathrm{SNR}(t,f)/10} + 1}, \quad \mathrm{SNR}(t,f) = 10 \log_{10}\!\left(\frac{X(t,f)}{N(t,f)}\right)$

    61

  • System Description

    Time-Frequency Masking (Cont'd): Target Signal: Rather than estimating the IRM directly, the authors estimate a transformed version of the SNR. The mathematical expression of the sigmoidal transformation is shown below, where $\alpha$ and $\beta$ are constants that set the slope and center of the sigmoid.

    $d(t,f) = \frac{1}{1 + \exp\!\big(-\alpha\,(\mathrm{SNR}(t,f) - \beta)\big)}$

    62
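
    A numpy sketch of the IRM and the sigmoidal SNR transform defined above; the values of the transform parameters alpha and beta and the toy energies are assumptions, since the paper's exact constants are not given in these slides.

```python
import numpy as np

def irm(clean_energy, noise_energy):
    """Ideal ratio mask per T-F unit: 10^(SNR/10) / (10^(SNR/10) + 1)."""
    snr = 10.0 * np.log10(clean_energy / noise_energy)
    lin = 10.0 ** (snr / 10.0)
    return lin / (lin + 1.0), snr

def snr_sigmoid(snr, alpha=1.0, beta=0.0):
    """Transformed training target d(t,f) = 1 / (1 + exp(-alpha * (snr - beta)))."""
    return 1.0 / (1.0 + np.exp(-alpha * (snr - beta)))

X = np.array([[4.0, 1.0], [0.5, 2.0]])   # toy clean mel-spectrogram energies
N = np.array([[1.0, 1.0], [2.0, 0.5]])   # toy noise energies
mask, snr = irm(X, N)
print(mask)               # equals X / (X + N): the ratio of speech to mixture energy
print(snr_sigmoid(snr))   # the transformed SNR used as the DNN training target
```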

  • System Description

    Time-Frequency Masking (Cont'd): Target Signal: During testing, the values output by the DNN are mapped back to their corresponding IRM values.

    63

  • System Description

    Time-Frequency Masking (Cont'd): Features: Feature extraction is performed both at the fullband and the subband level. A combination of features is used: 31-dimensional MFCCs, 13-dimensional RASTA-filtered PLPs, and 15-dimensional amplitude modulation spectrogram (AMS) features.

    64

  • System Description

    Time-Frequency Masking (Cont'd): Features: The fullband features are derived by splicing together fullband MFCCs and RASTA-PLPs, along with their delta and acceleration components, and subband AMS features. The subband features are derived by splicing together subband MFCCs, RASTA-PLPs, and AMS features. Some auxiliary components are also added.

    65

  • System Description

    Time-Frequency Masking (Cont'd): Supervised Learning: IRM estimation is performed in two stages. In the first stage, multiple DNNs are trained using fullband and subband features. The final estimate is obtained using an MLP that combines the outputs of the fullband and the subband DNNs.

    66

  • System Description

    Time-Frequency Masking (Cont'd): Supervised Learning: The fullband DNNs are expected to be cognizant of the overall spectral shape of the IRM and the information conveyed by the fullband features, whereas the subband DNNs are expected to be more robust to noise occurring at frequencies outside their passband.

    67

  • System Description

    Time-Frequency Masking (Cont'd)

    From: the paper

    68

  • System Description

    Feature Mapping

    69

  • System Description

    Feature Mapping (Cont'd): Even after T-F masking, channel mismatch can still significantly impact performance. This happens for two reasons. Firstly, the algorithm learns to estimate the ratio mask using mixtures of speech and noise recorded with a single microphone. Secondly, because channel mismatch is convolutional, speech and noise, which now includes both background noise and convolutive noise, are clearly not uncorrelated.

    70

  • System Description

    Feature Mapping (Cont'd): The goal of feature mapping in this work is to learn the spectro-temporal correlations that exist in speech in order to undo the distortions introduced by unseen microphones and by the first stage of the algorithm.

    71

  • System Description

    Feature Mapping (Cont'd): Target Signal: The target is the clean log-mel spectrogram (LMS). The clean LMS here corresponds to that obtained from the clean signals recorded using a single microphone in a single filter setting.

    72

  • System Description

    Feature Mapping (Cont'd): Target Signal: Instead of using the LMS directly as the target, the authors apply a linear transform to limit the target values to the range [0, 1], so that the sigmoidal transfer function can be used for the output layer of the DNN. The mathematical expression is as follows.

    $X_d(t,f) = \frac{\ln(X(t,f)) - \min(\ln(X(\cdot,f)))}{\max(\ln(X(\cdot,f))) - \min(\ln(X(\cdot,f)))}$

    73
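
    A small sketch of the per-frequency-channel rescaling of the log-mel spectrogram described above, with the min and max taken over time for each channel; the toy data and sizes are assumptions for illustration.

```python
import numpy as np

def normalize_lms(X):
    """Map ln(X(t,f)) into [0,1] per frequency channel f, as in the target transform."""
    logX = np.log(X)
    lo = logX.min(axis=0, keepdims=True)    # min over time for each channel
    hi = logX.max(axis=0, keepdims=True)    # max over time for each channel
    return (logX - lo) / (hi - lo)

X = np.random.default_rng(0).uniform(0.1, 10.0, size=(100, 26))  # toy mel energies (T, 26)
Xd = normalize_lms(X)
print(Xd.min(), Xd.max())   # 0.0 and 1.0

# At test time the DNN output is mapped back using per-channel lo/hi from the training set.
```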

  • System Description

    Feature Mapping (Cont'd): Target Signal: During testing, the output of the DNN is mapped back to the dynamic range of the utterances in the training set.

    74

  • System Description

    Feature Mapping (Cont'd): Features: The authors use both the noisy and the masked LMS.

    Supervised Learning: Unlike the DNNs used for IRM estimation, the hidden layers of the DNN for this task use rectified linear units (ReLUs). In addition, the output layer uses sigmoid activations.

    75

  • System Description

    Feature Mapping (Cont'd)

    From: the paper

    76

  • System Description

    Acoustic Modeling

    77

  • System Description

    Acoustic Modeling (Cont'd): The acoustic models are trained using the Aurora-4 dataset. Aurora-4 is a 5000-word closed-vocabulary recognition task based on the Wall Street Journal database. The corpus has two training sets, clean and multi-condition, both with 7138 utterances.

    78

  • System Description

    Acoustic Modeling (Cont'd): Gaussian Mixture Models: The HMMs and the GMMs are initially trained using the clean training set. The clean models are then used to initialize the multi-condition models; both the clean and the multi-condition models have the same structure and differ only in their transition and observation probability densities.

    79

  • System Description

    Acoustic Modeling (Cont'd): Deep Neural Networks: The authors first align the clean training set to obtain senone labels at each time frame for all utterances in the training set. DNNs are then trained to predict the posterior probability of senones using either clean features or features extracted from the multi-condition set.

    80

  • System Description

    Diagonal Feature Discriminant Linear Regression

    81

  • System Description

    Diagonal Feature Discriminant Linear Regression (Cont'd): dFDLR is a semi-supervised feature adaptation technique. The motivation for developing dFDLR is to address the problem of generalization to unseen microphone conditions in our dataset, which is where the DNN-HMM systems perform the worst.

    82

  • System Description

    Diagonal Feature Discriminant Linear Regression (Cont'd): To apply dFDLR, we first obtain an initial senone-level labeling for the test utterances using the unadapted models. The features are then transformed to minimize the cross-entropy error in predicting these labels. The mathematical expressions are as follows.

    $\hat{O}_t(f) = w_f \cdot O_t(f) + b_f$

    $\min \sum_t E\big(s_t,\, D_{\mathrm{out}}(\hat{O}_{t-5}, \ldots, \hat{O}_{t+5})\big)$

    83
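
    A schematic numpy sketch of the dFDLR transform itself: one scale and one bias per feature dimension, initialized to the identity. The class name, feature dimensions, and frame counts are illustrative assumptions; learning the parameters by backpropagating through the frozen DNN is not shown.

```python
import numpy as np

class DFDLR:
    """Diagonal affine feature transform: O_hat_t(f) = w_f * O_t(f) + b_f."""
    def __init__(self, dim):
        self.w = np.ones(dim)    # initialized so the transform starts as the identity
        self.b = np.zeros(dim)

    def __call__(self, O):
        # O: (frames, dim) features; element-wise scale and shift per dimension f
        return O * self.w + self.b

feats = np.random.default_rng(0).normal(size=(300, 429))  # e.g. 11 spliced 39-dim frames
adapt = DFDLR(feats.shape[1])
adapted = adapt(feats)
# adapt.w and adapt.b would then be updated by backpropagating the senone
# cross-entropy through the original (frozen) DNN for a few epochs.
```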

  • System Description

    Diagonal Feature Discriminant Linear Regression (Cont'd): The parameters can easily be learned within the DNN framework by adding a layer between the input layer and the first hidden layer of the original DNN. After initialization, the standard backpropagation algorithm is run for 10 epochs to learn the parameters of the dFDLR model. During backpropagation, the weights of the original hidden layers are kept unchanged and only the parameters of the dFDLR layer are updated.

    84

  • Content

    Introduction, System Description, Evaluation Results, Discussion

    85

  • Evaluation Results

    From: the paper

    86

  • Evaluation Results

    From: the paper

    87

  • Content

    Introduction, System Description, Evaluation Results, Discussion

    88

  • Discussion

    Several interesting observations can be made from the results presented in the previous section. Firstly, the results clearly show that the speech separation front-end does a good job of removing noise and handling channel mismatch. Secondly, with no channel mismatch, T-F masking alone works well in removing noise.

    89

  • Discussion

    Finally, directly performing feature mapping from noisy features to clean features performs reasonably well, but it does not perform as well as the proposed front-end.

    90

  • Thank You

    91