Speech Enhancement Presentation_Group Meeting_09172015

A Tour Through the Wonderful World of Speech Enhancement

Femi Odelowo

Definition
- Speech enhancement is concerned with improving some perceptual aspect of speech that has been degraded by additive noise (Speech Enhancement: Theory and Practice, P. C. Loizou)
- The perceptual aspects are typically the quality and/or intelligibility of the source signal
- Algorithms can be broadly grouped depending on whether there is a single source or multiple sources
  - Single microphone or single channel speech enhancement
  - Microphone array or multichannel speech enhancement
- Focus is single channel enhancement

Signal Model
[equation not recovered]

STFT
[equation not recovered]

Processing Flow/Block Diagram
- The noisy signal is broken into overlapping frames
- Individual frames are processed
- The enhanced speech signal is reassembled using the overlap-add method
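The processing flow above (split into overlapping frames, process each frame, reassemble by overlap-add) can be sketched roughly as follows. This is a minimal illustration, not the presenter's implementation: the function name `stft_enhance`, the `gain_fn` hook, the frame length, the hop size, and the square-root Hann window are all assumptions chosen so that a gain of 1 reconstructs the interior of the signal exactly. The noisy phase is reused unchanged, which is the usual choice when only the magnitude is modified.

```python
import numpy as np

def stft_enhance(noisy, gain_fn, frame_len=512, hop=256):
    """Analysis-modification-synthesis with overlap-add (illustrative sketch).

    gain_fn is a hypothetical hook: it maps a noisy magnitude spectrum
    to a real-valued gain per frequency bin.
    """
    # Square-root periodic Hann window: at 50% overlap the product of the
    # analysis and synthesis windows sums to 1 across neighbouring frames.
    n = np.arange(frame_len)
    window = np.sqrt(0.5 - 0.5 * np.cos(2.0 * np.pi * n / frame_len))

    out = np.zeros(len(noisy))
    for start in range(0, len(noisy) - frame_len + 1, hop):
        # 1. Break the noisy signal into overlapping, windowed frames.
        frame = noisy[start:start + frame_len] * window
        spectrum = np.fft.rfft(frame)
        magnitude, phase = np.abs(spectrum), np.angle(spectrum)

        # 2. Process the frame: scale the magnitude, keep the noisy phase.
        enhanced_mag = gain_fn(magnitude) * magnitude
        enhanced = np.fft.irfft(enhanced_mag * np.exp(1j * phase), n=frame_len)

        # 3. Reassemble with overlap-add.
        out[start:start + frame_len] += enhanced * window
    return out
```

With `gain_fn = lambda mag: np.ones_like(mag)` the output matches the input everywhere except the first and last `hop` samples, which are covered by only one frame.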

[Block diagram labels: STFT, Parameter Estimation, Spectral Modification, Inverse STFT, Phase, Gain Calculation]

Algorithms
- Spectral subtraction
  - Conceptually the simplest to design/implement
  - Based on the assumed additive nature of the noise
- Statistical model-based algorithms
  - Based on a statistical estimation framework
  - Includes the Wiener and several minimum mean-square error (MMSE) algorithms
- Subspace algorithms
  - Based on a linear algebra framework
  - Typically use eigenvalue/eigenvector decomposition or SVD
- Machine learning algorithms
  - The big bad new kid on the block
  - Includes ICA, NMF, and DNN

Problems With Classical Methods
- Algorithms need a good noise and/or SNR estimate
- Mathematical accuracy is not necessarily the best!
- Noise estimation is worse with lower SNR
- Enhanced sound is plagued with a distorted background
  - Referred to as musical noise
- Poor performance in non-stationary noise

Examples using the Wiener Filter
- The Wiener filter seeks to minimize the mean-square error E[e^2(n)]
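Minimizing the mean-square error leads to the familiar frequency-domain Wiener gain G = xi / (1 + xi), where xi is the ratio of the clean-speech PSD to the noise PSD. The sketch below shows that gain next to a power spectral subtraction gain, both driven by a noise PSD estimate. It is an illustration under assumptions, not the configuration used for the examples in this talk: the function names, the crude power-subtraction SNR estimate, and the `floor` parameter are all hypothetical.

```python
import numpy as np

def wiener_gain(noisy_psd, noise_psd, floor=1e-3):
    """Wiener gain G = xi / (1 + xi) with a simple power-subtraction SNR estimate."""
    noisy_psd = np.maximum(noisy_psd, 1e-12)  # guard against division by zero
    # Crude a priori SNR estimate: (|Y|^2 - noise PSD) / noise PSD, clipped at 0.
    xi = np.maximum(noisy_psd / noise_psd - 1.0, 0.0)
    # The gain floor (an assumption here) limits how far a bin is attenuated.
    return np.maximum(xi / (1.0 + xi), floor)

def spectral_subtraction_gain(noisy_psd, noise_psd, floor=1e-3):
    """Power spectral subtraction written as a gain on the noisy magnitude."""
    noisy_psd = np.maximum(noisy_psd, 1e-12)
    # |X|^2 is estimated as |Y|^2 - noise PSD; negative results are clipped.
    ratio = np.maximum(1.0 - noise_psd / noisy_psd, 0.0)
    return np.maximum(np.sqrt(ratio), floor)

noisy_psd = np.array([4.0, 1.0, 0.5])
noise_psd = np.array([1.0, 1.0, 1.0])
print(wiener_gain(noisy_psd, noise_psd))                # [0.75, 0.001, 0.001]
print(spectral_subtraction_gain(noisy_psd, noise_psd))  # approx [0.866, 0.001, 0.001]
```

With this particular SNR estimate the Wiener gain equals the square of the spectral subtraction gain, so it attenuates low-SNR bins more heavily. Both depend entirely on the quality of `noise_psd`, which is the weakness listed under the classical methods above.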

Exhibition Noise, 10dB Signal, Simple VAD

Exhibition Noise, 10dB Signal, IMCRA Algorithm

Exhibition Noise, 10dB Signal, Enhanced Speech
[Figure panels: Noisy Signal; Enhanced Signal, oracle PSDs; Simple VAD: Enhanced Signal at parameter = 0.7, 1, 5; Improved MCRA Noise Estimation: Enhanced Signal at parameter = 0.7, 1, 5. The parameter symbol is missing from the source.]

Restaurant Noise, 10dB Signal, Enhanced Speech
[Figure panels: Noisy Signal; Enhanced Signal, oracle PSDs; Simple VAD: Enhanced Signal at parameter = 0.7, 1, 5; Improved MCRA Noise Estimation: Enhanced Signal at parameter = 0.7, 1, 5. The parameter symbol is missing from the source.]

SNR & Wiener Gain Estimation, Car Noise, 10dB

Wiener Gains, Car Noise, 10dB Signal

Important Statistical Models
- Statistical models are based on a probabilistic model of the DFT components of speech and additive noise
- Signal Model: [equation not recovered]
- Short Time Spectral Amplitude (STSA) estimator
  - Also called the Ephraim-Malah estimator
  - Obtained as: [equation not recovered]
- Log Spectral Amplitude (LSA) estimator
  - Also due to Y. Ephraim and D. Malah
  - Obtained as: [equation not recovered]
- A variant of this algorithm, the optimally-modified LSA (OM-LSA) estimator by I. Cohen, is typically used as a benchmark for the classical algorithms
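The equations for the signal model and the two estimators did not survive in this copy of the slides. For orientation, the forms usually given in the literature for the Ephraim-Malah estimators are shown below. The notation is an assumption and may differ from the original slides: Y_k is the noisy spectral amplitude in bin k, lambda_x and lambda_d are the speech and noise variances, xi_k is the a priori SNR, gamma_k is the a posteriori SNR, and I_0 and I_1 are modified Bessel functions of the first kind.

```latex
% Additive-noise signal model in the DFT domain
Y(\omega_k) = X(\omega_k) + D(\omega_k)

% A priori SNR, a posteriori SNR, and the auxiliary variable v_k
\xi_k = \frac{\lambda_x(k)}{\lambda_d(k)}, \qquad
\gamma_k = \frac{Y_k^2}{\lambda_d(k)}, \qquad
v_k = \frac{\xi_k}{1 + \xi_k}\,\gamma_k

% MMSE short-time spectral amplitude (STSA) estimator
\hat{X}_k = \frac{\sqrt{\pi}}{2}\,\frac{\sqrt{v_k}}{\gamma_k}
  \exp\!\left(-\frac{v_k}{2}\right)
  \left[(1 + v_k)\, I_0\!\left(\frac{v_k}{2}\right)
        + v_k\, I_1\!\left(\frac{v_k}{2}\right)\right] Y_k

% MMSE log-spectral amplitude (LSA) estimator
\hat{X}_k = \frac{\xi_k}{1 + \xi_k}
  \exp\!\left\{\frac{1}{2}\int_{v_k}^{\infty} \frac{e^{-t}}{t}\,dt\right\} Y_k
```

Both estimators reduce to a gain applied to Y_k that depends only on xi_k and gamma_k, so in practice they plug into the same framework as the Wiener gain.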

A Machine Learning Approach
- Can we learn a gain function based on the SNR estimates that performs better than the Wiener gain?
- A generalized additive model (GAM) was fitted to the true Wiener gain using the decision-directed SNR, a posteriori SNR, and noise estimates as covariates
- A GAM is a flexible modeling framework in which a linear predictor depends on either parametric or non-parametric functions of predictor variables
- Results showed improved performance over Wiener filtering.

Performance of the GAM Model

Performance of the GAM Model (contd.)
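Two of the covariates named for the GAM, the a posteriori SNR and the decision-directed a priori SNR, can be computed frame by frame as sketched below. This is a hedged illustration rather than the code behind the results above: the function name, the smoothing constant `alpha = 0.98`, the lower bound `xi_min`, and the choice to start from a zero clean-speech estimate are all assumptions. Fitting the GAM itself is not shown.

```python
import numpy as np

def decision_directed_snr(noisy_power, noise_psd, alpha=0.98, xi_min=1e-3):
    """A posteriori and decision-directed a priori SNR estimates (sketch).

    noisy_power: array of shape (frames, bins) holding |Y|^2.
    noise_psd:   array of shape (bins,) holding the noise PSD estimate.
    Returns (xi, gamma), each of shape (frames, bins).
    """
    gamma = noisy_power / noise_psd                     # a posteriori SNR
    xi = np.empty_like(gamma)
    prev_clean_power = np.zeros(noisy_power.shape[1])   # assumed initial state

    for t in range(noisy_power.shape[0]):
        # Maximum-likelihood term from the current frame, clipped at zero.
        ml_term = np.maximum(gamma[t] - 1.0, 0.0)
        # Weighted mix of the previous frame's clean estimate and the ML term.
        xi[t] = alpha * prev_clean_power / noise_psd + (1.0 - alpha) * ml_term
        xi[t] = np.maximum(xi[t], xi_min)
        # Wiener gain gives the clean-speech power estimate for the next frame.
        gain = xi[t] / (1.0 + xi[t])
        prev_clean_power = (gain ** 2) * noisy_power[t]
    return xi, gamma
```

Stacking `xi`, `gamma`, and the noise estimate column-wise gives a covariate matrix of the kind described on the GAM slide, with the oracle Wiener gain as the regression target.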

Other Machine Learning Approaches
- Independent Component Analysis
- Non-negative Matrix Factorization
- Deep Neural Networks
  - Very recent and have produced the best results
  - Some interesting results from the publication by Yong Xu et al. are at http://home.ustc.edu.cn/~xuyong62/demo/SE_DNN_taslp.html
  - More research is needed on how to obtain the best performance

Other Research Areas
- Speech enhancement based on phase spectrum modification
  - The phase spectrum compensation (PSC) algorithm by K. Wojcicki et al. performed as well as or slightly better than the STSA estimator
  - Research results suggest the analysis window used and sidelobe attenuation levels are important
- Enhancement utilizing both magnitude and phase correction
  - Idea is to gain the best of both worlds
  - Results varied when the PSC and STSA estimator were combined

Questions/Discussion
