Survey of INTERSPEECH 2013 Reporter: Yi-Ting Wang 2013/09/10

Outline

Exemplar-based Individuality-Preserving Voice Conversion for Articulation Disorders in Noisy Environments

Robust Speech Enhancement Techniques for ASR in Non-stationary Noise and Dynamic Environments

NMF-based Temporal Feature Integration for Acoustic Event Classification

Exemplar-based Individuality-Preserving Voice Conversion for Articulation Disorders in Noisy Environments

Ryo AIHARA, Ryoichi TAKASHIMA, Tetsuya TAKIGUCHI, Yasuo ARIKI

Graduate School of System Informatics, Kobe University, Japan

Introduction

This paper presents a noise-robust voice conversion (VC) method for a person with an articulation disorder resulting from athetoid cerebral palsy.

Exemplar-based spectral conversion using NMF is applied to a voice with an articulation disorder in real noisy environments.

NMF is a well-known approach for source separation and speech enhancement.

Poorly articulated noisy speech -> clean articulation

Voice conversion based on NMF
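
The idea of exemplar-based conversion can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes parallel source and target exemplar dictionaries (here called W_src and W_tgt, hypothetical names) with one column per aligned exemplar, estimates non-negative activations for the source spectrogram with the dictionary held fixed, and then applies the same activations to the target dictionary. The optional W_noise dictionary and the use of KL-divergence multiplicative updates are also assumptions.

```python
import numpy as np

def kl_divergence(V, V_hat, eps=1e-12):
    """Generalized KL divergence between two non-negative matrices."""
    return float(np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat))

def estimate_activations(V, W, n_iter=200, eps=1e-12):
    """Find H >= 0 with V ~= W @ H while the dictionary W stays fixed."""
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    col_sums = W.sum(axis=0)[:, None] + eps  # W^T 1, constant over iterations
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / col_sums  # multiplicative KL update for H only
    return H

def convert(V_src, W_src, W_tgt, W_noise=None, n_iter=200):
    """Map a source magnitude spectrogram onto the target dictionary.

    W_src and W_tgt must have the same number of columns (parallel exemplars).
    If W_noise is given, its activations absorb the noise and are discarded.
    """
    W = W_src if W_noise is None else np.hstack([W_src, W_noise])
    H = estimate_activations(V_src, W, n_iter)
    H_speech = H[: W_src.shape[1]]  # keep only the speech exemplar weights
    return W_tgt @ H_speech
```

Because the activations are shared between the two dictionaries, the converted spectrogram inherits the timing of the input while taking its spectral shape from the target exemplars.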

Constructing the individuality-preserving dictionary

Experimental Results

ATR Japanese speech database.

Conclusions

We proposed a noise-robust spectral conversion method based on NMF for a voice with an articulation disorder.

Our VC method can improve the listening intelligibility of words uttered by a person with an articulation disorder in noisy environments.

Robust Speech Enhancement Techniques for ASR in Non-stationary Noise and Dynamic Environments

Gang Liu, Dimitrios Dimitriadis, Enrico Bocchieri

Center for Robust Speech Systems, University of Texas at Dallas

Introduction

In current ASR systems, the presence of competing speakers greatly degrades recognition performance.

Furthermore, speakers are, most often, not standing still while speaking.

We use Time Difference of Arrival (TDOA) estimation, multi-channel Wiener filtering, NMF, multi-condition training, and robust feature extraction.

Proposed cascaded system

The problem of source localization/separation is often addressed by TDOA estimation.
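
The slides do not say which TDOA estimator the system uses. As an illustration only, the sketch below uses GCC-PHAT, a common choice for estimating the delay between two microphone signals; the function name, parameters, and the PHAT weighting are assumptions swapped in here, not details from the paper.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay (in seconds) of `sig` relative to `ref` with GCC-PHAT.

    A positive result means `sig` arrives later than `ref`.
    """
    n = sig.shape[0] + ref.shape[0]          # zero-pad to limit circular wrap-around
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)                   # cross-power spectrum
    R = R / (np.abs(R) + 1e-12)              # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=n)                # generalized cross-correlation
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.roll(cc, max_shift)[: 2 * max_shift + 1]
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs
```

With a microphone pair of known spacing, the estimated delay can then be turned into a direction of arrival, which is what makes TDOA useful for tracking a moving speaker.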

Experiment and results

NMF provides the largest boost, due to the suppression of the non-stationary interfering signals.

Conclusion

We propose a cascaded system for speech recognition dealing with non-stationary noise in reverberated environments.

The proposed system offers average relative improvements of 50% and 45% for the two above-mentioned scenarios.

NMF-based Temporal Feature Integration for Acoustic Event Classification

Jimmy Ludena-Choez, Ascension Gallardo-Antolin

Dep. of Signal Theory and Communications, Universidad Carlos III de Madrid, Avda de la Universidad 30, 28911 – Leganes (Madrid), Spain

Introduction

This paper proposes a new front-end for Acoustic Event Classification (AEC) tasks, based on combining the temporal feature integration technique called Filter Bank Coefficients (FC) with Non-Negative Matrix Factorization (NMF).

FC captures the dynamic structure of the short-time features.

We present an unsupervised method based on NMF for the design of a filter bank more suitable for AEC.

Audio feature extraction
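
A rough sketch of how FC-style temporal integration can work is given below. It assumes that, within one analysis window, the periodogram of each short-time feature trajectory is computed and then summarized by a filter bank matrix U (a hypothetical name) applied along the modulation-frequency axis. In the baseline the rows of U would be fixed band filters; in the NMF-based variant they would presumably come from bases learned on such periodograms. That learning step is not shown, and the exact windowing and normalization are assumptions.

```python
import numpy as np

def periodogram(X):
    """Power spectrum of each feature trajectory.

    X: (n_frames, n_feats) short-time features inside one analysis window.
    Returns an array of shape (n_frames // 2 + 1, n_feats).
    """
    n_frames = X.shape[0]
    spectrum = np.fft.rfft(X, axis=0)
    return (np.abs(spectrum) ** 2) / n_frames

def fc_features(X, U):
    """Summarize the periodogram with a filter bank.

    U: (n_filters, n_bins) non-negative filter bank over modulation frequency.
    Returns an array of shape (n_filters, n_feats).
    """
    P = periodogram(X)
    return U @ P
```

For example, a window of 16 frames yields 9 modulation-frequency bins, so U would need 9 columns; each row of U then produces one coefficient per feature dimension.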

Experiments and results

Here, NMF is computed using the KL divergence as the cost function.
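
For reference, the generalized KL divergence and the standard multiplicative update rules usually associated with it (Lee and Seung) are written out below. The slides do not state the update rule that was actually used, so this form is an assumption; V denotes the non-negative data matrix, W the bases, H the activations, and the products and quotients marked with \odot and \oslash are element-wise.

```latex
% Generalized KL divergence between V and its approximation WH
D_{\mathrm{KL}}(V \,\|\, WH) = \sum_{i,j} \left( V_{ij} \log \frac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij} \right)

% Multiplicative updates (\mathbf{1} is an all-ones matrix of the same size as V)
H \leftarrow H \odot \left( W^{\top} \left( V \oslash WH \right) \right) \oslash \left( W^{\top} \mathbf{1} \right)

W \leftarrow W \odot \left( \left( V \oslash WH \right) H^{\top} \right) \oslash \left( \mathbf{1} H^{\top} \right)
```

Both updates keep W and H non-negative as long as they are initialized with non-negative values, and neither update increases the divergence.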

Conclusions

We have presented a new front-end for AEC based on the combination of FC features and NMF.

NMF is used for the unsupervised learning of the filter bank which captures the most relevant temporal behavior in the short-time features.

Low modulation frequencies are more important than the high ones for distinguishing between different acoustic events.

The experiments have shown that the features obtained with this method achieve significant improvements in the classification performance of an SVM-based AEC system in comparison with the baseline FC parameters.
