Comparison of Noise Removal and Echo Cancellation for Audio Signals

  • 7/29/2019 Comparison of Noise Removal and Echo Cancellation for Audio Signals

    Project Report

    Implementation and Comparison of Noise Removal and Echo

    Cancellation for Audio Signals

    Adersh [email protected]

    Course: SIV864

    Indian Institute Of Technology, Delhi

    I. PURPOSE

    Digital signal processing, sources of noise, measurement of
    information loss, and enhancement and suppression of signals
    are all important in studying how the information in a signal
    is filtered. Speech signals are evaluated and processed in a
    transformed domain to reduce noise and to remove undesired
    speech components. The transmission medium, compression
    techniques and noisy environments are the main sources of
    speech degradation, and the type of noise depends on its
    source. The purpose of this project is to study some noise
    removal and echo cancellation techniques and to analyze some
    basic implementation results.

    This project report is organized as follows. Section 2
    discusses objective methods to evaluate improvement in the
    quality of a speech signal. Section 3 discusses measurement
    and analysis of the noise power spectrum, then describes
    techniques to remove dominant noise components and to cancel
    echo in speech signals, including a review of the current
    literature on speech enhancement. Section 4 analyzes the
    results of experiments with some known speech enhancement
    techniques.

    II. OBJECTIVE MEASUREMENT OF NOISE

    The quality and intelligibility of a speech signal should be
    measured to quantify how far the degradation has been
    reversed [1]. There are two categories of techniques for
    measuring the amount of noise present before and after speech
    processing. First, subjective measurement techniques require
    the intervention of human listeners. These techniques are
    standardized as phonetic tests [2] and as word intelligibility
    and sentence intelligibility methods. Second, objective
    measurement techniques compare the original and processed
    signals, and their results are considered authentic in
    comparison with subjective tests.

    Objective techniques are further divided into two groups:
    intrusive and non-intrusive methods. Intrusive methods are
    used when the original speech signal is clean and the
    processed signal has gone through a communication channel,
    compression and decompression cycles and/or other speech
    processing techniques. Both signals are divided into short
    windows of 10 to 30 milliseconds. The signal-to-noise ratio is
    measured as a local score for each window and as a global
    score for the complete signal. These are called segmented SNR
    techniques. Some experiments have compared these scores with
    subjective tests [3] [4]. The difference between the noisy and
    processed signal is multiplied by a constant term that is
    decided based on the clean signal. Non-intrusive methods are
    used when the original clean signal is not available [5]. The
    amount of enhancement is computed from the noisy and processed
    signals alone. These methods are primarily used for live
    telecasts or when playing stored audio signals.
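    The local and global scores described above can be sketched as
    follows. This is a minimal illustration, not the implementation
    used in this report: the function names, the 20 ms default
    window, the non-overlapping framing and the small `eps` guard are
    all assumptions.

```python
import numpy as np

def global_snr_db(clean, processed):
    """Global score: SNR in dB over the complete signal."""
    clean = np.asarray(clean, dtype=float)
    noise = clean - np.asarray(processed, dtype=float)
    return float(10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2)))

def segmented_snr_db(clean, processed, fs, frame_ms=20.0, eps=1e-12):
    """Local scores: mean of per-window SNRs (dB) over non-overlapping windows."""
    clean = np.asarray(clean, dtype=float)
    processed = np.asarray(processed, dtype=float)
    frame_len = int(round(fs * frame_ms / 1000.0))  # window length in samples
    n_frames = len(clean) // frame_len              # trailing partial window is dropped
    scores = []
    for i in range(n_frames):
        seg = slice(i * frame_len, (i + 1) * frame_len)
        signal_energy = np.sum(clean[seg] ** 2)
        noise_energy = np.sum((clean[seg] - processed[seg]) ** 2)
        # eps (assumed guard) avoids log of zero in silent or identical windows
        scores.append(10.0 * np.log10((signal_energy + eps) / (noise_energy + eps)))
    return float(np.mean(scores))
```

    Both functions are intrusive in the sense used above: they need
    the clean signal, time-aligned with the processed one.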

    Loss of intelligibility of audio is due to distortion in the
    speech signal, background noise or both. Yi Hu evaluated
    various objective quality measurement criteria [6], including
    segmented SNR, weighted spectral slope (WSS), PESQ,
    log-likelihood ratio (LLR), Itakura-Saito distance (IS) and
    cepstrum distance (CEP). The study is extensive and reports
    estimated correlation coefficients and standard deviations of
    the objective measures against overall quality, signal
    distortion and distortion due to background noise. It
    concluded that the segmented SNR formula correlates poorly
    with overall quality and, therefore, should not be used as a
    performance measure for enhancement algorithms. The study
    also illustrates that most of the measurement criteria show
    better results for signal distortion than for background
    noise. Therefore, the selection of measures should also
    consider the type of noise to be treated.

    Jianfen Ma [7] proposed three measures to account for the
    distortions that enhancement algorithms introduce into the
    processed speech. These three measures, SNRLOSS, ESC and
    SNRLESC, are derived from SNR and were tested on consonant
    and sentence signals.

    III. SPEECH ENHANCEMENT TECHNIQUES

    In the previous section, degradation of the speech signal and
    addition of echo were considered as the two broad causes of
    loss of speech intelligibility. Here, signal processing
    methods to remove those degradations are described [1]. The
    speech signal is divided into small overlapping windows.
    Generally, a 50% overlap is used. The length of the signal in
    each window is in the range of 10 to 30

    milliseconds. The short-time Fourier transform is applied to
    each window and subsequent processing is performed.

    $S(e^{j\omega}) = G(e^{j\omega})\,X(e^{j\omega})$
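    The windowing and gain relation above can be sketched as
    follows, assuming a Hamming window, 50% overlap and overlap-add
    resynthesis. The function name `apply_spectral_gain`, the
    `gain_fn` callback and the window-sum normalization are
    illustrative assumptions, not the code used in this report.

```python
import numpy as np

def apply_spectral_gain(x, gain_fn, frame_len=256):
    """Apply S = G * X to each 50%-overlapping frame, then overlap-add."""
    x = np.asarray(x, dtype=float)
    hop = frame_len // 2                 # 50% overlap
    window = np.hamming(frame_len)
    out = np.zeros(len(x))
    norm = np.zeros(len(x))              # running sum of windows, for normalization
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len] * window
        X = np.fft.rfft(frame)           # spectrum of the windowed frame
        S = gain_fn(X) * X               # gain G applied per frequency bin
        out[start:start + frame_len] += np.fft.irfft(S, n=frame_len)
        norm[start:start + frame_len] += window
    covered = norm > 1e-8                # samples past the last full frame stay zero
    out[covered] /= norm[covered]
    return out
```

    With a gain of one in every bin the function returns the input
    unchanged on all samples covered by a full frame, which is a
    convenient sanity check before plugging in a real gain function.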

    The spectral subtraction method is most commonly used to
    remove background noise [8]. Hamming coefficients are used to
    window the signal, and a part of the magnitude of the noisy
    signal is subtracted. The phase is unaltered. This method
    leaves residual broadband noise and narrow-band spectral
    spikes, which are responsible for tonal noise. Some
    improvements are suggested through modification of the gain
    function $G(e^{j\omega})$ [9]. Here, SNR-based non-intrusive
    speech evaluation measures are used to quantify the
    enhancement.
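    A minimal sketch of magnitude spectral subtraction on a single
    frame spectrum is given below. With the default parameters it
    subtracts a noise magnitude estimate and clips negative results
    to zero, leaving the phase unaltered. The over-subtraction factor
    `alpha` and spectral floor `beta` are hypothetical parameter
    names for a simplified, magnitude-domain variant in the spirit of
    [9]; they are not taken from this report.

```python
import numpy as np

def spectral_subtraction_frame(X, noise_mag, alpha=1.0, beta=0.0):
    """Subtract a noise magnitude estimate from one frame spectrum X.

    X         : complex spectrum of the noisy frame
    noise_mag : estimated noise magnitude per bin (e.g. averaged over
                frames assumed to contain no speech)
    alpha     : over-subtraction factor (assumed name; 1.0 = plain subtraction)
    beta      : spectral floor as a fraction of noise_mag (assumed name)
    """
    X = np.asarray(X)
    noise_mag = np.asarray(noise_mag, dtype=float)
    mag = np.abs(X)
    phase = np.angle(X)                       # phase is left unaltered
    reduced = mag - alpha * noise_mag         # subtract part of the magnitude
    enhanced_mag = np.maximum(reduced, beta * noise_mag)  # never below the floor
    return enhanced_mag * np.exp(1j * phase)
```

    Raising `beta` above zero fills the gaps between isolated
    spectral spikes, which is one way to make the residual tonal
    noise less audible at the cost of leaving more broadband noise.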

    More recent enhancement algorithms process the signal in both
    the time and frequency domains to remove background noise
    [10]. This method addresses high-SNR regions in the time
    domain while removing degradation in the spectral domain.

    Temporal and spectral processing based methods are also
    proposed for echo cancellation [11]. This method uses
    signal-to-reverberation ratio (SRR) regions in the temporal
    domain. The spectral processing and temporal processing are
    performed in sequence. The segmental SRR and log spectral
    distance are computed as objective measures.
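    As an illustration of the second measure, the sketch below
    computes a log spectral distance between two sets of frames,
    using one common definition (the RMS difference of log power
    spectra in dB, averaged over frames). The exact definition used
    in [11] may differ; the function name and the `eps` guard are
    assumptions.

```python
import numpy as np

def log_spectral_distance(ref_frames, test_frames, eps=1e-12):
    """Mean over frames of the RMS log power spectrum difference, in dB.

    ref_frames, test_frames : arrays of shape (n_frames, frame_len)
    """
    P_ref = np.abs(np.fft.rfft(ref_frames, axis=1)) ** 2
    P_test = np.abs(np.fft.rfft(test_frames, axis=1)) ** 2
    # eps (assumed guard) avoids division by zero and log of zero
    diff_db = 10.0 * np.log10((P_ref + eps) / (P_test + eps))
    per_frame = np.sqrt(np.mean(diff_db ** 2, axis=1))
    return float(np.mean(per_frame))
```

    Identical inputs give a distance of zero, and a pure amplitude
    scaling by a factor of 10 gives 20 dB in every bin.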

    Spectral subtraction based methods are combined with RASTA
    processing to remove tonal noise along with broadband and
    additive stationary noise.

    Non-stationary noise environments introduce additional
    complexity. To handle it, the optimally-modified log-spectral
    amplitude (OM-LSA) speech estimator and the minima controlled
    recursive averaging (MCRA) noise estimator are used before
    applying the spectral gain function [12].

    IV. EXPERIMENTS AND RESULTS

    Background noise and distortion due to reverberation are
    commonly encountered degradations in speech. Two experiments
    are performed to remove these two degradations from mono and
    stereo speech signals. For both echo and background noise
    removal, an intrusive objective technique is used.

    Echo effects are added to the clean speech, and the FFT
    magnitude truncation method is used to remove them. Plots of
    the clean and enhanced signals after echo removal for the 1st
    and 2nd channels are shown in Fig. 1 and Fig. 2.
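    The first step of this experiment, adding an echo to clean
    speech, can be sketched as below. The report does not state the
    echo model, so a single delayed and attenuated copy is assumed
    here, and the function name, delay and attenuation values are
    hypothetical. The FFT magnitude truncation step is not sketched
    because its details are not given.

```python
import numpy as np

def add_echo(x, fs, delay_s=0.25, attenuation=0.5):
    """Return x plus one delayed, attenuated copy: y[n] = x[n] + a * x[n - d].

    x may be mono, shape (n,), or multi-channel, shape (n, channels).
    The output is d samples longer so the echo tail is not cut off.
    """
    x = np.asarray(x, dtype=float)
    d = int(round(delay_s * fs))            # delay in samples
    n = x.shape[0]
    y = np.zeros((n + d,) + x.shape[1:])
    y[:n] += x                              # direct path
    y[d:] += attenuation * x                # delayed, attenuated copy
    return y
```

    Because the same delay is applied along the first axis, each
    channel of a stereo signal receives its own echo independently.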

    The spectral subtraction algorithm is used to remove
    background distortion from a given noisy speech signal [8].
    The Hamming window size is 256. A standard MATLAB function is
    used to generate the Hamming coefficients, which are used to
    remove background noise in the transformed domain. SNR loss is
    the intrusive method used to measure the enhancement in the
    processed signal. The variance of the noisy and enhanced
    signals is computed for SNR loss.

    $\mathrm{SNR}_{\mathrm{noise}} = 10 \log_{10}\left(\frac{\mathrm{variance}(\text{clean})}{\mathrm{variance}(\text{noisy})}\right)$

    Fig. 1. Echo cancellation from 1st channel

    Fig. 2. Echo cancellation from 2nd channel

    $\mathrm{SNR}_{\mathrm{enhanced}} = 10 \log_{10}\left(\frac{\mathrm{variance}(\text{clean})}{\mathrm{variance}(\text{enhanced clean})}\right)$
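    The two variance ratios can be computed with one helper, as in
    this minimal sketch. The function name is an assumption, and
    NumPy stands in for the MATLAB code used in the experiment. The
    second argument is whichever signal appears in the denominator
    of the formula.

```python
import numpy as np

def variance_ratio_db(clean, other):
    """10 * log10(variance(clean) / variance(other)), in dB."""
    clean = np.asarray(clean, dtype=float)
    other = np.asarray(other, dtype=float)
    return float(10.0 * np.log10(np.var(clean) / np.var(other)))
```

    For example, a signal whose amplitude is ten times that of the
    clean signal has one hundred times the variance, giving -20 dB.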

    Plots of the clean, noisy and enhanced signals are shown in
    Fig. 3.

    REFERENCES

    [1] C. Labs, Speech Enhancement Tutorial. [Online]. Available: http://www.clear-labs.com/tutorial

    [2] R. L. Miller, "Nature of the vocal cord wave," J. Acoust. Soc. Am., vol. 31, no. 6, pp. 667-677, Jun. 1959.

    [3] A. Rix, J. Beerends, M. Hollier, and A. Hekstra, "Perceptual evaluation of speech quality (PESQ) - a new method for speech quality assessment of telephone networks and codecs," in Acoustics, Speech, and Signal Processing, 2001. Proceedings. (ICASSP '01). 2001 IEEE International Conference on, vol. 2, 2001, pp. 749-752.

    [4] T. Yamada, M. Kumakura, and N. Kitawaki, "Subjective and objective quality assessment of noise reduced speech signals," in Nonlinear Signal and Image Processing, 2005. NSIP 2005. Abstracts. IEEE-Eurasip, May 2005, p. 28.

    [5] A. Rix, "Perceptual speech quality assessment - a review," in Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on, vol. 3, May 2004, pp. iii-1056-9.

    Fig. 3. Spectral noise removal

    [6] Y. Hu and P. C. Loizou, "Evaluation of Objective Quality Measures for Speech Enhancement," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 16, no. 1, pp. 229-238, 2008. [Online]. Available: http://dx.doi.org/10.1109/TASL.2007.911054

    [7] J. Ma and P. C. Loizou, "SNR loss: A new objective measure for predicting the intelligibility of noise-suppressed speech," Speech Communication, vol. 53, no. 3, pp. 340-354, 2011.

    [8] S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 27, no. 2, pp. 113-120, Apr. 1979.

    [9] M. Berouti, R. Schwartz, and J. Makhoul, "Enhancement of speech corrupted by acoustic noise," in Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '79., vol. 4, Apr. 1979, pp. 208-211.

    [10] P. Krishnamoorthy and S. R. M. Prasanna, "Enhancement of noisy speech by temporal and spectral processing," Speech Commun., vol. 53, no. 2, pp. 154-174, Feb. 2011. [Online]. Available: http://dx.doi.org/10.1016/j.specom.2010.08.011

    [11] P. Krishnamoorthy and S. R. M. Prasanna, "Reverberant speech enhancement by temporal and spectral processing," Trans. Audio, Speech and Lang. Proc., vol. 17, no. 2, pp. 253-266, Feb. 2009. [Online]. Available: http://dx.doi.org/10.1109/TASL.2008.2008039

    [12] I. Cohen and B. Berdugo, "Speech enhancement for non-stationary noise environments," Signal Processing, vol. 81, no. 11, pp. 2403-2418, 2001.