Comparison of Noise Removal and Echo Cancellation for Audio Signals



Project Report

Implementation and Comparison of Noise Removal and Echo Cancellation for Audio Signals

Adersh

Course: SIV864

Indian Institute Of Technology, Delhi

I. PURPOSE

Digital signal processing, the sources of noise, the measurement of information loss, and the enhancement and suppression of signals are all important in studying how the information in a signal is filtered. Speech signals are evaluated and processed in a transformed domain using digital signal processing to reduce noise and to remove undesired speech signals. The transmission medium, compression techniques and noisy environments are the main sources of speech degradation, and the type of noise depends on its source. The purpose of this project is to study some noise removal and echo cancellation techniques and to analyze the results of some basic implementations.

This project report is organized as follows. Section II discusses objective methods for evaluating the improvement in the quality of a speech signal. Section III discusses the measurement and analysis of the noise power spectrum and then describes techniques for removing dominant noise components and cancelling echo in speech signals, including a review of the current literature on speech enhancement. Section IV analyzes experimental results for some known speech enhancement techniques.

II. OBJECTIVE MEASUREMENT OF NOISE

The quality and intelligibility of a speech signal should be measured to quantify how much of the degradation has been reversed [1]. Techniques for measuring the amount of noise present before and after speech processing fall into two categories. First, subjective measurement techniques require the intervention of human listeners. These techniques are standardized as phonetic tests [2], word intelligibility tests and sentence intelligibility tests. Second, objective measurement techniques compare the original and processed signals, and their results are regarded as authentic in comparison with subjective tests.

Objective techniques are further divided into two groups: intrusive and non-intrusive methods. Intrusive methods are used when the original speech signal is clean and the processed signal has passed through a communication channel, compression and decompression cycles, and/or other speech processing techniques. Both signals are divided into short windows of 10 to 30 milliseconds. The signal-to-noise ratio is measured as a local score for each window and as a global score for the complete signal; these are called segmental SNR techniques. Some experiments have compared these scores with subjective tests [3] [4]. The difference between the noisy and processed signal is multiplied by a constant term that is decided based on the clean signal. Non-intrusive methods are used when the original clean signal is not available [5]; the amount of enhancement is computed from the noisy and processed signals alone. These methods are primarily used for live telecasts or for playback of stored audio signals.
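The window-level and signal-level scores described above can be sketched as follows. This is a minimal illustration, not the scoring code used in the cited studies: the function names, the non-overlapping frames, and the clamping range of -10 dB to 35 dB are assumptions made for the example.

```python
import math

def global_snr_db(clean, processed):
    """Global SNR in dB, computed once over the complete signal."""
    signal_energy = sum(c * c for c in clean)
    noise_energy = sum((c - p) ** 2 for c, p in zip(clean, processed))
    if noise_energy == 0.0:
        return float("inf")
    return 10.0 * math.log10(signal_energy / noise_energy)

def segmental_snr_db(clean, processed, frame_len, floor_db=-10.0, ceil_db=35.0):
    """Mean of the per-frame SNR values, each clamped to [floor_db, ceil_db].

    The clamping limits are an assumed convention, not taken from the report.
    """
    scores = []
    n = min(len(clean), len(processed))
    for start in range(0, n - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        p = processed[start:start + frame_len]
        signal_energy = sum(v * v for v in c)
        noise_energy = sum((a - b) ** 2 for a, b in zip(c, p))
        if signal_energy == 0.0:
            continue  # skip silent frames, where the ratio is undefined
        if noise_energy == 0.0:
            snr = ceil_db
        else:
            snr = 10.0 * math.log10(signal_energy / noise_energy)
        scores.append(max(floor_db, min(ceil_db, snr)))
    return sum(scores) / len(scores) if scores else float("nan")

# Example: a 440 Hz tone at an assumed 8 kHz sampling rate, 20 ms frames.
sample_rate = 8000
frame_len = 160
clean = [math.sin(2.0 * math.pi * 440.0 * n / sample_rate) for n in range(1600)]
processed = [0.5 * c for c in clean]  # toy "processing": halve the amplitude
global_score = global_snr_db(clean, processed)
local_score = segmental_snr_db(clean, processed, frame_len)
# Both scores equal 10*log10(4), about 6.02 dB, for this toy input.
```

The two scores coincide here only because the toy distortion is uniform; with real speech, quiet frames pull the segmental score away from the global one.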

Loss of intelligibility in audio is due to distortion in the speech signal, background noise, or both. Yi Hu evaluated various objective quality measurement criteria [6], including segmental SNR, weighted spectral slope (WSS), PESQ, log-likelihood ratio (LLR), Itakura-Saito distance (IS), and cepstrum distance (CEP). The study is extensive and reports the estimated correlation coefficients and standard deviations of the objective measures against overall quality, distortion in the signal, and distortion due to background noise. It concluded that segmental SNR correlated poorly with overall quality and, therefore, should not be used as a performance measure for enhancement algorithms. The study also illustrates that most of the measurement criteria show better results for signal distortion than for background noise. Therefore, the selection of measures should also consider the type of noise to be treated.

Jianfen Ma [7] proposed three measures to account for distortions introduced into the processed speech by enhancement algorithms. These three measures, SNRLOSS, ESC and SNRLESC, are derived from SNR and were tested on consonant and sentence signals.

III. SPEECH ENHANCEMENT TECHNIQUES

In the previous section, degradation of the speech signal and addition of echo were considered as the two broad causes of loss of speech intelligibility. Here, signal processing methods to remove those degradations are described [1]. The speech signal is divided into short overlapping windows; generally, 50% overlap is used. The length of the signal in each window is in the range of 10 to 30 milliseconds. The short-time Fourier transform is applied to each window and subsequent processing is performed on the result.
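The framing step described above can be sketched as follows. This is an illustration under stated assumptions: the 8 kHz sampling rate, the 20 ms window and the names `hamming` and `split_frames` are chosen for the example, and the symmetric Hamming formula is the usual textbook definition rather than something specified in this report.

```python
import math

def hamming(n):
    """Symmetric Hamming window: w[k] = 0.54 - 0.46*cos(2*pi*k/(n-1))."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]

def split_frames(signal, frame_len, overlap=0.5):
    """Split a signal into Hamming-weighted frames with the given overlap.

    Trailing samples that do not fill a whole frame are dropped.
    """
    hop = max(1, int(frame_len * (1.0 - overlap)))
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + k] * window[k] for k in range(frame_len)])
    return frames

sample_rate = 8000                      # Hz, assumed for the example
frame_len = round(0.020 * sample_rate)  # 20 ms window -> 160 samples
signal = [math.sin(2.0 * math.pi * 200.0 * n / sample_rate)
          for n in range(sample_rate)]  # one second of a 200 Hz tone
frames = split_frames(signal, frame_len)  # 50% overlap by default
```

Each entry of `frames` is what a discrete Fourier transform would then be applied to in order to obtain the short-time spectrum of that window.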

$$S(e^{j\omega}) = G(e^{j\omega})\,X(e^{j\omega})$$

The spectral subtraction method is most commonly used to remove background noise [8]. Each window is weighted with Hamming coefficients and a part of the magnitude of the noisy signal is subtracted; the phase is left unaltered. This method leaves residual broadband noise and narrow-band spectral spikes, which are responsible for tonal noise. Some improvements have been suggested that modify the gain function $G(e^{j\omega})$ [9]. Here, SNR-based non-intrusive speech evaluation measures are used to quantify the enhancement.
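A basic magnitude spectral subtraction in the spirit of [8] can be sketched as follows. It is a hedged illustration, not the implementation used in this project: the function `spectral_subtract`, the separate speech-free recording `noise_only`, the 64-sample window and the gain rule G = max(1 - |N|/|X|, 0) are assumptions, and a naive DFT replaces the FFT only to keep the example self-contained.

```python
import cmath
import math
import random

def dft(frame):
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real for t in range(n)]

def hamming(n):
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]

def spectral_subtract(noisy, noise_only, frame_len=64):
    """Subtract an average noise magnitude from each frame; keep the noisy phase."""
    hop = frame_len // 2  # 50% overlap
    window = hamming(frame_len)

    # Average noise magnitude spectrum from a speech-free recording (assumed available).
    noise_starts = range(0, len(noise_only) - frame_len + 1, hop)
    noise_mag = [0.0] * frame_len
    for start in noise_starts:
        spec = dft([noise_only[start + k] * window[k] for k in range(frame_len)])
        for k in range(frame_len):
            noise_mag[k] += abs(spec[k]) / len(noise_starts)

    out = [0.0] * len(noisy)
    norm = [0.0] * len(noisy)
    for start in range(0, len(noisy) - frame_len + 1, hop):
        spec = dft([noisy[start + k] * window[k] for k in range(frame_len)])
        cleaned = []
        for k in range(frame_len):
            mag = abs(spec[k])
            # Real, non-negative gain: magnitude is reduced, phase is unaltered.
            gain = max(1.0 - noise_mag[k] / mag, 0.0) if mag > 0.0 else 0.0
            cleaned.append(gain * spec[k])
        frame = idft(cleaned)
        for k in range(frame_len):
            out[start + k] += frame[k]   # overlap-add
            norm[start + k] += window[k]
    # Undo the analysis window; samples not covered by any frame stay at zero.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]

# Toy input: a tone in white Gaussian noise (seeded so the run is repeatable).
rng = random.Random(0)
clean = [math.sin(2.0 * math.pi * 4.0 * t / 64.0) for t in range(1024)]
noisy = [c + rng.gauss(0.0, 0.3) for c in clean]
noise_only = [rng.gauss(0.0, 0.3) for _ in range(1024)]
enhanced = spectral_subtract(noisy, noise_only)
```

Because the gain is clipped at zero bin by bin, isolated bins survive in otherwise empty regions of the spectrum, which is one way the narrow-band spikes mentioned above arise.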

Recent enhancement algorithms process the signal in both the time and frequency domains to remove background noise [10]. This method addresses high-SNR regions in the time domain while removing degradation in the spectral domain.

Methods based on temporal and spectral processing have also been proposed for echo cancellation [11]. This method uses signal-to-reverberation ratio (SRR) regions in the temporal domain. The spectral processing and temporal processing are performed in sequence. The segmental SRR and log spectral distance are computed as objective measures.

Spectral subtraction based methods are combined with RASTA processing to remove tonal noise along with broadband and additive stationary noise.

A non-stationary noise environment introduces additional complexity. To resolve it, the optimally-modified log-spectral amplitude (OM-LSA) speech estimator and the minima controlled recursive averaging (MCRA) noise estimator are used before applying the spectral gain function [12].

IV. EXPERIMENTS AND RESULTS

Background noise and distortion due to reverberation are commonly occurring degradations in speech. Two experiments are performed to remove these two degradations from mono and stereo speech signals. For removing both echo and background noise, an intrusive objective technique is used.

Echo effects are added to the clean speech, and the FFT magnitude truncation method is used to remove them. Plots of the clean signal and of the enhanced signal after echo removal are shown for the 1st and 2nd channels in Fig-1 and Fig-2.
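One simple way to add an echo to a clean signal is to mix in a delayed, attenuated copy of it. The sketch below assumes that model; the report does not state the delay, the attenuation or the echo model actually used, so `add_echo` and its parameters are hypothetical.

```python
def add_echo(signal, delay_samples, attenuation):
    """Return the signal plus one delayed, attenuated copy of itself.

    Implements y[n] = x[n] + attenuation * x[n - delay_samples]; the output
    has the same length as the input, so the echo tail is truncated.
    """
    out = list(signal)
    for n in range(delay_samples, len(signal)):
        out[n] += attenuation * signal[n - delay_samples]
    return out

def add_echo_stereo(left, right, delay_samples, attenuation):
    """Apply the same echo to each channel of a stereo signal independently."""
    return (add_echo(left, delay_samples, attenuation),
            add_echo(right, delay_samples, attenuation))

# A unit impulse makes the echo easy to see: it reappears 2 samples later
# at half the amplitude.
echoed = add_echo([1.0, 0.0, 0.0, 0.0, 0.0], delay_samples=2, attenuation=0.5)
```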

The spectral subtraction algorithm is used to remove background distortion from a given noisy speech signal [8]. The Hamming window size is 256. A standard MATLAB function is used to generate the Hamming coefficients, which are used to remove background noise in the transformed domain. SNR loss is the intrusive measure used to quantify the enhancement in the processed signal. The variances of the noisy and enhanced signals are computed for SNR loss.

$$\mathrm{SNR}_{\mathrm{noise}} = 10 \log_{10}\left(\frac{\mathrm{variance}(\mathrm{clean})}{\mathrm{variance}(\mathrm{noisy})}\right)$$

Fig. 1. Echo cancellation from 1st channel

Fig. 2. Echo cancellation from 2nd channel

$$\mathrm{SNR}_{\mathrm{enhanced}} = 10 \log_{10}\left(\frac{\mathrm{variance}(\mathrm{clean})}{\mathrm{variance}(\mathrm{enhanced\ clean})}\right)$$
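The two variance ratios can be computed as in the sketch below. The helper names `variance` and `snr_db` are illustrative, and the population variance (dividing by N) is an assumption; for signals of equal length the ratio is the same as with the N-1 normalization.

```python
import math

def variance(x):
    """Population variance of a sequence of samples."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)

def snr_db(clean, other):
    """10*log10(variance(clean) / variance(other)), in dB."""
    return 10.0 * math.log10(variance(clean) / variance(other))

# Toy signals: the "noisy" one has twice the amplitude, so four times the variance.
clean = [1.0, -1.0, 1.0, -1.0]
noisy = [2.0, -2.0, 2.0, -2.0]
snr_noise = snr_db(clean, noisy)  # 10*log10(1/4), about -6.02 dB
```

Calling `snr_db(clean, noisy)` and `snr_db(clean, enhanced)` on the experiment's signals gives the two quantities defined above, which can then be compared.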

Plots of the clean, noisy and enhanced signals are shown in Fig-3.

REFERENCES

[1] C. Labs, Speech Enhancement Tutorial. [Online]. Available: http://www.clear-labs.com/tutorial

[2] R. L. Miller, "Nature of the vocal cord wave," J. Acoust. Soc. Am., vol. 31, no. 6, pp. 667-677, Jun. 1959.

[3] A. Rix, J. Beerends, M. Hollier, and A. Hekstra, "Perceptual evaluation of speech quality (PESQ) - a new method for speech quality assessment of telephone networks and codecs," in Acoustics, Speech, and Signal Processing, 2001. Proceedings. (ICASSP '01). 2001 IEEE International Conference on, vol. 2, 2001, pp. 749-752.

[4] T. Yamada, M. Kumakura, and N. Kitawaki, "Subjective and objective quality assessment of noise reduced speech signals," in Nonlinear Signal and Image Processing, 2005. NSIP 2005. Abstracts. IEEE-Eurasip, May 2005, p. 28.

[5] A. Rix, "Perceptual speech quality assessment - a review," in Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on, vol. 3, May 2004, pp. iii-1056-9.



Fig. 3. Spectral noise removal

[6] Y. Hu and P. C. Loizou, "Evaluation of Objective Quality Measures for Speech Enhancement," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 16, no. 1, pp. 229-238, 2008. [Online]. Available: http://dx.doi.org/10.1109/TASL.2007.911054

[7] J. Ma and P. C. Loizou, "SNR loss: A new objective measure for predicting the intelligibility of noise-suppressed speech," Speech Communication, vol. 53, no. 3, pp. 340-354, 2011.

[8] S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 27, no. 2, pp. 113-120, Apr. 1979.

[9] M. Berouti, R. Schwartz, and J. Makhoul, "Enhancement of speech corrupted by acoustic noise," in Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '79., vol. 4, Apr. 1979, pp. 208-211.

[10] P. Krishnamoorthy and S. R. M. Prasanna, "Enhancement of noisy speech by temporal and spectral processing," Speech Commun., vol. 53, no. 2, pp. 154-174, Feb. 2011. [Online]. Available: http://dx.doi.org/10.1016/j.specom.2010.08.011

[11] P. Krishnamoorthy and S. R. M. Prasanna, "Reverberant speech enhancement by temporal and spectral processing," Trans. Audio, Speech and Lang. Proc., vol. 17, no. 2, pp. 253-266, Feb. 2009. [Online]. Available: http://dx.doi.org/10.1109/TASL.2008.2008039

[12] I. Cohen and B. Berdugo, "Speech enhancement for non-stationary noise environments," Signal Processing, vol. 81, no. 11, pp. 2403-2418, 2001.
