7/29/2019 Comparison of Noise Removal and Echo Cancellation for Audio Signals
Project Report
Implementation and Comparison of Noise Removal and Echo
Cancellation for Audio Signals
Adersh [email protected]
Course: SIV864
Indian Institute Of Technology, Delhi
I. PURPOSE
Digital signal processing, the sources of noise, the measurement
of information loss, and the enhancement and suppression of signals
are all important in studying how information is filtered from a signal.
Speech signals are evaluated and processed in a transformed
domain using digital signal processing to reduce noise and to
remove undesired speech signals. The transmission medium,
compression techniques, and noisy environments are the main
sources of speech degradation. The type of noise
depends on its source. The purpose of this project is to
study some noise removal and echo cancellation techniques
and to analyze the results of some basic implementations.
This project report is organized as follows. Section II
discusses objective methods for evaluating improvement in the
quality of a speech signal. Section III discusses measurement and
analysis of the noise power spectrum, then describes techniques
for removing dominant noise components and cancelling
echo in speech signals, including a
review of the current literature on speech enhancement. Section IV
analyzes experimental results for some known speech enhancement
techniques.
II. OBJECTIVE MEASUREMENT OF NOISE
The quality and intelligibility of a speech signal should be
measured to quantify how much of the degradation has been reversed [1].
There are two categories of techniques for measuring the amount of
noise present before and after speech processing. First, subjective
measurement techniques require human listeners. These techniques
are standardized as phonetic tests [2] and as word and sentence
intelligibility methods. Second, objective
measurement techniques compare the original and
processed signals, and their results are considered reliable
in comparison with subjective tests.
Objective techniques are further divided into two groups: intrusive and
non-intrusive methods. Intrusive methods are used when the original
speech signal is clean and the processed signal has gone
through a communication channel, compression and decompression
cycles, and/or other speech processing techniques.
Both signals are divided into short windows of 10 to 30
milliseconds. The signal-to-noise ratio is measured as a
local score for each window and as a global score for the complete signal.
These are called segmental SNR techniques. Some
experiments have compared these scores with subjective
tests [3] [4]. The difference between the noisy and processed
signal is multiplied by a constant term that is chosen based on
the clean signal. Non-intrusive methods are used when the original
clean signal is not available [5]. The amount of enhancement is
computed from the noisy and processed signals alone. These methods are
primarily used for live telecasts or playback of stored audio signals.
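The intrusive scoring described above can be sketched as follows. This is a minimal illustration, not the report's implementation: the function name `segmental_snr`, the non-overlapping frames, and the small `eps` guard are assumptions made for the example. It computes one global SNR over the whole signal and the mean of the per-window SNRs.

```python
import math


def segmental_snr(clean, processed, frame_len, eps=1e-12):
    """Return (global_snr_db, segmental_snr_db) for two equal-length signals.

    Hypothetical helper: frames are non-overlapping and eps only guards
    against log(0); neither choice comes from the report.
    """
    if len(clean) != len(processed):
        raise ValueError("signals must have the same length")
    if len(clean) < frame_len:
        raise ValueError("signal is shorter than one frame")

    def snr_db(ref, test):
        # Energy of the reference over energy of the error signal.
        signal_energy = sum(r * r for r in ref)
        error_energy = sum((r - t) ** 2 for r, t in zip(ref, test))
        return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))

    global_snr = snr_db(clean, processed)

    # Local score per window, then averaged over all complete windows.
    frame_scores = [
        snr_db(clean[start:start + frame_len],
               processed[start:start + frame_len])
        for start in range(0, len(clean) - frame_len + 1, frame_len)
    ]
    return global_snr, sum(frame_scores) / len(frame_scores)


# Usage: a 440 Hz tone sampled at 8 kHz (assumed rate), 20 ms windows.
fs = 8000
clean = [math.sin(2.0 * math.pi * 440.0 * n / fs) for n in range(800)]
processed = [0.9 * x for x in clean]  # 10% amplitude error in every sample
global_db, segmental_db = segmental_snr(clean, processed, frame_len=160)
```

Practical segmental SNR implementations often clamp each window's score to a fixed range before averaging so that silent windows do not dominate; that refinement is left out here.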
Loss of intelligibility in audio is due to distortion in the speech
signal, background noise, or both. Yi Hu and P. C. Loizou evaluated various
objective quality measures [6], including
segmental SNR, weighted spectral slope (WSS), PESQ,
log-likelihood ratio (LLR), Itakura-Saito distance (IS), and
cepstrum distance (CEP). Their extensive study of
these measures reports the estimated correlation
coefficients and standard deviations of the objective measures
against overall quality, signal distortion, and distortion due to
background noise. It concluded that the segmental SNR
formula gave poor results for overall quality and,
therefore, should not be used as a performance measure for
enhancement algorithms. The study shows that most
of the measures give better results
for signal distortion than for background noise.
Therefore, the selection of measures should also consider the type of
noise to be treated.
Jianfen Ma [7] proposed three measures to account for
distortions introduced into the processed speech by
enhancement algorithms. The three measures,
SNRLOSS, ESC, and SNRLESC, are derived from SNR
and were tested on consonant and sentence signals.
III. SPEECH ENHANCEMENT TECHNIQUES
In the previous section, degradation of the speech signal and
addition of echo were considered as two broad causes of loss
of speech intelligibility. Here, they are described in terms
of the signal processing methods that remove them
[1]. The speech signal is divided into short overlapping
windows. Generally, 50% overlap is used.
The length of the signal in each window is in the range of 10 to 30
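milliseconds. The short-time Fourier transform (STFT) is applied to each
window, and subsequent processing is performed on the resulting spectrum:
$$S(e^{j\omega}) = G(e^{j\omega})\,X(e^{j\omega})$$
where $X(e^{j\omega})$ is the spectrum of the noisy window, $G(e^{j\omega})$ is
the gain function, and $S(e^{j\omega})$ is the enhanced spectrum.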
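The windowing step can be sketched as below. This is an illustrative sketch under stated assumptions: the helper names are hypothetical, the window length of 256 is borrowed from the experiments section, and the periodic form of the Hamming window is assumed (MATLAB's `hamming` returns the symmetric form by default). With 50% overlap, the periodic Hamming window sums to the constant 1.08, so overlap-adding the windowed frames recovers the signal up to that scale factor.

```python
import math


def hamming(n_points):
    """Periodic Hamming window: 0.54 - 0.46 * cos(2*pi*n / N)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / n_points)
            for n in range(n_points)]


def split_frames(signal, frame_len, hop):
    """Cut the signal into Hamming-weighted frames advanced by `hop` samples."""
    window = hamming(frame_len)
    return [
        [signal[start + n] * window[n] for n in range(frame_len)]
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def overlap_add(frames, hop):
    """Sum the frames back at their original positions."""
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for index, frame in enumerate(frames):
        for n, value in enumerate(frame):
            out[index * hop + n] += value
    return out


# Usage: 256-sample windows with 50% overlap on a constant test signal.
frame_len, hop = 256, 128
signal = [1.0] * 1024
frames = split_frames(signal, frame_len, hop)
rebuilt = overlap_add(frames, hop)
# Away from the two edges, every sample is covered by exactly two windows.
interior = rebuilt[hop:len(rebuilt) - hop]
```

In a full pipeline, each frame would be transformed, multiplied by the gain function, and inverse-transformed before the overlap-add step.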
Spectral subtraction is the method most commonly used to
remove background noise [8]. Each window is weighted with Hamming
coefficients, and an estimate of the noise magnitude is subtracted
from the magnitude of the noisy signal. The
phase is left unaltered. This method leaves residual broadband noise
and narrow-band spectral spikes; the spikes are responsible for
tonal noise. Some improvements have been suggested that modify
the gain function $G(e^{j\omega})$ [9]. Here, SNR-based
non-intrusive speech evaluation measures are used to quantify the
enhancement.
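A minimal sketch of magnitude spectral subtraction on a single window follows. It is not the report's MATLAB code: the function names are hypothetical, the noise magnitude estimate is assumed to be supplied by the caller (for example, averaged over speech-free windows), and a small recursive radix-2 FFT is included only to keep the example free of external libraries. The gain is G = max(1 - |N|/|X|, floor), which subtracts the noise magnitude and keeps the noisy phase.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(spectrum):
    """Inverse FFT via the conjugate trick."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([s.conjugate() for s in spectrum])]


def spectral_subtract(noisy_frame, noise_magnitude, floor=0.0):
    """Apply S = G * X with G = max(1 - |N| / |X|, floor) in every bin."""
    enhanced = []
    for value, noise_mag in zip(fft(noisy_frame), noise_magnitude):
        magnitude = abs(value)
        if magnitude > 0.0:
            gain = max(1.0 - noise_mag / magnitude, floor)
        else:
            gain = 0.0
        # A real, non-negative gain scales the magnitude and keeps the phase.
        enhanced.append(gain * value)
    return [v.real for v in ifft(enhanced)]


# Usage: a unit-amplitude tone in bin 4 of a 64-point window.
n_fft = 64
tone = [math.cos(2.0 * math.pi * 4 * n / n_fft) for n in range(n_fft)]
# Assumed flat noise estimate of 8 per bin; the tone bins have magnitude 32.
cleaned = spectral_subtract(tone, [8.0] * n_fft)
```

In this example the two tone bins are scaled by 1 - 8/32 = 0.75 and every other bin is zeroed. Raising `floor` above zero keeps a fraction of each bin instead of zeroing it, which is one common way to soften the isolated spectral spikes that cause tonal noise.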
More recent enhancement algorithms process the signal in both
the time and frequency domains to
remove background noise [10]. This method targets high-SNR
regions in the time domain and removes degradation in the
spectral domain.
Methods based on temporal and spectral processing have also been
proposed for echo cancellation [11]. This method uses
signal-to-reverberation ratio (SRR) regions in the temporal domain.
Spectral processing and temporal processing are
performed in sequence. The segmental SRR and log spectral
distance are computed as objective measures.
Spectral subtraction based methods are combined with
RASTA processing to remove tonal noise along with
broadband and additive stationary noise.
Non-stationary noise environments introduce additional
complexity. To handle it, the optimally-modified
log-spectral amplitude (OM-LSA) speech estimator and the minima
controlled recursive averaging (MCRA) noise estimator are
used before applying the spectral gain function [12].
IV. EXPERIMENTS AND RESULTS
Background noise and distortion due to reverberation
are common degradations in speech. Two experiments
were performed to remove these two degradations from
mono and stereo speech signals. For both echo and
background noise removal, an intrusive objective technique is used.
Echo effects are added to the clean speech, and the FFT
magnitude truncation method is used to remove them.
The clean and enhanced signals after echo removal
for the 1st and 2nd channels are plotted in Fig. 1 and Fig. 2.
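The report does not specify how the echo was added. As a rough sketch of the setup only, the example below assumes the simplest possible model, a single delayed and attenuated copy of the signal, y[n] = x[n] + a x[n - d]. The inverse shown here is a plain recursive filter with known delay and gain; it is a stand-in for illustration and is not the FFT magnitude truncation method used in the experiment. The function names and parameter values are hypothetical.

```python
def add_echo(signal, delay, gain):
    """Single-echo model: y[n] = x[n] + gain * x[n - delay]."""
    return [
        x + (gain * signal[n - delay] if n >= delay else 0.0)
        for n, x in enumerate(signal)
    ]


def remove_echo(echoed, delay, gain):
    """Invert add_echo when delay and gain are known:
    x[n] = y[n] - gain * x[n - delay]."""
    clean = []
    for n, y in enumerate(echoed):
        clean.append(y - (gain * clean[n - delay] if n >= delay else 0.0))
    return clean


# Usage: for a stereo signal the same steps are applied to each channel.
channel = [0.0, 1.0, 0.5, -0.25, 0.75, -1.0, 0.125, 0.0]
echoed = add_echo(channel, delay=3, gain=0.5)
recovered = remove_echo(echoed, delay=3, gain=0.5)
```

Because the clean channel is available in this setup, the recovered channel can be scored against it with an intrusive measure.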
The spectral subtraction algorithm is used to remove background
distortion from a given noisy speech signal [8]. The Hamming
window size is 256. A standard MATLAB function is used to
generate the Hamming coefficients, which
are used to remove background noise in the transformed domain.
SNR loss is the intrusive measure of the enhancement
in the processed signal. The variances of the noisy and enhanced signals are
computed for SNR loss.
$$\mathrm{SNR}_{\mathrm{noise}} = 10 \log_{10}\left(\frac{\mathrm{variance}(\mathrm{clean})}{\mathrm{variance}(\mathrm{noisy})}\right)$$
Fig. 1. Echo cancellation from 1st channel
Fig. 2. Echo cancellation from 2nd channel
$$\mathrm{SNR}_{\mathrm{enhanced}} = 10 \log_{10}\left(\frac{\mathrm{variance}(\mathrm{clean})}{\mathrm{variance}(\text{enhanced clean})}\right)$$
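The variance ratio in the two SNR formulas can be computed as in the sketch below. The function name `variance_snr_db` is hypothetical and the population variance is an assumed choice; the report does not say which variance estimator was used. The second argument is whichever signal the formula places in the denominator.

```python
import math
from statistics import pvariance


def variance_snr_db(clean, other):
    """Return 10 * log10(variance(clean) / variance(other)) in dB."""
    return 10.0 * math.log10(pvariance(clean) / pvariance(other))


# Usage with synthetic data: a tone and a copy with twice the amplitude.
clean = [math.sin(2.0 * math.pi * n / 32.0) for n in range(256)]
doubled = [2.0 * x for x in clean]
ratio_db = variance_snr_db(clean, doubled)
```

Doubling the amplitude quadruples the variance, so the ratio in this example is 10 log10(1/4), about -6.02 dB.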
The clean, noisy, and enhanced signals are plotted in
Fig. 3.
REFERENCES
[1] C. Labs, "Speech Enhancement Tutorial." [Online]. Available: http://www.clear-labs.com/tutorial
[2] R. L. Miller, "Nature of the vocal cord wave," J Acoust Soc Am, vol. 31, no. 6, pp. 667-677, Jun. 1959.
[3] A. Rix, J. Beerends, M. Hollier, and A. Hekstra, "Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs," in Acoustics, Speech, and Signal Processing, 2001. Proceedings. (ICASSP 01). 2001 IEEE International Conference on, vol. 2, 2001, pp. 749-752 vol.2.
[4] T. Yamada, M. Kumakura, and N. Kitawaki, "Subjective and objective quality assessment of noise reduced speech signals," in Nonlinear Signal and Image Processing, 2005. NSIP 2005. Abstracts. IEEE-Eurasip, May 2005, p. 28.
[5] A. Rix, "Perceptual speech quality assessment - a review," in Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP 04). IEEE International Conference on, vol. 3, May 2004, pp. iii-1056-9 vol.3.
Fig. 3. Spectral noise removal
[6] Y. Hu and P. C. Loizou, "Evaluation of Objective Quality Measures for Speech Enhancement," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 16, no. 1, pp. 229-238, 2008. [Online]. Available: http://dx.doi.org/10.1109/TASL.2007.911054
[7] J. Ma and P. C. Loizou, "SNR loss: A new objective measure for predicting the intelligibility of noise-suppressed speech," Speech Communication, vol. 53, no. 3, pp. 340-354, 2011.
[8] S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 27, no. 2, pp. 113-120, Apr. 1979.
[9] M. Berouti, R. Schwartz, and J. Makhoul, "Enhancement of speech corrupted by acoustic noise," in Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP 79., vol. 4, Apr. 1979, pp. 208-211.
[10] P. Krishnamoorthy and S. R. M. Prasanna, "Enhancement of noisy speech by temporal and spectral processing," Speech Commun., vol. 53, no. 2, pp. 154-174, Feb. 2011. [Online]. Available: http://dx.doi.org/10.1016/j.specom.2010.08.011
[11] P. Krishnamoorthy and S. R. M. Prasanna, "Reverberant speech enhancement by temporal and spectral processing," Trans. Audio, Speech and Lang. Proc., vol. 17, no. 2, pp. 253-266, Feb. 2009. [Online]. Available: http://dx.doi.org/10.1109/TASL.2008.2008039
[12] I. Cohen and B. Berdugo, "Speech enhancement for non-stationary noise environments," Signal Processing, vol. 81, no. 11, pp. 2403-2418, 2001.