28
By Sarita Jondhale 1 preprocessor: “conditions” the speech signal s(n) orm which is more suitable for the analysis ocessor: operate on the ouptut x(m) to give the pro utput that are more suitable for recognition Generalizations of Filter Bank Analyzer ) ( ˆ m x ) ( ˆ n s

By Sarita Jondhale 1 Signal preprocessor: “conditions” the speech signal s(n) to new form which is more suitable for the analysis Postprocessor: operate

Embed Size (px)

DESCRIPTION

By Sarita Jondhale 3 Generalizations of Filter Bank Analyzer Postprocessor Operations Temporal smoothing of sequential filter bank output vectors. Frequency smoothing of individual filter bank output vectors. Normalization of each filter bank output vector Thresholding and/or quantization of the filter-bank outputs vectors Principal components analysis of the filter bank output vector. The purpose of postprocessor is to clean up the output so as to best represent the spectral information in the speech signal. sp

Citation preview

Page 1: By Sarita Jondhale 1 Signal preprocessor: “conditions” the speech signal s(n) to new form which is more suitable for the analysis Postprocessor: operate

By Sarita Jondhale 1

Signal preprocessor: “conditions” the speech signal s(n) to newform which is more suitable for the analysis

Postprocessor: operate on the ouptut x(m) to give the processedoutput that are more suitable for recognition

Generalizations of Filter Bank Analyzer

)(ˆ mx

)(ˆ ns

Page 2: By Sarita Jondhale 1 Signal preprocessor: “conditions” the speech signal s(n) to new form which is more suitable for the analysis Postprocessor: operate

By Sarita Jondhale 2

Generalizations of Filter Bank Analyzer

Preprocessor Operations Signal preemphasis: higher frequencies are increased in

amplitude Noise elimination Signal enhancement (to make the formant peaks more

prominent)

The purpose of pre processor is to make the speech signal as clean as possible

Page 3: By Sarita Jondhale 1 Signal preprocessor: “conditions” the speech signal s(n) to new form which is more suitable for the analysis Postprocessor: operate

By Sarita Jondhale 3

Generalizations of Filter Bank AnalyzerPostprocessor Operations Temporal smoothing of sequential filter bank output vectors. Frequency smoothing of individual filter bank output vectors. Normalization of each filter bank output vector Thresholding and/or quantization of the filter-bank outputs

vectors Principal components analysis of the filter bank output vector.

The purpose of postprocessor is to clean up the output so as to best represent the spectral information in the speech signal.

sp

Page 4: By Sarita Jondhale 1 Signal preprocessor: “conditions” the speech signal s(n) to new form which is more suitable for the analysis Postprocessor: operate

By Sarita Jondhale 4

Spectral Analysis

Two methods:

The Filter Bank spectrum

The Linear Predictive coding (LPC)

Page 5: By Sarita Jondhale 1 Signal preprocessor: “conditions” the speech signal s(n) to new form which is more suitable for the analysis Postprocessor: operate

By Sarita Jondhale 5

Linear Predictive Coding

Estimating the parameters of the current speech sample by using the parameter values from linear combinations of past speech samples.

Page 6: By Sarita Jondhale 1 Signal preprocessor: “conditions” the speech signal s(n) to new form which is more suitable for the analysis Postprocessor: operate

By Sarita Jondhale 8

Linear Predictive Coding (or “LPC”) is a method of predicting a sample of a speech signal based on several previous samples.

We can use the LPC coefficients to separate a speech signal into two parts: the transfer function (which contains the vocal quality) and the excitation (which contains the pitch and the sound).

Page 7: By Sarita Jondhale 1 Signal preprocessor: “conditions” the speech signal s(n) to new form which is more suitable for the analysis Postprocessor: operate

By Sarita Jondhale 9

We can predict that the nth sample in a sequence of speech samples is represented by the weighted sum of the p previous samples:

P

k

k knSas1

)(

Page 8: By Sarita Jondhale 1 Signal preprocessor: “conditions” the speech signal s(n) to new form which is more suitable for the analysis Postprocessor: operate

By Sarita Jondhale 10

The LPC Model

The speech sample at time n can be approximated as a linear combination of the past p speech samples,

.excitation theofgain theisG

and excitation is u(n) where)()()(

frame analysisspeech over theconstant assumed area,......,a,a tscoefficien thewhere

)(.....)2()1()(

1

p21

21

P

ii

p

nGuinsans

pnsansansans

Page 9: By Sarita Jondhale 1 Signal preprocessor: “conditions” the speech signal s(n) to new form which is more suitable for the analysis Postprocessor: operate

By Sarita Jondhale 11

The number of samples (p) is referred to as the “order” of the LPC.

As p approaches infinity, we should be able to predict the nth sample exactly.

However, p is usually on the order of ten to twenty, where it can provide an accurate enough representation with a limited cost of computation.

The weights on the previous samples (ak) are chosen in order to minimize the squared error between the real sample and its predicted value.

Thus, we want the error signal e(n), which is sometimes referred to as the LPC residual, to be as small as possible

Page 10: By Sarita Jondhale 1 Signal preprocessor: “conditions” the speech signal s(n) to new form which is more suitable for the analysis Postprocessor: operate

By Sarita Jondhale 18

Error signal

It is defined as difference between predicted sample and actual signal

S[n] is actual signal S[n] is predicted signal

p

kns1k

k ][a-s[n]s[n]-s[n]e[n]

Page 11: By Sarita Jondhale 1 Signal preprocessor: “conditions” the speech signal s(n) to new form which is more suitable for the analysis Postprocessor: operate

By Sarita Jondhale 25

LPC Analysis Equations

Two methods of defining the range of speech (m) The autocorrelation method The covariance method

Page 12: By Sarita Jondhale 1 Signal preprocessor: “conditions” the speech signal s(n) to new form which is more suitable for the analysis Postprocessor: operate

By Sarita Jondhale 26

The Autocorrelation Method

otherwise 01-N0 ),().(

{)(

mmwmns

msn

The speech signal s(m+n) is multiplied by a finite window, w(m)Which is zero outside the range 0≤m ≤N-1The purpose of window in the above equation is to taper the signal near m=0 and near m=N-1 so as to minimize the errors at section boundaries

Page 13: By Sarita Jondhale 1 Signal preprocessor: “conditions” the speech signal s(n) to new form which is more suitable for the analysis Postprocessor: operate

By Sarita Jondhale 27

The Autocorrelation Method

Upper panel shows the running speech waveform s(m)

Middle panel shows the weighted section of speech

Bottom panel shows the resulting error signal en(m), based on optimum selection of predictor parameters

Page 14: By Sarita Jondhale 1 Signal preprocessor: “conditions” the speech signal s(n) to new form which is more suitable for the analysis Postprocessor: operate

By Sarita Jondhale 30

The Autocorrelation Method

For m<0 , the prediction error i.e en(m)=0 since sn(m)=0 for all m<0

For m>N-1+p there is no prediction error because sn(m)=0 for all m>N-1

Page 15: By Sarita Jondhale 1 Signal preprocessor: “conditions” the speech signal s(n) to new form which is more suitable for the analysis Postprocessor: operate

By Sarita Jondhale 38

Examples of LPC AnalysisThe figure shows the effect of LPC prediction order, p, on the prediction error En, for both voiced and unvoiced speech.•For small values of p (1-4) a sharp decrease in prediction error•As p increases error decreases much more slowly•The prediction error for unvoiced speech for a given value of p, is significantly higher than for voiced speech.•i.e. the unvoiced speech is less linearly predictable than voiced speech.

Page 16: By Sarita Jondhale 1 Signal preprocessor: “conditions” the speech signal s(n) to new form which is more suitable for the analysis Postprocessor: operate

By Sarita Jondhale 40

LPC Processor for Speech Recognition

Page 17: By Sarita Jondhale 1 Signal preprocessor: “conditions” the speech signal s(n) to new form which is more suitable for the analysis Postprocessor: operate

By Sarita Jondhale 42

LPC Processor for Speech Recognition1. Preemphasis: the digital system (First order FIR filter)

used in the preemphasizer is either fixed or slowly adaptive to average transmission conditions, noise background.The output is related to the input by

sp

Page 18: By Sarita Jondhale 1 Signal preprocessor: “conditions” the speech signal s(n) to new form which is more suitable for the analysis Postprocessor: operate

By Sarita Jondhale 43

LPC Processor for Speech Recognition

2. Frame Blocking: In this step the preemphasized speech signal is blocked into frames of N samples, with adjacent frames being separated by M samples.

Page 19: By Sarita Jondhale 1 Signal preprocessor: “conditions” the speech signal s(n) to new form which is more suitable for the analysis Postprocessor: operate

By Sarita Jondhale 44

Typical LPC Analysis Parameters N: number of samples in analysis frame. M: number of samples shift between analysis frames. p: LPC analysis order. Q: dimension of LPC derived cepstral vector. K: number of frames over which cepstral time derivatives are

computed.

Page 20: By Sarita Jondhale 1 Signal preprocessor: “conditions” the speech signal s(n) to new form which is more suitable for the analysis Postprocessor: operate

By Sarita Jondhale 45

LPC Processor for Speech Recognition

• In the above figure M=(1/3)N• First frame consist of first N speech samples.• The second frame begins M samples after the first frame

and overlaps it by N-M samples• Similarly third frame begins 2M samples after the first frame (or M

samples after the first samples) and overlaps it by N-2M samples• This process continues until all the speech is accounted for within

one or more frames.

Page 21: By Sarita Jondhale 1 Signal preprocessor: “conditions” the speech signal s(n) to new form which is more suitable for the analysis Postprocessor: operate

By Sarita Jondhale 46

LPC Processor for Speech Recognition

n.recognitiospeech for analysis LPC practicalany in eintolarabl issituation This 4.

analysis). from omitted isspeech more (as increases M as increasesmagnitude whoescomponent noisy acontain willspecral LPC

lost. totally be willsignalspeech theof some thereforeframes,adjacent between overlap no be will thereN,M If 3.

smooth. quite be willframe toframe from estimates spectral LPC then N,M If 2.

frame. toframe from correlated be willestimates spectral LPC resulting and overlap, framesadjacent then N,M If 1.

Page 22: By Sarita Jondhale 1 Signal preprocessor: “conditions” the speech signal s(n) to new form which is more suitable for the analysis Postprocessor: operate

By Sarita Jondhale 47

LPC Processor for Speech Recognition3. Windowing: the next step in the processing is to window

each individual frame so as to minimize the signal discontinuities at the beginning and end of each frame (same as short time spectrum in frequency domain)

i.e. we use the window to taper the signal to zero at the beginning and end of each frame

10 ,)()()(~signal theis windowingofresult The10 window, theis )(

signalspeech entire e within thframes ofnumber totalL1......,2,1,0 respeech whe of frame theis )( th

Nnnwnn

Nnnw

Llln

ll

l

Page 23: By Sarita Jondhale 1 Signal preprocessor: “conditions” the speech signal s(n) to new form which is more suitable for the analysis Postprocessor: operate

By Sarita Jondhale 48

LPC Processor for Speech Recognition A “typical” window used for the autocorrelation method of

LPC is the Hamming window

10 ,1

2cos46.054.0)(

Nn

Nnnw

Page 24: By Sarita Jondhale 1 Signal preprocessor: “conditions” the speech signal s(n) to new form which is more suitable for the analysis Postprocessor: operate

By Sarita Jondhale 49

LPC Processor for Speech Recognition4. Autocorrelation Analysis: Each frame of windowed signal

is next autocorrelated to give

used) are 16 to8 from p of (values analysis LPC theoforder theis n value,correlatiohighest theis p

p,0,1,2,....m ,)(~)(~)(1

0

mN

nlll mnxnxmr

Page 25: By Sarita Jondhale 1 Signal preprocessor: “conditions” the speech signal s(n) to new form which is more suitable for the analysis Postprocessor: operate

By Sarita Jondhale 50

LPC Processor for Speech Recognition

5. LPC Analysis: the next step is the LPC analysis, which converts each frame of P+1 autocorrelations into an “LPC parameter set”.

The set might be The LPC coefficients The reflection (or PARCOR) coefficients The log area ratio coefficients The cepstral coefficients Or any desired transformations of the above sets

The method for converting from autocorrelation coefficients to an LPC parameter set is known as Durbin’s method

Page 26: By Sarita Jondhale 1 Signal preprocessor: “conditions” the speech signal s(n) to new form which is more suitable for the analysis Postprocessor: operate

By Sarita Jondhale 53

LPC Processor for Speech Recognition The cepstral coefficients are the coefficients of the FT

representation of the log magnitude spectrum. The cepstral coefficients are more robust reliable feature set for

speech recognition than the LPC coefficients, the PARCOR coefficients, or the log area ratio coefficients.

Page 27: By Sarita Jondhale 1 Signal preprocessor: “conditions” the speech signal s(n) to new form which is more suitable for the analysis Postprocessor: operate

By Sarita Jondhale 54

LPC Processor for Speech Recognition7. Parameter Weighting: because of the sensitivity of the low-

order cepstral coefficients to overall spectral slope and sensitivity of the high-order cepstral coefficients to noise, it is necessary to weight the cepstral coefficients by a tapered window to minimize these sensitivities.

Page 28: By Sarita Jondhale 1 Signal preprocessor: “conditions” the speech signal s(n) to new form which is more suitable for the analysis Postprocessor: operate

By Sarita Jondhale 57

8. Temporal Cepstral Derivative: the Cepstral representation of the speech spectrum provides a good representation of the local spectral properties of the signal for the given frame

Better representation can be obtained by including the information about temporal Cepstral derivative.

tat timet coefficien cepstral m theis )(c where

)(),(log

thm t

ettcteS

t m

mjmj