Speech parameterization using the Mel scale, Part II
T. Thrasyvoulou and S. Benton


Speech Cepstrum procedure

Speech → Pre-emphasis / Framing / Windowing, then either:

• FFT spectrum → Mel spectrum → Mel cepstrum (for FFT based Mel cepstrum), or
• LPC → LPC spectrum → Mel spectrum → Mel cepstrum (for LPC based Mel cepstrum)


Pre-emphasis/Framing/Windowing

• Pre-emphasis: $H(z) = 1 + a z^{-1}$ with $a = -0.95$. Used to boost the signal spectrum by approximately 20 dB/decade.
• Framing: 10-20 ms frames.
• Windowing: Hamming window, $w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right)$ for $0 \le n \le N-1$. Reduces the edge effect when taking the FFT and improves spectral estimate accuracy.
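The three front-end steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the 8 kHz sampling rate (inferred from the 0 to 4000 Hz plot axes), the 50% frame overlap, and the function names are assumptions.

```python
import numpy as np

def preemphasize(x, a=-0.95):
    """Apply H(z) = 1 + a*z^-1, i.e. y[n] = x[n] + a*x[n-1]."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    y[1:] += a * x[:-1]
    return y

def frame_signal(x, frame_len, hop):
    """Split x into frames (one per row); leftover tail samples are dropped."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def hamming(n_points):
    """w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)), n = 0..N-1."""
    n = np.arange(n_points)
    return 0.54 - 0.46 * np.cos(2 * np.pi * n / (n_points - 1))

fs = 8000                      # assumed sampling rate (Hz)
frame_len = int(0.020 * fs)    # 20 ms frame -> 160 samples
hop = frame_len // 2           # assumed 50% overlap
x = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)   # 1 s test tone

frames = frame_signal(preemphasize(x), frame_len, hop) * hamming(frame_len)
```

Each row of `frames` is one windowed frame, ready for either the FFT or the LPC branch.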

Speech Cepstrum procedure: for FFT based Mel cepstrum

Speech → Pre-emphasis / Framing / Windowing → FFT spectrum → Mel spectrum → Mel cepstrum

FFT Spectrum

Frame (time domain) → FFT of length N → $|\cdot|^2$ → Frame (frequency domain)
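As a small sketch of the FFT spectrum step, the snippet below takes one windowed frame, computes a length-N FFT, and keeps the squared magnitude of bins 0 to N/2. The sampling rate, FFT size, and test tone are assumptions chosen for illustration.

```python
import numpy as np

def power_spectrum(frame, n_fft):
    """Return |FFT|^2 for bins k = 0..N/2 (the frame is zero-padded to n_fft)."""
    spectrum = np.fft.rfft(frame, n=n_fft)
    return np.abs(spectrum) ** 2

fs = 8000        # assumed sampling rate (Hz)
n_fft = 256      # assumed FFT length N
n = np.arange(160)
frame = np.hamming(160) * np.sin(2 * np.pi * 1000 * n / fs)   # windowed 1 kHz tone

S = power_spectrum(frame, n_fft)
freqs = np.arange(n_fft // 2 + 1) * fs / n_fft   # bin k sits at k*fs/N Hz
peak_hz = freqs[np.argmax(S)]
```

Here `peak_hz` lands on the bin at 1000 Hz, since 1000 Hz falls exactly on bin 32 for these settings.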

Speech Cepstrum procedure: for LPC based Mel cepstrum

Speech → Pre-emphasis / Framing / Windowing → LPC → LPC spectrum → Mel spectrum → Mel cepstrum

LPC Spectrum

Frame (time domain) → LPC → LP coefficients $[a_i]$ → frequency response → $|\cdot|^2$ → LPC spectrum

$$H(z) = \frac{G}{1 + \sum_{i=1}^{M} a_i z^{-i}}$$

• Reduces the spectrum variance
• Eliminates various source information
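One common way to obtain the LP coefficients is the autocorrelation method with the Levinson-Durbin recursion. The slides do not say which estimator was used, so the sketch below is an assumption, as are the model order M = 10, the gain choice G = sqrt(prediction error), and the synthetic test frame.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation method + Levinson-Durbin.

    Returns (a, err) with a[0] = 1 so that A(z) = 1 + sum_i a[i] z^-i,
    and err the final prediction error energy.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    r = np.array([frame[: n - k] @ frame[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i
        k = -(r[i] + a[1:i] @ r[i - 1:0:-1]) / err
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def lpc_power_spectrum(a, gain, n_fft):
    """|H(e^jw)|^2 = G^2 / |A(e^jw)|^2 at bins k = 0..N/2."""
    A = np.fft.rfft(a, n=n_fft)
    return gain ** 2 / np.abs(A) ** 2

fs = 8000                                   # assumed sampling rate (Hz)
n = np.arange(160)
rng = np.random.default_rng(0)
frame = np.hamming(160) * (np.sin(2 * np.pi * 700 * n / fs)
                           + 0.5 * np.sin(2 * np.pi * 1800 * n / fs)
                           + 0.01 * rng.standard_normal(160))

a, err = lpc_coefficients(frame, order=10)  # M = 10 is an assumption
S_lpc = lpc_power_spectrum(a, np.sqrt(err), n_fft=256)
```

Because only M coefficients describe the whole envelope, `S_lpc` is much smoother than the FFT power spectrum of the same frame, which is the variance reduction noted above.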


Mel Spectrum

FFT or LPC spectrum $S(k)$ → Mel filter bank → Mel spectrum

Mel Filter bank

[Figure: Mel scale filter bank. Gain (0 to 1) versus frequency (0 to 4000 Hz).]

Number of Mel spectrum samples = number of filters in the filter bank

Mel Filter bank frequency table

$M_l(k)$, with $l = 0, 1, \dots, L-1$ and $k = 0, 1, \dots, N/2$

• $l = 1$: center frequency 100 Hz (frequency axis up to 4 kHz)
• $l = 18$: center frequency 3031 Hz (frequency axis up to 4 kHz)
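The slides do not give the rule behind the table. One spacing that reproduces both listed values is 10 filters spaced linearly by 100 Hz up to 1 kHz, followed by 10 filters spaced logarithmically at five per octave, with filters numbered from 1. The sketch below uses that spacing purely as an assumption; the function name, sampling rate, and FFT length are also hypothetical.

```python
import numpy as np

def center_frequencies(n_linear=10, n_log=10, step_hz=100.0, steps_per_octave=5):
    """Assumed layout: linear spacing up to n_linear*step_hz, then log spacing."""
    linear = step_hz * np.arange(1, n_linear + 1)
    log = linear[-1] * 2.0 ** (np.arange(1, n_log + 1) / steps_per_octave)
    return np.concatenate((linear, log))

centers = center_frequencies()

# Nearest FFT bin for each center, using f_k = k * fs / N
fs, n_fft = 8000, 256            # assumed sampling rate and FFT length
nearest_bin = np.rint(centers * n_fft / fs).astype(int)

for number, (fc, k) in enumerate(zip(centers, nearest_bin), start=1):
    print(f"filter {number:2d}: center {fc:7.1f} Hz, nearest bin {k}")
```

Under this assumption filter 1 is centered at 100 Hz, filter 18 at about 3031 Hz, and filter 20 at 4 kHz.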

Mel Filter bank

Power spectrum (N/2 points) → one weighted sum $\sum$ per filter (1, 2, ..., 20) → Mel spectrum (L = 20 points), with L < N

• Multiply the power spectrum by each of the triangular Mel weighting filters and sum the result.
• This performs a weighted averaging around each Mel frequency, similar to a weighted Daniell periodogram.

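Mel Filter bank

$$\tilde{S}(l) = \sum_{k=0}^{N/2} S(k)\, M_l(k), \qquad l = 0, 1, \dots, L-1$$

• $\tilde{S}(l)$: Mel spectrum
• $S(k)$: original spectrum, where bin $k$ corresponds to $f_k = k f_s / N$ Hz
• $M_l(k)$: the $l$-th filter from the filter bank
• $L$: total number of triangular Mel weighting filters (20)
• $N/2$: half the FFT size

The Mel spectrum covers the whole range of frequencies, but with only $L$ samples.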
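The weighted sum above is a matrix-vector product once the filters are stored as rows. The following sketch builds triangular filters with unit peak gain from a list of center frequencies and applies them to a spectrum. The center frequencies (100 Hz steps to 1 kHz, then five per octave), the edge handling, the sampling rate, and the FFT length are all assumptions, not values confirmed by the slides.

```python
import numpy as np

def triangular_filterbank(centers_hz, fs, n_fft):
    """Rows are M_l(k) for k = 0..N/2.

    Each triangle peaks at its own center and reaches zero at the centers
    of its two neighbours. The first lower edge is 0 Hz and the last upper
    edge is extrapolated with the final center ratio (both assumptions).
    """
    centers_hz = np.asarray(centers_hz, dtype=float)
    upper = centers_hz[-1] ** 2 / centers_hz[-2]
    edges = np.concatenate(([0.0], centers_hz, [upper]))
    f_k = np.arange(n_fft // 2 + 1) * fs / n_fft      # f_k = k*fs/N
    bank = np.zeros((len(centers_hz), len(f_k)))
    for l in range(len(centers_hz)):
        lo, mid, hi = edges[l], edges[l + 1], edges[l + 2]
        rising = (f_k - lo) / (mid - lo)
        falling = (hi - f_k) / (hi - mid)
        bank[l] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mel_spectrum(S, bank):
    """S~(l) = sum_k S(k) * M_l(k)."""
    return bank @ S

fs, n_fft = 8000, 256                                  # assumed
centers = np.concatenate((100.0 * np.arange(1, 11),
                          1000.0 * 2.0 ** (np.arange(1, 11) / 5)))
bank = triangular_filterbank(centers, fs, n_fft)

S = np.ones(n_fft // 2 + 1)        # flat test spectrum
S_mel = mel_spectrum(S, bank)      # L = 20 values
```

With a flat input, each entry of `S_mel` is simply the area under one triangle, so the wider high-frequency filters produce larger values. Some implementations normalize each filter to unit area to remove that effect.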

Mel spectrum plots

• FFT-based spectrum → Mel filter bank
• LPC-based spectrum → Mel filter bank

Mel spectrum plots

[Figure: four panels versus frequency (0 to 4000 Hz): FFT based spectrum (amplitude), LPC based spectrum (amplitude), Mel-FFT based spectrum (normalized amplitude), Mel-LPC based spectrum (normalized amplitude).]

• The bandwidth of a peak expands because of the loss of resolution in the Mel scale.
• Peaks that are not located exactly at the center Mel frequency of their corresponding filter may move to center on the Mel frequency, e.g. a 110 Hz peak moves to 100 Hz.

Mel spectrum plots

[Figure: four panels versus frequency (0 to 4000 Hz): FFT based spectrum (amplitude), LPC based spectrum (amplitude), Mel-FFT based spectrum (normalized amplitude), Mel-LPC based spectrum (normalized amplitude).]

• Loss of resolution
• Maintains prominent peaks

Mel spectrum plots

[Figure: four panels versus frequency (0 to 4000 Hz): FFT based spectrum (amplitude), LPC based spectrum (amplitude), Mel-FFT based spectrum (normalized amplitude), Mel-LPC based spectrum (normalized amplitude).]

• Small peaks close to large ones are lumped into one main peak


Speech Cepstrum parameterization

Mel spectrum → Log() → DCT → Mel cepstrum

DCT equation:

$$c(i) = \frac{2}{L} \sum_{m=1}^{L} \log\big(\tilde{S}(m)\big) \cos\left(\frac{\pi i}{L}(m - 0.5)\right), \qquad i = 0, 1, \dots, C-1$$

• $C$: number of cepstral coefficients desired
• $L$: total number of triangular Mel weighting filters (20)
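The log and DCT steps can be written directly from the cosine sum. The sketch below is a literal transcription rather than an optimized routine. The 2/L scale factor follows the equation as it reads here (other toolkits scale differently), and the small floor inside the logarithm and the choice of C = 13 are assumptions added for illustration.

```python
import numpy as np

def mel_cepstrum(mel_spec, n_coeffs, floor=1e-12):
    """c(i) = (2/L) * sum_{m=1..L} log(S~(m)) * cos(pi*i/L * (m - 0.5))."""
    mel_spec = np.asarray(mel_spec, dtype=float)
    L = len(mel_spec)
    m = np.arange(1, L + 1)                  # filter index m = 1..L
    i = np.arange(n_coeffs)[:, None]         # coefficient index i = 0..C-1
    basis = np.cos(np.pi * i / L * (m - 0.5))
    log_spec = np.log(np.maximum(mel_spec, floor))   # floor avoids log(0)
    return (2.0 / L) * (basis @ log_spec)

# A flat Mel spectrum has no spectral shape, so only c(0) is non-zero.
flat = np.full(20, np.e)                     # log(e) = 1 in every band
c = mel_cepstrum(flat, n_coeffs=13)          # C = 13 is an assumption
```

Up to the 1/L factor this is the unnormalized type-II DCT, so a library DCT routine could replace the explicit cosine matrix when speed matters.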

Mel Scaled Cepstral Coefficients

[Figure: Mel scaled cepstral coefficients. Normalized amplitude (-1 to 0.5) versus coefficient number (2 to 14), with one curve for FFT based and one for LPC based coefficients.]

Why the DCT?

• The signal is real with mirror symmetry.
• The IFFT requires complex arithmetic.
• The DCT does NOT.
• The DCT implements the same function as the FFT more efficiently by taking advantage of the redundancy in a real signal.
• The DCT is more efficient computationally.
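The link between the two transforms can be checked numerically. In the sketch below a real sequence is mirrored to twice its length and passed through a complex FFT. After a fixed phase correction the result is purely real and equals the cosine sum computed with real arithmetic only. This is one standard way to see the relation, offered as an illustration rather than as the authors' derivation, and the function names are hypothetical.

```python
import numpy as np

def dct_direct(x):
    """Real-arithmetic cosine sum: sum_{m=1..L} x(m) * cos(pi*i/L * (m - 0.5))."""
    L = len(x)
    m = np.arange(1, L + 1)
    i = np.arange(L)[:, None]
    return np.cos(np.pi * i / L * (m - 0.5)) @ x

def dct_via_fft(x):
    """Same quantity from a length-2L complex FFT of the mirrored sequence."""
    L = len(x)
    mirrored = np.concatenate((x, x[::-1]))          # real, mirror-symmetric
    Y = np.fft.fft(mirrored)[:L]
    phase = np.exp(-1j * np.pi * np.arange(L) / (2 * L))
    return 0.5 * phase * Y                           # imaginary part ~ 0

rng = np.random.default_rng(0)
x = rng.standard_normal(20)                          # stand-in for 20 log Mel values

via_fft = dct_via_fft(x)
max_imag = np.max(np.abs(via_fft.imag))              # rounding error only
max_diff = np.max(np.abs(via_fft.real - dct_direct(x)))
```

The FFT route processes 2L complex values to produce L real ones, which is the redundancy the DCT avoids.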

Conclusions

• LPC approximates speech linearly at all frequencies

• LPCC are more robust and reliable than LPC alone

• Mel-scaled LPCC and Mel-scaled FFT-CC are robust and also take into account the psychoacoustic properties of the human auditory system
