EE513 Audio Signals and Systems
LPC Analysis and Speech
Kevin D. Donohue
Electrical and Computer Engineering, University of Kentucky


Speech Generation
Speech can be divided into fundamental building blocks of sounds referred to as phonemes. All sounds result from turbulence through obstructed air flow.

The vocal cords create quasi-periodic obstructions of air flow as a sound source at the base of the vocal tract. Phonemes produced by this vocal cord vibration are referred to as voiced speech.

Single-shot bursts or continuous turbulence from obstructed air flow through the vocal tract are primarily generated by the teeth, tongue, and lips. Phonemes associated with non-periodic obstructed air flow are referred to as unvoiced speech.

Taken from http://www.kt.tu-cottbus.de/speech-analysis/


Speech Production Models
The general speech model:

Sources can be modeled as quasi-periodic impulse trains (voiced) or random sequences of impulses (unvoiced). The vocal tract filter can be modeled as an all-pole filter related to the tract resonances. The radiator can be modeled as a simple gain with spatial direction (possibly with some filtering).

[Figure: the general speech model. Voiced speech (quasi-periodic pulsed air) or unvoiced speech (air burst or continuous flow) drives the vocal tract filter, followed by the vocal radiator.]


Vocal Tract Resonances
Vocal tract length is related to the signal wavelength ($\lambda$), which can be obtained from resonant frequencies ($f$) estimated from recorded speech sounds and the speed of sound ($c$) using the equation:

$$\lambda = \frac{c}{f}$$

[Figure: First 3 resonances of a tube with 1 closed end; at these resonances the tube length equals 1/4, 3/4, and 5/4 of a wavelength. Image adapted from hyperphysics.phy-astr.gsu.edu]
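As a rough worked example, the quarter-wave relation in the figure gives $f_k = (2k-1)c/(4L)$ for the $k$th resonance of a tube of length $L$ closed at one end, so each formant yields a length estimate $L = (2k-1)c/(4f_k)$. The Python sketch below (the function name is hypothetical) averages those per-formant estimates. It assumes $c = 345$ m/s, the value used in the example later in these notes, and the formant values in the usage line are illustrative, not measured.

```python
# Sketch only: ideal tube closed at one end, c = 345 m/s assumed.
def tube_length_from_formants(formants_hz, c=345.0):
    """Estimate the length (in meters) of a tube closed at one end.

    The k-th resonance (k = 1, 2, 3, ...) occurs when the tube length is
    (2k - 1)/4 of a wavelength, so L = (2k - 1) * c / (4 * f_k).
    The per-formant estimates are averaged.
    """
    estimates = [(2 * k - 1) * c / (4.0 * f)
                 for k, f in enumerate(formants_hz, start=1)]
    return sum(estimates) / len(estimates)


# Illustrative formants of an ideal 17.25 cm tube (not measured data)
print(round(tube_length_from_formants([500.0, 1500.0, 2500.0]), 4))  # 0.1725
```

With real recordings the per-formant estimates usually disagree somewhat, because the vocal tract is not a uniform tube; averaging is just one simple way to combine them.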


Vocal Tract Resonances
The resonances of the vocal tract are called formants and can be estimated from peaks of the spectrum where the effects of pitch have been smoothed out (i.e., the spectral envelope).


Low-Order AR Modeling

If voiced speech is characterized by an all-pole model of low order (e.g., about 10 for a sampling rate of 8 kHz), then the pole frequencies correspond to the resonances of the vocal tract:

$$\frac{\hat{X}(z)}{\hat{E}(z)} = \frac{G}{1 - a(1)z^{-1} - a(2)z^{-2} - \cdots - a(p)z^{-p}}$$

The inverse of this transfer function, the denominator $A(z) = 1 - a(1)z^{-1} - \cdots - a(p)z^{-p}$ used as an all-zero filter, computes the error between the current sample and the sample predicted from previous samples. Therefore, it is called a prediction error filter.
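To see how a pole angle maps to a resonance frequency, the Python sketch below (hypothetical helper names) builds a single second-order resonator in the sign convention above, $1 - a(1)z^{-1} - a(2)z^{-2}$, whose poles are $re^{\pm j\theta}$ with $\theta = 2\pi f/f_s$, so that $a(1) = 2r\cos\theta$ and $a(2) = -r^2$. It then recovers the frequency from the pole angle with $f = \theta f_s/(2\pi)$, the same conversion the analysis script applies to `roots(a)`. The 500 Hz, $r = 0.95$, and 8 kHz values are arbitrary.

```python
import cmath
import math


# Sketch: positive-predictor convention A(z) = 1 - a1 z^-1 - a2 z^-2 assumed.
def resonator_coeffs(freq_hz, radius, fs):
    """Predictor coefficients [a(1), a(2)] with poles at radius*exp(+/- j 2 pi f / fs)."""
    theta = 2 * math.pi * freq_hz / fs
    return [2 * radius * math.cos(theta), -radius ** 2]


def pole_frequencies(a1, a2, fs):
    """Poles of 1 - a1 z^-1 - a2 z^-2 are the roots of z^2 - a1 z - a2 = 0.

    Returns the frequency in Hz of each pole with a positive angle.
    """
    disc = cmath.sqrt(a1 * a1 + 4 * a2)
    poles = [(a1 + disc) / 2, (a1 - disc) / 2]
    # Convert the pole angle (rad/sample) to Hz, keeping positive frequencies only
    return [cmath.phase(p) * fs / (2 * math.pi) for p in poles if cmath.phase(p) > 0]


a1, a2 = resonator_coeffs(500.0, 0.95, 8000.0)
print([round(f, 6) for f in pole_frequencies(a1, a2, 8000.0)])  # [500.0]
```

The pole radius does not change the recovered frequency; it sets how sharp (narrow-band) the resonance is.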


Example
Create an “auh” sound (as the “a” in “about” or the “u” in “hum”) and use the LPC (linear prediction coefficient) command to model this sound as being generated by a quasi-periodic sequence of impulses exciting an all-pole filter.

The LPC command finds a vector of filter coefficients that minimizes the mean squared prediction error. (In the sign convention used here, MATLAB's lpc returns this vector as [1, -a(1), ..., -a(M)].)

Predict x(n) from previous samples:

$$\hat{x}[n] = a(1)x[n-1] + a(2)x[n-2] + \cdots + a(M)x[n-M]$$

Compute the prediction error sequence with:

$$e[n] = x[n] - a(1)x[n-1] - a(2)x[n-2] - \cdots - a(M)x[n-M]$$

Use Z-transforms to find the transfer function of the filter that recovers x(n) from the LPCs and the error sequence e(n).
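The two difference equations can be checked numerically. The Python sketch below (function names are hypothetical) runs the prediction error filter as an all-zero filter and then recovers $x(n)$ with the recursion $x(n) = e(n) + a(1)x(n-1) + \cdots + a(M)x(n-M)$, i.e. the all-pole filter $1/(1 - a(1)z^{-1} - \cdots - a(M)z^{-M})$ that the Z-transform gives. It assumes zero initial conditions, the sign convention of the equations above, and arbitrary test values; it plays the same role as the `filter(a,1,yf)` and `filter(1,a,predy)` pair in the analysis script.

```python
# Sketch: x_hat(n) = sum a(m) x(n-m), zero initial conditions assumed.
def prediction_error(x, a):
    """All-zero filter: e(n) = x(n) - sum_{m=1}^{M} a(m) x(n-m), with x(n) = 0 for n < 0."""
    M = len(a)
    return [x[n] - sum(a[m - 1] * x[n - m] for m in range(1, M + 1) if n - m >= 0)
            for n in range(len(x))]


def reconstruct(e, a):
    """All-pole filter: x(n) = e(n) + sum_{m=1}^{M} a(m) x(n-m), zero initial state."""
    M = len(a)
    x = []
    for n in range(len(e)):
        x.append(e[n] + sum(a[m - 1] * x[n - m] for m in range(1, M + 1) if n - m >= 0))
    return x


x = [0.0, 1.0, 0.6, -0.2, -0.5, 0.1, 0.4]   # arbitrary test signal
a = [1.2, -0.5]                             # arbitrary predictor coefficients a(1), a(2)
e = prediction_error(x, a)
x_rec = reconstruct(e, a)
print(max(abs(u - v) for u, v in zip(x, x_rec)))  # rounding-error level
```

Any coefficient vector reconstructs the signal exactly here; the LPC choice matters because it makes $e(n)$ small and nearly white, which is what lets an impulse train stand in for it.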


LPC Derivation
Derive an algorithm to compute LPC coefficients from a stream of data that minimizes the mean squared prediction error.

Let $x(n)$ for $0 \le n \le N$ be the sequence of data points, $a(m)$ for $1 \le m \le M$ be the $M$th-order LPC coefficients, and $\hat{x}(n)$ be the prediction estimate.

The mean squared error for the prediction is given by:

$$\mathrm{mse} = \frac{1}{N-M+1}\sum_{n=M}^{N}\left(x(n) - \hat{x}(n)\right)^2 = \frac{1}{N-M+1}\sum_{n=M}^{N} e(n)^2$$


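LPC Computation
Put the prediction equations in matrix form:

$$\mathbf{X}_D\,\mathbf{a} = \hat{\mathbf{x}}_p$$

$$\mathbf{X}_D = \begin{bmatrix} x(M-1) & x(M-2) & x(M-3) & \cdots & x(0) \\ x(M) & x(M-1) & x(M-2) & \cdots & x(1) \\ x(M+1) & x(M) & x(M-1) & \cdots & x(2) \\ x(M+2) & x(M+1) & x(M) & \cdots & x(3) \\ \vdots & \vdots & \vdots & & \vdots \\ x(N-1) & x(N-2) & x(N-3) & \cdots & x(N-M) \end{bmatrix}, \quad \mathbf{a} = \begin{bmatrix} a(1) \\ a(2) \\ a(3) \\ a(4) \\ \vdots \\ a(M) \end{bmatrix}, \quad \mathbf{x}_p = \begin{bmatrix} x(M) \\ x(M+1) \\ x(M+2) \\ x(M+3) \\ \vdots \\ x(N) \end{bmatrix}$$

Each row of $\mathbf{X}_D\mathbf{a}$ is a prediction of the corresponding sample in $\mathbf{x}_p$.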
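The matrix layout can be sketched directly. The Python below (hypothetical names, plain lists rather than a matrix library) builds $\mathbf{X}_D$ and $\mathbf{x}_p$ row by row for $n = M, \dots, N$ and evaluates the mean squared error defined in the derivation, dividing by the $N - M + 1$ predicted samples.

```python
# Sketch: rows of X_D are [x(n-1), ..., x(n-M)] for n = M..N (positive-predictor convention).
def data_matrix(x, M):
    """Return X_D (rows [x(n-1), ..., x(n-M)]) and x_p ([x(n)]) for n = M..N."""
    N = len(x) - 1
    X_D = [[x[n - m] for m in range(1, M + 1)] for n in range(M, N + 1)]
    x_p = [x[n] for n in range(M, N + 1)]
    return X_D, x_p


def mse(x, a):
    """Mean squared prediction error: sum of e(n)^2 over n = M..N, divided by N - M + 1."""
    X_D, x_p = data_matrix(x, len(a))
    errors = [xp - sum(am * xv for am, xv in zip(a, row)) for row, xp in zip(X_D, x_p)]
    return sum(err * err for err in errors) / len(errors)


X_D, x_p = data_matrix([0, 1, 2, 3, 4], 2)
print(X_D)  # [[1, 0], [2, 1], [3, 2]]
print(x_p)  # [2, 3, 4]
```

For this ramp, the predictor $a = [2, -1]$ (linear extrapolation) gives zero error.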


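LPC Computation
The mean squared error can be expressed as:

$$(N-M+1)\,\mathrm{mse} = \sum_{n=M}^{N} e(n)^2 = (\mathbf{x}_p - \hat{\mathbf{x}}_p)^T(\mathbf{x}_p - \hat{\mathbf{x}}_p) = (\mathbf{x}_p - \mathbf{X}_D\mathbf{a})^T(\mathbf{x}_p - \mathbf{X}_D\mathbf{a})$$

If the derivative is taken with respect to $\mathbf{a}$ and set equal to 0, $-2\mathbf{X}_D^T(\mathbf{x}_p - \mathbf{X}_D\mathbf{a}) = \mathbf{0}$, which gives the normal equations $\mathbf{X}_D^T\mathbf{X}_D\,\mathbf{a} = \mathbf{X}_D^T\mathbf{x}_p$ and the result:

$$\mathbf{a} = \left(\mathbf{X}_D^T\mathbf{X}_D\right)^{-1}\mathbf{X}_D^T\mathbf{x}_p$$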
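A small numerical check of this least-squares solution: the Python sketch below (hypothetical names; Gaussian elimination is written out so that no matrix library is assumed) forms $\mathbf{X}_D^T\mathbf{X}_D$ and $\mathbf{X}_D^T\mathbf{x}_p$ and solves the normal equations rather than inverting the matrix explicitly. On a noise-free sequence generated by $x(n) = 1.2x(n-1) - 0.5x(n-2)$ it should return coefficients very close to $[1.2, -0.5]$.

```python
# Sketch: least-squares LPC via the normal equations, stdlib only.
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [list(row) + [bi] for row, bi in zip(A, b)]   # augmented matrix [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):                       # back substitution
        sol[r] = (aug[r][n] - sum(aug[r][c] * sol[c] for c in range(r + 1, n))) / aug[r][r]
    return sol


def lpc_least_squares(x, M):
    """a = (X_D^T X_D)^{-1} X_D^T x_p, computed by solving the normal equations."""
    N = len(x) - 1
    X_D = [[x[n - m] for m in range(1, M + 1)] for n in range(M, N + 1)]
    x_p = [x[n] for n in range(M, N + 1)]
    XtX = [[sum(row[i] * row[j] for row in X_D) for j in range(M)] for i in range(M)]
    Xtx = [sum(row[i] * xp for row, xp in zip(X_D, x_p)) for i in range(M)]
    return solve(XtX, Xtx)


# Noise-free data from x(n) = 1.2 x(n-1) - 0.5 x(n-2), started from x(0) = 1, x(1) = 0
x = [1.0, 0.0]
for n in range(2, 20):
    x.append(1.2 * x[n - 1] - 0.5 * x[n - 2])
print(lpc_least_squares(x, 2))   # close to [1.2, -0.5]
```

Solving the linear system instead of forming the inverse is the usual numerical practice; the result is the same vector $\mathbf{a}$.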


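LPC Computation
The transpose of the data matrix times itself results in (approximately) the autocorrelation matrix:

$$\mathbf{X}_D^T\mathbf{X}_D = \begin{bmatrix} x(M-1) & x(M) & x(M+1) & \cdots & x(N-1) \\ x(M-2) & x(M-1) & x(M) & \cdots & x(N-2) \\ x(M-3) & x(M-2) & x(M-1) & \cdots & x(N-3) \\ \vdots & \vdots & \vdots & & \vdots \\ x(0) & x(1) & x(2) & \cdots & x(N-M) \end{bmatrix} \begin{bmatrix} x(M-1) & x(M-2) & x(M-3) & \cdots & x(0) \\ x(M) & x(M-1) & x(M-2) & \cdots & x(1) \\ x(M+1) & x(M) & x(M-1) & \cdots & x(2) \\ \vdots & \vdots & \vdots & & \vdots \\ x(N-1) & x(N-2) & x(N-3) & \cdots & x(N-M) \end{bmatrix}$$

Element $(i, j)$ of this product is $\sum_{n=M}^{N} x(n-i)\,x(n-j)$, a lagged product sum that depends, up to end effects, only on the lag $|i - j|$.

The data matrix transpose times the future ($p$-vector) values becomes a sequence of autocorrelation values starting with the first lag:

$$\mathbf{X}_D^T\mathbf{x}_p = \begin{bmatrix} x(M-1) & x(M) & x(M+1) & \cdots & x(N-1) \\ x(M-2) & x(M-1) & x(M) & \cdots & x(N-2) \\ x(M-3) & x(M-2) & x(M-1) & \cdots & x(N-3) \\ \vdots & \vdots & \vdots & & \vdots \\ x(0) & x(1) & x(2) & \cdots & x(N-M) \end{bmatrix} \begin{bmatrix} x(M) \\ x(M+1) \\ x(M+2) \\ \vdots \\ x(N) \end{bmatrix}$$

Element $i$ of this vector is $\sum_{n=M}^{N} x(n)\,x(n-i)$, the lag-$i$ product sum for $i = 1, \dots, M$.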


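Autocorrelation and LPC
Define the autocorrelation of a sequence as:

$$r(k) = \sum_{n=0}^{N} x(n)\,x(n-k), \quad \text{where } x(n) = 0 \text{ for } n < 0 \text{ and } n > N$$

Note that the LPC coefficients are computed from the autocorrelation coefficients:

$$\mathbf{X}_D^T\mathbf{X}_D \approx \mathbf{R} = \begin{bmatrix} r(0) & r(1) & r(2) & \cdots & r(M-1) \\ r(1) & r(0) & r(1) & \cdots & r(M-2) \\ r(2) & r(1) & r(0) & \cdots & r(M-3) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ r(M-1) & r(M-2) & r(M-3) & \cdots & r(0) \end{bmatrix}, \qquad \mathbf{X}_D^T\mathbf{x}_p \approx \begin{bmatrix} r(1) \\ r(2) \\ r(3) \\ \vdots \\ r(M) \end{bmatrix}$$

Here $\mathbf{R}$ is the autocorrelation matrix, so the least-squares solution becomes $\mathbf{a} = \mathbf{R}^{-1}[r(1), r(2), \dots, r(M)]^T$. The approximations become equalities when the sums run over the zero-padded sequence, as in the definition of $r(k)$.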
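Because the autocorrelation matrix is symmetric Toeplitz, the system $\mathbf{R}\mathbf{a} = [r(1), \dots, r(M)]^T$ can be solved in $O(M^2)$ operations. The slides do not name a solver; the Python sketch below uses the Levinson-Durbin recursion, a standard choice for this system, written for the positive-predictor convention used here. Function names are hypothetical. As a by-product it returns the reflection coefficients (the lattice coefficients, up to a sign that depends on the lattice convention) and the final prediction error power.

```python
# Sketch: autocorrelation method + Levinson-Durbin, x_hat(n) = sum a(m) x(n-m) assumed.
def autocorr(x, max_lag):
    """r(k) = sum_n x(n) x(n-k) for k = 0..max_lag, treating x(n) as 0 outside 0..N."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]


def levinson_durbin(r, M):
    """Solve the Toeplitz normal equations for predictor coefficients a(1..M).

    Returns (a, refl, err): predictor coefficients, reflection coefficients,
    and the final prediction error power.
    """
    a, refl, err = [], [], r[0]
    for i in range(1, M + 1):
        acc = r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))
        k = acc / err
        # Order update: a_j <- a_j - k * a_{i-j} for j < i, then append a_i = k
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        refl.append(k)
        err *= 1.0 - k * k
    return a, refl, err


# Autocorrelation of an ideal first-order process with r(k) = 0.9**k
a, refl, err = levinson_durbin([1.0, 0.9, 0.81], 2)
print([round(v, 6) for v in a])   # [0.9, 0.0]
```

A first-order process needs no second coefficient, so the recursion correctly returns $a(2) = 0$; for speech, `autocorr` would be applied to a windowed frame before calling `levinson_durbin`.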


Script for Analysis

```matlab
winlens = 50;                                  % PSD window length in milliseconds
[y, fs] = audioread('../data/aaa3.wav');       % Read in wave file (audioread replaces the older wavread)
y = y(:,1);                                    % Use the first channel if the file is stereo
winlen = round(winlens*fs/1000);               % Window length in samples (must be an integer)
[cb, ca] = butter(5, 2*100/fs, 'high');        % Filter to remove LF recording noise
yf = filtfilt(cb, ca, y);
[a, er] = lpc(yf, 10);                         % LPC coefficients, model order 10 (er = prediction error variance)
predy = filter(a, 1, yf);                      % Compute prediction error with all-zero filter
kd = 1;                                        % Starting figure number
figure(kd); plot(predy); hold on; plot(yf, 'g'); hold off;
title('Prediction error'); xlabel('Samples'); ylabel('Amplitude')
recon = filter(1, a, predy);                   % Reconstruct signal from error with all-pole filter
figure(kd+1)                                   % Plot reconstructed signal
plot(recon, 'b')
hold on
% Plot the original shifted by one sample so it does not entirely overlap
% the perfectly reconstructed signal
plot(yf(2:end), 'r')
hold off
xlabel('Samples'); ylabel('Amplitude')
title('Reconstructed Signal (blue) and Original (red)')
% By examining the error sequence, generate a simple impulse sequence
% that simulates its period
pitchper = 56;                                 % Samples per period, set from the error plot
g = repmat([1, zeros(1, pitchper-1)], 1, 150); % 150 periods of a unit impulse
```


Script for Analysis

```matlab
% Run simulated error sequence through all-pole filter
sim = filter(1, a, g);
% Play the simulated vowel, 1 s of silence, then the filtered recording
soundsc([sim'/std(sim); zeros(fix(fs), 1); yf/std(yf)], fs)
% Plot pole-zero diagram
figure(kd+2)
r = roots(a)                           % Poles of the all-pole filter (displayed)
w = 0:.001:2*pi;
plot(real(r), imag(r), 'xr', real(exp(1j*w)), imag(exp(1j*w)), 'b')
title('Pole diagram of vocal tract filter')
xlabel('Real'); ylabel('Imaginary')
% Find resonant frequencies corresponding to poles
froots = (fs/2)*angle(r)/pi;
nf = find(froots > 0 & froots < fs/2); % Keep one pole of each complex-conjugate pair
figure(kd+3)
% Examine average spectrum with formant frequencies
[pd, f] = pwelch(yf, hamming(winlen), fix(winlen/2), 2*winlen, fs);
dbspec = 10*log10(pd);                 % pwelch returns power, so dB = 10*log10
mxp = max(dbspec);                     % Find max and min points for graphing vertical lines
mnp = min(dbspec);
plot(f, dbspec, 'b')                   % Plot PSD
hold on
```


Script for Analysis

```matlab
% Overlay lines on plot where formant frequencies were estimated from LPCs
for k = 1:length(nf)
    plot([froots(nf(k)), froots(nf(k))], [mnp(1), mxp(1)], 'k--')
end
hold off
title('PSD plot with formant frequencies (Black broken lines)')
xlabel('Hertz')
ylabel('dB')
% Get spectrum from the AR (LPC) parameters
[hz, fz] = freqz(1, a, 1024, fs);
figure(kd+4)
plot(fz, abs(hz))
title('Spectrum Generated by LPCs')
xlabel('Hertz')
ylabel('Amplitude')
```
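The last step of the script, `freqz(1, a, 1024, fs)`, evaluates the all-pole model on the unit circle. A plain-Python sketch of the same idea follows. The helper names are hypothetical; it uses the positive-predictor convention $A(z) = 1 - \sum a(m)z^{-m}$ from the earlier slides rather than MATLAB's coefficient vector, and it samples $n$ frequencies $f = i f_s/(2n)$ from 0 up to (not including) $f_s/2$. Local maxima of the resulting envelope are candidate formants; for a single resonator they fall close to, but not exactly at, the pole frequency.

```python
import cmath
import math


# Sketch: A(z) = 1 - sum_m a(m) z^-m assumed (not MATLAB's [1, -a(1), ...] vector).
def lpc_spectrum(a, fs, n_points=1024, gain=1.0):
    """Return (freqs, |G / A(e^{jw})|) on n_points frequencies in [0, fs/2)."""
    freqs, mags = [], []
    for i in range(n_points):
        f = i * fs / (2.0 * n_points)
        z_inv = cmath.exp(-2j * math.pi * f / fs)      # z^-1 on the unit circle
        A = 1 - sum(am * z_inv ** m for m, am in enumerate(a, start=1))
        freqs.append(f)
        mags.append(abs(gain / A))
    return freqs, mags


def spectral_peaks(freqs, mags):
    """Frequencies of strict local maxima of the magnitude (candidate formants)."""
    return [freqs[i] for i in range(1, len(mags) - 1)
            if mags[i - 1] < mags[i] > mags[i + 1]]


# Hypothetical single resonance: pole radius 0.95 at 1000 Hz, fs = 8 kHz
theta = 2 * math.pi * 1000.0 / 8000.0
freqs, mags = lpc_spectrum([2 * 0.95 * math.cos(theta), -(0.95 ** 2)], 8000.0)
print(spectral_peaks(freqs, mags))   # a single peak within a few Hz of 1000
```

Reading formants from the pole angles (as the script does with `froots`) and from the envelope peaks gives nearly the same values when the poles are close to the unit circle.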


LPC Analysis Result

[Figure: PSD plot with formant frequencies (black broken lines); x-axis in Hertz (0 to 4000), y-axis in dB. Annotations: the pole frequencies of the LPC model reflect the vocal tract shape; the periodic ripple across frequency comes from the harmonics of the pitch frequency.]


Vocal Tract Filter Implementations

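Direct form 1 for the all-pole model:

$$x(n) = g_1 e(n) + a(1)x(n-1) + a(2)x(n-2) + \cdots + a(M)x(n-M)$$

$$\frac{\hat{X}(z)}{\hat{E}(z)} = \frac{g_1}{1 - a(1)z^{-1} - a(2)z^{-2} - \cdots - a(M)z^{-M}}$$

[Figure: direct form 1 block diagram. The input e(n) is scaled by g_1 and summed with feedback taps a(1), a(2), a(3), ..., a(M) taken from a chain of z^-1 delays on the output x(n).]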
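A sample-by-sample sketch of this direct-form structure in Python follows (the function name is hypothetical). A list plays the role of the $z^{-1}$ delay chain, and the usage lines drive it with a unit impulse train, one impulse every 56 samples like the `g` sequence in the analysis script, to imitate a voiced source. The two coefficients are an arbitrary stable example, not values estimated from speech.

```python
# Sketch: x(n) = g e(n) + sum a(m) x(n-m), zero initial state assumed.
def direct_form_all_pole(e, a, g=1.0):
    """x(n) = g e(n) + a(1) x(n-1) + ... + a(M) x(n-M), zero initial state."""
    delay = [0.0] * len(a)                 # delay[m-1] holds x(n-m)
    out = []
    for en in e:
        xn = g * en + sum(am * dm for am, dm in zip(a, delay))
        if delay:
            delay = [xn] + delay[:-1]      # shift the delay line by one sample
        out.append(xn)
    return out


# Voiced-source sketch: unit impulse every 56 samples, 10 periods
period = 56
excitation = ([1.0] + [0.0] * (period - 1)) * 10
# Arbitrary stable coefficients a(1), a(2): complex poles at radius sqrt(0.8)
x = direct_form_all_pole(excitation, [1.3, -0.8])
```

Replacing the impulse train with random noise of the same length gives the unvoiced counterpart through the same filter.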


Vocal Tract Filter Implementations

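Direct form 1, second-order sections:

$$\frac{\hat{X}(z)}{\hat{E}(z)} = \frac{g_1}{1 + c(1,2)z^{-1} + c(1,3)z^{-2}} \cdot \frac{g_2}{1 + c(2,2)z^{-1} + c(2,3)z^{-2}} \cdots \frac{g_{M/2}}{1 + c(M/2,2)z^{-1} + c(M/2,3)z^{-2}}$$

[Figure: cascade of M/2 second-order sections from e(n) to x(n). Section k scales its input by g_k and feeds back the coefficients c(k,2) and c(k,3) through two z^-1 delays.]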
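Converting the LPC model to this cascade amounts to factoring the denominator into conjugate pole pairs: a pair $p, p^*$ contributes $1 - 2\,\mathrm{Re}(p)z^{-1} + |p|^2z^{-2}$, so $c(k,2) = -2\,\mathrm{Re}(p)$ and $c(k,3) = |p|^2$. In MATLAB this is typically done with Signal Processing Toolbox functions such as `tf2sos`; the stdlib-only Python sketch below instead finds the roots with the Durand-Kerner iteration, a swapped-in root finder that the slides do not specify. It assumes $M$ is even and that every pole is complex, which is the usual case for a formant model but not guaranteed. All names are hypothetical.

```python
import cmath
import math


# Sketch: Durand-Kerner root finding; assumes all poles are complex (M even).
def poly_roots(c, iters=500):
    """Roots of c[0] z^n + c[1] z^(n-1) + ... + c[n] by Durand-Kerner iteration."""
    n = len(c) - 1
    c = [ck / c[0] for ck in c]                        # make the polynomial monic
    roots = [(0.4 + 0.9j) ** k for k in range(n)]      # distinct starting guesses
    for _ in range(iters):
        new_roots = []
        for i, zi in enumerate(roots):
            num = sum(ck * zi ** (n - k) for k, ck in enumerate(c))
            den = 1.0 + 0j
            for j, zj in enumerate(roots):
                if j != i:
                    den *= zi - zj
            new_roots.append(zi - num / den)
        roots = new_roots
    return roots


def lpc_to_sos(a):
    """Denominators [1, c(k,2), c(k,3)] for 1/A(z), A(z) = 1 - sum a(m) z^-m."""
    poles = poly_roots([1.0] + [-am for am in a])       # z^M A(z) in descending powers
    upper = sorted((p for p in poles if p.imag > 0), key=cmath.phase)
    return [[1.0, -2.0 * p.real, abs(p) ** 2] for p in upper]


# Hypothetical 4th-order model built from two known resonators
s1 = [1.0, -2 * 0.9 * math.cos(0.5), 0.81]
s2 = [1.0, -2 * 0.8 * math.cos(2.0), 0.64]
A = [sum(s1[i] * s2[k - i] for i in range(3) if 0 <= k - i < 3) for k in range(5)]
a = [-Ak for Ak in A[1:]]
print(lpc_to_sos(a))   # approximately [s1, s2]
```

Each section then has only two feedback coefficients, so coefficient rounding perturbs one pole pair at a time instead of all $M$ poles at once, which is the usual argument for the cascade form.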


Vocal Tract Filter Implementations

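Lattice implementations are popular because of their good numerical-error and stability properties. The filter is implemented in modular stages with reflection coefficients $k_i$ that are directly related to the stability criterion (the filter is stable when every $|k_i| < 1$) and to the tube resonances of the vocal tract. Each stage computes (example of a 2nd-order system):

$$\hat{E}_i(z) = \hat{E}_{i+1}(z) - k_i z^{-1}\hat{X}_i(z)$$

$$\hat{X}_{i+1}(z) = k_i^{*}\hat{E}_i(z) + z^{-1}\hat{X}_i(z)$$

[Figure: 2nd-order lattice. The input e(n) = e_2(n) passes through the k_1 stage to give e_1(n), then through the k_0 stage to give e_0(n) = x(n). The backward signals x_0(n), x_1(n), x_2(n) pass through z^-1 delays and are weighted by k_i and k_i^*.]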
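The two lattice equations can be run sample by sample. The Python sketch below (hypothetical names, real coefficients so that $k_i^* = k_i$) applies the input as $e_P(n)$ and takes the output as $x_0(n) = e_0(n)$. It also uses a step-up recursion, $A_{i+1}(z) = A_i(z) + k_i z^{-1}B_i(z)$ with $B_i$ the coefficient-reversed $A_i$, to obtain the equivalent direct-form denominator $1 + \alpha_1 z^{-1} + \dots + \alpha_P z^{-P}$, so the two structures can be compared; under this convention the direct-form coefficients of the earlier slides are $a(m) = -\alpha_m$. The reflection coefficients in the usage lines are arbitrary values with $|k_i| < 1$.

```python
# Sketch: real-coefficient all-pole lattice; stage equations as in the slide.
def lattice_all_pole(e_in, k):
    """e_i(n) = e_{i+1}(n) - k_i x_i(n-1),  x_{i+1}(n) = k_i e_i(n) + x_i(n-1).

    Input is e_P(n) with P = len(k); output is x_0(n) = e_0(n).
    """
    P = len(k)
    x_prev = [0.0] * P                      # x_prev[i] holds x_i(n-1)
    out = []
    for sample in e_in:
        e = [0.0] * (P + 1)
        e[P] = sample
        for i in range(P - 1, -1, -1):      # forward path, stage P-1 down to stage 0
            e[i] = e[i + 1] - k[i] * x_prev[i]
        # Backward path: x_0(n) = e_0(n), x_{i+1}(n) = k_i e_i(n) + x_i(n-1)
        x_prev = [e[0]] + [k[i] * e[i] + x_prev[i] for i in range(P - 1)]
        out.append(e[0])
    return out


def lattice_to_direct(k):
    """Step-up recursion; returns [1, alpha_1, ..., alpha_P] of A_P(z)."""
    alpha = [1.0]
    for ki in k:
        ext = alpha + [0.0]
        alpha = [ext[m] + ki * ext[len(ext) - 1 - m] for m in range(len(ext))]
    return alpha


k = [0.5, -0.3]                             # arbitrary, |k_i| < 1
impulse = [1.0] + [0.0] * 9
h_lattice = lattice_all_pole(impulse, k)
alpha = lattice_to_direct(k)                # approximately [1.0, 0.35, -0.3]
```

The lattice impulse response matches that of $1/(1 + 0.35z^{-1} - 0.3z^{-2})$, starting $1, -0.35, 0.4225, \dots$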


Example
a) Record a neutral vowel sound, estimate the formant frequencies, and estimate the size of the vocal tract, assuming a 345 m/s speed of sound and a tube model that is closed at one end and open at the other.

b) Use the LPCs estimated from the neutral vowel sound to filter another sample of speech from the same speaker. Use them first as an all-zero filter and then as an all-pole filter. Listen to the sound and describe what is happening.

c) Convert the LPC coefficients of the all-pole filter into second-order sections and implement the filter. Describe the advantages of this approach.

d) Modify the filter by keeping the angles of the poles/zeros but moving their magnitudes closer to the unit circle. Listen to the sound and explain what is happening.
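For part d), one simple way to move every pole radially while keeping its angle is to scale the coefficients: replacing $a(m)$ by $\gamma^m a(m)$ turns $A(z)$ into $A(z/\gamma)$, so each pole $p$ (and, for the all-zero filter, each zero) moves to $\gamma p$. The slides do not prescribe this method; it is offered as one option, and the function names are hypothetical. $\gamma > 1$ moves the roots toward the unit circle (the all-pole filter becomes unstable once any $|\gamma p| \ge 1$), while $\gamma < 1$ moves them away, as asked in the homework.

```python
import math


# Sketch: radial pole/zero scaling by gamma, positive-predictor convention assumed.
def scale_pole_radii(a, gamma):
    """Return gamma**m * a(m): every root p of A(z) = 1 - sum a(m) z^-m moves to gamma * p."""
    return [am * gamma ** m for m, am in enumerate(a, start=1)]


def resonator(freq_hz, radius, fs):
    """[a(1), a(2)] for a conjugate pole pair at radius * exp(+/- j 2 pi f / fs)."""
    theta = 2 * math.pi * freq_hz / fs
    return [2 * radius * math.cos(theta), -radius ** 2]


a = resonator(700.0, 0.90, 8000.0)
a_sharper = scale_pole_radii(a, 1.05)      # same 700 Hz angle, radius 0.945
print(a_sharper, resonator(700.0, 0.945, 8000.0))   # approximately equal
```

Because only the radii change, the formant frequencies stay put while their bandwidths narrow ($\gamma > 1$) or widen ($\gamma < 1$).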


Homework (1)
a) Record a free vowel sound and estimate the size of your vocal tract based on the formant frequencies.

b) Compute the LPCs from a free vowel sound and use them to filter another segment of speech with -10 dB of white noise added. Use the LPCs as an all-zero filter and as an all-pole filter. Describe the sound of the filtered outputs and explain the difference between the two filters.

c) Move the poles and zeros farther from the unit circle and repeat part b). Describe the effect on the filtered sound when the poles and zeros are moved away from the unit circle. Submit this description and the m-files used to process the data.