
Chapter 4: Pitch estimation for music signal processing

KH Wong

Ch4. pitch, v3.c 1


Introduction (lecture 4)

• Pitch estimation is essential to many music signal applications
– Genre classification
– Music tutor: detection of playing faults
– Music style analysis
– Automatic transcription: audio signal to music score


Techniques in pitch extraction

– Time domain approaches
• (1) ACF (Autocorrelation function) and MACF (Modified Autocorrelation function)
• (2) Normalized cross correlation function (NCCF)
• (3) AMDF (Average magnitude difference function)
– Frequency domain approaches
• (4) Cepstrum Pitch Determination (CPD)


Definition of pitch

• What is the pitch of a tone?
• Answer: the perceived frequency of a sound (Wikipedia).


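Method 1: ACF (Autocorrelation function)

• Autocorrelation function (ACF)
• By definition, auto-correlation is

$$R(m) = \lim_{N \to \infty} \frac{1}{2N+1} \sum_{n=-N}^{N} x(n)\,x(n+m), \quad 0 \le m \le M_0$$

• R(m) and R(-m) are symmetrical, so only m ≥ 0 is used. For a frame of N samples:

$$R(m) = \frac{1}{N} \sum_{n=0}^{N-1-m} x(n)\,x(n+m), \quad 0 \le m \le M_0$$

[Figure: signal x(n) against n, and its auto-correlation R(m) against m, symmetrical on both sides of m = 0]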

What is Auto-correlation, R(m)?

• E.g. x=[1 5 7 1 4], N=5
• R(0)=[x(0)*x(0)+x(1)*x(1)+x(2)*x(2)+x(3)*x(3)+x(4)*x(4)]
• R(0)=(1+25+49+1+16)=92
• R(1)=[x(0)*x(1)+x(1)*x(2)+x(2)*x(3)+x(3)*x(4)], that is, x multiplied by a copy of itself shifted by one sample:
  x=[1 5 7 1 4]
    [1 5 7 1 4]
• R(1)=(5+35+7+4)=51
• And so on…
• R=[92 51 40 21 4]

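• With the mean term (1/N):

$$R(m) = \frac{1}{N} \sum_{n=0}^{N-1-m} x(n)\,x(n+m), \quad 0 \le m \le M_0$$

• It is easier if you ignore the mean term (1/N):

$$R(m) = \sum_{n=0}^{N-1-m} x(n)\,x(n+m), \quad 0 \le m \le M_0$$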
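The worked example for x=[1 5 7 1 4] can be checked with a few lines of Matlab. This is a minimal sketch that uses only base Matlab (no `xcorr`). The variable names are chosen here for illustration, and because Matlab indexes from 1, `R(m+1)` holds the lag-m value.

```matlab
% Auto-correlation without the 1/N term:
% R(m) = sum over n = 0..N-1-m of x(n) x(n+m)
x = [1 5 7 1 4];
N = length(x);
R = zeros(1, N);
for m = 0:N-1
    % x(1:N-m) plays the role of x(n), x(1+m:N) the role of x(n+m)
    R(m+1) = sum(x(1:N-m) .* x(1+m:N));
end
disp(R)   % 92 51 40 21 4
```

Dividing `R` by `N` gives the version with the mean term.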


Exercise 4.1: First, what is auto-correlation?

• %matlab code
• x=[1 5 7 1 4 8 6 2 4 9 3 ]'
• auto_corr_x=xcorr(x) %auto-correlation
• figure(1), clf
• subplot(2,1,1),plot(x)
• grid on, grid(gca,'minor'), hold on
• subplot(2,1,2),plot(auto_corr_x)
• grid on, grid(gca,'minor')

• Exercise: show the steps of the calculation

[Figure: x[t] against t (top) and Auto_correlation(x[t]) against lag (bottom)]

• We only look at positive lags m
• The gap between two peaks is 4, so the period of x is around 4 samples

$$R(m) = \sum_{n=0}^{N-1-m} x(n)\,x(n+m), \quad 0 \le m \le M_0$$

Ans: ??


Autocorrelation

• When a segment of a signal is correlated with itself, the distance (j2 - j1 = Lag_time_in_samples) between the positions of the maximum and the second maximum of the correlation (the highest peak other than the one at lag 0) is defined as the fundamental period (pitch period) of the signal.

[Figure: auto-correlation R(j) against lag time j in samples, with the maximum R_the_max at j1 = 0 and the second maximum R_second_max at j2]

Then the fundamental frequency can be calculated as:

$$f_0 = \frac{1}{\text{Lag\_time\_in\_samples} \times \text{sampling\_period}} = \frac{\text{sampling\_frequency}}{j_2 - j_1}$$

• Usually j1 = 0, because R_the_max is at lag 0. Then:

$$f_0 = \frac{\text{sampling\_frequency}}{j_2}$$
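A minimal sketch of this calculation, using the signal of Exercise 4.1 and an assumed sampling frequency of 8000 Hz (the value of `fs` is only an illustration, it is not given for that signal). Note that the second maximum has to be a local peak: the second-largest value of R is usually R(1), which sits on the slope of the lag-0 peak.

```matlab
% Pitch from the auto-correlation peak: f0 = fs / (j2 - j1), with j1 = 0
x  = [1 5 7 1 4 8 6 2 4 9 3];    % signal of Exercise 4.1
fs = 8000;                        % assumed sampling frequency in Hz
N  = length(x);
R  = zeros(1, N);
for m = 0:N-1
    R(m+1) = sum(x(1:N-m) .* x(1+m:N));   % R(m) stored at index m+1
end
% Search lags m >= 1 for local peaks (larger than both neighbours)
% and keep the highest one.
j2 = 0;
best = -Inf;
for m = 1:N-2
    if R(m+1) > R(m) && R(m+1) > R(m+2) && R(m+1) > best
        best = R(m+1);
        j2 = m;
    end
end
f0 = fs / j2;                     % j1 = 0
fprintf('j2 = %d samples, f0 = %g Hz\n', j2, f0);
```

For this signal the highest local peak is at lag 4, which matches the gap read from the plot in Exercise 4.1.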


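Modified Auto-Correlation Method: Auto-Correlation Method enhanced by center clipping

• Center clip x(n) with clipping level C_L to get y(n):

$$y(n) = \mathrm{clc}[x(n)] = \begin{cases} x(n) - C_L, & x(n) > C_L \\ 0, & |x(n)| \le C_L \\ x(n) + C_L, & x(n) < -C_L \end{cases}$$

• Then auto-correlate the clipped signal:

$$R'(m) = \sum_{n=0}^{N-1-m} y(n)\,y(n+m), \quad 0 \le m \le M_0$$

• It will give a more accurate result because higher frequency signals will not interfere with the result
• Typical C_L = 1/4 of the peak-to-peak range of x
• The Matlab listing at the end of these notes uses a simpler variant that leaves the samples outside the clipping band unchanged

[Figure: x(n) against n with the band between -C_L and +C_L marked "cut (remove) the middle part", and the center-clipped signal y(n) = clc(x) against n]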
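A minimal sketch of center clipping with C_L set to 1/4 of the peak-to-peak range, applied to the data of class exercise 4.2. It follows the variant used in the Matlab listing at the end of these notes: the mean is removed first, and samples outside the clipping band are kept unchanged. The vectorized indexing is a choice made here, the listing uses a loop.

```matlab
% Center clipping: set every sample with magnitude below CL to zero
orginal_x = [1 3 7 2 1 9 3 1 8];      % data of class exercise 4.2
x  = orginal_x - mean(orginal_x);     % remove the mean first
CL = (max(x) - min(x)) / 4;           % 1/4 of the peak-to-peak range
y  = x;
y(abs(x) < CL) = 0;                   % cut (remove) the middle part
disp(CL)
disp(y)
```

Three samples fall inside the band and become zero. The remaining samples still mark the strong peaks of the waveform, which is what the auto-correlation needs.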


Finding pitch by center clipping

• In R(m) auto correlation of x(n), it is not easy to pick peaks

• In R’(m), auto correlation of clipped signal y(n)=clc{x(n)}, peaks are easy to pick

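[Figure: x(n) with its auto-correlation R(m), and the center-clipped signal y(n) with its auto-correlation R'(m); the peaks of R'(m) are spaced T1, T2, T3 apart]

• T = mean(T1, T2, T3) = period = 1/(pitch_frequency)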
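The rule T = mean(T1, T2, ...) can be sketched as follows, again on the data of class exercise 4.2. This is an illustration under stated assumptions, not the course's reference code: a peak is taken to be any lag whose R'(m) is larger than both neighbours, and the clipper keeps the samples outside the band unchanged.

```matlab
% MACF sketch: center clip, auto-correlate, then average the peak spacing
orginal_x = [1 3 7 2 1 9 3 1 8];
fs = 1;                                   % sampling frequency in Hz
x  = orginal_x - mean(orginal_x);
CL = (max(x) - min(x)) / 4;
y  = x;
y(abs(x) < CL) = 0;                       % center-clipped signal
N  = length(y);
Rc = zeros(1, N);                         % R'(m), stored at index m+1
for m = 0:N-1
    Rc(m+1) = sum(y(1:N-m) .* y(1+m:N));
end
peak_lags = 0;                            % the largest peak is at lag 0
for m = 1:N-2
    if Rc(m+1) > Rc(m) && Rc(m+1) > Rc(m+2)
        peak_lags(end+1) = m;             % local peak at lag m
    end
end
T  = mean(diff(peak_lags));               % mean gap between neighbouring peaks
f0 = fs / T;                              % pitch frequency
```

On real frames a threshold on the peak height is usually needed as well, otherwise small ripples are counted as peaks.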


The MACF (Modified Autocorrelation function) algorithm


Example

• For each frame, find a pitch.

• Plot pitch against time (blue), you can see the pitch profile

[Figure: signal x(n) against time (top); pitch frequency against frame number n (bottom)]

Class exercise 4.2

• x=[1 3 7 2 1 9 3 1 8], with sampling frequency Fs = 1 Hz.

• (a) Find the pitch of this signal x using ACF (Autocorrelation function).

• (b) Repeat (a) if Fs = 8 kHz.


Method 2: Normalized cross correlation function (NCCF) method

[Verteletskaya 2009]

$$\mathrm{NCCF}(m) = \frac{\sum_{n=0}^{N-m-1} x(n)\,x(n+m)}{\sqrt{\sum_{n=0}^{N-m-1} x^2(n)\,\sum_{n=0}^{N-m-1} x^2(n+m)}}, \quad 0 \le m \le M_0$$
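A minimal sketch of the NCCF on the data of class exercise 4.2. The lag limit `M0 = floor(N/2)` is an assumption made here: at larger lags only a few samples overlap and the normalized value drifts toward ±1 regardless of the signal.

```matlab
% NCCF(m) = sum x(n)x(n+m) / sqrt( sum x(n)^2 * sum x(n+m)^2 ),  n = 0..N-m-1
x  = [1 3 7 2 1 9 3 1 8];
N  = length(x);
M0 = floor(N/2);                  % assumed lag limit
nccf = zeros(1, M0+1);            % NCCF(m) stored at index m+1
for m = 0:M0
    a = x(1:N-m);                 % x(n)
    b = x(1+m:N);                 % x(n+m)
    nccf(m+1) = sum(a .* b) / sqrt(sum(a.^2) * sum(b.^2));
end
[~, idx] = max(nccf(2:end));      % skip lag 0, where NCCF is always 1
best_lag = idx;                   % nccf(2) is lag 1, so idx equals the lag
```

Because of the normalization every value lies between -1 and 1, so a fixed threshold can be used to accept or reject a peak.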


Method 3: Average Magnitude Difference Function (AMDF) Method

[Verteletskaya 2009]

• An intuitive method: just pick the peaks and find the period

$$D_x(m) = \frac{1}{N} \sum_{n=0}^{N-1-m} \left| x(n) - x(n+m) \right|, \quad 0 \le m \le M_0$$

• Find the negative (-ve) peaks, that is the valleys, in D; the estimated period is the average gap between two neighboring negative peaks

[Figure: D against lag m, with the negative peaks marked]
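A minimal sketch of the AMDF on the data of class exercise 4.2. Here the period is taken from the single deepest valley at lags 1 to `M0`. The limit `M0 = floor(N/2)` is an assumption made for this short signal, because the sum has fewer terms at large lags and D shrinks there for that reason alone.

```matlab
% AMDF: D(m) = (1/N) * sum over n = 0..N-1-m of |x(n) - x(n+m)|
x  = [1 3 7 2 1 9 3 1 8];
N  = length(x);
M0 = floor(N/2);                  % assumed lag limit
D  = zeros(1, M0+1);              % D(m) stored at index m+1
for m = 0:M0
    D(m+1) = sum(abs(x(1:N-m) - x(1+m:N))) / N;
end
[~, idx] = min(D(2:end));         % skip lag 0, where D is always 0
period = idx;                     % D(2) is lag 1, so idx equals the lag
```

AMDF needs only subtractions and absolute values, which made it attractive on hardware without fast multipliers.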


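Method 4: Cepstrum Pitch Determination (CPD)

[Verteletskaya 2009]

• The signal s(n) is the excitation e(n) convolved with the filter response h(n):

$$s(n) = e(n) * h(n)$$

$$S(w) = E(w)\,H(w)$$

$$\log|S(w)| = \log|E(w)| + \log|H(w)|$$

$$F^{-1}\{\log|S(w)|\} = F^{-1}\{\log|E(w)|\} + F^{-1}\{\log|H(w)|\}$$

• In discrete form, the cepstrum C(m) is:

$$S(k) = \sum_{n=0}^{N-1} s(n)\,e^{-j\frac{2\pi}{N}nk}$$

$$C(m) = \frac{1}{N} \sum_{k=0}^{N-1} \log|S(k)|\,e^{j\frac{2\pi}{N}mk}$$

• The problem: for human voice, the peak may be the result of glottal excitation.

[Figure: cepstrum with a peak at quefrency Q' = 0.006 s]

• Peak at Q', pitch = 1/0.006 ≈ 167 Hz.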
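A minimal sketch of cepstrum pitch determination on a synthetic signal, using only base Matlab (`fft`, `ifft`). Everything about the test signal is an assumption made for illustration: an impulse train with a period of 40 samples stands in for the excitation e(n), a decaying exponential stands in for the filter response h(n), and the sampling frequency is taken as 8000 Hz.

```matlab
% Cepstrum pitch sketch on a synthetic signal s(n) = e(n) * h(n)
fs = 8000;                          % assumed sampling frequency in Hz
N  = 1000;                          % frame length in samples
P  = 40;                            % true pitch period in samples (200 Hz)
e  = zeros(1, N);
e(1:P:N) = 1;                       % impulse-train excitation e(n)
h  = 0.9 .^ (0:N-1);                % assumed decaying filter response h(n)
S  = fft(e) .* fft(h);              % S(w) = E(w) H(w), circular convolution
logS = log(abs(S) + 1e-6);          % small offset avoids log(0)
C  = real(ifft(logS));              % cepstrum C(m), stored at index m+1
lo = 20;                            % shortest period searched (400 Hz)
hi = 60;                            % longest period searched (about 133 Hz)
[~, idx] = max(C(lo+1:hi+1));
lag = lo + idx - 1;                 % pitch period in samples
f0  = fs / lag;                     % pitch in Hz
fprintf('quefrency = %g s, pitch = %g Hz\n', lag / fs, f0);
```

The search window matters: the cepstrum is largest near quefrency 0, where the filter response lives, so the peak is only meaningful inside the range of plausible pitch periods.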


For human voice pitch detection (or recognition)

• We must study the structure of the vocal system to find out how to get an accurate answer.

• The vocal system has 2 elements
– Glottal excitation (no use for pitch measurement)
– Vocal tract filter
– Use liftering to remove glottal excitation before we use the spectrum of the vocal tract filter for pitch extraction.


Cepstrum of speech

• A new word made by reversing the first 4 letters of "spectrum": cepstrum.
• It is the spectrum of a spectrum of a signal
• Why do we need this?
– Answer: to remove the ripples of the spectrum caused by glottal excitation.

[Figure: speech signal x, and the spectrum of x obtained by Fourier transform]

• There are too many ripples in the spectrum, caused by vocal cord vibrations. But we are more interested in the speech envelope for recognition and reproduction.
• Source: http://isdl.ee.washington.edu/people/stevenschimmel/sphsc503/files/notes10.pdf

Liftering method: Select the higher and lower samples

• Signal x(n)
• Cepstrum = C(n) = |fft(log|fft(x(n))|)|
• High-time liftering selects C_high (lower frequency): glottal excitation
• Low-time liftering selects C_low (higher frequency): vocal tract filter response
• Quefrency is in the time domain (in seconds), so a higher quefrency means a lower frequency
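A minimal sketch of liftering, on a synthetic signal rather than speech. The excitation (an impulse train), the filter response (a decaying exponential) and the cut-off quefrency `Lc` are all assumptions chosen for illustration. The cepstrum is taken with `ifft` and kept signed so that the two parts add back to the log spectrum exactly.

```matlab
% Liftering sketch: split the cepstrum into low-time and high-time parts
N  = 1000;                                 % frame length in samples
P  = 40;                                   % excitation period in samples
e  = zeros(1, N);
e(1:P:N) = 1;                              % impulse-train excitation
h  = 0.9 .^ (0:N-1);                       % assumed filter response
logS = log(abs(fft(e) .* fft(h)) + 1e-6);  % log magnitude spectrum
C  = real(ifft(logS));                     % cepstrum, C(m) at index m+1
Lc = 20;                                   % assumed cut-off quefrency (samples)
keep = false(1, N);
keep(1:Lc) = true;                         % quefrencies 0..Lc-1
keep(N-Lc+2:N) = true;                     % mirror images (C is symmetric)
C_low  = C .* keep;                        % low-time part: smooth envelope
C_high = C .* ~keep;                       % high-time part: contains the peak at P
env    = real(fft(C_low));                 % smoothed log spectrum
ripple = real(fft(C_high));                % the ripples that were removed
```

Plotting `logS`, `env` and `ripple` against frequency shows the split: `env` follows the overall shape of the spectrum and `ripple` carries the periodic fine structure.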


Recover glottal excitation and vocal tract spectrum

[Figure: cepstrum against quefrency (sample index), split into C_high for glottal excitation and C_low for the vocal tract; panels show the cepstrum and spectrum of the glottal excitation, and the cepstrum and spectrum of the vocal tract filter, against frequency]

• This peak may be the pitch period
• This smoothed vocal tract spectrum can be used to find pitch
• For more information see: http://isdl.ee.washington.edu/people/stevenschimmel/sphsc503/files/notes10.pdf


Measure pitch of musical instruments

Example: Find pitch of Oboe A4 sound
http://www.cse.cuhk.edu.hk/%7Ekhwong/www2/cmsc5707/A4_oboe.wav

• A4_Oboe
• Spectrogram


Example: Find pitch of Oboe A4 sound
http://www.cse.cuhk.edu.hk/%7Ekhwong/www2/cmsc5707/A4_oboe.wav
http://www.cse.cuhk.edu.hk/%7Ekhwong/www2/cmsc5707/demo_ceps_note_v3.zip

• Input: Oboe A4, x(n)
• Fourier transform: X(w) = fft(x)
• Cepstrum C(n) = |fft(log|fft(x(n))|)|, shown for the range 200 to 900 Hz. In quefrency this is from 1/900 = 1.11x10^-3 s to 1/200 = 5x10^-3 s (the quefrency axis is in units of 10^-3 s)
• Cepstrum C(n), all range, from around 30 Hz
• The first peak of the cepstrum (in quefrency) is at time = 0.002268 s, so F1 = 1/time = 440.91 Hz. This is the pitch; it has the strongest energy
• The second peak is at time = 0.004535 s, so F2 = 1/time = 220.507 Hz
• Found two harmonics: 440 Hz and 220 Hz
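The conversions between frequency and quefrency in this example are plain reciprocals. The sketch below only repeats the arithmetic with the values quoted above; it does not read the wav file, and the variable names are chosen here for illustration.

```matlab
% Quefrency (seconds) and frequency (Hz) are reciprocals of each other
f_band  = [200 900];               % search band in Hz
q_band  = 1 ./ f_band;             % 5.00e-3 s and 1.11e-3 s
q_peaks = [0.002268 0.004535];     % cepstrum peak positions in seconds
f_peaks = 1 ./ q_peaks;            % about 440.9 Hz and 220.5 Hz
fprintf('%.6f s -> %.2f Hz\n', [q_peaks; f_peaks]);
```

The second peak sits at twice the quefrency of the first, which is why its frequency is half of 440 Hz.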


Summary

• Methods of pitch extraction have been studied.

• The cepstrum and its use for pitch extraction have been discussed.


References

• [Naotoshi Seo 2007] Naotoshi Seo, "Project: Pitch Detection", http://note.sonots.com/SciSoftware/Pitch.html#ke283f3a
• [Verteletskaya 2009] E. Verteletskaya, B. Šimák, "Performance Evaluation of Pitch Detection Algorithms", http://access.feld.cvut.cz/view.php?cisloclanku=2009060001
• [Rabiner1976] L. Rabiner, M. Cheng, A. Rosenberg, C. McGonegal, "A comparative performance study of several pitch detection algorithms", IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 24, no. 5, pp. 399-418, Oct 1976


Appendix


Music frequency table
http://wc.pima.edu/~manelson/MUS%20102/MIDI%20tunings%20per%20note.jpg

Music frequency table
Source: http://www.angelfire.com/in2/yala/t4scales.htm

Autocorrelation

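• In signal processing, given a signal f(t), the continuous autocorrelation R_ff(τ) is the continuous cross-correlation of f(t) with itself, at lag τ, and is defined as:

$$R_{ff}(\tau) = \overline{f}(-\tau) * f(\tau) = \int_{-\infty}^{\infty} f(t+\tau)\,\overline{f}(t)\,dt = \int_{-\infty}^{\infty} f(t)\,\overline{f}(t-\tau)\,dt$$

where the bar denotes the complex conjugate and * denotes convolution.

• In a discrete system, the autocorrelation R at lag j for a real signal x_n is defined as:

$$R(j) = \sum_{n} x_n\,x_{n-j}$$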

Answer 4.1: Exercise 4.1, first, what is auto-correlation?

• For x=[1 5 7 1 4 8 6 2 4 9 3]', with the (1/N) term ignored:

$$R(m) = \sum_{n=0}^{N-1-m} x(n)\,x(n+m), \quad 0 \le m \le M_0$$

[Figure: x[t] against t (top) and Auto_correlation(x[t]) against lag (bottom)]

• We only look at positive lags m
• The gap between two peaks is 4, so the period of x is around 4 samples

Ans: [302 214 142 183 194 116 65 88 70 24 3 0]

Answer 4.2 for exercise 4.2

This answer uses MACF; you can use ACF instead, and the pitch found is the same for this example.

• Question: x=[1 3 7 2 1 9 3 1 8], sampled at 1 Hz. Find the pitch of this signal x using MACF (Modified Autocorrelation function).
• Answer:
• orginal_x = 1 3 7 2 1 9 3 1 8
• x = centered_wave = orginal_x - mean_x =
  -2.8889 -0.8889 3.1111 -1.8889 -2.8889 5.1111 -0.8889 -2.8889 4.1111
• cl = center clipped range = 2
• y = center clipped signal =
  -2.8889 0 3.1111 0 -2.8889 5.1111 0 -2.8889 4.1111
• (a) If the sampling frequency Fs = 1 Hz
• Answer: from the autocorrelation result of y in the figure, we can see that the distance between 2 peaks is 3 samples, so the pitch is 1/3 Hz, since the sampling frequency is 1 Hz.


Answer 4.2: Class exercise 4.2

• R=[24.3333, 9.6667, 8.2222, 16.3333, 6.5556, 4.5556, 6.8889, 2.7778, 0.8889] (autocorrelation of orginal_x, including the 1/N term)

• 2nd diagram, R (+ve lags only): pick 2 peaks. The period is 3 samples, so frequency = 1/3 Hz

• (b) If Fs = 8 kHz
• Answer: if the sampling frequency is Fs = 8 kHz, the sampling period is dt = 1/Fs = (1/8) ms. The period of x is 3 samples, so the actual period is 3*dt = 3*(1/8) ms. The frequency of x is 1/(3*dt) = (8/3) kHz, about 2.67 kHz.
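The numbers in this answer can be reproduced with base Matlab. This is a minimal sketch: it computes the autocorrelation of x with the 1/N term, takes the highest local peak after lag 0 as the period, and converts it to a frequency for both sampling rates. The peak-picking loop is a choice made here, the answer reads the peak from the plot.

```matlab
% Check of Answer 4.2: R with the 1/N term, period, and pitch for two Fs
x = [1 3 7 2 1 9 3 1 8];
N = length(x);
R = zeros(1, N);
for m = 0:N-1
    R(m+1) = sum(x(1:N-m) .* x(1+m:N)) / N;   % R(m) stored at index m+1
end
period = 0;
best = -Inf;
for m = 1:N-2                                  % highest local peak, m >= 1
    if R(m+1) > R(m) && R(m+1) > R(m+2) && R(m+1) > best
        best = R(m+1);
        period = m;
    end
end
for fs = [1 8000]
    fprintf('Fs = %g Hz: period = %d samples, pitch = %.4f Hz\n', ...
            fs, period, fs / period);
end
```

With Fs = 1 Hz this gives 1/3 Hz, and with Fs = 8 kHz it gives 8000/3 Hz, about 2.67 kHz.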


Matlab

%Ver2, MACF (Modified Autocorrelation function) using center clipping
%xcorr is part of the Signal Processing Toolbox
clear
%select one of the following
%real_data=1 %1 or 0
real_data=0
if real_data==1
    %use real sound
    %[orginal_x,fs]=audioread('d:\0music\sounds\violin3.wav');
    [orginal_x,fs]=audioread('violin3.wav'); %audioread replaces wavread in newer Matlab releases
    orginal_x=orginal_x(10000:11000); %take a short segment
else
    %use test data
    %orginal_x=[1 2 5 6 7 6 1 0 4 3 4 8 6 7 3 2 4 9 3 ]
    orginal_x=[1 3 7 2 1 9 3 1 8 ]
    fs=1 %assume sampling frequency is 1Hz
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% test
x=orginal_x-mean(orginal_x) %centered wave
n=length(x)
maxx=max(x)
minx=min(x)
dd=maxx-minx %peak-to-peak span
figure(1)
clf
plot(x)
%pause
%center clipping algo for pitch extraction
if real_data==1
    cl=dd/4000
else
    cl=dd/4 %center clipped "cl" length is 1/4 of total peak-to-peak span
    pause
end

%assume the signal x is voltage against time
%center clip means set to 0 those signals with levels within the clipped regions
%center = mean voltage level of the whole signal
%positive peak = maximum of the signal voltage
%negative peak = minimum of the signal voltage
%center clip regions are: (i) from center to 1/2 of center_to_positive peak
%                         (ii) from center to -1/2 of center_to_negative peak
y=zeros(size(x)); %preallocate the clipped signal
for t=1:n
    if x(t)<cl && x(t)>-1*cl %those within center clipped region set to 0
        y(t)=0;
    else
        y(t)=x(t);
    end
end
auto_corr_y=xcorr(y) %auto correlation
figure(2)
clf
subplot(3,1,1),plot(x)
ylabel('x=centered wave')
subplot(3,1,2),plot(y)
ylabel('y=center clipped wave')
hold on
subplot(3,1,3),plot(auto_corr_y)
ylabel('auto correlation of y')
xlabel('time ')
max_list=max(y)
fs
disp('orginal_x '), disp(orginal_x)
disp('x =centered_wave =orginal_x-mean_x '), disp(x)
disp('cl=center clipped range'), disp(cl)
disp('y =center clipped signal'), disp(y)