5707_4_pitch

  • Chapter 4: Pitch estimation for music signal processing. KH Wong. Ch4. pitch, v3.c

  • Introduction (lecture 4)
    Pitch estimation is essential to many music signal applications:
      - Genre classification
      - Music tutor: detection of playing faults
      - Music style analysis
      - Automatic transcription: audio signal to music score

  • Techniques in pitch extraction
    Time domain approaches:
      (1) ACF (Autocorrelation function) and MACF (Modified Autocorrelation function)
      (2) NCCF (Normalized cross correlation function)
      (3) AMDF (Average magnitude difference function)
    Frequency domain approaches:
      (4) CPD (Cepstrum Pitch Determination)

  • Definition of pitch
    What is the pitch of a tone?
    Answer: the perceived frequency of a sound (Wikipedia).

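  • Method 1: ACF (Autocorrelation function)
    For a frame x(0), ..., x(N-1), the autocorrelation at lag m is
    $$R(m) = \sum_{n=0}^{N-1-m} x(n)\,x(n+m), \qquad m = 0, 1, \dots, N-1$$
    [Figure: R(m) plotted against lag m; the curve is symmetrical on both sides of m = 0.]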

  • What is auto-correlation, R(m)?
    E.g. x = [1 5 7 1 4], N = 5.
    R(0) = x(0)*x(0) + x(1)*x(1) + x(2)*x(2) + x(3)*x(3) + x(4)*x(4)
         = 1 + 25 + 49 + 1 + 16 = 92
    R(1) = x(0)*x(1) + x(1)*x(2) + x(2)*x(3) + x(3)*x(4)
         = 5 + 35 + 7 + 4 = 51
    And so on: R = [92 51 40 21 4]
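    The hand calculation above can be checked with a short loop. This is a minimal sketch that avoids xcorr so it runs without any toolbox; the variable names are illustrative, and only positive lags are computed.

```matlab
% Autocorrelation R(m) = sum over n of x(n)*x(n+m), for lags m = 0..N-1.
% MATLAB indices start at 1, so R(m+1) stores the value for lag m.
x = [1 5 7 1 4];
N = numel(x);
R = zeros(1, N);
for m = 0:N-1
    R(m+1) = sum(x(1:N-m) .* x(1+m:N));   % overlap of x with x shifted by m
end
disp(R)   % lags 0..4: 92 51 40 21 4
```

    xcorr(x) would return the same values for the positive lags, mirrored around lag 0.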

  • Exercise 4.1
    First, what is auto-correlation?
    % matlab code
    x = [1 5 7 1 4 8 6 2 4 9 3]'
    auto_corr_x = xcorr(x)   % auto-correlation
    figure(1), clf
    subplot(2,1,1), plot(x)
    grid on, grid(gca,'minor'), hold on
    subplot(2,1,2), plot(auto_corr_x)
    grid on, grid(gca,'minor')
    Exercise: show the steps of the calculation.
    [Figure: top panel x[t] against t; bottom panel Auto_correlation(x[t]).]
    We only look at positive lags. The gap between two peaks is 4, so the period of x is around 4.
    Ans: ??

  • Autocorrelation
    When a segment of a signal is correlated with itself, the distance (the lag, in samples) between the positions of the maximum and the second-highest peak of the correlation is defined as the fundamental period (pitch period) of the signal.

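  • Then the fundamental frequency can be calculated as:
    $$f_0 = \frac{F_s}{m_2 - m_1}$$
    where F_s is the sampling frequency and m_1, m_2 are the lags (in samples) of the maximum and of the second-highest peak of R(m). Usually m_1 = 0, because the maximum of R(m) is at m = 0.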
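    As a rough illustration of this peak-spacing rule, the sketch below applies it to the signal of Exercise 4.1. The sampling frequency Fs = 1 Hz and the simple "higher than both neighbours" peak test are assumptions made for the example, not part of the slides.

```matlab
% Pitch from the spacing between the lag-0 maximum and the highest later peak.
x  = [1 5 7 1 4 8 6 2 4 9 3];   % signal from Exercise 4.1
Fs = 1;                          % assumed sampling frequency in Hz
N  = numel(x);
R  = zeros(1, N);
for m = 0:N-1
    R(m+1) = sum(x(1:N-m) .* x(1+m:N));   % R(m+1) holds lag m
end
idx     = 2:N-1;                                   % entries with two neighbours
isPeak  = R(idx) > R(idx-1) & R(idx) > R(idx+1);   % local maxima
peakIdx = idx(isPeak);
[~, k]  = max(R(peakIdx));                         % highest peak after lag 0
period  = peakIdx(k) - 1;                          % lag in samples (the maximum is at lag 0)
f0      = Fs / period;
fprintf('period = %d samples, f0 = %.2f Hz\n', period, f0);
% prints: period = 4 samples, f0 = 0.25 Hz
```

    Taking the largest value over all lags above 0 would pick lag 1 here (R = 214), which is why the sketch restricts the search to local maxima.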

  • Modified Auto-Correlation Method: the auto-correlation method enhanced by center clipping
    It gives a more accurate result because higher-frequency components no longer interfere with the peak picking.
    [Figure: x(n) against n with clipping levels +CL and -CL; the middle part between the two levels is cut (removed) to give y(n) = clc(x).]
    Typical CL = 1/4 of the peak-to-peak span of x.
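    Center clipping itself takes only a few lines. The sketch below assumes the clipping level is applied symmetrically around the mean of the signal, and uses the test signal of class exercise 4.2; the names x0, CL and y are illustrative.

```matlab
% Center clipping: samples inside (-CL, CL) around the mean are set to 0.
x0 = [1 3 7 2 1 9 3 1 8];     % test signal (from class exercise 4.2)
x  = x0 - mean(x0);           % remove the mean so the clipping is centered
CL = (max(x) - min(x)) / 4;   % 1/4 of the peak-to-peak span
y  = x;
y(abs(x) < CL) = 0;           % cut (remove) the middle part
disp(CL)                      % clipping level, 2 for this signal
disp(y)                       % samples 2, 4 and 7 are clipped to 0
```

    The large excursions survive unchanged, so the autocorrelation of y is dominated by the main period rather than by small fluctuations.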

  • Finding pitch by center clipping
    In R(m), the auto-correlation of x(n), it is not easy to pick peaks.
    In R(m), the auto-correlation of the clipped signal y(n) = clc{x(n)}, peaks are easy to pick.
    T = mean(T1, T2, T3) = period = 1/(pitch_frequency)
    [Figure: x(n) and its R(m); y(n) = center-clipped x(n) and its R(m), with the peak gaps T1, T2, T3 marked.]

  • The MACF (Modified Autocorrelation function) algorithm

  • Example
    For each frame, find a pitch.
    Plot pitch against time (blue) to see the pitch profile.
    [Figure: x(n) against time; Pitch(n) (frequency) against frame number n.]

  • Class exercise 4.2
    x = [1 3 7 2 1 9 3 1 8], sampling frequency Fs = 1 Hz.
    (a) Find the pitch of this signal x using ACF (Autocorrelation function).
    (b) Repeat (a) if Fs = 8 kHz.

  • Method 2: Normalized cross correlation function (NCCF) method [Verteletskaya 2009]

  • Method 3: Average Magnitude Difference Function (AMDF) method [Verteletskaya 2009]
    An intuitive method: just pick the peaks and find the period.
    Find the peaks in D; the estimated period is the average gap between two neighboring -ve peaks (valleys).
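    The slide's formula for D did not survive, so the sketch below assumes one common form of the AMDF, D(m) = (1/N) * sum over n of |x(n) - x(n+m)|, and a made-up test signal with a period of exactly 3 samples. D drops to a valley whenever the lag is a multiple of the period.

```matlab
% AMDF sketch: D(m) = (1/N) * sum |x(n) - x(n+m)|, assumed form.
x = repmat([1 3 7], 1, 4);     % hypothetical test signal, period = 3 samples
N = numel(x);
D = zeros(1, N);
for m = 0:N-1
    D(m+1) = sum(abs(x(1:N-m) - x(1+m:N))) / N;   % D(m+1) holds lag m
end
idx        = 2:N-1;                                   % entries with two neighbours
isValley   = D(idx) < D(idx-1) & D(idx) < D(idx+1);   % local minima
valleyLags = idx(isValley) - 1;                       % lags of the valleys
period     = mean(diff([0 valleyLags]));              % average gap, lag 0 included
fprintf('valleys at lags %s, period = %g samples\n', mat2str(valleyLags), period);
% prints: valleys at lags [3 6 9], period = 3 samples
```

    Unlike the ACF, the AMDF needs no multiplications, which is one reason it is often described as the cheaper of the two.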

  • Method 4: Cepstrum Pitch Determination (CPD) [Verteletskaya 2009]
    The problem: for human voice, the peak may be the result of glottal excitation.
    [Figure: cepstrum with a peak marked Q.]
    Peak at Q (0.006 s), so pitch = 1/0.006 ≈ 166.7 Hz.

  • For human voice pitch detection (or recognition)
    We must study the structure of the vocal system to find out how to get an accurate answer.
    The vocal system has 2 elements:
      - Glottal excitation (no use for pitch measurement)
      - Vocal tract filter
    Use liftering to remove the glottal excitation before using the spectrum of the vocal tract filter for pitch extraction.

  • Cepstrum of speech
    "Cepstrum" is a new word made by reversing the first 4 letters of "spectrum".
    It is the spectrum of a spectrum of a signal.
    Why do we need this? Answer: to remove the ripples of the spectrum caused by glottal excitation.
    [Figure: speech signal x, its Fourier transform, and the spectrum of x.]
    There are too many ripples in the spectrum, caused by vocal cord vibrations, but we are more interested in the speech envelope for recognition and reproduction.
    Source: http://isdl.ee.washington.edu/people/stevenschimmel/sphsc503/files/notes10.pdf

  • Liftering method: select the higher and lower samples
    Signal: x(n)
    Cepstrum:
    $$C(n) = \left|\,\mathrm{fft}\big(\log\left|\mathrm{fft}(x(n))\right|\big)\right|$$
    High-time liftering: select C_high (lower frequency), the glottal excitation.
    Low-time liftering: select C_low (higher frequency), the vocal tract filter response.
    Quefrency is in the time domain (in seconds), so a higher quefrency means a lower frequency.

  • Recover glottal excitation and vocal tract spectrum
    C_high for glottal excitation; C_low for vocal tract.
    [Figure: four panels. Cepstrum of glottal excitation and cepstrum of vocal tract, against quefrency (sample index); spectrum of glottal excitation and spectrum of vocal tract filter, against frequency.]
    Annotations on the figure: "This peak may be the pitch period"; "This smoothed vocal tract spectrum can be used to find pitch".
    For more information see: http://isdl.ee.washington.edu/people/stevenschimmel/sphsc503/files/notes10.pdf

  • Measure pitch of musical instruments
    Example: find the pitch of an oboe A4 sound, http://www.cse.cuhk.edu.hk/%7Ekhwong/www2/cmsc5707/A4_oboe.wav
    [Figure: spectrogram of A4_Oboe.]

  • Example: find the pitch of an oboe A4 sound
    Sound: http://www.cse.cuhk.edu.hk/%7Ekhwong/www2/cmsc5707/A4_oboe.wav
    Demo code: http://www.cse.cuhk.edu.hk/%7Ekhwong/www2/cmsc5707/demo_ceps_note_v3.zip
    Input: oboe A4, x(n)
    Fourier transform: X(w) = fft(x)
    Cepstrum: C(n) = |fft(log|fft(x(n))|)|
    [Figure: cepstrum C(n) over the whole range, from around 30 Hz; cepstrum C(n) zoomed to the range 200 to 900 Hz. The quefrency axis is in units of 10^-3 s: 200 Hz corresponds to 1/200 = 5x10^-3 s and 900 Hz to 1/900 = 1.11x10^-3 s.]
    The first peak of the cepstrum is at quefrency 0.002268 s, so F1 = 1/0.002268 = 440.91 Hz. It is the pitch and has the strongest energy.
    The second peak is at 0.004535 s, so F2 = 1/0.004535 = 220.507 Hz.
    Found two harmonics: 440 Hz and 220 Hz.
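    The same steps can be tried without the oboe recording. The sketch below is not the demo code linked above: it assumes a synthetic decaying pulse train (one pulse every 40 samples at an assumed Fs = 8000 Hz, so the true pitch is 200 Hz) and uses the cepstrum definition given on the slides.

```matlab
% Cepstrum pitch sketch on a synthetic signal (all values are assumptions).
Fs = 8000;                        % assumed sampling frequency in Hz
T  = 40;                          % pulse spacing in samples, i.e. 200 Hz
N  = 400;                         % frame length: 10 pitch periods
x  = zeros(1, N);
x(1:T:N) = 0.8 .^ (0:N/T-1);      % decaying pulse train
C  = abs(fft(log(abs(fft(x)))));  % C(n) = |fft(log|fft(x(n))|)|
q  = 1:N/2;                       % quefrency bins to search (bin 0 skipped)
[~, k]    = max(C(q + 1));        % C(q+1) holds quefrency bin q
quefrency = q(k) / Fs;            % peak position in seconds
pitch     = 1 / quefrency;        % pitch in Hz
fprintf('peak at %.4f s, pitch = %.1f Hz\n', quefrency, pitch);
% prints: peak at 0.0050 s, pitch = 200.0 Hz
```

    For a real recording the search range would normally be limited to the quefrencies of plausible pitches, as the 200 to 900 Hz zoom on the slide does.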

  • Summary
    Methods of pitch extraction have been studied.
    The cepstrum and its use for pitch extraction have been discussed.

  • References
    [Naotoshi Seo 2007] Naotoshi Seo, "Project: Pitch Detection", http://note.sonots.com/SciSoftware/Pitch.html#ke283f3a
    [Verteletskaya 2009] E. Verteletskaya, B. Šimák, "Performance Evaluation of Pitch Detection Algorithms", http://access.feld.cvut.cz/view.php?cisloclanku=2009060001
    [Rabiner1976] L. Rabiner, M. Cheng, A. Rosenberg, C. McGonegal, "A comparative performance study of several pitch detection algorithms", IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 24, no. 5, pp. 399-418, Oct. 1976

  • Appendix

  • Music frequency table
    http://wc.pima.edu/~manelson/MUS%20102/MIDI%20tunings%20per%20note.jpg

  • Music frequency table
    Source: http://www.angelfire.com/in2/yala/t4scales.htm

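  • Autocorrelation
    In signal processing, given a signal f(t), the continuous autocorrelation is the continuous cross-correlation of f(t) with itself, at lag τ, and is defined as:
    $$R_{ff}(\tau) = \int_{-\infty}^{\infty} f(t+\tau)\,\overline{f(t)}\,dt$$
    In a discrete system, the autocorrelation R at lag j for a signal x_n is defined as:
    $$R_{xx}(j) = \sum_{n} x_n\,\overline{x_{n-j}}$$
    The overline denotes the complex conjugate; for a real signal it can be dropped.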

  • Answer 4.1 (Exercise 4.1)
    x = [1 5 7 1 4 8 6 2 4 9 3]'
    auto_corr_x = xcorr(x)   % auto-correlation
    We only look at positive lags. The gap between two peaks is 4, so the period of x is around 4.
    Ans (R for lags 0, 1, 2, ...): [302 214 142 183 194 116 65 88 70 24 3 0]

  • Answer 4.2 for exercise 4.2
    This answer uses MACF; you can also use ACF, and the pitch found is the same for this example.
    Question: x = [1 3 7 2 1 9 3 1 8], sampled at 1 Hz. Find the pitch of this signal x using MACF (Modified Autocorrelation function).
    Answer:
    orginal_x = 1 3 7 2 1 9 3 1 8
    x = centered_wave = orginal_x - mean_x = -2.8889 -0.8889 3.1111 -1.8889 -2.8889 5.1111 -0.8889 -2.8889 4.1111
    cl = center clipped range = 2
    y = center clipped signal = -2.8889 0 3.1111 0 -2.8889 5.1111 0 -2.8889 4.1111
    (a) If the sampling frequency Fs = 1 Hz:
    Answer: from the autocorrelation of y in the figure, the distance between 2 peaks is 3 samples, so the pitch is 1/3 Hz, since the sampling frequency is 1 Hz.

  • Answer 4.2: Class exercise 4.2
    R = [24.3333, 9.6667, 8.2222, 16.3333, 6.5556, 4.5556, 6.8889, 2.7778, 0.8889]
    (These values are the autocorrelation of the original x divided by N = 9.)
    2nd diagram, R (+ve lags only): pick 2 peaks; the period is 3 samples, so the frequency = 1/3 Hz.
    (b) If Fs = 8 kHz:
    Answer: the sampling period is dt = 1/Fs = (1/8) ms. The period of x is 3 samples, so the actual period is 3*dt = 3*(1/8) ms, and the frequency of x is 1/(3*dt) = (8/3) kHz ≈ 2.67 kHz.
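    The numbers in this answer can be reproduced with plain ACF. The sketch below assumes the R values quoted above are the autocorrelation of the original x divided by N (the scaling that xcorr calls 'biased'), and reuses a simple local-maximum test to find the period; the variable names are illustrative.

```matlab
% Check of Answer 4.2: ACF scaled by 1/N, peak lag, and pitch for both Fs.
x = [1 3 7 2 1 9 3 1 8];
N = numel(x);
R = zeros(1, N);
for m = 0:N-1
    R(m+1) = sum(x(1:N-m) .* x(1+m:N)) / N;   % R(m+1) holds lag m
end
idx     = 2:N-1;                                   % entries with two neighbours
isPeak  = R(idx) > R(idx-1) & R(idx) > R(idx+1);   % local maxima
peakIdx = idx(isPeak);
[~, k]  = max(R(peakIdx));                         % highest peak after lag 0
period  = peakIdx(k) - 1;                          % period in samples
Fs      = [1 8000];                                % parts (a) and (b), in Hz
f0      = Fs / period;                             % pitch for each sampling frequency
fprintf('period = %d samples, f0 = %.4f Hz and %.1f Hz\n', period, f0(1), f0(2));
% prints: period = 3 samples, f0 = 0.3333 Hz and 2666.7 Hz
```

    The period in samples does not depend on Fs; only the conversion to Hz changes between parts (a) and (b).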

  • Matlab
    % Ver2, MACF (Modified Autocorrelation function) using center clipping
    % xcorr needs the Signal Processing Toolbox (or the Octave signal package)
    clear
    % select one of the following: 1 = real sound, 0 = test data
    real_data = 0;
    if real_data == 1   % use real sound
        % the original used wavread; audioread replaces it in current MATLAB
        % [orginal_x, fs] = audioread('d:\0music\sounds\violin3.wav');
        [orginal_x, fs] = audioread('violin3.wav');
        orginal_x = orginal_x(10000:11000, 1)';   % short segment, first channel
    else                % use test data
        % orginal_x = [1 2 5 6 7 6 1 0 4 3 4 8 6 7 3 2 4 9 3];
        orginal_x = [1 3 7 2 1 9 3 1 8];
        fs = 1;         % assume the sampling frequency is 1 Hz
    end
    x = orginal_x - mean(orginal_x);   % centered wave
    n = length(x);
    maxx = max(x);
    minx = min(x);
    dd = maxx - minx;                  % peak-to-peak span
    figure(1), clf, plot(x)
    % center clipping algorithm for pitch extraction
    if real_data == 1
        cl = dd/4000;
    else
        cl = dd/4;      % clipping level "cl" is 1/4 of the total peak-to-peak span
    end
    % Assume the signal x is voltage against time.
    % Center = mean voltage level of the whole signal (0 after centering).
    % Center clipping sets the samples inside the clipped region (-cl, cl)
    % around the center to 0 and leaves all other samples unchanged.
    y = zeros(1, n);
    for t = 1:n
        if x(t) < cl && x(t) > -cl   % within the center clipped region
            y(t) = 0;
        else
            y(t) = x(t);
        end
    end
    auto_corr_y = xcorr(y);            % auto correlation
    figure(2), clf
    subplot(3,1,1), plot(x), ylabel('x = centered wave')
    subplot(3,1,2), plot(y), ylabel('y = center clipped wave')
    subplot(3,1,3), plot(auto_corr_y), ylabel('auto correlation of y')
    xlabel('time')
    max_list = max(y)
    % print the values quoted in Answer 4.2
    fs, orginal_x, x, cl, y