Pitch Estimation by Enhanced Super Resolution Determinator

By Sunya Santananchai and Chia-Ho Ling

Objective

Estimate the value of the fundamental frequency (F0) of speech by using the Enhanced Super Resolution F0 Determinator (eSRFD).

Introduction

The fundamental frequency of speech is defined as the rate of glottal pulses generated by the vibration of the vocal folds.

The pitch of speech is the perceptual correlate of fundamental frequency.

The fundamental frequency of speech is important in the prosodic features of stress and intonation.

Fundamental frequency determination algorithms (FDAs)

FDAs determine the fundamental frequency of a speech waveform, that is, they analyse the pitch automatically.

It is desirable to examine methods of fundamental frequency extraction which use radically different techniques.

The algorithms to determine the F0:

Cepstrum-based F0 determinator (CFD) (Noll, 1969)
Harmonic product spectrum (HPS) (Schroeder, 1968; Noll, 1970)
Feature-based F0 tracker (FBFT) (Phillips, 1985)
Parallel processing method (PP) (Gold & Rabiner, 1969)
Integrated F0 tracking algorithm (IFTA) (Secrest & Doddington, 1983)
Super resolution F0 determinator (SRFD) (Medan et al., 1991)

Enhanced Super Resolution F0 Determinator (eSRFD)

The eSRFD is based on the SRFD method, which uses a waveform similarity metric: the normalized cross-correlation coefficient.

It improves the performance of the SRFD algorithm by reducing the occurrence of errors.

The eSRFD algorithm

The speech waveform is initially low-pass filtered.

Each frame of filtered sample data is processed by the silence detector.

The signal is analysed frame-by-frame, at non-overlapping intervals of 6.4 ms.

Each frame contains a set of samples, which is divided into three consecutive segments:

$$ s_N = \{\, s(i) \mid i = -N_{\max}, \ldots, N - N_{\max} \,\} $$

$$ x_n = \{\, x(i) = s(i - n) \mid i = 1, \ldots, n \,\} $$

$$ y_n = \{\, y(i) = s(i) \mid i = 1, \ldots, n \,\} $$

$$ z_n = \{\, z(i) = s(i + n) \mid i = 1, \ldots, n \,\} $$

Figure: Analysis segments for the enhanced super resolution F0 determinator.
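The segment definitions can be illustrated with a small Python sketch (Python is used here for illustration only; it is not the MATLAB implementation of this project). The function name `split_segments` and the storage convention, where list position `k` holds sample s(k - N_max), are assumptions made for the example.

```python
def split_segments(samples, n_max, n):
    """Return the segments (x, y, z) for a trial period of n samples.

    Assumed storage convention: samples[k] holds s(k - n_max),
    so s(0) sits at list index n_max and s(1) at index n_max + 1.
    """
    if not 1 <= n <= n_max:
        raise ValueError("n must lie in 1..n_max")
    origin = n_max + 1  # list index of s(1)
    if origin + 2 * n > len(samples):
        raise ValueError("frame too short for this n")
    x = samples[origin - n:origin]          # x(i) = s(i - n), i = 1..n
    y = samples[origin:origin + n]          # y(i) = s(i),     i = 1..n
    z = samples[origin + n:origin + 2 * n]  # z(i) = s(i + n), i = 1..n
    return x, y, z


# Toy frame in which s(i) = i, so the segment contents are easy to read.
frame = list(range(-4, 9))
print(split_segments(frame, n_max=4, n=3))
```

With this toy frame the three segments are consecutive runs of three samples each, which is the layout the figure describes.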

Normalized cross-correlation for ‘voiced’ frames

If a frame of data is not classified as silent or unvoiced, then candidate values for the fundamental period are sought by using the first normalized cross-correlation coefficient $p_{x,y}(n)$:

$$ p_{x,y}(n) = \frac{\sum_{j=1}^{\lfloor n/L \rfloor} x(jL) \cdot y(jL)}{\sqrt{\sum_{j=1}^{\lfloor n/L \rfloor} x(jL)^2 \cdot \sum_{j=1}^{\lfloor n/L \rfloor} y(jL)^2}} $$

Only every L-th sample of each segment enters the sums.
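A minimal Python sketch of the coefficient $p_{x,y}(n)$, assuming the segments x and y are already available as equal-length lists. The function name `ncc_decimated` and the choice to return 0.0 when a segment has no energy are assumptions for the example, not part of the original algorithm description.

```python
import math


def ncc_decimated(x, y, step):
    """Normalized cross-correlation over every step-th sample.

    The formula uses 1-based samples x(jL), j = 1..floor(n/L),
    which map to the 0-based list positions j*step - 1.
    """
    n = min(len(x), len(y))
    positions = [j * step - 1 for j in range(1, n // step + 1)]
    numerator = sum(x[i] * y[i] for i in positions)
    energy_x = sum(x[i] ** 2 for i in positions)
    energy_y = sum(y[i] ** 2 for i in positions)
    denominator = math.sqrt(energy_x * energy_y)
    # Assumed convention: a segment with zero energy gives coefficient 0.
    return numerator / denominator if denominator > 0 else 0.0


print(ncc_decimated([1, 2, 3, 4], [1, 2, 3, 4], step=2))
print(ncc_decimated([1, 2, 3, 4], [-1, -2, -3, -4], step=2))
```

Identical segments give a coefficient of 1 and sign-inverted segments give -1, so the value measures waveform similarity independently of amplitude.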

Threshold for candidate values

Candidate values of the fundamental period are obtained by locating peaks in the normalized cross-correlation coefficient for which the value of $p_{x,y}(n)$ exceeds a specified threshold, $T_{srfd}$.

A second normalized cross-correlation coefficient

A frame which has $p_{x,y}(n) > T_{srfd}$ is classified as ‘voiced’. For such a frame, the second normalized cross-correlation coefficient $p_{y,z}(n)$ is determined:

$$ p_{y,z}(n) = \frac{\sum_{j=1}^{\lfloor n/L \rfloor} y(jL) \cdot z(jL)}{\sqrt{\sum_{j=1}^{\lfloor n/L \rfloor} y(jL)^2 \cdot \sum_{j=1}^{\lfloor n/L \rfloor} z(jL)^2}} $$
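The peak-picking step that produces the candidates can be sketched in Python as follows. The function name `find_candidates`, the treatment of a plateau, and the decision to ignore the two end points of the search range are assumptions for the example; the threshold is passed in as a parameter because its value is not given here.

```python
def find_candidates(coeffs, n_min, threshold):
    """Return candidate periods at local peaks that exceed the threshold.

    coeffs[k] is assumed to hold the coefficient for period n_min + k.
    End points are skipped because they have only one neighbour.
    """
    candidates = []
    for k in range(1, len(coeffs) - 1):
        is_peak = coeffs[k - 1] < coeffs[k] >= coeffs[k + 1]
        if is_peak and coeffs[k] > threshold:
            candidates.append(n_min + k)
    return candidates


values = [0.1, 0.5, 0.9, 0.6, 0.2, 0.7, 0.3]
print(find_candidates(values, n_min=20, threshold=0.8))
print(find_candidates(values, n_min=20, threshold=0.6))
```

Lowering the threshold admits the second, weaker peak as an additional candidate. A frame for which the list comes back empty has no candidate period.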

Candidate score from $p_{y,z}(n)$

Candidates for which $p_{y,z}(n)$ exceeds the threshold $T_{srfd}$ are given a score of 2; the others are given a score of 1.

If there are one or more candidates with a score of 2 in a frame, then all candidates with a score of 1 are removed from the list of candidates.

If there is only one candidate (with a score of 1 or 2), that candidate is assumed to be the best estimate of the fundamental period of that frame.
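The scoring and pruning rule can be sketched in Python. The function name `score_candidates` and the dictionary representation (candidate period mapped to its second coefficient) are assumptions for the example.

```python
def score_candidates(p_yz, threshold):
    """Score each candidate, then drop score-1 candidates if any score 2.

    p_yz maps a candidate period n to its second coefficient value.
    Returns a dict mapping each surviving candidate to its score.
    """
    scores = {n: (2 if value > threshold else 1) for n, value in p_yz.items()}
    if any(score == 2 for score in scores.values()):
        scores = {n: score for n, score in scores.items() if score == 2}
    return scores


print(score_candidates({40: 0.9, 80: 0.5, 120: 0.85}, threshold=0.8))
print(score_candidates({40: 0.3, 80: 0.5}, threshold=0.8))
```

In the first call the weak candidate at 80 is removed because stronger candidates exist; in the second call no candidate reaches a score of 2, so both are kept with a score of 1.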

Otherwise, an optimal fundamental period is sought from the set of remaining candidates by calculating the coefficient $q(n_m)$ of each candidate:

$$ q(n_m) = \frac{\sum_{j=1}^{n_M} s(j - n_M) \cdot s(j + n_m)}{\sqrt{\sum_{j=1}^{n_M} s(j - n_M)^2 \cdot \sum_{j=1}^{n_M} s(j + n_m)^2}} $$

Here $n_m$ denotes the m-th candidate and $n_M$ the M-th (last) of the M remaining candidates.

The first coefficient, $q(n_1)$, is assumed to be the optimal value. If a subsequent $q(n_m)$ multiplied by 0.77 is greater than the current optimal value, that subsequent $q(n_m)$ becomes the optimal value.
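The 0.77 selection rule can be sketched in Python, assuming the q coefficients have already been computed for each candidate. The function name `pick_optimal` and the data layout (an ordered list of candidates plus a dictionary of coefficients) are assumptions for the example.

```python
def pick_optimal(candidates, q):
    """Select the candidate period using the 0.77 weighting rule.

    candidates: periods in list order (first candidate first).
    q: dict mapping each candidate period to its q coefficient.
    """
    best = candidates[0]  # the first coefficient starts as the optimum
    for n in candidates[1:]:
        # A later candidate wins only if it is still larger after
        # being scaled down by 0.77.
        if q[n] * 0.77 > q[best]:
            best = n
    return best


print(pick_optimal([40, 80, 120], {40: 0.6, 80: 0.7, 120: 0.95}))
print(pick_optimal([40, 80], {40: 0.9, 80: 0.95}))
```

The scaling favours candidates that come earlier in the list: a later candidate must beat the current optimum by a clear margin before it replaces it.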

When there is only one candidate with a score of 1 and no candidate with a score of 2, the status of the frame is reconsidered depending on the state of the previous frame.

If the previous frame is ‘silent’, the current value is held and the decision depends on the next frame.

If the next frame is also ‘silent’, the current frame is considered ‘silent’.

Otherwise, the current frame is considered ‘voiced’ and the held value is taken as a good estimate for the current frame.
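The hold-and-decide logic for a lone score-1 candidate can be sketched in Python. The function name, the state strings, and the behaviour when the previous frame is not ‘silent’ (the candidate is simply accepted, as for any single candidate) are assumptions for the example.

```python
def resolve_single_weak_candidate(prev_state, next_state, candidate):
    """Decide the state of a frame whose only candidate has score 1.

    Returns (state, period); period is None for a silent frame.
    """
    if prev_state != "silent":
        # Assumption: with a non-silent previous frame the single
        # candidate is accepted without waiting for the next frame.
        return "voiced", candidate
    # Previous frame silent: the value was held until the next frame is known.
    if next_state == "silent":
        return "silent", None
    return "voiced", candidate


print(resolve_single_weak_candidate("silent", "silent", 64))
print(resolve_single_weak_candidate("silent", "voiced", 64))
```

An isolated weak candidate surrounded by silence is therefore discarded, while one that is followed by a non-silent frame is kept.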

Modification: apply biasing to $p_{x,y}(n)$ and $p_{y,z}(n)$

Biasing is applied if the following conditions are met:

The two previous frames were classified as ‘voiced’.
The F0 value of the previous frame is not being temporarily held.
The F0 of the previous frame is less than 7/4 of the F0 of its preceding voiced frame, and greater than 5/8 of that same value.

The biasing tends to increase the percentage of unvoiced regions of speech being incorrectly classified as ‘voiced’.
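The biasing conditions can be written as a single test in Python. This sketch assumes that both the 7/4 and the 5/8 bounds are relative to the F0 of the preceding voiced frame; the function and parameter names are hypothetical. It only decides whether biasing applies, since the form of the bias itself is not described here.

```python
def biasing_applies(prev_two_states, prev_is_held, f0_prev, f0_prev_voiced):
    """Return True when all three biasing conditions hold.

    prev_two_states: states of the two previous frames.
    prev_is_held: True if the previous frame's F0 is temporarily held.
    f0_prev: F0 of the previous frame (Hz).
    f0_prev_voiced: F0 of the voiced frame preceding it (Hz).
    """
    both_voiced = all(state == "voiced" for state in prev_two_states)
    # Assumption: both bounds use the same reference value.
    in_range = 5 / 8 * f0_prev_voiced < f0_prev < 7 / 4 * f0_prev_voiced
    return both_voiced and not prev_is_held and in_range


print(biasing_applies(("voiced", "voiced"), False, 120.0, 100.0))
print(biasing_applies(("voiced", "voiced"), False, 180.0, 100.0))
```

With a reference of 100 Hz the accepted range is 62.5 Hz to 175 Hz, so 120 Hz qualifies and 180 Hz does not.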

Calculate the fundamental period

The fundamental period for the frame is estimated by calculating $r_{x,y}(n)$, which uses every sample of the segments:

$$ r_{x,y}(n) = \frac{\sum_{j=1}^{n} x(j) \cdot y(j)}{\sqrt{\sum_{j=1}^{n} x(j)^2 \cdot \sum_{j=1}^{n} y(j)^2}} $$
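A Python sketch of how $r_{x,y}(n)$ could be used to settle on a period. It assumes the final period is the n with the largest coefficient within a small neighbourhood of the chosen candidate; the search radius, the function names, and the storage convention (list position k holds s(k - N_max)) are assumptions, since the search range is not stated here.

```python
import math


def ncc(x, y):
    """Normalized cross-correlation over every sample of x and y."""
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return numerator / denominator if denominator > 0 else 0.0


def refine_period(samples, n_max, candidate, radius=2):
    """Return (n, r) with the highest coefficient near the candidate.

    samples[k] is assumed to hold s(k - n_max); radius is hypothetical.
    """
    origin = n_max + 1  # list index of s(1)
    best_n, best_r = candidate, -2.0
    low = max(1, candidate - radius)
    high = min(n_max, candidate + radius)
    for n in range(low, high + 1):
        x = samples[origin - n:origin]  # s(1 - n) .. s(0)
        y = samples[origin:origin + n]  # s(1) .. s(n)
        r = ncc(x, y)
        if r > best_r:
            best_n, best_r = n, r
    return best_n, best_r


# Synthetic frame: a sinusoid with a period of exactly 10 samples.
n_max = 20
frame = [math.sin(2 * math.pi * (k - n_max) / 10) for k in range(61)]
period, r = refine_period(frame, n_max, candidate=9)
print(period, round(r, 3))
```

A period of n samples corresponds to a fundamental frequency of sample rate divided by n, so the refined period converts directly to F0 in Hz once the sample rate is known.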

Implementation

This report covers the eSRFD algorithm and its implementation in MATLAB ver 7.2b, programmed by following the eSRFD algorithm.

The Result

Conclusion

The acoustic correlate of pitch is the fundamental frequency of speech.

The enhanced SRFD (eSRFD) improves the performance of the SRFD by reducing the occurrence of errors involved in the extraction of fundamental frequency [1].

Errors still occur in the results, and how often they occur depends on the kind of speech waveform.

In addition, the results of this project contain more errors than Paul Bagshaw’s results [2], because of problems in the design and programming of our implementation of the eSRFD algorithm.

References

[1] Bagshaw, P. C. (1994). Automatic prosodic analysis for computer aided pronunciation teaching. The University of Edinburgh.

[2] Bagshaw, P. C., Hiller, S. M., and Jack, M. A. (1993). Enhanced pitch tracking and the processing of F0 contours for computer aided intonation teaching. In Proc. Eurospeech '93, Berlin, volume 2, pages 1003-1006. International Speech Communication Association.
