Pitch Estimation by Enhanced Super Resolution Determinator
By Sunya Santananchai and Chia-Ho Ling
Objective
Estimate the value of the fundamental frequency of speech (F0) using the enhanced super resolution F0 determinator (eSRFD).
Introduction
The fundamental frequency of speech is defined as the rate of glottal pulses generated by the vibration of the vocal folds.
The pitch of speech is the perceptual correlate of the fundamental frequency.
The fundamental frequency of speech is important in the prosodic features of stress and intonation.
Fundamental frequency determination algorithms (FDAs)
FDAs determine the fundamental frequency of a speech waveform, that is, they analyse the pitch automatically.
It is desirable to examine methods of fundamental frequency extraction that use radically different techniques.
The algorithms to determine the F0:
Cepstrum-based determinator (CFD) (Noll, 1969)
Harmonic product spectrum (HPS) (Schroeder, 1968; Noll, 1970)
Feature-based tracker (FBFT) (Phillips, 1985)
Parallel processing method (PP) (Gold & Rabiner, 1969)
Integrated tracking algorithm (IFTA) (Secrest & Doddington, 1983)
Super resolution determinator (SRFD) (Medan et al., 1991)
Enhanced Super Resolution determinator (eSRFD)
The eSRFD is based on the SRFD method, which uses a waveform similarity metric: the normalized cross-correlation coefficient.
It enhances the performance of the SRFD algorithm to reduce the occurrence of errors.
The eSRFD algorithm
Pass the speech waveform through a low-pass filter: the speech waveform is initially low-pass filtered.
Each frame of filtered sample data is processed by the silence detector.
The signal is analysed frame by frame, at non-overlapping intervals of 6.4 ms.
Each frame contains a set of samples, $s_N$, divided into 3 consecutive segments, $x_n$, $y_n$ and $z_n$.
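The 6.4 ms non-overlapping interval becomes a sample count once a sampling rate is fixed. The sketch below shows that arithmetic only. The 20 kHz sampling rate and the helper name `frame_starts` are assumptions for illustration; the slides do not state a sampling rate.

```python
def frame_starts(num_samples, sample_rate_hz, interval_s=0.0064):
    """Return the start index of each non-overlapping analysis interval."""
    step = round(sample_rate_hz * interval_s)  # samples per 6.4 ms interval
    if step <= 0:
        raise ValueError("sample rate too low for the chosen interval")
    # Only keep intervals that fit completely inside the signal.
    return list(range(0, num_samples - step + 1, step))


# Assumed example: at 20 kHz, one 6.4 ms interval spans 128 samples.
starts = frame_starts(num_samples=1000, sample_rate_hz=20000)
```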
$$ s_N = \{\, s(i) \mid i \in -N_{\max}, \ldots, N - N_{\max} \,\} $$
$$ x_n = \{\, x(i) = s(i - n) \mid i \in 1, \ldots, n \,\} $$
$$ y_n = \{\, y(i) = s(i) \mid i \in 1, \ldots, n \,\} $$
$$ z_n = \{\, z(i) = s(i + n) \mid i \in 1, \ldots, n \,\} $$
Figure: Analysis segments for the enhanced super resolution F0 determinator.
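The three analysis segments can be read as consecutive slices of the sample buffer around a time origin. A minimal sketch, assuming the signal is a Python list and that `center` (a hypothetical parameter) is the list index that plays the role of time 0 in the definitions:

```python
def analysis_segments(s, center, n):
    """Split the samples around `center` into x_n, y_n, z_n for a trial period n.

    Reading s(t) as s[center + t]:
      x(i) = s(i - n), y(i) = s(i), z(i) = s(i + n), for i = 1..n.
    """
    if center - n + 1 < 0 or center + 2 * n >= len(s):
        raise ValueError("not enough samples around center for this n")
    x = [s[center + i - n] for i in range(1, n + 1)]
    y = [s[center + i] for i in range(1, n + 1)]
    z = [s[center + i + n] for i in range(1, n + 1)]
    return x, y, z


# With s[k] = k the slices are easy to check by eye.
x, y, z = analysis_segments(list(range(20)), center=5, n=3)
```

The three segments are adjacent and non-overlapping, so a signal that is periodic with period n gives x, y and z with the same shape.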
Normalized cross-correlation for a ‘voiced’ frame:
If the frame of data is not classified as silence or unvoiced, then candidate values for the fundamental period are found using the first normalized cross-correlation coefficient $p_{x,y}(n)$:
$$ p_{x,y}(n) = \frac{\sum_{j=1}^{\lfloor n/L \rfloor} x(jL)\, y(jL)}{\sqrt{\sum_{j=1}^{\lfloor n/L \rfloor} x(jL)^2 \cdot \sum_{j=1}^{\lfloor n/L \rfloor} y(jL)^2}} $$
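The coefficient above can be computed directly from two equal-length segments. In this sketch, `L` is read as a decimation step (every L-th sample enters the sums); that reading is an assumption, since the slides do not define `L`. The function name and the zero-energy fallback are also illustrative choices.

```python
import math


def normalized_xcorr(x, y, n, L=1):
    """p_{x,y}(n): sums run over j = 1..floor(n / L) on samples x(jL), y(jL)."""
    num = sxx = syy = 0.0
    for j in range(1, n // L + 1):
        xj = x[j * L - 1]  # the formula is 1-based, Python lists are 0-based
        yj = y[j * L - 1]
        num += xj * yj
        sxx += xj * xj
        syy += yj * yj
    denom = math.sqrt(sxx * syy)
    # Assumed convention: report 0 when either segment has no energy.
    return num / denom if denom > 0 else 0.0


same = normalized_xcorr([1, 2, 3, 4], [1, 2, 3, 4], n=4, L=2)
flipped = normalized_xcorr([1, 2, 3, 4], [-1, -2, -3, -4], n=4)
```

Identical segments give a coefficient of 1 and sign-inverted segments give -1, which is why a value close to 1 indicates that n is a plausible period.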
Definition of the threshold for candidate values
Candidate values of the fundamental period are obtained by locating peaks in the normalized cross-correlation coefficient for which the value of $p_{x,y}(n)$ exceeds a specified threshold $T_{srfd}$.
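Peak picking above a threshold can be sketched as a scan for local maxima. The threshold value 0.75, the coefficient values and the function name are hypothetical; the slides give no numeric value for the threshold.

```python
def candidate_periods(p, n_min, threshold):
    """Return the lags at which p has a local peak above the threshold.

    p[k] holds the coefficient for lag n_min + k.
    """
    candidates = []
    for k in range(1, len(p) - 1):
        is_peak = p[k] > p[k - 1] and p[k] >= p[k + 1]
        if is_peak and p[k] > threshold:
            candidates.append(n_min + k)
    return candidates


# Hypothetical coefficient values for lags 40..47.
p = [0.1, 0.5, 0.9, 0.6, 0.2, 0.8, 0.95, 0.7]
found = candidate_periods(p, n_min=40, threshold=0.75)
```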
A second normalized cross-correlation coefficient, $p_{y,z}(n)$
A frame that has $p_{x,y}(n) > T_{srfd}$ is classified as ‘voiced’.
For such a frame, the second normalized cross-correlation coefficient $p_{y,z}(n)$ is determined:
$$ p_{y,z}(n) = \frac{\sum_{j=1}^{\lfloor n/L \rfloor} y(jL)\, z(jL)}{\sqrt{\sum_{j=1}^{\lfloor n/L \rfloor} y(jL)^2 \cdot \sum_{j=1}^{\lfloor n/L \rfloor} z(jL)^2}} $$
Candidate score for $p_{y,z}(n)$
Candidates for which $p_{y,z}(n)$ exceeds the threshold $T_{srfd}$ are given a score of 2; the others are given a score of 1.
If there are 1 or more candidates with a score of 2 in a frame, then all candidates with a score of 1 are removed from the list of candidates.
If there is only one candidate (with a score of 1 or 2), that candidate is assumed to be the best estimate of the fundamental period of that frame.
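The scoring and pruning rules can be sketched as a small filter over the candidate list. The dictionary layout, the function name and the numeric values are assumptions for illustration only.

```python
def score_candidates(candidates, p_yz, threshold):
    """Score each candidate lag, then drop score-1 entries if any score 2 exists.

    candidates: list of lags; p_yz: dict mapping lag -> second coefficient.
    Returns a list of (lag, score) pairs.
    """
    scored = [(n, 2 if p_yz[n] > threshold else 1) for n in candidates]
    if any(score == 2 for _, score in scored):
        scored = [(n, score) for n, score in scored if score == 2]
    return scored


# Hypothetical values: lags 42 and 84 pass the second test, lag 46 does not.
kept = score_candidates([42, 46, 84], {42: 0.9, 46: 0.5, 84: 0.8}, threshold=0.75)
weak = score_candidates([42, 46], {42: 0.3, 46: 0.5}, threshold=0.75)
```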
Otherwise, an optimal fundamental period is sought from the set of remaining candidates by calculating the coefficient $q(n_m)$ of each candidate:
$$ q(n_m) = \frac{\sum_{j=1}^{n_M} s(j - n_M)\, s(j + n_m)}{\sqrt{\sum_{j=1}^{n_M} s(j - n_M)^2 \cdot \sum_{j=1}^{n_M} s(j + n_m)^2}} $$
The first coefficient, $q(n_1)$, is assumed to be the optimal value. If a subsequent $q(n_m)$ multiplied by 0.77 is greater than the current optimal value, that subsequent $q(n_m)$ becomes the optimal value.
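The selection among several remaining candidates can be sketched in two parts: computing q for one candidate, and walking the candidates with the 0.77 factor. Several points here are assumptions: `origin` is the list index standing in for time 0, `n_M` is passed in by the caller because the slides do not define it, and after a replacement the running optimum is taken to be the unscaled q of the new candidate.

```python
import math


def q_coefficient(s, origin, n_m, n_M):
    """q(n_m), reading s(t) as s[origin + t]."""
    num = saa = sbb = 0.0
    for j in range(1, n_M + 1):
        a = s[origin + j - n_M]
        b = s[origin + j + n_m]
        num += a * b
        saa += a * a
        sbb += b * b
    denom = math.sqrt(saa * sbb)
    return num / denom if denom > 0 else 0.0


def pick_optimal(candidates, q_values):
    """Start from the first candidate; a later one wins if 0.77 * q beats the optimum."""
    best_n, best_q = candidates[0], q_values[0]
    for n, q in zip(candidates[1:], q_values[1:]):
        if q * 0.77 > best_q:
            best_n, best_q = n, q
    return best_n


# A toy signal with period 4: lag 4 matches perfectly, lag 2 is inverted.
s = [0, 1, 0, -1] * 10
q4 = q_coefficient(s, origin=20, n_m=4, n_M=8)
q2 = q_coefficient(s, origin=20, n_m=2, n_M=8)
chosen = pick_optimal([40, 80, 120], [0.6, 0.7, 0.9])
```

The factor of 0.77 favours earlier candidates: a later one must be clearly better, not just marginally better, to take over.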
If a frame has only one candidate, with a score of 1, and no candidate with a score of 2, the frame's status is reconsidered depending on the state of the previous frame.
If the previous frame is ‘silent’, the current value is held, and the decision depends on the next frame.
If the next frame is also ‘silent’, the current frame is considered ‘silent’.
Otherwise, the current frame is considered ‘voiced’ and the held value is taken as a good estimate for the current frame.
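The hold-and-decide rule for a lone score-1 candidate can be sketched as a function of the neighbouring frame states. The state strings and the function name are hypothetical. The case where the previous frame is not ‘silent’ is not spelled out in the slides; the sketch assumes the candidate is then accepted as ‘voiced’.

```python
def resolve_held_frame(prev_state, next_state):
    """Decide the state of a frame whose only candidate has a score of 1."""
    if prev_state == "silent":
        # The value is held until the next frame is known.
        if next_state == "silent":
            return "silent"
        return "voiced"
    # Assumption: with a non-silent previous frame the candidate is accepted.
    return "voiced"


isolated = resolve_held_frame("silent", "silent")
onset = resolve_held_frame("silent", "voiced")
```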
Modification: apply biasing to $p_{x,y}(n)$ and $p_{y,z}(n)$
Biasing is applied if the following conditions are met:
The two previous frames were classified as ‘voiced’.
The $f_0$ value of the previous frame is not being temporarily held.
The $f_0$ of the previous frame is less than 7/4 of the $f_0$ of its preceding voiced frame, and greater than 5/8 of it.
The biasing tends to increase the percentage of unvoiced regions of speech being incorrectly classified as ‘voiced’.
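The three biasing conditions can be sketched as one boolean check. The sketch reads the compared quantity as the F0 value of each frame; the argument names and the example values in Hz are assumptions, and the bias itself is not specified in the slides and is not modelled.

```python
def biasing_applies(prev_states, prev_f0, prev_prev_f0, prev_is_held):
    """Return True when all three conditions for biasing hold.

    prev_states: states of the two previous frames, oldest first.
    prev_f0: F0 of the previous frame.
    prev_prev_f0: F0 of the voiced frame preceding it.
    prev_is_held: True if the previous frame's value is temporarily held.
    """
    if list(prev_states) != ["voiced", "voiced"]:
        return False
    if prev_is_held:
        return False
    return (5 / 8) * prev_prev_f0 < prev_f0 < (7 / 4) * prev_prev_f0


# Hypothetical values: 120 Hz after 100 Hz lies inside (62.5, 175).
ok = biasing_applies(["voiced", "voiced"], 120.0, 100.0, prev_is_held=False)
jump = biasing_applies(["voiced", "voiced"], 180.0, 100.0, prev_is_held=False)
```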
Calculate the fundamental period:
The fundamental period for the frame is estimated by calculating $r_{x,y}(n)$:
$$ r_{x,y}(n) = \frac{\sum_{j=1}^{n} x(j)\, y(j)}{\sqrt{\sum_{j=1}^{n} x(j)^2 \cdot \sum_{j=1}^{n} y(j)^2}} $$
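Unlike $p_{x,y}(n)$, this coefficient uses every sample of the segments. One way to use it, sketched below, is to evaluate it for lags near a coarse candidate and keep the lag with the largest value. The search radius, the `center` index and the function names are assumptions; the slides only state that $r_{x,y}(n)$ is calculated.

```python
import math


def r_xy(s, center, n):
    """Full-resolution coefficient with x(j) = s(j - n), y(j) = s(j), s(t) = s[center + t]."""
    num = sxx = syy = 0.0
    for j in range(1, n + 1):
        xj = s[center + j - n]
        yj = s[center + j]
        num += xj * yj
        sxx += xj * xj
        syy += yj * yj
    denom = math.sqrt(sxx * syy)
    return num / denom if denom > 0 else 0.0


def refine_period(s, center, n_coarse, radius):
    """Return the lag within n_coarse +/- radius that maximizes r_xy."""
    lags = range(max(1, n_coarse - radius), n_coarse + radius + 1)
    return max(lags, key=lambda n: r_xy(s, center, n))


# A sine with a period of exactly 10 samples; the coarse guess is off by one.
signal = [math.sin(2 * math.pi * t / 10) for t in range(100)]
period = refine_period(signal, center=50, n_coarse=9, radius=2)
```

Dividing the sampling rate by the refined period in samples would then give the F0 estimate for the frame.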
Implementation
This report covers the eSRFD algorithm and its implementation in MATLAB ver 7.2b, with the program following the eSRFD algorithm.
The Result
Conclusion
The acoustic correlate of pitch is the fundamental frequency of speech.
The enhanced SRFD (eSRFD) improves the performance of the SRFD and can reduce the occurrence of errors involved in the extraction of the fundamental frequency [1].
Errors still occur in the result, and how many depends on the kind of speech waveform.
In addition, the result in this project has more errors than Paul Bagshaw's result [2], because of problems in going from the design to a programmed implementation of the eSRFD algorithm.
References
[1] Bagshaw, Paul Christopher (1994). Automatic prosodic analysis for computer aided pronunciation teaching. The University of Edinburgh.
[2] Bagshaw, Paul C., Hiller, S. M., and Jack, Mervyn A. (1993). Enhanced pitch tracking and the processing of f0 contours for computer aided intonation teaching. In Proc. Eurospeech '93, Berlin, volume 2, pages 1003-1006. International Speech Communication Association.