1
A Spectral-Temporal Method for Pitch Tracking
Stephen A. Zahorian*, Princy Dikshit, Hongbing Hu*
Department of Electrical and Computer Engineering
Old Dominion University, Norfolk, VA 23529, USA.
* Currently at Binghamton University
09/17/2006
2
Outline
Introduction
Algorithm
  Algorithm overview
  The use of nonlinear processing
  Pitch tracking from the spectrum
Experimental evaluation
Conclusion
3
Introduction
Pitch (the fundamental frequency, F0) applications
  Automatic speech recognition (ASR), speech synthesis, speech articulation training aids, etc.
Pitch detection algorithms
  “Robust and accurate fundamental frequency estimation based on dominant harmonic components,” Nakatani et al. => High accuracy for noisy speech reported using the harmonic dominance spectrum
  “Yet another algorithm for pitch tracking (YAAPT),” Zahorian et al. => Hybrid spectral-temporal processing for pitch tracking
4
Algorithm Overview
[Figure: block diagram of the algorithm]
Original Speech -> Nonlinear processing -> Squared Value of Speech
FFT -> Spectrum -> Pitch Tracking -> Spectral F0 track
F0 candidates estimation -> F0 candidates (Original Speech), F0 candidates (Squared Value)
Candidates refinement -> Refined F0 Candidates
Final F0 determination using dynamic programming -> Final F0
5
The Use of Nonlinear Processing
Restoration of the missing fundamental in telephone speech
A periodic sound is characterized by the spectrum of its harmonics
The signal with the fundamental $b_1 \cos(\omega t)$ missing can be approximated by its 1st and 2nd harmonics as

$$y(t) = b_2 \cos(2\omega t) + b_3 \cos(3\omega t)$$

After squaring and applying trigonometric identities

$$y^2(t) = \frac{b_2^2 + b_3^2}{2} + b_2 b_3 \cos(\omega t) + \frac{b_2^2}{2} \cos(4\omega t) + b_2 b_3 \cos(5\omega t) + \frac{b_3^2}{2} \cos(6\omega t)$$

The fundamental reappears in the term $b_2 b_3 \cos(\omega t)$
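The squaring argument can be checked numerically. The following is a minimal sketch, not the authors' code: the sampling rate, fundamental frequency and amplitudes are illustrative assumptions, and `amplitude_at` is a hypothetical helper introduced only for this example.

```python
import numpy as np

fs = 8000            # sampling rate in Hz (assumed for illustration)
f0 = 100.0           # fundamental frequency in Hz (assumed)
b2, b3 = 1.0, 0.7    # amplitudes of the two remaining harmonics (assumed)
t = np.arange(fs) / fs   # one second of samples, so FFT bins are 1 Hz apart

# Signal with the fundamental missing: only components at 2*f0 and 3*f0.
y = b2 * np.cos(2 * np.pi * 2 * f0 * t) + b3 * np.cos(2 * np.pi * 3 * f0 * t)

def amplitude_at(signal, freq_hz):
    """Amplitude of the cosine component at freq_hz (1 Hz bins, freq_hz > 0)."""
    spectrum = np.fft.rfft(signal)
    return 2.0 * np.abs(spectrum[int(round(freq_hz))]) / len(signal)

amp_f0_original = amplitude_at(y, f0)      # no energy at f0 before squaring
amp_f0_squared = amplitude_at(y ** 2, f0)  # squaring creates a component at f0

print(amp_f0_original, amp_f0_squared, b2 * b3)
```

Under these assumptions the component at f0 in the squared signal has amplitude close to b2*b3, matching the cross term in the identity above.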
6
Illustration of Nonlinear Processing
The telephone speech signal (top panel) and squared telephone signal (bottom panel) for one frame
7
Illustration of Nonlinear Processing
The magnitude spectrum for the telephone signal (top panel) and the nonlinearly processed signal (bottom panel)
8
Spectral Effects from Nonlinear Processing
The missing fundamental in the telephone speech (top panel) is restored in the squared signal (bottom panel)
[Figure: spectrograms over 18 to 23 seconds, frequency axis 100 to 400 Hz. Top panel: Spectrum of the telephone speech. Bottom panel: Spectrum of the nonlinear processed signal.]
9
Pitch Tracking From the Spectrum
The pitch track from the spectrum refines the pitch candidates estimated from the temporal method
To achieve a noise-robust pitch track from the spectrum, an autocorrelation type of function is proposed
10
Autocorrelation Type of Function
The function takes into account multiple harmonics

$$y(k) = \sum_{i=-\mathrm{WL}/2}^{\mathrm{WL}/2} \prod_{n=1}^{N+1} f(nk + i)$$

f(i): the spectrum
WL: window length (20 Hz)
N: the number of harmonics (3)
k: frequency index, $k_{F0\_min} \le k \le k_{F0\_max}$

[Figure: top panel, Spectrum vs. frequency (Hz), 0 to 1000 Hz, with windows of width WL at k, 2k, 3k and 4k whose contents are multiplied together; bottom panel, Autocorrelation type of function vs. frequency (Hz), 0 to 400 Hz]
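A minimal sketch of how such a function could be evaluated on a sampled magnitude spectrum follows. It assumes a sum over a window of width WL of the product of the spectrum at k, 2k, ..., (N+1)k, which is how the four windows in the slide are read here. The function name, the zero padding outside the spectrum and the synthetic test spectrum are all assumptions of this example, not the authors' implementation.

```python
import numpy as np

def autocorr_type_function(f, wl_bins, n_harmonics, k_min, k_max):
    """y(k) = sum over i in [-WL/2, WL/2] of prod over n = 1..N+1 of f(n*k + i).

    f is a magnitude spectrum indexed by frequency bin. Indices that fall
    outside the spectrum are treated as zero (an assumption of this sketch).
    """
    half = wl_bins // 2
    offsets = np.arange(-half, half + 1)
    y = np.zeros(k_max + 1)
    for k in range(k_min, k_max + 1):
        product = np.ones(len(offsets))
        for n in range(1, n_harmonics + 2):      # fundamental plus N harmonics
            idx = n * k + offsets
            valid = (idx >= 0) & (idx < len(f))
            window = np.where(valid, f[np.clip(idx, 0, len(f) - 1)], 0.0)
            product *= window
        y[k] = product.sum()
    return y

# Synthetic spectrum with 1 Hz bins: narrow peaks at multiples of 120 Hz,
# with amplitudes that decay with harmonic number (illustrative only).
freqs = np.arange(1001)
spectrum = np.zeros(1001)
for m in range(1, 9):
    spectrum += (1.0 / m) * np.exp(-0.5 * ((freqs - 120 * m) / 3.0) ** 2)

y = autocorr_type_function(spectrum, wl_bins=20, n_harmonics=3, k_min=60, k_max=400)
f0_estimate = int(np.argmax(y))
print(f0_estimate)
```

Because the windows at all multiples of k must contain energy at the same time, candidates such as k/2 that line up with only some of the harmonics are strongly suppressed in this sketch.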
11
Peaks in Autocorrelation Type of Function
A very prominent peak is observed in the proposed function
[Figure: top panel, Spectrum, amplitude vs. frequency (Hz), 0 to 1200 Hz; bottom panel, Peaks in autocorrelation type of function, amplitude vs. frequency (Hz), 0 to 450 Hz]
12
Candidate Insertion to Reduce Pitch Doubling/Halving
If all candidates are larger than a threshold (typically 150 Hz), an additional candidate is inserted at half the frequency of the highest-ranking candidate
Similar logic is used to reduce pitch halving
[Figure: Peaks in autocorrelation type of function, amplitude vs. frequency (Hz), 0 to 400 Hz, showing the highest peak P1 and the inserted candidate P2(Hz) = P1(Hz)/2]
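The candidate insertion rule for pitch doubling can be sketched as below. This is only an illustration under stated assumptions: the `(frequency, merit)` representation, the function name, and the choice to give the inserted candidate the same merit as the highest-ranking one are hypothetical and not taken from the slides.

```python
def insert_half_frequency_candidate(candidates, threshold_hz=150.0):
    """Add a candidate at half the best candidate's frequency when needed.

    candidates: list of (frequency_hz, merit) pairs; a higher merit ranks higher.
    If every candidate lies above threshold_hz, a new candidate is appended at
    half the frequency of the highest-ranking one.
    """
    result = list(candidates)
    if not result:
        return result
    if all(freq > threshold_hz for freq, _ in result):
        best_freq, best_merit = max(result, key=lambda c: c[1])
        # Assumption: the inserted candidate reuses the merit of the best one.
        result.append((best_freq / 2.0, best_merit))
    return result

doubled = insert_half_frequency_candidate([(240.0, 0.9), (360.0, 0.4)])
unchanged = insert_half_frequency_candidate([(120.0, 0.9), (240.0, 0.5)])
print(doubled)
print(unchanged)
```

The inserted candidate only gives the later dynamic programming stage the option of choosing the lower frequency; it does not force that choice.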
13
Experimental Evaluation
Database
  Keele pitch extraction database
  5 male and 5 female speakers, about 35 seconds per speaker
  High quality speech and telephone speech
  Additive Gaussian noise
Controls (reference pitch)
  Control C1: supplied in Keele database
  Control C2: computed from the laryngograph signal with the proposed algorithm
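For readers who want to reproduce a noisy test condition, additive Gaussian noise at a target signal-to-noise ratio can be generated as in the sketch below. The slides do not say how the noise level was computed, so measuring the SNR over the whole signal, the function name and the test tone are assumptions of this example.

```python
import numpy as np

def add_gaussian_noise(signal, snr_db, rng=None):
    """Return signal plus white Gaussian noise at the requested SNR in dB.

    Assumption: SNR is defined from the average power of the whole signal.
    """
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

# Illustrative use on a synthetic tone (not speech data).
t = np.arange(80000) / 8000.0
clean = np.sin(2 * np.pi * 150.0 * t)
noisy = add_gaussian_noise(clean, snr_db=5.0, rng=np.random.default_rng(0))
measured_snr_db = 10.0 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
print(round(measured_snr_db, 2))
```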
14
Definition of Error Measures
Gross error
  The percentage of frames in which the pitch estimate of the tracker deviates significantly (typically by more than 20%) from the reference pitch (control)
  Only evaluated in the voiced sections of the reference
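The gross error measure can be written down directly. The sketch below assumes that pitch tracks are per-frame arrays in Hz and that a reference value of 0 marks an unvoiced frame; both conventions, and the function name, are assumptions of this example rather than details given in the slides.

```python
import numpy as np

def gross_error_percent(estimate_hz, reference_hz, threshold=0.20):
    """Percentage of voiced reference frames whose estimate deviates by more
    than `threshold` (as a fraction of the reference pitch).

    Assumption: reference_hz == 0 marks unvoiced frames, which are skipped.
    """
    estimate_hz = np.asarray(estimate_hz, dtype=float)
    reference_hz = np.asarray(reference_hz, dtype=float)
    voiced = reference_hz > 0
    if not voiced.any():
        return 0.0
    deviation = np.abs(estimate_hz[voiced] - reference_hz[voiced]) / reference_hz[voiced]
    return 100.0 * float(np.mean(deviation > threshold))

reference = [0.0, 100.0, 100.0, 200.0, 200.0]
estimate = [150.0, 105.0, 200.0, 190.0, 0.0]
error = gross_error_percent(estimate, reference)
print(error)
```

In this toy example the first frame is ignored because the reference is unvoiced, and two of the four voiced frames exceed the 20% threshold. Changing `threshold` to 0.10 or 0.40 gives the stricter and looser variants of the measure.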
15
Experiment 1 Results
Individual performance of the proposed algorithm (gross error, %)

| Method | Control | Studio, Clean (%) | Studio, 5 dB Noise (%) | Telephone, Clean (%) | Telephone, 5 dB Noise (%) |
|---|---|---|---|---|---|
| YAAPT | C1 | 4.26 | 7.62 | 8.14 | 17.85 |
| YAAPT* | C1 | 1.59 | 1.99 | 2.69 | 4.48 |
| Spectral method | C1 | 4.23 | 4.45 | 6.52 | 6.95 |
| NCCF | C1 | 3.58 | 4.52 | 8.00 | 16.61 |

YAAPT*: Using control C1 for the spectral pitch track
NCCF: Normalized cross correlation function, used as the temporal method in YAAPT
16
Experiment 2 Results
The results of the new method with various error thresholds (gross error, %)

| Error Threshold | Control | Studio, Clean (%) | Studio, 5 dB Noise (%) | Telephone, Clean (%) | Telephone, 5 dB Noise (%) |
|---|---|---|---|---|---|
| 10% | C1 | 5.46 | 7.31 | 9.39 | 16.14 |
| 10% | C2 | 4.18 | 6.06 | 7.77 | 14.78 |
| 20% | C1 | 2.90 | 3.65 | 4.86 | 7.45 |
| 20% | C2 | 1.56 | 2.16 | 3.27 | 5.85 |
| 40% | C1 | 2.25 | 2.44 | 2.75 | 3.63 |
| 40% | C2 | 0.91 | 1.06 | 0.99 | 2.05 |
17
Comparisons
| Method | Control | Studio, Clean (%) | Studio, 5 dB Noise (%) | Telephone, Clean (%) | Telephone, 5 dB Noise (%) |
|---|---|---|---|---|---|
| Proposed Method | C1 | 2.90 | 3.65 | 4.86 (4.52*) | 7.45 (5.90*) |
| DASH | C1 | 2.81 | 2.32 | 3.73* | 4.15* |
| REPS | C1 | 2.68 | 2.98 | 6.91* | 8.49* |
| YIN | C1 | 2.57 | 7.22 | 7.55* | 14.6* |

DASH, REPS, YIN: the results are those reported in “Robust and accurate fundamental frequency estimation ... ,” Nakatani et al.
*: SRAEN filter simulated telephone speech
18
Conclusion
A new pitch-tracking algorithm has been developed that combines multiple information sources to enable accurate, robust F0 tracking
An analysis of errors indicates better performance for both high quality and telephone speech than previously reported for pitch tracking
Acknowledgements
This work was partially supported by JWFC 900