A Spectral-Temporal Method for Pitch Tracking


Stephen A. Zahorian*, Princy Dikshit, Hongbing Hu*

Department of Electrical and Computer Engineering

Old Dominion University, Norfolk, VA 23529, USA.

* Currently at Binghamton University

09/17/2006

Outline

Introduction
Algorithm
    Algorithm overview
    The use of nonlinear processing
    Pitch tracking from the spectrum
Experimental evaluation
Conclusion

Introduction

Pitch (the fundamental frequency, F0) applications
    Automatic speech recognition (ASR), speech synthesis, speech articulation training aids, etc.

Pitch detection algorithms
    "Robust and accurate fundamental frequency estimation based on dominant harmonic components," Nakatani et al. => High accuracy for noisy speech reported using the harmonic dominance spectrum
    "Yet another algorithm for pitch tracking (YAAPT)," Zahorian et al. => Hybrid spectral-temporal processing for pitch tracking

Algorithm Overview

[Block diagram. Labels: Original Speech; Nonlinear processing -> Squared Value of Speech; FFT -> Spectrum -> Pitch Tracking -> Spectral F0 track; F0 candidates estimation (Original Speech); F0 candidates estimation (Squared Value); Candidates refinement -> Refined F0 Candidates; Final F0 determination using dynamic programming -> Final F0]

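The Use of Nonlinear Processing

Restoration of the missing fundamental in telephone speech

A periodic sound is characterized by the spectrum of its harmonics.

A signal with the fundamental missing can be approximated as

\[ y(t) = b_2 \cos(2\omega t) + b_3 \cos(3\omega t) \]

where b_2 cos(2ωt) is the 1st harmonic, b_3 cos(3ωt) is the 2nd harmonic, and the fundamental b_1 cos(ωt) is absent.

After squaring and applying trigonometric identities,

\[ y^2(t) = \frac{b_2^2 + b_3^2}{2} + b_2 b_3 \cos(\omega t) + \frac{b_2^2}{2} \cos(4\omega t) + b_2 b_3 \cos(5\omega t) + \frac{b_3^2}{2} \cos(6\omega t) \]

The fundamental reappears in the term b_2 b_3 cos(ωt).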
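The effect of squaring can be checked numerically. The sketch below is only an illustration, not code from the presentation: it builds a signal from the 1st and 2nd harmonics alone and measures the amplitude at the fundamental before and after squaring. The sampling rate, the 100 Hz fundamental, the amplitudes, and the helper name `tone_amplitude` are all assumptions chosen for the example.

```python
import cmath
import math

def tone_amplitude(samples, fs, freq_hz):
    """Amplitude of the sinusoidal component at freq_hz (single-bin DFT).

    Exact only when the signal spans a whole number of periods of freq_hz.
    """
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * i / fs)
              for i, x in enumerate(samples))
    return 2.0 * abs(acc) / n

# Illustrative values (assumptions, not taken from the slides).
fs = 8000          # sampling rate in Hz
f0 = 100.0         # fundamental frequency in Hz
b2, b3 = 1.0, 0.5  # amplitudes of the 1st and 2nd harmonics

# One second of signal containing only the harmonics at 2*f0 and 3*f0.
y = [b2 * math.cos(2 * math.pi * 2 * f0 * i / fs)
     + b3 * math.cos(2 * math.pi * 3 * f0 * i / fs)
     for i in range(fs)]
y_squared = [v * v for v in y]

amp_before = tone_amplitude(y, fs, f0)         # fundamental is missing
amp_after = tone_amplitude(y_squared, fs, f0)  # fundamental restored

print(f"amplitude at f0 before squaring: {amp_before:.6f}")
print(f"amplitude at f0 after squaring:  {amp_after:.6f} (b2*b3 = {b2 * b3})")
```

In this toy case the restored component has amplitude b2*b3, as the trigonometric expansion predicts.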

Illustration of Nonlinear Processing

[Figure: The telephone speech signal (top panel) and the squared telephone signal (bottom panel) for one frame]

[Figure: The magnitude spectrum for the telephone signal (top panel) and the nonlinearly processed signal (bottom panel)]

Spectral Effects from Nonlinear Processing

The missing fundamental in the telephone speech (top panel) is restored in the squared signal (bottom panel).

[Figure: two spectrograms, Frequency (Hz, up to 400) versus Time (Seconds, 18 to 23). Top panel: Spectrum of the telephone speech. Bottom panel: Spectrum of the nonlinear processed signal.]


Pitch Tracking From the Spectrum

The pitch track from the spectrum refines the pitch candidates estimated from the temporal method

To obtain a noise-robust pitch track from the spectrum, an autocorrelation type of function is proposed

Autocorrelation type of Function

The function takes into account multiple harmonics.

Equation

\[ y(k) = \sum_{i=-WL/2}^{WL/2} \prod_{n=1}^{N+1} f(nk + i) \]

f(i): The spectrum
WL: Window length (20 Hz)
N: The number of harmonics (3)
k: Frequency index, k_{F0\_min} \le k \le k_{F0\_max}

[Figure: top panel, Spectrum versus Frequency (Hz), with windows of width WL at k, 2k, 3k, and 4k whose contents are multiplied together; bottom panel, Autocorrelation type of function versus Frequency (Hz)]
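A direct, unoptimized sketch of this function may help make the indexing concrete. It assumes the form read from the slide: for each candidate index k, the spectrum values at k, 2k, ..., (N+1)k (shifted by a common offset i) are multiplied, and the products are summed over a window of WL bins. The function name, the zero treatment of out-of-range bins, and the synthetic spectrum with 1 Hz bins are assumptions for illustration only.

```python
def harmonic_correlation(spectrum, k_min, k_max, wl_bins=20, n_harmonics=3):
    """Compute y(k) = sum_{i=-WL/2}^{WL/2} prod_{n=1}^{N+1} f(n*k + i).

    spectrum    : magnitude spectrum f, indexed by frequency bin
    k_min/k_max : search range for the frequency index k (inclusive)
    wl_bins     : window length WL, in bins
    n_harmonics : N, so N+1 spectral values enter each product
    Returns a dict mapping k to y(k).
    """
    half = wl_bins // 2
    result = {}
    for k in range(k_min, k_max + 1):
        total = 0.0
        for i in range(-half, half + 1):
            product = 1.0
            for n in range(1, n_harmonics + 2):
                idx = n * k + i
                # Assumption: bins outside the spectrum contribute zero.
                product *= spectrum[idx] if 0 <= idx < len(spectrum) else 0.0
            total += product
        result[k] = total
    return result

# Synthetic spectrum with 1 Hz bins: triangular peaks at 100, 200, 300, 400 Hz.
spectrum = [0.0] * 1000
for harmonic in range(1, 5):
    for d in range(-2, 3):
        spectrum[100 * harmonic + d] = 1.0 - abs(d) / 3.0

y = harmonic_correlation(spectrum, k_min=50, k_max=400)
best_k = max(y, key=y.get)
print("strongest candidate index:", best_k)
```

Because every term in a product must be nonzero, candidates at 50 Hz or 200 Hz score zero in this toy spectrum, which is why the function shows a single prominent peak at the true fundamental.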

Peaks in Autocorrelation Type of Function

A very prominent peak is observed in the proposed function.

[Figure: top panel, Spectrum, Amplitude versus Frequency (Hz); bottom panel, Peaks in autocorrelation type of function, Amplitude versus Frequency (Hz)]


Candidate Insertion to Reduce Pitch Doubling/Halving

If all candidates are larger than a threshold (typically 150 Hz), an additional candidate is inserted at half the frequency of the highest-ranking candidate

Similar logic is used to reduce pitch halving

[Figure: Peaks in autocorrelation type of function, Amplitude versus Frequency (Hz), showing the highest-ranking peak P1 and the inserted candidate P2 (Hz) = P1 (Hz) / 2]
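The insertion rule for pitch doubling can be sketched in a few lines. This is a minimal illustration under stated assumptions: candidates are given in Hz, ranked best first, and the inserted candidate is simply appended at the end of the list. The slides do not say where the new candidate ranks, and the function name is hypothetical.

```python
def insert_half_frequency_candidate(candidates_hz, threshold_hz=150.0):
    """Return a copy of the candidate list, with P1/2 appended when every
    candidate lies above threshold_hz.

    candidates_hz is assumed to be ranked best first, so candidates_hz[0]
    is the highest-ranking candidate P1.
    """
    result = list(candidates_hz)
    if result and all(c > threshold_hz for c in result):
        # Assumption: the new candidate is appended as the lowest-ranked one.
        result.append(result[0] / 2.0)
    return result

print(insert_half_frequency_candidate([220.0, 330.0]))  # half of 220 Hz is added
print(insert_half_frequency_candidate([220.0, 110.0]))  # unchanged: 110 Hz <= 150 Hz
```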

Experimental Evaluation

Database
    Keele pitch extraction database
    5 male and 5 female speakers, about 35 seconds per speaker
    High quality speech and telephone speech
    Additive Gaussian noise

Controls (reference pitch)
    Control C1: supplied in the Keele database
    Control C2: computed from the laryngograph signal with the proposed algorithm
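For readers who want to reproduce a noisy test condition, one common way to add Gaussian noise at a target signal-to-noise ratio is sketched below. The slides do not state how the noise was generated or scaled, so the function name, the fixed seed, and the power-based scaling are assumptions.

```python
import math
import random

def add_gaussian_noise(signal, snr_db, seed=0):
    """Return signal plus white Gaussian noise scaled to the requested SNR.

    The noise is rescaled so that the ratio of mean signal power to mean
    noise power matches snr_db for this particular realization.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = sum(v * v for v in noise) / len(noise)
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / noise_power)
    return [x + gain * v for x, v in zip(signal, noise)]

# Illustrative input: one second of a 100 Hz tone sampled at 8 kHz.
clean = [math.sin(2.0 * math.pi * 100.0 * i / 8000.0) for i in range(8000)]
noisy = add_gaussian_noise(clean, snr_db=5.0)

clean_power = sum(c * c for c in clean) / len(clean)
residual_power = sum((n - c) ** 2 for n, c in zip(noisy, clean)) / len(clean)
measured_snr_db = 10.0 * math.log10(clean_power / residual_power)
print(f"measured SNR: {measured_snr_db:.2f} dB")
```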

Definition of Error Measures

Gross error
    The percentage of frames in which the pitch estimate of the tracker deviates significantly (typically by 20%) from the reference pitch (control)
    Only evaluated in the voiced sections of the reference
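The gross error definition can be written as a short function. This sketch assumes frame-aligned pitch tracks in Hz, that unvoiced reference frames are marked with 0, and that the deviation is measured relative to the reference value. These conventions and the function name are assumptions, not details given on the slide.

```python
def gross_error_percent(estimate_hz, reference_hz, threshold=0.20):
    """Percentage of voiced reference frames where the estimate deviates
    from the reference by more than threshold (as a fraction of the reference).

    Frames with reference_hz == 0 are treated as unvoiced and skipped.
    """
    voiced = [(e, r) for e, r in zip(estimate_hz, reference_hz) if r > 0]
    if not voiced:
        return 0.0
    errors = sum(1 for e, r in voiced if abs(e - r) > threshold * r)
    return 100.0 * errors / len(voiced)

# Toy tracks: the first frame is unvoiced in the reference and is ignored.
reference = [0.0, 100.0, 100.0, 200.0, 200.0]
estimate = [150.0, 105.0, 200.0, 100.0, 210.0]
print(gross_error_percent(estimate, reference))  # 2 of 4 voiced frames are gross errors
```

Changing `threshold` to 0.10 or 0.40 gives the stricter and looser error definitions used when results are reported at several thresholds.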

Experiment 1 Results

Individual performance of the proposed algorithm

Method            Control   Studio,     Studio,          Telephone,   Telephone,
                            Clean (%)   5 dB Noise (%)   Clean (%)    5 dB Noise (%)
YAAPT             C1        4.26        7.62             8.14         17.85
YAAPT*            C1        1.59        1.99             2.69         4.48
Spectral method   C1        4.23        4.45             6.52         6.95
NCCF              C1        3.58        4.52             8.00         16.61

YAAPT*: Using control C1 for the spectral pitch track
NCCF: Normalized cross correlation function, used as the temporal method in YAAPT

Experiment 2 Results

The results of the new method with various error thresholds

Error Threshold   Control   Studio,     Studio,          Telephone,   Telephone,
                            Clean (%)   5 dB Noise (%)   Clean (%)    5 dB Noise (%)
10%               C1        5.46        7.31             9.39         16.14
10%               C2        4.18        6.06             7.77         14.78
20%               C1        2.90        3.65             4.86         7.45
20%               C2        1.56        2.16             3.27         5.85
40%               C1        2.25        2.44             2.75         3.63
40%               C2        0.91        1.06             0.99         2.05

Comparisons

DASH, REPS, YIN: the results are reported in "Robust and accurate fundamental frequency estimation ... ," Nakatani et al.

*: SRAEN filter simulated telephone speech

Method            Control   Studio,     Studio,          Telephone,      Telephone,
                            Clean (%)   5 dB Noise (%)   Clean (%)       5 dB Noise (%)
Proposed Method   C1        2.90        3.65             4.86 (4.52*)    7.45 (5.90*)
DASH              C1        2.81        2.32             3.73*           4.15*
REPS              C1        2.68        2.98             6.91*           8.49*
YIN               C1        2.57        7.22             7.55*           14.6*


Conclusion

A new pitch-tracking algorithm has been developed that combines multiple information sources to enable accurate, robust F0 tracking

An analysis of errors indicates better performance for both high quality and telephone speech than previously reported performance for pitch tracking

Acknowledgements

This work was partially supported by JWFC 900
