
Using phonetic feature extraction to determine optimal speech regions for maximising the effectiveness of glottal source analysis

John Kane, Irena Yanushevskaya, John Dalton, Christer Gobl, Ailbhe Ní Chasaide

Monday August 26th, 2013

Interspeech, Lyon, France

Phonetic feature extraction for glottal processing

Glottal source analysis


Speech production / Glottal inverse filtering
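Glottal inverse filtering estimates the glottal source by cancelling the vocal tract resonances from the speech signal. A minimal textbook sketch follows. It assumes a single all-pole (LPC) vocal tract model estimated by the autocorrelation method, with a leaky integrator standing in for the lip-radiation correction. This is not the inverse filtering method used in this work, and the function names are hypothetical:

```python
import numpy as np


def lpc_autocorr(frame, order):
    """LPC polynomial a = [1, a1, ..., ap] via the autocorrelation method."""
    x = frame * np.hamming(len(frame))  # taper frame edges
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    # Solve the Toeplitz normal equations R a = -r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:])
    return np.concatenate(([1.0], a))


def inverse_filter(speech, fs, order=None, leak=0.99):
    """Return (flow derivative estimate, flow estimate) for one analysis frame."""
    if order is None:
        order = int(fs / 1000) + 2  # common rule of thumb for the LPC order
    a = lpc_autocorr(speech, order)
    # FIR filtering with A(z) removes the modelled vocal tract resonances
    dflow = np.convolve(speech, a)[: len(speech)]
    # Leaky integration 1 / (1 - leak z^-1) undoes the lip-radiation differentiation
    flow = np.zeros_like(dflow)
    acc = 0.0
    for n, v in enumerate(dflow):
        acc = leak * acc + v
        flow[n] = acc
    return dflow, flow


# Toy "speech": a 100 Hz impulse train through a single resonance near 500 Hz
fs = 8000
n = np.arange(fs // 4)
excitation = (n % 80 == 0).astype(float)
rho, theta = 0.95, 2 * np.pi * 500 / fs
speech = np.zeros(len(n))
for i in range(len(n)):
    speech[i] = excitation[i]
    if i >= 1:
        speech[i] += 2 * rho * np.cos(theta) * speech[i - 1]
    if i >= 2:
        speech[i] -= rho**2 * speech[i - 2]

dflow, flow = inverse_filter(speech, fs)
```

Real glottal analysis uses far more careful vocal tract estimation (e.g. closed-phase or iterative methods). The sketch only shows the basic idea of removing the resonances and then integrating.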


Glottal source in speech technology

Speech synthesis

Speech recognition

Speaker verification


Previous work - Centres of reliability (Mokhtari et al.)



Previous work - Phonetic feature extraction

Speech synthesis

Speech recognition


Introduction - Research aims

1. Implement a method for detecting binary phonetic features

2. Quantitatively evaluate phonetic-sensitive glottal source processing


Phonetic feature extraction


Phonetic feature extraction - Speech data & target labels

ARCTIC: 9 English speakers, 1000+ sentences each

IIIT: 6 speakers of different Indic languages, 1000 sentences each

Binary phonetic classes: {Voiced, fricative, nasal, high vowel}

Target labelling: e.g., FRICATION

/a/ => 0

/f/ => 1

/t/ => 0
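The target labelling can be sketched as a lookup from time-aligned phone labels to 0/1 values, one binary feature at a time. The phone symbols, class memberships and the 10 ms frame shift below are illustrative assumptions, not the definitions used in the study:

```python
# Hypothetical phone-to-class sets using simple ASCII phone symbols; the
# class membership used in the study is not specified on the slides.
PHONETIC_CLASSES = {
    "voiced": set("aeiou") | {"b", "d", "g", "v", "z", "m", "n", "l", "r", "w", "j"},
    "fricative": {"f", "v", "s", "z", "sh", "zh", "th", "dh", "h"},
    "nasal": {"m", "n", "ng"},
    "high_vowel": {"i", "u"},
}


def binary_targets(phones, feature):
    """Map each phone label to 1 if it belongs to the feature's class, else 0."""
    members = PHONETIC_CLASSES[feature]
    return [1 if p in members else 0 for p in phones]


def frame_targets(segments, feature, frame_shift=0.01):
    """Expand (start_s, end_s, phone) segments into one 0/1 target per frame.

    Each frame takes the label of the segment containing its centre time.
    """
    n_frames = int(round(segments[-1][1] / frame_shift))
    targets = []
    for i in range(n_frames):
        t = (i + 0.5) * frame_shift
        phone = next((p for start, end, p in segments if start <= t < end), None)
        targets.append(1 if phone in PHONETIC_CLASSES[feature] else 0)
    return targets


print(binary_targets(["a", "f", "t"], "fricative"))  # [0, 1, 0]
```

Voiced fricatives such as /v/ and /z/ are positive for both VOICING and FRICATION, which is why each feature gets its own independent binary target.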


Phonetic feature extraction - Features & learning

Audio waveform => Feature extraction (MFCC + ∆/∆∆) => Neural network => VOICING

Features: 13 MFCCs with ∆ and ∆∆

ANN: Multi-layer perceptron, one hidden layer, 100 neurons
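Each frame's input vector stacks the 13 MFCCs with their first (∆) and second (∆∆) derivatives, giving 39 values. These feed a network with a single hidden layer of 100 units. The sketch below shows that forward path under several assumptions: the common regression formula for deltas over ±2 frames, sigmoid activations, and one sigmoid output per phonetic feature. The MFCC front end and training (e.g. by backpropagation) are not shown, and the weights here are random:

```python
import numpy as np


def deltas(feats, N=2):
    """Regression deltas d_t = sum_n n (c_{t+n} - c_{t-n}) / (2 sum_n n^2)."""
    T = feats.shape[0]
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")  # repeat edge frames
    denom = 2 * sum(n * n for n in range(1, N + 1))
    d = np.zeros(feats.shape, dtype=float)
    for n in range(1, N + 1):
        d += n * (padded[N + n : N + n + T] - padded[N - n : N - n + T])
    return d / denom


def stack_features(mfcc):
    """(T, 13) MFCCs -> (T, 39) vectors [c, delta c, delta-delta c]."""
    d1 = deltas(mfcc)
    return np.hstack([mfcc, d1, deltas(d1)])


class OneHiddenLayerMLP:
    """39-100-1 perceptron; outputs P(feature present) per frame."""

    def __init__(self, n_in=39, n_hidden=100, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, 1))
        self.b2 = np.zeros(1)

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def predict_proba(self, X):
        hidden = self._sigmoid(X @ self.W1 + self.b1)
        return self._sigmoid(hidden @ self.W2 + self.b2).ravel()


mfcc = np.random.default_rng(1).normal(size=(50, 13))  # stand-in for real MFCCs
X = stack_features(mfcc)
p_voiced = OneHiddenLayerMLP().predict_proba(X)
voiced = p_voiced > 0.5  # binary decision per frame
```

One such network per binary feature (VOICING, FRICATION, etc.) matches the one-output-per-detector layout in the diagram, though a single multi-output network would also be possible.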


Phonetic feature extraction - Speaker independent results

[Figure: Speaker-independent error (%) and F1 score for the Voiced, Fricative, Nasal and High vowel detectors, comparing the Interspeech and Post-Interspeech systems]
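The error and F1 measures on this slide can be computed from frame-level decisions as below. This assumes that error is the percentage of misclassified frames and that F1 is computed for the positive class (feature present). The exact scoring protocol is not stated on the slides:

```python
def error_and_f1(y_true, y_pred):
    """Frame error rate (%) and F1 score for the positive class of a binary detector."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    error_pct = 100.0 * sum(1 for t, p in pairs if t != p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return error_pct, f1


err, f1 = error_and_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
# err = 40.0 (2 of 5 frames wrong); f1 = 2/3 (precision = recall = 2/3)
```

Reporting F1 alongside error matters for rarer classes such as nasals or high vowels. For these classes, a detector that always outputs 0 can still have a low error rate, but its F1 is 0.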


Phonetic feature extraction - Illustration

“Not at this particular case Tom ...”




Glottal source processing


Glottal source processing

Glottal source analysis is difficult to evaluate quantitatively

Here it is assessed implicitly through voice quality classification experiments


Glottal source processing - Speech data

6 speakers, 17 TIMIT utterances in 3 phonation types (breathy, modal, tense)


Glottal source processing - Features

Model parameters: Liljencrants-Fant (LF) model fit using the dyProg-LF algorithm => {Ra, Rk, Rg}

[Figure: Differentiated glottal flow and fitted LF model, amplitude vs. time (s) over 0 to 0.016 s]
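The R-parameters summarise the timing of the fitted LF pulse. The sketch below uses Fant's standard definitions with the following timing parameters:

* T0: the period.
* Tp: the instant of peak flow, measured from glottal opening.
* Te: the instant of the main excitation (negative peak of the flow derivative), measured from glottal opening.
* Ta: the effective duration of the return phase.

The LF fitting itself (dyProg-LF) is not reproduced, and the timing values are made up for illustration:

```python
from dataclasses import dataclass


@dataclass
class LFTimings:
    T0: float  # fundamental period (s)
    Tp: float  # instant of peak glottal flow, from opening (s)
    Te: float  # instant of main excitation, from opening (s)
    Ta: float  # effective duration of the return phase (s)


def r_parameters(t: LFTimings) -> dict:
    """Fant's R-parameters from LF timing parameters."""
    return {
        "Ra": t.Ta / t.T0,           # relative return-phase duration
        "Rk": (t.Te - t.Tp) / t.Tp,  # skewness of the glottal pulse
        "Rg": t.T0 / (2.0 * t.Tp),   # normalised glottal frequency
    }


# Made-up timings for a 125 Hz cycle
params = r_parameters(LFTimings(T0=0.008, Tp=0.0032, Te=0.0044, Ta=0.0004))
# Ra = 0.05, Rk = 0.375, Rg = 1.25 (up to floating-point rounding)
```

Normalising by T0 or Tp makes the parameters comparable across speakers and pitch levels. This is useful when they are pooled as classifier features.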

Direct parameters:

NAQ: Normalised Amplitude Quotient

QOQ: Quasi-Open Quotient

H1-H2: Difference in amplitude of the first two glottal harmonics
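The direct parameters can be measured from one cycle of an estimated glottal flow. The sketch below assumes these common definitions:

* NAQ = A_ac / (d_peak · T0), i.e. the AC flow amplitude divided by the product of the negative peak of the flow derivative and the period.
* QOQ is the fraction of the period in which the flow exceeds 50% of its AC amplitude above the minimum.
* H1-H2 is read from a DFT spanning exactly one period.

A synthetic Rosenberg-style pulse stands in for real inverse-filtered flow:

```python
import numpy as np


def naq(flow, fs, f0):
    """NAQ = A_ac / (d_peak * T0) for one glottal flow cycle."""
    T0 = 1.0 / f0
    a_ac = flow.max() - flow.min()            # AC flow amplitude
    d_peak = abs(np.min(np.diff(flow) * fs))  # negative peak of d(flow)/dt
    return a_ac / (d_peak * T0)


def qoq(flow, level=0.5):
    """Fraction of the cycle where flow exceeds min + level * AC amplitude."""
    threshold = flow.min() + level * (flow.max() - flow.min())
    return float(np.mean(flow > threshold))


def h1_h2(flow_cycle):
    """H1-H2 in dB, assuming `flow_cycle` spans exactly one period,
    so DFT bins 1 and 2 fall on the first two harmonics."""
    spectrum = np.abs(np.fft.rfft(flow_cycle))
    return 20.0 * np.log10(spectrum[1] / spectrum[2])


# Synthetic Rosenberg-style pulse: 100 Hz at 16 kHz (160 samples per cycle)
fs, f0 = 16000, 100.0
N = int(fs / f0)
n = np.arange(N)
Np, Nn = 64, 32  # opening and closing phase lengths (samples)
flow = np.zeros(N)
rise = n < Np
fall = (n >= Np) & (n < Np + Nn)
flow[rise] = 0.5 * (1 - np.cos(np.pi * n[rise] / Np))
flow[fall] = np.cos(np.pi * (n[fall] - Np) / (2 * Nn))

print(naq(flow, fs, f0), qoq(flow), h1_h2(flow))
```

Unlike the LF parameters, these measures need no model fit, only a flow estimate and f0. This makes them cheap to extract frame by frame.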


Glottal source processing - Classification

Support Vector Machines (SVMs):

One-against-one multi-class architecture

Radial Basis Function (RBF) kernel

10-fold cross-validation experiments (incrementally removing feature data from certain phonetic regions)
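The classification set-up combines three standard pieces: an RBF kernel, a one-against-one vote over pairwise classifiers, and a 10-fold split. The plain-Python sketch below shows only those pieces. The pairwise classifiers are a trivial prototype-similarity stand-in, not trained SVMs. In practice an SVM library such as LIBSVM, which uses one-against-one for multi-class problems, would do the training. Prototype coordinates and function names are hypothetical:

```python
import math
import random
from collections import Counter
from itertools import combinations


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def one_vs_one_predict(x, classes, pairwise):
    """Majority vote over one binary classifier per pair of classes.

    pairwise[(a, b)] is any callable that returns a or b for input x.
    Ties go to the class listed first in `classes`.
    """
    votes = Counter(pairwise[pair](x) for pair in combinations(classes, 2))
    return max(classes, key=lambda c: votes[c])


def kfold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


# Toy stand-in for trained binary SVMs: pick the class whose prototype
# has the larger RBF similarity to x (NOT an actual SVM).
classes = ["breathy", "modal", "tense"]
protos = {"breathy": (1.0, 0.0), "modal": (0.0, 0.0), "tense": (-1.0, 0.0)}
pairwise = {
    (a, b): (lambda x, a=a, b=b:
             a if rbf_kernel(x, protos[a]) >= rbf_kernel(x, protos[b]) else b)
    for a, b in combinations(classes, 2)
}
print(one_vs_one_predict((0.1, 0.0), classes, pairwise))  # modal
```

With three phonation types, one-against-one needs only three binary classifiers. Each classifier is trained on the data of just two classes.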


Glottal source processing - Results

[Figure: Classification error (%) for Baseline and Sys 1 to Sys 5, y-axis 20 to 36%]

BASELINE: Using all glottal feature data


... excluding high vowel regions => :(


... additionally excluding fricative regions∗∗∗ => :)


... additionally excluding nasal regions


JUST excluding nasal regions∗ => :)


Using phonetic features as input features in the classifier => :(
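Most of the systems above differ only in which frames reach the classifier. The last slide instead feeds the detector outputs to the classifier as extra inputs. Both options can be sketched as below, assuming frame-synchronous glottal features and boolean detector outputs. The array layout and function names are illustrative, and the exclusion lists simply mirror the slide captions:

```python
import numpy as np


def drop_regions(glottal_feats, flags, exclude):
    """Discard frames where any detector named in `exclude` fires.

    glottal_feats: (T, D) frame-level glottal features
    flags: dict mapping detector name -> (T,) boolean array
    """
    keep = np.ones(glottal_feats.shape[0], dtype=bool)
    for name in exclude:
        keep &= ~flags[name]
    return glottal_feats[keep]


def append_phonetic_inputs(glottal_feats, flags, names):
    """Alternative: keep every frame and add detector outputs as input features."""
    extra = np.column_stack([flags[n].astype(float) for n in names])
    return np.hstack([glottal_feats, extra])


# Exclusion sets corresponding to the slide captions
VARIANTS = {
    "all frames": [],
    "excl. high vowels": ["high_vowel"],
    "excl. high vowels, fricatives": ["high_vowel", "fricative"],
    "excl. high vowels, fricatives, nasals": ["high_vowel", "fricative", "nasal"],
    "excl. nasals only": ["nasal"],
}

T = 8
feats = np.arange(T * 3, dtype=float).reshape(T, 3)  # stand-in for Ra, Rk, Rg, ...
flags = {
    "high_vowel": np.array([0, 1, 0, 0, 0, 0, 1, 0], dtype=bool),
    "fricative": np.array([0, 0, 1, 1, 0, 0, 0, 0], dtype=bool),
    "nasal": np.array([0, 0, 0, 0, 1, 0, 0, 0], dtype=bool),
}
kept = {name: drop_regions(feats, flags, ex) for name, ex in VARIANTS.items()}
```

Dropping frames changes which data the classifier sees. Appending flags leaves that data unchanged and asks the classifier to learn the interaction, which is the option the slide reports as unhelpful.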


What did we find?

Implementation of phonetic feature extraction based on ANNs

Using this information (i.e., removing feature data from fricative and nasal regions) significantly improved voice quality classification


Future ...

Optimise phonetic feature extraction

Increase set of phonetic features

Investigate other context-sensitive glottal source processing methods (e.g., adaptive vocal tract model)

Application in other areas of speech processing


Website: http://covarep.github.io/covarep

GitHub: https://github.com/covarep/covarep



Thank you!
