
Accent Modeling

An Overview

02/09/07 iCONS Group Presentation 2

Prologue

Our initial effort: enhancement of speaker recognition through score-level fusion of Arithmetic Harmonic Sphericity (AHS) and Hidden Markov Model (HMM) techniques.

Performance improvements of 22% and 6% in true acceptance rate (at 5% false acceptance rate) on the YOHO and USF multi-modal biometric datasets, respectively.
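The slides do not state which fusion rule was used, so the following is only a minimal sketch of score-level fusion under assumed choices: min-max normalization of each matcher's scores, then a weighted sum. The function names, the default weight, and the thresholding convention are hypothetical and are not taken from the original work. Both inputs are assumed to be similarity scores (higher means a better match); a matcher that outputs distances would need its scores negated first.

```python
def min_max_normalize(scores):
    """Map raw matcher scores to [0, 1]; returns 0.5 for a constant list."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(ahs_scores, hmm_scores, weight=0.5):
    """Weighted-sum fusion of two matchers' normalized scores.

    Assumes both lists hold similarity scores for the same trials,
    in the same order. `weight` is the share given to the first matcher.
    """
    if len(ahs_scores) != len(hmm_scores):
        raise ValueError("score lists must align trial by trial")
    a = min_max_normalize(ahs_scores)
    h = min_max_normalize(hmm_scores)
    return [weight * x + (1.0 - weight) * y for x, y in zip(a, h)]


def true_acceptance_rate(genuine, impostor, target_far):
    """TAR at the threshold that accepts at most target_far of impostors."""
    ranked = sorted(impostor, reverse=True)
    allowed = int(target_far * len(ranked))  # impostor trials we may accept
    # The threshold sits at the first impostor score that must be rejected.
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")
    accepted = sum(1 for g in genuine if g > threshold)
    return accepted / len(genuine)
```

Calling `true_acceptance_rate(genuine, impostor, 0.05)` on the fused scores gives the kind of operating point quoted above (true acceptance rate at 5% false acceptance rate).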


Prologue…contd

[Figure: Enhanced Recognition at Various FARs (YOHO). True acceptance rate (%) versus false acceptance rate (%), with ticks at 3 and 5, for AHS, HMM, and HF.]

[Figure: Enhanced Recognition at Various FARs (USF data). True acceptance rate (%) versus false acceptance rate (%), with ticks at 3 and 5, for AHS, HMM, and HF.]


Prologue – what next

Further improvement of recognition rate through speaker accent

Speaker accent will play a critical role in the evaluation of biometric systems, since users will be international in nature.

Incorporating an accent model into the speaker recognition/verification system will be a key focus of our study.


Accent

What is accent? The cumulative auditory effect of those features of pronunciation which identify where a person is from, regionally and socially.

Difference between accent and dialect: accent is the negative (or rather colorful) influence of the first language (L1) of a speaker on a second language, while dialects of a given language are differences in speaking style of that language (all of which belong to L1) arising from geographical and ethnic differences.


Accent

Factors affecting the level of accent:

Age at which the speaker learns the second language.

Nationality of the speaker's language instructor.

Grammatical and phonological differences between the primary and secondary languages.

Amount of interaction the speaker has with native language speakers.


Applications of Accent Modeling

Accent knowledge can be used to select alternative pronunciations or to provide information for biasing a language model in speech recognition.

Accent can be useful in profiling speakers for call routing in a call centre.

Document retrieval systems.

Speaker recognition systems.


Examples of Accent

- Native American English
- Indian
- Chinese
- British
- Japanese
- Russian
- Arabic
- Greek


World’s Major Languages


Accent Classification System

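[Figure: block diagram of the accent classification system. For each of the N accents, training speech data passes through accent feature extraction to build a reference accent model (1 through N). Testing speech data passes through the same feature extraction and is classified against the reference models, producing a score.]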
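The diagram can be read as "score the test features against every reference accent model and pick the best". A minimal sketch of that flow follows. As a loud assumption, each reference model here is just the mean feature vector of its training data and the score is a negative squared distance; this is a stand-in chosen for brevity, not the modeling technique used in any of the cited systems. All function names are hypothetical.

```python
def train_reference_model(training_vectors):
    """Stand-in model: the per-dimension mean of the training features."""
    n = len(training_vectors)
    dims = len(training_vectors[0])
    return [sum(v[d] for v in training_vectors) / n for d in range(dims)]


def score(model, test_vectors):
    """Higher is better: negative mean squared distance to the model."""
    total = 0.0
    for v in test_vectors:
        total += sum((x - m) ** 2 for x, m in zip(v, model))
    return -total / len(test_vectors)


def classify(reference_models, test_vectors):
    """Score test features against every model; return best label and all scores."""
    scores = {
        label: score(model, test_vectors)
        for label, model in reference_models.items()
    }
    best = max(scores, key=scores.get)
    return best, scores
```

Swapping in a different model only requires replacing `train_reference_model` and `score`; the classification step stays the same.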


Accent – Research Work

M. V. Chan, et al., "Classification of speech accents with neural networks," IEEE World Congress on Computational Intelligence, vol. 7, pp. 4483-4486, 27 Jun.-2 Jul. 1994.

L. M. Arslan, "Foreign Accent Classification in American English," Ph.D. Dissertation, Duke University, 1996.

C. Teixeira, I. Trancoso, and A. Serralheiro, "Accent identification," in Proc. International Conference on Spoken Language Processing, vol. 3, pp. 1784-1787, 1996.

P. Fung and W. K. Liu, "Fast Accent Identification and Accented Speech Recognition," in Proc. ICASSP'99, vol. 1, pp. 221-224, 1999.

T. Chen, et al., "Automatic accent identification using Gaussian mixture models," ASRU '01, pp. 343-346, 9-13 Dec. 2001.

P. Angkititrakul, J. H. L. Hansen, "Stochastic Trajectory Model Analysis for Accent Classification," Inter. Conf. on Spoken Language Processing, vol. 1, pp. 493-496, Sept. 2002.

X. Lin, S. Simske, "Phoneme-less hierarchical accent classification," Signals, Systems and Computers, vol. 2, pp. 1801-1804, 7-10 Nov. 2004.


Research Work … Contd

F. Farahani, et al., "Speaker identification using supra-segmental pitch pattern dynamics," in Proc. ICASSP'04, vol. 1, pp. I-89-92, 17-21 May 2004.

M. M. Tanabian, et al., "Automatic speaker recognition with formant trajectory tracking using CART and neural networks," Canadian Conference on Electrical and Computer Engineering, pp. 1225-1228, 1-4 May 2005.

S. Gray, J. H. L. Hansen, "An integrated approach to the detection and classification of accents/dialects for a spoken document retrieval system," ASRU '05, pp. 35-40, 27 Nov.-1 Dec. 2005.

P. Angkititrakul, J. H. L. Hansen, "Advances in Phone-based Modeling For Automatic Accent Classification," IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, pp. 634-646, March 2006.

K. Bartkova, D. Jouvet, "Using Multilingual Units for Improved Modeling of Pronunciation Variants," in Proc. ICASSP'06, vol. 5, pp. V-1037-V-1040, 14-19 May 2006.

A. Ikeno, J. H. L. Hansen, "Perceptual Recognition Cues in Native English Accent Variation: Listener Accent, Perceived Accent, and Comprehension," in Proc. ICASSP'06, vol. 1, pp. I-401-I-404, 14-19 May 2006.


Accent Classification Tree

[Figure: classification tree with four levels.]

Speech Dataset

Accent Features: Pitch, Energy, Formants, Formant Trajectories, MFCCs, Delta MFCCs

Modeling: Stochastic Trajectory Models, Artificial Neural Networks, Gaussian Mixture Models, Hidden Markov Models

Classification/Decision


Foreign Accent Classification in American English - Dataset

The dataset consists of neutral American English and German, Spanish, Chinese, Turkish, French, Italian, Hindi, Rumanian, Japanese, Persian, and Greek accents.

All speech was sampled at 8000 Hz.

In total, 43 speakers used microphone input and 68 speakers used telephone input, in a quiet office environment.


Formant Frequency Analysis

Formants are the frequencies at which most of the acoustic energy passes from source to output when the vocal tract is modeled as an acoustic tube.

The second and third formants are particularly favorable for accent classification.


Mel Scale Vs Accent Scale


Accent Classifier

The features consisted of 8-dimensional ASCCs and energy, along with their delta features.

The IW-FS, CS-FS, and CS-PS classifiers achieved classification rates of 74.5%, 61.3%, and 68.3%, respectively.

With a test word count of 7-8 words, accent classification accuracy among 4 accents is 93%.
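Delta features approximate the time derivative of each static coefficient. The slides do not give the formula that was used; a common regression form, assumed here, is Δ_t = Σ_{n=1..W} n (c_{t+n} − c_{t−n}) / (2 Σ_{n=1..W} n²), with the first and last frames repeated at the edges. The window size W = 2 and the function name are illustrative choices.

```python
def delta_features(frames, window=2):
    """Regression-based deltas for a list of per-frame feature vectors."""
    if not frames:
        return []
    num_frames = len(frames)
    dims = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, window + 1))
    deltas = []
    for t in range(num_frames):
        row = []
        for d in range(dims):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, num_frames - 1)][d]  # clamp at end
                behind = frames[max(t - n, 0)][d]              # clamp at start
                acc += n * (ahead - behind)
            row.append(acc / denom)
        deltas.append(row)
    return deltas
```

The final feature vector for a frame would then be the static coefficients concatenated with their deltas.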


Computer Vs Humans


Conclusions about specific features

Word-final stop release time is longer among foreign accents.

The slope of the intonation contour for isolated words is more negative for Chinese speakers, and more positive for German speakers, than for native speakers.

Voice onset time for unvoiced stops is not a significant contributor for the accents considered in this study.

Second and third formant positions are different for native and non-native speakers.


Accent Classification/Detection using ANN

Demographic data (the speaker's age, the percentage of the day in which English is used for communication, and the number of years English has been spoken) were used as features, along with two speech features (average pitch frequency and the averaged first three formant frequencies), as inputs to the neural network.

A dataset of 10 native and 12 non-native speakers was used.

F2 and F3 distributions of the native and non-native groups show high dissimilarity.

Three neural network classification techniques, namely competitive learning, counter propagation, and back propagation, were compared.

Back propagation gave a detection rate of 100% for training data and 90.9% for testing data.


Phoneme-less Hierarchical Accent Classification

WSJCAM0 and TIDIGITS were used to train British and American accents, respectively.

IViE and Voicemail were used to test British and American accents, respectively.

13-dimensional MFCCs were used as features, and a 64-component Gaussian Mixture Model was used for modeling.

Results show an average relative error rate reduction of 7.1% compared to direct accent classification.
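To make "relative" concrete: writing E_direct and E_hier for the error rates of direct and hierarchical classification (symbols introduced here for illustration; the slides do not report the absolute rates), the quoted figure corresponds to

```latex
\frac{E_{\text{direct}} - E_{\text{hier}}}{E_{\text{direct}}} \approx 0.071
```

that is, about 7.1% fewer errors than the direct classifier, not a drop of 7.1 percentage points.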


Accent Classification Application


Advances in Phone Based Modeling

Conventional HMMs assume that the sequence of features is produced by a piecewise stationary process.

Hidden Markov Modeling assumes that adjacent frames are acoustically uncorrelated.

It also assumes that the state-dependent duration distributions are exponentially decreasing.
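The exponentially decreasing duration distribution follows from the self-transition structure of a standard HMM state. As a short worked equation (the symbol a_ii for the self-transition probability of state i is notation introduced here, not taken from the slides), the probability of staying in state i for exactly d frames is

```latex
P_i(d) = a_{ii}^{\,d-1}\,(1 - a_{ii}), \qquad d = 1, 2, \ldots
```

This is a geometric distribution: it decays exponentially with d and always peaks at d = 1, regardless of how long the phone typically lasts.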


Why Phone Based Modeling?

Capturing the temporal variation of the acoustic signal is an important aspect of speech recognition.

A better framework for modeling the evolution of the spectral dynamics of speech.

Flexibility and power due to whole-segment classification, in contrast to frame-by-frame classification.


[Figure: Trajectories of the phoneme sequence /aa/ - /r/ from the word 'Target'.]


Stochastic Trajectory Model

An STM represents the acoustic observations of a phoneme as clusters of trajectories in a parametric space.

If X is a sequence of N points:

$$X = (x_0, x_1, \ldots, x_{N-1})$$

where each point is a D-dimensional vector.

X is obtained by resampling a sequence of d frames along the linear time scale.
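A minimal sketch of the resampling step, assuming linear interpolation between neighbouring frames at N equally spaced time points. The slides say only that resampling is along the linear time scale, so the interpolation rule, the handling of degenerate cases, and the function name are assumptions.

```python
def resample_trajectory(frames, num_points):
    """Resample d frames (each a D-dimensional list) to num_points points."""
    d = len(frames)
    if d == 0 or num_points < 1:
        raise ValueError("need at least one frame and one output point")
    if d == 1 or num_points == 1:
        # Degenerate cases: nothing to interpolate, copy the first frame.
        return [list(frames[0]) for _ in range(num_points)]
    resampled = []
    for i in range(num_points):
        pos = i * (d - 1) / (num_points - 1)  # fractional frame index
        left = min(int(pos), d - 2)
        frac = pos - left
        point = [
            (1.0 - frac) * a + frac * b
            for a, b in zip(frames[left], frames[left + 1])
        ]
        resampled.append(point)
    return resampled
```

After this step every segment has the same length N, whatever its original duration d, which is what allows trajectories of different durations to be compared point by point.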


Stochastic Trajectory Model

The resampled N-frame vector sequence X is considered to be the underlying trajectory of the original X with d frames.

The pdf of a segment X given a duration d and the segment symbol s is:

$$p(X \mid d, s) = \sum_{t_k \in T_s} p(X \mid t_k, d, s)\, \Pr(t_k \mid s)$$

where $T_s$ is the set of all trajectory components associated with $s$, and $\Pr(t_k \mid s)$ is the probability of observing trajectory $t_k$ given that the segment is $s$, with the constraint that

$$\sum_{t_k \in T_s} \Pr(t_k \mid s) = 1,$$

and $p(X \mid t_k, d, s)$ is the pdf of the vector sequence X given component trajectory $t_k$, duration $d$, and symbol $s$.


Stochastic Trajectory Model

The distribution assigned to each of the sample points on a trajectory is characterized by a multivariate Gaussian distribution with mean vector $m^s_{k,i}$ and covariance matrix $\Sigma^s_{k,i}$. With the assumption of frame-independent trajectories, the pdf is modeled as

$$p(X \mid t_k, d, s) = \prod_{i=0}^{N-1} \mathrm{Gaussian}\left(x_i;\, m^s_{k,i},\, \Sigma^s_{k,i}\right)$$

The training algorithm performs maximum likelihood estimation of the parameters of the Gaussian distributions.
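Taken together, the segment score is a mixture over trajectory components, each of which is a product of per-point Gaussians. A minimal sketch of that computation follows, assuming diagonal covariance matrices and log-domain arithmetic for numerical stability. Both are simplifications introduced here, and the data layout and function names are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """log N(x; mean, diag(var)) for one D-dimensional point."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def stm_log_likelihood(X, components):
    """log p(X | d, s) for a resampled segment X of N points.

    components: list of (prior, means, variances), one per trajectory t_k,
    where prior = Pr(t_k | s) and means[i], variances[i] describe point i.
    """
    log_terms = []
    for prior, means, variances in components:
        # Frame independence: the trajectory pdf is a product over the N
        # points, which becomes a sum in the log domain.
        log_traj = sum(
            log_gaussian_diag(x, m, v) for x, m, v in zip(X, means, variances)
        )
        log_terms.append(math.log(prior) + log_traj)
    # Log-sum-exp over trajectory components.
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))
```

Classification would then compare this quantity across the candidate accent models for the same segment and choose the largest.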


Accent Classification System


Performance – Male and Female Chinese vs American-English


Further Investigation

Further study of accent classification and detection.

Study of accent from a linguistic point of view.

Experimentation and formulation of accent modeling and classification.

Combination of accent information with my previous work to achieve speaker recognition enhancement.

Questions

Thank You
