Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Modeling phonetic category learning from natural acoustic data

Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3, Micha Elsner1, Kasia Hitczenko3, Reiko Mazuka4

1The Ohio State University 2Fairy Devices Inc. 3University of Maryland 4RIKEN Brain Science Institute

Input to infants contains variability


● Yet children acquire phonetic categories of their native language within the first year (Werker & Tees 1984, Polka & Werker 1994)


● Yet children acquire phonetic categories of their native language within the first year (Werker & Tees 1984, Polka & Werker 1994)

● Variability is critical for certain types of language learning (Gomez 2002, Rost & McMurray 2009)

Learning problem

Learning problem

Input:- acoustics- lexical information- etc.

Learning problem


Output:Knowledge ofnative phoneticcategories

Learning problem


?Output:Knowledge ofnative phoneticcategories

Learning problem


?Output:Knowledge ofnative phoneticcategories

Computational Models

e.g. Vallabha et al. (2007), McMurray et al. (2009), Feldman et al. (2013)

Bayesian lexical-distributional clustering model

Acoustic Values Lexical Context

Bayesian Model

Vowel formant pairs Categorical consonant frame

Feldman et al. 2013

Bayesian lexical-distributional clustering model


Bayesian Model

Vowel Categories Lexical Categories


Feldman et al. 2013

Gaussian distributions of sound Sequences of phones


Bayesian Model



Feldman et al. 2013


ih t s || k ay n d || ah v4302100

5501300

7001220

F1F2


Bayesian Model



Feldman et al. 2013


t s || k n d || v

ih t s || k ay n d || ah v4302100

5501300

7001220

F1F2

4302100

5501300

7001220


Bayesian Model



Feldman et al. 2013


t s || k n d || v

ih ay

ah

ih.t.s k.ay.n.d

ah.v

ih t s || k ay n d || ah v4302100

5501300

7001220

F1F2

4302100

5501300

7001220

Feldman et al. 2013 results

Distributional Lexical-Distributional

0.45 0.76

Acoustic simplification: lab productions


Bayesian Model



Feldman et al. 2013


Hillenbrand et al. 1995

Acoustic simplification: lab productions


Bayesian Model



Feldman et al. 2013


● Stressed, single syllable● No noise, reduction,

co-articulation● No prosodic variability

(affects vowel quality and duration)

Hillenbrand et al. 1995

Acoustic simplification: corpus vowels


Bayesian Model



Feldman et al. 2013


Buckeye Speech corpus (Pitt et al. 2007)

Lexical simplification:phonemic transcription


Bayesian Model



Feldman et al. 2013


ih.t.s k.ay.n.d ah.v ey y.uw.n.iy.k p.ah.z.ih.sh.ah.n

It's kind of a unique position

Lexical simplification:phonetic transcription


Bayesian Model



Feldman et al. 2013



It's kind of a unique position

ih.s k.ah.nx ah.v ey y.ih.n.iy.k p.ah.z.ah.sh.ah.n

What Children Actually Hear

ih.s k.ah.nx ah.v ey y.ih.n.iy.k p.ah.z.ah.sh.ah.n

Natural Vowels Reduced Lexical Items

What Models Actually Receive

Lab Vowels Phonemic Transcription


How do models perform given more naturalistic data?

● Models help explore what can be learned from the input, given some algorithm



● Computational models aren't people; some simplification to input must be made




● What is the impact of input simplifications on model performance?




● What is the impact of input simplifications on model performance?

● Are conclusions drawn from these models reliable?

Overview● Simulation 1: Replication of Simplified Input

● Simulation 2: More realistic lexical information

● Simulation 3: More realistic acoustic information

k.ay.n.d

k.ah.nx

k.ah.nx

Corpora

● Laboratory vowel productions:– English: Hillenbrand et al. 1995

– Japanese: Mokhtari & Tanaka 2000

● Natural Speech:– English: Buckeye Speech corpus (Pitt et al. 2007)

– Japanese: R-JMICC corpus (Mazuka et al. 2006)

Simulation 1: Replication of Simplified Input


Bayesian Model


Hillenbrand et al. 1995 Mokhtari & Tanaka 2000

Buckeye Speech corpus (Pitt et al. 2007)R-JMICC corpus (Mazuka et al. 2006)

Resampled

k.ay.n.d

Phonemic

Replication of Simplified Input: English vs. Japanese Lab Vowels

English Japanese

Simplified Input: Successful Category Recovery

English Japanese


English Japanese

Categories found by model

Tru

e Cate

gories


English Japanese


English Japanese

Phonetic F-Score: 0.98Phonetic F-Score: 0.78

Original Feldman et al. 2013 results: Phonetic F-Score: 0.76


English Japanese

Phonetic F-Score: 0.78 Phonetic F-Score: 0.98

Original Feldman et al. 2013 results: Phonetic F-Score: 0.76

Simulation 2: Phonetic Transcription


Bayesian Model


Resampled

k.ah.nx

Phonetic

Corpus effects ofphonetic transcription

● English: vowels of frequent words reduced to schwa in natural speech → increased number of phonetic variants

k.ih.n

k.ah.n

k.ae.n

k.n

k.eh.nk.ae.n

etc.

Corpus effects ofphonetic transcription

● English: vowels of frequent words reduced to schwa in natural speech → increased number of phonetic variants

● Japanese: less phonetic reduction

Japanese Word Types

Phonemic Transcription: 751Phonetic Transcription: 791

English Word Types

Phonemic Transcription: 1099Phonetic Transcription: 1813

Phonetic Transcription:Decline in English Performance

English Japanese


English Japanese



English Japanese


Simulation 3:Realistic Vowels From Corpus


Bayesian Model


Corpus Vowels

k.ah.nx

Phonetic

Simulation 3:Realistic Vowels From Corpus

English Japanese

Realistic Vowels:Poor Category Recovery

English Japanese


English Japanese



English Japanese


Japanese phrase-final lengthening

● Japanese drop in performance potentially partly due to phrase-final lengthening affecting vowel durations


mama



mama

mama:




With Normalization

Without Normalization

mama

mama:



With Normalization


mama

mama:

Summary of Results

Phonetic F-Score

English Japanese

Simulation 1: Simplified Input 0.78 0.98

Simulation 2: Phonetic Transcription 0.46 0.95

Simulation 3:Realistic Vowels 0.13 0.22

Discussion

● There is little variability in simplified input, but a lot in the input received by children– Lexical variability

– Acoustic variability

● Adding this variability back to the input can drastically impact model performance, and may have different effects on different languages.

● To explore the learning problem we must have ecologically valid datasets

Thank You!

NSF IIS-1422987

NSF IIS-1421695

NSF DGE-1343012

OSU Lacqueys reading group

Japanese Long vs Short

English versus Japanese Duration

CDS versus ADS

Speaker Variability

Summary of results



mama



mama:



mama:


With Normalization

Documents

Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,