62
Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso 1 , Kouki Miyazawa 2,4 , Naomi Feldman 3 , Micha Elsner 1 , Kasia Hitczenko 3 , Reiko Mazuka 4 1 The Ohio State University 2 Fairy Devices Inc. 3 University of Maryland 4 RIKEN Brain Science Institute

Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Modeling phonetic category learning from natural acoustic data

Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3, Micha Elsner1, Kasia Hitczenko3, Reiko Mazuka4

1The Ohio State University 2Fairy Devices Inc. 3University of Maryland 4RIKEN Brain Science Institute

Page 2: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Input to infants contains variability

Page 3: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Input to infants contains variability

● Yet children acquire phonetic categories of their native language within the first year (Werker & Tees 1984, Polka & Werker 1994)

Page 4: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Input to infants contains variability

● Yet children acquire phonetic categories of their native language within the first year (Werker & Tees 1984, Polka & Werker 1994)

● Variability is critical for certain types of language learning (Gomez 2002, Rost & McMurray 2009)

Page 5: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Learning problem

Page 6: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Learning problem

Input:- acoustics- lexical information- etc.

Page 7: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Learning problem

Input:- acoustics- lexical information- etc.

Output:Knowledge ofnative phoneticcategories

Page 8: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Learning problem

Input:- acoustics- lexical information- etc.

?Output:Knowledge ofnative phoneticcategories

Page 9: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Learning problem

Input:- acoustics- lexical information- etc.

?Output:Knowledge ofnative phoneticcategories

Computational Models

e.g. Vallabha et al. (2007), McMurray et al. (2009), Feldman et al. (2013)

Page 10: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Bayesian lexical-distributional clustering model

Acoustic Values Lexical Context

Bayesian Model

Vowel formant pairs Categorical consonant frame

Feldman et al. 2013

Page 11: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Bayesian lexical-distributional clustering model

Acoustic Values Lexical Context

Bayesian Model

Vowel Categories Lexical Categories

Vowel formant pairs Categorical consonant frame

Feldman et al. 2013

Gaussian distributions of sound Sequences of phones

Page 12: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Acoustic Values Lexical Context

Bayesian Model

Vowel Categories Lexical Categories

Vowel formant pairs Categorical consonant frame

Feldman et al. 2013

Gaussian distributions of sound Sequences of phones

ih t s || k ay n d || ah v4302100

5501300

7001220

F1F2

Page 13: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Acoustic Values Lexical Context

Bayesian Model

Vowel Categories Lexical Categories

Vowel formant pairs Categorical consonant frame

Feldman et al. 2013

Gaussian distributions of sound Sequences of phones

t s || k n d || v

ih t s || k ay n d || ah v4302100

5501300

7001220

F1F2

4302100

5501300

7001220

Page 14: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Acoustic Values Lexical Context

Bayesian Model

Vowel Categories Lexical Categories

Vowel formant pairs Categorical consonant frame

Feldman et al. 2013

Gaussian distributions of sound Sequences of phones

t s || k n d || v

ih ay

ah

ih.t.s k.ay.n.d

ah.v

ih t s || k ay n d || ah v4302100

5501300

7001220

F1F2

4302100

5501300

7001220

Page 15: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Feldman et al. 2013 results

Distributional Lexical-Distributional

0.45 0.76

Page 16: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Acoustic simplification: lab productions

Acoustic Values Lexical Context

Bayesian Model

Vowel Categories Lexical Categories

Vowel formant pairs Categorical consonant frame

Feldman et al. 2013

Gaussian distributions of sound Sequences of phones

Hillenbrand et al. 1995

Page 17: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Acoustic simplification: lab productions

Acoustic Values Lexical Context

Bayesian Model

Vowel Categories Lexical Categories

Vowel formant pairs Categorical consonant frame

Feldman et al. 2013

Gaussian distributions of sound Sequences of phones

● Stressed, single syllable● No noise, reduction,

co-articulation● No prosodic variability

(affects vowel quality and duration)

Hillenbrand et al. 1995

Page 18: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Acoustic simplification: corpus vowels

Acoustic Values Lexical Context

Bayesian Model

Vowel Categories Lexical Categories

Vowel formant pairs Categorical consonant frame

Feldman et al. 2013

Gaussian distributions of sound Sequences of phones

Buckeye Speech corpus (Pitt et al. 2007)

Page 19: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Lexical simplification:phonemic transcription

Acoustic Values Lexical Context

Bayesian Model

Vowel Categories Lexical Categories

Vowel formant pairs Categorical consonant frame

Feldman et al. 2013

Gaussian distributions of sound Sequences of phones

ih.t.s k.ay.n.d ah.v ey y.uw.n.iy.k p.ah.z.ih.sh.ah.n

It's kind of a unique position

Page 20: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Lexical simplification:phonetic transcription

Acoustic Values Lexical Context

Bayesian Model

Vowel Categories Lexical Categories

Vowel formant pairs Categorical consonant frame

Feldman et al. 2013

Gaussian distributions of sound Sequences of phones

ih.t.s k.ay.n.d ah.v ey y.uw.n.iy.k p.ah.z.ih.sh.ah.n

It's kind of a unique position

ih.s k.ah.nx ah.v ey y.ih.n.iy.k p.ah.z.ah.sh.ah.n

Page 21: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

What Children Actually Hear

ih.s k.ah.nx ah.v ey y.ih.n.iy.k p.ah.z.ah.sh.ah.n

Natural Vowels Reduced Lexical Items

Page 22: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

What Models Actually Receive

Lab Vowels Phonemic Transcription

ih.t.s k.ay.n.d ah.v ey y.uw.n.iy.k p.ah.z.ih.sh.ah.n

Page 23: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

How do models perform given more naturalistic data?

● Models help explore what can be learned from the input, given some algorithm

Page 24: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

How do models perform given more naturalistic data?

● Models help explore what can be learned from the input, given some algorithm

● Computational models aren't people; some simplification to input must be made

Page 25: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

How do models perform given more naturalistic data?

● Models help explore what can be learned from the input, given some algorithm

● Computational models aren't people; some simplification to input must be made

● What is the impact of input simplifications on model performance?

Page 26: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

How do models perform given more naturalistic data?

● Models help explore what can be learned from the input, given some algorithm

● Computational models aren't people; some simplification to input must be made

● What is the impact of input simplifications on model performance?

● Are conclusions drawn from these models reliable?

Page 27: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Overview● Simulation 1: Replication of Simplified Input

● Simulation 2: More realistic lexical information

● Simulation 3: More realistic acoustic information

k.ay.n.d

k.ah.nx

k.ah.nx

Page 28: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Corpora

● Laboratory vowel productions:– English: Hillenbrand et al. 1995

– Japanese: Mokhtari & Tanaka 2000

● Natural Speech:– English: Buckeye Speech corpus (Pitt et al. 2007)

– Japanese: R-JMICC corpus (Mazuka et al. 2006)

Page 29: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Simulation 1: Replication of Simplified Input

Acoustic Values Lexical Context

Bayesian Model

Vowel formant pairs Categorical consonant frame

Hillenbrand et al. 1995 Mokhtari & Tanaka 2000

Buckeye Speech corpus (Pitt et al. 2007)R-JMICC corpus (Mazuka et al. 2006)

Resampled

k.ay.n.d

Phonemic

Page 30: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Replication of Simplified Input: English vs. Japanese Lab Vowels

English Japanese

Page 31: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Simplified Input: Successful Category Recovery

English Japanese

Page 32: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Simplified Input: Successful Category Recovery

English Japanese

Categories found by model

Tru

e Cate

gories

Page 33: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Simplified Input: Successful Category Recovery

English Japanese

Page 34: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Simplified Input: Successful Category Recovery

English Japanese

Phonetic F-Score: 0.98Phonetic F-Score: 0.78

Original Feldman et al. 2013 results: Phonetic F-Score: 0.76

Page 35: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Simplified Input: Successful Category Recovery

English Japanese

Phonetic F-Score: 0.78 Phonetic F-Score: 0.98

Original Feldman et al. 2013 results: Phonetic F-Score: 0.76

Page 36: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Simulation 2: Phonetic Transcription

Acoustic Values Lexical Context

Bayesian Model

Vowel formant pairs Categorical consonant frame

Resampled

k.ah.nx

Phonetic

Page 37: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Corpus effects ofphonetic transcription

● English: vowels of frequent words reduced to schwa in natural speech → increased number of phonetic variants

k.ih.n

k.ah.n

k.ae.n

k.n

k.eh.nk.ae.n

etc.

Page 38: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Corpus effects ofphonetic transcription

● English: vowels of frequent words reduced to schwa in natural speech → increased number of phonetic variants

● Japanese: less phonetic reduction

Japanese Word Types

Phonemic Transcription: 751Phonetic Transcription: 791

English Word Types

Phonemic Transcription: 1099Phonetic Transcription: 1813

Page 39: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Phonetic Transcription:Decline in English Performance

English Japanese

Page 40: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Phonetic Transcription:Decline in English Performance

English Japanese

Phonetic F-Score: 0.46 Phonetic F-Score: 0.95

Page 41: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Phonetic Transcription:Decline in English Performance

English Japanese

Phonetic F-Score: 0.46 Phonetic F-Score: 0.95

Page 42: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Simulation 3:Realistic Vowels From Corpus

Acoustic Values Lexical Context

Bayesian Model

Vowel formant pairs Categorical consonant frame

Corpus Vowels

k.ah.nx

Phonetic

Page 43: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Simulation 3:Realistic Vowels From Corpus

English Japanese

Page 44: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Realistic Vowels:Poor Category Recovery

English Japanese

Page 45: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Realistic Vowels:Poor Category Recovery

English Japanese

Phonetic F-Score: 0.13 Phonetic F-Score: 0.22

Page 46: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Realistic Vowels:Poor Category Recovery

English Japanese

Phonetic F-Score: 0.13 Phonetic F-Score: 0.22

Page 47: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Japanese phrase-final lengthening

● Japanese drop in performance potentially partly due to phrase-final lengthening affecting vowel durations

Page 48: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Japanese phrase-final lengthening

mama

● Japanese drop in performance potentially partly due to phrase-final lengthening affecting vowel durations

Page 49: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Japanese phrase-final lengthening

mama

mama:

● Japanese drop in performance potentially partly due to phrase-final lengthening affecting vowel durations

Page 50: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Japanese phrase-final lengthening

● Japanese drop in performance potentially partly due to phrase-final lengthening affecting vowel durations

With Normalization

Without Normalization

mama

mama:

Page 51: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Japanese phrase-final lengthening

● Japanese drop in performance potentially partly due to phrase-final lengthening affecting vowel durations

With Normalization

Without Normalization

mama

mama:

Page 52: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Summary of Results

Phonetic F-Score

English Japanese

Simulation 1: Simplified Input 0.78 0.98

Simulation 2: Phonetic Transcription 0.46 0.95

Simulation 3:Realistic Vowels 0.13 0.22

Page 53: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Discussion

● There is little variability in simplified input, but a lot in the input received by children– Lexical variability

– Acoustic variability

● Adding this variability back to the input can drastically impact model performance, and may have different effects on different languages.

● To explore the learning problem we must have ecologically valid datasets

Page 54: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Thank You!

NSF IIS-1422987

NSF IIS-1421695

NSF DGE-1343012

OSU Lacqueys reading group

Page 55: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Japanese Long vs Short

Page 56: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

English versus Japanese Duration

Page 57: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

CDS versus ADS

Page 58: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Speaker Variability

Page 59: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Summary of results

Page 60: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Japanese phrase-final lengthening

● Japanese drop in performance potentially partly due to phrase-final lengthening affecting vowel durations

mama

Page 61: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Japanese phrase-final lengthening

● Japanese drop in performance potentially partly due to phrase-final lengthening affecting vowel durations

mama:

Page 62: Modeling phonetic category learning from natural acoustic data · Modeling phonetic category learning from natural acoustic data Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3,

Japanese phrase-final lengthening

● Japanese drop in performance potentially partly due to phrase-final lengthening affecting vowel durations

mama:

Without Normalization

With Normalization