Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
Modeling phonetic category learning from natural acoustic data
Stephanie Antetomaso1, Kouki Miyazawa2,4, Naomi Feldman3, Micha Elsner1, Kasia Hitczenko3, Reiko Mazuka4
1The Ohio State University 2Fairy Devices Inc. 3University of Maryland 4RIKEN Brain Science Institute
Input to infants contains variability
Input to infants contains variability
● Yet children acquire phonetic categories of their native language within the first year (Werker & Tees 1984, Polka & Werker 1994)
Input to infants contains variability
● Yet children acquire phonetic categories of their native language within the first year (Werker & Tees 1984, Polka & Werker 1994)
● Variability is critical for certain types of language learning (Gomez 2002, Rost & McMurray 2009)
Learning problem
Learning problem
Input:- acoustics- lexical information- etc.
Learning problem
Input:- acoustics- lexical information- etc.
Output:Knowledge ofnative phoneticcategories
Learning problem
Input:- acoustics- lexical information- etc.
?Output:Knowledge ofnative phoneticcategories
Learning problem
Input:- acoustics- lexical information- etc.
?Output:Knowledge ofnative phoneticcategories
Computational Models
e.g. Vallabha et al. (2007), McMurray et al. (2009), Feldman et al. (2013)
Bayesian lexical-distributional clustering model
Acoustic Values Lexical Context
Bayesian Model
Vowel formant pairs Categorical consonant frame
Feldman et al. 2013
Bayesian lexical-distributional clustering model
Acoustic Values Lexical Context
Bayesian Model
Vowel Categories Lexical Categories
Vowel formant pairs Categorical consonant frame
Feldman et al. 2013
Gaussian distributions of sound Sequences of phones
Acoustic Values Lexical Context
Bayesian Model
Vowel Categories Lexical Categories
Vowel formant pairs Categorical consonant frame
Feldman et al. 2013
Gaussian distributions of sound Sequences of phones
ih t s || k ay n d || ah v4302100
5501300
7001220
F1F2
Acoustic Values Lexical Context
Bayesian Model
Vowel Categories Lexical Categories
Vowel formant pairs Categorical consonant frame
Feldman et al. 2013
Gaussian distributions of sound Sequences of phones
t s || k n d || v
ih t s || k ay n d || ah v4302100
5501300
7001220
F1F2
4302100
5501300
7001220
Acoustic Values Lexical Context
Bayesian Model
Vowel Categories Lexical Categories
Vowel formant pairs Categorical consonant frame
Feldman et al. 2013
Gaussian distributions of sound Sequences of phones
t s || k n d || v
ih ay
ah
ih.t.s k.ay.n.d
ah.v
ih t s || k ay n d || ah v4302100
5501300
7001220
F1F2
4302100
5501300
7001220
Feldman et al. 2013 results
Distributional Lexical-Distributional
0.45 0.76
Acoustic simplification: lab productions
Acoustic Values Lexical Context
Bayesian Model
Vowel Categories Lexical Categories
Vowel formant pairs Categorical consonant frame
Feldman et al. 2013
Gaussian distributions of sound Sequences of phones
Hillenbrand et al. 1995
Acoustic simplification: lab productions
Acoustic Values Lexical Context
Bayesian Model
Vowel Categories Lexical Categories
Vowel formant pairs Categorical consonant frame
Feldman et al. 2013
Gaussian distributions of sound Sequences of phones
● Stressed, single syllable● No noise, reduction,
co-articulation● No prosodic variability
(affects vowel quality and duration)
Hillenbrand et al. 1995
Acoustic simplification: corpus vowels
Acoustic Values Lexical Context
Bayesian Model
Vowel Categories Lexical Categories
Vowel formant pairs Categorical consonant frame
Feldman et al. 2013
Gaussian distributions of sound Sequences of phones
Buckeye Speech corpus (Pitt et al. 2007)
Lexical simplification:phonemic transcription
Acoustic Values Lexical Context
Bayesian Model
Vowel Categories Lexical Categories
Vowel formant pairs Categorical consonant frame
Feldman et al. 2013
Gaussian distributions of sound Sequences of phones
ih.t.s k.ay.n.d ah.v ey y.uw.n.iy.k p.ah.z.ih.sh.ah.n
It's kind of a unique position
Lexical simplification:phonetic transcription
Acoustic Values Lexical Context
Bayesian Model
Vowel Categories Lexical Categories
Vowel formant pairs Categorical consonant frame
Feldman et al. 2013
Gaussian distributions of sound Sequences of phones
ih.t.s k.ay.n.d ah.v ey y.uw.n.iy.k p.ah.z.ih.sh.ah.n
It's kind of a unique position
ih.s k.ah.nx ah.v ey y.ih.n.iy.k p.ah.z.ah.sh.ah.n
What Children Actually Hear
ih.s k.ah.nx ah.v ey y.ih.n.iy.k p.ah.z.ah.sh.ah.n
Natural Vowels Reduced Lexical Items
What Models Actually Receive
Lab Vowels Phonemic Transcription
ih.t.s k.ay.n.d ah.v ey y.uw.n.iy.k p.ah.z.ih.sh.ah.n
How do models perform given more naturalistic data?
● Models help explore what can be learned from the input, given some algorithm
How do models perform given more naturalistic data?
● Models help explore what can be learned from the input, given some algorithm
● Computational models aren't people; some simplification to input must be made
How do models perform given more naturalistic data?
● Models help explore what can be learned from the input, given some algorithm
● Computational models aren't people; some simplification to input must be made
● What is the impact of input simplifications on model performance?
How do models perform given more naturalistic data?
● Models help explore what can be learned from the input, given some algorithm
● Computational models aren't people; some simplification to input must be made
● What is the impact of input simplifications on model performance?
● Are conclusions drawn from these models reliable?
Overview● Simulation 1: Replication of Simplified Input
● Simulation 2: More realistic lexical information
● Simulation 3: More realistic acoustic information
k.ay.n.d
k.ah.nx
k.ah.nx
Corpora
● Laboratory vowel productions:– English: Hillenbrand et al. 1995
– Japanese: Mokhtari & Tanaka 2000
● Natural Speech:– English: Buckeye Speech corpus (Pitt et al. 2007)
– Japanese: R-JMICC corpus (Mazuka et al. 2006)
Simulation 1: Replication of Simplified Input
Acoustic Values Lexical Context
Bayesian Model
Vowel formant pairs Categorical consonant frame
Hillenbrand et al. 1995 Mokhtari & Tanaka 2000
Buckeye Speech corpus (Pitt et al. 2007)R-JMICC corpus (Mazuka et al. 2006)
Resampled
k.ay.n.d
Phonemic
Replication of Simplified Input: English vs. Japanese Lab Vowels
English Japanese
Simplified Input: Successful Category Recovery
English Japanese
Simplified Input: Successful Category Recovery
English Japanese
Categories found by model
Tru
e Cate
gories
Simplified Input: Successful Category Recovery
English Japanese
Simplified Input: Successful Category Recovery
English Japanese
Phonetic F-Score: 0.98Phonetic F-Score: 0.78
Original Feldman et al. 2013 results: Phonetic F-Score: 0.76
Simplified Input: Successful Category Recovery
English Japanese
Phonetic F-Score: 0.78 Phonetic F-Score: 0.98
Original Feldman et al. 2013 results: Phonetic F-Score: 0.76
Simulation 2: Phonetic Transcription
Acoustic Values Lexical Context
Bayesian Model
Vowel formant pairs Categorical consonant frame
Resampled
k.ah.nx
Phonetic
Corpus effects ofphonetic transcription
● English: vowels of frequent words reduced to schwa in natural speech → increased number of phonetic variants
k.ih.n
k.ah.n
k.ae.n
k.n
k.eh.nk.ae.n
etc.
Corpus effects ofphonetic transcription
● English: vowels of frequent words reduced to schwa in natural speech → increased number of phonetic variants
● Japanese: less phonetic reduction
Japanese Word Types
Phonemic Transcription: 751Phonetic Transcription: 791
English Word Types
Phonemic Transcription: 1099Phonetic Transcription: 1813
Phonetic Transcription:Decline in English Performance
English Japanese
Phonetic Transcription:Decline in English Performance
English Japanese
Phonetic F-Score: 0.46 Phonetic F-Score: 0.95
Phonetic Transcription:Decline in English Performance
English Japanese
Phonetic F-Score: 0.46 Phonetic F-Score: 0.95
Simulation 3:Realistic Vowels From Corpus
Acoustic Values Lexical Context
Bayesian Model
Vowel formant pairs Categorical consonant frame
Corpus Vowels
k.ah.nx
Phonetic
Simulation 3:Realistic Vowels From Corpus
English Japanese
Realistic Vowels:Poor Category Recovery
English Japanese
Realistic Vowels:Poor Category Recovery
English Japanese
Phonetic F-Score: 0.13 Phonetic F-Score: 0.22
Realistic Vowels:Poor Category Recovery
English Japanese
Phonetic F-Score: 0.13 Phonetic F-Score: 0.22
Japanese phrase-final lengthening
● Japanese drop in performance potentially partly due to phrase-final lengthening affecting vowel durations
Japanese phrase-final lengthening
mama
● Japanese drop in performance potentially partly due to phrase-final lengthening affecting vowel durations
Japanese phrase-final lengthening
mama
mama:
● Japanese drop in performance potentially partly due to phrase-final lengthening affecting vowel durations
Japanese phrase-final lengthening
● Japanese drop in performance potentially partly due to phrase-final lengthening affecting vowel durations
With Normalization
Without Normalization
mama
mama:
Japanese phrase-final lengthening
● Japanese drop in performance potentially partly due to phrase-final lengthening affecting vowel durations
With Normalization
Without Normalization
mama
mama:
Summary of Results
Phonetic F-Score
English Japanese
Simulation 1: Simplified Input 0.78 0.98
Simulation 2: Phonetic Transcription 0.46 0.95
Simulation 3:Realistic Vowels 0.13 0.22
Discussion
● There is little variability in simplified input, but a lot in the input received by children– Lexical variability
– Acoustic variability
● Adding this variability back to the input can drastically impact model performance, and may have different effects on different languages.
● To explore the learning problem we must have ecologically valid datasets
Thank You!
NSF IIS-1422987
NSF IIS-1421695
NSF DGE-1343012
OSU Lacqueys reading group
Japanese Long vs Short
English versus Japanese Duration
CDS versus ADS
Speaker Variability
Summary of results
Japanese phrase-final lengthening
● Japanese drop in performance potentially partly due to phrase-final lengthening affecting vowel durations
mama
Japanese phrase-final lengthening
● Japanese drop in performance potentially partly due to phrase-final lengthening affecting vowel durations
mama:
Japanese phrase-final lengthening
● Japanese drop in performance potentially partly due to phrase-final lengthening affecting vowel durations
mama:
Without Normalization
With Normalization