Continuous acoustic detail affects spoken word recognition

Bob McMurrayUniversity of Iowa

Dept. of Psychology

Implications for cognition,

development and language disorders.

Collaborators

Richard AslinMichael TanenhausDavid GowJ. Bruce Tomblin

Joe ToscanoCheyenne MunsonDana SubikJulie Markant

Why Speech and Word Recognition

1) Interface between perception and cognition.
- Basic categories
- Meaning
- Continuous input -> discrete representations

2) Meaningful stimuli are almost always temporal.
- Music
- Visual scenes (across saccades)
- Language

3) We understand the:
- Cognitive processes (word recognition)
- Perceptual processes (speech perception)
- Ecology of the input (phonetics)

4) Speech is important: disordered language.

Divisions, Divisions…

[Table: divisions of the field]

                        Psychology                               Linguistics               Speech / Language Pathology
Perception (& Action)   Speech Perception                        Phonetics                 Speech, Hearing
Cognition               Word Recognition, Sentence Processing    Phonology, The Lexicon    Language

Divisions, Divisions…

Divisions are useful for framing research and focusing questions.

But divisions between domains of study can become implicit models of cognitive processing.

[Diagram: acoustic input → sublexical units (/b/, /a/, /l/, /p/, /i/…) → lexicon]

Speech Perception: categorization of acoustic input into sublexical units.

Word Recognition: identification of the target word from active sublexical units (the lexicon).

Divisions in Spoken Language Understanding

[Diagram repeated: acoustic input → sublexical units → lexicon]

Speech Perception: pattern recognition, normalization processes, stream segregation.

Word Recognition: competition, activation, constraint satisfaction.

Divisions yield processes

[Diagram repeated: acoustic input → sublexical units → lexicon]

Speech Perception: extract invariant phonemes and features; discard continuous variation. (Reduce continuous variance.)

Word Recognition: identify a single referent; ignore competitors. (Reduce variance.)

Processes yield models

The Variance Reduction Model

[Diagram: VRM schematic: acoustic input → remove variance → phonemes (etc.) → remove variance → words]

Variance Reduction Model (VRM) Understanding speech is a process of progressively extracting invariant, discrete representations from variable, continuous input.

Continuous speech cues play a minimal role in word recognition (and probably wouldn't be helpful anyway).

Temporal Integration

The VRM might apply if speech were static.

“Goon”

Goal: identify /u/.
Signal: low F1 and F2, high F3.
Noise: initially F2 decreasing; later F2 increasing; presence of an anti-formant.

Variance Reduction Mechanisms

Temporal Integration

But the dynamic properties make it more difficult.

“Goon”

Goal: identify /u/.
Signal: low F1 and F2, high F3.
Noise: initially F2 decreasing (gone; maybe in STM?); later F2 increasing (hasn't happened yet); presence of an anti-formant.

Temporal Integration

But the dynamic properties make it more difficult.

“Goon”

Goal: identify /u/.
Signal: low F1 and F2, high F3.
Signal′: initially F2 decreasing (prior /g/; gone, maybe in STM?); later F2 increasing (upcoming /n/; hasn't happened yet); presence of an anti-formant.

Variance Utilization Mechanisms

Goals

1) Replace the Variance Reduction Model with the Variance Utilization Model.

2) Normal lexical activation processes can serve as variance utilization mechanisms.

3) Speculatively (and not so speculatively) examine the consequences for:
• Temporal integration / short-term memory
• Development
• Non-normal development

Outline

1) Review
• Origins of the VRM
• Spoken word recognition

2) Empirical test

3) The VUM
• Lexical locus
• Temporal integration
• SLI proposal

4) Developmental consequences
• Empirical tests
• Computational model
• CI proposal

[Diagram: as the input "ba…" unfolds, candidates baby, bait, basic, barrier, barricade, and bakery are all consistent; later input ("…kery") rules out all but bakery.]

Online Spoken Word Recognition

• Information arrives sequentially.
• Fundamental problem: at early points in time, the signal is temporarily ambiguous.
• Later-arriving information disambiguates the word.

Word Recognition

Current models of spoken word recognition

• Immediacy: Hypotheses formed from the earliest moments of input.

• Activation Based: Lexical candidates (words) receive activation to the degree they match the input.

• Parallel Processing: Multiple items are active in parallel.

• Competition: Items compete with each other for recognition.
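As a rough illustration of how these four principles fit together, here is a minimal Python sketch (the toy lexicon, the segment-matching rule, and the normalization step are invented for illustration; they are not the models discussed in this talk):

```python
# Toy sketch: immediate, parallel, activation-based word recognition with competition.
# All rules and numbers here are illustrative assumptions.

LEXICON = ["butter", "putter", "bump", "beach", "dog"]

def match(word, heard):
    """Graded match: proportion of heard segments consistent with the word so far."""
    hits = sum(1 for i, seg in enumerate(heard) if i < len(word) and word[i] == seg)
    return hits / len(heard)

def recognize(segments):
    activation = {w: 0.0 for w in LEXICON}
    for t in range(1, len(segments) + 1):
        heard = segments[:t]                              # immediacy: use partial input
        for w in LEXICON:                                 # parallel, graded activation
            activation[w] += match(w, heard)
        total = sum(activation.values())
        activation = {w: a / total for w, a in activation.items()}  # competition (normalization)
        print(t, sorted(activation.items(), key=lambda kv: -kv[1]))
    return max(activation, key=activation.get)

recognize(list("butter"))   # 'putter' and 'bump' stay partially active until later segments arrive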

Word Recognition

[Diagram: lexical activation over time for the input "b…u…tt…e…r"; candidates: butter, putter, bump, beach, dog.]

Word Recognition

These processes have been well defined for a phonemic representation of the input.

Considerably less ambiguity if we consider subphonemic information.

• Bonus: processing dynamics may solve problems in speech perception.

Example: subphonemic effects of motor processes.

Word Recognition

Coarticulation

Sensitivity to these perceptual details might yield earlier disambiguation.

Lexical activation could retain these perceptual details.

Example: Coarticulation. Articulation (lips, tongue…) reflects current, future and past events.

Subtle subphonemic variation in speech reflects temporal organization.

[Diagram: overlapping articulation of the segments of a word (coarticulation).]

Any action reflects future actions as it unfolds.

These processes have largely been ignored because of a history of evidence that perceptual variability gets discarded.

Example: Categorical Perception

Review:

Subphonemic variation in VOT is discarded in favor of a discrete symbol (phoneme): B vs. P.

• Sharp identification of tokens on a continuum.
• Discrimination poor within a phonetic category.

[Figure: identification (% /pa/) and discrimination functions across a 0-100 ms VOT continuum.]

Categorical Perception

Evidence against the strong form of Categorical Perception from psychophysical-type tasks:

Discrimination tasks: Pisoni & Tash (1974); Pisoni & Lazarus (1974); Carney, Widin & Viemeister (1977)
Training: Samuel (1977); Pisoni, Aslin, Perey & Hennessy (1982)
Goodness ratings: Miller (1997); Massaro & Cohen (1983)

Categorical Perception

[Diagram: VRM schematic repeated]

CP enabled a fundamental independence of speech perception & spoken word recognition.

Evidence against CP seen as supporting VRM (auditory vs. phonological processing mode).

Critical Prediction: continuous variation in the signal should not affect word recognition.

Variance Reduction Model

Does within-category acoustic detail systematically affect higher-level language?

Is there a gradient effect of subphonemic detail on lexical activation?

Experiment 1

A gradient relationship would yield systematic effects of subphonemic information on lexical activation.

If this gradiency is useful for temporal integration, it must be preserved over time.

Need a design sensitive to both acoustic detail and detailed temporal dynamics of lexical activation.

McMurray, Aslin & Tanenhaus (2002)

Use a speech continuum: more steps yield a better picture of the acoustic mapping.

KlattWorks: generate synthetic continua from natural speech.

Acoustic Detail

9-step VOT continua (0-40 ms)

6 pairs of words: beach/peach, bale/pale, bear/pear, bump/pump, bomb/palm, butter/putter.

6 fillers: lamp, leg, lock, ladder, lip, leaf; shark, shell, shoe, ship, sheep, shirt.

Acoustic Detail

How do we tap on-line recognition? With an on-line task: eye-movements.

Subjects hear spoken language and manipulate objects in a visual world.

Visual world includes set of objects with interesting linguistic properties.

E.g., a beach, a peach, and some unrelated items.

Eye-movements to each object are monitored throughout the task.

Temporal Dynamics

Tanenhaus, Spivey-Knowlton, Eberhard & Sedivy (1995)

• Relatively natural task.

• Eye-movements generated very fast (within 200ms of first bit of information).

• Eye movements time-locked to speech.

• Subjects aren’t aware of eye-movements.

• Fixation probability maps onto lexical activation.

Why use eye-movements and visual world paradigm?

Temporal Dynamics

Task: a moment to view the items; the subject then hears a word (e.g., "bear") and clicks on the matching picture. Repeat 1080 times.

Category boundary: by subject, 17.25 ± 1.33 ms; by item, 17.24 ± 1.24 ms.

High agreement across subjects and items for the category boundary.

[Figure: proportion /p/ responses as a function of VOT (0-40 ms); a sharp identification function separating B and P.]

Identification Results

Eye-Movement Analysis

Target = Bear

Competitor = Pear

Unrelated = Lamp, Ship

[Figure: trials are aligned at word onset and, at each point in time, the percentage of trials fixating each object is computed, yielding fixation-proportion curves over time (ms); shown for VOT = 0 (response = /b/) and VOT = 40 (response = /p/).]

More looks to the competitor than to unrelated items.
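A minimal sketch of how curves like these can be computed from trial-level fixation data (the data layout and sampling assumption are mine, not the actual analysis pipeline):

```python
import numpy as np

def fixation_proportions(trials, objects=("target", "competitor", "unrelated")):
    """Average fixation curves across trials for one cell of the design
    (e.g., VOT = 0 ms, response = 'bear').

    trials: list of equal-length sequences of object labels, one label per
            time sample from word onset (e.g., every 4 ms).
    Returns a dict mapping each object to its fixation proportion over time."""
    data = np.asarray(trials)                         # shape: (n_trials, n_samples)
    return {obj: (data == obj).mean(axis=0) for obj in objects}

# curves = fixation_proportions(trials_vot0_bear)     # hypothetical trial data
# curves["competitor"]: proportion of trials fixating the pear at each time step
```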

Eye-Movement Results

Given that the subject heard "bear" and clicked on "bear"… how often was the subject looking at the pear?

[Schematic: predicted competitor-fixation curves over time under categorical results vs. a gradient effect.]

Eye-Movement Results

[Figure: competitor fixations over time since word onset (ms), by VOT: 0-15 ms (response = /b/) and 20-40 ms (response = /p/).]

Long-lasting gradient effect: seen throughout the timecourse of processing.

Eye-Movement Results

[Figure: competitor fixations (area under the curve) as a function of VOT (ms), split at the category boundary by response.]

Area under the curve: clear effects of VOT (B: p = .017*; P: p < .001***). Linear trend: B: p = .023*; P: p = .002***.

Eye-Movement Results

[Figure: competitor fixations as a function of VOT (ms), unambiguous stimuli only, split at the category boundary by response.]

Unambiguous stimuli only: clear effects of VOT (B: p = .014*; P: p = .001***). Linear trend: B: p = .009**; P: p = .007**.

Eye-Movement Results

Summary

Subphonemic acoustic differences in VOT have a gradient effect on lexical activation.

• Gradient effect of VOT on looks to the competitor.
• Seems to be long-lasting.
• Effect holds even for unambiguous stimuli.

Consistent with a growing body of work using priming (Andruski, Blumstein & Burton, 1994; Utman, Blumstein & Burton, 2000; Gow, 2001, 2002).

Extensions

Basic effect has been extended to other phonetic cues: a general property of word recognition.

Lexical Sensitivity

• Voicing (b/p)¹
• Laterality (l/r), manner (b/w), place (d/g)¹
• Vowels (i/I, …)²
• Natural speech (VOT)³
• ✗ Metalinguistic tasks³

¹ McMurray, Clayards, Tanenhaus & Aslin (2004)
² McMurray & Toscano (in prep)
³ McMurray, Aslin, Tanenhaus, Spivey and Subik (submitted)

[Figure: competitor fixations as a function of VOT (ms), split at the category boundary: looks to B when response = B vs. looks to B when response = P.]

The Variance Utilization Model

1) Word recognition is systematically sensitive to subphonemic acoustic detail.

2) Acoustic detail is represented as gradations in activation across the lexicon.

3) Normal word recognition processes do the work of:
• Maintaining detail
• Sharpening categories
• Anticipating upcoming material
• Resolving prior ambiguity

[Diagram: lexical activation over time for the input "b…u…m…p…"; candidates: bump, bumper, bun, bomb, dump, pump.]

• Gradations in phonetic cues (e.g., b/p, b/d) are preserved as relative lexical activation.
• Non-phonemic distinctions are preserved (e.g., vowel length: Gow & Gordon, 1995; Salverda, Dahan & McQueen, 2003).
• Material is only retained until it is no longer needed (e.g., n/m information is lost once the word is resolved); words are a conveniently sized unit.
• No need for explicit short-term memory: lexical activation persists over time.
• Lexical competition: perceptual warping (a la CP) results from natural competition processes.

Current models of spoken word recognition map onto variance utilization:

• Immediacy: phonetic cues are not simultaneous; activation retains early cues.
• Activation based: graded response to graded input.
• Parallel processing: preserves alternative interpretations until confident; anticipatory activation for future possibilities.
• Competition: non-linear transformation of perceptual space.

Can lexical activation help integrate continuous acoustic cues over time?

• Regressive ambiguity resolution.
• Anticipation of upcoming material.

Experiment 2: Regressive Ambiguity Resolution

How long are gradient effects of within-category detail maintained?

Can subphonemic variation play a role in ambiguity resolution?

How is information at multiple levels integrated?

What if the initial portion of a stimulus was misperceived?

If the competitor is still somewhat active, it is easy to activate it the rest of the way. If the competitor is completely inactive, the system will "garden-path".

P(misperception) varies with distance from the category boundary, so gradient activation allows the system to hedge its bets.

Misperception

[Diagram: activation over time for the input /p-b eI r ə k i t…/; candidates: parakeet, barricade.]

Categorical lexicon: barricade vs. parakeet.
Gradient sensitivity: /beIrəkeId/ vs. /peIrəkit/.

Misperception

10 pairs of voiced/voiceless (b/p, d/t) items.

Voiced         Voiceless      Overlap
Bumpercar      Pumpernickel   6
Barricade      Parakeet       5
Bassinet       Passenger      5
Blanket        Plankton       5
Beachball      Peachpit       4
Billboard      Pillbox        4
Drain Pipes    Train Tracks   4
Dreadlocks     Treadmill      4
Delaware       Telephone      4
Delicatessen   Television     4

Methods (McMurray, Tanenhaus & Aslin, in prep)


Methods

[Figure: fixations to target over time (ms) by VOT, for barricade → "parricade".]

Eye Movement Results

Faster activation of the target as VOTs near the lexical endpoint.
• Even within the non-word range.

[Figure: fixations to target over time (ms) by VOT, for parakeet → "barakeet".]

Faster activation of target as VOTs near lexical endpoint.

• Even within the non-word range.

Eye Movement Results

Effect of VOT reduced as lexical information takes over.

[Figure: effect size over time (ms): the VOT effect shrinks while the lexical effect grows.]

Experiment 2b

Are the results driven by the presence of the visual competitor, or is this a natural process of lexical activation?

(Look, Ma, no parakeet: the competitor picture is removed from the display.)

Experiment 2b: Results

[Figures: looks to barricade (barricade → "parricade") and looks to parakeet (parakeet → "barakeet") over time (ms) by VOT, with no competitor picture in the display.]

• Effect found even without a visual competitor.
• Regressive ambiguity resolution is a general property of lexical processes.

Gradient effect of within-category variation without minimal pairs.

Experiment 2 Conclusions

Gradient effect is long-lasting: mean point of disambiguation (POD) = 240 ms.

The effect is not driven by visual context.

Regressive ambiguity resolution:
• Subphonemic gradations are maintained until more information arrives.
• Subphonemic gradation is not maintained after the POD.
• Subphonemic gradation can improve (or hinder) recovery from a garden path.

Current models of spoken word recognition

• Immediacy: Phonetic cues not simultaneous,Activation retains early cues.

• Parallel Processing: Preserves alternative interpretations until confident.

Anticipatory activation for future possibilities.

The Variance Utilization Model

Can lexical activation help integrate continuous acoustic cues over time?

• Regressive ambiguity resolution.
• Anticipation of upcoming material?

Progressive Expectation Formation

Can within-category detail be used to predict future acoustic/phonetic events?

Yes: Phonological regularities create systematic within-category variation.

• Predicts future events.

(Gow & McMurray, in press)

[Diagram: activation over time for the input "m…a…r…oo…ng…g…oo…se"; candidates: maroon, goose, goat, duck.]

Word-final coronal consonants (n, t, d) assimilate the place of the following segment.

Place assimilation yields ambiguous segments that can be used to anticipate upcoming material.

Experiment 3: Anticipation

Display: maroon goose, maroon duck.

Subject hears:
"select the maroon duck"
"select the maroon goose"
"select the maroong goose"
"select the maroong duck" *

We should see faster eye-movements to "goose" after assimilated consonants.

Methods

Results

[Figure: looks to "goose" as a function of time (ms), assimilated vs. non-assimilated, aligned to the onset of "goose" plus oculomotor delay.]

Anticipatory effect on looks to the non-coronal (goose).
Inhibitory effect on looks to the coronal (duck, p = .024).

[Figure: looks to "duck" as a function of time (ms), assimilated vs. non-assimilated, aligned to the onset of "goose" plus oculomotor delay.]

Results

Sensitivity to subphonemic detail:
• Increases priors on likely upcoming events.
• Decreases priors on unlikely upcoming events.
• An active temporal integration process.

Occasionally assimilation creates ambiguity:
• Resolves prior ambiguity (e.g., "mudg drinker").
• Similar to Experiment 2.

• Progressive effect delayed 200 ms by lexical competition, supporting a lexical locus.

Summary

Lexical activation is exquisitely sensitive to within-category detail.

This sensitivity is useful to integrate material over time.

• Regressive Ambiguity resolution. • Progressive Facilitation

Underpins a potential lexical role in speech perception.

Adult Summary

Word Recognition: not separable from speech perception.

Specific Language Impairment => deficits in:

• Speech perception: less categorical perception (some debate: Thibodeaux & Sussman, 1979; Coady, Kluender & Evans, in press; Manis et al., 1997; Serniclaes et al., 2004; Van Alphen et al., 2004).

• Word recognition: slower recognition (Montgomery, 2002; Dollaghan, 1998).

Could word recognition deficits account for apparent perceptual deficits?

Consequences for Language Disorders

[Diagram: lexical activation over time for the input "b…u…m…p…" (repeated from above).]

Lexical competition: perceptual warping (a la CP) results from natural competition processes.

The Variance Utilization Model

Categorical perception:
• Stimuli in the same category become closer in perceptual space (e.g., Goldstone, 2001).

Lexical competition:
• The most active lexical candidate inhibits alternatives.
• It becomes more active, and hence more similar to the prototype…
• …and feeds back to alter phoneme representations (Magnuson, McMurray, Tanenhaus & Aslin, 2003).
• Two versions of the same word (category) become more similar.

Worked example:

Input:                 p = 20, b = 80
Words (activation):    peach = 20, beach = 80
Words (competition):   peach = 10, beach = 90
Phonemes (feedback):   p = 10, b = 90

[90 10] is more similar to the prototype [100 0] than the input was: perceptual space is warped. Competition is the critical step, and the warped input is fed back.

If competition is suppressed (e.g., by a low-familiarity word), we should see less CP and greater sensitivity to within-category detail.

The Variance Utilization Model
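A minimal numerical sketch of that activation → competition → feedback loop (the power-based competition rule and all values are assumptions chosen only to mimic the flavor of the example above, not the actual model):

```python
def normalize(acts):
    total = sum(acts.values())
    return {k: v / total for k, v in acts.items()}

def compete(acts, power=2.0):
    """Raising activations to a power > 1 and renormalizing sharpens the
    distribution toward the leading candidate (a simple stand-in for
    winner-biased lexical competition)."""
    return normalize({k: v ** power for k, v in acts.items()})

phonemes = normalize({"b": 80, "p": 20})                  # input slightly favors /b/
words = {"beach": phonemes["b"], "peach": phonemes["p"]}  # graded lexical activation
words = compete(words)                                    # beach suppresses peach
feedback = {"b": words["beach"], "p": words["peach"]}     # words feed back to phonemes

print(words)      # ~{'beach': 0.94, 'peach': 0.06}: closer to the prototype [1, 0]
print(feedback)   # the warped values propagate back, compressing within-category detail
```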

Visual World Paradigm: ideal test

• Simple task: useable with many populations.

• No meta-linguistic knowledge required.

• Used to examine:
- Lexical activation (Allopenna et al., 1998)
- Lexical competition (Dahan et al., 2001)
- Within-category sensitivity (McMurray et al., 2002)

Consequences for Language Disorders

Proposed Research Program

Population: SLI & normal adolescents, 16-17 y.o. (Iowa Longitudinal Study; Tomblin et al.)

Step 1: Word familiarity (~200 words).
Step 2: Basic word recognition. Stimuli: beaker, beetle, speaker, etc.
Step 3: Frequency effects: familiar words more active than unfamiliar.
Step 4: Gradiency (sensitivity to VOT) suppressed for familiar words (high competition).
Step 5: How do we buttress lexical activation?

Consequences for Language Disorders

(with J. Bruce Tomblin, V. Samelson, and S. Lee)

Word recognition is sensitive to perceptual detail.
• Temporal integration.

Word recognition supports perceptual processes.
• Hypothesis: related to SLI.

Continuous variability is NOT discarded during recognition.

Does this change how we think about development?

Consequences of VUM

Historically, work in speech perception has been linked to development.

Sensitivity to subphonemic detail forces us to revise our view of development.

Development

Infants face additional problems:

No lexicon available to clean up noisy input: rely on acoustic regularities.

Extracting a phonology from the series of utterances.

Sensitivity to subphonemic detail: for 30 years, virtually all attempts to address this question have yielded categorical discrimination (e.g., Eimas, Siqueland, Jusczyk & Vigorito, 1971).

Exception: Miller & Eimas (1996).
• Only at extreme VOTs.
• Only when habituated to a non-prototypical token.

Development

Nonetheless, infants possess abilities that would require within-category sensitivity.

• Infants can use allophonic differences at word boundaries for segmentation (Jusczyk, Hohne & Bauman, 1999; Hohne, & Jusczyk, 1994)

• Infants can learn phonetic categories from distributional statistics (Maye, Werker & Gerken, 2002; Maye & Weiss, 2004).

Use?

Speech production causes clustering along contrastive phonetic dimensions.

E.g., voicing / voice onset time: B: VOT ≈ 0 ms; P: VOT ≈ 40 ms.

Result: a bimodal distribution; within a category, VOT forms a Gaussian distribution.

[Figure: bimodal distribution of VOT with modes near 0 ms and 40 ms.]

Statistical Category Learning

To statistically learn speech categories, infants must:

• Record frequencies of tokens at each value along a stimulus dimension (e.g., VOT).
• Extract categories (+voice, -voice) from the distribution.

This requires the ability to track specific VOTs.

[Figure: frequency of tokens as a function of VOT (0-50 ms), with +voice and -voice clusters.]
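A minimal sketch of the frequency-tracking idea on a bimodal VOT distribution (the means, SDs, and sample sizes are rough illustrative values patterned on the B ≈ 0 ms / P ≈ 40 ms description above, not measured data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Production: each voicing category contributes a Gaussian cluster of VOTs,
# so the pooled input the infant hears is bimodal.
b_tokens = rng.normal(loc=0.0, scale=5.0, size=500)    # /b/-like VOTs (ms), assumed values
p_tokens = rng.normal(loc=40.0, scale=8.0, size=500)   # /p/-like VOTs (ms), assumed values
tokens = np.concatenate([b_tokens, p_tokens])

# "Record frequencies of tokens at each value along the dimension":
counts, edges = np.histogram(tokens, bins=np.arange(-20, 71, 5))
for lo, c in zip(edges[:-1], counts):
    print(f"{lo:>4.0f} ms  {'#' * (c // 5)}")          # crude text histogram: two modes emerge
```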

Statistical Category Learning

Known statistical learning abilities (Maye et al) predict:

• Within category sensitivity.

• Graded structure to category.

Why no demonstrations of sensitivity?

• Habituation: discrimination, not identification; possible selective adaptation; possible attenuation of sensitivity.
• Synthetic speech: not ideal for infants.
• Single exemplar / continuum: not necessarily a category representation.

Experiment 4: Reassess issue with improved methods.

Statistical Category Learning

Head-Turn Preference Procedure (Jusczyk & Aslin, 1995)

Infants exposed to a chunk of language:

• Words in running speech.

• Stream of continuous speech (ala statistical learning paradigm).

• Word list.

Memory for exposed items (or abstractions) assessed:
• Compare listening time between consistent and inconsistent items.

HTPP procedure:
• Test trials start with all lights off.
• The center light blinks, bringing the infant's attention to the center.
• One of the side lights blinks.
• When the infant looks at the side light, he hears a word ("Beach… Beach… Beach…") for as long as he keeps looking.

7.5-month-old infants were exposed to either 4 b-words or 4 p-words (80 repetitions total): beach/peach, bail/pail, bear/pear, bomb/palm.

Infants form a category of the exposed class of words.

Measure listening time on:
• Original words (e.g., bear / pear)
• VOT closer to the boundary (bear* / pear*)
• Competitors (pear / bear)

Methods

McMurray & Aslin, 2005

B* and P* were judged /b/ or /p/ at least 90% consistently by adult listeners (B*: 97%; P*: 96%).

Stimuli were constructed by cross-splicing naturally produced tokens of each endpoint.
B: M = 3.6 ms VOT;   P: M = 40.7 ms VOT
B*: M = 11.9 ms VOT; P*: M = 30.2 ms VOT

Methods

Novelty/familiarity preference varies across infants and experiments. Infants were classified as novelty- or familiarity-preferring by their performance on the endpoints.

       Familiarity   Novelty
B      16            36
P      12            21

Within each group, will we see evidence for gradiency? We're only interested in the middle stimuli (b*, p*).

Novelty or Familiarity?

After being exposed to bear… beach… bail… bomb…, infants who show a novelty effect will look longer for pear than for bear. What about in between (bear*)?

[Schematic: predicted listening times for bear, bear*, and pear under a categorical vs. a gradient pattern.]

[Figure: listening time (ms) for Target, Target*, and Competitor, by exposure group (B vs. P).]

Novelty infants (B: 36, P: 21): Target vs. Target*: p < .001; Competitor vs. Target*: p = .017.
Familiarity infants (B: 16, P: 12): Target vs. Target*: p = .003; Competitor vs. Target*: p = .012.

Results

[Figure: listening time (ms) for infants exposed to /p/, tested on P, P*, and B. Novelty group (N = 21): p = .024*, p = .009**. Familiarity group (N = 12): p = .018*, p = .028*.]

Results

[Figure: listening time (ms) for infants exposed to /b/, tested on B, B*, and P. Novelty group (N = 36): p < .001**, with null comparisons p > .1 and p > .2. Familiarity group (N = 16): p = .06, p = .15.]

Results

Contrary to all previous work, 7.5-month-old infants show gradient sensitivity to subphonemic detail.
• Clear effect for /p/.
• Effect attenuated for /b/.

Experiment 4 Conclusions

Reduced effect for /b/… but:

[Schematics: listening time for bear, bear*, and pear under a null effect, the expected result, and the actual result (bear* patterns with pear).]

• The category boundary may lie between Bear and Bear* (between 3 ms and 11 ms)?
• Within-category sensitivity in a different range?

Same design as Experiment 4.

VOTs shifted away from the hypothesized boundary.

Train:
  40.7 ms:  Palm   Pear   Peach   Pail
   3.6 ms:  Bomb*  Bear*  Beach*  Bale*

Test:
  -9.7 ms:  Bomb   Bear   Beach   Bale

Experiment 5

[Figure: listening time (ms) for B-, B, and P. Familiarity infants (N = 34): p = .05*, p = .01**. Novelty infants (N = 25): p = .02*, p = .002**.]

Results

Experiment 5 Conclusions

• Within-category sensitivity in /b/ as well as /p/.
• Shifted category boundary in /b/: not consistent with the adult boundary (or prior infant work).
• Graded structure supports statistical learning.

Will an implementation of this model allow us to understand the developmental mechanism?

Distributional learning model

1) Model the distribution of tokens as a mixture of Gaussian distributions over a phonetic dimension (e.g., VOT).

2) After receiving an input, the Gaussian with the highest posterior probability is the "category".

3) Each Gaussian has three parameters: a mean (μ), a standard deviation (σ), and a weight (Φ).

Computational Model

Statistical Category Learning

1) Start with a set of randomly selected Gaussians.

2) After each input, adjust each parameter to find the best description of the input.

3) Start with more Gaussians than necessary: the model doesn't innately know how many categories there are. The weight (Φ) → 0 for unneeded categories.

Training: Lisker & Abramson (1964) distribution of VOTs.
• Not successful with large K.
• [Successful with K = 2… but what if we were learning Hindi?]

Solution: Competition (winner-take-all)

                    Competition   No Competition
1 category          5%            0%
2 categories        95%           0%
>4 categories       0%            100%
% in right place    95%           66%

Mechanism #1: Competition required. Validated with a neural network.
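A minimal sketch of a learner of this kind (the gradient-style update rules, learning rate, and thresholds are illustrative assumptions; the slides do not give the model's actual equations). Each Gaussian keeps a weight Φ, a mean μ, and an SD σ; with competition=True only the winning Gaussian is updated on each input (winner-take-all), otherwise all Gaussians are updated in proportion to their posteriors:

```python
import math
import random

random.seed(1)

class Gaussian:
    def __init__(self):
        self.phi = random.random()                 # weight (mixing proportion)
        self.mu = random.uniform(-20, 60)          # mean VOT (ms)
        self.sigma = random.uniform(5, 20)         # SD (ms)
    def like(self, x):
        return self.phi * math.exp(-(x - self.mu) ** 2 / (2 * self.sigma ** 2)) \
               / (self.sigma * math.sqrt(2 * math.pi))

def train(vots, k=10, lrate=0.01, competition=True):
    cats = [Gaussian() for _ in range(k)]          # more Gaussians than needed
    for x in vots:
        posts = [g.like(x) for g in cats]
        total = sum(posts) or 1e-9
        posts = [p / total for p in posts]
        winner = posts.index(max(posts))
        for i, g in enumerate(cats):
            resp = (1.0 if i == winner else 0.0) if competition else posts[i]
            err = x - g.mu
            g.mu += lrate * resp * err                       # move toward the input
            g.sigma = max(1.0, g.sigma + lrate * resp * (abs(err) - g.sigma))
            g.phi += lrate * (resp - g.phi)                  # unused categories: phi -> 0
    return [g for g in cats if g.phi > 0.1]                  # surviving categories

# Bimodal, Lisker & Abramson-style input (illustrative values): /b/ near 0 ms, /p/ near 40 ms.
vots = [random.gauss(0, 5) for _ in range(2000)] + [random.gauss(40, 8) for _ in range(2000)]
random.shuffle(vots)
print([(round(g.mu, 1), round(g.sigma, 1), round(g.phi, 2)) for g in train(vots)])
```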

What about the nature of the initial state?

Classic view (e.g. Werker & Tees, 1984):

• Infants start with many small (nonnative) categories.

• Lose distinctions that are not used in native language.

Small (nonnative) categories => Large native categories.

Combining small categories: easy.

What about reverse (large => small)?

Large (overgeneralized) categories => Smaller native categories.

Dividing large categories: hard.


Mechanism #2: Combining small categories easier than dividing large.

Related to adult non-native speech perception findings?

Question:

Reduced auditory acuity in cochlear implant users.

Larger region in which stimuli are not discriminable.

Larger initial categories. Problem for learning?

Answer:

Assess non-native discrimination in CI users.

• Small categories: auditory acuity is not that bad.
• Large categories: would suggest different learning mechanisms.

(with J. Bruce Tomblin & B. Barker)

Infants show graded sensitivity to subphonemic detail.
• Supports the variance utilization model.
• Variance is used for statistical learning.

The model suggests aspects of the developmental mechanism:
• Competition.
• Starting state (large vs. small categories).

Remaining questions:
• Unexpected VOT boundary: may require a 2AFC task (anticipatory eye-movement methods).
• Role of initial category size in learning (possible CI application).

Infant Summary

Conclusions

Infants and adults are sensitive to subphonemic detail.

Continuous detail is not discarded by perception / word recognition.

Normal SWR mechanisms yield:
1) Temporal integration
2) Perceptual warping

Variance Reduction ✗ → Variance Utilization

Conclusions

Infants and adults are sensitive to subphonemic detail.

Infant sensitivity allows long-term phonology learning.
• Potentially reveals the developmental mechanism.

Competition processes:
1) Potentially responsible for CP; locus of SLI?
2) Essential for learning.

Conclusions

Spoken language is defined by change.

But the information to cope with it is in the signal, if lexical processes don't discard it.

Within-category acoustic variation is signal, not noise.

[Diagram: eye-tracking apparatus: head-tracker camera, monitor, IR head-tracker emitters, head mount with 2 eye cameras; eyetracker computer and subject computer connected via Ethernet.]


Misperception: Additional Results

[Figures: identification results. Response rate (voiced, voiceless, non-word) as a function of VOT (0-35 ms) for the barricade/"parricade" and "barakeet"/parakeet continua.]

Significant target responses even at the extremes.

Graded effects of VOT on correct response rate.

"Garden-path" effect: the difference between looks to each target (b vs. p) at the same VOT.

[Figure: fixations to target over time (ms) for barricade vs. parakeet, at VOT = 0 (/b/) and VOT = 35 (/p/).]

Phonetic “Garden-Path”

[Figure: garden-path effect (barricade minus parakeet fixations) as a function of VOT (ms), for target and competitor fixations.]

GP effect: gradient effect of VOT (target: p < .0001; competitor: p < .0001).

Assimilation: Additional Results

runm picks
runm takes ***

When /p/ is heard, the bilabial feature can be assumed to come from assimilation (not an underlying /m/). When /t/ is heard, the bilabial feature is likely to come from an underlying /m/.

Within-category detail is used in recovering from assimilation: temporal integration.
• Anticipate upcoming material.
• Bias activations based on context.
• Like Exp 2: within-category detail retained to resolve ambiguity.

Phonological variation is a source of information.

Exp 3 & 4: Conclusions

Subject hears:
"select the mud drinker"
"select the mudg gear"
"select the mudg drinker"

Critical Pair

[Figure: fixation proportions over time (ms) for initial coronal ("mud gear") vs. initial non-coronal ("mug gear"), marking the onset and average offset of "gear" (402 ms).]

Mudg Gear is initially ambiguous with a late bias towards “Mud”.

[Figure: fixation proportions over time (ms) for initial coronal ("mud drinker") vs. initial non-coronal ("mug drinker"), marking the onset and average offset of "drinker" (408 ms).]

Mudg Drinker is also ambiguous with a late bias towards “Mug” (the /g/ has to come from somewhere).

[Figure: fixation proportions to the non-coronal ("gear") over time (ms) following assimilated vs. non-assimilated consonants, aligned to the onset of "gear".]

In the same stimuli/experiment there is also a progressive effect!

Feedback

Ganong (1980): lexical information biases perception of ambiguous phonemes.

[Figure: % /t/ responses along a d-t continuum for "duke/tuke" vs. "doot/toot" contexts.]

Phoneme restoration (Warren, 1970; Samuel, 1997).

Lexical feedback: McClelland & Elman (1986); Magnuson, McMurray, Tanenhaus & Aslin (2003).

[Diagram: feedback from words to phonemes.]

Scales of temporal integration in word recognition

• A word: an ordered series of articulations.
- Build abstract representations.
- Form expectations about future events.
- Fast (online) processing.

• A phonology:
- Abstract across utterances.
- Expectations about possible future events.
- Slow (developmental) processing.

Sparseness

Overgeneralization (large σ): costly, lose distinctiveness.
Undergeneralization (small σ): not as costly, maintain distinctiveness.

To increase the likelihood of successful learning:
• Err on the side of caution.
• Start with small σ.

[Figure: P(success) as a function of starting σ (0-60), for 2-category and 3-category models; 39,900 models run.]

Sparseness coefficient: % of space not strongly mapped to any category.
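A hedged sketch of one way such a coefficient could be computed (the grid, the 0.01 "strongly mapped" threshold, and the (Φ, μ, σ) category format are assumptions for illustration; the slides do not give the exact definition):

```python
import math

def sparseness(cats, lo=-20.0, hi=60.0, step=0.5, threshold=0.01):
    """Fraction of the VOT range where no category's weighted likelihood
    exceeds the threshold, i.e., space not strongly mapped to any category.

    cats: list of (phi, mu, sigma) tuples."""
    n_points = int((hi - lo) / step) + 1
    unmapped = 0
    for i in range(n_points):
        x = lo + i * step
        best = max(phi * math.exp(-(x - mu) ** 2 / (2 * sigma ** 2))
                   / (sigma * math.sqrt(2 * math.pi))
                   for phi, mu, sigma in cats)
        if best < threshold:
            unmapped += 1
    return unmapped / n_points

# Two narrow (small-sigma) starting categories leave most of the space unmapped:
print(sparseness([(0.5, 0.0, 1.0), (0.5, 40.0, 1.0)]))   # high sparseness (~0.9)
```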

[Figure: average sparseness coefficient over training epochs for a small starting σ (.5-1); much of VOT space remains unmapped.]

Start with large σ

[Figure: sparseness coefficient over training epochs for starting σ = 20-40 vs. .5-1.]

Intermediate starting σ

[Figure: sparseness coefficient over training epochs for starting σ = .5-1, 3-11, 12-17, and 20-40.]

Small or even medium starting σ's lead to sparse category structure during infancy: much of phonetic space is unmapped.

To avoid overgeneralization, it is better to start with small estimates for σ.

Sparse categories: similar temporal integration to Exp 2.

Retain ambiguity (and partial representations) until more input is available.

Model Conclusions

Examination of the sparseness/completeness of categories needs a two-alternative task.

Anticipatory Eye Movements (McMurray & Aslin, 2005)

Infants are trained to make anticipatory eye movements in response to an auditory or visual stimulus.

Post-training, generalization can be assessed with respect to both targets.

[AEM display: e.g., "bear" predicts one side, "pail" the other. (QuickTime demo)]

AEM Paradigm

Also useful with:
• Color
• Shape
• Spatial frequency
• Faces

Anticipatory Eye Movements

Train: Bear (VOT 0) → left; Pail (VOT 35) → right.

Test: Bear0 / Pear40, Bear5 / Pear35, Bear10 / Pear30, Bear15 / Pear25.

Same naturally-produced tokens from Exps 4 & 5.

[Display: palm, beach]

Experiment 6

Expected results

[Schematic: expected performance (Bear vs. Pail responses) across VOT under the adult boundary vs. sparse categories with unmapped space.]

% correct: 67%; 9 of 16 infants better than chance.

[Figure: % correct as a function of VOT (0-40 ms), with the training tokens marked; separate curves for beach and palm.]

Results