
Page 1: Statistical Language Learning: Mechanisms and Constraints

Statistical Language Learning: Mechanisms and Constraints

Jenny R. Saffran

Department of Psychology & Waisman Center
University of Wisconsin - Madison

Page 2: Statistical Language Learning: Mechanisms and Constraints
Page 3: Statistical Language Learning: Mechanisms and Constraints

What kinds of learning mechanisms do infants possess?

• How do infants master complex bodies of knowledge?

• Learning requires both experience & innate structure - a bridge between nature & nurture?
  – Constraints on learning: computational, perceptual, input-driven, maturational… all neural, though we are not working at that level of analysis

Page 4: Statistical Language Learning: Mechanisms and Constraints

Language acquisition: Experience versus innate structure

• How much of language acquisition can be explained by learning?
  – Language-specific linguistic structures

• Learning does not offer transparent explanations…
  – How is abstract linguistic structure acquired?
  – Why are human languages so similar?
  – Why can’t non-human learners acquire human language?

Page 5: Statistical Language Learning: Mechanisms and Constraints

Today’s talk:

Consider a new approach to language learning that may begin to address some of these outstanding central issues in the study of language & beyond.

Page 6: Statistical Language Learning: Mechanisms and Constraints

Statistical Learning

P(Y|X) = freq(XY) / freq(X)

Page 7: Statistical Language Learning: Mechanisms and Constraints

Statistical Learning

P(Y|X) = freq(XY) / freq(X)

What computations are performed?

What are the units over which computations are performed?

Are these the right computations & units given the structure of human languages?
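As a concrete illustration of the computation above (not from the original talk), here is a minimal Python sketch estimating P(Y|X) = freq(XY)/freq(X) over a stream of syllables. The toy words, stream length, and function name are hypothetical.

```python
import random
from collections import Counter

def transitional_probability(syllables, x, y):
    """Estimate P(y | x) = freq(xy) / freq(x) over adjacent syllables."""
    pair_counts = Counter(zip(syllables, syllables[1:]))   # counts of adjacent pairs xy
    first_counts = Counter(syllables[:-1])                 # counts of x in first position
    return pair_counts[(x, y)] / first_counts[x] if first_counts[x] else 0.0

# Hypothetical toy corpus: a continuous stream built from three two-syllable "words".
words = {"pretty": ["pre", "ty"], "baby": ["ba", "by"], "doggy": ["do", "gy"]}
random.seed(0)
stream = [syl for w in random.choices(list(words), k=500) for syl in words[w]]

print(transitional_probability(stream, "pre", "ty"))   # within a word: 1.0
print(transitional_probability(stream, "ty", "ba"))    # across a word boundary: ~0.33
```

In a natural corpus the within-word probability would be below 1.0, but the same contrast between within-word and between-word transitions holds.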

Page 8: Statistical Language Learning: Mechanisms and Constraints

Breaking into language


Page 9: Statistical Language Learning: Mechanisms and Constraints

Word segmentation

Page 10: Statistical Language Learning: Mechanisms and Constraints

Word segmentation cues

• Words in isolation
• Pauses/utterance boundaries
• Prosodic cues (e.g., word-initial stress in English)
• Correlations with objects in the environment
• Phonotactic/articulatory cues
• Statistical cues

Page 11: Statistical Language Learning: Mechanisms and Constraints

Statistical learning

PRE TTY BA BY

Continuations within words are systematic: high likelihood (PRE → TTY, BA → BY)
Continuations between words are arbitrary: low likelihood (TTY → BA)

Page 12: Statistical Language Learning: Mechanisms and Constraints

Transitional probabilities

P(ty|pre) = freq(pretty) / freq(pre) = .80

versus

P(ba|ty) = freq(tyba) / freq(ty) = .0002

PRETTY BABY

Page 13: Statistical Language Learning: Mechanisms and Constraints

Infants can use statistical cues to find word boundaries

• Saffran, Aslin, & Newport (1996)
  – 2-minute exposure to a nonsense language (tokibu, gopila, gikoba, tipolu)
  – Only statistical cues to word boundaries
  – Tested on discrimination between words and part-words (sequences spanning word boundaries); the logic is sketched in the code below
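A minimal sketch of that logic, assuming a simplified stand-in for the actual stimuli (random word order with possible immediate repeats, arbitrary stream length): transitional probabilities inside the four words come out at 1.0, while a part-word such as "bugopi" (the end of tokibu plus the start of gopila) contains a boundary transition with a much lower probability.

```python
import random
from collections import Counter

WORDS = ["tokibu", "gopila", "gikoba", "tipolu"]

def syllabify(word):
    """Split a six-letter nonsense word into its three CV syllables."""
    return [word[i:i + 2] for i in range(0, len(word), 2)]

def build_stream(n_words=400, seed=1):
    """Concatenate randomly ordered words into one continuous syllable stream."""
    random.seed(seed)
    stream = []
    for _ in range(n_words):
        stream.extend(syllabify(random.choice(WORDS)))
    return stream

def tp(stream, x, y):
    """P(y | x) = freq(xy) / freq(x) over adjacent syllables."""
    pairs = Counter(zip(stream, stream[1:]))
    firsts = Counter(stream[:-1])
    return pairs[(x, y)] / firsts[x] if firsts[x] else 0.0

def mean_tp(stream, item):
    """Average transitional probability across an item's syllable transitions."""
    syls = syllabify(item)
    return sum(tp(stream, a, b) for a, b in zip(syls, syls[1:])) / (len(syls) - 1)

stream = build_stream()
print(mean_tp(stream, "tokibu"))   # a word: both transitions have TP = 1.0
print(mean_tp(stream, "bugopi"))   # a part-word: the bu->go transition is ~0.25
```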

Page 14: Statistical Language Learning: Mechanisms and Constraints

Experimental setup

Page 15: Statistical Language Learning: Mechanisms and Constraints

Headturn Preference Procedure


Page 16: Statistical Language Learning: Mechanisms and Constraints

tokibugikobagopilatipolutokibugopilatipolutokibugikobagopilagikobatokibugopilatipolugikobatipolugikobatipolugopilatipolutokibugopilatipolutokibugopilatipolutokibugopilagikobatipolutokibugopilagikobatipolugikobatipolugikobatipolutokibugikobagopilatipolugikobatokibugopila


Page 19: Statistical Language Learning: Mechanisms and Constraints

Results

[Bar graph: looking times (sec), 0-8, for Words vs. Part-words; ** = significant difference]

Page 20: Statistical Language Learning: Mechanisms and Constraints

Detecting sequential probabilities

• Statistical learning for word segmentation
  – Infants track transitional probabilities, not frequencies of co-occurrence (Aslin, Saffran, & Newport, 1997)
  – The first usable cue to word boundaries: use of statistical cues precedes use of lexical stress cues (Thiessen & Saffran, 2003)
  – Statistical learning is facilitated by the intonation contours of infant-directed speech (Thiessen, Hill, & Saffran, 2005)
  – Infants treat “tokibu” as an English word (Saffran, 2001)
  – Emerging “words” feed into syntax learning (Saffran & Wilson, 2003)

• Other statistics are useful for learning phonetic categories, lexical categories, etc.

• Beyond language: domain generality
  – Tone sequences (Saffran et al., 1999; Saffran & Griepentrog, 2001): golabupabikututibudaropi… / AC#EDGFCBG#A#F#D#…
  – Visuospatial & visuomotor sequences (Hunt & Aslin, 2000; Fiser & Aslin, 2003)
  – Even non-human primates can do it! (Hauser, Newport, & Aslin, 2001)

Page 21: Statistical Language Learning: Mechanisms and Constraints

So does statistical learning really tell us anything about language learning?

Page 22: Statistical Language Learning: Mechanisms and Constraints

Language acquisition: Experience versus innate structure

• How much of language acquisition can be explained by learning?
  – Language-specific linguistic structures

• Learning does not offer transparent explanations…
  – How is abstract linguistic structure acquired?
  – Why are human languages so similar?
  – Why can’t non-human learners acquire human languages?

Page 23: Statistical Language Learning: Mechanisms and Constraints

Acquisition of basic phrase structure

• Words occur serially, but representations of sentences contain clumps of words (phrases). How is this structure acquired? Where does it come from?

• Innately endowed as part of Universal Grammar (X-bar theory)?
• Prosodic cues? (probabilistically available)
• Predictive dependencies as cues to phrase units cross-linguistically (cf. mid-20th-century structural linguistics: phrasal diagnostics)
  – Nouns often occur without articles, but articles usually require nouns: *The walked down the street.
  – NP often occurs without prepositions, but P usually requires NP: *She walked among.
  – NP often occurs without Vtrans, but Vtrans usually requires an object NP: *The man hit.

Page 24: Statistical Language Learning: Mechanisms and Constraints

Statistical cue to phrase boundaries

• Unidirectional predictive dependencies → high conditional probabilities (see the sketch below)

• Can humans use predictive dependencies to find phrase units? (Saffran, 2001)
  – Artificial grammar learning task
  – Dependencies were the only phrase structure cues
  – Adults & kids learned the basic structure of the language
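To make the asymmetry concrete, here is a small sketch over a hypothetical part-of-speech-tagged toy corpus (mine, for illustration only): an article almost demands a following noun, while a noun only sometimes has a preceding article, so the high conditional probability runs in one direction and flags the article+noun clump as a unit.

```python
# Hypothetical toy corpus of word-class sequences
# (Det = article, N = noun, V = verb, P = preposition).
TAGGED_SENTENCES = [
    ["Det", "N", "V", "P", "Det", "N"],   # e.g., "the dog ran to the park"
    ["N", "V", "Det", "N"],               # e.g., "dogs chased the cat"
    ["Det", "N", "V"],                    # e.g., "the cat slept"
    ["N", "V", "P", "N"],                 # e.g., "cats sleep on sofas"
]

def prob_next(given, successor):
    """P(next class is `successor` | current class is `given`)."""
    total = hits = 0
    for sent in TAGGED_SENTENCES:
        for a, b in zip(sent, sent[1:]):
            if a == given:
                total += 1
                hits += (b == successor)
    return hits / total if total else 0.0

def prob_prev(given, predecessor):
    """P(previous class is `predecessor` | current class is `given`)."""
    total = hits = 0
    for sent in TAGGED_SENTENCES:
        for i, tag in enumerate(sent):
            if tag == given:
                total += 1
                hits += (i > 0 and sent[i - 1] == predecessor)
    return hits / total if total else 0.0

print(prob_next("Det", "N"))   # 1.0   - every article is followed by a noun
print(prob_prev("N", "Det"))   # ~0.57 - nouns frequently occur without an article
```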

Page 25: Statistical Language Learning: Mechanisms and Constraints

Statistical cue to phrase boundaries

• Predictive dependencies assist learners in the discovery of abstract underlying structure.
  – Predicts better phrase structure learning when predictive dependencies are available than when they are not.

• Constraint on learning: provides a potential learnability explanation for why languages so frequently contain predictive dependencies.

Page 26: Statistical Language Learning: Mechanisms and Constraints

Do predictive dependencies enhance learning?

Methodology: Contrast the acquisition of two artificial grammars (Saffran, 2002)

• Predictive language - contains predictive dependencies between word classes as a cue to phrasal units

• Non-predictive language - no predictive dependencies between word classes

Page 27: Statistical Language Learning: Mechanisms and Constraints

Predictive language

S → AP + BP + (CP)
AP → A + (D)
BP → CP + F
CP → C + (G)

A = BIFF, SIG, RUD, TIZ

Note: Dependencies are in the opposite direction from English (head-final language)

Possible APs: A, A D
Possible CPs: C, C G

Page 28: Statistical Language Learning: Mechanisms and Constraints

Non-predictive language

S → AP + BP
AP → {(A) + (D)}
BP → CP + F
CP → {(C) + (G)}

e.g., in English: *NP → {(Det) + (N)}

Possible APs: A, D, A D
Possible CPs: C, G, C G
(English analog: Det, N, Det N)
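To make the contrast concrete, here is a minimal generation sketch (mine, not the original materials): the A words are the ones listed above, the D, C, G, and F vocabularies are placeholder tokens, and the optionality probabilities are arbitrary. In the predictive grammar a D word can only follow an A word and a G word can only follow a C word; in the non-predictive grammar every element of AP and CP is optional, so no class reliably predicts another.

```python
import random

# Word classes: the A words come from the slide above; the rest are placeholders.
VOCAB = {
    "A": ["BIFF", "SIG", "RUD", "TIZ"],
    "D": ["D1", "D2"],
    "C": ["C1", "C2"],
    "G": ["G1", "G2"],
    "F": ["F1", "F2"],
}

def pick(cls):
    return random.choice(VOCAB[cls])

def maybe(items, p=0.5):
    """Include an optional constituent with probability p."""
    return items if random.random() < p else []

def predictive_sentence():
    """S -> AP + BP + (CP); AP -> A + (D); BP -> CP + F; CP -> C + (G).
    A and C are obligatory heads, so D predicts a preceding A and G a preceding C."""
    def cp():
        return [pick("C")] + maybe([pick("G")])
    ap = [pick("A")] + maybe([pick("D")])
    bp = cp() + [pick("F")]
    return ap + bp + maybe(cp())

def nonpredictive_sentence():
    """S -> AP + BP; AP -> {(A) + (D)}; BP -> CP + F; CP -> {(C) + (G)}.
    Within AP and CP every element is optional (at least one survives),
    so no word class predicts the presence of another."""
    def at_least_one(x, y):
        out = maybe([pick(x)]) + maybe([pick(y)])
        return out if out else [pick(random.choice([x, y]))]
    ap = at_least_one("A", "D")
    bp = at_least_one("C", "G") + [pick("F")]
    return ap + bp

random.seed(3)
print(" ".join(predictive_sentence()))      # prints one sentence from the predictive grammar
print(" ".join(nonpredictive_sentence()))   # prints one sentence from the non-predictive grammar
```

Under this sketch, hearing a D token guarantees that an A token just occurred in the predictive language, but guarantees nothing in the non-predictive language - the cue the experiments below test.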

Page 29: Statistical Language Learning: Mechanisms and Constraints

Predictive vs. Non-predictive language comparison

                         Predictive   Non-predictive
Sentence types               12              9
Five-word sentences          33%            11%
Three-word sentences         11%            44%
Lexical categories            5               5
Vocabulary size              16              16

Page 30: Statistical Language Learning: Mechanisms and Constraints

Experiment 1

• Participants: Adults & 6- to 9-year-olds

• Predictive versus Non-predictive phrase structure languages
  – Language: between-subjects variable
  – Incidental learning task
  – 40 min. auditory exposure, with descending sentential prosody

• Auditory forced-choice test
  – Novel grammatical vs. novel ungrammatical
  – Same test items for all participants

Sample items: BIFF HEP KLOR LUM CAV DUPP.  LUM TIZ.  RUD

Page 31: Statistical Language Learning: Mechanisms and Constraints

Results

[Bar graph: mean score (chance = 15), 0-30, for Adults and Children, Predictive language vs. Non-predictive language; ** = significant difference, shown for both groups]

Page 32: Statistical Language Learning: Mechanisms and Constraints

Experiment 2: Effect of predictive dependencies beyond the language domain?

• Same grammars, different vocabulary: non-linguistic materials (alert sounds)

• Exp. 1 materials (Predictive & Non-predictive grammars and test items), translated into non-linguistic vocabulary

• Adult participants

Page 33: Statistical Language Learning: Mechanisms and Constraints

Linguistic versus non-linguistic

[Bar graph: mean score (chance = 15), 0-30, for Linguistic (Experiment 1) and Non-linguistic (Experiment 2) materials, Predictive language vs. Non-predictive language; ** = significant differences]

Page 34: Statistical Language Learning: Mechanisms and Constraints

New auditory non-linguistic task: Predictive vs. Non-predictive languages

Page 35: Statistical Language Learning: Mechanisms and Constraints

Non-linguistic replication

[Bar graph: mean score (chance = 15), 0-30, for Linguistic (Exp 1), Non-linguistic (Exp 2), and Non-linguistic replication (Exp 3), Predictive language vs. Non-predictive language; ** = significant differences]

Page 36: Statistical Language Learning: Mechanisms and Constraints

Predictive language > Non-predictive language

• Predictive dependencies play a role in learning
  – For both linguistic & non-linguistic auditory materials

• Also seen for simultaneous visual displays
• But not for sequential visual displays → modality effects

• Human languages may contain predictive dependencies because they assist the learner in finding structure.

• The structure of human languages may have been shaped by human learning mechanisms.

Predict different patterns of learning for appropriately aged human learners versus non-human learners.

Page 37: Statistical Language Learning: Mechanisms and Constraints

Infant/Tamarin comparison: Methodology (with Marc Hauser @ Harvard)


Infants - Headturn Preference Procedure: laboratory exposure; test: measure looking times
Tamarins - Orienting Procedure: home cage exposure; test: measure % orienting responses

Paired methods previously used in studies of word segmentation, simple grammars, etc. (Hauser, Newport, & Aslin, 2001; Hauser, Weiss, & Marcus, 2002; etc.)

Page 38: Statistical Language Learning: Mechanisms and Constraints

Materials

• Predictive vs. Non-predictive languages (between Ss)

• Small grammar: used to validate methodology
  – Grammars written over individual words, not categories (one A word, one C word, etc.)
  – 8 sentences, repeated
  – 2 min. exposure (infants) or 2 hrs. exposure (tamarins)
  – Grammatical (familiar) vs. ungrammatical test items

• Large grammar: languages from the adult studies
  – Grammars written over categories (category A, C, etc.)
  – 50 sentences, repeated
  – 21 min. exposure (infants) or 2 hrs. exposure (tamarins)
  – Grammatical (novel) vs. ungrammatical test items

Page 39: Statistical Language Learning: Mechanisms and Constraints

Tamarin results

[Bar graphs: % orienting responses (0-100) to Grammatical vs. Ungrammatical test items, Predictive and Non-predictive languages; A. Small grammar, B. Large grammar; ** = significant difference]


Page 41: Statistical Language Learning: Mechanisms and Constraints

Infant results (12-month-olds, 12 per group)

[Bar graphs: looking times (sec), 0-10, to Grammatical vs. Ungrammatical test items, Predictive and Non-predictive languages; small grammar and large grammar panels; ** = significant differences]

Page 42: Statistical Language Learning: Mechanisms and Constraints

Cross-species differences

• Small grammar vs. large grammar
  – Tamarins only learned the small grammar
    • Difficulty with generalization? Memory for sentence exemplars?
    • Can learn patterns over individual elements but not categories?
  – Infants learned both systems, despite the size of the large grammar

• Availability of predictive dependencies
  – Only affected the tamarins learning the small grammar
  – Affected the infants regardless of the size of the grammar

• Consistent with the constrained statistical learning hypothesis → human learning mechanisms may have shaped the structure of natural languages

Page 43: Statistical Language Learning: Mechanisms and Constraints

Constrained statistical learning as a theory of language acquisition?

• Word segmentation, aspects of phonology, aspects of syntax

• Developing the theory

– Scaling up: Multiple probabilistic cues in the input (e.g., prosodic cues), multiple levels of language in the input, more realistic speech (e.g., IDS)

– Mapping to meaning: Are statistically-segmented ‘words’ good labels?

– Critical period effects: Exogenous constraints on statistical learning

– Modularity: Distinguishing domain-specific & domain-general factors
    • e.g., statistical learning of “musical syntax”

– Bilingualism: Separating languages & computing separate statistics

– Relating to real acquisition outcomes: Individual differences
    • Patients with congenital amusia (with Isabelle Peretz, U. de Montreal)
    • Specific Language Impairment study (with Dr. Julia Evans, UW-Madison)

Page 44: Statistical Language Learning: Mechanisms and Constraints

Conclusions

• Infants are powerful language learners: rapid acquisition of complex structure without external reinforcement

• However, humans are constrained in the types of patterns they readily acquire

• Understanding what is *not* learnable may be just as valuable as cataloging what infants *can* learn

These predispositions may be among the factors that have shaped the structure of human language

Page 45: Statistical Language Learning: Mechanisms and Constraints

Acknowledgements

• National Institutes of Health R01 HD37466, P30 HD03352
• National Science Foundation PECASE BCS-9983630
• UW-Madison Graduate School
• UW-Madison Waisman Center
• Members of the Infant Learning Lab
• All the parents and babies who have participated!

Infant Learning Lab
UW-Madison