LING 696B: Categorical perception, perceptual magnets and neural nets


Page 1

LING 696B: Categorical perception, perceptual magnets and neural nets

Page 2

From last time: a quick overview of phonological acquisition

Prenatal to newborn:

4-day-old French babies prefer listening to French over Russian (Mehler et al., 1988)

Prepared to learn any phonetic contrast in human languages (Eimas et al., 1971)

Page 3

A quick overview of phonological acquisition

6 months: effects of experience begin to surface (Kuhl, 1992)

6-12 months: from a universal perceiver to a language-specific one (Werker & Tees, 1984)

Page 4

A quick overview of phonological acquisition

6-9 months: infants know the sequential patterns of their language

English vs. Dutch (Jusczyk, 1993): “avoid” vs. “waardig”

Weird English vs. better English (Jusczyk, 1994): “mim” vs. “thob”

Impossible Dutch vs. Dutch (Friederici, 1993): “bref” vs. “febr”

Page 5

A quick overview of phonological acquisition

6 months: segment words out of the speech stream based on transition probabilities (Saffran et al., 1996): pabikutibudogolatupabikudaropi…

8 months: remembering words heard in stories read to them by graduate students (Jusczyk and Hohne, 1997)
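The transitional-probability computation behind the Saffran et al. result can be sketched in a few lines of Python (the two-letter syllable split is a simplifying assumption made for this toy stream, not Saffran et al.'s actual stimuli):

```python
from collections import Counter

def transition_probs(stream, syll_len=2):
    """Estimate P(next syllable | current syllable) from a continuous stream."""
    sylls = [stream[i:i + syll_len] for i in range(0, len(stream), syll_len)]
    pair_counts = Counter(zip(sylls, sylls[1:]))
    first_counts = Counter(sylls[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

# Stream built from the slide's example: within-word transitions (pa -> bi)
# recur reliably, while across-word transitions (ku -> ti) are more variable.
tp = transition_probs("pabikutibudogolatupabikudaropi")
```

An infant could, in principle, posit a word boundary wherever the transitional probability dips.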

Page 6

A quick overview of phonological acquisition

From 12-18 months and above: vocabulary growth from 50 words to a “burst”

Lexical representation?

Toddlers: learning morpho-phonology

Example: U-shaped curve: went --> goed --> wented/went

Page 7

Next few weeks: learning phonological categories

“The innovation of phonological structure requires us to think of the system of vocalizations as combinatorial, in that the concatenation of inherently meaningless phonological units leads to an intrinsically unlimited class of words ... It is a way of systematizing existing vocabulary items and being able to create new ones.”

-- Ray Jackendoff (also citing C. Hockett in “Foundations of Language”)

Page 8

Categorical perception

Categorization

Discrimination

Page 9

Eimas et al. (1971)

Enduring result: tiny (1- to 4-month-old) babies can hear almost anything

Stimuli: 3 pairs along the synthetic /p/-/b/ continuum, differing by 20 ms

Page 10

Eimas et al. (1971)

Babies get:

Very excited hearing a change across the category boundary

Somewhat excited when the change stays within the boundary

Bored when there is no change

Page 11

Eimas et al. (1971)

Their conclusion: [voice] feature detection is innate

But similar things were shown for chinchillas (Kuhl and Miller, 1978) and monkeys (Ramus et al.)

So maybe this is a mammalian auditory system thing…

Page 12

Sensitivity to distributions

Perceptual magnet effects (Kuhl, 1991): tokens near prototypes are harder to discriminate

Perhaps a fancier term is “perceptual space warping”

Page 13

Categorical perception vs. perceptual magnets

Gradient effects: Kuhl (1991-95) tried hard to demonstrate that these are not experimental errors, but systematic

(Figure: reference stimulus and neighbors 1-4)

Page 14

Categorical perception vs. perceptual magnets

PM is based on a notion of perceptual space:

Higher dimensions

Non-category boundaries

Page 15

Kuhl et al. (1992)

American English has /i/, but no /y/; Swedish has /y/, but no /i/

Test whether infants can detect differences between the prototype and its neighbors

(Figure: percent of trials treating prototype and neighbor as the same)

Page 16

Kuhl et al. (1992)’s interpretation

Linguistic experience changes the perception of native and non-native sounds (also see Werker & Tees, 1984)

Influence of input starts before any phonemic contrast is learned

Information is stored as prototypes

Page 17

Modeling attempt: Guenther and Gjaja (1996)

Claim: CP/PM is a matter of auditory cortex, and does not have to involve higher-level cognition

Methodology: build a connectionist model inspired by neurons

Page 18

Self-organizing map in Guenther and Gjaja

Input coding: for each (F1, F2) pair, use 4 input nodes

This seems to be an artifact of the mechanism (with regard to scaling), but it is standard practice in the field
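One plausible reading of the 4-node coding, sketched here under an assumed scaling constant (not necessarily G&G's exact equations): each formant drives a complementary pair of nodes, which keeps the total input activity constant across frequencies:

```python
import numpy as np

F_MAX = 4000.0  # assumed formant ceiling in Hz; G&G's actual scaling differs in detail

def encode(f1, f2):
    """Code an (F1, F2) pair as 4 input-node activities: each formant
    contributes a (scaled value, complement) pair of activations."""
    x1, x2 = f1 / F_MAX, f2 / F_MAX
    return np.array([x1, 1.0 - x1, x2, 1.0 - x2])

v = encode(500.0, 1500.0)  # 4 node activities for one vowel token
```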

Page 19

Self-organizing map in Guenther and Gjaja

(Figure: network architecture -- mj: activation of map cell j; arrows: connection weights from the input nodes)

Page 20

Self-organizing map: learning stage

mj: activation of map cell j

Connection weights: initialized to be random

Learning rule: the weight update is driven by the mismatch between input and weights, scaled by the cell’s activation and a learning rate

Page 21

Self-organizing map: learning stage

Crucially, only update connections to the most active N cells (the “inhibitory connection”)

N goes down to 1 over time

Intuition: each cell selectively encodes a certain kind of input; otherwise the weights are pulled in different directions
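The two learning-stage slides can be condensed into a minimal runnable sketch (winner selection by distance and the plain update below are standard SOM simplifications, not G&G's exact equations):

```python
import numpy as np

rng = np.random.default_rng(0)

class SOM:
    """Minimal self-organizing map in the spirit of Guenther & Gjaja (1996)."""
    def __init__(self, n_cells, n_inputs):
        self.w = rng.random((n_cells, n_inputs))  # connection weights: random init

    def learn(self, x, rate, n_winners):
        d = np.linalg.norm(self.w - x, axis=1)
        winners = np.argsort(d)[:n_winners]              # only the N most active cells
        self.w[winners] += rate * (x - self.w[winners])  # pull weights toward the input

proto = np.array([0.2, 0.8, 0.6, 0.4])  # a hypothetical 4-node-coded prototype
som = SOM(n_cells=100, n_inputs=4)
for t in range(1000):
    x = proto + rng.normal(0.0, 0.02, size=4)                # tokens cluster near the prototype
    som.learn(x, rate=0.1, n_winners=max(1, 10 - t // 100))  # N shrinks to 1 over time
```

After training, cell weights crowd into the region where tokens were dense, which is what later warps the population vote.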

Page 22

Self-organizing map: testing stage

Population vector: take a weighted sum of individual votes

Inspired by neural firing patterns in monkey reaching

How to compute Fvote? X+, X- must point in the same direction as Z+, Z-

Algebra exercise: derive a formula to calculate Fvote (homework Problem 1)
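The population-vector readout itself is just an activation-weighted average (the vote values and activations below are invented for illustration; deriving Fvote from the weights is exactly homework Problem 1, so it is not done here):

```python
import numpy as np

def population_vector(votes, activations):
    """Percept = activation-weighted average of each cell's preferred value,
    as in population coding for monkey reaching movements."""
    a = np.asarray(activations, dtype=float)
    return float(a @ np.asarray(votes, dtype=float)) / float(a.sum())

# Highly active cells dominate; if the active cells cluster near a prototype,
# the percept is dragged toward it -- the magnet effect.
percept = population_vector(votes=[300.0, 500.0, 700.0], activations=[0.1, 0.8, 0.1])
```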

Page 23

The distribution of Fvote after learning

(Figure: training stimuli and the resulting distribution of Fvote; recall the activation equation)

Page 24

Results shown in the paper

(Figure: Kuhl’s data vs. their model)

Page 25

Results shown in the paper

With some choice of parameters to fit the “percent generalization”

Page 26

Local summary

Captures a type of pre-linguistic / pre-lexical learning from distributions (next time)

Distribution learning can happen in a short time (Maye & Gerken, 2000)

G&G: no need to have categories in order to get PM effects; also see Lacerda (1995)’s exemplar model

Levels of view: cognition or cortex? Maybe Guenther is right …

Page 27

Categorization / discrimination training (Guenther et al., 1999)

1. Space the stimuli according to JND

2. Test whether listeners can hear the distinctions in the control and training regions

3. Train in some way (45 minutes!)

4. Test again as in 2.

Stimuli: band-passed white noise

Page 28

Guenther et al. (1999)

Categorization training:

Tell people that sounds from the training region belong to a “prototype”

Present them sounds from the training region and the band edges, and ask whether each is the prototype

Discrimination training:

People say “same” or “different” and get feedback

Made harder by tighter spacing over time

Page 29

Guenther et al. (1999)

The result of distribution learning depends on what people do with it

(Figure: performance in control vs. training regions)


Page 31

Guenther’s neural story for this effect

Proposes to modify G&G’s architecture to simulate the effects of the different training procedures

Did some brain scans to verify the hypothesis (Guenther et al., 2004)

New model forthcoming? (A+ guaranteed if you turn one in)

Page 32

The untold story of Guenther and Gjaja

The size of the model matters a lot

Larger models give more flexibility, but there is no free lunch:

They take longer to learn

The learning rate parameter must be tuned

The activation neighborhood must be tuned

(Figure: maps with 100 cells, 500 cells, and 1500 cells)

Page 33

Randomness in the outcome

(Figure: this run is perhaps the nicest; some other runs shown alongside)

Page 34

A technical view of G&G’s model

“Self-organization”: a type of unsupervised learning

The model receives input with no information on which categories the inputs belong to

Also referred to as “clustering” (next time)

“Emergent behavior”: spatial input patterns come to be coded by connection weights that realize the non-linear mapping

Page 35

A technical view of G&G’s model

A non-parametric approximation to the non-linear input-percept mapping

Lots of parameters, but individually they do not correspond to categories

Will look at a parametric approach to clustering next time

Difficulties: tuning of parameters, rates of convergence, formal analysis (more later)

Page 36

Other questions for G&G

If G&G is a model of auditory cortex, then where do F1 and F2 come from? (see Adam’s presentation)

What if we expose G&G to noisy data?

How does it deal with speech segmentation? More next time.

Page 37

Homework problems related to perceptual magnets

Problem 2: warping in one dimension predicted by perceptual magnets

Problem 3: program a simplified G&G-type model, or any other SOM, to simulate the effect shown above

(Figure: before training, training data, after?)

Page 38

Commercial: CogSci colloquium this year

Many big names in connectionism this semester

Page 39

A digression into the history of neural nets

1940s -- study of neurons (Hebb, Hodgkin & Huxley)

1950s -- self-programming computers, biological inspirations

ADALINE from Stanford (Widrow & Hoff)

Perceptron (Rosenblatt, 1960)

Page 40

Perceptron

Input: x1, …, xn

Output: {0, 1}

Weights: w1, …, wn

Bias: a

How it works: output 1 if w1x1 + … + wnxn + a > 0, else output 0
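The "how it works" step is a thresholded weighted sum; a minimal sketch (the threshold-at-zero convention is assumed here):

```python
import numpy as np

def perceptron_output(x, w, a):
    """Fire (output 1) iff the weighted sum of the inputs plus the bias is positive."""
    return 1 if float(np.dot(w, x)) + a > 0 else 0

y = perceptron_output([1.0, 1.0], w=[0.5, 0.5], a=-0.7)  # 0.5 + 0.5 - 0.7 > 0, so y = 1
```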

Page 41

What can perceptrons do?

A linear machine: finding a plane that separates the 0 examples from the 1 examples

Page 42

Perceptron learning rule

Start with random weights

Take an input and check the output; when there is an error:

weight = weight + a*input*error

bias = bias + a*error

Repeat until no more errors

Recall the G&G learning rule

See demo
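The rule above, made runnable (the AND data set and the epoch cap are illustrative additions; a is the learning rate, as on the slide):

```python
import numpy as np

def train_perceptron(data, labels, a=0.1, max_epochs=100):
    """Perceptron learning rule from the slide: on each error,
    weight += a * input * error and bias += a * error, until error-free."""
    w, b = np.zeros(len(data[0])), 0.0
    for _ in range(max_epochs):
        n_errors = 0
        for x, target in zip(data, labels):
            out = 1 if float(np.dot(w, x)) + b > 0 else 0
            err = target - out
            if err != 0:
                w = w + a * np.asarray(x, dtype=float) * err
                b = b + a * err
                n_errors += 1
        if n_errors == 0:  # repeat until no more errors
            break
    return w, b

# AND is linearly separable, so this converges; XOR would loop forever.
w, b = train_perceptron([(0, 0), (0, 1), (1, 0), (1, 1)], [0, 0, 0, 1])
```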

Page 43

Criticism from traditional AI (Minsky & Papert, 1969)

A simple perceptron is not capable of learning simple boolean functions

More sophisticated machines are hard to understand

Example: XOR (see demo)

Page 44

Revitalization of neural nets: connectionism

Minsky and Papert’s book stopped work on neural nets for 10 years …

But as computers got faster and cheaper:

CMU parallel distributed processing (PDP) group

UC San Diego Institute of Cognitive Science

Page 45

Networks get complicated: more layers

A 3-layer perceptron can solve the XOR problem
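Why the extra layer helps can be shown with hand-picked weights (one of many possible settings, chosen here for illustration): hidden units compute OR and AND, and the output fires for OR-but-not-AND, i.e. XOR:

```python
import numpy as np

def step(z):
    return (np.asarray(z) > 0).astype(int)

def xor_net(x):
    """Three-layer perceptron: the hidden layer detects [OR(x), AND(x)],
    and the output fires when OR is on but AND is off."""
    h = step(np.array([[1, 1], [1, 1]]) @ np.asarray(x) + np.array([-0.5, -1.5]))
    return int(step(np.array([1, -1]) @ h - 0.5))

outputs = [xor_net(x) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

No single hyperplane separates these four points, but the hidden layer remaps them so one does.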

Page 46

Networks get complicated: combining non-linearities

The decision surface can be non-linear

Page 47

Networks get complicated: more connections

Bi-directional

Inhibitory

Feedback loops

Page 48

Key to the revitalization of connectionism

Back-propagation: an algorithm that can be used to train a variety of networks by propagating the output error backward

Variants developed for other architectures

Page 49

Two kinds of learning with neural nets

Supervised:

Classification: patterns --> {1,…,N}

Adaptive control: control variables --> observed measurements

Time series prediction: past events --> future events

Unsupervised:

Clustering

Compression / feature extraction

Page 50

Statistical insights from neural nets

Overtraining / early stopping

Local maxima

Model complexity vs. training error tradeoff

Limits of learning: generalization bounds

Dimension reduction, feature selection

Many of these problems became standard topics in machine learning

Page 51

Big question: neural nets and symbolic computing

Grammar: harmony theory (Smolensky, and later Optimality Theory)

Representation: can neural nets discover phonemes?

Page 52

Next week

Reading will be posted on the course webpage