Seidenberg NCPW13 7/2012 PDP Models and American Health Care Reform Mark S. Seidenberg NCPW13 BCBL San Sebastian 2012



  • Slide 1

SeidenbergNCPW13 7/2012
PDP Models and American Health Care Reform
Mark S. Seidenberg
NCPW13, BCBL, San Sebastian, 2012

  • Slide 2

Observation: Many core concepts of the PDP approach have been broadly assimilated into cognitive science/neuroscience.
But the modeling, not so much (present distinguished company notwithstanding):
  skepticism about relevance/adequacy in areas such as language acquisition
  arising from close analyses of specific PDP models
  availability of alternative approaches
7/14/2012 Seidenberg NCPW13 talk

  • Slide 3

Observation II: People will endorse PDP concepts as long as you call them something else.
Like the health care debate in the US: [movie clip]

  • Slide 4

Obamacare vs. PDP
Obamacare:
  maintain current insurance
  promote access to affordable health care
  no denial based on pre-existing conditions
  individual mandate (the thing that pays for the good stuff)
PDP:
  distributed representations
  interactive processing
  computation of best fits
  PDP models

  • Slide 5

Why? Let us look. Three case studies, all alike.
1. Diagnosis of a fatal problem
  models behave differently from people
  broad implications, widely repeated
  attention heads elsewhere
2. Diagnosis turns out to be wrong
  critiques don't support broader implications
  less widely known (like getting an audience for a failure to replicate)
3. There is, however, a related problem of considerable interest
  producing exciting work
  but need more; have to overcome (1)

  • Slide 6

Case 1: Catastrophic interference
1. Diagnosis of problem
  McCloskey & Cohen, 1989
  Unlike people, simple feedforward nets exhibit unwanted retroactive interference
  Based on close analyses of models of a simple arithmetic task
2. Solution is interleaving
  Hippocampus in complementary systems model (MMO, 1995)
  Life (in which experience is massively interleaved)
The type of catastrophic interference that McCloskey-Cohen focused on does occur occasionally:
  certain verbal learning experiments
  unusual circumstances like Korea → France émigrés (Pallier et al.)

  • Slide 7

3. The interesting related problem
Massive entrenchment!
Reduction in plasticity associated with expertise
Example: Critical periods in language learning
  Paradox of success (Seidenberg & Zevin, 2006)
  Expertise with L1 makes it difficult to absorb L2
Some models in this area (Ping Li, others). Haven't gone that far.
A recent example:

  • Slide 8

Impact of Dialect Variation in the US on Learning to Read
Achievement gap in reading
1. African Americans (and other minorities) perform less well on tests of reading and other subjects compared to whites
2. gap has been persistent for many years
3. poor reading skills are a problem for individuals and society

  • Slide 9

Why?
Not just poverty, school/teacher quality
Possibly related to language experience?
Major US dialects:
  Standard American English (SAE)
  African American English (AAE)
These dialects overlap more than two languages do
But they also differ a lot: phonology, morphology, syntax, discourse

  • Slide 10

Dialect mismatch effects
Home dialect AAE vs. school dialect SAE
When schooling starts, the child has to
  learn more of the second dialect
  learn in a less familiar dialect, in a noisy environment
  using books written in SAE
Dialect differences make learning a more difficult task than for a child who uses the same dialect at home and in school.
But all are judged against the same achievement milestones.
Gap ensues.
Other factors like SES may exacerbate it further.

  • Slide 11

We wanted to examine the impact on reading.
Obvious area: how differences in pronunciation affect acquiring basic decoding skills

  • Slide 12

Pronunciation differences
Many words pronounced the same (at phonemic level)
Many words pronounced differently
Percentage varies with dialect density: 30% of words and higher
GOLD, FLOOR, and LOW rhyme in AAE

  • Slide 13

Teacher: "G-O-L-D, that's gold"
[child searches spoken language vocabulary for "gold"]
Child: "Ohhh, gole"

  • Slide 14

Thus: Spelling-sound correspondences are more complex for AAE speakers.
We have models for that.

  • Slide 15

Contrastive words: different pronunciations in SAE, AAE
  bound, old, toast
Non-contrastive: same pronunciation in both dialects
  brush, air, stage
Latencies do not differ in the ELP database.
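The contrastive / non-contrastive split above can be sketched in a few lines. This is only an illustration, not the procedure used in the study: the function names and the two toy lexicons are assumptions, the only AAE form taken from the slides is "gole", and the other contrastive entries use opaque "AAE:<word>" tags that merely stand in for "some pronunciation that differs from SAE".

```python
# Toy lexicons (hypothetical). Keys are spellings, values are pronunciation labels.
# Only "gole" comes from the slides; "AAE:<word>" tags are opaque placeholders
# meaning "differs from the SAE form", not real transcriptions.
SAE = {"gold": "gold", "bound": "bound", "old": "old", "toast": "toast",
       "brush": "brush", "air": "air", "stage": "stage"}
AAE = {"gold": "gole", "bound": "AAE:bound", "old": "AAE:old", "toast": "AAE:toast",
       "brush": "brush", "air": "air", "stage": "stage"}


def split_contrastive(sae, aae):
    """Return (contrastive, non_contrastive) word lists for words in both lexicons."""
    shared = sorted(set(sae) & set(aae))
    contrastive = [w for w in shared if sae[w] != aae[w]]
    non_contrastive = [w for w in shared if sae[w] == aae[w]]
    return contrastive, non_contrastive


def contrastive_proportion(sae, aae):
    """Share of shared words whose pronunciation differs across the two lexicons."""
    contrastive, non_contrastive = split_contrastive(sae, aae)
    total = len(contrastive) + len(non_contrastive)
    return len(contrastive) / total if total else 0.0


if __name__ == "__main__":
    contrastive, non_contrastive = split_contrastive(SAE, AAE)
    print("contrastive:", contrastive)
    print("non-contrastive:", non_contrastive)
    print("proportion contrastive:", contrastive_proportion(SAE, AAE))
```

With a real pronunciation lexicon per dialect, the same proportion would be one rough way to express how much of a word list is affected at a given dialect density.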
  • Slide 16

Naming latencies as a function of AAE density
Children (N = 22, M age = 11.4 years)
Adults (N = 32, M age = 35.5)

  • Slide 17

Modeling
Once you see the set-up, effects are obvious
orth → phon model
  Learns phonology first
  Then learns to map spellings onto phonology
SAE: learn to map spellings onto known SAE pronunciations
AAE: learn to pronounce words in SAE while continuing to use AAE phonology in speech

  • Slide 18

Model (based on Harm & Seidenberg, 1999)
Training corpus: 1700 words from 2nd-grade norms
  SAE version
  AAE version: about half the pronunciations are different

  • Slide 19

SAE match: SAE-SAE
AAE match: AAE-AAE
Mismatch: AAE-SAE

  • Slide 20

Training on both dialects

  • Slide 21

Summary
About the achievement gap:
  dialect mismatch slows learning
  Playing field is not level
  Models suggest ways to fix this.
About models:
  entrenchment, proactive interference

  • Slide 22

Case 2: Language acquisition
1. Diagnosis of the problem
  Language has properties that can't be captured by NNs
  Rules (Pinker), algebraic rules (Marcus), procedural knowledge (Ullman)
  Demonstrations: Marcus et al.
  Lather, rinse, repeat
2. Second opinions: Plenty of people have taken issue with these claims
  rule-governed only under idealization of data
  competence theory of performance: Seidenberg & Plaut (in press?)
  semantic-phonological theory of the past tense (not rules-exceptions)
  improved models (Altmann, others)

  • Slide 23

3. The interesting related problem: What is statistical learning?
Language learners learn from statistics of the input
Process starts in infancy
Many studies examining what kinds of statistics are learned
Little of the research makes contact with PDP/connectionist models/concepts
Newport (2010) sees progress in the movement in many parts of psycholinguistics from rules to connectionism to statistical learning (p. 369).
Statistical learning is not Obamacare!

  • Slide 24

Irony:
Linguists' early criticism of connectionist/PDP models:
  languages exhibit lots of regularities, depending on how you count
  models are too powerful; can learn any arbitrary association
  can't explain why languages exhibit some regularities and not others, why people can learn some things and not others
Current research on statistical learning in language acquisition: same issues!
  lots of different statistics can be studied in artificial language studies
  what are the general principles?
  why are some regularities learnable and not others?

  • Slide 25

They've thrown the theory of how the child learns out with the connectionist bathwater.
Need more models, not fewer
Recent example: Willits (2012) thesis, UW

  • Slide 26

Here's what Jon did
Studies of non-adjacent dependencies, which are everywhere in NL:
  drink, drank, drunk
  was cooking
  The woman gave the book to the boy
  The key(s) to the cabinets is/are on the table.
  S -> NP + (S) + VP
Challenging learning problem.
Many recent behavioral studies of infants, toddlers using artificial grammar methods
Not much connection to earlier AGL research

  • Slide 27

Representative studies: learning an AxB pattern (auditory presentation)
Gomez, Maye, Newport & Aslin
  Pel Wadim Rud
  Pel Kicey Rud
  Pel Puser Rud
  Vot Wadim Jic
  Vot Kicey Jic
  Vot Puser Jic
Vary number of As, Bs, Xs
Surprisingly hard to learn

  • Slide 28

Willits (2012)
Used SRNs to address 4 phenomena:
1. Learning distance-invariant nonadjacent dependencies: AxB with 0-3 intervening items
2. Impact of correlated semantic cue (A, B are both animals or both foods)
3. Impact of consistent but semantically unrelated cue (A animal, B food)
4. Abstract rule-like knowledge (Marcus)
  Learn: ABA
  Test: ABA (same pattern, new items) vs. ABB
Key change: let the model learn during the test phase (like babies do).
Then the model can learn the test pattern with new items, with savings.

  • Slide 29

Conclusions
1. Overcoming purported limitations of SRNs, yes. Behavior is similar to humans, yes.
2. More important: Analysis shows reasons why models work.
Implications re:
  learnability of other abstract, rule-like properties of language
  un-learnability of some types of problems, which should be unlearnable for people too

  • Slide 30

Case 3: Linking Brain and Behavior
Problem: PDP models motivated by linkage to brain, neurally inspired, etc.
But most models have not been very constrained by brain data
(PDP, neuroimaging developed in parallel at about the same time)
1. Diagnosis: poor fit because the brain doesn't work that way, e.g., backprop, units ≠ neurons, etc.
2. Second opinion: things are moving along fine
  Recent models that are more closely tied to brain
  Plaut, Lambon Ralph, Taiji Ueno, McClelland, others here

  • Slide 31

3. Interesting related problem: more please!
Integrate PDP models with brain data
Otherwise: differences in activation for words vs. nonwords = word-level representations
Grain of neuroimaging data is like grain of behavioral data
Models can indeed apply to both

  • Slide 32

Recent example from our group
Jeff Binder (Medical College of Wisconsin)
Will Graves (now at Rutgers)
Me, Tim Rogers (Wisconsin)

  • Slide 33

How many ways are there to be a skilled reader?
Do skilled readers (e.g., of English) read the same way?
Old question: Baron & Strawson (1976), "Chinese" vs. "Phoenician" readers
  visual: orth → sem
  phonological: orth → phon → sem

  • Slide 34

Maybe different division of labor?
Computing a code depends on input from various parts of the system
Efficiency arises from division of labor between sources
Affected by type of word, type of writing system
  Plaut et al., 1996: computing phonology
  Harm & Seidenberg, 2004: computing semantics
Individual differences could be related to reading skill, experience

  • Slide 35

New work looked at the impact of semantics on reading words aloud
In principle words can be read without using semantics (as in the DRC-CDP+ models)
However, in our model, orth → sem → phon is available, and could facilitate performance for some words or readers
Semantic effects on naming: are there any?
YES: Strain et al., 1995; Hino & Lupker, 1996; Lichacz et al., 1999; Strain & Herdman, 1999; Hino et al., 2001; Shibahara et al., 2003, and several others.
NO: Monaghan & Ellis, 2001; Brown & Watson, 1987; de Groot, 1989; Baayen et al., 2006.
  • Slide 36

Perhaps there are individual differences
Study: examined use of semantics in reading aloud among skilled readers (college graduates, med students)
Determine whether individual differences are associated with neuroanatomical variation in relevant parts of the reading network.

  • Slide 37

1. Graves et al. (2010): 18 subjects read 465 words aloud in scanner
2. Effect of semantics on naming indexed by impact of imageability. Also looked at freq, consistency, bigrams, number of letters, other factors.
3. Graves et al. (2012): Left hemisphere semantic and phonological ROIs based on results of 2010 study
  semantic: AG, ITG/ITS
  phonological: pSTG, pMTG
4. DTI tractography to measure volumes of pathways

  • Slide 38

[no text on slide]

  • Slide 39

Semantic effects on naming correlated with white matter volume in sem-phon pathways
Anatomy, not strategy

  • Slide 40

[no text on slide]

  • Slide 41

Everything A-OK?
Some reasons why models get a bad name
1. We take credit for good behavior, and discount the bad behavior
  implementations limited, etc., etc.
  like: model learns something that people learn but takes 10 million trials
  heads I win, tails you lose
Properties that hold over many models?
  Requires doing a lot of models. Like doing replication experiments.
  Takes lots of time, analysis.
  Could be hard to build a career around.

  • Slide 42

2. What about taking learning seriously?
Problem wasn't that backprop wasn't neurally realistic
It isn't behaviorally realistic.
What is learning really like?
  conditions vary: explicit externally provided teacher
  external or self-generated
  error signals that are noisy, partial, inconsistent, wrong, general rather than specific, etc.
Can be addressed (h/t O'Reilly).
Maybe models would learn on the human order of magnitude.

  • Slide 43

So, there is progress, there are obstacles, there are future directions.
Why is this important to recognize?
In the famous words of the philosopher,
"Those who fail to remember history are doomed to fail to remember repeating it."
  - Carlos Santana

  • Slide 44

Thanks for listening!
Thanks also to collaborators:
  Dialect research: Julie Washington (GSU), Daragh Sibley (Haskins)
  Acquisition research: Jon Willits (Indiana), Jenny Saffran (Wisconsin)
  Reading brain: Jeff Binder (MCW), Will Graves (MCW)
And Jay for introducing me to PDP.
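Backup sketch for Case 1 (Slide 6): the contrast between sequential and interleaved training can be shown on a toy problem. Everything here is an assumption made for illustration: a single linear unit trained with the delta rule stands in for the network, and the two tiny "tasks" (TASK_A, TASK_B) are made up. It is not McCloskey & Cohen's arithmetic model, nor the complementary systems model, and the interference it shows is partial rather than catastrophic.

```python
# Minimal sketch: one linear unit, online delta rule, pure standard library.
# TASK_A and TASK_B are hypothetical patterns that share input features,
# so learning B afterwards disturbs what was learned for A.
TASK_A = [([1, 1, 0, 0, 0], 1.0), ([1, 0, 1, 0, 0], 0.0)]
TASK_B = [([0, 1, 0, 1, 0], 0.0), ([0, 0, 1, 0, 1], 1.0)]


def output(weights, x):
    """Linear unit: weighted sum of the inputs."""
    return sum(w * xi for w, xi in zip(weights, x))


def train(weights, patterns, epochs=2000, lr=0.1):
    """Online delta rule; updates the weight list in place and returns it."""
    for _ in range(epochs):
        for x, target in patterns:
            err = target - output(weights, x)
            for i, xi in enumerate(x):
                weights[i] += lr * err * xi
    return weights


def sse(weights, patterns):
    """Sum of squared errors over a pattern set."""
    return sum((t - output(weights, x)) ** 2 for x, t in patterns)


def sequential_error_on_a():
    """Train on A to convergence, then on B alone; report remaining error on A."""
    w = train([0.0] * 5, TASK_A)
    train(w, TASK_B)
    return sse(w, TASK_A)


def interleaved_error_on_a():
    """Train on A and B mixed together; report remaining error on A."""
    w = train([0.0] * 5, TASK_A + TASK_B)
    return sse(w, TASK_A)


if __name__ == "__main__":
    print("error on A after A-then-B:", sequential_error_on_a())
    print("error on A after interleaving:", interleaved_error_on_a())
```

In this toy setting the A-then-B regime leaves an error on task A of roughly 0.56, while the interleaved regime drives it to essentially zero, because a single weight vector that fits both tasks exists but sequential training never looks for it.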