Adamson - Interlanguage Variation

Interlanguage Variation in Theoretical and Pedagogical Perspective

In this book H. D. Adamson reviews scholarship in sociolinguistics and second language acquisition, comparing theories of variation in first and second language speech, with special attention to the psychological underpinnings of variation theory. Interlanguage is what second language learners speak. It contains syntactic, morphological, and phonological patterns that are not those of either the first or the second language, and which can be analyzed using the principles and techniques of variation theory. Interlanguage Variation in Theoretical and Pedagogical Perspective:

• Relates the emerging field of variation in second language learners’ speech (interlanguage) to the established field of variation in native speakers’ speech

• Relates the theory of linguistic variation to psycholinguistic models of language processing

• Relates sociolinguistic variation theory to the theory of Cognitive Linguistics

• Suggests teaching applications that follow from the theoretical discussion

At the forefront of scholarship in the fields of interlanguage and variation theory, this book is directed to graduate students and researchers in applied English linguistics and second language acquisition, especially those with a background in sociolinguistics.

H. D. Adamson is Professor of English at the University of Arizona, where he has served as the Director of the Ph.D. Program in Second Language Acquisition and Teaching. He has taught English as a second or foreign language in Ethiopia, Spain, and the United States.

Interlanguage Variation in Theoretical and Pedagogical Perspective

H. D. Adamson
University of Arizona

First published 2009
by Routledge
270 Madison Ave, New York, NY 10016

Simultaneously published in the UK
by Routledge
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN

Routledge is an imprint of the Taylor & Francis Group, an informa business

© 2009 Taylor & Francis

All rights reserved. No part of this book may be reprinted or reproduced or utilized in any form or by any electronic, mechanical or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data
Adamson, H. D. (Hugh Douglas)

Interlanguage variation in theoretical and pedagogical perspective / H. D. Adamson.
p. cm.

Includes bibliographical references and index.
1. Language and languages--Variation. 2. Interlanguage (Language learning). 3. Psycholinguistics. 4. Language and languages--Study and teaching. I. Title.
P120.V37A32 2008
401′93–dc22
2008021718

ISBN10: 0–8058–5576–9 (hbk)
ISBN10: 0–203–88736–0 (ebk)

ISBN13: 978–0–8058–5576–0 (hbk)
ISBN13: 978–0–203–88736–3 (ebk)

This edition published in the Taylor & Francis e-Library, 2008.


ISBN 0-203-88736-0 Master e-book ISBN

Once again this book is dedicated, with love, to Alice


Contents

Preface
Phonetic Symbols Used in the Book
List of Figures and Tables

Part I
Variation in Native Speaker Speech

1 Variation Theory
    Introduction: The Cartesian Mind
    Generative Grammar
    Variation Theory: A History

2 A Study of Variation in the Native Speaker Speech Community (H. D. Adamson and Vera Regan)
    Introduction
    Sociolinguistic Studies of -ing
    Methods of Data Collection and Analysis
    Results
    Discussion

3 Language Variation and Change
    Introduction
    Measuring Sound Change
    Five Problems in Explaining Sound Change
    Conclusions

Part II
Variation in Nonnative Speaker Speech

4 The Study of Variation in Interlanguage
    Introduction
    Early Studies of Vertical Variation
    Studies of Horizontal Variation
    Conclusions

5 The Acquisition of English Irregular Past Tense by Chinese-speaking Children (Larry Berlin and H. D. Adamson)
    Introduction
    Constraints on Past Tense Marking
    Variation in Chinese-speaking Children’s Marking of English Irregular Past Tense
    Results
    Discussion
    Conclusions

Part III
Variation in Theoretical Perspective

6 Psychological Theories of Linguistic Variation
    Introduction
    Psychological Studies of Probability-matching
    Psycholinguistic Models of Language Performance
    Elliott’s Study of Spanish Acquisition
    Elliott’s Results Interpreted as a Connectionist Network
    Conclusion

7 Cognitive Linguistics
    Introduction
    Prototype Schemas in Morphology
    Prototype Schemas in Syntax/Semantics: The Acquisition of Argument Structure
    A Pilot Study of Ditransitive Acquisition Among Korean Speakers
    Discussion: Prototype Schemas, Connectionist Networks, and Variable Rules

Part IV
Variation in Pedagogical Perspective

8 Speaking Style and Monitoring
    Monitoring: Attention Paid to Speech
    Accommodation, Audience Design, and Self-identification
    Reconciling Monitoring and Audience Design

9 Teaching Implications
    Social Dimensions
    Psychological Dimensions
    A Philosophy of Language Teaching

Appendix: Variation and Change in Color Semantics

Notes

References

Index

Preface

Over the years, second language acquisition (SLA) scholars have followed in the footsteps of researchers in other branches of linguistics. For example, the theory of Universal Grammar was developed to explain first language acquisition, but that research has led to a large number of studies investigating whether the theory also applies to second language acquisition. The same is true in the field of quantitative sociolinguistics, a field that was developed by William Labov and his associates in the 1960s and 1970s to describe social and regional dialects. Labov (1966, 1969, 1972a, 1972b) found that these language varieties differed from each other not so much in terms of features that were always present in one variety and never present in another, but in terms of the frequency at which shared features occurred. For example, young Philadelphians of all social classes sometimes raise their front vowels so that “plate” can be pronounced [pliyt], but working-class speakers raise them more frequently than middle- and upper-class speakers. Labov also found that the way people talk can be correlated with factors besides their social class, such as their age, race and gender, and also with the topic, the setting, and the linguistic context in which their speech occurs. Thus, a young working-class man telling a story in a bar is almost certain to say, “I’m runnin’ late,” whereas a young, middle-class woman apologizing for being late to a job interview is almost certain to say, “I’m running late.”

Labov and his associates developed a number of sophisticated tools, including the Varbrul multivariate analysis program, for studying variation in speech and correlating alternating forms, like [pleyt] and [pliyt], with social and linguistic factors. Using these tools to study SLA was an obvious notion because variation is the hallmark of interlanguage. Language learners, like mature native speakers, don’t always say the same thing in the same way. But, whereas native speakers alternate between a formal and an informal variant (for example, “I’m running” versus “I’m runnin’”), nonnative speakers often alternate between a grammatical and an ungrammatical variant. For example, Alberto, Schumann’s (1978) famous subject, sometimes said, “I don’t like it” and sometimes said, “I no like it.” SLA researchers asked whether the methods developed by mainstream variationists to study systematic variation in first language speech could uncover systematic variation in second language speech. The answer was “yes,” and the quantitative study of variation in interlanguage is now one of the recognized schools of SLA research, included in standard introductions to the field such as Ellis (1994) and Mitchell and Myles (2004). The question of how quantitative sociolinguistics and related theories of language use can be applied to SLA is the topic of this book.


Part I, Variation in Native Speaker Speech, describes the basics of Labovian sociolinguistics and shows how the field is relevant to the study of interlanguage. Labov began his work at a time when the reigning linguistic paradigm was Chomsky’s (1965) Standard Theory, and Labov considered his own work to be an extension and refinement of that theory. This extension was challenged by generative grammarians, as was the application of quantitative analysis to the study of SLA. These debates are reviewed in chapter 1, where it is suggested that generative theory and Labovian theory describe language at different levels of abstraction, with quantitative descriptions of language closer to performance models of language comprehension and production, as described in chapter 6.

In order to give readers a concrete example of what a variation study looks like, chapter 2 presents a study of -ing variation in Philadelphia speech, which uses the Varbrul program. Chapter 3 shows how knowledge of synchronic variation can shed light on the processes of historical language change. The basic idea is that synchronic variation in a speech community is a snapshot of linguistic change in progress. For example, at an earlier time, Philadelphians did not raise their front vowels. This change emerged among working-class speakers and has spread to the other social classes. It is possible that front vowel raising will continue to increase in frequency among all social classes until the vowel system in Philadelphia English is reorganized, as happened earlier in the Great English Vowel Shift. Chapter 3 also features a discussion of five basic problems in understanding sound change that were identified by Weinreich, Labov, and Herzog (1968). These problems are revisited from the perspective of change in color category systems in the appendix.

Part II, Variation in Nonnative Speaker Speech, extends the study of variation to interlanguage. Chapter 4 reviews the history and development of variation studies of interlanguage, and chapter 5 presents an example of such a study, the acquisition of English irregular past tense forms by Chinese-speaking children, again using the Varbrul program.

Part III, Variation in Theoretical Perspective, explores the psychological underpinnings of quantitative sociolinguistics. Historically, Labovian sociolinguists have used the term Variation Theory to describe their school of linguistics, but the theoretical aspects of this work have lagged behind the data-based research and empirical findings. Nevertheless, I will use the term “Variation Theory” throughout the book, and in this section I suggest some of the theoretical underpinnings for developing a more comprehensive theory of language variation.

As mentioned earlier, Variation Theory was originally conceived as an extension of generative grammar, but that attempt proved to be a category error. Generative grammar is not concerned with the probabilities at which linguistic forms are used, only with whether the forms are grammatical. However, psycholinguistic models of language production and comprehension are very much concerned with probabilities, and in chapter 6 I show how Variation Theory is compatible with these models. Barsalou’s (1992) sentence production model (based on Levelt, 1989 and Garrett, 1975) and Townsend and Bever’s (2001) sentence comprehension model are reviewed. Barsalou’s model, when suitably updated, could employ connectionist networks to make probabilistic decisions about what forms to produce, and Townsend and Bever’s model employs connectionist networks to interpret incoming speech. Because both connectionist networks and Variation Theory deal with probabilities, their similarity is obvious, and Mitchell and Myles (1998, p. 178), among others, have suggested that the relationship between the two theories ought to be worked out. Like Variation Theory, connectionist linguistics is empirically oriented and has not been closely associated with a particular grammatical theory, with one exception. That exception is the theory of Cognitive Linguistics (CL), which, as Feldman (2006) shows, is compatible with connectionist networks. In fact, in some cases a CL description can be considered a more abstract representation of what connectionist networks do. Chapter 7 describes CL and shows how it is compatible with connectionist models. Then, the chapter goes on to show that both CL and connectionism are compatible with Variation Theory. This is perhaps the most innovative claim in the book because it suggests that variationists can look to CL for a more abstract characterization, a grammatical characterization if you like, of much of their work.

Part IV, Variation in Pedagogical Perspective, describes some teaching implications of the theoretical perspectives and research findings presented in Parts I, II, and III. Chapter 8 compares Labov’s theory of monitoring (paying attention, either consciously or unconsciously, to the form of speech) to Krashen’s (1978, 1982, 1985, 1987) theory of monitoring, which has had enormous influence on language teaching practice. Both similarities and differences in the two theories are pointed out. Chapter 8 also discusses accommodation theory and related theories that were developed by sociolinguists who were not followers of Labov. These theories complement (their proponents say contradict) Labov’s theory of monitoring and better explain how a speaker’s sense of personal identity and different audiences affect the way people talk. This chapter also discusses how accommodation theory has been applied to the study of interlanguage.

Chapter 9 further explores the teaching implications of Variation Theory and CL. The chapter is divided into two parts. The first part, Social Dimensions, reviews the debate on whether to teach informal and nonstandard forms of a foreign or second language. It is noted that, although communicative competence is usually taught in both foreign and second language settings, sociolinguistic competence (Bachman, 1990), that is, the ability to understand and produce informal and nonstandard forms, is seldom taught and is a highly controversial subject. The second part of the chapter, Psychological Dimensions, considers the teaching implications of Variation Theory proper and CL. The chapter ends with a discussion of the pedagogical implications of the philosophy of Experiential Realism (Lakoff, 1987), a philosophy developed by cognitive linguists.

In the appendix, the similarities and differences between sound change, as discussed in chapter 3, and change in color category systems are pointed out. The theory of color category change has provided core concepts to the theory of CL. Berlin and Kay (1969) found that the languages of the world contain from as many as 11 basic color terms to as few as two. They also found that as a society develops in complexity and technology, it expands its inventory of basic color terms. In other words, color category systems evolve over time just as sound systems do. Because of this similarity, it is possible to discuss Berlin and Kay’s (1969) theory of color category change in terms of the five problems for understanding sound change that were introduced in chapter 3.

This book is intended for students who have a good foundation in linguistics but who are not familiar with Variation Theory or CL. The book attempts to place the quantitative study of SLA in the context of related disciplines, including generative linguistics, psycholinguistics, CL, and, of course, Variation Theory. Therefore, the discussion is wide-ranging, but I have tried to write at the level of an introductory textbook in each of these disciplines. Most of the chapters review, relate, and interpret the relevant literature, with three exceptions. Chapter 2 contains a data-based study of variation in native speaker speech, and chapter 5 contains a data-based study of variation in interlanguage. The suggestions for teaching in chapter 9, for the most part, have not been made previously. The book could be used as an auxiliary text in a course in SLA or mainstream sociolinguistics. It would be even better as the main text in a course in quantitative SLA, and it is offered in the hope that there will be more such courses in the future.

Acknowledgements

Thanks to my students and colleagues in the Interdisciplinary Ph.D. Program in Second Language Acquisition and Teaching at the University of Arizona. I have used material from the book in my classes, and my students’ comments have been most helpful. Thanks especially to my co-authors, Vera Regan and Larry Berlin, for their support and collaboration. Conversations with Roy Major, Norma Mendoza-Denton, and the late Robert MacLaury were also very helpful. The author is solely responsible for any shortcomings in the book. The author is sorry.


Phonetic Symbols Used in the Book

Vowels

            front    mid    back
High        iy              uw
            I               u
Central     ey       ə      ow
            e        ʌ      o
Low         æ               a

Diphthongs: ay, aw, oy

Consonants

            bilabial  labiodental  dental  alveolar  palatal  velar  glottal
Stops       p, b                           t, d               k, g
Fricatives            f, v         θ, ð    s, z      š, ž
Affricates                                           č, ǰ
Nasals      m                              n                  ŋ
Liquids                                    l, r
Glides      w                                        y               h


List of Figures and Tables

Figures

2.1 Decision tree for stylistic analysis of spontaneous speech in the sociolinguistic interview (from Labov, 2001b).
3.1 Mean values of all Philadelphia vowels with age coefficients.
3.2 Mean values of all Philadelphia vowels with age coefficients.
6.1 Results of Hudson Kam and Newport’s (2005) experiment in the acquisition of determiners in an artificial language.
6.2 Barsalou’s (1992) model of speech production.
6.3 The Aphasia Model for naming cat.
6.4 Simplified version of Spivey and Tanenhaus’s (1998) connectionist model for selecting an RR or MC template.
6.5 Spivey and Tanenhaus’s (1998) connectionist model for selecting an RR or MC template (adapted).
6.6 A connectionist interpretation of Elliott’s (1995) data.
7.1 The prototype category BIRD.
7.2 The radial category “ditransitive verb” (from Goldberg, 1995).
7.3 Verbs that fuse with the ditransitive construction (from Goldberg, 1995).
7.4 A possible representation of prototype schema (6) as a connectionist network.
8.1 Class stratification of /r/ in guard, car, beer, beard, etc. for native New York City adults (based on Labov 1972a, p. 114).
8.2 The monitor model.
8.3 Bell’s categorization of different types of style shifting (from Bell, 2001).
9.1 Bachman’s model of communicative language ability.
9.2 The force image schema.
9.3 The transfer of possession schema.
A.1 Stages of evolution in color categories.

Tables

1.1 Deletion of final /t,d/ in Detroit African American English in informal style by social class.
1.2 /t,d/ deletion in Detroit African American English by social class and linguistic environment.
1.3 Varbrul results for /t,d/ deletion by African American English speakers from Detroit (all social classes).
2.1 Syntactic categories in which -ing occurs.
2.2 Probabilities of N in the Philadelphia native speaker data according to monitoring, gender, grammatical category, and following phonological environment.
2.3 Frequency of N according to style and gender.
2.4 Percentage of N by following phonological environment.
3.1 Frequencies of [æ] tensing for two words by age group.
3.2 Varbrul weights for the backing of /ʌ/ in Belten High, for social categories and genders.
4.1 Percentage of forms for L1 speakers of Quebec French, French language arts materials, French immersion teachers, French immersion students.
5.1 Three semantic features associated with Vendler’s (1967) four semantic categories of verbs.
5.2 Overview of individual subjects.
5.3 Percentage of accurate past tense marking by individual subject and time period. Subjects are ranked according to their overall accuracy.
5.4 Percentage of accurate past tense marking by individual subject and verb class.
5.5 Percentage of accurate past tense marking by individual subject and semantic type of verb.
5.6 Percentage of accurate past tense marking by individual subject and clause type.
5.7 Varbrul analysis for past tense marking.
5.8 Percentage of accurate past tense marking by time period and semantic type of verb.
6.1 Percentage of correct se usage by semantic domain in Elliott’s (1995) study.
6.2 Percentage of overgeneralization of se by semantic domain in Elliott’s (1995) study.
7.1 Classification of verbs on the grammaticality judgment task according to Goldberg’s (1995) and Gropen et al.’s (1991) criteria.
7.2 Native English and native Korean speakers’ ratings for the grammaticality of the ditransitive construction with various verbs.
8.1 Style shifting on four target language forms in narrative style, interview style, and grammar test style by Arabic speakers.
8.2 Varbrul p values for plural -s marking by Chinese speakers according to convergence with interlocutors.
8.3 Percentage of informal variants for sentence reading style versus phrase reading style for three native language groups.
8.4 Percentage of informal variants for males and females for three native language groups.

I
Variation in Native Speaker Speech

1
Variation Theory

Introduction—The Cartesian Mind

The Logical Problem of Learning

Two of the oldest questions in philosophy are the ontological question and the epistemological question. The ontological question asks, “What is the nature of reality?” The epistemological question asks, “How do we know what we know?” Since Plato, philosophers have observed that we know more than we have evidence for. As Bertrand Russell put it, “How comes it that human beings, whose contacts with the world are brief and personal and limited, are nevertheless able to know as much as they do know?” (quoted in Chomsky, 1986, p. xxv). An example of something we know instinctively, without sufficient evidence, is that physical objects continue to exist when no one is looking at them. But the claim that objects do not disappear when no one is looking cannot be verified by any observation, as Hume pointed out. The problem of how we know things based on limited evidence has been called the logical problem of learning. The epistemological question and the logical problem of learning were debated by philosophers during the seventeenth century, who came up with some surprisingly modern answers. As Chomsky (1999, p. 36) observes:

These 17th century thinkers speculated rather plausibly on how we perceive the objects around us in terms of structural properties, in terms of our concepts of object and relation, cause and effect, whole and part, symmetry, proportion, the functions served by objects and the characteristic uses to which they are put. We perceive the world around us in this manner, they argued, as a consequence of the organizing activity of the mind, based on its innate structure and the experience that has caused it to assume new and richer forms.

John Locke and René Descartes proposed two different answers to the epistemological question, and the debate between their intellectual descendants continues to this day. Locke believed that the mind is like a blank slate and that all of the ideas that it eventually contains are supplied by experience. He wrote:

Let us then suppose the mind to be, as we say, white paper void of all characters, without any ideas. How comes it to be furnished? Whence comes it by that vast store which the busy and boundless fancy of man has painted on it with an almost endless variety? Whence has it all the materials of reason and knowledge? To this I answer, in one word, from EXPERIENCE. (quoted in Pinker, 2002, p. 5)

Descartes, on the other hand, believed that if all human knowledge were based only on experience, there would be no way to be certain about the truths of mathematics, science, or anything else because no two people’s experience is the same. However, he believed that we can be certain of at least one thing, the existence of our own minds: “I think, therefore I am.” A corollary of this proposition is that the mind can also know its own ideas or representations. Knowledge consists of grasping what these ideas are, and working out the connections between them. Many of our ideas, Descartes believed, are imposed on our minds by the force of logical necessity. Such innate ideas included the basic concepts of mathematics and geometry, and, indeed, the idea of God. Other philosophers took a middle position. Kant agreed with Locke that knowledge of the world is gained through sensory experience, but argued that this experience must be organized by principles that are inherent in the mind. Leibniz expressed the same idea this way: “There is nothing in the intellect that was not first in the senses except the intellect itself” (quoted in Pinker, 2002, p. 34).

Modern psychology has found considerable evidence that supports Kant’s and Leibniz’s position that the mind has innate ways of organizing experience. In a famous set of experiments Spelke, Vishton, and von Hofsten (1995) demonstrated that three- to four-month-old infants have a fairly well-developed ontological theory. They understand what an object is, that objects normally move along a continuous trajectory, and that objects cannot disappear from one place and reappear in another. The method used to demonstrate these facts is extraordinary. Before they can crawl, infants can turn their heads to observe their surroundings, and they like to watch new or unexpected things. They are easily bored and will turn their heads to see new things and will look longer at things that are unexpected. By timing how long infants will look at a scene before turning their heads, researchers can infer what the infants consider new or unexpected. Using this methodology, Baillargeon (1995) showed that infants do not expect one object to pass through another. The babies she studied were fascinated by an animated scene in which it appeared that a panel placed in front of a cube fell flat, right through the space that the cube should be occupying.

Another well-studied example of innate mental organizing principles (which is examined in more detail in the appendix) comes from research on how languages name colors. At first glance color naming systems seem to be remarkably diverse. The language of the Dani of New Guinea has only two color terms (which roughly correspond to “light” and “dark”), whereas English has eleven basic color terms. According to Berlin and Kay (1969), a color term is basic if it is a single morpheme, not derived from another color term (like reddish-brown), and uniquely names a region of the color spectrum. The eleven basic color terms in English are:

black    red    yellow    brown    purple
white           green              pink
                blue               orange
                                   gray

Berlin and Kay (1969) discovered a remarkable fact about the color-naming systems in all of the languages they studied. If a language has a color term on the right side of the chart above, it will also have the terms to the left of it. Thus, if brown is a basic color term in a language, that language will also have terms for yellow, green, blue, red, white, and black. The reason for this universal order among color systems is the human perceptual apparatus. People perceive six purest colors: red, yellow, green, blue, white, and black. The maximum contrasts between these colors determine which colors will be named in a system with a particular number of terms. For example, if a system has three terms, they will be roughly light, dark, and reddish, which divide the color spectrum up into three maximally contrasting regions. As discussed in the appendix, social factors are also important in how a language names basic colors, but these factors are constrained by principles of color perception and sensory processing that are innate in human beings.
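The implicational reading of the chart can be made concrete with a short sketch. It assumes that the chart's five columns, read left to right, are the only ordering that matters, and the names `CHART_COLUMNS`, `implied_terms`, and `is_consistent` are illustrative rather than anything taken from Berlin and Kay.

```python
# Columns of the chart, left to right (an assumed simplification of the hierarchy).
CHART_COLUMNS = [
    {"black", "white"},
    {"red"},
    {"yellow", "green", "blue"},
    {"brown"},
    {"purple", "pink", "orange", "gray"},
]


def implied_terms(term):
    """Return every term in the columns to the left of the column holding `term`."""
    for index, column in enumerate(CHART_COLUMNS):
        if term in column:
            return set().union(*CHART_COLUMNS[:index])
    raise ValueError(f"{term!r} is not one of the eleven basic color terms")


def is_consistent(inventory):
    """True if every term in the inventory is accompanied by all terms to its left."""
    terms = set(inventory)
    return all(implied_terms(term) <= terms for term in terms)


# A language with brown is predicted to have these six terms as well.
print(sorted(implied_terms("brown")))
# A three-term system (roughly dark, light, reddish) fits the generalization.
print(is_consistent({"black", "white", "red"}))
# A system with brown but no red would violate it.
print(is_consistent({"black", "white", "brown"}))
```

The sketch checks only the left-to-right implication stated in the text. It says nothing about the perceptual or social factors that explain the order.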

The Logical Problem of Language Acquisition

One aspect of the logical problem of learning involves how we are able to learn a language. The problem applies to first language acquisition in a straightforward way. How can we explain the fact that children know grammatical patterns that they have not encountered in the input they receive? Evidence of such knowledge was found in a classic experiment by Crain and McKee (1986). They showed that three-year-olds understand that in the sentence, “He was eating pizza while Kermit the Frog was dancing,” he must refer to someone other than Kermit the Frog, say Cookie Monster. This finding is surprising because the reason he must have a referent that is not mentioned later in the sentence is very abstract. At first glance one might think that the reason is fairly simple: a pronoun cannot precede its referent. But this is not the case, as shown by the sentence, “While he was dancing, Kermit the Frog was eating pizza.” The actual reason involves a principle of Universal Grammar (UG) called Binding Principle B, which states that a pronoun cannot refer to an NP that it c-commands. C-command, roughly put, means this: In a tree structure, node A c-commands node B if there is a node C that directly dominates node A and also dominates node B. For example, in the case of the tree on page 8, the first NP (Marsha) c-commands VP and every node within VP. Similarly, V (picked) c-commands Part, the final NP, and every node within the final NP. Chomsky claims that children know Binding Principle B and the other principles of UG innately. Chomsky’s answer to the logical problem of first language acquisition, then, is that children can use grammatical knowledge that they could not have figured out from input because they are born with it.
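The rough definition of c-command given above can be checked mechanically on a small tree. The sketch below is only an illustration: the tree shape (S over NP1 and VP, with VP over V, Part, and NP2) is an assumed stand-in for the tree on page 8, the labels are hypothetical, and the function adds the usual proviso, not stated in the rough definition, that a node does not c-command itself or anything it dominates.

```python
class Node:
    """A labeled tree node that records its parent when attached to one."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

    def dominates(self, other):
        """True if `other` is a proper descendant of this node."""
        return any(
            child is other or child.dominates(other) for child in self.children
        )


def c_commands(a, b):
    """A c-commands B if the node directly dominating A also dominates B.

    Assumption beyond the text: A must not be B and must not dominate B.
    """
    if a is b or a.parent is None or a.dominates(b):
        return False
    return a.parent.dominates(b)


# Hypothetical tree: [S [NP1] [VP [V] [Part] [NP2 [Det] [N]]]]
det, n = Node("Det"), Node("N")
np2 = Node("NP2", [det, n])
v, part = Node("V"), Node("Part")
vp = Node("VP", [v, part, np2])
np1 = Node("NP1")
s = Node("S", [np1, vp])

print(c_commands(np1, vp))   # True: S directly dominates NP1 and dominates VP
print(c_commands(np1, n))    # True: NP1 c-commands every node within VP
print(c_commands(v, part))   # True: VP directly dominates V and dominates Part
print(c_commands(v, np1))    # False: VP does not dominate NP1
```

On this tree the subject NP c-commands everything inside VP, while nothing inside VP c-commands the subject, which is the asymmetry the pronoun facts depend on.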

The logical problem of learning a language applies differently to adult second language acquisition because in this case it is more difficult to claim that learners know grammatical patterns that they have not encountered; after all, they already know a whole language. Therefore, it is not clear what role, if any, UG plays in adult second language acquisition. Three positions are usually distinguished. The fundamental difference hypothesis (Bley-Vroman, 1990) claims that UG is not involved at all in second language acquisition. The evidence cited for this position is that, unlike first language acquisition, second language acquisition is often not successful, as implied by the failure of many language teaching programs to produce fluent speakers. The second position is that UG works for adults in much the same way that it works for children. Supporters of this view (Schwartz and Sprouse, 1996; Shi, 2003) point out that there are many cases of successful adult language learning and that the logical problem of language acquisition applies to these cases just as it applies to children. Therefore, UG must be at work. The third position is that only the principles and parameters employed during first language acquisition are available to aid the second language learner. As an example, consider another principle of UG, the null subject principle, which states that a sentence subject either must appear in the surface structure or need not appear in the surface structure. Like many UG principles, the null subject principle comes with an associated parameter, which can be set to plus or minus. In Spanish, the null subject parameter is set to plus, which means that Spanish allows null subject sentences, that is, it allows sentences without a surface subject, such as: Tengo un libro “I have a book” (literally “Have a book”). In English and French, on the other hand, the null subject parameter is set to minus, which means that these languages require that sentences have subjects on the surface, except in special cases like commands. Those who say that only the principles and parameter settings employed in the first language are available in learning a second language would predict that a speaker of French will be able to learn this aspect of English easily because English has the same setting, but a speaker of Spanish will have more difficulty because Spanish has the opposite setting.
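The prediction made by the third position can be stated as a simple comparison of parameter settings. The sketch below uses only the three settings given in the text; the names `NULL_SUBJECT` and `predicted_difficulty` are hypothetical, and the two-way "easier"/"harder" outcome is a deliberate simplification of the prediction.

```python
# Null subject parameter settings as stated in the text:
# "+" allows sentences without a surface subject, "-" requires one.
NULL_SUBJECT = {"Spanish": "+", "English": "-", "French": "-"}


def predicted_difficulty(first_language, target_language):
    """Predict 'easier' when both languages share the setting, otherwise 'harder'."""
    same_setting = NULL_SUBJECT[first_language] == NULL_SUBJECT[target_language]
    return "easier" if same_setting else "harder"


print(predicted_difficulty("French", "English"))   # easier
print(predicted_difficulty("Spanish", "English"))  # harder
```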

The Developmental Problem of Language Acquisition

Gregg (1996) makes a distinction between the logical problem of language acquisition and the developmental problem. The answer to the logical problem, he maintains, must involve a comprehensive theory of language, like UG. Such a theory is a property theory, which addresses the question, “How is acquisition possible?” A property theory not only explains how language is acquired but also goes a long way toward explaining what language is. “A property theory describes the components that constitute the system, and their interrelations” (Gregg, 1996, p. 51). The developmental problem of language acquisition, on the other hand, involves a transition theory, which addresses the question, “Why does child language or interlanguage change from state A to state B?” UG is not a transition theory. For example, notice that Binding Principle B sets limits on how children construct an internal grammar of pronoun reference, but it does not describe the course of this acquisition in detail, predicting, for example, which pronouns a child will use first. A transition theory of pronoun reference might complement the UG property theory of pronoun reference by describing the order in which a particular speaker or group of speakers acquired pronouns, coupled with an explanation of why that order occurred, an explanation that might involve, for example, the frequency of pronouns encountered in the input.

Gregg (1996) maintains that an adequate transition theory must be associated with a property theory, and that would, no doubt, be desirable. Unfortunately, however, we have no agreed-upon property theory. The property theory that Gregg endorsed in 1996, the Government and Binding version of generative grammar, was in the process of changing fundamentally even as he wrote. Therefore, it seems reasonable to investigate a transition theory of language development independently of any particular property theory, keeping in mind that the two kinds of theories must eventually be compatible. I will claim in the next section that Variation Theory, a way of modeling change in linguistic systems, can help to explain the developmental aspects of first and second language acquisition, and thus serve as part of a transition theory.

Generative Grammar

Variation Theory was developed in the 1960s and 1970s, largely by William Labov (1966, 1969, 1972a, 1972b), to explain the relationship between different varieties of the same language. Labov and his colleagues took a special interest in nonstandard varieties, especially African American English, whose speakers often use both nonstandard forms and standard forms in the same discourse, for example, “He play basketball everyday” and “He plays basketball everyday.” It should be noted that similar variation is also common in standard varieties of English (“He’s playing basketball now” and “He is playing basketball now”). As we will see in chapter 2, variationist research has shown that such alternations are usually systematic and not random. Originally, Variation Theory was thought of as an extension of generative grammar, so in order to understand the early days of Variation Theory, we must take a look at the reigning generative theory of that time.

The Standard Theory

The generative model that had the most influence on Variation Theory was the Standard Theory proposed by Chomsky in 1965. Recall that in this syntax-driven model re-write rules of the form S → NP VP; NP → det N; and VP → V NP produced tree structures that terminated in lexical items, as in the tree below, which was called a deep structure.

Deep structures could be rearranged using transformational rules to produce surface structures. For example, the particle movement transformation applied to the tree structure above would change the sentence “Marsha picked up the book” into the sentence “Marsha picked the book up.” The particle movement transformation rule looked like this:

V      part  NP        →  V      NP        part
picked up    the book     picked the book  up

This rule says that in a sentence that contains a verb/particle combination followed by an NP, the particle can be moved to the right of the NP. The rule can, of course, be used to generate many other sentences, including “Marsha picked the book with the Corinthian leather binding up.” Whether a particle is likely to move is not indicated by the rule, even though we know that particles more often move around short NPs, like “the book,” than around long NPs, like “the book with the Corinthian leather binding.” However, probabilistic statements were outside the scope of generative grammar. The goal of the grammar was to generate sentences that were grammatical, not to address questions of probability.
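The particle movement rule can be sketched in a few lines of Python. This is only an illustrative toy, not a Standard Theory grammar: the flat list of (category, text) pairs stands in for a tree, and the names particle_movement and surface are assumptions introduced here. Like the rule itself, the function says nothing about how likely the movement is; it applies equally to short and long NPs.

```python
def particle_movement(constituents):
    """Apply V part NP -> V NP part to the first matching sequence.

    `constituents` is a flat list of (category, text) pairs, a simplified
    stand-in for a deep structure tree (an assumption of this sketch).
    """
    out = list(constituents)
    for i in range(len(out) - 2):
        categories = [category for category, _ in out[i:i + 3]]
        if categories == ["V", "part", "NP"]:
            # Move the particle to the right of the NP.
            out[i + 1], out[i + 2] = out[i + 2], out[i + 1]
            break
    return out


def surface(constituents):
    """Spell out the word string for a list of constituents."""
    return " ".join(text for _, text in constituents)


deep = [("NP", "Marsha"), ("V", "picked"), ("part", "up"), ("NP", "the book")]
long_np = [("NP", "Marsha"), ("V", "picked"), ("part", "up"),
           ("NP", "the book with the Corinthian leather binding")]

print(surface(deep))
print(surface(particle_movement(deep)))
print(surface(particle_movement(long_np)))
```

The function treats both NPs identically, which is exactly the limitation noted above: a categorical rule can license the movement but cannot state that it is more frequent around short NPs.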

Competence and Performance

Suppose you turn on your computer, boot up a math program, and punch in

2 + 2 =

The computer produces the answer: 4. You may have wondered how the machine does it. The explanation must be given at several levels of abstraction. At the most abstract level is the equation 2 + 2 = 4. This tells us conceptually what the machine is up to. At the most concrete level is a description of the actual circuits that go on and off as the computer performs the operation. There are also descriptions at intermediate levels of abstraction. For example, writing the equation in base 2 (10 + 10 = 100) is closer to the concrete level because the machine does its calculations using the binary system. At a still more concrete level are lines of code in a programming language, which are actual instructions to the computer. For example, here is how you program a computer to add 2 and 2 in Basic:

1. int x = 0     This tells the computer to reserve a space in memory to hold an integer, to name the space “x”, and to place the integer 0 in the space.

2. x = 2 + 2     This tells the computer to combine 2 and 2 and place the result in space x.

A level of abstraction between a programming code and the description of circuits is called the machine code, and it corresponds fairly closely to what the actual circuits do when making the calculation. All of these descriptions are conceptually equivalent in the sense that they share the same basic information. However, the descriptions at different levels of abstraction are useful for different purposes. A machine code description is useful to someone who is investigating an error message that may be caused by a virus. The Basic code description is useful to someone who wants to be able to add 2 and 2 on a computer that can run Basic. The mathematical equation is useful for understanding the logic of what the computer is doing.
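As a small illustration of two of these levels, the same addition can be displayed in base 10 and in base 2. The sketch below uses Python rather than Basic, which is a choice made here for convenience and not part of the original example; the variable names are likewise assumptions.

```python
# The same calculation described at two levels of abstraction.
a, b = 2, 2
total = a + b

# Base 10: the conceptual statement of what is computed.
print(f"base 10: {a} + {b} = {total}")

# Base 2: closer to how the machine represents the numbers.
print(f"base 2:  {a:b} + {b:b} = {total:b}")
```

Both lines describe one and the same operation; only the notation, and therefore the usefulness for a given purpose, differs.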

Many linguists and psychologists have adopted the computer metaphor of the mind, which Smolensky (2001, p. 323) describes as follows: “Just as a program is an abstract higher-level description of a computer, a mind is an abstract, higher-level description of a brain.” How does the mind produce sentences like “Marsha hit John”? The explanation is not straightforward but (like the explanation of how a computer performs a calculation) must be given at several levels of abstraction. The most concrete level, the level of “wetware,” is a description of which neurons in the brain fire during the production of a particular sentence. At a more abstract level, and perhaps analogous to machine code, is what Fodor (1975) calls the “language of thought.” At a still more abstract level is a Standard Theory type of explanation, which provides the derivational history of a sentence in terms of phrase structure rules, transformational rules, lexical insertion rules, and phonological rules. Such an explanation might be considered analogous to equations like 2 + 2 = 4 in a description of how a computer performs addition.

Chomsky has singled out two levels of abstraction as important for understanding the mental processes involved in sentence production (and this claim applies to all versions of Chomskian theory, including the most recent): competence and performance. These terms have caused a lot of confusion. A competence theory is a theory of the mental system that underlies language behavior (Gregg, 1996, p. 53). A performance theory is a theory of the internal mental processes involved in language behavior. Generative linguistics is often considered to be the study of competence and psychology is considered to be the study of performance, but Chomsky insists that both levels of description are important for understanding human language abilities and both are part of a psychological explanation. Stabler (1984) characterizes Chomsky’s position as follows:

Chomsky . . . argu[es] that linguistics characterizes what is computed, the linguistic “data structures,” while other psychological investigations explain how those linguistic structures are computed. (p. 156)

In terms of the addition analogy, we might consider competence to be the equivalent of 2 + 2 = 4 and performance to be the equivalent of the Basic code.

Some psychologists, however, have claimed that generative grammar adopts a level of description that is too abstract. Valian (1979) summarizes this objection as follows:

[Because] the grammar of a language does not have an automatic performance interpretation, . . . psycholinguists have attempted to specify performance independently of competence. To the extent that they have been successful, they have suggested that the distinction between competence and performance is unnecessary and that competence itself is not a useful notion. (p. 1)

In terms of the computer analogy, it is as though computer scientists who were able to explain how a computer performs addition at the level of machine code went on to claim that a discussion of addition using numbers in base 10 was unnecessary. That claim would be wrong. It is true that base 10 mathematics does not get you far in describing the operation of a specific machine because different computers perform addition in different ways, but all of the ways are based on the same mathematical principles, which are usefully stated (in our culture, at least) in terms of base 10 mathematics. Similarly, many principles of language acquisition, such as Binding Principle B, can be usefully stated at the competence level.

During the 1960s, some psychologists proposed an interesting possibility for narrowing the gap between competence and performance. As we have seen, the Standard Theory generates sentences, but generative grammarians have often pointed out that the term “generate” is intended in the mathematical sense of specifying all and only the grammatical sentences of a language. That is, the rules of the grammar are not supposed to apply in real time to produce sentences but rather to exist all at the same time to specify which word strings are grammatical and which are not. Such a defining function is clearly in the realm of competence. But, these psychologists wondered, what if generative rules were thought of as applying in real time? That is, what if the competence rules (perhaps with some modifications) could work as a performance model of sentence production? If this turned out to be true, the competence/performance distinction would still exist (competence rules would still “define” grammatical sentences while performance rules would model how speakers actually mentally produced sentences), but the gap between the two levels of explanation would be greatly narrowed. For a few heady years, it looked like this hypothesis, called the derivational theory of complexity (DTC), was right.

Generative Grammar and Psycholinguistics

In the 1960s, a number of experiments showed that the amount of time it takes to produce or comprehend certain sentences depends on how many transformations are in their derivations. For example, the passive transformation will change the base sentence “Marsha hit John” into “John was hit by Marsha,” and the negative transformation will change that sentence into “John was not hit by Marsha.” Thus, the negative passive sentence requires more transformations than either the active sentence or the declarative passive sentence.

Miller and McKean (1964) found that subjects took longer to comprehend negative passives than declarative passives, and that declarative passives took longer to comprehend than actives. This finding suggested that to understand a sentence, subjects had to mentally undo the transformations that had been applied to it. Miller and McKean (1964) also found that when subjects were prompted with a declarative active sentence, it was easier for them to produce a declarative passive sentence than a negative passive sentence, suggesting that the negative passive sentence required additional mental processing (in the form of a mental negative transformation) just as the Standard Theory implied. Other studies (Clifton, Kurcz, and Jenkins, 1965; Clifton and Odom, 1966) supported the idea that derivationally complex sentences are more difficult to process than their less complex counterparts, suggesting that, at least in regard to the sentences in question, the Standard Theory competence account of a derivation could also work as a performance model of sentence comprehension and production. Townsend and Bever (2001, pp. 29–30) remark, “The . . . hypothesis of a direct mapping from the structure of linguistic knowledge and language behavior was wildly successful. The golden age had arrived.”
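The prediction that the DTC makes for these three sentences can be stated as a simple count. The sketch below is only a toy restatement of the hypothesis, not a model used in any of the studies cited: the dictionary DERIVATIONS and the function dtc_ranking are names assumed here, and each derivation lists only the two transformations discussed above.

```python
# Transformations applied to the base sentence "Marsha hit John"
# (only the passive and negative transformations are considered).
DERIVATIONS = {
    "Marsha hit John": [],
    "John was hit by Marsha": ["passive"],
    "John was not hit by Marsha": ["passive", "negative"],
}


def dtc_ranking(derivations):
    """Order sentences from easiest to hardest according to the DTC,
    i.e. by the number of transformations in each derivation."""
    return sorted(derivations, key=lambda sentence: len(derivations[sentence]))


for sentence in dtc_ranking(DERIVATIONS):
    print(len(DERIVATIONS[sentence]), sentence)
```

The ranking matches the order of difficulty reported for actives, declarative passives, and negative passives, but it holds only for as long as every transformation is assumed to add processing cost.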

The golden age did not last long, as problems with the DTC were noticed immediately. The researchers just mentioned had not claimed that all transformations made sentences more difficult to process, just the transformations they had studied, but it seemed possible that transformations usually had this effect. However, experiments showed that some transformations seemed to make processing easier, not harder. For example, according to the Standard Theory, the second sentence below is derived from the first sentence by applying the extraposition transformation.

That John left early surprised Marsha.
It surprised Marsha that John left early.

But Fodor and Garrett (1967) showed that subjects understood the extraposed sentence more easily.

A second problem for the DTC was that the Standard Theory was replaced by other generative theories. In the Government and Binding Theory (Chomsky, 1981), for example, all transformations except one disappeared. Another problem was that Government and Binding Theory had no clear interpretation as a processing model (Townsend and Bever, 2001, p. 179), and so the distance between a competence grammar and performance model of sentence production increased. However, the latest version of generative grammar, the Minimalist Program (Chomsky, 1995), is less abstract than Government and Binding Theory and, like the Standard Theory, can be interpreted as part of a production model. As Townsend and Bever (2001) remark, “Perhaps a new ‘derivational theory’ of the psychological operations involved in assigning syntactic derivations is at hand” (p. 179). In chapter 6, we will examine this possibility. But now, having reviewed the linguistic and psychological background of the 1960s and 1970s, we can move on to consider Variation Theory, which emerged at this time.

Variation Theory—A History

Variable Rules

Within a speech community, speakers who belong to different age groups, social classes, ethnic groups, and genders show systematic differences in the way they talk. For example, words ending in -ing, such as running and darling, have an informal pronunciation (runnin’, darlin’) as well as a formal pronunciation. As we will see in chapter 2, studies (Cofer, 1972; Houston, 1985) have found that middle-class speakers and women use the formal variant more often than working-class speakers and men.

Perhaps the most studied example of socially patterned variation involves the deletion of the sounds /t/ and /d/ when they occur in a consonant cluster in word final position, so that the words mist and buzzed are pronounced mis’ and buzz’. Studies (Fasold, 1972; Wolfram, 1969) have found that men delete /t,d/ more often than women, that working-class speakers delete /t,d/ more often than middle-class speakers, and that almost all speakers delete /t,d/ more often when they are speaking informally. Wolfram (1975) found that different rates of /t,d/ deletion correlated with the social class of African American English (AAE) speakers living in Detroit (who can delete /t,d/ from non-clusters, so that did can be pronounced [di]). This pattern is shown in table 1.1, where /t,d/ deletion rates range from 51 percent for upper-middle-class speakers to 84 percent for lower-working-class speakers.

It is remarkable that speakers can learn the frequency at which they should produce variable linguistic forms like in’ and word final /t,d/ in order to sound like other members of their demographic groups. But, it turns out that this task is even more complex than has been suggested so far. The frequency at which a speaker uses variable forms depends not only on the speaker’s demographic characteristics, but also on the linguistic environment in which the form occurs. For example, all speakers more often delete final /t,d/ when the following word starts with a consonant (this makes sense because it is more difficult to pronounce a three-consonant cluster than a two-consonant cluster followed by a vowel). Final /t,d/ deletion is also more likely (in native speaker speech) if the final /t,d/ does not serve as a past tense morpheme. Thus, deletion is most likely in a phrase like “test me,” where t is part of a three-consonant cluster and is not a past tense morpheme, and is least likely in a phrase like “missed over,” where t is not part of a three-consonant cluster and is a past tense morpheme (we will discuss the effect of linguistic environment in more detail below). The effect of both linguistic environment and social class on /t,d/ deletion is shown in table 1.2.

Table 1.1 Deletion of final /t,d/ in Detroit African American English in informal style by social class.

Classes          Deletion rate
Upper middle     .51
Lower middle     .66
Upper working    .79
Lower working    .84

Table 1.2 /t,d/ deletion in Detroit African American English by social class and linguistic environment.

                                                  Social classes
Environments                                      Upper    Lower    Upper    Lower
                                                  middle   middle   working  working
Following vowel:
  /t,d/ is past morpheme (e.g. “missed in”)       .07      .13      .24      .34
  /t,d/ is not past morpheme (e.g. “mist in”)     .28      .43      .65      .72
Following consonant:
  /t,d/ is past morpheme (e.g. “missed by”)       .49      .62      .73      .76
  /t,d/ is not past morpheme (e.g. “mist by”)     .79      .87      .94      .97

But there are still other factors to consider. Many studies have found that the frequency at which a variable feature is used also depends on the circumstances of speaking. A classic example is Labov’s (1966) study of /r/ deletion in New York City. New Yorkers can delete /r/ after a vowel (so that fourth floor is pronounced [foəθ floə]). Labov found that this deletion correlated not only with the linguistic environment and the speaker’s social class but also with the speaking task. He asked subjects to speak in three different circumstances. First, he interviewed the subjects, asking them to provide demographic information (such as how old they were and where they were born) and also to tell stories about childhood fights, times when they were in danger of death, and other topics. Labov found that the speakers tended to delete /r/ more often when they were telling stories than when they were providing demographic information. He therefore distinguished between a casual style, which was used in narrative, and a formal style, which was used in other parts of the interview. Labov suggested that the speakers tended to delete /r/ more in the casual style because they paid less attention to how they sounded, concentrating instead on telling the story. However, in formal style the speakers monitored their speech, trying to avoid stigmatized forms like deleted /r/.

The second speaking situation that Labov (1966) investigated involved reading a passage that contained a number of words from which /r/ could be deleted. He found that in this reading style the speakers deleted /r/ less often than in casual style, a fact that also supports the monitoring hypothesis because, when reading, speakers have more attentional resources to devote to checking for correctness. The third situation that Labov (1966) investigated was reading individual words from a list, a task that allowed the speakers to monitor even more carefully. The speakers first read a list of words containing /r/. In this word list style, they deleted even less than in the reading style. The subjects then read a list of minimal pairs, one member of which contained /r/, such as god—guard. This minimal pair style, which could be most carefully monitored, contained the least amount of /r/ deletion. To summarize, Labov (1966) found that there are no single-style speakers. His subjects modified their natural way of speaking (or vernacular style) in circumstances that allowed them to do so. Thus, speaking style (as defined by the context of speaking) is still another factor that sociolinguists must take into account if they wish to model variation in speech. The relationship between monitoring and speaking style will be discussed more fully in chapter 8.

Researchers who wished to write a grammar that described probabilistic patterns in speech production, such as those found by Labov, faced a basic problem. How could frequency information be included in a Standard Theory grammar? The solution that Labov and his colleagues proposed was to modify the transformational rules of the Standard Theory so that they specified the linguistic factors that affected rule application. At first, this change appeared to be minor. Generative grammar already contained optional rules, like the rule for particle movement mentioned previously, which generated alternative forms. An optional rule for /t,d/ deletion would look like this:


(1)  t,d → (Ø) / C (#) ___ ## {V,C}

Rule (1) says that /t,d/ at the end of a word is optionally deleted when it occurs after a consonant, when followed by either a vowel or a consonant, and regardless of whether it is a separate morpheme. But how could the grammar show that deletion is more likely before a consonant and when /t,d/ is not a morpheme? Labov’s (1969) answer was to propose the variable rule, which specifies the environmental features (called constraints) that favor rule application. Rule (1) can be re-written as a variable rule as follows:

(2)  t,d → Ø / C  ⟨ Ø ⟩  ___ ##  ⟨ C ⟩
                  ⟨ # ⟩          ⟨ V ⟩

Rule (2) says that /t,d/ deletion is optional, but that it is more likely before a consonant (which is indicated by writing the C above the V in the angled brackets), and when there is no morpheme boundary (which is indicated by writing the Ø above the #). Rule (2) generates four environments in which deletion can occur. These can be ordered from strongest to weakest (that is, from the environment that most favors deletion to the environment that least favors deletion) as follows:

Ø,C (no morpheme boundary, following C)
#,C (morpheme boundary, following C)
Ø,V (no morpheme boundary, following V)
#,V (morpheme boundary, following V)

Inspection of table 1.2 reveals that this is the order of the environments that favor /t,d/ deletion in Detroit AAE speech.
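The inspection can be made explicit with a short check. The sketch below copies the deletion rates from table 1.2 and asks, for each social class, whether the observed ordering of the four environments matches the ordering predicted by rule (2). The names RATES, PREDICTED, and observed_order are assumptions introduced for this illustration.

```python
CLASSES = ["upper middle", "lower middle", "upper working", "lower working"]

# Deletion rates from table 1.2, keyed by (morpheme status, following segment).
# "Ø" = /t,d/ is not a past morpheme; "#" = /t,d/ is a past morpheme.
RATES = {
    ("Ø", "C"): [0.79, 0.87, 0.94, 0.97],
    ("#", "C"): [0.49, 0.62, 0.73, 0.76],
    ("Ø", "V"): [0.28, 0.43, 0.65, 0.72],
    ("#", "V"): [0.07, 0.13, 0.24, 0.34],
}

# Ordering predicted by rule (2), strongest environment first.
PREDICTED = [("Ø", "C"), ("#", "C"), ("Ø", "V"), ("#", "V")]


def observed_order(class_index):
    """Environments sorted from highest to lowest deletion rate for one class."""
    return sorted(RATES, key=lambda env: RATES[env][class_index], reverse=True)


for i, social_class in enumerate(CLASSES):
    print(social_class, observed_order(i) == PREDICTED)
```

For these data the predicted ordering holds in every class, although the size of the gaps between environments differs from class to class.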

The Varbrul Program

A computer program called Varbrul (Rand and Sankoff, 1990) was developed as a tool for discovering the constraints on variable rules. The program is similar to other programs used for analyzing variable data in the social sciences, such as ANOVA, but is specially adapted for use with linguistic data. The logic behind a Varbrul analysis is similar to the logic behind other kinds of correlational studies, such as those carried out in agriculture, where scientists might try to correlate various combinations of plant food and fertilizer (the independent variables) with the weight of the tomatoes a plant produces (the dependent variable). Similarly, rule (2) can be thought of as a hypothesis about which linguistic factors correlate with the deletion of /t,d/, where a following consonant and the lack of morpheme status are similar to independent variables and the deletion of /t,d/ is similar to the dependent variable. The function of the Varbrul program, like the function of ANOVA, is to test statistically whether a particular hypothesis is correct. To do this, the analyst supplies the program with data regarding the frequency of /t,d/ deletion in the different linguistic environments, such as the data in table 1.2. The program will then determine whether a proposed feature, such as a following consonant, is correlated with deletion at a statistically significant level, and the program will calculate the relative strengths of the linguistic features that favor deletion. In addition, the program calculates a figure called the input probability, which represents the likelihood that the rule will apply regardless of which constraints are present. The input probability is necessary because variable rule theory assumes that there is some inherent variation in performance that cannot be completely accounted for by all of the known variables.

Preston (1989, p. 17) ran a Varbrul analysis on the data in table 1.2 and got the results shown in table 1.3.

Table 1.3 Varbrul results for /t,d/ deletion by African American English speakers from Detroit (all social classes).

Following vowel        .25
Following consonant    .75
Morpheme               .31
Nonmorpheme            .69
Upper middle class     .29
Lower middle class     .42
Upper working class    .60
Lower working class    .69
Input probability      .60

The decimal figures (called p values) associated with each of the independent variables in table 1.3 indicate how much that factor contributes to the probability of /t,d/ deletion. A p value greater than .5 indicates that a factor promotes deletion, while a p value less than .5 indicates that a factor inhibits deletion. Notice that the Varbrul analysis in table 1.3 shows how a speaker’s socioeconomic class affects deletion, but that this information is not included in variable rule (2). Similarly, if the effects of monitoring were known, this information could be included in a Varbrul analysis, which would assign a high p value to casual style and a low p value to formal style. However, this information is also not included in variable rule (2). Non-linguistic information was excluded from variable rules because, like Standard Theory rules, they were considered to be part of a competence grammar, whereas social patterning and monitoring were considered to be aspects of performance. Thus, a Varbrul analysis was not equivalent to a variable rule because it made no psychological claims. Rather, Varbrul was considered to be only an analytical tool that could help the researcher write a variable rule, which did make psychological claims. As previously mentioned, changing optional rules to variable rules seemed like a minor alteration in generative theory, but in fact it was a fundamental change and caused considerable controversy among both sociolinguists and generative linguists. But before looking at these objections, let us continue with the history of Variation Theory.
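To see how the p values in table 1.3 might combine into a predicted deletion rate, consider the following sketch. It assumes the logistic model commonly associated with later versions of Varbrul, in which the odds of rule application are the product of the odds of the input probability and of each applicable factor weight. The text above does not say which model Preston’s analysis used, so this choice, like the function names odds and predicted_rate, is an assumption made for illustration.

```python
def odds(p):
    """Convert a probability into odds."""
    return p / (1 - p)


def predicted_rate(input_prob, *factor_weights):
    """Combine the input probability and factor weights multiplicatively
    on the odds scale (the logistic model assumed in this sketch)."""
    combined = odds(input_prob)
    for weight in factor_weights:
        combined *= odds(weight)
    return combined / (1 + combined)


INPUT = 0.60  # input probability from table 1.3

# Upper middle class, following vowel, /t,d/ is a past morpheme.
low = predicted_rate(INPUT, 0.25, 0.31, 0.29)

# Lower working class, following consonant, /t,d/ is not a morpheme.
high = predicted_rate(INPUT, 0.75, 0.69, 0.69)

print(f"predicted {low:.2f}, observed .07 in table 1.2")
print(f"predicted {high:.2f}, observed .97 in table 1.2")
```

Under this assumption the predicted rates fall close to the observed rates in the corresponding cells of table 1.2, which illustrates how a small set of factor weights can summarize a much larger table of frequencies.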

The Scope of Variable Rules

The variable rule speech community. Variation analysis, and especially the use of the Varbrul program, requires a large number of tokens of the variable being studied. For example, Wolfram’s (1969) study of Detroit African American speech, upon which table 1.2 is based, involved 12 informants, who produced a total of 377 tokens. Cedergren’s (1973) massive study of syllable final -s deletion in Panamanian Spanish (where dos libros “two books” can be pronounced [dow liybrow]) involved some 79 speakers and over 22,000 tokens. Such large numbers of tokens cannot be gathered from a single informant; therefore, these studies, like most variation studies, combined the data from many informants, and they assumed that the linguistic constraints operated the same way for almost all of the individuals. In many cases, there are good reasons why this should be so. For example, as mentioned, the fact that a following consonant favors deletion of /t,d/ from a consonant cluster in word final position makes sense because the sequence CCC is more difficult to pronounce than the sequence CCV. However, in other cases it is not clear that all speakers in a speech community will share a variable rule with the same linguistic constraints and the same constraint ordering. Nevertheless, the uniform constraints assumption was accepted in early studies without being tested. In fact, several scholars (Cedergren & D. Sankoff, 1974; G. Sankoff, 1974; Wolfram, 1974) tacitly adopted the uniform constraints assumption as one of the features that defined a speech community, a claim that was strongly disputed.

Objections to variable rules. Kay (1978; Kay & McDaniel, 1979) objected to the uniform constraints assumption, noting that numerous studies had, in fact, found that the linguistic constraints on a variable rule were not similarly ordered for all of the demographic groups within a speech community, so a single variable rule could not describe the speech community as a whole. For example, in his study of Martha’s Vineyard, Labov (1972a) looked at three ethnic groups living on the island: Yankees, Portuguese, and Indians. He found that speakers in all of these groups centralized the first vowel of the diphthongs [ay] and [aw], so that “about the house” could be pronounced [əbəwt ðə həws] and “pie in the sky” pronounced [pəy in ðə skəy]. This centralization could be represented by a single variable rule in which [a] is centralized to [ə] before a following glide, and where [y] and [w] are variable constraints, with [y] the stronger of the two. Such a rule would look like (3), which is written using a formalism different from that in (2), but which was common in the variationist literature. It is included here to facilitate the discussion in later chapters. The ranking of the constraints in (2) was indicated by their order within the angled brackets, with the strongest constraint placed highest. In (3) the ranking of the constraints is indicated by Greek letters, where α (alpha) marks the strongest constraint, and β (beta) marks the next strongest constraint.

(3)  a → (ə) / ___  { α y }
                    { β w }

Rule (3) claims that centralization is more likely to occur in [ay] than in [aw]. However, Labov found that, as centralization became more common, speakers in the Portuguese and Indian communities had re-ordered the constraints so that [w] was stronger than [y], thus violating the uniform constraints assumption for the community as a whole.

Romaine (1982) raised a similar objection to the uniform constraints assumption. She pointed out that within larger speech communities there exist separate social networks, whose speech patterns may differ. Milroy (1982) seconded this observation, noting “the requirement that variable rules are stated in terms of generative rules imposes decisions as to the . . . structural description of, a rule—for the whole community [emphasis in original]—when our experience suggests that there may be different variable ‘competences’ within the community” (p. 38). According to Milroy, speech in British cities is more varied than speech in American cities. He characterizes the regular variation observed by American sociolinguists as the “tip of the iceberg” and says that British sociolinguists, looking beneath the waterline in cities like Glasgow, Edinburgh, and Belfast, have observed a lot more irregularity.

The contrast between the British and American patterns can be seen by comparing Labov’s (1966) study of New York City speech with Milroy’s (1982) study of Belfast speech. As we have seen, Labov found that the frequency of /r/ deletion correlated with social class and the amount of attention paid to speech. Middle-class speakers deleted /r/ less than working-class speakers, and speakers of all classes deleted /r/ less in monitored styles. These facts suggest that for New Yorkers of all social classes /r/ deletion is stigmatized and constitute evidence that New York City is a single speech community. The situation is quite different in Belfast. There, working-class speakers variably front low vowels (especially before velars), so that back and cab, which are normally pronounced [bak] and [kab], can be pronounced [bæk] and [kæb], as in American English. For these speakers, however, fronting is stigmatized, as shown by the fact that in their monitored speech the working-class informants front these vowels less frequently. Middle-class speakers, on the other hand, use back vowels for this class of words, in all styles. Complicating the picture still further is the fact that the most prestigious style, British Received Pronunciation, employs fronted vowels, and this style is regularly heard on local newscasts. Clearly, not all Belfast speakers evaluate low front and back vowels in the same way. Therefore, Belfast can be described as a speech community that contains several social networks.


A different kind of objection to variable rules was raised by Derek Bickerton (1971), the eminent creolist. He claimed that they were unlearnable, observing:

If we accept the variable-rule principle, we must also accept that the mind possesses not only the apparatus necessary for framing two quite different types of rule (i.e. standard grammatical rules and variable rules), but also some kind of recognition device to tell the speaker whether to interpret a particular set of data as rule-plus-exceptions or as area-of-variability. When we recall that the data on which non-variable rules are based is often incomplete and heterogeneous, the mode of operation of such a device must seem somewhat mysterious. (p. 460)

Bickerton (1971) went on to point out that the learnability problem posed by variable rules is even more serious:

In order that the average for [a group of speakers that share a variable rule] should remain constant, the variation of the individual must be confined within a relatively narrow range. What keeps his percentage within those limits? And how can it keep within them unless something, somewhere is COUNTING ENVIRONMENTS and keeping a running score of percentages? (p. 461)

Variationists (the first was Anshen, 1975) were quick to point out that Bickerton’s second objection ignores the way probability laws operate. These laws govern, for example, how often various combinations of numbers will turn up on a pair of dice during an evening at a casino, but no one would claim that the dice must keep track of past outcomes. Rather, the probabilities of the different combinations result from how the dice were manufactured (or subsequently altered). By analogy, the probability of deleting /t,d/ in a particular set of circumstances could result from how connections within the brain have been established and possibly altered by learning. Thus, this objection can be dismissed. However, Bickerton’s first objection, that the “mode of operation of such a [mental] device must seem somewhat mysterious,” was certainly valid in 1971.
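The dice analogy can be made concrete with a small simulation. The sketch below assumes a hypothetical speaker who deletes /t,d/ with a fixed probability of 0.4 (an invented figure, not one reported in any study cited here) and treats every token as an independent draw. The function name and sample sizes are likewise illustrative choices.

```python
import random

def simulate_deletion_rates(p_delete, tokens_per_sample, n_samples, seed=0):
    """Return the observed deletion rate for each of n_samples speech samples.

    Every token is an independent draw with probability p_delete, so no
    draw depends on the outcome of any earlier draw.
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(n_samples):
        deletions = sum(rng.random() < p_delete for _ in range(tokens_per_sample))
        rates.append(deletions / tokens_per_sample)
    return rates

# Hypothetical values chosen only for illustration.
rates = simulate_deletion_rates(p_delete=0.4, tokens_per_sample=500, n_samples=20)
print(min(rates), max(rates))
```

With samples of this size the observed rates cluster near the underlying probability, even though nothing in the simulated speaker is counting environments or tracking percentages.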

Bickerton’s (1971) objections to variable rules were repeated by Gregg (1990). The journal Applied Linguistics published a face-off between Gregg, a second language acquisition (SLA) scholar from the generative camp, and Rod Ellis (1990) and Elaine Tarone (1990), SLA scholars from the variationist camp. Gregg’s article was titled “The Variable Competence Model and Why It Isn’t.” Gregg seconded Bickerton’s objection that variation theory does not include a theory of acquisition; that is, that variation theory has no explanation for how speakers can learn the probabilities embedded in variable rules. What is needed to answer Bickerton’s and Gregg’s argument is a theory that explains how speakers can learn and produce probabilistic patterns. Since Bickerton’s (1971) article, such a theory has appeared and has become an important part of theories of language production and comprehension. It is called connectionism, and we will discuss it in some detail in chapter 6.

Replies to objections. Sankoff and Labov (1979) replied to Kay and McDaniel’s (1979) criticisms of variation theory in the same issue of Language in Society, and their reply can serve as an answer to Romaine’s (1982) criticism, as well. Sankoff and Labov (1979) denied that most variation studies had adopted the uniform constraints assumption, pointing out first of all that most studies that postulated a single shared variable rule, such as Cedergren and Sankoff’s (1974) discussion of s spirantization in Panama Spanish, were “rife with disclaimers.” Cedergren and Sankoff (1974, p. 353) had, in fact, stated:

This approach neatly solves the problem of community heterogeneity—perhaps too neatly; care should be taken to detect categorical rule differences where these exist . . . Further statistical methods must be developed in order to judge when small data sets on individual speakers can be aggregated without obscuring categorical distinctions between individual grammars. (quoted in Sankoff and Labov, 1979, p. 203)

Sankoff and Labov (1979) also pointed out that many studies employing variable rules were specifically aimed at discovering which individuals and groups within a community shared which rules and which constraints. For example, in their study of South Harlem teenagers, Labov, Cohen, Robins, and Lewis (1968) constructed a single variable rule for /t,d/ deletion only after analyzing the data from each of their subjects individually. Because these were early days in variation studies, the researchers were surprised to find that their subjects shared the same constraints and the same constraint ordering.

Sankoff and Labov (1979) also denied that variationists believe speech communities are defined only by shared variable rules with uniform constraints, or that speech communities can even be precisely defined. They said, “We know that every speaker is a member of many nested and intersecting speech communities” (p. 202). An example of such intersection is found in the speech of African Americans and European Americans in New York City. Both groups of speakers share a variable rule for copula contraction, allowing “He is a student” to be shortened to “He’s a student.” The linguistic constraints on this rule are similarly ordered for both groups. However, the African American speakers also have a variable rule for copula deletion, allowing “He’s a student” to be further shortened to “He a student.” European American speakers do not share this rule, nor do they share a number of other rules characteristic of AAE.

Fasold (1985) put the sociolinguists’ debate about variable rules in perspective in a review of Romaine’s (1982) book. He observed that Labov and his followers approached the study of language by addressing the same basic question that Chomsky addressed: What is the nature of the speaker’s mental grammar? Labov answered the question, in part, in the same way that Chomsky did: The mental grammar contains (or at least can be modeled by) generative rules. However, Labov went on to ask a second question: How does the speaker’s mental grammar allow for the patterned variation that is found in speech communities? His answer was to modify the optional rules of generative grammar to create variable rules, which could describe the probabilistic patterns observed in speech. Thus, Labovian linguistics, like Chomskian linguistics, is conceived as a psychological enterprise. Sociolinguists like Romaine and Milroy, on the other hand, approached linguistics from a sociological perspective. They looked at communities defined geographically, like Edinburgh and Belfast, and asked how speech varies within these communities. They reported their findings in terms of statistical patterns but not in terms of rules of a mental grammar. In other words, Labov and his colleagues’ top priority was theoretical and explanatory; Romaine and her colleagues’ top priority was social and descriptive.

Preston (2002) sorts out the differences between the psychological and social perspectives in variation studies in a somewhat different way. He distinguishes three levels of sociolinguistic theories. A level I theory concerns itself only with correlations between linguistic forms and social facts. Table 1.1 represents a level I description. A level II theory concerns itself only with correlations between linguistic forms and the linguistic environments in which they occur. Variable rules represent a level II description. A level III theory represents both social and linguistic factors, and also aims to describe the course of linguistic change over time. Chapters 6 and 7 contain level III studies.

The Logical Status of Variable Rules

The attempt to incorporate variable rules into a generative grammar also came in for some strong criticism. One objection, mentioned by Bickerton (1971), Kay and McDaniel (1979), and Gregg (1990), was that variable rules are incompatible with generative grammar because they represent a different logical object. Berdan (1975) put it this way:

If variable rules . . . are to become part of the competence grammar, there needs to be serious rethinking of the way in which language is defined and the definition of the grammar that accounts for language. (p. 22)

At the time variable rules were introduced, generative grammar had two major goals: (1) to construct an algorithm for generating all and only the grammatical sentences of a language, and (2) to discover principles of Universal Grammar that explained how speakers can learn the grammar described by (1). Generative linguists believed that both of these goals could be accomplished by a competence grammar, and a competence grammar did not address questions of how often or under what linguistic and social circumstances a particular rule would be used, as we have seen. Generative research involved the study of types of structures (what are the possibilities for pronouncing the -ing morpheme?). Variation research involved the tabulation of tokens of a structure (how many times does a speaker use in’ versus -ing?). This question was considered to be a matter of performance. Thus, according to Bickerton (1971), Labov was committing a category error by introducing probabilistic description into a generative grammar. Bickerton was right, and Sankoff and Labov (1979) acknowledged that probabilistic grammars had a “different logical status” than categorical grammars, and that “variable rules are rules of production” (p. 202). It remained for variationists to find a performance level theory of production that would allow for probabilistic description. Such theories will be discussed in chapter 6, where probabilistic models of language production, comprehension, and acquisition from the field of psychology are described. The psychologists who have proposed these models do not use the term “variable rule,” and that term has almost disappeared from the vocabulary of sociolinguists as well. Nevertheless, the variable rule is still useful for describing the constraints on alternating linguistic forms, and I will continue to use it in this book. It is understood, however, that the term is a heuristic device that does not make direct psychological claims. But before looking at the psychological underpinnings of variable speech behavior, we will turn to some specific studies of variation in the speech of native speakers and language learners.


2 A Study of Variation in the Native Speaker Speech Community

H. D. ADAMSON AND VERA REGAN

Introduction

In order to get a closer look at a quantitative sociolinguistic study, and to see how the Varbrul program works, we now turn to a small-scale study of variation in the speech of native English speakers living in Philadelphia. This study involves the well-researched variable -ing,1 which can occur in a number of different linguistic contexts. The contexts examined in this study, which include nouns (“darling”), gerunds (“skiing is fun”), adjectives (“an intriguing idea”), and others, are shown in table 2.1. In all of its contexts, -ing allows two pronunciations: the informal [in], sometimes spelled darlin’, skiin’, etc., and the formal [iyŋ]. Hereafter, [in] will be referred to as N and [iyŋ] will be referred to as G.

Table 2.1 Syntactic categories in which -ing occurs.

Categories                 Example

Verbals
  progressive              He’s eating pizza.
  periphrastic future      He’s going to eat pizza.
  VP complement            I like watching rugby.
  WHIZ deletion            The man going home stopped.
  sentential complement    You’ve got to be quick, throwing answers back.
  participle               We go out there fishing.

Modifiers
  adjective                This is a tempting idea.
  complex gerund           I want a swimming pool.

Preposition
  preposition              It was during the summer.

Gerund
  gerund                   I was amazed by Mary’s recovering her wallet.

Nominals
  place name (internal)    Washington is the capital.
  noun                     It’s on the ceiling.
  t-word (only two)        I saw something.
                           I saw nothing.

Sociolinguistic Studies of -ing

Research has consistently shown that the -ing variable is widespread throughout the English-speaking world. In Philadelphia, -ing is a stable sociolinguistic variable, one that has existed in the community for many years and is not in the process of spreading or declining. In this respect, -ing contrasts with other kinds of variable features of Philadelphia speech. These include variables involved in changes in pronunciation that are nearly complete, such as the raising of /æ/ before nasals, so that planet is pronounced [plinət], and new and vigorous changes in pronunciation, such as the fronting of /ow/, so that know is pronounced [nəw]. In the first quantitative study of -ing, Fischer (1958) examined the factors that conditioned the variable. He found that, in the case of schoolchildren, gender and topic affected the proportion of N and G variants. Boys used N more than girls, and casual speech had a higher proportion of N. Anshen (1969) studied -ing in southern black and white speech, and found a number of similar features. In both varieties of speech, men used a higher percentage of N than women, and casual speech contained a higher percentage of N than careful speech. In addition, speakers with less education and less prestigious occupations used more N. African Americans had a higher percentage of N than European Americans. Labov (1966) first demonstrated the social stratification of -ing. In his study of New York City speech, he found a correlation between race and frequent N usage. He also found that southern African American speakers used more N than northern African American speakers.

In his Norwich study, Trudgill (1974) found, in common with the other studies, that men tended to use N more than women, and that casual speech contained higher frequencies of N than careful speech. Trudgill (1974) also found that -ing is a good indicator of social class and that the percentage of N can vary from 0 percent for middle-middle-class (MMC) and lower-middle-class speakers in word list style to 100 percent in lower-working-class speakers in casual style. Stylistic variation was greatest in the case of the upper working class, with a range from 5 to 87 percent. Trudgill (1974) suggested that this is due to “U[pper] W[orking] C[lass] L[ower] M[iddle] C[lass] awareness of the social significance of the linguistic variable because of the border-like nature of their social class position” (p. 100). He observed that -ing differentiated between the five social classes he isolated, but that it particularly marked the distinction between middle-class and working-class speakers. UWC speakers showed the greatest amount of stylistic variation and MMC speakers showed the least. Because -ing pronunciation distinguishes between social classes, speaking styles, and genders, Trudgill (1974) concluded that this phonological variable reflects part of the value system of English speakers, a point that will be taken up in chapter 8.

Perhaps the most extensive study of -ing is reported in Labov (2001a), a study that is of particular relevance to the research reported here because it involved residents of the same Philadelphia neighborhood as the present study. Labov (2001a) found results similar to those of previous researchers. There was considerable gender stratification, with men using higher frequencies of N than women, as well as social class stratification. Like Trudgill (1974), Labov (2001a) found that stylistic variation was not great for lower- and working-class speakers, but increased among the middle social classes. Furthermore, he found that the difference between men’s and women’s N usage was far higher for middle-class speakers than for working-class speakers. Echoing Trudgill’s (1974) theory of the linguistic behavior of the border-like classes, Labov observed, “In general, the second highest social group shows the greatest gender difference and the sharpest [difference in] style shifting” (2001a, p. 272).

In regard to the linguistic constraints on -ing, several studies have found evidence for assimilation to the place of articulation of the following sound. Shuy, Wolfram, and Riley (1968) and Cofer (1972) found that a following velar stop favored G, and a following alveolar stop favored N. Labov (2001a), however, did not find this effect. Houston (1985) found a grammatical effect on the distribution of N and G among British speakers according to the syntactic category of the -ing word. When -ing occurred in nouns or in the pronouns something and nothing, G was used at a high frequency. However, when -ing occurred in verbals, such as the periphrastic future tense or a progressive tense, N occurred more frequently. Houston (1985) was able to arrange the categories in which -ing occurred along a continuum ranging from noun to verb that reflected the frequency at which a category took N. A simplified version of this continuum is shown in (1).

(1) progressives > participles > gerunds > pronouns > proper nouns

The hierarchy in (1) is implicational, making the claim that the frequency of N in any particular category is higher than the frequency of N in all the categories to the right of it. Labov (2001a) found a similar grammatical effect, and set up an implicational hierarchy similar to (1), where verbal elements correlated with higher N usage and nominal elements correlated with higher G usage. To explain this ordering, Houston (1985) claimed that the grammatical categories in which -ing can occur are not discrete, but form a continuum ranging from noun to verb, and that this continuum is the result of a historical merger.

Prior to the fourteenth century, the present participle in English did not take the suffix -ing, but rather -ind.2 The suffix -ing occurred with verbal nouns, such as lufiung (loving), and concrete nouns, such as farthing. However, verbal nouns began to acquire features of verbs, as shown in (2).

(2) Mary’s frequently playing the drums bothered George.

In (2) the gerund playing acts like a noun in that it is part of the subject of bothered, but it acts like a verb in that it takes the adverbial modifier frequently and the object the drums. Gradually, the functional difference between nominals and verbals was blurred, with gerunds occupying the fuzzy area between the two categories. The partial coalescing of the categories verbal noun and verbal led to a blurring in the phonetic distinction between the two categories, which were already very similar; thus, [ind] began to be pronounced [in]. By the end of the fourteenth century, this sound change had progressed to the point that it was reflected in the orthography, so that all the forms in table 2.1 were spelled with -ing. Despite the similarity in spelling, however, nominal and verbal -ing forms continued to be pronounced variably, with nominals favoring G and verbals favoring N, a pattern that has continued up to the present day.

To summarize, all of the studies have found the variable -ing to be sensitive to social and linguistic constraints. On the social level, -ing is sensitive to a speaker’s gender, speaking style, and socioeconomic class. On the linguistic level, -ing is sensitive to the syntactic environment in which it occurs and may be sensitive to the phonological environment, as well. As Houston (1985) concluded, “not only external, social factors influence the realization of -ing in a regular stable way across diverse speech communities, but internal linguistic factors exhibit such stable patterns as well” (p. 50).

Methods of Data Collection and Analysis

The subjects for the present study of -ing were 31 native English speakers, 10 men and 21 women, whose ages ranged from 17 to 90, with a median age of 38. The data were collected by graduate students in a sociolinguistics field methods course at the University of Pennsylvania in 1988. The students tape-recorded interviews conducted using the standard question modules developed at the University of Pennsylvania (Labov, 1984). These modules are intended to control for shifts in formality, topic, and audience by using a standard format in which one or two interviewers ask memorized questions about topics that include “danger of death,” “community services,” “childhood games,” and so on.

The variation of -ing in the speech samples was analyzed using the Varbrul 2 computer program (Cedergren and Sankoff, 1974). In order to use the program, the analyst must first specify the linguistic and extralinguistic factors believed to constrain the variation. Following Cofer (1972) and Houston (1985), the following factors were examined for their possible effect on -ing variation: grammatical category, following phonological environment, speaking style, and gender. These broad factors, called factor groups, were divided into their smaller constituent factors, as shown in table 2.2. For example, the factor group gender contains two factors: men and women. Next, each token of -ing that appeared in the data was coded according to which variant of the dependent variable (N or G) occurred, and which of the independent variables (the proposed factors in table 2.2) co-occurred with it.
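The coding step described above can be pictured as building one record per token. The sketch below is only an illustration of that idea: the `Token` class, its field names, and the four sample tokens are all invented here and are not the Varbrul 2 input format or data from the study.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    variant: str    # dependent variable: "N" or "G"
    category: str   # grammatical category, e.g. "progressive"
    following: str  # following phonological environment, e.g. "apical"
    style: str      # speaking style, e.g. "narrative"
    gender: str     # "men" or "women"

# Invented tokens for illustration; not taken from the study's data.
tokens = [
    Token("N", "progressive", "apical", "narrative", "men"),
    Token("G", "noun", "vowel", "response", "women"),
    Token("N", "progressive", "pause", "casual", "women"),
    Token("G", "gerund", "labial", "language", "men"),
]

def rate_of_n(tokens, factor_group, factor):
    """Proportion of N among tokens whose factor_group field equals factor.

    Returns None when no token carries that factor.
    """
    matching = [t for t in tokens if getattr(t, factor_group) == factor]
    if not matching:
        return None
    return sum(t.variant == "N" for t in matching) / len(matching)

print(rate_of_n(tokens, "gender", "men"))
```

Once every token is coded this way, raw percentages for any factor can be tabulated directly, which is the starting point for the multivariate analysis.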

In regard to the coding of speaking style, it should be noted that the definition and separation of speaking styles has been a continuing controversy in sociolinguistics (see the discussion in chapter 8). As discussed in chapter 1, Labov has defined style in terms of the context of speaking. One such context is reading a list of words; another context is reading a paragraph; a third context is speaking to a researcher during a sociolinguistic interview. Word list style and reading style are, of course, easy to identify. The major problem has been to identify styles within the sociolinguistic interview. Over the years, researchers at the University of Pennsylvania have developed what is called the style tree, shown in figure 2.1 (Labov, 2001b). The tree is a coding device that allows the analyst to determine whether an utterance is likely to be in careful or casual style according to the topic being addressed. The definitions of the different styles are as follows: response = the first sentence in response to a question; language = discourse about language; soapbox = persuasive discourse: when the subject mounts a “hobby horse”; careful = other discourse that the researcher believes to be monitored on the basis of “channel cues” (Labov, 1972a, pp. 79–99), such as a change in tempo, pitch range, volume, or rate of breathing. The definitions of the unmonitored speech styles are as follows: quote = telling what someone else said (note that this style is not represented in the style tree in figure 2.1; if it were, it would be placed above narrative); narrative = recounting a past event; group = when addressing an audience other than just the interviewer(s); kids = discourse about children; tangent = an aside; casual = other discourse that the researcher believes to be unmonitored on the basis of the channel cues described earlier. In addition to these cues, laughter is a cue of casual style. Notice that the upper branches of the style tree represent more objective decisions than the lower branches. It is easy to tell if an utterance is an immediate response to a question, but more difficult to tell if a topic is tangential to the main subject of discussion.

Figure 2.1 Decision tree for stylistic analysis of spontaneous speech in the sociolinguistic interview (from Labov, 2001b). Reprinted with permission.

Having coded all tokens of -ing in the data for the factors in table 2.2, the analyst can run the Varbrul 2 program, which identifies the contribution of each proposed factor to the probability of N being produced. Table 2.2 also displays the results of the Varbrul analysis. As mentioned, the first column in table 2.2 shows the four proposed factor groups: speaking style, gender, grammatical category, and following phonological environment. Column 2 shows the individual factors that make up each factor group. The Varbrul 2 program reflects the claim of Variation Theory that many factors simultaneously influence a speaker’s choice of a particular variant. As we have seen, in previous studies casual style favored N; men favored N; verb-like syntactic categories favored N; and in some studies, a following apical favored N.

The Varbrul 2 program calculates the probability (if any) that each proposed factor contributes to the occurrence of N and displays its finding by attaching a decimal number, or coefficient (p), to each factor. The p values are shown in column 4 of table 2.2. A p value greater than .50 indicates that the factor favors N, whereas a p value less than .50 indicates that the factor disfavors N. The program also provides statistical measures of how well the linguist’s analysis of the data (that is, the proposed factors and factor groups) actually fits the data. One measure is a chi-square per cell figure which, according to Preston (1989, p. 15), should be no higher than 1.5, and preferably below 1.0. The second measure is a stepwise regression analysis that calculates the extent to which each factor group accounts for the variability in the data.
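To see why .50 is the neutral point, it may help to sketch how factor weights could combine into a single predicted probability. The code below assumes the logistic form usually associated with Varbrul 2, in which the odds of N equal the input odds multiplied by the odds of each applicable factor. That formula is an assumption brought in for illustration; the text above does not state it, and the function name is invented. The numbers are the input probability and four factor weights reported in table 2.2.

```python
def combine_weights(input_prob, factor_weights):
    """Combine an input probability with factor weights on the odds scale.

    Assumes the logistic form p/(1-p) = p0/(1-p0) * product(pi/(1-pi)).
    A weight of .50 has odds of 1 and so leaves the prediction unchanged.
    """
    odds = input_prob / (1 - input_prob)
    for w in factor_weights:
        odds *= w / (1 - w)
    return odds / (1 + odds)

# Values from table 2.2: input .39; casual .72; male .77;
# progressive .63; following apical .61.
p = combine_weights(0.39, [0.72, 0.77, 0.63, 0.61])
print(round(p, 2))
```

Under this assumed model, a man using a progressive before an apical in casual style is predicted to produce N far more often than the input probability alone would suggest, because every applicable weight lies above .50.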

In coding the data for a Varbrul 2 analysis, it is a good idea to code as broadly as possible, so that all factors that are suspected to affect the variable are included. This is so because factors for which there are insufficient data can be combined with similar factors during the data analysis. However, if it is suspected that a factor is at work that has not been coded for, the entire corpus must be coded again. Table 2.2 shows that many of the original factors have been combined. For example, in the factor group style there were originally ten factors, four of which represented a careful style, and six of which represented a casual style. In the final analysis, these factors were combined to form only two factors: careful style and casual style.

Table 2.2 Probabilities of N in the Philadelphia native speaker data according to monitoring, gender, grammatical category, and following phonological environment.

Factor group     Factor        Combined factors                                p      %     n

Speaking         Careful       response, language, soapbox, careful          .32     28   228
Style            Casual        quote, narrative, group, kids,                .72     72   231
                               tangent, casual

Speaker’s        Female                                                      .24     20   269
Gender           Male                                                        .77     65   251

Grammatical      Future(a)                                                  1.00    100    20
Category         Progressive                                                 .63     55   209
                 Verbal        participle, verb complement,                  .47     33    96
                               sentence complement, WHIZ deletion
                 Gerund                                                      .46     24    67
                 Modifier      adjective, complex gerund                     .45     43    35
                 Nominal       noun, t-form, internal                        .29     23    78
                 Preposition                                                 .13      2     9

Following        Apical                                                      .61     49   137
Phonological     Labial                                                      .56     53    79
Environment      Back          velar, palatal                                .50     45    22
                 Semivowel                                                   .46     42    45
                 Pause                                                       .43     33    79
                 Vowel                                                       .42     36   151

Note: input probability .39; chi-square per cell .870.
(a) Since N occurs categorically for this factor, future is a “knockout” constraint and, therefore, was not included in the actual Varbrul analysis. It is included here to give a complete picture of the effect of grammatical categories.

The next step in the analysis is to determine whether there are any factors in whose presence N always or never occurs. If so, these knockout factors must be excluded from the input to the Varbrul 2 program because it can handle only variable data. As is often the case, several knockout factors occurred in the initial analysis due to an insufficient number of tokens involving that factor. However, when the factors were conflated in the way shown in table 2.2, all of the knockout factors except future, which showed 100 percent N, disappeared. Thus, although future is included in table 2.2 for convenience, it was not part of the Varbrul 2 analysis.
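The knockout check described above amounts to finding factors whose tokens are all N or all G. A minimal sketch follows; the function name and data layout are invented for illustration. The count for future follows table 2.2 (100 percent N in 20 tokens), the count for progressive is approximated from the table's 55 percent of 209 tokens, and `rare_factor` is entirely hypothetical.

```python
def find_knockouts(counts):
    """Return factors where N occurs in all or none of the tokens.

    counts maps a factor name to (tokens_with_N, total_tokens).
    Factors with no tokens at all are ignored.
    """
    return [name for name, (n_count, total) in counts.items()
            if total > 0 and n_count in (0, total)]

counts = {
    "future": (20, 20),         # categorical N, as in table 2.2
    "progressive": (115, 209),  # approximated from 55 percent of 209
    "rare_factor": (0, 3),      # hypothetical factor with too few tokens
}
print(find_knockouts(counts))
```

A factor such as the hypothetical `rare_factor` would typically be conflated with a similar factor, whereas a factor like future, which stays categorical after conflation, is set aside.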

Results

The chi-square per cell score for the Varbrul 2 analysis is .870, which satisfies Preston’s (1989, p. 15) criterion for a good fit between the theory of which factors constrain the variation represented in table 2.2 and the actual data. The stepwise regression showed that the effects of three of the four factor groups were significant: style, grammatical category, and gender. The effect of following phonological environment was not significant, as was the case in Labov (2001a).

The Effects of Gender and Style

Table 2.2 shows that men and women produced N at very different rates: for women, p = .24 (20%), whereas for men, p = .77 (65%). Before looking at the effect of style, it is necessary to ask whether style shifting had the same effect for both genders. It is possible, for example, that in casual speech women lower their frequency of N, whereas men raise their frequency. If this were the case, the variables gender and style would be said to interact. The Varbrul 2 program assumes that independent variables do not interact, so if an interaction is detected, one of the interacting variables must be eliminated from the Varbrul 2 analysis. Interaction can be checked for by cross-tabulating the factor groups gender and style, as shown in table 2.3, which reveals that style shifting has the same effect for both sexes. In careful style, women produce N at 8 percent versus 42 percent in casual style. Men produce N at 51 percent in careful style versus 85 percent in casual style. Thus, for both genders, switching from careful to casual style results in a higher percentage of N. Since this is the case, gender and style do not interact, so including both factor groups in the Varbrul 2 analysis is justified.

Table 2.3 Frequency of N according to style and gender.

                        Style
            Careful       Casual        Total
Gender      %     n       %     n       %     n

women       8   157      42    85      20   242
men        51   130      85    95      65   225
total      28   287      65   180      58   467
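The cross-tabulation check can be expressed as a comparison of style shifts. The sketch below uses the percentages from table 2.3 and treats "same direction of shift for both genders" as the criterion, which is a simplification adopted here for illustration rather than a formal test of interaction.

```python
# Percentage of N by gender and style, from table 2.3.
pct_n = {
    "women": {"careful": 8, "casual": 42},
    "men": {"careful": 51, "casual": 85},
}

def style_shift(pct_by_style):
    """Change in percentage of N when moving from careful to casual style."""
    return pct_by_style["casual"] - pct_by_style["careful"]

shifts = {gender: style_shift(row) for gender, row in pct_n.items()}

# Rough screen: do both genders shift in the same direction?
same_direction = (all(s > 0 for s in shifts.values())
                  or all(s < 0 for s in shifts.values()))
print(shifts, same_direction)
```

Both groups shift upward by the same number of percentage points, which is the pattern expected when the two factor groups act independently.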


The Effect of the Grammatical Category

As we have seen, Houston (1985) and Labov (2001a) found that the frequency of N was conditioned by the grammatical category to which individual tokens belonged. More noun-like categories favored G, and more verb-like categories favored N. In our study, the data for the Philadelphia native speakers contain only 520 tokens of -ing compared to Houston’s 2,363 tokens; therefore, it is not possible for us to make distinctions in grammatical categories that are as fine as Houston’s. However, the continuum of grammatical categories shown in the third factor group in table 2.2 is similar to Houston’s continuum. Table 2.2 shows that the two verbal categories progressive and periphrastic future are most favorable to N, whereas the nominal category is highly unfavorable. As in Houston’s data, the verbal, gerund, and adjective forms fall in between these two extremes. In the Philadelphia data, however, the forms in between verbal and nominal are not distinguished: they take N at approximately the same frequency. A major difference between the use of -ing by our Philadelphia speakers and Houston’s (1985) British speakers is the pronunciation of prepositions. For prepositions, the British speakers highly favor N (77%), whereas the Philadelphia speakers highly favor G (98%). In sum, our analysis of the Philadelphia data, like Houston’s (1985) analysis of the British data, shows a grammatical effect on the production of -ing.
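The continuum can be checked mechanically against the figures in table 2.2. The sketch below lists the grammatical categories in the order they appear in that table (omitting future, the knockout factor) and tests whether the factor weights and the raw percentages each fall steadily from one row to the next. The helper name is invented for illustration.

```python
# (factor, p, percentage of N) for the grammatical categories in
# table 2.2, in table order; future is omitted as a knockout factor.
rows = [
    ("progressive", 0.63, 55),
    ("verbal", 0.47, 33),
    ("gerund", 0.46, 24),
    ("modifier", 0.45, 43),
    ("nominal", 0.29, 23),
    ("preposition", 0.13, 2),
]

def is_non_increasing(values):
    """True if each value is no larger than the one before it."""
    return all(a >= b for a, b in zip(values, values[1:]))

weights_ordered = is_non_increasing([p for _, p, _ in rows])
percents_ordered = is_non_increasing([pct for _, _, pct in rows])
print(weights_ordered, percents_ordered)
```

The factor weights descend without exception, while the raw percentages do not (modifiers show 43 percent N against 24 percent for gerunds). The weights for verbal, gerund, and modifier differ by only .02, in line with the observation that the intermediate forms are not distinguished.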

The Effect of the Following Phonological Environment

As mentioned, several studies of -ing have reported the effect of regressive assimilation. Shuy, Wolfram, and Riley (1968), Cofer (1972), and Houston (1985) found that a following velar stop favored G and a following apical favored N. There is some tendency for this effect in the present study, as well, as shown in table 2.4, which shows that when the sound following -ing is produced at the front of the mouth, as in the case of labials and apicals, N occurs at the rate of 53 percent and 49 percent, respectively. However, when the following sound is produced at the back of the mouth, as in the case of palatals and velars, N is produced at the rate of only 45 percent. This difference, however, is not large and does not reach significance. Thus, the present study supports Labov (2001a), which found no significant effect of the following phonological environment.

Table 2.4 Percentage of N by following phonological environment.

Following environment      %     n

apical                    49   137
labial                    53    79
velar and palatal         45    22
semi-vowel                42    45
pause                     33    79
vowel                     36   151

Discussion

This analysis of a small sample of Philadelphia speech has revealed a number of patterns that have been found in more extensive sociolinguistic studies, including those of Labov (2001a), which included a much larger number of Philadelphia speakers. One finding is that there are no single-style speakers. All of our subjects produced higher proportions of N in casual speech than they did in careful speech. A second, related, finding is that -ing is evaluated in the same way throughout the speech community, as shown by the fact that all of our subjects, both men and women, used less of the stigmatized form N in careful speech. Labov (2001a, p. 214) notes that uniform evaluation throughout the speech community is characteristic of stable sociolinguistic variables like -ing (but recall the discussion of the variable rule speech community in chapter 1). A third finding is that different social groups set themselves apart from each other by the frequency at which they use sociolinguistic variables. The two groups examined in this study, men and women, used N at very different rates: men at an average of 65 percent and women at an average of 20 percent, a difference of 45 percentage points.

A fourth finding of this study is that speakers’ internal grammars can affect the production of linguistic variables. As we have seen, the frequencies of N and G are affected by the grammatical category of the word to which the -ing suffix is attached. Other studies of sociolinguistic variables have discovered a similar effect. As we saw in chapter 1, Wolfram and Fasold (1974) found that final /t,d/ is less often deleted from consonant clusters when it represents the past tense morpheme, as in missed, than when it does not represent a morpheme, as in mist. This finding supports the intuitive notion that the mental mechanisms for learning grammatical categories and the mechanisms for learning probabilities of variable forms are connected. This topic will be explored further in chapter 6.

In conclusion, this small-scale study has revealed a number of important principles of language variation within a speech community. However, several important factors, such as the effect of age and social class on variation, have not been addressed. These topics, and the important question of how variation relates to language change, are the subject of the next chapter.


3 Language Variation and Change

Introduction

Traditional theories of language have described linguistic forms and systems by abstracting away from the probabilistic nature of speech. In this regard, they contrast with the kind of study in chapter 2. For example, generative grammar might analyze -ing forms by saying that the underlying form is /iŋ/ for all the grammatical contexts listed in table 2.2, and that this form can be optionally changed to /in/ by means of a phonological rule. Thus, the grammar would specify that -ing has two forms and the description would end at that point. But, as we saw in chapter 2, more can be said. By studying natural speech, the probabilistic patterns of -ing usage can be discovered, and the two forms of -ing can be correlated with linguistic and social factors. Such studies show that the linguistic system used in a speech community is more complex than traditional theories of language have implied.

The Philadelphia system of -ing usage is not uniform and neat, but variable and messy. Charles Darwin (1859/1998) pointed out that the messiness of linguistic variation is in some ways comparable to the messiness of variation among species of horses, and that there is an important relationship between variation and evolution in both languages and living things. Natural variation among members of the same species is an essential component of the theory of evolution, and Darwin studied it at length, remarking on such examples as the differences in color among members of the same species of shellfish (they are brighter in southern waters), and the differences in bar markings among members of the same species of horse. Darwin believed that the faint stripes observed on some members of the species were evidence that zebras and horses descended from a common ancestor (1998, p. 208). Darwin also observed variability in language use, remarking, “We see variability in every tongue, and new words are continually cropping up” (quoted in Labov, 2001a, p. 8). He also believed that language change results from a kind of natural selection, endorsing Muller’s (1861) view that:

A struggle for life is constantly going on amongst the words and grammatical forms in each language. The better, the shorter, the easier forms are constantly gaining the upper hand, and they owe their success to their own inherent virtue. (quoted in Labov, 2001a, p. 9)


In this chapter, we will discuss the mechanisms by which variation in language is related to language evolution and change.

A longstanding approach to the relationship of language variability and language change was presented in an article by Weinreich, Labov, and Herzog in 1968. These scholars divided the topic into five component problems: the constraints problem, the transition problem, the evaluation problem, the embedding problem, and the actuation problem. These problems apply to all types of linguistic change, including syntactic change and semantic change, but they have most often been studied in relation to sound change, and our discussion will focus on that area. Parallels will also be drawn to change in the semantics of color category systems, which is discussed at greater length in the appendix. All of the problems identified by Weinreich, Labov, and Herzog (1968) (except the constraints problem, which concerns only ease of articulation) have both a linguistic aspect and a social aspect. We will discuss both aspects of each of the problems in turn after some introductory remarks on the study of sound change.

Measuring Sound Change

Articulatory phonetics and acoustic phonetics are two approaches to the study of speech sounds. Articulatory phonetics analyzes the shape of the vocal tract when different sounds are pronounced, while acoustic phonetics analyzes the sound waves produced by the vocal tract. Modern studies of sound change in progress are carried out by acoustic phoneticians using the sound spectrograph, an instrument that measures the frequencies at which air molecules vibrate during the production of speech sounds. A single vowel (like a single note on a guitar) does not consist of a single frequency of vibration, but rather of many frequencies: a basic frequency and a number of secondary frequencies, which are called harmonics or overtones. When a vowel is pronounced, the sound energy is supplied by the vocal cords, which set the air in the vocal tract to vibrating. The vocal tract (like the sounding board of a guitar) amplifies some of the frequencies of vibrating air and dampens other frequencies. The amplified frequency with the lowest pitch is called the first formant (abbreviated F1). The amplified frequency with the next lowest pitch is called the second formant (F2), and so on. When the shape of the vocal tract is changed, as when it moves from the articulatory gesture for [a] to the articulatory gesture for [æ], different frequencies are amplified and dampened. Thus, the F1 and F2 for [a] are different from the F1 and F2 for [æ], which is why we hear these vowels as different from each other. In analyzing vowels using a spectrograph, all of the vowels can be distinguished by specifying just the values of F1 and F2. Formants are measured in terms of vibrations per second, or hertz (Hz). Here are the formant values for two vowels spoken by a man (from Ladefoged 1975, quoted in Clark and Clark 1977, p. 184):

Vowel    As in    F1        F2
[iy]     heed     280 Hz    2250 Hz
[æ]      had      690 Hz    1660 Hz

In articulatory phonetics differences between vowels are specified mainly in terms of the position of the tongue. For example, for the vowel [iy], the blade of the tongue is high in the front of the mouth; for the vowel [æ], the blade of the tongue is low in the front of the mouth. There is a helpful correspondence between the articulatory system and the acoustic system of classifying vowels. It turns out that higher vowels correspond to lower frequencies of the F1 formant and more fronted vowels correspond to higher frequencies of the F2 formant, as can be seen in the chart above. Thus, the familiar vowel triangle can be represented in acoustic terms as in figure 3.1 (which represents the average formant values of several Philadelphia vowels), with the F1 formant values shown on the y axis and the F2 formant values shown on the x axis.

Figure 3.1 Mean values of all Philadelphia vowels with age coefficients. Circles: mean F1 and F2 values. xxC = vowel followed by an obstruent consonant (“checked”). xxF = vowel not followed by an obstruent consonant (“free”). xxO = vowel followed by a voiceless consonant (from Labov, 2001a). Reprinted with permissions.

In chapter 2, we noted that -ing was a stable variable in Philadelphia speech; that is, the overall frequency of G versus N and the stratification according to speaking style and gender is not changing. For this reason, -ing cannot be used as an indicator of change in progress. For such an indicator, we must turn to the vowel system, in which Labov (2001a, p. 140) has identified no fewer than 15 changes in progress. These can be divided into four different types of change: nearly completed changes, partly completed changes, new and vigorous changes, and incipient changes. Figure 3.2 shows some examples of these ongoing changes in vowel positions. The data in figure 3.2 come from a cross-sectional study of Philadelphians (Labov, 2001a), which included speakers of all ages, ranging from under 20 to over 50. The tails of the arrows in figure 3.2 represent the position of the F1 and F2 formants for speakers 25 years older than the average age of all the informants in the study, and the heads of the arrows represent the formant values for speakers 25 years younger than the average age. As figure 3.2 shows, the younger speakers are usually pronouncing the vowels in question at a higher and more fronted position than the older speakers (though /uw/ and /ow/ are moving in the opposite direction). Our discussion will focus on some of the changes shown in figure 3.2. We will consider these changes in relation to the five problems identified by Weinreich, Labov, and Herzog (1968) involved in understanding how (and perhaps why) languages change.
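The way the arrows in figure 3.2 are built can be sketched in a few lines. The sketch assumes that expected formant values change linearly with speaker age, so that an arrow runs from the value expected 25 years above the mean age (tail) to the value expected 25 years below it (head). The class name, the function names, and all of the numbers are hypothetical; they are not taken from Labov (2001a).

```python
# Sketch of how an age coefficient turns a mean formant value into an arrow.
# ASSUMPTION: formants are a linear function of age, and every number below
# is invented for illustration.

from dataclasses import dataclass

@dataclass
class VowelTrend:
    label: str
    mean_f1: float      # Hz, at the mean age of the sample
    mean_f2: float      # Hz, at the mean age of the sample
    f1_per_year: float  # change in Hz for each additional year of age
    f2_per_year: float

    def expected(self, years_from_mean):
        """Expected (F1, F2) for a speaker this many years older than the mean."""
        return (self.mean_f1 + self.f1_per_year * years_from_mean,
                self.mean_f2 + self.f2_per_year * years_from_mean)

def arrow(trend, span=25):
    """Return (tail, head): older-speaker values, then younger-speaker values."""
    return trend.expected(+span), trend.expected(-span)

def describe(tail, head):
    """Lower F1 means a higher vowel; higher F2 means a fronter vowel."""
    height = "raising" if head[0] < tail[0] else "lowering"
    frontness = "fronting" if head[1] > tail[1] else "backing"
    return f"{height} and {frontness}"

eyc = VowelTrend("eyC", mean_f1=550.0, mean_f2=2000.0,
                 f1_per_year=2.0, f2_per_year=-4.0)
tail, head = arrow(eyc)
print(tail, head, describe(tail, head))
```

With these invented coefficients, older speakers have a higher F1 and a lower F2 than younger speakers, so the arrow points toward a higher and fronter vowel, the direction the text reports for most of the Philadelphia changes.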

Five Problems in Explaining Sound Change

The Constraints Problem

The search for linguistic constraints on sound change has focused on chain shifts. A chain shift occurs when a vowel changes its position in phonetic space (the space shown in figures 3.1 and 3.2), thus forcing other vowels to change their positions in order to maintain contrasts. For example, in the Great English Vowel Shift /iy/ was lowered and centralized to /ay/, and /ey/ was raised to fill the resulting empty phonetic space. The search for linguistic constraints on such changes is parallel to the search for linguistic universals in generative theory, and, as discussed in the appendix, to the search for cognitive universals in the evolution of color category systems.

Figure 3.2 Mean values of all Philadelphia vowels with age coefficients. Circles: mean F1 and F2 values. Heads of arrows: expected values for speakers 25 years younger than the mean. Tails of arrows: expected values for speakers 25 years older than the mean. xxC = vowel followed by an obstruent consonant (“checked”). xxF = vowel not followed by an obstruent consonant (“free”). xxO = vowel followed by a voiceless consonant (from Labov, 2001a). Reprinted with permissions.

Investigations of sound change in progress have discovered several principles of chain shifting. One of the most robust of these principles is: In chain shifts, back vowels move to the front (Labov, 1994, p. 116). Attested in many studies of historical change, this principle makes sense from a physiological standpoint because the asymmetry of the mouth gives more room in the front for phonetic distinctions than in the back. As figure 3.2 shows, in Philadelphia speech many vowels are moving to more fronted positions. But, note that there are exceptions to the fronting principle, as there are to all of the principles of sound change that Labov has found. For example, in Philadelphia speech /uw/ and /ow/ are not moving forward, but rather toward more backed and lower positions. Thus, the principles of language change have the character of tendencies rather than inviolate laws. Labov (2001a) concludes that, although physiological factors have a strong influence on sound change, they interact with and can be overridden by social factors. In his words, “General linguistic principles . . . form the favorable undercurrent, or perhaps prevailing wind, for changes now in progress. Given enough social motivation or contrary linguistic pressures, retrograde movements can be set in motion, just as a boat may tack into the wind” (2001a, p. 499).

The Transition Problem

Linguistic aspects. The transition problem concerns the route by which (or how) a sound change moves from one phonetic position to another. Figure 3.2 illustrates two possibilities. The first possibility is where the change is not conditioned by the linguistic environment, as illustrated by the movement of (aw).1 The arrow associated with (aw) shows that its nucleus is being fronted and raised. Note that there are no environmental constraints on (aw) movement: all words that contain (aw) are affected by this change (though some lexical items, called outliers, may initially be affected more than others). A second possible manner of sound change is where the linguistic environment does affect movement. An example from figure 3.2 is the new and vigorous change of raising [ey] before an obstruent (which is represented by the notation (eyC)) as in plate and raise. This change is not occurring in words where [ey] is not followed by an obstruent, as in say. If this sound change continues, it is possible that [ey] raising will spread to other environments and eventually to all words that contain the [ey] sound.

These two examples of sound change are instances of regular sound change, a change that affects classes of words, not individual lexical items. Another type of sound change is lexical diffusion, a change that progresses on a word-by-word basis. In lexical diffusion some words containing a particular sound in a particular environment are affected while other words are not. One result of lexical diffusion can be seen in figure 3.2. Notice that the figure contains two variables with the phone [æ]: (æh) and (æ). The first of these, (æh), is a tensed vowel with an in-glide represented by [h] (figure 3.2 shows only the value of (æhN), that is (æh) followed by a nasal. The values of (æh) in other environments would be placed near the tail of the (æhN) arrow; see the discussion below). Of the two variables, (æ) is the older sound, so words that are now pronounced as (æh) used to be pronounced as (æ). This (now nearly complete) sound change spread through the Philadelphia community in two stages. First, regular sound change affected all [æ] phones in syllables that ended in a voiceless stop, such as cat and flack. Next, syllables that did not end in a voiceless stop were affected, and in this stage the change spread by means of lexical diffusion. Thus, some syllables not ending in [p,t,k] were tensed, including bad, glad, and the first syllable in planet, but other syllables were not, including sad, glass, and the first syllable in Janet. The word can used as a noun or main verb was tensed, but the word can used as an auxiliary verb was not. Thus, lexical diffusion has produced a split in homonyms, so that can (noun, verb) is tensed /kæhn/ but can (AUX verb) is not /kæn/. In this way a new phoneme, /æh/, has been created in Philadelphia speech. As figure 3.2 shows, this new phoneme is now itself being raised and fronted, especially before nasals, whereas the older phoneme /æ/ is being lowered.

Social aspects. A fundamental question in regard to transition is whether the locus of sound change is individual or generational; that is, do individuals change their internal systems during their lifetimes or do children construct systems that are different from those of their parents? Labov (2001a) concludes that, in general, individual systems remain fairly stable after the age of nine or ten. This is especially true in regard to the underlying phonemic system. Payne (1980) found that adults who moved to the Philadelphia area from out of state could acquire the Philadelphia rules for regular sound change, such as the raising of (eyC) and (aw), which, as we have seen, produce new allophones rather than new phonemes and therefore require learning only one rule that affects all of the words in a class. However, these adults were not able to acquire the tensing rule for changing /æ/ to /æh/, which involves the lexical diffusion mechanism and must be learned, for the most part, one word at a time. Thus, the adults approximated the Philadelphia pattern, but did not completely match it. Payne (1980) found that acquiring the true Philadelphia dialect, with its major innovation of a new phoneme, was possible only for children born to parents from Philadelphia.

The transition problem is intimately connected with the problem of language acquisition. Studies of first language acquisition carried out within the generative paradigm have focused on the acquisition of invariant features, such as the fact that English, as a minus pro-drop language, requires that every sentence have a subject. As discussed in chapter 1, it is claimed that children are born with a Universal Grammar containing the knowledge that the target language will be subject-mandatory (like English) or subject-optional (like Spanish). Given this initial knowledge, the child can set the pro-drop parameter at either plus or minus on the basis of just a few examples, or triggers. But, learning the appropriate use of a variable feature like -ing or (eyC) raising is very different. Here the learning is not categorical but probabilistic: the child must learn the frequencies of the variable that are appropriate to his or her age, gender, and social class. Labov (2001a, p. 419) identifies the mental mechanism of such learning as probability matching, an ability that is observed in many species, which enables an individual to duplicate the probabilistic behavior of other members of the species. We will discuss how probabilities can be learned in regard to language production in chapter 6.
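Probability matching, as described above, can be illustrated with a toy learner that simply tracks how often it hears N in each context and then produces N at the same rate. This is only a sketch under strong assumptions: the class name, the two style contexts, and the community rates are hypothetical, and nothing here is meant as a model of the mechanism Labov proposes.

```python
# Toy probability matcher: the learner's production rate for N in a context
# equals the relative frequency of N it has heard in that context.
# ASSUMPTION: the contexts and the community rates are invented.

import random
from collections import defaultdict

class ProbabilityMatcher:
    """Learner that tracks how often N (vs. G) is heard in each context."""

    def __init__(self):
        self.counts = defaultdict(lambda: {"N": 0, "G": 0})

    def hear(self, context, variant):
        self.counts[context][variant] += 1

    def rate_n(self, context):
        heard = self.counts[context]
        total = heard["N"] + heard["G"]
        # With no evidence yet, fall back on an even split.
        return heard["N"] / total if total else 0.5

    def produce(self, context, rng):
        return "N" if rng.random() < self.rate_n(context) else "G"

COMMUNITY_RATES = {"casual": 0.70, "careful": 0.30}  # hypothetical rates of N

rng = random.Random(0)  # fixed seed so the run is repeatable
learner = ProbabilityMatcher()
for context, rate in COMMUNITY_RATES.items():
    for _ in range(1000):
        learner.hear(context, "N" if rng.random() < rate else "G")

for context in COMMUNITY_RATES:
    produced = [learner.produce(context, rng) for _ in range(1000)]
    print(context, round(learner.rate_n(context), 2),
          round(produced.count("N") / 1000, 2))
```

The learner stores no categorical rule; it reproduces the frequencies in its input, context by context, which is the contrast with parameter setting that the paragraph draws.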

The transition problem in regard to learning stable sociolinguistic variables like -ing is different from the problem in regard to learning variables in the process of change like the raising of (eyC). In the first case, children need only to learn to talk like their parents. But in the second case, children must learn to talk differently from their parents in order to move the change in the direction of the arrows for (eyC) in figure 3.2. That is, they must learn to raise (eyC) to a greater height than their parents do. Let us first consider the simpler case of a stable variable, where children must learn to match the parents’ frequency of variation. Roberts (1993) found that by age three, children matched the adult grammatical constraints on -ing discussed in chapter 2, and that they style shifted, using N more frequently in informal speech. The first finding implies that the mental mechanism for probability matching cannot be independent of the mental mechanism for grammar learning because children must match the probabilities of N usage associated with different grammatical categories. In regard to style shifting, Labov (2001a, p. 418) suggests that at first children do not learn the adult concepts of formal style versus casual style. Rather, they learn a more primitive and more relevant opposition: instructional style versus playful style. Children are likely to hear more G in instructional and disciplining situations, as “when the child is in trouble and/or is being instructed” (Labov, 2001a, p. 420). Conversely, children are likely to hear more N in intimate and friendly situations. At a later stage, children learn that G is associated with formal style and middle-class speech, and that N is associated with casual style and working-class speech (see the broader discussion in chapter 8).

We now turn to the more complex case of the transition problem: How can children learn to talk differently from their caretakers in order to advance a sound change in progress? As an example, let us take the nearly completed change (ohr) raising and backing, as in the word corner. Notice that for this change to increase with each generation, it is not enough for children to match the frequency of their parents, nor even the frequency of slightly older peers. To advance the change, children must understand the direction of the change, and to do that, they must sample the frequency at which their parents raise and back (ohr) and compare it to the frequency used by slightly older peers. Only by noticing that the latter is greater than the former can learners know the direction of the sound change and adjust their own production accordingly. The fact that they do so is convincingly demonstrated by the (ohr) arrow in figure 3.2. Furthermore, it appears that, while very young children can intuit the direction of a change, it takes several years of exposure to age stratification in the speech community for them to internalize all aspects of the change, as the following example illustrates. Recall that in Philadelphia speech /æ/ split into two phonemes /æh/ and /æ/, in part by means of lexical diffusion. In this change, tensing applied categorically in closed syllables, but only to certain words in open syllables (e.g. planet was tensed, but Janet was not). Table 3.1 shows the frequencies of tensing for these words by adults, very young children, and slightly older children. Notice that the children in the youngest age group incorrectly apply tensing in the word Janet at a rate of 65 percent. However, children in the next oldest age group have moved toward the adult norm, tensing Janet at a rate of only 10 percent.
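The comparison described above, in which a learner works out the direction of a change from the gap between parents and slightly older peers, can be written as a small sketch. The function names, the tolerance, the overshoot factor, and the example rates are all hypothetical; the text does not say how far learners go beyond their peers.

```python
# Sketch of inferring the direction of a change from two sampled frequencies.
# ASSUMPTION: the tolerance, the overshoot factor, and the rates are invented.

def infer_direction(parent_rate, peer_rate, tolerance=0.02):
    """Classify a variable by comparing older peers with parents."""
    difference = peer_rate - parent_rate
    if abs(difference) <= tolerance:
        return "stable"
    return "advancing" if difference > 0 else "receding"

def learner_target(parent_rate, peer_rate, overshoot=0.5):
    """Project past the peer rate in the direction peers differ from parents."""
    target = peer_rate + overshoot * (peer_rate - parent_rate)
    return min(1.0, max(0.0, target))  # keep the result a valid proportion

parent_rate = 0.40  # hypothetical rate at which parents raise and back (ohr)
peer_rate = 0.55    # hypothetical rate for slightly older peers

print(infer_direction(parent_rate, peer_rate))
print(round(learner_target(parent_rate, peer_rate), 3))
```

Matching either rate alone would leave the change where it is. Only the difference between the two carries the direction, which is why the learner needs both samples.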

The Embedding Problem

Linguistic aspects. Regardless of how it begins, a new phone must be embedded within an existing system. The linguistic aspect of the embedding problem is usually discussed in functionalist terms: it is believed that the linguistic system arranges itself to preserve the maximum number of oppositions. The Neogrammarians and Saussure claimed that phonemes comprise a system of maximum contrasts, and that change in the value of any individual phoneme necessitates changes in the values of all the other phonemes in the system. This claim is usually true in the long run, as is illustrated in chain shifts. As mentioned, in the Great English Vowel Shift, /iy/ was lowered and centralized to /ay/, and /ey/ was raised to fill the resulting empty phonetic space. Long-term sound change usually results in a reshuffling of phonetic values within the same number of phonemic units, rather than in an increase of phonemes. There is no maximum number of phonemes in a language, but there is a fuzzy upper threshold. This limit, however ill-defined, will discourage the addition of basic units.

Table 3.1 Frequencies of [æ] tensing for two words by age group.

          Adults    Children, ages 3:11–4:11    Children, ages 3:2–3:10
planet    18        96                          90
Janet     0         10                          65

Three concepts, then, are required to explain why equilibrium in phonetic space is disturbed and then re-established in sound change: (1) the combination of social and phonetic factors that activate sound change, (2) an upper threshold of complexity, and (3) the functional pressure to maintain maximum contrast between units. A system in equilibrium can be upset by the phonetic and social factors discussed in this chapter. In the long run, however, the ceiling of complexity and the principle of maximum opposition are acknowledged and equilibrium is re-established, perhaps by reducing complexity elsewhere in the system. Labov (1982) likens this stabilizing process to the force of gravity. We can overcome gravity for a time by jumping or flying in an airplane, but eventually the force reasserts itself and we must return to earth.

Social aspects. Sociolinguists have found that in order to understand how a new sound is embedded within a speech community it is necessary to characterize speakers according to social class, ethnicity, age, gender, and locality. The best-known study of social embedding is Eckert’s (1988, 1999) research at “Belten” High School in Michigan. Eckert identified two main social groups at Belten: jocks and burnouts. The jocks were not just athletes, but students who participated in approved high school activities, including sports, student council, clubs, and so on. These students were generally college-bound, and they accepted the legitimacy of the academic program of courses and grades. The burnouts were generally headed for working-class jobs after high school, and they did not buy into the academic program nor the extra-curricular activities sponsored by the school. The two groups could be distinguished by their dress, the places they hung out, the kinds of music they listened to, and, Eckert discovered, the way they pronounced their vowels. It should be noted that the jock and burnout social categories were not sharply defined, but rather constituted two poles of social orientation to which students were attached in varying degrees. In this sense, group membership at Belten High was like class membership in the larger society, and Labov (2001a, p. 433) suggests that the social continuum from burnout to jock was the high school reflection of class stratification in the community.

Belten High is located in a suburb of Detroit, a city that is participating fully in the Northern Cities Vowel Shift, a chain shift that is ongoing in cities in the Northern U.S. dialect region including Rochester, Cleveland, and Chicago. One aspect of this shift is the backing of the mid-central vowel /ʌ/, so that busses can be misheard as bosses. Eckert found that the degree of /ʌ/ backing correlated with a speaker’s social group and gender, with female burnouts showing the most radical change in vowel position, and male jocks showing the least. The Varbrul weights for /ʌ/ backing at Belten High were as shown in table 3.2.

Notice that in table 3.2 the female burnout category has been divided into two groups: main burnouts and burned-out burnouts. The latter group contains those girls whose social characteristics placed them closest to the burnout pole of the jock–burnout continuum. Eckert found that this group had by far the most advanced backing of /ʌ/. She was also able to distinguish other fine divisions in the social categories at Belten High. Particularly interesting was a subcategory of the jocks containing individuals who had both jock and burnout friends, whom Eckert labeled brokers. Labov (2001a, p. 435) suggests that the brokers form the important link that moved the change in progress from the originating group (the burnouts) to the conservative group (the jocks).
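Readers unfamiliar with Varbrul weights may find it helpful to see how they are conventionally read. In the logistic model that underlies Varbrul as it is usually described, a weight above .5 favors the variant and a weight below .5 disfavors it, and weights combine with an overall input probability on the log-odds scale. The sketch below uses the weights from table 3.2; the input probability of .30 is hypothetical, since none is reported in the text, and the function names are illustrative only.

```python
# Reading factor weights on the log-odds scale.
# ASSUMPTION: the input probability used at the end is invented; only the
# weights come from table 3.2.

import math

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))

def inv_logit(x):
    """Convert log-odds back to a probability."""
    return 1 / (1 + math.exp(-x))

def predicted_rate(input_prob, *factor_weights):
    """Add the log-odds of the input probability and of each factor weight."""
    total = logit(input_prob) + sum(logit(w) for w in factor_weights)
    return inv_logit(total)

WEIGHTS = {
    "Male jocks": 0.32,
    "Female jocks": 0.42,
    "Male burnouts": 0.54,
    "Main female burnouts": 0.47,
    "Burned-out female burnouts": 0.93,
}

favoring = [group for group, weight in WEIGHTS.items() if weight > 0.5]
most_advanced = max(WEIGHTS, key=WEIGHTS.get)
print("favor backing:", favoring)
print("most advanced:", most_advanced)

HYPOTHETICAL_INPUT = 0.30  # not reported in the text
for group, weight in WEIGHTS.items():
    print(group, round(predicted_rate(HYPOTHETICAL_INPUT, weight), 2))
```

When the input probability is .5, its log-odds are zero and the predicted rate for a single factor equals the weight itself. This is one way to see why .5 is the neutral point on the scale.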

Belten High is a reflection of the larger society in several ways. Studies of Philadelphia neighborhoods representing different social classes have shown that women are generally the agents of change. In regard to new and vigorous changes, such as the raising of (eyC), women are a full generation ahead of men (that is, 50-year-old women show the same phonetic values as 30-year-old men). Furthermore, the leaders of change are found in the middle socio-economic classes, particularly the upper working class. These leaders, who are almost entirely women, are the equivalent of Eckert’s brokers because they are respected in their neighborhoods and also have contacts beyond their neighborhoods.

Labov (2001a, pp. 77–78) sums up the embedding of a new linguistic form within the speech community as follows:

a) Linguistic changes originate in an intermediate social group: the upper working class or the lower middle class.

b) Within these groups, the innovators are usually people with the highest local status, who play a central role in community affairs.

c) The study of communication networks shows that the innovators have the highest density of social interaction and also the highest proportion of contacts outside the local neighborhood.

d) For most linguistic changes women are in advance of men, usually to the extent of a generation.

As the discussion of Belten High shows, speech patterns can serve as an emblem of group identity and status. Within the larger community, speech patterns serve as a symbolic claim to local rights and privileges. It appears that new ethnic groups entering a community, such as the Portuguese on Martha’s Vineyard (see chapter 1), or African Americans in urban centers, do not participate in local sound changes until they begin to gain these rights and privileges.

Table 3.2 Varbrul weights for the backing of /ʌ/ in Belten High, for social categories and genders.

Male jocks                    .32
Female jocks                  .42
Male burnouts                 .54
Main female burnouts          .47
Burned-out female burnouts    .93

The Evaluation Problem

Linguistic aspects. The linguistic evaluation of change addresses the question of how a communicative system can change without a loss of communication among speakers. Such loss has only recently been observed. As late as 1982, Labov stated that no study so far had documented a loss of communicative efficiency due to sound change, and that this fact suggested that speakers have an ability to avoid confusion in talking to those with radically different systems. In his words, “A recognition of structural heterogeneity seems to be built into the competence of members of the speech community. . . . The ability to speak and comprehend includes a knowledge of linguistic variation” (1982, p. 80). Labov’s argument here is functionalist. He implies that it is in the interest of a speech community to prevent misunderstandings, and therefore that its members develop the ability to comprehend distinctions that they themselves do not produce. This is no doubt true, or New Yorkers could not understand Philadelphians, Alabamans could not understand Bostonians, etc. However, more recent studies (Labov, 1994) have documented some loss of communicative efficiency among speakers with partially different phonological systems. An example can be found in the Northern Cities Vowel Shift. One aspect of this change is that /æh/ is variably raised to [eh] and [ih]. This causes misunderstandings among speakers of other dialects, even when ample context is supplied. For example, in describing a séance during a sociolinguistic interview, Debbie S., a 13-year-old Chicagoan, said:

a  But then they did this other thing
b  that [ðiht] they would ask the candle a question
c  an’ if it was yes, they would tell the candles to move,
d  and if it was no, they would have [hehv] it stand still.
e  And so, nobody really got scared of that [ðiht]. (Labov, 1994, pp. 188–189)

When this passage was played to speakers of other dialects, the word that in line b was frequently heard as yet, and the word have in line d was frequently heard as hear.

Because linguistic change does sometimes cause miscommunication, the functionalist hypothesis described above may be too strong, and we must ask whether there are other ways in which meaning can be maintained within the speech community. In a weaker hypothesis, Labov (1994) claims that functional forces eventually assert themselves over the course of a linguistic change. When forms that convey a meaning are lost, other ways of conveying the meaning appear. For example, French formerly signaled the feminine plural of nouns with the article las, which was opposed to the feminine singular la just as in present-day Spanish. However, the radical reduction of final consonants in French eliminated the la, las distinction (except when the sound following las was a vowel, in which case the /s/ was preserved by liaison). So, in order to maintain the singular/plural distinction, the vowel of las was changed from /a/ to /e/ by a process not entirely understood, which is reflected in the present spelling, as in les filles (the girls). Labov (1994) now says that when a language changes, its information-carrying capacity can be threatened, and communicative efficiency between speakers can be temporarily lost. But in the long run, languages preserve their means of conveying information.

Social aspects. In order to understand how speakers evaluate their own and others’ speech, it is helpful to distinguish three kinds of sociolinguistic variables, which represent three ways a form can be evaluated by the community. The first kind of variable is the sociolinguistic indicator, a form that is below the level of a speaker’s consciousness and therefore neither noticed nor evaluated. An example is a variant of -ing found in western speech (but very rare in Philadelphia and therefore not included in the study in chapter 2). In some western dialects (including the author’s own), G, the formal variant, is pronounced [iyn] rather than [iyŋ]. Thus, G is distinguished from N by the tenseness of the vowel, not by the nasal. Speakers of other American English dialects do not notice that [iyn] is different from [iyŋ], and they attach no (conscious) stigma to the western form.

Sociolinguistic markers are above the level of consciousness, and are commented on by speakers. In addition, markers show both class and style stratification. An example is the form N in Philadelphia speech. As we saw in chapter 2, working-class speakers use more N than middle-class speakers, and all speakers use less N in monitored speech. The third type of sociolinguistic variable is the stereotype, which is a marker that has become so noticed and commented on that it is associated with a particular speech variety. One example is the /r/-less speech of Boston, where “park your car on the curb” can be pronounced [pæk yə kæ on ðə kəhb]. Another example is the southern “y’all.”

There is a common (though not necessary) progression in the development of the three kinds of variables. As we have seen, linguistic change usually originates among members of the upper working class. Typically, at this early stage a new pronunciation is an indicator, below the level of social awareness. For example, raised (aw) is a new and vigorous sound change that is an indicator in Philadelphia speech. It is also possible to find indicators that are long-entrenched, such as the [iyn] of western dialects that was just discussed. In the next stage, a change reaches the level of social awareness and becomes a marker. At this point, social stratification appears, with the upper classes using the new form less frequently than the middle and lower classes. In this stage or the next, the change may become stabilized and not continue to completion, leaving two variants in alternation. In the final stage, a marker may become a stereotype, commented on by the public and evaluated unfavorably by all classes, as is the case with N. Stereotypes may even become associated with lower-class speech.

The Actuation Problem

The actuation problem is why a change is initiated at a particular time and place. Weinreich, Labov, and Herzog (1968) characterized the actuation problem as the most recalcitrant of the five problems, and this assessment has not changed. Labov (2001a) states, “The beginnings of change are as mysterious as ever” (p. 466). Undoubtedly, there are many reasons for a change to begin at a particular place and time, but one important motivation arises when speakers adopt a new phone as an emblem of identity because they wish to align themselves with a particular social group. One example of such a change occurred after World War II when New Yorkers began to pronounce postvocalic /r/, as in forth and floor, in order to sound more like the majority of Americans. As described in chapter 1, this was an instance of change from above. However, it appears that by far the most common type of change is change from below. An example is the raising and centering of /aw/ on Martha’s Vineyard (Labov, 1972a). During the 1960s, young residents of the Vineyard had to decide whether to stay on the island or to pursue greater educational and economic opportunities elsewhere. Those who chose to stay expressed their identity as Vineyarders by reviving a local pronunciation, centralized /aw/, so that “about the house” is pronounced /əbəwt/ the /həws/. This pronunciation was used by old-timers but had almost died out. Thus, an older pattern was revived when it took on emblematic value.

A third way in which speech patterns can become group emblems, and thus actuate a change, is when the patterns of one group differ from those of a stigmatized group, as can be seen in the trend toward raising the nuclei of (iy) and (ey) in Philadelphia, shown in figure 3.2. The traditional Philadelphia pattern was a lowering and centering of these vowels. Labov (1982) reports that in a sample of 110 speakers, those who were 25 years older than the average age continued the lowering trend, but, as figure 3.2 shows, speakers who were 25 years younger than the average age had reversed the trend and were raising the nuclei of (iy) and (ey). Labov explains this reversal by noting two facts: (1) the raising pattern is typical of other Northern cities, such as New York City, Chicago, and Buffalo, and (2) since World War II a large number of African Americans from the South have moved to Philadelphia, speakers whose lowering of (iy) and (ey) is even more pronounced than that of the older Philadelphians. Thus, Labov suggests that the reversal of the older pattern may be a reaction against the pattern of the incoming Black speakers.


Conclusions

Labov (2001a, p. 437) summarizes how changes are transmitted in American urban communities in the following Principles of Transition.

1. Children begin their language development with the pattern transmitted to them by their female caretakers, and any further changes are built on or added to that pattern.

2. Linguistic variation is transmitted to children as stylistic differentiation on the formal/informal dimension, rather than as social stratification. Children associate formal speech variants with instruction and punishment and informal variants with intimacy and fun.

3. At some stage of socialization, depending on class status, children learn that variants favored in informal speech are associated with lower social status in the wider community.

4. Linguistic changes from below develop first in spontaneous speech at the most informal level. New variants are unconsciously associated with nonconformity to sociolinguistic norms, and are advanced mostly by youth who resist conformity to adult institutional practices.

5. Linguistic changes are further promoted in the larger community by speakers who have earlier in life adopted symbols of nonconformity without taking other actions that lessen their socioeconomic mobility.

This chapter has summarized some of the findings of quantitative sociolinguistic research on the speech of native speakers. In the next chapter we will see how the methods and findings discussed here have been applied to studying the speech of nonnative speakers.


II Variation in Nonnative Speaker Speech

4 The Study of Variation in Interlanguage

Introduction

In part I, we looked at mainstream sociolinguistic studies of how linguistic variables are embedded in the native speaker speech community, and how they are conditioned by linguistic, stylistic, and demographic factors. These studies are mainly sociological, not psychological, in nature (although there have been psychologically oriented studies of first language acquisition carried out within the variationist framework such as Labov and Labov [1976], Payne [1980], and Kovac and Adamson [1981]). In contrast, quantitative sociolinguistic studies of second language acquisition focus on the learning of new linguistic forms and are therefore mainly psychological in nature. Early studies, including Dickerson (1974, 1975), Adamson (1980, 1988), Huebner (1985), and Tarone (1988), looked at alternation between a nonnative form (I no play baseball) and a native form (I don’t play baseball), charting the gradual replacement of the former by the latter. In other words, these studies focused on the acquisition of the categorical rules of the target language. Corder (1981) called this kind of variation, which involves the acquisition of the basic forms of the language, vertical variation. A more recent strand of variationist research in second language acquisition (Adamson and Regan, 1991; Bayley, 1996; Mougeon, Rehner, and Nadasdi, 2004; Regan, 1996; Wolfram, Carter, and Moriello, 2004; Bayley and Regan, 2004) is both psychological and sociological in nature and therefore more closely resembles mainstream variationist studies in the native speaker community. This research examines the alternation between two or more native forms that are socially significant (such as G versus N), investigating whether learners have acquired the frequencies of usage appropriate to their speech community in terms of age, gender, and social class; that is, whether learners have acquired what Bachman (1990) calls sociolinguistic competence.
As we saw in chapter 3, sociolinguistic competence involves learning finely tuned frequencies of production of alternating forms. Corder (1981) called such variation between two native forms horizontal variation. This book is mostly about vertical variation. Chapter 5 provides an example of research in vertical variation, and other studies of vertical variation will be discussed throughout the book. But, in order to introduce the reader to the more recent strand of SLA research on horizontal variation, this chapter reviews several representative studies. To set the stage for this review, we will first briefly examine three early studies of vertical variation, one involving first language acquisition and two involving second language acquisition.

Early Studies of Vertical Variation

Among the first scholars to study vertical variation in a learner’s speech were none other than William and Teresa Labov (1976), who studied the development of wh- questions in the speech of their three-year-old daughter Jessie. They found that Jessie variably inverted the subject and AUX verb in wh- questions, so that questions like “When can we go?” alternated with questions like “When we can go?” However, Jessie did not invert all wh- questions with equal frequency. Rather, the frequency of inversion depended upon the particular wh- word. Thus, individual wh- words could be considered constraints on a variable inversion rule that would look like (1):

(1) WH NP AUX → AUX NP / WH ⟨how, where, what, when⟩

Variable rule (1) says that inversion is most likely after how, next most likely after where, and so on. The rule not only describes Jessie’s performance at a particular stage of development, but also makes a prediction about the future course of development. The prediction is that (1) will “go to completion” (that is, that Jessie will produce categorical inversion) first in how questions, next in where questions, next in what questions, and finally in when questions. In variationist jargon, how will become the first knockout constraint, followed by where, what, and when. Of course, that prediction might not turn out to be true. The ordering of constraints on rule (1) might change, implying a different sequence of acquisition. But the variable rule gives us a way to model Jessie’s language development at a more detailed level than a categorical rule, which would describe subject–verb inversion in wh- questions merely as optional, without specifying where inversion was more likely to occur.
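The prediction that a variable rule makes can be illustrated with a small sketch. The inversion counts below are hypothetical, invented purely for illustration (they are not the Labovs’ data), and the function names are likewise assumptions. The sketch ranks the wh- words by observed inversion rate and reads the predicted order of categorical acquisition off that ranking.

```python
# Hypothetical (inverted, total) counts per wh- word.
# These numbers are invented for illustration; they are NOT from Labov and Labov (1976).
counts = {
    "how": (45, 50),
    "where": (36, 50),
    "what": (25, 50),
    "when": (10, 50),
}


def inversion_rates(counts):
    """Return the proportion of inverted questions for each wh- word."""
    return {wh: inverted / total for wh, (inverted, total) in counts.items()}


def predicted_order(counts):
    """Rank wh- words from most to least favoring of inversion.

    On the variable-rule reading, this ranking is also the predicted order
    in which inversion becomes categorical.
    """
    rates = inversion_rates(counts)
    return sorted(rates, key=rates.get, reverse=True)


print(predicted_order(counts))
```

If later samples showed a different ranking, the same function would yield a different predicted sequence, which is the sense in which the ordering of constraints can change over the course of acquisition.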

The first quantitative study of vertical variation in second language learners’ speech was conducted by L. Dickerson (1974), who looked at how Japanese college students studying in the United States acquired the English phoneme /r/. She found that certain linguistic environments favored accurate /r/ production and that the effect of these environments changed as the subjects gained proficiency in English. A second early study, and the first one to use the Varbrul program to analyze interlanguage data, was Adamson (1980; Adamson and Kovac, 1981), which reanalyzed the data from Schumann’s (1978) influential study of the speech of Alberto, a 33-year-old Spanish-speaking immigrant from Costa Rica. Schumann (1978) had found that Alberto used two main strategies to negate verbs:

no + verb       I no can see.
don’t + verb    He don’t like it.

A Varbrul analysis showed that over the nine months of the study, Alberto’s use of don’t increased and that linguistic factors and speaking style affected whether Alberto would use no or don’t. In a more statistically sophisticated reanalysis of Alberto’s data, Berdan (1996) also found that these factors affected don’t production, although his analysis somewhat modified Adamson’s (1980) and Adamson and Kovac’s (1981) list of linguistic constraints. The most important modification was the discovery that Alberto tended to use several don’ts in a row. That is, if he used don’t in a particular negative sentence, he was more likely to use don’t in the next negative sentence. This fact suggests that when Alberto activated a mental program for negation with don’t, that program stayed partially activated for immediate subsequent use. A similar phenomenon was found by Young (1991, p. 138), whose Chinese-speaking subjects were more likely to mark English nouns as plural if plurality was marked earlier in the noun phrase by a demonstrative pronoun, numeral, or quantifier. For example, plural -s was more likely to appear on the word book in the sentence “He wants several of those books for Christmas” than in the sentence “He wants books for Christmas.”
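One simple way to look for this kind of persistence is to compare how often don’t follows don’t with how often it follows no. The sketch below is only an illustration of that comparison, not Berdan’s (1996) statistical procedure; the sequence of forms is hypothetical and the function name is an assumption.

```python
def persistence(forms, target="don't"):
    """Return (P(target | previous form was target), P(target | previous form was not)).

    `forms` is the sequence of negators in the order the speaker produced them.
    """
    pairs = list(zip(forms, forms[1:]))
    after_target = [cur == target for prev, cur in pairs if prev == target]
    after_other = [cur == target for prev, cur in pairs if prev != target]

    def rate(hits):
        return sum(hits) / len(hits) if hits else float("nan")

    return rate(after_target), rate(after_other)


# Hypothetical sequence of negative sentences, invented for illustration only.
forms = ["no", "no", "don't", "don't", "don't", "no", "no", "don't", "don't", "no"]
after_dont, after_no = persistence(forms)
print(after_dont, after_no)
```

A higher first value than second value would be consistent with the clustering described above, although a real analysis would also need to control for the other linguistic and stylistic constraints.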

Studies of Horizontal Variation

Studies of how second language learners acquire sociolinguistically significant forms have been carried out in both naturalistic settings and classroom settings. In the following pages, we will briefly review studies of both kinds.

Naturalistic Learners

Bayley (1996) looked at both vertical and horizontal variation in the speech of Chinese speakers, focusing on a classic sociolinguistic variable, final /t,d/ production in consonant clusters, as in mist and raised (see the discussion in chapter 1). He found that the surrounding linguistic environment constrained the production of these consonants in basically the same way that it constrained /t,d/ production in the speech of native English speakers (as discussed in Labov, 1994) and in the speech of Vietnamese speakers of English (as discussed in Wolfram, 1985). Bayley (1996) also took into account three factors of particular interest to SLA research: his subjects’ English proficiency, their speaking style, and whether they frequently interacted with native English speakers. The subjects of the study were 20 adult native speakers of Mandarin from both China and Taiwan, mostly students, who had lived in the United States for periods ranging from 2 to 61 months.

As we saw in chapter 1, for speakers of African American English the strongest constraint on /t,d/ deletion is a following consonant, and Bayley (1996) found this to be true in the speech of his subjects, as well. In addition, he found that a preceding obstruent or nasal strongly favored deletion, as did voicing agreement between the sounds preceding and following /t,d/. That is, if the preceding and following sounds were both voiced (as in mild overbite) or both voiceless (as in mist from the sea), deletion was favored, but if the two sounds had dissimilar voicing (as in mist over the river), deletion was disfavored. Bayley also found, as had Wolfram (1985) in his study of Vietnamese English, that the grammatical status of /t,d/ had an effect on tense marking (when /t,d/ signals past tense) opposite to the one it has for native speakers. Recall from chapter 1 that Wolfram (1969) had found that for African Americans in Detroit, when final /t,d/ marked past tense, it was less likely to be deleted than when it did not mark past tense (as in mist). However, Bayley (1996) found exactly the opposite pattern: past tense /t,d/ was more likely to be deleted. This makes sense because adult native speakers of English do not have to learn to mark past tense; that aspect of the grammar has been completely acquired. Therefore, the major cause of deletion for these speakers is the phonological tendency to reduce consonant clusters. However, deleting a past tense marker can eliminate important information, so native speakers resist this tendency. For nonnative speakers, on the other hand, there are two causes of deletion: the phonological tendency to delete consonants from clusters, and imperfect mastery of the past tense. Thus, nonnative speakers who have not mastered the past tense rule have two obstacles to overcome, and therefore are more likely to omit final /t,d/ from a word like fined than from a word like find.

Bayley (1996) also looked at the effect of speaking style. He elicited data in three speaking contexts: a one-hour sociolinguistic interview, a reading passage, and the retelling of a story that the subjects had learned in Chinese. He found the most deletion in casual conversation (the least monitored style), less deletion in story retelling (a more monitored style), and the least deletion in reading (the most monitored style). The relationship between style and monitoring will be discussed more fully in chapter 8.

A particularly interesting feature of Bayley’s study was that he divided the subjects into two groups according to the amount of contact they had with native English speakers. One group belonged to an almost entirely Chinese social network and had little contact with Americans outside of the classroom, while the other group belonged to a social network that included Americans. Bayley (1996) made two comparisons between these groups. The first comparison was which group marked past tense more accurately in irregular verbs. He found that the speakers who had more contacts with native speakers were more accurate. The second comparison was which group deleted /t,d/ more from regular verbs. He found that the group with English social contacts deleted /t,d/ more often, a result that is surprising at first glance. Shouldn’t the accuracy of past tense marking simply be a function of English proficiency? Why would the group that is more accurate with irregular verbs be less accurate with regular verbs? Bayley (1996) pointed out that, while past tense marking of irregular verbs is wholly a developmental process (movement on the vertical continuum), marking with /t,d/ is subject to community speech norms (movement on the horizontal continuum). He concluded that the group that had more American contacts had better acquired the general grammatical rule for past tense marking, and that they applied it to the underlying structure of all past tense verbs. However, this group had also acquired community speech norms to some extent, resulting in appropriate variable deletion of /t,d/.

Wolfram, Carter, and Moriello (2004) studied Spanish-speaking English learners who, like Alberto, had come to the United States largely for economic reasons, intending to establish residence. These researchers focused on two communities in North Carolina: Raleigh and Siler City, where recent immigration from Mexico and Central America has resulted in large Hispanic communities. Wolfram, Carter, and Moriello (2004) note that within these communities Spanish speakers are learning English almost entirely in the American surroundings, and that their language abilities range from monolingual Spanish to fully proficient bilingualism. According to the 2000 census (Wolfram, Carter, and Moriello, 2004, p. 356), Hispanic immigrants to North Carolina show the lowest degree of English proficiency of immigrants to any state, and the children who are born to these immigrants in the United States are Spanish dominant. The research question of this study was whether these immigrants and their children were acquiring features of Southern English or of general American English. Specifically, the researchers looked at the diphthong /ay/, which in Southern Piedmont speech is fronted and unglided before voiced segments, so that time and side are pronounced [ta:m] and [sa:d].

Spanish has phonemic /ay/ in words like bailar (dance) and hay (there is), though in the speech of two monolingual Spanish speakers whom the researchers examined the glide had a longer trajectory and a higher and more fronted endpoint than the English glide. The fact that Spanish has /a/, as in dama (woman), as well as /ay/, suggests that neither the Southern nor the non-Southern pronunciation of /ay/ is favored by transfer from Spanish.

The researchers examined the speech of ten Spanish-speaking residents of Siler City and seven Hispanic residents of Raleigh, mostly adolescents and teenagers with lengths of residence in the United States of between two and seven years. They note that among native English speakers glide reduction in /ay/ is more prominent in rural Siler City than in urban Raleigh, where it is not heard in the speech of many residents who came from northern states. A convenient measure of the difference between Southern and non-Southern /ay/ is the duration of the glide compared to the duration of the entire vocalic segment. In non-Southern English the glide is much longer, accounting for 47 percent of the vocalic segment, whereas in Southern English the glide accounts for only 17.5 percent. Based on the length of the glide in /ay/, Wolfram, Carter, and Moriello (2004) found that the Spanish speakers in rural Siler City had more Southern-sounding diphthongs than the Hispanic speakers in urban Raleigh, concluding that “several Siler City speakers show some accommodation to the local Southern norm, unlike their Raleigh counterparts” (p. 353). As this statement suggests, there was considerable variation among the individual speakers. A particularly telling contrast was that between two siblings, an 11-year-old girl and a 13-year-old boy. Both of these children were the offspring of Mexican immigrants, and both had lived in Piedmont, North Carolina, all of their lives. In sociolinguistic interviews the girl produced only 5.9 percent unglided /ay/, while the boy produced 62.8 percent. Wolfram, Carter, and Moriello (2004) note that the boy identified with the non-Hispanic “jock culture” of adolescent boys, whereas his sister was more oriented toward mainstream American culture. Despite individual exceptions, the participants in this study were not, in general, accommodating to the Southern norm. Wolfram, Carter, and Moriello (2004) suggest that this was largely due to the residential and social segregation of the Hispanic communities, and to the fact that the constant stream of Spanish dominant immigrants reinforced the importance of communication in Spanish.
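The glide measure described above is a simple proportion, and a short sketch may make it concrete. The two norm values are the percentages reported in the text; the durations in the usage lines, the function names, and the nearest-norm comparison are assumptions made for illustration and are not the researchers’ actual procedure.

```python
# Glide share of the vocalic segment, as reported in the text.
NON_SOUTHERN_NORM = 0.47
SOUTHERN_NORM = 0.175


def glide_proportion(glide_ms, segment_ms):
    """Return the glide duration as a proportion of the whole vocalic segment."""
    if segment_ms <= 0:
        raise ValueError("segment duration must be positive")
    return glide_ms / segment_ms


def closer_norm(proportion):
    """Say which of the two reported norms a proportion is nearer to.

    This nearest-norm rule is a crude illustrative device, not a method
    used by Wolfram, Carter, and Moriello (2004).
    """
    to_southern = abs(proportion - SOUTHERN_NORM)
    to_non_southern = abs(proportion - NON_SOUTHERN_NORM)
    return "Southern" if to_southern < to_non_southern else "non-Southern"


# Hypothetical token durations in milliseconds, invented for illustration.
print(closer_norm(glide_proportion(40, 200)))
print(closer_norm(glide_proportion(100, 220)))
```

Expressing the glide as a proportion rather than a raw duration keeps the comparison from being distorted by differences in speaking rate.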

Wolfram, Carter, and Moriello (2004) also found evidence that learners acquire the phonology of the target language, in part, by learning the pronunciation of individual words, a process similar to lexical diffusion (see chapter 3). For example, Martin, a Raleigh resident with excellent proficiency in English, showed much shorter glide paths for the diphthongs in the words five and outside than for the diphthongs in other words, which suggests that these lexical items had been learned individually. Other evidence that pronunciation was learned by lexical items rather than by word classes involved the Southern merger of [i] and [e] before nasals, so that him and hem are homonyms, both pronounced [him]. Wolfram, Carter, and Moriello (2004) found subjects who pronounced only certain lexical items in this word class, such as pen, with the Southern [i].

Formal Learners

In formal settings, there have been more studies of French acquisition than of English acquisition, and we now review two studies that are representative of this research. The first longitudinal variationist study of instructed learners was Regan (1996), which looked at the developing French of seven Irish university students, all of whom had studied French in high school and college and spoke French at an advanced level, and none of whom had lived in France for longer than several months. These students (six women and one man) had been selected to take part in a study abroad program that would allow them to spend a year in France, and Regan took advantage of this fact to study their sociolinguistic competence before and after the immersion in French.

Regan (1996) looked at the variable ne, which occurs in negative expressions such as:

Je ne pourrais pas         “I wasn’t able”
Elle ne travaille plus     “She didn’t work anymore”
Je n’aimais rien           “I didn’t like anything”

In spoken French, it is possible to delete ne in such constructions, especially in informal style. The rates of deletion range from 98 percent in Montreal French (Sankoff and Vincent, 1977) to 44.1 percent in Parisian upper-middle-class French (Ashby 1996).1 In addition, Ashby (1981) found evidence of age grading (young people delete more) and gender stratification (women delete more). Thus, ne appears to be a classic sociolinguistic variable, constrained by demographic and stylistic factors.

Regan (1996) compared her subjects’ rates of ne deletion before and after their year in France to see whether they were accommodating to French sociolinguistic norms. She found that there was a significant change. The Varbrul p value for ne deletion before immersion was .32. After the stay in France this value increased to .67. Regan (1996) also found interesting results regarding the constraints on deletion, including monitored versus unmonitored style. Native French speakers delete ne slightly more in unmonitored style (p = .47) than in monitored style (p = .52) (p. 189). Before immersion, the learners showed a much larger difference in ne deletion in these two styles: p = .63 for unmonitored style versus p = .35 for monitored style. After the stay in France, however, this difference had narrowed considerably and almost matched the rates of the native speakers (p = .57 in unmonitored style versus p = .44 in monitored style). Thus, the subjects had learned to increase the rate of deletion in their monitored style. After the year in France, their revised hypothesis seemed to be: delete less in monitored style, but delete at a high rate in both styles.
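The narrowing of the style difference can be checked with a few lines of arithmetic. The sketch below copies the Varbrul weights exactly as they are printed above and computes only the size of the gap between the two styles for each group, ignoring its direction; the dictionary layout and function name are assumptions made for illustration.

```python
# Varbrul factor weights for ne deletion by style, copied as printed in the text.
STYLE_WEIGHTS = {
    "native speakers": {"unmonitored": 0.47, "monitored": 0.52},
    "learners before France": {"unmonitored": 0.63, "monitored": 0.35},
    "learners after France": {"unmonitored": 0.57, "monitored": 0.44},
}


def style_gap(weights):
    """Return the absolute distance between the two style weights (direction ignored)."""
    return round(abs(weights["unmonitored"] - weights["monitored"]), 2)


gaps = {group: style_gap(weights) for group, weights in STYLE_WEIGHTS.items()}
for group, gap in gaps.items():
    print(f"{group}: {gap:.2f}")
```

On these figures the learners’ gap shrinks from .28 to .13 over the year, against a native-speaker gap of .05, which is the convergence the paragraph describes.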

Regan also found significant effects for phonological constraints (place and manner of the following segment), and morphological/syntactic constraints (presence of object clitic, pronoun subject, and clause type). A particularly interesting finding involved whether ne was deleted from formulaic phrases, such as je ne sais pas (I don’t know). Deletion was strongly favored in such phrases, both before the stay in France (p = .74) and after the stay (p = .80). Both of these p values are higher than the values for native speakers (p = .63). Thus, the learners overgeneralized ne deletion in these phrases. Such overgeneralization has also been observed in native speaker speech. In his study of /r/ in New York City (see chapter 1 and chapter 8), Labov (1966) found that the second highest social class overgeneralized /r/ pronunciation in highly monitored styles. In their attempt to match the style of upper-class New Yorkers, these speakers overshot the mark. Regan (1996, p. 191) notes that [seypa] is a stereotype for je ne sais pas (I don’t know), and [seypa] is a stereotype for ce n’est pas (that’s not so). She speculates that the learners may overgeneralize these easily produced stereotypes in an effort to accommodate to the overall ne deletion they encountered in France. Such learning of phrases is a form of lexical learning (the phrase is mentally stored as a long word) and has been noticed by other scholars including Wolfram (1985) and Adamson and Regan (1991).

Howard, Lemee, and Regan (2006) investigated the French acquisition of 19 university students in Ireland who were native speakers of English studying French as a foreign language. The researchers elicited their data by means of a sociolinguistic interview. All of the subjects had studied French for five or six years before university and for three years at university. In addition, 15 of the subjects had spent a year in France. The researchers focused on the sociolinguistic variable /l/, which can be deleted in words such as il (he) and elle (she). Several studies have identified linguistic constraints on /l/ deletion in the speech of native French speakers. Poplack and Walker (1986), for example, found that Quebec French speakers deleted /l/ more frequently in the third person pronouns, with rates of deletion ranging from 33 percent for elle to 100 percent for impersonal il. Ashby (1981) found that continental French speakers in Tours deleted at rates ranging from 63 percent for elle to 88 percent for impersonal il. Rates of deletion have been found to be lower for object pronouns and definite articles. Howard, Lemee, and Regan (2006) looked to see if their students deleted /l/ in frequencies similar to those of native speakers and, if they did, whether their /l/ deletion was constrained by the linguistic environment, the context of elicitation, the speaker’s sex, and the degree of contact with native French speakers.

The researchers found that the degree of contact with native speakers had a strong influence on the percentage of /l/ deletion. The students who had never been to France deleted /l/ at only 6 percent, while those who had studied in France for a year deleted it at 30 percent, which, it should be noted, is still far less than native speakers. A number of the linguistic constraints on /l/ deletion appeared to pattern like the constraints in native speaker speech. As in native French, impersonal il strongly favored deletion (at 45 percent), whereas elle was the least favoring pronoun (at 6 percent). Howard, Lemee, and Regan (2006) also found gender stratification, similar to that in native French, with females deleting /l/ twice as often as males (45 to 22 percent). However, there was no significant difference in formal versus informal speaking styles. These results are similar to those for the Japanese speakers studied by Major (2004) (see chapter 8) in that the subjects had learned gender stratification but not style stratification. Thus, as Major (2004) suggests, it may be that gender stratification is learned before style stratification in second language acquisition.

Mougeon, Rehner, and Nadasdi (2004) discussed several previous studies (Mougeon, Nadasdi, and Rehner, 2002; Rehner, Mougeon, and Nadasdi, 2003) that examined the spoken French of 41 adolescent native English speakers who were enrolled in a French immersion program in Toronto. These programs feature 50 percent French medium instruction in grades 5 through 8, followed by 20 percent French medium instruction in high school.2 The subjects of the study were equally divided among high, intermediate, and low proficiency levels in French and, as questionnaires showed, had only marginal exposure to French outside the classroom.

The researchers looked to see how these students used 13 sociolinguistic variables like ne deletion in their speech and then compared the frequencies of the students’ usage to the frequencies of these variables in the speech of native Quebec French speakers, the speech of the students’ French instructors, and the textbooks that the students were using. We will discuss only three of the variables examined in this study, but they are representative of the overall results. The researchers divided the 13 linguistic forms they studied into three groups depending on their degree of formality or stigmatization: vernacular forms, mildly marked forms, and formal forms. Vernacular forms do not conform to the rules of standard French and are socially stigmatized. An example is m’as + infinitive, as in M’as aller à l’école (I’m going to school). Mildly marked forms are used in informal contexts. One example is ne deletion, as discussed earlier. Another example is the construction je vas, as in je vas à l’école (I’m going to school). Formal forms conform to the rules of standard French and are used in writing and formal speaking contexts. An example is the form je vais, as in je vais à l’école (I’m going to school).

The results for these forms, which are representative of the overall results, are found in table 4.1, where it can be seen that the texts, teachers, and students overwhelmingly used the formal forms.

Table 4.1 Percentage of forms for L1 speakers of Quebec French, French language arts materials, French immersion teachers, French immersion students.

Linguistic variables        L1 Quebec   Materials:   Materials:   Immersion   Immersion
                            French      texts        dialogues    teachers    students
M’as + inf. (vernacular)    28          0            0            0           0
Je vas + inf. (marked)      60          0            0            1           10
Ne use (marked)             1           99.9         97           71          70
Je vais + inf. (formal)     12          100          100          99          90

Note: Very small percentages of forms unique to the immersion students’ interlanguage have been omitted.

Based on all of the data, the authors reached these six conclusions:

1. The immersion students never use vernacular forms or use them only marginally.

2. The immersion students use mildly marked forms considerably less frequently than native speakers of Quebec French.

3. The immersion students use formal forms considerably more frequently than native speakers of Quebec French.

4. Female students use some of the formal forms more frequently than male students, and middle-class students use some of the formal forms more frequently than upper-working-class students. Similar correlations are found in native speaker communities.

5. Students who had contacts with native speakers outside of school display a better mastery of mildly marked forms (a finding similar to that of Regan [1996]).

6. There is a correlation between the frequency of forms in the educational input and students’ speech (as is obvious in table 4.1). The researchers note that “the teachers and, to an even greater extent, the pedagogical materials make no or only marginal use of vernacular variants [and] use mildly marked variants only infrequently . . . by and large we saw . . . these patterns . . . inflated in the students’ own patterns of variant use” (p. 426).

Perhaps the most obvious finding of Mougeon, Rehner, and Nadasdi (2004) is that, in most cases, the frequency of variable forms in the input was reflected in the frequency of these forms in the students’ output. For example, the learners were exposed to variable ne production in the speech of their instructors at the rate of 71 percent, and they produced ne in their own speech at the rate of 70 percent (see table 4.1). As expected, forms that the learners were not exposed to, such as the Quebec vernacular form m’as + infinitive (though used 28 percent of the time by native Quebec French speakers), were never produced by the learners. An exception to the input/output match involved the marked form je vas + infinitive. In this case, the learners had only the slightest exposure to the form: 1 percent in the speech of their instructors. Nevertheless, they produced the form at 10 percent in their own speech. This overgeneralization of frequency was also found in Regan’s (1996) study in regard to stereotyped forms like je sais pas. Recall that Regan speculated that je sais pas may be a frozen form, which students use at high frequency in an attempt to make up for their overall overuse of ne. This explanation presupposes that the learners desired to use informal forms in order to sound appropriately casual. It may be that the same motivation applies to the overgeneralization of je vas + infinitive. That is, because the students wished to sound informal, they overgeneralized a marked, frozen form.
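The input/output comparison can be made concrete with a few lines of Python. The sketch below simply transcribes the percentages from table 4.1 and subtracts the teachers' rate from the students' rate for each variant; the dictionary keys and the function name are labels chosen here for illustration and do not come from Mougeon, Rehner, and Nadasdi (2004).

```python
# Percentages transcribed from table 4.1 (one row per variant).
TABLE_4_1 = {
    "m'as + inf. (vernacular)": {"l1_quebec": 28, "texts": 0, "dialogues": 0, "teachers": 0, "students": 0},
    "je vas + inf. (marked)": {"l1_quebec": 60, "texts": 0, "dialogues": 0, "teachers": 1, "students": 10},
    "ne use (marked)": {"l1_quebec": 1, "texts": 99.9, "dialogues": 97, "teachers": 71, "students": 70},
    "je vais + inf. (formal)": {"l1_quebec": 12, "texts": 100, "dialogues": 100, "teachers": 99, "students": 90},
}


def input_output_gap(row):
    """Students' percentage minus teachers' percentage, in percentage points."""
    return row["students"] - row["teachers"]


gaps = {variant: input_output_gap(row) for variant, row in TABLE_4_1.items()}
for variant, gap in gaps.items():
    print(f"{variant:28s} students - teachers = {gap:+d} points")
```

A gap near zero indicates that the students' rate tracks the teachers' rate; the je vas row is the one case in which the students' rate clearly exceeds their classroom input.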


Conclusions

The studies reviewed in this chapter found similarities between variation in the interlanguage of both formal and naturalistic learners and variation in native speaker speech. While the interlanguage spoken by the participants was far from native-like, it varied along some of the same dimensions as native speech. For example, both interlanguage variation and native language variation can be constrained by universal articulatory processes. Bayley (1996) found that a following consonant favored final /t,d/ deletion by his Chinese-speaking subjects, just as this environment favors /t,d/ deletion in the speech of native English speakers. This constraint was also found by Wolfram (1985) in the speech of his Vietnamese-speaking subjects. Also, speaking style and topic can constrain interlanguage variation in ways similar to native speaker speech. Bayley (1996) and Regan (1996) found that learners produced more native-like speech in circumstances that encouraged more attention to speech, or monitoring. In addition, all of the researchers found that learners were able, to some extent, to internalize constraints on the variation that they encountered in input. At the most basic level, classroom learners approximated the percentages of variable forms in the speech of their instructors and in their textbooks. Similarly, Regan’s (1996) and Howard, Lemee, and Regan’s (2006) subjects altered their rates of variation after being exposed to different frequencies of input during a year in France. At a finer level, the researchers discovered that learners were able to internalize “unnatural” constraints (that is, constraints not motivated by universal articulatory processes) that were present in input. For example, Howard, Lemee, and Regan (2006) found that, like native speakers, French learners deleted /l/ more frequently in impersonal il than in elle. At a still finer level, Howard, Lemee, and Regan (2006) found that female learners matched the pattern of /l/ deletion of female native speakers, and male learners matched the pattern of male native speakers. This finding is consistent with Major’s (2004) suggestion that language learners acquire gender stratification before they acquire style stratification, as discussed in chapter 8. It should also be noted that Wolfram, Carter, and Moriello (2004) found some dissimilarity in learner and native speaker variation. Some of their Spanish-speaking subjects used longer glides in the words five and outside than in other words containing /ay/, a fact that suggests that these two words were learned individually. Similar findings have been reported in other studies of second language acquisition, as also discussed in chapter 8.


5 The Acquisition of English Irregular Past Tense by Chinese-speaking Children

LARRY BERLIN AND H. D. ADAMSON

Introduction

In this chapter, we deal with the matter of vertical variation as it applies to a group of Chinese-speaking children in their acquisition of an obligatory grammatical feature. This type of work is consistent with previous studies (Bayley, 1996; Wolfram, 1985; Young, 1991) and adds to the work amassed in an effort to understand the developmental process in acquiring grammatical features of a second language. Our longitudinal study examined irregular past tense marking using the Varbrul multivariate analysis program. The power of Varbrul lies in its ability to examine many possible combinations of constraints on past tense marking in order to determine which combination best accounts for the variation observed in our data.

Constraints on Past Tense Marking

Phonological Constraints

To begin, we briefly review studies by Wolfram (1985) and Bayley (1994) of the phonological constraints on second language learners’ acquisition of English past tense mentioned in chapter 4. Wolfram (1985) studied Vietnamese adolescents and adults living in the United States and found that their past tense marking of regular verbs was constrained by the surrounding phonological environment in much the same way as it is constrained in native speakers’ speech. That is, regular past tense /t,d/ is less likely to be produced when followed by a consonant than when followed by a vowel because the articulation of consonant clusters is more difficult than a consonant–vowel sequence. Bayley’s (1994) study of Chinese adults living in the United States yielded the same conclusion. In a more recent study Hansen (2001) used Varbrul to determine accuracy orders and production modifications in three Chinese female learners’ production of English regular past tense over a six-month period, focusing on the consonant clusters that are created by the addition of a past tense morpheme. She found that “Chinese learners of English employ different production strategies based on the length of the coda [that is, the syllable final consonant(s) that follows the vowel]” (p. 362), but that saliency proved a dominant criterion with various (and at times multiple) linguistic constraints as well as “natural phonological processes” contributing to the learners’ acquisition.

In regard to irregular past tense verbs, Wolfram (1985) and Bayley (1994) found that the underlying verb class was most likely to indicate which verbs would be marked for past. The four irregular verb classes these researchers identified were suppletives, doubly marked verbs, verbs that undergo an internal vowel change, and verbs with replacive consonants. Suppletive verbs have idiosyncratic past tense forms (as in go → went). Doubly marked verbs undergo two phonological changes between the base or present tense form and the past tense form, typically that of an internal vowel change and either the addition of the past tense morpheme (as in do → did; tell → told) or the replacement of the final consonant with the past tense morpheme (as in bring → brought and teach → taught). Internal vowel change verbs undergo a change in the nucleus (vowel) of the syllable rather than in the coda (as in know → knew and lie → lay). Replacive consonant change involves changing a final consonant. This kind of change is more salient when it involves a different place of articulation (as in have → had and make → made) and less salient when it involves only voicing (as in send → sent and lend → lent).

The phonological studies conducted by Wolfram (1985) and Bayley (1994) revealed that irregular verbs whose past forms were maximally different from their present forms, such as suppletives (e.g., go → went; am/are/is → was/were), were more likely to be marked for past tense than irregular verbs with relatively similar past and present forms (e.g., come → came). These findings led Wolfram (1985) to propose the principle of saliency, suggesting that second language learners will first acquire the past tense of verbs that show maximal difference between the base or present tense form and past tense form. The resultant order for English irregular past tense, then, would be expressed as follows:

suppletive > doubly marked > internal vowel change > replacive consonant

Both researchers found that the principle of saliency roughly held true for both low and high proficiency subjects, though Wolfram (1985) found there was more individual variation among the low proficiency subjects.

Semantic Constraints

Vendler (1967) proposed a classification for verbs according to their semantic type. He suggested that there are essentially four types of verbs based on their inherent meaning; these are states, activities, accomplishments, and achievements. Stative verbs have no dynamic nature, and thus maintain the same relative quality throughout their duration. Examples include sensory, cognitive, emotive, and possessive verbs, such as see, remember, love, and have.

Activity verbs are those which have some duration and are therefore not punctual (i.e., instantaneous). The activity described is homogeneous throughout its duration, or it has the same quality at every moment it is occurring. Another feature of activities is that they have an arbitrary endpoint rather than an inherent one; in other words they are atelic, as opposed to telic verbs, which have a clear endpoint (see below). Examples of activities include run, sing, and play.

Accomplishments, like run a mile and build a house, are like activities in that they are not punctual, but unlike activities in that they are telic; that is, they have a clear endpoint or they result in a product. Usually, verbs that express accomplishments are transitive, whereas verbs that express activities are intransitive.

The final verb type identified by Vendler (1967) is achievements. Achievements differ from the other three verb types because they are punctual. That is, at the moment they occur, the action of the verb is simultaneously completed. Consequently, they also have an inherent endpoint and are therefore also telic. Examples of this verb type are start and finish. Although they can take the progressive form, these two actions are inherently instantaneous because, once an activity is started, the actors are engaged in the activity, not in the starting. Other examples include reach the summit, die, and recognize.

Andersen (1993; Shirai and Andersen, 1995) studied semantic constraints on the acquisition of English past tense by native-speaking children. Using Vendler’s (1967) original classifications, he proposed a reclassification using only three semantic features: punctual, telic, and dynamic (see table 5.1 for a comparison of the two classification schemes). As a result of this reclassification, Andersen (1993; Shirai and Andersen, 1995) posited that telicity is the most relevant feature in past tense marking, especially for novice learners, because events with an inherent endpoint can more naturally be construed as being completed than verbs without a clear endpoint. For the purpose of our study, we will classify the verbs only as telic or atelic.

Table 5.1 Three semantic features associated with Vendler’s (1967) four semantic categories of verbs.

            State    Activity    Accomplishment    Achievement
Punctual    −        −           −                 +
Telic       −        −           +                 +
Dynamic     −        +           +                 +
Example     see      run         run a mile        die
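The feature matrix in table 5.1 amounts to a lookup from three binary features to one of Vendler's four categories. The Python sketch below encodes that lookup; the class name, the function names, and the small example lexicon are illustrative assumptions, with the four example predicates taken from the table.

```python
from typing import NamedTuple


class Features(NamedTuple):
    punctual: bool
    telic: bool
    dynamic: bool


# One entry per column of table 5.1.
VENDLER_CATEGORY = {
    Features(punctual=False, telic=False, dynamic=False): "state",
    Features(punctual=False, telic=False, dynamic=True): "activity",
    Features(punctual=False, telic=True, dynamic=True): "accomplishment",
    Features(punctual=True, telic=True, dynamic=True): "achievement",
}

# Example predicates from the bottom row of table 5.1.
EXAMPLES = {
    "see": Features(False, False, False),
    "run": Features(False, False, True),
    "run a mile": Features(False, True, True),
    "die": Features(True, True, True),
}


def classify(features):
    """Return the Vendler category for a feature bundle."""
    return VENDLER_CATEGORY[features]


def is_telic(predicate):
    """Collapse the four categories into the telic/atelic contrast."""
    return EXAMPLES[predicate].telic


for predicate, features in EXAMPLES.items():
    label = "telic" if is_telic(predicate) else "atelic"
    print(f"{predicate}: {classify(features)} ({label})")
```

Only the telic feature is needed for a two-way telic/atelic coding, which groups accomplishments with achievements and states with activities.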


Discourse Constraints

Pragmatic work in narrative discourse structure (Berman and Slobin, 1994) has identified two prominent types of clauses found in narratives: background and foreground clauses. Background clauses are those which set the scene and provide exposition; in contrast, foreground clauses are those which advance the action of the story as it unfolds. Examples from our own data include the following.

(1) [But she draw the, eh little man on the TV]   ← foreground →
    [and she wasn’t listening.]   ← background →

(2) [And then the mole was very scared, and scared the lion would eat him.]   ← background →
    [But the lion only picked him up and pat him]   ← foreground →

(3) [Mother was watching the eggs.]   ← background →
    [And one of the eggs cracked.]   ← foreground →
    [And she was really happy.]   ← background →
    [Then the two cracked.]   ← foreground →

As these examples show, the past tense can be used in both background and foreground clauses. However, the present tense can also be used in background clauses (as in evaluative statements like “I hate when that happens”) and in foreground clauses in the historical present tense. Thus, while the exclusive use of past tense is not requisite in narratives, its use is clearly predominant (see also Labov, 1972b; Schiffrin, 1981; and Wolfson, 1982, for a discussion of the historical present in foreground clauses).

In a study of second language learners’ acquisition of English past tense, Bardovi-Harlig (1995) examined the narratives produced by 37 subjects from various language backgrounds. The study used a cross-sectional design where the subjects represented different levels of proficiency. The primary research question was whether past tense would be used first in background clauses or in foreground clauses. The subjects were asked to view a short silent film and then to retell the story both verbally and in writing. The analysis revealed that past tense marking occurred more frequently in foreground clauses than in background clauses, regardless of the subjects’ proficiency level, leading Bardovi-Harlig (1995) to conclude that “learners mark foreground events for past first and use a variety of forms in the background” (p. 286).

In a cross-linguistic study of narrative development, Aksu-Koç and von Stutterheim (1994) concluded that children under five years old from various language backgrounds tend to express early narratives as a simple sequence of equally weighted events, typically the foregrounded action, gradually adding background as they begin to complexify the grammatical structure into more elaborate, hierarchical discourse structures. Moreover, an examination of the developmental stages of discourse competence acquisition of English, German, Hebrew, Spanish, and Turkish speakers from childhood to adulthood indicated that “switches [in verb forms] found in the 3-year-olds’ stories are aspectual rather than temporal [with] English-speaking preschoolers [exhibiting] difficulty in adhering to an anchor tense” (p. 452).

Variation in Chinese-speaking Children’s Marking of English Irregular Past Tense

In our study, we wanted to learn which phonological, semantic, and/or pragmatic factors constrained the acquisition of the English irregular past tense by our subjects. The transcripts we examined consisted of spoken narratives derived from a retell protocol similar to that conducted by Bardovi-Harlig (1995). Mandatory past tense contexts were coded for a variety of factors including those identified by earlier research. Our longitudinal data represented a three-year period in the lives of eight Chinese children living in the United States and acquiring English as a second language.

Subjects and Data Collection

Our subjects, four boys and four girls, ranged in age between 3 and 11 years old at the beginning of the data collection period (see table 5.2 for an overview), and were all children of graduate students attending the University of Arizona. The children were all native speakers of Chinese,1 having lived with their parents in China prior to coming to the United States. At the time the data collection began, the children possessed varying levels of proficiency in English.

Over a three-year period, the children were individually interviewed approximately one to two times each semester.2 On those occasions, they were shown one or two short, animated cartoons. After viewing the cartoons, the children were asked to retell the story they had just seen. The narratives produced by the children were tape-recorded and transcribed.

Table 5.2 Overview of individual subjects.

Subject    Age*     Sex
X          11:0     M
F          7:2      M
D          5:8      F
L          5:5      F
M          5:0      M
T          4:10     M
Y          3:11     F
J          3:0      F

* Refers to age at the initiation of the study

Data Analysis

The transcripts were analyzed for obligatory past tense contexts (for examples see the discussion of foreground and background clauses above). The question of what constitutes an obligatory context in a narrative can be somewhat complicated because native speakers and highly proficient second language learners may use the historical present in foreground clauses to refer to past events (Adamson, Fonseca-Greber, Kataoka, Scardino, and Takano, 1996; Schiffrin, 1981). In the present study, however, there was no expectation that the children would have any knowledge of historical present because it requires considerable familiarity with native speaker norms, and it is highly unlikely that our subjects had much exposure to this stylistic device. We therefore felt confident that it would be appropriate to code all foreground clauses as obligatory for past marking, following Wolfram (1985) and Bayley (1994).

We coded each verb in an obligatory past tense context as marked or unmarked for past. This marking could be considered the dependent variable in our correlational study. We also coded for the linguistic and other factors that we thought would constrain past tense marking. These factors, which could be considered independent variables, were contained within five factor groups. Initially these factor groups were: 1) individual subjects; 2) time period; 3) phonological verb class; 4) semantic verb type (telic or atelic); and 5) clause type. As in the Varbrul analysis described in chapter 2, each factor group contained a set of mutually exclusive, independent factors that exhausted all of the data. In other words, each verb could be coded for only one factor per factor group (e.g., a verb must be either telic or atelic).
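The coding scheme can be pictured as one record per verb token, with exactly one factor from each factor group. The Python sketch below is an illustration only: the field names, the spelling of the factor labels, and the example token are assumptions made here, and the verb classes listed are the four classes described earlier in the chapter rather than a record of the actual coding sheet.

```python
from dataclasses import dataclass

# Factor groups and their mutually exclusive factors (labels are illustrative).
FACTOR_GROUPS = {
    "subject": {"X", "F", "D", "L", "M", "T", "Y", "J"},
    "time_period": {1, 2, 3},
    "verb_class": {"suppletive", "doubly marked",
                   "internal vowel change", "replacive consonant"},
    "semantic_type": {"telic", "atelic"},
    "clause_type": {"foreground", "background"},
}


@dataclass(frozen=True)
class Token:
    """One verb in an obligatory past tense context."""
    verb: str
    marked: bool          # dependent variable: marked for past or not
    subject: str
    time_period: int
    verb_class: str
    semantic_type: str
    clause_type: str

    def __post_init__(self):
        # Each token must carry exactly one known factor per factor group.
        for group, allowed in FACTOR_GROUPS.items():
            value = getattr(self, group)
            if value not in allowed:
                raise ValueError(f"{value!r} is not a factor in group {group!r}")


# Invented token, for illustration only.
example = Token(verb="break", marked=True, subject="X", time_period=1,
                verb_class="internal vowel change",
                semantic_type="telic", clause_type="foreground")
print(example)
```

Because every field holds a single value, a token cannot be coded as both telic and atelic, which mirrors the requirement that the factors within a group be mutually exclusive.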

Analyses were conducted using Goldvarb 2001 (Robinson, Lawrence, and Tagliamonte, 2001), an updated version of the Varbrul 2 multivariate analysis program for Windows (Cedergren and Sankoff, 1974). Our predictions, based on the research cited above, were as follows:

1. Past tense would be more frequent in later time periods; in other words, the subjects’ interlanguage would move closer to native norms over time.

2. Salient verb classes (classes in which the base/present form and the past form are maximally different) would be marked more frequently.

3. Telic verbs would be marked more frequently than atelic verbs.

4. Foreground clauses would be marked more frequently than background clauses.


Cross-tabulation: Checking the Factor Groups for Interaction

The Varbrul program assumes that the independent variables do not affect one another. That is to say that if a telic verb favors marking and a suppletive verb favors marking, then a verb that is both telic and suppletive will favor marking even more. If that is not the case, then the factors are said to interact. Constraint interaction can indicate very interesting linguistic phenomena, but interacting constraints should not be simultaneously analyzed by Varbrul. Thus, before running a Varbrul analysis, the researcher should check for constraint interaction by cross-tabulating the two factor groups in question. Table 5.3 shows the cross-tabulation of the time periods against the proficiency level of the individual subjects. As previously mentioned, the data were collected over a three-year period, which we divided into three one-year time periods for the purpose of analysis. Individual subjects should improve over time if they continued to acquire English regardless of their proficiency level at the time the data collection was initiated. As can be seen in table 5.3, this was indeed the case with the exception of subject X; nonetheless, if 80 percent accuracy is considered an indicator of acquisition, as has been done in other studies, subject X does not show any appreciable difference between his performance from time 1 to time 2.
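A cross-tabulation of this kind is straightforward to compute from coded tokens. The Python sketch below is a minimal illustration, not the procedure built into Varbrul or Goldvarb: the function name is hypothetical, and the six tokens are invented solely to show the mechanics.

```python
from collections import defaultdict


def cross_tabulate(tokens, row_group, col_group):
    """Percentage of marked tokens in each (row factor, column factor) cell."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [marked, total]
    for token in tokens:
        cell = (token[row_group], token[col_group])
        counts[cell][0] += int(token["marked"])
        counts[cell][1] += 1
    return {cell: round(100 * marked / total)
            for cell, (marked, total) in counts.items()}


# Invented tokens for two hypothetical subjects, A and B.
tokens = [
    {"subject": "A", "time_period": 1, "marked": False},
    {"subject": "A", "time_period": 1, "marked": True},
    {"subject": "A", "time_period": 2, "marked": True},
    {"subject": "B", "time_period": 1, "marked": False},
    {"subject": "B", "time_period": 2, "marked": True},
    {"subject": "B", "time_period": 2, "marked": False},
]

table = cross_tabulate(tokens, "subject", "time_period")
for (subject, period), percent in sorted(table.items()):
    print(f"subject {subject}, time {period}: {percent}% marked")
```

If one factor group affects every row in the same direction (here, both invented subjects improve from time 1 to time 2), the two groups can reasonably be entered into the analysis together; rows that pattern in opposite directions are a sign of possible interaction.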

We also cross-tabulated the individual subjects against the different verb classes. If consistent with the findings of Wolfram (1985) and Bayley (1994), the frequencies obtained in this analysis should indicate that the more salient verb classes are more frequently marked. As can be seen in table 5.4,3 however, the data do not provide strong support for the principle of saliency. Four of the subjects do mark the suppletives more frequently than the doubly-marked verbs, but the other four do the opposite. However, three of those subjects (X, D, and F) all possess a high level of proficiency, and we see again that all these subjects approach or surpass the criterion level for acquisition of 80 percent marking. Subject Y does not exhibit much difference in her marking of suppletives and doubly-marked verbs.

Table 5.3 Percentage of accurate past tense marking by individual subject and time period. Subjects are ranked according to their overall accuracy.

           Time period
Subject    1          2     3          Total
X          86         85    95         86
D          78         89    no data    84
F          57         87    96         79
L          52         78    no data    63
J          no data    41    75         47
T          no data    9     63         45
Y          6          43    59         39
M          10         49    no data    30

Two of the subjects, T and M, mark internal vowel change verbs more frequently than doubly-marked verbs, with M marking internal vowel change most frequently overall. The only claim that can be made is that six of the eight subjects mark suppletives and doubly-marked verbs, which are relatively more salient, more frequently than vowel-changing verbs, which are relatively less salient. Thus, the principle of saliency, which found strong support in Wolfram’s (1985) and Bayley’s (1994) studies of adult learners, is not strongly supported here.

It may be that verb saliency is less important for acquisition in children than it is for adolescents and adults, yet one similarity to Wolfram’s (1985) data can be observed: the exceptions to the weak generalization are low proficiency subjects. Wolfram (1985) also found that the saliency hierarchy was less prevalent among his low proficiency subjects than among his high proficiency subjects. As a result of this cross-tabulation, then, the independent variable of verb saliency will not be tested in our Varbrul run because it does not constrain our data in a uniform way.
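The pairwise comparisons behind this decision can be checked mechanically against table 5.4. The Python sketch below transcribes the three verb-class percentages for each subject and lists the subjects for whom each step of the saliency order holds; the helper name is hypothetical, and strict inequality is an assumption about how ties should be treated.

```python
# (suppletive, doubly marked, internal vowel change) percentages from table 5.4.
TABLE_5_4 = {
    "X": (83, 96, 80),
    "D": (93, 94, 70),
    "F": (78, 82, 78),
    "L": (73, 61, 56),
    "J": (83, 60, 28),
    "T": (59, 17, 38),
    "Y": (44, 45, 35),
    "M": (25, 8, 44),
}


def subjects_where(predicate):
    """Subjects whose three marking rates satisfy the given comparison."""
    return [s for s, rates in TABLE_5_4.items() if predicate(*rates)]


suppletive_over_doubly = subjects_where(lambda sup, dbl, vow: sup > dbl)
doubly_over_vowel = subjects_where(lambda sup, dbl, vow: dbl > vow)
full_hierarchy = subjects_where(lambda sup, dbl, vow: sup > dbl > vow)

print("suppletive > doubly marked:", suppletive_over_doubly)
print("doubly marked > vowel change:", doubly_over_vowel)
print("full order holds:", full_hierarchy)
```

Four subjects satisfy the first comparison and six the second, but only two follow the whole order, which is the sense in which saliency does not constrain these data uniformly.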

The next factor group to cross-tabulate against the individual subjects is the semantic type of the verb. Table 5.5 shows that six of the eight subjects mark telic verbs more frequently than atelic verbs. This finding lends support to the work of Andersen (1993; Shirai and Andersen, 1995), who suggested that initially learners do not mark past time but rather telicity, which may be an innately known language property. Telicity does not favor marking for subjects D and T, though D is not really an exception because she marks both semantic types at 84 percent. Subject T marks the semantic types in the opposite manner than expected, but because the difference is only 6 percentage points we feel justified in including both semantic type of verb and individual subject in the Varbrul analysis.

Table 5.4 Percentage of accurate past tense marking by individual subject and verb class.

           Verb class
Subject    Suppletive    Doubly-marked    Internal vowel change    Total
X          83            96               80                       86
D          93            94               70                       84
F          78            82               78                       79
L          73            61               56                       63
J          83            60               28                       47
T          59            17               38                       45
Y          44            45               35                       39
M          25            8                44                       30

* Percentages reported indicate accurate past tense marking

We now consider the effect of clause type. Table 5.6 shows that all of the subjects except F mark past tense more frequently in background clauses than in foreground clauses. Though this finding is opposite to Bardovi-Harlig’s (1995) earlier work (see the discussion below), the data are constrained in a consistent manner and are eligible for inclusion in the Varbrul analysis.

Table 5.5 Percentage of accurate past tense marking by individual subject and semantic type of verb.

           Semantic type of verb
Subject    Atelic    Telic    Total    Difference
X          83        92       86       9
D          84        84       84       0
F          72        84       79       12
L          61        65       63       4 α
J          38        52       47       14
T          48        42       45       −6
Y          32        47       39       15
M          18        48       30       30 β
Total      55        66

α Total difference for four high proficiency subjects = 25
β Total difference for four low proficiency subjects = 53

Table 5.6 Percentage of accurate past tense marking by individual subject and clause type.

           Clause type
Subject    Foreground    Background    Total    Difference
X          84            88            86       4
D          79            89            84       10
F          81            77            79       −4
L          55            70            63       15 α
J          38            62            47       24
T          44            45            45       1
Y          36            45            39       9
M          24            40            30       16 β
Total      54            68

α Total difference for four high proficiency subjects = 25
β Total difference for four low proficiency subjects = 50
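The summed differences reported beneath tables 5.5 and 5.6 follow directly from the per-subject percentages. The Python sketch below reproduces them; the grouping of X, D, F, and L as the high proficiency subjects and J, T, Y, and M as the low proficiency subjects is read off the placement of the α and β notes, and the variable and function names are illustrative.

```python
# (atelic, telic) percentages from table 5.5.
TABLE_5_5 = {"X": (83, 92), "D": (84, 84), "F": (72, 84), "L": (61, 65),
             "J": (38, 52), "T": (48, 42), "Y": (32, 47), "M": (18, 48)}

# (foreground, background) percentages from table 5.6.
TABLE_5_6 = {"X": (84, 88), "D": (79, 89), "F": (81, 77), "L": (55, 70),
             "J": (38, 62), "T": (44, 45), "Y": (36, 45), "M": (24, 40)}

HIGH_PROFICIENCY = ("X", "D", "F", "L")
LOW_PROFICIENCY = ("J", "T", "Y", "M")


def summed_difference(table, subjects):
    """Sum of (second column - first column) over the given subjects."""
    return sum(table[s][1] - table[s][0] for s in subjects)


print(summed_difference(TABLE_5_5, HIGH_PROFICIENCY))  # 25
print(summed_difference(TABLE_5_5, LOW_PROFICIENCY))   # 53
print(summed_difference(TABLE_5_6, HIGH_PROFICIENCY))  # 25
print(summed_difference(TABLE_5_6, LOW_PROFICIENCY))   # 50
```

In both tables the low proficiency group shows roughly twice the summed difference of the high proficiency group.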


Results

The independent variables ultimately chosen for the Varbrul analysis are displayed in table 5.7, which shows the overall percentage of marking for each factor as well as the Varbrul p values. As explained in chapter 2, “p” does not stand for probability but rather for the relative strength of each factor within the factor group. As such, the Varbrul program tests how well the hypothesis represented by the proposed factors actually fits the data and also serves as a test for constraint interaction. The overall goodness of fit is provided by a chi-square per cell score. If this score is below 1.5 (Preston, 1989), the hypothesis fits the data fairly well, and it is unlikely that the constraints interact. The lower the score, the better the fit. The chi-square per cell score for our analysis is 1.26, a relatively good fit. This suggests that our choice of factor groups to include in the final analysis accurately explains what influences our subjects’ past tense marking. We now consider the effect of the individual factor groups.

For factor group 1, time period, Varbrul assigned successively higher p values to each subsequent time period, as expected. While this may not actually be deemed a finding, it does represent an important safeguard in the reliability of the findings as it clearly demonstrates that the subjects improved over time. For factor group 2, Varbrul ranked the subjects by relative proficiency in almost the identical order that they were ranked by overall percentage of past marking. For factor group 3, semantic type, Varbrul assigned a much higher p value to telic verbs than to atelic verbs, as expected. Finally, for factor group 4, clause type, Varbrul assigned a much higher p value to background clauses than to foreground clauses. This result also reflects the percentage figures, but is the opposite of our original hypothesis based on the earlier findings of Bardovi-Harlig (1995).

Table 5.7 Varbrul analysis for past tense marking.

                                       P      %     n
Factor Group 1: Time period
  1                                    .27    46    199
  2                                    .56    66    371
  3                                    .81    75    152
Factor Group 2: Individual subjects
  X                                    .82    86    178
  D                                    .82    84    90
  F                                    .69    79    142
  L                                    .60    63    123
  J                                    .25    47    16
  M                                    .25    30    55
  Y                                    .19    39    89
  T                                    .13    45    29
Factor Group 3: Semantic type
  Telic                                .60    66    380
  Non-telic                            .41    55    342
Factor Group 4: Clause type
  Background                           .59    68    388
  Foreground                           .42    54    334

Input = .65, χ² per cell = 1.26
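One way to see how the values in table 5.7 work together is to combine them for a single cell. The Python sketch below assumes the logistic form of the variable rule model commonly associated with Varbrul 2 and Goldvarb, in which the input value and each factor weight are converted to odds and multiplied; that model is an assumption made here for illustration, not something spelled out in this chapter, and the function names are hypothetical.

```python
def odds(p):
    """Convert a value between 0 and 1 to odds."""
    return p / (1 - p)


def predicted_probability(input_value, weights):
    """Combine the input value with one factor weight per factor group.

    Assumes the logistic variable rule model: the odds of marking equal the
    odds of the input times the odds of each applicable factor weight.
    """
    combined = odds(input_value)
    for weight in weights:
        combined *= odds(weight)
    return combined / (1 + combined)


INPUT = 0.65  # input value from table 5.7

# Most favorable cell: time 3, subject X, telic, background.
favorable = predicted_probability(INPUT, [0.81, 0.82, 0.60, 0.59])

# Least favorable cell: time 1, subject T, non-telic, foreground.
unfavorable = predicted_probability(INPUT, [0.27, 0.13, 0.41, 0.42])

# A weight of .50 neither favors nor disfavors marking.
neutral = predicted_probability(INPUT, [0.50, 0.50, 0.50, 0.50])

print(f"favorable cell:   {favorable:.2f}")
print(f"unfavorable cell: {unfavorable:.2f}")
print(f"neutral weights:  {neutral:.2f}")
```

Under this assumption, weights above .50 raise the predicted rate of marking above the input value and weights below .50 lower it, which is why a weight is read relative to .50 rather than as a probability in its own right.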

Discussion

A continuing criticism of Varbrul analysis has been that data from different subjects should not be lumped together, as this may obscure individual differences. For this reason, we have looked at each individual subject and found that, with minor exceptions, all of them behaved in similar ways in regard to three of the four proposed constraints on past marking. In the case of verb saliency, however, considerable individual variation was found, making this factor group inappropriate for the Varbrul analysis. We are not sure why our result in regard to saliency differs from those of Wolfram (1985) and especially Bayley (1994), who studied adult Chinese speakers. But, as suggested earlier, perhaps it is because verb saliency affects pre-adolescent and post-adolescent second language learners differently, with pre-adolescents learning the forms of individual irregular verbs rather than verb classes.

In regard to clause type, our finding that marking is more frequent in obligatory contexts in background clauses appears to contradict the findings of Bardovi-Harlig (1995), who claimed that “the simple past emerges first in the foreground” (p. 272). However, the contradiction is only apparent because the two studies used different methods of tabulating data, thus addressing different research questions. The present study, in the tradition of Wolfram (1985), Bayley (1994), and others, asked the question: Is past tense more accurately marked in foreground or background clauses? Like Wolfram (1985) and Bayley (1994), we coded the verbs in background and foreground clauses only in obligatory past tense contexts. Bardovi-Harlig (1995), on the other hand, asked the question: Does past tense marking first emerge in foreground or background clauses? Therefore, she coded all of the verbs in background and foreground clauses, regardless of whether they occurred in an obligatory context. But speakers are free to use present tense more frequently in background clauses than in foreground clauses. Foreground clauses trace the events of the story line, all of which happened in the past. But background clauses can be used to describe a present state of affairs that is relevant to the story line or to interject the narrator’s opinion. For example, the first clause in sentence (3) could be expanded as follows. “I think the mother likes eggs, and she was watching the eggs.” The revised sentence adds two verbs that do not require past marking, yet these verbs would be coded as not marking past in Bardovi-Harlig’s (1995) system. It is not surprising, then, that she found a higher percentage of marking in foreground clauses.
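The effect of the two tabulation methods can be shown with a small worked example in Python. All of the counts below are invented for illustration and come from neither study; the sketch also assumes, for simplicity, that verbs outside obligatory contexts are never marked for past.

```python
# Invented verb counts for one hypothetical narrative.
clauses = {
    "background": {"obligatory_marked": 8, "obligatory_unmarked": 2, "non_obligatory": 6},
    "foreground": {"obligatory_marked": 7, "obligatory_unmarked": 3, "non_obligatory": 0},
}


def accuracy_in_obligatory_contexts(counts):
    """Marked verbs as a percentage of verbs in obligatory past contexts only."""
    obligatory = counts["obligatory_marked"] + counts["obligatory_unmarked"]
    return 100 * counts["obligatory_marked"] / obligatory


def marking_over_all_verbs(counts):
    """Marked verbs as a percentage of all verbs, obligatory context or not."""
    return 100 * counts["obligatory_marked"] / sum(counts.values())


for clause_type, counts in clauses.items():
    print(f"{clause_type}: "
          f"{accuracy_in_obligatory_contexts(counts):.0f}% in obligatory contexts, "
          f"{marking_over_all_verbs(counts):.0f}% over all verbs")
```

With these invented counts the background has the higher rate when only obligatory contexts are counted (80 versus 70 percent) but the lower rate when every verb is counted (50 versus 70 percent), so the same narrative can support either ranking depending on the denominator.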

We also suggest that the higher rate of accuracy in the background results from the greater freedom of verb choice in background clauses. Background clauses are not absolutely required in narrative discourse, especially that of younger speakers (Aksu-Koç and von Stutterheim, 1994), and so the speaker has considerable freedom to use familiar verbs. Indeed, by far the most common verb our speakers used in the background was be. In the foreground, on the other hand, certain actions must be expressed to tell the story, whether or not the past tense form of the verb has been mastered. It is also worth noting that, despite Aksu-Koç and von Stutterheim’s (1994) finding that age is a factor in native-speaking children’s acquisition of discourse competence, it did not appear to make a difference for our subjects, all but one of whom demonstrated a higher percentage of accurate past tense marking in background clauses, regardless of their age at the beginning of the study. In fact, the only subject who demonstrated a tendency to favor marking slightly in the foreground, subject F, was 7:2 when data collection began.

Finally, the Varbrul analysis showed that telic verbs strongly favored marking for past tense. As the percentages in table 5.5 indicate, this tendency is more pronounced in the earlier stages of acquisition (i.e., among less proficient speakers). This claim is further supported by the cross-tabulation of time period and semantic prototype of verb shown in table 5.8, which indicates that at time period 1, telic verbs are marked much more frequently than atelic verbs but that this discrepancy diminishes until, at time period 3, the two semantic types are marked with nearly equal frequency by all subjects.

Table 5.8 Percentage of accurate past tense marking by time period and semantic type of verb.

              Semantic type
Time Period   Atelic   Telic   Total   Difference
1             39       55      46      16
2             63       70      66      7
3             74       75      75      1
Total         55       66

There appear to be two possible, yet related, explanations for the tendency to mark telic verbs more frequently at earlier stages of acquisition. The first is Shirai and Andersen’s (1995) claim that inherent telicity represents a prototype of the larger semantic category pastness, and that in the early stages of acquisition children mark the prototype more frequently. The second explanation is a transfer explanation suggested by Bayley (1994), who notes that Chinese does not mark the pastness of an event but does mark perfective aspect, which is very similar to telicity. In either case, it appears that as subjects become more proficient they rely less on telicity and/or perfective aspect as a cue to past tense and mark verbs for pastness more consistently.
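The narrowing gap between telic and atelic verbs reported in table 5.8 can be checked with a few lines of Python. The percentages are our reading of the table; the names `TABLE_5_8` and `telic_advantage` are introduced here only for illustration.

```python
# Percentages as read from table 5.8: time period -> (atelic, telic).
TABLE_5_8 = {1: (39, 55), 2: (63, 70), 3: (74, 75)}


def telic_advantage(table):
    """Return telic minus atelic marking (percentage points) per time period."""
    return {period: telic - atelic for period, (atelic, telic) in table.items()}


gaps = telic_advantage(TABLE_5_8)
print(gaps)  # {1: 16, 2: 7, 3: 1}
```

The differences shrink from 16 points to 1 point across the three time periods, matching the Difference column of the table.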

Conclusions

There are three main findings to this study of Chinese-speaking children’s acquisition of the English irregular past tense.

1. The principle of saliency, which was strongly supported in Wolfram’s (1985) study of Vietnamese adolescents and adults and Bayley’s (1994) study of Chinese adults, is only weakly supported here. We have suggested that perhaps children tend to learn irregular past tense verbs individually while adults tend to learn them in classes. It is also possible that there are multiple factors not examined here which particularly influence Chinese learners (cf. Hansen, 2001).

2. Our subjects marked verbs in obligatory past tense contexts in background clauses more frequently than in foreground clauses. This difference is especially pronounced for lower proficiency subjects. We suggest that our subjects were more likely to express background information when they had the linguistic resources to do so and could therefore be more accurate.

3. Andersen’s prototype hypothesis and Bayley’s transfer hypothesis find strong support in our study. Our subjects appear to rely on telicity (or perfective aspect) as a cue to pastness in the earlier stages of acquisition and to abandon this strategy as they become more proficient.


III Variation in Theoretical Perspective

6 Psychological Theories of Linguistic Variation

Introduction

In this chapter we pick up the story of the search for the psychological underpinnings of linguistic variation that we began in chapter 1, where we observed that the phenomenon of variation in first and second language speech has puzzled many linguists. Bickerton (1971), for example, noted that there was no existing psychological theory that explained how the mind could keep track of the probabilities associated with variable linguistic forms. More recently, Preston (2001, 2002) has suggested what such a theory would look like. But he adds (2002, p. 141) that variationists have not dealt extensively with how the facts of language variation relate to psycholinguistic models of speech production and comprehension. In this chapter we will see that psychologists have been interested in probabilistic behavior for a long time, and we will explore possible connections between psycholinguistic theories that account for probabilistic behavior and the variable speech behavior of native and nonnative speakers that was documented in the previous chapters.

Psychological Studies of Probability-matching

During the 1960s and 1970s a number of experiments in probability-matching showed that people can accurately gauge the proportion of different events. For example, Robinson (1964) presented his participants with two flashing lights, one of which flashed more often than the other, and asked them to estimate the proportion of left light flashes to right light flashes. He found that the estimates were very accurate, falling within 2 percent of the true proportions.

In a more recent experiment in this tradition, Hudson Kam and Newport (2005) taught adults an artificial language that contained a variable rule.1 In the artificial language, which had Verb, Subject, Object word order, nouns could variably take a following determiner. Possible sentences in one dialect of the language included:

(1) Leymz      gerko  ka.
    be yellow  sand   det
    “The sand is yellow.”

(2) smit       nerk  pow  mæuwzner  pow
    be beside  frog  det  boat      det
    “The frog is beside the boat.”

Notice that in these sentences, mass nouns (like gerko/“sand”) take the determiner ka and count nouns (like nerk/“frog”) take the determiner pow. During the training sessions, one group of participants received input in which 100 percent of the nouns were followed by a determiner. Other groups received input in which 75 percent, or 60 percent, or 45 percent of the nouns were followed by determiners. Because determiners were optional in the grammar of the language, all of the participants received fully grammatical input. It was expected that the participants who received 100 percent noun + determiner input would construct a categorical rule like (3), whereas the participants who received variable noun + determiner input would construct a variable rule like (4), which they would apply at rates that approximately matched the input they received.

(3) NP → N det
(4) NP → N (det)

Because there were no variable constraints on where a determiner was likely to occur (for example, animate nouns did not favor determiner marking more than inanimate nouns, as is the case in some creole languages), the experimental task involved learning only the correct percentages at which the determiners occurred in the input. In terms of the Varbrul program, the participants only had to learn the right input probability for variable rule (4). Of course, it would be possible for the subjects to incorrectly construct a variable rule with a linguistic constraint. For example, they might construct a rule that incorrectly associated higher determiner frequency with animate nouns, or mass nouns, or whether a particular noun occurred for the first time in the sentence. One goal of the experiment was to see if the participants created such unsupported associations.
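A variable rule with a single input probability and no linguistic constraints can be sketched in Python as follows. This is a minimal illustration under our own assumptions: the function names `produce_np` and `determiner_rate` are hypothetical, and the sketch simply applies rule (4) with a fixed probability rather than modeling how a learner arrives at that probability.

```python
import random


def produce_np(noun, det, input_probability, rng):
    """Apply the variable rule NP -> N (det) with a single input probability."""
    if rng.random() < input_probability:
        return f"{noun} {det}"
    return noun


def determiner_rate(input_probability, trials=10_000, seed=0):
    """Share of simulated noun phrases that surface with a determiner."""
    rng = random.Random(seed)
    with_det = sum(
        produce_np("gerko", "ka", input_probability, rng) != "gerko"
        for _ in range(trials)
    )
    return with_det / trials


# The four input conditions described in the text.
for p in (0.45, 0.60, 0.75, 1.00):
    print(p, determiner_rate(p))
```

With an input probability of 1.0 the sketch behaves like categorical rule (3); with lower values it produces determiners at roughly the specified rate, which is what probability-matching amounts to for rule (4).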

The 40 participants in the study, who were native English-speaking university students, were randomly assigned to one of eight experimental groups, which differed along two dimensions. The first dimension was frequency of determiner input. As mentioned, for the low input group determiners accompanied nouns 45 percent of the time; for the mid input group determiners accompanied nouns 60 percent of the time; for the high input group determiners accompanied nouns 75 percent of the time; and for the perfect input group determiners accompanied nouns 100 percent of the time, as shown in figure 6.1. The first question the researchers posed was: Can the participants accurately reproduce these percentages on a production task?

The second dimension of variation involved semantics. In one dialect of the artificial language, mass nouns took ka and count nouns took pow, as in (1) and (2). Half of the participants (assigned equally to each input frequency group) learned this dialect. The other half of the participants (also assigned equally to each input frequency group) received input in which ka and pow were associated not with countability but with a noun class. In this dialect class 1 nouns (which might be thought of as feminine) took ka, and class 2 nouns (which might be thought of as masculine) took pow. Assignment to a noun class was entirely arbitrary, with no basis in semantics or phonology. Thus, the second research question was: Can the participants more accurately internalize the frequency of variation in input when a grammatical feature marks a natural semantic relationship like the count–mass distinction or when a grammatical feature marks an arbitrary, conventional relationship like grammatical gender?

The language was taught by showing the participants pictures that illustrated the meaning of sentences like (1) and (2) while they heard the sentence spoken. Then, they were asked to repeat the sentence. On the production test, the experimenter showed each participant a scene with real objects that could be described with a sentence in the artificial language and prompted the participant by supplying the verb. For example, the prompt to elicit sentence (1) would be /lemz/. The participant could then complete the sentence, possibly supplying determiners for the nouns.

The results of the experiment appear in figure 6.1. The dotted line in the figure represents the percentage of articles in the input for both dialect groups. The solid line connecting squares shows the percentage of determiner production for the gender agreement dialect, and the solid line connecting circles shows the percentage of determiner production for the count/mass agreement dialect. In regard to the first research question Hudson Kam and Newport (2005) note “the participants generally used determiners in their production about as often as they heard them in the input” (p. 171).

Figure 6.1 Results of Hudson Kam and Newport’s (2005) experiment in the acquisition of determiners in an artificial language. Percent of determiner presence according to input group. Reprinted with permissions.

The study found two other interesting patterns in the participants’ responses. The first pattern involved the semantically based agreement rule. Notice that for three of the four input groups in figure 6.1, whether noun + determiner agreement was based on countability or was arbitrary appears to make no difference in the accurate learning of determiner frequency. However, the high input group is an exception. This group produced count–mass based determiners at 80 percent but class 1 (feminine) and class 2 (masculine) based determiners at only 55 percent. This finding suggests that given sufficient input, learners can better construct a variable rule that accurately reflects the frequencies of variation in the input when the rule involves a language universal, such as the count–mass distinction.

The second interesting pattern involved how the participants produced determiners following identical nouns depending on whether the noun phrases appeared for the first time or the second time in a test sentence. When a particular noun was mentioned for the second time, it was more likely to be marked with a determiner by all input groups except the perfect group. Furthermore, this tendency was more pronounced in the lower input groups. In other words, participants exposed to fewer determiners were more likely to produce a determiner in the second obligatory context that they encountered on a test sentence. ESL teachers will not be surprised at this finding because it could result from transferring the pattern of English determiner usage, where the is used to mark nouns that are known to the hearer. Therefore, in English discourse the is sometimes not required when an NP is first mentioned but is required the second time (“Sam wanted to buy coffee and milk, but the coffee was too expensive”).

Hudson Kam and Newport’s (2005) study is a laboratory demonstration of what variationists have known for a long time: adults can construct a variable rule that produces alternating forms in the proportions that these forms occur in the surrounding language. But, as we saw in chapter 1, it is not clear how this ability should be modeled in a grammar. Variable rules cannot be part of a Chomskian competence grammar, which is intended to model a speaker’s knowledge of a language at a high level of abstraction. Therefore, the mechanisms of probabilistic learning must be part of a performance grammar. In the discussion of the Derivational Theory of Complexity in chapter 1, we noted that in the 1960s and 1970s psycholinguists adapted the competence grammars proposed by generative linguists to the needs of performance models, thereby creating a performance grammar that was very similar to a corresponding Chomskian competence grammar. Recently, however, models of sentence production and comprehension have become more distinct from competence grammars because they have added probabilistic mechanisms. It therefore seems promising for variationists to look to such mechanisms in order to understand how the mind can learn to produce variable forms at the appropriate frequencies. The most popular probabilistic mechanism in current psycholinguistic theory is the connectionist network, which we will discuss in the next section.

Psycholinguistic Models of Language Performance

Production Models

The production model we will examine is the one described in Barsalou (1992), based on Levelt (1989) and Garrett (1975). We will first briefly outline the production of the sentence “The cat eats the food” and then look at proposals for expanding the model by adding probabilistic mechanisms. A diagram of the model is shown in figure 6.2. The production process in the figure is divided into seven levels, but we will be concerned mainly with the first four.

Figure 6.2 Barsalou’s (1992) model of speech production. Reprinted with permissions.


LEVEL 1: Conceptualize a message.

A mental module called the conceptualizer draws from background information, discourse information, and the speaker’s communicative intentions to conceive a message in the form of a proposition with a predicate and arguments marked with case roles. At this point EAT is the abstract concept of consuming food, which might be lexicalized as gulped or gobbled, not the actual lexical item. It may be considered a bundle of semantic features. Similarly, CAT and FOOD are abstract feature bundles that could be lexicalized in several ways. The output of the conceptualizer, called the preverbal message, serves as input to the next level.

LEVEL 2: Formulate an abstract sentential representation.

In this module, called the formulator, lemmas that match the abstract conceptual representations of the preverbal message are retrieved. A lemma is that part of a lexical entry that contains semantic, functional, and syntactic information, but not phonological information. In the example sentence the lemma for cat is retrieved first. It looks like this:

cat
    conceptual specifications
        FELINE (i.e., a set of semantic features that identify cats)
    syntactic category
        NOUN
    diacritic parameters
        countable
        definite
        singular

The lemma for the verb eat contains the syntactic information that this verb requires a subject argument and a direct object argument. Linking rules assign the agent of eat to the subject argument slot and the theme to the direct object argument slot.
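The structure of a lemma and the effect of the linking rules can be pictured as a small data structure. The Python sketch below is only an illustration: the class `Lemma`, its field names, and the function `link_arguments` are our own labels for the parts of the lemma listed in the text, not notation taken from Barsalou (1992) or Levelt (1989).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lemma:
    name: str
    conceptual_specifications: frozenset
    syntactic_category: str
    diacritic_parameters: frozenset = frozenset()
    arguments: tuple = ()  # argument slots required by a verb


CAT = Lemma(
    name="cat",
    conceptual_specifications=frozenset({"FELINE"}),
    syntactic_category="NOUN",
    diacritic_parameters=frozenset({"countable", "definite", "singular"}),
)

EAT = Lemma(
    name="eat",
    conceptual_specifications=frozenset({"EAT"}),
    syntactic_category="VERB",
    arguments=("subject", "direct object"),
)

# Linking rules as stated in the text: agent -> subject, theme -> direct object.
LINKING_RULES = {"agent": "subject", "theme": "direct object"}


def link_arguments(verb, case_roles):
    """Assign each case role's filler to the verb's matching argument slot."""
    linked = {}
    for role, filler in case_roles.items():
        slot = LINKING_RULES[role]
        if slot in verb.arguments:
            linked[slot] = filler
    return linked


print(link_arguments(EAT, {"agent": "cat", "theme": "food"}))
```

Note that the lemma carries no phonological information; in the model that information is retrieved only at level 4.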

LEVEL 3: Construct a phrase marker.

The speaker constructs a phrase marker for the sentence using the abstract sentential representation from level 2 as a guide and employing phrase structure rules like those in Chomsky’s (1965) Standard Theory (see chapter 1). These rules are employed in the reverse of the order that they are usually presented in textbooks, as will be discussed later in the chapter.

LEVEL 4: Retrieve the phonological representation associated with the selected lemmas.

Using the lemma as a cue, the model retrieves the entire lexical entry, or lexeme, associated with each lemma. The lexemes, which contain the appropriate phonological forms in addition to the information contained in the lemmas, are now inserted into the appropriate slots in the phrase structure marker.

In levels 5, 6, and 7 the lexemes are segmented into individual phonemes. Then, affixes and function words are added to the syntactic string, and the phonological representations are converted into appropriate phonetic representations. The phonetic string at level 7 can then serve as a set of instructions to the articulators. Notice that, in this model, sentence production is deterministic: The preverbal message marches on to the spoken sentence without any mechanism that would allow for probabilistic behavior. But recent additions to similar models (Levelt, Roelofs, and Meyer, 1999) have added such mechanisms, and we will consider this possibility in the next section.

Let us now return to level 3 to see in more detail how phrase markers are constructed. This discussion is based on Levelt (1989) rather than on Barsalou’s (1992) adaptation of Levelt’s model. In Levelt’s (1989) model a phrase structure frame is built by means of productions, or IF THEN statements. Productions can be considered phrase structure rules that have been turned into a step-by-step set of instructions. These instructions take advantage of the projection principle, which says that a lemma of grammatical category X must wind up as part of an X phrase, and, conversely, that an X phrase must contain a member of category X. For example, IF a lemma is marked as a Noun, THEN the lexical item corresponding to that lemma must be part of a Noun Phrase. In the present example, the lemma for cat is marked as a noun. Therefore, the NP production is called up, which builds a structure like this:

NP

N

cat

Because the lemma for cat is also marked as + definite, the definite article production is called up. It takes advantage of the possible determiner slot in an NP and plugs in the lemma for the. The NP now looks like this:

NP

det N

the cat


Next, the higher grammatical category NP (which according to the projection principle must be part of an S) calls up the procedure for building an S. Then, because an S must have an NP and a VP, the S procedure builds a structure that looks like this:

S

NP VP

det N

the cat

We now have a VP and therefore there must be a V, which is added to the tree. Fortunately, we have a lemma marked as V, namely eat, so eat gets plugged into the V slot. The structure of the sentence under construction now looks like this:

S

NP VP

det N V

the cat eat
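The sequence of productions just traced can be sketched as a handful of Python functions, each standing for one IF THEN statement. The function names and the nested-list representation of the phrase marker are our own simplifications and are not part of Levelt’s (1989) formalism; the sketch covers only the steps needed to reach “the cat eat.”

```python
def np_production(word, definite):
    """IF a lemma is a noun, THEN it must be part of an NP."""
    np = ["NP", ["N", word]]
    # IF the lemma is +definite, THEN fill the determiner slot with "the".
    if definite:
        np.insert(1, ["det", "the"])
    return np


def s_production(subject_np):
    """IF there is an NP, THEN it is part of an S, which has an NP and a VP."""
    return ["S", subject_np, ["VP"]]


def v_production(tree, word):
    """IF there is a VP, THEN it must contain a V; plug the verb lemma in."""
    tree[2].append(["V", word])
    return tree


def leaves(node):
    """Read the words off the bottom of the phrase marker, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:  # node[0] is the category label
        words.extend(leaves(child))
    return words


tree = v_production(s_production(np_production("cat", definite=True)), "eat")
print(tree)
print(" ".join(leaves(tree)))  # the cat eat
```

Every step here is deterministic: given the same lemmas, the productions always build the same phrase marker.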

As mentioned earlier, the production model just described does not contain probabilistic mechanisms, but such mechanisms could be added in the form of connectionist networks, and we will now consider how a simple connectionist network works.

A connectionist network, which is run on a computer, can be thought of as an electrical circuit in which current spreads through a maze of wires lighting up certain light bulbs along the way. Dell, Chang, and Griffin (2001) have developed a connectionist network that models the task of naming an object in a picture. It is called the Aphasia Model because it can model the behavior of aphasic as well as healthy adults. A picture-naming task involves retrieving a lemma and matching it with its phonological representation. The experiments that led to the development of the Aphasia Model involved asking adult native English speakers to name an object in a picture while various kinds of distractors were provided. For example, an informant might see a picture of a cat and also see the written word dog. By measuring how long it took to name the object in the picture with different kinds of distractors, the experimenters were able to get an idea of the mental processes involved in naming.

The Aphasia Model uses the connectionist network shown in figure 6.3 to retrieve a lemma and connect it with its phonological representation. Connectionist theory adopts the metaphor of a neural network within the brain, where electrical activation spreads from a top layer of neurons to a bottom layer.2 Figure 6.3 contains three layers of nodes that might be thought of as three layers of neurons. The top layer contains ten nodes that correspond to the semantic features of CAT, such as [mammal], [furry], [domestic], [has a tail], and [catches mice]. In experiments, when a subject is shown a picture of a cat, these nodes will be activated, and will serve as input to the middle layer. The nodes in the middle layer correspond to lemmas, and include the lemmas for cat and similar animals like dog and rat. In terms of the brain metaphor, when an informant sees a picture of a cat, neurons in the brain that correspond to the appropriate semantic features of CAT are activated. In the Aphasia Model, nodes corresponding to these features are activated, and they pass their activation down to the cat node. When that node receives sufficient activation, it will fire, thus retrieving the lemma for cat. But the cat node is not the only middle level node to receive activation. As figure 6.3 shows, some activation will also spread to the lemmas for dog and rat because they share some of cat’s semantic features, such as [mammal], [furry], and [has a tail] as represented by the black nodes in the top level. This activation is increased, of course, if the participant reads the word dog or rat while seeing the picture of the cat. Usually, however, only the cat node fires because it receives the most activation. The nodes in the third level represent phonemes, including those in the lexical entry for cat, namely /k/, /æ/, and /t/. Activation from the cat node spreads primarily to those phonemes, which are then activated and can eventually be pronounced.

Figure 6.3 The Aphasia Model for naming cat (adapted from Dell, Chang, and Griffin, 2001).

An important feature that the Aphasia Model shares with other connectionist models is that lemma selection and phoneme selection are not deterministic. Even when connections between semantic features and lemmas are very well established, as in our example, making the connections involves an element of chance. Dell, Chang, and Griffin (2001) refer to this inherent variability as “noise in the system.” In the language of Variation Theory we could say that the semantic features at the input level are strong constraints on the production of “cat,” but there is always some inherent variability at each level, which can result in a slip of the tongue producing the wrong word.
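The idea of activation spreading from features to lemmas, with noise added along the way, can be illustrated with a toy Python sketch. This is not Dell, Chang, and Griffin’s (2001) implementation: the feature sets, the use of shared-feature counts as activation, the Gaussian noise, and all names below are assumptions made for illustration, and the sketch adds noise only at the lemma level.

```python
import random

# Toy feature sets (invented): dog and rat share three of cat's features.
LEMMA_FEATURES = {
    "cat": {"mammal", "furry", "domestic", "has a tail", "catches mice"},
    "dog": {"mammal", "furry", "has a tail"},
    "rat": {"mammal", "furry", "has a tail"},
}
# The phonemes for dog and rat are supplied only for illustration.
LEMMA_PHONEMES = {
    "cat": ["k", "æ", "t"],
    "dog": ["d", "ɔ", "g"],
    "rat": ["r", "æ", "t"],
}
CAT_PICTURE = LEMMA_FEATURES["cat"]


def select_lemma(active_features, noise_sd, rng):
    """Each lemma gets one unit of activation per shared feature, plus noise."""
    activation = {
        lemma: len(features & active_features) + rng.gauss(0, noise_sd)
        for lemma, features in LEMMA_FEATURES.items()
    }
    return max(activation, key=activation.get)


def name_picture(active_features, noise_sd=1.0, seed=None):
    """Retrieve a lemma, then pass activation on to its phonemes."""
    rng = random.Random(seed)
    lemma = select_lemma(active_features, noise_sd, rng)
    return lemma, LEMMA_PHONEMES[lemma]


def error_rate(trials, noise_sd, seed=0):
    """Share of trials on which a lemma other than cat wins."""
    rng = random.Random(seed)
    errors = sum(
        select_lemma(CAT_PICTURE, noise_sd, rng) != "cat" for _ in range(trials)
    )
    return errors / trials


print(name_picture(CAT_PICTURE, noise_sd=0.0))
print(error_rate(2000, noise_sd=1.0))
```

Without noise the cat lemma always wins because it receives the most activation; with noise, dog or rat occasionally wins, which corresponds to a slip of the tongue.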

Let us now consider how connectionist networks could be added to Barsalou’s (1992) production model in figure 6.2. My suggestions are straightforward and necessarily simplified; more sophisticated suggestions for using connectionist networks in production models can be found in Levelt, Roelofs, and Meyer (1999). The Aphasia Model uses connectionist networks to traverse two of the steps in the production model. The first step is lemma retrieval, where at level 2 lemmas corresponding to particular semantic features are selected. Barsalou’s production model assumes that there will always be a perfect match between semantic features and lemmas, but Dell, Chang, and Griffin’s (2001) research shows that this is not the case. Therefore, Barsalou’s model could be improved by specifying that lemmas are retrieved using a two-layer connectionist network at level 2 that corresponds to the top and middle layers of the Aphasia Model in figure 6.3. The second step in the production model that is relevant to the Aphasia Model is phoneme retrieval, where at levels 4 and 5 individual phonemes that match the selected lemmas are called up. Again, the production model assumes that there will always be a perfect match because each lemma is associated with a lexeme that contains the appropriate phonological representation. The Aphasia Model replaces lexeme retrieval with the connectionist network between the middle layer and the bottom layer of figure 6.3. A similar two-layer network could be added to the production model, replacing levels 4 and 5.

Before examining how a connectionist network might model production in a second language, let us take a look at a connectionist network used in a model of language comprehension. This network explicitly incorporates probabilistic constraints and, like a variable rule or the Varbrul program, claims that constraints of different weights combine to determine the probability that a particular form will be chosen.

A Comprehension Model

Townsend and Bever’s (2001) model of sentence comprehension uses both a probabilistic mechanism and a categorical generative mechanism to analyze input sentences. This model employs the method of analysis by synthesis to assign a meaning to the continuous stream of incoming speech. Townsend and Bever (2001) use the following metaphor to explain how analysis by synthesis works.

Producing speech is like taking an ordered lineup of different kinds of eggs, breaking them so each overlaps with its neighbors, then scrambling them up a bit so there is a continuous egg belt, and then cooking them. Comprehension is analogous to the problem of figuring out how many eggs there were originally, exactly where each was located, and what kind it was . . . (p. 161)

They continue:

It is clear how [an analysis-by-synthesis] scheme approaches the scrambled egg analogy. [It] starts with a particular hypothetical egg sequence, scrambles and cooks them in a virtual kitchen, and then compares the resulting virtual omelet with the actual input. When the virtual omelet matches the actual omelet, the input and cooking sequence producing the virtual omelet is confirmed as the correct analysis. (p. 164)

The analysis by synthesis procedure involves the following four steps.

Step 1 The model stores the input string of words in short-term memory.

Step 2 Using probabilistic procedures (to be described below), the model assigns a syntax-like structure (called pseudosyntax) and a likely meaning to the input string.

Step 3 Using the output of step 2 as input, the model attempts to generate a grammatical derivation using the machinery of the Minimalist Program (Chomsky, 1995). This generative procedure is referred to as real syntax.

Step 4 If the real syntax generates a grammatical derivation, it means that the pseudosyntax analysis is correct, and the meaning generated by the real syntax is assigned to the input string. If the real syntax cannot generate a grammatical derivation, it means that the pseudosyntax analysis is not grammatical, and the model tries again having eliminated one possible pseudosyntax analysis.

As an example of this procedure, we will consider how the comprehension model works on the input sentence “Athens attacked Sparta.”

Step 1 The model stores “Athens attacked Sparta” in short-term memory.

Step 2 The model looks up the meanings and grammatical categories of the lexical items in the input sentence. This results in a mental representation like this:

[Athens]N [attacked]V [Sparta]N

This mental representation is then segmented into phrases. Segmentation is accomplished using the “projection principle,” mentioned earlier, which states that nouns must be part of a noun phrase, verbs must be part of a verb phrase, etc., and that together these constituents make up a clause (that is, S). The resulting string looks like this:

((((Athens)N)NP) (((attacked)V)VP) (((Sparta)N)NP))S

Recall that in the production model in figure 6.2, existing semantic case roles were matched with NPs before a syntactic frame was constructed. The comprehension model reverses that process, assigning semantic case roles to the existing NPs in the syntactic structure above. This assignment is based on the fact that in English the prototypical (i.e., most likely) relationship between the syntactic order of major constituents and the case assignment of NPs is:

NPagent V NPpatient

After semantic case roles are assigned, the pseudosyntax analysis looks like this:

((((Athens)N)NP) (((attacked)V)VP) (((Sparta)N)NP))S
    AGENT                             PATIENT

Step 3 The pseudosyntax representation above is used as input for generating a complete derivation using the procedures of the Minimalist Program. These procedures will not be described here, but basically they accept as input candidate strings of words marked for grammatical category, such as the pseudosyntax string above, and then generate a full derivation that maps a surface structure onto a meaning.

Step 4 If the real syntax derivation does not crash, it means that the pseudosyntax analysis of the sentence (the virtual omelet) matches the input string (the input omelet), so the meaning that the Minimalist procedure internally generated for “Athens attacked Sparta” is correct, and that meaning is assigned to the input string.
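The four steps can be sketched as a propose-and-check loop in Python. This is a schematic illustration only. The lexicon, the template list, and every function name are our own assumptions; in particular, `real_syntax_accepts` is a placeholder that accepts every candidate, standing in for the Minimalist derivation that the text does not describe, and the second template is invented so that the retry in step 4 has something to fall back on.

```python
LEXICON = {"Athens": "N", "attacked": "V", "Sparta": "N"}

# Pseudosyntax templates, most likely first: (category pattern, case roles).
# The second entry is invented purely to illustrate the retry in step 4.
TEMPLATES = [
    (("N", "V", "N"), ("AGENT", None, "PATIENT")),
    (("N", "V", "N"), ("PATIENT", None, "AGENT")),
]


def pseudosyntax(words, rejected):
    """Step 2: propose the most likely template not yet eliminated."""
    categories = tuple(LEXICON.get(word) for word in words)
    for template in TEMPLATES:
        pattern, _roles = template
        if pattern == categories and template not in rejected:
            return template
    return None


def real_syntax_accepts(words, template):
    """Step 3 placeholder: a real model would attempt a full derivation."""
    return True


def comprehend(sentence, accepts=real_syntax_accepts):
    words = sentence.split()                      # step 1: store the input
    rejected = []
    while True:
        template = pseudosyntax(words, rejected)  # step 2
        if template is None:
            return None                           # no analysis is left to try
        if accepts(words, template):              # step 3
            _pattern, roles = template            # step 4: assign the meaning
            return {w: r for w, r in zip(words, roles) if r}
        rejected.append(template)                 # step 4: eliminate and retry


print(comprehend("Athens attacked Sparta"))
```

Passing a stricter `accepts` function shows the retry: if the checker rejects the agent-first analysis, the loop eliminates it and falls back on the next template in the list.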

Now let us consider how a connectionist network is involved in the pseudosyntax analysis. First, it is important to point out that in the process of analyzing an input string the model does not proceed entirely serially, as is implied by the four-step description above. Rather, parallel processing occurs while the input is being received. This allows for future input to be anticipated on the basis of the input that has been received so far. For example, the sentence “Athens attacked Sparta” might be analyzed as NP V NP after the model has received only the morphemes “Athens attack . . .” The NP V NP analysis is then assigned because in English the sequence NP V is very frequently followed by another NP. The pseudosyntax component of the comprehension model possesses a number of syntactic templates, such as NP V NP, which correspond to grammatical constructions in the language, and an entire syntactic template can be called up after only the initial elements have been received. However, it is possible for the pseudosyntax to miss its guess and to call up the wrong template. For example, if the input string continues as “Athens’ attack on Sparta failed,” the NP V NP template must be discarded and the following template substituted:

(Ngenitive NP)NP PP V

The on-the-fly selection of templates is governed by probabilistic associations of word strings with possible syntactic structures. One cue that is used to select a particular syntactic template is the kind of complement structure that is likely to follow a particular verb. To illustrate this process, we will review a study of reading comprehension by Spivey and Tanenhaus (1998), which looked at sentences like (5).

(5) The actress selected by the director believed that her performance was perfect.

When a reader of this sentence reaches the word selected, two interpretations of the sentence are possible. Selected might be the past participle of a passive construction in a reduced relative clause (as, in fact, is the case in (5)), so that the complement of selected is “actress.” In other words, the sentence would be interpreted as synonymous with “The actress that was selected by the director believed that her performance was perfect.” The second possible interpretation is that selected is the past tense verb of a main clause, so that its complement will turn out to be some new NP, as in the sentence “The actress selected a new hair dryer.” When the reader has finished reading the word selected, the pseudosyntax will provisionally assign a template matching one of these interpretations, but which one?

In Spivey and Tanenhaus’s (1998) model, the decision of whether to choose a reduced relative clause (RR) template or a main clause (MC) template on the basis of the input string “The actress selected . . .” is made by a two-layer connectionist network which, unlike the Aphasia Model, explicitly incorporates probabilistic constraints. We will now take a look at a simplified version of that network.

Let us first ask what factors in the linguistic environment might favor the selection of the RR or the MC template when the reader reaches the word selected. One factor could be whether the reader has already picked up the word by in peripheral vision. By would favor the RR interpretation, though the MC interpretation would still be possible, as in the sentence “The actress selected by lot a new hair dryer.” Suppose that an examination of an English corpus revealed that by following a verb ending in -ed results in an RR in 85 percent of the cases and in an MC in 15 percent of the cases. This situation could be modeled in the connectionist network in figure 6.4.

Figure 6.4 says that when the by node is activated, it sends different amounts of activation to the RR node and the MC node: the strength of the activation to the RR node is .85 and the strength of the activation to the MC node is .15. This activation results in different weightings of the RR node and the MC node. Either node could be activated, resulting in either template being called up, but the RR node’s chance of being activated is 85 percent, and the MC node’s chance of being activated is 15 percent. As Preston (2002) points out, this is like flipping a loaded coin.
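The loaded coin can be simulated directly. In the Python sketch below the weights .85 and .15 come from the hypothetical corpus figures in the text, while the names `WEIGHTS`, `choose_template`, and `rr_share` are ours.

```python
import random

# Connection strengths from the by node to the two template nodes.
WEIGHTS = {"RR": 0.85, "MC": 0.15}


def choose_template(weights, rng):
    """Activate one template node with probability proportional to its weight."""
    templates = list(weights)
    return rng.choices(templates, weights=[weights[t] for t in templates])[0]


def rr_share(trials=10_000, seed=0):
    """Share of trials on which the RR template is called up."""
    rng = random.Random(seed)
    picks = [choose_template(WEIGHTS, rng) for _ in range(trials)]
    return picks.count("RR") / trials


print(rr_share())
```

Over many trials the RR template is called up on roughly 85 percent of them, although on any single trial either template may be chosen.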

Figure 6.4 Simplified version of Spivey and Tanenhaus’s (1998) connectionist model for selecting an RR or MC template (see text).

But what if the human being whose neural connections we are modeling changes majors from psychology to English? The academic prose style in the new discipline contains far fewer passive constructions than the prose style in psychology. Consequently, the weightings in the connections in the network in figure 6.4 are no longer optimal and need to be adjusted. This is possible by means of a process called “error back propagation,” which is a lot like the old behaviorist notion of operant conditioning. When the comprehension model reaches the word selected and activates either the RR or the MC node, the model will quickly learn whether its choice was correct because it will encounter more words in the input string which may or may not fit the template it has called up. For example, suppose that the network chooses the RR node, but the word after by in the input string turns out to be lot, meaning that the network has guessed wrong. In this case, the weights connecting by to the output nodes can be slightly adjusted, with the connection to the RR node being slightly decreased and the connection to the MC node being slightly increased. Using the method of error back propagation over many thousands of trials, a connectionist network can adjust its weights to reflect the probabilities with which features of the input correlate with the various output possibilities, so that the network can most efficiently analyze incoming data. It is thought that something like error back propagation is the mechanism by which Hudson Kam and Newport’s (2005) informants learned the frequencies of articles in the input provided to them. However, the neural mechanisms in human beings must be much more efficient than error back propagation run on a computer, because Hudson Kam and Newport’s (2005) informants required only dozens of trials, not thousands.
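The gradual reweighting described here can be sketched in Python. What follows is a deliberately simplified error-driven update (a delta rule applied to a single connection), not full error back propagation and not the procedure of any published model. The new rate of .60 for RR after by, the learning rate, and the number of trials are all invented for the illustration.

```python
import random


def retrain(w_rr, true_p_rr, trials, learning_rate, rng):
    """Nudge the by->RR weight toward what the input actually shows.

    After each sentence the network finds out whether the structure was
    an RR (target 1.0) or an MC (target 0.0) and moves the weight a small
    step in that direction. The by->MC weight is kept as the complement.
    """
    for _ in range(trials):
        target = 1.0 if rng.random() < true_p_rr else 0.0
        w_rr += learning_rate * (target - w_rr)
    return w_rr, 1.0 - w_rr


rng = random.Random(0)  # fixed seed only so that the run is repeatable

# Start from the psychology-prose weights of figure 6.4 (.85 / .15) and
# expose the network to a hypothetical new register in which "by" after
# an -ed verb signals an RR only 60 percent of the time.
w_rr, w_mc = retrain(w_rr=0.85, true_p_rr=0.60, trials=20_000,
                     learning_rate=0.002, rng=rng)
print(f"by->RR weight: {w_rr:.2f}   by->MC weight: {w_mc:.2f}")
```

Because each step is small, the weight drifts from .85 toward the new rate only over thousands of trials, which is the contrast with human learners drawn in the text.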

Of course, the connectionist network in figure 6.4 is far too simple to effectively analyze input strings that might contain an RR or an MC. Factors besides the presence or absence of by must be included in the network, and Spivey and Tanenhaus’s (1998) model includes three other factors, which are shown in figure 6.5.

The black (activated) node to the right of the by node in figure 6.5 represents information in the discourse previous to the input sentence. In the case of the example sentence, if two actresses were mentioned in the preceding discourse, the RR interpretation would be expected in order to identify which of the two actresses is being referred to. Suppose an examination of a corpus showed that when two agents were mentioned in the previous discourse, an RR occurred in 67 percent of the cases and an MC occurred in 33 percent of the cases. This means that the weight of the connection from the discourse node to the RR node should be .67 and the weight of the connection to the MC node should be .33.

The third factor that the network takes into account is the probability that an RR or an MC will follow the particular verb in the sentence. This factor is represented by the node labeled “verb.” Suppose that an examination of discourse showed that the verb selected is followed by an RR in 60 percent of cases and by an MC in 40 percent of cases. This means that the weight of the connection to the RR node should be .60 and the weight of the connection to the MC node should be .40.

Figure 6.5 Spivey and Tanenhaus’s (1998) connectionist model for selecting an RR or MC template (adapted).

The fourth factor that the network takes into account, represented by the rightmost activated node in the constraint layer, is the bias toward interpreting an initial noun phrase + verb sequence as a main clause rather than a reduced relative. Again, suppose a corpus shows that 85 percent of sentence-initial sequences of noun phrase + verb are main clauses and 15 percent are reduced relatives. The appropriate weights will be .15 for the RR node and .85 for the MC node.

The four activated nodes on the top layer of figure 6.5 represent four independent events (the presence of by, the presence of two possible referents in the previous discourse, etc.), and it is assumed that each of these events contributes its probability equally to the overall probability of activating either the RR node or the MC node. Therefore, the overall probability of activating one of the bottom layer nodes can be calculated by multiplying the probability of each top node by one quarter and adding the four products. The results of these calculations for the test sentence are .57 for activating the RR node and .43 for activating the MC node, as shown in figure 6.5.
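The arithmetic behind the .57 and .43 figures can be checked with a short Python sketch. The dictionary keys and the function name combine are labels introduced here for convenience. For the main clause bias the sketch follows the corpus proportions stated in the text (15 percent reduced relatives, 85 percent main clauses), which is the assignment that reproduces the totals reported for figure 6.5.

```python
# Weights from each activated constraint node to the two template nodes.
CONSTRAINTS = {
    "by": {"RR": 0.85, "MC": 0.15},
    "discourse": {"RR": 0.67, "MC": 0.33},
    "verb": {"RR": 0.60, "MC": 0.40},
    "main_clause_bias": {"RR": 0.15, "MC": 0.85},
}


def combine(constraints):
    """Give every constraint an equal share and sum the weighted probabilities."""
    share = 1.0 / len(constraints)  # one quarter when there are four constraints
    totals = {"RR": 0.0, "MC": 0.0}
    for weights in constraints.values():
        for node, probability in weights.items():
            totals[node] += share * probability
    return totals


totals = combine(CONSTRAINTS)
print({node: round(value, 2) for node, value in totals.items()})
```

The unrounded values are .5675 for RR and .4325 for MC, and they sum to 1 because each constraint’s own pair of weights does.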

Now let us compare the connectionist network in the Aphasia Model to the connectionist network for pseudosyntax just described. Both of these networks, like all connectionist networks, are non-deterministic; that is, they incorporate an element of chance. In this respect they are similar to the Varbrul program and different from categorical linguistic rules or Barsalou’s (1992) unmodified production model in figure 6.2, both of which are deterministic systems that march along a predetermined path from input to output. Thus, chance plays a role in both the Aphasia Model and the pseudosyntax model, but it plays a greater role in the pseudosyntax model. This is because the Aphasia Model is designed to show how a connectionist network can model slips of the tongue and other infrequent errors in the speech of native speakers. As discussed earlier, these errors result only from “noise in the system.” In Townsend and Bever’s (2001) template selection process, on the other hand, there is no “correct” output. The task is to calculate the probability of selecting the RR node or the MC node when the four events represented by the black top layer nodes of figure 6.5 occur. The network can make an informed guess, but it might turn out to be wrong. To suggest a gambling metaphor similar to Preston’s (2002) loaded coin, the situation is like that of a blackjack player who has been dealt two face cards and must decide whether to hit or stand. The odds certainly favor standing, but that might not turn out to be the right thing to do. This situation is exactly the same as that modeled by the Varbrul program or a variable rule, where independent constraints associated with differing probabilities combine to determine the likelihood of alternating forms, such as N or G.

An intriguing feature of Townsend and Bever’s (2001) comprehension model is that in step 3 of the process the generative machinery of the Minimalist Program is used to mentally generate sentences in real time. As in the Derivational Complexity Theory, discussed in chapter 1, a generative theory that was supposed to account for competence has been pressed into service for use in a performance model. Thus, the degree of abstraction between the competence theory and the performance theory has been narrowed to almost zero. Townsend and Bever (2001) comment:

[Our model] appears to be a quite promising reunification of psychological modeling and linguistic theory . . . Such unification has been lacking generally, since the formulation of the Aspects model in the mid-1960s. Later syntactic architectures, especially government and binding, provided a hodgepodge of theoretical systems of constraints, each of which might correspond to psychological operations . . . [But], hope springs eternal: perhaps a new “derivational theory” of the psychological operations involved in assigning syntactic derivation is at hand. (p. 179)

Elliott’s Study of Spanish Acquisition

We now turn to a study of second language acquisition that found evidence of probabilistic learning, which we will attempt to model using a connectionist network. Elliott (1995) studied the acquisition of the Spanish clitic se when it is used reflexively. His database consisted of computer conferencing messages written by American college students, which he analyzed using the Varbrul program. Eighty-six students were represented in his study, and they produced a total of 2,247 tokens. The students wrote messages to their classmates at least weekly and often daily. They wrote mostly about class assignments, but they also discussed personal topics and campus events. The students received credit from their instructors for their postings, but were not graded on grammatical accuracy.

In Spanish, the clitic se has a number of related uses, one of which is as a direct object pronoun in a transitive sentence, as in (6).

(6) Nuria me cortó.
    Nuria me cut
    “Nuria cut me.”

When the agent and theme are the same in such a sentence, the object pronoun functions as a reflexive, as in (7).

(7) a. Yo me corté.
       I myself cut
       “I cut myself.”

    b. Tú te cortaste.
       you yourself cut
       “You cut yourself.”

    c. Él/Ella se cortó.
       he/she him-/herself cut
       “He/She cut him-/herself.”

    d. Nosotros nos cortamos.
       we ourselves cut
       “We cut ourselves.”

    e. Ellos/Ellas se cortaron.
       they (masc./fem.) themselves cut
       “They cut themselves.”

Notice that in the sentences above, the reflexive pronoun has four different forms: me, te, nos, and se, depending on the referent. For convenience I will use se as a generic to refer to all of these forms.

Se also has a number of other uses that are conceptually related to the transitive, reflexive use illustrated above, but which are not equivalent to English reflexives. One of these is the so-called “middle experiencer” use, which is illustrated in (8).

(8) a. Ella se molesta con tus preguntas.
       she herself annoys with your questions
       “She’s getting annoyed with your questions.”

    b. Yo me alegro de verte.
       I myself make happy of to see you
       “I’m happy to see you.”

    c. Yo me puse triste.
       I myself put sad
       “I got sad.”

According to Bull (1965) the transitive use of se in (6) and (7) is “logical” because the agent can act upon the patient, who, in the reflexive case, happens to be the same as the agent. Therefore, according to Bull, the reflexives in (7) are “true” reflexives. But, he says, the middle experiencer use of se, as in (8), is not “logical” because the verbs do not refer to a notion that an agent can perform on a patient. Thus, the verb molestar, “annoy,” is used in its “logical,” transitive use in (9) because one person can annoy another person with questions.

(9) Juan la molesta a Nuria con sus preguntas.
    Juan her annoys to Nuria with his questions
    “Juan annoys Nuria with his questions.”

But, in (8)a molestar is used in its middle experiencer, “non-logical” use because one person cannot annoy herself with another person’s questions.

Another difference between the true reflexive and the middle experiencer use of se has to do with the semantic notions of the verbs in question. Kemmer (1993) provides a cognitive linguistics account (see chapter 8) of se that categorizes verbs into semantic domains. Semantic domains include the physical domain as in (7), the emotional domain as in (8), the social domain as in (10), and the ideational domain as in (11).

(10) Juan se casó.
     Juan himself married
     “Juan got married.”

(11) Juan se preguntó dónde estaba el dinero.
     Juan himself asked where was the money
     “Juan wondered where the money was.”

According to Kemmer (1993), the middle experiencer use of reflexive se occurs only in the emotional domain.

Another fact about the middle experiencer use of se often noted by grammarians is that the referent of se is somehow changed by the experience named by the verb. This is clear in (8)b and (8)c, but less so in (8)a.

Elliott performed a Varbrul analysis using a number of factors as independent variables. The demographic variables included information about the informants, such as age, sex, and the number of years of Spanish instruction. The linguistic variables included the forms of the pronoun se (that is, me, te, nos, or se), the tense of the verb, and the semantic domain of the verb. In order to keep things simple, I will not present Elliott’s results using Varbrul p values because there were a number of interacting factors that made the analysis complicated. Instead, I will just use percentage figures, but the Varbrul results are consistent with what I will claim. Also, I am only going to discuss the effect of the semantic domain of the verb.

Elliott (1995) found that his subjects used se least accurately with emotional domain verbs. The main reason for the inaccuracy with these verbs was overgeneralization. Thus, students tended to turn emotional domain verbs that are not middle experiencer reflexives into middle experiencer reflexives, as when (12)a is erroneously turned into (12)b and (13)a is erroneously turned into (13)b.

(12) a. Odio esta comida.
        I hate this food
        “I hate this food.”

     b. *Me odio esta comida.
        myself I hate this food
        “I hate this food.”

(13) a. Amo a Julia.
        I love to Julia
        “I love Julia.”

     b. *Me amo a Julia.
        myself I love to Julia
        “I love Julia.”


The percentage of correct usage of se within each semantic domain is shown in table 6.1. As the table shows, the students were least accurate in the emotional domain.

As already noted, the main reason for the inaccuracy in the emotional domain was overgeneralization. The percentage of overgeneralization of se within each semantic domain is shown in table 6.2.

As table 6.2 shows, students had a strong tendency to incorrectly use se with transitive emotional domain verbs, which should not take se.

Elliott’s Results Interpreted as a Connectionist Network

Let me now offer an informal explanation of why Elliott’s (1995) informants overgeneralized se in the emotional domain. Notice that some emotional domain verbs work as true, “logical” reflexives. Odiar in (14) and amar in (15) are used in this way:

(14) Me odio. “I hate myself.”
(15) Me amo. “I love myself.”

But, as we have seen, many emotional domain verbs are middle experiencer reflexive verbs. Although Bull (1965) calls these reflexives “non-logical,” there is some logic in their use because, as mentioned, they imply a change in the emotional state of the experiencer, as illustrated in (8). Thus, a learner’s unconscious hypothesis about emotional domain verbs might be:

If a verb causes a change of state in the subject of the sentence, show this by using se.

But it may be very difficult to decide, consciously or unconsciously, whether a particular verb is considered to cause such a change of state. For example, laugh is considered to cause a change in the one who laughs, as in (16), but love is not considered to cause a change in the one who loves, as in (17).

(16) Me reí de mi hermano. “I laughed at my brother.”
(17) Amo a Julia. “I love Julia.”

Table 6.1 Percentage of correct se usage by semantic domain in Elliott’s (1995) study.

Semantic domain    Physical   Social   Ideational   Emotional
Correct (%)        54         46       38           24

Table 6.2 Percentage of overgeneralization of se by semantic domain in Elliott’s (1995) study.

Semantic domain          Physical   Social   Ideational   Emotional
Overgeneralization (%)   52         20       69           86

It is easy to see how the fuzzy boundary between transitive verbs in the emotional domain (which are not reflexive) and middle experiencer verbs in the emotional domain (which are reflexive) can be crossed, and transitive verbs within the domain erroneously produced with se. We can now attempt to provide a connectionist explanation for this behavior.

Figure 6.6 shows a connectionist network where the top layer represents the semantic features in the preverbal message that activate the lemmas for particular verbs. Among these features is information regarding the semantic domain and whether the situation causes a change of state in the experiencer, as represented by the two black nodes.

Figure 6.6 A connectionist interpretation of Elliott’s (1995) data.

Notice that for the learner represented in this model, amar is erroneously connected to the experiencer changes node. The bottom layer of the network contains a node representing the activation of se and a zero node representing no clitic particle. Overgeneralization of se for use with amar can occur in two ways. One way is the so-called cascading phenomenon, or random activation of nodes throughout the system. During the acquisition period, se can be randomly activated with any verb until connections between the verb nodes and the se node or zero node have been firmly established. This could account for Elliott’s subjects’ overgeneralizing se to verbs outside the emotional domain. But, as we have seen, overgeneralization within the emotional domain was far more frequent. This could be explained by the second way in which se could be activated. In order to activate amar, all of the top layer semantic nodes that connect to amar are normally activated first, including the emotional domain node. Because this node is also connected to reír, molestar, and other middle experiencer verbs, some activation will continue through these middle layer nodes (though they will not be activated) on to the se node, which may then receive enough activation to be selected instead of the zero node. This is the connectionist account of overgeneralization. Furthermore, because activation is a two-way street, spreading upward as well as downward, every time the speaker erroneously produces amar with se, the dotted line connecting the two nodes will be strengthened and the line between amar and zero will be weakened.
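The second route to overgeneralization can be illustrated with a toy calculation in Python. Every number in the sketch is invented: neither Elliott (1995) nor figure 6.6 supplies link strengths, and the leak value (the share of activation passed on by verbs that are primed but not selected) is an assumption made only to show the direction of the effect.

```python
# Hypothetical link strengths from verb lemmas to the two bottom-layer nodes.
VERB_TO_CLITIC = {
    "amar": {"se": 0.2, "zero": 0.8},      # 0.2 stands for the erroneous dotted link
    "reir": {"se": 1.0, "zero": 0.0},      # treated here as always taking se
    "molestar": {"se": 1.0, "zero": 0.0},  # middle experiencer use only
}
LEAK = 0.25  # invented share of activation passed on by unselected verbs


def clitic_activation(target, neighbours, leak=LEAK):
    """Sum the activation reaching the se and zero nodes.

    The target verb passes on its full activation; neighbouring verbs
    primed by the shared emotional domain node pass on only a fraction.
    """
    totals = {"se": 0.0, "zero": 0.0}
    for verb in [target, *neighbours]:
        strength = 1.0 if verb == target else leak
        for node, weight in VERB_TO_CLITIC[verb].items():
            totals[node] += strength * weight
    return totals


def se_share(totals):
    """Proportion of the total activation that reaches the se node."""
    return totals["se"] / (totals["se"] + totals["zero"])


isolated = clitic_activation("amar", [])
primed = clitic_activation("amar", ["reir", "molestar"])
print(f"se share without neighbours: {se_share(isolated):.2f}")
print(f"se share with emotional domain neighbours: {se_share(primed):.2f}")
```

With these made-up values the se node’s share of the activation more than doubles once the neighbouring middle experiencer verbs are primed, which is the pattern the account above appeals to.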

Earlier, I claimed that a connectionist production story was also an acquisition story. Let me now return to that topic. As we have seen, a connectionist network can learn the appropriate connections and weightings using the mechanism of error back propagation. When a learner hears or reads amar used correctly (without se), the connection between the amar node and the zero node is strengthened, and the connection between the amar node and the se node is weakened. This is the connectionist instantiation of Krashen’s input hypothesis in SLA theory. Furthermore, when a learner utters or writes amar correctly and notices that it seems right, or uses amar incorrectly and notices that it seems wrong, the same connections are strengthened and weakened. This is the connectionist instantiation of Swain and Lapkin’s (1989) output hypothesis. Thus, the reweighting of connections in a network is compatible with two influential learning theories in SLA. Furthermore, the connectionist account grounds these abstract theories in actual mental processes and perhaps (if like Feldman [2006] we go beyond the metaphorical interpretation of connectionist networks) in actual brain functioning. Notice that in the account above, I have claimed that the connectionist component of a production model is also a component of a comprehension model because the same connections are strengthened or weakened when learners both comprehend and produce sentences. This claim is generally accepted, but the details are controversial.

The connectionist interpretation of Elliott’s (1995) data shows how a connectionist model can learn categorical rules, such as which emotional domain verbs take se. Now let us consider how it might learn a variable rule. Over time the connection between amar and se will be weakened by correct input and output until it disappears. But what if we are dealing with a variable linguistic feature, say pronouncing -ing words as either G or N, as discussed in chapter 2? Just like the human participants in Hudson Kam and Newport’s (2005) experiment, the connectionist network could be trained to reproduce the frequencies of each of these forms as encountered in the input. But, of course, learning the basic frequencies (or in Varbrul terms input frequencies) of G and N is only the beginning of what a speaker needs to know. Real speakers must accurately learn the percentages of linguistic features that are associated with many factors in the speaking context, including the formality of the speaking situation, the speaker’s age, social class, gender, etc. In principle, a connectionist network could handle this finely tuned frequency learning by dedicating nodes in the top layer to each of the relevant features in the speaking context. That is, the network could learn not only the overall probability of connecting a verb stem with G or N, but it could also learn to modify this baseline probability depending on the identity of the speaker, the listener, the formality of the speaking situation, and other contextual factors. In terms of Preston’s (2002) metaphor, this is how the coins are loaded. Thus, a connectionist network shared by the production and comprehension systems may be the mysterious score keeper that Bickerton (1971) called for, which connects Variation Theory to psycholinguistics.
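To make the idea of a baseline probability modified by context concrete, here is a minimal Python sketch with one always-on node and one context node for a formal speaking situation, both feeding a single logistic output unit for the N variant. It is an illustration under stated assumptions, not a model from the literature: the input rates (N heard 70 percent of the time in informal talk and 20 percent in formal talk), the learning rate, and the number of trials are invented.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical input rates of the N variant (not taken from any study).
TRUE_P_N = {"informal": 0.70, "formal": 0.20}


def train(trials=40_000, learning_rate=0.02, seed=1):
    """Learn a baseline weight plus an adjustment for formal situations."""
    rng = random.Random(seed)  # fixed seed only so that the run is repeatable
    baseline = 0.0        # weight from the always-on node
    formal_weight = 0.0   # weight from the "formal situation" node
    for _ in range(trials):
        context = rng.choice(["informal", "formal"])
        formal = 1.0 if context == "formal" else 0.0
        heard_n = 1.0 if rng.random() < TRUE_P_N[context] else 0.0
        predicted = sigmoid(baseline + formal_weight * formal)
        error = heard_n - predicted
        # Error-driven update: only nodes that were active are adjusted.
        baseline += learning_rate * error
        formal_weight += learning_rate * error * formal
    return baseline, formal_weight


baseline, formal_weight = train()
p_informal = sigmoid(baseline)
p_formal = sigmoid(baseline + formal_weight)
print(f"P(N) informal: {p_informal:.2f}   P(N) formal: {p_formal:.2f}")
```

The learned weight on the formal node comes out negative, so activating that node lowers the probability of N below the baseline. Adding further nodes for speaker age, social class, and so on would extend the same scheme.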

Conclusion

In chapter 1, we discussed the relationship between a competence grammar and a performance model of speech production or comprehension, noting that although Chomskian generative grammars can be consistent with performance models, they are conceived at the level of competence, a level of abstraction that is not compatible with modeling probabilistic patterns. In this chapter we have seen how the facts of language variation in the speech of both native speakers and language learners can be accounted for in performance models of production, comprehension, and learning that use the probabilistic mechanism of a connectionist network. However, it would be nice to have a grammatical theory (not just a performance model) that allows for probabilistic behavior and is compatible with connectionist modeling. In the next chapter, we will take a look at such a theory, which is called cognitive linguistics. We have already had a glimpse of this theory because it was the framework in which Elliott (1995) conducted his study of middle experiencer verbs.


7 Cognitive Linguistics

Introduction

In chapter 6, we saw that psycholinguists have incorporated probabilistic mechanisms in models of language production and comprehension. The mechanism used in both of the models we reviewed was the connectionist network. In this chapter, we will review a school of linguistics called cognitive linguistics (CL) that is compatible with probabilistic mechanisms. We will see that CL is compatible with connectionism and can provide an abstract characterization (a grammatical characterization if you like) of the workings of connectionist networks. I will further claim that both CL and connectionism are compatible with variation theory. CL can provide a theoretical rationale for some variable language phenomena, and connectionism can provide at least the beginning of an answer to Bickerton’s (1971) question, discussed in chapter 1, of how the mind can learn and keep track of the probabilistic patterns studied by variationists.

CL was developed in the 1980s by a loosely associated group of linguists who sometimes disagreed on certain principles and who did not always use the same terms and symbols. Therefore, several branches of CL have emerged. However, all the branches agree on basic principles, which contrast with those of generative grammar. These principles include the following: (1) considerations of meaning are necessary in doing grammatical analysis; (2) categorization is basic to human understanding and human beings construct grammatical categories like “noun” and “ditransitive construction” in the same way that they construct natural semantic categories like “cup” and “bird;” and (3) the boundaries between semantic and linguistic categories are often fuzzy, and therefore language processing can involve making decisions about what category a particular linguistic form belongs to based on probabilities.

CL is like generative grammar in that it aims to show the relationship between an utterance (or phonological representation) and a meaning (or semantic representation). However, CL is unlike generative grammar in that it attempts to show this relationship as directly as possible, without using highly abstract devices like empty categories and traces. A CL description involves only three kinds of structures: phonological, semantic, and symbolic. We will examine examples of all of these, but in order to provide the background for that discussion, we will first consider the CL story of categorization.


Prototype Categories

The question of what to call the objects around us is central to language study. What makes a container for liquid a cup and not a mug or a bowl? What makes a flying creature a bird and not a bat? Questions like these involve the mental process of categorization, and people engage in it all the time. At the subconscious level, we must decide whether to call the color of our lost suitcase “green” or “blue.” At the conscious, and even legal, level we must decide whether a college student is in-state (and entitled to reduced tuition) or out-of-state.

The traditional theory of categorization in linguistics and philosophy goes back to Aristotle, who said that members of a category share certain defining features. An even counting number is any number that can be divided evenly by two. A human being is a featherless biped. But Wittgenstein (1953) pointed out that there are some categories that don’t seem to have defining features, such as “game.” It might seem that a game is any activity involving competition with other people, like basketball, or competition against odds, like roulette. But bouncing a ball against the side of a building just for fun could be called a game, and in this case competition seems to be absent. Ball bouncing is a game because it has some of the features of more typical games, such as the manipulation of a ball, and certain rules (you have to hit the building). Wittgenstein (1953) theorized that members of the category “game” do not have defining features, but rather share a “family resemblance.” In fact, two games may have nothing in common, but each may have different features of more typical games, just as two sisters may not look alike but are obviously members of the same family because one has her mother’s hair and the other her father’s skin.

Labov (1973) carried out a psycholinguistic experiment where he showed subjects pictures of cup-like objects and asked them to name the objects. He found that there were some pictures that everyone called a cup, but as the vessel became shorter and larger in diameter more and more subjects began calling it a bowl. Characteristics besides the vessel’s dimensions also influenced naming. If the vessel had a handle, it was more likely to be called a cup, but if it was filled with potatoes it was more likely to be called a bowl. Labov’s (1973) experiment showed that there are some categories, like cup and bowl, that have fuzzy boundaries and gradually blend into other categories. Nevertheless, people do not have trouble deciding what vessel to serve coffee in because these categories have prototypical members about which everyone agrees. Such categories are called prototype categories.

The peripheral members of a prototype category are of particular interest to variation theory because they allow one to observe prototype effects, or variable judgments of category membership. In order to discuss prototype effects let us take a look at some of the pioneering experiments conducted by Rosch in the 1970s. Rosch (1973) asked subjects to identify pictures of different kinds of birds. She found that they could more quickly identify robins and sparrows than chickens and hawks. This finding suggests that people think about birds in terms of best examples, or prototypical birds, rather than in terms of atypical birds. CL proposes that the mental category BIRD has a prototype structure, with typical members at the center and atypical members at the periphery, as shown in figure 7.1. This kind of prototype category is called a radial category, and we will encounter further examples later in the chapter.

Figure 7.1, which is for purposes of illustration and only loosely based on research, is a schematic representation of the category BIRD, showing its central and peripheral members and their features. Robins and sparrows are the two central members of the category. They share the features + flies, + small, and + widely distributed. Swallows and doves come close to the prototypes because they share the first two of these features. Ostriches and penguins are the most peripheral members of the category because they lack a widely shared characteristic: the ability to fly.

Figure 7.1 The prototype category BIRD. Bold type indicates features shared by subcategories.

CL claims that central members of the category, or exemplars, are used for informal reasoning about the category as a whole. For example, if someone in Tucson says, “I saw a bird on the patio,” the listener will picture a small flying bird, not a quail or a roadrunner, even though these species are common in Tucson. Rips (1994) demonstrated a similar phenomenon experimentally. He told one group of subjects that the robins on an island were infected with a particular disease and asked whether they thought the ducks on the island would catch it. Then he told another group of subjects that the ducks on an island were infected with a disease and asked whether they thought the robins would catch it. The subjects were more likely to say that the ducks would catch the disease from the robins than vice versa. This experiment suggests that people consider a characteristic of a central member of a prototype category common to the whole category, but that they consider a characteristic of a peripheral member particular to that member. As we will see, speakers also use prototype categories for learning and producing morphological and syntactic structures. It should also be noted that the category BIRD is not a fuzzy or graded category like CUP and BOWL because it does not have fuzzy edges. All of the creatures in figure 7.1 are birds. Nevertheless, BIRD has prototypical members and people use these exemplars when learning and thinking about categories. Perhaps for paleontologists who study birds at the time when they were evolving from dinosaurs, the category BIRD is a graded category, blending into the category PTERODACTYL.

Symbolic Structures

As mentioned earlier, symbolic structures are one of the three kinds of structures postulated by CL. An example of such a structure is the lexical item “bird.” Following a tradition going back to Saussure [1915] (1974), CL claims that a word is represented in a speaker’s mind as a symbolic structure that shows an association between a semantic representation (that is a concept, which is customarily written in capital letters) and a phonological sequence. The symbolic representation for the lexical item bird is shown in (1).

(1)

BIRD

/bird/

The semantic representation in (1) refers to the schema for BIRD represented in figure 7.1, and the phonological representation is straightforward.

According to Langacker (1991, p. 14), speakers relate the symbolic representation for BIRD to symbolic representations for similar concepts, such as CAT, DOG, and PLANET, using an even more abstract symbolic structure. What the concepts CAT, DOG, and PLANET have in common is that they involve material objects that can be perceived as distinct from the background space in which they are located. In other words, they are things. The abstract category THING is represented in a speaker’s mind in a way that is compatible with the following symbolic structure (where /x/ stands for phonological content of some kind, which at this abstract level is unknown).

(2)

THING

/x/

Like the symbolic representation for BIRD in (1), the symbolic representation for THING in (2) associates a semantic structure (in this case a very abstract one) with some kind of phonological representation. When speakers learn a new word for something, say aardvark, they construct a sound/meaning symbolic structure like the bottom half of (3), and they associate this with symbolic structure (2), creating a more complex symbolic structure, which is shown in (3) as a whole. In other words, they note that an aardvark is a thing, like a cat, a dog, or a planet. Thus, part of learning the word aardvark is learning the association shown in (3).

(3)

THING

/x/

AARDVARK

/ardvark/

Translating (2) and (3) into more common terminology (and for the moment simplifying a great deal), (2) corresponds to the notion of noun, and (3) says that the word aardvark is a noun. Thus, (3) is similar to the lexical entry of a generative grammar shown in (4).


(4) aardvark [AARDVARK, Noun, /ardvark/]

(where AARDVARK stands for a semantic representation).

However, an important difference between the CL approach represented in (3) and the generative approach represented in (4) is that in generative grammar nounhood is a purely syntactic property, based on considerations such as whether a word can occur after a determiner or whether it can be modified to agree with a verb. CL recognizes these syntactic properties as part of what makes a word a noun but claims that a word’s meaning is the main consideration in determining its grammatical class, as explained below.

So far the discussion of the semantics of nouns has been greatly simplified. Of course, there are many nouns that are not things, like anger, moment, and yellow (as in “The painter used a bright yellow”). Langacker (1991, p. 16) points out, however, that at an abstract level these words share the essential property of nouniness mentioned above: they can be construed as entities that stand out from a background. A prototypical noun like bird is situated in the cognitive domain of physical space, where it can be distinguished from its background mainly by its shape. To understand the nouniness of yellow, we must shift to the cognitive domain of color, where the yellow region of the spectrum can be distinguished from the background of other colors (or profiled, in CL terminology) by its hue. Similarly, the word moment is a noun, in part, because within the cognitive domain of time an individual moment can be profiled or singled out from the moments that precede and follow it. In a similar way, anger can be profiled as a distinct emotion within the emotional domain. Thus, nouns are any kind of entity that, like things, can be profiled in some cognitive domain. In (2), THING should be understood as representing any such entity: bird, anger, yellow, moment, etc.

Prototype Schemas in Morphology

English Irregular Past Tense

For a look at how CL handles morphology, let us consider a form that has been much studied by variationists: the English past tense. Most English past tense forms are either regular or irregular, although some verbs, like dive, are in the process of being regularized, so that some speakers say dove, other speakers say dived, and still others alternate between the two forms. Let us consider how the first group of speakers retrieve dove according to Barsalou’s (1992) production model (see figure 6.2). At level 3 of the production model, the lemma for dive (which is marked to show that this is an irregular verb) is joined with the appropriate lexeme. The phonological representations of irregular verbs are stored in the lexeme, so at this point the sequence /dowv/ is activated. In other words, according to the model, dove, like all irregular verb forms, is memorized.

Now let us consider a speaker who says dived. For this speaker, the lemma marks dive as a regular verb. At level 3 of the production model, the lemma is joined to the appropriate lexeme, but in this case the lexeme does not contain the complete phonological representation, only the stem /dayv/. The fact that the lemma is marked for regular past causes an abstract PAST marker to be attached to the verb stem. At level 6 in the production model the familiar morphosyntactic rule for regular past tense chooses the appropriate ending from the set of regular past tense endings: /d/, /t/, or /id/.
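The choice among the three regular endings can be illustrated with a minimal sketch. It assumes the textbook conditioning (/id/ after stem-final /t/ or /d/, /t/ after other voiceless consonants, /d/ elsewhere); the segment inventory and the function names are illustrative assumptions, not part of the production model itself.

```python
# Illustrative sketch: the segment inventory is an assumption, not from the text.
# Stems are lists of segments in the chapter's broad transcription.
VOICELESS = {"p", "k", "f", "s", "ʃ", "tʃ", "θ"}  # voiceless consonants other than /t/


def regular_past_ending(stem):
    """Choose /d/, /t/, or /id/ from the stem-final segment."""
    final = stem[-1]
    if final in {"t", "d"}:
        return "id"   # e.g. want -> wanted
    if final in VOICELESS:
        return "t"    # e.g. walk -> walked
    return "d"        # voiced consonants and vowels, e.g. dive -> dived


def regular_past(stem):
    """Attach the chosen ending to the stem stored in the lexeme."""
    return stem + [regular_past_ending(stem)]


print("".join(regular_past(["d", "ay", "v"])))  # dayvd
```

For the stem /dayv/ the sketch yields /dayvd/, the form produced by the speaker whose lemma marks dive as regular.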

Now consider the case of the speaker who alternates between dived and dove. Pinker and Prince (1994) suggest that one possibility for this speaker is that the lemma for dive is marked both regular and irregular, so that either /dayvd/ or /dowv/ could eventually be produced. In this case, the two forms would alternate randomly, a case of free variation.1 However, Pinker and Prince (1994) also allow for another possibility, one that was proposed by Bybee and Moder (1983; see also Bybee and Slobin, 1982) working within the framework of CL. The formal mechanism that they propose is the same as the mechanism for representing lexical items: the prototype schema, which we will now look at in more detail.

As mentioned earlier, a schema is any mental representation. Symbolic structures are schemas. Additional examples of schemas can be found in Barsalou’s (1992) production model in figure 6.2. For example, CAT, the abstract concept of the animal (which might call up the lemma for cat, tabby, or tom, depending on the rest of the preverbal message), is a schema, as is the phonological representation /kæt/. Similarly, the tree structure at level 3 is a schema, as is the equivalent phrase structure rule S → NP VP. As noted in chapter 6, none of the schemas in figure 6.2 can model probabilistic patterns. For example, if the schema for the phonological representation of cat is accessed at level 4, only the phones specified in that schema can be sent on to level 5. Thus, the version of the production model shown in figure 6.2, which has not been modified to include connectionist networks, cannot model the fact that New York City speakers sometimes say [kæt] and sometimes say [kiyət]. A prototype schema, on the other hand, can model probabilistic patterns. To see how this is possible, let us look at the details of Bybee and Moder’s (1983) research on the learning of English irregular past tense forms.

Bybee and Moder (1983) presented adult subjects with nonce (made up) verbs like strig and asked them what the past tenses would be. If irregular past forms were simply memorized, we would expect that the nonce words would elicit regular past endings because they are not associated with a memorized irregular form. But Bybee and Moder (1983) found that this was not the case. Although their subjects sometimes produced the regular forms, they more often produced irregular forms, such as strug.


Historically, grammarians have suggested that such innovative forms are produced by analogy with real pairs of the present and past tenses of irregular verbs, like spring – sprung, and clearly this is the case. But the question remains: What is the mental mechanism by which this analogy works? How did Bybee and Moder’s (1983) subjects know that if strig were a real verb, its past tense would probably be strug? Bybee and Moder (1983) proposed that irregular verb classes are mentally represented as prototype schemas. The prototype schema for the past form of strung-type verbs (technically class II irregular verbs) is shown in (5).

(5) /s/ C (C) /ʌ/ /ŋ/

Notice that (5) is not an exemplar of a class II irregular verb (as robin is an exemplar of the bird category) because it is abstract. Exemplars are concrete examples, like strung. It may be that learners can use both abstract prototype schemas like (5) and exemplars when learning prototype categories (Goldberg, 2006, p. 47).2

According to (5), the prototype form for the past tense of a class II irregular verb has an initial /s/ followed by a consonant, optionally followed by another consonant (the parentheses indicate that the presence or absence of this third consonant does not affect the prototypicality of the form), followed by the vowel /ʌ/, followed by a velar nasal. As suggested, the past form sprung fits the prototype exactly. However, forms that have only some of the features of the prototype can also qualify as class II irregular past verbs. An example is flung, which contains a final /ŋ/, the vowel /ʌ/, and a preceding consonant, but lacks the initial /s/. Thus, flung shares a family resemblance with sprung. It is a member of the prototype category of class II past forms, but not a central member. It occupies a position similar to that of the roadrunner in the prototype category of birds shown in figure 7.1. In the case of class II past forms (as in all graded prototype categories) there are no necessary and sufficient features that distinguish its members from other past forms; rather, membership in the category is a matter of degree.

Bybee and Moder (1983) suggest that in their experiment, when the subjects were read a cue word like strig, they mentally constructed both a regular past tense form /strigd/ and an irregular class II past form /strʌg/. Then, they compared the irregular form to the prototype in (5). If the irregular form was sufficiently similar to (5), they uttered it; otherwise, they uttered the regular past form. The fuzziness of the criteria for membership in the category of class II past form accounts for the fact that Bybee and Moder’s (1983) subjects showed prototype effects in the form of disagreement in regard to forms like sigged versus sug because sug is not close to the prototype. In later sections, I will point out the similarities between prototype schemas and variable rules and how prototype schemas can be written in variable rule notation.
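The comparison of a candidate form to the prototype can be sketched as a simple feature count. This is an illustration only: the equal weighting of the four features, the 0.75 threshold, the vowel inventory, and the function names are all assumptions introduced here, not Bybee and Moder’s (1983) procedure.

```python
VOWELS = {"ʌ", "i", "æ", "u", "a", "e", "o"}  # assumed inventory for the demo


def prototype_score(form):
    """Fraction of the four features of schema (5) present in a past form.

    `form` is a list of segments, e.g. list("sprʌŋ").
    """
    vowel_idx = next(i for i, seg in enumerate(form) if seg in VOWELS)
    onset = form[:vowel_idx]
    after_s = onset[1:] if onset[:1] == ["s"] else onset
    features = [
        onset[:1] == ["s"],       # initial /s/
        len(after_s) >= 1,        # a consonant between any initial /s/ and the vowel
        form[vowel_idx] == "ʌ",   # the vowel /ʌ/
        form[-1] == "ŋ",          # final velar nasal
    ]
    return sum(features) / len(features)


def choose_past(regular, irregular, threshold=0.75):
    """Utter the irregular candidate only if it is close enough to (5)."""
    return irregular if prototype_score(irregular) >= threshold else regular


for word in ("sprʌŋ", "flʌŋ", "strʌg", "sʌg"):
    print(word, prototype_score(list(word)))
```

Under these assumptions sprung matches all four features, flung and strug match three, and sug matches only two, so sug falls below the threshold. A fixed threshold cannot by itself produce disagreement among speakers; treating the score as a probability of choosing the irregular form would be closer to the variable behavior described above.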


The Compatibility of Prototype Schemas and Connectionist Networks – Elliott’s Study

In chapter 6 we saw how Elliott’s (1995; Adamson and Elliott, 1997) study of the acquisition of middle experiencer verbs in Spanish could be explained at the connectionist level, and I have noted that connectionist models are considered to be compatible with the prototype schemas used in CL (Feldman, 2006). Let us therefore ask how Elliott’s results could be discussed at the CL level, using prototype schemas. The discussion will illustrate how CL provides an abstract characterization of the workings of connectionist networks.

In the discussion of Bybee and Moder’s (1983) experiment, I suggested that variation in morphology could occur when prototype schemas that control incompletely learned forms are accessed during production. Variation in Elliott’s data could occur in a similar way. Native speakers and advanced learners have memorized which emotional domain verbs are reflexive middle experiencer verbs requiring se and which are not. But less advanced learners must make an (unconscious) informed guess. To do so, they may access a prototype schema or an exemplar for emotional domain verbs, which can result in the variable production of the clitic particle.

Recall that Elliott (1995) found that his subjects’ learning of reflexive verbs was less accurate in the emotional domain than in the physical domain. The main reason for the inaccuracy within the emotional domain was overgeneralization. Subjects tended to turn transitive emotional domain verbs that do not take se (like odiar “hate”) into middle experiencer verbs that do take se, as in (6)a and (7)a (the correct versions of these sentences are supplied in (6)b and (7)b).

(6) a. *Me odio esta comida.
myself I hate this food
“I hate this food.”

b. Odio esta comida.
I hate this food
“I hate this food.”

(7) a. *Me encanto la comida de Japon.
myself I enchant the food of Japan
“I love Japanese food.”

b. Me encanta la comida de Japon.
me enchants the food of Japan
“I love Japanese food.”

As mentioned in chapter 6, it can be difficult for learners to distinguish emotional domain verbs that are just transitive and not reflexive, like odiar “hate” and encantar “enchant,” from emotional domain middle experiencer verbs that are reflexive, like reirse “laugh” and divertirse “have fun.” The difference is that middle experiencer verbs require the participant to undergo a change of state while other kinds of emotional domain verbs do not. Such a change is clearly necessary to the meaning of verbs like volverse loco “to go crazy” and ponerse triste “become sad.” But, for some of the middle experiencer verbs such a change is not so obvious. These include reirse “laugh,” divertirse “have fun,” and quejarse “complain.”

In chapter 6 overgeneralization was modeled using a connectionist network, and it was suggested that because reflexive particles are so often activated with verbs in this domain, some activation can spread to these particles regardless of which verb is activated. This claim is compatible with saying that a prototypical emotional domain verb is a middle experiencer verb, which takes a reflexive particle and implies a change of emotional state in its subject. A prototype schema for such a verb can be written using the conventions of variable rules, as in (8).

(8) Emotional domain: verb + <se> [<change of state in NPE>]

Schema (8) says that a prototypical emotional domain verb takes a reflexive particle and implies a change of state in the Experiencer NP. The angled brackets around two of the components indicate that an emotional domain verb can have both of these components, only one, or neither, but that the prototype has both. These possibilities reflect Elliott’s (1995) results. Recall that his subjects did not always incorrectly attach se to emotional domain verbs. Rather, their overgeneralization was variable. Such variable performance could be a prototype effect resulting from the imperfect fit of peripheral members of the category. Sometimes the subjects did not use se with middle experiencer verbs when they did not consider the experiencer to have changed, as in reirse “laugh,” and sometimes they used se with regular transitive verbs that seemed to imply a change in the experiencer, like amar “love.” In chapter 6 we saw how Elliott’s (1995) results could be represented in the connectionist network in figure 6.5. This chapter has shown that this information is also compatible with a prototype schema, which can be written using the conventions of variable rules.
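The reading of the angled brackets in (8) can be made concrete with a small sketch that enumerates the four possible realizations (both components, either one, or neither) and counts how far each lies from the prototype. The component labels and function names are illustrative assumptions, and counting missing components is only one possible way to express distance from the prototype.

```python
from itertools import product

# The two bracketed (variable) components of schema (8); labels are illustrative.
OPTIONAL = ("se", "change of state in experiencer")


def realizations():
    """List every combination of the optional components, prototype first."""
    combos = []
    for flags in product([True, False], repeat=len(OPTIONAL)):
        combos.append([name for name, on in zip(OPTIONAL, flags) if on])
    return combos


def distance_from_prototype(present):
    """Number of prototype components that a given realization lacks."""
    return len(OPTIONAL) - len(present)


for combo in realizations():
    print(distance_from_prototype(combo), combo)
```

On this sketch a verb realized with se but without a perceived change of state sits one step from the prototype, which is the kind of peripheral member for which variable performance would be expected.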

So far, I have argued that language learners can make use of prototype schemas for constructing phonological and morphological representations of forms they have not yet memorized and that such schemas can model variation in interlanguage. For this reason, CL is a promising grammatical theory for variationist research. We will now consider how CL handles syntax, and how the theory can model syntactic variation in learners’ speech.


Prototype Schemas in Syntax/Semantics: The Acquisition of Argument Structure

Introduction

Recently, there has been much interest in how first and second language learners acquire the correct argument structures for verbs. The relationship between a verb and its arguments can be complex. For example, the locative verbs fill and pour take NP PP complements, as in (9) and (10).

(9) Marsha poured water into the glass.
(10) Marsha filled the glass with water.

But notice that for pour the first complement NP, water, is the theme of the event of pouring and the second complement NP, glass, is the goal of the event. However, for fill the first NP, glass, is the goal and the second NP, water, is the theme. Notice also that for both verbs the order of the arguments cannot be reversed, as in (11) and (12).

(11) *Marsha poured the glass with water.
(12) *Marsha filled water into the glass.

To complicate matters further, some locative verbs do allow both thematic orders, as in (13) and (14).

(13) Marsha loaded the truck with hay.
(14) Marsha loaded hay onto the truck.

Perhaps the most studied complement structure is associated with dative verbs, and it also presents difficulties. Some dative verbs, such as give, allow two complement patterns: a prepositional phrase or a ditransitive. Example (15) illustrates the prepositional phrase complement, and (16) illustrates the ditransitive complement.

(15) The Council gave/sent/faxed $3,000 to Marsha.
(16) The Council gave/sent/faxed Marsha $3,000.

However, other dative verbs, like donate, allow only the prepositional phrase, as in (17) and (18).

(17) The Council donated/presented/credited $3,000 to Marsha.
(18) The Council *donated/*presented/?credited Marsha $3,000.

How can language learners master the subtleties of verbs and their complements? In particular, how can they avoid overgeneralizing and producing structures like (18)?

Pinker (1989) calls the problem of avoiding overgeneralization “Baker’s Paradox,” after C. L. Baker, whose 1979 article brought widespread attention to the problem. Baker’s Paradox has three aspects. The first aspect involves productivity. If speakers never constructed productive rules that allowed them to generate forms they had not heard before, there would be no problem. But, as we have seen, Bybee and Moder’s (1983) subjects did generate such forms, and Pinker (1989) documents that children overgeneralize ditransitive constructions, producing sentences like “*You finished me lots of rings” instead of “You finished lots of rings for me” (p. 21). The second aspect of Baker’s Paradox is the lack of negative evidence. If speakers were corrected when they said things like (11) and (12), they could avoid overgeneralizing in the future, but apparently such correction does not occur for L1 learners or for many L2 learners. The third aspect of the Paradox is the question of arbitrariness. The fact that nearly synonymous verbs like donate and give have different complement structures means that there is no simple semantic guideline for pairing verbs with complements. Different scholars have suggested different solutions to Baker’s Paradox, and we will take a look at three of them: Baker’s (1979) Strict Constructivism Hypothesis, Pinker’s (1989) Lexical Rule Hypothesis, and Goldberg’s (1995, 2006) Construction Grammar Hypothesis, which is done within the CL framework. Then, I will present a pilot study of ditransitive use by native speakers and ditransitive acquisition by adult Korean speakers.

Baker’s Strict Constructivism Hypothesis

Baker (1979) challenges part 1 of the Paradox. He claims that children do not overgeneralize but rather memorize verbs and their complements as they are encountered. However, as mentioned, longitudinal studies of child acquisition (Bowerman, 1988; Gropen, 1989; Pinker, 1989), as well as experimental studies using nonce verbs (Gropen, Pinker, Hollander, and Goldberg, 1991), show that children do, in fact, overgeneralize to some extent, so the strict constructivism hypothesis has been abandoned.

Pinker’s Lexical Rule Hypothesis

Pinker (1989) challenges part 3 of the Paradox, claiming that complement patterns are not arbitrary. Rather, complement structure is signaled by complex morphophonemic and semantic clues. Before looking at the clues to dative alternation, let us consider the theory that Pinker adopts, which is an adapted version of Lexical-Functional Grammar (Bresnan, 1982).

Early generative theories accounted for dative sentences with a syntactic transformational rule like (19), which would change a sentence like (20) into a sentence like (21).

(19) V NP1 to NP2 → V NP2 NP1
(20) Marsha threw the ball to John.
(21) Marsha threw John the ball.

One reason that syntactic transformations fell out of favor was that they were not supposed to change the meaning of the structure. But people perceived that rules like (19) did change meanings. For example, according to Pinker (1989), (20) can be used where John did not catch the ball or could even be asleep, but (21) entails that John was meant to receive the ball and invites the inference that he did. Similarly, “She taught Amharic to the students, but they didn’t learn anything” sounds all too natural, whereas “She taught the students Amharic, but they didn’t learn anything” sounds odd.

Lexical-functional grammar represents the different meanings in (20) and (21) with the semantic representations in (22) and (23) respectively. These semantic representations, which consist of universal semantic primitives, are stored with the lexical entries for verbs. Notice that this claim entails that throw has two different meanings:

(22) throw1 → x causes y to go to z
(23) throw2 → x causes z to possess y

Pinker (1989) says that children notice that many dative verbs besides throw (for example give, send, teach, tell, and get) have both of these meanings. They are then able to abstract a lexical rule that relates the two meanings, as shown in (24).

(24) a. x causes y to go to z →
b. x causes z to possess y
(where the arrow means “entails that”)

Then, when children hear a new verb, say fax, with a meaning similar to send, they are able to apply rule (24) to produce (25) without hearing fax used in the ditransitive form.

(25) Marsha faxed John the letter.

To continue briefly with the derivation of (25), Pinker claims that the semantic structure represented by (24)b, which is stored in the new lexical entry for fax, is projected onto an x-bar template by so-called “linking rules,” which indicate that the first argument in the logical structure becomes the sentence subject, the second argument becomes the indirect object, and the third argument becomes the direct object. Thus, for sentence (25), x = Marsha, z = John, and y = the letter.
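The derivation of (25) can be traced in a short sketch. It reads (24)b literally, so that z is the possessor and y the thing possessed, and the argument order in the derived structure is x, z, y. The dictionary layout and the function names are illustrative assumptions, not Pinker’s (1989) formalism.

```python
def apply_rule_24(args):
    """Lexical rule (24): from 'x causes y to go to z' derive 'x causes z to possess y'.

    `args` maps the variables x (causer), y (theme), and z (goal) to phrases.
    The derived structure lists its arguments in the order x, z, y.
    """
    return {"predicate": "CAUSE-TO-POSSESS", "order": ["x", "z", "y"], "args": args}


def link(structure, verb_past):
    """Linking rules: 1st argument -> subject, 2nd -> indirect object, 3rd -> direct object."""
    subject, indirect_obj, direct_obj = (structure["args"][v] for v in structure["order"])
    return f"{subject} {verb_past} {indirect_obj} {direct_obj}."


derived = apply_rule_24({"x": "Marsha", "y": "the letter", "z": "John"})
print(link(derived, "faxed"))  # Marsha faxed John the letter.
```

The sketch applies the rule to any verb it is given; the constraints that stop it from applying to verbs like donate are a separate matter.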

We now return to the question that began this section: How do learners learn that rules like (24) can apply to verbs like fax and give but not to verbs like donate and credit? As mentioned, Pinker claims that there are morphophonological and semantic constraints, or clues, as to which verbs allow the ditransitive. We will consider the morphophonological constraint first.

Old English had only the ditransitive dative, and there was considerable freedom as to the order of the two objects. Misunderstanding was avoided because NPs were marked for case. The prepositional dative was introduced from French and became widespread in the thirteenth and fourteenth centuries. During that period, the case markers eroded, and word order became less flexible. For a time there was almost complementary distribution of the two dative forms, with prepositional datives occurring only with latinate verbs borrowed from French and ditransitive datives occurring only with native English verbs. Later, both types of verbs extended their range to the other pattern, but, as we will see, this extension was not complete, and the ditransitive form still cannot be used with many latinate verbs.

Because children are not aware of English etymology, they must have some synchronic clues as to which verbs are native and which verbs are latinate. According to Pinker (1989), native verbs are single syllables or, if polysyllabic, take stress on the first syllable. Also, prefixes and suffixes signal latinate forms. Since this constraint involves both morphology and phonology, Pinker calls it the “morphophonological constraint.” This constraint does not apply to all ditransitive verbs, as discussed below.
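The synchronic clues just described can be written as a simple check. The sketch assumes that each verb has already been annotated with its syllable count, the position of its stressed syllable, and whether it carries a latinate prefix or suffix; these annotations, the example values, and the function name are illustrative assumptions rather than part of Pinker’s (1989) account.

```python
def sounds_native(syllables, stressed_syllable, has_latinate_affix=False):
    """Return True if a verb passes the morphophonological constraint.

    A verb counts as native-sounding if it is monosyllabic or stressed on
    its first syllable, and if it carries no latinate prefix or suffix.
    """
    if has_latinate_affix:
        return False
    return syllables == 1 or stressed_syllable == 1


# Hand-annotated examples (the annotations are assumptions for the demo).
print("give:", sounds_native(1, 1))                                # True
print("explain:", sounds_native(2, 2, has_latinate_affix=True))    # False
```

A check of this kind is only a first filter: verbs that pass it must still satisfy the semantic constraints, and some semantic subclasses are exempt from it altogether.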

The second type of constraint on which verbs allow the ditransitive is semantic. We have seen that all ditransitive verbs require basic semantics that match (24), that is, the goal argument must be a potential possessor of the theme. Pinker (1989) calls this requirement the broad-range semantic constraint. This requirement explains the ungrammaticality of (26)b, where “Chicago” cannot possess the car.

(26) a. I drove the car to Chicago.
b. *I drove Chicago the car.

Notice that possession need not be literal; for example, verbs of communication are treated as denoting the transfer of messages which the recipient metaphorically possesses, as in “He told her the story” and “She showed him the answer.”

Pinker (1989) calls (24) a broad-range lexical rule, but there must be narrow-range lexical rules as well to disallow verbs like push, whisper, and say, which are semantically compatible with causing a goal to be viewed as a recipient, but which do not allow the ditransitive. Pinker identifies nine semantic subclasses of verbs which allow the ditransitive. These are shown in (27). Notice that subclasses 5 and 7 are immune from the morphophonological constraint.

(27) Verbs that take the ditransitive construction:

1. Verbs that inherently signify acts of giving: e.g., give, pass, hand, sell, trade, lend, serve, feed.

2. Verbs of instantaneous causation of ballistic motion: e.g., throw, toss, flip, slap, poke, fling, shoot, blast.

3. Verbs of sending: e.g., send, mail, ship.

4. Verbs of continuous causation of accompanied motion in a deictically specific direction: e.g., bring, take.

5. Verbs of future having (involving a commitment that a person will have something at a later point): e.g., offer, promise, bequeath, leave, refer, forward, allocate, guarantee, allot, assign, advance, award, reserve, grant.

6. Verbs of communicated message: e.g., tell, show, ask, teach, pose,write, spin, quote, cite.

7. Verbs of instrument of communication: e.g., radio, e-mail, telegraph, wire, telephone, netmail, fax.

8. Verbs of creation: e.g., bake, make, build, cook, sew, knit, toss (when a salad results), fix (when a meal results), pour (when a drink results). Notice that in prepositional phrase form these verbs take for, not to: “They baked a cake for Marsha.”

9. Verbs of obtaining: e.g., get, buy, find, steal, order, win, earn, grab.

Each of the verbs in these nine classes must participate in a different narrow-range lexical rule in order to license the ditransitive. That is, verbs of instantaneous causation of ballistic motion, such as throw as in “Marsha threw John the ball,” must undergo a lexical rule like (28).

(28) a. x CAUSES y to GO to z (by means of INSTANTANEOUS BALLISTIC MOTION) →

b. x CAUSES z to HAVE y

Rule (28) would also license toss, slap, kick, etc. It would not license push, pull, lower, haul, etc. because these are verbs of “continuous causation of accompanied motion in some manner,” as discussed below.

Pinker also identifies five subclasses of verbs that are compatible with the broad-range constraint but not with the narrow-range rules, and therefore cannot take the ditransitive. These are shown in (29).

(29) Verbs that do not take the ditransitive construction

1. Verbs of fulfilling (X gives something to Y that Y deserves, needs, or is worthy of): e.g., *I presented him the award; *I credited him the discovery; *Bill entrusted/trusted him the sacred chalice; *I supplied them a bag of groceries.

2. Verbs of continuous causation of accompanied motion in some manner: e.g., *I pulled/carried/pushed/schlepped/lifted/lowered/screamed/hauled John the box.

3. Verbs of manners of speaking: e.g., *John shouted/screamed/murmured/whispered/yodeled Bill the news.


4. Verbs of proposition and propositional attitudes: e.g., *I said/asserted/questioned/claimed/doubted her something.

5. Verbs of choosing: e.g., *I chose/picked/selected/favored/indicated her a dress.

Pinker’s (1989) discussion shows that learners are faced with a formidable task in learning the English ditransitive. They must learn a morphophonological constraint, a set of narrow semantic constraints, and how the two interact.

Construction Grammar

Introduction

Construction Grammar (C×G) is a branch of CL that has been developed by a number of researchers, many working at universities on the West Coast of the United States, including Fillmore (1988), Kay (1990), Lakoff (1987), Goldberg (1995, 2006), Shibatani (1996), and Feldman (2006). Here I will follow the account presented in Goldberg (1995, 2006). As we have seen, the lexical rule account requires that if a verb participates in more than one type of argument structure, it must have more than one meaning. This requirement can lead to an unwieldy proliferation of meanings. For example, the verb kick participates in no fewer than eight argument structures, shown in (30).

(30) a. Pat kicked the wall.
b. Pat kicked Bob black and blue.
c. Pat kicked the football into the stadium.
d. Pat kicked at the football.
e. Pat kicked his foot against the chair.
f. Pat kicked Bob the football.
g. The horse kicks.
h. Pat kicked his way out of the operating room.

But we have the sense that the basic meaning of kick does not change in the sentences in (30). C×G posits that, in general, verbs retain a basic meaning, but that a new element of meaning is added by the construction itself. Thus, we understand that the physical action in (30)a is the same as the physical action in (30)b. The sentences in (30), then, contain not eight different lexical verbs, but the same lexical verb in eight different syntactic constructions. For example, (30)a is the transitive construction, (30)b is the resultative construction, (30)c is the caused motion construction, etc. Notice that (30)h, the way construction, contains not only specific grammatical categories but also a specific word: way. The way construction looks like this:

verb + possessive pronoun + way + PP.

Other sentences that exemplify this construction include “George painted his way through the apartment” and “Rocky couldn’t punch his way out of a paper bag.” We will return to the way construction below.

Constructions are often presented in a formal notation using boxes, as shown in (31), which represents the ditransitive construction.

(31) Ditransitive Construction

Sem   CAUSE-RECEIVE   < agt    rec    pat  >

      PRED            <                    >
        |               |      |      |
Syn     V              SUBJ    OBJ    OBJ2

This construction maps a particular semantics onto a particular syntax. The semantic component of the construction says that an agent causes a recipient to receive a patient. The mapping says that the agent is realized syntactically as the subject, the recipient as the first object, and the patient as the second object. This mapping is the equivalent of linking rules in Lexical-Functional Grammar. A verb, such as to hand, can be inserted in (or fused with) the construction and mapped onto the ditransitive surface structure, as shown in (32).

(32) Composite Fused Structure: Ditransitive + hand

Sem   CAUSE-RECEIVE   < agt       rec       pat    >
        |               |         |         |
      HAND            < hander    handee    handed >

As in Lexical-Functional Grammar, for a verb to be used in a particular construction, its semantics must be compatible with the semantics of the construction. This requirement prevents drive from taking the ditransitive, as in (26)b. Recall that Pinker (1989) enforced this semantic compatibility requirement by means of his broad-range constraint on lexical rules.
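The fusion shown in (32) can be sketched as pairing each participant role of the verb with an argument role of the construction and then reading off the syntactic function. The data layout and the function name are illustrative assumptions, and the only compatibility check made here is that the verb supplies one participant per argument role, which is far weaker than the semantic compatibility the account requires.

```python
# The ditransitive construction of (31), written as a plain dictionary.
DITRANSITIVE = {
    "sem": "CAUSE-RECEIVE",
    "roles": ("agt", "rec", "pat"),
    "syn": {"agt": "SUBJ", "rec": "OBJ", "pat": "OBJ2"},
}


def fuse(construction, verb, participants):
    """Pair the verb's participant roles with the construction's argument roles."""
    roles = construction["roles"]
    if len(participants) != len(roles):
        raise ValueError(f"{verb} cannot fuse with {construction['sem']}")
    return {p: (r, construction["syn"][r]) for p, r in zip(participants, roles)}


fused = fuse(DITRANSITIVE, "hand", ("hander", "handee", "handed"))
for participant, (role, function) in fused.items():
    print(participant, role, function)
```

Here the verb keeps its single participant frame, and the mapping to subject, first object, and second object comes entirely from the construction.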

As we have seen, Pinker (1989) also postulated a number of narrow-range constraints on lexical rules, thus proliferating the polysemy of verbs. For example, in his theory, the meaning of throw in (20) differs from the meaning of throw in (21). Because C×G allows verbs to retain a single basic sense, we may ask how it accounts for the narrow-range semantic constraints on the ditransitive. The answer is the converse of Pinker’s answer. The constraints are found not in the meanings of the verbs but in the meanings of the ditransitive construction, which is a radial category like BIRD with a number of related meanings, as shown in figure 7.2. The central meaning of the construction, labeled “A,” is expressed by verbs of giving, instantaneous causation of ballistic motion, and continuous causation in a deictically specified direction. The extended senses of the construction are shown in B to F.

Goldberg (1995) says that the extended senses of the construction are not predictable from the central meaning, but rather are “motivated” by it. That is, they must be learned conventionally, but learning is facilitated because the connections are not entirely arbitrary: they make sense because they are in a family resemblance relationship. Pinker (1989) also uses the term “motivated” to express the relationship between the broad-range semantic constraint and the narrow-range rules.

Goldberg’s Usage-based Account of the Ditransitive

Goldberg’s solution to Baker’s Paradox is similar to Pinker’s. She believes that the fusing of verbs with constructions must be conventionally learned, but that verb semantics provide clues as to which constructions a verb can be fused with. However, there are important differences in the two accounts. A problem for the lexical rule account of ditransitive learning is that it assumes that narrow semantic classes are mutually exclusive. A verb can fit into only one of the classes and therefore either dativizes (that is, undergoes lexical rule (24)) or does not. But native speakers’ intuitions regarding verbs in the same narrow semantic class are variable. For example, according to Pinker (1989), push is nondative (it is on the list of nondative verbs above as a verb of continuous causation of accompanied motion in some manner). But “John pushed me a beer” sounds acceptable to me and to some of the native English-speaking subjects of the experiment reported in the next section. Perhaps the reason for this disagreement about grammaticality is that push can be construed as a verb of instantaneous causation of ballistic motion, similar to shove. This results in prototype effects in the form of uncertain judgments. Goldberg (1995) remarks, “the determination of which narrowly-defined class a given verb belongs in is not always entirely clear-cut. [. . .] In general, in the case of verbs that may fall into one of two classes, one which can appear ditransitively and one which cannot, we would expect to find some dialectal variation in whether the verbs can be used ditransitively” (p. 42).

Notice that in Goldberg’s (1995, 2006) account, verbs still fall into semantic classes that can be fused with the ditransitive construction. She suggests that these classes are learned through usage. For example, give is one of the most frequent verbs in mothers’ speech to children (Goldberg, 2006, p. 76) and so may serve as an exemplar around which a category of verbs which take the ditransitive is constructed. As children encounter other examples of verbs with similar meaning that are used ditransitively, these verbs will be added to this radial category. Such a category is shown in figure 7.3. The clusters of verbs in this figure are the counterparts of the structural descriptions of narrow lexical rules. If, for example, a verb’s semantics are compatible with an agent enabling a patient to go to a goal by means of instantaneous ballistic motion, then that verb is eligible to be fused with the ditransitive construction. This fusing adds an element of meaning to the structure, namely that the goal successfully receives the patient and becomes a recipient. The end result is the same as the result of a lexical rule. What is different in the CL account is the possibility for verbs that are loosely attached to their narrow semantic categories to be construed as belonging to another category.

Figure 7.2 The radial category “ditransitive verb” (from Goldberg, 1995). Reprinted with permission.

A Pilot Study of Ditransitive Acquisition Among Korean Speakers

Psycholinguistic experiments in speech production (Hare and Goldberg, 1999) and speech comprehension (Ahrens, 1995; Kaschak and Glenberg, 2000; Bencini and Goldberg, 2000) provide evidence for the psycholinguistic reality of syntactic constructions. For example, Ahrens (1995) asked 100 native English speakers to decide what the nonce verb moop meant in the sentence “She mooped me something.” Sixty percent of the subjects said that “moop” meant “give,” even though several ditransitive verbs have a higher overall frequency in English usage, including take and tell. These results suggest that the subjects equated “moop” with the central sense of ditransitive verbs. In order to explore whether learners of English as a second language have similar intuitions about (real) ditransitive verbs, I conducted a pilot study involving native speakers of Korean and English, which is described in the next sections.

Research Design

A questionnaire eliciting intuitions regarding 29 potential ditransitive sentences (as well as a number of masking sentences) was administered to 24 native English speakers (NSs) and 24 Korean speakers (NNSs). The potential ditransitive sentences were chosen with the help of a Korean linguist who is also an ESL teacher, in order to avoid patterns that could be literally translated from Korean or that are focused on in the Korean schools. The written instructions told the subjects to read each sentence carefully and to rate its grammaticality along a five-point scale, where 1 was completely ungrammatical and 5 was completely grammatical. Examples of how to complete the rating task were provided using a grammatical, an ungrammatical, and a questionable sentence, none of which was a dative sentence. Written instructions for the Korean subjects were provided in English and Korean. In addition, each subject was given a cloze test in English. The NSs’ scores on this test ranged from 18 to 24; the NNSs’ scores ranged from 3 to 25. In order to control for English proficiency, all subjects scoring below 17 were excluded. This procedure left 14 NNSs who were highly proficient in English. All of the subjects were living in Tucson, and almost all of them had been in the U.S. for more than five years. Most were students at the University of Arizona.

Figure 7.3 Verbs that fuse with the ditransitive construction (from Goldberg, 1995). Reprinted with permission.

NSs’ Results

It should first be said that the results of the pilot study are only suggestive and are intended to indicate areas of interest for a larger and more rigorous study. The verbs included in the study are shown in table 7.1, where they are divided into verbs that dativize and verbs that do not dativize, according to Goldberg (1995) and Gropen et al. (1991). Verbs that fail to dativize can be subdivided into three groups: (1) verbs that fail to meet the broad-based semantic constraint; (2) verbs that meet the broad-based semantic constraint but do not fall into one of the narrow semantic classes that allow the ditransitive; and (3) verbs that violate the morphophonological constraint.

First, consider the grammaticality judgments of the NS subjects, which appear in the left column of table 7.2.³ The ordering of the verbs in this column appears to match fairly well the grammaticality claims of Goldberg (1995) and Gropen et al. (1991) shown in table 7.1. All of the verbs in the top third of the column are listed by these authors as ditransitive verbs, and all of the verbs in the bottom third are listed as nonditransitive verbs. The study also included five verbs that appeared to violate the broad-range semantic constraint (that the indirect object must be construable as the possessor of the direct object), namely: walk (Nick walked the dog for her → *Nick walked her the dog), drive (Elaine drove the car for them → *Elaine drove them the car), operate (George operated the projector for him → *George operated him the projector), and intercept (Jim intercepted the message to her → *Jim intercepted her the message). The NSs gave the lowest possible rating to all of these verbs except drive, which was rated in the bottom third of the scale, but well above the other three verbs. In hindsight, it seems that “Elaine drove them the car” could be construed as meaning that Elaine drove the car to them and they took over possession. “Nick walked her the dog,” on the other hand, does not seem to imply transferred possession of the dog.


From a variationist point of view, the verbs that fall in the middle area of table 7.2 with scores ranging between .5 and −.5 are the most interesting because they appear to exhibit prototype effects. Recall that Pinker (1989) and Goldberg (1995, 2006) said that ditransitive verbs are learned conventionally, though they are motivated by semantic similarity with one of the narrow semantic classes. Why, then, don’t the NSs rate all of the verbs near the top or bottom of the scale, as either grammatical or ungrammatical? Instead, six verbs, create, steal, push, credit, erect, and grab, are rated between +.5 and −.5. It should be noted that because the judgments of all the subjects were lumped together, these indeterminate ratings could result from both uncertain judgments on the part of individual subjects and disagreement among the subjects. But the reason that some verbs could cause both uncertain judgments and dialect differences is that they do not clearly match the criteria of the ditransitive construction and so prototype effects result. Of course, no definitive claims can be made on the basis of this small study, but I would like to suggest some possibilities.

Table 7.1 Classification of verbs on the grammaticality judgment task according to Goldberg’s (1995) and Gropen et al.’s (1991) criteria. Verbs that violate the morphophonological constraint are marked with *.

Verbs that dativize (according to Goldberg’s [1995] categories, see key below)

    B. owe, promise
    C. deny, refuse
    D. save, award, reserve, book, forward
    E. *approve
    F. (verbs of obtaining) get, win, steal, grab
       (verbs of creating) create, *discover, *improve, *erect
    Metaphors: fax, quote, *communicate

Verbs that do not dativize

    (1) Fail to meet broad-based semantic constraint:
        play (a trick on Jill)
        walk (the dog for Tom)
        improve (the recipe for Marsha)
        *intercept (the message to Nora)
        operate (the machine for George)

    (2) Meet the broad-based semantic constraint but do not fall into one of the narrow semantic classes:
        Verbs of fulfilling: *present, credit
        Verbs of continuous causation of accompanied motion in some manner: push
        Verbs of manner of speaking: say

Key to semantic categories:

    B. Conditions of satisfaction imply that agent causes recipient to receive patient
    C. Agent causes recipient not to receive patient
    D. Agent acts to cause recipient to receive patient at some future point in time
    E. Agent enables recipient to receive patient
    F. Agent intends to cause recipient to receive patient

The first thing to note about the six questionable verbs is that, by design, none of them fits Goldberg’s central sense of the ditransitive construction. They are all, to some extent, outliers. In fact, credit and present appear in class 1 (verbs of fulfilling) of Gropen et al.’s (1991) list of verbs that fail to dativize. Yet credit was rated in the middle of the scale with a score of −.1 and present just escaped the middle ground with a score of −.6. One reason that credit sounds better than present is that present violates the morphophonological constraint. But why aren’t both of these verbs judged fully ungrammatical? Pinker (1989, p. 156) notes that both credit and present are fully grammatical if with follows the object, as in “They will present you with an award” and “I will credit you with the full amount.” Thus, these verbs pattern like locative verbs that take a direct object followed by a prepositional phrase (“Marsha poured water into the glass”). Pinker (1989) suggests that the requirement that these verbs take with may be eroding. Construction grammar could perhaps better explain this shift by noting that the verbs are semantically very similar to verbs that express the central sense of the ditransitive construction (they signify acts of giving), so they are becoming attached to the construction as an additional sense. Until they become fully attached, they cannot be easily classified and prototype effects will result, causing variation in dialect usage and grammaticality judgments.

Table 7.2 Native English and native Korean speakers’ ratings for the grammaticality of the ditransitive construction with various verbs.

    Index   English (n = 24)                                        Korean (n = 14)
     1.0    fax, get, save, ship, award, deny, owe, reserve, win
     0.9    book                                                    fax
     0.8    forward, quote                                          owe, award
     0.7
     0.6
     0.5    refuse                                                  ship, get
     0.4
     0.3                                                            quote
     0.2    create                                                  forward, credit
     0.1
     0.0    steal, push
    −0.1    credit                                                  save, book
    −0.2    erect                                                   operate, approve
    −0.3                                                            create
    −0.4    grab                                                    play, erect
    −0.5    drive                                                   present, steal, discover, reserve
    −0.6    present, communicate                                    communicate, win, intercept
    −0.7                                                            deny, grab, improve, refuse
    −0.8    approve                                                 push
    −0.9    discover                                                drive
    −1.0    play, improve, say, walk, intercept, operate            walk

            Mean = 0.1                                              Mean = 0.29

Create and erect fit semantically in Goldberg’s sense F of the ditransitive construction (verbs involved in scenes of creation) but they both violate the morphophonological constraint, which may account for the variation in acceptability judgments for these verbs. Push should be unacceptable because it falls into Gropen et al.’s (1991) forbidden class 2, verbs of continuous causation of accompanied motion. But, as discussed earlier, in our test sentence, “Joe pushed me a beer,” push could be construed as a verb of instantaneous causation of ballistic motion, analogous to slap in “Wayne slapped me the puck.” Because the scene depicted by the test sentence is ambiguous, prototype effects result. It is less clear why our subjects did not like steal and grab in the ditransitive construction. Both are verbs of obtaining, which in the prepositional construction take for. However, Pinker’s (1989, p. 116) observations may suggest a reason. He points out that verbs in this class carry “an overlay of benefaction”; in other words, in both the ditransitive and prepositional phrase constructions verbs of obtaining imply that the transfer of possession will benefit the recipient. But our test sentence, “George stole her $5,” may not result in benefit to the recipient. And perhaps the same consideration applied to “She grabbed me a sandwich.” Who wants a grabbed sandwich?

In sum, the uncertainty in the NSs’ judgments regarding six of the verbs on our test suggests that their assignment to semantic categories is not as straightforward as Pinker (1989) and Gropen et al. (1991) suggest. However, Goldberg’s (1995) grammatical construction account is compatible with fuzzy grammaticality intuitions and dialect differences because these verbs do not clearly meet the criteria for fusing with the ditransitive construction.

NNSs’ Results

The NNSs judged far fewer of the sentences to be grammatical than the NSs. NNSs’ conservative judgments of grammaticality have also been observed by many researchers, including Tarone (1985) and Shi (2003). The NNSs judged only five verbs to be acceptable in the ditransitive (receiving an index score of .5 or higher): fax, owe, award, ship, and get. Table 7.2 shows that the NNSs’ judgments were also more variable than the NSs’ judgments. The NSs rated 15 of the 29 verbs in either the highest (+1.0) or lowest (−1.0) category, indicating unanimous agreement for these verbs. But the NNSs rated no verbs in the highest category and only one verb, walk, in the lowest category.

The five verbs that the NNSs judged to be compatible with the ditransitive construction are in different semantic classes. According to Goldberg’s categories shown in table 7.1, these are: fax (metaphor), owe (B), award (D), ship (A), and get (F). The fact that only five verbs, from five different semantic categories, were judged to be clearly ditransitive suggests that these verbs were individually memorized, not learned as part of a semantic class. As mentioned earlier, five verbs that violate the broad-range semantic constraint were included in the study: walk, drive, operate, play (a trick on), and intercept. The NSs gave the lowest possible rating to all of these verbs except drive, which was discussed above. The NNSs gave very low ratings to walk and drive and rated intercept quite low. But play and operate are rated in the middle of the scale. These facts also suggest that the NNSs have not mastered the broad-range semantic constraint but have learned which verbs fit the ditransitive on a verb-by-verb basis.

We now consider the morphophonemic constraint. As table 7.2 shows, the NSs appear to abide by this constraint. Three verbs which violate the constraint, present, approve, and discover, are rated at −.6 or lower (compare “John gave/*presented, found/*discovered her the perfect dress”; “John promised/*approved her the raise”). The NNSs did not give high ratings to any of these verbs, though approve (which could be construed either as a verb of permission to which the morphophonemic constraint applies or a verb of future having, to which the constraint does not apply) was rated 12th out of the 29. Note also that the NSs rated reserve and award, verbs of future having that violate the morphophonemic constraint, fully grammatical. However, the NNSs gave a much higher rating to award than to reserve, again suggesting that they are learning ditransitives on a word-by-word basis.

The case of deny and refuse, however, provides evidence that semantics plays some role in the NNSs’ ditransitive learning. These two verbs of refusal form an especially interesting class because (according to Pinker [1989]) they appear only in the ditransitive form, not the prepositional form (compare “Annette denied/refused him a promotion” with “*Annette denied/refused a promotion to him”). If only memorization were involved in learning which verbs are ditransitive, these verbs should be prime candidates to be memorized, but as table 7.2 shows, the NNSs judged both deny and refuse ungrammatical, despite the fact that they can have heard them used only in the ditransitive construction. Pinker (1989) considers deny and refuse completely unlike other ditransitive verbs because they do not undergo a lexical rule. Goldberg (1995, 2006), however, considers them no different from other ditransitive verbs; their meaning is a metaphorical extension from the central meaning of the construction. As table 7.1 shows, she considers them members of semantic class C, “Agent causes recipient not to receive patient,” which is an outlying member of the radial category of ditransitive verbs. But deny and refuse are special members of this category because their meaning is the opposite of the meaning of the other ditransitives: they mean that the recipient does not receive or possess the patient. In other words, deny and refuse violate the broad-range semantic constraint. It appears that the NNSs have not learned these verbs individually but have excluded them based on their meaning. Thus, just as learning a new ditransitive can be “motivated” by a similarity in meaning to a prototypical member of a category, perhaps learning can be inhibited by a dissimilarity of meaning.

Discussion: Prototype Schemas, Connectionist Networks, and Variable Rules

In this chapter, I have argued that prototype schemas are compatible with variation in speech and grammaticality judgments. This claim is reasonable at the conceptual level because, unlike generative formalisms, prototype schemas are non-deterministic. I have also pointed out that in some cases even the formalisms used to represent prototype schemas and variable rules are similar. In the discussion of Elliott’s (1995) study in chapter 6, I showed that prototype schemas like (8)a can be written using variable rule notation, as in (8)b. Let us now see if this is possible for another prototype schema we have discussed.

Bybee and Moder’s (1983) prototype schema for class II irregular verbs was shown in (5), which is repeated here.

(5) /s/ C (C) /ʌ/ /ŋ/

Recall that (5) says the prototype form for the past tense of a class II irregular verb has an initial /s/ followed by a consonant, optionally followed by another consonant, followed by the vowel /ʌ/, followed by a velar nasal. It would be interesting to rank the features of (5) to show which of them most favored a candidate form like strug being analyzed as the past tense of a class II irregular verb by Bybee and Moder’s (1983) subjects. In fact, Bybee and Moder (1983) were able to discover these features by varying the initial and final consonants of the cue words in their experiment. In variationist terms, they were able to discover the ordering of the constraints. They presented this information in tables, but it could be included in the prototype schema by using the conventions for variable rules, as shown in (33).

(33) γ </s/> C (C) /ʌ/ C α [<+velar>] β [<+nasal>]

Prototype schema (33) specifies the necessary (though not the sufficient) features and the optional features of the prototype past tense form for class II irregular verbs. The necessary features are a mid-central vowel, a preceding consonant, and a following consonant. A form with only these features, such as cut, which has the essential features but none of the optional features, would be far from the prototype. A more prototypical form would have at least some of the optional features, which in (33) are ranked in order of their importance using the Greek letter notation discussed in chapter 1. A past tense form in which the alpha and gamma features are present (such as strug) is closer to the prototype than a form in which the beta and gamma features are present (such as strum). Thus, the prototype schema for the past form of class II verbs can be written using variable rule notation and used to model knowledge of linguistic forms.
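The ranking expressed by (33) can be illustrated with a short sketch. It is not part of Bybee and Moder's (1983) analysis: the hand-coded feature flags, the `Form` class, and the numeric weights are assumptions made for illustration, and only the ordering alpha > beta > gamma is taken from the schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Form:
    """A candidate past tense form, hand-coded for the features in (33)."""
    spelling: str
    mid_central_vowel: bool      # necessary feature
    preceding_consonant: bool    # necessary feature
    following_consonant: bool    # necessary feature
    final_velar: bool            # optional, alpha (ranked highest)
    final_nasal: bool            # optional, beta
    initial_s: bool              # optional, gamma (ranked lowest)


# Hypothetical weights: only their ordering (alpha > beta > gamma) comes from (33).
OPTIONAL_WEIGHTS = {"final_velar": 3, "final_nasal": 2, "initial_s": 1}


def prototype_score(form: Form) -> Optional[int]:
    """Return None if a necessary feature is missing, else the summed weights."""
    necessary = (form.mid_central_vowel, form.preceding_consonant,
                 form.following_consonant)
    if not all(necessary):
        return None
    return sum(w for name, w in OPTIONAL_WEIGHTS.items() if getattr(form, name))


strung = Form("strung", True, True, True, True, True, True)
strug = Form("strug", True, True, True, True, False, True)
strum = Form("strum", True, True, True, False, True, True)
cut = Form("cut", True, True, True, False, False, False)

for form in (strung, strug, strum, cut):
    print(form.spelling, prototype_score(form))
```

Under these assumed weights strug scores higher than strum, and cut, which has only the necessary features, scores zero, which matches the ordering described in the text.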

Let us now consider the implications of Bybee and Moder’s (1983) experiment for language acquisition. As discussed previously, these authors suggest that in language learning, variation in the production of irregular verbs could occur when learners are confronted with the present tense of a new verb that somewhat resembles the prototype, for example slink. When called upon to produce the past tense of this verb, learners would go through the same mental process as Bybee and Moder’s (1983) subjects. First, they would consult the lexical representation of slink to see if an irregular past form had been stored. If no form was stored, they would compute the possible form slunk and compare it to prototype schema (33). But, because slunk only loosely matches the prototype, sometimes slunk would be accepted and sometimes the regular past tense alternative slinked would be accepted. As acquisition proceeds, the learner will encounter the correct past tense of slink, and store it with the lexical entry. According to Bybee and Moder (1983), even after language acquisition is complete, the prototype schema will remain as a backup device, which can still be used if novel verbs are encountered.

The psycholinguistic process just described of choosing between two internally generated forms, slunk and slinked, is similar to the choice described in Spivey and Tanenhaus’s (1998) study discussed in chapter 6. Recall that the hypothetical learner in that study had to choose whether to tentatively analyze the input string “the actress selected” as part of a reduced relative (RR) or a main clause (MC). Recall also that in Spivey and Tanenhaus’s (1998) model of sentence comprehension this choice was made by a connectionist network, where different features of the input were represented by weighted connections between nodes. The choice of RR or MC was never certain, but different combinations of input features favored one choice or the other. This situation is exactly the same as when a learner must choose between slunk and slinked, and, as the reader may have guessed, the prototype schema in (33) could be converted into a connectionist network, as shown in figure 7.4.

Figure 7.4 represents a network that makes a guess as to whether a candidate verb form should be produced as a legitimate class II past tense or whether it should be rejected and the regular past tense rule employed instead. The network is intended to be parallel to the network constructed by Spivey and Tanenhaus (1998) shown in figure 6.5.
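A minimal sketch of such a network is given below. It is not the network in figure 7.4 or the Spivey and Tanenhaus (1998) model: it is a single output node with a logistic activation, and the weights, the bias, and the function names are all hypothetical. It is meant only to show how weighted feature connections can make the irregular choice more or less likely without ever making it certain.

```python
import math
import random

# Hypothetical connection weights from feature nodes to the "irregular" output node.
WEIGHTS = {"final_velar": 1.5, "final_nasal": 1.0, "initial_s": 0.5}
BIAS = -1.5  # hypothetical resting preference for the regular -ed form


def p_irregular(features):
    """Probability that the network accepts the class II candidate."""
    net_input = BIAS + sum(WEIGHTS[f] for f in features)
    return 1.0 / (1.0 + math.exp(-net_input))


def choose_form(irregular, regular, features, rng=random):
    """Stochastically pick the irregular candidate or the regular alternative."""
    return irregular if rng.random() < p_irregular(features) else regular


candidates = {
    "strung": {"final_velar", "final_nasal", "initial_s"},
    "strug": {"final_velar", "initial_s"},
    "strum": {"final_nasal", "initial_s"},
    "cut": set(),
}
for form, features in candidates.items():
    print(form, round(p_irregular(features), 2))
```

Because the output is a probability rather than a yes-or-no decision, repeated calls to `choose_form` for a loosely matching candidate will sometimes return the irregular form and sometimes the regular one, which is the kind of variation described for slunk and slinked.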

Now, let us look at the CL/connectionist story of regular past tense learning by children. The traditional account of this process is as follows. At first, children learn the -ed forms of verbs as memorized chunks, just as they learn the irregular forms. At some point, they notice that many verbs come in two different versions: walk–walked, push–pushed, etc., and that the -ed version is used to describe past events. Many children then go through a stage of adding -ed to all verbs, producing forms like goed instead of the previously learned went. Eventually, the two types of verbs are sorted out, and regular verbs are produced by the rule and irregular verbs are produced from memory. However, Pinker (1989) notes that this story skips over two important questions. The first question is how the child knows “to look out for ‘present–past’ instead of ‘hot–cold,’ ‘indoor–outdoor,’ ‘good mood–bad mood,’ and hundreds of other interesting distinctions?” (p. 193). The second question is “how a child deduces that the rule is obligatory” (p. 193).

Pinker’s answer to the first question is that children are hard-wired to look for certain linguistic distinctions, such as the difference between the past and nonpast (but not “indoor–outdoor,” etc.), and are also innately programmed to correlate these distinctions with “minor differences in words, such as walk and walked” (1999, p. 210). The CL/connectionist account is similar. Although CL denies the existence of UG, it does acknowledge innate mechanisms of perceiving and processing information. For example, as discussed in chapter 1 and the appendix, the color category systems in languages expand over time in predictable ways because of the hard-wiring of the human perceptual system. Furthermore, the connectionist programs that have been written to mimic children’s learning of the past tense build in information about how to detect the differences in past and present forms and how to correlate these differences with past and present time. This is tantamount to assuming that children innately understand the past–nonpast distinction and correlate this distinction with minor differences in words. Thus, the built-in programming in connectionist networks is very much like Pinker’s (1999) claim about hard-wiring in children’s brains.

Figure 7.4 A possible representation of prototype schema (33) as a connectionist network.

We have already seen the CL/connectionist answer to Pinker’s second question of how a child can deduce that a rule is obligatory. In chapter 6 we saw that the Aphasia Model learned a categorical rule using the method of back-propagation. When the desired connections between the input layer and output layer were invariably reinforced, the weightings of those connections became so strong that probabilistic output ceased, except that which was comparable to “noise in the system,” or in generative grammar terms “performance error.” The human counterpart of this kind of learning was observed in Hudson Kam and Newport’s (2005) research, which was reviewed at the beginning of chapter 6. That study showed that even adults can learn the frequencies at which rules occur in input and that if a rule occurs in input 100 percent of the time, it will be produced categorically. In fact, that study found that categorical learning results when input overwhelmingly favors a rule but does not quite reach 100 percent.
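The way invariable reinforcement drives a connection toward categorical output can be sketched with a single logistic unit trained by gradient descent, which is the one-layer special case of back-propagation. This is an illustrative assumption, not the Aphasia Model itself; the function names, learning rate, and number of steps are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(rule_frequency, steps=2000, learning_rate=0.5):
    """Train one weight so the unit's output tracks how often the rule applies.

    rule_frequency is the proportion of input tokens in which the rule applies.
    Each step uses the expected error over the input, so the run is deterministic.
    """
    weight = 0.0
    for _ in range(steps):
        output = sigmoid(weight)
        # Strengthen the connection when the output underestimates the input rate.
        weight += learning_rate * (rule_frequency - output)
    return sigmoid(weight)


for frequency in (0.6, 0.9, 1.0):
    print(frequency, round(train(frequency), 3))
```

In this sketch a rule that applies in every input token pushes the output arbitrarily close to 1, while a rule that applies 60 or 90 percent of the time leaves the output near that rate. The sketch does not reproduce the further finding that overwhelmingly frequent but not invariable input can also yield categorical output.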

To summarize, in this chapter we have seen that conceptually CL is compatible with connectionist theory, and that both CL and connectionism are compatible with Variation Theory. I have therefore suggested that CL is a promising grammatical framework in which to discuss variable linguistic phenomena.


IV Variation in Pedagogical Perspective

8 Speaking Style and Monitoring

Monitoring—Attention Paid to Speech

In chapter 4, we noted that early studies of variation in second language acquisition were psychologically oriented but that more recent studies have looked at the social dimensions of SLA. The same is true in regard to the study of style. In this chapter we will first look at some studies that considered speaking style mainly in psychological terms in both native and nonnative speech. This strand of research has had a profound effect on the field of SLA because it includes Stephen Krashen’s influential monitor model. But before Krashen began talking about monitoring, the term was used by William Labov to explain different speaking styles in native speaker speech. Labov’s notion of monitoring compares to Krashen’s notion in interesting ways. After the discussion of monitoring, we will review some more recent studies, which look at the social dimension of style. Finally, we will explore the relationship between the psychologically based notion of monitoring and the socially based notion of speech as an expression of identity.

Labov’s Account of Style and Monitoring

In his study of New York City speech, Labov (1972a) proposed that there are no single-style speakers: everyone alters the way they speak based on a number of different factors in the speaking situation. One of these factors is the topic under discussion. As we saw in chapter 2, topics like “language” and “soapbox” elicit a more formal style of speech, with higher percentages of G, than topics like “kids” and “narrative.” Thus, in Labov’s (1972a) framework speaking styles vary along a continuum of prestige. Styles at the lower end of the continuum contain high percentages of informal or stigmatized features, and styles at the upper end of the continuum contain lower percentages of these features.

Labov’s (1972a) definition of style was directly connected to his method of eliciting the different styles. The principal method was the sociolinguistic interview, which was described in chapter 2. Two styles were distinguished within the interview proper, formal and informal. Additional styles were elicited by asking informants to read different kinds of texts, including reading style, word list style, and minimal pair style. Thus, for Labov style is defined by the context in which it is elicited.

Labov (1972a) also showed that style is related to a speaker’s social class. Working-class speakers use higher percentages of informal or stigmatized features in all speaking styles than middle-class speakers, who use higher percentages of these features than upper-class speakers. One of the variable features that Labov (1972a) looked at was postvocalic r, which New Yorkers often delete, so that guard can be pronounced [goəd] and fourth floor can be pronounced [foəθ floə]. The relationship between /r/ deletion, speaking style and social class can be seen in figure 8.1.

Consider the style of the highest social class (the top solid line). These speakers produced the lowest percentage of /r/ in the casual style of the sociolinguistic interview, a higher percentage of /r/ on the reading task, and the highest percentage of /r/ on the minimal pairs task. This pattern is shown in the speech of all of the social classes, a fact which implies that notions of correctness and prestigious speech are embedded within the entire speech community and affect all social classes in the same way. Labov (1972a) explained the patterning shown in figure 8.1 by invoking a construct from psychology: monitoring, or attention paid to speech. He claimed that in informal speech speakers pay attention to the substance of what they are saying and not to the way they sound. Under these circumstances their basic, or vernacular, style emerges. On the other hand, when reading, especially when reading a list of words, speakers are able to pay more attention to how they sound, and they adjust their pronunciation in the direction of the prestige norm. Notice that this explanation of the relationship between prestige forms, social class, and monitoring assumes that the informants were trying to use the prestige variant. Presumably, their motive for doing so was to sound like educated members of a higher social class. This explanation seems particularly appropriate in regard to the hypercorrect pattern of the second-highest class. Notice in figure 8.1 that in word list style, which allows a high degree of monitoring, these speakers produced higher frequencies of /r/ than the upper-class speakers.

Figure 8.1 Class stratification of /r/ in guard, car, beer, beard, etc. for native New York City adults (based on Labov 1972a, p. 114).

To summarize, Labov (1972a) claimed that speaking styles can be arranged along a continuum of prestige that embodies the judgments of correctness shared by members of a speech community. In the speaking situation used in Labov’s (1966) research, namely the sociolinguistic interview and associated reading tasks, all members of a speech community changed the way they spoke to avoid the less prestigious features that are associated with vernacular speech. These facts are summarized by Rickford and Eckert (2001) as follows: “The speaker’s stylistic activity . . . [is] directly connected to the speaker’s place in, and strategies with respect to, the socio-economic hierarchy” (p. 2).

Krashen’s Monitor Model

Labov’s (1972a) claim that attention to speech results in a shift toward more prestigious variants is related to the influential theory of second language acquisition proposed by Stephen Krashen (1978, 1982, 1985, 1987) called monitor theory. Krashen also found that some features of an informant’s speech (in this case the informant was a second language learner) varied with the elicitation task. Unlike Labov, Krashen did not investigate variation among semantically equivalent forms like G and N. Rather, he looked at whether learners produced particular morphemes, like plural -s, regular past tense -ed, and progressive -ing (in either form) in required contexts. In other words, he investigated the vertical continuum of language acquisition (see chapter 4). Several studies (e.g., Dulay and Burt, 1974; Bailey, Madden, and Krashen, 1974) had found that these morphemes could be ordered according to how accurately they are used by English learners, but that the accuracy order was different on different elicitation tasks. For example, on a speaking task progressive -ing was ranked first in accuracy (that is, it was supplied most frequently in required contexts), but on a grammar test where the learners had to fill in the blanks in a story, progressive -ing was ranked significantly lower (Larsen-Freeman, 1975). Thus, both Labov and Krashen found that speakers vary their speech according to the elicitation task and that they can produce more “correct” speech on tasks that allow them to pay more attention to the form of their language. Both scholars called this attention to form “monitoring.” In these respects their ideas are similar, but upon closer inspection Krashen’s notion of monitoring turns out to be somewhat different from Labov’s.
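The notion of an accuracy order can be made concrete with a small sketch. The counts below are invented purely for illustration and are not taken from any of the studies cited; the function name and data layout are likewise assumptions. The sketch only shows how accuracy in required contexts is computed and how the resulting ranking can differ from one elicitation task to another.

```python
def accuracy_order(counts):
    """Rank morphemes by the proportion supplied in required contexts."""
    accuracy = {m: supplied / required for m, (supplied, required) in counts.items()}
    return sorted(accuracy, key=accuracy.get, reverse=True)


# Invented counts of (times supplied, required contexts) for two hypothetical tasks.
speaking_task = {"-ing": (45, 50), "plural -s": (40, 50), "past -ed": (25, 50)}
grammar_test = {"-ing": (30, 50), "plural -s": (44, 50), "past -ed": (35, 50)}

print("speaking task:", accuracy_order(speaking_task))
print("grammar test:", accuracy_order(grammar_test))
```

With these made-up numbers progressive -ing ranks first on the speaking task and last on the grammar test, which mirrors only the general kind of task effect described above, not its actual size.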


Krashen’s monitor model attempts to explain why different elicitation tasks produce different morpheme accuracy orders in the output of second language learners. Its basic claim is that second language production involves using one or both of two separate psycholinguistic systems, as shown in figure 8.2. Notice that the monitor model is a lot sketchier than the production model shown in figure 6.2.

The first system is the Language Acquisition Device (LAD), which is similar to the LAD proposed by Chomsky (1965) to explain language acquisition by children. The LAD constructs a mental grammar of the target language by analyzing input, a process that is automatic and unconscious. (This mental grammar is invoked at level 3 in the language production model in figure 6.2.) The second psycholinguistic system is the monitor. The monitor is a general learning device, not dedicated to language. Using the monitor, adults can learn about language in the same way that they learn about physics and history: by consciously learning facts and the connections between them. An example is memorizing the rule that regular English plurals end in -s. (Conscious knowledge is not represented in the production model in figure 6.2.)

The monitor model works in the following way. The model assumes that an intended message has been already formed, and its processing begins at the stage when the unconscious mental grammar in the LAD is accessed. If the speaker is not focusing on form, the interlanguage grammar contained in the LAD will govern production. If this grammar contains a rule for plural -s, that morpheme will be attached to a plural noun (this path is indicated in figure 8.2 by the dotted line). However, if the speaker is focusing on form, the LAD output can be modified by the conscious rule in the monitor, as indicated by the solid line in figure 8.2. If the output from the LAD does not include a required plural morpheme, -s can be added by the monitor. However, it is important to note that the monitor can also delete -s from a plural noun if the speaker has not consciously learned the plural rule correctly (see below). Monitor theory also claims that grammatical knowledge in the LAD and grammatical knowledge in the monitor are internalized in entirely different ways. The unconscious process involving the LAD is called acquisition, and the conscious process involving the monitor is called learning. Monitor theory claims that consciously known rules (the rules in the monitor) are of very limited use. Successful monitoring can occur only if three conditions are met. First, the speaker must be attending to form; second, the speaker must have sufficient processing time; and third, the speaker must accurately know the rule for producing the correct form. In later modifications of the model, Krashen (1982, 1987) added a proviso to this last condition: the consciously known rule must be simple and easy to apply, that is, a rule of thumb. If the speaker does not accurately know an easy-to-apply rule, monitoring can actually reduce accuracy.

Figure 8.2 The monitor model.
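The flow just described can be made concrete with a toy program. The following Python sketch is only an illustration under simplifying assumptions: the function names, the string labels for the learned rule ("add_s", "no_s"), and the treatment of the conditions as simple true/false values are hypothetical conveniences, not part of Krashen's formulation.

```python
from typing import Optional


def lad_output(noun: str, is_plural: bool, acquired_plural_rule: bool) -> str:
    """Unconscious (acquired) system: attach -s only if the interlanguage
    grammar in the LAD contains a plural rule."""
    if is_plural and acquired_plural_rule:
        return noun + "s"
    return noun


def monitor(form: str, noun: str, is_plural: bool,
            learned_rule: Optional[str]) -> str:
    """Conscious (learned) system that edits the LAD output.

    learned_rule is a hypothetical label for what the speaker consciously
    believes: "add_s" (accurate rule), "no_s" (inaccurately learned rule),
    or None (no conscious rule at all).
    """
    if not is_plural or learned_rule is None:
        return form
    # An accurate rule adds a missing -s; an inaccurate one can delete it.
    return noun + "s" if learned_rule == "add_s" else noun


def produce(noun: str, is_plural: bool, acquired_plural_rule: bool,
            learned_rule: Optional[str], attending_to_form: bool,
            enough_time: bool) -> str:
    """Run the LAD first, then the monitor if conditions one and two hold."""
    form = lad_output(noun, is_plural, acquired_plural_rule)
    # Conditions 1 and 2: attention to form and sufficient processing time.
    # Condition 3 (an accurate rule) decides whether monitoring helps or hurts.
    if attending_to_form and enough_time:
        form = monitor(form, noun, is_plural, learned_rule)
    return form
```

In this sketch, a speaker whose acquired grammar lacks the plural rule produces "books" only when all three conditions are met, while a speaker who has acquired the rule but consciously learned it wrongly produces "book" when monitoring, which mirrors the claim that monitoring can reduce accuracy.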

The monitor model has been criticized by many. Early criticisms (Spolsky, 1985, p. 274; Adamson, 1988, p. 82) focused on the fact that Krashen conceived of monitoring as an entirely conscious process. But the notion of attention to speech seems to be broader than that. As Smith (1982) noted, “Attention sometimes implies conscious knowledge and sometimes not . . . Attention simply means a kind of orientation, concentration, or focus” (p. 43). In response to such criticisms, Krashen (1982) modified the monitor model to include the possibility of unconscious monitoring, based on a “feel for correctness.” In the revised model Monitoring, with a capital M, represented conscious rule application, as described above, and monitoring, with a small m, represented the unconscious “feel for correctness” process. To illustrate this new theory, a small m monitoring box was attached to the right of the LAD box in figure 8.2. (The distinction between Monitoring and monitoring has not caught on, and the typography is confusing. Hereafter, I will use the word “monitoring” to refer only to conscious monitoring unless otherwise stated.) This change brought monitor theory more into line with Labov’s (1966) notion of monitoring, which implies that there is a continuum between conscious and unconscious monitoring. In reading a list of words, a speaker can consciously apply a learned rule of pronunciation, but in speaking there is not sufficient time, so the speaker must rely on an unconscious “feel for correctness.”

A second criticism of monitor theory was directed against Krashen’s (1978) claim that conscious knowledge can never become unconscious knowledge. This claim seemed even more unlikely after the monitor theory was modified to include a capacity for unconscious monitoring. In the language processing theory proposed by Anderson (1980, 1983) and adapted to second language acquisition by McLaughlin (1980, 1987) and others, it is axiomatic that conscious knowledge becomes automatized and turned into unconscious knowledge. Krashen offered no convincing evidence why this process could not happen in second language acquisition.

Monitoring in interlanguage was explored by a number of researchers in the 1970s and 1980s, who looked only at vertical variation (the alternation between a correct and an incorrect form), and who assumed that monitoring in a second language was similar to monitoring in a first language because in both cases it produced more prestigious forms. We will now examine some of these studies.


Studies of Monitoring in Interlanguage

Dickerson (1974) was the first SLA researcher to suggest that speakers might monitor in their second language in the same way that they monitor in their first language. She studied the pronunciation of /r/ by ten Japanese students studying ESL at an American university. She found that her subjects produced /r/ with 100 percent accuracy when reading word lists but with only 50 percent accuracy in free conversation. So, Dickerson (1974) claimed that attention to form resulted in the more target-like production.

Tarone (1979, 1982, 1985, 1988) looked at monitoring in morphology and syntax. Recall that Labov claimed that native speakers have a vernacular style that underlies speech production, and that they modify this style appropriately to fit the different contexts of speaking, as shown in figure 8.1. Labov also claimed that in language change it is the vernacular style that is most susceptible to new forms. This notion was mentioned in chapter 3 as Labov’s (2001a, p. 437) fourth principle of transition: “Linguistic changes from below develop first in spontaneous speech at the most informal level.” Similarly, Tarone (1982) claimed that interlanguage speakers have a vernacular style (we might think of it as the most automatic style at a particular stage of acquisition), which is not native-like, but which can be modified to include more native-like variants in circumstances that allow attention to speech. This position was similar to Krashen’s notion of small m monitoring.

On the basis of further research, however, Tarone (1985, 1988) modified her position. To see why, consider the findings of her 1985 study. She collected data from ten speakers of Arabic and ten speakers of Japanese, in three elicitation contexts that differed in the degree of monitoring they allowed. These were: (1) a multiple choice grammar test; (2) an interview focusing on the informants’ field of study and academic plans; and (3) a spoken narrative in which the subjects recounted a series of events shown on a video screen.

It seems reasonable that the grammar test would be the best context for accurate monitoring. Next best would be the interview, where the informants were describing familiar notions and could use vocabulary and sentence patterns of their own choosing. Retelling the narrative would seem to be the worst context for accurate monitoring because the informants may have had to use vocabulary that they were not familiar with and to frame events in unexpected ways. Tarone (1985) looked at the subjects’ production of four forms: third person -s, plural -s, articles, and presence or absence of the third person pronoun it (as in sentences like “I won’t know what is in the package until I receive it”). The results of the study were, in Tarone’s (1985) words, “Surprisingly complex” (p. 98). Because the Japanese speakers showed less variation than the Arabic speakers, we will consider only the Arabic speakers’ results, though the results for the Japanese speakers were generally compatible. The results for the Arabic speakers appear in table 8.1, which shows that only in the case of third person -s did monitoring improve accuracy. Increased monitoring did not affect the accuracy of plural marking, and it decreased the accuracy of article use and of third person pronoun it.

Tarone (1988) explained this pattern by suggesting that the grammatical morphemes she looked at fall into two classes according to the role they play in discourse, and that the two classes are affected differently by monitoring. Third person singular -s is different from the other morphemes because it is redundant. Therefore, in casual speaking or recounting a narrative, speakers can ignore this form and still communicate effectively. Tarone suggests that this is what her subjects did on the speaking tasks. On the grammar task, however, discourse coherence was not at stake, so the informants could focus their cognitive resources on even the redundant third person -s. But articles, object pronoun it, and plurals are not redundant in discourse; rather, they are important for conveying an intended meaning and preserving cohesion. Therefore, speakers must focus their cognitive resources on getting these morphemes right on the speaking tasks, even at the expense of accuracy with redundant forms.

Both Adamson (1988, p. 83) and Preston (1989, p. 259) have a somewhat different interpretation of Tarone’s (1985) data. Recall that one of Krashen’s (1987) conditions for successful monitoring was that a rule must be easily stated and remembered, that is, a rule of thumb. The rule for using third person -s fits this description, and, in addition, subject–verb agreement is a major focus of grammar instruction in ESL programs. Therefore, Krashen might predict that monitor use would be helpful here. The rules for article use, on the other hand, are far from simple, and so monitoring these forms should result in less accurate production. Plural -s is a monitorable form, and Tarone’s (1985) subjects used it more accurately in the interview but less accurately on the test. Therefore, the data regarding this form are inconclusive. The most interesting case is object pronoun it. According to Preston (1989, p. 259), the rules for agreement between this form and its antecedent are subtle and cannot be stated easily. Therefore, monitoring for object pronoun it, like monitoring for articles, might well result in a decrease in accuracy according to monitor theory.

Table 8.1 Style shifting on four target language forms in narrative style, interview style, and grammar test style by Arabic speakers. Numbers represent percent of correct usage.

                     Elicitation task
Morpheme             Test   Interview   Narr.   Result of monitoring
3rd person -s        67     51          39      Improves accuracy
Noun plural          70     83          71      No change
Article              58     85          91      Decreases accuracy
Object pronoun it    77     92          100     Decreases accuracy
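The labels in the last column of table 8.1 can be reproduced with a short calculation. The Python sketch below is one possible reading, not Tarone's procedure: it compares the most monitored task (the test) with the least monitored task (the narrative), and the 5-point tolerance used to decide "no change" is an assumption introduced here for illustration.

```python
# Percent correct usage for the Arabic speakers, copied from table 8.1.
ACCURACY = {
    "3rd person -s": {"test": 67, "interview": 51, "narrative": 39},
    "Noun plural": {"test": 70, "interview": 83, "narrative": 71},
    "Article": {"test": 58, "interview": 85, "narrative": 91},
    "Object pronoun it": {"test": 77, "interview": 92, "narrative": 100},
}


def monitoring_effect(scores: dict, tolerance: int = 5) -> str:
    """Compare the most monitored task with the least monitored task.

    The tolerance (in percentage points) is a hypothetical threshold,
    not a value taken from the study.
    """
    difference = scores["test"] - scores["narrative"]
    if difference > tolerance:
        return "improves accuracy"
    if difference < -tolerance:
        return "decreases accuracy"
    return "no change"


EFFECTS = {form: monitoring_effect(scores) for form, scores in ACCURACY.items()}

for form, effect in EFFECTS.items():
    print(f"{form}: {effect}")
```

Note that this endpoint comparison ignores the interview scores, which is why plural -s comes out as "no change" even though it peaks in the interview.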

Accommodation, Audience Design, and Self-identification

Accommodation Theory

Since Labov’s classic studies of style shifting, some researchers have taken a very different approach to the study of speaking style. The social psychologist Howard Giles and his colleagues (Giles and Powesland, 1975; Giles, 1984; Giles, Coupland, and Coupland, 1991) proposed that one source of variation is that people adjust their speech in relation to the speech of their interlocutors. He called this adjustment accommodation, of which there are two kinds: convergence and divergence. Convergence occurs when speakers alter their speech to be more like that of their interlocutors because they want to establish solidarity with or elicit approval from their audience. An example of convergence is found in Labov’s (1972a) famous department store study, in which working-class department store employees altered their production of post-vocalic /r/ to sound more like their customers. Labov (1972a) sampled the speech of employees at three New York City department stores: Klein’s, Macy’s, and Saks, which cater to customers from the working class, middle class, and upper class, respectively. His research design was ingenious. He hid a tape recorder in a shoulder bag, approached an employee, and asked where to find an item that he already knew was on the fourth floor. The verbal interaction might go like this:

“Where are the men’s wallets?”
“They’re on the fourth floor.”
“Where?”
“Fourth floor.”

Thus, Labov usually collected four tokens of /r/ words from each interaction. Because the store employees were working-class people, we can assume that their vernacular style would contain very little post-vocalic /r/, as shown in figure 8.1, and therefore that the chances of recording an /r/ as one of the four tokens would be slim. This proved to be the case at Klein’s, where only 15 percent of the employees Labov approached produced an /r/. However, at Macy’s 55 percent of the employees produced an /r/, and at Saks the figure was 65 percent. These results suggest that the Macy’s and Saks employees were adjusting their /r/ production in order to converge with the speech of their middle- and upper-class customers. The department store study also illustrates the important point that when speakers do not have the opportunity to sample the speech of their interlocutors, they accommodate to their own mental image of that speech. Because Labov did not engage in conversation with the employees, they could not get a good idea of how he spoke, so they addressed him in the speaking style of a typical customer.

Divergence occurs when speakers alter their speech to be different from that of their interlocutor because they want to increase the social distance between them. An example of divergence is provided by Bourhis and Giles (1977), who asked a group of ethnically Welsh English speakers to respond to an English speaker with a Received Pronunciation accent. They found that when the content of the conversation focused on Welsh–English differences, thereby potentially threatening the Welsh identity of the informants, these speakers diverged from the speech of the Englishman, emphasizing the Welsh features of their own variety of English.

Audience Design and Self-identification

Accommodation theory comes from the field of social psychology, and it includes an elaborate account of speakers’ motivations and personal relationships, which is not directly relevant to mainstream sociolinguistic concerns. Alan Bell (1984, 2001) built upon the insights from accommodation theory to develop a more sociolinguistically centered theory called audience design, which we will now consider.

Audience Design

The basic mechanism of audience design theory is the same as that of accommodation theory, namely that “Speakers design their style primarily for and in response to their audience” (Bell, 1984, p. 143). Bell (1977) presented evidence for this claim by analyzing the speech of four radio announcers, all of whom worked simultaneously for two New Zealand stations broadcasting out of the same studio. One of the stations was the National Public Radio station, which catered to an upper-class audience, and the other was a local station, which catered to a working-class audience.

Bell (1977) looked at how the announcers pronounced intervocalic /t/. In American English intervocalic /t/ is almost categorically pronounced as an alveolar voiced flap (making writer sound like rider), but in New Zealand English intervocalic /t/ is a true sociolinguistic variable, which stratifies according to social class. Bell noticed that the broadcasters were voicing considerably fewer intervocalic /t/s while broadcasting on the National Public Radio station than on the local station, shifting an average of 20 percent between the two contexts. Because this style shift occurred in the speech of the same individuals speaking in the same physical location, he concluded that the only reason for it was the change in audience; that is, the announcers were converging with the speech of their listeners, producing more voiceless /t/s when addressing the upper-class audience and more flapped /t/s when addressing the working-class audience. As discussed, Labov (1966) had also found a close relationship between speaking style and social class, but Bell’s (1984) audience design theory describes this relationship more comprehensively than Labov had with the style axiom, which states:

Variation on the style dimension within the speech of a single speaker derives from and echoes the variation which exists between speakers on the “social” dimension. (p. 151)

Evidence in support of the style axiom can be seen in the study of ing–in’ variation in chapter 2. The social factor measured in that study was gender, not social class (social class was roughly controlled for by the fact that all the informants lived in South Philadelphia, a working-class neighborhood), but gender is one component of the social dimension. Also, in the study in chapter 2, the range of style variation for individuals was not reported, only the range for all speakers lumped together. Nevertheless, it is possible to compare the range of variation according to style for all speakers to the range of variation according to gender. As table 2.2 shows, the Varbrul p values for N (in’) for the factor group style range from .72 in casual style to .32 in careful style, a difference of .40. In contrast, the p values for the factor group gender range from .77 for men to .24 for women, a difference of .53. Thus, the range of variation on the social dimension is indeed larger than the range on the style dimension, as the style axiom predicts.

Most studies of language variation, like the study of ing–in’ variation in chapter 2, consider not only the dimensions of style and gender, but also the dimension of linguistic environment. Preston (1991) proposes to account for the relationship between the linguistic dimension and the social dimension with the status axiom, which states that the range of variation on the linguistic dimension is greater than the range of variation on the social dimension. This claim is also supported by the data in table 2.2. As just mentioned, the range of p values for gender is .53, but the p values for linguistic environment range from 1.00 for future (but see note a to table 2.2) to .13 for prepositions, a range of .87, which, as the status axiom predicts, is greater than .53. The question of how the style axiom applies in nonnative speech will be discussed later in the chapter.
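The arithmetic behind the two axioms is simple enough to check mechanically. The Python sketch below uses only the extreme p values quoted in the text for table 2.2 (the intermediate values in that table are not needed to compute a range); the variable and function names are illustrative choices.

```python
# Highest and lowest Varbrul p values for N (in') quoted from table 2.2.
EXTREMES = {
    "style": (0.72, 0.32),                   # casual vs. careful
    "gender": (0.77, 0.24),                  # men vs. women
    "linguistic environment": (1.00, 0.13),  # future vs. prepositions
}


def value_range(high: float, low: float) -> float:
    """Range of a factor group, rounded to avoid floating-point noise."""
    return round(high - low, 2)


RANGES = {group: value_range(*pair) for group, pair in EXTREMES.items()}

# Style axiom: social range exceeds style range.
style_axiom_holds = RANGES["gender"] > RANGES["style"]
# Status axiom: linguistic range exceeds social range.
status_axiom_holds = RANGES["linguistic environment"] > RANGES["gender"]

print(RANGES, style_axiom_holds, status_axiom_holds)
```

The resulting ordering is style (.40) < gender (.53) < linguistic environment (.87), so both axioms are satisfied by these data.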

Responsive Style Shifting and Initiative Style Shifting

Audience design theory posits two main types of style shifting: responsive and initiative, both of which contain subtypes, as shown in figure 8.3. Responsive style shifting resembles convergence in accommodation theory. In its simplest form it occurs when speakers alter their speaking style to be more like that of their interlocutor. Tracing down the different branches of responsive style shifting in the tree diagram in figure 8.3, we can distinguish the following subtypes.

1. Audience, second person addressee. This kind of style shifting involves accommodating to the speech of an addressee, as was the case of the clerks in Labov’s (1972a) department store study. Recall that, as in that study, speakers sometimes do not have the opportunity to sample their interlocutor’s speech, so they accommodate to their own mental model of how that person would speak.

2. Audience, third persons present. This kind of shifting typically occurs during a conversation within earshot of a third person, or auditor. For example, a teenager might address a friend in a more formal way if an adult were present.

3. Non-audience, setting/topic. Recall from chapter 2 that a number of different topics can be distinguished within the sociolinguistic interview, including “language,” “soapbox,” and “kids.” Labov (1972a) claimed that speakers tend to monitor more for some of these topics (such as “language”) than for others (such as “kids”). Bell (1984) agrees that different topics can elicit different styles, but he claims that monitoring is not involved. Rather, he says (Bell, 2002, p. 146) that different topics (as well as different physical settings, like a school or a playground) are associated with particular styles of speaking. Therefore, switching topics or settings is a bit like switching audiences because speakers recall the audiences that are typically addressed in discussing these topics or that are typically present in these settings. As Bell (2002, p. 293) notes, this claim is reminiscent of Bakhtin’s (1981, p. 293) idea that “all words have the ‘taste’ of a profession, a genre, . . . a generation . . . Each word tastes of the context and contexts in which it has lived its socially charged life” (quoted in Bell, 2002, p. 143). This claim is also, of course, reminiscent of associationist psychology, which was popular at the time Bakhtin did his work. So, according to audience design theory, we can understand Labov’s (1972a) informants’ shift to informal speech when discussing kids as their recollection of the “taste” of the language associated with this topic and setting, rather than their becoming less conscious of their speech and therefore reverting to their natural, vernacular style.

Figure 8.3 Bell’s categorization of different types of style shifting (from Bell, 2001). Reprinted with permissions.

The second main type of audience design shown in figure 8.3 is initiative design, of which the most important type (and the only type we will discuss) is referee design. In referee design, speakers do not shift to match the speech of their audience but rather shift in the direction of an absent group of speakers, or referees. Two kinds of referee design are distinguished.

1. Ingroup design occurs when speakers shift in the direction of their own social group while addressing an interlocutor who is not a member of that group in order to emphasize their own identity. This kind of shifting is similar to divergence in accommodation theory.

2. Outgroup design occurs when speakers adopt a style that emphasizes features of a group with whom they wish to identify. This kind of shift can be short term, or temporary, or can be long term, resulting in a change in basic speaking style, as speakers attempt to establish themselves as members of an outside social group. An early example of outgroup design is provided by Labov’s (1972a) study of vowels on Martha’s Vineyard. The island of Martha’s Vineyard, which is discussed in chapter 1, provides a fascinating laboratory for the study of language change because the population of the island changes every summer when swarms of tourists arrive from the mainland, bringing with them their mainland accents. The year-long residents of the Vineyard, especially those who live on the western part of the island, speak a Yankee dialect that contains relics of an earlier variety of English. Features of this variety include the pronunciation of post-vocalic /r/ (Boston English is r-less) and centralization of the vowels in the diphthongs /aw/ and /ay/, so that “about the house” is pronounced [əbəwt] the [həws]. Labov studied the pronunciation of these diphthongs in the speech of young islanders. He found that high school students who planned to leave the island for careers on the mainland showed virtually no centralization in reading style, but students who were planning to stay on the island showed a high degree of centralization. Labov (1972a) suggested that vowel centralization has become a marker of identity as a traditional islander, and that those who wish to adopt this identity increase this feature in their speech. He notes, “When a man says [rəyt] or [həws], he is unconsciously establishing the fact that he belongs to the island: that he is one of the natives to whom the island really belongs” (p. 36).


Since Bell’s early work, sociolinguists have placed more emphasis on referee design as a motivation for style shifting because it is a way for speakers to affirm their own social identity or to construct a new one. One might ask how it is possible to determine whether an instance of style shifting is responsive or initiative. Did the New Zealand radio announcers devoice their /t/s on Public Radio because they had gauged the speech of their audience and wished to accommodate to it (responsive, audience design), or was it because they wished to establish a personal identity as members of the upper class (initiative, referee, outgroup design)? Bell (2001) admits that this question can be difficult, and suggests that sociolinguists must engage in a qualitative study of speakers’ social circumstances and identities in order to answer it.

Studies of Style Shifting in Interlanguage

Perhaps the most carefully designed study of audience design in interlanguage was conducted by Young (1991), who looked at how accurately adult Chinese speakers supplied the plural morpheme -s in obligatory contexts. Young found that a number of factors, including the linguistic environment in which -s appeared, influenced the variation, but here we will look only at the effect of audience. Young (1991) arranged to interview each informant twice, once by a native English speaker and once by a native Chinese speaker. Young’s original hypothesis was that his informants would accommodate to the speech of their interlocutors according to shared ethnicity. Thus, he predicted that in a Varbrul analysis the factor native English speaker would favor plural marking and the factor nonnative English speaker would disfavor plural marking. Young found, however, that the identity of the interviewer made no significant difference in plural marking. In a second analysis of the data, Young (1989) hypothesized that factors other than ethnicity might affect the informants’ degree of accommodation. Therefore, he constructed a “convergence index,” which took into account not only the ethnicity of the informant and interviewer but also their age and sex. Thus, when an informant and an interviewer were of the same sex, ethnicity, and approximate age, the convergence index was high, but when these factors did not match, the convergence index was low. Young (1989) also divided his informants into two English proficiency groups, high and low. This time the Varbrul analysis did produce some significant results, as shown in table 8.2.
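To make the idea of a convergence index concrete, here is a minimal Python sketch. It should not be read as Young's actual measure, whose computation is not described here: the `Person` class, the 10-year window used for "approximate age," and the decision to count any partial match as low convergence are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """Hypothetical record of the three social attributes in the index."""
    sex: str
    ethnicity: str
    age: int


def convergence_score(informant: Person, interviewer: Person,
                      age_window: int = 10) -> int:
    """Count how many of the three attributes match (0 to 3).

    age_window is an assumed definition of "approximate age".
    """
    matches = [
        informant.sex == interviewer.sex,
        informant.ethnicity == interviewer.ethnicity,
        abs(informant.age - interviewer.age) <= age_window,
    ]
    return sum(matches)


def convergence_level(informant: Person, interviewer: Person) -> str:
    """High only when sex, ethnicity, and approximate age all match."""
    return "high" if convergence_score(informant, interviewer) == 3 else "low"
```

For example, a 30-year-old Chinese woman interviewed by a 34-year-old Chinese woman would count as high convergence in this sketch, while the same informant interviewed by a 55-year-old American man would count as low.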

As table 8.2 shows, only the high proficiency speakers contribute significantly to the variation. This fact suggests that speakers must gain a minimal proficiency in the use of a target language structure before socially conditioned variation appears in their speech. The high proficiency speakers appear to accommodate, but in unexpected ways. When there is high convergence between the informants and their interlocutors, plural nouns are marked more frequently, not less frequently as Young had expected. A possible explanation for this result is that the Chinese-speaking interviewers marked plural nouns more accurately than the informants (we are not given information about the relative proficiency of the interviewers), and so the informants were, in fact, accommodating to the speech of their interlocutors. Within the audience design framework in figure 8.3, this type of style shifting appears to be: “responsive, audience, second person addressee” (but recall Bell’s [2001] warning that ethnographic research is required to firmly identify the motive for a style shift). The second result apparent in table 8.2 is that the high proficiency informants diverged from the speech of the NS interviewers with whom they had low convergence. Audience design theory can explain this result by the observation that the informants may have felt somewhat threatened by the test-like interview, conducted by a native English speaker who differed from them in age and sex, and therefore they wished to distance themselves from their interviewers. This type of style shifting would fit in figure 8.3 as “initiative, outgroup, temporary.” Young’s (1989) reanalysis of the data shows that for his informants, in the context of this study, ethnic identity is not the only factor in whether high proficiency speakers converge or diverge from the speech of their interlocutors, but that age and gender are important as well.
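The reading of table 8.2 given above follows the usual interpretation of Varbrul p values, in which values above .50 favor the application of a rule and values below .50 disfavor it. The Python sketch below applies that reading to the high proficiency column; the dictionary layout and the use of `None` for cells reported as n/s or n/a are representational choices made here.

```python
from typing import Optional

# High-proficiency column of table 8.2.
# None stands for a cell reported as n/s (not significant) or n/a.
HIGH_PROFICIENCY = {
    ("high convergence", "NS"): None,   # n/s
    ("high convergence", "NNS"): 0.59,
    ("low convergence", "NNS"): None,   # n/a
    ("low convergence", "NS"): 0.26,
}


def interpret(weight: Optional[float]) -> str:
    """Read a Varbrul p value relative to the neutral point of .50."""
    if weight is None:
        return "no reportable effect"
    if weight > 0.5:
        return "favors plural -s"
    if weight < 0.5:
        return "disfavors plural -s"
    return "neutral"


READINGS = {cell: interpret(w) for cell, w in HIGH_PROFICIENCY.items()}

for (convergence, interviewer), reading in READINGS.items():
    print(f"{convergence} with an {interviewer}: {reading}")
```

Read this way, high convergence with a nonnative interviewer (.59) mildly favors plural marking, and low convergence with a native interviewer (.26) strongly disfavors it.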

Adamson and Regan (1991) looked at the pronunciation of -ing in the speech of Vietnamese and Cambodian adults living in Philadelphia and Washington, D.C., and compared it to the pronunciation of native English speakers (the native speaker part of the study is presented in chapter 2). We found that linguistic environment, speaking style, and speaker’s sex constrained -ing pronunciation for both groups. Among the native speakers, both males and females produced the informal variant -in’ more frequently in unmonitored styles, though on the whole males produced much higher percentages of -in’ than females in both monitored and unmonitored styles. Among the English learners, the females showed the same basic patterning as the native English-speaking females, where unmonitored style favored -in’ and monitored style favored -ing. Unexpectedly, however, we found that the males used more -in’ in their more monitored style. In audience design theory this kind of style shifting appears to be “initiative, referee, outgroup, long-term” design. We speculated that this style shift represented our informants’ effort to construct an appropriate male identity in English.

Table 8.2 Varbrul p values for plural -s marking by Chinese speakers according to convergence with interlocutors.

                                Low-proficiency   High-proficiency
                                informants        informants
High convergence with an NS     n/s               n/s
High convergence with an NNS    n/s               0.59
Low convergence with an NNS     n/s               n/a
Low convergence with an NS      n/s               0.26

n/s = not significant; n/a = insufficient data

Adamson and Regan (1991) also looked at aspects of variation along the vertical continuum. Because of transfer from the native languages of our informants, their initial pronunciation of the -ing morpheme was /iyŋ/. Therefore, movement along the vertical continuum involved acquiring the variant /in/. As noted in chapter 2, Houston (1985) and Labov (2001a) had found that for native speakers the relative frequency of -ing and -in’ was conditioned by grammatical category. For the historical reasons discussed in chapter 2, the nominal categories of noun, pronoun, and gerund favored -ing, whereas the verbal categories of participle, progressive, and going to future favored -in’. Adamson and Regan (1991) found that, in general, nominals and verbals constrained ing–in’ variation in the learners’ speech in the same way. However, there were two exceptions: the pronouns something and nothing and the preposition during, which showed higher frequencies of -in’ than the more verbal categories of participle and adjective. We speculated that the -in’ pronunciation of pronouns and these words involved lexical learning (memorizing individual words), and that this kind of learning is easier than the learning of a variable rule, which is required for learning appropriate variation in open classes of words, such as progressives and gerunds. In terms of the discussion in chapter 3, the change from -ing to -in’ was proceeding in a way analogous to lexical diffusion in regard to the words something, nothing, and during, and was proceeding in a way analogous to regular sound change in regard to progressives and gerunds.

Major (2004) studied English learners’ production of four variables that allow alternation between a formal and an informal form. These variables were: (1) -ing versus -in’; (2) /n/ assimilation in can (e.g., can go versus [kæŋ] go); (3) palatalization (e.g., got you versus [gacuw]); and (4) deletion of /v/ in of (e.g., can of beans versus can o’ beans). The subjects of the study were 16 native speakers of Japanese, 16 native speakers of Spanish, and 16 native speakers of English, all of whom were enrolled in first-year university composition classes. Each language group was equally divided between males and females. The Spanish speakers had been in the United States an average of 5.3 years, and the Japanese speakers for an average of 1.3 years.

The informants were asked to perform two verbal tasks, which, according to Major, elicited two different styles. The first task, which was expected to elicit the more informal style, was to read a complete sentence containing one of the target structures. For example, when reading the sentence “Can Betty go or not?” the informants might say “Ca[m] Betty go or not.” The second task, which was expected to elicit the more formal style, was to read only a phrase containing the target structure, such as “can Betty.” Major believed that when reading only a phrase the informants would be more likely to monitor their speech because they would not have to process the meaning of a complete proposition. Therefore, they would be more likely to say “ca[n] Betty.”

Major (2004) first compared the sentence-reading style to the phrase-reading style for each group of speakers by calculating the overall percentage of informal variants used by each group when the results for all four phonological variables were combined. He found that the native English speakers showed a significant difference between the two styles, as shown in table 8.3.

The native Spanish speakers showed a smaller but still statistically significant difference between the two styles, and the Japanese group showed virtually no difference between the two styles. Major concluded that this result was due to the fact that the Spanish speakers had lived in the U.S. long enough to acquire style stratification, but the Japanese speakers had not.

Major (2004) also looked for gender stratification within the three language groups and found the results shown in table 8.4.

As expected, the degree of gender difference for the native English speakers was large, with the males producing significantly more informal forms than the females. There were also parallel gender differences for the Japanese speakers and the Spanish speakers, with the males producing significantly more informal forms than females for both of these groups. From these results, Major (2004) concluded that his Spanish-speaking and Japanese-speaking informants had partially internalized the community-based gender norms for the three variables, although their degree of gender stratification did not exactly match that of the native speakers. Major (2004) also noted that although both groups of English learners had to some extent acquired gender stratification, only the Spanish speakers, who had been immersed in English longer, had acquired style stratification. From this fact, he concluded that gender norms appear to be acquired before style norms. This finding points to the possibility of a “gender axiom” for adult language learners, along the lines of the style and status axioms for native speakers. The gender axiom would state that style stratification is acquired after, and is possibly derived from, gender stratification.

Table 8.3 Percentage of informal variants for sentence reading style versus phrase reading style for three native language groups. N (the total number of sentences + phrases read by each language group) = 1,793.

Language group    Sentence reading style    Phrase reading style
English           37                        7
Japanese          27                        26
Spanish           25                        13

Table 8.4 Percentage of informal variants for males and females for three native language groups. N (the total number of sentences + phrases read by each language group) = 1,793.

Language group    Males    Females
English           28       15
Japanese          30       22
Spanish           24       13
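The size of the style and gender effects reported in tables 8.3 and 8.4 can be read off as simple differences in percentage points. The following Python sketch is only an illustration: the variable and function names are invented here, and the differences say nothing about statistical significance, which rests on Major's (2004) own tests.

```python
# Percentages of informal variants copied from tables 8.3 and 8.4 (Major, 2004).
STYLE = {  # group: (sentence reading style, phrase reading style)
    "English": (37, 7),
    "Japanese": (27, 26),
    "Spanish": (25, 13),
}
GENDER = {  # group: (males, females)
    "English": (28, 15),
    "Japanese": (30, 22),
    "Spanish": (24, 13),
}


def gap(pair):
    """Difference in percentage points between the two cells of a row."""
    return pair[0] - pair[1]


# Hypothetical summary measures, not figures reported by Major.
style_shift = {group: gap(pair) for group, pair in STYLE.items()}
gender_gap = {group: gap(pair) for group, pair in GENDER.items()}

for group in STYLE:
    print(f"{group:<9} style shift: {style_shift[group]:>2}   gender gap: {gender_gap[group]:>2}")
```

On these figures the Japanese group shows a gender gap of 8 points but a style shift of only 1 point, which is the pattern behind the proposed gender axiom.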

Major’s (2004) study aligns with the studies of native speaker speech in that it showed that in circumstances that allow more attention to form, second language learners produced more correct and more formal speech. However, the studies by Tarone (1988) and Adamson and Regan (1991) show that this is not always the case. A study in which monitoring produced less correct speech was conducted by Thompson and Brown (2003). These researchers looked at the interlanguage of a single informant, a native Spanish speaker who spoke fluent English. She had studied English in Spain in elementary and secondary school but had no functional ability in English when she arrived in the United States at the age of 19. At that time, she married an American (who spoke Spanish but was English dominant) and studied English at an English language institute. After one year, she received a high enough score on the TOEFL (Test of English as a Foreign Language) test to gain admission to an American university. Thompson and Brown’s (2003) study was conducted three years later, when the informant was described as “an advanced speaker of English but with a notable foreign accent” (p. 41).

The researchers looked at the informant’s acquisition of the English phoneme /i/, which does not exist in Spanish, so that many Spanish speakers pronounce ship as sheep and hit as heat. As in the Labovian studies, the informant was asked to produce speech in five styles: narrative, conversation, reading, word list, and minimal pairs. When constructing the three reading tasks, the researchers made sure to include only words that the informant was familiar with. A Varbrul analysis was made to determine the effect of the following factors: style, syllable stress (primary or secondary), preceding sound, following sound, part of speech, number of syllables in the word, and whether the word was part of a minimal pair (e.g., ship forms a minimal pair with sheep, but kit is not part of a minimal pair because there is no word keet). Of these possible independent variables, only style and minimal pair proved to be significant. It was expected that minimal pair words would be more accurate because more confusion could arise if they were pronounced like their mates. However, the informant pronounced the minimal pair words less accurately than the non-minimal pair words. This result could be explained by a connectionist model of lexical access in the following way. When a minimal pair word is accessed, its mate is partially activated and can affect the choice of the vowel in the target word. When a non-minimal pair word is accessed, words that sound somewhat like it are also partially activated, but the activation will be less strong; therefore, phonological interference will be less for these words than for minimal pair words.
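The connectionist account just given can be made concrete with a toy calculation. The sketch below is not Thompson and Brown's model or any published connectionist architecture. It assumes, purely for illustration, a five-word lexicon, that a neighbor's activation equals the share of segments it has in common with the target, and that interference equals the strongest activation among neighbors carrying the competing vowel /iy/. All names in the code are hypothetical.

```python
# Toy lexicon in the chapter's notation: /i/ as in ship, /iy/ as in sheep.
# The choice of words and the segmentations are assumptions for illustration.
LEXICON = {
    "ship": ("sh", "i", "p"),
    "sheep": ("sh", "iy", "p"),
    "kit": ("k", "i", "t"),
    "keep": ("k", "iy", "p"),
    "heat": ("h", "iy", "t"),
}


def similarity(a, b):
    """Fraction of positions at which two equal-length words share a segment."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def interference(target, competing_vowel="iy"):
    """Strongest activation among other words that carry the competing vowel."""
    segments = LEXICON[target]
    rivals = [
        similarity(segments, other)
        for word, other in LEXICON.items()
        if word != target and competing_vowel in other
    ]
    return max(rivals, default=0.0)


for word in ("ship", "kit"):
    print(word, round(interference(word), 2))
```

Under these assumptions ship, whose mate sheep matches it in two of three segments, receives more interference than kit, whose nearest /iy/ neighbors match it in only one segment. That ordering is the one the account needs in order to predict lower accuracy on minimal pair words.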

The effect of style was exactly the opposite of that in Major’s (2004) study and in the Labovian studies. The percentage of accurate pronunciation of /i/ was as follows (Thompson and Brown [2003] do not provide the Varbrul p values):

Style              % /i/    No. of tokens
Narrative          96       238
Conversation       97       300
Reading passage    87       70
Word list          76       33
Minimal pairs      42       24
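Because the five styles differ greatly in the number of tokens, a token-weighted average gives a fairer summary of the figures above than a simple mean of the percentages. The Python sketch below pools the two speaking styles and the three reading tasks. The pooling is an illustration added here, not an analysis reported by Thompson and Brown (2003), and the pooled values are approximate because the source percentages are rounded.

```python
# (style, percent accurate /i/, number of tokens), from Thompson and Brown (2003).
ROWS = [
    ("Narrative", 96, 238),
    ("Conversation", 97, 300),
    ("Reading passage", 87, 70),
    ("Word list", 76, 33),
    ("Minimal pairs", 42, 24),
]
# The two speaking styles; the other three rows are the reading tasks.
SPEAKING = {"Narrative", "Conversation"}


def pooled_accuracy(rows):
    """Token-weighted mean of the percentages in rows."""
    total = sum(tokens for _, _, tokens in rows)
    return sum(pct * tokens for _, pct, tokens in rows) / total


speaking = pooled_accuracy([row for row in ROWS if row[0] in SPEAKING])
reading = pooled_accuracy([row for row in ROWS if row[0] not in SPEAKING])
print(f"speaking styles: {speaking:.1f}%   reading tasks: {reading:.1f}%")
```

Pooled this way, accuracy is roughly 97 percent in the speaking styles and roughly 76 percent on the reading tasks, which restates the reversal of the usual style effect in a single comparison.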

One explanation for the unexpected effect of style on the informant’s speech is the same as the one given in regard to Tarone’s (1988) informants’ data. That explanation (endorsed by Adamson [1988] and Preston [1989] but not by Tarone [1988]) was that these speakers were attempting to monitor forms that they did not consciously control, and in guessing which form to use they were less accurate than if they had allowed the unconscious speech production mechanism to work on its own. This account may explain Thompson and Brown’s (2003) results because their informant apparently was attempting to monitor her vowel production on the reading tasks. Thompson and Brown (2003) report:

Many times the sound would start as a one vowel /i/ and finish as /iy/ or vice versa reflecting a possible attempt to try to approximate the native-speaker norm. Additionally, when [the informant] attempted to produce the minimal pairs, she produced the same vowel but varied the vowel length or pitch attempting to differentiate the two sounds. This lengthening of vowels and change of pitch did not change the timbre of the sound but reflected the monitoring that was occurring as she tried to distinguish between the two words. She was aware of the difference because of the distinct orthography that was present, but she was unable to determine the correct vowels. (p. 17)

As Thompson and Brown (2003) note, their informant’s behavior was similar to that of Flege’s (1991) informants. In his study, Spanish speakers learning English sometimes produced /i/ for /iy/, for example pronouncing steal as still. Flege (1991) suggested that L2 learners have more difficulty establishing an independent phoneme category when a target sound is very similar to one in their own language than when a target sound has no close native language equivalent.

A second and possibly compatible explanation for the pattern of style shifting found in Thompson and Brown’s (2003) study may be found in audience design theory. Recall from the discussion earlier in the chapter that according to Bell (2001) one kind of style shifting involves a change in topic and/or setting, and that, following Bakhtin (1981), Bell says that certain language forms carry the “flavor” of certain topics and settings. Different kinds of elicitation tasks may evoke particular language forms in this way as well. The informant in this study had studied English in Spain in elementary and secondary school, where English was taught mainly in a reading context, and with almost no exposure to native English speakers. Therefore, in the classroom setting the informant presumably had little input containing /i/ and learned to pronounce written English with a Spanish accent. However, after arriving in the United States, the informant was exposed mainly to spoken English in conversations with native English speakers, and so in this setting she received a great deal of input with /i/. Perhaps, then, the informant’s use of /i/ and /iy/ to some extent reflected the circumstances in which she had been exposed to these sounds.

Reconciling Monitoring and Audience Design

An important difference between Labov’s attention to speech theory of style shifting and Bell’s audience design theory is the notion of cognitive difficulty, which the construct of monitoring implies. The monitoring explanation for style shifting assumes that for native speakers the formal styles are more difficult to produce, requiring more attentional resources than the informal styles. Audience design theory does not make this assumption. It claims only that different styles are appropriate to different speakers, audiences, and topics.

It may be that the different research designs that Labov and Bell used are related to this difference in their theories. In Labov’s research design (the sociolinguistic interview) audience was held constant while speaking task was manipulated, thus isolating the differences associated with monitoring. In Bell’s research design, audience was manipulated while speaking task was held constant, thus isolating the differences associated with audience. Labov’s sociolinguistic interview and reading tasks are a bit like a psychological experiment. An informant is usually interviewed and recorded by professors or graduate students, often working in pairs. The interviewers try not to identify themselves as linguists, but in the interest of protecting the informants’ rights (and complying with Human Subjects Review Committee requirements), they do identify themselves as scholars. A team of interviewers might explain their purpose to an informant by saying something like the following: “We’re graduate students at the University of Pennsylvania, and we are studying this community. We’d like to ask you about some of your experiences in this neighborhood.” In these circumstances, participants are usually receptive, or at least neutral, to their interviewers and can be expected to converge with their speech. Of course, this speech will be (or will be assumed to be) prestigious; therefore, the informants can be expected to shift in the direction of upper-class, educated speech. In other words, the phenomenon of monitoring in the sociolinguistic interview, and on the associated reading tasks, involves a particular case of audience design, namely “responsive, audience, second person design,” or convergence. In Labov’s theory this kind of style shift is assumed to be cognitively difficult. The reason, as discussed in chapter 3 (and as pointed out by Tarone [1988] and Preston [2001, 2002]), is that as children native speakers learn a vernacular or informal style of their language, which forms their basic or underlying grammar. As they are exposed to more formal varieties in writing and in the speech of teachers and other adults in formal contexts, children learn new rules and different frequencies of application for variable rules. An example of such a new rule is the formal way to express irrealis conditions in an if clause. The rule of the vernacular grammar may produce sentences like “If I would have known that, I would have told you.” The new, formal rule produces “If I had known that, I would have told you,” or even “Had I known that, I would have told you.” An example of adjusting a variable rule is that New York City children must learn to produce higher frequencies of postvocalic /r/ when discussing academic topics. Labov’s monitoring theory assumes that when speakers shift away from the early-learned, vernacular rules, they must employ additional mental resources, and that these resources are more available when a speaker is not focusing on the content of speech, as a speaker does when telling a story or, to a lesser extent, when reading a story.

Unlike Labov, Bell studied speech that was naturally produced in various settings, for example when travel agents consulted with their clients, or when radio announcers addressed their audiences. He did not administer an instrument containing different elicitation tasks; therefore, the elicitation context was held constant while the audience changed. It may be, then, that monitoring theory and audience design theory are not incompatible: both monitoring and audience affect speaking style under certain circumstances, and therefore both theories are correct, although each theory on its own is incomplete. To better understand the relationship between monitoring and audience design, research is needed that systematically manipulates both the audience and the elicitation context.


9 Teaching Implications

Social Dimensions

Introduction

So far, we have investigated both the social and the psycholinguistic dimensions of linguistic variation. Both of these areas have implications for language teaching, which we will discuss in this chapter. We begin with the implications of socially conditioned variation.

During the days of the grammar translation method, grammar was the focus of foreign and second language teaching, although at the advanced stages learners were assigned readings, ranging from letters to academic articles, that provided them with exposure to authentic discourse. The audio-lingual method, which succeeded the grammar translation method in the 1960s, also focused on grammatical competence. Grammar was not taught explicitly but was learned inductively by drilling strategic grammatical structures. Audio-lingual texts typically did not contain authentic target language readings, but readings constructed to exemplify a particular grammatical structure. The audio-lingual method was overthrown by the communicative language teaching revolution of the 1970s and 1980s. This revolution had roots in both psycholinguistic and sociolinguistic theory.

Krashen (1978, 1982, 1985, 1987), the most influential scholar of communicative language teaching, took a psycholinguistic approach. He advocated making language lessons meaningful, stress-free, and interesting because he believed that such lessons provided the most effective input to the Language Acquisition Device. A different but compatible school of communicative language teaching (e.g., Savignon, 1983; Wilkins, 1976) took a sociolinguistic approach, stressing communicative competence, as advocated by the anthropologist Hymes (1972). Canale and Swain’s (1980) classic model of communicative competence in second language acquisition was reworked by Bachman (1990), who proposed the model of communicative language ability shown in figure 9.1. Figure 9.1 divides communicative language ability into four basic competencies, which are grouped into two larger competencies. Organizational competence consists of grammatical competence (knowledge of the patterns of sentence organization), and discourse competence (knowledge of how different discourses are organized). Pragmatic competence consists of illocutionary competence (how to perform speech acts appropriately) and sociolinguistic competence (how to use styles and registers appropriately).1

A lot has been written about how to teach grammatical competence and illocutionary competence. The teaching of discourse competence has also been extensively addressed in the literature on basic writing and college composition under the rubric of “rhetoric.” However, to date few authors have addressed whether sociolinguistic competence should be taught and, if so, how. In particular, the question of whether to teach nonstandard dialects of the target language has rarely been discussed. Most language teaching programs are similar to the French immersion program in Ontario (reviewed in chapter 4), where, as Mougeon, Rehner, and Nadasdi (2004) point out, sociolinguistic variation is absent from the textbooks and even from the classroom speech of teachers. In Ontario, students mostly study the French of educated Parisians, just as throughout the world students of English, Spanish, Amharic, and other languages study only the standard variety, even if the students live in a place where a vernacular variety is spoken. This situation is similar in some ways to the teaching of English as a second dialect in African American communities in the United States, where textbooks and teachers largely ignore African American English (AAE), the language of the community. In both cases a local variety is suppressed in favor of a prescriptive norm. It is therefore instructive to take a look at proposals to teach AAE in American schools. We will briefly review this question before turning to the equally controversial question of whether to include regional and social varieties in foreign and second language classrooms.

African American English Readers

In the 1960s researchers and educators (e.g., Labov, 1967; Gladney, 1973) noted that a disproportionate number of African American children were not reading at grade level. Undoubtedly, many factors contributed to this problem, but some scholars (e.g., Fasold and Shuy, 1970; Laffey and Shuy, 1973) wondered whether one factor was the difference between the children’s language and the language of the reading textbooks. An example of how such differences could cause problems can be seen in the teaching of phonics. Many phonics lessons teach the English vowel sounds by drilling the difference in “word families,” groups of words that differ by a single sound. Thus, the difference between the vowels in bet and bit might be taught by having the child read the following groups of words:

pet       pit
letter    litter
set       sit
tent      tint
pen       pin

Figure 9.1 Bachman’s model of communicative language ability.

In this lesson confusion could arise because AAE (like Southern English) does not distinguish /e/ and /i/ when these vowels occur before nasals, as in the last two examples. Problems related to syntax and morphology could also arise. For example, AAE, like Spanish, has a rule of negative agreement, allowing sentences like “We ain’t never had no trouble.” Other examples of AAE patterns that do not appear in the Standard English textbooks include completive done (“I done gone to bed,” meaning that the speaker is in bed for the night), and invariant be (“My father, he be tired,” meaning that he is usually tired).

On the basis of such considerations, a number of linguists in the 1960s and 1970s proposed that reading materials written in AAE should be used in the schools. Several dialect readers were written, the most ambitious of which was the Bridge reading program, published in 1977 (Rickford and Rickford, 1995). These materials included passages in three forms: nonstandard, standard, and an intermediate variety, with standard spelling used throughout. Teachers were encouraged to let students pronounce the Standard English passages in their own way, without correcting nonstandard pronunciation. This technique is common practice in language minority areas around the world; in Stuttgart, for example, school textbooks are written in High German, but students read them with a Swabian accent. The Bridge materials also included exercises and an audiotape featuring spoken AAE. An excerpt from a passage written in the nonstandard variety appears below.

DREAMY MAE
This here little Sister name Mae was most definitely untogether. I mean, like she didn’t act together. She didn’t look together. She was just an untogether Sister. Her teacher was always sounding on her ’bout daydreaming in class. I mean, like, just ’bout every day the teacher would be getting on her case. But it didn’t seem to bother her none. She just kept on keeping on. Like, I guess daydreaming was her groove. And you know what they say: “Don’t knock your Sister’s groove.” But a whole lotta people did knock it. But like I say, she just kept on keeping on. (from Rickford and Rickford, 1995, p. 127)

Despite evidence of their effectiveness, the AAE readers met with resistance from many parents, teachers, and community leaders. As a result, the dialect readers were quietly dropped.

More recently, opinions about the value and appropriate place of AAE in the schools have changed, and dialect readers, including the Bridge series, have come back, this time to a more positive, though still mixed, reception (Perry and Delpit, 1998). The Stanford linguists Angela and John Rickford (1995) have conducted research on the use of the Bridge readers in East Palo Alto, California. One goal of their small-scale study was to gauge the attitudes of students and teachers, both White and Black, toward dialect readers versus Standard English readers. The attitudes were mixed. Teachers of both races rated the Standard English readers as better written and more helpful to students. One African American teacher remarked, “Every Black kid knows that there is language for the playground, and then there is language for the classroom, and if you want anyone to take you seriously, you’d better not mix the two. . . I just don’t think it’s the right approach for teaching Black kids English” (Rickford and Rickford 1995, p. 118).

The students, on the other hand, preferred the vernacular materials, but there was a sharp distinction between boys and girls, with boys much preferring the AAE materials. In one of the studies, the African American students were asked whether they preferred the vernacular or Standard English version of a story, and which version was most like the way they talked. To both questions all of the boys answered “vernacular” and all of the girls answered “standard.” This split reflects a phenomenon that Labov (1966) noticed in his studies of AAE in Harlem. Adolescent males appear to be the most consistent speakers of the vernacular. It is only when they get older and move in wider social networks that these speakers show more variation between vernacular and standard forms.

A problem with the Bridge series, which may be apparent from the excerpt above, is that its specially written passages are not real literature. To address this problem, A. Rickford (1999) developed a program written in the vernacular that includes literature, and that could be used alongside introductory materials like the Bridge program. Rickford was educated in the Caribbean, where she read local authors such as V. S. Naipaul. Remembering the excitement and sense of participation she felt reading about her own culture, she decided to create a reading program that would feature traditional African American folk tales, such as the Brer Rabbit stories, and contemporary African American short stories. The language used in these stories varied from formal, Standard English to AAE. Rickford notes, “The vernacular is maintained as an important cultural marker, but idiosyncratic [forms] are avoided” (Rickford, 1999, p. 241), as in this example:

Yes, Brer Rabbit had fallen in love, and it was with one of Miz Meadows’ girls. Don’t nobody know why, ’cause he’d been knowing the girl longer than us folks have known hard times, but that’s the way love is. One day you fine and the next day you in love. (from Rickford, 1999, p. 242)

As discussed in Adamson (2005), the problem with using vernacular language in the public arena is not a problem with the language; it is a problem with the public. But teachers cannot (quickly, at least) change social prejudice, whereas they can change their students’ chances of public success by teaching them to read and write Standard English. A popular compromise position is that students should be bi-dialectal. In regard to reading, both standard and vernacular materials should be used, but students should be allowed to pronounce the standard materials according to their own phonological systems. In regard to writing, students should be encouraged to use the vernacular in appropriate contexts, such as writing a journal, a letter to a friend, or a dialogue for a short story. Of course, there should be assignments that elicit Standard English, as well. An especially useful exercise is asking the students to recast a piece of writing for a different audience, requiring that Standard English be rewritten in vernacular, and vice versa. As we will see, these suggestions could apply to the teaching of nonstandard varieties in second and foreign language contexts, as well.

Teaching Sociolinguistic Competence in Foreign Language Contexts

Definitions

The distinction between a foreign language and a second language has been controversial. Oxford (1996) proposes a functional definition: a foreign language is typically learned in a setting where the language is seldom used or experienced, while a second language is learned in a setting where the language is typically used by the majority of individuals for everyday communication. In regard to English, Kachru (1996) proposes a historical criterion. He divides the countries of the world into three groups according to the role English plays in the society. Inner circle countries include Britain, the U.S., and Canada, where English is the native language of the majority. In these countries English is taught to nonnative speakers as a second language. Outer circle countries include India, Kenya, and Malaysia. Typically, these countries have been colonized by English speakers, and English has become indigenized and is used as an official language or a lingua franca. In these countries also, English is taught as a second language. In expanding circle countries, including Germany, France, and Mexico, English is taught as a foreign language, used for international business, scientific purposes, and cultural enrichment.

Goals of Foreign Language Teaching

Kramsch (2002a) observes that second language instruction and foreign language instruction come from different academic traditions. Second language scholars are mostly trained in linguistics and the social sciences, and the main goal of second language programs is to enable students to speak the target language appropriately and with near-native proficiency. Therefore, second language programs aim to teach survival skills to new immigrants, technical English to new factory employees, English for academic purposes to students, and so on. Foreign language scholars, on the other hand, are mostly trained in literature and the humanities, and the main goal of foreign language programs is to enable students to read and appreciate the literature of the target culture.

Kramsch (2002b) goes on to compare the goals of foreign language study in different countries. For the United States these goals read like the self-help literature that has been popular since Benjamin Franklin. They embody an educational philosophy that tries to improve the adolescent as a whole. Specific goals include enabling the student to “Look beyond . . . customary borders . . . Act with greater awareness of self [and] of other countries . . . [and] gain direct access to additional bodies of knowledge” (1996 National Standards Statement of Philosophy, quoted in Kramsch, 2002a, p. 6). In Germany foreign language education goals also include personal improvement but with a civic emphasis: “Foreign language education contributes eminently to . . . character development of our students . . . [It] should give them insights into Christian and humanistic traditions, to behave according to moral principles, and to respect religious and cultural values” (Rahmenplan Gymnasiale Oberstufe, 1998, quoted in Kramsch, 2002a, p. 8). Kramsch explains: “The German guidelines . . . express the moral obligation of administrators to help school pupils become responsible citizens in a democratic society – the historical legacy of WWII” (Kramsch, 2002a, p. 8). In France foreign language education goals are couched in the language of the Enlightenment, with a focus on rationality and intellectual reflection. They aim at “an increasingly nuanced study of texts of increasing complexity, deepening of a reasoned understanding of culture . . . the deepening of a metalinguistic reflection on the target language and on language in general” (Ministère de l’Education Nationale 2000, p. 19; quoted in Kramsch 2002a, p. 19). As Kramsch (2002a, p. 9) notes, nowhere in any of these goals for foreign language instruction is achieving near-native proficiency mentioned.

Valdman and the Pedagogical Norm

The scholar who is most identified with the question of what forms to teach in the foreign language classroom is Albert Valdman, who addressed the question under the rubric of the pedagogical norm. This construct is defined by Bardovi-Harlig and Gass (2002) as “The immediate language target, or targets, that learners seek to acquire during their language study” (p. 3). Fox’s (2002) definition is similar: “The term ‘pedagogical norm’ is an abstraction that has been used to define a language variety that is simpler and more uniform than that of the native speaker . . . It serves as an immediate target for the language learner and represents a step, or series of steps, that can lead to the eventual acquisition of the full range of native speaker variation” (p. 209).

Valdman (1989, p. 21) laid out four principles for choosing the structures that comprised pedagogical norms. These norms should:


1. Reflect the actual speech of target language speakers in authentic communicative situations.

2. Conform to native speakers’ idealized view of their speech use.

3. Conform to expectations of both native speakers and foreign learners concerning the type of linguistic behavior appropriate for foreign learners.

4. Take into account processing and learning factors.

There is considerable leeway and even some contradiction in these principles. For example, principle 1 says that the pedagogical norm should reflect what native speakers actually say, but principle 2 says that the norm should conform to an idealized view of what they say. Both principles 1 and 2 potentially conflict with principle 4 because ease of processing might suggest teaching nonstandard or even ungrammatical forms. So, the devil is in the details, and over the years Valdman’s instructional materials changed as his emphasis shifted among the principles. Throughout his career, however, Valdman seems to have placed the greatest emphasis on principle 4 in order to facilitate learning, a priority that was, no doubt, appreciated by generations of students.

An example of Valdman’s emphasis on learnability can be seen in his recommendations for introducing yes/no questions in French. These can be formed in two ways. The first way is to invert the subject and the tense-carrying verb, as in English. Thus, the statement la plume est sur la table (the pen is on the table) can be changed into the question Est la plume sur la table? The second way to form a question is to put the phrase est-ce que in front of the corresponding statement as follows: Est-ce que la plume est sur la table? The latter structure seems easier to learn than the former because a single memorized phrase can change any statement into a question, and Valdman advocated teaching this form before the subject–verb inversion form. In his earlier writings, Valdman (1961, 1966) emphasized the philosophy expressed in principles 2 and 3, as did virtually every French course before the 1990s. As Kramsch (2002b, p. 59) notes, “The norm in F[oreign] L[anguage] teaching has historically been represented by the standard forms of written language as encountered in canonical works of literature” (emphasis in original). Reflecting this philosophy, Valdman stated that the pedagogical norm should be based on “the speech behavior of educated Paris speakers” (1961, p. 1). At the same time, however, he noted, “all varieties of French are equally ‘grammatical’ and acceptable . . . [to] . . . the linguistic analyst” (1961, p. 1).
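The claim that est-ce que is easy to learn because one memorized phrase turns any statement into a question can be illustrated by how little the rule needs when written out. The Python sketch below is an illustration only and is not drawn from Valdman's materials. The function name is hypothetical, the statement is assumed to be passed in without sentence-initial capitalization (as in the example above), and the only adjustment made is the ordinary elision of que to qu' before a plain vowel letter. Words beginning with h or y, and every other refinement of French spelling, are deliberately ignored.

```python
# Vowel letters that trigger elision in this sketch; h and y are left out on purpose.
VOWELS = "aeiouàâéèêëîïôùû"


def est_ce_que(statement: str) -> str:
    """Turn a French declarative statement into an est-ce que question."""
    body = statement.strip().rstrip(".").strip()
    if not body:
        raise ValueError("statement must not be empty")
    # "que" elides to "qu'" before a vowel-initial word.
    prefix = "Est-ce qu'" if body[0].lower() in VOWELS else "Est-ce que "
    return prefix + body + "?"


print(est_ce_que("la plume est sur la table"))
print(est_ce_que("il est ici."))
```

The rule never has to locate the subject or the tense-carrying verb. Subject and verb inversion, by contrast, requires both to be identified before anything can be moved, which is consistent with the learnability argument made above.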

As the 1980s gave way to the 1990s, Krashen’s psycholinguistic approach to communicative language teaching, with its emphasis on comprehensible input, gave way to the sociolinguistic approach to communicative language teaching, with its emphasis on communicative competence. Valdman’s philosophy shifted in this direction as well, with attention at the advanced levels to regional and sociolinguistic variation. Valdman (1989) noted, “As instruction progresses, and as learners become more capable of processing the more complex syntactic features characteristic of planned formal discourse, the pedagogical norm must increasingly take into account sociolinguistic considerations” (quoted in Bardovi-Harlig and Gass, 2002, p. 28).

But how important and, indeed, how practical is it to teach variable forms such as the difference between “They were playing basketball” and “They were playin’ basketball” when students may have difficulty including any form of -ing in progressive tenses? Acknowledging this problem, Valdman advocated teaching only a receptive competence of variable features along with metalinguistic knowledge of what alternating forms signify. Students should learn the significance of G versus N without being asked to produce the forms in appropriate circumstances. This move toward emphasizing receptive variable competence was reflected in Valdman’s (1992) condensing the four principles for constructing pedagogical norms into only three. The new principles are:

1. Linguistic: the actual variable production of targeted native speakers in authentic communicative situations.

2. Sociopsychological: native speakers’ idealized views of their speech and the perceptions both native speakers and foreign learners have regarding expected behavior of foreign users.

3. Psycholinguistic: relative ease of acquisition and use.

In the revised version, teaching the idealized language variety no longer rates a separate principle but is conflated with the less prescriptive and more sociolinguistic notion that the norms should reflect expected language behavior, which would include informal and perhaps even stigmatized forms.

Other Advocates of a Variable Pedagogical Norm

In recent years, other writers have stressed the sociolinguistic dimension of pedagogical norms, advocating that foreign language learners be exposed to varieties other than the standard. However, as Fox (2002) acknowledges, this proposal, like the proposal to teach AAE, can run into opposition. Fox (2002) notes, for example, that the French of educated Parisians has long been considered to be not only more socially acceptable, but more beautiful and logical than other varieties, a judgment that is held not only by Parisians but by other French speakers. “Belgian francophones manifest an acceptance of a linguistic subjugation with respect to France, the disparagement of ways of speaking they believe illegitimate, and a pessimistic view of the future of French” (p. 205). Nonetheless, Fox says that attitudes are changing. In her estimation the most important development in French studies in the past two decades is the multicultural turn in language studies, which has resulted in the inclusion of the francophone literatures of Africa, Canada, and the Caribbean as legitimate objects of study. In line with the cross-cultural turn in language studies, Fox (2002) advocates exposing American students of French to “standard Quebec French,” the variety used in Quebec newspapers and radio and television news reports. Although this variety appears to differ from Parisian French mainly in terms of pronunciation and does not include the vernacular forms, or even many mildly marked forms, Fox (2002) believes such exposure will better prepare American students to interact with North American French speakers. Her proposal is thus in line with the idea of teaching French as an international language. However, Fox (2002) acknowledges the practical difficulties of this proposal. Most U.S. French teachers are speakers of Parisian French and would need to rely on recorded materials, which are not presently available.

Kramsch (2002a, 2002b) also advocates exposing foreign language students to variation, in this case not to different regional varieties but to different styles and registers. She suggests that pedagogical norms should be based on “one’s unique experience as a nonnative speaker” (p. 61). In cases where foreign language students are likely to encounter native speakers of the target language (as are students of Spanish in the American Southwest, for example), they should be exposed to informal styles and local vernacular forms. Like Valdman, Kramsch recommends that students acquire only a receptive competence in these forms. In terms of register variation, she advocates exposing students to narrative and poetic forms. But, of course, this is already done in American foreign language programs, where upper division courses consist almost entirely of the study of literature. Kramsch’s more innovative suggestion is to also expose students to different modes of input, including e-mail messages, handwritten documents, telephone conversations, and even code-switching discourse.

Fox (2002) and Kramsch (2002a, 2002b) see the foreign language learner as an active user of the target language in contexts beyond the classroom. This may be possible for some foreign language learners in the United States but not for most of them, who do not have much access to native speakers. Furthermore, even students who live in bilingual communities are usually not able to acquire productive competence in any form of the target language because the typical foreign language program consists of studying the language one hour a day for three to five days a week for three or four semesters. For these learners, Valdman’s pedagogical norm, which includes some exposure to register variation but focuses almost completely on standard forms, and places a higher priority on learnability than on variability, is more realistic. Nevertheless, suggestions such as those made by Fox and Kramsch do address the needs of those foreign language learners who are able to go beyond basic requirements, especially those who participate in increasingly popular study abroad programs. For these learners, the line between foreign language and second language is blurring.


Teaching Sociolinguistic Competence in Second Language Contexts

As mentioned earlier, foreign language teaching and second language teaching come from different academic traditions and embrace different models of the target language. Second language teaching, the more influential of the two traditions, is connected to the social sciences, especially psychology and ethnography. Foreign language teaching comes from the humanities and is connected to literary and cultural studies. Thus, Valdman (1989) properly associates his pedagogical norm for teaching French as a foreign language with a literary standard because his American students will probably encounter French only in the classroom, where they will mostly read and discuss French literature. Second language students usually have very different goals. In the case of English language learners in the United States, students desire to interact and even integrate with native speakers. For this reason, they need to study authentic speech, including informal and perhaps even stigmatized variants.

Auger (2002) addresses the question of what variety of French to teach to heritage English learners in Montreal, a matter that has been debated for 40 years. There are three candidates: (1) the continental standard (educated Parisian French); (2) Standard Quebec French; and (3) the vernacular variety that is spoken by the Montreal working class, sometimes referred to as joual (horse), and that is viewed by many of its speakers as an emblem of their identity. According to Auger (2002), this latter variety, which contains many of the stigmatized features referred to by Mougeon, Rehner, and Nadasdi (2004), has never been seriously considered for use in schools for the reasons already mentioned. Auger (2002) argues that this position is understandable because parents who enroll their children in French immersion programs are as much instrumentally as integratively motivated, and proficiency in Standard French is the key to professional advancement in Quebec. Valdman (1989) acknowledged this situation when he stated that the learning of a foreign language “may be viewed as an economic investment whose value increases in direct proportion to the status conferred by variant forms: the higher the social status associated with a variant, the more remunerative the investment” (p. 21). Thus, the goal of the Ministry of Education of Quebec, according to Auger (2002), was twofold: the students should speak Standard Quebec French and should be able to understand vernacular French. The problem is that the latter goal has not been realized. Graduates of the French immersion programs report that they cannot communicate with their French-speaking neighbors.

Auger (2002) offers two recommendations for solving this problem. The first recommendation parallels that of Rickford and Rickford (1995) in regard to AAE. She suggests using literature that features characters speaking in the vernacular. Auger’s second recommendation is more daring. She proposes teaching some productive control of the vernacular using standard communicative methods, such as roleplays. For example, a student might act as a DJ for a radio show or might play the role of a college student welcoming a francophone roommate. The exercises that Auger (2002) provides show that she proposes to teach not only the vocabulary and pronunciation of the vernacular, but also some of the syntactic structures, such as ne deletion and the postverbal (not preverbal) placement of object clitics. There is a second parallel between Auger’s (2002) proposal and Rickford and Rickford’s (1995) proposal for using Black English literature. These authors note that during the adolescent years language is closely connected with identity, and if a goal of language education is to foster a bilingual and bicultural identity, this is a critical time to do so.

The Postmodern Turn—Identity

Introduction

In the discussion of style shifting in native speaker speech in chapter 8, we noted the close connection between language use and a speaker’s social identity. This connection has been discussed in recent language acquisition literature, as well. An early definition of identity is provided by Tajfel (1978), who said that social identity is “that part of an individual’s self-concept which derives from his knowledge of his membership of a social group (or groups) together with the value and emotional significance attached to that membership” (quoted in Joseph, 2004, p. 76). More recent scholars, such as Hecht (2002), have emphasized the fluidity of identity, and how it is constructed not entirely by the self but also by others’ perceptions. Joseph (2004) points out that the tension between self-defined identity and other-defined identity is parallel to the debate in literary criticism over whether the “real meaning” of a work is to be found in the author’s intent or in the reader’s interpretation. The problem with looking to authorial intent is that we can never be sure exactly what the author had in mind (if, indeed, the author had a clear intention). The problem with looking for meaning in the “reader’s response” is that readers might be ill-informed or might bring their own idiosyncrasies or agendas to the interpretation of a work. Joseph (2004) advocates bringing both authorial intent and reader response into the interpretation of a work. It is the same, he says, for social identity construction: “Both self-identity and the identities others construct for us go into making up our ‘real’ identity” (p. 83).

We have reviewed a number of studies that deal with native speakers’ motivation for changing the way they speak. Recall from chapter 8 that variationists have found a connection between style shifting and a speaker’s identity. This is most apparent in the kind of style shifting Bell (2001) calls “referee design,” of which there are several types (see figure 8.3). “Ingroup design” occurs when speakers wish to emphasize their identity by increasing the features of their speech that differentiate them from other groups, a phenomenon similar to Giles’ (1984) “divergent accommodation.” “Outgroup design” occurs when speakers change aspects of their basic speaking style in an attempt to establish themselves as members of an outside social group, and this type is most similar to the situation in SLA. As discussed in chapters 1 and 8, Labov (1972a) found that high school students living on Martha’s Vineyard had to choose whether to remain on the island or to move to the mainland in order to pursue greater educational and career possibilities. Those who chose to remain and retain their identity as islanders changed their speech by emphasizing a pronunciation that was associated with island speech. Those who chose to leave did not change their speech.

While Labov’s (1972a) subjects revived a linguistic feature used by older islanders, Eckert’s (1999) high school students, as described in chapter 3, advanced an ongoing change, the backing of the vowel /ʌ/ so that busses sounds like bosses. It is no surprise that the speakers who showed the most backing of /ʌ/ were women because, as Labov (2001a, p. 280) observes, “Women have been found to be in advance of men in most of the linguistic changes studied . . . in the past several decades.” But the identity of the linguistic innovators in Eckert’s (1999) study was more specific than gender. Mid-central vowel backing was most advanced among the social group that Eckert (1999) called “burned-out female burnouts.” These were girls who totally rejected the high school culture of academic achievement, along with sports, clubs, and other approved activities.

Identity and Second Language Acquisition

Second language researchers have long been interested in how a speaker’s identity relates to learning a language, but they have investigated the notion under other rubrics. For example, Gardner and Lambert (1972) addressed the question in terms of motivation. They distinguished two kinds of motivation for English speakers learning French in Montreal: instrumental and integrative. Instrumentally motivated learners wish to use French only for practical purposes, such as business. Integratively motivated learners wish to make French-speaking friends and move socially in French-speaking circles. Gardner (2002) notes, “though self-identity is not explicitly identified in it, the concept of integrativeness involves the willingness to identify with the other language community” (p. 164).

Gardner and Lambert’s (1972; Gardner, 2002) studies assumed that people had a more or less fixed identity in regard to language learning. They were either willing or reluctant to move beyond their own culture and take on some of the characteristics of another culture. But more recent studies of identity have stressed its continuously changing nature. As Norton and Toohey (2002) state, “Applied linguistics researchers have been drawn to literature that conceives of identity not as static and one-dimensional but as multiple, changing, and a site of struggle” (p. 116). These ideas are compatible with the work of Vygotsky (1978, 1986) and Bakhtin (1981), who believed that learning a new language, or even a new discourse style or register, involved taking on aspects of a new identity. As an example of the latter, consider the following conversation between a psychologist and a literate Kazakh peasant, reported by Luria (1976), a disciple of Vygotsky.

Experimenter: It is twenty versts from here to Uch-Kurgan, while Shakhimardan is four times closer. [Actually, the reverse is true.]

Peasant: What! Shakhimardan four times closer?! But it’s farther away!

Experimenter: Yes, we know. But I gave out this problem as an exercise.

Peasant: I’ve never studied, so I can’t solve a problem like that! I don’t understand it! Divide by four? No . . . I can’t . . .

[The experimenter repeats the problem.]

Peasant: If you divide by four, it’ll be . . . five versts . . . if you divide twenty by four, you have five!

(quoted in Frawley, 1997, p. 13)

In this example, the peasant shifts from the discourse of everyday common sense reasoning to the discourse of hypothetical academic reasoning. Vygotskians argue that such a shift involves a shift in identity as well: the peasant adopts an aspect of a recently constructed identity, that of the pupil. We have seen many other examples of identity construction throughout the book, some of which are briefly reviewed below.

The notion of identity is central to Wolfram, Carter, and Moriello’s (2004) study of Hispanic immigrants to North Carolina, discussed in chapter 4. A particularly interesting finding was the difference in the speech of an 11-year-old girl and her 13-year-old brother, both of whom had lived in Piedmont North Carolina all of their lives. The girl had not accommodated to local Southern speech norms, producing only 5.9 percent unglided /ay/ in words like pie and sky. The boy, however, produced 62 percent unglided /ay/, indicating that he had acculturated to local norms. Wolfram, Carter, and Moriello (2004) attribute this difference to the fact that the boy identified with the local “jock culture,” whereas his sister identified more with mainstream American culture.

Identity is also central to Auger’s (2002) discussion of teaching French as a second language. One might wonder why English heritage speakers living in Montreal have to rely on immersion programs to learn French when they live in a francophone city and province. The reason is that Quebec is a divided society (LaPonce, 1992). Auger (2002, p. 91) notes that “most immersion students rarely use French outside of school, as most of their friends are Anglophones . . . Anglophones appear not to seek opportunities to speak French.” Not only that, but as immersion school graduates get older they use less and less French. Auger (2002) suggests that the matter of identity is part of this social segregation. Identity is largely formed during the preteen and adolescent years, but although immersion students spend a good part of each day at school hearing and speaking French, they lack a way to communicate with their French-speaking peers. Therefore, they develop only an English identity. As an example, Auger (2002) quotes Louisa, who despite speaking fluent Parisian French did not integrate with her francophone classmates. “I did not fit in . . . It’s just obvious from my speech [that I’m not French-Canadian]” (p. 97). Auger expresses the hope that “If French-immersion programs can help instill in the many English-speaking children and adolescents enrolled in them a stronger sense of comfort with the French they are learning . . . this will increase their desire to communicate with French-speaking children and teenagers, thus beginning to erode the linguistic boundaries that continue to divide the two linguistic communities” (p. 98).

To summarize this section, recently both foreign and second language specialists have recommended broadening the traditional curriculum to include informal styles and vernacular forms of the target language. In foreign language teaching, the most common recommendation is to aim for only a receptive competence, but in second language teaching, where students have access to native speakers, the recommendation is to aim for some productive competence, as well.

Psychological Dimensions

Constructivist Language Teaching

As the discussion throughout the book has shown, variation theory has emphasized the social dimension of language rather than the psychological dimension. To look for the psychological motivations of linguistic variation, we have turned to the fields of psycholinguistics and Cognitive Linguistics. It is therefore natural that this section will draw more from these fields than from Variation Theory proper. However, Variation Theory does suggest at least one important guideline for teaching, which we will now explore in some detail.

The intuitive (but often ignored) idea that teaching should build on what a student already knows, which can be called constructivism, has been emphasized by teachers, linguists, and psychologists. Perhaps the most influential of the psychologists is Vygotsky (1978, 1986), who proposed the Zone of Proximal Development (ZPD). He defined the ZPD as “the distance between the actual developmental level as determined by independent problem solving and the level of potential development as determined through problem solving under adult guidance or in collaboration with more capable peers” (1978, p. 86). Scholars have interpreted this definition in different ways, depending on how they define “development,” but one common interpretation (Tharp and Gallimore, 1988, 1990) equates development with learning (rather than with cognitive maturation, as proposed by Piaget [1972]). In other words, learning builds on previous learning. For example, students cannot understand the Periodic Table of the Elements unless they first understand the structure of the atom. Thus, a lesson on the Periodic Table would be within the ZPD of a student who understands the basic relationship of protons, neutrons, and electrons, but beyond the ZPD of a student who does not understand this relationship. Another way of looking at this notion is provided by Wood, Bruner, and Ross (1976), who coined the term scaffolding. Scaffolding is the process by which a teacher helps a student understand new concepts by filling in necessary background information. Thus, for a student who does not clearly understand the relationship of protons, neutrons, and electrons, a lesson on the Periodic Table would include scaffolding in the form of a review of atomic structure.

Krashen (1982, 1985, 1987) was perhaps the first linguist to propose the related idea that there are natural stages in language acquisition, and that structures should be taught in the sequence in which they are naturally acquired. Krashen couched his discussion in general terms, saying that if the natural sequence for acquiring a series of structures is i, i + 1, i + 2, etc., and if a student is at stage i, then the best structure to teach is i + 1. As we saw earlier in the chapter, Valdman (1989) endorsed this idea (without specifically mentioning Krashen) in his principle 4 for constructing a pedagogical grammar, which is: “Take into account processing and learning factors.” Krashen did not provide examples of sequential structures, but other researchers have supplied them. One example of an acquisitional order was discovered by the Zweitspracherwerb Italienischer und Spanischer Arbeiter (ZISA) group in the 1970s (reported, for example, in Meisel, Clahsen, and Pienemann, 1981 and Pienemann, Johnston, and Brindley, 1988). The ZISA group described five stages in the acquisition of German word order by speakers of Italian and Spanish. The first three stages are as follows:

1. Learners begin with SVO word order, as in the native languages, even in incorrect contexts (compare example (1) to example (3)).

Example: *die kinder muss machen die pause
the children must make the pause

2. Learners can move a final element to initial position.

Example: kinder spielen da
children play there
AND
da kinder spielen
there children play

3. Learners can move nonfinite verbal elements to the end of the clause, as required in German.

Example: die kinder muss die pause machen
the children must the pause make


The ZISA group explained this acquisitional order in terms of difficulty of cognitive processing. Stage 1 requires the least amount of processing because its structures transfer from the native languages. Stage 2 requires more processing because an element must be moved. In this case it is an adjunct element that is not embedded in a lower structure. Stage 3 requires even more cognitive processing because an embedded element must be taken out of a lower structure and moved to a higher structure (in the case of the example, a verb must be moved out of a V’ and embedded under VP). The ZISA group recommended that these structures be taught in the order mentioned above, and, in fact, claimed that they cannot be learned in any other order.

A second example of a natural acquisition order was studied by Doughty (1991), who looked at the acquisition of relative clauses. The theoretical basis of Doughty’s research was the Noun Phrase Accessibility Hierarchy proposed by Keenan and Comrie (1977). These scholars studied the distribution of different types of relative clauses in the languages of the world and noticed two important facts. The first is that the different types of relative clauses are not equally distributed among the world’s languages. Subject focus relative clauses are more frequent than direct object focus clauses, which are more frequent than indirect object focus clauses, which are more frequent than object of preposition focus clauses, etc.2 The second fact is that there is an implicational hierarchy among the types of relative clauses in the languages of the world. A simplified version of this hierarchy looks like this:

subject focus > direct object focus > indirect object focus > object of preposition focus

That is, if a particular language has object of preposition focus relative clauses, it will also have all the types to the left of it in the hierarchy. Likewise, if a language has indirect object focus relative clauses, it will also have all the types to the left of it in the hierarchy. In other words, subject focus relative clauses are less marked, in the distributional sense, than relative clauses of the other types. This fact suggests that subject focus relative clauses may also be less marked in the psychological sense; that is, they may be easier to learn. Several studies suggest that this is the case (e.g., Eckman, Bell, and Nelson, 1988; Gass, 1980; for a dissenting view, see Tarallo and Myhill, 1983).
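The implicational reading of the hierarchy can be made concrete with a short sketch. The code below is only an illustration under stated assumptions: the names HIERARCHY, implied_types, and is_consistent are hypothetical, and the four-position list is the simplified version given above, not Keenan and Comrie’s full hierarchy. A set of relative clause types is treated as consistent with the hierarchy when it forms an unbroken run starting from the left edge.

```python
# Simplified hierarchy from the text, ordered from least to most marked.
# All names here are hypothetical and introduced only for illustration.
HIERARCHY = [
    "subject",
    "direct object",
    "indirect object",
    "object of preposition",
]


def implied_types(relative_clause_type):
    """Return the types to the left of the given type in the hierarchy.

    Under the implicational reading, a language that has the given type
    is predicted to have all of these as well.
    """
    return HIERARCHY[:HIERARCHY.index(relative_clause_type)]


def is_consistent(types_present):
    """Check whether a set of types respects the implicational hierarchy.

    The set is consistent if no type is present to the right of a
    missing type, i.e. the types form an unbroken run from the left.
    """
    seen_gap = False
    for position in HIERARCHY:
        if position not in types_present:
            seen_gap = True
        elif seen_gap:
            return False
    return True


# A language with indirect object focus clauses is predicted to have
# subject and direct object focus clauses too.
print(implied_types("indirect object"))

# Subject and direct object only: consistent with the hierarchy.
print(is_consistent({"subject", "direct object"}))

# Indirect object without direct object: violates the hierarchy.
print(is_consistent({"subject", "indirect object"}))
```

The same check could in principle be applied to a learner’s interlanguage rather than to a whole language, which is the step the acquisition studies cited above take when they ask whether less marked types are learned first.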

Doughty (1991) taught object of preposition focus relative clauses to college students who were using only subject focus relative clauses in their speech. Her goal was to determine whether the natural order of relative clause acquisition could be “beaten” by offering explicit instruction in a more marked form. She found, first of all, that the students were able to learn the object of preposition relative clauses even though they had not progressed through the natural order represented in the implicational hierarchy. She also found, remarkably, that the students had learned direct object and indirect object focus clauses, even though these types were not taught in the experiment. In other words, teaching the more marked form somehow imparted knowledge of the less marked forms. This result, at first glance, seems to suggest that processing difficulty is not involved in learning relative clauses, but that the acquisitional order is caused by other factors, such as saliency of forms or frequency of input. However, Doughty (1991) suggests that a processing constraint is in operation, namely the basic ability to relativize. Because all of her subjects were producing subject focus relative clauses, she believes that they had developed this processing capacity and therefore were ready to learn the other types of relativization.

Regardless of whether processing factors are involved in a natural order of acquisition, it would seem reasonable to teach structures in the order in which they are naturally acquired. For whatever reason, that order seems to be the easiest for learners to follow. Doughty’s (1991) experiment does not contradict this recommendation. Although her subjects were able to learn less marked relative clauses by being taught only the most marked type, they had advantages that are not shared by most students. For one thing, they were highly motivated students at an Ivy League university; for another, they had the benefit of tailor-made computerized instruction. But how can a teacher know what structures are involved in natural orders (only a few have been studied) and, if cognitive constraints are involved, how can a teacher know whether the students have developed the cognitive capacity in the new language to acquire the target structure? An obvious answer is that if students are variably producing a structure, they have developed the requisite cognitive capacity and are ready to master the structure. In terms of Variation Theory, if students have a variable rule for a structure, it is an appropriate target for instruction, which may push to completion the change in the internal grammar that is already in progress.

Formulaic Sequences and Constructions

A second area of teaching implications involves cognitive linguistics (CL). Recently, linguists and language teachers have paid a great deal of attention to formulaic sequences (sometimes called collocations). These include idioms like “kick the bucket” and “buy the farm,” as well as less idiomatic but commonly used phrases, like “as well as” and “first of all.” Formulaic sequences also include phrases with slots that can be filled with a variable, like “As far as John/Marsha/it is concerned,” and even more abstract patterns like the way construction discussed in chapter 7 (“Rocky couldn’t punch his way out of a paper bag”). The fact that this last example of a formulaic sequence is also a construction in CL raises the question of whether all formulaic sequences are constructions. They are not: only some formulaic sequences qualify as constructions. We will discuss the difference between these two notions later in the chapter.

One reason for the recent interest in formulaic sequences is that they can be identified by computer concordancing programs that look for a particular word in a corpus and retrieve the words around it. So, given the word “majority,” the program would come back with sequences like “in the majority of cases” and “majority report.” These programs can also identify which formulaic sequences are the most common in a corpus of speech or writing, thus identifying which sequences are the most useful for learners.
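What a concordancing program does can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular concordancer: the function names tokenize, concordance, and frequent_sequences, the window size, the four-word sequence length, and the tiny sample corpus are all assumptions made for the example. The first function retrieves the words around each occurrence of a keyword; the second counts recurring word sequences that contain the keyword, which is one simple way of ranking candidate formulaic sequences by frequency.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def concordance(tokens, keyword, window=2):
    """Return (left context, keyword, right context) for each occurrence."""
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


def frequent_sequences(tokens, keyword, n=4, min_count=2):
    """Count n-word sequences containing the keyword, most frequent first."""
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        sequence = tokens[i:i + n]
        if keyword in sequence:
            counts[" ".join(sequence)] += 1
    return [(seq, c) for seq, c in counts.most_common() if c >= min_count]


# A toy corpus, invented for this example only.
corpus = (
    "In the majority of cases the committee agreed. "
    "The majority report was published, and in the majority of cases "
    "it was accepted."
)
tokens = tokenize(corpus)

# Keyword-in-context lines for "majority".
for left, word, right in concordance(tokens, "majority"):
    print(f"{left:>12} [{word}] {right}")

# Recurring four-word sequences that contain "majority".
print(frequent_sequences(tokens, "majority"))
```

On a corpus of realistic size, the frequency threshold and the sequence length would need to be chosen with care, since short, very frequent sequences are not always the ones that are most useful to teach.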

A second reason for interest in formulaic sequences is the growing belief that in language production and comprehension we do not always encode and decode word by word, as implied in the discussion of the sentence production and comprehension models in chapter 6. Sinclair (1996) claims, to the contrary, that “units of meaning are expected to be largely phrases” (p. 82). Such a possibility is, in fact, allowed by Barsalou’s model. As discussed in chapter 6, at level 2 the formulator finds lemmas that match the preverbal message. But it would also be possible for the formulator to match a preverbal message, or part of one, with a formulaic sequence, retrieving both lemmas and some syntactic structure. At level 3 a phrase marker would be constructed that included the syntactic structure handed down by the formulator. It is less clear how Townsend and Bever’s (2001) sentence comprehension model could handle formulaic sequences, but an intriguing possibility is that the templates that constitute pseudosyntax can include formulaic sequences and constructions. Regardless of whether and how formulaic sequences are involved in speech production and comprehension, it is clear they should be taught to ESL students. As Jones and Haywood (2004) observe, “Formulaic sequences enable students to express technical ideas economically and also provide the tone appropriate to a particular academic register” (p. 273).

We now turn to the question of how to teach formulaic sequences. To find out how these forms are presently taught, Jones and Haywood (2004) surveyed four college-level English for Academic Purposes writing texts, which were traditionally structured, with a reading passage followed by a list of vocabulary, some of which were formulaic sequences. Jones and Haywood (2004) found that the texts were “not very useful if the aim is for the students to acquire formulaic sequences” (p. 270). For one thing, the vocabulary lists included a large number of phrases, which would likely overwhelm the students. For another, the examples of phrases were often decontextualized. Furthermore, the books did not include exercises in which the students had an opportunity to use the phrases.

To find an alternative to this method, Jones and Haywood (2004) conducted a study in which they taught formulaic sequences to English learners from a variety of academic disciplines who were attending a university course in English for Academic Purposes. Their exploratory study suffered from problems common to studies conducted in such programs, namely a small number of subjects (only 21), no random assignment to groups, incomplete data due to students being absent on critical days, and a short time in which to teach (only ten weeks). Nevertheless, the research is interesting for the teaching methods it employed as well as for the tentative findings. The formulaic sequences taught in the experiment were chosen from a list compiled by Biber, Conrad, and Reppen (1998), which was based on frequently occurring phrases in a corpus of five million words covering a wide range of disciplines. Examples included “the nature of,” “as a result of,” “on the one hand,” and “studies have shown that.”
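The idea of deriving a list of sequences from frequently occurring phrases in a corpus can be illustrated with a small sketch. It is not the procedure used by Biber, Conrad, and Reppen (1998), whose method is not described here; the function name `frequent_sequences`, the tokenizer, the thresholds, and the sample text are all assumptions made for illustration.

```python
from collections import Counter
import re

def frequent_sequences(text, n=3, min_count=2):
    """Return word n-grams occurring at least min_count times, most frequent first."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # Slide a window of n words across the token list.
    ngrams = zip(*(words[i:] for i in range(n)))
    counts = Counter(" ".join(g) for g in ngrams)
    return [(seq, c) for seq, c in counts.most_common() if c >= min_count]

# Hypothetical mini-corpus; a real study would use millions of words.
sample = (
    "As a result of the drought, prices rose. "
    "Prices fell again as a result of the new policy. "
    "On the one hand, the nature of the problem is clear."
)
print(frequent_sequences(sample, n=4))
```

A raw frequency count of this kind also returns fragments such as “a result of the,” so a candidate list would still need to be filtered by hand or by further criteria before it could serve as teaching material.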

Ten of the students in the study were assigned to a treatment group, which received instruction in the formulaic sequences, and eleven of the students were assigned to a control group, which received the regular lessons. Instruction was presented to the treatment group in a reading class and a writing class. In the reading class, instruction involved three steps. First, a text was studied in the normal way with discussion and scanning exercises. Next, the text was read again, but this time in a format with the target sequences printed in bold. Finally, the students completed exercises involving the sequences. In one exercise, for example, they recast the meanings of the sequences in more colloquial language. In the writing class, the students in the treatment group were asked to write four essays using appropriate sequences and to complete exercises involving the sequences. Finally, the students were asked to use a concordancing program to discover formulaic sequences in selected texts.
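For readers unfamiliar with concordancing, the following sketch shows the kind of keyword-in-context display such a program produces. It is an illustration only, not the program used in the study, which is not identified here; the function name `concordance`, the `width` parameter, and the sample text are assumptions.

```python
import re

def concordance(text, phrase, width=30):
    """Return one keyword-in-context line per occurrence of phrase (case-insensitive)."""
    lines = []
    # Simple substring match (an assumption); no word-boundary check is made.
    for m in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the matched phrases line up in a column.
        lines.append(f"{left:>{width}}[{m.group(0)}]{right}")
    return lines

# Hypothetical text for demonstration.
text = ("Studies have shown that sleep matters. "
        "Other studies have shown that diet matters.")
for line in concordance(text, "studies have shown that"):
    print(line)
```

Lining up the occurrences of a sequence in this way lets students see at a glance what typically precedes and follows it.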

Pre- and post-tests were administered to see whether the treatment group had become more aware of formulaic sequences than the control group. One test was a modified cloze test, where students had to provide the appropriate formulaic sequence based on context and surface cues. For example, the following sentence (which was part of a larger passage) should elicit the phrase this kind of: “Too much of th__ k___ o_ chemical might encourage the immune system to stop the development of the embryo.” The results showed that the treatment group gained 1.5 points from pre-test to post-test, whereas the control group gained no points. Jones and Haywood (2004) concluded, “These results suggest that the modest improvements in the treatment group are due to increased knowledge due to teaching” (p. 285).
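A modified cloze item of the kind just described can be generated mechanically. The sketch below is an assumption about how such an item might be produced, not Jones and Haywood’s procedure; the function name `make_cloze` and the `keep` parameter are hypothetical. It keeps a fixed number of initial letters per word, whereas the item quoted above keeps two letters of “this” and one of the other words.

```python
import re

def make_cloze(sentence, sequence, keep=1):
    """Blank out `sequence` in `sentence`, leaving the first `keep` letters of each word."""
    def cue(word):
        # Keep the initial letters as a surface cue; replace the rest with underscores.
        return word[:keep] + "_" * (len(word) - keep)
    blanked = " ".join(cue(w) for w in sequence.split())
    # Replace only the first occurrence, ignoring case.
    return re.sub(re.escape(sequence), lambda m: blanked, sentence,
                  count=1, flags=re.IGNORECASE)

item = make_cloze(
    "Too much of this kind of chemical might be harmful.",  # shortened example sentence
    "this kind of",
)
print(item)
```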

We now turn to the difference between formulaic sequences and constructions, beginning with a phrase that qualifies as both: “Fred had guilt written all over his face.” This expression, like many formulaic sequences and constructions, has a place for variables. An informal representation of the sequence could be written as in (1):

(1) {John} had {guilt} written all over {his} face

where each pair of curly brackets contains a variable. Thus, (2) is another example of the expression.

(2) Marsha had disappointment written all over her face.

This sequence would not be appropriate for Jones and Haywood’s (2004) experiment because it is too colloquial, but it would be appropriate for a conversation, and it could be taught using Jones and Haywood’s methods by substituting dialogues and stories for academic readings. However, there could be a problem with such a lesson. Notice that not just any emotion can fit comfortably in the {guilt} slot, as shown in (3) and (4).

(3) ??Marsha had pleasure written all over her face.
(4) ??John had rage written all over his face.

The problem with (3) and (4) can be understood by treating (1) as a construction rather than as just a formulaic sequence.

Unlike formulaic sequences, constructions are often related to prototype schemas of lexical items with which they can be fused. Recall that in chapter 7 we saw such a relationship between the ditransitive construction and the prototype schema for “ditransitive verb” (see figures 7.3 and 7.4). The basic meaning of the ditransitive construction is that someone transfers something to someone else, who then possesses it. This event was represented using the formal notation shown in (32) in chapter 7. A simpler formal notation is shown in (5).

(5) SOMEONE TRANSFERS SOMETHING TO SOMEONE ELSE, WHO THEN POSSESSES IT    [NP1 V NP2 NP3]

In this notation the semantic/pragmatic information is written in capital letters on the left, and the associated syntax is written in square brackets on the right. The ditransitive construction contains only abstract syntactic and semantic units, but as we have seen constructions can also contain specific lexical items, and when this is the case, these items are included in the square brackets. Thus, Queller (2001) represents the “written all over” construction as in (6).

(6) SOMEONE’S ATTEMPT TO PRESENT A FRONT OF COMPOSURE IS BEING “MESSED UP” BY AN UNCONSCIOUS DISPERSAL OF EMOTION OVER HIS/HER FACE    [have {guilt} written all over {one’s} face]

Treating (1) as a construction with this semantic interpretation instead of a formulaic sequence allows us to see why (3) and (4) are odd. Someone who is experiencing pleasure or rage does not usually try to present a composed front. Guilt and disappointment, on the other hand, are emotions that people often try to conceal, sometimes without success. Therefore, these lexical items better match the basic meaning of the construction.
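The pairing of a meaning with a partly filled syntactic frame, as in (6), can be pictured as a small data structure. The sketch below is only an illustration of that pairing under stated assumptions: the class `Construction`, its field names, and the set `motivated` are hypothetical, and the set simply lists the two emotions named above. A fuller account would derive the oddness of (3) and (4) from the construction’s meaning rather than from a list.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Construction:
    meaning: str          # semantic/pragmatic side, paraphrasing (6)
    template: str         # syntactic side, with named slots for the variables
    motivated: frozenset  # hypothetical stand-in for "fillers that match the meaning"

    def realize(self, subject, emotion, possessive):
        """Fill the slots; prefix '??' when the emotion is not in the motivated set."""
        sentence = self.template.format(
            subject=subject, emotion=emotion, possessive=possessive
        )
        mark = "" if emotion in self.motivated else "??"
        return mark + sentence

WRITTEN_ALL_OVER = Construction(
    meaning="an attempt to present a composed front is undone by emotion showing on the face",
    template="{subject} had {emotion} written all over {possessive} face.",
    motivated=frozenset({"guilt", "disappointment"}),  # only the examples from the text
)

print(WRITTEN_ALL_OVER.realize("Marsha", "disappointment", "her"))
print(WRITTEN_ALL_OVER.realize("John", "rage", "his"))
```

A bare formulaic sequence would correspond to the template alone; it is the added meaning field that gives a teacher something to point to when explaining why some fillers sound odd.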

The implication for teaching is that a careful examination of a construction’s meaning can make clear why possible surface realizations take the form they do. Therefore, constructions should be taught in more depth than other formulaic sequences. The methods used by Jones and Haywood (2004) would be appropriate for teaching constructions, but the lesson should also provide an analysis of the construction’s meaning and a discussion of why only certain lexical items fit the construction. The goal is to help the students understand why certain lexical items, though not predictable from a construction’s meaning, are “motivated” by it. This advice applies to teaching the ditransitive construction, as well. Recall that in the pilot study of the ditransitive construction, the native English speakers’ intuitions formed a continuum of grammaticality, which is shown in table 7.2. The two ends and the middle point on this continuum are represented in (7) to (9).

(7) Sally got me a ticket.
(8) ?Fred pushed him a beer.
(9) *George operated them the projector.

Recall also that in chapter 7 it was claimed that this continuum of grammaticality resulted from the prototype effects produced by trying to fuse verbs that do not belong to one of the permitted categories (as in (9)) or are outlying members of a permitted category (as in (8); see figure 7.4) with the ditransitive construction.3 This account suggests that when teaching the ditransitive, lessons should include not only fully grammatical examples, like give, but also questionable examples, like push, and ungrammatical examples, like operate, along with an explanation of why these latter words don’t fit. In other words, the prototype lexical schemas associated with a construction should be explored. Teachers are often asked, “Why can’t you say x?” and students are often told, “You just can’t,” but in the case of constructions it is possible to provide a more satisfactory answer.

A Philosophy of Language Teaching

Theories of Understanding

This volume began with a discussion of the philosophical foundations of knowledge, and it is fitting to end with a discussion of the teaching implications of different schools of philosophy. The oldest of the three schools we will discuss is usually called positivism, though cognitive linguists (e.g., Lakoff, 1987; Johnson, 1987) use the term objectivism. Objectivism has been the dominant philosophy of science in the West since Aristotle. This philosophy provides a common-sense answer to two fundamental questions regarding human understanding: the ontological question, What is the nature of reality? and the epistemological question, How do we know what we know? The objectivist’s answer to the ontological question is that the natural world consists of objects that have certain properties, such as weight and density, and that exist in certain relationships to each other, such as “the rock is in the river”; “the bird is flying over the tree.”4 The objectivist’s answer to the epistemological question is that the mind constructs models of reality (that is, schemas), which reflect the objects, properties, and relationships that exist independently in the world. The central assumption of objectivist psychology is that these schemas accurately represent the world, that is, that the mind is a mirror of nature.

However, according to objectivism, it is important not to confuse external reality with mental models of reality. Therefore, objectivism endorses the “independence assumption,” which Lakoff (1987, p. 164) states as follows: “no true fact can depend on people’s believing it, on their knowledge of it, on their conceptualization of it, or on any other aspect of human cognition.” Thus, objectivism posits a “God’s eye” view of the universe, independent of human perception, in which all objects, properties, and relationships are correctly characterized. In other words, to the question “If a tree falls in the forest and no one hears it, does it make a sound?” the objectivist answers, “Yes.”

Objectivism has been challenged by the philosophy of social constructionism (Geertz, 1983; Kuhn, 1970, 1977; Rorty, 1979, 1989). Social constructionists disagree with a central tenet of objectivism: the claim that there are two different kinds of “facts,” which Lakoff (1987, p. 170) calls “brute facts” and “institutional facts.” Brute facts involve the objects, properties, and relationships in the physical world, such as the speed of light and the relative density of gold and water. According to the independence assumption, such facts are true regardless of any human institution. Objectivists believe that scientific theories are grounded in brute facts, and therefore that these theories can be objectively evaluated. Theories that correspond to and predict the actual brute facts of nature are true; other theories are false. Institutional facts, on the other hand, are set up by human beings. They include customs, like leaving a tip in a restaurant, and agreed-upon states of affairs, like the fact that George W. Bush was elected President of the United States (or was he? You can see that there is room for disagreement here). Institutional facts include all of a society’s laws, beliefs, customs, and myths.

Social constructionists contend that there is no absolute dichotomy between brute facts and institutional facts. Obviously, institutional facts violate the independence principle: they depend entirely on human understanding, and they can be changed. For example, in Eastern Europe institutional facts involving national boundaries and political alliances have changed fairly recently. But, as Kuhn (1970) has pointed out, this is the case with brute facts as well. Many scientific “facts” have been discredited. For example, scientists no longer believe that matter consists of four elements, that the sun revolves around the earth, or that space is filled with an undetectable substance called “ether.” Social constructionists say that brute facts, like social facts, are constructed by societies, in this case a society of scientists. Therefore, they deny that there is an essential difference between brute facts and institutional facts and that any form of knowledge is firmly grounded in reality. There is no “God’s eye” view of nature. Social constructionism proposes a relativistic theory of knowledge. Rorty (1989), for example, claims that scientific theories can only be judged as “true” in relation to a particular group of people at a particular time.


Actually, not all social constructionists hold this strongly relativistic position. Kuhn, perhaps the best-known social constructionist, seems to place a higher value on sensory experience in evaluating scientific theories than does Rorty. Kuhn (1970, p. 126) asks, “[Is] sensory experience fixed and neutral? The [objectivist] viewpoint . . . dictates an immediate and unequivocal Yes! But in the absence of a developed alternative, I find it impossible to relinquish entirely that viewpoint.” Kuhn’s (1970) reservations about the absolute relativity of knowledge foreshadowed the third school of philosophy we will look at, which is called experiential realism and is endorsed by many cognitive linguists. But first, let us consider some of the teaching implications of the two philosophies of science we have discussed so far.

Teaching Implications of Objectivism and Social Constructionism

According to Bruffee (1986), both the objectivist and the social constructionist views of knowledge have analogs in teaching. The objectivist believes that knowledge about brute facts, and by extension “proven” scientific theories, is authoritative: certain claims are true and others are false. The ultimate authority is nature, but next in authority is the scientist who understands nature; thus, there is no point in discussing or debating scientific facts. The classroom analog of this view is the lecture course, where an authoritative teacher stands at the front of the room and supplies facts to the students. The flow of information is from the authority to the neophyte.

The social constructionist believes that all knowledge, including scientific knowledge, is collaboratively constructed. Authority resides in a society of experts who agree that certain assumptions and approaches are fruitful. Members of this society interact in conversations and by writing books, articles, letters, and e-mail messages. The process of expounding, criticizing, and revising ideas within a scholarly community is called the “hermeneutic circle.” In science, the hermeneutic circle includes reports of experiments, but experimental results are suggestive rather than conclusive. As Kuhn (1970) points out, no scientific theory is without exceptions and problems, and scholars must interpret and assess how new data affect a dominant theory. Sometimes when experimental results call a theory into question, scholars consider the results to be a special case or a convenient fiction that does not change the basic theory. For example, for at least 50 years after Copernicus proposed the heliocentric universe, astronomers accepted the utility of his model for calculating the location of the planets, but they did not believe that the model was literally true (Kuhn, 1959). Furthermore, sometimes experiments cannot decide between competing theories, and the scientist must hold several theories in mind, using whichever theory is most helpful for dealing with a particular phenomenon. As Feynman (1965) observes,

Every theoretical physicist who is any good knows six or seven different theoretical representations for exactly the same physics. He knows that . . . nobody is ever going to be able to decide which one is right at that level, but he keeps them in his head, hoping that they will give him different ideas for guessing. (p. 168)

Bruffee (1984) says that the classroom analog of the social constructionists’ model of knowledge is a circle of scholars constructing the network of schemas for a particular area of knowledge. The job of the teacher is to engage students in the ongoing conversation of an academic discipline, that is, to introduce them into the hermeneutic circle. He states, “Our task must involve engaging students in conversation among themselves at as many points in both the writing and reading process as possible, and that we should continue to ensure that students’ conversations about what they read and write are similar in as many ways as possible to the way we would like them eventually to read and write” (Bruffee 1984, p. 642).

Experiential Realism

Experiential realism is the philosophy of science associated with cognitive linguistics (Lakoff, 1987; Johnson, 1987). It endorses the social constructionist claim that mental models of institutional facts are entirely socially constructed, but it rejects the claim of some social constructionists that mental models of physical reality can differ radically in different societies.5 Rather, it holds that such models are constructed, to some extent, by an interaction between the human perceptual and cognitive apparatus and the physical world. This construction is constrained by universals of perception and cognition and by the universality of certain basic aspects of human experience, and not just by social beliefs. Experiential realists say that human beings have a “concept-making capacity” that allows us to learn about (“construct”) reality directly as well as by means of language and teaching. Because this cognitive apparatus is universal in the species and because basic experience with physical objects is similar in all societies, “directly known” knowledge is similar as well. Such knowledge provides a grounding for schemas of brute facts. Evidence for these claims comes from studies of universals in language and language acquisition.

Color categorization is an example of how the human concept-making capacity interacts with socially constructed mental models. The number of basic color terms in the languages of the world varies from two to eleven, but it does not vary without limit. There are many languages that have a term for a color that corresponds to both green and blue in the English system. But there are no languages that have a term that corresponds to both green and red. This is because the human optical apparatus perceives green and blue as similar and green and red as maximally different. Furthermore, as explained in the appendix, when languages expand their basic color terms, they all take the same route, first distinguishing colors that maximally contrast in our perception.


Cognitive Linguistics claims that our capacity to construct concepts in other domains is similarly constrained by species-specific perceptual and cognitive mechanisms. These allow us to construct two kinds of mental representations: image schemas and schemas for basic level concepts. Lakoff (1987) notes, “Since image schemas are common to all human beings, as are the principles that determine basic-level concepts, total relativism is ruled out, though limited relativism is permitted” (p. 268). It is beyond the scope of this volume to present the theory of image schemas and basic level objects in full. Instead I will attempt briefly to outline the theory in connection with some of the supporting evidence and refer the reader to more comprehensive discussions.

Image schemas are highly abstract mental representations of basic physical relationships, such as an agent exerting force on an object. Johnson (1987) points out that this concept is ubiquitous in human experience. We push a door and it opens; we throw a baseball and it flies. According to Johnson, such universal experiences give rise to the “force” image schema. This schema is an emergent Gestalt concept that “exists for us prelinguistically, though [it] can be considerably refined and elaborated as a result of the acquisition of language” (1987, p. 48). A drawing of the force image schema appears in figure 9.2, which indicates that when an object, or trajector, is impacted by a force (represented by the solid line), it moves along a path (represented by the dotted line).

How the force image schema emerges in children and how language is mapped onto it are suggested in studies by Slobin (1973, 1985), who found that children from different language backgrounds employ certain kinds of linguistic structures before others. He claimed that these structures reflect “prototypical scenes” in a child’s experience. One such scene is the manipulative activity scene, which Slobin (1985) describes as follows:

Manipulative activities involve a cluster of interrelated notions, including: the concepts representing the physical objects themselves, along with sensorimotor concepts of physical agency involving the hands and perceptual-cognitive changes of state and change of location, along with some . . . notions of . . . causality, embedded in interactional formats of requesting, giving and taking. (p. 1175)

Certain parts of this scene can be marked grammatically. For example, in many languages direct objects are marked for the accusative case. But Slobin claims that children learn to mark “prototypical” objects, that is, objects that are physically affected (such as the object of the action of giving) before they learn to mark objects that are not physically affected (such as the object of the act of seeing). A similar situation occurs in the acquisition of Kaluli (Schieffelin, 1985), an ergative language which marks the subject of verbs that take a direct object. Here children first learn to use the ergative marker with the subjects of verbs that involve direct physical manipulation.

Figure 9.2 The force image schema.

According to Slobin, these facts suggest that at first children do not mark grammatical classes, like direct object, or subject of ergative verb, but rather semantic classes, like the object or agent in the manipulative activity scene. Slobin suggests that children understand the manipulative activity scene directly (perhaps in the form of a force image schema) based on their interaction with objects in the world as mediated by human perception and basic cognition, and that they map language onto this emergent Gestalt concept. This kind of understanding, he says, is universal and accounts for “basic child grammar,” a grammatical system shared by all children in the beginning stages of language learning.

Further evidence that knowledge of basic physical relationships can be understood directly is provided by Talmy (1985a, 1985b), who claims that these relationships form a privileged class of knowledge that all languages tend to represent by grammatical rather than by lexical devices. One example is English prepositions. Prepositions are grammatical rather than lexical because they belong to a closed class of words that is relatively small and cannot be added to easily. Talmy (1985a) points out that prepositions assume image schemas that contain two basic elements, which he called “figure” and “ground” (more recently the terms trajector and landmark have been used; see the discussion of figure 9.3 below). A landmark is an entity that can be construed as a reference point and a trajector is an entity that is located with respect to a landmark. For example, the preposition across in “The man swam across the pool” assumes an image schema where the pool is the landmark and the man is the trajector. Talmy claims that there are universal constraints on how image schemas can characterize the possible relationships between trajector and landmark. The trajector is typically smaller than the landmark, so “The towel lay across the body” sounds more natural than “The body lay across the towel” (unless it’s a very big towel). Furthermore, the absolute sizes of the trajector and landmark do not seem to matter. Thus, “The ant walked across the paper” and “The bus drove across the country” sound equally natural. One aspect of the landmark that languages typically encode grammatically is its state or constituent structure. Through refers to an image schema in which the ground (landmark) is some medium, such as water or trees, rather than a flat plane. “The man walked through the field” cannot refer to a plowed field but only to a field covered with some substance such as wheat.

Cognitive linguists claim that image schemas also underlie grammatical constructions. For example, the “transfer of possession” image schema, shown in figure 9.3, underlies the ditransitive construction. Recall that the ditransitive construction has the prototypical meaning “someone gives something to someone else, who then possesses it.” Figure 9.3 says that a trajector (object) moves from a landmark that is the source of the transfer (the circle on the left) to a landmark that is the goal of the transfer (the circle on the right) and remains there.

Talmy (1985a) characterizes the cognitive process of creating image schemas as a “boiling down of objects, in all their bulk and physicality” (p. 232) to idealized and abstract images. Like Lakoff (1987) and Johnson (1987), he attributes the fact that such schematization appears to work in similar ways in all languages to the universal nature of the human perceptual and cognitive apparatus interacting with physical reality. “The explanation [for similarities] can be found in our very mode—in large part presumably innate—of conceiving, perceiving, and interacting with the contents of space” (Talmy 1985a, p. 233). According to Johnson (1987), image schemas underlie and permeate our language-based network of concepts for objects and events, and thus make possible our understanding of the world. They “provide a basis for and can connect up with our . . . networks or webs of meaning. Without them, we cannot explain the connections and relationships that obtain in our semantic networks” (p. 189). Johnson does not claim that all languages are built on exactly the same image schemas; only that image schemas are substantially similar in all languages.

A second foundational notion of experiential realism is the notion of basic level objects, the evidence for which comes mostly from studies of categorization. All languages categorize objects at various levels of abstraction. The least abstract category is one that contains a particular object. For example, the category “Montserrat” (my cat) contains one member. In English there are a number of more abstract categories to which Montserrat belongs: “Siamese cat,” “cat,” “animal,” and “living thing.” According to Lakoff (1987), languages may differ considerably in their systems of categorization, but most languages will have a word denoting objects at the basic level, which is at an intermediate level of abstractness. Thus, most languages will have a word that corresponds to “cat,” but not necessarily to “Siamese cat” or “animal” or “living thing.” Languages tend to have words for basic level categories because it is at this level of abstraction that the properties shared by members of categories are most salient to human beings. The basic level of categorization for objects is distinguished from more and less abstract levels by three characteristics: (1) Gestalt perception of the object’s overall shape; (2) the human capacity for physical interaction with the object; and (3) the ability to form a rich mental image of the object (it is possible to form a mental image of “Montserrat,” “Siamese cat,” or “cat,” but not of “animal”). Brown (1965), who first proposed the basic level theory, noticed that basic level categories are among the first to be named by children. This is so because the overall shape and manner of bodily interaction with objects at this level are salient to children. Brown observed that this latter characteristic is particularly important. He noted:

Figure 9.3 The transfer of possession schema.

When something is categorized, it is regarded as equivalent to certain other things. For what purposes equivalent? . . . Flowers are equivalent in that they are agreeable to smell and are pickable. Cats are equivalent in that they are to be petted, but gently, so that they do not claw. (Brown, 1965, pp. 318–319)

Notice that the prelinguistic ability to conceptualize basic level objects is also assumed in Slobin’s (1973, 1985) account of the manipulative activity scene. In order for children to understand that a force is acting on an object, they must first understand what an object is.

The claim that knowledge in the form of image schemas and schemas for basic level objects is “preconceptual” (known directly, and not by means of language) explains how human beings who do not have language, such as infants and feral children, can understand the world. This claim is also consistent with the experience of Helen Keller (1988), who reported that before she learned her first word, “water,” she knew perfectly well what the cool liquid was, and did not confuse it with milk or bread. However, Johnson (1987) acknowledges that directly known knowledge can be extended and built upon considerably by different linguistic and conceptual systems, and thus most knowledge is relative. Experiential realism stakes out a position between radical relativism and objectivism. It claims that all societies are “plugged in” to the physical world by means of image schemas and schemas for basic level objects.

To summarize, according to experiential realism schemas for social facts are entirely socially constructed and therefore can differ radically from culture to culture, but schemas for brute facts involving basic level objects and physical relationships are grounded in human perception and cognition as well as universal experience. Therefore, schemas for these facts, like schemas for color systems, cannot differ without limit. Notice that experiential realism does not endorse the objectivist claim that schemas are “true” reflections of reality. Rather, it claims that these representations are “true” in relation to human beings.

Teaching Implications of Experiential Realism

Experiential realism endorses the social constructionist model of knowledge in most respects, and therefore has similar implications for teaching. The metaphor of education as an ongoing conversation in which students take a greater and greater part seems particularly appropriate for language teaching. This metaphor, after all, has guided the communicative competence approach to language teaching (Savignon, 1983). The metaphor is also appropriate for content-based instruction (Adamson, 1993, 2005; Brinton and Master, 1997; Snow and Brinton, 1997), where the goal is to introduce students to an academic culture.

We now consider the ways in which experiential realism differs from social constructionism, and therefore has additional implications for language teaching. The theory that we can directly know certain concepts helps to explain why some methods of beginning language instruction are successful and provides an answer to the common question: How can you teach {Spanish} if you don’t speak the students’ language? Most communicative language teaching methods introduce the target language in connection with basic level objects and simple topological relationships. For example, in the Total Physical Response method (Asher, 1969), the teacher gives commands: “Walk to the door,” “Turn around,” “Put the pen on the book,” while demonstrating what the commands mean. The students then perform the actions themselves. No translation is necessary here. These actions and objects are understood directly.

A second method that involves the manipulation of basic level objects and topological relationships is The Silent Way (Gattegno, 1972). In this method the objects and relationships are modeled by using colored rods while they are described in the target language: “The red rod is above the green rod,” etc. The Silent Way provides a good example of what is meant by direct or prelinguistic understanding because it is not necessary for the students’ native language to mark the distinctions that can be taught with the rods. For example, as we have seen, many languages do not have separate words for blue and green, but as Rosch (1973) found, human beings can readily learn to mark these colors lexically, even though their language does not, because the two hues correspond to natural divisions imposed on the color spectrum by human optical physiology (Kay and McDaniel, 1978). Thus, students have no difficulty learning to name the two colors. Similarly, Spanish-speaking students quickly learn the difference between “the rod is on the glass” and “the rod is in the glass” when it is demonstrated, even though the Spanish preposition en would be used in both cases.

In regard to higher level teaching, and in particular to content-based instruction, the theory of image schemas and basic level objects helps to explain why teaching based on demonstration is more effective than expository teaching, which is based entirely on language. Adamson (1993) describes the case of Ceyong, a seventh-grade student from Korea, who was unable to understand scientific concepts such as specific gravity on the basis of her teacher’s lectures. However, when Ceyong’s tutor took her to a laboratory where she was able to measure the specific gravity of various minerals using scales and beakers of water, the concept became clear to her. Experiential teaching was endorsed long ago by John Dewey, who said:

When education . . . fails to recognize that the primary or initial subject matter always exists as matter of an active doing, involving the use of the body and the handling of material, the subject matter of instruction is isolated from the needs and purposes of the learner, and so becomes just something to be memorized and reproduced upon demand. (1916, p. 184)

It is important to emphasize that content-based teaching must prepare students to benefit from expository teaching as well as experiential teaching. Much of the academic conversation in which we desire students to participate is expository in nature. The point is that experiential teaching can provide an entry into this conversation. The goal of the content-based course should be to provide the most effective mixture of expository and experiential instruction. In general, second language students will require considerably more experiential instruction than will native speakers, as recommended by many experts, including Valdez (2001) and Rigg and Enright (1986).

To summarize, in this section I have attempted to show that the philosophy of experiential realism can provide an epistemology that is compatible with effective approaches to language teaching, both at the introductory and advanced levels. According to this philosophy, most kinds of knowledge are socially constructed, and authority resides not in an objective standard of truth, but in a community of experts who share a paradigm that provides a particular vocabulary and a set of assumptions about what counts as a good argument. Scholarship is viewed as a continuing conversation among members of this community, which continually results in a refinement of vocabulary and occasionally in a shift of paradigms. Education is viewed as increasing participation in this conversation. This model of knowledge implies that language instruction should be interactive and collaborative. Experiential realism also claims that some knowledge can be constructed by direct sensory experience, and therefore is independent of language. Thus, experiential teaching that involves the observation and manipulation of objects and relationships can provide an entry into the academic conversation of a second language.


Appendix: Variation and Change in Color Semantics

Introduction

Scholars who study color category systems have recognized that their work is similar to the study of variation and change in other linguistic systems. Berlin and Kay (1969) pointed out the similarities between their theory of the development of color systems and Jakobson’s (1962) theory of children’s phonemic development. Kay (1975) pointed out similarities with Weinreich, Labov, and Herzog’s (1968) theory of sound change, noting that in both cases diachronic change is accompanied by synchronic variability and heterogeneity. Investigations of sound systems described in Labov (1982, 1994, 2001a), and of color systems described in MacLaury (1986, 1991), point up both similarities and differences in how change takes place in these two cognitive systems. In this appendix, the theory of color category change is discussed within the framework of sound change outlined in chapter 3.

Background: The Evolution of Color Categories

Berlin and Kay (1969) studied systems of color categorization in 98 languages. They noted that all languages have an infinite number of terms for describing colors, ranging from “red,” to “reddish-brown,” to “the color of the rust on my aunt’s Chevrolet.” However, among this infinite number of terms there are a maximum of 11 basic color terms. A color term is basic if, among other things, it is a single morpheme, is not derived from another term (like reddish-brown), and uniquely names a region of the color spectrum. The set of basic color terms exhaustively covers the color spectrum. For each individual, the basic color terms form a cognitive system that is familiar and easily accessible. The individual will know and use many secondary color terms that are not part of the basic system, but the basic terms will be used most frequently. The 11 basic color terms in English are: white, black, red, green, yellow, blue, brown, purple, pink, orange, and gray. Many languages have fewer than 11 basic terms, but these languages can be arranged in a fairly strict implicational relationship. This relationship, as modified by Kay (1975), is shown in figure A.1, which says that if a language has only two color terms, they will be light-warm and dark-cool. If a language has only three color terms, they will be white, warm, and dark-cool, and so on.

Berlin and Kay (1969) claimed that their synchronic study of color categorization systems has diachronic implications. That is, the implicational relationships in figure A.1 apply not only to the description of present-day systems, but also to the historical development of color category systems. Thus, early systems contained the terms in the early stages of figure A.1, and as they developed they added the terms in the later stages. Such evolution can be observed today in the changing color systems of developing societies, as described below. What is the motivating force that causes a society to expand its stock of basic color terms? Berlin and Kay (1969, p. 16) observe:

There appears to be a positive correlation between general cultural complexity (and/or level of technological development) and complexity of color vocabulary . . . Increase in the number of basic color terms may be seen as part of a general increase in vocabulary, a response to an informationally richer cultural environment about which speakers must communicate effectively.

Figure A.1 Stages of evolution in color categories.


Recall from chapter 3 that Labov’s theory of sound change includes social, functional, and physiological factors. Social factors provide the strongest motivation for sound change. The functional principle of maximum contrast is a motivating force behind chain shifts, where, for example, the overcrowding of phonemes in one area of phonetic space results in a phoneme being pushed into a different area. Physiological factors include the fact that vowels tend to move toward the front of the mouth because it has more articulatory space. Labov (2001a) believes that although physiological factors have a strong influence on sound change, they interact with and can be overridden by social factors, “just as a boat may tack into the wind” (p. 499).

In contrast, Berlin and Kay’s (1969) theory of change in color systems is a functional theory narrowly constrained by physiological universals that limit the direction that change can take. Their theory is functional because it claims that if the informational load on a color term becomes too great, a new color term is added. The new term will name the hue that maximally contrasts with the hues already named. Maximum contrast is determined by the physiological nature of the human optical apparatus.

Kay (1975) and Dougherty (1977) add a social component to the theory of color category development, noting that how far an individual has progressed along the hierarchy shown in figure A.1 is related to social factors. Age is the factor most strongly correlated with color category development. Kay (1975) reports that in studies of color categorization in Aguaruna (Berlin and Berlin, 1975), Futunese (Dougherty, 1975), and Binumarien (Hage and Hawkes, 1975), younger speakers progressed significantly further in color category development than older speakers. He concludes, “All the significant differences point in the same direction—that younger speakers have more advanced basic color systems than older speakers” (Kay, 1975, p. 269).

Another social constraint on the development of color category systems is how much exposure an individual has to more advanced color systems. Berlin and Berlin (1975) report that, for Aguaruna speakers, contact with Hispanic culture and proficiency in Spanish correlate with advanced color term development. Similarly, Hage and Hawkes (1975) report that for Binumarien, speakers with more exposure to the New-Melanesian language are more advanced. On the other hand, Dougherty (1975, 1977) reports that for Futunese, the development of color terms appears to be negatively correlated with exposure to New-Melanesian.

Change in Color Category Systems

The theory of change in color category systems outlined above can be compared to the theory of change in linguistic systems outlined by Weinreich, Labov, and Herzog (1968) and Labov (1982, 1994), which was reviewed in chapter 3. These scholars provide a useful framework for the study of linguistic change by specifying five problems with which such a theory must deal. These are the problems of constraints, transition, embedding, evaluation, and actuation. All of these except the constraints problem (which concerns only ease of articulation) have both a linguistic aspect and a social aspect.

The Constraints Problem

The explanation of the physiological constraints on color development is a robust area of cognitive theory. The following discussion is based on the theory presented in MacLaury (1986, 1991), which contains three physiological principles: (1) due to the structure of the optical system people perceive six purest colors (red, yellow, green, blue, white, and black), which can be arranged into exactly fifteen different pairs: red-yellow, red-green, red-blue, yellow-green, yellow-blue, yellow-white, etc.; (2) the two colors in each pair are perceived to be similar to some extent and different to some extent; (3) the strength of this similarity and difference (or contrast) differs for each of the fifteen pairs of colors. For example, the pair blue/green has a high degree of similarity and a low degree of contrast, whereas the pair red/yellow has a low degree of similarity and a high degree of contrast. As mentioned earlier, a color category system adds a basic color by adding a hue that maximally contrasts with the existing basic colors. MacLaury’s final principle is a cognitive principle that explains gradual change in systems as a whole: individuals show variation in their attention to the similarity and difference in color pairs. This principle is discussed below as part of the actuation problem. In sum, physiological constraints do not totally determine sound change or color category change; however, they are much stronger in regard to color change.
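The figure of fifteen pairs in principle (1) follows from choosing two colors out of six without regard to order. The short Python sketch below simply enumerates those pairs; it is an illustration added here, not part of MacLaury’s presentation, and the names PUREST_COLORS and color_pairs are hypothetical.

```python
from itertools import combinations

# The six "purest" colors listed in the text.
PUREST_COLORS = ["red", "yellow", "green", "blue", "white", "black"]


def color_pairs(colors):
    """Return every unordered pair of distinct colors."""
    return list(combinations(colors, 2))


pairs = color_pairs(PUREST_COLORS)

# Six colors taken two at a time give 6 * 5 / 2 = 15 pairs.
print(len(pairs))
for first, second in pairs:
    print(f"{first}-{second}")
```

The sketch counts pairs only; the differing strengths of similarity and contrast in principle (3) would need empirical values that are not given in this appendix.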

The Transition Problem

The transition problem in color change concerns the route by which a new basic color term is added. The most common route involves five phases: (1) near-synonymy with an older term; (2) coextensivity; (3) inclusion; (4) overlapping (a phase that does not always occur); and (5) complementation. In phase 1, when a new color term appears, it has nearly the same range (refers to the same hues) and the same focal point (best example) as an existing color term. Thus, the two terms are virtually synonymous. In phase 2, the focal point of the new term shifts, although the range remains the same. In this phase, the old term and the new term are no longer synonymous. However, they are coextensive because all the hues referred to by the new term could, in a stretch, also be referred to by the old term. An English example of coextensive terms might be rose and pink (although these are not basic color terms). An individual might name exactly the same hues both rose and pink; however, pink would be more appropriate for certain hues and rose more appropriate for others. In phase 3, inclusion, the range of the new term retreats so that it refers exclusively to the area around its focal point. An English example might be the term scarlet, which is completely included within red. The older term can still refer variably to the focal point of the new term, though this would be less likely. In phase 4, overlapping, the older term can no longer refer to the focal area around the new term; however, the fuzzy area in between the two focal points can still be named by each term. In phase 5, complementation, the two terms are in complementary distribution. As mentioned, phase 4, overlapping, is not attested in all color studies, so apparently the system can go directly from inclusion to complementary distribution. Thus, in color change, the range of the older term retreats so that overlapping is first reduced and then eliminated as the old and new terms become mutually exclusive. At that point the information load is equally distributed between the two terms. Phases 4 and 5 are crucial, for in these phases a unique form–meaning relationship is created, and a new basic color term enters the language.
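The five phases can be restated as relationships between the sets of hues that the old and new terms name. The Python sketch below is one possible formalization under simplifying assumptions that are not in the source: each range is treated as a crisp set of hue labels (real ranges are fuzzy and variable), each term has a single focal hue, and the function name classify_phase is hypothetical.

```python
def classify_phase(old_range, new_range, old_focus, new_focus):
    """Name the transition phase implied by two terms' ranges and foci.

    Ranges are collections of hue labels; foci are single hue labels.
    This is a simplification: the text describes variable, fuzzy ranges.
    """
    old_range, new_range = set(old_range), set(new_range)
    if old_range == new_range:
        # Same range: phase 1 if the foci coincide, phase 2 otherwise.
        if old_focus == new_focus:
            return "near-synonymy"
        return "coextensivity"
    if new_range < old_range:
        # The new term has retreated to part of the old term's range.
        return "inclusion"
    if old_range & new_range:
        # Only the area between the two foci is still shared.
        return "overlapping"
    # No shared hues: complementary distribution.
    return "complementation"


# Hypothetical hue chips numbered 1 to 6, old term focused on chip 2.
print(classify_phase({1, 2, 3, 4, 5, 6}, {1, 2, 3, 4, 5, 6}, 2, 5))
print(classify_phase({1, 2, 3, 4}, {4, 5, 6}, 2, 5))
```

Under these assumptions the first call describes coextensive terms and the second describes overlapping terms, since only chip 4 is shared. Cases outside the five phases, such as a new term whose range exceeds the old one, are not modeled.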

The route of color change is similar in some ways to the route of regular sound change described in chapter 3 because in the early stages of both cases a new term can refer to all members of an existing class. In phase 1 of color change, the new term applies variably to all of the hues named by the old term. In sound change, the new phone can be used variably in all the words in the word class, although, typically, there is environmental conditioning. For example, the centralization of /aw/ on Martha’s Vineyard was more advanced before voiceless consonants. Thus, incipient sound change normally corresponds to phase 2 of color category change. However, regular sound change is different from color category change in that it does not produce a new basic category, but merely alters the pronunciation of an existing category.

The route of color change is similar to the route of morphological change in child language acquisition and in decreolization because in these cases a developing system expands to create new basic units. In child language acquisition, Slobin (1973, p. 184) observes, “New forms first express old functions” (emphasis in original), a description that exactly fits phase 1 of color change, near-synonymy. Similarly, in decreolization, according to Bickerton (1975), a characteristic strategy is for new morphemes to be slotted into place in creole structures, semantic as well as syntactic (p. 70). An example from Guyanese Creole is the replacement of bin by di/did. Bin is an anterior time marker that is used most frequently with stative events as in (1), where it indicates a simple past.

(1) dem bin gat wan lil haus
They had a little house.
(Bickerton, 1975, p. 35).

For nonstative events, simple past is most frequently indicated by the verb stem alone. The change from basilectal bin to mesolectal di/did appears to occur in a way that is similar to the color category change. In phase 1, near-synonymy, di/did is in free variation with bin, so that there is no distributional difference between (1) and (2).


(2) dem di/did gat wan lil haus
They had a little house.

In phases 2 and 3, coextensivity and inclusion, di/did still alternates in all semantic environments with bin, but it appears to develop a focal point in a particular semantic field, namely use with nonstative verbs. Bickerton (1975) notes that at the low mesolectal level, bin occurs with 75 percent of past statives and 25 percent of past nonstatives. The figures for di/did exactly reverse this distribution. An equivalent to phases 4 and 5 of color change is not attested in the replacement of bin by di/did. However, it seems reasonable that these phases could occur, if only for brief periods of time. In phase 4, overlapping, bin would occur with prototypical cases of statives, and di/did would occur with prototypical cases of nonstatives. In a system that has reached phase 5, complementarity, bin would occur only with prototypical statives and did would occur in all other cases. In decreolization, of course, there is a further phase where did completely replaces bin, and the stative/nonstative marking of past time disappears.

The social aspect of the transition problem in both sound change and color category change involves the question of the locus of change. Does significant change occur in individuals over the course of their lifetimes, or does change occur mainly from generation to generation? As we saw in chapter 3, Labov (2001a) views the locus of major systemic sound change as generational, not individual. He believes that once an individual’s sound system has been set, it can be modified only in respect to low level rules, such as raising or lowering rules, which produce a different allophone of a phoneme. However, tensing rules, which are likely to produce a new phoneme, can be learned only by children.

The situation with color change may be similar. Perhaps individuals do not change the basic color systems that they first learn, but rather add new secondary colors to the system, which are used with increasing frequency. When the children of these speakers construct their own color systems, they incorporate these secondary colors as basic colors. This hypothesis is supported by Kay (1975), who finds that in general younger speakers have more basic color categories than older speakers. In addition, MacLaury (1986, 1991) shows that older speakers use secondary color terms that younger speakers adopt as basic. The absence of longitudinal data precludes the conclusion that older speakers innovated the terms when they were young, but this seems likely. In research conducted in southern Mexico (MacLaury, 1986, pp. 320–324; Burgess, Kempton, and MacLaury, 1983, fig. 7), one Tarahumara speaker was interviewed twice, once in his village and again two years later in Oaxaca, where he was far from home and had just finished two months of intensive linguistic training. The later data showed shrinkage and polarization of categories and the use of more qualifiers, signs of stronger emphasis on distinctiveness. But the speaker had not added new basic color categories. Thus, some evidence suggests that individual adults make secondary changes but not basic changes in both color and sound and, therefore, that in both systems the locus of basic change is normally between generations (however, see the discussion in regard to the embedding problem).

The Evaluation Problem

A surprising feature of color category research in Mesoamerica is the extreme variation that is seen between members of the same speech community. As a rule, people who interact daily differ vastly in the organization and complexity of their color category systems. For example, speaker A might have three basic color terms, and speaker B might have ten. Neither is aware of their difference, and both are surprised when the difference becomes apparent during the course of elicitation. Berlin and Berlin (1975, p. 86, note 9) report young Aguaruna speakers who register surprise and even laugh at their senior relatives as the elders label color chips for an investigator. MacLaury (1991) reports that, in spite of this extreme difference in color category systems, apparently communication does not break down. However, closer investigation may reveal that misunderstandings are more frequent than presently supposed, as was the case with the speakers with different phonological systems described in chapter 3.

The Actuation Problem and the Embedding Problem

Recall from chapter 3 that social factors are a major motivation for sound change. MacLaury (1991) claims that social factors can also affect color change. An example is the case of Tzeltal and Tzotzil, two Mayan languages of southern Mexico. Tzeltal is proceeding along the expected path of color development with some speakers at stage III of figure A.1 and other speakers at stages V and even VI. Tzotzil speakers exhibit a similar range from stage III through stage VI, but they also show a remarkable phenomenon: the various stages of color terms can coexist within the same speaker. For example, one subject had a term for green and a term for blue, but also a term for green-blue. In fact, this subject had preserved all the terms of the older stage III system with their original meanings while adding new terms to create a stage V system. In the normal course of development, when younger speakers add a new basic color term, the meaning of an older term is modified to denote a smaller range. That is, the new system is built out of the old system, not created alongside it.

The explanation for the unusual development of color terms in Tzotzil can be found in the social circumstances of the community. Tzotzil is spoken in the village of Navenchauc, and Tzeltal is spoken in the village of Tenejapa. Tenejapa is an isolated community located at the end of a dirt road, which is exposed to little external influence. Navenchauc, on the other hand, has been exposed to massive external influence ever since the Pan-American highway was built through the town. In reaction to this threat to tradition, the village has become extremely conservative, as seen in the inhabitants’ traditional patterns of dress and in their guarded relationships with outsiders. MacLaury (1991) hypothesizes that the older system of color terms in Tzotzil has become emblematic of the culture and thus has resisted extinction despite the emergence of the new color terms.

Cognitive Aspects of Sound Change and Color Change

It is apparent from the discussion so far that sociolinguists find the motivation for sound change mainly within the social realm whereas cognitive anthropologists find the motivation for color change mainly within the cognitive realm. This difference undoubtedly stems, in part, from the academic orientation of the two disciplines. Nevertheless, Labov (1979, 1994) has acknowledged that a full explanation of linguistic change must include cognitive as well as social considerations. For example, cognitive factors seem to be necessary to explain the very beginnings of sound change. In the search for innovators, individual differences within the same social network must be explained. In chapter 3 we saw that Labov (2001a) identified innovators as individuals who are leaders within their social network and have contacts outside the network. But some earlier research by Labov (1979) suggests that cognitive differences between individuals may play a role, as well.

Labov (1979) reported that repetition tests administered to adolescent African American English (AAE) speakers produced highly variable results. In their spontaneous speech, all of the subjects showed equivalent proficiency in the AAE vernacular, yet some subjects were unable to repeat Standard English constructions, whereas others, whom Labov calls “verbal leaders,” had little difficulty. Labov (1979) characterizes the first group as “dialect bound.” The distinction between dialect bound and non-dialect bound individuals corresponds to Day’s (1979) distinction between language bound and language optional individuals. Language bound individuals tend to perceive stimuli in terms of existing mental schemas, ignoring non-categorical differences between stimuli. The language bound/language optional distinction between individuals recalls MacLaury’s (1991) distinction discussed above between individuals who focus on similarity (corresponding to language bound speakers) and individuals who focus on distinctiveness (corresponding to language optional speakers). Thus, the same cognitive principle may help to explain individual innovation in color category development and in phonological change. Within a society exposed to novelty, individuals who are language optional will be innovators of color category development. Within a middle level social group, individuals who are language optional and who have many contacts outside the group (and are thus exposed to linguistic novelty) may be innovators of sound change. If these innovators have high local status, the change may spread and become emblematic of the group.


Conclusions

Color category change has been studied largely within a cognitive paradigm, and sound change has been studied largely within a sociological paradigm, but both cognitive and sociological factors are necessary fully to explain both types of change.


Notes

Chapter 2

1. This chapter is based on Adamson and Regan (1991).

2. This section is based closely on Houston’s (1985) discussion.

Chapter 3

1. The parentheses indicate that a form is variably realized. This notation is used when the discussion specifically focuses on language change. Otherwise, the traditional slashes and brackets are used.

Chapter 4

1. This percentage has dramatically increased in recent years (Regan, V. personal communication, December 9, 2007).

2. Montreal has a similar French immersion program (but not identical: the Montreal program involves 100 percent French immersion in grades 1 to 3). For a review, see Adamson (2005). Montreal’s program is often cited as the source of the instructional method called structured immersion, which is designated by constitutional amendment as the only legal method allowed for instructing English language learners in the states of Arizona and California.

Chapter 5

1. Though younger children might still be in the process of acquiring Chinese, their continued acquisition would be maintained by their primary caregivers (i.e., their parents) in the home, thus limiting the confounding possibility that English was being acquired as a “second” first language. In any case, the children had already begun to develop a first language matrix upon which to build.

2. The original data were collected by Dr Muriel Saville-Troike and her research assistant Junlin Pan. Their focus was on the children’s Chinese language use, but they also transcribed the English narratives, which they graciously shared with us.

3. Replacive verbs ultimately had to be omitted from the analysis because there were too few tokens in the data.

Chapter 6

1. In the interest of continuity I will use variation theory terminology when discussing this article; however, the authors used different terms.

2. Feldman (2006) presents a compelling case that the relationship between connectionist computer models and neural computation in the brain is closer than metaphorical. However, he notes that others disagree, quoting Chomsky’s recent statement, “The belief that neurophysiology is even relevant to the functioning of the mind is just a hypothesis. Who knows if we’re looking at the right aspects of the brain at all?” (quoted in Feldman, 2006, p. 80).


Chapter 7

1. Variationists have learned to question purported cases of free variation and to look for factors that favor one form or the other. It is likely that formal discourse would favor dove, and informal discourse would favor dived. In regard to this point Levelt (1989) remarks, “Certain so-called registers . . . seem to select for lexical items with particular connotational properties. [This area] is a matter of much dispute” (p. 183).

2. Notice that the formalism used in (5) does not distinguish between the necessary and the optional features of the prototype because it does not use a convention like angled brackets.

3. The figures in table 7.2 were arrived at in the following way. The subjects rated the grammaticality of each test sentence on a scale of 1 to 5, where 1 meant completely ungrammatical and 5 meant completely grammatical. Then the mean score for each sentence was calculated. Some sentences received a mean score of 5; some sentences received a mean score of 1; and some sentences received a mean score in between these numbers. For ease of display, the mean scores were converted to a grammaticality index score ranging from 1.0 (completely grammatical) to −1.0 (completely ungrammatical), using the formula:

\[ \frac{\text{mean score} - 3}{2} \]

with the result rounded to the nearest tenths place.
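The conversion described in this note is a linear rescaling that maps a mean rating of 5 to 1.0, a mean of 3 to 0.0, and a mean of 1 to −1.0. A minimal Python sketch follows; the function name grammaticality_index and the sample ratings are hypothetical and are not data from table 7.2.

```python
from statistics import mean


def grammaticality_index(ratings):
    """Convert 1-5 ratings for one sentence to an index from -1.0 to 1.0."""
    if not ratings or any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be a non-empty list of values from 1 to 5")
    # Subtract the scale midpoint (3) and divide by half the scale width (2).
    # Python's round() resolves exact ties toward the even digit; the note
    # does not say how ties were handled, so that detail is an assumption.
    return round((mean(ratings) - 3) / 2, 1)


# Hypothetical ratings from five subjects for a single test sentence.
print(grammaticality_index([5, 4, 4, 5, 3]))
```

For these invented ratings the mean is 4.2, which gives an index of 0.6.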

Chapter 9

1. Actually, there is more in Bachman’s (1990) definition of sociolinguistic competence, including knowledge of literary registers, but this is the most important and frequently discussed part of the construct.

2. The focus of a relative clause is determined by what function the relative pronoun (including a deleted relative pronoun) serves in the clause. Here are some examples:

a. The forest [which burned down] was beautiful. This relative clause is subject focus because which serves as the subject of the clause.

b. The forest [which the careless camper burned down] was beautiful. This relative clause is direct object focus because which serves as the direct object of the clause.

c. The forest [the careless camper burned down] was beautiful. This relative clause is also direct object focus because which could be inserted in the direct object position, as in b.

3. Notice, by the way, that the same thing is happening with (3) and (4). These sentences are not totally unacceptable. Rather, guilt and embarrassment are central members of the prototype category “emotions that can be concealed,” and pleasure and rage are outlying members. The latter verbs can be fused with the construction, but they don’t sound quite right.

4. As Dowty, Wall, and Peters (1981, p. 7) put it: “As a first approximation let us simply assume that the world contains various sorts of objects—call them ‘entities’—and that in a particular state-of-affairs these entities have certain properties and stand in certain relations to each other.”


5. It is not easy to identify “radical relativists.” According to Smith (1989, p. 218), even Rorty is “more positive than he acknowledges.” Smith names herself, Feyerabend (1978), Goodman (1968), and Barnes and Bloor (1982) as self-identified “radical relativists.”


References

Adamson, H. D. (1980). A study of variable syntactic rules in the interlanguage of Spanish-speaking adults. Unpublished doctoral dissertation. Georgetown University.

Adamson, H. D. (1988). Variation theory and second language acquisition. Washington, DC: Georgetown University Press.

Adamson, H. D. (1993). Academic competence: Theory and classroom practice. New York: Longman.

Adamson, H. D. (2005). Language minority students in American schools: An education in English. Mahwah, NJ: Lawrence Erlbaum Associates.

Adamson, H. D., & Elliott, O. P. (1997). Sources of variation in interlanguage. International Review of Applied Linguistics, 35, 87–98.

Adamson, H. D., Fonseca-Greber, B., Kataoka, K., Scardino, V., & Takano, S. (1996). Tense marking in the English of Spanish-speaking adolescents. In R. Bayley & D. R. Preston (Eds.), Second language acquisition and linguistic variation (pp. 121–134). Amsterdam: John Benjamins.

Adamson, H. D., & Kovac, C. (1981). Variation theory and second language acquisition: An analysis of Schumann’s data. In D. Sankoff & H. Cedergren (Eds.), Variation Omnibus (pp. 285–292). Carbondale, IL and Edmonton, AL: Linguistic Research, Inc.

Adamson, H. D., & Regan, V. (1991). The acquisition of community speech norms by Asian immigrants learning ESL: A preliminary study. Studies in Second Language Acquisition, 13, 1–22.

Ahrens, K. (1995). The mental representation of verbs. Unpublished doctoral dissertation. University of California, San Diego.

Aksu-Koç, A. A., & von Stutterheim, C. (1994). Temporal relations in narrative: Simultaneity. In R. A. Berman & D. I. Slobin (Eds.), Relating events in narrative: A crosslinguistic developmental study (pp. 393–455). Hillsdale, NJ: Lawrence Erlbaum.

Andersen, R. (1993). Four operating principles and input distribution as explanations for underdeveloped and mature morphological systems. In K. Hyltenstam & A. Viborg (Eds.), Progression and regression in language (pp. 309–339). Cambridge, UK: Cambridge University Press.

Anderson, J. R. (1980). Cognitive psychology and its implications. San Francisco: Freeman.

Anderson, J. R. (1983). The architecture of cognition. Cambridge, MA: Harvard University Press.

Anshen, F. (1969). Speech variation among Negroes in a small southern community. Unpublished doctoral dissertation, New York University.

Anshen, F. (1975). Varied objections to various variable rules. In R. W. Fasold & R. W. Shuy (Eds.), Analyzing variation in language (pp. 1–10). Washington, DC: Georgetown University Press.

Ashby, W. J. (1981). The loss of the negative particle “ne” in French: A syntactic change in progress. Lingua, 39, 119–137.

Ashby, W. J. (1996). A syntactic change in progress. Language, 57, 674–678.

Asher, J. J. (1969). The total physical response approach to second language learning. The Modern Language Journal, 53, 1–17.

Auger, J. (2002). French immersion in Montreal: Pedagogical norm and functional competence. In S. Gass, K. Bardovi-Harlig, S. S. Magnan, & J. Walz (Eds.), Pedagogical norms for second and foreign language learning and teaching: Studies in honor of Albert Valdman (pp. 81–101). Amsterdam: John Benjamins.

Bachman, L. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.

Bailey, N., Madden, C., & Krashen, S. (1974). Is there a “natural sequence” in adult second language learning? Language Learning, 21, 235–243.

Baillargeon, R. (1995). Physical reasoning in infancy. In M. S. Gazzaniga (Ed.), The cognitive neurosciences (pp. 181–204). Cambridge, MA: MIT Press.

Baker, C. L. (1979). Syntactic theory and the projection problem. Linguistic Inquiry, 10, 533–581.

Bakhtin, M. M. (1981). The dialogic imagination (C. Emerson & M. Holquist, Trans.; M. Holquist, Ed.). Austin: University of Texas Press.

Bardovi-Harlig, K. (1995). A narrative perspective on the development of the tense/aspect system in second language acquisition. Studies in Second Language Acquisition, 17, 263–291.

Bardovi-Harlig, K., & Gass, S. (2002). Introduction. In S. Gass, K. Bardovi-Harlig, S. S. Magnan, & J. Walz (Eds.), Pedagogical norms for second and foreign language learning and teaching: Studies in honor of Albert Valdman (pp. 1–12). Amsterdam: John Benjamins.

Barlow, M. (1994). Corpora for theory and practice. Houston, Texas: Rice University, ms.

Barnes, B., & Bloor, D. (1982). Relativism, rationalism and the sociology of knowledge. In M. Hollis & S. Lukes (Eds.), Rationality and relativism (pp. 57–65). Cambridge, MA: Harvard University Press.

Barsalou, L. W. (1992). Cognitive psychology: An overview for cognitive scientists. Hillsdale, NJ: Lawrence Erlbaum.

Bayley, R. (1994). Interlanguage variation and the quantitative paradigm: Past tense marking in Chinese-English. In E. Tarone, S. M. Gass, & A. Cohen (Eds.), Research methodology in second-language acquisition (pp. 157–181). Hillsdale, NJ: Lawrence Erlbaum.

Bayley, R. (1996). Competing constraints on variation in the speech of adult Chinese learners of English. In R. Bayley & D. Preston (Eds.), Second language acquisition and linguistic variation (pp. 97–130). Philadelphia: John Benjamins.

Bayley, R., & Regan, V. (2004). The acquisition of sociolinguistic competence. Journal of Sociolinguistics, 8, 323–338.

Bell, A. (1977). The language of radio news in Auckland: A sociolinguistic study of style, audience and subediting variation. Unpublished doctoral dissertation. University of Auckland.

Bell, A. (1984). Language style as audience design. Language in Society, 13 (2), 145–204.

Bell, A. (1991). Audience accommodation in the mass media. In H. Giles, J. Coupland, & N. Coupland (Eds.), Contexts of accommodation: Developments in applied sociolinguistics (pp. 69–102). Cambridge, UK: Cambridge University Press.

Bell, A. (2001). Back in style: Reworking audience design. In P. Eckert & J. R. Rickford (Eds.), Style and sociolinguistic variation (pp. 139–169). New York: Cambridge University Press.

Bencini, G., & Goldberg, A. (2000). The contribution of argument structure constructions to sentence meaning. Journal of Memory and Language, 43, 640–651.

Berdan, R. (1975). The necessity of variable rules. In R. W. Fasold & R. Shuy (Eds.), Analyzing variation in language (pp. 11–26). Washington, DC: Georgetown University Press.

Berdan, R. (1996). Disentangling language acquisition from language variation. In R. Bayley & D. Preston (Eds.), Second language acquisition and linguistic variation (pp. 203–244). Philadelphia: John Benjamins.

Berlin, B., & Berlin, E. A. (1975). Aguaruna color categories. American Ethnologist, 2, 61–87.

Berlin, B., & Kay, P. (1969). Basic color terms: Their universality and evolution. Berkeley and Los Angeles: University of California Press.

Berman, R. A., & Slobin, D. I. (Eds.) (1994). Relating events in narrative: A crosslinguistic developmental study. Hillsdale, NJ: Lawrence Erlbaum.

Bialystok, E. (1999). Cognitive complexity and attentional control in the bilingual mind. Child Development, 7 (3), 636–644.

Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics: Investigating language structure and use. Cambridge, UK: Cambridge University Press.

Bickerton, D. (1971). Inherent variability and variable rules. Foundations of Language, 7, 457–492.

Bickerton, D. (1975). Dynamics of a Creole System. New York: Cambridge University Press.

Bickerton, D. (1981). The Roots of Language. Ann Arbor: Karoma.

Bley-Vroman, R. (1990). The logical problem of foreign language learning. Linguistic Analysis, 20, 3–49.

Bloom, P. (1993). Grammatical continuity in language development: The case of subjectless sentences. Linguistic Inquiry, 17, 721–734.

Bourhis, R. Y., & Giles, H. (1977). The language of intergroup distinctiveness. In H. Giles (Ed.), Language, ethnicity and intergroup relations (pp. 19–135). London: Academic Press.

Bowerman, M. (1988). The “no negative evidence” problem: How do children avoid constructing an overly general grammar? In J. A. Hawkins (Ed.), Explaining language universals. Malden, MA: Blackwell.

Bowerman, M., & Bresnan, J. (1982). The mental representation of grammatical relations. Cambridge, MA: MIT Press.

Bresnan, J. (Ed.) (1982). The mental representation of grammatical relations. Cambridge, MA: MIT Press.

Brinton, D. M., & Master, P. (Eds.) (1997). New ways in content-based instruction. Alexandria, VA: TESOL.

Brown, R. (1965). Social psychology. New York: Free Press.

Bruffee, K. A. (1984). Collaborative learning and the “conversation of mankind.” College English, 46, 635–652.

Bruffee, K. A. (1986). Social construction, language, and the authority of knowledge: A bibliographical essay. College English, 48, 773–790.

Bull, W. (1965). Spanish for teachers: Applied linguistics. New York: Ronald Press.

Burgess, D., Kempton, W., & MacLaury, R. (1983). Tarahumara color modifiers: Category structure presaging evolutionary change. American Ethnologist, 10, 133–149.

Bybee, J., & Moder, L. (1983). Morphological classes as natural categories. Language, 59, 252–269.

Bybee, J., & Slobin, D. (1982). Rules and schemas in the development and use of the English past tense. Language, 58, 265–289.

Canale, M., & Swain, M. (1980). Theoretical bases of communicative approaches to second language teaching and testing. Applied Linguistics, 1, 1–47.

Cedergren, H. (1973). The interplay of social and linguistic factors in Panama. Unpublished doctoral dissertation. Cornell University.

Cedergren, H., & Sankoff, D. (1974). Variable rules: Performance as a statistical reflection of competence. Language, 50, 333–355.

Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.

Chomsky, N. (1981). Lectures on government and binding. Dordrecht: Foris.

Chomsky, N. (1986). Knowledge of language: Its nature, origin and use. New York: Praeger.

Chomsky, N. (1995). The minimalist program. Cambridge, MA: MIT Press.

Chomsky, N. (1999). On the nature, use and acquisition of language. In W. C. Ritchie & T. K. Bhatia (Eds.), Handbook of child language (pp. 33–54). San Diego: Academic Press.

Clark, H. H., & Clark, E. V. (1977). Psychology and language: An introduction to psycholinguistics. New York: Harcourt.

Clifton, C., Kurcz, I., & Jenkins, J. J. (1965). Grammatical relations as determinants of sentence similarity. Journal of Verbal Learning and Verbal Behavior, 4, 112–117.

Clifton, C., & Odom, P. (1966). Similarity relations among certain English sentence constructions. Psychological Monographs, 80 (5), 1–35.

Cofer, T. (1972). Linguistic variability in a Philadelphia speech community. Unpublished doctoral dissertation. University of Pennsylvania.

Corder, S. P. (1981). Formal simplicity and functional simplification. In R. Anderson (Ed.), New Dimensions in second language acquisition research (pp. 156–152). Rowley, MA: Newbury House.

Crain, S., & McKee, C. (1986). Acquisition of structural restrictions on anaphora. In S. Berman, J.-W. Choe, & J. M. McDonough (Eds.), Proceedings of the North Eastern Linguistics Society 16 (pp. 94–110). Amherst, MA: GLSA.

Darwin, C. (1859/1998). The origin of species. New York: Modern Library.

Day, R. (1979). Verbal fluency and the language bound effect. In C. J. Fillmore, D. Kempler, & W. S.-Y. Wang (Eds.), Individual differences in language ability and language behavior (pp. 57–84). New York: Academic Press.

Dell, G., Chang, F., & Griffin, Z. (2001). Connectionist models of language production: Lexical access and grammatical encoding. In M. Christiansen & N. Chater (Eds.), Connectionist Psycholinguistics (pp. 112–143). Westport, CT: Ablex.

Dewey, J. (1916). Democracy and education. New York: Macmillan.

Dickerson, L. B. (1974). Internal and external patterning of phonological variability in the speech of Japanese learners of English: Toward a theory of second language acquisition. Unpublished doctoral dissertation. University of Illinois.

Dickerson, L. B. (1975). The learner’s interlanguage as a system of variable rules. TESOL Quarterly, 9, 401–407.

Dougherty, J. W. D. (1975). A universalist analysis of variation and change in color semantics. Unpublished doctoral dissertation. University of California, Berkeley.

Dougherty, J. W. D. (1977). Color categorization in West Futunese: Variability and change. In B. G. Blount & M. Sanchez (Eds.), Sociocultural dimensions of language change (pp. 103–118). New York: Academic Press.

Doughty, K. (1991). Second language instruction does make a difference: Evidence from an empirical study of SL relativization. Studies in Second Language Acquisition, 13 (4), 431–469.

Dowty, D., Wall, R., & Peters, S. (1981). Introduction to Montague semantics. Dordrecht: Reidel.

Dulay, H., & Burt, M. K. (1974). Natural sequences in child second language acquisition. Language Learning, 24, 37–53.

Eckert, P. (1988). Adolescent social structure and the spread of linguistic change. Language in Society, 17, 183–208.

Eckert, P. (1999). Linguistic variation as social practice. Oxford: Blackwell.

Eckert, P., & McConnell-Ginet, S. (1992). Think practically and look locally: Language and gender as community-based practice. Annual Review of Anthropology, 21, 461–490.

Eckert, P., & Rickford, J. R. (Eds.) (2001). Style and sociolinguistic variation. New York: Cambridge University Press.

Eckman, F., Bell, L., & Nelson, D. (1988). On the generalization of relative clause instruction in the acquisition of English as a second language. Applied Linguistics, 9, 1–20.

Elliott, O. P. (1995). A glance at the syntactic and semantic principles underlying the Spanish clitic se: A study in second language acquisition. Unpublished doctoral dissertation. University of Arizona.

Ellis, R. (1990). Reply to Gregg. Applied Linguistics, 11, 384–391.

Ellis, R. (1994). The study of second language acquisition. Oxford: Oxford University Press.

Fasold, R. (1972). Tense marking in Black English: A linguistic and social analysis. Washington, DC: Center for Applied Linguistics.

Fasold, R. (1985). Perspectives on sociolinguistic variation. Language in Society, 14, 515–526.

Fasold, R. W., & Shuy, R. W. (Eds.) (1970). Teaching Standard English in the inner city. Washington, DC: Center for Applied Linguistics.

Fasold, R. W., & Shuy, R. W. (Eds.) (1975). Analyzing variation in language. Washington, DC: Georgetown University Press.

Feldman, J. (2006). From molecule to metaphor. Cambridge, MA: MIT Press.

Feyerabend, P. (1978). Against method: Outline of an anarchistic theory of knowledge. London: Verso.

Feynman, R. (1965). The character of physical law. Cambridge, MA: MIT Press.

Fillmore, C. J. (1988). The mechanisms of a construction grammar. BLS, 14, 35–55.

Fischer, J. (1958). Social influence on the choice of linguistic variant. Word, 14, 47–56.

Flege, J. (1991). Perception and production: The relevance of phonetic input to L2 phonological learning. In T. Huebner & C. Ferguson (Eds.), Crosscurrents in second language acquisition and linguistic theory. Philadelphia: John Benjamins.

Fodor, J. A. (1975). The language of thought. New York: Thomas Y. Crowell.

Fodor, J. A., & Garrett, M. F. (1967). Some syntactic determinants of sentential complexity. Perception and Psychophysics, 2, 289–296.

Fox, C. A. (2002). Incorporating variation in the French classroom. In S. Gass, K. Bardovi-Harlig, S. S. Magnan, & J. Walz (Eds.), Pedagogical norms for second and foreign language learning and teaching: Studies in honor of Albert Valdman (pp. 201–211). Amsterdam: John Benjamins.

Frawley, W. (1997). Vygotsky and cognitive science. Cambridge, MA: MIT Press.

Gardner, R. C. (2002). Social psychological perspective on second language acquisition. In R. Kaplan (Ed.), The Oxford Handbook of Applied Linguistics (pp. 160–169). Oxford: Oxford University Press.

Gardner, R. C., & Lambert, W. E. (1972). Attitudes and motivations in second language learning. Rowley, MA: Newbury House.

Garrett, M. (1975). The analysis of sentence production. In G. H. Bower (Ed.), The psychology of learning and motivation (pp. 133–175). San Diego: Academic Press.

Gass, S. (1980). An investigation of syntactic transfer in adult second language learners. In R. Scarcella & S. Krashen (Eds.), Research in second language acquisition. Rowley, MA: Newbury House.

Gass, S., Bardovi-Harlig, K., Magnan, S. S., & Walz, J. (Eds.) (2002). Pedagogical norms for second and foreign language learning and teaching: Studies in honor of Albert Valdman. Amsterdam: John Benjamins.

Gattegno, C. (1972). Teaching foreign languages in schools: The silent way. New York: Educational Solutions.

Geertz, C. (1983). Local knowledge: Further essays in interpretive anthropology. New York: Basic Books.

Giles, H. (Ed.) (1984). The dynamics of speech accommodation. Amsterdam: Mouton.

Giles, H., Coupland, J., & Coupland, N. (Eds.) (1991). Contexts of accommodation: Developments in applied sociolinguistics. Cambridge, UK: Cambridge University Press.

Giles, H., & Powesland, P. F. (1975). Speech style and social evaluation. London: Academic Press.

Gladney, M. R. (1973). Problems in teaching children with nonstandard dialects. In J. L. Laffey & R. W. Shuy (Eds.), Language differences: Do they interfere? (pp. 40–46). Newark, DE: International Reading Association.

Goldberg, A. (1995). Constructions: A construction grammar approach to argument structure. Chicago: University of Chicago Press.

Goldberg, A. (2006). Constructions at work: The nature of generalization in language. Oxford: Oxford University Press.

Goodman, K. (1968). The psycholinguistic nature of the reading process. Detroit, MI: Wayne State University Press.

Gregg, K. R. (1990). The variable competence model and why it isn’t. Applied Linguistics, 11, 364–383.

Gregg, K. R. (1996). The logical and developmental problems of second language acquisition. In W. C. Ritchie & T. K. Bhatia (Eds.), Handbook of second language acquisition (pp. 49–81). San Diego: Academic Press.

Gropen, J. (1989). Learning locative verbs: How universal linking rules constrain productivity. Unpublished doctoral dissertation. MIT.

Gropen, J., Pinker, S., Hollander, M., & Goldberg, R. (1991). Affectedness and direct objects: The role of lexical semantics in the acquisition of verb argument structure. Cognition, 41, 153–195.

Hage, B., & Hawkes, R. (1975). Binumarien color categories. Ethnology, 24, 287–300.

Hansen, J. (2001). Linguistic constraints on the acquisition of English syllable codas by native speakers of Mandarin Chinese. Applied Linguistics, 22 (3), 338–365.

Hare, M., & Goldberg, A. (1999). Structural priming: Purely syntactic? Paper presented at the Proceedings of the Cognitive Science Society. Proceedings of the 22nd annual cognitive science society (pp. 208–211). Mahwah, NJ: Lawrence Erlbaum.

Hecht, M. L. (2002). A research odyssey toward the development of a communication theory of identity. Communication Monographs, 60, 76–82.

Houston, S. (1985). Continuity and change in English morphology: The variable ING. Unpublished doctoral dissertation. University of Pennsylvania.

Howard, M., Lemee, I., & Regan, V. (2006). The L2 acquisition of a phonological variable: The case of /l/ deletion in French. Journal of French Language Studies, 16, 1–24.

Hudson Kam, C. L., & Newport, E. L. (2005). Regularizing unpredictable variation: The roles of adult and child learners in language formation and change. Language Learning and Development, 1 (2), 151–195.

Huebner, T. (1985). System and variability in interlanguage syntax. Language Learning, 35, 141–163.

Hymes, D. (1972). On communicative competence. Philadelphia: University of Pennsylvania Press.

Jacobson, R. (1962). Selected Writings I. The Hague: Mouton.

Johnson, M. (1987). The body in the mind. Chicago: University of Chicago Press.

Jones, M., & Haywood, S. (2004). Facilitating the acquisition of formulaic sequences: An exploratory study in an EAP context. In N. Schmitt (Ed.), Formulaic sequences: Acquisition processing and use (pp. 269–300). Philadelphia: John Benjamins.

Joseph, J. E. (2004). Language and identity: National, ethnic, religious. New York: Palgrave.

Kachru, B. (1996). The paradigm of marginality. World Englishes, 15 (3), 241–255.

Kaschak, M. P., & Glenberg, A. M. (2000). Constructing meaning: The role of affordances and grammatical constructions in sentence comprehension. Journal of Memory and Language, 43, 508–529.

Kay, P. (1975). Synchronic variability and diachronic change in basic color terms. Journal of Language in Society, 4, 257–270.

Kay, P. (1978). Variable rules, community grammar, and linguistic change. In D. Sankoff (Ed.), Linguistic variation: Models and methods (pp. 71–83). New York: Academic Press.

Kay, P. (1990). Even. Linguistics and Philosophy, 13 (1), 59–112.

Kay, P., & McDaniel, C. (1978). The linguistic significance of the meaning of basic color terms. Language, 54, 610–646.

Kay, P., & McDaniel, C. (1979). On the logic of variable rules. Language in Society, 8 (3), 151–187.

Keller, H. (1988). The story of my life. Norwalk, CT: Easton Press.

Kemmer, S. (1993). The middle voice. Philadelphia: John Benjamins.

Keenan, E. L., & Comrie, B. (1977). Noun phrase accessibility and universal grammar. Linguistic Inquiry, 8, 63–99.

Kovac, C., & Adamson, H. D. (1981). Variation theory and first language acquisition. In D. Sankoff & H. Cedergren (Eds.), Variation omnibus (pp. 403–410). Carbondale, IL and Edmonton, AL: Linguistic Research, Inc.

Kramsch, C. (2002a). Beyond the second vs. foreign language dichotomy. In K. S. Miller & P. Thompson (Eds.), Unity and diversity in language use (pp. 1–21). London: Continuum.

Kramsch, C. (2002b). Standard, norm, and variability in language learning: A view from foreign language research. In S. Gass, K. Bardovi-Harlig, S. S. Magnan, & J. Walz (Eds.), Pedagogical norms for second and foreign language learning and teaching: Studies in honor of Albert Valdman (pp. 60–79). Amsterdam: John Benjamins.

Krashen, S. (1978). The monitor model of second language acquisition. In R. Gingras (Ed.), Second language acquisition and foreign language teaching. Arlington, VA: Center for Applied Linguistics.

Krashen, S. (1982). Principles and practices of second language acquisition. Oxford: Pergamon.

Krashen, S. (1985). The input hypothesis: Issues and implementations. London: Longman.

Krashen, S. (1987). Principles and practice in second language acquisition. Englewood Cliffs, NJ: Prentice-Hall.

Kuhn, T. S. (1959). The Copernican revolution: Planetary astronomy in the development of western thought. New York: Vintage.

Kuhn, T. S. (1970). The structure of scientific revolutions. Chicago: University of Chicago Press.

Kuhn, T. S. (1977). The essential tension. Chicago: University of Chicago Press.

Labov, W. (1966). The social stratification of English in New York City. Washington, DC: Center for Applied Linguistics.

Labov, W. (1967). Some sources of reading problems for Negro speakers of nonstandard English. In A. Frazier (Ed.), New directions in elementary English (pp. 140–167). Champaign, IL: NCTE.

Labov, W. (1969). Contraction, deletion and inherent variability of the English copula. Language, 45, 715–762.

Labov, W. (1972a). Sociolinguistic patterns. Philadelphia: University of Pennsylvania Press.

Labov, W. (1972b). Language in the inner city. Philadelphia: University of Pennsylvania Press.

Labov, W. (1973). The boundaries of words and their meanings. In C.-J. Bailey & R. Shuy (Eds.), New ways of analyzing variation in English (pp. 340–373). Washington, DC: Georgetown University Press.

Labov, W. (1979). Locating the frontier between social and psychological factors in linguistic variation. In C. J. Fillmore, D. Kempler, & W. S.-Y. Wang (Eds.), Individual differences in language ability and language behavior (pp. 324–340). New York: Academic Press.

Labov, W. (1982). Building on empirical foundations. In W. P. Lehmann & Y. Malliel (Eds.), Perspectives on historical linguistics (pp. 17–92). Amsterdam: John Benjamins.

Labov, W. (1984). Field methods of the project on linguistic change and variation. In J. Baugh & J. Sherzer (Eds.), Language in use: Readings in socio-linguistics (pp. 28–53). Englewood Cliffs, NJ: Prentice-Hall.

Labov, W. (1994). Principles of language change Vol. 1: Internal factors. Oxford: Blackwell.

Labov, W. (2001a). Principles of language change Vol. 2: Social factors. Oxford: Blackwell.

Labov, W. (2001b). The anatomy of style-shifting. In P. Eckert & J. R. Rickford (Eds.), Style and sociolinguistic variation (pp. 85–108). New York: Cambridge University Press.

Labov, W., Cohen, P., Robins, C., & Lewis, J. (1968). A study of the non-standard English of Negro and Puerto Rican speakers in New York City. USOE Final Report, Research Project No. 3288.

Labov, W., & Labov, T. (1976). Learning the syntax of questions. Paper delivered at the Conference on the Psychology of Language, Stirling, Scotland.

Ladefoged, P. (1975). A course in phonetics. New York: Harcourt.

Laffey, J. L., & Shuy, R. (1973). Language differences: Do they interfere? Newark, DE: International Reading Association.

Lakoff, G. (1987). Women, fire, and dangerous things. Chicago: University of Chicago Press.

Langacker, R. (1991). Foundations of cognitive grammar, Vol. 2. Chicago: University of Chicago Press.

LaPonce, J. A. (1992). Reducing the tensions resulting from language contacts: Personal or territorial solutions? In D. Bonin (Ed.), Reconciliation: The language issue in Canada in the 1990s (pp. 125–132). Kingston, Ontario: Queens University Press.

Larsen-Freeman, D. (1975). The acquisition of grammatical morphemes by adult ESL students. TESOL Quarterly, 9, 409–420.

Levelt, W. J. M. (1989). Speaking: From intention to articulation. Cambridge, MA: MIT Press.

Levelt, W. J. M., Roelofs, A., & Meyer, A. S. (1999). A theory of lexical access in speech production. Behavioral and Brain Sciences, 22, 1–75.

Luria, A. R. (1976). Cognitive development. Cambridge, MA: Harvard University Press.

MacLaury, R. (1986). Color in mesoamerica, vol. 1: A theory of composite categorization. Unpublished doctoral dissertation. University of California, Berkeley.

MacLaury, R. (1987). Color category evolution and Shuswap yellow-green. American Anthropologist, 89, 107–124.

MacLaury, R. (1991). Social and cognitive motivations of change: Measuring variability in color semantics. Language, 67 (1), 34–62.

Major, R. (2004). Gender and stylistic variation in second language phonology. Language Variation and Change, 16, 169–188.

McLaughlin, B. (1980). Theory and research in second language learning: An emerging paradigm. Language Learning, 30, 331–350.

McLaughlin, B. (1987). Theories of second language learning. London: Edward Arnold.

Meisel, J., Clahsen, H., & Pienemann, M. (1981). On determining developmental stages in natural second language acquisition. Studies in Second Language Acquisition, 3, 109–35.

Miller, G. A., & McKean, K. O. (1964). A chronometric study of some relations between sentences. Quarterly Journal of Experimental Psychology, 16, 297–308.

Milroy, J. (1982). Probing under the tip of the iceberg: Phonological “normalization” and the shape of speech communities. In S. Romaine (Ed.), Sociolinguistic variation in speech communities (pp. 35–48). London: Edward Arnold.

Mitchell, R., & Myles, F. (1998). Second language learning theories. New York: Arnold.

Mitchell, R., & Myles, F. (2004). Second language learning theories (2nd ed.). New York: Arnold.

Mougeon, R., Nadasdi, T., & Rehner, K. (2002). Etat de la recherche sur l’appropriation de la variation par les apprenants avancés du FL2 ou FLE. AILE (Acquisition et Interaction en Langue Etrangère). Special issue: L’Acquisition de la variation par les apprenants du français langue seconde, 17, 17–50.

Mougeon, R., Rehner, K., & Nadasdi, T. (2004). The learning of spoken French variation by immersion students from Toronto, Canada. Journal of Sociolinguistics, 8, 408–432.

Muller, M. (1861). Lectures on the science of language. Delivered at the Royal Institution of Great Britain in April, May, and June, 1886. First Series, London.

Norton, B., & Toohey, K. (2002). Identity and language learning. In R. Kaplan (Ed.), The Oxford handbook of applied linguistics (pp. 115–123). New York: Oxford University Press.

Oxford, R. L. (1996). Language learning motivation: Pathways to the new century. Honolulu: University Press of Hawaii.

Payne, A. (1980). Factors controlling the acquisition of the Philadelphia dialect by out-of-state children. In W. Labov (Ed.), Locating language in time and space (pp. 329–345). New York: Academic Press.

Perry, T., & Delpit, L. (Eds.) (1998). The real Ebonics debate. Boston: Beacon.

Piaget, J. (1972). The child and reality. New York: Wiley.

Pienemann, M., Johnston, M., & Brindley, G. (1988). Constructing an acquisition-based procedure for assessing second language acquisition. Studies in Second Language Acquisition, 10, 217–243.

Pinker, S. (1989). Learnability and cognition: The acquisition of argument structure. Cambridge, MA: MIT Press.

Pinker, S. (1999). Words and rules. New York: HarperCollins.

Pinker, S. (2002). The blank slate: The modern denial of human nature. New York: Viking.

Pinker, S., & Prince, A. (1994). Regular and irregular morphology and the psychological status of rules of grammar. In S. D. Lima, R. L. Corrigan, & G. K. Iverson (Eds.), The reality of linguistic rules (pp. 321–352). Amsterdam: John Benjamins.

Poplack, S., & Walker, D. (1986). Going through /l/ in Canadian French. In D. Sankoff (Ed.), Diversity and diachrony (pp. 173–198). Amsterdam: John Benjamins.

Preston, D. (1989). Sociolinguistics and second language acquisition. Oxford: Blackwell.

Preston, D. (1991). Style, status, and change: Three sociolinguistic axioms. In F. Byrne & T. Hubner (Eds.), Development and structures of creole languages. Essays in honor of Derek Bickerton (pp. 43–59). Creole Language Library, Vol. 9. Amsterdam: John Benjamins.

Preston, D. (1996). Variationist perspectives on second language acquisition. In R. Bayley & D. Preston (Eds.), Second language acquisition and linguistic variation (pp. 1–45). Philadelphia: John Benjamins.

Preston, D. (2001). Style and the psycholinguistics of sociolinguistics: The logical problem of language variation. In P. Eckert & J. R. Rickford (Eds.), Style and sociolinguistic variation (pp. 279–304). New York: Cambridge University Press.

Preston, D. (2002). A variationist perspective on second language acquisition: Psycholinguistic concerns. In R. Kaplan (Ed.), The Oxford handbook of applied linguistics (pp. 141–159). New York: Oxford University Press.

Queller, K. (2001). A usage based approach to teaching the phrasal lexicon. In M. Putz, S. Niemeyer, & R. Driven (Eds.), Applied Cognitive Linguistics II: Language Pedagogy (pp. 55–83). Berlin: Mouton.

Rand, D., & Sankoff, D. (1990). GoldVarb Version 2: A variable rule application for the MacIntosh. Montreal: Université de Montréal, Centre Recherches Mathématiques.

Regan, V. (1996). Variation in French interlanguage: A longitudinal study of sociolinguistic competence. In R. Bayley & D. Preston (Eds.), Second language acquisition and linguistic variation (pp. 177–201). Philadelphia: John Benjamins.

Rehner, K., Mougeon, R., & Nadasdi, T. (2003). The learning of sociolinguistic variation by advanced FSL learners: The case of nous versus on in immersion French. Studies in Second Language Acquisition, 25, 127–156.

Rickford, A. E. (1999). I can fly. Lanham, MD: University Press of America.

Rickford, A. E., & Rickford, J. (1995). Dialect readers revisited. Linguistics and Education, 7, 107–128.

Rickford, J. R., & Eckert, P. (2001). Introduction. In P. Eckert & J. R. Rickford (Eds.), Style and sociolinguistic variation (pp. 1–18). New York: Cambridge University Press.

Rigg, P., & Enright, S. (1986). Children and ESL: Integrating perspectives. Washington, DC: TESOL.

Rips, L. J. (1994). The psychology of proof: Deductive reasoning in human thinking. Cambridge, MA: MIT Press.

Roberts, J. (1993). The acquisition of variable rules: t/d deletion and -ing production in preschool children. Unpublished doctoral dissertation. University of Pennsylvania.

Robinson, G. H. (1964). Continuous estimation of a time-varying probability. Ergonomics, 7, 7–21.

Robinson, J. S., Lawrence, H. R., & Tagliamonte, S. A. (2001). GOLDVARB 2001: A multivariate analysis application for windows. Department of Language and Linguistic Science, York, Canada: University of York.

Romaine, S. (1982). Sociolinguistic variation in speech communities. London: Arnold.

Rorty, R. (1979). Philosophy and the mirror of nature. Princeton, NJ: Princeton University Press.

Rorty, R. (1989). Contingency, irony, and solidarity. Cambridge, UK: Cambridge University Press.

Rosch, E. (1973). Natural categories. Cognitive Psychology, 4, 328–350.

Sankoff, G. (1974). A quantitative paradigm for the study of communicative competence. In R. Bauman & J. Sherzer (Eds.), Explorations in the ethnography of speaking (pp. 18–49). Cambridge, UK: Cambridge University Press.

Sankoff, D., & Labov, W. (1979). On the uses of variable rules. Language in Society, 8 (3), 189–222.

Sankoff, G., & Vincent, D. (1977). L’emploi productif de “ne” dans le français parlé de Montréal. Le Français Moderne, 45 (3), 243–256.

Saussure, F. [1915] (1974). Course in general linguistics (W. Baskin, Trans.). London: Fontana/Collins.

Savignon, S. J. (1983). Communicative competence: Theory and classroom practice: Texts and contexts in second language learning. Reading, MA: Addison-Wesley.

Schieffelin, B. B. (1985). The acquisition of Kaluli. In D. I. Slobin (Ed.), The crosslinguistic study of language acquisition, Vol. 1: The data (pp. 525–594). Hillsdale, NJ: Lawrence Erlbaum.

Schiffrin, D. (1981). Tense variation in narrative. Language, 57 (1), 45–62.

Schumann, J. (1978). The pidginization hypothesis. Rowley, MA: Newbury House.

Schwartz, B. D., & Sprouse, R. A. (1996). L2 cognitive states and the full transfer/access model. Second Language Research, 12, 40–72.

Shi, E. (2003). Second language grammar and secondary predication. Unpublished doctoral dissertation. University of Arizona.


Shibatani, M. (1996). Applicatives and benefactives: A cognitive account. In S. Thompson & M. Shibatani (Eds.), Grammatical constructions: Their form and meaning (pp. 245–263). Oxford: Clarendon Press.

Shirai, Y., & Andersen, R. W. (1995). The acquisition of tense-aspect morphology: A prototype account. Language, 71 (4), 743–762.

Shuy, R., Wolfram, W., & Riley, W. (1968). Social stratification in Detroit speech. Washington, DC: Center for Applied Linguistics.

Sinclair, J. (1991). Corpus, concordance, collocation. Oxford: Oxford University Press.

Sinclair, J. (1996). The search for units of meaning. Textus, IX, 75–106.

Slobin, D. I. (1973). Cognitive prerequisites for the development of grammar. In C. Ferguson & D. Slobin (Eds.), Studies of child language development (pp. 175–208). New York: Holt.

Slobin, D. I. (1985). Crosslinguistic evidence for the language-making capacity. In D. I. Slobin (Ed.), The crosslinguistic study of language acquisition, Vol. 2: Theoretical issues (pp. 1157–1256). Hillsdale, NJ: Lawrence Erlbaum.

Smith, B. H. (1989). Contingencies of value. Cambridge, MA: Harvard University Press.

Smith, F. (1982). Writing and the writer. New York: Holt.

Smolensky, M. (2001). Grammar-based connectionist approaches to language. In M. H. Christiansen & N. Chater (Eds.), Connectionism and psycholinguistics (pp. 319–347). Westport, CT: Ablex.

Snow, C., & Brinton, D. M. (1997). The content-based classroom: Perspectives on integrating language and content. White Plains: Longman.

Spelke, E., Vishton, P., & von Hofsten, C. (1995). Object perception, object-directed action, and physical knowledge in infancy. In M. S. Gazzaniga (Ed.), The cognitive neurosciences (pp. 165–179). Cambridge, MA: MIT Press.

Spivey, M. J., & Tanenhaus, M. K. (1998). Syntactic ambiguity resolution in discourse: Modeling the effects of referential context and lexical frequency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 1521–1543.

Spolsky, B. (1985). Formulating a theory of second language learning. Studies in Second Language Acquisition, 7 (3), 269–288.

Stabler, E. (1984). Berwick and Weinberg on linguistics and computational psychology. Cognition, 17, 155–179.

Swain, M., & Lapkin, S. (1989). Canadian immersion and adult second language teaching: What’s the connection? Modern Language Journal, 73 (2), 150–159.

Tajfel, H. (1978). Social categorization, social identity, and social comparison. In H. Tajfel (Ed.), Differentiation between social groups: Studies in the social psychology of intergroup relations (pp. 61–76). London: Academic Press.

Talmy, L. (1985a). Lexicalization patterns: Semantic structure in lexical forms. In T. Shopen (Ed.), Language typology and syntactic description, Vol. 3. Cambridge, UK: Cambridge University Press.

Talmy, L. (1985b). Force dynamics in language and thought. In W. H. Eilfort, P. D. Kroeber, & K. L. Peterson (Eds.), CLS 21 Part 2: Papers from the parasession on causatives and agentivity (pp. 293–337). Chicago: Chicago Linguistic Society.

Tarone, E. (1979). Interlanguage as chameleon. Language Learning, 29, 181–191.

Tarone, E. (1982). Systematicity and attention in interlanguage. Language Learning, 32, 9–84.

Tarone, E. (1985). Variability in interlanguage use: A study of style-shifting in morphology and syntax. Language Learning, 35, 373–403.

Tarone, E. (1988). Variation in interlanguage. London: Edward Arnold.

Tarone, E. (1990). On variation in interlanguage: A response to Gregg. Applied Linguistics, 11, 392–400.

Tarrallo, F., & Myhill, J. (1983). Interference and natural language in second language acquisition. Language Learning, 33, 55–76.

Tharp, R., & Gallimore, R. (1988). Rousing minds to life: Teaching, learning and schooling in social context. Cambridge, UK: Cambridge University Press.

Tharp, R., & Gallimore, R. (1990). Teaching, schooling and literate discourse. In L. Moll (Ed.), Vygotsky and education (pp. 175–205). Cambridge, UK: Cambridge University Press.

Thompson, G. L., & Brown, A. V. (2003). Interlanguage variation: The influence of contextualized language on L2 phonological production. Arizona Working Papers in Second Language Acquisition and Teaching (SLAT), 10, 35–50.


Towell, R., & Hawkins, R. (1994). Approaches to second language acquisition. Clevedon, UK: Multilingual Matters.

Townsend, D. J., & Bever, T. G. (2001). Sentence comprehension: The integration of habits and rules. Cambridge, MA: MIT Press.

Trudgill, P. (1974). The social differentiation of English in Norwich. Cambridge, UK: Cambridge University Press.

Valdez, G. (2001). Learning and not learning in English: Latino students in American schools. New York: Teachers College Press.

Valdman, A. (1961). Applied French – A guide for teachers. Boston: Heath.

Valdman, A. (1966). Programmed instruction and foreign language teaching. In A. Valdman (Ed.), Trends in language teaching (pp. 133–158). New York: McGraw-Hill.

Valdman, A. (1989). The elaboration of pedagogical norms for second language learners in a conflictual diglossia situation. In S. Gass, C. Madden, D. Preston, & L. Selinker (Eds.), Variation in second language acquisition, Vol. I: Discourse and pragmatics (pp. 5–34). Clevedon, UK: Multilingual Matters.

Valdman, A. (1992). Authenticity, variation and communication in the foreign language classroom. In C. Kramsch & S. McConnell-Ginet (Eds.), Text and context: Cross-disciplinary perspectives in language study (pp. 79–97). Lexington, MA: Heath.

Valian, V. (1979). The wherefores and therefores of the competence–performance distinction. In W. E. Cooper & E. C. T. Walker (Eds.), Sentence processing: Psycholinguistic studies presented to Merrill Garrett (pp. 1–26). Hillsdale, NJ: Lawrence Erlbaum.

Vendler, Z. (1967). Linguistics in philosophy. Ithaca, NY: Cornell University Press.

Vygotsky, L. S. (1978). Mind in society. Cambridge, MA: Harvard University Press.

Vygotsky, L. S. (1986). Thought and language. Cambridge, MA: MIT Press.

Weinreich, U., Labov, W., & Herzog, M. (1968). Empirical foundations for a theory of language change. In W. W. Lehman & Y. Malkiel (Eds.), Directions for historical linguistics (pp. 95–188). Austin: University of Texas Press.

Wilkins, D. A. (1976). Notional syllabuses. Oxford: Oxford University Press.

Wittgenstein, L. (1953). Philosophical investigations. New York: Macmillan.

Wolfram, W. (1969). A sociolinguistic description of Detroit Negro speech. Washington, DC: Center for Applied Linguistics.

Wolfram, W. (1974). Sociolinguistic aspects of assimilation: Puerto Rican English in New York City. Arlington, VA: Center for Applied Linguistics.

Wolfram, W. (1975). Variable constraints and rule relations. In R. W. Fasold & R. W. Shuy (Eds.), Analyzing variation in language (pp. 70–88). Washington, DC: Georgetown University Press.

Wolfram, W. (1985). Variability in tense marking: A case for the obvious. Language Learning, 35, 229–253.

Wolfram, W., Carter, P., & Moriello, B. (2004). Emerging Hispanic English: New dialect formation in the American South. In R. Bayley & D. Preston (Eds.), Second language acquisition and linguistic variation (pp. 339–358). Philadelphia: John Benjamins.

Wolfram, W., & Fasold, R. (1974). The study of social dialects in American English. Englewood Cliffs, NJ: Prentice-Hall.

Wolfson, N. (1982). On tense alternation and the need for analysis of native speaker usage in second language acquisition. Language Learning, 32, 53–68.

Wood, D., Bruner, J., & Ross, G. (1976). The role of tutoring in problem solving. Journal of Child Psychology and Psychiatry, 17, 89–100.

Young, R. (1989). Ends and means: Methods for the study of interlanguage variation. In S. Gass, C. Madden, D. Preston, & L. Selinker (Eds.), Variation in second language acquisition: Psycholinguistic issues (pp. 65–92). Clevedon, UK: Multilingual Matters.

Young, R. (1991). Variation in interlanguage morphology. New York: Peter Lang.


Index

Note: page numbers in italics refer to figures, tables and diagrams

A

Adamson, H.D., 49, 50, 56, 66, 109, 137, 139, 146–7, 150, 156, 181

accommodation theory, 54

actuation problem, 34, 140–1, 144–5

acquisitional orders, 167–9

African American English, 7, 12, 13, 16, 17, 20, 24, 43, 45, 52, 154, 156

analysis-by-synthesis, 86

Andersen, R., 63, 68, 72–3

Anderson, J.R., 137

artificial language: learning of, 77–9

Ashby, W.J., 55–6

audience design, 140–6, 151–2

Auger, J., 162–3, 165–6

B

Bachman, L., 15, 49, 153–4, 193n

Baker, C.L., 111

Bakhtin, M.M., 143, 165

Bardovi-Harlig, K., 64–5, 69, 71–2, 158

Bayley, R., 49, 51–3, 59, 61–2, 66–8, 71–3

Bell, A., 141–5, 151–2, 163

Berdan, R., 21, 51

Berlin, B., xvi, 4, 5, 183–5, 189

Bever, T.G., xv, 11, 12, 86, 92, 170

Biber, D., 170

Bickerton, D., 19, 21, 22, 77, 99, 101, 187–8

Binding Principle B, 5, 7, 8, 10

Belfast speech, 18, 21

Black English literature, 163

blank slate theory (of mind), 3

Bley-Vroman, R., 6

Bowerman, M., 112

Bresnan, J., 112

Brinton, D.M., 181

broad-range semantic constraint (on dativization), 114–8, 21, 125–6, 137

Brown, A. V., 149–50

Brown, R., 180

Bruffee, K. A., 175–6

Bruner, J., 167

Burgess, D., 199

Bybee, J., 107–9, 112, 126–7

C

Canale, M., 153

Cedergren, H., 17, 20, 26, 66

Chomsky, N., xv, 3, 5, 7, 9, 10, 12, 20, 82, 87, 136, 192n

Cofer, T., 12, 25–6, 31

Cognitive Linguistics: account of reflexive, 94; introduction to, 101; prototype categories in, 102–3; teaching implications of, 169–73

collocations (see formulaic sequences)

color category systems, change in, 5, 34, 37, 129, 176, 180, 183–91

communicative competence, 15, 153, 159, 181

competence (versus performance), 9–11

comprehension model (of sentences), 11, 14, 15, 86–92

concordancing programs, 169, 171

connectionist networks, xv, 20, 96–9, 85–92, 96–9, 109–10, 126–9, 149, 192n


construction grammar, 116, 124; teaching implications of, 169–72

constructions: ditransitive, 112–26, 172; way, 116–7, 169; written all over, 172

constructivist language teaching, 166–9

Cookie Monster, 5

Corder, S.P., 49

Crain, S., 5

D

Dani tribe, 4

Darwin, C., 33

Day, R., 190

Dell, G., 84–6

derivational theory of complexity, 11, 12, 80, 92

Descartes, R., 3, 4

developmental problem (of language acquisition), 6, 7

Dewey, J., 182

dialect readers, 155–6

Dickerson, L. B.

discourse constraints (in narrative), 64–5

Dougherty, J.W.D., 185

Doughty, K., 168–9

E

Eckert, P., 41–2, 135, 164

Eckman, F., 168

Elliott, O. P., 92–9, 109–10, 126

Ellis, R., 13, 19

epistemological question (of knowledge), 3, 173

error back propagation, 90, 98

F

Fasold, R., 12, 20, 32, 154

Feldman, J., xv, 98, 109, 116, 192n

Feyerabend, P., 194n

Feynman, R., 175

Fillmore, C. J., 116

Fischer, J., 24

Flege, J., 150

Fodor, J.A., 9, 12

formal learners, studies of, 54–6

formulaic sequences, 169–72

Fox, C. A., 159–61

Frawley, W., 165

French: articles in, 44; acquisition of, 54–8; immersion programs, 57, 154, 162, 165–6, 192n; null subject parameter in, 6

fundamental difference hypothesis, 6, 7

G

Gallimore, R., 166

Gardner, R. C., 164

Garrett, M., xv, 12, 81

Gass, S., 159–60, 168

Gattegno, C., 181

Geertz, C., 174

gender axiom, 149

Giles, H., 140–1, 163

Gladney, M.R.

Goldberg, A., 108, 112, 116, 118–25

government and binding theory, 7, 12, 92

Gregg, K. R., 6, 7, 19, 21

Gropen, J., 112, 121–4

H

Hansen, J., 61, 73

Hare, M., 120

Hecht, M.L., 163

Herzog, M., 34, 36, 45, 183, 185

horizontal variation (in interlanguage), 49–58

Houston, S., 12, 25–6, 31, 147, 192n

Howard, M., 56, 59

Hudson Kam, C.L., 77, 79, 80, 90, 98, 124

Huebner, T., 49

Hume, D., 3

Hymes, D., 153

I

identity, social, xv, 42, 45, 133, 141, 144–7, 162–6


indicator (sociolinguistic), 44

innate ideas, 4

J

Jacobson, R., 183

Jones, M., 170–2

Joseph, J.E., 163

K

Kant, E., 4

Kay, P., xvi, 4, 5, 17, 20–1, 116, 181, 183–5, 189

Keenan, E. L., 168

knockout factor, 29, 30, 50

Kachru, B., 157

Kramsch, C., 157–9, 161

Krashen, S., xv, 98, 133, 135–9, 153, 159

Kuhn, T. S., 174–5

L

Labov, W., xiii–xv, 7, 14, 15, 17, 19, 20–6, 29–39, 41–6, 49–51, 55, 64, 102, 133–5, 137–8, 140–4, 147, 149–52, 154, 156, 164, 183, 188, 190

Ladefoged, P., 34

Lakoff, G., xvi, 116, 173–4, 176–7, 179

Lambert, W., 164

Langacker, R., 104, 106

language acquisition device, 136–7

language bound individuals, 190

language change: actuation problem of, 45; constraints problem of, 36–37; embedding problem of, 40–43; evaluation problem of, 43–45; transition problem of, 34, 37–40, 138

language optional individuals, 90

LaPonce, J. A., 165

Larsen-Freeman, D., 135

Levelt, W. J. M., xv, 13n, 81, 84, 190

Leibniz, G. W., 4

lexical diffusion, 38, 40, 54, 147

Lexical Rule Hypothesis (of learning argument structure), 112–118

linking rules, 82, 113, 117

Locke, J., 3

logical problem: of learning, 3–5, 6; of language acquisition, 5–6

M

MacLaury, R., 183, 186, 188–90

Major, R., 56, 59, 97, 147–50

Martha’s Vineyard, 17, 43, 45, 144, 164, 187

marker (sociolinguistic), 44–5

McKee, C., 5

McLaughlin, B., 137

Meisel, J., 167

middle experiencer verbs (in Spanish), 93–7, 99, 109–10

Miller, G.A., 11

Milroy, J., 18, 21

Mitchell, R., 13, 15

monitor model, 133, 135–9

monitoring: in first language, 4, 16, 18, 27–9, 44, 52, 55, 133–40, 143, 146, 149–52; in second language, 52, 55, 59, 133, 135, 139–40, 150–1

morphophonological constraint (on dativization), 113–14, 116, 121, 123–4

motivation: integrative, 62, 65; instrumental, 162, 164

Mougeon, R., 49, 56, 58, 154, 162

Myles, F., 13, 15

N

Nadasdi, T., 49, 56, 58, 154, 162

narrative style, 14, 28–9, 133, 138–9, 149–50, 161

narratives (of Chinese-speaking children), 64–73

naturalistic learners, 51–4

ne deletion, 55–57, 163


Northern Cities Vowel Shift, 41, 43, 45

Noun Phrase Acquisition Hierarchy, 168

O

ontological question (of knowledge), 3, 173

output hypothesis, 98

overgeneralization, 55, 58–9, 95–7, 109–11

Oxford, R. L., 157

P

Panama Spanish, 17, 20

past tense marking: by Chinese-speaking adults, 52–3; by Chinese-speaking children, 61–73; in AAE, 13, 32; in creole languages, 187–8; prototype schemas for, 106–8, 126–9

Payne, A., 38, 49

pedagogical norm, 158–161

Perry, T., 156

Philadelphia English, 23–32, 35–8, 40, 45, 142

Piaget, J., 166

Pienemann, M., 167

Pinker, S., 4, 107, 111–18, 122, 124–5, 128, 129

Poplack, S., 56

Preston, D., 16, 21, 28, 30, 70, 77, 90, 92, 99, 139, 142, 150, 152

probability matching, 39, 77

production model (of sentences), 12, 81–4, 87, 92, 98, 106–7, 136

property theory (of language acquisition), 6, 7

prototype categories, 102–4, 108, 193n

prototype schemas, 106–8; compatibility with connectionist networks, 109–10, 126–9; in syntax/semantics, 111–26

Q

Quebec French, 56–8, 161–2, 165

Queller, K., 172

R

r deletion, 14, 18, 134

Rand, D., 15

Regan, V., 49, 54–59, 146–7, 149

regular sound change, 37–8, 147, 187

Rehner, K., 49, 56, 58, 154, 162

relative clauses, 88–9, 168–9, 193n

resultative construction, 116

Rickford, A. E., 155–6, 162–3

Rickford, J. R., 135, 155–6, 162–3

Rips, L.J., 103

Robinson, G. H., 77

Robinson, J. S., 66

Romaine, S., 18, 20–1

Rosch, E., 102, 181

Russell, B., 3

S

saliency, principle of, 62, 67, 68, 71, 73

Sankoff, D., 15, 17, 20, 22

Sankoff, G., 17, 26, 55, 66

Savignon, S. J., 153, 181

Schieffelin, B.B., 158

Schiffrin, D., 64, 66

Schumann, J., xiii, 51

Schwartz, B. D., 6

semantic domains, 94–6

Shi, E., 124

Shibatani, M., 116

Shirai, Y., 63, 68, 72

Shuy, R., 25, 31, 54

Slobin, D. I., 64, 107, 177–8, 180, 187

Smith, B. H., 194n

Smith, F., 137

Smolensky, M., 9

Snow, C., 181

sociolinguistic competence, xv, 154, 157, 162, 193n

sound change (see language change)

Southern English, 24, 33, 44, 54, 155, 165


speech community, xiv, 12, 17, 18, 32–3, 40–3, 134–5

Spelke, E., 4

Spivey, M.J., 88–91, 127–8

Spolsky, B., 137

Stabler, E., 10

stereotype (linguistic), 44–5

Strict Constructivism Hypothesis, 112

style: of speaking, 13, 14, 16, 18, 24–32, 35, 39, 51–2, 55–6, 59, 90, 133–5, 138–52, 154, 161, 164, 166; vernacular, 14, 134–5, 138, 140, 144, 152, 157

style axiom, 142

style shifting, 25, 30, 39, 139, 140–51, 163

style tree, 27–8

Swain, M., 98, 153

symbolic structures, 104–6

syntactic templates, 88, 170

T

Tajfel, H., 163

Talmy, L., 178–9

Tarrallo, F., 168

Tharp, R., 166

Thompson, G. L., 149–50

Townsend, D.J., xv, 11, 12, 86, 92, 170

transition problem of language acquisition (see language change)

Trudgill, P., 24–5

Tzeltal, 189

Tzotzil, 189–90

U

uniform constraints assumption, 17, 18, 20

Universal Grammar, xiii, 5, 39, 128

V

Valdez, G., 182

Valian, V., 10

Valdman, A., 158–62, 167

Varbrul program, xiii, xiv, 15–17, 23, 26, 28–30, 41, 50, 51, 55, 61, 66–72, 78, 86, 91–5, 98, 142, 145, 146, 149–50

variable rules, 12–16, 50, 86, 92, 98, 147, 152, 168; as prototype schemas, 110, 126–9; in artificial language, 77–81; objections to, 17–20; logical status of, 21–2; scope of, 17

variation theory, xiv–xvi; history of, 12–21; related to cognitive linguistics, 102, 110, 122, 126–9; related to connectionism, 86, 99, 126–9

Vendler, Z., 62–3

verbs, class II irregular, 108, 126–7

vernacular language, 57, 58, 154, 156–7, 161–3, 166, 190

vertical variation (in interlanguage), 49–51, 53, 61, 135, 138, 147

vowel formants, 34, 36

Vygotsky, L. S., 165, 166

W

Weinreich, U., 34, 36, 45, 183, 185

wh-questions, acquisition of, 50

Wilkins, D. A., 153

Wittgenstein, L., 102

Wolfram, W., 12, 17, 25, 31, 32, 49, 51–4, 56, 59, 61, 62, 66, 67, 68, 71, 73, 165

Wood, D., 167

Y

Young, R., 51, 61, 72, 145–6

Z

zone of proximal development, 166–7

