Pl Biber Conrad Monograph5 Lo


Corpus Linguistics and Grammar Teaching

Douglas Biber & Susan Conrad

Susan Conrad is Professor of Applied Linguistics at Portland State University. In addition to doing teacher training, she has taught ESL/EFL in South Korea, southern Africa, and numerous places in the U.S.

Douglas Biber is Regents’ Professor of English (Applied Linguistics) at Northern Arizona University. His research efforts have focused on corpus linguistics, English grammar, and register variation (in English and cross-linguistic; synchronic and diachronic).

Dr. Biber and Dr. Conrad have written several books, including the Longman Grammar of Spoken and Written English and Real Grammar.

1. Why do grammar teachers need corpus-based studies?

Speaking with English teachers and students throughout the world, we have discovered that most of them believe that the authors of their textbooks had some special source of information to help them write the book. This information, they believe, made clear to the textbook author what content to cover and how to cover it. With this source of information, the coverage in the book must be correct, useful, and realistic, right?

Unfortunately, no special source of information for textbook writers exists. Authors’ intuition, anecdotal evidence, and traditions about what should be in a grammar book play major roles in determining the content of textbooks. This usually works just fine for basic descriptions of grammatical structure. The intuition of a native speaker or a couple of examples is sufficient evidence for how to form accurate grammatical structures in English. For example, intuition or anecdotal evidence works well to tell which verbs are irregular, to describe the way to form perfect aspect verb phrases, or to list the relative pronouns that are possible in English.

However, as any experienced English teacher can attest, accurately describing how to form grammatical structures is only a small part of grammar teaching. And like teachers, textbook writers must make myriad decisions. An author must decide how to sequence the grammatical information: Which features should be presented in the first chapters, which should be discussed in later chapters, and which should be excluded because there is no room to discuss them at all?

Similarly, the author must decide how much space to devote to each feature. At a more detailed level, the author must decide how to illustrate the targeted grammatical features in example sentences, e.g., what contexts to use and what specific words to use in the grammatical structure (e.g., what verbs to use to illustrate past tense).

All of these decisions have important implications: teachers and students assume that the grammatical features at the beginning of the book are easier, more basic, or more important, while omitted features are not important. Features that get many pages of coverage are assumed to be important and difficult to grasp. Perhaps most importantly, the contexts and example language are taken to be typical of natural discourse.

Unfortunately, decisions about the sequencing of material, typical contexts, and natural discourse are not served as well by intuition and anecdotal evidence as judgments of accuracy are. First, the intuition of even experienced teachers is not consistent. For example, when they rely purely on their intuition, teachers usually disagree over whether simple present tense or present progressive is more common in typical conversations and therefore deserving of more practice.

In addition, it turns out that intuitions about typical language choices are often wrong. In many cases, we simply don’t notice the most typical grammatical features because they are so common. For example, most native speakers of English cannot identify the most common lexical verb used in conversation; we use this verb so frequently that we don’t even notice it. (We describe the use of this verb in Section 2 below; in the meantime, you can try to identify it on your own.)



Finally, to make matters worse, specialist books on teaching methods and materials development provide little guidance about making specific decisions for sequencing or for gathering information about language choices in natural discourse. As Byrd (1995) wrote, “often design decisions are based on traditions about grammar materials and their organization rather than on careful rethinking of either the content or its organization” (p. 46).

Studies from the field of second language acquisition can provide some guidance for textbook writers, for example about what grammar structures are acquired first by English language learners, and about kinds of activities that are likely to help learners begin the process of acquisition. But what about the content for grammar teaching: the questions about what forms are more common, what examples will best exemplify naturally occurring language, and what words are most frequent with grammatical structures? Answers to these kinds of questions have, in recent years, been coming from research that uses the tools and techniques of corpus linguistics to describe English grammar.

Corpus linguistics uses computer-assisted analyses along with human interpretations of language functions to study the language patterns in a large “corpus” of texts (see Biber, Conrad, and Reppen, 1998, and McEnery, Xiao, and Tono, 2006, for useful introductions). The texts can be written or transcribed spoken texts. In most cases, a corpus is designed to represent particular “registers” (such as conversation, newspaper writing, or academic prose). As a result, grammatical studies based on corpora can describe differences across registers. Because the corpora are large, covering many different speakers and writers, it is possible to see what is typical for a large group of language users in various contexts. Thus, corpus-based research provides textbook writers and teachers with a new source of information (a data-based source, rather than intuition) to consider when making decisions.

Corpus-based studies of English grammar have proven to be especially useful for descriptions of language use. That is, they help us understand what speakers and writers actually do with the linguistic resources available in English. Three types of results are most important for grammar teaching:

• frequency information

• register comparisons

• associations between grammatical structures and words (lexico-grammar)

We illustrate each of these areas with case studies taken from research conducted for the Longman Grammar of Spoken and Written English (LGSWE; Biber et al., 1999), discussing how such information is useful to textbook writers and teachers. The case studies are based on analysis of corpora from four registers: conversation, fiction, newspaper language, and academic prose. Although these are general registers, they differ in important ways from one another (e.g., with respect to mode, interactiveness, production circumstances, purpose, and target audience).

The analyses were carried out on the Longman Spoken and Written English (LSWE) Corpus, which contains c. 20 million words of text overall, with c. 4-5 million words from each of these four registers. All frequency counts reported below have been normalized to a common basis (a count per 1 million words of text), so that they are directly comparable across registers.
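The normalization described above is simple arithmetic, and a short sketch may make it concrete. The function name and the raw counts below are hypothetical and chosen only for illustration; they are not figures from the LSWE Corpus.

```python
def per_million(raw_count: int, corpus_words: int) -> float:
    """Normalize a raw count to occurrences per 1 million words of text."""
    if corpus_words <= 0:
        raise ValueError("corpus_words must be positive")
    return raw_count * 1_000_000 / corpus_words


# Hypothetical example: the same raw count found in two sub-corpora
# of different sizes yields different normalized rates.
conversation_rate = per_million(raw_count=9_000, corpus_words=4_000_000)
academic_rate = per_million(raw_count=9_000, corpus_words=5_000_000)

print(conversation_rate)  # 2250.0
print(academic_rate)      # 1800.0
```

Without this step, a feature could look more frequent in one register simply because that register's sub-corpus is larger.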

In the LGSWE, corpus-based investigations of grammatical features were carried out using a variety of computational, interactive, and detailed ‘manual’ textual analyses. In all cases, the overarching concern was to achieve an accurate description of the distributional patterns of the target grammatical feature. Computational techniques made it feasible to analyze the patterns of use in a 20-million-word corpus. However, whenever automatic techniques produced skewed or inaccurate results, we shifted to interactive analyses or even manual analyses carried out on random sub-samples of texts from each register. The guiding principles were to achieve an accurate grammatical description efficiently, using whatever techniques were required for that purpose, based on the most representative sample of corpus texts that could reasonably be analyzed using those techniques.

2. Frequency Information

The simplest kind of information available from corpus analysis is frequency: identifying the grammatical features that are especially common or rare. However, as the following two case studies show, even this kind of analysis often reveals surprising patterns of use that have important implications for grammar teaching.

Progressive vs. Simple Verb Tenses

One strongly held intuition about language use among many English-teaching professionals is that progressive aspect (the ‘present continuous’) is the most common choice in conversation. This belief is reflected in grammar textbooks, which typically cover the progressive in one of the very first chapters of lower-level books.

Corpus-based research has found that progressive aspect is in fact more common in conversation than in most written registers (Figure 1). The contrast with academic prose is especially noteworthy: progressive aspect is extremely rare in academic prose but more common in conversation. Progressive verb phrases are nearly as common in fiction as in conversation, and they are relatively common in news as well.

FIG.1



However, when progressive aspect is compared to simple aspect in conversation, the corpus research shows that the intuition about progressives being typical is dramatically wrong. Simple aspect verb phrases are more than 20 times as common as progressives in conversation (Figure 2).

The following excerpt illustrates the normal reliance on simple aspect in natural conversation:

[conversation among friends at a party, talking about movies]

Michelle: Do you guys want more sour cream chips or

do you want Doritos?

John: I want pretzels. [pause] Thanks.

Sheila: This is really good. [pause] What the hell

was that movie?

John: You can’t remember the name of it?

Michelle: Was it an older movie or a new one?

Sheila: No, not very recent, within the last three

years I’d say. [pause] This guy is an American.

He goes — his dad had gone to the United States — he actually gets back into Germany

and he works on a train during World War

Two. He gets embroiled in this family’s mess.

[pause] Oh, it was really good — I wish I

could remember — I’ll think of it.

John: You don’t remember who was in it?

Sheila: No, nobody recognizable.

In contrast, progressive aspect is much less frequent and used for special effects, usually focusing on the fact that an event is in progress or about to take place, for example:

What’s she doing?

But she’s coming back tomorrow.

Another special effect of progressives occurs with non-dynamic verbs, when the progressive can refer to a temporary state that exists over a period of time, as in:

I was looking at that one just now.

You should be wondering why.

We were waiting for the train.

This case study illustrates how textbook authors and teachers can be misled by intuitions, in this case believing that progressive verbs are much more prevalent than they in fact are. Unfortunately, misperceptions like this are passed along to our students through textbooks and other teaching materials that overemphasize the progressive, and promote over-use by students. Corpus-based findings are a strong antidote to misperceptions of this type.

The most common lexical verbs in conversation

Frequency information from corpus studies can also help materials writers decide what words to use as they give examples and write exercises for grammatical structures. Even when they focus on common, easy vocabulary, materials writers have to choose from literally dozens of common lexical verbs in English. For example, nearly 400 different verb forms occur over 20 times per million words in the LSWE Corpus (see Biber et al., 1999, pp. 370–371). These include many everyday verbs such as pull, throw, choose, fall, etc.

Given this large inventory of relatively common verbs, it might be easy to assume that no individual verbs stand out as being particularly frequent. However, this is not at all the case: there are only 63 lexical verbs that occur more than 500 times per million words in a register, and only 12 verbs occur more than 1,000 times per million words in the LSWE Corpus (Biber et al., 1999, pp. 367–378). These 12 most common verbs are: say, get, go, know, think, see, make, come, take, want, give, mean.

To give an indication of the importance of these 12 verbs, Figure 3 plots their combined frequency compared to the overall frequency of all other verbs. Taken as a group, these 12 verbs are especially important in conversation, where they account for almost 45% of the occurrences of all lexical verbs!
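A combined share of this kind can be computed in a few lines once the lexical verbs in a text have been identified and reduced to their base forms. The sketch below assumes that step has already been done by some other means; the function name and the ten-verb sample are hypothetical, and only the list of 12 verbs comes from the discussion above.

```python
from collections import Counter

# The 12 most common lexical verbs listed above.
TOP_12 = {"say", "get", "go", "know", "think", "see",
          "make", "come", "take", "want", "give", "mean"}


def share_of_top_verbs(verb_lemmas, top_set=TOP_12):
    """Return the proportion of verb occurrences that belong to top_set."""
    counts = Counter(verb_lemmas)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(n for lemma, n in counts.items() if lemma in top_set)
    return top / total


# Hypothetical sample of lexical verbs (base forms) from a short text.
sample = ["get", "go", "say", "pull", "get",
          "know", "throw", "want", "choose", "fall"]
print(share_of_top_verbs(sample))  # 0.6
```

Running the same function over the verbs of a conversation sample and a newspaper sample would give a rough, small-scale version of the comparison that Figure 3 reports.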

There are also some large differences among the frequencies of these 12 verbs. Figure 4 plots the frequencies for these verbs in the LSWE conversation corpus. Some of these very common verbs are not surprising. For instance, most people are not surprised that say is very frequent in conversation, given how often the speech of others is reported. The verb say is actually common in both conversations and in written texts such as newspapers, for example:

FIG.2

FIG.3



You said you didn’t have it. (Conversation)

He said this campaign raised ‘doubts about the authenticity of free choice’. (Newspaper)

But the extremely high frequency of the verb get in conversation is more surprising for most people. This verb goes largely unnoticed, yet in conversation it is by far the single most common lexical verb. The main reason that get is so common is that it is extremely versatile, being used with a wide range of meanings.

These include:

Obtaining something: See if they can get some of that beer.

Possession: They’ve got a big house.

Moving to or away from something: Get in the car.

Causing something to move or happen: It gets people talking again, right?

Understanding something: Do you get it?

Changing to a new state: So I’m getting that way now.

Several other verbs are also extremely common in conversation: go, know, and to a lesser extent, think, see, come, want, and mean. News, on the other hand, shows a quite different pattern, with only the verb say being extremely frequent. However, it should be noted that all 12 of these verbs are notably common in both conversation and written newspaper language when compared to most verbs in English.

An important function of grammar textbooks is the introduction of new vocabulary, and frequency has an important guiding role. Frequent words will be more useful to students receptively and in production, in a wider range of circumstances, whereas relatively rare words will prove less useful.

The point here is not to argue against inclusion of a wide range of vocabulary. Relatively rare words can be very useful for learners to know. However, there is no reason why relatively rare words should be illustrated to the exclusion of common words. High-frequency verbs are difficult to learn because of the wide range of meanings that they can express. Thus, students are likely to need extensive exposure to those verbs. In the past, authors have been forced to rely on intuitions regarding the typical patterns of language use, resulting in a skewed coverage of language patterns. However, with the availability of the frequency information that can be obtained from corpus analyses, it is now possible to ensure coverage of the most common words and grammatical patterns, as well as coverage of vocabulary breadth.

3. Associations between Grammar and Words

The usefulness of frequency information for materials writers extends beyond simple counts of frequencies of grammatical structures or vocabulary, as illustrated in the last section. Corpus-based research has consistently found that there are actually strong associations between grammatical structures and the words used with them. In other words, not every word is equally likely to occur in a given grammatical structure. It makes sense, then, for materials writers to give explanations and examples that reflect the typical associations and give students practice with them.

For example, the last section discussed the rarity of progressive aspect in conversation. However, there are a few verbs that are usually in the progressive aspect when they are used. These verbs include bleeding, chasing, shopping, starving, joking, kidding, and moaning. Explanations, examples, and activities with progressives thus need to include these verbs. As noted in the last section, this is not to say that expanding students’ vocabularies is not useful; rather, it doesn’t make sense to neglect the most common words used in a grammatical structure even when also expanding vocabulary.
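One way to picture this kind of lexico-grammatical association is to compute, for each verb, the proportion of its occurrences that are in the progressive aspect. The sketch below is only an illustration under stated assumptions: it presumes each verb occurrence has already been labeled for aspect, the data are invented, and the 0.5 cut-off for "usually progressive" is an assumption of this example rather than a threshold from the research described here.

```python
from collections import defaultdict


def progressive_share(tagged_verbs):
    """Map each verb to the proportion of its occurrences that are progressive.

    tagged_verbs is an iterable of (lemma, is_progressive) pairs.
    """
    totals = defaultdict(int)
    progressive = defaultdict(int)
    for lemma, is_progressive in tagged_verbs:
        totals[lemma] += 1
        if is_progressive:
            progressive[lemma] += 1
    return {lemma: progressive[lemma] / totals[lemma] for lemma in totals}


def mostly_progressive(shares, threshold=0.5):
    """Return the verbs whose progressive share exceeds the (assumed) threshold."""
    return sorted(lemma for lemma, share in shares.items() if share > threshold)


# Hypothetical, hand-labeled occurrences; not corpus data.
occurrences = [
    ("joke", True), ("joke", True), ("joke", True), ("joke", False),
    ("know", False), ("know", False), ("know", False), ("know", False),
]
shares = progressive_share(occurrences)
print(shares)                      # {'joke': 0.75, 'know': 0.0}
print(mostly_progressive(shares))  # ['joke']
```

The point of the calculation is that the association is measured per verb, which is different from asking which verbs are most frequent overall.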

Another illustration of the usefulness of corpus studies’ lexico-grammatical findings can be seen with verb + gerund and verb + infinitive constructions. Teachers and students have long been plagued by long lists of verbs that take gerunds and other lists of verbs that take infinitives. The lists are, in fact, so long that, while they are useful for reference, they can be overwhelming for students and teachers. You probably will not be surprised at this point to learn, however, that not all the possible combinations are frequent. For example, in the case of verb + to infinitive, only four verbs are especially common in both speech and writing (occurring more than 200 times per million words): want, try, seem, and like. In addition, begin to is very frequent in fiction writing, while tend to and appear to are common in academic writing.

Other relatively common verbs fit the same kinds of meaning, so that the most common verb + infinitive pairs can be grouped into general meaning categories:

want/need verbs: hope, like, need, want, want NP, wish

effort verbs: attempt, fail, manage, try

begin/continue verbs: begin, continue, start

“seem” verbs: appear, seem, tend

FIG.4



A useful way to make the verb + infinitive materials more manageable and the practice more focused is to concentrate on these four categories of meaning, beginning with the seven most common verbs and then expanding to these other relatively common verbs with related meanings.

4. Register Comparisons

Another of the most important general findings in corpus-based studies of grammar is the importance of register variation. It turns out that most descriptions of common grammatical features and their use are not valid for English overall. Rather, strong patterns of use in one register often represent only weak patterns in other registers. And even when there are similarities, there are still often differences across registers. For example, in fiction writing and newspaper writing, the verb say is much more frequent than any other lexical verb; in conversation, the verbs go and know are as frequent as say, and get is more frequent than any of those three verbs; while in academic writing, the only especially frequent verb is BE.

This example illustrates the most important kind of information for teachers and materials writers: the grammatical distinctions between conversation and informational writing. The typical grammar of conversation is radically different from the typical grammar of informational writing. The two registers differ dramatically in the grammatical features they most commonly use. But even when the two registers use the same grammatical constructions, they often do so in very different ways and associated with different sets of words. The following two case studies illustrate these patterns: The case of noun premodifiers provides an illustrative example of the way conversation and writing use different grammatical structures. The case of the definite article illustrates how they use the same structure in different ways.

Noun premodifiers in conversation and writing

In most textbooks, adjectives are typically characterized as words that describe something, and books usually include extensive coverage of attributive adjectives as the major grammatical device used for noun modification (e.g., the big house). Most textbooks also describe the adjectival role of -ing and -ed participles (e.g., an exciting game, an interested couple). In contrast, the adjectival role of nouns (e.g., a grammar lesson) is less commonly acknowledged in textbooks. This difference seems to reflect a widely held belief that adjectives and participial adjectives are the primary devices used for noun modification, whereas nouns are considered to be much less important as premodifiers of another noun. Here again, corpus-based analysis provides a very different picture.

Figure 5 presents the frequencies of adjectives, participial adjectives, and nouns as nominal premodifiers, comparing their use in conversation versus newspaper writing. In conversation, adjectives are the primary device used for noun modification (although most noun phrases in conversation do not include modifiers). If conversational English is the primary target for a textbook, an exclusive focus on adjectives seems justified.

However, a very different pattern of use is found in the written registers. The pattern in newspaper writing is especially noteworthy: Nouns as premodifiers are extremely frequent and nearly as common as adjectives, whereas participial forms are surprisingly rare.

It might be argued that the grammar of nouns as premodifiers is somehow simpler than that of adjectives or participial forms, and therefore they require little overt attention. However, in actual fact, premodifying nouns can express a bewildering array of meanings, with no surface-level clues to guide the reader. For example, consider the relationships between the modifying noun (N1) and the head noun (N2) in the following pairs:

glass windows, metal seat, tomato sauce (N2 is made from N1)

pencil case, brandy bottle, patrol car (N2 is used for the purpose of N1)

sex magazine, sports diary (N2 is about N1)

farmyard manure, computer printout (N2 comes from N1)

summer rains, Paris conference (N1 gives the time or location of N2)

These are only a few of the many different meaning relations found with nouns as premodifiers (see Biber et al., 1999, pp. 589–591). These forms are potentially difficult to understand in addition to being extremely frequent in written registers. Students at intermediate and advanced levels are likely to need greater exposure to these commonly encountered forms than comparatively rare forms like participial adjectives, and especially at advanced levels, they are likely to need practice producing them appropriately for the concise, condensed writing expected in the informational writing of many disciplines.

The Definite Article

The definite article (the) occurs fairly commonly in both speaking and writing, but it is used for different functions in the two registers. In general, definite articles are required in English for one of four reasons:

FIG.5



The noun was introduced previously in the text:

A 12-year-old boy got mad at his parents Friday

night <…> and drove off in one of his parent’s four cars

<…> The boy was found unharmed <…>

Shared situational context specifies the noun:

Bob, put the dog out, would you please?

Modifiers of the noun specify the noun:

The introduction [of technology into teaching] should

include support and training.

The specific noun can be inferred from earlier discourse:

…an old pale blue Ford rattled into view. The driver

swung wide around my car.

Definite articles are used for other reasons, including idioms and generic reference, but these four functions are the most common.

Corpus-based research has found that the single most common reason for use of the definite article differs between informational writing and conversation (Table 1). In conversation, shared situational context is the usual reason for using the definite article (55% of all occurrences), while modifiers of the noun account for only about 5%. In informational writing, modifiers of the noun are far more likely to be the reason for the use of the definite article; they account for 30-40% of all occurrences.

Reason for Definite Article Use    Conversation    Informational Writing

introduced previously in text      25%             25-30%

shared situational context         55%             10%

modifiers of the noun              5%              30-40%

inference                          5%              15%

other                              10%             10%

Table 1. Most Common Reasons for Use of the Definite Article
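For readers who want to work with these figures directly, the sketch below stores Table 1 as data and looks up the most common reason in each register. The percentages are those reported in Table 1; the data layout, the function name, and the use of range midpoints (for example, treating 30-40% as 35) to compare reasons are assumptions made for this illustration.

```python
# Table 1 as (low, high) percentage ranges; single values repeat the number.
TABLE_1 = {
    "introduced previously in text": {
        "conversation": (25, 25), "informational writing": (25, 30)},
    "shared situational context": {
        "conversation": (55, 55), "informational writing": (10, 10)},
    "modifiers of the noun": {
        "conversation": (5, 5), "informational writing": (30, 40)},
    "inference": {
        "conversation": (5, 5), "informational writing": (15, 15)},
    "other": {
        "conversation": (10, 10), "informational writing": (10, 10)},
}


def top_reason(register):
    """Return the reason with the highest midpoint percentage in a register."""
    def midpoint(reason):
        low, high = TABLE_1[reason][register]
        return (low + high) / 2
    return max(TABLE_1, key=midpoint)


print(top_reason("conversation"))           # shared situational context
print(top_reason("informational writing"))  # modifiers of the noun
```

The lookup simply restates the contrast discussed above: the leading reason for definite article use is different in the two registers.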

In both registers, about 25% of all definite articles occur when a noun is mentioned previously in the text. Many learners of English know this rule. Similarly, many learners know the rule that the definite article is used when speakers are both familiar with the specific noun being referred to. What is important from the corpus-based study is that the presence of a noun modifier is such an important reason for definite article use in informational writing. The modifiers can be before the noun (such as in the most affordable homes) but they more often come after the noun, in prepositional phrases or relative clauses. These modifiers rarely contain superlatives that make it easy to identify a referent as unique, so the choice of definite article is difficult for many learners to understand. Few textbooks, however, call attention to these structures for students or provide practice with them.

The differences between registers can be crucially important for both comprehension (listening versus academic reading) and production (conversational skills versus academic writing). In this case, corpus research provides the essential information required to support ESP/EAP approaches to teaching. However, even in more specialized approaches, it is rare that students need to be taught just one register; rather, they need to be made aware of register variation. Even within EAP programs, for example, students must learn to comprehend and use grammar appropriately in conversation to interact with English-speaking friends and participate in group work, while the grammar of academic writing is required for reading course texts and writing academic papers.

5. Putting it all together: Integrating frequency, lexico-grammar, and register information

By using information from corpus research (frequencies, word associations, and analysis of register differences), materials developers and teachers can increase the meaningful input that is provided to learners. This is not to say that corpus research provides all the answers. Other factors are equally important: for example, some grammatical topics are required as building blocks for later topics; some grammatical topics are more difficult and therefore require more practice than others. In many cases, though, pedagogical decisions are made based on little empirical evidence. Teachers and authors all share the same goals of presenting the typical and most important patterns first, moving on to more specialized topics in later chapters and more advanced books. However, lacking empirical studies, authors have been forced to rely on their intuitions for these judgments, and widely accepted traditions have arisen to support those intuitions.

With the rise of corpus-based analyses, we are beginning to see empirical descriptions of language use, identifying the patterns that are actually frequent (or not) and documenting the differential reliance on specific forms and words in different registers. In some cases, our intuitions as authors have turned out to be correct; in many other cases, we have been wrong. For those latter cases, revising pedagogy to reflect actual use, as shown by frequency studies, can result in radical changes that facilitate the learning process for students.

REFERENCES

Biber, D., S. Conrad, and R. Reppen. (1998). Corpus linguistics: Investigating language structure and use. Cambridge: Cambridge University Press.

Biber, D., S. Johansson, G. Leech, S. Conrad, and E. Finegan. (1999). Longman grammar of spoken and written English. London: Longman.

Biber, D., and R. Reppen. (2002). What does frequency have to do with grammar teaching? Studies in Second Language Acquisition 24, 199–208.

Byrd, P. (1995). Issues in the writing of grammar textbooks. In P. Byrd (Ed.), Materials writer’s guide (pp. 45–63). Boston: Heinle & Heinle.

McEnery, T., R. Xiao, and Y. Tono. (2006). Corpus-based language studies: An advanced resource book. London: Routledge.

0-13-210552-7 978-0-13-210552-1© Copyright by Pearson Education
