Teachers’ Top 10 Uses For a Language Corpus

Saturday, May 18, 9-10h * PLUS Breakout Session at 10h15 <HERE> *

Sunshine State TESOL 2013 «Expanding Traditions: Merging Methodology & Technology»

ORLANDO, FLORIDA

Tom Cobb

Department of Language Teaching (Didactique des langues)

Université du Québec à Montréal

FIND THIS PPT AT WWW.LEXTUTOR.CA/CV/SS-TESOL.PPT

Who?

Tom Cobb teaches Applied Linguistics at a French-language university in Montreal. His main interest is adapting the computer tools of linguists to the needs of language teachers through his website www.lextutor.ca.

Tom worked abroad for many years (Saudi Arabia, Oman, Hong Kong) before returning to North America, and continues to work as a consultant in both developed and developing countries (Japan, Niger, Benin, Barbados). His research writings are available at lextutor.ca/cv/ .

SS-TESOL - Blurb

A “language corpus” is a sampled collection of written or spoken texts large enough to represent part of a language (medical, economic) or even a language as a whole. The applied linguistics literature is full of references to research involving corpora, and ESL teacher-training courses exhort new teachers to get familiar with corpora and use them for various purposes in their teaching. But when teachers get into the classroom, do they follow this advice? And if so, what do they use a corpus for?

The Lextutor website (www.lextutor.ca) offers teachers access to several corpora and tracks how they use them. User data show that more than 1,000 users (mainly teachers) consult a corpus on Lextutor every day. These data, along with email queries and conference presentations, make it clear what teachers are using corpora for, and have made it possible to evolve the tools in line with teachers’ needs and goals. My talk will outline the 10 main reasons teachers consult a concordance.

Really?

What?

• What is a corpus?
• Why do we need corpora?
• What difference do they make?
• What is “the corpus revolution”?

Or, “Is there a corpus revolution?”

>>>> A brief primer on CORPORA before we get to teachers’ uses

Corpora – what are they?

Early corpora

Dr Johnson, A Dictionary of the English Language (Longman, 1755)
• Based on quotations from literature, copied onto many slips of paper
• But using literature has some problems

120 years later: James Murray, OED, 1879
• REAL LANGUAGE examples sent in by post
• The Oxford City Post Office sets up a special sub-branch for the OED

1960s - Enter The Computer

What is a corpus? NOT just «a lot of text»!

A large collection of language in use, but…
…assembled systematically, according to explicit criteria of representativeness

How large? Depends on the goal

Goals and sizes

Linguistics goal: to represent the entire language
• 100 million words still under-represents common collocations

Pedagogical goal: students meet common words and structures
• 1 million words gives 10 hits for frequent words

Applied linguistics goal: trace an acquisition feature
• 100,000-word learner corpora are common

Drilling down into… the pedagogical goal: students meet common grammar and vocab

Grammar: 1 million words is adequate
– All structures get many hits

Lexis
• Basic vocab: 1 million words gives 10 hits at the 2k level
• Main collocations: 1 million words gives the main ones
  – Torrential rain?
• “Raining cats and dogs”? 1 billion words gives 5 hits
• Identify specialist lexis: 200,000 words may be enough
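The sizes above follow from simple proportions. A minimal sketch, assuming a word or phrase occurs at the same rate per word in every corpus (a simplification, since genre and register shift rates); the helper names are hypothetical and not part of Lextutor:

```python
# Hypothetical helpers (not Lextutor code). They assume a word's rate of
# occurrence is the same in every corpus, which is only a rough approximation.


def expected_hits(ref_hits: int, ref_size: int, target_size: int) -> float:
    """Scale a hit count from a reference corpus to a corpus of target_size words."""
    return ref_hits * target_size / ref_size


def words_needed(ref_hits: int, ref_size: int, target_hits: int) -> int:
    """Smallest corpus size (in words) expected to yield target_hits hits."""
    # Ceiling division on integers, avoiding floating-point rounding
    return -(-target_hits * ref_size // ref_hits)


# Figure from the slide: "raining cats and dogs" gives 5 hits in 1 billion words
print(expected_hits(5, 1_000_000_000, 1_000_000))  # 0.005
print(words_needed(5, 1_000_000_000, 10))          # 2000000000
```

In other words, a 1-million-word classroom corpus would be expected to show that idiom essentially never, which is why idioms need very large corpora while basic vocabulary does not.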

A growth industry

Brown, 1970: 1,000,000 wds (http://icame.uib.no/brown/bcm.html)
BNC, 1994: 100,000,000 wds (www.natcorp.ox.ac.uk)
COCA (BYU), 2013: 450,000,000 wds, a contemporary corpus of U.S. English 1990-2012 (http://corpus.byu.edu/coca/)
Cambridge Int’l, 2002: 1,000,000,000 wds (www.cambridge.org./elt/corpus/international_corpus.htm)

Design / composition e.g., Brown (1970s)

Page from Lextutor

What does a corpus represent?

A language as a whole
• BNC
Or a part
• Cancode (oral), COCA, MICASE (academic)
Or an individual
• Jack London’s collected works
Or a group of individuals
• A class of ESL learners

How do we read a corpus?

We cannot read it naturally: that would defeat the goal
It needs the help of a search technology
• concordance
• index
• frequency list
• many others
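A frequency list is the simplest of these technologies. A minimal sketch, assuming a naive tokenizer (lowercased runs of letters and apostrophes); real tools also handle lemmas, word families and multi-word units, and `frequency_list` is a hypothetical name:

```python
import re
from collections import Counter


def frequency_list(text: str, top: int = 10) -> list[tuple[str, int]]:
    """Return the `top` most frequent word forms in text, most frequent first."""
    # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes
    tokens = re.findall(r"[a-z']+", text.lower())
    # most_common() orders ties by first occurrence in the text
    return Counter(tokens).most_common(top)


sample = "The cat sat on the mat. The dog sat too."
print(frequency_list(sample, 3))  # [('the', 3), ('sat', 2), ('cat', 1)]
```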

Concordancers

http://www.lextutor.ca/concordancers/concord_e.html
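A concordancer shows every hit of a search word in key-word-in-context (KWIC) format, so patterns line up vertically. The sketch below works under simple assumptions (a fixed character window, whole-word matching, case ignored); `kwic` is a hypothetical name, not the Lextutor concordancer's code:

```python
import re


def kwic(text: str, keyword: str, width: int = 30) -> list[str]:
    """Key-word-in-context lines: each hit of keyword, with `width`
    characters of context on each side and the keywords aligned in a column."""
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so every keyword starts at the same column
        lines.append(f"{left:>{width}}[{m.group(0)}]{right}")
    return lines


corpus = "It rained all day. Heavy rain fell on the hills.\nThe rain stopped at noon."
for line in kwic(corpus, "rain", width=20):
    print(line)
```

Whole-word matching means "rained" is not a hit for "rain"; a real concordancer would usually offer both options.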

Corpora – why do we need them?

Why do we need corpora?

A. Corpus work is sexy

B. We have computers – let’s use them

C. Linguistic intuitions are unreliable

Linguistic intuitions are notoriously unreliable

Demo 1: Do you think “however” is more common in spoken or in written language?

By how much? (3 to 1… etc.)

http://www.lextutor.ca/range/range_corpus/
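Comparing spoken and written counts only makes sense after normalizing for subcorpus size, usually to hits per million words. A minimal sketch; the function names are hypothetical and the counts in the example are invented, not real figures for “however”:

```python
def per_million(hits: int, corpus_size: int) -> float:
    """Normalize a raw hit count to occurrences per million words."""
    return hits * 1_000_000 / corpus_size


def written_spoken_ratio(written_hits: int, written_size: int,
                         spoken_hits: int, spoken_size: int) -> float:
    """How many times more frequent a word is in writing than in speech."""
    return per_million(written_hits, written_size) / per_million(spoken_hits, spoken_size)


# Made-up counts for illustration only (not real figures for "however"):
# 300 hits in 1.5 million written words, 50 hits in 500,000 spoken words
print(per_million(300, 1_500_000))                          # 200.0
print(per_million(50, 500_000))                             # 100.0
print(written_spoken_ratio(300, 1_500_000, 50, 500_000))    # 2.0
```

Raw counts alone would suggest a 6 to 1 difference here; normalizing shows it is 2 to 1.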

Demo 3: Can you rank-order these roughly by frequency band?

0-2k | 3k-5k | 6k-10k | 11k-15k

http://www.lextutor.ca/freq/train/

Try one? http://www.lextutor.ca/freq/train/
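The bands in Demo 3 group thousand-word frequency levels (k-levels). A sketch of the classification, assuming a simple rank-ordered word list where rank 1 is the most frequent form; Lextutor's own lists are built on word families, which this sketch does not model. The function names are hypothetical and the ranks in the example are invented:

```python
from typing import Optional

# The demo's four bands, as (highest k-level in band, label)
BANDS = [(2, "0-2k"), (5, "3k-5k"), (10, "6k-10k"), (15, "11k-15k")]


def k_level(rank: int) -> int:
    """1-based thousand-word level: ranks 1-1000 -> 1, 1001-2000 -> 2, ..."""
    return (rank - 1) // 1000 + 1


def band(word: str, ranks: dict[str, int]) -> Optional[str]:
    """Band label for word, or None if it is unranked or beyond 15k."""
    rank = ranks.get(word)
    if rank is None:
        return None
    level = k_level(rank)
    for top_level, label in BANDS:
        if level <= top_level:
            return label
    return None


# Invented ranks for placeholder words, for illustration only
toy_ranks = {"wordA": 450, "wordB": 4_200, "wordC": 12_800, "wordD": 15_001}
for w in ["wordA", "wordB", "wordC", "wordD", "missing"]:
    print(w, band(w, toy_ranks))
```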

Many linguistic intuitions are unreliable

Implicit patterns are extremely slow to extract from input (N. Ellis, J. Hulstijn)
… because of the severe limitations on what we can see and remember unaided

And if pattern perception is slow and unreliable for native speakers…
How much slower for LEARNERS?!

It is not only linguistic intuitions that are problematic

For every appearance, many possible explanations

Stand outside on a starry evening: what does it look like?

The role of the computer in modern science is well known. In disciplines like physics and biology, the computer's ability to store and process inhumanly large amounts of information has disclosed patterns and regularities in nature beyond the limits of normal human experience. Similarly in language study, computer analysis of large texts reveals facts about language that are not limited to what people can experience, remember, or intuit. In the natural sciences, however, the computer merely continues the extension of the human sensorium that began 200 years ago with the telescope and microscope. But language study did not have its telescope or microscope. The computer is its first analytical tool, making feasible for the first time a truly empirical science of language.

– Cobb 1999

Before the computer, linguists could only study small samples of language at a time because of the limitations of their powers of observation and their memories. Even scholars who relentlessly collected instances of usage all their lives only had a few examples of any particular pattern, and there was no way of telling what they had missed.

Sinclair, 2003, p. ix

Most sciences: supplemented by technologies from the 15th century
• BIOLOGY: microscope
• ASTRONOMY: telescope
• NAVIGATION: astrolabe, etc.

Language study: late 20th century
• machine-readable corpora

Corpus Findings – Very Good News for ESL

The fabled core of English is close to disclosure through 35 years of corpus work
• Main lexis + coverage: 2,000 word families = 80% of text (Carroll et al., 1976)
• Main collocations in BNC speech: 84 high-frequency collocations belong in the 1k list (Shin & Nation, 2007)
• Main phrasal verbs: 25 phrasal verbs = 1/3 of all phrasal verbs in the BNC (Gardner & Davies, 2007)
• Main morphologies (Bauer & Nation, 1993)
• Main stress patterns (Murphy & Kandil)

Cf. all this coming together at the same time as the human genome, also a corpus project
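Coverage figures such as “2,000 word families = 80%” mean the share of running words (tokens) in a text that belong to the list. A minimal sketch, assuming exact word-form matching rather than family-based matching, with a tiny invented list in the example; `coverage` is a hypothetical name:

```python
import re


def coverage(text: str, known_words: set[str]) -> float:
    """Share of running words (tokens) in text that belong to known_words."""
    # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    covered = sum(1 for t in tokens if t in known_words)
    return covered / len(tokens)


# A tiny invented list, standing in for a real 2,000-family list
print(coverage("The cat sat on the mat", {"the", "on", "sat"}))
```

Because every occurrence counts, a few very frequent words ("the" twice here) carry a large share of the coverage, which is why a short list can cover so much of a text.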

Numerous errors are now corrected (in principle)
• Definitions are no longer harder than the defined word
• The simple present is no longer automatically the first verb tense taught
• Written language is no longer the model for spoken language
• The status of multi-word units is reinstated
• Grammar is no longer taught via unknown lexis, or as unconnected to lexis

This is all great, but… what do teachers do with corpora?

<<< Back to the 10 main uses of Lextutor corpora with ESL learners

1. The obvious use – source of examples for the teacher

• Teacher finds examples to show students
  – Words
  – Structures
  – Discourse features
• Find sentences for test questions
  – within a rough-tuned level
  – within a domain

««-- MEANS WE CAN GO LIVE EASILY FROM THIS PLACE

Display words, collocations and structures in the classroom

Conclusion: most of “What it means to know a word” can be shown in a million-word corpus

Nation’s 18 kinds of word knowledge

• Uses 2-9 are concordancing in a task context
  – where teachers set up concordances for learners to use independently
  – because they achieve some goal by doing so
• Payoff for looking through multiple examples
• These were independent uses of concordances
  – later incorporated in dedicated interfaces

2. Corpus as a writing resource

EXAMPLE: A student writer wants to describe a teacher as “one of the best teacher…”

A writing resource click-linked to the learner’s text
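For the “one of the best teacher…” case, a corpus answers the question by showing what normally follows the phrase. A sketch that counts the words immediately to the right of a phrase; `right_collocates` is a hypothetical helper, not a Lextutor function, and the sample text is invented:

```python
import re
from collections import Counter


def right_collocates(text: str, phrase: str) -> Counter:
    """Count the words that immediately follow `phrase` in text."""
    text = " ".join(text.split())  # normalize whitespace so the phrase matches across line breaks
    pattern = r"\b" + re.escape(phrase) + r" ([a-z']+)"
    # With one capture group, findall returns just the following word
    return Counter(w.lower() for w in re.findall(pattern, text, flags=re.IGNORECASE))


sample = ("She is one of the best teachers in the school. "
          "He was one of the best players. One of the best teachers I had.")
print(right_collocates(sample, "one of the best").most_common())
```

On a real corpus, a list dominated by plural nouns gives the learner the evidence for “one of the best teachers”.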

3. Data-Driven Error Analysis

… integrated as writing error feedback

4-5: Corpus as a reading resource

Expand the text
• Via a concordancer hooked up to the learner’s text
  – with potential payoff in strategy development

4. Give lexical info while reading

• Or, develop lexical strategies while reading
• Or, meta-lexical competence… etc.

4a. Encourage use of context before dictionary

5. Show whether a word is worth stopping for

6. Word-focus activities…
• Auto-generate rich semantic cuing

7. Group-made concordances for collaborative vocabulary learning
• Learners contribute concordance lines
• Since there are too many words to learn alone…

7a. Facilitate transfer of word knowledge
• to novel contexts

8. Facilitate quick scope out of a k-level
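One possible way to scope out a k-level quickly is to pull an example sentence for each word on the level's list. A minimal sketch, assuming a naive sentence splitter and a plain list of level words; `level_examples` is a hypothetical name, not a Lextutor function, and the corpus text is invented:

```python
import re
from typing import Optional


def level_examples(corpus: str, level_words: list[str]) -> dict[str, Optional[str]]:
    """Map each word of a k-level list to the first corpus sentence containing it
    (None if the corpus has no example)."""
    # Naive sentence split after ., ! or ? (an assumption; real corpora need better tools)
    sentences = re.split(r"(?<=[.!?])\s+", " ".join(corpus.split()))
    examples = {}
    for word in level_words:
        pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        examples[word] = next((s for s in sentences if pattern.search(s)), None)
    return examples


corpus = "Prices rose sharply. The harvest was poor! Rain fell at last."
print(level_examples(corpus, ["harvest", "drought"]))
```

A `None` result flags a level word the corpus cannot illustrate, which is itself useful when judging whether a corpus suits a given level.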

9. Snapshot of a set of learner essays
• Error patterns?
• Are recently learned words coming through in production?
• Are new structures coming through?
  – Correctly?
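Whether recently learned words are coming through in production can be checked mechanically across a set of essays. A sketch, assuming the target words are given as plain word forms (inflected forms are not matched) and a naive tokenizer; `words_in_production` is a hypothetical name and the essays are invented:

```python
import re


def words_in_production(essays: list[str], targets: set[str]) -> dict[str, int]:
    """For each target word, count how many essays use it at least once."""
    targets = {t.lower() for t in targets}
    counts = {word: 0 for word in sorted(targets)}
    for essay in essays:
        essay_words = set(re.findall(r"[a-z']+", essay.lower()))
        for word in targets & essay_words:
            counts[word] += 1
    return counts


essays = ["The economy grew last year.",
          "Inflation hurt the economy.",
          "We went home early."]
print(words_in_production(essays, {"economy", "inflation", "deficit"}))
```

Counting essays rather than raw occurrences keeps one enthusiastic writer from making a word look widely acquired.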

10. And, under development
• By popular demand
• From my best Google-hitting paper (1997)
• Scope out a level by contextual inference
  – Like in L1, but with support

Any research supporting all of this?

• CONCORDANCE AS A READING RESOURCE
  – Cobb, T. (2009). Internet and literacy in the developing world: Delivering the teacher with the text. In K. Parry (Ed.), Literacy for All in Africa Vol. 2: Reading in Africa: Beyond the School. Kampala: African Book Collective.
• CONCORDANCE AS WRITING FEEDBACK
  – Gaskell, D., & Cobb, T. (2004). Can learners use concordance feedback for writing errors? System, 32(3), 301-319.
• LEARNER-BUILT CONCORDANCE FOR VOCAB DEVELOPMENT
  – Horst, M., Cobb, T., & Nicolae, I. (2005). Expanding academic vocabulary with a collaborative on-line database. Language Learning & Technology, 9(2), 90-110.
• CONCORDANCE INVESTIGATION OF LEARNER PRODUCTION
  – Cobb, T. (2003). Analyzing late interlanguage with learner corpora: Quebec replications of three European studies. Canadian Modern Language Review, 59(3), 393-423.
• CONCORDANCE FOR SCOPING OUT A K-LEVEL
  – Cobb, T. (1997). Is there any measurable learning from hands-on concordancing? System, 25(3), 301-315.
  – Cobb, T., & Horst, M. (2011). Does Word Coach coach words? CALICO Journal, 28(3), 639-661.

MORE AT LEXTUTOR.CA/CV/
