Online Corpus: Literacy Teachers’ Best Friend
Dominik Lukeš, [email protected], @techczech

Using online corpus for literacy teachers

Outline

http://www.flickr.com/photos/adactio/3563832656

What is a corpus
Answering questions with a corpus
The language of corpus searches
The corpus and the classroom
Practice

Corpus / Corpora????

Knowledge of / about language

http://www.flickr.com/photos/missturner/3029700617/

Prescriptivism… how language should be used

vs.

Descriptivism… how language is used

“Most of the prescriptive rules of the language mavens make no sense on any level. They are bits of folklore that originated for screwball reasons several hundred years ago… For as long as they have existed, speakers have flouted them…”

“intellectual abdication”
“should be ashamed”
“current around 1900”
“a perversion of grammatical education”
“blind to textual evidence even when he himself exhibits it”
“dishonest and stupid”
“vile little compendium of tripe about style”

Grammarian Geoffrey K. Pullum on …

“More passives in Orwell's pompous essay with the warning about how you mustn't use them than in any periodical you can lay your hands on!”

This usage stuff is not straightforward and easy. If ever someone tells you that the rules of English grammar are simple and logical and you should just learn them and obey them, walk away, because you're getting advice from a fool.

http://languagelog.ldc.upenn.edu/nll/?p=2790

Corpus: key modern tool for finding out about how language works…

Corpus… is a large database of representative language samples …

Corpus… 100s of millions of words from (mostly) written language in different genres in small samples (~2000 words) …

Corpus… used for linguistic research, making dictionaries, writing grammars, …

Corpora available for teachers

http://corpus.byu.edu

BYU corpora available:
COCA (contemporary Am English)
COHA (historical Am English)
GloWbE (global web English)
Wikipedia
Google Books (BrEng/AmEng)
BNC (British National Corpus)
Hansard (British parliamentary speeches)
Spanish/Portuguese

Access to COCA and related BYU corpora is free…

…but free registration required for more than ~10 queries a day

Other resources derived from BYU corpora

WordFrequency.info
WordAndPhrase.info
AcademicWords.info
Collocates.info

http://www.webcorp.org.uk

http://corpus.leeds.ac.uk

http://www.flickr.com/photos/atoach/3900591006/

Searching a corpus early on in the process of making a generalization can save you a lot of unpleasant surprises later.

How do we use the word dyslexia?

We speak more often of dyslexic children than adults.

We speak more often of dyslexia than any other dys- word.

Concordance
BNC: dyslexic [n*]

COCA: dyslexic [n*]

http://www.americancorpus.org/

http://corpus.byu.edu/bnc

COCA: dys*

Suffixing rules

*yed: played, stayed, portrayed, enjoyed, unemployed, surveyed

*ied: died, tried, married, worried, identified, applied
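The two lists reflect the familiar spelling rule for adding -ed to verbs ending in y. As a minimal sketch (simplified: it ignores consonant doubling such as stop → stopped and irregular verbs such as run → ran), the rule can be written in Python like this. Note that died lands in the *ied list only because die ends in -e, not through the y-rule, which is exactly the kind of surprise a corpus search turns up.

```python
VOWELS = set("aeiou")

def add_ed(verb: str) -> str:
    """Add the regular past-tense ending to a verb (simplified sketch).

    Ignores consonant doubling (stop -> stopped) and irregular verbs (run -> ran).
    """
    if len(verb) > 1 and verb.endswith("y") and verb[-2] not in VOWELS:
        return verb[:-1] + "ied"   # consonant + y: try -> tried
    if verb.endswith("e"):
        return verb + "d"          # final e: die -> died
    return verb + "ed"             # vowel + y and most other verbs: play -> played


for verb in ["play", "portray", "survey", "try", "marry", "identify", "die"]:
    print(f"{verb:10} {add_ed(verb)}")
```

Running the same check over a corpus word list (all *yed and *ied forms) is a quick way to find the cases the rule does not cover.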

The Corpus Magic

?  any one character
*  any number of characters (incl. 0)
[ ]  lemma (all inflectional forms of a word)
[n*]  part-of-speech tags (e.g. nouns)

Different corpora use slightly different codes. Read the manual.

*each → each, reach, beach, teach, outreach, …, impeach, …
teach* → teachers, teaching, …, teachable, teacher-librarians, …
t*ch → touch, teach, tech, torch, trench, twitch, …, three-inch, …
teach * → teach the, teach us, teach students, …

?each → reach, beach, teach, peach, leach, keach, …
each? → each- (1), each# (1) [i.e. practically nothing]
?each? → peachy, bleachy, teacha, reachs (2) [i.e. a spelling error], …
t?ch → tech, tach, toch, tuch, tsch, tich
t??ch → touch, teach, torch, tisch, …
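To see how ? and * behave, here is a sketch that translates a corpus-style wildcard pattern into a Python regular expression and filters a small, made-up word list. It assumes single-word patterns only and is not how the BYU interface is actually implemented.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a wildcard pattern: '?' = exactly one character, '*' = any number (incl. 0)."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))  # everything else is matched literally
    return re.compile("".join(parts))

def search(pattern, words):
    """Return the words that match the whole pattern, not just part of it."""
    regex = wildcard_to_regex(pattern)
    return [w for w in words if regex.fullmatch(w)]

# Invented word list standing in for a corpus vocabulary.
words = ["each", "reach", "beach", "teach", "bleach", "teacher", "teaching",
         "touch", "torch", "tech", "trench", "peachy"]

for pattern in ["*each", "?each", "t?ch", "t??ch", "t*ch"]:
    print(pattern, search(pattern, words))
```

Using fullmatch rather than search is the design choice that makes *each require the word to end in each, the way corpus wildcards behave.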

[Lemma]

Part of speech tags

[run].[n*]

[run] [n*]

Common tags

[n*] noun; [NN2] plural nouns
[v*] verb; [VVD] verb, past tense
[aj*] (BNC) / [j*] (COCA) adjective
[av*] (BNC) / [r*] (COCA) adverb
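Lemma and tag searches only work because every word in the corpus has been annotated with its lemma and part of speech. The sketch below assumes a tiny hand-tagged sample (the CLAWS-style tags are assigned by hand for illustration) and mimics the BYU-style conventions on the slides above: a space separates consecutive words, a dot attaches a tag to the same word, [word] means a lemma, and a bracketed code ending in * or written in capitals means a part-of-speech tag. These matching rules are a simplification assumed for this sketch, not the BYU query engine.

```python
# Hand-tagged toy sample: (word form, lemma, CLAWS-style tag). Tags are illustrative.
tokens = [
    ("she", "she", "PPHS1"), ("runs", "run", "VVZ"), ("a", "a", "AT1"),
    ("shop", "shop", "NN1"), ("after", "after", "II"), ("her", "her", "APPGE"),
    ("morning", "morning", "NNT1"), ("run", "run", "NN1"), ("they", "they", "PPHS2"),
    ("ran", "run", "VVD"), ("errands", "errand", "NN2"),
]

def matches(token, spec):
    """Test one token against one spec: [lemma], [tag*], [TAG] or a word form."""
    word, lemma, tag = token
    if spec.startswith("[") and spec.endswith("]"):
        inner = spec[1:-1]
        if inner.endswith("*"):          # [n*]: any tag starting with N
            return tag.startswith(inner[:-1].upper())
        if inner.isupper():              # [NN2]: exactly this tag
            return tag == inner
        return lemma == inner            # [run]: any form of the lemma run
    return word == spec.lower()          # plain word form

def query(tokens, pattern):
    """Return matching word sequences; spaces separate words, '.' joins specs on one word."""
    slots = [slot.split(".") for slot in pattern.split()]
    hits = []
    for i in range(len(tokens) - len(slots) + 1):
        window = tokens[i:i + len(slots)]
        if all(matches(tok, spec) for tok, specs in zip(window, slots) for spec in specs):
            hits.append(" ".join(tok[0] for tok in window))
    return hits

print(query(tokens, "[run]"))        # every form of the lemma run
print(query(tokens, "[run].[n*]"))   # run used as a noun
print(query(tokens, "[run] [n*]"))   # a form of run followed by a noun
```

The contrast between the last two queries is the point of the [run].[n*] vs [run] [n*] slide: the dot constrains one word, the space adds a second word.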

Help

You can also:
search for idioms: cats and dogs
combine wildcards: ?each*s
search for synonyms: [=pretty]
search for alternatives: car|bike|horse
exclude words from searches: used -car

For more details see:

Concordance + KWIC: *ies.[N*]

KWIC (Key-Word In Context): *ies.[N*]
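A KWIC display lines up every hit with a fixed amount of context on either side, so patterns are easy to scan. A minimal sketch, assuming plain text and a regular expression standing in for the corpus query (here \b\w+ies\b approximates *ies without any part-of-speech filter):

```python
import re

def kwic(text, pattern, width=25):
    """Return (left context, keyword, right context) for each regex match."""
    rows = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        rows.append((left, m.group(0), right))
    return rows

def show(rows, width=25):
    """Print rows with the keyword aligned in a central column."""
    for left, key, right in rows:
        print(f"{left:>{width}}  {key}  {right}")

# Invented sample text.
sample = ("Many studies of dyslexia compare the abilities of children. "
          "She tries hard and carries on reading.")
show(kwic(sample, r"\b\w+ies\b"))
```

Without the [N*] tag, the output mixes nouns (studies, abilities) with verbs (tries, carries), which is why both the tagged query and a look at the KWIC lines matter.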

Limit searches by genre
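Limiting by genre matters partly because genres differ in size, so raw counts are not directly comparable. Corpus interfaces commonly report frequency per million words; a sketch of that normalisation, assuming a handful of invented samples labelled by genre:

```python
# Invented samples: (genre, text). Real corpora label every text with its genre.
samples = [
    ("academic", "dyslexic students dyslexic learners reading research"),
    ("academic", "the study of dyslexic adults and literacy"),
    ("fiction", "she said nothing and walked home in the rain"),
    ("news", "dyslexic pupils get extra time in exams"),
]

def per_million(samples, word, genre=None):
    """Occurrences of `word` per million words, optionally within one genre."""
    total = hits = 0
    for sample_genre, text in samples:
        if genre is not None and sample_genre != genre:
            continue  # skip samples outside the requested genre
        tokens = text.lower().split()
        total += len(tokens)
        hits += tokens.count(word)
    return hits / total * 1_000_000 if total else 0.0

for genre in [None, "academic", "fiction", "news"]:
    print(genre or "all", round(per_million(samples, "dyslexic", genre)))
```

With toy data this small the numbers are meaningless in themselves; the point is that each genre is divided by its own size before comparison.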

Other questions a corpus can answer:

Are there more nouns or verbs ending in -ies? *ies.[V*] vs. *ies.[N*]
Are there four-letter verbs ending in -ed in the present tense? ??ed.[VVB]
What are the most common adjectives describing students vs. pupils? [j*] [student] vs. [j*] [pupil]
What do we say teachers do most often? [teacher] [vvb]

Corpus, rules, and regularity

http://www.flickr.com/photos/51505078@N00/352492687

pre*

*ed

*ies.[V*]

Collocations: limits on variability

See also Kennedy, p. 80-23

Collocations (cont)

[teacher] must [v*]
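A collocate search counts which words keep company with a node word. A sketch, assuming plain text and a hand-made set of word forms standing in for the lemma [teacher]; corpus tools typically also rank collocates by an association score such as mutual information, which this sketch leaves out:

```python
import re
from collections import Counter

def collocates(text, node_forms, span=2):
    """Count words within `span` tokens to the left and right of each node word.

    `node_forms` is a hand-made set of word forms standing in for a lemma.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token in node_forms:
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            counts.update(left + right)
    return counts

# Invented sample text.
text = ("Teachers must plan lessons. A teacher must listen. "
        "Good teachers must listen carefully.")
print(collocates(text, {"teacher", "teachers"}).most_common(3))
```

Narrowing the window to one word on the right and filtering for verbs would approximate the [teacher] must [v*] query more closely.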

Idioms and set phrases

275 results

359 results

Google as a Corpus
"put the search text in quotes"
use * as a placeholder for the word you are searching for

training.dyslexiaaction.org.uk

Google as a Corpus
PRO: rare, low-frequency usage; up-to-date usage
CON: no sampling, no frequency sort, no genre limit, no part-of-speech tags

Google results counts are only rough estimates…

http://searchengineland.com/why-google-cant-count-results-properly-53559

Different people searching in different geographic locations can get different numbers

Sometimes searching for A gives fewer results than searching for A without B

…but Google fights can be fun

WebCorp makes Google search results linguist-friendly

Avoid Common Corpus Errors

http://www.flickr.com/photos/andreassolberg/433734311

Be aware of limitations: sampling, coverage, size, presence of typos and errors, bad part-of-speech tagging
Beware of low-frequency results
Beware of homographs

Check that results come from multiple sources
Check the KWIC lines to confirm relevance
Limit searches by genre

Check examples and sources

training.dyslexiaaction.org.uk

Always check low-frequency results: must [v*] [n*]

…sometimes they come from the same source
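Checking where low-frequency hits come from can be partly automated. The sketch below assumes a list of hits already exported as (result, source id) pairs; the ids and phrases are invented for illustration. It flags any result whose occurrences all come from a single source.

```python
from collections import defaultdict

# Invented example hits: (matched phrase, id of the source text it came from).
hits = [
    ("must read books", "text_17"),
    ("must read books", "text_17"),
    ("must read books", "text_17"),
    ("must take exams", "text_02"),
    ("must take exams", "text_40"),
    ("must take exams", "text_51"),
]

def sources_by_result(hits):
    """Map each matched phrase to the set of distinct sources it occurs in."""
    sources = defaultdict(set)
    for phrase, source in hits:
        sources[phrase].add(source)
    return sources

def single_source(hits):
    """Phrases whose occurrences all come from one source: check these by hand."""
    return sorted(p for p, s in sources_by_result(hits).items() if len(s) == 1)

print(single_source(hits))
```

Both phrases have three hits, but only one of them is evidence of general usage.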

False roots

http://etymonline.com

corner, silly, preface, cockroach, protest, stable …

Make your own corpus with TextSTAT

http://neon.niederlandistik.fu-berlin.de/en/textstat

Make your own corpus with AntConc

http://www.antlab.sci.waseda.ac.jp/software.html
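AntConc and TextSTAT both produce word-frequency lists from your own texts. For readers who prefer a script, here is a rough Python equivalent, assuming a folder of UTF-8 plain-text files; the tokeniser is deliberately crude (letters plus an optional apostrophe part) and the folder name in the usage line is hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

WORD = re.compile(r"[a-z]+(?:'[a-z]+)?")  # crude: letters, optional apostrophe part

def count_words(texts):
    """Count word tokens across an iterable of strings (case-insensitive)."""
    counts = Counter()
    for text in texts:
        counts.update(WORD.findall(text.lower()))
    return counts

def frequency_list(folder, top=20):
    """Return the `top` most frequent words in all .txt files in `folder`."""
    files = sorted(Path(folder).glob("*.txt"))
    texts = (p.read_text(encoding="utf-8", errors="replace") for p in files)
    return count_words(texts).most_common(top)
```

Usage: `print(frequency_list("my_corpus"))`, where my_corpus is a hypothetical folder of your own texts. Dedicated tools add concordances, collocates and keyword lists on top of this basic count.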

Corpus in the classroom

teacher preparation

student discovery

Teacher preparation

find relevant, common examples
prepare worksheets
check for exceptions
find out answers to student questions about rules and usage

Student discovery
show search results to students so they can work out rules or word meanings
teach students how to search for answers to their questions
ask students to give each other puzzles for searching

For heavy classroom use…

register for group access to avoid being locked out by the spam limits

Corpus v dictionary

Non-classroom corpus use

supplement a dictionary
crossword puzzles
check typical usage when writing

Where to go next?

http://www.corpora4learning.net

Thank you
Contact: [email protected]