Page 1: Corpus Linguistics: session 2 Corpus Linguistics (2): The Tools of the Trade  669o4zt martin.wynne@it.ox.ac.uk ylva.berglund@it.ox.ac.uk

Corpus Linguistics: session 2

Corpus Linguistics (2):

The Tools of the Trade

http://tinyurl.com/669o4zt

martin.wynne@it.ox.ac.uk ylva.berglund@it.ox.ac.uk

Today’s session

• An introduction to some features of tools

• Demo of different (kinds of) tools

• Hands-on practice with one tool

AIM: Help you know what to look for in a tool for your work (and what options there are)

TYPES OF TOOLS

There are different kinds of tools.

Different kinds of tools

• Online / offline

• For one particular corpus / for any corpus or text

• Use straight away / need to prepare corpus

• 'Free' / licence conditions and costs

Tools may

• have different functions: concordance, wordlist, statistics, collocation, keywords…

• handle annotation: interpret tags, ignore tags, treat tags as text

• take different text formats: .txt, .xml, .html
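The three ways of handling annotation can be illustrated with a short sketch. It assumes a hypothetical annotation format in which every token is written as word_TAG (for example borrow_VB); real corpora and tools use a variety of formats, so the sample string and function names here are illustrative only.

```python
# Hypothetical sample: each token is annotated as word_TAG.
TAGGED = "May_MD I_PRP borrow_VB your_PRP$ car_NN"

def interpret_tags(text):
    """Interpret tags: split each token into a (word, tag) pair."""
    return [tuple(token.rsplit("_", 1)) for token in text.split()]

def ignore_tags(text):
    """Ignore tags: drop the annotation and keep only the words."""
    return [token.rsplit("_", 1)[0] for token in text.split()]

def tags_as_text(text):
    """Treat tags as text: every token keeps its tag attached."""
    return text.split()

# A tool that interprets tags can search by word class, e.g. all verbs.
verbs = [word for word, tag in interpret_tags(TAGGED) if tag.startswith("VB")]
```

A tool that treats tags as text would count borrow_VB and borrow_NN as two unrelated "words", which is why it matters to know how a tool handles annotation before searching.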

TYPICAL FUNCTIONS

Different tools have different functions.

Concordance

• Search word + context

• Can be displayed as KWIC

• Can usually be sorted

• Used to see patterns of use
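As a rough illustration of what a concordancer does, the sketch below finds a search word, keeps a fixed amount of context on each side, prints the hits in KWIC (key word in context) layout, and sorts them by the first word to the right. The sample text, the function names, and the character-based context width are assumptions made for the example; tools such as AntConc offer many more options.

```python
import re

# Hypothetical sample text for the demonstration.
SAMPLE = ("May I borrow your car? You can borrow money from a bank. "
          "Students borrow books from the library.")

def kwic(text, keyword, width=30):
    """Return (left context, keyword, right context) for every hit."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append((left, match.group(), right))
    return lines

def sort_by_right_context(lines):
    """Sort concordance lines by the first word after the keyword."""
    def first_right_word(line):
        words = line[2].split()
        return words[0].lower() if words else ""
    return sorted(lines, key=first_right_word)

def show(lines, width=30):
    """Print the lines with the keyword aligned in one column."""
    for left, key, right in lines:
        print(f"{left:>{width}} {key} {right}")

show(sort_by_right_context(kwic(SAMPLE, "borrow")))
```

Sorting on the right-hand context brings together lines such as "borrow books" and "borrow money", which is what makes patterns of use visible.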

KWIC Concordance

Wordlist

List all words in the corpus

• alphabetically

• by frequency

Used as starting point for further functions

• keywords

• lexical density/readability calculations
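A wordlist can be sketched in a few lines. The example below assumes a very simple definition of a word (a run of letters or apostrophes, lower-cased); real tools let you configure tokenisation, and the sample text and function names are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical sample text for the demonstration.
SAMPLE = "The cat sat on the mat. The dog sat too."

def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def wordlist(text):
    """Count how often each word form occurs."""
    return Counter(tokenize(text))

def alphabetical(counts):
    """List (word, frequency) pairs in alphabetical order."""
    return sorted(counts.items())

def by_frequency(counts):
    """List (word, frequency) pairs, most frequent first.

    Ties are broken alphabetically so the order is reproducible.
    """
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

counts = wordlist(SAMPLE)
for word, freq in by_frequency(counts):
    print(freq, word)
```

The same frequency counts are the raw material for further functions such as keyword comparison between two corpora.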

Sampler AntConc wordlist

Collocations

Co-occurrence patterns

borrow money

borrow books

borrow a car

May I borrow

(more in Session 3)

Collocates: adjectives immediately preceding BUSINESS

Corpus of Contemporary American English

http://www.americancorpus.org/

Visualization

Graphs

Word clouds

Distribution displays

Etc.

Example: BNCweb

borrow

Example: Voyant Tools

http://voyant-tools.org

‘borrow’

Compare your intuition to what you find in the corpus

What is borrowed and by whom?

What words do you expect to find together with borrow?

Can these words be grouped in some way, for example based on their word class, function, or meaning?

Where would you expect these words to occur (e.g. before or after borrow? Immediately adjacent or not?)

Who do you think uses the word borrow? In what context or type of language would you find borrow?

Are there any words that are NOT used with borrow?

AntConc

Download AntConc for free from:

http://www.antlab.sci.waseda.ac.jp/antconc_index.html

(or just search for AntConc)

Use your own texts and corpora. Find some examples at:

http://www.ota.ox.ac.uk/

Tip of the week

Register to use the BYU corpora for free.

http://corpus.byu.edu

Next week (Session 3)

Collocation

Corpus linguists claim to have identified an important principle that is responsible for the creation of much of the meaning of texts: collocation (co-occurrence). What is it, and are the claims true?

Optional reading:

* Xiao, Richard, and Tony McEnery (2006). "Collocation, Semantic Prosody, and near Synonymy: A Cross-Linguistic Perspective." Applied Linguistics 27(1): 103-129. http://applij.oxfordjournals.org/cgi/content/full/27/1/103
