Without data, nothing

Adam Kilgarriff
Lexical Computing Ltd
University of Leeds

Generative Lexicon

Account of non-standard uses of words

So: we need a dataset

Gasteiz-Vitoria, 2012

Kilgarriff: Without Data, Nothing 2

Method

Sample of words
Sample of corpus instances for each
Choose a dictionary
Sense-tag
Identify mismatches to dictionary senses
For each: does it fit the GL model?


Resources

Words (random sample): modest, disability, steering, seize, sack (v), sack (n), onion, rabbit, handbag
Corpus instances: between 82 and 718 for each word; total 2276
Dictionary: HECTOR (OUP/Xerox project in corpus lexicography)


Tagging

Three professional lexicographers
  Assign sense to each corpus instance
For this exercise
  If anything other than 3-way agreement: re-examine
  390 of 2276 cases (17%)
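The re-examination rule can be sketched in a few lines of Python. This is only an illustration of the rule as stated on the slide: the instance identifiers and tag values below are hypothetical toy data, not the HECTOR tagging, and only the figures 390 and 2276 come from the talk.

```python
def needs_reexamination(tags):
    """True unless every tagger assigned the same sense to the instance."""
    return len(set(tags)) != 1

# Hypothetical toy data: instance id -> senses chosen by the three taggers.
instances = {
    "steering-001": ("activity", "activity", "activity"),
    "steering-002": ("activity", "mechanism", "activity"),
    "onion-001": ("plant", "food", "food"),
    "onion-002": ("food", "food", "food"),
}

# Anything other than 3-way agreement is flagged for a second look.
flagged = [iid for iid, tags in instances.items() if needs_reexamination(tags)]
print(flagged)

# Figures reported on the slide: 390 of 2276 instances were re-examined.
reexamined_share = 390 / 2276
print(f"{reexamined_share:.0%}")
```

The rule is deliberately strict: a two-against-one split is treated the same as a three-way split, so every disagreement gets re-examined.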


modest

Any two dictionaries divide up space differently
  HECTOR: 9
  CIDE: 3
  LDOCE: 4
  COBUILD: 5
Tagger agreement: less than half
Messy, but no GL-like cases


steering

2 senses
  Activity: his steering was careless
  Mechanism: they overhauled the steering
16 re-examined, most underspecified
  it has the Peugeot’s steering feel
One more complex case
  After nearly fifty years [as a bus driver] Mr. Hannis stepped down from behind the steering wheel


onion

Two senses: plant and food
34 cases re-examined
  10 bridged divide
    Plant the sets two inches apart to produce a good yield of medium-sized onions
  Others: medicine, decorative feature, dye, cliché of Frenchness
    It’s not all frogs legs and strings of onions in the South of France


sack (n)

2 x sack race
One metaphor
  Santa Claus Ridley pulled another doubtful gift from his sack
  (Ridley: British politician)


sack (v)

And Labour MP, Mr. Bruce George, has called for the firm to be sacked from duty at Prince Andrew’s £5 million home at Sunningwell Park near Windsor

Non-standard because end-employment needs PERSON as direct object.
Candidate for GL treatment


handbag

She moved from handbags through gifts to the flower shop
[handbag department in department store]

Candidate for GL treatment


Results
  2276 corpus instances
  390 re-examined
  41 non-standard uses
  2 potentially accounted for by GL

Conclusion
  GL will never account for a large share of non-standard word use
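The arithmetic behind the conclusion can be checked with a short Python sketch. The four counts are the ones reported on the Results slide; the variable names and the choice of which ratios to print are illustrative assumptions, not part of the original analysis.

```python
# Counts as reported on the Results slide.
counts = {
    "corpus instances": 2276,
    "re-examined": 390,
    "non-standard uses": 41,
    "potentially accounted for by GL": 2,
}

total = counts["corpus instances"]

# Each stage as a share of all corpus instances.
shares = {name: n / total for name, n in counts.items()}
for name, n in counts.items():
    print(f"{name:32} {n:5d} {shares[name]:7.2%}")

# The ratio the conclusion rests on: GL candidates among non-standard uses.
gl_share_of_nonstandard = (
    counts["potentially accounted for by GL"] / counts["non-standard uses"]
)
print(f"GL share of non-standard uses: {gl_share_of_nonstandard:.1%}")
```

On these figures, non-standard uses are under 2% of the sample, and the two GL candidates are roughly one in twenty of those.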



What is language?

In our heads
In texts and sound signals
Both


Methodology

Study language in our heads
  Introspection
  Semantic analysis
  Experiments with human subjects
“rationalist” (Leibniz, Chomsky)
Problems: coverage, arbitrariness


Methodology

Study text
“empiricist” (Locke, Hume)
  Physics: forces, matter
  Chemistry: chemicals, bonds
  Language: text, speech signals


Empiricist linguistics

A new way to find out about language
20 years of rapid ascent
  Computers
  Corpora
    bigger and bigger data sets available
  Language technology tools
    lemmatizers, POS-taggers, parsers, machine learning for pattern finding



Preliminaries over

What is a word sense? (my PhD in 5 slides)
Where do you find them?
Dictionaries!


The lexicographers

They create them
Methods
  Introspection
  Other dictionaries
  Corpus
Atkins, Hanks


What is a word sense (1)

SFIP: sufficiently frequent, insufficiently predictable
  (a glass of) whisky x (a glass of) tequila


What is a word sense (2)

homonymy

analogy polysemy rules

phraseology


What is a word sense (3)

A cluster
  Of instances of use
  Operationalised as: corpus lines
  Clustered by lexicographers
Makes sense of
  Overlapping senses
  Different dictionaries, different senses
  Lumping and splitting
Theory
  Hanks: norms and exploitations
  Task of lexicographer: record the norms
  Speakers may always exploit norms to say something new


Boring question
  Homonymy or polysemy?
  We all know it’s a cline
Interesting question
  Norm or exploitation?


metaphor

see, meaning “understand”
  Norm
I travelled the path
From life towards art
Desire the horse
Depression the cart
(Leonard Cohen)
  Exploitation


How do they do it?

honeymoon


The Sketch Engine

Corpus query tool
Used for making dictionaries at
  OUP, CUP, Collins, Macmillan, Le Robert, Cornelsen, Elhuyar Foundation
Also universities
  Linguistic research
  Teaching
    Linguistics, also languages


60 languages covered


Individual licences (£4.99/month)
University site licences
Free trial: self-register


Build instant corpora from the web
  WebBootCaT
Install your corpora
Compare corpora

http://www.sketchengine.co.uk


Thank you

