Without data, nothing
Adam Kilgarriff
Lexical Computing Ltd
University of Leeds
Generative Lexicon
Account of non-standard uses of words
So: we need a dataset
Gasteiz-Vitoria, 2012
Kilgarriff: Without Data, Nothing 2
Method
Sample of words
Sample of corpus instances for each
Choose a dictionary
Sense-tag
Identify mismatches to dictionary senses
For each
  Does it fit the GL model?
Resources
Words (random sample)
  modest, disability, steering, seize, sack (v), sack (n), onion, rabbit, handbag
Corpus instances
  between 82 and 718 for each word
  Total: 2276
Dictionary: HECTOR
  OUP/Xerox project in corpus lexicography
Tagging
Three professional lexicographers
  Assign sense to each corpus instance
For this exercise
  If anything other than 3-way agreement
    Re-examine
  390 of 2276 cases (17%)
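The re-examination rule above can be sketched in a few lines. This is only an illustration: the function names and the toy sense labels are hypothetical, and only the figures 390 and 2276 come from the slide.

```python
def needs_reexamination(tags):
    """Return True unless every tagger chose the same sense."""
    return len(set(tags)) != 1


def reexamination_share(tagged_instances):
    """Count flagged instances and return (count, share of all instances)."""
    flagged = sum(1 for tags in tagged_instances if needs_reexamination(tags))
    return flagged, flagged / len(tagged_instances)


# Hypothetical sense labels: four corpus instances, three taggers each.
toy = [
    ("1", "1", "1"),  # 3-way agreement: accepted
    ("1", "2", "1"),  # two against one: re-examine
    ("2", "2", "2"),  # 3-way agreement: accepted
    ("1", "2", "3"),  # no agreement: re-examine
]
flagged, share = reexamination_share(toy)
print(flagged, share)  # 2 0.5

# Figure reported on the slide: 390 of 2276 instances.
reported_share = 390 / 2276
print(f"{reported_share:.1%}")  # 17.1%
```

Requiring full agreement is the strictest possible filter: a single dissenting tagger is enough to send an instance back for a second look.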
modest
Any two dictionaries divide up space differently
  HECTOR: 9
  CIDE: 3
  LDOCE: 4
  COBUILD: 5
Tagger agreement: less than half
Messy, but no GL-like cases
steering
2 senses
  Activity: his steering was careless
  Mechanism: they overhauled the steering
16 re-examined, most underspecified
  it has the Peugeot’s steering feel
One more complex case
  After nearly fifty years [as a bus driver] Mr. Hannis stepped down from behind the steering wheel
onion
Two senses: plant and food
34 cases re-examined
10 bridged divide
  Plant the sets two inches apart to produce a good yield of medium-sized onions
Others: medicine, decorative feature, dye, cliché of Frenchness
  It’s not all frogs legs and strings of onions in the South of France
sack (n)
2 x sack race
One metaphor
  Santa Claus Ridley pulled another doubtful gift from his sack
  Ridley: British politician
sack (v)
And Labour MP, Mr. Bruce George, has called for the firm to be sacked from duty at Prince Andrew’s £5 million home at Sunningwell Park near Windsor
Non-standard because end-employment needs PERSON as direct object.
Candidate for GL treatment
handbag
She moved from handbags through gifts to the flower shop
[handbag department in department store]
Candidate for GL treatment
Results
  2276 corpus instances
  390 re-examined
  41 non-standard uses
  2 potentially accounted for by GL
Conclusion
  GL will never account for a large share of non-standard word use
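The proportions behind that conclusion can be worked out directly from the four counts on the slide. The variable and function names below are illustrative only; the percentages are derived here and were not stated in the talk.

```python
# Counts reported on the Results slide.
total_instances = 2276
reexamined = 390
non_standard = 41
gl_candidates = 2


def share(part, whole):
    """Return part as a fraction of whole."""
    return part / whole


# Non-standard uses as a share of all corpus instances.
print(f"{share(non_standard, total_instances):.1%}")  # 1.8%
# Non-standard uses as a share of the re-examined instances.
print(f"{share(non_standard, reexamined):.1%}")       # 10.5%
# GL candidates as a share of the non-standard uses.
print(f"{share(gl_candidates, non_standard):.1%}")    # 4.9%
```

On these numbers, roughly one non-standard use in twenty is a candidate for GL treatment, which is the basis for the conclusion above.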
What is language?
In our heads
In texts and sound signals
Both
Methodology
Study language in our heads
  Introspection
  Semantic analysis
  Experiments with human subjects
“rationalist” (Leibniz, Chomsky)
Problems: coverage, arbitrariness
Methodology
Study text
“empiricist” (Locke, Hume)
  Physics: forces, matter
  Chemistry: chemicals, bonds
  Language: text, speech signals
Empiricist linguistics
A new way to find out about language
20 years of rapid ascent
  Computers
  Corpora
    bigger and bigger data sets available
  Language technology tools
    lemmatizers, POS-taggers, parsers, machine learning for pattern finding
Preliminaries over
What is a word sense (my PhD in 5 slides)
Where do you find them?
Dictionaries!
The lexicographers
They create them
Methods
  Introspection
  Other dictionaries
  Corpus
Atkins, Hanks
What is a word sense (1)
SFIP
Sufficiently frequent, insufficiently predictable
(a glass of) whisky x (a glass of) tequila
What is a word sense (2)
homonymy
analogy polysemy rules
phraseology
What is a word sense (3)
A cluster
  Of instances of use
  Operationalised as: corpus lines
  Clustered by lexicographers
Makes sense of
  Overlapping senses
  Different dictionaries, different senses
  Lumping and splitting
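One way to picture senses as clusters of corpus lines is to treat each dictionary as a mapping from sense labels to sets of line ids. The sketch below is only an illustration: the two dictionaries, the sense labels and the line ids are all invented, and nothing here reproduces a real lexicographic tool.

```python
from itertools import combinations

# Hypothetical corpus line ids (1..6) for one word, clustered two ways.
dictionary_a = {"A1": {1, 2, 3, 4}, "A2": {5, 6}}
dictionary_b = {"B1": {1, 2}, "B2": {3, 4, 5}, "B3": {5, 6}}


def overlapping_senses(senses):
    """List pairs of senses in one dictionary that share corpus lines."""
    return [
        (a, b, sorted(senses[a] & senses[b]))
        for a, b in combinations(senses, 2)
        if senses[a] & senses[b]
    ]


def cross_tabulate(first, second):
    """Count the lines shared by each pair of senses across two dictionaries."""
    return {
        (a, b): len(first[a] & second[b])
        for a in first
        for b in second
        if first[a] & second[b]
    }


# Overlapping senses: line 5 belongs to both B2 and B3.
print(overlapping_senses(dictionary_b))  # [('B2', 'B3', [5])]

# Lumping and splitting: A1 is split across B1 and B2 in the other dictionary.
print(cross_tabulate(dictionary_a, dictionary_b))
```

Because clusters are plain sets of lines, overlap is just set intersection, and two dictionaries disagreeing is just two different groupings of the same lines.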
Theory
Hanks
  Norms and exploitations
Task of lexicographer
  Record the norms
Speakers may always exploit norms to say something new
Boring question
  Homonymy or polysemy
  We all know it’s a cline
Interesting question
  Norm or exploitation
metaphor
see meaning understand
  Norm
I travelled the path
From life towards art
Desire the horse
Depression the cart
  Leonard Cohen
  Exploitation
How do they do it?
honeymoon
The Sketch Engine
Corpus query tool
Used for making dictionaries at
  OUP, CUP, Collins, Macmillan, Le Robert, Cornelsen, Elhuyar Foundation
Also universities
  Linguistic research
  Teaching
    Linguistics, also languages
60 languages covered
Individual licences (£4.99/month)
University site licences
Free trial: self-register
Build instant corpora from the web
  WebBootCaT
Install your corpora
Compare corpora
http://www.sketchengine.co.uk
Thank you