Statistical Methods in Computational Linguistics

Lexicon and lexical semantics: WordNet
Beyond part of speech tagging: lexical information and NL applications

NLE applications often need to know the MEANING of words at least, or (e.g., in the case of spoken dialogue systems) of whole utterances
Many word-strings express apparently unrelated senses / meanings, even after their POS has been determined
Well-known examples: BANK, SCORE, RIGHT, SET, STOCK
Homonymy may affect the results of applications such as IR and machine translation
The opposite case, different words with the same meaning (SYNONYMY), is also important
E.g., for IR systems (synonym expansion)
HOMOGRAPHY may affect Speech Synthesis
Meaning ...

• Characterizing the meaning of words is not easy
• Most of the methods considered in these lectures characterize the meaning of a word by stating its relations with other words
• This method, however, doesn't say much about what the word ACTUALLY means (e.g., what you can do with a car)
Homonymy
Word-strings like STOCK are used to express apparently unrelated senses / meanings, even in contexts in which their part-of-speech has been determined
Other well-known examples: BANK, RIGHT, SET, SCALE
An example of the problems homonymy may cause for IR systems:
Search for 'West Bank' with Google
Homonymy and machine translation
Pronunciation: homography, homophony

HOMOGRAPHS: BASS
The expert angler from Dora, Mo. was fly-casting for BASS rather than the traditional trout.
The curtain rises to the sound of angry dogs baying and ominous BASS chords sounding.
Problems caused by homography: text-to-speech (e.g., the Rhetorical and AT&T text-to-speech systems)
Many spelling errors are caused by HOMOPHONES: distinct lexemes with a single pronunciation
Its vs. it's
weather vs. whether
Examples from the sport page of the Rabbit
More advanced lexical resources for CL: DICTIONARIES

A traditional DICTIONARY is a database containing information about
the PRONUNCIATION of a certain word
its possible PARTS of SPEECH
its possible SENSES (or MEANINGS)
In recent years, most dictionaries have appeared in Machine Readable form (MRD):
Oxford English Dictionary
Collins
Longman Dictionary of Contemporary English (LDOCE)
An example LEXICAL ENTRY from a machine-readable dictionary: STOCK, from the LDOCE
0100 a supply (of something) for use: a good stock of food
0200 goods for sale: Some of the stock is being taken without being paid for
0300 the thick part of a tree trunk
0400 (a) a piece of wood used as a support or handle, as for a gun or tool (b) the piece which goes across the top of an ANCHOR^1 (1) from side to side
0500 (a) a plant from which CUTTINGs are grown (b) a stem onto which another plant is GRAFTed
0600 a group of animals used for breeding
0700 farm animals usu. cattle; LIVESTOCK
0800 a family line, esp. of the stated character
0900 money lent to a government at a fixed rate of interest
1000 the money (CAPITAL) owned by a company, divided into SHAREs
1100 a type of garden flower with a sweet smell
1200 a liquid made from the juices of meat, bones, etc., used in cooking
.....
Why don't people see the problem?
Just as in the case of POS tagging, one sense generally more frequent
One type of information generally present in MRDs: SYNONYMY

Two words are SYNONYMS if they have the same meaning, at least in some contexts
E.g., PRICE and FARE; CHEAP and INEXPENSIVE; LAPTOP and NOTEBOOK; HOME and HOUSE
I'm looking for a CHEAP FLIGHT / INEXPENSIVE FLIGHT
From Roget's thesaurus: OBLITERATION, erasure, cancellation, deletion
But few words are truly synonymous in ALL contexts:
I wanna go HOME / ?? I wanna go HOUSE
The flight was CANCELLED / ?? OBLITERATED / ??? DELETED
Knowing about synonyms may help in IR:
NOTEBOOK (get LAPTOPs as well)
CHEAP PRICE (get INEXPENSIVE FARE)
Problems and limitations of MRDs

Identifying distinct senses always difficult
- Sense distinctions often subjective
Definitions often circular
Very limited characterization of the meaning of words
POLYSEMY

In cases like BANK, it's fairly easy to identify two distinct senses (the etymology is also different). But in other cases, the distinctions are more questionable
E.g., senses 0100 and 0200 of stock are clearly related, like 0600 and 0700, or 0900 and 1000
In some cases, syntactic tests may help. E.g., SERVE:
They rarely SERVE red meat, preferring to prepare seafood, poultry or game birds
He SERVED as US Ambassador to Norway in 1976 and 1977
He might have SERVED his time, come out and led an upstanding life.
In general, it is not always easy to distinguish between HOMONYMY and POLYSEMY
Homonymy and polysemy
(The LDOCE entry for STOCK, shown earlier, is repeated on this slide.)
Other aspects of lexical meaning not captured by MRDs

Other semantic relations:
HYPONYMY
ANTONYMY
A lot of other information typically considered part of ENCYCLOPEDIAs:
Trees grow bark and twigs
Adult trees are much taller than human beings
Hyponymy and Hypernymy

HYPONYMY is the relation between a subclass and a superclass:
CAR and VEHICLE
DOG and ANIMAL
BUNGALOW and HOUSE
Generally speaking, a hyponymy relation holds between X and Y whenever it is possible to substitute Y for X:
That is an X -> That is a Y
E.g., That is a CAR -> That is a VEHICLE.
HYPERNYMY is the opposite relation
Knowledge about TAXONOMIES is useful to classify web pages
E.g., Semantic Web
Automatically (e.g., Udo Kruschwitz's system)
This information is not generally contained in MRDs
The organization of the lexicon

[Diagram] The word-string "stock" maps to three LEXEMES (STOCK-LEX-1, STOCK-LEX-2, STOCK-LEX-3), which in turn map to the SENSES stock0100, stock0200, stock0600, stock0700, stock0900, stock1000.
Synonymy

[Diagram] The word-strings "cheap" and "inexpensive" map to LEXEMES (CHEAP-LEX-1, CHEAP-LEX-2, INEXP-LEX-3), which in turn map to SENSES (cheap0100 ... cheapXXXX; inexp0900 ... inexpYYYY); synonymy holds when lexemes of different word-strings share a sense.
A more advanced lexical resource: WordNet

A lexical database created at Princeton
Freely available for research from the Princeton site: http://www.cogsci.princeton.edu/~wn/
Information about a variety of SEMANTIC RELATIONS
Three sub-databases (supported by psychological research as early as Fillenbaum and Jones, 1965):
NOUNS
VERBS
ADJECTIVES and ADVERBS
Each database is organized around SYNSETS
The noun database

About 90,000 forms, 116,000 senses
Relations:
hypernym breakfast -> meal
hyponym meal -> lunch
has-member faculty -> professor
member-of copilot -> crew
has-Part table -> leg
part-of course -> meal
antonym leader -> follower
Synsets
Senses (or `lexicalized concepts’) are represented in WordNet by the set of words that can be used in AT LEAST ONE CONTEXT to express that sense / lexicalized concept: the SYNSET
E.g.,
{chump, fish, fool, gull, mark, patsy, fall guy, sucker, shlemiel, soft touch, mug}
(gloss: person who is gullible and easy to take advantage of)
Hypernyms

2 senses of robin

Sense 1
robin, redbreast, robin redbreast, Old World robin, Erithacus rubecola -- (small Old World songbird with a reddish breast)
=> thrush -- (songbirds characteristically having brownish upper plumage with a spotted breast)
=> oscine, oscine bird -- (passerine bird having specialized vocal apparatus)
=> passerine, passeriform bird -- (perching birds mostly small and living near the ground with feet having 4 toes arranged to allow for gripping the perch; most are songbirds; hatchlings are helpless)
=> bird -- (warm-blooded egg-laying vertebrates characterized by feathers and forelimbs modified as wings)
=> vertebrate, craniate -- (animals having a bony or cartilaginous skeleton with a segmented spinal column and a large brain enclosed in a skull or cranium)
=> chordate -- (any animal of the phylum Chordata having a notochord or spinal column)
=> animal, animate being, beast, brute, creature, fauna -- (a living organism characterized by voluntary movement)
=> organism, being -- (a living thing that has (or can develop) the ability to act or function independently)
=> living thing, animate thing -- (a living (or once living) entity)
=> object, physical object --
=> entity, physical thing --
Meronymy

wn beak -holon
Holonyms of noun beak
1 of 3 senses of beak
Sense 2
beak, bill, neb, nib
PART OF: bird
The verb database

About 10,000 forms, 20,000 senses
Relations between verb meanings:

hypernym fly -> travel
troponym walk -> stroll
entails snore -> sleep
antonym increase -> decrease
Relations between verbal meanings

V1 ENTAILS V2 when Someone V1 (logically) entails Someone V2
- e.g., snore entails sleep
TROPONYMY when To do V1 is To do V2 in some manner
- e.g., limp is a troponym of walk
The adjective and adverb database

About 20,000 adjective forms, 30,000 senses
4,000 adverbs, 5,600 senses
Relations:

antonym (adjective) heavy <-> light
antonym (adverb) quickly <-> slowly
How to use

Online: http://cogsci.princeton.edu/cgi-bin/webwn
Command line:
Get synonyms: wn -synsn bank
Get hypernyms: wn -hypen robin
Get antonyms (also for adjectives and verbs): wn -antsa right
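For a rough modern equivalent of these queries, NLTK ships a WordNet interface; a minimal sketch in Python, assuming NLTK is installed and the WordNet data has been downloaded (nltk.download('wordnet')):

# Minimal NLTK equivalents of the wn command-line queries above.
from nltk.corpus import wordnet as wn

# Synonyms of "bank" (cf. wn -synsn bank): each synset lists the
# lemmas that can express one sense.
for synset in wn.synsets('bank', pos=wn.NOUN):
    print(synset.name(), [lemma.name() for lemma in synset.lemmas()])

# Hypernym chains of the first sense of "robin" (cf. wn -hypen robin).
robin = wn.synsets('robin', pos=wn.NOUN)[0]
for path in robin.hypernym_paths():
    print(' => '.join(s.name() for s in path))

# Antonyms of the adjective "right" (cf. wn -antsa right).
for synset in wn.synsets('right', pos=wn.ADJ):
    for lemma in synset.lemmas():
        if lemma.antonyms():
            print(lemma.name(), '<->', [a.name() for a in lemma.antonyms()])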
Other machine-readable lexical resources

Machine-readable dictionaries: LDOCE
Roget's Thesaurus
The biggest encyclopedia: CYC
Readings

Jurafsky and Martin, chapter 16
WordNet online manuals
C. Fellbaum (ed), WordNet: An Electronic Lexical Database, The MIT Press
Statistical Methods in Computational Linguistics
Word sense disambiguation
Identifying the sense of a word in its context

In many respects, a task similar to POS tagging
But there is less agreement on what the senses are, so the UPPER BOUND is lower
Main methods we'll discuss:
Frequency-based disambiguation
Selectional restrictions
Dictionary-based
Statistical methods: Naïve Bayes
Corpora used for word sense disambiguation work

Sense annotated (difficult and expensive to build):
Semcor (200,000)
DSO (192,000 semantically annotated occurrences of 121 nouns and 70 verbs)
Senseval training data (8699 texts for 73 words)
Tal-treebank (80,000)
Non-annotated (available in large quantity):
Brown, LOB, Reuters
SEMCOR

<contextfile concordance="brown">
<context filename="br-h15" paras="yes">
.....
<wf cmd="ignore" pos="IN">in</wf>
<wf cmd="done" pos="NN" lemma="fig" wnsn="1" lexsn="1:10:00::">fig.</wf>
<wf cmd="done" pos="NN" lemma="6" wnsn="1" lexsn="1:23:00::">6</wf>
<punc>)</punc>
<wf cmd="done" pos="VBP" ot="notag">are</wf>
<wf cmd="done" pos="VB" lemma="slip" wnsn="3" lexsn="2:38:00::">slipped</wf>
<wf cmd="ignore" pos="IN">into</wf>
<wf cmd="done" pos="NN" lemma="place" wnsn="9" lexsn="1:15:05::">place</wf>
<wf cmd="ignore" pos="IN">across</wf>
<wf cmd="ignore" pos="DT">the</wf>
<wf cmd="done" pos="NN" lemma="roof" wnsn="1" lexsn="1:06:00::">roof</wf>
<wf cmd="done" pos="NN" lemma="beam" wnsn="2" lexsn="1:06:00::">beams</wf>
<punc>,</punc>
Dictionary-based approaches

Lesk (1986):
1. Retrieve from the MRD all sense definitions of the word to be disambiguated
2. Compare with the sense definitions of the words in the context
3. Choose the sense with the most overlap

Example: PINE
1 kinds of evergreen tree with needle-shaped leaves
2 waste away through sorrow or illness
CONE
1 solid body which narrows to a point
2 something of this shape whether solid or hollow
3 fruit of certain evergreen trees
Disambiguate: PINE CONE
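A minimal sketch of Lesk's overlap count on the PINE / CONE entries above; the toy dictionary is hard-coded for illustration, where a real system would pull the definitions from an MRD:

# Lesk (1986), minimal version: pick the sense whose definition
# shares the most words with the definitions of the context words.
def lesk(word, context_words, dictionary):
    context_tokens = set()
    for w in context_words:
        for definition in dictionary.get(w, []):
            context_tokens.update(definition.split())
    best_sense, best_overlap = None, -1
    for definition in dictionary[word]:
        overlap = len(set(definition.split()) & context_tokens)
        if overlap > best_overlap:
            best_sense, best_overlap = definition, overlap
    return best_sense

dictionary = {
    'pine': ['kinds of evergreen tree with needle-shaped leaves',
             'waste away through sorrow or illness'],
    'cone': ['solid body which narrows to a point',
             'something of this shape whether solid or hollow',
             'fruit of certain evergreen trees'],
}

# Shared words such as "evergreen" select sense 1 of PINE.
print(lesk('pine', ['cone'], dictionary))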
Psychological research on word-sense disambiguation

Frequency effects: common words such as DOOR are responded to more quickly than uncommon words such as CASK (Forster and Chambers, 1973)
Context effects: words are identified more quickly in context than out of context (Meyer and Schvaneveldt, 1971)
bread and butter
Lexical access: the COHORT model (Marslen-Wilson and Welsh, 1978)
Frequency-based word-sense disambiguation

If you have a corpus in which each word is annotated with its sense, you can collect unigram statistics (count the number of times each sense occurs in the corpus):
P(SENSE)
P(SENSE|WORD)
E.g., if you have 5845 uses of the word bridge:
5641 cases in which it is tagged with the sense STRUCTURE
194 instances with the sense DENTAL-DEVICE
Frequency-based WSD gets about 70% correct
To improve upon these results, we need context
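A minimal sketch of this baseline, assuming the sense-annotated corpus is available as (word, sense) pairs; the bridge counts are the ones quoted above:

# Most-frequent-sense baseline from a sense-annotated corpus.
from collections import Counter, defaultdict

tagged_corpus = ([('bridge', 'STRUCTURE')] * 5641 +
                 [('bridge', 'DENTAL-DEVICE')] * 194)

sense_counts = defaultdict(Counter)
for word, sense in tagged_corpus:
    sense_counts[word][sense] += 1

def most_frequent_sense(word):
    # argmax over P(sense | word); the denominator C(word) cancels out.
    return sense_counts[word].most_common(1)[0][0]

print(most_frequent_sense('bridge'))  # -> STRUCTURE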
Selectional restrictions

One type of contextual information is information about the type of arguments that a verb takes, its SELECTIONAL RESTRICTIONS:
AGENT EAT FOOD-STUFF
AGENT DRIVE VEHICLE
Example:
Which airlines serve DENVER?
Which airlines serve BREAKFAST?
Limitations:
In his two championship trials, Mr. Kulkarni ate glass on an empty stomach, accompanied only by water and tea.
But it fell apart in 1931, perhaps because people realized that you can't EAT gold for lunch if you're hungry
Resnik (1998): 44% with these methods
Supervised approaches to WSD: Naïve Bayes

A Naïve Bayes classifier chooses the most probable sense for a word given the context:

s' = \arg\max_{s_k} P(s_k \mid C)

As usual, this can be expressed as:

s' = \arg\max_{s_k} \frac{P(C \mid s_k) P(s_k)}{P(C)}

The BAYES ASSUMPTION: all the features are independent:

P(C \mid s_k) = \prod_{j=1}^{n} P(v_j \mid s_k)
An example of the use of Naïve Bayes classifiers: Gale et al (1992)

Used this method to disambiguate word senses, using an ALIGNED CORPUS (Hansard) to get the word senses:

English  French       Sense       Number of examples
duty     droit        tax         1114
         devoir       obligation   691
drug     medicament   medical     2292
         drogue       illicit      855
land     terre        property    1022
         pays         country      386
Gale et al: words as contextual clues

Gale et al view a 'context' as a set of words
Good clues for the different senses of DRUG:
Medication: prices, prescription, patent, increase, consumer, pharmaceutical
Illegal substance: abuse, paraphernalia, illicit, alcohol, cocaine, traffickers
To determine which interpretation is more likely, extract words (e.g. ABUSE) from the context, and use P(abuse|medicament), P(abuse|drogue)
To estimate these probabilities, use MLE:
P(abuse|medicament) = C(abuse, medicament) / C(medicament)
P(medicament) = C(medicament) / C(drug)
The algorithm

TRAINING:
for all senses sk of w do
  for all words vj in the vocabulary do
    P(vj|sk) = C(vj,sk) / C(sk)
  end
end
for all senses sk of w do
  P(sk) = C(sk) / C(w)
end

DISAMBIGUATION:
for all senses sk of w do
  score(sk) = log P(sk)
  for all words vj in the context window c do
    score(sk) = score(sk) + log P(vj|sk)
  end
end
choose s' = argmax score(sk)
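A runnable Python sketch of this algorithm. The add-one smoothing is my addition (not in the pseudocode above), so that a context word unseen with some sense does not drive its score to minus infinity; the training examples are illustrative:

# Naive Bayes WSD, following the pseudocode above.
import math
from collections import Counter

training = [  # (sense, context words), toy data for "drug"
    ('medicament', ['prices', 'prescription', 'patent']),
    ('medicament', ['consumer', 'pharmaceutical', 'prices']),
    ('drogue', ['abuse', 'illicit', 'cocaine']),
    ('drogue', ['traffickers', 'alcohol', 'abuse']),
]

sense_count = Counter()       # C(sk)
word_sense_count = Counter()  # C(vj, sk)
vocabulary = set()
for sense, context in training:
    sense_count[sense] += 1
    for v in context:
        word_sense_count[(v, sense)] += 1
        vocabulary.add(v)

def disambiguate(context):
    total = sum(sense_count.values())
    scores = {}
    for sense in sense_count:
        # score(sk) = log P(sk) + sum_j log P(vj | sk), add-one smoothed
        score = math.log(sense_count[sense] / total)
        denom = (sum(word_sense_count[(v, sense)] for v in vocabulary)
                 + len(vocabulary))
        for v in context:
            score += math.log((word_sense_count[(v, sense)] + 1) / denom)
        scores[sense] = score
    return max(scores, key=scores.get)

print(disambiguate(['abuse', 'cocaine']))  # -> drogue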
Results

Gale et al (1992): a disambiguation system using this algorithm was correct for about 90% of the occurrences of six ambiguous nouns in the Hansard corpus:
duty, drug, land, language, position, sentence
Good clues for drug:
medication sense: prices, prescription, patent, increase
illegal substance sense: abuse, paraphernalia, illicit, alcohol, cocaine, traffickers
Other methods for WSD

Supervised:
Brown et al, 1991: using mutual information to combine senses into groups
Yarowsky (1992): using a thesaurus and a topic-classified corpus
Unsupervised: sense DISCRIMINATION
Schuetze 1996: using the EM algorithm
Thesaurus-based classification

Basic idea (e.g., Walker, 1987):
A thesaurus assigns to a word a number of SUBJECT CODES
Assumption: each subject code is a distinct sense
Disambiguate word W by assigning to it the subject code most commonly associated with the words in the context
Problems:
general topic categorization may not be very useful for a specific domain (e.g., MOUSE in computer manuals)
coverage problems: NAVRATILOVA
Yarowsky, 1992:
Assign a word to thesaurus category T if it occurs more often than chance in contexts of category T in the corpus
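A rough sketch of the "more often than chance" test, under the simplifying assumption that we compare P(w | T) with the corpus-wide P(w) on a topic-labeled corpus; the data, the threshold, and this particular ratio test are illustrative, not Yarowsky's exact weighting:

# Add word w to thesaurus category T if it is over-represented in
# contexts of category T: P(w | T) > threshold * P(w).
from collections import Counter

corpus = [  # (category, tokens): illustrative topic-labeled corpus
    ('ANIMAL', ['the', 'mouse', 'ate', 'the', 'cheese']),
    ('COMPUTING', ['click', 'the', 'mouse', 'button']),
    ('COMPUTING', ['move', 'the', 'mouse', 'to', 'the', 'icon']),
]

global_counts = Counter()
category_counts = {}
for category, tokens in corpus:
    global_counts.update(tokens)
    category_counts.setdefault(category, Counter()).update(tokens)

def salient_words(category, threshold=1.2):
    counts = category_counts[category]
    n_cat, n_all = sum(counts.values()), sum(global_counts.values())
    return [w for w in counts
            if counts[w] / n_cat > threshold * global_counts[w] / n_all]

print(salient_words('COMPUTING'))
# -> ['click', 'button', 'move', 'to', 'icon']
#    ("mouse" occurs in both categories, so it is not salient here)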
Evaluation

Baseline: is the system an improvement?
Unsupervised: Random, Simple-Lesk
Supervised: Most Frequent, Lesk-plus-corpus
Upper bound: agreement between humans?
SENSEVAL

Goals:
Provide a common framework to compare WSD systems
Standardise the task (especially evaluation procedures)
Build and distribute new lexical resources (dictionaries and sense-tagged corpora)
Web site: http://www.senseval.org/
"There are now many computer programs for automatically determining the sense of a word in context (Word Sense Disambiguation or WSD). The purpose of Senseval is to evaluate the strengths and weaknesses of such programs with respect to different words, different varieties of language, and different languages." From: http://www.sle.sharp.co.uk/senseval2
SENSEVAL History

ACL-SIGLEX workshop (1997)
Yarowsky and Resnik paper
SENSEVAL-I (1998)
Lexical Sample for English, French, and Italian
SENSEVAL-II (Toulouse, 2001)
Lexical Sample and All Words
Organization: Kilgarriff (Brighton)
SENSEVAL-III (2003)
WSD at SENSEVAL-II

Choosing the right sense for a word among those of WordNet:

Sense 1: horse, Equus caballus -- (solid-hoofed herbivorous quadruped domesticated since prehistoric times)
Sense 2: horse -- (a padded gymnastic apparatus on legs)
Sense 3: cavalry, horse cavalry, horse -- (troops trained to fight on horseback: "500 horse led the attack")
Sense 4: sawhorse, horse, sawbuck, buck -- (a framework for holding wood that is being sawed)
Sense 5: knight, horse -- (a chessman in the shape of a horse's head; can move two squares horizontally and one vertically (or vice versa))
Sense 6: heroin, diacetyl morphine, H, horse, junk, scag, shit, smack -- (a morphine derivative)
Corton has been involved in the design, manufacture and installation of horse stalls and horse-related equipment like external doors, shutters and accessories.
SENSEVAL-II Tasks

All Words (without training data): Czech, Dutch, English, Estonian
Lexical Sample (with training data): Basque, Chinese, Danish, English, Italian, Japanese, Korean, Spanish, Swedish
English SENSEVAL-II

Organization: Martha Palmer (UPENN)
Gold standard: 2 annotators and 1 supervisor (Fellbaum)
Interchange data format: XML
Sense repository: WordNet 1.7 (special Senseval release)
Competitors:
All Words: 11 systems; Lexical Sample: 16 systems
English All Words

Data: 3 texts for a total of 1770 words
Average polysemy: 6.5
Example: (part of) Text 1
The art of change-ringing is peculiar to the English and, like most English peculiarities , unintelligible to the rest of the world . -- Dorothy L. Sayers , " The Nine Tailors " ASLACTON , England -- Of all scenes that evoke rural England , this is one of the loveliest : An ancient stone church stands amid the fields , the sound of bells cascading from its tower , calling the faithful to evensong . The parishioners of St. Michael and All Angels stop to chat at the church door , as members here always have . […]
English All Words Systems

Unsupervised (6):
UMED (relevance matrix over Gutenberg project corpus)
Illinois (Lexical Proximity)
Malaysia (MTD, Machine Tractable Dictionary)
Litkowsky (New Oxford Dictionary and Contextual Clues)
Sheffield (Anaphora and WN hierarchy)
IRST (WordNet Domains)
Supervised (5):
S. Sebastian (decision lists in Semcor)
UCLA (Semcor, Semantic Distance and Density, AltaVista for frequency)
Sinequa (Semcor and Semantic Classes)
Antwerp (Semcor, Memory Based Learning)
Moldovan (Semcor plus an additional sense-tagged corpus, heuristics)
Lexical Sample
Data: 8699 texts for 73 words
Average WN polysemy: 9.22
Training Data: 8166 (average 118/word)
Baseline (commonest): 0.47 precision
Baseline (Lesk): 0.51 precision
Lexical Sample
Example: to leave
<instance id="leave.130">
<context>I 'd been seeing Johnnie almost a year now, but I still didn't want to <head>leave</head> him for five whole days.</context>
</instance>

<instance id="leave.157">
<context>And he saw them all as he walked up and down. At two that morning, he was still walking -- up and down Peony, up and down the veranda, up and down the silent, moonlit beach. Finally, in desperation, he opened the refrigerator, filched her hand lotion, and <head>left</head> a note.</context>
</instance>
English Lexical Sample Systems

Unsupervised (5): Sunderland, UNED, Illinois, Litkowsky, ITRI
Supervised (12): S. Sebastian, Sinequa, Manning, Pedersen, Korea, Yarowsky, Resnik, Pennsylvania, Barcelona, Moldovan, Alicante, IRST
[Results chart: systems grouped by technique (Boosting, Decision Lists, Domains)]
Boosting 1/2

Combine many simple and moderately accurate Weak Classifiers (WCs)
Train WCs sequentially, each on the examples which were most difficult to classify for the preceding WCs
Examples of WCs:
preceding_word="house"
domain="sport"
...
Boosting 2/2

WCi is trained and tested on the whole corpus
Each pair {word, synset} is given an "importance weight" h depending on how difficult it was for WC1, ..., WCi to classify
WCi+1 is tuned to classify the worst pairs {word, synset} correctly and is tested on the whole corpus; h is thus updated at each step
At the end, all the WCs are combined into a single rule, the combined hypothesis; each WC is weighted according to its effectiveness in the tests
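A compact AdaBoost-style sketch of this loop (my reconstruction of the generic schema, not the actual SENSEVAL system): weak classifiers are single-feature rules like those above, and the per-example weights play the role of the importance weight h:

# AdaBoost-style boosting with one-feature weak classifiers.
# Each WC predicts +1 if its feature is present, else -1.
import math

examples = [  # (active features, label)
    ({'preceding_word=house', 'domain=finance'}, +1),
    ({'domain=finance'}, +1),
    ({'domain=sport'}, -1),
    ({'preceding_word=river', 'domain=sport'}, -1),
]
features = sorted({f for feats, _ in examples for f in feats})
weights = [1.0 / len(examples)] * len(examples)  # importance weights h

ensemble = []  # (alpha, feature) pairs: the combined hypothesis
for _ in range(3):
    def weighted_error(f):  # weighted error of the rule for feature f
        return sum(w for w, (feats, y) in zip(weights, examples)
                   if (1 if f in feats else -1) != y)
    best = min(features, key=weighted_error)
    err = max(weighted_error(best), 1e-10)
    alpha = 0.5 * math.log((1 - err) / err)  # weight of this WC
    ensemble.append((alpha, best))
    # Re-weight: examples this WC got wrong become more important.
    for i, (feats, y) in enumerate(examples):
        pred = 1 if best in feats else -1
        weights[i] *= math.exp(-alpha * y * pred)
    total = sum(weights)
    weights = [w / total for w in weights]

def classify(feats):  # weighted vote of all the WCs
    vote = sum(a * (1 if f in feats else -1) for a, f in ensemble)
    return +1 if vote > 0 else -1

print(classify({'domain=finance'}))  # -> 1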
Bootstrapping

Problem with supervised methods: you need an annotated corpus!
Partial solution (Hearst, Yarowsky): train on a small annotated set (the bootstrap), then use the trained system to annotate a larger corpus, which is then used to retrain
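A minimal self-training sketch of this idea, using scikit-learn's Naive Bayes as the underlying classifier (the library choice, data, and confidence threshold are mine; Yarowsky's actual algorithm used decision lists plus a one-sense-per-discourse constraint):

# Bootstrapping / self-training: train on a small seed set, then
# repeatedly label the most confident example from the unannotated
# pool and add it to the training set.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

seed_texts = ['prescription drug prices increase',
              'drug abuse and cocaine traffickers']
seed_labels = ['medication', 'illegal']
pool = ['drug prices increase', 'cocaine drug traffickers', 'drug abuse']

vectorizer = CountVectorizer()
vectorizer.fit(seed_texts + pool)

texts, labels = list(seed_texts), list(seed_labels)
while pool:
    clf = MultinomialNB()
    clf.fit(vectorizer.transform(texts), labels)
    probs = clf.predict_proba(vectorizer.transform(pool))
    # Take the single most confident prediction; stop if none is confident.
    best = max(range(len(pool)), key=lambda i: probs[i].max())
    if probs[best].max() < 0.6:
        break
    labels.append(clf.classes_[probs[best].argmax()])
    texts.append(pool.pop(best))

print(list(zip(texts, labels)))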
Readings

Jurafsky and Martin, chapters 17.1 and 17.2
For more info: Manning and Schuetze, chapter 7
Papers:
Gale, Church, and Yarowsky. 1992. A method for disambiguating word senses in a large corpus. Computers and the Humanities, v. 26, 415-439.
Acknowledgments
Slides on SENSEVAL borrowed from Bernardo Magnini, IRST
Statistical Techniques for Computational Linguistics
Lexical acquisition
The limits of hand-encoded lexical resources

Manual construction of lexical resources is very costly
Because language keeps changing, these resources have to be continuously updated
Some information (e.g., about frequencies) has to be computed automatically anyway
The coverage problem

Sampson (1989): tested the coverage of the Oxford ALD (~70,000 entries) on a 45,000-word subpart of the LOB corpus. About 3% of tokens were not listed in the dictionary.
Examples:
Type of problem Example
Proper noun Caramello, Chateau-Chalon
Foreign word perestroika
Code R101
Non-standard English Havin’
Hyphen omitted bedclothes
Technical vocabulary normoglycaemia
Lexical acquisition

Most MRDs these days contain at least some information derived by computational means
Methods have been developed to:
Augment resources such as WordNet with additional information (e.g., in technical contexts such as bioinformatics)
Develop these resources from scratch
These methods are also interesting because of the hope that they can address the problems encountered by lexicographers when trying to specify word senses
Vector-based lexical semantics

A very old idea in NLE: the meaning of a word can be specified in terms of the values of certain 'features' ('DECOMPOSITIONAL SEMANTICS')
dog: ANIMATE = +, EAT = MEAT, SOCIAL = +
horse: ANIMATE = +, EAT = GRASS, SOCIAL = +
cat: ANIMATE = +, EAT = MEAT, SOCIAL = -
Similarity / relatedness: proximity in feature space
Vector-based lexical semantics

[Diagram: DOG, CAT and HORSE as points in feature space]
General characterization of vector-based semantics (from Charniak)

Vectors as models of concepts
The CLUSTERING approach to lexical semantics:
1. Define the properties one cares about, and give a value to each property (generally, numerical)
2. Create a vector of length n for each item to be classified
3. Viewing the n-dimensional vector as a point in n-space, cluster points that are near one another
What changes between models:
1. The properties used in the vector
2. The distance metric used to decide if two points are 'close'
3. The algorithm used to cluster
Using words as features in a vector-based semantics

The old decompositional semantics approach requires:
i. Specifying the features
ii. Characterizing the value of these features for each lexeme
Simpler approach: use as features the WORDS that occur in the proximity of that word / lexical entry
Intuition: "You can tell a word's meaning from the company it keeps"
More specifically, you can use as the 'values' of these features:
The FREQUENCIES with which these words occur near the words whose meaning we are defining
Or perhaps the PROBABILITIES that these words occur next to each other
Alternative: use the DOCUMENTS in which these words occur (e.g., LSA)
Some psychological results support this view. Lund, Burgess, et al (1995, 1997): lexical associations learned this way correlate very well with priming experiments. Landauer et al: good correlation on a variety of topics, including human categorization & vocabulary tests.
Using neighboring words to specify the meaning of words

Take, e.g., the following corpus:
1. John ate a banana.
2. John ate an apple.
3. John drove a lorry.
We can extract the following co-occurrence matrix:
john ate drove banana apple lorry
john 0 2 1 1 1 1
ate 2 0 0 1 1 0
drove 1 0 0 0 0 1
banana 1 1 0 0 0 0
apple 1 1 0 0 0 0
lorry 1 0 1 0 0 0
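A minimal sketch that builds this matrix from the toy corpus: words co-occur here if they appear in the same sentence, and the function words "a" / "an" are dropped, mirroring the table above; larger corpora would instead count within a fixed window around each word:

# Build the word-word co-occurrence matrix from the toy corpus.
from collections import defaultdict

corpus = ['John ate a banana', 'John ate an apple', 'John drove a lorry']
stopwords = {'a', 'an'}

cooc = defaultdict(int)
for sentence in corpus:
    tokens = [t.lower() for t in sentence.split()
              if t.lower() not in stopwords]
    for i, w in enumerate(tokens):
        for v in tokens[:i] + tokens[i + 1:]:
            cooc[(w, v)] += 1

vocab = ['john', 'ate', 'drove', 'banana', 'apple', 'lorry']
print('        ' + ' '.join(f'{w:>7}' for w in vocab))
for w in vocab:
    print(f'{w:>7} ' + ' '.join(f'{cooc[(w, v)]:>7}' for v in vocab))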
Variant: using modifiers to specify the meaning of words

.... The Soviet cosmonaut .... The American astronaut .... The red American car .... The old red truck ... the spacewalking cosmonaut ... the full Moon ...

              cosmonaut  astronaut  moon  car  truck
Soviet                1          0     0    1      1
American              0          1     0    1      1
spacewalking          1          1     0    0      0
red                   0          0     0    1      1
full                  0          0     1    0      0
old                   0          0     0    1      1
Acquiring lexical vectors from a corpus (Schuetze, 1991; Burgess and Lund, 1997)

To construct a vector C(w) for each word w:
1. Scan the text
2. Whenever a word w is encountered, increment all cells of C(w) corresponding to the words v that occur in the vicinity of w, typically within a window of fixed size
Differences among methods:
Size of the window
Weighted or not
Whether every word in the vocabulary counts as a dimension (including function words such as "the" or "and") or whether instead only some specially chosen words are used (typically, the m most common content words in the corpus; or perhaps modifiers only). The words chosen as dimensions are often called CONTEXT WORDS
Whether dimensionality reduction methods are applied
Variant: using probabilities (e.g., Dagan et al, 1997)
E.g., for house:
Context vector (using probabilities):
0.001394 0.016212 0.003169 0.000734 0.001460 0.002901 0.004725 0.000598 0 0 0.008993 0.008322 0.000164 0.010771 0.012098 0.002799 0.002064 0.007697 0 0 0.001693 0.000624 0.001624 0.000458 0.002449 0.002732 0 0.008483 0.007929 0 0.001101 0.001806 0 0.005537 0.000726 0.011563 0.010487 0 0.001809 0.010601 0.000348 0.000759 0.000807 0.000302 0.002331 0.002715 0.020845 0.000860 0.000497 0.002317 0.003938 0.001505 0.035262 0.002090 0.004811 0.001248 0.000920 0.001164 0.003577 0.001337 0.000259 0.002470 0.001793 0.003582 0.005228 0.008356 0.005771 0.001810 0 0.001127 0.001225 0 0.008904 0.001544 0.003223 0
Another variant: word / document matrices
d1 d2 d3 d4 d5 d6
cosmonaut 1 0 1 0 0 0
astronaut 0 1 0 0 0 0
moon 1 1 0 0 0 0
car 1 0 0 1 1 0
truck 0 0 0 1 0 1
Measures of semantic similarity

Euclidean distance:

d(\vec{x}, \vec{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}

Cosine:

\cos(\vec{x}, \vec{y}) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2} \sqrt{\sum_{i=1}^{n} y_i^2}}

Manhattan metric:

d(\vec{x}, \vec{y}) = \sum_{i=1}^{n} |x_i - y_i|
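The three measures as plain Python functions, a direct transcription of the formulas above; the example vectors are rows of the word / document matrix from the previous slide:

# Euclidean distance, cosine, and Manhattan metric over plain lists.
import math

def euclidean(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def cosine(x, y):
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi ** 2 for xi in x))
    norm_y = math.sqrt(sum(yi ** 2 for yi in y))
    return dot / (norm_x * norm_y)

def manhattan(x, y):
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

cosmonaut = [1, 0, 1, 0, 0, 0]
moon      = [1, 1, 0, 0, 0, 0]
truck     = [0, 0, 0, 1, 0, 1]
print(cosine(cosmonaut, moon))   # 0.5 (they share document d1)
print(cosine(cosmonaut, truck))  # 0.0 (no shared documents)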
Some psychological evidence for vector-space representations

Burgess and Lund (1996, 1997): the clusters found with HAL correlate well with those observed using semantic priming experiments.
Landauer, Foltz, and Laham (1997): scores overlap with those of humans on standard vocabulary and topic tests; mimic human scores on category judgments; etc.
Evidence about 'prototype theory' (Rosch et al, 1976):
Posner and Keele, 1968:
subjects were presented with patterns of dots that had been obtained by variations from a single pattern (the 'prototype')
later, they recalled the prototypes better than the samples they had actually seen
Rosch et al, 1976: 'basic level' categories (apple, orange, potato, carrot) have higher 'cue validity' than elements higher in the hierarchy (fruit, vegetable) or lower (red delicious, cox)
The HAL model (Burgess and Lund, 1995, 1997)

A 160-million-word corpus of articles extracted from all newsgroups containing English dialogue
Context words: the 70,000 most frequently occurring symbols within the corpus
Window size: 10 words to the left and the right of the word
Measure of similarity: cosine
Latent Semantic Analysis (LSA) (Landauer et al, 1997)

Goal: extract relations of expected contextual usage from passages
Three steps:
1. Build a word / document co-occurrence matrix
2. 'Weigh' each cell
3. Perform a DIMENSIONALITY REDUCTION
Argued to correlate well with humans on a number of tests
LSA: the method, 1
LSA: Singular Value Decomposition
LSA: Reconstructed matrix
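The original slides show the worked matrices as figures; a minimal numpy sketch of the decomposition and rank-k reconstruction, applied to the word / document matrix from the earlier slide (k = 2 is an arbitrary illustrative choice):

# LSA core: SVD of the word/document matrix, keep the top k singular
# values, reconstruct a smoothed rank-k matrix.
import numpy as np

# Rows: cosmonaut, astronaut, moon, car, truck; columns: d1..d6.
A = np.array([
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1],
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # best rank-k approximation

# After the reduction, the cosmonaut and astronaut rows become more
# alike, even though the two words never co-occur in any document.
np.set_printoptions(precision=2, suppress=True)
print(A_k)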
Topic correlations in 'raw' and 'reconstructed' data
Clustering

Clustering algorithms partition a set of objects into CLUSTERS
An UNSUPERVISED method
Two applications:
Exploratory Data Analysis
GENERALIZATION
Clustering

[Diagram: two groups of X's, each forming a cluster]
Clustering with Syntactic Information

Pereira and Tishby, 1992: two words are similar if they occur as objects of the same verbs
John ate POPCORN
John ate BANANAS
C(w) is the distribution of verbs for which w served as direct object.
First approximation: just counts
In fact: probabilities
Similarity: RELATIVE ENTROPY
Grefenstette (1992): extended this to include several attributes.
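Relative entropy (KL divergence) between two verb distributions as a plain function; the distributions below are illustrative toy numbers, not Pereira and Tishby's data:

# D(p || q) = sum_v p(v) log(p(v) / q(v)); note it is asymmetric.
# Assumes q(v) > 0 wherever p(v) > 0 (real systems smooth q).
import math

def relative_entropy(p, q):
    return sum(pv * math.log(pv / q[v]) for v, pv in p.items() if pv > 0)

# P(verb | noun served as its direct object)
popcorn = {'eat': 0.8, 'buy': 0.2}
bananas = {'eat': 0.7, 'buy': 0.3}
lorry   = {'eat': 0.05, 'buy': 0.25, 'drive': 0.7}

print(relative_entropy(popcorn, bananas))  # small: similar nouns
print(relative_entropy(popcorn, lorry))    # large: dissimilar nouns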
Some caveats

Two senses of 'similarity':
Schuetze: two words are similar if one can replace the other
Brown et al: two words are similar if they occur in similar contexts
What notion of 'meaning' is learned here? "One might consider LSA's maximal knowledge of the world to be analogous to a well-read nun's knowledge of sex, a level of knowledge often deemed a sufficient basis for advising the young" (Landauer et al, 1997)
Can one do semantics with these representations?
Our own experience: using HAL-style vectors for resolving bridging references
Very limited success
Applying dimensionality reduction didn't seem to help
Applications of these techniques: Information Retrieval

     cosmonaut  astronaut  moon  car  truck
d1           1          0     1    1      0
d2           0          1     1    0      0
d3           1          0     0    0      0
d4           0          0     0    1      1
d5           0          0     0    1      0
d6           0          0     0    0      1
Readings

Jurafsky and Martin, chapter 17.3
Also useful:
Manning and Schuetze, chapter 8
Charniak, chapters 9-10
Some papers:
Burgess, C. and Lund, K.
Landauer, Foltz, and Laham. (1997). Introduction to Latent Semantic Analysis. Discourse Processes.