Statistical Methods in Computational Linguistics

Lexicon and lexical semantics: WordNet
Beyond part of speech tagging: lexical information and NL applications

NLE applications often need to know the MEANING of words at least, or (e.g., in the case of spoken dialogue systems) of whole utterances
Many word-strings express apparently unrelated senses / meanings, even after their POS has been determined
Well-known examples: BANK, SCORE, RIGHT, SET, STOCK
Homonymy may affect the results of applications such as IR and machine translation
The opposite case, different words with the same meaning (SYNONYMY), is also important
E.g., for IR systems (synonym expansion)
HOMOGRAPHY may affect Speech Synthesis
Meaning ...

• Characterizing the meaning of words is not easy
• Most of the methods considered in these lectures characterize the meaning of a word by stating its relations with other words
• This method, however, doesn't say much about what the word ACTUALLY means (e.g., what you can do with a car)
Homonymy
Word-strings like STOCK are used to express apparently unrelated senses / meanings, even in contexts in which their part-of-speech has been determined
Other well-known examples: BANK, RIGHT, SET, SCALE
An example of the problems homonymy may cause for IR systems:
Search for 'West Bank' with Google
Homonymy and machine translation
Pronunciation: homography, homophony

HOMOGRAPHS: BASS
The expert angler from Dora, Mo. was fly-casting for BASS rather than the traditional trout.
The curtain rises to the sound of angry dogs baying and ominous BASS chords sounding.
Problems caused by homography: text-to-speech (e.g., the Rhetorical and AT&T text-to-speech systems)
Many spelling errors are caused by HOMOPHONES: distinct lexemes with a single pronunciation
Its vs. it's
weather vs. whether
Examples from the sport page of the Rabbit
More advanced lexical resources for CL: DICTIONARIES

A traditional DICTIONARY is a database containing information about
the PRONUNCIATION of a certain word
its possible PARTS of SPEECH
its possible SENSES (or MEANINGS)
In recent years, most dictionaries have appeared in Machine Readable form (MRD):
Oxford English Dictionary
Collins
Longman Dictionary of Contemporary English (LDOCE)
An example LEXICAL ENTRY from a machine-readable dictionary: STOCK, from the LDOCE
0100 a supply (of something) for use: a good stock of food
0200 goods for sale: Some of the stock is being taken without being paid for
0300 the thick part of a tree trunk
0400 (a) a piece of wood used as a support or handle, as for a gun or tool (b) the piece which goes across the top of an ANCHOR^1 (1) from side to side
0500 (a) a plant from which CUTTINGs are grown (b) a stem onto which another plant is GRAFTed
0600 a group of animals used for breeding
0700 farm animals usu. cattle; LIVESTOCK
0800 a family line, esp. of the stated character
0900 money lent to a government at a fixed rate of interest
1000 the money (CAPITAL) owned by a company, divided into SHAREs
1100 a type of garden flower with a sweet smell
1200 a liquid made from the juices of meat, bones, etc., used in cooking
.....
Why don't people see the problem?
Just as in the case of POS tagging, one sense generally more frequent
One type of information generally present in MRDs: SYNONYMY

Two words are SYNONYMS if they have the same meaning, at least in some contexts
E.g., PRICE and FARE; CHEAP and INEXPENSIVE; LAPTOP and NOTEBOOK; HOME and HOUSE
I'm looking for a CHEAP FLIGHT / INEXPENSIVE FLIGHT
From Roget's thesaurus: OBLITERATION, erasure, cancellation, deletion
But few words are truly synonymous in ALL contexts:
I wanna go HOME / ?? I wanna go HOUSE
The flight was CANCELLED / ?? OBLITERATED / ??? DELETED
Knowing about synonyms may help in IR:
NOTEBOOK (get LAPTOPs as well)
CHEAP PRICE (get INEXPENSIVE FARE)
Problems and limitations of MRDs

Identifying distinct senses always difficult
- Sense distinctions often subjective
Definitions often circular
Very limited characterization of the meaning of words
POLYSEMY

In cases like BANK, it's fairly easy to identify two distinct senses (the etymology is also different). But in other cases, the distinctions are more questionable
E.g., senses 0100 and 0200 of stock are clearly related, like 0600 and 0700, or 0900 and 1000
In some cases, syntactic tests may help. E.g., SERVE:
They rarely SERVE red meat, preferring to prepare seafood, poultry or game birds
He SERVED as US Ambassador to Norway in 1976 and 1977
He might have SERVED his time, come out and led an upstanding life.
In general, it is not always easy to distinguish between HOMONYMY and POLYSEMY
Homonymy and polysemy
(The LDOCE entry for STOCK, shown earlier, is repeated on this slide.)
Other aspects of lexical meaning not captured by MRDs

Other semantic relations:
HYPONYMY
ANTONYMY
A lot of other information typically considered part of ENCYCLOPEDIAs:
Trees grow bark and twigs
Adult trees are much taller than human beings
Hyponymy and Hypernymy

HYPONYMY is the relation between a subclass and a superclass:
CAR and VEHICLE
DOG and ANIMAL
BUNGALOW and HOUSE
Generally speaking, a hyponymy relation holds between X and Y whenever it is possible to substitute Y for X:
That is an X -> That is a Y
E.g., That is a CAR -> That is a VEHICLE.
HYPERNYMY is the opposite relation
Knowledge about TAXONOMIES is useful to classify web pages
E.g., Semantic Web
Automatically (e.g., Udo Kruschwitz's system)
This information is not generally contained in MRDs
The organization of the lexicon

[Diagram] The word-string "stock" maps to three LEXEMES (STOCK-LEX-1, STOCK-LEX-2, STOCK-LEX-3), which in turn map to the SENSES stock0100, stock0200, stock0600, stock0700, stock0900, stock1000.
Synonymy

[Diagram] The word-strings "cheap" and "inexpensive" map to LEXEMES (CHEAP-LEX-1, CHEAP-LEX-2, INEXP-LEX-3), which in turn map to SENSES (cheap0100 ... cheapXXXX; inexp0900 ... inexpYYYY); synonymy holds when lexemes of different word-strings share a sense.
A more advanced lexical resource: WordNet

A lexical database created at Princeton
Freely available for research from the Princeton site: http://www.cogsci.princeton.edu/~wn/
Information about a variety of SEMANTIC RELATIONS
Three sub-databases (supported by psychological research as early as Fillenbaum and Jones, 1965):
NOUNS
VERBS
ADJECTIVES and ADVERBS
Each database is organized around SYNSETS
The noun database

About 90,000 forms, 116,000 senses
Relations:
hypernym breakfast -> meal
hyponym meal -> lunch
has-member faculty -> professor
member-of copilot -> crew
has-Part table -> leg
part-of course -> meal
antonym leader -> follower
Synsets
Senses (or `lexicalized concepts’) are represented in WordNet by the set of words that can be used in AT LEAST ONE CONTEXT to express that sense / lexicalized concept: the SYNSET
E.g.,
{chump, fish, fool, gull, mark, patsy, fall guy, sucker, shlemiel, soft touch, mug}
(gloss: person who is gullible and easy to take advantage of)
Hypernyms

2 senses of robin

Sense 1
robin, redbreast, robin redbreast, Old World robin, Erithacus rubecola -- (small Old World songbird with a reddish breast)
=> thrush -- (songbirds characteristically having brownish upper plumage with a spotted breast)
=> oscine, oscine bird -- (passerine bird having specialized vocal apparatus)
=> passerine, passeriform bird -- (perching birds mostly small and living near the ground with feet having 4 toes arranged to allow for gripping the perch; most are songbirds; hatchlings are helpless)
=> bird -- (warm-blooded egg-laying vertebrates characterized by feathers and forelimbs modified as wings)
=> vertebrate, craniate -- (animals having a bony or cartilaginous skeleton with a segmented spinal column and a large brain enclosed in a skull or cranium)
=> chordate -- (any animal of the phylum Chordata having a notochord or spinal column)
=> animal, animate being, beast, brute, creature, fauna -- (a living organism characterized by voluntary movement)
=> organism, being -- (a living thing that has (or can develop) the ability to act or function independently)
=> living thing, animate thing -- (a living (or once living) entity)
=> object, physical object --
=> entity, physical thing --
Meronymy

wn beak -holon
Holonyms of noun beak
1 of 3 senses of beak
Sense 2
beak, bill, neb, nib
PART OF: bird
The verb database

About 10,000 forms, 20,000 senses
Relations between verb meanings:

hypernym fly -> travel
troponym walk -> stroll
entails snore -> sleep
antonym increase -> decrease
Relations between verbal meanings

V1 ENTAILS V2 when Someone V1 (logically) entails Someone V2
- e.g., snore entails sleep
TROPONYMY when To do V1 is To do V2 in some manner
- e.g., limp is a troponym of walk
The adjective and adverb database

About 20,000 adjective forms, 30,000 senses
4,000 adverbs, 5,600 senses
Relations:

antonym (adjective) heavy <-> light
antonym (adverb) quickly <-> slowly
How to use

Online: http://cogsci.princeton.edu/cgi-bin/webwn
Command line:
Get synonyms: wn -synsn bank
Get hypernyms: wn -hypen robin
Get antonyms (also for adjectives and verbs): wn -antsa right
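For a rough modern equivalent of these queries, NLTK ships a WordNet interface; a minimal sketch in Python, assuming NLTK is installed and the WordNet data has been downloaded (nltk.download('wordnet')):

# Minimal NLTK equivalents of the wn command-line queries above.
from nltk.corpus import wordnet as wn

# Synonyms of "bank" (cf. wn -synsn bank): each synset lists the
# lemmas that can express one sense.
for synset in wn.synsets('bank', pos=wn.NOUN):
    print(synset.name(), [lemma.name() for lemma in synset.lemmas()])

# Hypernym chains of the first sense of "robin" (cf. wn -hypen robin).
robin = wn.synsets('robin', pos=wn.NOUN)[0]
for path in robin.hypernym_paths():
    print(' => '.join(s.name() for s in path))

# Antonyms of the adjective "right" (cf. wn -antsa right).
for synset in wn.synsets('right', pos=wn.ADJ):
    for lemma in synset.lemmas():
        if lemma.antonyms():
            print(lemma.name(), '<->', [a.name() for a in lemma.antonyms()])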
Other machine-readable lexical resources

Machine-readable dictionaries: LDOCE
Roget's Thesaurus
The biggest encyclopedia: CYC
Readings

Jurafsky and Martin, chapter 16
WordNet online manuals
C. Fellbaum (ed), WordNet: An Electronic Lexical Database, The MIT Press
Statistical Methods in Computational Linguistics
Word sense disambiguation
Identifying the sense of a word in its context

In many respects, a task similar to POS tagging
But there is less agreement on what the senses are, so the UPPER BOUND is lower
Main methods we'll discuss:
Frequency-based disambiguation
Selectional restrictions
Dictionary-based
Statistical methods: Naïve Bayes
Corpora used for word sense disambiguation work

Sense annotated (difficult and expensive to build):
Semcor (200,000)
DSO (192,000 semantically annotated occurrences of 121 nouns and 70 verbs)
Senseval training data (8699 texts for 73 words)
Tal-treebank (80,000)
Non-annotated (available in large quantity):
Brown, LOB, Reuters
SEMCOR

<contextfile concordance="brown">
<context filename="br-h15" paras="yes">
.....
<wf cmd="ignore" pos="IN">in</wf>
<wf cmd="done" pos="NN" lemma="fig" wnsn="1" lexsn="1:10:00::">fig.</wf>
<wf cmd="done" pos="NN" lemma="6" wnsn="1" lexsn="1:23:00::">6</wf>
<punc>)</punc>
<wf cmd="done" pos="VBP" ot="notag">are</wf>
<wf cmd="done" pos="VB" lemma="slip" wnsn="3" lexsn="2:38:00::">slipped</wf>
<wf cmd="ignore" pos="IN">into</wf>
<wf cmd="done" pos="NN" lemma="place" wnsn="9" lexsn="1:15:05::">place</wf>
<wf cmd="ignore" pos="IN">across</wf>
<wf cmd="ignore" pos="DT">the</wf>
<wf cmd="done" pos="NN" lemma="roof" wnsn="1" lexsn="1:06:00::">roof</wf>
<wf cmd="done" pos="NN" lemma="beam" wnsn="2" lexsn="1:06:00::">beams</wf>
<punc>,</punc>
Dictionary-based approaches

Lesk (1986):
1. Retrieve from the MRD all sense definitions of the word to be disambiguated
2. Compare with the sense definitions of the words in the context
3. Choose the sense with the most overlap

Example: PINE
1 kinds of evergreen tree with needle-shaped leaves
2 waste away through sorrow or illness
CONE
1 solid body which narrows to a point
2 something of this shape whether solid or hollow
3 fruit of certain evergreen trees
Disambiguate: PINE CONE
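A minimal sketch of Lesk's overlap count on the PINE / CONE entries above; the toy dictionary is hard-coded for illustration, where a real system would pull the definitions from an MRD:

# Lesk (1986), minimal version: pick the sense whose definition
# shares the most words with the definitions of the context words.
def lesk(word, context_words, dictionary):
    context_tokens = set()
    for w in context_words:
        for definition in dictionary.get(w, []):
            context_tokens.update(definition.split())
    best_sense, best_overlap = None, -1
    for definition in dictionary[word]:
        overlap = len(set(definition.split()) & context_tokens)
        if overlap > best_overlap:
            best_sense, best_overlap = definition, overlap
    return best_sense

dictionary = {
    'pine': ['kinds of evergreen tree with needle-shaped leaves',
             'waste away through sorrow or illness'],
    'cone': ['solid body which narrows to a point',
             'something of this shape whether solid or hollow',
             'fruit of certain evergreen trees'],
}

# Shared words such as "evergreen" select sense 1 of PINE.
print(lesk('pine', ['cone'], dictionary))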
Psychological research on word-sense disambiguation

Frequency effects: common words such as DOOR are responded to more quickly than uncommon words such as CASK (Forster and Chambers, 1973)
Context effects: words are identified more quickly in context than out of context (Meyer and Schvaneveldt, 1971)
bread and butter
Lexical access: the COHORT model (Marslen-Wilson and Welsh, 1978)
Frequency-based word-sense disambiguation

If you have a corpus in which each word is annotated with its sense, you can collect unigram statistics (count the number of times each sense occurs in the corpus):
P(SENSE)
P(SENSE|WORD)
E.g., if you have 5845 uses of the word bridge:
5641 cases in which it is tagged with the sense STRUCTURE
194 instances with the sense DENTAL-DEVICE
Frequency-based WSD gets about 70% correct
To improve upon these results, we need context
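A minimal sketch of this baseline, assuming the sense-annotated corpus is available as (word, sense) pairs; the bridge counts are the ones quoted above:

# Most-frequent-sense baseline from a sense-annotated corpus.
from collections import Counter, defaultdict

tagged_corpus = ([('bridge', 'STRUCTURE')] * 5641 +
                 [('bridge', 'DENTAL-DEVICE')] * 194)

sense_counts = defaultdict(Counter)
for word, sense in tagged_corpus:
    sense_counts[word][sense] += 1

def most_frequent_sense(word):
    # argmax over P(sense | word); the denominator C(word) cancels out.
    return sense_counts[word].most_common(1)[0][0]

print(most_frequent_sense('bridge'))  # -> STRUCTURE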
Selectional restrictions

One type of contextual information is information about the type of arguments that a verb takes, its SELECTIONAL RESTRICTIONS:
AGENT EAT FOOD-STUFF
AGENT DRIVE VEHICLE
Example:
Which airlines serve DENVER?
Which airlines serve BREAKFAST?
Limitations:
In his two championship trials, Mr. Kulkarni ate glass on an empty stomach, accompanied only by water and tea.
But it fell apart in 1931, perhaps because people realized that you can't EAT gold for lunch if you're hungry
Resnik (1998): 44% with these methods
Supervised approaches to WSD: Naïve Bayes

A Naïve Bayes classifier chooses the most probable sense for a word given the context:

s' = \arg\max_{s_k} P(s_k \mid C)

As usual, this can be expressed as:

s' = \arg\max_{s_k} \frac{P(C \mid s_k) P(s_k)}{P(C)}

The BAYES ASSUMPTION: all the features are independent:

P(C \mid s_k) = \prod_{j=1}^{n} P(v_j \mid s_k)
An example of the use of Naïve Bayes classifiers: Gale et al (1992)

Used this method to disambiguate word senses, using an ALIGNED CORPUS (Hansard) to get the word senses:

English  French       Sense       Number of examples
duty     droit        tax         1114
         devoir       obligation   691
drug     medicament   medical     2292
         drogue       illicit      855
land     terre        property    1022
         pays         country      386
Gale et al: words as contextual clues

Gale et al view a 'context' as a set of words
Good clues for the different senses of DRUG:
Medication: prices, prescription, patent, increase, consumer, pharmaceutical
Illegal substance: abuse, paraphernalia, illicit, alcohol, cocaine, traffickers
To determine which interpretation is more likely, extract words (e.g. ABUSE) from the context, and use P(abuse|medicament), P(abuse|drogue)
To estimate these probabilities, use MLE:
P(abuse|medicament) = C(abuse, medicament) / C(medicament)
P(medicament) = C(medicament) / C(drug)
The algorithm

TRAINING:
for all senses sk of w do
  for all words vj in the vocabulary do
    P(vj|sk) = C(vj,sk) / C(sk)
  end
end
for all senses sk of w do
  P(sk) = C(sk) / C(w)
end

DISAMBIGUATION:
for all senses sk of w do
  score(sk) = log P(sk)
  for all words vj in the context window c do
    score(sk) = score(sk) + log P(vj|sk)
  end
end
choose s' = argmax score(sk)
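A runnable Python sketch of this algorithm. The add-one smoothing is my addition (not in the pseudocode above), so that a context word unseen with some sense does not drive its score to minus infinity; the training examples are illustrative:

# Naive Bayes WSD, following the pseudocode above.
import math
from collections import Counter

training = [  # (sense, context words), toy data for "drug"
    ('medicament', ['prices', 'prescription', 'patent']),
    ('medicament', ['consumer', 'pharmaceutical', 'prices']),
    ('drogue', ['abuse', 'illicit', 'cocaine']),
    ('drogue', ['traffickers', 'alcohol', 'abuse']),
]

sense_count = Counter()       # C(sk)
word_sense_count = Counter()  # C(vj, sk)
vocabulary = set()
for sense, context in training:
    sense_count[sense] += 1
    for v in context:
        word_sense_count[(v, sense)] += 1
        vocabulary.add(v)

def disambiguate(context):
    total = sum(sense_count.values())
    scores = {}
    for sense in sense_count:
        # score(sk) = log P(sk) + sum_j log P(vj | sk), add-one smoothed
        score = math.log(sense_count[sense] / total)
        denom = (sum(word_sense_count[(v, sense)] for v in vocabulary)
                 + len(vocabulary))
        for v in context:
            score += math.log((word_sense_count[(v, sense)] + 1) / denom)
        scores[sense] = score
    return max(scores, key=scores.get)

print(disambiguate(['abuse', 'cocaine']))  # -> drogue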
Results

Gale et al (1992): a disambiguation system using this algorithm was correct for about 90% of the occurrences of six ambiguous nouns in the Hansard corpus:
duty, drug, land, language, position, sentence
Good clues for drug:
medication sense: prices, prescription, patent, increase
illegal substance sense: abuse, paraphernalia, illicit, alcohol, cocaine, traffickers
Other methods for WSD

Supervised:
Brown et al, 1991: using mutual information to combine senses into groups
Yarowsky (1992): using a thesaurus and a topic-classified corpus
Unsupervised: sense DISCRIMINATION
Schuetze 1996: using the EM algorithm
Thesaurus-based classification

Basic idea (e.g., Walker, 1987):
A thesaurus assigns to a word a number of SUBJECT CODES
Assumption: each subject code is a distinct sense
Disambiguate word W by assigning to it the subject code most commonly associated with the words in the context
Problems:
general topic categorization may not be very useful for a specific domain (e.g., MOUSE in computer manuals)
coverage problems: NAVRATILOVA
Yarowsky, 1992:
Assign a word to thesaurus category T if it occurs more often than chance in contexts of category T in the corpus
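A rough sketch of the "more often than chance" test, under the simplifying assumption that we compare P(w | T) with the corpus-wide P(w) on a topic-labeled corpus; the data, the threshold, and this particular ratio test are illustrative, not Yarowsky's exact weighting:

# Add word w to thesaurus category T if it is over-represented in
# contexts of category T: P(w | T) > threshold * P(w).
from collections import Counter

corpus = [  # (category, tokens): illustrative topic-labeled corpus
    ('ANIMAL', ['the', 'mouse', 'ate', 'the', 'cheese']),
    ('COMPUTING', ['click', 'the', 'mouse', 'button']),
    ('COMPUTING', ['move', 'the', 'mouse', 'to', 'the', 'icon']),
]

global_counts = Counter()
category_counts = {}
for category, tokens in corpus:
    global_counts.update(tokens)
    category_counts.setdefault(category, Counter()).update(tokens)

def salient_words(category, threshold=1.2):
    counts = category_counts[category]
    n_cat, n_all = sum(counts.values()), sum(global_counts.values())
    return [w for w in counts
            if counts[w] / n_cat > threshold * global_counts[w] / n_all]

print(salient_words('COMPUTING'))
# -> ['click', 'button', 'move', 'to', 'icon']
#    ("mouse" occurs in both categories, so it is not salient here)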
Evaluation

Baseline: is the system an improvement?
Unsupervised: Random, Simple-Lesk
Supervised: Most Frequent, Lesk-plus-corpus
Upper bound: agreement between humans?
SENSEVAL

Goals:
Provide a common framework to compare WSD systems
Standardise the task (especially evaluation procedures)
Build and distribute new lexical resources (dictionaries and sense-tagged corpora)
Web site: http://www.senseval.org/
"There are now many computer programs for automatically determining the sense of a word in context (Word Sense Disambiguation or WSD). The purpose of Senseval is to evaluate the strengths and weaknesses of such programs with respect to different words, different varieties of language, and different languages." From: http://www.sle.sharp.co.uk/senseval2
SENSEVAL History

ACL-SIGLEX workshop (1997)
Yarowsky and Resnik paper
SENSEVAL-I (1998)
Lexical Sample for English, French, and Italian
SENSEVAL-II (Toulouse, 2001)
Lexical Sample and All Words
Organization: Kilgarriff (Brighton)
SENSEVAL-III (2003)
WSD at SENSEVAL-II

Choosing the right sense for a word among those of WordNet:

Sense 1: horse, Equus caballus -- (solid-hoofed herbivorous quadruped domesticated since prehistoric times)
Sense 2: horse -- (a padded gymnastic apparatus on legs)
Sense 3: cavalry, horse cavalry, horse -- (troops trained to fight on horseback: "500 horse led the attack")
Sense 4: sawhorse, horse, sawbuck, buck -- (a framework for holding wood that is being sawed)
Sense 5: knight, horse -- (a chessman in the shape of a horse's head; can move two squares horizontally and one vertically (or vice versa))
Sense 6: heroin, diacetyl morphine, H, horse, junk, scag, shit, smack -- (a morphine derivative)
Corton has been involved in the design, manufacture and installation of horse stalls and horse-related equipment like external doors, shutters and accessories.
SENSEVAL-II Tasks

All Words (without training data): Czech, Dutch, English, Estonian
Lexical Sample (with training data): Basque, Chinese, Danish, English, Italian, Japanese, Korean, Spanish, Swedish
English SENSEVAL-II

Organization: Martha Palmer (UPENN)
Gold standard: 2 annotators and 1 supervisor (Fellbaum)
Interchange data format: XML
Sense repository: WordNet 1.7 (special Senseval release)
Competitors:
All Words: 11 systems; Lexical Sample: 16 systems
English All Words

Data: 3 texts for a total of 1770 words
Average polysemy: 6.5
Example: (part of) Text 1
The art of change-ringing is peculiar to the English and, like most English peculiarities , unintelligible to the rest of the world . -- Dorothy L. Sayers , " The Nine Tailors " ASLACTON , England -- Of all scenes that evoke rural England , this is one of the loveliest : An ancient stone church stands amid the fields , the sound of bells cascading from its tower , calling the faithful to evensong . The parishioners of St. Michael and All Angels stop to chat at the church door , as members here always have . […]
English All Words Systems

Unsupervised (6):
UMED (relevance matrix over Gutenberg project corpus)
Illinois (Lexical Proximity)
Malaysia (MTD, Machine Tractable Dictionary)
Litkowsky (New Oxford Dictionary and Contextual Clues)
Sheffield (Anaphora and WN hierarchy)
IRST (WordNet Domains)
Supervised (5):
S. Sebastian (decision lists in Semcor)
UCLA (Semcor, Semantic Distance and Density, AltaVista for frequency)
Sinequa (Semcor and Semantic Classes)
Antwerp (Semcor, Memory Based Learning)
Moldovan (Semcor plus an additional sense-tagged corpus, heuristics)
Lexical Sample
Data: 8699 texts for 73 words
Average WN polysemy: 9.22
Training Data: 8166 (average 118/word)
Baseline (commonest): 0.47 precision
Baseline (Lesk): 0.51 precision
Lexical Sample
Example: to leave
<instance id="leave.130">
<context>I 'd been seeing Johnnie almost a year now, but I still didn't want to <head>leave</head> him for five whole days.</context>
</instance>

<instance id="leave.157">
<context>And he saw them all as he walked up and down. At two that morning, he was still walking -- up and down Peony, up and down the veranda, up and down the silent, moonlit beach. Finally, in desperation, he opened the refrigerator, filched her hand lotion, and <head>left</head> a note.</context>
</instance>
English Lexical Sample Systems

Unsupervised (5): Sunderland, UNED, Illinois, Litkowsky, ITRI
Supervised (12): S. Sebastian, Sinequa, Manning, Pedersen, Korea, Yarowsky, Resnik, Pennsylvania, Barcelona, Moldovan, Alicante, IRST
[Results chart: systems grouped by technique (Boosting, Decision Lists, Domains)]
Boosting 1/2

Combine many simple and moderately accurate Weak Classifiers (WCs)
Train WCs sequentially, each on the examples which were most difficult to classify for the preceding WCs
Examples of WCs:
preceding_word="house"
domain="sport"
...
Boosting 2/2

WCi is trained and tested on the whole corpus
Each pair {word, synset} is given an "importance weight" h depending on how difficult it was for WC1, ..., WCi to classify
WCi+1 is tuned to classify the worst pairs {word, synset} correctly and is tested on the whole corpus; h is thus updated at each step
At the end, all the WCs are combined into a single rule, the combined hypothesis; each WC is weighted according to its effectiveness in the tests
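A compact AdaBoost-style sketch of this loop (my reconstruction of the generic schema, not the actual SENSEVAL system): weak classifiers are single-feature rules like those above, and the per-example weights play the role of the importance weight h:

# AdaBoost-style boosting with one-feature weak classifiers.
# Each WC predicts +1 if its feature is present, else -1.
import math

examples = [  # (active features, label)
    ({'preceding_word=house', 'domain=finance'}, +1),
    ({'domain=finance'}, +1),
    ({'domain=sport'}, -1),
    ({'preceding_word=river', 'domain=sport'}, -1),
]
features = sorted({f for feats, _ in examples for f in feats})
weights = [1.0 / len(examples)] * len(examples)  # importance weights h

ensemble = []  # (alpha, feature) pairs: the combined hypothesis
for _ in range(3):
    def weighted_error(f):  # weighted error of the rule for feature f
        return sum(w for w, (feats, y) in zip(weights, examples)
                   if (1 if f in feats else -1) != y)
    best = min(features, key=weighted_error)
    err = max(weighted_error(best), 1e-10)
    alpha = 0.5 * math.log((1 - err) / err)  # weight of this WC
    ensemble.append((alpha, best))
    # Re-weight: examples this WC got wrong become more important.
    for i, (feats, y) in enumerate(examples):
        pred = 1 if best in feats else -1
        weights[i] *= math.exp(-alpha * y * pred)
    total = sum(weights)
    weights = [w / total for w in weights]

def classify(feats):  # weighted vote of all the WCs
    vote = sum(a * (1 if f in feats else -1) for a, f in ensemble)
    return +1 if vote > 0 else -1

print(classify({'domain=finance'}))  # -> 1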
Bootstrapping

Problem with supervised methods: you need an annotated corpus!
Partial solution (Hearst, Yarowsky): train on a small annotated set (the bootstrap), then use the trained system to annotate a larger corpus, which is then used to retrain
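A minimal self-training sketch of this idea, using scikit-learn's Naive Bayes as the underlying classifier (the library choice, data, and confidence threshold are mine; Yarowsky's actual algorithm used decision lists plus a one-sense-per-discourse constraint):

# Bootstrapping / self-training: train on a small seed set, then
# repeatedly label the most confident example from the unannotated
# pool and add it to the training set.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

seed_texts = ['prescription drug prices increase',
              'drug abuse and cocaine traffickers']
seed_labels = ['medication', 'illegal']
pool = ['drug prices increase', 'cocaine drug traffickers', 'drug abuse']

vectorizer = CountVectorizer()
vectorizer.fit(seed_texts + pool)

texts, labels = list(seed_texts), list(seed_labels)
while pool:
    clf = MultinomialNB()
    clf.fit(vectorizer.transform(texts), labels)
    probs = clf.predict_proba(vectorizer.transform(pool))
    # Take the single most confident prediction; stop if none is confident.
    best = max(range(len(pool)), key=lambda i: probs[i].max())
    if probs[best].max() < 0.6:
        break
    labels.append(clf.classes_[probs[best].argmax()])
    texts.append(pool.pop(best))

print(list(zip(texts, labels)))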
Readings

Jurafsky and Martin, chapters 17.1 and 17.2
For more info: Manning and Schuetze, chapter 7
Papers:
Gale, Church, and Yarowsky. 1992. A method for disambiguating word senses in a large corpus. Computers and the Humanities, v. 26, 415-439.
Acknowledgments
Slides on SENSEVAL borrowed from Bernardo Magnini, IRST
Statistical Techniques for Computational Linguistics
Lexical acquisition
The limits of hand-encoded lexical resources

Manual construction of lexical resources is very costly
Because language keeps changing, these resources have to be continuously updated
Some information (e.g., about frequencies) has to be computed automatically anyway
The coverage problem

Sampson (1989): tested the coverage of the Oxford ALD (~70,000 entries) on a 45,000-word subpart of the LOB corpus. About 3% of tokens were not listed in the dictionary.
Examples:
Type of problem Example
Proper noun Caramello, Chateau-Chalon
Foreign word perestroika
Code R101
Non-standard English Havin’
Hyphen omitted bedclothes
Technical vocabulary normoglycaemia
Lexical acquisition

Most MRDs these days contain at least some information derived by computational means
Methods have been developed to:
Augment resources such as WordNet with additional information (e.g., in technical contexts such as bioinformatics)
Develop these resources from scratch
These methods are also interesting because of the hope that they can address the problems encountered by lexicographers when trying to specify word senses
Vector-based lexical semantics

A very old idea in NLE: the meaning of a word can be specified in terms of the values of certain 'features' ('DECOMPOSITIONAL SEMANTICS')
dog: ANIMATE = +, EAT = MEAT, SOCIAL = +
horse: ANIMATE = +, EAT = GRASS, SOCIAL = +
cat: ANIMATE = +, EAT = MEAT, SOCIAL = -
Similarity / relatedness: proximity in feature space
Vector-based lexical semantics

[Diagram: DOG, CAT and HORSE as points in feature space]
General characterization of vector-based semantics (from Charniak)

Vectors as models of concepts
The CLUSTERING approach to lexical semantics:
1. Define the properties one cares about, and give a value to each property (generally, numerical)
2. Create a vector of length n for each item to be classified
3. Viewing the n-dimensional vector as a point in n-space, cluster points that are near one another
What changes between models:
1. The properties used in the vector
2. The distance metric used to decide if two points are 'close'
3. The algorithm used to cluster
Using words as features in a vector-based semantics

The old decompositional semantics approach requires:
i. Specifying the features
ii. Characterizing the value of these features for each lexeme
Simpler approach: use as features the WORDS that occur in the proximity of that word / lexical entry
Intuition: "You can tell a word's meaning from the company it keeps"
More specifically, you can use as the 'values' of these features:
The FREQUENCIES with which these words occur near the words whose meaning we are defining
Or perhaps the PROBABILITIES that these words occur next to each other
Alternative: use the DOCUMENTS in which these words occur (e.g., LSA)
Some psychological results support this view. Lund, Burgess, et al (1995, 1997): lexical associations learned this way correlate very well with priming experiments. Landauer et al: good correlation on a variety of topics, including human categorization & vocabulary tests.
Using neighboring words to specify the meaning of words

Take, e.g., the following corpus:
1. John ate a banana.
2. John ate an apple.
3. John drove a lorry.
We can extract the following co-occurrence matrix:
john ate drove banana apple lorry
john 0 2 1 1 1 1
ate 2 0 0 1 1 0
drove 1 0 0 0 0 1
banana 1 1 0 0 0 0
apple 1 1 0 0 0 0
lorry 1 0 1 0 0 0
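A minimal sketch that builds this matrix from the toy corpus: words co-occur here if they appear in the same sentence, and the function words "a" / "an" are dropped, mirroring the table above; larger corpora would instead count within a fixed window around each word:

# Build the word-word co-occurrence matrix from the toy corpus.
from collections import defaultdict

corpus = ['John ate a banana', 'John ate an apple', 'John drove a lorry']
stopwords = {'a', 'an'}

cooc = defaultdict(int)
for sentence in corpus:
    tokens = [t.lower() for t in sentence.split()
              if t.lower() not in stopwords]
    for i, w in enumerate(tokens):
        for v in tokens[:i] + tokens[i + 1:]:
            cooc[(w, v)] += 1

vocab = ['john', 'ate', 'drove', 'banana', 'apple', 'lorry']
print('        ' + ' '.join(f'{w:>7}' for w in vocab))
for w in vocab:
    print(f'{w:>7} ' + ' '.join(f'{cooc[(w, v)]:>7}' for v in vocab))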
Variant: using modifiers to specify the meaning of words

.... The Soviet cosmonaut .... The American astronaut .... The red American car .... The old red truck ... the spacewalking cosmonaut ... the full Moon ...

              cosmonaut  astronaut  moon  car  truck
Soviet                1          0     0    1      1
American              0          1     0    1      1
spacewalking          1          1     0    0      0
red                   0          0     0    1      1
full                  0          0     1    0      0
old                   0          0     0    1      1
Acquiring lexical vectors from a corpus (Schuetze, 1991; Burgess and Lund, 1997)

To construct a vector C(w) for each word w:
1. Scan the text
2. Whenever a word w is encountered, increment all cells of C(w) corresponding to the words v that occur in the vicinity of w, typically within a window of fixed size
Differences among methods:
Size of the window
Weighted or not
Whether every word in the vocabulary counts as a dimension (including function words such as "the" or "and") or whether instead only some specially chosen words are used (typically, the m most common content words in the corpus; or perhaps modifiers only). The words chosen as dimensions are often called CONTEXT WORDS
Whether dimensionality reduction methods are applied
Variant: using probabilities (e.g., Dagan et al, 1997)
E.g., for house:
Context vector (using probabilities):
0.001394 0.016212 0.003169 0.000734 0.001460 0.002901 0.004725 0.000598 0 0 0.008993 0.008322 0.000164 0.010771 0.012098 0.002799 0.002064 0.007697 0 0 0.001693 0.000624 0.001624 0.000458 0.002449 0.002732 0 0.008483 0.007929 0 0.001101 0.001806 0 0.005537 0.000726 0.011563 0.010487 0 0.001809 0.010601 0.000348 0.000759 0.000807 0.000302 0.002331 0.002715 0.020845 0.000860 0.000497 0.002317 0.003938 0.001505 0.035262 0.002090 0.004811 0.001248 0.000920 0.001164 0.003577 0.001337 0.000259 0.002470 0.001793 0.003582 0.005228 0.008356 0.005771 0.001810 0 0.001127 0.001225 0 0.008904 0.001544 0.003223 0
Another variant: word / document matrices
d1 d2 d3 d4 d5 d6
cosmonaut 1 0 1 0 0 0
astronaut 0 1 0 0 0 0
moon 1 1 0 0 0 0
car 1 0 0 1 1 0
truck 0 0 0 1 0 1
Measures of semantic similarity

Euclidean distance:

d(\vec{x}, \vec{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}

Cosine:

\cos(\vec{x}, \vec{y}) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2} \sqrt{\sum_{i=1}^{n} y_i^2}}

Manhattan metric:

d(\vec{x}, \vec{y}) = \sum_{i=1}^{n} |x_i - y_i|
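The three measures as plain Python functions, a direct transcription of the formulas above; the example vectors are rows of the word / document matrix from the previous slide:

# Euclidean distance, cosine, and Manhattan metric over plain lists.
import math

def euclidean(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def cosine(x, y):
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi ** 2 for xi in x))
    norm_y = math.sqrt(sum(yi ** 2 for yi in y))
    return dot / (norm_x * norm_y)

def manhattan(x, y):
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

cosmonaut = [1, 0, 1, 0, 0, 0]
moon      = [1, 1, 0, 0, 0, 0]
truck     = [0, 0, 0, 1, 0, 1]
print(cosine(cosmonaut, moon))   # 0.5 (they share document d1)
print(cosine(cosmonaut, truck))  # 0.0 (no shared documents)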
Some psychological evidence for vector-space representations

Burgess and Lund (1996, 1997): the clusters found with HAL correlate well with those observed using semantic priming experiments.
Landauer, Foltz, and Laham (1997): scores overlap with those of humans on standard vocabulary and topic tests; mimic human scores on category judgments; etc.
Evidence about 'prototype theory' (Rosch et al, 1976):
Posner and Keele, 1968:
subjects were presented with patterns of dots that had been obtained by variations from a single pattern (the 'prototype')
later, they recalled the prototypes better than the samples they had actually seen
Rosch et al, 1976: 'basic level' categories (apple, orange, potato, carrot) have higher 'cue validity' than elements higher in the hierarchy (fruit, vegetable) or lower (red delicious, cox)
The HAL model (Burgess and Lund, 1995, 1997)

A 160-million-word corpus of articles extracted from all newsgroups containing English dialogue
Context words: the 70,000 most frequently occurring symbols within the corpus
Window size: 10 words to the left and the right of the word
Measure of similarity: cosine
Latent Semantic Analysis (LSA) (Landauer et al, 1997)

Goal: extract relations of expected contextual usage from passages
Three steps:
1. Build a word / document co-occurrence matrix
2. 'Weigh' each cell
3. Perform a DIMENSIONALITY REDUCTION
Argued to correlate well with humans on a number of tests
LSA: the method, 1
LSA: Singular Value Decomposition
LSA: Reconstructed matrix
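The original slides show the worked matrices as figures; a minimal numpy sketch of the decomposition and rank-k reconstruction, applied to the word / document matrix from the earlier slide (k = 2 is an arbitrary illustrative choice):

# LSA core: SVD of the word/document matrix, keep the top k singular
# values, reconstruct a smoothed rank-k matrix.
import numpy as np

# Rows: cosmonaut, astronaut, moon, car, truck; columns: d1..d6.
A = np.array([
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1],
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # best rank-k approximation

# After the reduction, the cosmonaut and astronaut rows become more
# alike, even though the two words never co-occur in any document.
np.set_printoptions(precision=2, suppress=True)
print(A_k)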
Topic correlations in 'raw' and 'reconstructed' data
Clustering

Clustering algorithms partition a set of objects into CLUSTERS
An UNSUPERVISED method
Two applications:
Exploratory Data Analysis
GENERALIZATION
Clustering

[Diagram: two groups of X's, each forming a cluster]
Clustering with Syntactic Information

Pereira and Tishby, 1992: two words are similar if they occur as objects of the same verbs
John ate POPCORN
John ate BANANAS
C(w) is the distribution of verbs for which w served as direct object.
First approximation: just counts
In fact: probabilities
Similarity: RELATIVE ENTROPY
Grefenstette (1992): extended this to include several attributes.
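Relative entropy (KL divergence) between two verb distributions as a plain function; the distributions below are illustrative toy numbers, not Pereira and Tishby's data:

# D(p || q) = sum_v p(v) log(p(v) / q(v)); note it is asymmetric.
# Assumes q(v) > 0 wherever p(v) > 0 (real systems smooth q).
import math

def relative_entropy(p, q):
    return sum(pv * math.log(pv / q[v]) for v, pv in p.items() if pv > 0)

# P(verb | noun served as its direct object)
popcorn = {'eat': 0.8, 'buy': 0.2}
bananas = {'eat': 0.7, 'buy': 0.3}
lorry   = {'eat': 0.05, 'buy': 0.25, 'drive': 0.7}

print(relative_entropy(popcorn, bananas))  # small: similar nouns
print(relative_entropy(popcorn, lorry))    # large: dissimilar nouns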
Some caveats

Two senses of 'similarity':
Schuetze: two words are similar if one can replace the other
Brown et al: two words are similar if they occur in similar contexts
What notion of 'meaning' is learned here? "One might consider LSA's maximal knowledge of the world to be analogous to a well-read nun's knowledge of sex, a level of knowledge often deemed a sufficient basis for advising the young" (Landauer et al, 1997)
Can one do semantics with these representations?
Our own experience: using HAL-style vectors for resolving bridging references
Very limited success
Applying dimensionality reduction didn't seem to help
Applications of these techniques: Information Retrieval

     cosmonaut  astronaut  moon  car  truck
d1           1          0     1    1      0
d2           0          1     1    0      0
d3           1          0     0    0      0
d4           0          0     0    1      1
d5           0          0     0    1      0
d6           0          0     0    0      1
Readings

Jurafsky and Martin, chapter 17.3
Also useful:
Manning and Schuetze, chapter 8
Charniak, chapters 9-10
Some papers:
Burgess, C. and Lund, K.
Landauer, Foltz, and Laham. (1997). Introduction to Latent Semantic Analysis. Discourse Processes.