
Page 1: Quantifying Semantics and Contextualizing Distributional

IIT Kharagpur

Chris Biemann, Dec 27, 2012 [email protected]

Quantifying Semantics and Contextualizing

Distributional Similarity

Page 2: Quantifying Semantics and Contextualizing Distributional

2

Outline

 Quantifying Semantics using Complex Network Analysis
   deficiency of n-gram models
   transitivity measure, motif profile

 Contextualizing Distributional Semantics
   from linguistic theory
   over lexical expansion methods
   to two-dimensional text

 Language to Knowledge Mapping
   word sense induction and disambiguation
   labeling word sense clusters
   connecting word sense clusters to ontologies

 Conclusions

Page 3: Quantifying Semantics and Contextualizing Distributional

3

What is wrong with this?

 The church has much to Dylan .

 Sonny Rollins received the least popular original housemate .

 Al Wong , Sports Illustrated that he had awakened ; the passive virtues reveal a gun due to various flora and fauna .

Page 4: Quantifying Semantics and Contextualizing Distributional

4

Semantic Incoherence in n-gram Language Models (LMs)

 LMs measure the probability of a sequence, given the model.

 Application for LMs: pick the best of several candidates, e.g. in MT, NLG, spelling correction etc.

 Standard LM: n-gram model:

 n-gram models only take local context into account: Random text generated by n-gram models is readable but semantically incoherent

 Can we quantify the amount of incoherence?

$P(w_1 w_2 \ldots w_n) = \prod_i P(w_i \mid w_{i-1} w_{i-2})$

Page 5: Quantifying Semantics and Contextualizing Distributional

5

Quantifying Semantic Coherence

 Method: Compare real language with LM-generated language on quantitative parameters

 Known congruence of n-gram models and real language for
   word frequency distribution
   scale-free-ness of co-occurrence graph degree distribution

[Diagram: the same quantitative measure is computed on the real language corpus and on the generated language corpus, and the two values are compared.]

Page 6: Quantifying Semantics and Contextualizing Distributional

6

Co-occurrence Graphs

 The word co-occurrence graph of a corpus has words as vertices; two words are connected if they co-occur (in a sentence), and the edge weight is the number of their co-occurrences

 To find out whether the co-occurrence of two specific words A and B is merely due to chance or reflects a statistical dependency, we compute to what extent the co-occurrence of A and B is statistically significant, using the log-likelihood measure

 Pruning of the co-occurrence graph:
   by word frequency: only retain the 5000 most frequent words
   by significance: only retain edges above a significance threshold s > 10.83
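As an illustration of this construction and pruning, here is a minimal sketch. Only the 5000-word limit and the s > 10.83 threshold (p < 0.001 at one degree of freedom) come from the slide; function names and everything else are assumptions.

```python
# Build a sentence-level co-occurrence graph and prune it by word frequency
# and by Dunning's log-likelihood significance.
import math
from collections import Counter
from itertools import combinations

def loglikelihood(n, n_a, n_b, n_ab):
    # 2x2 contingency table over n sentences: both, A only, B only, neither
    k = [[n_ab, n_a - n_ab], [n_b - n_ab, n - n_a - n_b + n_ab]]
    rows = [k[0][0] + k[0][1], k[1][0] + k[1][1]]
    cols = [k[0][0] + k[1][0], k[0][1] + k[1][1]]
    ll = 0.0
    for i in range(2):
        for j in range(2):
            if k[i][j] > 0:
                ll += k[i][j] * math.log(k[i][j] * n / (rows[i] * cols[j]))
    return 2.0 * ll

def cooc_graph(sentences, max_words=5000, threshold=10.83):
    word_freq, pair_freq = Counter(), Counter()
    for sent in sentences:
        types = set(sent)
        word_freq.update(types)
        pair_freq.update(frozenset(p) for p in combinations(sorted(types), 2))
    keep = {w for w, _ in word_freq.most_common(max_words)}
    n = len(sentences)
    edges = {}
    for pair, n_ab in pair_freq.items():
        a, b = tuple(pair)
        if a in keep and b in keep:
            sig = loglikelihood(n, word_freq[a], word_freq[b], n_ab)
            if sig > threshold:           # keep only significant co-occurrences
                edges[(a, b)] = sig
    return edges
```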

Page 7: Quantifying Semantics and Contextualizing Distributional

7

Transitivity

 Transitivity measures the probability of a connection between nodes B and C if both are connected to a third node A

 Closely related to Clustering Coefficient

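A small sketch of the measurement, assuming the pruned co-occurrence graph is available as an edge list; networkx's transitivity is exactly the ratio 3 × triangles / connected triples, so it can be compared directly between the real and the generated graph. The tiny example edges are illustrative.

```python
# Compare graph transitivity for a real vs. an n-gram-generated co-occurrence graph.
import networkx as nx

def transitivity_of(edges):
    g = nx.Graph()
    g.add_edges_from(edges)
    # transitivity = 3 * (number of triangles) / (number of connected triples)
    return nx.transitivity(g)

real_edges = [("Monday", "Tuesday"), ("Monday", "week"), ("Tuesday", "week")]
generated_edges = [("Monday", "Tuesday"), ("Monday", "week")]  # open chain, no triangle
print(transitivity_of(real_edges), transitivity_of(generated_edges))  # 1.0 and 0.0
```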

Page 8: Quantifying Semantics and Contextualizing Distributional

8

Network Motifs

 local connectivity patterns in graphs
 used for characterizing complex networks
 methodology: compare the motif profile of a real network with that of a random network

Milo R, Itzkovitz S, Kashtan N, et al. (March 2004). "Superfamilies of evolved and designed networks". Science 303 (5663): 1538–42.

Page 9: Quantifying Semantics and Contextualizing Distributional

9

Transitivity Results

 Distinction: real-language networks are more strongly clustered
 Quantification: higher n gets closer to real language


Page 10: Quantifying Semantics and Contextualizing Distributional

10

Monday Networks

  less co-occurrence between related terms for n-grams

 the 2-gram model does not even produce the other weekdays

[Figure: co-occurrence neighborhoods of "Monday" in real text and in text generated by 2-gram, 3-gram, and 4-gram models.]

Page 11: Quantifying Semantics and Contextualizing Distributional

11

Motifs: real vs. {1,2,3,4}-gram

Largest deviations:
 Chain
 Box

English, 1M sentences, controlled for sent. length distribution

captured by transitivity

Page 12: Quantifying Semantics and Contextualizing Distributional

12

… again consistent across languages

Page 13: Quantifying Semantics and Contextualizing Distributional

13

Instances of Chains

 total – km2 – square – {root, feet}
 Democrats – Social – Sciences – Arts
 Number – One – Formula – Championship
 difficult – extremely – rare – occasions
 Abraham – Lincoln – Nebraska – {Iowa, Missouri}

Words with different usage contexts cause chains: this has to do with the polysemy of language.

Page 14: Quantifying Semantics and Contextualizing Distributional

14

Instances of Boxes

 Ancient – Greek – ancient – Greece
 winning – award – won – Prize
 CBS – News – BBC – Radio
 Ph.D. – his – doctorate – University
 said – interview – stated – “
 wrote – articles – published – poems

Words with similar contexts cause boxes: they co-occur with the same words but inhibit each other. This has to do with the synonymy of language. Different forms of a word also function as “synonyms” in this sense.

Page 15: Quantifying Semantics and Contextualizing Distributional

15

Explanation for differences

 n-gram models are not aware of lexical ambiguity: “Abraham Lincoln, Nebraska”

 n-gram models do not have a mechanism to inhibit words that are too similar; rather, they tend to generate them together

 In motif profiles, language structure manifests itself in the absence of connections!
 We can use the motif profile to measure how well an LM captures synonymy and polysemy

Page 16: Quantifying Semantics and Contextualizing Distributional

16

Fulfilled Conjectures

 Distinction: Complex network analysis can unveil differences between real and (n-gram)-generated text

 Quantification: These differences are larger for more deficient language models (smaller n)

 Relation to Semantics: some of the motifs are connected to semantic phenomena such as synonymy and polysemy

Page 17: Quantifying Semantics and Contextualizing Distributional

17

Towards better LMs

 Increase n in n-gram?

 Add a semantic layer
   use word sense induction techniques to split words into meanings
   use topic models in conjunction with n-gram models
   use first-order and second-order statistics for an inhibition mechanism?

 Model language in an entirely different way?

Page 18: Quantifying Semantics and Contextualizing Distributional

18

CONTEXTUALIZING DISTRIBUTIONAL SEMANTICS

Page 19: Quantifying Semantics and Contextualizing Distributional

19

Syntagmatic vs. Paradigmatic Relations

 Syntagmatic Relations: syntactic constraints in the context
 Paradigmatic Relations: associations, semantic constraints

http://courses.nus.edu.sg/course/elltankw/history/Vocab/B.htm

Ferdinand de Saussure

That’s what Chris calls two-dimensional text

Page 20: Quantifying Semantics and Contextualizing Distributional

20

Motivation: What to do with two-dimensional text

 Knowledge-based Word Sense Disambiguation (à la Lesk)

A patient fell over a stack of magazines in an aisle at a physiotherapist practice.

WordNet: S: (n) magazine (product consisting of a paperback periodic publication as a physical object) "tripped over a pile of magazines”

Zero word overlap

customer student individual person mother user passenger ..

rose dropped climbed increased slipped declined tumbled surged …

pile copy lots dozens array collection amount ton …

field hill line river stairs road hall driveway …

physician attorney psychiatrist scholar engineer journalist contractor …

session game camp workouts training meeting work …

jumped woke turned drove walked blew put fell ..

stack tons piece heap collection bag loads mountain ..

[Figure: after expansion, the sentence and the gloss now share material, e.g. "fell"/"stack" on the sentence side match "pile", "collection", "ton(s)" on the gloss side, yielding Overlap = 2, Overlap = 1, Overlap = 2 where the original texts had zero word overlap.]

Tristan Miller, Chris Biemann, Torsten Zesch, Iryna Gurevych (2012): Using Distributional Similarity for Lexical Expansion in Knowledge-based Word Sense Disambiguation. submitted to COLING-12

Page 21: Quantifying Semantics and Contextualizing Distributional

21

Knowledge-based WSD without MFS back-off

 Expansions help in Simplified Lesk (SL) and Simplified Extended Lesk (SEL)
 The less material comes from the resource (SL), the more the expansions help
 The unsupervised, knowledge-based method exceeds the MFS baseline for the first time when not using sense frequency information

Tristan Miller, Chris Biemann, Torsten Zesch, Iryna Gurevych (2012): Using Distributional Similarity for Lexical Expansion in Knowledge-based Word Sense Disambiguation. submitted to COLING-12

Page 22: Quantifying Semantics and Contextualizing Distributional

22

Distributional Hypothesis

The Distributional Hypothesis in linguistics is the theory that words that occur in similar contexts tend to have similar meanings.
 Contexts: syntagmatic relations
 Similar meanings: paradigmatic relations

The Distributional Hypothesis is the basis for Statistical Semantics. It states that the meaning of a word can be defined in terms of its context.

Zellig S. Harris

Z. Harris. (1954). Distributional Structure. Word 10 (2/3)

Page 23: Quantifying Semantics and Contextualizing Distributional

23

Contextual Hypothesis

Operationalizing the distributional hypothesis:
 let's compare words along their contexts
 the more contexts they share …
 … the more similar they are!

Strong Contextual Hypothesis: two words are semantically similar to the extent that their contextual representations are similar.

Miller and Charles (1991):
 first study on a 30-word subset of RG65
 shows a strong correlation between similarity judgments and the number of common contexts

G. A. Miller, W. G. Charles (1991): Contextual Correlates of Semantic Similarity. Language and Cognitive Processes 1991, 6 (1) 1-28

George A. Miller

Page 24: Quantifying Semantics and Contextualizing Distributional

24

Contexts of Words

Contexts:
 immediate neighbors
 other words in the same sentence (bag of words)
 words along dependency paths
 words in the same document
 stop word vector contexts

Any process that builds a structure on sentences can be used as a source for contexts.

How to measure saliency of a context?
 frequency count (with/without TF-IDF weighting)
 significance measures

Page 25: Quantifying Semantics and Contextualizing Distributional

25

The @@ operation: producing pairs of words and features

SENTENCE: I suffered from a cold and took aspirin.

STANFORD COLLAPSED DEPENDENCIES: nsubj(suffered, I); nsubj(took, I); root(ROOT, suffered); det(cold, a); prep_from(suffered, cold); conj_and(suffered, took); dobj(took, aspirin)

WORD-FEATURE PAIRS:
suffered nsubj(@@, I) 1
took nsubj(@@, I) 1
cold det(@@, a) 1
suffered prep_from(@@, cold) 1
suffered conj_and(@@, took) 1
took dobj(@@, aspirin) 1
I nsubj(suffered, @@) 1
I nsubj(took, @@) 1
a det(cold, @@) 1
cold prep_from(suffered, @@) 1
took conj_and(suffered, @@) 1
aspirin dobj(took, @@) 1

http://nlp.stanford.edu:8080/parser/

Jo Bim
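A minimal sketch of this word-feature extraction (the "@@" holing operation), assuming the parser output is already available as (relation, head, dependent) triples; the function name is an assumption.

```python
# Turn dependency triples into word-feature observations: each triple yields one
# observation for the head (head slot replaced by @@) and one for the dependent.
def holing(dep_triples):
    pairs = []
    for rel, head, dep in dep_triples:
        pairs.append((head, f"{rel}(@@, {dep})"))  # feature as seen from the head
        pairs.append((dep, f"{rel}({head}, @@)"))  # feature as seen from the dependent
    return pairs

triples = [("nsubj", "suffered", "I"), ("prep_from", "suffered", "cold"),
           ("dobj", "took", "aspirin")]
for word, feature in holing(triples):
    print(word, feature, 1)
# e.g.: suffered nsubj(@@, I) 1   /   cold prep_from(suffered, @@) 1
```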

Page 26: Quantifying Semantics and Contextualizing Distributional

26

Significance Measures

How to determine that "suffered prep_from(@@, cold)" is more interesting than "I nsubj(took, @@)"? Significance measures give a score of "interestingness" between words and contexts.

 Lexicographer's Pointwise Mutual Information:

$\mathrm{LMI}(A,B) = n_{AB} \cdot \log_2 \dfrac{n \cdot n_{AB}}{n_A \cdot n_B}$

 Log-Likelihood ratio

n: number of ‘experiments’
nA: number of experiments where A takes part
nB: number of experiments where B takes part
nAB: number of experiments where A and B co-occur
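A minimal sketch of LMI scoring over such word-feature observations. It uses the formula as reconstructed above, which should be treated as an assumption; the function name is illustrative.

```python
# Score (word, feature) pairs by Lexicographer's Mutual Information:
# LMI(A, B) = n_AB * log2( n * n_AB / (n_A * n_B) )
import math
from collections import Counter

def lmi_scores(pairs):
    n = len(pairs)                       # number of observations ("experiments")
    n_word, n_feat, n_joint = Counter(), Counter(), Counter()
    for w, f in pairs:
        n_word[w] += 1
        n_feat[f] += 1
        n_joint[(w, f)] += 1
    return {(w, f): c * math.log2(n * c / (n_word[w] * n_feat[f]))
            for (w, f), c in n_joint.items()}
```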

Page 27: Quantifying Semantics and Contextualizing Distributional

27

Distributional Thesaurus with MapReduce steps

Example input: "Roomano is a hard Gouda-like cheese from Friesland in the northern part of The Netherlands. It pairs well with aged sherries ..."

Steps (parameters in brackets):

extrTupels – extract word-feature tuples using grammatical relations:
word | feature | t
hard#a | cheese#ADJ_MODn | 17
cheese#n | Gouda-like#ADJ_MODa | 5
cheese#n | hard#ADJ_MODa | 17
pair#v | well#ADV_MODa | 3
... | ... | ...

FreqSig [t: min freq, s: min sign]:
word | feature | s
hard#a | cheese#ADJ_MODn | 15.8
cheese#n | Gouda-like#ADJ_MODa | 7.6
cheese#n | hard#ADJ_MODa | 0.4
... | ... | ...

AggrPerFt:
feature | words
cheese#ADJ_MODn | hard#a, yellow#a, French#a
hard#ADJ_MODa | cheese#n, stone#n
... | ...

SimCounts [w: weighting for # words/feature]:
word | word | w.sum
hard#a | yellow#a | 0.234
yellow#a | hard#a | 0.234
cheese#n | stone#n | 3.14
... | ... | ...

PruneGraph [p: max number of features per word; s: sum threshold]

Convert – output format (like the data below):
(cheese (desc 0.0000) (sims cream 2.1311548 onion 2.0364845 meat 1.5237559 sausage 1.5049095 sauce 1.4562587 garlic 1.4231627 bean 1.4186963 ))
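For orientation, a single-machine sketch of the same pipeline; the real system runs these steps as MapReduce jobs over large corpora, and all parameter defaults and function names here are assumptions.

```python
# Simplified in-memory DT computation: keep significant word-feature pairs,
# invert to feature -> words (AggrPerFt), count shared features per word pair
# (SimCounts), and keep the top neighbors per word (PruneGraph / Convert).
from collections import defaultdict
from itertools import combinations

def distributional_thesaurus(scored_pairs, min_sig=2.0, max_words_per_feature=1000):
    # scored_pairs: {(word, feature): significance}, e.g. LMI scores from above
    by_feature = defaultdict(set)
    for (word, feature), sig in scored_pairs.items():
        if sig >= min_sig:
            by_feature[feature].add(word)
    sims = defaultdict(float)
    for feature, words in by_feature.items():
        if len(words) > max_words_per_feature:      # skip overly unspecific features
            continue
        for a, b in combinations(sorted(words), 2):
            sims[(a, b)] += 1
            sims[(b, a)] += 1
    entries = defaultdict(list)
    for (a, b), score in sims.items():
        entries[a].append((b, score))
    return {w: sorted(neigh, key=lambda x: -x[1])[:10] for w, neigh in entries.items()}
```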

Page 28: Quantifying Semantics and Contextualizing Distributional

28

Distributional Thesaurus (DT)

 Computed from distributional similarity statistics
 An entry for a target word consists of a ranked list of neighbors

meeting: meeting 288, meetings 102, hearing 89, session 68, conference 62, summit 51, forum 46, workshop 46, hearings 46, ceremony 45, sessions 41, briefing 40, event 40, convention 38, gathering 36, ...

articulate: articulate 89, explain 19, understand 17, communicate 17, defend 16, establish 15, deliver 14, evaluate 14, adjust 14, manage 13, speak 13, change 13, answer 13, maintain 13, ...

[Illustration: "immaculate" and "perfect" share first-order features such as amod(condition,@@), amod(timing,@@), nsubj(@@,hair), cop(@@,remains), amod(Church,@@); sharing such first-order features makes them second-order similar (score 3 in this example).]

Page 29: Quantifying Semantics and Contextualizing Distributional

29

From Distributional to Contextual Thesaurus

 Distributional Thesaurus (DT)
   computed from global statistics
   ranked list of most similar terms, mix of senses
   words that have similar contexts, but will not fit in all contexts of the target

 Contextual Thesaurus (CT)
   use the DT as a source for similar terms
   re-rank entries according to how they fit in a given context
   correct senses should be ranked higher

Why not use (only) a lexical resource for word sense disambiguation/expansion?
   all resources leak
   we want to be agnostic of sense distinctions
   data-driven, language- and domain-independent

Page 30: Quantifying Semantics and Contextualizing Distributional

30

Single target expansion

Target: cold, with context clues caught (dobj) and nasty (amod)

expansion | caught (dobj) | nasty (amod) | +1 smoothed harmonic mean | rank
temperature | 0 | 0 | 1 |
heat | 42.0 | 6.3 | 12.5 | 3
weather | 0 | 139.4 | 1.98 |
rain | 10.8 | 0 | 1.84 |
flu | 89.9 | 12.0 | 22.7 | 2
wind | 454.4 | 0 | 1.99 |
chill | 8.6 | 0 | 1.81 |
disease | 59.2 | 27.9 | 39.05 | 1

C. Biemann, M. Riedl (submitted): Text: Now in 2D! A Framework for Lexical Expansion with Contextual Similarity
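A minimal sketch of this contextualization step: only the per-clue similarity scores and the +1 smoothed harmonic mean come from the slide; the function names are assumptions.

```python
# Re-rank expansions of a target word by how well they fit each context clue,
# combining the per-clue scores with a +1 smoothed harmonic mean.
def smoothed_harmonic_mean(scores):
    smoothed = [s + 1.0 for s in scores]
    return len(smoothed) / sum(1.0 / s for s in smoothed)

def rerank(expansion_scores):
    # expansion_scores: {expansion: [score per context clue]}
    ranked = {w: smoothed_harmonic_mean(s) for w, s in expansion_scores.items()}
    return sorted(ranked.items(), key=lambda x: -x[1])

scores = {"heat": [42.0, 6.3], "flu": [89.9, 12.0], "disease": [59.2, 27.9],
          "wind": [454.4, 0.0], "weather": [0.0, 139.4]}
print(rerank(scores))  # disease (39.05) > flu (22.7) > heat (12.5) > ...
```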

Page 31: Quantifying Semantics and Contextualizing Distributional

31

Lexical Substitution Task Results

 Semeval 2007 lexical substitution task: 5 annotators provide substitutions in context for 200 words/2000 contexts

 oot “out of ten” scoring: how many of the top 10 system answers were also provided by annotators

 mode scoring: give higher weight acc. to substitution frequency

 Adverbs with only 1 dependency are hard
 improvements in all other settings
 this is still far from the state of the art, but it is the first system that does not presuppose a list of synonyms

Page 32: Quantifying Semantics and Contextualizing Distributional

32

All-words Contextualization

 like POS tagging, but with the top-n similar words as a different tag set each time
 yields a probability distribution over the two-dimensional text vocabulary, which can be used as a representation for the original text

Page 33: Quantifying Semantics and Contextualizing Distributional

33

Parse and obtain priors


I caught a cold

Priors (top candidates per token):

I: I 0.24, we 0.15, they 0.13, you 0.13, he 0.09, she 0.09
caught: caught 0.09, picked 0.02, threw 0.02, took 0.02, grabbed 0.02, got 0.02
a: a 0.24, another 0.12, every 0.09, the 0.08, an 0.08, this 0.07
cold: cold 0.27, heat 0.03, rain 0.02, sun 0.02, flu 0.02, wind 0.02
nasty: nasty 0.11, ugly 0.02, bad 0.02, negative 0.01, interesting 0.01, serious 0.01

Dependencies: nsubj, dobj, amod, det

Page 34: Quantifying Semantics and Contextualizing Distributional

34

Reorder (to simplify the example)


I: I 0.24, we 0.15, they 0.13, you 0.13, he 0.09, she 0.09
caught: caught 0.09, picked 0.02, threw 0.02, took 0.02, grabbed 0.02, got 0.02
a: a 0.24, another 0.12, every 0.09, the 0.08, an 0.08, this 0.07
cold: cold 0.27, heat 0.03, rain 0.02, sun 0.02, flu 0.02, wind 0.02
nasty: nasty 0.11, ugly 0.02, bad 0.02, negative 0.01, interesting 0.01, serious 0.01

Dependencies: nsubj, dobj, amod, det

Page 35: Quantifying Semantics and Contextualizing Distributional

35

Conditional Probabilities and Priors

 Run Viterbi algorithm over all possible paths

 Sum path probabilities per word


caught: caught 0.09, picked 0.02, threw 0.02, took 0.02, grabbed 0.02, got 0.02
nasty: nasty 0.11, ugly 0.02, bad 0.02, negative 0.01, interesting 0.01, serious 0.01
cold: cold 0.27, heat 0.03, rain 0.02, sun 0.02, flu 0.02, wind 0.02

[Figure: lattice over these candidates connected via the dobj and amod dependencies, with conditional probabilities on the edges (e.g. 0.1, 0.14, 0.12, 0.06, 0.05, 0.04, 0.01, ...).]
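A brute-force sketch of the path-summing idea: the slides use the Viterbi algorithm, whereas this toy version simply enumerates all candidate combinations and is only feasible for tiny lattices. The edge probabilities below are hypothetical: the numbers appear on the slide, but their assignment to specific word pairs is assumed, as are the function names.

```python
# Sum path probabilities per candidate word over a small lattice and normalize
# per position, yielding a contextualized distribution over candidates.
from itertools import product
from collections import defaultdict

def contextualize(priors, edge_probs, default_edge=0.01):
    # priors: {position: {candidate: prior}}
    # edge_probs: {((pos_a, cand_a), (pos_b, cand_b)): conditional probability}
    positions = sorted(priors)
    totals = defaultdict(float)
    for combo in product(*(list(priors[p].items()) for p in positions)):
        path_p = 1.0
        for _, (cand, prior) in zip(positions, combo):
            path_p *= prior
        for i in range(len(positions) - 1):
            a = (positions[i], combo[i][0])
            b = (positions[i + 1], combo[i + 1][0])
            path_p *= edge_probs.get((a, b), default_edge)
        for p, (cand, _) in zip(positions, combo):
            totals[(p, cand)] += path_p      # sum path probabilities per word
    norm = defaultdict(float)
    for (p, cand), v in totals.items():
        norm[p] += v
    return {(p, cand): v / norm[p] for (p, cand), v in totals.items()}

priors = {"caught": {"caught": 0.09, "took": 0.02},
          "cold":   {"cold": 0.27, "flu": 0.02},
          "nasty":  {"nasty": 0.11, "bad": 0.02}}
edges = {(("caught", "caught"), ("cold", "cold")): 0.14,   # hypothetical mapping
         (("caught", "took"),   ("cold", "cold")): 0.04,
         (("cold", "cold"),     ("nasty", "nasty")): 0.12,
         (("cold", "flu"),      ("nasty", "bad")): 0.01}
print(contextualize(priors, edges))
```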

Page 36: Quantifying Semantics and Contextualizing Distributional

36

Representation: probability distribution over words

 Re-rank by sum of path distributions

 cut on threshold


caught: caught 0.87, took 0.12, got 0.01
nasty: nasty 0.62, bad 0.36, serious 0.02
cold: cold 0.98, flu 0.02, heat 0.002, wind 0.0002

(dependencies: dobj, amod)

Page 37: Quantifying Semantics and Contextualizing Distributional

37

Representation: probability distribution over words

 Re-rank by sum of path distributions

 cut on threshold


caught: caught 0.87
nasty: nasty 0.62
cold: cold 0.98

(dependencies: dobj, amod)

Page 38: Quantifying Semantics and Contextualizing Distributional

38

LANGUAGE TO KNOWLEDGE MAPPING

Page 39: Quantifying Semantics and Contextualizing Distributional

39

Implicit vs. Explicit sense distinctions

 word sense disambiguation is handled implicitly by the contextual thesaurus

  this might be sufficient for paraphrasing and semantic text similarity

But:
 hard to link these implicit sense rankings with a given ontology
 hard to find types or hypernym labels

Solution:
 clustering for word sense induction
 disambiguation to sense clusters
 automatic mapping of sense clusters into a taxonomy/ontology

Page 40: Quantifying Semantics and Contextualizing Distributional

40

DT entry "paper#NN" with contexts (similar word, score s, common contexts):

newspaper#NN 45 told#VBD#-dobj column#NN#-prep_in local#JJ#amod editor#NN#-poss edition#NN#-prep_of editor#NN#-prep_of hometown#NN#nn industry#NN#-nn clips#NNS#-nn shredded#JJ#amod pick#VB#-dobj news#NNP#appos daily#JJ#amod writes#VBZ#-nsubj write#VB#-prep_for wrote#VBD#-prep_for wrote#VBD#-prep_in wrapped#VBN#-prep_in reading#VBG#-prep_in reading#VBG#-dobj read#VBD#-prep_in read#VBD#-dobj read#VBP#-prep_in read#VB#-dobj read#VB#-prep_in record#NN#prep_of article#NN#-prep_in reports#VBZ#-nsubj reported#VBD#-nsubj printed#VBN#amod printed#VBD#-nsubj printed#VBN#-prep_in published#VBN#-prep_in published#VBN#partmod published#VBD#-nsubj sunday#NNP#nn section#NN#-prep_of school#NN#nn saw#VBD#-prep_in ad#NN#-prep_in copy#NN#-prep_of page#NN#-prep_of pages#NNS#-prep_of morning#NN#nn story#NN#-prep_in

book#NN 33 recent#JJ#amod read#VB#-dobj read#VBD#-dobj reading#VBG#-dobj edition#NN#-prep_of printed#VBN#amod industry#NN#-nn described#VBN#-prep_in writing#VBG#-dobj wrote#VBD#-prep_in wrote#VBD#rcmod write#VB#-dobj written#VBN#rcmod written#VBN#-dobj wrote#VBD#-dobj pick#VB#-dobj photo#NN#nn co-author#NN#-prep_of co-authored#VBN#-dobj section#NN#-prep_of published#VBN#-dobj published#VBN#-nsubjpass published#VBD#-dobj published#VBN#partmod copy#NN#-prep_of buying#VBG#-dobj buy#VB#-dobj author#NN#-prep_of bag#NN#-nn bags#NNS#-nn page#NN#-prep_of pages#NNS#-prep_of titled#VBN#partmod

article#NN 28 authors#NNS#-prep_of original#JJ#amod notes#VBZ#-nsubj published#VBN#-dobj published#VBD#-dobj published#VBN#-nsubjpass published#VBN#partmod write#VB#-dobj wrote#VBD#rcmod wrote#VBD#-prep_in written#VBN#rcmod wrote#VBD#-dobj written#VBN#-dobj writing#VBG#-dobj reported#VBD#-nsubj describing#VBG#partmod described#VBN#-prep_in copy#NN#-prep_of said#VBD#-prep_in recent#JJ#amod read#VB#-dobj read#VB#-prep_in read#VBD#-dobj read#VBD#-prep_in reading#VBG#-dobj author#NN#-prep_of titled#VBN#partmod lancet#NNP#nn

magazine#NN 26 editor#NN#-poss editor#NN#-prep_of edition#NN#-prep_of industry#NN#-nn copy#NN#-prep_of article#NN#-prep_in ad#NN#-prep_in published#VBD#-nsubj published#VBN#partmod published#VBN#-prep_in page#NN#-prep_of pages#NNS#-prep_of story#NN#-prep_in buy#VB#-dobj wrote#VBD#-prep_in wrote#VBD#-prep_for printed#VBN#-prep_in printed#VBN#amod reading#VBG#-dobj read#VBD#-prep_in read#VB#-dobj reported#VBD#-nsubj reports#VBZ#-nsubj column#NN#-prep_for glossy#JJ#amod told#VBD#-dobj

plastic#NN 24 wrapped#VBD#-prep_in wrapped#VBN#-prep_in wood#NN#conj_and sheet#NN#-prep_of sheets#NNS#-prep_of shredded#JJ#amod bits#NNS#-prep_of paper#NN#-conj_and paper#NN#conj_and cardboard#NN#-conj_and cardboard#NN#conj_and pieces#NNS#-prep_of piece#NN#-prep_of rolls#NNS#-prep_of bags#NNS#-nn bags#NN#-nn bag#NN#-nn recycled#JJ#amod cups#NNS#-nn made#VBN#-prep_from white#JJ#amod glossy#JJ#amod glass#NN#-conj_and glass#NN#conj_and

metal#NN 23 bits#NNS#-prep_of made#VBN#-prep_from work#NN#-nn wood#NN#conj_and scrap#NN#nn paper#NN#-conj_and piece#NN#-prep_of pile#NN#-prep_of pieces#NNS#-prep_of plastic#NN#conj_and plastic#NN#-conj_and plastic#NN#conj_or plate#NN#-nn plates#NNS#-nn recycled#JJ#amod clip#NN#-nn products#NNS#-nn put#VBD#-prep_to put#VB#-prep_to glass#NN#-conj_and glass#NN#conj_and tons#NNS#-prep_of white#JJ#amod

Page 41: Quantifying Semantics and Contextualizing Distributional

41

Clustering of DT entries: Sense Induction


[Figure: Chinese Whispers clustering of the DT neighborhood graphs of bright#JJ and paper#NN into sense clusters.]

C. Biemann (2006): Chinese Whispers - an Efficient Graph Clustering Algorithm and its Application to Natural Language Processing Problems. Proceedings of the HLT-NAACL-06 Workshop on Textgraphs-06, New York, USA.
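A minimal sketch of the Chinese Whispers algorithm cited above, assuming the DT neighborhood is given as a weighted adjacency dict; the tiny example graph is illustrative.

```python
# Chinese Whispers: each node repeatedly adopts the label that has the highest
# total edge weight among its neighbors; label sets converge to clusters.
import random
from collections import defaultdict

def chinese_whispers(graph, iterations=20, seed=0):
    rng = random.Random(seed)
    labels = {node: node for node in graph}      # each node starts in its own class
    nodes = list(graph)
    for _ in range(iterations):
        rng.shuffle(nodes)
        for node in nodes:
            weight_per_label = defaultdict(float)
            for neighbor, weight in graph[node].items():
                weight_per_label[labels[neighbor]] += weight
            if weight_per_label:
                labels[node] = max(weight_per_label, key=weight_per_label.get)
    clusters = defaultdict(set)
    for node, label in labels.items():
        clusters[label].add(node)
    return list(clusters.values())

g = {"newspaper": {"magazine": 45, "book": 33},
     "magazine":  {"newspaper": 45, "book": 20},
     "book":      {"newspaper": 33, "magazine": 20},
     "plastic":   {"metal": 24},
     "metal":     {"plastic": 24}}
print(chinese_whispers(g))   # two clusters: {newspaper, magazine, book}, {plastic, metal}
```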

Page 42: Quantifying Semantics and Contextualizing Distributional

42

Features for Disambiguation

paper 0 (newspaper) read#VB#-dobj 45 reading#VBG#-dobj 45 write#VB#-dobj 38 read#VBD#-dobj 37 writing#VBG#-dobj 36 wrote#VBD#-dobj 34 original#JJ#amod 27 wrote#VBD#-prep_in 26 recent#JJ#amod 26 published#VBN#partmod 25 written#VBN#-dobj 23 published#VBN#-nsubjpass 20 published#VBD#-dobj 19 copy#NN#-prep_of 18 said#VBD#-prep_in 18 author#NN#-prep_of 17 pages#NNS#-prep_of 16 told#VBD#-dobj 15 buy#VB#-dobj 14 published#VBN#-prep_in 14 page#NN#-prep_of 14

paper 1 (material) piece#NN#-prep_of 21 pieces#NNS#-prep_of 17 made#VBN#-prep_from 13 bags#NNS#-nn 11 white#JJ#amod 9 paper#NN#-conj_and 9 glass#NN#-conj_and 9 products#NNS#-nn 9 industry#NN#-nn 8 plastic#NN#conj_and 8 plastic#NN#-conj_and 8 bits#NNS#-prep_of 8 bag#NN#-nn 8 plastic#NN#conj_or 8 sheet#NN#-prep_of 7 recycled#JJ#amod 7 tons#NNS#-prep_of 7 glass#NN#conj_and 7 buy#VB#-dobj 6 plates#NNS#-nn 6 pile#NN#-prep_of 6

These are shared by paper and the cluster members. Disambiguation: find features in context. I am reading an original paper on the paper .
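A minimal sketch of this feature-matching disambiguation, reusing a few of the feature weights listed above; everything else (function name, the toy context) is illustrative.

```python
# Score each induced sense by the summed weight of its features that occur in
# the context, and pick the highest-scoring sense.
def disambiguate(context_features, sense_features):
    # sense_features: {sense_id: {feature: weight}}
    scores = {sense: sum(features.get(f, 0.0) for f in context_features)
              for sense, features in sense_features.items()}
    return max(scores, key=scores.get), scores

senses = {"paper#0 (newspaper)": {"read#VB#-dobj": 45, "original#JJ#amod": 27},
          "paper#1 (material)":  {"piece#NN#-prep_of": 21, "recycled#JJ#amod": 7}}
ctx = ["original#JJ#amod", "read#VB#-dobj"]   # "I am reading an original paper ..."
print(disambiguate(ctx, senses))              # -> paper#0 (newspaper)
```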

Page 43: Quantifying Semantics and Contextualizing Distributional

43

Cluster Labeling with IS-A Relations

 Run Hearst IS-A patterns (e.g. NP such as NP, NP and NP) on a large collection of text and store (noisy) IS-A pairs with their frequency, if above a threshold

 Activate hypernyms from cluster entries

Typical regular polysemy in the medical domain:

influenza#0: viral gastroenteritis, bird flu, pulmonary anthrax, h1n1, tularaemia, west nile fever, mumps, influenza a, herpes zoster, respiratory infection, uri, phn, chicken pox, …

influenza#1: trivalent influenza vaccine, antiviral, influenza vaccine, amantadine, peramivir, chemoprophylaxis, influenza vaccination, vaccine, poultry, chickenpox vaccine, …

Hearst, M. Automatic Acquisition of Hyponyms from Large Text Corpora, Proceedings of the Fourteenth International Conference on Computational Linguistics, Nantes, France, July 1992.

Marti A. Hearst

Page 44: Quantifying Semantics and Contextualizing Distributional

44

Per-Cluster IS-A Pattern Counts

 Sum the counts of each ISA hypernym per cluster
 Multiply by the number of times it was found by the cluster members (sketched at the end of this slide)

influenza#0: viral gastroenteritis, bird flu, pulmonary anthrax, h1n1, tularaemia, west nile fever, mumps, influenza a, herpes zoster, respiratory infection, uri, phn, chicken pox, …

influenza#1: trivalent influenza vaccine, antiviral, influenza vaccine, amantadine, peramivir, chemoprophylaxis, influenza vaccination, vaccine, poultry, chickenpox vaccine, …

viral gastroenteritis ISA gastroenteritis 555 viral gastroenteritis ISA illness 27 viral gastroenteritis ISA flu 24 viral gastroenteritis ISA stomach flu 24 viral gastroenteritis ISA infection 18 viral gastroenteritis ISA cause 15 viral gastroenteritis ISA The Basics 14

bird flu ISA flu 829 bird flu ISA avian influenza 42 bird flu ISA infection 42 bird flu ISA influenza 27 bird flu ISA pandemic 27 bird flu ISA avian flu 16 bird flu ISA vaccine 15

chicken pox ISA pox 2390 chicken pox ISA virus 93 chicken pox ISA disease 92 chicken pox ISA infection 76 chicken pox ISA viral infection 69 chicken pox ISA symptom 43 chicken pox ISA illness 38

peramivir ISA inhibitor 10 peramivir ISA option 4 peramivir ISA neuraminidase inhibitor 3

antiviral ISA treatment 8 antiviral ISA medication 6 antiviral ISA drug 5

influenza vaccine ISA vaccine 1532 influenza vaccine ISA flumist 10 influenza vaccine ISA intranasal 10 influenza vaccine ISA 2010-XX-XX 8 influenza vaccine ISA LAIV 8 influenza vaccine ISA table 8 influenza vaccine ISA contrast 6

Work done in the MRP project, Sept 2012
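A minimal sketch of the per-cluster scoring described on this slide (sum of ISA counts times the number of proposing cluster members); the ISA counts are taken from the examples above, the function name is an assumption.

```python
# Aggregate hypernym candidates over a sense cluster: sum their ISA counts over
# the cluster members and multiply by the number of members that propose them.
from collections import defaultdict

def label_cluster(cluster_members, isa_counts):
    # isa_counts: {hyponym: {hypernym: count}}
    total, support = defaultdict(int), defaultdict(int)
    for member in cluster_members:
        for hypernym, count in isa_counts.get(member, {}).items():
            total[hypernym] += count
            support[hypernym] += 1
    scores = {h: total[h] * support[h] for h in total}
    return sorted(scores.items(), key=lambda x: -x[1])

isa = {"viral gastroenteritis": {"gastroenteritis": 555, "illness": 27, "flu": 24,
                                 "infection": 18},
       "bird flu": {"flu": 829, "infection": 42, "influenza": 27},
       "chicken pox": {"pox": 2390, "virus": 93, "disease": 92, "infection": 76}}
members = ["viral gastroenteritis", "bird flu", "chicken pox"]
print(label_cluster(members, isa)[:3])   # top-scoring hypernym candidates for this tiny sample
```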

Page 45: Quantifying Semantics and Contextualizing Distributional

45

Cluster Labeling with IS-A Patterns

influenza#0: viral gastroenteritis, bird flu, pulmonary anthrax, h1n1, tularaemia, west nile fever, mumps, influenza a, herpes zoster, respiratory infection, uri, phn, chicken pox, human papillomavirus, sars, shingle, upper respiratory tract infection, varicella infection, hepatitis a, omsk hemorrhagic fever, respiratory viral infection, monkeypox, tonsillitis, acute respiratory disease, †, h1n1 influenza, h1n1 virus, rubeola, avian flu, vhf, rotavirus infection, pneumonia, upper respiratory infection, rabies, a/h1n1, varicella, grippe, bacterial pneumonia, croup, acute bronchitis, avian flu virus, rhinovirus, primary coccidioidomycosis, respiratory tract disease, rubella, influenza a virus, cold, contagious disease, diphtheria, infection, scarlet fever, infectious disease, hib, poliomyelitis, rmsf, giardiasis, siv, virus, swine influenza virus, dengue fever, h3n2, primary herpes simplex, acute infection, primary hiv infection, infectious mononucleosis, swine influenza, hepatitis b, sepsis, varivax, influenza epidemic, yellow fever, secondary bacterial pneumonia, chickenpox, cholera, bird flu virus, q fever, flu symptom, whooping cough, stomach flu, influenza pneumonia, neonatal herpes simplex virus infection, pneumococcal infection, adenoviral infection, respiratory disease, mmr, distemper, egg allergy, strain, chikungunya, swine flu, reye's syndrome, non-a, viral infection, parainfluenza, cold symptom, sinus infection, anthrax, adenovirus infection, atypical measle, malaria, poliovirus, vee, hiv, h5n1, human immunodeficiency virus, viral shedding, hepatitis b infection, hpv, eye infection, tetanus, rhinotracheitis, human rabies, coccidioidomycosis, human influenza, fetal damage, adenovirus, hbv, rotavirus, ili, hepatitis, genital herpes simplex, reactive lymphocytosis, neonatal herpes simplex, sore throat, pertussis, mono, bcg, vaccinia, hantavirus, postherpetic neuralgia, illness, respiratory syncytial virus infection, superinfection, hepatitis e, zoster, dengue, h1n1 flu, strep throat, mononucleosis, guillain-barré, influenza virus, yf, japanese encephalitis, acute respiratory infection, typhoid fever, common cold, tick fever, polio, herpangina, pneumococcal pneumonia, cowpox, viral pneumonia, jev, complication, flu, human flu, measle, chikv, symptomatic aortic stenosis, bronchitis, mumps orchitis, virus infection, avian influenza, lyme disease, fhv, hav, shigellosis, meningococcal disease, kyasanur forest disease, smallpox, meningococcal infection, sinusitis

influenza#1: trivalent influenza vaccine, antiviral, influenza vaccine, amantadine, peramivir, chemoprophylaxis, influenza vaccination, vaccine, poultry, chickenpox vaccine, live attenuated vaccine, neuraminidase inhibitor, oseltamivir, inactivated vaccine, oscillococcinum, tamiflu, relenza, antiviral drug, pneumococcal vaccine, rimantadine, zanamivir, shot, antiviral agent, flumist, flu vaccine, laiv, influenza virus vaccine

Aggregated hypernym labels for influenza#0: infection(3310937) disease(1748000) virus(817950) cause(783692) symptom(578480) fever(375228) condition(209022) illness(192675) complication(161158) influenza(155469)

Aggregated hypernym labels for influenza#1: vaccine(38566) drug(9990) agent(5004) vaccination(3960) treatment(2796) inhibitor(1504) medication(792) oseltamivir(752) medicine(448) zanamivir(396)

Work done in the MRP project, Sept 2012

Page 46: Quantifying Semantics and Contextualizing Distributional

46


Page 47: Quantifying Semantics and Contextualizing Distributional

47

Conclusion

 Quantifying Semantics with Complex Network Analysis
   transitivity measure distinguishes and quantifies deficiencies in language models
   motif profile quantifies synonymy and polysemy

 Two-dimensional Text: lexical expansion methods as a new metaphor

 Context-independent expansion: Distributional Thesaurus
   semantic text similarity results
   knowledge-based word sense disambiguation results

 Context-dependent expansion: Contextual Thesaurus
   lexical substitution results

 Notions of Ambiguity
   induce word sense clusters
   handle ambiguity by re-ranking the expansions

 Linking to Taxonomies
   label clusters with ISA-pattern extractions

Page 48: Quantifying Semantics and Contextualizing Distributional

48

Questions, Discussion, Comments?

THANKS!