
LEDIR: An Unsupervised Algorithm for

Learning Directionality of Inference Rules

Advisor: Hsin-Hsi Chen
Reporter: Chi-Hsin Yu
Date: 2007.12.11

From EMNLP & CoNLL 2007(Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning)

Outline

Introduction
Related Work
Learning Directionality of Inference Rules
Experimental Setup
Experimental Results
Conclusion

Introduction (1)

Inference rule: X eats Y ⇔ X likes Y

Examples:
"I eat spicy food." ⇒ "I like spicy food." (YES)
"I like rollerblading." ⇒ "I eat rollerblading." (NO)

Preferred: X eats Y ⇒ X likes Y (the rule is asymmetric)

Four cases for a candidate rule pi ⇔ pj: one direction holds, the other direction holds, both hold (bidirectional), or no plausible inference holds (case 4).

Plausibility: 2 sets — {1, 2, 3} (plausible) vs. {4} (not plausible)
Directionality: 3 sets — {1}, {2}, {3}

Introduction (2)

Applications (for improving the performance of):
QA (Harabagiu and Hickl, 2006)
Multi-Document Summarization (Barzilay et al., 1999)
IR (Anick and Tipirneni, 1999)

Proposed algorithm: LEDIR (LEarning Directionality of Inference Rules, pronounced "Leader")
Filtering incorrect rules (case 4)
Identifying the directionality of the correct ones (case 1, 2, or 3)

Related Work

Learning inference rules
Barzilay and McKeown (2001) for paraphrases; DIRT (Lin and Pantel 2001) and TEASE (Szpektor et al. 2004) for inference rules
These approaches give low precision and bidirectional rules only

Learning directionality
Chklovski and Pantel (2004), Zanzotto et al. (2006), Torisawa (2006), Geffet and Dagan (2005)

Learning Directionality of Inference Rules (1) – Formal Definition

<x, p, y>: p is a binary semantic relation; x and y are entities.
p can be a verb or another type of relation.

Plausibility: 2 sets — {1, 2, 3} vs. {4}
Directionality: 3 sets — {1}, {2}, {3}

Learning Directionality of Inference Rules (2) – Underlying Assumptions

Distributional hypothesis (Harris 1954): words that appear in the same contexts tend to have similar meanings; used here to model lexical semantics.

Directionality hypothesis: if two binary semantic relations tend to occur in similar contexts and the first one occurs in significantly more contexts than the second, then the second most likely implies the first and not vice versa.

Generality example: "X eats Y" occurs 3,000 times while "X likes Y" occurs 8,000 times, so the rule should be X eats Y ⇒ X likes Y (see the sketch below).
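A minimal sketch of the directionality hypothesis applied to raw context counts; the function name and the 2.0 ratio standing in for "significantly more contexts" are illustrative assumptions, not values from the paper.

    def infer_direction(count_i, count_j, generality_ratio=2.0):
        """Toy directionality check: the relation seen in significantly
        more contexts is the more general one, so the other implies it.
        The 2.0 ratio is an assumed stand-in for 'significantly more'."""
        if count_j >= generality_ratio * count_i:
            return "p_i => p_j"   # p_j is more general
        if count_i >= generality_ratio * count_j:
            return "p_j => p_i"   # p_i is more general
        return "p_i <=> p_j"      # comparable generality: bidirectional

    # Counts from the slide: "X eats Y" 3,000 times, "X likes Y" 8,000 times.
    print(infer_direction(3000, 8000))  # p_i => p_j, i.e. X eats Y => X likes Y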

Learning Directionality of Inference Rules (3) – Underlying Assumptions (cont.)

Concepts in semantic space are much richer for reasoning about inferences than simple surface words.

The context of a relation p of the form <x, p, y> is modeled using the semantic classes cx and cy of the words that can be instantiated for x and y respectively.

Context similarity of two relations: overlap coefficient, |X ∩ Y| / min(|X|, |Y|)
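The overlap coefficient is a simple set statistic; a small sketch with made-up context sets (the class names are illustrative):

    def overlap_coefficient(x, y):
        """|X ∩ Y| / min(|X|, |Y|); 1.0 when one set is contained in the other."""
        if not x or not y:
            return 0.0
        return len(x & y) / min(len(x), len(y))

    # Hypothetical semantic-class contexts for two relations.
    contexts_eats = {"individual", "animal", "food"}
    contexts_likes = {"individual", "animal", "food", "activity"}
    print(overlap_coefficient(contexts_eats, contexts_likes))  # 1.0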

Learning Directionality of Inference Rules (4) – Selectional Preferences

Relational selectional preferences (RSPs) of a binary relation p in <X, p, Y>: the sets of semantic classes C(x) and C(y) of the words x and y.
C(x) = { cx : x appears in an instance <x, p, y>, where cx is the class of term x }
C(y) = { cy : y appears in an instance <x, p, y>, where cy is the class of term y }

Example for "x likes y", using semantic classes from WordNet:
C(x) = {individual, social_group, …}
C(y) = {individual, food, activity, …}
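A sketch of how C(x) and C(y) could be collected from corpus instances of a single relation; the instance triples and the word-to-class lexicon are made up and stand in for CBC clusters or WordNet classes:

    def relational_selectional_preferences(instances, word_to_classes):
        """Collect C(x) and C(y): the semantic classes of all words observed
        as x and as y in instances <x, p, y> of one relation p."""
        c_x, c_y = set(), set()
        for x, _p, y in instances:
            c_x.update(word_to_classes.get(x, ()))
            c_y.update(word_to_classes.get(y, ()))
        return c_x, c_y

    # Hypothetical instances of "x likes y" and a toy class lexicon.
    instances = [("John", "likes", "pasta"), ("Mary", "likes", "rollerblading")]
    lexicon = {"John": {"individual"}, "Mary": {"individual"},
               "pasta": {"food"}, "rollerblading": {"activity"}}
    print(relational_selectional_preferences(instances, lexicon))
    # ({'individual'}, {'food', 'activity'})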

Learning Directionality of Inference Rules (5) – Inference Plausibility and Directionality

Context similarity of two relations: the overlap coefficient of the RSPs of pi and pj.

Learning Directionality of Inference Rules (6) – Inference Plausibility and Directionality (cont.)

The thresholds α (for the plausibility decision) and β (for the directionality decision) will be determined by experiments.
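A sketch of how the two thresholds could drive the decision, assuming α gates plausibility via context similarity and β gates directionality via the ratio of context counts; this is a reconstruction of the decision logic, not the paper's exact formulation.

    def classify_rule(similarity, n_contexts_i, n_contexts_j, alpha, beta):
        """similarity: context similarity of p_i and p_j (e.g. overlap coefficient);
        n_contexts_*: how many semantic-class contexts each relation occurs in;
        alpha, beta: thresholds tuned on the development set."""
        if similarity < alpha:
            return "no plausible inference"        # case 4: filter the rule
        ratio = n_contexts_i / n_contexts_j
        if ratio >= beta:
            return "p_j => p_i"                    # p_i is notably more general
        if ratio <= 1.0 / beta:
            return "p_i => p_j"                    # p_j is notably more general
        return "p_i <=> p_j"                       # bidirectional

    print(classify_rule(0.9, 3000, 8000, alpha=0.5, beta=2.0))  # p_i => p_j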

Learning Directionality of Inference Rules (7) – Two Models (JRM and IRM)

Model 1: Joint Relational Model (JRM)
Counts the actual occurrences of relation p in the corpus, i.e. the class pairs <cx, cy> that are jointly observed with p.

Model 2: Independent Relational Model (IRM)
Collects the semantic classes of x and of y independently and takes their Cartesian product as the context set when computing the context similarity of the two relations (sketched below).
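A sketch of the difference between the two context models, under the obvious reading of the slide (observed class pairs vs. a Cartesian product of independently collected classes); the helper names and data shapes are assumptions:

    from itertools import product

    def jrm_contexts(instances, word_to_classes):
        """JRM: contexts are the (cx, cy) class pairs actually observed
        together in instances <x, p, y> of the relation."""
        contexts = set()
        for x, _p, y in instances:
            for cx in word_to_classes.get(x, ()):
                for cy in word_to_classes.get(y, ()):
                    contexts.add((cx, cy))
        return contexts

    def irm_contexts(instances, word_to_classes):
        """IRM: collect C(x) and C(y) independently, then take their
        Cartesian product as the context set."""
        c_x, c_y = set(), set()
        for x, _p, y in instances:
            c_x.update(word_to_classes.get(x, ()))
            c_y.update(word_to_classes.get(y, ()))
        return set(product(c_x, c_y))

Either context set can then be scored with the overlap coefficient above to compare two relations.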

Experiment Setup (1)

Inference rules: chosen from the DIRT resource (Lin and Pantel 2001).
DIRT consists of 12 million rules extracted from 1 GB of newspaper text.

Experiment Setup (2)

Semantic classes
Must have the right balance between abstraction and discrimination.

The first set of semantic classes: obtained by running the CBC clustering algorithm (Pantel and Lin, 2002) on the TREC-9 and TREC-2002 newswire collections, consisting of over 600 million words; this resulted in 1628 clusters, each representing a semantic class.

The second set of semantic classes: obtained by using WordNet 2.1 (Fellbaum 1998); a cut at depth four resulted in a set of 1287 semantic classes (WordNet noun hierarchy only).

Experiment Setup (3)

Implementation: parsed the 1999 AP newswire collection, consisting of 31 million words, with Minipar (Lin 1993).

Gold Standard Construction
Randomly sampled 160 inference rules of the form pi ⇔ pj from DIRT; removed 3 nominalization rules, resulting in 157 rules.
Two annotators: 57 rules were used as a training set to train the annotators, and 100 rules were used as a blind test set for the two annotators.
Inter-annotator agreement: kappa = 0.63; the annotators revised the disagreements together to produce the final gold standard.
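For reference, a minimal agreement computation, assuming Cohen's kappa (the paper may use a different kappa variant); the label lists are placeholders:

    def cohens_kappa(labels_a, labels_b):
        """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance)."""
        n = len(labels_a)
        p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
        categories = set(labels_a) | set(labels_b)
        p_chance = sum((labels_a.count(c) / n) * (labels_b.count(c) / n)
                       for c in categories)
        return (p_observed - p_chance) / (1 - p_chance) if p_chance < 1 else 1.0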

Experiment Setup (4)

Baselines

B-random: randomly assigns one of the four possible tags to each candidate inference rule.
B-frequent: assigns the most frequently occurring tag in the gold standard to each candidate inference rule.
B-DIRT: assumes each inference rule is bidirectional and assigns the bidirectional tag to each candidate inference rule.
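A sketch of the three baselines; the tag names and the dictionary shapes are assumptions for illustration:

    import random
    from collections import Counter

    TAGS = ("p_i => p_j", "p_j => p_i", "p_i <=> p_j", "no inference")  # assumed names

    def b_random(rules):
        """B-random: one of the four tags uniformly at random per rule."""
        return {rule: random.choice(TAGS) for rule in rules}

    def b_frequent(rules, gold):
        """B-frequent: the single most frequent tag in the gold standard."""
        most_common = Counter(gold.values()).most_common(1)[0][0]
        return {rule: most_common for rule in rules}

    def b_dirt(rules):
        """B-DIRT: every DIRT rule is assumed bidirectional."""
        return {rule: "p_i <=> p_j" for rule in rules}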

Experimental Results (1) – Evaluation Criterion

Parameter combinations: ran all the algorithms with different parameter combinations on the development set (the 57 DIRT rules), resulting in a total of 420 experiments.
Used the accuracy statistic to obtain the best parameter combination for each of the four systems.
Then used these parameter values to obtain the corresponding percentage accuracies on the test set for each of the four systems (see the parameter-selection sketch below).
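A sketch of the development-set parameter sweep; `classify` stands in for any of the four systems, and the grids of α and β values are hypothetical:

    def accuracy(predictions, gold):
        """Fraction of rules whose predicted tag matches the gold tag."""
        return sum(predictions[r] == gold[r] for r in gold) / len(gold)

    def best_parameters(dev_rules, dev_gold, classify, alphas, betas):
        """Try every (alpha, beta) combination on the development rules
        and keep the one with the highest accuracy."""
        best = None
        for alpha in alphas:
            for beta in betas:
                predictions = {r: classify(r, alpha, beta) for r in dev_rules}
                score = accuracy(predictions, dev_gold)
                if best is None or score > best[0]:
                    best = (score, alpha, beta)
        return best  # (best accuracy, best alpha, best beta)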

Experimental Results (2)


Experimental Results (3)

Baseline accuracies marked in the charts: 66% and 48.48%.

Conclusion

The problem of semantic inference: fundamental to understanding natural language, and an integral part of many natural language applications.

The Directionality Hypothesis: can indeed be used to filter incorrect inference rules.

This result is one step in the direction of solving the basic problem of semantic inference

Thanks!!