MEASURING THE SIMILARITY BETWEEN IMPLICIT SEMANTIC RELATIONS USING WEB SEARCH ENGINES
Danushka Bollegala, Yutaka Matsuo, Mitsuru Ishizuka (WSDM’09)
Speaker: Yi-Ling Tai
Date: 2009/11/23


OUTLINE
- Introduction
- Method
  - Retrieving Contexts
  - Extracting Lexical Patterns
  - Identifying Semantic Relations
  - Measuring Relational Similarity
- Experiments
- Conclusions


INTRODUCTION
Implicit semantic relations between two words:
- (Google, YouTube): acquisition
- (ostrich, bird): "is a large" (an ostrich is a large bird)

Different word-pairs can share similar semantic relations:
- (Google, YouTube) → (Yahoo, Inktomi)
- (ostrich, bird) → (lion, cat)

This paper proposes a method to compute the similarity between the implicit semantic relations in two word-pairs.


OUTLINE OF THE SIMILARITY METHOD


OUTLINE OF THE SIMILARITY METHOD
- Web search component: queries a Web search engine to find the contexts in which the two words co-occur.
- Pattern extraction component: extracts lexical patterns that express semantic relations.
- Pattern clustering component: clusters the patterns to identify particular relations.
- Similarity computation component: computes the relational similarity between two word-pairs.


RETRIEVING CONTEXTS
- Snippets: brief summaries that Web search engines return along with the search results.
- A snippet containing both words captures the local context in which they co-occur.
- Example query: "Google * * YouTube"


RETRIEVING CONTEXTS
- "*" is the wildcard operator; it matches one word or none.
- To retrieve snippets for a word pair (A, B), the following queries are issued: "A * B", "B * A", "A * * B", "B * * A", "A * * * B", "B * * * A", and A B.
- The two query words therefore co-occur within a maximum of three words.
- The quotation marks (" ") ensure that the two words appear in the specified order.
- Snippets are removed as duplicates if they contain the exact same sequence of all words.
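The query set above can be generated mechanically. The Python sketch below builds the seven queries for a pair (A, B) and drops duplicate snippets. The function names are hypothetical, no search API is called, and treating two snippets as duplicates when their whitespace-separated word sequences are identical is an assumed reading of "the exact sequence of all words".

```python
def build_queries(a: str, b: str) -> list[str]:
    """Return the seven queries listed on the slide for the word pair (a, b)."""
    queries = []
    for n_wildcards in (1, 2, 3):
        gap = " ".join(["*"] * n_wildcards)
        # Quoted phrase queries fix the word order: "A * B", then "B * A", and so on.
        queries.append(f'"{a} {gap} {b}"')
        queries.append(f'"{b} {gap} {a}"')
    # Final unquoted query: no order or distance constraint on A and B.
    queries.append(f"{a} {b}")
    return queries


def dedupe_snippets(snippets: list[str]) -> list[str]:
    """Keep the first snippet for each distinct word sequence (assumed duplicate test)."""
    seen = set()
    unique = []
    for snippet in snippets:
        key = tuple(snippet.split())
        if key not in seen:
            seen.add(key)
            unique.append(snippet)
    return unique


for q in build_queries("Google", "YouTube"):
    print(q)
```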


EXTRACTING LEXICAL PATTERNS
- A shallow lexical pattern extraction algorithm extracts the semantic relations between two words from Web snippets.
- It does not require language-dependent preprocessing.
- The algorithm consists of the following three steps.

Step 1:
- Replace the two words with the variables X and Y.
- Replace all numeric values with D.
- Do not remove punctuation marks.
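Step 1 can be sketched as a token-level substitution. In this Python sketch the tokenizer (numbers, words, or single punctuation marks) and the function name are assumptions, matching is case-insensitive by assumption, and each of the two words is assumed to be a single token, so multi-word named entities are not handled.

```python
import re

# Assumed tokenizer: numbers (including "1.65" or "1,650"), words, or single punctuation marks.
TOKEN_RE = re.compile(r"\d+(?:[.,]\d+)*|\w+|[^\w\s]")
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")


def normalize_snippet(snippet: str, word_a: str, word_b: str) -> list[str]:
    """Step 1: replace the pair's words with X and Y and numbers with D; keep punctuation."""
    tokens = []
    for tok in TOKEN_RE.findall(snippet):
        if tok.lower() == word_a.lower():
            tokens.append("X")
        elif tok.lower() == word_b.lower():
            tokens.append("Y")
        elif NUMBER_RE.fullmatch(tok):
            tokens.append("D")
        else:
            tokens.append(tok)  # punctuation marks are kept, not removed
    return tokens


print(normalize_snippet("Google acquired YouTube for $1.65 billion in 2006.", "Google", "YouTube"))
```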


EXTRACTING LEXICAL PATTERNS
Step 2: generate the subsequences of each processed snippet that satisfy the following conditions:
- Exactly one X and one Y must occur in a subsequence.
- The maximum length of a subsequence is L words.
- A single gap must not exceed g words.
- The total length of all gaps must not exceed G words.
- All negation contractions are expanded (e.g., didn’t → did not).

Step 3: select the subsequences whose frequency is greater than N.
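Steps 2 and 3 can be sketched as a brute-force enumeration under the L, g, and G constraints, followed by a frequency filter. This Python sketch is not the authors' modified PrefixSpan. It assumes the snippets were already tokenized and normalized as in Step 1 with contractions expanded, and it assumes "frequency" means the number of snippets containing a pattern. The function names are hypothetical; the defaults L = 5, g = 2, G = 4, N = 1 mirror the values in the experiments.

```python
from collections import Counter


def extract_patterns(tokens: list[str], L: int, g: int, G: int) -> set[str]:
    """Step 2: subsequences with exactly one X and one Y, at most L tokens,
    each gap at most g skipped tokens, and total gap length at most G."""
    patterns = set()
    n = len(tokens)

    def extend(seq: list[str], last: int, gap_total: int) -> None:
        xs, ys = seq.count("X"), seq.count("Y")
        if xs > 1 or ys > 1:
            return  # adding tokens can never remove an extra X or Y
        if xs == 1 and ys == 1:
            patterns.add(" ".join(seq))
        if len(seq) == L:
            return
        # The next token may skip at most g tokens after position `last`.
        for nxt in range(last + 1, min(n, last + g + 2)):
            gap = nxt - last - 1
            if gap_total + gap > G:
                break  # a larger nxt only makes the gap longer
            extend(seq + [tokens[nxt]], nxt, gap_total + gap)

    for start in range(n):
        extend([tokens[start]], start, 0)
    return patterns


def frequent_patterns(snippets: list[list[str]], L=5, g=2, G=4, N=1) -> dict[str, int]:
    """Step 3: keep patterns whose frequency (snippets containing them) exceeds N."""
    counts = Counter()
    for tokens in snippets:
        counts.update(extract_patterns(tokens, L, g, G))
    return {p: c for p, c in counts.items() if c > N}
```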


EXTRACTING LEXICAL PATTERNS
- The subsequences are generated with a modified PrefixSpan algorithm.
- All the words in a snippet are considered; extraction is not limited to the mid-fix (the words between X and Y).
- Example patterns: "X to acquire Y", "X acquire Y", "X to acquire Y for"
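For reference, unmodified PrefixSpan mines frequent subsequences by growing a prefix one token at a time and recursing on the projected suffixes. The Python sketch below is the textbook version, not the authors' modification: it enforces neither the X/Y requirement nor the gap limits g and G from Step 2. The function name and the parameters min_support and max_len are hypothetical. Because patterns can start at any token, it naturally considers all the words in a snippet rather than only the mid-fix.

```python
from collections import defaultdict


def prefixspan(sequences: list[list[str]], min_support: int, max_len: int) -> dict[tuple, int]:
    """Textbook PrefixSpan over token sequences (pseudo-projection, one item per element).

    Returns every sequential pattern with support >= min_support and length <= max_len.
    Support counts sequences, not occurrences.
    """
    results = {}

    def mine(projected: list[tuple[int, int]], prefix: tuple) -> None:
        # projected: (sequence index, start of the suffix that follows the prefix)
        support = defaultdict(int)
        for seq_id, start in projected:
            for item in set(sequences[seq_id][start:]):
                support[item] += 1
        for item, count in support.items():
            if count < min_support:
                continue
            new_prefix = prefix + (item,)
            results[new_prefix] = count
            if len(new_prefix) == max_len:
                continue
            # Project each suffix on the first occurrence of `item`.
            next_projected = []
            for seq_id, start in projected:
                seq = sequences[seq_id]
                for i in range(start, len(seq)):
                    if seq[i] == item:
                        next_projected.append((seq_id, i + 1))
                        break
            mine(next_projected, new_prefix)

    mine([(i, 0) for i in range(len(sequences))], ())
    return results
```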


IDENTIFYING SEMANTIC RELATIONS
- A semantic relation can be expressed using more than one pattern.
- If two word-pairs share many related patterns, we can expect a high relational similarity between them.
- To identify semantically related patterns, the lexical patterns are clustered using their distributions over word-pairs.


IDENTIFYING SEMANTIC RELATIONS
- $\mathbf{p}$: the word-pair frequency vector of pattern $p$. Its element for word-pair $(a_i, b_i)$ is $f(a_i, b_i, p)$, the frequency with which pattern $p$ occurs with that word-pair.
- SORT: sorts the patterns in descending order of their total occurrence over all word-pairs.
- $\mathbf{c}$: the vector of cluster $c$, i.e., the vector sum of the word-pair frequency vectors of all patterns that belong to that cluster.
- $\oplus$: vector addition.
- $\theta$: the similarity threshold.
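The definitions above describe a greedy, single-pass clustering: patterns are visited in SORT order, and each one joins the most similar existing cluster (updating the cluster vector with ⊕) if the similarity exceeds θ, otherwise it starts a new cluster. The Python sketch below follows that reading. Because the algorithm figure itself is missing, using cosine similarity as the similarity measure is an assumption, and the function names are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_patterns(vectors: dict[str, list[float]], theta: float) -> list[dict]:
    """Greedy single-pass clustering of word-pair frequency vectors (a sketch)."""
    # SORT: patterns with the largest total occurrence over all word-pairs come first.
    order = sorted(vectors, key=lambda p: sum(vectors[p]), reverse=True)
    clusters = []  # each: {"vector": summed cluster vector c, "patterns": members}
    for p in order:
        best, best_sim = None, -1.0
        for cluster in clusters:
            sim = cosine(vectors[p], cluster["vector"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim > theta:
            # c <- c (+) p: add the pattern's vector to the cluster vector.
            best["vector"] = [a + b for a, b in zip(best["vector"], vectors[p])]
            best["patterns"].append(p)
        else:
            clusters.append({"vector": list(vectors[p]), "patterns": [p]})
    return clusters
```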


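MEASURING RELATIONAL SIMILARITY
- Each word-pair $(a, b)$ is represented by a feature vector $\mathbf{x}$ with one element per pattern cluster.
- The elements of the feature vector are the total frequencies of the word-pair in each cluster: $x_i = \sum_{p \in c_i} f(a, b, p)$, where $f(a, b, p)$ is the frequency with which pattern $p$ occurs with $(a, b)$.
- The relational similarity between two word-pairs with feature vectors $\mathbf{x}$ and $\mathbf{y}$ is

$$\mathrm{relsim}(\mathbf{x}, \mathbf{y}) = \mathbf{x}^{\top} \Lambda \, \mathbf{y} \qquad (2)$$

- $\Lambda$ is a correlation matrix.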
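The feature vector and the bilinear form above can be sketched in a few lines of Python. The pattern names, frequencies, and off-diagonal correlation values in the usage lines are hypothetical, and the matrix is passed in rather than computed, since its exact definition is not recoverable from these slides. With the identity matrix, the measure reduces to a plain inner product of the two feature vectors.

```python
def feature_vector(pair_freqs: dict[str, int], clusters: list[list[str]]) -> list[int]:
    """x_i = sum of f(a, b, p) over the patterns p in cluster c_i."""
    return [sum(pair_freqs.get(p, 0) for p in cluster) for cluster in clusters]


def relsim(x: list[float], y: list[float], corr: list[list[float]]) -> float:
    """Bilinear form x^T Lambda y."""
    return sum(x[i] * corr[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


# Hypothetical clusters and pattern frequencies for two word-pairs.
clusters = [["X acquired Y", "X to acquire Y"], ["X was born in Y"]]
x = feature_vector({"X acquired Y": 3, "X to acquire Y": 2}, clusters)
y = feature_vector({"X acquired Y": 1, "X was born in Y": 2}, clusters)
identity = [[1, 0], [0, 1]]
corr = [[1.0, 0.5], [0.5, 1.0]]  # hypothetical correlation values
print(x, y, relsim(x, y, identity), relsim(x, y, corr))
```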


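MEASURING RELATIONAL SIMILARITY
- The correlation between clusters $c_i$ and $c_j$ is given by the $(i, j)$ element $\Lambda_{ij}$ of the correlation matrix $\Lambda$.
- The definition of $\Lambda_{ij}$ involves $c_i \cup c_j$, the union of the two clusters.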


EXPERIMENTS
Dataset:
- 100 instances (word or named-entity pairs)
- Five relation types: ACQUIRER-ACQUIREE, PERSON-BIRTHPLACE, CEO-COMPANY, COMPANY-HEADQUARTERS, PERSON-FIELD


EXPERIMENTS
- 20 instances were manually selected for each relation type, using Wikipedia, online newspapers, and company reviews.
- For each instance, snippets were downloaded using the Yahoo BOSS API.


EXPERIMENTS - LEXICAL PATTERNS
- The pattern extraction algorithm was run with L = 5, g = 2, and G = 4.
- The total number of unique patterns is 473,910.
- Only the 148,655 patterns that occur at least twice are selected (i.e., N = 1 in Step 3).


EXPERIMENTS - PATTERN CLUSTERS
[Figure: ratio of singletons to the total number of clusters]


EXPERIMENTS - RELATION CLASSIFICATION
- The proposed relational similarity measure is evaluated in a relation classification task using k-nearest neighbor classification.
- Evaluation measures: classification accuracy and average precision.
- Rel(r): a binary-valued function that returns 1 if the word-pair at rank r has the same relation as the query word-pair, and 0 otherwise.
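The evaluation can be sketched as follows. The average precision here uses the common definition (precision at rank r, summed over the ranks where Rel(r) = 1, divided by the number of relevant word-pairs in the ranked list); the slide's own formula is missing, so this definition is an assumption. The k-NN classifier takes a majority vote over the k most similar labeled word-pairs; ties go to the label seen first among the top-k. All function names are hypothetical.

```python
from collections import Counter


def average_precision(rels: list[int]) -> float:
    """AP over a ranked list, where rels[r - 1] = Rel(r)."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at rank r, counted only where Rel(r) = 1
    return total / hits if hits else 0.0


def knn_classify(query, labeled, similarity, k):
    """Majority label among the k labeled word-pairs most similar to the query."""
    ranked = sorted(labeled, key=lambda item: similarity(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def inner_product(u, v):
    return sum(a * b for a, b in zip(u, v))
```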


EXPERIMENTS - RELATION CLASSIFICATION
Similarity threshold θ = 0.955: 2,629 non-singleton clusters and 6,930 singletons.
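The singleton ratio plotted for the pattern clusters can be computed directly from these counts: 6,930 / (2,629 + 6,930) = 6,930 / 9,559 ≈ 0.725, i.e., about 72.5% of the clusters contain a single pattern. The function name in this Python sketch is hypothetical.

```python
def singleton_ratio(num_non_singleton: int, num_singleton: int) -> float:
    """Fraction of all clusters that contain exactly one pattern."""
    total = num_non_singleton + num_singleton
    return num_singleton / total if total else 0.0


# Cluster counts reported on this slide (2,629 non-singleton, 6,930 singletons).
ratio = singleton_ratio(2629, 6930)
print(f"total clusters = {2629 + 6930}, singleton ratio = {ratio:.3f}")
```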


EXPERIMENTS - RELATION CLASSIFICATION
- The 10 clusters with the largest number of lexical patterns.
- For each cluster, the four patterns that occur with the largest number of word-pairs.
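The selection described above can be sketched as two sorts: clusters by their number of patterns, then the patterns within each cluster by the number of word-pairs they occur with. In this Python sketch, counting the non-zero entries of a pattern's word-pair frequency vector as "the number of word-pairs" is an assumption, and the function name is hypothetical.

```python
def top_clusters(clusters, vectors, n_clusters=10, n_patterns=4):
    """Largest clusters (by number of patterns), each with the patterns that occur
    with the most word-pairs (non-zero entries in their word-pair frequency vectors)."""
    def num_word_pairs(pattern):
        return sum(1 for f in vectors[pattern] if f > 0)

    largest = sorted(clusters, key=len, reverse=True)[:n_clusters]
    return [sorted(c, key=num_word_pairs, reverse=True)[:n_patterns] for c in largest]
```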


RELATIONAL SIMILARITY MEASURES
The following relational similarity measures are compared.
- VSM (vector space model): each word-pair is represented by a vector of pattern frequencies, and the relational similarity between two word-pairs is computed as the cosine similarity of their vectors.
- LRA (Latent Relational Analysis): creates a matrix whose rows represent word-pairs and whose columns represent lexical patterns, and applies singular value decomposition (SVD) to it.
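The two baselines can be sketched with NumPy. The VSM function is a plain cosine similarity. The second function covers only the SVD step of an LRA-style measure: it projects the rows of a (word-pairs × patterns) matrix onto the top-k singular vectors and compares them by cosine similarity. Turney's full LRA includes further steps (e.g., generating alternate word-pairs and reweighting the frequencies) that this sketch omits. The function names and the parameter k are hypothetical.

```python
import numpy as np


def vsm_similarity(u, v) -> float:
    """VSM: cosine similarity between two pattern-frequency vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0


def svd_pair_similarities(matrix, k: int) -> np.ndarray:
    """SVD step only: cosine similarities between word-pair rows in a k-dim latent space."""
    m = np.asarray(matrix, dtype=float)
    u, s, _ = np.linalg.svd(m, full_matrices=False)
    reduced = u[:, :k] * s[:k]  # row i: word-pair i in the k-dimensional latent space
    norms = np.linalg.norm(reduced, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for all-zero rows
    unit = reduced / norms
    return unit @ unit.T
```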


RELATIONAL SIMILARITY MEASURES
- IP: sets the matrix in Formula 2 to the identity matrix, so the relational similarity computed from the pattern clusters reduces to the inner product of the two feature vectors.
- CORR: the proposed relational similarity measure.


RELATIONAL SIMILARITY MEASURES


CONCLUSIONS
- We proposed a method to compute the similarity between implicit semantic relations in two word-pairs.
- The method needs only a few queries, so it can quickly compute relational similarity for unseen word-pairs.
- It provides a general framework: designing relational similarity measures can be modeled as searching for a matrix.
