MEASURING THE SIMILARITY BETWEEN IMPLICIT SEMANTIC RELATIONS USING WEB SEARCH ENGINES
Danushka Bollegala, Yutaka Matsuo, Mitsuru Ishizuka (WSDM’09)
Speaker: Yi-Ling Tai
Date: 2009/11/23
1
OUTLINE
Introduction
Method
  Retrieving Contexts
  Extracting Lexical Patterns
  Identifying Semantic Relations
  Measuring Relational Similarity
Experiments
Conclusions
2
INTRODUCTION
Implicit semantic relations between two words
  Google, YouTube (acquisition)
  Ostrich, bird (is a large)
Similar semantic relations between two word-pairs
  Google, YouTube → Yahoo, Inktomi
  Ostrich, bird → lion, cat
This paper proposes a method to compute the similarity between the implicit semantic relations in two word-pairs.
3
OUTLINE OF THE SIMILARITY METHOD
4
OUTLINE OF THE SIMILARITY METHOD
Web search component
  query a Web search engine to find the contexts
Pattern extraction component
  extract lexical patterns that express semantic relations
Pattern clustering component
  cluster the patterns to identify a particular relation
Similarity computation component
  compute the relational similarity between two word-pairs
5
RETRIEVING CONTEXTS
Snippets
  brief summaries provided by Web search engines along with the search results
  a snippet that contains both words captures the local context
Example query: “Google * * YouTube”
6
RETRIEVING CONTEXTS
“*” is the wildcard operator; it matches one word or none.
To retrieve snippets for a word-pair (A, B), issue seven queries:
  “A * B”, “B * A”, “A * * B”, “B * * A”, “A * * * B”, “B * * * A”, and A B
The wildcard queries retrieve snippets in which the query words co-occur within a maximum of three words.
The quotation marks “ ” ensure that the two words appear in the specified order.
Remove duplicate snippets if they contain the exact same sequence of all words.
7
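The query set and the duplicate filter described on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `build_queries` and `dedupe_snippets` are hypothetical, and the actual call to a search API is left out.

```python
def build_queries(a: str, b: str, max_wildcards: int = 3) -> list[str]:
    """Return the seven search queries for the word-pair (a, b)."""
    queries = []
    for n in range(1, max_wildcards + 1):
        stars = " ".join(["*"] * n)
        # Quoted phrase queries fix the order of the two words.
        queries.append(f'"{a} {stars} {b}"')
        queries.append(f'"{b} {stars} {a}"')
    # Unquoted query: the query itself puts no order constraint on the words.
    queries.append(f"{a} {b}")
    return queries


def dedupe_snippets(snippets: list[str]) -> list[str]:
    """Drop snippets whose exact word sequence was already seen."""
    seen = set()
    unique = []
    for snippet in snippets:
        key = tuple(snippet.split())  # compare word sequences, not raw text
        if key not in seen:
            seen.add(key)
            unique.append(snippet)
    return unique


if __name__ == "__main__":
    for query in build_queries("Google", "YouTube"):
        print(query)
```

Comparing word sequences rather than raw strings is an assumption about what "exact sequence of all words" means; it treats snippets that differ only in whitespace as duplicates.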
EXTRACTING LEXICAL PATTERNS
A shallow lexical pattern extraction algorithm
  extracts the semantic relations between two words from Web snippets
  does not require language preprocessing
It consists of the following three steps.
Step 1:
  Replace the two words with the two variables X and Y.
  Replace all numeric values with D.
  Do not remove punctuation marks.
8
EXTRACTING LEXICAL PATTERNS
Step 2: generate subsequences of the snippet under these conditions.
  Exactly one X and one Y must exist in a subsequence.
  The maximum length of a subsequence is L words.
  A gap should not exceed g words.
  The total length of all gaps should not exceed G words.
  Expand all negation contractions, e.g. didn’t → did not.
Step 3: select the subsequences with frequency greater than N.
9
EXTRACTING LEXICAL PATTERNS
A modified PrefixSpan algorithm
  considers all the words in a snippet
  is not limited to extracting patterns from only the mid-fix (the words between X and Y)
Example patterns: X to acquire Y, X acquire Y, X to acquire Y for.
10
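The three extraction steps can be sketched as below. This is a brute-force enumeration under the stated constraints, not the modified PrefixSpan algorithm the slides refer to. The function names, the tokenizer, and the rough contraction rule are assumptions made for illustration, and the sketch assumes each of the two words is a single token.

```python
import re
from collections import Counter


def normalize(snippet: str, a: str, b: str) -> list[str]:
    """Step 1: replace the two words by X and Y and numbers by D."""
    text = re.sub(r"n['’]t\b", " not", snippet)  # rough: didn't -> did not
    tokens = re.findall(r"\w+|[^\w\s]", text)    # punctuation is kept
    out = []
    for tok in tokens:
        if tok.lower() == a.lower():
            out.append("X")
        elif tok.lower() == b.lower():
            out.append("Y")
        elif tok.isdigit():
            out.append("D")
        else:
            out.append(tok)
    return out


def subsequences(tokens, max_len=5, max_gap=2, max_total_gap=4):
    """Step 2: subsequences with exactly one X and one Y (L, g, G limits)."""
    found = set()

    def extend(last, chosen, total_gap):
        words = [tokens[i] for i in chosen]
        if words.count("X") == 1 and words.count("Y") == 1:
            found.add(" ".join(words))
        if len(chosen) == max_len:
            return
        for nxt in range(last + 1, min(last + max_gap + 2, len(tokens))):
            gap = nxt - last - 1  # number of skipped words
            if total_gap + gap > max_total_gap:
                break
            if tokens[nxt] in ("X", "Y") and tokens[nxt] in words:
                continue  # a second X or Y is not allowed
            extend(nxt, chosen + [nxt], total_gap + gap)

    for start in range(len(tokens)):
        extend(start, [start], 0)
    return found


def extract_patterns(snippets, a, b, min_freq=1):
    """Step 3: keep subsequences whose frequency is greater than min_freq."""
    counts = Counter()
    for snippet in snippets:
        counts.update(subsequences(normalize(snippet, a, b)))
    return {p: c for p, c in counts.items() if c > min_freq}
```

For the token sequence `X to acquire Y for`, the enumeration yields the three example patterns from the slide among others. Counting each pattern once per snippet is a simplification chosen here; the slides do not say how frequency is counted.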
IDENTIFYING SEMANTIC RELATIONS
A semantic relation can be expressed using more than one pattern.
If two word-pairs share many related patterns, we can expect a high relational similarity.
Cluster lexical patterns using their distributions over word-pairs to identify semantically related patterns.
11
12
IDENTIFYING SEMANTIC RELATIONS
Each pattern p is represented by its word-pair frequency vector.
  Each element is the frequency with which pattern p occurs with the corresponding word-pair.
SORT : sorts the patterns in descending order of their total occurrence over all word-pairs
c : the vector sum of all word-pair frequency vectors corresponding to the patterns that belong to that cluster
⊕ : denotes vector addition
θ : similarity threshold
13
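The legend above suggests a greedy, single-pass clustering: visit patterns from most to least frequent, attach each one to its most similar cluster if the similarity exceeds the threshold, and otherwise start a new cluster. A minimal sketch follows. Cosine similarity as the comparison function, the strict `>` test, and all names are assumptions, since the algorithm listing itself did not survive in this transcript.

```python
import math


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_patterns(pattern_vectors: dict, theta: float) -> list[dict]:
    """Greedy sequential clustering.

    pattern_vectors maps pattern -> {word_pair: frequency}.
    """
    # SORT: patterns with the highest total occurrence come first.
    ordered = sorted(pattern_vectors,
                     key=lambda p: sum(pattern_vectors[p].values()),
                     reverse=True)
    clusters = []  # each entry: {"patterns": [...], "centroid": {...}}
    for p in ordered:
        vec = pattern_vectors[p]
        best, best_sim = None, -1.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim > theta:
            best["patterns"].append(p)
            # Vector addition: fold the pattern's vector into the cluster sum.
            for pair, freq in vec.items():
                best["centroid"][pair] = best["centroid"].get(pair, 0) + freq
        else:
            clusters.append({"patterns": [p], "centroid": dict(vec)})
    return clusters
```

Because every pattern is compared only against the running cluster sums, the number of clusters is decided by the threshold alone and does not have to be fixed in advance.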
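MEASURING RELATIONAL SIMILARITY
Each word-pair is represented by a feature vector.
The elements of the feature vector are the total frequencies of the word-pair in each cluster.
The relational similarity between two word-pairs is computed from their feature vectors and a correlation matrix between clusters.
[Formula 2: relational similarity between two word-pairs]
14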
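The feature vector is fully described on the slide, but the similarity formula is not recoverable from this transcript. The sketch below therefore builds the feature vector as stated and then uses an assumed form for the similarity: a bilinear product through the correlation matrix, normalized like a cosine. The names `feature_vector`, `bilinear`, and `relational_similarity`, the variable `lam` for the correlation matrix, and the normalization are all assumptions, not the paper's definition.

```python
import math


def feature_vector(word_pair, clusters):
    """One element per cluster: total frequency of the word-pair there.

    clusters is a list of {pattern: {word_pair: frequency}} dicts.
    """
    return [sum(vec.get(word_pair, 0) for vec in cluster.values())
            for cluster in clusters]


def bilinear(x, lam, y):
    """Compute x^T * lam * y for plain Python lists."""
    return sum(x[i] * lam[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def relational_similarity(x, y, lam):
    """ASSUMED form: bilinear product with a cosine-style normalization.

    Assumes non-negative entries in x, y, and lam.
    """
    norm = math.sqrt(bilinear(x, lam, x) * bilinear(y, lam, y))
    return bilinear(x, lam, y) / norm if norm else 0.0
```

With `lam` set to the identity matrix, clusters contribute only through exact matches; off-diagonal entries let two word-pairs score above zero even when their patterns fall into different but correlated clusters.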
MEASURING RELATIONAL SIMILARITY
The correlation between two clusters is given by the corresponding element of the correlation matrix.
[Formula: correlation between two clusters, defined using the union of the two clusters]
15
EXPERIMENTS
Dataset
  100 instances (word or named-entity pairs)
  five relation types
    ACQUIRER-ACQUIREE
    PERSON-BIRTHPLACE
    CEO-COMPANY
    COMPANY-HEADQUARTERS
    PERSON-FIELD
16
EXPERIMENTS
Manually select 20 instances for each relation type from
  Wikipedia
  online newspapers
  company reviews
For each instance, download snippets using the Yahoo BOSS API.
17
EXPERIMENTS - LEXICAL PATTERNS
Lexical patterns
  Run the pattern extraction algorithm with L = 5, g = 2, and G = 4.
  The total number of unique patterns is 473910.
  We select only the 148655 patterns that occur at least twice.
18
EXPERIMENTS - PATTERN CLUSTERS
Ratio: singletons to the total number of clusters
19
EXPERIMENTS - RELATION CLASSIFICATION
We evaluate the proposed relational similarity measure in a relation classification task.
  k-nearest neighbor classification
  classification accuracy
  average precision
Rel(r): a binary-valued function that returns 1 if the word-pair at rank r has the same relation as the query word-pair, and 0 otherwise
20
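The evaluation can be sketched as below: rank the labeled word-pairs by similarity to a query pair, take the majority relation among the top k, and score the ranking with average precision. The formulas on the slide were lost, so this uses the standard definition, the sum of Pre(r) × Rel(r) over the ranks divided by the number of relevant word-pairs, as an assumption. All function names are hypothetical, and `sim` stands for any relational similarity function.

```python
from collections import Counter


def rank_neighbors(query, labeled, sim, k):
    """Return the k (word_pair, label) items most similar to the query."""
    return sorted(labeled, key=lambda item: sim(query, item[0]),
                  reverse=True)[:k]


def knn_label(neighbors):
    """Majority relation type among the retrieved neighbors."""
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


def average_precision(neighbors, true_label, n_relevant):
    """Sum of Pre(r) * Rel(r) over ranks, divided by n_relevant."""
    hits, total = 0, 0.0
    for r, (_, label) in enumerate(neighbors, start=1):
        rel = 1 if label == true_label else 0  # Rel(r)
        hits += rel
        total += (hits / r) * rel              # Pre(r) * Rel(r)
    return total / n_relevant if n_relevant else 0.0
```

Classification accuracy is then the fraction of query pairs for which `knn_label` returns the correct relation type.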
EXPERIMENTS - RELATION CLASSIFICATION
Similarity threshold θ = 0.955
  2629 non-singleton clusters
  6930 singletons
21
EXPERIMENTS - RELATION CLASSIFICATION
The top 10 clusters with the largest number of lexical patterns
The top four patterns that occur with the largest number of word-pairs
22
RELATIONAL SIMILARITY MEASURES
Compare the following relational similarity measures.
VSM (vector space model):
  each word-pair is represented by a vector of pattern frequencies
  the relational similarity between two word-pairs is computed as the cosine similarity between their vectors
LRA (Latent Relational Analysis):
  create a matrix in which the rows represent word-pairs and the columns represent lexical patterns
  apply singular value decomposition (SVD) to the matrix
23
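The VSM baseline is simple enough to sketch directly: count how often each pattern occurs with a word-pair, then compare two such count vectors by cosine similarity. The function names and the sparse `Counter` representation are choices made here for illustration. The LRA baseline is not sketched, since the slide gives only its outline.

```python
import math
from collections import Counter


def pattern_vector(patterns_seen):
    """Pattern-frequency vector of one word-pair, stored sparsely."""
    return Counter(patterns_seen)


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse frequency vectors."""
    dot = sum(u[p] * v[p] for p in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Unlike the cluster-based measures, this baseline gives no credit when two word-pairs express the same relation through different patterns, because only patterns shared by both vectors contribute to the dot product.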
RELATIONAL SIMILARITY MEASURES
IP:
  set the correlation matrix in Formula 2 to the identity matrix
  compute relational similarity using pattern clusters
CORR:
  the proposed relational similarity measure
24
RELATIONAL SIMILARITY MEASURES
25
CONCLUSIONS
We proposed a method to compute the similarity between implicit semantic relations in two word-pairs.
  It needs only a few queries, so the similarity can be computed quickly.
  It can compute relational similarity for unseen word-pairs.
A general framework: designing relational similarity measures can be modeled as searching for a matrix.
26