
A Study of Hybrid Similarity Measures for Semantic Relation Extraction



Intelligent Database Systems Lab

Presenter: BEI-YI JIANG

Authors: UNIVERSITÉ CATHOLIQUE DE LOUVAIN, BELGIUM

2012. ASSOCIATION FOR COMPUTING MACHINERY

A Study of Hybrid Similarity Measures for Semantic Relation Extraction


Outline

• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments


Motivation

• The relations produced by existing extractors are still of lower quality than manually constructed relations.
• Most studies do not consider the full range of existing similarity measures; they mostly combine a few different methods in an ad hoc way.


Objectives

• To develop new relation extraction methods.
• To systematically analyze 16 baseline measures and their combinations, using 8 fusion methods and 3 techniques for selecting the combination set.


Methodology

• norm function
• similarity scores
• knn function
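The formulas for these three components did not survive extraction. A minimal sketch of how they typically fit together, assuming that a single measure yields a term-by-term similarity matrix, that `norm` is min-max scaling to [0, 1], and that `knn` links every term to its k most similar terms (the function names `normalize` and `knn_relations` and the toy scores are hypothetical, not the authors' code):

```python
from typing import List, Set, Tuple

Matrix = List[List[float]]


def normalize(sim: Matrix) -> Matrix:
    """Assumed norm: min-max rescale the off-diagonal scores to [0, 1]."""
    n = len(sim)
    off_diag = [sim[i][j] for i in range(n) for j in range(n) if i != j]
    lo, hi = min(off_diag), max(off_diag)
    span = hi - lo if hi > lo else 1.0
    return [[0.0 if i == j else (sim[i][j] - lo) / span for j in range(n)]
            for i in range(n)]


def knn_relations(terms: List[str], sim: Matrix, k: int) -> Set[Tuple[str, str]]:
    """Assumed knn: link each term to its k most similar other terms."""
    relations = set()
    for i, term in enumerate(terms):
        neighbours = sorted((j for j in range(len(terms)) if j != i),
                            key=lambda j: sim[i][j], reverse=True)
        for j in neighbours[:k]:
            relations.add((term, terms[j]))
    return relations


if __name__ == "__main__":
    terms = ["car", "vehicle", "banana", "fruit"]
    # Hypothetical raw scores produced by some single similarity measure.
    raw = [
        [1.0, 0.9, 0.1, 0.2],
        [0.9, 1.0, 0.1, 0.1],
        [0.1, 0.1, 1.0, 0.8],
        [0.2, 0.1, 0.8, 1.0],
    ]
    print(sorted(knn_relations(terms, normalize(raw), k=1)))
```

Min-max scaling is one choice among several. Because it is monotone, it does not change any term's nearest neighbours; it matters mainly when matrices from different measures are later combined.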


Methodology - Single Similarity Measures

• Measures based on a semantic network (5)
  – exploit the lengths of the shortest paths between terms in a network
  – some also use probabilities of terms derived from a corpus
  – Wu and Palmer, Leacock and Chodorow, Resnik, Jiang and Conrath, and Lin
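To make the two families concrete, here is a hedged sketch of the five measures on a tiny hand-made taxonomy. The tree, the corpus counts, and all function names are hypothetical. Wu and Palmer and Leacock and Chodorow use depths and path lengths; Resnik, Jiang and Conrath, and Lin use the information content IC(c) = -log p(c), with p(c) estimated from counts propagated up the tree. The code follows the textbook definitions; implementations over a real network such as WordNet differ in details like depth conventions and smoothing.

```python
import math
from typing import Dict, List, Optional

# Hypothetical toy taxonomy: concept -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "animal": "entity",
    "dog": "animal",
    "cat": "animal",
    "vehicle": "entity",
    "car": "vehicle",
}
# Hypothetical corpus frequencies of each concept's own occurrences.
FREQ: Dict[str, int] = {"entity": 0, "animal": 10, "dog": 40, "cat": 30,
                        "vehicle": 10, "car": 10}
ROOT = next(c for c, p in PARENT.items() if p is None)


def ancestors(c: str) -> List[str]:
    """Concepts on the path from c up to the root, c first."""
    path = []
    node: Optional[str] = c
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(c: str) -> int:
    return len(ancestors(c))  # the root has depth 1


def lcs(c1: str, c2: str) -> str:
    """Lowest common subsumer: first ancestor of c1 that also subsumes c2."""
    above_c2 = set(ancestors(c2))
    return next(a for a in ancestors(c1) if a in above_c2)


MAX_DEPTH = max(depth(c) for c in PARENT)


def wu_palmer(c1: str, c2: str) -> float:
    return 2 * depth(lcs(c1, c2)) / (depth(c1) + depth(c2))


def leacock_chodorow(c1: str, c2: str) -> float:
    # Path length counted in nodes (edges + 1), so identical concepts get 1.
    nodes = depth(c1) + depth(c2) - 2 * depth(lcs(c1, c2)) + 1
    return -math.log(nodes / (2 * MAX_DEPTH))


def cum_freq(c: str) -> int:
    """Frequency of c plus the frequencies of all concepts below it."""
    return FREQ[c] + sum(cum_freq(ch) for ch, p in PARENT.items() if p == c)


def ic(c: str) -> float:
    """Information content IC(c) = -log p(c)."""
    return -math.log(cum_freq(c) / cum_freq(ROOT))


def resnik(c1: str, c2: str) -> float:
    return ic(lcs(c1, c2))


def lin(c1: str, c2: str) -> float:
    denom = ic(c1) + ic(c2)
    return 2 * ic(lcs(c1, c2)) / denom if denom > 0 else 1.0


def jiang_conrath(c1: str, c2: str) -> float:
    dist = ic(c1) + ic(c2) - 2 * ic(lcs(c1, c2))
    return 1 / dist if dist > 0 else math.inf


if __name__ == "__main__":
    for measure in (wu_palmer, leacock_chodorow, resnik, lin, jiang_conrath):
        print(measure.__name__,
              round(measure("dog", "cat"), 3), round(measure("dog", "car"), 3))
```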


Methodology - Single Similarity Measures

• Web-based measures (3)
  – use Web search engines
  – rely on the number of times the terms co-occur in the documents
  – Normalized Google Distance (NGD)
  – Measures of Semantic Relatedness (MSR)
  – Yahoo!, Bing, and Google over the domain wikipedia.org
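NGD is computed from hit counts alone. A minimal sketch of the standard formula, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f(·) are page counts and N is the (estimated) number of indexed pages. The counts below are hypothetical; no search-engine API call is shown, and how the distance is turned into a similarity score is not specified on the slide.

```python
import math


def ngd(f_x: float, f_y: float, f_xy: float, n: float) -> float:
    """Normalized Google Distance from hit counts.

    f_x, f_y: pages containing x (resp. y); f_xy: pages containing both;
    n: estimated number of pages searched. 0 = always together; larger = less related.
    """
    if f_xy == 0:
        return math.inf  # the terms never co-occur
    lx, ly, lxy = math.log(f_x), math.log(f_y), math.log(f_xy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


if __name__ == "__main__":
    # Hypothetical hit counts, e.g. from a search restricted to one domain.
    print(ngd(f_x=2000, f_y=1000, f_xy=500, n=10**7))
    print(ngd(f_x=2000, f_y=1000, f_xy=50, n=10**7))
```

Fewer co-occurrences with the same individual counts yield a larger distance, which is the behaviour the measure relies on.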


Methodology - Single Similarity Measures

• Corpus-based measures (5)
  – Distributional measures
    › Bag-of-words Distributional Analysis (BDA)
    › Syntactic Distributional Analysis (SDA)
  – Pattern-based measure
    › PatternWiki
  – Other corpus-based measures
    › Latent Semantic Analysis (LSA)
    › Normalized Google Distance (NGD)
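BDA is only named on the slide. The general idea of bag-of-words distributional similarity can be sketched as follows, assuming a symmetric co-occurrence window and cosine similarity between context vectors. The window size, the raw-count weighting, and the toy corpus are illustrative assumptions; the paper's BDA may use different weighting and preprocessing.

```python
import math
from collections import Counter
from typing import Dict, List


def context_vectors(sentences: List[List[str]], window: int = 2) -> Dict[str, Counter]:
    """Bag-of-words context vector per word: counts of words within `window` positions."""
    vectors: Dict[str, Counter] = {}
    for tokens in sentences:
        for i, word in enumerate(tokens):
            ctx = vectors.setdefault(word, Counter())
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    ctx[tokens[j]] += 1
    return vectors


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of two sparse count vectors (0 if either is empty)."""
    dot = sum(count * b[w] for w, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    # Hypothetical toy corpus.
    corpus = [s.split() for s in (
        "the cat drinks milk",
        "the dog drinks water",
        "the car needs fuel",
    )]
    vecs = context_vectors(corpus)
    print(cosine(vecs["cat"], vecs["dog"]), cosine(vecs["cat"], vecs["car"]))
```

Here "cat" and "dog" share the contexts "the" and "drinks", so they come out more similar than "cat" and "car", which share only "the".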


Methodology - Single Similarity Measures

• Definition-based measures (3)
  – WktWiki
  – Gloss Vectors
  – Extended Lesk
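Definition-based measures compare the definitions (glosses) of two terms. The simplest form of this idea, counting the distinct content words shared by two glosses in the spirit of the original Lesk algorithm, is sketched below. It is deliberately simpler than Extended Lesk (which also rewards multi-word overlaps and uses glosses of related concepts) and than Gloss Vectors; the glosses and the stop-word list are hypothetical.

```python
from typing import Dict, Set

# Hypothetical one-line glosses standing in for real dictionary definitions.
GLOSSES: Dict[str, str] = {
    "car": "a motor vehicle with four wheels used to carry passengers",
    "bus": "a large motor vehicle that carries passengers along a route",
    "banana": "an elongated yellow fruit with soft sweet flesh",
}
# Small illustrative stop-word list; real systems use a larger one.
STOPWORDS: Set[str] = {"a", "an", "the", "of", "with", "to", "that"}


def content_words(gloss: str) -> Set[str]:
    return {w for w in gloss.lower().split() if w not in STOPWORDS}


def gloss_overlap(t1: str, t2: str) -> int:
    """Number of distinct content words shared by the glosses of t1 and t2."""
    return len(content_words(GLOSSES[t1]) & content_words(GLOSSES[t2]))


if __name__ == "__main__":
    print(gloss_overlap("car", "bus"), gloss_overlap("car", "banana"))
```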


Methodology - Hybrid Similarity Measures

• Combination methods
  – Input: a set of similarity matrices {S_1, ..., S_K} produced by K single measures
  – Output: a combined similarity matrix S_cmb
  – Fusion methods:
    › 1. Mean
    › 2. Mean-Nnz
    › 3. Mean-Zscore
    › 4. Median
    › 5. Max
    › 6. Rank Fusion
    › 7. Relation Fusion
    › 8. Logit


Methodology - Hybrid Similarity Measures

• Combination methods
  – Mean: the mean of the K pairwise similarity scores.
  – Mean-Nnz: the mean of those pairwise similarity scores that are non-zero.
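The formulas that followed these two definitions were lost in extraction. Written out from the verbal definitions in our own notation (an assumption, not the slide's): s^k_ij is the score that the k-th of the K measures assigns to terms c_i and c_j, and s^cmb_ij is the corresponding entry of S_cmb.

```latex
s^{cmb}_{ij} = \frac{1}{K} \sum_{k=1}^{K} s^{k}_{ij}
\qquad \text{(Mean)}

s^{cmb}_{ij} = \frac{1}{\lvert \{ k : s^{k}_{ij} \neq 0 \} \rvert} \sum_{k=1}^{K} s^{k}_{ij}
\qquad \text{(Mean-Nnz; taken as 0 when every } s^{k}_{ij} = 0 \text{)}
```

Summing over all k in Mean-Nnz is equivalent to summing over the non-zero scores only, since zero terms add nothing; only the denominator changes.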


Methodology - Hybrid Similarity Measures

• Combination methods
  – Mean-Zscore: the mean of the K similarity scores after transforming them into z-scores.
  – Median: the median of the K pairwise similarities.
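The formulas for these two methods were also lost. A reconstruction from the verbal definitions, in our own notation: s^k_ij is the score that measure k assigns to terms c_i and c_j, and we assume that μ_k and σ_k are the mean and standard deviation of all scores in S_k (the slide does not say which statistics are used).

```latex
z^{k}_{ij} = \frac{s^{k}_{ij} - \mu_k}{\sigma_k},
\qquad
s^{cmb}_{ij} = \frac{1}{K} \sum_{k=1}^{K} z^{k}_{ij}
\qquad \text{(Mean-Zscore)}

s^{cmb}_{ij} = \operatorname{median}\left( s^{1}_{ij}, \ldots, s^{K}_{ij} \right)
\qquad \text{(Median)}
```

Standardizing first keeps a measure whose raw scores happen to have a wider range from dominating the mean.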


Methodology - Hybrid Similarity Measures

• Combination methods
  – Max: the maximum of the K pairwise similarities.
  – Rank Fusion.
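Mean, Mean-Nnz, Mean-Zscore, Median, and Max (s^cmb_ij = max_k s^k_ij) are all element-wise rules, so they can be sketched with one helper that applies a rule to the K scores of each term pair. This is an illustration under stated assumptions: z-scores use the mean and standard deviation of all entries of each matrix, and Mean-Nnz returns 0 when every measure scores a pair as 0. Rank Fusion and Relation Fusion are left out because their exact definitions are not recoverable from these slides; `combine` and the other names are hypothetical.

```python
import statistics
from typing import Callable, List

Matrix = List[List[float]]


def combine(matrices: List[Matrix], rule: Callable[[List[float]], float]) -> Matrix:
    """Apply `rule` to the K scores s^1_ij, ..., s^K_ij of every term pair (i, j)."""
    n = len(matrices[0])
    return [[rule([m[i][j] for m in matrices]) for j in range(n)] for i in range(n)]


def mean(scores: List[float]) -> float:
    return sum(scores) / len(scores)


def mean_nnz(scores: List[float]) -> float:
    """Mean over non-zero scores only; 0 if every measure returned 0."""
    nonzero = [s for s in scores if s != 0]
    return sum(nonzero) / len(nonzero) if nonzero else 0.0


def zscore(m: Matrix) -> Matrix:
    """Standardize one matrix by the mean and standard deviation of all its entries."""
    flat = [s for row in m for s in row]
    mu, sigma = statistics.mean(flat), statistics.pstdev(flat)
    sigma = sigma or 1.0  # a constant matrix would otherwise divide by zero
    return [[(s - mu) / sigma for s in row] for row in m]


def mean_zscore(matrices: List[Matrix]) -> Matrix:
    return combine([zscore(m) for m in matrices], mean)


if __name__ == "__main__":
    # Hypothetical scores of K = 3 measures for two terms.
    s1 = [[0.0, 0.2], [0.2, 0.0]]
    s2 = [[0.0, 0.6], [0.6, 0.0]]
    s3 = [[0.0, 0.0], [0.0, 0.0]]
    mats = [s1, s2, s3]
    for name, rule in (("Mean", mean), ("Mean-Nnz", mean_nnz),
                       ("Median", statistics.median), ("Max", max)):
        print(name, combine(mats, rule)[0][1])
    print("Mean-Zscore", mean_zscore(mats)[0][1])
```

The toy example shows why Mean-Nnz exists: the third measure returns 0 (for instance because it has no data for the pair), which pulls the plain Mean down but is ignored by Mean-Nnz.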


Methodology - Hybrid Similarity Measures

• Combination methods
  – Relation Fusion.
  – Logit.
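Logit is only named here. One plausible reading, consistent with the conclusion that combining measures with logistic regression worked best, is to train a logistic regression model whose features are the K similarity scores of a term pair and whose label says whether the pair is a true semantic relation; the predicted probability then serves as the combined similarity. The sketch fits such a model with plain batch gradient descent. The training pairs, learning rate, epoch count, and function names are hypothetical, and the authors' training setup may differ.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit_score(w0: float, w: Sequence[float], x: Sequence[float]) -> float:
    """Combined similarity: predicted probability that the pair is a true relation."""
    return sigmoid(w0 + sum(wi * xi for wi, xi in zip(w, x)))


def fit_logit(features: List[Sequence[float]], labels: List[int],
              lr: float = 0.5, epochs: int = 2000) -> Tuple[float, List[float]]:
    """Fit intercept w0 and weights w by batch gradient descent on the log-loss."""
    n, k = len(features), len(features[0])
    w0, w = 0.0, [0.0] * k
    for _ in range(epochs):
        grad0, grad = 0.0, [0.0] * k
        for x, y in zip(features, labels):
            err = logit_score(w0, w, x) - y  # d(log-loss)/d(linear score)
            grad0 += err
            for m in range(k):
                grad[m] += err * x[m]
        w0 -= lr * grad0 / n
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w0, w


if __name__ == "__main__":
    # Hypothetical training pairs: scores of K = 3 single measures; label 1 = related.
    X = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.6], [0.7, 0.6, 0.9],
         [0.2, 0.1, 0.3], [0.1, 0.3, 0.2], [0.3, 0.2, 0.1]]
    y = [1, 1, 1, 0, 0, 0]
    w0, w = fit_logit(X, y)
    print(round(logit_score(w0, w, [0.85, 0.75, 0.8]), 3))
    print(round(logit_score(w0, w, [0.15, 0.2, 0.25]), 3))
```

Unlike the fixed rules (Mean, Max, and so on), this combination learns how much to trust each measure, at the cost of needing labelled relation pairs.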


Methodology - Hybrid Similarity Measures

• Combination sets (selecting which measures to combine)
  – Expert choice of measures
  – Forward stepwise procedure
  – Logistic regression
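The forward stepwise procedure can be sketched as greedy selection: start from an empty set, repeatedly add the measure whose addition most improves an evaluation score, and stop when no addition helps. The scoring function is an assumption here (for example, the quality of the combined measure on held-out data); the gains, the per-measure penalty, and the function names in the example are hypothetical.

```python
from typing import Callable, FrozenSet, List, Tuple


def forward_stepwise(measures: List[str],
                     score: Callable[[FrozenSet[str]], float]) -> Tuple[List[str], float]:
    """Greedy forward selection: repeatedly add the measure that most improves `score`."""
    selected: List[str] = []
    best = float("-inf")
    remaining = list(measures)
    while remaining:
        # Evaluate every one-measure extension; ties are broken by measure name.
        top_score, top_measure = max(
            (score(frozenset(selected + [m])), m) for m in remaining)
        if top_score <= best:
            break  # no candidate improves the current combination
        selected.append(top_measure)
        remaining.remove(top_measure)
        best = top_score
    return selected, best


if __name__ == "__main__":
    # Hypothetical stand-in for "quality of the hybrid measure on a validation set".
    gains = {"BDA": 0.30, "SDA": 0.25, "WuPalmer": 0.10, "NGD-Bing": 0.05}

    def toy_score(subset: FrozenSet[str]) -> float:
        return sum(gains[m] for m in subset) - 0.07 * len(subset)

    print(forward_stepwise(list(gains), toy_score))
```

Greedy selection needs only a number of evaluations quadratic in the number of measures, instead of trying all 2^16 subsets of the 16 baselines, but it can miss the best subset.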


Experiments

• Evaluation
  – Human judgement datasets
    › MC, RG, WordSim353
  – Semantic relation datasets
    › BLESS, SN
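The slide lists the datasets but not the metrics. A common protocol, assumed here rather than taken from the slides, is to correlate a measure's scores with the human scores on MC, RG, and WordSim353, and to compute the precision of the extracted relations against the gold relations in BLESS and SN. Minimal sketches of both (Pearson correlation and precision; the inputs are toy values, and the function names are our own):

```python
import math
from typing import Sequence, Set, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between a measure's scores and human judgements."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def precision(extracted: Set[Tuple[str, str]], gold: Set[Tuple[str, str]]) -> float:
    """Share of extracted relations that appear in the gold standard."""
    return len(extracted & gold) / len(extracted) if extracted else 0.0


if __name__ == "__main__":
    # Toy values: measure scores vs. human scores for four word pairs.
    print(pearson([0.9, 0.7, 0.3, 0.1], [3.9, 3.1, 1.5, 0.4]))
    print(precision({("car", "vehicle"), ("car", "banana")}, {("car", "vehicle")}))
```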


Conclusions

• The hybrid measures outperformed the single measures on all datasets.

• The best results were obtained by combining 15 baseline corpus-, web-, network-, and dictionary-based measures with logistic regression.


Comments

• Advantages
  – Higher performance
• Applications