PATTY: A Taxonomy of Relational Patterns with Semantic Types


Authors: Ndapandula Nakashole, Gerhard Weikum, Fabian Suchanek (Max Planck Institute for Informatics)

Expositor: Akihiro Kameda (Aizawa Lab., The University of Tokyo)

Abstract

● Syntactic + Ontological + Lexical

● Mining algorithm
– Ontological + Lexical
– + Syntactic

● Taxonomy construction

● Mined result
– 5 experimental evaluations

SOL Pattern and synset

● Example: Syntactic + Ontological + Lexical
– <person>'s [adj] voice * <song>
– “Amy Winehouse's soft voice in 'Rehab'”

● Type signature: <person> × <song>

● Support set: {(Amy, Rehab), (Elvis, All Shook Up)}

● Synset
– syntactically general: B is more general than A if {X | X matches A} ⊆ {Y | Y matches B}
– semantically general: B is more general than A if {P | P supports A} ⊆ {Q | Q supports B}, written A ⊆sem B
– synonymous: A ⊆sem B ∧ B ⊆sem A
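The semantic relations above reduce to set comparisons on support sets. A minimal sketch, assuming support sets are stored as Python sets of (subject, object) tuples; the function names and the second pattern's extra pair are made up for illustration and are not taken from the paper:

```python
def semantically_more_general(support_a, support_b):
    """True if pattern B is at least as general as pattern A,
    i.e. every entity pair supporting A also supports B."""
    return support_a <= support_b


def synonymous(support_a, support_b):
    """True if A and B subsume each other semantically."""
    return (semantically_more_general(support_a, support_b)
            and semantically_more_general(support_b, support_a))


# Support set from the slide, plus one hypothetical extra pair for pattern B.
voice_in = {("Amy", "Rehab"), ("Elvis", "All Shook Up")}
performed = voice_in | {("Hypothetical Singer", "Hypothetical Song")}
```

Here `performed` would be semantically more general than `voice_in`, but the two would not be synonymous because the inclusion holds in one direction only.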

Mining algorithm

● Pattern extraction and generalization
– Lexical + Ontological → + Syntactic

● Taxonomy construction
– Find subsumption relationships
– Integrate them into a DAG (directed acyclic graph)

Pattern Extraction

● Prepare a dictionary of surface names and semantic types
– YAGO2, Freebase

● Disambiguation
– Context-similarity prior proposed by Suchanek 2009

● Extract the dependency path that connects two named entities (NEs)
– Stanford Parser

● “Winehouse effortlessly performed her song Rehab”
→ “Amy Winehouse effortlessly performed Rehab (song)”

Syntactic Pattern Generalization

● Replace lexical items with POS tags, wildcards, or types
– Amy Winehouse's soft voice in 'Rehab'
– <person>'s soft voice in <song>
– <person>'s [adj] voice * <song>

● Generate all possible generalizations first.

● If a generalization subsumes multiple patterns with disjoint support sets, it is rejected.
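The rejection rule can be sketched as a pairwise disjointness check. This is one reading of the slide, assuming a generalization is rejected as soon as any two of the patterns it subsumes share no entity pair; the function name and the toy data are hypothetical:

```python
from itertools import combinations


def keep_generalization(subsumed_supports):
    """Return False if any two subsumed patterns have disjoint support sets.

    subsumed_supports: list of sets of (subject, object) pairs, one set per
    original pattern that the candidate generalization subsumes.
    """
    for a, b in combinations(subsumed_supports, 2):
        if a.isdisjoint(b):
            return False  # too general: it merges unrelated patterns
    return True


overlapping = [{("Amy", "Rehab")}, {("Amy", "Rehab"), ("Elvis", "All Shook Up")}]
unrelated = [{("Amy", "Rehab")}, {("Elvis", "All Shook Up")}]
```

Under this reading, a candidate covering `overlapping` is kept and one covering `unrelated` is rejected.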

Taxonomy Construction

● Compare the support sets of every pair of patterns?
– Too slow. → Use the prefix-tree method (Han 2005)

● Entity pairs ordered by frequency (descending)

● total nodes <= |total entity pairs|

● depth <= |largest support set|

Taxonomy Construction

● Traverse the tree in a bottom-up manner.

● Find subsumption by finding set inclusion.

● p3 is nearly included in p4 (soft inclusion)
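A minimal sketch of the prefix tree over support sets, assuming each support set is inserted as a path of entity pairs sorted by descending global frequency. The `Node` class, the helper names, and the toy patterns p1 to p3 are illustrative assumptions, and the bottom-up subsumption traversal itself is not shown:

```python
from collections import Counter


class Node:
    def __init__(self, pair=None):
        self.pair = pair        # entity pair stored at this node
        self.children = {}      # entity pair -> child Node
        self.patterns = []      # patterns whose ordered support set ends here


def build_prefix_tree(supports):
    """supports: dict mapping pattern name -> set of entity pairs."""
    freq = Counter(pair for s in supports.values() for pair in s)
    root = Node()
    for name, support in supports.items():
        # Most frequent pairs first, so patterns share long prefixes.
        ordered = sorted(support, key=lambda p: (-freq[p], p))
        node = root
        for pair in ordered:
            if pair not in node.children:
                node.children[pair] = Node(pair)
            node = node.children[pair]
        node.patterns.append(name)
    return root


def count_nodes(node):
    return sum(1 + count_nodes(child) for child in node.children.values())


def depth(node):
    return max((1 + depth(child) for child in node.children.values()), default=0)


A, B, C, D = ("a1", "a2"), ("b1", "b2"), ("c1", "c2"), ("d1", "d2")
supports = {"p1": {A, B}, "p2": {A, B, C}, "p3": {A, D}}
tree = build_prefix_tree(supports)
```

In this toy input the three support sets hold 7 entity pairs in total, but the shared prefixes compress them into 4 nodes, and the depth equals the size of the largest support set.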

Wilson estimator

● Naively, deg(S ⊆ B) = |S∩B|/|S|

● |S| should also be taken into account...

● Regard S as a random sample from a larger set S'

● Confidence interval [c-d, c+d], with λ = z_{α/2} = 1.96
– small |S|: c ≈ 0.5, d ≈ 0.5
– large |S|: c ≈ |S∩B|/|S|, d ≈ 0

● deg(S ⊂ B) = c-d

[Figure: sets S and B]
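The interval [c-d, c+d] can be computed with the standard Wilson score formula. A minimal sketch, assuming S and B are Python sets and z = 1.96; the function names are illustrative, and the paper's exact formulation may differ in detail:

```python
import math


def wilson_center_halfwidth(hits, n, z=1.96):
    """Center c and half-width d of the Wilson score interval for hits/n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = hits / n
    denom = 1 + z * z / n
    c = (p + z * z / (2 * n)) / denom
    d = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return c, d


def soft_inclusion_degree(S, B, z=1.96):
    """Lower bound c - d on the fraction of S that is contained in B."""
    c, d = wilson_center_halfwidth(len(S & B), len(S), z)
    return c - d


B = set(range(1000))
small_S = set(range(2))     # 2 of 2 pairs are in B
large_S = set(range(200))   # 200 of 200 pairs are in B
```

Both sets have a naive inclusion ratio of 1.0, but the lower bound for `small_S` is far below that of `large_S`, which is the effect of taking |S| into account.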

DAG Construction

● Removing all cycles while eliminating as few edges as possible … is NP-hard.

● Greedy algorithm
– Add edges in order of Wilson score (highest first)
– If a path for the relation already exists, or the edge would create a cycle, do not add it.
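The greedy procedure can be sketched as follows, assuming candidate subsumption edges arrive as (score, specific, general) tuples and that a depth-first reachability check is fast enough. The function names and the toy edges are illustrative, not taken from the paper:

```python
def reachable(adj, src, dst):
    """Depth-first search: is there a directed path from src to dst?"""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(adj.get(node, ()))
    return False


def greedy_dag(scored_edges):
    """Add edges by descending score, skipping redundant or cyclic ones."""
    adj, kept = {}, []
    for score, u, v in sorted(scored_edges, reverse=True):
        if u == v:
            continue
        if reachable(adj, u, v):   # relation already implied by a path
            continue
        if reachable(adj, v, u):   # adding u -> v would close a cycle
            continue
        adj.setdefault(u, set()).add(v)
        kept.append((u, v))
    return kept


edges = [(0.9, "a", "b"), (0.8, "b", "c"), (0.7, "a", "c"), (0.6, "c", "a")]
dag_edges = greedy_dag(edges)
```

In this toy input the edge a → c is skipped because the path a → b → c already exists, and c → a is skipped because it would create a cycle.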

Mined Result (5 experiments)

● 2 data sets
– the New York Times archive (NYT), which includes about 1.8 million newspaper articles from the years 1987 to 2007
– the English edition of Wikipedia (WKP), which contains about 3.8 million articles (as of June 21, 2011)

● 2 knowledge bases
– YAGO2 consists of about 350,000 semantic classes from WordNet and the Wikipedia category system
– Freebase consists of 85 domains and a total of about 2000 types within these domains

● Ordered or random sampling

● Typed/untyped order

Summary of experiment

● High precision

● High recall

● WKP > NYT

● YAGO2 > Freebase

● Type is strong information

● Interesting

Summary

● Syntactic + Ontological + Lexical patterns with a taxonomy tree

● 350,569 synsets / precision 84.7%

● 8,162 subsumptions / precision 75.0%

● Available online!

http://www.mpi-inf.mpg.de/yago-naga/patty/


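Q&A

● Are synsets built using hard inclusion?
– The paper is vague about this, but presumably two patterns are merged into a synset when they strongly include each other under soft inclusion.

● Could just the taxonomy construction part be attached to the paper that Takase-san presented?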
