PATTY: A Taxonomy of Relational Patterns with Semantic Types
Authors: Ndapandula Nakashole, Gerhard Weikum, Fabian Suchanek (Max Planck Institute for Informatics)
Expositor: Akihiro Kameda (Aizawa Lab., The University of Tokyo)
Abstract
● Syntactic + Ontological + Lexical
● Mining algorithm
– Ontological + Lexical
– + Syntactic
– Taxonomy construction
● Mined result
– 5 experimental evaluations
SOL Pattern and synset
● Example: Syntactic + Ontological + Lexical
– <person>'s [adj] voice * <song>
– “Amy Winehouse's soft voice in 'Rehab'”
● Type signature: <person> × <song>
● Support set: {(Amy, Rehab), (Elvis, All Shook Up)}
● Synset
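● Syntactically more general: B is more general than A if every string that matches A also matches B
● Semantically more general: B is more general than A if the support set of A is a subset of the support set of B
● Synonymous: $P \subseteq_{sem} Q \wedge Q \subseteq_{sem} P$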
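The definitions on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class `SOLPattern`, the helper names, and the support set given to the more specific pattern are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SOLPattern:
    """Hypothetical container for an SOL pattern (illustrative only)."""
    text: str             # e.g. "<person>'s [adj] voice * <song>"
    signature: tuple      # type signature: (domain type, range type)
    support: frozenset    # entity pairs observed with the pattern


def semantically_subsumes(general, specific):
    """True if the support set of `specific` is contained in that of `general`."""
    return specific.support <= general.support


def synonymous(p, q):
    """Mutual semantic subsumption (hard inclusion in both directions)."""
    return semantically_subsumes(p, q) and semantically_subsumes(q, p)


voice = SOLPattern(
    "<person>'s [adj] voice * <song>",
    ("person", "song"),
    frozenset({("Amy", "Rehab"), ("Elvis", "All Shook Up")}),
)
# Assumed support set for the more specific pattern (not from the slides).
soft_voice = SOLPattern(
    "<person>'s soft voice in <song>",
    ("person", "song"),
    frozenset({("Amy", "Rehab")}),
)

print(semantically_subsumes(voice, soft_voice))
print(synonymous(voice, soft_voice))
```

With hard set inclusion, two patterns are synonymous only when their support sets are identical; the later slides relax this to a soft inclusion score.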
Mining algorithm
● Pattern extraction and generalization
– Lexical + Ontological → + Syntactic
● Taxonomy construction
– Find subsumption relationships
– Integrate them into a DAG (directed acyclic graph)
Pattern Extraction
● Prepare a dictionary of surface names and semantic types
– YAGO2, Freebase
● Disambiguation
– Context-similarity prior proposed by Suchanek 2009
● Extract the dependency path that connects two named entities (NE)
– Stanford Parser
● “Winehouse effortlessly performed her song Rehab”
→ “Amy Winehouse effortlessly performed Rehab (song)”
Syntactic Pattern Generalization
● Replace words with POS tags, wildcards, or types
– Amy Winehouse's soft voice in 'Rehab'
– <person>'s soft voice in <song>
– <person>'s [adj] voice * <song>
● Generate all possible generalizations first.
● Reject a generalization if it subsumes multiple patterns with disjoint support sets.
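The rejection rule can be sketched as follows. The slide does not say exactly how "disjoint" is tested when more than two patterns are subsumed, so this sketch assumes one possible reading: a generalization is rejected as soon as any two of the patterns it subsumes share no entity pair. The function name and the example data are hypothetical.

```python
from itertools import combinations


def accept_generalization(subsumed_supports):
    """Return False if any two subsumed patterns have disjoint support sets.

    `subsumed_supports` is a list of sets of entity pairs, one per pattern
    that the candidate generalization subsumes (assumed input format).
    """
    for a, b in combinations(subsumed_supports, 2):
        if a.isdisjoint(b):
            return False
    return True


# Hypothetical support sets of patterns subsumed by one candidate.
overlapping = [
    {("Amy", "Rehab"), ("Elvis", "All Shook Up")},
    {("Amy", "Rehab")},
]
disjoint = [
    {("Amy", "Rehab")},
    {("Elvis", "All Shook Up")},
]

print(accept_generalization(overlapping))
print(accept_generalization(disjoint))
```

The intuition is that a generalization covering patterns with no shared evidence is probably too loose and is merging unrelated relations.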
Taxonomy Construction
● Compare the support sets of every pair of patterns?
– Too slow. → Use the prefix-tree method (Han 2005)
● Ordered by frequency (descending)
● Total tree size ≤ |total entity pairs|
● Depth ≤ |largest support set|
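A simplified sketch of such a prefix tree is given below. It assumes that each support set is sorted by the global frequency of its entity pairs (most frequent first) and inserted as a path, so that patterns with shared evidence share a prefix. The `Node` class, the function names, and the example data are hypothetical, and the bookkeeping in the paper may differ.

```python
from collections import Counter


class Node:
    def __init__(self, pair=None):
        self.pair = pair      # entity pair at this node (None for the root)
        self.children = {}    # entity pair -> child Node
        self.patterns = []    # patterns whose ordered support set ends here


def build_prefix_tree(supports):
    """supports: dict mapping a pattern name to its set of entity pairs."""
    freq = Counter(pair for s in supports.values() for pair in s)
    root = Node()
    for name, pairs in supports.items():
        # Most frequent pairs first; ties broken by the pair itself.
        ordered = sorted(pairs, key=lambda p: (-freq[p], p))
        node = root
        for pair in ordered:
            node = node.children.setdefault(pair, Node(pair))
        node.patterns.append(name)
    return root


def count_nodes(node):
    """Number of nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(c) for c in node.children.values())


def depth(node):
    """Length of the longest path below `node`."""
    return max((1 + depth(c) for c in node.children.values()), default=0)


# Hypothetical support sets.
A, B = ("Amy", "Rehab"), ("Elvis", "All Shook Up")
C, D = ("Adele", "Hello"), ("Bob", "Hurricane")
supports = {"p1": {A, B}, "p2": {A, B, C}, "p3": {A, D}}

tree = build_prefix_tree(supports)
print(count_nodes(tree), depth(tree))
```

In this example the three support sets share the prefix starting at the most frequent pair, so the tree stays within both bounds listed on the slide.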
Taxonomy Construction
● Traverse the tree in a bottom-up manner.
● Find subsumptions by finding set inclusions
● p3 is nearly included in p4 (soft inclusion)
Wilson estimator
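● Naively, $\deg(S \subseteq B) = |S \cap B| / |S|$
● The size $|S|$ should also be taken into account...
● Regard $S$ as a random sample from $S'$
● Confidence interval $[c-d,\ c+d]$: for small $|S|$, $c \approx 0.5$ and $d \approx 0.5$; as $|S|$ grows, $c \approx |S \cap B| / |S|$ and $d \approx 0$
● $\deg(S \subseteq B) = c - d$, with $\lambda = z_{\alpha/2} = 1.96$
[Figure: diagrams of the sets S and B]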
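The lower bound $c - d$ can be computed with the standard Wilson score interval. The sketch below uses the textbook formula with $z = 1.96$; the function names are hypothetical, and returning $(0.5, 0.5)$ for an empty sample is an assumed convention that matches the limiting behaviour noted on the slide.

```python
import math


def wilson_interval(hits, n, z=1.96):
    """Return (center c, half-width d) of the Wilson score interval."""
    if n == 0:
        # Assumed convention: no evidence gives the widest interval.
        return 0.5, 0.5
    p = hits / n
    denom = 1 + z * z / n
    c = (p + z * z / (2 * n)) / denom
    d = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return c, d


def soft_inclusion_degree(s, b, z=1.96):
    """Lower bound c - d on the fraction of S that is contained in B."""
    c, d = wilson_interval(len(s & b), len(s), z)
    return c - d


b = set(range(1000))
small = {1, 2}            # fully contained, but only two observations
large = set(range(200))   # fully contained, with many observations

print(soft_inclusion_degree(small, b))
print(soft_inclusion_degree(large, b))
```

Both samples are fully contained in `b`, so the naive ratio is 1 in each case, but the Wilson lower bound gives the larger sample a much higher score because the interval around it is narrower.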
DAG Construction
● Removing as few edges as possible to eliminate all cycles
– … is NP-hard.
● Greedy algorithm
– Add edges in descending order of Wilson score
– If a path for the relation already exists, or the edge would create a cycle, do not add it.
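The greedy procedure can be sketched as below, assuming candidate subsumption edges are given as (child, parent, score) triples, where the score is the Wilson lower bound. The function names and example edges are hypothetical; reachability is checked with a plain depth-first search, which is adequate for a sketch but not tuned for large graphs.

```python
def has_path(graph, src, dst):
    """Depth-first search: is `dst` reachable from `src`?"""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return False


def build_dag(candidates):
    """Greedily add edges (child, parent, score) in descending score order."""
    graph = {}
    for child, parent, _score in sorted(candidates, key=lambda e: -e[2]):
        if has_path(graph, child, parent):   # already implied by existing edges
            continue
        if has_path(graph, parent, child):   # adding it would close a cycle
            continue
        graph.setdefault(child, set()).add(parent)
    return graph


# Hypothetical candidate edges with Wilson scores.
candidates = [
    ("p1", "p2", 0.9),
    ("p2", "p3", 0.8),
    ("p1", "p3", 0.7),   # redundant: p1 -> p2 -> p3 already exists
    ("p3", "p1", 0.6),   # would create a cycle
]
dag = build_dag(candidates)
print(dag)
```

Processing high-scoring edges first means that when two edges conflict, the one with stronger evidence is kept.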
Mined Result (5 experiments)
● 2 corpora
– The New York Times archive (NYT): about 1.8 million newspaper articles from 1987 to 2007
– The English edition of Wikipedia (WKP): about 3.8 million articles (as of June 21, 2011)
● 2 knowledge bases
– YAGO2: about 350,000 semantic classes from WordNet and the Wikipedia category system
– Freebase: 85 domains and a total of about 2,000 types within these domains
● Ordered or random sampling
– typed/untyped order
Summary of experiment
● High precision
● High recall
● WKP > NYT
● YAGO2 > Freebase
● Type is strong information
● Interesting
Summary
● Syntactic + Ontological + Lexical patterns with a taxonomy tree
● 350,569 synsets / precision 84.7%
● 8,162 subsumptions / precision 75.0%
● Available online!
http://www.mpi-inf.mpg.de/yago-naga/patty/
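Q&A
● Are synsets created using hard inclusion?
– The paper is vague on this point, but presumably two patterns are merged into a synset when they strongly include each other under soft inclusion.
● Could just the taxonomy construction part be attached to the paper that Takase-san presented?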