Presentation slides for the 6th Cutting-Edge NLP Study Group (最先端NLP勉強会): http://www.cl.ecei.tohoku.ac.jp/~y-matsu/snlp6/
Modeling Missing Data in Distant Supervision for Information Extraction
Alan Ritter (CMU), Luke Zettlemoyer (University of Washington), Mausam (University of Washington), Oren Etzioni (Vulcan Inc.)
TACL, 1, 367-378, 2013.
Presented by Naoaki Okazaki (Tohoku University)
2014-09-05 Modeling Missing Data in Distant Supervision 1
Relation instance extraction
Input sentence: Steven Spielberg’s film Saving Private Ryan is loosely based on the brothers’ story.
Extractor output (film-director relation):
Film                  Director
Saving Private Ryan   Steven Spielberg

• Fully-supervised learning (Zhou+ 05, …)
  • Uses ACE corpora to build relation-instance classifiers
  • Suffers from the limited amount of training data
• Unsupervised information extraction (Banko+ 07, …)
  • Extracts relational patterns between entities, and clusters the patterns into relations
  • Difficult to map clusters into relations of interest
• Bootstrap learning (Brin 98, …)
  • Uses seed instances to extract a new set of relational patterns
  • Often suffers from low precision (semantic drift)
• Distant supervision (Mintz+ 09, …)
  • Combines the advantages of the above approaches
Distant supervision (Mintz+, 09)
Knowledge base:
Person         Birthplace
Edwin Hubble   Marshfield
…              …
Automatic annotation:
Astronomer Edwin Hubble was born in Marshfield, Missouri.
Feature extraction:
[Figure: feature table. Each row presents a single feature. Features from different sentences containing the same entity pair are concatenated.]
Mintz et al. (2009) Distant supervision for relation extraction without labeled data. ACL-2009, pages 1003–1011.
Problem: An entity pair cannot have multiple relations
E.g., Founded(Jobs, Apple) and CEO-of(Jobs, Apple) are true.
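The automatic annotation step can be sketched as follows. This is a minimal illustration, not the implementation of Mintz et al.: the toy knowledge base, the substring matching, and the helper names `annotate` and `between_words` are all assumptions made for the example.

```python
from collections import defaultdict

# Toy knowledge base (illustrative): (entity1, entity2) -> relation.
KB = {("Edwin Hubble", "Marshfield"): "born-in"}

SENTENCES = [
    "Astronomer Edwin Hubble was born in Marshfield, Missouri.",
    "Edwin Hubble grew up near Marshfield.",
    "Steve Jobs is CEO of Apple.",
]

def annotate(sentences, kb):
    """Label every sentence that mentions both entities of a KB pair."""
    labeled = defaultdict(list)
    for (e1, e2), relation in kb.items():
        for sentence in sentences:
            # Hypothetical entity matching: plain substring search.
            if e1 in sentence and e2 in sentence:
                labeled[(e1, e2, relation)].append(sentence)
    return dict(labeled)

def between_words(sentence, e1, e2):
    """A crude lexical feature: the words between the two mentions.

    Assumes e1 appears before e2 in the sentence.
    """
    start = sentence.index(e1) + len(e1)
    end = sentence.index(e2)
    return sentence[start:end].split()

examples = annotate(SENTENCES, KB)
features = between_words(SENTENCES[0], "Edwin Hubble", "Marshfield")
```

Note that the second toy sentence is also labeled born-in although it does not express that relation, because the heuristic labels every sentence in which both entities co-occur.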
MultiR (Hoffmann+, 11)
Introduces latent variables ($z_i$) to indicate the relation expressed by sentence $x_i$
For entity pair (Steve Jobs, Apple):
$\boldsymbol{y}$: $y_{\text{born-in}} = 0$, $y_{\text{founder}} = 1$, $y_{\text{CEO-of}} = 1$, $y_{\text{capital-of}} = 0$
$z_1$ = Founder, $x_1$: Steve Jobs was founder of Apple.
$z_2$ = Founder, $x_2$: Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple.
$z_3$ = CEO-of, $x_3$: Steve Jobs is CEO of Apple.
$x_i$: a sentence containing the entity pair
$y_r \in \{0,1\}$: 1 if the knowledge base includes the pair with relation $r$, 0 otherwise
$z_i \in R$: the relation expressed by sentence $x_i$
$$p(\boldsymbol{y}, \boldsymbol{z} \mid \boldsymbol{x}) = \frac{1}{Z_x} \prod_r \Phi^{\text{join}}(y_r, \boldsymbol{z}) \prod_i \Phi^{\text{extract}}(z_i, x_i)$$
$$\Phi^{\text{extract}}(z_i, x_i) = \exp\Big(\sum_j \theta_j \phi_j(z_i, x_i)\Big)$$
(The same as (Mintz+ 09))
$$\Phi^{\text{join}}(y_r, \boldsymbol{z}) = \mathbb{1}(\neg y_r \lor \exists i: r = z_i)$$
(Deterministic OR)
$\Phi^{\text{join}}$ ensures that a sentence $x_i$ expressing the relation $r$ exists if $r$ is true
Allows multiple relations for the same entity pair
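To make the factorization concrete, here is a small sketch that evaluates the unnormalized product of the join and extract factors (the numerator of $p(\boldsymbol{y}, \boldsymbol{z} \mid \boldsymbol{x})$, without $Z_x$) for the Steve Jobs example. The feature function `phi` (one indicator per relation and token) and the weights in `THETA` are hypothetical; they are not the features or weights of MultiR.

```python
import math

RELATIONS = ["born-in", "founder", "CEO-of", "capital-of"]

def phi(z_i, x_i):
    """Hypothetical features: one indicator per (relation, lowercase token)."""
    return {(z_i, tok): 1.0 for tok in x_i.lower().rstrip(".").split()}

def phi_extract(theta, z_i, x_i):
    """exp(sum_j theta_j * phi_j(z_i, x_i))."""
    return math.exp(sum(theta.get(f, 0.0) * v for f, v in phi(z_i, x_i).items()))

def phi_join(y_r, r, z):
    """1 unless y_r = 1 and no sentence is assigned relation r."""
    return 1.0 if (not y_r) or (r in z) else 0.0

def unnormalized_score(theta, y, z, x):
    """Product of all join factors and all extract factors."""
    score = 1.0
    for r in RELATIONS:
        score *= phi_join(y[r], r, z)
    for z_i, x_i in zip(z, x):
        score *= phi_extract(theta, z_i, x_i)
    return score

# Illustrative weights only.
THETA = {("founder", "founder"): 2.0, ("founder", "founded"): 2.0,
         ("CEO-of", "ceo"): 2.0}
X = ["Steve Jobs was founder of Apple.",
     "Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple.",
     "Steve Jobs is CEO of Apple."]
Y = {"born-in": 0, "founder": 1, "CEO-of": 1, "capital-of": 0}

consistent = unnormalized_score(THETA, Y, ["founder", "founder", "CEO-of"], X)
violating = unnormalized_score(THETA, Y, ["founder", "founder", "founder"], X)
```

The second assignment gets a score of zero because $y_{\text{CEO-of}} = 1$ but no sentence is labeled CEO-of, so the corresponding join factor vanishes.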
MultiR: Training
Hoffmann et al. (2011) Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations. ACL-2011, pages 541–550.
Loop for passes over the training data
  Loop for entity pairs in the KB
    Predict sentence-level and KB-level relations (ignoring the facts in the KB)
    Find an optimal assignment of sentence-level relations consistent with the facts in the KB
    Update feature weights similarly to the perceptron algorithm
We need two kinds of inference, one for each of the first two steps in the inner loop.
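The training loop can be sketched in a few lines under strong simplifications. The assumptions here are mine, not the paper's: features are hypothetical (relation, token) counts, there is no NONE label, the constrained inference is done by brute-force enumeration (so every entity pair needs at least as many sentences as KB relations), and the toy data is invented.

```python
from collections import Counter
from itertools import product

RELATIONS = ["founder", "CEO-of"]

def phi(z_i, x_i):
    """Hypothetical features: counts of (relation, lowercase token) pairs."""
    return Counter((z_i, tok) for tok in x_i.lower().split())

def score(theta, z_i, x_i):
    return sum(theta.get(f, 0.0) * v for f, v in phi(z_i, x_i).items())

def predict(theta, xs):
    """Inference 1: label each sentence independently, then aggregate."""
    z = [max(RELATIONS, key=lambda r: score(theta, r, x)) for x in xs]
    return set(z), z

def constrained(theta, xs, y):
    """Inference 2 by brute force: best z that expresses exactly the relations in y."""
    candidates = [z for z in product(sorted(y), repeat=len(xs)) if set(z) == y]
    return list(max(candidates,
                    key=lambda z: sum(score(theta, r, x) for r, x in zip(z, xs))))

def train(data, passes=5):
    theta = Counter()
    for _ in range(passes):
        for xs, y in data:
            y_pred, z_pred = predict(theta, xs)
            if y_pred != y:  # perceptron-style update only on KB-level mistakes
                z_gold = constrained(theta, xs, y)
                for x, zg, zp in zip(xs, z_gold, z_pred):
                    theta.update(phi(zg, x))    # promote the consistent labeling
                    theta.subtract(phi(zp, x))  # demote the predicted labeling
    return theta

# Toy training data: (sentences of one entity pair, relations in the KB).
DATA = [
    (["Page was founder of Google"], {"founder"}),
    (["Cook is CEO of Apple"], {"CEO-of"}),
    (["Jobs was founder of Apple", "Jobs is CEO of Apple"], {"founder", "CEO-of"}),
]
theta = train(DATA)
```

On this toy data the weights stop changing after the first pass, and each training pair is then predicted with exactly the relations listed for it.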
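MultiR: Inference 1: $\operatorname{argmax}_{\boldsymbol{y},\boldsymbol{z}} \; p(\boldsymbol{y}, \boldsymbol{z} \mid \boldsymbol{x})$
For entity pair (Steve Jobs, Apple), both $\boldsymbol{y}$ ($y_{\text{born-in}}$, $y_{\text{founder}}$, $y_{\text{CEO-of}}$, $y_{\text{capital-of}}$) and $\boldsymbol{z}$ ($z_1$, $z_2$, $z_3$) are unknown.
$x_1$: Steve Jobs was founder of Apple.
$x_2$: Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple.
$x_3$: Steve Jobs is CEO of Apple.
Score of each relation for each sentence:
         born-in   founder   CEO-of   capital-of
$x_1$    0.5       16.0      9.0      0.1
$x_2$    8.0       11.0      6.0      0.1
$x_3$    7.0       8.0       7.0      0.2
• Predict a relation label for each sentence independently
• Aggregate sentence-level predictions into global-level predictions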
MultiR: Inference 1: $\operatorname{argmax}_{\boldsymbol{y},\boldsymbol{z}} \; p(\boldsymbol{y}, \boldsymbol{z} \mid \boldsymbol{x})$
Result for entity pair (Steve Jobs, Apple):
$z_1$ = founder, $z_2$ = founder, $z_3$ = founder
$y_{\text{born-in}} = 0$, $y_{\text{founder}} = 1$, $y_{\text{CEO-of}} = 0$, $y_{\text{capital-of}} = 0$
Very easy to find!
Computational cost: $O(|R| \, |\boldsymbol{x}|)$
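The unconstrained inference amounts to a per-sentence argmax followed by an OR over sentences. The sketch below uses the scores shown on the slide, read as one row per sentence; the function name `infer_unconstrained` is made up for the example.

```python
RELATIONS = ["born-in", "founder", "CEO-of", "capital-of"]

# Scores from the slide: one row per sentence x_1..x_3, one column per relation.
SCORES = [
    [0.5, 16.0, 9.0, 0.1],
    [8.0, 11.0, 6.0, 0.1],
    [7.0, 8.0, 7.0, 0.2],
]

def infer_unconstrained(scores):
    """Pick the best relation per sentence, then set y_r = 1 if any z_i = r."""
    z = [max(range(len(row)), key=row.__getitem__) for row in scores]
    y = [int(r in z) for r in range(len(RELATIONS))]
    return y, [RELATIONS[r] for r in z]

y, z = infer_unconstrained(SCORES)
# y == [0, 1, 0, 0]
# z == ["founder", "founder", "founder"]
```

Each of the $|\boldsymbol{x}|$ sentences is compared against $|R|$ relations once, which gives the $O(|R| \, |\boldsymbol{x}|)$ cost.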
MultiR: Inference 2: $\operatorname{argmax}_{\boldsymbol{z}} \; p(\boldsymbol{z} \mid \boldsymbol{x}, \boldsymbol{y})$
For entity pair (Steve Jobs, Apple), $\boldsymbol{y}$ is given by the KB ($y_{\text{born-in}} = 0$, $y_{\text{founder}} = 1$, $y_{\text{CEO-of}} = 1$, $y_{\text{capital-of}} = 0$) and $z_1$, $z_2$, $z_3$ are unknown.
Edge weights between relation nodes $y_r$ and sentence nodes $z_i$:
              $z_1$   $z_2$   $z_3$
born-in       0.5     8       7
founder       16      11      8
CEO-of        9       6       7
capital-of    0.1     0.1     0.2
• Define an edge weight: $w(y_r, z_i) = \Phi^{\text{extract}}(r, x_i)$
• A node with $y_r = 1$ must have at least one edge connecting to some $z_i$
• Each node $z_i$ must have an edge connecting to some $y_r$
• Find a set of edges that maximizes the sum of weights
MultiR: Inference 2: $\operatorname{argmax}_{\boldsymbol{z}} \; p(\boldsymbol{z} \mid \boldsymbol{x}, \boldsymbol{y})$
Result for entity pair (Steve Jobs, Apple) with $\boldsymbol{y} = (0, 1, 1, 0)$:
$z_1$ = founder, $z_2$ = founder, $z_3$ = CEO-of
Candidate edges (only relations with $y_r = 1$):
           $z_1$   $z_2$   $z_3$
founder    16      11      8
CEO-of     9       6       7
• Exact solution in polynomial time
• In practice, an approximate solution by greedy search (assigning $z_i$ for each node $y_r = 1$) is sufficient
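The constrained inference can be checked on the slide's numbers with a small sketch. Two caveats: `exact` is a brute-force enumeration used only to verify this tiny example (it is not the polynomial-time algorithm mentioned on the slide), and `greedy` is one plausible reading of "assigning $z_i$ for each node $y_r = 1$", not necessarily the procedure used in MultiR.

```python
from itertools import product

RELATIONS = ["born-in", "founder", "CEO-of", "capital-of"]

# Edge weights w(y_r, z_i) from the slide: one row per sentence x_1..x_3.
WEIGHTS = [
    [0.5, 16.0, 9.0, 0.1],
    [8.0, 11.0, 6.0, 0.1],
    [7.0, 8.0, 7.0, 0.2],
]

def total(z):
    """Sum of the selected edge weights."""
    return sum(WEIGHTS[i][r] for i, r in enumerate(z))

def exact(active):
    """Enumerate all assignments over active relations that cover each of them."""
    feasible = (z for z in product(active, repeat=len(WEIGHTS))
                if set(z) == set(active))
    return list(max(feasible, key=total))

def greedy(active):
    """Give each active relation its best free sentence, then fill the rest.

    Assumes there are at least as many sentences as active relations.
    """
    z = [None] * len(WEIGHTS)
    for r in active:
        free = [i for i in range(len(z)) if z[i] is None]
        z[max(free, key=lambda i: WEIGHTS[i][r])] = r
    for i in range(len(z)):
        if z[i] is None:
            z[i] = max(active, key=lambda r: WEIGHTS[i][r])
    return z

y = [0, 1, 1, 0]
active = [r for r, flag in enumerate(y) if flag]  # relations with y_r = 1
z_exact = [RELATIONS[r] for r in exact(active)]
z_greedy = [RELATIONS[r] for r in greedy(active)]
```

On this example both functions return founder, founder, CEO-of with a total weight of 34. The greedy result depends on the order in which active relations are processed, so agreement with the exact solution is not guaranteed in general.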
Contribution of this work
• MultiR makes two assumptions (hard constraints):
  • If a fact is not found in the database, it cannot be mentioned in the text
  • If a fact is in the database, it must be mentioned in at least one sentence
• Relax MultiR to handle the situations where:
  • A fact is not mentioned in text (MIT)
  • A fact mentioned in text is missing in database (MID)
• Side effect of this relaxation
  • Incorporates the tendency that the knowledge base is likely to include popular entities and relations
Distant Supervision with Data Not Missing at Random (DNMAR)
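For entity pair (Steve Jobs, Apple):
$\boldsymbol{y}$: $y_{\text{born-in}} = 0$, $y_{\text{founder}} = 1$, $y_{\text{CEO-of}} = 1$, $y_{\text{visit}} = 0$
$\boldsymbol{t}$: $t_{\text{born-in}} = 0$, $t_{\text{founder}} = 1$, $t_{\text{CEO-of}} = 0$, $t_{\text{visit}} = 1$
$z_1$ = Founder, $x_1$: Steve Jobs was founder of Apple.
$z_2$ = Founder, $x_2$: Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple.
$z_3$ = visit, $x_3$: Steve Jobs visited Apple store…
Introduce a layer of latent variables ($t_r$) to handle missing cases
Introduce a new factor:
$$\phi^{\text{miss}}(y_r, t_r) = \begin{cases} -\alpha_{\text{MIT}} & (y_r = 1 \land t_r = 0) \quad \text{(missing in text)} \\ -\alpha_{\text{MID}} & (y_r = 0 \land t_r = 1) \quad \text{(missing in DB)} \\ 0 & \text{(otherwise)} \end{cases}$$
Relaxing two hard constraints in MultiR into soft ones with penalty factors $-\alpha_{\text{MIT}}$ and $-\alpha_{\text{MID}}$
Training algorithm is the same as the one used in MultiR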
Constrained inference: $\operatorname{argmax}_{\boldsymbol{z}} \; p(\boldsymbol{z} \mid \boldsymbol{x}, \boldsymbol{y})$
For entity pair (Steve Jobs, Apple): $\boldsymbol{y} = (0, 1, 1, 0)$ is given for (born-in, founder, CEO-of, visit); $z_1$, $z_2$, $z_3$ and $\boldsymbol{t}$ are unknown.
$$\boldsymbol{z}^* = \operatorname{argmax}_{\boldsymbol{z}} \; \sum_{i=1}^{n} \theta \cdot \phi(z_i, x_i) - \sum_r \Big[ \alpha_{\text{MIT}} \, \mathbb{1}(y_r \land \neg\exists i: r = z_i) + \alpha_{\text{MID}} \, \mathbb{1}(\neg y_r \land \exists i: r = z_i) \Big]$$
Inference became more challenging than in MultiR
• A* search can find an exact solution, but is not scalable with many variables
• Present a greedy hill climbing approach for the inference:
  1. Initialize $z_i$ at random
  2. Obtain neighborhoods of the current solution
  3. Move to the neighbor yielding the highest score
  4. Repeat this process
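The four steps can be sketched as follows. Several things here are assumptions made for illustration: the neighborhood is "change the label of one sentence", the penalties $\alpha_{\text{MIT}}$ and $\alpha_{\text{MID}}$ are treated as non-negative numbers that are subtracted, the search stops at the first local optimum, and the scores for the third sentence are invented (the slide gives no numbers for the visit example).

```python
import random

RELATIONS = ["born-in", "founder", "CEO-of", "visit"]

# Hypothetical per-sentence scores theta . phi(z_i, x_i); rows are x_1..x_3.
SCORES = [
    [0.5, 16.0, 9.0, 0.1],
    [8.0, 11.0, 6.0, 0.1],
    [1.0, 2.0, 3.0, 12.0],
]
Y = [0, 1, 1, 0]  # KB facts for (born-in, founder, CEO-of, visit)

def objective(z, scores, y, alpha_mit, alpha_mid):
    """Sentence scores minus soft penalties for missing-in-text / missing-in-DB."""
    total = sum(scores[i][r] for i, r in enumerate(z))
    for r in range(len(y)):
        mentioned = r in z
        if y[r] and not mentioned:
            total -= alpha_mit  # fact in the KB, but no sentence expresses it
        elif not y[r] and mentioned:
            total -= alpha_mid  # fact expressed in text, but absent from the KB
    return total

def hill_climb(scores, y, alpha_mit, alpha_mid, rng):
    """Greedy hill climbing over single-label changes from a random start."""
    n_rel = len(y)
    z = [rng.randrange(n_rel) for _ in scores]
    current = objective(z, scores, y, alpha_mit, alpha_mid)
    while True:
        neighbors = [z[:i] + [r] + z[i + 1:]
                     for i in range(len(z)) for r in range(n_rel) if r != z[i]]
        best = max(neighbors,
                   key=lambda n: objective(n, scores, y, alpha_mit, alpha_mid))
        best_value = objective(best, scores, y, alpha_mit, alpha_mid)
        if best_value <= current:
            return z, current  # no neighbor improves: local optimum
        z, current = best, best_value

z_star, value = hill_climb(SCORES, Y, 2.0, 4.0, random.Random(0))
```

With these hypothetical numbers, the assignment (founder, founder, visit) scores 33 after paying both penalties, while (founder, founder, CEO-of), which satisfies the hard constraints of MultiR, scores 30. Hill climbing only guarantees a local optimum, so random restarts are a common addition.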
Incorporating popularity in KB
• We tune the penalty factors $\alpha_{\text{MIT}}$ and $\alpha_{\text{MID}}$ on a development set
• We can take into account how likely each fact is to be observed in the text and the knowledge base
  • Facts about Barack Obama are likely to exist
  • Facts about Naoaki Okazaki are unlikely to exist
• Control the penalty factor for each entity pair
  • Popularity of entities: $\alpha_{\text{MID}}^{(e_1, e_2)} = -\gamma \min(c(e_1), c(e_2))$
    • A larger penalty if the model predicts that a fact about a popular entity does not exist in KB
  • Well-aligned relations: assign 3 kinds of values of $\alpha_{\text{MIT}}^{r}$
    • A larger penalty if a popular relation such as contains, place_lived, and nationality does not exist in text
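As a worked example of the entity-popularity formula, the sketch below evaluates $-\gamma \min(c(e_1), c(e_2))$ as written on the slide. It reads $c(e)$ as the number of knowledge-base facts that mention entity $e$, which is an assumption (the slide does not define $c$); the counts and the value of $\gamma$ are invented.

```python
def alpha_mid(e1, e2, counts, gamma):
    """Per-pair penalty factor: -gamma * min(c(e1), c(e2)), as on the slide."""
    return -gamma * min(counts.get(e1, 0), counts.get(e2, 0))

# Hypothetical popularity counts c(e).
COUNTS = {"Barack Obama": 500, "United States": 900, "Naoaki Okazaki": 2}

popular = alpha_mid("Barack Obama", "United States", COUNTS, gamma=0.5)
obscure = alpha_mid("Naoaki Okazaki", "United States", COUNTS, gamma=0.5)
# popular == -250.0, obscure == -1.0
```

The magnitude of the factor is driven by the less popular entity of the pair, so a pair is treated as popular only when both of its entities are.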
Experiments
• Binary relation extraction
  • The standard setting (Riedel+, 10)
    • Knowledge base: Freebase relations
    • Text corpus: 1.8 million New York Times articles
  • Two kinds of evaluation
    • Sentence-level extractions using the dataset (Hoffmann+, 11)
    • Holdout evaluation on Freebase knowledge
• Unary relation extraction (NE categorization)
  • Twitter NE categorization dataset (Ritter+, 11)
    • Knowledge base: Freebase (instances and their categories)
    • Text corpus: tweets
  • Hold-out evaluation
Results
17% increase in area under the curve.
Incorporating popularity yielded a 27% increase over the baseline.
This evaluation underestimates precision because many facts correctly extracted from text are missing in the database.
DNMAR doubled the recall.
Ritter et al. (2013) Modeling Missing Data in Distant Supervision for Information Extraction, TACL(1), 367-378.
Conclusion
• Investigated the problem of missing data in distant supervision
• Presented an extension of MultiR to handle missing data
• Could incorporate the popularity of facts to be included in the knowledge base and text
• Presented a scalable inference algorithm based on greedy hill-climbing
• Demonstrated the effectiveness of the modeling
References
• Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, Daniel S. Weld. (2011) Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations. ACL-2011, pages 541–550.
  • Slides and code
• Mike Mintz, Steven Bills, Rion Snow, Dan Jurafsky. (2009) Distant supervision for relation extraction without labeled data. ACL-2009, pages 1003–1011.