A SURVEY ON INFORMATION EXTRACTION FROM DOCUMENTS USING STRUCTURES OF SENTENCES
Chikayama Taura Lab. M1 Mitsuharu Kurita
INTRODUCTION
Current search systems are based on 2 assumptions
  1. Users send words, not sentences
  2. The aim is to find documents that are related to the query words
We unconsciously come to select words that are likely to appear near the target information
  In some cases this clue doesn’t work well
INTRODUCTION
For more convenient access to information
  Analysis of the details of the question
    To know the target information
  Analysis of the information in retrieved documents
    To find the requested information (Information Extraction)
OUTLINE
Introduction
Overview of Information Extraction (IE)
IE with pattern matching
IE with sentence structures
  Frequent substructure
  Shortest path between 2 words
  Applying the kernel method for structured data
Conclusion
INFORMATION EXTRACTION
What is Information Extraction?
  A kind of task in natural language processing
  Addresses extraction of information from texts
    Not retrieval of the documents themselves
  Originated with an international conference named MUC
Message Understanding Conference (MUC)
  Competition of IE among research groups
  Set information extraction tasks for each conference, held between 1987 and 1997
MUC COMPETITION
An example of a MUC task
  MUC-3 terrorism domain
  Input: news articles (some of them include terrorism events)
  Output: the instances involved in each incident
MUC COMPETITION
Pattern matching or linguistic analysis
  At that time (1987-1997), advanced natural language processing was difficult to use in practice
  Therefore, most competitors adopted pattern matching to find instances
EXAMPLE OF PATTERN MATCHING
CIRCUS [92 Lehnert et al.]
  Each pattern consists of a “trigger word” and a “linguistic pattern”

Pattern: kidnap-passive
  Trigger: “kidnap”
  Linguistic pattern: “<subject> passive-verb”
  Variable: “target”

Example: “The mayor was kidnapped by terrorists.”
  1. “kidnap” activates the pattern
  2. “was kidnapped” is a passive verb phrase
  3. The subject “mayor” is the target
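The three matching steps above can be sketched in a few lines of Python. This is only an illustration, not CIRCUS itself: the `PATTERN` dictionary, the `match_passive` function, and the regular expression standing in for passive-voice detection are all assumptions made for this example (a real system would use a parser rather than a regex).

```python
import re

# Hypothetical, simplified stand-in for one CIRCUS-style pattern:
# a trigger word plus a "<subject> passive-verb" linguistic pattern
# whose subject fills the "target" variable.
PATTERN = {"name": "kidnap-passive", "trigger": "kidnap", "slot": "target"}


def match_passive(pattern, sentence):
    """Return {slot: subject} if the pattern fires on the sentence, else None."""
    trigger = pattern["trigger"]
    # Step 1: the trigger word must occur for the pattern to activate.
    if trigger not in sentence.lower():
        return None
    # Step 2: crude passive check (an assumption, not real parsing):
    # "<optional article> <subject> was/were <trigger...>ed".
    regex = re.compile(
        r"^(?:(?:the|a|an)\s+)?(?P<subject>[\w ]+?)\s+(?:was|were)\s+"
        + re.escape(trigger)
        + r"\w*ed\b",
        re.IGNORECASE,
    )
    m = regex.search(sentence)
    if m is None:
        return None
    # Step 3: the subject of the passive verb phrase is the target.
    return {pattern["slot"]: m.group("subject")}


result = match_passive(PATTERN, "The mayor was kidnapped by terrorists.")
active = match_passive(PATTERN, "Terrorists kidnapped the mayor.")
```

The active-voice sentence contains the trigger but fails the linguistic pattern, so nothing is extracted from it.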
PROBLEMS OF PATTERN MATCHING
It takes a huge amount of time to create patterns
  In many cases, they were handwritten
It depends a lot on the target domain
  It is difficult to adapt to a new task
Response: automatic construction of patterns
THE EARLIEST AUTOMATIC PATTERN GENERATION
AutoSlog [93 Riloff et al.]
  Creates the patterns for CIRCUS automatically
  Training data: articles in which the target words are tagged
  Created 1237 patterns from 1500 tagged texts
  Only 450 of them were judged to be valid by a human

Example: “The mayor was kidnapped by terrorists.” yields
  Pattern: kidnap-passive
  Trigger: “kidnap”
  Linguistic pattern: “<subject> passive-verb”
  Variable: “target”
Recently it has become possible to use deeper linguistic analysis
Some studies are addressing new IE tasks using these linguistic resources and machine learning approaches
SENTENCE STRUCTURES
Dependency structure
  Describes modification relations between words
  One sentence makes up a tree structure
Predicate-argument structure
  Describes the semantic relations between predicate and argument
  One sentence makes up a graph structure
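To make the tree/graph distinction concrete, here is a small Python sketch. Both analyses of the example sentence are hand-written for illustration and are not the output of any parser; the edge labels (`arg1`, `arg2`) and the helper names are assumptions.

```python
from collections import Counter

# Hand-written dependency analysis of "The mayor was kidnapped by terrorists."
# heads[word] is the word it modifies; None marks the root.
heads = {
    "The": "mayor",
    "mayor": "kidnapped",
    "was": "kidnapped",
    "kidnapped": None,
    "by": "kidnapped",
    "terrorists": "by",
}


def is_tree(heads):
    """True if there is exactly one root and every word reaches it without a cycle."""
    roots = [w for w, h in heads.items() if h is None]
    if len(roots) != 1:
        return False
    for word in heads:
        seen = set()
        while word is not None:
            if word in seen:
                return False  # cycle
            seen.add(word)
            word = heads[word]
    return True


# Hand-written predicate-argument edges: (predicate, label, argument).
# "terrorists" is an argument of two predicates, so this is a graph, not a tree.
pas_edges = [
    ("kidnapped", "arg1", "terrorists"),
    ("kidnapped", "arg2", "mayor"),
    ("by", "arg1", "kidnapped"),
    ("by", "arg2", "terrorists"),
]

incoming = Counter(arg for _, _, arg in pas_edges)
```

In the dependency structure every word has a single head, while in the predicate-argument structure a word may have several incoming edges.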
DIFFICULTIES OF USING STRUCTURED DATA
Most machine learning algorithms deal with data as feature vectors
It is difficult to express structured data (e.g. trees, graphs) as vectors
Ways to use sentence structures for IE
  Frequent substructures
  Shortest paths between 2 words
  Applying the kernel method for structured data
IE WITH SUBGRAPH OF SENTENCE STRUCTURES
On-Demand Information Extraction [06 Sekine et al.]
  Create extraction patterns on demand and extract information with them
[Figure: query -> relevant articles (from the article database) -> dependency analyzer -> dependency trees -> frequent subtree mining -> subtree patterns -> table of information]
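The frequent subtree mining step can be illustrated with a deliberately reduced Python sketch. It counts only the smallest possible subtrees (single head-to-dependent edges) and keeps those found in at least `min_support` trees; real frequent subtree mining handles larger subtrees. The function name, the edge-list representation, and the three toy trees are assumptions made for this example, not data from the paper.

```python
from collections import Counter


def frequent_edges(trees, min_support):
    """Return {edge: support} for edges found in at least min_support trees.

    Each tree is a list of (head, dependent) edges. An edge is counted
    once per tree, however often it occurs inside that tree.
    """
    support = Counter()
    for edges in trees:
        support.update(set(edges))
    return {edge: n for edge, n in support.items() if n >= min_support}


# Toy dependency trees with entity placeholders (COM1, COM2, MNY).
trees = [
    [("acquire", "COM1"), ("acquire", "COM2"), ("acquire", "for"), ("for", "MNY")],
    [("acquire", "COM1"), ("acquire", "COM2"), ("acquire", "will")],
    [("buy", "COM1"), ("buy", "COM2"), ("buy", "for"), ("for", "MNY")],
]

patterns = frequent_edges(trees, min_support=2)
```

Edges that survive the support threshold play the role of candidate pattern fragments; rare edges such as `("acquire", "will")` are dropped.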
EXPERIMENTAL RESULTS
Generated patterns
  Found patterns for the query “merger and acquisition” (M&A)
Extracted information
  For the query “acquire, acquisition, merger, buy, purchase”

[Figure: generated subtree patterns]
  <COM1> <agree to buy> <COM2> <for MNY>
  <COM1> <will acquire> <COM2> <for MNY>
  <a MNY merger> <of COM1> <and COM2>
EXPERIMENTAL RESULTS
Very quick construction of patterns
  In MUC, participants were allowed one month
  ODIE takes only a few minutes to return the result
No training corpus is needed
  ODIE learns extraction patterns from the data
Information about recurring events can be extracted well
  Mergers and acquisitions
  Nobel prize winners
IE WITH SHORTEST PATH BETWEEN WORDS
Extraction of interacting protein pairs [06 Yakushiji et al.]
  Extract the interacting protein pairs from biomedical articles
  Focus on the shortest path between 2 protein names on the predicate-argument structure
  Discriminate with Support Vector Machine (SVM)

Example: “Entity1 is interacted with a hydrophilic loop region of Entity2.”
[Figure: predicate-argument graph of the example sentence with nodes be, entity1, interact, with, region, of, a, hydrophilic, loop, entity2]
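Finding the shortest path between the two protein names is a plain breadth-first search once the sentence structure is available as a graph. The sketch below is an illustration only: the edge list is a hand-written guess at the structure in the figure (the slide text does not preserve the edges), and `shortest_path` is a hypothetical helper, not code from the paper.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Breadth-first search over an undirected graph given as (a, b) edges."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adjacency.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None  # no path between the two words


# Assumed edges for "Entity1 is interacted with a hydrophilic loop region of Entity2."
edges = [
    ("be", "entity1"), ("be", "interact"), ("interact", "entity1"),
    ("interact", "with"), ("with", "region"), ("region", "a"),
    ("region", "hydrophilic"), ("region", "loop"), ("region", "of"),
    ("of", "entity2"),
]

path = shortest_path(edges, "entity1", "entity2")
```

Words such as "a", "hydrophilic", and "loop" lie off the path, which is why the path is a compact pattern candidate for the relation between the two entities.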
PATTERN GENERATION
Variation of patterns
  The extracted patterns are not enough
  Divide the patterns and combine them into new patterns
[Figure: a pattern divided into components (Main, Prep, Entity, Entity, ...); example words: X, interact, Y, with, protein, region, of]
PATTERN GENERATION
Validation of patterns
  Some of these patterns are inappropriate
  Each pattern is scored by its adequacy to the learning data
[Figure: feature vector built from the pattern scores; TP: True Positive, FP: False Positive]
SUPPORT VECTOR MACHINE (SVM)
2-class linear classifier
Divides the data space with a hyperplane
Margin maximization
[Figure: margin maximization]
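For reference, margin maximization is usually written as the following optimization problem. This is the standard hard-margin formulation from the general SVM literature, not something stated on the slide; the symbols (weight vector w, bias b, training vectors x_i with labels y_i in {-1, +1}) are introduced here as assumptions.

```latex
\min_{\mathbf{w},\,b}\ \frac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1, \qquad i = 1,\dots,n
```

The separating hyperplane is w·x + b = 0 and the margin width is 2/||w||, so minimizing ||w|| maximizes the margin.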
EXPERIMENTAL RESULTS
Learning
  AImed corpus
    225 abstracts of biomedical papers
    Annotated with protein names and interactions
Extraction
  MEDLINE
    14 million titles and 8 million abstracts
Extracted data
  7775 protein pairs
  64.0% precision
  83.8% recall
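As a reminder of how the two reported measures are defined, here is a minimal Python sketch. The counts used in the example call are made up for illustration and have no connection to the experiment above; `precision_recall` is a hypothetical helper.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical counts: 8 correct extractions, 2 wrong ones, 8 missed pairs.
p, r = precision_recall(tp=8, fp=2, fn=8)
```

Precision measures how many extracted pairs are correct, and recall measures how many of the correct pairs were extracted.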
IE WITH THE KERNEL METHOD ON SENTENCE STRUCTURES
Kernel method
  e.g. SVM
  Data are used only in the form of dot products
  If you can calculate the dot product directly, you do not have to calculate the vector
  Furthermore, you can use other functions as long as they meet some conditions
[Figure: raw data -> vector space -> classifier, with the kernel function taking the place of the explicit mapping to the vector space]
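The point that the dot product can be computed without building the vector can be checked numerically. The sketch below uses the degree-2 polynomial kernel as an example; this kernel is chosen only for illustration and is not the kernel used in the work surveyed here, and the function names are assumptions.

```python
from itertools import product


def dot(u, v):
    """Ordinary dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def explicit_features(x):
    """Explicit feature map: all ordered degree-2 products x_i * x_j."""
    return [a * b for a, b in product(x, repeat=2)]


def poly2_kernel(x, y):
    """Kernel function: the same value, computed without the feature map."""
    return dot(x, y) ** 2


x = [1, 2, 3]
y = [4, 5, 6]

via_vectors = dot(explicit_features(x), explicit_features(y))  # 9-dimensional vectors
via_kernel = poly2_kernel(x, y)                                # one 3-dimensional dot product
```

Both routes give the same number, but the kernel never materializes the larger vector. For trees and graphs the explicit vector may be impractically large, which is why a directly computable kernel is attractive.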
RELATION EXTRACTION
Relation Extraction with Tree Kernel [04 Culotta et al.]
  Classify the relation between 2 entities
    5 entity types (person, organization, geo-political-entity, location, facility)
    5 major types of relations (at, near, part, role, social)
  Classify the smallest subtree of the dependency tree which includes the entities
TREE KERNEL
Represents the similarity between 2 tree-shaped data
Calculated as the sum of the similarities of nodes

[Flowchart]
  Start
  1. Enqueue the root node pair
  2. Is the queue empty?
       Yes: return the similarity (End)
       No: go to step 3
  3. Dequeue a node pair
  4. Add the similarity
  5. Find all child node sequence pairs whose nodes have common main features
  6. Enqueue the child node pairs, then go back to step 2
CALCULATION OF TREE KERNEL
Features of nodes
  The similarity between nodes is defined as the number of common features (except the main features)
[Figure: node features, with the main features labeled]
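The queue-based procedure and the node similarity can be put together in a short Python sketch. It is a simplification under stated assumptions, not the kernel from the paper: it pairs individual children rather than child node sequences, treats the main feature as a single value, returns 0 when the roots' main features differ (an assumption the flowchart does not spell out), and omits any weighting. The `node` helper and the toy trees are hypothetical.

```python
from collections import deque


def node(main, features=(), children=()):
    """Build a tree node with one main feature, other features, and children."""
    return {"main": main, "features": frozenset(features), "children": list(children)}


def node_similarity(a, b):
    """Number of common features, not counting the main feature."""
    return len(a["features"] & b["features"])


def tree_kernel(root1, root2):
    """Sum node similarities over node pairs whose main features match."""
    if root1["main"] != root2["main"]:
        return 0  # assumed: trees with unmatched roots have zero similarity
    total = 0
    queue = deque([(root1, root2)])          # enqueue the root node pair
    while queue:                             # loop until the queue is empty
        a, b = queue.popleft()               # dequeue a node pair
        total += node_similarity(a, b)       # add the similarity
        for child_a in a["children"]:        # find child pairs with common main features
            for child_b in b["children"]:
                if child_a["main"] == child_b["main"]:
                    queue.append((child_a, child_b))
    return total


# Toy trees: node names play the role of main features.
t1 = node("A", {"noun"}, [
    node("B", {"person"}),
    node("C", {"verb"}),
    node("D", {"org"}, [node("E", {"x"})]),
])
t2 = node("A", {"noun"}, [
    node("B", {"person"}),
    node("D", {"loc"}, [node("E", {"x"})]),
])

similarity = tree_kernel(t1, t2)
```

Here the pairs (A, A), (B, B), and (E, E) each share one feature and (D, D) shares none, so the sum is 3; node C has no partner in the second tree and contributes nothing.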
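CALCULATION OF TREE KERNEL
[Figure: worked example that matches a tree with nodes A, B, C, D, E against a tree with nodes A’, B’, C’, D’, E’, F’, showing the node pairs and child node sequence pairs that are compared]
X and X’ denote the nodes whose main features are common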
EXPERIMENTAL RESULTS
Data set: ACE corpus
  800 annotated documents (gathered from newspapers and broadcasts)
  5 entity types (person, organization, geo-political-entity, location, facility)
  5 major types of relations (at, near, part, role, social)

Kernel               Precision (%)   Recall (%)
Bag-of-words kernel  47.0            10.0
Tree kernel          69.6            25.3
CONCLUSION
Overview of Information Extraction
  The aim of information extraction
  Recent movement to use deep linguistic resources
The way to use sentence structures for IE
  Difficulties of using structured data in machine learning
  Three different approaches to exploit them