Combining Information Extraction, Deductive Reasoning and Machine Learning for Relation Prediction
Xueyan Jiang 2, Yi Huang 1,2, Maximilian Nickel 2, Volker Tresp 1,2
Siemens AG, Corporate Technology, Munich, Germany 1
Ludwig Maximilian University of Munich, Munich, Germany 2
May 31, 2012
Introduction
Relation prediction in RDF graph
RDF graph: knowledge base in the form of a triple store
Task: predict the truth of an instance of a relation or statement, i.e. of an RDF triple
Introduction
Knowledge base: existing triples and new triples derived from deductive reasoning
Introduction
Unstructured contextual information: Wikipedia pages, Web pages, texts in literals
Motivation
Common approaches for relation prediction
IE (Information Extraction)
Data source: unstructured data, such as texts or images
Limitation: unstructured information may not be available
DR (Deductive Reasoning)
Data source: a set of axioms
Limitation: can derive only a subset of the true statements; difficult to deal with uncertainty
ML (Machine Learning)
Data source: a set of true statements
Limitation: data must contain relevant statistical structure
Advantage: can express statistical dependencies between relations and handle incomplete data
Proposal
Combine IE, DR and ML in a principled way to make use of all knowledge sources for relation prediction
Outline
Matrix Representation for an RDF Graph
Proposed Framework for Combining IE, DR and ML
Prediction of relations from unstructured information (IE step)
Derivation of relations from the knowledge base (DR step)
Combination of IE step and DR step
Derivation of confidence values for predicted relations using a probabilistic latent factor model (ML step)
Matrix Representation for an RDF Graph
We construct a matrix X from the RDF graph
Each subject is represented as a row
Each column represents a (p,o) pair
A matrix element X(s,p,o) is equal to one if the corresponding triple is known to exist and is equal to zero otherwise
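The construction above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `build_matrix` and the toy triples are hypothetical, and a dense NumPy array is used only for readability (a real triple store would likely call for a sparse matrix).

```python
import numpy as np

def build_matrix(triples):
    """Build the binary subject-by-(p, o) matrix X from (s, p, o) triples."""
    subjects = sorted({s for s, _, _ in triples})
    po_pairs = sorted({(p, o) for _, p, o in triples})
    row = {s: i for i, s in enumerate(subjects)}
    col = {po: k for k, po in enumerate(po_pairs)}
    X = np.zeros((len(subjects), len(po_pairs)), dtype=np.int8)
    for s, p, o in triples:
        X[row[s], col[(p, o)]] = 1  # one if the triple is known, zero otherwise
    return X, subjects, po_pairs

# Hypothetical toy triples, for illustration only.
triples = [
    ("Goethe", "bornIn", "Frankfurt"),
    ("Goethe", "type", "Writer"),
    ("Kafka", "bornIn", "Prague"),
    ("Kafka", "type", "Writer"),
]
X, subjects, po_pairs = build_matrix(triples)
print(X)  # rows: Goethe, Kafka; columns: the three (p, o) pairs in sorted order
```

Each row is then a feature vector for one subject, which is what the later steps operate on.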
10 / 22
Proposed Framework for Combining IE, DR and ML
Prediction of relations from unstructured information (IE step)
In principle, any IE system can be used
In our approach, we build a classifier to predict
$P(X = 1 \mid \mathrm{IE}) \Leftrightarrow P(X = 1 \mid \mathrm{text}_{\mathrm{subject}}, \mathrm{text}_{\mathrm{object}})$
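As a rough illustration of the IE step, the sketch below trains one text classifier for a single (p,o) column. Everything specific here is an assumption rather than something stated on the slides: the bag-of-words features, the plain gradient-descent logistic regression, the use of the subject text only, and the toy texts and labels.

```python
import numpy as np

def bag_of_words(texts, vocab):
    """Binary bag-of-words features: one column per vocabulary word."""
    return np.array([[1.0 if w in t.lower().split() else 0.0 for w in vocab]
                     for t in texts])

def add_bias(Phi):
    """Append a constant column so the model has an intercept."""
    return np.hstack([Phi, np.ones((Phi.shape[0], 1))])

def fit_logistic(Phi, y, lr=0.5, steps=2000, l2=1e-3):
    """Fit L2-regularized logistic regression by plain gradient descent."""
    Phi = add_bias(Phi)
    w = np.zeros(Phi.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Phi @ w))
        w -= lr * (Phi.T @ (p - y) / len(y) + l2 * w)
    return w

def predict_proba(Phi, w):
    """Return P(X = 1 | text) for each row of Phi."""
    return 1.0 / (1.0 + np.exp(-add_bias(Phi) @ w))

# Hypothetical subject texts and labels for one column, e.g. (citizenOf, Germany).
texts = [
    "german poet born in frankfurt",
    "czech novelist born in prague",
    "german novelist from luebeck",
    "french poet born in paris",
]
y = np.array([1.0, 0.0, 1.0, 0.0])
vocab = sorted({w for t in texts for w in t.lower().split()})
w = fit_logistic(bag_of_words(texts, vocab), y)
p_ie = predict_proba(bag_of_words(texts, vocab), w)  # one P(X = 1 | IE) per subject
```

Repeating this per column yields a matrix of IE probabilities with the same shape as X.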
Proposed Framework for Combining IE, DR and ML
Derivation of relations from the knowledge base (DR step)
Knowledge base: known triples and the triples added via deductive reasoning (calculation of the deductive closure)
Any reasoner can be used
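To make "calculation of the deductive closure" concrete, here is a minimal forward-chaining sketch. It stands in for a real reasoner and is not the one used by the authors; the rule `born_in_country`, the predicate names, and the toy triples are hypothetical.

```python
def deductive_closure(triples, rules):
    """Apply all rules repeatedly until no new triple is derived (fixed point)."""
    closure = set(triples)
    while True:
        derived = set()
        for rule in rules:
            derived |= rule(closure)
        if derived <= closure:  # nothing new: the closure is complete
            return closure
        closure |= derived

def born_in_country(kb):
    """Hypothetical rule: bornIn(s, city) and locatedIn(city, country)
    imply bornInCountry(s, country)."""
    return {(s, "bornInCountry", country)
            for s, p, city in kb if p == "bornIn"
            for c, q, country in kb if q == "locatedIn" and c == city}

# Hypothetical known triples, for illustration only.
known = {
    ("Kafka", "bornIn", "Prague"),
    ("Prague", "locatedIn", "Czech Republic"),
}
kb = deductive_closure(known, [born_in_country])
```

The resulting set `kb` holds the known triples plus the derived ones, which is the knowledge base the later steps build on.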
Proposed Framework for Combining IE, DR and ML
Combination of IE step and DR step:
$P(X = 1 \mid \mathrm{IE}, \mathrm{DR}) = \max\big(P(X = 1 \mid \mathrm{IE}),\, P(X = 1 \mid \mathrm{DR})\big)$
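The max rule is an element-wise operation on two matrices of the same shape. The sketch below assumes, as a simplification not stated on the slide, that P(X = 1 | DR) is one for triples in the deductive closure and zero otherwise; the numbers are made up.

```python
import numpy as np

def combine_ie_dr(p_ie, p_dr):
    """Element-wise maximum of the IE and DR probabilities."""
    return np.maximum(p_ie, p_dr)

# Hypothetical values: rows are subjects, columns are (p, o) pairs.
p_ie = np.array([[0.2, 0.9],
                 [0.6, 0.1]])
# Assumption: 1 for triples in the knowledge base (after reasoning), else 0.
p_dr = np.array([[1.0, 0.0],
                 [0.0, 0.0]])
p_combined = combine_ie_dr(p_ie, p_dr)
```

With this choice a known or derived triple always keeps probability one, while IE evidence fills in the remaining entries.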
Proposed Framework for Combining IE, DR and ML
Derivation of confidence values for predicted relations using a probabilistic latent factor model (ML step)
Model description
We define a new parameterization with a continuous $f_{i,k}$ using $\mathrm{sig}(f_{i,k}) = P(X_{i,k} = 1 \mid \mathrm{IE}, \mathrm{DR})$
For each subject entity $e_i$ we introduce a $d$-dimensional latent variable $h_i \sim \mathcal{N}(0, I)$
For each subject entity $e_i$, $\alpha_i$ is generated via $\alpha_i = A h_i$, where $A$ has $d$ columns
Then we assume $f_{i,k} = \alpha_{i,k} + \epsilon_{i,k}$
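The parameterization sig(f) = P(X = 1 | IE, DR) means that f is obtained by inverting the sigmoid. A minimal sketch follows; the clipping constant `eps` is an assumption added here so that probabilities of exactly zero or one map to finite values, and it is not taken from the slides.

```python
import numpy as np

def sig(f):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-f))

def inverse_sig(p, eps=1e-3):
    """Map probabilities to continuous activations f with sig(f) = p.
    eps is a hypothetical clipping constant that keeps f finite at p = 0 or 1."""
    p = np.clip(p, eps, 1.0 - eps)
    return np.log(p / (1.0 - p))

# Hypothetical combined IE/DR probabilities.
P = np.array([[1.0, 0.5],
              [0.0, 0.9]])
F = inverse_sig(P)  # matrix of f_{i,k}, one row f_i per subject
```

The rows of `F` are the vectors f_i that the latent factor model then smooths.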
Proposed Framework for Combining IE, DR and ML
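The maximum likelihood solution can be written as
$$\hat{\alpha}_i = U_d \, \mathrm{diag}_d\!\left(\frac{\lambda_j - \hat{\sigma}^2}{\lambda_j}\right) U_d^T f_i$$
where the columns of $U_d$ are the $d$ principal eigenvectors of the covariance matrix $C = F^T F$ with eigenvalues $\lambda_1, \ldots, \lambda_d$
Then $P(X_{i,k} = 1 \mid \mathrm{IE}, \mathrm{DR}, \mathrm{ML}) = \mathrm{sig}(\hat{\alpha}_{i,k})$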
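The closed-form solution can be sketched with an eigendecomposition in NumPy. The slide does not say how the noise variance is estimated; the sketch assumes, in the style of probabilistic PCA, that it is the mean of the discarded eigenvalues. The function name `ml_step` and the random input are illustrative, and d is assumed not to exceed the rank of F so that the retained eigenvalues are positive.

```python
import numpy as np

def ml_step(F, d):
    """Return sig(alpha_hat), where for each row f_i of F
    alpha_hat_i = U_d diag((lambda_j - sigma2) / lambda_j) U_d^T f_i."""
    C = F.T @ F                                # C = F^T F as on the slide
    eigvals, eigvecs = np.linalg.eigh(C)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    U_d, lam = eigvecs[:, :d], eigvals[:d]
    # Assumption: noise variance = mean of the discarded eigenvalues.
    sigma2 = eigvals[d:].mean() if d < len(eigvals) else 0.0
    shrink = (lam - sigma2) / lam
    alpha_hat = F @ U_d @ np.diag(shrink) @ U_d.T  # row i is alpha_hat_i
    return 1.0 / (1.0 + np.exp(-alpha_hat))

rng = np.random.default_rng(0)
F = rng.normal(size=(6, 4))    # hypothetical activations: 6 subjects, 4 columns
P_ml = ml_step(F, d=2)         # smoothed probabilities from a rank-2 model
P_full = ml_step(F, d=4)       # d = number of columns: no shrinkage is applied
```

Choosing d smaller than the number of columns is what lets the model share statistical strength across relations; with d equal to the number of columns the input probabilities are returned unchanged.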
Experiments
Predicting gene-disease relationships using LOD's Linked Life Data and BIO2RDF (2462 genes, 331 diseases)
Target: for a given gene, predict likely diseases
IE: text fields from literals
Experiments
YAGO2 experiment: prediction of writers' nationalities
ML: 354 writers, 4 countries, city of birth
ML + AGG: include as columns the country of birth, derived from the city of birth using geo reasoning (DR)
IE: unstructured data from the Wikipedia pages of the writers
Conclusion
IE: Exploit unstructured information
DR: Exploit axiomatic knowledge
ML: Exploit statistical patterns
We proposed an efficient way to combine ML, IE and DR in a probabilistic model
Thanks!