1
Markov Logic in Natural Language Processing
Hoifung Poon
Dept. of Computer Science & Eng.
University of Washington
2
Overview
Motivation
Foundational areas
Markov logic
NLP applications: Basics, Supervised learning, Unsupervised learning
3
Languages Are Structural
governments
lm$pxtm (according to their families)
4
Languages Are Structural
govern-ment-s
l-m$px-t-m (according to their families)

[Figure: syntactic parse tree for "IL-4 induces CD11B" (S → NP VP, VP → V NP), and the nested event structure for "Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 …": involvement (Theme: up-regulation, Cause: activation); up-regulation (Theme: IL-10, Site: human monocytes, Cause: gp41); activation (Theme: p70(S6)-kinase)]
George Walker Bush was the 43rd President of the United States. …… Bush was the eldest son of President G. H. W. Bush and Barbara Bush. ……. In November 1977, he met Laura Welch at a barbecue.
6
Languages Are Structural
Objects are not just feature vectors
  They have parts and subparts
  Which have relations with each other
  They can be trees, graphs, etc.
Objects are seldom i.i.d. (independent and identically distributed)
  They exhibit local and global dependencies
  They form class hierarchies (with multiple inheritance)
  Objects' properties depend on those of related objects
Deeply interwoven with knowledge
7
First-Order Logic
Main theoretical foundation of computer science
General language for describing complex structures and knowledge
Trees, graphs, dependencies, hierarchies, etc. easily expressed
Inference algorithms (satisfiability testing, theorem proving, etc.)
8
Languages Are Statistical

G. W. Bush ………… Laura Bush …… Mrs. Bush ……
I saw the man with the telescope

[Figure: two parse trees for the sentence, with the PP "with the telescope" attached either to "the man" or to "saw"]
Here in London, Frances Deek is a retired teacher …
In the Israeli town …, Karen London says …
Now London says …
London PERSON or LOCATION?
Microsoft buys Powerset
Microsoft acquires Powerset
Powerset is acquired by Microsoft Corporation
The Redmond software giant buys Powerset
Microsoft’s purchase of Powerset, …
……
Which one?
9
Languages Are Statistical
Languages are ambiguous
Our information is always incomplete
We need to model correlations
Our predictions are uncertain
Statistics provides the tools to handle this
10
Probabilistic Graphical Models
Mixture models
Hidden Markov models
Bayesian networks
Markov random fields
Maximum entropy models
Conditional random fields
Etc.
11
The Problem
Logic is deterministic, requires manual coding
Statistical models assume i.i.d. data, objects = feature vectors
Historically, statistical and logical NLP have been pursued separately
We need to unify the two!
Burgeoning field in machine learning: statistical relational learning
12
Costs and Benefits of Statistical Relational Learning
Benefits:
  Better predictive accuracy
  Better understanding of domains
  Enable learning with less or no labeled data
Costs:
  Learning is much harder
  Inference becomes a crucial issue
  Greater complexity for user
13
Progress to Date
Probabilistic logic [Nilsson, 1986]
Statistics and beliefs [Halpern, 1990]
Knowledge-based model construction [Wellman et al., 1992]
Stochastic logic programs [Muggleton, 1996]
Probabilistic relational models [Friedman et al., 1999]
Relational Markov networks [Taskar et al., 2002]
Etc.
This talk: Markov logic [Domingos & Lowd, 2009]
14
Markov Logic: A Unifying Framework
Probabilistic graphical models and first-order logic are special cases
Unified inference and learning algorithms
Easy-to-use software: Alchemy
Broad applicability
Goal of this tutorial: Quickly learn how to use Markov logic and Alchemy for a broad spectrum of NLP applications
15
Overview

Motivation
Foundational areas: Probabilistic inference, Statistical learning, Logical inference, Inductive logic programming
Markov logic
NLP applications: Basics, Supervised learning, Unsupervised learning
16
Markov Networks

Undirected graphical models

[Figure: network over Smoking, Cancer, Asthma, Cough]
Potential functions defined over cliques
Smoking Cancer Ф(S,C)
False False 4.5
False True 4.5
True False 2.7
True True 4.5
P(x) = (1/Z) ∏_c Φ_c(x_c)

Z = Σ_x ∏_c Φ_c(x_c)
17
Markov Networks

Undirected graphical models. Log-linear model:

P(x) = (1/Z) exp( Σ_i w_i f_i(x) )

w_i: weight of feature i; f_i: feature i
E.g.: f_1(Smoking, Cancer) = 1 if Smoking ⇒ Cancer, 0 otherwise; w_1 = 1.5

[Figure: the same Smoking / Cancer / Asthma / Cough network]
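To make the definition concrete, here is a tiny Python check that enumerates all four worlds of this model and prints their probabilities; a sketch under the slide's single feature and weight, with variable names of my own choosing:

import itertools, math

w1 = 1.5
def f1(smoking, cancer):
    # Feature: Smoking => Cancer (true unless Smoking and not Cancer)
    return 1.0 if (not smoking) or cancer else 0.0

# Enumerate all four worlds to compute the partition function Z.
worlds = list(itertools.product([False, True], repeat=2))
Z = sum(math.exp(w1 * f1(s, c)) for s, c in worlds)
for s, c in worlds:
    print(s, c, math.exp(w1 * f1(s, c)) / Z)  # P(Smoking=s, Cancer=c)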
18
Markov Nets vs. Bayes Nets
Property Markov Nets Bayes Nets
Form Prod. potentials Prod. potentials
Potentials Arbitrary Cond. probabilities
Cycles Allowed Forbidden
Partition func. Z = ? Z = 1
Indep. check Graph separation D-separation
Indep. props. Some Some
Inference MCMC, BP, etc. Convert to Markov
19
Inference in Markov Networks

Goal: compute marginals & conditionals of

P(X) = (1/Z) exp( Σ_i w_i f_i(X) ),   Z = Σ_X exp( Σ_i w_i f_i(X) )

Exact inference is #P-complete
Conditioning on the Markov blanket is easy:

P(x_i | MB(x_i)) = exp( Σ_j w_j f_j(x) ) / ( exp( Σ_j w_j f_j(x[x_i=0]) ) + exp( Σ_j w_j f_j(x[x_i=1]) ) )

Gibbs sampling exploits this
20
MCMC: Gibbs Sampling
state ← random truth assignment
for i ← 1 to num-samples do
    for each variable x
        sample x according to P(x | neighbors(x))
        state ← state with new value of x
P(F) ← fraction of states in which F is true
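As a runnable miniature of this loop, a Python sketch for the one-feature Smoking/Cancer model from the log-linear slide; burn-in is omitted and the printed marginal P(Cancer) is just for illustration:

import random, math

w1 = 1.5
def f1(smoking, cancer):
    return 1.0 if (not smoking) or cancer else 0.0

def p_true(var, state):
    # P(var = True | Markov blanket): normalize over the two values of var.
    s0 = dict(state); s0[var] = False
    s1 = dict(state); s1[var] = True
    e0 = math.exp(w1 * f1(s0["Smoking"], s0["Cancer"]))
    e1 = math.exp(w1 * f1(s1["Smoking"], s1["Cancer"]))
    return e1 / (e0 + e1)

random.seed(0)
state = {"Smoking": True, "Cancer": True}
num_samples, count = 100000, 0
for _ in range(num_samples):
    for var in ("Smoking", "Cancer"):
        state[var] = random.random() < p_true(var, state)
    count += state["Cancer"]
print("P(Cancer) ~", count / num_samples)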
21
Other Inference Methods
Belief propagation (sum-product)
Mean field / variational approximations
22
MAP/MPE Inference
Goal: Find most likely state of world given evidence

arg max_y P(y | x)    (y: query, x: evidence)
23
MAP Inference Algorithms
Iterated conditional modes
Simulated annealing
Graph cuts
Belief propagation (max-product)
LP relaxation
24
Overview

Motivation
Foundational areas: Probabilistic inference, Statistical learning, Logical inference, Inductive logic programming
Markov logic
NLP applications: Basics, Supervised learning, Unsupervised learning
25
Generative Weight Learning
Maximize likelihood
Use gradient ascent or L-BFGS
No local maxima

∂/∂w_i log P_w(x) = n_i(x) − E_w[ n_i(x) ]

n_i(x): no. of times feature i is true in data
E_w[n_i(x)]: expected no. of times feature i is true according to model

Requires inference at each step (slow!)
26
Pseudo-Likelihood
Likelihood of each variable given its neighbors in the data:

PL(x) = ∏_i P( x_i | neighbors(x_i) )

Does not require inference at each step
Widely used in vision, spatial statistics, etc.
But PL parameters may not work well for long inference chains
27
Discriminative Weight Learning
Maximize conditional likelihood of query (y) given evidence (x):

∂/∂w_i log P_w(y | x) = n_i(x, y) − E_w[ n_i(x, y) ]

n_i(x, y): no. of true groundings of clause i in data
E_w[n_i(x, y)]: expected no. of true groundings according to model

Approximate expected counts by counts in MAP state of y given x
28
Voted Perceptron

Originally proposed for training HMMs discriminatively
Assumes network is linear chain; can be generalized to arbitrary networks

w_i ← 0
for t ← 1 to T do
    y_MAP ← Viterbi(x)
    w_i ← w_i + η [ count_i(y_Data) − count_i(y_MAP) ]
return w_i / T
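The update loop is easy to state generically. Below is a minimal Python sketch that takes the MAP inference routine as a parameter and averages per epoch rather than per update; the one-feature toy model at the end, and all names, are illustrative assumptions:

def voted_perceptron(examples, count_features, map_infer, T=10, eta=0.1):
    # examples: list of (x, y_data); count_features(x, y) -> {feature: count}
    w, w_sum = {}, {}
    for _ in range(T):
        for x, y_data in examples:
            y_map = map_infer(x, w)
            data_counts = count_features(x, y_data)
            map_counts = count_features(x, y_map)
            for f in set(data_counts) | set(map_counts):
                w[f] = w.get(f, 0.0) + eta * (data_counts.get(f, 0) - map_counts.get(f, 0))
        for f, v in w.items():
            w_sum[f] = w_sum.get(f, 0.0) + v
    return {f: v / T for f, v in w_sum.items()}  # per-epoch averaged weights

# Toy: label y in {0, 1}, one indicator feature per (x, y) pair;
# "MAP inference" is an exhaustive argmax over the two labels.
count = lambda x, y: {(x, y): 1}
map_infer = lambda x, w: max((0, 1), key=lambda y: w.get((x, y), 0.0))
print(voted_perceptron([("a", 1), ("b", 0)], count, map_infer))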
29
Overview

Motivation
Foundational areas: Probabilistic inference, Statistical learning, Logical inference, Inductive logic programming
Markov logic
NLP applications: Basics, Supervised learning, Unsupervised learning
30
First-Order Logic
Constants, variables, functions, predicates
  E.g.: Anna, x, MotherOf(x), Friends(x, y)
Literal: Predicate or its negation
Clause: Disjunction of literals
Grounding: Replace all variables by constants
  E.g.: Friends(Anna, Bob)
World (model, interpretation): Assignment of truth values to all ground predicates
31
Inference in First-Order Logic

Traditionally done by theorem proving (e.g., Prolog)
Propositionalization followed by model checking turns out to be faster (often by a lot)
Propositionalization: Create all ground atoms and clauses
Model checking: Satisfiability testing
Two main approaches:
  Backtracking (e.g., DPLL)
  Stochastic local search (e.g., WalkSAT)
32
Satisfiability
Input: Set of clauses (convert KB to conjunctive normal form (CNF))
Output: Truth assignment that satisfies all clauses, or failure
The paradigmatic NP-complete problem
Solution: Search
Key point: Most SAT problems are actually easy
  Hard region: Narrow range of #Clauses / #Variables
33
Stochastic Local Search
Uses complete assignments instead of partial
Start with random state
Flip variables in unsatisfied clauses
Hill-climbing: Minimize # unsatisfied clauses
Avoid local minima: Random flips
Multiple restarts
34
The WalkSAT Algorithm
for i ← 1 to max-tries do
    solution = random truth assignment
    for j ← 1 to max-flips do
        if all clauses satisfied then return solution
        c ← random unsatisfied clause
        with probability p
            flip a random variable in c
        else
            flip variable in c that maximizes # satisfied clauses
return failure
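For concreteness, a small Python rendering of this loop, assuming clauses are lists of signed literals; the three-clause formula at the end is an illustrative assumption:

import random

def satisfied(clause, assign):
    # (var, True) requires var to be true; (var, False) is the negated literal.
    return any(assign[v] == sign for v, sign in clause)

def walksat(clauses, variables, max_tries=10, max_flips=1000, p=0.5):
    for _ in range(max_tries):
        assign = {v: random.random() < 0.5 for v in variables}
        for _ in range(max_flips):
            unsat = [c for c in clauses if not satisfied(c, assign)]
            if not unsat:
                return assign
            c = random.choice(unsat)
            if random.random() < p:
                v, _ = random.choice(c)           # random walk step
            else:
                def score(var):                    # greedy step
                    assign[var] = not assign[var]
                    s = sum(satisfied(cl, assign) for cl in clauses)
                    assign[var] = not assign[var]
                    return s
                v = max((var for var, _ in c), key=score)
            assign[v] = not assign[v]
    return None  # failure

# (A v B) ^ (!A v B) ^ (A v !B)  has the unique model A = B = True
print(walksat([[("A", True), ("B", True)],
               [("A", False), ("B", True)],
               [("A", True), ("B", False)]], ["A", "B"]))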
35
Overview

Motivation
Foundational areas: Probabilistic inference, Statistical learning, Logical inference, Inductive logic programming
Markov logic
NLP applications: Basics, Supervised learning, Unsupervised learning
36
Rule Induction
Given: Set of positive and negative examples of some concept
  Example: (x1, x2, …, xn, y)
  y: concept (Boolean); x1, x2, …, xn: attributes (assume Boolean)
Goal: Induce a set of rules that cover all positive examples and no negative ones
  Rule: xa ^ xb ^ … ⇒ y (xa: literal, i.e., xi or its negation)
  Same as Horn clause: Body ⇒ Head
  Rule r covers example x iff x satisfies body of r
Eval(r): Accuracy, info gain, coverage, support, etc.
37
Learning a Single Rule
head ← y
body ← Ø
repeat
    for each literal x
        r_x ← r with x added to body
        Eval(r_x)
    body ← body ^ best x
until no x improves Eval(r)
return r
38
Learning a Set of Rules
R ← Ø
S ← examples
repeat
    learn a single rule r
    R ← R U { r }
    S ← S − positive examples covered by r
until S = Ø
return R
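A compact Python sketch of the two loops above, assuming Boolean attribute dicts, accuracy on covered examples as Eval(r), and a noise-free toy concept; a real learner would need tie-breaking and stopping safeguards:

def covers(body, x):
    return all(x[attr] == val for attr, val in body.items())

def eval_rule(body, examples):
    covered = [(x, y) for x, y in examples if covers(body, x)]
    if not covered:
        return 0.0
    return sum(y for _, y in covered) / len(covered)  # accuracy

def learn_one_rule(examples, attrs):
    body = {}
    while True:
        candidates = [dict(body, **{a: v}) for a in attrs if a not in body
                      for v in (True, False)]
        best = max(candidates, key=lambda b: eval_rule(b, examples), default=None)
        if best is None or eval_rule(best, examples) <= eval_rule(body, examples):
            return body
        body = best

def learn_rules(examples, attrs):
    rules, remaining = [], list(examples)
    while any(y for _, y in remaining):
        body = learn_one_rule(remaining, attrs)
        rules.append(body)
        # Remove the positive examples this rule covers.
        remaining = [(x, y) for x, y in remaining if not (covers(body, x) and y)]
    return rules

# Toy concept: y = x1 AND NOT x2
data = [({"x1": a, "x2": b}, a and not b)
        for a in (True, False) for b in (True, False)]
print(learn_rules(data, ["x1", "x2"]))  # -> [{'x1': True, 'x2': False}]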
39
First-Order Rule Induction
y and the x_i are now predicates with arguments
  E.g.: y is Ancestor(x,y), x_i is Parent(x,y)
Literals to add are predicates or their negations
Literal to add must include at least one variable already appearing in rule
Adding a literal changes # groundings of rule
  E.g.: Ancestor(x,z) ^ Parent(z,y) ⇒ Ancestor(x,y)
Eval(r) must take this into account
  E.g.: Multiply by # positive groundings of rule still covered after adding literal
40
Overview
Motivation
Foundational areas
Markov logic
NLP applications: Basics, Supervised learning, Unsupervised learning
41
Markov Logic
Syntax: Weighted first-order formulas
Semantics: Feature templates for Markov networks
Intuition: Soften logical constraints; give each formula a weight (higher weight ⇒ stronger constraint)

P(world) ∝ exp( Σ weights of formulas it satisfies )
42
Example: Coreference Resolution
Mentions of Obama are often headed by "Obama"
Mentions of Obama are often headed by "President"
Appositions usually refer to the same entity
Barack Obama, the 44th President of the United States, is the first African American to hold the office. ……
43
Example: Coreference Resolution

∀x MentionOf(x, Obama) ⇒ Head(x, "Obama")
∀x MentionOf(x, Obama) ⇒ Head(x, "President")
∀x,y,c Apposition(x, y) ⇒ ( MentionOf(x, c) ⇔ MentionOf(y, c) )
44
Example: Coreference Resolution
1.5   ∀x MentionOf(x, Obama) ⇒ Head(x, "Obama")
0.8   ∀x MentionOf(x, Obama) ⇒ Head(x, "President")
100   ∀x,y,c Apposition(x, y) ⇒ ( MentionOf(x, c) ⇔ MentionOf(y, c) )
45
Example: Coreference Resolution

1.5   ∀x MentionOf(x, Obama) ⇒ Head(x, "Obama")
0.8   ∀x MentionOf(x, Obama) ⇒ Head(x, "President")
100   ∀x,y,c Apposition(x, y) ⇒ ( MentionOf(x, c) ⇔ MentionOf(y, c) )

Two mention constants: A and B

[Figure: ground Markov network connecting Head(A,"Obama"), Head(A,"President"), MentionOf(A,Obama), Apposition(A,B), Apposition(B,A), MentionOf(B,Obama), Head(B,"Obama"), Head(B,"President")]
46
Markov Logic Networks

An MLN is a template for ground Markov nets
Probability of a world x:

P(x) = (1/Z) exp( Σ_i w_i n_i(x) )

w_i: weight of formula i; n_i(x): no. of true groundings of formula i in x

Typed variables and constants greatly reduce size of ground Markov net
Functions, existential quantifiers, etc.
Can handle infinite domains [Singla & Domingos, 2007] and continuous domains [Wang & Domingos, 2008]
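The same brute-force check as before works for a ground MLN. The sketch below uses a standard toy (one formula Smokes(x) ⇒ Cancer(x) with weight 1.5 over constants {A, B}) rather than the coreference example, purely for brevity; all names are illustrative:

import itertools, math

w = 1.5
people = ["A", "B"]
atoms = [(pred, p) for pred in ("Smokes", "Cancer") for p in people]

def n_true(world):
    # Number of true groundings of Smokes(x) => Cancer(x).
    return sum((not world[("Smokes", p)]) or world[("Cancer", p)] for p in people)

worlds = [dict(zip(atoms, vals))
          for vals in itertools.product([False, True], repeat=len(atoms))]
Z = sum(math.exp(w * n_true(x)) for x in worlds)

x = {("Smokes", "A"): True, ("Cancer", "A"): True,
     ("Smokes", "B"): True, ("Cancer", "B"): False}
print(math.exp(w * n_true(x)) / Z)  # P(x) = (1/Z) exp(w n(x))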
47
Relation to Statistical Models
Special cases:
  Markov networks, Markov random fields
  Bayesian networks
  Log-linear models, exponential models, max. entropy models
  Gibbs distributions, Boltzmann machines
  Logistic regression
  Hidden Markov models, conditional random fields
Obtained by making all predicates zero-arity
Markov logic allows objects to be interdependent (non-i.i.d.)
48
Relation to First-Order Logic
Infinite weights ⇒ first-order logic
Satisfiable KB, positive weights ⇒ satisfying assignments = modes of distribution
Markov logic allows contradictions between formulas
49
MLN Algorithms: The First Three Generations

Problem              First generation          Second generation   Third generation
MAP inference        Weighted satisfiability   Lazy inference      Cutting planes
Marginal inference   Gibbs sampling            MC-SAT              Lifted inference
Weight learning      Pseudo-likelihood         Voted perceptron    Scaled conj. gradient
Structure learning   Inductive logic progr.    ILP + PL (etc.)     Clustering + pathfinding
50
MAP/MPE Inference

Problem: Find most likely state of world given evidence

arg max_y P(y | x)    (y: query, x: evidence)
= arg max_y (1/Z_x) exp( Σ_i w_i n_i(x, y) )
= arg max_y Σ_i w_i n_i(x, y)

This is just the weighted MaxSAT problem
Use weighted SAT solver (e.g., MaxWalkSAT [Kautz et al., 1997])
54
The MaxWalkSAT Algorithm
for i ← 1 to max-tries do
    solution = random truth assignment
    for j ← 1 to max-flips do
        if weights(sat. clauses) > threshold then return solution
        c ← random unsatisfied clause
        with probability p
            flip a random variable in c
        else
            flip variable in c that maximizes weights(sat. clauses)
return failure, best solution found
55
Computing Probabilities
P(Formula | MLN, C) = ?
  MCMC: Sample worlds, check formula holds
P(Formula1 | Formula2, MLN, C) = ?
  If Formula2 = conjunction of ground atoms:
    First construct min subset of network necessary to answer query (generalization of KBMC)
    Then apply MCMC
56
But … Insufficient for Logic
Problem:
  Deterministic dependencies break MCMC
  Near-deterministic ones make it very slow
Solution:
  Combine MCMC and WalkSAT → MC-SAT algorithm [Poon & Domingos, 2006]
57
Auxiliary-Variable Methods
Main ideas:
  Use auxiliary variables to capture dependencies
  Turn difficult sampling into uniform sampling

Given distribution P(x), define

f(x, u) = 1 if 0 ≤ u ≤ P(x), 0 otherwise

Then ∫ f(x, u) du = P(x): sample (x, u) from f, then discard u
58
Slice Sampling [Damien et al., 1999]

[Figure: alternately sample u(k) uniformly from [0, P(x(k))], then x(k+1) uniformly from the slice {x : P(x) ≥ u(k)}]
59
Slice Sampling
Identifying the slice may be difficult

P(x) = (1/Z) ∏_i Φ_i(x)

Introduce an auxiliary variable u_i for each Φ_i:

f(x, u_1, …, u_n) = 1 if 0 ≤ u_i ≤ Φ_i(x) for all i, 0 otherwise
60
The MC-SAT Algorithm

Select random subset M of satisfied clauses: include clause C_i with probability 1 − exp(−w_i)
  Larger w_i ⇒ C_i more likely to be selected
  Hard clause (w_i = ∞): always selected
Slice = states that satisfy all clauses in M
Uses SAT solver to sample x | u
Orders of magnitude faster than Gibbs sampling, etc.
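The select-then-sample loop is compact enough to sketch. Here is a toy Python version in which the "SAT solver" step is brute-force enumeration over three atoms so everything stays self-contained; the clauses, weights, and printed marginal are illustrative assumptions (real implementations use SampleSAT):

import itertools, math, random

random.seed(0)
atoms = ["P", "Q", "R"]
clauses = [(1.5, [("P", True), ("Q", True)]),    # weight 1.5:  P v Q
           (0.8, [("Q", False), ("R", True)])]   # weight 0.8: !Q v R

def satisfied(clause, state):
    return any(state[v] == sign for v, sign in clause)

def mc_sat(num_samples):
    state = {a: True for a in atoms}  # initial state satisfies all clauses
    samples = []
    for _ in range(num_samples):
        # Auxiliary-variable step: keep each satisfied clause w.p. 1 - e^{-w}.
        M = [c for w, c in clauses
             if satisfied(c, state) and random.random() < 1 - math.exp(-w)]
        # Sample uniformly from the slice: states satisfying every clause in M.
        candidates = [dict(zip(atoms, vals))
                      for vals in itertools.product([False, True], repeat=3)
                      if all(satisfied(c, dict(zip(atoms, vals))) for c in M)]
        state = random.choice(candidates)
        samples.append(state)
    return samples

samples = mc_sat(20000)
print(sum(s["Q"] for s in samples) / len(samples))  # estimate of P(Q)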
61
But … It Is Not Scalable
1000 researchers
Coauthor(x,y): 1 million ground atoms
Coauthor(x,y) ^ Coauthor(y,z) ⇒ Coauthor(x,z): 1 billion ground clauses
Exponential in arity
62
Sparsity to the Rescue

1000 researchers
Coauthor(x,y): 1 million ground atoms
  But … most atoms are false
Coauthor(x,y) ^ Coauthor(y,z) ⇒ Coauthor(x,z): 1 billion ground clauses
  Most trivially satisfied if most atoms are false
No need to explicitly compute most of them
63
Lazy Inference
LazySAT [Singla & Domingos, 2006a]: lazy version of WalkSAT [Selman et al., 1996]
  Grounds atoms/clauses as needed
  Greatly reduces memory usage
The idea is much more general [Poon & Domingos, 2008a]
64
General Method for Lazy Inference
If most variables assume the default value, it is wasteful to instantiate all variables/functions
Main idea:
  Allocate memory for a small subset of "active" variables/functions
  Activate more if necessary as inference proceeds
Applicable to a diverse set of algorithms: satisfiability solvers (systematic, local-search), Markov chain Monte Carlo, MPE/MAP algorithms, maximum expected utility algorithms, belief propagation, MC-SAT, etc.
Reduces memory and time by orders of magnitude
65
Lifted Inference
Consider belief propagation (BP)
Often in large problems, many nodes are interchangeable: they send and receive the same messages throughout BP
Basic idea: Group them into supernodes, forming a lifted network
Smaller network → faster inference
Akin to resolution in first-order logic
66
Belief Propagation

μ_{x→f}(x) = ∏_{h ∈ nb(x) \ {f}} μ_{h→x}(x)

μ_{f→x}(x) = Σ_{~{x}} ( e^{w f(x)} ∏_{y ∈ nb(f) \ {x}} μ_{y→f}(y) )

[Figure: factor graph with variable nodes (x) and feature nodes (f)]
67
Lifted Belief Propagation

Same message equations, now passed between supernodes and superfeatures.

[Figure: the factor graph from the previous slide with interchangeable nodes and features grouped]
68
Lifted Belief Propagation

Same message equations, with exponents α, β on the messages: functions of the edge counts between supernodes and superfeatures.

[Figure: lifted factor graph with supernodes (x) and superfeatures (f)]
69
Learning
Data is a relational database
Closed world assumption (if not: EM)
Learning parameters (weights)
Learning structure (formulas)
70
Parameter Learning

∂/∂w_i log P(x) = n_i(x) − E_x[ n_i(x) ]

n_i(x): no. of times clause i is true in data
E_x[n_i(x)]: expected no. of times clause i is true according to MLN

Parameter tying: groundings of same clause
Generative learning: pseudo-likelihood
Discriminative learning: conditional likelihood; use MC-SAT or MaxWalkSAT for inference
71
Parameter Learning
Pseudo-likelihood + L-BFGS is fast and robust but can give poor inference results
Voted perceptron: gradient descent + MAP inference
Scaled conjugate gradient
72
Voted Perceptron for MLNs

HMMs are a special case of MLNs
Replace Viterbi by MaxWalkSAT
Network can now be arbitrary graph

w_i ← 0
for t ← 1 to T do
    y_MAP ← MaxWalkSAT(x)
    w_i ← w_i + η [ count_i(y_Data) − count_i(y_MAP) ]
return w_i / T
73
Problem: Multiple Modes
Not alleviated by contrastive divergence
Alleviated by MC-SAT
Warm start: Start each MC-SAT run at previous end state
74
Problem: Extreme Ill-Conditioning

Solvable by quasi-Newton, conjugate gradient, etc., but line searches require exact inference
Solution: Scaled conjugate gradient [Lowd & Domingos, 2008]
  Use Hessian to choose step size
  Compute quadratic form inside MC-SAT
  Use inverse diagonal Hessian as preconditioner
75
Structure Learning
Standard inductive logic programming optimizes the wrong thing
  But can be used to overgenerate for L1 pruning
Our approach: ILP + pseudo-likelihood + structure priors
For each candidate structure change:
  Start from current weights & relax convergence
  Use subsampling to compute sufficient statistics
76
Structure Learning
Initial state: Unit clauses or prototype KB
Operators: Add/remove literal, flip sign
Evaluation function: Pseudo-likelihood + structure prior
Search: Beam search, shortest-first search
77
Alchemy
Open-source software including:
  Full first-order logic syntax
  Generative & discriminative weight learning
  Structure learning
  Weighted satisfiability, MCMC, lifted BP
  Programming language features
alchemy.cs.washington.edu
78
                 Alchemy                          Prolog            BUGS
Representation   F.O. logic + Markov nets         Horn clauses      Bayes nets
Inference        Model checking, MCMC, lifted BP  Theorem proving   MCMC
Learning         Parameters & structure           No                Params.
Uncertainty      Yes                              No                Yes
Relational       Yes                              Yes               No
79
Constrained Conditional Model
Representation: Integer linear programs
  Local classifiers + global constraints
Inference: LP solver
Parameter learning: None for constraints
  Weights of soft constraints set heuristically
  Local weights typically learned independently
Structure learning: None to date
  But see latest development in NAACL-10
80
Running Alchemy
Programs: infer, learnwts, learnstruct
Options
MLN file: types (optional), predicates, formulas
Database files
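A typical session might look like the following, as I recall it from the Alchemy tutorial; the file names are illustrative assumptions, and the flags should be checked against your version's documentation:

learnwts -d -i smoking.mln -o smoking-out.mln -t smoking-train.db -ne Cancer
infer -ms -i smoking-out.mln -e smoking-test.db -r smoking.results -q Cancer

Here -d selects discriminative weight learning, -ne names the non-evidence (query) predicates, and -ms runs MC-SAT.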
81
Overview
Motivation
Foundational areas
Markov logic
NLP applications: Basics, Supervised learning, Unsupervised learning
82
Uniform Distribn.: Empty MLN

Example: Unbiased coin flips
Type: flip = { 1, …, 20 }
Predicate: Heads(flip)

P(Heads(f)) = e^0 / (e^0 + e^0) = 1/2
83
Binomial Distribn.: Unit Clause

Example: Biased coin flips
Type: flip = { 1, …, 20 }
Predicate: Heads(flip)
Formula: Heads(f)
Weight: log odds of heads, w = log( p / (1 − p) )

P(Heads(f)) = e^w / (1 + e^w) = p

By default, MLN includes unit clauses for all predicates (captures marginal distributions, etc.)
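A two-line Python check of this weight/odds correspondence (p = 0.7 is an arbitrary illustrative value):

import math

p = 0.7
w = math.log(p / (1 - p))             # unit-clause weight = log odds
print(math.exp(w) / (1 + math.exp(w)))  # recovers 0.7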
84
Multinomial Distribution

Example: Throwing die
Types: throw = { 1, …, 20 }; face = { 1, …, 6 }
Predicate: Outcome(throw,face)
Formulas:
  Outcome(t,f) ^ f != f’ => !Outcome(t,f’).
  Exist f Outcome(t,f).
Too cumbersome!
85
Multinomial Distrib.: ! Notation
Example: Throwing die
Types: throw = { 1, … , 20 }
face = { 1, … , 6 }
Predicate: Outcome(throw,face!)
Formulas:
Semantics: Arguments without “!” determine arguments with “!”.
Also makes inference more efficient (triggers blocking).
86
Multinomial Distrib.: + Notation
Example: Throwing biased die
Types: throw = { 1, … , 20 }
face = { 1, … , 6 }
Predicate: Outcome(throw,face!)
Formulas: Outcome(t,+f)
Semantics: Learn weight for each grounding of args with “+”.
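One way to see what those learned weights mean (not spelled out on the slide, so take this as a derivation sketch): with the ! constraint forcing exactly one face per throw, the distribution reduces to a softmax over the per-face weights,

P(Outcome(t, f)) = e^{w_f} / Σ_{f'} e^{w_{f'}}

so the learned w_f equals log P(face = f) up to an additive constant.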
87
Logistic Regression (MaxEnt)

Logistic regression: log [ P(C=1 | F=f) / P(C=0 | F=f) ] = a + Σ_i b_i f_i

Type: obj = { 1, ..., n }
Query predicate: C(obj)
Evidence predicates: F_i(obj)
Formulas:
  a    C(x)
  b_i  F_i(x) ^ C(x)

Resulting distribution:

P(C = c, F = f) = (1/Z) exp( a c + Σ_i b_i f_i c )

Therefore:

log [ P(C=1 | F=f) / P(C=0 | F=f) ] = log [ exp( a + Σ_i b_i f_i ) / exp(0) ] = a + Σ_i b_i f_i

Alternative form: F_i(x) => C(x)
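A quick numeric confirmation of this equivalence in Python, with illustrative weights a = -1.0, b = (2.0, 0.5) and observed evidence f = (1, 0):

import math

a, b = -1.0, [2.0, 0.5]
f = (1, 0)

def score(c):
    # a*c + sum_i b_i f_i c, the exponent of the MLN distribution
    return a * c + sum(bi * fi * c for bi, fi in zip(b, f))

Z = math.exp(score(0)) + math.exp(score(1))
p1 = math.exp(score(1)) / Z
print(math.log(p1 / (1 - p1)))                   # log-odds from the MLN
print(a + sum(bi * fi for bi, fi in zip(b, f)))  # a + sum b_i f_i: identical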
88
Hidden Markov Models
obs = { Red, Green, Yellow }
state = { Stop, Drive, Slow }
time = { 0, ..., 100 }

State(state!,time)
Obs(obs!,time)

State(+s,0)
State(+s,t) ^ State(+s',t+1)
Obs(+o,t) ^ State(+s,t)
Sparse HMM: State(s,t) => State(s1,t+1) v State(s2, t+1) v ... .
89
Bayesian Networks
Use all binary predicates with same first argument (the object x)
One predicate for each variable A: A(x,v!)
One clause for each line in the CPT and value of the variable
Context-specific independence: one clause for each path in the decision tree
Logistic regression: as before
Noisy OR: deterministic OR + pairwise clauses
90
Relational Models
Knowledge-based model construction
  Allow only Horn clauses
  Same as Bayes nets, except arbitrary relations
  Combining function: logistic regression, noisy-OR, or external
Stochastic logic programs
  Allow only Horn clauses
  Weight of clause = log(p)
  Add formulas: Head holds ⇒ exactly one body holds
Probabilistic relational models
  Allow only binary relations
  Same as Bayes nets, except first argument can vary
91
Relational Models

Relational Markov networks
  SQL → Datalog → first-order logic
  One clause for each state of a clique
  + syntax in Alchemy facilitates this
Bayesian logic
  Object = cluster of similar/related observations
  Observation constants + object constants
  Predicate InstanceOf(Obs,Obj) and clauses using it
Unknown relations: second-order Markov logic
  S. Kok & P. Domingos, "Statistical Predicate Invention", in Proc. ICML-2007.
92
Overview
Motivation
Foundational areas
Markov logic
NLP applications: Basics, Supervised learning, Unsupervised learning
93
Text Classification
The 56th quadrennial United States presidential election was held on November 4, 2008. Outgoing Republican President George W. Bush's policies and actions and the American public's desire for change were key issues throughout the campaign. ……
The Chicago Bulls are an American professional basketball team based in Chicago, Illinois, playing in the Central Division of the Eastern Conference in the National Basketball Association (NBA). ……
……
Topic = politics
Topic = sports
94
Text Classification

page = {1, ..., max}
word = { ... }
topic = { ... }

Topic(page,topic)
HasWord(page,word)

Topic(p,t)
HasWord(p,+w) => Topic(p,+t)

If topics mutually exclusive: Topic(page,topic!)
95
Text Classification

page = {1, ..., max}
word = { ... }
topic = { ... }

Topic(page,topic)
HasWord(page,word)
Links(page,page)

Topic(p,t)
HasWord(p,+w) => Topic(p,+t)
Topic(p,t) ^ Links(p,p') => Topic(p',t)

Cf. S. Chakrabarti, B. Dom & P. Indyk, "Hypertext Classification Using Hyperlinks," in Proc. SIGMOD-1998.
96
Entity Resolution

AUTHOR: H. POON & P. DOMINGOS
TITLE: UNSUPERVISED SEMANTIC PARSING
VENUE: EMNLP-09

AUTHOR: Hoifung Poon and Pedro Domings
TITLE: Unsupervised semantic parsing
VENUE: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

AUTHOR: Poon, Hoifung and Domings, Pedro
TITLE: Unsupervised ontology induction from text
VENUE: Proceedings of the Forty-Eighth Annual Meeting of the Association for Computational Linguistics

AUTHOR: H. Poon, P. Domings
TITLE: Unsupervised ontology induction
VENUE: ACL-10
SAME?
SAME?
97
Entity Resolution

Problem: Given database, find duplicate records

HasToken(token,field,record)
SameField(field,record,record)
SameRecord(record,record)

HasToken(+t,+f,r) ^ HasToken(+t,+f,r’) => SameField(f,r,r’)
SameField(f,r,r’) => SameRecord(r,r’)
98
Entity Resolution

Problem: Given database, find duplicate records

HasToken(token,field,record)
SameField(field,record,record)
SameRecord(record,record)

HasToken(+t,+f,r) ^ HasToken(+t,+f,r’) => SameField(f,r,r’)
SameField(f,r,r’) => SameRecord(r,r’)
SameRecord(r,r’) ^ SameRecord(r’,r”) => SameRecord(r,r”)

Cf. A. McCallum & B. Wellner, “Conditional Models of Identity Uncertainty with Application to Noun Coreference,” in Adv. NIPS 17, 2005.
99
Entity Resolution

Can also resolve fields:

HasToken(token,field,record)
SameField(field,record,record)
SameRecord(record,record)

HasToken(+t,+f,r) ^ HasToken(+t,+f,r’) => SameField(f,r,r’)
SameField(f,r,r’) <=> SameRecord(r,r’)
SameRecord(r,r’) ^ SameRecord(r’,r”) => SameRecord(r,r”)
SameField(f,r,r’) ^ SameField(f,r’,r”) => SameField(f,r,r”)

More: P. Singla & P. Domingos, “Entity Resolution with Markov Logic”, in Proc. ICDM-2006.
100
Information Extraction

UNSUPERVISED SEMANTIC PARSING. H. POON & P. DOMINGOS. EMNLP-2009.

Unsupervised Semantic Parsing, Hoifung Poon and Pedro Domingos. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing. Singapore: ACL.
101
Information Extraction

UNSUPERVISED SEMANTIC PARSING. H. POON & P. DOMINGOS. EMNLP-2009.

Unsupervised Semantic Parsing, Hoifung Poon and Pedro Domingos. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing. Singapore: ACL.

[Figure: the two citations segmented into Author / Title / Venue fields, with SAME? links between corresponding fields]
102
Information Extraction
Problem: Extract database from text or semi-structured sources
Example: Extract database of publications from citation list(s) (the "CiteSeer problem")
Two steps:
  Segmentation: Use HMM to assign tokens to fields
  Entity resolution: Use logistic regression and transitivity
103
Information Extraction

Token(token, position, citation)
InField(position, field!, citation)
SameField(field, citation, citation)
SameCit(citation, citation)

Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) ^ InField(i+1,+f,c)

Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i’,c’) ^ InField(i’,+f,c’) => SameField(+f,c,c’)
SameField(+f,c,c’) <=> SameCit(c,c’)
SameField(f,c,c’) ^ SameField(f,c’,c”) => SameField(f,c,c”)
SameCit(c,c’) ^ SameCit(c’,c”) => SameCit(c,c”)
104
Information Extraction

Token(token, position, citation)
InField(position, field!, citation)
SameField(field, citation, citation)
SameCit(citation, citation)

Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) ^ !Token(“.”,i,c) ^ InField(i+1,+f,c)

Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i’,c’) ^ InField(i’,+f,c’) => SameField(+f,c,c’)
SameField(+f,c,c’) <=> SameCit(c,c’)
SameField(f,c,c’) ^ SameField(f,c’,c”) => SameField(f,c,c”)
SameCit(c,c’) ^ SameCit(c’,c”) => SameCit(c,c”)

More: H. Poon & P. Domingos, “Joint Inference in Information Extraction”, in Proc. AAAI-2007.
105
Biomedical Text Mining
Traditionally, named entity recognition or information extraction
  E.g., protein recognition, protein-protein interaction identification
BioNLP-09 shared task: nested bio-events
  Much harder than traditional IE
  Top F1 around 50%
  Naturally calls for joint inference
106
Bio-Event Extraction
Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1 ...
[Figure: nested event structure: involvement (Theme: up-regulation, Cause: activation); up-regulation (Theme: IL-10, Site: human monocytes, Cause: gp41); activation (Theme: p70(S6)-kinase)]
107
Bio-Event Extraction

Token(position, token)
DepEdge(position, position, dependency)
IsProtein(position)
EvtType(position, evtType)
InArgPath(position, position, argType!)

Token(i,+w) => EvtType(i,+t)
Token(j,w) ^ DepEdge(i,j,+d) => EvtType(i,+t)
DepEdge(i,j,+d) => InArgPath(i,j,+a)
Token(i,+w) ^ DepEdge(i,j,+d) => InArgPath(i,j,+a)
…

(Logistic regression)
108
Bio-Event Extraction

Token(position, token)
DepEdge(position, position, dependency)
IsProtein(position)
EvtType(position, evtType)
InArgPath(position, position, argType!)

Token(i,+w) => EvtType(i,+t)
Token(j,w) ^ DepEdge(i,j,+d) => EvtType(i,+t)
DepEdge(i,j,+d) => InArgPath(i,j,+a)
Token(i,+w) ^ DepEdge(i,j,+d) => InArgPath(i,j,+a)
…

InArgPath(i,j,Theme) => IsProtein(j) v (Exist k k!=i ^ InArgPath(j, k, Theme)).
…

Adding a few joint inference rules doubles the F1

More: H. Poon and L. Vanderwende, “Joint Inference for Knowledge Extraction from Biomedical Literature”, 10:40 am, June 4, Gold Room.
109
Temporal Information Extraction
Identify event times and temporal relations (BEFORE, AFTER, OVERLAP)
E.g., who is the President of the U.S.A.?
  Obama: 1/20/2009 – present
  G. W. Bush: 1/20/2001 – 1/19/2009
  Etc.
110
Temporal Information Extraction

DepEdge(position, position, dependency)
Event(position, event)
After(event, event)

DepEdge(i,j,+d) ^ Event(i,p) ^ Event(j,q) => After(p,q)
After(p,q) ^ After(q,r) => After(p,r)
111
Temporal Information Extraction

DepEdge(position, position, dependency)
Event(position, event)
After(event, event)
Role(position, position, role)

DepEdge(i,j,+d) ^ Event(i,p) ^ Event(j,q) => After(p,q)
Role(i,j,ROLE-AFTER) ^ Event(i,p) ^ Event(j,q) => After(p,q)
After(p,q) ^ After(q,r) => After(p,r)

More:
K. Yoshikawa, S. Riedel, M. Asahara and Y. Matsumoto, “Jointly Identifying Temporal Relations with Markov Logic”, in Proc. ACL-2009.
X. Ling & D. Weld, “Temporal Information Extraction”, in Proc. AAAI-2010.
112
Semantic Role Labeling
Problem: Identify arguments for a predicate
Two steps:
  Argument identification: Determine whether a phrase is an argument
  Role classification: Determine the type of an argument (agent, theme, temporal, adjunct, etc.)
113
Semantic Role Labeling

Token(position, token)
DepPath(position, position, path)
IsPredicate(position)
Role(position, position, role!)
HasRole(position, position)

Token(i,+t) => IsPredicate(i)
DepPath(i,j,+p) => Role(i,j,+r)

HasRole(i,j) => IsPredicate(i)
IsPredicate(i) => Exist j HasRole(i,j)
HasRole(i,j) => Exist r Role(i,j,r)
Role(i,j,r) => HasRole(i,j)

Cf. K. Toutanova, A. Haghighi, C. Manning, “A global joint model for semantic role labeling”, in Computational Linguistics 2008.
114
Joint Semantic Role Labeling and Word Sense Disambiguation

Token(position, token)
DepPath(position, position, path)
IsPredicate(position)
Role(position, position, role!)
HasRole(position, position)
Sense(position, sense!)

Token(i,+t) => IsPredicate(i)
DepPath(i,j,+p) => Role(i,j,+r)
Sense(i,s) => IsPredicate(i)

HasRole(i,j) => IsPredicate(i)
IsPredicate(i) => Exist j HasRole(i,j)
HasRole(i,j) => Exist r Role(i,j,r)
Role(i,j,r) => HasRole(i,j)
Token(i,+t) ^ Role(i,j,+r) => Sense(i,+s)

More: I. Meza-Ruiz & S. Riedel, “Jointly Identifying Predicates, Arguments and Senses using Markov Logic”, in Proc. NAACL-2009.
115
Practical Tips: Modeling
Add all unit clauses (the default)
How to handle uncertain data: R(x,y) ^ R’(x,y) (the “HMM trick”)
Implications vs. conjunctions: for soft correlations, conjunctions are often better
  Implication A => B is equivalent to !(A ^ !B), so it shares cases with other implications like A => C and makes learning unnecessarily harder
116
Practical Tips: Efficiency
Open/closed world assumptions
Low clause arities
Low numbers of constants
Short inference chains
117
Practical Tips: Development
Start with easy components
Gradually expand to full task
Use the simplest MLN that works
Cycle: Add/delete formulas, learn and test
118
Overview
Motivation
Foundational areas
Markov logic
NLP applications: Basics, Supervised learning, Unsupervised learning
119
Unsupervised Learning: Why?
Virtually unlimited supply of unlabeled text
Labeling is expensive (cf. Penn Treebank)
Often difficult to label with consistency and high quality (e.g., semantic parses)
Emerging field: machine reading
  Extract knowledge from unstructured text with high precision/recall and minimal human effort
Check out the LBR Workshop (WS9) on Sunday
120
Unsupervised Learning: How?
I.i.d. learning: Sophisticated model requires more labeled data
Statistical relational learning: Sophisticated model may require less labeled data
  Relational dependencies constrain problem space
  One formula is worth a thousand labels
Small amount of domain knowledge ⇒ large-scale joint inference
121
Unsupervised Learning: How?

Ambiguities vary among objects
Joint inference: Propagate information from unambiguous objects to ambiguous ones
E.g.:

G. W. Bush … He … … Mrs. Bush …

Are "He" and "Mrs. Bush" coreferent with "G. W. Bush"? "G. W. Bush" and "He" should be coreferent, so that cluster must be singular male; "Mrs. Bush" must be singular female; verdict: not coreferent.
126
Parameter Learning

Marginalize out hidden variables z:

∂/∂w_i log P(x) = E_{z|x}[ n_i(x, z) ] − E_{x,z}[ n_i(x, z) ]

The first expectation sums over z conditioned on the observed x; the second sums over both x and z.

Use MC-SAT to approximate both expectations
May also combine with contrastive estimation [Poon & Cherry & Toutanova, NAACL-2009]
127
Unsupervised Coreference Resolution
Head(mention, string)
Type(mention, type)
MentionOf(mention, entity)

Mixture model:
MentionOf(+m,+e)
Type(+m,+t)
Head(+m,+h) ^ MentionOf(+m,+e)

Joint inference formulas (enforce agreement):
MentionOf(a,e) ^ MentionOf(b,e) => (Type(a,t) <=> Type(b,t))
… (similarly for Number, Gender, etc.)
128
Unsupervised Coreference Resolution
Head(mention, string)
Type(mention, type)
MentionOf(mention, entity)
Apposition(mention, mention)

Mixture model:
MentionOf(+m,+e)
Type(+m,+t)
Head(+m,+h) ^ MentionOf(+m,+e)

Joint inference formulas (enforce agreement, leverage apposition):
MentionOf(a,e) ^ MentionOf(b,e) => (Type(a,t) <=> Type(b,t))
… (similarly for Number, Gender, etc.)
Apposition(a,b) => (MentionOf(a,e) <=> MentionOf(b,e))

More: H. Poon and P. Domingos, “Joint Unsupervised Coreference Resolution with Markov Logic”, in Proc. EMNLP-2008.
129
Relational Clustering: Discover Unknown Predicates

Cluster relations along with objects
Use second-order Markov logic [Kok & Domingos, 2007, 2008]
Key idea: Cluster combination determines likelihood of relations
  InClust(r,+c) ^ InClust(x,+a) ^ InClust(y,+b) => r(x,y)
Input: Relational tuples extracted by TextRunner [Banko et al., 2007]
Output: Semantic network
130
Recursive Relational Clustering
Unsupervised semantic parsing [Poon & Domingos, EMNLP-2009]
Text → knowledge: start directly from text
Identify meaning units + resolve variations
Use high-order Markov logic (variables over arbitrary lambda forms and their clusters)
End-to-end machine reading: read text, then answer questions
131
Semantic Parsing

[Figure: dependency parse of “IL-4 protein induces CD11b” (nsubj, dobj, nn edges) mapped to the logical form INDUCE(e1), INDUCER(e1,e2), INDUCED(e1,e3), IL-4(e2), CD11B(e3)]

Structured prediction: Partition + assignment
132
Challenge: Same Meaning, Many Variations
IL-4 up-regulates CD11b
Protein IL-4 enhances the expression of CD11b
CD11b expression is induced by IL-4 protein
The cytokine interleukin-4 induces CD11b expression
IL-4’s up-regulation of CD11b, …
……
133
Unsupervised Semantic Parsing

USP: Recursively cluster arbitrary expressions composed with/by similar expressions:

IL-4 induces CD11b
Protein IL-4 enhances the expression of CD11b
CD11b expression is enhanced by IL-4 protein
The cytokine interleukin-4 induces CD11b expression
IL-4’s up-regulation of CD11b, …

First cluster the same forms at the atom level (e.g., the occurrences of “induces”); then repeatedly cluster forms that appear in composition with the same forms (e.g., “enhances the expression of” with “induces”, “IL-4 protein” with “the cytokine interleukin-4”, “up-regulation of” with “induces”).
139
Unsupervised Semantic Parsing

Exponential prior on number of parameters
Event/object/property cluster mixtures: InClust(e,+c) ^ HasValue(e,+v)

[Figure: object/event cluster INDUCE (core forms: induces 0.1, enhances 0.4, …) linked to property cluster INDUCER, whose mixtures cover argument forms (IL-4 0.2, IL-8 0.1, …), argument number (None 0.1, One 0.8, …), and syntax (nsubj 0.5, agent 0.4, …)]
140
But … State Space Too Large
Coreference: #-clusters ≤ #-mentions
USP: #-clusters ~ exp(#-tokens)
Also, meaning units are often small, and many clusters are singletons
⇒ Use combinatorial search
141
Inference: Hill-Climb Probability

Initialize with a default partition, then apply search operators (e.g., lambda reduction) to hill-climb the probability of the parse.

[Figure: dependency tree of “IL-4 protein induces CD11B” being repartitioned; lambda reduction merges the “IL-4” / “protein” fragments into one meaning unit]
142
Learning: Hill-Climb Likelihood

Initialize with single-form clusters, e.g., {enhances}, {induces}, {protein}, {IL-4}
Search operators:
  MERGE: {induces} + {enhances} → {induces 0.2, enhances 0.8}
  COMPOSE: {IL-4} + {protein} → {IL-4 protein}
143
Unsupervised Ontology Induction
Limitations of USP:
  No ISA hierarchy among clusters
  Little smoothing
  Limited capability to generalize
OntoUSP [Poon & Domingos, ACL-2010]:
  Extends USP to also induce an ISA hierarchy
  Joint approach to ontology induction, population, and knowledge extraction
  To appear in ACL (see you in Uppsala :-)
144
OntoUSP

Modify the cluster mixture formula: InClust(e,c) ^ ISA(c,+d) ^ HasValue(e,+v)
Hierarchical smoothing + clustering
New operator in learning: ABSTRACTION

[Figure: rather than MERGE the INDUCE cluster (induces 0.6, up-regulates 0.2, …) with the INHIBIT cluster (inhibits 0.4, suppresses 0.2, …), ABSTRACTION introduces a parent cluster (e.g., REGULATE) with ISA links to both]
145
End of The Beginning …
Not merely a user guide to MLNs and Alchemy
Statistical relational learning: a growth area for machine learning and NLP
146
Future Work: Inference
Scale up inference
  Cutting-plane methods (e.g., [Riedel, 2008])
  Unify lifted inference with sampling
  Coarse-to-fine inference
Alternative technology
  E.g., linear programming, Lagrangian relaxation
147
Future Work: Supervised Learning
Alternative optimization objectives
  E.g., max-margin learning [Huynh & Mooney, 2009]
Learning for efficient inference
  E.g., learning arithmetic circuits [Lowd & Domingos, 2008]
Structure learning: Improve accuracy and scalability
  E.g., [Kok & Domingos, 2009]
148
Future Work: Unsupervised Learning
Model: Learning objective, formalism, etc.
Learning: Local optima, intractability, etc.
Hyperparameter tuning
Leverage available resources:
  Semi-supervised learning
  Multi-task learning
  Transfer learning (e.g., domain adaptation)
Human in the loop: e.g., interactive ML, active learning, crowdsourcing
149
Future Work: NLP Applications
Existing application areas:
  More joint inference opportunities
  Additional domain knowledge
  Combine multiple pipeline stages
A “killer app”: machine reading
Many, many more awaiting YOU to discover
150
Summary

We need to unify logical and statistical NLP
Markov logic provides a language for this
  Syntax: Weighted first-order formulas
  Semantics: Feature templates of Markov nets
  Inference: Satisfiability, MCMC, lifted BP, etc.
  Learning: Pseudo-likelihood, VP, PSCG, ILP, etc.
Growing set of NLP applications
Open-source software: Alchemy
  alchemy.cs.washington.edu
Book: Domingos & Lowd, Markov Logic, Morgan & Claypool, 2009.
151
References

[Banko et al., 2007] Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead, Oren Etzioni, "Open Information Extraction From the Web", in Proc. IJCAI-2007.
[Chakrabarti et al., 1998] Soumen Chakrabarti, Byron Dom, Piotr Indyk, "Hypertext Classification Using Hyperlinks", in Proc. SIGMOD-1998.
[Damien et al., 1999] Paul Damien, Jon Wakefield, Stephen Walker, "Gibbs sampling for Bayesian non-conjugate and hierarchical models by auxiliary variables", Journal of the Royal Statistical Society B, 61:2.
[Domingos & Lowd, 2009] Pedro Domingos and Daniel Lowd, Markov Logic, Morgan & Claypool.
[Friedman et al., 1999] Nir Friedman, Lise Getoor, Daphne Koller, Avi Pfeffer, "Learning probabilistic relational models", in Proc. IJCAI-1999.
152
[Halpern, 1990] Joe Halpern, "An analysis of first-order logics of probability", Artificial Intelligence 46.
[Huynh & Mooney, 2009] Tuyen Huynh and Raymond Mooney, "Max-Margin Weight Learning for Markov Logic Networks", In Proc. ECML-2009.
[Kautz et al., 1997] Henry Kautz, Bart Selman, Yuejun Jiang, "A general stochastic approach to solving problems with hard and soft constraints", In The Satisfiability Problem: Theory and Applications. AMS.
[Kok & Domingos, 2007] Stanley Kok and Pedro Domingos, "Statistical Predicate Invention", In Proc. ICML-2007.
[Kok & Domingos, 2008] Stanley Kok and Pedro Domingos, "Extracting Semantic Networks from Text via Relational Clustering", In Proc. ECML-2008.
153
[Kok & Domingos, 2009] Stanley Kok and Pedro Domingos, "Learning Markov Logic Network Structure via Hypergraph Lifting", in Proc. ICML-2009.
[Ling & Weld, 2010] Xiao Ling and Daniel S. Weld, "Temporal Information Extraction", In Proc. AAAI-2010.
[Lowd & Domingos, 2007] Daniel Lowd and Pedro Domingos, "Efficient Weight Learning for Markov Logic Networks", In Proc. PKDD-2007.
[Lowd & Domingos, 2008] Daniel Lowd and Pedro Domingos, "Learning Arithmetic Circuits", In Proc. UAI-2008.
[Meza-Ruiz & Riedel, 2009] Ivan Meza-Ruiz and Sebastian Riedel, "Jointly Identifying Predicates, Arguments and Senses using Markov Logic", In Proc. NAACL-2009.
154
[Muggleton, 1996] Stephen Muggleton, "Stochastic logic programs", in Proc. ILP-1996.

[Nilsson, 1986] Nils Nilsson, "Probabilistic logic", Artificial Intelligence 28.
[Page et al., 1998] Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd, "The PageRank Citation Ranking: Bringing Order to the Web", Tech. Rept., Stanford University, 1998.
[Poon & Domingos, 2006] Hoifung Poon and Pedro Domingos, "Sound and Efficient Inference with Probabilistic and Deterministic Dependencies", In Proc. AAAI-06.
[Poon & Domingos, 2007] Hoifung Poon and Pedro Domingos, "Joint Inference in Information Extraction", in Proc. AAAI-07.
155
[Poon & Domingos, 2008a] Hoifung Poon, Pedro Domingos, Marc Sumner, "A General Method for Reducing the Complexity of Relational Inference and its Application to MCMC", in Proc. AAAI-08.
[Poon & Domingos, 2008b] Hoifung Poon and Pedro Domingos, "Joint Unsupervised Coreference Resolution with Markov Logic", In Proc. EMNLP-08.
[Poon & Domingos, 2009] Hoifung Poon and Pedro Domingos, "Unsupervised Semantic Parsing", in Proc. EMNLP-09.
[Poon & Cherry & Toutanova, 2009] Hoifung Poon, Colin Cherry, Kristina Toutanova, "Unsupervised Morphological Segmentation with Log-Linear Models", In Proc. NAACL-2009.
156
[Poon & Vanderwende, 2010] Hoifung Poon and Lucy Vanderwende, "Joint Inference for Knowledge Extraction from Biomedical Literature", in Proc. NAACL-10.

[Poon & Domingos, 2010] Hoifung Poon and Pedro Domingos, "Unsupervised Ontology Induction From Text", in Proc. ACL-10.

[Riedel, 2008] Sebastian Riedel, "Improving the Accuracy and Efficiency of MAP Inference for Markov Logic", in Proc. UAI-2008.
[Riedel et al., 2009] Sebastian Riedel, Hong-Woo Chun, Toshihisa Takagi and Jun'ichi Tsujii, "A Markov Logic Approach to Bio-Molecular Event Extraction", In Proc. BioNLP 2009 Shared Task.
[Selman et al., 1996] Bart Selman, Henry Kautz, Bram Cohen, "Local search strategies for satisfiability testing", In Cliques, Coloring, and Satisfiability: Second DIMACS Implementation Challenge. AMS.
157
[Singla & Domingos, 2006a] Parag Singla and Pedro Domingos, "Memory-Efficient Inference in Relational Domains", in Proc. AAAI-2006.
[Singla & Domingos, 2006b] Parag Singla and Pedro Domingos, "Entity Resolution with Markov Logic", In Proc. ICDM-2006.
[Singla & Domingos, 2007] Parag Singla and Pedro Domingos, "Markov Logic in Infinite Domains", In Proc. UAI-2007.
[Singla & Domingos, 2008] Parag Singla and Pedro Domingos, "Lifted First-Order Belief Propagation", In Proc. AAAI-2008.
[Taskar et al., 2002] Ben Taskar, Pieter Abbeel, Daphne Koller, "Discriminative probabilistic models for relational data", in Proc. UAI-2002.
158
[Toutanova & Haghighi & Manning, 2008] Kristina Toutanova, Aria Haghighi, Chris Manning, "A global joint model for semantic role labeling", Computational Linguistics.
[Wang & Domingos, 2008] Jue Wang and Pedro Domingos, "Hybrid Markov Logic Networks", In Proc. AAAI-2008.
[Wellman et al., 1992] Michael Wellman, John S. Breese, Robert P. Goldman, "From knowledge bases to decision models", Knowledge Engineering Review 7.
[Yoshikawa et al., 2009] Katsumasa Yoshikawa, Sebastian Riedel, Masayuki Asahara and Yuji Matsumoto, "Jointly Identifying Temporal Relations with Markov Logic", In Proc. ACL-2009.