Markov Logic in Natural Language Processing
Hoifung Poon, Dept. of Computer Science & Eng., University of Washington

Page 1

Markov Logic in Natural Language Processing

Hoifung Poon
Dept. of Computer Science & Eng.

University of Washington

Page 2

Overview

Motivation
Foundational areas
Markov logic
NLP applications
  Basics
  Supervised learning
  Unsupervised learning

Page 3

Languages Are Structural

governments

lm$pxtm (according to their families)

Page 4

Languages Are Structural

govern-ment-s

l-m$px-t-m (according to their families)

[Parse tree for "IL-4 induces CD11B": S → NP VP, VP → V NP]

IL-4 induces CD11B

Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41......

[Event graph for the sentence above: "involvement" has Theme "up-regulation" and Cause "activation"; "up-regulation" has Theme IL-10, Cause gp41, and Site "human monocyte"; "activation" has Theme p70(S6)-kinase]

George Walker Bush was the 43rd President of the United States. …… Bush was the eldest son of President G. H. W. Bush and Barbara Bush. …… In November 1977, he met Laura Welch at a barbecue.


Page 6

Languages Are Structural

Objects are not just feature vectors
  They have parts and subparts
  Which have relations with each other
  They can be trees, graphs, etc.

Objects are seldom i.i.d. (independent and identically distributed)
  They exhibit local and global dependencies
  They form class hierarchies (with multiple inheritance)
  Objects' properties depend on those of related objects

Deeply interwoven with knowledge

Page 7

First-Order Logic

Main theoretical foundation of computer science
General language for describing complex structures and knowledge
  Trees, graphs, dependencies, hierarchies, etc. easily expressed
Inference algorithms (satisfiability testing, theorem proving, etc.)

Page 8

G. W. Bush … Laura Bush … Mrs. Bush …

Languages Are Statistical

I saw the man with the telescope

[Two readings: the PP "with the telescope" attaches to "the man" or to "saw"]

Here in London, Frances Deek is a retired teacher …
In the Israeli town …, Karen London says …
Now London says …

London: PERSON or LOCATION?

Microsoft buys Powerset

Microsoft acquires Powerset

Powerset is acquired by Microsoft Corporation

The Redmond software giant buys Powerset

Microsoft’s purchase of Powerset, …

……

Which one?

Page 9

Languages Are Statistical

Languages are ambiguous
Our information is always incomplete
We need to model correlations
Our predictions are uncertain
Statistics provides the tools to handle this

Page 10

Probabilistic Graphical Models

Mixture models
Hidden Markov models
Bayesian networks
Markov random fields
Maximum entropy models
Conditional random fields
Etc.

Page 11

The Problem

Logic is deterministic, requires manual coding
Statistical models assume i.i.d. data, objects = feature vectors
Historically, statistical and logical NLP have been pursued separately
We need to unify the two!
Burgeoning field in machine learning: statistical relational learning

Page 12

Costs and Benefits ofStatistical Relational Learning

Benefits
  Better predictive accuracy
  Better understanding of domains
  Enable learning with less or no labeled data

Costs
  Learning is much harder
  Inference becomes a crucial issue
  Greater complexity for user

Page 13

Progress to Date

Probabilistic logic [Nilsson, 1986]
Statistics and beliefs [Halpern, 1990]
Knowledge-based model construction [Wellman et al., 1992]
Stochastic logic programs [Muggleton, 1996]
Probabilistic relational models [Friedman et al., 1999]
Relational Markov networks [Taskar et al., 2002]
Etc.
This talk: Markov logic [Domingos & Lowd, 2009]

Page 14

Markov Logic: A Unifying Framework

Probabilistic graphical models and first-order logic are special cases
Unified inference and learning algorithms
Easy-to-use software: Alchemy
Broad applicability
Goal of this tutorial: quickly learn how to use Markov logic and Alchemy for a broad spectrum of NLP applications

Page 15

Overview

Motivation
Foundational areas
  Probabilistic inference
  Statistical learning
  Logical inference
  Inductive logic programming
Markov logic
NLP applications
  Basics
  Supervised learning
  Unsupervised learning

Page 16

Markov Networks

Undirected graphical models

[Graph over variables: Smoking, Cancer, Asthma, Cough]

Potential functions defined over cliques

Smoking  Cancer  Φ(S,C)
False    False   4.5
False    True    4.5
True     False   2.7
True     True    4.5

P(x) = \frac{1}{Z} \prod_c \Phi_c(x_c), \qquad Z = \sum_x \prod_c \Phi_c(x_c)

Page 17

Markov Networks

Undirected graphical models

Log-linear model:

P(x) = \frac{1}{Z} \exp\left( \sum_i w_i f_i(x) \right)

w_i: weight of feature i; f_i: feature i. E.g.:

f_1(\text{Smoking}, \text{Cancer}) = \begin{cases} 1 & \text{if } \lnot\text{Smoking} \lor \text{Cancer} \\ 0 & \text{otherwise} \end{cases}, \qquad w_1 = 1.5
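To make the log-linear form concrete, here is a minimal Python sketch (mine, not from the slides) that enumerates all worlds of the Smoking/Cancer example, computes Z by brute force, and prints each world's probability; the single feature and weight follow the example above.

```python
import itertools, math

w1 = 1.5

def f1(smoking, cancer):
    # f1 = 1 iff the soft clause Smoking => Cancer is satisfied
    return 1.0 if (not smoking) or cancer else 0.0

worlds = list(itertools.product([False, True], repeat=2))
Z = sum(math.exp(w1 * f1(s, c)) for s, c in worlds)   # partition function

for s, c in worlds:
    p = math.exp(w1 * f1(s, c)) / Z
    print(f"Smoking={s} Cancer={c} P={p:.3f}")
```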

Page 18

Markov Nets vs. Bayes Nets

Property         Markov Nets        Bayes Nets
Form             Prod. potentials   Prod. potentials
Potentials       Arbitrary          Cond. probabilities
Cycles           Allowed            Forbidden
Partition func.  Z = ?              Z = 1
Indep. check     Graph separation   D-separation
Indep. props.    Some               Some
Inference        MCMC, BP, etc.     Convert to Markov

Page 19

Inference in Markov Networks

Goal: compute marginals & conditionals of

P(X) = \frac{1}{Z} \exp\left( \sum_i w_i f_i(X) \right), \qquad Z = \sum_X \exp\left( \sum_i w_i f_i(X) \right)

Exact inference is #P-complete
Conditioning on the Markov blanket is easy:

P(x \mid MB(x)) = \frac{\exp\left( \sum_i w_i f_i(x) \right)}{\exp\left( \sum_i w_i f_i(x{=}0) \right) + \exp\left( \sum_i w_i f_i(x{=}1) \right)}

Gibbs sampling exploits this

Page 20

MCMC: Gibbs Sampling

state ← random truth assignment
for i ← 1 to num-samples do
    for each variable x
        sample x according to P(x | neighbors(x))
        state ← state with new value of x
P(F) ← fraction of states in which F is true
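A runnable Python sketch of the sampler above for the two-variable network from the earlier slides; the single weighted feature and the variable names are illustrative assumptions.

```python
import math, random

W1 = 1.5   # weight of the feature Smoking => Cancer (illustrative)

def score(state):
    # sum of weights of satisfied features
    return W1 if (not state["Smoking"]) or state["Cancer"] else 0.0

def gibbs(num_samples=20000):
    state = {v: random.random() < 0.5 for v in ("Smoking", "Cancer")}
    count = 0
    for _ in range(num_samples):
        for var in state:
            # P(var | rest) from the exponentiated scores of both values
            state[var] = True;  e1 = math.exp(score(state))
            state[var] = False; e0 = math.exp(score(state))
            state[var] = random.random() < e1 / (e0 + e1)
        count += state["Cancer"]
    return count / num_samples

print("P(Cancer) ~", gibbs())
```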

Page 21

Other Inference Methods

Belief propagation (sum-product)
Mean field / variational approximations

Page 22

MAP/MPE Inference

Goal: Find most likely state of world given evidence

\arg\max_y P(y \mid x)

(y: query, x: evidence)

Page 23

MAP Inference Algorithms

Iterated conditional modes
Simulated annealing
Graph cuts
Belief propagation (max-product)
LP relaxation

Page 24

Overview

Motivation
Foundational areas
  Probabilistic inference
  Statistical learning
  Logical inference
  Inductive logic programming
Markov logic
NLP applications
  Basics
  Supervised learning
  Unsupervised learning

Page 25

Generative Weight Learning

Maximize likelihood
Use gradient ascent or L-BFGS
No local maxima

Requires inference at each step (slow!)

\frac{\partial}{\partial w_i} \log P_w(x) = n_i(x) - E_w[n_i(x)]

n_i(x): no. of times feature i is true in data
E_w[n_i(x)]: expected no. of times feature i is true according to model
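A tiny illustration of this gradient, assuming the same one-feature network: the model expectation is computed by exact enumeration (feasible only for toy networks; real learners substitute sampling or pseudo-likelihood), and the weight converges to where the model count matches the data count.

```python
import itertools, math

def f(s, c):                        # feature: Smoking => Cancer
    return 1.0 if (not s) or c else 0.0

data = [(True, True), (True, True), (True, False)]   # toy observations
worlds = list(itertools.product([False, True], repeat=2))
w, lr = 0.0, 0.5

for _ in range(200):
    Z = sum(math.exp(w * f(*x)) for x in worlds)
    e_model = sum(f(*x) * math.exp(w * f(*x)) / Z for x in worlds)
    n_data = sum(f(*x) for x in data) / len(data)    # per-example count
    w += lr * (n_data - e_model)                     # gradient ascent step

print("learned weight:", round(w, 3))  # ~log(2/3): data satisfies f 2/3 of the time
```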

Page 26

Pseudo-Likelihood

Likelihood of each variable given its neighbors in the data

PL(x) = \prod_i P(x_i \mid \text{neighbors}(x_i))

Does not require inference at each step
Widely used in vision, spatial statistics, etc.
But PL parameters may not work well for long inference chains

Page 27

Discriminative Weight Learning

Maximize conditional likelihood of query (y) given evidence (x)

Approximate expected counts by counts in MAP state of y given x

\frac{\partial}{\partial w_i} \log P_w(y \mid x) = n_i(x, y) - E_w[n_i(x, y)]

n_i(x, y): no. of true groundings of clause i in data
E_w[n_i(x, y)]: expected no. of true groundings according to model

Page 28

Voted Perceptron

w_i ← 0
for t ← 1 to T do
    y_MAP ← Viterbi(x)
    w_i ← w_i + η [count_i(y_Data) − count_i(y_MAP)]
return w_i / T

Originally proposed for training HMMs discriminatively
Assumes network is linear chain
Can be generalized to arbitrary networks
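A hedged Python sketch of the averaged update loop above. The MAP oracle is passed in as a function (Viterbi here; MaxWalkSAT for general MLNs, per a later slide), and `counts` returning a feature-count vector is an assumption of the sketch.

```python
def voted_perceptron(x, y_data, counts, map_inference, T=100, eta=0.1):
    """counts(x, y) -> list of feature counts; map_inference(x, w) -> labeling."""
    n = len(counts(x, y_data))
    w, w_sum = [0.0] * n, [0.0] * n
    for _ in range(T):
        y_map = map_inference(x, w)            # most likely labeling under w
        c_data, c_map = counts(x, y_data), counts(x, y_map)
        for i in range(n):
            w[i] += eta * (c_data[i] - c_map[i])
            w_sum[i] += w[i]                   # accumulate for averaging
    return [s / T for s in w_sum]              # averaged ("voted") weights
```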

Page 29

Overview

Motivation
Foundational areas
  Probabilistic inference
  Statistical learning
  Logical inference
  Inductive logic programming
Markov logic
NLP applications
  Basics
  Supervised learning
  Unsupervised learning

Page 30

First-Order Logic

Constants, variables, functions, predicates
  E.g.: Anna, x, MotherOf(x), Friends(x, y)
Literal: predicate or its negation
Clause: disjunction of literals
Grounding: replace all variables by constants
  E.g.: Friends(Anna, Bob)
World (model, interpretation): assignment of truth values to all ground predicates

Page 31

Inference in First-Order Logic

Traditionally done by theorem proving (e.g.: Prolog)
Propositionalization followed by model checking turns out to be faster (often by a lot)
Propositionalization: create all ground atoms and clauses
Model checking: satisfiability testing
Two main approaches:
  Backtracking (e.g.: DPLL)
  Stochastic local search (e.g.: WalkSAT)

Page 32

Satisfiability

Input: set of clauses (convert KB to conjunctive normal form (CNF))
Output: truth assignment that satisfies all clauses, or failure
The paradigmatic NP-complete problem
Solution: search
Key point: most SAT problems are actually easy
Hard region: narrow range of #Clauses / #Variables

Page 33

Stochastic Local Search

Uses complete assignments instead of partial
Start with random state
Flip variables in unsatisfied clauses
Hill-climbing: minimize # unsatisfied clauses
Avoid local minima: random flips
Multiple restarts

Page 34

The WalkSAT Algorithm

for i ← 1 to max-tries do
    solution = random truth assignment
    for j ← 1 to max-flips do
        if all clauses satisfied then
            return solution
        c ← random unsatisfied clause
        with probability p
            flip a random variable in c
        else
            flip variable in c that maximizes # satisfied clauses
return failure
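The same algorithm as runnable Python. Clauses use a DIMACS-style encoding (my assumption): a clause is a list of signed integers, where 3 means "variable 3 true" and -3 means "variable 3 false".

```python
import random

def walksat(clauses, n_vars, max_tries=10, max_flips=1000, p=0.5):
    def is_sat(clause, assign):
        return any(assign[abs(l)] == (l > 0) for l in clause)
    for _ in range(max_tries):
        assign = [random.random() < 0.5 for _ in range(n_vars + 1)]  # index 0 unused
        for _ in range(max_flips):
            unsat = [c for c in clauses if not is_sat(c, assign)]
            if not unsat:
                return assign
            c = random.choice(unsat)
            if random.random() < p:
                v = abs(random.choice(c))            # random-walk move
            else:                                    # greedy move
                def n_sat_after_flip(var):
                    assign[var] = not assign[var]
                    n = sum(is_sat(cl, assign) for cl in clauses)
                    assign[var] = not assign[var]
                    return n
                v = max((abs(l) for l in c), key=n_sat_after_flip)
            assign[v] = not assign[v]
    return None   # failure

# (x1 v x2) ^ (!x1 v x3) ^ (!x2 v !x3)
print(walksat([[1, 2], [-1, 3], [-2, -3]], 3))
```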

Page 35

Overview

Motivation
Foundational areas
  Probabilistic inference
  Statistical learning
  Logical inference
  Inductive logic programming
Markov logic
NLP applications
  Basics
  Supervised learning
  Unsupervised learning

Page 36

Rule Induction

Given: set of positive and negative examples of some concept
  Example: (x1, x2, …, xn, y)
  y: concept (Boolean)
  x1, x2, …, xn: attributes (assume Boolean)
Goal: induce a set of rules that cover all positive examples and no negative ones
  Rule: xa ^ xb ^ … => y (xa: literal, i.e., xi or its negation)
  Same as Horn clause: Body => Head
  Rule r covers example x iff x satisfies body of r
Eval(r): accuracy, info gain, coverage, support, etc.

Page 37

Learning a Single Rule

head ← y
body ← Ø
repeat
    for each literal x
        r_x ← r with x added to body
        Eval(r_x)
    body ← body ^ best x
until no x improves Eval(r)
return r

Page 38

Learning a Set of Rules

R ← Ø
S ← examples
repeat
    learn a single rule r
    R ← R ∪ { r }
    S ← S − positive examples covered by r
until S = Ø
return R
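A compact Python sketch of both loops (sequential covering with greedy rule growth), under simplifying assumptions: examples are dicts of Boolean attributes plus a Boolean label 'y', and Eval is accuracy on covered examples.

```python
def covers(body, ex):
    return all(ex[a] == v for a, v in body)

def eval_rule(body, examples):
    covered = [ex for ex in examples if covers(body, ex)]
    return sum(ex["y"] for ex in covered) / len(covered) if covered else 0.0

def learn_rule(examples, attrs):
    body = []
    while True:
        cands = [(a, v) for a in attrs for v in (True, False)
                 if (a, v) not in body]
        if not cands:
            return body
        best = max(cands, key=lambda lit: eval_rule(body + [lit], examples))
        if eval_rule(body + [best], examples) <= eval_rule(body, examples):
            return body                        # no literal improves Eval
        body.append(best)

def learn_rules(examples, attrs):
    rules, remaining = [], list(examples)
    while any(ex["y"] for ex in remaining):
        r = learn_rule(remaining, attrs)
        rules.append(r)
        # remove positive examples covered by r
        remaining = [ex for ex in remaining if not (ex["y"] and covers(r, ex))]
    return rules
```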

Page 39

First-Order Rule Induction

y and x_i are now predicates with arguments
  E.g.: y is Ancestor(x,y), x_i is Parent(x,y)
Literals to add are predicates or their negations
Literal to add must include at least one variable already appearing in rule
Adding a literal changes # groundings of rule
  E.g.: Ancestor(x,z) ^ Parent(z,y) => Ancestor(x,y)
Eval(r) must take this into account
  E.g.: multiply by # positive groundings of rule still covered after adding literal

Page 40

Overview

Motivation
Foundational areas
Markov logic
NLP applications
  Basics
  Supervised learning
  Unsupervised learning

Page 41

Markov Logic

Syntax: weighted first-order formulas
Semantics: feature templates for Markov networks
Intuition: soften logical constraints
  Give each formula a weight
  (Higher weight => stronger constraint)

P(world) ∝ exp(Σ weights of formulas it satisfies)

Page 42

Example: Coreference Resolution

Mentions of Obama are often headed by "Obama"
Mentions of Obama are often headed by "President"
Appositions usually refer to the same entity

Barack Obama, the 44th President of the United States, is the first African American to hold the office. ……

Page 43

Example: Coreference Resolution

MentionOf(x, Obama) => Head(x, "Obama")
MentionOf(x, Obama) => Head(x, "President")
Apposition(x, y) ^ MentionOf(x, c) => MentionOf(y, c)

Page 44

Example: Coreference Resolution

1.5  MentionOf(x, Obama) => Head(x, "Obama")
0.8  MentionOf(x, Obama) => Head(x, "President")
100  Apposition(x, y) ^ MentionOf(x, c) => MentionOf(y, c)

Page 45

Example: Coreference Resolution

1.5  MentionOf(x, Obama) => Head(x, "Obama")
0.8  MentionOf(x, Obama) => Head(x, "President")
100  Apposition(x, y) ^ MentionOf(x, c) => MentionOf(y, c)

Two mention constants: A and B

[Ground network over the atoms Head(A,"Obama"), Head(A,"President"), Head(B,"Obama"), Head(B,"President"), MentionOf(A,Obama), MentionOf(B,Obama), Apposition(A,B), Apposition(B,A)]

Page 46

Markov Logic Networks

MLN is template for ground Markov nets
Probability of a world x:

P(x) = \frac{1}{Z} \exp\left( \sum_i w_i n_i(x) \right)

w_i: weight of formula i
n_i(x): no. of true groundings of formula i in x

Typed variables and constants greatly reduce size of ground Markov net
Functions, existential quantifiers, etc.
Can handle infinite domains [Singla & Domingos, 2007] and continuous domains [Wang & Domingos, 2008]
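What n_i(x) means operationally, as a small Python sketch of mine: ground the clause Smokes(x) => Cancer(x) over a toy domain and count the true groundings in a given world (the representation is an assumption, not Alchemy's internals).

```python
domain = ["Anna", "Bob"]
world = {("Smokes", "Anna"): True,  ("Cancer", "Anna"): True,
         ("Smokes", "Bob"):  True,  ("Cancer", "Bob"):  False}

def n_true_groundings(world, domain):
    # clause: !Smokes(x) v Cancer(x); one grounding per constant x
    return sum((not world[("Smokes", x)]) or world[("Cancer", x)]
               for x in domain)

print(n_true_groundings(world, domain))   # 1: true for Anna, false for Bob
```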

Page 47

Relation to Statistical Models

Special cases:
  Markov networks
  Markov random fields
  Bayesian networks
  Log-linear models
  Exponential models
  Max. entropy models
  Gibbs distributions
  Boltzmann machines
  Logistic regression
  Hidden Markov models
  Conditional random fields

Obtained by making all predicates zero-arity
Markov logic allows objects to be interdependent (non-i.i.d.)

Page 48

Relation to First-Order Logic

Infinite weights => first-order logic
Satisfiable KB, positive weights => satisfying assignments = modes of distribution
Markov logic allows contradictions between formulas

Page 49

MLN Algorithms: The First Three Generations

Problem             First generation         Second generation  Third generation
MAP inference       Weighted satisfiability  Lazy inference     Cutting planes
Marginal inference  Gibbs sampling           MC-SAT             Lifted inference
Weight learning     Pseudo-likelihood        Voted perceptron   Scaled conj. gradient
Structure learning  Inductive logic progr.   ILP + PL (etc.)    Clustering + pathfinding

Page 50

MAP/MPE Inference

Problem: Find most likely state of world given evidence

\arg\max_y P(y \mid x)

(y: query, x: evidence)

Page 51

MAP/MPE Inference

Problem: Find most likely state of world given evidence

\max_y \frac{1}{Z_x} \exp\left( \sum_i w_i n_i(x, y) \right)

Page 52

MAP/MPE Inference

Problem: Find most likely state of world given evidence

\max_y \sum_i w_i n_i(x, y)

Page 53

MAP/MPE Inference

Problem: Find most likely state of world given evidence

\max_y \sum_i w_i n_i(x, y)

This is just the weighted MaxSAT problem
Use a weighted SAT solver (e.g., MaxWalkSAT [Kautz et al., 1997])

Page 54

The MaxWalkSAT Algorithm

for i ← 1 to max-tries do
    solution = random truth assignment
    for j ← 1 to max-flips do
        if weights(sat. clauses) > threshold then
            return solution
        c ← random unsatisfied clause
        with probability p
            flip a random variable in c
        else
            flip variable in c that maximizes weights(sat. clauses)
return failure, best solution found

Page 55

Computing Probabilities

P(Formula | MLN, C) = ?
  MCMC: sample worlds, check formula holds
P(Formula1 | Formula2, MLN, C) = ?
  If Formula2 = conjunction of ground atoms:
    First construct min subset of network necessary to answer query (generalization of KBMC)
    Then apply MCMC

Page 56

But … Insufficient for Logic

Problem:
  Deterministic dependencies break MCMC
  Near-deterministic ones make it very slow

Solution:
  Combine MCMC and WalkSAT
  → MC-SAT algorithm [Poon & Domingos, 2006]

Page 57

Auxiliary-Variable Methods

Main ideas:
  Use auxiliary variables to capture dependencies
  Turn difficult sampling into uniform sampling

Given distribution P(x), define

f(x, u) = \begin{cases} 1 & \text{if } 0 \le u \le P(x) \\ 0 & \text{otherwise} \end{cases}
\qquad \Rightarrow \qquad \int f(x, u)\, du = P(x)

Sample from f(x, u), then discard u

Page 58

Slice Sampling [Damien et al. 1999]

[Figure: slice sampling — given x(k), sample u(k) uniformly from [0, P(x(k))]; then sample x(k+1) uniformly from the slice {x : P(x) ≥ u(k)}]

Page 59

Slice Sampling

P(x) = \frac{1}{Z} \prod_i \Phi_i(x)

Identifying the slice may be difficult
Introduce an auxiliary variable u_i for each Φ_i:

f(x, u_1, \ldots, u_n) = \begin{cases} 1 & \text{if } 0 \le u_i \le \Phi_i(x) \text{ for all } i \\ 0 & \text{otherwise} \end{cases}
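A minimal Python sketch of slice sampling for a one-dimensional unnormalized density. It uses a fixed bracket with shrinkage rather than the full step-out procedure, so it is only an approximation of the standard algorithm; the Gaussian-shaped target is an arbitrary choice of mine.

```python
import math, random

def p_tilde(x):                       # unnormalized target density
    return math.exp(-0.5 * x * x)

def slice_sample(x0, n, width=3.0):
    xs, x = [], x0
    for _ in range(n):
        u = random.uniform(0.0, p_tilde(x))   # auxiliary variable: slice level
        lo, hi = x - width, x + width         # bracket around current point
        while True:                           # uniform proposal, shrink on reject
            x_new = random.uniform(lo, hi)
            if p_tilde(x_new) > u:
                x = x_new
                break
            if x_new < x: lo = x_new
            else:         hi = x_new
        xs.append(x)
    return xs

samples = slice_sample(0.0, 5000)
print(sum(samples) / len(samples))    # ~0 for this symmetric target
```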

Page 60

The MC-SAT Algorithm

Select random subset M of satisfied clauses, each with probability 1 − exp(−w_i)
  Larger w_i => C_i more likely to be selected
  Hard clause (w_i = ∞): always selected
Slice = states that satisfy clauses in M
Uses SAT solver to sample x | u
Orders of magnitude faster than Gibbs sampling, etc.
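The clause-selection step in Python, as a hedged sketch: hard clauses get w = inf, so 1 − exp(−w) = 1 and they are always kept; the near-uniform SAT sampler (SampleSAT in practice) is left as a stub.

```python
import math, random

def satisfied(clause, state):
    return any(state[abs(l)] == (l > 0) for l in clause)

def mc_sat_step(state, clauses, weights, sat_sampler):
    # slice: keep each currently satisfied clause with probability 1 - exp(-w_i)
    m = [c for c, w in zip(clauses, weights)
         if satisfied(c, state) and random.random() < 1.0 - math.exp(-w)]
    # next state: (near-)uniform sample from assignments satisfying all of m
    return sat_sampler(m)
```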

Page 61

But … It Is Not Scalable

1000 researchers
Coauthor(x,y): 1 million ground atoms
Coauthor(x,y) ^ Coauthor(y,z) => Coauthor(x,z): 1 billion ground clauses
Exponential in arity

Page 62

Sparsity to the Rescue

1000 researchers
Coauthor(x,y): 1 million ground atoms
  But … most atoms are false
Coauthor(x,y) ^ Coauthor(y,z) => Coauthor(x,z): 1 billion ground clauses
  Most trivially satisfied if most atoms are false
  No need to explicitly compute most of them

Page 63

Lazy Inference

LazySAT [Singla & Domingos, 2006a]
  Lazy version of WalkSAT [Selman et al., 1996]
  Grounds atoms/clauses as needed
  Greatly reduces memory usage
The idea is much more general [Poon & Domingos, 2008a]

Page 64

General Method for Lazy Inference

If most variables assume the default value, it is wasteful to instantiate all variables / functions
Main idea:
  Allocate memory for a small subset of "active" variables / functions
  Activate more if necessary as inference proceeds
Applicable to a diverse set of algorithms: satisfiability solvers (systematic, local-search), Markov chain Monte Carlo, MPE/MAP algorithms, maximum expected utility algorithms, belief propagation, MC-SAT, etc.
Reduces memory and time by orders of magnitude

Page 65

Lifted Inference

Consider belief propagation (BP)
Often in large problems, many nodes are interchangeable: they send and receive the same messages throughout BP
Basic idea: group them into supernodes, forming a lifted network
Smaller network → faster inference
Akin to resolution in first-order logic

Page 66

Belief Propagation

Nodes (x), Features (f)

\mu_{x \to f}(x) = \prod_{h \in nb(x) \setminus \{f\}} \mu_{h \to x}(x)

\mu_{f \to x}(x) = \sum_{\sim\{x\}} \left( e^{w f(\mathbf{x})} \prod_{y \in nb(f) \setminus \{x\}} \mu_{y \to f}(y) \right)

Page 67

Lifted Belief Propagation

\mu_{x \to f}(x) = \prod_{h \in nb(x) \setminus \{f\}} \mu_{h \to x}(x)

\mu_{f \to x}(x) = \sum_{\sim\{x\}} \left( e^{w f(\mathbf{x})} \prod_{y \in nb(f) \setminus \{x\}} \mu_{y \to f}(y) \right)

Nodes (x), Features (f)

Page 68

Lifted Belief Propagation

\mu_{x \to f}(x) = \prod_{h \in nb(x) \setminus \{f\}} \mu_{h \to x}(x)^{\alpha}

\mu_{f \to x}(x) = \sum_{\sim\{x\}} \left( e^{w f(\mathbf{x})} \prod_{y \in nb(f) \setminus \{x\}} \mu_{y \to f}(y)^{\beta} \right)

α, β: functions of edge counts

Nodes (x), Features (f)

Page 69

Learning

Data is a relational database
Closed world assumption (if not: EM)
Learning parameters (weights)
Learning structure (formulas)

Page 70

Parameter Learning

Parameter tying: groundings of same clause

\frac{\partial}{\partial w_i} \log P(x) = n_i(x) - E_w[n_i(x)]

n_i(x): no. of times clause i is true in data
E_w[n_i(x)]: expected no. of times clause i is true according to MLN

Generative learning: pseudo-likelihood
Discriminative learning: conditional likelihood; use MC-SAT or MaxWalkSAT for inference

Page 71

Parameter Learning

Pseudo-likelihood + L-BFGS is fast and robust but can give poor inference results
Voted perceptron: gradient descent + MAP inference
Scaled conjugate gradient

Page 72

Voted Perceptron for MLNs

HMMs are special case of MLNs
Replace Viterbi by MaxWalkSAT
Network can now be arbitrary graph

w_i ← 0
for t ← 1 to T do
    y_MAP ← MaxWalkSAT(x)
    w_i ← w_i + η [count_i(y_Data) − count_i(y_MAP)]
return w_i / T

Page 73

Problem: Multiple Modes

Not alleviated by contrastive divergence
Alleviated by MC-SAT
Warm start: start each MC-SAT run at previous end state

Page 74

Problem: Extreme Ill-Conditioning

Solvable by quasi-Newton, conjugate gradient, etc.
But line searches require exact inference
Solution: scaled conjugate gradient [Lowd & Domingos, 2008]
  Use Hessian to choose step size
  Compute quadratic form inside MC-SAT
  Use inverse diagonal Hessian as preconditioner

Page 75

Structure Learning

Standard inductive logic programming optimizes the wrong thing
  But can be used to overgenerate for L1 pruning
Our approach: ILP + pseudo-likelihood + structure priors
For each candidate structure change:
  Start from current weights & relax convergence
  Use subsampling to compute sufficient statistics

Page 76

Structure Learning

Initial state: unit clauses or prototype KB
Operators: add/remove literal, flip sign
Evaluation function: pseudo-likelihood + structure prior
Search: beam search, shortest-first search

Page 77

Alchemy

Open-source software including:
  Full first-order logic syntax
  Generative & discriminative weight learning
  Structure learning
  Weighted satisfiability, MCMC, lifted BP
  Programming language features

alchemy.cs.washington.edu

Page 78

                 Alchemy                    Prolog            BUGS
Representation   F.O. logic + Markov nets   Horn clauses      Bayes nets
Inference        Model checking, MCMC,      Theorem proving   MCMC
                 lifted BP
Learning         Parameters & structure     No                Params.
Uncertainty      Yes                        No                Yes
Relational       Yes                        Yes               No

Page 79

Constrained Conditional Model

Representation: integer linear programs (local classifiers + global constraints)
Inference: LP solver
Parameter learning: none for constraints
  Weights of soft constraints set heuristically
  Local weights typically learned independently
Structure learning: none to date
  But see latest development in NAACL-10

Page 80

Running Alchemy

Programs: infer, learnwts, learnstruct
Options
MLN file:
  Types (optional)
  Predicates
  Formulas
Database files

Page 81

Overview

Motivation
Foundational areas
Markov logic
NLP applications
  Basics
  Supervised learning
  Unsupervised learning

Page 82

Uniform Distribn.: Empty MLN

Example: Unbiased coin flips

Type: flip = { 1, … , 20 }

Predicate: Heads(flip)

P(\text{Heads}(f)) = \frac{e^0}{e^0 + e^0} = \frac{1}{2}

Page 83

Binomial Distribn.: Unit Clause

Example: Biased coin flips

Type: flip = { 1, … , 20 }

Predicate: Heads(flip)

Formula: Heads(f)

Weight: Log odds of heads:

By default, MLN includes unit clauses for all predicates

(captures marginal distributions, etc.)

P(\text{Heads}(f)) = \frac{e^w}{e^w + e^0} = p

w = \log \frac{p}{1 - p}
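A two-line numerical check (my sketch) that the log-odds weight behaves as claimed:

```python
import math

p = 0.3
w = math.log(p / (1 - p))                 # weight of the unit clause Heads(f)
print(math.exp(w) / (math.exp(w) + 1))    # 0.3 = p, as expected
```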

Page 84

Multinomial Distribution

Example: Throwing die

Types: throw = { 1, … , 20 }

face = { 1, … , 6 }

Predicate: Outcome(throw,face)

Formulas: Outcome(t,f) ^ f != f’ => !Outcome(t,f’).

Exist f Outcome(t,f).

Too cumbersome!

Page 85

Multinomial Distrib.: ! Notation

Example: Throwing die

Types: throw = { 1, … , 20 }

face = { 1, … , 6 }

Predicate: Outcome(throw,face!)

Formulas: none needed

Semantics: Arguments without “!” determine arguments with “!”.

Also makes inference more efficient (triggers blocking).

Page 86

Multinomial Distrib.: + Notation

Example: Throwing biased die

Types: throw = { 1, … , 20 }

face = { 1, … , 6 }

Predicate: Outcome(throw,face!)

Formulas: Outcome(t,+f)

Semantics: Learn weight for each grounding of args with “+”.

Page 87

Logistic Regression (MaxEnt)

Logistic regression:

\log \frac{P(C=1 \mid F=f)}{P(C=0 \mid F=f)} = a + \sum_i b_i f_i

Type: obj = { 1, ..., n }
Query predicate: C(obj)
Evidence predicates: F_i(obj)
Formulas:
  a      C(x)
  b_i    F_i(x) ^ C(x)

Resulting distribution:

P(C=c, F=f) = \frac{1}{Z} \exp\left( a c + \sum_i b_i f_i c \right)

Therefore:

\log \frac{P(C=1 \mid F=f)}{P(C=0 \mid F=f)} = \log \frac{\exp\left( a + \sum_i b_i f_i \right)}{\exp(0)} = a + \sum_i b_i f_i

Alternative form: F_i(x) => C(x)
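A quick Python sketch of mine confirming the equivalence: computing the log odds from the MLN's unnormalized distribution gives exactly a + Σ_i b_i f_i (the values of a, b, and f are arbitrary).

```python
import math

a, b = -0.5, [1.2, -0.7]
f = [1, 1]                       # observed evidence values

def unnorm(c):                   # exp(a*c + sum_i b_i*f_i*c); Z cancels in the ratio
    return math.exp(a * c + sum(bi * fi * c for bi, fi in zip(b, f)))

log_odds = math.log(unnorm(1) / unnorm(0))
print(log_odds, a + sum(bi * fi for bi, fi in zip(b, f)))   # identical
```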

Page 88

Hidden Markov Models

obs = { Red, Green, Yellow }
state = { Stop, Drive, Slow }
time = { 0, ..., 100 }

State(state!, time)
Obs(obs!, time)

State(+s,0)
State(+s,t) ^ State(+s',t+1)
Obs(+o,t) ^ State(+s,t)

Sparse HMM: State(s,t) => State(s1,t+1) v State(s2, t+1) v ... .

Page 89

Bayesian Networks

Use all binary predicates with same first argument (the object x)
One predicate for each variable A: A(x,v!)
One clause for each line in the CPT and value of the variable
Context-specific independence: one clause for each path in the decision tree
Logistic regression: as before
Noisy OR: deterministic OR + pairwise clauses

Page 90

Relational Models

Knowledge-based model construction
  Allow only Horn clauses
  Same as Bayes nets, except arbitrary relations
  Combining function: logistic regression, noisy-OR, or external

Stochastic logic programs
  Allow only Horn clauses
  Weight of clause = log(p)
  Add formulas: Head holds => exactly one body holds

Probabilistic relational models
  Allow only binary relations
  Same as Bayes nets, except first argument can vary

Page 91

Relational Models

Relational Markov networks
  SQL → Datalog → first-order logic
  One clause for each state of a clique
  "+" syntax in Alchemy facilitates this

Bayesian logic
  Object = cluster of similar/related observations
  Observation constants + object constants
  Predicate InstanceOf(Obs,Obj) and clauses using it

Unknown relations: second-order Markov logic
  S. Kok & P. Domingos, "Statistical Predicate Invention", in Proc. ICML-2007.

Page 92

Overview

Motivation
Foundational areas
Markov logic
NLP applications
  Basics
  Supervised learning
  Unsupervised learning

Page 93

Text Classification

The 56th quadrennial United States presidential election was held on November 4, 2008. Outgoing Republican President George W. Bush's policies and actions and the American public's desire for change were key issues throughout the campaign. ……

The Chicago Bulls are an American professional basketball team based in Chicago, Illinois, playing in the Central Division of the Eastern Conference in the National Basketball Association (NBA). ……

……

Topic = politics

Topic = sports

Page 94

Text Classification

page = {1, ..., max}
word = { ... }
topic = { ... }

Topic(page,topic)
HasWord(page,word)

Topic(p,t)
HasWord(p,+w) => Topic(p,+t)

If topics mutually exclusive: Topic(page,topic!)

Page 95

Text Classification

page = {1, ..., max}
word = { ... }
topic = { ... }

Topic(page,topic)
HasWord(page,word)
Links(page,page)

Topic(p,t)
HasWord(p,+w) => Topic(p,+t)
Topic(p,t) ^ Links(p,p') => Topic(p',t)

Cf. S. Chakrabarti, B. Dom & P. Indyk, "Hypertext Classification Using Hyperlinks," in Proc. SIGMOD-1998.

Page 96

Entity Resolution

AUTHOR: H. POON & P. DOMINGOS
TITLE: UNSUPERVISED SEMANTIC PARSING
VENUE: EMNLP-09

AUTHOR: Hoifung Poon and Pedro Domings
TITLE: Unsupervised semantic parsing
VENUE: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

AUTHOR: Poon, Hoifung and Domings, Pedro
TITLE: Unsupervised ontology induction from text
VENUE: Proceedings of the Forty-Eighth Annual Meeting of the Association for Computational Linguistics

AUTHOR: H. Poon, P. Domings
TITLE: Unsupervised ontology induction
VENUE: ACL-10

SAME?

Page 97

Entity Resolution

Problem: given database, find duplicate records

HasToken(token,field,record)
SameField(field,record,record)
SameRecord(record,record)

HasToken(+t,+f,r) ^ HasToken(+t,+f,r') => SameField(f,r,r')
SameField(f,r,r') => SameRecord(r,r')

Page 98

Entity Resolution

Problem: given database, find duplicate records

HasToken(token,field,record)
SameField(field,record,record)
SameRecord(record,record)

HasToken(+t,+f,r) ^ HasToken(+t,+f,r') => SameField(f,r,r')
SameField(f,r,r') => SameRecord(r,r')
SameRecord(r,r') ^ SameRecord(r',r") => SameRecord(r,r")

Cf. A. McCallum & B. Wellner, "Conditional Models of Identity Uncertainty with Application to Noun Coreference," in Adv. NIPS 17, 2005.

Page 99

Entity Resolution

Can also resolve fields:

HasToken(token,field,record)
SameField(field,record,record)
SameRecord(record,record)

HasToken(+t,+f,r) ^ HasToken(+t,+f,r') => SameField(f,r,r')
SameField(f,r,r') <=> SameRecord(r,r')
SameRecord(r,r') ^ SameRecord(r',r") => SameRecord(r,r")
SameField(f,r,r') ^ SameField(f,r',r") => SameField(f,r,r")

More: P. Singla & P. Domingos, "Entity Resolution with Markov Logic", in Proc. ICDM-2006.

Page 100

UNSUPERVISED SEMANTIC PARSING. H. POON & P. DOMINGOS. EMNLP-2009.

Unsupervised Semantic Parsing, Hoifung Poon and Pedro Domingos. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing. Singapore: ACL.

Information Extraction

Page 101

UNSUPERVISED SEMANTIC PARSING. H. POON & P. DOMINGOS. EMNLP-2009.

Unsupervised Semantic Parsing, Hoifung Poon and Pedro Domingos. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing. Singapore: ACL.

Information Extraction

Author Title Venue

SAME?

Page 102

Information Extraction

Problem: extract database from text or semi-structured sources
Example: extract database of publications from citation list(s) (the "CiteSeer problem")
Two steps:
  Segmentation: use HMM to assign tokens to fields
  Entity resolution: use logistic regression and transitivity

Page 103

Information Extraction

Token(token, position, citation)
InField(position, field!, citation)
SameField(field, citation, citation)
SameCit(citation, citation)

Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) ^ InField(i+1,+f,c)

Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i',c') ^ InField(i',+f,c') => SameField(+f,c,c')
SameField(+f,c,c') <=> SameCit(c,c')
SameField(f,c,c') ^ SameField(f,c',c") => SameField(f,c,c")
SameCit(c,c') ^ SameCit(c',c") => SameCit(c,c")

Page 104

Information Extraction

Token(token, position, citation)
InField(position, field!, citation)
SameField(field, citation, citation)
SameCit(citation, citation)

Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) ^ !Token(".",i,c) ^ InField(i+1,+f,c)

Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i',c') ^ InField(i',+f,c') => SameField(+f,c,c')
SameField(+f,c,c') <=> SameCit(c,c')
SameField(f,c,c') ^ SameField(f,c',c") => SameField(f,c,c")
SameCit(c,c') ^ SameCit(c',c") => SameCit(c,c")

More: H. Poon & P. Domingos, "Joint Inference in Information Extraction", in Proc. AAAI-2007.

Page 105

Biomedical Text Mining

Traditionally, named entity recognition or information extraction
  E.g., protein recognition, protein-protein interaction identification
BioNLP-09 shared task: nested bio-events
  Much harder than traditional IE
  Top F1 around 50%
  Naturally calls for joint inference

Page 106

Bio-Event Extraction

Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1 ...

[Event graph: "involvement" has Theme "up-regulation" and Cause "activation"; "up-regulation" has Theme IL-10, Cause gp41, and Site "human monocyte"; "activation" has Theme p70(S6)-kinase]

Page 107

Bio-Event Extraction

Token(position, token)
DepEdge(position, position, dependency)
IsProtein(position)
EvtType(position, evtType)
InArgPath(position, position, argType!)

Token(i,+w) => EvtType(i,+t)
Token(j,w) ^ DepEdge(i,j,+d) => EvtType(i,+t)
DepEdge(i,j,+d) => InArgPath(i,j,+a)
Token(i,+w) ^ DepEdge(i,j,+d) => InArgPath(i,j,+a)
…

(Essentially logistic regression)

Page 108

Bio-Event Extraction

Token(position, token)
DepEdge(position, position, dependency)
IsProtein(position)
EvtType(position, evtType)
InArgPath(position, position, argType!)

Token(i,+w) => EvtType(i,+t)
Token(j,w) ^ DepEdge(i,j,+d) => EvtType(i,+t)
DepEdge(i,j,+d) => InArgPath(i,j,+a)
Token(i,+w) ^ DepEdge(i,j,+d) => InArgPath(i,j,+a)
…

InArgPath(i,j,Theme) => IsProtein(j) v (Exist k k!=i ^ InArgPath(j,k,Theme)).
…

Adding a few joint inference rules doubles the F1

More: H. Poon and L. Vanderwende, "Joint Inference for Knowledge Extraction from Biomedical Literature", 10:40 am, June 4, Gold Room.

Page 109

Temporal Information Extraction

Identify event times and temporal relations (BEFORE, AFTER, OVERLAP)
E.g., who is the President of the U.S.A.?
  Obama: 1/20/2009 – present
  G. W. Bush: 1/20/2001 – 1/19/2009
  Etc.

Page 110

DepEdge(position, position, dependency)
Event(position, event)
After(event, event)

DepEdge(i,j,+d) ^ Event(i,p) ^ Event(j,q) => After(p,q)

After(p,q) ^ After(q,r) => After(p,r)

Temporal Information Extraction

Page 111

DepEdge(position, position, dependency)
Event(position, event)
After(event, event)
Role(position, position, role)

DepEdge(i,j,+d) ^ Event(i,p) ^ Event(j,q) => After(p,q)
Role(i,j,ROLE-AFTER) ^ Event(i,p) ^ Event(j,q) => After(p,q)

After(p,q) ^ After(q,r) => After(p,r)

More:

K. Yoshikawa, S. Riedel, M. Asahara and Y. Matsumoto, “Jointly Identifying Temporal Relations with Markov Logic”, in Proc. ACL-2009.

X. Ling & D. Weld, “Temporal Information Extraction”, in Proc. AAAI-2010.

Temporal Information Extraction

Page 112

Semantic Role Labeling

Problem: identify arguments for a predicate
Two steps:
  Argument identification: determine whether a phrase is an argument
  Role classification: determine the type of an argument (agent, theme, temporal, adjunct, etc.)

Page 113

Token(position, token)
DepPath(position, position, path)
IsPredicate(position)
Role(position, position, role!)
HasRole(position, position)

Token(i,+t) => IsPredicate(i)
DepPath(i,j,+p) => Role(i,j,+r)

HasRole(i,j) => IsPredicate(i)
IsPredicate(i) => Exist j HasRole(i,j)
HasRole(i,j) => Exist r Role(i,j,r)
Role(i,j,r) => HasRole(i,j)

Cf. K. Toutanova, A. Haghighi, C. Manning, “A global joint model for semantic role labeling”, in Computational Linguistics 2008.

Semantic Role Labeling

Page 114

Token(position, token)
DepPath(position, position, path)
IsPredicate(position)
Role(position, position, role!)
HasRole(position, position)
Sense(position, sense!)

Token(i,+t) => IsPredicate(i)
DepPath(i,j,+p) => Role(i,j,+r)
Sense(i,s) => IsPredicate(i)

HasRole(i,j) => IsPredicate(i)
IsPredicate(i) => Exist j HasRole(i,j)
HasRole(i,j) => Exist r Role(i,j,r)
Role(i,j,r) => HasRole(i,j)
Token(i,+t) ^ Role(i,j,+r) => Sense(i,+s)

More: I. Meza-Ruiz & S. Riedel, “Jointly Identifying Predicates, Arguments and Senses using Markov Logic”, in Proc. NAACL-2009.

Joint Semantic Role Labeling and Word Sense Disambiguation

Page 115

Practical Tips: Modeling

Add all unit clauses (the default)
How to handle uncertain data: R(x,y) ^ R'(x,y) (the "HMM trick")
Implications vs. conjunctions
  For soft correlation, conjunctions are often better
  Implication A => B is equivalent to !(A ^ !B); it shares cases with others like A => C and makes learning unnecessarily harder

Page 116

Practical Tips: Efficiency

Open/closed world assumptions
Low clause arities
Low numbers of constants
Short inference chains

Page 117

Practical Tips: Development

Start with easy components
Gradually expand to full task
Use the simplest MLN that works
Cycle: add/delete formulas, learn and test

Page 118

Overview

Motivation
Foundational areas
Markov logic
NLP applications
  Basics
  Supervised learning
  Unsupervised learning

Page 119

Unsupervised Learning: Why?

Virtually unlimited supply of unlabeled text
Labeling is expensive (cf. Penn Treebank)
Often difficult to label with consistency and high quality (e.g., semantic parses)
Emerging field: machine reading
  Extract knowledge from unstructured text with high precision/recall and minimal human effort
Check out the LBR Workshop (WS9) on Sunday

Page 120

Unsupervised Learning: How?

I.i.d. learning: sophisticated model requires more labeled data
Statistical relational learning: sophisticated model may require less labeled data
  Relational dependencies constrain problem space
  One formula is worth a thousand labels
Small amount of domain knowledge => large-scale joint inference

Page 121

Unsupervised Learning: How?

Ambiguities vary among objects
=> Joint inference: propagate information from unambiguous objects to ambiguous ones
E.g.:

G. W. Bush …
He …
Mrs. Bush …

Are they coreferent?

Page 122

Unsupervised Learning: How?

Ambiguities vary among objects
=> Joint inference: propagate information from unambiguous objects to ambiguous ones
E.g.:

G. W. Bush …
He …
Mrs. Bush …

"G. W. Bush" and "He": should be coreferent

Page 123

Unsupervised Learning: How?

Ambiguities vary among objects
=> Joint inference: propagate information from unambiguous objects to ambiguous ones
E.g.:

G. W. Bush …
He …
Mrs. Bush …

So "G. W. Bush" must be singular male!

Page 124

Unsupervised Learning: How?

Ambiguities vary among objects
=> Joint inference: propagate information from unambiguous objects to ambiguous ones
E.g.:

G. W. Bush …
He …
Mrs. Bush …

"Mrs. Bush" must be singular female!

Page 125

Unsupervised Learning: How?

Ambiguities vary among objects
=> Joint inference: propagate information from unambiguous objects to ambiguous ones
E.g.:

G. W. Bush …
He …
Mrs. Bush …

Verdict: "G. W. Bush" and "Mrs. Bush" are not coreferent!

Page 126

Parameter Learning

Marginalize out hidden variables z:

\frac{\partial}{\partial w_i} \log P(x) = E_{z \mid x}\left[ n_i(x, z) \right] - E_{x, z}\left[ n_i(x, z) \right]

E_{z|x}: sum over z, conditioned on observed x
E_{x,z}: summed over both x and z

Use MC-SAT to approximate both expectations
May also combine with contrastive estimation [Poon & Cherry & Toutanova, NAACL-2009]
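A schematic Python sketch of this gradient, with both expectations approximated by averaging over samples; the two samplers (in the slides, MC-SAT runs over z given x, and over x and z jointly) are passed in as stubs, and n(x, z) returning the vector of true-grounding counts is an assumption.

```python
def hidden_var_gradient(w, x, n, sample_z_given_x, sample_xz, k=100):
    """n(x, z) -> list of true-grounding counts, one per formula."""
    dim = len(w)
    e_cond = [0.0] * dim        # E_{z|x}[ n_i(x, z) ]
    e_full = [0.0] * dim        # E_{x,z}[ n_i(x, z) ]
    for _ in range(k):
        z = sample_z_given_x(x, w)
        x2, z2 = sample_xz(w)
        nc, nf = n(x, z), n(x2, z2)
        for i in range(dim):
            e_cond[i] += nc[i] / k
            e_full[i] += nf[i] / k
    return [c - f for c, f in zip(e_cond, e_full)]
```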

Page 127

Unsupervised Coreference Resolution

Head(mention, string)
Type(mention, type)
MentionOf(mention, entity)

Mixture model:
MentionOf(+m,+e)
Type(+m,+t)
Head(+m,+h) ^ MentionOf(+m,+e)

Joint inference formulas (enforce agreement):
MentionOf(a,e) ^ MentionOf(b,e) => (Type(a,t) <=> Type(b,t))
… (similarly for Number, Gender, etc.)

Page 128

Unsupervised Coreference Resolution

Head(mention, string)
Type(mention, type)
MentionOf(mention, entity)
Apposition(mention, mention)

Mixture model:
MentionOf(+m,+e)
Type(+m,+t)
Head(+m,+h) ^ MentionOf(+m,+e)

Joint inference formulas (enforce agreement):
MentionOf(a,e) ^ MentionOf(b,e) => (Type(a,t) <=> Type(b,t))
… (similarly for Number, Gender, etc.)

Joint inference formulas (leverage apposition):
Apposition(a,b) => (MentionOf(a,e) <=> MentionOf(b,e))

More: H. Poon and P. Domingos, "Joint Unsupervised Coreference Resolution with Markov Logic", in Proc. EMNLP-2008.

Page 129

Relational Clustering: Discover Unknown Predicates

Cluster relations along with objects
Use second-order Markov logic [Kok & Domingos, 2007, 2008]
Key idea: cluster combination determines likelihood of relations
  InClust(r,+c) ^ InClust(x,+a) ^ InClust(y,+b) => r(x,y)
Input: relational tuples extracted by TextRunner [Banko et al., 2007]
Output: semantic network

Page 130

Recursive Relational Clustering

Unsupervised semantic parsing [Poon & Domingos, EMNLP-2009]
Text => Knowledge
  Start directly from text
  Identify meaning units + resolve variations
  Use high-order Markov logic (variables over arbitrary lambda forms and their clusters)
End-to-end machine reading: read text, then answer questions

Page 131

Semantic Parsing

IL-4 protein induces CD11b

[Figure: the dependency tree (induces –nsubj→ protein –nn→ IL-4; induces –dobj→ CD11b) is mapped to the logical form INDUCE(e1), INDUCER(e1,e2), INDUCED(e1,e3), IL-4(e2), CD11B(e3)]

Structured prediction: partition + assignment

Page 132

Challenge: Same Meaning, Many Variations

IL-4 up-regulates CD11b

Protein IL-4 enhances the expression of CD11b

CD11b expression is induced by IL-4 protein

The cytokine interleukin-4 induces CD11b expression

IL-4’s up-regulation of CD11b, …

……

Page 133

Unsupervised Semantic Parsing

USP: recursively cluster arbitrary expressions composed with/by similar expressions

IL-4 induces CD11b

Protein IL-4 enhances the expression of CD11b

CD11b expression is enhanced by IL-4 protein

The cytokine interleukin-4 induces CD11b expression

IL-4’s up-regulation of CD11b, …

Page 134

Unsupervised Semantic Parsing

USP: recursively cluster arbitrary expressions composed with/by similar expressions

IL-4 induces CD11b

Protein IL-4 enhances the expression of CD11b

CD11b expression is enhanced by IL-4 protein

The cytokine interleukin-4 induces CD11b expression

IL-4’s up-regulation of CD11b, …

Cluster same forms at the atom level

Page 135

Unsupervised Semantic Parsing

USP: recursively cluster arbitrary expressions composed with/by similar expressions
Cluster forms in composition with same forms

IL-4 induces CD11b

Protein IL-4 enhances the expression of CD11b

CD11b expression is enhanced by IL-4 protein

The cytokine interleukin-4 induces CD11b expression

IL-4’s up-regulation of CD11b, …

Page 136

Unsupervised Semantic Parsing

USP: recursively cluster arbitrary expressions composed with/by similar expressions

IL-4 induces CD11b

Protein IL-4 enhances the expression of CD11b

CD11b expression is enhanced by IL-4 protein

The cytokine interleukin-4 induces CD11b expression

IL-4’s up-regulation of CD11b, …

Cluster forms in composition with same forms


Page 139

Unsupervised Semantic Parsing

Exponential prior on number of parameters
Event/object/property cluster mixtures:
  InClust(e,+c) ^ HasValue(e,+v)

Object/Event Cluster: INDUCE
  induces   0.1
  enhances  0.4
  …

Property Cluster: INDUCER
  Form:    nsubj 0.5, agent 0.4, …
  Filler:  IL-4 0.2, IL-8 0.1, …
  Number:  None 0.1, One 0.8, …
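As a rough illustration of how such a mixture scores an expression (a sketch assuming the multinomials above; log_score and the dictionaries are invented names, not Alchemy API):

    import math

    # Per-cluster multinomials over surface forms, from the slide.
    clusters = {
        "INDUCE":  {"induces": 0.1, "enhances": 0.4},
        "INDUCER": {"nsubj": 0.5, "agent": 0.4},
    }

    def log_score(form, cluster):
        # Log-weight of drawing this form from its cluster,
        # per InClust(e,+c) ^ HasValue(e,+v).
        return math.log(clusters[cluster][form])

    # A parse's score sums over its (form, cluster) assignments.
    parse = [("enhances", "INDUCE"), ("nsubj", "INDUCER")]
    print(sum(log_score(f, c) for f, c in parse))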

140

But … State Space Too Large

Coreference: #-clusters ≤ #-mentions
USP: #-clusters can be exp(#-tokens)
Also, meaning units are often small, and many clusters are singletons

Use combinatorial search
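To see why the space is exponential, note that any partition of an n-token dependency tree into connected subexpressions corresponds to cutting a subset of its n-1 edges; a quick sanity check (illustrative arithmetic only):

    # Each of the n-1 tree edges is independently cut or kept,
    # so there are 2^(n-1) partitions before any cluster assignment.
    def num_tree_partitions(n_tokens):
        return 2 ** (n_tokens - 1)

    for n in (10, 20, 40):
        print(n, num_tree_partitions(n))   # 512, 524288, ~5.5e11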

141

Inference: Hill-Climb Probability

Initialize
Search operator: lambda reduction

[Figure: dependency parse of "IL-4 protein induces CD11b" (induces –nsubj→ protein –nn→ IL-4; induces –dobj→ CD11B); lambda reduction composes "IL-4" and "protein" into one subexpression, with candidate cluster assignments shown as "?"]
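The outer loop is ordinary greedy hill-climbing; a minimal sketch (assuming hypothetical neighbors() and score() helpers; not the USP implementation):

    def hill_climb(parse, neighbors, score):
        # Repeatedly apply the best-scoring search operator
        # (e.g., a lambda reduction) until no move improves
        # the parse's probability.
        best, best_score = parse, score(parse)
        improved = True
        while improved:
            improved = False
            for cand in neighbors(best):
                s = score(cand)
                if s > best_score:
                    best, best_score, improved = cand, s, True
        return best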

142

Learning: Hill-Climb Likelihood

Initialize: one cluster per form
  { enhances: 1 }  { induces: 1 }  { protein: 1 }  { IL-4: 1 }

Search operators:
  MERGE   → { induces: 0.2, enhances: 0.8 }
  COMPOSE → { IL-4 protein: 1 }
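A sketch of the two operators on cluster count tables (toy Python; the counts mirror the slide, the function names are invented):

    from collections import Counter

    def merge(c1, c2):
        # MERGE: pool two clusters' counts; normalizing the result
        # yields the mixture weights (here induces 0.2, enhances 0.8).
        return c1 + c2

    def compose(form1, form2):
        # COMPOSE: treat a pair of forms as one new expression.
        return Counter({form1 + " " + form2: 1})

    print(merge(Counter({"induces": 1}), Counter({"enhances": 4})))
    print(compose("IL-4", "protein"))   # Counter({'IL-4 protein': 1})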

143

Unsupervised Ontology Induction

Limitations of USP:
  No ISA hierarchy among clusters
  Little smoothing
  Limited capability to generalize

OntoUSP [Poon & Domingos, ACL-2010]
  Extends USP to also induce an ISA hierarchy
  Joint approach to ontology induction, population, and knowledge extraction
  To appear in ACL (see you in Uppsala :-)

144

OntoUSP: modify the cluster mixture formula
  InClust(e,c) ^ ISA(c,+d) ^ HasValue(e,+v)
Hierarchical smoothing + clustering
New operator in learning: ABSTRACTION

[Figure: the ABSTRACTION operator takes sibling clusters INDUCE {induces 0.6, up-regulates 0.2} and INHIBIT {inhibits 0.4, suppresses 0.2} and creates an ISA parent {induces 0.3, enhances 0.1, inhibits 0.2, suppresses 0.1}; question: MERGE with REGULATE?]
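A rough sketch of ABSTRACTION plus hierarchical smoothing (illustrative Python; the uniform pooling and the interpolation weight are assumptions, not the OntoUSP model exactly):

    def abstraction(child_a, child_b):
        # Create an ISA parent whose distribution pools the
        # children's mass (uniform mixture, an assumption here).
        forms = set(child_a) | set(child_b)
        return {f: 0.5 * (child_a.get(f, 0) + child_b.get(f, 0)) for f in forms}

    def smooth(child, parent, lam=0.8):
        # Hierarchical smoothing: interpolate a child's distribution
        # with its ISA parent's, so rare forms get nonzero mass.
        return {f: lam * child.get(f, 0) + (1 - lam) * p
                for f, p in parent.items()}

    induce  = {"induces": 0.6, "up-regulates": 0.2}
    inhibit = {"inhibits": 0.4, "suppresses": 0.2}
    parent = abstraction(induce, inhibit)
    print(smooth(induce, parent))   # INDUCE now shares a little mass with INHIBIT's forms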

145

End of The Beginning …

Not merely a user guide to MLNs and Alchemy
Statistical relational learning:
  a growth area for machine learning and NLP

146

Future Work: Inference

Scale up inference
  Cutting-plane methods (e.g., [Riedel, 2008])
  Unify lifted inference with sampling
  Coarse-to-fine inference

Alternative technologies
  E.g., linear programming, Lagrangian relaxation

147

Future Work: Supervised Learning

Alternative optimization objectives
  E.g., max-margin learning [Huynh & Mooney, 2009]

Learning for efficient inference
  E.g., learning arithmetic circuits [Lowd & Domingos, 2008]

Structure learning: improve accuracy and scalability
  E.g., [Kok & Domingos, 2009]

148

Future Work: Unsupervised Learning

Model: learning objective, formalism, etc.
Learning: local optima, intractability, etc.
Hyperparameter tuning
Leverage available resources
  Semi-supervised learning
  Multi-task learning
  Transfer learning (e.g., domain adaptation)

Human in the loop
  E.g., interactive ML, active learning, crowdsourcing

149

Future Work: NLP Applications

Existing application areas:
  More joint inference opportunities
  Additional domain knowledge
  Combine multiple pipeline stages

A “killer app”: machine reading
Many, many more awaiting YOU to discover

150

Summary

We need to unify logical and statistical NLP
Markov logic provides a language for this
  Syntax: weighted first-order formulas
  Semantics: feature templates of Markov nets
  Inference: satisfiability, MCMC, lifted BP, etc.
  Learning: pseudo-likelihood, VP, PSCG, ILP, etc.

Growing set of NLP applications
Open-source software: Alchemy

Book: Domingos & Lowd, Markov Logic, Morgan & Claypool, 2009.

alchemy.cs.washington.edu

151

References

[Banko et al., 2007] Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead, Oren Etzioni, "Open Information Extraction from the Web", In Proc. IJCAI-2007.

[Chakrabarti et al., 1998] Soumen Chakrabarti, Byron Dom, Piotr Indyk, "Hypertext Classification Using Hyperlinks", In Proc. SIGMOD-1998.

[Damien et al., 1999] Paul Damien, Jon Wakefield, Stephen Walker, "Gibbs sampling for Bayesian non-conjugate and hierarchical models by auxiliary variables", Journal of the Royal Statistical Society B, 61:2.

[Domingos & Lowd, 2009] Pedro Domingos and Daniel Lowd, Markov Logic, Morgan & Claypool.

[Friedman et al., 1999] Nir Friedman, Lise Getoor, Daphne Koller, Avi Pfeffer, "Learning probabilistic relational models", In Proc. IJCAI-1999.

152

References

[Halpern, 1990] Joe Halpern, "An analysis of first-order logics of probability", Artificial Intelligence 46.

[Huynh & Mooney, 2009] Tuyen Huynh and Raymond Mooney, "Max-Margin Weight Learning for Markov Logic Networks", In Proc. ECML-2009.

[Kautz et al., 1997] Henry Kautz, Bart Selman, Yuejun Jiang, "A general stochastic approach to solving problems with hard and soft constraints", in The Satisfiability Problem: Theory and Applications, AMS.

[Kok & Domingos, 2007] Stanley Kok and Pedro Domingos, "Statistical Predicate Invention", In Proc. ICML-2007.

[Kok & Domingos, 2008] Stanley Kok and Pedro Domingos, "Extracting Semantic Networks from Text via Relational Clustering", In Proc. ECML-2008.

153

References

[Kok & Domingos, 2009] Stanley Kok and Pedro Domingos, "Learning Markov Logic Network Structure via Hypergraph Lifting", In Proc. ICML-2009.

[Ling & Weld, 2010] Xiao Ling and Daniel S. Weld, "Temporal Information Extraction", In Proc. AAAI-2010.

[Lowd & Domingos, 2007] Daniel Lowd and Pedro Domingos, "Efficient Weight Learning for Markov Logic Networks", In Proc. PKDD-2007.

[Lowd & Domingos, 2008] Daniel Lowd and Pedro Domingos, "Learning Arithmetic Circuits", In Proc. UAI-2008.

[Meza-Ruiz & Riedel, 2009] Ivan Meza-Ruiz and Sebastian Riedel, "Jointly Identifying Predicates, Arguments and Senses using Markov Logic", In Proc. NAACL-2009.

154

References

[Muggleton, 1996] Stephen Muggleton, "Stochastic logic programs", In Proc. ILP-1996.

[Nilsson, 1986] Nils Nilsson, "Probabilistic logic", Artificial Intelligence 28.

[Page et al., 1998] Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd, "The PageRank Citation Ranking: Bringing Order to the Web", Tech. Rept., Stanford University, 1998.

[Poon & Domingos, 2006] Hoifung Poon and Pedro Domingos, "Sound and Efficient Inference with Probabilistic and Deterministic Dependencies", In Proc. AAAI-2006.

[Poon & Domingos, 2007] Hoifung Poon and Pedro Domingos, "Joint Inference in Information Extraction", In Proc. AAAI-2007.

155

References

[Poon & Domingos, 2008a] Hoifung Poon, Pedro Domingos, Marc Sumner, "A General Method for Reducing the Complexity of Relational Inference and its Application to MCMC", In Proc. AAAI-2008.

[Poon & Domingos, 2008b] Hoifung Poon and Pedro Domingos, "Joint Unsupervised Coreference Resolution with Markov Logic", In Proc. EMNLP-2008.

[Poon & Domingos, 2009] Hoifung Poon and Pedro Domingos, "Unsupervised Semantic Parsing", In Proc. EMNLP-2009.

[Poon & Cherry & Toutanova, 2009] Hoifung Poon, Colin Cherry, Kristina Toutanova, "Unsupervised Morphological Segmentation with Log-Linear Models", In Proc. NAACL-2009.

156

References

[Poon & Vanderwende, 2010] Hoifung Poon and Lucy Vanderwende, "Joint Inference for Knowledge Extraction from Biomedical Literature", In Proc. NAACL-2010.

[Poon & Domingos, 2010] Hoifung Poon and Pedro Domingos, "Unsupervised Ontology Induction from Text", In Proc. ACL-2010.

[Riedel, 2008] Sebastian Riedel, "Improving the Accuracy and Efficiency of MAP Inference for Markov Logic", In Proc. UAI-2008.

[Riedel et al., 2009] Sebastian Riedel, Hong-Woo Chun, Toshihisa Takagi and Jun'ichi Tsujii, "A Markov Logic Approach to Bio-Molecular Event Extraction", In Proc. BioNLP 2009 Shared Task.

[Selman et al., 1996] Bart Selman, Henry Kautz, Bram Cohen, "Local search strategies for satisfiability testing", in Cliques, Coloring, and Satisfiability: Second DIMACS Implementation Challenge, AMS.

157

References

[Singla & Domingos, 2006a] Parag Singla and Pedro Domingos, "Memory-Efficient Inference in Relational Domains", In Proc. AAAI-2006.

[Singla & Domingos, 2006b] Parag Singla and Pedro Domingos, "Entity Resolution with Markov Logic", In Proc. ICDM-2006.

[Singla & Domingos, 2007] Parag Singla and Pedro Domingos, "Markov Logic in Infinite Domains", In Proc. UAI-2007.

[Singla & Domingos, 2008] Parag Singla and Pedro Domingos, "Lifted First-Order Belief Propagation", In Proc. AAAI-2008.

[Taskar et al., 2002] Ben Taskar, Pieter Abbeel, Daphne Koller, "Discriminative probabilistic models for relational data", In Proc. UAI-2002.

158

References

[Toutanova & Haghighi & Manning, 2008] Kristina Toutanova, Aria Haghighi, Chris Manning, "A global joint model for semantic role labeling", Computational Linguistics.

[Wang & Domingos, 2008] Jue Wang and Pedro Domingos, "Hybrid Markov Logic Networks", In Proc. AAAI-2008.

[Wellman et al., 1992] Michael Wellman, John S. Breese, Robert P. Goldman, "From knowledge bases to decision models", Knowledge Engineering Review 7.

[Yoshikawa et al., 2009] Katsumasa Yoshikawa, Sebastian Riedel, Masayuki Asahara and Yuji Matsumoto, "Jointly Identifying Temporal Relations with Markov Logic", In Proc. ACL-2009.