Unifying MAX SAT, Local Consistency Relaxations, and Soft Logic with Hinge-Loss MRFs
Stephen H. Bach (University of Maryland), Bert Huang (Virginia Tech), Lise Getoor (UC Santa Cruz)
Markov Random Fields
A probabilistic model for high-dimensional data:
- The random variables represent the data, such as whether a person has an attribute or whether a link exists
- The potentials score different configurations of the data
- The weights scale the influence of different potentials
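The slide's defining equation is not preserved in this transcript; the standard log-linear form it refers to, with potentials φ_j and weights w_j over an assignment x, is:

```latex
% Log-linear Markov random field: weights w_j scale potentials \phi_j,
% and Z normalizes over all assignments.
P(x) = \frac{1}{Z} \exp\!\left( \sum_j w_j \, \phi_j(x) \right),
\qquad
Z = \sum_{x'} \exp\!\left( \sum_j w_j \, \phi_j(x') \right)
```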
MRFs with Logic
One way to compactly define MRFs is with first-order logic, e.g., Markov logic networks
Use first-order logic to define templates for potentials:
- Ground out weighted rules over graph data
- The truth table of each ground rule is a potential
- Each first-order rule has a weight that becomes the potential's weight (see the grounding sketch below)
Richardson and Domingos, 2006
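A minimal sketch of grounding (not the authors' code): one weighted first-order rule from the later voter example, instantiated over a hypothetical two-person, two-party domain, producing one ground rule per instantiation. The constants anna and bob are invented for illustration.

```python
# A minimal sketch (not the authors' code) of grounding one weighted first-order
# rule template over a tiny, invented domain: each instantiation of the logical
# variables A, B, P yields one ground rule, i.e., one potential sharing the
# template's weight. The constants below are hypothetical.
from itertools import product

people = ["anna", "bob"]
parties = ["Republican", "Democrat"]

# Template (appears later in the talk): 0.8 : Votes(A, P) && Spouse(B, A) -> Votes(B, P)
weight = 0.8
ground_rules = []
for a, b, p in product(people, people, parties):
    if a == b:
        continue  # Spouse is irreflexive in this toy domain
    ground_rules.append((weight, f"Votes({a},{p}) && Spouse({b},{a}) -> Votes({b},{p})"))

# Each ground rule's truth table becomes one potential of the ground MRF.
for w, rule in ground_rules:
    print(w, ":", rule)
```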
MRFs with Logic
Consider a set of weighted logical rules, where each rule has the general form reconstructed below
- Each rule has a weight, and index sets identify which of its variables appear unnegated and which appear negated
Equivalent clausal form: each rule can be rewritten as a single weighted disjunctive clause (see below)
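A reconstruction of the missing formulas (our notation, following the standard weighted MAX SAT setup): each rule j has nonnegative weight w_j, and index sets I_j^+ and I_j^- pick out the variables that appear unnegated and negated.

```latex
% General form of a weighted rule and its equivalent clausal form
% (reconstructed notation: I_j^+ indexes unnegated variables, I_j^- negated ones).
w_j : \;\; \bigwedge_{i \in I_j^-} x_i \;\rightarrow\; \bigvee_{i \in I_j^+} x_i
\qquad \Longleftrightarrow \qquad
w_j : \;\; \bigvee_{i \in I_j^+} x_i \;\vee\; \bigvee_{i \in I_j^-} \neg x_i
```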
MAP Inference
MAP (maximum a posteriori) inference seeks a most-probable assignment to the unobserved variables
For logical MRFs, MAP inference is a weighted MAX SAT problem (see below)
This MAX SAT problem is combinatorial and NP-hard!
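In the reconstructed notation above, MAP inference maximizes the total weight of satisfied clauses, i.e., a weighted MAX SAT problem:

```latex
% MAP inference for a logical MRF as weighted MAX SAT:
% maximize the total weight of clauses whose disjunction evaluates to true.
\arg\max_{x \in \{0,1\}^n} \; \sum_j w_j \;
\mathbf{1}\!\left[ \bigvee_{i \in I_j^+} x_i \;\vee\; \bigvee_{i \in I_j^-} \neg x_i \right]
```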
Approximate Inference
View MAP inference as optimizing rounding probabilities
The expected score of a clause is a weighted noisy-or function
Then the expected total score is the sum over clauses (see below)
But this expected total score is highly non-convex!
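Writing p_i for the probability of independently rounding variable x_i to true, the missing formulas are, in the reconstructed notation:

```latex
% Expected score of clause j under independent randomized rounding (a weighted
% noisy-or), and the expected total score that is then maximized over p.
\mathbb{E}\big[\text{score}_j\big] = w_j \left( 1 - \prod_{i \in I_j^+} (1 - p_i) \prod_{i \in I_j^-} p_i \right),
\qquad
\mathbb{E}\big[\text{total score}\big] = \sum_j \mathbb{E}\big[\text{score}_j\big]
```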
Approximate Inference
It is the products in the objective that make it non-convex
The expected score can be lower bounded using the relationship between arithmetic and geometric means
This leads to the lower bound shown below
Goemans and Williamson, 1994
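The bound (our reconstruction of the Goemans and Williamson argument) replaces each noisy-or product with a concave min term, at the cost of a constant factor:

```latex
% Lower bound on the expected clause score (after Goemans and Williamson, 1994);
% the constant 1 - 1/e comes from the worst case over clause lengths.
w_j \left( 1 - \prod_{i \in I_j^+} (1 - p_i) \prod_{i \in I_j^-} p_i \right)
\;\;\geq\;\;
\left( 1 - \tfrac{1}{e} \right) w_j \,
\min\!\left\{ 1, \; \sum_{i \in I_j^+} p_i + \sum_{i \in I_j^-} (1 - p_i) \right\}
```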
Approximate Inference
So, we solve the linear program shown below
If we round each variable to true with probability equal to its LP value, a greedy rounding method will find a (1 − 1/e)-optimal discrete solution
If we instead shift each rounding probability toward ½ (for example, half the LP value plus ¼), it improves to ¾-optimal
Goemans and Williamson, 1994
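Dropping the constant factor from each bound and relaxing the variables to [0, 1] gives the linear program referred to above (one auxiliary variable per clause turns the min terms into linear constraints):

```latex
% MAX SAT relaxation solved before rounding; y is the vector of relaxed
% truth values, and \hat{y} denotes its optimizer, used as rounding probabilities.
\hat{y} \;=\; \arg\max_{y \in [0,1]^n} \; \sum_j w_j \,
\min\!\left\{ 1, \; \sum_{i \in I_j^+} y_i + \sum_{i \in I_j^-} (1 - y_i) \right\}
```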
Local Consistency Relaxation
LCR is a popular technique for approximating MAP in MRFs
- Often simply called linear programming (LP) relaxation
- Dual decomposition solves the dual of the LCR objective
Idea: relax the search over consistent marginals to a simpler set
LCR admits fast message-passing algorithms, but no quality guarantees in general
Wainwright and Jordan 2008
Local Consistency Relaxation
The relaxed problem optimizes over two sets of pseudomarginals:
- pseudomarginals over variable states
- pseudomarginals over joint potential states
Wainwright and Jordan 2008
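In standard notation (ours; the slide's own symbols did not survive this transcript), write τ_i for the pseudomarginals over states of variable i and τ_j for the pseudomarginals over joint states of the variables in potential j. The first-order local consistency relaxation is then:

```latex
% Local consistency relaxation of MAP inference (after Wainwright and Jordan 2008).
% \theta_j(x_j) are the potential scores; the constraints enforce only local
% (single-variable) consistency between \tau_j and \tau_i, not global consistency.
\max_{\tau \ge 0} \; \sum_j \sum_{x_j} \tau_j(x_j)\, \theta_j(x_j)
\quad \text{s.t.} \quad
\sum_{x_j : x_j[i] = x_i} \tau_j(x_j) = \tau_i(x_i) \;\; \forall j, \; i \in j, \; x_i,
\qquad
\sum_{x_i} \tau_i(x_i) = 1 \;\; \forall i
```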
Analysis
We can now analyze each potential's parameterized subproblem in isolation: with the variable pseudomarginals held fixed, each potential's pseudomarginals can be optimized independently
Using the KKT conditions, we can find a simplified expression for each subproblem's solution in terms of those fixed variable pseudomarginals
Bach et al. AISTATS 2015
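Our paraphrase of the resulting expression (following Bach et al. AISTATS 2015, in the reconstructed notation), with μ_i denoting the pseudomarginal probability that variable i is true: each potential's subproblem attains exactly the concave min term from the MAX SAT relaxation, which is what makes the two relaxations coincide.

```latex
% Value of potential j's subproblem at its KKT point, with the variable
% pseudomarginals \mu_i held fixed (reconstructed statement).
w_j \, \min\!\left\{ 1, \; \sum_{i \in I_j^+} \mu_i \;+\; \sum_{i \in I_j^-} (1 - \mu_i) \right\}
```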
Consequences
The MAX SAT relaxation can be solved with a choice of algorithms
The rounding guarantees apply to LCR!
Bach et al. AISTATS 2015
Continuous Values
Continuous values can also be interpreted as similarities
Or as degrees of truth: Łukasiewicz logic is a fuzzy logic for reasoning about imprecise concepts
Bach et al. In Preparation
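For reference (these operators are not spelled out in the transcript), Łukasiewicz logic relaxes the Boolean connectives to truth values in [0, 1] as follows:

```latex
% Łukasiewicz fuzzy logic operators over truth values x, y in [0, 1].
x \,\tilde{\wedge}\, y = \max\{x + y - 1,\; 0\}, \qquad
x \,\tilde{\vee}\, y = \min\{x + y,\; 1\}, \qquad
\tilde{\neg} x = 1 - x
```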
All Three are Equivalent
- Local Consistency Relaxation
- MAX SAT Relaxation
- Exact MAX SAT for Łukasiewicz logic
Bach et al. In Preparation
Consequences
Exact MAX SAT for Łukasiewicz logic is equivalent to relaxed Boolean MAX SAT and to the local consistency relaxation for logical MRFs
So these scalable message-passing algorithms can also be used to reason about similarity, imprecise concepts, etc.!
Bach et al. In Preparation
Generalizing Relaxed MRFs
Relaxed, logic-based MRFs can reason about both discrete and continuous relational data scalably and accurately
Define a new distribution over continuous variables (reconstructed below)
We can generalize this inference objective to be the energy of a new type of MRF that does even more
Bach et al. NIPS 12, Bach et al. UAI 13
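A reconstruction of that distribution: the relaxed objective is rewritten so that each clause contributes a hinge-shaped penalty when it is (partially) unsatisfied, and the result is used as an energy over continuous variables y ∈ [0, 1]^n.

```latex
% Distribution over continuous variables whose energy is the relaxed MAX SAT
% objective; w_j min{1, s_j} as a reward is equivalent to w_j max{1 - s_j, 0}
% as a penalty, up to an additive constant absorbed by normalization.
P(y) \;\propto\; \exp\!\left( - \sum_j w_j \,
\max\!\left\{ 1 - \sum_{i \in I_j^+} y_i - \sum_{i \in I_j^-} (1 - y_i), \; 0 \right\} \right),
\qquad y \in [0,1]^n
```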
Generalizations
- Arbitrary hinge-loss functions (not just logical clauses)
- Hard linear constraints
- Squared hinge losses
Bach et al. NIPS 12, Bach et al. UAI 13
Hinge-Loss MRFs
Define hinge-loss MRFs by using this generalized objective as the energy function
Bach et al. NIPS 12, Bach et al. UAI 13
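Combining the generalizations above, the standard hinge-loss MRF definition from the cited papers is a density over continuous variables y, conditioned on observations x:

```latex
% Hinge-loss MRF: each potential is a hinge (p_j = 1) or squared hinge (p_j = 2)
% of a linear function \ell_j of y and x; hard linear constraints on y define
% the feasible set, and MAP inference is therefore a convex problem.
P(y \mid x) = \frac{1}{Z(x)} \exp\!\left( - \sum_j w_j \, \phi_j(y, x) \right),
\qquad
\phi_j(y, x) = \big( \max\{ \ell_j(y, x), \, 0 \} \big)^{p_j}, \;\; p_j \in \{1, 2\}
```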
HL-MRF Inference and Learning
MAP inference for HL-MRFs is always a convex optimization
- Highly scalable ADMM algorithm for MAP
Supervised learning
- No need to hand-tune weights
- Learn from training data
- Also highly scalable
Unsupervised and semi-supervised learning
- New learning algorithm that interleaves inference and parameter updates to cut learning time by as much as 90% (under review)
Bach et al. NIPS 12, Bach et al. UAI 13
More in Bert's talk
Probabilistic Soft Logic (PSL)
A probabilistic programming language for defining HL-MRFs
PSL components:
- Predicates: relationships or properties
- Atoms: (continuous) random variables
- Rules: potentials
Example: Voter Identification
[Figure: a person whose vote is unknown (?), with observed donations ($$), tweets, and status updates]
5.0 : Donates(A, “Republican”) -> Votes(A, “Republican”)
0.3 : Mentions(A, “Affordable Care”) -> Votes(A, “Democrat”)
Votes(A, “Republican”) + Votes(A, “Democrat”) = 1.0 .
Example: Voter Identification
0.8 : Votes(A,P) && Spouse(B,A) -> Votes(B,P)
0.3 : Votes(A,P) && Friend(B,A) -> Votes(B,P)
[Figure: a social network with edges labeled spouse, friend, and colleague]
Example: Voter Identification

/* Predicate definitions */
Votes(Person, Party)
Donates(Person, Party) (closed)
Mentions(Person, Term) (closed)
Colleague(Person, Person) (closed)
Friend(Person, Person) (closed)
Spouse(Person, Person) (closed)

/* Local rules */
5.0 : Donates(A, P) -> Votes(A, P)
0.3 : Mentions(A, “Affordable Care”) -> Votes(A, “Democrat”)
0.3 : Mentions(A, “Tax Cuts”) -> Votes(A, “Republican”)
...

/* Relational rules */
1.0 : Votes(A,P) && Spouse(B,A) -> Votes(B,P)
0.3 : Votes(A,P) && Friend(B,A) -> Votes(B,P)
0.1 : Votes(A,P) && Colleague(B,A) -> Votes(B,P)

/* Range constraint */
Votes(A, “Republican”) + Votes(A, “Democrat”) = 1.0 .
PSL Defines HL-MRFs
[The same PSL program as on the previous slide; each ground rule contributes one hinge-loss potential or linear constraint to the HL-MRF energy]
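A minimal, hypothetical sketch of what such a ground program defines (not PSL's actual implementation or API): with plain hinge potentials, MAP inference over the ground atoms is a linear program, shown here for an invented two-person instance (anna donates Republican, bob is anna's spouse) using scipy. PSL itself solves the same kind of convex problem with consensus ADMM rather than a generic LP solver.

```python
# A minimal, hypothetical sketch (not PSL's code) of MAP inference for a tiny
# ground voter model. All plain hinge potentials -> the problem is an LP.
# The instance (anna, bob) and the solver choice are invented for illustration.
import numpy as np
from scipy.optimize import linprog

# Variables: y = [Votes(anna,R), Votes(anna,D), Votes(bob,R), Votes(bob,D)]
# Slacks:    s = [s1, s2, s3], one per ground hinge-loss potential.
# Ground potentials (observed atoms already substituted with their values):
#   5.0 * max{1 - Votes(anna,R), 0}             from Donates(anna,R) -> Votes(anna,R)
#   0.8 * max{Votes(anna,R) - Votes(bob,R), 0}  from Votes(A,P) && Spouse(B,A) -> Votes(B,P)
#   0.8 * max{Votes(anna,D) - Votes(bob,D), 0}
weights = np.array([5.0, 0.8, 0.8])

# Objective: minimize the weighted sum of slacks (the y variables have zero cost).
c = np.concatenate([np.zeros(4), weights])

# Inequalities A_ub @ [y, s] <= b_ub encode s_j >= (linear part of each hinge).
A_ub = np.array([
    [-1,  0,  0,  0, -1,  0,  0],   # 1 - y0 <= s1
    [ 1,  0, -1,  0,  0, -1,  0],   # y0 - y2 <= s2
    [ 0,  1,  0, -1,  0,  0, -1],   # y1 - y3 <= s3
])
b_ub = np.array([-1.0, 0.0, 0.0])

# Hard range constraints: Votes(x,R) + Votes(x,D) = 1 for each person.
A_eq = np.array([
    [1, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0],
])
b_eq = np.array([1.0, 1.0])

bounds = [(0, 1)] * 4 + [(0, None)] * 3

result = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print("MAP assignment:", dict(zip(
    ["Votes(anna,R)", "Votes(anna,D)", "Votes(bob,R)", "Votes(bob,D)"],
    np.round(result.x[:4], 3))))
```

Squared hinges would make the objective quadratic rather than linear, but MAP inference stays convex.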
Open Source Implementation
PSL is implemented as an open source library and programming-language interface
It's ready to use for your next project
Some of the other groups already using PSL:
- Jure Leskovec (Stanford) [West et al., TACL 14]
- Dan Jurafsky (Stanford) [Li et al., ArXiv 14]
- Ray Mooney (UT Austin) [Beltagy et al., ACL 14]
- Kevin Murphy (Google) [Pujara et al., BayLearn 14]
Other PSL Topics
Not discussed here:
- Smart grounding
- Lazy inference
- Distributed inference and learning
Future work:
- Lifted inference
- Generalized rounding guarantees
PSL Empirical Highlights
Compared with discrete MRFs:

              Collective Classification     Trust Prediction
  PSL         81.8%,   0.7 sec              .482 AuPR,   0.32 sec
  Discrete    79.7%, 184.3 sec              .441 AuPR, 212.36 sec

Predicting MOOC outcomes via latent engagement (AuPR):

                Tech     Women-Civil     Genes
  Lecture Rank  0.333    0.508           0.688
  PSL-Direct    0.393    0.565           0.757
  PSL-Latent    0.546    0.816           0.818

Bach et al. UAI 13, Ramesh et al. AAAI 14
PSL Empirical Highlights
Improved activity recognition in video:

               5 Activities        6 Activities
  HOG          47.4%, .481 F1      59.6%, .582 F1
  PSL + HOG    59.8%, .603 F1      79.3%, .789 F1
  ACD          67.5%, .678 F1      83.5%, .835 F1
  PSL + ACD    69.2%, .693 F1      86.0%, .860 F1

Compared on drug-target interaction prediction:

                      AuPR          P@130
  Perlman's Method    .564 ± .05    .594 ± .04
  PSL                 .617 ± .05    .616 ± .04

London et al. CVPR WS 13, Fakhraei et al. TCBB 14
Conclusion
HL-MRFs unite and generalize different ways of viewing fundamental AI problems, including MAX SAT, probabilistic graphical models, and fuzzy logic
PSL's mix of expressivity and scalability makes it a general-purpose tool for network analysis, bioinformatics, NLP, computer vision, and more
Many exciting open problems, from algorithms, to theory, to applications