Unifying MAX SAT, Local Consistency Relaxations, and Soft Logic with Hinge-Loss MRFs
Stephen H. Bach (University of Maryland), Bert Huang (Virginia Tech), Lise Getoor (UC Santa Cruz)
Markov Random Fields
A probabilistic model for high-dimensional data:
- The random variables represent the data, such as whether a person has an attribute or whether a link exists
- The potentials score different configurations of the data
- The weights scale the influence of different potentials
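The slide's defining equation is not preserved in this transcript; the standard log-linear form it refers to, with potentials φ_j and weights w_j over an assignment x, is:

```latex
% Log-linear Markov random field: weights w_j scale potentials \phi_j,
% and Z normalizes over all assignments.
P(x) = \frac{1}{Z} \exp\!\left( \sum_j w_j \, \phi_j(x) \right),
\qquad
Z = \sum_{x'} \exp\!\left( \sum_j w_j \, \phi_j(x') \right)
```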
MRFs with Logic
One way to compactly define MRFs is with first-order logic, e.g., Markov logic networks
Use first-order logic to define templates for potentials:
- Ground out weighted rules over graph data
- The truth table of each ground rule is a potential
- Each first-order rule has a weight that becomes the potential's weight (see the grounding sketch below)
Richardson and Domingos, 2006
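A minimal sketch of grounding (not the authors' code): one weighted first-order rule from the later voter example, instantiated over a hypothetical two-person, two-party domain, producing one ground rule per instantiation. The constants anna and bob are invented for illustration.

```python
# A minimal sketch (not the authors' code) of grounding one weighted first-order
# rule template over a tiny, invented domain: each instantiation of the logical
# variables A, B, P yields one ground rule, i.e., one potential sharing the
# template's weight. The constants below are hypothetical.
from itertools import product

people = ["anna", "bob"]
parties = ["Republican", "Democrat"]

# Template (appears later in the talk): 0.8 : Votes(A, P) && Spouse(B, A) -> Votes(B, P)
weight = 0.8
ground_rules = []
for a, b, p in product(people, people, parties):
    if a == b:
        continue  # Spouse is irreflexive in this toy domain
    ground_rules.append((weight, f"Votes({a},{p}) && Spouse({b},{a}) -> Votes({b},{p})"))

# Each ground rule's truth table becomes one potential of the ground MRF.
for w, rule in ground_rules:
    print(w, ":", rule)
```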
MRFs with Logic
Consider a set of weighted logical rules, where each rule has the general form reconstructed below
- Each rule has a weight, and index sets identify which of its variables appear unnegated and which appear negated
Equivalent clausal form: each rule can be rewritten as a single weighted disjunctive clause (see below)
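A reconstruction of the missing formulas (our notation, following the standard weighted MAX SAT setup): each rule j has nonnegative weight w_j, and index sets I_j^+ and I_j^- pick out the variables that appear unnegated and negated.

```latex
% General form of a weighted rule and its equivalent clausal form
% (reconstructed notation: I_j^+ indexes unnegated variables, I_j^- negated ones).
w_j : \;\; \bigwedge_{i \in I_j^-} x_i \;\rightarrow\; \bigvee_{i \in I_j^+} x_i
\qquad \Longleftrightarrow \qquad
w_j : \;\; \bigvee_{i \in I_j^+} x_i \;\vee\; \bigvee_{i \in I_j^-} \neg x_i
```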
MAP Inference
MAP (maximum a posteriori) inference seeks a most-probable assignment to the unobserved variables
For logical MRFs, MAP inference is a weighted MAX SAT problem (see below)
This MAX SAT problem is combinatorial and NP-hard!
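In the reconstructed notation above, MAP inference maximizes the total weight of satisfied clauses, i.e., a weighted MAX SAT problem:

```latex
% MAP inference for a logical MRF as weighted MAX SAT:
% maximize the total weight of clauses whose disjunction evaluates to true.
\arg\max_{x \in \{0,1\}^n} \; \sum_j w_j \;
\mathbf{1}\!\left[ \bigvee_{i \in I_j^+} x_i \;\vee\; \bigvee_{i \in I_j^-} \neg x_i \right]
```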
Approximate Inference
View MAP inference as optimizing rounding probabilities
The expected score of a clause is a weighted noisy-or function
Then the expected total score is the sum over clauses (see below)
But this expected total score is highly non-convex!
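Writing p_i for the probability of independently rounding variable x_i to true, the missing formulas are, in the reconstructed notation:

```latex
% Expected score of clause j under independent randomized rounding (a weighted
% noisy-or), and the expected total score that is then maximized over p.
\mathbb{E}\big[\text{score}_j\big] = w_j \left( 1 - \prod_{i \in I_j^+} (1 - p_i) \prod_{i \in I_j^-} p_i \right),
\qquad
\mathbb{E}\big[\text{total score}\big] = \sum_j \mathbb{E}\big[\text{score}_j\big]
```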
Approximate Inference
It is the products in the objective that make it non-convex
The expected score can be lower bounded using the relationship between arithmetic and geometric means
This leads to the lower bound shown below
Goemans and Williamson, 1994
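The bound (our reconstruction of the Goemans and Williamson argument) replaces each noisy-or product with a concave min term, at the cost of a constant factor:

```latex
% Lower bound on the expected clause score (after Goemans and Williamson, 1994);
% the constant 1 - 1/e comes from the worst case over clause lengths.
w_j \left( 1 - \prod_{i \in I_j^+} (1 - p_i) \prod_{i \in I_j^-} p_i \right)
\;\;\geq\;\;
\left( 1 - \tfrac{1}{e} \right) w_j \,
\min\!\left\{ 1, \; \sum_{i \in I_j^+} p_i + \sum_{i \in I_j^-} (1 - p_i) \right\}
```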
Approximate Inference
So, we solve the linear program shown below
If we round each variable to true with probability equal to its LP value, a greedy rounding method will find a (1 − 1/e)-optimal discrete solution
If we instead shift each rounding probability toward ½ (for example, half the LP value plus ¼), it improves to ¾-optimal
Goemans and Williamson, 1994
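Dropping the constant factor from each bound and relaxing the variables to [0, 1] gives the linear program referred to above (one auxiliary variable per clause turns the min terms into linear constraints):

```latex
% MAX SAT relaxation solved before rounding; y is the vector of relaxed
% truth values, and \hat{y} denotes its optimizer, used as rounding probabilities.
\hat{y} \;=\; \arg\max_{y \in [0,1]^n} \; \sum_j w_j \,
\min\!\left\{ 1, \; \sum_{i \in I_j^+} y_i + \sum_{i \in I_j^-} (1 - y_i) \right\}
```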
Local Consistency Relaxation
LCR is a popular technique for approximating MAP in MRFs
- Often simply called linear programming (LP) relaxation
- Dual decomposition solves the dual of the LCR objective
Idea: relax the search over consistent marginals to a simpler set
LCR admits fast message-passing algorithms, but no quality guarantees in general
Wainwright and Jordan 2008
Local Consistency Relaxation
The relaxed problem optimizes over two sets of pseudomarginals:
- pseudomarginals over variable states
- pseudomarginals over joint potential states
Wainwright and Jordan 2008
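In standard notation (ours; the slide's own symbols did not survive this transcript), write τ_i for the pseudomarginals over states of variable i and τ_j for the pseudomarginals over joint states of the variables in potential j. The first-order local consistency relaxation is then:

```latex
% Local consistency relaxation of MAP inference (after Wainwright and Jordan 2008).
% \theta_j(x_j) are the potential scores; the constraints enforce only local
% (single-variable) consistency between \tau_j and \tau_i, not global consistency.
\max_{\tau \ge 0} \; \sum_j \sum_{x_j} \tau_j(x_j)\, \theta_j(x_j)
\quad \text{s.t.} \quad
\sum_{x_j : x_j[i] = x_i} \tau_j(x_j) = \tau_i(x_i) \;\; \forall j, \; i \in j, \; x_i,
\qquad
\sum_{x_i} \tau_i(x_i) = 1 \;\; \forall i
```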
Analysis
We can now analyze each potential's parameterized subproblem in isolation: with the variable pseudomarginals held fixed, each potential's pseudomarginals can be optimized independently
Using the KKT conditions, we can find a simplified expression for each subproblem's solution in terms of those fixed variable pseudomarginals
Bach et al. AISTATS 2015
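Our paraphrase of the resulting expression (following Bach et al. AISTATS 2015, in the reconstructed notation), with μ_i denoting the pseudomarginal probability that variable i is true: each potential's subproblem attains exactly the concave min term from the MAX SAT relaxation, which is what makes the two relaxations coincide.

```latex
% Value of potential j's subproblem at its KKT point, with the variable
% pseudomarginals \mu_i held fixed (reconstructed statement).
w_j \, \min\!\left\{ 1, \; \sum_{i \in I_j^+} \mu_i \;+\; \sum_{i \in I_j^-} (1 - \mu_i) \right\}
```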
Consequences
The MAX SAT relaxation can be solved with a choice of algorithms
The rounding guarantees apply to LCR!
Bach et al. AISTATS 2015
Continuous Values
Continuous values can also be interpreted as similarities
Or as degrees of truth: Łukasiewicz logic is a fuzzy logic for reasoning about imprecise concepts
Bach et al. In Preparation
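For reference (these operators are not spelled out in the transcript), Łukasiewicz logic relaxes the Boolean connectives to truth values in [0, 1] as follows:

```latex
% Łukasiewicz fuzzy logic operators over truth values x, y in [0, 1].
x \,\tilde{\wedge}\, y = \max\{x + y - 1,\; 0\}, \qquad
x \,\tilde{\vee}\, y = \min\{x + y,\; 1\}, \qquad
\tilde{\neg} x = 1 - x
```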
All Three are Equivalent
- Local Consistency Relaxation
- MAX SAT Relaxation
- Exact MAX SAT for Łukasiewicz logic
Bach et al. In Preparation
Consequences
Exact MAX SAT for Łukasiewicz logic is equivalent to relaxed Boolean MAX SAT and to the local consistency relaxation for logical MRFs
So these scalable message-passing algorithms can also be used to reason about similarity, imprecise concepts, etc.!
Bach et al. In Preparation
Generalizing Relaxed MRFs
Relaxed, logic-based MRFs can reason about both discrete and continuous relational data scalably and accurately
Define a new distribution over continuous variables (reconstructed below)
We can generalize this inference objective to be the energy of a new type of MRF that does even more
Bach et al. NIPS 12, Bach et al. UAI 13
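A reconstruction of that distribution: the relaxed objective is rewritten so that each clause contributes a hinge-shaped penalty when it is (partially) unsatisfied, and the result is used as an energy over continuous variables y ∈ [0, 1]^n.

```latex
% Distribution over continuous variables whose energy is the relaxed MAX SAT
% objective; w_j min{1, s_j} as a reward is equivalent to w_j max{1 - s_j, 0}
% as a penalty, up to an additive constant absorbed by normalization.
P(y) \;\propto\; \exp\!\left( - \sum_j w_j \,
\max\!\left\{ 1 - \sum_{i \in I_j^+} y_i - \sum_{i \in I_j^-} (1 - y_i), \; 0 \right\} \right),
\qquad y \in [0,1]^n
```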
Generalizations
- Arbitrary hinge-loss functions (not just logical clauses)
- Hard linear constraints
- Squared hinge losses
Bach et al. NIPS 12, Bach et al. UAI 13
Hinge-Loss MRFs
Define hinge-loss MRFs by using this generalized objective as the energy function
Bach et al. NIPS 12, Bach et al. UAI 13
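Combining the generalizations above, the standard hinge-loss MRF definition from the cited papers is a density over continuous variables y, conditioned on observations x:

```latex
% Hinge-loss MRF: each potential is a hinge (p_j = 1) or squared hinge (p_j = 2)
% of a linear function \ell_j of y and x; hard linear constraints on y define
% the feasible set, and MAP inference is therefore a convex problem.
P(y \mid x) = \frac{1}{Z(x)} \exp\!\left( - \sum_j w_j \, \phi_j(y, x) \right),
\qquad
\phi_j(y, x) = \big( \max\{ \ell_j(y, x), \, 0 \} \big)^{p_j}, \;\; p_j \in \{1, 2\}
```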
HL-MRF Inference and Learning
MAP inference for HL-MRFs is always a convex optimization
- Highly scalable ADMM algorithm for MAP
Supervised learning
- No need to hand-tune weights
- Learn from training data
- Also highly scalable
Unsupervised and semi-supervised learning
- New learning algorithm that interleaves inference and parameter updates to cut learning time by as much as 90% (under review)
Bach et al. NIPS 12, Bach et al. UAI 13
More in Bert's talk
Probabilistic Soft Logic (PSL)
A probabilistic programming language for defining HL-MRFs
PSL components:
- Predicates: relationships or properties
- Atoms: (continuous) random variables
- Rules: potentials
Example: Voter Identification
[Figure: a person whose vote is unknown (?), with observed donations ($$), tweets, and status updates]
5.0 : Donates(A, “Republican”) -> Votes(A, “Republican”)
0.3 : Mentions(A, “Affordable Care”) -> Votes(A, “Democrat”)
Votes(A, “Republican”) + Votes(A, “Democrat”) = 1.0 .
Example: Voter Identification
0.8 : Votes(A,P) && Spouse(B,A) -> Votes(B,P)
0.3 : Votes(A,P) && Friend(B,A) -> Votes(B,P)
[Figure: a social network with edges labeled spouse, friend, and colleague]
Example: Voter Identification

/* Predicate definitions */
Votes(Person, Party)
Donates(Person, Party) (closed)
Mentions(Person, Term) (closed)
Colleague(Person, Person) (closed)
Friend(Person, Person) (closed)
Spouse(Person, Person) (closed)

/* Local rules */
5.0 : Donates(A, P) -> Votes(A, P)
0.3 : Mentions(A, “Affordable Care”) -> Votes(A, “Democrat”)
0.3 : Mentions(A, “Tax Cuts”) -> Votes(A, “Republican”)
...

/* Relational rules */
1.0 : Votes(A,P) && Spouse(B,A) -> Votes(B,P)
0.3 : Votes(A,P) && Friend(B,A) -> Votes(B,P)
0.1 : Votes(A,P) && Colleague(B,A) -> Votes(B,P)

/* Range constraint */
Votes(A, “Republican”) + Votes(A, “Democrat”) = 1.0 .
PSL Defines HL-MRFs
[The same PSL program as on the previous slide; each ground rule contributes one hinge-loss potential or linear constraint to the HL-MRF energy]
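A minimal, hypothetical sketch of what such a ground program defines (not PSL's actual implementation or API): with plain hinge potentials, MAP inference over the ground atoms is a linear program, shown here for an invented two-person instance (anna donates Republican, bob is anna's spouse) using scipy. PSL itself solves the same kind of convex problem with consensus ADMM rather than a generic LP solver.

```python
# A minimal, hypothetical sketch (not PSL's code) of MAP inference for a tiny
# ground voter model. All plain hinge potentials -> the problem is an LP.
# The instance (anna, bob) and the solver choice are invented for illustration.
import numpy as np
from scipy.optimize import linprog

# Variables: y = [Votes(anna,R), Votes(anna,D), Votes(bob,R), Votes(bob,D)]
# Slacks:    s = [s1, s2, s3], one per ground hinge-loss potential.
# Ground potentials (observed atoms already substituted with their values):
#   5.0 * max{1 - Votes(anna,R), 0}             from Donates(anna,R) -> Votes(anna,R)
#   0.8 * max{Votes(anna,R) - Votes(bob,R), 0}  from Votes(A,P) && Spouse(B,A) -> Votes(B,P)
#   0.8 * max{Votes(anna,D) - Votes(bob,D), 0}
weights = np.array([5.0, 0.8, 0.8])

# Objective: minimize the weighted sum of slacks (the y variables have zero cost).
c = np.concatenate([np.zeros(4), weights])

# Inequalities A_ub @ [y, s] <= b_ub encode s_j >= (linear part of each hinge).
A_ub = np.array([
    [-1,  0,  0,  0, -1,  0,  0],   # 1 - y0 <= s1
    [ 1,  0, -1,  0,  0, -1,  0],   # y0 - y2 <= s2
    [ 0,  1,  0, -1,  0,  0, -1],   # y1 - y3 <= s3
])
b_ub = np.array([-1.0, 0.0, 0.0])

# Hard range constraints: Votes(x,R) + Votes(x,D) = 1 for each person.
A_eq = np.array([
    [1, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0],
])
b_eq = np.array([1.0, 1.0])

bounds = [(0, 1)] * 4 + [(0, None)] * 3

result = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print("MAP assignment:", dict(zip(
    ["Votes(anna,R)", "Votes(anna,D)", "Votes(bob,R)", "Votes(bob,D)"],
    np.round(result.x[:4], 3))))
```

Squared hinges would make the objective quadratic rather than linear, but MAP inference stays convex.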
Open Source Implementation
PSL is implemented as an open source library and programming-language interface
It's ready to use for your next project
Some of the other groups already using PSL:
- Jure Leskovec (Stanford) [West et al., TACL 14]
- Dan Jurafsky (Stanford) [Li et al., ArXiv 14]
- Ray Mooney (UT Austin) [Beltagy et al., ACL 14]
- Kevin Murphy (Google) [Pujara et al., BayLearn 14]
Other PSL Topics
Not discussed here:
- Smart grounding
- Lazy inference
- Distributed inference and learning
Future work:
- Lifted inference
- Generalized rounding guarantees
PSL Empirical Highlights
Compared with discrete MRFs:

              Collective Classification     Trust Prediction
  PSL         81.8%,   0.7 sec              .482 AuPR,   0.32 sec
  Discrete    79.7%, 184.3 sec              .441 AuPR, 212.36 sec

Predicting MOOC outcomes via latent engagement (AuPR):

                Tech     Women-Civil     Genes
  Lecture Rank  0.333    0.508           0.688
  PSL-Direct    0.393    0.565           0.757
  PSL-Latent    0.546    0.816           0.818

Bach et al. UAI 13, Ramesh et al. AAAI 14
PSL Empirical Highlights
Improved activity recognition in video:

               5 Activities        6 Activities
  HOG          47.4%, .481 F1      59.6%, .582 F1
  PSL + HOG    59.8%, .603 F1      79.3%, .789 F1
  ACD          67.5%, .678 F1      83.5%, .835 F1
  PSL + ACD    69.2%, .693 F1      86.0%, .860 F1

Compared on drug-target interaction prediction:

                      AuPR          P@130
  Perlman's Method    .564 ± .05    .594 ± .04
  PSL                 .617 ± .05    .616 ± .04

London et al. CVPR WS 13, Fakhraei et al. TCBB 14
Conclusion
HL-MRFs unite and generalize different ways of viewing fundamental AI problems, including MAX SAT, probabilistic graphical models, and fuzzy logic
PSL's mix of expressivity and scalability makes it a general-purpose tool for network analysis, bioinformatics, NLP, computer vision, and more
Many exciting open problems, from algorithms, to theory, to applications