Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
Probabilistic AI Srihari
22
Topics in Complexity of Inference1. Inferring a conditional probability from a BN2. NP Complete Decision Problem3. Inference decision problem is NP-complete4. Exact Inference is #P-complete5. Approximate Inference is NP-hard6. Types of Inference algorithms: exact and
approximate
Probabilistic AI Srihari
Common BN Inference Problem• PGMs represent joint probability distributions
over a set of variables χ• They are used to answer queries of interest• The most common query type is a conditional
probability query – E: evidence variables, whose known value is e– Y: query variables, whose distribution we wish to
know– Conditional probability query is P(Y|E=e)
3
Probabilistic AI Srihari
Computing conditional probability• Assume: variables χ, evidence E=e, query Y
• Conditional probability query P(Y|E=e) is:
– Evaluation of Numerator P(Y,e)• If W=χ-Y-E, ie, variables not query or evidence
– Evaluation of Denominator P(e)
– This process corresponds to taking a vector of marginal probabilities P(y1,e),..P(yk,e) and renormalizing the entries by P(e) to sum to 1
P(Y |E = e) = P(Y ,e)
P(e)
P(y,e) = P(y,e,w)
w∑
P(e) = P(y,e)
y∑ Rather than marginalizing over P(y,e,w) this
allows reusing computation of (1)
(1) Each term in summation over w is simplyan entry in the distribution
From product rule
Probabilistic AI Srihari
P(C,Se) = P(A,Ge,E,Sm,Se,C,L,GA,Ge,E ,Sm,L,Gd∑ d)
= P(A)P(Ge)P(Gd)P(E |A),P(Sm |A,G)P(Se |C)P(C |E,Sm)P(L |C,Gd)A,Ge,E ,Sm,L,Gd∑
Example Query with Cancer BN• χ={A,Ge,E,Sm,C,Gd,Se,L}
– Evidence variable is Se• Serum Calcium is calcium in blood
– Query variable is C cancer• We can evaluate P(C|Se )as:
– With W = χ-C-Se={A,Ge,E,Sm,Gd,L}:
P(C |Se)=P(C,Se)P(Se)
P(Se) = P(C,Se
C∑ )
We take the marginal probabilities P(c1,Se), P(c2,Se) and renormalizethe entries by P(Se) to sum to 1
Probabilistic AI Srihari
Analysis of Complexity
• Approach of summing out the variables in the joint distribution is unsatisfactory– Returns us to exponential blow-up
• PGM was precisely designed to avoid this!
• We now show that problem of inference in PGMs is NP-hard– Requires exponential time in the worst case
except if P=NP
– Even worse, approximate inference is NP - hard• Discussion for BNs applies to MNs also 6
P(y,e) = P(y,e,w)
w∑
Probabilistic AI Srihari
What is a decision problem?• Subset sum decision problem
– Given a set of integers, is there a subset that sums to zero?• No polynomial time algorithm to decide this
– Given an instance, say set ω={−2, −3, 15, 14, 7, −10} • The guess {−2, −3, −10, 15} can be verified in poly time
• Definition of Decision Problem Π: – LΠ defines a precise set of instances
• LΠ consists of integer sets which have a subset that sums to zero
– Decision problem Π: Is instance ω in LΠ ?• Is ω a set of integers with a subset that sums to zero?
Probabilistic AI Srihari
P and NP decision problems• Definition of Decision Problem Π: – LΠ defines a precise set of instances– Decision problem: Is instance ω in LΠ ?
• Decision problem Π is in– P if there is an algorithm that decides in poly time– NP if a guess can be verified in poly time
• Guess is produced non-deterministically• Hence the name non-deterministic polynomial time
– Subset sum decision problem is in NP• Whether a given subset sums to zero verified in poly time• But not in P
– No poly algorithm to determine whether there exists any subset that sums to zero
Probabilistic AI Srihari
3-SAT (Satisfiability) decision problem• 3-SAT formula over binary variables q1,.,,qn
– has the form C1 � C2 � ..Cm
Ci is a clause of form li,1�li,2�li,3 ; li,j i =1,..,m; j =1,2,3 are literals which are either qk or ~qk
e.g., with n=3 m=2 and clauses (q1 � ~q2 � ~q3) � (~q1 � q2 � ~q3)there is a satisfying assignment (assigns true to formula): q1=q2=q3= true
with clauses (~q1 � q2 ) � ( q2 � q3) � (~q1� q3) there is no satisfying assignment
Each assignment verified in polynomial time
• Decision problem Π: Given a 3-SAT formula of size n, is there a satisfying assignment?– To answer this we need to check n binary variables
with 2n assignments•LΠ is the set of 3-SAT formulas that have a satisfying assignment 9
Probabilistic AI Srihari
What is P=NP ?• Input is a formula of size n
– A particular assignment γ can be verified in polynomial time, e.g., q1=q2=q3= true
– Suppose generate guess γ and verify if it satisfies• Since guess verified in polynomial time, decision
problem Π is in NP
• Deterministic problems are subset of nondeterministic ones. So . – Converse is biggest problem in complexity
• If you can verify in polynomial time, can you decide in polynomial time?
– Eg., is there a prime greater than n? 10
P ⊆ NP
Probabilistic AI Srihari
P=NP intuition
• Creating great art– Complexity is NP
• Appreciating art: – Complexity is P
• Is P=NP ?– Most mathematicians don’t think so
11
Probabilistic AI Srihari
NP-hard and NP-complete• Hardest problems in NP are called NP-complete
– If poly time solution exists, can solve any in NP• NP-hard problems need not have
polynomial time verification• If Π is NP-hard it can be
transformed into Π’ in NP• 3-SAT is NP-complete
12
Probabilistic AI Srihari
BN for 3-SAT • Propositional variables q1,..,qn
– Return true if C1� C2� ..Cm , where Ci is a DNF of 3 binary variables qk , has a satisfying assignment,
• e.g., return true for 3-SAT formula (q1 V ~q2 V ~q3) � (~q1 V q2�~q3)since q1=q2=q3=true is a satisfying assignment
and return false for (~q1 � q2� ~q3) � ( q2 � q3) � (~q1� q3)which has no satisfying assignments
13
Q1 QnQ4Q3Q2
C1
A1 XAm –2A2
CmCm –1C3C2
. . .
. . .
BN to infer this:P(qk1)=0.5Ci are deterministic OR
Ai are deterministic AND
X is output (has value 1iff all of the Ci’s are 1
Probabilistic AI Srihari
#P-complete Problem
• Counting the no. of satisfying assignments• E.g., Propositional variables q1,.,,qn
Return true if C1 � C2 � ..Cm , where Ci is a DNF of 3 binary variables qk , has a satisfying assignment,
14
Probabilistic AI Srihari
Analysis of Exact Inference• Worst case: CPD is a table of size
|Val({Xi} ∨ PaXi)|• Most analyses of complexity are stated as
decision-problems• Consider decision problem first, then numerical
one• Natural version of conditional probability task is
given next
Probabilistic AI Srihari
Conditional Probability Decision Task
• Bayesian Network – Probability - Decision task:– BN-Pr-DP: Bayesian Network Decision Problem
• Given a BN B over χ, a variable X ε χ, and a value x εVal(X) decide PB (X=x)>0
• This decision problem can be shown to be NP-complete as seen next
Probabilistic AI Srihari
Proof of BN-Pr-DP is NP-complete
• Whether in NP : – Guess assignment ξ to network variables. Check
whether X=x and P(ξ)>0– One such guess succeeds iff P(X=x)>0. – Done in linear time• Is NP-hard:– Answer for instances in BN-Pr-DP can be used
to answer an NP-hard problem– Show a reduction from 3-SAT problem
17
Probabilistic AI Srihari
Reduction of 3-SAT to BN inference• Given a 3-SAT formula φ create BN Bφwith
variable X such that φ is satisfiable iff PBφ(X=x1)>0
• If BN inference is solved in poly time we can also solve 3-SAT in poly time
18
Q1 QnQ4Q3Q2
C1
A1 XAm –2A2
CmCm –1C3C2
. . .
. . .BN to infer this:P(qk1)=0.5Ci are deterministic ORAi are deterministic ANDX is output
Probabilistic AI Srihari
Original Inference Problem
• It is a numerical problem – rather than a decision problem
• Define BN-Pr– Given a BN B over χ, a variable X ε χ, and
a value x εVal(X) compute PB (X=x)– Task is to compute the total probability of
instantiations that are consistent with X=x• Weighted count of instantiations, with weight being the probability• This problem is #P-complete 19
Probabilistic AI Srihari
Analysis of Approximate Inference• Metrics for quality of approximation• Absolute Error
– Estimate ρ has absolute error ε for P(y|e) if |P(y|e)-ρ| ≤ ε
– A weak definition of error. If a rare disease has probability 0.0001then error of 0.0001 is unacceptable. If the probability is 0.3 then error of 0.0001 is fine
• Relative Error– Estimate ρ has relative error ε for P(y|e) ifρ/(1+ε) ≤ P(y|e) ≤ ρ(1+ε)
• ε=4 means P(y|e) is at least 20% of ρ and at most 600% ofρ. For low values much better than absolute error
Probabilistic AI Srihari
Approximate Inference is NP-hard• The following problem is NP-hard
– Given a BN B over χ, a variable X εχ and a value x ε Val(X), find a number ρ that has relative error εfor PB(X=x)
• Proof: – It is NP-hard to decide if PB (x1) >0
– Assume algorithm returns estimate ρ to PB(x1) which has relative error ε for some ε > 0
– ρ>0 if and only if PB(x1)>0– This achieving relative error is NP-hard 21
Probabilistic AI Srihari
2222
Inference Algorithms• Worst case is exponential• Two types of inference algorithms
– Exact• Variable Elimination• Clique trees
– Approximate• Optimization
– Propagation with approximate messages– Variational (analytical approximations)
• Particle-based (sampling)