
Probabilistic Graphical Models: Bayesian Networks

Vasant Honavar
Artificial Intelligence Research Laboratory
Department of Computer Science
Bioinformatics and Computational Biology Program
Center for Computational Intelligence, Learning, & Discovery
Iowa State University
[email protected]
www.cs.iastate.edu/~honavar/
www.cild.iastate.edu/
www.bcb.iastate.edu/
www.igert.iastate.edu


Inference by enumeration

• Start with the joint probability distribution (here, the full joint over Toothache, Catch, and Cavity)

• For any proposition φ, sum the atomic events where it is true: P(φ) = Σ_{ω : ω ⊨ φ} P(ω)


Inference by enumeration

• Start with the joint probability distribution

• For any proposition φ, sum the atomic events where it is true: P(φ) = Σ_{ω : ω ⊨ φ} P(ω)

• P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2


Inference by enumeration

• Start with the joint probability distribution

• Can also compute conditional probabilities:
P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
= (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064) = 0.4
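These by-hand sums are easy to check mechanically. Below is a minimal Python sketch (not from the original slides): the joint distribution is stored as a dictionary, and a query is answered by summing the atomic events where the proposition holds. The four toothache entries come from the slides; the four ¬toothache entries are assumed from the standard textbook table this example appears to follow, so treat them as illustrative.

```python
# Full joint over (Cavity, Toothache, Catch); toothache rows are from the
# slides, the no-toothache rows are assumed from the standard AIMA table.
joint = {
    (True,  True,  True):  0.108,
    (True,  True,  False): 0.012,
    (False, True,  True):  0.016,
    (False, True,  False): 0.064,
    (True,  False, True):  0.072,   # assumed
    (True,  False, False): 0.008,   # assumed
    (False, False, True):  0.144,   # assumed
    (False, False, False): 0.576,   # assumed
}

def prob(holds):
    """P(phi): sum the atomic events (rows of the joint) where phi is true."""
    return sum(p for world, p in joint.items() if holds(world))

p_toothache = prob(lambda w: w[1])               # 0.2
p_both = prob(lambda w: (not w[0]) and w[1])     # P(~cavity ^ toothache)
print(p_toothache, p_both / p_toothache)         # 0.2 0.4
```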


Normalization

The denominator can be viewed as a normalization constant α:
P(Cavity | toothache) = α P(Cavity, toothache)
= α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
= α [⟨0.108, 0.016⟩ + ⟨0.012, 0.064⟩] = α ⟨0.12, 0.08⟩ = ⟨0.6, 0.4⟩

General idea: compute the distribution on the query variable by fixing the evidence variables and summing over the unobserved variables


Inference by enumeration, continued

Typically, we are interested in the posterior joint distribution of the query variables Y given specific values e for the evidence variables E

Let the hidden or unobserved variables be H = X − Y − E

Then the required summation of joint entries is done by summing out the hidden variables:
P(Y | E = e) = α P(Y, E = e) = α Σ_h P(Y, E = e, H = h)

• The terms in the summation are joint entries, because Y, E, and H together exhaust the set of random variables
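As a sketch of this recipe (reusing the toy joint above, including the same assumed ¬toothache entries), the following function fixes the evidence E = e, sums out the hidden variables H, and normalizes with α:

```python
from itertools import product

VARS = ("Cavity", "Toothache", "Catch")   # order of the tuple keys below
joint = dict(zip(product([True, False], repeat=3),
                 [0.108, 0.012, 0.072, 0.008, 0.016, 0.064, 0.144, 0.576]))

def query(target, evidence):
    """P(target | evidence): fix the evidence variables, sum out the rest
    (the hidden variables H), then normalize by alpha."""
    scores = {}
    for world, p in joint.items():
        w = dict(zip(VARS, world))
        if all(w[var] == val for var, val in evidence.items()):
            scores[w[target]] = scores.get(w[target], 0.0) + p
    alpha = 1.0 / sum(scores.values())
    return {value: alpha * s for value, s in scores.items()}

# Catch plays the role of the hidden variable H here:
print(query("Cavity", {"Toothache": True}))   # {True: 0.6, False: 0.4}
```

This also reproduces the ⟨0.6, 0.4⟩ result from the normalization slide above.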


Inference by enumeration, continued

• Obvious problems:
1. Worst-case time complexity O(d^n), where d is the largest arity
2. Space complexity O(d^n) to store the joint distribution
3. How to find the numbers for O(d^n) entries?


Independence
• A and B are independent iff P(A | B) = P(A), or P(B | A) = P(B), or P(A, B) = P(A) P(B)

P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) P(Weather)

• 32 entries reduced to 12; for n independent biased coins, O(2^n) → O(n)

• Absolute independence is powerful but rare
• How can we manage a large number of variables?


Conditional independence
• P(Toothache, Cavity, Catch) has 2^3 − 1 = 7 independent entries
• If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache:
– P(catch | toothache, cavity) = P(catch | cavity)
• The same independence holds if I haven't got a cavity:
– P(catch | toothache, ¬cavity) = P(catch | ¬cavity)
• Catch is conditionally independent of Toothache given Cavity:
– P(Catch | Toothache, Cavity) = P(Catch | Cavity)


Conditional independence

• Catch is conditionally independent of Toothache given Cavity:
– P(Catch | Toothache, Cavity) = P(Catch | Cavity)

• Equivalent statements:
– P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
– P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)


Conditional independence

• Write out the full joint distribution using the chain rule:
P(Toothache, Catch, Cavity)
= P(Toothache | Catch, Cavity) P(Catch, Cavity)
= P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity)
= P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)

i.e., 2 + 2 + 1 = 5 independent numbers
• Conditional independence
– often reduces the size of the representation of the joint distribution from exponential in n to linear in n
– is one of the most basic and robust forms of knowledge about uncertain environments


Conditional Independence

X is conditionally independent of Y given Z if the probability distribution governing X is independent of the value of Y given the value of Z: P(X | Y, Z) = P(X | Z), that is, if

∀(x_i, y_j, z_k)  P(X = x_i | Y = y_j, Z = z_k) = P(X = x_i | Z = z_k)


Independence and Conditional Independence

Let Z_1, …, Z_n and W be pairwise disjoint sets of random variables on a given event space.

Z_1, …, Z_n are mutually independent given W if P(Z_1 ∪ … ∪ Z_n | W) = Π_{i=1}^n P(Z_i | W)

Z_1 and Z_2 are independent if P(Z_1 ∪ Z_2) = P(Z_1) P(Z_2)

Note that these represent sets of equations, for all possible value assignments to the random variables.


Independence Properties of Random Variables

Let W, X, Y, Z be pairwise disjoint sets of random variables on a given event space. Let I(X, Z, Y) denote that X and Y are independent given Z; that is, P(X | Y, Z) = P(X | Z), or equivalently P(X, Y | Z) = P(X | Z) P(Y | Z). Then:

a. I(X, Z, Y) ⇒ I(Y, Z, X)
b. I(X, Z, Y ∪ W) ⇒ I(X, Z, Y)
c. I(X, Z, Y ∪ W) ⇒ I(X, Z ∪ W, Y)
d. I(X, Z, Y) ∧ I(X, Z ∪ Y, W) ⇒ I(X, Z, Y ∪ W)

Proof: follows from the definition of independence.


Bayes Rule

Does the patient have cancer or not?

A patient takes a lab test and the result comes back positive. The test returns a correct positive result in only 98% of the cases in which the disease is actually present, and a correct negative result in only 97% of the cases in which the disease is not present. Furthermore, 0.008 of the entire population has this cancer.

P(cancer) = ____    P(¬cancer) = ____
P(+ | cancer) = ____    P(+ | ¬cancer) = ____
P(− | cancer) = ____    P(− | ¬cancer) = ____


Bayes Rule

Does patient have cancer or not?

P(cancer) = 0.008    P(¬cancer) = 0.992
P(+ | cancer) = 0.98    P(+ | ¬cancer) = 0.03
P(− | cancer) = 0.02    P(− | ¬cancer) = 0.97

P(cancer | +) = P(+ | cancer) P(cancer) / P(+)
P(¬cancer | +) = P(+ | ¬cancer) P(¬cancer) / P(+)

P(+ | cancer) P(cancer) = 0.98 × 0.008 = 0.0078
P(+ | ¬cancer) P(¬cancer) = 0.03 × 0.992 = 0.0298
P(+) = 0.0078 + 0.0298 = 0.0376

P(cancer | +) = 0.21;  P(¬cancer | +) = 0.79

The patient, more likely than not, does not have cancer.
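The same arithmetic as a short Python check, a minimal sketch using only the numbers from the slide:

```python
# Prior and test characteristics from the slide.
p_cancer = 0.008
p_pos_given_cancer = 0.98        # sensitivity
p_pos_given_no_cancer = 0.03     # false-positive rate (specificity = 0.97)

# Unnormalized posteriors, then normalize by P(+).
joint_cancer = p_pos_given_cancer * p_cancer              # ~ 0.0078
joint_no_cancer = p_pos_given_no_cancer * (1 - p_cancer)  # ~ 0.0298
p_pos = joint_cancer + joint_no_cancer                    # ~ 0.0376

print(joint_cancer / p_pos)      # P(cancer | +)  ~ 0.21
print(joint_no_cancer / p_pos)   # P(~cancer | +) ~ 0.79
```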


Bayes Rule

• Product rule
– P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)
– Bayes' rule: P(a | b) = P(b | a) P(a) / P(b)

• In distribution form:
P(Y | X) = P(X | Y) P(Y) / P(X) = α P(X | Y) P(Y)


Bayes' Rule and conditional independence

P(Cavity | toothache ∧ catch)
= α P(toothache ∧ catch | Cavity) P(Cavity)
= α P(toothache | Cavity) P(catch | Cavity) P(Cavity)

• This is an example of a naïve Bayes ("idiot Bayes") model:
– P(Cause, Effect_1, …, Effect_n) = P(Cause) Π_i P(Effect_i | Cause)

• The total number of parameters is linear in n


Bayesian Networks
Exploiting conditional independence and graphical representations for reasoning under uncertainty

• Review of graphs
• Review of independence and conditional independence
• Directed graphical models and probability distributions
• Querying a probability distribution – inference


Review of basic concepts of graphs

Undirected graph G1 = (V, E1); directed graph G2 = (V, E2)
Vertex set V = {A, B, C, D, E}
Edge set E1 = {A–B, B–D, D–E, A–C, C–E}
Edge set E2 = {A→B, B→D, D→E, A→C, C→E}

[Figure: the undirected graph G1 and the directed graph G2 on vertices A, B, C, D, E]


Review of basic concepts of graphs

Adjacency set of a node – the immediate neighbors reachable through undirected (directed) links

In G1: Adj(A) = {C, D}; Adj(B) = {D}; Adj(D) = {A, B, E}; Adj(E) = ∅
In G2: Adj(A) = {C, D}; Adj(B) = {D}; Adj(D) = {E}; Adj(E) = {C, D}

Path between two nodes – an ordered list of nodes starting with the first node and ending with the second, in which each successive node is in the adjacency list of the preceding node

[Figure: G1 and G2 on vertices A, B, C, D, E]


Properties of undirected graphs
• Complete graph – there is a link between every pair of nodes
• Complete set – a subset of nodes in a graph is said to be complete if there is a link between every pair of nodes in the subset
• Clique – a complete set of nodes is said to be a clique if it is maximal, i.e., it is not a proper subset of another complete set

The vertex set of the complete graph G1 on {A, B, C, D, E} is a complete set and also the only clique in G1

[Figure: the complete graph G1]


Properties of undirected graphs

Identify the cliques in the graph shown.
Two cliques: {A, C, D} and {B, C, D, E}

[Figure: graph on vertices A, B, C, D, E]


Properties of undirected graphs

Neighbors of a node: in G1, Neighbors(A) = {C, D}
Boundary of a set of nodes S – the union of the neighbors of the nodes in S, excluding the nodes in S:
Boundary({C, D}) = {A, E, D} ∪ {A, B, C, E} − {C, D}
= {A, B, C, D, E} − {C, D} = {A, B, E}

[Figure: graph G1 on vertices A, B, C, D, E]


Properties of undirected graphs

A graph is said to be connected if there exists at least one path between any pair of nodes.
G1 is connected; G2 is not connected.

[Figure: a connected graph G1 and a disconnected graph G2, each on vertices A, C, D, E]


Properties of undirected graphs

A connected undirected graph is a tree if for every pair of nodes there is a unique path.
G1 is a tree.
An undirected graph is said to be multiply connected if at least one pair of nodes is connected by more than one path, i.e., there is at least one loop.
G2 is not a tree (it is multiply connected).

[Figure: a tree G1 and a multiply connected graph G2, each on vertices A, C, D, E]


Properties of undirected graphs

Chord of a loop – a link between two nodes in a loop that is not itself part of the loop.
C–D is a chord of the loop A–C–E–D–A.
A–E is a chord of the loop A–C–E–D–A.
A chord decomposes a loop into two smaller loops.
The loop A–E–D–A does not have a chord.

[Figure: graph G1 on vertices A, C, D, E]


Properties of undirected graphs

Triangulated graph – an undirected graph is said to be triangulated if every loop of length 4 or greater has at least one chord.
C–D is a chord of the loop A–C–E–D–A.
G1 is not triangulated; G2 is triangulated.
Triangulation does not mean dividing the graph into triangles!

[Figure: a non-triangulated graph G1 and a triangulated graph G2, each on vertices A, C, D, E]


Properties of undirected graphs

Triangulation – the process of adding chords to make a graph triangulated.
There may be multiple ways to triangulate a graph.
A triangulation is said to be minimal if it contains the minimal number of chords.
G2 is a minimal triangulation; G3 is not a minimal triangulation.
Finding a minimal triangulation is NP-hard.
There is a greedy algorithm for triangulating a graph (Tarjan and Yannakakis, 1984).

[Figure: a graph G1 and two triangulations G2 and G3, each on vertices A, C, D, E]


Properties of undirected graphs

Triangulated graphs have the running intersection property:
there exists an ordering of the cliques C_1, …, C_n such that C_i ∩ (C_1 ∪ C_2 ∪ … ∪ C_{i−1}) is contained in at least one of the cliques C_1, …, C_{i−1}, for all i = 1..n.
An ordering of cliques satisfying the running intersection property is called a chain of cliques.
An undirected graph has an associated chain of cliques iff it is triangulated.
Ordering: {A, C, E, D}, {C, D, E, F}

[Figure: graph G1 on vertices A, C, D, E, F]


Properties of undirected graphs

Cluster – a subset of the nodes of a graph.
Cluster graph:
• nodes are clusters
• there is an edge between two nodes if and only if the clusters contain common nodes
The clique graph of an undirected graph is a cluster graph in which the clusters correspond to the cliques of the original graph.
A clique graph is called a join (or junction) graph if it contains all the possible links between cliques with a common node.
The join graph of an undirected graph is unique.


Properties of undirected graphs
A clique graph is a join tree (or junction tree) if it is a tree and every node that belongs to two clusters also belongs to every node on the path between the two clusters.
An undirected graph has a join tree if and only if it is triangulated.

[Figure: a graph on vertices A–I, its join graph, and a join tree over the clusters {A,B,C}, {B,C,E}, {B,D,E}, {C,F}, {D,G}, {D,H}, {E,I}]


Properties of undirected graphs

An undirected graph has a join tree if and only if it is triangulated.
There is no join tree for this graph.

[Figure: a non-triangulated graph on vertices A, B, C, D and its join graph over the clusters {A,B}, {A,C}, {B,D}, {C,D}]


Properties of Directed Graphs

Parents(D) = {A, B}
Children(A) = {C, D}
Family(D) = {A, B, D} (a node and its parents)
Ancestors(E) = {A, B, C, D}
Ancestors(A) = ∅
Ancestral numbering – a numbering of the nodes such that the number of any node is less than that of its children.

[Figure: directed graph on vertices A, B, C, D, E]


Properties of Directed Graphs

Undirected graph associated with a directed graph – drop the directionality of the links.
Moral graph – obtained by linking every pair of nodes that share a common child and then dropping the directions on the links.

[Figure: a directed graph, its associated undirected graph, and its moral graph, each on vertices A, B, C, D, E]


Properties of Directed Graphs
A cycle in a directed graph – a closed directed path.
A directed graph is acyclic (a DAG) if it has no directed cycles.
A directed graph is connected if the associated undirected graph is connected.
A connected directed graph is a tree if the associated undirected graph is a tree; otherwise it is multiply connected.
Simple directed tree – every node has at most one parent; otherwise, the graph is a polytree.

[Figure: a directed acyclic graph, a simple tree, and a polytree]


Representation of graphs

Graphical representation vs. numerical representation:
• Adjacency matrix – entry (i, j) is 1 if there is an edge from node i to node j
• Successive powers A^1, A^2, … of the adjacency matrix give the number of paths of length 1, 2, …
• Attainability matrix – entry (i, j) is 1 if there is a path from node i to node j
• If there is a path between two nodes, there is a path of length less than N, where N is the number of nodes in the graph
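A small NumPy sketch of these facts. The edge list for the directed graph is assumed to be the one from the graph-review slide (A→B, B→D, D→E, A→C, C→E), since that list was partly garbled in the source:

```python
import numpy as np

names = ["A", "B", "C", "D", "E"]
A = np.array([[0, 1, 1, 0, 0],    # A -> B, A -> C   (assumed edge list)
              [0, 0, 0, 1, 0],    # B -> D
              [0, 0, 0, 0, 1],    # C -> E
              [0, 0, 0, 0, 1],    # D -> E
              [0, 0, 0, 0, 0]])

# Entry (i, j) of A^k counts directed paths of length k from i to j,
# e.g. A^3 records the single length-3 path A -> B -> D -> E.
print(np.linalg.matrix_power(A, 3))

# Attainability: (i, j) is reachable iff some power A^1 .. A^(N-1) has a
# nonzero (i, j) entry, where N is the number of nodes.
N = len(names)
reach = sum(np.linalg.matrix_power(A, k) for k in range(1, N)) > 0
print(reach.astype(int))
```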


Building Probabilistic Models – Conditional Independence

• Random variable X is conditionally independent of Y given Z if the probability distribution governing X is independent of the value of Y given the value of Z:
• P(X | Y, Z) = P(X | Z), that is, if

∀(x_i, y_j, z_k)  P(X = x_i | Y = y_j, Z = z_k) = P(X = x_i | Z = z_k)


Conditional Independence

P(Thunder = 1 | Rain = 1, Lightning = 1) = P(Thunder = 1 | Lightning = 1) = P(Thunder = 1 | Rain = 0, Lightning = 1)

P(Thunder = 1 | Rain = 1, Lightning = 0) = P(Thunder = 1 | Lightning = 0) = P(Thunder = 1 | Rain = 0, Lightning = 0)

P(Thunder = 0 | Rain = 1, Lightning = 1) = P(Thunder = 0 | Lightning = 1) = P(Thunder = 0 | Rain = 0, Lightning = 1)

P(Thunder = 0 | Rain = 1, Lightning = 0) = P(Thunder = 0 | Lightning = 0) = P(Thunder = 0 | Rain = 0, Lightning = 0)
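These equations can be verified numerically. The sketch below builds a joint distribution from hypothetical CPTs (all numbers invented for illustration) in which Thunder depends on Rain only through Lightning, then checks all four equations by enumeration:

```python
from itertools import product

# Hypothetical CPTs: Lightning influences both Rain and Thunder, and
# Thunder's CPT depends on Lightning alone.
p_l = {1: 0.05, 0: 0.95}
p_r_given_l = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.2, 0: 0.8}}
p_t_given_l = {1: {1: 0.95, 0: 0.05}, 0: {1: 0.01, 0: 0.99}}

joint = {(l, r, t): p_l[l] * p_r_given_l[l][r] * p_t_given_l[l][t]
         for l, r, t in product([0, 1], repeat=3)}

def cond(thunder, rain=None, lightning=None):
    """P(Thunder = thunder | given evidence), by enumerating the joint."""
    match = lambda l, r: ((rain is None or r == rain) and
                          (lightning is None or l == lightning))
    num = sum(p for (l, r, t), p in joint.items()
              if t == thunder and match(l, r))
    den = sum(p for (l, r, t), p in joint.items() if match(l, r))
    return num / den

for t, l in product([0, 1], repeat=2):
    assert abs(cond(t, rain=1, lightning=l) - cond(t, lightning=l)) < 1e-12
    assert abs(cond(t, rain=0, lightning=l) - cond(t, lightning=l)) < 1e-12
print("All four equations hold: Thunder is independent of Rain given Lightning")
```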


Conditional Independence

Let Z_1, …, Z_n and W be random variables on a given event space.
Z_1, …, Z_n are mutually independent given W if P(Z_1, Z_2, …, Z_n | W) = Π_{i=1}^n P(Z_i | W)
In particular, Z_1 and Z_2 are independent given W if P(Z_1, Z_2 | W) = P(Z_1 | W) P(Z_2 | W)
Note that these represent sets of equations, for all possible value assignments to the random variables.



Implications of Independence

• Suppose we have 5 binary features and a binary class label

• Without independence, in order to specify the joint distribution, we need a probability for each possible assignment of values to each variable, resulting in a table of size 2^6 = 64

• If the features are independent given the class label, we only need 5 × (2 × 2) = 20 entries


Bayesian Networks

Smoking → Cancer, with S ∈ {no, light, heavy} and C ∈ {none, benign, malignant}

P(S = no) = 0.80, P(S = light) = 0.15, P(S = heavy) = 0.05

P(C | S):
Smoking =      no     light  heavy
P(C = none)    0.96   0.88   0.60
P(C = benign)  0.03   0.08   0.25
P(C = malig)   0.01   0.04   0.15


Product Rule

• P(C, S) = P(C | S) P(S)

S ⇓  C ⇒   none    benign  malignant
no          0.768   0.024   0.008
light       0.132   0.012   0.006
heavy       0.035   0.010   0.005


Marginalization

S ⇓  C ⇒   none    benign  malig   total
no          0.768   0.024   0.008   0.80
light       0.132   0.012   0.006   0.15
heavy       0.035   0.010   0.005   0.05
total       0.935   0.046   0.019

The column totals give P(Cancer); the row totals give P(Smoke).


Bayes Rule Revisited

P(S | C) = P(C | S) P(S) / P(C) = P(C, S) / P(C)

S ⇓  C ⇒   none         benign       malig
no          0.768/.935   0.024/.046   0.008/.019
light       0.132/.935   0.012/.046   0.006/.019
heavy       0.035/.935   0.010/.046   0.005/.019

Cancer =      none   benign  malignant
P(S = no)     0.821  0.522   0.421
P(S = light)  0.141  0.261   0.316
P(S = heavy)  0.037  0.217   0.263
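The three tables above can be reproduced with a few lines of NumPy. One caveat: computing exactly from the CPT gives heavy-smoker entries (0.030, 0.0125, 0.0075) that differ slightly from the slide's table, which appears to round that row; the other entries match.

```python
import numpy as np

# P(S) and P(C | S) from the slides (rows: no, light, heavy smoking).
p_s = np.array([0.80, 0.15, 0.05])
p_c_given_s = np.array([[0.96, 0.03, 0.01],    # S = no
                        [0.88, 0.08, 0.04],    # S = light
                        [0.60, 0.25, 0.15]])   # S = heavy

# Product rule: P(C, S) = P(C | S) P(S).
joint = p_c_given_s * p_s[:, None]

# Marginalization: column sums give P(C), row sums give back P(S).
p_c = joint.sum(axis=0)

# Bayes rule: P(S | C) = P(C, S) / P(C); each column sums to 1.
p_s_given_c = joint / p_c

print(joint.round(4))
print(p_c.round(4))          # ~ [0.930, 0.0485, 0.0215]
print(p_s_given_c.round(3))
```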


A Bayesian Network

[Figure: a Bayesian network with nodes Age, Gender, Exposure to Toxics, Smoking, Cancer, Serum Calcium, and Lung Tumor; Age and Gender point to Smoking, Age to Exposure to Toxics, Exposure to Toxics and Smoking to Cancer, and Cancer to Serum Calcium and Lung Tumor]


Independence

Age and Gender are independent:
P(A | G) = P(A),  A ⊥ G
P(G | A) = P(G),  G ⊥ A

P(A, G) = P(G | A) P(A) = P(G) P(A)
P(A, G) = P(A | G) P(G) = P(A) P(G)
P(A, G) = P(G) P(A)


Conditional Independence

Cancer is independent of Age and Gender given Smoking:
P(C | A, G, S) = P(C | S),  C ⊥ A, G | S

[Figure: Age and Gender point to Smoking, which points to Cancer]


More Conditional Independence: Naïve Bayes

Serum Calcium is independent of Lung Tumor, given Cancer:
P(L | SC, C) = P(L | C)

Serum Calcium and Lung Tumor are (marginally) dependent.

[Figure: Cancer points to Serum Calcium and Lung Tumor]


Naïve Bayes in general

[Figure: class H with children E1, E2, E3, …, En]

2n + 1 parameters: P(h), and P(e_i | h), P(e_i | ¬h) for i = 1, …, n
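A minimal sketch of such a model with n = 3 binary evidence variables; all 2n + 1 = 7 parameter values below are invented for illustration:

```python
# Naive Bayes parameters: P(h) and, for each evidence variable,
# P(e_i = 1 | h) and P(e_i = 1 | ~h). All values are hypothetical.
p_h = 0.1
p_e_given_h = [0.9, 0.7, 0.8]
p_e_given_not_h = [0.2, 0.3, 0.1]

def posterior(evidence):
    """P(h | e_1..e_n) = alpha * P(h) * prod_i P(e_i | h)."""
    like_h, like_not_h = p_h, 1.0 - p_h
    for e, ph, pnh in zip(evidence, p_e_given_h, p_e_given_not_h):
        like_h *= ph if e else (1.0 - ph)
        like_not_h *= pnh if e else (1.0 - pnh)
    return like_h / (like_h + like_not_h)

print(posterior([1, 1, 0]))   # ~ 0.224 with the numbers above
```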


More Conditional Independence: Explaining Away

Exposure to Toxics and Smoking are (marginally) independent: E ⊥ S

Exposure to Toxics is dependent on Smoking, given Cancer:
P(E = heavy | C = malignant) > P(E = heavy | C = malignant, S = heavy)

[Figure: Exposure to Toxics and Smoking both point to Cancer]


Put it all together

P(A, G, E, S, C, SC, L) = P(A) · P(G) · P(E | A) · P(S | A, G) · P(C | E, S) · P(SC | C) · P(L | C)

[Figure: the full network – Age and Gender point to Smoking, Age to Exposure to Toxics, Exposure to Toxics and Smoking to Cancer, Cancer to Serum Calcium and Lung Tumor]


General Product (Chain) Rule for Bayesian Networks

P(X_1, X_2, …, X_n) = Π_{i=1}^n P(X_i | Pa_i), where Pa_i = parents(X_i)


Bayesian Networks

• The naïve assumption that the variables are independent (e.g., the Naïve Bayes assumption that the variables are independent given the class) can be too restrictive
• But representing joint distributions is intractable without some independence assumptions
• Bayesian networks explicitly model conditional independence among subsets of variables to yield a graphical representation of probability distributions that admit such independence


Bayesian network

• A Bayesian network is a directed acyclic graph (DAG) in which the nodes represent random variables
• Each node is annotated with a probability distribution P(X_i | Parents(X_i)) representing the dependency of that node on its parents in the DAG
• Each node is asserted to be conditionally independent of its non-descendants, given its immediate predecessors
• Arcs represent direct dependencies


Bayesian Networks

Efficient factorized representation of probability distributions via conditional independence

[Figure: the alarm network – Earthquake and Burglary point to Alarm, Earthquake to Radio, Alarm to Call – with the CPT P(A | E, B):]

E B      P(A)   P(¬A)
e, b     0.9    0.1
e, ¬b    0.2    0.8
¬e, b    0.9    0.1
¬e, ¬b   0.01   0.99


Bayesian Networks

• Qualitative part: statistical independence statements, represented in the form of a directed acyclic graph (DAG)
– Nodes: random variables
– Edges: direct influence

• Quantitative part: conditional probability distributions, one for each random variable conditioned on its parents

[Figure: the same alarm network and CPT P(A | E, B) as above]


Qualitative part

• Nodes are independent of non-descendants given their parents

d-separation:
• a graph-theoretic criterion for reading independence statements
• can be computed in linear time (in the number of edges)

[Figure: the alarm network]


Directed graphs and joint probabilities

• Let {X_1, X_2, …, X_n} be a set of random variables
• Let π_i be the set of parents of X_i
• Associate with each vertex of the directed acyclic graph a random variable X_i and a function of the form f_i(x_i, x_{π_i})
• Then p(x_1, …, x_n) = Π_{i=1}^n f_i(x_i, x_{π_i})


What independences does a Bayes net model?
• In order for a Bayesian network to model a probability distribution, the following must be true by definition:
• Each variable is conditionally independent of all its non-descendants in the graph given the value of all its parents.

This implies

P(X_1, …, X_n) = Π_{i=1}^n P(X_i | parents(X_i))

For the alarm network:
P(E, B, R, A, C) = P(E) P(B) P(R | E) P(A | B, E) P(C | A)

But what else does it imply?


What Independences does a Bayes Network model?

Example:

[Figure: chain Z → Y → X]

Given Y, does learning the value of Z tell us nothing new about X? I.e., is P(X | Y, Z) equal to P(X | Y)?

Yes. Since we know the value of all of X's parents (namely, Y), and Z is not a descendant of X, X is conditionally independent of Z.

Also, since independence is symmetric, P(Z | Y, X) = P(Z | Y).


Quick proof that independence is symmetric

• Assume: P(X | Y, Z) = P(X | Y), i.e., X and Z are independent given Y

P(Z | X, Y) = P(X, Y | Z) P(Z) / P(X, Y)          (Bayes' rule)
= P(X | Y, Z) P(Y | Z) P(Z) / (P(X | Y) P(Y))     (chain rule)
= P(X | Y) P(Y | Z) P(Z) / (P(X | Y) P(Y))        (by assumption)
= P(Y | Z) P(Z) / P(Y) = P(Z | Y)                 (Bayes' rule)


What Independences does a Bayes Network model?

• Let I(X, Y, Z) represent X and Z being conditionally independent given Y.

• I(X, Y, Z)? Yes, just as in the previous example: all of X's parents are given, and Z is not a descendant of X.

[Figure: Y with children X and Z]


What Independences does a Bayes Network model?

• I(X, {U}, Z)? No.
• I(X, {U, V}, Z)? Yes.

[Figure: Z at the top, U and V in the middle, X at the bottom]


Things get a little more confusing

• X has no parents, so we know all its parents' values trivially
• Z is not a descendant of X
• So I(X, {}, Z), even though there's an undirected path from X to Z through an unknown variable Y
• What if we do know the value of Y? Or one of its descendants?

[Figure: X → Y ← Z]


The Burglar Alarm example

• Your house has a twitchy burglar alarm that is also sometimes triggered by earthquakes.
• The Earth arguably doesn't care whether your house is currently being burgled.
• While you are on vacation, one of your neighbors calls and tells you your home's burglar alarm is ringing.

[Figure: Burglar → Alarm ← Earthquake; Alarm → Phone Call]


Burglar Alarm Example (Cont'd)

• But now suppose you learn that there was a medium-sized earthquake in your neighborhood. … Probably not a burglar after all.
• The earthquake "explains away" the hypothetical burglar.
• But then it must NOT be the case that I(Burglar, {Phone Call}, Earthquake), even though I(Burglar, {}, Earthquake)!

[Figure: Burglar → Alarm ← Earthquake; Alarm → Phone Call]


d-separation to the rescue

• Fortunately, there is a relatively simple algorithm for determining whether two variables in a Bayesian network are conditionally independent: d-separation.


d-separation

Two variables are independent if all paths between them are blocked by evidence.

Three cases:
• Common cause
• Intermediate cause
• Common effect


d-separation

• Two variables are independent if all paths between them are blocked by evidence
• Three cases: common cause, intermediate cause, common effect

Common cause (diverging connection):
[Figure: E with children R and A – blocked when E is instantiated, unblocked otherwise]

If we do not know whether an earthquake occurred, then the radio announcement can influence our belief about the alarm having gone off. If we know that an earthquake occurred, then the radio announcement gives no information about the alarm.

Evidence may be transmitted through a diverging connection unless it is instantiated.


d-separation

Intermediate cause (serial connection):
[Figure: chain E → A → C – blocked when A is instantiated, unblocked otherwise]

Evidence may be transmitted through a serial connection unless it is blocked.


d-separation

Common effect (converging connection):
[Figure: E and B both point to A, which has child C – unblocked when A or one of its descendants is instantiated, blocked otherwise]

Evidence may be transmitted through a converging connection only if either the variable or one of its descendants has received evidence.


Example

I(X, Y, Z) denotes that X and Z are independent given Y.
– Surely I(R, {}, B)
– Possibly ¬I(R, {A}, B)
– Surely I(R, {E, A}, B)
– Possibly ¬I(R, {B}, C)

[Figure: E → R, E → A, B → A, A → C]


d-separation

Definition: X and Z are d-separated by a set of evidence variables E iff every undirected path from X to Z is “blocked” by evidence E


d-separation

• Theorem [Verma & Pearl, 1988]: If a set of evidence variables E d-separates X and Z in a Bayesian network's graph, then I(X, E, Z).

• d-separation can be computed in linear time using a depth-first search like algorithm.

• We now have a fast algorithm for automatically inferring whether finding out about the value of one variable might give us any additional hints about some other variable, given what we already know.

• Variables may actually be independent when they’re not d-separated, depending on the actual probabilities involved
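One standard way to implement the test (different from the linear-time path-search algorithm mentioned above, but equivalent) reduces d-separation to ordinary graph separation: keep only the ancestors of the variables involved, moralize, delete the evidence nodes, and check undirected connectivity. A sketch, assuming the network is given as a node → parents mapping:

```python
from collections import deque

def d_separated(parents, x, z, evidence):
    """True iff x and z are d-separated given `evidence` in the DAG
    described by `parents` (a dict: node -> list of parent nodes)."""
    # 1. Restrict to the ancestral subgraph of {x, z} and the evidence.
    keep, stack = set(), [x, z, *evidence]
    while stack:
        n = stack.pop()
        if n not in keep:
            keep.add(n)
            stack.extend(parents.get(n, []))
    # 2. Moralize: connect co-parents, then drop edge directions.
    adj = {n: set() for n in keep}
    for n in keep:
        ps = [p for p in parents.get(n, []) if p in keep]
        for p in ps:
            adj[n].add(p); adj[p].add(n)
        for i, p in enumerate(ps):           # "marry" the parents
            for q in ps[i + 1:]:
                adj[p].add(q); adj[q].add(p)
    # 3. Delete evidence nodes; test undirected reachability by BFS.
    blocked, seen, queue = set(evidence), {x}, deque([x])
    while queue:
        for m in adj[queue.popleft()]:
            if m == z:
                return False                 # an active path exists
            if m not in seen and m not in blocked:
                seen.add(m); queue.append(m)
    return True

# Burglary network from the earlier example: E -> R, E -> A, B -> A, A -> C.
parents = {"R": ["E"], "A": ["E", "B"], "C": ["A"], "E": [], "B": []}
print(d_separated(parents, "B", "E", []))     # True:  I(B, {}, E)
print(d_separated(parents, "B", "E", ["A"]))  # False: explaining away
```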


d-separation example

[Figure: a DAG on nodes A through J]

I(C, {}, D)?
I(C, {A}, D)?
I(C, {A, B}, D)?
I(C, {A, B, J}, D)?
I(C, {A, B, E, J}, D)?


Markov Blanket

• A node is conditionally independent of all other nodes in the network given its parents, children, and children's parents – its Markov blanket

[Figure: Burglary and Earthquake point to Alarm; Alarm points to JohnCalls and MaryCalls]

Burglary is independent of JohnCalls and MaryCalls given Alarm and Earthquake.


Bayesian Networks: Summary

• Bayesian networks offer an efficient representation of probability distributions
• Efficient:
– local models
– independence (d-separation)
• Effective: algorithms take advantage of structure to
– compute posterior probabilities
– compute the most probable instantiation
– support decision making


Inference in Bayesian networks

• A BN compactly models the full joint distribution by taking advantage of existing independences between variables
• Inference tasks:
– Diagnostic inference (from effect to cause): P(Burglary | JohnCalls = T)
– Predictive inference (from cause to effect): P(JohnCalls | Burglary = T)
– Other probabilistic queries (queries on joint distributions)
• Can we take advantage of independences to construct special algorithms and speed up inference?


Bayesian network inference

[Figure: Burglary and Earthquake point to Alarm; Alarm points to JohnCalls and MaryCalls]

P(B) = 0.001    P(E) = 0.002
P(A | B, E) = 0.95    P(A | B, ¬E) = 0.94
P(A | ¬B, E) = 0.29    P(A | ¬B, ¬E) = 0.001
P(J | A) = 0.9    P(J | ¬A) = 0.05
P(M | A) = 0.7    P(M | ¬A) = 0.01


Example

• A device is operating normally or malfunctioning
• A sensor indirectly monitors the operation of the device
• The sensor reading is either high or low


Diagnostic inference: Example

Diagnostic inference: compute the probability that the device is operating normally given that the sensor reading is high (S).


Inference in Bayesian network

Bad news:
– The exact inference problem in BNs is NP-hard (Cooper)
– Approximate inference is also NP-hard (Dagum, Luby)
In practice, things are not so bad.
• Exact inference
– Inference in simple chains
– Variable elimination
– Clustering / join tree algorithms
• Approximate inference
– Stochastic simulation / sampling methods
– Markov chain Monte Carlo methods
– Mean field theory


Computing joint probability distributions using a Bayesian network

Any entry in the joint probability distribution can be calculated from the Bayesian network.

P(J, M, A, ¬B, ¬E) = P(J | M, A, ¬B, ¬E) P(M, A, ¬B, ¬E)
= P(J | A) P(M | A, ¬B, ¬E) P(A, ¬B, ¬E)
= P(J | A) P(M | A) P(A | ¬B, ¬E) P(¬B, ¬E)
= P(J | A) P(M | A) P(A | ¬B, ¬E) P(¬B) P(¬E)

(We're just using the chain rule and conditional independence.)
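With the CPTs from the burglary-network slide above, this entry can be evaluated directly; a short sketch:

```python
# CPTs from the burglary-network slide.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A=T | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(J=T | A)
P_M = {True: 0.70, False: 0.01}                      # P(M=T | A)

# P(J, M, A, ~B, ~E) = P(J|A) P(M|A) P(A|~B,~E) P(~B) P(~E)
p = P_J[True] * P_M[True] * P_A[(False, False)] * (1 - P_B) * (1 - P_E)
print(p)   # ~ 0.000628
```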


Computing joint probabilities

• The joint distribution can be used to answer any query about the domain
• A Bayesian network represents the joint distribution
• Any query about the domain can be answered using a BN
• Tradeoff: a BN can be much more concise, but you need to calculate probabilities from the joint distribution, rather than look them up in a table

General formula:
P(X_1 = x_1, …, X_n = x_n) = Π_{i=1}^n P(X_i = x_i | Parents(X_i))



Inference in Bayesian networks

Blind approach:
• Sum out all un-instantiated variables from the full joint
• Express the joint distribution as a product of conditionals
Computational cost:
• Number of additions: 15
• Number of products: 16 × 4 = 64


Inference in Bayesian networks
Interleave sums and products:
• Combine sums and products in a smart way (multiplicative constants can be taken out of the sum)
Computational cost:
• Number of additions: 1 + 2[1 + 1 + 2] = 9
• Number of products: 2[2 + 2(1 + 2)] = 16


Inference in Bayesian networks

• Smart interleaving of sums and products can help us to speed up the computation of joint probability queries

• What if we want to compute P(B = T , J = T )?

• Smart caching of results of computation that would otherwise be repeated can save time


Inference in Bayesian networks
• When does caching of results become handy?
• There are other queries where results can be shared
• General technique: variable elimination


Inference in Bayesian networks

• When does caching of results become handy?
• What if we want to compute a diagnostic query (e.g., P(B = T | J = T))? It uses exactly the probabilities we have just computed!
• There are other queries where caching and ordering of sums and products can be shared, saving computation
• General technique: variable elimination


Inference in Bayesian networks

General idea of variable elimination

Results cached in tree structure


Inference in Bayesian Networks

Find P(Q = q | E = e)
– Q: the query variable(s)
– E: the set of evidence variables

P(q | e) = P(q, e) / P(e)

X_1, …, X_n are the network variables other than Q and E:
P(q, e) = Σ_{x_1, …, x_n} P(q, e, x_1, x_2, …, x_n)


Basic Inference

P(b) = ?

[Figure: A → B]

P(b) = Σ_a P(a, b) = Σ_a P(b | a) P(a)


Basic Inference

P(b) = Σ_a P(a, b) = Σ_a P(b | a) P(a)

[Figure: chain A → B → C]

P(c) = Σ_b P(c | b) P(b)

P(c) = Σ_{a,b} P(a, b, c) = Σ_{a,b} P(c | a, b) P(b | a) P(a)
= Σ_{a,b} P(c | b) P(b | a) P(a)
= Σ_b P(c | b) Σ_a P(b | a) P(a)
= Σ_b P(c | b) P(b)
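The savings from pushing the sum over a inward can be seen concretely on this chain, using hypothetical CPTs (numbers invented); both orderings give the same answer, but the second touches far fewer terms when the chain is long:

```python
from itertools import product

# Hypothetical CPTs for the chain A -> B -> C (binary variables).
p_a = {0: 0.6, 1: 0.4}
p_b = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}   # p_b[a][b] = P(b | a)
p_c = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}   # p_c[b][c] = P(c | b)

# Blind approach: sum the full joint over both a and b.
def blind(c):
    return sum(p_c[b][c] * p_b[a][b] * p_a[a]
               for a, b in product([0, 1], repeat=2))

# Interleaved: first P(b) = sum_a P(b | a) P(a), then sum over b only.
pb = {b: sum(p_b[a][b] * p_a[a] for a in [0, 1]) for b in [0, 1]}
def smart(c):
    return sum(p_c[b][c] * pb[b] for b in [0, 1])

print(blind(1), smart(1))   # identical: 0.3 0.3
```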


Inference in trees

[Figure: X with parents Y1 and Y2]

P(X) = Σ_{y1,y2} P(X, y1, y2) = Σ_{y1,y2} P(X | y1, y2) P(y1, y2) = Σ_{y1,y2} P(X | y1, y2) P(y1) P(y2)


Polytrees

A network is singly connected (a polytree) if it contains no undirected loops.

[Figure: a multiply connected network (not a polytree) and a polytree]


Inference in polytrees

• Theorem: Inference in polytrees can be performed in time that is polynomial in the number of variables.

• Main idea: in variable elimination, need only maintain distributions over single nodes.


Inference with Bayesian Networks

• Inference in polytrees can be performed efficiently
• Inference with general DAGs is NP-hard – proof by reduction of SAT to Bayesian network inference


Approaches to inference
• Exact inference
– Inference in simple chains
– Variable elimination
– Clustering / join tree algorithms
• Approximate inference
– Stochastic simulation / sampling methods
– Markov chain Monte Carlo methods
– Mean field theory


Inference – a more complicated example

[Figure: Cloudy → Sprinkler, Cloudy → Rain; Sprinkler and Rain → WetGrass]

P(w) = Σ_{c,s,r} P(w | r, s) P(r | c) P(s | c) P(c)
= Σ_{s,r} P(w | r, s) Σ_c P(r | c) P(s | c) P(c)
= Σ_{s,r} P(w | r, s) f_1(r, s)

Because of the structure of the BN, some sub-expressions in the joint depend only on a small number of variables. By computing them once and caching the results, we can avoid generating them exponentially many times.


Variable Elimination

• General idea:
• Write the query in the form
P(X_n, e) = Σ_{x_k} … Σ_{x_3} Σ_{x_2} Π_i P(x_i | pa_i)
• Iteratively:
– Move all irrelevant terms outside of the innermost sum
– Perform the innermost sum, getting a new term
– Insert the new term into the product


Variable Elimination

• A factor over X is a function from Domain(X) to numbers in the interval [0, 1]
• A conditional probability table is a factor
• A joint distribution is a factor
• In Bayesian network inference:
– factors are multiplied to generate new ones
– variables in factors are summed out (marginalization)
– a variable can be summed out as soon as all the factors in which the variable appears have been multiplied
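A minimal sketch of these two operations, restricted to binary variables for brevity (the factor representation and the example numbers are mine, not from the slides):

```python
from itertools import product

class Factor:
    """A factor: a table mapping assignments of its variables to numbers."""
    def __init__(self, names, table):
        self.names, self.table = tuple(names), dict(table)

    def __mul__(self, other):
        """Pointwise product: align shared variables, multiply entries."""
        names = self.names + tuple(n for n in other.names
                                   if n not in self.names)
        table = {}
        for vals in product([0, 1], repeat=len(names)):
            a = dict(zip(names, vals))
            table[vals] = (self.table[tuple(a[n] for n in self.names)] *
                           other.table[tuple(a[n] for n in other.names)])
        return Factor(names, table)

    def sum_out(self, name):
        """Marginalize one variable out of the factor."""
        i = self.names.index(name)
        names = self.names[:i] + self.names[i + 1:]
        table = {}
        for vals, p in self.table.items():
            key = vals[:i] + vals[i + 1:]
            table[key] = table.get(key, 0.0) + p
        return Factor(names, table)

# P(b) for a chain A -> B, with hypothetical numbers:
fa = Factor(["A"], {(0,): 0.6, (1,): 0.4})                    # P(A)
fba = Factor(["B", "A"], {(0, 0): 0.7, (1, 0): 0.3,
                          (0, 1): 0.2, (1, 1): 0.8})          # P(B | A)
print((fba * fa).sum_out("A").table)    # {(0,): 0.5, (1,): 0.5}
```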


A More Complex Example

[Figure: the Asia network – Visit to Asia → Tuberculosis; Smoking → Lung Cancer and Bronchitis; Tuberculosis and Lung Cancer → Abnormality in Chest; Abnormality in Chest → X-Ray; Abnormality in Chest and Bronchitis → Dyspnea]


[Figure: the Asia network with shorthand node names V, S, T, L, A, B, X, D]

P(v) P(s) P(t | v) P(l | s) P(b | s) P(a | t, l) P(x | a) P(d | a, b)

• We want to compute P(d)
• Need to eliminate: v, s, x, t, l, a, b


[Figure: the Asia network]

• We want to compute P(d)
• Need to eliminate: v, s, x, t, l, a, b
• Initial factors: P(v) P(s) P(t | v) P(l | s) P(b | s) P(a | t, l) P(x | a) P(d | a, b)

Eliminate: v. Compute: f_v(t) = Σ_v P(v) P(t | v)

⇒ f_v(t) P(s) P(l | s) P(b | s) P(a | t, l) P(x | a) P(d | a, b)

Note: f_v(t) = P(t). In general, the result of elimination is not necessarily a probability term.


[Figure: the Asia network]

• We want to compute P(d)
• Need to eliminate: s, x, t, l, a, b
• Current factors: f_v(t) P(s) P(l | s) P(b | s) P(a | t, l) P(x | a) P(d | a, b)

Eliminate: s. Compute: f_s(b, l) = Σ_s P(s) P(b | s) P(l | s)

⇒ f_v(t) f_s(b, l) P(a | t, l) P(x | a) P(d | a, b)

Summing over s results in a factor with two arguments, f_s(b, l). In general, the result of elimination may be a function of several variables.


V SLT

A BX D

• We want to compute P(d)
• Need to eliminate: x, t, l, a, b
• Initial factors:

  P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)

Eliminate: x

Compute: f_x(a) = Σ_x P(x|a)

⇒ f_v(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
⇒ f_v(t) f_s(b,l) P(a|t,l) P(x|a) P(d|a,b)
⇒ f_v(t) f_s(b,l) f_x(a) P(a|t,l) P(d|a,b)

Note: f_x(a) = 1 for all values of a.


• We want to compute P(d)
• Need to eliminate: t, l, a, b
• Initial factors:

  P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)

Eliminate: t

Compute: f_t(a,l) = Σ_t f_v(t) P(a|t,l)

⇒ f_v(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
⇒ f_v(t) f_s(b,l) P(a|t,l) P(x|a) P(d|a,b)
⇒ f_v(t) f_s(b,l) f_x(a) P(a|t,l) P(d|a,b)
⇒ f_s(b,l) f_x(a) f_t(a,l) P(d|a,b)


• We want to compute P(d)
• Need to eliminate: l, a, b
• Initial factors:

  P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)

Eliminate: l

Compute: f_l(a,b) = Σ_l f_s(b,l) f_t(a,l)

⇒ f_v(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
⇒ f_v(t) f_s(b,l) P(a|t,l) P(x|a) P(d|a,b)
⇒ f_v(t) f_s(b,l) f_x(a) P(a|t,l) P(d|a,b)
⇒ f_s(b,l) f_x(a) f_t(a,l) P(d|a,b)
⇒ f_l(a,b) f_x(a) P(d|a,b)


• We want to compute P(d)
• Need to eliminate: a, b
• Initial factors:

  P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)

Eliminate: a, then b

Compute: f_a(b,d) = Σ_a f_l(a,b) f_x(a) P(d|a,b), then f_b(d) = Σ_b f_a(b,d)

⇒ f_v(t) f_s(b,l) f_x(a) P(a|t,l) P(d|a,b)
⇒ f_s(b,l) f_x(a) f_t(a,l) P(d|a,b)
⇒ f_l(a,b) f_x(a) P(d|a,b)
⇒ f_a(b,d) ⇒ f_b(d)
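The scope bookkeeping behind this derivation is easy to check mechanically. The sketch below (my own variable names, no CPT numbers) traces which variables each intermediate factor mentions under the order v, s, x, t, l, a, b; its output reproduces f_v(t), f_s(b,l), f_x(a), f_t(a,l), f_l(a,b), f_a(b,d), f_b(d).

    # Scopes of the initial factors P(v), P(s), P(t|v), P(l|s),
    # P(b|s), P(a|t,l), P(x|a), P(d|a,b)
    factors = [{'v'}, {'s'}, {'t', 'v'}, {'l', 's'}, {'b', 's'},
               {'a', 't', 'l'}, {'x', 'a'}, {'d', 'a', 'b'}]

    for var in ['v', 's', 'x', 't', 'l', 'a', 'b']:
        mentioning = [f for f in factors if var in f]
        factors = [f for f in factors if var not in f]
        new_scope = set().union(*mentioning) - {var}   # multiply, then sum out var
        factors.append(new_scope)
        print('eliminate', var, '-> new factor over', sorted(new_scope))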


Basic operations

• Multiplying two factors
• Summing out a variable from a product of factors (marginalization)


Example: Multiplying factors (pointwise product)

• Pointwise product is NOT:
  – matrix multiplication
  – element-by-element multiplication
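A tiny sketch with assumed numbers makes the difference concrete: the product of f(A,B) and g(B,C) is a factor over {A,B,C} in which each entry multiplies the two inputs wherever their assignments to the shared variable B agree.

    f = {(0, 0): 0.3, (0, 1): 0.7, (1, 0): 0.9, (1, 1): 0.1}   # f(A, B), assumed values
    g = {(0, 0): 0.2, (0, 1): 0.8, (1, 0): 0.6, (1, 1): 0.4}   # g(B, C), assumed values

    # h(A, B, C) = f(A, B) * g(B, C): 8 entries, not a 2x2 matrix product
    h = {(a, b, c): f[(a, b)] * g[(b, c)]
         for a in (0, 1) for b in (0, 1) for c in (0, 1)}
    # e.g. h[(1, 0, 1)] = f(1, 0) * g(0, 1) = 0.9 * 0.8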


Dealing with evidence

• How do we deal with evidence?
• Suppose we get evidence V = 1, S = 0, D = 1
• We want to compute P(L, V = 1, S = 0, D = 1)


Dealing with Evidence

• We start by writing the factors:

  P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)

• Since we know that V = 1, we don’t need to eliminate V
• Instead, we can replace the factors P(V) and P(T|V) with

  f_{P(V)} = P(V = 1)    f_{P(T|V)}(T) = P(T | V = 1)

• These “select” the appropriate parts of the original factors given the evidence
• Note that f_{P(V)} is a constant, and thus does not appear in the elimination of other variables
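Setting evidence amounts to selecting the matching rows of each factor that mentions an evidence variable. A minimal sketch, assuming the dict-based factor representation from the earlier sketches (the numbers in the example CPT are assumptions):

    def restrict(variables, table, var, value):
        # Keep only rows consistent with var = value and drop var from the scope
        i = variables.index(var)
        new_vars = variables[:i] + variables[i + 1:]
        new_table = {k[:i] + k[i + 1:]: p
                     for k, p in table.items() if k[i] == value}
        return new_vars, new_table

    # Replacing P(T | V) with f(T) = P(T | V = 1); keys are (t, v), assumed values
    P_t_given_v = {(0, 0): 0.99, (1, 0): 0.01, (0, 1): 0.95, (1, 1): 0.05}
    f_T = restrict(['T', 'V'], P_t_given_v, 'V', 1)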


Variable Elimination

• We now understand variable elimination as a sequence of rewriting operations

• Actual computation is done in the elimination steps
• Computation depends on the order of elimination


Dealing with Evidence

• Given evidence V = 1, S = 0, D = 1
• Compute P(L, V = 1, S = 0, D = 1)
• Initial factors, after setting evidence:

  f_{P(v)} f_{P(s)} f_{P(t|v)}(t) f_{P(l|s)}(l) f_{P(b|s)}(b) P(a|t,l) P(x|a) f_{P(d|a,b)}(a,b)


Dealing with Evidence

• Given evidence V = 1, S = 0, D = 1
• Compute P(L, V = 1, S = 0, D = 1)
• Initial factors, after setting evidence:

  f_{P(v)} f_{P(s)} f_{P(t|v)}(t) f_{P(l|s)}(l) f_{P(b|s)}(b) P(a|t,l) P(x|a) f_{P(d|a,b)}(a,b)

• Eliminating x, we get

  f_{P(v)} f_{P(s)} f_{P(t|v)}(t) f_{P(l|s)}(l) f_{P(b|s)}(b) P(a|t,l) f_x(a) f_{P(d|a,b)}(a,b)



Dealing with Evidence

• Given evidence V = 1, S = 0, D = 1
• Compute P(L, V = 1, S = 0, D = 1)
• Initial factors, after setting evidence:

  f_{P(v)} f_{P(s)} f_{P(t|v)}(t) f_{P(l|s)}(l) f_{P(b|s)}(b) P(a|t,l) P(x|a) f_{P(d|a,b)}(a,b)

• Eliminating x, we get

  f_{P(v)} f_{P(s)} f_{P(t|v)}(t) f_{P(l|s)}(l) f_{P(b|s)}(b) P(a|t,l) f_x(a) f_{P(d|a,b)}(a,b)

• Eliminating t, we get

  f_{P(v)} f_{P(s)} f_{P(l|s)}(l) f_{P(b|s)}(b) f_t(a,l) f_x(a) f_{P(d|a,b)}(a,b)


Dealing with Evidence

• Given evidence V = 1, S = 0, D = 1
• Compute P(L, V = 1, S = 0, D = 1)
• Initial factors, after setting evidence:

  f_{P(v)} f_{P(s)} f_{P(t|v)}(t) f_{P(l|s)}(l) f_{P(b|s)}(b) P(a|t,l) P(x|a) f_{P(d|a,b)}(a,b)

• Eliminating x, we get

  f_{P(v)} f_{P(s)} f_{P(t|v)}(t) f_{P(l|s)}(l) f_{P(b|s)}(b) P(a|t,l) f_x(a) f_{P(d|a,b)}(a,b)

• Eliminating t, we get

  f_{P(v)} f_{P(s)} f_{P(l|s)}(l) f_{P(b|s)}(b) f_t(a,l) f_x(a) f_{P(d|a,b)}(a,b)

• Eliminating a, we get

  f_{P(v)} f_{P(s)} f_{P(l|s)}(l) f_{P(b|s)}(b) f_a(b,l)


Variable Elimination Algorithm

• Let X1, …, Xm be an ordering on the non-query variables:

  Σ_{X1} Σ_{X2} … Σ_{Xm} Π_j P(Xj | Parents(Xj))

• For i = m, …, 1:
  – Leave in the summation for Xi only factors mentioning Xi
  – Multiply the factors, getting a factor that contains a number for each value of the variables mentioned, including Xi
  – Sum out Xi, getting a factor f that contains a number for each value of the variables mentioned, not including Xi
  – Replace the multiplied factor in the summation
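Putting the pieces together, here is a sketch of this loop using the Factor class from the earlier sketch (my own naming; it assumes every variable in the order appears in at least one factor):

    def variable_elimination(factors, order):
        factors = list(factors)
        for var in order:
            mentioning = [f for f in factors if var in f.variables]
            factors = [f for f in factors if var not in f.variables]
            prod = mentioning[0]
            for f in mentioning[1:]:            # multiply the factors mentioning var
                prod = prod.multiply(f)
            factors.append(prod.sum_out(var))   # sum var out; reinsert the result
        result = factors[0]
        for f in factors[1:]:                   # product of whatever remains
            result = result.multiply(f)
        return result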


Complexity of variable elimination

• Suppose in one elimination step we compute

  f_x(y1, …, yk) = Σ_x f′_x(x, y1, …, yk)

  where

  f′_x(x, y1, …, yk) = Π_{i=1..m} f_i(x, y_{i,1}, …, y_{i,l_i})

• This requires m · |Domain(X)| · Π_i |Domain(Yi)| multiplications
  – For each value of x, y1, …, yk, we do m multiplications
• and |Domain(X)| · Π_i |Domain(Yi)| additions
  – For each value of y1, …, yk, we do |Domain(X)| additions
• Complexity is (not surprisingly) exponential in the number of variables in the intermediate factor!
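As a quick sanity check on these counts (my own numbers): if X and Y1, Y2, Y3 are all binary and m = 4 factors mention X, a single elimination step costs 4 · 2 · 2³ = 64 multiplications and 2 · 2³ = 16 additions.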


Understanding Variable Elimination

• We want to select “good” elimination orderings that reduce complexity

• This can be done by examining a graph-theoretic property of the “induced” graph; we will not cover this in class.
• This reduces the problem of finding a good ordering to a graph-theoretic operation that is well understood; unfortunately, computing it is NP-hard!


Exercise: Variable elimination

[Network: Smart, Study → Prepared; Smart, Prepared, Fair → Pass]

p(smart) = .8    p(study) = .6    p(fair) = .9

Query: What is the probability that a student is smart, given that he/she passes the exam?

P(Prepared = T | Study, Smart):

  Smart = T, Study = T: .9
  Smart = T, Study = F: .5
  Smart = F, Study = T: .7
  Smart = F, Study = F: .1

P(Pass = T | Smart, Prepared, Fair):

  Smart = T, Prepared = T, Fair = T: .9
  Smart = T, Prepared = T, Fair = F: .1
  Smart = T, Prepared = F, Fair = T: .7
  Smart = T, Prepared = F, Fair = F: .1
  Smart = F, Prepared = T, Fair = T: .7
  Smart = F, Prepared = T, Fair = F: .1
  Smart = F, Prepared = F, Fair = T: .2
  Smart = F, Prepared = F, Fair = F: .1
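One way to check a hand-worked variable elimination answer is brute-force enumeration of the joint. The sketch below encodes the CPTs as reconstructed in the tables above (the shorthand names are my own) and prints P(smart = T | pass = T).

    from itertools import product

    P_sm, P_st, P_f = 0.8, 0.6, 0.9
    P_pr = {(1, 1): 0.9, (1, 0): 0.5, (0, 1): 0.7, (0, 0): 0.1}   # P(pr=1 | sm, st)
    P_pa = {(1, 1, 1): 0.9, (1, 1, 0): 0.1, (1, 0, 1): 0.7, (1, 0, 0): 0.1,
            (0, 1, 1): 0.7, (0, 1, 0): 0.1, (0, 0, 1): 0.2, (0, 0, 0): 0.1}  # P(pa=1 | sm, pr, f)

    def joint(sm, st, f, pr, pa):
        p = (P_sm if sm else 1 - P_sm) * (P_st if st else 1 - P_st) * (P_f if f else 1 - P_f)
        p *= P_pr[(sm, st)] if pr else 1 - P_pr[(sm, st)]
        p *= P_pa[(sm, pr, f)] if pa else 1 - P_pa[(sm, pr, f)]
        return p

    num = sum(joint(1, st, f, pr, 1) for st, f, pr in product((0, 1), repeat=3))
    den = sum(joint(sm, st, f, pr, 1) for sm, st, f, pr in product((0, 1), repeat=4))
    print(num / den)    # P(smart = T | pass = T)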


Bayesian Network Inference in polytrees – Message Passing algorithm


Decomposing the probabilities

• Suppose we want P(Xi | E) where E is some set of evidence variables.

• Let’s split E into two parts:
  – Ei− is the part consisting of assignments to variables in the subtree rooted at Xi
  – Ei+ is the rest of the variables in E


Decomposing the probabilities

P(Xi | E) = P(Xi | Ei−, Ei+)
          = P(Ei− | Xi, Ei+) P(Xi | Ei+) / P(Ei− | Ei+)
          = P(Ei− | Xi) P(Xi | Ei+) / P(Ei− | Ei+)
          = α π(Xi) λ(Xi)

Where:
• α is a constant independent of Xi
• π(Xi) = P(Xi | Ei+)
• λ(Xi) = P(Ei− | Xi)


Using the decomposition for inference

• We can use this decomposition to do inference as follows. First, compute λ(Xi) = P(Ei− | Xi) for all Xi recursively, using the leaves of the tree as the base case.


Quick aside: “Virtual evidence”

• For theoretical simplicity, but without loss of generality, let us assume that all variables in E (the evidence set) are leaves in the tree.

[Figure: observing Xi is equivalent to adding a child Xi′ of Xi with P(Xi′ | Xi) = 1 if Xi′ = Xi, 0 otherwise, and observing Xi′ instead]


Calculating λ(Xi) for non-leaves

• Suppose Xi has exactly one child, Xj. Then:

  λ(Xi) = P(Ei− | Xi)
        = Σ_{Xj} P(Ei−, Xj | Xi)
        = Σ_{Xj} P(Xj | Xi) P(Ei− | Xi, Xj)
        = Σ_{Xj} P(Xj | Xi) P(Ej− | Xj)
        = Σ_{Xj} P(Xj | Xi) λ(Xj)


Calculating λ(Xi) for non-leaves

• Now, suppose Xi has a set of children, C.
• Since Xi d-separates each of its subtrees, the contribution of each subtree to λ(Xi) is independent:

  λ(Xi) = P(Ei− | Xi) = Π_{Xj ∈ C} λj(Xi)
        = Π_{Xj ∈ C} [ Σ_{Xj} P(Xj | Xi) λ(Xj) ]

• where λj(Xi) is the contribution to P(Ei− | Xi) of the part of the evidence lying in the subtree rooted at one of Xi’s children, Xj.
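A recursive sketch of the λ computation for trees, assuming binary variables, evidence only at leaves (per the virtual-evidence aside), and my own data layout: tree[x] lists the children of x, and cpt[child][v] gives P(child = 1 | parent = v).

    def lam(x, tree, cpt, evidence):
        if x in evidence:                    # observed leaf: indicator vector
            return [1.0 if v == evidence[x] else 0.0 for v in (0, 1)]
        if not tree[x]:                      # unobserved leaf: vacuous evidence
            return [1.0, 1.0]
        out = [1.0, 1.0]
        for child in tree[x]:                # subtrees contribute independently
            lc = lam(child, tree, cpt, evidence)
            for v in (0, 1):
                p1 = cpt[child][v]           # P(child = 1 | x = v)
                out[v] *= (1 - p1) * lc[0] + p1 * lc[1]   # sum over child values
        return out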


We are now λ-happy

• We have a way to recursively compute all the λ(Xi)’s, starting from the root and using the leaves as the base case.

• We can think of each node in the network as an autonomous processor that passes a little “λ message” to its parent.


Computing π(Xi)

[Figure: Xi with parent Xp]

π(Xi) = P(Xi | Ei+)
      = Σ_{Xp} P(Xi, Xp | Ei+)
      = Σ_{Xp} P(Xi | Xp, Ei+) P(Xp | Ei+)
      = Σ_{Xp} P(Xi | Xp) P(Xp | Ei+)
      = Σ_{Xp} P(Xi | Xp) πi(Xp)

• Where πi(Xp) is defined as

  πi(Xp) = P(Xp | E) / λi(Xp)
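The corresponding π step, in the same style as the λ sketch above: belief[p][v] stands for P(Xp = v | E) and lam_to_parent[v] for this node's own λ-message λi(Xp), both assumed already computed (and λi(Xp) assumed nonzero).

    def pi(x, parent, cpt, belief, lam_to_parent):
        # pi_i(x_p) = P(x_p | E) / lambda_i(x_p): divide this child's own
        # contribution back out of the parent's posterior
        pi_i = [belief[parent][v] / lam_to_parent[v] for v in (0, 1)]
        p1 = sum(cpt[x][v] * pi_i[v] for v in (0, 1))         # pi(x = 1)
        p0 = sum((1 - cpt[x][v]) * pi_i[v] for v in (0, 1))   # pi(x = 0)
        return [p0, p1]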


Bayesian network inference in trees

• Thus we can compute all the π(Xi)’s, and, in turn, all the P(Xi|E)’s.

• Can think of nodes as autonomous processors passing λ and π messages to their neighbors


Conjunctive queries

• What if we want, e.g., P(A, B | C) instead of just marginal distributions P(A | C) and P(B | C)?

• Just use chain rule:

  – P(A, B | C) = P(A | C) P(B | A, C)
  – Each of the latter probabilities can be computed using the technique just discussed.


Polytrees

• The previous technique can be generalized to polytrees: the undirected versions of the graphs are still trees, but nodes can have more than one parent


Dealing with cycles

• Can deal with undirected cycles in the graph by:
  – clustering variables together
  – conditioning

[Figures: clustering merges nodes B and C of the loop A–B–C–D into a single node BC; conditioning instantiates a loop-cutting variable once with each of its values (set to 0, set to 1)]


Join trees or junction trees

An arbitrary Bayesian network can be transformed via a graph-theoretic trick into a join tree (also used in databases), in which a similar method can be employed.

In the worst case the join tree nodes must take on values whose number grows exponentially with the number of nodes that are clustered together, but this often works well in practice when the number of nodes per cluster is small.


Junction Tree

• Why junction tree?
  – Variable elimination is inefficient if the undirected graph underlying the Bayesian network contains cycles
  – We can avoid cycles if we turn highly interconnected subsets of the nodes into “supernodes” (clusters)
• Objective
  – Compute P(V = v | E = e), where v is a value of a variable V and e is evidence for a set of variables E


Potentials

• Potentials:
  – Denoted by Φ : X → R⁺ ∪ {0} (a map from instantiations of a set of variables X to the nonnegative reals)
• Marginalization:
  – φ_X = Σ_{Y\X} φ_Y, the marginalization of φ_Y into X, where X ⊆ Y
• Multiplication:
  – φ_Z = φ_X φ_Y, the multiplication of φ_X and φ_Y, where Z = X ∪ Y


Properties of Junction Tree

• An undirected tree
• Each node is a cluster (nonempty set) of variables
• Running intersection property:
  – Given two clusters X and Y, all clusters on the path between X and Y contain X ∩ Y
• Separator sets (sepsets):
  – Intersection of the adjacent clusters

[Example: in the junction tree (ABD) --AD-- (ADE) --DE-- (DEF), ABD is a cluster and DE is a sepset]


Properties of Junction Tree

• Belief potentials:
  – Map each instantiation of clusters or sepsets into a real number
• Constraints:
  – Consistency: for each cluster X and neighboring sepset S,

      Σ_{X\S} φ_X = φ_S

  – The joint distribution:

      P(U) = Π_i φ_{Xi} / Π_j φ_{Sj}


Properties of Junction Tree

• If a junction tree satisfies these properties, it follows that:
  – For each cluster (or sepset) X, φ_X = P(X)
  – The probability distribution of any variable V can be computed using any cluster (or sepset) X that contains V:

      P(V) = Σ_{X\{V}} φ_X


Building Junction Trees

DAG → Moral Graph → Triangulated Graph → (Identifying Cliques) → Junction Tree
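A sketch of two of these steps using networkx (an assumption; the library is not mentioned in the slides). Triangulation is taken as given here; the junction tree is built as a maximum-weight spanning tree of the clique graph with sepset sizes as weights, which yields the running intersection property when the input graph is triangulated.

    import itertools
    import networkx as nx

    def moralize(dag):
        # Marry co-parents, then drop edge directions
        g = dag.to_undirected()
        for node in dag.nodes:
            for u, v in itertools.combinations(list(dag.predecessors(node)), 2):
                g.add_edge(u, v)
        return g

    def junction_tree(triangulated):
        cliques = [frozenset(c) for c in nx.find_cliques(triangulated)]
        cg = nx.Graph()
        cg.add_nodes_from(cliques)
        for a, b in itertools.combinations(cliques, 2):
            if a & b:                        # edge weight = sepset size
                cg.add_edge(a, b, weight=len(a & b))
        return nx.maximum_spanning_tree(cg)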


Constructing the Moral Graph

[Figure: example DAG over nodes A through H]


Constructing The Moral Graph

• Add undirected edges between all co-parents which are not currently joined (“marrying” the parents)

[Figure: the example DAG with co-parents joined]


Constructing The Moral Graph

• Add undirected edges between all co-parents which are not currently joined (“marrying” the parents)
• Drop the directions of the arcs

[Figure: the resulting undirected moral graph]


Triangulating

• An undirected graph is triangulated iff every cycle of length > 3 contains an edge that connects two nonadjacent nodes

[Figure: the triangulated moral graph]


Identifying Cliques

• A clique is a subgraph of an undirected graph that is complete (has an edge between each pair of vertices) and maximal

[Figure: the triangulated graph over A through H, with maximal cliques ABD, ADE, ACE, DEF, CEG, EGH]


Junction Tree

• A junction tree is a subgraph of the clique graph that
  – is a tree
  – contains all the cliques
  – satisfies the junction tree property
• Junction tree property: for each pair U, V of cliques with intersection S, all cliques on the path between U and V contain S

[Figure: junction tree (ABD) --AD-- (ADE) --AE-- (ACE) --CE-- (CEG) --EG-- (EGH), with (DEF) attached to (ADE) through sepset DE]


Inference

• Choose a root
• For each distribution (CPT) in the original Bayes net, put this distribution into one of the clique nodes that contains all the variables referenced by the CPT (at least one such node must exist because of the moralization step)
• For each clique node, take the product of the distributions (as in variable elimination)


Example: Create Join Tree

[Network: X1 → Y1, X1 → X2, X2 → Y2]

Junction Tree: (X1,Y1) --X1-- (X1,X2) --X2-- (X2,Y2)


Example: Initialization

Variable → associated cluster and potential function:

  X1 → (X1,Y1)
  Y1 → (X1,Y1)
  X2 → (X1,X2)
  Y2 → (X2,Y2)

  φ_{X1,Y1} = P(X1) P(Y1|X1)
  φ_{X1,X2} = P(X2|X1)
  φ_{X2,Y2} = P(Y2|X2)

(Sepset potentials are initialized to 1.)


Example: Collect Evidence

• Choose an arbitrary clique, e.g. (X1,X2), where all potential functions will be collected.
• Call recursively neighboring cliques for messages:
• 1. Call (X1,Y1):
  – 1. Projection:

      φ_{X1} = Σ_{{X1,Y1}\{X1}} φ_{X1,Y1} = Σ_{Y1} P(X1,Y1) = P(X1)

  – 2. Absorption:

      φ_{X1,X2} ← φ_{X1,X2} (φ_{X1} / φ_{X1}^old) = P(X2|X1) P(X1) = P(X1,X2)


Example: Collect Evidence (cont.)

• 2. Call (X2,Y2):
  – 1. Projection:

      φ_{X2} = Σ_{{X2,Y2}\{X2}} φ_{X2,Y2} = Σ_{Y2} P(Y2|X2) = 1

  – 2. Absorption:

      φ_{X1,X2} ← φ_{X1,X2} (φ_{X2} / φ_{X2}^old) = P(X1,X2)


Example: Distribute Evidence

• Pass messages recursively to neighboring nodes
• Pass message from (X1,X2) to (X1,Y1):
  – 1. Projection:

      φ_{X1} = Σ_{{X1,X2}\{X1}} φ_{X1,X2} = Σ_{X2} P(X1,X2) = P(X1)

  – 2. Absorption:

      φ_{X1,Y1} ← φ_{X1,Y1} (φ_{X1} / φ_{X1}^old) = P(X1,Y1) · P(X1) / P(X1) = P(X1,Y1)


Example: Distribute Evidence (cont.)

• Pass message from (X1,X2) to (X2,Y2):
  – 1. Projection:

      φ_{X2} = Σ_{{X1,X2}\{X2}} φ_{X1,X2} = Σ_{X1} P(X1,X2) = P(X2)

  – 2. Absorption:

      φ_{X2,Y2} ← φ_{X2,Y2} (φ_{X2} / φ_{X2}^old) = P(Y2|X2) · P(X2) / 1 = P(Y2,X2)
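Projection and absorption are short operations over the dict-based factors used in the earlier sketches. A minimal sketch (my own naming; it assumes sepset potentials start at 1, so the division is safe):

    def project(cluster_vars, cluster_table, sep_vars):
        # Marginalize the cluster potential onto the sepset
        idx = [cluster_vars.index(v) for v in sep_vars]
        out = {}
        for k, p in cluster_table.items():
            key = tuple(k[i] for i in idx)
            out[key] = out.get(key, 0.0) + p
        return out

    def absorb(cluster_vars, cluster_table, sep_vars, new_sep, old_sep):
        # Scale the destination cluster by the ratio of sepset potentials
        idx = [cluster_vars.index(v) for v in sep_vars]
        return {k: p * new_sep[tuple(k[i] for i in idx)]
                     / old_sep[tuple(k[i] for i in idx)]
                for k, p in cluster_table.items()}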


Approximate Inference

• With large and highly connected graphical models, the associated cliques for the junction tree algorithm or the intermediate factors in the variable elimination algorithm will grow in size, generating an exponential blowup in the number of computations performed


Inference in Bayesian networks

Exact inference algorithms:
• Variable elimination
• Symbolic inference (D’Ambrosio)
• Message passing algorithm (Pearl)
• Clustering and join tree approach (Lauritzen, Spiegelhalter)

Approximate inference algorithms:
• Monte Carlo methods: forward sampling, likelihood sampling
• Variational methods


Stochastic simulation

• Suppose you are given values for some subset of the variables, G, and want to infer values for the unknown variables, U
• Randomly generate a very large number of instantiations from the BN
  – Generate instantiations for all variables: start at the root variables and work your way “forward”
• Only keep those instantiations that are consistent with the values for G
• Use the frequency of values for U to get estimated probabilities
• Accuracy of the results depends on the size of the sample (asymptotically approaches exact results)
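A sketch of this procedure for estimating P(WetGrass = T | Cloudy = T) in the sprinkler network, reusing the illustrative CPT numbers assumed earlier (booleans double as 0/1 here):

    import random

    def sample():
        c = random.random() < 0.5
        s = random.random() < (0.1 if c else 0.5)
        r = random.random() < (0.8 if c else 0.2)
        w = random.random() < {(1, 1): 0.99, (1, 0): 0.90,
                               (0, 1): 0.90, (0, 0): 0.0}[(s, r)]
        return c, s, r, w

    def estimate(n=100_000):
        kept = wet = 0
        for _ in range(n):
            c, s, r, w = sample()
            if not c:        # reject samples inconsistent with the evidence
                continue
            kept += 1
            wet += w
        return wet / kept    # frequency estimate of P(w | c)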


Stochastic Simulation

[Network: Cloudy → Sprinkler, Cloudy → Rain; Sprinkler, Rain → WetGrass]

Query: P(WetGrass | Cloudy)?

1. Draw N samples from the BN by repeating 1.1 and 1.2:
   1.1. Guess Cloudy at random according to P(Cloudy)
   1.2. For each guess of Cloudy, guess Sprinkler and Rain, then WetGrass
2. Compute the ratio of the number of runs where WetGrass and Cloudy are true over the number of runs where Cloudy is true

P(WetGrass | Cloudy) = P(WetGrass, Cloudy) / P(Cloudy)


Stochastic simulation

• The probability is approximated using sample frequencies

BN sampling:
• Generate each sample in a top-down manner, following the links in the BN
• A sample is an assignment of values to all variables

BN Sampling Example

[A sequence of figure-only slides stepping through sampling each variable in turn; figures not preserved]


Rejection Sampling

• Generate samples for the full joint by sampling the BN
• Use only the samples that agree with the condition; the remaining samples are rejected
• Problem: many samples can be rejected


Likelihood weighting

• Avoids the inefficiencies of rejection sampling
• Idea: generate only samples consistent with the evidence (the conditioning event)
• If a variable’s value is set by evidence, it is not sampled
• Problem: using simple counts is not enough, since the samples may occur with different probabilities
• Likelihood weighting: with every sample, keep a weight with which it should count towards the estimate
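A sketch for the sprinkler network, estimating P(Rain = T | Sprinkler = T, WetGrass = T) with the illustrative CPTs assumed earlier: evidence variables are clamped rather than sampled, and each sample is weighted by the probability of the clamped values given their sampled parents.

    import random

    def weighted_sample():
        w = 1.0
        c = random.random() < 0.5
        w *= 0.1 if c else 0.5               # Sprinkler clamped to T: weight by P(s=1 | c)
        r = random.random() < (0.8 if c else 0.2)
        w *= 0.99 if r else 0.90             # WetGrass clamped to T: weight by P(w=1 | s=1, r)
        return r, w

    def estimate(n=100_000):
        num = den = 0.0
        for _ in range(n):
            r, w = weighted_sample()
            num += w * r
            den += w
        return num / den                     # weighted estimate of P(r | s=1, w=1)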

Likelihood weighting Example

[A sequence of figure-only slides stepping through likelihood-weighted sampling; figures not preserved]

Likelihood Sampling / Likelihood Weighting

[Figure-only summary slides; figures not preserved]