
Bayesian Networks (Directed Acyclic Graphical Models)


1

Bayesian Networks (Directed Acyclic Graphical Models)

The situation of a bell that rings whenever the outcomes of two coins are equal cannot be well represented by undirected graphical models.

A clique will be formed because of the induced dependency between the two coins given the bell.

[Figure: graph over Coin1, Bell, and Coin2]

2

Bayesian Networks (BNs)

Examples of models for diseases, symptoms, and risk factors:

One variable for all diseases (values are diseases)

One variable per disease (values are True/False)

Naïve Bayesian Networks versus Bipartite BNs

3

Boundary Basis for Dependency Models

Let M be a dependency model over U = {X1, …, Xn}, and let d be an ordering of these elements.

A boundary basis of M with respect to d is a set of independence statements I(Xi, Bi, Ui − Bi), i = 1, …, n, that hold in M, where Ui = {X1, X2, …, Xi−1} and Bi ⊆ Ui.

A boundary basis is minimal if every Bi is minimal.

Example I: What is the boundary basis for P(X1,X2,X3,X4) = P(X1)P(X2|X1)P(X3|X2)P(X4|X3)?

4

Example I

A boundary basis and a boundary DAG for P(X1,X2,X3,X4) = P(X1) P(X2|X1) P(X3|X2) P(X4|X3):

I(X3, X2, X1)

I(X4, X3, {X1, X2})

[Figure: boundary DAG X1 → X2 → X3 → X4]

The directed acyclic graph (DAG) created by assigning each vertex Xi the parents Bi is called the boundary DAG of M relative to order d.

5

Example II

A boundary basis and a boundary DAG for P(coin1, coin2, bell) = P(coin1) P(coin2) P(bell | coin1, coin2):

I(coin1, { }, coin2)

[Figure: boundary DAG Coin1 → Bell ← Coin2]
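The boundary DAG construction can be sketched by brute force on this example. The snippet below is a minimal illustration, not an efficient algorithm: it assumes fair coins (the slides give no numbers), binary variables, and uses hypothetical helper names (`marginal`, `independent`, `boundary_dag`). For each variable in the given order it searches for a smallest set of predecessors that renders it independent of the remaining predecessors.

```python
from itertools import combinations, product

# Assumption: both coins are fair and the bell rings exactly when they match.
VARS = ["coin1", "coin2", "bell"]
joint = {}
for c1, c2 in product([0, 1], repeat=2):
    joint[(c1, c2, int(c1 == c2))] = 0.25


def marginal(names):
    """Marginal distribution over the named variables (keys follow `names`)."""
    idx = [VARS.index(n) for n in names]
    out = {}
    for assignment, p in joint.items():
        key = tuple(assignment[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def independent(x, z, y, tol=1e-12):
    """Check I(x, z, y) via P(x,y,z) * P(z) == P(x,z) * P(y,z) for all values."""
    z, y = list(z), list(y)
    if not y:
        return True  # independence from the empty set holds trivially
    pxyz, pz = marginal([x] + y + z), marginal(z)
    pxz, pyz = marginal([x] + z), marginal(y + z)
    for vals in product([0, 1], repeat=1 + len(y) + len(z)):
        vx, vy, vz = vals[:1], vals[1:1 + len(y)], vals[1 + len(y):]
        lhs = pxyz.get(vx + vy + vz, 0.0) * pz.get(vz, 0.0)
        rhs = pxz.get(vx + vz, 0.0) * pyz.get(vy + vz, 0.0)
        if abs(lhs - rhs) > tol:
            return False
    return True


def boundary_dag(order):
    """Map each variable to a smallest boundary B_i among its predecessors."""
    parents = {}
    for i, x in enumerate(order):
        preds = order[:i]
        for size in range(len(preds) + 1):  # smallest candidates first
            found = [set(b) for b in combinations(preds, size)
                     if independent(x, b, [p for p in preds if p not in b])]
            if found:
                parents[x] = found[0]
                break
    return parents


print(boundary_dag(["coin1", "coin2", "bell"]))
```

In the order coin1, coin2, bell the search assigns no parents to either coin and both coins as parents of the bell, which matches the DAG above.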

6

Example III

In the order V,S,T,L,B,A,X,D, we have a boundary basis:

I(S, { }, V)

I(T, V, S)

I(L, S, {T, V})

…

I(X, A, {V,S,T,L,B,D})

[Figure: DAG with edges V → T, S → L, S → B, T → A, L → A, A → X, A → D, B → D]

$$P(v,s,t,l,b,a,x,d) = P(v)\,P(s)\,P(t \mid v)\,P(l \mid s)\,P(b \mid s)\,P(a \mid t,l)\,P(x \mid a)\,P(d \mid a,b)$$

Does I({X, D}, A, V) also hold in the dependency model P?

7

Definitions

1. A directed acyclic graph (DAG) D = (U, E) is an I-map of a dependency model M over U if ID(X, Z, Y) ⇒ IM(X, Z, Y) for all disjoint subsets X, Y, Z of U.

2. D is a minimal I-map of M if removing any edge makes D cease to be an I-map.

3. D is a perfect map of M if ID(X, Z, Y) ⇔ IM(X, Z, Y) for all disjoint subsets X, Y, Z of U.

Can we define an "independence" relation ID(X, Z, Y) graphically that answers these probabilistic independence questions?

8

From Separation in UGs

To d-Separation in DAGs

9

Paths

Intuition: dependency must “flow” along paths in the graph

A path is a sequence of neighboring variables

Examples: X ← A → D ← B and A ← L ← S → B

[Figure: the DAG of Example III over V, S, T, L, B, A, X, D]

10

Path blockage

Every path is classified given the evidence:

active: creates a dependency between the end nodes

blocked: does not create a dependency between the end nodes

Evidence means the assignment of a value to a subset of nodes.

11

Path Blockage

Three cases:

Common cause: L ← S → B. The path is blocked when S is in the evidence and active when it is not.

12

Path Blockage

Three cases:

Common cause

Intermediate cause: S → L → A. The path is blocked when L is in the evidence and active when it is not.

13

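Path Blockage

Three cases:

Common cause

Intermediate cause

Common effect: T → A ← L, where X is a child of A. The path is blocked when neither A nor its descendant X is in the evidence, and active when A or X is in the evidence.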

14

Definition of Path Blockage

Definition: A path is active, given evidence Z, if

(1) whenever the path contains a common-effect configuration T → A ← L, either A or one of its descendants is in Z, and

(2) no other nodes on the path are in Z.

Definition: A path is blocked, given evidence Z, if it is not active.

Definition: X is d-separated from Y, given Z, if all paths between a node in X and a node in Y are blocked, given Z.
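The definitions above translate almost literally into code. The following is a minimal sketch that enumerates every simple path and applies the blockage test to each. It assumes the DAG of Example III, with edges read off its factorization, and the function names are hypothetical. Path enumeration is exponential in general, so this is only suitable for small graphs.

```python
# Edges read off the factorization in Example III (an assumption of this sketch).
EDGES = [("V", "T"), ("S", "L"), ("S", "B"), ("T", "A"),
         ("L", "A"), ("A", "X"), ("A", "D"), ("B", "D")]

PARENTS, CHILDREN = {}, {}
for u, v in EDGES:
    CHILDREN.setdefault(u, set()).add(v)
    PARENTS.setdefault(v, set()).add(u)


def descendants(node):
    """All nodes reachable from `node` along directed edges."""
    seen, stack = set(), [node]
    while stack:
        for child in CHILDREN.get(stack.pop(), set()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def simple_paths(a, b, path=None):
    """Yield every simple path from a to b, ignoring edge direction."""
    path = path or [a]
    last = path[-1]
    if last == b:
        yield path
        return
    for n in CHILDREN.get(last, set()) | PARENTS.get(last, set()):
        if n not in path:
            yield from simple_paths(a, b, path + [n])


def is_active(path, z):
    """Apply the path-blockage definition to one path, given evidence z."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        mid_parents = PARENTS.get(mid, set())
        if prev in mid_parents and nxt in mid_parents:
            # Common effect: needs mid or one of its descendants in z.
            if mid not in z and not (descendants(mid) & z):
                return False
        elif mid in z:
            # Common or intermediate cause: blocked when observed.
            return False
    return True


def d_separated(xs, ys, z):
    """True if every path between a node in xs and a node in ys is blocked."""
    z = set(z)
    return not any(is_active(p, z)
                   for x in xs for y in ys
                   for p in simple_paths(x, y))


print(d_separated({"T"}, {"S"}, set()))            # no evidence
print(d_separated({"T"}, {"S"}, {"D"}))            # evidence on D
print(d_separated({"T"}, {"S"}, {"D", "L", "B"}))  # evidence on D, L, B
```

The three calls correspond to the queries ID(T,S|∅), ID(T,S|D), and ID(T,S|{D,L,B}) worked through on the following slides.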

15

d-Separation

16

Example

[Figure: the DAG of Example III over V, S, T, L, B, A, X, D]

ID(T, S | ∅) = yes

17

Example

[Figure: the DAG of Example III over V, S, T, L, B, A, X, D]

ID(T, S | ∅) = yes

ID(T, S | D) = no

18

Example

[Figure: the DAG of Example III over V, S, T, L, B, A, X, D]

ID(T, S | ∅) = yes

ID(T, S | D) = no

ID(T, S | {D, L, B}) = yes

19

Example

In the order V,S,T,L,B,A,X,D, we get from the boundary basis:

ID(S, { }, V)

ID(T, V, S)

ID(L, S, {T, V})

…

ID(X, A, {V,S,T,L,B,D})

[Figure: the DAG of Example III over V, S, T, L, B, A, X, D]

20

Main Result - Soundness

21

Bayesian Networks (Directed Acyclic Graphical Models)

Definition: Given a probability distribution P on a set of variables U, a DAG D = (U,E) is called a Bayesian Network of P iff D is a minimal I-map of P.

22

The first claim holds because any probability distribution is a semi-graphoid (Symmetry, Decomposition, Contraction, Weak union).

23

The second claim, uniqueness of the parent sets, holds due to the following property:

I(X, ZW1, YW2) and I(X, ZW2, YW1) ⇒ I(X, Z, YW1W2)

Proof:

(1) I(X, ZW1, YW2). Given.

(2) I(X, ZW2, YW1). Given.

(3) I(X, ZW1W2, Y) by weak union from (1).

(4) I(X, ZYW1, W2) by weak union from (1).

(5) I(X, ZYW2, W1) by weak union from (2).

(6) I(X, ZY, W1W2) by intersection from (4) and (5).

(7) I(X, Z, YW1W2) by intersection from (3) and (6).

24

d-separation

The definition of ID(X, Z, Y) is such that:

Soundness [Theorem 9]: ID(X, Z, Y) = yes implies that IP(X, Z, Y) follows from the boundary Basis(D).

Completeness [Theorem 10]: ID(X, Z, Y) = no implies that IP(X, Z, Y) does not follow from the boundary Basis(D).

25

Revisiting Example III

[Figure: the DAG of Example III over V, S, T, L, B, A, X, D]

So does IP({X, D}, A, V) hold?

It is enough to check d-separation!
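One way to run this check mechanically is sketched below. Note that it swaps in a different, well-known test for d-separation than the path-by-path definition in these slides: restrict the DAG to the ancestors of X ∪ Y ∪ Z, connect ("marry") parents that share a child, drop edge directions, delete Z, and test ordinary graph separation. The edge list is read off the factorization of Example III and the function names are hypothetical.

```python
# Edges read off the factorization in Example III (an assumption of this sketch).
EDGES = [("V", "T"), ("S", "L"), ("S", "B"), ("T", "A"),
         ("L", "A"), ("A", "X"), ("A", "D"), ("B", "D")]


def parents_of(node):
    return {u for u, v in EDGES if v == node}


def ancestral_set(nodes):
    """The given nodes together with all of their ancestors."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents_of(stack.pop()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def d_separated(xs, ys, zs):
    """d-separation via separation in the moralized ancestral graph."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestral_set(xs | ys | zs)

    # Undirected skeleton restricted to the ancestral set.
    nbrs = {n: set() for n in keep}
    for u, v in EDGES:
        if u in keep and v in keep:
            nbrs[u].add(v)
            nbrs[v].add(u)

    # Moralize: connect every pair of parents of a common child.
    for n in keep:
        ps = list(parents_of(n))
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)

    # Search outward from xs without passing through zs.
    seen, stack = set(xs), list(xs)
    while stack:
        for m in nbrs[stack.pop()] - zs:
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return not (seen & ys)


print(d_separated({"X", "D"}, {"V"}, {"A"}))
print(d_separated({"X"}, {"V"}, {"A"}))
```

Under these assumptions the first query reports that {X, D} is not d-separated from V given A: conditioning on the common effect A activates the path V → T → A ← L ← S → B → D. The second query shows that X alone is d-separated from V given A.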

26

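Bayesian Networks with numbers

[Figure: the DAG of Example III, each node annotated with its local distribution: p(v), p(s), p(t|v), p(l|s), p(b|s), p(a|t,l), p(x|a), p(d|a,b)]

$$P(v,s,t,l,b,a,x,d) = P(v)\,P(s)\,P(t \mid v)\,P(l \mid s)\,P(b \mid s)\,P(a \mid t,l)\,P(x \mid a)\,P(d \mid a,b)$$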

27

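Bayesian Network (cont.)

Each directed acyclic graph defines a factorization of the form:

$$p(x_1,\ldots,x_n) = \prod_{i=1}^{n} p(x_i \mid \mathrm{pa}_i)$$

For the DAG of Example III:

$$P(v,s,t,l,b,a,x,d) = P(v)\,P(s)\,P(t \mid v)\,P(l \mid s)\,P(b \mid s)\,P(a \mid t,l)\,P(x \mid a)\,P(d \mid a,b)$$

[Figure: the DAG of Example III, each node annotated with its local distribution: p(v), p(s), p(t|v), p(l|s), p(b|s), p(a|t,l), p(x|a), p(d|a,b)]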
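As a small illustration of how such a factorization is evaluated, the sketch below multiplies one local distribution per node. Because the slides give no numbers for the eight-variable network, it uses the coin and bell network from Example II instead, and assumes fair coins. The names `PARENTS`, `CPD`, and `joint` are hypothetical.

```python
from itertools import product

# Parent sets of the coin/bell DAG: Coin1 -> Bell <- Coin2.
PARENTS = {"coin1": (), "coin2": (), "bell": ("coin1", "coin2")}


def p_coin(value, parent_values):
    return 0.5  # assumption: fair coin, no parents


def p_bell(value, parent_values):
    rings = parent_values[0] == parent_values[1]  # bell rings iff coins match
    return 1.0 if value == rings else 0.0


CPD = {"coin1": p_coin, "coin2": p_coin, "bell": p_bell}


def joint(assignment):
    """p(x_1, ..., x_n) as the product of p(x_i | pa_i) over all nodes."""
    prob = 1.0
    for var, pa in PARENTS.items():
        prob *= CPD[var](assignment[var], tuple(assignment[p] for p in pa))
    return prob


total = 0.0
for c1, c2, bell in product([False, True], repeat=3):
    total += joint({"coin1": c1, "coin2": c2, "bell": bell})
print(total)  # the factorized joint sums to one
```

Representing each local distribution as a function of the node's value and its parents' values keeps `joint` independent of the particular graph; only `PARENTS` and `CPD` change from one network to another.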

28

Independence in Bayesian networks

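$$p(x_1,\ldots,x_n) = \prod_{i=1}^{n} p(x_i \mid \mathrm{pa}_i) \qquad (*)$$

Compare this with the chain rule, which holds for every distribution:

$$p(x_1,\ldots,x_n) = \prod_{i=1}^{n} p(x_i \mid x_1,\ldots,x_{i-1})$$

Matching the two products term by term gives, for each i:

IP(Xi ; {X1, …, Xi−1} \ Pai | Pai)

This set of independence assertions is denoted Basis(G). All other independence assertions that are entailed by (*) are derivable using the semi-graphoid axioms.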
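The assertions in Basis(G) can be listed mechanically from the parent sets and an ordering. The sketch below assumes the parent sets implied by the factorization of Example III and an ordering in which parents precede their children; the function name `basis` and the tuple layout are hypothetical.

```python
# Parent sets read off the factorization of Example III (an assumption here).
PARENTS = {"V": [], "S": [], "T": ["V"], "L": ["S"], "B": ["S"],
           "A": ["T", "L"], "X": ["A"], "D": ["A", "B"]}


def basis(order, parents):
    """Return one triple (X_i, predecessors minus Pa_i, Pa_i) per variable.

    Each triple stands for the assertion IP(X_i ; preds \\ Pa_i | Pa_i).
    """
    statements = []
    for i, x in enumerate(order):
        pa = set(parents[x])
        preds = set(order[:i])
        # The ordering must list every parent before its child.
        assert pa <= preds, "order must place parents before children"
        statements.append((x, frozenset(preds - pa), frozenset(pa)))
    return statements


for x, rest, pa in basis(list("VSTLBAXD"), PARENTS):
    print(f"I({x} ; {sorted(rest)} | {sorted(pa)})")
```

For instance, the third triple produced for the order V,S,T,L,B,A,X,D is the assertion that T is independent of S given V.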

29

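Local distributions: asymmetric independence

[Figure: Tuberculosis (Yes/No) → Abnormality in Chest (Yes/No) ← Lung Cancer (Yes/No), with local distribution p(A|T,L)]

Table:

p(A=y | L=n, T=n) = 0.02

p(A=y | L=n, T=y) = 0.60

p(A=y | L=y, T=n) = 0.99

p(A=y | L=y, T=y) = 0.99

When L = y the value of T does not change p(A=y), whereas when L = n it does.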
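The asymmetry in the table can be checked directly. This short sketch stores the four table entries and tests, for each value of L, whether p(A=y) changes with T; the dictionary and function names are hypothetical.

```python
# p(A = y | L, T) copied from the table; keys are (L, T).
P_A_YES = {
    ("n", "n"): 0.02,
    ("n", "y"): 0.60,
    ("y", "n"): 0.99,
    ("y", "y"): 0.99,
}


def a_independent_of_t_given(l_value):
    """True if p(A=y | L=l_value, T) is the same for both values of T."""
    return P_A_YES[(l_value, "n")] == P_A_YES[(l_value, "y")]


for l_value in ("n", "y"):
    print(l_value, a_independent_of_t_given(l_value))
```

The independence holds only in the context L = y, so it cannot be read from the graph, which always lists both T and L as parents of A.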

30

COROLLARY 4: D is an I-map of P iff each variable X is conditionally independent in P of all its non-descendants, given its parents.

Proof (if): If each variable X is conditionally independent of all its non-descendants given its parents, then by decomposition it is also independent of its predecessors in a particular order d.

Proof (only if): X is d-separated from all its non-descendants, given its parents. Since D is an I-map, by the soundness theorem the claim holds.

31

COROLLARY 5: If D = (U, E) is a boundary DAG of P constructed in some order d, then any topological order d’ of U will yield the same boundary DAG of P. (Hence the construction order can be forgotten.)

Proof: By Corollary 4, each variable X is d-separated from all its non-descendants, given its parents in the boundary DAG of P.

In particular, due to decomposition, X is independent, given its parents, of all previous variables in any topological order d’.

32

Extension of the Markov Chain Property

I(Xk, Xk−1, X1 … Xk−2) ⇒ I(Xk, Xk−1 Xk+1, X1 … Xk−2 Xk+2 … Xn)

This holds due to the soundness theorem. The converse holds when Intersection is assumed.

Markov Blankets in DAGs
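The slide content for this heading did not survive extraction, so the following sketch relies on the standard characterization rather than on the deck itself: in a DAG, the Markov blanket of a node consists of its parents, its children, and the other parents of its children. The function name and the edge-list representation are assumptions.

```python
def markov_blanket(node, edges):
    """Parents, children, and co-parents of `node` in a DAG given as edges."""
    parents = {u for u, v in edges if v == node}
    children = {v for u, v in edges if u == node}
    co_parents = {u for u, v in edges if v in children} - {node}
    return parents | children | co_parents


# A four-node Markov chain X1 -> X2 -> X3 -> X4.
CHAIN = [(1, 2), (2, 3), (3, 4)]
print(markov_blanket(2, CHAIN))

# The DAG of Example III, with edges read off its factorization.
EDGES = [("V", "T"), ("S", "L"), ("S", "B"), ("T", "A"),
         ("L", "A"), ("A", "X"), ("A", "D"), ("B", "D")]
print(markov_blanket("A", EDGES))
```

On the chain, the blanket of Xk reduces to its two neighbors Xk−1 and Xk+1, which is the conditioning set in the extended Markov chain property above.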

33

Consequence: There is no improvement to d-separation, and no statement escapes graphical representation.

Reasoning: (1) If there were an independence statement σ not shown by d-separation, then σ must be true in all distributions that satisfy the basis. But Theorem 10 states that there exists a distribution that satisfies the basis and violates σ. (2) Same argument. [Note that (2) is a stronger claim.]