Upload
letitia-stafford
View
221
Download
0
Tags:
Embed Size (px)
Citation preview
1
Bayesian Networks(Directed Acyclic Graphical Models)
The situation of a bell that rings whenever the outcome of two coins are equal can not be well represented by undirected graphical models.
A clique will be formed because of induced dependency of the two coins given the bell.
Coin1
Bell
Coin2
2
Bayesian Networks (BNs)Examples of models for diseases &
symptoms & risk factors
One variable for all diseases (values are diseases)
One variable per disease (values are True/False)
Naïve Bayesian Networks versus Bipartite BNs
3
Boundary Basis for Dependency Models
Let M be a dependency model over U={X1,…,Xn}. Let d be an ordering of these elements.
A boundary basis wrt d of M is a set of independence statements I(Xi, Bi, Ui-Bi) that hold in M where Ui={X1,X2,…,Xi-1}, i=1,..n.
A boundary basis is minimal if every Bi is minimal.
Example I: What is the boundary basis for P(X1,X2,X3,X4) = P(X1)P(X2|X1)P(X3|X2)P(X4|X3)?
4
Example I
I ( X3 , X2 , X1)
I ( X4 , X3, {X1, X2})
X1 X2 X3 X4
A boundary basis and a boundary DAG for: P(X1,X2,X3,X4) = P(X1)P(X2|X1)P(X3|X2)P(X4|X3)?
The directed acyclic graph (DAG) created by assigning each vertex Xi the parents Bi is called the boundary DAG of M relative to order d.
5
Example II
I ( coin1, { } ,coin2)
Coin1
Bell
Coin2
A boundary basis and a boundary DAG for: P(coin1,coin2,bell) =P(coin1)P(coin2)P(bell|coin1,coin2)
6
Example III
In the order V,S,T,L,B,A,X,D, we have a boundary basis: I( S, { }, V ) I( T, V, S) I( l, S, {T, V}) … I( X,A, {V,S,T,L,B,D})
V S
LT
A B
X D
),|()|(),|()|()|()|()()(
),,,,,,,(
badPaxPltaPsbPslPvtPsPvP
dxabltsvP
Does I ( {X, D} ,A,V) also hold in the dependency model P ?
7
1. A Directed Acyclic Graph (DAG) D=(U,E) is an I-map of a dependency model M over U if ID(X,Z,Y) IM(X,Z,Y) for all disjoint subsets X,Y, Z of U.
2. D is a minimal I-map of M if by removing any edge, D ceases to be an I-map.
3. D is a perfect map of M if ID(X,Z,Y) IM(X,Z,Y) for all disjoint subsets X,Y, Z of U.
DefinitionsCan we define “Independence” ID(X,Z,Y) graphically that answers these probabilistic independence questions ?
8
From Separation in UGs
To d-Separation in DAGs
9
Paths
Intuition: dependency must “flow” along paths in the graph
A path is a sequence of neighboring variables
Examples: X A D B A L S B
V S
LT
A B
X D
10
Path blockage
Every path is classified given the evidence: active -- creates a dependency between the
end nodes blocked – does not create a dependency
between the end nodes
Evidence means the assignment of a value to a subset of nodes.
11
Blocked
S
L B
S
L B
Path Blockage
Three cases: Common cause
Blocked Active
12
Blocked
S
A
L
S
A
L
Path Blockage
Three cases: Common cause
Intermediate cause
Blocked Active
13
Blocked
T L
X
A
T L
X
AT L
X
A
Path Blockage
Three cases: Common cause
Intermediate cause
Common Effect
Blocked Active
14
Definition of Path BlockageDefinition: A path is active, given evidence Z, if Whenever we have the configuration
then either A or one of its descendents is in Z
No other nodes in the path are in Z.
Definition: A path is blocked, given evidence Z, if it is not active.
T L
A
Definition: X is d-separated from Y, given Z, if all paths from a node in X and a node in Y are blocked, given Z.
15
d-Separation
16
ID(T,S|) = yes
Example
V S
LT
A B
X D
17
V S
LT
A B
X D
ID (T,S |) = yes ID(T,S|D) = no
Example
18
ID (T,S |) = yes ID(T,S|D) = no ID(T,S|{D,L,B}) = yes
Example
V S
LT
A B
X D
19
Example
In the order V,S,T,L,B,A,X,D, we get from the boundary basis: ID( S, { }, V )
ID( T, V, S)
ID( l, S, {T, V}) … ID( X,A, {V,S,T,L,B,D})
V S
LT
A B
X D
20
Main Result - Soundness
21
Bayesian Networks(Directed Acyclic Graphical Models)
Definition: Given a probability distribution P on a set of variables U, a DAG D = (U,E) is called a Bayesian Network of P iff D is a minimal I-map of P.
22
First claim holds because any probability distribution is a semi graphoid (Symmetry, Decomposition, Contraction, Weak union).
23
Second claim of uniqueness of parents sets holds due to. I(X,ZW1,YW2) and I(X,ZW2,YW1) I(X,Z,YW1W2)
Proof:(1) I(X, ZW1,YW2). Given.(2) I(X, ZW2,YW1). Given.
(3) I(X, ZW1W2,Y) by weak union from (1).(4) I(X, ZYW1,W2) by weak union from (1).(5) I(X, ZYW2,W1) by weak union from (2).(6) I(X, ZY, W1W2) by intersection from (4) and (5).
I(X, Z, YW1W2) by intersection from (3) and (6).
24
d-separation
The definition of ID(X, Z, Y) is such that:
Soundness [Theorem 9]: ID(X, Z, Y) = yes implies IP(X, Z, Y) follows from the boundary Basis(D).
Completeness [Theorem 10]: ID(X, Z, Y) = no
implies IP(X, Z, Y) does not follow from the boundary Basis(D).
25
Revisiting Example II
V S
LT
A B
X D
So does IP( {X, D} ,A, V) hold ?
Enough to check d-separation !
26
Bayesian Networks with numbers
p(t|v)V S
LT
A B
X D
),|()|(),|()|()|()|()()(
),,,,,,,(
badPaxPltaPsbPslPvtPsPvP
dxabltsvP
p(x|a) p(d|a,b)
p(a|t,l)p(b|s)
p(l|s)
p(s)p(v)
27
Bayesian Network (cont.)Each Directed Acyclic Graph defines a factorization of the form:
n
iiin xpxxp
11 )|(),,( pa
),|()|(),|()|()|()|()()(
),,,,,,,(
badPaxPltaPsbPslPvtPsPvP
dxabltsvP
p(t|v)V S
LT
A B
X D
p(x|a) p(d|a,b)
p(a|t,l)p(b|s)
p(l|s)
p(s)p(v)
28
Independence in Bayesian networks
(*))|(),,(1
1
n
iiin xpxxp pa
n
iiin xxxpxxp
1111 ),|(),,(
This set of independence assertions is denoted Basis(G) .All other independence assertions that are entailed by (*) are derivable using the semi-graphoid axioms.
IP( Xi ; {X1,…,Xi-1}\Pai | Pai )
29
Local distributions- Asymmetric independence
Table:p(A=y|L=n, T=n) = 0.02p(A=y|L=n, T=y) = 0.60p(A=y|L=y, T=n) = 0.99p(A=y|L=y, T=y) = 0.99
Lung Cancer(Yes/No)
Tuberculosis
(Yes/No)
Abnormalityin Chest(Yes/no)
p(A|T,L)
30
COROLLARY 4: D is an I-map of P iff each variable X is conditionally independent in P of all its non-descendants, given its parents.
Proof : Each variable X is conditionally independent of all its non-descendants, given its parents implies using decomposition that it is also independent of its predecessors in a particular order d.
Proof : X is d-separated of all its non-descendants, given its parents. Since D is an I-map, by the soundness theorem the claim holds.
31
COROLLARY 5: If D=(U,E) is a boundary DAG of P constructed in some order d, then any topological order d’ of U will yield the same boundary DAG of P. (Hence construction order can be forgotten).
Proof : By Corollary 4, each variable X is d-separated of all its non-descendants, given its parents in the boundary DAG of P.
In particular, due to decomposition, X is independent given its parents from all previous variables in any topological order d’.
32
Extension of the Markov Chain Property
I(Xk, Xk-1, X1 … Xk-2) I(Xk, Xk-1 Xk+1, X1 … Xk-2 Xk+2… Xn )
Holds due to the soundness theorem. Converse holds when Intersection is assumed.
Markov Blankets in DAGs
33
Consequence: There is no improvement to d-separation and no statement escapes graphical representation.
Reasoning: (1) If there were an independence statement not shown by d-separation, then must be true in all distributions that satisfy the basis. But Theorem 10 states that there exists a distribution that satisfies the basis and violates . (2) Same argument. [Note that (2) is a stronger claim.]