
Bayesian Networks

VISA

Hyoungjune Yi

BN – Intro.

Introduced by Pearl (1986 ) Resembles human reasoning Causal relationship Decision support system/ Expert System

Common Sense Reasoning about uncertainty

June is waiting for Larry and Jacobs, who are both late for the VISA seminar

June is worried that if the roads are icy, one or both of them may have crashed their cars

Suddenly June learns that Larry has crashed. June thinks: “If Larry has crashed, then probably the roads are icy. So Jacobs has also crashed”

June then learns that it is warm outside and the roads are salted. June thinks: “Larry was unlucky; Jacobs should still make it”

Causal Relationships

(Diagram: State of Road [icy / not icy] with arrows to Jacobs [crash / no crash] and Larry [crash / no crash])

Larry Crashed!

(Diagram: same network; information flows from Larry [crashed] up through State of Road [icy / not icy] and down to Jacobs [crash / no crash])

But Roads are dry

(Diagram: same network with State of Road observed as not icy; the information flow toward Jacobs [crash / no crash] is blocked)

Wet grass

To avoid icy roads, Larry moves to UCLA; Jacobs moves to USC

One morning as Larry leaves for work, he notices that his grass is wet. He wonders whether he left his sprinkler on or whether it rained

Glancing over at Jacobs’ lawn, he notices that it is also wet

Larry thinks: “Since Jacobs’ lawn is wet, it probably rained last night”

Larry then thinks: “If it rained then that explains why my lawn is wet, so probably the sprinkler is off”

Larry’s grass is wet

(Diagram: Rain [yes / no] and Sprinkler [on / off] point to Larry’s grass [wet]; Rain also points to Jacobs’ grass [wet / dry]; information flow)

Jacobs’ grass is also wet

(Diagram: same network with Jacobs’ grass observed wet; information flows from Jacobs’ grass through Rain and on to Sprinkler [on / off])

Bayesian Network

A data structure that represents the dependencies between variables

Gives a concise specification of the joint probability distribution. A Bayesian Belief Network is a graph in which:

– Nodes are a set of random variables
– Each node has a conditional probability table
– Edges denote conditional dependencies
– DAG: no directed cycles
– Markov condition
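To make this concrete, here is a minimal Python sketch of such a structure, using the wet-grass story from earlier; the node names follow that example, and every probability value below is invented purely for illustration.

```python
# Minimal sketch of a Bayesian network as a data structure:
# each node stores its parents and a conditional probability table (CPT)
# giving P(node = True) for every assignment of the parents.
# All numbers below are made up for illustration.

from typing import Dict, Tuple

# Parents of each node (the graph must be a DAG: no directed cycles).
parents: Dict[str, Tuple[str, ...]] = {
    "Rain": (),
    "Sprinkler": (),
    "LarryGrassWet": ("Rain", "Sprinkler"),
    "JacobsGrassWet": ("Rain",),
}

# CPTs: assignment of the parents (in the order listed above) -> P(node = True).
cpt: Dict[str, Dict[Tuple[bool, ...], float]] = {
    "Rain": {(): 0.2},
    "Sprinkler": {(): 0.1},
    "LarryGrassWet": {
        (True, True): 0.99, (True, False): 0.9,
        (False, True): 0.9, (False, False): 0.01,
    },
    "JacobsGrassWet": {(True,): 0.9, (False,): 0.05},
}

def p_node(node: str, value: bool, assignment: Dict[str, bool]) -> float:
    """P(node = value | its parents), read off the node's CPT."""
    key = tuple(assignment[p] for p in parents[node])
    p_true = cpt[node][key]
    return p_true if value else 1.0 - p_true

# Example lookup: P(JacobsGrassWet = True | Rain = True) = 0.9
print(p_node("JacobsGrassWet", True, {"Rain": True}))
```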

Bayesian network

Markov Assumption
– Each random variable X is independent of its non-descendants given its parents Pa(X)
– Formally, Ind(X; NonDesc(X) | Pa(X)) if G is an I-Map of P (I-Map? Later)

(Diagram: nodes X, Y1, Y2)

Markov Assumption

In this example:
– Ind(E; B)
– Ind(B; E, R)
– Ind(R; A, B, C | E)
– Ind(A; R | B, E)
– Ind(C; B, E, R | A)

(Network: Earthquake → Radio, Earthquake → Alarm, Burglary → Alarm, Alarm → Call)

I-Maps

A DAG G is an I-Map of a distribution P if all Markov assumptions implied by G are satisfied by P

Examples (two candidate joint distributions over nodes X, Y):

x  y  P(x,y)
0  0  0.25
0  1  0.25
1  0  0.25
1  1  0.25

(X and Y are independent here, so a graph with no X–Y edge is an I-Map of this P.)

x  y  P(x,y)
0  0  0.2
0  1  0.3
1  0  0.4
1  1  0.1

(X and Y are dependent here, so a graph with no X–Y edge is not an I-Map of this P.)
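A quick way to see why the first distribution satisfies Ind(X; Y) and the second does not is to check P(x, y) = P(x) P(y) for every cell. A small sketch, assuming the tables as reconstructed above:

```python
# Check Ind(X; Y) for the two example joint distributions above by testing
# whether P(x, y) = P(x) * P(y) for all values of x and y.

from itertools import product

uniform = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
skewed  = {(0, 0): 0.2,  (0, 1): 0.3,  (1, 0): 0.4,  (1, 1): 0.1}

def independent(joint):
    px = {x: sum(joint[(x, y)] for y in (0, 1)) for x in (0, 1)}  # marginal of X
    py = {y: sum(joint[(x, y)] for x in (0, 1)) for y in (0, 1)}  # marginal of Y
    return all(abs(joint[(x, y)] - px[x] * py[y]) < 1e-9
               for x, y in product((0, 1), repeat=2))

print(independent(uniform))  # True:  an edge-less graph over X, Y is an I-Map of this P
print(independent(skewed))   # False: an edge-less graph is not an I-Map of this P
```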

I-Map

G is a minimal I-Map of P iff:
– G is an I-Map of P
– If G′ ⊊ G (G′ obtained from G by removing edges), then G′ is not an I-Map of P

I-Map is not unique

Factorization

Given that G is an I-Map of P, can we simplify the representation of P?

Example:

Since Ind(X; Y), we have that P(X|Y) = P(X). Applying the chain rule:

P(X,Y) = P(X|Y) P(Y) = P(X) P(Y)

Thus, we have a simpler representation of P(X,Y)

(Graph: nodes X and Y, no edge)

Factorization Theorem

Thm: if G is an I-Map of P, then

P(X1, …, Xn) = ∏i P(Xi | Pa(Xi))

P(C,A,R,E,B) = P(B) P(E|B) P(R|E,B) P(A|R,B,E) P(C|A,R,B,E)

versus

P(C,A,R,E,B) = P(B) P(E) P(R|E) P(A|B,E) P(C|A)

(Earthquake network as above: Earthquake → Radio, Earthquake → Alarm, Burglary → Alarm, Alarm → Call)
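To see the reduced factorization in use, the sketch below evaluates P(C,A,R,E,B) = P(B) P(E) P(R|E) P(A|B,E) P(C|A) for the Earthquake network; only the factorization itself comes from the example above, and every CPT value here is a placeholder chosen for illustration.

```python
# Evaluate the factorized joint of the Earthquake network:
#   P(C, A, R, E, B) = P(B) P(E) P(R|E) P(A|B,E) P(C|A)
# CPT values below are placeholders, not taken from the example.

P_B = 0.01                                              # P(Burglary = True)
P_E = 0.02                                              # P(Earthquake = True)
P_R_given_E = {True: 0.9, False: 0.0001}                # P(Radio = True | E)
P_A_given_BE = {(True, True): 0.95, (True, False): 0.94,
                (False, True): 0.29, (False, False): 0.001}  # P(Alarm = True | B, E)
P_C_given_A = {True: 0.9, False: 0.05}                  # P(Call = True | A)

def joint(c, a, r, e, b):
    def p(p_true, value):          # turn P(X = True) into P(X = value)
        return p_true if value else 1.0 - p_true
    return (p(P_B, b) * p(P_E, e) * p(P_R_given_E[e], r)
            * p(P_A_given_BE[(b, e)], a) * p(P_C_given_A[a], c))

# Example: burglary, no earthquake, no radio report, alarm rings, neighbor calls.
print(joint(c=True, a=True, r=False, e=False, b=True))
```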

So what?

We can write P in terms of “local” conditional probabilities

If G is sparse, that is, |Pa(Xi)| < k, each conditional probability can be specified compactly

e.g. for binary variables, each CPT requires O(2^k) params

The representation of P is then compact: linear in the number of variables. For example, with n = 20 binary variables and at most 3 parents per node, at most 20 · 2^3 = 160 parameters are needed, versus 2^20 − 1 ≈ 10^6 for an explicit joint table.

Formal definition of BN

A Bayesian network specifies a probability distribution via two components:

– A DAG G
– A collection of conditional probability distributions P(Xi | Pai)

The joint distribution P is defined by the factorization

Additional requirement: G is a minimal I-Map of P

P(X1, …, Xn) = ∏i P(Xi | Pai)

Bayesian Network - Example

Each node Xi has a conditional probability distribution P(Xi|Pai)

– If variables are discrete, P is usually multinomial
– P can be linear Gaussian, mixture of Gaussians, …

(Network: Pneumonia → Lung Infiltrates, Tuberculosis → Lung Infiltrates, Lung Infiltrates → XRay, Tuberculosis → Sputum Smear)

CPT for P(I | P, T), with I = Lung Infiltrates, P = Pneumonia, T = Tuberculosis:

P   T   P(i | P,T)   P(¬i | P,T)
p   t   0.8          0.2
p   ¬t  0.6          0.4
¬p  t   0.2          0.8
¬p  ¬t  0.01         0.99

BN Semantics

Compact & natural representation:
– nodes have ≤ k parents ⇒ O(2^k · n) vs. O(2^n) params

conditional independencies in BN structure + local probability models = full joint distribution over domain

P(p, t, i, x, s) = P(p) P(t) P(i | p, t) P(x | i) P(s | t)

(Diagram: nodes P, T, I, X, S as in the example network)

d-separation

d-sep(X; Y | Z, G)
– X is d-separated from Y given Z if all paths from a node in X to a node in Y are blocked given Z

Meaning?
– On the blackboard
– Path
  Active: dependency between the end nodes of the path
  Blocked: no dependency
– Common cause, intermediate, common effect (on the blackboard)
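One concrete way to implement the blocking test (a standard criterion, not necessarily the procedure used on the blackboard): X and Y are d-separated given Z iff they are disconnected in the moralized ancestral graph of X ∪ Y ∪ Z after deleting Z. A sketch:

```python
# d-separation via the moralized ancestral graph:
# d-sep(X; Y | Z) holds iff X and Y are disconnected after
# (1) restricting to ancestors of X ∪ Y ∪ Z,
# (2) moralizing (marrying co-parents, dropping edge directions),
# (3) deleting the nodes in Z.

from itertools import combinations

def d_separated(parents, X, Y, Z):
    """parents: dict node -> set of parents; X, Y, Z: sets of node names."""
    # (1) ancestors of X ∪ Y ∪ Z, including those nodes themselves
    relevant, stack = set(), list(X | Y | Z)
    while stack:
        n = stack.pop()
        if n not in relevant:
            relevant.add(n)
            stack.extend(parents.get(n, set()))
    # (2) moralize: parent-child edges plus edges between co-parents
    edges = set()
    for n in relevant:
        ps = parents.get(n, set()) & relevant
        for p in ps:
            edges.add(frozenset((p, n)))
        for p, q in combinations(ps, 2):
            edges.add(frozenset((p, q)))
    # (3) remove Z, then check reachability from X to Y in the undirected graph
    seen, stack = set(), [n for n in X if n not in Z]
    while stack:
        n = stack.pop()
        if n in Y:
            return False            # a path survives, so X and Y are dependent
        if n in seen or n in Z:
            continue
        seen.add(n)
        stack.extend(m for e in edges if n in e for m in e if m != n)
    return True

# Earthquake network from earlier: check Ind(R; A, B, C | E) and a common-effect case.
pa = {"E": set(), "B": set(), "R": {"E"}, "A": {"B", "E"}, "C": {"A"}}
print(d_separated(pa, {"R"}, {"A", "B", "C"}, {"E"}))  # True
print(d_separated(pa, {"B"}, {"E"}, {"A"}))            # False: conditioning on the common effect
```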

BN – Belief, Evidence and Query

A BN is (at least partly) for answering queries. A query involves evidence:

– Evidence is an assignment of values to a set of variables in the domain

Query is an a posteriori belief
– Belief: P(x) = 1 or P(x) = 0
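As an illustration of answering such a query, here is a brute-force sketch: compute P(query | evidence) by summing the factorized joint over all assignments consistent with the evidence. This is exponential in the number of variables, so it is only for tiny networks; the two-node network and its numbers below are made up.

```python
# A posteriori belief by enumeration:
#   P(query = value | evidence) = P(query = value, evidence) / P(evidence)

from itertools import product

def query(nodes, parents, cpt, query_var, query_val, evidence):
    """nodes: ordered list; parents: node -> tuple of parent names;
    cpt: node -> {parent assignment: P(node = True)}; evidence: node -> bool."""
    def joint(assign):
        prob = 1.0
        for n in nodes:
            key = tuple(assign[p] for p in parents[n])
            p_true = cpt[n][key]
            prob *= p_true if assign[n] else 1.0 - p_true
        return prob

    num = den = 0.0
    for values in product([False, True], repeat=len(nodes)):
        assign = dict(zip(nodes, values))
        if any(assign[v] != val for v, val in evidence.items()):
            continue                     # inconsistent with the evidence
        p = joint(assign)
        den += p
        if assign[query_var] == query_val:
            num += p
    return num / den

# Tiny made-up network: Rain -> JacobsGrassWet.
nodes = ["Rain", "JacobsGrassWet"]
parents = {"Rain": (), "JacobsGrassWet": ("Rain",)}
cpt = {"Rain": {(): 0.2}, "JacobsGrassWet": {(True,): 0.9, (False,): 0.05}}
print(query(nodes, parents, cpt, "Rain", True, {"JacobsGrassWet": True}))  # ≈ 0.818
```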

Learning Structure

Problem Definition
– Given: Data D
– Return: directed graph expressing BN

Issue
– Superfluous edges
– Missing edges

Very difficult
– http://robotics.stanford.edu/people/nir/tutorial/

BN Learning

BN models can be learned from empirical data
– parameter estimation via numerical optimization
– structure learning via combinatorial search

The BN hypothesis space is biased towards distributions with independence structure.
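For the parameter-estimation part, the simplest case (complete data, discrete variables) has a closed form rather than requiring numerical optimization: the maximum-likelihood CPT entries are just conditional frequency counts. A sketch with made-up data over two of the example variables:

```python
# Maximum-likelihood CPT estimation from complete discrete data:
# P(node = True | parents = key) is estimated by the conditional frequency.

from collections import Counter

def estimate_cpt(data, node, parent_list):
    """data: list of rows, each a dict {variable: bool}.
    Returns {parent assignment tuple: estimated P(node = True)}."""
    counts, trues = Counter(), Counter()
    for row in data:
        key = tuple(row[p] for p in parent_list)
        counts[key] += 1
        trues[key] += row[node]          # bool counts as 0 or 1
    return {key: trues[key] / counts[key] for key in counts}

# Made-up observations of T (Tuberculosis) and S (Sputum Smear), with T a parent of S.
data = [
    {"T": True,  "S": True},  {"T": True,  "S": True},  {"T": True,  "S": False},
    {"T": False, "S": False}, {"T": False, "S": False}, {"T": False, "S": True},
]
print(estimate_cpt(data, "S", ["T"]))    # {(True,): 2/3, (False,): 1/3}
```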

(Diagram: Data → Inducer → Bayesian network over nodes P, T, I, X, S)