
CHAPTER 5

Probability Theory (continued): Introduction to Bayesian Networks

Joint Probability

Marginal Probability

Conditional Probability

The Chain Rule I

Bayes’ Rule

More Bayes’ Rule

The Chain Rule II

Independence

Example: Independence

Example: Independence?

Conditional Independence

Conditional Independence

The Chain Rule III

Expectations

Expectations

Estimation

Estimation

• Problems with maximum likelihood estimates:

- If I flip a coin once, and it’s heads, what’s the estimate for P(heads)?
- What if I flip it 50 times with 27 heads?
- What if I flip 10M times with 8M heads?

• Basic idea:
- We have some prior expectation about parameters (here, the probability of heads)
- Given little evidence, we should skew toward prior
- Given lots of evidence, we should listen to data

• How can we accomplish this? Stay tuned!
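A minimal sketch of the idea the slide defers: Laplace smoothing (pseudo-counts) is one simple way to blend a prior with observed counts. The function names and the pseudo-count strength below are illustrative assumptions, not from the slides.

```python
def mle_heads(heads: int, flips: int) -> float:
    """Maximum likelihood estimate: the observed fraction of heads."""
    return heads / flips

def smoothed_heads(heads: int, flips: int, pseudo: float = 1.0) -> float:
    """Laplace-smoothed estimate (hypothetical choice): pretend we saw
    `pseudo` extra heads and `pseudo` extra tails before any real flips."""
    return (heads + pseudo) / (flips + 2 * pseudo)

# 1 flip, 1 head: MLE says P(heads) = 1.0; smoothing skews toward 0.5.
print(mle_heads(1, 1), smoothed_heads(1, 1))          # 1.0   ~0.667
# 50 flips, 27 heads: the two estimates are already close.
print(mle_heads(27, 50), smoothed_heads(27, 50))      # 0.54  ~0.538
# 10M flips, 8M heads: the data drowns out the prior entirely.
print(mle_heads(8_000_000, 10_000_000),
      smoothed_heads(8_000_000, 10_000_000))          # 0.8   ~0.8
```

With little evidence the pseudo-counts dominate; with lots of evidence they become negligible, which is exactly the behavior the slide asks for.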

Lewis Carroll's Pillow Problem

Lewis Carroll's Pillow Problem

Bayesian Networks: Big Picture

• Two big problems with joint probability distributions:
- Unless there are only a few variables, the distribution is too big to represent explicitly (Why?)
- Hard to estimate anything empirically about more than a few variables at a time (Why?)

• Hard to compute answers to queries of the form P(y | a) (Why?)

• Bayesian networks are a technique for describing complex joint distributions (models) using a bunch of simple, local distributions (see the sketch after this list)
- They describe how variables interact locally
- Local interactions chain together to give global, indirect interactions
- For about 10 min, we’ll be very vague about how these interactions are specified
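To make "local interactions chain together" concrete, here is a minimal sketch assuming a hypothetical two-variable net Rain → Traffic with made-up CPT numbers (it anticipates the Traffic example below):

```python
# Local (conditional) distributions: one small table per variable.
p_rain = {True: 0.1, False: 0.9}          # P(Rain), made-up numbers
p_traffic_given_rain = {                   # P(Traffic | Rain), made-up numbers
    True:  {True: 0.8, False: 0.2},
    False: {True: 0.1, False: 0.9},
}

def joint(rain: bool, traffic: bool) -> float:
    """P(rain, traffic) = P(rain) * P(traffic | rain)."""
    return p_rain[rain] * p_traffic_given_rain[rain][traffic]

# Four small local entries recover the full joint table:
for r in (True, False):
    for t in (True, False):
        print(f"P(Rain={r}, Traffic={t}) = {joint(r, t):.2f}")
```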

Graphical Model Notation

Example: Coin Flips

Example: Traffic

Example: Traffic II

Example: Alarm Network

Bayesian Network Semantics

Example: Alarm Network

Size of a Bayes’ Net

• How big is a joint distribution over N Boolean variables?

• 2^N

• How big is a Bayes net if each node has k parents?

• N · 2^k

• Both give you the power to calculate P(X1, X2, …, Xn)
• Bayesian Networks = Huge space savings!
• Also easier to elicit local CPTs
• Also turns out to be faster to answer queries (future class)
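To make the space savings concrete, a quick back-of-the-envelope check; N = 30 and k = 3 are arbitrary illustrative choices, not numbers from the slides.

```python
N, k = 30, 3

joint_entries = 2 ** N       # full joint table over N Boolean variables
bn_entries = N * 2 ** k      # N local CPTs, each with 2^k rows

print(f"Full joint: {joint_entries:,} entries")   # 1,073,741,824
print(f"Bayes net:  {bn_entries:,} entries")      # 240
```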

Building the (Entire) Joint

Example: Traffic

Example: Reverse Traffic

Causality?

• When Bayes’ nets reflect the true causal patterns:
- Often simpler (nodes have fewer parents)
- Often easier to think about
- Often easier to elicit from experts

• BNs need not actually be causal
- Sometimes no causal net exists over the domain
- E.g. consider the variables Traffic and RoofDrips
- End up with arrows that reflect correlation, not causation

• What do the arrows really mean?
- Topology may happen to encode causal structure
- Topology really encodes conditional independencies

Creating Bayes’ Nets

• So far, we talked about how any fixed Bayes’ net encodes a joint distribution

• Next: how to represent a fixed distribution as a Bayes’ net
- Key ingredient: conditional independence
- The exercise we did in “causal” assembly of BNs was a kind of intuitive use of conditional independence
- Now we have to formalize the process

• After that: how to answer queries (inference)

Conditional Independence