
Page 1: Bayesian Networks II

George Konidaris
[email protected]

Spring 2016

Page 2: Recall: Bayesian Network

[Network diagram: Flu → Sinus ← Allergy; Sinus → Nose; Sinus → Headache.]

Page 3: Recall: BN

[Network diagram: Flu → Sinus ← Allergy; Sinus → Nose; Sinus → Headache.]

Flu     P
True    0.6
False   0.4

Allergy  P
True     0.2
False    0.8

Nose   Sinus  P
True   True   0.8
False  True   0.2
True   False  0.3
False  False  0.7

Headache  Sinus  P
True      True   0.6
False     True   0.4
True      False  0.5
False     False  0.5

Sinus  Flu    Allergy  P
True   True   True     0.9
False  True   True     0.1
True   True   False    0.6
False  True   False    0.4
True   False  False    0.2
False  False  False    0.8
True   False  True     0.4
False  False  True     0.6

Full joint distribution: 32 entries (31 free parameters).

Page 4: Inference

Given A, compute P(B | A).

[Network diagram: Flu → Sinus ← Allergy; Sinus → Nose; Sinus → Headache.]

Page 5: Last Time: Variable Elimination

Using the distributive law, we can eliminate variables one at a time:

$P(h) = \sum_{S,A,N,F} P(h \mid S)\, P(N \mid S)\, P(S \mid A, F)\, P(F)\, P(A)$

$P(h) = \sum_{S,N} P(h \mid S)\, P(N \mid S) \sum_{A,F} P(S \mid A, F)\, P(F)\, P(A)$

$P(h) = \sum_{S} P(h \mid S) \sum_{N} P(N \mid S) \sum_{A,F} P(S \mid A, F)\, P(F)\, P(A)$
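
As a minimal illustration (not from the slides), here is this elimination order evaluated numerically for h = (Headache = true), with the CPT values hard-coded from the Page 3 tables; the variable names are my own.

```python
# Compute P(Headache = True) by eliminating variables inside-out,
# using the CPT values from the Page 3 tables.

P_F = {True: 0.6, False: 0.4}                  # P(Flu)
P_A = {True: 0.2, False: 0.8}                  # P(Allergy)
P_S = {(True, True): 0.9, (True, False): 0.6,  # P(Sinus=True | Flu, Allergy)
       (False, True): 0.4, (False, False): 0.2}
P_H = {True: 0.6, False: 0.5}                  # P(Headache=True | Sinus)

# Innermost sum: phi(S) = sum_{A,F} P(S | A, F) P(F) P(A)
phi = {}
for s in (True, False):
    total = 0.0
    for f in (True, False):
        for a in (True, False):
            p_s = P_S[(f, a)] if s else 1.0 - P_S[(f, a)]
            total += p_s * P_F[f] * P_A[a]
    phi[s] = total

# sum_N P(N | S) = 1, so the Nose factor vanishes.
# Outer sum: P(h) = sum_S P(h | S) phi(S)
p_h = sum(P_H[s] * phi[s] for s in (True, False))
print(p_h)  # 0.5492 for these CPTs
```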

Page 6: Sampling

Bayesian networks are generative models:
• Describe a probability distribution.
• Can draw samples from that distribution.
• This is like a stochastic simulation.
• Computationally expensive, but easy to code!

Page 7: Generative Models

Widely used methodology in machine learning (later).

Describe a generative process for the data:
• Each variable is generated by a distribution.
• Can generate more data.

Natural way to include domain knowledge.

Page 8: Generative Models

[Network diagram and CPTs repeated from Page 3: Flu → Sinus ← Allergy; Sinus → Nose; Sinus → Headache.]

Page 9: Sampling the Joint

Algorithm for generating samples drawn from the joint distribution:

For each node with no parents:
• Draw a sample from its marginal distribution.
• Condition its children on the sampled value (this removes the edge).
• Repeat.

This results in an artificial data set. Probability values: literally just count (see the sketch below).
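
A minimal Python sketch of this procedure, often called ancestral (forward) sampling; the CPT values come from the Page 3 tables, and the function name is my own.

```python
import random

def sample_network():
    """Draw one joint sample by sampling parents before children."""
    flu = random.random() < 0.6
    allergy = random.random() < 0.2
    p_sinus = {(True, True): 0.9, (True, False): 0.6,
               (False, True): 0.4, (False, False): 0.2}[(flu, allergy)]
    sinus = random.random() < p_sinus
    nose = random.random() < (0.8 if sinus else 0.3)
    headache = random.random() < (0.6 if sinus else 0.5)
    return {"Flu": flu, "Allergy": allergy, "Sinus": sinus,
            "Nose": nose, "Headache": headache}

# Probability values: literally just count.
samples = [sample_network() for _ in range(100_000)]
p_headache = sum(s["Headache"] for s in samples) / len(samples)
print(p_headache)  # ~0.55, matching the exact answer from Page 5
```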

Page 10: Sampling the Conditional

What if we want to know P(A | B)?

We could use the previous procedure, and just divide the data up based on B.

What if we want P(A | b)?
• Could do the same, using only the data with B = b.
• But what if b doesn't happen often?
• What if b involves many variables?

Page 11: Sampling the Conditional

Two broad approaches (sketched below).

Rejection sampling:
• Sample, and throw samples away when a mismatch occurs (B ≠ b).

Importance sampling:
• Bias the sampling process to get more "hits".
• Use a reweighting trick to unbias the probabilities.
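
Minimal Python sketches of both ideas for the query P(Flu = true | Nose = true), assuming `sample_network` from the Page 9 sketch is in scope. The importance-sampling variant shown is likelihood weighting, one standard choice for Bayes nets.

```python
import random

def rejection_sample(query, evidence, n=100_000):
    """Estimate P(query = True | evidence) by throwing away samples
    that mismatch the evidence. Reuses sample_network from Page 9."""
    hits = kept = 0
    for _ in range(n):
        s = sample_network()
        if all(s[var] == val for var, val in evidence.items()):
            kept += 1
            hits += s[query]
    return hits / kept  # fails if nothing matched: the rare-evidence problem

def likelihood_weighted_flu(nose_val, n=100_000):
    """Importance sampling: clamp Nose to the evidence value, and weight
    each sample by P(Nose = nose_val | Sinus) to unbias the estimate."""
    num = den = 0.0
    for _ in range(n):
        flu = random.random() < 0.6
        allergy = random.random() < 0.2
        p_sinus = {(True, True): 0.9, (True, False): 0.6,
                   (False, True): 0.4, (False, False): 0.2}[(flu, allergy)]
        sinus = random.random() < p_sinus
        p_nose = 0.8 if sinus else 0.3
        w = p_nose if nose_val else 1.0 - p_nose  # the reweighting factor
        num += w * flu
        den += w
    return num / den

print(rejection_sample("Flu", {"Nose": True}))
print(likelihood_weighted_flu(True))  # should roughly agree
```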

Page 12: Sampling

Properties of sampling:
• Slow.
• Always works (estimates converge given enough samples).
• Always applicable.
• Computers are getting faster.

Page 13: Bayes Nets

High-level thoughts.

Bayes Nets are a type of representation.

There are multiple algorithms for inference; you can choose whichever you like.

AI researchers talk about models more than algorithms.

Page 14: Probability Distributions

If you have a discrete RV, its probability distribution is a table:

Flu     P
True    0.6
False   0.4

What if you have a real-valued random variable?
• Temperature tomorrow
• Rainfall
• Number of votes in an election
• Height

Page 15: PDFs

Continuous probabilities are described by a probability density function f(x).

A PDF is about density, not probability:
• Non-negative.
• Integrates to 1: $\int_X f(x)\,dx = 1$
• f(x) might be greater than 1.

[Figure: a density f plotted against x.]

Page 16: PDFs

We can't ask for P(x = 0.0014245): the probability of any single real-valued number is zero.

Instead, we can ask for a range:

$P(a \le X \le b) = \int_a^b f(x)\,dx$
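
A minimal numerical sketch of this formula: approximate the integral with a midpoint Riemann sum. The density used as f here (the uniform density of Page 18, which is 2 on [0, 0.5]) is my choice for illustration.

```python
def prob_between(f, a, b, steps=10_000):
    """Approximate P(a <= X <= b), the integral of the density f
    over [a, b], with a midpoint Riemann sum."""
    dx = (b - a) / steps
    return sum(f(a + (i + 0.5) * dx) for i in range(steps)) * dx

# Uniform density on [0, 0.5]: f(x) = 2 there, 0 elsewhere.
# Note f(x) = 2 > 1, yet the density integrates to 1.
uniform = lambda x: 2.0 if 0.0 <= x <= 0.5 else 0.0
print(prob_between(uniform, 0.1, 0.3))  # 0.4
print(prob_between(uniform, 0.0, 0.5))  # 1.0
```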

Page 17: Distributions

Distributions are usually specified by a PDF type or family.

Each family is a parametrized function describing the PDF.

You get a specific distribution by fixing the parameters.

Page 18: Uniform Distribution

For example, the uniform distribution over [0, 0.5].

Parameter: the mean.

[Figure: flat density f over [0, 0.5], with the mean µ marked.]

Page 19: Gaussian (Normal)

A mean plus an exponential drop-off, characterized by the variance:

$f(x, \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

[Figure: bell-shaped density f against x, with mean µ and variance σ² marked.]
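
A minimal sketch evaluating this PDF in Python; it also checks numerically that a small-variance Gaussian, whose density exceeds 1 near the mean, still integrates to 1.

```python
import math

def gaussian_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# With a small variance the density exceeds 1 at the mean...
print(gaussian_pdf(0.0, 0.0, 0.01))  # ~3.99

# ...but it still integrates to 1 (midpoint Riemann sum over +/- 10 std devs).
dx = 0.001
total = sum(gaussian_pdf(-1.0 + (i + 0.5) * dx, 0.0, 0.01) * dx
            for i in range(2000))
print(total)  # ~1.0
```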

Page 20: PDFs

When dealing with a real-valued variable, two steps:
• Specifying the family of distribution.
• Specifying the values of the parameters.

Conditioning on a discrete variable just means picking from a discrete number of parameter settings:

µ_A    σ²_A   B
0.5    0.02   True
0.1    0.06   False
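
A minimal sketch of sampling A given B using the parameter table above; note that Python's random.gauss takes the standard deviation, not the variance.

```python
import math
import random

# Parameter table from this slide: the discrete parent B selects (mu_A, var_A).
params = {True: (0.5, 0.02), False: (0.1, 0.06)}

def sample_A_given_B(b):
    """Draw A ~ N(mu_A, var_A) with the parameters chosen by B."""
    mu, var = params[b]
    return random.gauss(mu, math.sqrt(var))  # gauss takes the std dev

print(sample_A_given_B(True))   # clusters around 0.5
print(sample_A_given_B(False))  # clusters around 0.1
```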

Page 21: PDFs

Conditioning on a real-valued RV: the parameters become a function of that RV.

Linear regression:

$f(x) = w \cdot x + \epsilon$

$y \sim N(w \cdot x, \sigma^2)$

[Figure: data points y plotted against x along a fitted line.]
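
A minimal sketch of sampling from this linear-Gaussian conditional; the particular values of w and σ² are made up for illustration.

```python
import math
import random

def sample_y_given_x(x, w=2.0, var=0.05):
    """y ~ N(w * x, var): the mean is a linear function of the parent x.
    The values of w and var are placeholders for illustration."""
    return random.gauss(w * x, math.sqrt(var))

# A small synthetic data set drawn from this conditional model.
data = [(x / 10, sample_y_given_x(x / 10)) for x in range(11)]
print(data[:3])
```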

Page 22: Parametrized Forms

Many machine learning algorithms start with parametrized, generative models.

Find PDFs / CPTs (i.e., parameters) that maximize the probability that the model generated the data.

There are also non-parametric forms, which describe the PDF directly from the data itself rather than through a parametrized function.
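
For CPTs, maximizing the probability of generating the data reduces to counting, as Page 9 suggested. A minimal sketch, assuming `sample_network` from the Page 9 sketch is in scope; it recovers P(Headache | Sinus) from sampled data.

```python
# Fit P(Headache | Sinus) by maximum likelihood from sampled data:
# for a CPT, the maximizing parameters are the empirical frequencies.
samples = [sample_network() for _ in range(100_000)]
for sinus in (True, False):
    rows = [s for s in samples if s["Sinus"] == sinus]
    mle = sum(s["Headache"] for s in rows) / len(rows)
    print(sinus, round(mle, 2))  # ~0.6 and ~0.5, recovering the Page 3 CPT
```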