
Page 1: Bayesian Networks II

George Konidaris
[email protected]

Spring 2016

Page 2: Recall: Bayesian Network

[Network diagram: Flu → Sinus ← Allergy; Sinus → Nose; Sinus → Headache.]

Page 3: Recall: BN

[Network diagram: Flu → Sinus ← Allergy; Sinus → Nose; Sinus → Headache.]

Flu     P
True    0.6
False   0.4

Allergy  P
True     0.2
False    0.8

Nose   Sinus  P
True   True   0.8
False  True   0.2
True   False  0.3
False  False  0.7

Headache  Sinus  P
True      True   0.6
False     True   0.4
True      False  0.5
False     False  0.5

Sinus  Flu    Allergy  P
True   True   True     0.9
False  True   True     0.1
True   True   False    0.6
False  True   False    0.4
True   False  False    0.2
False  False  False    0.8
True   False  True     0.4
False  False  True     0.6

Full joint distribution: 32 entries (31 free parameters).

Page 4: Inference

Given A, compute P(B | A).

[Network diagram: Flu → Sinus ← Allergy; Sinus → Nose; Sinus → Headache.]

Page 5: Last Time: Variable Elimination

Using the distributive law, we can eliminate variables one at a time:

$P(h) = \sum_{S,A,N,F} P(h \mid S)\, P(N \mid S)\, P(S \mid A, F)\, P(F)\, P(A)$

$P(h) = \sum_{S,N} P(h \mid S)\, P(N \mid S) \sum_{A,F} P(S \mid A, F)\, P(F)\, P(A)$

$P(h) = \sum_{S} P(h \mid S) \sum_{N} P(N \mid S) \sum_{A,F} P(S \mid A, F)\, P(F)\, P(A)$
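
As a minimal illustration (not from the slides), here is this elimination order evaluated numerically for h = (Headache = true), with the CPT values hard-coded from the Page 3 tables; the variable names are my own.

```python
# Compute P(Headache = True) by eliminating variables inside-out,
# using the CPT values from the Page 3 tables.

P_F = {True: 0.6, False: 0.4}                  # P(Flu)
P_A = {True: 0.2, False: 0.8}                  # P(Allergy)
P_S = {(True, True): 0.9, (True, False): 0.6,  # P(Sinus=True | Flu, Allergy)
       (False, True): 0.4, (False, False): 0.2}
P_H = {True: 0.6, False: 0.5}                  # P(Headache=True | Sinus)

# Innermost sum: phi(S) = sum_{A,F} P(S | A, F) P(F) P(A)
phi = {}
for s in (True, False):
    total = 0.0
    for f in (True, False):
        for a in (True, False):
            p_s = P_S[(f, a)] if s else 1.0 - P_S[(f, a)]
            total += p_s * P_F[f] * P_A[a]
    phi[s] = total

# sum_N P(N | S) = 1, so the Nose factor vanishes.
# Outer sum: P(h) = sum_S P(h | S) phi(S)
p_h = sum(P_H[s] * phi[s] for s in (True, False))
print(p_h)  # 0.5492 for these CPTs
```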

Page 6: Sampling

Bayesian networks are generative models:
• Describe a probability distribution.
• Can draw samples from that distribution.
• This is like a stochastic simulation.
• Computationally expensive, but easy to code!

Page 7: Generative Models

Widely used methodology in machine learning (later).

Describe a generative process for the data:
• Each variable is generated by a distribution.
• Can generate more data.

Natural way to include domain knowledge.

Page 8: Generative Models

[Network diagram and CPTs repeated from Page 3: Flu → Sinus ← Allergy; Sinus → Nose; Sinus → Headache.]

Page 9: Sampling the Joint

Algorithm for generating samples drawn from the joint distribution:

For each node with no parents:
• Draw a sample from its marginal distribution.
• Condition its children on the sampled value (this removes the edge).
• Repeat.

This results in an artificial data set. Probability values: literally just count (see the sketch below).
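
A minimal Python sketch of this procedure, often called ancestral (forward) sampling; the CPT values come from the Page 3 tables, and the function name is my own.

```python
import random

def sample_network():
    """Draw one joint sample by sampling parents before children."""
    flu = random.random() < 0.6
    allergy = random.random() < 0.2
    p_sinus = {(True, True): 0.9, (True, False): 0.6,
               (False, True): 0.4, (False, False): 0.2}[(flu, allergy)]
    sinus = random.random() < p_sinus
    nose = random.random() < (0.8 if sinus else 0.3)
    headache = random.random() < (0.6 if sinus else 0.5)
    return {"Flu": flu, "Allergy": allergy, "Sinus": sinus,
            "Nose": nose, "Headache": headache}

# Probability values: literally just count.
samples = [sample_network() for _ in range(100_000)]
p_headache = sum(s["Headache"] for s in samples) / len(samples)
print(p_headache)  # ~0.55, matching the exact answer from Page 5
```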

Page 10: Sampling the Conditional

What if we want to know P(A | B)?

We could use the previous procedure, and just divide the data up based on B.

What if we want P(A | b)?
• Could do the same, using only the data with B = b.
• But what if b doesn't happen often?
• What if b involves many variables?

Page 11: Sampling the Conditional

Two broad approaches (sketched below).

Rejection sampling:
• Sample, and throw samples away when a mismatch occurs (B ≠ b).

Importance sampling:
• Bias the sampling process to get more "hits".
• Use a reweighting trick to unbias the probabilities.
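
Minimal Python sketches of both ideas for the query P(Flu = true | Nose = true), assuming `sample_network` from the Page 9 sketch is in scope. The importance-sampling variant shown is likelihood weighting, one standard choice for Bayes nets.

```python
import random

def rejection_sample(query, evidence, n=100_000):
    """Estimate P(query = True | evidence) by throwing away samples
    that mismatch the evidence. Reuses sample_network from Page 9."""
    hits = kept = 0
    for _ in range(n):
        s = sample_network()
        if all(s[var] == val for var, val in evidence.items()):
            kept += 1
            hits += s[query]
    return hits / kept  # fails if nothing matched: the rare-evidence problem

def likelihood_weighted_flu(nose_val, n=100_000):
    """Importance sampling: clamp Nose to the evidence value, and weight
    each sample by P(Nose = nose_val | Sinus) to unbias the estimate."""
    num = den = 0.0
    for _ in range(n):
        flu = random.random() < 0.6
        allergy = random.random() < 0.2
        p_sinus = {(True, True): 0.9, (True, False): 0.6,
                   (False, True): 0.4, (False, False): 0.2}[(flu, allergy)]
        sinus = random.random() < p_sinus
        p_nose = 0.8 if sinus else 0.3
        w = p_nose if nose_val else 1.0 - p_nose  # the reweighting factor
        num += w * flu
        den += w
    return num / den

print(rejection_sample("Flu", {"Nose": True}))
print(likelihood_weighted_flu(True))  # should roughly agree
```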

Page 12: Sampling

Properties of sampling:
• Slow.
• Always works (estimates converge given enough samples).
• Always applicable.
• Computers are getting faster.

Page 13: Bayes Nets

High-level thoughts.

Bayes Nets are a type of representation.

There are multiple algorithms for inference; you can choose whichever you like.

AI researchers talk about models more than algorithms.

Page 14: Probability Distributions

If you have a discrete RV, its probability distribution is a table:

Flu     P
True    0.6
False   0.4

What if you have a real-valued random variable?
• Temperature tomorrow
• Rainfall
• Number of votes in an election
• Height

Page 15: PDFs

Continuous probabilities are described by a probability density function f(x).

A PDF is about density, not probability:
• Non-negative.
• Integrates to 1: $\int_X f(x)\,dx = 1$
• f(x) might be greater than 1.

[Figure: a density f plotted against x.]

Page 16: PDFs

We can't ask for P(x = 0.0014245): the probability of any single real-valued number is zero.

Instead, we can ask for a range:

$P(a \le X \le b) = \int_a^b f(x)\,dx$
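
A minimal numerical sketch of this formula: approximate the integral with a midpoint Riemann sum. The density used as f here (the uniform density of Page 18, which is 2 on [0, 0.5]) is my choice for illustration.

```python
def prob_between(f, a, b, steps=10_000):
    """Approximate P(a <= X <= b), the integral of the density f
    over [a, b], with a midpoint Riemann sum."""
    dx = (b - a) / steps
    return sum(f(a + (i + 0.5) * dx) for i in range(steps)) * dx

# Uniform density on [0, 0.5]: f(x) = 2 there, 0 elsewhere.
# Note f(x) = 2 > 1, yet the density integrates to 1.
uniform = lambda x: 2.0 if 0.0 <= x <= 0.5 else 0.0
print(prob_between(uniform, 0.1, 0.3))  # 0.4
print(prob_between(uniform, 0.0, 0.5))  # 1.0
```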

Page 17: Distributions

Distributions are usually specified by a PDF type or family.

Each family is a parametrized function describing the PDF.

You get a specific distribution by fixing the parameters.

Page 18: Uniform Distribution

For example, the uniform distribution over [0, 0.5].

Parameter: the mean.

[Figure: flat density f over [0, 0.5], with the mean µ marked.]

Page 19: Gaussian (Normal)

A mean plus an exponential drop-off, characterized by the variance:

$f(x, \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

[Figure: bell-shaped density f against x, with mean µ and variance σ² marked.]
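
A minimal sketch evaluating this PDF in Python; it also checks numerically that a small-variance Gaussian, whose density exceeds 1 near the mean, still integrates to 1.

```python
import math

def gaussian_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# With a small variance the density exceeds 1 at the mean...
print(gaussian_pdf(0.0, 0.0, 0.01))  # ~3.99

# ...but it still integrates to 1 (midpoint Riemann sum over +/- 10 std devs).
dx = 0.001
total = sum(gaussian_pdf(-1.0 + (i + 0.5) * dx, 0.0, 0.01) * dx
            for i in range(2000))
print(total)  # ~1.0
```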

Page 20: PDFs

When dealing with a real-valued variable, two steps:
• Specifying the family of distribution.
• Specifying the values of the parameters.

Conditioning on a discrete variable just means picking from a discrete number of parameter settings:

µ_A    σ²_A   B
0.5    0.02   True
0.1    0.06   False
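
A minimal sketch of sampling A given B using the parameter table above; note that Python's random.gauss takes the standard deviation, not the variance.

```python
import math
import random

# Parameter table from this slide: the discrete parent B selects (mu_A, var_A).
params = {True: (0.5, 0.02), False: (0.1, 0.06)}

def sample_A_given_B(b):
    """Draw A ~ N(mu_A, var_A) with the parameters chosen by B."""
    mu, var = params[b]
    return random.gauss(mu, math.sqrt(var))  # gauss takes the std dev

print(sample_A_given_B(True))   # clusters around 0.5
print(sample_A_given_B(False))  # clusters around 0.1
```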

Page 21: PDFs

Conditioning on a real-valued RV: the parameters become a function of that RV.

Linear regression:

$f(x) = w \cdot x + \epsilon$

$y \sim N(w \cdot x, \sigma^2)$

[Figure: data points y plotted against x along a fitted line.]
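
A minimal sketch of sampling from this linear-Gaussian conditional; the particular values of w and σ² are made up for illustration.

```python
import math
import random

def sample_y_given_x(x, w=2.0, var=0.05):
    """y ~ N(w * x, var): the mean is a linear function of the parent x.
    The values of w and var are placeholders for illustration."""
    return random.gauss(w * x, math.sqrt(var))

# A small synthetic data set drawn from this conditional model.
data = [(x / 10, sample_y_given_x(x / 10)) for x in range(11)]
print(data[:3])
```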

Page 22: Parametrized Forms

Many machine learning algorithms start with parametrized, generative models.

Find PDFs / CPTs (i.e., parameters) that maximize the probability that the model generated the data.

There are also non-parametric forms, which describe the PDF directly from the data itself rather than through a parametrized function.
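
For CPTs, maximizing the probability of generating the data reduces to counting, as Page 9 suggested. A minimal sketch, assuming `sample_network` from the Page 9 sketch is in scope; it recovers P(Headache | Sinus) from sampled data.

```python
# Fit P(Headache | Sinus) by maximum likelihood from sampled data:
# for a CPT, the maximizing parameters are the empirical frequencies.
samples = [sample_network() for _ in range(100_000)]
for sinus in (True, False):
    rows = [s for s in samples if s["Sinus"] == sinus]
    mle = sum(s["Headache"] for s in rows) / len(rows)
    print(sinus, round(mle, 2))  # ~0.6 and ~0.5, recovering the Page 3 CPT
```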