Probabilistic & Approximate Computing Sasa Misailovic UIUC

Source: misailo.web.engr.illinois.edu › courses › 598sm-fa19 › 598sm-lec6.pdf



Page 2

Distribution of sum of two uniforms

X := Uniform(0,1)

Y := Uniform(0,1)

Z := X + Y

return Z

$ psi sum_uniform.prb

[Figure: plot of the density f(z) against z, with axis ticks at 0, 1, and 2]
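The density that the tool reports for Z can be cross-checked by hand. The following is a minimal sketch in plain Python (not PSI), assuming X and Y are independent: it compares the standard triangular closed form for the sum of two Uniform(0, 1) variables with a numerical convolution. The function names and the step count are illustrative choices, not part of the lecture.

```python
def uniform_pdf(x: float) -> float:
    """Density of Uniform(0, 1)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def sum_density_closed_form(z: float) -> float:
    """Triangular density of Z = X + Y for independent X, Y ~ Uniform(0, 1)."""
    if 0.0 <= z <= 1.0:
        return z
    if 1.0 < z <= 2.0:
        return 2.0 - z
    return 0.0


def sum_density_numeric(z: float, steps: int = 10_000) -> float:
    """Approximate the convolution integral of f_X(x) * f_Y(z - x) over x in [0, 1]
    with the midpoint rule."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += uniform_pdf(x) * uniform_pdf(z - x)
    return total * h


# Print the closed form next to the numerical approximation at a few points.
for z in (0.25, 0.5, 1.0, 1.5, 1.75):
    print(z, sum_density_closed_form(z), round(sum_density_numeric(z), 3))
```

The two columns should agree up to discretization error, with the peak f(1) = 1 in the middle of the interval [0, 2].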

Page 3

Probabilistic Programs

Extend Standard (Deterministic) Programs

Distribution:  X := Uniform(0, 1);

Assertion:     assert(X >= 0);

Observation:   observe(X >= 0.5);

Query:         return X;

Page 4

Probabilistic Model

$A \sim \mathrm{Bernoulli}(0.5)$

(head: 1, tail: 0)

$P(A = 1)$

[Figure: bar chart of p over outcomes 0 and 1; both bars at 0.5]

Page 5

Probabilistic Model

$A \sim \mathrm{Bernoulli}(0.5)$
$B \sim \mathrm{Bernoulli}(0.5)$
$C \sim \mathrm{Bernoulli}(0.5)$

(head: 1, tail: 0)

$P(A = 1)$

[Figure: bar chart of p over outcomes 0 and 1; both bars at 0.5]

Page 6

Probabilistic Model

$A \sim \mathrm{Bernoulli}(0.5)$
$B \sim \mathrm{Bernoulli}(0.5)$
$C \sim \mathrm{Bernoulli}(0.5)$

(head: 1, tail: 0)

Observation: ≥2 heads

$P(A = 1 \mid A + B + C \ge 2)$

[Figure: two bar charts of p over outcomes 0 and 1. Prior Distribution: 0.5 and 0.5. Posterior Distribution: 0.25 and 0.75.]

Page 7

Probabilistic Programming

def main(){
    A := flip(0.5);
    B := flip(0.5);
    C := flip(0.5);
    observe(A+B+C >= 2);  // condition on at least two heads
    return A;             // query: posterior of A
}

$A \sim \mathrm{Bernoulli}(0.5)$
$B \sim \mathrm{Bernoulli}(0.5)$
$C \sim \mathrm{Bernoulli}(0.5)$

(head: 1, tail: 0)

Observation: ≥2 heads

$P(A = 1 \mid A + B + C \ge 2)$
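What an exact inference engine has to compute for this program can be sketched by brute force. The plain-Python sketch below (not PSI) enumerates all eight coin outcomes, discards those that violate the observation, and renormalizes. The function name and the parameters `p_a`, `p_b`, `p_c` are hypothetical and only serve the illustration.

```python
from itertools import product


def posterior_a(p_a: float = 0.5, p_b: float = 0.5, p_c: float = 0.5) -> dict:
    """Return Pr(A = a | A + B + C >= 2) for a in {0, 1} by exhaustive enumeration."""
    weights = {0: 0.0, 1: 0.0}
    for a, b, c in product((0, 1), repeat=3):
        # Prior probability of this joint outcome (the three flips are independent).
        prior = ((p_a if a else 1 - p_a)
                 * (p_b if b else 1 - p_b)
                 * (p_c if c else 1 - p_c))
        # observe(A+B+C >= 2): outcomes that violate the observation contribute nothing.
        if a + b + c >= 2:
            weights[a] += prior
    evidence = weights[0] + weights[1]  # Pr(A + B + C >= 2)
    return {a: w / evidence for a, w in weights.items()}


print(posterior_a())
```

For three fair coins, four of the eight outcomes satisfy the observation and A = 1 in three of them, which gives the posterior 0.25 / 0.75.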

Page 8

Probabilistic Programming

def main(){
    A := flip(0.5);
    B := flip(0.5);
    C := flip(0.5);
    observe(A+B+C >= 2);
    return A;
}

Inference Engine

[Figure: bar chart of the posterior p over outcomes 0 and 1; bars at 0.25 and 0.75]


Page 10

Probabilistic Programming

def main(){
    A := flip(0.5+0.1);
    B := flip(0.5);
    C := flip(0.5);
    observe(A+B+C >= 2);
    return A;
}

Inference Engine

[Figure: two bar charts of the posterior p over outcomes 0 and 1. Original: 0.25 and 0.75. Modified: 0.18 and 0.82.]
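The modified posterior can be reproduced by hand. As a worked check, writing $S = A + B + C$ and using only the probabilities that appear in the program:

```latex
\begin{align*}
\Pr(A = 1,\; S \ge 2) &= 0.6 \cdot \Pr(B + C \ge 1) = 0.6 \cdot 0.75 = 0.45 \\
\Pr(A = 0,\; S \ge 2) &= 0.4 \cdot \Pr(B + C = 2) = 0.4 \cdot 0.25 = 0.10 \\
\Pr(A = 1 \mid S \ge 2) &= \frac{0.45}{0.45 + 0.10} \approx 0.82
\end{align*}
```

The complementary value is $0.10 / 0.55 \approx 0.18$, so a change of 0.1 in the prior of A shifts the posterior from 0.75 to about 0.82.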

Page 11

Probabilistic Applications

• Scene labeling

• Spam Filter

• Face Reconstruction

• GPS & Navigation

• Modeling of Complex Systems

Page 12

Pyro Tensorflow

Page 13

Example Language: WebPPL (www.webppl.org)

Page 14

Probability Refresher

Rosenthal, J., A First Look at Rigorous Probability Theory, 2nd ed.

Page 15

Probability Distribution

• Discrete Distributions

• Continuous Distributions

• Hybrid Joint Distributions

Probability Refresher

Page 16

Distribution Function

Probability Distribution Function

Probability Mass Function

Probability Density Function

Page 17

Expectation

Expected value: measure of central tendency

Variance: measure of spread
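The two quantities are named here without formulas. For reference, the standard textbook definitions are given below, for a discrete variable with mass function $\Pr$ and a continuous variable with density $f$ (the symbol $f$ is introduced here for illustration):

```latex
\begin{align*}
\mathbb{E}[X] &= \sum_{x \in \mathrm{Dom}(X)} x \cdot \Pr(x)
  && \text{(discrete)} \\
\mathbb{E}[X] &= \int_{-\infty}^{\infty} x \, f(x) \, dx
  && \text{(continuous)} \\
\mathrm{Var}[X] &= \mathbb{E}\big[(X - \mathbb{E}[X])^2\big]
  = \mathbb{E}[X^2] - \mathbb{E}[X]^2
\end{align*}
```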

Page 18

Probabilistic Programs and Graphical Models

X := Uniform(0,1)

Y := Uniform(0,1)

Z := X + Y

return Z

Dependency Graph: X → Z ← Y

Page 19

Bayes’ Rule: Belief Revision

Thomas Bayes (1701–1761)

Page 20

Bayes’ Rule: Belief Revision

$\Pr(\theta \mid x) = \dfrac{\Pr(x \mid \theta) \cdot \Pr(\theta)}{\Pr(x)}$

Hypothesis: $\theta$

Data: $x$

Page 21

Bayes’ Rule: Belief Revision

$\Pr(\theta \mid x) = \dfrac{\Pr(x \mid \theta) \cdot \Pr(\theta)}{\Pr(x)}$

Posterior Distribution: $\Pr(\theta \mid x)$

Likelihood: $\Pr(x \mid \theta)$

Prior Distribution: $\Pr(\theta)$

Normalization Constant: $\Pr(x)$

Page 22

Is Our Brain Statistical?*

Probability of sickness is 1%.

If a patient is sick, the probability that the medical test returns positive is 80% (true positive).

If a patient is not sick, the probability that the medical test returns positive is 9.6% (false positive).

For a given patient, the test returned positive.

What is the probability that the patient is sick?

* Kahneman and Tversky (1974)

Page 23

Is Our Brain Statistical?

var test_effective = function() {
  var PatientSick = flip(0.01);                              // prior: 1% of patients are sick
  var PositiveTest = PatientSick ? flip(0.8) : flip(0.096);  // true / false positive rates
  condition(PositiveTest == true);                           // the test came back positive
  return PatientSick;
}

Infer({method: 'enumerate'}, test_effective)

[Figure: bar chart of the result. TRUE: 0.078. FALSE: 0.922.]

Fallacy: Base rate neglect

For discussion: Goodman & Tenenbaum, Probabilistic Models of Cognition (Ch. 3)
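The reported posterior follows directly from Bayes' rule. As a worked check with the numbers from the problem statement (1% base rate, 80% true-positive rate, 9.6% false-positive rate):

```latex
\begin{align*}
\Pr(\text{sick} \mid +)
  &= \frac{\Pr(+ \mid \text{sick}) \cdot \Pr(\text{sick})}
          {\Pr(+ \mid \text{sick}) \cdot \Pr(\text{sick})
           + \Pr(+ \mid \neg\text{sick}) \cdot \Pr(\neg\text{sick})} \\
  &= \frac{0.8 \cdot 0.01}{0.8 \cdot 0.01 + 0.096 \cdot 0.99}
   = \frac{0.008}{0.10304} \approx 0.078
\end{align*}
```

The denominator is dominated by the false positives among the 99% of healthy patients, which is exactly the term that base rate neglect ignores.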

Page 24

Bayesian Nets

Alternative representation of probabilistic models

Graphical representation of dependencies among random variables:

• Nodes are variables

• Links from parent to child nodes are direct dependencies between variables

• Instead of the full joint distribution, we now specify terms $\Pr(X \mid \mathit{parents}(X))$

The graph has no cycles: it is a directed acyclic graph (DAG).
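Written out, the terms $\Pr(X \mid \mathit{parents}(X))$ multiply to the joint distribution. This is the standard Bayesian-network factorization, shown here for a generic network with variables $X_1, \ldots, X_n$ and for the two-variable medical-test model (abbreviating PatientSick as $S$ and PositiveTest as $T$, names chosen for this illustration):

```latex
\begin{align*}
\Pr(X_1, \ldots, X_n) &= \prod_{i=1}^{n} \Pr\big(X_i \mid \mathit{parents}(X_i)\big) \\
\Pr(S, T) &= \Pr(S) \cdot \Pr(T \mid S)
\end{align*}
```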

Page 25

Queries

Posterior distribution – what we got

Expected value – $\mathbb{E}[X] = \sum_{x \in \mathrm{Dom}(X)} x \cdot \Pr(x)$

Most likely value – Mode of the distribution
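Once a posterior is available as a table, the other two queries are one-liners. The following is a minimal plain-Python sketch, assuming the distribution is stored as a dictionary from values to probabilities; the posterior used is the 0.25 / 0.75 result of the three-coin example, and the function names are illustrative.

```python
def expected_value(dist: dict) -> float:
    """E[X] = sum over x in Dom(X) of x * Pr(x)."""
    return sum(x * p for x, p in dist.items())


def mode(dist: dict):
    """Most likely value: the x that maximizes Pr(x)."""
    return max(dist, key=dist.get)


# Posterior of A given at least two heads, from the three-coin example.
posterior = {0: 0.25, 1: 0.75}

print(expected_value(posterior))
print(mode(posterior))
```

For a 0/1 variable the expected value equals the probability of 1, so both queries here can be read directly off the posterior.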


Page 27

Variable Dependencies

var test_effective = function() {
  var PatientSick = flip(0.01);
  var PositiveTest = PatientSick ? flip(0.8) : flip(0.096);
  condition(PositiveTest == true);
  return PatientSick;
}

Infer({method: 'enumerate'}, test_effective)

Dependency graph: PatientSick → PositiveTest

Page 28

Variable Dependencies

var test_x = function() {
  var x = flip(0.50);
  var y = x ? flip(0.1) : flip(0.2);  // y depends on x
  var z = x ? flip(0.3) : flip(0.4);  // z depends on x
  condition(x == 1);                  // observe the common parent
  return [y, z];
}

Dependency graph: X → Y, X → Z
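In this model Y and Z share the parent X, so they are dependent on their own but become independent once X is observed. The plain-Python sketch below (not WebPPL) checks both claims by enumeration, using the probabilities from the program; the helper names are hypothetical.

```python
from itertools import product

P_X = 0.5
P_Y_GIVEN_X = {True: 0.1, False: 0.2}
P_Z_GIVEN_X = {True: 0.3, False: 0.4}
BOOLS = (False, True)


def bern(p: float, value: bool) -> float:
    """Probability that a flip(p) returns `value`."""
    return p if value else 1.0 - p


def joint(x: bool, y: bool, z: bool) -> float:
    """Pr(x, y, z), factored along the graph X -> Y, X -> Z."""
    return bern(P_X, x) * bern(P_Y_GIVEN_X[x], y) * bern(P_Z_GIVEN_X[x], z)


def yz_given_x(x: bool) -> dict:
    """Pr(y, z | x): restrict the joint to the observed x and renormalize."""
    evidence = sum(joint(x, y, z) for y, z in product(BOOLS, repeat=2))
    return {(y, z): joint(x, y, z) / evidence for y, z in product(BOOLS, repeat=2)}


def yz_marginal() -> dict:
    """Pr(y, z) with x summed out (no observation)."""
    return {(y, z): sum(joint(x, y, z) for x in BOOLS)
            for y, z in product(BOOLS, repeat=2)}


conditional = yz_given_x(True)
marginal = yz_marginal()
p_y = sum(p for (y, _), p in marginal.items() if y)
p_z = sum(p for (_, z), p in marginal.items() if z)

# Given x: Pr(y, z | x) factors into Pr(y | x) * Pr(z | x).
print(conditional[(True, True)], P_Y_GIVEN_X[True] * P_Z_GIVEN_X[True])
# Without observing x: Pr(y, z) differs from Pr(y) * Pr(z).
print(marginal[(True, True)], p_y * p_z)
```

The first printed pair should match and the second should not, which is the conditional-independence pattern that the `condition` on x creates.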


Page 30

Reminder: Independence

Definition:

$\Pr(X, Y) = \Pr(X) \cdot \Pr(Y)$

But also*:

$\Pr(X \mid Y) = \Pr(X)$
$\Pr(Y \mid X) = \Pr(Y)$

*Using the fact that for any two variables $\Pr(X, Y) = \Pr(X \mid Y) \cdot \Pr(Y)$
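The step from the definition to the conditional form is short. Spelled out, assuming $\Pr(Y) > 0$ so that the division is defined:

```latex
\begin{align*}
\Pr(X \mid Y) \cdot \Pr(Y) &= \Pr(X, Y)
  && \text{(product rule, any two variables)} \\
  &= \Pr(X) \cdot \Pr(Y)
  && \text{(definition of independence)} \\
\Rightarrow \quad \Pr(X \mid Y) &= \Pr(X)
  && \text{(divide both sides by } \Pr(Y) > 0\text{)}
\end{align*}
```

The same argument with the roles of $X$ and $Y$ exchanged gives $\Pr(Y \mid X) = \Pr(Y)$.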

Page 31

Variable Dependencies

var test_z = function() {
  var x = flip(0.50);
  var y = flip(0.1);
  var z = x + y;      // z depends on both x and y
  condition(z == 1);  // observe the common child
  return x;
}

Dependency graph: X → Z ← Y
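Here X and Y are independent before anything is observed, but observing their common child Z couples them. The plain-Python sketch below (not WebPPL) enumerates the four (x, y) outcomes under the observation z == 1; the function and parameter names are hypothetical.

```python
from itertools import product


def bern(p: float, value: int) -> float:
    """Probability that a flip(p) returns `value` (1 for true, 0 for false)."""
    return p if value else 1.0 - p


def xy_given_z(z_obs: int = 1, p_x: float = 0.5, p_y: float = 0.1) -> dict:
    """Pr(x, y | x + y == z_obs) by enumeration and renormalization."""
    weights = {}
    for x, y in product((0, 1), repeat=2):
        if x + y == z_obs:  # keep only outcomes consistent with the observation
            weights[(x, y)] = bern(p_x, x) * bern(p_y, y)
    evidence = sum(weights.values())
    return {xy: w / evidence for xy, w in weights.items()}


post = xy_given_z()
p_x1 = sum(p for (x, _), p in post.items() if x == 1)
p_y1 = sum(p for (_, y), p in post.items() if y == 1)
p_both = post.get((1, 1), 0.0)

print(p_x1, p_y1)           # posterior marginals of x and y
print(p_both, p_x1 * p_y1)  # joint vs. product of marginals
```

Given z == 1, exactly one of x and y is 1, so the joint probability of both being 1 is zero while the product of the marginals is not: the observation makes the two parents dependent.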


Page 33

Bayes’ Rule: Belief Revision

$\Pr(\theta \mid x) = \dfrac{\Pr(x \mid \theta) \cdot \Pr(\theta)}{\Pr(x)}$

Posterior Distribution: $\Pr(\theta \mid x)$

Likelihood: $\Pr(x \mid \theta)$

Prior Distribution: $\Pr(\theta)$

Normalization Constant: $\Pr(x)$

Page 34

Bayes’ Rule: Belief Revision

$\Pr(\theta \mid x) \propto \Pr(x \mid \theta) \cdot \Pr(\theta)$

Posterior Distribution: $\Pr(\theta \mid x)$

Likelihood: $\Pr(x \mid \theta)$

Prior Distribution: $\Pr(\theta)$

Enough to order different interpretations and select the most likely one

Page 35

Bayes’ Rule: Belief Revision

$\Pr(\theta \mid x) \propto \Pr(x \mid \theta) \cdot \Pr(\theta)$

Posterior Distribution: $\Pr(\theta \mid x)$

Likelihood: $\Pr(x \mid \theta)$

Prior Distribution: $\Pr(\theta)$ (equiprobable)

Enough to order different interpretations and select the most likely one

Page 36

Bayes’ Rule: Belief Revision

$\Pr(\theta \mid x) \propto \Pr(x \mid \theta)$

Posterior Distribution: $\Pr(\theta \mid x)$

Likelihood: $\Pr(x \mid \theta)$

Enough to order different interpretations and select the most likely one
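Ranking hypotheses needs only the unnormalized product of likelihood and prior, and dropping the prior as well is safe only when the hypotheses are equiprobable. The plain-Python sketch below reuses the medical-test numbers, where the prior is far from uniform, to show the two rankings disagreeing; the dictionary and variable names are illustrative.

```python
# Hypotheses and numbers from the medical-test example.
PRIOR = {"sick": 0.01, "healthy": 0.99}
LIKELIHOOD_POSITIVE = {"sick": 0.8, "healthy": 0.096}  # Pr(test positive | hypothesis)

# Unnormalized posterior: Pr(x | theta) * Pr(theta); Pr(x) is never computed.
unnormalized_posterior = {h: LIKELIHOOD_POSITIVE[h] * PRIOR[h] for h in PRIOR}

# Most likely hypothesis when the prior is taken into account.
map_choice = max(unnormalized_posterior, key=unnormalized_posterior.get)

# Most likely hypothesis when ranking by likelihood alone,
# which is only justified for an equiprobable prior.
ml_choice = max(LIKELIHOOD_POSITIVE, key=LIKELIHOOD_POSITIVE.get)

print(map_choice, ml_choice)
```

With the 1% base rate the prior-aware ranking favors "healthy", while the likelihood-only ranking favors "sick", so the likelihood-only form should be read as a special case rather than a general shortcut.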

Page 37

Beyond Bayesian Net Models

Geometric distribution: the distribution of the number of Bernoulli trials needed to get one success. The program below counts the failed trials before the first success, so its values start at 0.

var geometric = function() {
  // Stop with probability 0.5; otherwise recurse and count one more failure.
  return flip(.5) ? 0 : geometric() + 1;
}

// The recursion is unbounded, so enumeration is capped at 10 executions.
var dist = Infer({method: 'enumerate', maxExecutions: 10}, geometric);

viz.auto(dist);
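Because the program has infinitely many executions, an enumerating engine can only return a truncated distribution. The plain-Python sketch below illustrates the idea under one assumption: that the cap keeps the 10 most probable executions (return values 0 through 9) and renormalizes. This is a reading of what `maxExecutions: 10` amounts to for this program, not a description of WebPPL internals, and the function names are hypothetical.

```python
def geometric_pmf(n: int, p: float = 0.5) -> float:
    """Pr(n failures before the first success) = (1 - p)^n * p."""
    return (1.0 - p) ** n * p


def truncated_geometric(max_executions: int = 10, p: float = 0.5) -> dict:
    """Keep the `max_executions` most probable outcomes and renormalize.

    Assumption: for this program each return value n corresponds to exactly one
    execution, and smaller n is always more probable, so the kept outcomes are
    0 .. max_executions - 1.
    """
    weights = {n: geometric_pmf(n, p) for n in range(max_executions)}
    covered = sum(weights.values())  # probability mass covered before renormalizing
    return {n: w / covered for n, w in weights.items()}


dist = truncated_geometric()
for n, prob in dist.items():
    print(n, round(prob, 4))
```

With p = 0.5 the first ten outcomes already cover all but $0.5^{10}$ of the probability mass, so the truncated result is very close to the true geometric distribution.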

Page 38

Exact Inference

Naïve approach: compute the full joint distribution $P(x_1, x_2, \ldots, x_n)$

Better approach: take advantage of (conditional) independencies

• Whenever we can expose conditional independence, e.g., $P(x_1, x_2 \mid x_3) = P(x_1 \mid x_3) \cdot P(x_2 \mid x_3)$, the computation is more efficient

Compute distributions from parents to children

Page 39

Complexity of Exact Inference

Number of variables: $n$

Naïve enumeration: complexity is $O(2^n)$ (for binary variables)

Variable elimination: if the maximum number of parents of the nodes is $k \in \{1, \ldots, n\}$, then the complexity is $n \cdot O(2^k)$.

For many models this is a good improvement, but it is always possible to construct pathological models.
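The gap between the two bounds is easy to see on a chain, where every node has at most one parent (k = 1). The plain-Python sketch below uses a hypothetical chain X1 → X2 → … → Xn of binary variables with made-up probabilities; it computes Pr(Xn = 1) once by enumerating all 2^n assignments and once by pushing a marginal from parent to child in n steps. It illustrates the parent-to-child idea on a chain only and is not a general variable-elimination implementation.

```python
from itertools import product

# Hypothetical chain parameters (not from the lecture).
P_FIRST = 0.3                          # Pr(X1 = 1)
P_NEXT_GIVEN_PREV = {0: 0.2, 1: 0.7}   # Pr(X_{i+1} = 1 | X_i)


def bern(p: float, value: int) -> float:
    """Probability that a binary variable with Pr(1) = p takes `value`."""
    return p if value else 1.0 - p


def marginal_last_naive(n: int) -> float:
    """Pr(Xn = 1) by summing the joint over all 2^n assignments."""
    total = 0.0
    for xs in product((0, 1), repeat=n):
        p = bern(P_FIRST, xs[0])
        for prev, cur in zip(xs, xs[1:]):
            p *= bern(P_NEXT_GIVEN_PREV[prev], cur)
        if xs[-1] == 1:
            total += p
    return total


def marginal_last_forward(n: int) -> float:
    """Pr(Xn = 1) by propagating the marginal from parent to child: O(n) work."""
    p_one = P_FIRST
    for _ in range(n - 1):
        p_one = p_one * P_NEXT_GIVEN_PREV[1] + (1.0 - p_one) * P_NEXT_GIVEN_PREV[0]
    return p_one


n = 10
print(marginal_last_naive(n), marginal_last_forward(n))
```

Both functions return the same marginal, but the naive version visits 1024 assignments for n = 10 while the forward pass performs 9 updates; the naive cost doubles with every added variable.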