Uncertainty
ECE457 Applied Artificial Intelligence, Spring 2007, Lecture #8
R. Khoury (2007)


Outline
- Uncertainty
- Probability
- Bayes' Theorem

Russell & Norvig, chapter 13

Limit of FOL
- FOL works only when facts are known to be true or false
  - "Some purple mushrooms are poisonous"
  - ∃x Purple(x) ∧ Mushroom(x) ∧ Poisonous(x)
- In real life there is almost always uncertainty
  - "There's a 70% chance that a purple mushroom is poisonous"
  - Can't be represented as an FOL sentence

Acting Under Uncertainty
- So far, the rational decision is to pick the action with the "best" outcome
- Two actions
  - #1 leads to a great outcome
  - #2 leads to a good outcome
- It's only rational to pick #1
  - Assumes the outcome is 100% certain
- What if the outcome is not certain? Two actions
  - #1 has a 1% probability to lead to a great outcome
  - #2 has a 90% probability to lead to a good outcome
- What is the rational decision?

Acting Under Uncertainty
- Maximum Expected Utility (MEU)
  - Pick the action that leads to the best outcome averaged over all possible outcomes of the action (see the sketch below)
  - Same principle as Expectiminimax, used to solve games of chance (see Game Playing, lecture #5)
- How do we compute the MEU?
  - First, we need to compute the probability of each event
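To make the MEU rule concrete, here is a minimal Python sketch applied to the two actions above. The utility values (100 for the great outcome, 10 for the good one, 0 otherwise) are hypothetical, chosen only for illustration.

```python
# Minimal MEU sketch: pick the action with the highest expected utility.
# Utilities (100 = great, 10 = good, 0 = nothing) are illustrative only.

actions = {
    "action1": [(0.01, 100), (0.99, 0)],   # 1% chance of a great outcome
    "action2": [(0.90, 10), (0.10, 0)],    # 90% chance of a good outcome
}

def expected_utility(outcomes):
    # average the utility over all possible outcomes of the action
    return sum(p * u for p, u in outcomes)

for name, outcomes in actions.items():
    print(name, expected_utility(outcomes))   # action1: 1.0, action2: 9.0

best = max(actions, key=lambda a: expected_utility(actions[a]))
print("MEU choice:", best)                    # action2
```

With these numbers, action #2 wins (expected utility 9.0 vs. 1.0), even though action #1's best case is better.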

Types of Uncertain Variables
- Boolean
  - Can be true or false
  - Warm ∈ {True, False}
- Discrete
  - Can take a value from a limited, countable domain
  - Temperature ∈ {Hot, Warm, Cool, Cold}
- Continuous
  - Can take a value from a set of real numbers
  - Temperature ∈ [-35, 35]
- We'll focus on discrete variables for now

Probability
- Each possible value in the domain of an uncertain variable is assigned a probability
  - Represents how likely it is that the variable will have this value
- P(Temperature=Warm)
  - Probability that the "Temperature" variable will have the value "Warm"
  - We can simply write P(Warm)

Probability Axioms
- P(x) ∈ [0, 1]
- P(x) = 1
  - x is necessarily true, or certain to occur
- P(x) = 0
  - x is necessarily false, or certain not to occur
- P(A ∨ B) = P(A) + P(B) − P(A ∧ B) (checked in the sketch below)
- P(A ∧ B) = 0
  - A and B are said to be mutually exclusive
- Σx P(x) = 1
  - If all values of x are mutually exclusive
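As a quick check of the union axiom, here is a small sketch on a fair six-sided die; the events and numbers are our own illustration, not from the slides.

```python
# Sanity check of P(A or B) = P(A) + P(B) - P(A and B) on a fair die.
from fractions import Fraction

outcomes = range(1, 7)
p = Fraction(1, 6)                       # uniform probability per face

A = {x for x in outcomes if x % 2 == 0}  # A: roll is even
B = {x for x in outcomes if x >= 5}      # B: roll is 5 or 6

P = lambda s: p * len(s)                 # probability of a set of outcomes
assert P(A | B) == P(A) + P(B) - P(A & B)   # 4/6 == 3/6 + 2/6 - 1/6
print(P(A | B))                          # 2/3
```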

Visualizing Probabilities
- P(A) is the proportion of the event space in which A is true
  - Area of green circle / Total area
  - P(A) = 0 if the green circle doesn't exist
  - P(A) = 1 if the green circle covers the entire event space

[Figure: a circle labelled P(A) inside a rectangle representing the event space]

Visualizing Probabilities
- P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
  - Sum of area of both circles / Total area
- P(A ∧ B) = 0
  - There is no intersection between the two regions
  - A and B can't happen together: mutually exclusive

[Figure: two overlapping circles labelled P(A) and P(B); their intersection is labelled P(A ∧ B)]

Prior (Unconditional) Probability
- Probability that A is true in the absence of any other information
  - P(A)
- Example
  - P(Temperature=Hot) = 0.2
  - P(Temperature=Warm) = 0.6
  - P(Temperature=Cool) = 0.15
  - P(Temperature=Cold) = 0.05
  - P(Temperature) = {0.2, 0.6, 0.15, 0.05}
  - This is a probability distribution

Joint Probability Distribution
- Let's add another variable
  - Condition ∈ {Sunny, Cloudy, Raining}
- We can compute P(Temperature, Condition)

       Sunny   Cloudy   Raining
Hot    0.12    0.05     0.03
Warm   0.23    0.20     0.17
Cool   0.02    0.05     0.08
Cold   0.01    0.02     0.02

Joint Probability Distribution
- Given a joint probability distribution P(a,b), we can compute P(a=Ai)
  - P(Ai) = Σj P(Ai,Bj)
  - Assumes all events (Ai,Bj) are mutually exclusive
  - This is called marginalization (see the sketch below)
- P(Warm) = P(Warm,Sunny) + P(Warm,Cloudy) + P(Warm,Raining)
  P(Warm) = 0.23 + 0.20 + 0.17
  P(Warm) = 0.6
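The same computation as a short Python sketch over the joint table above; the dictionary layout is our own choice, with values copied from the table.

```python
# Marginalization over the joint table P(Temperature, Condition):
# P(t) = sum over all conditions c of P(t, c).
joint = {
    "Hot":  {"Sunny": 0.12, "Cloudy": 0.05, "Raining": 0.03},
    "Warm": {"Sunny": 0.23, "Cloudy": 0.20, "Raining": 0.17},
    "Cool": {"Sunny": 0.02, "Cloudy": 0.05, "Raining": 0.08},
    "Cold": {"Sunny": 0.01, "Cloudy": 0.02, "Raining": 0.02},
}

def p_temperature(t):
    # sum the row for temperature t over all conditions
    return sum(joint[t].values())

print(round(p_temperature("Warm"), 2))  # 0.23 + 0.20 + 0.17 = 0.6
```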

Visualizing Marginalization
- P(A) = P(A,B) + P(A,C) + P(A,D)
  - No area of A not covered by B, C or D
  - B, C and D do not intersect inside A

[Figure: circle P(A) overlapped by circles P(B), P(C), P(D), partitioning A into regions P(A,B), P(A,C), P(A,D)]

Posterior (Conditional) Probability
- Probability that A is true given that we know that B is true
  - P(A|B)
- Can be computed using prior and joint probability
  - P(A|B) = P(A,B) / P(B)
- P(Warm|Cloudy) = P(Warm,Cloudy) / P(Cloudy)
  P(Warm|Cloudy) = 0.2 / 0.32
  P(Warm|Cloudy) = 0.625
  - where P(Cloudy) = 0.05 + 0.20 + 0.05 + 0.02 = 0.32, by marginalizing the Cloudy column of the joint table (see the sketch below)
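The same conditional probability as a self-contained Python sketch; the tuple-keyed dictionary is our own layout of the joint table.

```python
# P(Warm | Cloudy) = P(Warm, Cloudy) / P(Cloudy), from the joint table above.
joint = {
    ("Hot", "Sunny"): 0.12,  ("Hot", "Cloudy"): 0.05,  ("Hot", "Raining"): 0.03,
    ("Warm", "Sunny"): 0.23, ("Warm", "Cloudy"): 0.20, ("Warm", "Raining"): 0.17,
    ("Cool", "Sunny"): 0.02, ("Cool", "Cloudy"): 0.05, ("Cool", "Raining"): 0.08,
    ("Cold", "Sunny"): 0.01, ("Cold", "Cloudy"): 0.02, ("Cold", "Raining"): 0.02,
}

def p_condition(c):
    # marginal P(Condition=c): sum the column over all temperatures
    return sum(v for (t, cond), v in joint.items() if cond == c)

def p_temp_given_condition(t, c):
    return joint[(t, c)] / p_condition(c)

print(round(p_temp_given_condition("Warm", "Cloudy"), 3))  # 0.2 / 0.32 = 0.625
```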

Visualizing Posterior Probability
- P(A|B) = P(A,B) / P(B)
  - We know that B is true
  - We want the area of B where A is also true
  - We don't care about the area outside B

[Figure: overlapping circles P(A) and P(B); the part of the overlap inside B illustrates P(A|B)]

Bayes' Theorem
- Start from the previous conditional probability equation
  - P(A|B)P(B) = P(A,B)
  - P(B|A)P(A) = P(B,A)
  - P(A|B)P(B) = P(B|A)P(A)
  - P(A|B) = P(B|A)P(A) / P(B) (important!)
- P(A|B): Posterior probability
- P(A): Prior probability
- P(B|A): Likelihood
- P(B): Normalizing constant

Bayes' Theorem
- Allows us to compute P(A|B) without knowing P(A,B)
- In many real-life situations, P(A|B) cannot be measured directly, but P(B|A) is available
- Bayes' Theorem underlies all modern probabilistic AI systems

Visualizing Bayes' Theorem
- P(A|B) = P(B|A)P(A) / P(B)
- We know the portion of the event space where A is true, and that where B is true
- We know the portion of A where B is also true
- We want the portion of B where A is also true

[Figure: overlapping circles P(A) and P(B); the overlap viewed as a fraction of A is P(B|A)]

Bayes' Theorem Example #1
- We want to design a classifier (for email spam)
  - Compute the probability that an item belongs to class C (spam) given that it exhibits feature F (the word "Viagra")
- We know that
  - 20% of items in the world belong to class C
  - 90% of items in class C exhibit feature F
  - 40% of items in the world exhibit feature F
- P(C|F) = P(F|C) * P(C) / P(F)
  P(C|F) = 0.9 * 0.2 / 0.4
  P(C|F) = 0.45
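The same arithmetic as a tiny helper function; the name `bayes` and its parameter names are our own.

```python
# Bayes' rule, P(A|B) = P(B|A) * P(A) / P(B), applied to Example #1.
def bayes(likelihood, prior, evidence):
    return likelihood * prior / evidence

print(round(bayes(likelihood=0.9, prior=0.2, evidence=0.4), 2))  # 0.45
```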

Bayes' Theorem Example #2
- A drug test returns "positive" if drugs are detected in an athlete's system, but it can make mistakes
  - If an athlete took drugs, 99% chance of a positive result
  - If an athlete didn't take drugs, 10% chance of a positive result
  - 5% of athletes take drugs
- What's the probability that an athlete who tested positive really does take drugs? (computed in code below)
  - P(drug|+) = P(+|drug) * P(drug) / P(+)
  - P(+) = P(+|drug)P(drug) + P(+|nodrug)P(nodrug)
    P(+) = 0.99 * 0.05 + 0.1 * 0.95 = 0.1445
  - P(drug|+) = 0.99 * 0.05 / 0.1445 ≈ 0.3426
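A direct transcription of Example #2 into Python: the evidence P(+) is computed by marginalization over the two cases (drug, no drug), then plugged into Bayes' rule.

```python
# Drug-test example: P(drug | +) via Bayes' rule with a marginalized P(+).
p_drug = 0.05               # prior: 5% of athletes take drugs
p_pos_given_drug = 0.99     # sensitivity of the test
p_pos_given_nodrug = 0.10   # false-positive rate

p_pos = p_pos_given_drug * p_drug + p_pos_given_nodrug * (1 - p_drug)
p_drug_given_pos = p_pos_given_drug * p_drug / p_pos

print(round(p_pos, 4))            # 0.1445
print(round(p_drug_given_pos, 4)) # 0.3426
```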

Bayes' Theorem
- We computed the normalizing constant using marginalization!
  - P(B) = Σi P(B|Ai)P(Ai)

Chain Rule
- Recall that P(A,B) = P(A|B)P(B)
- Can be extended to multiple variables
- Extend to three variables
  - P(A,B,C) = P(A|B,C)P(B,C)
    P(A,B,C) = P(A|B,C)P(B|C)P(C) (verified numerically below)
- General form
  - P(A1,A2,…,An) = P(A1|A2,…,An)P(A2|A3,…,An)…P(An-1|An)P(An)
- Compute full joint probability distribution
  - Simple if variables are conditionally independent
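A quick numeric check of the three-variable chain rule on a made-up joint distribution over three binary variables; all weights below are arbitrary.

```python
# Check P(A,B,C) = P(A|B,C) * P(B|C) * P(C) on an arbitrary joint distribution.
import itertools

weights = [3, 1, 2, 2, 1, 4, 2, 1]          # arbitrary positive weights
joint3 = {}
for (a, b, c), w in zip(itertools.product([True, False], repeat=3), weights):
    joint3[(a, b, c)] = w / sum(weights)    # normalize to a distribution

def p(pred):
    # probability that the predicate holds, by summing matching entries
    return sum(v for k, v in joint3.items() if pred(*k))

p_c = p(lambda a, b, c: c)
p_b_given_c = p(lambda a, b, c: b and c) / p_c
p_a_given_bc = p(lambda a, b, c: a and b and c) / p(lambda a, b, c: b and c)

lhs = joint3[(True, True, True)]
rhs = p_a_given_bc * p_b_given_c * p_c
assert abs(lhs - rhs) < 1e-12
print(lhs, rhs)   # both 0.1875
```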

Visualizing Chain Rule
- P(A,B,C) = P(A|B,C)P(B|C)P(C)
- We want the proportion of the event space where A, B and C are all true
  - Proportion of B,C where A is also true
    × proportion of C where B is also true
    × proportion of the event space where C is true

[Figure: overlapping circles P(A), P(B), P(C); the region P(A,B,C) sits inside the overlap P(B,C)]

Independence
- Two variables are independent if knowledge of one does not affect the probability of the other
  - P(A|B) = P(A)
  - P(B|A) = P(B)
  - P(A ∧ B) = P(A)P(B)
- Impact on chain rule
  - P(A1,A2,…,An) = P(A1)P(A2)…P(An)
  - P(A1,A2,…,An) = ∏i=1..n P(Ai)

Conditional Independence
- Independence is hard to satisfy
- Two variables are conditionally independent given a third if knowledge of one does not affect the probability of the other when the value of the third is known
  - P(A|B,C) = P(A|C)
  - P(B|A,C) = P(B|C)
- Impact on chain rule
  - P(A1,A2,…,An) = P(A1|An)P(A2|An)…P(An-1|An)P(An)
  - P(A1,A2,…,An) = P(An) ∏i=1..n-1 P(Ai|An)

Bayes' Theorem Example #3
- We want to design a classifier
  - Compute the probability that an item belongs to class C given that it exhibits features F1 to Fn
- We know the
  - % of items in the world that belong to class C
  - % of items in class C that exhibit feature Fi
  - % of items in the world that exhibit features F1 to Fn
- P(C|F1,…,Fn) = P(F1,…,Fn|C) * P(C) / P(F1,…,Fn)
  - P(F1,…,Fn|C) * P(C) = P(C,F1,…,Fn) by the chain rule
  - P(C,F1,…,Fn) = P(C) ∏i P(Fi|C), assuming the features are conditionally independent given the class
  - P(C|F1,…,Fn) = P(C) ∏i P(Fi|C) / P(F1,…,Fn)

Bayes' Theorem Example #3
- P(F1,…,Fn)
  - Independent of class C
  - In multi-class problems, it is the same for every class, so it makes no difference to which class scores highest!
  - P(C|F1,…,Fn) ∝ P(C) ∏i P(Fi|C)
- This is called the Naïve Bayes Classifier
  - "Naïve" because it assumes conditional independence of the Fi given C, whether it's actually true or not
  - Often used in practice in cases where the Fi are not conditionally independent given C, with very good results (see the sketch below)
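A minimal Naïve Bayes sketch under the slide's assumptions: binary features, class priors, and per-class feature likelihoods. The classes, features, and all numbers below are made up for illustration; each class is scored with the unnormalized form P(C) ∏i P(Fi|C) and the argmax wins.

```python
# Minimal Naïve Bayes classifier: score(c) = P(c) * prod_i P(f_i | c),
# then pick the class with the highest score. All numbers are illustrative.
from math import prod

priors = {"spam": 0.2, "ham": 0.8}
# P(feature present | class) for each binary feature
likelihoods = {
    "spam": {"viagra": 0.90, "meeting": 0.05},
    "ham":  {"viagra": 0.01, "meeting": 0.40},
}

def score(c, features):
    # features maps feature name -> True/False (present/absent)
    return priors[c] * prod(
        likelihoods[c][f] if present else 1 - likelihoods[c][f]
        for f, present in features.items()
    )

email = {"viagra": True, "meeting": False}
scores = {c: score(c, email) for c in priors}
print(scores)                       # unnormalized class scores
print(max(scores, key=scores.get))  # "spam"
```

Because the denominator P(F1,…,Fn) is the same for every class, comparing these unnormalized scores picks the same winner as comparing the true posteriors.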