
Probability - CSE



07 December 2009

Cao Hoang Tru

CSE Faculty - HCMUT

Probability

• FOL fails for a domain due to:

– Laziness: too much to list the complete set of rules, too hard to use the enormous rules that result

– Theoretical ignorance: there is no complete theory for the domain

– Practical ignorance: have not or cannot run all necessary tests


• Probability = a degree of belief

• Probability comes from:

– Frequentist: experiments and statistical assessment

– Objectivist: real aspects of the universe

– Subjectivist: a way of characterizing an agent’s beliefs

• Decision theory = probability + utility theory


• Prior probability: probability in the absence of any other information

P(Dice = 2) = 1/6

random variable: Dice

domain = <1, 2, 3, 4, 5, 6>

probability distribution: P(Dice) = <1/6, 1/6, 1/6, 1/6, 1/6, 1/6>


• Conditional probability: probability in the presence of some evidence

P(Dice = 2 | Dice is even) = 1/3

P(Dice = 2 | Dice is odd) = 0

• P(A | B) = P(A ∧ B)/P(B)

P(A ∧ B) = P(A | B).P(B)
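The dice numbers above can be checked by brute-force enumeration over the six equally likely outcomes. A minimal sketch (the helper names `prob` and `cond` are illustrative, not from the slides):

```python
from fractions import Fraction

# Six equally likely outcomes of a fair die.
outcomes = [1, 2, 3, 4, 5, 6]
p = Fraction(1, 6)  # P(Dice = d) for each outcome d

def prob(event):
    """P(event) = sum of outcome probabilities where event holds."""
    return sum(p for d in outcomes if event(d))

def cond(a, b):
    """P(A | B) = P(A and B) / P(B)."""
    return prob(lambda d: a(d) and b(d)) / prob(b)

print(cond(lambda d: d == 2, lambda d: d % 2 == 0))  # P(Dice=2 | even) = 1/3
print(cond(lambda d: d == 2, lambda d: d % 2 == 1))  # P(Dice=2 | odd)  = 0
```

Using `Fraction` keeps the arithmetic exact, so the results match the slide values 1/3 and 0 without rounding.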


• Example:

S = stiff neck

M = meningitis

P(S | M) = 0.5

P(M) = 1/50000

P(S) = 1/20

⇒ P(M | S) = P(S | M).P(M)/P(S) = 1/5000
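The arithmetic can be verified directly with exact rationals (a sketch using only the numbers given on the slide):

```python
from fractions import Fraction

# Numbers from the stiff-neck/meningitis example.
p_s_given_m = Fraction(1, 2)      # P(S | M) = 0.5
p_m = Fraction(1, 50000)          # P(M), prior
p_s = Fraction(1, 20)             # P(S)

# Bayes' rule: P(M | S) = P(S | M) P(M) / P(S)
p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)                # 1/5000
```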


• Joint probability distributions:

X: <x1, …, xm> Y: <y1, …, yn>

P(X = xi, Y = yj)


• Axioms:

0 ≤ P(A) ≤ 1

P(true) = 1 and P(false) = 0

P(A ∨ B) = P(A) + P(B) − P(A ∧ B)


• Derived properties:

P(¬A) = 1 − P(A)

P(U) = P(A1) + P(A2) + ... + P(An), where

U = A1 ∨ A2 ∨ ... ∨ An (collectively exhaustive)

Ai ∧ Aj = false for i ≠ j (mutually exclusive)


• Bayes’ theorem:

P(Hi | E) = P(Hi ∧ E)/P(E) = P(E | Hi).P(Hi) / ∑j P(E | Hj).P(Hj)

Hi’s are collectively exhaustive & mutually exclusive
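The normalization in the denominator can be carried out mechanically. A small sketch; the priors and likelihoods below are made-up illustrative numbers, not from the slides:

```python
# Posterior over mutually exclusive, collectively exhaustive hypotheses Hi
# given evidence E: P(Hi | E) = P(E | Hi) P(Hi) / sum_j P(E | Hj) P(Hj).
# All numbers here are invented for illustration only.
priors = {"H1": 0.7, "H2": 0.2, "H3": 0.1}       # P(Hi), sums to 1
likelihoods = {"H1": 0.1, "H2": 0.5, "H3": 0.9}  # P(E | Hi)

unnormalized = {h: likelihoods[h] * priors[h] for h in priors}
z = sum(unnormalized.values())                   # = P(E)
posterior = {h: v / z for h, v in unnormalized.items()}

print(posterior)
print(sum(posterior.values()))  # 1, up to floating-point rounding
```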



• Problem: a full joint probability distribution P(X1, X2, ..., Xn) is sufficient for computing any probability on Xi's, but the number of joint probabilities is exponential.
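The blow-up is easy to see for Boolean variables: a full joint table over n of them has 2^n entries, i.e. 2^n − 1 free parameters, since the entries must sum to 1. A quick sketch:

```python
# Number of free parameters in a full joint distribution over n Boolean
# variables: 2**n entries, minus one constraint (they sum to 1).
for n in [5, 10, 20, 30]:
    print(n, 2**n - 1)
```

Already at 30 variables more than a billion numbers would have to be specified, which is what motivates the independence structure of Bayesian networks below.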


• Independence:

P(A ∧ B) = P(A).P(B)

P(A) = P(A | B)

• Conditional independence:

P(A ∧ B | E) = P(A | E).P(B | E)

P(A | E) = P(A | E ∧ B)


• Example:

P(Toothache | Cavity ∧ Catch) = P(Toothache | Cavity)

P(Catch | Cavity ∧ Toothache) = P(Catch | Cavity)



"In John's and Mary's house, an alarm is installed to sound in case of burglary or earthquake. When the alarm sounds, John and Mary may make a call for help or rescue."

Q1: If earthquake happens, how likely will John make a call?

Q2: If the alarm sounds, how likely is the house burglarized?


Bayesian Networks

• Pearl, J. (1982). Reverend Bayes on Inference Engines: A Distributed Hierarchical Approach, presented at the Second National Conference on Artificial Intelligence (AAAI-82), Pittsburgh, Pennsylvania

• 2000 AAAI Classic Paper Award


[Figure: the burglary network. Burglary → Alarm ← Earthquake; Alarm → JohnCalls; Alarm → MaryCalls]

P(B) = .001        P(E) = .002

B E | P(A)
T T | .95
T F | .94
F T | .29
F F | .001

A | P(J)
T | .90
F | .05

A | P(M)
T | .70
F | .01


• Syntax:

– A set of random variables making up the nodes

– A set of directed links connecting pairs of nodes

– Each node has a conditional probability table that quantifies the effects of its parent nodes

– The graph has no directed cycles

[Figure: two graphs over nodes A, B, C, D: a directed acyclic graph (yes) and a graph containing a directed cycle (no)]


• Semantics:

Ordering the nodes: Xi is a predecessor of Xj ⇒ i < j

P(X1, X2, …, Xn)

= P(Xn | Xn-1, …, X1).P(Xn-1 | Xn-2, …, X1). … .P(X2 | X1).P(X1)

= Πi P(Xi | Xi-1, …, X1) = Πi P(Xi | Parents(Xi))

since P(Xi | Xi-1, …, X1) = P(Xi | Parents(Xi)), with Parents(Xi) ⊆ {Xi-1, …, X1}

Each node is conditionally independent of its predecessors given its parents


• Example:

P(J ∧ M ∧ A ∧ ¬B ∧ ¬E)

= P(J | A).P(M | A).P(A | ¬B ∧ ¬E).P(¬B).P(¬E)
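Plugging in the CPT entries of the burglary network shown earlier, the product can be evaluated directly (a minimal sketch; the dictionary names are illustrative):

```python
# CPTs of the burglary network.
p_b, p_e = 0.001, 0.002
p_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A=true | B, E)
p_j = {True: 0.90, False: 0.05}                     # P(J=true | A)
p_m = {True: 0.70, False: 0.01}                     # P(M=true | A)

# P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) = P(J|A) P(M|A) P(A|¬B,¬E) P(¬B) P(¬E)
joint = p_j[True] * p_m[True] * p_a[(False, False)] * (1 - p_b) * (1 - p_e)
print(joint)  # ≈ 0.000628
```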


Bayesian Networks

= Conditional Probabilities + Independence Assumptions

= Full Joint Probability Distribution


• P(Query | Evidence) = ?

– Diagnostic (from effects to causes): P(B | J)

– Causal (from causes to effects): P(J | B)

– Intercausal (between causes of a common effect): P(B | A, E)

– Mixed: P(A | J, ¬E), P(B | J, ¬E)


P(B | A) = P(B ∧ A)/P(A) = αP(B ∧ A), where α = 1/P(A) normalizes the distribution over B

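This normalization trick can be carried out by summing the full joint, which the network factorization lets us compute term by term. A sketch over the five variables of the burglary network (helper names are illustrative):

```python
from itertools import product

# CPTs of the burglary network shown earlier.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def joint(b, e, a, j, m):
    """P(b, e, a, j, m) as a product of CPT entries."""
    pa = P_A[(b, e)]
    return ((P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
            * (pa if a else 1 - pa)
            * (P_J[a] if j else 1 - P_J[a])
            * (P_M[a] if m else 1 - P_M[a]))

# P(B | A) = alpha * P(B ∧ A): sum out E, J, M, then normalize over B.
unnorm = {}
for b in (True, False):
    unnorm[b] = sum(joint(b, e, True, j, m)
                    for e, j, m in product((True, False), repeat=3))
alpha = 1 / sum(unnorm.values())
print(alpha * unnorm[True])  # P(B=true | A=true) ≈ 0.37
```

Enumeration like this is exponential in the number of variables; practical inference algorithms exploit the network structure to avoid the full sum.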


• Why Bayesian Networks?


General Conditional Independence

[Figure: node X with parents U1, …, Um, children Y1, …, Yn, and the children's other parents Z1j, …, Znj]

A node (X) is conditionally independent of its non-descendants (Zij's), given its parents (Ui's)


A node (X) is conditionally independent of all other nodes, given its parents (Ui's), children (Yi's), and children's parents (Zij's): its Markov blanket


X and Y are conditionally independent given E

[Figure: every path between X and Y passes through nodes Z that are blocked by the evidence E]


Example:

[Figure: network Battery → Radio, Battery → Ignition, Ignition → Starts, Gas → Starts, Starts → Moves]


Gas and Radio are independent given no evidence at all

Gas ↔ Radio | Starts (dependent once Starts is observed)

Gas ↔ Radio | Moves (dependent once Moves is observed)


Network Construction

• Wrong ordering:

– Produces a more complicated net

– Requires more probabilities to be specified

– Produces slight relationships that require difficult and unnatural probability judgments

– Fails to represent all the conditional independence relationships

⇒ a lot of unnecessary probabilities


• Models:

– Diagnostic: symptoms to causes

– Causal: causes to symptoms

Add "root causes" first, then the variables they influence

⇒ fewer and more comprehensible probabilities


Dempster-Shafer Theory

• Example: disease diagnostics

D = {Allergy, Flu, Cold, Pneumonia}


• Basic probability assignment:

m: 2D → [0, 1]

m1({Flu, Cold, Pneu}) = 0.6

m1(D) = 0.4


• Belief:

Bel: 2D → [0, 1]

Bel(S) = ∑X⊆S m(X)


• Plausibility:

Pl(p) = 1 − Bel(¬p)

p: [Bel(p), Pl(p)], with Bel(p) + Bel(¬p) ≤ 1
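Bel and Pl can be computed by summing masses over subsets. A sketch using frozensets, with the mass function m1 from the diagnostics example above:

```python
# Masses of m1 from the example: 0.6 on {Flu, Cold, Pneu}, 0.4 on all of D.
D = frozenset({"Allergy", "Flu", "Cold", "Pneu"})
m1 = {frozenset({"Flu", "Cold", "Pneu"}): 0.6, D: 0.4}

def bel(s, m):
    """Bel(S) = sum of m(X) over focal sets X with X ⊆ S."""
    return sum(v for x, v in m.items() if x <= s)

def pl(s, m):
    """Pl(S) = 1 - Bel(complement of S)."""
    return 1 - bel(D - s, m)

s = frozenset({"Flu", "Cold", "Pneu"})
print(bel(s, m1), pl(s, m1))          # Bel = 0.6, Pl = 1
print(bel(D - s, m1), pl(D - s, m1))  # Bel({Allergy}) = 0, Pl({Allergy}) = 0.4
```

Note the belief interval: {Flu, Cold, Pneu} gets [0.6, 1] and {Allergy} gets [0, 0.4], so Bel(p) + Bel(¬p) = 0.6 ≤ 1 as the slide states.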


• Combination rules: m1 and m2

m(Z) = ∑X∩Y=Z m1(X).m2(Y) / (1 − ∑X∩Y=∅ m1(X).m2(Y))
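Dempster's combination rule is a short loop over pairs of focal sets. A sketch; m1 is the mass function from the example, while m2 is a second, made-up piece of evidence introduced purely for illustration:

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule: m(Z) = sum over X∩Y=Z of m1(X)m2(Y),
    normalized by 1 minus the mass falling on the empty set."""
    raw, conflict = {}, 0.0
    for (x, p), (y, q) in product(m1.items(), m2.items()):
        z = x & y
        if z:
            raw[z] = raw.get(z, 0.0) + p * q
        else:
            conflict += p * q  # mass committed to the empty intersection
    k = 1 - conflict           # normalization constant
    return {z: v / k for z, v in raw.items()}

D = frozenset({"Allergy", "Flu", "Cold", "Pneu"})
m1 = {frozenset({"Flu", "Cold", "Pneu"}): 0.6, D: 0.4}
# Hypothetical second evidence source, for illustration only:
m2 = {frozenset({"Allergy", "Flu"}): 0.8, D: 0.2}
print(combine(m1, m2))
```

Here the two sources agree most strongly on {Flu} (mass 0.48), since it is the only disease in both focal sets.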


Exercises

• Russell &amp; Norvig, AIMA (2nd ed.): Chapters 13-14.