
Page 1

Artificial Intelligence
Learning: decision lists, evaluation, Naive Bayesian networks

Peter Antal
antal@mit.bme.hu

September 26, 2016

Page 2

Algorithms for concept learning
◦ Best hypothesis vs. version space

PAC-learning for decision lists

The evaluation of performance

From predictions to optimal decisions

Learning naive Bayesian networks

Page 3

Each model specifies true/false for each proposition symbol

E.g. P1,2 = false, P2,2 = true, P3,1 = false

With these symbols, there are 8 possible models, which can be enumerated automatically.

Rules for evaluating truth with respect to a model m:

¬S is true iff S is false
S1 ∧ S2 is true iff S1 is true and S2 is true
S1 ∨ S2 is true iff S1 is true or S2 is true
S1 ⇒ S2 is true iff S1 is false or S2 is true, i.e., it is false iff S1 is true and S2 is false
S1 ⇔ S2 is true iff S1 ⇒ S2 is true and S2 ⇒ S1 is true

Simple recursive process evaluates an arbitrary sentence, e.g.,

¬P1,2 ∧ (P2,2 ∨ P3,1) = true ∧ (true ∨ false) = true ∧ true = true
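The recursive evaluation above is easy to mirror in code; the following Python sketch is an illustration added to these notes (the tuple-based sentence encoding and the symbol names are assumptions, not part of the original slides).

```python
# Minimal sketch of recursive truth evaluation for propositional sentences.
# Sentences are nested tuples: ("not", s), ("and", s1, s2), ("or", s1, s2),
# ("implies", s1, s2), ("iff", s1, s2), or a proposition symbol (a string).

def evaluate(sentence, model):
    """Return the truth value of `sentence` in `model` (a dict symbol -> bool)."""
    if isinstance(sentence, str):            # proposition symbol
        return model[sentence]
    op, *args = sentence
    if op == "not":
        return not evaluate(args[0], model)
    if op == "and":
        return evaluate(args[0], model) and evaluate(args[1], model)
    if op == "or":
        return evaluate(args[0], model) or evaluate(args[1], model)
    if op == "implies":
        return (not evaluate(args[0], model)) or evaluate(args[1], model)
    if op == "iff":
        return evaluate(args[0], model) == evaluate(args[1], model)
    raise ValueError(f"unknown operator: {op}")

# The example above: ¬P1,2 ∧ (P2,2 ∨ P3,1) in the model P1,2=false, P2,2=true, P3,1=false
model = {"P12": False, "P22": True, "P31": False}
sentence = ("and", ("not", "P12"), ("or", "P22", "P31"))
print(evaluate(sentence, model))  # True
```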



Page 5

Two sentences are logically equivalent iff they are true in the same models: α ≡ β iff α ⊨ β and β ⊨ α


Page 6

B1,1 ⇔ (P1,2 ∨ P2,1)

1. Eliminate ⇔, replacing α ⇔ β with (α ⇒ β) ∧ (β ⇒ α):

(B1,1 ⇒ (P1,2 ∨ P2,1)) ∧ ((P1,2 ∨ P2,1) ⇒ B1,1)

2. Eliminate ⇒, replacing α ⇒ β with ¬α ∨ β:

(¬B1,1 ∨ P1,2 ∨ P2,1) ∧ (¬(P1,2 ∨ P2,1) ∨ B1,1)

3. Move ¬ inwards using de Morgan's rules and double-negation:

(¬B1,1 ∨ P1,2 ∨ P2,1) ∧ ((¬P1,2 ∧ ¬P2,1) ∨ B1,1)

4. Apply the distributivity law (∨ over ∧) and flatten:

(¬B1,1 ∨ P1,2 ∨ P2,1) ∧ (¬P1,2 ∨ B1,1) ∧ (¬P2,1 ∨ B1,1)
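The same conversion can also be checked mechanically; below is a small illustrative sketch using SymPy's logic module (an addition to these notes, not part of the original slides). Its output should match the result above up to the ordering of clauses and literals.

```python
# Convert B1,1 <=> (P1,2 | P2,1) to CNF with SymPy and compare with the manual steps above.
from sympy import symbols
from sympy.logic.boolalg import Equivalent, to_cnf

B11, P12, P21 = symbols("B11 P12 P21")
sentence = Equivalent(B11, P12 | P21)

cnf = to_cnf(sentence)
print(cnf)
# Expected, up to ordering of clauses and literals:
# (B11 | ~P12) & (B11 | ~P21) & (P12 | P21 | ~B11)
```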


Page 7

Goal: selection of a logical function f: {0,1}^n → {0,1} from a function class C which is consistent with the data D_N = {(x1, y1), .., (xN, yN)}, i.e. f(xi) = yi for i = 1..N.

Predicted    Ref. = 0               Ref. = 1
0            True negative (TN)     False negative (FN)
1            False positive (FP)    True positive (TP)

Learning method:
◦ True negative / true positive: no change
◦ False negative: generalize
◦ False positive: specialize

Page 8

False negative: generalization

◦ Replace A ∧ B with A (drop a conjunct)

◦ Replace A with A ∨ B (add a disjunct)

False positive: specialization

◦ Replace A with A ∧ B (add a conjunct)

◦ Replace A ∨ B with A (drop a disjunct)


[Figure: grid of training examples marked + (positive) and - (negative), illustrating the generalization and specialization steps]
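A minimal current-best-hypothesis sketch of this generalize/specialize loop is given below. It is an illustration added to these notes, assuming hypotheses are conjunctions of attribute=value literals; the specialization step is only flagged, not carried out.

```python
# Sketch of a current-best-hypothesis learner for conjunctions of literals.
# A hypothesis is a set of (attribute, value) pairs; an example satisfies it
# if it agrees on every pair. Attribute names and data are invented examples.

def predicts_true(hypothesis, example):
    return all(example.get(attr) == val for attr, val in hypothesis)

def learn(examples):
    """examples: list of (dict attribute -> value, bool label)."""
    # Start with the most specific hypothesis consistent with the first positive example.
    first_x, _ = next((x, y) for x, y in examples if y)
    h = set(first_x.items())
    for x, y in examples:
        if y and not predicts_true(h, x):
            # False negative: generalize by dropping the conjuncts the example violates.
            h = {(a, v) for a, v in h if x.get(a) == v}
        elif not y and predicts_true(h, x):
            # False positive: specialize by adding a conjunct that excludes the example.
            # (Here we only report it; choosing which conjunct to add needs more bookkeeping.)
            print("false positive on", x, "- hypothesis must be specialized")
    return h

data = [({"Hungry": 1, "Rain": 0}, True),
        ({"Hungry": 1, "Rain": 1}, True),
        ({"Hungry": 0, "Rain": 1}, False)]
print(learn(data))  # {("Hungry", 1)}
```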

Page 9

Bound the set of consistent hypotheses with two limiting sets:

◦ S: the set of most specific consistent hypotheses

◦ G: the set of most general consistent hypotheses

Learning from (xi, yi): update Si and Gi

◦ For each hypothesis in Si:

FP: delete

FN: generalize to all neighbours

◦ For each hypothesis in Gi:

FP: specialize to all neighbours

FN: delete


[Figure: hypotheses ordered from specific to general, with the S and G sets bounding the version space]
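A compact sketch of one version-space (candidate elimination) update is shown below. It is added for illustration, assuming hypotheses are tuples of attribute values with "?" as a wildcard; the toy attribute domains and data are invented.

```python
# Sketch of candidate-elimination updates over conjunctive hypotheses.
WILDCARD = "?"

def satisfies(hypothesis, example):
    return all(h == WILDCARD or h == v for h, v in zip(hypothesis, example))

def more_general_or_equal(g, s):
    return all(gv == WILDCARD or gv == sv for gv, sv in zip(g, s))

def generalize(h, example):
    """Minimal generalization of h that covers the (positive) example."""
    return tuple(hv if hv == v else WILDCARD for hv, v in zip(h, example))

def specializations(h, example, domains):
    """Minimal specializations of h that exclude the (negative) example."""
    result = []
    for i, hv in enumerate(h):
        if hv == WILDCARD:
            for value in domains[i]:
                if value != example[i]:
                    result.append(h[:i] + (value,) + h[i + 1:])
    return result

def update(S, G, example, label, domains):
    if label:   # positive example
        G = [g for g in G if satisfies(g, example)]                 # FN in G: delete
        S = [generalize(s, example) if not satisfies(s, example) else s
             for s in S]                                            # FN in S: generalize
    else:       # negative example
        S = [s for s in S if not satisfies(s, example)]             # FP in S: delete
        G = [g2 for g in G                                          # FP in G: specialize
             for g2 in ([g] if not satisfies(g, example)
                        else specializations(g, example, domains))
             if any(more_general_or_equal(g2, s) for s in S)]       # keep only covers of S
    return S, G

domains = [("sunny", "rainy"), ("warm", "cold")]
S, G = [("sunny", "warm")], [(WILDCARD, WILDCARD)]       # seeded from a positive example
S, G = update(S, G, ("sunny", "cold"), True, domains)     # another positive example
S, G = update(S, G, ("rainy", "cold"), False, domains)    # a negative example
print("S =", S)   # [('sunny', '?')]
print("G =", G)   # [('sunny', '?')]
```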

Page 10

One possible representation for hypotheses

E.g., here is the "true" tree for deciding whether to wait (the restaurant example): [figure: decision tree]

Page 11

How many distinct decision trees with n Boolean attributes?
= number of Boolean functions
= number of distinct truth tables with 2^n rows = 2^(2^n)

E.g., with 6 Boolean attributes, there are 18,446,744,073,709,551,616 trees

How many purely conjunctive hypotheses (e.g., Hungry ∧ ¬Rain)?

Each attribute can be in (positive), in (negative), or out ⇒ 3^n distinct conjunctive hypotheses

More expressive hypothesis space:
◦ increases chance that the target function can be expressed
◦ increases number of hypotheses consistent with the training set
⇒ may get worse predictions
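For concreteness, these counts can be reproduced with a few lines of Python (added here as an illustration):

```python
# Reproduce the hypothesis-space counts for n Boolean attributes.
n = 6
num_boolean_functions = 2 ** (2 ** n)   # distinct decision trees / truth tables
num_conjunctive = 3 ** n                # each attribute: positive, negative, or absent
print(num_boolean_functions)            # 18446744073709551616
print(num_conjunctive)                  # 729
```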

Page 12

k-DL(n): decision lists over n attributes in which each test is a conjunction of at most k literals

Number of tests: Conj(n, k) = Σ_{i=0}^{k} C(2n, i) = O(n^k)

Number of test sequences: Conj(n, k)!

Number of decision lists: each test can yield Yes, No, or be absent, so |k-DL(n)| ≤ 3^Conj(n,k) · Conj(n,k)!
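A small Python check of these counting formulas, added for illustration (the values of n and k are arbitrary examples):

```python
# Evaluate the counting formulas for decision lists.
from math import comb, factorial

def conj(n, k):
    """Number of conjunctions of at most k literals over n Boolean attributes."""
    return sum(comb(2 * n, i) for i in range(k + 1))

n, k = 6, 2
c = conj(n, k)
print(c)                          # 79 tests
print(3 ** c * factorial(c))      # upper bound on |k-DL(n)|
```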

Page 13

Number of decision lists: |k-DL(n)| = 2^O(n^k log(n^k))

PAC sample complexity: m ≥ (1/ε) (ln(1/δ) + O(n^k log(n^k)))
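A sketch of evaluating this sample-size bound numerically, added for illustration; the O(...) term is replaced by the explicit ln|k-DL(n)| bound from the previous slide, and ε, δ, n, k are example values.

```python
# PAC sample-size bound m >= (1/eps) * (ln(1/delta) + ln|H|), with |H| bounded by
# the decision-list count 3^Conj(n,k) * Conj(n,k)! from the previous slide.
from math import comb, log, ceil

def conj(n, k):
    return sum(comb(2 * n, i) for i in range(k + 1))

def pac_sample_bound(n, k, eps, delta):
    c = conj(n, k)
    ln_h = c * log(3) + sum(log(i) for i in range(1, c + 1))   # ln(3^c * c!)
    return ceil((log(1 / delta) + ln_h) / eps)

print(pac_sample_bound(n=6, k=2, eps=0.1, delta=0.05))
```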

Page 14

Sensitivity: p(Prediction=TRUE|Ref=TRUE)

Specificity: p(Prediction=FALSE|Ref=FALSE)

PPV: p(Ref=TRUE|Prediction=TRUE)

NPV: p(Ref=FALSE|Prediction=FALSE)
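These four measures follow directly from the confusion-matrix counts; a minimal sketch with made-up counts (added for illustration):

```python
# Compute the four cost-free performance measures from a confusion matrix.
# The counts below are invented example values.
TP, FP, TN, FN = 40, 10, 45, 5

sensitivity = TP / (TP + FN)   # p(Prediction=TRUE | Ref=TRUE)
specificity = TN / (TN + FP)   # p(Prediction=FALSE | Ref=FALSE)
ppv = TP / (TP + FP)           # p(Ref=TRUE | Prediction=TRUE)
npv = TN / (TN + FN)           # p(Ref=FALSE | Prediction=FALSE)

print(sensitivity, specificity, ppv, npv)   # 0.888..., 0.818..., 0.8, 0.9
```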

Page 15

[Figure: decision tree over Bleeding (absent/weak/strong), Onset (early/late), Regularity (regular/irregular), and Mutation (h.wild/mutated); the leaves hold conditional probabilities such as P(D|a,l,m), P(D|a,l,h.w.), P(D|a,e), P(D|w,i,m), P(D|w,i,h.w.), P(D|w,r), and P(D|Bleeding=strong)]

Decision tree: each internal node represents a (univariate) test; the leaves contain the conditional probabilities given the values along the path.

Decision graph: if conditions are equivalent, the corresponding subtrees can be merged,
e.g. if (Bleeding=absent, Onset=late) ~ (Bleeding=weak, Regularity=irregular).
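As a data-structure sketch added to these notes, such a tree (or graph, when subtrees are shared) can be held as nested dictionaries keyed by test outcomes, with probabilities at the leaves; all attribute names and probability values below are invented for illustration.

```python
# Nested-tuple/dict sketch of a probability-estimation decision tree: internal nodes are
# (test attribute, {value: subtree}) pairs, leaves are P(D | path). All numbers are invented.
shared = ("Mutation", {"wild": 0.10, "mutated": 0.30})   # shared subtree = decision-graph merge
tree = ("Bleeding", {
    "absent": ("Onset", {"early": 0.05, "late": shared}),
    "weak":   ("Regularity", {"regular": 0.15, "irregular": shared}),
    "strong": 0.60,                                       # P(D | Bleeding=strong)
})

def predict(node, case):
    """Walk the tree along the attribute values in `case` and return P(D | path)."""
    while not isinstance(node, float):
        attribute, children = node
        node = children[case[attribute]]
    return node

print(predict(tree, {"Bleeding": "weak", "Regularity": "irregular", "Mutation": "mutated"}))  # 0.3
```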

Page 16

[Figure: class-conditional distributions of a test score for Healthy and Disease present, separated by a decision threshold t]

Page 17

[Figure: decision tree with actions a0 and a1 leading to outcomes o0 and o1]

Reported    Ref. = 0    Ref. = 1
0           C0|0        C0|1
1           C1|0        C1|1
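Given a predicted probability and such a cost matrix, the report that minimizes expected cost can be chosen directly; below is a minimal sketch with invented costs, added for illustration.

```python
# Choose the report (0 or 1) that minimizes expected cost, given P(Ref=1) and a
# cost matrix cost[report][ref]. The probability and costs are invented example values.
def optimal_report(p_ref1, cost):
    expected = {r: cost[r][0] * (1 - p_ref1) + cost[r][1] * p_ref1 for r in (0, 1)}
    return min(expected, key=expected.get), expected

cost = {0: {0: 0.0, 1: 10.0},   # C0|0, C0|1: missing a true case is expensive
        1: {0: 1.0, 1: 0.0}}    # C1|0, C1|1: a false alarm is cheap
print(optimal_report(p_ref1=0.2, cost=cost))   # reports 1 once 10*p > 1*(1-p), i.e. p > 1/11
```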

Page 18

Variables (nodes):

Flu: present/absent
FeverAbove38C: present/absent
Coughing: present/absent

[Figure: naive Bayes structure, Flu with children Fever and Coughing]

Model:

P(Flu=present)=0.001
P(Flu=absent)=1-P(Flu=present)

P(Fever=present|Flu=present)=0.6
P(Fever=absent|Flu=present)=1-0.6
P(Fever=present|Flu=absent)=0.01
P(Fever=absent|Flu=absent)=1-0.01

P(Coughing=present|Flu=present)=0.3
P(Coughing=absent|Flu=present)=1-0.3
P(Coughing=present|Flu=absent)=0.02
P(Coughing=absent|Flu=absent)=1-0.02

Assumptions:

1. Two types of nodes: a cause and its effects.

2. Effects are conditionally independent of each other given their cause.

Page 19

Decomposition of the joint:

P(Y, X1, .., Xn) = P(Y) ∏i P(Xi | Y, X1, .., Xi-1)   // by the chain rule

= P(Y) ∏i P(Xi | Y)   // by the N-BN (naive Bayesian network) assumption

2n+1 parameters!

Diagnostic inference:

P(Y | xi1, .., xik) = P(Y) ∏j P(xij | Y) / P(xi1, .., xik)

If Y is binary, then the odds
P(Y=1 | xi1, .., xik) / P(Y=0 | xi1, .., xik) = P(Y=1)/P(Y=0) · ∏j P(xij | Y=1) / P(xij | Y=0)

[Figure: naive Bayes structure, Flu with children Fever and Coughing]

P(Flu=present | Fever=absent, Coughing=present)
∝ P(Flu=present) · P(Fever=absent | Flu=present) · P(Coughing=present | Flu=present)
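A minimal Python sketch of this diagnostic inference, using the parameters from the previous slide (the data structures and function names are assumptions made for this example):

```python
# Naive Bayes diagnostic inference: P(Flu | evidence) for the two-effect model above.
prior = {"present": 0.001, "absent": 0.999}
cpt = {
    "Fever":    {"present": {"present": 0.6,  "absent": 0.4},
                 "absent":  {"present": 0.01, "absent": 0.99}},
    "Coughing": {"present": {"present": 0.3,  "absent": 0.7},
                 "absent":  {"present": 0.02, "absent": 0.98}},
}
# cpt[effect][flu_value][effect_value] = P(effect_value | flu_value)

def posterior(evidence):
    """evidence: dict like {"Fever": "absent", "Coughing": "present"}."""
    joint = {}
    for flu in ("present", "absent"):
        p = prior[flu]
        for effect, value in evidence.items():
            p *= cpt[effect][flu][value]
        joint[flu] = p
    z = sum(joint.values())            # P(evidence), the normalizing constant
    return {flu: p / z for flu, p in joint.items()}

print(posterior({"Fever": "absent", "Coughing": "present"}))
```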


Page 21

Naive concept learning

Learning decision lists

Decision trees and graphs

Optimal decisions

Error types in classification

Cost-free performance measures

Naive Bayesian network classifiers
