
Page 1

Decision Trees

Robot Image Credit: Viktoriya Sukhanova © 123RF.com

These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution. Please send comments and corrections to Eric.

Page 2

Function Approximation Problem Setting
• Set of possible instances $X$
• Set of possible labels $Y$
• Unknown target function $f : X \to Y$
• Set of function hypotheses $H = \{h \mid h : X \to Y\}$

Input: Training examples of the unknown target function $f$: $\{\langle x_i, y_i \rangle\}_{i=1}^{n} = \{\langle x_1, y_1 \rangle, \ldots, \langle x_n, y_n \rangle\}$

Output: Hypothesis $h \in H$ that best approximates $f$

Based on slide by Tom Mitchell

Page 3

Sample Dataset
• Columns denote features $X_i$
• Rows denote labeled instances $\langle x_i, y_i \rangle$
• Class label denotes whether a tennis game was played

Page 4

Decision Tree
• A possible decision tree for the data:
• Each internal node: test one attribute $X_i$
• Each branch from a node: selects one value for $X_i$
• Each leaf node: predict $Y$ (or $p(Y \mid x \in \text{leaf})$)

Based on slide by Tom Mitchell

Page 5

Decision Tree
• A possible decision tree for the data:
• What prediction would we make for <outlook=sunny, temperature=hot, humidity=high, wind=weak>?

Based on slide by Tom Mitchell

Page 6

Decision Tree
• If features are continuous, internal nodes can test the value of a feature against a threshold
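For intuition (a sketch, not from the slides; the feature name and threshold are hypothetical), such an internal node simply compares a feature value against a learned cutoff:

    # Hypothetical internal node for a continuous feature:
    # the test routes an instance to one of two subtrees.
    def humidity_node(x):
        if x["humidity"] <= 75.0:   # threshold chosen during training
            return "left subtree"
        return "right subtree"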

Page 7

Decision Tree Learning

Problem Setting:
• Set of possible instances $X$
  – each instance $x$ in $X$ is a feature vector $x = \langle x_1, x_2, \ldots, x_n \rangle$
  – e.g., <Humidity=low, Wind=weak, Outlook=rain, Temp=hot>
• Unknown target function $f : X \to Y$
  – $Y$ is discrete valued
• Set of function hypotheses $H = \{h \mid h : X \to Y\}$
  – each hypothesis $h$ is a decision tree
  – trees sort $x$ to a leaf, which assigns $y$

Input:
• Training examples $\{\langle x^{(i)}, y^{(i)} \rangle\}$ of unknown target function $f$

Output:
• Hypothesis $h \in H$ that best approximates target function $f$

Slide by Tom Mitchell

Page 8

Stages of (Batch) Machine Learning

Given: labeled training data $X, Y = \{\langle x_i, y_i \rangle\}_{i=1}^{n}$
• Assumes each $x_i \sim \mathcal{D}(\mathcal{X})$ with $y_i = f_{\text{target}}(x_i)$

Train the model:
model ← classifier.train(X, Y)

[Diagram: X, Y → learner → model]

Apply the model to new data:
• Given: new unlabeled instance $x \sim \mathcal{D}(\mathcal{X})$
y_prediction ← model.predict(x)

[Diagram: x → model → y_prediction]
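One concrete realization of this train/predict pattern (not part of the slides) is scikit-learn's decision tree, which exposes exactly this two-stage interface; the toy data below is made up:

    # Batch learning in two stages: train, then predict.
    from sklearn.tree import DecisionTreeClassifier

    X = [[0, 1], [1, 1], [1, 0], [0, 0]]  # labeled training instances
    Y = [1, 1, 0, 0]                      # their class labels

    model = DecisionTreeClassifier().fit(X, Y)  # model <- classifier.train(X, Y)
    y_prediction = model.predict([[1, 1]])      # y_prediction <- model.predict(x)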

Page 9

Example Application: A Tree to Predict Caesarean Section Risk

Decision Trees
Suppose $X = \langle X_1, \ldots, X_n \rangle$ where the $X_i$ are boolean variables.

How would you represent $Y = X_2 \wedge X_5$?  $Y = X_2 \vee X_5$?
How would you represent $Y = X_2 \wedge X_5 \vee X_3 \wedge X_4 \wedge (\neg X_1)$?

Based on example by Tom Mitchell
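As a sketch (not from the slides), a decision tree encodes such a boolean formula as nested attribute tests, one per internal node; here is $Y = X_2 \wedge X_5$:

    # Decision tree for Y = X2 AND X5: each "if" is an internal
    # node and each "return" is a leaf.
    def predict(x2: bool, x5: bool) -> bool:
        if x2:            # root tests X2
            if x5:        # second node tests X5
                return True    # leaf: Y = 1
            return False       # leaf: Y = 0
        return False           # leaf: Y = 0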

Page 10

Decision Tree Induced Partition

[Figure: a decision tree testing Color (red / green / blue) at the root, with Shape (square / round) and Size (big / small) tests below and +/− class labels at the leaves, shown alongside the partition of the feature space it induces]

Page 11

Decision Tree – Decision Boundary
• Decision trees divide the feature space into axis-parallel (hyper-)rectangles
• Each rectangular region is labeled with one label
  – or a probability distribution over labels

[Figure: decision boundary]

Page 12

Expressiveness
• Decision trees can represent any boolean function of the input attributes
• In the worst case, the tree will require exponentially many nodes

Truth table row → path to leaf

Page 13

Expressiveness
Decision trees have a variable-sized hypothesis space
• As the # nodes (or depth) increases, the hypothesis space grows
  – Depth 1 ("decision stump"): can represent any boolean function of one feature
  – Depth 2: any boolean function of two features; some involving three features (e.g., $(x_1 \wedge x_2) \vee (\neg x_1 \wedge \neg x_3)$)
  – etc.

Based on slide by Pedro Domingos

Page 14

Another Example: Restaurant Domain (Russell & Norvig)

Model a patron's decision of whether to wait for a table at a restaurant

~7,000 possible cases

Page 15

A Decision Tree from Introspection

Is this the best decision tree?

Page 16

Preference bias: Ockham's Razor
• Principle stated by William of Ockham (1285–1347)
  – "non sunt multiplicanda entia praeter necessitatem"
  – entities are not to be multiplied beyond necessity
  – AKA Occam's Razor, Law of Economy, or Law of Parsimony

• Therefore, the smallest decision tree that correctly classifies all of the training examples is best
• Finding the provably smallest decision tree is NP-hard
• ...So instead of constructing the absolute smallest tree consistent with the training examples, construct one that is pretty small

Idea: The simplest consistent explanation is the best

Page 17

Basic Algorithm for Top-Down Induction of Decision Trees
[ID3, C4.5 by Quinlan]

node = root of decision tree
Main loop:
1. A ← the "best" decision attribute for the next node.
2. Assign A as decision attribute for node.
3. For each value of A, create a new descendant of node.
4. Sort training examples to leaf nodes.
5. If training examples are perfectly classified, stop. Else, recurse over new leaf nodes.

How do we choose which attribute is best?

Page 18

Choosing the Best Attribute
Key problem: choosing which attribute to split a given set of examples
• Some possibilities are:
  – Random: Select any attribute at random
  – Least-Values: Choose the attribute with the smallest number of possible values
  – Most-Values: Choose the attribute with the largest number of possible values
  – Max-Gain: Choose the attribute that has the largest expected information gain
    • i.e., the attribute that results in the smallest expected size of the subtrees rooted at its children
• The ID3 algorithm uses the Max-Gain method of selecting the best attribute

Page 19

Choosing an Attribute
Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative"

Which split is more informative: Patrons? or Type?

Based on slide from M. desJardins & T. Finin

Page 20

ID3-Induced Decision Tree

Based on slide from M. desJardins & T. Finin

Page 21

Compare the Two Decision Trees

Based on slide from M. desJardins & T. Finin

Page 22

Information Gain

Which test is more informative?

[Figure: two candidate splits — one over whether Balance exceeds 50K (branches: over 50K / less than or equal to 50K), one over whether the applicant is employed (branches: employed / unemployed)]

Based on slide by Pedro Domingos

Page 23

Information Gain

Impurity/Entropy (informal)
– Measures the level of impurity in a group of examples

Based on slide by Pedro Domingos

Page 24

Impurity

[Figure: three groups of examples — a very impure group, a less impure group, and a group with minimum impurity]

Based on slide by Pedro Domingos

Page 25

Entropy: a common way to measure impurity

Entropy $H(X)$ of a random variable $X$:
$H(X) = -\sum_{i=1}^{k} P(X = i) \log_2 P(X = i)$   ($k$ = # of possible values for $X$)

$H(X)$ is the expected number of bits needed to encode a randomly drawn value of $X$ (under the most efficient code)

Why? Information theory:
• The most efficient code assigns $-\log_2 P(X = i)$ bits to encode the message $X = i$
• So, the expected number of bits to code one random $X$ is $-\sum_{i=1}^{k} P(X = i) \log_2 P(X = i)$

Slide by Tom Mitchell


Page 27

Example: Huffman code
• In 1952, MIT student David Huffman devised, in the course of doing a homework assignment, an elegant coding scheme which is optimal in the case where all symbols' probabilities are integral powers of 1/2.
• A Huffman code can be built in the following manner:
  – Rank all symbols in order of probability of occurrence
  – Successively combine the two symbols of the lowest probability to form a new composite symbol; eventually we will build a binary tree where each node is the probability of all nodes beneath it
  – Trace a path to each leaf, noticing the direction at each node

Based on slide from M. desJardins & T. Finin

Page 28

Huffman code example

M    p
A    .125
B    .125
C    .25
D    .5

[Figure: the Huffman tree — A (.125) and B (.125) merge into a node of probability .25, which merges with C (.25) into a node of probability .5, which merges with D (.5) into the root (probability 1); each branch is labeled 0 or 1]

M    code    length    p       p × length
A    000     3         0.125   0.375
B    001     3         0.125   0.375
C    01      2         0.250   0.500
D    1       1         0.500   0.500

average message length: 1.750

If we use this code to send many messages (A, B, C, or D) with this probability distribution, then, over time, the average bits/message should approach 1.75

Based on slide from M. desJardins & T. Finin
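A minimal runnable sketch of this bottom-up construction (the function name and representation are my own; the codeword labels can come out bit-flipped relative to the table above, but the lengths, and hence the 1.75 average, match):

    # Build a Huffman code by repeatedly merging the two
    # lowest-probability nodes, as described above.
    import heapq

    def huffman_code(probs):
        # Heap entries: (probability, tiebreaker, {symbol: codeword-so-far})
        heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        count = len(heap)
        while len(heap) > 1:
            p1, _, c1 = heapq.heappop(heap)   # two lowest-probability nodes
            p2, _, c2 = heapq.heappop(heap)
            merged = {s: "0" + w for s, w in c1.items()}        # left branch: 0
            merged.update({s: "1" + w for s, w in c2.items()})  # right branch: 1
            heapq.heappush(heap, (p1 + p2, count, merged))
            count += 1
        return heap[0][2]

    probs = {"A": 0.125, "B": 0.125, "C": 0.25, "D": 0.5}
    code = huffman_code(probs)
    print(code)  # e.g., {'D': '0', 'C': '10', 'A': '110', 'B': '111'}
    print(sum(probs[s] * len(code[s]) for s in probs))  # average length: 1.75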

Page 29

2-Class Cases:

Entropy: $H(x) = -\sum_{i=1}^{n} P(x = i) \log_2 P(x = i)$

• What is the entropy of a group in which all examples belong to the same class?
  – entropy = $-1 \log_2 1 = 0$
  → minimum impurity; not a good training set for learning

• What is the entropy of a group with 50% in either class?
  – entropy = $-0.5 \log_2 0.5 - 0.5 \log_2 0.5 = 1$
  → maximum impurity; good training set for learning

Based on slide by Pedro Domingos
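A quick numerical check of these two cases (the helper function is my own):

    # Entropy of a class distribution: H = -sum_i p_i log2 p_i
    import math

    def entropy(ps):
        return -sum(p * math.log2(p) for p in ps if p > 0)  # 0·log 0 := 0

    print(entropy([1.0]))        # all one class -> 0.0 (minimum impurity)
    print(entropy([0.5, 0.5]))   # 50/50 split   -> 1.0 (maximum impurity)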

Page 30

Sample Entropy

Entropy $H(X)$ of a random variable $X$

Specific conditional entropy $H(X \mid Y = v)$ of $X$ given $Y = v$

Conditional entropy $H(X \mid Y)$ of $X$ given $Y$

Mutual information (aka Information Gain) of $X$ and $Y$

Slide by Tom Mitchell
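The formulas for these four quantities did not survive extraction; their standard definitions are:

$H(X) = -\sum_{i} P(X = i) \log_2 P(X = i)$

$H(X \mid Y = v) = -\sum_{i} P(X = i \mid Y = v) \log_2 P(X = i \mid Y = v)$

$H(X \mid Y) = \sum_{v \in \text{values}(Y)} P(Y = v) \, H(X \mid Y = v)$

$I(X, Y) = H(X) - H(X \mid Y)$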

Page 31

Information Gain
• We want to determine which attribute in a given set of training feature vectors is most useful for discriminating between the classes to be learned.
• Information gain tells us how important a given attribute of the feature vectors is.
• We will use it to decide the ordering of attributes in the nodes of a decision tree.

Based on slide by Pedro Domingos

Page 32

From Entropy to Information Gain

Entropy $H(X)$ of a random variable $X$

Specific conditional entropy $H(X \mid Y = v)$ of $X$ given $Y = v$

Conditional entropy $H(X \mid Y)$ of $X$ given $Y$

Mutual information (aka Information Gain) of $X$ and $Y$

Slide by Tom Mitchell


Page 36

Information Gain

Information Gain is the mutual information between input attribute A and target variable Y

Information Gain is the expected reduction in entropy of target variable Y for data sample S, due to sorting on variable A

Slide by Tom Mitchell
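In symbols (the slide's own formula was lost in extraction; this is the standard ID3 definition):

$Gain(S, A) \equiv H_S(Y) - H_S(Y \mid A) = H_S(Y) - \sum_{v \in \text{values}(A)} \frac{|S_v|}{|S|} H_{S_v}(Y)$

where $S_v$ is the subset of $S$ on which attribute $A$ takes value $v$.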

Page 37

Calculating Information Gain

Information Gain = entropy(parent) − [average entropy(children)]

Parent entropy (entire population, 30 instances, 14 vs. 16 per class):
$-\left(\frac{14}{30} \log_2 \frac{14}{30}\right) - \left(\frac{16}{30} \log_2 \frac{16}{30}\right) = 0.996$

Child entropy (17 instances, 13 vs. 4):
$-\left(\frac{13}{17} \log_2 \frac{13}{17}\right) - \left(\frac{4}{17} \log_2 \frac{4}{17}\right) = 0.787$

Child entropy (13 instances, 1 vs. 12):
$-\left(\frac{1}{13} \log_2 \frac{1}{13}\right) - \left(\frac{12}{13} \log_2 \frac{12}{13}\right) = 0.391$

(Weighted) Average Entropy of Children = $\frac{17}{30} \times 0.787 + \frac{13}{30} \times 0.391 = 0.615$

Information Gain = 0.996 − 0.615 = 0.38

Based on slide by Pedro Domingos
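The same numbers, re-derived in a short sketch (the helper function is my own):

    # Re-derive the worked example: parent 14 vs. 16,
    # children 13 vs. 4 (17 instances) and 1 vs. 12 (13 instances).
    import math

    def entropy(counts):
        total = sum(counts)
        return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

    parent = entropy([14, 16])                        # 0.996
    left, right = entropy([13, 4]), entropy([1, 12])  # 0.787, 0.391
    avg_children = 17/30 * left + 13/30 * right       # 0.615
    print(round(parent - avg_children, 2))            # information gain: 0.38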

Page 38

Entropy-Based Automatic Decision Tree Construction

Training Set X:
x1 = (f11, f12, ..., f1m)
x2 = (f21, f22, ..., f2m)
...
xn = (fn1, fn2, ..., fnm)

Node 1: What feature should be used? What values?

Quinlan suggested information gain in his ID3 system and later the gain ratio, both based on entropy.

Based on slide by Pedro Domingos

Page 39

Using Information Gain to Construct a Decision Tree

Choose the attribute A with the highest information gain for the full training set X at the root of the tree.

Construct child nodes for each value of A. Each has an associated subset of vectors in which A has a particular value:
X′ = {x ∈ X | value(A) = v1}

[Figure: Attribute A at the root of the full training set X, with branches v1, v2, ..., vk leading to subsets X′]

Repeat recursively... till when?

Disadvantage of information gain:
• It prefers attributes with a large number of values that split the data into small, pure subsets
• Quinlan's gain ratio uses normalization to improve this

Based on slide by Pedro Domingos
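For reference, the gain ratio mentioned above (the standard C4.5 definition; not spelled out on the slide) normalizes the gain by the entropy of the split itself, penalizing many-valued attributes:

$SplitInfo(S, A) = -\sum_{v \in \text{values}(A)} \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|} \qquad GainRatio(S, A) = \frac{Gain(S, A)}{SplitInfo(S, A)}$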


Page 41

[Pages 41–43: figure-only slides; their content did not survive extraction.]

Slide by Tom Mitchell

Page 44

Decision Tree Applet

http://webdocs.cs.ualberta.ca/~aixplore/learning/DecisionTrees/Applet/DecisionTreeApplet.html

Page 45

Decision Tree Learning Applet
• http://www.cs.ualberta.ca/%7Eaixplore/learning/DecisionTrees/Applet/DecisionTreeApplet.html

Which Tree Should We Output?
• ID3 performs heuristic search through space of decision trees
• It stops at smallest acceptable tree. Why?

Occam's razor: prefer the simplest hypothesis that fits the data

Slide by Tom Mitchell

Page 46

The ID3 algorithm builds a decision tree, given a set of non-categorical attributes C1, C2, ..., Cn, the class attribute C, and a training set T of records.

function ID3(R: input attributes, C: class attribute, S: training set) returns decision tree;
  If S is empty, return a single node with value Failure;
  If every example in S has the same value for C, return a single node with that value;
  If R is empty, then return a single node with the most frequent of the values of C found in examples of S; # causes errors -- improperly classified record
  Let D be the attribute with largest Gain(D, S) among R;
  Let {dj | j = 1, 2, ..., m} be the values of attribute D;
  Let {Sj | j = 1, 2, ..., m} be the subsets of S consisting of records with value dj for attribute D;
  Return a tree with root labeled D and arcs labeled d1, ..., dm going to the trees ID3(R-{D}, C, S1), ..., ID3(R-{D}, C, Sm);

Based on slide from M. desJardins & T. Finin
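A minimal runnable sketch of this pseudocode in Python (the dict-based tree representation and the tiny dataset are my own illustrations):

    # ID3 sketch: each example is a dict of attribute -> value,
    # with the class label stored under the key `target`.
    import math
    from collections import Counter

    def entropy(examples, target):
        counts = Counter(ex[target] for ex in examples)
        total = len(examples)
        return -sum(c / total * math.log2(c / total) for c in counts.values())

    def gain(examples, attr, target):
        total = len(examples)
        remainder = 0.0
        for v in set(ex[attr] for ex in examples):
            subset = [ex for ex in examples if ex[attr] == v]
            remainder += len(subset) / total * entropy(subset, target)
        return entropy(examples, target) - remainder

    def id3(examples, attrs, target):
        if not examples:
            return "Failure"                  # S is empty
        labels = [ex[target] for ex in examples]
        if len(set(labels)) == 1:
            return labels[0]                  # every example agrees on C
        if not attrs:
            return Counter(labels).most_common(1)[0][0]  # most frequent value of C
        best = max(attrs, key=lambda a: gain(examples, a, target))
        tree = {best: {}}
        for v in set(ex[best] for ex in examples):       # one arc per value of D
            subset = [ex for ex in examples if ex[best] == v]
            rest = [a for a in attrs if a != best]
            tree[best][v] = id3(subset, rest, target)
        return tree

    # Tiny usage example (hypothetical data):
    data = [
        {"outlook": "sunny", "wind": "weak", "play": "no"},
        {"outlook": "sunny", "wind": "strong", "play": "no"},
        {"outlook": "overcast", "wind": "weak", "play": "yes"},
        {"outlook": "rain", "wind": "weak", "play": "yes"},
        {"outlook": "rain", "wind": "strong", "play": "no"},
    ]
    print(id3(data, ["outlook", "wind"], "play"))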

Page 47

How well does it work?
Many case studies have shown that decision trees are at least as accurate as human experts.
– A study for diagnosing breast cancer had humans correctly classifying the examples 65% of the time; the decision tree classified 72% correctly
– British Petroleum designed a decision tree for gas-oil separation for offshore oil platforms that replaced an earlier rule-based expert system
– Cessna designed an airplane flight controller using 90,000 examples and 20 attributes per example

Based on slide from M. desJardins & T. Finin