Decision Trees
Sargur Srihari
Machine Learning

Decision Trees - University at Buffalosrihari/CSE574/Chap14/14... · 1.Decision Trees 2.CART as an adaptive basis function model 3.Growing a tree 4.Pruning a tree 5.Pros and Cons

Topics in Decision Trees
1. Decision Trees
2. CART as an adaptive basis function model
3. Growing a tree
4. Pruning a tree
5. Pros and Cons of Trees
6. Random Forests

Decision Tree
• Choice of model is a function of the input variables
  – Different models become responsible for making predictions in different regions of input space
  – The choice of model can be described as a sequence of binary selections corresponding to traversal of a tree structure

Tree-based Models
• Partition the input space into cuboid regions
  – Edges aligned along the axes
  – Assign a simple model, e.g., a constant, to each region
• Can be viewed as a model combination method in which only one model is responsible for making predictions at any given point in input space
• Process of selecting a model for input x corresponds to the traversal of a binary tree

Recursive Binary Tree Partitioning

[Figure, left: 2-D input space x = [x1, x2] partitioned into five regions A, B, C, D, E using axis-aligned boundaries, with thresholds θ1 and θ4 on x1 and θ2 and θ3 on x2]

[Figure, right: binary tree corresponding to the partitioning of input space, with node tests x1 > θ1, x2 ≤ θ2, x2 > θ3, x1 ≤ θ4 and leaves A, B, C, D, E]

• The first step divides the whole of input space into two regions according to whether x1 ≤ θ1 or x1 > θ1, where θ1 is a parameter of the model
• This creates two subregions, each of which can be subdivided independently
  – E.g., region x1 ≤ θ1 is further subdivided according to whether x2 ≤ θ2 or x2 > θ2, giving rise to the regions denoted A and B
• For a new input x, we determine which region it falls into by starting at the top of the tree
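The traversal described above can be sketched as a short function. This is only a minimal sketch and is not taken from the slides: the name `find_region` and the numeric thresholds are hypothetical. The text states only the split on θ1 and the A/B split on θ2; the placement of C, D, and E is an assumption inferred from the node tests in the tree figure (x2 > θ3 and x1 ≤ θ4).

```python
def find_region(x1, x2, theta1, theta2, theta3, theta4):
    """Return the label of the region containing the point (x1, x2)."""
    # Root node: split the whole input space on x1 at theta1.
    if x1 <= theta1:
        # The x1 <= theta1 subregion is subdivided on x2 at theta2.
        # Assumption: A is the x2 <= theta2 side, B the x2 > theta2 side.
        return "A" if x2 <= theta2 else "B"
    # The x1 > theta1 subregion: assumed split on x2 at theta3.
    if x2 > theta3:
        return "E"  # assumption: E is the x2 > theta3 side
    # Assumed final split on x1 at theta4 into C and D.
    return "C" if x1 <= theta4 else "D"


# Hypothetical thresholds, chosen only for illustration.
thetas = dict(theta1=1.0, theta2=2.0, theta3=3.0, theta4=4.0)
print(find_region(0.5, 1.0, **thetas))  # A
print(find_region(2.0, 5.0, **thetas))  # E
```

Each call follows exactly one root-to-leaf path, so the cost of locating a region grows with the depth of the tree rather than with the number of regions.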

Use of Tree Models
• Can be used for both regression and classification
• So they are called Classification and Regression Trees (CART)
• Within each region there is a separate model to predict the target variable
  – In regression, we might simply predict a constant over each region
  – In classification, we might assign each region to a specific class
• Key property: they are interpretable by humans
  – To predict a disease: is temperature greater than a threshold? Is BP less than a threshold? Each leaf is a diagnosis
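The two kinds of per-region model mentioned above can be illustrated as follows. This is a sketch under assumptions: taking the mean of the training targets as the regression constant and the most frequent label as the class are common choices, not something this slide specifies, and the function names and sample values are hypothetical.

```python
from collections import Counter
from statistics import mean


def regression_leaf_value(targets):
    """Constant predicted for a region: here, the mean of its training targets."""
    return mean(targets)


def classification_leaf_label(labels):
    """Class assigned to a region: here, the most frequent training label."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical training targets that fell into a single region.
print(regression_leaf_value([2.0, 3.0, 4.0]))            # 3.0
print(classification_leaf_label(["Yes", "No", "Yes"]))  # Yes
```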

CART as adaptive basis model
• An adaptive basis function model (ABM) is a model of the form

  $$f(x) = w_0 + \sum_{m=1}^{M} w_m \phi_m(x)$$

  – Where φm(x) is the mth basis function, which is learned from the data
• Typically the basis functions are parametric
  – We can write φm(x) = φ(x; vm)
    • Where vm are the parameters of the basis function itself
    • We can use θ = (w0, w, v) to denote the entire parameter set
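To make the form of the model concrete, here is a minimal sketch that evaluates f(x) for a given parameter set. Everything beyond the formula itself is an assumption made for illustration: the names `abm_predict` and `step_basis`, the choice of a threshold (step) basis function, and the numeric values. The sketch only evaluates the model; it does not learn the basis functions from data.

```python
def step_basis(x, v):
    """Hypothetical basis phi(x; v): 1 if feature v[0] of x is <= threshold v[1], else 0."""
    feature_index, threshold = v
    return 1.0 if x[feature_index] <= threshold else 0.0


def abm_predict(x, w0, w, v, phi=step_basis):
    """Evaluate f(x) = w0 + sum over m of w[m] * phi(x; v[m])."""
    return w0 + sum(w_m * phi(x, v_m) for w_m, v_m in zip(w, v))


# Hypothetical parameter set theta = (w0, w, v) with M = 2 basis functions.
w0 = 0.5
w = [2.0, -1.0]
v = [(0, 1.0), (1, 3.0)]  # (feature index, threshold) for each basis function

print(abm_predict([0.0, 5.0], w0, w, v))  # 0.5 + 2.0*1 + (-1.0)*0 = 2.5
```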

Regression Tree with 2 inputs
• Two input attributes (x1, x2), a real output y
• First node asks if x1 ≤ t1
  – If yes, ask if x2 ≤ t2
  – If yes again, we are in region R1
• Associate a value of y with each region
• Result of axis-parallel splits: 2-D space partitioned into 5 regions
• Each region is associated with a mean response
  – Result: piecewise constant function
• Model can be written as:

  $$f(x) = E[y \mid x] = \sum_{m=1}^{M} w_m I(x \in R_m) = \sum_{m=1}^{M} w_m \phi(x; v_m)$$

[Figure: regression tree on inputs x1 and x2 with split thresholds t1, t2, t3, t4; the leaves are regions R1 to R5 with mean responses w1 to w5, giving a piecewise constant surface for y]

K. Murphy, Machine Learning, 2012

Tree as a basis function model
• The axis-parallel splits partition the 2-D space into 5 regions
  – We associate a mean response with each region, resulting in a piecewise constant surface
• We can write the model in the form

  $$f(x) = E[y \mid x] = \sum_{m=1}^{M} w_m I(x \in R_m) = \sum_{m=1}^{M} w_m \phi(x; v_m)$$

• Where Rm is the mth region, wm is the mean response of this region, and vm encodes the choice of variable to split on and the threshold value on the path from the root to the mth leaf
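The indicator form of the model can be checked numerically. The sketch below is an illustration under assumptions, not the tree from the figure: the regions are hypothetical axis-aligned boxes given as (x1_low, x1_high, x2_low, x2_high), the mean responses are made-up values, and the names `in_region` and `tree_predict` are introduced here.

```python
import math

# Hypothetical regions R1..R5 as half-open boxes: low < x <= high on each axis.
REGIONS = [
    (-math.inf, 5.0, -math.inf, 5.0),  # R1: x1 <= 5 and x2 <= 5
    (-math.inf, 5.0, 5.0, math.inf),   # R2: x1 <= 5 and x2 > 5
    (5.0, 8.0, -math.inf, math.inf),   # R3: 5 < x1 <= 8
    (8.0, math.inf, -math.inf, 3.0),   # R4: x1 > 8 and x2 <= 3
    (8.0, math.inf, 3.0, math.inf),    # R5: x1 > 8 and x2 > 3
]
# Hypothetical mean responses w1..w5, one per region.
WEIGHTS = [2.0, 4.0, 6.0, 8.0, 10.0]


def in_region(x, region):
    """Indicator I(x in R_m): 1 if the point lies in the box, else 0."""
    x1_low, x1_high, x2_low, x2_high = region
    x1, x2 = x
    return int(x1_low < x1 <= x1_high and x2_low < x2 <= x2_high)


def tree_predict(x):
    """f(x) = sum over m of w_m * I(x in R_m)."""
    return sum(w * in_region(x, r) for w, r in zip(WEIGHTS, REGIONS))


print(tree_predict((1.0, 1.0)))  # point in R1 -> 2.0
print(tree_predict((9.0, 4.0)))  # point in R5 -> 10.0
```

Because the boxes do not overlap and together cover the plane, exactly one indicator is nonzero for any finite input, so the sum reduces to the mean response of the region containing x.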

Classification Tree (Binary Output)

PlayTennis has 4 input attributes

Day  Outlook   Temp  Humidity  Wind    PlayTennis
D1   Sunny     Hot   High      Weak    No
D2   Sunny     Hot   High      Strong  No
D3   Overcast  Hot   High      Weak    Yes
D4   Rain      Mild  High      Weak    Yes
D5   Rain      Cool  Normal    Weak    Yes
D6   Rain      Cool  Normal    Strong  No
D7   Overcast  Cool  Normal    Strong  Yes
D8   Sunny     Mild  High      Weak    No
D9   Sunny     Cool  Normal    Weak    Yes
D10  Rain      Mild  Normal    Weak    Yes
D11  Sunny     Mild  Normal    Strong  Yes
D12  Overcast  Mild  High      Strong  Yes
D13  Overcast  Hot   Normal    Weak    Yes
D14  Rain      Mild  High      Strong  No

Learned function is a tree

T. Mitchell, Machine Learning, 1986
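As an illustration of a learned function being a tree, the sketch below encodes the tree commonly shown for this dataset (Outlook at the root, then Humidity for Sunny days and Wind for Rain days) and checks it against the 14 rows of the table. The tree structure is an assumption here, since the figure did not survive in this text, and the function name `play_tennis` is hypothetical. Note that this tree never consults Temp.

```python
# Rows from the table: (Outlook, Temp, Humidity, Wind, PlayTennis).
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]


def play_tennis(outlook, temp, humidity, wind):
    """Assumed tree: each leaf assigns a class to a region of attribute space."""
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Sunny":
        return "Yes" if humidity == "Normal" else "No"
    # Remaining case: outlook == "Rain".
    return "Yes" if wind == "Weak" else "No"


# Count how many training rows the assumed tree classifies correctly.
correct = sum(play_tennis(*row[:4]) == row[4] for row in DATA)
print(f"{correct} of {len(DATA)} training rows classified correctly")
```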