Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
Machine Learning Srihari
Topics in Decision Trees1. Decision Trees2. CART as an adaptive basis function
model3. Growing a tree4. Pruning a tree5. Pros and Cons of Trees6. Random Forests
2
Machine Learning Srihari
Decision Tree• Choice of model is function of input
variables– Different models become responsible for
making predictions in different regions of input space
– A sequence of binary selections corresponding to traversal of a tree structure
Machine Learning Srihari
Tree-based Models• Partition the input space into cuboid regions
– Edges aligned along the axes– Assign a simple model, e.g., a constant, to each
region• Can be viewed as a model combination
method in which only one model is responsible for making predictions at any given point in space
• Process of selecting a model for input xcorresponds to the traversal of a binary tree
Machine Learning Srihari
Recursive Binary Tree Partitioning
A
B
C D
E
✓1 ✓4
✓2
✓3
x1
x2
x1 > ✓1
x2 > ✓3
x1 6 ✓4
x2 6 ✓2
A B C D E
2-D input space x=[x1,x2]Partitioned into five regionsusing axis aligned boundaries
Binary Tree corresponding toPartitioning of input space
First step divides the whole of input space into two regions according to whether x1≤θ1 or x1>θ1 where θ1 is a parameter of the model This creates two subregions, each of which can be subdivided independentlye.g., region x1≤θ1 is further subdivided according to whether x2≤θ2 or x2>θ2giving rise to the regions denoted A and BFor a new input x we determine which region it falls into by starting at top of tree
Machine Learning Srihari
Use of Tree Models• Can be used for both regression and classification• So they are called Classification and Regression Trees
(CART)• Within each region there is a separate model to predict a
target variable– In regression, we might simply predict a constant over each region– In classification, we might assign each region to a specific class
• Key property: they are interpretable by humans– To predict a disease, is temperature greater than a threshold?, is
BP less than a threshold? Each leaf is a dignosis
Machine Learning Srihari
CART as adaptive basis model• An adaptive basis function model (ABM) is a
model of the form
– Where ϕm(x) is the mth basis function which is learned from the data
• Typically the basis functions are parametric– We can write ϕm(x)=ϕ (x ; vm)
• Where vm are the parameters of the basis function itself• We can use θ=(w0,w, v) to denote the entire parameter
set
f (x) = w
0+ w
mm=1
M
∑ φm(x)
Machine Learning Srihari
Regression Tree with 2 inputs
Each region is associated with a mean responseResult: piecewise constant function
Model can be written as:
Two input attributes (x1,x2), a real output
K. Murphy, Machine Learning, 2012
First node asks if x1 ≤ t1If yes, ask if x2 ≤ t2If yes, we are at quadrant R1Associate a value of y with each region
x1 x2
t1 t2
t3t4
Result of axis-parallel splits:2-d space partitioned into 5 regions
y
f (x) = E[y | x ] = w
mm=1
M
∑ I(x ∈Rm) = w
mm=1
M
∑ φ(x;vm)
R1
R2
R3
R5
R4
w1
w2
w3
w4
w5
Machine Learning Srihari
Tree as a basis function model
• The axis parallel splits partition the 2-D space into 5 regions– We associate a mean response with each region
resulting in a piecewise constant surface• We can write the model in the form
• Where Rm is the mth region, wm is the mean response of this region and vm encodes the choice of variable to split on and the the threshold value on the path from the root to the mth leaf
f (x) = E[y |x ] = w
mm=1
M
∑ I(x ∈Rm) = w
Mm=1
M
∑ φ(x;vm)
Machine Learning Srihari
10
Classification Tree (Binary Output)
Day Outlook Temp Humidity Wind PlayTennis D1 Sunny Hot High Weak No D2 Sunny Hot High Strong No D3 Overcast Hot High Weak Yes D4 Rain Mild High Weak Yes D5 Rain Cool Normal Weak Yes D6 Rain Cool Normal Strong No D7 Overcast Cool Normal Strong Yes D8 Sunny Mild High Weak No D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes D11 Sunny Mild Normal Strong Yes D12 Overcast Mild High Strong Yes D13 Overcast Hot Normal Weak Yes D14 Rain Mild High Strong No
PlayTennis has 4 input attributes
Learned function is a tree
T. Mitchell, Machine Learning, 1986