
Page 1

Decision Trees, kNN Classifier

Machine Learning 10-601B

Seyoung Kim

Many of these slides are derived from Tom Mitchell and William Cohen. Thanks!

Page 2

Beyond linearity

• Decision tree
  – What decision trees are
  – How to learn them

• Nearest neighbor classifier

Page 3

Decision Tree Learning

Problem Setting:
• Set of possible instances X
  – each instance x in X is a feature vector x = <x1, x2, ..., xn>
• Set of possible labels Y
  – Y is discrete-valued
• Unknown target function f : X → Y
• Set of function hypotheses H = { h | h : X → Y }
  – each hypothesis h is a decision tree

Input:
• Training examples {<x(i), y(i)>} of unknown target function f

Output:
• Hypothesis h ∈ H that best approximates target function f

Page 4

Each internal node: test one discrete-valued attribute Xi

Each branch from a node: selects one value for Xi

Each leaf node: predict Y (or P(Y|X ∈ leaf))

A decision tree for f: <Outlook, Humidity, Wind, Temp> → PlayTennis?
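Such a tree can be sketched as nested dicts, with prediction walking one attribute test per internal node until a leaf. This is a minimal illustration; the tree below follows the classic PlayTennis example (the slide's figure did not survive extraction, so the exact splits are assumed):

```python
# Internal nodes map an attribute name to {value: subtree}; leaves are labels.
tree = {
    "Outlook": {
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}},
    }
}

def predict(node, x):
    """Walk the tree: test one discrete-valued attribute per internal node."""
    while isinstance(node, dict):
        attribute = next(iter(node))          # the attribute tested at this node
        node = node[attribute][x[attribute]]  # follow the branch for x's value
    return node

print(predict(tree, {"Outlook": "Sunny", "Humidity": "Normal", "Wind": "Weak"}))  # Yes
```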

Page 5

Page 6

Decision tree learning

Page 7

Motivations for Decision Trees

• Often you can find a fairly accurate classifier which is small and easy to understand.
  – Sometimes this gives you useful insight into a problem

• Sometimes features interact in complicated ways
  – Trees can find interactions (e.g., "sunny and humid") that linear classifiers can't

• Trees are very inexpensive at test time
  – You don't always even need to compute all the features of an example.
  – You can even build classifiers that take this into account...
  – Sometimes that's important (e.g., "bloodPressure<100" vs "MRIScan=normal" might have different costs to compute).

Page 8

Decision Tree Learning Algorithm

Page 9

Decision tree learning algorithms

1. Given dataset D:
   – return leaf(y) if all examples are in the same class y ... or nearly so
   – pick the best split, on the best attribute a
     • a or not(a)
     • a = c1 or a = c2 or ...
     • a < θ or a ≥ θ
     • a in {c1, ..., ck} or not
   – split the data into D1, D2, ..., Dk and recursively build trees for each subset

2. "Prune" the tree
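The recursive step 1 above can be sketched as follows, using the entropy-lowering split criterion described on the next slide. The `min_purity` threshold and the nested-dict tree representation are illustrative choices, not from the slides:

```python
import math
from collections import Counter

def entropy(labels):
    """Sample entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def build_tree(data, labels, attributes, min_purity=1.0):
    # 1. Return a leaf if all examples are in the same class y ... or nearly so.
    majority, count = Counter(labels).most_common(1)[0]
    if not attributes or count / len(labels) >= min_purity:
        return majority
    # 2. Pick the best split: the attribute whose partition has lowest label entropy.
    def split_entropy(a):
        groups = {}
        for x, y in zip(data, labels):
            groups.setdefault(x[a], []).append(y)
        return sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    best = min(attributes, key=split_entropy)
    # 3. Split the data into D1, ..., Dk and recursively build a tree per subset.
    subsets = {}
    for x, y in zip(data, labels):
        dx, dy = subsets.setdefault(x[best], ([], []))
        dx.append(x)
        dy.append(y)
    rest = [a for a in attributes if a != best]
    return {best: {v: build_tree(dx, dy, rest) for v, (dx, dy) in subsets.items()}}
```

(Step 2, pruning, is omitted here; it is discussed with overfitting below.)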

Page 10

Most decision tree learning algorithms

1. Given dataset D:
   – return leaf(y) if all examples are in the same class y ... or nearly so...
   – pick the best split, on the best attribute a
     • a = c1 or a = c2 or ...
     • a < θ or a ≥ θ
     • a or not(a)
     • a in {c1, ..., ck} or not
   – split the data into D1, D2, ..., Dk and recursively build trees for each subset

2. "Prune" the tree

Popular splitting criterion: try to lower the entropy of the y labels on the resulting partition
• i.e., prefer splits that have very skewed distributions of labels

Page 11

Picking the Best Split

• Which attribute is the best?

Page 12

Sample Entropy

Page 13

Entropy

Entropy H(X) of a random variable X:

H(X) = - Σi P(X = i) log2 P(X = i)   (the sum runs over the possible values of X)

H(X) is the expected number of bits needed to encode a randomly drawn value of X (under the most efficient code).

Why? Information theory:
• The most efficient possible code assigns -log2 P(X = i) bits to encode the message X = i
• So, the expected number of bits to code one random X is the sum above
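A quick sketch of sample entropy, with the probabilities P(X = i) estimated from counts (the function name is mine):

```python
import math
from collections import Counter

def entropy(values):
    """H(X) = -sum_i p_i * log2(p_i), with p_i estimated from sample counts."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

print(entropy(["a", "b", "a", "b"]))  # 1.0: a fair coin costs one bit per draw
```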

Page 14

Entropy H(X) of a random variable X

Specific conditional entropy H(X|Y=v) of X given Y=v

Mutual information (aka Information Gain) of X and Y

Conditional entropy H(X|Y) of X given Y
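The formulas for these quantities did not survive extraction; the standard definitions for discrete X and Y are:

```latex
H(X \mid Y = v) = -\sum_i P(X = i \mid Y = v)\,\log_2 P(X = i \mid Y = v)

H(X \mid Y) = \sum_v P(Y = v)\, H(X \mid Y = v)

I(X; Y) = H(X) - H(X \mid Y)
```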

Page 15

Information Gain is the mutual information between input attribute A and target variable Y

Information Gain is the expected reduction in entropy of target variable Y for data sample S, due to sorting on variable A
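Computed from a sample, this is Gain(S, A) = H(Y) − H(Y | A). A sketch (names are illustrative):

```python
import math
from collections import Counter

def entropy(ys):
    n = len(ys)
    return -sum((c / n) * math.log2(c / n) for c in Counter(ys).values())

def information_gain(xs, ys):
    """Gain = H(Y) - H(Y | A): expected entropy drop from sorting sample S on A."""
    groups = {}
    for a, y in zip(xs, ys):
        groups.setdefault(a, []).append(y)  # partition the labels by A's value
    h_cond = sum(len(g) / len(ys) * entropy(g) for g in groups.values())
    return entropy(ys) - h_cond

# An attribute that perfectly predicts Y recovers all of Y's entropy:
print(information_gain(["sunny", "rain", "sunny", "rain"],
                       ["yes", "no", "yes", "no"]))  # 1.0
```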

Page 16

Page 17

Page 18

Page 19

Page 20

Overfitting

Consider a hypothesis h and its
• Error rate over training data: error_train(h)
• True error rate over all data: error_true(h)

We say h overfits the training data if error_true(h) > error_train(h)

Amount of overfitting = error_true(h) - error_train(h)

Page 21

Page 22

Page 23

Page 24

Split data into training and validation set

Create tree that classifies training set correctly

Page 25

Page 26

Page 27

Page 28

Decision Tree and Linear Classifier

• Decision trees don't (typically) improve over linear classifiers when you have lots of features

• They sometimes fail badly on problems that linear classifiers perform well on – here's an example

Page 29

Page 30

Another view of a decision tree

Page 31

Another view of a decision tree
(splits shown in the figure: Sepal_length < 5.7, Sepal_width > 2.8)

Page 32

Another view of a decision tree

Page 33

Another view of a decision tree

Page 34

Questions to think about (1)

• Consider target function f: <x1, x2> → y, where x1 and x2 are real-valued and y is boolean. What is the set of decision surfaces describable with decision trees that use each attribute at most once?

Page 35

Questions to think about (2)

• What is the relationship between learning decision trees and learning IF-THEN rules?

Page 36

Questions to think about (3)

One rule per leaf in the tree:

if O=sunny and H<=70 then PLAY
else if O=sunny and H>70 then DON'T_PLAY
else if O=overcast then PLAY
else if O=rain and windy then DON'T_PLAY
else if O=rain and !windy then PLAY

Simpler rule set:

if O=sunny and H>70 then DON'T_PLAY
else if O=rain and windy then DON'T_PLAY
else PLAY
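The simpler rule set can be written directly as early-return code (a sketch; the function name and argument encoding are mine):

```python
def play(outlook, humidity, windy):
    """The simpler rule set: two DON'T_PLAY rules, then a default of PLAY."""
    if outlook == "sunny" and humidity > 70:
        return "DON'T_PLAY"
    if outlook == "rain" and windy:
        return "DON'T_PLAY"
    return "PLAY"
```

On every input with outlook in {sunny, overcast, rain}, this agrees with the five-rule, one-rule-per-leaf version: the three PLAY rules collapse into the default.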

Page 37

Nearest Neighbor Learning

Page 38

k-nearest neighbor learning

Given a test example x:
1. Find the k training-set examples (x1, y1), ..., (xk, yk) that are closest to x.
2. Predict the most frequent label in that set.

Page 39

Breaking it down:

• To train:
  – save the data (...you might build some indices...)
  Very fast!

• To test:
  – For each test example x:
    1. Find the k training-set examples (x1, y1), ..., (xk, yk) that are closest to x.
    2. Predict the most frequent label in that set.
  Prediction is relatively slow (compared to a linear classifier or decision tree)
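The whole procedure fits in a few lines: training just saves the data, and testing does the distance computation. A sketch, assuming Euclidean distance as the metric (one common choice; others appear below):

```python
import math
from collections import Counter

def knn_predict(train, x, k=3):
    """train: list of (feature_tuple, label) pairs -- "training" is saving these.
    Testing: find the k examples closest to x (Euclidean) and majority-vote."""
    neighbors = sorted(train, key=lambda ex: math.dist(ex[0], x))[:k]
    return Counter(y for _, y in neighbors).most_common(1)[0][0]

train = [((0, 0), "-"), ((0, 1), "-"), ((1, 0), "-"), ((5, 5), "+"), ((5, 6), "+")]
print(knn_predict(train, (0.5, 0.5), k=3))  # -
```

Sorting the whole training set makes prediction O(n log n) per query, which is why indices (e.g., spatial trees) help at test time.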

Page 40

What is the decision boundary for 1-NN?

[Figure: positive and negative training points with the Voronoi diagram of the training set]

Each cell Ci is the set of all points that are closest to a particular example xi

Page 41

What is the decision boundary for 1-NN?

[Figure: the same points and Voronoi diagram]

Page 42

Effect of k on decision boundary

Page 43

Some common variants

• Distance metrics:
  – Euclidean distance: ||x1 - x2||
  – Cosine distance: 1 - <x1, x2> / (||x1|| · ||x2||)
    • this is in [0,1]

• Weighted nearest neighbor:
  – Instead of the most frequent y among the k nearest neighbors, predict a weighted vote over the neighbors' labels
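The slide's weighted-prediction formula did not survive extraction; one common choice weights each neighbor's vote by inverse distance, so closer neighbors count more (a sketch under that assumption):

```python
import math
from collections import defaultdict

def weighted_knn_predict(train, x, k=3, eps=1e-9):
    """Each of the k nearest neighbors votes with weight 1 / (distance + eps);
    predict the label with the largest total weight. eps avoids division by
    zero when x coincides with a training point."""
    neighbors = sorted(train, key=lambda ex: math.dist(ex[0], x))[:k]
    votes = defaultdict(float)
    for xi, yi in neighbors:
        votes[yi] += 1.0 / (math.dist(xi, x) + eps)
    return max(votes, key=votes.get)
```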

Page 44

You should know:

• Well-posed function approximation problems:
  – Instance space, X
  – Sample of labeled training data { <x(i), y(i)> }
  – Hypothesis space, H = { f : X → Y }
  – Learning is a search/optimization problem over H

• Decision tree learning
  – Greedy top-down learning of decision trees (ID3, C4.5, ...)
  – Overfitting and tree/rule post-pruning
  – Extensions...

• kNN classifier
  – Non-linear decision boundary
  – Low-cost training, high-cost prediction