35
Decision Trees 6601 Thursday, October 3, 13

Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

Decision Trees6601

Thursday, October 3, 13

Page 2: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

Classification

A1 A2 ... AN C

v11 v12 ... v1N c1

v21 v22 ... v2N c2

...

Thursday, October 3, 13

Page 3: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

Decision TreeDecision

Outcome

Classification

Thursday, October 3, 13

Page 4: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

Decision TreeH W O P

H S S Y/N?

N S O Y/N?

H W R Y/N?

Thursday, October 3, 13

Page 5: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

Decision TreeH W O P

H S S N

N S O Y/N?

H W R Y/N?

Thursday, October 3, 13

Page 6: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

Decision TreeH W O P

H S S N

N S O Y

H W R Y/N?

Thursday, October 3, 13

Page 7: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

Decision TreeH W O P

H S S N

N S O Y

H W R Y

Thursday, October 3, 13

Page 8: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

Continuos Attributes

x > th

... ...

yes

...

no

Thursday, October 3, 13

Page 9: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

Continues Attributes

x

th1

th2

y

Guillotine CutWhat is the tree?

Thursday, October 3, 13

Page 10: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

Continues Attributes

x

th1

th2

x > th1 yyes

y > th2

yes no

Thursday, October 3, 13

Page 11: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

Decision Tree

Thursday, October 3, 13

Page 12: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

Decision Tree

Which Tree is better if the classification accuracy is the same ?

Thursday, October 3, 13

Page 13: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

Decision Tree Learning

Thursday, October 3, 13

Page 14: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

Entropy

Claude Shannon

Measure Of Uncertainty / Unpredictability

in a Random Variable

Quantifies Information in a Message

2

Thursday, October 3, 13

Page 15: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

Entropy

12S

S S S

Thursday, October 3, 13

Page 16: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

Information Gain

Thursday, October 3, 13

Page 17: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

Calculate

Thursday, October 3, 13

Page 18: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

Result

Thursday, October 3, 13

Page 19: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

Decision Tree Learning

Thursday, October 3, 13

Page 20: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

Greedy Algorithm

• In search terms: A greedy algorithm with the Information Gain as a Heuristic

• Could we do better?

Thursday, October 3, 13

Page 21: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

Example@relation 'gatech_admission'@attribute 'recommendation' {'strong', 'weak'}@attribute 'gpa' real@attribute 'gre_math' real@attribute 'gre_verbal' real@attribute 'admitted' {'yes','no'}

@data'strong', 4, 800, 800, 'yes''weak', 3.4, 600, 500, 'yes''strong', 3.6, 800, 550, 'yes''strong', 3, 700, 650, 'yes' 'weak', 3.2, 800, 800, 'yes''strong', 4, 550, 500, 'yes''strong', 3.7, 700, 750, 'yes''weak', 4, 800, 200, 'yes''strong', 4, 200, 800, 'yes''strong', 3.4, 600, 500, 'yes''strong', 3.6, 800, 550, 'yes''weak', 3, 700, 650, 'yes' 'strong', 4, 550, 500, 'yes''strong', 3.7, 700, 750, 'yes''weak', 2.8, 800, 800, 'no''weak', 4, 200, 200, 'no'

'strong', 2, 500, 200, 'no''strong', 3.5, 200, 800, 'no''weak', 2, 800, 800, 'no''weak', 1.7, 100, 100, 'no''weak', 3.7, 50, 0, 'no''weak', 2.8, 100, 100, 'no''weak', 4, 200, 200, 'no''strong', 2, 100, 100, 'no''weak', 1.7, 100, 100, 'no''weak', 3.7, 50, 0, 'no''weak', 2.8, 100, 800, 'no''weak', 4, 200, 200, 'no''strong', 2, 500, 200, 'no''strong', 3.5, 200, 800, 'no'

'weak', 2, 800, 800, 'no''weak', 1.7, 100, 100, 'no''strong', 3.7, 50, 0, 'no''weak', 2.8, 100, 800, 'no''weak', 4, 200, 200, 'no''strong', 2, 500, 200, 'no''strong', 3.5, 200, 800, 'no''weak', 2, 800, 800, 'no''weak', 1.7, 100, 100, 'no''weak', 3.7, 50, 0, 'no'

What is the best?

Thursday, October 3, 13

Page 22: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

Weka Demo

Thursday, October 3, 13

Page 23: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

Use Case: Mobile Text Entry

Thursday, October 3, 13

Page 24: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

Rollon (E ,W)

Thursday, October 3, 13

Page 25: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

Rolloff (E ,W)

Thursday, October 3, 13

Page 26: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

Use Case: Fat Thumbs

Thursday, October 3, 13

Page 27: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

Use Case: Fat Thumbs

Thursday, October 3, 13

Page 28: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

What are the drawbacks?

Decision Trees are known to over fit the data

Thursday, October 3, 13

Page 29: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

Ensemble Learning

Vote

Mixture of Experts: Have we seen one before?

Thursday, October 3, 13

Page 30: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

Random ForestsWinning!

Thursday, October 3, 13

Page 31: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

Random Forests

• INPUT: Data Set of size N with M dimensions

• 1) SAMPLE n times from Data

• 2) SAMPLE m times from Attributes

• 3) LEARN TREE on sampled Data and Attributes

• REPEAT UNTIL k trees

Bagging: Bootstrap AGGregation

Thursday, October 3, 13

Page 32: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

Use Case: Kinect

Thursday, October 3, 13

Page 33: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

Use Case: Kinect

Thursday, October 3, 13

Page 34: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

Use Case: Kinect

Pixel to classify

Offset

Thursday, October 3, 13

Page 35: Decision Trees - Amazon S3 · 2016-02-08 · Decision Tree Learning Thursday, October 3, 13. Greedy Algorithm ... Decision Trees are known to over fit the data Thursday, October

Use Case: Kinect

Training 3 trees to depth 20 from 1 millionimages takes about a day on a 1000 core cluster

Thursday, October 3, 13