R for Classification, Jennifer Broughton, Shimadzu Research Laboratory, Manchester, UK, jennifer.broughton@srlab.co.uk, 2nd May 2013


R for Classification

Jennifer Broughton
Shimadzu Research Laboratory
Manchester, UK

jennifer.broughton@srlab.co.uk

2nd May 2013


Classification?

Automatic Identification of Type (Class) of Object from Measured Variables (Features)

Object Type   Feature 1   Feature 2   Feature 3   …   Feature n
Label 1       val[1,1]    val[1,2]    val[1,3]    …   val[1,n]
Label 2       val[2,1]    val[2,2]    val[2,3]    …   val[2,n]
…             …           …           …           …   …
Label m       val[m,1]    val[m,2]    val[m,3]    …   val[m,n]
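In R, a table of this shape is naturally held as a data frame: one row per object, a factor column for the class label, and one numeric column per feature. A minimal sketch, assuming the built-in iris data set as a stand-in (the slides do not say which data were used):

```r
# iris ships with R: 150 flowers, 4 numeric features, 1 class label (Species)
data(iris)

# One row per object; the label is a factor, the features are numeric
str(iris)

labels   <- iris$Species                       # class labels (object type)
features <- iris[, names(iris) != "Species"]   # feature values val[i, j]

nlevels(labels)   # number of classes
dim(features)     # m objects by n features
```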


Example Data


Data Preparation & Investigation

EDA Technique

Box Plots, PCA, Decision Trees, Clustering

Training Set

• Best features to distinguish between classes

• Relationships between features

• Feature reduction
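Two of these questions can be explored with base R alone. The sketch below assumes the iris data as a stand-in training set; the 0.9 correlation threshold and the choice of k-means are illustrative assumptions, not taken from the slides.

```r
data(iris)
features <- iris[, 1:4]

# Relationships between features: pairwise Pearson correlations
cor_matrix <- cor(features)
round(cor_matrix, 2)

# Feature reduction hint: strongly correlated pairs carry overlapping information
high <- which(abs(cor_matrix) > 0.9 & upper.tri(cor_matrix), arr.ind = TRUE)
data.frame(feature_a = rownames(cor_matrix)[high[, "row"]],
           feature_b = colnames(cor_matrix)[high[, "col"]])

# Clustering: do unsupervised groups line up with the known classes?
set.seed(1)
clusters <- kmeans(scale(features), centers = 3, nstart = 20)
table(cluster = clusters$cluster, class = iris$Species)
```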


Box Plots

PCA & Multivariate Analysis: ade4, FactoMineR
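The packages named above provide their own PCA routines; the sketch below uses only base R (boxplot and prcomp) so that it runs without extra installs. The iris data and the chosen feature are assumptions for illustration.

```r
data(iris)

# Box plot of one feature split by class: well-separated boxes suggest
# the feature helps to distinguish the classes
boxplot(Petal.Length ~ Species, data = iris,
        main = "Petal length by class", ylab = "Petal.Length")

# PCA on the standardised features
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
summary(pca)   # proportion of variance explained per component

# Scores on the first two components, coloured by class
plot(pca$x[, 1:2], col = as.integer(iris$Species), pch = 19,
     main = "PCA scores")
legend("topright", legend = levels(iris$Species),
       col = seq_along(levels(iris$Species)), pch = 19)
```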


Example Classifier


Classification Algorithms in R

Rattle: R Analytical Tool to Learn Easily (Rattle: A Data Mining GUI for R, Graham J Williams, The R Journal, 1(2):45-55)


SVM


Ensemble Algorithm


Training and Testing

Classification Algorithm:

• Neural Network
• Support Vector Machine
• Random Forest

Training Set (labelled)

Test Set (unlabelled)

Trained Classifier

Classification Results

Prediction Results + Labels

Assess Predictions:

• Confusion Matrix
• ROC Curve (2 categories)
• ….
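The flow above can be sketched end to end with a small neural network. This is a minimal illustration, assuming the iris data, a 70/30 split, and a hidden layer of 4 units; none of these choices come from the slides.

```r
library(nnet)   # ships with standard R distributions

data(iris)
set.seed(42)

# Hold out 30% of the rows as a test set
test_idx <- sample(nrow(iris), size = round(0.3 * nrow(iris)))
train <- iris[-test_idx, ]
test  <- iris[test_idx, ]

# Train on the labelled training set
classifier <- nnet(Species ~ ., data = train, size = 4, trace = FALSE)

# Predict classes from the test features only (labels withheld)
predicted <- predict(classifier, newdata = test[, 1:4], type = "class")

# Assess: compare predictions with the held-back labels
confusion <- table(predicted = factor(predicted, levels = levels(iris$Species)),
                   actual = test$Species)
confusion
accuracy <- sum(diag(confusion)) / sum(confusion)
accuracy
```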


Using Classifiers in R

Select Training Data

Build Classifier

Run Classifier

classifier <- algorithm(formula, data, options)

(boosting and nnet)

classifier.pred <- predict(classifier, newdata, options)
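The same build-then-predict pattern applies across packages. A sketch with two of the algorithms named earlier, assuming the e1071 and randomForest packages are installed from CRAN and using iris as stand-in data; the kernel and ntree values are illustrative.

```r
library(e1071)          # provides svm()
library(randomForest)   # provides randomForest()

data(iris)
set.seed(7)
test_idx <- sample(nrow(iris), 45)
train <- iris[-test_idx, ]
test  <- iris[test_idx, ]

# Build: classifier <- algorithm(formula, data, options)
svm_fit <- svm(Species ~ ., data = train, kernel = "radial")
rf_fit  <- randomForest(Species ~ ., data = train, ntree = 200)

# Run: classifier.pred <- predict(classifier, newdata, options)
svm_pred <- predict(svm_fit, newdata = test)
rf_pred  <- predict(rf_fit,  newdata = test)

# Fraction of test rows each classifier labels correctly
mean(svm_pred == test$Species)
mean(rf_pred  == test$Species)
```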


SVM & Neural Net Tuning
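One way to tune an SVM in R is the grid search in the e1071 package. The sketch below covers the SVM case only and assumes e1071 is installed; the parameter grid and the iris data are illustrative choices, not the values used in the talk.

```r
library(e1071)

data(iris)
set.seed(3)

# Grid search over gamma and cost, scored by cross-validation
# (10-fold is the default set by tune.control())
svm_tuned <- tune.svm(Species ~ ., data = iris,
                      gamma = 10^(-3:0), cost = 10^(0:2))

svm_tuned$best.parameters    # gamma and cost with the lowest error
svm_tuned$best.performance   # cross-validated classification error
best_svm <- svm_tuned$best.model
```

The model in best_svm can then be passed to predict() like any other fitted classifier.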


Classifier Feedback

print(classifier)

plot(classifier)

high Gini Coefficient = high dispersion
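Assuming the slide refers to the Gini impurity used by tree-based classifiers, the measure for a set of labels is 1 minus the sum of squared class proportions: 0 for a pure set, larger for a more mixed one. A small sketch (the function name gini_impurity is hypothetical):

```r
# Gini impurity of a vector of class labels: 1 - sum(p_k^2)
gini_impurity <- function(labels) {
  p <- table(labels) / length(labels)   # class proportions
  1 - sum(p^2)
}

gini_impurity(c("a", "a", "a", "a"))   # pure set: 0
gini_impurity(c("a", "a", "b", "b"))   # two classes, evenly mixed: 0.5
gini_impurity(iris$Species)            # three equal classes: 1 - 3 * (1/3)^2
```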


Classifier Prediction Results

predict(type = "class")

predict(type = "prob")

confusion matrix
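The two prediction types can be compared on a decision tree from rpart, whose predict method accepts both "class" and "prob". Other packages may name these options differently, so this is a sketch for rpart only, with iris as assumed data.

```r
library(rpart)   # ships with standard R distributions

data(iris)
set.seed(11)
test_idx <- sample(nrow(iris), 45)
train <- iris[-test_idx, ]
test  <- iris[test_idx, ]

tree <- rpart(Species ~ ., data = train, method = "class")

# type = "class": one predicted label per test row (a factor)
pred_class <- predict(tree, newdata = test, type = "class")
head(pred_class)

# type = "prob": a matrix with one probability column per class
pred_prob <- predict(tree, newdata = test, type = "prob")
head(pred_prob)

# Confusion matrix: rows are predictions, columns are true labels
table(predicted = pred_class, actual = test$Species)
```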


Binary Classification Results

                       Class Present?
                       Y                 N
Class Detected?   Y    True Positive     False Positive
                  N    False Negative    True Negative

$$\mathrm{TPR} = \frac{TP}{TP + FN} = \mathrm{Sensitivity}$$

$$\mathrm{FPR} = \frac{FP}{TN + FP} = 1 - \mathrm{Specificity}$$
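Both rates follow directly from the four cells of the confusion matrix. A worked sketch in base R, using made-up labels (hypothetical data, ten objects, four of which truly belong to the class):

```r
# Hypothetical true labels and detections for a two-class problem
actual    <- factor(c("Y", "Y", "Y", "Y", "N", "N", "N", "N", "N", "N"),
                    levels = c("Y", "N"))
predicted <- factor(c("Y", "Y", "Y", "N", "Y", "N", "N", "N", "N", "N"),
                    levels = c("Y", "N"))

# Rows: class detected? Columns: class present?
confusion <- table(detected = predicted, present = actual)
confusion

TP <- confusion["Y", "Y"]   # detected and present
FP <- confusion["Y", "N"]   # detected but absent
FN <- confusion["N", "Y"]   # missed
TN <- confusion["N", "N"]   # correctly rejected

TPR <- TP / (TP + FN)   # sensitivity: 3 / 4
FPR <- FP / (TN + FP)   # 1 - specificity: 1 / 6
```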


ROC Curves in R

ROCR package
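A minimal ROCR sketch, assuming the package is installed from CRAN. The two-class subset of iris and the rpart tree that supplies the scores are illustrative assumptions; any classifier that returns class probabilities could take their place.

```r
library(ROCR)    # assumed installed from CRAN
library(rpart)   # ships with standard R distributions

# Two-class problem derived from iris (an illustrative choice)
data(iris)
two_class <- droplevels(subset(iris, Species != "setosa"))
set.seed(5)
test_idx <- sample(nrow(two_class), 30)
train <- two_class[-test_idx, ]
test  <- two_class[test_idx, ]

# Score each test row with the predicted probability of "virginica"
tree  <- rpart(Species ~ ., data = train, method = "class")
score <- predict(tree, newdata = test, type = "prob")[, "virginica"]

# label.ordering lists the negative class first, then the positive class
pred <- prediction(score, test$Species,
                   label.ordering = c("versicolor", "virginica"))

# ROC curve: true positive rate against false positive rate
perf <- performance(pred, measure = "tpr", x.measure = "fpr")
plot(perf, main = "ROC curve")
abline(0, 1, lty = 2)   # reference line for a random classifier

# Area under the curve
auc <- performance(pred, measure = "auc")@y.values[[1]]
auc
```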


Example Results
