Predictive Analytics using Machine Learning
Praisan Padungweang, Ph.D.

Page 1

Predictive Analytics using Machine Learning

Praisan Padungweang, Ph.D.

Page 2

Model evaluation

2

Page 3

The Confusion Matrix

A confusion matrix shows the number of correct and incorrect decisions made by the model compared to the actual labels (targets) in the data.

For a problem involving n classes, it is an n × n matrix with the rows labeled with actual classes and the columns labeled with predicted classes.

[Matrix layouts: a 2 × 2 confusion matrix with actual classes (T, F) as rows and predicted classes (T, F) as columns, and a 3 × 3 confusion matrix with actual classes (a, b, c) as rows and predicted classes (a, b, c) as columns.]

3

Page 4

The Confusion Matrix

The relationship between classes can be depicted as a 2 × 2 confusion matrix:

◦ True Positive (TP): Correctly classified as the class of interest

◦ True Negative (TN): Correctly classified as not the class of interest

◦ False Positive (FP): Incorrectly classified as the class of interest

◦ False Negative (FN): Incorrectly classified as not the class of interest

                               Predicted T                              Predicted F
Actual T    True Positive (TP)                        False Negative (FN) (Type II error)
Actual F    False Positive (FP) (Type I error)        True Negative (TN)

4

Accuracy = (TP + TN) / (TP + FN + FP + TN)
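A minimal sketch (not part of the slides) of computing accuracy from the four confusion-matrix counts; the example counts mirror the small churn example on a later slide.

def accuracy(tp, fn, fp, tn):
    """Fraction of all decisions that were correct."""
    return (tp + tn) / (tp + fn + fp + tn)

# 1 true positive, 1 false negative, 1 false positive, 2 true negatives
print(accuracy(tp=1, fn=1, fp=1, tn=2))  # 0.6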

Page 5

Model evaluation

5

Confusion matrix

                        Predicted status
              P1       P2       P3      ...     Pk
Actual   A1   A1P1     A1P2     A1P3    ...     A1Pk
status   A2   A2P1     A2P2     A2P3    ...     A2Pk
         A3   A3P1     A3P2     A3P3    ...     A3Pk
         ...
         Ak   AkP1     AkP2     AkP3    ...     AkPk

(The diagonal cells AiPi count the correctly classified instances.)

Accuracy = (A1P1 + A2P2 + A3P3 + ... + AkPk) / n

Model evaluation for multiple classes
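A minimal sketch (not from the slides) of multi-class accuracy as the diagonal of a k × k confusion matrix divided by the total count; the matrix values below are hypothetical.

def multiclass_accuracy(cm):
    """cm[i][j] = number of instances of actual class i predicted as class j."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

cm = [[50, 3, 2],
      [4, 45, 1],
      [2, 3, 40]]
print(multiclass_accuracy(cm))  # 0.9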

Page 6

Model evaluation

6

    Customer   Churn   Predicted
1   John       Yes     0.72
2   Sophie     No      0.56
3   David      Yes     0.44
4   Emma       No      0.18
5   Bob        No      0.36

(Using a 0.5 cut-off on the predicted probability:)

                       Predicted status
                       churn        no churn
Actual    churn        1 (John)     1 (David)
status    no churn     1 (Sophie)   2 (Emma, Bob)

Accuracy = (TP + TN) / n = (1 + 2) / 5 = 0.6

Model evaluation for binary classes

Page 7

Model evaluation

7

Accuracy = (TP + TN) / N

Actual Class   Prob. of "1"        Actual Class   Prob. of "1"
1              0.996               1              0.506
1              0.988               0              0.471
1              0.984               0              0.337
1              0.980               1              0.218
1              0.948               0              0.199
1              0.889               0              0.149
1              0.848               0              0.048
0              0.762               0              0.038
1              0.707               0              0.025
1              0.681               0              0.022
1              0.656               0              0.016
0              0.622               0              0.004


[A blank 2 × 2 confusion matrix with actual status (1, 0) as rows and predicted status (1, 0) as columns, to be filled in from the table above.]

Page 8

Other Evaluation Metrics

There are other evaluation metrics that can be calculated from the confusion matrix:

◦ Sensitivity and specificity

◦ Precision and Recall

◦ F-measure

8

Sensitivity and specificity

Page 9

Other Evaluation Metrics

9

                               Predicted T                              Predicted F
Actual T    True Positive (TP)                        False Negative (FN) (Type II error)
Actual F    False Positive (FP) (Type I error)        True Negative (TN)

True positive rate, Sensitivity, Recall = TP / (TP + FN)

True negative rate, Specificity = TN / (FP + TN)

Positive predictive value, Precision = TP / (TP + FP)

Accuracy = (TP + TN) / (TP + FN + FP + TN)

F-score = (2 × Precision × Recall) / (Precision + Recall)
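A minimal sketch (not from the slides) of these metrics as plain functions of the confusion-matrix counts; the example counts are assumed.

def sensitivity(tp, fn):      # true positive rate / recall
    return tp / (tp + fn)

def specificity(tn, fp):      # true negative rate
    return tn / (fp + tn)

def precision(tp, fp):        # positive predictive value
    return tp / (tp + fp)

def f_score(prec, rec):       # harmonic mean of precision and recall
    return 2 * prec * rec / (prec + rec)

rec = sensitivity(84, 16)     # 0.84
spec = specificity(996, 4)    # 0.996
prec = precision(84, 4)       # ~0.955
print(rec, spec, prec, f_score(prec, rec))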

Page 10

10

For example, in a spam filtering problem:

◦ a sensitivity of 0.842 implies that 84.2 percent of spam messages were correctly classified.

◦ a specificity of 0.996 implies that 99.6 percent of non-spam messages were correctly classified, or alternatively, that 0.4 percent of valid messages were rejected as spam.

Rejecting 0.4 percent of valid email messages may be unacceptable.

                               Predicted T                              Predicted F
Actual T    True Positive (TP)                        False Negative (FN) (Type II error)
Actual F    False Positive (FP) (Type I error)        True Negative (TN)

True positive rate, Sensitivity = TP / (TP + FN)

True negative rate, Specificity = TN / (TN + FP)

Sensitivity and specificity

Page 11

11

                               Predicted T                              Predicted F
Actual T    True Positive (TP)                        False Negative (FN) (Type II error)
Actual F    False Positive (FP) (Type I error)        True Negative (TN)

True positive rate, Sensitivity, Recall = TP / (TP + FN)

Positive predictive value, Precision = TP / (TP + FP)

Precision: when a model predicts the positive class, how often is it correct?
◦ A precise model predicts the positive class only in cases very likely to be positive, so its positive predictions are trustworthy.

Recall: a model with high recall captures a large portion of the positive examples.
◦ For example, a search engine with high recall returns a large number of documents pertinent to the search query.

Having both high precision and high recall at the same time is very challenging.

Precision and recall

Page 12

Other Evaluation Metrics

F-measure

A measure of model performance that combines precision and recall into a single number is known as the F-measure (also sometimes called the F1 score or the F-score).

Since the F-measure reduces model performance to a single number, it provides a convenient way to compare several models side-by-side.

12

The F-measure: F1 = (2 × Precision × Recall) / (Precision + Recall)

Page 13

Problems with Unequal Costs and Benefits

Accuracy makes no distinction between false positive and false negative errors.

◦ It makes the tacit assumption that both errors are equally important.

◦ With real-world domains this is rarely the case.

These two errors are very different, should be counted separately, and should have different costs.

13

◦ Model → cancer, Actual → not cancer (false positive): the patient would be given further tests, which are expensive, inconvenient, and stressful.

◦ Model → not cancer, Actual → cancer (false negative): nothing is done.

Page 14

A Key Analytical Framework

The general form of an expected value calculation:

EV = p(o1)·v(o1) + p(o2)·v(o2) + ... = Σi p(oi)·v(oi)

◦ 𝑜𝑖 is a possible decision outcome;

◦ 𝑝(𝑜𝑖) is its probability

◦ 𝑣(𝑜𝑖) is its business value.

The probabilities often can be estimated from the data

The business values often need to be acquired from other sources:
◦ usually the values must come from external domain knowledge.

14
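A minimal sketch (not from the slides) of the expected value as the probability-weighted sum of outcome values; the probabilities and values below are hypothetical.

def expected_value(outcomes):
    """outcomes: list of (probability, business_value) pairs."""
    return sum(p * v for p, v in outcomes)

# A hypothetical offer: accepted with probability 0.075 (worth $99),
# ignored with probability 0.925 (worth -$1 in mailing cost).
print(expected_value([(0.075, 99), (0.925, -1)]))  # 6.5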

Page 15

Expected Value for Model Evaluation

In targeted marketing, for example, each consumer needs to be classified as a likely responder or a non-responder, so that we can target the likely responders.

Cost/profit
◦ A consumer buys the product for $200 and our product-related costs are $100.
◦ We mail some marketing materials, and the overall cost including postage is $1.

Yielding
◦ a value (profit) of $99 if the consumer responds (buys the product).
◦ a cost of $1, or equivalently a benefit of -$1, if the consumer does not respond.

15

Cost-Benefit matrix

              Predicted R    Predicted N
Actual R      99             0
Actual N      -1             0

Cost-Benefit matrices

Page 16

Expected Value for Model Evaluation: Targeted marketing

Model confusion matrix (counts):
              Predicted R    Predicted N
Actual R      150            150
Actual N      200            1,500
Accuracy = 82.5%

Rates (counts / 2,000):
              Predicted R    Predicted N
Actual R      0.075          0.075
Actual N      0.1            0.75

Cost-Benefit matrix:
              Predicted R    Predicted N
Actual R      99             0
Actual N      -1             0

Expected value = 0.075 × 99 + 0.075 × 0 + 0.1 × (-1) + 0.75 × 0 = 7.425 - 0.1 = 7.325
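A minimal sketch (not from the slides) of computing this expected value per consumer from a matrix of counts and a cost-benefit matrix with the same layout.

def expected_value(counts, benefits):
    """counts[i][j] and benefits[i][j] share the same actual/predicted layout."""
    n = sum(sum(row) for row in counts)
    return sum(counts[i][j] * benefits[i][j]
               for i in range(len(counts))
               for j in range(len(counts[i]))) / n

counts   = [[150, 150], [200, 1500]]    # targeted-marketing model above
benefits = [[99, 0], [-1, 0]]           # cost-benefit matrix above
print(expected_value(counts, benefits))  # 7.325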

Page 17

Expected Value for Model Evaluation: Targeted marketing (target nobody)

Model confusion matrix (counts):
              Predicted R    Predicted N
Actual R      0              300
Actual N      0              1,700
Accuracy = 85%

Rates (counts / 2,000):
              Predicted R    Predicted N
Actual R      0              0.15
Actual N      0              0.85

Cost-Benefit matrix:
              Predicted R    Predicted N
Actual R      99             0
Actual N      -1             0

Expected value = 0 × 99 + 0.15 × 0 + 0 × (-1) + 0.85 × 0 = 0

Despite the higher accuracy (85% versus 82.5%), the expected value drops from 7.325 to 0.

Page 18

Expected Value for Model Evaluation: Churn prediction

Model 1 confusion matrix (counts):
                 Predicted churn    Predicted not
Actual churn     100                50
Actual not       150                9,700
Accuracy = 98%, Expected value = -0.75

Model 2 confusion matrix (counts):
                 Predicted churn    Predicted not
Actual churn     0                  150
Actual not       0                  9,850
Accuracy = 98.5%, Expected value = -1.5

Cost-Benefit matrix:
                 Predicted churn    Predicted not
Actual churn     -10                -100
Actual not       -10                0

Model 2 has the higher accuracy but the worse expected value.

Page 19

Problems with Unbalanced Classes

Consider a domain where the classes appear in a 999:1 ratio.

◦ A simple rule—always choose the most prevalent class—gives 99.9% accuracy.

Skews of 1:100 are common in fraud detection.

In churn data the baseline churn rate is approximately 10% per month.
◦ If we simply classify everyone as negative, we could achieve an accuracy of 90%!

19

Page 20

Problems with Unbalanced Classes

20

Model 1:
                 Predicted churn    Predicted not
Actual churn     100                50
Actual not       150                9,700
Accuracy = 98%

Model 2:
                 Predicted churn    Predicted not
Actual churn     0                  150
Actual not       0                  9,850
Accuracy = 98.5%

Page 21

Other Machine Learning Models

21

Page 22

Decision trees

Page 23

Decision trees

Decision trees are recursive partitioning algorithms (RPAs) that produce a tree-like structure representing patterns in an underlying data set.

Example Decision Tree

23

Page 24

Decision trees

The top node is the root node.
◦ It specifies a testing condition whose outcome corresponds to a branch leading to an internal node.

The terminal nodes of the tree assign the classifications and are also referred to as the leaf nodes.

[Figure: an example tree with a parent node, child nodes, and leaf nodes; the leaves assign "Not Respond" or "Respond".]

24

Page 25

Decision trees

Many algorithms have been suggested for constructing decision trees.

Amongst the most popular are: C4.5, CART and CHAID.

These algorithms differ in their way of answering the key decisions to build a tree, which are:

Splitting decision:
◦ Which variable to split on, and at what value (e.g., age < 30 or not; income < 1,000 or not; marital status = married or not)?

Stopping decision: ◦ When to stop growing a tree?

Assignment decision: ◦ What class (e.g., good or bad customer) to assign to a leaf node?

25

Page 26

Decision trees - Splitting decision

Uses the concept of impurity.

Consider three nodes containing good (unfilled circles) and bad (filled circles) customers

◦ Minimal impurity occurs when all customers are either good or bad.

◦ Maximal impurity occurs when one has the same number of good and bad customers

[Figure: three example nodes labelled Feature X1, Feature X2, Feature X3.]

26

Page 27

Decision trees - Splitting decision

Decision trees will now aim at minimizing the impurity in the data.

The most popular measurements are:
◦ Entropy: E(S) = -pG·log2(pG) - pB·log2(pB) (C4.5)
◦ Gini: Gini(S) = 2·pG·pB (CART)

with pG and pB being the proportions of class G (good) and class B (bad), respectively.

27
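A minimal sketch (not from the slides) of these two impurity measures for a binary node, given the proportion of good cases p_g.

import math

def entropy(p_g):
    p_b = 1.0 - p_g
    if p_g == 0.0 or p_b == 0.0:   # a pure node has zero impurity
        return 0.0
    return -p_g * math.log2(p_g) - p_b * math.log2(p_b)

def gini(p_g):
    return 2.0 * p_g * (1.0 - p_g)

print(entropy(0.5), gini(0.5))  # maximal impurity: 1.0 and 0.5
print(entropy(1.0), gini(1.0))  # minimal impurity: 0.0 and 0.0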

Page 28

Decision trees

Stopping criterion

The tree can learn to fit the specificities or noise in the data, which is also referred to as overfitting.

The data should be split into a training sample and a validation sample.
◦ The training sample is used to make the splitting decisions.
◦ The validation sample is an independent sample used to monitor the misclassification error.

28

Page 29

Stopping criteria: Spark parameters

o maxDepth
  o Maximum depth of a tree.
  o Deeper trees are more expressive (potentially allowing higher accuracy), but they are also more costly to train and are more likely to overfit.

o minInstancesPerNode
  o For a node to be split further, each of its children must receive at least this number of training instances.

o minInfoGain
  o For a node to be split further, the split must improve at least this much (in terms of information gain).

29
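A minimal sketch of setting these stopping parameters on a Spark ML decision tree (pyspark.ml); the column names and the training DataFrame are assumptions.

from pyspark.ml.classification import DecisionTreeClassifier

dt = DecisionTreeClassifier(
    featuresCol="features",
    labelCol="label",
    maxDepth=5,              # stop growing beyond depth 5
    minInstancesPerNode=20,  # each child of a split must receive >= 20 training instances
    minInfoGain=0.01,        # a split must gain at least 0.01 information
)
# model = dt.fit(training_df)   # training_df is an assumed DataFrame of labeled rows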

Page 30

Decision trees - Assignment decision

The class assigned to a leaf node is typically the majority class within that leaf.

30

[Figure: example leaves labelled Bad and Good.]

Page 31

Decision trees

Decision trees essentially model decision boundaries orthogonal to the axes.

Decision Boundary of a Decision Tree

31

Page 32

Decision trees

Decision trees can be used for various purposes in analytics.

Input selection
◦ Attributes that occur at the top of the tree are more predictive of the target.

Initial segmentation
◦ Build a tree two or three levels deep as the segmentation scheme.
◦ Then use second-stage machine learning models for further refinement.

Final analytical model used directly in production
◦ It gives a white-box model with a clear explanation of how it reaches its classifications.

32

Page 33

Model decision boundaries

33

Decision trees

Logistic regression

Page 34

Neural Networks

Page 35

Neural networks

[Figure: a single neuron with weights w0, w1, w2 and an activation function f(·).]

A mathematical representation inspired by the functioning of the human brain.

Another more realistic perspective sees neural networks as generalizations of existing machine learning models.

35

Page 36

Neural networks

Neural networks vs. linear regression

[Figure: a neuron with a bias input x0 and feature inputs producing output f(z), alongside a plot of y versus x showing the fitted regression line.]

f(z) = z
z = θ0 + θ1·Age + θ2·Income

36

Page 37

Neural networks

Neural networks vs. logistic regression

[Figure: a neuron with a bias input x0 and feature inputs producing output f(z).]

f(z) = 1 / (1 + e^(-z))
z = θ0 + θ1·Age + θ2·Income

37

Page 38

Neural networks

Single Layer Perceptron (bias input X0 = 1)

[Figure: a single-layer perceptron with bias input X0 = 1 and weights w0, w1, w2, w3.]

Customer   Age (x1)   Income (x2)   Gender (x3)   Response (y)
John       30         1,500         M             No    0
Sarah      31         800           F             Yes   1
Sophie     52         1,800         F             Yes   1
David      48         2,000         M             No    1
Peter      34         1,800         M             Yes   0

Weights
w0 (bias, intercept) = 1.64252
w1 (Age)             = 77.09677288
w2 (Income)          = -1.69512
w3 (Gender)          = -2.99575
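A minimal sketch (not from the slides) of the forward pass of a single-layer perceptron, z = w0 + w1·x1 + w2·x2 + w3·x3 followed by a sigmoid output; the weights and the 0/1 gender encoding below are hypothetical.

import math

def perceptron(x, w, w0):
    z = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))          # output in (0, 1)

weights = [0.08, -0.002, 1.2]                  # hypothetical weights for age, income, gender
print(perceptron([30, 1500, 1], weights, w0=1.6))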

Page 39

Neural networks

Multi-Layer Perceptron (MLP)

Layer 1 Layer 2 Layer 3

Input Layer Hidden Layer Output Layer

39

Page 40

Neural networks

Each node has a transformation function f(·), also called an activation function. The most popular activation functions are:

◦ Linear, ranging between -∞ and +∞: f(z) = z

◦ Sigmoid (logistic), ranging between 0 and 1: f(z) = 1 / (1 + e^(-z))

◦ Hyperbolic tangent, ranging between -1 and +1: f(z) = (e^z - e^(-z)) / (e^z + e^(-z))

40
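A minimal sketch (not from the slides) of the three activation functions.

import math

def linear(z):
    return z                                   # range (-inf, +inf)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))          # range (0, 1)

def tanh(z):
    return (math.exp(z) - math.exp(-z)) / (math.exp(z) + math.exp(-z))   # range (-1, 1)

print(linear(0.5), sigmoid(0.5), tanh(0.5))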

Page 41

Selecting the activation function

Hidden layer → logistic, hyperbolic tangent, or linear

Output layer
◦ For classification (e.g., churn, response, fraud), it is common practice to adopt a logistic transformation in the output layer, since the outputs can then be interpreted as probabilities.
◦ For regression targets, use a linear activation; a logistic or hyperbolic tangent activation can also be used for a normalized target.

41

Input Layer Hidden Layer Output Layer

Page 42

Model Comparison

Held-out test data: the data is divided into a training set and a test set.

◦ The training set is used for model creation (training and validation).

◦ The test set is held out for model selection.

42

[Figure: the training set feeds model creation; the resulting models are scored on the test set, and the model with the best test performance is selected.]
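A minimal sketch of a held-out split and test-set evaluation with Spark ML; the DataFrame df with "features" and "label" columns, and the dt classifier from the earlier sketch, are assumptions.

from pyspark.ml.evaluation import MulticlassClassificationEvaluator

train_df, test_df = df.randomSplit([0.7, 0.3], seed=42)   # 70/30 held-out split

model = dt.fit(train_df)                   # train on the training set only
predictions = model.transform(test_df)     # score the held-out test set

evaluator = MulticlassClassificationEvaluator(labelCol="label",
                                              predictionCol="prediction",
                                              metricName="accuracy")
print(evaluator.evaluate(predictions))     # test-set accuracy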

Page 43

Model Comparison

Cross-validation for model comparison
◦ K-fold cross-validation

43
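A minimal sketch of k-fold cross-validation with Spark ML; the estimator dt, the grid values, and train_df are assumptions carried over from the earlier sketches.

from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

grid = (ParamGridBuilder()
        .addGrid(dt.maxDepth, [3, 5, 7])   # candidate settings to compare
        .build())

cv = CrossValidator(estimator=dt,
                    estimatorParamMaps=grid,
                    evaluator=MulticlassClassificationEvaluator(metricName="f1"),
                    numFolds=5)            # 5-fold cross-validation

cv_model = cv.fit(train_df)                # evaluates each candidate across the folds
                                           # and refits the best one on all of train_df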

Page 44

Demo

Data preprocessing

Model training

Model evaluation

Model deployment

44

Hands-on machine learning using Spark, in class.