31
2019 Storage Developer Conference. © Dell EMC. All Rights Reserved. 1 New Perspective on Machine Learning Predictions Under Uncertainty Rahul Vishwakarma, Jayanth Reddy

New Perspective on Machine Learning Predictions Under ... · New Perspective on Machine Learning Predictions Under Uncertainty ... Algorithmic Randomness Algorithmic information theory

  • Upload
    others

  • View
    18

  • Download
    0

Embed Size (px)

Citation preview

Page 1: New Perspective on Machine Learning Predictions Under ... · New Perspective on Machine Learning Predictions Under Uncertainty ... Algorithmic Randomness Algorithmic information theory

2019 Storage Developer Conference. © Dell EMC. All Rights Reserved. 1

New Perspective on Machine Learning Predictions Under Uncertainty

Rahul Vishwakarma, Jayanth Reddy

Page 2: New Perspective on Machine Learning Predictions Under ... · New Perspective on Machine Learning Predictions Under Uncertainty ... Algorithmic Randomness Algorithmic information theory

2019 Storage Developer Conference. © Dell EMC. All Rights Reserved. 2

Agenda

Understanding Trustworthiness of Prediction

Quantifying Uncertainty

Application in risk-sensitive system

Why should we care?

Page 3: New Perspective on Machine Learning Predictions Under ... · New Perspective on Machine Learning Predictions Under Uncertainty ... Algorithmic Randomness Algorithmic information theory

2019 Storage Developer Conference. © Dell EMC. All Rights Reserved. 3

Classical ML Approach

Classification (Binary) Input: 𝑥𝑥1,𝑦𝑦1 , 𝑥𝑥2,𝑦𝑦2 , 𝑥𝑥𝑛𝑛,𝑦𝑦𝑛𝑛 . . . 𝑥𝑥𝑛𝑛+1,𝑦𝑦𝑛𝑛+1 = ? Task: Predict label of 𝑦𝑦𝑛𝑛+1 new data point Prediction model : 𝑓𝑓 𝑥𝑥𝑖𝑖 → 𝑦𝑦𝑖𝑖

Model performance 𝑓𝑓 𝑥𝑥𝑖𝑖 → 𝑦𝑦𝑖𝑖 accuracy is 98% for 𝑥𝑥1,𝑦𝑦1 , 𝑥𝑥2,𝑦𝑦2 , 𝑥𝑥𝑛𝑛,𝑦𝑦𝑛𝑛

Can we trust the prediction accuracy of 𝑦𝑦𝑛𝑛+1 is also 98%Are we really confident about the specific prediction

Page 4: New Perspective on Machine Learning Predictions Under ... · New Perspective on Machine Learning Predictions Under Uncertainty ... Algorithmic Randomness Algorithmic information theory

2019 Storage Developer Conference. © Dell EMC. All Rights Reserved. 4

Motive

Prediction models only output bare predictions but not the confidence in those predictions

Obtain a metrics which explains: Confidence of prediction for new label Informativeness of each new data points

How do we quantify prediction uncertainty

How do we know when to trust our results?

Page 5: New Perspective on Machine Learning Predictions Under ... · New Perspective on Machine Learning Predictions Under Uncertainty ... Algorithmic Randomness Algorithmic information theory

2019 Storage Developer Conference. © Dell EMC. All Rights Reserved. 5

Sources of Uncertainty in ML

Observation regularizationStatistical

Model Induction

approximate inference

Unobservable Inferred State

Inverse Uncertainty Quantification Problem

Classical Machine Leaning Problem

(A) (B) (C) (D) (E)

measurement errors

regularizationeffects

Model Uncertainty

inferenceerrors

PredictionUncertainty∪ ∪ ∪ ≅

Figure 1: Sources of uncertainty in machine learning

Page 6: New Perspective on Machine Learning Predictions Under ... · New Perspective on Machine Learning Predictions Under Uncertainty ... Algorithmic Randomness Algorithmic information theory

2019 Storage Developer Conference. © Dell EMC. All Rights Reserved. 6

Uncertainty Estimation

Algorithmic Randomness

Algorithmic information theory provides universal measures of confidence but these are, unfortunately, non-computable

Obtain practicable approximations to universal measures of confidence

Page 7: New Perspective on Machine Learning Predictions Under ... · New Perspective on Machine Learning Predictions Under Uncertainty ... Algorithmic Randomness Algorithmic information theory

2019 Storage Developer Conference. © Dell EMC. All Rights Reserved. 7

Conformal Prediction

Page 8: New Perspective on Machine Learning Predictions Under ... · New Perspective on Machine Learning Predictions Under Uncertainty ... Algorithmic Randomness Algorithmic information theory

2019 Storage Developer Conference. © Dell EMC. All Rights Reserved. 8

Learning Framework

Generalizable for any machine leaning algorithm

Framework Algorithmic randomness1

problem of assigning confidences to predictions is closely connected to the problem of defining random sequences

Hypothesis testing

1Algorithmic Learning in a Random World by Vovk, Gammerman and Shafer. Springer, 2005.

Page 9: New Perspective on Machine Learning Predictions Under ... · New Perspective on Machine Learning Predictions Under Uncertainty ... Algorithmic Randomness Algorithmic information theory

2019 Storage Developer Conference. © Dell EMC. All Rights Reserved. 9

Assumption Exchangeability

Figure 2: Exchangeable data Figure 3: Non Exchangeable data

𝑃𝑃 𝑧𝑧1, 𝑧𝑧2, … ∈ 𝑍𝑍∞ ∶ 𝑧𝑧1, … , 𝑧𝑧𝑛𝑛 ∈ 𝐸𝐸 = 𝑃𝑃{ 𝑧𝑧1, 𝑧𝑧2, … ∈ 𝑍𝑍∞ ∶ 𝑧𝑧π(1), … , 𝑧𝑧π(𝑛𝑛) ∈ 𝐸𝐸}

Page 10: New Perspective on Machine Learning Predictions Under ... · New Perspective on Machine Learning Predictions Under Uncertainty ... Algorithmic Randomness Algorithmic information theory

2019 Storage Developer Conference. © Dell EMC. All Rights Reserved. 10

Conformal Prediction

nonconformity p-value Prediction set Inference

α𝑖𝑖𝑦𝑦 =

∑𝑗𝑗=1𝑘𝑘 𝐷𝐷𝑖𝑖𝑗𝑗𝑦𝑦

∑𝑗𝑗=1𝑘𝑘 𝐷𝐷𝑖𝑖𝑗𝑗

−𝑦𝑦 𝑝𝑝(α𝑛𝑛+1𝑦𝑦𝑝𝑝 ) =

#{𝑖𝑖:α𝑖𝑖𝑦𝑦𝑝𝑝 ≥ α𝑛𝑛+1

𝑦𝑦𝑝𝑝 }𝑛𝑛 + 1

Γ1−ε = {𝑦𝑦𝑖𝑖:𝑃𝑃𝑦𝑦𝑖𝑖 > ε,𝑦𝑦𝑖𝑖∈𝑌𝑌 } 𝐶𝐶𝐶𝐶𝑛𝑛𝑓𝑓𝑖𝑖𝐶𝐶𝐶𝐶𝑛𝑛𝐶𝐶𝐶𝐶: 1 − 𝑝𝑝𝑚𝑚𝑚𝑚𝑚𝑚2𝑛𝑛𝑛𝑛

𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝑖𝑖𝐶𝐶𝑖𝑖𝐶𝐶𝑖𝑖𝐶𝐶𝑦𝑦: 𝑝𝑝𝑚𝑚𝑚𝑚𝑚𝑚

Method

Define nonconformity score for k-NN classifier

Method

Calculate p-value for each of the hypothesis

Method

Prediction set whose p-value satisfies the confidence level

Method

Explainable reliability of each new prediction

Page 11: New Perspective on Machine Learning Predictions Under ... · New Perspective on Machine Learning Predictions Under Uncertainty ... Algorithmic Randomness Algorithmic information theory

2019 Storage Developer Conference. © Dell EMC. All Rights Reserved. 11

Problem statement

P9JB1H4W (B)

K4KT612L (A)

ZEEUIQ4K (C)

YVKW267K (D)

LM7RQWD9 (Y)

3

5

8

4

9

64

7

α𝑖𝑖𝑦𝑦 =

∑𝑗𝑗=1𝑘𝑘 𝐷𝐷𝑖𝑖𝑗𝑗𝑦𝑦

∑𝑗𝑗=1𝑘𝑘 𝐷𝐷𝑖𝑖𝑗𝑗−𝑦𝑦

Predict the label (Y) of new data point

• Given • Disk Drives dataset

• Predict • Label of new Disk • Assign Confidence and Credibility

• Model• k-Nearest Neighbor (k =1)

Page 12: New Perspective on Machine Learning Predictions Under ... · New Perspective on Machine Learning Predictions Under Uncertainty ... Algorithmic Randomness Algorithmic information theory

2019 Storage Developer Conference. © Dell EMC. All Rights Reserved. 12

Nonconformity score

P9JB1H4W (B)

K4KT612L (A)

ZEEUIQ4K (C)

YVKW267K (D)

LM7RQWD9 (Y)

3

5

8

4

9

64

7

α𝐴𝐴 =𝐶𝐶|𝐴𝐴, 𝑌𝑌|

𝐶𝐶|𝐴𝐴, 𝐶𝐶|=

38

α𝐵𝐵 =𝐶𝐶|𝐵𝐵, 𝑌𝑌|

𝐶𝐶|𝐵𝐵, 𝐷𝐷|=

49

α𝐶𝐶 =𝐶𝐶|𝐶𝐶, 𝐷𝐷|

𝐶𝐶|𝐶𝐶, 𝑌𝑌|=

74

α𝐷𝐷 =𝐶𝐶|𝐷𝐷, 𝐶𝐶|

𝐶𝐶|𝐷𝐷, 𝑌𝑌|=

76

α𝑌𝑌 =𝐶𝐶|𝑌𝑌, 𝐴𝐴|

𝐶𝐶|𝑌𝑌, 𝐶𝐶|=

34

Let Y =

Assuming the predicted label is a Failed disk: Calculate non conformity score

α𝑖𝑖𝑦𝑦 =

∑𝑗𝑗=1𝑘𝑘 𝐷𝐷𝑖𝑖𝑗𝑗𝑦𝑦

∑𝑗𝑗=1𝑘𝑘 𝐷𝐷𝑖𝑖𝑗𝑗

−𝑦𝑦

Page 13: New Perspective on Machine Learning Predictions Under ... · New Perspective on Machine Learning Predictions Under Uncertainty ... Algorithmic Randomness Algorithmic information theory

2019 Storage Developer Conference. © Dell EMC. All Rights Reserved. 13

Nonconformity score

P9JB1H4W (B)

K4KT612L (A)

ZEEUIQ4K (C)

YVKW267K (D)

LM7RQWD9 (Y)

3

5

8

4

9

64

7

α𝐴𝐴 =𝐶𝐶|𝐴𝐴, 𝐵𝐵|

𝐶𝐶|𝐴𝐴, 𝑌𝑌|=

53

α𝐵𝐵 =𝐶𝐶|𝐵𝐵, 𝐴𝐴|

𝐶𝐶|𝐵𝐵, 𝑌𝑌|=

54

α𝐶𝐶 =𝐶𝐶|𝐶𝐶, 𝑌𝑌|

𝐶𝐶|𝐶𝐶, 𝐴𝐴|=

48

α𝐷𝐷 =𝐶𝐶|𝐷𝐷, 𝑌𝑌|

𝐶𝐶|𝐷𝐷, 𝐵𝐵|=

69

α𝑌𝑌 =𝐶𝐶|𝑌𝑌, 𝐶𝐶|

𝐶𝐶|𝑌𝑌, 𝐴𝐴|=

43

Let Y =

Assuming the predicted label is a Normal disk: Calculate non conformity score

α𝑖𝑖𝑦𝑦 =

∑𝑗𝑗=1𝑘𝑘 𝐷𝐷𝑖𝑖𝑗𝑗𝑦𝑦

∑𝑗𝑗=1𝑘𝑘 𝐷𝐷𝑖𝑖𝑗𝑗

−𝑦𝑦

Page 14: New Perspective on Machine Learning Predictions Under ... · New Perspective on Machine Learning Predictions Under Uncertainty ... Algorithmic Randomness Algorithmic information theory

2019 Storage Developer Conference. © Dell EMC. All Rights Reserved. 14

p-value and Prediction

𝑝𝑝(α𝑛𝑛+1𝑦𝑦𝑝𝑝 ) =

#{𝑖𝑖:α𝑖𝑖𝑦𝑦𝑝𝑝 ≥ α𝑛𝑛+1

𝑦𝑦𝑝𝑝 }𝑛𝑛 + 1

α𝐴𝐴 =38

= 0.37

α𝐵𝐵 =49

= 0.44

α𝐶𝐶 =𝟕𝟕𝟒𝟒

= 𝟏𝟏.𝟕𝟕𝟕𝟕

α𝐷𝐷 =𝟕𝟕𝟔𝟔

= 𝟏𝟏.𝟏𝟏𝟔𝟔

α𝑌𝑌 =𝟑𝟑𝟒𝟒

= 𝟎𝟎.𝟕𝟕𝟕𝟕

α𝐴𝐴 =𝟕𝟕𝟑𝟑

= 𝟏𝟏.𝟔𝟔𝟔𝟔

α𝐵𝐵 =54

= 1.25

α𝐶𝐶 =48

= 0.5

α𝐷𝐷 =69

= 0.66

α𝑌𝑌 =𝟒𝟒𝟑𝟑

= 𝟏𝟏.𝟑𝟑𝟑𝟑

Let Y =Let Y =

𝑝𝑝 = 35

= 0.6 𝑝𝑝 = 25

= 0.4

𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝑖𝑖𝐶𝐶𝑖𝑖𝐶𝐶𝑖𝑖𝐶𝐶𝑦𝑦 = 𝑝𝑝𝑚𝑚𝑚𝑚𝑚𝑚𝐶𝐶𝐶𝐶𝑛𝑛𝑓𝑓𝑖𝑖𝐶𝐶𝐶𝐶𝑛𝑛𝐶𝐶𝐶𝐶 = 1 − 𝑝𝑝𝑚𝑚𝑚𝑚𝑚𝑚2𝑛𝑛𝑛𝑛

LM7RQWD9 (Y)

=LM7RQWD9 (Y)

𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝑖𝑖𝐶𝐶𝑖𝑖𝐶𝐶𝑖𝑖𝐶𝐶𝑦𝑦 = 0.6

𝐶𝐶𝐶𝐶𝑛𝑛𝑓𝑓𝑖𝑖𝐶𝐶𝐶𝐶𝑛𝑛𝐶𝐶𝐶𝐶 = 0.6

Higher non-conformity : Lower p-value

Page 15: New Perspective on Machine Learning Predictions Under ... · New Perspective on Machine Learning Predictions Under Uncertainty ... Algorithmic Randomness Algorithmic information theory

2019 Storage Developer Conference. © Dell EMC. All Rights Reserved. 15

Variants of Conformal Prediction

Inductive Conformal Prediction Computationally efficient

Aggregated Conformal Prediction Cross Conformal Prediction

Page 16: New Perspective on Machine Learning Predictions Under ... · New Perspective on Machine Learning Predictions Under Uncertainty ... Algorithmic Randomness Algorithmic information theory

2019 Storage Developer Conference. © Dell EMC. All Rights Reserved. 16

Applications

Page 17: New Perspective on Machine Learning Predictions Under ... · New Perspective on Machine Learning Predictions Under Uncertainty ... Algorithmic Randomness Algorithmic information theory

2019 Storage Developer Conference. © Dell EMC. All Rights Reserved. 17

Applications

Hardware failure prediction Disk drive

Anomaly detection Time Series

Feature selection Prediction quality assessment

Page 18: New Perspective on Machine Learning Predictions Under ... · New Perspective on Machine Learning Predictions Under Uncertainty ... Algorithmic Randomness Algorithmic information theory

2019 Storage Developer Conference. © Dell EMC. All Rights Reserved. 18

Disk Drive Failure Prediction

Dataset: Black Blaze Q2_2019 Classification using Conformal Framework

K- Nearest Neighbors Want to try out Conformal Classification package?https://pypi.org/project/ConfClr/ (pip install ConfClr)

Interpretation of results Confidence Credibility

Application Ability to Rank the Labels

https://f001.backblazeb2.com/file/Backblaze-Hard-Drive-Data/data_Q2_2019.zip

Page 19: New Perspective on Machine Learning Predictions Under ... · New Perspective on Machine Learning Predictions Under Uncertainty ... Algorithmic Randomness Algorithmic information theory

2019 Storage Developer Conference. © Dell EMC. All Rights Reserved. 19

Dataset

https://f001.backblazeb2.com/file/Backblaze-Hard-Drive-Data/data_Q2_2019.zip

Page 20: New Perspective on Machine Learning Predictions Under ... · New Perspective on Machine Learning Predictions Under Uncertainty ... Algorithmic Randomness Algorithmic information theory

2019 Storage Developer Conference. © Dell EMC. All Rights Reserved. 20

Package ConfClr

Page 21: New Perspective on Machine Learning Predictions Under ... · New Perspective on Machine Learning Predictions Under Uncertainty ... Algorithmic Randomness Algorithmic information theory

2019 Storage Developer Conference. © Dell EMC. All Rights Reserved. 21

Prediction with Explainable Reliability

• 𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝑖𝑖𝐶𝐶𝑖𝑖𝐶𝐶𝑖𝑖𝐶𝐶𝑦𝑦 = 𝑝𝑝𝑚𝑚𝑚𝑚𝑚𝑚

• 𝐶𝐶𝐶𝐶𝑛𝑛𝑓𝑓𝑖𝑖𝐶𝐶𝐶𝐶𝑛𝑛𝐶𝐶𝐶𝐶 = 1 − 𝑝𝑝𝑚𝑚𝑚𝑚𝑚𝑚2𝑛𝑛𝑛𝑛

• Confidence level of 95% means that in 95% the predicted class is also a true class

Page 22: New Perspective on Machine Learning Predictions Under ... · New Perspective on Machine Learning Predictions Under Uncertainty ... Algorithmic Randomness Algorithmic information theory

2019 Storage Developer Conference. © Dell EMC. All Rights Reserved. 22

Interpretation

Figure 4: Interpreting quality of prediction

Page 23: New Perspective on Machine Learning Predictions Under ... · New Perspective on Machine Learning Predictions Under Uncertainty ... Algorithmic Randomness Algorithmic information theory

2019 Storage Developer Conference. © Dell EMC. All Rights Reserved. 23

Framework Application

Disk Serial Number Status Confidence

Z305B2QN FAILED 0.950820

ZA16NQJR FAILED 0.918003

ZJV1CSVX FAILED 0.868852

ZA18CEBF FAILED 0.737705

ZJV02XWA FAILED 0.672131

ZJV0XJQ0 FAILED 0.426230

ZJV02XWV FAILED 0.114754

PL2331LAG9TEEJ FAILED 0.098361

• Priority based insightful action

• Efficient Inventory management

Figure 4: Ranking the failed drive (Confidence)

Page 24: New Perspective on Machine Learning Predictions Under ... · New Perspective on Machine Learning Predictions Under Uncertainty ... Algorithmic Randomness Algorithmic information theory

2019 Storage Developer Conference. © Dell EMC. All Rights Reserved. 24

Sample Interface - Application

Page 25: New Perspective on Machine Learning Predictions Under ... · New Perspective on Machine Learning Predictions Under Uncertainty ... Algorithmic Randomness Algorithmic information theory

2019 Storage Developer Conference. © Dell EMC. All Rights Reserved. 25

Conclusion and Discussion

Page 26: New Perspective on Machine Learning Predictions Under ... · New Perspective on Machine Learning Predictions Under Uncertainty ... Algorithmic Randomness Algorithmic information theory

2019 Storage Developer Conference. © Dell EMC. All Rights Reserved. 26

Challenges

Designing nonconformity score Application in Time Series

Page 27: New Perspective on Machine Learning Predictions Under ... · New Perspective on Machine Learning Predictions Under Uncertainty ... Algorithmic Randomness Algorithmic information theory

2019 Storage Developer Conference. © Dell EMC. All Rights Reserved. 27

Summary

Standard prediction models are based on bare predictions Classification: discrete classes Regression: real-valued point prediction

Model with high accuracy does not guarantee reliable forecasting

Conformal prediction can be used with any machine learning Conformal Framework

Confidence of each single prediction Credibility tells about the quality of data point

Page 28: New Perspective on Machine Learning Predictions Under ... · New Perspective on Machine Learning Predictions Under Uncertainty ... Algorithmic Randomness Algorithmic information theory

2019 Storage Developer Conference. © Dell EMC. All Rights Reserved. 28

Why should we care?

It matters how much we can rely upon a given prediction in risk-sensitive applications.

Seven scientific experts in Italy were convicted of manslaughter and sentenced to six years in prison for failing to give warning before the April 2009 earthquake that killed 309 people.

Diacu F. Is failure to predict a crime? October 2012. New York Times; October 2012. https://www.nytimes.com/2012/10/27/opinion/a-failed-earthquake-prediction-a-crime.html

• Drug design • Insurance• Investment

• Development of new drugs• Precision medicine & Genomics

Page 29: New Perspective on Machine Learning Predictions Under ... · New Perspective on Machine Learning Predictions Under Uncertainty ... Algorithmic Randomness Algorithmic information theory

2019 Storage Developer Conference. © Dell EMC. All Rights Reserved. 29

References

V. Vovk, A. Gammerman, and G. Shafer, Algorithmic learning in a random world. Springer, 2005. G. Shafer and V. Vovk, “A tutorial on conformal prediction,” The Journal of Machine Learning Research, vol. 9, pp. Balasubramanian, V., Ho, S. and Vovk, V. (2014). Conformal Prediction for Reliable Machine Learning.

Page 30: New Perspective on Machine Learning Predictions Under ... · New Perspective on Machine Learning Predictions Under Uncertainty ... Algorithmic Randomness Algorithmic information theory

2019 Storage Developer Conference. © Dell EMC. All Rights Reserved. 30

“As far as the laws of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality.” – Albert Einstein

Page 31: New Perspective on Machine Learning Predictions Under ... · New Perspective on Machine Learning Predictions Under Uncertainty ... Algorithmic Randomness Algorithmic information theory

2019 Storage Developer Conference. © Dell EMC. All Rights Reserved. 312019 Storage Developer Conference. © Dell EMC. All Rights Reserved.