70
| | Francesca Grisoni University of Milano-Bicocca, Dept. of Earth and Environmental Sciences, Milan, Italy ETH Zurich, Dept. of Chemistry and Applied Biosciences, Zurich, Switzerland [email protected] 17.05.2017 1 Machine Learning for Chemoinformatics An introduction © F. Grisoni, BigChem online course

Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

  • Upload
    others

  • View
    8

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

||

Francesca GrisoniUniversity of Milano-Bicocca, Dept. of Earth and Environmental Sciences, Milan, ItalyETH Zurich, Dept. of Chemistry and Applied Biosciences, Zurich, [email protected]

17.05.2017 1

Machine Learning for ChemoinformaticsAn introduction

© F. Grisoni, BigChem online course

Page 2: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 2

Presentation Outline

• Introduction• Definition• Elements of Machine Learning

• Additional Considerations• The NFL theorem• Validation• Applicability

• Machine Learning approaches: some examples• Local methods• Tree-like approaches• Neural Networks

Page 3: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

||

Machine learning in chemoinformatics

17.05.2017© F. Grisoni, BigChem online course 3

P = f ( )

• Biological activity prediction

• Toxicity

• Physico-chemical properties

• Multi-objective optimization

• Rational drug design

Introduction

Page 4: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 4

Machine Learning (ML)

“Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed.”1

1 Samuel, A. L. (1959). “Some studies in machine learning using the game of checkers”. IBM Journal of research and development, 3(3), 210-229.2 Mitchell, T. M. (1997). Machine learning. 1997. Burr Ridge, IL: McGraw Hill, 45(37), 870-877.

“A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.”2

https://www.toptal.com/machine-learning/machine-learning-theory-an-introductory-primer(1) Data (2) Task

(3) Performance

Introduction

Page 5: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 5

(1) The data and the “G-I-G-O” principle

X (n x p)n

sam

ples

p variables

X

Machine Learning Elements

f ( )

f ( )0.1, 1, 0, 3, 3.5, 2, …

Page 6: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

||

P =

P =

17.05.2017© F. Grisoni, BigChem online course 6

(1) The data and the “G-I-G-O” principle

X (n x p)n

sam

ples

p variables

X

Machine Learning Elements

p'

nsa

mpl

es

Y (n x 1)

Y

f ( )

f ( )0.1, 1, 0, 3, 3.5, 2, …

Page 7: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 7

(1) The data and the “G-I-G-O” principle

Garbage In = Garbage OutX (n x p)n

sam

ples

p variables

X

Machine Learning Elements

p'

nsa

mpl

es

Y (n x 1)

Y

• Structures• Experimental Responses

Page 8: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 8

(2) Machine Learning Tasks

x1

x2

Unsupervised Learning

X (n x p)

nsa

mpl

es

p variables

X

Machine Learning Elements

Page 9: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 9

(2) Machine Learning Tasks

x1

x2

Unsupervised Learning

X (n x p)

nsa

mpl

es

p variables

X

Machine Learning Elements

Page 10: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 10

(2) Machine Learning Tasks

x1

x2

X (n x p)

nsa

mpl

es

p variables

X

Supervised Learning

Machine Learning Elements

Page 11: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

||

p'

17.05.2017© F. Grisoni, BigChem online course 11

(2) Machine Learning Tasks

x1

x2

Supervised Learning

X (n x p) +

nsa

mpl

es

p variables

Y (n x 1)

nsa

mpl

es

X Y

Machine Learning Elements

Page 12: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

||

p'

17.05.2017© F. Grisoni, BigChem online course 12

(2) Machine Learning Tasks

x1

x2

Supervised Learning

X (n x p) +

nsa

mpl

es

p variables

Y (n x 1)

nsa

mpl

es

X Y

Machine Learning Elements

Page 13: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 13

(2) Machine Learning Tasks

Machine Learning Elements

Page 14: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 14

Classification

Regression

(2) Machine Learning Tasks

Machine Learning Elements

Page 15: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

||

• Sensitivity or True Positive rate (TPR)

• Specificity or True Negative rate (TNR)

• Precision

17.05.2017© F. Grisoni, BigChem online course 15

N

P

N P

TN

TP

FP

FN

Single class

(3) PerformanceClassification

Machine Learning Elements

Page 16: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

||

• Non-Error Rate or Balanced-Accuracy ϵ [0,1]

• Matthews Correlation Coefficient (MCC) ϵ [-1,1]

• Accuracy ϵ [0,1]

17.05.2017© F. Grisoni, BigChem online course 16

Global Performance

(3) PerformanceClassification

N

P

N P

TN

TP

FP

FN

Machine Learning Elements

Page 17: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

||

• Non-Error Rate or Balanced-Accuracy ϵ [0,1]

• Matthews Correlation Coefficient (MCC) ϵ [-1,1]

• Accuracy ϵ [0,1]

17.05.2017© F. Grisoni, BigChem online course 17

Global Performance

(3) PerformanceClassification

N

P

N P

TN

TP

FP

FN

Machine Learning Elements

N = 990; P = 10TN = 990 (100%)TP = 0 (0%)

Snp = 0/10 = 0%Snp = 990/990 = 100%

NER = 50%Acc = 99%

Page 18: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

||

• Non-Error Rate or Balanced-Accuracy ϵ [0,1]

• Matthews Correlation Coefficient (MCC) ϵ [-1,1]

• Accuracy ϵ [0,1]

17.05.2017© F. Grisoni, BigChem online course 18

Global Performance

(3) PerformanceClassification

N

P

N P

TN

TP

FP

FN

Machine Learning Elements

N = 990; P = 10TN = 990 (100%)TP = 0 (0%)

SnP = 0/10 = 0%SnN = 990/990 = 100%

NER = 50%Acc = 99%

Page 19: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

||

• Non-Error Rate or Balanced-Accuracy ϵ [0,1]

• Matthews Correlation Coefficient (MCC) ϵ [-1,1]

• Accuracy ϵ [0,1]

17.05.2017© F. Grisoni, BigChem online course 19

Global Performance

(3) PerformanceClassification

N

P

N P

TN

TP

FP

FN

Machine Learning Elements

N = 990; P = 10TN = 990 (100%)TP = 0 (0%)

SnP = 0/10 = 0%SnN = 990/990 = 100%

NER = 50%Acc = 99%

Page 20: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

||

• Non-Error Rate or Balanced-Accuracy ϵ [0,1]

• Matthews Correlation Coefficient (MCC) ϵ [-1,1]

• Accuracy ϵ [0,1]

17.05.2017© F. Grisoni, BigChem online course 20

Global Performance

(3) PerformanceClassification

N

P

N P

TN

TP

FP

FN

Machine Learning Elements

N = 990; P = 10TN = 990 (100%)TP = 0 (0%)

SnP = 0/10 = 0%SnN = 990/990 = 100%

NER = 50%Acc = 99%

Page 21: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

||

• Root Mean Squared Error in Prediction (RMSEP)

17.05.2017© F. Grisoni, BigChem online course 21

𝒚 𝒚#

Real Pred.

(3) PerformanceRegression

Machine Learning Elements

Page 22: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 22

Additional Considerations

Considerations on ML

Page 23: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 23

Additional Considerations

Considerations on ML

1. Choice of the learner

“For every learner, there exists a task on which it fails, even though that task can be successfully learned by another learner.”1

1 Shalev-Shwartz, S., & Ben-David, S. (2014). “Understanding machine learning: From theory to algorithms.” Cambridge university press.

“No Free Lunch” Theorem:

Page 24: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 24

Additional Considerations

Considerations on ML

2. Bias-Variance Trade-off

Complexity

Error • Bias à generalization (underfitting)• Variance à descriptive ability (overfitting)

Page 25: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 25

Additional Considerations

Considerations on ML

2. Bias-Variance Trade-off

Complexity

Error • Bias à generalization (underfitting)• Variance à descriptive ability (overfitting)

Page 26: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 26

group 1

group 2

group 4

Initial dataset

Considerations on ML

Additional Considerations3. Validation

Page 27: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 27

group 1

group 2

group 4

Initial dataset

Training set Test set

Considerations on ML

Additional Considerations3. Validation

Page 28: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 28

group 1

group 2

group 4

Initial dataset

Training set Test set

Training set Test setValidation set

Considerations on ML

Additional Considerations3. Validation

Page 29: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 29

group 1

group 2

group 3

group 4

group 5

Validation set

Training set

Considerations on ML

Additional Considerations3. Validation

Page 30: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 30

Applicability Domain: Chemical space where the property can be reliably predicted

Machine learning models à Reductionist

• Types of chemical structures • Physicochemical properties • Mechanisms of action considered

Considerations on ML

Additional Considerations4. Applicability

The “No Free Dessert (either!)” theorem

Page 31: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

||

T -1H=X (X X) X× ×

17.05.2017© F. Grisoni, BigChem online course 31

Man

1

p

xy j jj

D x y=

= -å1 1

2 2

min( ), max( )min( ), max( )

x xx x

Considerations on ML

Additional Considerations4. Applicability

Page 32: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 32

Standard Machine Learning workflow (in chemoinformatics)Considerations on ML

Page 33: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 33

Informationextraction

Standard Machine Learning workflow (in chemoinformatics)Considerations on ML

Page 34: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 34

Applicability & predictivity Informationextraction

Standard Machine Learning workflow (in chemoinformatics)Considerations on ML

Page 35: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 35

Applicability & predictivity Informationextraction

Application & new knowledge

Standard Machine Learning workflow (in chemoinformatics)Considerations on ML

Page 36: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 36

Machine Learning methods (overview)

1. Decision Tree-based learning

• Decision Trees

• Random Forest

2. Local Methods

• k-means algorithm

• k-NN algorithm

3. Artificial Neural Networks

• Feed-Forward NN

• Kohonen Maps

Page 37: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 37

(1) Decision Tree Learning

Root node

Decisionnode(s)

Leaves

Page 38: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 38

Root node

Decisionnode(s)

Leaves

1. Easy to interpret

2. No data pretreatment

3. Numerical/categorical variables

4. Classification and regression

5. Non parametric

6. Automatic variable selection

(1) Decision Tree Learning

Page 39: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 39

(1) Decision Tree Learning

Bagging (Bootstrap Aggregating) = “the power of the crowd”Random Forest

Page 40: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 40

(2) Local approachesk-Means clustering

x1

x2

Page 41: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 41

(2) Local approachesk-Means clustering

x1

x21. Select a k (3)

Page 42: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 42

(2) Local approachesk-Means clustering

x1

x21. Select a k (3)

2. Random assignment

Page 43: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 43

(2) Local approachesk-Means clustering

x1

x21. Select a k (3)

2. Random assignment

3. Centroid calculation

Page 44: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 44

(2) Local approachesk-Means clustering

x1

x21. Select a k (3)

2. Random assignment

3. Centroid calculation

4. Closest centroid

Page 45: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 45

(2) Local approachesk-Means clustering

x1

x21. Select a k (3)

2. Random assignment

3. Centroid calculation

4. Closest centroid

Page 46: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 46

(2) Local approachesk-Means clustering

x1

x21. Select a k (3)

2. Random assignment

3. Centroid calculation

4. Closest centroid

Page 47: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 47

(2) Local approachesk-Means clustering

x1

x21. Select a k (3)

2. Random assignment

3. Centroid calculation

4. Closest centroid

Page 48: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 48

(2) Local approachesk-Means clustering

x1

x21. Select a k (3)

2. Random assignment

3. Centroid calculation

4. Closest centroid

5. End

Page 49: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 49

(2) Local approachesk-Nearest Neighbor (kNN)

?

Page 50: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

||

1. Calculate a distance (ntrain times)

17.05.2017© F. Grisoni, BigChem online course 50

(2) Local approachesk-Nearest Neighbor (kNN)

?

Page 51: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 51

(2) Local approachesk-Nearest Neighbor (kNN)

?

1. Calculate a distance (ntrain times)

2. Select a number of neighbors (k) to predict the response

Page 52: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 52

(2) Local approachesk-Nearest Neighbor (kNN)

?

k = 11. Calculate a distance (ntrain times)

2. Select a number of neighbors (k) to predict the response

Page 53: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 53

(2) Local approachesk-Nearest Neighbor (kNN)

?

1. Calculate a distance (ntrain times)

2. Select a number of neighbors (k) to predict the response

k = 2

Page 54: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 54

(2) Local approachesk-Nearest Neighbor (kNN)

?

1. Calculate a distance (ntrain times)

2. Select a number of neighbors (k) to predict the response

k = 3

Page 55: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 55

(2) Local approachesk-Nearest Neighbor (kNN)

?

1. Calculate a distance (ntrain times)

2. Select a number of neighbors (k) to predict the response

k = 4

Page 56: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 56

(2) Local approachesk-Nearest Neighbor (kNN)

1. Good for large training set with “localized” differences

2. Difficult to be interpreted

3. Which k?

4. Which distance measure?

5. Curse of dimensionality à Variable selection

Page 57: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 57

(3) Neural NetworksArtificial Neurons

x1

f (x)

x2

x3

x4

xp

y

Inputs

Output

Activation Function

Page 58: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 58

(3) Neural NetworksArtificial Neurons

x1

f (x)

x2

x3

x4

xp

y

Inputs

Output

Activation Function Neural networks. Comprehensive Chemometrics: Chemical and Biochemical Data Analysis Vol, 3.

Page 59: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 59

(3) Neural Networks

Input Layer

Hidden Layer(s)

Output Layer

1. Untrained Network

2. Compute the outcome

3. Compute the error

4. Back propagation learning

5. Repeat until stop criterion

Feed-Forward NN

Page 60: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 60

(3) Neural Networks

Epochs

Error

Training set

Feed-Forward NN

Page 61: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 61

(3) Neural Networks

Epochs

ErrorValidation set

Training set

Feed-Forward NN

Page 62: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 62

(3) Neural Networks

Epochs

ErrorValidation set

Training set

Feed-Forward NN

Page 63: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 63

(3) Neural NetworksKohonen Maps

• Unsupervised non-linear mapping• Topology preserving map

p dimensional

2 dimensional

Page 64: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 64

(3) Neural NetworksKohonen Maps

Neurons

InputKohonenLayer

Weights

1. Competitive Learning

• Similarity to each neuron

• “Winner takes all”

2. Collaborative Learning• Winning neuron update

• Update of close neurons

Page 65: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 65

(3) Neural NetworksKohonen Maps

Top map (compounds)

Page 66: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 66

(3) Neural NetworksKohonen Maps

Top map (compounds)

Weight maps (p)

Page 67: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 67

(3) Neural NetworksKohonen Maps

Top map (compounds)

Weight maps (p)

Page 68: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 68

• Purpose (clustering, regression, classification)

• Performance vs interpretability

• Covered chemical space (e.g., AD)

• Types of included variables

• …

http://scikit-learn.org/stable/tutorial/machine_learning_map/

Which ML algorithm?

Page 69: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 69

Summary

• Machines can learn from our data

• No ML algorithm always outperforms the others

• Validation and Applicability Domain assessment are crucial

• Pay attention to what the performance metric is telling you!

Page 70: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the

|| 17.05.2017© F. Grisoni, BigChem online course 70

Supplementary reading

Theory and Algorithms

• Shalev-Shwartz, S., & Ben-David, S. (2014). “Understanding machine learning: From theory to algorithms.” Cambridge university press.

• Marini, F. (2009). “Neural networks”. In: Comprehensive Chemometrics: Chemical and Biochemical Data Analysis - Vol, 3.

Online resources

• [Coursera] Ng, A. Machine Learning, Stanford University. https://www.coursera.org/learn/machine-learning• [Online Book] Neural Networks and Deep Learning. http://neuralnetworksanddeeplearning.com/