Upload
others
View
8
Download
0
Embed Size (px)
Citation preview
||
Francesca GrisoniUniversity of Milano-Bicocca, Dept. of Earth and Environmental Sciences, Milan, ItalyETH Zurich, Dept. of Chemistry and Applied Biosciences, Zurich, [email protected]
17.05.2017 1
Machine Learning for ChemoinformaticsAn introduction
© F. Grisoni, BigChem online course
|| 17.05.2017© F. Grisoni, BigChem online course 2
Presentation Outline
• Introduction• Definition• Elements of Machine Learning
• Additional Considerations• The NFL theorem• Validation• Applicability
• Machine Learning approaches: some examples• Local methods• Tree-like approaches• Neural Networks
||
Machine learning in chemoinformatics
17.05.2017© F. Grisoni, BigChem online course 3
P = f ( )
• Biological activity prediction
• Toxicity
• Physico-chemical properties
• Multi-objective optimization
• Rational drug design
Introduction
|| 17.05.2017© F. Grisoni, BigChem online course 4
Machine Learning (ML)
“Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed.”1
1 Samuel, A. L. (1959). “Some studies in machine learning using the game of checkers”. IBM Journal of research and development, 3(3), 210-229.2 Mitchell, T. M. (1997). Machine learning. 1997. Burr Ridge, IL: McGraw Hill, 45(37), 870-877.
“A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.”2
https://www.toptal.com/machine-learning/machine-learning-theory-an-introductory-primer(1) Data (2) Task
(3) Performance
Introduction
|| 17.05.2017© F. Grisoni, BigChem online course 5
(1) The data and the “G-I-G-O” principle
X (n x p)n
sam
ples
p variables
X
Machine Learning Elements
f ( )
f ( )0.1, 1, 0, 3, 3.5, 2, …
||
P =
P =
17.05.2017© F. Grisoni, BigChem online course 6
(1) The data and the “G-I-G-O” principle
X (n x p)n
sam
ples
p variables
X
Machine Learning Elements
p'
nsa
mpl
es
Y (n x 1)
Y
f ( )
f ( )0.1, 1, 0, 3, 3.5, 2, …
|| 17.05.2017© F. Grisoni, BigChem online course 7
(1) The data and the “G-I-G-O” principle
Garbage In = Garbage OutX (n x p)n
sam
ples
p variables
X
Machine Learning Elements
p'
nsa
mpl
es
Y (n x 1)
Y
• Structures• Experimental Responses
|| 17.05.2017© F. Grisoni, BigChem online course 8
(2) Machine Learning Tasks
x1
x2
Unsupervised Learning
X (n x p)
nsa
mpl
es
p variables
X
Machine Learning Elements
|| 17.05.2017© F. Grisoni, BigChem online course 9
(2) Machine Learning Tasks
x1
x2
Unsupervised Learning
X (n x p)
nsa
mpl
es
p variables
X
Machine Learning Elements
|| 17.05.2017© F. Grisoni, BigChem online course 10
(2) Machine Learning Tasks
x1
x2
X (n x p)
nsa
mpl
es
p variables
X
Supervised Learning
Machine Learning Elements
||
p'
17.05.2017© F. Grisoni, BigChem online course 11
(2) Machine Learning Tasks
x1
x2
Supervised Learning
X (n x p) +
nsa
mpl
es
p variables
Y (n x 1)
nsa
mpl
es
X Y
Machine Learning Elements
||
p'
17.05.2017© F. Grisoni, BigChem online course 12
(2) Machine Learning Tasks
x1
x2
Supervised Learning
X (n x p) +
nsa
mpl
es
p variables
Y (n x 1)
nsa
mpl
es
X Y
Machine Learning Elements
|| 17.05.2017© F. Grisoni, BigChem online course 13
(2) Machine Learning Tasks
Machine Learning Elements
|| 17.05.2017© F. Grisoni, BigChem online course 14
Classification
Regression
(2) Machine Learning Tasks
Machine Learning Elements
||
• Sensitivity or True Positive rate (TPR)
• Specificity or True Negative rate (TNR)
• Precision
17.05.2017© F. Grisoni, BigChem online course 15
N
P
N P
TN
TP
FP
FN
Single class
(3) PerformanceClassification
Machine Learning Elements
||
• Non-Error Rate or Balanced-Accuracy ϵ [0,1]
• Matthews Correlation Coefficient (MCC) ϵ [-1,1]
• Accuracy ϵ [0,1]
17.05.2017© F. Grisoni, BigChem online course 16
Global Performance
(3) PerformanceClassification
N
P
N P
TN
TP
FP
FN
Machine Learning Elements
||
• Non-Error Rate or Balanced-Accuracy ϵ [0,1]
• Matthews Correlation Coefficient (MCC) ϵ [-1,1]
• Accuracy ϵ [0,1]
17.05.2017© F. Grisoni, BigChem online course 17
Global Performance
(3) PerformanceClassification
N
P
N P
TN
TP
FP
FN
Machine Learning Elements
N = 990; P = 10TN = 990 (100%)TP = 0 (0%)
Snp = 0/10 = 0%Snp = 990/990 = 100%
NER = 50%Acc = 99%
||
• Non-Error Rate or Balanced-Accuracy ϵ [0,1]
• Matthews Correlation Coefficient (MCC) ϵ [-1,1]
• Accuracy ϵ [0,1]
17.05.2017© F. Grisoni, BigChem online course 18
Global Performance
(3) PerformanceClassification
N
P
N P
TN
TP
FP
FN
Machine Learning Elements
N = 990; P = 10TN = 990 (100%)TP = 0 (0%)
SnP = 0/10 = 0%SnN = 990/990 = 100%
NER = 50%Acc = 99%
||
• Non-Error Rate or Balanced-Accuracy ϵ [0,1]
• Matthews Correlation Coefficient (MCC) ϵ [-1,1]
• Accuracy ϵ [0,1]
17.05.2017© F. Grisoni, BigChem online course 19
Global Performance
(3) PerformanceClassification
N
P
N P
TN
TP
FP
FN
Machine Learning Elements
N = 990; P = 10TN = 990 (100%)TP = 0 (0%)
SnP = 0/10 = 0%SnN = 990/990 = 100%
NER = 50%Acc = 99%
||
• Non-Error Rate or Balanced-Accuracy ϵ [0,1]
• Matthews Correlation Coefficient (MCC) ϵ [-1,1]
• Accuracy ϵ [0,1]
17.05.2017© F. Grisoni, BigChem online course 20
Global Performance
(3) PerformanceClassification
N
P
N P
TN
TP
FP
FN
Machine Learning Elements
N = 990; P = 10TN = 990 (100%)TP = 0 (0%)
SnP = 0/10 = 0%SnN = 990/990 = 100%
NER = 50%Acc = 99%
||
• Root Mean Squared Error in Prediction (RMSEP)
17.05.2017© F. Grisoni, BigChem online course 21
𝒚 𝒚#
Real Pred.
(3) PerformanceRegression
Machine Learning Elements
|| 17.05.2017© F. Grisoni, BigChem online course 22
Additional Considerations
Considerations on ML
|| 17.05.2017© F. Grisoni, BigChem online course 23
Additional Considerations
Considerations on ML
1. Choice of the learner
“For every learner, there exists a task on which it fails, even though that task can be successfully learned by another learner.”1
1 Shalev-Shwartz, S., & Ben-David, S. (2014). “Understanding machine learning: From theory to algorithms.” Cambridge university press.
“No Free Lunch” Theorem:
|| 17.05.2017© F. Grisoni, BigChem online course 24
Additional Considerations
Considerations on ML
2. Bias-Variance Trade-off
Complexity
Error • Bias à generalization (underfitting)• Variance à descriptive ability (overfitting)
|| 17.05.2017© F. Grisoni, BigChem online course 25
Additional Considerations
Considerations on ML
2. Bias-Variance Trade-off
Complexity
Error • Bias à generalization (underfitting)• Variance à descriptive ability (overfitting)
|| 17.05.2017© F. Grisoni, BigChem online course 26
group 1
group 2
group 4
Initial dataset
Considerations on ML
Additional Considerations3. Validation
|| 17.05.2017© F. Grisoni, BigChem online course 27
group 1
group 2
group 4
Initial dataset
Training set Test set
Considerations on ML
Additional Considerations3. Validation
|| 17.05.2017© F. Grisoni, BigChem online course 28
group 1
group 2
group 4
Initial dataset
Training set Test set
Training set Test setValidation set
Considerations on ML
Additional Considerations3. Validation
|| 17.05.2017© F. Grisoni, BigChem online course 29
group 1
group 2
group 3
group 4
group 5
Validation set
Training set
Considerations on ML
Additional Considerations3. Validation
|| 17.05.2017© F. Grisoni, BigChem online course 30
Applicability Domain: Chemical space where the property can be reliably predicted
Machine learning models à Reductionist
• Types of chemical structures • Physicochemical properties • Mechanisms of action considered
Considerations on ML
Additional Considerations4. Applicability
The “No Free Dessert (either!)” theorem
||
T -1H=X (X X) X× ×
17.05.2017© F. Grisoni, BigChem online course 31
Man
1
p
xy j jj
D x y=
= -å1 1
2 2
min( ), max( )min( ), max( )
x xx x
Considerations on ML
Additional Considerations4. Applicability
|| 17.05.2017© F. Grisoni, BigChem online course 32
Standard Machine Learning workflow (in chemoinformatics)Considerations on ML
|| 17.05.2017© F. Grisoni, BigChem online course 33
Informationextraction
Standard Machine Learning workflow (in chemoinformatics)Considerations on ML
|| 17.05.2017© F. Grisoni, BigChem online course 34
Applicability & predictivity Informationextraction
Standard Machine Learning workflow (in chemoinformatics)Considerations on ML
|| 17.05.2017© F. Grisoni, BigChem online course 35
Applicability & predictivity Informationextraction
Application & new knowledge
Standard Machine Learning workflow (in chemoinformatics)Considerations on ML
|| 17.05.2017© F. Grisoni, BigChem online course 36
Machine Learning methods (overview)
1. Decision Tree-based learning
• Decision Trees
• Random Forest
2. Local Methods
• k-means algorithm
• k-NN algorithm
3. Artificial Neural Networks
• Feed-Forward NN
• Kohonen Maps
|| 17.05.2017© F. Grisoni, BigChem online course 37
(1) Decision Tree Learning
Root node
Decisionnode(s)
Leaves
|| 17.05.2017© F. Grisoni, BigChem online course 38
Root node
Decisionnode(s)
Leaves
1. Easy to interpret
2. No data pretreatment
3. Numerical/categorical variables
4. Classification and regression
5. Non parametric
6. Automatic variable selection
(1) Decision Tree Learning
|| 17.05.2017© F. Grisoni, BigChem online course 39
(1) Decision Tree Learning
Bagging (Bootstrap Aggregating) = “the power of the crowd”Random Forest
|| 17.05.2017© F. Grisoni, BigChem online course 40
(2) Local approachesk-Means clustering
x1
x2
|| 17.05.2017© F. Grisoni, BigChem online course 41
(2) Local approachesk-Means clustering
x1
x21. Select a k (3)
|| 17.05.2017© F. Grisoni, BigChem online course 42
(2) Local approachesk-Means clustering
x1
x21. Select a k (3)
2. Random assignment
|| 17.05.2017© F. Grisoni, BigChem online course 43
(2) Local approachesk-Means clustering
x1
x21. Select a k (3)
2. Random assignment
3. Centroid calculation
|| 17.05.2017© F. Grisoni, BigChem online course 44
(2) Local approachesk-Means clustering
x1
x21. Select a k (3)
2. Random assignment
3. Centroid calculation
4. Closest centroid
|| 17.05.2017© F. Grisoni, BigChem online course 45
(2) Local approachesk-Means clustering
x1
x21. Select a k (3)
2. Random assignment
3. Centroid calculation
4. Closest centroid
|| 17.05.2017© F. Grisoni, BigChem online course 46
(2) Local approachesk-Means clustering
x1
x21. Select a k (3)
2. Random assignment
3. Centroid calculation
4. Closest centroid
|| 17.05.2017© F. Grisoni, BigChem online course 47
(2) Local approachesk-Means clustering
x1
x21. Select a k (3)
2. Random assignment
3. Centroid calculation
4. Closest centroid
|| 17.05.2017© F. Grisoni, BigChem online course 48
(2) Local approachesk-Means clustering
x1
x21. Select a k (3)
2. Random assignment
3. Centroid calculation
4. Closest centroid
5. End
|| 17.05.2017© F. Grisoni, BigChem online course 49
(2) Local approachesk-Nearest Neighbor (kNN)
?
||
1. Calculate a distance (ntrain times)
17.05.2017© F. Grisoni, BigChem online course 50
(2) Local approachesk-Nearest Neighbor (kNN)
?
|| 17.05.2017© F. Grisoni, BigChem online course 51
(2) Local approachesk-Nearest Neighbor (kNN)
?
1. Calculate a distance (ntrain times)
2. Select a number of neighbors (k) to predict the response
|| 17.05.2017© F. Grisoni, BigChem online course 52
(2) Local approachesk-Nearest Neighbor (kNN)
?
k = 11. Calculate a distance (ntrain times)
2. Select a number of neighbors (k) to predict the response
|| 17.05.2017© F. Grisoni, BigChem online course 53
(2) Local approachesk-Nearest Neighbor (kNN)
?
1. Calculate a distance (ntrain times)
2. Select a number of neighbors (k) to predict the response
k = 2
|| 17.05.2017© F. Grisoni, BigChem online course 54
(2) Local approachesk-Nearest Neighbor (kNN)
?
1. Calculate a distance (ntrain times)
2. Select a number of neighbors (k) to predict the response
k = 3
|| 17.05.2017© F. Grisoni, BigChem online course 55
(2) Local approachesk-Nearest Neighbor (kNN)
?
1. Calculate a distance (ntrain times)
2. Select a number of neighbors (k) to predict the response
k = 4
|| 17.05.2017© F. Grisoni, BigChem online course 56
(2) Local approachesk-Nearest Neighbor (kNN)
1. Good for large training set with “localized” differences
2. Difficult to be interpreted
3. Which k?
4. Which distance measure?
5. Curse of dimensionality à Variable selection
|| 17.05.2017© F. Grisoni, BigChem online course 57
(3) Neural NetworksArtificial Neurons
x1
f (x)
x2
x3
x4
xp
y
Inputs
Output
Activation Function
|| 17.05.2017© F. Grisoni, BigChem online course 58
(3) Neural NetworksArtificial Neurons
x1
f (x)
x2
x3
x4
xp
y
Inputs
Output
Activation Function Neural networks. Comprehensive Chemometrics: Chemical and Biochemical Data Analysis Vol, 3.
|| 17.05.2017© F. Grisoni, BigChem online course 59
(3) Neural Networks
Input Layer
Hidden Layer(s)
Output Layer
1. Untrained Network
2. Compute the outcome
3. Compute the error
4. Back propagation learning
5. Repeat until stop criterion
Feed-Forward NN
|| 17.05.2017© F. Grisoni, BigChem online course 60
(3) Neural Networks
Epochs
Error
Training set
Feed-Forward NN
|| 17.05.2017© F. Grisoni, BigChem online course 61
(3) Neural Networks
Epochs
ErrorValidation set
Training set
Feed-Forward NN
|| 17.05.2017© F. Grisoni, BigChem online course 62
(3) Neural Networks
Epochs
ErrorValidation set
Training set
Feed-Forward NN
|| 17.05.2017© F. Grisoni, BigChem online course 63
(3) Neural NetworksKohonen Maps
• Unsupervised non-linear mapping• Topology preserving map
p dimensional
2 dimensional
|| 17.05.2017© F. Grisoni, BigChem online course 64
(3) Neural NetworksKohonen Maps
…
Neurons
InputKohonenLayer
Weights
1. Competitive Learning
• Similarity to each neuron
• “Winner takes all”
2. Collaborative Learning• Winning neuron update
• Update of close neurons
|| 17.05.2017© F. Grisoni, BigChem online course 65
(3) Neural NetworksKohonen Maps
Top map (compounds)
|| 17.05.2017© F. Grisoni, BigChem online course 66
(3) Neural NetworksKohonen Maps
Top map (compounds)
Weight maps (p)
|| 17.05.2017© F. Grisoni, BigChem online course 67
(3) Neural NetworksKohonen Maps
Top map (compounds)
Weight maps (p)
|| 17.05.2017© F. Grisoni, BigChem online course 68
• Purpose (clustering, regression, classification)
• Performance vs interpretability
• Covered chemical space (e.g., AD)
• Types of included variables
• …
http://scikit-learn.org/stable/tutorial/machine_learning_map/
Which ML algorithm?
|| 17.05.2017© F. Grisoni, BigChem online course 69
Summary
• Machines can learn from our data
• No ML algorithm always outperforms the others
• Validation and Applicability Domain assessment are crucial
• Pay attention to what the performance metric is telling you!
|| 17.05.2017© F. Grisoni, BigChem online course 70
Supplementary reading
Theory and Algorithms
• Shalev-Shwartz, S., & Ben-David, S. (2014). “Understanding machine learning: From theory to algorithms.” Cambridge university press.
• Marini, F. (2009). “Neural networks”. In: Comprehensive Chemometrics: Chemical and Biochemical Data Analysis - Vol, 3.
Online resources
• [Coursera] Ng, A. Machine Learning, Stanford University. https://www.coursera.org/learn/machine-learning• [Online Book] Neural Networks and Deep Learning. http://neuralnetworksanddeeplearning.com/