Application of Machine Learning to Materials Discovery and Development
Ankit Agrawal and Alok Choudhary
Department of Electrical Engineering and Computer Science, Northwestern University
{ankitag,choudhar}@eecs.northwestern.edu
Contributors: Surya Kalidindi (GaTech), Basavarsu (TRDDC), Chris Wolverton (NU),
Ahmet Cecen (GaTech), Parijat Deshpande (TRDDC), Bryce Meredig (NU)
MURI 3-Year Review
June 22-23, 2015
Integrated Computational Materials Engineering (ICME)
© Olson, G. B. (1997). Computational design of hierarchically structured materials. Science, 277(5330), 1237-1242.
Processing
Structure
Properties
Performance
Goal/means
Cause and effect
Project Collaboration
Project I. Multi-objective Structure-Property Optimization
Project II. Multiscale Prediction of Localization Relationships
Project III. Exploring Composition-Processing-Property Relationships
Project IV. Composition-based Discovery of Stable Compounds
Predicting fatigue strength of steel from
composition and processing parameters
Properties (fatigue strength):
• Correlates to composition
• Correlates to manufacturing processes
Objective: Apply data-driven approaches to the NIMS public-domain materials database to explore composition-processing-property relationships and construct predictive models for the fatigue strength of steels.
Collaborative project between Agrawal (NU), Choudhary (NU), Kalidindi (GaTech), Basavarsu (TRDDC)
NIMS Database Attributes
Reference: http://mits.nims.go.jp/index_en.html
Fatigue Data Sheet Information:
Chemical composition - %C, %Si, %Mn, %P, %S, %Ni, %Cr, %Cu, %Mo (all in wt. %)
Upstream processing details - Ingot size, Reduction ratio, Non-metallic inclusions
Heat treatment conditions - Temperature, Time and other process conditions for Normalizing, Carburizing-Quenching and Tempering processes
Mechanical properties - YS, UTS, %EL (Elongation), %RA (Reduction in Area), Vickers Hardness, Charpy impact value (J/cm²), Rotating bending fatigue strength at 10⁷ cycles
Total - 437 data records: Carbon and low alloy steels - 371 observations, Carburizing steels - 48 observations, and Spring steels - 18 observations
Steel Fatigue Strength Prediction Framework
Data Mining Modeling
• Classification/Regression
  • Learning a predictive model based on supervised (labeled) training data, which can then be used to classify unseen data
  • E.g., decision trees, neural networks, support vector machines, etc.
• Model evaluation
  • Test-train split: split the labeled data into training and testing sets
  • Cross-validation: test every instance in the dataset using a model that has not seen that instance
    • Types: k-fold cross-validation; leave-one-out cross-validation (LOOCV), i.e., k-fold with k = n
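The cross-validation idea on this slide can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: `k_fold_indices`, `mean_predictor`, and `cross_validated_predictions` are hypothetical helper names, the "model" is a trivial stand-in that predicts the training mean, and the target values are made up rather than taken from the NIMS data.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists so that every index is tested exactly once."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # k roughly equal folds
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


def mean_predictor(train_targets):
    """Hypothetical stand-in model: always predicts the mean of its training targets."""
    mean = sum(train_targets) / len(train_targets)
    return lambda: mean


def cross_validated_predictions(y, k, seed=0):
    """Predict each y[i] with a model that never saw instance i during training."""
    preds = [None] * len(y)
    for train, test in k_fold_indices(len(y), k, seed):
        model = mean_predictor([y[j] for j in train])
        for j in test:
            preds[j] = model()
    return preds


y = [10.0, 12.0, 14.0, 16.0]  # made-up target values
loocv = cross_validated_predictions(y, k=len(y))  # LOOCV is k-fold with k = n
```

With `k = len(y)` each fold holds a single instance, which is the leave-one-out case; a smaller `k` trades some of that thoroughness for fewer model fits.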
Cluster Visualization
Information Gain Based Feature Ranking
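As a rough illustration of information-gain-based ranking, the sketch below scores each feature by how much knowing its (binned) value reduces the entropy of the (binned) target. The equal-width binning, the number of bins, the helper names, and the toy feature names and values are all assumptions made for this example; the slide does not state how the ranking was actually computed.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def equal_width_bins(values, n_bins):
    """Discretize continuous values into n_bins equal-width bins (an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def information_gain(feature_bins, target_bins):
    """H(target) - H(target | feature) for discretized inputs."""
    groups = defaultdict(list)
    for f, t in zip(feature_bins, target_bins):
        groups[f].append(t)
    n = len(target_bins)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(target_bins) - conditional


def rank_features(columns, target, n_bins=2):
    """Return (feature name, information gain) pairs, highest gain first."""
    t_bins = equal_width_bins(target, n_bins)
    scores = {
        name: information_gain(equal_width_bins(vals, n_bins), t_bins)
        for name, vals in columns.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up toy data: one informative feature and one uninformative feature.
columns = {
    "informative_feature": [1.0, 2.0, 3.0, 4.0],
    "noise_feature": [1.0, 4.0, 1.0, 4.0],
}
target = [100.0, 110.0, 200.0, 210.0]
ranking = rank_features(columns, target)
```

In this toy case the informative feature separates low from high targets perfectly, so it receives the full one bit of gain, while the noise feature receives none.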
Evaluation Metrics
Compare vectors of actual and predicted values:
• Coefficient of correlation (R)
• Coefficient of determination (R²)
• Mean Absolute Error (MAE)
• Root Mean Squared Error (RMSE)
• Standard Deviation of Error (SDE)
• Mean Absolute Error Fraction (MAE_f)
• Root Mean Squared Error Fraction (RMSE_f)
• Standard Deviation of Error Fraction (SDE_f)
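The metrics listed on this slide can be computed from the two vectors as sketched below. Several definitions here are assumptions, since the slide gives only the names: R² is taken as the square of the Pearson correlation R, SDE as the population standard deviation of the absolute errors, and the "fraction" variants (keyed `MAE_f`, `RMSE_f`, `SDE_f`) as the same quantities computed on errors divided by the actual value. The input numbers are made up.

```python
import math


def _mean(xs):
    return sum(xs) / len(xs)


def _std(xs):
    """Population standard deviation."""
    m = _mean(xs)
    return math.sqrt(_mean([(x - m) ** 2 for x in xs]))


def evaluation_metrics(actual, predicted):
    """Compare vectors of actual and predicted values (actual values must be nonzero)."""
    errors = [a - p for a, p in zip(actual, predicted)]
    abs_err = [abs(e) for e in errors]
    frac_err = [abs(e / a) for e, a in zip(errors, actual)]

    # Pearson correlation between actual and predicted values.
    ma, mp = _mean(actual), _mean(predicted)
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    var_a = sum((a - ma) ** 2 for a in actual)
    var_p = sum((p - mp) ** 2 for p in predicted)
    r = cov / math.sqrt(var_a * var_p)

    return {
        "R": r,
        "R2": r * r,  # assumption: square of R
        "MAE": _mean(abs_err),
        "RMSE": math.sqrt(_mean([e * e for e in errors])),
        "SDE": _std(abs_err),  # assumption: spread of the absolute errors
        "MAE_f": _mean(frac_err),
        "RMSE_f": math.sqrt(_mean([f * f for f in frac_err])),
        "SDE_f": _std(frac_err),
    }


actual = [100.0, 200.0, 400.0]      # made-up values
predicted = [110.0, 190.0, 400.0]   # made-up values
metrics = evaluation_metrics(actual, predicted)
```

The fractional variants are scale-free, which makes them easier to compare across targets of very different magnitude than the absolute errors.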
Results Comparison
A. Agrawal, P. D. Deshpande, A. Cecen, G. P. Basavarsu, A. N. Choudhary, and S. R. Kalidindi, “Exploration of data science techniques to predict fatigue strength of steel from composition and processing parameters,” Integrating Materials and Manufacturing Innovation, 3 (8): 1–19, 2014.
Discovery of stable compounds
Collaborative project between Agrawal (NU), Choudhary (NU), Wolverton (NU)
Database Construction
• Thousands of DFT formation energies
• Empirical elemental data
Predictive Modeling
• Model 1: established heuristic
• Model 2: data mining
Model Evaluation
• Test models on unseen formation energies
Prediction
• Run combinatorial list of compositions through models
Ranking
• Combine heuristic and data mining predictions
Validation
• Experiments
• Crystal structure prediction
Discovery Framework (figure): (a) Models: millions of candidate ternary compositions are turned into formation energy predictions. (b) Compound discovery: ranked high-potential candidates.
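The prediction and ranking steps of the framework can be sketched as follows. Everything specific here is an assumption for illustration: the element labels A, B, X, Y are placeholders, `TOY_ENERGY` holds made-up numbers, `heuristic_model` and `data_mining_model` are trivial stand-ins for the two real models, and the simple average used to combine them is one possible choice, since the slides do not say how the predictions were combined.

```python
from itertools import combinations, product
from math import gcd

# Made-up per-element values (eV/atom), used only to drive the toy models.
TOY_ENERGY = {"A": -0.2, "B": -0.6, "X": -1.5, "Y": 0.1}


def enumerate_ternaries(elements, max_coeff=2):
    """Yield ternary compositions as ((element, count), ...) with reduced formulas."""
    for trio in combinations(sorted(elements), 3):
        for counts in product(range(1, max_coeff + 1), repeat=3):
            # Skip multiples of a smaller formula, e.g. A2B2X2 duplicates ABX.
            if gcd(gcd(counts[0], counts[1]), counts[2]) == 1:
                yield tuple(zip(trio, counts))


def heuristic_model(comp):
    """Hypothetical stand-in: composition-weighted average of the toy values."""
    total = sum(n for _, n in comp)
    return sum(TOY_ENERGY[el] * n for el, n in comp) / total


def data_mining_model(comp):
    """Hypothetical stand-in for a trained model."""
    return 0.5 * min(TOY_ENERGY[el] for el, _ in comp)


def rank_candidates(elements, top=3):
    """Score every candidate with both models and rank by the combined prediction."""
    scored = [
        (0.5 * (heuristic_model(c) + data_mining_model(c)), c)  # assumed combination
        for c in enumerate_ternaries(elements)
    ]
    scored.sort(key=lambda pair: pair[0])  # most negative formation energy first
    return scored[:top]


ranked = rank_candidates(["A", "B", "X", "Y"])
```

The same loop scales to a full element list and larger stoichiometric coefficients, at which point the candidate count grows combinatorially and the speed of the models relative to DFT becomes the point of the approach.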
Model Validation: Numerical
Parity plots: model formation energy (eV/atom) vs. DFT formation energy (eV/atom), both axes from -5.0 to 0.0
• DM: binaries: R² = 0.87, MAE = 0.27 eV/atom
• DM: binaries + ternaries: R² = 0.93, MAE = 0.16 eV/atom
• Heuristic: R² = 0.95, MAE = 0.12 eV/atom
Model Validation: Ranking
Combined model outperforms either alone in regime of interest
ROC plot: true positive rate (sensitivity) vs. false positive rate (1 - specificity), both axes from 0 to 1
• Curves: random guessing; heuristic; combined; DM: bin. + 4k tern.
• Reference points: perfect classifier; classify all stable; classify all unstable
• Annotation along the curve: classifier becomes more conservative / less conservative
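A ROC curve like the one on this slide is obtained by sweeping a decision threshold over the model scores and recording the true and false positive rates at each step. The sketch below assumes that "stable" is the positive class and that a higher score means "more likely stable"; the function names, scores, and labels are made up for illustration.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points, threshold high to low.

    labels[i] is True when compound i is actually stable (assumed positive class).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: classify all unstable
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only when the next score differs, so ties share a threshold.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points  # last point is (1.0, 1.0): classify all stable


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


scores = [0.9, 0.8, 0.7, 0.4, 0.2]          # made-up stability scores
labels = [True, True, False, True, False]   # made-up ground truth
points = roc_points(scores, labels)
area = auc(points)
```

Raising the threshold moves along the curve toward the "classify all unstable" corner (a more conservative classifier), and lowering it moves toward "classify all stable"; random guessing corresponds to the diagonal with an area of 0.5.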
What happens when we rank “all possible ternaries” by their likelihood of stability?
Predictions for Discovery
Average of all A-B-X ternaries: fingerprint of entire unexplored ternary composition space
Interesting insights:
• Highest ranked ternary: SiYb3F5
• Si acts as an anion
• Validated with structure and DFT calculations
• pnictides, chalcogenides, halides
• Pt-X-Y
• Pm12S19Se – a missing binary Pm2S3?
Validation
Examples of discovered stable ternary compositions whose stability was explicitly confirmed with crystal structure prediction. Our method is successful at identifying new stable compounds across a wide variety of chemistries.
B. Meredig*, A. Agrawal*, S. Kirklin, J. E. Saal, J. W. Doak, A. Thompson, K. Zhang, A. Choudhary, and C. Wolverton, “Combinatorial screening for new materials in unconstrained composition space with machine learning”, Phys. Rev. B, 89, 094104, March 2014.
Summary
Steel Fatigue Strength Prediction
o NIMS database consisting of composition and processing parameters linked with performance (fatigue strength).
o Neural networks, decision trees, and multivariate polynomial regression achieved high R² values of >0.98.
Stable Compound Discovery
o A database of DFT calculations was used to learn composition-property relationships, thus mimicking DFT for estimating stability.
o The resulting predictive models were used to scan the entire ternary composition space to discover likely stable compositions.
o Many predictions were explicitly confirmed with crystal structure prediction and DFT.
Future Outlook
Processing
Structure
Properties
Performance
Goal/means
Cause and effect
Project I. Multi-objective Structure-Property Optimization
Project II. Multiscale Prediction of Localization Relationships
Project III. Exploring Composition-Processing-Property Relationships
Project IV. Composition-based Discovery of Stable Compounds
Publications
• A. Agrawal, P. D. Deshpande, A. Cecen, G. P. Basavarsu, A. N. Choudhary, and S. R. Kalidindi, “Exploration of data science techniques to predict fatigue strength of steel from composition and processing parameters,” Integrating Materials and Manufacturing Innovation, vol. 3, no. 8, pp. 1–19, 2014.
• B. Meredig, A. Agrawal, S. Kirklin, J. E. Saal, J. W. Doak, A. Thompson, K. Zhang, A. Choudhary, and C. Wolverton, “Combinatorial screening for new materials in unconstrained composition space with machine learning,” Physical Review B, vol. 89, no. 094104, pp. 1–7, 2014. BM and AA are co-first authors.
• R. Liu, A. Kumar, Z. Chen, A. Agrawal, V. Sundararaghavan, and A. Choudhary, “A predictive machine learning approach for microstructure optimization and materials design,” Scientific Reports, Nature Publishing Group, 2015, in press.
• R. Liu, Z. Chen, T. Fast, S. Kalidindi, A. Agrawal, and A. Choudhary, “Predictive Modeling in Characterizing Localization Relationships,” 2014 TMS Annual Meeting & Exhibition, Symposium of Data Analytics for Materials Science and Manufacturing, Feb. 16-20, 2014, San Diego, CA.
• R. Liu, A. Kumar, Z. Chen, A. Agrawal, V. Sundararaghavan, and A. Choudhary, “A Data Mining Approach in Structure-Property Optimization,” 2014 TMS Annual Meeting & Exhibition, Symposium of Data Analytics for Materials Science and Manufacturing, Feb. 16-20, 2014, San Diego, CA.
• P. D. Deshpande, B. P. Gautham, A. Cecen, S. Kalidindi, A. Agrawal, and A. Choudhary, “Application of Statistical and Machine Learning Techniques for Correlating Properties to Composition and Manufacturing Processes of Steels,” in 2nd World Congress on Integrated Computational Materials Engineering, July 7-11, 2013, Salt Lake City, Utah, pp. 155–160.
• R. Liu, Y. Yabansu, S. Kalidindi, A. Agrawal, and A. Choudhary, “Predictive Modeling in Characterizing Localization Relationships,” 2015, in preparation.
Thank You!
Ankit Agrawal
Research Associate Professor
Dept. of Electrical Engineering and Computer Science
Northwestern University
ankitag@eecs.northwestern.edu
www.eecs.northwestern.edu/~ankitag/