Materials informatics

Evgeny Blokhin

Chelyabinsk SUSU’2013 summer workshop

Max-Planck Institute for Solid State Research

Stuttgart, Germany

Outline

1. Data-mining in materials science

2. Blue Obelisk

3. Python programming language

What is data-mining?

statistics

databases

information theory

machine learning

artificial intelligence

optimization

Data-mining

Tasks of data-mining

1. Classification

2. Forecasting

3. Visualization

4. Reasoning

5. Analysis

6. Expert systems

Big data in materials science

EXAMPLE: over roughly the last 4 years,

my theoretician colleagues and I produced:

over 9000 simulation output files

over 50 articles

1. Accelrys Pipeline Pilot and Materials Studio, http://accelrys.com/products

2. AFLOW framework and Aflowlib.org repository, http://www.aflowlib.org

3. AIDA, Bosch LLC

4. Blue Obelisk Data Repository (XSLT, XML), http://bodr.sourceforge.net

5. CCLib (Python), http://cclib.sf.net

6. CDF (Python), http://kitchingroup.cheme.cmu.edu/cdf

7. CMR (Python), https://wiki.fysik.dtu.dk/cmr

8. Comp. Chem. Comparison and Benchmark Database, http://cccbdb.nist.gov

9. cctbx: Computational Crystallography Toolbox, http://cctbx.sourceforge.net

10. ESTEST (Python, XQuery), http://estest.ucdavis.edu

11. J-ICE online viewer (based on Jmol, Java), http://j-ice.sourceforge.net

12. Materials Project (Python), http://www.materialsproject.org

13. PAULING FILE, the world's largest database for inorganic compounds, http://paulingfile.com

14. Quixote, http://quixote.wikispot.org

15. Scipio (Java), https://scipio.iciq.es

16. WebMO: Web-based interface to computational chemistry packages (Java, Perl), http://webmo.net

New type of modeling software

…and smart codes

ENCUT = 500
IBRION = 2
ISIF = 3
NSW = 20
IDIOT = 3
NELMIN = 5
EDIFF = 1.0e-08
EDIFFG = -1.0e-08
IALGO = 38
ISMEAR = 0
LREAL = .FALSE.
LWAVE = .FALSE.

*** VASP MASTER: I AM SURE YOU KNOW WHAT YOU ARE DOING ***
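Input decks like the one above are plain "TAG = value" text, which makes them easy to harvest for data-mining. Below is a minimal sketch, assuming one tag per line and comments starting with # or !. The names parse_incar and convert_value are hypothetical helpers for illustration, not part of VASP or of any package listed in this talk.

```python
# Hypothetical helpers (not from VASP or any library): turn INCAR-style
# "TAG = value" text into a Python dict with typed values.

def convert_value(value):
    """Convert a raw string to bool, int or float where possible."""
    stripped = value.strip(".").lower()
    # Fortran-style logicals such as .TRUE. / .FALSE.
    if value.startswith(".") and stripped in ("true", "false", "t", "f"):
        return stripped in ("true", "t")
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value  # leave anything else as a string


def parse_incar(text):
    """Parse 'TAG = value' lines, skipping blanks and comments."""
    params = {}
    for raw_line in text.splitlines():
        # Assumption: '#' and '!' start a comment
        line = raw_line.split("#", 1)[0].split("!", 1)[0].strip()
        if "=" not in line:
            continue
        tag, value = (part.strip() for part in line.split("=", 1))
        params[tag.upper()] = convert_value(value)
    return params


incar_text = """
ENCUT = 500
IBRION = 2
ISIF = 3
NSW = 20
IDIOT = 3
NELMIN = 5
EDIFF = 1.0e-08
EDIFFG = -1.0e-08
IALGO = 38
ISMEAR = 0
LREAL = .FALSE.
LWAVE = .FALSE.
"""

params = parse_incar(incar_text)
print(params["ENCUT"], params["EDIFF"], params["LREAL"])
```

Once the tags are in a dict, thousands of such files can be loaded into a database and queried, which is the kind of workflow the tools listed earlier automate.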

d-metal oxides

band gap problem

standard DFT GGA approach

Hartree-Fock admixing

LCAO approximation

Usage of Gaussian basis sets

good atomization energy

Example of inference over an ontology

Open data, open standards, open source in chemistry

1. Elsevier, Wiley, Springer publishers are “evil”

2. “The right to read is the right to mine”

3. “Jailbreaking” the scientific data from PDFs: access, reuse, integrity

4. Why is the level of collaboration so low?

Materials Project

Prof. G. Ceder,

MIT, Boston

Guido van Rossum,

Google, Dropbox

http://goo.gl/FtFS7h

Python programming language

Advantages of Python

Syntax: indentation-based blocks, syntactic sugar, readability close to natural language, flexibility, expressiveness

VERY fast prototyping

Great popularity in scientific community

100% cross-platform and portable

Disadvantages of Python

Relatively slow compared with compiled languages such as C++ or Fortran

Global Interpreter Lock (GIL)

Historically not popular in some narrow scientific areas (where Java “reigns”)

Two examples

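squares = [x**2 for x in range(10)]  # list comprehension; the name "list" would shadow the built-in

numbers = [10, 4, 2, -1, 6]
small = list(filter(lambda x: x < 5, numbers))  # [4, 2, -1]; list() is needed in Python 3, where filter returns an iterator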

1. Multi-dimensional array manipulation (fast!)

2. Discrete Fourier transform

3. Linear Algebra

4. Mathematical functions

5. Matrix library

6. Polynomials

7. Set routines

8. Sorting, searching and counting

9. Statistics
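The list above reads like the routine families of NumPy (an assumption, since the slide heading did not survive in this text; the call on the next slide does use numpy). A short sketch touching several of these families follows. All data values are made up for illustration.

```python
import numpy as np

# 1. Multi-dimensional arrays: column means and broadcasting, no Python loops
a = np.arange(12, dtype=float).reshape(3, 4)
col_means = a.mean(axis=0)
centered = a - col_means  # the 1-D row is broadcast over all 3 rows

# 2. Discrete Fourier transform: find the dominant frequency bin
t = np.arange(64)
signal = np.sin(2 * np.pi * 4 * t / 64)  # exactly 4 periods in 64 samples
spectrum = np.abs(np.fft.rfft(signal))
peak_bin = int(np.argmax(spectrum))

# 3. Linear algebra: solve the system A x = b
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)

# 6. Polynomials: fit a parabola through three points of y = x^2 + x + 1
coeffs = np.polyfit([0, 1, 2], [1, 3, 7], 2)

# 7-8. Set routines and sorting
common = np.intersect1d([1, 2, 3, 4], [3, 4, 5])
order = np.argsort([3, 1, 2])

# 9. Statistics
spread = np.std(a)

print(col_means, peak_bin, x, coeffs, common, order, spread)
```

The point of the "(fast!)" remark is that each of these calls runs in compiled code over whole arrays, which offsets much of the interpreter overhead named among the disadvantages.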

Solving the eigenvalue problem for a dynamical matrix (phonopy code):

eigvals, eigvecs = numpy.linalg.eigh(dynmat)
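To make the eigh call concrete, here is a self-contained toy case: the 2×2 dynamical matrix of a one-dimensional diatomic chain. The model, the function name diatomic_dynmat and all parameter values are assumptions chosen for illustration; none of this is taken from phonopy.

```python
import numpy as np

def diatomic_dynmat(q, k=1.0, m1=1.0, m2=2.0, a=1.0):
    """Mass-weighted dynamical matrix of a 1D diatomic chain (toy model).

    q: wave vector, k: spring constant, m1/m2: masses, a: lattice constant.
    """
    off = -k * (1.0 + np.exp(-1j * q * a)) / np.sqrt(m1 * m2)
    return np.array([[2.0 * k / m1, off],
                     [np.conj(off), 2.0 * k / m2]])

dynmat = diatomic_dynmat(q=0.0)

# eigh is meant for Hermitian matrices: it returns real eigenvalues in
# ascending order and orthonormal eigenvectors as columns.
eigvals, eigvecs = np.linalg.eigh(dynmat)

# Eigenvalues are squared frequencies; clip tiny negative round-off
# before taking the square root.
freqs = np.sqrt(np.clip(eigvals, 0.0, None))

print(eigvals)
print(freqs)
```

At q = 0 the lower eigenvalue vanishes (acoustic mode) and the upper one equals 2k(1/m1 + 1/m2), the optical mode, which gives a quick sanity check on the matrix.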
