
Compressed sensing meets symbolic regression: SISSO

- Part 2 -

Luca M. Ghiringhelli

On-line course on Big Data and Artificial Intelligence in Materials Sciences

Compressed sensing, not only LASSO

P = c1d1 + c2d2 + … + cndn

[Figure: the property P is fitted iteratively: at each step the current residual (Residual1, …) is projected onto the candidate features (d1, d2, d1*d2, …).]

Greedy method: Orthogonal Matching Pursuit

Limitation of greedy methods: each feature is selected once and never reconsidered, so the final set need not be the best one of its size (see the sketch below).
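
A minimal numpy sketch of the OMP loop (illustrative names; assumes the columns of the feature matrix D are standardized and P is centered):

```python
import numpy as np

def omp(D, P, n_nonzero):
    """Orthogonal Matching Pursuit: greedily pick the feature most
    correlated with the current residual, then refit all selected
    features together by least squares."""
    residual, selected = P.copy(), []
    for _ in range(n_nonzero):
        scores = np.abs(D.T @ residual)   # correlation with the residual
        scores[selected] = -np.inf        # a picked feature is never revisited
        selected.append(int(np.argmax(scores)))
        coef, *_ = np.linalg.lstsq(D[:, selected], P, rcond=None)
        residual = P - D[:, selected] @ coef
    return selected, coef
```

The limitation shows directly in the code: nothing after `selected.append(...)` can ever remove an index again.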

Compressed sensing: SISSO

SIS: Sure-Independence Screening

[Figure: SISSO schematic; the property P is screened against all features to select a subspace S1D, then Residual1D is screened against the features to select S2D, and so on.]



SO: Sparsifying Operator

Similarity criterion in the SIS step:
● Scalar product (Pearson correlation)
● Spearman correlation (captures nonlinear monotonicity)
● Mutual information, …
● However: the computational cost must be factored in
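
The first two criteria can be sketched in a few lines of Python (illustrative function name `sis_screen`; assumes D holds one candidate feature per column):

```python
import numpy as np
from scipy.stats import spearmanr

def sis_screen(D, target, k):
    """Return the indices of the k features most similar to the target
    (the property, or the current residual) under two criteria."""
    # Pearson correlation: scalar product of standardized columns
    Dz = (D - D.mean(axis=0)) / D.std(axis=0)
    tz = (target - target.mean()) / target.std()
    pearson = np.abs(Dz.T @ tz) / len(tz)
    # Spearman rank correlation: captures nonlinear but monotonic relations
    spearman = np.abs([spearmanr(D[:, j], target)[0] for j in range(D.shape[1])])
    return np.argsort(pearson)[::-1][:k], np.argsort(spearman)[::-1][:k]
```

The per-column Spearman loop already shows the cost issue: anything beyond one matrix-vector product becomes expensive when the candidate features number in the billions.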


Exact solution of: argmin_c ||P − D·c||² subject to ||c||₀ ≤ n (the ℓ0 problem), by enumeration over the union of the SIS-selected subspaces S1D, S2D, …

Ouyang et al., PRM 2018, DOI: 10.1103/PhysRevMaterials.2.083802


In practice:
0. i = 1, S = Ø
1. Rank features according to similarity to Residual_(i−1) (Property = Residual_0).
2. Add the first k features to S.
3. Perform least-squares regression over all i-tuples in S.
4. The lowest-error model is the i-dimensional SISSO model.
5. i ← i+1; go to 1.
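
The steps above translate almost line by line into a minimal sketch (numpy, Pearson-type similarity, no intercept; `sisso` and its arguments are illustrative, not the published implementation):

```python
import numpy as np
from itertools import combinations

def sisso(D, P, max_dim, k):
    """Minimal SISSO following steps 0-5 above: SIS against the
    current residual, then SO as exact enumeration of all i-tuples."""
    S, residual = [], P.copy()                         # 0. i = 1, S = empty
    Dz = (D - D.mean(axis=0)) / D.std(axis=0)          # standardized features
    for i in range(1, max_dim + 1):
        # 1.-2. rank by |Pearson correlation| with the residual, grow S by k
        scores = np.abs(Dz.T @ (residual - residual.mean()))
        S += [j for j in np.argsort(scores)[::-1] if j not in S][:k]
        # 3.-4. least-squares fit of every i-tuple in S; keep the best
        best = (np.inf, None, None)
        for combo in combinations(S, i):
            cols = list(combo)
            c, *_ = np.linalg.lstsq(D[:, cols], P, rcond=None)
            err = np.linalg.norm(P - D[:, cols] @ c)
            if err < best[0]:
                best = (err, cols, c)
        _, cols, c = best
        residual = P - D[:, cols] @ c                  # 5. next residual
    return cols, c
```

Step 3 is what separates SISSO from a purely greedy pursuit: within the screened subspace S the ℓ0 problem is solved exactly, so the best single feature is free to drop out of the best pair.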

Predicting crystal structures from the composition

P = c1d1 + c2d2 + … + cndn

Octet binaries (NaCl, ZnS, BN, KF, GaAs, CaO, …): rock-salt or zinc-blende structure?

Learning the relative stability from the properties of the isolated atomic species:

Rock salt: 6-fold coordination, ionic bonding
Zinc blende: 4-fold coordination, covalent bonding

Atomic features

[Figure: Kohn-Sham (KS) levels [eV] for the example of Sn (tin): the valence s and valence p (HOMO) levels and the LUMO, together with the radius at the maximum of each valence orbital.]

Systematic construction of candidates

[Figure: candidate features are built as trees by recursively applying unary operators (exp(x), exp(-x), ln(x), x^n, arctan(x)) and binary operators (x + y, x·y, x / y, |x − y|) to the primary features, e.g., Energy1, Energy2, Length1, Length2.]

P = c1d1 + c2d2 + … + cndn

Each feature (a column of the matrix) is a tree-represented candidate function evaluated on the training data. The selected descriptor has as its components the features picked by the sparse-recovery algorithm (here, SISSO).
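
A sketch of one level of this construction on toy data (the feature names, array sizes, and values are placeholders; a real implementation also tracks physical units so that, e.g., only energies are subtracted from energies):

```python
import numpy as np

# Toy primary features, one value per material (placeholder data only)
rng = np.random.default_rng(0)
primary = {"Energy1": rng.uniform(1, 5, 82), "Energy2": rng.uniform(1, 5, 82),
           "Length1": rng.uniform(1, 3, 82), "Length2": rng.uniform(1, 3, 82)}

unary = {"exp({})": np.exp, "exp(-{})": lambda x: np.exp(-x),
         "ln({})": np.log, "({})^2": np.square, "arctan({})": np.arctan}
binary = {"({}+{})": np.add, "({}*{})": np.multiply,
          "({}/{})": np.divide, "|{}-{}|": lambda x, y: np.abs(x - y)}

# One level of the tree: apply every operator to every feature (pair);
# a full run would restrict |x-y| to same-unit features and keep y/x too.
candidates = dict(primary)
for nx, x in primary.items():
    for op, f in unary.items():
        candidates[op.format(nx)] = f(x)
    for ny, y in primary.items():
        if nx < ny:                      # each unordered pair once
            for op, f in binary.items():
                candidates[op.format(nx, ny)] = f(x, y)

print(len(candidates))   # the feature space grows combinatorially with depth
```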

Structure map from SISSO, starting from 7×2 atomic features.

LMG et al., PRL 2015, DOI: 10.1103/PhysRevLett.114.105503
LMG et al., NJP 2017, DOI: 10.1088/1367-2630/aa57bf

Data-driven model complexity

In SISSO, the “hyperparameters” are:

● The level of sparsity, i.e., the number of “activated” features in P = c1d1 + c2d2 + …
● The size of the feature space, determined by the complexity of the tree

Tuned via cross-validation: iterated random selection of a subset of the data for training + test on the left-out set (sketched below).

Ouyang et al., J. Phys. Mater. 2019, DOI: 10.1088/2515-7639/ab077b


Two levels of the tree: formulas like …
Three levels of the tree: formulas like …

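
A sketch of that tuning loop, assuming a `sisso(D, P, max_dim, k)` function like the one sketched earlier; `ShuffleSplit` implements the iterated random train/test selection:

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

def cv_rmse(D, P, max_dim, k, n_splits=30):
    """Validation error of a SISSO model of given dimensionality,
    averaged over iterated random train/test splits."""
    errs = []
    splitter = ShuffleSplit(n_splits=n_splits, test_size=0.2, random_state=0)
    for train, test in splitter.split(D):
        cols, c = sisso(D[train], P[train], max_dim=max_dim, k=k)
        pred = D[test][:, cols] @ c
        errs.append(np.sqrt(np.mean((P[test] - pred) ** 2)))
    return np.mean(errs)

# Pick the sparsity level (model dimension) with the lowest validation error:
# best_dim = min((cv_rmse(D, P, d, k=10), d) for d in (1, 2, 3))[1]
```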

A few bits of taxonomy for SISSO

Compressed-sensing-based model identification shares concepts with:

● Regularized regression. But: massive sparsification.
● Dimensionality reduction. But: supervised, and yielding sparse, “interpretable” descriptors.
● Feature (basis-set) selection. But: non-greedy solver.
● Symbolic regression. But: deterministic solver.


Open challenges of the symbolic regression + compressed sensing approach:
● Efficiently include constants and scaling factors in the symbolic tree
● Include known physical invariances in the symbolic-tree construction
● Include vectors (and tensors) as features. Contractions?

Interpretability

[Figure: interpretability vs. flexibility/complexity of ML methods, after James, Witten, Hastie, Tibshirani, An Introduction to Statistical Learning, Springer (2013). From most interpretable to most flexible: sparsifying methods (LASSO, SISSO, symbolic regression), linear regression, kernelized regression, trees, forests, support vector machines, neural networks.]

Model interpretability: related to sparse feature selection.


In general, with symbolic regression:
● If the exact equation is within reach of the searching/optimizing algorithm, it is found. A simple model does not necessarily mean a less accurate one. For other powerful ML methods (kernel regression, regression trees and forests, deep learning), this is not the case.
● The few fitting parameters yield stability with respect to noise (low complexity → no overfitting).

Interpretability: what it might endow us with

[Formula not recovered; legend: x = atomic fraction, IE = ionization energy, χ = electronegativity.]

Interpretability: what it might endow us with

[Figure: pressure-induced phase transitions: HgTe (std pressure: ZB; 9 GPa: RS), GaAs (std pressure: ZB; 29 GPa: oI4), CdTe (std pressure: ZB; 4 GPa: RS).]

Multi-task learning

Application: multi-phase stability diagram. Properties: crystal-structure formation energies.

[Figure: stability regions of the competing structures (e.g., RS, CsCl) in the plane of the descriptor components d1 and d2.]

MT-SISSO is remarkably data-parsimonious.

Ouyang et al., J. Phys. Mater. 2019, DOI: 10.1088/2515-7639/ab077b
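
In MT-SISSO the tasks (here, the formation energies of the competing crystal structures) share a single descriptor, while the fit coefficients remain task-specific. A minimal sketch of that shared-descriptor fit (illustrative names; assumes the candidate columns `cols` are given):

```python
import numpy as np

def multitask_fit(D, P_tasks, cols):
    """Score one shared descriptor (columns `cols`) on several tasks:
    coefficients are fit per task, errors are summed over tasks."""
    coefs, total_err = [], 0.0
    for P in P_tasks:                  # one property vector per task
        mask = ~np.isnan(P)            # a task may lack data for some materials
        Dm = D[mask][:, cols]
        c, *_ = np.linalg.lstsq(Dm, P[mask], rcond=None)
        coefs.append(c)
        total_err += np.sum((P[mask] - Dm @ c) ** 2)
    return coefs, total_err
```

Because all tasks contribute to one selection score, a material with data for only some structures still informs the shared descriptor, which is where the data parsimony comes from.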


