Transcript
  • Compressed sensing meets symbolic regression: SISSO

    - Part 2 -

    Luca M. Ghiringhelli

    On-line course on Big Data and Artificial Intelligence in Materials Sciences

  • Compressed sensing, not only LASSO

    P = c1d1 + c2d2 + … + cndn

  • Compressed sensing, not only LASSO

    Greedy method: Orthogonal Matching Pursuit

    P = c1d1 + c2d2 + … + cndn

    Figure: the property P is compared with the candidate features d1, d2, …; the best-matching feature (d1*) is selected, and the part of P it does not explain (Residual1) is carried over to the next iteration, where d2* is selected against it.

    Limitation of greedy methods: once a feature has been selected it is never reconsidered, so strongly correlated candidate features can lead to a suboptimal model.
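    A minimal NumPy sketch of the Orthogonal Matching Pursuit loop described above (the names D for the feature matrix and P for the property vector are illustrative, not from the slides): at each step the feature most correlated with the current residual is added, all selected coefficients are refit by least squares, and the residual is updated.

        import numpy as np

        def omp(D, P, n_nonzero):
            """Orthogonal Matching Pursuit: greedily select columns of D to fit P.

            D : (n_samples, n_features) matrix of candidate features (columns standardized)
            P : (n_samples,) property vector
            n_nonzero : number of features to activate
            """
            residual = P.copy()
            selected = []                     # indices of the chosen features
            coefs = np.zeros(D.shape[1])
            for _ in range(n_nonzero):
                # pick the feature most correlated with the current residual
                scores = np.abs(D.T @ residual)
                if selected:
                    scores[selected] = -np.inf   # never re-pick an already selected feature
                selected.append(int(np.argmax(scores)))
                # refit all selected coefficients by least squares (the "orthogonal" step)
                c, *_ = np.linalg.lstsq(D[:, selected], P, rcond=None)
                residual = P - D[:, selected] @ c
            coefs[selected] = c
            return coefs, selected

    Note that the indices chosen in early iterations are never revisited; SISSO relaxes exactly this constraint by enumerating combinations within a screened subset, as the next slides show.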

  • Compressed sensing: SISSO

    SIS: Sure-Independence Screening

    Schematic: the candidate features most similar to the property P are screened into a subset S1D; the residual of the 1D model (Residual1D) is then used to screen a further subset S2D, and so on.

    Ouyang et al., PRM 2018, DOI: 10.1103/PhysRevMaterials.2.083802

  • Compressed sensing: SISSO

    SIS: Sure-Independence Screening

    SO: Sparsifying Operator, exact (by enumeration) over the screened features

    Similarity criterion in the SIS step:

    ● Scalar product (Pearson correlation)

    ● Spearman correlation (captures nonlinear monotonicity)

    ● Mutual information, …

    ● However: the computational cost is to be factored in
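    As a concrete illustration of the screening criteria listed above, here is a minimal sketch (the names features, residual, and k are mine) that ranks candidate features against the current residual by either Pearson or Spearman correlation and keeps the top k; mutual information would slot in the same way, at a higher computational cost.

        import numpy as np
        from scipy.stats import spearmanr

        def sis_screen(features, residual, k, criterion="pearson"):
            """Sure-Independence Screening: keep the k features most similar to the residual.

            features : (n_samples, n_features) array of candidate features
            residual : (n_samples,) current residual (the property itself in the first iteration)
            """
            scores = np.empty(features.shape[1])
            for j in range(features.shape[1]):
                if criterion == "pearson":
                    # scalar product of standardized vectors = Pearson correlation
                    scores[j] = abs(np.corrcoef(features[:, j], residual)[0, 1])
                elif criterion == "spearman":
                    # rank correlation: captures nonlinear but monotonic relations
                    rho, _ = spearmanr(features[:, j], residual)
                    scores[j] = abs(rho)
                else:
                    raise ValueError(f"unknown criterion: {criterion}")
            # indices of the k best-scoring features
            return np.argsort(scores)[::-1][:k]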

  • Compressed sensing: SISSO

    SIS: Sure-Independence Screening

    SO: Sparsifying Operator

    Exact solution of the ℓ0-regularized least-squares problem, obtained by enumeration over all tuples of the SIS-selected features.

    Ouyang et al., PRM 2018, DOI: 10.1103/PhysRevMaterials.2.083802

  • Compressed sensing: SISSO

    SIS: Sure-Independence Screening

    SO: Sparsifying Operator, exact (by enumeration) over the SIS-selected features

    In practice:
    0. i = 1, S = Ø
    1. Rank the features according to their similarity to Residual(i-1) (the property itself is Residual(0)).
    2. Add the first k features to S.
    3. Perform least-squares regression over all i-tuples in S.
    4. The lowest-error model is the i-dimensional SISSO model.
    5. i ← i+1; go to 1.

    P = c1d1 + c2d2 + … + cndn
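    The "in practice" recipe above maps almost line by line onto code. The following is my own minimal NumPy sketch (illustrative names; the SO step is a brute-force enumeration with itertools.combinations): screen the k features most correlated with the current residual, add them to S, exhaustively fit every i-tuple drawn from S, keep the best model, and iterate.

        import itertools
        import numpy as np

        def sisso(D, P, k=20, max_dim=3):
            """Minimal SISSO loop: SIS screening + exact sparsifying operator by enumeration.

            D : (n_samples, n_features) candidate-feature matrix
            P : (n_samples,) property vector
            k : number of features added to S per SIS iteration
            max_dim : highest descriptor dimension to try
            """
            S = []                                   # union of screened feature indices
            residual = P.copy()
            models = []                              # (rmse, indices, coefficients) per dimension
            for dim in range(1, max_dim + 1):
                # SIS: rank features by |correlation| with the current residual, add the top k
                scores = np.abs([np.corrcoef(D[:, j], residual)[0, 1] for j in range(D.shape[1])])
                S += [j for j in np.argsort(scores)[::-1] if j not in S][:k]
                # SO: exact l0 solution by enumerating every dim-tuple of S
                best = None
                for combo in itertools.combinations(S, dim):
                    idx = list(combo)
                    X = D[:, idx]
                    c, *_ = np.linalg.lstsq(X, P, rcond=None)
                    rmse = np.sqrt(np.mean((P - X @ c) ** 2))
                    if best is None or rmse < best[0]:
                        best = (rmse, idx, c)
                models.append(best)
                residual = P - D[:, best[1]] @ best[2]   # residual of the best dim-D model
            return models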

  • Predicting crystal structures from the composition

    Octet binaries (NaCl, ZnS, BN, KF, GaAs, CaO, …): rock-salt or zinc-blende structure?

    Learning the relative stability from the properties of the isolated atomic species

    Rock salt: 6-fold coordination, ionic bonding

    Zinc blende: 4-fold coordination, covalent bonding

  • Atomic features

    Figure: Kohn-Sham (KS) levels [eV] of the valence s and valence p states (HOMO, LUMO) and the radius at the maximum of the corresponding orbitals; example: Sn (tin).

  • Systematic construction of candidates

    Figure: primary features (e.g., Energy1, Energy2, Length1, Length2) are combined through unary and binary operators, such as x + y, x·y, x / y, | x - y |, x^n, exp(x), exp(-x), ln(x), arctan(x), to build tree-represented candidate functions.

  • Systematic construction of candidates

    P = c1d1 + c2d2 + … + cndn

    Each feature (a column of the matrix) is a tree-represented candidate function, projected onto the training data. The (selected) descriptor has as components the features selected by the sparse-recovery algorithm (here, SISSO). A minimal sketch of the construction follows.
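    To make the tree construction concrete, here is a minimal sketch (the function name expand_features and the primary-feature names are illustrative) that applies one round of the unary and binary operators shown on the previous slide to a dictionary of primary features; applying it again to its own output deepens the tree by one level. In a real SISSO run only dimensionally compatible features would be combined, and units would be tracked.

        import numpy as np

        def expand_features(primary):
            """One level of the feature tree: apply unary and binary operators to all inputs.

            primary : dict mapping a feature's formula (string) to its values (1D array)
            Returns a dict with the newly generated candidate features.
            """
            candidates = {}
            names = list(primary)
            # unary operators
            for a in names:
                x = primary[a]
                candidates[f"exp({a})"] = np.exp(x)
                candidates[f"exp(-{a})"] = np.exp(-x)
                candidates[f"({a})^2"] = x ** 2
                if np.all(x > 0):                          # ln only where it is defined
                    candidates[f"ln({a})"] = np.log(x)
            # binary operators
            for i, a in enumerate(names):
                for b in names[i + 1:]:
                    x, y = primary[a], primary[b]
                    candidates[f"({a}+{b})"] = x + y
                    candidates[f"|{a}-{b}|"] = np.abs(x - y)
                    candidates[f"({a}*{b})"] = x * y
                    if np.all(np.abs(y) > 1e-12):          # avoid division by zero
                        candidates[f"({a}/{b})"] = x / y
            return candidates

        # Usage (arbitrary toy arrays, for illustration only):
        rng = np.random.default_rng(0)
        primary = {"E_s": rng.normal(size=5), "r_p": rng.normal(size=5)}
        two_levels = expand_features(primary)                         # one operator applied
        three_levels = expand_features({**primary, **two_levels})     # one more level of the tree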

  • Predicting crystal structures from the composition

    Structure map from SISSO, starting from 7×2 atomic features

    P = c1d1 + c2d2 + …

    LMG et al., PRL 2015, DOI: 10.1103/PhysRevLett.114.105503
    LMG et al., NJP 2017, DOI: 10.1088/1367-2630/aa57bf

  • In SISSO, the “hyperparameters” are:

    The level of sparsity, i.e., the number of “activated” features in P = c1d1 + c2d2 + …

    The size of the feature space, determined by the complexity of the tree

    Tuned via cross-validation: iterated random selection of a subset of the data for training + test on the left-out set

    Data-driven model complexity

    Ouyang et al. J. Phys. Mater. 2019 DOI: 10.1088/2515-7639/ab077b

  • In SISSO, the “hyperparameters” are:

    The level of sparsity, i.e., the number of “activated” features in P = c1d1 + c2d2 + …

    The size of the feature space, determined by the complexity of the tree

    Tuned via cross-validation: iterated random selection of a subset of the data for training + test on the left-out set (a minimal sketch of such a loop is given below)

    Two levels of the tree: formulas with up to two nested operators

    Three levels of the tree: formulas with up to three nested operators

    Data-driven model complexity

    Ouyang et al. J. Phys. Mater. 2019 DOI: 10.1088/2515-7639/ab077b
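    The cross-validation referred to above can be sketched as follows; this is a minimal, illustrative version (split sizes and number of splits are arbitrary, and sisso() is the toy function from the earlier sketch): repeatedly draw a random train/test split, fit SISSO models of increasing dimension on the training part, and keep the dimension with the lowest average test error. Running the same loop over feature spaces built with different tree depths selects the complexity in the same way.

        import numpy as np

        def cv_select_dimension(D, P, max_dim=3, n_splits=20, test_fraction=0.2, seed=0):
            """Pick the descriptor dimension by iterated random train/test splits."""
            rng = np.random.default_rng(seed)
            n = len(P)
            n_test = max(1, int(test_fraction * n))
            test_errors = np.zeros(max_dim)
            for _ in range(n_splits):
                perm = rng.permutation(n)
                test, train = perm[:n_test], perm[n_test:]
                # fit 1D ... max_dim-D models on the training split (sisso() from the earlier sketch)
                models = sisso(D[train], P[train], max_dim=max_dim)
                for dim, (_, idx, c) in enumerate(models, start=1):
                    pred = D[np.ix_(test, idx)] @ c
                    test_errors[dim - 1] += np.sqrt(np.mean((P[test] - pred) ** 2))
            test_errors /= n_splits
            # return the dimension with the lowest mean test RMSE, plus the per-dimension errors
            return int(np.argmin(test_errors)) + 1, test_errors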

  • A few bits of taxonomy for SISSO

    Compressed-sensing-based model identification shares concepts with:

    ● Regularized regression. But: massive sparsification.

    ● Dimensionality reduction. But: supervised, and yielding sparse, “interpretable” descriptors.

    ● Feature (basis-set) selection. But: non-greedy solver.

    ● Symbolic regression. But: deterministic solver.

  • A few bits of taxonomy for SISSO

    Compressed-sensing-based model identification shares concepts with:

    ● Regularized regression. But: massive sparsification.

    ● Dimensionality reduction. But: supervised, and yielding sparse, “interpretable” descriptors.

    ● Feature (basis-set) selection. But: non-greedy solver.

    ● Symbolic regression. But: deterministic solver.

    Open challenges of the symbolic regression + compressed sensing approach:

    ● Efficiently include constants and scaling factors in the symbolic tree

    ● Include known physical invariances in the symbolic-tree construction

    ● Include vectors (and tensors) as features. Contractions?

  • Interpretability

    Model interpretability: related to sparse feature selection

    Figure (after James, Witten, Hastie, Tibshirani, An Introduction to Statistical Learning, Springer, 2013): interpretability versus flexibility/complexity. From most interpretable and least flexible to least interpretable and most flexible:

    ● Sparsifying methods: LASSO, SISSO, symbolic regression

    ● Linear regression

    ● Kernelized regression

    ● Trees

    ● Forests

    ● Support vector machines

    ● Neural networks

  • Interpretability

    Model interpretability: related to sparse feature selection

    Figure (after James, Witten, Hastie, Tibshirani, An Introduction to Statistical Learning, Springer, 2013): interpretability versus flexibility/complexity, from sparsifying methods (LASSO, SISSO, symbolic regression) and linear regression down to kernelized regression, trees, forests, support vector machines, and neural networks.

    In general, with symbolic regression:

    ● If the exact equation is within reach of the searching/optimizing algorithm, it is found. A simple model does not necessarily mean a less accurate one. For other powerful ML methods (kernel regression, regression trees and forests, deep learning), this is not the case.

    ● The few fitting parameters yield stability with respect to noise (low complexity → no overfitting).

  • Interpretability: what it might endow us with

    Legend: x = atomic fraction, IE = ionization energy, χ = electronegativity

  • Interpretability: what it might endow us with

    Figure: HgTe, GaAs, and CdTe, all zinc-blende (ZB) at standard pressure, together with their high-pressure phases (RS at 9 GPa, oI4 at 29 GPa, and RS at 4 GPa, respectively), placed on the structure map.

  • Multi-task learning

  • Multi-task learning

    Application: multi-phase stability diagram. Properties: crystal-structure formation energies.

    Figure: stability map in the (d1, d2) descriptor plane, with regions labelled RS and CsCl.

  • Multi-task learning

    Ouyang et al. J. Phys. Mater. 2019 DOI: 10.1088/2515-7639/ab077b

  • Multi-task learning

    MT-SISSO is remarkably data-parsimonious

    Ouyang et al. J. Phys. Mater. 2019 DOI: 10.1088/2515-7639/ab077b
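    The slides do not spell out how MT-SISSO works, so the following is only a minimal sketch of the multi-task idea as described in the cited Ouyang et al. (2019) paper, with illustrative names: a single descriptor (set of selected features) is shared by all tasks (here, the formation energies of the different crystal structures), each task keeps its own least-squares coefficients, and candidate descriptors are ranked by the error summed over tasks. Because every task contributes to selecting the shared descriptor, few samples per task can suffice, which is the data-parsimony noted above.

        import itertools
        import numpy as np

        def mt_so(tasks, S, dim):
            """Multi-task sparsifying operator (sketch): one shared descriptor, per-task coefficients.

            tasks : list of (D_t, P_t) pairs, one per property/task; the columns of every D_t
                    correspond to the same candidate features
            S     : indices of the SIS-screened candidate features
            dim   : descriptor dimension
            """
            best = None
            for combo in itertools.combinations(S, dim):
                idx = list(combo)
                total, coefs = 0.0, []
                for D_t, P_t in tasks:                       # separate linear fit for each task
                    c, *_ = np.linalg.lstsq(D_t[:, idx], P_t, rcond=None)
                    total += np.sqrt(np.mean((P_t - D_t[:, idx] @ c) ** 2))
                    coefs.append(c)
                if best is None or total < best[0]:          # rank descriptors by summed error
                    best = (total, idx, coefs)
            return best   # (total error, shared feature indices, list of per-task coefficients)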
