Compressed sensing meets symbolic regression: SISSO


  • Compressed sensing meets

    symbolic regression: SISSO

    - Part 2 -

    Luca M. Ghiringhelli

    On-line course on Big Data and Artificial Intelligence in Materials Sciences

  • P = c1d1 + c2d2 + … + cndn

    Compressed sensing, not only LASSO

  • Compressed sensing, not only LASSO

    Greedy method: Orthogonal Matching Pursuit

    P = c1d1 + c2d2 + … + cndn

    [Figure: OMP build-up: the property P is first fitted with a single feature (e.g. d1); the residual (Residual1) is then fitted with the next selected feature (e.g. d2), and so on]

    Limitation of greedy methods: once a feature is selected it is never reconsidered, so strongly correlated candidate features can lead to suboptimal models.
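The greedy step can be made concrete with a minimal sketch (my own NumPy illustration, not code from the lecture): at each iteration OMP selects the feature most correlated with the current residual, then refits all selected coefficients by least squares.

```python
import numpy as np

def omp(D, P, n_nonzero):
    """Orthogonal Matching Pursuit (minimal sketch)."""
    residual = P.copy()
    support = []                      # indices of selected features
    coef = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        # greedy selection: feature with the largest |overlap| with the residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        support.append(j)
        # "orthogonal" part: refit all coefficients on the current support
        c, *_ = np.linalg.lstsq(D[:, support], P, rcond=None)
        residual = P - D[:, support] @ c
    coef[support] = c
    return coef

# toy check: the property is built from two of five random features
rng = np.random.default_rng(0)
D = rng.normal(size=(50, 5))
P = 2.0 * D[:, 1] - 3.0 * D[:, 3]
c = omp(D, P, n_nonzero=2)
```

Because the residual is re-orthogonalized after every refit, the same feature is never picked twice; the limitation is that an early, suboptimal pick is never undone.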

  • Compressed sensing: SISSO

    SIS: Sure-Independence Screening

    [Figure: SIS screens the candidate features against the property P to select a subset S1D, and against the 1D residual to select S2D]

    Ouyang et al., PRM 2018, DOI: 10.1103/PhysRevMaterials.2.083802

  • Compressed sensing: SISSO

    SIS: Sure-Independence Screening
    SO: Sparsifying Operator

    [Figure: SIS screens the candidate features against the property P (subset S1D) and against the 1D residual (subset S2D)]

    Similarity criterion in the SIS step:
    ● Scalar product (Pearson correlation)
    ● Spearman correlation (captures nonlinear monotonicity)
    ● Mutual information, …
    ● However: the computational cost is to be factored in
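The similarity criteria listed above are cheap to sketch. The helper names below are my own (a NumPy illustration, not the SISSO implementation); the example shows how Spearman catches a monotonic but nonlinear dependence:

```python
import numpy as np

def pearson_scores(X, r):
    """|Pearson correlation| of each column of X with the vector r."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    rs = (r - r.mean()) / r.std()
    return np.abs(Xs.T @ rs) / len(r)

def spearman_scores(X, r):
    """Spearman = Pearson on ranks; captures nonlinear monotonic relations."""
    Xr = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    rr = np.argsort(np.argsort(r)).astype(float)
    return pearson_scores(Xr, rr)

def sis_select(X, r, k, score=pearson_scores):
    """SIS step: keep the k features most similar to the current residual."""
    return np.argsort(score(X, r))[::-1][:k]

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
y = X[:, 2] ** 3                 # monotonic but nonlinear in feature 2
top = sis_select(X, y, k=2, score=spearman_scores)
```

Since ranks are invariant under any monotonic transformation, the cubic relation above gives a perfect Spearman score, while Pearson would report less than full correlation.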

  • Compressed sensing: SISSO

    SIS: Sure-Independence Screening
    SO: Sparsifying Operator

    [Figure: SIS screens the candidate features against the property P (subset S1D) and against the 1D residual (subset S2D)]

    Exact solution of the sparse (ℓ0) least-squares problem, i.e. minimize ||P − Dc||² with a fixed number of nonzero coefficients, by enumeration over the SIS-selected subset.

    Ouyang et al., PRM 2018, DOI: 10.1103/PhysRevMaterials.2.083802

  • Compressed sensing: SISSO

    SIS: Sure-Independence Screening
    SO: Sparsifying Operator

    [Figure: SIS screens the candidate features against the property P (subset S1D) and against the 1D residual (subset S2D)]

    Exact solution (by enumeration) over the SIS-selected subset. In practice:
    0. i = 1, S = Ø
    1. Rank the features according to similarity to Residual(i−1) (Residual0 = Property).
    2. Add the first k features to S.
    3. Perform least-squares regression over all i-tuples in S.
    4. The lowest-error model is the i-dimensional SISSO model.
    5. i ← i+1; go to 1.

    P = c1d1 + c2d2 + … + cndn
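The five steps above can be sketched end to end (a compact NumPy illustration under simplifying assumptions: a Pearson-like similarity score, no cross-validation, brute-force enumeration at every dimension):

```python
import numpy as np
from itertools import combinations

def sisso(D, P, max_dim, k):
    """Minimal SISSO sketch: SIS screening followed by the sparsifying
    operator (exhaustive least squares over tuples inside S)."""
    residual, S, models = P.copy(), [], []
    for dim in range(1, max_dim + 1):
        # steps 1-2, SIS: rank by |correlation| with the residual, add top k to S
        Dc = D - D.mean(axis=0)
        scores = np.abs(Dc.T @ (residual - residual.mean()))
        S += [j for j in np.argsort(scores)[::-1] if j not in S][:k]
        # steps 3-4, SO: enumerate all dim-tuples in S, keep the lowest-error fit
        best = (np.inf, None, None)
        for tup in combinations(S, dim):
            cols = list(tup)
            c, *_ = np.linalg.lstsq(D[:, cols], P, rcond=None)
            err = np.sum((P - D[:, cols] @ c) ** 2)
            if err < best[0]:
                best = (err, cols, c)
        _, cols, c = best
        models.append((cols, c))
        residual = P - D[:, cols] @ c   # step 5: next iteration screens this
    return models

rng = np.random.default_rng(0)
D = rng.normal(size=(80, 10))
P = 1.5 * D[:, 2] + 0.5 * D[:, 7]
models = sisso(D, P, max_dim=2, k=3)
```

The cost of the SO step grows combinatorially with the dimension, which is why SIS must first shrink the feature pool to a small S.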

  • Predicting crystal structures from the composition

    Octet binaries (NaCl, ZnS, BN, KF, GaAs, CaO, …): rock-salt or zinc-blende structure?

    Learning the relative stability from the properties of the isolated atomic species

    Rock salt: 6-fold coordination, ionic bonding
    Zinc blende: 4-fold coordination, covalent bonding

  • Atomic features

    [Figure: Kohn-Sham levels (eV) of the isolated atoms, example Sn (tin): valence s and valence p (HOMO) levels, the LUMO, and the radius at the maximum of the valence orbitals]

  • Systematic construction of candidates

    [Figure: candidate formulas built as operator trees from primary features such as Energy1, Energy2, Length1, Length2, combining unary operators exp(x), exp(−x), ln(x), x^n, arctan(x) and binary operators x + y, x·y, x / y, |x − y|]

  • Systematic construction of candidates

    P = c1d1 + c2d2 + … + cndn. Each feature (a column of the matrix) is a tree-represented candidate function evaluated on the training data. The selected descriptor has as its components the features picked by the sparse-recovery algorithm (here, SISSO).
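One level of such tree construction can be sketched as follows (the dictionary layout and unit-tagging scheme are my own illustration, not the actual SISSO feature generator); tracking units is what keeps physically meaningless combinations like Energy + Length out of the candidate space:

```python
import numpy as np

# primary features tagged with units (values here are arbitrary toy data)
primary = {
    "E1": (np.array([1.0, 2.0, 3.0]), "energy"),
    "E2": (np.array([0.5, 1.5, 2.5]), "energy"),
    "L1": (np.array([2.0, 4.0, 6.0]), "length"),
}

def build_candidates(feats):
    """One level of the tree: apply unary and binary operators."""
    out = dict(feats)
    names = list(feats)
    for n, (v, u) in feats.items():
        out[f"({n})^2"] = (v ** 2, f"{u}^2")
        if (v > 0).all():                      # ln only on positive values
            out[f"ln({n})"] = (np.log(v), "dimensionless")
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (va, ua), (vb, ub) = feats[a], feats[b]
            out[f"{a}/{b}"] = (va / vb, f"{ua}/{ub}")
            if ua == ub:                       # only unit-consistent sums/differences
                out[f"|{a}-{b}|"] = (np.abs(va - vb), ua)
                out[f"{a}+{b}"] = (va + vb, ua)
    return out

space = build_candidates(primary)
```

Applying `build_candidates` repeatedly grows the feature space combinatorially, which is exactly why the tree depth becomes a hyperparameter of the method.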

  • Structure map from SISSO starting from 7x2 atomic features

    LMG et al., PRL 2015 DOI: 10.1103/PhysRevLett.114.105503 LMG et al., NJP 2017 DOI: 10.1088/1367-2630/aa57bf

    Predicting crystal structures from the composition

    P = c1d1 + c2d2 + …

  • In SISSO the “hyperparameters” are:

    The level of sparsity i.e., the number of “activated” features in P = c1d1 + c2d2 + …

    The size of the feature space determined by the complexity of the tree

    Tuned via cross-validation: Iterated random selection of a subset of the data for training + test on the left-out set

    Data-driven model complexity

    Ouyang et al. J. Phys. Mater. 2019 DOI: 10.1088/2515-7639/ab077b

  • Data-driven model complexity

    Two levels of the tree give formulas like those shown on the slide; three levels of the tree give correspondingly more complex formulas.

    Ouyang et al. J. Phys. Mater. 2019 DOI: 10.1088/2515-7639/ab077b
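The cross-validation described above can be sketched generically (my own illustration; an exhaustive tuple search stands in for the full SISSO solver, and the split sizes are arbitrary choices): for each random split, the best model of each dimension is selected on the training part and scored on the left-out part.

```python
import numpy as np
from itertools import combinations

def cv_error(D, P, dims, n_splits=5, seed=0):
    """Mean held-out error vs. model dimension, over random 80/20 splits."""
    rng = np.random.default_rng(seed)
    n = len(P)
    errs = {d: [] for d in dims}
    for _ in range(n_splits):
        test = rng.choice(n, size=n // 5, replace=False)
        train = np.setdiff1d(np.arange(n), test)
        for d in dims:
            # select the best d-tuple on the training split only ...
            best_err, best_cols, best_c = np.inf, None, None
            for tup in combinations(range(D.shape[1]), d):
                cols = list(tup)
                c, *_ = np.linalg.lstsq(D[np.ix_(train, cols)], P[train], rcond=None)
                e = np.mean((P[train] - D[np.ix_(train, cols)] @ c) ** 2)
                if e < best_err:
                    best_err, best_cols, best_c = e, cols, c
            # ... then score it on the held-out split
            errs[d].append(np.mean((P[test] - D[np.ix_(test, best_cols)] @ best_c) ** 2))
    return {d: float(np.mean(v)) for d, v in errs.items()}

rng = np.random.default_rng(2)
D = rng.normal(size=(60, 6))
P = D[:, 0] - 2.0 * D[:, 4] + 0.05 * rng.normal(size=60)
errs = cv_error(D, P, dims=[1, 2, 3])
```

The held-out error drops sharply once the dimension reaches the true sparsity and then flattens or rises, which is what singles out the data-driven model complexity.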

  • A few bits of taxonomy for SISSO

    Compressed-sensing-based model identification shares concepts with:

    ● Regularized regression. But: massive sparsification.

    ● Dimensionality reduction. But: supervised, and yielding sparse, “interpretable” descriptors.

    ● Feature (basis-set) selection. But: a non-greedy solver.

    ● Symbolic regression. But: a deterministic solver.

  • Open challenges of the symbolic-regression + compressed-sensing approach:

    ● Efficiently include constants and scaling factors in the symbolic tree
    ● Include known physical invariances in the symbolic-tree construction
    ● Include vectors (and tensors) as features. Contractions?

  • Model interpretability: related to sparse feature selection

    [Figure: interpretability vs. flexibility/complexity, after James, Witten, Hastie, Tibshirani, An Introduction to Statistical Learning, Springer (2013); from most to least interpretable: sparsifying methods (LASSO, SISSO, symbolic regression), linear regression, kernelized regression, trees, forests, support vector machines, neural networks]

  • In general, with symbolic regression:

    ● If the exact equation is within reach of the searching/optimizing algorithm, it is found. A simple model does not necessarily mean a less accurate one; for other powerful ML methods (kernel regression, regression trees and forests, deep learning) this is not the case.

    ● The few fitting parameters yield stability with respect to noise (low complexity → no overfitting).

  • Interpretability: what it might endow us with

    x: atomic fraction; IE: ionization energy; χ: electronegativity

  • Interpretability: what it might endow us with

    [Figure: zinc-blende compounds and their high-pressure phases: HgTe (std pressure, ZB), GaAs (std pressure, ZB), CdTe (std pressure, ZB); under pressure: (9 GPa, RS), (29 GPa, oI4), (4 GPa, RS)]

  • Multi-task learning

  • Multi-task learning

    Application: multi-phase stability diagram. Properties: crystal-structure formation energies.

    [Figure: map in the descriptor components d1, d2 separating crystal phases, e.g. RS and CsCl]

  • Multi-task learning

    Ouyang et al. J. Phys. Mater. 2019 DOI: 10.1088/2515-7639/ab077b


  • Multi-task learning

    MT-SISSO is remarkably data-parsimonious.

    Ouyang et al. J. Phys. Mater. 2019 DOI: 10.1088/2515-7639/ab077b
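The shared-descriptor idea can be sketched in a few lines (a toy NumPy illustration assuming, for simplicity, that all tasks share the same feature matrix, which the real MT-SISSO does not require): one feature tuple is selected jointly, but each task keeps its own coefficients.

```python
import numpy as np
from itertools import combinations

def mt_best_descriptor(D, tasks, dim):
    """Pick one shared feature tuple minimizing the summed squared error
    over all tasks, with separate least-squares coefficients per task."""
    best = (np.inf, None, None)
    for tup in combinations(range(D.shape[1]), dim):
        cols = list(tup)
        total, coefs = 0.0, []
        for P in tasks:
            c, *_ = np.linalg.lstsq(D[:, cols], P, rcond=None)
            total += np.sum((P - D[:, cols] @ c) ** 2)
            coefs.append(c)
        if total < best[0]:
            best = (total, cols, coefs)
    return best

# two "phases" sharing the same descriptor (features 1 and 3)
# but with different coefficients
rng = np.random.default_rng(3)
D = rng.normal(size=(40, 5))
tasks = [D[:, 1] + D[:, 3], 2.0 * D[:, 1] - D[:, 3]]
total, cols, coefs = mt_best_descriptor(D, tasks, dim=2)
```

Pooling the selection across tasks is what makes the method data-parsimonious: each task contributes evidence for the shared descriptor even when it has few samples of its own.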
