Transcript
  • Compressed sensing meets symbolic regression: SISSO

    - Part 2 -

    Luca M. Ghiringhelli

    On-line course on Big Data and Artificial Intelligence in Materials Sciences

  • Compressed sensing, not only LASSO

    P = c1d1 + c2d2 + … + cndn

  • Compressed sensing, not only LASSO

    Greedy method: Orthogonal Matching Pursuit

    P = c1d1 + c2d2 + … + cndn

    Figure: the property P is compared with the candidate features d1, d2, …; the best-matching feature (d1*) is selected, and the part of P it does not explain (Residual1) is carried over to the next iteration, where d2* is selected against it.

    Limitation of greedy methods: once a feature has been selected it is never reconsidered, so strongly correlated candidate features can lead to a suboptimal model.
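    A minimal NumPy sketch of the Orthogonal Matching Pursuit loop described above (the names D for the feature matrix and P for the property vector are illustrative, not from the slides): at each step the feature most correlated with the current residual is added, all selected coefficients are refit by least squares, and the residual is updated.

        import numpy as np

        def omp(D, P, n_nonzero):
            """Orthogonal Matching Pursuit: greedily select columns of D to fit P.

            D : (n_samples, n_features) matrix of candidate features (columns standardized)
            P : (n_samples,) property vector
            n_nonzero : number of features to activate
            """
            residual = P.copy()
            selected = []                     # indices of the chosen features
            coefs = np.zeros(D.shape[1])
            for _ in range(n_nonzero):
                # pick the feature most correlated with the current residual
                scores = np.abs(D.T @ residual)
                if selected:
                    scores[selected] = -np.inf   # never re-pick an already selected feature
                selected.append(int(np.argmax(scores)))
                # refit all selected coefficients by least squares (the "orthogonal" step)
                c, *_ = np.linalg.lstsq(D[:, selected], P, rcond=None)
                residual = P - D[:, selected] @ c
            coefs[selected] = c
            return coefs, selected

    Note that the indices chosen in early iterations are never revisited; SISSO relaxes exactly this constraint by enumerating combinations within a screened subset, as the next slides show.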

  • Compressed sensing: SISSO

    SIS: Sure-Independence Screening

    Schematic: the candidate features most similar to the property P are screened into a subset S1D; the residual of the 1D model (Residual1D) is then used to screen a further subset S2D, and so on.

    Ouyang et al., PRM 2018, DOI: 10.1103/PhysRevMaterials.2.083802

  • Compressed sensing: SISSO

    SIS: Sure-Independence Screening

    SO: Sparsifying Operator, exact (by enumeration) over the screened features

    Similarity criterion in the SIS step:

    ● Scalar product (Pearson correlation)

    ● Spearman correlation (captures nonlinear monotonicity)

    ● Mutual information, …

    ● However: the computational cost is to be factored in
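    As a concrete illustration of the screening criteria listed above, here is a minimal sketch (the names features, residual, and k are mine) that ranks candidate features against the current residual by either Pearson or Spearman correlation and keeps the top k; mutual information would slot in the same way, at a higher computational cost.

        import numpy as np
        from scipy.stats import spearmanr

        def sis_screen(features, residual, k, criterion="pearson"):
            """Sure-Independence Screening: keep the k features most similar to the residual.

            features : (n_samples, n_features) array of candidate features
            residual : (n_samples,) current residual (the property itself in the first iteration)
            """
            scores = np.empty(features.shape[1])
            for j in range(features.shape[1]):
                if criterion == "pearson":
                    # scalar product of standardized vectors = Pearson correlation
                    scores[j] = abs(np.corrcoef(features[:, j], residual)[0, 1])
                elif criterion == "spearman":
                    # rank correlation: captures nonlinear but monotonic relations
                    rho, _ = spearmanr(features[:, j], residual)
                    scores[j] = abs(rho)
                else:
                    raise ValueError(f"unknown criterion: {criterion}")
            # indices of the k best-scoring features
            return np.argsort(scores)[::-1][:k]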

  • Compressed sensing: SISSO

    SIS: Sure-Independence Screening

    SO: Sparsifying Operator

    Exact solution of the ℓ0-regularized least-squares problem, obtained by enumeration over all tuples of the SIS-selected features.

    Ouyang et al., PRM 2018, DOI: 10.1103/PhysRevMaterials.2.083802

  • Compressed sensing: SISSO

    SIS: Sure-Independence Screening

    SO: Sparsifying Operator, exact (by enumeration) over the SIS-selected features

    In practice:
    0. i = 1, S = Ø
    1. Rank the features according to their similarity to Residual(i-1) (the property itself is Residual(0)).
    2. Add the first k features to S.
    3. Perform least-squares regression over all i-tuples in S.
    4. The lowest-error model is the i-dimensional SISSO model.
    5. i ← i+1; go to 1.

    P = c1d1 + c2d2 + … + cndn
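    The "in practice" recipe above maps almost line by line onto code. The following is my own minimal NumPy sketch (illustrative names; the SO step is a brute-force enumeration with itertools.combinations): screen the k features most correlated with the current residual, add them to S, exhaustively fit every i-tuple drawn from S, keep the best model, and iterate.

        import itertools
        import numpy as np

        def sisso(D, P, k=20, max_dim=3):
            """Minimal SISSO loop: SIS screening + exact sparsifying operator by enumeration.

            D : (n_samples, n_features) candidate-feature matrix
            P : (n_samples,) property vector
            k : number of features added to S per SIS iteration
            max_dim : highest descriptor dimension to try
            """
            S = []                                   # union of screened feature indices
            residual = P.copy()
            models = []                              # (rmse, indices, coefficients) per dimension
            for dim in range(1, max_dim + 1):
                # SIS: rank features by |correlation| with the current residual, add the top k
                scores = np.abs([np.corrcoef(D[:, j], residual)[0, 1] for j in range(D.shape[1])])
                S += [j for j in np.argsort(scores)[::-1] if j not in S][:k]
                # SO: exact l0 solution by enumerating every dim-tuple of S
                best = None
                for combo in itertools.combinations(S, dim):
                    idx = list(combo)
                    X = D[:, idx]
                    c, *_ = np.linalg.lstsq(X, P, rcond=None)
                    rmse = np.sqrt(np.mean((P - X @ c) ** 2))
                    if best is None or rmse < best[0]:
                        best = (rmse, idx, c)
                models.append(best)
                residual = P - D[:, best[1]] @ best[2]   # residual of the best dim-D model
            return models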

  • Predicting crystal structures from the composition

    Octet binaries (NaCl, ZnS, BN, KF, GaAs, CaO, …): rock-salt or zinc-blende structure?

    Learning the relative stability from the properties of the isolated atomic species

    Rock salt: 6-fold coordination, ionic bonding

    Zinc blende: 4-fold coordination, covalent bonding

  • Atomic features

    Figure: Kohn-Sham (KS) levels [eV] of the valence s and valence p states (HOMO, LUMO) and the radius at the maximum of the corresponding orbitals; example: Sn (tin).

  • Systematic construction of candidates

    Figure: primary features (e.g., Energy1, Energy2, Length1, Length2) are combined through unary and binary operators, such as x + y, x·y, x / y, | x - y |, x^n, exp(x), exp(-x), ln(x), arctan(x), to build tree-represented candidate functions.

  • Systematic construction of candidates

    P = c1d1 + c2d2 + … + cndn

    Each feature (a column of the matrix) is a tree-represented candidate function, projected onto the training data. The (selected) descriptor has as components the features selected by the sparse-recovery algorithm (here, SISSO). A minimal sketch of the construction follows.
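    To make the tree construction concrete, here is a minimal sketch (the function name expand_features and the primary-feature names are illustrative) that applies one round of the unary and binary operators shown on the previous slide to a dictionary of primary features; applying it again to its own output deepens the tree by one level. In a real SISSO run only dimensionally compatible features would be combined, and units would be tracked.

        import numpy as np

        def expand_features(primary):
            """One level of the feature tree: apply unary and binary operators to all inputs.

            primary : dict mapping a feature's formula (string) to its values (1D array)
            Returns a dict with the newly generated candidate features.
            """
            candidates = {}
            names = list(primary)
            # unary operators
            for a in names:
                x = primary[a]
                candidates[f"exp({a})"] = np.exp(x)
                candidates[f"exp(-{a})"] = np.exp(-x)
                candidates[f"({a})^2"] = x ** 2
                if np.all(x > 0):                          # ln only where it is defined
                    candidates[f"ln({a})"] = np.log(x)
            # binary operators
            for i, a in enumerate(names):
                for b in names[i + 1:]:
                    x, y = primary[a], primary[b]
                    candidates[f"({a}+{b})"] = x + y
                    candidates[f"|{a}-{b}|"] = np.abs(x - y)
                    candidates[f"({a}*{b})"] = x * y
                    if np.all(np.abs(y) > 1e-12):          # avoid division by zero
                        candidates[f"({a}/{b})"] = x / y
            return candidates

        # Usage (arbitrary toy arrays, for illustration only):
        rng = np.random.default_rng(0)
        primary = {"E_s": rng.normal(size=5), "r_p": rng.normal(size=5)}
        two_levels = expand_features(primary)                         # one operator applied
        three_levels = expand_features({**primary, **two_levels})     # one more level of the tree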

  • Predicting crystal structures from the composition

    Structure map from SISSO, starting from 7×2 atomic features

    P = c1d1 + c2d2 + …

    LMG et al., PRL 2015, DOI: 10.1103/PhysRevLett.114.105503
    LMG et al., NJP 2017, DOI: 10.1088/1367-2630/aa57bf

  • In SISSO, the “hyperparameters” are:

    The level of sparsity, i.e., the number of “activated” features in P = c1d1 + c2d2 + …

    The size of the feature space, determined by the complexity of the tree

    Tuned via cross-validation: iterated random selection of a subset of the data for training + test on the left-out set

    Data-driven model complexity

    Ouyang et al. J. Phys. Mater. 2019 DOI: 10.1088/2515-7639/ab077b

  • In SISSO, the “hyperparameters” are:

    The level of sparsity, i.e., the number of “activated” features in P = c1d1 + c2d2 + …

    The size of the feature space, determined by the complexity of the tree

    Tuned via cross-validation: iterated random selection of a subset of the data for training + test on the left-out set (a minimal sketch of such a loop is given below)

    Two levels of the tree: formulas with up to two nested operators

    Three levels of the tree: formulas with up to three nested operators

    Data-driven model complexity

    Ouyang et al. J. Phys. Mater. 2019 DOI: 10.1088/2515-7639/ab077b
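    The cross-validation referred to above can be sketched as follows; this is a minimal, illustrative version (split sizes and number of splits are arbitrary, and sisso() is the toy function from the earlier sketch): repeatedly draw a random train/test split, fit SISSO models of increasing dimension on the training part, and keep the dimension with the lowest average test error. Running the same loop over feature spaces built with different tree depths selects the complexity in the same way.

        import numpy as np

        def cv_select_dimension(D, P, max_dim=3, n_splits=20, test_fraction=0.2, seed=0):
            """Pick the descriptor dimension by iterated random train/test splits."""
            rng = np.random.default_rng(seed)
            n = len(P)
            n_test = max(1, int(test_fraction * n))
            test_errors = np.zeros(max_dim)
            for _ in range(n_splits):
                perm = rng.permutation(n)
                test, train = perm[:n_test], perm[n_test:]
                # fit 1D ... max_dim-D models on the training split (sisso() from the earlier sketch)
                models = sisso(D[train], P[train], max_dim=max_dim)
                for dim, (_, idx, c) in enumerate(models, start=1):
                    pred = D[np.ix_(test, idx)] @ c
                    test_errors[dim - 1] += np.sqrt(np.mean((P[test] - pred) ** 2))
            test_errors /= n_splits
            # return the dimension with the lowest mean test RMSE, plus the per-dimension errors
            return int(np.argmin(test_errors)) + 1, test_errors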

  • A few bits of taxonomy for SISSO

    Compressed-sensing-based model identification shares concepts with:

    ● Regularized regression. But: massive sparsification.

    ● Dimensionality reduction. But: supervised, and yielding sparse, “interpretable” descriptors.

    ● Feature (basis-set) selection. But: non-greedy solver.

    ● Symbolic regression. But: deterministic solver.

  • A few bits of taxonomy for SISSO

    Compressed-sensing-based model identification shares concepts with:

    ● Regularized regression. But: massive sparsification.

    ● Dimensionality reduction. But: supervised, and yielding sparse, “interpretable” descriptors.

    ● Feature (basis-set) selection. But: non-greedy solver.

    ● Symbolic regression. But: deterministic solver.

    Open challenges of the symbolic regression + compressed sensing approach:

    ● Efficiently include constants and scaling factors in the symbolic tree

    ● Include known physical invariances in the symbolic-tree construction

    ● Include vectors (and tensors) as features. Contractions?

  • Interpretability

    Model interpretability: related to sparse feature selection

    Figure (after James, Witten, Hastie, Tibshirani, An Introduction to Statistical Learning, Springer, 2013): interpretability versus flexibility/complexity. From most interpretable and least flexible to least interpretable and most flexible:

    ● Sparsifying methods: LASSO, SISSO, symbolic regression

    ● Linear regression

    ● Kernelized regression

    ● Trees

    ● Forests

    ● Support vector machines

    ● Neural networks

  • Interpretability

    Model interpretability: related to sparse feature selection

    Figure (after James, Witten, Hastie, Tibshirani, An Introduction to Statistical Learning, Springer, 2013): interpretability versus flexibility/complexity, from sparsifying methods (LASSO, SISSO, symbolic regression) and linear regression down to kernelized regression, trees, forests, support vector machines, and neural networks.

    In general, with symbolic regression:

    ● If the exact equation is within reach of the searching/optimizing algorithm, it is found. A simple model does not necessarily mean a less accurate one. For other powerful ML methods (kernel regression, regression trees and forests, deep learning), this is not the case.

    ● The few fitting parameters yield stability with respect to noise (low complexity → no overfitting).

  • Interpretability: what it might endow us with

    Legend: x = atomic fraction, IE = ionization energy, χ = electronegativity

  • Interpretability: what it might endow us with

    Figure: HgTe, GaAs, and CdTe, all zinc-blende (ZB) at standard pressure, together with their high-pressure phases (RS at 9 GPa, oI4 at 29 GPa, and RS at 4 GPa, respectively), placed on the structure map.

  • Multi-task learning

  • Multi-task learning

    Application: multi-phase stability diagram. Properties: crystal-structure formation energies.

    Figure: stability map in the (d1, d2) descriptor plane, with regions labelled RS and CsCl.

  • Multi-task learning

    Ouyang et al. J. Phys. Mater. 2019 DOI: 10.1088/2515-7639/ab077b

  • Multi-task learning

    MT-SISSO is remarkably data-parsimonious

    Ouyang et al. J. Phys. Mater. 2019 DOI: 10.1088/2515-7639/ab077b
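    The slides do not spell out how MT-SISSO works, so the following is only a minimal sketch of the multi-task idea as described in the cited Ouyang et al. (2019) paper, with illustrative names: a single descriptor (set of selected features) is shared by all tasks (here, the formation energies of the different crystal structures), each task keeps its own least-squares coefficients, and candidate descriptors are ranked by the error summed over tasks. Because every task contributes to selecting the shared descriptor, few samples per task can suffice, which is the data-parsimony noted above.

        import itertools
        import numpy as np

        def mt_so(tasks, S, dim):
            """Multi-task sparsifying operator (sketch): one shared descriptor, per-task coefficients.

            tasks : list of (D_t, P_t) pairs, one per property/task; the columns of every D_t
                    correspond to the same candidate features
            S     : indices of the SIS-screened candidate features
            dim   : descriptor dimension
            """
            best = None
            for combo in itertools.combinations(S, dim):
                idx = list(combo)
                total, coefs = 0.0, []
                for D_t, P_t in tasks:                       # separate linear fit for each task
                    c, *_ = np.linalg.lstsq(D_t[:, idx], P_t, rcond=None)
                    total += np.sqrt(np.mean((P_t - D_t[:, idx] @ c) ** 2))
                    coefs.append(c)
                if best is None or total < best[0]:          # rank descriptors by summed error
                    best = (total, idx, coefs)
            return best   # (total error, shared feature indices, list of per-task coefficients)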
