Luca M. Ghiringhelli On-line course on Big Data and Artificial Intelligence in Materials Sciences Compressed sensing meets symbolic regression: SISSO - Part 1 -


Luca M. Ghiringhelli

On-line course on Big Data and Artificial Intelligence in Materials Sciences

Compressed sensing meets symbolic regression: SISSO

- Part 1 -


Reminder: Few bits of taxonomy

Machine learning

Representation learning

Learning algorithms that learn both their representation and the predictive model:
- symbolic regression
- deep learning

Artificial intelligence


Compressed sensing meets symbolic regression

[Diagram with the labels: symbolic regression, compressed sensing, evolutionary programming, feature selection/identification, and SISSO (sure-independence screening combined with sparsifying operator).]


Linear regression

Symbolic regression


Linear regression Kernel regression

One-hidden-layer perceptron

Symbolic regression


Systematic construction of candidates

[Diagram: expression trees grown step by step from nodes labeled Energy1, Energy2, Length1, Length2, and Length3 by applying the operators x + y, x·y, x / y, | x - y |, exp(x), exp(-x), ln(x), arctan(x), x^n, and x^3.]

Symbolic regression
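The systematic construction of candidates can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature values are made up, the operator set is a subset of the one on the slide, and the unit bookkeeping (sums only between equal units, exp and ln only of dimensionless quantities) is an assumption added here, not something the slides specify.

```python
from itertools import combinations
import math

# Hypothetical primary features: name -> (value, unit). The numbers are
# illustrative only and do not come from the slides.
PRIMARY = {
    "Energy1": (1.5, "energy"),
    "Energy2": (2.0, "energy"),
    "Length1": (0.5, "length"),
    "Length2": (2.0, "length"),
}


def expand(features):
    """Return the input features plus every expression one operator deeper."""
    out = dict(features)
    # Binary operators act on every unordered pair of existing features.
    for (na, (a, ua)), (nb, (b, ub)) in combinations(features.items(), 2):
        if ua == ub:
            # Sums and differences are only formed between equal units.
            out[f"({na} + {nb})"] = (a + b, ua)
            out[f"|{na} - {nb}|"] = (abs(a - b), ua)
            if b != 0:
                out[f"({na} / {nb})"] = (a / b, "1")  # dimensionless ratio
        else:
            out[f"({na} * {nb})"] = (a * b, f"{ua}*{ub}")
    # Unary operators that need a dimensionless argument.
    for name, (x, unit) in features.items():
        if unit == "1":
            out[f"exp(-{name})"] = (math.exp(-x), "1")
            if x > 0:
                out[f"ln({name})"] = (math.log(x), "1")
    return out


level1 = expand(PRIMARY)   # one operator applied
level2 = expand(level1)    # up to two operators applied
print(len(PRIMARY), len(level1), len(level2))
```

Each call to `expand` adds one level of tree depth, so the number of candidates grows quickly with depth. That growth is what makes a selection step necessary afterwards.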

Evolutionary/genetic algorithm

Representing individuals as binary genes

Initialize population, then rank wrt fit function:

0 0 1 1 0 1 1 0 1 0 0    0.89
1 0 0 1 0 0 0 1 0 1 1    0.55
1 1 0 1 0 1 0 1 1 1 1    0.34
1 0 0 1 0 0 0 0 0 1 1    0.21
1 0 0 1 1 0 1 1 0 1 1    0.13
1 0 1 1 0 0 0 1 0 0 0    0.08

Randomly select “fittest first”:

0 0 1 1 0 1 1 0 1 0 0
1 0 0 1 0 0 0 1 0 1 1

Crossover:

1 0 0 1 0 0 0 0 1 0 0
0 0 1 1 0 1 1 1 0 1 1

Mutation:

1 0 0 1 0 1 0 0 1 1 0
0 0 1 1 0 1 1 1 0 1 0

Flowchart: Initialize population → Rank wrt fit function → Randomly select “fittest first” → Crossover → Mutation → Rank wrt fit function → Happy? (Yes: End. No: iterate again.)

E.g., crossover for molecules/clusters
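The loop in the flowchart can be written as a minimal Python sketch. Several pieces are assumptions made for illustration and are not taken from the slides: the fit function (fraction of bits set to 1), rank-proportional selection weights, keeping the current best individual unchanged, and the mutation rate.

```python
import random

GENES = 11  # length of the binary strings on the slides


def fitness(ind):
    # Hypothetical fit function (assumption): fraction of bits set to 1.
    return sum(ind) / len(ind)


def crossover(a, b, point):
    """Single-point crossover: exchange the tails after `point`."""
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(ind, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in ind]


def evolve(pop_size=6, generations=50, rate=0.05, target=1.0, seed=0):
    rng = random.Random(seed)
    # Initialize population with random binary genes.
    pop = [[rng.randint(0, 1) for _ in range(GENES)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)       # rank wrt fit function
        if fitness(pop[0]) >= target:             # "Happy?" -> End
            break
        # Randomly select parents, "fittest first": weights fall with rank.
        weights = [pop_size - i for i in range(pop_size)]
        children = [pop[0]]                       # assumption: keep the best
        while len(children) < pop_size:
            p1, p2 = rng.choices(pop, weights=weights, k=2)
            c1, c2 = crossover(p1, p2, rng.randrange(1, GENES))
            children += [mutate(c1, rate, rng), mutate(c2, rate, rng)]
        pop = children[:pop_size]
    return max(pop, key=fitness)


best = evolve()
print(best, fitness(best))
```

With the two selected parents from the slide and a crossover point after the seventh gene, `crossover` gives the two offspring strings shown above.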

Evolutionary programming

Example of crossover between symbolic trees

[Diagram: two parent expression trees and one offspring. Parent 1 has the nodes Energy2, Energy1, | x - y |, Length1, Length2, x / y (twice), and exp(-x). Parent 2 has the nodes Energy2, Energy1, x + y, Length1, Length2, x / y (twice), and ln(x). The offspring has the nodes Energy2, Energy1, x + y, Length1, Length2, and x / y (twice), with exp(-x) and ln(x) both drawn at the exchanged node.]
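A crossover between symbolic trees amounts to exchanging subtrees. The following Python sketch represents a tree as nested tuples. The tree shapes used for the two parents are one possible reading of the slide diagram (an assumption, since the diagram's edges are not recoverable from the text), and all function names are hypothetical.

```python
import math
import random

# A tree is either a feature name (str) or a tuple (operator, child, ...).
OPS = {
    "add": lambda x, y: x + y,
    "absdiff": lambda x, y: abs(x - y),
    "div": lambda x, y: x / y,
    "exp_neg": lambda x: math.exp(-x),
    "ln": lambda x: math.log(x),
}


def evaluate(tree, values):
    """Evaluate a tree for given feature values."""
    if isinstance(tree, str):
        return values[tree]
    op, *children = tree
    return OPS[op](*(evaluate(c, values) for c in children))


def paths(tree, prefix=()):
    """Yield the index path of every node (the root has the empty path)."""
    yield prefix
    if not isinstance(tree, str):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))


def get(tree, path):
    """Return the subtree found by following `path` from the root."""
    for i in path:
        tree = tree[i]
    return tree


def put(tree, path, subtree):
    """Return a copy of `tree` with the node at `path` replaced."""
    if not path:
        return subtree
    i = path[0]
    return tree[:i] + (put(tree[i], path[1:], subtree),) + tree[i + 1:]


def crossover(a, b, rng):
    """Swap one randomly chosen subtree of `a` with one of `b`."""
    pa = rng.choice(list(paths(a)))
    pb = rng.choice(list(paths(b)))
    return put(a, pa, get(b, pb)), put(b, pb, get(a, pa))


# Assumed reading of the two parent trees on the slide.
ratio = ("div", "Length1", "Length2")
parent1 = ("exp_neg", ("div", ("absdiff", "Energy2", "Energy1"), ratio))
parent2 = ("ln", ("div", ("add", "Energy2", "Energy1"), ratio))

# Deterministic exchange of the energy subtree, as in the slide's offspring.
child = put(parent1, (1, 1), get(parent2, (1, 1)))
print(child)
```

Because a swapped subtree is always a complete expression, every offspring is again a valid formula, which is the main reason for using trees instead of flat strings here.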

Model selection: Pareto front

[Plot: candidate models as points in the plane of Objective 1 vs. Objective 2; for symbolic regression the axes are accuracy (RMSE) and complexity (depth of the tree).]

Multi-objective optimization: Points on the Pareto front are such that no other point simultaneously improves all the objective functions.
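Extracting the Pareto front from a set of candidate models takes only a dominance check. The sketch below uses the common textbook definition of dominance (at least as good in every objective and different in at least one), which is slightly stricter than the wording above. The (complexity, RMSE) pairs are invented for illustration.

```python
def pareto_front(points):
    """Return the points that no other point dominates.

    Each point is (complexity, error); both objectives are minimized.
    A point q dominates p if q is no worse in both objectives and q != p.
    """
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and q != p
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(front)


# Hypothetical models: (tree depth, RMSE).
models = [(1, 0.90), (2, 0.50), (2, 0.70), (3, 0.45), (4, 0.45), (5, 0.20)]
front = pareto_front(models)
print(front)  # -> [(1, 0.9), (2, 0.5), (3, 0.45), (5, 0.2)]
```

The model (4, 0.45) is dropped because (3, 0.45) reaches the same error with a shallower tree. Which point on the front to pick remains a separate choice.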


A famous example: EUREQA

Distilling Free-Form Natural Laws from Experimental Data
Schmidt M., Lipson H., Science, Vol. 324, No. 5923 (2009)
EUREQA: genetic programming software.


EUREQA: Pareto front


Eureqa

In general, with symbolic regression:
● If the exact equation is within reach of the searching/optimizing algorithm, it is found. For other powerful ML methods (e.g., kernel regression, regression trees and forests, deep learning), this is not the case.
● The few fitting parameters yield stability with respect to noise (low complexity → no overfitting).


Compressing signals


Compressed sensing



Feature selection/identification vs extraction

Feature selection: selection of a subset among the given features
● Filters: univariate ranking, i.e., each feature is scored on its own against the property
● Wrappers: search strategies, e.g., GA
● Embedded: (non-stochastic) optimization of an objective function, e.g., regularized regression, decision tree

Feature extraction: new (fewer) features are functions (e.g., linear combinations) of potentially all given features.
● Dimension reduction
● Autoencoders
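As an illustration of the filter approach, the sketch below ranks features one at a time by the absolute Pearson correlation with the property and keeps the best ones. The choice of Pearson correlation as the score, the function names, and the data are assumptions made for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_features(features, target, keep):
    """Score each feature on its own and return the names of the `keep` best."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:keep]


# Hypothetical property and candidate features (made-up numbers).
target = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "f_linear": [2.1, 3.9, 6.2, 7.8, 10.1],
    "f_inverse": [-1.0, -2.0, -3.0, -4.0, -5.0],
    "f_noise": [0.3, -0.2, 0.1, -0.3, 0.2],
}
selected = rank_features(features, target, keep=2)
print(selected)  # -> ['f_inverse', 'f_linear']
```

A filter of this kind never looks at combinations of features, so it is cheap but can miss features that are only useful together.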


Compressed sensing

D.L. Donoho, IEEE Trans. Inf. Theory 2006, DOI: 10.1109/TIT.2006.871582
E.J. Candès, J. Romberg, T. Tao, IEEE Trans. Inf. Theory 2006, DOI: 10.1109/TIT.2005.862083
R. Tibshirani, J. Royal Stat. Soc. 1996, DOI: 10.1111/j.2517-6161.1996.tb02080.x

LASSO: Least Absolute Shrinkage and Selection Operator


Recovery possible when:

N: #features, M: #observations, Ω: sparsity


Compressed sensing, or “sparse recovery”, enables the recovery of a sparse signal from very few, non-adaptive measurements.
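A small numerical experiment can make the sparse-recovery statement concrete. The sketch below solves the LASSO problem by cyclic coordinate descent in plain Python and applies it to a synthetic case with more features than observations. The solver choice, the objective scaling (1/2M), the value of the regularization strength `lam`, and the data are all assumptions made for illustration. Whether the true support is recovered depends on the random design, so no particular outcome is claimed.

```python
import random


def soft_threshold(rho, lam):
    """Shrink `rho` toward zero by `lam`; values in [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso(X, y, lam, sweeps=200):
    """Coordinate descent for (1/2M)*||y - X w||^2 + lam*||w||_1."""
    m, n = len(X), len(X[0])
    w = [0.0] * n
    residual = list(y)  # equals y - X w, and w starts at zero
    norms = [sum(X[i][j] ** 2 for i in range(m)) / m for j in range(n)]
    for _ in range(sweeps):
        for j in range(n):
            if norms[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes w[j].
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j])
                      for i in range(m)) / m
            new = soft_threshold(rho, lam) / norms[j]
            if new != w[j]:
                delta = new - w[j]
                for i in range(m):
                    residual[i] -= X[i][j] * delta
                w[j] = new
    return w


# Synthetic test: N = 30 features, M = 15 observations, sparsity 2.
rng = random.Random(0)
N, M = 30, 15
true_w = [0.0] * N
true_w[3], true_w[17] = 2.0, -1.5
X = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]
y = [sum(X[i][j] * true_w[j] for j in range(N)) for i in range(M)]

w = lasso(X, y, lam=0.1)
support = [j for j in range(N) if abs(w[j]) > 1e-6]
print(support)
```

Increasing `lam` makes the solution sparser; above the largest value of |X_j · y| / M over all columns j, every coefficient is exactly zero.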


Bonus slide: Suggested literature

Notable examples of compressed sensing, other than SISSO, applied to materials science.
Actually, in this example LASSO is the applied method. LASSO and compressed sensing are often thought to be equivalent, whereas compressed sensing includes LASSO as one solution protocol.

V Ozoliņš, R Lai, R Caflisch, S Osher - PNAS, 2013 DOI: 10.1073/pnas.1318679110

LJ Nelson, GLW Hart, F Zhou, V Ozoliņš - PRB, 2013 DOI: 10.1103/PhysRevB.87.035125

LJ Nelson, V Ozoliņš, CS Reese, F Zhou, GLW Hart - PRB, 2013 DOI: 10.1103/PhysRevB.88.155105

F Zhou, W Nielson, Y Xia, V Ozoliņš - PRL, 2014 DOI: 10.1103/PhysRevLett.113.185501