29
Discovering Equations from Data Melissa R. McGuirl February 28, 2019 Melissa R. McGuirl Discovering Equations from Data February 28, 2019 1 / 29

Discovering Equations from Data - Applied MathematicsSystems (SINDy) Given measurement data fx(t i)g i2I, can we accurately learn f(x(t)) so that dx dt = f(x(t))? SINDy Assumptions

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Discovering Equations from Data - Applied MathematicsSystems (SINDy) Given measurement data fx(t i)g i2I, can we accurately learn f(x(t)) so that dx dt = f(x(t))? SINDy Assumptions

Discovering Equations from Data

Melissa R. McGuirl

February 28, 2019

Melissa R. McGuirl Discovering Equations from Data February 28, 2019 1 / 29

Page 2: Discovering Equations from Data - Applied MathematicsSystems (SINDy) Given measurement data fx(t i)g i2I, can we accurately learn f(x(t)) so that dx dt = f(x(t))? SINDy Assumptions

Outline

1 Goals

2 SINDy

3 HAVOK

4 MATLAB Examples

5 References

Melissa R. McGuirl Discovering Equations from Data February 28, 2019 2 / 29

Page 3: Discovering Equations from Data - Applied MathematicsSystems (SINDy) Given measurement data fx(t i)g i2I, can we accurately learn f(x(t)) so that dx dt = f(x(t))? SINDy Assumptions

Problem Set-Up

Inputs

Observation data sampled at discrete times ti : {x(ti )}i∈I , wherex(ti ) ∈ Rn

Goal

Learn f (x(t)) such that dxdt = f (x(t))

Melissa R. McGuirl Discovering Equations from Data February 28, 2019 3 / 29

Page 4: Discovering Equations from Data - Applied MathematicsSystems (SINDy) Given measurement data fx(t i)g i2I, can we accurately learn f(x(t)) so that dx dt = f(x(t))? SINDy Assumptions

Approach 1: Sparse Identification of Non-linear DynamicalSystems (SINDy)

SINDy References/Sources:

K. Champion, S. Brunton, J. N. Kutz, Discovery of Non-linearMultiscale Systems: Sampling Strategies and Embeddings, SIAMJournal of Applied Dynamical Systems (2019)

S. Brunton, J. Proctor, J. N. Kutz, Discovering governing equationsfrom data by sparse identification of nonlinear dynamical systems,PNAS (2016)

https://www.youtube.com/watch?v=gSCa78TIldg

Melissa R. McGuirl Discovering Equations from Data February 28, 2019 4 / 29

Page 5: Discovering Equations from Data - Applied MathematicsSystems (SINDy) Given measurement data fx(t i)g i2I, can we accurately learn f(x(t)) so that dx dt = f(x(t))? SINDy Assumptions

Approach 1: Sparse Identification of Non-linear DynamicalSystems (SINDy)

Given measurement data {x(ti )}i∈I , can we accurately learn f (x(t)) sothat dx

dt = f (x(t))?

SINDy Assumptions

1 We have the full state measurements

2 f only has a few active terms, i.e. f is sparse is the space of allpossible functions of x(t)

Melissa R. McGuirl Discovering Equations from Data February 28, 2019 5 / 29

Page 6: Discovering Equations from Data - Applied MathematicsSystems (SINDy) Given measurement data fx(t i)g i2I, can we accurately learn f(x(t)) so that dx dt = f(x(t))? SINDy Assumptions

SINDy Algorithm

Case 1: Also assume data sampled from uniscale dynamical system

SINDy: Step 1

Collect measurement data and form a state space matrix:

X =

xT (t1)xT (t2)

...xT (tm)

=

x1(t1) x2(t1) . . . xn(t1)x1(t2) x2(t2) . . . xn(t2)

.... . .

. . ....

x1(tm) x2(tm) . . . xn(tm)

Melissa R. McGuirl Discovering Equations from Data February 28, 2019 6 / 29

Page 7: Discovering Equations from Data - Applied MathematicsSystems (SINDy) Given measurement data fx(t i)g i2I, can we accurately learn f(x(t)) so that dx dt = f(x(t))? SINDy Assumptions

SINDy Algorithm

SINDy: Step 2

Numerically approximate X to get

X =

xT (t1)xT (t2)

...xT (tm)

=

x1(t1) x2(t1) . . . xn(t1)x1(t2) x2(t2) . . . xn(t2)

.... . .

. . ....

x1(tm) x2(tm) . . . xn(tm)

Common methods for approximating the derivative:

Total variation regularized derivative (see R. Chartrand, NumericalDifferentiation of Noisy, Nonsmooth Data, ISRN AppliedMathematics, 2011)

Finite difference methods

Melissa R. McGuirl Discovering Equations from Data February 28, 2019 7 / 29

Page 8: Discovering Equations from Data - Applied MathematicsSystems (SINDy) Given measurement data fx(t i)g i2I, can we accurately learn f(x(t)) so that dx dt = f(x(t))? SINDy Assumptions

SINDy Algorithm

SINDy: Step 3

Construct library Θ(X) of candidate nonlinear functions of X:

Θ(X) =

| | | | | |1 X XP2 XP3 · · · sin(X) cos(X) · · ·| | | | | |

,where

XP2 =

x21 (t1) x1(t1)x2(t1) · · · x22 (t1) · · · x2n (t1)x21 (t2) x1(t2)x2(t2) · · · x22 (t2) · · · x2n (t2)

......

. . ....

. . ....

x21 (tm) x1(tm)x2(tm) · · · x22 (tm) · · · x2n (tm)

and similarly for general XPq .

Melissa R. McGuirl Discovering Equations from Data February 28, 2019 8 / 29

Page 9: Discovering Equations from Data - Applied MathematicsSystems (SINDy) Given measurement data fx(t i)g i2I, can we accurately learn f(x(t)) so that dx dt = f(x(t))? SINDy Assumptions

SINDy Algorithm

SINDy: Step 4

Sparse regression on X = Θ(X)Σ to solve for Σ = [σ1, . . . , σn] coefficients,σi ∈ Rp.

Let λ > 0 be the sparsity threshold

1 Initial guess: solve X = Θ(X)Σ via ordinary least squares

2 If Σ(i , j) < λ set Σ(i , j) = 03 for k = 1, 2, . . . , n

solve X(:, k) = Θ(X)(:,Σ(:, k) > λ)Σ(Σ(:, k) > λ), k) via least squares

4 Repeat steps 2-3 until coefficients do not change (or for a fixednumber of iterations)

Melissa R. McGuirl Discovering Equations from Data February 28, 2019 9 / 29

Page 10: Discovering Equations from Data - Applied MathematicsSystems (SINDy) Given measurement data fx(t i)g i2I, can we accurately learn f(x(t)) so that dx dt = f(x(t))? SINDy Assumptions

SINDy Summary

Figure 1: SINDy algorithm. Figure from Figure 1 of Discovering governingequations from data by sparse identification of nonlinear dynamical systems (S.Brunton, J. Proctor, J.N. Kutz, PNAS, 2016).

Melissa R. McGuirl Discovering Equations from Data February 28, 2019 10 / 29

Page 11: Discovering Equations from Data - Applied MathematicsSystems (SINDy) Given measurement data fx(t i)g i2I, can we accurately learn f(x(t)) so that dx dt = f(x(t))? SINDy Assumptions

SINDy Parameter Choices

Differentiation method and corresponding parameters (e.g., for thetotal variation regularized derivative you need to specify theregularization term)

Polynomial order for Θ(X) and whether or not to include sine/cosinebases functions

Sparsity threshold λ

Melissa R. McGuirl Discovering Equations from Data February 28, 2019 11 / 29

Page 12: Discovering Equations from Data - Applied MathematicsSystems (SINDy) Given measurement data fx(t i)g i2I, can we accurately learn f(x(t)) so that dx dt = f(x(t))? SINDy Assumptions

SINDy Case 2 Algorithm

Case 2: Assume data sampled from multi-scale dynamical system

Refined Problem Set-Up: nonlinear systems with linear coupling

Given measurement data {u(ti )}i∈I and {v(ti )}i∈I , with u(ti ) ∈ Rn (fastvariables) and v(ti ) ∈ Rl (slow variables), we try to learn functions f (u(t))and g(v(t)) and C ∈ Rn×l , D ∈ Rl×n such that

τfastu = f (u) + Cv

τslowv = g(v) + Du

Melissa R. McGuirl Discovering Equations from Data February 28, 2019 12 / 29

Page 13: Discovering Equations from Data - Applied MathematicsSystems (SINDy) Given measurement data fx(t i)g i2I, can we accurately learn f(x(t)) so that dx dt = f(x(t))? SINDy Assumptions

SINDy Case 2 Algorithm

SINDy Multiscale Approach: Burst Sampling

Sampling scheme: Collect sample measurements in short bursts witha small step size, spread out of over a long duration

Repeat SINDy with burst sampling scheme

Melissa R. McGuirl Discovering Equations from Data February 28, 2019 13 / 29

Page 14: Discovering Equations from Data - Applied MathematicsSystems (SINDy) Given measurement data fx(t i)g i2I, can we accurately learn f(x(t)) so that dx dt = f(x(t))? SINDy Assumptions

Burst Sampling

Figure 2: Burst Sampling. Figure from Figure 3(a) in Discovery of Non-linearMultiscale Systems: Sampling Strategies and Embeddings (K. Champion, S.Brunton, J.N. Kutz, SIAM Journal of Applied Dynamical Systems, 2019).

Melissa R. McGuirl Discovering Equations from Data February 28, 2019 14 / 29

Page 15: Discovering Equations from Data - Applied MathematicsSystems (SINDy) Given measurement data fx(t i)g i2I, can we accurately learn f(x(t)) so that dx dt = f(x(t))? SINDy Assumptions

SINDy Case 2 (multiscale dynamics) Additional ParameterChoices

Burst size (number of samples per burst)

Duration over which to sample (about 2 times the period of the slowdynamics, at least)

Total number of bursts to collect

Placement of bursts (select burst times as Poisson arrival times orselect them uniformly and then add noise to the left/right)

Melissa R. McGuirl Discovering Equations from Data February 28, 2019 15 / 29

Page 16: Discovering Equations from Data - Applied MathematicsSystems (SINDy) Given measurement data fx(t i)g i2I, can we accurately learn f(x(t)) so that dx dt = f(x(t))? SINDy Assumptions

Aside: What about PDEs?

Figure 3: PDE-FIND algorithm. Figure from Figure 1 in Data-driven Discovery ofpartial differential equations (S. Rudy, S. Brunton, J. Proctor, J. N. Kutz, ScienceAdvances, 2017). Also see https:

//www.youtube.com/watch?v=oI3grKBxkqM&feature=youtu.be+target%3D

Melissa R. McGuirl Discovering Equations from Data February 28, 2019 16 / 29

Page 17: Discovering Equations from Data - Applied MathematicsSystems (SINDy) Given measurement data fx(t i)g i2I, can we accurately learn f(x(t)) so that dx dt = f(x(t))? SINDy Assumptions

Approach 2: Hankel alternative view of Koopman(HAVOK) analysis

HAVOK References/Sources:

S. Brunton, B. Brunton, J. Proctor, E. Kaiser, J. N. Kutz, Chaos asan intermittently forced linear system, Nature Communications (2017)

K. Champion, S. Brunton, J. N. Kutz, Discovery of Non-linearMultiscale Systems: Sampling Strategies and Embeddings, SIAMJournal of Applied Dynamical Systems (2019)

https://www.youtube.com/watch?v=Q8VzAtGGlDQ

Melissa R. McGuirl Discovering Equations from Data February 28, 2019 17 / 29

Page 18: Discovering Equations from Data - Applied MathematicsSystems (SINDy) Given measurement data fx(t i)g i2I, can we accurately learn f(x(t)) so that dx dt = f(x(t))? SINDy Assumptions

Approach 2: Hankel alternative view of Koopman(HAVOK) analysis

HAVOK Assumptions

1 We only have one state measurement variable {x(ti )}i∈I , x(ti ) ∈ R2 Data comes from an r-dimensional chaotic dynamical system

Melissa R. McGuirl Discovering Equations from Data February 28, 2019 18 / 29

Page 19: Discovering Equations from Data - Applied MathematicsSystems (SINDy) Given measurement data fx(t i)g i2I, can we accurately learn f(x(t)) so that dx dt = f(x(t))? SINDy Assumptions

HAVOK Algorithm

HAVOK: Step 1

Collect measurement data and form a state space vector:

X =

x(t1)x(t2)

...x(tm)

Melissa R. McGuirl Discovering Equations from Data February 28, 2019 19 / 29

Page 20: Discovering Equations from Data - Applied MathematicsSystems (SINDy) Given measurement data fx(t i)g i2I, can we accurately learn f(x(t)) so that dx dt = f(x(t))? SINDy Assumptions

HAVOK Algorithm

HAVOK: Step 2

Form Hankel matrix:

H =

x(t1) x(t2) . . . x(tp)x(t2) x(t3) . . . x(tp+1)

......

. . ....

x(tq) x(tq+1) . . . x(tm)

Melissa R. McGuirl Discovering Equations from Data February 28, 2019 20 / 29

Page 21: Discovering Equations from Data - Applied MathematicsSystems (SINDy) Given measurement data fx(t i)g i2I, can we accurately learn f(x(t)) so that dx dt = f(x(t))? SINDy Assumptions

HAVOK Algorithm

HAVOK: Step 3

Apply singular value decomposition to Hankel matrix:

H = UΣV ∗

HAVOK: Step 4

Find optimal rank r (M. Gavish and D. L. Donoho The optimal hardthreshold for singular values is 4/

√3, IEEE Transactions on Information

Theory (2014))

Melissa R. McGuirl Discovering Equations from Data February 28, 2019 21 / 29

Page 22: Discovering Equations from Data - Applied MathematicsSystems (SINDy) Given measurement data fx(t i)g i2I, can we accurately learn f(x(t)) so that dx dt = f(x(t))? SINDy Assumptions

HAVOK Algorithm

HAVOK: Step 5

Apply SINDy algorithm (with degree 1) using first r columns of V asmeasurement data:

Vr = [V1,V2, . . . ,Vr ]T

Melissa R. McGuirl Discovering Equations from Data February 28, 2019 22 / 29

Page 23: Discovering Equations from Data - Applied MathematicsSystems (SINDy) Given measurement data fx(t i)g i2I, can we accurately learn f(x(t)) so that dx dt = f(x(t))? SINDy Assumptions

HAVOK Algorithm

HAVOK: Step 6

Separate out rth variable. Build state-space model from first (r − 1) termsand use rth term as a forcing term:

dVdt

= AV(t) + BVr (t),

where V = [V1,V2, . . . ,Vr−1]T , and A and B are coefficient matricesfound from SINDy in step 5

Melissa R. McGuirl Discovering Equations from Data February 28, 2019 23 / 29

Page 24: Discovering Equations from Data - Applied MathematicsSystems (SINDy) Given measurement data fx(t i)g i2I, can we accurately learn f(x(t)) so that dx dt = f(x(t))? SINDy Assumptions

HAVOK Summary

Figure 4: HAVOK analysis. Figure from Figure 1 in Chaos as an intermittentlyforced linear system (S. Brunton, B. Brunton, J. Proctor, E. Kaiser, J. N. Kutz,Nature Communications, 2017).

Melissa R. McGuirl Discovering Equations from Data February 28, 2019 24 / 29

Page 25: Discovering Equations from Data - Applied MathematicsSystems (SINDy) Given measurement data fx(t i)g i2I, can we accurately learn f(x(t)) so that dx dt = f(x(t))? SINDy Assumptions

HAVOK Parameter Choices

q (HAVOK matrix dimension)

Differentiation method and corresponding parameters

Sparsity threshold λ

Melissa R. McGuirl Discovering Equations from Data February 28, 2019 25 / 29

Page 26: Discovering Equations from Data - Applied MathematicsSystems (SINDy) Given measurement data fx(t i)g i2I, can we accurately learn f(x(t)) so that dx dt = f(x(t))? SINDy Assumptions

Other HAVOK approaches

What if we have quasiperiodic data instead of chaotic data?

Methods for tackling this case with both uniscale and multiscaledynamics are presented in Discovery of Non-linear Multiscale Systems:Sampling Strategies and Embeddings (K. Champion, S. Brunton, J.N. Kutz, SIAM Journal of Applied Dynamical Systems, 2019)

Methods rely on dynamic mode decomposition of Hankel matrix andshifted Hankel matrix

Return to these methods after we cover dynamic modedecomposition?

Melissa R. McGuirl Discovering Equations from Data February 28, 2019 26 / 29

Page 27: Discovering Equations from Data - Applied MathematicsSystems (SINDy) Given measurement data fx(t i)g i2I, can we accurately learn f(x(t)) so that dx dt = f(x(t))? SINDy Assumptions

Open source code

MATLAB SINDy and PDE-FIND code available here:http://faculty.washington.edu/kutz/page26/

MATLAB HAVOK code available here:http://faculty.washington.edu/sbrunton/HAVOK.zip

Modified comparison code on Dropbox

Melissa R. McGuirl Discovering Equations from Data February 28, 2019 27 / 29

Page 28: Discovering Equations from Data - Applied MathematicsSystems (SINDy) Given measurement data fx(t i)g i2I, can we accurately learn f(x(t)) so that dx dt = f(x(t))? SINDy Assumptions

Group Activities

1 Try MATLAB code with favorite dynamical systems

2 Play around with parameter and analyze how performance changes

3 Add noise to system data and see how methods perform

4 Download relevant data from https://www.kaggle.com/datasets

(or try your own data sets) and see if we can learninteresting/accurate governing equations for the data

Melissa R. McGuirl Discovering Equations from Data February 28, 2019 28 / 29

Page 29: Discovering Equations from Data - Applied MathematicsSystems (SINDy) Given measurement data fx(t i)g i2I, can we accurately learn f(x(t)) so that dx dt = f(x(t))? SINDy Assumptions

References

S. Brunton, B. Brunton, J. Proctor, E. Kaiser, J. N. Kutz, Chaos asan intermittently forced linear system, Nature Communications (2017)

S. Brunton, J. Proctor, J. N. Kutz, Discovering governing equationsfrom data by sparse identification of nonlinear dynamical systems,PNAS, (2016)

K. Champion, S. Brunton, J. N. Kutz, Discovery of Non-linearMultiscale Systems: Sampling Strategies and Embeddings, SIAMJournal of Applied Dynamical Systems (2019).

R. Chartrand, Numerical Differentiation of Noisy, Nonsmooth Data,ISRN Applied Mathematics (2011).

M. Gavish and D. L. Donoho The optimal hard threshold for singularvalues is 4/

√3, IEEE Transactions on Information Theory (2014).

S. Rudy, S. Brunton, J. Proctor, J. N. Kutz, Data-driven Discovery ofpartial differential equations, Science Advance (2017).

Melissa R. McGuirl Discovering Equations from Data February 28, 2019 29 / 29