An introduction to optimization on manifolds

min_{x ∈ M} f(x)

Nicolas Boumal, July 2020
Mathematics @ EPFL

Introduction to optimization on smooth manifolds
web.math.princeton.edu/~nboumal/papers/IntroOptimManifoldsSlides2020.pdf


Page 1

An introduction to optimization on manifolds

min_{x ∈ M} f(x)

Nicolas Boumal, July 2020
Mathematics @ EPFL

Page 2

Optimization on smooth manifolds

min_x f(x) subject to x ∈ M

• Linear spaces: unconstrained problems; linear equality constraints
• Low rank (matrices, tensors): recommender systems, large-scale Lyapunov equations, …
• Orthonormality (Grassmann, Stiefel, rotations): dictionary learning, SfM, SLAM, PCA, ICA, SBM, electronic structure computation, …
• Positivity (positive definiteness, positive orthant): metric learning, Gaussian mixtures, diffusion tensor imaging, …
• Symmetry (quotient manifolds): invariance under group actions

Page 3

A Riemannian structure gives us gradients and Hessians

The essential tools of smooth optimization are defined generally on Riemannian manifolds.

Unified theory, broadly applicable algorithms.

First ideas from the '70s. First practical in the '90s.

Pages 4-10

(Figure-only slides; no recoverable text.)

Page 11

manopt.org

Manopt (Matlab): with Bamdev Mishra, P.-A. Absil & R. Sepulchre
pymanopt (Python): led by J. Townsend, N. Koep & S. Weichwald
Manopt.jl (Julia): led by Ronny Bergmann

Page 12

manopt.org

Manopt (Matlab): with Bamdev Mishra, P.-A. Absil & R. Sepulchre
pymanopt (Python): led by J. Townsend, N. Koep & S. Weichwald
Manopt.jl (Julia): led by Ronny Bergmann

nicolasboumal.net/book

Page 13

What is a smooth manifold?

(Figures: example sets, labeled Yes, Yes, No.)

This overview: restricted to embedded submanifolds of linear spaces.

Page 14

What is a smooth manifold? (2)

M ⊆ E is an embedded submanifold of codimension k if there exists a smooth function h: E → R^k such that:

M = {x ∈ E : h(x) = 0}, and Dh(x): E → R^k has rank k for all x ∈ M.

E.g., the sphere: h(x) = xᵀx − 1, with Dh(x)[v] = 2xᵀv.

These properties allow us to linearize the set:

h(x + tv) = h(x) + t·Dh(x)[v] + O(t²)

This is O(t²) iff v ∈ ker Dh(x).
→ Tangent spaces: T_x M = ker Dh(x) (a linear subspace of E).
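The sphere example above lends itself to a quick numerical sanity check. This is an illustrative sketch of mine (not part of the slides), using NumPy; the names h, Dh and the random seed are my own choices:

```python
import numpy as np

# Sphere S^{n-1} embedded in R^n via h(x) = x^T x - 1, with Dh(x)[v] = 2 x^T v.
def h(x):
    return x @ x - 1.0

def Dh(x, v):
    return 2.0 * (x @ v)

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
x /= np.linalg.norm(x)        # a point on the sphere: h(x) = 0
v = rng.standard_normal(5)
v -= (x @ v) * x              # project v onto T_x M = ker Dh(x)

# Tangency: Dh(x)[v] = 0, and h(x + t v) = O(t^2) along a tangent direction.
for t in [1e-2, 5e-3, 2.5e-3]:
    print(t, h(x + t * v))    # residual shrinks like t^2 (factor ~4 per halving)
```

Here h(x + tv) = t²‖v‖² exactly, since xᵀv = 0, which is the O(t²) behavior the slide describes.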

Page 15

What is a smooth manifold? (3)

M ⊆ E is an embedded submanifold of codimension k if:

For each x ∈ M there is a neighborhood U of x in E and a smooth function h: U → R^k such that:

a) Dh(x): E → R^k has rank k, and
b) M ∩ U = h⁻¹(0) = {y ∈ U : h(y) = 0}.

(This local version of the definition is necessary, e.g., for fixed-rank matrices.)

Page 16

Bootstrapping our set of tools

1. Smooth maps
2. Differentials of smooth maps
3. Vector fields and tangent bundles
4. Retractions
5. Riemannian metrics
6. Riemannian gradients
7. Riemannian connections
8. Riemannian Hessians
9. Riemannian covariant derivatives along curves

Page 17

Smooth maps between manifolds

With M, M′ embedded in E, E′ respectively:

Define: a map F: M → M′ is smooth if there exists a neighborhood U of M in E and a smooth map F̄: U → E′ such that F = F̄|_M.

We call F̄ a smooth extension of F.

Page 18

Differential of a smooth map

For F̄: E → E′, we have DF̄(x)[v] = lim_{t→0} (F̄(x + tv) − F̄(x)) / t.

For F: M → M′ smooth, F(x + tv) may not be defined. But picking a smooth extension F̄, for v ∈ T_x M we say:

Define: DF(x)[v] = lim_{t→0} (F̄(x + tv) − F̄(x)) / t = DF̄(x)[v].

Claim: this does not depend on the choice of F̄.
Claim: we retain linearity, the product rule and the chain rule.

Page 19

Differential of a smooth map (2)

We defined DF(x) = DF̄(x)|_{T_x M}. Equivalently:

Claim: for each v ∈ T_x M, there exists a smooth curve c: I → M such that c(0) = x and c′(0) = v.

Page 20

Differential of a smooth map (2)

We defined DF(x) = DF̄(x)|_{T_x M}. Equivalently:

Claim: for each v ∈ T_x M, there exists a smooth curve c: I → M such that c(0) = x and c′(0) = v.

Define: DF(x)[v] = lim_{t→0} (F(c(t)) − F(x)) / t = (F ∘ c)′(0).

Page 21

Vector fields and the tangent bundle

A vector field V ties to each x a vector V(x) ∈ T_x M.

What does it mean for V to be smooth?

Define: the tangent bundle of M is the disjoint union of all tangent spaces:

TM = {(x, v) : x ∈ M and v ∈ T_x M}

Claim: TM is a manifold (embedded in E × E).

V is smooth if it is smooth as a map from M to TM.

Page 22

Aside: new manifolds from old ones

Given a manifold M, we can create a new manifold by considering its tangent bundle.

Here are other ways to recycle:

• Products of manifolds: M × M′, M^n
• Open subsets of manifolds*
• (Quotienting some equivalence relations.)

*For embedded submanifolds: subset topology.

Page 23

Retractions: moving around

Given (x, v) ∈ TM, we can move away from x along v using any c: I → M with c(0) = x and c′(0) = v.

Retractions are a smooth choice of such curves over TM.

Define: a retraction is a smooth map R: TM → M such that if c(t) = R(x, tv) = R_x(tv), then c(0) = x and c′(0) = v.

Page 24

Retractions: moving around (2)

Define: a retraction is a smooth map R: TM → M such that if c(t) = R(x, tv) = R_x(tv), then c(0) = x and c′(0) = v.

Equivalently: R_x(0) = x and DR_x(0) = Id.

Typical choice: R_x(v) = projection of x + v to M:
• M = R^n: R_x(v) = x + v
• M = S^{n−1}: R_x(v) = (x + v) / ‖x + v‖
• M = R_r^{m×n} (matrices of rank r): R_X(V) = SVD_r(X + V)
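The projection retractions listed above can be sketched in a few lines of NumPy (my own illustrative code, not from the slides). The check verifies R_x(0) = x and the first-order agreement R_x(tv) = x + tv + O(t²) on the sphere, and that the truncated-SVD retraction lands on the rank-r set:

```python
import numpy as np

def retr_sphere(x, v):
    y = x + v
    return y / np.linalg.norm(y)          # metric projection onto the sphere

def retr_fixed_rank(X, V, r):
    U, s, Vt = np.linalg.svd(X + V, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]    # truncated SVD: best rank-r approximation

rng = np.random.default_rng(2)
x = rng.standard_normal(4); x /= np.linalg.norm(x)
v = rng.standard_normal(4); v -= (x @ v) * x

t = 1e-4
err = np.linalg.norm(retr_sphere(x, t * v) - (x + t * v))
print(err)        # O(t^2): the retraction matches x + t v to first order

X = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 6))   # a rank-2 matrix
Y = retr_fixed_rank(X, 0.1 * rng.standard_normal((5, 6)), 2)
print(np.linalg.matrix_rank(Y))   # stays on the rank-2 manifold
```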

Page 25

Towards gradients: reminders from R^n

The gradient of a smooth f: R^n → R at x is defined by:

Df(x)[v] = ⟨grad f(x), v⟩ for all v ∈ R^n.

In particular, with the inner product ⟨u, v⟩ = uᵀv,

(grad f(x))_i = ⟨grad f(x), e_i⟩ = Df(x)[e_i] = ∂f/∂x_i (x).

Note: grad f is a smooth vector field on R^n.

Page 26

Riemannian metrics

T_x M is a linear space (a subspace of E).

Pick an inner product ⟨·,·⟩_x for each T_x M.

Define: ⟨·,·⟩_x defines a Riemannian metric on M if, for any two smooth vector fields V, W, the function x ↦ ⟨V(x), W(x)⟩_x is smooth.

A Riemannian manifold is a manifold with a Riemannian metric.

Page 27

Riemannian submanifolds

Let ⟨·,·⟩ be the inner product on E.

Since T_x M is a linear subspace of E, one choice is:

⟨u, v⟩_x = ⟨u, v⟩

Claim: this defines a Riemannian metric on M.

With this metric, M is a Riemannian submanifold of E.

Page 28

Riemannian gradient

Let f: M → R be smooth on a Riemannian manifold.

Define: the Riemannian gradient of f at x is the unique tangent vector grad f(x) at x such that:

Df(x)[v] = ⟨grad f(x), v⟩_x for all v ∈ T_x M.

Claim: grad f is a smooth vector field.
Claim: if x is a local optimum of f, then grad f(x) = 0.

Page 29

Gradients on Riemannian submanifolds

Let f̄ be a smooth extension of f. For all v ∈ T_x M:

⟨grad f(x), v⟩_x = Df(x)[v] = Df̄(x)[v] = ⟨grad f̄(x), v⟩

Assume M is a Riemannian submanifold of E. Since ⟨·,·⟩_x coincides with ⟨·,·⟩ on T_x M, by uniqueness we conclude:

grad f(x) = Proj_x(grad f̄(x))

where Proj_x is the orthogonal projector from E to T_x M.

Page 30

A first algorithm: gradient descent

For f: R^n → R: x_{k+1} = x_k − α_k grad f(x_k)

For f: M → R: x_{k+1} = R_{x_k}(−α_k grad f(x_k))

For the analysis, we need to understand f(x_{k+1}):

The composition f ∘ R_x: T_x M → R is defined on a linear space, hence we may Taylor expand it.

Page 31

A first algorithm: gradient descent (2)

x_{k+1} = R_{x_k}(−α_k grad f(x_k))

The composition f ∘ R_x: T_x M → R is defined on a linear space, hence we may Taylor expand it:

f(R_x(v)) = f(R_x(0)) + ⟨grad(f ∘ R_x)(0), v⟩_x + O(‖v‖_x²)
          = f(x) + ⟨grad f(x), v⟩_x + O(‖v‖_x²)

Indeed: D(f ∘ R_x)(0)[v] = Df(R_x(0))[DR_x(0)[v]] = Df(x)[v].
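Putting the pieces together gives Riemannian gradient descent. Below is a minimal sketch of mine (not the slides' code): minimizing f(x) = ½ xᵀAx on the sphere with A = diag(1, …, n), whose minimizer is the eigenvector for the smallest eigenvalue; the fixed step 1/(2‖A‖) is a heuristic choice of mine.

```python
import numpy as np

# Riemannian gradient descent: x_{k+1} = R_{x_k}(-alpha * grad f(x_k)).
n = 6
A = np.diag(np.arange(1.0, n + 1))        # eigenvalues 1..n; minimizer is ±e_1

rgrad = lambda x: A @ x - (x @ A @ x) * x
retr = lambda x, v: (x + v) / np.linalg.norm(x + v)

rng = np.random.default_rng(4)
x = rng.standard_normal(n); x /= np.linalg.norm(x)

alpha = 1.0 / (2 * np.linalg.norm(A, 2))  # heuristic fixed step size
for _ in range(2000):
    x = retr(x, -alpha * rgrad(x))

print(x @ A @ x)   # converges to 1, the smallest eigenvalue of A
```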

Page 32

Gradient descent on M

A1: f(x) ≥ f_low for all x ∈ M.
A2: f(R_x(v)) ≤ f(x) + ⟨v, grad f(x)⟩ + (L/2)‖v‖²

Algorithm: x_{k+1} = R_{x_k}(−(1/L) grad f(x_k))

Complexity: ‖grad f(x_k)‖ ≤ ε for some k ≤ 2L(f(x_0) − f_low)·(1/ε²)

A2 ⇒ f(x_{k+1}) ≤ f(x_k) − (1/L)‖grad f(x_k)‖² + (1/(2L))‖grad f(x_k)‖²
   ⇒ f(x_k) − f(x_{k+1}) ≥ (1/(2L))‖grad f(x_k)‖²
A1 ⇒ f(x_0) − f_low ≥ Σ_{k=0}^{K−1} (f(x_k) − f(x_{k+1})) > (ε²/(2L))·K
   (for contradiction: assume ‖grad f(x_k)‖ > ε for k = 0, …, K−1)

Page 33

Tips and tricks to get the gradient

Use the definition as a starting point: Df(x)[v] = ⋯

Cheap gradient principle.

Numerical check of the gradient: Taylor expand t ↦ f(R_x(tv)).
Manopt: checkgradient(problem)

Automatic differentiation: Python, Julia (not Matlab :/). This being said, for theory we often need to manipulate the gradient "on paper" anyway.
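The numerical Taylor check mentioned above (the idea behind Manopt's checkgradient) can be sketched as follows (my code, not Manopt's): if g really is grad f(x), the residual E(t) = |f(R_x(tv)) − f(x) − t⟨g, v⟩| must shrink like t², i.e., drop by a factor of about 4 each time t is halved.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 7
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
f = lambda y: 0.5 * y @ A @ y
retr = lambda x, v: (x + v) / np.linalg.norm(x + v)
rgrad = lambda x: A @ x - (x @ A @ x) * x        # candidate gradient to test

x = rng.standard_normal(n); x /= np.linalg.norm(x)
v = rng.standard_normal(n); v -= (x @ v) * x
g = rgrad(x)

E = lambda t: abs(f(retr(x, t * v)) - f(x) - t * (g @ v))
for t in [1e-2, 5e-3, 2.5e-3]:
    print(t, E(t))    # successive ratios ≈ 4, i.e., slope 2 on a log-log plot
```

A wrong candidate gradient would make E(t) shrink only like t, which is how the check catches mistakes.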

Page 34

On to second-order methods

Consider f: R^n → R smooth. Taylor says:

f(x + v) ≈ f(x) + ⟨grad f(x), v⟩ + (1/2)⟨v, Hess f(x)[v]⟩

If Hess f(x) ≻ 0, the quadratic model is minimized by the v such that:

Hess f(x)[v] = −grad f(x)

From there, we can construct Newton's method, etc.

Page 35

Towards Hessians: reminders from R^n

Consider f: R^n → R smooth.

The Hessian of f at x is a linear operator which tells us how the gradient vector field of f varies:

Hess f(x)[v] = D(grad f)(x)[v]

With ⟨u, v⟩ = uᵀv, this yields: (Hess f(x))_ij = ∂²f/(∂x_i ∂x_j)(x).

Notice that Hess f(x)[v] is a vector in R^n.

Page 36

A difficulty on manifolds

A smooth vector field V on M is a smooth map: we already have a notion of how to differentiate it.

Example: with f(x) = (1/2)xᵀAx on the sphere,

V(x) = grad f(x) = Proj_x(Ax) = Ax − (xᵀAx)x

DV(x)[u] = ⋯ = Proj_x(Au) − (xᵀAx)u − (xᵀAu)x

Issue: DV(x)[u] may not be a tangent vector at x!
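This failure of tangency is easy to exhibit numerically. A small sketch of mine (not from the slides): the normal component of DV(x)[u] works out to −(xᵀAu)x, which is nonzero in general.

```python
import numpy as np

# V(x) = grad f(x) on the sphere; its classical derivative leaves T_x M.
rng = np.random.default_rng(6)
n = 5
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
proj = lambda x, z: z - (x @ z) * x

x = rng.standard_normal(n); x /= np.linalg.norm(x)
u = rng.standard_normal(n); u -= (x @ u) * x     # a tangent direction

DV = proj(x, A @ u) - (x @ A @ x) * u - (x @ A @ u) * x
print(x @ DV)     # equals -(x^T A u): nonzero in general, so DV(x)[u] is not tangent
```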

Page 37

Connections: a tool to differentiate vector fields

Let X(M) be the set of smooth vector fields on M.
Given U ∈ X(M) and smooth f, (Uf)(x) = Df(x)[U(x)].

A map ∇: X(M) × X(M) → X(M), (U, V) ↦ ∇_U V, is a connection if:
1. ∇_{fU + gW} V = f ∇_U V + g ∇_W V
2. ∇_U (aV + bW) = a ∇_U V + b ∇_U W
3. ∇_U (fV) = (Uf)V + f ∇_U V

Example: for M = E, (∇_U V)(x) = DV(x)[U(x)].
Example: for M ⊆ E, (∇_U V)(x) = Proj_x(DV(x)[U(x)]).

Page 38

Riemannian connections: a unique and favorable choice

Let M be a Riemannian manifold.
Claim: there exists a unique connection ∇ on M such that, additionally:

4. (∇_U V − ∇_V U)f = U(Vf) − V(Uf)
5. U⟨V, W⟩ = ⟨∇_U V, W⟩ + ⟨V, ∇_U W⟩

It is called the Riemannian connection (Levi-Civita).

Claim: if M is a Riemannian submanifold of E, then

(∇_U V)(x) = Proj_x(DV(x)[U(x)])

is the Riemannian connection on M.

Page 39

Riemannian Hessians

Claim: (∇_U V)(x) depends on U only through U(x).
This justifies the notation ∇_u V for u ∈ T_x M; e.g.: ∇_u V = Proj_x(DV(x)[u]).

Define: the Riemannian Hessian of f: M → R at x is the linear operator from T_x M to T_x M defined by:

Hess f(x)[u] = ∇_u grad f

where ∇ is the Riemannian connection.

Claim: Hess f(x) is self-adjoint.
Claim: if x is a local minimum, then Hess f(x) ≽ 0.

Page 40

Hessians on Riemannian submanifolds

Hess f(x)[u] = ∇_u grad f

On a Riemannian submanifold of a linear space,

∇_u V = Proj_x(DV(x)[u])

Combining:

Hess f(x)[u] = Proj_x(D(grad f)(x)[u])

Page 41

Hessians on Riemannian submanifolds (2)

Hess f(x)[u] = Proj_x(D(grad f)(x)[u])

Example: f(x) = (1/2)xᵀAx on the sphere in R^n.

V(x) = grad f(x) = Proj_x(Ax) = Ax − (xᵀAx)x
DV(x)[u] = Proj_x(Au) − (xᵀAx)u − (xᵀAu)x
Hess f(x)[u] = Proj_x(Au − (xᵀAx)u)

Remarkably, grad f(x) = 0 and Hess f(x) ≽ 0 iff x is optimal.
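The Hessian formula above can be validated numerically. A sketch of mine (not from the slides): check self-adjointness on T_x M, and compare against Proj_x(DV(x)[u]) computed by a central difference of the gradient field V:

```python
import numpy as np

# Hess f(x)[u] = Proj_x(A u - (x^T A x) u) on the sphere, f(x) = 0.5 x^T A x.
rng = np.random.default_rng(7)
n = 6
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
proj = lambda x, z: z - (x @ z) * x
hess = lambda x, u: proj(x, A @ u - (x @ A @ x) * u)

x = rng.standard_normal(n); x /= np.linalg.norm(x)
u = rng.standard_normal(n); u -= (x @ u) * x
w = rng.standard_normal(n); w -= (x @ w) * x

# Self-adjoint on T_x M:
print(w @ hess(x, u) - u @ hess(x, w))           # vanishes up to rounding

# Agreement with Proj_x(DV(x)[u]) via a central difference of V(y) = A y - (y^T A y) y:
V = lambda y: A @ y - (y @ A @ y) * y
t = 1e-5
fd = proj(x, (V(x + t * u) - V(x - t * u)) / (2 * t))
print(np.linalg.norm(fd - hess(x, u)))           # small: O(t^2) difference
```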

Page 42

Newton, Taylor and Riemann

Now that we have a Hessian, we might guess:

f(R_x(v)) ≈ m_x(v) = f(x) + ⟨grad f(x), v⟩_x + (1/2)⟨v, Hess f(x)[v]⟩_x

If Hess f(x) is invertible, m_x: T_x M → R has exactly one critical point, the solution of this linear system:

Hess f(x)[v] = −grad f(x), for v ∈ T_x M.

Newton's method on M: x_next = R_x(v).

Claim: if Hess f(x*) ≻ 0, quadratic local convergence.
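A minimal sketch of Newton's method on the sphere (my code, not the slides'): the tangent linear system is solved by forming the Hessian as a matrix restricted to T_x M and taking the minimum-norm least-squares solution, which lies in the tangent space; the start is taken near an eigenvector to show the fast local convergence.

```python
import numpy as np

# Newton on the sphere for f(x) = 0.5 x^T A x: solve Hess f(x)[v] = -grad f(x)
# within T_x M, then retract. Critical points are eigenvectors of A.
rng = np.random.default_rng(8)
n = 8
A = rng.standard_normal((n, n)); A = (A + A.T) / 2

rgrad = lambda x: A @ x - (x @ A @ x) * x

def newton_step(x):
    P = np.eye(n) - np.outer(x, x)                    # projector onto T_x M
    H = P @ (A - (x @ A @ x) * np.eye(n)) @ P         # Hessian as a matrix on T_x M
    v = np.linalg.lstsq(H, -rgrad(x), rcond=None)[0]  # min-norm solution is tangent
    y = x + P @ v
    return y / np.linalg.norm(y)

# Start near an eigenvector of A to observe fast local convergence.
x = np.linalg.eigh(A)[1][:, 0] + 0.001 * rng.standard_normal(n)
x /= np.linalg.norm(x)
for _ in range(8):
    x = newton_step(x)
print(np.linalg.norm(rgrad(x)))   # tiny: x is (numerically) an eigenvector of A
```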

Page 43

We need one more tool…

The truncated expansion

f(R_x(v)) ≈ f(x) + ⟨grad f(x), v⟩_x + (1/2)⟨v, Hess f(x)[v]⟩_x

is not quite correct (well, not always…).

To see why, let's expand f along some smooth curve.

Page 44

We need one more tool… (2)

Let's expand f along some smooth curve.

With c: I → M such that c(0) = x and c′(0) = v, the composition g = f ∘ c maps I → R, so Taylor holds:

g(t) ≈ g(0) + t·g′(0) + (t²/2)·g″(0)

g(0) = f(x)
g′(t) = Df(c(t))[c′(t)] = ⟨grad f(c(t)), c′(t)⟩_{c(t)}
g′(0) = ⟨grad f(x), v⟩_x
g″(0) = ???

Page 45

Differentiating vector fields along curves

A map Z: I → TM is a vector field along c: I → M if Z(t) is a tangent vector at c(t).
Let X(c) denote the set of smooth such fields.

Claim: there exists a unique operator D/dt: X(c) → X(c) such that:
1. D/dt(aZ + bW) = a·(D/dt)Z + b·(D/dt)W
2. D/dt(gZ) = g′Z + g·(D/dt)Z
3. D/dt(U(c(t))) = ∇_{c′(t)} U
4. d/dt ⟨Z(t), W(t)⟩_{c(t)} = ⟨(D/dt)Z(t), W(t)⟩_{c(t)} + ⟨Z(t), (D/dt)W(t)⟩_{c(t)}

where ∇ is the Riemannian connection.

Page 46

Differentiating fields along curves (2)

Claim: there exists a unique operator D/dt: X(c) → X(c) such that:
1. D/dt(aZ + bW) = a·(D/dt)Z + b·(D/dt)W
2. D/dt(gZ) = g′Z + g·(D/dt)Z
3. D/dt(U(c(t))) = ∇_{c′(t)} U
4. d/dt ⟨Z(t), W(t)⟩_{c(t)} = ⟨(D/dt)Z(t), W(t)⟩_{c(t)} + ⟨Z(t), (D/dt)W(t)⟩_{c(t)}

where ∇ is the Riemannian connection.

Claim: if M is a Riemannian submanifold of E,

(D/dt)Z(t) = Proj_{c(t)}( (d/dt)Z(t) )

Page 47

With g(t) = f(c(t)) and g′(t) = ⟨grad f(c(t)), c′(t)⟩_{c(t)}:

g″(t) = ⟨(D/dt) grad f(c(t)), c′(t)⟩_{c(t)} + ⟨grad f(c(t)), (D/dt)c′(t)⟩_{c(t)}
      = ⟨∇_{c′(t)} grad f, c′(t)⟩_{c(t)} + ⟨grad f(c(t)), c″(t)⟩_{c(t)}

g″(0) = ⟨Hess f(x)[v], v⟩_x + ⟨grad f(x), c″(0)⟩_x

(using properties 3 and 4 of D/dt; here c″ denotes (D/dt)c′, the intrinsic acceleration).

Page 48

With g(t) = f(c(t)) and g″(0) = ⟨Hess f(x)[v], v⟩_x + ⟨grad f(x), c″(0)⟩_x as computed above:

f(c(t)) = f(x) + t·⟨grad f(x), v⟩_x + (t²/2)·⟨Hess f(x)[v], v⟩_x + (t²/2)·⟨grad f(x), c″(0)⟩_x + O(t³)

The annoying term vanishes at critical points and for special curves (special retractions). Mostly fine for optimization.
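The expansion, including the acceleration term, can be verified numerically. A sketch of mine on the sphere (not from the slides): the curve c(t) = (x + tv + t²w)/‖x + tv + t²w‖, with v, w tangent at x, has intrinsic acceleration c″(0) = 2w, so g″(0) should equal ⟨Hess f(x)[v], v⟩ + ⟨grad f(x), 2w⟩.

```python
import numpy as np

# g''(0) = <Hess f(x)[v], v>_x + <grad f(x), c''(0)>_x, with f(x) = 0.5 x^T A x.
rng = np.random.default_rng(9)
n = 6
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
f = lambda y: 0.5 * y @ A @ y
proj = lambda x, z: z - (x @ z) * x

x = rng.standard_normal(n); x /= np.linalg.norm(x)
v = proj(x, rng.standard_normal(n))
w = proj(x, rng.standard_normal(n))

def c(t):                      # c(0) = x, c'(0) = v, covariant c''(0) = 2 w
    y = x + t * v + t**2 * w
    return y / np.linalg.norm(y)

t = 1e-4
g2_fd = (f(c(t)) - 2 * f(c(0.0)) + f(c(-t))) / t**2     # finite-difference g''(0)

hess_term = v @ proj(x, A @ v - (x @ A @ x) * v)        # <Hess f(x)[v], v>
accel_term = proj(x, A @ x) @ (2 * w)                   # <grad f(x), c''(0)>
print(g2_fd, hess_term + accel_term)
```

Setting w = 0 recovers the projective retraction's curve, for which the acceleration term drops out: that is one of the "special retractions" the slide alludes to.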

Page 49

Trust-region method: Newton's with a safeguard

With the same tools, we can design a Riemannian trust-region method: RTR (Absil, Baker & Gallivan '07).

It approximately minimizes a quadratic model of f ∘ R_{x_k} restricted to a ball in T_{x_k} M, with a dynamic radius.

Complexity is known. Excellent performance, also with an approximate Hessian (e.g., finite differences of grad f).

In Manopt, call trustregions(problem).

Page 50

In the lecture notes:

• Proofs for all claims in these slides
• References to the (growing) literature
• Short descriptions of applications
• Fully worked out manifolds
  E.g.: Stiefel, fixed-rank matrices, general {x ∈ E : h(x) = 0}
• Details about computation, pitfalls and tricks
  E.g.: how to compute gradients, checkgradient, checkhessian
• Theory for general manifolds
• Theory for quotient manifolds
• Discussion of more advanced geometric tools
  E.g.: distance, exp, log, transports, Lipschitz, finite differences
• Basics of geodesic convexity

nicolasboumal.net/book