An introduction to optimization on manifolds

$\min_{x \in \mathcal{M}} f(x)$

Nicolas Boumal, July 2020
Mathematics @ EPFL
Optimization on smooth manifolds

$\min_x f(x)$ subject to $x \in \mathcal{M}$

• Linear spaces: unconstrained and linear equality constraints.
• Low rank (matrices, tensors): recommender systems, large-scale Lyapunov equations, …
• Orthonormality (Grassmann, Stiefel, rotations): dictionary learning, SfM, SLAM, PCA, ICA, SBM, electronic structure computation, …
• Positivity (positive definiteness, positive orthant): metric learning, Gaussian mixtures, diffusion tensor imaging, …
• Symmetry (quotient manifolds): invariance under group actions.
A Riemannian structure gives us gradients and Hessians

The essential tools of smooth optimization are defined in full generality on Riemannian manifolds: one unified theory, broadly applicable algorithms.

First ideas from the '70s. First practical in the '90s.
manopt.org

• Manopt (Matlab), with Bamdev Mishra, P.-A. Absil & R. Sepulchre
• Pymanopt (Python), led by J. Townsend, N. Koep & S. Weichwald
• Manopt.jl (Julia), led by Ronny Bergmann
nicolasboumal.net/book
What is a smooth manifold?

[Figure: three example sets; the first two qualify as smooth manifolds (yes, yes), the third does not (no).]

This overview: restricted to embedded submanifolds of linear spaces.
$\mathcal{M} \subseteq \mathcal{E}$ is an embedded submanifold of codimension $k$ if there exists a smooth function $h \colon \mathcal{E} \to \mathbb{R}^k$ such that:

$\mathcal{M} = \{x \in \mathcal{E} : h(x) = 0\}$, and $\mathrm{D}h(x) \colon \mathcal{E} \to \mathbb{R}^k$ has rank $k$ for all $x \in \mathcal{M}$.

E.g., the sphere: $h(x) = x^\top x - 1$ and $\mathrm{D}h(x)[v] = 2x^\top v$.

These properties allow us to linearize the set:

$h(x + tv) = h(x) + t\,\mathrm{D}h(x)[v] + O(t^2)$.

This is $O(t^2)$ iff $v \in \ker \mathrm{D}h(x)$.

Tangent spaces: $\mathrm{T}_x\mathcal{M} = \ker \mathrm{D}h(x)$ (a subspace of $\mathcal{E}$).
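As a quick numerical illustration of that last point (a minimal numpy sketch, my own example, not from the slides): on the sphere, $h(x + tv)$ shrinks like $t^2$ precisely when $x^\top v = 0$, i.e. when $v$ is tangent.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
x = rng.standard_normal(n)
x /= np.linalg.norm(x)              # point on the sphere: h(x) = x.T @ x - 1 = 0

h = lambda y: y @ y - 1.0

u = rng.standard_normal(n)          # generic direction: not tangent
v = u - (x @ u) * x                 # tangent direction: x.T @ v = 0

for t in [1e-1, 1e-2, 1e-3]:
    # tangent direction: O(t^2) (shrinks ~100x per row);
    # generic direction: only O(t) (shrinks ~10x per row)
    print(t, abs(h(x + t * v)), abs(h(x + t * u)))
```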
What is a smooth manifold? (2)

$\mathcal{M} \subseteq \mathcal{E}$ is an embedded submanifold of codimension $k$ if:

For each $x \in \mathcal{M}$ there is a neighborhood $U$ of $x$ in $\mathcal{E}$ and a smooth function $h \colon U \to \mathbb{R}^k$ such that:

a) $\mathrm{D}h(x) \colon \mathcal{E} \to \mathbb{R}^k$ has rank $k$, and
b) $\mathcal{M} \cap U = h^{-1}(0) = \{y \in U : h(y) = 0\}$.

(This local version is necessary, e.g., for fixed-rank matrices.)
Bootstrapping our set of tools

1. Smooth maps
2. Differentials of smooth maps
3. Vector fields and tangent bundles
4. Retractions
5. Riemannian metrics
6. Riemannian gradients
7. Riemannian connections
8. Riemannian Hessians
9. Riemannian covariant derivatives along curves
Smooth maps between manifolds

With $\mathcal{M}, \mathcal{M}'$ embedded in $\mathcal{E}, \mathcal{E}'$:

Define: a map $F \colon \mathcal{M} \to \mathcal{M}'$ is smooth if there exists a neighborhood $U$ of $\mathcal{M}$ in $\mathcal{E}$ and a smooth map $\bar{F} \colon U \to \mathcal{E}'$ such that $F = \bar{F}|_{\mathcal{M}}$.

We call $\bar{F}$ a smooth extension of $F$.
Differential of a smooth map

For $\bar{F} \colon \mathcal{E} \to \mathcal{E}'$, we have $\mathrm{D}\bar{F}(x)[v] = \lim_{t \to 0} \frac{\bar{F}(x + tv) - \bar{F}(x)}{t}$.

For $F \colon \mathcal{M} \to \mathcal{M}'$ smooth, $F(x + tv)$ may not be defined. But picking a smooth extension $\bar{F}$, for $v \in \mathrm{T}_x\mathcal{M}$ we say:

Define: $\mathrm{D}F(x)[v] = \lim_{t \to 0} \frac{\bar{F}(x + tv) - \bar{F}(x)}{t} = \mathrm{D}\bar{F}(x)[v]$.

Claim: this does not depend on the choice of $\bar{F}$.
Claim: we retain linearity, the product rule and the chain rule.
Differential of a smooth map (2)

We defined $\mathrm{D}F(x) = \mathrm{D}\bar{F}(x)|_{\mathrm{T}_x\mathcal{M}}$. Equivalently:

Claim: for each $v \in \mathrm{T}_x\mathcal{M}$, there exists a smooth curve $c \colon I \to \mathcal{M}$ such that $c(0) = x$ and $c'(0) = v$.

Define: $\mathrm{D}F(x)[v] = \lim_{t \to 0} \frac{F(c(t)) - F(x)}{t} = (F \circ c)'(0)$.
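The two definitions agree, and that is easy to check numerically (an illustrative numpy sketch; the map $F(x) = xx^\top$ on the sphere is my choice, not from the slides). The extension $\bar{F}$ uses the same formula on all of $\mathbb{R}^n$, so $\mathrm{D}\bar{F}(x)[v] = xv^\top + vx^\top$; differentiating through a curve on the sphere gives the same operator.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
x = rng.standard_normal(n); x /= np.linalg.norm(x)     # point on the sphere
u = rng.standard_normal(n)
v = u - (x @ u) * x                                    # tangent vector at x

F = lambda y: np.outer(y, y)                           # F(x) = x x^T; extension: same formula
DF_ext = np.outer(x, v) + np.outer(v, x)               # D F_bar(x)[v] = x v^T + v x^T

c = lambda t: (x + t * v) / np.linalg.norm(x + t * v)  # smooth curve: c(0) = x, c'(0) = v
t = 1e-6
DF_curve = (F(c(t)) - F(c(-t))) / (2 * t)              # (F o c)'(0) by central differences

print(np.max(np.abs(DF_curve - DF_ext)))               # tiny: both definitions agree
```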
Vector fields and the tangent bundle

A vector field $V$ ties to each $x$ a vector $V(x) \in \mathrm{T}_x\mathcal{M}$. What does it mean for $V$ to be smooth?

Define: the tangent bundle of $\mathcal{M}$ is the disjoint union of all tangent spaces:

$\mathrm{T}\mathcal{M} = \{(x, v) : x \in \mathcal{M} \text{ and } v \in \mathrm{T}_x\mathcal{M}\}$

Claim: $\mathrm{T}\mathcal{M}$ is a manifold (embedded in $\mathcal{E} \times \mathcal{E}$).

$V$ is smooth if it is smooth as a map from $\mathcal{M}$ to $\mathrm{T}\mathcal{M}$.
Aside: new manifolds from old ones

Given a manifold $\mathcal{M}$, we can create a new manifold by considering its tangent bundle. Here are other ways to recycle:

• Products of manifolds: $\mathcal{M} \times \mathcal{M}'$, $\mathcal{M}^n$
• Open subsets of manifolds*
• (Quotienting some equivalence relations.)

*For embedded submanifolds: with the subset topology.
Retractions: moving around

Given $(x, v) \in \mathrm{T}\mathcal{M}$, we can move away from $x$ along $v$ using any $c \colon I \to \mathcal{M}$ with $c(0) = x$ and $c'(0) = v$. Retractions are a smooth choice of such curves over $\mathrm{T}\mathcal{M}$.

Define: a retraction is a smooth map $R \colon \mathrm{T}\mathcal{M} \to \mathcal{M}$ such that if $c(t) = R(x, tv) = R_x(tv)$, then $c(0) = x$ and $c'(0) = v$.
Retractions: moving around (2)

Equivalently: $R_x(0) = x$ and $\mathrm{D}R_x(0) = \mathrm{Id}$.

Typical choice: $R_x(v) =$ projection of $x + v$ to $\mathcal{M}$:

• $\mathcal{M} = \mathbb{R}^n$: $R_x(v) = x + v$
• $\mathcal{M} = S^{n-1}$: $R_x(v) = \frac{x + v}{\|x + v\|}$
• $\mathcal{M} = \mathbb{R}_r^{m \times n}$ (rank-$r$ matrices): $R_X(V) = \mathrm{SVD}_r(X + V)$ (truncated SVD)
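In code, these projection retractions are short. A minimal numpy sketch (function names are mine):

```python
import numpy as np

def retr_sphere(x, v):
    """Projection retraction on the sphere: R_x(v) = (x + v) / ||x + v||."""
    y = x + v
    return y / np.linalg.norm(y)

def retr_fixed_rank(X, V, r):
    """Retraction on rank-r matrices: R_X(V) = SVD_r(X + V), the best rank-r
    approximation of X + V (in practice one exploits structure in X and V)."""
    U, s, Vt = np.linalg.svd(X + V, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

# Check the defining property: R_x(0) = x and t -> R_x(tv) has velocity v at t = 0.
x = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 0.5, 0.0])                        # tangent at x since x.T @ v = 0
t = 1e-6
print(retr_sphere(x, 0 * v))                         # = x
print((retr_sphere(x, t * v) - retr_sphere(x, -t * v)) / (2 * t))  # close to v
```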
Towards gradients: reminders from $\mathbb{R}^n$

The gradient of a smooth $f \colon \mathbb{R}^n \to \mathbb{R}$ at $x$ is defined by:

$\mathrm{D}f(x)[v] = \langle \mathrm{grad} f(x), v \rangle$ for all $v \in \mathbb{R}^n$.

In particular, with the inner product $\langle u, v \rangle = u^\top v$:

$(\mathrm{grad} f(x))_i = \langle \mathrm{grad} f(x), e_i \rangle = \mathrm{D}f(x)[e_i] = \frac{\partial f}{\partial x_i}(x)$.

Note: $\mathrm{grad} f$ is a smooth vector field on $\mathbb{R}^n$.
Riemannian metrics

$\mathrm{T}_x\mathcal{M}$ is a linear space (a subspace of $\mathcal{E}$). Pick an inner product $\langle \cdot, \cdot \rangle_x$ for each $\mathrm{T}_x\mathcal{M}$.

Define: $\langle \cdot, \cdot \rangle_x$ defines a Riemannian metric on $\mathcal{M}$ if for any two smooth vector fields $V, W$ the function $x \mapsto \langle V(x), W(x) \rangle_x$ is smooth.

A Riemannian manifold is a manifold with a Riemannian metric.
Riemannian submanifolds

Let $\langle \cdot, \cdot \rangle$ be the inner product on $\mathcal{E}$. Since $\mathrm{T}_x\mathcal{M}$ is a linear subspace of $\mathcal{E}$, one choice is:

$\langle u, v \rangle_x = \langle u, v \rangle$

Claim: this defines a Riemannian metric on $\mathcal{M}$.

With this metric, $\mathcal{M}$ is a Riemannian submanifold of $\mathcal{E}$.
Riemannian gradient

Let $f \colon \mathcal{M} \to \mathbb{R}$ be smooth on a Riemannian manifold.

Define: the Riemannian gradient of $f$ at $x$ is the unique tangent vector $\mathrm{grad} f(x)$ at $x$ such that:

$\mathrm{D}f(x)[v] = \langle \mathrm{grad} f(x), v \rangle_x$ for all $v \in \mathrm{T}_x\mathcal{M}$

Claim: $\mathrm{grad} f$ is a smooth vector field.
Claim: if $x$ is a local optimum of $f$, then $\mathrm{grad} f(x) = 0$.
Gradients on Riemannian submanifolds

Let $\bar{f}$ be a smooth extension of $f$. For all $v \in \mathrm{T}_x\mathcal{M}$:

$\langle \mathrm{grad} f(x), v \rangle_x = \mathrm{D}f(x)[v] = \mathrm{D}\bar{f}(x)[v] = \langle \mathrm{grad} \bar{f}(x), v \rangle$

Assume $\mathcal{M}$ is a Riemannian submanifold of $\mathcal{E}$. Since $\langle \cdot, \cdot \rangle_x = \langle \cdot, \cdot \rangle$, by uniqueness we conclude:

$\mathrm{grad} f(x) = \mathrm{Proj}_x\!\big(\mathrm{grad} \bar{f}(x)\big)$

where $\mathrm{Proj}_x$ is the orthogonal projector from $\mathcal{E}$ to $\mathrm{T}_x\mathcal{M}$.
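Concretely, on the sphere with $\bar{f}(x) = \frac{1}{2}x^\top A x$ ($A$ symmetric), we have $\mathrm{grad}\bar{f}(x) = Ax$, so $\mathrm{grad} f(x) = \mathrm{Proj}_x(Ax) = Ax - (x^\top A x)\,x$. A minimal numpy sketch (my example, reused below):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
A = rng.standard_normal((n, n)); A = (A + A.T) / 2   # symmetric: grad of 0.5 x.T A x is A x

proj = lambda x, u: u - (x @ u) * x                  # orthogonal projector onto T_x S^{n-1}
grad = lambda x: proj(x, A @ x)                      # Riemannian gradient: Proj_x(A x)

x = rng.standard_normal(n); x /= np.linalg.norm(x)
print(x @ grad(x))                                   # close to 0: the gradient is tangent at x
```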
A first algorithm: gradient descent

For $f \colon \mathbb{R}^n \to \mathbb{R}$: $x_{k+1} = x_k - \alpha_k\, \mathrm{grad} f(x_k)$.
For $f \colon \mathcal{M} \to \mathbb{R}$: $x_{k+1} = R_{x_k}\!\big({-\alpha_k}\, \mathrm{grad} f(x_k)\big)$.

For the analysis, we need to understand $f(x_{k+1})$. The composition $f \circ R_x \colon \mathrm{T}_x\mathcal{M} \to \mathbb{R}$ is defined on a linear space, hence we may Taylor expand it:

$f(R_x(v)) = f(R_x(0)) + \langle \mathrm{grad}(f \circ R_x)(0), v \rangle_x + O(\|v\|_x^2) = f(x) + \langle \mathrm{grad} f(x), v \rangle_x + O(\|v\|_x^2)$

Indeed, $\mathrm{D}(f \circ R_x)(0)[v] = \mathrm{D}f(R_x(0))\big[\mathrm{D}R_x(0)[v]\big] = \mathrm{D}f(x)[v]$.
Gradient descent on $\mathcal{M}$

A1: $f(x) \geq f_{\mathrm{low}}$ for all $x \in \mathcal{M}$.
A2: $f(R_x(v)) \leq f(x) + \langle v, \mathrm{grad} f(x) \rangle_x + \frac{L}{2}\|v\|_x^2$.

Algorithm: $x_{k+1} = R_{x_k}\!\big({-\tfrac{1}{L}}\, \mathrm{grad} f(x_k)\big)$.

Complexity: $\|\mathrm{grad} f(x_k)\|_{x_k} \leq \varepsilon$ for some $k \leq 2L\big(f(x_0) - f_{\mathrm{low}}\big)\frac{1}{\varepsilon^2}$.

Proof sketch. A2 gives

$f(x_{k+1}) \leq f(x_k) - \frac{1}{L}\|\mathrm{grad} f(x_k)\|_{x_k}^2 + \frac{1}{2L}\|\mathrm{grad} f(x_k)\|_{x_k}^2$,

hence $f(x_k) - f(x_{k+1}) \geq \frac{1}{2L}\|\mathrm{grad} f(x_k)\|_{x_k}^2$.

Suppose (for contradiction) that $\|\mathrm{grad} f(x_k)\|_{x_k} > \varepsilon$ for $k = 0, \ldots, K-1$. Then

$f(x_0) - f_{\mathrm{low}} \geq \sum_{k=0}^{K-1} \big(f(x_k) - f(x_{k+1})\big) > \frac{\varepsilon^2}{2L} K$.
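Putting the pieces together, here is the whole method for $f(x) = \frac{1}{2}x^\top A x$ on the sphere (a minimal sketch under my own choices: a fixed step $1/L$ with $L$ of the order of $\|A\|_2$, which behaves well in practice here, though this is not a tuned implementation). Critical points satisfy $Ax = (x^\top A x)\,x$, i.e. they are eigenvectors of $A$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
A = rng.standard_normal((n, n)); A = (A + A.T) / 2

f    = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x - (x @ A @ x) * x             # Proj_x(A x)
retr = lambda x, v: (x + v) / np.linalg.norm(x + v)  # projection retraction

x = rng.standard_normal(n); x /= np.linalg.norm(x)
alpha = 1.0 / np.linalg.norm(A, 2)                   # fixed step of order 1/L

for k in range(2000):
    g = grad(x)
    if np.linalg.norm(g) < 1e-8:                     # stop when nearly critical
        break
    x = retr(x, -alpha * g)

# At a critical point, A x = (x.T A x) x: x is an eigenvector of A.
print(k, f(x), np.linalg.norm(grad(x)))
```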
Tips and tricks to get the gradient

• Use the definition as a starting point: $\mathrm{D}f(x)[v] = \cdots$
• Cheap gradient principle.
• Numerical check of the gradient via Taylor expansion of $t \mapsto f(R_x(tv))$ (sketched below). In Manopt: checkgradient(problem).
• Automatic differentiation: Python, Julia. Not Matlab :/

This being said, for theory, one often needs to manipulate the gradient "on paper" anyway.
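The Taylor-based check is a few lines of numpy (a sketch in the spirit of Manopt's checkgradient, not its actual implementation): with a correct gradient, $|f(R_x(tv)) - f(x) - t\langle \mathrm{grad} f(x), v \rangle_x|$ decays like $t^2$; with a wrong one, only like $t$.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 8
A = rng.standard_normal((n, n)); A = (A + A.T) / 2

f    = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x - (x @ A @ x) * x             # candidate Riemannian gradient
retr = lambda x, v: (x + v) / np.linalg.norm(x + v)

x = rng.standard_normal(n); x /= np.linalg.norm(x)
u = rng.standard_normal(n)
v = u - (x @ u) * x                                  # random tangent direction

for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    err = abs(f(retr(x, t * v)) - f(x) - t * (grad(x) @ v))
    print(t, err)                # shrinks ~100x per row: slope 2 on a log-log plot
```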
On to second-order methods

Consider $f \colon \mathbb{R}^n \to \mathbb{R}$ smooth. Taylor says:

$f(x + v) \approx f(x) + \langle \mathrm{grad} f(x), v \rangle + \frac{1}{2}\langle v, \mathrm{Hess} f(x)[v] \rangle$

If $\mathrm{Hess} f(x) \succ 0$, the quadratic model is minimized for $v$ such that:

$\mathrm{Hess} f(x)[v] = -\mathrm{grad} f(x)$

From there, we can construct Newton's method etc.
Towards Hessians: reminders from $\mathbb{R}^n$

Consider $f \colon \mathbb{R}^n \to \mathbb{R}$ smooth. The Hessian of $f$ at $x$ is a linear operator which tells us how the gradient vector field of $f$ varies:

$\mathrm{Hess} f(x)[v] = \mathrm{D}\,\mathrm{grad} f(x)[v]$

With $\langle u, v \rangle = u^\top v$, this yields $\big(\mathrm{Hess} f(x)[e_j]\big)_i = \frac{\partial^2 f}{\partial x_i \partial x_j}(x)$.

Notice that $\mathrm{Hess} f(x)[v]$ is a vector in $\mathbb{R}^n$.
A difficulty on manifolds

A smooth vector field $V$ on $\mathcal{M}$ is a smooth map: we already have a notion of how to differentiate it.

Example: with $f(x) = \frac{1}{2}x^\top A x$ on the sphere,

$V(x) = \mathrm{grad} f(x) = \mathrm{Proj}_x(Ax) = Ax - (x^\top A x)\,x$
$\mathrm{D}V(x)[u] = \cdots = \mathrm{Proj}_x(Au) - (x^\top A x)\,u - (x^\top A u)\,x$

Issue: $\mathrm{D}V(x)[u]$ may not be a tangent vector at $x$!
Connections: a tool to differentiate vector fields

Let $\mathfrak{X}(\mathcal{M})$ be the set of smooth vector fields on $\mathcal{M}$. Given $U \in \mathfrak{X}(\mathcal{M})$ and smooth $f$, let $(Uf)(x) = \mathrm{D}f(x)[U(x)]$.

A map $\nabla \colon \mathfrak{X}(\mathcal{M}) \times \mathfrak{X}(\mathcal{M}) \to \mathfrak{X}(\mathcal{M})$ is a connection if:

1. $\nabla_{fU + gV} W = f \nabla_U W + g \nabla_V W$
2. $\nabla_U (aV + bW) = a \nabla_U V + b \nabla_U W$
3. $\nabla_U (fV) = (Uf)\,V + f \nabla_U V$

Example: for $\mathcal{M} = \mathcal{E}$, $(\nabla_U V)(x) = \mathrm{D}V(x)[U(x)]$.
Example: for $\mathcal{M} \subseteq \mathcal{E}$, $(\nabla_U V)(x) = \mathrm{Proj}_x\!\big(\mathrm{D}V(x)[U(x)]\big)$.
Riemannian connections: a unique and favorable choice

Let $\mathcal{M}$ be a Riemannian manifold.

Claim: there exists a unique connection $\nabla$ on $\mathcal{M}$ such that, in addition:

4. $(\nabla_U V - \nabla_V U)f = U(Vf) - V(Uf)$ (symmetry)
5. $U\langle V, W \rangle = \langle \nabla_U V, W \rangle + \langle V, \nabla_U W \rangle$ (compatibility with the metric)

It is called the Riemannian connection (Levi-Civita).

Claim: if $\mathcal{M}$ is a Riemannian submanifold of $\mathcal{E}$, then $(\nabla_U V)(x) = \mathrm{Proj}_x\!\big(\mathrm{D}V(x)[U(x)]\big)$ is the Riemannian connection on $\mathcal{M}$.
Riemannian Hessians

Claim: $(\nabla_U V)(x)$ depends on $U$ only through $U(x)$. This justifies the notation $\nabla_u V$ for $u \in \mathrm{T}_x\mathcal{M}$; e.g.: $\nabla_u V = \mathrm{Proj}_x\!\big(\mathrm{D}V(x)[u]\big)$.

Define: the Riemannian Hessian of $f \colon \mathcal{M} \to \mathbb{R}$ at $x$ is the linear operator from $\mathrm{T}_x\mathcal{M}$ to $\mathrm{T}_x\mathcal{M}$ defined by:

$\mathrm{Hess} f(x)[u] = \nabla_u\, \mathrm{grad} f$

where $\nabla$ is the Riemannian connection.

Claim: $\mathrm{Hess} f(x)$ is self-adjoint.
Claim: if $x$ is a local minimum, then $\mathrm{Hess} f(x) \succeq 0$.
Hessians on Riemannian submanifolds

$\mathrm{Hess} f(x)[u] = \nabla_u\, \mathrm{grad} f$

On a Riemannian submanifold of a linear space,

$\nabla_u V = \mathrm{Proj}_x\!\big(\mathrm{D}V(x)[u]\big)$

Combining:

$\mathrm{Hess} f(x)[u] = \mathrm{Proj}_x\!\big(\mathrm{D}\,\mathrm{grad} f(x)[u]\big)$
Hessians on Riemannian submanifolds (2)

$\mathrm{Hess} f(x)[u] = \mathrm{Proj}_x\!\big(\mathrm{D}\,\mathrm{grad} f(x)[u]\big)$

Example: $f(x) = \frac{1}{2}x^\top A x$ on the sphere in $\mathbb{R}^n$.

$V(x) = \mathrm{grad} f(x) = \mathrm{Proj}_x(Ax) = Ax - (x^\top A x)\,x$
$\mathrm{D}V(x)[u] = \mathrm{Proj}_x(Au) - (x^\top A x)\,u - (x^\top A u)\,x$
$\mathrm{Hess} f(x)[u] = \mathrm{Proj}_x(Au) - (x^\top A x)\,u$

Remarkably, $\mathrm{grad} f(x) = 0$ and $\mathrm{Hess} f(x) \succeq 0$ iff $x$ is optimal.
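A quick numerical companion to this example (a numpy sketch; the eigendecomposition setup is mine): at the unit eigenvector $x$ for the smallest eigenvalue $\lambda_1$, the remaining eigenvectors span $\mathrm{T}_x\mathcal{M}$, and on them the Hessian acts with eigenvalues $\lambda_i - \lambda_1 \geq 0$, consistent with the optimality claim.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6
A = rng.standard_normal((n, n)); A = (A + A.T) / 2

proj = lambda x, u: u - (x @ u) * x
hess = lambda x, u: proj(x, A @ u) - (x @ A @ x) * u   # Hess f(x)[u] for tangent u

w, V = np.linalg.eigh(A)
x = V[:, 0]                          # unit eigenvector for the smallest eigenvalue w[0]

print(np.linalg.norm(A @ x - (x @ A @ x) * x))   # close to 0: grad f(x) = 0 at x
for i in range(1, n):
    u = V[:, i]                      # tangent at x since x.T @ u = 0
    print(u @ hess(x, u))            # equals w[i] - w[0] >= 0: Hessian PSD on T_x M
```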
Newton, Taylor and Riemann

Now that we have a Hessian, we might guess:

$f(R_x(v)) \approx m_x(v) = f(x) + \langle \mathrm{grad} f(x), v \rangle_x + \frac{1}{2}\langle v, \mathrm{Hess} f(x)[v] \rangle_x$

If $\mathrm{Hess} f(x)$ is invertible, $m_x \colon \mathrm{T}_x\mathcal{M} \to \mathbb{R}$ has exactly one critical point, the solution of this linear system:

$\mathrm{Hess} f(x)[v] = -\mathrm{grad} f(x)$ for $v \in \mathrm{T}_x\mathcal{M}$

Newton's method on $\mathcal{M}$: $x_{\mathrm{next}} = R_x(v)$.

Claim: if $\mathrm{Hess} f(x^*) \succ 0$, quadratic local convergence.
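A sketch of this Newton iteration on the sphere, again for $f(x) = \frac{1}{2}x^\top A x$ (my setup; it assumes the Hessian is invertible on the tangent space along the way, and it may converge to any eigenvector, not necessarily a minimizer). One convenient trick: adding $xx^\top$ to the Hessian matrix makes the system invertible on all of $\mathbb{R}^n$ without changing the tangent solution.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 6
A = rng.standard_normal((n, n)); A = (A + A.T) / 2

grad = lambda x: A @ x - (x @ A @ x) * x
retr = lambda x, v: (x + v) / np.linalg.norm(x + v)

x = rng.standard_normal(n); x /= np.linalg.norm(x)

for k in range(20):
    g = grad(x)
    if np.linalg.norm(g) < 1e-12:
        break
    P = np.eye(n) - np.outer(x, x)                # projector onto T_x M
    H = P @ (A - (x @ A @ x) * np.eye(n)) @ P     # Hess f(x) as a matrix, zero on span(x)
    v = np.linalg.solve(H + np.outer(x, x), -g)   # tangent solution of Hess f(x)[v] = -g
    x = retr(x, v)

print(k, np.linalg.norm(grad(x)))                 # typically tiny after a few iterations
```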
We need one more tool…

The truncated expansion

$f(R_x(v)) \approx f(x) + \langle \mathrm{grad} f(x), v \rangle_x + \frac{1}{2}\langle v, \mathrm{Hess} f(x)[v] \rangle_x$

is not quite correct (well, not always…).

To see why, let's expand $f$ along some smooth curve.
We need one more tool… (2)

Let's expand $f$ along some smooth curve. With $c \colon I \to \mathcal{M}$ such that $c(0) = x$ and $c'(0) = v$, the composition $g = f \circ c$ maps $I \to \mathbb{R}$, so Taylor holds:

$g(t) \approx g(0) + t\,g'(0) + \frac{t^2}{2} g''(0)$

$g(0) = f(x)$
$g'(t) = \mathrm{D}f(c(t))[c'(t)] = \langle \mathrm{grad} f(c(t)), c'(t) \rangle_{c(t)}$
$g'(0) = \langle \mathrm{grad} f(x), v \rangle_x$
$g''(0) = \,???$
Differentiating vector fields along curves

A map $Z \colon I \to \mathrm{T}\mathcal{M}$ is a vector field along $c \colon I \to \mathcal{M}$ if $Z(t)$ is a tangent vector at $c(t)$. Let $\mathfrak{X}(c)$ denote the set of smooth such fields.

Claim: there exists a unique operator $\frac{\mathrm{D}}{\mathrm{d}t} \colon \mathfrak{X}(c) \to \mathfrak{X}(c)$ such that:

1. $\frac{\mathrm{D}}{\mathrm{d}t}(aY + bZ) = a\frac{\mathrm{D}}{\mathrm{d}t}Y + b\frac{\mathrm{D}}{\mathrm{d}t}Z$
2. $\frac{\mathrm{D}}{\mathrm{d}t}(gZ) = g'Z + g\frac{\mathrm{D}}{\mathrm{d}t}Z$
3. $\frac{\mathrm{D}}{\mathrm{d}t}\big(U(c(t))\big) = \nabla_{c'(t)} U$
4. $\frac{\mathrm{d}}{\mathrm{d}t}\langle Y(t), Z(t) \rangle_{c(t)} = \big\langle \frac{\mathrm{D}}{\mathrm{d}t}Y(t), Z(t) \big\rangle_{c(t)} + \big\langle Y(t), \frac{\mathrm{D}}{\mathrm{d}t}Z(t) \big\rangle_{c(t)}$

where $\nabla$ is the Riemannian connection.
Differentiating fields along curves (2)

The four properties above pin down $\frac{\mathrm{D}}{\mathrm{d}t}$.

Claim: if $\mathcal{M}$ is a Riemannian submanifold of $\mathcal{E}$, then

$\frac{\mathrm{D}}{\mathrm{d}t} Z(t) = \mathrm{Proj}_{c(t)}\!\left(\frac{\mathrm{d}}{\mathrm{d}t} Z(t)\right)$

where $\frac{\mathrm{d}}{\mathrm{d}t}$ is the classical derivative in $\mathcal{E}$.
With $g(t) = f(c(t))$ and $g'(t) = \langle \mathrm{grad} f(c(t)), c'(t) \rangle_{c(t)}$:

$g''(t) = \big\langle \frac{\mathrm{D}}{\mathrm{d}t}\,\mathrm{grad} f(c(t)), c'(t) \big\rangle_{c(t)} + \big\langle \mathrm{grad} f(c(t)), \frac{\mathrm{D}}{\mathrm{d}t} c'(t) \big\rangle_{c(t)}$
$\phantom{g''(t)} = \big\langle \nabla_{c'(t)}\,\mathrm{grad} f, c'(t) \big\rangle_{c(t)} + \big\langle \mathrm{grad} f(c(t)), c''(t) \big\rangle_{c(t)}$

$g''(0) = \langle \mathrm{Hess} f(x)[v], v \rangle_x + \langle \mathrm{grad} f(x), c''(0) \rangle_x$
Combining, for any smooth curve $c$ on $\mathcal{M}$ with $c(0) = x$ and $c'(0) = v$:

$f(c(t)) = f(x) + t\,\langle \mathrm{grad} f(x), v \rangle_x + \frac{t^2}{2}\langle \mathrm{Hess} f(x)[v], v \rangle_x + \frac{t^2}{2}\langle \mathrm{grad} f(x), c''(0) \rangle_x + O(t^3)$

The annoying term $\frac{t^2}{2}\langle \mathrm{grad} f(x), c''(0) \rangle_x$ vanishes at critical points and for special curves (special retractions). Mostly fine for optimization.
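The sphere's projection retraction is one of these special cases: the curve $c(t) = R_x(tv)$ has acceleration $c''(0) = -\|v\|^2 x$, which is orthogonal to $\mathrm{T}_x\mathcal{M}$, so $\langle \mathrm{grad} f(x), c''(0) \rangle_x = 0$ for every $f$. A small finite-difference check (my sketch, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5
x = rng.standard_normal(n); x /= np.linalg.norm(x)
u = rng.standard_normal(n)
v = u - (x @ u) * x                                    # tangent at x

c = lambda t: (x + t * v) / np.linalg.norm(x + t * v)  # c(t) = R_x(tv) on the sphere

t = 1e-4
acc = (c(t) - 2 * c(0) + c(-t)) / t**2                 # classical c''(0), central differences
print(np.max(np.abs(acc + (v @ v) * x)))               # small: c''(0) = -||v||^2 x (radial)
print(np.linalg.norm(acc - (x @ acc) * x))             # small: tangential part of c''(0) vanishes
```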
Trust-region method: Newton's with a safeguard

With the same tools, we can design a Riemannian trust-region method: RTR (Absil, Baker & Gallivan '07).

It approximately minimizes a quadratic model of $f \circ R_{x_k}$ restricted to a ball in $\mathrm{T}_{x_k}\mathcal{M}$, with a dynamically adjusted radius.

Complexity is known. Excellent performance, also with an approximate Hessian (e.g., finite differences of $\mathrm{grad} f$).

In Manopt, call trustregions(problem).
In the lecture notes:

• Proofs for all claims in these slides
• References to the (growing) literature
• Short descriptions of applications
• Fully worked-out manifolds, e.g., Stiefel, fixed-rank matrices, general $\{x \in \mathcal{E} : h(x) = 0\}$
• Details about computation, pitfalls and tricks, e.g., how to compute gradients, checkgradient, checkhessian
• Theory for general manifolds
• Theory for quotient manifolds
• Discussion of more advanced geometric tools, e.g., distance, exp, log, transports, Lipschitz conditions, finite differences
• Basics of geodesic convexity
nicolasboumal.net/book