Low Rank Approximation, Lecture 9
Daniel Kressner
Chair for Numerical Algorithms and HPC
Institute of Mathematics, [email protected]
1
Manifold optimization
General setting: Aim at solving the optimization problem

    min_{X ∈ M_r} f(X),

where M_r is a manifold of rank-r matrices or tensors.
Goal: Modify classical optimization algorithms (line search, Newton, quasi-Newton, ...) to produce iterates that stay on M_r.
Advantages over ALS:
- No need to solve subproblems, at least for first-order methods;
- Can draw on concepts from classical smooth optimization (line search strategies, convergence analysis, ...).
Two valuable resources:
- Absil/Mahony/Sepulchre: Optimization Algorithms on Matrix Manifolds. PUP, 2008. Available from https://press.princeton.edu/absil
- Manopt, a Matlab toolbox for optimization on manifolds. Available from https://manopt.org/
2
Manifolds
For open sets U ⊂ M, V ⊂ R^d, a chart is a bijective function ϕ : U → V.
An atlas of M into R^d is a collection of charts (U_α, ϕ_α) such that:
- ⋃_α U_α = M;
- for any α, β with U_α ∩ U_β ≠ ∅, the change of coordinates

      ϕ_β ∘ ϕ_α^{-1} : R^d → R^d

  is smooth (C^∞) on its domain ϕ_α(U_α ∩ U_β).
Illustration taken from Wikipedia.
3
Manifolds

In the following, we assume that the atlas is maximal. A proper definition of a smooth manifold M needs further properties (the topology induced by the maximal atlas is Hausdorff and second-countable). See [Lee’2003] and [Absil et al.’2008].
Properties of M:
- finite-dimensional vector spaces are always manifolds;
- d = dimension of M;
- M does not need to be connected (in the context of smooth optimization it makes sense to consider connected manifolds only);
- a function f : M → R is differentiable at a point x ∈ M if and only if

      f ∘ ϕ^{-1} : ϕ(U) ⊂ R^d → R

  is differentiable at ϕ(x) for some chart (U, ϕ) with x ∈ U.
4
Manifolds: First examples

Lemma
Let M be a smooth manifold and N ⊂ M an open subset. Then N is a smooth manifold (of equal dimension).
Proof: Given an atlas for M, obtain an atlas for N by selecting the charts (U, ϕ) with U ⊂ N.
Example: GL(n, R), the set of real invertible n × n matrices, is a smooth manifold.
EFY. Show that R^{m×n}_*, the set of real m × n matrices of full rank min{m, n}, is a smooth manifold.
EFY. Show that the set of n × n symmetric positive definite matrices is a smooth manifold.
Two main classes of matrix manifolds:
- embedded submanifolds of R^{m×n};
  Example: Stiefel manifold of orthonormal bases.
- quotient manifolds;
  Example: Grassmann manifold R^{m×n}_* / GL(n, R).
Will focus on embedded submanifolds (much easier to work with).
5
Immersions and submersions

Let M_1, M_2 be smooth manifolds and F : M_1 → M_2. Let x ∈ M_1 and y = F(x) ∈ M_2. Choose charts ϕ_1, ϕ_2 around x, y. Then the coordinate representation of F is given by

    F̄ := ϕ_2 ∘ F ∘ ϕ_1^{-1} : R^{d_1} → R^{d_2}.

- F is called smooth if F̄ is smooth (that is, C^∞).
- The rank of F at x ∈ M_1 is defined as the rank of DF̄(ϕ_1(x)) (the Jacobian of F̄ at ϕ_1(x)).
- F is called an immersion if its rank equals d_1 at every x ∈ M_1.
- F is called a submersion if its rank equals d_2 at every x ∈ M_1.
6
Embedded submanifolds
A subset N ⊂ M is called an embedded submanifold of dimension k in M if for each point p ∈ N there is a chart (U, ϕ) in M such that all elements of U ∩ N are obtained by varying the first k coordinates only. (See Chapter 5 of [Lee’2003] for more details.)

Theorem
Let M, N be smooth manifolds and let F : M → N be a smooth map with constant rank ℓ. Then each level set

    F^{-1}(y) := {x ∈ M : F(x) = y}

is a closed embedded submanifold of codimension ℓ in M.

Corollaries:
- If F : M → N is a submersion then each level set is a closed embedded submanifold of codimension equal to the dimension of N.
- In fact, by the open submanifold lemma, one only needs to check the full rank condition of a submersion for points in the level set (replace M by the open set on which F has full rank).
7
The Stiefel manifold

For m ≥ n, consider the set of all m × n matrices with orthonormal columns:

    St(m, n) := {X ∈ R^{m×n} : X^T X = I_n}.

Corollary
St(m, n) is an embedded submanifold of R^{m×n}.
Proof: Define F : R^{m×n} → symm(n) as F : X ↦ X^T X, where symm(n) denotes the set of n × n symmetric matrices. At X ∈ St(m, n), consider the Jacobian

    DF(X) : H ↦ X^T H + H^T X.

Given symmetric Y ∈ R^{n×n}, set H = XY/2. Then DF(X)[H] = Y; thus DF(X) is surjective.
EFY. What is the dimension of the Stiefel manifold?
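The surjectivity argument is easy to verify numerically. A small NumPy sketch (an illustration, not part of the slides) checks that H = XY/2 is a preimage of an arbitrary symmetric Y under DF(X):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 3

# A point on St(m, n): orthonormal columns via QR.
X, _ = np.linalg.qr(rng.standard_normal((m, n)))

def DF(X, H):
    # Jacobian of F(X) = X^T X applied to a direction H.
    return X.T @ H + H.T @ X

# For any symmetric Y, H = X Y / 2 is a preimage under DF(X),
# since X^T (XY/2) + (XY/2)^T X = Y/2 + Y^T/2 = Y.
Y = rng.standard_normal((n, n))
Y = (Y + Y.T) / 2
H = X @ Y / 2
assert np.allclose(DF(X, H), Y)   # DF(X) is surjective onto symm(n)
```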
8
The manifold of rank-k matrices

The locality of the definition of embedded submanifolds implies the following lemma (Lemma 5.5 in [Lee’2003]).

Lemma
Let N be a subset of a smooth manifold M. Suppose every point p ∈ N has a neighborhood U ⊂ M such that U ∩ N is an embedded submanifold of U. Then N is an embedded submanifold of M.

Theorem
Given m ≥ n, the set

    M_k = {A ∈ R^{m×n} : rank(A) = k}

is an embedded submanifold of R^{m×n} for every 0 ≤ k ≤ n.
9
The manifold of rank-k matrices

Choose an arbitrary A_0 ∈ M_k. After a suitable permutation, we may assume w.l.o.g. that

    A_0 = [A_11, A_12; A_21, A_22],   with A_11 ∈ R^{k×k} invertible.

This property remains true in an open neighborhood U ⊂ R^{m×n} of A_0. Factorize A ∈ U as

    A = [I, 0; A_21 A_11^{-1}, I] · [A_11, 0; 0, A_22 − A_21 A_11^{-1} A_12] · [I, A_11^{-1} A_12; 0, I].

Define F : U → R^{(m−k)×(n−k)} as F : A ↦ A_22 − A_21 A_11^{-1} A_12. Then

    F^{-1}(0) = U ∩ M_k.
10
The manifold of rank-k matrices

For arbitrary Y ∈ R^{(m−k)×(n−k)}, we obtain that

    DF(A)[[0, 0; 0, Y]] = Y.

Thus, F is a submersion. In turn, U ∩ M_k is an embedded submanifold of U. By the lemma, M_k is an embedded submanifold of R^{m×n}.
EFY. What is the dimension of M_k?
EFY. Is M_k connected?
EFY. Prove that the set of symmetric rank-k matrices is an embedded submanifold of R^{n×n}. Is this manifold connected?
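The Schur-complement characterization F^{-1}(0) = U ∩ M_k can be illustrated numerically. The sketch below (not from the slides; it assumes the generic case where the leading k × k block of a rank-k matrix is invertible) checks that F vanishes exactly on rank-k matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 5, 4, 2

# Random rank-k matrix; generically its leading k-by-k block is invertible.
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))

A11, A12 = A[:k, :k], A[:k, k:]
A21, A22 = A[k:, :k], A[k:, k:]

# F(A) = A22 - A21 A11^{-1} A12, the Schur complement of A11.
F = A22 - A21 @ np.linalg.solve(A11, A12)

# rank(A) = k exactly when the Schur complement vanishes.
assert np.allclose(F, 0)
assert np.linalg.matrix_rank(A) == k
```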
11
Tangent space

In the following, much of the discussion is restricted to submanifolds M embedded in a vector space V with inner product 〈·, ·〉 and induced norm ‖·‖.
Given a smooth curve γ : R → M with x = γ(0), we call γ′(0) ∈ V a tangent vector at x. The tangent space T_x M ⊂ V is the set of all tangent vectors at x.

Lemma
T_x M is a subspace of V.
Proof. If v_1, v_2 are tangent vectors then there are smooth curves γ_1, γ_2 such that x = γ_1(0) = γ_2(0) and γ_1′(0) = v_1, γ_2′(0) = v_2. To show that αv_1 + βv_2 for α, β ∈ R is again a tangent vector, consider a chart (U, ϕ) around x such that ϕ(x) = 0. Define

    γ(t) = ϕ^{-1}(α ϕ(γ_1(t)) + β ϕ(γ_2(t)))

for t sufficiently close to 0. Then γ(0) = x and γ′(0) = αv_1 + βv_2.
EFY. Prove that the dimension of T_x M equals the dimension of M using a coordinate chart.
12
Tangent space

Application of the definition to the Stiefel manifold. Let

    γ(t) = X + tY + O(t^2)

be a smooth curve with X ∈ St(m, n). To ensure that γ(t) ∈ St(m, n), we require

    I_n = γ(t)^T γ(t) = (X + tY)^T (X + tY) + O(t^2) = I_n + t(X^T Y + Y^T X) + O(t^2).

Thus, X^T Y + Y^T X = 0 characterizes the tangent space:

    T_X St(m, n) = {Y ∈ R^{m×n} : X^T Y = −Y^T X}
                 = {XW + X_⊥ W_⊥ : W ∈ R^{n×n}, W = −W^T, W_⊥ ∈ R^{(m−n)×n}},

where the columns of X_⊥ form a basis of span(X)^⊥.
13
Tangent space

When M is defined (at least locally) as the level set of a constant-rank function F : V → R^N, we have

    T_x M = ker(DF(x)).

Proof. Let v ∈ T_x M, that is, there is a curve γ : R → M such that γ(0) = x and γ′(0) = v. Then, by the chain rule,

    DF(x)[v] = DF(x)[γ′(0)] = (d/dt) F(γ(t)) |_{t=0} = 0,

because F is constant on M. Thus, T_x M ⊂ ker(DF(x)), which completes the proof by counting dimensions.
14
Tangent space of M_k
Recall that M_k was obtained as the level set of the local submersion

    F : A ↦ A_22 − A_21 A_11^{-1} A_12.

Given A ∈ M_k, consider the SVD

    A = [U, U_⊥] · [Σ, 0; 0, 0] · [V, V_⊥]^T.

We have

    DF([Σ, 0; 0, 0])[H] = H_22.

Thus, H is in the kernel if and only if H_22 = 0. In terms of A this implies

    T_A M_k = ker(DF(A)) = [U, U_⊥] · [R^{k×k}, R^{k×(n−k)}; R^{(m−k)×k}, 0] · [V, V_⊥]^T
            = {U M V^T + U_p V^T + U V_p^T : M ∈ R^{k×k}, U_p^T U = V_p^T V = 0}.

EFY. Compute the tangent space for the embedded submanifold of rank-k symmetric matrices.
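The parametrization of T_A M_k can be checked against the kernel characterization: a tangent vector must have a vanishing (2,2) block in the SVD basis of A. A NumPy sketch (illustration only, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k = 6, 5, 2

A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
U_full, s, Vt = np.linalg.svd(A)
U, U_perp = U_full[:, :k], U_full[:, k:]
V, V_perp = Vt[:k].T, Vt[k:].T

# Build a tangent vector from the parametrization
# xi = U M V^T + Up V^T + U Vp^T with U^T Up = 0, V^T Vp = 0.
M = rng.standard_normal((k, k))
Up = U_perp @ rng.standard_normal((m - k, k))
Vp = V_perp @ rng.standard_normal((n - k, k))
xi = U @ M @ V.T + Up @ V.T + U @ Vp.T

# In the SVD basis of A, the (2,2) block of a tangent vector vanishes.
H22 = U_perp.T @ xi @ V_perp
assert np.allclose(H22, 0)
```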
15
Riemannian manifold and gradient

For a submanifold M embedded in a vector space V: The inner product 〈·, ·〉 on V induces an inner product on T_x M. This turns M into a Riemannian manifold.¹

The (Riemannian) gradient of a smooth f : M → R at x ∈ M is defined as the unique element grad f(x) ∈ T_x M that satisfies

    〈grad f(x), ξ〉 = Df(x)[ξ]   for all ξ ∈ T_x M.

EFY. Prove that the Riemannian gradient satisfies the steepest ascent property

    grad f(x) / ‖grad f(x)‖_2 = arg max_{ξ ∈ T_x M, ‖ξ‖ = 1} Df(x)[ξ].

¹ In general, for a Riemannian manifold one needs an inner product on T_x M that varies smoothly with x.
16
Riemannian gradient

For a submanifold M embedded in a vector space V: The (Euclidean) gradient of f in V admits the decomposition

    ∇f(x) = P_x ∇f(x) + P_x^⊥ ∇f(x),

where P_x, P_x^⊥ are the orthogonal projections onto T_x M and T_x^⊥ M. For every ξ ∈ T_x M we have

    Df(x)[ξ] = 〈∇f(x), ξ〉 = 〈P_x ∇f(x), ξ〉 + 〈P_x^⊥ ∇f(x), ξ〉 = 〈P_x ∇f(x), ξ〉,

where we used that P_x^⊥ ∇f(x) ⊥ ξ. Hence,

    grad f(x) = P_x ∇f(x).

The Riemannian gradient is the orthogonal projection of the Euclidean gradient onto the tangent space.
17
Riemannian gradient
Example: Given a symmetric n × n matrix A, consider the trace optimization problem

    min_{X ∈ St(n, k)} trace(X^T A X).

Study the first-order perturbation

    trace((X + H)^T A (X + H)) − trace(X^T A X)
        = trace(H^T A X) + trace(X^T A H) + O(‖H‖^2)
        = 2〈H, AX〉 + O(‖H‖^2).

The Euclidean gradient at X is given by 2AX.
Note that skew(W) = (W − W^T)/2 is the orthogonal projection onto skew-symmetric matrices. Thus,

    P_X(Z) = (I − XX^T) Z + X · skew(X^T Z),

    grad f(X) = P_X(∇f(X)) = 2(I − XX^T) AX + 2X · skew(X^T A X) = 2(AX − X X^T A X).
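As a sanity check on this formula, the following sketch (not part of the slides) forms grad f(X) = 2(AX − X X^T A X) for a random symmetric A and verifies that it lies in the tangent space of the Stiefel manifold, i.e. X^T G + G^T X = 0:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 7, 3

A = rng.standard_normal((n, n))
A = (A + A.T) / 2                                  # symmetric matrix
X, _ = np.linalg.qr(rng.standard_normal((n, k)))   # point on St(n, k)

# Riemannian gradient of f(X) = trace(X^T A X) on the Stiefel manifold.
G = 2 * (A @ X - X @ (X.T @ A @ X))

# G lies in the tangent space: X^T G + G^T X = 0.
assert np.allclose(X.T @ G + G.T @ X, 0)
```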
18
Riemannian gradient

Example: For A ∈ M_k consider the SVD A = UΣV^T with Σ ∈ R^{k×k}. Define the orthogonal projections onto span(U), span(V), and their complements:

    P_U = UU^T,  P_U^⊥ = I − UU^T,  P_V = VV^T,  P_V^⊥ = I − VV^T.

Recall that

    T_A M_k = {U M V^T + U_p V^T + U V_p^T : M ∈ R^{k×k}, U_p^T U = V_p^T V = 0}.

The three terms of the sum are orthogonal to each other and can thus be considered separately. The orthogonal projection onto T_A M_k is given by

    P_A(Z) = P_U Z P_V + P_U^⊥ Z P_V + P_U Z P_V^⊥.

EFY. Compute the Riemannian gradient of f(A) = ‖A − B‖_F^2 on M_k for given B ∈ R^{m×n}.
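The projection formula P_A(Z) = P_U Z P_V + P_U^⊥ Z P_V + P_U Z P_V^⊥ is straightforward to implement; the sketch below (illustration only) also checks that it is idempotent, as a projection must be:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, k = 6, 5, 2

A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U, V = U[:, :k], Vt[:k].T
PU, PV = U @ U.T, V @ V.T
Im, In = np.eye(m), np.eye(n)

def proj(Z):
    # P_A(Z) = PU Z PV + (I - PU) Z PV + PU Z (I - PV).
    return PU @ Z @ PV + (Im - PU) @ Z @ PV + PU @ Z @ (In - PV)

Z = rng.standard_normal((m, n))
# A projection is idempotent: applying it twice changes nothing.
assert np.allclose(proj(proj(Z)), proj(Z))
```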
19
Line search: Concepts from the Euclidean case

    min_{x ∈ R^N} f(x).

Line search is an optimization algorithm of the form

    x_{j+1} = x_j + α_j η_j

with search direction η_j and step size α_j > 0.
- First-order optimal choice of η_j: η_j = −∇f(x_j), gradient descent.
  Motivation for other choices: faster local convergence (Newton-type methods), exact gradient computation too expensive, ...
  Gradient-related search directions: 〈η_j, ∇f(x_j)〉 < δ < 0 for all j.
20
Line search: Concepts from the Euclidean case
- Exact line search chooses

      α_j = arg min_α f(x_j + α η_j).

  Only in exceptional cases is this a simple optimization problem, e.g., admitting a closed-form solution.
  EFY. Derive the closed-form solution for exact line search applied to

      min_{x ∈ R^n} (1/2) x^T A x + b^T x

  for symmetric positive definite A ∈ R^{n×n} and b ∈ R^n.
- Alternative: Armijo rule. Let β ∈ (0, 1) (typically β = 1/2) and c ∈ (0, 1) (e.g., c = 10^{-4}) be fixed parameters. Determine the largest α_j ∈ {1, β, β^2, β^3, ...} such that

      f(x_j + α_j η_j) − f(x_j) ≤ c α_j ∇f(x_j)^T η_j

  holds. (Such an α_j always exists provided that η_j is a descent direction, i.e., when 〈η_j, ∇f(x_j)〉 < 0.)

More details in [J. Nocedal and S. J. Wright. Numerical Optimization. Second edition. Springer Series in Operations Research and Financial Engineering. Springer, 2006].
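The Armijo backtracking rule can be sketched in a few lines; the helper name `armijo` and the quadratic test function below are illustrative choices, not from the slides:

```python
import numpy as np

def armijo(f, grad_f, x, eta, beta=0.5, c=1e-4, max_iter=50):
    """Largest alpha in {1, beta, beta^2, ...} with sufficient decrease."""
    fx, slope = f(x), grad_f(x) @ eta   # slope = <grad f(x), eta> < 0 for descent
    alpha = 1.0
    for _ in range(max_iter):
        if f(x + alpha * eta) - fx <= c * alpha * slope:
            return alpha
        alpha *= beta
    return alpha

# Example: quadratic f(x) = ||x||^2 / 2 with steepest descent direction.
f = lambda x: 0.5 * x @ x
grad_f = lambda x: x
x = np.array([2.0, -1.0])
alpha = armijo(f, grad_f, x, -grad_f(x))
assert f(x - alpha * grad_f(x)) < f(x)   # sufficient decrease achieved
```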
21
Line search: Extension to manifolds

    min_{x ∈ M} f(x).

Cannot use the line search x_{j+1} = x_j + α_j η_j, simply because addition is not well defined in M.
Idea: Search along a smooth curve γ(α) ∈ M with γ(0) = x_j and γ′(0) = η_j ∈ T_{x_j} M.
Step in the direction x_j + α η_j ∈ x_j + T_{x_j} M and go back to the manifold via a retraction:
22
Retraction

Tangent bundle: TM := ⋃_{x ∈ M} T_x M.

Definition
A mapping R : TM → M is called a retraction on M if for every x_0 ∈ M there exists a neighborhood U around (x_0, 0) ∈ TM such that:
1. U ⊂ dom(R) and R|_U : U → M is smooth.
2. R(x, 0) = x for all (x, 0) ∈ U.
3. DR(x, 0)[0, ξ] = ξ for all (x, ξ) ∈ U.

We will write R_x = R(x, ·) : T_x M → M in the following.
Intuition behind the definition:
Property 2 = the retraction does nothing to elements on the manifold.
Property 3 = the retraction preserves the direction of curves. Equivalent characterization: For every tangent vector ξ ∈ T_x M, the curve γ : α ↦ R_x(αξ) satisfies γ′(0) = ξ.
EFY. What is a retraction for the manifold of invertible n × n matrices (trick question)?
23
Retraction
Exponential maps are the most natural choice of retraction from a theoretical point of view, but they are often too expensive or too cumbersome to compute.

In practice, for matrix manifolds, retractions are often built from matrix decompositions and metric projections.

Example St(n, k): Given Y ∈ R^{n×k}_* (i.e., rank(Y) = k), the economy-sized QR decomposition

    Y = XR,   X^T X = I_k,   R upper triangular,

is unique provided that the diagonal elements of R are positive. This defines a diffeomorphism

    φ : St(n, k) × triu_+(k) → R^{n×k}_*,   φ : (X, R) ↦ XR,

where triu_+(k) denotes the set of upper triangular matrices with positive diagonal elements. Applying φ^{-1} just means computing the QR decomposition. Note that

    dim St(n, k) + dim triu_+(k) = dim R^{n×k}_*.
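A minimal sketch of the resulting QR-based retraction (assumption: sign-normalizing R to have positive diagonal, which makes the factor unique; not code from the slides):

```python
import numpy as np

def qr_retraction(X, xi):
    # R_X(xi): QR-factorize X + xi and keep the orthonormal factor,
    # with column signs flipped so that R has positive diagonal.
    Q, R = np.linalg.qr(X + xi)
    D = np.sign(np.diag(R))
    D[D == 0] = 1.0
    return Q * D   # flip column signs of Q

rng = np.random.default_rng(5)
n, k = 6, 3
X, _ = np.linalg.qr(rng.standard_normal((n, k)))

# Property 2: the retraction fixes points on the manifold.
assert np.allclose(qr_retraction(X, np.zeros((n, k))), X)
# The result lies on St(n, k).
Y = qr_retraction(X, 0.1 * rng.standard_normal((n, k)))
assert np.allclose(Y.T @ Y, np.eye(k))
```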
24
Retraction

Abstract setting: Let M be an embedded submanifold of a vector space V and N a smooth manifold such that

    dim(M) + dim(N) = dim(V).

Assume there is a diffeomorphism

    φ : M × N → V_*,   (x, y) ↦ φ(x, y),

for some open subset V_* of V. Moreover, assume there exists a neutral element id ∈ N such that φ(x, id) = x for all x ∈ M.

Lemma
Under the above assumptions,

    R_x(η) := π_1(φ^{-1}(x + η))

is a retraction on M, where π_1 is the projection onto the first component: π_1(x, y) = x.
25
Retraction

Proof of lemma. We need to verify the three properties of a retraction.
Property 1: Immediately follows from the assumptions that R_x(ξ) is defined and smooth for all ξ in a neighborhood of 0 ∈ T_x M.
Property 2: R_x(0) = π_1(φ^{-1}(x)) = π_1(x, id) = x.
Property 3: Differentiating x = π_1 ∘ φ^{-1}(φ(x, id)) we obtain for any ξ ∈ T_x M that

    ξ = D(π_1 ∘ φ^{-1})(x)[Dφ(x, id)[ξ, 0]] = D(π_1 ∘ φ^{-1})(x)[ξ] = DR_x(0)[ξ],

where the middle equality uses Dφ(x, id)[ξ, 0] = ξ (since φ(·, id) is the identity on M).
26
Retraction
For z ∈ V sufficiently close to M, the metric projection is well defined:

    P_M(z) := arg min_{x ∈ M} ‖z − x‖.

Corollary (Lewis/Malick’2008)
The map

    R_x(η) := P_M(x + η)

defines a retraction.

Examples of retractions based on metric projection:
- For St(n, k), the polar factor Y(Y^T Y)^{-1/2} of Y ∈ R^{n×k}_* defines a retraction.
- For the rank-k matrix manifold M_k, the best rank-k approximation T_k defines a retraction.
There are other choices; see [Absil/Oseledets’2015: Low-rank retractions: a survey and new results].
EFY. For all examples discussed so far, develop algorithms that efficiently realize the retraction by exploiting the structure of x + η.
EFY. Find a retraction for the manifold of symmetric rank-k matrices.
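For M_k, the metric projection is the truncated SVD. A dense-matrix sketch (illustration only; an efficient version would exploit the low-rank structure of x + η, as the EFY above asks):

```python
import numpy as np

def svd_retraction(X, xi, k):
    # R_X(xi) = best rank-k approximation of X + xi (metric projection).
    U, s, Vt = np.linalg.svd(X + xi, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

rng = np.random.default_rng(6)
m, n, k = 6, 5, 2
X = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))   # point on M_k

# Property 2: points on the manifold are fixed.
assert np.allclose(svd_retraction(X, np.zeros((m, n)), k), X)
# The result has rank k.
Y = svd_retraction(X, 0.1 * rng.standard_normal((m, n)), k)
assert np.linalg.matrix_rank(Y) == k
```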
27
Riemannian line search

    x_{j+1} = R_{x_j}(α_j η_j).

Assumption. The sequence η_j is bounded and gradient related:

    lim sup_{j→∞} 〈grad f(x_j), η_j〉 < 0.

Canonical choice: η_j = −grad f(x_j).
Extension of the Armijo rule. Let β ∈ (0, 1) and c ∈ (0, 1) (e.g., c = 10^{-4}) be fixed parameters. Determine the largest α_j ∈ {1, β, β^2, β^3, ...} such that

    f(R_{x_j}(α_j η_j)) − f(x_j) ≤ c α_j 〈grad f(x_j), η_j〉    (1)

holds.
EFY. Show that the Armijo condition (1) can always be satisfied for sufficiently small α_j.
28
Riemannian line search
1: for j = 0, 1, 2, ... do
2:   Pick η_j ∈ T_{x_j} M such that the sequence η_j is gradient-related.
3:   Choose α_j ∈ {1, β, β^2, β^3, ...} such that the Armijo condition is satisfied.
4:   Set x_{j+1} = R_{x_j}(α_j η_j).
5: end for

Convergence theory in Section 4.3 of [Absil et al.’2008]. We call x_* ∈ M a critical point of f if grad f(x_*) = 0.

Theorem
Every accumulation point of x_j is a critical point of the cost function f.
More can be said if the manifold (or at least a level set) is compact.

Corollary
Assume that L = {x ∈ M : f(x) ≤ f(x_0)} is compact. Then lim_{j→∞} ‖grad f(x_j)‖ = 0.

Note that M_k is not compact, and it is not clear a priori whether L is compact.
29
Application to low-rank matrix and tensor completion
30
Matrix Completion

[Figure: a matrix with only the sampled entries P_Ω A shown; can we recover A?]

    P_Ω : R^{m×n} → R^{m×n},   (P_Ω X)_{ij} = X_{ij} if (i, j) ∈ Ω, and 0 otherwise.

Applications: image reconstruction, image inpainting, Netflix problem.
Low-rank matrix completion:

    min_X rank(X),  X ∈ R^{m×n}
    subject to P_Ω X = P_Ω A.
31
Low-rank matrix completion (NP-hard):

    min_X rank(X),  X ∈ R^{m×n}
    subject to P_Ω X = P_Ω A.

Nuclear norm relaxation (convex, but expensive):

    min_X ‖X‖_* = Σ_i σ_i,  X ∈ R^{m×n}
    subject to P_Ω X = P_Ω A.

Robust low-rank completion (assume the rank is known):

    min_X (1/2) ‖P_Ω X − P_Ω A‖_F^2,  X ∈ R^{m×n}
    subject to rank(X) = k.

Huge body of work! Overview: http://perception.csl.illinois.edu/matrix-rank/
32
Setting

    minimize_X f(X) := (1/2) ‖P_Ω(X − A)‖_F^2
    subject to X ∈ M_k := {X ∈ R^{m×n} : rank(X) = k},

    P_Ω : R^{m×n} → R^{m×n},   X_{ij} ↦ X_{ij} if (i, j) ∈ Ω, and 0 if (i, j) ∉ Ω.

The Riemannian gradient is given by

    grad f(X) = P_{T_X M_k}(P_Ω(X − A))

with the orthogonal projection P_{T_X M_k} : R^{m×n} → T_X M_k.
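The Riemannian gradient of the completion objective is the tangent-space projection of the masked residual. A dense NumPy sketch (illustration only; the practical implementation exploits sparsity of the residual):

```python
import numpy as np

rng = np.random.default_rng(7)
m, n, k = 8, 7, 2

A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))   # data matrix
Omega = rng.random((m, n)) < 0.5                                # sampling set
X = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))   # iterate on M_k

U, _, Vt = np.linalg.svd(X, full_matrices=False)
U, V = U[:, :k], Vt[:k].T
PU, PV = U @ U.T, V @ V.T

def proj_tangent(Z):
    # P_{T_X M_k}(Z) = PU Z PV + (I - PU) Z PV + PU Z (I - PV)
    #               = Z PV + PU Z - PU Z PV.
    return Z @ PV + PU @ Z - PU @ Z @ PV

# Euclidean gradient: the masked residual P_Omega(X - A).
R = np.where(Omega, X - A, 0.0)
grad = proj_tangent(R)                 # Riemannian gradient
assert np.allclose(proj_tangent(grad), grad)   # grad lies in T_X M_k
```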
33
Geometric nonlinear CG for matrix completion

Input: Initial guess X_0 ∈ M_k.
    η_0 ← −grad f(X_0)
    α_0 ← argmin_α f(X_0 + α η_0)
    X_1 ← R_{X_0}(α_0 η_0)
    for i = 1, 2, ... do
        Compute gradient: ξ_i ← grad f(X_i)
        Conjugate direction by PR+ updating rule: η_i ← −ξ_i + β_i T_{X_{i−1}→X_i} η_{i−1}
        Initial step size from linearized line search: α_i ← argmin_α f(X_i + α η_i)
        Armijo backtracking for sufficient decrease: find the smallest integer m ≥ 0 such that
            f(X_i) − f(R_{X_i}(2^{−m} α_i η_i)) ≥ −10^{−4} 〈ξ_i, 2^{−m} α_i η_i〉
        Obtain next iterate: X_{i+1} ← R_{X_i}(2^{−m} α_i η_i)
    end for

Cost per iteration: O((m + n)k^2 + |Ω|k) ops.
34
Vector transport
The conjugate gradient method requires combining gradients from subsequent iterates:

    grad f(X) ∈ T_X M_k,  grad f(Y) ∈ T_Y M_k
    ⇒ grad f(X) + grad f(Y) ???

Can be addressed by vector transport:

    T_{X→Y} : T_X M_k → T_Y M_k,   T_{X→Y}(ξ) = P_{T_Y M_k}(ξ).

[Figure: a tangent vector ξ ∈ T_X M_k transported to T_Y M_k along the manifold M_k.]

Can be implemented in O((m + n)k^2) ops.
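Since the transport is just the projection onto the tangent space at the new iterate, it reuses the projector from before. A dense sketch (illustration only; the O((m + n)k^2) implementation works with the low-rank factors instead):

```python
import numpy as np

rng = np.random.default_rng(8)
m, n, k = 6, 5, 2

def tangent_projector(X, k):
    # Returns Z -> P_{T_X M_k}(Z) = Z PV + PU Z - PU Z PV.
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    PU = U[:, :k] @ U[:, :k].T
    PV = Vt[:k].T @ Vt[:k]
    return lambda Z: Z @ PV + PU @ Z - PU @ Z @ PV

X = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
Y = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
PX, PY = tangent_projector(X, k), tangent_projector(Y, k)

xi = PX(rng.standard_normal((m, n)))   # a tangent vector at X
transported = PY(xi)                   # T_{X->Y}(xi): project into T_Y M_k
assert np.allclose(PY(transported), transported)
```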
35
Numerical experiments
- Comparison to LMAFit [Wen/Yin/Zhang’2010], http://lmafit.blogs.rice.edu/.
- Oversampling factor OS = |Ω|/(k(2n − k)).
- Purely academic example A = A_L A_R^T with A_L, A_R = randn.
36
Influence of n

[Figure: convergence curves (relative residual vs. iteration) for k = 40, OS = 3, with n = 1000, 2000, 4000, 8000, 16000, 32000.]
- Dashed lines: LMAFit. Solid lines: nonlinear CG.
- time(1 iteration of nonlinear CG) ≈ 2 × time(1 iteration of LMAFit)
37
Influence of rank

[Figure: convergence curves (relative residual vs. iteration) for n = 8000, OS = 3, with k = 10, 20, 30, 40, 50, 60.]
- Dashed lines: LMAFit. Solid lines: nonlinear CG.
- time(1 iteration of nonlinear CG) ≈ 2 × time(1 iteration of LMAFit)
38
Numerical experiments
- Comparison to LMAFit [Wen/Yin/Zhang’2010], http://lmafit.blogs.rice.edu/.
- Oversampling factor OS = |Ω|/(k(2n − k)) = 8.
- The 8000 × 8000 matrix A is obtained from evaluating

      f(x, y) = 1/(1 + |x − y|^2)

  on [0, 1] × [0, 1].
39
Influence of rank

- Hom: Start with k = 1 and subsequently increase k, using the previous result as initial guess.
40
Tensor Completion

Low-rank tensor completion:

    min_X rank(X),  X ∈ R^{n_1×n_2×···×n_d}
    subject to P_Ω X = P_Ω A.

Applications:
- Completion of multidimensional data, e.g. hyperspectral images, CT scans
- Compression of multivariate functions with singularities
- ...
41
Manifold of tensors of fixed multilinear rank

    M_k := {X ∈ R^{n_1×···×n_d} : rank(X) = k},

    dim(M_k) = ∏_{j=1}^d k_j + ∑_{i=1}^d ( k_i n_i − k_i(k_i − 1)/2 ).

- M_k is a smooth manifold. Discussed for more general formats in [Holtz/Rohwedder/Schneider’2012], [Uschmajew/Vandereycken’2012].
- Riemannian with the metric induced by the standard inner product 〈X, Y〉 = 〈X_(1), Y_(1)〉 (sum of element-wise products).
Manifold structure used in:
- dynamical low-rank approximation [Koch/Lubich’2010], [Arnold/Jahnke’2012], [Lubich/Rohwedder/Schneider/Vandereycken’2012], [Khoromskij/Oseledets/Schneider’2012], ...
- best multilinear approximation [Eldén/Savas’2009], [Ishteva/Absil/Van Huffel/De Lathauwer’2011], [Curtef/Dirr/Helmke’2012]
42
Gradients and tangent space T_X M_k

Every ξ in the tangent space T_X M_k at X = C ×_1 U ×_2 V ×_3 W can be written as

    ξ = S ×_1 U ×_2 V ×_3 W
      + C ×_1 U_⊥ ×_2 V ×_3 W
      + C ×_1 U ×_2 V_⊥ ×_3 W
      + C ×_1 U ×_2 V ×_3 W_⊥

for some S ∈ R^{k_1×k_2×k_3}, U_⊥ ∈ R^{n_1×k_1} with U_⊥^T U = 0, ...

Again, we obtain the Riemannian gradient of the objective function

    f(X) := (1/2) ‖P_Ω X − P_Ω A‖_F^2

by projecting the Euclidean gradient onto the tangent space:

    grad f(X) = P_{T_X M_k}(P_Ω X − P_Ω A).
43
Retraction
Candidate for retraction: metric projection

    R_X(ξ) = P_X(X + ξ) = arg min_{Z ∈ M_k} ‖X + ξ − Z‖.

No closed-form solution available.
- Replaced by HOSVD truncation.
- Seems to work fine.
- HOSVD truncation is a retraction [K./Steinlechner/Vandereycken’13].
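HOSVD truncation projects each mode onto the leading singular vectors of the corresponding unfolding. A small sketch of the sequentially truncated variant (an illustrative choice, not the code from the slides; `unfold` and `hosvd_truncate` are hypothetical helper names):

```python
import numpy as np

def unfold(T, mode):
    # Mode-`mode` matricization of a tensor T.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd_truncate(T, ranks):
    # Sequentially truncated HOSVD: for each mode, project onto the span
    # of the leading singular vectors of the current unfolding.
    out = T
    for mode, k in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(out, mode), full_matrices=False)
        P = U[:, :k] @ U[:, :k].T        # mode-wise orthogonal projector
        out = np.moveaxis(np.tensordot(P, np.moveaxis(out, mode, 0), axes=1),
                          0, mode)
    return out

rng = np.random.default_rng(9)
T = rng.standard_normal((5, 4, 3))
Tk = hosvd_truncate(T, (2, 2, 2))
# The result has multilinear rank at most (2, 2, 2).
for mode, k in enumerate((2, 2, 2)):
    assert np.linalg.matrix_rank(unfold(Tk, mode)) <= k
```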
44
Geometric nonlinear CG for tensor completion

Input: Initial guess X_0 ∈ M_k.
    η_0 ← −grad f(X_0)
    α_0 ← argmin_α f(X_0 + α η_0)
    X_1 ← R_{X_0}(α_0 η_0)
    for i = 1, 2, ... do
        Compute gradient: ξ_i ← grad f(X_i)
        Conjugate direction by PR+ updating rule: η_i ← −ξ_i + β_i T_{X_{i−1}→X_i} η_{i−1}
        Initial step size from linearized line search: α_i ← argmin_α f(X_i + α η_i)
        Armijo backtracking for sufficient decrease: find the smallest integer m ≥ 0 such that
            f(X_i) − f(R_{X_i}(2^{−m} α_i η_i)) ≥ −10^{−4} 〈ξ_i, 2^{−m} α_i η_i〉
        Obtain next iterate: X_{i+1} ← R_{X_i}(2^{−m} α_i η_i)
    end for

Cost per iteration: O(nk^d + |Ω|k^{d−1}) ops.
45
Reconstruction of a CT scan
199 × 199 × 150 tensor from the scaled CT data set “INCISIX” (taken from the OSIRIX MRI/CT data base [www.osirix-viewer.com/datasets/]).

[Figure: slice of the original tensor; HOSVD approximation of rank 21; sampled tensor (6.7%); low-rank completion of rank 21.]

Compares very well with existing results w.r.t. low-rank recovery and speed, e.g., [Gandy/Recht/Yamada’2011].
46
Hyperspectral image
Set of photographs (204 × 268 px) taken across a large range of wavelengths, 33 samples from ultraviolet to infrared [image data: Foster et al.’2004]. Stacked into a tensor of size 204 × 268 × 33.

[Figure: 16th slice of 10% of the original hyperspectral image tensor (size [204, 268, 33]) vs. the completed tensor, final rank k = [50, 50, 6].]

Here: Only 10% of entries known; [Signoretti et al.’2011] use 50%.
47
How many samples do we need?

Matrix case: O(n · log^β n) samples suffice! [Candès/Tao’2009]
⇒ Completion of a tensor by applying matrix completion to a matricization: O(n^2 log(n)). Gives an upper bound!

Tensor case:
Certainly less than O(n^2) samples.
In all cases of convergence: exact reconstruction.

Conjecture: |Ω| = O(n · log^β n)
[Figure: log-log plot of the smallest |Ω| needed for convergence vs. n; the fit y = 1.2x + 4 lies between the O(n) and O(n log n) reference lines, well below O(n^2).]
48
Robustness of convergence

[Figure: relative error on Ω vs. iteration for noisy completion with n = 100, k = [4, 5, 6], at several noise levels.]
- Random 100 × 100 × 100 tensor of multilinear rank (4, 5, 6) perturbed by white noise.
- Upon convergence: reconstruction up to the noise level.
49
Final remarks on Riemannian low-rank optimization
- Only discussed first-order methods. Fine for well-conditioned problems but slow convergence for ill-conditioned problems.
- Second-order methods (Newton-like) require the Riemannian Hessian: painful, and:
  - not of much help for well-conditioned problems (low-rank matrix completion);
  - linearized equations hard to solve efficiently for low-rank matrix and tensor manifolds.
- Low-rank matrices/tensors can also be viewed as products of quotient manifolds. Requires a careful choice of metric to stay robust w.r.t. a small singular value σ_k [Ngo/Saad’2012], [Kasai/Mishra, ICML’2016].
- Lots of open problems concerning the convergence analysis of low-rank Riemannian optimization!
50