Tensor Basic Facts II

Overview
- Hierarchical Tucker
- Tensor Train
- Generalizing matrix properties
- Software
Tensor Networks
Original tensor $A_{i_1,\ldots,i_d}$: memory $n^d$. CP: $dnR$, good compression but bad approximation. Tucker: $dnr + r^d$, good approximation but bad compression.
Find better tensor networks as a compromise of both: a recursive binary tree (H-Tucker) or a recursive chain (tensor train).
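To make the trade-off concrete, here is a quick back-of-the-envelope parameter count in Python; the sizes n = 25, d = 5, r = 22 are borrowed from the example later in these slides, and the H-Tucker and TT counts use the storage bounds derived below.

```python
# Rough parameter counts for n = 25, d = 5, rank r = 22; illustrates the
# compression/approximation trade-off of the different formats.
n, d, r = 25, 5, 22

full = n**d                            # original tensor
cp = d * n * r                         # CP: d factor matrices of size n x r
tucker = d * n * r + r**d              # Tucker: factor matrices + core
htucker = d * n * r + (d - 1) * r**3   # H-Tucker: leaf frames + transfer tensors
tt = 2 * n * r + (d - 2) * n * r**2    # TT: first/last cores + interior cores

for name, cost in [("full", full), ("CP", cp), ("Tucker", tucker),
                   ("H-Tucker", htucker), ("TT", tt)]:
    print(f"{name:9s} {cost:>12,d}")
```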
Hierarchical Decompositions
CP (tensor rank):
$A = \sum_{j=1}^{r} \bigotimes_{\mu=1}^{d} a_{\mu,j}, \qquad A \in \mathbb{R}^{I_1\times\cdots\times I_d}$
Tucker (multilinear rank):
$A = \sum_{j_1=1}^{r_1}\cdots\sum_{j_d=1}^{r_d} C_{j_1,\ldots,j_d}\,\bigotimes_{\mu=1}^{d} a_{\mu,j_\mu}, \qquad C \in \mathbb{R}^{r_1\times\cdots\times r_d}$
H-Tucker (hierarchical rank):
$A = \sum_{j_1=1}^{r_1}\cdots\sum_{j_q=1}^{r_q}\ \prod_{\nu=1}^{p} (B_\nu)\ \bigotimes_{\mu=1}^{d} a_{\mu,j_\mu}, \qquad J_\nu \subset \{1,\ldots,d\},$
where the $B_\nu$ are transfer tensors on mode clusters $J_\nu$ (made precise below).
Tucker revisited
A special case of Tucker is orthogonal Tucker: all matrices $U^{(\mu)}$, the so-called mode frames, are orthogonal (HOSVD):
$A = G \times_1 U^{(1)} \times_2 U^{(2)} \cdots \times_N U^{(N)} = [[G;\, U^{(1)}, U^{(2)}, \ldots, U^{(N)}]],$
$A_{i_1 i_2 \ldots i_N} = \sum_{k_1=1}^{R_1}\sum_{k_2=1}^{R_2}\cdots\sum_{k_N=1}^{R_N} G_{k_1 k_2 \ldots k_N}\, u^{(1)}_{i_1 k_1}\, u^{(2)}_{i_2 k_2}\cdots u^{(N)}_{i_N k_N}, \qquad i_n = 1,\ldots,I_n.$
Best Approximation
For given orthogonal $U^{(\mu)}$, $\mu = 1,\ldots,N$, it holds
$\min_G \big\|A - G \times_1 U^{(1)} \times_2 U^{(2)} \cdots \times_N U^{(N)}\big\| \iff G = A \times_1 \big(U^{(1)}\big)^T \times_2 \big(U^{(2)}\big)^T \cdots \times_N \big(U^{(N)}\big)^T.$
Truncation of tensor A to Tucker rank $(k_1,\ldots,k_N)$ via SVD:
$A^{(\mu)} = U_\mu \Sigma_\mu V_\mu^T, \qquad \Sigma_\mu = \mathrm{diag}(\sigma_{\mu,1},\ldots,\sigma_{\mu,n_\mu}), \quad \sigma_{\mu,1} \ge \cdots \ge \sigma_{\mu,n_\mu}.$
Define $\tilde U_\mu := U_\mu(:,1\!:\!k_\mu)$ and the Tucker truncation of A by
$T_{(k_1,\ldots,k_N)}(A) := A \times_1 \tilde U_1\tilde U_1^T \times_2 \tilde U_2\tilde U_2^T \cdots \times_N \tilde U_N\tilde U_N^T.$
Tucker Truncation Error
It holds
$\big\|A - T_{(k_1,\ldots,k_N)}(A)\big\|^2 \le \sum_{\mu=1}^{N}\sum_{i=k_\mu+1}^{n_\mu} \sigma_{\mu,i}^2 \le N\,\|A - A_{\mathrm{best}}\|^2,$
where $A_{\mathrm{best}}$ is the best possible approximation in Tucker$(k_1,\ldots,k_N)$.
Beware! Change of notation from here on: unfolding $A_{(n)} \to A^{(\mu)}$, and $N \to d$.
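A minimal NumPy sketch of this truncation (the helper names `unfold` and `tucker_truncate` are ours, not a library API); it projects every mode onto the leading left singular vectors and checks the error bound above numerically:

```python
import numpy as np

def unfold(A, mu):
    """Mode-mu unfolding A^(mu): rows indexed by i_mu, columns by the rest."""
    return np.moveaxis(A, mu, 0).reshape(A.shape[mu], -1)

def tucker_truncate(A, ranks):
    """T_(k1,...,kN)(A): project each mode onto its k leading singular vectors."""
    Us = []
    for mu, k in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(A, mu), full_matrices=False)
        Us.append(U[:, :k])
    T = A
    for mu, U in enumerate(Us):
        # multiply mode mu by the projector U U^T
        T = np.moveaxis(np.tensordot(U @ U.T, T, axes=([1], [mu])), 0, mu)
    return T

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 9, 10))
ranks = (3, 3, 3)
T = tucker_truncate(A, ranks)
# bound: ||A - T||^2 <= sum over modes of the discarded singular values squared
bound = sum((np.linalg.svd(unfold(A, mu), compute_uv=False)[k:] ** 2).sum()
            for mu, k in enumerate(ranks))
print(np.linalg.norm(A - T) ** 2, "<=", bound)
```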
Hierarchical Rank Model
$A \in \mathbb{R}^{I}, \qquad I = I_1 \times \cdots \times I_d.$
Matricization: reduce to a matrix with two (grouped) indices,
$A \in \mathbb{R}^{(I_1\times\cdots\times I_q)\times(I_{q+1}\times\cdots\times I_d)}.$
Compute the SVD of A:
$A = \sum_i \sigma_i\, U_i V_i^T, \qquad U_i \in \mathbb{R}^{I_1\times\cdots\times I_q},\quad V_i \in \mathbb{R}^{I_{q+1}\times\cdots\times I_d}.$
$U_i, V_i$, seen as vectors/tensors, are lower-dimensional. Repeat for the tensor matricization $U_l$:
$U_l = \sum_{i=1}^{r} \hat\sigma_i\, W_i^{(l)}\big(X_i^{(l)}\big)^T, \qquad W_j^{(l)} = \sum_{i} \tilde\sigma_i\, Y_i^{(l,j)}\big(Z_i^{(l,j)}\big)^T, \quad \text{etc.}$
9
Memory Costs A U1 … Uk V1 … Vk U1 … Vk … U1 … Vk U1 … Vk … U1 … Vk …………………………………………………………………….. U1…Vk ……………………………………………………U1…Vk
22d
nk ⋅
( ) 422d
nk ⋅
( ) ( ))dlog(dd)dlog( dnkOnkO =
⋅2Data complexity of the last row:
Hierarchical uniform subspaces
$A \in \mathbb{R}^{(I_1\times\cdots\times I_q)\times(I_{q+1}\times\cdots\times I_d)}, \qquad A = \sum_i \sigma_i\, U_i V_i^T, \quad U_i \in \mathbb{R}^{I_1\times\cdots\times I_q},\ V_i \in \mathbb{R}^{I_{q+1}\times\cdots\times I_d}.$
$U_i, V_i$, seen as vectors/tensors, are lower-dimensional.
Better and cheaper recursion in a binary tree: repeat for the tensor $U_l$, but with uniform (shared) subspaces,
$U_l = \sum_{i=1}^{r}\sum_{j=1}^{r} (B_l)_{i,j}\, W_i X_j^T, \qquad W_l = \sum_{i=1}^{r}\sum_{j=1}^{r} (\tilde B_l)_{i,j}\, Y_i Z_j^T, \quad \text{etc.}$
Leads to the smaller data complexity $O(dnk + dk^3)$.
Dimension Tree
[Figure: dimension tree for d = 5. Root {1,2,3,4,5} on level 0; interior nodes {1,2} and {3,4,5} on level 1; {1}, {2}, {3} and {4,5} on level 2; leaves {4}, {5} on level 3. Levels 0 to 3; root, leaves, and interior nodes marked.]
Dimension Tree
Definition: A dimension tree $T_I$ for dimension $d \in \mathbb{N}$ is a tree with root $\{1,\ldots,d\}$ and depth $p = \lceil\log_2(d)\rceil$ such that each node $t \in T_I$ is either
1. a leaf and singleton $t = \{\mu\}$ on level $l \in \{p-1, p\}$, or
2. the union of two disjoint successors $S(t) = \{s_1, s_2\}$: $t = s_1 \cup s_2$.
The level l of the tree is defined as the set of all nodes having a distance of exactly l to the root:
$T_I^l := \{t \in T_I \mid \mathrm{level}(t) = l\}.$
Set of leaves: $L(T_I)$. Set of interior nodes: $I(T_I)$. A node of the tree is a mode cluster = a union of modes.
Properties
Up to the last level: a complete binary tree. For a canonical dimension tree each interior node $t = \{\mu_1,\ldots,\mu_q\}$, $q > 1$, has the two successors
$t_1 := \{\mu_1,\ldots,\mu_r\}, \quad r := \lceil q/2 \rceil, \qquad t_2 := \{\mu_{r+1},\ldots,\mu_q\}.$
- Total number of nodes: $2d-1$
- Number of leaves: $d$
- Number of interior nodes: $d-1$
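These properties are easy to reproduce with a small recursive construction; a sketch (our own helper, following the canonical splitting rule above):

```python
def canonical_tree(modes):
    """Canonical dimension tree: split each cluster {mu_1,...,mu_q} into its
    first ceil(q/2) modes and the rest, down to singletons."""
    t = tuple(modes)
    if len(t) == 1:
        return {"cluster": t, "sons": []}
    r = (len(t) + 1) // 2              # r = ceil(q/2)
    return {"cluster": t,
            "sons": [canonical_tree(t[:r]), canonical_tree(t[r:])]}

def count_nodes(node):
    return 1 + sum(count_nodes(s) for s in node["sons"])

tree = canonical_tree(range(1, 6))     # root {1,...,5}
print(count_nodes(tree))               # 2*5 - 1 = 9 nodes
```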
Hierarchical Rank
For a mode cluster t in a dimension tree $T_I$ we define the complementary cluster $t' := \{1,\ldots,d\} \setminus t$, the index sets
$I_t := \times_{\mu\in t} I_\mu, \qquad I_{t'} := \times_{\mu\in t'} I_\mu,$
and the related t-matricization
$\mathcal{M}_t: \mathbb{R}^{I} \to \mathbb{R}^{I_t \times I_{t'}}, \qquad \big(\mathcal{M}_t(A)\big)_{(i_\mu)_{\mu\in t},\,(i_\mu)_{\mu\in t'}} := A_{i_1,\ldots,i_d}.$
Notation: $A^{(t)} := \mathcal{M}_t(A)$.
[Figure: the d = 5 dimension tree again; e.g. for the leaf $t = \{3\}$ one obtains $A^{(\{3\})} = A^{\{3\},\{1,2,4,5\}}$.]
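In NumPy the t-matricization is a transpose followed by a reshape; a sketch with 0-based modes (the slides count modes from 1):

```python
import numpy as np

def matricize(A, t):
    """t-matricization A^(t): rows indexed by the modes in t, columns by t'."""
    t_comp = [mu for mu in range(A.ndim) if mu not in t]  # complementary cluster
    perm = list(t) + t_comp
    rows = int(np.prod([A.shape[mu] for mu in t]))
    return np.transpose(A, perm).reshape(rows, -1)

A = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
print(matricize(A, [0, 1]).shape)                        # (6, 20)
# A^(t') = (A^(t))^T, as stated on the next slide
print(np.array_equal(matricize(A, [2, 3]), matricize(A, [0, 1]).T))
```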
Example
$A = a \otimes b \otimes c \otimes d \in \mathbb{R}^{I_1\times I_2\times I_3\times I_4}:$
$A^{(\{1,2\})} = (a\otimes b)(c\otimes d)^T \in \mathbb{R}^{(I_1\times I_2)\times(I_3\times I_4)}$
$A^{(\{3,4\})} = (c\otimes d)(a\otimes b)^T \in \mathbb{R}^{(I_3\times I_4)\times(I_1\times I_2)}$
$A^{(\{2,3\})} = (b\otimes c)(a\otimes d)^T \in \mathbb{R}^{(I_2\times I_3)\times(I_1\times I_4)}$
$A^{(\{1\})} = a\,(b\otimes c\otimes d)^T \in \mathbb{R}^{I_1\times(I_2\times I_3\times I_4)}$
For a tensor A, dimension tree T, and node t with complementary cluster t' it holds $A^{(t')} = \big(A^{(t)}\big)^T$.
Hierarchical Rank
For a dimension tree $T_I$ the hierarchical rank $(k_t)_{t\in T_I}$ of a tensor $A \in \mathbb{R}^{I}$ is defined by
$\forall t \in T_I:\quad k_t := \mathrm{rank}\big(A^{(t)}\big).$
The set of tensors of hierarchical rank at most (node-wise) $(k_t)_{t\in T_I}$ is denoted by
$\mathcal{H}\text{-Tucker}\big((k_t)_{t\in T_I}\big) := \big\{A \in \mathbb{R}^{I} \mid \forall t \in T_I:\ \mathrm{rank}(A^{(t)}) \le k_t\big\}.$
SVD of $A^{(t)}$, d = 5, $n_i$ = 25
Matrix sizes of $A^{(t)}$ and number of large singular values:
- full tensor: $25^5 = 9765625$ entries
- $k_{t_1}$: $25^3\times25^2 = 15625\times625$, 22 large singular values
- $k_{t_2}$, $k_{t_3}$: $25^2\times25^3 = 625\times15625$, 22 each
- $k_{t_4},\ldots,k_{t_8}$: $25\times25^4 = 25\times390625$, 22 each
Nestedness of Matricizations
Let T be a dimension tree and $A \in \mathbb{R}^{I}$ a tensor with hierarchical rank $(k_t)_{t\in T}$. Let $t \in T$ be a node with sons $s_1$, $s_2$. Let
$(U_t)_i,\ i = 1,\ldots,k_t$, be a basis of image$\big(A^{(t)}\big)$,
$(U_{s_1})_j,\ j = 1,\ldots,k_{s_1}$, be a basis of image$\big(A^{(s_1)}\big)$,
$(U_{s_2})_l,\ l = 1,\ldots,k_{s_2}$, be a basis of image$\big(A^{(s_2)}\big)$.
Then there exist coefficients $(B_t)_{i,j,l}$ such that
$(U_t)_i = \sum_{j=1}^{k_{s_1}}\sum_{l=1}^{k_{s_2}} (B_t)_{i,j,l}\,(U_{s_1})_j \otimes (U_{s_2})_l.$
If the bases $U_{s_1}, U_{s_2}$ are orthogonal then
$(B_t)_{i,j,l} = \big\langle (U_t)_i,\ (U_{s_1})_j \otimes (U_{s_2})_l \big\rangle.$
Proof:
Consider one column of the matricization $A^{(t)}$. For a fixed index $(j_\mu)_{\mu\in t'}$ this column defines a matrix
$Y_{(j_\mu)_{\mu\in s_1},\,(j_\mu)_{\mu\in s_2}} := \big(A^{(t)}\big)_{:,\,(j_\mu)_{\mu\in t'}}.$
By assumption the rows and columns of Y are in the span of $U_{s_1}$, $U_{s_2}$:
$Y = \sum_{j=1}^{k_{s_1}}\sum_{l=1}^{k_{s_2}} c_{j,l}\,(U_{s_1})_j\,(U_{s_2})_l^T$
for some coefficients $c_{j,l}$. Therefore every column of $A^{(t)}$ is a linear combination of the $(U_{s_1})_j \otimes (U_{s_2})_l$.
t-frame, frame tree
Let $t \in T$ be a mode cluster and $(k_t)_{t\in T}$ a family of non-negative integers. We call a matrix $U_t \in \mathbb{R}^{I_t \times k_t}$ a t-frame and a tuple of frames $(U_s)_{s\in T_I}$ a frame tree.
A frame is called orthogonal if its columns are orthogonal. A frame tree is called orthogonal if each frame (except the root) is orthogonal.
21
transfer tensor
A frame tree is nested if for each interior mode cluster t with successors S(t) = {t1,t2} the following relation holds:
( ){ } ( ) ( ){ }.kj,ki|UUspanki|Uspan ttjtittit 2121111 ≤≤≤≤⊗⊂≤≤
The corresponding tensor relative to the represen- tation of by is called the transfer tensor :
21 ttt kkkt IRB ××∈
( )itU21 tt U,U
( ) ( ) ( ) ( ) .1 2
1 1,, 21∑∑
= =
⊗=t tk
j
k
lltjtljitit UUBU
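A small numerical sketch of the nestedness relation and the transfer tensor formula for orthogonal son frames (all names are ours; the t-frame is built to be nested by construction):

```python
import numpy as np

# orthogonal son frames U1 (I_t1 x k1) and U2 (I_t2 x k2)
rng = np.random.default_rng(1)
U1, _ = np.linalg.qr(rng.standard_normal((6, 2)))
U2, _ = np.linalg.qr(rng.standard_normal((5, 3)))
kron = np.einsum("aj,bl->abjl", U1, U2).reshape(30, 6)  # columns (U1)_j kron (U2)_l

C = rng.standard_normal((6, 4))
Ut = kron @ C                 # a nested t-frame with k_t = 4 by construction

# (Bt)_{i,j,l} = <(Ut)_i, (U1)_j kron (U2)_l>, valid since the columns of
# kron are orthonormal
Bt = (kron.T @ Ut).T.reshape(4, 2, 3)
Ut_rec = kron @ Bt.reshape(4, -1).T    # reconstruct Ut from the transfer tensor
print(np.allclose(Ut, Ut_rec))         # True
```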
Hierarchical Tucker Format
T is a dimension tree, $(k_t)_{t\in T}$ a family of non-negative integers, $A \in \mathcal{H}\text{-Tucker}\big((k_t)_{t\in T}\big)$. Let $(U_t)_{t\in T}$ be a nested frame tree with transfer tensors $(B_t)_{t\in I(T)}$ and
$\forall t \in T_I:\quad \mathrm{image}\big(A^{(t)}\big) = \mathrm{image}(U_t), \qquad A = U_{\{1,\ldots,d\}}.$
Then the representation $\big((B_t)_{t\in I(T)},\ (U_t)_{t\in L(T)}\big)$ is a hierarchical Tucker representation. The family $(k_t)_{t\in T}$ is the hierarchical representation rank.
Note that the columns of $U_t$ need not be linearly independent! The representation with an orthogonal frame tree is unique up to orthogonal transformations of the t-frames.
Storage complexity
Again T is a dimension tree with given $A \in \mathcal{H}\text{-Tucker}\big((k_t)_{t\in T}\big)$ in hierarchical Tucker representation $\big((B_t)_{t\in I(T)},\ (U_t)_{t\in L(T)}\big)$, and for $S(t) = \{t_1,t_2\}$ the transfer tensors $B_t \in \mathbb{R}^{k_t\times k_{t_1}\times k_{t_2}}$ are of minimal size. Then the total storage for all transfer tensors and leaf-frames in terms of number of entries is bounded by
$\mathrm{Storage}\big((B_t)_{t\in I(T)},\,(U_t)_{t\in L(T)}\big) \le (d-1)\,k^3 + \sum_{\mu=1}^{d} k\, n_\mu, \qquad k := \max_{t\in T} k_t,$
which is linearly bounded in the dimension d (provided the representation parameter k is uniformly bounded).
Proof: For each leaf $t = \{\mu\}$ of the dimension tree we have to store the t-frame $U_t \in \mathbb{R}^{n_\mu \times k_t}$, which yields the second term $\sum_{\mu=1}^{d} k\, n_\mu$. For all $d-1$ interior mode clusters we have to store the transfer tensors $B_t \in \mathbb{R}^{k_t\times k_{t_1}\times k_{t_2}}$; each has at most $k^3$ entries.
Hierarchical Truncation Error
Def.: T a dimension tree, $t \in T$ and $U_t$ an orthogonal t-frame. The orthogonal frame projection $\pi_t: \mathbb{R}^{I} \to \mathbb{R}^{I}$ is defined as
$(\pi_t A)^{(t)} := U_t U_t^T A^{(t)} \text{ for } t \ne \{1,\ldots,d\}, \qquad \pi_{\{1,\ldots,d\}} A := A.$
Theorem (hierarchical truncation error): Dimension tree T, $A \in \mathbb{R}^{I}$. Let $A_{\mathrm{best}}$ be the best approximation of A in $\mathcal{H}\text{-Tucker}\big((k_t)_{t\in T}\big)$, and $\pi_t$ the orthogonal frame projection for the t-frame $U_t$ that consists of the left singular vectors of $A^{(t)}$ corresponding to the $k_t$ largest singular values $\sigma_{t,i}$ of $A^{(t)}$. Then it holds
$\Big\|A - \prod_{t\in T}\pi_t A\Big\|^2 \le \sum_{t\in T}\sum_{i>k_t}\sigma_{t,i}^2 \le 2d\,\|A - A_{\mathrm{best}}\|^2.$
Proof: Lemma: it holds
$\Big\|A - \prod_{t\in T}\pi_t A\Big\|^2 \le \sum_{t\in T}\|A - \pi_t A\|^2.$
Proof of Lemma:
$\|A - \pi_s\pi_t A\|^2 = \|A - \pi_t A\|^2 + \|\pi_t A - \pi_s\pi_t A\|^2 = \|A - \pi_t A\|^2 + \|\pi_t(A - \pi_s A)\|^2 \le \|A - \pi_t A\|^2 + \|A - \pi_s A\|^2.$
Proof of Theorem: it holds
$\|A - \pi_t A\|^2 \le \sum_{i>k_t}\sigma_{t,i}^2 \le \|A - A_{\mathrm{best}}\|^2.$
Applying the above Lemma and the result on the number of nodes of a dimension tree yields
$\Big\|A - \prod_{t\in T}\pi_t A\Big\|^2 \le \sum_{t\in T}\sum_{i>k_t}\sigma_{t,i}^2 \le 2d\,\|A - A_{\mathrm{best}}\|^2.$
The constant can be improved to $(2d-3)$.
Properties of the H-Tucker Format
The set $\mathcal{H}\text{-Tucker}\big((k_t)_{t\in T}\big)$ is
- a closed set in $\mathbb{R}^{I}$, but
- not a linear space (the rank increases under linear combinations).
Storage complexity: with $n := \max_{\mu=1,\ldots,d} n_\mu$ and $k := \max_{t\in T} k_t$,
$\mathrm{Storage}(A) \le \sum_{\mu=1}^{d} k_\mu n_\mu + \sum_{t\in I(T),\ \mathrm{sons}(t)=\{t_1,t_2\}} k_t k_{t_1} k_{t_2} \le dnk + dk^3.$
All tensors of canonical rank k are contained in $\mathcal{H}\text{-Tucker}\big((k_t)_{t\in T}\big)$ (also all tensors of border rank k). The set $\mathcal{H}\text{-Tucker}\big((k_t)_{t\in T}\big)$ is much thinner than the Tucker format because we impose additional rank conditions.
Example
Consider $A \in \mathbb{R}^{3\times3\times3}$, defined through its $\{1,2\}$-matricization (a $9\times3$ matrix) with columns of the form $u_i \otimes q_j$:
$\big(A^{(\{1,2\})}\big)_{(i,j),l} = \begin{cases}(u_1\otimes q_1)_{i,j} & \text{if } l = 1,\\ (u_2\otimes q_2)_{i,j} & \text{if } l = 2,\\ (u_1\otimes q_2)_{i,j} & \text{if } l = 3,\end{cases}$
with $u_1 = (1,0,0)^T$, $u_2 = (0,1,0)^T$ and unit vectors $q_1, q_2$ chosen such that $u_1, u_2, q_1$ are linearly independent.
Dimension tree: root $\{1,2,3\}$ with sons $t = \{1,2\}$ and $\{3\}$; t has the sons $t_1 = \{1\}$ and $t_2 = \{2\}$.
Consider orthogonal mode frames:
$U_t := \big[\,u_1\otimes q_1 \mid u_2\otimes q_2\,\big], \qquad U_{t_1} := [\,u_1 \mid u_2\,], \qquad U_{t_2} := [\,q_1 \mid q_2\,].$
Written out entrywise, $A^{(\{1,2\})} = \big[\,u_1\otimes q_1 \mid u_2\otimes q_2 \mid u_1\otimes q_2\,\big]$ is the $9\times3$ matrix whose only non-zero rows correspond to the supports of $u_1$ and $u_2$.
The leaf projection $\pi_{t_1}$ acts on the first mode:
$\big(\pi_{t_1}A\big)^{(1)} = U_{t_1}U_{t_1}^T A^{(1)} = \big(u_1u_1^T + u_2u_2^T\big)A^{(1)} = \mathrm{diag}(1,1,0)\,A^{(1)},$
and writing out the rank-one terms $u_1\otimes q_1$, $u_2\otimes q_2$, $u_1\otimes q_2$ entrywise gives
$\mathrm{rank}\big((\pi_{t_1}A)^{(1)}\big) = 2.$
The leaf projections combine to
$\pi_{t_1}\pi_{t_2} \longleftrightarrow Q := U_{t_1}U_{t_1}^T \otimes U_{t_2}U_{t_2}^T = \big(u_1u_1^T + u_2u_2^T\big)\otimes\big(q_1q_1^T + q_2q_2^T\big),$
and $QU_t = U_t$, since the columns of $U_t$ are of the form $u\otimes q$ with $u \in \mathrm{span}\{u_1,u_2\}$ and $q \in \mathrm{span}\{q_1,q_2\}$. Hence
$\big(\pi_t\pi_{t_1}\pi_{t_2}A\big)^{(\{1,2\})} = U_tU_t^T\,Q\,A^{(\{1,2\})} = U_tU_t^T\,A^{(\{1,2\})}.$
$\mathrm{rank}\big((\pi_t\pi_{t_1}\pi_{t_2}A)^{(1)}\big) = 3$ (a $3\times9$ matrix), because $u_1, u_2, q_1$ are linearly independent.
The first projection $\pi_{t_1}\pi_{t_2}$ maps A into Tucker(2,2,3), but after the coarser projection $\pi_t$ the 1-mode rank is 3, and thus $\pi_t\pi_{t_1}\pi_{t_2}A \notin \mathrm{Tucker}(2,2,3)$. This is because $\pi_t$ mixes the $t_1$-frame and the $t_2$-frame. Here Tucker(2,2,3) means ranks in the standard Tucker format.
Root-to-Leaves Truncation
Input: tensor A, dimension tree $T_I$, target rank $(k_t)_{t\in T}$.
For each singleton $t \in L(T_I)$ do: compute the SVD of $A^{(t)}$, store the dominant $k_t$ left singular vectors in the columns of the t-frame $U_t$.
For $l = p-1,\ldots,0$ do: for each interior mode cluster t on level l do: compute the SVD of $A^{(t)}$, store the dominant $k_t$ left singular vectors in the columns of the t-frame $U_t$. Let $U_{t_1}$ and $U_{t_2}$ denote the frames for the successors of t on level $l+1$. Compute the entries of the transfer tensor
$(B_t)_{i,j,\nu} = \big\langle (U_t)_i,\ (U_{t_1})_j \otimes (U_{t_2})_\nu \big\rangle.$
Compute the entries of the root (with sons $t_1, t_2$) transfer tensor:
$\big(B_{\{1,\ldots,d\}}\big)_{1,j,\nu} = \big\langle A,\ (U_{t_1})_j \otimes (U_{t_2})_\nu \big\rangle.$
Return the H-Tucker representation $\big((U_t)_{t\in L(T_I)},\ (B_t)_{t\in I(T_I)}\big)$ for $A_H \in \mathcal{H}\text{-Tucker}\big((k_t)_{t\in T_I}\big)$.
Complexity: $O\big((n_1\cdots n_d)^{3/2}\big).$
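A compact d = 4 sketch of this truncation for the balanced tree (root {1,2,3,4} with sons {1,2} and {3,4}; all helper names are ours, and modes are 0-based). The frames come from SVDs of the matricizations of A, the transfer tensors from the inner product formula above, and the final approximation is assembled from the representation, which composes the same frame projections the algorithm applies level by level:

```python
import numpy as np

def matricize(A, t):
    t_comp = [m for m in range(A.ndim) if m not in t]
    rows = int(np.prod([A.shape[m] for m in t]))
    return np.transpose(A, list(t) + t_comp).reshape(rows, -1)

def frame(A, t, k):
    """Dominant k left singular vectors of A^(t)."""
    U, _, _ = np.linalg.svd(matricize(A, t), full_matrices=False)
    return U[:, :k]

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 5, 6, 7))
k = 3

# t-frames for all non-root clusters
U = {t: frame(A, t, k) for t in [(0,), (1,), (2,), (3,), (0, 1), (2, 3)]}

def transfer(t, t1, t2):
    """(B_t)_{i,j,l} = <(U_t)_i, (U_t1)_j kron (U_t2)_l>."""
    return (np.kron(U[t1], U[t2]).T @ U[t]).T.reshape(k, k, k)

B01, B23 = transfer((0, 1), (0,), (1,)), transfer((2, 3), (2,), (3,))
# root transfer tensor: coefficients of A^((0,1)) in the interior son frames
Broot = (np.kron(U[(0, 1)], U[(2, 3)]).T
         @ matricize(A, (0, 1)).reshape(-1)).reshape(k, k)

# re-expand the nested frames and assemble the truncated tensor
U01 = np.kron(U[(0,)], U[(1,)]) @ B01.reshape(k, -1).T
U23 = np.kron(U[(2,)], U[(3,)]) @ B23.reshape(k, -1).T
A_H = (U01 @ Broot @ U23.T).reshape(A.shape)
print(np.linalg.norm(A - A_H) / np.linalg.norm(A))   # relative truncation error
```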
Brothers and related matricizations
For a dimension tree $T_I$ and a non-root mode cluster t with father f we define the unique mode cluster $\bar t$ as the brother of t such that $f = t \cup \bar t$.
Let $T_I$ be a dimension tree with interior node $t = t_1 \cup t_2$. Assume the matricization and the representation
$A^{(t)} = \sum_{\nu=1}^{k} u_\nu v_\nu^T, \qquad u_\nu = \sum_{j=1}^{k_1}\sum_{l=1}^{k_2} c_{\nu,j,l}\, x_j \otimes y_l, \quad x_j \in \mathbb{R}^{I_{t_1}},\ y_l \in \mathbb{R}^{I_{t_2}},\ \nu = 1,\ldots,k.$
This gives the matricization
$A^{(t_1)} = \sum_{j=1}^{k_1} x_j \Big(\sum_{\nu=1}^{k}\sum_{l=1}^{k_2} c_{\nu,j,l}\, y_l \otimes v_\nu\Big)^T.$
Proof:
$\big(A^{(t_1)}\big)_{(i_\mu)_{\mu\in t_1},\,(i_\mu)_{\mu\in t_1'}} = A_{(i_1,\ldots,i_d)} = \sum_{\nu=1}^{k}\sum_{j=1}^{k_1}\sum_{l=1}^{k_2} c_{\nu,j,l}\,(x_j)_{(i_\mu)_{\mu\in t_1}}\,(y_l)_{(i_\mu)_{\mu\in t_2}}\,(v_\nu)_{(i_\mu)_{\mu\in t'}}$
$= \sum_{j=1}^{k_1} (x_j)_{(i_\mu)_{\mu\in t_1}}\,\Big(\sum_{\nu=1}^{k}\sum_{l=1}^{k_2} c_{\nu,j,l}\, y_l \otimes v_\nu\Big)_{(i_\mu)_{\mu\in t_1'}},$
i.e. $A^{(t_1)} = \sum_{j=1}^{k_1} x_j \big(\sum_{\nu,l} c_{\nu,j,l}\, y_l \otimes v_\nu\big)^T$.
Matricization in H-Tucker format
$T_I$ dimension tree, $A \in \mathcal{H}\text{-Tucker}\big((k_t)_{t\in T_I}\big)$ with nested orthogonal frame tree $(U_t)_{t\in T_I}$ and transfer tensors $(B_t)_{t\in T_I}$.
For $p > 0$ let $\mathrm{Root}(T_I) = t_0, t_1, \ldots, t_{p-1}, t_p = t$ be a path of length p. Let $\bar U_1,\ldots,\bar U_p$ denote the frames of the corresponding brothers, $B_0,\ldots,B_{p-1}$ the corresponding transfer tensors, and $k_0,\ldots,k_p$ the corresponding representation ranks. We always assume that the brother comes first:
$(U_{t_l})_\nu = \sum_{i}\sum_{j} (B_l)_{\nu,i,j}\,(\bar U_{l+1})_i \otimes (U_{t_{l+1}})_j.$
Then it holds with the complementary frame $V_t$
$A^{(t)} = \sum_{\nu=1}^{k_t} (U_t)_\nu (V_t)_\nu^T = U_t V_t^T,$
$(V_t)_{j_p} = \sum_{i_1,\ldots,i_p}\sum_{j_1,\ldots,j_{p-1}} (B_0)_{1,i_1,j_1}\cdots(B_{p-1})_{j_{p-1},i_p,j_p}\,(\bar U_1)_{i_1}\otimes\cdots\otimes(\bar U_p)_{i_p}.$
Accumulated Transfer Tensors
$(\hat B_1)_{j_1,\,i_1} := (B_0)_{1,i_1,j_1},$
$(\hat B_l)_{j_l,\,(i_1,\ldots,i_l)} := \sum_{j_{l-1}=1}^{k_{l-1}} (\hat B_{l-1})_{j_{l-1},\,(i_1,\ldots,i_{l-1})}\,(B_{l-1})_{j_{l-1},i_l,j_l}, \qquad l = 2,\ldots,p,$
$\hat B_t := \hat B_p.$
The accumulated transfer tensors are useful for computing all matricizations out of the transfer tensors.
H-Tucker
$(U_t)_i = \sum_{j=1}^{k_{t_1}}\sum_{l=1}^{k_{t_2}} (B_t)_{i,j,l}\,(U_{t_1})_j \otimes (U_{t_2})_l$
[Figure: the recursive structure of the format. The root satisfies $U_{t_0} = A$; the transfer tensors $B_{t_0}, B_{t_1}, \ldots$ connect each cluster to its sons, and the leaves carry the frames $U_{t_1}, U_{t_2}, \ldots$ with ranks $k_{t_1}, k_{t_2}, \ldots$]
(Non-orthogonal) H-Tucker for CP
CP tensor: $A = \sum_{i=1}^{k} \bigotimes_{\mu=1}^{d} a_{\mu,i}, \qquad a_{\mu,i} \in \mathbb{R}^{I_\mu}.$
H-Tucker representation:
Leaves: $\forall t = \{\mu\} \in L(T_I):\quad U_t := [\,a_{\mu,1},\ldots,a_{\mu,k}\,], \quad k_t := k.$
Interior nodes, transfer tensors: $\forall t \in I(T_I)\setminus\mathrm{Root}(T_I)$:
$(B_t)_{i,j,l} := \begin{cases} 1 & \text{if } i = j = l,\\ 0 & \text{otherwise,}\end{cases} \qquad B_t \in \mathbb{R}^{k\times k\times k},\quad k_t := k.$
Root transfer tensor:
$\big(B_{\{1,\ldots,d\}}\big)_{1,j,l} := \begin{cases} 1 & \text{if } j = l,\\ 0 & \text{otherwise,}\end{cases} \qquad B_{\{1,\ldots,d\}} \in \mathbb{R}^{1\times k\times k},\quad k_{\{1,\ldots,d\}} := 1.$
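A sketch of this construction in NumPy (helper names ours); the d = 2 check below verifies the root rule, where $A^{(\{1\})} = U_1\,B_{\mathrm{root}}\,U_2^T$ reduces to $\sum_i a_i b_i^T$:

```python
import numpy as np

def cp_interior_transfer(k):
    """(B_t)_{i,j,l} = 1 iff i == j == l (a 'diagonal' k x k x k tensor)."""
    B = np.zeros((k, k, k))
    idx = np.arange(k)
    B[idx, idx, idx] = 1.0
    return B

def cp_root_transfer(k):
    """(B_root)_{1,j,l} = 1 iff j == l, of size 1 x k x k."""
    return np.eye(k)[None, :, :]

k = 4
rng = np.random.default_rng(3)
a, b = rng.standard_normal((5, k)), rng.standard_normal((6, k))
# leaf frames are the CP factor matrices U_mu = [a_{mu,1} ... a_{mu,k}]
A_mat = sum(np.outer(a[:, i], b[:, i]) for i in range(k))   # d = 2 CP tensor
print(np.allclose(a @ cp_root_transfer(k)[0] @ b.T, A_mat))  # True
```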
Examples
Consider the tensor
$A_{(i_1,\ldots,i_d)} := \Big(\sum_{\mu=1}^{d} i_\mu^2\Big)^{-1/2}, \qquad d = 5,\ n = 25,$
as a discretization of the function $1/\|x\|$ on $[1,25]^d$.
Approximation by H-Tucker: the matricizations $25^3\times25^2 = 15625\times625$, $25^2\times25^3 = 625\times15625$ and $25\times25^4 = 25\times390625$ each show about 22 large singular values (the table from the earlier "SVD of $A^{(t)}$" slide).
Tensor Train
Instead of a complete binary tree we can also consider a linear list:
- Partitioning the index set in half: $(i_1,\ldots,i_{2d}) \to (i_1,\ldots,i_d),\ (i_{d+1},\ldots,i_{2d})$
- Partitioning the index set into one and the rest: $(i_1,\ldots,i_{d+1}) \to (i_1,\ldots,i_d),\ (i_{d+1})$
Tensor train by recursive splitting:
TT: $A_{(i_1,\ldots,i_d)} = \sum_{j_1,\ldots,j_{d-1}=1}^{D_1,\ldots,D_{d-1}} g^{1}_{i_1;j_1}\, g^{2}_{j_1;i_2;j_2}\cdots g^{d-1}_{j_{d-2};i_{d-1};j_{d-1}}\, g^{d}_{j_{d-1};i_d}$
CP: $A_{(i_1,\ldots,i_d)} = \sum_{j=1}^{D} u^{1}_{i_1;j}\, u^{2}_{i_2;j}\cdots u^{d}_{i_d;j}$
Recursive splitting:
$A_{(i_1,\ldots,i_d)} = \sum_{j_1=1}^{D_1} g^{1}_{i_1;j_1}\, A^{1}_{j_1;(i_2,\ldots,i_d)} = \sum_{j_1=1}^{D_1}\sum_{j_2=1}^{D_2} g^{1}_{i_1;j_1}\, g^{2}_{j_1;i_2;j_2}\, A^{2}_{j_2;(i_3,\ldots,i_d)} = \cdots = \sum_{j_1,\ldots,j_{d-1}} g^{1}_{i_1;j_1}\, g^{2}_{j_1;i_2;j_2}\, g^{3}_{j_2;i_3;j_3}\cdots g^{d}_{j_{d-1};i_d}$
Tensor train by recursive splitting (d = 4):
$A(i_1,\ldots,i_4) = \sum_{j_1=1}^{D_1} g^{1}_{i_1;j_1}\, A^{1}_{j_1;(i_2,i_3,i_4)} = \sum_{j_1,j_2=1}^{D_1,D_2} g^{1}_{i_1;j_1}\, g^{2}_{j_1;i_2;j_2}\, A^{2}_{j_2;(i_3,i_4)} = \sum_{j_1,j_2,j_3=1}^{D_1,D_2,D_3} g^{1}_{i_1;j_1}\, g^{2}_{j_1;i_2;j_2}\, g^{3}_{j_2;i_3;j_3}\, g^{4}_{j_3;i_4}$
[Figure: the chain $i_1 - j_1 - i_2 - j_2 - i_3 - j_3 - i_4$ built up step by step.]
Tensor Train Network
[Figure: the TT network for A with physical indices $i_1, i_2, \ldots, i_{d-1}, i_d$ and cores carrying the index groups $(i_1 j_1),\ (j_1 i_2 j_2),\ (j_2 i_3 j_3),\ \ldots,\ (j_{d-2} i_{d-1} j_{d-1}),\ (j_{d-1} i_d)$.]
Matrix Formulation
$A_{(i_1,\ldots,i_d)} = \sum_{j_1,\ldots,j_{d-1}} g^{1}_{i_1;j_1}\, g^{2}_{j_1;i_2;j_2}\cdots g^{d}_{j_{d-1};i_d} = G^{1}_{i_1}\, G^{2}_{i_2}\cdots G^{d-1}_{i_{d-1}}\, G^{d}_{i_d}$
with $D_{j-1}\times D_j$ matrices $G^{j}_{i_j}$ as core tensors,
$i_j = 1,\ldots,n_j, \qquad G^{j}_{i_j} \in \mathbb{R}^{D_{j-1}\times D_j} \text{ with } D_0 = D_d = 1.$
$G^{1}_{i_1}$ and $G^{d}_{i_d}$ are vectors (a row and a column vector, respectively). The j indices are related to the matrix product.
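A sketch of evaluating a single entry via this chain of matrix products (cores stored as (D_{k-1}, n_k, D_k) arrays; names ours):

```python
import numpy as np

def tt_entry(cores, idx):
    """A_(i1,...,id) = G^1_{i1} G^2_{i2} ... G^d_{id}: a product of matrices.
    Core k has shape (D_{k-1}, n_k, D_k) with D_0 = D_d = 1."""
    v = np.ones((1, 1))
    for G, i in zip(cores, idx):
        v = v @ G[:, i, :]          # pick slice i_k, extend the matrix chain
    return v[0, 0]

# random TT with d = 4, n = 5, ranks (1, 3, 3, 3, 1)
rng = np.random.default_rng(5)
D = [1, 3, 3, 3, 1]
cores = [rng.standard_normal((D[k], 5, D[k + 1])) for k in range(4)]
print(tt_entry(cores, (0, 1, 2, 3)))
```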
Visualization
$A_{i_1,\ldots,i_d} = G^{1}_{i_1}\, G^{2}_{i_2}\cdots G^{d-1}_{i_{d-1}}\, G^{d}_{i_d}$
[Figure: for each mode $\mu$ pick the $i_\mu$-th slice of the core $G^{\mu}$ (one of $n_\mu$ matrices $G^{\mu}_{1},\ldots,G^{\mu}_{n_\mu}$) and multiply the chain of $D_{\mu-1}\times D_\mu$ matrices.]
Periodic case:
$i_j = 1,\ldots,n_j; \qquad G^{j}_{i_j} \in \mathbb{R}^{D_{j-1}\times D_j} \text{ for } j = 1,\ldots,d, \quad D_0 = D_d.$
With an additional index $j_d$, and the summation over $j_d$ represented by the trace:
$A_{(i_1,\ldots,i_d)} = \sum_{j_1,\ldots,j_d} g^{1}_{j_d;i_1;j_1}\, g^{2}_{j_1;i_2;j_2}\cdots g^{d}_{j_{d-1};i_d;j_d} = \mathrm{trace}\big(G^{1}_{i_1}\, G^{2}_{i_2}\cdots G^{d}_{i_d}\big)$
For d = 4:
$A_{(i_1,\ldots,i_4)} = \sum_{j_1,j_2,j_3,j_4=1}^{D_1,D_2,D_3,D_4} g^{1}_{j_4;i_1;j_1}\, g^{2}_{j_1;i_2;j_2}\, g^{3}_{j_2;i_3;j_3}\, g^{4}_{j_3;i_4;j_4}$
[Figure: the periodic TT (ring) network, with the extra bond index $j_4$ closing the chain.]
Rank - Unfolding
$A_p := A_{\{i_1,\ldots,i_p\},\{i_{p+1},\ldots,i_d\}}, \qquad r_p := \mathrm{rank}(A_p).$
The $D_p$ are called compression ranks (= sizes of the core matrices).
Theorem: If for each unfolding $\mathrm{rank}(A_p) = r_p = D_p$, then there exists a TT decomposition with compression ranks not higher than $D_p$.
Proof: Start with the first unfolding,
$A_1 = UV^T, \qquad A_{i_1,(i_2\ldots i_d)} = \sum_{j_1=1}^{n_1} U_{i_1,j_1}\, V^T_{j_1,(i_2,\ldots,i_d)}.$
Consider V also as a $(d-1)$-tensor with indices $\{j_1i_2\},\{i_3\},\ldots,\{i_d\}$, with the "long index" $j_1i_2$ varying from 1 to $n_1n_2$, and consider all unfoldings of V, resulting in $V_2,\ldots,V_d$. We show: $\mathrm{rank}(V_p) \le r_p$.
Proof: To prove $\mathrm{rank}(V_p) \le r_p$ we express V as
$A = UV^T \;\Rightarrow\; V^T = \big(U^TU\big)^{-1}U^TA = WA,$
componentwise
$V^T_{j_1,(i_2,\ldots,i_d)} = \sum_{i_1=1}^{n_1} W_{j_1,i_1}\, A_{i_1,i_2,\ldots,i_d}.$
Because the p-th mode has compression rank $r_p$ it holds
$A_{(i_1,\ldots,i_p),(i_{p+1},\ldots,i_d)} = \sum_{\beta=1}^{r_p} F_{(i_1,\ldots,i_p),\beta}\, G_{\beta,(i_{p+1},\ldots,i_d)} \;\Rightarrow$
$(V_p)_{(j_1,i_2,\ldots,i_p),(i_{p+1},\ldots,i_d)} = \sum_{i_1=1}^{n_1} W_{j_1,i_1}\sum_{\beta=1}^{r_p} F_{(i_1,\ldots,i_p),\beta}\, G_{\beta,(i_{p+1},\ldots,i_d)} = \sum_{\beta=1}^{r_p} H_{(j_1,i_2,\ldots,i_p),\beta}\, G_{\beta,(i_{p+1},\ldots,i_d)}$
with, componentwise,
$H_{(j_1,i_2,\ldots,i_p),\beta} = \sum_{i_1=1}^{n_1} W_{j_1,i_1}\, F_{(i_1,\ldots,i_p),\beta},$
resulting in $\mathrm{rank}(V_p) \le r_p$. Repeat what we started with A and $i_1$, now for V and $i_2$, and so on.
Complexity
Following the proof, the TT form of a general tensor A can be derived by successive SVDs of matrix unfoldings.
The number of parameters in the TT format is bounded by $(d-2)nD^2 + 2nD$, where $n = \max(n_1,\ldots,n_d)$ and $D = \max(r_1,\ldots,r_d)$. Proof: each core consists of at most n matrices, and the number of entries per matrix is bounded by $D^2$ for the interior cores, resp. D for the first/last (row/column) vectors.
In the periodic case the bound is given by $dnD^2$.
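A minimal TT-SVD sketch along the lines of the proof (successive SVDs of unfoldings, here with a plain maximum-rank cut; names ours). The test tensor is a discretization similar in spirit to the $1/\|x\|$ example earlier, shifted by 1 to avoid the singularity at the zero index:

```python
import numpy as np

def tt_svd(A, max_rank):
    """TT cores G^k of shape (D_{k-1}, n_k, D_k), D_0 = D_d = 1."""
    d, shape = A.ndim, A.shape
    cores, C, r = [], A, 1
    for k in range(d - 1):
        C = C.reshape(r * shape[k], -1)          # unfold: left indices vs rest
        Uk, s, Vt = np.linalg.svd(C, full_matrices=False)
        rk = min(max_rank, len(s))
        cores.append(Uk[:, :rk].reshape(r, shape[k], rk))
        C = s[:rk, None] * Vt[:rk]               # carry the remainder along
        r = rk
    cores.append(C.reshape(r, shape[-1], 1))
    return cores

A = np.fromfunction(lambda *i: 1.0 / np.sqrt(1.0 + sum(x**2 for x in i)),
                    (10, 10, 10, 10))
cores = tt_svd(A, max_rank=5)
B = cores[0].reshape(-1, cores[0].shape[-1])
for G in cores[1:]:                              # contract the train back
    B = (B @ G.reshape(G.shape[0], -1)).reshape(-1, G.shape[-1])
print(np.linalg.norm(A - B.reshape(A.shape)) / np.linalg.norm(A))
```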
Approximation
Suppose that the unfolding matrices are only approximated by low-rank terms:
$A_p = R_p + E_p, \qquad \mathrm{rank}(R_p) = r_p, \quad \|E_p\|_F = \varepsilon_p, \quad p = 1,\ldots,d-1.$
Theorem: With the algorithm we can compute for a given tensor A a TT tensor B with ranks $r_p$ and
$\|A - B\|_F^2 \le \sum_{p=1}^{d-1} \varepsilon_p^2, \qquad \|A - B\|_F \le \sqrt{d-1}\,\|A - A_{\mathrm{best}}\|_F.$
Recompression TT → TT
Let us assume that we are already given a tensor in the TT format
$A_{i_1\ldots i_d} = G^{1}_{i_1}\, G^{2}_{i_2}\cdots G^{d-1}_{i_{d-1}}\, G^{d}_{i_d}.$
We want to derive the minimal ranks, resp. compute approximations with smaller ranks. More tomorrow.
Basic Operations
Addition: $C_{i_1\ldots i_d} = A_{i_1\ldots i_d} + B_{i_1\ldots i_d}$ with cores
$C^{1}_{i_1} = \big[\,A^{1}_{i_1}\ \ B^{1}_{i_1}\,\big], \qquad C^{k}_{i_k} = \begin{pmatrix} A^{k}_{i_k} & 0\\ 0 & B^{k}_{i_k}\end{pmatrix}\ (1 < k < d), \qquad C^{d}_{i_d} = \begin{pmatrix} A^{d}_{i_d}\\ B^{d}_{i_d}\end{pmatrix}.$
Scalar multiplication:
$(\alpha A)_{i_1\ldots i_d} = \big(\alpha A^{1}_{i_1}\big)\,A^{2}_{i_2}\cdots A^{d}_{i_d}.$
Periodic:
$C_{i_1\ldots i_d} = \mathrm{trace}\Big(\begin{pmatrix} A^{1}_{i_1} & 0\\ 0 & B^{1}_{i_1}\end{pmatrix}\cdots\begin{pmatrix} A^{d}_{i_d} & 0\\ 0 & B^{d}_{i_d}\end{pmatrix}\Big) = A_{i_1\ldots i_d} + B_{i_1\ldots i_d}.$
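The block construction for addition, sketched in NumPy (cores as (D_{k-1}, n, D_k) arrays; names ours), together with an entrywise check:

```python
import numpy as np

def tt_add(A_cores, B_cores):
    """C = A + B in TT: first cores concatenated side by side, last cores
    stacked, interior cores put into block-diagonal form."""
    d, C_cores = len(A_cores), []
    for k, (GA, GB) in enumerate(zip(A_cores, B_cores)):
        ra0, n, ra1 = GA.shape
        rb0, _, rb1 = GB.shape
        if k == 0:
            G = np.concatenate([GA, GB], axis=2)        # (1, n, ra1+rb1)
        elif k == d - 1:
            G = np.concatenate([GA, GB], axis=0)        # (ra0+rb0, n, 1)
        else:
            G = np.zeros((ra0 + rb0, n, ra1 + rb1))
            G[:ra0, :, :ra1] = GA
            G[ra0:, :, ra1:] = GB
        C_cores.append(G)
    return C_cores

def entry(cores, idx):
    v = np.ones((1, 1))
    for G, i in zip(cores, idx):
        v = v @ G[:, i, :]
    return v[0, 0]

rng = np.random.default_rng(6)
A_cores = [rng.standard_normal(s) for s in [(1, 4, 2), (2, 4, 2), (2, 4, 1)]]
B_cores = [rng.standard_normal(s) for s in [(1, 4, 3), (3, 4, 3), (3, 4, 1)]]
C_cores = tt_add(A_cores, B_cores)
print(np.isclose(entry(C_cores, (1, 2, 3)),
                 entry(A_cores, (1, 2, 3)) + entry(B_cores, (1, 2, 3))))
```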
Hadamard product
$C_{i_1\ldots i_d} = A_{i_1\ldots i_d}\, B_{i_1\ldots i_d} = \big(A^{1}_{i_1}\otimes B^{1}_{i_1}\big)\big(A^{2}_{i_2}\otimes B^{2}_{i_2}\big)\cdots\big(A^{d}_{i_d}\otimes B^{d}_{i_d}\big).$
Inner product: take the Hadamard product to derive C in TT format, and then compute the contractions with vectors of all ones:
$\langle A, B\rangle = \sum_{i_1,\ldots,i_d} A_{i_1\ldots i_d}\, B_{i_1\ldots i_d} = \sum_{i_1,\ldots,i_d} C^{1}_{i_1}\, C^{2}_{i_2}\cdots C^{d}_{i_d}.$
Inner Product:
$\langle A, B\rangle = \sum_{i_1,\ldots,i_d} C^{1}_{i_1}\cdots C^{d}_{i_d} = \sum_{i_d}\Big(\cdots\Big(\sum_{i_2}\Big(\sum_{i_1} C^{1}_{i_1}\Big)C^{2}_{i_2}\Big)\cdots\Big)C^{d}_{i_d},$
i.e. the sums are swept from left to right; each step is a short sum of small matrix-vector products. Total data complexity: $O(dnD^2)$ in the ranks of C.
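A sketch of this left-to-right sweep (names ours); `v` holds the partial contraction over the pair of ranks, so the full Hadamard tensor C is never formed (the plain einsum below does not exploit further structure):

```python
import numpy as np

def tt_inner(A_cores, B_cores):
    """<A, B> for TT tensors, swept left to right."""
    v = np.ones((1, 1))
    for GA, GB in zip(A_cores, B_cores):
        # contract the previous ranks (a, b) and the physical index i
        v = np.einsum("ab,aij,bil->jl", v, GA, GB)
    return v[0, 0]

rng = np.random.default_rng(7)
A_cores = [rng.standard_normal(s) for s in [(1, 4, 2), (2, 4, 2), (2, 4, 1)]]
norm2 = tt_inner(A_cores, A_cores)      # <A, A> = ||A||_F^2
print(norm2 >= 0)
```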
Equivalent formulation for vectors
The TT vector
$x_{(i_1,\ldots,i_d)} = A^{1}_{i_1}\, A^{2}_{i_2}\cdots A^{d}_{i_d}$
can be rewritten as a sum of elementary tensor products,
$x = \sum_{j_1,\ldots,j_{d-1}} u^{1}_{j_1}\otimes u^{2}_{j_1,j_2}\otimes\cdots\otimes u^{d-1}_{j_{d-2},j_{d-1}}\otimes u^{d}_{j_{d-1}},$
with vectors $u^{k}_{j_{k-1},j_k}$ of length $n_k$ (the mode-k fibers of the k-th core). Compare CP.
Inner product for vectors:
$y^Tx = \sum_{j_1,\ldots,j_{d-1}}\sum_{j_1',\ldots,j_{d-1}'} \big((v^{1}_{j_1'})^T u^{1}_{j_1}\big)\big((v^{2}_{j_1',j_2'})^T u^{2}_{j_1,j_2}\big)\cdots\big((v^{d}_{j_{d-1}'})^T u^{d}_{j_{d-1}}\big) = \sum_{j,j'} w^{1}_{(j_1,j_1')}\, w^{2}_{(j_1,j_1'),(j_2,j_2')}\cdots w^{d}_{(j_{d-1},j_{d-1}')},$
with the scalars
$w^{k}_{(j_{k-1},j_{k-1}'),(j_k,j_k')} := \big(v^{k}_{j_{k-1}',j_k'}\big)^T u^{k}_{j_{k-1},j_k},$
i.e. the inner product of two TT vectors again has TT structure with the index pairs $(j_k, j_k')$ as bond indices.
Special Cases
Representing the unit vector $e_i$:
$e_i = e_{i_1}\otimes e_{i_2}\otimes\cdots\otimes e_{i_d} = a^{1}_{i_1}\, a^{2}_{i_2}\cdots a^{d}_{i_d}$
with scalar ($1\times1$) cores
$a^{k}_{j_k} = \begin{cases} 1 & \text{if } j_k = i_k,\\ 0 & \text{if } j_k \ne i_k,\end{cases} \quad = \delta_{i_k,j_k}.$
Examples
$x_{i_1\ldots i_d} = A^{1}_{i_1}\, A^{2}_{i_2}\cdots A^{d}_{i_d}$
Rank-2 examples with 0/1 cores, e.g. a sum of two unit vectors such as $e = (0,\ldots,0,1,1,0,\ldots,0)^T$: choosing
$A^{1}_{i_1} = \big(\delta_{i_1,k_1}\ \ \delta_{i_1,l_1}\big), \qquad A^{\mu}_{i_\mu} = \begin{pmatrix}\delta_{i_\mu,k_\mu} & 0\\ 0 & \delta_{i_\mu,l_\mu}\end{pmatrix}, \qquad A^{d}_{i_d} = \begin{pmatrix}\delta_{i_d,k_d}\\ \delta_{i_d,l_d}\end{pmatrix}$
represents $x = e_{(k_1,\ldots,k_d)} + e_{(l_1,\ldots,l_d)}$ with all compression ranks equal to 2.
Application to functions
Truncated ANOVA decomposition:
$f(x_1,\ldots,x_d) \approx g_0 + \sum_{j_1=1}^{d} g_1(x_{j_1}) + \sum_{j_1,j_2=1}^{d} g_2(x_{j_1},x_{j_2}) + \sum_{j_1,j_2,j_3=1}^{d} g_3(x_{j_1},x_{j_2},x_{j_3}) + \cdots$
Find functions $g_k$ that allow a good approximation of f.
Tensor train approximation with matrices $G_k$:
$f(x_1,\ldots,x_d) \approx g_1(x_1)\, G_2(x_2)\cdots G_{d-1}(x_{d-1})\, g_d(x_d).$
CP analogue:
$f(x_1,\ldots,x_d) \approx \sum_{j=1}^{n} g_{1,j}(x_1)\, g_{2,j}(x_2)\cdots g_{d-1,j}(x_{d-1})\, g_{d,j}(x_d).$
Similarly we can generalize CP, Tucker, and H-Tucker to functions.
Representing Matrices
$M_{(i_1,\ldots,i_d),(j_1,\ldots,j_d)} = A_{\{i_1j_1\},\ldots,\{i_dj_d\}} = G^{1}_{i_1,j_1}\, G^{2}_{i_2,j_2}\cdots G^{d}_{i_d,j_d},$
$M = \sum_{j_1,\ldots,j_{d-1}} U^{1}_{j_1}\otimes U^{2}_{j_1,j_2}\otimes\cdots\otimes U^{d}_{j_{d-1}}.$
Special case: the Laplacian.
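For the Laplacian special case, a sketch of the usual rank-2 operator-TT construction for a Kronecker sum (the block form below is the standard one, not spelled out on the slide); it is checked against the explicit Kronecker sum for small n and d = 3:

```python
import numpy as np

n = 4
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D Laplacian stencil
I = np.eye(n)

# operator-TT cores with blocks of size n x n:
# first: [L  I], interior: [[I  0], [L  I]], last: [I  L]^T
first = np.stack([L, I], axis=0)           # shape (2, n, n): block row
mid = np.zeros((2, 2, n, n))
mid[0, 0], mid[1, 0], mid[1, 1] = I, L, I
last = np.stack([I, L], axis=0)            # shape (2, n, n): block column

# assemble the full operator from the cores and compare with the Kronecker sum
M = np.einsum("aij,abkl,bmn->ikmjln", first, mid, last).reshape(n**3, n**3)
K = (np.kron(np.kron(L, I), I) + np.kron(np.kron(I, L), I)
     + np.kron(np.kron(I, I), L))
print(np.allclose(M, K))                   # True
```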
Generalizing Matrix Properties
Let A be a symmetric tensor of order m and dimension n, and $x^m$ a rank-one tensor for a vector $x \in \mathbb{R}^n$:
$x^m := x\otimes\cdots\otimes x \quad (m \text{ times}).$
Define the n-dimensional homogeneous polynomial of degree m
$f(x) := Ax^m := \sum_{i_1,\ldots,i_m=1}^{n} a_{i_1\ldots i_m}\, x_{i_1}\cdots x_{i_m} \in \mathbb{R}.$
A symmetric: invariant under any index permutations. A positive definite: $f(x) > 0$ for all $x \ne 0$.
Eigenvalue
Define the vector $Ax^{m-1} \in \mathbb{R}^n$ by
$\big(Ax^{m-1}\big)_i := \sum_{i_2,\ldots,i_m=1}^{n} a_{i,i_2,\ldots,i_m}\, x_{i_2}\cdots x_{i_m}.$
We call a number $\lambda \in \mathbb{C}$ an eigenvalue of A if $\lambda$ and $x \ne 0$ are solutions of the polynomial equation
$Ax^{m-1} = \lambda\, Ix^{m-1}, \qquad \big(Ix^{m-1}\big)_i = x_i^{m-1},$
or: the vector x is a fixed point of the operator A.
Here I is the identity tensor: $a_{i,i,\ldots,i} = 1$, and $a = 0$ otherwise.
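A small einsum sketch of these quantities for m = 3 (the symmetrization and names are ours):

```python
import numpy as np

# f(x) = A x^3 (a homogeneous cubic) and the vector A x^{m-1}; the eigenvalue
# equation reads (A x^2)_i = lambda * x_i^2 for the identity tensor above.
rng = np.random.default_rng(8)
n = 4
A = rng.standard_normal((n, n, n))
A = (A + A.transpose(0, 2, 1) + A.transpose(1, 0, 2) + A.transpose(1, 2, 0)
     + A.transpose(2, 0, 1) + A.transpose(2, 1, 0)) / 6   # symmetrize

x = rng.standard_normal(n)
f = np.einsum("ijk,i,j,k->", A, x, x, x)     # A x^m, a scalar
Ax2 = np.einsum("ijk,j,k->i", A, x, x)       # A x^{m-1}, a vector
print(f, Ax2)
```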
Matrix terms
A Hermitian:
- invariant subspace: $Ax = \lambda x$
- Rayleigh quotient: $x^TAx / x^Tx$
- Lagrange multipliers: $x^TAx - \lambda(\|x\|^2 - 1)$
- best rank-1 approximation: $\min_{\|x\|=1}\|A - \lambda\, xx^T\|$
A general:
- pseudospectrum: $\sigma_\varepsilon(A) = \{\lambda \mid \|(A - \lambda I)^{-1}\| > \varepsilon^{-1}\}$
- numerical range: $W(A) = \{x^*Ax \mid x^Tx = 1\}$
How to generalize to tensors?
Numerical Range
Rayleigh quotient: $A\cdot(x_1,\ldots,x_d) := \sum_{i_1,\ldots,i_d} A_{i_1,\ldots,i_d}\,(x_1)_{i_1}\cdots(x_d)_{i_d}$
Range: $W(A) := \{A\cdot(x,\ldots,x)\}$
CP or Tucker as generalizations of the SVD:
- CP: losing the orthogonality!
- Tucker: losing the diagonal form of the core tensor!
Eigenvalues
As critical points of the Rayleigh quotient
$\frac{A\cdot(x,\ldots,x)}{I\cdot(x,\ldots,x)},$
via the Lagrangian
$L(x,\lambda) := A\cdot(x,\ldots,x) - \lambda\big(\|x\|_d^d - 1\big).$
The characteristic polynomial $p(\lambda)$ is the resultant of the two polynomials $Ax^{m-1}$ and $\lambda x^{m-1}$ (searching for common zeros). Eigenvalues are the roots of p.
- Number of eigenvalues: $n(m-1)^{n-1}$
- Product of the eigenvalues = $\det(A)$ = resultant of $Ax^{m-1} = 0$
- Sum of all eigenvalues equals $(m-1)^{n-1}\,\mathrm{trace}(A)$
(Liqun Qi, Lek-Heng Lim)
Software
- Kolda: data structures, CP, Tucker
- Oseledets: TT
- Kressner, Tobler: H-Tucker
- ALPS: quantum simulation
http://www.sandia.gov/~tgkolda/TensorToolbox/index-2.5.html
http://spring.inm.ras.ru/osel/?page_id=24
http://www.sam.math.ethz.ch/NLAgroup/htucker_toolbox.html
https://www.rdb.ethz.ch/projects/project.php?proj_id=8486