Input: i) n⇥ d input data matrix S (or covariance A =
1/n · S>S)ii) k: # of components, iii) s # nnz entries/component,
iv) accuracy ✏ 2 (0, 1), v) r: rank of approximation,
Output: X(r) 2 Xk such that
Tr�X>
(r)AX(r)
�� (1� ✏) · OPT� 2 · k · kA�Ak2,
in time TSKETCH(r) + TSVD(r) +O⇣�
4✏
�r·k · d · (s · k)2⌘.
Theorem II: Algo Guarantees (Full rank)
Sparse PCA via Bipartite MatchingsMegasthenis Asteris, Dimitris Papailiopoulos, Anastasios Kyrillidis, Alex Dimakis
[ Multiple Sparse Components ]
Given a covariance matrix , find direction of maximum variance, as a linear combination of only a few variables:
Component1 2 3 4 5 6
CumulativeExpl.Variance
0
500
1000
1500
2000
2500
3000
+20:99%
k = 6 components, s = 10 nnz/component
TPowerEM-SPCASpanSPCASPCABiPart
Number of target components2 3 4 5 6 7 8
TotalCumulativeExpl.
Variance
0
500
1000
1500
2000
2500
3000
3500
+10.01%
+16.22%
+17.94%+19.10%
+20.99%+23.57%
+24.51%
s = 10 nnz/component
SPCABiPartSpanSPCAEM-SPCATPower
2
664
1 0 0 ✏0 � 0 00 0 � 0✏ 0 0 1
3
775
Solution I: Deflation
A =
2
664
1 0 0 ✏0 � 0 00 0 � 0✏ 0 0 1
3
775
2
664
1 0 0 ✏0 � 0 00 0 � 0✏ 0 0 1
3
775
Solution II: Joint Optimization
[ One approach: Deflation ]
[ Sparse PCA ]
Ax
>
x
A XX>
x? = arg max
x2X
�x
>Ax
�
X =�x 2 Rd : kxk2 = 1, kxk0 = s
Find multiple sparse components with disjoint support sets:
Example: NY Times text corpus - Find 8 components, each 10-sparse. - Sparse disjoint components interpreted as distinct topics.
[ SPCA on a Low Dim Sketch]
Xk =
(X 2 Rd⇥k : kXjk2 = 1, kXjk0 = s, 8j
supp(Xi) \ supp(Xj) = ;, 8 i, j
)
X? = arg max
X2Xk
Tr�X>AX
�(MultiSPCA)
SPCA Algo
SKETCHS S
A 1/n · S>S
A
X(r)
[ In Practice ]- Taking too long? - Run our algorithm and stop it any time. → Ignore the theoretical guarantees → Still finds solutions with higher explained variance, compared to deflation based methods.
Example: Leukemia Dataset - # samples n = 72, dimension d=12582 (probe sets) - Compare to deflation using TPower, EM-SPCA and SpanSPCA
for the single component SPCA problem.
�max
✓1 ✏✏ 1
�◆�max
✓� 00 �
�◆+
= 1 + 1 = 2
= 1 + ✏+ � ⌧ 2
+�max
✓1 00 �
�◆�max
✓� 00 1
�◆
Problem: Given a 4 x 4 PSD matrix , find two 2-sparse components with disjoint supports, that maximize .
x1,x2
x
>1 Ax1 + x
>2 Ax2
A
Compute components one-by-one. - Compute one sparse PC. - Remove used variables from the dataset. - Repeat.
A
Theorem I: Algo Guarantees (Low rank)Input: i) d⇥d rank-r PSD matrixA, ii) k: desired # of components,
iii) s # nnz entries/component, iv) accuracy ✏ 2 (0, 1).
Output: X 2 Xk such that
Tr⇣X
>AX
⌘� (1� ✏) · OPT,
in time TSVD(r) +O⇣�
4✏
�r·k · d · (s · k)2⌘.
kX
j=1
DbXj ,Wj
E2=
kX
j=1
X
i2Ij
W 2ij .
Observation I If we knew the support sets , we could determine the optimal value based on Cauchy-Schwarz:
I1, . . . , Ik
u(1)1
u(1)s
...
u(k)1
u(k)s
...
v1
vp
vi
...
...
...
W 2i1
W 2i1
W 2ik
W 2ik
U1
Uk
Maximum Weight Matching on G: - Each vertex on the left is mapped to a vertex on the right.→ indices are assigned to each “support set”.
- Each right vertex is used at most once.→ Support sets are disjoint.
- Maximum weight = maximum objective in (✳).
Consider the complete bipartite graph G on vertices:k · s+ d
[ Subroutine ]
bX = arg max
X2Xk
kX
j=1
⌦Xj ,Wj
↵2
: disjoint support sets of the k components (columns of ).I1, . . . , Ik
In reality, data is not low rank. However, maybe close to low rank. → Spectrum of may be sharply decaying → is well approximated by a low rank matrix.
A
A
[ Our Algorithm ]
A = VV>
x
>Ax = kV>
xk22 �⌦V
>x, c
↵28c 2 Rr : kck2 = 1
x
>Ax = max
c2Sr�12
hx, Vci2
Matrix is PSD and can be decomposed into .
Observation I
For multiple components…
Observation II
max
X2Xk
Tr�X>AX
�= max
X2Xk
max
C:Cj2Sr�12 8j
kX
j=1
⌦Xj , VCj
↵2.
V V>A =A
In turn, a variational characterization is the following:
Fix the value of the variable . Let .r ⇥ k C
→ SPCA reduces to determining the optimal . → Low dimensional variable: sample to find the best.
bX = arg max
X2Xk
kX
j=1
⌦Xj ,Wj
↵2
W VC
C
Input: rank- PSD - Initialize empty collection - Compute ( ) - For
Sample Compute Solve
Add to the collection . Output: Best solution in collection .
d⇥ d r A
i = 1 : O⇣(4/✏)r·k
⌘
S
S
W VC
bX = arg max
X2Xk
kX
j=1
⌦Xj ,Wj
↵2
bXX
C
V Chol(A) d⇥ r
[ Algorithm ]
S
AThink of the d x d matrix as having rank r. For now r < d.
[ Summary ]- First algorithm for multi-component SPCA with disjoint supports;
Operates by recasting MultiSPCA into multiple instances of the bipartite maximum weight matching problem.
- Provable approximation guarantees. - Complexity:
- Low-order polynomial in the ambient dimension d, but - Exponential in the intrinsic dimension r.
bX
✳
s
Input: matrix : # nnz entries / column of ) 1. Construct bipartite graph G as above. 2. Compute maximum weight matching
to determine the supports 3. Compute each column of for the given
support based on Cauchy-Schwarz.
I1, . . . , Ik
Ws bX
bX
[ Algorithm ]d⇥ k