View
10
Download
0
Category
Preview:
Citation preview
Screening Tests for the LASSO
Hanxiao Liuhanxiaol@cs.cmu.edu
November 3, 2015
1 / 18
Outline
LASSO
Primal & Dual formPrimal-dual correspondence
Safe Test for LASSO
Static case (example: sphere test)Dynamic case
Better safe test based on duality gap
Geometric illustrationConvergence
Empirical Results
2 / 18
LASSO
X ∈ Rn×p where p� n
minβ∈Rp
1
2‖Xβ − y‖22 + λ‖β‖1 (1)
Commonly used forhigh-dimensional feature selection
Today’s topic—screening tests
Rules to early discard irrelevant features prior to starting aLASSO solver without affecting the final opt solution.
The “chicken-and-egg problem”?
3 / 18
LASSO Dual
Primal: minβ∈Rp,z∈Rn
1
2‖y − z‖22 + λ‖β‖1 s.t. z = Xβ (2)
Dual: maxu∈Rn
minβ,z
1
2‖y − z‖22 + λ‖β‖1 + u> (z −Xβ)︸ ︷︷ ︸
Lagrangian L(β,z,u)︸ ︷︷ ︸Dual Objective g(u)
(3)
=⇒ maxu
[minz
(1
2‖y − z‖22 + u>z
)− λmax
β
(u>X
λβ − ‖β‖1
)](4)
=⇒ maxu
1
2‖y‖22 −
1
2‖u− y‖22 − λI{∥∥∥X>u
λ
∥∥∥∞≤1} (5)
θ=uλ====⇒ max
θ∈Rn1
2‖y‖22 −
λ2
2
∥∥∥θ − y
λ
∥∥∥2
2s.t.
∥∥X>θ∥∥∞ ≤ 1 (6)
4 / 18
Karush-Khun-Tucker Condition
Primal-dual correspondence
β, z ∈ argminβ,z L(β, z, θ
)(7)
⇓
0n ∈ ∂βL(β, z, θ
)(8)
= ∂β
[1
2‖y − z‖22 + λ‖β‖1 + λθ>
(z −Xβ
)](9)
= λ∂β‖β‖1 − λX>θ (10)
⇓
x>j θ ∈ ∂βj |βj | =
{sign
(βj)
βj 6= 0
[−1, 1] βj = 0∀j ∈ [p] (11)
Key observation: |x>j θ| < 1 =⇒ βj = 0
5 / 18
Safe Test
|x>j θ| < 1 =⇒ βj = 0
Challenge: dual solution θ is unknown
Workaround: relaxation
Let C be a set containing θ and define µC (xj) := supθ∈C |x>j θ|.Obviously |x>j θ| ≤ µC (xj).
Safe Test
µC (xj) < 1 =⇒ βj = 0 (12)
The test is useful when
1 µC (xj) can be evaluated efficiently.
2 C is small—thus leading to small µC (xj).
Goal: Find C satisfying both conditions above.
6 / 18
Sphere Tests
Parametrize C as a closed `2-ball B (c, r) = {θ : ‖θ − c‖2 ≤ r}.
µC (xj) = µB(c,r)(xj) = supθ∈B(c,r)
|x>j θ| = |c>xj |+ r‖xj‖2 (13)
Q: How to find B(c, r) that contains θ without knowing θ?A: Any dual-feasible θ′ defines a ball over θ∥∥θ − y
λ︸︷︷︸c
∥∥2≤∥∥θ′ − y
λ
∥∥2︸ ︷︷ ︸
r
(14)
Recall: maxθ∈Rn12‖y‖
22 − λ2
2
∥∥θ − yλ
∥∥2
2s.t.
∥∥X>θ∥∥∞ ≤ 1︸ ︷︷ ︸
θ∈∆X
A trivial feasible point: θ′ = yλmax
where λmax := ‖X>y‖∞ 1.
=⇒ c =y
λ, r = ‖y‖
∣∣∣∣ 1λ − 1
λmax
∣∣∣∣ (15)
1In fact, θ′ obtained in this manner is the dual solution when λ = λmax,corresponding to all-zero primal solution.
7 / 18
Dynamic Safe Tests
To iteratively apply safe tests as the algorithm proceeds
Recall ∀θ′ ∈ ∆X defines an `2-ball containing θ
Let θk ∈ ∆X be a dual-feasible point at iteration k, {θk}k∈Ndefines a sequence of balls
{B( yλ ,∥∥θk − y
λ
∥∥2
)}k∈N
Each ball defines a safe test
How θk is defined via βk?
θk := Π∆X∩span(ρk)
( yλ
)where ρk := y −Xβk
Intuition1 Dual opt: θ = Π∆X
(yλ
)2 Primal-dual correspondence: θ ∈ span (ρ) where ρ := y −Xβ
8 / 18
Mind the Duality Gap
Can we better bound θ by also leveraging primal info?
∀ (β, θ) ∈ Rp ×∆X , we claim
R (β) ≤∥∥θ − y
λ
∥∥ ≤ R (θ) (16)
where
R (θ) = ‖θ − yλ‖ (dual optimality)
R (β) = 1λ
√(‖y‖2 − ‖y −Xβ‖2 − 2λ‖β‖2)+ (duality gap)
weak duality
1
2‖y‖2 − λ2
2
∥∥θ − y
λ
∥∥2 ≤ 1
2‖y −Xβ‖2 + λ‖β‖1 (17)
Therefore, θ lies in an annulus, i.e. A(yλ , R (θ) , R (β)
)9 / 18
Geometric Illustration
Recall R (β) ≤∥∥θ − y
λ
∥∥ ≤ R (θ)
1[Fercoq et al., 2015]
10 / 18
Fine-grained Analysis
Two geometrical observations
1
[θ, θ]⊆ A
(yλ , R (θ) , R (θ)
)Proof.
Convexity of polyhedron ∆X =⇒ convexity of ∆X ∩B( yλ , R (θ)
)=⇒ convexity of ∆X ∩A
(yλ , R (θ) , R (β)
)containing θ, θ
2 vecAngle(θ − θ, yλ − θ
)≥ 90◦
Proof.
∀θ′ ∈[θ, θ]
we have ‖ yλ − θ‖ ≤ ‖yλ − θ
′‖ as θ, θ′ ∈ ∆X and
θ = Π∆X
( yλ
). However, suppose vecAngle
(θ − θ, yλ − θ
)< 90◦,
contradiction occurs by setting θ′ := Π[θ,θ]( yλ
).
11 / 18
Geometric Illustration
Recall vecAngle(θ − θ, yλ − θ
)≥ 90◦
1[Fercoq et al., 2015]
12 / 18
Safe Tests Refined
Two convex relaxation schemes
Sphere Crelaxed = B(θ,
√R (θ)2 − R (β)2︸ ︷︷ ︸
r(θ,β)
)(18)
Dome Crelaxed = conv (darkBlueRegion) (19)
1[Fercoq et al., 2015]13 / 18
Convergence
Proposition
Let G (β, θ) be the LASSO duality gap, ∀ (β, θ) ∈ Rp ×∆X wehave r (β, θ)2 ≤ 2
λ2G (β, θ)
1
2‖y‖2 − λ2
2
∥∥θ − y
λ
∥∥2︸ ︷︷ ︸dual obj
+G(β, θ) =1
2‖y −Xβ‖2 + λ‖β‖1︸ ︷︷ ︸
primal obj
(20)
2
λ2G(β, θ) =
∥∥θ − y
λ
∥∥2︸ ︷︷ ︸=R(θ)2
− 1
λ2
(‖y‖2 − ‖y −Xβ‖2 − 2λ‖β‖1
)︸ ︷︷ ︸
≤R(β)2
(21)
≥r (β, θ)2 (22)
=⇒ limk→∞ r (βk, θk) = 0. Convergence of domes is also implied.
14 / 18
Experiment Results (a)
Proportion of active variables v.s. (1) num of iterations (2) λ
1[Fercoq et al., 2015]
15 / 18
Experiment Results (b)
Leukemiapn = 7129
72 ≈ 99.0
20NewsGrouppn = 10094
961 ≈ 10.5
RCV1pn = 47236
20242 ≈ 2.3
1[Fercoq et al., 2015]
16 / 18
Conclusion
Summary
LASSO - primal & dual
Safe rules - discard irrelevant variables prior to optimization
Refined safe rules based on duality gap
Additional Note
Safe rules can be applied to other models as well, e.g. supportvector machines [Ogawa et al., 2013]
17 / 18
Reference I
El Ghaoui, L., Viallon, V., and Rabbani, T. (2010).Safe feature elimination in sparse supervised learning technical report no.Technical report, UCB/EECS-2010-126, EECS Department, University ofCalifornia, Berkeley.
Fercoq, O., Gramfort, A., and Salmon, J. (2015).Mind the duality gap: safer rules for the lasso.arXiv preprint arXiv:1505.03410.
Ogawa, K., Suzuki, Y., and Takeuchi, I. (2013).Safe screening of non-support vectors in pathwise svm computation.In Proceedings of the 30th International Conference on Machine Learning, pages1382–1390.
Xiang, Z. J., Wang, Y., and Ramadge, P. J. (2014).Screening tests for lasso problems.arXiv preprint arXiv:1405.4897.
18 / 18
Recommended