Screening Tests for the LASSOhanxiaol/slides/screening.pdf · Safe screening of non-support vectors...

Screening Tests for the LASSO

Hanxiao Liuhanxiaol@cs.cmu.edu

November 3, 2015

1 / 18

Outline

Primal & Dual formPrimal-dual correspondence

Safe Test for LASSO

Static case (example: sphere test)Dynamic case

Better safe test based on duality gap

Geometric illustrationConvergence

Empirical Results

2 / 18

X ∈ Rn×p where p� n

minβ∈Rp

2‖Xβ − y‖22 + λ‖β‖1 (1)

Commonly used forhigh-dimensional feature selection

Today’s topic—screening tests

Rules to early discard irrelevant features prior to starting aLASSO solver without affecting the final opt solution.

The “chicken-and-egg problem”?

3 / 18

LASSO Dual

Primal: minβ∈Rp,z∈Rn

2‖y − z‖22 + λ‖β‖1 s.t. z = Xβ (2)

Dual: maxu∈Rn

minβ,z

2‖y − z‖22 + λ‖β‖1 + u> (z −Xβ)︸︷︷︸

Lagrangian L(β,z,u)︸︷︷︸Dual Objective g(u)

=⇒ maxu

2‖y − z‖22 + u>z

)− λmax

λβ − ‖β‖1

=⇒ maxu

2‖y‖22 −

2‖u− y‖22 − λI{∥∥∥X>u

∥∥∥∞≤1} (5)

θ=uλ====⇒ max

θ∈Rn1

2‖y‖22 −

∥∥∥θ − y

∥∥∥2

∥∥X>θ∥∥∞ ≤ 1 (6)

4 / 18

Karush-Khun-Tucker Condition

Primal-dual correspondence

β, z ∈ argminβ,z L(β, z, θ

0n ∈ ∂βL(β, z, θ

= ∂β

2‖y − z‖22 + λ‖β‖1 + λθ>

(z −Xβ

= λ∂β‖β‖1 − λX>θ (10)

x>j θ ∈ ∂βj |βj | =

βj 6= 0

[−1, 1] βj = 0∀j ∈ [p] (11)

Key observation: |x>j θ| < 1 =⇒ βj = 0

5 / 18

Safe Test

|x>j θ| < 1 =⇒ βj = 0

Challenge: dual solution θ is unknown

Workaround: relaxation

Let C be a set containing θ and define µC (xj) := supθ∈C |x>j θ|.Obviously |x>j θ| ≤ µC (xj).

Safe Test

µC (xj) < 1 =⇒ βj = 0 (12)

The test is useful when

1 µC (xj) can be evaluated efficiently.

2 C is small—thus leading to small µC (xj).

Goal: Find C satisfying both conditions above.

6 / 18

Sphere Tests

Parametrize C as a closed `2-ball B (c, r) = {θ : ‖θ − c‖2 ≤ r}.

µC (xj) = µB(c,r)(xj) = supθ∈B(c,r)

|x>j θ| = |c>xj |+ r‖xj‖2 (13)

Q: How to find B(c, r) that contains θ without knowing θ?A: Any dual-feasible θ′ defines a ball over θ∥∥θ − y

λ︸︷︷︸c

∥∥2≤∥∥θ′ − y

∥∥2︸︷︷︸

Recall: maxθ∈Rn12‖y‖

22 − λ2

∥∥θ − yλ

∥∥2

∥∥X>θ∥∥∞ ≤ 1︸︷︷︸

θ∈∆X

A trivial feasible point: θ′ = yλmax

where λmax := ‖X>y‖∞ 1.

=⇒ c =y

λ, r = ‖y‖

∣∣∣∣ 1λ − 1

∣∣∣∣ (15)

1In fact, θ′ obtained in this manner is the dual solution when λ = λmax,corresponding to all-zero primal solution.

7 / 18

Dynamic Safe Tests

To iteratively apply safe tests as the algorithm proceeds

Recall ∀θ′ ∈ ∆X defines an `2-ball containing θ

Let θk ∈ ∆X be a dual-feasible point at iteration k, {θk}k∈Ndefines a sequence of balls

{B( yλ ,∥∥θk − y

∥∥2

)}k∈N

Each ball defines a safe test

How θk is defined via βk?

θk := Π∆X∩span(ρk)

)where ρk := y −Xβk

Intuition1 Dual opt: θ = Π∆X

)2 Primal-dual correspondence: θ ∈ span (ρ) where ρ := y −Xβ

8 / 18

Mind the Duality Gap

Can we better bound θ by also leveraging primal info?

∀ (β, θ) ∈ Rp ×∆X , we claim

R (β) ≤∥∥θ − y

∥∥ ≤ R (θ) (16)

R (θ) = ‖θ − yλ‖ (dual optimality)

R (β) = 1λ

√(‖y‖2 − ‖y −Xβ‖2 − 2λ‖β‖2)+ (duality gap)

weak duality

2‖y‖2 − λ2

∥∥θ − y

∥∥2 ≤ 1

2‖y −Xβ‖2 + λ‖β‖1 (17)

Therefore, θ lies in an annulus, i.e. A(yλ , R (θ) , R (β)

)9 / 18

Geometric Illustration

Recall R (β) ≤∥∥θ − y

∥∥ ≤ R (θ)

1[Fercoq et al., 2015]

10 / 18

Fine-grained Analysis

Two geometrical observations

[θ, θ]⊆ A

(yλ , R (θ) , R (θ)

)Proof.

Convexity of polyhedron ∆X =⇒ convexity of ∆X ∩B( yλ , R (θ)

)=⇒ convexity of ∆X ∩A

(yλ , R (θ) , R (β)

)containing θ, θ

2 vecAngle(θ − θ, yλ − θ

)≥ 90◦

Proof.

∀θ′ ∈[θ, θ]

we have ‖ yλ − θ‖ ≤ ‖yλ − θ

′‖ as θ, θ′ ∈ ∆X and

θ = Π∆X

). However, suppose vecAngle

(θ − θ, yλ − θ

)< 90◦,

contradiction occurs by setting θ′ := Π[θ,θ]( yλ

11 / 18

Geometric Illustration

Recall vecAngle(θ − θ, yλ − θ

)≥ 90◦

12 / 18

Safe Tests Refined

Two convex relaxation schemes

Sphere Crelaxed = B(θ,

√R (θ)2 − R (β)2︸︷︷︸

r(θ,β)

Dome Crelaxed = conv (darkBlueRegion) (19)

1[Fercoq et al., 2015]13 / 18

Convergence

Proposition

Let G (β, θ) be the LASSO duality gap, ∀ (β, θ) ∈ Rp ×∆X wehave r (β, θ)2 ≤ 2

λ2G (β, θ)

2‖y‖2 − λ2

∥∥θ − y

∥∥2︸︷︷︸dual obj

+G(β, θ) =1

2‖y −Xβ‖2 + λ‖β‖1︸︷︷︸

primal obj

λ2G(β, θ) =

∥∥θ − y

∥∥2︸︷︷︸=R(θ)2

(‖y‖2 − ‖y −Xβ‖2 − 2λ‖β‖1

)︸︷︷︸

≤R(β)2

≥r (β, θ)2 (22)

=⇒ limk→∞ r (βk, θk) = 0. Convergence of domes is also implied.

14 / 18

Experiment Results (a)

Proportion of active variables v.s. (1) num of iterations (2) λ

15 / 18

Experiment Results (b)

Leukemiapn = 7129

72 ≈ 99.0

20NewsGrouppn = 10094

961 ≈ 10.5

RCV1pn = 47236

20242 ≈ 2.3

16 / 18

Conclusion

Summary

LASSO - primal & dual

Safe rules - discard irrelevant variables prior to optimization

Refined safe rules based on duality gap

Additional Note

Safe rules can be applied to other models as well, e.g. supportvector machines [Ogawa et al., 2013]

17 / 18

Reference I

El Ghaoui, L., Viallon, V., and Rabbani, T. (2010).Safe feature elimination in sparse supervised learning technical report no.Technical report, UCB/EECS-2010-126, EECS Department, University ofCalifornia, Berkeley.

Fercoq, O., Gramfort, A., and Salmon, J. (2015).Mind the duality gap: safer rules for the lasso.arXiv preprint arXiv:1505.03410.

Ogawa, K., Suzuki, Y., and Takeuchi, I. (2013).Safe screening of non-support vectors in pathwise svm computation.In Proceedings of the 30th International Conference on Machine Learning, pages1382–1390.

Xiang, Z. J., Wang, Y., and Ramadge, P. J. (2014).Screening tests for lasso problems.arXiv preprint arXiv:1405.4897.

18 / 18

Screening Tests for the LASSOhanxiaol/slides/screening.pdf · Safe screening of non-support vectors...

Documents

PROTOCOLLO DI SCREENING Valutazione …glipontedilegno.altervista.org/docs/protocollo-screening.pdf · Ha spesso problemi di “disnomia”, cioè “non gli vengono le parole”

IN VITRO BLAST SCREENING TECHNIQUE FOR GENETIC STUDIES …ejtafs.mardi.gov.my/jtafs/14-1/Blast screening.pdf · IN VITRO BLAST SCREENING TECHNIQUE FOR GENETIC STUDIES OF RICE B.H

Pathwise coordinate optimization - Stanford University

Pathwise analysis and robustness of hedging strategies for path … · 2019-01-10 · Motivation Pathwise Functional Calculus Pathwise Continuous-Time Trading Pathwise analysis of

Pathwise Framework induction program: beginning teacher ...€¦ · PATHWISE FRAMEWORK INDUCTION PROGRAM: BEGINNING TEACHER OVERVIEW . ... schools after their first year, ... •

Breast Cancer and Breast Screening - easyhealth.org.ukeasyhealth.org.uk/sites/default/files/null/Breast Cancer & Breast Screening.pdf · Breast Cancer and Breast Screening. Breast

The Pathwise MIMO Interfering Broadcast Channel · The Pathwise MIMO Interfering Broadcast Channel Wassim Tabikh,Dirk Slock& Yi-Yuan-Wu# Mobile Communications Department, EURECOM

On the Pathwise Large Deviations of Stochastic Di erential ...doras.dcu.ie/14870/1/thesis_huizhong_wu2009.pdf · On the Pathwise Large Deviations of Stochastic Di erential and Functional

dysphagya screening.pdf

Robust trading strategies, pathwise It^o calculus, and ... · Functional and pathwise extension of the Fernholz{Karatzas stochastic portfolio theory (A.S., Speiser & Voloshchenko

2019 Annual Report - Pathwise

Screening methods for salinity tolerance: a case study ...plantstress.com/methods/Salinity screening.pdf · Screening methods for salinity tolerance: ... chlorophyll (using a hand-held

PATHWISE STOCHASTIC INTEGRATION AND APPLICATIONS TO … · 256 W. Willinger, M.S. Taqqu / Pathwise stochastic integration more satisfying object such as a stochastic process. The

Chapter 4: Screening - Kathmandu University chapter 4 screening.pdf · Chapter 4: Screening Environmental Screening, Scoping and Terms of Reference 9/16/2011 ... 11. The completion

VALUTAZIONE E SCREENING DEI DISTURBI EMOTIVI, ATTENTIVI … corso screening.pdf · emotivi, attentivi e comportamentali •Deficit motori che si correlano a problemi ... Senza Agorafobia

Travel and Communicable Disease Screeningspice.unc.edu/.../2019/10/...Disease-Screening.pdf · Do you have a method for screening patients who may have an infectious disease? 10/25/2019

Purely pathwise probability-free It^o integralcall \purely pathwise", let us consider two examples in which pathwise de ni-tions are in fact purely pathwise but very restrictive. Example

GVO - Screening qualitativ 2015/Auswertung DLA 24-2015 GMO-Screening.pdf · August 2015 DLA – 24/2015 – GVO-Screening qualitativ DLA Dienstleistung Lebensmittel Analytik GbR Auswertungs-Bericht

A Fourier approach to pathwise stochastic integration › ~perkowsk › files › ciesielski.pdf · A Fourier approach to pathwise stochastic integration Massimiliano Gubinelli CEREMADE

Screening and Assessment instruments May 2008 TO PDFpdfs/pubs/screening.pdf · In recent years, there has been a growing emphasis on the mental health and social and behavioral developmental