
Steepest Descent Methods for Variable Selection

Julian Wolfson

593 Final Project

May 29, 2007


1 Motivation

2 Boosting for Beginners

3 L1 Penalization and Boosting - Separated at birth?

4 Beyond Boosting: TGDR

5 Applying TGDR

6 Extensions

7 Final Thoughts


The Penalty Box

Focus of this class: Solve

min_β L(β) + λ · P(β)

for some penalty function P(β)
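With squared-error loss and the L1 penalty P(β) = ||β||_1 this is the LASSO; the following is a minimal, illustrative sketch (not from the original slides) of how that special case can be solved by cyclic coordinate descent with soft-thresholding. The function names, the fixed number of sweeps, and the scaling of the objective are my own choices.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator: the one-dimensional L1-penalized solution."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Minimize (1/2n)||y - X beta||^2 + lam * ||beta||_1 by cyclic coordinate descent.

    Assumes no column of X is identically zero; convergence checks omitted for brevity.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n                # per-coordinate curvature ||x_j||^2 / n
    for _ in range(n_sweeps):
        for j in range(p):
            r_j = y - X @ beta + X[:, j] * beta[j]   # partial residual, excluding x_j's contribution
            rho = X[:, j] @ r_j / n                  # coordinate-wise least-squares term
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta
```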


The Penalty Box, cont’d

Good stuff (with right choice of penalty)

Does variable selection

Nice asymptotic results

Can be solved QUICKLY in simple situations (e.g., linear regression)

Software available ⇒ ubiquitous, many variants

Bad stuff

Constrained optimization more complicated for loss functions other than squared-error

Unclear whether these methods (particularly more complex variants) can be applied to real-life large problems

New loss functions/penalties tackled in a piecemeal fashion - seemingly a new “trick” required for every adaptation of LASSO


Alternatives?

Let’s restrict ourselves to the problem of obtaining good predictions only (don’t worry about interpretability)

How do we get good predictions for big problems?

Fundamental question for those working in the area of machine learning

Boosting is a popular technique:

Fast
Simple
General


Very quick intro to boosting

Boosting is an iterative technique for building an additive model

F_T(·) = ∑_{j=1}^{J} h_j(·) · β_j^(T)

We call H = {h_j, j = 1, . . . , J} a dictionary of candidate predictors (or weak learners)

β_j^(T) is the coefficient derived after T iterations of boosting

Define some loss function L with which to evaluate the predictions F_T(·) = Ŷ


Very quick intro to boosting, cont’d.

Super Simplified Boosting Algorithm

1 Set coefficient vector β = 0
2 For t = 1 : T,

1 Pick h_j ∈ H for which a change in h_j results in the greatest decrease in the loss function

2 Increment the coefficient associated with h_j by some (small) amount

Boosting = Steepest Descent in Predictor Space


Towards Interpretability

Coefficients of functions in the weak learner set H aren’t interpretable for general H

But if we take H to be {X_1, . . . , X_p}, the set of covariates, then the coefficients are interpretable in the standard way


Boosting in Covariate Space

Boosting algorithm simplifies to

1 Set β^(0) = 0
2 For t = 1 : T,

1 Identify j_t = arg max_j |∇L(β^(t−1))_j|

2 Set β_{j_t}^(t) = β_{j_t}^(t−1) − α_t · sign(∇L(β^(t−1))_{j_t})

Questions

1 How to choose the increment α_t?

2 When do we stop?
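A minimal numpy sketch of this covariate-space update for squared-error loss L(β) = (1/2n)||y − Xβ||², specializing the generic algorithm above; the function name, the fixed step size, and the returned coefficient path are my own illustrative choices.

```python
import numpy as np

def boost_covariate_space(X, y, step=0.01, n_steps=1000):
    """Steepest descent in covariate space: at each step, nudge the single
    coordinate with the largest gradient magnitude by a small fixed amount."""
    n, p = X.shape
    beta = np.zeros(p)
    path = []
    for _ in range(n_steps):
        grad = -X.T @ (y - X @ beta) / n        # gradient of (1/2n)||y - X beta||^2
        j = int(np.argmax(np.abs(grad)))        # steepest coordinate j_t
        beta[j] -= step * np.sign(grad[j])      # move against the gradient by a small epsilon
        path.append(beta.copy())
    return beta, np.asarray(path)               # path[t] = coefficients after t+1 steps
```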


Incrementing and Stopping

Options for αt

1 Exact line search (= Forward Selection... too greedy)

2 Inexact line search (≈ LARS in linear regression case, TGDR in general)

3 Small constant value (= Forward Stagewise Selection)

4 Etc

When to stop

1 Leave-one-out CV

2 k-fold CV

3 Etc

Beyond the scope of this talk
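Although stopping rules are beyond the scope of the talk, here is a sketch of the k-fold CV option for choosing the number of boosting steps T, reusing the boost_covariate_space sketch above; the fold assignment, error metric, and function name are my own illustrative choices.

```python
import numpy as np

def choose_T_by_kfold_cv(X, y, k=5, step=0.01, n_steps=1000, seed=0):
    """Pick the stopping iteration T minimizing average held-out squared error."""
    n = X.shape[0]
    folds = np.random.default_rng(seed).integers(0, k, size=n)   # random fold labels in {0,...,k-1}
    cv_err = np.zeros(n_steps)
    for fold in range(k):
        train, test = folds != fold, folds == fold
        _, path = boost_covariate_space(X[train], y[train], step, n_steps)
        preds = X[test] @ path.T                                 # (n_test, n_steps) predictions
        cv_err += ((y[test][:, None] - preds) ** 2).mean(axis=0) / k
    best_T = int(np.argmin(cv_err)) + 1                          # path index t holds step t+1
    return best_T, cv_err
```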


Boosting and L1 - separated at birth?

Some empirical evidence:

Coincidence? What is the connection between boosting and L1-penalized methods?


A theorem (Rosset, Zhu, Hastie, 2004)

Theorem 1

Consider applying the boosting algorithm with α_t = ε to any convex loss function, generating a path of solutions β^(ε)(t). Then if the L1-penalized coefficient paths are monotone for all c < c_0, i.e. if for all j, |β̂(c)_j| is non-decreasing in the range c < c_0, then

lim_{ε→0} β^(ε)(c_0/ε) = β̂(c_0)

where β̂(c_0) is the LASSO solution, i.e.

β̂(c_0) = arg min_β L(β) + c_0 ∑_j |β_j|


Some intuition

Consider the problem

min L(β)
s.t. ||β||_1 − ||β_0||_1 ≤ ε
     |β| ≥ |β_0|, component-wise

Expand L(β) about β_0:

L(β) = L(β_0) + ∇L(β_0)(β − β_0) + O(ε²)

L(β) is seen to be optimized as ε → 0 by updating the element of β for which |∇L(β_0)_j| is maximal, provided sign(β_{0,j}) = −sign(∇L(β_0)_j).

Boosting solves the local L1-constrained problems it encounters along the way
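Spelling out the step from the expansion to the single-coordinate update (my own elaboration, ignoring the component-wise sign restriction for simplicity): write β = β_0 + δ with an L1 budget of ε on δ; the linearized problem is then solved by spending the entire budget on the largest gradient component.

```latex
% Illustrative expansion of the slide's argument (not from the original deck).
\[
  L(\beta_0 + \delta) \;\approx\; L(\beta_0) + \sum_{j=1}^{p} \nabla L(\beta_0)_j \,\delta_j ,
  \qquad \text{subject to } \sum_{j=1}^{p} |\delta_j| \le \varepsilon .
\]
% Minimizing the linear term over the L1 ball concentrates the budget on one coordinate:
\[
  \min_{\|\delta\|_1 \le \varepsilon} \sum_{j} \nabla L(\beta_0)_j \,\delta_j
  \;=\; -\,\varepsilon \max_{j} |\nabla L(\beta_0)_j| ,
  \qquad \delta_{j^*} = -\varepsilon \,\mathrm{sign}\big(\nabla L(\beta_0)_{j^*}\big),
  \quad j^* = \arg\max_{j} |\nabla L(\beta_0)_j| ,
\]
% which is exactly the epsilon-boosting update of the previous slides.
```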


Beyond Boosting

β_{j_t}^(t) = β_{j_t}^(t−1) − α_t · sign(∇L(β^(t−1))_{j_t})

An Observation

Boosting only updates one element of the coefficient vector at each iteration - could we do better by updating multiple elements at once?


Enter TGDR

TGDR: Threshold Gradient Descent Regularization

Suggested by Friedman and Popescu (2004)

Motivated by boosting and gradient descent methods

Allows multiple directions to be updated in each iteration

Early stopping provides regularization


Tweaking the update rule

ε-boosting:

β_{j_t}^(t) = β_{j_t}^(t−1) − ε · sign(∇L(β^(t−1))_{j_t})

TGDR:

β^(t) = β^(t−1) − ε · f(β^(t−1)) · ∇L(β^(t−1))   (component-wise product)

where

f_j(·) = 1[ |∇L(·)_j| ≥ τ · max_{k=1,...,p} |∇L(·)_k| ]
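A minimal numpy sketch of this thresholded update, written against a generic gradient function; the function name, default step size, and iteration count are my own illustrative choices.

```python
import numpy as np

def tgdr(grad, p, step=0.01, tau=0.9, n_steps=500):
    """Threshold Gradient Descent Regularization (sketch).

    grad : callable returning the gradient of the loss at beta (length-p array)
    tau  : threshold in [0, 1]; tau near 1 moves only the steepest coordinates
           (sparse, boosting-like), tau = 0 recovers plain gradient descent.
    """
    beta = np.zeros(p)
    for _ in range(n_steps):
        g = grad(beta)
        f = (np.abs(g) >= tau * np.max(np.abs(g))).astype(float)   # threshold indicator f_j
        beta = beta - step * f * g                                  # update only the selected coordinates
    return beta
```

With the squared-error gradient from the earlier sketch, for example, tgdr(lambda b: -X.T @ (y - X @ b) / X.shape[0], X.shape[1]) traces out a regularized fit; early stopping (a small n_steps) supplies the regularization.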


Thresholding

(Figures: the components of the gradient ∇L(β), shown against the cutoff τ · max_k |∇L(β)_k| that determines which coordinates are updated)

f_j(β) = 1[ |∇L(β)_j| ≥ τ · max_{k=1,...,p} |∇L(β)_k| ]


An example

Gui and Li (2005) extended TGDR for Cox regression

Use partial likelihood loss: L = −ℓ_p(β; X)

I adapted TGDR to handle time-varying covariates
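For concreteness, a sketch (my own, not the authors' code) of the descent direction for this loss, i.e. the gradient of the negative log partial likelihood, written in the Breslow form with no special handling of ties or of time-varying covariates; this is the gradient one would hand to the thresholded update above.

```python
import numpy as np

def cox_neg_loglik_grad(beta, X, time, event):
    """Gradient of L(beta) = -log partial likelihood for Cox regression.

    X : (n, p) covariates;  time : (n,) follow-up times;  event : (n,) 1 = failure, 0 = censored.
    No tie correction; scaled by 1/n (only the direction matters for TGDR).
    """
    n, p = X.shape
    w = np.exp(X @ beta)                                  # relative risks exp(x_i' beta)
    grad = np.zeros(p)
    for i in np.where(event == 1)[0]:
        at_risk = time >= time[i]                         # risk set at the i-th failure time
        xbar = (w[at_risk][:, None] * X[at_risk]).sum(axis=0) / w[at_risk].sum()
        grad += xbar - X[i]                               # -(x_i - risk-set weighted mean)
    return grad / n
```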


Application: ACTG 398

Relevant Data

≈ 490 HIV-infected patients

Current drug regimen

HIV protein sequences (300 AAs) collected post-infection for approximately two years

Endpoint of Interest

(T ,C ), where

T is the time until a patient “fails” a drug regimen

C is the censoring indicator

Question

Which amino acid positions on HIV (mutations, insertions, deletions) are associated with time until drug regimen failure?


Results: ACTG 398 Data

Run TGDR on 60% of data (training set) for a range of values of τ ...

Estimated coefficients from training set (columns: K70R, L74V, K103N, V108I, V118I, K122E, D123E, Y181C, M184V, G190A; each row lists, in column order, the coefficients of the variables selected at that τ)

τ = 0.5:  0.134, 0.258, 0.134, −0.164, 0.131
τ = 0.55: 0.115, 0.421, 0.096, 0.092, 0.117, −0.255, 0.128
τ = 0.6:  0.115, 0.421, 0.117, −0.164, 0.128
τ = 0.65: 0.118, 0.434, 0.125, −0.143, 0.128
τ = 0.7:  0.092, 0.535, 0.086, 0.088, 0.207, −0.143, 0.229
τ = 0.75: 0.105, 0.542, 0.078, −0.080, 0.085, 0.075, 0.184, −0.143, 0.221
τ = 0.8:  0.434, −0.143
τ = 0.85: −0.063, 0.087, 0.554, 0.143, −0.082, 0.088, 0.142, 0.119, −0.201, 0.368
τ = 0.9:  −0.069, 0.083, 0.554, 0.147, −0.082, 0.087, 0.079, 0.119, −0.202, 0.310
τ = 0.95: −0.062, 0.145, 0.541, 0.206, −0.207, 0.147, 0.141, 0.105, −0.204, 0.380
τ = 0.96: −0.062, 0.092, 0.541, 0.206, −0.148, 0.144, 0.141, 0.094, −0.203, 0.387
τ = 0.97: −0.066, 0.098, 0.535, 0.208, −0.149, 0.082, 0.143, 0.087, −0.204, 0.386
τ = 0.98: −0.066, 0.092, 0.535, 0.146, −0.149, 0.084, 0.143, 0.094, −0.205, 0.381
τ = 0.99: −0.066, 0.086, 0.535, 0.147, −0.150, 0.087, 0.143, 0.094, −0.205, 0.380


Results (cont’d)

Get η = Xβ̂ from the test set (40% of data)

HR = Hazard ratio comparing the group with η ≥ 0 (“high risk”) to η < 0 (“low risk”)

τ      HR      95% CI
0.5    2.258   (1.438, 3.546)
0.55   2.360   (1.499, 3.716)
0.6    2.025   (1.290, 3.178)
0.65   2.025   (1.290, 3.178)
0.7    2.384   (1.492, 3.810)
0.75   2.349   (1.476, 3.739)
0.8    2.054   (1.311, 3.217)
0.85   2.441   (1.549, 3.846)
0.9    2.475   (1.571, 3.900)
0.95   2.429   (1.537, 3.837)
0.96   2.429   (1.537, 3.837)
0.97   2.463   (1.558, 3.893)
0.98   2.463   (1.558, 3.893)
0.99   2.463   (1.558, 3.893)


Loss functions and descent directions

For log-likelihood (or log partial likelihood) loss, the descent direction is just ℓ_β ≡ ℓ̇, the score function.

Extensive literature on modified/adapted/approximate/quasi score functions which allow for:

Missing data
Measurement error
Heteroskedasticity
. . .

Idea

Estimating equations propose a “descent direction” in some sense - applying TGDR to these descent directions could allow us to do variable selection whenever estimating equations are available (even if closed-form likelihoods aren’t available).


Will this work?

Simplest way of solving estimating equations: iterative substitution

Consider estimating equation g_n(β) = 0

If g_n(β) is asymptotically unbiased, then solutions β̂ will be such that

g_n(β̂) ≈ 0

In the neighbourhood of a solution, write

β̂ ≈ β̂ + g_n(β̂)

Suggests the iteration

β^(t) = β^(t−1) + ε · g_n(β^(t−1))

Add thresholding to get TGDR iteration
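A short sketch of that thresholded iterative-substitution step for a generic estimating function g_n, mirroring the earlier TGDR sketch with g_n in place of the negative gradient; the function name and defaults are my own.

```python
import numpy as np

def tgdr_estimating_equation(g, p, step=0.01, tau=0.9, n_steps=500):
    """Thresholded iterative substitution for an estimating equation g(beta) = 0.

    g : callable returning the estimating function at beta (length-p array).
    Only coordinates whose |g| is within a fraction tau of the largest are moved.
    """
    beta = np.zeros(p)
    for _ in range(n_steps):
        val = g(beta)
        f = (np.abs(val) >= tau * np.max(np.abs(val))).astype(float)   # threshold indicator
        beta = beta + step * f * val                                    # beta <- beta + eps * g(beta)
    return beta
```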


Asymptotics

Some things I’d like to show:

Conjecture 1: Knight/Fu consistency

For suitably well-behaved loss functions L / descent directions ℓ̇, the TGDR estimate converges to arg min_β L(β), or to the solution of ℓ̇ = 0, as the number of iterations → ∞.

Proof sketch?

Apply results of Bickel, Ritov, Zakai (2006), who show consistency for a very general class of boosting methods


Asymptotics, cont’d.

Conjecture 2: Greenshtein/Ritov persistency

For suitably well-behaved loss functions L / descent directions ℓ̇, the TGDR estimates are persistent.

Proof sketch?

Exploit relationship between L1-penalization (shown to be persistent) and boosting (similar to TGDR)

Ideas are welcome!


In Conclusion

TGDR is...

Variable selection based on thresholded gradient descent

Beautifully simple

Computationally tractable

Easy to extend to more complex data structures

But TGDR is not...

Popular (yet)

Particularly amenable to inference (confidence intervals?)

Well studied from a theoretical perspective:

When does it work?
How well does it work?
How does it compare to competing methods?


Acknowledgements

Prof. Peter Gilbert (thesis supervisor)

Prof. Victor DeGruttola (for providing ACTG data)

Thanks!

Questions?
