
Greedy Column Subset Selection: New Bounds and Distributed Algorithms

Jason Altschuler

Joint work with Aditya Bhaskara, Thomas Fu, Vahab Mirrokni, Afshin Rostamizadeh, and Morteza Zadimoghaddam

ICML 2016

Talk Outline

1. Background/motivation for Column Subset Selection (CSS)

2. Previous work + our contributions

3. (Single-machine) greedy algorithm

4. (Distributed) coreset greedy algorithm

5. Further optimizations

6. Experiments

7. [Time permitting] Proof sketches


Low-Rank Approximation

Given (large) matrix A in R^{m×n} and target rank k << m, n:

• Optimal solution: k-rank SVD
• Applications:
  • Dimensionality reduction
  • Signal denoising
  • Compression
  • ...
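As a reference point for the "k-rank SVD" bullet, here is a minimal NumPy sketch of the truncated SVD. It assumes the error is measured in the Frobenius norm, and it checks the Eckart-Young identity: the squared error of the best rank-k approximation equals the sum of the discarded squared singular values. The helper name `best_rank_k` and the random test matrix are illustrative assumptions, not part of the talk.

```python
import numpy as np

def best_rank_k(A, k):
    """Rank-k truncated SVD of A (optimal in Frobenius norm by Eckart-Young)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the k leading singular triplets.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 30))   # toy stand-in for a large m x n matrix
k = 5

A_k = best_rank_k(A, k)
err = np.linalg.norm(A - A_k, "fro") ** 2

# The squared error should match the tail of the squared singular values.
s = np.linalg.svd(A, compute_uv=False)
tail = np.sum(s[k:] ** 2)
print(err, tail)
```

This value is the floor that any column-subset method with k columns is compared against, since a column subset can never beat the unconstrained rank-k optimum.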

Column Subset Selection (CSS)

• Columns often have important meaning
• CSS: Low-rank matrix approximation in column space of A

[Figure: the m × n matrix A and A[S], the m × k matrix formed by k selected columns of A.]

Why use CSS for dimensionality reduction?

• Unsupervised
  • Don’t need labeled data

• Classifier independent
  • Can reuse output for different classifiers

• Interpretable
  • Generate features by subselecting instead of arbitrary function

• Efficient during inference
  • Feature subselection (CSS) better than matrix multiplication (SVD) if:
    • Latency sensitive
    • SVD projection matrix prohibitively large
    • Sparse
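To make the inference-time point concrete, the sketch below contrasts the two reductions on a toy matrix whose rows are examples and whose columns are features. The index set `S` is a hypothetical selection (chosen at random here, not by any algorithm from the talk), and all sizes are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_features, k = 200, 1000, 10
X_train = rng.standard_normal((n_train, n_features))
x_new = rng.standard_normal((1, n_features))      # one example at inference time

# CSS-style reduction: keep k of the original features.
# S is a hypothetical selected index set, used only for illustration.
S = np.sort(rng.choice(n_features, size=k, replace=False))
x_css = x_new[:, S]            # pure indexing: k reads, nothing extra to store

# SVD-style reduction: multiply by a dense n_features x k projection matrix.
_, _, Vt = np.linalg.svd(X_train, full_matrices=False)
V_k = Vt[:k].T                 # must be stored and shipped with the model
x_svd = x_new @ V_k            # n_features * k multiply-adds per example

print(x_css.shape, x_svd.shape, V_k.size)
```

Both outputs have k coordinates, but only the SVD path needs the dense projection matrix, and subselection keeps a sparse input sparse.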


(Very simplified) background on CSS

• CSS is UG-hard [Civril 2014]

• Importance sampling [Drineas et al. 2004, Frieze et al. 2004, …]
  • Fast, but additive-error bounds

• More complicated algorithms [Deshpande et al. 2006, Drineas et al. 2006, Boutsidis et al. 2009, Boutsidis et al. 2011, Cohen et al. 2015, …]
  • Multiplicative-error bounds, but complicated → not as fast/distributable


• Greedy [Farahat et al. 2011, Civril et al. 2011, Boutsidis et al. 2015]
  • Multiplicative-error bounds and fast/distributable


Contributions

• Prove tight approximation guarantee for the greedy algorithm

• First distributed implementation with provable approximation factors

• Further optimizations for the greedy algorithm

• Empirical results showing these algorithms are extremely scalable and have accuracy comparable with the state-of-the-art

Generalized Column Subset Selection (GCSS)

CSS(A,k)

GCSS(A,B,k)

• GCSS(A,B,k) uses k columns of B to approximate A

• Note: GCSS(A,A,k) = CSS(A,k)
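The slide's formulas for the two objectives did not survive extraction. As a hedged illustration, the sketch below assumes the usual cost: the squared Frobenius error left after projecting A onto the span of the chosen columns of B. The function names `gcss_cost` and `css_cost` and the toy data are assumptions made for this example.

```python
import numpy as np

def gcss_cost(A, B, S):
    """Assumed GCSS cost: ||A - proj_{span(B[:, S])} A||_F^2."""
    B_S = B[:, list(S)]
    # Least squares finds the best coefficients X for A ~ B_S @ X,
    # so B_S @ X is the orthogonal projection of A onto span(B_S).
    X, *_ = np.linalg.lstsq(B_S, A, rcond=None)
    return np.linalg.norm(A - B_S @ X, "fro") ** 2

def css_cost(A, S):
    """CSS as the special case where the candidate columns are A's own."""
    return gcss_cost(A, A, S)

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 8))
B = rng.standard_normal((20, 12))
S = [0, 3, 5]

cost_gcss = gcss_cost(A, B, S)   # approximate A with 3 columns of B
cost_css = css_cost(A, S)        # approximate A with 3 of its own columns
print(cost_gcss, cost_css)
```

Under this definition the identity GCSS(A,A,k) = CSS(A,k) holds by construction, and the columns of A indexed by S are reconstructed with zero error in the CSS case.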

Convenient reformulation of GCSS

• Denote by f(S) […]; original GCSS cost function […]

• GCSS is equivalent to maximizing f subject to a cardinality constraint
• Intuition: f measures how much of A is “covered/explained” by selected columns
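The definition of f was lost in extraction. Assuming the GCSS cost is the squared Frobenius projection error, one consistent way to write the reformulation is the following, where Π_S (notation assumed here) denotes orthogonal projection onto the span of the selected columns B[S]:

```latex
\begin{align*}
f(S) &= \lVert \Pi_S A \rVert_F^2, \\
\lVert A - \Pi_S A \rVert_F^2 &= \lVert A \rVert_F^2 - f(S)
  \qquad \text{(Pythagoras, since } \Pi_S A \perp A - \Pi_S A \text{)}, \\
\operatorname*{arg\,min}_{|S| \le k} \lVert A - \Pi_S A \rVert_F^2
  &= \operatorname*{arg\,max}_{|S| \le k} f(S).
\end{align*}
```

Because the norm of A does not depend on S, minimizing the residual and maximizing the "explained" part f pick out the same column sets.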


GREEDY algorithm to maximize f
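The algorithm listing on this slide was not preserved. A minimal sketch, assuming GREEDY repeatedly adds the column of B with the largest marginal gain in f, and assuming f(S) is the squared Frobenius norm of the projection of A onto the selected columns; the function names and toy data are illustrative:

```python
import numpy as np

def f(A, B, S):
    """Assumed objective: ||proj_{span(B[:, S])} A||_F^2 (0 for the empty set)."""
    if not S:
        return 0.0
    B_S = B[:, S]
    X, *_ = np.linalg.lstsq(B_S, A, rcond=None)
    return np.linalg.norm(B_S @ X, "fro") ** 2

def greedy(A, B, r):
    """Select r column indices of B, one at a time, by largest marginal gain."""
    S = []
    for _ in range(r):
        candidates = [j for j in range(B.shape[1]) if j not in S]
        # Maximizing f(S + [j]) is the same as maximizing the marginal gain,
        # because f(S) is fixed within an iteration.
        best = max(candidates, key=lambda j: f(A, B, S + [j]))
        S.append(best)
    return S

rng = np.random.default_rng(0)
A = rng.standard_normal((15, 10))
B = rng.standard_normal((15, 25))
S = greedy(A, B, 4)
print(S, f(A, B, S))
```

This naive version recomputes a least-squares solve for every candidate in every iteration, which is what the later optimizations aim to avoid.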

Our result: Analysis of GREEDY

• We expect vectors in […] to be well-conditioned (think “almost orthogonal”), so […] is small

• If […] is bounded by a constant, then only need […] columns

• Significant improvement upon current bounds, which depend on the worst singular value of any k columns


DISTGREEDY: GCSS(A,B,k) with L machines

[Figure: the columns of B are split across Machine 1, Machine 2, …, Machine L, whose outputs go to a designated machine.]

DISTGREEDY: first observations

• Easy/natural to implement in MapReduce

• 2-pass streaming algorithm in random arrival model for columns

• Can also do multiple rounds/epochs. Good for:
  • Massive datasets
  • Getting better approximations (next slide)
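The two-round structure can be sketched on a single process. This assumes a random partition of the columns of B across L machines, a local greedy run on each part, and a final greedy run on the union of the local selections at the designated machine. The number of columns each machine keeps (k here) is an assumption, as are the function names; the "machines" are simulated by a plain loop.

```python
import numpy as np

def f(A, B, S):
    """Assumed objective: ||proj_{span(B[:, S])} A||_F^2."""
    if not S:
        return 0.0
    B_S = B[:, S]
    X, *_ = np.linalg.lstsq(B_S, A, rcond=None)
    return np.linalg.norm(B_S @ X, "fro") ** 2

def greedy(A, B, r):
    """Naive greedy: add the column with the largest resulting f, r times."""
    S = []
    for _ in range(r):
        candidates = [j for j in range(B.shape[1]) if j not in S]
        S.append(max(candidates, key=lambda j: f(A, B, S + [j])))
    return S

def dist_greedy(A, B, k, L, seed=0):
    """Simulated two-round coreset greedy over L 'machines'."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(B.shape[1]), L)  # random partition
    coreset = []
    for part in parts:                        # round 1: one iteration per machine
        local = greedy(A, B[:, part], min(k, len(part)))
        coreset.extend(part[local].tolist())  # map local indices back to B
    # Round 2: the designated machine runs greedy on the union of selections.
    final = greedy(A, B[:, coreset], min(k, len(coreset)))
    return [coreset[j] for j in final]

rng = np.random.default_rng(1)
A = rng.standard_normal((15, 10))
B = rng.standard_normal((15, 40))
k, L = 3, 4
S = dist_greedy(A, B, k, L)
print(S, f(A, B, S))
```

Round 1 is an embarrassingly parallel map over parts and round 2 is a single reduce, which is why the pattern fits MapReduce naturally.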

Our results: Analysis of DISTGREEDY

Consider an instance GCSS(A,B,k)


Scalable Implementation: GREEDY++

4 optimizations that preserve our approximation for […]

1. JL Lemma [Johnson & Lindenstrauss 1982, Sarlos 2006]: randomly project to […] rows while still preserving k-linear combos

2. Projection-Cost Preserving Sketches [Cohen et al. 2015]: sketch A with […] columns.

3. “Stochastic Greedy” [Mirzasoleiman et al. 2015]: each iteration only uses […] marginal utility calls instead of […]

4. Updating A every iteration [Farahat et al. 2013]: after each iteration, remove projections of A and B onto selected column. Reduces complexity of marginal utility from […]
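Optimization 4 can be illustrated with a short sketch. After each pick, the selected direction is projected out of both A and B, so the marginal gain of a candidate column b reduces to ||A_res^T b_res||^2 / ||b_res||^2 on the residuals, with no least-squares solve per candidate. This is a minimal version written for this note under the assumption that f is the squared Frobenius norm of the projection; it is not the authors' GREEDY++ code, and the name `greedy_with_updates` and the tolerance `eps` are hypothetical.

```python
import numpy as np

def greedy_with_updates(A, B, r, eps=1e-12):
    """Greedy selection that deflates A and B after every pick."""
    A_res = np.array(A, dtype=float)   # residual of A (copy, so A is untouched)
    B_res = np.array(B, dtype=float)   # residual of each candidate column
    S = []
    for _ in range(r):
        norms = np.sum(B_res ** 2, axis=0)              # ||b_j||^2
        corr = np.sum((B_res.T @ A_res) ** 2, axis=1)   # ||A_res^T b_j||^2
        gains = np.full(B.shape[1], -np.inf)
        ok = norms > eps                 # skip columns already in the span
        gains[ok] = corr[ok] / norms[ok]
        if S:
            gains[S] = -np.inf           # never reselect a column
        j = int(np.argmax(gains))
        if not np.isfinite(gains[j]):
            break                        # no useful candidates remain
        S.append(j)
        q = B_res[:, j] / np.sqrt(norms[j])   # new orthonormal direction
        A_res -= np.outer(q, q @ A_res)       # remove projection of A onto q
        B_res -= np.outer(q, q @ B_res)       # remove projection of B onto q
    return S

rng = np.random.default_rng(0)
A = rng.standard_normal((15, 10))
B = rng.standard_normal((15, 25))
S = greedy_with_updates(A, B, 4)
print(S)
```

In exact arithmetic this selects the same columns as the naive greedy rule, since the residual of a candidate is orthogonal to everything already selected and its normalized direction is exactly what the candidate adds to the span.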


“Small” dataset (mnist): to show accuracy

• Takeaway: GREEDY, GREEDY++, and GREEDY-core have roughly the same accuracy as state-of-the-art

Large dataset (news20.binary): to show scalability

• Takeaway: DISTGREEDY is able to scale to massive datasets while still selecting effective features


Proof sketch: Analysis of GREEDY

● Key lemma: There exists an element of OPT_k that gives a large marginal gain to GREEDY_r
  ● Closes gap to f(OPT_k)
  ● Similar to submodular functions


Questions?
