34
Greedy Column Subset Selection: New Bounds and Distributed Algorithms Jason Altschuler Joint work with Aditya Bhaskara, Thomas Fu, Vahab Mirrokni , Afshin Rostamizadeh, and Morteza Zadimoghaddam ICML 2016

Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

GreedyColumnSubsetSelection:NewBoundsandDistributedAlgorithms

JasonAltschuler

JointworkwithAdityaBhaskara,ThomasFu,Vahab Mirrokni,AfshinRostamizadeh,andMorteza Zadimoghaddam

ICML2016

Page 2: Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

1. Background/motivationforColumnSubsetSelection(CSS)

2. Previouswork+ ourcontributions

3. (Single-machine)greedyalgorithm

4. (Distributed)coreset greedyalgorithm

5. Furtheroptimizations

6. Experiments

7. [Timepermitting]Proofsketches

TalkOutline

Page 3: Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

1. Background/motivationforColumnSubsetSelection(CSS)

2. Previouswork+ ourcontributions

3. (Single-machine)greedyalgorithm

4. (Distributed)coreset greedyalgorithm

5. Furtheroptimizations

6. Experiments

7. [Timepermitting]Proofsketches

TalkOutline

Page 4: Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

Low-RankApproximation

Given(large)matrixAinRmxn andtargetrankk<<m,n:

• Optimalsolution:k-rankSVD• Applications:

• Dimensionalityreduction• Signaldenoising• Compression• ...

Page 5: Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

ColumnSubsetSelection(CSS)• Columnsoftenhaveimportantmeaning• CSS:Low-rankmatrixapproximationincolumnspaceofA

m

n

m

kk

n

A[S]AAA

Page 6: Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

WhyuseCSSfordimensionalityreduction?• Unsupervised• Don’tneedlabeleddata

• Classifierindependent• Canreuseoutputfordifferentclassifiers

• Interpretable• Generatefeaturesbysubselecting insteadofarbitraryfunction

• Efficientduringinference• Featuresubselection (CSS)betterthanmatrixmultiplication(SVD)if:• Latencysensitive• SVDprojectionmatrixprohibitivelylarge• Sparse

Page 7: Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

1. Background/motivationforColumnSubsetSelection(CSS)

2. Previouswork+ ourcontributions

3. (Single-machine)greedyalgorithm

4. (Distributed)coreset greedyalgorithm

5. Furtheroptimizations

6. Experiments

7. [Timepermitting]Proofsketches

TalkOutline

Page 8: Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

• CSSisUG-hard [Civril 2014]

• Importancesampling [Drineas etal.2004,Friezeetal.2004,…]• Fast,butadditive-errorbounds

• Morecomplicatedalgorithms [Desphande etal.2006,Drineas etal.2006,Boutsidis etal.2009,Boutsidis etal.2011,Cohenetal.2015,…]• Multiplicative-errorbounds,butcomplicated→notasfast/distributable

(Verysimplified)backgroundonCSS

Page 9: Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

• CSSisUG-hard [Civril 2014]

• Importancesampling [Drineas etal.2004,Friezeetal.2004,…]• Fast,butadditive-errorbounds

• Morecomplicatedalgorithms [Desphande etal.2006,Drineas etal.2006,Boutsidis etal.2009,Boutsidis etal.2011,Cohenetal.2015,…]• Multiplicative-errorbounds,butcomplicated→notasfast/distributable

• Greedy [Farahat etal.2011,Civril etal.2011,Boutsidis etal.2015]

• Multiplicative-error boundsandfast/distributable

(Verysimplified)backgroundonCSS

Page 10: Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

Contributions• Provetightapproximationguaranteeforthegreedyalgorithm

• Firstdistributedimplementationwithprovableapproximationfactors

• Furtheroptimizationsforthegreedyalgorithm

• Empiricalresultsshowingthesealgorithmsareextremelyscalableandhaveaccuracycomparablewiththestate-of-the-art

Page 11: Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

CSS(A,k)

GCSS(A,B,k)

• GCSS(A,B,k)useskcolumnsofBtoapproximateA

• Note:GCSS(A,A,k)=CSS(A,k)

GeneralizedColumnSubsetSelection(GCSS)

Page 12: Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

denote byf(S) originalGCSScostfunction

• GCSS maximizingfsubjecttocardinalityconstraint• Intuition:fmeasureshowmuchofAis“covered/explained”by

selectedcolumns

ConvenientreformulationofGCSS

Page 13: Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

1. Background/motivationforColumnSubsetSelection(CSS)

2. Previouswork+ ourcontributions

3. (Single-machine)greedyalgorithm

4. (Distributed)coreset greedyalgorithm

5. Furtheroptimizations

6. Experiments

7. [Timepermitting]Proofsketches

TalkOutline

Page 14: Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

GREEDYalgorithmtomaximizef

Page 15: Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

Ourresult:AnalysisofGREEDY

Page 16: Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

• Weexpectvectorsintobewell-conditioned(think“almostorthogonal”) small

• If boundedbyaconstant,thenonlyneed columns

• Significantimprovementuponcurrentbounds:dependonworst singularvalueofany kcolumns

Ourresult:AnalysisofGREEDY

Page 17: Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

1. Background/motivationforColumnSubsetSelection(CSS)

2. Previouswork+ ourcontributions

3. (Single-machine)greedy+ approximationguarantees

4. (Distributed)coreset greedy+ approximationguarantees

5. Furtheroptimizations

6. Experiments

7. [Timepermitting]Proofsketches

TalkOutline

Page 18: Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

DISTGREEDY:GCSS(A,B,k)withLmachinesB

Machine1 MachineLMachine2

Designatedmachine

Page 19: Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

DISTGREEDY:firstobservations• Easy/naturaltoimplementinMapReduce

• 2-passstreamingalgorithminrandomarrivalmodelforcolumns

• Canalsodomultiplerounds/epochs.Goodfor:• Massivedatasets• Gettingbetterapproximations(nextslide)

Page 20: Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

Ourresults:AnalysisofDISTGREEDYConsideraninstanceGCSS(A,B,k)

Page 21: Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

1. Background/motivationforColumnSubsetSelection(CSS)

2. Previouswork+ ourcontributions

3. (Single-machine)greedy+ approximationguarantees

4. (Distributed)coreset greedy+ approximationguarantees

5. Furtheroptimizations

6. Experiments

7. [Timepermitting]Proofsketches

TalkOutline

Page 22: Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

4optimizationsthatpreserveourapproximationfor

1.JLLemma [Johnson&Lindenstrauss 1982,Sarlos 2006]:randomlyprojecttorowswhile

stillpreservingk-linearcombos

2.Projection-CostPreservingSketches[Cohenetal.2015]:sketchAwith columns.

3.“StochasticGreedy”[Mirzasoleiman etal. 2015]:eachiterationonlyuses marginalutilitycalls

insteadof..

4.UpdatingAeveryiteration [Farahat etal.2013]:aftereachiteration,removeprojectionsofAandBonto

selectedcolumn.Reducescomplexityofmarginalutilityfrom

ScalableImplementation:GREEDY++

Page 23: Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

1. Background/motivationforColumnSubsetSelection(CSS)

2. Previouswork+ ourcontributions

3. (Single-machine)greedy+ approximationguarantees

4. (Distributed)coreset greedy+ approximationguarantees

5. Furtheroptimizations

6. Experiments

7. [Timepermitting]Proofsketches

TalkOutline

Page 24: Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

“Small”dataset(mnist):toshowaccuracy

• Takeaway: GREEDY,GREEDY++,andGREEDY-corehaveroughlysameaccuracyasstate-of-the-art

Page 25: Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

Largedataset(news20.binary)toshowscalability

• Takeaway:DISTGREEDYabletoscaletomassivedatasetswhilestillselectingeffectivefeatures

Page 26: Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

1. Background/motivationforColumnSubsetSelection(CSS)

2. Previouswork+ ourcontributions

3. (Single-machine)greedy+ approximationguarantees

4. (Distributed)coreset greedy+ approximationguarantees

5. Furtheroptimizations

6. Experiments

7. [Timepermitting]Proofsketch:analysisofGREEDY

TalkOutline

Page 27: Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

Proofsketch:AnalysisofGREEDY

● Keylemma:ExistselementofOPTk thatgiveslargemarginalgaintoGREEDYr

● Closesgaptof(OPTk)● Similartosubmodular functions

Page 28: Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

Proofsketch:AnalysisofGREEDY

● Keylemma:ExistselementofOPTk thatgiveslargemarginalgaintoGREEDYr

● Closesgaptof(OPTk)● Similartosubmodular functions

Page 29: Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

Proofsketch:AnalysisofGREEDY

● Keylemma:ExistselementofOPTk thatgiveslargemarginalgaintoGREEDYr

● Closesgaptof(OPTk)● Similartosubmodular functions

Page 30: Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

Proofsketch:AnalysisofGREEDY

● Keylemma:ExistselementofOPTk thatgiveslargemarginalgaintoGREEDYr

● Closesgaptof(OPTk)● Similartosubmodular functions

Page 31: Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

Proofsketch:AnalysisofGREEDY

● Keylemma:ExistselementofOPTk thatgiveslargemarginalgaintoGREEDYr

● Closesgaptof(OPTk)● Similartosubmodular functions

Page 32: Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

Proofsketch:AnalysisofGREEDY

● Keylemma:ExistselementofOPTk thatgiveslargemarginalgaintoGREEDYr

● Closesgaptof(OPTk)● Similartosubmodular functions

Page 33: Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

Proofsketch:AnalysisofGREEDY

● Keylemma:ExistselementofOPTk thatgiveslargemarginalgaintoGREEDYr

● Closesgaptof(OPTk)● Similartosubmodular functions

Page 34: Greedy Column Subset Selection: New Bounds and Distributed ...jasonalt/Altschuler_ICML_talk.pdf · Why use CSS for dimensionality reduction? • Unsupervised • Don’t need labeled

Questions?