LIGHTNING, A LIBRARY FOR LARGE-SCALE MACHINE LEARNING IN PYTHON. Fabian Pedregosa (1), Mathieu Blondel (2). (1) Chaire Havas-Dauphine / INRIA, Paris, France. (2) NTT Communication Science Laboratories, Kyoto, Japan.

Lightning: large scale machine learning in python

Page 1: Lightning: large scale machine learning in python

LIGHTNING, A LIBRARY FOR LARGE-SCALE MACHINE LEARNING IN PYTHON

Fabian Pedregosa (1), Mathieu Blondel (2)

(1) Chaire Havas-Dauphine / INRIA, Paris, France

(2) NTT Communication Science Laboratories, Kyoto, Japan

Page 2: Lightning: large scale machine learning in python

SCIKIT-LEARN: WITH GREAT CODE COMES GREAT RESPONSIBILITY

[Figure: number of lines of code in scikit-learn]

Very selective for new algorithms/models.

Page 3: Lightning: large scale machine learning in python

LIGHTNING

Incorporate recent progress in large-scale optimization.

scikit-learn compatible.
Scalable on large datasets.
Support for dense and sparse input.
Emphasis on structured sparsity penalties.

dependencies = Python + Cython + scikit-learn.

Page 4: Lightning: large scale machine learning in python

SCIKIT-LEARN COMPATIBLE

Mix lightning with scikit-learn's Pipeline, GridSearchCV, etc.

Page 5: Lightning: large scale machine learning in python

FROM LARGE DATA TO LARGE OPTIMIZATION

Big data comes in different flavors.

[Figure: DATA matrix with n rows (samples) and p columns (features)]

Large sample (large n): Computer vision, advertising, etc.

Large dimension (large p): Biology, neuroscience, etc.

Page 6: Lightning: large scale machine learning in python

LEARNING FROM LARGE SAMPLES

Usual methods (gradient descent, BFGS, etc.):

Pass through the data at each iteration.
Prohibitive for large datasets.

Back to simple methods:

Stochastic gradient descent (Robbins and Monro, 1951).
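To make the contrast with full-pass methods concrete, here is a minimal sketch of stochastic gradient descent on a least-squares problem, written in plain Python so that it stays self-contained. It is not lightning's implementation; the function name, step size and toy data are assumptions chosen for illustration.

```python
import random


def sgd_least_squares(X, y, step=0.01, n_epochs=50, seed=0):
    """Plain SGD on the squared loss 0.5 * (w.x - y)^2, one sample per update."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_epochs):
        # Visit the samples in a fresh random order on every epoch.
        for i in rng.sample(range(n), n):
            residual = sum(wj * xj for wj, xj in zip(w, X[i])) - y[i]
            # The update only touches sample i, never the whole dataset.
            for j in range(p):
                w[j] -= step * residual * X[i][j]
    return w


# Noiseless toy data generated from known weights (assumption for the demo).
data_rng = random.Random(1)
true_w = [2.0, -1.0]
X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(200)]
y = [sum(t * x for t, x in zip(true_w, row)) for row in X]

w = sgd_least_squares(X, y)
```

Each update costs O(p) regardless of n, which is what makes the method attractive when the number of samples is large.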

Page 7: Lightning: large scale machine learning in python

LEARNING FROM LARGE SAMPLES

lightning example, n = 100,000

In the last 5 years, a flurry of new stochastic methods:

Stochastic Variance-Reduced Gradient (SVRG)
Stochastic Dual Coordinate Ascent (SDCA)
Stochastic Average Gradient (SAG/SAGA)

They are all in lightning!
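As an illustration of the variance-reduction idea behind these methods, the following is a minimal sketch of SVRG on a least-squares problem in plain Python. It is a textbook-style version under assumed settings (step size, number of outer loops, toy data), not the code that ships with lightning.

```python
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def full_gradient(w, X, y):
    """Gradient of the average squared loss (1 / 2n) * sum_i (w.x_i - y_i)^2."""
    n, p = len(X), len(w)
    grad = [0.0] * p
    for xi, yi in zip(X, y):
        r = dot(w, xi) - yi
        for j in range(p):
            grad[j] += r * xi[j] / n
    return grad


def svrg_least_squares(X, y, step=0.05, n_outer=30, seed=0):
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_outer):
        # Snapshot and its full gradient: the only full pass per outer loop.
        w_snap = list(w)
        mu = full_gradient(w_snap, X, y)
        for _ in range(n):
            i = rng.randrange(n)
            r_cur = dot(w, X[i]) - y[i]
            r_snap = dot(w_snap, X[i]) - y[i]
            for j in range(p):
                # Stochastic gradient corrected by the snapshot gradient.
                w[j] -= step * ((r_cur - r_snap) * X[i][j] + mu[j])
    return w


# Noisy toy data (assumption for the demo).
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(200)]
y = [2.0 * a - 1.0 * b + data_rng.gauss(0, 0.1) for a, b in X]

w = svrg_least_squares(X, y)
grad_norm = max(abs(g) for g in full_gradient(w, X, y))
```

Unlike plain SGD with a constant step, the corrected gradient estimate has a variance that shrinks as the iterate approaches the optimum, so the full gradient at `w` can be driven close to zero even on noisy data.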

Page 8: Lightning: large scale machine learning in python

LEARNING FROM LARGE FEATURES

Iterate through the columns.
Coordinate Descent-like algorithms.
Very efficient for sparse models.

[Figure: multiclass classification with group-lasso penalty (Blondel et al. 2013)]
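The column-wise iteration can be sketched with cyclic coordinate descent for the Lasso, minimizing 0.5 * ||y - Xw||^2 + alpha * ||w||_1. This is a plain-Python illustration under assumed names and toy data, not lightning's Cython implementation.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t; values within [-t, t] become exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def cd_lasso(X, y, alpha, n_sweeps=100):
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]  # column access
    w = [0.0] * p
    residual = list(y)  # equals y - Xw for w = 0
    for _ in range(n_sweeps):
        for j in range(p):
            col = cols[j]
            z = sum(v * v for v in col)
            if z == 0.0:
                continue
            # Correlation of column j with the residual, with the
            # contribution of the current w[j] added back.
            rho = sum(c * r for c, r in zip(col, residual)) + z * w[j]
            new_wj = soft_threshold(rho, alpha) / z
            delta = new_wj - w[j]
            if delta != 0.0:
                # Keep the residual in sync instead of recomputing it.
                residual = [r - delta * c for r, c in zip(residual, col)]
                w[j] = new_wj
    return w


# Toy design with orthogonal columns (assumption for the demo).
X = [[1.0, 0.0, 0.0],
     [0.0, 2.0, 0.0],
     [0.0, 0.0, 1.0]]
y = [3.0, 1.0, -0.5]

w = cd_lasso(X, y, alpha=1.0)  # w == [2.0, 0.25, 0.0]
```

The third coefficient is set exactly to zero by the soft-thresholding step, and a coordinate whose value does not change costs no residual update, which is why such methods tend to be efficient when the solution is sparse.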

Page 9: Lightning: large scale machine learning in python

STRUCTURED SPARSITY

There's so much more than the Lasso...

Group sparse penalty.
Total variation.
Trace norm (low rank).
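As one example of such a penalty, the group sparse (group lasso) penalty t * sum_g ||w_g||_2 has a simple proximal operator, which proximal-gradient methods apply after each gradient step. The sketch below is a plain-Python illustration for non-overlapping groups; the function name and the toy vector are assumptions, and this is not lightning's code.

```python
import math


def group_soft_threshold(w, groups, t):
    """Proximal operator of t * sum_g ||w_g||_2 for non-overlapping groups."""
    out = list(w)
    for group in groups:
        norm = math.sqrt(sum(w[j] ** 2 for j in group))
        # Shrink the whole group; groups with norm <= t are zeroed out.
        scale = max(0.0, 1.0 - t / norm) if norm > 0.0 else 0.0
        for j in group:
            out[j] = scale * w[j]
    return out


w = [3.0, 4.0, 0.3, -0.4]
groups = [[0, 1], [2, 3]]
shrunk = group_soft_threshold(w, groups, t=1.0)
```

Here the first group (norm 5) is scaled by 0.8, while the second group (norm 0.5, below the threshold) is removed as a block. Selecting or discarding whole groups of coefficients at once is what distinguishes this penalty from the plain Lasso.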

Page 10: Lightning: large scale machine learning in python

API

Similarities and differences with scikit-learn

scikit-learn: LogisticRegression(penalty='l1', solver='liblinear')
    LogisticRegression: loss function
    solver='liblinear': algorithm

lightning: CDClassifier(penalty='l1', loss='log')
    CDClassifier: algorithm
    loss='log': loss function

API based on algorithms, not models.

Page 11: Lightning: large scale machine learning in python

EXTENSIBILITY

Typical loss and penalties available.
Possible to pass custom loss or penalty function

clf = FistaClassifier(loss=my_loss, penalty=my_penalty)

(available for Fista* and SAGA*)

Page 12: Lightning: large scale machine learning in python

FUTURE CHALLENGES

Parallel stochastic methods

(Leblond, Pedregosa, Lacoste-Julien 2016)

Out of core (scale beyond computer memory).

Page 13: Lightning: large scale machine learning in python

SCIKIT-LEARN-CONTRIB

lightning is just the beginning.

Welcome projects that are:

scikit-learn compatible.
Documented.
Test coverage > 80%.

Page 14: Lightning: large scale machine learning in python

THANKS FOR YOUR ATTENTION

http://contrib.scikit-learn.org/lightning/

(We're hiring!)