28
Ga¨ el Varoquaux 1. Data-driven science “Brain mining” 2. Data mining in Python Mayavi, scikit-learn, joblib

Python for brain mining: (neuro)science with state of the art machine learning and data visualization

Embed Size (px)

DESCRIPTION

Talk given at scipy 2011

Citation preview

Page 1: Python for brain mining: (neuro)science with state of the art machine learning and data visualization

Python for brain mining:(neuro)science with state of the art machine learningand data visualization

Gael Varoquaux

1. Data-driven science“Brain mining”

2. Data mining in PythonMayavi, scikit-learn, joblib

Page 2: Python for brain mining: (neuro)science with state of the art machine learning and data visualization

1 Brain miningLearning models of brain function

Gael Varoquaux 2

Page 3: Python for brain mining: (neuro)science with state of the art machine learning and data visualization

1 Imaging neuroscience

Brainimages

Models offunction

Cognitive tasks

Gael Varoquaux 3

Page 4: Python for brain mining: (neuro)science with state of the art machine learning and data visualization

1 Imaging neuroscience

Brainimages

Models offunction

Cognitive tasks

Data-driven science

i ~ ∂

∂t Ψ = HΨ

Gael Varoquaux 3

Page 5: Python for brain mining: (neuro)science with state of the art machine learning and data visualization

1 Brain functional data

Rich data50 000 voxels perframeComplex underlyingdynamics

Few observations∼ 100

Drawing scientific conclusions?Ill-posed statistical problem

Gael Varoquaux 4

Page 6: Python for brain mining: (neuro)science with state of the art machine learning and data visualization

1 Brain functional data

Rich data50 000 voxels perframeComplex underlyingdynamics

Few observations∼ 100

Drawing scientific conclusions?Ill-posed statistical problem

Modern complex system studies:from strong hypothesizes to rich data

Gael Varoquaux 4

Page 7: Python for brain mining: (neuro)science with state of the art machine learning and data visualization

1 Statistics: the curse of dimensionalityy function of x1

y function of x1 and x2

More fit parameters?⇒ need exponentially more data

y function of 50 000 voxels?Expert knowledge

(pick the right ones) Machine learning

Gael Varoquaux 5

Page 8: Python for brain mining: (neuro)science with state of the art machine learning and data visualization

1 Statistics: the curse of dimensionalityy function of x1 y function of x1 and x2

More fit parameters?⇒ need exponentially more data

y function of 50 000 voxels?Expert knowledge

(pick the right ones) Machine learning

Gael Varoquaux 5

Page 9: Python for brain mining: (neuro)science with state of the art machine learning and data visualization

1 Statistics: the curse of dimensionalityy function of x1 y function of x1 and x2

More fit parameters?⇒ need exponentially more data

y function of 50 000 voxels?Expert knowledge

(pick the right ones) Machine learning

Gael Varoquaux 5

Page 10: Python for brain mining: (neuro)science with state of the art machine learning and data visualization

1 Brain readingPredict from brain images the object viewed

Correlation analysis

Gael Varoquaux 6

Page 11: Python for brain mining: (neuro)science with state of the art machine learning and data visualization

1 Brain readingPredict from brain images the object viewed

Inverse problem

Observations Spatial code

Correlation analysis

Inject prior: regularize

Sparse regression= compressive sensing

Gael Varoquaux 6

Page 12: Python for brain mining: (neuro)science with state of the art machine learning and data visualization

1 Brain readingPredict from brain images the object viewed

Inverse problem

Observations Spatial code

Correlation analysis

Inject prior: regularize

Extract brain regionsTotal variationregression

[Michel, Trans Med Imag 2011]Gael Varoquaux 6

Page 13: Python for brain mining: (neuro)science with state of the art machine learning and data visualization

1 Brain readingPredict from brain images the object viewed

Inverse problem

Observations Spatial code

Correlation analysis

Inject prior: regularize

Extract brain regionsTotal variationregression

[Michel, Trans Med Imag 2011]

Cast the problem in a prediction task:supervised learning.

Prediction is a model-selection metric

Gael Varoquaux 6

Page 14: Python for brain mining: (neuro)science with state of the art machine learning and data visualization

1 On-going/spontaneous activity

95% of theactivity is

unrelated to task

Gael Varoquaux 7

Page 15: Python for brain mining: (neuro)science with state of the art machine learning and data visualization

1 Learning regions from spontaneous activity

Multi-subject dictionarylearning

SpatialmapTime series

Sparsity + spatial continuity + spatial variability

⇒ Individual maps + functional regions atlas

[Varoquaux, Inf Proc Med Imag 2011]

Gael Varoquaux 8

Page 16: Python for brain mining: (neuro)science with state of the art machine learning and data visualization

1 Graphical models: interactions between regions

Estimate covariance structureMany parameters to learn

Regularize: conditionalindependence= sparsity on inverse

covariance

[Varoquaux NIPS 2010]

Gael Varoquaux 9

Page 17: Python for brain mining: (neuro)science with state of the art machine learning and data visualization

1 Graphical models: interactions between regions

Estimate covariance structureMany parameters to learn

Regularize: conditionalindependence= sparsity on inverse

covariance

[Varoquaux NIPS 2010]

Find structure via a density estimation:unsupervised learning.

Model selection: likelihood of new data

Gael Varoquaux 9

Page 18: Python for brain mining: (neuro)science with state of the art machine learning and data visualization

2 My data-science software stackMayavi, scikit-learn, joblib

Gael Varoquaux 10

Page 19: Python for brain mining: (neuro)science with state of the art machine learning and data visualization

2 Mayavi: 3D data visualizationRequirements

large 3D datainteractive visualizationeasy scripting

SolutionVTK: C++ data visualizationUI (traits)

+ pylab-inspired API

Black-box solutions don’t yield new intuitions

Limitationshard to installclunky & complexC++ leaking through3D visualization doesn’tpay in academia

Tragedy of thecommons or

niche product?

Gael Varoquaux 11

Page 20: Python for brain mining: (neuro)science with state of the art machine learning and data visualization

2 scikit-learn: statistical learningVision

Address non-machine-learning expertsSimplify but don’t dumb downPerformance: be state of the artEase of installation

Gael Varoquaux 12

Page 21: Python for brain mining: (neuro)science with state of the art machine learning and data visualization

2 scikit-learn: statistical learningTechnical choices

Prefer Python or Cython, focus on readabilityDocumentation and examples are paramountLittle object-oriented design. Opt for simplicityPrefer algorithms to frameworkCode quality: consistency and testing

Gael Varoquaux 13

Page 22: Python for brain mining: (neuro)science with state of the art machine learning and data visualization

2 scikit-learn: statistical learningAPI

Inputs are numpy arraysLearn a model from the data:estimator.fit(X train, Y train)

Predict using learned modelestimator.predict(X test)

Test goodness of fitestimator.score(X test, y test)

Apply change of representationestimator.transform(X, y)

Gael Varoquaux 14

Page 23: Python for brain mining: (neuro)science with state of the art machine learning and data visualization

2 scikit-learn: statistical learningComputational performance

scikit-learn mlpy pybrain pymvpa mdp shogunSVM 5.2 9.47 17.5 11.52 40.48 5.63LARS 1.17 105.3 - 37.35 - -Elastic Net 0.52 73.7 - 1.44 - -kNN 0.57 1.41 - 0.56 0.58 1.36PCA 0.18 - - 8.93 0.47 0.33k-Means 1.34 0.79 ∞ - 35.75 0.68

Algorithms rather than low-level optimizationconvex optimization + machine learning

Avoid memory copies

Gael Varoquaux 15

Page 24: Python for brain mining: (neuro)science with state of the art machine learning and data visualization

2 scikit-learn: statistical learningCommunity

35 contributors since 2008, 103 github forks25 contributors in latest release (3 months span)

Why this success?Trendy topic?Low barrier of entryFriendly and very skilled mailing listCredit to people

Gael Varoquaux 16

Page 25: Python for brain mining: (neuro)science with state of the art machine learning and data visualization

2 joblib: Python functions on steroidsWe keep recomputing the same things

Nested loops with overlapping sub-problemsVarying parameters I/O

Standard solution:pipelines

ChallengesDependencies modelingParameter tracking

Gael Varoquaux 17

Page 26: Python for brain mining: (neuro)science with state of the art machine learning and data visualization

2 joblib: Python functions on steroids

PhilosophySimple don’t change your codeMinimal no dependenciesPerformant big dataRobust never fail

joblib’s solution = lazy recomputation:Take an MD5 hash of function arguments,Store outputs to disk

Gael Varoquaux 18

Page 27: Python for brain mining: (neuro)science with state of the art machine learning and data visualization

2 joblibLazy recomputing

>>> from j o b l i b import Memory>>> mem = Memory( c a c h e d i r =’/tmp/ joblib ’)>>> import numpy as np>>> a = np. v a n d e r (np. a r a n g e (3))>>> s q u a r e = mem. cache (np. s q u a r e )>>> b = s q u a r e (a)

[Memory] C a l l i n g s q u a r e ...s q u a r e ( a r r a y ([[0 , 0, 1],

[1, 1, 1],[4, 2, 1]]))

s q u a r e - 0.0 s>>> c = s q u a r e (a)>>> # No recomputation

Gael Varoquaux 19

Page 28: Python for brain mining: (neuro)science with state of the art machine learning and data visualization

Conclusion

Data-driven science will needmachine learning because of thecurse of dimensionality

Scikit-learn and joblib: focus onlarge-data performance and easy ofuse

Cannot develop software and science separatelyGael Varoquaux 20