Brain reading: Compressive sensing, fMRI, and statistical learning in Python. Gaël Varoquaux, INRIA/Parietal



DESCRIPTION

Talk given at Gipsa-lab on using machine learning to learn, from fMRI, the brain patterns and regions related to behavior. This talk focuses on the signal and inverse-problem aspects, as well as on the software.


Page 1: Brain reading, compressive sensing, fMRI and statistical learning in Python

Brain reading: Compressive sensing, fMRI, and statistical learning in Python

Gaël Varoquaux

INRIA/Parietal

Page 2: Brain reading, compressive sensing, fMRI and statistical learning in Python

1 Brain reading: predictive models

2 Sparse recovery with correlated designs

3 Having an impact: software

G Varoquaux 2

Page 3: Brain reading, compressive sensing, fMRI and statistical learning in Python

1 Brain reading: predictive models

Functional brain imaging: study of human cognition

G Varoquaux 3

Page 4: Brain reading, compressive sensing, fMRI and statistical learning in Python

1 Brain imaging
fMRI data: > 50 000 voxels; stimuli

Standard analysis

Detect voxels that correlate to the stimuli

G Varoquaux 4


Page 6: Brain reading, compressive sensing, fMRI and statistical learning in Python

1 Brain reading

Predicting the object category viewed [Haxby 2001, Distributed and Overlapping Representations of Faces and Objects in Ventral Temporal Cortex]

Supervised learning task

Find combinations of voxels to predict the stimuli

Multi-variate statistics

G Varoquaux 5


Page 8: Brain reading, compressive sensing, fMRI and statistical learning in Python

1 Linear model for fMRI

y = sign(X w + e)

Design matrix × Coefficients = Target

Problem size: p > 50 000, n ∼ 100 per category

G Varoquaux 6

Page 9: Brain reading, compressive sensing, fMRI and statistical learning in Python

1 Estimation: statistical learning

Inverse problem: minimize an error term
    w = argmin_w l(y − X w)
Ill-posed: X is not full rank

Inject a prior: regularize
    w = argmin_w l(y − X w) + p(w)

Example: Lasso = sparse regression
    w = argmin_w ‖y − X w‖₂² + ℓ1(w),   with ℓ1(w) = Σ_i |w_i|

G Varoquaux 7
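Purely as an illustration (not shown in the slides), a minimal sketch of fitting such a sparse linear model with scikit-learn's Lasso on toy data standing in for fMRI features:

    import numpy as np
    from sklearn.linear_model import Lasso

    # Toy data standing in for fMRI features: n samples, p voxels
    rng = np.random.RandomState(0)
    n, p = 100, 500
    X = rng.randn(n, p)
    w_true = np.zeros(p)
    w_true[:10] = 1.0                      # sparse ground-truth coefficients
    y = X @ w_true + 0.1 * rng.randn(n)

    # Lasso: squared loss + l1 penalty (alpha scales the penalty)
    lasso = Lasso(alpha=0.1).fit(X, y)
    print("non-zero coefficients:", np.sum(lasso.coef_ != 0))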


Page 11: Brain reading, compressive sensing, fMRI and statistical learning in Python

1 TV-penalization to promote regions

Neuroscientists think in terms of brain regions

[Haxby 2001]

Total-variation penalization: impose sparsity on the gradient of the image:
    p(w) = ℓ1(∇w)

[Michel TMI 2011]

G Varoquaux 8
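As a rough illustration of this penalty (a sketch under toy assumptions, not the authors' code), the anisotropic total variation of a 2D weight map is the ℓ1 norm of its discrete gradient:

    import numpy as np

    def tv_penalty(w_2d):
        """l1 norm of the discrete image gradient: p(w) = l1(grad w)."""
        gx = np.diff(w_2d, axis=0)       # gradient along rows
        gy = np.diff(w_2d, axis=1)       # gradient along columns
        return np.abs(gx).sum() + np.abs(gy).sum()

    w = np.zeros((10, 10))
    w[3:6, 3:6] = 1.0                    # one compact "region"
    print(tv_penalty(w))                 # 12.0: only the patch boundary contributes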

Page 12: Brain reading, compressive sensing, fMRI and statistical learning in Python

1 Prediction with logistic regression - TV

w = argmin_w l(y − X w) + p(w)
l: least-squares or logistic loss;  p: TV

Optimization: proximal gradient (FISTA)
- Gradient descent on l (smooth term)
- Projections on TV

Prediction performance (explained variance):
  Feature screening + SVC   0.77
  Sparse regression         0.78
  Total Variation           0.84

[Figure: weight maps, compared with the standard analysis]

G Varoquaux 9
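To make the optimization scheme concrete, here is a generic FISTA sketch. For simplicity it uses the ℓ1 proximal operator (soft thresholding) as a stand-in for the TV proximal step, so it solves the Lasso rather than the TV-penalized problem; an illustrative assumption, not the implementation from [Michel TMI 2011]:

    import numpy as np

    def soft_threshold(w, t):
        """Proximal operator of t * l1; stands in here for the TV prox."""
        return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

    def fista(X, y, alpha, n_iter=200):
        """FISTA for 0.5 * ||y - X w||^2 + alpha * ||w||_1."""
        n, p = X.shape
        L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
        w = np.zeros(p)
        z, t = w.copy(), 1.0
        for _ in range(n_iter):
            grad = X.T @ (X @ z - y)           # gradient of the smooth term
            w_new = soft_threshold(z - grad / L, alpha / L)
            t_new = (1 + np.sqrt(1 + 4 * t ** 2)) / 2
            z = w_new + (t - 1) / t_new * (w_new - w)
            w, t = w_new, t_new
        return w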


Page 14: Brain reading, compressive sensing, fMRI and statistical learning in Python

1 Standard analysis or predictive modeling?

Predicting the object category viewed [Haxby 2001, Distributed and Overlapping Representations of Faces and Objects in Ventral Temporal Cortex]

Take-home message: brain regions, not prediction

G Varoquaux 10

Page 15: Brain reading, compressive sensing, fMRI and statistical learning in Python

1 Standard analysis or predictive modeling?

Recovery rather than prediction

G Varoquaux 11

Page 16: Brain reading, compressive sensing, fMRI and statistical learning in Python

1 Good prediction ≠ good recovery

Simulations (with known ground truth)

[Figure: ground truth vs. estimated weight maps]

  Lasso   Prediction: 0.78   Recovery: 0.429
  SVM     Prediction: 0.71   Recovery: 0.486

Need a method suited for recovery
G Varoquaux 12

Page 17: Brain reading, compressive sensing, fMRI and statistical learning in Python

1 Brain mapping: a statistical perspective

[Figure: design matrix × coefficients = target]

Small-sample linear model estimation, random correlated design

Problem size: p > 50 000, n ∼ 100 per category

G Varoquaux 13

Page 18: Brain reading, compressive sensing, fMRI and statistical learning in Python

1 Brain mapping: a statistical perspective

[Figure: design matrix × coefficients = target]

Small-sample linear model estimation, random correlated design

Estimation strategy
Standard approach: univariate statistics → multiple-comparisons problem

⇒ statistical power ∝ 1/p

We want sub-linear sample complexity
⇒ non-rotationally-invariant estimators

e.g. ℓ1 penalization [Ng 2004, Feature selection, ℓ1 vs. ℓ2 regularization, and rotational invariance]
G Varoquaux 13

Page 19: Brain reading, compressive sensing, fMRI and statistical learning in Python

1 Brain mapping as a sparse recovery task

Recovering brain regions

n_min ∼ 2 k log p
Restricted-isometry-like property: the design matrix is well-conditioned on sub-matrices of size > k [Candes 2006, Tropp 2004]

Mutual incoherence [Wainwright 2009]:
relevant features S and the irrelevant ones (the complement of S) are not too correlated

Violated by the spatial correlations in our design

G Varoquaux 14

Page 20: Brain reading, compressive sensing, fMRI and statistical learning in Python

1 Brain mapping as a sparse recovery task

[Figure: lasso solution, 23 non-zeros]

Recovering k non-zero coefficients: n_min ∼ 2 k log p
Restricted-isometry-like property: the design matrix is well-conditioned on sub-matrices of size > k [Candes 2006, Tropp 2004]

Mutual incoherence [Wainwright 2009]:
relevant features S and the irrelevant ones (the complement of S) are not too correlated

Violated by the spatial correlations in our design

G Varoquaux 14

Page 21: Brain reading, compressive sensing, fMRI and statistical learning in Python

1 Randomized sparsity [Meinshausen and Bühlmann 2010, Bach 2008]

Perturb the design matrix:
- Subsample the data
- Randomly rescale the features
+ Run a sparse estimator and keep the features that are often selected
⇒ Good recovery without mutual incoherence

But an RIP-like condition is still needed:
cannot recover large correlated groups.
For m correlated features, the selection frequency is divided by m.

G Varoquaux 15
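A sketch of this randomized-sparsity recipe on generic numpy data (an illustration of the idea only; the parameter names and values here are arbitrary assumptions, not the referenced algorithms' exact settings):

    import numpy as np
    from sklearn.linear_model import Lasso

    def stability_selection(X, y, alpha=0.1, n_runs=100, subsample=0.75,
                            rescale=(0.5, 1.0), seed=0):
        """Subsample the rows, randomly rescale the columns, fit a sparse
        model, and count how often each feature is selected."""
        rng = np.random.RandomState(seed)
        n, p = X.shape
        counts = np.zeros(p)
        for _ in range(n_runs):
            rows = rng.choice(n, size=int(subsample * n), replace=False)
            scales = rng.uniform(*rescale, size=p)     # random feature rescaling
            lasso = Lasso(alpha=alpha).fit(X[rows] * scales, y[rows])
            counts += lasso.coef_ != 0
        return counts / n_runs                         # selection frequencies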

Page 22: Brain reading, compressive sensing, fMRI and statistical learning in Python

2 Sparse recovery with correlated designs

Not enough samples: n_min ∼ 2 k log p

Spatial correlations

G Varoquaux 16

Page 23: Brain reading, compressive sensing, fMRI and statistical learning in Python

2 Sparse recovery with correlated designs

Combining

Clustering

Sparsity

[Varoquaux ICML 2012]

G Varoquaux 16

Page 24: Brain reading, compressive sensing, fMRI and statistical learning in Python

2 Brain parcellations

Spatially-connected hierarchical clustering ⇒ reduces the number of voxels [Michel Pat Rec 2011]

Replace features by the corresponding cluster averages
+ Use a supervised learner on the reduced problem

Cluster choice sub-optimal for regression

G Varoquaux 17
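This kind of spatially-constrained feature agglomeration is available off the shelf in scikit-learn; a toy sketch (the shapes and parameters are made up for illustration):

    import numpy as np
    from sklearn.cluster import FeatureAgglomeration
    from sklearn.feature_extraction.image import grid_to_graph
    from sklearn.svm import LinearSVC

    # Toy "brain": a 10x10x10 grid of voxels flattened into p features
    shape = (10, 10, 10)
    n, p = 100, np.prod(shape)
    rng = np.random.RandomState(0)
    X = rng.randn(n, p)
    y = rng.randint(0, 2, n)

    # Spatially-constrained Ward clustering of the features (voxels)
    connectivity = grid_to_graph(*shape)
    ward = FeatureAgglomeration(n_clusters=200, connectivity=connectivity)
    X_reduced = ward.fit_transform(X)      # replace voxels by cluster averages

    # Supervised learner on the reduced problem
    clf = LinearSVC().fit(X_reduced, y)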

Page 25: Brain reading, compressive sensing, fMRI and statistical learning in Python

2 Brain parcellations + sparsity
Hypothesis: the clustering is compatible with support(w)

Benefits of clustering
- Reduced k and p ⇒ n > n_min: on the good side of the “sharp threshold”
- Clusters together correlated features ⇒ improves RIP-like conditions

Recovery possible on reduced features

G Varoquaux 18

Page 26: Brain reading, compressive sensing, fMRI and statistical learning in Python

2 Randomized parcellations + sparsity

Randomization + stability scores

Marginalize over the cluster choice

Relaxes the mutual incoherence requirement

G Varoquaux 19

Page 27: Brain reading, compressive sensing, fMRI and statistical learning in Python

2 Algorithm
1. Set the number of clusters and the sparsity by cross-validation
2. Loop: randomly perturb the data
3. Cluster to form reduced features
4. Fit a sparse linear model on the reduced features
5. Accumulate the non-zero features
6. Threshold the map of selection counts (see the sketch below)
G Varoquaux 20
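A compact sketch of steps 2-5 using scikit-learn building blocks (toy assumptions throughout: this is one way to realize the loop, not the reference implementation from [Varoquaux ICML 2012]):

    import numpy as np
    from sklearn.cluster import FeatureAgglomeration
    from sklearn.linear_model import Lasso

    def randomized_clustered_sparsity(X, y, connectivity, n_clusters=200,
                                      alpha=0.1, n_runs=50, subsample=0.75,
                                      seed=0):
        """Accumulate, at the voxel level, how often each feature is selected."""
        rng = np.random.RandomState(seed)
        n, p = X.shape
        counts = np.zeros(p)
        for _ in range(n_runs):
            rows = rng.choice(n, int(subsample * n), replace=False)   # step 2
            ward = FeatureAgglomeration(n_clusters=n_clusters,
                                        connectivity=connectivity)
            X_red = ward.fit_transform(X[rows])                       # step 3
            coef = Lasso(alpha=alpha).fit(X_red, y[rows]).coef_       # step 4
            counts += coef[ward.labels_] != 0                         # step 5
        return counts / n_runs     # step 6: threshold these selection scores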

Page 28: Brain reading, compressive sensing, fMRI and statistical learning in Python

2 Simulations
p = 2048, k = 64, n = 256 (n_min > 1000)
Weights w: patches of varying size
Design matrix: 2D Gaussian random images of varying smoothness

Estimators:
- Randomized lasso
- Elastic Net
- Our approach
- Univariate F-test

Parameters set by cross-validation

Performance metric: recovery seen as a 2-class problem
⇒ report the AUC of the precision-recall curve

G Varoquaux 21
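For concreteness, this recovery score can be computed with scikit-learn's average precision (a small illustrative snippet; the true support and the candidate scores here are made up):

    import numpy as np
    from sklearn.metrics import average_precision_score

    rng = np.random.RandomState(0)
    w_true = np.zeros(100)
    w_true[:10] = 1.0                              # ground-truth support
    scores = np.abs(rng.randn(100)) + 2 * w_true   # e.g. selection frequencies
    print(average_precision_score(w_true != 0, scores))   # AUC of the P-R curve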

Page 29: Brain reading, compressive sensing, fMRI and statistical learning in Python

2 When can we recover patches?

Smoothness helps (it reduces the noise degrees of freedom)
Small patches are hard to recover

G Varoquaux 22

Page 30: Brain reading, compressive sensing, fMRI and statistical learning in Python

2 What is the best method for patch recovery?

For small patches: elastic net
For large patches: randomized-clustered sparsity
For large patches and very smooth images: F-test

G Varoquaux 23

Page 31: Brain reading, compressive sensing, fMRI and statistical learning in Python

2 Randomizing clusters matters!

Non-random (Ward) clustering is inefficient
Fully-random clustering performs quite well
Randomized Ward gives an extra gain

Degenerate family of cluster assignments

G Varoquaux 24


Page 33: Brain reading, compressive sensing, fMRI and statistical learning in Python

2 fMRI: face vs house discrimination [Haxby 2001]

F-scores

[Figure: brain map, slices at y=-31, x=17, z=-17]

G Varoquaux 25

Page 34: Brain reading, compressive sensing, fMRI and statistical learning in Python

2 fMRI: face vs house discrimination [Haxby 2001]

ℓ1 Logistic
[Figure: brain map, slices at y=-31, x=17, z=-17]

G Varoquaux 25

Page 35: Brain reading, compressive sensing, fMRI and statistical learning in Python

2 fMRI: face vs house discrimination [Haxby 2001]

Randomized ℓ1 Logistic
[Figure: brain map, slices at y=-31, x=17, z=-17]

G Varoquaux 25

Page 36: Brain reading, compressive sensing, fMRI and statistical learning in Python

2 fMRI: face vs house discrimination [Haxby 2001]

Randomized Clustered ℓ1 Logistic
[Figure: brain map, slices at y=-31, x=17, z=-17]

G Varoquaux 25


Page 38: Brain reading, compressive sensing, fMRI and statistical learning in Python

2 Predictive model on selected features
Object recognition [Haxby 2001]

Using the recovered features improves prediction
G Varoquaux 26

Page 39: Brain reading, compressive sensing, fMRI and statistical learning in Python

Small-sample brain mapping
Sparse recovery of patches on spatially-correlated designs

Ingredients: clustering + randomization
⇒ a reduced feature set compatible with recovery: it matches the sparsity pattern and the recovery conditions

Compressive sensing questions
- Can we recover k > n in the case of large patches?
- When do we lose sub-linear sample complexity?

G Varoquaux 27

Page 40: Brain reading, compressive sensing, fMRI and statistical learning in Python

3 Having an impact: software
How do we reach our target audience (neuroscientists)?

How do we disseminate our ideas?

How do we facilitate new ideas?

G Varoquaux 28

Page 41: Brain reading, compressive sensing, fMRI and statistical learning in Python

3 Python as a scientific environment

General purpose

Easy, readable syntax

Interactive (ipython)

Great scientific libraries (numpy, scipy, matplotlib...)

G Varoquaux 29

Page 42: Brain reading, compressive sensing, fMRI and statistical learning in Python

3 Growing a software stack

Code lines are costly

⇒ Open source + community driven

Need quality and impact

⇒ Focus on the general purpose libraries first

Scikit-learn: machine learning in Python
http://scikit-learn.org

G Varoquaux 30

Page 43: Brain reading, compressive sensing, fMRI and statistical learning in Python

3 scikit-learn: machine learning in Python
Technical choices
- Prefer Python or Cython; focus on readability
- Documentation and examples are paramount
- Little object-oriented design; opt for simplicity
- Prefer algorithms to frameworks
- Code quality: consistency and testing

G Varoquaux 31

Page 44: Brain reading, compressive sensing, fMRI and statistical learning in Python

3 scikit-learn: machine learning in Python
API

Inputs are numpy arrays
Learn a model from the data:        estimator.fit(X_train, y_train)
Predict using the learned model:    estimator.predict(X_test)
Test the goodness of fit:           estimator.score(X_test, y_test)
Apply a change of representation:   estimator.transform(X)

G Varoquaux 32
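A self-contained toy example of that API in action (the data and the choice of estimator here are arbitrary):

    import numpy as np
    from sklearn.svm import LinearSVC
    from sklearn.model_selection import train_test_split

    X = np.random.randn(200, 20)
    y = np.random.randint(0, 2, 200)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    estimator = LinearSVC()
    estimator.fit(X_train, y_train)            # learn a model from the data
    y_pred = estimator.predict(X_test)         # predict using the learned model
    print(estimator.score(X_test, y_test))     # goodness of fit (accuracy here)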

Page 45: Brain reading, compressive sensing, fMRI and statistical learning in Python

3 scikit-learn: machine learning in Python
Computational performance

              scikit-learn   mlpy    pybrain   pymvpa   mdp     shogun
  SVM         5.2            9.47    17.5      11.52    40.48   5.63
  LARS        1.17           105.3   -         37.35    -       -
  Elastic Net 0.52           73.7    -         1.44     -       -
  kNN         0.57           1.41    -         0.56     0.58    1.36
  PCA         0.18           -       -         8.93     0.47    0.33
  k-Means     1.34           0.79    ∞         -        35.75   0.68

Algorithms rather than low-level optimization (convex optimization + machine learning)

Avoid memory copies

G Varoquaux 33

Page 46: Brain reading, compressive sensing, fMRI and statistical learning in Python

3 scikit-learn: machine learning in Python
Community

163 contributors since 2008, 397 GitHub forks
25 contributors in the latest release (3-month span)

Why this success?
- Trendy topic?
- Low barrier to entry
- Friendly and very skilled mailing list
- Credit to people

G Varoquaux 34

Page 47: Brain reading, compressive sensing, fMRI and statistical learning in Python

3 Research code ≠ software library
Factor 10 in time investment:

Corner cases in the algorithms (numerical stability)

Multiple platforms and library versions (BLAS ...)

Documentation

Making it simpler (and getting less educated users)

User and developer support (∼ 100 mails/day)

Exhausting, but it has an impact on science and society

Technical + scientific tradeoffs

Ease of install / ease of use rather than speed

Focus on “old science”

Nice publications and theorems are not a recipe for useful code

G Varoquaux 35


Page 49: Brain reading, compressive sensing, fMRI and statistical learning in Python

Statistical learning to study brain function

Spatial regularization for predictive models: total variation

Compressive-sensing approach: sparsity + randomized clustering for correlated designs

Machine learning in Python: huge impact

Post-doc positions available
G Varoquaux 36

Page 50: Brain reading, compressive sensing, fMRI and statistical learning in Python

Bibliography
[Michel TMI 2011] V. Michel, et al., Total variation regularization for fMRI-based prediction of behaviour, IEEE Transactions on Medical Imaging (2011). http://hal.inria.fr/inria-00563468/en

[Varoquaux ICML 2012] G. Varoquaux, A. Gramfort, B. Thirion, Small-sample brain mapping: sparse recovery on spatially correlated designs with randomization and clustering, ICML (2012). http://hal.inria.fr/hal-00705192/en

[Michel Pat Rec 2011] V. Michel, et al., A supervised clustering approach for fMRI-based inference of brain states, Pattern Recognition (2011). http://hal.inria.fr/inria-00589201/en

[Pedregosa ICML 2011] F. Pedregosa, et al., Scikit-learn: machine learning in Python, JMLR (2011). http://hal.inria.fr/hal-00650905/en

G Varoquaux 37