Upload
mlconf
View
83
Download
1
Embed Size (px)
Citation preview
minx ||y - Ax||2 + λ ||x||1
Irina Rish IBM T.J. Watson Research Center
Learning About the Brain and
Brain-Inspired Learning
Collaborators (an incomplete list)
IBM T.J. Watson Research:
Guillermo Cecchi Steve Heisig Aurelie Lozano
Google: Mt Sinai: Northwestern U.
Melissa Carroll Rita Goldstein A. Vania Apkarian INRIA: Neurospin/UC Berkeley
Bertrand Thirion
MIT:
Pouya Bashivan
Purdue:
Jean Honorio
Lehigh U.
Katya Scheinberg
SUNY Stony Brook
Dimitris Samaras
St. Johns U.
Genady GrabarnikJB Poline
USC:
Sahil Garg
AI Brain2
Brain 2 AI: Brain-inspired AI Algorithms
AI 2 Brain: Mental-State Predictionand Statistical Biomarker Discovery
Mental State Recognition to Improve Mental Function
Detecting emotional & cognitive changes to predict response to different types of input, e.g. music, video, news, ads, emails (both for mental health and for neuromarketing)
Safety: detecting changes in driver’s alertness level (drowsiness, microsleeps) to prevent accidents
Computational psychiatry: data-analytic approach to diagnosis based on objective measurements(new Research Domain Criteria (RDoC) initiative by NIMH)
Our current focus: schizophrenia, addiction, Huntington’s, Alzheimer’s, Parkinson’s
“Psychiatric research is in crisis” [Wiecki et al. 2015]
AI 2 Brain:
Health & Productivity: mental-state-sensitive software monitoring cognitive load, focus/attention; monitoring stress/anxiety
Overview: Machine Learning in Neuroimaging “Statistical biomarkers”:
[Cecchi et al, NIPS 2009][Rish et al, PLOS One, 2013]
[Carroll et al, Neuroimage 2009][Scheinberg&Rish, ECML 2010]
Schizophrenia classification: 86% to 93% accuracy
[Rish et al, Brain Informatics 2010][Rish et al, SPIE Med.Imaging 2012][Cecchi et al, PLOS Comp Bio 2012]
Cognitive state prediction in videogames: 70-95%
Pain perception: 70-80%, distributed activation patterns
[Honorio et al, AISTATS 2012][Rish et al, SPIE Med.Imaging 2016]
Cocaine addiction: evaluating potential treatments
[Bashivan et al, ICLR 2016] EEG-cognitive load prediction: 91% w/ recurrent ConvNets
+++
+- - ---
Predictive Model
mentaldisorder
healthy
Example 1: Cocaine Addiction fMRI Study
Cocaine: Mechanism of Action • Cocaine affects the reward pathway in the brain (blocks the dopamine transporter)
• May lead to addiction: cocaine use disorder (CUD)
MPH: A Stimulant for a Stimulant?
A potential therapeutic agent for CUD?(e.g., similarly to nicotine patch and using methadone for heroin addiction)
Methylphenidate Hydrochloride (MPH)• Common ADHD treatment (Ritalin)
• Similarity to cocaine:
• chemical structure
• mechanism of action (blocks
dopamine transporter)
• Difference: slower rate of clearance
(90 vs 20 min), and thus a lower
dependence and abuse potential
• MPH has shown positive behavioral effects on CUD subjects [Levin 2007]
• MPH tends to normalize both task-related [Goldstein 2010] and resting-state functional activity in
certain areas [Konova 2013]
Resting-state Functional MRI Image courtesy of fMRI Research Center at Columbia University
Resting-state fMRI experiment: MPH vs. placebo [Konova et al 2013]
Features: functional network degrees
• Network link (i,j) correlation between BOLD signals of voxels i and j
exceeds a given threshold (e.g., > 0.6)
• Feature selection: univariate ranking based on p-values; multiple subsets of
top K features, with increasing K, are used to train classifiers
Classification Results: MPH Normalizes CUD’s Networks[Rish, Bashivan, Cecchi, Goldstein, SPIE 2016]
MPH ‘normalizes’ CUD networks:
CUD’s are harder to discriminate from
controls (10-20% increase in classification
error) under MPH vs under placebo
MPH has stronger effect on CUDs:
MPH (M2) vs Placebo (P2) condition
is much easier to discriminate for CUDs
rather than for controls
Leave-one-out CV with Nearest Neighbor (NN), Linear SVM, Decision Tree (DT), Random
Forest (RF), Logistic Regression (LR), Naïve Bayes (NB), Linear Discriminant Analysis (LDA)
Example 2: Working Memory Load Classification from EEG)
EEG Experiment: 64-electrode EEG
Working Memory task, 4 levels of difficulty:
2,4,6, or 8 symbols to remember
13 subjects, 240 trials each (=3120 trial)
[Bashivan, Rish, Yeasin, Codella, ICLR 2016]
Classification Problem: given time-series recorded during each trial of WM task, predict WM load level
Data Samples: 2670 correctly answered trials (a subset of total 3120)
Feature Extraction: FFT to find spectral power within each electrode at three frequency bands - theta (4-8Hz), alpha (8-13Hz), and beta (13-30Hz).
Evaluation: leave-one-subject out (i.e., 13-fold) CV
Brain ‘Movie’ Classification with Recurrent ConvNets
• Idea: combine spatial, temporal and frequency – make EEG ‘movies’• EEG images: project 3D electrode locations (64) into a 2D via distance-preserving Azimuthal
Equidistant Projection, then interpolate the activity • RGB colores = Theta, Alpha, Beta frequences• Each trial = 7 frames (RGB images) short “movies” as samples
• FFT over the complete trial single image for each trial
• VGG style ConvNets [Simonyan & Zisserman, 2015]
• Conv layers with 3 x 3 receptive
fields
• 4 architectures, increasing depth; deeper is (slightly) better
Baseline: Non-temporal Approach with ConvNets ConvNet Configurations
A B C Dinput (32 x 32 3-channel image)
Conv3-32Conv3-32
Conv3-32Conv3-32
Conv3-32Conv3-32
Conv3-32Conv3-32Conv3-32Conv3-32
maxpool
- Conv3-64Conv3-64
Conv3-64Conv3-64
Conv3-64Conv3-64
- maxpool
- - Conv3-128
Conv3-128
- - maxpool
Architecture Number of parameters Test Error
A ~10k 13.05B ~65.5k 13.17C ~139.4k 13.91D ~158k 12.39
Adding Time is Better: Recurrent ConvNets
Best result: 8.9% error discriminating among 4 levels of cognitive load achieved by recurrent Conv Nets with LSTM + time convolution
• EEG times series for each trial split into 7 windows (0.5 sec). FFT on each time window to get an image as before
• Best ConvNet (7-layer) used as C component
• All 7 ConvNets shared parameters
• video classification architectures from [Ng et al, CVPR 2015]
• Temporal Maxpool: Max pool over time frames
• Temporal Convolution: 1D convolution over time frames
• LSTM - sequence mapping over times frames
• Mixed LSTM/1D-Conv: Combination of both LSTM and 1D-Conv architectures
Architecture Test Error (%)
Validation Error (%)
Number of parameters
RBF SVM 15.34 - -L1-logistic regression 15.32 -
-
Random Forest 12.59 - -DBN 14.96 8.37 1.02 mil
ConvNet+Maxpool 14.80 8.48
1.21 mil
ConvNet+1D-Conv 11.32 9.28
441 k
ConvNet+LSTM 10.54 6.10 1.34 milConvNet+LSTM/
1D-Conv 8.89 8.39 1.62 mil
[Bashivan, Rish, Yeasin, Codella, ICLR 2016]
Interpretability via Deconvolution
Code: https://github.com/pbashivan/EEGLearn
Using deconvnet of [Zeiler et al] to map features back to the brain images
Back Projections: maps obtained by deconvnet on the feature map displaying structures in the input image that excite that particular feature map.
Some of these features correspond to well-known electrophysiological markers of cognitive load.
First-layer features (1st stack, kernel 7) captured wide-spread theta (1st stack output-kernel7) and another (1st stack, kernel 23) frontal beta activity
Second- and third-layer features – frontal theta/beta (2nd stack,kernel7) and 3rd
stack kernel60, 112) as well as parietal alpha (2nd stack kernel29) .
Frontal theta and beta activity as well as parietal alpha are most prominent markers of cognitive/memory load in neuroscience literature [Bashivan et al., 2015; Jensen et al., 2002; Onton et al., 2005; Tallon-Baudry et al., 1999]
Input EEG images: top 9 images with highestfeature activations across the training set La
yer 4
Laye
r 6La
yer 7
• Current theories: the hippocampus functions as an autoenconder to evoke memories; similar encoding function is suggested in the olfactory bulb
• Our computational model: sparse linear autoencoder (online dictionary learning of Mairal et al) + dynamic addition (birth) abnd deletion (death) of hidden nodes
Adult Neurogenesis: Inspiration for Adaptive Representation Learning
• Predominant in the dentate gyrus of the hippocampus and in the olfactory bulb
Olfactory bulb Dentate gyrus
[Garg, Rish, Cecchi, Lozano, ICLR 2017]
n sa
mpl
es
p variables~~
m b
asis
vect
ors
(dict
iona
ry)
sparse representation
input x
output x’ reconstructed x
hidden nodes c encoded x
link weights ‘dictionary’ D
c c
Brain 2 AI:
Better Adaptation in Non-Stationary Environment
Learned dictionary size ‘Old’ domain reconstruction ‘New’ domain reconstruction
non-stationary visual input
Outperforms fixed-size autoencoder on non-stationary input:improved accuracy + more compact representation
Adapts to a new domain without forgetting the old one (via ‘memory’ matrices, part of original Mairal’s method)
Some Lessons
In brain imaging applications Datasets are relatively small (e.g., few 1000 samples)
Model interpretability is important
Brain-inspired algorithms: Neurogenesis, attention, memory and many other brain phenomena can serve
as inspiration for better AI algorithms Challenge: deeper understanding and better modeling of such phenomena
Deep learning faces specific challenges in neuroimaging Need for stronger regularization Need for interpretability (e.g., deconvolution, sparsity)
References [Garg, Rish, Cecchi, Lozano 2016; submitted] S. Garg, I. Rish, G. Cecchi, A. Lozano. Neurogenesis-inspired Dictionary Learning: Online Model Adaptation in a changing world, submitted to ICLR-2017
[Bashivan et al, ICLR 2016] P. Bashivan, I. Rish, M. Yeasin, N. Codella. Learning Representations from EEG with Deep Recurrent-Convolutional Neural Networks. ICLR 2016 : International Conference on Learning Representations.
[Bashivan et al, 2015] Mental State Recognition via Wearable EEG, in Proc. of MLINI-2015 workshop at NIPS-2015.
[Heisig et al, 2014] S. Heisig, G. Cecchi, R. Rao and I. Rish. Augmented Human: Human OS for Improved Mental Function. AAAI 2014 Workshop on Cognitive Computing and Augmented Human Intelligence.
[Neuropsychopharmacology, 2014] A Window into the Intoxicated Mind? Speech as an Index of Psychoactive Drug Effects. Bedi G, Cecchi G A, Fernandez Slezak D, Carrillo F, Sigman M, de Wit H. Neuropsychopharmacology, 2014
[NPJ 2015] G. Bedi, F. Carrillo, G. A Cecchi, D. F. Slezak, M. Sigman, N. B Mota, S. Ribeiro, D C Javitt, M. Copelli, C M Corcoran. Automated analysis of free speech predicts psychosis onset in high-risk youths. NPJ Schizophrenia 2015.
[PLoS ONE, 2013] Schizophrenia as a Network Disease: Disruption of Emergent Brain Function in Patients with Auditory Hallucinations, I Rish, G Cecchi, B Thyreau, B Thirion, M Plaze, M-L Paillere-Martinot, C Martelli, J-L Martinot, J-B Poline. PloS ONE 8(1), e50625, Public Library of Science, 2013.
[PLoS One, 2012] Speech Graphs Provide a Quantitative Measure of Thought Disorder in Psychosis. N.B. Mota, N.A.P. Vasconcelos, N. Lemos, A.C. Pieretti, O. Kinouchi, G.A. Cecchi, M. Copelli, S. Ribeiro. PLoS One, 2012
[Rish et al, SPIE 2016] I.Rish, P. Bashivan, G. A. Cecchi, R.Z. Goldstein, Evaluating Effects of Methylphenidate on Brain Activity in Cocaine Addiction: A Machine-Learning Approach. SPIE Medical Imaging, 2016
[SPIE Med.Imaging 2012] Sparse regression analysis of task-relevant information distribution in the brain.Irina Rish, Guillermo A Cecchi, Kyle Heuton, Marwan N Baliki, A Vania Apkarian, SPIE Medical Imaging, 2012.
[AISTATS 2012] J. Honorio, D. Samaras, I. Rish, G.A. Cecchi. Variable Selection for Gaussian Graphical Models. AISTATS, 2012.
[PLoS Comp Bio 2012] Predictive Dynamics of Human Pain Perception, GA Cecchi, L Huang, J Ali Hashmi, M Baliki, MV Centeno, I Rish, AV Apkarian, PLoS Comp Bio 8(10), e1002719, Public Library of Science, 2012.
[Brain Informatics 2010] I. Rish, G. Cecchi, M.N. Baliki and A.V. Apkarian. Sparse Regression Models of Pain Perception, in Proc. of Brain Informatics (BI-2010), Toronto, Canada, August 2010. [NeuroImage, 2009] Prediction and interpretation of distributed neural activity with sparse models. Melissa K Carroll, Guillermo A Cecchi, Irina Rish, Rahul Garg, A Ravishankar Rao. NeuroImage 44(1), 112--122, Elsevier, 2009. [NIPS, 2009] Discriminative network models of schizophrenia, GA Cecchi, I Rish, B Thyreau, B Thirion, M Plaze, M-L Paillere-Martinot, C Martelli, J-L Martinot, J-B Poline. Advances in Neural Information Processing Systems (NIPS 2009) , pp. 252--260, 2009.