Analyzing Brain Signals by Combinatorial Optimization. Justin Dauwels, LIDS, MIT, and Amari Research Unit, Brain Science Institute, RIKEN. December 1, 2008. Quantifying statistical interdependence of point processes; application to spike data and EEG.

Slide 1: Analyzing Brain Signals by Combinatorial Optimization
Justin Dauwels, LIDS, MIT, and Amari Research Unit, Brain Science Institute, RIKEN. December 1, 2008. Quantifying statistical interdependence of point processes; application to spike data and EEG.

Slide 2: Topics
- Mathematical problem: similarity of multiple point processes
- Motivation/application: early diagnosis of Alzheimer's disease from EEG signals
- Along the way: spike synchrony
- Collaborators: François Vialatte*, Théophane Weber+, and Andrzej Cichocki* (*RIKEN, +MIT)
- Financial support

Slide 3: Alzheimer's disease: evolution of the disease (stages); one disease, many symptoms
- 2 to 5 years before: mild cognitive impairment (MCI); 6% to 25% progress to Alzheimer's; affects memory, language, executive functions; apraxia, apathy, agnosia, etc.
- Mild (early stage): the patient becomes less energetic or spontaneous, shows noticeable cognitive deficits, but is still independent (able to compensate)
- Moderate (middle stage): mental abilities decline, the personality changes, and the patient becomes dependent on caregivers
- Severe (late stage): complete deterioration of the personality, loss of control over bodily functions, total dependence on caregivers
- Typical symptoms: apathy, memory loss (forgetting relatives)
- Prevalence: 2% to 5% of people over 65 years old, up to 20% of people over 80 (Jeong 2004, Nature)
- GOAL: diagnosis of MCI based on EEG, a relatively simple and inexpensive technology. Early diagnosis means medication is more effective and there is more time to prepare the future care of the patient.
(Video sources: Alzheimer society)

Slide 4: Overview
Alzheimer's disease (AD): decrease in EEG synchrony. Similarity of point processes: two 1-D point processes, two multi-D point processes, multiple multi-D point processes. Numerical results. Conclusion.

Slide 5: Alzheimer's disease, inside glimpse: abnormal EEG
- Decrease of synchrony: AD vs. MCI (Hogan et al. 2003; Jiang et al. 2005), AD vs. control (Herrmann & Demiralp 2005; Yagyu et al. 1997; Stam et al. 2002; Babiloni et al. 2006), MCI vs. mild AD (Babiloni et al. 2006)
- Brain "slow-down": more slow rhythms (0.5-8 Hz), fewer fast rhythms (8-30 Hz) (Babiloni et al. 2004; Besthorn et al. 1997; Jelic et al. 1996; Jeong 2004; Dierks et al. 1993)
- The EEG system is inexpensive and mobile, hence useful for screening: the focus of this project
(Images: www.cerebromente.org.br)

Slide 6: Spontaneous (scalp) EEG
[Figure: EEG signal x(t); Fourier power spectrum |X(f)|^2; time-frequency map |X(t,f)|^2 (wavelet transform); time-frequency patterns ("bumps").]

Slide 7: Sparse representation: bump model
Assumptions: (1) the time-frequency map is a suitable representation; (2) oscillatory bursts ("bumps") convey the key information. Bump modelling reduces the 10^4 to 10^5 time-frequency coefficients to about 10^2 parameters.
F. Vialatte et al., "A machine learning approach to the analysis of time-frequency maps and its application to neural dynamics," Neural Networks (2007).
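The first preprocessing step above (the wavelet time-frequency map of Slide 6) can be sketched as follows. This is a minimal illustration with a hand-rolled complex Morlet transform; the signal, sampling rate, frequency grid, and number of cycles are assumptions, and the bump extraction of Vialatte et al. (2007) is a separate, more elaborate step (see Slide 58).

```python
import numpy as np

def morlet_tf_map(x, fs, freqs, n_cycles=7.0):
    """Time-frequency power map |X(t,f)|^2 by convolving the signal with
    complex Morlet wavelets (one wavelet per requested frequency).
    Assumes every wavelet is shorter than the signal."""
    x = np.asarray(x, dtype=float)
    power = np.empty((len(freqs), len(x)))
    for i, f in enumerate(freqs):
        sigma_t = n_cycles / (2.0 * np.pi * f)              # wavelet width in time
        t = np.arange(-4 * sigma_t, 4 * sigma_t, 1.0 / fs)
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t**2 / (2 * sigma_t**2))
        wavelet /= np.sqrt(np.sum(np.abs(wavelet)**2))      # unit energy
        power[i] = np.abs(np.convolve(x, wavelet, mode="same"))**2
    return power

# Illustrative use: 3 s of a noisy 10 Hz burst sampled at 100 Hz (cf. Slide 63),
# analysed on the 4-30 Hz band used for the EEG data (cf. Slide 44).
fs = 100.0
t = np.arange(0.0, 3.0, 1.0 / fs)
x = np.sin(2 * np.pi * 10 * t) * ((t > 1) & (t < 2)) + 0.5 * np.random.randn(t.size)
tfmap = morlet_tf_map(x, fs, freqs=np.arange(4, 31))
print(tfmap.shape)    # (number of frequencies, number of time samples)
```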
Slide 8: Similarity of bump models
How similar are n >= 2 bump models? This is a question about the similarity of multiple multi-dimensional point processes, with each bump as a point/event.

Slide 9: Overview
Alzheimer's disease (AD): decrease in EEG synchrony. Similarity of point processes: two 1-dim point processes, two multi-dim point processes, multiple multi-dim point processes. Numerical results. Conclusion.

Slide 10: Two one-dimensional point processes
Given two event sequences x and x', how synchronous/similar are they? Classical methods for continuous time series, e.g. cross-correlation, fail.

Slide 11: Two aspects of synchrony
Analogy: waiting for a train. The train may not arrive at all (e.g., a mechanical problem): event reliability. The train may or may not be on time: timing precision.

Slide 12: Two 1-dim point processes
Review of spike synchrony measures; surrogate spike data; spike trains from a Morris-Lecar neuron; conclusion.

Slide 13: Spike synchrony measures
- van Rossum distance (mixed)
- Schreiber et al. similarity measure (mixed)
- Hunter-Milton similarity measure (mixed)
- Victor-Purpura distance metric (event reliability)
- Event synchronization (mixed)
- Stochastic event synchrony (timing precision and event reliability)

Slide 14: Van Rossum distance measure
The spikes are convolved with an exponential or Gaussian function, so the spike trains x and x' are converted into time series s(t) and s'(t); D_R is the squared distance between s(t) and s'(t). If x = x', then D_R = 0. Time constant tau_R.
van Rossum M. C. W., 2001. A novel spike distance. Neural Computation 13, 751-763.

Slide 15: Schreiber et al. similarity measure
The spikes are convolved with an exponential or Gaussian function, so the spike trains are converted into time series s(t) and s'(t); S_S is the correlation between s(t) and s'(t). If x = x', then S_S = 1. Time constant tau_S.
Schreiber S., Fellous J. M., Whitmer J. H., Tiesinga P. H. E., and Sejnowski T. J., 2003. A new correlation-based measure of spike timing reliability. Neurocomputing 52, 925-931.

Slide 16: Victor-Purpura distance measure
D_V is the minimal cost of transforming x into x'. Basic operations: event insertion/deletion (cost 1) and event movement (cost proportional to the distance moved, with constant C_V). If x = x', then D_V = 0. Time constant tau_V = 1/C_V.
Victor J. D. and Purpura K. P., 1997. Metric-space analysis of spike trains: theory, algorithms, and application. Network: Computation in Neural Systems 8, 127-164.

Slide 17: Stochastic event synchrony
x and x' are considered synchronous if they are identical apart from a delay, little timing jitter, and few deletions/insertions. Based on a generative statistical model with a hidden event sequence v.
Dauwels J., Vialatte F., Rutkowski T., and Cichocki A., 2007. Measuring neural synchrony by message passing, NIPS 20, in press.

Slide 18: Stochastic event synchrony
[Figure: hidden events v in [0, T0]; the two observations x and x' are shifted by -delta_t/2 and +delta_t/2; some events are non-coincident.] Stochastic event synchrony (SES): delay delta_t, jitter s_t, fraction of non-coincident events rho.
Dauwels J., Vialatte F., Rutkowski T., and Cichocki A., 2007. Measuring neural synchrony by message passing, NIPS 20, in press.

Slide 19: Stochastic event synchrony
Marginalizing over v: geometric prior for the length of the hidden sequence; hidden events i.i.d. uniform in [0, T0]; Gaussian offsets with mean -delta_t/2 and variance s_t/2 for x, and mean +delta_t/2 and variance s_t/2 for x'; i.i.d. deletions with probability p_d; the remaining events are non-coincident.
Dauwels J., Vialatte F., Rutkowski T., and Cichocki A., 2007. Measuring neural synchrony by message passing, NIPS 20, in press.
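A minimal sketch of the generative model of Slides 18-19 (and of the surrogate data used on Slide 23): hidden events drawn uniformly in [0, T0], two observed trains obtained by shifting them by -delta_t/2 and +delta_t/2 plus Gaussian jitter of variance s_t/2, and deleting each event independently with probability p_d. The parameter values are the illustrative ones from Slide 23; T0 = 1 s is an assumption, not a setting from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ses_pair(T0=1.0, delta_t=0.025, s_t=0.030**2, p_d=0.1):
    """Draw a pair of 1-D point processes (x, x') from the SES generative model."""
    n_hidden = int(round(40 / (1 - p_d)))                  # as on Slide 23
    v = np.sort(rng.uniform(0.0, T0, size=n_hidden))       # hidden events, i.i.d. uniform
    def observe(sign):
        offset = rng.normal(sign * delta_t / 2, np.sqrt(s_t / 2), size=n_hidden)
        keep = rng.random(n_hidden) > p_d                  # i.i.d. deletions
        return np.sort(v[keep] + offset[keep])
    return observe(-1.0), observe(+1.0)

x, xp = simulate_ses_pair()
print(len(x), len(xp))        # expected length of x and x' is about 40
```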
Slide 20: Probabilistic inference
PROBLEM: given two point processes x and x', compute rho and theta = (delta_t, s_t).
APPROACH: (j*, j'*, theta*) = argmax_{j, j', theta} log p(x, x', j, j', theta).
SOLUTION: coordinate descent:
(j^(i+1), j'^(i+1)) = argmax_{j, j'} log p(x, x', j, j', theta^(i))   (dynamic programming)
theta^(i+1) = argmax_theta log p(x, x', j^(i+1), j'^(i+1), theta)   (parameter estimation)
[Figure: events x_1,...,x_6 and x'_1,...,x'_6; x_k non-coincident, x'_k' non-coincident, (x_k, x'_k') coincident pair.]

Slide 21: Spike synchrony measures
van Rossum distance (mixed); Schreiber et al. similarity measure (mixed); Hunter-Milton similarity measure (mixed); Victor-Purpura distance metric (event reliability); event synchronization (mixed); stochastic event synchrony (timing precision and event reliability).

Slide 22: Two 1-dim point processes
Review of spike synchrony measures; surrogate spike data; spike trains from a Morris-Lecar neuron; conclusion.

Slide 23: Surrogate data
p_d = 0, 0.1, ..., 0.4 (deletion probability); delta_t = 0, 25, and 50 ms (delay); timing jitter of 10, 30, and 50 ms. The length of the hidden sequence is 40/(1 - p_d), so the expected length of x and x' is 40. E{S} is computed over 10,000 pairs.

Slide 24: Surrogate data: results
E{D_R} increases with both p_d and the timing jitter, so the van Rossum measure D_R cannot distinguish timing dispersion from event reliability (likewise all measures except SES and D_V). E{D_V} increases with p_d and is practically independent of the timing jitter, so the Victor-Purpura measure D_V captures event reliability ONLY. The curves shown are for delta_t = 0 ms; the measures depend strongly on the lag. Similar behavior for S_S, S_H, S_Q.

Slide 25: Surrogate data: results for SES
The estimated jitter E{s_t} increases with the true timing jitter and is practically independent of p_d: s_t is a measure of timing dispersion. E{rho} increases with p_d and is practically independent of the timing jitter: rho is a measure of event reliability. The curves for delta_t = 0, 25, and 50 ms practically coincide.

Slide 26: Two 1-dim point processes
Review of spike synchrony measures; surrogate spike data; spike trains from a Morris-Lecar neuron; conclusion.

Slide 27: Morris-Lecar neurons
A simple neuron model that exhibits the behavior of both Type I and Type II neurons (saddle-node vs. Hopf bifurcation). Input current: baseline + sinusoid + Gaussian noise. [Figure: membrane potential and spiking threshold for Type I and Type II, 5 trials.]
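For readers who want to reproduce the kind of spike trains used in this section, here is a minimal Euler-integration sketch of the Morris-Lecar model with the input current described on Slide 27 (baseline + sinusoid + Gaussian noise). The equations are the standard Morris-Lecar ones; the parameter values are a common textbook Type II (Hopf) setting and the noise injection is deliberately crude, so they should be read as assumptions rather than the settings used in the talk.

```python
import numpy as np

# Standard Type II (Hopf) Morris-Lecar parameters (illustrative assumptions).
C, g_L, V_L = 20.0, 2.0, -60.0        # uF/cm^2, mS/cm^2, mV
g_Ca, V_Ca = 4.4, 120.0
g_K, V_K = 8.0, -84.0
V1, V2, V3, V4, phi = -1.2, 18.0, 2.0, 30.0, 0.04

def simulate_morris_lecar(T=1000.0, dt=0.05, I0=90.0, I1=5.0, f=0.01,
                          noise=2.0, threshold=0.0, seed=0):
    """Euler integration (time in ms); returns spike times, i.e. upward
    crossings of `threshold` by the membrane potential."""
    rng = np.random.default_rng(seed)
    V, w, above, spikes = -60.0, 0.0, False, []
    for k in range(int(T / dt)):
        t = k * dt
        # Input current: baseline + sinusoid + (crudely injected) Gaussian noise.
        I = I0 + I1 * np.sin(2 * np.pi * f * t) + noise * rng.standard_normal()
        m_inf = 0.5 * (1 + np.tanh((V - V1) / V2))
        w_inf = 0.5 * (1 + np.tanh((V - V3) / V4))
        tau_w = 1.0 / np.cosh((V - V3) / (2 * V4))
        dV = (I - g_L*(V - V_L) - g_Ca*m_inf*(V - V_Ca) - g_K*w*(V - V_K)) / C
        dw = phi * (w_inf - w) / tau_w
        V, w = V + dt * dV, w + dt * dw
        if V > threshold and not above:
            spikes.append(t)
        above = V > threshold
    return np.array(spikes)

# Five noisy trials with the same sinusoidal drive (cf. the 5-trial raster).
trials = [simulate_morris_lecar(seed=s) for s in range(5)]
print([len(s) for s in trials])
```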
Slide 28: Morris-Lecar neurons (2)
50 trials. Type I: high reliability, large timing dispersion (jitter s_t = (15 ms)^2, non-coincidence rho = 3%). Type II: low reliability, small timing dispersion (jitter s_t = (3 ms)^2, non-coincidence rho = 27%).

Slide 29: Morris-Lecar neurons: results
For small time constants, Type II has larger similarity than Type I (because of the timing dispersion in Type I); for large time constants, Type I has larger similarity than Type II (because of the drop-outs in Type II). Observation: the similarity depends on the time constant, i.e. it is a similarity FUNCTION S(tau). SES AUTOMATICALLY selects s_t.

Slide 30: Two 1-dim point processes
Review of spike synchrony measures; surrogate spike data; spike trains from a Morris-Lecar neuron; conclusion.

Slide 31: Conclusion (pairs of spike trains)
Similarity of pairs of spike trains has two aspects: timing precision and reliability. Comparison of various spike synchrony measures: most measures are not able to separate the two aspects of synchrony; the exceptions are Victor-Purpura (event reliability) and stochastic event synchrony (both timing precision and event reliability). Most measures depend on a time constant to be chosen by the user; the exceptions are event synchronization and SES. Most measures are sensitive to lags between the two spike trains; the exception is SES. Future work: application to neurophysiological recordings.

Slide 32: Overview
Alzheimer's disease (AD): decrease in EEG synchrony. Similarity of point processes: two 1-dim, two multi-dim, multiple multi-dim point processes. Numerical results. Conclusion.

Slide 33: Similarity of two bump models...

Slide 34: ... by matching bumps
Bumps that appear in one model but not in the other give the fraction rho of non-coincident bumps. Bumps that appear in both models but with an offset give the average time offset delta_t (delay), the timing jitter with variance s_t, the average frequency offset delta_f, and the frequency jitter with variance s_f. PROBLEM: given two bump models, compute the stochastic event synchrony (SES) parameters (rho, delta_t, s_t, delta_f, s_f).

Slide 35: Generative model
Generate a hidden bump model: geometric prior for the number of bumps; bumps uniformly distributed in the time-frequency rectangle; amplitude and width (in t and f) all i.i.d. Generate two noisy observations: the offset between a hidden bump and its observed copy is a Gaussian random vector with mean (-delta_t/2, -delta_f/2) for one observation and (+delta_t/2, +delta_f/2) for the other, and covariance diag(s_t/2, s_f/2); amplitude and width (in t and f) all i.i.d.; deletion with probability p_d.
Dauwels J., Vialatte F., Rutkowski T., and Cichocki A., 2007. Measuring neural synchrony by message passing, NIPS 20, in press.

Slide 36: Summary
PROBLEM: given two bump models, compute (rho, delta_t, s_t, delta_f, s_f).
APPROACH: (c*, theta*) = argmax_{c, theta} log p(y, y', c, theta).
SOLUTION: coordinate descent:
c^(i+1) = argmax_c log p(y, y', c, theta^(i))   (MATCHING: max-product)
theta^(i+1) = argmax_theta log p(y, y', c^(i+1), theta)   (ESTIMATION: closed form)
Dauwels J., Vialatte F., Rutkowski T., and Cichocki A., 2007. Measuring neural synchrony by message passing, NIPS 20, in press.

Slide 37: Average synchrony
1. Group the electrodes in regions. 2. Compute a bump model for each region. 3. Compute SES for each pair of models. 4. Average the SES parameters.
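A compact sketch of one coordinate-descent sweep from Slide 36, for two bump models represented by their (t, f) centre coordinates. The matching step is solved here with the Hungarian algorithm (scipy's linear_sum_assignment) on a cost matrix padded with a per-bump "stay unmatched" cost, as a stand-in for the max-product / max-weight matching algorithms discussed later (Slides 71-75); the estimation step uses the closed-form offset means and variances of Slide 54. The weights and the knob beta follow the form on Slide 79; all numerical values are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_and_estimate(y, yp, theta, beta=0.05):
    """One coordinate-descent sweep (Slide 36) for two bump models.
    y, yp: arrays of shape (n, 2) and (m, 2) with columns (t, f).
    theta: dict with keys d_t, s_t, d_f, s_f."""
    n, m = len(y), len(yp)
    dt = y[:, 0][:, None] - yp[:, 0][None, :] - theta["d_t"]
    df = y[:, 1][:, None] - yp[:, 1][None, :] - theta["d_f"]
    cost = dt**2 / theta["s_t"] + df**2 / theta["s_f"]   # matched-pair cost
    # Per-bump cost of staying unmatched, chosen so that a pair is matched
    # exactly when its weight w_kk' = -cost - 2*log(beta) is positive (Slide 79).
    skip = -np.log(beta)
    big = np.zeros((n + m, n + m))
    big[:n, :m] = cost
    big[:n, m:] = np.where(np.eye(n) > 0, skip, 1e9)     # bump of y unmatched
    big[n:, :m] = np.where(np.eye(m) > 0, skip, 1e9)     # bump of y' unmatched
    rows, cols = linear_sum_assignment(big)              # MATCHING step
    pairs = [(r, c) for r, c in zip(rows, cols) if r < n and c < m]
    rho = 1.0 - 2.0 * len(pairs) / (n + m)               # fraction non-coincident
    if pairs:                                            # ESTIMATION step (Slide 54)
        off_t = np.array([y[r, 0] - yp[c, 0] for r, c in pairs])
        off_f = np.array([y[r, 1] - yp[c, 1] for r, c in pairs])
        theta = {"d_t": off_t.mean(), "s_t": off_t.var() + 1e-9,
                 "d_f": off_f.mean(), "s_f": off_f.var() + 1e-9}
    return pairs, rho, theta

# Illustrative use: y' is a delayed, jittered copy of y.
rng = np.random.default_rng(1)
y = np.column_stack([rng.uniform(0, 3, 30), rng.uniform(4, 30, 30)])   # (t, f)
yp = y + rng.normal([0.05, 0.0], [0.03, 1.0], size=y.shape)
theta = {"d_t": 0.0, "s_t": 0.01, "d_f": 0.0, "s_f": 1.0}
for _ in range(5):                                       # coordinate descent
    pairs, rho, theta = match_and_estimate(y, yp, theta)
print(rho, theta)
```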
Slide 38: Overview
Alzheimer's disease (AD): decrease in EEG synchrony. Similarity of point processes: two 1-dim, two multi-dim, multiple multi-dim point processes. Numerical results. Conclusion.

Slide 39: Beyond pairwise interactions
From pairwise similarity to multivariate similarity.

Slide 40: Similarity of multiple bump models
Bumps from the models y_1,...,y_5 are grouped into clusters. Constraint: each cluster contains at most one bump from each signal. The models are similar if there are few deletions (i.e., large clusters) and little jitter.
Dauwels J., Vialatte F., Weber T., and Cichocki A. Analyzing Brain Signals by Combinatorial Optimization, Allerton 2008.

Slide 41: Generative model
Generate a hidden bump model: geometric prior for the number n of bumps; bumps uniformly distributed in the rectangle; amplitude and width (in t and f) all i.i.d. Generate M noisy observations y_1,...,y_M: the offset between a hidden bump and its observed copy in model m is a Gaussian random vector with mean (delta_t,m/2, delta_f,m/2) and covariance diag(s_t,m/2, s_f,m/2); amplitude and width (in t and f) all i.i.d.; deletion with probability p_d. Parameters: theta = (delta_t,m, delta_f,m, s_t,m, s_f,m, p_c), where p_c(i) = p(cluster size = i | y), i = 1, 2, ..., M.
Dauwels J., Vialatte F., Weber T., and Cichocki A. Analyzing Brain Signals by Combinatorial Optimization, Allerton 2008.

Slide 42: Probabilistic inference
PROBLEM: given M bump models, compute theta = (delta_t,m, delta_f,m, s_t,m, s_f,m, p_c).
APPROACH: (b*, theta*) = argmax_{b, theta} log p(y_1,...,y_M, b, theta).
SOLUTION: coordinate descent:
b^(i+1) = argmax_b log p(y_1,...,y_M, b, theta^(i))   (CLUSTERING: integer program)
theta^(i+1) = argmax_theta log p(y_1,...,y_M, b^(i+1), theta)   (ESTIMATION of parameters)
The clustering step is solved by integer programming methods (e.g., LP relaxation); an IP with 10,000 variables is solved in about 1 s with CPLEX, a commercial toolbox for solving IPs that combines several algorithms.
Dauwels J., Vialatte F., Weber T., and Cichocki A. Analyzing Brain Signals by Combinatorial Optimization, Allerton 2008.

Slide 43: Overview
Alzheimer's disease (AD): decrease in EEG synchrony. Similarity of point processes: two 1-dim, two multi-dim, multiple multi-dim point processes. Numerical results. Conclusion.

Slide 44: EEG data
EEG data provided by Prof. T. Musha: EEG of 22 mild cognitive impairment (MCI) patients and 38 age-matched control subjects (CTR), recorded at rest with eyes closed (spontaneous EEG). All 22 MCI patients later suffered from Alzheimer's disease (AD). Electrodes at 21 sites according to the 10-20 international system, grouped into 5 zones (which reduces the number of pairs), with one bump model per zone. Band-pass filtered between 4 and 30 Hz.

Slide 45: Similarity measures
- Correlation and coherence
- Granger causality (linear systems): DTF, ffDTF, dDTF, PDC, PC, ...
- Phase synchrony: compare instantaneous phases (wavelet/Hilbert transform)
- State-space measures: synchronization likelihood, S-estimator, S-H-N indices, ...
- Information-theoretic measures: KL divergence, Jensen-Shannon divergence, ...
(The measures can be organized by whether they assume phase locking and whether they operate in the time or frequency domain.)

Slide 46: Sensitivity (average synchrony)
Mann-Whitney test: a small p-value suggests a large difference between the statistics of the two groups. Significant differences are found for ffDTF and SES (more unmatched bumps in MCI, but the same amount of jitter). [Figure: p-values for correlation/coherence, Granger, information-theoretic, state-space, phase, and SES measures.]

Slide 47: Classification (bi-SES)
[Figure: scatter of the SES parameter against ffDTF; 85% correctly classified.] Clear separation, but not yet useful as a diagnostic tool; additional indicators are needed (fMRI, MEG, DTI, ...). It can, however, be used for screening a population (inexpensive, simple, fast).
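Slide 46 screens each synchrony measure with a Mann-Whitney test between the MCI and control groups. A minimal sketch of that step, assuming the per-subject values of one measure (e.g., the fraction of non-matched bumps, or ffDTF) are already available as two arrays; the numbers below are synthetic placeholders, not the study's data.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
# Synthetic placeholder values of one synchrony measure per subject
# (22 MCI patients, 38 controls, as in Slide 44); NOT the real data.
rho_mci = rng.normal(0.45, 0.08, size=22)
rho_ctr = rng.normal(0.38, 0.08, size=38)

# A small p-value suggests a large difference between the two groups (Slide 46).
stat, p = mannwhitneyu(rho_mci, rho_ctr, alternative="two-sided")
print(f"Mann-Whitney U = {stat:.1f}, p = {p:.4f}")
```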
Slide 48: Correlations
Strong (anti-)correlations within families of synchrony measures.

Slide 49: Overview
Alzheimer's disease (AD): decrease in EEG synchrony. Similarity of point processes: two 1-dim, two multi-dim, multiple multi-dim point processes. Numerical results. Conclusion.

Slide 50: Conclusions
A measure for the similarity of point processes; the key idea is matching of events. Applications: spike synchrony (surrogate data / Morris-Lecar neuron) and EEG synchrony of MCI patients. SES makes it possible to distinguish event reliability from timing precision. About 85-90% of subjects are correctly classified (MCI vs. healthy), which is perhaps useful for screening a large population. Future work: combination with other modalities (MEG, fMRI, ...), integration of biophysical models, alternative inference techniques (variations on max-product, Monte Carlo).

Slide 51: Analyzing Brain Signals by Combinatorial Optimization
Justin Dauwels, LIDS, MIT, and Amari Research Unit, Brain Science Institute, RIKEN. December 1, 2008. Quantifying statistical interdependence of point processes; application to spike data and EEG.

Slide 52: References + software
References:
- Quantifying Statistical Interdependence by Message Passing on Graphs. Part I: One-Dimensional Point Processes. Neural Computation (under revision).
- Quantifying Statistical Interdependence by Message Passing on Graphs. Part II: Multi-Dimensional Point Processes. Neural Computation (under revision).
- Quantifying Statistical Interdependence by Message Passing on Graphs. Part III: Multivariate Approach. Neural Computation (in preparation).
- A Comparative Study of Synchrony Measures for the Early Diagnosis of Alzheimer's Disease Based on EEG. NeuroImage (under revision).
- On the Early Diagnosis of Alzheimer's Disease Based on EEG. Current Alzheimer Research (in preparation, invited review).
- Measuring Neural Synchrony by Message Passing. NIPS 2007.
- Analyzing Brain Signals by Combinatorial Optimization. Allerton 2008.
Software: a MATLAB implementation of the synchrony measures and a MATLAB toolbox for bump modelling.

Slide 53: Summary
Similarity of multiple multi-dimensional point processes. Step 1: two ONE-dimensional point processes (dynamic programming). Step 2: two MULTI-dimensional point processes (max-product / LP relaxation / Edmonds-Karp). Step 3: MULTIPLE multi-dimensional point processes (integer programming).

Slide 54: Estimation
The deltas are the average offsets and the sigmas are the variances of the offsets: simple closed-form expressions, with artificial observations in the conjugate-prior case.

Slide 55: Large-scale synchrony
Apparently, all brain regions are affected...

Slide 56: Alzheimer's disease, outside glimpse: the future (prevalence)
Millions of sufferers projected for the USA (Hebert et al. 2003) and the world (Wimo et al. 2003). 2% to 5% of people over 65 years old, up to 20% of people over 80 (Jeong 2004, Nature).

Slide 57: Ongoing and future work
Algorithms: alternative inference techniques (e.g., MCMC, linear programming); time-dependent models (Gaussian processes); multivariate extensions (T. Weber). Applications, in particular fluctuations of EEG synchrony: caused by auditory stimuli and music (T. Rutkowski); caused by visual stimuli (F. Vialatte); yoga professionals (F. Vialatte); professional shogi players (RIKEN & Fujitsu); brain-computer interfaces (T. Rutkowski); spike data from interacting monkeys (N. Fujii); calcium propagation in glial cells (N. Nakata); neural growth (Y. Tsukada & Y. Sakumura); ...

Slide 58: Fitting bump models
A bump is initialised on the signal and then adapted with a gradient method.
F. Vialatte et al., "A machine learning approach to the analysis of time-frequency maps and its application to neural dynamics," Neural Networks (2007).
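A minimal sketch of the bump-fitting step on Slide 58, assuming a Gaussian-shaped bump and plain gradient descent on the squared error; Vialatte et al. (2007) actually use half-ellipsoid bump functions and a more careful adaptation scheme, so this is only an illustration, and the synthetic map and step sizes are assumptions.

```python
import numpy as np

def fit_bump(z, t, f, iters=1000, lr=0.1):
    """Fit one Gaussian-shaped bump a*exp(-((t-t0)/wt)^2 - ((f-f0)/wf)^2) to a
    time-frequency patch z (shape: len(f) x len(t)) by gradient descent on the
    squared error. Step sizes are illustrative and may need tuning."""
    T, F = np.meshgrid(t, f)
    i, j = np.unravel_index(np.argmax(z), z.shape)       # initialisation at the peak
    a, t0, f0, wt, wf = z[i, j], t[j], f[i], (t[-1] - t[0]) / 4, (f[-1] - f[0]) / 4
    scale = lr / z.size
    for _ in range(iters):
        g = np.exp(-((T - t0) / wt)**2 - ((F - f0) / wf)**2)
        r = a * g - z                                    # residual
        # Gradients of 0.5*sum(r**2) with respect to the five bump parameters.
        da  = np.sum(r * g)
        dt0 = np.sum(r * a * g * 2 * (T - t0) / wt**2)
        df0 = np.sum(r * a * g * 2 * (F - f0) / wf**2)
        dwt = np.sum(r * a * g * 2 * (T - t0)**2 / wt**3)
        dwf = np.sum(r * a * g * 2 * (F - f0)**2 / wf**3)
        a, t0, f0 = a - scale * da, t0 - scale * dt0, f0 - scale * df0
        wt, wf = max(wt - scale * dwt, 1e-3), max(wf - scale * dwf, 1e-3)
    return a, t0, f0, wt, wf

# Illustrative use on a synthetic map with one bump plus noise.
t = np.linspace(0, 3, 60)
f = np.linspace(4, 30, 40)
T, F = np.meshgrid(t, f)
z = 2.0 * np.exp(-((T - 1.2) / 0.3)**2 - ((F - 10.0) / 2.0)**2) \
    + 0.05 * np.random.randn(*T.shape)
print(fit_bump(z, t, f))
```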
Slide 59: Boxplots
Surprise: no increase in jitter, but significantly less matched activity! Physiological interpretation: are neural assemblies more localized, making it harder to establish large-scale synchrony?

Slide 60: Generative model
Generate a hidden bump model: geometric prior for the number n of bumps, p(n) = (1 - lambda_S) lambda_S^n; bumps uniformly distributed in the rectangle; amplitude and width (in t and f) all i.i.d. Generate two noisy observations: the offset between a hidden and an observed bump is a Gaussian random vector with mean (-delta_t/2, -delta_f/2) or (+delta_t/2, +delta_f/2) and covariance diag(s_t/2, s_f/2); amplitude and width (in t and f) all i.i.d.; deletion with probability p_d. Easily extendable to more than two observations.

Slide 61: Probabilistic inference
PROBLEM: given two bump models, compute (rho_spur, delta_t, s_t, delta_f, s_f).
APPROACH: (c*, theta*) = argmax_{c, theta} log p(y, y', c, theta).
SOLUTION: coordinate descent:
c^(i+1) = argmax_c log p(y, y', c, theta^(i))   (MATCHING)
theta^(i+1) = argmax_theta log p(y, y', c^(i+1), theta)   (POINT ESTIMATION)

Slide 62: Alzheimer's disease, inside glimpse: abnormal EEG
Decrease of synchrony: AD vs. MCI (Hogan et al. 2003; Jiang et al. 2005), AD vs. control (Herrmann & Demiralp 2005; Yagyu et al. 1997; Stam et al. 2002; Babiloni et al. 2006), MCI vs. mild AD (Babiloni et al. 2006). Brain "slow-down": more slow rhythms (0.5-8 Hz), fewer fast rhythms (8-30 Hz) (Babiloni et al. 2004; Besthorn et al. 1997; Jelic et al. 1996; Jeong 2004; Dierks et al. 1993). The EEG system is inexpensive and mobile, hence useful for screening: the focus of this project. (Images: www.cerebromente.org.br)

Slide 63: Comparing EEG signal rhythms?
PROBLEM I: signals of 3 seconds sampled at 100 Hz (about 300 samples); the time-frequency representation of one signal has about 25,000 coefficients, and there are two signals.

Slide 64: Comparing EEG signal rhythms? (2)
PROBLEM II: shifts in time-frequency! One pixel in one map corresponds to numerous neighboring pixels in the other.

Slide 65: Generative model
Generate a hidden bump model: geometric prior for the number n of bumps, p(n) = (1 - lambda_S) lambda_S^n; bumps uniformly distributed in the rectangle; amplitude and width (in t and f) all i.i.d. Generate M noisy observations y_1,...,y_M: the offset between a hidden bump and its observed copy in model m is a Gaussian random vector with mean (delta_t,m/2, delta_f,m/2) and covariance diag(s_t,m/2, s_f,m/2); amplitude and width all i.i.d.; deletion with probability p_d. Parameters: theta = (delta_t,m, delta_f,m, s_t,m, s_f,m, p_c), with p_c(i) = p(cluster size = i | y), i = 1, 2, ..., M.

Slide 66: Classification (multi-SES)
[Figure: scatter plots of the average cluster size against ffDTF, average bump frequency, and average bump width; 85% to 90% correctly classified.]

Slide 67: Similarity of bump models...
How similar or synchronous are two bump models?

Slide 68: Signatures of local synchrony
Time-frequency patterns (bumps): the EEG stems from thousands of neurons; a bump appears if the neurons are phase-locked, i.e., local synchrony.

Slide 69: Alzheimer's disease, inside glimpse: brain atrophy
Amyloid plaques and neurofibrillary tangles. (Video sources: P. Thompson, J. Neuroscience, 2003; Alzheimer society. Images: Jannis Productions, R. Fredenburg and S. Jannis.)
Slide 70: Probabilistic inference
POINT ESTIMATION: theta^(i+1) = argmax_theta log p(y, y', c^(i+1), theta). With a uniform prior p(theta), delta_t and delta_f are the average offsets and s_t and s_f are the variances of the offsets. With a conjugate prior p(theta) there is still a closed-form expression. For other kinds of prior p(theta): numerical optimization (gradient method).

Slide 71: Probabilistic inference
MATCHING: c^(i+1) = argmax_c log p(y, y', c, theta^(i)) = argmax_c sum_{kk'} w_kk'^(i) c_kk' subject to sum_k c_kk' <= 1, sum_k' c_kk' <= 1, and c_kk' in {0, 1}. This is EQUIVALENT to an (imperfect, i.e. not necessarily perfect) bipartite max-weight matching problem: find the heaviest set of disjoint edges.
ALGORITHMS: polynomial-time algorithms give the optimal solution(s) (Edmonds-Karp and auction algorithms); linear programming relaxation (the extreme points of the LP polytope are integral); the max-product algorithm gives the optimal solution if it is unique [Bayati et al. (2005), Sanghavi (2007)].
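To make the LP relaxation mentioned on Slide 71 concrete: maximize sum_{kk'} w_kk' c_kk' subject to sum_k c_kk' <= 1, sum_k' c_kk' <= 1, and 0 <= c_kk' <= 1. Because the constraint matrix of the bipartite matching polytope is totally unimodular, the LP optimum is integral whenever it is unique (generic weights). The sketch below uses a random weight matrix purely for illustration.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m = 6, 5
w = rng.normal(size=(n, m))                     # illustrative weights w_kk'

# Variables c_kk', flattened row-major; linprog minimizes, so negate w.
A_ub, b_ub = [], []
for k in range(n):                              # sum over k' of c_kk' <= 1
    row = np.zeros(n * m); row[k * m:(k + 1) * m] = 1
    A_ub.append(row); b_ub.append(1)
for kp in range(m):                             # sum over k of c_kk' <= 1
    col = np.zeros(n * m); col[kp::m] = 1
    A_ub.append(col); b_ub.append(1)

res = linprog(-w.ravel(), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=(0, 1), method="highs")
c = res.x.reshape(n, m)
print(np.round(c, 6))                           # extreme point: entries are 0 or 1
```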
Slide 72: Max-product algorithm
MATCHING: c^(i+1) = argmax_c log p(y, y', c, theta^(i)), with the generative model
p(y, y', c, theta) proportional to I(c) p(theta) prod_{kk'} [ N(t_k - t'_k'; delta_t, s_t,kk') N(f_k - f'_k'; delta_f, s_f,kk') beta^(-2) ]^(c_kk').

Slide 73: Max-product algorithm
MATCHING: c^(i+1) = argmax_c log p(y, y', c, theta^(i)). [Factor graph obtained by conditioning.]

Slide 74: Max-product algorithm (2)
Iteratively compute the messages; at convergence, compute the marginals p(c_kk') as the product of the incoming messages, and take the decisions c*_kk' = argmax_{c_kk'} p(c_kk').

Slide 75: Algorithm
PROBLEM: given two bump models, compute (rho_spur, delta_t, s_t, delta_f, s_f).
APPROACH: (c*, theta*) = argmax_{c, theta} log p(y, y', c, theta).
SOLUTION: coordinate descent: c^(i+1) = argmax_c log p(y, y', c, theta^(i)) (MATCHING: max-product); theta^(i+1) = argmax_theta log p(y, y', c^(i+1), theta) (ESTIMATION: closed form).

Slide 76: Generative model
Generate a hidden bump model: geometric prior for the number n of bumps, p(n) = (1 - lambda_S) lambda_S^n; bumps uniformly distributed in the rectangle; amplitude and width (in t and f) all i.i.d. Generate two noisy observations: the offset between a hidden and an observed bump is a Gaussian random vector with mean (-delta_t/2, -delta_f/2) or (+delta_t/2, +delta_f/2) and covariance diag(s_t/2, s_f/2); amplitude and width (in t and f) all i.i.d.; deletion with probability p_d. Easily extendable to more than two observations.

Slide 77: Generative model (2)
Binary variables c_kk': c_kk' = 1 if bump k of y and bump k' of y' are observations of the same hidden bump, and c_kk' = 0 otherwise. Constraints: b_k = sum_k' c_kk' and b'_k' = sum_k c_kk' must be binary (matching constraints). The generative model p(y, y', y_hidden, c, delta_t, delta_f, s_t, s_f) is symmetric in y and y'. Eliminating y_hidden, the offset between matched bumps is a Gaussian random vector with mean (delta_t, delta_f) and covariance diag(s_t, s_f): p(y, y', c, theta) = integral of p(y, y', y_hidden, c, theta) dy_hidden. Probabilistic inference: (c*, theta*) = argmax_{c, theta} log p(y, y', c, theta).

Slide 78: Summary
Bumps that appear in one model but not in the other give the fraction of spurious bumps rho_spur. Bumps that appear in both models but with an offset give the average time offset delta_t (delay), the timing jitter with variance s_t, the average frequency offset delta_f, and the frequency jitter with variance s_f. PROBLEM: given two bump models, compute (rho_spur, delta_t, s_t, delta_f, s_f). APPROACH: (c*, theta*) = argmax_{c, theta} log p(y, y', c, theta).

Slide 79: Objective function
Logarithm of the model: log p(y, y', c, theta) = sum_{kk'} w_kk' c_kk' + log I(c) + log p(theta) + ..., with
w_kk' = -[ (t_k - t'_k' - delta_t)^2 / s_t + (f_k - f'_k' - delta_f)^2 / s_f ] - 2 log beta,
i.e. minus a weighted Euclidean distance between the bump centers plus a constant. The weight w_kk' is large if (a) the bumps are close, (b) p_d is small, and (c) there are few bumps per volume element. There is no need to specify p_d, the bump density, and the volume element V separately: they only appear through beta, which acts as a knob to control the number of matches.

Slide 80: Distance measures
w_kk' = (t_k - t'_k' - delta_t)^2 / s_t,kk' + (f_k - f'_k' - delta_f)^2 / s_f,kk' + 2 log beta, with the scaled (non-Euclidean) variances s_t,kk' = (t_k + t'_k') s_t and s_f,kk' = (f_k + f'_k') s_f.

Slide 81: Generative model
p(y, y', c, theta) proportional to I(c) p(theta) prod_{kk'} [ N(t_k - t'_k'; delta_t, s_t,kk') N(f_k - f'_k'; delta_f, s_f,kk') beta^(-2) ]^(c_kk').

Slide 82: Prior for parameters
We expect bumps to appear at about the same frequency, but delayed: a frequency shift requires a non-linear transformation and is therefore less likely than a delay. Conjugate priors for s_t and s_f (scaled inverse chi-squared); improper priors for delta_t and delta_f: p(delta_t) = 1 = p(delta_f).

Slide 83: Preliminary results for the multivariate model
[Figure: CTR vs. MCI separated by a linear combination of the cluster-size probabilities p_c.]

Slide 84: Probabilistic inference
PROBLEM: given two bump models, compute (rho_spur, delta_t, s_t, delta_f, s_f). APPROACH: (c*, theta*) = argmax_{c, theta} log p(y, y', c, theta). SOLUTION: coordinate descent: c^(i+1) = argmax_c log p(y, y', c, theta^(i)) (MATCHING); theta^(i+1) = argmax_theta log p(y, y', c^(i+1), theta) (POINT ESTIMATION). [Figure: two sets X and Y and the minimal distance min over x in X, y in Y of d(x, y).]

Slide 85: Generative model
Generate a hidden bump model: geometric prior for the number n of bumps, p(n) = (1 - lambda_S) lambda_S^n; bumps uniformly distributed in the rectangle; amplitude and width (in t and f) all i.i.d. Generate M noisy observations y_1,...,y_M: the offset between a hidden bump and its observed copy in model m is a Gaussian random vector with mean (delta_t,m/2, delta_f,m/2) and covariance diag(s_t,m/2, s_f,m/2); amplitude and width all i.i.d.; deletion with probability p_d (or another prior p_c0 for the cluster size). Parameters: theta = (delta_t,m, delta_f,m, s_t,m, s_f,m, p_c), with p_c(i) = p(cluster size = i | y), i = 1, 2, ..., M.
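A minimal sampler for the multi-observation generative model of Slides 41/65/85, covering only the bump centres (the model also draws amplitudes and widths i.i.d., which is omitted here). The rectangle, jitter, and deletion probability are illustrative assumptions, and the per-model offset parameters are drawn at random just to have concrete values.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_bump_models(M=5, lam=0.9, T=3.0, f_lo=4.0, f_hi=30.0, p_d=0.2,
                       s_t=0.03**2, s_f=1.0):
    """Hidden bump model plus M noisy observations (bump centres only)."""
    n = rng.geometric(1 - lam)                           # geometric prior on n
    hidden = np.column_stack([rng.uniform(0, T, n), rng.uniform(f_lo, f_hi, n)])
    observations = []
    for m in range(M):
        # Per-model offset parameters (delta_t,m, delta_f,m), drawn at random
        # here purely for illustration; in the model they are unknown parameters.
        delta = rng.normal(0.0, [0.05, 0.5])
        noise = rng.normal(0.0, [np.sqrt(s_t / 2), np.sqrt(s_f / 2)], size=(n, 2))
        keep = rng.random(n) > p_d                       # i.i.d. deletions
        observations.append(hidden[keep] + delta / 2 + noise[keep])
    return hidden, observations

hidden, obs = sample_bump_models()
print(len(hidden), [len(y) for y in obs])
```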
Slide 86: Role of local synchrony
(Hebb 1949, Fuster 1997.) Assembly activation, Hebbian consolidation, assembly recall. [Figure: stimuli (voice/face), consolidation, and recall of a stimulus.]

Slide 87: Probabilistic inference
PROBLEM: given M bump models, compute theta = (delta_t,m, delta_f,m, s_t,m, s_f,m, p_c). APPROACH: (c*, theta*) = argmax_{c, theta} log p(y_1,...,y_M, c, theta). SOLUTION: coordinate descent: c^(i+1) = argmax_c log p(y_1,...,y_M, c, theta^(i)) (CLUSTERING: an integer program, solved either with the max-product algorithm (MP) on a sparse graph or with integer programming methods, e.g. LP relaxation); theta^(i+1) = argmax_theta log p(y_1,...,y_M, c^(i+1), theta) (POINT ESTIMATION).

Slide 88: Fourier transform
[Figure: decomposition of a signal into low- and high-frequency components as a function of frequency.]

Slide 89: Windowed Fourier transform
Windowed basis functions = Fourier basis functions multiplied by a window function; the windowed Fourier transform maps the signal onto the time-frequency plane.

Slide 90: Overview
Alzheimer's disease (AD): decrease in EEG synchrony. Synchrony measure in the time-frequency domain: pairs of EEG signals, collections of EEG signals. Numerical results. Conclusion.

Slide 91: Average synchrony
1. Group the electrodes in regions. 2. Compute a bump model for each region. 3. Compute SES for each pair of models. 4. Average the SES parameters.

Slide 92: Beyond pairwise interactions...
From pairwise similarity to multivariate similarity.

Slide 93: Similarity measures
Correlation and coherence; Granger causality (linear systems): DTF, ffDTF, dDTF, PDC, PC, ...; phase synchrony: compare instantaneous phases (wavelet/Hilbert transform); state-space measures: synchronization likelihood, S-estimator, S-H-N indices, ...; information-theoretic measures: KL divergence, Jensen-Shannon divergence, ... (The measures can be organized by whether they assume phase locking and whether they operate in the time or frequency domain.)

Slide 94: Generative model (2)
Cost function: a unit cost per non-coincident event and a unit cost per coincident pair.

Slide 95: Surrogate data: results (2)
S_S depends on delta_t; likewise the other measures, except SES.

Slide 96: Probabilistic inference
PROBLEM: given two bump models, compute (rho, delta_t, s_t, delta_f, s_f). APPROACH: (c*, theta*) = argmax_{c, theta} log p(y, y', c, theta). SOLUTION: coordinate descent: c^(i+1) = argmax_c log p(y, y', c, theta^(i)) (MATCHING); theta^(i+1) = argmax_theta log p(y, y', c^(i+1), theta) (POINT ESTIMATION).

Slide 97: Probabilistic inference (2)
MATCHING: c^(i+1) = argmax_c log p(y, y', c, theta^(i)) = argmax_c sum_{kk'} w_kk'^(i) c_kk' subject to sum_k c_kk' <= 1, sum_k' c_kk' <= 1, and c_kk' in {0, 1}; equivalent to an (imperfect, i.e. not necessarily perfect) bipartite max-weight matching problem: find the heaviest set of disjoint edges. ALGORITHMS: polynomial-time algorithms give the optimal solution(s) (Edmonds-Karp and auction algorithms); the linear programming relaxation gives the optimal solution if it is unique [Sanghavi (2007)]; the max-product algorithm gives the optimal solution if it is unique [Bayati et al. (2005), Sanghavi (2007)].

Slide 98: Max-product algorithm
MATCHING: c^(i+1) = argmax_c log p(y, y', c, theta^(i)). At convergence, compute the marginals p(c_kk') as the product of the incoming messages, and take the decisions c*_kk' = argmax_{c_kk'} p(c_kk') (optimal if the solution is unique).

Slide 99: Exemplar-based formulation
Exemplars are identical copies of hidden bumps and act as cluster centers; the other bumps in a cluster are non-identical copies of their exemplar. Is an event an exemplar? If not, which exemplar is it associated with? Several constraints lead to an integer program.

Slide 100: Exemplar-based formulation: IP
Binary variables; an integer program with a LINEAR objective function and linear constraints. Equivalent to k-dimensional matching: for k = 2 it is in P, but for k > 2 it is NP-hard!
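To make the exemplar-based formulation of Slides 99-100 concrete, here is an illustrative integer program in its spirit, solved with scipy.optimize.milp: binary variables x_ij mean "bump i joins the cluster whose exemplar is bump j"; every bump joins exactly one cluster, only opened exemplars (x_jj = 1) can be joined, and each cluster contains at most one bump per signal. The distance-based objective and the exemplar penalty are assumptions standing in for the model's log-likelihood terms, not the actual objective used in the talk.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

rng = np.random.default_rng(0)

# Illustrative data: bumps (t, f) from M = 3 signals.
sig = np.array([0, 0, 0, 1, 1, 2, 2, 2])
pts = np.column_stack([rng.uniform(0, 3, sig.size), rng.uniform(4, 30, sig.size)])
N = sig.size
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

# Binary x_ij = 1 means "bump i joins the cluster whose exemplar is bump j".
def idx(i, j):
    return i * N + j

penalty = 3.0                                   # assumed cost of opening a cluster
c = d.ravel() + penalty * np.eye(N).ravel()     # assignment costs + exemplar costs

cons = []
for i in range(N):                              # each bump joins exactly one cluster
    a = np.zeros(N * N); a[i * N:(i + 1) * N] = 1
    cons.append(LinearConstraint(a, 1, 1))
for i in range(N):                              # x_ij <= x_jj: only join opened exemplars
    for j in range(N):
        if i != j:
            a = np.zeros(N * N); a[idx(i, j)] = 1; a[idx(j, j)] = -1
            cons.append(LinearConstraint(a, -np.inf, 0))
for j in range(N):                              # at most one bump per signal per cluster
    for m in np.unique(sig):
        a = np.zeros(N * N)
        a[[idx(i, j) for i in range(N) if sig[i] == m]] = 1
        cons.append(LinearConstraint(a, -np.inf, 1))

res = milp(c, constraints=cons, integrality=np.ones(N * N), bounds=Bounds(0, 1))
x = res.x.reshape(N, N).round().astype(int)
print("exemplars:", np.flatnonzero(np.diag(x)))
print("cluster of each bump:", x.argmax(axis=1))
```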