Inference in biological physics


Page 1: Inference in biological physics

Inference in biological physics

Phil Nelson, University of Pennsylvania

For these slides see: www.physics.upenn.edu/~pcn

When you see a title like that, you need to worry: “Uh-oh, sounds philosophical.” Well, I just wanted to tell you two concrete stories about cases when my colleagues and I managed to do something useful by virtue of knowing something about inference. The ideas we needed were things I didn’t know a few years ago, so I thought you might be interested too.

Thursday, March 15, 2012

Pages 2–8: Inference in biological physics

Focus

Well... Statistical inference sounds “too theoretical,” but it is often needed to extract information from data:

✴ Sometimes it suggests a new kind of measurement that tests a model more stringently, or distinguishes two different models more completely, than previous measurements.

✴ Sometimes our model is not obviously connected with what we can actually measure experimentally, and we need to make a connection.

✴ Sometimes the model that interests us involves the behavior of actors that we can only see indirectly in our data; theory may be needed to separate them out from each other, and from noise.

There’s more, of course, but that’s enough to get started.

We need to make sure we don’t forget to tell our undergraduates why theory exists. (Are there any students here today?) Remember, to them “theory” is a strongly negative word. (“How’s it going in Physics N? Too much theory.”) To them it means “stuff that won’t get me a job because it’s not relevant to anything practical.” To them it’s at best decorative, like music or art. If we think there’s more to it than that, we’d better not forget to tell them.

“Of course that’s what we do -- everybody knows that.” -- No. They don’t.

“If your experiment requires statistics, then you ought to have done a better experiment.” -- Ernest Rutherford

Pages 9–11: Inference in biological physics

Part I

[Figure: Venn diagram — the set “men” drawn inside the set “mortal”]

Let’s start with a topic that may not be obviously biophysical in character. Suppose I stood here and said “all men are mortal; Socrates is mortal; therefore Socrates is a man.”

*In classical logic it’s fairly easy to spot errors of inference.

Page 12: Inference in biological physics

G. Gigerenzer, Calculated risks

An everyday question in clinical practice

Thursday, March 15, 2012

Page 13: Inference in biological physics

G. Gigerenzer, Calculated risks

An everyday question in clinical practice

Here are the replies of 24 practicing physicians, who had an average of 14 years of professional experience:

Frequency

Thursday, March 15, 2012

Pages 14–23: Inference in biological physics

Work it out

We are asked for P(sick|+) = B/(B+D), where the four outcomes (as fractions of the population) are tabulated as:

            test −    test +
Sick          A         B
Healthy       C         D

But what we were given was P(+|sick) = B/(A+B). These are not the same thing: they have different denominators. To get one from the other we need some more information:

B/(B+D) = [B/(A+B)] × [(A+B)/(B+D)]

that is,

P(sick|+) = P(+|sick) × P(sick) / P(+)
(posterior estimate, desired) = (likelihood, given) × (prior estimate, given) / (still need this)

Page 24: Inference in biological physics

Finish working it out

A=Sick, –

B=Sick, +

C=Healthy, –

D=Healthy, +

P (sick|+) = P (+|sick)× P (sick)

P (+)

Is that last factor a big deal?P(sick) was given, but we need:

Thursday, March 15, 2012

Page 25: Inference in biological physics

Finish working it out

A=Sick, –

B=Sick, +

C=Healthy, –

D=Healthy, +

P (sick|+) = P (+|sick)× P (sick)

P (+)

Is that last factor a big deal?P(sick) was given, but we need:P (+) = B +D

=B

A+B(A+B) +

D

C +D(C +D)

= P (+|sick)P (sick) + P (+|healthy)P (healthy)

= (0.5)(0.003) + (0.03)(0.997) ≈ 0.03

Thursday, March 15, 2012

Page 26: Inference in biological physics

Finish working it out

A=Sick, –

B=Sick, +

C=Healthy, –

D=Healthy, +

P (sick|+) = P (+|sick)× P (sick)

P (+)

Is that last factor a big deal?P(sick) was given, but we need:P (+) = B +D

=B

A+B(A+B) +

D

C +D(C +D)

= P (+|sick)P (sick) + P (+|healthy)P (healthy)

= (0.5)(0.003) + (0.03)(0.997) ≈ 0.03

P (sick)

P (+)≈ 0.003

0.03≈ 0.1

It’s huge: a positive test result meansonly a 5% chance you’re sick. Not 97%.

Thursday, March 15, 2012
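The arithmetic above can be checked with a short script. This is a minimal sketch (the function name `posterior_sick` is ours, not from the talk), using the slide’s numbers P(+|sick) = 0.5, P(sick) = 0.003, P(+|healthy) = 0.03:

```python
def posterior_sick(p_pos_given_sick, p_sick, p_pos_given_healthy):
    """Bayes' rule: P(sick|+) = P(+|sick) P(sick) / P(+),
    with P(+) expanded over the sick and healthy subpopulations."""
    p_healthy = 1.0 - p_sick
    p_pos = p_pos_given_sick * p_sick + p_pos_given_healthy * p_healthy
    return p_pos_given_sick * p_sick / p_pos

p = posterior_sick(0.5, 0.003, 0.03)
print(f"P(sick|+) = {p:.3f}")  # about 0.048: roughly a 5% chance, not 97%
```

Note how the tiny prior P(sick) = 0.003 dominates the answer; this is exactly the factor the surveyed physicians neglected.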

Page 27: Inference in biological physics

Many thanks to Haw Yang. See also Lucas P. Watkins and Haw Yang J. Phys. Chem. B 2005

Part II: Changepoint analysis in single-molecule TIRF

JF Beausang, Yale Goldman, PN

✴ Sometimes our model is not obviously connected with what we can actually measure experimentally, and we need to make a connection.

✴Sometimes the model that interests us involves the behavior of actors that we can only see indirectly in our data; theory may be needed to separate them out from each other, and from noise.


Pages 28–29: Inference in biological physics

Myosin V Processivity

We’d like to know things like: How does it walk? What are the steps in the kinetic pathway? What is the geometry of each state? One classic approach is to monitor the position in space of a marker (e.g. a bead) attached to the motor. But this does not address the geometry of each state.

The approach I’ll discuss involves attaching a bifunctional fluorescent label to one lever arm. The label has a dipole moment whose orientation in space reflects that of the arm.

Pages 30–33: Inference in biological physics

Myosin V Processivity

[Animation frames: the lever arm steps through successive orientations θ1, θ2, θ3, θ4 as the motor walks]

Page 34: Inference in biological physics

Polarized total internal reflection fluorescence microscopy

[Figure: the excitation laser beam enters through a glass prism and quartz slide; the evanescent field in the aqueous medium excites fluorescent emission, collected by the microscope objective]

Fluorescence illumination by the evanescent wave eliminates a lot of noise and, importantly, maintains the polarization of the incident light. To tickle the fluorophore with every possible polarization, we need the incoming light to have at least two different beam directions.

Page 35: Inference in biological physics

pol-TIRF setup

Thursday, March 15, 2012

Page 36: Inference in biological physics

pol-TIRF setup

8 polarized illuminations x 2 detectors = 16 fluorescent intensities per cycle

Thursday, March 15, 2012

Page 37: Inference in biological physics

JN Forkey et al. Nature 2003

Current state of the art

It’s a bit more meaningful to convert from lab-frame angles θ,φ to actin-frame angles α,β. Even then, however, state of the art calculations give pretty noisy determinations, with pretty poor time resolution. You could easily miss a short-lived state -- e.g. the elusive diffusive-search step (if it exists). Can’t we do better?


Page 38: Inference in biological physics

Unfortunately, the total photon counts from a fluorescent probe may not be very informative. Here we divided a time period of interest into 20 bins. There is some Poisson noise in the photon counts, of course.( [ATP]=10uM )

Time (a.u.)

phot

on co

unt

Horizontal axis is time. Vertical axis is binned photon count, PFI =polarized fluorescence intensity

JF Beausang, YE Goldman, and PCN, Meth. Enzymol. 487:431 (2011).Thursday, March 15, 2012

Page 39: Inference in biological physics

Unfortunately, the total photon counts from a fluorescent probe may not be very informative. Here we divided a time period of interest into 20 bins. There is some Poisson noise in the photon counts, of course.( [ATP]=10uM )

Time (a.u.)

phot

on co

unt

Time (a.u.)

If we classify the photons by polarization and bin them separately, that reveals a definite changepoint. But when exactly did it occur? Probably not at the dashed line shown, but how can we be more precise?

phot

on co

unt

Horizontal axis is time. Vertical axis is binned photon count, PFI =polarized fluorescence intensity

JF Beausang, YE Goldman, and PCN, Meth. Enzymol. 487:431 (2011).Thursday, March 15, 2012

Page 40: Inference in biological physics

Unfortunately, the total photon counts from a fluorescent probe may not be very informative. Here we divided a time period of interest into 20 bins. There is some Poisson noise in the photon counts, of course.( [ATP]=10uM )

Time (a.u.)

phot

on co

unt

Time (a.u.)

If we classify the photons by polarization and bin them separately, that reveals a definite changepoint. But when exactly did it occur? Probably not at the dashed line shown, but how can we be more precise?

phot

on co

unt

If we choose wider bins, we’ll get worse time resolution; if we choose narrower bins, we’ll get worse shot-noise errors.

Horizontal axis is time. Vertical axis is binned photon count, PFI =polarized fluorescence intensity

JF Beausang, YE Goldman, and PCN, Meth. Enzymol. 487:431 (2011).Thursday, March 15, 2012

Page 41: Inference in biological physics

Unfortunately, the total photon counts from a fluorescent probe may not be very informative. Here we divided a time period of interest into 20 bins. There is some Poisson noise in the photon counts, of course.( [ATP]=10uM )

Time (a.u.)

phot

on co

unt

Time (a.u.)

If we classify the photons by polarization and bin them separately, that reveals a definite changepoint. But when exactly did it occur? Probably not at the dashed line shown, but how can we be more precise?

phot

on co

unt

If we choose wider bins, we’ll get worse time resolution; if we choose narrower bins, we’ll get worse shot-noise errors.Can we evade the cruel logic of photon statistics?

Horizontal axis is time. Vertical axis is binned photon count, PFI =polarized fluorescence intensity

JF Beausang, YE Goldman, and PCN, Meth. Enzymol. 487:431 (2011).Thursday, March 15, 2012

Page 42: Inference in biological physics

Unfortunately, the total photon counts from a fluorescent probe may not be very informative. Here we divided a time period of interest into 20 bins. There is some Poisson noise in the photon counts, of course.( [ATP]=10uM )

Time (a.u.)

phot

on co

unt

Time (a.u.)

If we classify the photons by polarization and bin them separately, that reveals a definite changepoint. But when exactly did it occur? Probably not at the dashed line shown, but how can we be more precise?

phot

on co

unt

If we choose wider bins, we’ll get worse time resolution; if we choose narrower bins, we’ll get worse shot-noise errors.Can we evade the cruel logic of photon statistics?

sequence number

Tim

e (a

.u.)

It turns out that binning the data destroyed some information. Something magical happens if instead of binning, we just we plot photon arrival time versus photon sequence number. Despite some ripples from Poisson statistics, it’s obvious that each trace has a sharp changepoint, and moreover that the two changepoints found independently in this way are simultaneous. (A similar approach in the context of FRET was pioneered by Haw Yang.)

Horizontal axis is time. Vertical axis is binned photon count, PFI =polarized fluorescence intensity

JF Beausang, YE Goldman, and PCN, Meth. Enzymol. 487:431 (2011).Thursday, March 15, 2012
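The cumulative-plot trick described above is easy to demonstrate on synthetic data: simulate a two-rate Poisson photon stream and check that the counts per unit time on each side of the changepoint (the inverse of the local slope of arrival time versus sequence number) recover the two rates. Everything below — rates, changepoint, function names — is an illustrative sketch, not the experimental parameters.

```python
import random

random.seed(0)

def two_rate_arrivals(rate1, rate2, t_star, t_total):
    """Exponential inter-arrival times: rate1 before t_star, rate2 after."""
    times, t = [], 0.0
    while t < t_total:
        rate = rate1 if t < t_star else rate2
        t += random.expovariate(rate)
        if t < t_total:
            times.append(t)
    return times

times = two_rate_arrivals(rate1=5.0, rate2=20.0, t_star=10.0, t_total=20.0)
before = [t for t in times if t < 10.0]
after = [t for t in times if t >= 10.0]

# Photons per unit time on each side estimate R and R'.
print(len(before) / 10.0, len(after) / 10.0)  # roughly 5 and 20
```

On a plot of arrival time versus sequence number these two rates appear as two straight segments of slope 1/R and 1/R’, meeting sharply at the changepoint — with no binning at all.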

Pages 43–51: Inference in biological physics

Now that I have your attention

• Why did that trick work? How did we get such great time resolution from such cruddy data?

• How well does it work? If we have even fewer photons, for example because a state is short-lived, how can we quantify our confidence that any changepoint occurred at all?

• Could we generalize and automate this trick? Ultimately we’ll want to handle data with multiple polarizations, and find lots of changepoints.

The appropriate tool is maximum-likelihood analysis: Focus on just one “flavor” of photons (e.g. one polarization). Suppose that in total time T we catch N photons at times t1, ..., tN. We wish to explore the hypothesis that photons arrive in a Poisson process with rate R from time 0 to time t*, and thereafter arrive in another Poisson process with rate R’. We want to find our best estimates of the three parameters t*, R, and R’, find confidence intervals for them, and compare with the null hypothesis that there was no changepoint.

To do this, we ask for the “Likelihood,” the probability that the data we actually observed would have been observed in a world described by our model with particular values of the unknown fit parameters. Divide the total time into slices of width ∆t; then

log P(t1, ..., tN | R, R’, t*) = Σ_{k=1}^{t*/∆t} log{ R ∆t if a photon is in slice k; (1 − R ∆t) otherwise }
                              + Σ_{k’=t*/∆t+1}^{T/∆t} log{ R’ ∆t if a photon is in slice k’; (1 − R’ ∆t) otherwise }
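As a numerical sanity check (ours, not from the talk), one can evaluate this sliced log-likelihood directly for some made-up arrival times and confirm that, as ∆t shrinks, it approaches the continuum form N log ∆t + n log R + n’ log R’ − R t* − R’(T − t*), where n photons arrive before t*:

```python
import math

def sliced_loglik(times, rate1, rate2, t_star, t_total, dt):
    """Discretized log-likelihood: each slice of width dt contributes
    log(R dt) if it contains a photon, log(1 - R dt) otherwise, with
    R = rate1 before t_star and rate2 after."""
    photon_slices = {int(t / dt) for t in times}
    logp = 0.0
    for k in range(int(round(t_total / dt))):
        rate = rate1 if (k + 0.5) * dt < t_star else rate2
        logp += math.log(rate * dt) if k in photon_slices else math.log(1 - rate * dt)
    return logp

def continuum_loglik(times, rate1, rate2, t_star, t_total, dt):
    """Limit form: N log dt + n log R + n' log R' - R t* - R'(T - t*)."""
    n = sum(1 for t in times if t < t_star)
    n2 = len(times) - n
    return (len(times) * math.log(dt) + n * math.log(rate1) + n2 * math.log(rate2)
            - rate1 * t_star - rate2 * (t_total - t_star))

times = [0.3, 0.9, 1.4, 2.2, 2.5, 2.7, 2.9]  # made-up arrival times
for dt in (0.01, 0.001):
    a = sliced_loglik(times, 2.0, 6.0, 2.0, 3.0, dt)
    b = continuum_loglik(times, 2.0, 6.0, 2.0, 3.0, dt)
    print(dt, a - b)  # the difference shrinks as dt does
```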

Page 52: Inference in biological physics

Now: Divide the N photons into n that arrived before the putative changepoint, and n’=N-n that arrived after.Take the limit : ∆t→ 0

log P (t1, . . . , tN |R,R�, t∗) =t∗/∆t�

k=1

log

�R ∆t if a photon in this slice(1−R ∆t) otherwise

From previous slide: In total time T we catch N photons at times t1,... tN.Hypothesis is that photons are arriving in a Poisson process with rate R from time 0 to time t* , and thereafter arrive in another Poisson process with rate R’.

+T/∆t�

k�=t∗/∆t+1

log

�R� ∆t if a photon in this slice(1−R� ∆t) otherwise

Thursday, March 15, 2012

Page 53: Inference in biological physics

Now: Divide the N photons into n that arrived before the putative changepoint, and n’=N-n that arrived after.Take the limit : ∆t→ 0

log P (t1, . . . , tN |R,R�, t∗) =t∗/∆t�

k=1

log

�R ∆t if a photon in this slice(1−R ∆t) otherwise


Page 56: Inference in biological physics

From the previous slide: in total time T we catch N photons at times t1, …, tN. The hypothesis is that photons arrive in a Poisson process with rate R from time 0 to time t*, and thereafter in another Poisson process with rate R'.

Now: divide the N photons into n that arrived before the putative changepoint and n' = N − n that arrived after, and take the limit Δt → 0:

log P(t1, …, tN | R, R', t*) = Σ_{k=1}^{t*/Δt} log{ R Δt if a photon in this slice; (1 − R Δt) otherwise }
                             + Σ_{k'=t*/Δt+1}^{T/Δt} log{ R' Δt if a photon in this slice; (1 − R' Δt) otherwise }

Expanding to first order in Δt,

log P ≈ N log(Δt) + n log R + n' log R' − (t*/Δt − n)(R Δt) − ((T − t*)/Δt − (N − n))(R' Δt)
      ≈ const + n log R + n' log R' − R t* − R'(T − t*)

Maximize this first over R and R':

R = n/t*,   R' = n'/(T − t*)

OK, duh, that was no surprise! But it does explain why we can just lay a ruler along the cumulative plot to get our best estimates of the before and after rates. More interestingly, we can substitute these optimal rates into the formula for log P to find the likelihood as a function of the putative changepoint.
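Substituting the optimal rates gives the profile log-likelihood L(t*) = n log(n/t*) + n' log(n'/(T − t*)) − N, up to a constant. A minimal numerical sketch of the maximum-likelihood changepoint (not the authors' code; the rates, seed, and function name below are made up for illustration, and candidate changepoints are restricted to the photon arrival times):

```python
import numpy as np

def changepoint_loglik(times, T):
    """Profile log-likelihood n*log(n/t*) + n'*log(n'/(T - t*)) - N,
    evaluated with the candidate changepoint at each photon arrival."""
    t = np.sort(np.asarray(times, dtype=float))
    N = len(t)
    L = np.full(N, -np.inf)
    for k in range(1, N):              # n = k photons before the candidate
        tstar = t[k - 1]
        n, nprime = k, N - k
        L[k] = n * np.log(n / tstar) + nprime * np.log(nprime / (T - tstar)) - N
    return t, L

# Simulated data: rate 5 before t* = 10, rate 20 after, total time T = 20.
rng = np.random.default_rng(0)
T, tstar_true = 20.0, 10.0
before = rng.exponential(1 / 5, 200).cumsum()
after = tstar_true + rng.exponential(1 / 20, 800).cumsum()
times = np.concatenate([before[before < tstar_true], after[after < T]])

t_sorted, L = changepoint_loglik(times, T)
tstar_hat = t_sorted[np.argmax(L) - 1]   # maximum-likelihood changepoint
```

With a fourfold rate jump and a few hundred photons, the estimate lands close to the true changepoint, mirroring the sharply peaked likelihood shown on the following slides.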

Page 57: Inference in biological physics


Application

Here's some very fake data: the photons arrive uniformly, not at random. Here are two lines corresponding to non-optimal choices of the changepoint. We'd like to see the likelihood function and how it selects the "right" changepoint, which for fake data is known. Here is our log-likelihood function as a function of the putative changepoint time.

Page 61: Inference in biological physics


Left: Some more realistic (Poisson-arrival) simulated data, shown in traditional binned form and in the improved version.

Right: Likelihood function for placement of the changepoint. Dashed line: maximum-likelihood point. Black triangle: actual changepoint used in the simulation.

[Figure: left, photon count per bin vs. time (a.u.); middle, time (a.u.) vs. photon sequence number, with intervals I1 and I2 marked; right, log likelihood ratio vs. photon sequence number.]

JF Beausang, YE Goldman, and PCN, Meth. Enzymol. 487:431 (2011).

Page 63: Inference in biological physics

Oh, yes: the method also works on multiple-channel data. Left: one channel (red) starts with rare photons, then jumps to higher intensity. Another channel (blue) does the opposite. The sum of the intensities (black) doesn't change much at all. Middle: "kink" representations of the same data. Right: both channels contribute to a likelihood function with a robust peak, even though there were only 200 photons in the entire dataset.

JF Beausang, YE Goldman, and PCN, Meth. Enzymol. 487:431 (2011).

[Figure: left, the two intensity traces vs. time (a.u.); middle and right, time (a.u.) and likelihood vs. photon sequence number.]

Page 64: Inference in biological physics


Payoff

θ = polar angle in the lab frame; φ = azimuthal angle in the lab frame.

Oh, yes: it also works on real experimental data. Now we can get back to the original motivation. Previously, people would take data from multiple polarizations, bin it, and pipe the inferred intensities into a maximum-likelihood estimator of the orientation of the fluorophore. That procedure leads to the rather noisy dots shown here. One problem is that if a transition happens in the middle of a time bin, then the inferred orientation in that time bin can be crazy.

Here the solid lines are the inferred orientations of the probe molecule during successive states defined by changepoint analysis. We see a nice alternating stride in φ.

JF Beausang, YE Goldman, and PCN, Meth. Enzymol. 487:431 (2011).

Page 66: Inference in biological physics

Summary Part II

✴When you only get a million photons, you’d better make every photon count.

✴A simple maximum-likelihood analysis accomplishes this.

✴In the context of TIRF it can dramatically improve the tradeoff between time resolution and accuracy.


Page 67: Inference in biological physics

Part III: Parallel recordings from dozens of individual neurons

✴Sometimes theory suggests a new kind of measurement that tests a model more stringently, or distinguishes two different models more completely, than previous measurements.

✴Sometimes our model is not obviously connected with what we can actually measure experimentally, and we need to make a connection.

✴Sometimes the model that interests us involves the behavior of actors that we can only see indirectly in our data; theory may be needed to separate them out from each other, and from noise.


Page 69: Inference in biological physics

Sources of energy

Experiments done in the lab of Vijay Balasubramanian (Penn).


Page 70: Inference in biological physics

Jason Prentice, Penn Physics

Kristy Simmons, Penn Neuroscience

Jan Homann, Penn Physics

(Plus Gasper Tkacik.) (Many thanks to Michael Berry and Olivier Marre, Princeton; Bart Borghuis, Janelia Farms; Michael Freed and others at Penn Retina Lab; Joerg Sander, U Alberta; Ronen Segev, BGU; Chris Wiggins, Columbia.)


Page 71: Inference in biological physics

Really big picture

Retina is also an approachable, yet still complex, part of the brain. It's a 2D carpet consisting of "only" three layers of neurons.

[Diagram: visual scene in → optics → retinal illumination pattern → retina → retinal ganglion cell spike trains → brain → behavior.]


Page 73: Inference in biological physics

It matters


Page 75: Inference in biological physics

Summary, Part III

Get data Cluster Fit Interpret


Page 76: Inference in biological physics

1. Experiment
2. Clustering
3. Fitting
4. Performance

Get data Cluster Fit Interpret


Page 77: Inference in biological physics

Hardware

Cf. Meister, Pine, and Baylor 1994. Incredibly, one can keep a mammalian retina alive in a dish for over 6 hours while presenting it stimuli and recording its activity.


Page 79: Inference in biological physics

What’s in the dish

Michael Berry, Princeton


Page 80: Inference in biological physics


Simple events

67 ms of data, viewed as a movie. [Data have been smoothed.] Some spikes move across the array.

Mostly we are hearing retinal ganglion cells, as desired, because they're the ones that spike.

The spike-sorting problem is: given raw data like these, convert to a list of discrete events (which cells fired at what times).

Classic: Gerstein+Clark 1964; Abeles+Goldstein 1977; Schmidt 1984.
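A toy version of that problem can be sketched in a few lines. This is an illustration only, not the pipeline described on the following slides: the template shape, noise level, threshold (0.7 of the peak score), and merge gap are all made up, and real data are multi-electrode with overlapping events.

```python
import numpy as np

# Toy spike sorting: recover discrete firing times from a 1-D trace
# by sliding a known template along the data (matched filtering).
rng = np.random.default_rng(2)
template = -np.exp(-((np.arange(20) - 5.0) ** 2) / 8.0)   # made-up spike shape
true_times = [50, 140, 300]
trace = rng.normal(0.0, 0.05, 400)
for t0 in true_times:
    trace[t0:t0 + len(template)] += template

score = np.correlate(trace, template, mode="valid")       # large where template fits
candidates = np.flatnonzero(score > 0.7 * score.max())
# merge runs of neighboring candidates into one detected event each
groups = np.split(candidates, np.flatnonzero(np.diff(candidates) > 5) + 1)
events = [int(g[np.argmax(score[g])]) for g in groups]    # detected spike times
```

The detected times match the planted ones; the hard part, addressed next, is what happens when several cells' spikes overlap within one event.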

Page 85: Inference in biological physics

Not-so-simple events

[Figure: a 60 ms stretch of raw multi-electrode data; each panel shows potential (µV) vs. time for one electrode, arranged by electrode x and y position in the array.]

Unfortunately many events are complex, with multiple overlapping spikes in many locations. And of course these may be the most interesting ones!

It really matters, because "failure in identification of overlapping spikes from multiple neuron activity causes artificial correlations" [Bar-Gad '01]. Moreover, when we graduate to bigger arrays, nearly all events will involve overlaps in time!

Many authors say bursts are a big problem, but here is a nice fit that we obtained with no special effort. See later.

We even handle overlapping spikes, which some algorithms do not attempt. See later.

JS Prentice, J Homann, KD Simmons, G Tkacik, V Balasubramanian, PCN, PLoS ONE 6(7): e19884 (2011).

Page 86: Inference in biological physics

1. Experiment
2. Clustering
3. Fitting
4. Performance

Get data Cluster Fit Interpret

[Sorry, no time to discuss our method for this step.]


Page 87: Inference in biological physics

Typical cluster

[Figure: 50 superposed spike waveforms (3 ms each) from cluster 4, on several electrodes.]

Superposing 50 traces chosen from the 284 in this cluster shows that they really do all resemble each other. Occasional events in which this spike collides with another don't affect the "archetype waveform" (template; next slide). Although the shape of each instance of the archetype is quite constant, its amplitude has significant variation.

JS Prentice, J Homann, KD Simmons, G Tkacik, V Balasubramanian, PCN, PLoS ONE 6(7): e19884 (2011).

Page 88: Inference in biological physics

Resulting archetype waveform

[Figure: the 284 exemplars (3 ms each) of cluster 4, with the consensus waveform overlaid on each electrode.]

We scaled each instance of each archetype to get best agreement with the others, then took the median at each time point to find our best estimate of the consensus waveform (blue). As a check, the pointwise mean waveform looks the same (red).
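The scale-and-median step can be sketched as follows. Synthetic exemplars stand in for the real cluster here; the waveform shape, amplitude scatter, and noise level are all made-up values, and the least-squares scale factor is one simple choice of "best agreement".

```python
import numpy as np

# Scale each exemplar to best agree with a reference, then take the
# pointwise median (and, as a check, the mean) across exemplars.
rng = np.random.default_rng(3)
n_ex, n_samp = 284, 30
shape = -np.exp(-((np.arange(n_samp) - 10.0) ** 2) / 12.0)     # made-up waveform
amps = rng.normal(1.0, 0.2, n_ex)                              # amplitude scatter
exemplars = amps[:, None] * shape + rng.normal(0.0, 0.05, (n_ex, n_samp))

ref = exemplars.mean(axis=0)
# least-squares scale factor aligning each exemplar to the reference:
# minimize ||s*x - ref||^2  =>  s = (x . ref) / (x . x)
scales = (exemplars @ ref) / np.einsum('ij,ij->i', exemplars, exemplars)
aligned = scales[:, None] * exemplars
consensus_median = np.median(aligned, axis=0)
consensus_mean = aligned.mean(axis=0)   # should closely match the median
```

After scaling, the amplitude scatter is removed and the median and mean agree closely, as the slide's blue and red curves do.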

Page 89: Inference in biological physics

1. Experiment
2. Clustering
3. Fitting
4. Performance

Get data Cluster Fit Interpret


Page 90: Inference in biological physics

Noise covariance

Vanilla least-squares fitting is not appropriate for time series, because it assumes that every sample is independent of all the others, whereas actually successive samples are correlated. Here is the covariance of channel #13 with all other channels (after an initial spatial filter, also obtained from data). For reference, each channel has a single blue curve showing an exponential function.

Nelson notes III: Fitting 7 (excerpt):

… approximation to the data, i.e., the one that reproduces the observed mean and covariance. To interpret Eqn. 3, we say that the e-vectors of C⁻¹ with small e-values are "noiselike" features in data, and hence not so useful for discriminating features; those with large e-values are "un-noiselike" and, when present, imply a feature that is not noise. That is, we have learned how to weight features in the data properly.

One might hope that C would be very simple, and in particular diagonal in the channel numbers, because we already decorrelated channels. And that's approximately correct. We also expect that C will be time-translation invariant: C_{α t1; β t2} = S(|t1 − t2|) δ_{αβ}. That's approximately correct too. Best world would be if S(δt) = η e^{−δt/τcorr}. That's also approximately correct, with η = 48 µV² and τcorr = 0.2 ms (2 samples). To see these things, here are graphs of C_{13,t1; β,(t1+δt)} as functions of δt (measured in steps of 0.1 ms). Each panel is a different channel β; each trace is a different t1.

[Figure: one panel per channel β, covariance vs. δt from 0 to 20 samples; each panel also shows the exponential function (48 µV²) e^{−δt/τcorr}.]

We see that: all traces in a given panel are similar (time-translation invariance); all traces are roughly zero except panel 13 (diagonal in channel number);¹³ and the traces in panel #13 display exponential falloff. Other choices of channel α are similar (not shown).

Such a correlation matrix has a simple inverse. Let x = e^{−0.1 ms/τcorr} = e^{−1/2}. Then (dropping δ_{αβ} and assuming infinite time)

C⁻¹_ideal(t1, t2) = η⁻¹ × { (1 + x²)/(1 − x²) if t1 = t2;  −x/(1 − x²) if t1 = t2 ± 1;  0 otherwise }   (4)

How unsurprising: to account for temporal correlations, we must apply a sharpening filter. Normally we are warned "Don't sharpen noisy data! It accentuates the noise." But here we …

¹³ Channel #9 is weird because it's dead; its nonzero values came from the (imperfect) whitening step.

We see that #13 is correlated only with itself, and it has a simple covariance matrix that is easy to invert. The inverse covariance thus obtained defines our correlated Gaussian model of the noise.

[Again: The covariance is not a delta function, contrary to what is assumed in naive least-squares fitting.]
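The tridiagonal inverse in formula (4) is easy to check numerically. A minimal sketch, using the η and τcorr quoted above; the matrix size n is an arbitrary choice, and the formula is exact for interior elements, away from the ends:

```python
import numpy as np

# The inverse of an exponential ("AR(1)") covariance S(dt) = eta * x**|dt|
# is tridiagonal, matching formula (4) for interior elements.
eta = 48.0               # µV^2, from the text
x = np.exp(-0.5)         # e^{-(0.1 ms)/tau_corr} with tau_corr = 0.2 ms
n = 100

idx = np.arange(n)
C = eta * x ** np.abs(idx[:, None] - idx[None, :])
Cinv = np.linalg.inv(C)

mid = n // 2
diag_pred = (1 + x**2) / (1 - x**2) / eta   # predicted diagonal element
off_pred = -x / (1 - x**2) / eta            # predicted nearest-neighbor element
```

The numerically inverted matrix reproduces the predicted diagonal and nearest-neighbor elements, and vanishes beyond them: the sharpening filter touches only adjacent samples.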


Page 91: Inference in biological physics


On inference

Suppose we measure some experimental data, and wish to make an inference about some situation X that we cannot directly observe. That is, we imagine a variety of worlds with different values of X, and ask which is most probable given the observed data.

If we know the probability that those data would have arisen in a world with a particular value of X, then Bayes's formula gives us what we actually want:

P(X | observed data) = P(data | X) P(X) / P(data)

We can ignore the denominator if all we want is to compare two hypotheses (e.g., maximize over X).

For our application, we'd like P(spikes | data), where "data" is an observed waveform and "spikes" refers to a collection of spike archetypes µ1, … occurring at times t1, … with amplitudes A1, … relative to the amplitude of the corresponding archetype (neuron). Bayes's formula gives what we want as

K × (likelihood) × (prior) = K P(data | spikes) P(spikes)

See M. Denny and S. Gaines, Chance in Biology.

Page 94: Inference in biological physics


Bayesian idea

Previous slide expressed P(spikes | data) as

K × (likelihood) × (prior) = K P(data | spikes) P(spikes)

Here "spikes" refers to a collection of spike archetypes occurring at times t1, … with amplitudes A1, … relative to the amplitude of the corresponding archetype.

To get the prior, P(spikes), assume that for a single spike it has the form

P^cell(µ) P^time(t) P^ampl(A|µ)

The three factors are respectively the popularity of this neuron, uniform in time, and a Gaussian reflecting its typical amplitude and amplitude variability. We get these priors from the data subset used in clustering. Priors come from the data itself, an unexpected benefit of the two-step analysis, and can be continuously updated.

To get the likelihood function P(data | spikes), suppose that the data consist of one archetype, plus noise. And suppose that the noise is some Gaussian, independent of which spikes fired. We know all about this Gaussian from our measurement of the noise covariance. Then the likelihood is that distribution, evaluated at the difference between the actual waveform and the idealized one. [Pouzat et al. 2002]

Page 97: Inference in biological physics

Bayesian idea, II

We start with an experimental trace ("data"). We find its peak (absolute minimum) and start looking for a spike there. We ask for the likelihood ratio between the hypotheses of no spike versus one spike of given type, at given time, with given amplitude.

★ To compute the likelihood of no spike, evaluate the noise distribution on the trace.

★ To compute the probability of one spike, choose a spike archetype and a value of t, the spike time. Holding the "data" fixed, the probability is now a Gaussian function in the remaining parameter A, so it's fast and easy to marginalize over A.

Page 98: Inference in biological physics

[Nuts and Bolts]

Nelson notes III: Fitting 11

space, because C−1 is also slightly nonlocal in space. Again the key is “slightly.” Off diagonal

elements are small.

VI.C.2. Optimizing over amplitude

Suppose that we can also approximate the prior P ampl(A;µ) as a Gaussian.19 The advantage

of this assumption, combined with assumption (1), is that we can now optimize over amplitude

analytically (i.e. fast).

Thus

log P ampl(A;µ) = −12 log(2πσ2

µ) − (A−γµ)2

2σ2µ

We estimate γµ and σµ from data by computing the mean and variance of the amplitudes of

each of the exemplars contributing to cluster µ.

Let’s use a uniform prior20 for P time and see how the 1-spike log-probability function depends

on amplitude, for fixed time shift t1 and cell type µ. Let $V (t) be the observed waveform and$Fµ(t) a template.21 Recall that the boldface vectors above combine channel and time indices.

Thus

[δV]αt = Vα(t) − AFµα(t − t1)

Eqn. 2 now becomes22

logP1(µ,A, t1) = log Kµ − (A−γµ)2

2σ2µ

− 12(δV)tC−1(δV) (7)

Here Kµ = P cell(µ)P time(t1)(

2πσ2µ

)

−1/2. (We dropped some factors independent of µ.)

This quantity is maximal at A∗, where

0 = −A∗−γµ

σ2µ

+∑

α,t,β,t′

Fµα(t − t1)C−1α,t;β,t′(Vβ(t′) − A∗Fµβ(t′ − t1)) (8)

This equation is linear in A and hence trivial to solve. Define the adjoint

Gµβ(τ) =∑

α,t̂

Fµα(t̂ )Sα,β(τ − t̂ )

Then let

‖Fµ‖2 =

β,t′′

Gµβ(t′′)Fµβ(t′′)

19 cmultifit.m checks post hoc if this is true, and updates the means and variances based on the fit it found. I’llalso assume that these are all independent Gaussians for each µ; this too could be checked by cmultifit.m,but hasn’t been yet. Gaussianity is thought by some to be a bad approximation [Shoham03], but on closerinspection I see they’re just referring to amplitude changes during a burst [Fee et al].

20 It would not be hard to replace this by a no-refractory-violations prior, if we upgrade the analysis of collisionsto a Lewicki94 approach: Just before the final choice between competing spike train interpretations, clobberthe ones with violations.

21 Units: !F and !V have units µV. A is dimensionless.22 Paninski’s lecture notes 2007, eqn. 3, is essentially the same but is used in a very different framework.

which is a Gaussian in A. So it’s easy to marginalize over A: just complete the square! [Here doesn’t depend on A.]

Nelson notes III: Fitting 11

space, because C−1 is also slightly nonlocal in space. Again the key is “slightly.” Off diagonal

elements are small.

VI.C.2. Optimizing over amplitude

Suppose that we can also approximate the prior P ampl(A;µ) as a Gaussian.19 The advantage

of this assumption, combined with assumption (1), is that we can now optimize over amplitude

analytically (i.e. fast).

Thus

log P ampl(A;µ) = −12 log(2πσ2

µ) − (A−γµ)2

2σ2µ

We estimate γµ and σµ from data by computing the mean and variance of the amplitudes of

each of the exemplars contributing to cluster µ.

Let’s use a uniform prior20 for P time and see how the 1-spike log-probability function depends

on amplitude, for fixed time shift t1 and cell type µ. Let $V (t) be the observed waveform and$Fµ(t) a template.21 Recall that the boldface vectors above combine channel and time indices.

Thus

[δV]αt = Vα(t) − AFµα(t − t1)

Eqn. 2 now becomes22

logP1(µ,A, t1) = log Kµ − (A−γµ)2

2σ2µ

− 12(δV)tC−1(δV) (7)

Here Kµ = P cell(µ)P time(t1)(

2πσ2µ

)

−1/2. (We dropped some factors independent of µ.)

This quantity is maximal at A∗, where

0 = −A∗−γµ

σ2µ

+∑

α,t,β,t′

Fµα(t − t1)C−1α,t;β,t′(Vβ(t′) − A∗Fµβ(t′ − t1)) (8)

This equation is linear in A and hence trivial to solve. Define the adjoint

Gµβ(τ) =∑

α,t̂

Fµα(t̂ )Sα,β(τ − t̂ )

Then let

‖Fµ‖2 =

β,t′′

Gµβ(t′′)Fµβ(t′′)

19 cmultifit.m checks post hoc if this is true, and updates the means and variances based on the fit it found. I’llalso assume that these are all independent Gaussians for each µ; this too could be checked by cmultifit.m,but hasn’t been yet. Gaussianity is thought by some to be a bad approximation [Shoham03], but on closerinspection I see they’re just referring to amplitude changes during a burst [Fee et al].

20 It would not be hard to replace this by a no-refractory-violations prior, if we upgrade the analysis of collisionsto a Lewicki94 approach: Just before the final choice between competing spike train interpretations, clobberthe ones with violations.

21 Units: !F and !V have units µV. A is dimensionless.22 Paninski’s lecture notes 2007, eqn. 3, is essentially the same but is used in a very different framework.

of type . Define the deviation

Then the probability that one spike, of type , is present is

Nelson notes III: Fitting 11

space, because C−1 is also slightly nonlocal in space. Again the key is “slightly.” Off diagonal

elements are small.

VI.C.2. Optimizing over amplitude

Suppose that we can also approximate the prior P ampl(A;µ) as a Gaussian.19 The advantage

of this assumption, combined with assumption (1), is that we can now optimize over amplitude

analytically (i.e. fast).

Thus

log P ampl(A;µ) = −12 log(2πσ2

µ) − (A−γµ)2

2σ2µ

We estimate γµ and σµ from data by computing the mean and variance of the amplitudes of

each of the exemplars contributing to cluster µ.

Let’s use a uniform prior20 for P time and see how the 1-spike log-probability function depends

on amplitude, for fixed time shift t1 and cell type µ. Let $V (t) be the observed waveform and$Fµ(t) a template.21 Recall that the boldface vectors above combine channel and time indices.

Thus

[δV]αt = Vα(t) − AFµα(t − t1)

Eqn. 2 now becomes22

logP1(µ,A, t1) = log Kµ − (A−γµ)2

2σ2µ

− 12(δV)tC−1(δV) (7)

Here Kµ = P cell(µ)P time(t1)(

2πσ2µ

)

−1/2. (We dropped some factors independent of µ.)

This quantity is maximal at A∗, where

0 = −A∗−γµ

σ2µ

+∑

α,t,β,t′

Fµα(t − t1)C−1α,t;β,t′(Vβ(t′) − A∗Fµβ(t′ − t1)) (8)

This equation is linear in A and hence trivial to solve. Define the adjoint

Gµβ(τ) =∑

α,t̂

Fµα(t̂ )Sα,β(τ − t̂ )

Then let

‖Fµ‖2 =

β,t′′

Gµβ(t′′)Fµβ(t′′)

Let Vα(t) be the measured voltage on electrode α, and Fµα(t) be the archetype waveform of type µ. Define the deviation

    [δV]αt = Vα(t) − A Fµα(t − t1)

Then the probability that one spike, of type µ, is present is

    P(1 spike | data) = Kµ exp[ −(A − γµ)²/(2σµ²) − (1/2)(δV)ᵗ C⁻¹ (δV) ]

Here C is the noise covariance. (The slide reproduces the derivation from "Nelson notes III: Fitting," §VI.C.2, "Optimizing over amplitude":)

...space, because C⁻¹ is also slightly nonlocal in space. Again the key is "slightly": off-diagonal elements are small.

Suppose that we can also approximate the prior P^ampl(A; µ) as a Gaussian.[19] The advantage of this assumption, combined with assumption (1), is that we can now optimize over amplitude analytically (i.e., fast). Thus

    log P^ampl(A; µ) = −(1/2) log(2πσµ²) − (A − γµ)²/(2σµ²)

We estimate γµ and σµ from data by computing the mean and variance of the amplitudes of the exemplars contributing to cluster µ.

Let's use a uniform prior[20] for P^time and see how the 1-spike log-probability function depends on amplitude, for fixed time shift t1 and cell type µ. Let V(t) be the observed waveform and Fµ(t) a template.[21] Recall that the boldface vectors combine channel and time indices. Thus

    [δV]αt = Vα(t) − A Fµα(t − t1)

Eqn. 2 now becomes[22]

    log P1(µ, A, t1) = log Kµ − (A − γµ)²/(2σµ²) − (1/2)(δV)ᵗ C⁻¹ (δV)        (7)

Here Kµ = P^cell(µ) P^time(t1) (2πσµ²)^(−1/2). (We dropped some factors independent of µ.) This quantity is maximal at A∗, where

    0 = −(A∗ − γµ)/σµ² + Σ_{α,t,β,t′} Fµα(t − t1) C⁻¹_{αt;βt′} (Vβ(t′) − A∗ Fµβ(t′ − t1))        (8)

This equation is linear in A∗ and hence trivial to solve. Define the adjoint

    Gµβ(τ) = Σ_{α,t̂} Fµα(t̂) Sα,β(τ − t̂)

Then let

    ‖Fµ‖² = Σ_{β,t″} Gµβ(t″) Fµβ(t″)

[19] cmultifit.m checks post hoc whether this is true, and updates the means and variances based on the fit it found. I'll also assume that these are all independent Gaussians for each µ; this too could be checked by cmultifit.m, but hasn't been yet. Gaussianity is thought by some to be a bad approximation [Shoham03], but on closer inspection I see they're just referring to amplitude changes during a burst [Fee et al.].
[20] It would not be hard to replace this by a no-refractory-violations prior, if we upgrade the analysis of collisions to a Lewicki94 approach: just before the final choice between competing spike-train interpretations, clobber the ones with violations.
[21] Units: F and V have units of µV; A is dimensionless.
[22] Paninski's lecture notes 2007, eqn. 3, is essentially the same but is used in a very different framework.

JS Prentice, J Homann, KD Simmons, G Tkacik, V Balasubramanian, PCN, PLoS ONE 6(7): e19884 (2011).
Thursday, March 15, 2012
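The amplitude step above can be sketched numerically. This is a minimal illustration, not the cmultifit.m implementation: it assumes flattened (channel × time) vectors, a precision matrix Cinv standing in for C⁻¹, and hypothetical names throughout. Completing the square in A yields both the optimal amplitude A∗ of eqn. (8) and the A-marginalized log-probability.

```python
import numpy as np

def marginal_loglike(v, f, Cinv, gamma, sigma, logK=0.0):
    """Optimal amplitude and A-marginalized log-probability for one template.

    log P1(A) = logK - (A-gamma)^2/(2 sigma^2) - 0.5 (v - A f)^T Cinv (v - A f)
    is quadratic in A: collect it as -0.5*a*A^2 + b*A + c, then the Gaussian
    integral over A is analytic (complete the square).
    """
    a = 1.0 / sigma**2 + f @ Cinv @ f            # curvature in A
    b = gamma / sigma**2 + f @ Cinv @ v          # linear coefficient in A
    c = -0.5 * gamma**2 / sigma**2 - 0.5 * (v @ Cinv @ v)
    A_star = b / a                               # the linear equation for A* (cf. eqn. 8)
    logP = logK + 0.5 * np.log(2 * np.pi / a) + b**2 / (2 * a) + c
    return A_star, logP
```

With a well-matched template and a weak amplitude prior, the optimal amplitude recovers the true scale: for v = 2f, A∗ comes out close to 2.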

Page 99: Inference in biological physics

[Nuts and Bolts]

Next, we sweep over a range of t1 to find the best value of the likelihood ratio for this spike type. [We only check t1 values close to the peak of the event.]

Let Vα(t) be the measured voltage on electrode α, and Fµα(t) be the archetype waveform of type µ. Define the deviation

    [δV]αt = Vα(t) − A Fµα(t − t1)

Then the probability that one spike, of type µ, is present is

    P(1 spike | data) = Kµ exp[ −(A − γµ)²/(2σµ²) − (1/2)(δV)ᵗ C⁻¹ (δV) ]

which is a Gaussian in A. So it's easy to marginalize over A: just complete the square! [Here Kµ doesn't depend on A; C is the noise covariance.]

JS Prentice, J Homann, KD Simmons, G Tkacik, V Balasubramanian, PCN, PLoS ONE 6(7): e19884 (2011).

Page 100: Inference in biological physics

[Nuts and Bolts]

Next, we sweep over a range of t1 to find the best value of the likelihood ratio for this spike type. [We only check t1 values close to the peak of the event.]

Then we choose the winner among spike types.


Page 101: Inference in biological physics

[Nuts and Bolts]

Next, we sweep over a range of t1 to find the best value of the likelihood ratio for this spike type. [We only check t1 values close to the peak of the event.]

Then we choose the winner among spike types.

If the winner’s likelihood ratio is good enough (bigger than about 1), we say there’s a spike here. That’s the absolute criterion I promised earlier.
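The decision rule just described (sweep shifts, compare spike types, threshold the likelihood ratio) might look like the following sketch. It is not the paper's code: all names are hypothetical, the noise is summarized by a precision matrix Cinv, and circular shifts via np.roll stand in for proper time alignment. A ratio "bigger than about 1" corresponds to a log-ratio above 0.

```python
import numpy as np

def log_ratio(v, f, Cinv, gamma, sigma):
    # Amplitude-marginalized log-likelihood ratio: one spike with template f
    # and Gaussian amplitude prior (gamma, sigma) vs. noise only.
    # Obtained by completing the square in the amplitude A.
    a = 1.0 / sigma**2 + f @ Cinv @ f
    b = gamma / sigma**2 + f @ Cinv @ v
    return -0.5 * np.log(a * sigma**2) + b**2 / (2 * a) - gamma**2 / (2 * sigma**2)

def best_spike(v, templates, priors, Cinv, t_range):
    # Sweep shifts t1 near the event peak for each spike type mu,
    # then choose the winner among types; declare a spike only if its
    # log-likelihood ratio beats the absolute threshold (here 0).
    best = (None, None, -np.inf)
    for mu, f0 in enumerate(templates):
        gamma, sigma = priors[mu]
        for t1 in t_range:
            r = log_ratio(v, np.roll(f0, t1), Cinv, gamma, sigma)
            if r > best[2]:
                best = (mu, t1, r)
    mu, t1, r = best
    return (mu, t1, r) if r > 0.0 else (None, None, r)
```

On a synthetic event that is exactly a shifted template, the sweep recovers the correct type and shift and declares a spike.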


Page 102: Inference in biological physics

[Figure: grid of per-channel residual traces (horizontal axis ≈ 100 samples; vertical axis −80 to 20 µV); panel label: "mean r#2 N=9, meth=median".]

Test our assumptions

Can we really assume that the spikes from a particular cell differ only in overall amplitude? We took many events that contained a single spike of each type. Point by point in time, we subtracted the scaled, shifted archetype and found the residual (on each channel).

Green: the archetype itself. Red: mean deviation from archetype. Blue: standard deviation from archetype. We really do subtract spikes pretty completely.
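The residual check can be sketched as follows (a hypothetical helper, not the actual analysis script): for a stack of single-spike events with fitted amplitudes and shifts, subtract the scaled, shifted archetype and report the per-channel mean and standard deviation of what remains, i.e. the red and blue curves.

```python
import numpy as np

def residual_stats(events, archetype, amps, shifts):
    """Per-channel residual statistics after removing the fitted spike.

    events    : array (n_events, n_channels, n_samples)
    archetype : array (n_channels, n_samples)
    amps, shifts : fitted amplitude A and (circular) time shift t1 per event
    Returns the mean and std of the residual, point by point in time.
    """
    residuals = np.array([
        ev - A * np.roll(archetype, s, axis=1)      # subtract scaled, shifted archetype
        for ev, A, s in zip(events, amps, shifts)
    ])
    return residuals.mean(axis=0), residuals.std(axis=0)
```

If the fixed-shape assumption holds exactly, both curves are flat at zero; systematic structure in the mean would signal shape changes the amplitude model cannot absorb.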


Page 103: Inference in biological physics

Successfully fit overlaps

[Figure: grid of fitted overlap events (horizontal axis ≈ 300 samples; vertical axis −100 to 50 µV); panel label: "5058−−5063:6(18):LAN=3 R=429(396)".]

Thursday, March 15, 2012


Page 105: Inference in biological physics

Successfully fit overlaps

Top: close-up of four channels, showing three fit archetypes found by the algorithm. Bottom: the sum of those fits (color) versus the actual data (black).


Page 106: Inference in biological physics

[Figure: burst episode across electrode channels; label: 348:12(28):TM, N=25 R=3397(1360)]

Successfully fit bursts


Page 107: Inference in biological physics

Get data → Cluster → Fit → Interpret


Page 108: Inference in biological physics

Each cell has a receptive field...

200 µm

... and they tile the whole visual field. MEA recording is high throughput: We got dozens of cells all at once. Here are cells from just one functional group, “on cells.” Each putative receptive field is a single connected region of image space.

KD Simmons, JS Prentice, G Tkacik, J Homann, PCN, V Balasubramanian, submitted.

Region of retina responded to by ganglion cell #1, etc.


Page 109: Inference in biological physics

Receptive fields

Once you’ve got the spike trains, you can find receptive fields, etc. Here’s a typical spike-triggered average.

How interesting--guinea pig retina has a lot of these highly anisotropic receptive fields. The “surround” doesn’t surround the “center”!
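The spike-triggered average mentioned above is just an average of the stimulus frames preceding each spike. A minimal sketch, with hypothetical array shapes (not the actual analysis code):

```python
import numpy as np

def spike_triggered_average(stimulus, spike_times, window):
    """Average the stimulus frames preceding each spike.

    stimulus:    (n_frames, ny, nx) stimulus movie (hypothetical layout)
    spike_times: frame indices at which the cell fired
    window:      how many frames before each spike to average over

    Returns the (window, ny, nx) spike-triggered average; for an
    approximately linear cell this estimates the spatiotemporal
    receptive field.
    """
    # Collect the stimulus snippet preceding each spike, skipping spikes
    # too early in the recording to have a full window.
    snippets = [stimulus[t - window:t] for t in spike_times if t >= window]
    return np.mean(snippets, axis=0)
```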




Page 113: Inference in biological physics

Takehome Part III

Optics → Retina → Brain → Behavior

Visual scene in

Retinal ganglion cell spike trains

Retinal illumination pattern

Get data → Cluster → Fit → Interpret

I described how we identify the individual ganglion cell signals from a hash of noise and overlapping real signals:



Page 117: Inference in biological physics

Medical tests

Changepoint Analysis

Multi-Electrode Array

Poisson processes

Bayes formula

[Figure: burst episode across electrode channels; label: 348:12(28):TM, N=25 R=3397(1360)]

Theory can cut across apparently different kinds of experiment, offering useful methods to one domain from another without having to reinvent everything.

Full circle OK, I was a scatterbrain and gave you three talks. But wait -- if I can fill in the spaces
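The shared machinery really is small. As a reminder, Bayes’ formula applied to the medical-test setting from Part I can be written in one line; the numbers below are illustrative only, not from the talk:

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Bayes' formula: P(disease | positive test).

    prior:               P(disease) before testing
    sensitivity:         P(positive | disease)
    false_positive_rate: P(positive | healthy)
    """
    # Total probability of a positive result, over both hypotheses.
    p_pos = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_pos

# A rare condition with a fairly good test still yields a modest posterior:
print(posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05))  # ≈ 0.16
```

The same structure (prior × likelihood, normalized) underlies the changepoint and spike-sorting inferences.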



Page 121: Inference in biological physics

Wait, there’s more

Changepoint Analysis

Multi-Electrode Array

[Figure: burst episode across electrode channels; label: 348:12(28):TM, N=25 R=3397(1360)]

There is something weirdly -- unreasonably -- effective about approaching biological systems with a physical model. I don’t understand why. I don’t need to understand why.

A physical model -- photon theory -- helped us to extract what was going on.

A physical model -- localized spreading of potential changes in solution -- helped us to extract what was going on.

Poisson processes

Bayes formula



Page 129: Inference in biological physics

Go long

Often, when we want to justify theory, we scratch our heads and say, “Well, Hodgkin and Huxley was a big deal.” Indeed. But that sort of cherry-picking approach can leave the impression that theory is something that happens every 50 years or so. It’s also too reverent.

My point so far is that theory is needed every day. It’s our microscope; our Geiger counter; it helps us to see the invisible. To emphasize that, I didn’t select famous examples; instead I have told you about the two things I happen to be working on right now (a random choice, you’ll agree).

Another context in which theory enters laboratory discussions is, “We need some theory to get this thing published. Go do some theory, run some ANOVA, whatever.” I’d just like to suggest that this attitude, though common, misses out on some of what theory can do for you. In particular, a physical model can pay a lot of dividends.

We like to teach famous success stories in science, but we don’t always remember to present them as showcases of the utility of physical modeling.




Page 131: Inference in biological physics

Thanks

University of Pennsylvania

NSF DMR, BIO, IBN
NSF NSEC
National Eye Institute Training grant
Computational Neuroscience Training grant

For these slides see: www.physics.upenn.edu/~pcn
