www.sciencemag.org/cgi/content/full/338/6103/135/DC1 Supplementary Materials for Network Resets in Medial Prefrontal Cortex Mark the Onset of Behavioral Uncertainty Mattias P. Karlsson, Dougal G. R. Tervo, Alla Y. Karpova* *To whom correspondence should be addressed. E-mail: [email protected] Published 5 October 2012, Science 338, 135 (2012) DOI: 10.1126/science.1226518 This PDF file includes: Materials and Methods Figs. S1 to S10 References

Materials and methods

Subjects

12 male Long Evans rats (400-550g) were used to characterize behavior during the

described task. Of these, four animals (500-550g at implantation) were implanted with

microdrive arrays for electrophysiological recordings. Animals were kept at 85% of their

pre-restriction body weight and maintained on a 12 h light/12 h dark schedule.

Experiments were conducted according to National Institutes of Health guidelines for animal

research and were approved by the Institutional Animal Care and Use Committee at HHMI’s

Janelia Farm Research Campus.

Behavioral task design

The idea behind the task design was to create a setting in which the animal would

abruptly abandon old beliefs during identifiable periods of behavioral uncertainty, with the added

feature that these moments would be decoupled from abrupt changes in sensory input or

behavioral output. In such a setting, an abrupt change in neural activity is most likely to be a

neural correlate of the change in the internal state of belief. We took advantage of the

expectation that in a stochastic and unstable environment, evidence accumulates gradually but,

once there is sufficient information to indicate that the environment has indeed changed, old

beliefs are abandoned (reset) abruptly and exploration is initiated. Thus, abrupt and substantial

changes in neural activity, such as network resets, that occur at moments when neither the

statistics of the environment nor the behavioral output changes abruptly are likely to be neural

correlates of changes in the belief state. Our behavioral paradigm exploits this situation

particularly well for the following reasons:

1) Because of the stochastic nature of the action-outcome association, animals sample both

sides for a wide range of outcome probabilities. Thus, neural activity can be compared between

situations that are identical with respect to action (motor output) and sensory input, but different

in belief state.

2) While the reward probabilities may change abruptly, no single outcome is sufficient (unlike

in the case of deterministic rewards) to inform the animal of that change in the environment.

Instead, detection of that change requires gradual evidence accumulation and will thus be

delayed with respect to when the probability is switched. This separates in time the point of the

environmental change from the point when the animal becomes aware of it. Importantly, around

that latter point the local outcome history is statistically constant for trials to the same side.

3) The sequential presentation of the two behavioral options introduces additional stochasticity,

because the manifestation of a decision to re-sample the previously non-preferred option has to

wait, for a variable number of trials, until that option is presented by the computer. Therefore, a

sudden decision to abandon old beliefs will not always coincide with a change in behavioral

output associated with the decision to explore, a change that will result in consecutive trials to

one side being suddenly separated by more trials to the other side. Thus, if reset-like dynamics

are observed in the network activity even in the absence of a coincidental change in local history,

such dynamics can be most parsimoniously attributed to a change in the animal’s internal state.

While this design makes it somewhat harder to pinpoint, at the behavioral level, the exact

moment when the decision to explore has been made, it allows us to dissociate abrupt transitions

in network activity due to changes in the internal state of belief from those due to changes in

behavioral output and sensory input.

Behavioral apparatus

All behavior was confined to a box with 23 cm high plastic walls and stainless steel

floors (Island Motion Corp). The floor of the box was 25 cm by 34 cm, and the levers and nose

ports were all arranged on one of the short walls. All lights, nose ports, levers, and reward

deliveries were controlled and monitored with a custom-programmed microcontroller, which in

turn communicated via USB to a PC running a control program based on MATLAB (The

Mathworks, Inc.). Nose port entries were detected with an infrared beam-break detector (IR LED

and photodiode pair). The central initiation port contained one white LED that indicated the

option to initiate a new trial. Upon each trial initiation, the left and right levers were

pneumatically extended from the wall (both were retracted after one of the two levers was

pressed), and simultaneously one of two sounds was presented by two speakers (located on the

two 34 cm walls) with equal volume. The sounds were frequency modulated (1% modulation at

6.67 Hz) around a single base frequency. A base frequency of 6.5 or 14 kHz indicated, respectively,

that the left or the right lever was correct. The trial identity was random, with equal probability

for each tone being presented. The animals were required to stay in the initiation port for at least

250 ms in order to initiate a trial and for the tone to be played. The animal was then required to

exit the port, and upon exiting could not initiate a new trial for 500 ms. All behavior was video

recorded at 30 frames/sec using an infrared-sensitive camera. Except after incorrect lever

presses, the only visible light source inside the box was from the LED in the initiation port.

During error trials, a white LED array in the ceiling of the box was lit, and no trials could be

initiated during a 30 second timeout period. Liquid rewards (0.1 ml drops of 10% sucrose mixed

with black cherry Kool-Aid) were delivered from the reward ports 0.5 seconds after port entry

with a motorized syringe pump (Harvard Apparatus PHD 2000).

Behavioral training

Food-restricted animals were trained to perform the task with minimal ‘shaping’—from

the first moment of training, animals were exposed to the full task with four exceptions: 1)

animals only needed to press the correct lever once before reward became available, 2) reward

probabilities were kept at or above 0.5 for both sides, 3) the timeout period for pressing the

wrong lever was kept short (0.5 seconds) and 4) the time that was allowed to pass between

initiating the trial and pressing the lever, as well as between pressing the lever and collecting the

reward was 300 sec. At first, animals performed the correct sequence of actions rather

infrequently and by chance. However, after approximately 1-2 weeks of training, most animals

learned the entire task structure, including the presence of the option to reject trials and of

unsignalled changes in reward contingencies. After each animal successfully completed two trial

blocks in one session, the number of required lever presses was gradually increased to 5, the

timeout period was increased to 30 seconds, and reward probabilities became randomly drawn

from a set spanning low and high values (0.15, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7 and 0.8) at block

transitions. Reward probabilities associated with the two sides were selected independently. This

design feature made it difficult for the rat to infer the identity of the new and more profitable

option simply from the change in reward probability of the preferred option, thus prompting

exploratory bouts. Animals were considered proficient on the task when they preferentially

rejected the less profitable side and would dynamically update this rejection policy when reward

probabilities changed. Of 13 attempts to train animals with the described method, 12 became

proficient within one month of training. To prevent animals from predicting when reward

probability changes would occur, the number of trials within each block was drawn randomly

from a Gaussian distribution (mean 250 with standard deviation of 25). Other than the changed

reward probabilities, no cues were presented to indicate block transitions.
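The block-length draw described above can be sketched as follows. This is an illustration only, not the authors' MATLAB code; the function name, the rounding to whole trials, and the lower bound of one trial are assumptions that the text does not specify.

```python
import random

def draw_block_lengths(n_blocks, mean=250.0, sd=25.0, seed=None):
    """Draw the number of trials in each block from a Gaussian distribution.

    Rounding to an integer and the floor of 1 trial are assumptions;
    the source only states mean 250 and standard deviation 25.
    """
    rng = random.Random(seed)
    return [max(1, round(rng.gauss(mean, sd))) for _ in range(n_blocks)]

# Example: lengths of four consecutive blocks
print(draw_block_lengths(4, seed=0))
```

With a standard deviation of 25 trials, about 95% of blocks fall between 200 and 300 trials, so the animal cannot count its way to a block transition.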

Rejection-rate analysis around behavioral transitions

Large shifts in choice preference were identified using change point analysis on the

stream of accepted trials (18). The identity of accepted trials was represented as 1s and 0s (for

left and right trials, respectively). The cumulative sum of the difference between each value and

the average session value was calculated along the entire behavioral time series. The presence of

a change point was inferred if the maximum deflection from zero exceeded that of 99% of

200 bootstraps (random reordering of data points). The location of the change point was taken as

the data point with the highest absolute value of the cumulative sum. Following the detection of

a first change point, the acceptance data stream was split in two (before and after the change

point) and the process was repeated for each segment. This iterative process continued until no

further change point was detected. The change points identified in the stream of accepted trials

were then mapped back onto the full data set. Only those change points that were separated by at

least 70 trials were kept for the rejection rate analysis. The rejection rate was then computed in 5-

trial bins around the detected change points and normalized to the average rejection rate for that

session.
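The change point procedure above (cumulative sum, bootstrap threshold, recursive splitting) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the bootstrap is a plain shuffle of the segment, and the 70-trial separation filter and the mapping back to the full trial stream are left out.

```python
import random

def cusum(values):
    """Cumulative sum of deviations from the mean of the segment."""
    mean = sum(values) / len(values)
    total, path = 0.0, []
    for v in values:
        total += v - mean
        path.append(total)
    return path

def find_change_point(values, n_boot=200, confidence=0.99, rng=None):
    """Index of the largest |CUSUM| if it beats 99% of shuffles, else None."""
    if rng is None:
        rng = random.Random(0)
    if len(values) < 2:
        return None
    path = cusum(values)
    observed = max(abs(s) for s in path)
    shuffled = list(values)
    n_smaller = 0
    for _ in range(n_boot):
        rng.shuffle(shuffled)  # random reordering of the data points
        if max(abs(s) for s in cusum(shuffled)) < observed:
            n_smaller += 1
    if n_smaller / n_boot < confidence:
        return None
    return max(range(len(path)), key=lambda i: abs(path[i]))

def find_all_change_points(values, offset=0, rng=None):
    """Split at each detected change point and repeat on both segments."""
    if rng is None:
        rng = random.Random(0)
    idx = find_change_point(values, rng=rng)
    if idx is None:
        return []
    left = find_all_change_points(values[:idx + 1], offset, rng)
    right = find_all_change_points(values[idx + 1:], offset + idx + 1, rng)
    return sorted(left + [offset + idx] + right)

# 1 = accepted left trial, 0 = accepted right trial
accepted = [1] * 100 + [0] * 100
print(find_all_change_points(accepted))
```

The returned index is the last trial before the shift in preference, counted within the stream of accepted trials.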

Reaction-time analysis around behavioral transitions

Reaction time for acceptance trials was defined as the time between engaging the

initiation port and pressing of the lever. Reaction time was computed for all accepted trials in 10-

trial bins around the detected behavioral change points (see above) and normalized to the average

acceptance trial reaction time for that session.

Implant preparation and surgery

For neural recordings, a microdrive array containing 20 independently movable tetrodes

(29) was chronically implanted on the head of the animal. Each tetrode was constructed by

twisting and fusing together four insulated 13 μm wires (stablohm 800A, California Fine Wire).

Each tetrode tip was gold-plated to reduce the impedance, yielding values of 200-300 kΩ at 1

kHz. Within the implant, the tetrodes converged to a circular bundle (1.9 mm diameter), angled

20° with respect to vertical (pointing towards the midline). This angle allowed the implant to be

positioned laterally relative to the mPFC to avoid puncturing the midline sulcus.

For surgery, trained animals were initially anaesthetized with 5% isoflurane gas (2.5

L/min) and 0.03 mg/kg buprenorphine. After 10-15 minutes, isoflurane was reduced to 0.5-1.0%

and the flow rate to 0.5 L/min. A local anesthetic (Bupivacaine) was injected under the skin 10

minutes before making an incision. The microdrive array was implanted such that the tetrode

bundle was centered 3.0 mm anterior and 1.8 mm lateral to bregma (right hemisphere). Small

stainless steel bone screws and dental cement were used to secure the implant to the skull. One of

the screws was connected to a wire leading to system ground. Before the animal woke up, all

tetrodes were advanced into the brain.

Tetrode positioning

Over a period of two weeks following the surgery, the tetrodes were gradually advanced

to a depth of 2 mm along the 20° trajectory, moving approximately 160 μm/day. During this

time, animals were acclimatized to performing the task while being tethered to the recording

system. When performance on the task regained pre-surgery levels (in terms of motivation and

dynamic rejection behavior), recording sessions began. After each recording session, any

tetrodes that did not appear to have any isolatable units were advanced by 80 μm. No adjustment

was made within 12 hours prior to each recording session. Once a tetrode had been moved a total

of 2.5 mm from the surface, which is the approximate border between anterior cingulate and

prelimbic cortices (fig. S2), it was no longer advanced.

Recording sessions and data preprocessing

Each recording session lasted 1.5 to 3 hours, depending on the animal’s motivation to

perform. Animals were not forced to perform the task and occasionally took breaks (generally

around 5 minutes, but sometimes up to 30 minutes). Combining neural recordings with behavior

imposes limitations on the length and number of recording sessions, prompting us to simplify the

task structure to emphasize exploratory bouts in the following way. We limited the reward

structure to reversals of high/low probabilities, but used animals that had previously been trained

on more complex probability pairs. Reversals of high/low probabilities were used to make

exploratory bouts stand out. Keeping the high/low probabilities at 0.5 and 0.25 respectively was

usually sufficient for this purpose, but higher contrast (0.6/0.2 or 0.7/0.3) was occasionally

required. At the end of each block, both probabilities were switched, such that the previously

unfavorable side was now favorable. Despite the simplification of the block structure, animals

continued to go through a transient period of exploration once they detected a change in reward

contingencies, likely due to their prior training. Although animals quickly transitioned to a stable

strategy of preferential acceptance of one trial type, presumably due to the relatively large

difference in reward probabilities associated with selection of the two sides, exploratory bouts

were well-defined.

Data were collected with an Nspike data acquisition system (L. Frank, UC San Francisco,

and J. MacArthur, Harvard Instrumentation Design Laboratory). Signals were first amplified on

the animals using a small unity-gain preamplifier array. Then, signals were carried to the Nspike

system via a bundle of 80 fine wires (Cooner Wire) to be digitized and processed. An infrared

diode array with a large and a small group of diodes was attached to the animal's preamplifier

array and the animal's position in the environment was reconstructed using semi-automated

analysis of a digital video recording of the experiment with custom-written software. Spike data

were sampled at 30 kHz, band-pass filtered between 600 Hz and 6 kHz (2 pole

Bessel) and all above-threshold events were saved to disk. Local field potential data from all

tetrodes were sampled continuously at 1.5 kHz, digitally filtered between 0.5 and 400 Hz and

saved to disk.

After neural data were collected, individual units on each tetrode were identified by

manually classifying spikes using polygons in two-dimensional views of waveform parameters

(Matclust, M.K.). For each channel of a tetrode, the parameters used were peak waveform

amplitude and the waveform’s projection onto the first two principal components computed

across the session. We also used autocorrelation analysis to exclude units with non-physiological

single-unit spike trains. Only units where the entire cluster was visible throughout the recording

session were included. Thus, a unit was not isolated for further analysis if any part of the cluster

vanished into the noise or was cut off by the recording threshold. The quality of each cell’s

isolation was assessed using standard measures, Lratio and isolation distance (30).

Cell selection

Only cells that were active during performance of the task were analyzed. This

assessment was performed independently for each of the three analysis intervals within the trial.

For each trial, we computed the firing rate of each cell (number of spikes/1 sec) during the 1-

second interval centered on each of three behavioral analysis points (analysis points 1, 2, and 3;

Fig. 2A caption and see below). As a form of low-pass filtering for activity dynamics, we applied

a slight smoother across trials (Gaussian smoother with 1 trial standard deviation) in order to

reduce false detections of ensemble fluctuations due to rate-measurement variability. Any cell

with mean firing rate below 1 Hz in the 1-second time interval (calculated across the entire

session) was excluded from the pool of analyzed cells for that interval. No effort was made to

distinguish excitatory and inhibitory neurons. Four to nineteen neurons active during the

performance of the task were recorded simultaneously (fig. S3), revealing a variety of response

profiles (see Fig. 2A for examples). The total number of neurons included in the study was 320.

Visualizing network transitions

For visualization purposes only (Figs 2A and 3A) time was aligned to the three analysis

points within the trial and time between these alignment points was stretched or compressed to

allow visualization of activity across the entire trial while still maintaining alignment. Spikes

were binned in 200 ms bins (in stretched/compressed time relative to the average trial timing

between analysis points) and firing rates were computed by dividing the number of spikes by the

actual time that went into the bin (time occupancy normalized rate).

Quantitative detection of network transitions

To detect such transitions, we characterized local changes in the firing of individual

neurons and determined when these changes were unexpectedly large across the recorded

ensemble. We minimized the possibility that observed changes in neural activity simply reflect

changes in local behavior by performing all analyses separately on left- and right-bound

acceptance trials. Furthermore, for each trial, the analysis of activity changes was limited to

points (analysis points 1, 2, and 3; Fig. 2A caption) when the animal’s spatial trajectories were

highly stereotyped (fig. S5). These three points represented 1s windows around: 1) the initiation

of a new trial, 2) the lever press, and 3) the reward checking.

The slope in the normalized firing rate for each cell was calculated for each analysis point

using a ten-trial sliding window (Fig. 2B). For each position of the sliding window, the slope of

the firing rate change (for each of the three analysis points) was calculated using linear

regression analysis (MATLAB’s ‘regress’). The total change in firing rate over the 9 time steps

within the ten-trial window was estimated by multiplying the slope by 9. This value was then

normalized by the mean firing rate of the cell at the corresponding analysis point (norm change =

total rate change/mean rate). A steep slope (rising or falling) indicated that the neuron’s firing

rate changed rapidly. In the example in Figure 2, the firing rate of cell 5 decreased by close to its

mean firing rate around trial 58 at all three analysis time points (Fig 2B, right panel). The median

absolute firing rate change across the population was used to gauge how widespread a change in activity

was in the network (Fig. 2C, top panel). With each step of the sliding detection window, we

calculated the probability of observing an equal or larger population-wide fluctuation in firing

rate by chance alignment of uncorrelated fluctuations in single-cell activity. For significance

testing, we performed the same calculation 100 times with scrambled trial order to determine the

probability of observing a population-wide fluctuation in firing rate within the particular analysis

window with equal or greater median change due to chance alignment of ongoing single-cell

variability in activity. Independently for each of the three analysis points, we tallied up the

number of ten-trial windows across the 100 scrambled sessions that had a median change value

equal to or greater than that observed in the non-scrambled data, and divided it by the total

number of windows in the 100 scrambled sessions (100 × number of ten-trial windows in the

session). A session with 510 trials would provide 500 ten-trial comparison windows; after 100

session scrambles, this would provide 50000 comparison windows. Under the assumption that

the firing rates across the three analysis points were independent, the three probabilities were

then multiplied together to estimate the joint probability of change. To estimate the expected

frequency of the observed network transition for one session, the computed probability was

multiplied by the total number of trial windows in the session. If the expected frequency was

below 0.05/session, we used a peak finder algorithm (PeakFinder for MATLAB, N.Yoder) to

center the detection window on the point with the lowest expected frequency. This window was

later used when assessing the behavioral correlates of the network transition. Our metric revealed

clear moments when the observed population-wide changes in activity were unexpectedly large

(Fig. 2C).
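The detection metric described in this section can be sketched in Python for a single analysis point. This is an illustration under assumptions, not the authors' MATLAB code: all function names are hypothetical, a closed-form least-squares slope stands in for MATLAB's 'regress', and trial order is scrambled independently for each cell, which the text does not state explicitly.

```python
import random
from statistics import median

def window_slope(rates):
    """Least-squares slope of firing rate versus trial index in one window."""
    n = len(rates)
    x_mean = (n - 1) / 2
    y_mean = sum(rates) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(rates))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den

def normalized_change(rates, window=10):
    """Slope x (window - 1) / mean rate, for every sliding-window position."""
    mean_rate = sum(rates) / len(rates)
    return [window_slope(rates[s:s + window]) * (window - 1) / mean_rate
            for s in range(len(rates) - window + 1)]

def population_change(cells, window=10):
    """Median absolute normalized change across cells, per window."""
    per_cell = [normalized_change(r, window) for r in cells]
    n_windows = len(per_cell[0])
    return [median(abs(c[w]) for c in per_cell) for w in range(n_windows)]

def chance_probability(cells, n_scrambles=100, window=10, seed=0):
    """Fraction of scrambled-session windows whose median change is at
    least as large as the observed one, for each observed window."""
    rng = random.Random(seed)
    observed = population_change(cells, window)
    null = []
    for _ in range(n_scrambles):
        # Assumption: each cell's trial order is scrambled independently.
        scrambled = [rng.sample(r, len(r)) for r in cells]
        null.extend(population_change(scrambled, window))
    return [sum(v >= obs for v in null) / len(null) for obs in observed]

def expected_frequency(p1, p2, p3, n_windows):
    """Joint probability over the three analysis points, assumed to be
    independent, scaled by the number of windows in the session."""
    return p1 * p2 * p3 * n_windows
```

A window would be flagged as a network transition when `expected_frequency` falls below 0.05 per session; the peak-finding step that centers the window is not shown.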

Determining behavioral correlates of network transitions

We focused our analysis on three behavioral variables: side preference (left vs. right

acceptance trials), reward (including both types of trials), and trial rejection. Because these

parameters were binary, we could obtain an instantaneous estimate of the underlying behavioral

parameters by smoothing with a Gaussian (standard deviation of 3 trials, including all trial

types). The change of each behavioral parameter within a 10-trial window was calculated using

linear regression (MATLAB’s ‘regress’) to fit a straight line through the 10 behavioral values

(from one trial type) in the window.
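The smoothing step that turns a binary behavioral variable into an instantaneous estimate can be sketched as below. The function name is hypothetical, and renormalizing the kernel at the edges of the session is an assumption; the text only specifies a Gaussian with a standard deviation of 3 trials.

```python
import math

def gaussian_smooth(values, sd=3.0):
    """Gaussian-weighted running average across trials.

    Weights are renormalized at each trial so that the first and last
    trials of the session are not biased toward zero (an assumption).
    """
    n = len(values)
    out = []
    for i in range(n):
        weights = [math.exp(-0.5 * ((j - i) / sd) ** 2) for j in range(n)]
        out.append(sum(w * v for w, v in zip(weights, values)) / sum(weights))
    return out

# 1 = rejected trial, 0 = accepted trial
rejected = [0] * 20 + [1] * 20
rate = gaussian_smooth(rejected)
print(round(rate[10], 3), round(rate[19], 3), round(rate[30], 3))
```

The change of the smoothed variable within a 10-trial window is then the slope of a straight line fitted to the 10 smoothed values.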

We used a generalized linear model (31) with a binomial link function (‘glmfit’ in

MATLAB) to compare the predictive power of reward dynamics to that of rejection dynamics.

We excluded a randomly selected 10% of the behavioral windows during model training for

subsequent testing of the models. During model testing, we used the behavioral measures in

those 10% of windows as model predictors of whether or not a network transition would occur

(‘glmval’ in MATLAB). The model’s predictions were compared to actual detection outcomes.

This procedure was repeated ten times (using a new random test set for each repeat) to build up a

test set equal in size to the original dataset (10-fold cross validation (32)). A receiver operating

characteristic (ROC) curve was calculated for each of the three behavioral predictors using

MATLAB’s ‘perfcurve’, with false and true positive rate on the X- and Y-axis, respectively. The

area under the ROC curves was used to compare performance of the two behavioral predictors.

To estimate the variance of the measure, the entire 10-fold cross validation procedure was

repeated 100 times for each behavioral predictor.
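The cross-validated comparison can be sketched in Python for one behavioral predictor. This is a stand-in, not the authors' procedure: a one-predictor logistic regression fitted by gradient ascent replaces MATLAB's 'glmfit', the area under the ROC curve is computed from pairwise comparisons rather than with 'perfcurve', and the ten test sets are disjoint folds. All names are hypothetical, and the predictor is assumed to be roughly unit scale.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, n_iter=500, lr=0.5):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by gradient ascent on the
    mean log-likelihood (assumes x is roughly unit scale)."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(n_iter):
        err = [yi - sigmoid(b0 + b1 * xi) for xi, yi in zip(x, y)]
        b0 += lr * sum(err) / n
        b1 += lr * sum(e * xi for e, xi in zip(err, x)) / n
    return b0, b1

def roc_auc(scores, labels):
    """Probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def cross_validated_auc(x, y, n_folds=10, seed=0):
    """Train on 90% of the windows, predict the held-out 10%, repeat for
    every fold, then score all held-out predictions together."""
    rng = random.Random(seed)
    order = list(range(len(x)))
    rng.shuffle(order)
    scores = [0.0] * len(x)
    for k in range(n_folds):
        test = order[k::n_folds]
        held_out = set(test)
        train = [i for i in order if i not in held_out]
        b0, b1 = fit_logistic([x[i] for i in train], [y[i] for i in train])
        for i in test:
            scores[i] = sigmoid(b0 + b1 * x[i])
    return roc_auc(scores, y)
```

Here `x` would hold the change in one behavioral measure per window and `y` would be 1 where a network transition was detected. Repeating the call with different seeds gives the spread of the area under the curve.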

Calculation of network participation

For each detected network transition, we calculated the proportion of active cells that

exceeded in the detection window a normalized change of either 0.75 or 1.0 of their mean firing

rate (calculated across the session). Any cell that exceeded this threshold for at least one of the

three analysis points was considered a participant in the network transition. The total number of

active cells was defined as the number of cells that were active in at least one of the three

analysis points (at least 1 Hz mean rate across the session).
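The participation measure can be sketched as follows. The function name is hypothetical, and using the absolute value of the normalized change (so that both increases and decreases count) is an assumption consistent with the rising-or-falling slopes described earlier.

```python
def participation(norm_changes, threshold=0.75):
    """Fraction of active cells whose normalized change exceeds the
    threshold at one or more of the three analysis points.

    `norm_changes` has one entry per active cell; each entry holds the
    cell's normalized change at analysis points 1, 2 and 3.
    """
    participants = sum(
        any(abs(change) > threshold for change in cell)
        for cell in norm_changes
    )
    return participants / len(norm_changes)

# Four active cells, three analysis points each
changes = [[0.1, 0.2, 0.9], [0.1, -0.8, 0.0], [0.2, 0.3, 0.1], [0.0, 0.0, 0.0]]
print(participation(changes, threshold=0.75))
```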

Analysis of network transition abruptness

We characterized how abrupt the activity change was by examining the network

dynamics across the 10 trials surrounding each network transition by representing each trial as a

point in a multidimensional space where each dimension corresponded to the firing rate of an

individual cell. If the majority of the cells abruptly change their firing between the same two

trials, the 10 points would fall into two easily separable clusters with the network suddenly

jumping from one cluster to another within a single trial. If, on the other hand, individual cells

modify their activity more gradually or suddenly but on different trials, the 10 points would form

a more spread-out distribution in this space. One way to test this is to force the 10 points into two

clusters (k-means classifier algorithm) without regard for the temporal order of the points and

measure the distance from each point to the centroid of each cluster. As one moves along the trial

axis, the distance to the first cluster would abruptly increase in the first case but grow more

gradually in the second. We used a multi-dimensional k-means classification algorithm

(MATLAB’s ‘kmeans’ with the number of clusters set to 2) to assign each trial in the 10-trial

detection window to one of two states. The rate values for each cell were taken from the three 1-

second analysis points, normalized by that cell’s mean firing rate in each of those intervals.

Because each cell contributed 3 measurement points per trial, the dimensionality of the clustering

space was 3 x N, where N is the number of active isolated cells. The relative ensemble distance

between the two cluster centroids was defined as d1/(d1+d2), where d1 and d2 are the Euclidean

distances to the centroids of cluster 1 and cluster 2, respectively. For the majority of network

transitions, a state transition occurred that was centered in the 10-trial window, but occasionally

the state transitions were significantly off-center in the detection window. To preserve

confidence in the timing of the network transitions, the 87 detected transitions included only

those where the k-means state switch occurred between trials 4-5, 5-6, or 6-7 in the detection

window. This excluded 10 transitions from further analysis.
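The clustering step and the relative distance d1/(d1+d2) can be sketched in Python. This is a minimal two-cluster k-means written for illustration, not MATLAB's 'kmeans': seeding the centroids with the first and last trial of the window is an assumption, and all names are hypothetical.

```python
import math

def centroid(points):
    return [sum(coord) / len(points) for coord in zip(*points)]

def two_means(points, n_iter=100):
    """Two-cluster k-means; centroids start at the first and last trial."""
    centroids = [list(points[0]), list(points[-1])]
    labels = None
    for _ in range(n_iter):
        new_labels = [
            0 if math.dist(p, centroids[0]) <= math.dist(p, centroids[1]) else 1
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for k in (0, 1):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:
                centroids[k] = centroid(members)
    return labels, centroids

def relative_distance(points):
    """Per-trial d1 / (d1 + d2): 0 at centroid 1, 1 at centroid 2."""
    labels, (c1, c2) = two_means(points)
    out = []
    for p in points:
        d1, d2 = math.dist(p, c1), math.dist(p, c2)
        out.append(d1 / (d1 + d2) if d1 + d2 > 0 else 0.5)
    return labels, out

def switch_position(labels):
    """Trial i (1-based) such that the only state switch lies between
    trials i and i+1; None if there is no switch or more than one."""
    switches = [i + 1 for i in range(len(labels) - 1) if labels[i] != labels[i + 1]]
    return switches[0] if len(switches) == 1 else None

# Each trial is a point with 3 x N coordinates (here N = 1 cell)
window = [[1.0, 1.0, 1.0]] * 5 + [[3.0, 3.0, 3.0]] * 5
labels, rel = relative_distance(window)
print(switch_position(labels), rel)
```

Under the inclusion rule in the text, a transition would be kept when `switch_position` returns 4, 5 or 6.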

Local field potential analysis

Local field potential was analyzed using time frequency analysis. For each 1-second

analysis point within a trial, the average power at frequencies ranging from 5 Hz to 140 Hz was

calculated, using 5 Hz steps. The average power was then compiled for three frequency ranges:

theta (5-10Hz), low gamma (25-55Hz), and high gamma (65-140Hz). The 55-65 Hz frequency

band was ignored to avoid potential contamination from 60 Hz electrical noise. For each of the

three frequency ranges, the average power was normalized as a percentile for the session, using

data from the same analysis point across all same-side trials in the session. Mean percentiles for

each trial in the 10-trial detection window (3 analysis points, and 3 frequency ranges) were then

calculated across all 87 network transitions. Significance of any change in the LFP power on an

individual trial was established by comparing the values in that trial to the values for the other 9

trials (Wilcoxon rank sum test with the final p value corrected for the number of comparisons).
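The band averaging and the percentile normalization can be sketched as below. The names are hypothetical, and treating both edges of each band as inclusive is an assumption; with 5 Hz steps this leaves out only the 60 Hz bin.

```python
BANDS = {"theta": (5, 10), "low_gamma": (25, 55), "high_gamma": (65, 140)}

def band_power(power_by_freq):
    """Average power within each band.

    `power_by_freq` maps frequency in Hz (5, 10, ..., 140) to the power
    measured in one 1-second analysis window.
    """
    out = {}
    for name, (lo, hi) in BANDS.items():
        vals = [p for f, p in power_by_freq.items() if lo <= f <= hi]
        out[name] = sum(vals) / len(vals)
    return out

def percentile_rank(value, reference):
    """Percentage of reference values that lie strictly below `value`."""
    return 100.0 * sum(r < value for r in reference) / len(reference)

# Toy spectrum where power equals frequency
spectrum = {f: float(f) for f in range(5, 145, 5)}
print(band_power(spectrum))
```

In the analysis described above, `reference` would hold the band power at the same analysis point for all same-side trials in the session.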

Analysis of variability in activity around network transitions

We measured relative trial-to-trial variability in network activity in ten-trial windows

centered on trials before and after each network transition. For the initial characterization of trial-

to-trial variability in the activity of each individual cell, we represented this activity on each trial

as a point in a three dimensional space, where each dimension represented mean firing rate of the

cell around one of the three analysis points within the trial. As a measure of trial-to-trial

variability in the activity of that cell within a particular ten-trial window of interest, we took the

mean Euclidean distance between all pair-wise combinations of the associated ten points in this

three dimensional space. This individual-cell measure of trial-to-trial variability in activity within

the ten-trial window of interest was then compared (as a percentile) to the same measure across

all the other ten-trial windows in the corresponding session. Doing this analysis separately for

each cell ensured proper normalization for individual firing rates. The mean percentile for all

cells that participated in the network transition (based on a 0.75 activity change threshold) was

subsequently computed at each window location. Finally, this network transition variability

measure was averaged at each window location across all transitions. Large deviations from the

50th percentile (when the lower SEM range of the plot was above the 50th percentile range) were

taken to indicate a significant change in trial-to-trial variability of network activity.
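The single-cell variability measure can be sketched in Python. The function names are hypothetical, and the percentile is computed as the share of other windows with a strictly smaller value, which is one reasonable reading of the text.

```python
import math
from itertools import combinations

def mean_pairwise_distance(points):
    """Mean Euclidean distance over all pairs of per-trial points."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

def variability_percentile(cell_points, start, window=10):
    """Percentile of the window starting at `start` relative to all
    other ten-trial windows in the session.

    `cell_points` holds one 3-D point per trial: the cell's firing rate
    at analysis points 1, 2 and 3.
    """
    target = mean_pairwise_distance(cell_points[start:start + window])
    others = [
        mean_pairwise_distance(cell_points[s:s + window])
        for s in range(len(cell_points) - window + 1)
        if s != start
    ]
    return 100.0 * sum(o < target for o in others) / len(others)

# A cell that is steady except for ten trials of alternating rates
steady = [(1.0, 1.0, 1.0)] * 20
noisy = [(0.0, 0.0, 0.0), (2.0, 2.0, 2.0)] * 5
print(variability_percentile(steady + noisy + steady, start=20))
```

Averaging this percentile over participating cells, and then over transitions, gives the network-level curve described above.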

Analysis of ensemble states

To compare ensemble states before and after these transient periods of network plasticity, we

computed a trial-to-trial similarity matrix, using the Euclidean distance in the multidimensional

space representing the state of the network as our measure of state similarity. To compare the

new stable ensemble state to the state before the network transition, we performed a 2-group

comparison of specific values within the similarity matrix. Because the matrix is symmetrical,

only values above the diagonal in the matrix were used to avoid duplication. In the first group,

we included all within-state distances (using trials -20 to -1 for state 1 and trials 16 to 35 for state

2; where trial 0 was when the network transition was detected). In group 2 we included all

between-state distances (similar to the blue dashed box in Fig. 4C, except only 20 trials were

included in each group). 15 trials separated the two states. This was based on the average time

needed for the cell activity to stabilize (Fig. 4B). For each transition, we used a Wilcoxon rank

sum test to examine whether the first group of distances was significantly smaller than the

second group. If so, this was interpreted as the average distance between the two groups being

greater than the average within-state distance, and therefore that the new state was significantly

different from the previous state. 73 network transitions were analyzed, as the remaining events

either occurred too early or too late in the session to perform the calculation.
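The construction of the two distance groups can be sketched as follows. The function name is hypothetical; the trial ranges follow the text (trials -20 to -1 and trials 16 to 35 relative to the detected transition). The Wilcoxon rank sum test itself is not reproduced here and would be applied to the two returned lists.

```python
import math
from itertools import combinations

def state_distance_groups(trial_vectors, t0):
    """Return (within-state, between-state) Euclidean distances.

    `trial_vectors` holds one network-state vector per trial and `t0`
    is the index of the trial at which the transition was detected.
    """
    if t0 < 20 or t0 + 36 > len(trial_vectors):
        raise ValueError("transition too close to the start or end of the session")
    state1 = trial_vectors[t0 - 20:t0]        # trials -20 to -1
    state2 = trial_vectors[t0 + 16:t0 + 36]   # trials 16 to 35
    # Each unordered pair is used once, matching the use of values
    # above the diagonal of the symmetric similarity matrix.
    within = [math.dist(a, b)
              for state in (state1, state2)
              for a, b in combinations(state, 2)]
    between = [math.dist(a, b) for a in state1 for b in state2]
    return within, between

session = [[0.0, 0.0]] * 30 + [[1.0, 1.0]] * 40
within, between = state_distance_groups(session, t0=25)
print(len(within), len(between))
```

The range check mirrors the exclusion of transitions that occurred too early or too late in the session.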

Lesions and histology

Recording sites were assessed by creating small electrolytic lesions with the tetrodes

before the animal was euthanized. 10 µA of current was applied for 14 seconds to each tetrode

before they were retracted from the brain. Then, animals were euthanized, and brains were fixed

with 4% paraformaldehyde, sectioned (50 µm coronal sections), and stained with cresyl violet.

[Fig. S1 plot: panels A and B; x-axis: Trial; y-axes: Reward prob. and Rejection rate (0 to 1); legend: Left side, Right side, Left trials, Right trials, Accepted trial, Rejected trial; shaded region: Low contrast.]

Fig. S1. Additional examples of rejection behavior. (A) Rejection behavior for blocks of different reward contrast. Top panel shows the reward probabilities for the right and left trial types. Bottom panel shows the smoothed rejection rates for each trial type (Gaussian smoother with σ=3 trials). Note lower rejection rate for both trial types during low reward contrast period (grey background). (B) Example of a change in choice preference at a block transition without a transient decline in trial rejection. Note an abrupt switch in side preference at trial 885.

A B

AC

PL

M2

1 cm Bregma 2.5 mm

Fig. S2. Recording details. (A) Top view of the microdrive array, showing the 20 shuttles surrounding a connector array. Each shuttle drives a single tetrode into the brain. (B) Coronal section (cresyl violet stain) showing where the ends of two tetrodes were at the end of the experiment (for animal 1). Lesions were made with current injection to make the tetrode locations visible in the brain slices. At the end of each experiment, most tetrodes were located near the border of the anterior cingulate (AC) and prelimbic (PL) cortices.

[Figure S3. Histogram: number of sessions (0 to 6) vs. number of cells active during task performance (0 to 20).]

Fig. S3. Number of cells recorded per session. Only cells that had an average firing rate of 1.0 Hz or more in at least one of the six analysis points (3 for right-bound trials and 3 for left-bound trials) were counted. All other cells were included neither here nor in any further analysis.
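
The inclusion criterion in the caption amounts to a simple filter, sketched below. The cell names and firing rates are hypothetical; only the 1.0 Hz threshold and the six analysis points come from the caption.

```python
def is_task_active(rates_hz, threshold_hz=1.0):
    """True if the cell reaches the threshold rate at any analysis point."""
    return any(rate >= threshold_hz for rate in rates_hz)

# Hypothetical mean firing rates (Hz) at the six analysis points:
# three for right-bound trials followed by three for left-bound trials.
cells = {
    "cell_a": [0.2, 0.4, 1.3, 0.1, 0.0, 0.3],
    "cell_b": [0.5, 0.9, 0.8, 0.7, 0.6, 0.2],
}
included = [name for name, rates in cells.items() if is_task_active(rates)]
```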

[Figure S4. Panels A and B: spike feature 2 vs. spike feature 1, rows labeled Before and After. Example event 1 (from Fig. 2A): Tet 2, Cell 2; Tet 4, Cell 2; Tet 13, Cell 2; Tet 16, Cell 1; Tet 2, Cell 1. Example event 2 (from Fig. 3A): Tet 11, Cell 1; Tet 13, Cell 1; Tet 16, Cell 1; Tet 16, Cell 3; Tet 17, Cell 2; Tet 19, Cell 1; Tet 19, Cell 2. Panel C: count (0 to 120) vs. isolation distance (0 to 200); Z = 0.3125, N.S. Panel D: count (0 to 400) vs. L ratio (0 to 0.5); Z = 0.1702, N.S. Legend: first 5 trials in window, last 5 trials in window.]

Fig. S4. Cell isolation quality during events. (A-B) Spike clusters for the cells shown in the two event examples in the main text (A for Fig. 2A and B for Fig. 3A). Each panel shows two spike features (which, in combination, provided the best visual separation of the highlighted cell’s spikes from other spikes) during the first 5 trials (top row) and the last 5 trials (bottom row) in each of the two event windows. Each column shows a different cell (in the same order as presented in the main text). Note that no decrease in isolation quality is apparent during the events in either case. (C-D) Isolation quality for all 87 events. Shown are histograms of the number of cells across two different quality measures ((C) isolation distance; (D) L ratio). Blue lines indicate the first 5 trials in the event windows, and the green lines indicate the last 5 trials. No significant differences were found with either measure (isolation distance, Wilcoxon rank sum, Z = 0.3125, N.S.; L ratio, Wilcoxon rank sum, Z = 0.1702, N.S.).

[Figure S5. Panels A and C: Y position (pixels) vs. X position (pixels) for analysis points 1, 2, and 3, with an inset plotted against time. Panels B and D: distance from mean trajectory (z-score); legend: accepted L-trial, accepted R-trial. Panel E: number of events (0 to 15) vs. average distance from mean trajectory (z-score, 0 to 5); legend: first 5 trials in window, last 5 trials in window; Z = 0.638, N.S.]

Fig. S5. Spatial trajectories during abrupt network transitions. (A) Spatial trajectories of the animal (measured by tracking diodes mounted on the microdrive) during the example event in Fig. 2A. The three panels (blue, green, and red) plot the trajectories during each of the 1-second analysis intervals in the trial. The wall containing levers and nose ports was located along the top of each plot. The grey lines show all trajectories throughout the session, and the colored lines show the trajectories for the 10-trial window around the event. Note that because the diodes were positioned approximately 4 cm above the animal’s head, head tilts also contributed to the measured trajectories; for example, in analysis point 2, the animal tended to swing its head backwards as it moved from the lever to the reward port. Inset for analysis point 3: y position (same range as main plot) over time. (B) Each trial’s spatial trajectory compared to the mean trajectory across the session. For each trial, we tracked the animal’s X and Y location and compiled these locations for the three analysis time points. Data were collected at 30 frames/sec, with 3 seconds analyzed per trial (X and Y data), totaling 180 data points per trial. We computed a z-score transformation for each 180-point trajectory. The yellow bar indicates the 10-trial window where the event was detected. Note that none of the trials within this window had an absolute z-score above 2. (C-D) Same as (A-B) for the example event in Fig. 3A. (E) The number of events vs. the average absolute trajectory z-score in either the first 5 trials (blue) or the last 5 trials (green) of the 10-trial detection window. Note that there was no significant difference between these groups, suggesting that the detected transitions did not occur because of abrupt changes in spatial trajectories.
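
One plausible reading of the trajectory comparison in (B) is sketched below. The caption does not spell out the distance measure, so the Euclidean distance to the session-mean trajectory, the z-scoring of those distances across trials, and the synthetic data are all assumptions made for illustration.

```python
import math
import statistics

def trajectory_zscores(trajectories):
    """Z-score each trial's distance from the session-mean trajectory.

    trajectories: list of equal-length lists, one per trial
    (e.g. 180 values: X and Y at 30 frames/sec for 3 seconds).
    Assumes the distances are not all identical (non-zero spread).
    """
    n_points = len(trajectories[0])
    mean_traj = [
        statistics.fmean(trial[i] for trial in trajectories)
        for i in range(n_points)
    ]
    # Euclidean distance of each trial from the mean trajectory (assumed metric).
    dists = [math.dist(trial, mean_traj) for trial in trajectories]
    mu = statistics.fmean(dists)
    sd = statistics.pstdev(dists)
    return [(d - mu) / sd for d in dists]

points_per_trial = 30 * 3 * 2  # 30 frames/sec x 3 s x (X and Y) = 180
base = [math.sin(i / 10) for i in range(points_per_trial)]
# Synthetic session: 20 similar trials, with trial 7 made deliberately deviant.
trajectories = [[b + 0.01 * (t % 5) for b in base] for t in range(20)]
trajectories[7] = [b + 1.0 for b in base]
z = trajectory_zscores(trajectories)
```

Under this reading, a trial inside an event window with an absolute z-score above 2 would flag an unusual spatial trajectory; the caption reports none for the two example events.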

[Figure S6. Scatter plot: trial time interval (sec; 0 to 4000) vs. expected frequency (log10; 2 to -12). Annotations: "Stable network dynamics during large time gaps"; "Significant fluctuations with no large time gaps".]

Fig. S6. Network transitions and time gaps between trials. The x-axis represents probability (per session) of the observed network dynamics for each step of the 10-trial sliding window. The vertical dashed black line indicates the probability above which the observed change was considered statistically surprising (compared to the permutation tests). The time gap from the beginning of the 4th trial in the window to the end of the 6th trial is plotted on the y-axis. Note that there are many points with significant network transitions but short trial gaps (green dashed box) and vice versa (red dashed box), suggesting that the abrupt network dynamics were not the result of a slow drift in the neural representation occurring between trials.

[Figure S7. Normalized rejection rate (0.7 to 1.3) vs. trial in event window (1 to 10), one panel per animal: Animal 1 (15 events, 7 sessions); Animal 2 (27 events, 7 sessions); Animal 3 (18 events, 6 sessions); Animal 4 (27 events, 9 sessions).]

Fig. S7. Rejection rate during network transitions for individual animals. We performed the same analysis as in Fig. 3D, separately for each of the four animals. Each animal showed the same decreasing trend in rejection rate during detected events (plotted is mean rejection rate across events ± S.E.M.).

[Figure S8. Normalized rejection rate (0.8 to 1.2) vs. trial in event window (1 to 10).]

Fig. S8. Rejection rate during network transitions with lower expected frequency. We performed the same analysis as in Fig. 3D for the subpopulation of network transitions that showed an expected frequency of 0.01 or less in the bootstrap analysis.

[Figure S9. Normalized reward rate (0.8 to 1.2) vs. trial in event window (2 to 10); N.S.]

Fig. S9. Changes in reward frequency and network transitions. Note a slight, but not statistically significant, increase in reward rate occurring around the detected events (Wilcoxon rank sum, Z=0.885, N.S.). The small magnitude of this trend was expected, given that the contrast between the two outcome probabilities was generally too low to generate large and abrupt jumps in outcome trends, and it suggests that reward coding, by itself, cannot explain the occurrence of the abrupt network dynamics. To test this idea directly, however, we used a cross validation method to compare the predictive power of the two behavioral variables (change in reward rate vs. change in rejection rate, both calculated using a straight-line fit over the 10 trials). We divided all behavior into 10-trial non-overlapping segments, and used a generalized linear model to predict the occurrence of a network event based solely on one of the two behavioral variables (see Materials and methods). We found that changes in rejection rate were far better behavioral predictors of the events than changes in reward rate (Wilcoxon rank sum, comparing area under receiver operating characteristic curves, Z=12.16, p < 10^-33).
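
Two ingredients of the comparison described in the caption can be sketched in a few lines: the straight-line fit over a 10-trial segment and the area under the ROC curve. This is a simplified illustration, not the original analysis. It omits the generalized linear model and the cross validation, scores each segment directly by its slope, and uses hypothetical segments; all function and variable names are assumptions.

```python
def segment_slope(values):
    """Least-squares slope of values against trial index 0..n-1."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den

def auc(scores_event, scores_no_event):
    """Probability that an event segment outscores a non-event segment.

    Ties count as 0.5. This pairwise form equals the area under the
    receiver operating characteristic curve.
    """
    wins = 0.0
    for a in scores_event:
        for b in scores_no_event:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(scores_event) * len(scores_no_event))

# Hypothetical 10-trial segments of a binary "rejected" indicator.
event_segments = [[1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
                  [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]]
other_segments = [[0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
                  [1, 0, 0, 1, 0, 0, 1, 0, 0, 1]]
# Negate the slope so that a falling rejection rate gives a high score.
event_scores = [-segment_slope(s) for s in event_segments]
other_scores = [-segment_slope(s) for s in other_segments]
rejection_auc = auc(event_scores, other_scores)
```

Repeating the same steps with per-trial reward outcomes in place of the rejection indicator would give a second AUC, and the two could then be compared as in the caption.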

[Figure S10. Histogram: number of detected network events (5 to 25) vs. number of ‘other side’ accepted trials between event trials 4 and 7 (0 to 80).]

Fig. S10. Network transitions and trial gaps. Histogram of network transitions with different numbers of accepted trials to the ‘other side’ between trials 4 and 7 of the detection window. Note that separation by a large number of accepted trials to the other side was not necessary for the network transition to happen. Note particularly that for 25 of the 87 events, no ‘other side’ accepted trials separated the two trials flanking the transition, i.e. the transition occurred during purely consecutive exposure to the same side. This group of network transitions represented abrupt dynamics that were as widespread across the network as the rest of the events (using a 0.75 threshold, see Materials and methods, consecutive participation = 0.732; non-consecutive participation = 0.719).
