
Statistical physics approaches to collective behavior in networks of neurons

Xiaowen Chen

A Dissertation Presented to the Faculty of Princeton University in Candidacy for the Degree of Doctor of Philosophy

Recommended for Acceptance by the Department of Physics

Adviser: Professor William Bialek

November 2020


© Copyright by Xiaowen Chen, 2020.

All rights reserved.


Abstract

In recent years, advances in experimental techniques have allowed for the first time simultaneous measurements of many interacting components in living systems at almost all scales, making now an exciting time to search for physical principles of collective behavior in living systems. This thesis focuses on statistical physics approaches to collective behavior in networks of interconnected neurons; both statistical inference methods driven by real data, and analytical methods probing the theory of emergent behavior, are discussed.

Chapter 3 is based on work with F. Randi, A. M. Leifer, and W. Bialek [Chen et al., 2019], where we constructed a joint probability model for the neural activity of the nematode Caenorhabditis elegans. In particular, we extended the pairwise maximum entropy model, a statistical physics approach to consistent inference of distributions from data that has successfully described the activity of networks of spiking neurons, to this very different system, in which neurons exhibit graded potentials. We discuss signatures of collective behavior found in the inferred models.

Chapter 4 is based on work with W. Bialek [Chen and Bialek, 2020], where we examine the tuning conditions on the connection matrix among neurons such that the resulting dynamics exhibit long time scales. Starting from the simplest case of random symmetric connections, we combine maximum entropy and random matrix theory methods to explore the constraints required for long time scales to become generic. We argue that a single long time scale can emerge generically from realistic constraints, but a full spectrum of slow modes requires more tuning.



Acknowledgements

Since the beginning of my graduate school training, and perhaps even earlier, I have enjoyed reading the acknowledgement sections of doctoral theses and books, and was in awe of how a doctoral degree cannot be completed alone. My thesis is no different. Throughout the past five years, I have met many amazingly talented and friendly mentors and colleagues, and have grown thanks to my interactions with them.

First and foremost, I would like to thank my advisor, Professor Bill Bialek. From our first meeting during the Open House, he has introduced me to the wonderland of theoretical biophysics; provided me with continuous support, encouragement, and guidance; and allowed me sufficient freedom to explore my scientific interests. In addition to his fine taste in choosing research questions and his rigor in conducting research, I have also been influenced by his optimism amid scientific expeditions in the field of biophysics, and by his leadership both as a scientist and as a citizen.

I would like to thank Professor Andrew Leifer and Dr. Francesco Randi for our collaboration on the data-analysis project in this thesis; they have taught me to be true to the facts. I also appreciate Andy's enthusiasm for science, friendliness, and continuous support throughout the years: advising my experimental project, inviting me to design follow-up experiments, and serving on my thesis committee. I would also like to thank Professor Michael Aizenman for serving on my thesis committee, and Professor Ned Wingreen for serving as a Second Reader of this thesis, providing much feedback over the past years, and allowing me to attend his group meetings.

I consider myself very lucky to have studied biological physics at Princeton, especially while the NSF Center for the Physics of Biological Function (CPBF) was being established. The center offers a wonderful and unique community for collaborative science and learning, financial support for participation in conferences such as the APS March Meeting and the annual iPoLS meeting, and opportunities to give back to the community through the undergraduate summer school. I would like to thank the leadership of Bill and Prof. Josh Shaevitz, and the administrative support from Dr. Halima Chahboune and Svitlana Rogers. I also thank Halima for her friendliness and support throughout the years. I would like to thank the theory faculty, Professors Curt Callan, David Schwab, Stephanie Palmer, and Vijay Balasubramanian, who have given me many valuable pieces of advice during theory group meetings. I would also like to thank the experimental faculty, Professors Robert Austin, Thomas Gregor, and Josh Shaevitz, for learning and teaching opportunities and for career advice.

I learned a tremendous amount from many discussions with the postdocs, graduate students, and visiting students with whom I had the fortune to overlap at the Center, including Vasyl Alba, Ricard Alert Zenon, Marianne Bauer, Farzan Beroz, Ben Bratton, Katherine Copenhagen, Yuval Elhanati, Amir Erez, Kamesh Krishnamurthy, Endao Han, Caroline Holmes, Daniel Lee, Zhiyuan Li, Andreas Mayer, Lauren McGough, Leenoy Meshulam, Luisa Fernanda Ramírez Ochoa, Pierre Ronceray, Zachary Sethna, Ben Weiner, Jim Wu, Bin Xu, and Yaojun Zhang. I have also enjoyed many informal conversations and spontaneous Icahn lunches with many of you. I thank Cassidy Yang and Diana Valverde Mendez for being amazing office mates and for befriending a theorist; I miss seeing you in the office. I would also like to thank especially the members of the Leifer Lab, including Kevin Chen, Matthew Creamer, Kelsey Hallinen, Ashley Linder, Mochi Liu, Jeffery Nguyen, Francesco Randi, Anuj Sharma, Monika Scholz, and Xinwei Yu, for all the discussions related to worms and experimental techniques, and for welcoming me into the lab meetings and the lab itself.

I would like to thank the Department of Physics for its training and support. I thank Kate Brosowsky for her support of graduate students, Laurel Lerner for organizing the departmental recitals, and all of the very friendly and supportive administrators. My attendance at many conferences and summer schools was made possible by the Compton Fund. I also thank the Women in Physics groups for their efforts in creating a more inclusive environment in the Department. At the University level, I would like to thank the Graduate School and the Counseling & Psychological Services at University Health Services for their support.

My last five years would have been less colorful without the friends I met through graduate school, including Trithep Devakul, Christian Jepsen, Ziming Ji, Du Jin, Rocio Kiman, Ho Tat Lam, Zhaoqi Leng, Xinran Li, Sihang Liang, Jingjing Lin, Jingyu Luo, Zheng Ma, Wenjie Su, Jie Wang, Wudi Wang, Zhenbin Yang, Zhaoyue Zhang, and many others. I value my friendship with Junyi Zhang and Jiaqi Jiang, which goes all the way back from attending the same high school in Shanghai to now being in the same cohort at Princeton Physics. I treasure my friendship with Xue (Sherry) Song and Jiaqi Jiang, who have been there for me through all the ups and downs of my graduate career. I would also like to thank Hanrong Chen for company, support, proofreading this thesis, and many other things.

Finally, I would like to thank my parents for their unwavering support and encouragement. It was my father, Wei Chen, who bought me a frog to observe and learn swimming from, and my mother, Yanling Guo, who started a part-time PhD degree a few years before my graduate journey; together they kindled and cultivated my curiosity and courage for this scientific quest.



To my parents.



Contents

Abstract
Acknowledgements
List of Tables
List of Figures

1 Introduction
1.1 The nervous system and its collective behavior
1.2 Key problems
1.3 Thesis overview

2 Mathematical and statistical physics methods
2.1 Maximum Entropy Principle
2.2 Random Matrix Theory

3 Collective behavior in the small brain of C. elegans
3.1 Introduction
3.2 Data acquisition and processing
3.3 Maximum Entropy Model
3.4 Does the model work?
3.5 What does the model teach us?
3.5.1 Energy landscape
3.5.2 Criticality
3.5.3 Network topology
3.5.4 Local perturbation leads to global response
3.6 Discussion

4 Searching for long time scales without fine tuning
4.1 Introduction
4.2 Setup
4.3 Time scales for ensembles with different global constraints
4.3.1 Model 1: the Gaussian Orthogonal Ensemble
4.3.2 Model 2: GOE with hard stability threshold
4.3.3 Model 3: Constraining mean-square activity
4.4 Dynamic tuning
4.5 Discussion

5 Conclusion and Outlook

A Appendices for Chapter 3
A.1 Perturbation methods for overfitting analysis
A.2 Maximum entropy model with the pairwise correlation tensor constraint
A.3 Maximum entropy model fails to predict the dynamics of the neural networks as expected

B Dynamical inference for C. elegans neural activity
B.1 Estimate correlation time from the data
B.2 Coupling the neural activity and its time derivative

C Appendices for Chapter 4
C.1 How to take averages for the time constants?
C.2 Finite size effect for Model 2
C.3 Derivation for the scaling of time constants in Model 3
C.4 Decay of auto-correlation coefficient
C.5 Model with additional constraint on self-interaction strength

Bibliography


List of Tables

C.1 Scaling of the inverse slowest time scale (gap) g_0, the width of the support of the spectral density l, and the averaged norm per neuron ⟨x_i^2⟩ versus the Lagrange multiplier ξ (to leading order) in different regimes.


List of Figures

3.1 Schematics of data acquisition and processing of C. elegans neural activity.
3.2 Comparison of the pairwise mutual information distributions for the calcium-sensitive GCaMP worms and the GFP control worms, with mutual information measured for the calcium activity of each pair of neurons.
3.3 Discretization of the empirically observed fluorescence signals.
3.4 Model construction: learning the maximum entropy model from data.
3.5 No signs of overfitting are observed for pairwise maximum entropy models with up to N = 50 neurons.
3.6 The pairwise maximum entropy model predicts unconstrained higher-order correlations of the data.
3.7 Comparison between model prediction and data for observables not constrained by the model.
3.8 Energy landscape of the inferred maximum entropy model.
3.9 The heat capacity plotted against temperature for models with different numbers of neurons, N.
3.10 The topology of the learned maximum entropy model approaches that of the structural connectome as the number of neurons being modeled, N, increases.
3.11 Local perturbation of the neural network leads to global response.
4.1 Schematics for emergent time scales from interconnected neurons.
4.2 Spectral density for the connection matrix M drawn from various ensembles.
4.3 Mean-field results for Model 3, i.e., ensembles with a Gaussian prior and a maximum entropy constraint on the norm of the activity.
4.4 Finite-size effects on the time scales τ_max and τ_corr for Model 3, with interaction strength c = 1 (the supercritical phase).
4.5 Langevin dynamics by which neural networks tune themselves to the ensembles with slow modes.
A.1 Equilibrium dynamics of the inferred pairwise maximum entropy model fails to capture the neural dynamics of C. elegans.
B.1 Overlap of the discretized neuron signal, defined as q(Δt) = ⟨(1/N) Σ_{i=1}^{N} δ_{σ_i(t), σ_i(t+Δt)}⟩_t, versus delay time Δt.
B.2 Spectra of the (cross-)correlation matrix for observed neuron groups with number of neurons N = 10, 30.
B.3 A maximum entropy model constructed by constraining only the mean activity and the pairwise correlation ⟨θ_i σ_j⟩ between the magnitude of neural activity θ and its time derivative σ fails to predict the correlations within each class of observables.
B.4 The full pairwise maximum entropy model coupling the magnitude of neural activity and the time derivative of neural activity does not improve the prediction of higher-order statistics compared to the pairwise maximum entropy model for the time derivatives alone.
C.1 The fractional standard deviation of the log time scale decreases with system size, while the fractional standard deviation of the time scales is large and shows no decreasing trend, suggesting that the distribution of time scales has long tails.
C.2 Finite-size scaling for τ_max and τ_corr, for matrices drawn from the GOE with hard stability constraint, at the critical interaction strength, c_c = 1/√2 (left panel), and in the supercritical case, c = 1 (right panel). For each system size N, the time scales are averaged over 1000 Monte Carlo realizations.
C.3 The autocorrelation coefficient R(t) decays with time t. As system size increases, R(t) approaches the theoretical prediction of a power-law decay ∼ t^{-1/2}; care is needed in how the average is taken. Shown for the critical case of the GOE with hard stability constraint.
C.4 Comparison between the autocorrelation coefficient at different system sizes and the mean-field results.
C.5 Scaling of g_0, l, and ⟨x_i^2⟩ as a function of the Lagrange multiplier ξ for random matrices with a maximum entropy constraint on the norm.
C.6 The autocorrelation coefficient R(t) decays with time for different parameter sets (interaction strength c and Lagrange multiplier ξ).
C.7 Finite-size effects on the autocorrelation coefficient vs. time. The interaction strength is set at c = 1 (the supercritical phase).
C.8 Scaling of the longest time scale τ_max and the correlation time τ_corr vs. the constrained norm μ for connection matrices M with the additional maximum entropy constraint fixing ⟨M_ii⟩ = 0.


Chapter 1

Introduction

One of the most exciting goals of a physicist is to find principles in the world of matter. For physicists working in the field of living systems, it is to find principles that are as effective, and perhaps even as elegant, as those in the inanimate counterparts. This has been a rewarding yet challenging quest: thanks to the development of modern experimental methods, physicists are able to perform quantitative measurements on living systems, but they are at the same time confronted by the complexity of these systems, such as the lack of symmetry, being far out of equilibrium, and the difficulty of isolating causality. Nonetheless, there exist seminal examples where physics and biology can indeed shed light upon each other. In one direction, principles can inspire new discoveries in biology. A prominent example is Schrödinger's lecture "What is Life?", where the physicist, puzzled by durable inheritance amid the stochastic fluctuations of mutation, hypothesized the existence of a "code-script" and anticipated the discovery of DNA [Schrodinger, 1944, Philip, 2018]. In turn, biology also inspires new principles, as exemplified by the field of active matter [Vicsek et al., 1995, Ramaswamy, 2010, Marchetti et al., 2013].

Among the many subfields of physics, the one with the strongest connection to living systems is perhaps statistical physics. We list four main connections below.



• Understand collective behavior: how it emerges from local interaction, and why

Statistical physics has traditionally been successful in understanding emergent collective properties arising from local interactions in non-living matter. Meanwhile, almost by definition, the interesting phenomena in living systems are collective behavior, as living systems either have many interacting chemical components, are multicellular, or interact with many other peers and other species. Examples of these system-level phenomena are abundant at all levels of living systems, ranging from gene regulation within a single cell all the way to predator-prey dynamics across different species. Statistical mechanics has been successful in describing, for example, the flocking of birds, the stability of ecosystems, and the formation of biofilms. In all these cases, concepts from statistical mechanics such as phase transitions have been inspiring in understanding biological functions.

• Characterize living systems across different scales and with different interactions

As mentioned above, collective behavior in biology occurs at all scales. In addition, the forms of interaction are diverse, including the electro-chemical interactions in an interconnected network of neurons, the mechano-sensory interactions among contacting bacteria in a biofilm, or even the phenomenological "social forces" that align flocking birds. Statistical physics offers a framework to characterize all these different systems, as the establishment of universality classes suggests that systems with the same symmetry constraints, even with different detailed interactions, exhibit the same macroscopic properties. The idea that details do not matter also gives us hope of understanding living systems by replacing the interactions with random ones drawn from specific ensembles.



• Characterize high-dimensional data

We are at an exciting time, with a proliferating amount of high-dimensional data in biology. Thanks to advances in high-throughput measurement technology, scientists can now sequence the entire human genome in a few hours [Reuter et al., 2015], simultaneously measure up to 10,000 neurons in alert animals [Stringer et al., 2019], and track the position and velocity of each bird in a flock of thousands [Cavagna et al., 2008]. These vast amounts of data require new analysis tools. Statistical physics offers tools to learn the rules of interaction directly from the data (statistical inference), and has been making great progress in, for example, quantifying the neural activity of many neurons in various brain regions [Schneidman et al., 2006, Tkacik et al., 2014, Meshulam et al., 2017], the adaptive immune system [Mora et al., 2010], and the flocking of birds [Bialek et al., 2012, Bialek et al., 2014]. Furthermore, these principles, learned directly from the data, can then be used to refine our understanding of the real system.

• Connect mechanism to function

With a statistical physics model learned from real biological data, we can ask questions that relate directly to the biology. The first thing we can ask is how the macroscopic property emerges, and whether local interactions are enough; or, really, whether physics can say something about biology. Then, we can ask what the functional advantages of the specific biological system are. A simple way to compare it to other models in the same class is to vary the parameters of the statistical physics model. Finally, we can ask how the living system responds to perturbation, how it is maintained in a fluctuating environment, or how it has evolved to reach its current state.



This thesis investigates collective behavior in systems of interconnected neurons. In the rest of this chapter, we review relevant background in neuroscience, the hypothesis of criticality in biology, and the statistical physics and mathematical tools that we will use in this thesis.

1.1 The nervous system and its collective behavior

From neurons to neural networks

The nervous system is a familiar and heavily studied biological system at the interface between physics and biology. This familiarity naturally arises from our daily observation of our own cognitive activity, such as reading this sentence while recalling the title of my dissertation. But in order to understand how those cognitive activities are performed, and to understand how surprised we should be that our brain can both read and recall, we need to look deeper into the individual units of cognition, the neurons, and into how biological functions can emerge from nervous systems with large numbers of neurons interacting with each other.

Neurons are cells in the nervous system; they are basic units of cognition. Through careful control of the electric potential across their membranes, neurons can be excited electrically. If the excitation is large enough, the change of potential (often in the form of a pulse called the "action potential") can propagate down the axon of the neuron in milliseconds [Kandel et al., 2000]. Mathematical models developed by Hodgkin and Huxley in 1952 have been highly successful in describing the initiation and propagation of these action potentials in individual neurons [Hodgkin and Huxley, 1952]. Meanwhile, once propagated, such changes in the voltage of a neuron can excite nearby neurons through chemical transmission across the synapses, which, when the number of neurons is large, gives rise to cognitive activity. In particular, because the number of neurons in a nervous system is often very large, ranging from 10^2 to 10^11 depending on the organism, studies of the nervous system are amenable to statistical physics methods [Amit et al., 1987]. Theoretical efforts to understand the emergent functions of interconnected neurons include the early binary input-output artificial neural networks modeled by McCulloch and Pitts, and the perceptron developed by Rosenblatt, both of which can perform simple classification tasks [McCulloch and Pitts, 1943, Rosenblatt, 1958].

In addition, a unique feature of neurons, especially compared to usual physical systems, is that the brain is plastic: the connection strengths among neurons are functions of time and of the neural activity itself, i.e., the brain can learn [Magee and Grienberger, 2020]. This framework was first laid down by Hebb in 1949 [Hebb, 1949], and is perhaps best summarized as "neurons that fire together wire together" [Shatz, 1992]. Recently, increasing experimental evidence has revealed more complex updating rules for the neuronal network [Abbott and Nelson, 2000]: for example, the total synapse strength can be regulated by some global variable through so-called synaptic scaling [Turrigiano, 2008], and the learning rule can also depend on the activity strengths of the pre- and post-synaptic neurons [Bienenstock et al., 1982]. With learning rules, neuronal networks can be designed to perform additional functions, such as content-addressable memory and optimization tasks [Hopfield, 1982, Hopfield, 1984, Hopfield and Tank, 1985]. Networks with more non-linearity and unique structures have proven successful at many more tasks in the modern development of artificial neural networks, such as Recurrent Neural Networks and Deep Learning [Cheng and Titterington, 1994].

Finding collective behavior with many-neuron measurements

Recent advances in technology have allowed simultaneous measurement of an increasing number of neurons, both in vivo and in brain slices [Dombeck et al., 2010, Ahrens et al., 2013, Segev et al., 2004, Nguyen et al., 2016, Nguyen et al., 2017, Venkatachalam et al., 2016]. As shown by the "Moore's law" of neuroscience in [Stevenson and Kording, 2011], the number of simultaneously recorded neurons has had a doubling time of 7.4 years since the 1960s, when only single-neuron activity could be measured; a recent experiment was able to measure more than 10,000 neurons in the mouse visual cortex [Stringer et al., 2019]. These recordings offer an exciting opportunity for researchers to test the statistical physics idea that cognitive activities emerge from interactions among neurons, and to reveal new principles of interacting nervous systems.

How can we extract principles from these high-dimensional data? One approach is to examine coarse-grained variables of the neural activity. Interestingly, researchers have often found self-similarity in such variables when measured at different scales: examples include a power-law-like distribution of avalanche sizes in cultured slices of rat cortex [Beggs and Plenz, 2003], and a non-trivial RG fixed point when coarse-graining the spiking pattern in mouse hippocampus [Meshulam et al., 2018]. Another approach is to infer the entire probability distribution from real data using maximum entropy models [Jaynes, 1957]. In many cases, these models can illustrate the collective character of network activity. In particular, the state of individual neurons often can be predicted with high accuracy from the state of the other neurons in the network, and the models that are inferred from the data are close to critical surfaces in their parameter space [Tkacik et al., 2009, Tkacik et al., 2015].

These novel experimental data and analysis methods have led to a controversial hypothesis: that biological systems operate at criticality for functional advantages [Mora and Bialek, 2011, Munoz, 2018]. This includes both statistical criticality, with a peak in the heat capacity as parameters are varied, manifesting an optimal dynamic range, and dynamical criticality, which is believed to be essential for biological systems to operate with high sensitivity and information-processing capacity while maintaining a certain robustness. Examples of both kinds of criticality have been found in many collective biological systems in addition to the nervous system, such as the flocking of birds [Bialek et al., 2012, Bialek et al., 2014].

1.2 Key problems

While the quest of finding collective behavior in neural data and understanding the emergence of such collective behavior is exciting, especially given the increasingly high-dimensional data that are available, both modeling collective behavior using real data and finding principles such as the criticality hypothesis are still relatively new. Many questions remain to be explored, including:

• How general is the approach of constructing joint distributions of neural activity using maximum entropy models that build on local interactions? For example, can these approaches capture the dynamics of networks in which the neurons generate graded electrical responses?

• In networks where neurons generate graded electrical responses, does the system still exhibit signatures of criticality? How general is the criticality hypothesis in models of neuronal systems?

• Often, for a system to exhibit signatures of criticality, some level of fine tuning is required. For example, in the case of Ising models, there is only one critical temperature at which the system is self-similar. How can biological systems achieve criticality? How much fine tuning is required, and how much can be self-organized? And how is criticality maintained when the system is coupled to an environment?


Page 22: physics.princeton.eduphysics.princeton.edu/archives/theses/lib/upload/chen_xiaowen_thesi… · Abstract In recent years, advances in experimental techniques have allowed for the rst

• Are there alternative mechanisms for neural systems to exhibit signatures of criticality, such as the emergence of long time scales in a dynamical network, without being poised at criticality? Are they easier to maintain?

1.3 Thesis overview

This thesis addresses the above questions in two example problems: statistical inference of experimentally observed neural activity in the nematode Caenorhabditis elegans, and the search for long time scales without fine tuning in dynamical systems.

Example 1: Collective behavior in a small brain

In the past decade, it has been shown that in some regions of the brain, such as the salamander retina as it responds to natural movies and the mouse hippocampus during exploration, statistical physics models with only pairwise interactions can describe the network activity quite well, and that the models inferred from the data are close to critical surfaces in their parameter space. Nonetheless, almost all discussions of collective phenomena in networks of neurons had been focused on large vertebrate brains, with neurons that generate discrete, stereotyped action potentials or spikes. In Chapter 3, we show that these inverse approaches can be successfully extended to capture the neural activity of networks in which neurons have graded electrical responses, by studying the nervous system of the nematode C. elegans. Despite this brain being very different from vertebrate brains in both its analog signaling and its compactness, we found that the network activity for a large portion of the brain can be explained by a joint probability model, constructed using the maximum entropy principle to constrain the mean activity of individual neurons and the pairwise interactions. We also found collective behavior in our model: the parameters are close to a critical surface, as shown by the peak of the heat capacity when we vary the parameters, and the multiple local maxima in the inferred probability distribution of neural activity are reminiscent of the Hopfield model of memory. In addition, we made a novel prediction about the function of such criticality, allowing the brain to be both robust against external perturbations and efficient in information transmission, which remains to be tested experimentally. The work in this chapter was done in collaboration with F. Randi, A. M. Leifer, and W. Bialek. It was previously presented at the 2018 APS March Meeting in Los Angeles, the 2018 iPoLS Annual Conference in Houston, and the 2019 APS March Meeting in Boston. It has also been published as the following refereed journal article:

• X. Chen, F. Randi, A. Leifer, W. Bialek. Searching for collective behavior in a small brain, Physical Review E 99:052418 (2019).

Example 2: Emergent long time scales in models of interconnected neurons

One simple but important example of dynamical criticality is the emergence of long time scales in neural systems, which hold continuous variables in memory for times much longer than the response times of individual neurons. A simple theoretical model for these persistent neural activities is a line attractor, which requires fine tuning of the parameters; some biological systems have been shown experimentally to achieve this tuning through coupling with the stimuli [Seung, 1996, Major et al., 2004a]. But are there other, more general ways, besides fine tuning the interaction parameters, to generate sufficiently long time scales? And in what cases does one obtain a single long time scale versus a wide range of slow modes? Chapter 4 addresses these questions by combining maximum entropy and random matrix theory methods to construct ensembles of networks, and by exploring the constraints required for long time scales to become generic. We argue that a single long time scale can emerge generically from realistic constraints, but a full spectrum of slow modes requires more tuning. We also identify Langevin dynamics that generate patterns of synaptic connections drawn from these ensembles; these dynamics are familiar from neuronal systems, involving a combination of Hebbian learning and activity-dependent synaptic scaling. This work was done in collaboration with W. Bialek, was presented at the 2020 iPoLS Annual Conference, and has been posted to the arXiv preprint server as the following article:

• X. Chen and W. Bialek. Searching for long time scales without fine tuning. arXiv:2008.11674 [physics.bio-ph] (2020).


Page 25: physics.princeton.eduphysics.princeton.edu/archives/theses/lib/upload/chen_xiaowen_thesi… · Abstract In recent years, advances in experimental techniques have allowed for the rst

Chapter 2

Mathematical and statistical physics methods

We describe the statistical physics and mathematical methods used in this thesis. We first review the maximum entropy principle, which is used in Chapter 3 to construct the joint probability distribution of neural activity in C. elegans, and in Chapter 4 to construct random matrix ensembles with global constraints. We then introduce the random matrix theory used in Chapter 4.

2.1 Maximum Entropy Principle

The maximum entropy model was developed more than 60 years ago, as scientists gained a better understanding of the relation between information and statistical mechanics. An excellent review of its development can be found in [Presse et al., 2013]. The idea that minimal information equals maximal entropy was first suggested by Shannon's seminal paper that laid the groundwork of information theory [Shannon, 1948]. Later, Jaynes connected information theory and statistical mechanics through the maximum entropy framework [Jaynes, 1957]; this framework was further supported by an axiomatic derivation showing that it is the only method that draws consistent inferences about probability distributions [Shore and Johnson, 1980]. In recent years, the maximum entropy model has been tremendously successful in describing many high-dimensional systems; this is especially true in the field of biophysics, where the model not only provides a description of the data, but also for the first time allows probing of the underlying principles of systems biology [Mora and Bialek, 2011, Roudi et al., 2009].

For completeness, we outline the basics of maximum entropy methods in this section.

Maximizing the entropy of a distribution

In equilibrium statistical mechanics, one solves for the equilibrium distribution by maximizing the entropy subject to the appropriate constraints for the given ensemble. For example, the Boltzmann distribution is the probability distribution for the canonical ensemble, where the entropy is maximized while the average energy is fixed. For the grand canonical ensemble, both the average energy and the average number of particles are constrained. All of these are based on a general principle: if things are not constrained, they will reach maximum entropy.

The maximum entropy principle carries over to information theory, and is especially useful in thinking about how to model distributions. For a general probability distribution P(x), we can define the Shannon entropy as a quantity that measures the amount of uncertainty,

S(P) = -\int dx\, P(x) \ln P(x).    (2.1)

It turns out that the only way to draw consistent inferences is to use the distribution with maximum entropy among all those satisfying the constraints. Mathematically, we assume the distribution P(x) satisfies a set of constraints on the observables O_\mu(x), such that

\int dx\, P(x) O_\mu(x) = f_\mu;    (2.2)

then the distribution

P_{ME}(x) = \frac{1}{Z} \exp\left[ -\sum_\mu \lambda_\mu O_\mu(x) \right],    (2.3)

where the Lagrange multipliers \lambda_\mu are set to satisfy the constraints, obeys

S(P_{ME}) \geq S(P)    (2.4)

for any other distribution P satisfying the same constraints. This distribution is mathematically equivalent to the Boltzmann distribution, with an effective energy written as a combination of the constraints.
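As a minimal worked example of Eqs. (2.2)-(2.4) (a standard textbook case, added here for illustration): take a single observable O(x) = x on x \in [0, \infty) with constrained mean f = \mu. Then Eq. (2.3) gives

P(x) = \frac{1}{Z} e^{-\lambda x} = \frac{1}{\mu} e^{-x/\mu},

with Z = 1/\lambda, and the constraint \int_0^\infty dx\, P(x)\, x = 1/\lambda = \mu fixes the Lagrange multiplier \lambda = 1/\mu: the exponential distribution is the maximum entropy distribution on the half-line with a given mean.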

What happens if there is a non-uniform prior, Q(x)? Instead of maximizing the entropy of the distribution subject to the constraints, we can find the distribution that is closest to the prior while matching the constraints. More formally, we minimize the Kullback-Leibler divergence, an information-theoretic measure of the difference between two distributions,

D_{KL}(P \| Q) = -\int dx\, P(x) \ln \frac{Q(x)}{P(x)}.    (2.5)

The corresponding distribution is

P(x) = \frac{1}{Z} Q(x) \exp\left[ -\sum_\mu \lambda_\mu O_\mu(x) \right].    (2.6)

Maximum entropy model in statistical inference

Given that current biological data often have many degrees of freedom, but not that many more independent samples, the maximum entropy principle offers a good basis (instead of a more frequentist approach) for performing statistical inference. The idea is to choose well-measured observables from the data, and to construct a probability distribution that matches these constraints while otherwise having maximum entropy. This is especially helpful when we use low-order moments as the constraints, since they can be measured accurately with a relatively small number of independent samples.

In addition, if we consider low-order moments as the constraints, the maximum entropy model is a natural way to construct systems with local interactions. For example, for a given spin system, these are the correlations. And if the system truly interacts locally, the higher-order statistics can be constructed from the lower-order interactions. This method has successfully characterized the collective behavior of many biological systems, such as the coherent motion of bird flocks [Bialek et al., 2012, Bialek et al., 2014], firing patterns in neural networks [Schneidman et al., 2006, Meshulam et al., 2017, Tkacik et al., 2009, Tkacik et al., 2014], protein interaction networks [Weigt et al., 2009], and antibody distributions in immune systems [Mora et al., 2010].

We now go through the methods. If we have selected a set of constraints, the goal is to learn the Lagrange multipliers, \lambda_\mu, such that the model reproduces the observables in the data. Mathematically, we define

f_\mu^{\rm model}(\lambda) \equiv \int dx\, P(x) O_\mu(x),    (2.7)

f_\mu^{\rm data} \equiv \frac{1}{T} \sum_t O_\mu(x(t)),    (2.8)

and we want to find the set of \lambda such that

f_\mu^{\rm model}(\lambda) = f_\mu^{\rm data}    (2.9)

for all constraints f_\mu.

This is a large system of non-linear equations, and it is in general hard to solve. Instead, we can consider the equivalent but simpler convex optimization problem. If the system is truly described by the model, then we can write the probability of the data as

P(D \mid \lambda) = \prod_{t=1}^{T} \frac{1}{Z} \exp\left[ -\sum_\mu \lambda_\mu O_\mu(x(t)) \right];    (2.10)

the normalized negative log-likelihood (or the empirical log loss) is

L(\lambda) = -\frac{1}{T} \log P = \sum_\mu \lambda_\mu f_\mu^{\rm data} + \log Z(\lambda).    (2.11)

At the maximum likelihood we have, as desired,

\frac{\partial L}{\partial \lambda_\mu} = 0 = f_\mu^{\rm model} - f_\mu^{\rm data}.    (2.12)

In real systems with finite data, there is always error when we compute the empirical average of an observable. Thus, we only need to optimize the likelihood of the data until the resulting model predictions fall within the error bars given by the data.

One difficulty of the problem is computing f_\mu^{\rm model}. Naively, it requires integrating over all degrees of freedom, which is generally hard. Instead, one can estimate the observables by sampling the distribution with Monte Carlo methods, and performing the average over these samples. This is a computationally expensive step, although there have been many efforts to speed up the learning [Dudík et al., 2004, Broderick et al., 2007]. Alternative approaches include approximation methods such as message passing algorithms [Yedidia et al., 2001, Mezard and Mora, 2009], the Thouless-Anderson-Palmer approximation [Thouless et al., 1977, Tanaka, 1998], and diagrammatic expansion [Cocco and Monasson, 2012].
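To make this learning loop concrete, here is a minimal sketch in Python for a pairwise maximum entropy model on binary units, using plain Metropolis sampling and gradient descent on the log loss of Eq. (2.11). The function names and parameter values are illustrative choices, not the specific implementation used for the analyses in this thesis.

import numpy as np

rng = np.random.default_rng(0)

def sample_pairwise(h, J, n_samples=2000, n_sweeps=5):
    # Metropolis sampling from P(s) ~ exp( sum_i h_i s_i + (1/2) sum_ij J_ij s_i s_j ),
    # with binary units s_i = +/-1 and J symmetric with zero diagonal.
    N = len(h)
    s = rng.choice([-1.0, 1.0], size=N)
    samples = np.empty((n_samples, N))
    for t in range(n_samples):
        for _ in range(n_sweeps * N):
            i = rng.integers(N)
            dlogP = -2.0 * s[i] * (h[i] + J[i] @ s)  # log-prob change if s_i flips
            if dlogP >= 0 or rng.random() < np.exp(dlogP):
                s[i] = -s[i]
        samples[t] = s
    return samples

def fit_maxent(m_data, C_data, n_iter=100, eta=0.05):
    # Nudge each Lagrange multiplier by (data - model) moments, i.e. gradient
    # descent on the log loss, Eq. (2.11); at convergence the moments match,
    # as in Eq. (2.12).
    N = len(m_data)
    h, J = np.zeros(N), np.zeros((N, N))
    for _ in range(n_iter):
        samples = sample_pairwise(h, J)
        m_model = samples.mean(axis=0)                # <s_i> under the model
        C_model = samples.T @ samples / len(samples)  # <s_i s_j> under the model
        h += eta * (m_data - m_model)
        J += eta * (C_data - C_model)
        np.fill_diagonal(J, 0.0)
    return h, J

In practice, as noted above, the iteration is stopped once the model moments fall within the empirical error bars, rather than being driven to exact equality.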

Comment on statistical inference and dynamics

The maximum entropy distribution is mathematically equivalent to the Boltzmann distribution, but it is important to note that there is no default connection between the probability distribution we write down and any sort of equilibrium statistical mechanics system. In particular, systems with very different dynamics can have the same steady-state distribution. Inferring the dynamics of a real biological system requires models that at least take the time derivatives of the states into consideration, or that have constraints matching cross-time correlations.

2.2 Random Matrix Theory

In physics we often encounter interacting systems with many degrees of freedom, and in many cases the detailed interactions are too complicated to track down one by one. Luckily, when the system is large enough, it is often the case that the detailed interactions do not matter; rather, features of the system are determined by general properties and by symmetry. This concept of universality is what guided Eugene Wigner when he considered the problem of energy levels in heavy atoms using random matrices [Wigner, 1951], which subsequently laid the foundation of the field of random matrix theory.

The modern field of random matrix theory spans both mathematics and physics. The key goal is to compute the distributions of eigenvalues and eigenvectors, coarse-grained variables such as the spectral density, and fluctuation statistics for random matrices drawn from different ensembles [Akemann et al., 2015]. Random matrix theory is also highly applicable: for example, in number theory, the level spacing is conjectured to be related to the distribution of the complex zeros of the Riemann zeta function [Keating and Snaith, 2000]; in theoretical physics, it has shed light on problems in quantum chaos [Kos et al., 2018], black holes [Cotler et al., 2017], and the stability of metastable states in disordered systems [Castellani and Cavagna, 2005]; in finance, it helps identify high-reward portfolios against a noisy background [Plerou et al., 2002, Bouchaud and Potters, 2003].

Bouchaud and Potters, 2003].

16

Page 31: physics.princeton.eduphysics.princeton.edu/archives/theses/lib/upload/chen_xiaowen_thesi… · Abstract In recent years, advances in experimental techniques have allowed for the rst

In the field of biophysics, there are two main fields where random matrix the-

ory have shown being useful. One is ecology, where Random Matrix Theory was

used to show that large ecological systems with too strong predator-prey interaction

is unstable, as argued in the seminal paper by Sir Robert May [May, 1972]. Fur-

ther investigation considering that not all fixed points are equal under the predator-

pray dynamics showed that there exists phase transitions between single dominant

species and many stable species, related to the marginal stability in disordered sys-

tems [Biroli et al., 2018].

The other biophysics field where random matrix theory has been applied is neural dynamics. The interaction matrix of a linear or non-linear neural network is often considered to be drawn from a simple random matrix ensemble, in order to study, for example, the phase transition between chaos and quiescence, or how brains store information [Sompolinsky et al., 1988, Vreeswijk and Sompolinsky, 1996, Rajan and Abbott, 2006, Gudowska-Nowak et al., 2020]. In related fields such as machine learning, random matrix theory is also heavily used [Pennington and Worah, 2017, Can et al., 2020].

Example: Compute the spectral density of the Gaussian Orthogonal Ensemble

For completeness, we sketch here the well-known derivation of the spectral density for random matrices drawn from the Gaussian Orthogonal Ensemble [Wigner, 1951, Dyson, 1962a, Dyson, 1962b]. Excellent pedagogical discussions can be found in Refs. [Marino, 2016, Livan et al., 2018]. These same methods allow us to derive the spectral densities in all the other cases that we consider in the main text.



Let M be a matrix of size N × N. Assume M is real symmetric, and that the individual elements of M are independent Gaussian random numbers,

M_{ii} \sim \mathcal{N}(0, 1/N),    (2.13)

M_{ij}\big|_{i \neq j} \sim \mathcal{N}(0, 1/2N).    (2.14)

This is the Gaussian Orthogonal Ensemble (GOE). Together with its complex and quaternion counterparts, the Gaussian ensembles are the only random matrix ensembles that both have independent entries and are invariant under orthogonal (unitary, symplectic) transformations, which is more obvious when we write the probability distribution of M in terms of its trace:

P(M) \propto \exp\left( -\frac{N}{2} \mathrm{Tr}\, M^\top M \right).    (2.15)
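As a quick numerical aside (a minimal sketch, not part of the original text), one convenient way to draw from this ensemble is to symmetrize a matrix of i.i.d. standard Gaussians; the construction below reproduces the variances of Eqs. (2.13)-(2.14):

import numpy as np

rng = np.random.default_rng(0)
N = 1000

# Symmetrizing an i.i.d. Gaussian matrix gives Var(M_ii) = 1/N on the
# diagonal and Var(M_ij) = 1/(2N) off the diagonal, as in Eqs. (2.13)-(2.14).
A = rng.normal(size=(N, N))
M = (A + A.T) / (2.0 * np.sqrt(N))

print(np.var(np.diag(M)))                # close to 1/N = 1e-3
print(np.var(M[np.triu_indices(N, 1)]))  # close to 1/(2N) = 5e-4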

Symmetric matrices can be diagonalized by orthogonal transformations,

M = O^\top \Lambda O,    (2.16)

where the matrix O is constructed out of the eigenvectors of M, and the matrix \Lambda is diagonal, with elements given by the eigenvalues \lambda_i. Because P(M) is invariant to orthogonal transformations of M, it is natural to integrate over these transformations and obtain the joint distribution of eigenvalues. To do this we need the Jacobian, also called the Vandermonde determinant,

dM = \prod_{i<j} |\lambda_i - \lambda_j| \, d\mu(O) \prod_{i=1}^{N} d\lambda_i,    (2.17)



where d\mu(O) is the Haar measure of the orthogonal group under its own action. Now we can integrate over the matrices O, or equivalently over the eigenvectors, to obtain

P(\{\lambda_i\}) \prod_{i=1}^{N} d\lambda_i = \int d\mu(O) \prod_{i<j} |\lambda_i - \lambda_j| \, P(M) \prod_{i=1}^{N} d\lambda_i,    (2.18)

P(\{\lambda_i\}) \propto \exp\left[ -\frac{N}{2} \sum_i \lambda_i^2 + \frac{1}{2} \sum_{j \neq k} \ln|\lambda_j - \lambda_k| \right].    (2.19)

Intuitively, one can think of these eigenvalues as experiencing a quadratic local potential with strength u(\lambda) = \lambda^2/2. In addition, each pair of eigenvalues repels each other with logarithmic strength; this term comes from the Vandermonde determinant, and gives rise to all the universal features of the Gaussian ensembles. Mathematically, the distribution P(\{\lambda_i\}) is equivalent to the Boltzmann distribution of a two-dimensional electron gas confined to one dimension.

In mean-field theory, which for these problems becomes exact in the thermodynamic limit N → ∞, we can replace sums over eigenvalues by integrals over the spectral density,

\rho(\lambda) = \frac{1}{N} \sum_i \delta(\lambda - \lambda_i).    (2.20)

Then the eigenvalue distribution can be approximated by

P(\rho(\lambda)) \propto \exp\left[ -\frac{1}{2} N^2 S[\rho(\lambda)] \right],    (2.21)

where

S[\rho(\lambda)] = \int d\lambda\, \rho(\lambda) \lambda^2 - \int d\lambda\, d\lambda'\, \rho(\lambda) \rho(\lambda') \ln|\lambda - \lambda'|.    (2.22)

(The double integral over the log difference needs to be corrected for the self-interaction terms. Luckily, these terms, after summation, are of order N, which is small compared to the other terms, of order N^2.)



Because N is large, the probability distribution is dominated by the saddle point, \rho^*, such that

\frac{\delta \tilde{S}}{\delta \rho}\Big|_{\rho = \rho^*} = 0.    (2.23)

Here,

\tilde{S} = S + \kappa \int d\lambda\, \rho(\lambda)    (2.24)

has a term with the Lagrange multiplier \kappa to enforce the normalization of the density. Then, the spectral distribution satisfies

\lambda^2 - 2 \int d\lambda'\, \rho^*(\lambda') \ln|\lambda - \lambda'| = -\kappa.    (2.25)

To eliminate \kappa we can take a derivative with respect to \lambda, which gives us

\lambda = \mathrm{Pr} \int d\lambda'\, \frac{\rho^*(\lambda')}{\lambda - \lambda'},    (2.26)

where we understand the integral to be defined by its Cauchy principal value.

More generally, if

P(\{\lambda_i\}) \propto \exp\left( -\sum_i u(\lambda_i) + \frac{1}{2} \sum_{j \neq k} \ln|\lambda_j - \lambda_k| \right),    (2.27)

then everything we have done in the GOE case still goes through, but Eq. (2.26) becomes

g(\lambda) \equiv \frac{du(\lambda)}{d\lambda} = \mathrm{Pr} \int d\lambda'\, \frac{\rho(\lambda')}{\lambda - \lambda'}.    (2.28)

Two methods are common for solving equations of this form. One is the resolvent method, which we will not discuss in detail; see [Livan et al., 2018]. The other is the Tricomi solution [Tricomi, 1957], which states that for smooth enough g(\lambda), the solution of Eq. (2.28) for the density \rho(\lambda) is

\rho(\lambda) = \frac{1}{\pi \sqrt{\lambda - a} \sqrt{b - \lambda}} \left[ C - \frac{1}{\pi} \mathrm{Pr} \int_a^b d\lambda'\, \frac{\sqrt{\lambda' - a} \sqrt{b - \lambda'}}{\lambda - \lambda'} \, g(\lambda') \right],    (2.29)

where a and b are the edges of the support, and

C = \int_a^b \rho(\lambda)\, d\lambda.    (2.30)

If the distribution has a single region of support, then C = 1. If the distribution has more than one region of support, then we need to apply Tricomi's solution separately on each region, and the normalization changes accordingly. In general, solving the equation reduces to finding the edges of the support.

For the Gaussian Orthogonal Ensemble, we substitute g(\lambda) = \lambda into Tricomi's solution. The distribution is invariant under \lambda → -\lambda, so we can set a = -b. Then the integral is

\frac{1}{\pi} \mathrm{Pr} \int_{-b}^{b} d\lambda'\, \frac{\sqrt{\lambda' + b} \sqrt{b - \lambda'}}{\lambda - \lambda'} \, \lambda' = \lambda^2 - \frac{b^2}{2}.    (2.31)

We expect the density to fall to zero at the edges of the support, rather than having a jump. Thus, we impose \rho(a) = \rho(b) = 0, which sets b = \sqrt{2}, and the spectral density becomes

\rho(\lambda) = \frac{1}{\pi} \sqrt{2 - \lambda^2}.    (2.32)

This is Wigner's semicircle law.
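As a numerical check of Eq. (2.32) (a minimal sketch, using the same sampling construction as above; matplotlib is used only for display):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
N, n_trials = 400, 50

# Accumulate eigenvalues from many GOE samples drawn as in Eqs. (2.13)-(2.14).
eigs = []
for _ in range(n_trials):
    A = rng.normal(size=(N, N))
    M = (A + A.T) / (2.0 * np.sqrt(N))
    eigs.append(np.linalg.eigvalsh(M))
eigs = np.concatenate(eigs)

# Compare the empirical spectral density with the semicircle law, Eq. (2.32).
lam = np.linspace(-np.sqrt(2), np.sqrt(2), 400)
plt.hist(eigs, bins=80, density=True, alpha=0.5, label="GOE eigenvalues")
plt.plot(lam, np.sqrt(2.0 - lam**2) / np.pi, label="semicircle law")
plt.xlabel("eigenvalue")
plt.ylabel("density")
plt.legend()
plt.show()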

More generally, one would also like to solve for the two-point function and the spacing distribution of the eigenvalues. These are more complicated, and the solutions may not be simple for all random matrix ensembles.



Chapter 3

Collective behavior in the small brain of C. elegans

The materials in this chapter were previously published as "Searching for collective behavior in a small brain" in [Chen et al., 2019].

In large neuronal networks, it is believed that functions emerge through the collective behavior of many interconnected neurons. Recently, the development of experimental techniques that allow simultaneous recording of calcium concentration from a large fraction of all neurons in Caenorhabditis elegans—a nematode with 302 neurons—creates the opportunity to ask if such emergence is universal, reaching down to even the smallest brains. Here, we measure the activity of 50+ neurons in C. elegans, and analyze the data by building the maximum entropy model that matches the mean activity and pairwise correlations among these neurons. To capture the graded nature of the cells' responses, we assign each cell multiple states. These models, which are equivalent to a family of Potts glasses, successfully predict higher order statistical structure in the network. In addition, these models exhibit signatures of collective behavior: the state of single cells can be predicted from the state of the rest of the network; the network, despite being sparse in a way similar to the structural connectome, distributes its response globally when locally perturbed; the distribution over network states has multiple local maxima, as in models for memory; and the parameters that describe the real network are close to a critical surface in this family of models.

3.1 Introduction

The ability of the brain to generate coherent thoughts, percepts, memories, and actions depends on the coordinated activity of large numbers of interacting neurons. It is an old idea in the physics community that these collective behaviors in neural networks should be describable in the language of statistical mechanics [Hopfield, 1982, Hopfield, 1984, Amit et al., 1985]. For many years it was very difficult to connect these ideas with experiment, but new opportunities are offered by the recent emergence of methods to record, simultaneously, the electrical activity of large numbers of neurons [Dombeck et al., 2010, Ahrens et al., 2013, Segev et al., 2004, Nguyen et al., 2016, Nguyen et al., 2017, Venkatachalam et al., 2016]. In particular, it has been suggested that maximum entropy models [Jaynes, 1957] provide a path to construct a statistical mechanics description of network activity directly from real data [Schneidman et al., 2006], and this approach has been pursued in the analysis of the vertebrate retina as it responds to natural movies and other light conditions [Schneidman et al., 2006, Cocco et al., 2009, Tkacik et al., 2015, Tkacik et al., 2014], the dynamics of the hippocampus during exploration of real and virtual environments [Monasson and Rosay, 2015, Posani et al., 2017, Meshulam et al., 2017], and the coding mechanism of spontaneous spikes in cortical networks [Tang et al., 2008, Ohiorhenuan et al., 2010, Koster et al., 2014].

Maximum entropy models that match low order features of the data, such as the mean activity of individual neurons and the correlations between pairs, make quantitative predictions about higher order structures in the network, and in some cases these are in surprisingly detailed agreement with experiment [Tkacik et al., 2014, Meshulam et al., 2017]. These models also illustrate the collective character of network activity. In particular, the state of individual neurons often can be predicted with high accuracy from the state of the other neurons in the network, and the models that are inferred from the data are close to critical surfaces in their parameter space, which connects with other ideas about the possible criticality of biological networks [Mora and Bialek, 2011, Tkacik et al., 2015, Munoz, 2018, Meshulam et al., 2018].

Thus far, almost all discussion about collective phenomena in networks of neurons has been focused on the vertebrate brain, with neurons that generate discrete, stereotyped action potentials or spikes [Rieke et al., 1997]. This discreteness suggests a natural mapping into an Ising model, which is at the start of the maximum entropy analyses, although one could imagine alternative approaches. What is not at all clear is whether these approaches could capture the dynamics of networks in which the neurons generate graded electrical responses. An important example of this question is provided by the nematode Caenorhabditis elegans, which does not have the molecular machinery needed to generate conventional action potentials [Goodman et al., 1998]. The nervous system of C. elegans has just 302 neurons, yet the worm can still exhibit complex neuronal functions: locomotion, sensing, nonassociative and associative learning, and sleep-wake cycles [Stephens et al., 2011, Sengupta and Samuel, 2009, Ardiel and Rankin, 2010, Nichols et al., 2017]. All of the neurons are "identified," meaning that we can find the cell with a particular label in every organism of the species, and in some cases we can find analogous cells in nearby species [Bullock and Horridge, 1965]. In addition, this is the only organism in which we know the entire pattern of connections among the cells, usually known as the (structural) connectome [White et al., 1986]. The small size of this nervous system, together with its known connectivity, has always made it a tempting target for theorizing, but relatively little was known about the patterns of electrical activity in the system. This has changed dramatically with the development of genetically encodable indicator molecules, whose fluorescence is modulated by changes in calcium concentration, a signal which in turn follows electrical activity [Chen et al., 2013]. Combining these tools with high resolution tracking microscopy opens the possibility of recording the activity in the entire C. elegans nervous system as the animal behaves freely [Nguyen et al., 2016, Venkatachalam et al., 2016, Nguyen et al., 2017].

In this paper we make a first try at the analysis of experiments in C. elegans using the maximum entropy methods that have been so successful in other contexts. Experiments are evolving constantly, and in particular we expect that recording times will increase significantly in the near future. To give ourselves the best chance of saying something meaningful, we focus on sub-populations of up to fifty neurons, in immobilized worms where signals are most reliable. We find that, while details differ, the same sorts of models, which match mean activity and pairwise correlations, are successful in describing this very different network. In particular, the models that we learn from the data share topological similarity with the known structural connectome, allow us to predict the activity of individual cells from the state of the rest of the network, and seem to be near a critical surface in their parameter space.

3.2 Data acquisition and processing

Following methods described previously [Nguyen et al., 2016, Nguyen et al., 2017], nematodes Caenorhabditis elegans were genetically engineered to express two fluorescent proteins in all of their neurons, with tags that cause them to be localized to the nuclei of these cells. One of these proteins, GCaMP6s, fluoresces in the green with an intensity that depends on the surrounding calcium concentration, which follows the electrical activity of the cell and in many cases is the proximal signal for transmission across the synapses to other cells [Chen et al., 2013]. The second protein, RFP, fluoresces in the red and serves as a position indicator of the nuclei as well as a control for changes in the visibility of the nuclei during the course of the experiment. Parallel control experiments were done on worms engineered to express GFP and RFP, neither of which should be sensitive to electrical activity. Although our ultimate goal is to understand neural dynamics in the freely moving animal, as a first step we study worms that are immobilized with polystyrene beads, to reduce motion-induced artifacts [Kim et al., 2013].

As described in Ref. [Nguyen et al., 2016], the fluorescence is excited using lasers. A spinning disk confocal microscope and a high-speed, high-sensitivity Scientific CMOS (sCMOS) camera record red- and green-channel fluorescent images of the head of the worm at a rate of 6 brain-volumes per second at a magnification of 40×; a second imaging path records the position and posture of the worm at a magnification of 10×, which are used in tracking the neurons across different time frames. As shown in Fig. 3.1a, the raw data thus are essentially movies. By using a custom machine-learning approach [Nguyen et al., 2017], we are able to reduce the data to the green and red intensities for each neuron i, I_i^g(t) and I_i^r(t). The data are described in more detail in [Scholz et al., 2018].

Figure 3.1: Schematics of data acquisition and processing. (a) Examples of the raw images acquired through the 10× (scale bar equals 100 µm) and 40× (scale bar equals 10 µm) objectives. The body of the nematode is outlined with light green curves. As an example, we show for one neuron that (b) the intensity of the nuclei-localized fluorescent protein tags—the calcium-sensitive GCaMP and the control fluorophore RFP—are measured as functions of time. Photobleaching occurs on a longer time scale than the intracellular calcium dynamics, which allows us to perform photobleaching correction by dividing the raw signal by its exponential fit, resulting in the signals of panel (c). (d) The normalized ratio of the photobleaching-corrected intensities, f, is a proxy for the calcium concentration in each neuron's nucleus (dark grey). As described in the text, this signal is discretized using the denoised time derivative ḟ; we use three states, marked as red, blue, and black after smoothing (lightly offset for ease of visualization). (e) The time derivative ḟ, extracted using total-variation regularized differentiation.

As indicated in Fig. 3.1b, the fluorescence intensity undergoes photobleaching, fortunately on a much longer time scale than the calcium dynamics. Thus, we can extract the photobleaching effect by modeling the observed fluorescence intensity

with an exponential decay:

$$I^g(t) = S^g(t)\,(1 + \eta^g)\left(e^{-t/\tau_g} + A_g\right),$$
$$I^r(t) = S^r(t)\,(1 + \eta^r)\left(e^{-t/\tau_r} + A_r\right). \qquad (3.1)$$

Here, S^g(t) and S^r(t) are the true signals corresponding to the calcium concentration, η^g and η^r are stochastic variables representing the noise due to the laser and the camera, τ_g and τ_r are the characteristic times for photobleaching of the two fluorophores, and A_g and A_r represent nonnegative offsets due to a population of unbleachable fluorophores, or regeneration of fluorescent states under continuous illumination.¹

For each neuron, we fit the observed fluorescence intensities to Eqs (3.1) with S^g(t) = S^g_0 and η^g = 0, and similarly for S^r(t). As shown by the black lines in Fig. 3.1b, this captures the slow photobleaching dynamics; we then divide these out to recover normalized intensities in each channel and each cell, I_i^g(t) and I_i^r(t). Finally, to reduce instrumental and/or motion-induced artifacts, we consider the ratio of the normalized intensities as the signal for each neuron, i.e. f_i(t) = I_i^g(t)/I_i^r(t) (Fig. 3.1d). In this normalization scheme, if the calcium concentration remains constant, then f_i(t) = 1.

Our goal is to write a model for the joint probability distribution of activity in all of the cells in the network. One approach to constructing the distribution is to directly use the continuous normalized fluorescence ratio f_i(t) as the microscopic degrees of freedom. However, it is not clear how to select the class of probability distributions for continuous variables, especially because the number of independent samples is relatively small due to the large temporal correlation in the data, and because the one-point and two-point marginal distributions of the data are manifestly non-Gaussian.

To stay as close as possible to previous work, at least in this first try, it makes sense to quantize the activity into discrete states. One possibility is to discretize based on the magnitude of the fluorescence ratio f_i(t). But this is problematic, since even in "control" worms where the fluorescence signal should not reflect electrical activity, variations in different cells are correlated; this is illustrated in Fig. 3.2a, where we see that the distribution of mutual information between f_i(t) and f_j(t), across all pairs (i, j), is almost the same in control and experimental worms. A closer look at the raw signal suggests that normalizing by the RFP intensity is not enough to correct for occasional wobbles of the worm; this causes the distribution of the fluorescence ratio to be non-stationary, and generates spurious correlations. This suggests that (instantaneous) fluorescence signals are not especially reliable, at least given the current processing methods and the state of our experiments. An alternative is to look at the derivatives of these signals, which are still biologically meaningful as they capture the net calcium ion flux of the cell, and by definition suffer from the global noise only at a few instances; now there is very little mutual information between ḟ_i(t) and ḟ_j(t) in the control worms, and certainly much less than in the experimental worms, as seen in Fig. 3.2b.

¹One may worry that a constant "background" fluorescence should be subtracted from the raw signal, rather than contributing to a divisive normalization. In our data, this background subtraction leads to strongly non-stationary noise in the normalized intensity after the photobleaching correction, in marked contrast to what we find by treating the constant as a contribution from unbleachable or regenerated fluorophores.

To give ourselves a bit more help in isolating a meaningful signal, we denoise the time derivatives. The optimal Bayesian reconstruction of the underlying time derivative signal u(t) combines a description of noise in the raw fluorescence signal f(t) with some prior expectations about the signal u itself. We approximate the noise in f as Gaussian and white, which is consistent with what we see at high frequencies, and we assume that the temporal variations in the derivative are exponentially distributed and only weakly correlated in time. Then maximum likelihood reconstruction is equivalent to minimizing
$$F(u) = \frac{\tau_f}{\sigma_f}\int_0^T dt\, |u| + \frac{1}{2\sigma_n^2 \tau_n}\int_0^T dt\, |Au - f|^2, \qquad (3.2)$$
where A is the antiderivative operator, the combination σ_n²τ_n is the spectral density of the noise floor that we see in f at high frequencies, while σ_f is the total standard deviation of the signal and τ_f is the typical time scale of these variations; for more on these reconstruction methods see Refs [Chartrand, 2011, Kato et al., 2015]. We determine the one unknown parameter τ_f by asking that, after smoothing, the cumulative power spectrum of the residue Au − f has the least root-mean-square difference from the cumulative power spectrum of the extrapolated white noise.
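For concreteness, here is a minimal sketch of this reconstruction as a convex optimization, using cvxpy as a stand-in for the actual implementation; the parameter `lam` lumps together the prefactors τ_f/σ_f and σ_n²τ_n of Eq (3.2):

```python
import numpy as np
import cvxpy as cp

def denoised_derivative(f, dt, lam):
    """Minimize  lam * sum(dt*|u|) + 0.5 * sum(dt*(A u - f)^2),  the discrete
    analog of Eq (3.2).  The dense O(T^2) antiderivative matrix is fine for
    short traces only."""
    T = len(f)
    A = np.tril(np.ones((T, T))) * dt          # cumulative sum = antiderivative
    u = cp.Variable(T)
    # subtract f[0], since A u starts at zero and cannot reproduce an offset
    obj = cp.Minimize(lam * dt * cp.norm1(u)
                      + 0.5 * dt * cp.sum_squares(A @ u - (f - f[0])))
    cp.Problem(obj).solve()
    return u.value
```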


As an example, Fig. 3.1e shows the smooth derivative of the trace in Fig. 3.1d. After the smooth derivative u is estimated, we discretize the smooth estimate of the signal, Au, into three states of "rise," "fall," and "flat," depending on whether the derivative u exceeds a constant multiple of σ_n/τ_f, the expected standard deviation of a smooth derivative extracted from pure white noise. The constant is chosen to be 5, such that the GFP control worm has almost all pairwise mutual information equal to zero after going through the same data processing pipeline. An example of the raw fluorescence and final discretized signals is shown in Fig. 3.3.

Figure 3.2: Comparison of pairwise mutual information distributions for the calcium-sensitive GCaMP worms and the GFP control worms. Mutual information is estimated using binning and finite-sample extrapolation methods as described in [Slonim et al., 2005] for all pairs of neurons. For the normalized fluorescence ratio, f, the distribution of the mutual information, P(MI(f_i; f_j)), exhibits little difference between the calcium-sensitive GCaMP worm and the GFP control worm (panel (a)). In comparison, for the time derivative of the normalized fluorescence ratio, ḟ, the distribution of the mutual information, P(MI(ḟ_i; ḟ_j)), is peaked around zero for the GFP control worm, while the distribution is wide for the calcium-sensitive GCaMP worm (panel (b)). This observation suggests that the time derivative of the fluorescence ratio, ḟ_i, is more informative than its magnitude, f_i.


Figure 3.3: Discretization of the empirically observed fluorescence signals. (a) Heatmap of the normalized fluorescence ratio between photobleaching-corrected GCaMP fluorescence intensity and RFP fluorescence intensity, f, for each neuron as a function of time. (b) Heatmap of the neuronal activity after discretization based on time derivatives of f. Green corresponds to a state of "rising," red "falling," and white "flat."

3.3 Maximum Entropy Model

After preprocessing, the state of each neuron is described by a Potts variable σ_i, and the state of the entire network is {σ_i}. As in previous work on a wide range of biological systems [Schneidman et al., 2006, Tkacik et al., 2014, Meshulam et al., 2017, Weigt et al., 2009, Mora et al., 2010, Bialek et al., 2012], we use a maximum entropy approach to generate relatively simple approximations to the distribution of states, P({σ_i}), and then ask how accurate these models are in making predictions about higher order structure in the network activity.

The maximum entropy approach begins by choosing some set of observables, O_µ({σ_i}), over the states of the system, and we insist that any model we write down for P({σ_i}) match the expectation values for these observables that we find in the data,
$$\sum_{\{\sigma_i\}} P(\{\sigma_i\})\, O_\mu(\{\sigma_i\}) = \left\langle O_\mu(\{\sigma_i\}) \right\rangle_{\rm expt}. \qquad (3.3)$$


Among the infinitely many distributions consistent with these constraints, we choose the one that has the largest possible entropy, and hence no structure beyond what is needed to satisfy the constraints in Eq. (3.3). The formal solution to this problem is
$$P(\{\sigma_i\}) = \frac{1}{Z}\exp\left[-\sum_\mu \lambda_\mu O_\mu(\{\sigma_i\})\right], \qquad (3.4)$$

where the coupling constants λ_µ must be set to satisfy Eq. (3.3), and the partition function Z as usual enforces normalization. Note that although the maximum entropy model is mathematically equivalent to the Boltzmann distribution, and hence can be analyzed with well-developed tools from equilibrium statistical mechanics, the model is a probability distribution for the one-time statistics of the data and does not assume the system to be in thermodynamic equilibrium, nor that the underlying dynamics obeys detailed balance (see below).

Following the original application of maximum entropy methods to neural activity [Schneidman et al., 2006], we choose as observables the mean activity of each cell, and the correlations between pairs of cells. With neural activity described by three states, "correlations" could mean a whole matrix or tensor of joint probabilities for two cells to be in particular states. We will see that models which match this tensor have too many parameters to be inferred reliably from the data sets we have available, and so we take a simpler view in which "correlation" measures the probability that two neurons are in the same state. Equation (3.4) then becomes
$$P(\sigma) = \frac{1}{Z}\, e^{-H(\sigma)}, \qquad (3.5)$$
with the effective Hamiltonian
$$H(\sigma) = -\frac{1}{2}\sum_{i \neq j} J_{ij}\, \delta_{\sigma_i \sigma_j} - \sum_i \sum_{r=1}^{p-1} h_i^r\, \delta_{\sigma_i r}. \qquad (3.6)$$


The number of states is p = 3, corresponding to "rise," "fall," and "flat" as defined above. The parameters are the pairwise interactions J_ij and the local fields h_i^r, and these must be set to match the experimental values of the correlations
$$c_{ij} \equiv \langle \delta_{\sigma_i \sigma_j} \rangle = \frac{1}{T}\sum_{t=1}^T \delta_{\sigma_i(t)\,\sigma_j(t)}, \qquad (3.7)$$
and the magnetizations
$$m_i^r \equiv \langle \delta_{\sigma_i r} \rangle = \frac{1}{T}\sum_{t=1}^T \delta_{\sigma_i(t)\, r}. \qquad (3.8)$$

Note that the local field for the "flat" state, h_i^p, is set to zero by convention. In addition, the interaction J_ij can be non-zero for any pair of neurons i and j, regardless of the positions of the neurons (both physical and in the structural connectome), i.e. the equivalent Potts model does not have a pre-defined spatial structure.

The model parameters are learned using coordinate descent and Markov chain Monte Carlo (MCMC) sampling [Dudık et al., 2004, Broderick et al., 2007, Schmidt, 2007]. In particular, we initialize all parameters at zero. For each optimization step, we calculate the model predictions for c_ij and m_i^r by alternating between MCMC sampling with 10⁴ MC sweeps and histogram sampling to speed up the estimation. Then, we choose a single parameter from the set {J_ij, h_i^r} to update, such that the increase in the likelihood of the data is maximized [Dudık et al., 2004]. We repeat the observable estimation and parameter update steps until the model reproduces the constraints within the experimental errors, which we estimate from variations across random halves of the data. This training procedure leaves part of the interaction matrix J_ij zero, while the model is able to reproduce the magnetizations m_i^r and the pairwise correlations c_ij within the experimental errors (Fig. 3.4).

Figure 3.4: Model construction: learning the maximum entropy model from data. (a) Connected pairwise correlation matrix, C_ij, measured for a subgroup of 50 neurons. (b) The inferred interaction matrix, J_ij. (c) Probability of neuron i being in state r, for the same group of 50 neurons as panel (a). (d) The inferred local fields, h_i^r. (e) The model reproduces the (unconnected) pairwise correlations within their variation throughout the experiment. Error bars are extrapolated from bootstrapping random halves of the data. (f) Same as panel (e), but for the mean neuron activity m_i^r.

Because of the large temporal correlation in the data, the number of independent samples in the recording is small compared to the number of parameters. This makes us worry about overfitting, which we test by randomly selecting 5/6 of the data as a training set, inferring the maximum entropy model from this training set, and then comparing the log-likelihood of both the training data and the test data with respect to the maximum entropy model. No signs of overfitting are found for subgroups of up to N = 50 neurons, as indicated by the fact that the difference of the log-likelihoods is zero within error bars (Fig. 3.5; details in Appendix A.1). This is not true if we try to match the full tensor correlations (Appendix A.2), which is why we restrict ourselves to the simpler model.

Figure 3.5: (a) No signs of overfitting are observed for pairwise maximum entropy models with up to N = 50 neurons, measured by the difference of the per-neuron log-likelihood of the data under the pairwise maximum entropy model for training sets consisting of 5/6 of the data and for test sets. Clusters around N = 10, 15, 20, . . . , 50 represent randomly chosen subgroups of N neurons. Error bars are the standard deviation across 10 random partitions into training and test samples. The dashed lines show the expected per-neuron log-likelihood difference and its standard deviation calculated through perturbation methods (see Appendix A.1). (b) The difference between the log-likelihood of the training data and that of the test data is greater than 0 (the red line) within error bars for maximum entropy models on N = 10, 20, . . . , 50 neurons with the pairwise correlation tensor constraint (see Appendix A.2), which suggests that this model does not generalize well.

3.4 Does the model work?

The maximum entropy model has many appealing features, not least its mathematical equivalence to statistical physics problems for which we have some intuition. But this does not mean that the model gives an accurate description of the real network. Here we test several predictions of the model. In practice we generate these predictions by running a long Monte Carlo simulation of the model, and then treating the samples in this simulation exactly as we do the real data. We emphasize that, having matched the mean activity and pairwise correlations, there are no free parameters, so that everything which follows is a prediction and not a fit.

Since we use the correlations between pairs of neurons in constructing our model, the first nontrivial test is to predict correlations among triplets of neurons,
$$C_{ijk} = \sum_{r=1}^{p} \left\langle (\delta_{\sigma_i r} - \langle\delta_{\sigma_i r}\rangle)(\delta_{\sigma_j r} - \langle\delta_{\sigma_j r}\rangle)(\delta_{\sigma_k r} - \langle\delta_{\sigma_k r}\rangle) \right\rangle. \qquad (3.9)$$
More subtly, since we used only the probability of two neurons being in the same state, we can try to predict the full matrix of pairwise correlations,
$$C_{ij}^{rs} \equiv \langle \delta_{\sigma_i r}\, \delta_{\sigma_j s} \rangle - \langle \delta_{\sigma_i r} \rangle \langle \delta_{\sigma_j s} \rangle; \qquad (3.10)$$

note that the trace of this matrix is what we used in building the model. Scatter plots of observed vs predicted values for C_ijk and C_ij^rs are shown in Fig. 3.6a and c. In parts b and d of that figure we pool the data, comparing the root-mean-square differences between our predictions and mean observations (model error) with errors in the measurements themselves. Although not perfect, model errors are always within 1.5× the measurement errors, over the full dynamic range of our predictions.

Turning to more global properties of the system, we consider the probability of k neurons being in the same state, defined as
$$P(k) \equiv \left\langle \sum_{r=1}^{p} \mathbf{1}_{\sum_{i=1}^N \delta_{\sigma_i r} = k} \right\rangle, \qquad (3.11)$$

where 1 is the indicator function. It is useful to compute this distribution not just from the data, but also from synthetic data in which we break correlations among neurons by shifting each cell's sequence of states by an independent random time. We see in Fig. 3.7a that the real distribution is very different from what we would see with independent neurons, so that in particular the tails provide a signature of correlations. These data agree very well with the distributions predicted by the model.

Figure 3.6: Model validation: the model predicts unconstrained higher order correlations of the data. Panel (a) shows the comparison between model prediction and data for the connected three-point correlations C_ijk for a representative group of N = 50 neurons. All 19800 possible triplets are plotted as blue dots. Error bars are generated by bootstrapping random halves of the data, and are shown for 20 uniformly spaced random triplets in red. Panel (b) shows the error of the three-point function, ∆C_ijk, as a function of the connected three-point function, binned by its value predicted by the model, C_ijk,model. The red curve is the difference between data and model prediction. The blue curve is the standard error of the mean of C_ijk over the course of the experiment, extracted by bootstrapping random halves of the experiment. Panels (c, d) are the same as panels (a, b), but for the connected two-point correlation tensor C_ij^rs.

Our model assigns an "energy" to every possible state of the network [Eq. (3.6)], which sets the probability of that state according to the Boltzmann distribution. Because our samples are limited, we cannot test whether the energies of individual states are correct, but we can ask whether the distribution of these assigned energies across the real states taken on by the network agrees with what is predicted by the model. Figure 3.7b compares these distributions, shown cumulatively, and we see that there is very good overlap between theory and experiment across ∼ 90% of the density, with the data having a slightly fatter tail than predicted. The good agreement extends over a range of ∆E ∼ 20 in energy, corresponding to predicted probabilities that range over a factor of exp(∆E) ∼ 10⁸.

The maximum entropy model gives the probability for the entire network to be in a given state, which means that we can also compute the conditional probabilities for the state of one neuron given the state of all the other neurons in the network. Testing whether we get this right seems a very direct test of the idea that activity in the network is collective. This conditional probability can be written as
$$P(\sigma_i | \{\sigma_{j\neq i}\}) \propto \exp\left[\sum_{r=1}^{p-1} g_i^r\, \delta_{\sigma_i r}\right], \qquad (3.12)$$
where the effective fields are combinations of the local field h_i^r and each cell's interaction with the rest of the network,
$$g_i^r = h_i^r + \sum_{j \neq i}^{N} J_{ij}\left(\delta_{\sigma_j r} - \delta_{\sigma_j p}\right). \qquad (3.13)$$
Then the probabilities for the states of neuron i are set by
$$\frac{P(\sigma_i = r)}{P(\sigma_i = p)} = e^{g_i^r}, \qquad (3.14)$$

where the last state p is a reference. In Figure 3.7c and d we test these predictions. In practice we walk through the data and, at each moment in time, for each cell, we compute the effective fields. We then find all moments where the effective field falls into a small bin, and compute the ratio of probabilities for the states of the one cell, collecting the data as shown. The agreement is excellent, except at extreme values of the field, which are sampled only very rarely in the data. We note the agreement extends over a dynamic range of roughly two decades in the probability ratios.²

Figure 3.7: Model validation: comparison between model prediction and data for observables not constrained by the model. The network has N = 50 neurons. (a) Probability of k neurons being in the same state. Blue dots are computed from the data. The yellow dash-dot line is the prediction of a model in which all neurons are independent, generated by applying a random temporal cyclic permutation to the activity of each neuron. The purple line is the prediction of the pairwise maximum entropy model. (b) Tail distribution of the energy for the data and the model. All error bars in this figure are extrapolated from bootstrapping. (c, d) Probability ratios of the states of a single neuron as functions of the effective field g_i^r, binned by the value of the effective field. Error bars are the standard deviation after binning.

²The claim that behaviors are collective requires a bit more than predictability. It is possible that behaviors of individual cells are predictable from the state of the rest of the network, but that most of the predictive power comes from interaction with a single strongly coupled partner. We have checked that the mutual information I(σ_i; g_i^r) is larger than the maximum of I(σ_i; σ_k), in almost all cases.
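Operationally, this test amounts to evaluating Eq (3.13) at each moment of the data; a minimal sketch, with the same conventions for J and h as in the sampling sketch above:

```python
import numpy as np

def effective_fields(sigma, i, J, h, p=3):
    """g_i^r of Eq. (3.13), measured relative to the reference state p-1 ('flat').
    Predicted odds, Eq. (3.14): P(sigma_i = r) / P(sigma_i = flat) = exp(g[r])."""
    rest = lambda r: ((sigma == r).astype(float)
                      - (sigma == p - 1).astype(float))   # J[i, i] = 0 excludes j = i
    return np.array([h[i, r] + J[i] @ rest(r) for r in range(p - 1)])
```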


3.5 What does the model teach us?

3.5.1 Energy landscape

Maximum entropy models are equivalent to Boltzmann distributions and thus define an energy landscape over the states of the system, as shown schematically in Fig. 3.8a. In our case, as in other neural systems, the relevant models have interactions with varying signs, allowing the development of frustration and hence a landscape with multiple local minima. These local minima are states of high probability, and serve to divide the large space of possible states into basins. It is natural to ask how many of these basins are supported in subnetworks of different sizes.

To search for energy minima, we performed quenches from initial conditions corresponding to the states observed in the experiment, as described in [Tkacik et al., 2014]. Briefly, at each update, we change the state of one neuron such that the decrease in energy is maximized, and we terminate this procedure when no single spin flip will decrease the energy; the states that are attracted to local energy minimum α form a basin of attraction Ω_α. As shown in Fig. 3.8c, the number of energy minima grows sub-exponentially as the number of neurons increases. Note that this approach only gives us the states that the animal has access to, rather than all metastable states, whose number can be approximated by greedy quenches along a long MCMC trajectory. Nonetheless, the probability of visiting a basin is similar between the data and the model, as shown by the rank-frequency plot (Fig. 3.8d).

Whether the energy minima correspond to well-defined collective states depends on the heights of the barriers between states. Here, we calculate the barrier height between basins by single-spin-flip MCMC, initialized at one minimum α and terminating when the state of the system belongs to a different basin Ω_β; the barrier between basins Ω_α and Ω_β is defined as the maximum energy along this trajectory. This sampling procedure is repeated 1000 times for each initial basin to compute the mean energy barrier. As shown in Fig. 3.8b, the distribution of barrier energies strongly overlaps the distribution of the energy minima, which implies that the minima are not well separated.

Figure 3.8: Energy landscape of the inferred maximum entropy model. (a) Schematic of the energy landscape with local minima α, β and the corresponding basins Ω_α, Ω_β. Colored in light blue is the metabasin formed at the given energy threshold, ∆E. (b) Typical distribution of the values of the energy minima and the barriers for a maximum entropy model on N = 30 neurons. The global energy minimum, E₀, is subtracted from the energy, E. (c) The number of energy minima increases sub-exponentially as the number of neurons included in the model increases. Error bars are the standard deviation across 10 different subgroups of N neurons. (d) The rank-frequency plot for the frequency of visiting each basin matches well between data and model, for a typical subgroup of 40 neurons. (e) The number of metabasins, grouped according to the energy barrier, diverges as the energy threshold ∆E approaches 1 from above.

Further visualization of the topography of the energy landscape is performed by constructing metabasins, following Ref [Becker and Karplus, 1997]. Here, we construct metabasins by grouping the energy minima according to the barrier heights; basins with barrier height lower than a given energy threshold, ∆E, are grouped into a single metabasin. This threshold can be varied: at high enough threshold, the system effectively does not see any local minima; at low threshold, the partition of the energy landscape approaches the partition given by the original basins of attraction. If the dynamics were just Brownian motion on the landscape, states within the same metabasin would transition into one another more rapidly than states belonging to different metabasins. As shown in Fig. 3.8e, there is a transition at ∆E ≈ 1.2 from single to multiple metabasins for all N = 10, 20, and 30. Since the dynamics of the real system do not correspond to a simple walk on the energy landscape (Appendix A.3 and Fig. A.1), we cannot conclude that this is a true dynamical transition, but it does suggest that the state space is organized in ways that are similar to what is seen in systems with such transitions.

3.5.2 Criticality

Maximum entropy models define probability distributions that are equivalent to equilibrium statistical physics problems. As these systems become large, we know that the parameter space separates into distinct phases, separated by critical surfaces. In several biological systems that have been analyzed, including the neural networks in the salamander retina and mouse hippocampus, the diversity of the human B cell repertoire, and the spontaneous flocking of European starlings, there are signs that these critical surfaces are not far from the operating points of the real networks [Meshulam et al., 2018, Tkacik et al., 2015, Bialek et al., 2012, Mora et al., 2010], although the interpretation of this result remains controversial [Mora and Bialek, 2011, Munoz, 2018]. Here we ask simply whether the same pattern emerges in C. elegans.

One natural slice through the parameter space of models corresponds to changing the effective temperature of the system, effectively scaling all terms in the log probability up and down uniformly. Concretely, we replace H(σ) → H(σ)/T in Eq (3.5). We monitor the heat capacity of the system, as we would in thermodynamics; here the natural interpretation is of the heat capacity as being proportional to the variance of the log probability, so it measures the dynamic range of probabilities that can be represented by the network. Results are shown in Fig. 3.9, for randomly chosen subsets of N = 10, 20, ..., 50 neurons. A peak in heat capacity often signals a critical point, and here we see that the maximum of the heat capacity approaches the operational temperature T₀ = 1 from below as N becomes larger, suggesting that the full network is near criticality.

Figure 3.9: The heat capacity is plotted against temperature for models with different numbers of neurons, N. The maximum of the heat capacity approaches the operational temperature of the C. elegans neural system, T₀ = 1, from below as N increases. Error bars are the standard error across 10 random subgroups of N neurons.

3.5.3 Network topology

The worm C. elegans is special in part because it is the only organism in which we

know (essentially) the full pattern of connectivity among neurons. Our models also

43

Page 58: physics.princeton.eduphysics.princeton.edu/archives/theses/lib/upload/chen_xiaowen_thesi… · Abstract In recent years, advances in experimental techniques have allowed for the rst

have a “connectome,” since only a small fraction of the possible pairs of neurons are

linked by a nonzero value of Jij. The current state of our experiments is such that

we cannot identify the individual neurons, and so we cannot check if the effective

connectivity in our model is similar to the anatomical connections. But we can ask

statistical questions about the connections, and we focus on two global properties

of the network: the clustering coefficient C, defined as the fraction of actual links

compared to all possible links connecting the neighbors of a given neuron, averaged

over all neurons; and the characteristic path length L, defined as the average short-

est distance between any pair of neurons. As shown in Fig. 3.10, the topology of

the inferred networks for all three worms that we investigated differ from random

Erdos-Renyi graphs with the same number of nodes (neurons) and links (non-zero

interactions). Moreover, as we increase the number of neurons that we consider, the

clustering coefficient C and the characteristic path length L approaches that found

in the structural connectome [Watts and Strogatz, 1998].
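Both quantities are easy to compute from the binary graph of nonzero couplings; a minimal sketch (equivalent results can be obtained with networkx's average_clustering and average_shortest_path_length):

```python
import numpy as np

def topology(J):
    """Clustering coefficient C and characteristic path length L of the graph
    whose edges are the nonzero couplings J_ij (assumes one connected component)."""
    A = (J != 0).astype(int)
    np.fill_diagonal(A, 0)
    N = A.shape[0]
    # C: fraction of a node's neighbor pairs that are themselves linked
    Cs = []
    for i in range(N):
        nb = np.flatnonzero(A[i])
        if len(nb) > 1:
            Cs.append(A[np.ix_(nb, nb)].sum() / (len(nb) * (len(nb) - 1)))
    # L: mean shortest-path length, via breadth-first search from every node
    dists = []
    for s in range(N):
        d = np.full(N, -1)
        d[s] = 0
        frontier = [s]
        while frontier:
            nxt = []
            for u in frontier:
                for v in np.flatnonzero(A[u]):
                    if d[v] < 0:
                        d[v] = d[u] + 1
                        nxt.append(v)
            frontier = nxt
        dists.extend(d[d > 0])
    return np.mean(Cs), np.mean(dists)
```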

3.5.4 Local perturbation leads to global response

How well can the sparsity of the inferred network explain the observed globally-distributed pairwise correlations? In particular, we would like to examine the response of the network to local perturbations. This test is of particular interest, since its predictions can be examined experimentally, as local perturbation of the neural network can be achieved through optogenetic clamping or ablation of individual neurons.

The maximum entropy model can be perturbed through both "clamping" and "ablation." By definition, the only possible state in which we can clamp a single neuron is the "flat" state, σ_k = p. Following the maximum entropy model [Eq. (3.6)], the probability distribution for the rest of the network becomes
$$P_k(\sigma) \equiv P(\{\sigma_i\}_{i\neq k}\,|\,\sigma_k = p) = \frac{1}{Z_k}\, e^{-H_k(\sigma)}, \qquad (3.15)$$



where the effective Hamiltonian is
$$H_k(\sigma) = -\frac{1}{2}\sum_{i \neq j \neq k} J_{ij}\, \delta_{\sigma_i \sigma_j} - \sum_{i \neq k} J_{ik}\, \delta_{\sigma_i p} - \sum_{i \neq k} \sum_{r=1}^{p-1} h_i^r\, \delta_{\sigma_i r}. \qquad (3.16)$$

On the other hand, ablation of neuron k means the removal of neuron k from the network, which leads to an effective Hamiltonian
$$\tilde{H}_k(\sigma) = -\frac{1}{2}\sum_{i \neq j \neq k} J_{ij}\, \delta_{\sigma_i \sigma_j} - \sum_{i \neq k} \sum_{r=1}^{p-1} h_i^r\, \delta_{\sigma_i r}. \qquad (3.17)$$

We examine the effects of clamping and ablation by Monte Carlo simulation of these modified models. We focus on the response of individual neurons i to perturbing neuron k, which is summarized by the change in the magnetizations, m_i^r → m̃_i^r. But since these also represent the probabilities of finding neuron i in each of the states r = 1, ..., p, we can measure the change as a Kullback-Leibler divergence,
$$D_{KL} = \sum_{r=1}^{p} \tilde{m}_i^r \log_2\!\left(\frac{\tilde{m}_i^r}{m_i^r}\right)\ {\rm bits}. \qquad (3.18)$$

As shown in Fig. 3.11, the response of the network to the local perturbation is distributed throughout the network for both clamping and ablation. However, clamping leads to much larger values of D_KL, suggesting that the network is more sensitive to clamping, and perhaps robust against (limited) ablation. Interestingly, this result echoes the experimental observation that C. elegans locomotion is easily disturbed through optogenetic manipulation of single neurons [Gordus et al., 2015, Liu et al., 2018], while ablation of single neurons has limited effect on the worms' ability to perform different patterns of locomotion [Gray et al., 2005, Piggott et al., 2011, Yan et al., 2017], although further experimental investigation is needed to test our hypotheses on network response.

3.6 Discussion

Soon it should be possible to record the activity of the entire nervous system of C. elegans as it engages in reasonably natural behaviors. As these experiments evolve, we would like to be in a position to ask questions about collective phenomena in this small neural network, perhaps discovering aspects of these phenomena which are shared with larger systems, or even (one might hope) universal. We start modestly, guided by the state of the data.

We have built maximum entropy models for groups of up to N = 50 cells, matching the mean activity and pairwise correlations in these subnetworks. Perhaps our most important result is that these models work, providing successful quantitative predictions for many higher order statistical structures in the network activity. This parallels what has been seen in systems where the neurons generate action potentials, but the C. elegans network operates in a very different regime. The success of pairwise models in this new context adds urgency to the question of when and why these models should work, and when we might expect them to fail.

Beyond the fact that the models make successful quantitative predictions, we find other similarities with analyses of vertebrate neural networks. The probability distributions that we infer have multiple peaks, corresponding to a rough energy landscape, and the parameters of these models appear close to a critical surface. In addition, we have shown that the inferred model is sparse, and has topological properties similar to those of the structural connectome. Nevertheless, a global response is observed when the modeled network is perturbed locally, in a way similar to experimental observations.

With the next generation of experiments, we hope to extend our analysis in four ways. First, longer recordings will allow construction of meaningful models for larger groups of neurons. If coupled with higher signal-to-noise ratios, it should also be possible to make a more refined description of the continuous signals relevant to C. elegans neurons, rather than having to compress our description down to a small number of discrete states. This alternative description will be mathematically equivalent to a Boltzmann distribution of soft spins, constrained by the one- and two-point functions as well as a family of higher order correlations specified by the data. Second, registration and identification of the observed neurons will make it possible to compare the anatomical connections between neurons with the pattern of interactions in our probabilistic models. Being able to identify neurons across multiple worms will also allow us to address the degree of reproducibility across individuals, and perhaps extend the effective size of data sets by averaging. Third, optogenetic tools will allow local perturbation of the neural network experimentally, which can be compared directly with the theoretical predictions in §3.5.4 above. Finally, improvements in experimental methods will enable construction of maximum entropy models for freely moving worms, with which we can map the relation between the collective behavior identified in the neuronal activity and the behavior of the animal.


Figure 3.11: Local perturbation of the neural network leads to global response. (a, b) For a typical group of N = 50 neurons, the inferred interaction matrix J is sparse. Here, the neuron indices i and j are sorted based on m_i^flat, as in Fig. 3.4. (c, d) When neuron k is clamped to a constant voltage, the Kullback-Leibler divergence (in bits) of the marginal distribution of states for neuron i is distributed throughout the network. (e, f) When neuron k is ablated, the D_KL is also distributed throughout the network, but is smaller than in response to clamping.


Chapter 4

Searching for long time scales without fine tuning

The materials in this chapter were previously posted in [Chen and Bialek, 2020].

Most of animal and human behavior occurs on time scales much longer than the response times of individual neurons. In many cases it is plausible that these long time scales emerge from the recurrent dynamics of electrical activity in networks of neurons. In linear models, time scales are set by the eigenvalues of a dynamical matrix whose elements measure the strengths of synaptic connections between neurons. It is not clear to what extent these matrix elements need to be tuned in order to generate long time scales; in some cases, one needs not just a single long time scale but a whole range. Starting from the simplest case of random symmetric connections, we combine maximum entropy and random matrix theory methods to construct ensembles of networks, exploring the constraints required for long time scales to become generic. We argue that a single long time scale can emerge generically from realistic constraints, but a full spectrum of slow modes requires more tuning. Langevin dynamics that will generate patterns of synaptic connections drawn from these ensembles involve a combination of Hebbian learning and activity-dependent synaptic scaling.

4.1 Introduction

Living systems face various challenges over their lifetimes, and responding to these challenges often involves behaviors that occur over multiple time scales. As an example, a migratory bird needs both to react to instantaneous gusts and to navigate its course over months. Recent experiments have focused attention on this problem, demonstrating the approximate power-law decay of behavioral correlation functions in fruit flies and mice [Berman et al., 2016, Shemesh et al., 2013], and near-marginal modes in locally linear approximations to the neural and behavioral dynamics of the nematode C. elegans [Costa et al., 2019]. These long time scales could emerge from responses of the organism to a fluctuating environment, or could be intrinsic, as would happen if the underlying neural networks were poised near criticality [Mora and Bialek, 2011, Munoz, 2018].

In the cases where we can decouple the organism from its environment, the long

time scales in behavior must come from long time scales in the generator of behav-

iors, the nervous system. While transient responses of individual neurons decay on

the time scale of tens of milliseconds, autonomous behaviors can last orders of mag-

nitude longer. We see this when we hold a string of numbers in our heads for tens of

seconds before dialing a phone, and when a musician plays a piece from memory that

lasts many minutes. Experimentally, long time scales in behavior have been associ-

ated with persistent neural activities, where after a pulse stimulation, some neurons

are found to hold their firing rate at specific values that encode the transient stim-

uli [Aksay et al., 2001, Brody et al., 2003a, Major et al., 2004a, Major et al., 2004b,

Major and Tank, 2004, Srimal and Curtis, 2008].


It is plausible that persistent neural activities emerge from the recurrent dynamics

of electrical activity in the network of neurons. In the simplest linear model, the

relaxation times of the system depend on the eigenvalues of a matrix representing the

synaptic connection strengths among neurons, and we can imagine this being tuned

so that time scales become arbitrarily long [Seung, 1996]. This simple model has

successfully explained the long time scale in the oculomotor system of goldfish, where

the nervous system tunes its dynamics to be slow and stable using constant feedback

from the environments [Major et al., 2004a, Major et al., 2004b]. In general, long

time scales in linear dynamical systems require fine tuning, as the modes need to be

slow, but not unstable. There have been a number of discussions of how to avoid such

fine tuning [Brody et al., 2003b], including adding non-linearity to create discrete

approximations of the continuous attractors for the dynamics [Brody et al., 2003b],

placing neurons in special configurations such as a feed-forward line [Goldman, 2009]

or a ring network [Burak and Fiete, 2012], promoting the interaction matrix to a

dynamical variable [Magnasco et al., 2009], and regulating the overall neural activity

with synaptic scaling [Renart et al., 2003, Tetzlaff et al., 2013]. Nonetheless, it is not

clear to what extent connection strengths need to be tuned in order to generate

sufficiently long time scales, especially when one needs not just a single long time

scale but a whole spectrum of slow modes.

In this manuscript, we address the fine-tuning question by asking whether we can

find ensembles of random connection matrices, subject to biologically plausible con-

straints, such that the resulting time scales of the system grow with increasing system

sizes. We also discuss the conditions for systems to exhibit a continuous spectrum of

slow modes as opposed to single slow modes. Finally, we present plausible dynamics

for the system to tune its connection matrix towards these desired ensembles.


4.2 Setup

The problem of characterizing time scales in fully nonlinear neural networks—or

any high dimensional dynamical system—is very challenging. To make progress,

we follow the example of Ref [Seung, 1996] and consider the case of linear net-

works. For linear systems, time scales are related to the eigenvalues of the dy-

namical matrix that embodies the pattern of synaptic connectivity. The question

of whether behavior is generic can be made precise by drawing these matrices at

random from some probability distribution, connecting with the large literature on

random matrix theory [Livan et al., 2018, Marino, 2016]. Importantly, we expect

that some behaviors in these ensembles of networks become sharp as the networks

become large, a result which has been exploited in thinking about problems ranging

from energy levels of quantum systems [Wigner, 1951] to ecology [May, 1972] and

finance [Bouchaud and Potters, 2003].

Concretely, we represent the activity of each neuron i = 1, 2, · · · , N by a continuous

variable xi, which we might think of as a smoothed version of the sequence of action

potentials, and assume a linear dynamics

\dot{x}_i = -x_i + M_{ij} x_j + \eta_i(t). \qquad (4.1)

If the neurons were unconnected (M = 0), their activity x would relax exponentially

on a time scale which we choose as our unit of time. In what follows it will be

important to imagine that the system is driven, at least weakly, and we take these

driving terms to be independent in each cell and uncorrelated in time, with \langle \eta_i(t)\rangle = 0 and

\langle \eta_i(t)\,\eta_j(t')\rangle = 2\delta_{ij}\,\delta(t-t'). \qquad (4.2)


The choice of white noise is conventional, but also important because we want to

understand how time scales emerge from the network dynamics rather than being

imposed upon the network by outside inputs.
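To make this concrete, the dynamics of Eqs (4.1) and (4.2) are easy to integrate numerically. The following is a minimal sketch (ours, for illustration; the function name and step sizes are arbitrary choices), using the Euler–Maruyama scheme:

import numpy as np

def simulate_linear_network(M, T=100.0, dt=0.01, seed=None):
    # Integrate dx_i/dt = -x_i + sum_j M_ij x_j + eta_i(t), Eq (4.1),
    # with white noise of covariance 2*delta_ij*delta(t-t'), Eq (4.2).
    rng = np.random.default_rng(seed)
    N = M.shape[0]
    A = M - np.eye(N)                       # total drift matrix
    x = np.zeros(N)
    traj = np.empty((int(T / dt), N))
    for step in range(traj.shape[0]):
        noise = np.sqrt(2.0 * dt) * rng.standard_normal(N)   # variance 2*dt per step
        x = x + A @ x * dt + noise          # Euler-Maruyama update
        traj[step] = x
    return traj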

In linear systems we can rotate to independent modes, corresponding to weighted

combinations of the original variables. If the matrix M is symmetric, then the dy-

namics are described by the relaxation times of these modes,

\tau_i \equiv \frac{1}{1-\lambda_i} = \frac{1}{k_i},

where \lambda_i are the eigenvalues of M and k_i \equiv 1 - \lambda_i is the distance of mode i from instability; the system is stable only if all \lambda_i < 1. If

the matrix M is chosen from a distribution P (M) then the eigenvalues are random

variables, but their density, for example, becomes smooth and well defined in the

limit N →∞,

\rho(\lambda) \equiv \lim_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} \delta(\lambda - \lambda_i). \qquad (4.3)

The simplest case is the Gaussian Orthogonal Ensemble (GOE), where the matrix

elements are independent Gaussian random variables, with variances such that

M_{ii} \sim \mathcal{N}(0, c^2/N) \qquad (4.4)
M_{ij}\big|_{i\neq j} \sim \mathcal{N}(0, c^2/2N); \qquad (4.5)

the factor of N in the variance ensures that the density ρ(λ) has support over a range

of eigenvalues that are O(1) at large N .
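As an illustrative sketch (not code from this work), a matrix with exactly these variances can be drawn by symmetrizing an i.i.d. Gaussian matrix:

import numpy as np

def sample_goe(N, c, seed=None):
    # Symmetrizing an iid Gaussian matrix gives Var(M_ii) = c^2/N on the
    # diagonal and Var(M_ij) = c^2/(2N) off the diagonal, as in Eqs (4.4)-(4.5).
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((N, N))
    return (c / np.sqrt(N)) * (G + G.T) / 2.0

# The eigenvalues of a large sample then fill a support of O(1) width:
lam = np.linalg.eigvalsh(sample_goe(2000, 1 / np.sqrt(2)))
print(lam.min(), lam.max())   # approximately -1 and +1 at c = 1/sqrt(2)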

When we say that we want to search for long time scales, there are two possibilities.

One is that we are interested in the single longest time scale, and the other is that

we are interested in the full range of time scales. To get at these different questions,

we define the longest time scale of the system, τmax, and the correlation time scale,


Figure 4.1: Schematics. A linear dynamical system with damping and pairwise interaction M has time scales determined by the eigenvalue spectrum of M, especially the gap g_0 to the stability threshold. If the system is perturbed (red arrow), the norm activity decays with a longest time scale τ_max = 1/g_0, while the correlations in the unperturbed system decay with a characteristic time scale, defined to be the correlation time τ_corr. In cases where the system has a continuous range of long time scales, the correlations decay as a power law (red curve). Systems with long time scales are defined such that τ_max and τ_corr grow with system size N.

τcorr. We continue to think about the case where M is symmetric, and return to the

more general case in the discussion.

Longest time scale. The longest time scale, τmax, is the time constant given by

the slowest mode of the system, which dominates the dynamics after long enough

times. This time scale is determined by the gap, g0, between the largest eigenvalue

and the stability threshold, which with our choice of units is w = 1. Mathematically,

we define

\tau_{\rm max} \equiv \frac{1}{g_0} = \frac{1}{1-\lambda_{\rm max}}. \qquad (4.6)


In the thermodynamic limit, the gap is taken to be between the stability threshold

and the right edge of the support of the spectral density.¹

Correlation time. To get at the correlation time, let’s take seriously the idea that

the network is driven by noise. Then x(t) becomes a stochastic process, and from Eqs

(4.1) and (4.2) we can calculate the correlation function

C_N(t) \equiv \frac{1}{N}\sum_i \langle x_i(0)\, x_i(t)\rangle = \frac{1}{N}\sum_i \frac{1}{1-\lambda_i}\, e^{-(1-\lambda_i)|t|} = \frac{1}{N}\sum_i \tau_i\, e^{-|t|/\tau_i}. \qquad (4.7)

The normalized correlation function

R_N(t) \equiv \frac{C_N(t)}{C_N(0)} = \frac{\sum_i \tau_i\, e^{-|t|/\tau_i}}{\sum_i \tau_i} \qquad (4.8)

has the intuitive behavior of starting at RN(0) = 1 and decaying monotonically. Then

there is a natural definition of the correlation time, by analogy with single exponential

decays,

\tau_{\rm corr} \equiv \int_0^\infty dt\, R_N(t) = \frac{\sum_i \tau_i^2}{\sum_i \tau_i}. \qquad (4.9)

In the thermodynamic limit, the autocorrelation coefficient R(t) and the correlation time τ_corr become ratios of integrals over the eigenvalue density ρ(λ).

Importantly, τmax depends only on the largest eigenvalue, while τcorr depends on

the entire spectrum, and hence can be used to differentiate cases where the system is

dominated by a single vs. a continuous spectrum of slow modes. The two time scales

satisfy τ_corr ≤ τ_max, with equality attained only when all eigenvalues are equal, i.e. when the spectral density is a delta function at λ = λ_max.

¹We note that this approximation is not ideal, as the spectral distribution of eigenvalues does not converge uniformly. In some cases, the fluctuation of the largest eigenvalue can be more meaningful than the average.
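Given a finite sample of eigenvalues, both definitions are straightforward to evaluate; a minimal sketch (our notation, not code from this work):

import numpy as np

def time_scales(eigvals):
    # tau_max from the gap to the stability threshold, Eq (4.6);
    # tau_corr as the weighted average of mode time scales, Eq (4.9).
    tau = 1.0 / (1.0 - np.asarray(eigvals))   # assumes all eigenvalues < 1
    return tau.max(), np.sum(tau**2) / np.sum(tau)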


With these definitions, we refine our goal: to find biologically plausible ensembles

for the connection matrix M , such that the resulting stochastic linear dynamics has

time scales, τmax and τcorr, that are “long,” growing as a power of the system size N ,

perhaps even extensively. To avoid fine tuning, we will construct examples of such

ensembles by imposing global constraints on measurable observables of the dynamical

system. We then compute the spectral density and the corresponding time scales using

a combination of mean-field theory and numerical sampling of finite systems.

4.3 Time scales for ensembles with different global

constraints

To construct linear dynamical systems that generate long time scales without fine

tuning individual parameters, we want to find probability distributions P (M) for

the connection matrix such that time scales are long, on average. We start with

the simplest Gaussian distribution, the GOE above, and gradually add constraints.

We will see that for the GOE itself, there is a critical value of the scaled variance

c^2_crit = 1/2. For c < c_crit the system is stable but time scales are short, while for

c > ccrit the system is unstable. Exactly at the critical point c = ccrit time scales

are long in the sense that we have defined, diverging with system size. The essential

challenge is to find the weakest constraints on P (M) that make these long time scales

generic. Many of the results that we need along the way are known in the random

matrix theory literature, but we will arrive at some new theoretical questions.

4.3.1 Model 1: the Gaussian Orthogonal Ensemble

The simplest ensemble for the interaction matrix M is the Gaussian Orthogonal

Ensemble (GOE) without any additional constraints, which has been studied since

the beginning of random matrix theory [Wigner, 1951, Dyson, 1962a, Dyson, 1962b,


Figure 4.2: Spectral density for the connection matrix M drawn from the Gaussian Orthogonal Ensemble (GOE, a), the GOE with a hard threshold enforcing stability (hard threshold, b), and the GOE with an additional global constraint on the norm activity (soft threshold, d). Three representative parameters are chosen for each ensemble such that the system is subcritical (c = 0.6, red), critical (c = 1/√2, blue), and supercritical (c = 0.8, black). The stability threshold is visualized as the dashed gray line at λ_w = 1. Panel (c) is a schematic for constraining the averaged norm activity, generating ensembles with the soft threshold.


Livan et al., 2018] and overviewed in Chapter 2.2. Mathematically, we have M = Mᵀ,

and

P(M) \propto \exp\left(-\frac{N}{2c^2}\,{\rm Tr}\, M^\intercal M\right); \qquad (4.10)

since c sets the scale of synaptic connections, we will refer to this parameter as the

interaction strength. Because the distribution only depends on matrix traces, it is

invariant to rotations of M . Equivalently, if we think of decomposing the matrix M

into its eigenvectors and eigenvalues, the probability depends only on the eigenvalues.

Thus, we can integrate out the eigenvectors, and obtain the joint distribution of

eigenvalues,

P_{\rm GOE}(\{\lambda_i\}) \propto \exp\left[-\frac{N}{2c^2}\sum_i \lambda_i^2 + \frac{1}{2}\sum_{j\neq k}\ln|\lambda_j - \lambda_k|\right], \qquad (4.11)

where the logarithmic repulsion term emerges from the Jacobian when we change

variables from matrix elements to eigenvalues and eigenvectors. The spectral density

can then be found using a mean field approximation, which becomes exact as N →∞.

The result is Wigner’s well-known semicircle distribution [Wigner, 1951],

\rho_{\rm GOE}(\lambda) = \frac{1}{\pi c}\sqrt{2 - \frac{\lambda^2}{c^2}}, \qquad \lambda \in [-\sqrt{2}\,c, \sqrt{2}\,c]. \qquad (4.12)

Equation (4.12) for the spectral density, together with Fig 4.2a, shows that there

is a phase transition at c_crit = 1/√2. If the interaction strength is greater than this

critical strength (supercritical), then λmax > 1 and the system becomes unstable. If

the interaction strength is smaller (subcritical), then the gap size between the largest

eigenvalue and the stability threshold λ = 1 is of order 1, and the time scales remain

finite as system size increases. The only case when the system has slow modes is at

the critical value of the interaction strength, c = c_crit = 1/√2, where the spectral

density becomes tangential to the stability threshold.


At criticality, corresponding to the blue curve in Fig 4.2a, we can estimate the

size of the gap by asking that the gap be large enough to contain ∼ 1 mode, that is

\int_{1-g_0}^{1} d\lambda\; N\rho(\lambda) \sim 1. \qquad (4.13)

With ρ(λ) ∼ (1 − λ)^{1/2}, this gives N g_0^{3/2} ∼ 1 or g_0 ∼ N^{-2/3}. Thus the longest time scale grows with system size, τ_max ∼ N^{2/3}.
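This scaling is simple to check numerically; a rough sketch, assuming the sample_goe routine sketched above:

import numpy as np
# At criticality the product g0 * N^(2/3) should be roughly constant across N.
for N in (125, 250, 500, 1000):
    gaps = [1.0 - np.linalg.eigvalsh(sample_goe(N, 1 / np.sqrt(2))).max()
            for _ in range(20)]
    print(N, np.mean(gaps), np.mean(gaps) * N ** (2.0 / 3.0))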

In the same way, we can estimate the full correlation function [Eq (4.7)],

C_N(t) = \frac{1}{N}\sum_i \tau_i\, e^{-t/\tau_i} = \int d\lambda\, \frac{\rho(\lambda)}{1-\lambda}\, e^{-(1-\lambda)|t|} \qquad (4.14)
\sim \int d\lambda\, \frac{1}{(1-\lambda)^{1/2}}\, e^{-(1-\lambda)|t|} \sim |t|^{-1/2}. \qquad (4.15)

This has the power–law behavior expected for a critical system, where there is a

continuum of slow modes.

Note that in the GOE system, slow modes with time scales growing as system

size are only possible at a single value of the interaction strength. Nonetheless, we

need to distinguish the fine tuning here as happening at an ensemble level, which

is different from the element-wise fine tuning that might have been required if we

considered particular interaction matrices.

4.3.2 Model 2: GOE with hard stability threshold

Drawing interaction matrices from the GOE leads to long time scales only at a critical

value of interaction strength. Can we modify the ensemble such that long time scales

can be achieved without this fine tuning? In particular, in the GOE, if the interaction

strength is too large, then the system becomes unstable. What will the spectral


distribution look like if we allow c > ccrit but edit out of the ensemble any matrix

that leads to instability?

Mathematically, a global constraint on the system stability requires all eigenval-

ues to be less than the stability threshold, λw = 1. This modifies the eigenvalue

distribution with a Heaviside step function:

P_{\rm hard}(\{\lambda_i\}) \propto P_{\rm GOE}(\{\lambda_i\}) \prod_i \Theta(1 - \lambda_i). \qquad (4.16)

Conceptually, what this model does is to pull matrices out of the GOE and discard

them if they produce unstable dynamics; the distribution Phard(λ) describes the

matrices that remain after this editing. Importantly we do not introduce any extra

structure, and in this sense Phard is a maximum entropy distribution, as discussed

more fully below.

The spectral density ρ(λ) that follows from Phard was first found by Dean and

Majumdar [Dean and Majumdar, 2006, Dean and Majumdar, 2008]. Again, there is

a phase transition depending on the interaction strength. For ensembles with inter-

action strength less than the critical value c_crit = 1/√2, the stability threshold is

away from the bulk spectrum, so the spectral density remains as Wigner’s semicircle.

On the other hand, if the interaction strength is greater than the critical value, the

spectral density becomes

\rho(\lambda) = \frac{1}{2\pi c^2}\sqrt{\frac{\lambda + l^* - 1}{1 - \lambda}}\;(l^* - 2\lambda), \qquad (4.17)

where

l^* = \frac{2}{3}\left(1 + \sqrt{1 + 6c^2}\right).

As shown in Fig 4.2b, the stability threshold acts as a wall pushing the eigenvalues to

pile up. More precisely, near the stability threshold λ = 1 we have ρ(λ) ∼ (1 − λ)^{-1/2},


which [by the same argument as in Eq (4.13)] indicates that the longest time scale

increases with system size as τ_max ∼ N^2.
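Because stable matrices become exponentially rare at c > c_crit, rejection sampling of whole matrices is impractical; instead one can sample the eigenvalue gas of Eq (4.11) directly, rejecting only proposals that cross the wall of Eq (4.16). A minimal Metropolis sketch (ours, not the code used for the figures):

import numpy as np

def sample_hard_wall(N=64, c=1.0, n_steps=200000, step=0.05, seed=None):
    rng = np.random.default_rng(seed)
    lam = rng.uniform(-1.0, 0.9, N)     # start strictly below the wall
    def energy(v):
        # minus log of Eq (4.11): Gaussian confinement plus log repulsion;
        # recomputed in full for clarity (one could update only terms with v[i])
        d = np.abs(v[:, None] - v[None, :]) + np.eye(N)   # log(1)=0 on diagonal
        return (N / (2 * c**2)) * np.sum(v**2) - 0.5 * np.sum(np.log(d))
    E = energy(lam)
    for _ in range(n_steps):
        i = rng.integers(N)
        prop = lam.copy()
        prop[i] += step * rng.standard_normal()
        if prop[i] >= 1.0:              # hard stability threshold, Eq (4.16)
            continue
        E_new = energy(prop)
        if E_new < E or rng.random() < np.exp(E - E_new):
            lam, E = prop, E_new
    return np.sort(lam)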

The autocorrelation function also is dominated by the eigenvalues close to the

stability threshold. The calculation is a bit subtle, however, since

C(t) = \int d\lambda\, \frac{\rho(\lambda)}{1-\lambda}\, e^{-(1-\lambda)|t|} \sim \int dk\; k^{-1/2}\, k^{-1}\, e^{-k|t|}, \qquad k \equiv 1 - \lambda,

is not integrable. After introducing an IR cut-off at ε ∼ g_0 ∼ N^{-2}, we can write the

resulting autocorrelation coefficient as

R(t) = \frac{C(t)}{C(0)} = 1 - \sqrt{\pi}\,(\epsilon t)^{1/2} + \epsilon t + O((\epsilon t)^2). \qquad (4.18)

The correlation time

\tau_{\rm corr} = \frac{\int_\epsilon dk\, \rho(k)\, k^{-2}}{\int_\epsilon dk\, \rho(k)\, k^{-1}} \sim \frac{\int_\epsilon dk\, k^{-5/2}}{\int_\epsilon dk\, k^{-3/2}} \sim \epsilon^{-1} \sim N^2. \qquad (4.19)

We see that for supercritical systems, both the longest time scale τmax and the

correlation time τ_corr increase as a power of the system size; the growth is even faster than for the system at criticality. In fact, there are divergently many slow modes. Meanwhile, the interaction strength can take any of a range of values, as long as it is greater than a certain threshold. Thus it would seem that we have overcome the fine tuning

problem!

In fact, we cannot quite claim that the problem is solved. First, in the supercritical

phase, the correlation function does not decay as a power law. Instead, the correlation

function stays at 1 for a time period τcorr ∼ τmax, and then decays exponentially. This

means the system has a single long time scale, rather than a continuous spectrum

of slow modes. Second, in order for a system to impose a hard constraint on its

stability, it needs to measure its stability. Naively, checking for stability, especially


in the presence of slow modes, requires access to infinitely long measuring times;

implementing a sharp threshold may also be challenging.

4.3.3 Model 3: Constraining mean-square activity

While it can be difficult to check for stability, it is much easier to imagine checking the

overall level of activity in the network. One can even think about mechanisms that

would couple indirectly to activity, such as the metabolic load. If the total activity is

larger than some target level, the system might be veering toward instability, and there

could be feedback mechanisms to reduce the overall strength of connections. Regula-

tion of this qualitative form is known to occur in the brain, and is termed synaptic scal-

ing [Turrigiano et al., 1998, Turrigiano and Nelson, 2004, Abbott and Nelson, 2000];

this is hypothesized to play an important role in maintaining persistent neural activ-

ities [Renart et al., 2003, Tetzlaff et al., 2013]. In this section, we construct the least

structured distribution P (M) that is consistent with a fixed mean (square) level of

activity, which we can think of as a soft threshold on stability, and derive the density

of eigenvalues that follow from this distribution. In the following section we discuss

possible mechanisms for a system to generate matrices M , dynamically, out of this

ensemble.

The spectral density

It is useful to remember that the GOE, Eq (4.10), can be seen as the maximum

entropy distribution of matrices consistent with some fixed variance of the matrix

elements Mij [Jaynes, 1957, Presse et al., 2013]. If we want to add a constraint,

we can stay within the maximum entropy framework, and in this way we isolate

the generic consequences of this constraint: we are constructing the least structured

ensemble of networks that satisfies the added condition.


We recall that if we want to constrain the mean values of several functions fµ(M),

then the maximum entropy distribution has the form

P(M) = \frac{1}{Z}\exp\left[-\sum_\mu g_\mu f_\mu(M)\right]. \qquad (4.20)

In our case we are interested in the mean–square value of the individual matrix

elements, and the mean–square value of the activity variables xi. But our basic

model Eqs (4.1) and (4.2) predicts that

\mu = \frac{1}{N}\sum_i \langle x_i^2\rangle = \frac{1}{N}\sum_i \frac{1}{1-\lambda_i}, \qquad (4.21)

so the relevant maximum entropy model becomes

P(M) = \frac{1}{Z}\exp\left[-\frac{N}{2c^2}{\rm Tr}\,M^\intercal M - N\xi\sum_i \frac{1}{1-\lambda_i}\right]. \qquad (4.22)

Again, this distribution is invariant to orthogonal transformation. After the integra-

tion over the rotation matrices, we have

P(\{\lambda_i\}) \propto \exp\left[-\frac{N}{2c^2}\sum_i \lambda_i^2 + \frac{1}{2}\sum_{j\neq k}\ln|\lambda_j - \lambda_k| - N\xi\sum_i \frac{1}{1-\lambda_i}\right], \qquad (4.23)

where the scaling Nξ ensures that all the terms in the exponent are ∼ N2, so there

will be a well defined thermodynamic limit. Luckily, the same arguments that yield

the exact density of eigenvalues in the Gaussian Orthogonal Ensemble also work here

and we find

\rho(\lambda) = \frac{1}{\pi\sqrt{(\lambda - 1 + g_0 + l)(1 - g_0 - \lambda)}}\, B(\lambda),

B(\lambda) = \left[ 1 + \frac{l^2}{8c^2} + \left(1 - g_0 - \frac{1}{2}l\right)\frac{\lambda}{c^2} - \frac{\lambda^2}{c^2} + \frac{\xi}{2}\,\frac{(2g_0 - 2g_0^2 + l - 2g_0 l) - (2g_0 + l)\lambda}{\sqrt{g_0(g_0 + l)}\,(\lambda - 1)^2} \right], \qquad (4.24)


where the gap size g_0 and the width of the support l are fixed by requiring that the spectral density vanish at the two ends of the support.

To our surprise, we find a finite gap for all ξ > 0. This means that there exists

a maximum time scale even when the system is infinitely large. This upper limit of

longest time scale depends on the Lagrange multiplier ξ and the interaction strength

c. Because the Lagrange multiplier ξ is used to constrain the averaged norm activity

µ, the maximum time scale is set by the allowed norm of the activity µ, measured in

units of the expected norm for independent neurons. As we explain below, the greater the dynamic range the system can allow, the longer the time scales.

The dependence of the gap on the Lagrange multiplier ξ is shown in Fig 4.2d at

each of several fixed values of c; as before there is a phase transition at c_crit = 1/√2.

This is understandable, since in the limit of ξ = 0 we recover the hard wall case. For

small ξ, the spectrum is similar to the hard wall case, with the eigenvalues close to the

stability threshold pushed into the bulk spectrum; for large ξ, the entire spectrum is

pushed away from the wall. A closer look at the longest time scale τmax vs. Lagrange

multiplier in Fig. 4.3a confirms that amplification of time scales occurs only when

ξ < 1, corresponding to an amplification of mean–square activity µ.
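In a sampler such as the Metropolis sketch of the previous section, this soft threshold amounts to adding the constraint term of Eq (4.23) to the energy; an illustrative sketch:

import numpy as np

def energy_soft(v, N, c, xi):
    # minus log of Eq (4.23); the last term replaces the hard wall of Eq (4.16).
    # Proposals should still keep all v < 1, where the penalty diverges.
    d = np.abs(v[:, None] - v[None, :]) + np.eye(N)
    return ((N / (2 * c**2)) * np.sum(v**2)
            - 0.5 * np.sum(np.log(d))
            + N * xi * np.sum(1.0 / (1.0 - v)))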

The scaling of time scales in three phases

We now discuss the dependence of time scales on the interaction strength c. In

contrast to the ensemble with a hard stability threshold, we find a finite gap for

all values of interaction strength c, but the scaling of time scales vs. the Lagrange

multiplier (and hence the mean–square activity) is different in the different phases.

For the subcritical and critical phases, the results are as expected from the hard

wall case. In the subcritical phase, as ξ → 0, the spectral distribution converges

smoothly to the familiar semicircle. The time scales and the mean–square activity

both approach constants. On the other hand, when c = ccrit, we find that the longest


time scale grows as τ_max ∼ ξ^{-2/5}, and the correlation time scale τ_corr ∼ ξ^{-1/5}, i.e.

both time scales can be large if ξ is small enough; meanwhile, the norm activity µ

approaches a constant value of µ = 2. This suggests that if a system is poised at

criticality, then the system can exhibit long time scales, even when the dynamic range

of individual components is well controlled. The autocorrelation function exhibits a

power law-like decay, as expected; see the blue curve in Fig 4.3f.

The most interesting case is the supercritical phase, where the interaction strength

c > c_crit = 1/√2. As ξ → 0, the spectrum does not converge to the spectrum with

the hard constraint. We find that both time scales and the norm activity increase as

power laws of the Lagrange multiplier ξ (Fig 4.3), with τ_max ≈ 3τ_corr ∼ ξ^{-2/3} and µ ∼ ξ^{-1/3}. This implies that the time scales grow as a power of the allowed dynamic

range of the system, although not with the size of the system. The question of whether

the resulting time scales are “long” then becomes more subtle. Quantitatively, we

see from Fig 4.3c that if the system has an allowed dynamic range just 10× that of independent neurons, the system can generate time scales τ_max almost 10^4× longer than the time scale of isolated neurons.

Interestingly, once the system is in the supercritical phase, the ratio of amplifica-

tion has only a small dependence on the interaction strength c (Fig 4.3e). Intuitively,

an increasing interaction strength c implies that, without constraints, more modes would be unstable; with constraints, more modes concentrate near the stability threshold, but the entire support of the spectrum also expands, so the density of slow modes and their distance to the stability threshold remain similar. This is perhaps another indication that long time scales can arise without fine tuning when the system uses its

norm activity to regulate its connection matrix.

We note that, although both the critical phase and the supercritical phase can

reach time scales that are as long as the dynamic range allows, there are significant

differences between the two phases. One difference is that in the critical phase, locally


the dynamic range for each neuron can remain finite, while for the supercritical phase,

the variance of activity for individual neurons can be much greater. Moreover, as

shown by Fig. 4.3f, systems in the supercritical phase are dominated by a single slow

mode, rather than by a continuous spectrum of slow modes. While the autocorrelation

function decays as a power law in the critical phase, in the supercritical phase, it holds

at R(t) = 1 for a much longer time compared to the subcritical case, but then decays

exponentially. While a single long time scale can be achieved without fine tuning, it

seems that a continuous spectrum of long time scales is much more challenging.

Finite Size Effects

If we want these ideas to be relevant to real biological systems, we need to understand

what happens at finite N . We investigate this numerically using direct Monte-Carlo

sampling of the (joint) eigenvalue distribution. As the system size grows the time

scales τmax and τcorr also grow, up to the upper limit given by the mean field results;

see Fig 4.4. Finite size scaling is difficult in this case, since the scaling exponent α

for the gap difference,

\Delta g_0(N) \equiv g_0(N) - g_0 \sim N^{-\alpha}, \qquad (4.25)

depends on the Lagrange multiplier ξ (Fig 4.4e,f ). In particular, the scaling interpo-

lates between two limiting cases: for small ξ, the gap scales as Δg_0 ∼ N^{-2}, as in the universality class of the hard threshold; for large ξ, the gap scales as Δg_0 ∼ N^{-2/3},

which is in the universality class of the Gaussian ensemble without any additional

constraint. In any case, thousands of neurons will be well described by the N → ∞

limit.

Distribution of matrix elements

Now that we have examples of network ensembles that exhibit long time scales, we

need to go back and check what these ensembles predict for the distribution of individ-


Figure 4.3: Mean field results for Model 3. The longest time scale τ_max (a) and the correlation time scale τ_corr (b) increase with different scalings as the Lagrange multiplier ξ decreases, corresponding to an increasing value of the constrained averaged norm activity µ (c, d). The exact scaling and the amplification of the time constant depend on whether the system is subcritical (with interaction strength c < 1/√2), critical (c = 1/√2), or supercritical (e). Although the supercritical phase exhibits long time scales, the autocorrelation function decays as a power law only at c = c_crit, and exponentially for other values of the interaction strength (f). For this panel, ξ = 10^{-10}; red curves are for c = 0.2 and c = 0.6, the blue curve is at c = c_crit, and black curves are at c = 1 and c = 3.


Figure 4.4: Finite size effects on the time scales τ_max and τ_corr for Model 3, at interaction strength c = 1 (the supercritical phase). Results from direct Monte Carlo sampling of the eigenvalue distribution are plotted together with the mean field results. There is no universal exponent that describes the convergence of τ_max as system size increases (e), and the apparent exponent α depends on the Lagrange multiplier ξ. Panel (f) shows that the apparent α also depends on the maximum system size used in the fitting, interpolating between α = 2 and α = 2/3.


ual matrix elements. In particular, because we did not constrain the self interaction

Mii to be 0, we want to check whether the long time scales emerge as a collective

behavior of the network, or trivially from an effective increase of the intrinsic time

scales for individual neurons, τ^eff_ind = 1/(1 − ⟨M_ii⟩) = 1/(1 − ⟨λ⟩). A similar ques-

tion arises in real networks, where there have been debates about the importance of

feedback within single neurons vs. the network dynamics in maintaining long time

scales; see Ref [Major and Tank, 2004] for a review. We confirm that, at least in

our models, this is not an issue: the constraint on the norm activity, in fact, pushes

the average eigenvalues to be negative, and hence the effective self interaction for

individual neurons actually leads to a shorter intrinsic time scale. We can impose

additional constraints on the distribution of matrix elements, such that 〈Mii〉 = 0;

for this ensemble, we can again solve for the spectral distribution, and we find that

the scaling behaviors described above do not change.

4.4 Dynamic tuning

So far, we have established that a distribution constraining the norm activity can

lead generically to long time scales, but we haven’t really found a mechanism for

implementing this idea. But if we can write the distribution of connection matrices

M in the form of a Boltzmann distribution, we know that we can sample this distribution by allowing the matrix elements to be dynamical variables undergoing Brownian motion

in the effective potential. We will see that this sort of dynamics is closely related

to previous work on self–tuning to criticality [Magnasco et al., 2009], and we can

interpret the dynamics as implementing familiar ideas about synaptic dynamics, such

as Hebbian learning and metaplasticity.


We can rewrite our model in Eq (4.22) as

P(M) = \frac{1}{Z}\exp\left[-\frac{N}{2c^2}{\rm Tr}\,M^\intercal M - N\xi\sum_i \frac{1}{1-\lambda_i}\right] = \frac{1}{Z}\exp\left[-V(M)/T\right] \qquad (4.26)

V(M) = \frac{1}{2}{\rm Tr}\,M^\intercal M + c^2\xi\,{\rm Tr}\,(1-M)^{-1}, \qquad (4.27)

with a temperature T = c^2/N. The matrix M will be drawn from the distribution

P (M), as M itself performs Brownian motion or Langevin dynamics in the potential

V (M):

\tau_M \dot{M} = -\frac{\partial V(M)}{\partial M} + \zeta(t) = -M - c^2\xi\,(1-M)^{-2} + \zeta(t), \qquad (4.28)

where the noise has zero mean, 〈ζ〉 = 0, and is independent for each matrix element,

\langle \zeta_{ij}(t)\,\zeta_{kl}(t')\rangle = 2T\tau_M\,\delta_{ik}\delta_{jl}\,\delta(t-t'). \qquad (4.29)

It is useful to remember that, in steady state, our dynamical model for the xi,

Eqs (4.1) and (4.2), predicts that

\langle x_i x_j\rangle = [(1-M)^{-1}]_{ij}. \qquad (4.30)

This means that we can rewrite the Langevin dynamics of M , element by element, as

\tau_M \dot{M}_{ij} = -M_{ij} - c^2\xi\,[(1-M)^{-2}]_{ij} + \zeta_{ij}(t) = -M_{ij} - c^2\xi\,\langle x_i x_k\rangle\langle x_k x_j\rangle + \zeta_{ij}(t). \qquad (4.31)


Because the xi are Gaussian, we have

\langle x_i x_k\rangle\langle x_k x_j\rangle = \frac{1}{2}\left(\langle x_i x_k x_k x_j\rangle - \langle x_i x_j\rangle\langle x_k x_k\rangle\right), \qquad (4.32)

where as above the summation over the repeated index k is understood, so that

\langle x_i x_k\rangle\langle x_k x_j\rangle = \frac{1}{2}\left\langle x_i x_j \left(\sum_k x_k^2 - \sum_k \langle x_k^2\rangle\right)\right\rangle. \qquad (4.33)

We now imagine that the dynamics of M is sufficiently slow that we can replace

averages by instantaneous values, and let the dynamics of M do the averaging for us.

In this approximation we have

\tau_M \dot{M}_{ij} = -M_{ij} - \frac{1}{2}c^2\xi\; x_i x_j\left(\sum_k x_k^2 - \theta\right) + \zeta_{ij}(t), \qquad (4.34)

where the threshold \theta = \sum_k \langle x_k^2\rangle.
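A minimal sketch of these coupled dynamics (ours, with illustrative parameter choices; not the code used for Fig 4.5) integrates the fast activity of Eq (4.1) together with the slow synaptic dynamics of Eq (4.34):

import numpy as np

def run_plasticity(N=32, c=1.0, xi=2**-5, tau_M=1000.0, theta=None,
                   T_total=1e4, dt=0.05, seed=None):
    rng = np.random.default_rng(seed)
    M = np.zeros((N, N))
    x = np.zeros(N)
    if theta is None:
        theta = float(N)       # fixed threshold; ideally theta* = N mu(c, xi), Eq (4.35)
    T_eff = c**2 / N           # effective temperature of the M dynamics, Eq (4.26)
    for _ in range(int(T_total / dt)):
        # fast neural dynamics, Eq (4.1)
        x += (-x + M @ x) * dt + np.sqrt(2 * dt) * rng.standard_normal(N)
        # slow synaptic dynamics, Eq (4.34): Hebbian term gated by total activity
        hebbian = -0.5 * c**2 * xi * np.outer(x, x) * (np.sum(x**2) - theta)
        zeta = np.sqrt(2 * T_eff * dt / tau_M) * rng.standard_normal((N, N))
        M += (dt / tau_M) * (-M + hebbian) + zeta
        M = (M + M.T) / 2.0    # keep M symmetric (a careful treatment would
                               # symmetrize the noise consistently)
    return M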

The terms in this Langevin dynamics have a natural biological interpretation.

First, the connection strength decays with an overall time constant τM . Second, the

synaptic connection Mij is driven by the correlation between pre– and post–synaptic

activity, ∼ xixj, as in Hebbian learning [Hebb, 1949, Magee and Grienberger, 2020].

In more detail, we see that the response to correlations is modulated depending

on whether the global neural activity is greater or less than a threshold value; if the

network is highly active, then the connection between neurons with correlated activity

will decrease, i.e. the dynamics are anti-Hebbian, while if the overall network is

quiet the dynamics are Hebbian.

We still have the problem of setting the threshold θ. Ideally, for the dynamics to

generate samples out of the correct P (M), we need

\theta = \theta^* \equiv \langle x^\intercal x\rangle_{\rm s.s.} = N\mu(c, \xi), \qquad (4.35)


Figure 4.5: A neural network can tune its connection matrix to the ensemble with slow modes using simple Langevin dynamics. Two candidates for the dynamics include one with a fixed threshold on the averaged neural activity, and one with a sliding threshold. (a) Average λ_max. With a fixed threshold (left), the system is stable only if the updating time scale τ_M is long enough and the fixed threshold for the norm activity, θ, is small enough, thus requiring fine tuning. With a sliding threshold (right), the system is stable for a large range of τ_θ. (b) The spectral distribution for connection matrices drawn from the dynamics with fixed threshold approaches the static distribution as the time constant τ_M increases. (c) If the connection matrix updates too fast, i.e. τ_M is too small, the system exhibits quasi-periodic oscillations and does not reach a steady state distribution. In contrast, long τ_M leads to the adiabatic approximation for the steady state distribution. (d) The expected eigenvalues for the dynamics with fixed threshold, and for the ones with sliding threshold, with N = 32, c = 1, ξ = 2^{-5} and τ_M = 1000. (e) Example traces of the eigenvalues vs. time.


where as above µ is the mean activity whose value is enforced by the Lagrange mul-

tiplier ξ. This means that θ needs to be tuned in relation to ξ; it is challenging to have a mechanism that does this directly, and such direct tuning just pushes the fine tuning problem back one step. Importantly, if θ = θ* then the steady state spectral density of the

connection matrix approaches the desired equilibrium distribution as the update time

constant increases (Fig 4.5b), but if θ deviates from θ∗ then the steady distribution

does not have slow modes. As shown in Fig 4.5de, if the threshold is too small, then

the entire spectrum is shifted away from the stability threshold, and the system no

longer exhibits long time scales; if the threshold is too large, then the largest eigen-

value oscillates around the stability threshold λ = 1, and a typical connection matrix

drawn from the steady state distribution is unstable.

But we can once again relieve the fine tuning problem by promoting θ to a dy-

namical variable,

\tau_\theta \dot{\theta} = \sum_k x_k^2 - \theta, \qquad (4.36)

which we can think of as a sliding threshold in the spirit of models for metaplastic-

ity [Bienenstock et al., 1982]. We pay the price of introducing yet another new time

scale, τθ, but in Figs 4.5de we see that this can vary over at least three orders of

magnitude without significantly changing the spectral density of the eigenvalues.
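In a simulation such as the sketch above, the sliding threshold replaces the fixed θ by one additional Euler step per iteration; schematically:

import numpy as np

def sliding_threshold_update(theta, x, dt, tau_theta):
    # One Euler step of Eq (4.36): the threshold tracks recent total activity.
    return theta + (dt / tau_theta) * (np.sum(x**2) - theta)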

To see whether the sliding threshold really works, we can compare results where

θ is fixed to those where it changes dynamically; we follow the mean value of λmax

as an indicator of system performance. We choose parameters in the supercritical

phase, specifically c = 1 and ξ = 2^{-5}, and study a system with N = 32. Figure 4.5a shows that with a fixed threshold, even in the adiabatic limit where τ_M ≫ 1, there is only a measure-zero range for the fixed threshold θ such that λ_max is very close to, but smaller than, 1. In contrast, for the dynamics with a sliding threshold, at τ_M ≫ 1

there is a large range of values for the time constant τθ such that the system hovers

just below instability, generating a long time scale.


The Langevin dynamics of our system is similar to the BCM theory of meta-

plasticity in neural networks, in that both models involve a combination of Hebbian

learning and a threshold on the neural activity [Bienenstock et al., 1982], but there

are two key differences. First, the BCM theory imposes a threshold on the activity of

locally connected neurons, while here the threshold is on the overall neural activity.

Second, the BCM dynamics is Hebbian when the post– and pre–synaptic activities

are larger than the threshold, and anti–Hebbian otherwise, which is the opposite of

the dynamics for our system. It is interesting that in some other models for home-

ostasis, plasticity requires the activity detection mechanism to be fast (τ_θ/τ_M ≪ 1)

for the system to be stable [Zenke et al., 2013, Zenke et al., 2017], which we do not

observe for our system.

4.5 Discussion

Living systems can generate behaviors on time scales that are much longer than the

typical time scales of their component parts, and in some cases correlations in these

behaviors decay as an approximate power–law, suggesting a continuous spectrum of

slow modes. In order to understand how surprised we should be by these behaviors,

it is essential to ask whether there exist biologically plausible dynamical systems that

can generate these long time scales “easily.” Typically, to achieve long time scales

a high dimensional dynamical system requires some degree of fine tuning: from the

most to the least stringent, examples include setting individual elements of the con-

nection matrix, choosing a particular network architecture, or imposing some global

constraints which allow ensembles of systems with long time scales.

In this note, we were able to construct a mechanism for living systems to reach

long time scales with the least stringent fine-tuning condition: when the interaction

strength of the connection matrix is large enough, imposing a global constraint on the


stability of the system leads to divergently many slow modes. To impose a biologically

plausible mechanism for living systems, we constrain the averaged norm activity as a

proxy for global stability; in this case, the time scales for the slow modes are set by

the allowed dynamic range of the system. Further, we showed that these ensembles

can be achieved by updating the connection matrix M with a sliding threshold for

the norm activity, a mechanism that resembles metaplasticity in neural networks.

Importantly, the slow modes achieved through constraining norm activity typically

lead to exponentially decaying correlations; only when the interaction strength of

the matrix is at a critical value do we find power-law decays. This suggests that a

continuous range of slow modes is more difficult to achieve than a single long time

scale. A natural follow-up question is whether there exist mechanisms which can

tune the system to criticality in a self-organized way, for example by coupling the

interaction strength to the averaged norm activity.

Both for simplicity and to understand the most basic picture, we have been fo-

cusing on linear networks with symmetric connections. Realistically, many biological

networks are asymmetric, which gives rise to more complex and perhaps even chaotic

dynamics [Sompolinsky et al., 1988, Vreeswijk and Sompolinsky, 1996]. In the asym-

metric case, the eigenvalue spectrum for the Gaussian ensemble is well known

(uniform distribution inside a unit circle) [Ginibre, 1965, Forrester and Nagao, 2007],

but a similar global constraint on the norm activity leads to a dependence on

the overlap among the left and right eigenvectors [Chalker and Mehlig, 1998,

Mehlig and Chalker, 2000]. In particular, the matrix distribution can no longer be

separated into the product of eigenvalues and eigenvectors, and it is difficult to solve

for the spectral distribution analytically. Two new features emerge in the asymmetric

case. First, the time scales given by real eigenvalues vs. complex eigenvalues may

be different, leading to more (or less) dominant oscillatory slow modes in large

systems [Akemann and Kanzieper, 2007]. Second, asymmetric connection matrices


can lead to complicated transient activity when the system is perturbed, with the

time scales mostly dominated by the eigenvector overlaps, and can be very different

from the time scales given by the eigenvalues [Grela, 2017]. In the limit of strong

asymmetry, where the network is organized into a feed-forward line, information can be

held for a time that is extensive in system size; see examples in Refs. [Goldman, 2009]

and [Ganguli et al., 2008]. It will be interesting to check whether systems can store

information in these transients without fine-tuning the structure of the network.

The system we study can be extended to consider more specific ensembles for par-

ticular biological systems. For example, real neural networks have inhibitory and exci-

tatory neurons, so that elements belonging to the same column need to share the same

sign, and the resulting spectral distribution has been shown to differ from the unit

circle [Rajan and Abbott, 2006]; more generally, recent work explores how structured

connectivity can lead to new but still universal dynamics [Tarnowski et al., 2020]. An-

other limitation of our work is that it only considers linear dynamics, or only dynami-

cal systems where all fixed points are equally likely. In contrast, some non-linear dy-

namics such as the Lotka-Volterra model in ecology [Biroli et al., 2018], and the gat-

ing neural network in machine learning [Can et al., 2020, Krishnamurthy et al., 2020]

have been shown to drive systems to non-generic, marginally stable fixed points,

around which there exists an extensive number of slow directions for the dynamics.

In summary, we believe the issue of whether a continuum of slow modes can arise

generically in neural networks remains open, but we hope that our study of very

simple models has helped to clarify this question.


Chapter 5

Conclusion and Outlook

Living systems often are found to perform incredible tasks, such as collective motion

and foraging on the organism level, and cognitive activity in our brains. Naively,

if one were to take apart a living system and then randomly couple the individual components back together without any design principles, one would not expect the resulting system to be as intelligent, or even animate, as before. Thus, one may regard living systems as a surprising result of emergent phenomena. On the other

hand, even in the inanimate world, the rise of macroscopic features of materials –

such as the spontaneous magnetization in ferromagnetic materials – from microscopic

interaction is also surprising, although these can be understood now using statistical

physics. When we compare the animate and inanimate systems, shall we be more

surprised by one compared to the other? More precisely, can we understand the

collective behavior of living systems as an extension of inanimate physical systems, or

is there something unique about biology, such that “more is different” takes its literal

meaning? These questions need to be addressed by first examining the collective

behavior of living systems in the framework of statistical physics.

This dissertation was born thanks to the continuous effort of both experimentalists

and theorists in developing new technology to collect and understand the emergence


of collective behavior in living systems. Although it mainly focuses on systems of in-

terconnected neurons, the approaches developed by this dissertation can be extended

to other living systems at different scales, such as in groups of interacting animals.

The first part of the dissertation extended a statistical physics framework for constructing probability models for the activity of large groups of neurons. In particular,

we examine the collective behavior in the neural network of the nematode C. elegans

in Chapter 3. Through analyzing data from real neurons, it extended the maximum

entropy model which matches lower orders of statistics of neural activity, a method

that has been successful in spiking networks, to describe this very different neural

network with very small numbers of neurons and graded electrical activities, and

successfully identified features of collective behavior, such as the fact that the inferred model has parameters near a critical surface. Importantly, this research also leads to testable hypotheses about how the network would react to local perturbations. Currently, there

are plans to test this hypothesis, which will further facilitate our understanding of

how well statistical physics models can be used to describe this neural network, and

how out-of-equilibrium the real system is when being perturbed.

A natural question resulting from this work in C. elegans is how finely the system needs to be tuned to appear at criticality. For example, for a worm to develop from

an embryo with a functional brain, does C. elegans need to encode all information

about the strength of neuronal interaction in its genome, or is a stochastic method for

development enough? Currently, several research groups are developing techniques to

identify neurons across individual worms, which will create an exciting opportunity

to test this idea by comparing the inferred statistical models across animals, and to

construct an ensemble for the neural networks of the worm. In the case that the

precise interaction strengths were shown to be essential, a more urgent question would then need to be asked about how a brain with such fine-tuned parameters is developed


through evolution, and how we can be inspired to design artificial neural networks

with optimized performance.

The question of fine-tuning was then further addressed in Chapter 4, where we

focused on the emergent long time scales in large dynamical systems. In biological

systems, often we observe long time scales in the behavior; here, we probe the question

of how surprised we should be by examining the conditions for a dynamical system

with random interaction to exhibit long time scales. In particular, we found that it is

possible for a neural network to use a self-adaptive mechanism to achieve a single long time scale, but to have a continuous spectrum of long time scales requires criticality

and is harder to achieve. This offers another way of thinking about what we see in

biological systems and how surprised we should be.

These questions relating to the temporal evolution of biological networks and

their interaction with the environment are not limited to systems of interconnected

neurons. More generally, for example, these questions can be studied in collective

animal behaviors, where the “group” is already constantly changing – an example is

that a flock of birds can spontaneously be broken into two and recombine – and the

system can be systematically perturbed by researchers. Some interesting questions

include when an individual is introduced to an already formed group, how does it

adapt, and how does the rest of the network respond? If the hypothesis that biological

systems would like to be poised near criticality is correct, how do individuals tune their interactions when either the system or the environment is changing, so as to maintain

criticality? The hope is that by studying the dynamics of collective behaviors across

different systems, we may gain insights into the evolution of biological networks in general, and some clues about how to design self-assembling networks.


Appendix A

Appendices for Chapter 3

A.1 Perturbation methods for overfitting analysis

To test if our maximum entropy model overfits, we partition the samples into a set of

training data and a set of test data. The difference of the per-neuron log-likelihood for

the training data and the test data is used as a metric of whether the model overfits:

if the two values for the log-likelihood are equal within error bars, then the model

generalizes well to the test data and does not overfit. Here, we outline a perturbation

analysis which uses the number of independent samples and the number of parameters

of the model to estimate the expectation value of this log-likelihood difference.

Consider a Boltzmann distribution parameterized by g = {g_1, g_2, . . . , g_m} acting on observables φ_1, φ_2, . . . , φ_m. The probability for the N spins taking the value σ = {σ_1, σ_2, . . . , σ_N} is

P(\sigma|g) = \frac{1}{Z(g)}\exp\left(-\sum_{i=1}^{m} g_i\,\phi_i(\sigma)\right), \qquad (A.1)


where Z is the partition function. Then, the log-likelihood of a set of data with T

samples under the Boltzmann distribution parameterized by g is

L(\sigma^1, \sigma^2, \ldots, \sigma^T|g) = \frac{1}{T}\sum_{t=1}^{T}\log P(\sigma^t|g) = -\log Z(g) - \sum_{i=1}^{m} g_i\left(\frac{1}{T}\sum_{t=1}^{T}\phi_i^t\right) \qquad (A.2)

Now, let us assume that a set of true underlying parameters, g∗, exists for the

system we study, which leads to true expectation values f^*_i = f_i(g^*). However, we are only given a finite number of observations, σ^1, σ^2, . . . , σ^T, from which we con-

struct a maximum entropy model, i.e. infer the parameters g by maximizing the

likelihood of the data. Our hope is that the difference between the true parameters

and the inferred parameters is small, in which case we can approximate the inferred

parameters using a linear approximation

g_i = g^*_i + \delta g_i, \qquad (A.3)

where

\delta g_i \approx \sum_j \frac{\partial g_i}{\partial f_j}\,\delta f_j = -\sum_j (\chi^{-1})_{ij}\,\delta f_j. \qquad (A.4)

Here, χ^{-1} is the inverse of the susceptibility matrix, χ_ij = −∂f_i/∂g_j = ⟨φ_iφ_j⟩ − ⟨φ_i⟩⟨φ_j⟩;

and δfj is the difference between empirical mean and the true mean of φj,

\delta f_j = \frac{1}{T}\sum_{t=1}^{T}\phi_j(\sigma^t) - f^*_j. \qquad (A.5)

For convenience, we will use the short-hand notation φ_i(σ^t) = φ_i^t to indicate the value of

the observable φi at time t.

Let the number of samples in the training data be T1, and the number of samples

in the test data be T2. For simplicity, assume that all samples are independent. We

maximize the entropy of the model on only the training data to obtain parameters


g, and we would like to know how well our model generalizes to the test data. Thus,

we quantify the degree of overfitting by the difference of likelihood of the training

data and the test data:

L_{\rm test} - L_{\rm train} = \left[-\log Z(g) - \sum_{i=1}^{m} g_i\left(\frac{1}{T_2}\sum_{t'=1}^{T_2}\phi_i^{t'}\right)\right] - \left[-\log Z(g) - \sum_{i=1}^{m} g_i\left(\frac{1}{T_1}\sum_{t=1}^{T_1}\phi_i^{t}\right)\right]
= \sum_{i=1}^{m}\left(g^*_i - \sum_j (\chi^{-1})_{ij}\left(\frac{1}{T_1}\sum_{t=1}^{T_1}\phi_j^t - f^*_j\right)\right)\left(\frac{1}{T_1}\sum_{t=1}^{T_1}\phi_i^t - \frac{1}{T_2}\sum_{t'=1}^{T_2}\phi_i^{t'}\right). \qquad (A.6)

For simplicity of notation, let us write

\alpha^{(1)}_i = \frac{1}{T_1}\sum_{t=1}^{T_1}\phi_i^t - f^*_i, \qquad \alpha^{(2)}_i = \frac{1}{T_2}\sum_{t=1}^{T_2}\phi_i^t - f^*_i. \qquad (A.7)

By the Central Limit Theorem, α^(1)_i and α^(2)_i are Gaussian variables. Terms that appear in the likelihood difference [Eq. (A.6)] have expectation values

\langle\alpha^{(1)}_i\rangle = 0, \qquad \langle\alpha^{(1)}_i\alpha^{(1)}_j\rangle = \frac{1}{T_1}\,\chi_{ij}. \qquad (A.8)

In addition, because we assume that the training data and the test data are indepen-

dent, the cross-covariance between the training data and the test data is

\langle\alpha^{(1)}_i\alpha^{(2)}_j\rangle = 0. \qquad (A.9)


Combining all the above expressions, we obtain the expectation value of the like-

lihood difference [Eq. (A.6)],

\langle L_{\rm test} - L_{\rm train}\rangle = \left\langle\sum_{i=1}^{m}\left(g^*_i - \sum_j (\chi^{-1})_{ij}\alpha^{(1)}_j\right)\left(\alpha^{(1)}_i - \alpha^{(2)}_i\right)\right\rangle = -\sum_{i=1}^{m}\sum_{j=1}^{m} (\chi^{-1})_{ij}\langle\alpha^{(1)}_i\alpha^{(1)}_j\rangle
= -\frac{1}{T_1}\sum_{i=1}^{m}\sum_{j=1}^{m} (\chi^{-1})_{ij}\,\chi_{ij} = -\frac{m}{T_1}. \qquad (A.10)

Note that the difference of likelihood is only related to the number of parameters in

our model and the number of independent samples in the training data.
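This result is easy to check in a toy exponential-family model where the maximum likelihood fit is available in closed form; the following sketch (ours, with arbitrary illustrative numbers) uses independent binary spins, for which m = N:

import numpy as np

def log_lik(data, p):
    # mean per-sample log-likelihood of binary data under an independent-spin model
    return np.mean(np.sum(np.log(np.where(data, p, 1 - p)), axis=1))

rng = np.random.default_rng(0)
N, T1, T2, reps = 10, 200, 200, 2000    # m = N parameters, one field per spin
p_true = rng.uniform(0.2, 0.8, N)
diffs = []
for _ in range(reps):
    train = rng.random((T1, N)) < p_true
    test = rng.random((T2, N)) < p_true
    p_hat = np.clip(train.mean(axis=0), 1e-3, 1 - 1e-3)   # ML fit = empirical mean
    diffs.append(log_lik(test, p_hat) - log_lik(train, p_hat))
print(np.mean(diffs), -N / T1)   # both should be close to -m/T1 = -0.05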

Similarly, using Wick's theorem for multivariate Gaussian variables and the chain rule for partial derivatives, we can evaluate the second moment of the likelihood difference,

\langle (L_{\rm test} - L_{\rm train})^2 \rangle = \sum_{i,k} g^*_i g^*_k \chi_{ik} \left( \frac{1}{T_1} + \frac{1}{T_2} \right) + \frac{1}{T_1^2}(m^2 + 2m) + \frac{m}{T_1 T_2}.    (A.11)

In order to test whether perturbation theory can be applied to the maximum entropy model learned from the real data, we estimate the number of independent samples as N_{ind. sample} ∼ T/τ, where T is the length of the experiment and τ is the correlation time. The correlation time is extracted as the decay time of the overlap function, defined as

q(\Delta t) = \left\langle \frac{1}{N} \sum_{i=1}^{N} \delta_{\sigma_i(t), \sigma_i(t+\Delta t)} \right\rangle_t.    (A.12)


In our experiments, the correlation time is τ ≈ 4–6 s. For a typical recording of 8 minutes, the number of independent samples is therefore between 80 and 120.

In Figure 3.5, we compute the perturbative prediction using the number of non-zero parameters after training and the number of independent samples estimated from the data. The prediction falls within the error bars of the data, which suggests that the inferred couplings are within the perturbative regime of the true underlying couplings. Note that the plotted difference is computed for the per-neuron log-likelihood, l_test − l_train = (L_test − L_train)/N.

A.2 Maximum entropy model with the pairwise correlation tensor constraint

To fully describe the pairwise correlations between neurons with p = 3 states, the equal-state pairwise correlation c_ij = 〈δ_{σ_i σ_j}〉 is not enough; rather, we should constrain the full pairwise correlation tensor, defined as

c^{rs}_{ij} \equiv \langle \delta_{\sigma_i r} \delta_{\sigma_j s} \rangle.    (A.13)

Here, we constrain the pairwise correlation tensor c^{rs}_ij together with the local magnetization m^r_i ≡ 〈δ_{σ_i r}〉. Notice that for each pair of neurons (i, j) the number of constraints is p² + 2p = 15, but these constraints are related through the normalization requirements ∑_r m^r_i = 1 and ∑_s c^{rs}_ij = m^r_i, which leave only 7 independent variables for each pair of neurons. Because of this dependence, choosing which variables to constrain is a problem of gauge fixing. Here, we choose the gauge in which we constrain the local magnetizations m^r_i for the states "rise" and "fall", and the pairwise correlations c^r_ij ≡ c^{rr}_ij; in this gauge the parameters can be compared meaningfully to those of the equal-state maximum entropy model above. The corresponding maximum entropy


model has the form

P(\sigma) \propto \exp\left( -\frac{1}{2} \sum_{i \neq j} \sum_{r=1}^{3} J^r_{ij} \delta_{\sigma_i r} \delta_{\sigma_j r} - \sum_i \sum_{r=1}^{2} h^r_i \delta_{\sigma_i r} \right).    (A.14)

Note that the equivalence between constraining the equal-state correlation for each state and constraining the full pairwise correlation tensor holds only for the case of p = 3. For p > 3 states, one needs to choose more constraints to fix the gauge, and it is not obvious which variables to fix.
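For concreteness, the energy function of the gauge-fixed model in Eq. (A.14) can be written in a few lines of Python. This is only a sketch, not our inference code; the mapping of the states rise, fall, flat to the labels 0, 1, 2 and the parameter arrays J and h are hypothetical.

import numpy as np

def energy(sigma, J, h):
    """sigma: length-N array of states in {0, 1, 2} ('rise', 'fall', 'flat');
    J: (N, N, 3) equal-state couplings; h: (N, 2) fields on states 0 and 1.
    Returns E such that P(sigma) is proportional to exp(-E), as in Eq. (A.14)."""
    same = sigma[:, None] == sigma[None, :]      # delta_{sigma_i, sigma_j}
    E = 0.0
    for r in range(3):
        mask = same & (sigma[:, None] == r)      # both neurons in state r
        np.fill_diagonal(mask, False)            # keep only i != j terms
        E += 0.5 * J[:, :, r][mask].sum()
    for r in range(2):                           # fields only on 'rise', 'fall'
        E += h[sigma == r, r].sum()
    return E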

We train the maximum entropy model with the tensor constraint [Eq. (A.14)] by the same procedure as for the model with the equal-state correlation constraint, described in the main text. The model is able to reproduce the constraints with a sparse interaction tensor J. However, as shown in the bottom panel of Fig. 3.5, the difference between l_train, the per-neuron log-likelihood of the training data (a randomly chosen 5/6 of all data), and l_test, the per-neuron log-likelihood of the test data, is greater than zero beyond the error bars. This indicates that the maximum entropy model with the tensor constraint overfits for all N = 10, 20, . . . , 50.

A.3 Maximum entropy model fails to predict the dynamics of the neural network, as expected

By construction, the maximum entropy model is a static probability model of the observed neuronal activities. No constraint on the dynamics was imposed in building the model, and infinitely many dynamical models can generate the observed static distribution. The simplest possibility is dynamics like that of the Monte Carlo sampling itself, which is essentially Brownian motion on the energy landscape. To test whether this equilibrium dynamics can capture the real neural dynamics of C. elegans, we compare the mean occupancy time of each basin, 〈τ_α〉,


calculated from the experimental data and from MCMC. The mean occupancy time is defined as the average time a trajectory spends in a basin before escaping to another basin. For equilibrium dynamics, the mean occupancy time is determined by the height of the energy barriers according to transition state theory, or by considering random walks on the energy landscape, which gives the relation τ_α ∼ −p²/(2e ln P_α), where p = 3 is the number of Potts states and P_α is the fraction of time the system visits basin α. As shown in Figure A.1, the mean occupancy time 〈τ^MC_α〉 found in the Monte Carlo simulation is well predicted by this simple approximation. In contrast, the empirical neural dynamics deviates from the equilibrium dynamics, as we might have expected. The dependence between 〈τ^data_α〉 and P^data_α is weak; a linear fit on the log-log plot gives 〈τ^data_α〉 ∝ (P^data_α)^{0.5±0.027}.
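The occupancy statistics themselves are straightforward to extract from a sequence of basin labels. In the sketch below the labels array (one basin label per time point) is a hypothetical input, and the occupancy times come out in units of time steps.

import numpy as np

def occupancy_stats(labels):
    """Return, per basin: the fraction of time P_alpha and the mean
    occupancy time <tau_alpha> (in time steps) of each visit."""
    labels = np.asarray(labels)
    starts = np.r_[0, np.flatnonzero(labels[1:] != labels[:-1]) + 1]
    lengths = np.diff(np.r_[starts, len(labels)])   # duration of each visit
    visit_label = labels[starts]                    # basin of each visit
    basins = np.unique(labels)
    P = np.array([(labels == b).mean() for b in basins])
    tau = np.array([lengths[visit_label == b].mean() for b in basins])
    return basins, P, tau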


Figure A.1: Equilibrium dynamics of the inferred pairwise maximum entropy model fails to capture the neural dynamics of C. elegans. The mean occupancy time of each basin of the energy landscape, 〈τ_α〉, is plotted against the fraction of time the system visits the basin, P_α. For 10 subgroups of N = 10 (dots) and N = 20 (asterisks) neurons, the empirical dynamics exhibits a weak power-law relation, 〈τ^data_α〉 ∝ (P^data_α)^{0.5±0.027}. The striped patterns are artifacts of the finite sample size. In contrast, equilibrium dynamics extracted from a Monte Carlo simulation obeying detailed balance shows an inverse logarithmic relation, 〈τ^RW_α〉 = −(9/2e)/ln P_α, which can be explained by random walks on the energy landscape. Error bars of the data are extracted from random halves of the data. Error bars of the Monte Carlo simulation are calculated using the correlation time and standard deviation of the observables.


Appendix B

Dynamical inference for C. elegans neural activity

In Chapter 3 (also see [Chen et al., 2019]), we successfully constructed a (joint) probability model for the neural activity of 50+ neurons in the brain of the nematode Caenorhabditis elegans, when the worms are immobilized. By matching the mean and the pairwise correlations of the discretized neural activity while maximizing the entropy, we constructed a statistical physics model that captures the collective behavior of these neurons: the state of individual neurons can be predicted using network information, and the number of local maxima in the probability landscape grows extensively with the number of neurons in the model. Nonetheless, we also showed that the equilibrium dynamics naturally associated with the static model, which was learned from neural activity patterns with the time axis disregarded, cannot predict the out-of-equilibrium dynamics of the observed neural activity. In general, many dynamical processes can share the same steady-state distribution. In order to infer the mechanism of the interactions, we need to add the time axis back into the data and perform dynamical inference on the neural activity.


Learning dynamical interaction rules requires more parameters in the model, and hence more independent time points in the data, than learning the static model. The current state of the data is such that 100+ neurons can be measured simultaneously, but not for durations many times longer than the correlation time of the data, so learning a meaningful dynamical model is difficult. In this appendix, we first estimate the correlation time of neural activity in immobilized C. elegans. Then, we describe a possible scheme for learning a dynamical model, which will be able to provide more biological insight once experiments can collect much longer datasets.

B.1 Estimating the correlation time from the data

To understand the dynamical properties of the data, we estimate the correlation time from the activity of both individual neurons and the entire network. Two methods are discussed here: one fits exponential functions to the decay of the correlation function; the other uses the convergence of the eigenvalues of the cross-correlation matrix measured at different time differences. In both methods, we use the discrete representation of neural activity, σ_i(t), which assigns the activity of neuron i, based on its time derivative, to one of three states: rise, fall, and flat.

The first method, which estimates the correlation time of an individual neuron, requires measuring the overlap of the neuronal state, which we define as

q_i(\Delta t) = \langle \delta_{\sigma_i(t), \sigma_i(t+\Delta t)} \rangle_t.    (B.1)

The overlap takes the value 1 at Δt = 0, and approaches a baseline value q_0 at large Δt. We extract q_0 by measuring the overlap after randomly permuting the time points, which destroys the time correlations in the data. The overlap function decays exponentially; thus we fit the exponential function q_i(Δt) = (1 − q_0) exp(−Δt/τ_i) + q_0 to the measured


overlap to extract the correlation time τ_i of each individual neuron (see Fig. B.1). Note that this correlation time is an effective time constant driven by the entire network: even though each neuron has a unique time constant computed this way, it is likely to differ from the time constant of the same neuron when it is isolated from the rest of the network.

We can also extract a correlation time for the entire network by measuring the average overlap of the neuron states across the whole network. The average overlap is defined as

q(\Delta t) = \left\langle \frac{1}{N} \sum_{i=1}^{N} \delta_{\sigma_i(t), \sigma_i(t+\Delta t)} \right\rangle_t,

where N is the total number of neurons in our system. For the 84 neurons recorded in the dataset used in [Chen et al., 2019], the correlation time of the network is τ = 4.1 s.
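A minimal version of this procedure is sketched below. The (T, N) state array sigma, the maximum delay, and the frame duration are placeholders, and a single shuffled-data overlap stands in for our permutation estimate of q_0.

import numpy as np
from scipy.optimize import curve_fit

def overlap(sigma, dt):
    """q(dt) for a (T, N) array of discrete states."""
    if dt == 0:
        return 1.0
    return (sigma[:-dt] == sigma[dt:]).mean()

def correlation_time(sigma, max_dt=50, frame=1.0, seed=0):
    dts = np.arange(max_dt)
    q = np.array([overlap(sigma, dt) for dt in dts])
    rng = np.random.default_rng(seed)
    shuffled = sigma[rng.permutation(len(sigma))]
    q0 = overlap(shuffled, 1)           # baseline from permuted time points
    decay = lambda t, tau: (1 - q0) * np.exp(-t / tau) + q0
    (tau,), _ = curve_fit(decay, dts * frame, q, p0=[5.0])
    return tau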

Figure B.1: Overlap of the discretized neuron signal, q(Δt) = 〈(1/N) ∑_{i=1}^N δ_{σ_i(t), σ_i(t+Δt)}〉_t, versus delay time Δt, showing the global correlation time (blue) and two example neurons (red, orange). Black lines show exponential fits.

An alternative method to extract a global correlation time is through the spectra of the cross-correlation matrix at different time differences. As shown in Fig. B.2, the rank-ordered plot of the eigenvalues of the cross-correlation matrix,

C^{\tau}_{ij} = \langle \delta_{\sigma_i(t), \sigma_j(t+\tau)} \rangle,    (B.2)


Figure B.2: Spectra of the (cross-)correlation matrix and its connected version for observed neuron groups with number of neurons N = 10, 30, at time differences τ = 0–14 s. We show only the real parts of the eigenvalues.

and its connected version,

\tilde{C}^{\tau}_{ij} = \langle \delta_{\sigma_i^t, \sigma_j^{t+\tau}} \rangle - \sum_{r=1}^{p} \langle \delta_{\sigma_i r} \rangle \langle \delta_{\sigma_j r} \rangle,    (B.3)

converges once the time difference reaches τ ≈ 8–12 s. This convergence implies that neuron–neuron interactions occur on a timescale of less than 8 seconds, which is consistent with the decay time we extracted from the overlaps.
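A sketch of how such lagged spectra can be computed from the discretized states is given below; the (T, N) array sigma is a hypothetical input, and we keep only the real parts of the eigenvalues, as in Fig. B.2.

import numpy as np

def lagged_spectrum(sigma, lag):
    """Rank-ordered eigenvalues of C^tau_ij = <delta_{sigma_i(t), sigma_j(t+lag)}>."""
    T = len(sigma)
    same = sigma[:T - lag, :, None] == sigma[lag:, None, :]   # (T-lag, N, N)
    C = same.mean(axis=0)
    ev = np.linalg.eigvals(C)            # C is not symmetric for lag > 0
    return np.sort(ev.real)[::-1]        # real parts, in decreasing order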


B.2 Coupling the neural activity and its time derivative

Our goal is to construct a dynamical model in which the network properties can be predicted using only local interactions among neurons. A successful dynamical model would, for example, predict the long network correlation time as an emergent property despite the short timescales of the neuron–neuron interactions. We continue to use the idea of statistical inference directly from data, constructing models that match the low-order observables but are otherwise as unstructured as possible.

A natural way to construct a dynamical model is to infer its action using both the magnitude of the neural activity (with the calcium concentration observed in each neuron's nucleus as a proxy), which we denote f, and its time derivative, ḟ. In addition, experimental evidence suggests that neurons in C. elegans interact with each other through such couplings: in multiple cases, the rate of voltage change in the post-synaptic neuron is found to be proportional to the voltage of the pre-synaptic neuron [Liu et al., 2009, Goodman et al., 2012]. The inferred dynamics is a probability model that assigns a probability to each possible trajectory. For simplicity, we construct dynamical models for discretized neural states, where the signal of neuron i, f_i, is discretized into q = 2 states using histogram equalization, i.e. such that the q states are equally likely to be visited by the neuron; we denote the discretized signal θ_i ∈ {0, 1}. The time derivative ḟ_i is discretized into p = 2 states, with σ_i = 1 if ḟ_i > 0 and σ_i = 0 otherwise. The resulting dynamics becomes a Markov process, defined by the joint distribution P(θ, σ).

The first question we ask is whether a model with only local fields and an interaction term between the signal θ and the derivative σ can reproduce the observed pairwise correlations within θ and within σ. To construct such a model, we constrain the


mean activities

m_i = \langle \sigma_i \rangle,    (B.4)

\mu_i = \langle \theta_i \rangle,    (B.5)

and the pairwise cross-correlation matrix

\Gamma_{ij} = \langle \theta_i \sigma_j \rangle.

The resulting maximum entropy distribution is

P(\theta, \sigma) = \frac{1}{Z} \exp\left( \sum_{i,j} A_{ij} \theta_i \sigma_j + \sum_i g_i \theta_i + \sum_i h_i \sigma_i \right).    (B.6)

This is very similar to a restricted Boltzmann machine, but it is important to keep in mind that there are no hidden variables in our model. The Lagrange multipliers A, g, h are learned by maximizing the probability of the data for both θ and σ. Nonetheless, as shown in Fig. B.3, the model fails to predict the correlations within θ and within σ.
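A minimal sketch of this learning step: for small N, one can enumerate all joint states exactly and ascend the likelihood by matching moments (Boltzmann learning). The function below is a toy version under that assumption (the exact enumeration limits it to roughly N ≲ 10), and the step size and iteration count are hypothetical.

import numpy as np
from itertools import product

def fit_bilinear(theta_data, sigma_data, n_steps=500, lr=0.5):
    """Maximize the likelihood of the model in Eq. (B.6) by moment matching."""
    T, N = theta_data.shape
    A, g, h = np.zeros((N, N)), np.zeros(N), np.zeros(N)
    S = np.array(list(product([0, 1], repeat=N)), dtype=float)  # all 2^N states
    G_emp = theta_data.T @ sigma_data / T          # empirical <theta_i sigma_j>
    mu_emp, m_emp = theta_data.mean(0), sigma_data.mean(0)
    for _ in range(n_steps):
        E = S @ A @ S.T + (S @ g)[:, None] + (S @ h)[None, :]
        P = np.exp(E - E.max()); P /= P.sum()      # joint P(theta, sigma)
        G_mod = S.T @ P @ S                        # model <theta_i sigma_j>
        mu_mod, m_mod = P.sum(1) @ S, P.sum(0) @ S
        A += lr * (G_emp - G_mod)                  # gradient ascent steps
        g += lr * (mu_emp - mu_mod)
        h += lr * (m_emp - m_mod)
    return A, g, h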

Alternatively, we construct the full pairwise model for (θ, σ), which additionally constrains the pairwise correlations Γ^θ_ij = 〈θ_iθ_j〉 and Γ^σ_ij = 〈σ_iσ_j〉. The resulting distribution is

P(\theta, \sigma) = \frac{1}{Z} \exp\left( \sum_i g_i \theta_i + \sum_i h_i \sigma_i + \sum_{ij} K_{ij} \theta_i \theta_j + \sum_{ij} J_{ij} \sigma_i \sigma_j + \sum_{ij} A_{ij} \theta_i \sigma_j \right).    (B.7)

As shown in Fig. B.4, the inferred interaction parameters concentrate mostly in the interactions among the neural activities, K_ij, and among the neural derivatives, J_ij. The cross-interaction matrix A_ij is sparse. In addition, if we set the cross-interactions to zero, the performance of the model in predicting the higher-order


Figure B.3: A maximum entropy model constructed by constraining only the mean activities and the pairwise correlation 〈θ_iσ_j〉 between the magnitude of neural activity θ and its time derivative σ fails to predict the correlations within each class of observables. The left panel shows the unconnected correlation matrices, Γ^f_ij = 〈θ_iθ_j〉 and Γ^{df/dt}_ij = 〈σ_iσ_j〉; the right panel shows the connected versions. Data plotted here are drawn from a subset of neurons of size N = 30.

correlations of the data is not much affected. In particular, the distribution of the joint activity P(θ) (Fig. B.4, bottom left panel) cannot be captured by the model with only pairwise interactions, while the distribution of the time derivatives P(σ) can be well described by a pairwise model (bottom right panel).

In summary, this maximum entropy model coupling the discretized neural activity to its discretized time derivative does not add much in constructing the joint distribution; the inferred interaction coefficients suggest that the interactions across f and ḟ are small, which is surprising given the graded nature of the C. elegans neuronal system. Another limitation of this approach is that, with discretized data, generating actual traces from the model would require additionally modeling some coherence time. These problems will be better addressed once we have data long enough to perform inference directly on the continuous neural activity.


Figure B.4: Learning the full pairwise maximum entropy model for N = 30. Top left: full connected correlation matrix for N = 30. Top right: the inferred interaction strengths of the full pairwise maximum entropy model. Bottom: the connected three-point correlations cannot be predicted by the model for the discretized magnitude of the signal θ (left), but can be predicted for the discretized time derivative σ (right). Adding the coupling between θ and σ does not help.


Appendix C

Appendices for Chapter 4

C.1 How to take averages for the time constants?

For finite-sized systems, we compute the observables by averaging over samples of the connection matrix M, drawn with Monte Carlo methods from the distribution P(M). Usually, we can just take the arithmetic mean:

[f]_{\rm av} \equiv \frac{1}{N_{\rm mc}} \sum_{\alpha_{\rm mc}} f\big( \{\lambda_i\}_{\alpha_{\rm mc}} \big).    (C.1)

However, in the case of the correlation time τ_corr, because it is a ratio of two sums over functions of the eigenvalues, we have to take the geometric mean when we perform the average:

\tau_{\rm corr}(N) \equiv \exp\left( [\ln \tau_{\rm corr}(\{\lambda_i\}_N)]_{\rm av} \right) = \left( \prod_{\alpha} \frac{\sum_i \tau^2_{\alpha,i}}{\sum_i \tau_{\alpha,i}} \right)^{1/N_{\rm mc}}.    (C.2)

Similarly, when we average over disorder for τ_max at finite system size, we also want to take the geometric mean, especially because the distribution of time scales has long tails, as shown by the fractional standard deviations of the time scales in Fig. C.1.
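The sketch below illustrates the geometric-mean average of Eq. (C.2) at a subcritical interaction strength, where drawing GOE matrices and rejecting the rare unstable draws is an adequate stand-in for the hard-wall ensemble (at supercritical c one must instead sample the conditioned ensemble by Monte Carlo); the sizes here are hypothetical.

import numpy as np

rng = np.random.default_rng(1)
N, n_mc, c = 64, 200, 0.5                  # subcritical: sqrt(2) c < 1
log_tau_corr = []
while len(log_tau_corr) < n_mc:
    A = rng.normal(0.0, c / np.sqrt(N), (N, N))
    M = (A + A.T) / np.sqrt(2)             # GOE-like, off-diagonal variance c^2/N
    lam = np.linalg.eigvalsh(M)
    if lam.max() >= 1:                     # hard wall: reject unstable draws
        continue
    tau = 1.0 / (1.0 - lam)                # time constants of the modes
    log_tau_corr.append(np.log((tau**2).sum() / tau.sum()))
print("geometric mean tau_corr:", np.exp(np.mean(log_tau_corr)))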


Figure C.1: The fractional standard deviation of the log time scales decreases with system size, while the fractional standard deviation of the time scales themselves is large and shows no decreasing trend, suggesting that the distribution of time scales has long tails. These results show that we should take the geometric mean (rather than the arithmetic mean) when averaging over disorder to compute time scales. Here, the matrices are drawn from the GOE ensemble with the hard wall. The interaction strength is greater than its critical value (i.e. in the supercritical phase), with c = 1.


Figure C.2: Finite-size scaling of τ_max and τ_corr for matrices drawn from the GOE with the hard stability constraint, with the interaction strength at its critical value, c_c = 1/√2 (left panels), and supercritical, c = 1 (right panels). For each system size N, we average the time scales over 1000 Monte Carlo realizations.

C.2 Finite size effect for Model 2

We compute the time scales τ_corr(N) and τ_max(N) by averaging over the disorder. As shown in Fig. C.2, in the critical regime, c = 1/√2, the longest time scale grows as τ_max ∼ N^{2/3} and the correlation time as τ_corr ∼ N^{2/5}; in the supercritical regime, the time constants grow as τ_max(N) ∼ τ_corr(N) ∼ N², as expected from the mean field result.

The decay of the autocorrelation coefficient R(t) is also phase-dependent. For systems at the critical interaction strength, the mean field prediction for the autocorrelation function decays as a power law, R(t) ∼ t^{−1/2}. However, for finite-sized systems the true power law cannot be achieved (see Fig. C.3). On the other hand, if the system is in the supercritical phase, i.e. c > 1/√2, then the autocorrelation coefficient function has a plateau of length ∼ N², and then decays. As shown in Fig. C.4, it is perhaps clearer to plot 1 − [R(t)]_av on log-log scales. We see that for 1 < t < N² there is an intermediate scaling regime with exponent 1/2. Notice that the mean field results we compare to must have a cutoff (otherwise we cannot compute C(t)). This intermediate scaling, 1 − R(t) ∼ t^{1/2}, is very interesting, and not something one expects from sums of exponential decays.
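For reference, the mode picture behind Eq. (C.2) fixes R(t) explicitly: assuming each eigenvalue λ_i contributes a mode with time constant τ_i = 1/(1 − λ_i) and stationary variance proportional to τ_i, we have R(t) = ∑_i τ_i e^{−t/τ_i} / ∑_i τ_i, whose integral over t is exactly τ_corr. A short sketch, with a hypothetical spectrum:

import numpy as np

def autocorr_coefficient(lam, ts):
    """R(t) as a variance-weighted sum of exponentials over the modes."""
    tau = 1.0 / (1.0 - np.asarray(lam))    # assumes all lambda_i < 1
    w = tau / tau.sum()                    # stationary variance weights
    return np.array([(w * np.exp(-t / tau)).sum() for t in ts])

lam = 1.0 - np.logspace(-4, 0, 200)        # eigenvalues piling up near the wall
R = autocorr_coefficient(lam, np.logspace(-2, 5, 50))

In this picture the anomalous 1 − R(t) ∼ t^{1/2} regime reflects the shape of the eigenvalue density near the wall, not any single exponential mode.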


Figure C.3: The autocorrelation coefficient R(t) decays with time t. As the system size increases, R(t) approaches the theoretical prediction of a power-law decay, ∼ t^{−1/2}. We need to pay attention to how we take the average. Here, the system is at the critical point of the GOE ensemble with the hard stability constraint.

Figure C.4: Comparison between the autocorrelation coefficient function at different system sizes and the mean field result. For the mean field result, we impose a cutoff on the eigenvalues at 1 − ε; for this plot, ε = 1/N² with N = 1024, chosen to match the largest system we study. The top left and top right panels show the same data points with different axis scales. The bottom panel shows the amount of autocorrelation decay, 1 − R(t), with its intermediate ∼ t^{1/2} scaling.

C.3 Derivation for the scaling of time constants in Model 3

For the ensemble of connection matrices with a maximum entropy constraint on the norm of the activity (Model 3), we are interested in how the time constants τ_max and τ_corr depend on the parameters of the system: the interaction strength c and the constrained norm activity µ; in particular, how the time constants scale with the norm activity when the system is set in different phases by the interaction strength c. Here, we analyze the spectral distribution ρ(λ), and investigate how the gap size g_0, the length of the support l, and the constrained norm activity µ scale with the Lagrange multiplier ξ that we use to constrain the norm. The results are summarized in Table C.1 and visualized in Fig. C.5.

We have shown that the spectral distribution is (see also Eq. 4.24)

\rho(\lambda) = \frac{1}{\pi} \sqrt{(\lambda - 1 + g_0 + l)(1 - g_0 - \lambda)} \, B(\lambda),

B(\lambda) = 1 + \frac{l^2}{8c^2} + \left( 1 - g_0 - \frac{l}{2} \right)\frac{\lambda}{c^2} - \frac{\lambda^2}{c^2} + \frac{\xi}{2} \, \frac{(2g_0 - 2g_0^2 + l - 2g_0 l) - (2g_0 + l)\lambda}{\sqrt{g_0(g_0+l)} \, (\lambda-1)^2}.    (C.3)

By setting the spectral density to zero at the edges of the support, we can solve for the scaling dependence of the gap size g_0 and the length of the support l on the Lagrange multiplier ξ. Mathematically, we require

B(1 - g_0 - l) = 0, \qquad B(1 - g_0) = 0.    (C.4)

After simple algebraic manipulation, this set of constraints becomes

(2g_0 + l)(8c^2 - 3l^2) + 4l^2 = 0,    (C.5)

8 - \frac{l^2}{c^2} - 2\xi l^2 (l + g_0)^{-3/2} g_0^{-3/2} = 0,    (C.6)

which sets the scaling relation between the longest time scale, τ_max = 1/g_0, and the Lagrange multiplier ξ.
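Equations (C.5) and (C.6) are also easy to solve numerically for g_0 and l at given (c, ξ); the sketch below uses a generic root finder, and the initial guess is a hypothetical choice that may need adjusting deep in the small-ξ regime.

import numpy as np
from scipy.optimize import fsolve

def edges(c, xi, guess=(0.1, 2.0)):
    """Solve Eqs. (C.5)-(C.6) for the gap g0 and the support length l."""
    def residuals(v):
        g0, l = np.abs(v)                  # crude way to keep both positive
        e1 = (2 * g0 + l) * (8 * c**2 - 3 * l**2) + 4 * l**2
        e2 = 8 - l**2 / c**2 - 2 * xi * l**2 * (l + g0)**-1.5 * g0**-1.5
        return [e1, e2]
    g0, l = np.abs(fsolve(residuals, guess))
    return g0, l

g0, l = edges(c=1.0, xi=1e-4)
print(g0, 1.0 / g0)                        # tau_max = 1/g0 grows as xi -> 0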


For the correlation time τ_corr, we can express it as the ratio

\tau_{\rm corr} = \frac{\nu}{\mu},    (C.7)

where the denominator is the expectation value of the average norm activity,

\mu = \langle x_i^2 \rangle = \int d\lambda \, \rho(\lambda) \, \frac{1}{1-\lambda} = \frac{1}{c^2} + \frac{1}{c^2 \sqrt{g_0(g_0+l)}} \left( c^2 - g_0 - \frac{l}{2} + \frac{l^2}{8} \right) - \frac{\xi}{8} \, \frac{l^2}{g_0^2 (g_0+l)^2},    (C.8)

and the numerator can be written as

\nu \equiv \int d\lambda \, \rho(\lambda) \, \frac{1}{(1-\lambda)^2} = -\frac{1}{c^2} + \frac{1}{c^2} \, g_0^{-3/2} (g_0+l)^{-3/2} \left( c^2 g_0 + g_0^3 + \frac{1}{2} c^2 l + \frac{3}{2} g_0^2 l - \frac{1}{4} l^2 + \frac{5}{8} g_0 l^2 + \frac{1}{16} l^3 \right) - \frac{\xi}{8} \, \frac{l^2 (2g_0 + l)}{g_0^3 (g_0+l)^3}.    (C.9)

  Regime                    g_0                       l                         〈x_i²〉
  ξ ≪ 1, 0 < c < 1/√2       1 − √2 c + A_−ξ           2√2 c − B_−ξ              1/c² − √(1 − 2c²)/c² − D_−ξ
  ξ ≪ 1, c = 1/√2           A_c ξ^{2/5}               2 − B_c ξ^{2/5}           2 − D_c ξ^{1/5}
  ξ ≪ 1, c > 1/√2           A_+ ξ^{2/3}               l_0 − B_+ ξ^{2/3}         D_+ ξ^{−1/3} + 1/c²

Table C.1: Scaling of the inverse slowest time scale (gap) g_0, the width of the support of the spectral density l, and the average norm per neuron 〈x_i²〉 versus the Lagrange multiplier ξ (to leading order), in the different regimes.

Because in the limit ξ ≫ 1 the gap g_0 is large and there are no long time scales, we focus our discussion on the cases where ξ ≪ 1.

Case 1: c < 1/√2. As ξ approaches 0, we recover the semicircle spectral density with the wall far away from the spectrum. This suggests we can write g_0 = 1 − √2 c + A_−ξ^γ and l = 2√2 c − B_−ξ^γ. The expected value of the norm is

\lim_{\xi \to 0} \langle x_i^2 \rangle \big|_{c < 1/\sqrt{2}} = \int_{-\sqrt{2}c}^{\sqrt{2}c} d\lambda \, \frac{\sqrt{2c^2 - \lambda^2}}{\pi c^2} \, \frac{1}{1 - \lambda} = \frac{1}{c^2} - \frac{\sqrt{1 - 2c^2}}{c^2}.    (C.10)


Figure C.5: Scaling of g_0, l, and 〈x_i²〉 as functions of the Lagrange multiplier ξ, for random matrices M_ij ∼ N(0, c²/N) with the maximum entropy constraint on the norm.

Taylor expansion leads to γ = 1. As ξ decreases, both g_0 and µ approach finite limiting values; there are no long time scales in this case.

Case 2: c = 1/√2. This is the critical case in the hard wall limit, where the gap g_0 is 0 and the spectral density remains a semicircle. To solve for the scaling behavior when ξ ≪ 1, we follow a procedure similar to that of the previous case and assume g_0 = A_c ξ^γ and l = l_0 + B_c ξ^γ; again, l_0 = 2√2 c = 2. Now the 0th-order terms of Eq. (C.6) contain nothing that scales with ξ, requiring us to go to higher orders to solve for γ, which gives

\gamma = \frac{2}{5}.    (C.11)


The norm is

\langle x_i^2 \rangle = 2 - D \xi^{1/5} + O(\xi^{3/5}).    (C.12)

This is interesting: as ξ decreases, the norm activity µ approaches the limit µ = 2, while the corresponding gap g_0 continues to decrease. The system can reach an infinitely long timescale with a bounded dynamic range for individual neurons.
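Reusing the edges() sketch above, this scaling can be checked numerically at the critical point: g_0/ξ^{2/5} should approach a constant (the prefactor A_c) as ξ → 0, within the tolerance of the root finder.

# assumes edges() from the sketch following Eq. (C.6)
for xi in [1e-4, 1e-6, 1e-8]:
    g0, _ = edges(c=2**-0.5, xi=xi)
    print(xi, g0 / xi**0.4)                # roughly constant if g0 ~ xi^(2/5)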

Case 3: c > 1/√2. In this regime, the spectral density at ξ = 0 is divergent near λ = 1. But for any ξ > 0 there is a finite gap between the wall and the right edge of the spectrum, so a limit of the spectral density as ξ → 0 is not well-defined. We can assume the gap takes the form g_0 = A_+ξ^γ; because l is of order 1, we can assume l = l_0 + B_+ξ^γ. Plugging these into Eq. (C.5), we can solve the 0th-order equation and get

l_0 = \frac{2}{3} \left( 1 + \sqrt{1 + 6c^2} \right).    (C.13)

This l_0 is equal to the length of the spectrum in the hard wall limit, i.e. when ξ = 0. Plugging g_0 and l into Eq. (C.6), the 0th-order solution gives

\gamma = \frac{2}{3},    (C.14)

and

A_+ = 2^{-2/3} \, l_0^{1/3} \left( 2 - \frac{l_0^2}{4c^2} \right)^{-2/3}.    (C.15)

We notice that the prefactor A_+ is not a monotonic function of c; rather, A_+ takes a minimum A_min = 1 at c = 2. This is interesting, as it tells us that we do not want the interaction strength to be too large.


The norm becomes

\mu = \int d\lambda \, \rho(\lambda) \, \frac{1}{1-\lambda} = D_+ \xi^{-1/3} + \frac{1}{c^2} + O(\xi^{1/3}),    (C.16)

where

D_+ = \frac{1}{c^2} A_+^{-1/2} l_0^{-1/2} \left( c^2 + \frac{l_0^2}{8} - \frac{l_0}{2} \right) - \frac{A_+^{-2}}{8}.    (C.17)

Similarly, the prefactor D_+ has a maximum D_max = 3/8 at c = 2.

For the correlation time, the denominator is the average norm, and the numerator is

\nu \equiv \int d\lambda \, \rho(\lambda) \, \frac{1}{(1-\lambda)^2} = E \xi^{-1} - \frac{1}{c^2} + O(\xi).    (C.18)

The prefactor E has a maximum E_max = 1/8 at c = 2. Then, the correlation time is

\tau^{\rm sw}_{\rm corr} = \frac{E}{D} \xi^{-2/3} + O(\xi).    (C.19)

Interestingly, when we compute the ratio of the two time scales τ_max and τ_corr, we find that, at leading order, the ratio is a constant:

\frac{\tau^{\rm sw}_{\rm corr}}{\tau^{\rm sw}_{\rm max}} = \frac{1}{3} - \frac{AE}{c^2 D^2} \xi^{1/3} + O(\xi).    (C.20)

In this case, as ξ approaches 0, the gap g_0 decreases, but the norm also grows to infinity. If we look only at the relation between the parametrized variables g_0 and µ, we find that g_0 ∼ µ^{−2}, which matches what we obtain from numerically solving the exact Eqs. (C.5) and (C.6). To reach an infinitely long time scale, an unlimited growth of the dynamic range of individual neurons is required, which is impossible to realize in biological systems.


Figure C.6: The autocorrelation coefficient R(t) decays with time for different parameter sets (interaction strength c and Lagrange multiplier ξ). The bottom row plots 1 − R(t) against time on log-log scales to show the ∼ t^{1/2} scaling at small ξ for interaction strengths in the supercritical phase.

C.4 Decay of auto-correlation coefficient

How does the autocorrelation function decay? If a system has a long time scale, we expect to see a power-law decay of the autocorrelation function over some time period 0 < t < τ_corr, or even a plateau holding at R(t) ≈ 1 during this initial period. This is confirmed by Fig. C.6: if the Lagrange multiplier ξ is small enough, we see the 1/2 scaling when we plot 1 − R(t) versus t. For finite systems, the autocorrelation coefficient function R(t) decays in a way consistent with the mean field results (see Fig. C.7).


Figure C.7: Finite-size effects in the autocorrelation coefficient versus time, for Lagrange multipliers ξ = 10^{−1}, 10^{−4}, 10^{−7}, and 10^{−10}, compared with the mean field result. The interaction strength is set at c = 1 (the supercritical phase).

C.5 Model with additional constraint on self-interaction strength

As described in the main text, the mean of M_ii in Model 3 is negative. We can add an additional maximum entropy constraint which fixes the expectation value of the diagonal entries of the interaction matrix to be 0. Because Tr(M) = ∑_i λ_i, the resulting probability distribution is simply

P(\{\lambda\}) \propto \exp\left( -\frac{N}{2c^2} \sum_i \lambda_i^2 + \frac{1}{2} \sum_{j \neq k} \ln |\lambda_j - \lambda_k| - N\xi \sum_i \frac{1}{1 - \lambda_i} - \frac{N\alpha}{c^2} \sum_i \lambda_i \right),    (C.21)

with the Lagrange multiplier α fixing Tr(M) = 0.


We can go through the same analysis again, and find the resulting spectral density

f(y) \equiv \rho(a + y) = \frac{1}{\pi} \sqrt{y(l-y)} \left[ 1 + \frac{1}{8c^2}(l^2 + 4ly - 8y^2) - \frac{1}{c^2}(1 - g_0 - l + \alpha)\left( y - \frac{l}{2} \right) + \frac{\xi}{2} \, \frac{(l - 2y)(g_0 + l) + ly}{\sqrt{g_0(g_0+l)} \, (g_0 + l - y)^2} \right],    (C.22)

where y ≡ λ − a and a = 1 − g_0 − l is the left edge of the support.

In addition to requiring that the spectral density vanish at the edges of the support, we have the additional requirement ∫ dλ ρ(λ) λ = 0. We solve the coupled equations

(2g_0 + l)(8c^2 - 3l^2) + 4(\alpha + 1) l^2 = 0,    (C.23)

8 - \frac{l^2}{c^2} - 2\xi l^2 (l + g_0)^{-3/2} g_0^{-3/2} = 0,    (C.24)

1 - g_0 - \frac{l}{2} - \frac{l^2}{8c^2}\left( 1 - g_0 - \frac{l}{2} + \alpha \right) + \frac{\xi}{2} \left[ 2 - \frac{2g_0 + l}{g_0^{1/2}(g_0 + l)^{1/2}} \right] = 0,    (C.25)

for g0, l, and α.
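A numerical solution of this three-equation system follows the same pattern as the root-finding sketch for Eqs. (C.5)–(C.6); note that Eq. (C.25) here is reconstructed from the text, so the third residual should be treated as an assumption.

import numpy as np
from scipy.optimize import fsolve

def edges_traceless(c, xi, guess=(0.1, 2.0, 0.0)):
    """Solve Eqs. (C.23)-(C.25) for g0, l, and alpha (a sketch)."""
    def residuals(v):
        g0, l, a = v
        e1 = (2 * g0 + l) * (8 * c**2 - 3 * l**2) + 4 * (a + 1) * l**2
        e2 = 8 - l**2 / c**2 - 2 * xi * l**2 * (l + g0)**-1.5 * g0**-1.5
        e3 = (1 - g0 - l / 2 - l**2 / (8 * c**2) * (1 - g0 - l / 2 + a)
              + xi / 2 * (2 - (2 * g0 + l) / np.sqrt(g0 * (g0 + l))))
        return [e1, e2, e3]
    return fsolve(residuals, guess)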

The main difference between this model and Model 3 (with only the norm constrained) is that the support of the spectrum must now contain both positive and negative eigenvalues. This effect is especially large when ξ ≫ 1, where the eigenvalues are packed together, occupying a length l that is no longer of order 1. However, the scaling relations between g_0 and ξ, and between µ and ξ, do not change in the ξ ≪ 1 limit (by an argument similar to that for the scaling relations in Model 3; see also Fig. C.8).


Figure C.8: Scaling of the longest time scale τ_max and the correlation time τ_corr versus the constrained norm µ, for connection matrices M with the additional maximum entropy constraint fixing 〈M_ii〉 = 0.


Bibliography

[Abbott and Nelson, 2000] Abbott, L. and Nelson, S. (2000). Synaptic plasticity: Taming the beast. Nat. Neurosci., 3(S11):1178–1183.

[Ahrens et al., 2013] Ahrens, M., Orger, M., Robson, D. N., Li, J., and Keller, P. (2013). Whole-brain functional imaging at cellular resolution using light-sheet microscopy. Nat. Methods, 10(5):413–420.

[Akemann et al., 2015] Akemann, G., Baik, J., and Di Francesco, P. (2015). The Oxford Handbook of Random Matrix Theory. Oxford University Press.

[Akemann and Kanzieper, 2007] Akemann, G. and Kanzieper, E. (2007). Integrable structure of Ginibre's ensemble of real random matrices and a Pfaffian integration theorem. J. Stat. Phys., 129(5):1159–1231.

[Aksay et al., 2001] Aksay, E., Gamkrelidze, G., Seung, H., Baker, R., and Tank, D. (2001). In vivo intracellular recording and perturbation of persistent activity in a neural integrator. Nat. Neurosci., 4(2):184–193.

[Amit et al., 1985] Amit, D. J., Gutfreund, H., and Sompolinsky, H. (1985). Spin-glass models of neural networks. Phys. Rev. A, 32(2):1007.

[Amit et al., 1987] Amit, D. J., Gutfreund, H., and Sompolinsky, H. (1987). Statistical mechanics of neural networks near saturation. Ann. Phys., 173(1):30–67.

[Ardiel and Rankin, 2010] Ardiel, E. and Rankin, C. (2010). An elegant mind: learning and memory in Caenorhabditis elegans. Learn. Mem., 17(4):191–201.

[Becker and Karplus, 1997] Becker, O. M. and Karplus, M. (1997). The topology of multidimensional potential energy surfaces: Theory and application to peptide structure and kinetics. J. Chem. Phys., 106(4):1495–1517.

[Beggs and Plenz, 2003] Beggs, J. and Plenz, D. (2003). Neuronal avalanches in neocortical circuits. J. Neurosci., 23(35):11167–11177.

[Berman et al., 2016] Berman, G. J., Bialek, W., and Shaevitz, J. W. (2016). Predictability and hierarchy in Drosophila behavior. Proc. Natl. Acad. Sci. USA, 113(42):11943–11948.


[Bialek et al., 2014] Bialek, W., Cavagna, A., Giardina, I., Mora, T., Pohl, O., Silvestri, E., Viale, M., and Walczak, A. (2014). Social interactions dominate speed control in poising natural flocks near criticality. Proc. Natl. Acad. Sci. USA, 111(20):7212–7217.

[Bialek et al., 2012] Bialek, W., Cavagna, A., Giardina, I., Mora, T., Silvestri, E., Viale, M., and Walczak, A. (2012). Statistical mechanics for natural flocks of birds. Proc. Natl. Acad. Sci. USA, 109(13):4786–4791.

[Bienenstock et al., 1982] Bienenstock, E. L., Cooper, L. N., and Munro, P. W. (1982). Theory for the development of neuron selectivity: Orientation specificity and binocular interaction in visual cortex. J. Neurosci., 2(1):32–48.

[Biroli et al., 2018] Biroli, G., Bunin, G., and Cammarota, C. (2018). Marginally stable equilibria in critical ecosystems. New J. Phys., 20(8):083051.

[Bouchaud and Potters, 2003] Bouchaud, J.-P. and Potters, M. (2003). Theory of Financial Risk and Derivative Pricing: From Statistical Physics to Risk Management. Cambridge University Press.

[Broderick et al., 2007] Broderick, T., Dudik, M., Tkacik, G., Schapire, R., and Bialek, W. (2007). Faster solutions of the inverse pairwise Ising problem. arXiv preprint arXiv:0712.2437.

[Brody et al., 2003a] Brody, C. D., Hernandez, A., Zainos, A., and Romo, R. (2003a). Timing and neural encoding of somatosensory parametric working memory in macaque prefrontal cortex. Cereb. Cortex, 13(11):1196–1207.

[Brody et al., 2003b] Brody, C. D., Romo, R., and Kepecs, A. (2003b). Basic mechanisms for graded persistent activity: Discrete attractors, continuous attractors, and dynamic representations. Curr. Opin. Neurobiol., 13(2):204–211.

[Bullock and Horridge, 1965] Bullock, T. and Horridge, G. A. (1965). Structure and Function in the Nervous Systems of Invertebrates. San Francisco.

[Burak and Fiete, 2012] Burak, Y. and Fiete, I. R. (2012). Fundamental limits on persistent activity in networks of noisy neurons. Proc. Natl. Acad. Sci. USA, 109(43):17645–17650.

[Can et al., 2020] Can, T., Krishnamurthy, K., and Schwab, D. J. (2020). Gating creates slow modes and controls phase-space complexity in GRUs and LSTMs. arXiv preprint arXiv:2002.00025.

[Castellani and Cavagna, 2005] Castellani, T. and Cavagna, A. (2005). Spin-glass theory for pedestrians. J. Stat. Mech., 2005(05):P05012.

[Cavagna et al., 2008] Cavagna, A., Giardina, I., Orlandi, A., Parisi, G., Procaccini, A., Viale, M., and Zdravkovic, V. (2008). The STARFLAG handbook on collective animal behaviour: Part I, empirical methods. Anim. Behav., 76(1):217–236.


[Chalker and Mehlig, 1998] Chalker, J. T. and Mehlig, B. (1998). Eigenvector statis-tics in non-Hermitian random matrix ensembles. Phys. Rev. Lett., 81(16):3367–3370.

[Chartrand, 2011] Chartrand, R. (2011). Numerical differentiation of noisy, nons-mooth data. ISRN Appl. Math., 2011.

[Chen et al., 2013] Chen, T.-W., Wardill, T. J., Sun, Y., Pulver, S. R., Renninger,S. L., Baohan, A., Schreiter, E. R., Kerr, R. A., Orger, M. B., Jayaraman, V., et al.(2013). Ultrasensitive fluorescent proteins for imaging neuronal activity. Nature,499(7458):295.

[Chen and Bialek, 2020] Chen, X. and Bialek, W. (2020). Searching for long timescales without fine tuning. arXiv preprint. arXiv:2008.11674 [physics.bio-ph].

[Chen et al., 2019] Chen, X., Randi, F., Leifer, A. M., and Bialek, W. (2019). Search-ing for collective behavior in a small brain. Phys. Rev. E, 99(5):052418.

[Cheng and Titterington, 1994] Cheng, B. and Titterington, D. M. (1994). Neuralnetworks: A review from a statistical perspective. Stat. Sci., 9(1):2–30.

[Cocco et al., 2009] Cocco, S., Leibler, S., and Monasson, R. (2009). Neuronal cou-plings between retinal ganglion cells inferred by efficient inverse statistical physicsmethods. Proc. Natl. Acad. Sci. USA, 106(33):14058–14062.

[Cocco and Monasson, 2012] Cocco, S. and Monasson, R. (2012). Adaptive cluster expansion for the inverse Ising problem: Convergence, algorithm and tests. J. Stat. Phys., 147(2):252–314.

[Costa et al., 2019] Costa, A. C., Ahamed, T., and Stephens, G. J. (2019). Adaptive, locally linear models of complex dynamics. Proc. Natl. Acad. Sci. USA, 116(5):1501–1510.

[Cotler et al., 2017] Cotler, J. S., Gur-Ari, G., Hanada, M., Polchinski, J., Saad, P., Shenker, S. H., Stanford, D., Streicher, A., and Tezuka, M. (2017). Black holes and random matrices. J. High Energy Phys., 2017(5):118.

[Dean and Majumdar, 2006] Dean, D. and Majumdar, S. (2006). Large deviations of extreme eigenvalues of random matrices. Phys. Rev. Lett., 97(16):160201.

[Dean and Majumdar, 2008] Dean, D. and Majumdar, S. (2008). Extreme value statistics of eigenvalues of Gaussian random matrices. Phys. Rev. E, 77(4):041108.

[Dombeck et al., 2010] Dombeck, D. A., Harvey, C. D., Tian, L., Looger, L. L., and Tank, D. W. (2010). Functional imaging of hippocampal place cells at cellular resolution during virtual navigation. Nat. Neurosci., 13(11):1433.

[Dudik et al., 2004] Dudik, M., Phillips, S. J., and Schapire, R. E. (2004). Performance guarantees for regularized maximum entropy density estimation. In Proceedings of the 17th annual Conference on Learning Theory (COLT 2004), Banff, Canada, volume 3120, pages 472–486. Springer.

[Dyson, 1962a] Dyson, F. (1962a). The threefold way: Algebraic structure of symmetry groups and ensembles in quantum mechanics. J. Math. Phys., 3(6):1199–1215.

[Dyson, 1962b] Dyson, F. J. (1962b). A Brownian-motion model for the eigenvalues of a random matrix. J. Math. Phys., 3(6):1191–1198.

[Forrester and Nagao, 2007] Forrester, P. J. and Nagao, T. (2007). Eigenvalue statistics of the real Ginibre ensemble. Phys. Rev. Lett., 99(5):050603.

[Ganguli et al., 2008] Ganguli, S., Huh, D., and Sompolinsky, H. (2008). Memory traces in dynamical systems. Proc. Natl. Acad. Sci. USA, 105(48):18970–18975.

[Ginibre, 1965] Ginibre, J. (1965). Statistical ensembles of complex, quaternion, and real matrices. J. Math. Phys., 6(3):440–449.

[Goldman, 2009] Goldman, M. (2009). Memory without feedback in a neural network. Neuron, 61(4):621–634.

[Goodman et al., 1998] Goodman, M. B., Hall, D. H., Avery, L., and Lockery, S. R. (1998). Active currents regulate sensitivity and dynamic range in C. elegans neurons. Neuron, 20(4):763–772.

[Goodman et al., 2012] Goodman, M. B., Lindsay, T. H., Lockery, S. R., and Richmond, J. E. (2012). Electrophysiological methods for Caenorhabditis elegans neurobiology. In Methods Cell Biol., volume 107, pages 409–436. Elsevier.

[Gordus et al., 2015] Gordus, A., Pokala, N., Levy, S., Flavell, S. W., and Bargmann, C. I. (2015). Feedback from network states generates variability in a probabilistic olfactory circuit. Cell, 161(2):215–227.

[Gray et al., 2005] Gray, J. M., Hill, J. J., and Bargmann, C. I. (2005). A circuit for navigation in Caenorhabditis elegans. Proc. Natl. Acad. Sci. USA, 102(9):3184–3191.

[Grela, 2017] Grela, J. (2017). What drives transient behavior in complex systems? Phys. Rev. E, 96(2):022316.

[Gudowska-Nowak et al., 2020] Gudowska-Nowak, E., Nowak, M. A., Chialvo, D. R., Ochab, J. K., and Tarnowski, W. (2020). From synaptic interactions to collective dynamics in random neuronal networks models: Critical role of eigenvectors and transient behavior. Neural Comput., 32(2):395–423.

[Hebb, 1949] Hebb, D. O. (1949). The Organization of Behavior: A Neuropsychological Theory. J. Wiley; Chapman & Hall.

[Hodgkin and Huxley, 1952] Hodgkin, A. L. and Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol., 117(4):500.

[Hopfield, 1982] Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA, 79(8):2554–2558.

[Hopfield, 1984] Hopfield, J. J. (1984). Neurons with graded response have collective computational properties like those of two-state neurons. Proc. Natl. Acad. Sci. USA, 81(10):3088–3092.

[Hopfield and Tank, 1985] Hopfield, J. J. and Tank, D. W. (1985). Neural computation of decisions in optimization problems. Biol. Cybern., 52(3):141–152.

[Jaynes, 1957] Jaynes, E. (1957). Information theory and statistical mechanics. Phys. Rev., 106(4):620.

[Kandel et al., 2000] Kandel, E. R., Schwartz, J. H., and Jessell, T. M. (2000). Principles of Neural Science. McGraw-Hill, New York.

[Kato et al., 2015] Kato, S., Kaplan, H. S., Schrodel, T., Skora, S., Lindsay, T. H., Yemini, E., Lockery, S., and Zimmer, M. (2015). Global brain dynamics embed the motor command sequence of Caenorhabditis elegans. Cell, 163(3):656–669.

[Keating and Snaith, 2000] Keating, J. P. and Snaith, N. C. (2000). Random Matrix Theory and ζ(1/2 + it). Comm. Math. Phys., 214(1):57–89.

[Kim et al., 2013] Kim, E., Sun, L., Gabel, C. V., and Fang-Yen, C. (2013). Long-term imaging of Caenorhabditis elegans using nanoparticle-mediated immobilization. PLoS One, 8(1):e53419.

[Kos et al., 2018] Kos, P., Ljubotina, M., and Prosen, T. (2018). Many-body quantum chaos: Analytic connection to random matrix theory. Phys. Rev. X, 8(2):021062.

[Koster et al., 2014] Koster, U., Sohl-Dickstein, J., Gray, C. M., and Olshausen, B. A. (2014). Modeling higher-order correlations within cortical microcolumns. PLoS Comput. Biol., 10(7):e1003684.

[Krishnamurthy et al., 2020] Krishnamurthy, K., Can, T., and Schwab, D. J. (2020). Theory of gating in recurrent neural networks. arXiv preprint arXiv:2007.14823.

[Liu et al., 2018] Liu, M., Sharma, A. K., Shaevitz, J., and Leifer, A. M. (2018). Temporal processing and context dependency in C. elegans response to mechanosensation. eLife, 7:e36419.

[Liu et al., 2009] Liu, Q., Hollopeter, G., and Jorgensen, E. M. (2009). Graded synaptic transmission at the Caenorhabditis elegans neuromuscular junction. Proc. Natl. Acad. Sci. USA, 106(26):10823–10828.

[Livan et al., 2018] Livan, G., Novaes, M., and Vivo, P. (2018). Introduction to Random Matrices: Theory and Practice. Springer International Publishing. arxiv.org/1712.07903.

[Magee and Grienberger, 2020] Magee, J. C. and Grienberger, C. (2020). Synaptic plasticity forms and functions. Annu. Rev. Neurosci., 43:95–117.

[Magnasco et al., 2009] Magnasco, M. O., Piro, O., and Cecchi, G. A. (2009). Self-tuned critical anti-Hebbian networks. Phys. Rev. Lett., 102(25):258102.

[Major et al., 2004a] Major, G., Baker, R., Aksay, E., Mensh, B., Seung, H., and Tank, D. (2004a). Plasticity and tuning by visual feedback of the stability of a neural integrator. Proc. Natl. Acad. Sci. USA, 101(20):7739–7744.

[Major et al., 2004b] Major, G., Baker, R., Aksay, E., Seung, H., and Tank, D. (2004b). Plasticity and tuning of the time course of analog persistent firing in a neural integrator. Proc. Natl. Acad. Sci. USA, 101(20):7745–7750.

[Major and Tank, 2004] Major, G. and Tank, D. (2004). Persistent neural activity: Prevalence and mechanisms. Curr. Opin. Neurobiol., 14(6):675–684.

[Marchetti et al., 2013] Marchetti, M. C., Joanny, J.-F., Ramaswamy, S., Liverpool, T. B., Prost, J., Rao, M., and Simha, R. A. (2013). Hydrodynamics of soft active matter. Rev. Mod. Phys., 85(3):1143.

[Marino, 2016] Marino, R. (2016). Number statistics in random matrices and applications to quantum systems. PhD thesis, Universite Paris-Saclay.

[May, 1972] May, R. M. (1972). Will a large complex system be stable? Nature, 238(5364):413–414.

[McCulloch and Pitts, 1943] McCulloch, W. S. and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biol., 5(4):115–133.

[Mehlig and Chalker, 2000] Mehlig, B. and Chalker, J. T. (2000). Statistical properties of eigenvectors in non-Hermitian Gaussian random matrix ensembles. J. Math. Phys., 41(5):3233–3256.

[Meshulam et al., 2017] Meshulam, L., Gauthier, J. L., Brody, C. D., Tank, D. W., and Bialek, W. (2017). Collective behavior of place and non-place neurons in the hippocampal network. Neuron, 96(5):1178–1191.e4.

[Meshulam et al., 2018] Meshulam, L., Gauthier, J. L., Brody, C. D., Tank, D. W., and Bialek, W. (2018). Coarse-graining, fixed points, and scaling in a large population of neurons. arXiv preprint arXiv:1809.08461 [q-bio.NC].

[Mezard and Mora, 2009] Mezard, M. and Mora, T. (2009). Constraint satisfaction problems and neural networks: A statistical physics perspective. J. Physiol. Paris, 103(1-2):107–113.

[Monasson and Rosay, 2015] Monasson, R. and Rosay, S. (2015). Transitions between spatial attractors in place-cell models. Phys. Rev. Lett., 115:098101.

[Mora and Bialek, 2011] Mora, T. and Bialek, W. (2011). Are biological systems poised at criticality? J. Stat. Phys., 144(2):268–302.

[Mora et al., 2010] Mora, T., Walczak, A., Bialek, W., and Callan, C. (2010). Maximum entropy models for antibody diversity. Proc. Natl. Acad. Sci. USA, 107(12):5405–5410.

[Munoz, 2018] Munoz, M. A. (2018). Colloquium: Criticality and dynamical scaling in living systems. Rev. Mod. Phys., 90(3):031001.

[Nguyen et al., 2016] Nguyen, J., Shipley, F., Linder, A., Plummer, G., Liu, M., Setru, S., Shaevitz, J., and Leifer, A. (2016). Whole-brain calcium imaging with cellular resolution in freely behaving Caenorhabditis elegans. Proc. Natl. Acad. Sci. USA, 113(8):E1074–E1081.

[Nguyen et al., 2017] Nguyen, J. P., Linder, A. N., Plummer, G. S., Shaevitz, J. W., and Leifer, A. M. (2017). Automatically tracking neurons in a moving and deforming brain. PLoS Comput. Biol., 13(5):e1005517.

[Nichols et al., 2017] Nichols, A. L., Eichler, T., Latham, R., and Zimmer, M. (2017). A global brain state underlies C. elegans sleep behavior. Science, 356(6344):eaam6851.

[Ohiorhenuan et al., 2010] Ohiorhenuan, I. E., Mechler, F., Purpura, K. P., Schmid, A. M., Hu, Q., and Victor, J. D. (2010). Sparse coding and high-order correlations in fine-scale cortical networks. Nature, 466(7306):617.

[Pennington and Worah, 2017] Pennington, J. and Worah, P. (2017). Nonlinear random matrix theory for deep learning. In Advances in Neural Information Processing Systems, pages 2637–2646.

[Philip, 2018] Philip, B. (2018). Schrodinger’s cat among biology’s pigeons: 75 years of What is Life? Nature, 560:548–550.

[Piggott et al., 2011] Piggott, B. J., Liu, J., Feng, Z., Wescott, S. A., and Xu, X. S. (2011). The neural circuits and synaptic mechanisms underlying motor initiation in C. elegans. Cell, 147(4):922–933.

[Plerou et al., 2002] Plerou, V., Gopikrishnan, P., Rosenow, B., Nunes Amaral, L., Guhr, T., and Stanley, H. (2002). Random matrix approach to cross correlations in financial data. Phys. Rev. E, 65(6):066126.

[Posani et al., 2017] Posani, L., Cocco, S., Jezek, K., and Monasson, R. (2017). Functional connectivity models for decoding of spatial representations from hippocampal CA1 recordings. J. Comput. Neurosci., 43(1):17–33.

[Presse et al., 2013] Presse, S., Ghosh, K., Lee, J., and Dill, K. A. (2013). Principles of maximum entropy and maximum caliber in statistical physics. Rev. Mod. Phys., 85(3):1115–1141.

[Rajan and Abbott, 2006] Rajan, K. and Abbott, L. (2006). Eigenvalue spectra of random matrices for neural networks. Phys. Rev. Lett., 97(18):188104.

[Ramaswamy, 2010] Ramaswamy, S. (2010). The mechanics and statistics of active matter. Annu. Rev. Condens. Matter Phys., 1(1):323–345.

[Renart et al., 2003] Renart, A., Song, P., and Wang, X.-J. (2003). Robust spatial working memory through homeostatic synaptic scaling in heterogeneous cortical networks. Neuron, 38(3):473–485.

[Reuter et al., 2015] Reuter, J. A., Spacek, D. V., and Snyder, M. P. (2015). High-throughput sequencing technologies. Mol. Cell, 58(4):586–597.

[Rieke et al., 1997] Rieke, F., Warland, D., de Ruyter van Steveninck, R., and Bialek, W. (1997). Spikes: Exploring the Neural Code. MIT Press.

[Rosenblatt, 1958] Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychol. Rev., 65(6):386.

[Roudi et al., 2009] Roudi, Y., Aurell, E., and Hertz, J. (2009). Statistical physics of pairwise probability models. Front. Comput. Neurosci., 3:22.

[Schmidt, 2007] Schmidt, M. (2007). UGM: A Matlab toolbox for probabilistic undirected graphical models. http://www.cs.ubc.ca/~schmidtm/Software/UGM.html.

[Schneidman et al., 2006] Schneidman, E., Berry II, M. J., Segev, R., and Bialek, W. (2006). Weak pairwise correlations imply strongly correlated network states in a neural population. Nature, 440:1007.

[Scholz et al., 2018] Scholz, M., Linder, A. N., Randi, F., Sharma, A. K., Yu, X., Shaevitz, J. W., and Leifer, A. (2018). Predicting natural behavior from whole-brain neural dynamics. bioRxiv preprint bioRxiv:445643.

[Schrodinger, 1944] Schrodinger, E. (1944). What is Life? The Physical Aspect of the Living Cell. Cambridge University Press. http://www.whatislife.ie/downloads/What-is-Life.pdf.

[Segev et al., 2004] Segev, R., Goodhouse, J., Puchalla, J., and Berry II, M. J. (2004). Recording spikes from a large fraction of the ganglion cells in a retinal patch. Nat. Neurosci., 7(10):1155.

[Sengupta and Samuel, 2009] Sengupta, P. and Samuel, A. D. (2009). Caenorhabditis elegans: A model system for systems neuroscience. Curr. Opin. Neurobiol., 19(6):637–643.

[Seung, 1996] Seung, H. S. (1996). How the brain keeps the eyes still. Proc. Natl. Acad. Sci. USA, 93(23):13339–13344.

[Shannon, 1948] Shannon, C. E. (1948). A mathematical theory of communication. Bell Syst. Tech. J., 27:379–423, 623–656.

[Shatz, 1992] Shatz, C. J. (1992). The developing brain. Sci. Am., 267:60–67.

[Shemesh et al., 2013] Shemesh, Y., Sztainberg, Y., Forkosh, O., Shlapobersky, T., Chen, A., and Schneidman, E. (2013). High-order social interactions in groups of mice. eLife, 2:e00759.

[Shore and Johnson, 1980] Shore, J. and Johnson, R. (1980). Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. IEEE Trans. Inform. Theory, 26(1):26–37.

[Slonim et al., 2005] Slonim, N., Atwal, G. S., Tkacik, G., and Bialek, W. (2005). Estimating mutual information and multi-information in large networks. arXiv preprint cs/0502017.

[Sompolinsky et al., 1988] Sompolinsky, H., Crisanti, A., and Sommers, H. (1988). Chaos in random neural networks. Phys. Rev. Lett., 61(3):259–262.

[Srimal and Curtis, 2008] Srimal, R. and Curtis, C. E. (2008). Persistent neural activity during the maintenance of spatial position in working memory. Neuroimage, 39(1):455–468.

[Stephens et al., 2011] Stephens, G., de Mesquita, M., Ryu, W., and Bialek, W. (2011). Emergence of long timescales and stereotyped behaviors in Caenorhabditis elegans. Proc. Natl. Acad. Sci. USA, 108(18):7286–7289.

[Stevenson and Kording, 2011] Stevenson, I. H. and Kording, K. P. (2011). How advances in neural recording affect data analysis. Nat. Neurosci., 14(2):139–142.

[Stringer et al., 2019] Stringer, C., Pachitariu, M., Steinmetz, N., Reddy, C. B., Carandini, M., and Harris, K. D. (2019). Spontaneous behaviors drive multidimensional, brainwide activity. Science, 364(6437).

[Tanaka, 1998] Tanaka, T. (1998). Mean-field theory of Boltzmann machine learning. Phys. Rev. E, 58(2):2302.

[Tang et al., 2008] Tang, A., Jackson, D., Hobbs, J., Chen, W., Smith, J. L., Patel, H., Prieto, A., Petrusca, D., Grivich, M. I., Sher, A., Hottowy, P., Dabrowski, W., Litke, A. M., and Beggs, J. M. (2008). A maximum entropy model applied to spatial and temporal correlations from cortical networks in vitro. J. Neurosci., 28(2):505–518.

[Tarnowski et al., 2020] Tarnowski, W., Neri, I., and Vivo, P. (2020). Universal transient behavior in large dynamical systems on networks. Phys. Rev. Res., 2(2):023333.

[Tetzlaff et al., 2013] Tetzlaff, C., Kolodziejski, C., Timme, M., Tsodyks, M., and Worgotter, F. (2013). Synaptic scaling enables dynamically distinct short- and long-term memory formation. PLoS Comput. Biol., 9(10).

[Thouless et al., 1977] Thouless, D. J., Anderson, P. W., and Palmer, R. G. (1977). Solution of ‘solvable model of a spin glass’. Philos. Mag., 35(3):593–601.

[Tkacik et al., 2015] Tkacik, G., Mora, T., Marre, O., Amodei, D., Palmer, S. E., Berry, M. J., and Bialek, W. (2015). Thermodynamics and signatures of criticality in a network of neurons. Proc. Natl. Acad. Sci. USA, 112(37):11508–11513.

[Tkacik et al., 2014] Tkacik, G., Marre, O., Amodei, D., Schneidman, E., Bialek, W., and Berry II, M. J. (2014). Searching for collective behavior in a large network of sensory neurons. PLoS Comput. Biol., 10(1):1–23.

[Tkacik et al., 2009] Tkacik, G., Schneidman, E., Berry II, M. J., and Bialek, W. (2009). Spin glass models for a network of real neurons. arXiv preprint arXiv:0912.5409.

[Tricomi, 1957] Tricomi, F. G. (1957). Integral Equations. Interscience Publishers, Inc., London & New York.

[Turrigiano et al., 1998] Turrigiano, G., Leslie, K., Desai, N., Rutherford, L., and Nelson, S. (1998). Activity-dependent scaling of quantal amplitude in neocortical neurons. Nature, 391(6670):892–896.

[Turrigiano and Nelson, 2004] Turrigiano, G. and Nelson, S. (2004). Homeostatic plasticity in the developing nervous system. Nat. Rev. Neurosci., 5(2):97–107.

[Turrigiano, 2008] Turrigiano, G. G. (2008). The self-tuning neuron: Synaptic scaling of excitatory synapses. Cell, 135(3):422–435.

[Venkatachalam et al., 2016] Venkatachalam, V., Ji, N., Wang, X., Clark, C., Mitchell, J., Klein, M., Tabone, C., Florman, J., Ji, H., Greenwood, J., Chisholm, A., Srinivasan, J., Alkema, M., Zhen, M., and Samuel, A. (2016). Pan-neuronal imaging in roaming Caenorhabditis elegans. Proc. Natl. Acad. Sci. USA, 113(8):E1082–E1088.

[Vicsek et al., 1995] Vicsek, T., Czirok, A., Ben-Jacob, E., Cohen, I., and Shochet, O. (1995). Novel type of phase transition in a system of self-driven particles. Phys. Rev. Lett., 75(6):1226.

[Vreeswijk and Sompolinsky, 1996] Vreeswijk, C. v. and Sompolinsky, H. (1996). Chaos in neuronal networks with balanced excitatory and inhibitory activity. Science, 274(5293):1724–1726.

[Watts and Strogatz, 1998] Watts, D. J. and Strogatz, S. H. (1998). Collective dynamics of “small-world” networks. Nature, 393(6684):440–442.

[Weigt et al., 2009] Weigt, M., White, R. A., Szurmant, H., Hoch, J. A., and Hwa, T. (2009). Identification of direct residue contacts in protein–protein interaction by message passing. Proc. Natl. Acad. Sci. USA, 106(1):67–72.

[White et al., 1986] White, J. G., Southgate, E., Thomson, J. N., and Brenner, S. (1986). The structure of the nervous system of the nematode Caenorhabditis elegans. Philos. Trans. R. Soc. Lond. B Biol. Sci., 314(1165):1–340.

[Wigner, 1951] Wigner, E. (1951). On the statistical distribution of the widths and spacings of nuclear resonance levels. Math. Proc. Cambridge, 47(4):790–798.

[Yan et al., 2017] Yan, G., Vertes, P. E., Towlson, E. K., Chew, Y. L., Walker, D. S., Schafer, W. R., and Barabasi, A.-L. (2017). Network control principles predict neuron function in the Caenorhabditis elegans connectome. Nature, 550(7677):519.

[Yedidia et al., 2001] Yedidia, J. S., Freeman, W. T., and Weiss, Y. (2001). Generalized belief propagation. In Advances in Neural Information Processing Systems, pages 689–695.

[Zenke et al., 2017] Zenke, F., Gerstner, W., and Ganguli, S. (2017). The temporal paradox of Hebbian learning and homeostatic plasticity. Curr. Opin. Neurobiol., 43:166–176.

[Zenke et al., 2013] Zenke, F., Hennequin, G., and Gerstner, W. (2013). Synaptic plasticity in neural networks needs homeostasis with a fast rate detector. PLoS Comput. Biol., 9(11):e1003330.
