Consciousness as a State of Matter - arXiv · Consciousness as a State of Matter Max Tegmark ... be viewed as novel states of matter and investigated with traditional physics tools

Consciousness as a State of Matter

Max TegmarkDept. of Physics & MIT Kavli Institute, Massachusetts Institute of Technology, Cambridge, MA 02139

(Dated: Accepted for publication in Chaos, Solitons & Fractals March 17, 2015)

We examine the hypothesis that consciousness can be understood as a state of matter, “percep-tronium”, with distinctive information processing abilities. We explore four basic principles thatmay distinguish conscious matter from other physical systems such as solids, liquids and gases:the information, integration, independence and dynamics principles. If such principles can identifyconscious entities, then they can help solve the quantum factorization problem: why do consciousobservers like us perceive the particular Hilbert space factorization corresponding to classical space(rather than Fourier space, say), and more generally, why do we perceive the world around us asa dynamic hierarchy of objects that are strongly integrated and relatively independent? Tensorfactorization of matrices is found to play a central role, and our technical results include a theoremabout Hamiltonian separability (defined using Hilbert-Schmidt superoperators) being maximized inthe energy eigenbasis. Our approach generalizes Giulio Tononi’s integrated information frameworkfor neural-network-based consciousness to arbitrary quantum systems, and we find interesting linksto error-correcting codes, condensed matter criticality, and the Quantum Darwinism program, aswell as an interesting connection between the emergence of consciousness and the emergence of time.

I. INTRODUCTION

A. Consciousness in physics

A commonly held view is that consciousness is irrel-evant to physics and should therefore not be discussedin physics papers. One oft-stated reason is a perceivedlack of rigor in past attempts to link consciousness tophysics. Another argument is that physics has been man-aged just fine for hundreds of years by avoiding this sub-ject, and should therefore keep doing so. Yet the factthat most physics problems can be solved without refer-ence to consciousness does not guarantee that this appliesto all physics problems. Indeed, it is striking that manyof the most hotly debated issues in physics today involvethe notions of observations and observers, and we cannotdismiss the possibility that part of the reason why theseissues have resisted resolution for so long is our reluc-tance as physicists to discuss consciousness and attemptto rigorously define what constitutes an observer.

For example, does the non-observability of spacetimeregions beyond horizons imply that they in some sensedo not exist independently of the regions that we canobserve? This question lies at the heart of the contro-versies surrounding the holographic principle, black holecomplementarity and firewalls, and depends crucially onthe role of observers [1, 2]. What is the solution to thequantum measurement problem? This again hinges cru-cially on the role of observation: does the wavefunctionundergo a non-unitary collapse when an observation ismade, are there Everettian parallel universes, or does itmake no sense to talk about an an observer-independentreality, as argued by QBism advocates [3]? Is our per-sistent failure to unify general relativity with quantummechanics linked to the different roles of observers in thetwo theories? After all, the idealized observer in generalrelativity has no mass, no spatial extent and no effecton what is observed, whereas the quantum observer no-

toriously does appear to affect the quantum state of theobserved system. Finally, out of all of the possible fac-torizations of Hilbert space, why is the particular factor-ization corresponding to classical space so special? Whydo we observers perceive ourselves are fairly local in realspace as opposed to Fourier space, say, which accordingto the formalism of quantum field theory corresponds toan equally valid Hilbert space factorization? This “quan-tum factorization problem” appears intimately related tothe nature of an observer.

The only issue there is consensus on is that there isno consensus about how to define an observer and itsrole. One might hope that a detailed observer definitionwill prove unnecessary because some simple propertiessuch as the ability to record information might suffice;however, we will see that at least two more properties ofobservers may be necessary to solve the quantum factor-ization problem, and that a closer examination of con-sciousness may be required to identify these properties.

Another commonly held view is that consciousness isunrelated to quantum mechanics because the brain is awet, warm system where decoherence destroys quantumsuperpositions of neuron firing much faster than we canthink, preventing our brain from acting as a quantumcomputer [4]. In this paper, I argue that consciousnessand quantum mechanics are nonetheless related, but ina different way: it is not so much that quantum mechan-ics is relevant to the brain, as the other way around.Specifically, consciousness is relevant to solving an openproblem at the very heart of quantum mechanics: thequantum factorization problem.

B. Consciousness in philosophy

Why are you conscious right now? Specifically, whyare you having a subjective experience of reading thesewords, seeing colors and hearing sounds, while the inani-

arX

iv:1

401.

1219

v3 [

quan

t-ph

] 1

8 M

ar 2

015

2

mate objects around you are presumably not having anysubjective experience at all? Different people mean dif-ferent things by “consciousness”, including awareness ofenvironment or self. I am asking the more basic ques-tion of why you experience anything at all, which is theessence of what philosopher David Chalmers has termed“the hard problem” of consciousness and which has pre-occupied philosophers throughout the ages (see [5] andreferences therein). A traditional answer to this prob-lem is dualism — that living entities differ from inani-mate ones because they contain some non-physical ele-ment such as an “anima” or “soul”. Support for dualismamong scientists has gradually dwindled with the real-ization that we are made of quarks and electrons, whichas far as we can tell move according to simple physicallaws. If your particles really move according to the lawsof physics, then your purported soul is having no effect onyour particles, so your conscious mind and its ability tocontrol your movements would have nothing to do with asoul. If your particles were instead found not to obey theknown laws of physics because they were being pushedaround by your soul, then we could treat the soul as justanother physical entity able to exert forces on particles,and study what physical laws it obeys, just as physicistshave studied new forces fields and particles in the past.

The key assumption in this paper is that consciousnessis a property of certain physical systems, with no “se-cret sauce” or non-physical elements.1, This transformsChalmers’ hard problem. Instead of starting with thehard problem of why an arrangement of particles canfeel conscious, we will start with the hard fact that somearrangement of particles (such as your brain) do feel con-scious while others (such as your pillow) do not, and askwhat properties of the particle arrangement make thedifference.

This paper is not a comprehensive theory of concious-ness. Rather, it is an investigation into the physicalproperties that conscious systems must have. If we un-

1 More specifically, we pursue an extreme Occam’s razor approachand explore whether all aspects of reality can be derived fromquantum mechanics with a density matrix evolving unitarily ac-cording to a Hamiltonian. It this approach should turn out tobe successful, then all observed aspects of reality must emergefrom the mathematical formalism alone: for example, the Bornrule for subjective randomness associated with observation wouldemerge from the underlying deterministic density matrix evo-lution through Everett’s approach, and both a semiclassicalworld and consciousness should somehow emerge as well, perhapsthough processes generalizing decoherence. Even if quantumgravity phenomena cannot be captured with this simple quantumformalism, it is far from clear that gravitational, relativistic ornon-unitary effects are central to understanding consciousness orhow conscious observers perceive their immediate surroundings.There is of course no a priori guarantee that this approach willwork; this paper is motivated by the view that an Occam’s razorapproach is useful if it succeeds and very interesting if it fails,by giving hints as to what alternative assumptions or ingredientsare needed.

derstood what these physical properties were, then wecould in principle answer all of the above-mentioned openphysics questions by studying the equations of physics:we could identify all conscious entities in any physicalsystem, and calculate what they would perceive. How-ever, this approach is typically not pursued by physicists,with the argument that we do not understand conscious-ness well enough.

C. Consciousness in neuroscience

Arguably, recent progress in neuroscience has funda-mentally changed this situation, so that we physicistscan no longer blame neuroscientists for our own lack ofprogress. I have long contended that consciousness is theway information feels when being processed in certaincomplex ways [6, 7], i.e., that it corresponds to certaincomplex patterns in spacetime that obey the same lawsof physics as other complex systems. In the seminal pa-per “Consciousness as Integrated Information: a Provi-sional Manifesto” [8], Giulio Tononi made this idea morespecific and useful, making a compelling argument thatfor an information processing system to be conscious, itneeds to have two separate traits:

1. Information: It has to have a large repertoire ofaccessible states, i.e., the ability to store a largeamount of information.

2. Integration: This information must be integratedinto a unified whole, i.e., it must be impossibleto decompose the system into nearly independentparts, because otherwise these parts would subjec-tively feel like two separate conscious entities.

Tononi’s work has generated a flurry of activity in theneuroscience community, spanning the spectrum fromtheory to experiment (see [9–13] for recent reviews), mak-ing it timely to investigate its implications for physics aswell. This is the goal of the present paper — a goal whosepursuit may ultimately provide additional tools for theneuroscience community as well.

Despite its successes, Tononi’s Integrated InformationTheory (IIT)2 leaves many questions unanswered. If itis to extend our consciousness-detection ability to ani-mals, computers and arbitrary physical systems, then weneed to ground its principles in fundamental physics. IITtakes information, measured in bits, as a starting point.But when we view a brain or computer through our physi-cists eyes, as myriad moving particles, then what physicalproperties of the system should be interpreted as logical

2 Since it’s inception [8], IIT has been further developed [12]. Inparticular, IIT 3.0 considers both the past and the future of amechanism in a particular state (it’s so-called cause-effect reper-toire) and replaces the Kullback-Leibler measure with a propermetric.

3

ManyState of long-lived Information Easily Complex?matter states? integrated? writable? dynamics?Gas N N N YLiquid N N N YSolid Y N N NMemory Y N Y NComputer Y ? Y YConsciousness Y Y Y Y

TABLE I: Substances that store or process information canbe viewed as novel states of matter and investigated withtraditional physics tools.

bits of information? I interpret as a “bit” both the po-sition of certain electrons in my computers RAM mem-ory (determining whether the micro-capacitor is charged)and the position of certain sodium ions in your brain (de-termining whether a neuron is firing), but on the basisof what principle? Surely there should be some way ofidentifying consciousness from the particle motions alone,or from the quantum state evolution, even without thisinformation interpretation? If so, what aspects of thebehavior of particles corresponds to conscious integratedinformation? We will explore different measures of inte-gration below. Neuroscientists have successfully mappedout which brain activation patterns correspond to certaintypes of conscious experiences, and named these patternsneural correlates of consciousness. How can we general-ize this and look for physical correlates of consciousness,defined as the patterns of moving particles that are con-scious? What particle arrangements are conscious?

D. Consciousness as a state of matter

Generations of physicists and chemists have studiedwhat happens when you group together vast numbers ofatoms, finding that their collective behavior depends onthe pattern in which they are arranged: the key differ-ence between a solid, a liquid and a gas lies not in thetypes of atoms, but in their arrangement. In this pa-per, I conjecture that consciousness can be understoodas yet another state of matter. Just as there are manytypes of liquids, there are many types of consciousness.However, this should not preclude us from identifying,quantifying, modeling and ultimately understanding thecharacteristic properties that all liquid forms of matter(or all conscious forms of matter) share.

To classify the traditionally studied states of matter,we need to measure only a small number of physical pa-rameters: viscosity, compressibility, electrical conductiv-ity and (optionally) diffusivity. We call a substance asolid if its viscosity is effectively infinite (producing struc-tural stiffness), and call it a fluid otherwise. We calla fluid a liquid if its compressibility and diffusivity are

small and otherwise call it either a gas or a plasma, de-pending on its electrical conductivity.

What are the corresponding physical parameters thatcan help us identify conscious matter, and what are thekey physical features that characterize it? If such param-eters can be identified, understood and measured, thiswill help us identify (or at least rule out) consciousness“from the outside”, without access to subjective intro-spection. This could be important for reaching consen-sus on many currently controversial topics, ranging fromthe future of artificial intelligence to determining whenan animal, fetus or unresponsive patient can feel pain.If would also be important for fundamental theoreticalphysics, by allowing us to identify conscious observersin our universe by using the equations of physics andthereby answer thorny observation-related questions suchas those mentioned in the introductory paragraph.

E. Memory

As a first warmup step toward consciousness, let usfirst consider a state of matter that we would character-ize as memory3 — what physical features does it have?For a substance to be useful for storing information, itclearly needs to have a large repertoire of possible long-lived states or attractors (see Table I). Physically, thismeans that its potential energy function has a large num-ber of well-separated minima. The information storagecapacity (in bits) is simply the base-2 logarithm of thenumber of minima. This equals the entropy (in bits)of the degenerate ground state if all minima are equallydeep. For example, solids have many long-lived states,whereas liquids and gases do not: if you engrave some-one’s name on a gold ring, the information will still bethere years later, but if you engrave it in the surface of apond, it will be lost within a second as the water surfacechanges its shape. Another desirable trait of a memorysubstance, distinguishing it from generic solids, is that itis not only easy to read from (as a gold ring), but alsoeasy to write to: altering the state of your hard drive oryour synapses requires less energy than engraving gold.

F. Computronium

As a second warmup step, what properties should weascribe to what Margolus and Toffoli have termed “com-

3 Neuroscience research has demonstrated that long-term mem-ory is not necessary for consciousness. However, even extremelymemory-impaired conscious humans such as Clive Wearing [14]are able to retain information for several seconds; in this paper,I will assume merely that information needs to be rememberedlong enough to be subjectively experienced — perhaps 0.1 sec-onds for a human, and much less for entities processing informa-tion more rapidly.

4

putronium” [15], the most general substance that canprocess information as a computer? Rather than justremain immobile as a gold ring, it must exhibit com-plex dynamics so that its future state depends in somecomplicated (and hopefully controllable/programmable)way on the present state. Its atom arrangement mustbe less ordered than a rigid solid where nothing interest-ing changes, but more ordered than a liquid or gas. Atthe microscopic level, computronium need not be par-ticularly complicated, because computer scientists havelong known that as long as a device can perform certainelementary logic operations, it is universal: it can be pro-grammed to perform the same computation as any othercomputer with enough time and memory. Computervendors often parametrize computing power in FLOPS,floating-point operations per second for 64-bit numbers;more generically, we can parametrize computronium ca-pable of universal computation by “FLIPS”: the numberof elementary logical operations such as bit flips that itcan perform per second. It has been shown by Lloyd[16] that a system with average energy E can perform amaximum of 4E/h elementary logical operations per sec-ond, where h is Planck’s constant. The performance oftoday’s best computers is about 38 orders of magnitudelower than this, because they use huge numbers of parti-cles to store each bit and because most of their energy istied up in a computationally passive form, as rest mass.

G. Perceptronium

What about “perceptronium”, the most general sub-stance that feels subjectively self-aware? If Tononi isright, then it should not merely be able to store and pro-cess information like computronium does, but it shouldalso satisfy the principle that its information is inte-grated, forming a unified and indivisible whole.

Let us also conjecture another principle that conscioussystems must satisfy: that of autonomy, i.e., that in-formation can be processed with relative freedom fromexternal influence. Autonomy is thus the combinationof two separate properties: dynamics and independence.Here dynamics means time dependence (hence informa-tion processing capacity) and independence means thatthe dynamics is dominated by forces from within ratherthan outside the system. Just like integration, autonomyis postulated to be a necessary but not sufficient condi-tion for a system to be conscious: for example, clocksand diesel generators tend to exhibit high autonomy, butlack substantial information storage capacity.

H. Consciousness and the quantum factorizationproblem

Table II summarizes the four candidate principles thatwe will explore as necessary conditions for consciousness.Our goal with isolating and studying these principles is

Principle DefinitionInformation A conscious system has substantial

principle information storage capacity.Dynamics A conscious system has substantial

principle information processing capacity.Independence A conscious system has substantial

principle independence from the rest of the world.Integration A conscious system cannot consist of

principle nearly independent parts.Autonomy A conscious system has substantial

principle dynamics and independence.Utility An evolved conscious system records mainly

principle information that is useful for it.

TABLE II: Four conjectured necessary conditions for con-sciousness that we explore in this paper. The fifth principlesimply combines the second and third. The sixth is not anecessary condition, but may explain the evolutionary originof the others.

not merely to strengthen our understanding of conscious-ness as a physical process, but also to identify simpletraits of conscious matter that can help us tackle otheropen problems in physics. For example, the only propertyof consciousness that Hugh Everett needed to assume forhis work on quantum measurement was that of the infor-mation principle: by applying the Schrodinger equationto systems that could record and store information, heinferred that they would perceive subjective randomnessin accordance with the Born rule. In this spirit, we mighthope that adding further simple requirements such as inthe integration principle, the independence principle andthe dynamics principle might suffice to solve currentlyopen problems related to observation. The last princi-ple listed in Table II, the utility principle, is of a differ-ent character than the others: we consider it not as anecessary condition for consciousness, but as a potentialunifying evolutionary explanation of the others.

In this paper, we will pay particular attention to whatI will refer to as the quantum factorization problem:why do conscious observers like us perceive the particu-lar Hilbert space factorization corresponding to classicalspace (rather than Fourier space, say), and more gener-ally, why do we perceive the world around us as a dy-namic hierarchy of objects that are strongly integratedand relatively independent? This fundamental problemhas received almost no attention in the literature [18].We will see that this problem is very closely related tothe one Tononi confronted for the brain, merely on alarger scale. Solving it would also help solve the “physics-from-scratch” problem [7]: If the Hamiltonian H and thetotal density matrix ρ fully specify our physical world,how do we extract 3D space and the rest of our semi-classical world from nothing more than two Hermitianmatrices, which come without any a priori physical in-terpretation or additional structure such as a physicalspace, quantum observables, quantum field definitions,an “outside” system, etc.? Can some of this information

5

be extracted even from H alone, which is fully specifiedby nothing more than its eigenvalue spectrum? We willsee that a generic Hamiltonian cannot be decomposedusing tensor products, which would correspond to a de-composition of the cosmos into non-interacting parts —instead, there is an optimal factorization of our universeinto integrated and relatively independent parts. Basedon Tononi’s work, we might expect that this factoriza-tion, or some generalization thereof, is what consciousobservers perceive, because an integrated and relativelyautonomous information complex is fundamentally whata conscious observer is!

The rest of this paper is organized as follows. In Sec-tion II, we explore the integration principle by quanti-fying integrated information in physical systems, findingencouraging results for classical systems and interestingchallenges introduced by quantum mechanics. In Sec-tion III, we explore the independence principle, findingthat at least one additional principle is required to ac-count for the observed factorization of our physical worldinto an object hierarchy in three-dimensional space. InSection IV, we explore the dynamics principle and otherpossibilities for reconciling quantum-mechanical theorywith our observation of a semiclassical world. We dis-cuss our conclusions in Section V, including applicationsof the utility principle, and cover various mathematicaldetails in the three appendices. Throughout the paper,we mainly consider finite-dimensional Hilbert spaces thatcan be viewed as collections of qubits; as explained in Ap-pendix C, this appears to cover standard quantum fieldtheory with its infinite-dimensional Hilbert space as well.

II. INTEGRATION

A. Our physical world as an object hierarchy

The problem of identifying consciousness in an arbi-trary collection of moving particles is similar to the sim-pler problem of identifying objects there. One of the moststriking features of our physical world is that we perceiveit as an object hierarchy, as illustrated in Figure 1. If youare enjoying a cold drink, you perceive ice cubes in yourglass as separate objects because they are both fairly in-tegrated and fairly independent, e.g., their parts are morestrongly connected to one another than to the outside.The same can be said about each of their constituents,ranging from water molecules all the way down to elec-trons and quarks. Zooming out, you similarly perceivethe macroscopic world as a dynamic hierarchy of objectsthat are strongly integrated and relatively independent,all the way up to planets, solar systems and galaxies. Letus quantify this by defining the robustness of an objectas the ratio of the integration temperature (the energyper part needed to separate them) to the independencetemperature (the energy per part needed to separate theparent object in the hierarchy). Figure 1 illustrates that

all of the ten types of objects shown have robustness often or more. A highly robust object preserves its identity(its integration and independence) over a wide range oftemperatures/energies/situations. The more robust anobject is, the more useful it is for us humans to perceiveit as an object and coin a name for it, as per the above-mentioned utility principle.

Returning to the “physics-from-scratch” problem, howcan we identify this object hierarchy if all we have tostart with are two Hermitian matrices, the density ma-trix ρ encoding the state of our world and the Hamilto-nian H determining its time-evolution? Imagine that weknow only these mathematical objects ρ and H and haveno information whatsoever about how to interpret thevarious degrees of freedom or anything else about them.A good beginning is to study integration. Consider, forexample, ρ and H for a single deuterium atom, whoseHamiltonian is (ignoring spin interactions for simplicity)

H(rp,pp, rn,pn, re,pe) = (1)

= H1(rp,pp, rn,pn) + H2(pe) + H3(rp,pp, rn,pn, re,pe),

where r and p are position and momentum vectors, andthe subscripts p, n and e refer to the proton, the neutronand the electron. On the second line, we have decom-posed H into three terms: the internal energy of theproton-neutron nucleus, the internal (kinetic) energy ofthe electron, and the electromagnetic electron-nucleus in-teraction. This interaction is tiny, on average involvingmuch less energy than those within the nucleus:

tr H3ρ

tr H1ρ∼ 10−5, (2)

which we recognize as the inverse robustness for a typicalnucleus in Figure 3. We can therefore fruitfully approx-imate the nucleus and the electron as separate objectsthat are almost independent, interacting only weaklywith one another. The key point here is that we couldhave performed this object-finding exercise of dividingthe variables into two groups to find the greatest indepen-dence (analogous to what Tononi calls “the cruelest cut”)based on the functional form of H alone, without evenhaving heard of electrons or nuclei, thereby identifyingtheir degrees of freedom through a purely mathematicalexercise.

B. Integration and mutual information

If the interaction energy H3 were so small that wecould neglect it altogether, then H would be decompos-able into two parts H1 and H2, each one acting on onlyone of the two sub-systems (in our case the nucleus andthe electron). This means that any thermal state wouldbe factorizable:

ρ ∝ e−H/kT = e−H1/kT e−H2/kT = ρ1ρ2, (3)

6

Object: Oxygen atomRobustness: 10Independence T: 1 eVIntegration T: 10 eV

Object: Oxygen nucleusRobustness: 105

Independence T: 10 eVIntegration T: 1 MeV

Object: ProtonRobustness: 200Independence T: 1 MeVIntegration T: 200 MeV

Object: NeutronRobustness: 200Independence T: 1 MeVIntegration T: 200 MeV

Object: ElectronRobustness: 1022?Independence T: 10 eVIntegration T: 1016 GeV?

Object: Down quarkRobustness: 1017?Independence T: 200 MeVIntegration T: 1016 GeV?

Object: Up quarkRobustness: 1017?Independence T: 200 MeVIntegration T: 1016 GeV?

Object: Hydrogen atomRobustness: 10Independence T: 1 eVIntegration T: 10 eV

Object: Ice cubeRobustness: 105

Independence T: 3 mKIntegration T: 300 K

Object: Water moleculeRobustness: 30Independence T: 300 KIntegration T: 1 eV ~ 104K

{mgh/kB~3mK permolecule

FIG. 1: We perceive the external world as a hierarchy of objects, whose parts are more strongly connected to one anotherthan to the outside. The robustness of an object is defined as the ratio of the integration temperature (the energy per partneeded to separate them) to the independence temperature (the energy per part needed to separate the parent object in thehierarchy).

so the total state ρ can be factored into a product ofthe subsystem states ρ1 and ρ2. In this case, the mutualinformation

I ≡ S(ρ1) + S(ρ2)− S(ρ) (4)

vanishes, where

S(ρ) ≡ −tr ρ log2 ρ (5)

is the von Neumann entropy (in bits) — which is simplythe Shannon entropy of eigenvalues of ρ. Even for non-thermal states, the time-evolution operator U becomesseparable:

U ≡ eiHt/~ = eiH1t/~eiH2t/~ = U1U2, (6)

which (as we will discuss in detail in Section III) impliesthat the mutual information stays constant over time andno information is ever exchanged between the objects. Insummary, if a Hamiltonian can be decomposed withoutan interaction term (with H3 = 0), then it describes twoperfectly independent systems.4

4 Note that in this paper, we are generally considering H and ρfor the entire cosmos, so that there is no “outside” containingobservers etc. If H3 = 0, entanglement between the two systemsthus cannot have any observable effects. This is in stark contrastto most textbook quantum mechanics considerations, where onestudies a small subsystem of the world.

7

Let us now consider the opposite case, when a sys-tem cannot be decomposed into independent parts. Letus define the integrated information Φ as the mutual in-formation I for the “cruelest cut” (the cut minimizingI) in some class of cuts that subdivide the system intotwo (we will discuss many different classes of cuts be-low). Although our Φ-definition is slightly different fromTononi’s [8]5, it is similar in spirit, and we are reusing hisΦ-symbol for its elegant symbolism (unifying the shapesof I for information and O for integration).

C. Maximizing integration

We just saw that if two systems are dynamically inde-pendent (H3 = 0), then Φ = 0 at all time both for ther-mal states and for states that were independent (Φ = 0)at some point in time. Let us now consider the oppo-site extreme. How large can the integrated informationΦ get? A as warmup example, let us consider the fa-miliar 2D Ising model in Figure 2 where n = 2500 mag-netic dipoles (or spins) that can point up or down areplaced on a square lattice, and H is such that they pre-fer aligning with their nearest neighbors. When T →∞,ρ ∝ e−H/kT → I, so all 2n states are equally likely, alln bits are statistically independent, and Φ = 0. WhenT → 0, all states freeze out except the two degenerateground states (all spin up or all spin down), so all spinsare perfectly correlated and Φ = 1 bit. For interme-diate temperatures, long-range correlations are seen toexist such that typical states have contiguous spin-up orspin-down patches. On average, we get about one bit ofmutual information for each such patch crossing our cut(since a spin on one side “knows” about at a spin on theother side), so for bipartitions that cut the system intotwo equally large halves, the mutual information will beproportional to the length of the cutting curve. The “cru-elest cut” is therefore a vertical or horizontal straight lineof length n1/2, giving Φ ∼ n1/2 at the temperature wheretypical patches are only a few pixels wide. We would sim-ilarly get a maximum integration Φ ∼ n1/3 for a 3D Isingsystem and Φ ∼ 1 bit for a 1D Ising system.

Since it is the spatial correlations that provide the in-tegration, it is interesting to speculate about whetherthe conscious subsystem of our brain is a system near itscritical temperature, close to a phase transition. Indeed,Damasio has argued that to be in homeostasis, a num-ber of physical parameters of our brain need to be keptwithin a narrow range of values [19] — this is preciselywhat is required of any condensed matter system to benear-critical, exhibiting correlations that are long-range

5 Tononi’s definition of Φ [8] applies only for classical systems,whereas we wish to study the quantum case as well. Our Φ ismeasured in bits and can grow with system size like an extrinsicvariable, whereas his is an intrinsic variable akin representing asort of average integration per bit.

(providing integration) but not so strong that the wholesystem becomes correlated like in the right panel or in abrain experiencing an epileptic seizure.

D. Integration, coding theory and error correction

Even when we tuned the temperature to the most fa-vorable value in our 2D Ising model example, the inte-grated information never exceeded Φ ∼ n1/2 bits, whichis merely a fraction n−1/2 of the n bits of informationthat n spins can potentially store. So can we do better?Fortunately, a closely related question has been carefullystudied in the branch of mathematics known as codingtheory, with the aim of optimizing error correcting codes.Consider, for example, the following set of m = 16 bitstrings, each written as a column vector of length n = 8:

M =

0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 10 0 0 0 0 0 0 0 1 1 1 1 1 1 1 10 1 1 0 1 0 0 1 0 1 1 0 1 0 0 10 1 0 1 0 1 0 1 1 0 1 0 1 0 1 00 1 0 1 1 0 1 0 0 1 0 1 1 0 1 00 0 1 1 0 0 1 1 1 1 0 0 1 1 0 00 0 1 1 1 1 0 0 0 0 1 1 1 1 0 00 1 1 0 0 1 1 0 1 0 0 1 1 0 0 1

This is known as the Hamming(8,4)-code, and has Ham-ming distance d = 4, which means that at least 4 bitflips are required to change one string into another [20].It is easy to see that for a code with Hamming distanced, any (d− 1) bits can always be reconstructed from theothers: You can always reconstruct b bits as long as eras-ing them does not make two bit strings identical, whichwould cause ambiguity about which the correct bit stringis. This implies that reconstruction works when the Ham-ming distance d > b.

To translate such codes of m bit strings of length ninto physical systems, we simply created a state spacewith n bits (interpretable as n spins or other two-statesystems) and construct a Hamiltonian which has an m-fold degenerate ground state, with one minimum cor-responding to each of the m bit strings in the code.In the low-temperature limit, all bit strings will re-ceive the same probability weight 1/m, giving an entropyS = log2m. The corresponding integrated informationΦ of the ground state is plotted in Figure 3 for a fewexamples, as a function of cut size k (the number of bitsassigned to the first subsystem). To calculate Φ for a cutsize k in practice, we simply minimize the mutual infor-mation I over all

(nk

)ways of partitioning the n bits into

k and (n− k) bits.We see that, as advertised, the Hamming(8,4)-code

gives gives Φ = 3 when 3 bits are cut off. However,it gives only Φ = 2 for bipartitions; the Φ-value for bi-partitions is not simply related to the Hamming distance,and is not a quantity that most popular bit string codesare optimized for. Indeed, Figure 3 shows that for bipar-titions, it underperforms a code consisting of 16 random

8

Morecorrelation

Too little Too muchOptimum UniformRandom

Lesscorrelation

FIG. 2: The panels show simulations of the 2D Ising model on a 50× 50 lattice, with the temperature progressively decreasingfrom left to right. The integrated information Φ drops to zero bits at T → ∞ (leftmost panel) and to one bit as T → 0(rightmost panel), taking a maximum at an intermediate temperature near the phase transition temperature.

Bits cut off

Inte

grat

ed in

form

atio

n Hamming (8,4)-code

(16 8-bit strings)

16 random 8-bit strings

128 8-bit stringswith checksum bit

2 4 6 8

1

2

3

4

FIG. 3: For various 8-bit systems, the integrated informationis plotted as a function of the number of bits cut off into asub-system with the “cruelest cut”. The Hamming (8,4)-codeis seen to give classically optimal integration except for a bi-partition into 4 + 4 bits: an arbitrary subset containing nomore than three bits is completely determined by the remain-ing bits. The code consisting of the half of all 8-bit stringswhose bit sum is even (i.e., each of the 128 7-bit strings fol-lowed by a parity checksum bit) has Hamming distance d = 2and gives Φ = 1 however many bits are cut off. A random setof 16 8-bit strings is seen to outperform the Hamming (8,4)-code for 4+4-bipartitions, but not when fewer bits are cut off.

unique bit strings of the same length. A rich and diverseset of codes have been published in the literature, andthe state-of-the-art in terms of maximal Hamming dis-tance for a given n is continually updated [21]. Althoughcodes with arbitrarily large Hamming distance d exist,there is (just as for our Hamming(8,4)-example above)no guarantee that Φ will be as large as d − 1 when thesmaller of the two subsystems contains more than d bits.Moreover, although Reed-Solomon codes are sometimesbilled as classically optimal erasure codes (maximizingd for a given n), their fundamental units are generally

Bits cut off

Inte

grat

ed in

form

atio

n

256 r

andom

16-bi

t word

s

5 10 15

2

4

6

8

128 random 14-bit w

ords

64 12-bit words

32 10-bit words

16 8-bit words

8 6-bit words

4 4-bit words

FIG. 4: Same as for previous figure, but for random codeswith progressively longer bit strings, consisting of a randomsubset containing

√2n of the 2n possible bit strings. For

better legibility, the vertical axis has been re-centered for theshorter codes.

not bits but groups of bits (generally numbers modulosome prime number), and the optimality is violated if wemake cuts that do not respect the boundaries of these bitgroups.

Although further research on codes maximizing Φwould be of interest, it is worth noting that simple ran-dom codes appear to give Φ-values within a couple of bitsof the theoretical maximum in the limit of large n, as il-lustrated in Figure 4. When cutting off k out of n bits,the mutual information in classical physics clearly can-not exceed the number of bits in either subsystem, i.e., kand n− k, so the Φ-curve for a code must lie within theshaded triangle in the figure. (The quantum-mechanicalcase is more complicated, and we well see in the next sec-tion that it in a sense integrates both better and worse.)The codes for which the integrated information is plottedsimply consist of a random subset containing 2n/2 of the2n possible bit strings, so roughly speaking, half the bitsencode fresh information and the other half provide the

9

2-logarithm of number of patterns used

Inte

grat

ed in

form

atio

n (b

its)

2 4 6 8 10 12 14

1

2

3

4

5

6

7

FIG. 5: The integrated information is shown for randomcodes using progressively larger random subsets of the 214

possible strings of 14 bits. The optimal choice is seen to beusing about 27 bit strings, i.e., using about half the bits toencode information and the other half to integrate it.

redundancy giving near-perfect integration.Just as we saw for the Ising model example, these ran-

dom codes show a tradeoff between entropy and redun-dancy, as illustrated in Figure 5. When there are n bits,how many of the 2n possible bit strings should we useto maximize the integrated information Φ? If we use mof them, we clearly have Φ ≤ log2m, since in classicalphysics, Φ cannot exceed the entropy if the system (themutual information is I = S1 + S2 − S, where S1 ≤ Sand S2 ≤ S so I ≤ S). Using very few bit strings istherefore a bad idea. On the other hand, if we use all2n of them, we lose all redundancy, the bits become in-dependent, and Φ = 0, so being greedy and using toomany bit strings in an attempt to store more informa-tion is also a bad idea. Figure 5 shows that the optimaltradeoff is to use

√2n of the codewords, i.e., to use half

the bits to encode information and the other half to in-tegrate it. Taken together, the last two figures thereforesuggest that n physical bits can be used to provide aboutn/2 bits of integrated information in the large-n limit.

E. Integration in physical systems

Let us explore the consequences of these results forphysical systems described by a Hamiltonian H and astate ρ. As emphasized by Hopfield [22], any physicalsystem with multiple attractors can be viewed as an in-formation storage device, since its state permanently en-codes information about which attractor it belongs to.Figure 6 shows two examples of H interpretable as po-tential energy functions for a a single particle in two di-mensions. They can both be used as information storagedevices, by placing the particle in a potential well andkeeping the system cool enough that the particle staysin the same well indefinitely. The egg crate potentialV (x, y) = sin2(πx) sin2(πy) (top) has 256 minima andhence a ground state entropy (information storage ca-pacity) S = 8 bits, whereas the lower potential has only

FIG. 6: A particle in the egg-crate potential energy land-scape (top panel) stably encodes 8 bits of information thatare completely independent of one another and therefore notintegrated. In contrast, a particle in a Hamming(8,4) poten-tial (bottom panel) encodes only 4 bits of information, butwith excellent integration. Qualitatively, a hard drive is morelike the top panel, while a neural network is more like thebottom panel.

16 minima and S = 4 bits.The basins of attraction in the top panel are seen to

be the squares shown in the bottom panel. If we writethe x− and y− coordinates as binary numbers with bbits each, then the first 4 bits of x and y encode whichsquare (x, y) is in. The information in the remainingbits encodes the location within this square; these bitsare not useful for information storage because they canvary over time, as the particle oscillates around a mini-mum. If the system is actively cooled, these oscillationsare gradually damped out and the particle settles towardthe attractor solution at the minimum, at the center of itsbasin. This example illustrates that cooling is a physicalexample of error correction: if thermal noise adds smallperturbations to the particle position, altering the leastsignificant bits, then cooling will remove these perturba-tions and push the particle back towards the minimumit came from. As long as cooling keeps the perturbationssmall enough that the particle never rolls out of its basinof attraction, all the 8 bits of information encoding itsbasin number are perfectly preserved. Instead of inter-preting our n = 8 data bits as positions in two dimen-sions, we can interpret them as positions in n dimensions,where each possible state corresponds to a corner of then-dimensional hypercube. This captures the essence ofmany computer memory devices, where each bit is storedin a system with two degenerate minima; the least sig-nificant and redundant bits that can be error-correctedvia cooling now get equally distributed among all the di-mensions.

10

How integrated is the information S? For the top panelof Figure 6, not at all: H can be factored as a tensorproduct of 8 two-state systems, so Φ = 0, just as fortypical computer memory. In other words, if the particleis in a particular egg crate basin, knowing any one of thebits specifying the basin position tells us nothing aboutthe other bits. The potential in the lower panel, on theother hand, gives good integration. This potential retainsonly 16 of the 256 minima, corresponding to the 16 bitstrings of the Hamming(8,4)-code, which as we saw givesΦ = 3 for any 3 bits cut off and Φ = 2 bits for symmetricbipartitions. Since the Hamming distance d = 4 for thiscode, at least 4 bits must be flipped to reach anotherminimum, which among other things implies that no twobasins can share a row or column.

F. The pros and cons of integration

Natural selection suggests that self-reproducinginformation-processing systems will evolve integration ifit is useful to them, regardless of whether they are con-scious or not. Error correction can obviously be use-ful, both to correct errors caused by thermal noise andto provide redundancy that improves robustness towardfailure of individual physical components such as neu-rons. Indeed, such utility explains the preponderanceof error correction built into human-developed devices,from RAID-storage to bar codes to forward error cor-rection in telecommunications. If Tononi is correct andconsciousness requires integration, then this raises an in-teresting possibility: our human consciousness may haveevolved as an accidental by-product of error correction.There is also empirical evidence that integration is usefulfor problem-solving: artificial life simulations of vehiclesthat have to traverse mazes and whose brains evolve bynatural selection show that the more adapted they are totheir environment, the higher the integrated informationof the main complex in their brain [23].

However, integration comes at a cost, and as we willnow see, near maximal integration appears to be pro-hibitively expensive. Let us distinguish between the max-imum amount of information that can be stored in a statedefined by ρ and the maximum amount of informationthat can be stored in a physical system defined by H. Theformer is simply S(ρ) for the perfectly mixed (T = ∞)state, i.e., log2 of the number of possible states (the num-ber of bits characterizing the system). The latter canbe much larger, corresponding to log2 of the number ofHamiltonians that you could distinguish between givenyour time and energy available for experimentation. Letus consider potential energy functions whose k differentminima can be encoded as bit strings (as in Figure 6),and let us limit our experimentation to finding all theminima. Then H encodes not a single string of n bits,but a subset consisting of k out of all 2n such strings, one

for each minimum. There are(

2n

k

)such subsets, so the

information contained in H is

S(H) = log2

(2n

k

)= log2

2n!

k!(2n − k)!≈

≈ log2

(2n)k

kk= k(n− log2 k) (7)

for k � 2n, where we used Stirling’s approximationk! ≈ kk. So crudely speaking, H encodes not n bitsbut kn bits. For the near-maximal integration givenby the random codes from the previous section, we hadk = 2n/2, which gives S(H) ∼ 2n/2 n

2 bits. For example,

if the n ∼ 1011 neurons in your brain were maximallyintegrated in this way, then your neural network wouldrequire a dizzying 1010000000000 bits to describe, vastlymore information than can be encoded by all the 1089

particles in our universe combined.The neuronal mechanisms of human memory are still

unclear despite intensive experimental and theoretical ex-plorations [24], but there is significant evidence that thebrain uses attractor dynamics in its integration and mem-ory functions, where discrete attractors may be used torepresent discrete items [25]. The classic implementa-tion of such dynamics as a simple symmetric and asyn-chronous Hopfield neural network [22] can be conve-niently interpreted in terms of potential energy func-tions: the equations of the continuous Hopfield networkare identical to a set of mean-field equations that mini-mize a potential energy function, so this network alwaysconverges to a basin of attraction [26]. Such a Hopfieldnetwork gives a dramatically lower information contentS(H) of only about 0.25 bits per synapse[26], and we haveonly about 1014 synapses, suggesting that our brains canstore only on the order of a few Terabytes of information.

The integrated information of a Hopfield network iseven lower. For a Hopfield network of n neurons withHebbian learning, the total number of attractors isbounded by 0.14n [26], so the maximum information ca-pacity is merely S ≈ log2 0.14n ≈ log2 n ≈ 37 bits forn = 1011 neurons. Even in the most favorable case wherethese bits are maximally integrated, our 1011 neuronsthus provide a measly Φ ≈ 37 bits of integrated informa-tion, as opposed to about Φ ≈ 5×1010 bits for a randomcoding.

G. The integration paradox

This leaves us with an integration paradox: why doesthe information content of our conscious experience ap-pear to be vastly larger than 37 bits? If Tononi’s informa-tion and integration principles from Section I are correct,the integration paradox forces us6 to draw at least one of

6 Can we sidestep the integration paradox by simply dismissing theidea that integration is necessary? Although it remains contro-

11

the following three conclusions:

1. Our brains use some more clever scheme for encod-ing our conscious bits of information, which allowsdramatically larger Φ than Hebbian Hopfield net-works.

2. These conscious bits are much fewer than we mightnaively have thought from introspection, implyingthat we are only able to pay attention to a verymodest amount of information at any instant.

3. To be relevant for consciousness, the definition ofintegrated information that we have used must bemodified or supplemented by at least one additionalprinciple.

We will see that the quantum results in the next sectionbolster the case for conclusion 3. Interestingly, there isalso support for conclusion 2 in the large psychophysicalliterature on the illusion of the perceptual richness of theworld. For example, there is evidence suggesting that ofthe roughly 107 bits of information that enter our braineach second from our sensory organs, we can only beaware of a tiny fraction, with estimates ranging from 10to 50 bits [27, 28].

The fundamental reason why a Hopfield network isspecified by much less information than a near-maximallyintegrated network is that it involves only pairwise cou-plings between neurons, thus requiring only ∼ n2 cou-pling parameters to be specified — as opposed to 2n pa-rameters giving the energy for each of the 2n possiblestates. It is striking how H is similarly simple for thestandard model of particle physics, with the energy in-volving only sums of pairwise interactions between parti-cles supplemented with occasional 3-way and 4-way cou-plings. H for the brain and H for fundamental physicsthus both appear to belong to an extremely simple sub-class of all Hamiltonians, that require an unusually smallamount of information to describe. Just as a system im-plementing near-maximal integration via random codingis too complicated to fit inside the brain, it is also toocomplicated to work in fundamental physics: Since theinformation storage capacity S of a physical system isapproximately bounded by its number of particles [16] orby its area in Planck units by the Holographic principle[17], it cannot be integrated by physical dynamics thatitself requires storage of the exponentially larger informa-tion quantity S(H) ∼ 2S/2 S

2 unless the Standard ModelHamiltonian is replaced by something dramatically morecomplicated.

versial whether integrated information is a sufficient conditionfor consciousness as asserted by IIT, it appears rather obviousthat it is a necessary condition if the conscious experience isunified: if there were no integration, the conscious mind wouldconsist of two separate parts that were independent of one an-other and hence unaware of each other.

An interesting theoretical direction for further research(pursuing resolution 1 to the integration paradox) istherefore to investigate what maximum amount of in-tegrated information Φ can be feasibly stored in a physi-cal system using codes that are algorithmic (such as RS-codes) rather than random. An interesting experimentaldirection would be to search for concrete implementa-tions of error-correction algorithms in the brain.

In summary, we have explored the integration prin-ciple by quantifying integrated information in physicalsystems. We have found that although excellent integra-tion is possible in principle, it is more difficult in prac-tice. In theory, random codes provide nearly maximalintegration, with about half of all n bits coding for dataand the other half providing Ψ ≈ n bits of integration),but in practice, the dynamics required for implement-ing them is too complex for our brain or our universe.Most of our exploration has focused on classical physics,where cuts into subsystems have corresponded to parti-tions of classical bits. As we will see in the next section,finding systems encoding large amounts of integrated in-formation is even more challenging when we turn to thequantum-mechanical case.

III. INDEPENDENCE

A. Classical versus quantum independence

How cruel is what Tononi calls “the cruelest cut”, di-viding a system into two parts that are maximally in-dependent? The situation is quite different in classicalphysics and quantum physics, as Figure 7 illustrates fora simple 2-bit system. In classical physics, the state isspecified by a 2×2 matrix giving the probabilities for thefour states 00, 01, 10 and 11, which define an entropy Sand mutual information I. Since there is only one pos-sible cut, the integrated information Φ = I. The pointdefined by the pair (S,Φ) can lie anywhere in the “pyra-mid” in the figure, who’s top at (S,Φ) = (1, 1) (blackstar) gives maximum integration, and corresponds to per-fect correlation between the two bits: 50% probability for00 and 11. Perfect anti-correlation gives the same point.The other two vertices of the classically allowed regionare seen to be (S,Φ) = (0, 0) (100% probability for a sin-gle outcome) and (S,Φ) = (2, 0) (equal probability for allfour outcomes).

In quantum mechanics, where the 2-qubit state is de-fined by a 4× 4 density matrix, the available area in the(S, I)-plane doubles to include the entire shaded trian-gle, with the classically unattainable region opened upbecause of entanglement. The extreme case is a Bell pairstate such as

|ψ〉 =1√2

(|↑〉|↑〉+ |↓〉|↓〉) , (8)

which gives (S, I) = (0, 2). However, whereas there wasonly one possible cut for 2 classical bits, there are now in-

12

0.5 1.0 1.5 2.0

0.5

1.0

1.5

2.0

Entropy S

Mut

ual i

nfor

mat

ion

I

Possible onlyquantum-mechanically

(entanglement)

Possibleclassically

Bell pair

( ).4.4.2.0( ).91

.03

.03

.03

( )1/31/31/30

( )1000

( ).7.1.1.1

( )1/200

1/2

( )1/21/200

( )1/41/41/41/4

( ).3.3.3.1

Quantum integrated

Unitary transform

ation

Unitary transformation

FIG. 7: Mutual information versus entropy for various 2-bitsystems. The different dots, squares and stars correspondto different states, which in the classical cases are definedby the probabilities for the four basis states 00, 01 10 and11. Classical states can lie only in the pyramid below theupper black star with (S, I) = (1, 1), whereas entanglementallows quantum states to extend all the way up to the upperblack square at (0, 2). However, the integrated information Φfor a quantum state cannot lie above the shaded green/greyregion, into which any other quantum state can be broughtby a unitary transformation. Along the upper boundary ofthis region, either three of the four probabilities are equal, orto two of them are equal while one vanishes.

finitely many possible cuts because in quantum mechan-ics, all Hilbert space bases are equally valid, and we canchoose to perform the factorization in any of them. SinceΦ is defined as I after the cruelest cut, it is the I-valueminimized over all possible factorizations. For simplicity,we use the notation where ⊗ denotes factorization in thecoordinate basis, so the integrated information is

Φ = minU

I(UρU†), (9)

i.e., the mutual information minimized over all possi-ble unitary transformations U. Since the Bell pair ofequation (8) is a pure state ρ = |ψ〉〈ψ|, we can unitarilytransform it into a basis where the first basis vector is|ψ〉, making it factorizable:

U

12 0 0 1

20 0 0 00 0 0 012 0 0 1

2

U† =

1 0 0 00 0 0 00 0 0 00 0 0 0

=

(1 00 0

)⊗(

1 00 0

).

(10)This means that Φ = 0, so in quantum mechanics, thecruelest cut can be very cruel indeed: the most entangled

states possible in quantum mechanics have no integratedinformation at all!

The same cruel fate awaits the most integrated 2-bit state from classical physics: the perfectly correlatedmixed state ρ = 1

2 |↑〉〈↑| +12 |↓〉〈↓|. It gave Φ = 1 bit

classically above (upper black star in the figure), but aunitary transformation permuting its diagonal elementsmakes it factorable:

U

12 0 0 00 0 0 00 0 0 00 0 0 1

2

U† =

12 0 0 00 1

2 0 00 0 0 00 0 0 0

=

(1 00 0

)⊗(

12 00 1

2

),

(11)so Φ = 0 quantum-mechanically (lower black star in thefigure).

B. Canonical transformations, independence andrelativity

The fundamental reason that these states are more sep-arable quantum-mechanically is clearly that more cutsare available, making the cruelest one crueler. Interest-ingly, the same thing can happen also in classical physics.Consider, for example, our example of the deuteriumatom from equation (1). When we restricted our cuts tosimply separating different degrees of freedom, we foundthat the group (rp,pp, rn,pn) was quite (but not com-pletely) independent of the group (re,pe), and that therewas no cut splitting things into perfectly independentpieces. In other words, the nucleus was fairly indepen-dent of the electron, but none of the three particles wascompletely independent of the other two. However, if weallow our degrees of freedom to be transformed beforethe cut, then things can be split into two perfectly in-dependent parts! The classical equivalent of a unitarytransformation is of course a canonical transformation(one that preserves phase-space volume). If we performthe canonical transformation where the new coordinatesare the center-of-mass position rM and the relative dis-placements r′p ≡ rp − rM and r′e ≡ re − rM , and cor-respondingly define pM as the total momentum of thewhole system, etc., then we find that (rM ,pM ) is com-pletely independent of the rest. In other words, the av-erage motion of the entire deuterium atom is completelydecoupled from the internal motions around its center-of-mass.

Interestingly, this well-known possibility of decompos-ing any isolated system into average and relative motions(the “average-relative decomposition”, for short) is equiv-alent to relativity theory in the following sense. Thecore of relativity theory is that all laws of physics (in-cluding the speed of light) are the same in all inertialframes. This implies the average-relative decomposition,since the laws of physics governing the relative motionsof the system are the same in all inertial frames andhence independent of the (uniform) center-of-mass mo-tion. Conversely, we can view relativity as a special case

13

of the average-relative decomposition. If two systems arecompletely independent, then they can gain no knowl-edge of each other, so a conscious observer in one willbe unaware of the other. The average-relative decompo-sition therefore implies that an observer in an isolatedsystem has no way of knowing whether she is at rest orin uniform motion, because these are simply two differentallowed states for the center-of-mass subsystem, which iscompletely independent from (and hence inaccessible to)the internal-motions subsystem of which her conscious-ness is a part.

C. How integrated can quantum states be?

We saw in Figure 7 that some seemingly inte-grated states, such as a Bell pair or a pair of clas-sically perfectly correlated bits, are in fact not inte-grated at all. But the figure also shows that somestates are truly integrated even quantum-mechanically,with I > 0 even for the cruelest cut. How inte-grated can a quantum state be? The following theo-rem, proved by Jevtic, Jennings & Rudolph [29], en-ables the answer to be straightforwardly calculated7:

ρ-Diagonality Theorem (ρDC):The mutual information always takes its min-imum in a basis where ρ is diagonal

The first step in computing the integrated informationΦ(ρ) is thus to diagonalize the n × n density matrix ρ.If all n eigenvalues are different, then there are n! pos-sible ways of doing this, corresponding to the n! waysof permuting the eigenvalues, so the ρDC simplifies thecontinuous minimization problem of equation (9) to a dis-crete minimization problem over these n! permutations.Suppose that n = l ×m, and that we wish to factor then-dimensional Hilbert space into factor spaces of dimen-sionality l and m, so that Φ = 0. It is easy to see thatthis is possible if the n eigenvalues of ρ can be arrangedinto an l × m matrix that is multiplicatively separable(rank 1), i.e., the product of a column vector and a rowvector. Extracting the eigenvalues for our example fromequation (11) where l = m = 2 and n = 4, we see that(

12

12

0 0

)is separable, but

(12 00 1

2

)is not,

and that the only difference is that the order of the fournumbers has been permuted. More generally, we see thatto find the “cruelest cut” that defines the integrated in-formation Φ, we want to find the permutation that makesthe matrix of eigenvalues as separable as possible. It is

7 The converse of the ρDC is straightforward to prove: if Φ = 0(which is equivalent to the state being factorizable; ρ = ρ1⊗ρ2),then it is factorizable also in its eigenbasis where both ρ1 and ρ2are diagonal.

easy to see that when seeking the permutation givingmaximum separability, we can without loss of generalityplace the largest eigenvalue first (in the upper left corner)and the smallest one last (in the lower right corner). Ifthere are only 4 eigenvalues (as in the above example),the ordering of the remaining two has no effect on I.

D. The quantum integration paradox

We now have the tools in hand to answer the key ques-tion from the last section: which state ρ maximizes theintegrated information Φ? Numerical search suggeststhat the most integrated state is a rescaled projectionmatrix satisfying ρ2 ∝ ρ. This means that some num-ber k of the n eigenvalues equal 1/k and the remain-ing ones vanish.8 For the n = 4 example from Fig-ure 7, k = 3 is seen to give the best integration, witheigenvalues (probabilities) 1/3, 1/3, 1/3 and 0, givingΦ = log(27/16)/ log(8) ≈ 0.2516.

For classical physics, we saw that the maximal at-tainable Φ grows roughly linearly with n. Quantum-mechanically, however, it decreases as n increases!9

In summary, no matter how large a quantum systemwe create, its state can never contain more than about aquarter of a bit of integrated information! This exacer-bates the integration paradox from Section II G, eliminat-ing both of the first two resolutions: you are clearly awareof more than 0.25 bits of information right now, and thisquarter-bit maximum applies not merely to states of Hop-field networks, but to any quantum states of any system.Let us therefore begin exploring the third resolution: thatour definition of integrated information must be modifiedor supplemented by at least one additional principle.

8 A heuristic way of understanding why having many equal eigen-values is advantageous is that it helps eliminate the effect of theeigenvalue permutations that we are minimizing over. If the op-timal state has two distinct eigenvalues, then if swapping themchanges I, it must by definition increase I by some finite amount.This suggests that we can increase the integration Φ by bringingthe eigenvalues infinitesimally closer or further apart, and repeat-ing this procedure lets us further increase Φ until all eigenvaluesare either zero or equal to the same positive constant.

9 One finds that Φ is maximized when the k identical nonzeroeigenvalues are arranged in a Young Tableau, which corre-sponds to a partition of k as a sum of positive integersk1 + k2 + ..., giving Φ = S(p) + S(p∗) − log2 k, wherethe probability vectors p and p∗ are defined by pi = ki/kand p∗i = k∗i /k. Here k∗i denotes the conjugate partition.For example, if we cut an even number of qubits into twoparts with n/2 qubits each, then n = 2, 4, 6, ..., 20 givesΦ ≈ 0.252, 0.171, 0.128, 0.085, 0.085, 0.073, 0.056, 0.056, 0.051and 0.042 bits, respectively.

14

E. How integrated is the Hamiltonian?

An obvious way to begin this exploration is to con-sider the state ρ not merely at a single fixed time t, butas a function of time. After all, it is widely assumed thatconsciousness is related to information processing, notmere information storage. Indeed, Tononi’s original Φ-definition [8] (which applies to classical neural networksrather than general quantum systems) involves time, de-pending on the extent to which current events affect fu-ture ones.

Because the time-evolution of the state ρ is determinedby the Hamiltonian H via the Schrodinger equation

ρ = i[H, ρ], (12)

whose solution is

ρ(t) = eiHtρe−iHt, (13)

we need to investigate the extent to which the cruelestcut can decompose not merely ρ but the pair (ρ,H) intoindependent parts. (Here and throughout, we often useunits where ~ = 1 for simplicity.)

F. Evolution with separable Hamiltonian

As we saw above, the key question for ρ is whether itit is factorizable (expressible as product ρ = ρ1 ⊗ ρ2 ofmatrices acting on the two subsystems), whereas the keyquestion for H is whether it is what we will call addi-tively separable, being a sum of matrices acting on thetwo subsystems, i.e., expressible in the form

H = H1 ⊗ I + I⊗H2 (14)

for some matrices H1 and H2. For brevity, we will oftenwrite simply separable instead of additively separable. Asmentioned in Section II B, a separable Hamiltonian Himplies that both the thermal state ρ ∝ e−H/kT andthe time-evolution operator U ≡ eiHt/~ are factorizable.An important property of density matrices which waspointed out already by von Neumann when he inventedthem [30] is that if H is separable, then

ρ1 = i[H1, ρ1], (15)

i.e., the time-evolution of the state of the first subsystem,ρ1 ≡ tr 2ρ, is independent of the other subsystem and ofany entanglement with it that may exist. This is easy toprove: Using the identities (A12) and (A14) shows that

tr2

[H1 ⊗ I, ρ] = tr2{(H1 ⊗ I)ρ} − tr

2{ρ(H1 ⊗ I)}

= H1ρ1 − ρ1H2 = [H1, ρ1]. (16)

Using the identity (A10) shows that

tr2

[I⊗H2, ρ] = 0. (17)

Summing equations (16) and (17) completes the proof.

G. The cruelest cut as the maximization ofseparability

Since a general Hamiltonian H cannot be written inthe separable form of equation (14), it will also include athird term H3 that is non-separable. The independenceprinciple from Section I therefore suggests an interest-ing mathematical approach to the physics-from-scratchproblem of analyzing the total Hamiltonian H for ourphysical world:

1. Find the Hilbert space factorization giving the“cruelest cut”, decomposing H into parts with thesmallest interaction Hamiltonian H3 possible.

2. Keep repeating this subdivision procedure for eachpart until only relatively integrated parts remainthat cannot be further decomposed with a smallinteraction Hamiltonian.

The hope would be that applying this procedure to theHamiltonian of our standard model would reproduce thefull observed object hierarchy from Figure 1, with the fac-torization corresponding to the objects, and the variousnon-separable terms H3 describing the interactions be-tween these objects. Any decomposition with H3 = 0would correspond to two parallel universes unable tocommunicate with one another.

We will now formulate this as a rigorous mathemat-ics problem, solve it, and derive the observational conse-quences. We will find that this approach fails catastroph-ically when confronted with observation, giving interest-ing hints regarding further physical principles needed forunderstanding why we perceive our world as an objecthierarchy.

H. The Hilbert-Schmidt vector space

To enable a rigorous formulation of our problem, letus first briefly review the Hilbert-Schmidt vector space, aconvenient inner-product space where the vectors are notwave functions |ψ〉 but matrices such as H and ρ. Forany two matrices A and B, the Hilbert-Schmidt innerproduct is defined by

(A,B) ≡ tr A†B. (18)

For example, the trace operator can be written as aninner product with the identity matrix:

tr A = (I,A). (19)

This inner product defines the Hilbert-Schmidt norm(also known as the Frobenius norm)

||A|| ≡ (A,A)12 = (tr A†A)

12 =

∑ij

|Aij |2 1

2

. (20)

15

If A is Hermitian (A† = A), then ||A||2 is simply thesum of the squares of its eigenvalues.

Real symmetric and antisymmetric matrices form or-thogonal subspaces under the Hilbert-Schmidt innerproduct, since (S,A) = 0 for any symmetric matrix S(satisfying St = S) and any antisymmetric matrix A(satisfying At = −A). Because a Hermitian matrix (sat-isfying H† = H) can be written in terms of real sym-metric and antisymmetric matrices as H = S + iA, wehave

(H1,H2) = (S1,S2) + (A1,A2),

which means that the inner product of two Hermitianmatrices is purely real.

I. Separating H with orthogonal projectors

By viewing H as a vector in the Hilbert-Schmidt vec-tor space, we can rigorously define an decomposition ofit into orthogonal components, two of which are the sep-arable terms from equation (14). Given a factorization ofthe Hilbert space where the matrix H operates, we definefour linear superoperators10 Πi as follows:

Π0H ≡ 1

n(tr H) I (21)

Π1H ≡(

1

n2tr2

H

)⊗ I2 −Π0H (22)

Π2H ≡ I1 ⊗(

1

n1tr1

H

)−Π0H (23)

Π3H ≡ (I−Π1 −Π2 −Π3)H (24)

It is straightforward to show that these four linear op-erators Πi form a complete set of orthogonal projectors,i.e., that

3∑i=0

Πi = I, (25)

ΠiΠj = Πiδij , (26)

(ΠiH,ΠjH) = ||ΠiH||2δij . (27)

This means that any Hermitian matrix H can be de-composed as a sum of four orthogonal components Hi ≡ΠiH, so that its squared Hilbert-Schmidt norm can bedecomposed as a sum of contributions from the four com-

10 Operators on the Hilbert-Schmidt space are usually called su-peroperators in the literature, to avoid confusions with operatorson the underlying Hilbert space, which are mere vectors in theHilbert-Schmidt space.

ponents:

H = H0 + H1 + H2 + H3, (28)

Hi ≡ ΠiH, (29)

(Hi,Hj) = ||Hi||2δij , (30)

||H||2 = ||H0||2+||H1||2+||H2||2+||H3||2. (31)

We see that H0 ∝ I picks out the trace of H, whereas theother three matrices are trace-free. This trace term is ofcourse physically uninteresting, since it can be eliminatedby simply adding an unobservable constant zero-point en-ergy to the Hamiltonian. H1 and H2 corresponds to thetwo separable terms in equation (14) (without the traceterm, which could have been arbitrarily assigned to ei-ther), and H3 corresponds to the non-separable residual.A Hermitian matrix H is therefore separable if and only ifΠ3H = 0. Just as it is customary to write the norm or avector r by r ≡ |r| (without boldface), we will denote theHilbert-Schmidt norm of a matrix H by H ≡ ||H||. Forexample, with this notation we can rewrite equation (31)as simply H2 = H2

0 +H21 +H2

2 +H23 .

Geometrically, we can think of n×n Hermitian matri-ces H as points in the N -dimensional vector space RN ,whereN = n×n (Hermiteal matrices have n real numberson the diagonal and n(n− 1)/2 complex numbers off thediagonal, constituting a total of n+ 2× n(n− 1)/2 = n2

real parameters). Diagonal matrices form a hyperplaneof dimension n in this space. The projection operatorsΠ0, Π1, Π2 and Π3 project onto hyperplanes of dimen-sion 1, (n − 1), (n − 1) and (n − 1)2, respectively, soseparable matrices form a hyperplane in this space of di-mension 2n− 1. For example, a general 4× 4 Hermitianmatrix can be parametrized by 10 numbers (4 real for thediagonal part and 6 complex for the off-diagonal part),and its decomposition from equation (28) can be writtenas follows:

H =

t+a+b+v d+w c+x yd∗+w∗ t+a−b−v z c−xc∗+x∗ z∗ t−a+b−v d−wy∗ c∗−x∗ d∗−w∗ t−a−b+v

=

=

t 0 0 00 t 0 00 0 t 00 0 0 t

+

a 0 c 00 a 0 cc∗ 0 −a 00 c∗ 0 −a

+

b d 0 0d∗ −b 0 00 0 b d0 0 d∗ −b

+

+

v w x yw∗ −v z −xx∗ z∗ −v −wy∗ −x∗ −w∗ v

(32)

We see that t contributes to the trace (and H0) whilethe other three components Hi are traceless. We also seethat tr 1H2 = tr 2H1 = 0, and that both partial tracesvanish for H3.

16

J. Maximizing separability

We now have all the tools we need to rigorously max-imize separability and test the physics-from-scratch ap-proach described in Section III G. Given a HamiltonianH, we simply wish to minimize the norm of its non-separable component H3 over all possible Hilbert spacefactorizations, i.e., over all possible unitary transforma-tions. In other words, we wish to compute

E ≡ minU||Π3H||, (33)

where we have defined the integration energy E by anal-ogy with the integrated information Φ. If E = 0, thenthere is a basis where our system separates into two paral-lel universes, otherwise E quantifies the coupling betweenthe two parts of the system under the cruelest cut.

The Hilbert-Schmidt space allows us to interpret theminimization problem of equation (33) geometrically, asillustrated in Figure 8. Let H∗ denote the Hamiltonianin some given basis, and consider its orbit H = UHU†

under all unitary transformations U. This is a curved hy-persurface whose dimensionality is generically n(n − 1),i.e., n lower than that of the full space of Hermitian ma-trices, since unitary transformation leave all n eigenval-ues invariant.11 We will refer to this curved hypersur-face as a subsphere, because it is a subset of the full n2-dimensional sphere: the radius H (the Hilbert-Schmidtnorm ||H||) is invariant under unitary transformations,but the subsphere may have a more complicated topologythan a hypersphere; for example, the 3-sphere is knownto topologically be the double cover of SO(3), the matrixgroup of 3× 3 orthonormal transformations.

We are interested in finding the most separable point Hon this subsphere, i.e., the point on the subsphere that isclosest to the (2n−1)-dimensional separable hyperplane.In our notation, this means that we want to find the pointH on the subsphere that minimizes ||Π3H||, the Hilbert-Schmidt norm of the non-separable component. If weperform infinitesimal displacements along the subsphere,||Π3H|| thus remains constant to first order (the gradientvanishes at the minimum), so all tangent vectors of thesubsphere are orthogonal to Π3H, the vector from theseparable hyperplane to the subsphere.

Unitary transformations are generated by anti-Hermitian matrices, so the most general tangent vectorδH is of the form

δH = [A,H] ≡ AH−HA (34)

for some anti-Hermitian n×n matrix A (any matrix sat-isfying A† = −A). We thus obtain the following simple

11 n×n-dimensional Unitary matrices U are known to form an n×n-dimensional manifold: they can always be written as U = eiH

for some Hermitian matrix H, so they are parametrized by thesame number of real parameters (n× n) as Hermitian matrices.

condition for maximal separability:

(Π3H, [A,H]) = 0 (35)

for any anti-Hermitian matrix A. Because the most gen-eral anti-Hermitian matrix can be written as A = iB fora Hermitian matrix B, equation (35) is equivalent to thecondition (Π3H, [B,H]) = 0 for all Hermitian matricesB. Since there are n2 anti-Hermitian matrices, equa-tion (35) is a system of n2 coupled quadratic equationsthat the components of H must obey.

Tangent vectorδH=[A,H]

Integrationenergy

E=||Π3Η||

Non-separablecomponent

Π3Η

Sepa

rabl

e hy

perp

lane

: Π3Η

=0Subsphere H=UH*U

†{FIG. 8: Geometrically, we can view the integration energyas the shortest distance (in Hilbert-Schmidt norm) betweenthe hyperplane of separable Hamiltonians and a subsphereof Hamiltonians that can be unitarily transformed into oneanother. The most separable Hamiltonian H on the subsphereis such that its non-separable component Π3 is orthogonalto all subsphere tangent vectors [A,H] generated by anti-Hermitian matrices A.

K. The Hamiltonian diagonality theorem

Analogously to the above-mentioned ρ-diagonality the-orem, we will now prove that maximal separability is at-tained in the eigenbasis.

H-Diagonality Theorem (HDT):The Hamiltonian is always maximally separa-ble (minimizing ||H3||) in the energy eigenba-sis where it is diagonal.

As a preliminary, let us first prove the following:

Lemma 1: For any Hermitian positive semidefinitematrix H, there is a diagonal matrix H∗ giving the samesubsystem eigenvalue spectra, λ(Π1H∗) = λ(Π1H),λ(Π2H∗) = λ(Π2H), and whose eigenvalue spectrum ismajorized by that of H, i.e., λ(H) � λ(H∗).

17

Proof: Define the matrix H′ ≡ UHU†, where U ≡U1⊗U2, and U1 and U2 are unitary matrices diagonal-izing the partial trace matrices tr 2H and tr 1H, respec-tively. This implies that tr 1H

′ and tr 2H′ are diagonal,

and λ(H′) = λ(H). Now define the matrix H∗ to be H′

with all off-diagonal elements set to zero. Then tr 1H∗ =tr 1H

′ and tr 2H∗ = tr 2H′, so λ(Π1H∗) = λ(Π1H) and

λ(Π2H∗) = λ(Π2H). Moreover, since the eigenvalues ofany Hermitian positive semidefinite matrix majorize itsdiagonal elements [31], λ(H∗) ≺ λ(H′) = λ(H), whichcompletes the proof.

Lemma 2: The set S(H) of all diagonal matriceswhose diagonal elements are majorized by the vectorλ(H) is a convex subset of the subsphere, with boundarypoints on the surface of the subsphere that are diagonalmatrices with all permutations of λ(H).

Proof: Any matrix H∗ ∈ S(H) must lie either onthe subsphere surface or in its interior, because of thewell-known result that for any two positive semidefiniteHermitian matrices of equal trace, the majorization con-dition λ(H∗) ≺ λ(H) is equivalent to the former lyingin the convex hull of the unitary orbit of the latter [32]:

H∗ =∑i piUiHU†i , pi ≥ 0,

∑i pi = 1, UiU

†i = I. S(H)

contains the above-mentioned boundary points, becausethey can be written as UHU† for all unitary matricesU that diagonalize H, and for a diagonal matrix, thecorresponding H∗ is simply the matrix itself. The setS(H) is convex, because the convexity condition thatpλ1 + (1 − p)λ2 � λ if λ1 � λ, λ2 � λ, 0 ≤ p ≤ 1follows straight from the definition of �.

Lemma 3: The function f(H) ≡ ||Π1H||2 + ||Π2H||2is convex, i.e., satisfies f(paHa + pbHb) ≤ paf(Ha) +pbf(Hb) for any constants satisfying pa ≥ 0, pb ≥ 0,pa + pb = 1.

Proof: If we arrange the elements of H into a vec-tor h and denote the action of the superoperators Πi

on h by matrices Pi, then f(H) = |P1h|2 + |P2h|2 =

h†(P†1P1 + P†2P2)h. Since the matrix in parenthesis issymmetric and positive semidefinite, the function f is apositive semidefinite quadratic form and hence convex.

We are now ready to prove the H-diagonality theorem.This is equivalent to proving that f(H) takes its maxi-mum value on the subsphere in Figure 8 for a diagonal H:since both ||H|| and ||H0|| are unitarily invariant, mini-mizing ||H3||2 = ||H||2 − ||H0||2 − f(H) is equivalent tomaximizing f(H).

Let O(H) denote the subphere, i.e., the unitary orbitof H. By Lemma 1, for every H ∈ O(H), there is an H∗ ∈S(H) such that f(H) = f(H∗). If f takes its maximumover S(H) at a point H∗ which also belongs to O(H),then this is therefore also the maximum of f over O(H).Since the function f is convex (by Lemma 3) and the setS(H) is convex (by Lemma 2), f cannot have any localmaxima within the set and must take its maximum valueat at least one point on the boundary of the set. As perLemma 2, these boundary points are diagonal matriceswith all permutations of the eigenvalues of H, so they

also belong to O(H) and therefore constitute maxima off over the subsphere. In other words, the Hamiltonianis always maximally separable in its energy eigenbasis,q.e.d.

This result holds also for Hamiltonians with negativeeigenvalues, since we can make all eigenvalues positiveby adding an H0-component without altering the opti-mization problem. In addition to the diagonal optimum,there will generally be other bases with identical valuesof ||H3||, corresponding to separable unitary transforma-tions of the diagonal optimum.

We have thus proved that separability is always max-imized in the energy eigenbasis, where the n× n matrixH is diagonal and the projection operators Πi definedby equations (21)-(24) greatly simplify. If we arrangethe n = lm diagonal elements of H into an l×m matrixH, then the action of the linear operators Πi is given bysimple matrix operations:

H0 ≡ QlHQm, (36)

H1 ≡ PlHQm, (37)

H2 ≡ QlHPm, (38)

H3 ≡ PlHPm, (39)

where

Pk ≡ I −Qk, (40)

(Qk)ij ≡1

k(41)

are k × k projection matrices satisfying P 2k = Pk, Q2

k =Qk, PkQk = QkPk = 0, Pk + Qk = I. (To avoid confu-sion, we are using boldface for n× n matrices and plainfont for smaller matrices involving only the eigenvalues.)For the n = 2× 2 example of equation (32), we have

P2 =

(12

12

12

12

), Q2 =

1

2

(12 −

12

− 12

12

), (42)

and a general diagonal H is decomposed into four termsH = H0 +H1 +H2 +H3 as follows:

H =

(t tt t

)+

(a a−a −a

)+

(b −bb −b

)+

(v −v−v v

). (43)

As expected, only the last matrix is non-separable, andthe row/column sums vanish for the two previous matri-ces, corresponding to vanishing partial traces.

Note that we are here choosing the n basis states ofthe full Hilbert space to be products of basis states fromthe two factor spaces. This is without loss of generality,since any other basis states can be transformed into suchproduct states by a unitary transformation.

Finally, note that the theorem above applies only to ex-act finite-dimensional Hamiltonians, not to approximatediscretizations of infinite-dimensional ones such as arefrequently employed in physics. If n is not factorizable,the H-factorization problem can be rigorously mapped

18

onto a physically indistinguishable one with a slightlylarger factorizable n by setting the corresponding newrows and columns of the density matrix ρ equal to zero,so that the new degrees of freedom are all frozen out —we will discuss this idea in more detail in in Section IV F.

L. Ultimate independence and the Quantum Zenoparadox

FIG. 9: If the Hamiltonian of a system commutes with theinteraction Hamiltonian ([H1,H3] = 0), then decoherencedrives the system toward a time-independent state ρ wherenothing ever changes. The figure illustrates this for the BlochSphere of a single qubit starting in a pure state and ending upin a fully mixed state ρ = I/2. More general initial states endup somewhere along the z-axis. Here H1 ∝ σz, generating asimple precession around the z-axis.

In Section III G, we began exploring the idea that if wedivide the world into maximally independent parts (withminimal interaction Hamiltonians), then the observed ob-ject hierarchy from Figure 1 would emerge. The HDTtells us that this decomposition (factorization) into max-imally independent parts can be performed in the energyeigenbasis of the total Hamiltonian. This means that allsubsystem Hamiltonians and all interaction Hamiltonianscommute with one another, corresponding to an essen-tially classical world where none of the quantum effectsassociated with non-commutativity manifest themselves!In contrast, many systems that we customarily refer toas objects in our classical world do not commute withtheir interaction Hamiltonians: for example, the Hamil-tonian governing the dynamics of a baseball involves itsmomentum, which does not commute with the position-dependent potential energy due to external forces.

As emphasized by Zurek [33], states commuting withthe interaction Hamiltonian form a “pointer basis” ofclassically observable states, playing an important rolein understanding the emergence of a classical world.The fact that the independence principle automaticallyleads to commutativity with interaction Hamiltoniansmight therefore be taken as an encouraging indicationthat we are on the right track. However, whereasthe pointer states in Zurek’s examples evolve over timedue to the system’s own Hamiltonian H1, those in ourindependence-maximizing decomposition do not, becausethey commute also with H1. Indeed, the situationis even worse, as illustrated in Figure 9: any time-dependent system will evolve into a time-independentone, as environment-induced decoherence [34–37, 39, 40]drives it towards an eigenstate of the interaction Hamil-tonian, i.e., an energy eigenstate.12

The famous Quantum Zeno effect, whereby a systemcan cease to evolve in the limit where it is arbitrar-ily strongly coupled to its environment [41], thus has astronger and more pernicious cousin, which we will termthe Quantum Zeno Paradox or the Independence Para-dox.

Quantum Zeno Paradox:If we decompose our universe into maximallyindependent objects, then all change grinds toa halt.

In summary, we have tried to understand the emer-gence of our observed semiclassical world, with its hi-erarchy of moving objects, by decomposing the worldinto maximally independent parts, but our attempts havefailed dismally, producing merely a timeless world remi-niscent of heat death. In Section II G, we saw that usingthe integration principle alone led to a similarly embar-rassing failure, with no more than a quarter of a bit ofintegrated information possible. At least one more prin-ciple is therefore needed.

IV. DYNAMICS AND AUTONOMY

Let us now explore the implications of the dynamicsprinciple from Table II, according to which a conscioussystem has the capacity to not only store information,but also to process it. As we just saw above, there is aninteresting tension between this principle and the inde-pendence principle, whose Quantum Zeno Paradox givesthe exact opposite: no dynamics and no information pro-cessing at all.

12 For a system with a finite environment, the entropy will eventu-ally decrease again, causing the resumption of time-dependence,but this Poincare recurrence time grows exponentially with envi-ronment size and is normally large enough that decoherence canbe approximated as permanent.

19

We will term the synthesis of these two competing prin-ciples the autonomy principle: a conscious system hassubstantial dynamics and independence. When explor-ing autonomous systems below, we can no longer studythe state ρ and the Hamiltonian H separately, since theirinterplay is crucial. Indeed, we well see that there are in-teresting classes of states ρ that provide substantial dy-namics and near-perfect independence even when the in-teraction Hamiltonian H3 is not small. In other words,for certain preferred classes of states, the independenceprinciple no longer pushes us to simply minimize H3 andface the Quantum Zeno Paradox.

A. Probability velocity and energy coherence

To obtain a quantitative measure of dynamics, let usfirst define the probability velocity v ≡ p, where the prob-ability vector p is given by pi ≡ ρii. In other words,

vk = ρkk = i[H, ρ]kk. (44)

Since v is basis-dependent, we are interested in findingthe basis where

v2 ≡∑k

v2k =∑k

(ρkk)2 (45)

is maximized, i.e., the basis where the sums of squares ofthe diagonal elements of ρ is maximal. It is easy to seethat this basis is the eigenbasis of ρ:

v2 =∑k

(ρkk)2 =∑jk

(ρjk)2 −∑j 6=k

(ρjk)2

= ||ρ||2 −∑j 6=k

(ρjk)2 (46)

is clearly maximized in the eigenbasis where all off-diagonal elements in the last term vanish, since theHilbert-Schmidt norm ||ρ|| is the same in every basis;||ρ||2 = tr ρ2, which is simply the sum of the squares ofthe eigenvalues of ρ.

Let us define the energy coherence

δH ≡ 1√2||ρ|| = 1√

2||i[H, ρ]|| =

√−tr {[H, ρ]2}

2

=√

tr [H2ρ2 −HρHρ]. (47)

For a pure state ρ = |ψ〉〈ψ|, this definition implies thatδH ≡ ∆H, where ∆H is the energy uncertainty

∆H =[〈ψ|H2|ψ〉 − 〈ψ|H|ψ〉2

]1/2, (48)

so we can think of δH as the coherent part of the en-ergy uncertainty, i.e., as the part that is due to quantumrather than classical uncertainty.

Since ||ρ|| = ||[H, ρ]|| =√

2δH, we see that the maxi-mum possible probability velocity v is simply

vmax =√

2 δH, (49)

so we can equivalently use either of v or δH as conve-nient measures of quantum dynamics.13 Whimsicallyspeaking, the dynamics principle thus implies that en-ergy eigenstates are as unconscious as things come, andthat if you know your own energy exactly, you’re dead.

Although it is not obvious from their definitions, thesequantities vmax and δH are independent of time (eventhough ρ generally evolves). This is easily seen in theenergy eigenbasis, where

− iρmn = [H, ρ]mn = ρmn(Em − En), (51)

where the energies En are the eigenvalues of H. In thisbasis, ρ(t) = eiHtρ(0)e−iHt simplifies to

ρ(t)mn = ρ(0)mnei(Em−En)t, (52)

This means that in the energy eigenbasis, the probabili-ties pn ≡ ρnn are invariant over time. These quantitiesconstitute the energy spectral density for the state:

pn = 〈En|ρ|En〉. (53)

In the energy eigenbasis, equation (48) reduces to

δH2 = ∆H2 =∑n

pnE2n −

(∑n

pnEn

)2

, (54)

which is time-invariant because the spectral density pnis. For general states, equation (47) simplifies to

δH2 =∑mn

|ρmn|2En(En − Em). (55)

This is time-independent because equation (52) showsthat ρmn changes merely by a phase factor, leaving |ρmn|invariant. In other words, when a quantum state evolvesunitarily in the Hilbert-Schmidt vector space, both theposition vector ρ and the velocity vector ρ retain theirlengths: both ||ρ|| and ||ρ|| remain invariant over time.

B. Dynamics versus complexity

Our results above show that if all we are interested in ismaximizing the maximal probability velocity vmax, thenwe should find the two most widely separated eigenval-ues of H, Emin and Emax, and choose a pure state thatinvolves a coherent superposition of the two:

|ψ〉 = c1|Emin〉+ c2|Emax〉, (56)

13 The fidelity between the state ψ(t) and the initial state ψ0 isdefined as

F (t) ≡ 〈ψ0|ψ(t)〉, (50)

and it is easy to show that F (0) = 0 and F (0) = −(∆H)2, sothe energy uncertainty is a good measure of dynamics in that italso determines the fidelity evolution to lowest order, for purestates. For a detailed review of related measures of dynam-ics/information processing capacity, see [16].

20

FIG. 10: Time-evolution of Bloch vector trσρ1 for a single qubit subsystem. We saw how minimizing H3 leads to a static statewith no dynamics, such as the left example. Maximizing δH, on the other hand, produces extremely simple dynamics such asthe right example. Reducing δH by a modest factor of order unity can allow complex and chaotic dynamics (center); shownhere is a 2-qubit system where the second qubit is traced out.

where |c1| = |c2| = 1/√

2. This gives δH = (Emax −Emin)/2, the largest possible value, but produces an ex-tremely simple and boring solution ρ(t). Since the spec-tral density pn = 0 except for these two energies, thedynamics is effectively that of a 2-state system (a sin-gle qubit) no matter how large the dimensionality of His, corresponding to a simple periodic solution with fre-quency ω = Emax − Emin (a circular trajectory in theBloch sphere as in the right panel of Figure 10). This vi-olates the dynamics principle as defined in Table II, sinceno substantial information processing capacity exists: thesystem is simply performing the trivial computation thatflips a single bit repeatedly.

To perform interesting computations, the systemclearly needs to exploit a significant part of its energyspectrum. As can be seen from equation (52), if theeigenvalue differences are irrational multiples of one an-other, then the time evolution will never repeat, and ρwill eventually evolve through all parts of Hilbert spaceallowed by the invariants |〈Em|ρ|En〉|. The reduction ofδH required to transition from simple periodic motionto such complex aperiodic motion is quite modest. Forexample, if the eigenvalues are roughly equispaced, thenchanging the spectral density pn from having all weight atthe two endpoints to having approximately equal weightfor all eigenvalues will only reduce the energy coherenceδH by about a factor

√3, since the standard deviation

of a uniform distribution is√

3 times smaller than itshalf-width.

C. Highly autonomous systems: sliding along thediagonal

What combinations of H, ρ and factorization producehighly autonomous systems? A broad and interestingclass corresponds to macroscopic objects around us thatmove classically to an excellent approximation.

The states that are most robust toward environment-induced decoherence are those that approximately com-mute with the interaction Hamiltonian [36]. As a simplebut important example, let us consider an interactionHamiltonian of the factorizable form

H3 = A⊗B, (57)

and work in a system basis where the interaction termA is diagonal. If ρ1 is approximately diagonal in thisbasis, then H3 has little effect on the dynamics, whichbecomes dominated by the internal subsystem Hamilto-nian H1. The Quantum Zeno Paradox we encounteredin Section III L involved a situation where H1 was alsodiagonal in this same basis, so that we ended up withno dynamics. As we will illustrate with examples below,classically moving objects in a sense constitute the op-posite limit: the commutator ρ1 = i[H1, ρ1] is essentiallyas large as possible instead of as small as possible, con-tinually evading decoherence by concentrating ρ arounda single point that continually slides along the diagonal,as illustrated in Figure 11. Decohererence rapidly sup-presses off-diagonal elements far from this diagonal, butleaves the diagonal elements completely unaffected, so

21

Dynamics from H1

slides along diagonal

ρij≠0

High-decoherence subspace

High-decoherence subspace

Dynamics from H3 suppresses

off-diagonal elements ρij

Dynamics from H3 suppresses

off-diagonal elements ρij

Low-decoherence subspace

i

j

FIG. 11: Schematic representation of the time-evolution ofthe density matrix ρij for a highly autonomous subsystem.ρij ≈ 0 except for a single region around the diagonal(red/grey dot), and this region slides along the diagonal un-der the influence of the subsystem Hamiltonian H1. Any ρij-elements far from the diagonal rapidly approach zero becauseof environment-decoherence caused by the interaction Hamil-tonian H3.

there exists a low-decoherence band around the diagonal.Suppose, for instance, that our subsystem is the center-of-mass position x of a macroscopic object experiencinga position-dependent potential V (x) caused by couplingto the environment, so that Figure 11 represents the den-sity matrix ρ1(x, x′) in the position basis. If the potentialV (x) has a flat (V ′ = 0) bottom of width L, then ρ1(x, x′)will be completely unaffected by decoherence for the band|x′−x| < L. For a generic smooth potential V , the deco-herence suppression of off-diagonal elements grows onlyquadratically with the distance |x′−x| from the diagonal[4, 35], again making decoherence much slower than theinternal dynamics in a narrow diagonal band.

As a specific example of this highly autonomous type,let us consider a subsystem with a uniformly spaced en-ergy spectrum. Specifically, consider an n-dimensionalHilbert space and a Hamiltonian with spectrum

Ek =

[k − n− 1

2

]~ω = k~ω + E0, (58)

k = 0, 1, ..., n − 1. We will often set ~ω = 1 for simplic-ity. For example, n = 2 gives the spectrum {− 1

2 ,12}

like the Pauli matrices divided by two, n = 5 gives{−2,−1, 0, 1, 2} and n → ∞ gives the simple Harmonicoscillator (since the zero-point energy is physically irrele-vant, we have chosen it so that tr H =

∑Ek = 0, whereas

the customary choice for the harmonic oscillator is suchthat the ground state energy is E0 = ~ω/2).

If we want to, we can define the familiar position andmomentum operators x and p, and interpret this systemas a Harmonic oscillator. However, the probability ve-locity v is not maximized in either the position or themomentum basis, except twice per oscillation — whenthe oscillator has only kinetic energy, v is maximized inthe x-basis, and when it has only potential energy, v ismaximized in the p-basis, and when it has only poten-tial energy. If we consider the Wigner function W (x, p),which simply rotates uniformly with frequency ω, it be-comes clear that the observable which is always chang-ing with the maximal probability velocity is instead thephase, the Fourier-dual of the energy. Let us thereforedefine the phase operator

Φ ≡ FHF†, (59)

where F is the unitary Fourier matrix.Please remember that none of the systems H that we

consider have any a priori physical interpretation; rather,the ultimate goal of the physics-from-scratch program isto derive any interpretation from the mathematics alone.Generally, any thus emergent interpretation of a subsys-tem will depend on its interactions with other systems.Since we have not yet introduced any interactions for oursubsystem, we are free to interpret it in whichever way isconvenient. In this spirit, an equivalent and sometimesmore convenient way to interpret our Hamiltonian fromequation (58) is as a massless one-dimensional scalar par-ticle, for which the momentum equals the energy, so themomentum operator is p = H. If we interpret the par-ticle as existing in a discrete space with n points and atoroidal topology (which we can think of as n equispacedpoints on a ring), then the position operator is related tothe momentum operator by a discrete Fourier transform:

x = FpF†, Fjk ≡1√Nei

jk2πn . (60)

Comparing equations (59) and (60), we see that x =Φ. Since F is unitary, the operators H, p, x and Φall have the same spectrum: the evenly spaced grid ofequation (58).

As illustrated in Figure 12, the time-evolution gen-erated by H has a simple geometric interpretation inthe space spanned by the position eigenstates |xk〉, k =1, ...n: the space is unitarily rotating with frequency ω,so after a time t = 2π/nω, a state |ψ(0)〉 = |xk〉 hasbeen rotated such that it equals the next eigenvector:|ψ(t)〉 = |xk+1〉, where the addition is modulo n. Thismeans that the system has period T ≡ 2π/ω, and that|ψ〉 rotates through each of the n basis vectors duringeach period.

Let us now quantify the autonomy of this system, start-ing with the dynamics. Since a position eigenstate is aDirac delta function in position space, it is a plane wavein momentum space — and in energy space, since H = p.

22

x

zy

z

ω

x

y

z

ˆ

ˆ ˆ

ω

2x

1x

5x

8x7x

4x

6x

3x

ω

FIG. 12: For a system with an equispaced energy spectrum (such as a truncated harmonic oscillator or a massless particle ina discrete 1-dimensional periodic space), the time-evolution has a simple geometric interpretation in the space spanned by theeigenvectors xk of the phase operator FHF, the Fourier dual of the Hamiltonian, corresponding to unitarily rotating the entirespace with frequency ω, where ~ω is the energy level spacing. After a time 2π/nω, each basis vector has been rotated into thesubsequent one, as schematically illustrated above. (The orbit in Hilbert space is only planar for n ≤ 3, so the figure shouldnot be taken too literally.) The black star denotes the α = 1 apodized state described in the text, which is more robust towarddecoherence.

This means that the spectral density is pn = 1/n for aposition eigenstate. Substituting equation (58) into equa-tion (54) gives an energy coherence

δH = ~ω√n2 − 1

12. (61)

For comparison,

||H|| =

(n−1∑k=0

E2k

)1/2

= ~ω√n(n2 − 1)

12=√n δH. (62)

Let us now turn to quantifying independence and de-coherence. The inner product between the unit vector|ψ(0)〉 and the vector |ψ(t)〉 ≡ eiHt|ψ(0)〉 into which itevolves after a time t is

fn(φ) ≡ 〈ψ|eiHφω |ψ〉 =

1

n

n−1∑k=0

eiEkφ = e−in−12 φ

n−1∑k=0

eikφ

=1

ne−i

n−12 φ 1− einφ

1− eiφ=

sinnφ

n sinφ, (63)

where φ ≡ ωt. This inner product fn is plotted in Fig-ure 13, and is seen to be a sharply peaked even functionsatisfying fn(0) = 1, fn(2πk/n) = 0 for k = 1, ..., n − 1and exhibiting one small oscillation between each of thesezeros. The angle θ ≡ cos−1 fn(φ) between an initial vec-tor φ and its time evolution thus grows rapidly from 0◦ to90◦, then oscillates close to 90◦ until returning to 0◦ aftera full period T . An initial state |ψ(0)〉 = |xk〉 thereforeevolves as

ψj(t) = fn(ωt− 2π[j − k]/n)

1.0

-0.2

0.8

0.4

0.2

0.6

-100°-150° 150°100°50°-50°

Pena

lty fu

nctio

n

Apo

dize

d

Non-apodized

FIG. 13: The wiggliest (heavy black) curve shows the innerproduct of a position eigenstate with what it evolves into atime t = φ/ω later due to our n = 20-dimensional Hamilto-nian with energy spacings ~ω. When optimizing to minimizethe square of this curve using the 1 − cosφ penalty func-tion shown, corresponding to apodization in the Fourier do-main, we instead obtain the green/light grey curve, resultingin much less decoherence.

in the position basis, i.e., a wavefunction ψj sharplypeaked for j ∼ k + nωt/2π (mod n). Since the densitymatrix evolves as ρij(t) = ψi(t)ψj(t)

∗, it will thereforebe small except for i ∼ j ∼ k + nωt/2π (mod n), corre-sponding to the round dot on the diagonal in Figure 11.In particular, the decoherence-sensitive elements ρjk willbe small far from the diagonal, corresponding to the smallvalues that fn takes far from zero. How small will thedecoherence be? Let us now develop the tools needed toquantify this.

23

D. The exponential growth of autonomy withsystem size

Let us return to the most general Hamiltonian Hand study how an initially separable state ρ = ρ1 ⊗ ρ2evolves over time. Using the orthogonal projectors ofSection III I, we can decompose H as

H = H1 ⊗ I + I⊗H2 + H3, (64)

where tr 1H3 = tr 2H3 = 0. By substituting equa-tion (64) into the evolution equation ρ1 = tr 2ρ =itr 2[H, ρ] and using various partial trace identities fromSection A to simplify the resulting three terms, we obtain

ρ1 = i tr2

[H, ρ1 ⊗ ρ2] = i [H1 + H∗, ρ1], (65)

where what we will term the effective interaction Hamil-tonian

H∗ ≡ tr2{(I⊗ ρ2)H3} (66)

can be interpreted as an average of the interaction Hamil-tonian H3, weighted by the environment state ρ2. Asimilar effective Hamiltonian is studied in [42–44]. Equa-tion (65) implies that the evolution of ρ1 remains unitaryto first order in time, the only effect of the interaction H3

being to replace H1 from equation (15) by an effectiveHamiltonian H1 + H∗.

The second time derivative is given by ρ1 = tr 2ρ =−tr 2[H, [H, ρ]], and by analogously substituting equa-tion (64) and using partial trace identities from Section Ato simplify the resulting nine terms, we obtain

− ρ1 = tr 2[H, [H, ρ1 ⊗ ρ2]] =

= [H1, [H1, ρ1]]− i [K, ρ1] +

+ [H1, [H∗, ρ1]] + [H∗, [H1, ρ1]] +

+ tr 2[H3, [H3, ρ1 ⊗ ρ2]], (67)

where we have defined the Hermitian matrix

K ≡ i tr 2{(I⊗ [H2, ρ2])H3}. (68)

To qualify independence and autonomy, we are inter-ested in the extent to which H3 causes entanglement andmakes the time-evolution of ρ1 non-unitary. When think-ing of ρ as a vector in the Hilbert-Schmidt vector spacethat we reviewed in Section III H, unitary evolution pre-serves its length ||ρ||. To provide geometric intuition forthis, let us define dot and cross product notation analo-gous to vector calculus. First note that

(A†, [A,B]) = tr AAB− tr ABA = 0, (69)

since a trace of a product is invariant under cyclic per-mutations of the factors. This shows that a commuta-tor [A,B] is orthogonal to both A† and B† under theHilbert-Schmidt inner product, and a Hermitian matrixH is orthogonal to its commutator with any matrix.

This means that it we restrict ourselves to the Hilbert-Schmidt vector space of Hermitian matrices, we obtain aninteresting generalization of the standard dot and crossproducts for 3D vectors. Defining

A ·B ≡ (A,B), (70)

A×B ≡ i[A,B], (71)

we see that these operations satisfy all the same prop-erties as their familiar 3D analogs: the scalar (dot)product is symmetric (B · A = tr B†A = tr AB† =A · B), while the vector (cross) product is antisym-metric (A × B = B × A), orthogonal to both factors([A×B] ·A = [A×B] ·B = 0), and produces a result ofthe same type as the two factors (a Hermitian matrix).

In this notation, the products of an arbitrary Hermi-tian matrix A with the identity matrix I are

I ·A = tr A, (72)

I×A = 0, (73)

and the Schrodinger equation ρ = i[H, ρ] becomes simply

ρ = H× ρ. (74)

Just as in the 3D vector analogy, we can think of thisas generating rotation of the vector ρ that preserves itslength:

d

dt||ρ||2 =

d

dtρ · ρ = 2ρ · ρ = 2(H× ρ) · ρ = 0. (75)

A simple and popular way of quantifying whether evo-lution is non-unitary is to compute the linear entropy

Slin ≡ 1− tr ρ2 = 1− ||ρ||2, (76)

and repeatedly differentiating equation (76) tells us that

Slin = −2ρ · ρ, (77)

Slin = −2(||ρ||2 + ρ · ρ), (78)...S

lin= −6ρ · ρ− 2ρ ·

...ρ . (79)

Substituting equations (65) and (67) into equations (77)and (78) for ρ1, we find that almost all terms cancel,leaving us with the simple result

Slin1 = 0, (80)

Slin1 = 2 tr {ρ1 tr

2[H3, [H3, ρ]]} − 2||[H∗, ρ1]||2. (81)

This means that, to second order in time, the entropyproduction is completely independent of H1 and H2, de-pending only on quadratic combinations of H3, weightedby quadratic combinations of ρ. We find analogous re-sults for the Shannon entropy S: If the density matrixis initially separable, then S1 = 0 and S1 depends noton the full Hamiltonian H, but only on its non-separablecomponent H3, quadratically.

We now have the tools we need to compute the auton-omy of our “diagonal-sliding” system from the previous

24

subsection. As a simple example, let us take H1 to beour Hamiltonian from equation (58) with its equispacedenergy spectrum, with n = 2b, so that we can view theHilbert space as that of b coupled qubits. Equation (61)then gives an energy coherence

δH ≈ ~ω√12

2b, (82)

so the probability velocity grows exponentially with thesystem size b.

We augment this Hilbert space with one additional“environment” qubit that begins in the state |↑〉, withinternal dynamics given by H2 = ~ω2σx, and couple itto our subsystem with an interaction

H3 = V (x)⊗ σx (83)

for some potential V ; x is the position operator fromequation (60). As a first example, we use the sinusoidalpotential V (x) = sin(2πx/n), start the first subsystemin the position eigenstate |x1〉 and compute the linearentropy Slin

1 (t) numerically.As expected from our qualitative arguments of the pre-

vious section, Slin1 (t) grows only very slowly, and we find

that it can be accurately approximated by its Taylor ex-pansion around t = 0 for many orbital periods T ≡ 2π/ω:

Slin1 (t) ≈ Slin

1 (0) t2/2, where Slin1 (0) is given by equa-

tion (81). Figure 14 shows the linear entropy after oneorbit, Slin

1 (T ), as a function of the number of qubits bin our subsystem (top curve in top panel). Whereasequation (83) showed that the dynamics increases expo-nentially with system size (as 2b), the figure shows thatSlin1 (T ) decreases exponentially with system size, asymp-

totically falling as 2−4b as b→∞.Let us define the dynamical timescale τdyn and the in-

dependence timescale τind as

τdyn =~δH

, (84)

τind = [Slin1 (0)]−1/2. (85)

Loosely speaking, we can think of τdyn as the time oursystem requires to perform an elementary informationprocessing operation such as a bit flip [16], and τind as thetime it takes for the linear entropy to change by of orderunity, i.e., for significant information exchange with theenvironment to occur. If we define the autonomy A asthe ratio

A ≡ τindτdyn

, (86)

the autonomy of our subsystem thus grows exponen-tially with system size, asymptotically increasing as A ∝22b/2−b = 23b as b→∞.

As illustrated by Figure 11, we expect this exponen-tial scaling to be quite generic, independent of interactiondetails: the origin of the exponential is simply that thesize of the round dot in the figure is of order 2b times

smaller than the size of the square representing the fulldensity matrix. The independence timescale τind is ex-ponentially large because the dot, with its non-negligibleelements ρij , is exponentially close to the diagonal. Thedynamics timescale τdyn is exponentially small because itis roughly the time it takes the dot to traverse its own di-ameter as it moves around at some b-independent speedin the figure.

This exponential increase of autonomy with systemsize makes it very easy to have highly autonomous sys-tems even if the magnitude H3 of the interaction Hamil-tonian is quite large. Although the environment contin-ually “measures” the position of the subsystem throughthe strong coupling H3, this measurement does not de-cohere the subsystem because it is (to an exponen-tially good approximation) a non-demolition measure-ment, with the subsystem effectively in a position eigen-state. This phenomenon is intimately linked to the quan-tum Darwinism paradigm developed by Zurek and collab-orators [40], where the environment mediates the emer-gence of a classical world by acting as a witness, stor-ing large numbers of redundant copies of informationabout the system state in the basis that it measures. Wethus see that systems that have high autonomy via the“diagonal-sliding” mechanism are precisely objects thatdominate quantum Darwinism’s “survival of the fittest”by proliferating imprints of their states in the environ-ment.

E. Boosting autonomy with optimized wavepackets

In our worked example above, we started our subsys-tem in a position eigenstate |x1〉, which cyclically evolvedthough all other position eigenstates. The slight decoher-ence that did occur thus originated during the times whenthe state was between eigenstates, in a coherent super-positions of multiple eigenstates quantified by the mostwiggly curve in Figure 13. Not surprisingly, these wiggles(and hence the decoherence) can be reduced by a better

choice of initial state |ψ〉 =∑k ψk|xk〉 =

∑k ψk|Ek〉 for

our subsystem, where ψk and ψk are the wavefunctionamplitudes in the position and energy bases, respectively.Equation (63) then gets generalized to

gn(φ) ≡ 〈x1|eiHφω |ψ〉 = e−i

n−12 φ

n−1∑k=0

ψkeikφ. (87)

Let us choose the initial state |ψ〉 that minimizes thequantity ∫ π

−π|gn(θ)|2w(θ)dθ (88)

for some penalty function w(θ) that punishes states giv-ing large unwanted |g(θ)| far from θ = 0. This givesa simple quadratic minimization problem for the vec-

tor of coefficients ψk, whose solution turns out to be the

25

4

Entro

py in

crea

se d

urin

g fir

st o

rbit

10-6

1

10-2

10-4

10-8

10-6

10-2

10-4

10-8

10-10

System qubits2 5 104 63 987

Sinusoidal potential

Gaussian potential

α = 0

α = 2

α = 3

α = 4

α = 1

α = 0

α = 1

Entro

py in

crea

se d

urin

g fir

st o

rbit

FIG. 14: The linear entropy increase during the first orbit,Slin1 (2π/ω), is plotted for as a function of the subsystem

size (number of qubits b). The interaction potential V (x)is sinusoidal (top) and Gaussian (bottom), and the differentapodization schemes used to select the initial state are labeledby their corresponding α-value, where α = 0 corresponds tono apodization (the initial state being a position eigenstate).Some lines have been terminated in the bottom panel due toinsufficient numerical precision.

last (with smallest eigenvalue) eigenvector of the Toeplitzmatrix whose first row is the Fourier series of w(θ). Aconvenient choice of penalty function 1− cosφ (see Fig-ure 13), which respects the periodicity of the problemand grows quadratically around its φ = 0 minimum. Inthe n → ∞ limit, the Toeplitz eigenvalue problem sim-

plifies to Laplace’s equation with a ψ(φ) = cos φ2 winningeigenvector, giving

ψk ≡∫ π

−πcos(kφ)φ(φ)dφ =

cos(πk)

1− 4k2. (89)

The corresponding curve gn(φ) is plotted is Figure 13,and is seen to have significantly smaller wiggles awayfrom the origin at the cost of a very slight widening of thecentral peak. Figure 14 (top panel, lower curve) showsthat this choice significantly reduces decoherence.

What we have effectively done is employ the standardsignal processing technique known as apodization. Asidefrom the irrelevant phase factor, equation (87) is simply

the Fourier transform of ψ, which can be made narrower

by making ψ smoothly approach zero at the two end-

points. In the n → ∞ limit, our original choice corre-

sponded to ψ = 1 for −π ≤ φ ≤ π, which is discontin-

uous, whereas our replacement function ψ = cos φ2 van-ishes at the endpoints and is continuous. This reducesthe wiggling because Riemann-Lebesgue’s lemma impliesthat the Fourier transform of a function whose first dderivatives are continuous falls off faster than k−d. By

instead using ψ(α)(φ) = (cos φ2 )α for some integer α ≥ 0,we get α continuous derivatives, so the larger we chooseα, the smaller the decoherence-inducing wiggles, at thecost of widening the central peak. The first five casesgive

ψ(0)k = δ0k, (90)

ψ(1)k =

cos(πk)

1− 4k2, (91)

ψ(2)k = δ0k +

1

2δ1,|k|, (92)

ψ(3)k =

cos(πk)

(1− 4k2)(1− 49k

2), (93)

ψ(4)k = δ0k +

2

3δ1,|k| +

1

6δ2,|k|, (94)

and it is easy to show that the α→∞ limit correspondsto a Gaussian shape.

Which apodization is best? This depends on the in-teraction H3. For our sinusoidal interaction potential(Figure 14, top), the best results are for α = 1, whenthe penalty function has a quadratic minimum. Whenswitching to the roughly Gaussian interaction potentialV (x) ∝ e4 cos(2πx/n) (Figure 14, bottom), the resultsare instead seen to keep improving as we increase α, pro-ducing dramatically less decoherence than for the sinu-soidal potential, and suggesting that the optical choiceis the α → ∞ state: a Gaussian wave packet. Gaussianwave packets have long garnered interest as models of ap-proximately classical states. They correspond to general-ized coherent states, which have shown to be maximallyrobust toward decoherence in important situations in-volving harmonic oscillator interactions [45]. They havealso been shown to emerge dynamically in harmonic os-cillator environments, from the accumulation of manyindependent interactions, in much the same way as thecentral limit theorem gives a Gaussian probability distri-bution to sums of many independent contributions [46].Our results suggest that Gaussian wave packets may alsoemerge as the most robust states towards decoherencefrom short-range interactions with exponential fall-off.

F. Optimizing autonomy when we can choose thestate: factorizable effective theories

Above we explored specific examples of highly au-tonomous systems, motivated by approximately classicalsystems that we find around us in nature. We found thatthere are combinations of ρ, H and Hilbert space factor-ization that provide excellent autonomy even when the

26

interaction H3 is not small. We will now see that, moregenerally, given any H and factorization, there are statesρ that give perfect factorization and infinite autonomy.The basic idea is that for states such that some of thespectral density invariants pk vanish, it makes no differ-ence if we replace the corresponding unused eigenvaluesof H by others to make the Hamiltonian separable.

Consider a subspace of the full Hilbert space defined bya projection operator Π. A projection operator satisfiesΠ2 = Π = Π†, so its eigenvalues are all zero or one, andthe latter correspond to our subspace of interest. Letus define the symbol l to denote that operator equalityholds in this subspace. For example,

A−B l 0 (95)

means that

Π(A−B)Π = 0. (96)

Below will often chose the subspace to correspond to low-energy states, so the wave symbol in l is intended to re-mind us that equality holds in the long wavelength limit.

We saw that the energy spectral density pn of equa-tion (53) remains invariant under unitary time evolution,so any energy levels for which pn = 0 will never haveany physical effect, and the corresponding dimensions ofthe Hilbert space can simply be ignored as “frozen out”.This remains true even considering observation-relatedstate projection as described in the next subsection. Letus therefore define

Π =∑k

θ(pn)|En〉〈En|, (97)

where θ is the Heaviside step function (θ(x) = 1 if x > 0,vanishing otherwise) i.e., summing only over those en-ergy eigenstates for which the probability pn is non-zero.Defining new operators in our subspace by

ρ′ ≡ ΠρΠ, (98)

H′ ≡ ΠHΠ, (99)

(100)

equation (97) implies that

ρ′ =∑mn

θ(pm)θ(pn)|Em〉〈Em|ρ|En〉〈En|

=∑mn

|Em〉〈Em|ρ|En〉〈En| = ρ, (101)

Here the second equal sign follows from the fact that|〈Em|ρ|En〉|2 ≤ 〈Em|ρ|Em〉〈En|ρ|En〉14, so that the left

14 This last inequality follows because ρ is Hermitian and positivesemidefinite, so the determinant must be non-negative for the2× 2 matrix 〈Ei|ρ|Ej〉 where i and j each take the two values kand l.

hand side must vanish whenever either pm or pn vanishes— the Heaviside step functions therefore have no effectin equation (101) and can be dropped.

Although H′ 6= H, we do have H′ l H, and this meansthat the time-evolution of ρ can be correctly computedusing H′ in place of the full Hamiltonian H:

ρ(t) = Πρ(t)Π = ΠeiHtΠρ(0)Πe−iHtΠ = eiH′tρ(0)e−iH

′t.

The frozen-out part of the Hilbert space is therefore com-pletely unobservable, and we can act as though the sub-space is the only Hilbert space that exists, and as if H′

is the true Hamiltonian. By working only with ρ′ and H′

restricted to the subspace, we have also simplified thingsby reducing the dimensionality of these matrices.

Sometimes, H′ can possess more symmetry than H.Sometimes, H′ can be separable even if H is not:

H l H′ = H1 ⊗ I + I⊗H2 (102)

To create such a situation for an arbitrary n×n Hamilto-nian, where n = n1n2, simply pick a state ρ such that thespectral densities pk vanish for all except n1 +n2− 1 en-ergy eigenvectors. This means that in the energy eigenba-sis, with the eigenvectors sorted to place these n1+n2−1special ones first, ρ is a block-diagonal matrix vanishingoutside of the upper left (n1+n2−1)×(n1+n2−1) block.Equation (52) shows that ρ(t) will retain this block formfor all time, and that changing the energy eigenvalues Ekwith k > n1 +n2− 1 leaves the time-evolution of ρ unaf-fected. We can therefore choose these eigenvalues so thatH becomes separable. For example, for the case wherethe Hilbert space dimensionality n = 9, suppose that pkvanishes for all energies except E0, E1, E2, E3, E4, andadjust the irrelevant zero-point energy so that E0 = 0.Then define H′ whose 9 eigenvalues are 0 E1 E2

E3 E1 + E3 E2 + E3

E4 E1 + E4 E2 + E4

. (103)

Note that H′ l H, and that although H is genericallynot separable, H′ is separable, with subsystem Hamilto-nians H′1 = diag {0, E1, E2} and H′2 = diag {0, E3, E4}.Subsystems 1 and 2 will therefore evolve as a paralleluniverses governed by H′1 and H′1, respectively.

G. Minimizing quantum randomness

When we attempted to maximize the independence fora subsystem above, we implicitly wanted to maximizethe ability to predict the subsystems future state fromits present state. The source of unpredictability thatwe considered was influence from outside the subsystem,from the environment, which caused decoherence and in-creased subsystem entropy.

Since we are interested in modeling also conscious sys-tems, there is a second independent source of unpre-dictability that we need to consider, which can occur even

27

if there is no interaction with the environment: “quantumrandomness”. If the system begins in a single consciousstate and unitarily evolves into a superposition of subjec-tively distinguishable conscious states, then the observerin the initial state has no way of uniquely predicting herfuture perceptions.

A comprehensive framework for treating such situa-tions is given in [47], and in the interest of brevity, wewill not review it here, merely use the results. To be ableto state them as succinctly as possible, let us first intro-duce notation for a projection process “pr ” that is in asense dual to partial-tracing.

For a Hilbert space that is factored into two parts,we define the following notation. We indicate the tensorproduct structure by splitting a single index α into an in-dex pair ii′. For example, if the Hilbert space is the ten-sor product of an m-dimensional and an n-dimensionalspace, then α = n(i − 1) + i′, i = 1, ...,m, i′ = 1, ..., n,α = 1, ...,mn, and if A = B⊗C, then

Aαβ = Aii′jj′ = BijCi′j′ . (104)

We define ? as the operation exchanging subsystems 1and 2:

(A?)ii′jj′ = Ai′ij′j (105)

We define pr kA as the kth diagonal block of A:

(prk

A)ij = Akikj

For example, pr 1A is the m×m upper left corner of A.As before tr iA, denotes the partial trace over the ith

subsystem:

(tr1

A)ij =∑k

Akikj (106)

(tr2

A)ij =∑k

Aikjk (107)

The following identities are straightforward to verify:

tr1

A? = tr2

A (108)

tr2

A? = tr1

A (109)

tr1

A =∑k

prk

A (110)

tr2

A =∑k

prk

A? (111)

tr prk

A = (tr2

A)kk (112)

tr prk

A? = (tr1

A)kk (113)

Let us adopt the framework of [47] and decomposethe full Hilbert space into three parts corresponding tothe subject (the conscious degrees of freedom of the ob-server), the object (the external degrees of freedom that

the observer is interested in making predictions about)and the environment (all remaining degrees of freedom).

If the subject knows the object-environment densitymatrix to be ρ, it obtains its density matrix for the objectby tracing out the environment:

ρo = treρ.

If the subject-object density matrix is ρ, then the sub-ject may be in a superposition of having many differentperceptions |sk〉. Take the |sk〉 to form a basis of thesubject Hilbert space. The probability that the subjectfinds itself in the state |sk〉 is

pk = (tr2ρ)kk, (114)

and for a subject finding itself in this state |sk〉, the objectdensity matrix is

ρ(k)o =pr k ρ

pk. (115)

If ρ refers to a future subject-object state, and the sub-ject wishes to predict its future knowledge of the object,it takes the weighted average of these density matrices,obtaining

ρo =∑k

pkρ(k)o =

∑k

prkρ = tr

sρ,

i.e., it traces out itself! (We used the identity (110) inthe last step.) Note that this simple result is indepen-dent of whatever basis is used for the object-space, so allissues related to how various states are perceived becomeirrelevant.

As proven in [48], any unitary transformation of a sep-arable ρ will increase the entropy of tr 1ρ. This meansthat the subject’s future knowledge of ρo is more un-certain than its present knowledge thereof. However, asproven in [47], the future subject’s knowledge of ρo willon average be less uncertain than it presently is, at least ifthe time-evolution is restricted to be of the measurementtype.

The result ρo = tr 1 ρ also holds if you measure theobject and then forget what the outcome was. In thiscase, you are simply playing the role of an environment,resulting in the exact same partial-trace equation.

In summary, for a conscious system to be able to pre-dict the future state of what it cares about (ρo) as wellas possible, we must minimize uncertainty introducedboth by the interactions with the environment (fluctu-ation, dissipation and decoherence) and by measurement(“quantum randomness”). The future evolution can bebetter predicted for certain object states than for oth-ers, because they are more stable against both of theabove-mentioned sources of unpredictability. The util-ity principle from Table II suggests that it is preciselythese most stable and predictable states that consciousobservers will perceive. The successful “predictability

28

sieve” idea of Zurek and collaborators [50] involves pre-cisely this idea when the source of unpredictability isenvironment-induced decoherence, so the utility principlelets us generalize this idea to include the second unpre-dictability source as well: to minimize apparent quantumrandomness, we should pay attention to states whose dy-namics lets them remain relatively diagonal in the eigen-basis of the subject-object interaction Hamiltonian, sothat our future observations of the object are essentiallyquantum non-demolition measurements.

A classical computer is a flagship example of a sucha maximally causal system, minimizing its uncertaintyabout its future. By clever design, a small subset ofthe degrees of freedom in the computer, interpreted asbits, deterministically determine their future state withvirtually no uncertainty. For my laptop, each bit corre-sponds to the positions of certain electrons in its memory(determining whether a micro-capacitor is charged). Anideal computer with zero error rate thus has not onlycomplex dynamics (which is Turing-complete modulo re-source limitations), but also perfect autonomy, with itsfuture state determined entirely by its own state, inde-pendently of the environment state. The Hilbert spacefactorization that groups the bits of this computer intoa subsystem is therefore optimal, in the sense that anyother factorization would reduce the autonomy. More-over, this optimal solution to the quantum factorizationproblem is quite sharply defined: considering infinites-imal unitary transformations away from this optimum,any transformation that begins rotating an environmentbit into the system will cause a sharp reduction of theautonomy, because the decoherence rate for environmentqubits (say a thermal collision frequency ∼ 1015 Hz) isorders of magnitude larger than the dynamics rate (saythe clock frequency ∼ 109 Hz). Note that H3 is far fromzero in this example; the pointer basis corresponds toclassical bit strings of which the environment performsfrequent quantum non-demolition measurements.

This means that if artificial intelligence researchers oneday succeed in making a classical computer conscious,and if we turn off any input devices though which ouroutside world can affect its information processing, thenit will subjectively perceive itself as existing in a paralleluniverse completely disconnected from ours, even thoughwe can probe its internal state from outside. If a futurequantum computer is conscious, then it will feel like in aparallel universe evolving under the Hamiltonian H1(t)that we have designed for it — until the readout stage,when we switch on an interaction H3.

H. Optimizing autonomy when the state is given

Let us now consider the case where both H and ρ aretreated as given, and we want to vary the Hilbert spacefactorization to attain maximal separability. H and ρtogether determine the full time-evolution ρ(t) via theSchrodinger equation, so we seek the unitary transforma-

tion U that makes Uρ(t)U† as factorizable as possible.For a pure initial state, exact factorability is equivalentto ρ1(t) being pure, with ||ρ1|| = 1 and vanishing linearentropy Slin = 1−||ρ1(t)||2, so let us minimize the linearentropy averaged over a range of times. As a concreteexample, we minimize the function

f(U) ≡ 1− 1

m

m∑i=1

|| tr1

Uρ(ti)U†||2, (116)

using 9 equispaced times ti ranging from t = 0 and t = 1,a random 4×4 Hamiltonian H, and a random pure stateρ(0).

0 2 4 6 8 10

0.2

0.4

0.6

0.8

1.0

Time

Nor

m ||ρ

1||

New

factorization

Old factorization

FIG. 15: The Hilbert-Schmidt norm ||ρ1|| is plotted for a ran-dom pure-state 2-qubit system when factorizing the Hilbertspace in the original basis (black curve) and after a unitarytransformation optimized to keep ρ1 as pure as possible fort ≤ 1 (red/grey curve).

The result of numerically solving this optimizationproblem is shown in Figure 15, and we see that the newfactorization keeps the norm ||ρ1|| visually indistinguish-able from unity for the entire time period optimized for.The optimization reduced the average Shannon entropyover this period from S ≈ 1.1 bits to S = 0.0009 bits.

The reason that the optimization is so successful ispresumably that it by adjusting N = n2 − n21 − n22 =16 − 4 − 4 = 8 real parameters15 in U, it is able toapproximately zero out the first N terms in the Taylorexpansion of Slin(t), whose leading terms are given byequations (77)- (79). A series of similar numerical exper-iments indicated that such excellent separability couldgenerally be found as long as the number of time stepsti was somewhat smaller than the number of free pa-rameters N but not otherwise, suggesting that separa-bility can be extended over long time periods for large

15 There are n2 parameters for U, but transformations within eachof the two subspaces have no effect, wasting n2

1 and n22 parame-

ters.

29

n. However, because we are studying only unitary evolu-tion here, neglecting the important projection effect fromthe previous section, it is unclear how relevant these re-sults are to our underlying goal. We have therefore notextended these numerical optimizations, which are quitetime-consuming, to larger n.

V. CONCLUSIONS

In this paper, we have explored two problems that areintimately related. The first problem is that of under-standing consciousness as a state of matter, “perceptro-nium”. We have focused not on solving this problem,but rather on exploring the implications of this view-point. Specifically, we have explored four basic principlesthat may distinguish conscious matter from other phys-ical systems: the information, integration, independenceand dynamics principles.

The second one is the physics-from-scratch problem:If the total Hamiltonian H and the total density ma-trix ρ fully specify our physical world, how do we ex-tract 3D space and the rest of our semiclassical worldfrom nothing more than two Hermitian matrices? Cansome of this information be extracted even from H alone,which is fully specified by nothing more than its eigen-value spectrum? We have focused on a core part of thischallenge which we have termed the quantum factoriza-tion problem: why do conscious observers like us perceivethe particular Hilbert space factorization correspondingto classical space (rather than Fourier space, say), andmore generally, why do we perceive the world around usas a dynamic hierarchy of objects that are strongly inte-grated and relatively independent?

These two problems go hand in hand, because a genericHamiltonian cannot be decomposed using tensor prod-ucts, which would correspond to a decomposition of thecosmos into non-interacting parts, so there is some op-timal factorization of our universe into integrated andrelatively independent parts. Based on Tononi’s work,we might expect that this factorization, or some gener-alization thereof, is what conscious observers perceive,because an integrated and relatively autonomous infor-mation complex is fundamentally what a conscious ob-server is.

A. Summary of findings

We first explored the integration principle, and foundthat classical physics allows information to be essentiallyfully integrated using error-correcting codes, so that anysubset containing up to about half the bits can be re-constructed from the remaining bits. Information storedin Hopfield neural networks is naturally error-corrected,but 1011 neurons support only about 37 bits of integratedinformation. This leaves us with an integration paradox:why does the information content of our conscious expe-

rience appear to be vastly larger than 37 bits? We foundthat generalizing these results to quantum informationexacerbated this integration paradox, allowing no morethan about a quarter of a bit of integrated information— and this result applied not only to Hopfield networksof a given size, but to the state of any quantum system ofany size. This strongly implies that the integration prin-ciple must be supplemented by at least one additionalprinciple.

We next explored the independence principle and theextent to which a Hilbert space factorization can decom-pose the Hamiltonian H (as opposed to the state ρ) intoindependent parts. We quantified this using projectionoperators in the Hilbert-Schmidt vector space where Hand ρ are viewed as vectors rather than operators, andproved that the best decomposition can always be foundin the energy eigenbasis, where H is diagonal. This leadsto a more pernicious variant of the Quantum Zeno Ef-fect that we termed the Quantum Zeno Paradox: if wedecompose our universe into maximally independent ob-jects, then all change grinds to a halt. Since consciousobservers clearly do not perceive reality as being staticand unchanging, the integration and independence prin-ciples must therefore be supplemented by at least oneadditional principle.

We then explored the dynamics principle, accordingto which a conscious system has the capacity to notonly store information, but also to process it. We found

the energy coherence δH ≡√

2 tr ρ2 to be a conve-nient measure of dynamics: it can be proven to be time-independent, and it reduces to the energy uncertainty∆H for the special case of pure states. Maximizing dy-namics alone gives boring periodic solutions unable tosupport complex information processing, but reducingδH by merely a modest percentage enables chaotic andcomplex dynamics that explores the full dimensionality ofthe Hilbert space. We found that high autonomy (a com-bination of dynamics and independence) can be attainedeven if the environment interaction is strong. One classof examples involves the environment effectively perform-ing quantum-non-demolition measurements of the au-tonomous system, whose internal dynamics causes thenon-negligible elements of the density matrix ρ to “slidealong the diagonal” in the measured basis, remainingin the low-decoherence subspace. We studied such anexample involving a truncated harmonic oscillator cou-pled to an external spin, and saw that it is easy to findclasses of systems whose autonomy grows exponentiallywith the system size (measured in qubits). Generalizedcoherent states with Gaussian wavefunctions appearedparticularly robust toward interactions with steep/short-range potentials. We found that any given H can also beperfectly decomposed given a suitably chosen ρ that as-signs zero amplitude to some energy eigenstates. Whenoptimizing the Hilbert space factorization for H and ρjointly, it appears possible to make a subsystem historyρ1(t) close to separable for a long time. However, it isunclear how relevant this is, because the state projection

30

caused by observation also alters ρ1.

B. How does a conscious entity perceive the world?

What are we to make of these findings? We have notsolved the quantum factorization problem, but our re-sults have brought it into sharper focus, and highlightedboth concrete open sub-problems and various hints andclues from observation about paths forward. Let us firstdiscuss some open problems, then turn to the hints.

For the physics-from-scratch problem of deriving howwe perceive our world from merely H, ρ and theSchrodinger equation, there are two possibilities: eitherthe problem is well-posed or it is not. If not, this wouldbe very interesting, implying that some sort of additionalstructure beyond ρ and H is needed at the fundamen-tal level — some additional mathematical structure en-coding properties of space, for instance, which would besurprising given that this appears unnecessary in latticeGauge theory (see Appendix C). Since we have limitedour treatment to unitary non-relativistic quantum me-chanics, obvious candidates for missing structure relateto relativity and quantum gravity, where the Hamiltonianvanishes, and to mechanisms causing non-unitary wave-function collapse. Indeed, Penrose and others have spec-ulated that gravity is crucial for a proper understandingof quantum mechanics even on small scales relevant tobrains and laboratory experiments, and that it causesnon-unitary wavefunction collapse [51]. Yet the Occam’srazor approach is clearly the commonly held view thatneither relativistic, gravitational nor non-unitary effectsare central to understanding consciousness or how con-scious observers perceive their immediate surroundings:astronauts appear to still perceive themselves in a semi-classical 3D space even when they are effectively in a zero-gravity environment, seemingly independently of rela-tivistic effects, Planck-scale spacetime fluctuations, blackhole evaporation, cosmic expansion of astronomically dis-tant regions, etc.

If, on the other hand, the physics-from-scratch prob-lem is well-posed, we face crucial unanswered questionsrelated to Hilbert space factorization. Why do we per-ceive electromagnetic waves as transferring informationbetween different regions of space, rather than as com-pletely independent harmonic oscillators that each stayput in a fixed spatial location? These two viewpointscorrespond to factoring the Hilbert space of the elec-tromagnetic field in either real space or Fourier space,which are simply two unitarily equivalent Hilbert spacebases. Moreover, how can we perceive a harmonic oscil-lator as an integrated system when its Hamiltonian can,as reviewed in Appendix B, be separated into completelyindependent qubits? Why do we perceive a magnetic sys-tem described by the 3D Ising model as integrated, whenit separates into completely independent qubits after a

unitary transformation?16 In all three cases, the answerclearly lies not within the system itself (in its internaldynamics H1), but in its interaction H3 with the rest ofthe world. But H3 involves the factorization problem allover again: whence this distinction between the systemitself and the rest of the world, when there are countlessother Hilbert space factorizations that mix the two?

C. Open problems

Based on our findings, three specific problems standin the way of solving the quantum factorization problemand answering these questions, and we will now discusseach of them in turn.

1. Factorization and the chicken-and-egg problem

What should we determine first: the state or the fac-torization? If we are given a Hilbert space factorizationand an environment state, we can use the predictabilitysieve formalism [50] to find the states of our subsystemthat are most robust toward decoherence. In some sim-ple cases, they are eigenstates of the effective interactionHamiltonian H∗ from equation (66). However, to findthe best factorization, we need information about thestate. A clock is a highly autonomous system if we fac-tor the Hilbert space so that the first factor correspondsto the spatial volume containing the clock, but if thestate were different such that the clock were somewhereelse, we should factor out a different volume. Moreover,if the state has the clock in a superposition of two macro-scopically different locations, then there is no single op-timal factorization, but instead a separate one for eachbranch of the wavefunction. An observer looking at theclock would use the clock position seen to project ontothe appropriate branch using equation (115), so the solu-tion to the quantum factorization problem that we shouldbe looking for is not a single unique factorization of theHilbert space. Rather, we need a criterion for identifyingconscious observers, and then a prescription that deter-mines which factorization each of them will perceive.

2. Factorization and the integration paradox

A second challenge that we have encountered is theextreme separability possible for both H and ρ. In the

16 If we write the Ising Hamiltonian as a quadratic function ofσx-operators, then it is also quadratic in the annihilation andcreation operators and can therefore be diagonalized after aJordan-Wigner transform [49]. Note that such diagonalizationis impossible for the Heisenberg ferromagnet, whose couplingsare quadratic in all three Pauli matrices, because σ2

z -terms arequartic in the annihilation and creation operators.

31

introduction, we expressed hope that the apparent inte-gration of minds and external objects might trace back tothe fact that for generic ρ and H, there is no Hilbert spacefactorization that makes ρ factorizable or H additivelyseparable. Yet by generalizing Tononi’s ideas to quantumsystems, we found that what he terms the “cruelest cut”is very cruel indeed, able to reduce the mutual informa-tion in ρ to no more than about 0.25 bits, and typicallyable to make the interaction Hamiltonian H3 very smallas well. We saw in Section IV H that even the combinedeffects ρ and H can typically be made close to separable,in the sense that there is a Hilbert space factorizationwhere a subsystem history ρ1(t) is close to separable fora long time. So why do we nonetheless perceive out uni-verse as being relatively integrated, with abundant infor-mation available to us from near and far? Why do we notinstead perceive our mind as essentially constituting itsown parallel universe, solipsism-style, with merely expo-nentially small interactions with the outside world? Wesaw that the origin of this integration paradox is the vast-ness of the group of unitary transformations that we areminimizing over, whose number of parameters scales liken2 = 22b with the number of qubits b and thus grows ex-ponentially with system size (measured in either volumeor number of particles).

3. Factorization and the emergence of time

A third challenge involves the emergence of time. Al-though this is a famously thorny problem in quantumgravity, our results show that it appears even in non-relativistic unitary quantum mechanics. It is intimatelylinked with our factorization problem, because we areoptimizing over all unitary transformations U, and timeevolution is simply a one-dimensional subset of thesetransformations, given by U = eiHt. Should the opti-mal factorization be determined separately at each time,or only once and for all? In the latter case, this wouldappear to select only one special time when our universeis optimally separable, seemingly contrary to our obser-vations that the laws of physics are time-translation in-variant. In the former case, the continuous change infactorization will simply undo time evolution [18], mak-ing you feel that time stands still! Observationally, it isobvious that the optimal factorization can change at leastsomewhat with time, since our designation of objects istemporary: the atoms of a highly autonomous woodenbowling ball rolling down a lane were once dispersed (asCO2 and H2O in the air, etc.) and will eventually disperseagain.

An obvious way out of this impasse is to bring con-sciousness back to center-stage as in Section IV G and[4, 47, 48]. Whenever a conscious observer interacts withher environment and gains new information, the state ρwith which she describes her world gets updated accord-ing to equation (115), the quantum-mechanical versionof Bayes Theorem [48]. This change in her ρ is non-

unitary and therefore evades our timelessness argumentabove. Because she always perceives herself in a purestate, knowing the state of her mind, the joint state orher and the rest of the world is always separable. It there-fore appears that if we can one day solve the quantumfactorization problem, then we will find that the emer-gence of time is linked to the emergence of consciousness:the former cannot be fully understood without the latter.

D. Observational hints and clues

In summary, the quantum factorization problem isboth very interesting and very hard. However, as op-posed to the hard problem of quantum gravity, say, wherewe have few if any observational clues to guide us, physicsresearch has produced many valuable hints and clues rel-evant to the quantum factorization problem. The factor-ization of the world that we perceive and the quantumstates that we find objects in have turned out to be ex-ceptionally unusual and special in various ways, and foreach such way that we can identify, quantify and under-stand the underlying principle responsible for, we willmake another important stride towards solving the fac-torization problem. Let us now discuss the hints that wehave identified upon so far.

1. The universality of the utility principle

The principles that we listed in Table II were for con-scious systems. If we shift attention to non-consciousobjects, we find that although dynamics, independenceand integration still apply in many if not most cases, theutility principle is the only one that universally appliesto all of them. For example, a rain drop lacks significantinformation storage capacity, a boulder lacks dynamics,a cogwheel can lack independence, and a sand pile lacksintegration. This universality of the utility principle ishardly surprising, since utility is presumably the reasonwe evolved consciousness in the first place. This suggeststhat we examine all other clues below through the lens ofutility, to see whether the unusual circumstances in ques-tion can be explained via some implication of the utilityprinciple. In other words, if we find that useful conscious-ness can only exist given certain strict requirements onthe quantum factorization, then this could explain whywe perceive a factorization satisfying these requirements.

2. ρ is exceptional

The observed state ρ of our universe is excep-tional in that it is extremely cold, with most of theHilbert space frozen out — what principles might re-quire this? Perhaps this is useful for consciousnessby allowing relatively stable information storage andby allowing large autonomous systems thanks to the

32

large available dynamic range in length scales (uni-verse/brain/atom/Planck scale)? Us being far from ther-mal equilibrium with our 300K planet dumping heat fromour 6000K sun into our 3K space is clearly conducive todynamics and information processing.

3. H is exceptional

The Hamiltonian H of the standard model of particlephysics is of the very special form

H =

∫Hr(r)d3r, (117)

which is seen to be almost additively separable in thespatial basis, and in no other basis. Although equa-tion (117) superficially looks completely separable justas H =

∑i Hi, there is a coupling between infinitesi-

mally close spatial points due to spatial derivatives inthe kinetic terms. If we replace the integral by a sum inequation (117) by discretizing space as in lattice gaugetheory, we need couplings only between nearest-neighborpoints. This is a strong hint of the independence princi-ple at work; all this near-independence gets ruined by ageneric unitary transformation, making the factorizationcorresponding to our 3D physical space highly special;indeed, 3D space and the exact form of equation (117)could presumably be inferred from simply knowing thespectrum of H.

H from equation (117) is also exceptional in that itcontains mainly quadratic, cubic and quartic functionsof the fermion and boson fields, which can in turn beexpressed linearly or quadratically in terms of qubit rais-ing and lowering operators (see Appendix C). A genericunitary transformation would ruin this simplicity as well,introducing polynomials of enormous degree. What prin-ciple might be responsible for this?

H from equation (117) is also exceptional by exhibitingtremendous symmetry: the form of Hr in invariant underboth space and time translation, and indeed under thefull Poincare group; using a factorization other than 3Dspace would ruin this symmetry.

4. The ubiquity of autonomy

When discussing the integration paradox above, weworried about factorizations splitting the world intonearly independent parts. If there is a factorization withH3 = 0, then the two subsystems are independent for anystate, for all time, and will act as two parallel universes.This means that if the only way to achieve high inde-pendence were to make H3 tiny, the integration paradoxwould indeed be highly problematic. However, we saw inSection IV that this is not at all the case: it is quite easyto achieve high independence for some states, at leasttemporarily, even when H3 is large. The independenceprinciple therefore does not push us inexorably towards

perceiving a more disconnected world than the one we arefamiliar with. The ease of approximately factoring ρ1(t)during a significant time period as in Section IV H alsoappears unlikely to be a problem: as mentioned, our cal-culation answered the wrong question by studying onlyunitary evolution, neglecting projection. The take-awayhint is thus that observation needs to be taken into ac-count to address this issue properly, just as we arguedthat it must be taken into account to understand theemergence of time.

5. Decoherence as enemy

Early work on decoherence [34, 35] portrayed it mainlyas an enemy, rapidly killing off most quantum states,with only a tiny minority surviving long enough to beobservable. For example, a bowling ball gets struck byabout 1025 air molecules each second, and a single strikesuffices to ruin any macrosuperposition of the balls po-sition extending further than about an angstrom, themolecular De Broglie wavelength [35, 52]. The successfulpredictability sieve idea of Zurek and collaborators [50]states that we will only perceive those quantum statesthat are most robust towards decoherence, which in thecase of macroscopic objects such as bowling balls selectsroughly classical states with fairly well-defined positions.In situations where the position basis emerges as special,this might thus trace back to the environmental interac-tions H3 (with air molecules etc.) probing the position,which might in turn traces back to the fact that H fromequation (117) is roughly separable in the position basis.Note, however, that the general situation is more compli-cated, since the predictability sieve depends also on thestate ρ, which might contain long-distance entanglementbuilt up over time by the kinetic terms in equation (117).Indeed, ρ can describe a laboratory where a system isprobed in a non-spatial basis, causing the predictablitysieve to favor, say, energy eigenstates.

In terms of Table II, we can view the predictabilitysieve as an application of the utility principle, since thereis clearly no utility in trying to perceive something thatwill be irrelevant 10−25 seconds later. In summary, thehint from this negative view of decoherence is that weshould minimize it, either by factoring to minimize H3

itself or by using robust states on which H3 essentiallyperforms quantum non-demolition measurements.

6. Decoherence as friend

Although quantum computer builders still view deco-herence as their enemy, more recent work on decoher-ence has emphasized that it also has a positive side: theQuantum Darwinism framework [40] emphasizes the roleof environment interactions H3 as a valuable communica-tion channel, repeatedly copying information about the

33

states of certain systems into the environment17, therebyhelping explain the emergence of a consensus reality [53].Quantum Darwinism can also be viewed as an applica-tion of the utility principle: it is only useful for us totry to be aware of things that we can get informationabout, i.e., about states that have quantum-spammedthe environment with redundant copies of themselves. Ahint from this positive view of environmental interactionsis that we should not try to minimize H3 after all, butshould instead reduce decoherence by the second mech-anism: using states that are approximate eigenstates ofthe effective interaction H∗ and therefore get abundantlycopied into the environment.

Further work on Quantum Darwinism has revealedthat such situations are quite exceptional, reaching thefollowing conclusion [54]: “A state selected at randomfrom the Hilbert space of a many-body system is over-whelmingly likely to exhibit highly non-classical correla-tions. For these typical states, half of the environmentmust be measured by an observer to determine the stateof a given subsystem. The objectivity of classical reality— the fact that multiple observers can agree on the stateof a subsystem after measuring just a small fraction ofits environment — implies that the correlations found innature between macroscopic systems and their environ-ments are very exceptional.” This gives a hint that theparticular Hilbert space factorization we observe mightbe very special and unique, so that using the utility prin-ciple to insist on the existence of a consensus reality mayhave large constraining power among the factorizations— perhaps even helping nail down the one we actually

observe.

E. Outlook

In summary, the hypothesis that consciousness can beunderstood as a state of matter leads to fascinating in-terdisciplinary questions spanning the range from neu-roscience to computer science, condensed matter physicsand quantum mechanics. Can we find concrete exam-ples of error-correcting codes in the brain? Are therebrain-sized non-Hopfield neural networks that supportmuch more than 37 bits of integrated information? Cana deeper understanding of consciousness breathe new lifeinto the century-old quest to understand the emergenceof a classical world from quantum mechanics, and can iteven help explain how two Hermitian matrices H and ρlead to the subjective emergence of time? The quests tobetter understand the internal reality of our mind andthe external reality of our universe will hopefully assistone another.

Acknowledgments: The author wishes to thankChristoph Koch, Meia Chita-Tegmark, Russell Hanson,Hrant Gharibyan, Seth Lloyd, Bill Poirier, MatthewPusey, Harold Shapiro and Marin Soljacic and for help-ful information and discussions, and Hrant Gharibyanfor mathematical insights regarding the ρ- and H-diagonality theorems. This work was supported by NSFAST-090884 & AST-1105835.

17 Charles Bennett has suggested that Quantum Darwinism wouldbe more aptly named “Quantum Spam”, since the many redun-dant imprints of the system’s state are normally not further re-

produced.

[1] A. Almheiri, D. Marolf, J. Polchinski, and J. Sully, JHEP2, 62 (2013).

[2] T. Banks, W. Fischler, S. Kundu, and J. F. Pedraza,arXiv:1401.3341 (2014).

[3] S. Saunders, J. Barrett, A. Kent, and D. Wallace, ManyWorlds? Everett, Quantum Theory, & Reality (Oxford,Oxford Univ. Press, 2010).

[4] M. Tegmark, PRE 61, 4194 (2000).[5] D. J. Chalmers, J. Consc. Studies 2, 200 (1995).[6] P. Hut, M. Alford, and M. Tegmark, Found. Phys. 36,

765 (2006, physics/0510188).[7] M. Tegmark, Found.Phys. 11/07, 116 (2007).[8] G. Tononi, Biol. Bull. 215, 216, http://www.biolbull.

org/content/215/3/216.full (2008).[9] S. Dehaene, Neuron 70, 200 (2011).

[10] G. Tononi, Phi: A Voyage from the Brain to the Soul(New York, Pantheon, 2012).

[11] A. Casali et al., Sci. Transl. Med 198, 1 (2013).[12] M. Oizumi, L. Albantakis, and Tononi G, PLoS comp.

bio, e1003588 (2014).[13] S. Dehaene et al., Current opinion in neurobiology 25,

76 (2014).[14] B. A. Wilson and D. Wearing 1995, in Broken memories:

Case studies in memory impairment, ed. R. Campbelland M. A. Conway (Malden: Blackwell)

[15] I. Amato, Science 253, 856 (1991).[16] S. Lloyd, Nature 406, 1-47 (2000).[17] G. t’Hooft, arXiv:gr-qc/9310026 (1993).[18] J. Schwindt, arXiv:1210.8447 [quant-ph] (2012).[19] A. Damasio, Self Comes to Mind: Constructing the Con-

scious Brain (New York, Vintage, 2010).[20] R. W. Hamming, The Bell System Technical Journal 24,

2 (1950).[21] M. Grassl, http://i20smtp.ira.uka.de/home/grassl/

codetables/

[22] J. J. Hopfield, Proc. Natl. Acad. Sci. 79, 2554 (1982).[23] N. J. Joshi, G. Tononi, and C. Koch, PLOS Comp. Bio.

9, e1003111 (2013).[24] O. Barak et al., Progr. Neurobio. 103, 214 (2013).[25] Yoon K et al., Nature Neuroscience 16, 1077 (2013).[26] D. J. C McKay, Information Theory, Inference, and

Learning Algorithms (Cambridge, Cambridge UniversityPress, 2003).

[27] K. Kupfmuller, Nachrichtenverarbeitung im Menschen,in Taschenbuch der Nachrichtenverarbeitung, K. Stein-buch, Ed., 1481-1502 (1962).

http://www.biolbull.org/content/215/3/216.full

http://www.biolbull.org/content/215/3/216.full

http://i20smtp.ira.uka.de/home/grassl/codetables/

http://i20smtp.ira.uka.de/home/grassl/codetables/

34

[28] T. Nørretranders, The User Illusion: Cutting Conscious-ness Down to Size (New York, Viking, 1991).

[29] S. Jevtic, Jennings D, and T. Rudolph, PRL 108, 110403(2012).

[30] J. von Neumann., Die mathematischen Grundlagen derQuantenmechanik (Berlin., Springer, 1932).

[31] A. W. Marshall, I. Olkin, and B,. Inequalities: Theory ofMajorization and Its Applications, 2nd ed. Arnold (NewYork, Springer, 2011).

[32] S. Bravyi, Quantum Inf. and Comp. 4, 12 (2004).[33] W. H. Zurek, quant-ph/0111137 (2001).[34] H. D. Zeh, Found.Phys. 1, 69 (1970).[35] E. Joos and H. D. Zeh, Z. Phys. B 59, 223 (1985).[36] W. H. Zurek, S. Habib, and J. P. Paz, PRL 70, 1187

(1993).[37] D. Giulini, E. Joos, C. Kiefer, J. Kupsch, I. O. Sta-

matescu, and H. D. Zeh, Decoherence and the Appear-ance of a Classical World in Quantum Theory (Springer,Berlin, 1996).

[38] W. H. Zurek, Nature Physics 5, 181 (2009).[39] M. Schlosshauer, Decoherence and the Quantum-To-

Classical Transition (Berlin, Springer, 2007).[40] W. H. Zurek, Nature Physics 5, 181 (2009).[41] E. C. G Sudarshan and B. Misra, J. Math. Phys. 18, 756

(1977).[42] R. Omnes, quant-ph/0106006 (2001).[43] J. Gemmer and G. Mahler, Eur. Phys J. D 17, 385

(2001).[44] T. Durt, Z. Naturforsch. 59a, 425 (2004).[45] W. H. Zurek, S. Habib, and J. P. Paz, PRL 70, 1187

(1993).[46] M. Tegmark and H. S. Shapiro, Phys. Rev. E 50, 2538

(1994).[47] H. Gharibyan and M. Tegmark, arXiv:1309.7349 [quant-

ph] (2013).[48] M. Tegmark, PRD 85, 123517 (2012).[49] Nielsen 2005, http://michaelnielsen.org/blog/

archive/notes/fermions_and_jordan_wigner.pdf

[50] D. A. R Dalvit, J. Dziarmaga, and W. H. Zurek, PRA72, 062101 (2005).

[51] R. Penrose, The Emperor’s New Mind (Oxford, OxfordUniv. Press, 1989).

[52] M. Tegmark, Found. Phys. Lett. 6, 571 (1993).[53] M. Tegmark, Our Mathematical Universe: My Quest for

the Ultimate Nature of Reality (New York, Knopf, 2014).[54] C. J. Riedel, W. H. Zurek, and M. Zwolak, New J. Phys.

14, 083010 (2012).[55] S. Lloyd, Programming the Universe (New York, Knopf,

2006).[56] Z. Gu and X. Wen, Nucl.Phys. B 863, 90 (2012).[57] X. Wen, PRD 68, 065003 (2003).[58] M. A. Levin and X. Wen, RMP 77, 871 (2005).[59] M. A. Levin and X. Wen, PRB 73, 035122 (2006).[60] M. Tegmark and L. Yeh, Physica A 202, 342 (1994).

Appendix A: Useful identities involving tensorproducts

Below is a list of useful identities involving tensor mul-tiplication and partial tracing, many of which are usedin the main part of the paper. Although they are allstraightforward to prove by writing them out in the in-

dex notation of equation (104), I have been unable tofind many of them in the literature. The tensor product⊗ is also known as the Kronecker product.

(A⊗B)⊗C = A⊗ (B⊗C) (A1)

A⊗ (B + C) = A⊗B + A⊗C (A2)

(B + C)⊗A = B⊗A + C⊗A (A3)

(A⊗B)† = A† ⊗B† (A4)

(A⊗B)−1 = A−1 ⊗B−1 (A5)

tr [A⊗B] = (tr A)(tr B) (A6)

tr1

[A⊗B] = (tr A)B (A7)

tr2

[A⊗B] = (tr B)A (A8)

tr1

[A(B⊗ I)] = tr1

[(B⊗ I)A] (A9)

tr2

[A(I⊗B)] = tr2

[(I⊗B)A] (A10)

tr1

[(I⊗A)B] = A(tr1

B) (A11)

tr2

[(A⊗ I)B] = A(tr2

B) (A12)

tr1

[A(I⊗B)] = (tr1

A)B (A13)

tr2

[A(B⊗ I)] = (tr2

A)B (A14)

tr1

[A(B⊗C)] = tr1

[A(B⊗ I)]C (A15)

tr2

[A(B⊗C)] = tr2

[A(I⊗C)]B (A16)

tr1

[(B⊗C)A] = C tr1

[(A⊗ I)B] (A17)

tr2

[(B⊗C)A] = B tr2

[(I⊗C)A] (A18)

tr{

[(tr2

A)⊗ I]B}

= tr [(tr2

A)(tr2

B)] (A19)

tr{

[I⊗ (tr1

A)]B}

= tr [(tr1

A)(tr1

B)] (A20)

(A⊗B,C⊗D) = (A,C)(B,D) (A21)

||A⊗B|| = ||A|| ||B|| (A22)

Identities A11-A14 are seen to be special cases of iden-tities A15-A18. If we define the superoperators T1 andT2 by

T1A ≡ 1

n1I⊗ (tr 1A), (A23)

T2A ≡ 1

n2(tr 2A)⊗ I, (A24)

then identities A19-A20 imply that they are self-adjoint:

(T1A,B) = (A,T1B), (T2A,B) = (A,T2B).

They are also projection operators, since they satisfyT2

1 = T1 and T22 = T2.

Appendix B: Factorization of Harmonic oscillatorinto uncoupled qubits

If the Hilbert space dimensionality n = 2b for someinteger b, then the truncated harmonic oscillator Hamil-

http://michaelnielsen.org/blog/archive/notes/fermions_and_jordan_wigner.pdf

http://michaelnielsen.org/blog/archive/notes/fermions_and_jordan_wigner.pdf

35

tonian of equation (58) can be decomposed into b inde-pendent qubits: in the energy eigenbasis,

H =

b−1∑j=0

Hj , Hj = 2j(

12 00 − 1

2

)j

= 2j−1σzj , (B1)

where the subscripts j indicate that an operator acts onlyon the jth qubit, leaving the others unaffected. For ex-ample, for b = 3 qubits,

H =

(2 00 −2

)⊗ I⊗ I + I⊗

(1 00 −1

)⊗ I + I⊗ I⊗

(12 00 − 1

2

)

=

− 72 0 0 0 0 0 0 00 − 5

2 0 0 0 0 0 00 0 − 3

2 0 0 0 0 00 0 0 − 1

2 0 0 0 00 0 0 0 1

2 0 0 00 0 0 0 0 3

2 0 00 0 0 0 0 0 5

2 00 0 0 0 0 0 0 7

2

, (B2)

in agreement with equation (58). This factorization cor-responds to the standard binary representation of inte-gers, which is more clearly seen when adding back thetrace (n− 1)/2 = (2b − 1)/2:

H +7

2=

(4 00 0

)⊗ I⊗ I + I⊗

(2 00 0

)⊗ I + I⊗ I⊗

(1 00 0

)

=

0 0 0 0 0 0 0 00 1 0 0 0 0 0 00 0 2 0 0 0 0 00 0 0 3 0 0 0 00 0 0 0 4 0 0 00 0 0 0 0 5 0 00 0 0 0 0 0 6 00 0 0 0 0 0 0 7

. (B3)

Here we use the ordering convention that the most sig-nificant qubit goes to the left. If we write k as

k =b−1∑j=0

kj2j ,

where kj are the binary digits of k and take values 0 or1, then the energy eigenstates can be written

|Ek〉 =b−1⊗j=0

(σ†)kj |0〉, (B4)

where |0〉 is the ground state (all b qubits in the downstate), the creation operator

σ† =

(0 10 0

)(B5)

raises a qubit from the down state to the up state, and(σ†)0 is meant to be interpreted as the identity matrix

I. For example, since the binary representation of 6 is“110”, we have

|E6〉 = σ† ⊗ σ† ⊗ I|0〉 = |110〉, (B6)

the state where the first two qubits are up and the lastone is down. Since (σ†)kj

(01

)is an eigenvector of σz with

eigenvalue (2kj − 1), i.e., +1 for spin up and −1 for spindown, equations (B1) and (B4) give H|Ek〉 = Ek|Ek〉,where

Ek =

b−1∑j=0

2j−1(2kj − 1)|Ek〉 = k − 2b − 1

2(B7)

in agreement with equation (58).The standard textbook harmonic oscillator corre-

sponds to the limit b → ∞, which remains completelyseparable. In practice, a number of qubits b = 200 islarge enough to be experimentally indistinguishable fromb = ∞ for describing any harmonic oscillator ever en-countered in nature, since it corresponds to a dynamicrange of 2200 ∼ 1060, the ratio between the largest andsmallest potentially measurable energies (the Planck en-ergy versus the energy of a photon with wavelength equalto the diameter of our observable universe). So far, wehave never measured any physical quantity to better than17 significant digits, corresponding to 56 bits.

Appendix C: Emergent space and particles fromnothing but qubits

Throughout the main body of our paper, we have lim-ited our discussion to a Hilbert space of finite dimen-sionality n, often interpreting it as b qubits with n = 2b.On the other hand, textbook quantum mechanics usuallysets n =∞ and contains plenty of structure additional tomerely H and ρ, such as a continuous space and variousfermion and boson fields. The purpose of this appendixis to briefly review how the latter picture might emergefrom the former. An introduction to this “it’s all qubits”approach by one of its pioneers, Seth Lloyd, is given in[55], and an up-to-date technical review can be found in[56].

As motivation for this emergence approach, note that alarge number of quasiparticles have been observed such asphonons, holes, magnons, rotons, plasmons and polarons,which are known not to be fundamental particles, butinstead mere excitations in some underlying substrate.This raises the question of whether our standard modelparticles may be quasiparticles as well. It has been shownthat this is indeed a possibility for photons, electrons andquarks [57–59], and perhaps even for gravitons [56], withthe substrate being nothing more than a set of qubitswithout any space or other additional structure.

In Appendix B, we saw how to build a harmonic oscil-lator out of infinitely many qubits, and that a truncatedharmonic oscillator built from merely 200 qubits is exper-imentally indistinguishable from an infinite-dimensional

36

one. We will casually refer to such a qubit collection de-scribing a truncated harmonic oscillator as a “qubyte”,even if the number of qubits it contains is not precisely8. As long as our universe is cold enough that the veryhighest energy level is never excited, a qubyte will behaveidentically to a true harmonic oscillator, and can be usedto define position and momentum operators obeying theusual canonical commutation relations.

To see how space can emerge from qubits alone, con-sider a large set of coupled truncated harmonic oscillators(qubytes), whose position operators qr and momentumoperators pr are labeled by an index r = (i, j, k) consist-ing of a triplet of integers — r has no a priori meaningor interpretation whatsoever except as a record-keepingdevice used to specify the Hamiltonian. Grouping theseoperators into vectors p and q, we choose the Hamilto-nian

H =1

2|p|2 +

1

2qtAq, (C1)

where the coupling matrix A is translationally invariant,i.e., Arr′ = ar′−r, depending only on the difference r′−rbetween two index vectors. For simplicity, let us treatthe lattice of index vectors r as infinite, so that A isdiagonalized by a 3D Fourier transform. (Alternatively,we can take the lattice to be finite and the matrix A tobe circulant, in which case A is again diagonalized by aFourier transform; this will lead to the emergence of atoroidal space.)

Fourier transforming our qubyte lattice preserves thecanonical commutation relations and corresponds to aunitary transformation that decomposes H into indepen-dent harmonic oscillators. As in [60], the frequency of theoscillator corresponding to wave vector κ is

ω(k)2 =∑r

are−iκ·r. (C2)

For example, consider the simple case where each oscilla-tor has a self-coupling µ and is only coupled to its 6 near-est neighbors by a coupling γ: a1,0,0 = a−1,0,0 = a0,1,0 =a0,−1,0 = a0,0,1 = a0,0,−1 = −γ2, a0,0,0 = µ2+6γ2. Then

ω(κ)2 = µ2 + 4γ2(

sin2 κx2

+ sin2 κy2

+ sin2 κz2

), (C3)

where κx, κy and κz lie in the interval [π, π]. If wewere to interpret the lattice points as existing in a three-dimensional space with separation a between neighboringlattice points, then the physical wave vector k would begiven by

k =κ

a. (C4)

Let us now consider a state ρ where all modes exceptlong-wavelength ones with |κ| � 1 are frozen out, in thespirit of our own relatively cold universe. Using the lsymbol from Section IV F, we then have H l H′, whereH′ is a Hamiltonian with the isotropic dispersion relation

ω2 = µ2 + γ2(κ2x + κ2y + κ2z

)= µ2 + γ2κ2, (C5)

i.e., where the discreteness effects are absent. Comparingthis with the standard dispersion relation for a relativisticparticle,

ω2 = µ2 + (ck)2, (C6)

where c is the speed of light, we see that the two agree ifthe lattice spacing is

a =c

γ. (C7)

For example, if the lattice spacing is the Planck length,then the coupling strength γ is the inverse Planck time.In summary, this Hilbert space built out of qubytes, withno structure whatsoever except for the Hamiltonian H,is physically indistinguishable from a system with quan-tum particles (scalar bosons of mass µ) propagating ina continuous 3D space with the same translational androtational symmetry that we normally associate withinfinite Hilbert spaces, so not only did space emerge,but continuous symmetries not inherent in the originalqubit Hamiltonian emerged as well. The 3D structure ofspace emerged from the pattern of couplings between thequbits: if they had been presented in a random order, thegraph of which qubits were coupled could have been ana-lyzed to conclude that everything could be simplified intoa 3D rectangular lattice with nearest-neighbor couplings.

Adding polarization to build photons and other vec-tor particles is straightforward. Building simple fermionfields using qubit lattices is analogous as well, except thata unitary Jordan-Wigner transform is required for con-verting the qubits to fermions. Details on how to buildphotons, electrons, quarks and perhaps even gravitonsare given in [56–59]. Lattice gauge theory works simi-larly, except that here, the underlying finite-dimensionalHilbert space is viewed not as the actual truth but asa numerically tractable approximation to the presumedtrue infinite-dimensional Hilbert space of quantum fieldtheory.

Documents

Consciousness as a State of Matter - arXiv · Consciousness as a State of Matter Max Tegmark ... be viewed as novel states of matter and investigated with traditional physics tools