
Multi-information in the Thermodynamic Limit

Ionas Erb
Nihat Ay

SFI WORKING PAPER: 2003-11-064

SFI Working Papers contain accounts of scientific work of the author(s) and do not necessarily represent the views of the Santa Fe Institute. We accept papers intended for publication in peer-reviewed journals or proceedings volumes, but not papers that have already appeared in print. Except for papers by our external faculty, papers must be based on work done at SFI, inspired by an invited visit to or collaboration at SFI, or funded by an SFI grant.

©NOTICE: This working paper is included by permission of the contributing author(s) as a means to ensure timely distribution of the scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the author(s). It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may be reposted only with the explicit permission of the copyright holder.

www.santafe.edu

SANTA FE INSTITUTE


Multi-Information in the Thermodynamic Limit

Ionas Erb∗† and Nihat Ay†‡§

November 10, 2003

Abstract

A multivariate generalization of mutual information, multi-information, is defined in the thermodynamic limit. The definition takes phase coexistence into account by taking the infimum over the translation-invariant Gibbs measures of an interaction potential. It is shown that this infimum is attained in a pure state. An explicit formula can be found for the Ising square lattice, where the quantity is proved to be maximized at the phase-transition point. By this, phase coexistence is linked to high model complexity in a rigorous way.

Keywords: Mutual information, Ising model, phase transitions, excess entropy, complexity.

∗Interdisciplinary Center for Bioinformatics, University of Leipzig, Kreuzstr. 7b, 04103 Leipzig, Germany; e-mail: [email protected]

†Max-Planck Institute for Mathematics, Inselstr. 22-26, 04103 Leipzig, Germany
‡Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501; e-mail: [email protected]
§Mathematical Institute, Friedrich-Alexander University Erlangen-Nuremberg, Bismarckstr. 1 1/2, 91054 Erlangen, Germany


1 Introduction

1.1 Why multi-information?

Shannon's mutual information compares the summed entropies $H$ of two distributions $p_1$, $p_2$ with the entropy of their joint distribution $p_{1,2}$:

$$I(p_{1,2}) = H(p_1) + H(p_2) - H(p_{1,2}). \quad (1)$$

There are several generalizations for finite sets $\Lambda$ with more than two subsystems. Keeping the two-point property of $I$, one can let the quantity depend on a distance between the elements of $\Lambda$ [Li1]. There are also multivariate generalizations. Co-information is an alternating sum of entropies of the marginals $p_V$ of a distribution $p_\Lambda$ for all subsystems $V \subset \Lambda$ [Be]. A simple multivariate generalization that is valid in any dimension is called multi-information [SV]:

$$I(p_\Lambda) = \sum_{i\in\Lambda} H(p_i) - H(p_\Lambda). \quad (2)$$

Below we will give a motivation for this quantity coming from information geometry. Also, in Section 4 we show the relationship to excess entropy. Let us already mention that in the limit of infinite shift-invariant systems, multi-information will again be mutual information, namely between an elementary subsystem and the infinite system.

Information-theoretic measures such as the ones cited above quantify stochastic interdependence in probability distributions. They are used in a variety of fields, e.g. communication theory [Sh], multivariate statistics [HKO], neural networks [Be], complexity measures [TSE, CF1], and learning rules [L, Ay2], to mention only a few.

The behaviour of a quantity like mutual information is best shown by a simple example: two units $x_1, x_2$ which can take values from $\{0,1\}$. Knowing the probabilities $p_{1,2}(x_1,x_2)$ of the four configurations, mutual information is given by (1). Let us introduce an additional parameter $\beta$ by which we can tune $p_{1,2}$. We define

$$p_\beta(x_1,x_2) := \frac{\left(p_{1,2}(x_1,x_2)\right)^\beta}{\sum_{x_1',x_2'\in\{0,1\}} \left(p_{1,2}(x_1',x_2')\right)^\beta}. \quad (3)$$

The denominator normalizes $p_\beta$. For $\beta = 0$, $p_\beta$ is the equidistribution, whereas $\beta \to \infty$ gives us the Dirac measure. For a generic choice of $p_{1,2}$, the function $\beta \mapsto I(p_\beta)$, let us call it $I(\beta)$, is shown in figure 1. The trajectory of the curve $\beta \mapsto p_\beta$ within the simplex of all probability measures for the four configurations is shown in figure 2. It ranges from the barycentre to one of the corners of the simplex. The Kullback-Leibler distance (see below) of this curve from the surface of independent distributions is mutual information.
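A minimal numerical sketch of this two-unit example, assuming an illustrative joint distribution $p_{1,2}$ that is not taken from the paper:

```python
import numpy as np

def mutual_information(p):
    """I(p) = H(p1) + H(p2) - H(p12) for a 2x2 joint distribution p (in nats)."""
    h = lambda q: -np.sum(q[q > 0] * np.log(q[q > 0]))   # Shannon entropy
    return h(p.sum(axis=1)) + h(p.sum(axis=0)) - h(p)

def tempered(p, beta):
    """The curve p_beta of equation (3): normalized beta-th power of p."""
    q = p ** beta
    return q / q.sum()

p12 = np.array([[0.4, 0.1],     # generic joint distribution of (x1, x2);
                [0.2, 0.3]])    # the values are purely illustrative
for beta in [0.0, 0.5, 1.0, 5.0, 50.0]:
    print(beta, mutual_information(tempered(p12, beta)))
# I(beta) vanishes at beta = 0 (equidistribution) and again as beta -> infinity
# (Dirac measure on the most probable configuration), with a maximum in between.
```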


Figure 1: Plot of $I(\beta)$ for fixed values of $p_{1,2}(x_1,x_2)$. $I(\beta)$ vanishes for $\beta = 0$ and $\beta \to \infty$, i.e. there are no stochastic dependencies for complete randomness or complete predictability.


Figure 2: The set of probability distributions for the four configurations $(x_1, x_2)$ with the plane of factorizable distributions and the curve $p_\beta$.


Multi-information generalizes exactly this property: it is still the Kullback-Leibler distance of $p_\Lambda$ from its factorized distribution $\bigotimes_{i\in\Lambda} p_i$ [Am, Ay1]. Interest in this quantity is motivated by finite-volume information geometry, see [Am] and the references therein. Entropy and multi-information have natural decompositions of a form that we will briefly describe now. Let us use a simple example for $p_\Lambda$:

$$p_\Lambda(x_\Lambda) = e^{F + \sum_{\emptyset \neq V \subset \Lambda} \Theta_V \prod_{i\in V} x_i}, \quad (4)$$

with $x_\Lambda$ the collection of $x_i$, $i \in \Lambda$, where the $x_i$ are from $\{0,1\}$. $F$ is a normalization constant (the free energy). The coefficients $\Theta_V \in \mathbb{R}$, $V \subset \Lambda$, represent the strength of direct interaction between the units $i$. Let us now denote by $p^{(k)}_\Lambda$ a distribution that has the same marginals as $p_\Lambda$ up to $k$-th order, but no intrinsic interactions of higher order ($\Theta_V = 0$ for all $V$ such that $|V| > k$). It is the maximum-entropy estimate [J1] of $p_\Lambda$ given its $k$-th order marginals. Now there is an "extended Pythagoras theorem"

$$D(p_\Lambda \| p^{(0)}_\Lambda) := \sum_{x_\Lambda} p_\Lambda(x_\Lambda)\,\ln\frac{p_\Lambda(x_\Lambda)}{p^{(0)}_\Lambda(x_\Lambda)} = \sum_{k=1}^{|\Lambda|} D(p^{(k)}_\Lambda \| p^{(k-1)}_\Lambda), \quad (5)$$

where $D$ denotes the Kullback-Leibler distance [CT]. In our example, the left-hand side (LHS) is $|\Lambda|\ln 2 - H(p_\Lambda)$, the "distance" from the equidistribution, whereas multi-information is given by $D(p_\Lambda \| p^{(1)}_\Lambda)$. It can be decomposed into a sum like the right-hand side (RHS) without the $k = 1$ term. Note that $D$, although not a metric, is the canonical [AN] measure of distance in information geometry and that the above decomposition is non-trivial for $|\Lambda| > 2$. Note also that e.g. covariance makes use only of correlations up to second order, whereas multi-information also contains information from all the higher-order marginals.
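As a concrete check of the statement that multi-information is the Kullback-Leibler distance from the factorized distribution, the following sketch compares the two expressions for a randomly generated distribution on three binary units (the distribution is illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.random((2, 2, 2)); p /= p.sum()            # a generic p_Lambda on {0,1}^3

H = lambda q: -np.sum(q * np.log(q))               # Shannon entropy (nats), q > 0

# Multi-information as in (2): sum of single-site entropies minus joint entropy
marginals = [p.sum(axis=tuple(a for a in range(3) if a != i)) for i in range(3)]
I_def = sum(H(m) for m in marginals) - H(p)

# Kullback-Leibler distance D(p || p_1 x p_2 x p_3) from the product of marginals
prod = np.einsum('i,j,k->ijk', *marginals)
I_kl = np.sum(p * np.log(p / prod))

print(I_def, I_kl)    # the two numbers agree up to floating-point error
```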

1.2 Statistical mechanics

Looking at figures 1 and 2, the question about the maximum of $I$ springs to mind. To give us an idea what such a maximization [Ay1, AK] can mean, we want to define multi-information in the context of statistical mechanics. There we have a mathematical formalism that describes models of different structural richness. A simple example for a finite-volume state (Gibbs measure) is given by (4). The set of all its terms $\Theta_V \prod_{i\in V} x_i$ in the exponent is called an interaction potential. A parametrized family of such potentials is called a model. The parameters (cf. the $\beta$ in (3)) can be inverse temperature, magnetic field, etc.

There are (infinite-volume) potentials whose infinite-volume state is not uniquely determined. Models showing this phenomenon for certain parameter values are said to exhibit phase coexistence. According to clear hints in the literature, measures of stochastic interdependence are


maximized at critical parameter values [MKNYM, Ar, CF2], or at phase transitions in a less strict sense [CY, LF, GL]. One can go a step further and look at the structural phenomena occurring at the phase coexistence point in standard models like the Ising square lattice: infinite-cluster formation, divergence of the correlation length. They can be seen as signs of "complex" behaviour. From this perspective it seems natural to assume that large stochastic interdependence is associated with high structural complexity. In [FC1] this is discussed in connection with excess entropy, a quantity that, as we will show, is closely related to multi-information.

Phase transitions thus seem to mark the "border of maximum complex behaviour" between complete randomness and absolute predictability. It is one of the objectives of the present work to give an example where this kind of statement can be made rigorous. To do so, we generalize multi-information to a quantity in the thermodynamic limit that takes into account the non-uniqueness of infinite-volume Gibbs measures for certain interaction potentials. Using the example of the Ising square lattice, we can connect phase coexistence with maximum distance from factorizability in a formal way.


2 Multi-information in statistical mechanics

2.1 Notation

Our systems take discrete values on the points of an infinite lattice. Let $S$ be a finite set (the spin space), and let $x: \mathbb{Z}^d \to S$, $i \mapsto x_i$, be configurations on the $d$-dimensional lattice of integers $\mathbb{Z}^d$. To make it a measurable space, the space of configurations $\Omega := S^{\mathbb{Z}^d}$ is equipped with the product sigma algebra $\mathcal{F}$, which contains the cylinder sets $\{x \in \Omega : X_\Lambda(x) = x_\Lambda\}$, where $x_\Lambda := (x_i)_{i\in\Lambda}$ is a configuration on the finite$^1$ set $\Lambda \subset\subset \mathbb{Z}^d$ and $X_\Lambda : \Omega \to S^\Lambda$, $x \mapsto x_\Lambda$, is the natural projection onto a finite configuration. Thus the projection $X_\Lambda$ yields finite measurable spaces $(\Omega_\Lambda, \mathcal{F}_\Lambda)$, where $\Omega_\Lambda := S^\Lambda$ denotes the set of $x_\Lambda$ and $\mathcal{F}_\Lambda$ its power set. We will first define multi-information on these finite spaces, for $p_\Lambda$ a probability measure on $(\Omega_\Lambda, \mathcal{F}_\Lambda)$. At this point, the form of the measure is of no importance.

Definition 2.1: Let $p_\Lambda$ be a probability measure on $(\Omega_\Lambda, \mathcal{F}_\Lambda)$, where $\Lambda$ is a finite set. The multi-information of $p_\Lambda$ is defined by

$$I(p_\Lambda) := \sum_{i\in\Lambda} H(p_i) - H(p_\Lambda). \quad (6)$$

Here, $H(p_\Lambda) := -\sum_{x_\Lambda\in\Omega_\Lambda} p_\Lambda(x_\Lambda)\ln p_\Lambda(x_\Lambda)$ denotes the Shannon entropy and $p_i(x_i) := \sum_{x_{\Lambda\setminus i}} p(x_{\Lambda\setminus i}, x_i)$ are the marginal distributions of the elementary subsystems in $\Lambda$.

2.2 Thermodynamic limit

To define multi-information for distributions on the infinite measurable space $(\Omega,\mathcal{F})$, our starting point are measures $p_\Lambda$ on finite spaces $(\Omega_\Lambda,\mathcal{F}_\Lambda)$, $\Lambda \subset\subset \mathbb{Z}^d$. These we consider as being obtained from a translation invariant measure $p$ on $(\Omega,\mathcal{F})$ by defining its marginal distributions $p_\Lambda(x_\Lambda) := p(X_\Lambda = x_\Lambda)$. Translation invariance of $p$ is defined by

$$p\big(\{(x_{i+j})_{j\in\mathbb{Z}^d} \mid (x_j)_{j\in\mathbb{Z}^d} \in A\}\big) = p(A) \quad \forall A \in \mathcal{F},\ \forall i \in \mathbb{Z}^d. \quad (7)$$

Existence and properties of the van-Hove limit [R] of multi-information follow in straightforward fashion from well-known results for entropy (see the appendix for a proof). Notice that the set of translation invariant measures is a simplex and thus convex [S, Ge].

Theorem and Definition 2.2: Let $p$ be a translation invariant probability measure on $(\Omega,\mathcal{F})$. Then the van-Hove limit $\lim_{\Lambda\nearrow\mathbb{Z}^d} \frac{1}{|\Lambda|} I(p_\Lambda) =: I(p)$ exists and $I(p) \in [0, \ln|S|]$. The function $p \mapsto I(p)$ is concave and lower-semicontinuous (w.r.t. the weak* topology).

$^1$We denote finiteness of subsets by $\subset\subset$.


The quantity $I(p)$ depends on the state of a system. In statistical mechanics, however, models are defined via the interaction between their constituents (spins, particles). In the following, we want to obtain a definition which directly depends on the interaction potential.

2.3 Phase coexistence

The construction of measures in infinite volume [Do, LR] can yield non-uniqueness for a given interaction, so the description of phase coexistence becomes possible. For an interaction-dependent definition of multi-information we now have to choose from a set of possible measures. To introduce the necessary notation and to make our point clear, we give a brief description of the standard construction of infinite-volume Gibbs measures. All the results stated in this section can be found in this or a similar form in [Ge, S]; for short descriptions of the subject see also [Pe, Gr]. From finite-volume statistical mechanics one knows the form of the conditional probabilities for a finite configuration given an exterior configuration that the measure $p$ on $(\Omega,\mathcal{F})$ should have [Gr]. Specifying these, one obtains a condition that possible infinite-volume Gibbs measures should fulfill. For this, we need to define interaction potentials in infinite volume.

Definition 2.3: A potential $\Phi$ on $\mathbb{Z}^d$ is a family of functions $\{\Phi_V\}_{V\subset\subset\mathbb{Z}^d}$ from $\Omega$ to $\mathbb{R}$ with
(i) $\Phi_V$ is $X_V$-measurable for all $V \subset\subset \mathbb{Z}^d$;
(ii) the series $E^\Phi_\Lambda(x_\Lambda, y_{\Lambda^c}) := \sum_{V\subset\subset\mathbb{Z}^d:\, V\cap\Lambda\neq\emptyset} \Phi_V(x_\Lambda, y_{\Lambda^c})$ converges for all $\Lambda \subset\subset \mathbb{Z}^d$ and for all $(x_\Lambda, y_{\Lambda^c}) := x \in \Omega$ (where $\Lambda^c$ denotes the complement of $\Lambda$ in $\mathbb{Z}^d$).

$E^\Phi_\Lambda(x_\Lambda, y_{\Lambda^c})$ is the energy of $x_\Lambda$ with boundary condition $y_{\Lambda^c}$.

This definition enables us to specify the Gibbsian conditional probabilities for the desired measures. Since we know nothing about the existence of these measures, we can only fix probability kernels (i.e., loosely speaking, conditional probabilities "waiting for a measure"). Let $\Omega_{\Lambda^c} = S^{\mathbb{Z}^d\setminus\Lambda}$. Using a definition from [Ge], a specification for finite $S$ is given by a family $\{k^\Phi_\Lambda\}_{\Lambda\subset\subset\mathbb{Z}^d}$ of probability kernels from $(\Omega_{\Lambda^c}, \mathcal{F}_{\Lambda^c})$ to $(\Omega,\mathcal{F})$, where

$$A \mapsto k^\Phi_\Lambda(A \mid y_{\Lambda^c}) := \frac{\sum_{x_\Lambda :\, (x_\Lambda, y_{\Lambda^c})\in A} e^{-E^\Phi_\Lambda(x_\Lambda, y_{\Lambda^c})}}{\sum_{x'_\Lambda\in\Omega_\Lambda} e^{-E^\Phi_\Lambda(x'_\Lambda, y_{\Lambda^c})}}. \quad (8)$$

Here, $\Phi$ is a potential, $\Lambda \subset\subset \mathbb{Z}^d$, $A \in \mathcal{F}$ and $y_{\Lambda^c} \in \Omega_{\Lambda^c}$. Such specifications fulfill consistency conditions analogous to those of conditional probabilities. The set of DLR measures is now defined as the solution set of $p(A\mid\mathcal{F}_{\Lambda^c}) = k^\Phi_\Lambda(A\mid\cdot)$ $p$-a.s. for all finite volumes $\Lambda$ and events $A$. Here, $p(A\mid\mathcal{F}_{\Lambda^c})$ is the conditional expectation of $1_A$, i.e. of the indicator function for an event $A$, given the sigma algebra of events outside $\Lambda$. For the definition of conditional expectations given a sub-sigma algebra, see e.g. [Ba]. The properties of the set of DLR measures are well known.
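To make the specification (8) concrete, here is a minimal sketch that evaluates the kernel $k^\Phi_\Lambda$ for the nearest-neighbour Ising potential used later in Section 3, on a small square volume $\Lambda$ with a fixed boundary condition. The chosen volume, coupling and boundary spins are illustrative assumptions, not values from the paper:

```python
import itertools
import numpy as np

beta = 0.3
Lam = [(0, 0), (0, 1), (1, 0), (1, 1)]                       # a 2x2 volume in Z^2
interior = set(Lam)
boundary = {(-1, 0): 1, (-1, 1): 1, (2, 0): -1, (2, 1): -1,  # fixed exterior spins y,
            (0, -1): 1, (1, -1): 1, (0, 2): -1, (1, 2): -1}  # chosen for illustration

def energy(x_lam):
    """E^Phi_Lambda(x_Lambda, y): sum of -beta*x_i*x_j over all bonds meeting Lambda."""
    spins = {**dict(zip(Lam, x_lam)), **boundary}
    E = 0.0
    for i, j in itertools.combinations(spins, 2):
        nearest = abs(i[0] - j[0]) + abs(i[1] - j[1]) == 1
        touches_lam = i in interior or j in interior
        if nearest and touches_lam:
            E -= beta * spins[i] * spins[j]
    return E

configs = list(itertools.product([-1, 1], repeat=len(Lam)))
weights = np.array([np.exp(-energy(c)) for c in configs])
kernel = weights / weights.sum()         # k^Phi_Lambda( . | y) of eq. (8) on Omega_Lambda
print(max(zip(kernel, configs)))         # most probable interior configuration given y
```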


Proposition and Definition 2.4: Given a potential $\Phi$, the set of infinite-volume Gibbs states (DLR measures) is defined by

$$G(\Phi) := \left\{\, p \text{ on } (\Omega,\mathcal{F}) : p(A\mid\mathcal{F}_{\Lambda^c}) = k^\Phi_\Lambda(A\mid\cdot)\ p\text{-a.s.}\ \forall A \in \mathcal{F},\ \Lambda \subset\subset \mathbb{Z}^d \,\right\}.$$

$G(\Phi)$ is a compact, convex set (more precisely: a simplex). Depending on the potential $\Phi$, there are the following possibilities for its cardinality:

$$|G(\Phi)| = 0, \quad (9)$$
$$|G(\Phi)| = 1, \quad (10)$$
$$|G(\Phi)| = \infty. \quad (11)$$

The set of Gibbs measures is always non-empty if the potential is translation invariant, i.e. if it fulfills

$$\Phi_{V+i}\big((x_{j-i})_{j\in\mathbb{Z}^d}\big) = \Phi_V(x) \quad \forall x \in \Omega,\ \forall V \subset\subset \mathbb{Z}^d,\ \forall i \in \mathbb{Z}^d. \quad (12)$$

Notice that even translation-invariant potentials need not have only translation-invariant states. Since the thermodynamic limit of multi-information was obtained for translation-invariant states, we will actually need the set of translation-invariant Gibbs measures $G_I(\Phi)$, i.e. the intersection of the set of all translation-invariant measures on $(\Omega,\mathcal{F})$ with the set of Gibbs measures of a translation invariant potential. The set $G_I(\Phi)$ is also compact and convex, and its cardinality can be 1 or infinity.

2.4 Multi-information of a potential

We are now in the position to define multi-information as a function of the interaction potential of a statistical-mechanics model. To extract the minimum stochastic dependence, we define

Definition 2.5: Multi-information given a translation invariant potential $\Phi$ is defined by

$$I(\Phi) := \inf_{p\in G_I(\Phi)} I(p), \quad (13)$$

where $I(p)$ is given by Theorem and Definition 2.2.

Remark 2.6: Because of the lower-semicontinuity of $I(p)$ (Theorem 2.2) and the compactness of $G_I$, the infimum is indeed attained. This follows from general statements about extrema of semicontinuous functions over compact sets, cf. Theorem 25.9 in [Ch].

The non-uniqueness expressed by (11) is called phase coexistence. Phases are the extreme points of the simplex $G(\Phi)$, which are also just the physically realized states$^2$. These so-called pure states have fluctuation-free

$^2$Also, the property of ergodicity is equivalent to being an extreme point of the simplex of translation-invariant probability measures.


macroscopic quantities. On the other hand, we can construct convex combinations of them, which do not stand for physically realized states but which express our uncertainty about the state we are in [Ge]. That is why the following proposition helps to motivate our choice in defining $I(\Phi)$.

Theorem 2.7: Let $\mathrm{ex}(G_I)$ be the set of extreme points of $G_I$. We have

$$I(\Phi) = \inf_{p\in G_I(\Phi)} I(p) = \inf_{p\in \mathrm{ex}(G_I(\Phi))} I(p). \quad (14)$$

Thus the infimum is attained in a physically relevant state. To illustrate Definition 2.5 and Theorem 2.7, figure 3 shows $I(p)$ over the set of infinite-volume Gibbs measures in the case of the two-dimensional Ising model.

Figure 3: Schematic view of multi-information depending on $p$ in the 2d Ising model.


3 Ising square lattice

3.1 Multi-information for the model

Taking advantage of the wealth of exact results for the two-dimensional Ising model, we can find an explicit expression for multi-information. Definition 2.5 is applied to the Ising potential

$$\Phi^\beta_V(x) = -\beta x_i x_j \quad \text{if } V = \{i,j\} \subset \mathbb{Z}^2 \text{ with } |i-j| = 1, \quad (15)$$

and $\Phi^\beta_V(x) = 0$ for all other sets $V$, with spin space $S = \{\pm 1\} \ni x_i$ and $\beta \in \mathbb{R}_+$. The parameter $\beta$ is the inverse temperature and stands for the strength of interaction between spins. We use existing results for the free energy and magnetization, the critical temperature and the known set of Gibbs measures; for a list of references see [Ge].

Let us first present a visualization of the main result of this paper: a plot of the multi-information of the potential (15) as a function of inverse temperature (see figure 4). What one can see is a sharp isolated global maximum at the point of phase transition. The analytic result will be given in 3.2.

Figure 4: Multi-information of the Ising square lattice (plotted as $I(\Phi^\beta)$ against $\beta$, with the maximum at $\beta_c \approx 0.4407$).

It is well known that below a critical temperature the set of infinite-volume Gibbs measures is the convex hull of two extreme probability measures:

$$G(\Phi^\beta) = \left\{\, t\,p^\beta_- + (1-t)\,p^\beta_+ : t \in [0,1] \,\right\}, \quad (16)$$


where the two extreme points $p^\beta_\pm$ are connected by a spin-flip symmetry that can be written as

$$p^\beta_+(X_\Lambda = x_\Lambda) = p^\beta_-(X_\Lambda = -x_\Lambda) \quad \forall \Lambda \subset\subset \mathbb{Z}^d. \quad (17)$$

Moreover, for the single-spin expectations (the magnetization) we have $p^\beta_-(X_0) = -p^\beta_+(X_0)$. It is essential that these order parameters are non-zero for $\beta > \beta_c$. The Yang formula (a rigorous result, see [S], p. 153) is

$$m_\beta := p^\beta_+(X_0) = \begin{cases} \left(1 - \sinh^{-4} 2\beta\right)^{1/8} & \text{if } \beta > \beta_c, \\ 0 & \text{otherwise.} \end{cases} \quad (18)$$

The essential feature of the model is a continuous phase transition at a critical temperature $\beta_c$:

$$\sinh 2\beta_c = 1, \quad \text{i.e.} \quad \beta_c = \tfrac{1}{2}\ln\!\left(1 + \sqrt{2}\right). \quad (19)$$

We will also need the entropy (per unit volume)

$$h(\beta) := h(p^\beta_\pm) = \ln\!\left(\sqrt{2}\cosh 2\beta\right) + \frac{1}{\pi}\int_0^{\pi/2} \ln\!\left[1 + \sqrt{1-\kappa_\beta^2 \sin^2\omega}\,\right] d\omega - 2\beta\tanh 2\beta - \beta\,\frac{\sinh^2 2\beta - 1}{\sinh 2\beta \cosh 2\beta}\left[\frac{2}{\pi}\int_0^{\pi/2} \frac{d\omega}{\sqrt{1-\kappa_\beta^2 \sin^2\omega}} - 1\right], \quad (20)$$

where

$$\kappa_\beta = \frac{2\sinh 2\beta}{\cosh^2 2\beta}. \quad (21)$$

This expression can be found using the results for the free energy $f(\beta)$ and the energy $e(\beta)$, see e.g. [Wa], because of

$$h(\beta) = \beta e(\beta) - \beta f(\beta). \quad (22)$$

Theorem 3.1: Let $m_\beta$ and $h(\beta)$ be defined by (18) and (20). Also, let

$$s(x) = -\frac{1+x}{2}\ln\frac{1+x}{2} - \frac{1-x}{2}\ln\frac{1-x}{2}, \quad x \in [-1,1], \quad (23)$$

(see figure 5)$^3$. The multi-information of the Ising square lattice is given by

$$I(\Phi^\beta) = s(m_\beta) - h(\beta). \quad (24)$$

Remark 3.2: Notice that similar expressions can be found for all translation-invariant models with binary spin space.

$^3$Here $0\ln 0 := 0$.
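A minimal numerical sketch of Theorem 3.1, evaluating $s(m_\beta) - h(\beta)$ from (18)-(23) on a few inverse temperatures. The quadrature routine and the sampled $\beta$ values are illustrative choices; note that the second integrand in (20) is singular exactly at $\beta_c$ (where $\kappa_\beta = 1$), so we sample around the critical point rather than at it:

```python
import numpy as np
from scipy.integrate import quad

beta_c = 0.5 * np.log(1.0 + np.sqrt(2.0))       # critical point, eq. (19), ~0.4407

def m(beta):                                     # Yang's magnetization, eq. (18)
    return (1.0 - np.sinh(2*beta)**-4)**0.125 if beta > beta_c else 0.0

def s(x):                                        # single-spin entropy, eq. (23)
    t = lambda u: 0.0 if u <= 0 else u * np.log(u)
    return -t((1 + x) / 2) - t((1 - x) / 2)

def h(beta):                                     # entropy per site, eqs. (20)-(21)
    k = 2 * np.sinh(2*beta) / np.cosh(2*beta)**2
    i1 = quad(lambda w: np.log(1 + np.sqrt(1 - (k*np.sin(w))**2)), 0, np.pi/2)[0]
    i2 = quad(lambda w: 1 / np.sqrt(1 - (k*np.sin(w))**2), 0, np.pi/2)[0]
    theta = (np.sinh(2*beta)**2 - 1) / (np.sinh(2*beta)*np.cosh(2*beta)) * (2/np.pi*i2 - 1)
    return (np.log(np.sqrt(2)*np.cosh(2*beta)) + i1/np.pi
            - 2*beta*np.tanh(2*beta) - beta*theta)

for beta in [0.30, 0.40, 0.43, 0.45, 0.50, 0.60]:
    print(beta, s(m(beta)) - h(beta))   # I(Phi^beta), eq. (24); largest near beta_c
```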


Figure 5: The function $s(x)$.

3.2 The maximum of multi-information

Putting some effort into bounding the terms in (24), one can obtain analytic results connecting the phase transition with maximum multi-information.

Theorem 3.3: In the two-dimensional Ising model, multi-information as a function $\beta \mapsto I(\Phi^\beta)$ of inverse temperature attains its isolated global maximum at the point of phase transition $\beta = \beta_c$. At this point, the left-sided derivative goes to $+\infty$, the right-sided one to $-\infty$.

This subsection is devoted to the proof of the theorem. Some technical results are needed. Using the shorthand notation

$$\Theta(\beta) := \frac{\sinh^2 2\beta - 1}{\sinh 2\beta \cosh 2\beta}\left[\frac{2}{\pi}\int_0^{\pi/2} \frac{d\omega}{\sqrt{1-\kappa_\beta^2 \sin^2\omega}} - 1\right], \quad (25)$$

we have the following bounds.

Lemma 3.4: Let $\beta \ge \beta_c$. Then

$$\beta\,\Theta(\beta) \le \min\left\{\frac{\sinh 2\beta - 1}{2}\,\ln\frac{\sinh 2\beta + 1}{\sinh 2\beta - 1},\ \frac{\beta}{\sinh 2\beta \cosh 2\beta}\right\}. \quad (26)$$


Moreover, $\Theta(\beta_c) = 0$. Furthermore,

$$-\ln\!\left(\sqrt{2}\cosh 2\beta\right) + 2\beta\tanh 2\beta \le \min\left\{2\beta_c(\beta-\beta_c) + \sqrt{2}\,\beta_c - \ln 2,\ -\frac{\beta}{\sinh 2\beta\cosh 2\beta} + \ln\sqrt{2}\right\}, \quad (27)$$

$$s(m_\beta) \le \ln 2 - \frac{\left(1-\sinh^{-4}2\beta\right)^{1/4}}{2} - \frac{\left(1-\sinh^{-4}2\beta\right)^{1/2}}{12}, \quad (28)$$

$$\frac{ds(m_\beta)}{d\beta} \le -\frac{\sinh^{-4}2\beta}{\tanh 2\beta}\, m_\beta^{-6}. \quad (29)$$

Lemma 3.5: For 0 ≤ y ≤ 1/2 we have

$$\frac{\left(1-(1+y^2)^{-4}\right)^{1/4}}{2} > \frac{y^2}{2}\,\ln\frac{2+y^2}{y^2}. \quad (30)$$
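The inequality of Lemma 3.5 is elementary; as a quick plausibility check (not a proof), one can evaluate both sides numerically over the stated range:

```python
import numpy as np

y = np.linspace(1e-6, 0.5, 2001)                    # 0 < y <= 1/2
lhs = (1 - (1 + y**2)**-4)**0.25 / 2
rhs = y**2 / 2 * np.log((2 + y**2) / y**2)
print(bool(np.all(lhs > rhs)))                      # True, as claimed in (30)
```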

Proof of Theorem 3.3: Multi-information is considered in four different regimes. Let us start with the high-temperature case.

(A) $\beta \le \beta_c$: We consider the monotonicity of $I(\Phi^\beta)$. Here, the order parameter $m_\beta$ vanishes. Thus, as an immediate consequence of Theorem 3.1 we obtain

$$I(\Phi^\beta) = \ln 2 - h(\beta), \quad \beta \le \beta_c. \quad (31)$$

Using (22) and $d(\beta f(\beta))/d\beta = e(\beta)$ [R], the $\beta$ derivative is

$$\frac{dI(\Phi^\beta)}{d\beta} = -\frac{dh(\beta)}{d\beta} = -\beta\,\frac{de(\beta)}{d\beta} = -\beta\,\frac{d^2(\beta f(\beta))}{d\beta^2} \ge 0. \quad (32)$$

This relation follows from the convexity of $-\beta f(\beta)$, see again [R]. Hence the monotonicity up to the critical point is known.

(B) $\beta = \beta_c$: Let us now prove the cusp at the critical temperature. Some care is necessary, especially for the right-sided derivative, since we have antagonistic terms going to infinity. For the left-sided derivative we use the second equality in (32). The divergence of the specific heat is known from the literature:

$$\frac{de(\beta)}{d\beta} = \frac{8}{\pi}\ln|\beta - \beta_c| + \text{bounded terms}, \quad (33)$$

see [S] (p. 152). Since this expression goes to $-\infty$ as $\beta \to \beta_c$, the left-sided derivative of $I(\Phi^\beta)$ goes to $+\infty$.

Above $\beta_c$, the derivative of the first term in (24) comes into play. To make (33) depend on $m_\beta$, we use $(1 - \sinh^{-4}2\beta) \le 8\sqrt{2}\,(\beta - \beta_c)$, which follows from the concavity of the LHS. Together with (18) we have

$$\ln(\beta - \beta_c) \ge 8\ln m_\beta - \ln(8\sqrt{2}). \quad (34)$$


With this and (29) we find

$$\lim_{m_\beta\searrow 0} \frac{dI(\Phi^\beta)}{d\beta} \le \lim_{m_\beta\searrow 0}\left[-\frac{\sinh^{-4}2\beta}{\tanh 2\beta}\, m_\beta^{-6} - \frac{8}{\pi}\,8\ln m_\beta + \text{bounded terms}\right] = \lim_{y\to\infty} y^6\left[-\sqrt{2} + \frac{64\ln y}{\pi y^6} + \frac{\text{b.\,t.}}{y^6}\right] = -\infty, \quad (35)$$

where we used the substitution $y := 1/m_\beta$ and $\sinh^{-4}2\beta_c/\tanh 2\beta_c = \sqrt{2}$.

(C) $\beta_c < \beta$: For the remaining $\beta$ domain we only show $I(\Phi^\beta) < I(\Phi^{\beta_c})$ for $\beta > \beta_c$. Together with Theorem 3.1 this becomes

$$s(m_\beta) - h(\beta) < \ln 2 - h(\beta_c), \quad \beta > \beta_c. \quad (36)$$

With (19), (20), the entropy at $\beta_c$ is found to be

$$h(\beta_c) = \ln 2 - \sqrt{2}\,\beta_c + \frac{1}{\pi}\int_0^{\pi/2} \ln\left[1 + \cos\omega\right] d\omega. \quad (37)$$

(We used $\Theta(\beta_c) = 0$ from Lemma 3.4 and $\cosh 2\beta_c = \sqrt{2}$.) The relation to be shown, (36), thus becomes

$$s(m_\beta) - \ln\!\left(\sqrt{2}\cosh 2\beta\right) + \frac{1}{\pi}\int_0^{\pi/2} \ln\frac{1 + \cos\omega}{1 + \sqrt{1-\kappa_\beta^2\sin^2\omega}}\, d\omega + 2\beta\tanh 2\beta + \beta\,\Theta(\beta) - \sqrt{2}\,\beta_c < 0, \quad \beta > \beta_c. \quad (38)$$

The integral in (38) is smaller than or equal to zero since $\cos\omega \le \sqrt{1-\kappa_\beta^2\sin^2\omega}$. It thus suffices to no longer consider this term in the following. For an additional partitioning of the domain above $\beta_c$ we use

$$\bar\beta := \tfrac{1}{2}\,\mathrm{arsinh}\!\left(1 - K^4\right)^{-1/4}, \qquad K := 2\left(\sqrt{2}\,\beta_c - \tfrac{3}{2}\ln 2\right). \quad (39)$$

(C1) $\beta_c < \beta \le \bar\beta$: If we feed the corresponding terms from Lemma 3.4 into (38), we obtain the following inequality whose proof suffices to prove (38):

$$-\frac{\left(1-\sinh^{-4}2\beta\right)^{1/4}}{2} - \frac{\left(1-\sinh^{-4}2\beta\right)^{1/2}}{12} + 2\beta_c(\beta-\beta_c) + \frac{\sinh 2\beta - 1}{2}\,\ln\frac{\sinh 2\beta + 1}{\sinh 2\beta - 1} < 0. \quad (40)$$

We now show that the sum of the first and last terms of the LHS, as well as the sum of the two terms in between them, are negative in the required range from $\beta_c$ up to $\bar\beta$. For the first and last term we define $y$ by $\sinh 2\beta =: 1 + y^2$ and use Lemma 3.5. Also using

$$\sinh 2\bar\beta < 1 + (1/2)^2 \quad (41)$$


(which can be checked using (39): $\sinh 2\bar\beta \approx 1.18$) it is clear that the sum of the first and last terms of the LHS of (40) is smaller than zero in the required range. For the middle terms, we square the corresponding inequality to obtain

$$4\beta_c^2(\beta-\beta_c)^2 < \frac{1-\sinh^{-4}2\beta}{144}, \quad \beta_c < \beta \le \bar\beta. \quad (42)$$

Here, we do the following: at $\beta = \beta_c$, both sides are equal to zero. Taking the second derivatives shows that the LHS is a convex function, the RHS a concave one. If the inequality holds at the point $\bar\beta$, it also holds for the entire interval $(\beta_c, \bar\beta]$. Using (39) one calculates for $\beta = \bar\beta$

$$4\beta_c^2(\bar\beta-\beta_c)^2 < \frac{K^4}{144}, \quad (43)$$

from which we obtain (taking the square root, shifting terms and applying the hyperbolic sine)

$$\sinh 2\bar\beta < \sinh\!\left[\frac{K^2}{12\beta_c} + 2\beta_c\right] = 1.19471\ldots \quad (44)$$

Putting in the value of $\sinh 2\bar\beta$ using (39) shows that this relation indeed holds. Hence (42) holds in the required $\beta$ range including $\bar\beta$, and thus (38) holds.

(C2) $\bar\beta < \beta$: Lemma 3.4 again turns (38) into an inequality whose proof suffices to prove (38):

$$\frac{3}{2}\ln 2 - \sqrt{2}\,\beta_c - \frac{\left(1-\sinh^{-4}2\beta\right)^{1/4}}{2} < 0, \quad \bar\beta < \beta. \quad (45)$$

The corresponding equality is just solved by $\bar\beta$, cf. (39). As the LHS is monotonically decreasing, the inequality holds above $\bar\beta$.
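The constants used in steps (C1) and (C2) can be checked directly. A small sketch reproducing the numbers quoted around (41) and (44) (only standard library calls; the quantities follow (19) and (39)):

```python
import numpy as np

beta_c = 0.5 * np.log(1 + np.sqrt(2))
K = 2 * (np.sqrt(2) * beta_c - 1.5 * np.log(2))
sinh_2bb = (1 - K**4) ** -0.25                 # sinh(2*beta_bar) from (39)
beta_bar = 0.5 * np.arcsinh(sinh_2bb)
print(sinh_2bb)                                # ~1.18, the value used for (41)
print(np.sinh(K**2 / (12 * beta_c) + 2 * beta_c))        # ~1.1947, the bound in (44)
print(4 * beta_c**2 * (beta_bar - beta_c)**2 < K**4 / 144)   # inequality (43): True
```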

3.3 Discussion

Since $I = H(p_0) - h(p)$ (see (50)), two sorts of uncertainty or knowledge play a role. In figure 6 we show the information about the single spin that stems from the entire system, $\ln 2 - h(p)$ (information per site, or redundancy), and the information about the single spin that comes from a single spin only, $\ln 2 - H(p_0)$ (cf. also e.g. [BNT, BP] for a discussion of entropies on different scales). Multi-information is the difference between these terms and can be seen as the average information the single spin carries about the full system and vice versa. We will show this in terms of mutual information for the one-dimensional case.

Interestingly, if one plots correlation functions of near neighbours and the squared magnetization, one gets a picture very similar to figure 6, see figure 8.10 in [MW]. The difference then would be the covariance, which, although only accounting for second-order interactions, behaves similarly to multi-information in the Ising case, cf. also [Li1].


Figure 4 fits nicely into the universal picture discussed in [Ar], figure 1. Clearly, the cusp is due to the phase transition, which is also characterized by non-analyticity of the free energy.

Let us also briefly mention the approaches to stochastic dependence in the Ising square lattice that can, to our knowledge, be found in the literature. In [MKNYM], the mutual information between two spins depending on their distance and on temperature is considered. The relation to magnetization and correlation function is shown. Plots very similar to ours are obtained by simulation. In [Ar], a similar plot, here of excess entropy (see next section), is obtained by simulation. There are different choices for a definition of excess entropy in two dimensions. In [FC2] this problem is tackled, and three different definitions (seen as equivalent in one dimension) are presented in the 2d case. Plots for the nearest and next-nearest neighbour Ising model are obtained using simulations.

Since multi-information coincides with excess entropy for the Ising chain (see next section), we can consider it another definition of excess entropy for the Ising square lattice. Only in the last of the above cited articles is the problem of multiple excess entropies stated. They appear because of the non-uniqueness of the Gibbs measure. Note that Definition 2.5 takes this fact into account.

Figure 6: 2d Ising model: full-system information $\ln 2 - h(\beta)$ and single-spin information $\ln 2 - s(m_\beta)$ against normalized temperature $\beta_c/\beta$. Their difference is multi-information.


4 One-dimensional systems

4.1 Multi-information and excess entropy

One-dimensional systems on $(S^{\mathbb{Z}}, \mathcal{F})$ are readily interpreted as time-dependent. Translation invariance is seen as stationarity of a stochastic process. In this section it is convenient to leave out the dependence on $p$ and to use the notation $\ldots, X_{-1}, X_0, X_1, X_2, \ldots$ for the infinite chain of random variables that take values from the finite alphabet $S$; so, e.g., $H(X_1,\ldots,X_n) = H(p_{1,\ldots,n})$ denotes the entropy of $n$ consecutive random variables. In the following we want to discuss the relationship of multi-information to standard quantities defined in the time context, so we make the following

Definition 4.1: Let the conditional entropies be denoted by $h_n := H(X_n \mid X_{n-1},\ldots,X_1) = H(p_{1,\ldots,n}) - H(p_{1,\ldots,n-1})$, where $n > 0$, and $h_1 := H(X_1) =: H_0$; the entropy rate by $h := \lim_{n\to\infty} H(p_{1,\ldots,n})/n$. The mutual information between $X_n$ and $X_1,\ldots,X_{n-1}$ is $MI(X_n; X_1,\ldots,X_{n-1}) := H_0 - h_n$. The excess entropy (or effective measure complexity) is defined by

$$E := \sum_{n=1}^{\infty} (h_n - h). \quad (46)$$

It is well known that $h_n$ converges to $h$ for $n \to \infty$. Depending on the speed of convergence, $E$ can be finite or infinite. It is known to measure the total information needed for optimal predictions [Gr], see [CF2] and references there. On the other hand, $h_n$ is the average unpredictability [Eb] of $X_n$ given the values of the $n-1$ preceding variables in the chain. Using the chain rule for entropy, finite-volume multi-information can be written as

$$I_L(X_1,\ldots,X_L) = \sum_{n=1}^{L} \left(H(X_n) - h_n\right) = \sum_{n=1}^{L} \left(H_0 - h_n\right). \quad (47)$$

Defining a finite-volume excess entropy $E_L(X_1,\ldots,X_L) := \sum_{n=1}^{L} (h_n - h)$, we observe that $E_L$ and $I_L$ are two sides of the same coin, both summing up differences of $h_n$ from a fixed quantity. These summands are called measure complexities [Gr, Eb] for $E_L$ and mutual informations for $I_L$. In the spirit of [FC1, CF2], the similarity between $E_L$ and $I_L$ is schematically illustrated in figure 7. Note however that in the limit both quantities are in general no longer similar.

Theorem 4.2: For $p$ a translation-invariant probability measure on $(S^{\mathbb{Z}}, \mathcal{F})$, we have the following expressions for $I$ and $E$:
(i) $I = \lim_{n\to\infty} MI(X_n; X_1,\ldots,X_{n-1})$,
(ii) $E = I + \sum_{n=2}^{\infty} (h_n - h)$; in particular $E = I$ if $p$ is Markov.

The proof follows immediately from the definitions of the involved quantities. Multi-information in the limit is thus the average information that the


Figure 7: The area below the $h_n$ curve converges to the excess entropy; the area above the curve is finite-volume multi-information.

past carries about the next variable in the chain. Excess entropy is often discussed in the context of mutual information between the past and the future, i.e. between two semi-infinite blocks in the chain. Also, it is well known that the convergence behaviour of $h_n$ informs us about important system properties [BNT, BP]. However, from the size of $E$ alone we cannot in general tell how fast $h_n$ converges: different values of $I$ give different convergence behaviour for the same $E$. Thus $E$ and $I$ complement each other as regards information about the convergence of $h_n$. Moreover, being the first summand in (46), $I$ is a lower bound on $E$. We have $E = I$ for Markov chains (nearest-neighbour spin chains) since $h_n = h$ for all $n > 1$ in this case. This leads us to the Ising model.
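A short sketch of Definition 4.1 and Theorem 4.2 for a stationary binary Markov chain. The transition matrix is an illustrative assumption; block entropies give $h_n$, the entropy rate $h$, and one sees $E = I$ in this Markov case:

```python
import numpy as np
from itertools import product

P = np.array([[0.9, 0.1],        # illustrative 2-state transition matrix
              [0.2, 0.8]])
# stationary distribution pi (left eigenvector of P for eigenvalue 1)
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))]); pi /= pi.sum()

def block_entropy(n):
    """H(X_1, ..., X_n) for the stationary Markov chain (in nats)."""
    H = 0.0
    for word in product(range(2), repeat=n):
        p = pi[word[0]]
        for a, b in zip(word, word[1:]):
            p *= P[a, b]
        H -= p * np.log(p)
    return H

H0 = block_entropy(1)                               # h_1 = H(X_1)
hn = [block_entropy(n) - block_entropy(n - 1) for n in range(2, 8)]
h = hn[-1]                                          # entropy rate; h_n = h for n >= 2 here
I = H0 - h                                          # multi-information rate, Theorem 4.2 (i)
E = (H0 - h) + sum(x - h for x in hn)               # excess entropy (46); extra terms vanish
print(I, E)                                         # identical for a Markov chain
```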

4.2 Ising model as a Markov chain

The usual description of Markov chains by one-sided conditional probabilities is closely related to a description by means of the Gibbsian probability kernels. Indeed, with $S$ assumed finite, we have a one-to-one correspondence between the set of all positive transition matrices of a Markov chain and the shift-invariant potentials for which $\Phi_V = 0$ if $V \neq \{i\}$ or $\{i, i+1\}$ (homogeneous nearest-neighbour) [Ge, Gr]. The set of Gibbs measures of such a potential contains only the unique measure defining the Markov chain, whose transition matrix can be calculated explicitly from $\Phi$. There is a complete solution for the nearest-neighbour Ising chain in a


magnetic field $b$. Our interaction potential now depends on this additional parameter:

$$\Phi^{\beta,b}_V(x) = -\beta x_i x_{i+1} \quad \text{if } V = \{i, i+1\}, \qquad \Phi^{\beta,b}_V(x) = -b x_i \quad \text{if } V = \{i\}, \quad (48)$$

down a formula for multi-information (for entropy and magnetization seee.g. [Pa]) and plot it depending on β and b (figure 8). Alternatively, one

Figure 8: Multi-information (= excess entropy) $I(\Phi^{\beta,b})$ in the Ising chain as a function of $\beta$ and $b$.

can get an expression for $I$ from the transition matrix of the corresponding Markov chain [Li2]. This matrix is [Ge]

$$P_{\beta,b} = \begin{pmatrix} e^{-2\beta b}\, q^{-1}_{\beta,b} & 1 - e^{-2\beta b}\, q^{-1}_{\beta,b} \\ 1 - q^{-1}_{\beta,b} & q^{-1}_{\beta,b} \end{pmatrix}, \quad (49)$$

where $q_{\beta,b} := e^{-\beta b}\left(\cosh\beta b + \sqrt{e^{-4\beta} + \sinh^2\beta b}\right)$. For $b = 0$ the two ground-state configurations are described by the unique Gibbs measure that gives equal weight to the respective Dirac measures (cf. also the dotted line in figure 2). We have symmetry in this case: the probability that the direction of the spin stays the same after the next time step is $1/(1+e^{-2\beta})$. $H_0$ keeps its maximal value $\ln 2$ through all temperatures, so multi-information is completely determined by the entropy rate. It means that we can predict with higher and higher certainty that the spin will not flip when lowering the temperature, but that there are equal chances for both possibilities regarding the actual value of $X_1$. For $b > 0$ the symmetry is broken. We have a preference for the value of $X_1$ in the direction of the field, and $H_0$ is reduced. There is competition between the order-creating and order-destroying influences of field and temperature. The past carries most


information about $X_1$ if these influences balance out. Although no phase transition takes place, this maximization of $I$ can be seen as a related phenomenon. It is interesting to investigate the nature of this non-critical "transition". See [CF1] for a picture similar to figure 8 (for $b$ held fixed). There, excess entropy (which, as mentioned, equals multi-information in this case) was calculated using transfer matrices.
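A minimal sketch of the transition matrix (49) and the resulting multi-information of the Ising chain, with rows and columns ordered as $(-1, +1)$; the parameter values in the loop are illustrative:

```python
import numpy as np

def transition_matrix(beta, b):
    """P_{beta,b} of eq. (49); states ordered as (-1, +1)."""
    q = np.exp(-beta*b) * (np.cosh(beta*b) + np.sqrt(np.exp(-4*beta) + np.sinh(beta*b)**2))
    return np.array([[np.exp(-2*beta*b)/q, 1 - np.exp(-2*beta*b)/q],
                     [1 - 1/q,             1/q]])

def multi_information(beta, b):
    P = transition_matrix(beta, b)
    w, v = np.linalg.eig(P.T)                         # stationary distribution pi
    pi = np.real(v[:, np.argmax(np.real(w))]); pi /= pi.sum()
    H = lambda p: -np.sum(p[p > 0] * np.log(p[p > 0]))
    H0 = H(pi)                                        # single-spin entropy
    h = -sum(pi[i] * P[i, j] * np.log(P[i, j]) for i in range(2) for j in range(2))
    return H0 - h                                     # I = E for this Markov chain

for b in [0.0, 0.5]:
    for beta in [0.2, 0.5, 1.0, 2.0]:
        print(b, beta, round(multi_information(beta, b), 4))
# For b = 0, I grows with beta (H0 stays at ln 2 while the entropy rate falls);
# for b > 0 it peaks at a finite beta where field and thermal disorder balance (figure 8).
```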


5 Proofs of lemmas and theorems

Let us first state a lemma which will be needed for the proof of Theorem 2.2.

Lemma 5.1: Let $p, q$ be probability measures on $(\Omega,\mathcal{F})$. We have
(i) $0 \le H(p_0) \le \ln|S|$,
(ii) $H\big((tp + (1-t)q)_0\big) \ge t\,H(p_0) + (1-t)\,H(q_0)$ for all $t \in [0,1]$,
(iii) $H(p_0)$ is continuous for the weak* topology.

Proof: For (i), (ii) see [CT]. In our case the measures are marginals of $p, q$, but (ii) follows immediately from the affinity of the projection of $p$ onto $p_0$, i.e. $(tp + (1-t)q)_0 = t\,p_0 + (1-t)\,q_0$. (iii) follows from the continuity of entropy w.r.t. $p_0$, see [Ca]. It remains to show that the projection $\pi_0$ of $p$ onto $p_0$ is continuous, i.e. that $p_n \to p$ implies $\pi_0(p_n) \to \pi_0(p)$. Continuity for the weak* topology on our topological space $\Omega$ means that $p_n \to p$ is equivalent to $p_n(f) \to p(f)$ for all $f \in C(\Omega)$ (the space of continuous functions for the product topology). For $f$ we choose the indicator function $1_{\{X_0 = x_0\}}$ (which is continuous since the inverse images of 1 and 0 are open sets). Now we have

$$\pi_0(p_n)(x_0) = p_n(1_{\{X_0=x_0\}}) \to p(1_{\{X_0=x_0\}}) = \pi_0(p)(x_0) \quad \forall x_0 \in S.$$

Proof of Theorem 2.2: We use the existence of the van-Hove limit, upper-semicontinuity and affinity of the entropy $\lim_{\Lambda\nearrow\mathbb{Z}^d} \frac{1}{|\Lambda|} H(p_\Lambda) =: h(p) \in [0, \ln|S|]$ (cf. [Is, R]; these properties follow immediately from the proof for a more generally defined entropy not requiring finite $S$) and Lemma 5.1. Similarly to $I(p_\Lambda)$, we can split

$$I(p) = \lim_{\Lambda\nearrow\mathbb{Z}^d} \frac{1}{|\Lambda|}\left[\sum_{i\in\Lambda} H(p_i) - H(p_\Lambda)\right] = H(p_0) - h(p), \quad (50)$$

where the second equality follows from translation invariance. Since we have all the required properties for $h(p)$ and $H(p_0)$, the theorem follows.

Proof of Theorem 2.7: We have to show that the infimum is always attained in an extreme point of $G_I$. This follows from compactness and convexity of $G_I$ as well as lower-semicontinuity and concavity of $I(p)$, according to Theorem 25.9 in vol. 2 of [Ch].

Proof of Theorem 3.1: We show:
(i) $I(\Phi^\beta) = I(p^\beta_\pm)$,
(ii) $I(p^\beta_\pm) = s(m_\beta) - h(\beta)$.

(i): According to (16), the $p^\beta_\pm$ are the only extreme Gibbs measures. First we show that $I(p)$ is symmetric around $(p^\beta_- + p^\beta_+)/2$. For this we use the measures $p = (1-t)\,p^\beta_- + t\,p^\beta_+$ and $p' = t\,p^\beta_- + (1-t)\,p^\beta_+$ for $t\in[0,1]$. Because of the spin-flip symmetry (17), for $\Lambda \subset\subset \mathbb{Z}^d$ we have $H(p_\Lambda) = H(p'_\Lambda)$. Taking the limit yields $h(p) = h(p')$; with (50) we also have $I(p) = I(p')$.


By Theorem 2.7, $I(\Phi^\beta) = \inf_{p\in\{p^\beta_-, p^\beta_+\}} I(p)$; because of the above symmetry we have $I(p^\beta_-) = I(p^\beta_+)$ (see figure 3).

(ii): We have $p(X_0) = \sum_{x_0=\pm 1} p(X_0 = x_0)\,x_0 = p(X_0 = 1) - p(X_0 = -1)$. Also using $\sum_{x_0=\pm 1} p(x_0) = 1$, we obtain for the single-spin probability

$$p(X_0 = x_0) = \frac{1 + x_0\,p(X_0)}{2}. \quad (51)$$

Hence

$$H_0(p^\beta_\pm) = -\sum_{x_0=\pm 1} \frac{1 + x_0\,p^\beta_\pm(X_0)}{2}\,\ln\frac{1 + x_0\,p^\beta_\pm(X_0)}{2} = s\big(p^\beta_+(X_0)\big). \quad (52)$$

Since $s$ is even, both expectations give the same result. From (50), (ii) follows.

Proof of Lemma 3.4:

Equation (26): 1.) $\beta\Theta(\beta) \le \frac{\sinh 2\beta - 1}{2}\ln\frac{\sinh 2\beta + 1}{\sinh 2\beta - 1}$ and $\Theta(\beta_c) = 0$.

Taking the $-1$ out of the square brackets in (25) and into the integral, the resulting numerator in the integral can be modified like this:

$$1 - \sqrt{1-\kappa_\beta^2\sin^2\omega} \le 1 - \left(1-\kappa_\beta^2\sin^2\omega\right) \le \kappa_\beta^2\sin\omega. \quad (53)$$

Estimating the resulting integral [BS] yields

$$\int_0^{\pi/2} \frac{\sin\omega}{\sqrt{1-\kappa^2\sin^2\omega}}\, d\omega = \frac{1}{2\kappa}\,\ln\frac{1+\kappa}{1-\kappa}, \quad |\kappa| < 1. \quad (54)$$

We use (21) and

$$\frac{1+\kappa_\beta}{1-\kappa_\beta} = \left(\frac{\sinh 2\beta + 1}{\sinh 2\beta - 1}\right)^2 \quad (55)$$

to find

$$\Theta(\beta) \le \frac{\sinh^2 2\beta - 1}{\cosh^3 2\beta}\, \frac{4}{\pi}\,\ln\frac{\sinh 2\beta + 1}{\sinh 2\beta - 1}. \quad (56)$$

We still have to show that

$$\beta\,\frac{\sinh^2 2\beta - 1}{\cosh^3 2\beta}\,\frac{4}{\pi} \le \frac{\sinh 2\beta - 1}{2}, \quad (57)$$

or

$$\frac{\sinh 2\beta + 1}{\cosh^3 2\beta} \le \frac{\pi}{8\beta}. \quad (58)$$

Note that for $\beta > \beta_c$ we have $\sinh 2\beta > 1$, so the LHS can be bounded like

$$\frac{\sinh 2\beta + 1}{\cosh^3 2\beta} \le \frac{\sinh^2 2\beta + 1}{\cosh^3 2\beta} = \frac{\cosh^2 2\beta}{\cosh^3 2\beta} \le \frac{1}{1 + (2\beta)^2/2}, \quad (59)$$


which follows from the series expansion of the hyperbolic cosine. Returning to (58), we have to show

$$\frac{1}{1 + (2\beta)^2/2} \le \frac{\pi}{8\beta} \quad (60)$$

or

$$0 \le \beta^2 - \frac{4}{\pi}\,\beta + \frac{1}{2}, \quad (61)$$

which is fulfilled for $\beta = 0$. Since the corresponding equation has no real zeroes, the relation also holds for all other $\beta$, and thus (57) is proven. We still have to show that $\Theta(\beta_c) = 0$. Clearly, $\Theta(\beta) \ge 0$ for $\beta \ge \beta_c$. In part (C1) of the proof of Theorem 3.3 we have moreover shown that $\beta\Theta(\beta) \le \left(1-\sinh^{-4}2\beta\right)^{1/4}/2$, given the just proven first statement of the lemma. Hence we have

$$0 \le \Theta(\beta_c) \le \frac{\left(1-\sinh^{-4}2\beta_c\right)^{1/4}}{2\beta_c} = 0. \quad (62)$$

2.) $\Theta(\beta) \le 1/(\sinh 2\beta\cosh 2\beta)$.

For $\beta \ge \beta_c$ we have

$$\sqrt{1-\kappa_\beta^2} = \frac{\sinh^2 2\beta - 1}{\cosh^2 2\beta}. \quad (63)$$

Using this, (25) becomes

$$\Theta(\beta) = \coth 2\beta\,\sqrt{1-\kappa_\beta^2}\left[\frac{2}{\pi}\int_0^{\pi/2}\frac{d\omega}{\sqrt{1-\kappa_\beta^2\sin^2\omega}} - 1\right]. \quad (64)$$

We take the root into the square brackets and obtain the integrand

$$\sqrt{\frac{1-\kappa^2}{1-\kappa^2\sin^2\omega}} - \sqrt{1-\kappa^2}. \quad (65)$$

This is a continuous function in $\omega$ which has the value zero at $\omega = 0$ and the value $1-\sqrt{1-\kappa^2}$ at $\omega = \pi/2$. Connecting these two points, one obtains the diagonal of a rectangle with area $\left(1-\sqrt{1-\kappa^2}\right)\frac{\pi}{2}$. The integral can be bounded by half of the area of this rectangle. We show that the part $A$ of the area of the rectangle above the integrand is greater than or equal to the part $B$ of the area below the integrand (the integral itself). Instead of comparing $A$ and $B$, we compare their respective integrands. We obtain the integrand of $A$ by twice reflecting the integrand of $B$: once in the vertical line through $\pi/4$, once in the horizontal line through $\frac{1-\sqrt{1-\kappa^2}}{2}$. The resulting inequality is

$$\sqrt{\frac{1-\kappa^2}{1-\kappa^2\sin^2\omega}} - \sqrt{1-\kappa^2} \le 1 - \sqrt{\frac{1-\kappa^2}{1-\kappa^2\cos^2\omega}}, \quad (66)$$


or put differently,

$$\left\{\sqrt{\frac{1-\kappa^2}{1-\kappa^2\sin^2\omega}} + \sqrt{\frac{1-\kappa^2}{1-\kappa^2\cos^2\omega}}\right\} \le 1 + \sqrt{1-\kappa^2}. \quad (67)$$

The expression in curly brackets is symmetric around $\pi/4$ because of $\cos^2\omega = \sin^2(\pi/2 - \omega)$. In order to prove the inequality, we just have to show that the expression is monotonically decreasing up to $\pi/4$ (for $\omega = 0$ we have equality), which is easily seen by looking at its derivative. Thus $B \le A$ and

$$\int_0^{\pi/2}\left[\sqrt{\frac{1-\kappa^2}{1-\kappa^2\sin^2\omega}} - \sqrt{1-\kappa^2}\right] d\omega \le \frac{1}{2}\left(1-\sqrt{1-\kappa^2}\right)\frac{\pi}{2}. \quad (68)$$

Continuing with (64) we find

$$\Theta(\beta) \le \coth 2\beta\ \frac{2}{\pi}\ \frac{1}{2}\left(1-\sqrt{1-\kappa_\beta^2}\right)\frac{\pi}{2}. \quad (69)$$

Again using (63), we obtain the desired bound.

Equation (27): 1.) $-\ln\left(\sqrt{2}\cosh 2\beta\right) + 2\beta\tanh 2\beta \le 2\beta_c(\beta-\beta_c) + \sqrt{2}\,\beta_c - \ln 2$.

Expanding the LHS into a Taylor series around $\beta_c$, we observe that the second derivative is smaller than zero for $4\beta\tanh 2\beta > 1$ (which holds for $\beta > \beta_c$), so for an upper bound the series can be truncated after the first-order term.

2.) $-\ln\left(\sqrt{2}\cosh 2\beta\right) + 2\beta\tanh 2\beta \le -\frac{\beta}{\sinh 2\beta\cosh 2\beta} + \ln\sqrt{2}$.

Factoring out $e^{2\beta}$ from the definition of $\cosh 2\beta$, the LHS can be equated to $\beta\left(2\tanh 2\beta - 2\right) + \ln\frac{\sqrt{2}}{1+e^{-4\beta}}$. Moreover,

$$2\tanh 2\beta - 2 + \frac{1}{\sinh 2\beta\cosh 2\beta} = \frac{e^{-4\beta}}{\sinh 2\beta\cosh 2\beta}. \quad (70)$$

It follows that

$$-\ln\left(\sqrt{2}\cosh 2\beta\right) + 2\beta\tanh 2\beta + \frac{\beta}{\sinh 2\beta\cosh 2\beta} \le \frac{\beta e^{-4\beta}}{\sinh 2\beta\cosh 2\beta} - \ln\left[1+e^{-4\beta}\right] + \ln\sqrt{2} \le \ln\sqrt{2}. \quad (71)$$

The last relation was obtained using the fact that the sum of the first two terms does not exceed zero. To show this, we modify the first term as follows:

$$\frac{\beta e^{-4\beta}}{\sinh 2\beta\cosh 2\beta} = \frac{2\beta e^{-4\beta}}{\sinh 4\beta} \le \frac{2\beta e^{-4\beta}}{4\beta} = \frac{e^{-4\beta}}{2}. \quad (72)$$

Now we have the inequality

$$\frac{e^{-4\beta}}{2} \le \ln\left[1+e^{-4\beta}\right] = \frac{e^{-4\beta}}{2} + \frac{e^{-4\beta}}{2} - \frac{e^{-8\beta}}{2} + \sum_{n=2}^{\infty}\left[\frac{1}{e^{4\beta(2n-1)}(2n-1)} - \frac{1}{e^{4\beta\cdot 2n}\, 2n}\right], \quad (73)$$


since on the RHS the terms after the first $e^{-4\beta}/2$ are pairwise non-negative (we expanded $\ln(1+x)$, cf. [BS]). Thus (71) holds.

Equation (28): The function $s(x)$ can be rewritten as follows:

$$s(x) = -\frac{1+x}{2}\ln\frac{1+x}{2} - \frac{1-x}{2}\ln\frac{1-x}{2} = \ln 2 - \frac{1}{2}\left[(1+x)\ln(1+x) + (1-x)\ln(1-x)\right]. \quad (74)$$

The expression in square brackets is expanded (see again [BS]) and bounded below:

$$[\;] = (1+x)\sum_{n=1}^{\infty}(-1)^{n+1}\frac{x^n}{n} - (1-x)\sum_{n=1}^{\infty}\frac{x^n}{n} = \sum_{n=1}^{\infty}\left[(-1)^{n+1}\frac{x^n}{n} - \frac{x^n}{n}\right] + x\sum_{n=1}^{\infty}\left[(-1)^{n+1}\frac{x^n}{n} + \frac{x^n}{n}\right] = -\sum_{n=1}^{\infty}\frac{x^{2n}}{n} + 2\sum_{n=1}^{\infty}\frac{x^{2n}}{2n-1} = \sum_{n=1}^{\infty}\frac{x^{2n}}{2n^2-n} \ge x^2 + \frac{x^4}{6}. \quad (75)$$

This bound is possible since all the coefficients in the sum are positive. Together with (18) for $m_\beta$ we thus obtain

$$s(m_\beta) \le \ln 2 - \frac{\left(1-\sinh^{-4}2\beta\right)^{1/4}}{2} - \frac{\left(1-\sinh^{-4}2\beta\right)^{1/2}}{12}. \quad (76)$$

Equation (29): We have

$$\frac{ds(m_\beta)}{d\beta} = \frac{\sinh^{-4}2\beta}{\tanh 2\beta}\, m_\beta^{-7}\,\frac{1}{2}\,\ln\frac{1-m_\beta}{1+m_\beta}. \quad (77)$$

Expanding the logarithm into a series (see e.g. [BS]), we obtain the following bound:

$$m_\beta^{-1}\,\frac{1}{2}\,\ln\frac{1-m_\beta}{1+m_\beta} = -1 - \sum_{n=1}^{\infty}\frac{m_\beta^{2n}}{2n+1} \le -1. \quad (78)$$

From this the lemma follows.

Proof of Lemma 3.5: In order to show that

$$\frac{\left(1-(1+y^2)^{-4}\right)^{1/4}}{2} > \frac{y^2}{2}\,\ln\frac{2+y^2}{y^2}, \quad 0 \le y \le \frac{1}{2}, \quad (79)$$

we show that the LHS is greater than $\frac{3}{5}y$ and the RHS is smaller than $\frac{3}{5}y$. So for the RHS we have to show

$$\frac{5}{6}\,y\,\ln\frac{2+y^2}{y^2} < 1. \quad (80)$$


We only need $y \le 1/2$. In this case $2 + y^2 \le 9/4$, and thus we also have

$$\frac{5}{6}\,y\,\ln\frac{2+y^2}{y^2} \le \frac{5}{6}\,y\,\ln\frac{9}{4y^2} = -\frac{5}{3}\,y\,\ln\frac{2}{3}y. \quad (81)$$

By equating the first derivative to zero we obtain the maximum of the function $-y\ln(2y/3)$ at $y = 3/(2e)$. Hence

$$-\frac{5}{3}\,y\,\ln\frac{2}{3}y \le \frac{5}{3}\,\frac{3}{2e} < 1. \quad (82)$$

With this, (80) is shown for $y \le 1/2$. For the LHS of (79) one has to prove

$$\frac{\left(1-(1+y^2)^{-4}\right)^{1/4}}{y} > \frac{6}{5}. \quad (83)$$

The LHS is modified as follows:

$$= \sqrt[4]{\frac{(1+y^2)^4 - 1}{y^4(1+y^2)^4}} = \frac{1}{1+y^2}\,\sqrt[4]{\frac{y^8 + 4y^6 + 6y^4 + 4y^2}{y^4}} \ge \frac{\sqrt[4]{4y^{-2}}}{1+y^2}. \quad (84)$$

Since in the last expression the numerator is monotonically decreasing and the denominator increasing, for a lower bound it suffices to evaluate the expression at the greatest $y$:

$$\frac{\sqrt[4]{4y^{-2}}}{1+y^2} \ge \frac{\sqrt[4]{4\left(\tfrac{1}{2}\right)^{-2}}}{1+\tfrac{1}{4}} = \frac{8}{5} > \frac{6}{5}, \quad y \le \frac{1}{2}. \quad (85)$$

Acknowledgments

This work was mainly done at the Max-Planck Institute for Mathematics in Leipzig. I.E. thanks Jürgen Jost for general support and Ulrich Steinmetz for discussions. N.A. also thanks the Santa Fe Institute. We are grateful to one of the anonymous referees for considerably improving the manuscript by pointing out the importance of excess entropy and specifying relevant literature.

References

[Am] Amari, Shun-ichi: Information geometry on hierarchy of probability distributions, IEEE Trans. Inform. Theory 47 (2001) 1701-1711.
[AN] Amari, Shun-ichi; Nagaoka, Hiroshi: Methods of Information Geometry, AMS Translations of Mathematical Monographs 191, Oxford University Press, Oxford 2000.
[Ar] Arnold, Dirk V.: Information-theoretic analysis of phase transitions, Complex Systems 10 (1996) 143-155.
[Ay1] Ay, Nihat: An information geometric approach to a theory of pragmatic structuring, The Annals of Probability 30 1 (2002) 416-436.
[Ay2] Ay, Nihat: Locality of global stochastic interaction in directed acyclic networks, Neural Computation 14 12 (2002) 2959-2980.
[AK] Ay, Nihat; Knauf, Andreas: Maximizing multi-information, Max-Planck-Institute of Mathematics in the Sciences Preprint no. 42/2003 (2003).
[Ba] Bauer, Heinz: Probability Theory, Walter de Gruyter, Berlin, New York 1996.
[Be] Bell, A.J.: The co-information lattice, Preprint Redwood Neuroscience Institute RNI-TR-02-1 (2002).
[BNT] Bialek, W.; Nemenman, I.; Tishby, N.: Predictability, complexity, and learning, Neural Computation 13 (2001) 2409-2463.
[BP] Binder, P.M.; Plazas, J.: Multiscale analysis of complex systems, Physical Review E 63 (2001) 065203R.
[BS] Bronstein, I.N.; Semendjajew, K.A.: Taschenbuch der Mathematik, BSB B.G. Teubner Verlagsgesellschaft, Leipzig 1989.
[Ca] Catlin, D.E.: Estimation, Control, and the Discrete Kalman Filter, Springer-Verlag, New York 1989.
[Ch] Choquet, Gustave: Lectures on Analysis, Vol. I-III, W.A. Benjamin, London 1969.
[CT] Cover, Thomas M.; Thomas, Joy A.: Elements of Information Theory, John Wiley & Sons, Inc., New York 1991.
[CY] Crutchfield, J.P.; Young, K.: Computation at the onset of chaos, in: Complexity, Entropy and the Physics of Information, Zurek, W.H. (ed.), Addison-Wesley Publishing Company, Reading, Mass. 1990.
[CF1] Crutchfield, J.P.; Feldman, D.P.: Statistical complexity of simple one-dimensional spin systems, Physical Review E 55 (1997) R1239-R1242.
[CF2] Crutchfield, J.P.; Feldman, D.P.: Regularities unseen, randomness observed: Levels of entropy convergence, Chaos 13 (2003) 25-54.
[Do] Dobrushin, R.L.: The description of a random field by means of conditional probabilities and condition of its regularities, Th. Prob. Appl. 13 (1968) 458-486.
[Eb] Ebeling, W.: Prediction and entropy of nonlinear dynamical systems and symbolic sequences with LRO, Physica D 109 (1997) 42-52.
[FC1] Feldman, D.P.; Crutchfield, J.P.: Discovering noncritical organization: statistical mechanical, information theoretic, and computational views of patterns in one-dimensional spin systems, Santa Fe Institute Working Paper 98-04-026.
[FC2] Feldman, D.P.; Crutchfield, J.P.: Structural information in two-dimensional patterns: Entropy convergence and excess entropy, Physical Review E 67 (2003) 051104.
[Ge] Georgii, Hans-Otto: Gibbs Measures and Phase Transitions, Walter de Gruyter, Berlin, New York 1988.
[Gr] Grassberger, P.: Toward a quantitative theory of self-generated complexity, International Journal of Theoretical Physics 25 (1986) 907-938.
[GL] Greenfield, Elliot; Lecar, Harold: Mutual information in a dilute, asymmetric neural network model, Physical Review E 63 (2001) 041905.
[Gr] Griffeath, D.: Introduction to random fields, in: Kemeny, J.G.; Snell, J.L.; Knapp, A.W.: Denumerable Markov Chains, Springer, New York, Heidelberg, Berlin 1976.
[HKO] Hyvarinen, A.; Karhunen, J.; Oja, E.: Independent Component Analysis, John Wiley 2001.
[Is] Israel, Robert B.: Convexity in the Theory of Lattice Gases, Princeton University Press, Princeton 1979.
[J1] Jaynes, E.T.: Information theory and statistical mechanics, Physical Review 106 (1957) 620-630.
[Li1] Li, Wentian: Mutual information versus correlation functions, Journal of Statistical Physics 60 5/6 (1990) 823-837.
[Li2] Li, Wentian: On the relationship between complexity and entropy for Markov chains and regular languages, Complex Systems 5 (1991) 381-399.
[L] Linsker, R.: Self-organization in a perceptual network, IEEE Computer 21 (1988) 105-117.
[LR] Lanford, O.E.; Ruelle, D.: Observables at infinity and states with short range correlations in statistical mechanics, Comm. Math. Phys. 13 (1969) 194-215.
[LF] Luque, Bartolo; Ferrera, Antonio: Measuring mutual information in random Boolean networks, Complex Systems 12 (2001) 241-246.
[MKNYM] Matsuda, H.; Kudo, K.; Nakamura, R.; Yamakawa, O.; Murata, T.: Mutual information of Ising systems, International Journal of Theoretical Physics 35 4 (1996) 839-845.
[MW] McCoy, Barry M.; Wu, Tai Tsun: The Two-Dimensional Ising Model, Harvard University Press, Cambridge, Massachusetts 1973.
[Pa] Pathria, R.K.: Statistical Mechanics, Butterworth-Heinemann, Oxford 1996.
[Pe] Petritis, Dimitri: Thermodynamic formalism of neural computing, in: Dynamics of Complex Interacting Systems, Goles, E.; Servet, M. (eds.), Kluwer Academic Publishers, Dordrecht, Boston, London 1996.
[R] Ruelle, David: Statistical Mechanics - Rigorous Results, World Scientific, Singapore / Imperial College Press, London 1999.
[Sh] Shannon, C.E.: A mathematical theory of communication, Bell System Tech. J. 27 (1948) 379-423, 623-656.
[S] Simon, Barry: The Statistical Mechanics of Lattice Gases, Princeton University Press, Princeton 1993.
[SV] Studeny, M.; Vejnarova, J.: The multiinformation function as a tool for measuring stochastic dependence, in: Jordan, M.I. (ed.), Learning in Graphical Models, Kluwer, Dordrecht 1998.
[TSE] Tononi, Giulio; Sporns, Olaf; Edelman, Gerald: A measure for brain complexity: Relating functional segregation and integration in the nervous system, Proc. Natl. Acad. Sci. USA 91 (1994) 5033-5037.
[Wa] Wannier, Gregory H.: Statistical Physics, Dover Publications, New York 1987.
