
GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES

ERWIN BOLTHAUSEN⋆, SHUTA NAKAJIMA◦, NIKE SUN†, AND CHANGJI XU‡

Abstract. We consider the Ising perceptron model with $N$ spins and $M = \alpha N$ patterns, with a general activation function $U$ that is bounded above. For $U$ bounded away from zero, or $U(x) = \mathbf{1}\{x \ge \kappa\}$, it was shown by Talagrand [Tal00, Tal11b] that for small densities $\alpha$, the free energy of the model converges as $N \to \infty$ to the replica symmetric formula conjectured in the physics literature [KM89] (see also [GD88]). We give a new proof of this result, which covers the more general class of all functions $U$ that are bounded above and satisfy a certain variance bound. The proof uses the (first and second) moment method conditional on the approximate message passing iterates of the model. In order to deduce our main theorem, we also prove a new concentration result for the perceptron model in the case where $U$ is not bounded away from zero.

Contents

1. Introduction
2. First moment conditional on AMP
3. Technical estimates
4. Analysis of first moment
5. Second moment conditional on AMP
6. Local central limit theorem
7. Concentration of partition function
Appendix A. Review of AMP for perceptron
References

1. Introduction

1.1. Overview. We study a class of generalized Ising perceptron models, defined as follows. Let $\mathbf{M}$ be an $M \times N$ matrix with i.i.d. standard gaussian entries. Let $U : \mathbb{R} \to [0,\infty)$ be a bounded measurable function (the activation function) and denote $u \equiv \log U : \mathbb{R} \to [-\infty,\infty)$. The associated Ising perceptron partition function is
$$\mathbf{Z} \equiv \mathbf{Z}(\mathbf{M}) \equiv \sum_{J} \exp\bigg\{ \sum_{a \le M} u\bigg( \frac{(e^a)^{\mathsf t}\mathbf{M}J}{N^{1/2}} \bigg) \bigg\}\,, \tag{1.1}$$
where the sum goes over $J \in \{-1,+1\}^N$. The $J_i$ are called the spins, while the vectors $g^a$ (the rows of $\mathbf{M}$) are called the patterns. A special case is the half-space intersection model defined by the function $U(x) = \mathbf{1}\{x \ge \kappa\}$, where $\kappa \in \mathbb{R}$ is a fixed parameter. In this paper we develop a method to compute the asymptotic free energy of the generalized model (1.1) with $M = \alpha N$ for small $\alpha$, and $N \to \infty$. Note that if $U$ is scaled by any factor $c$, then the partition function (1.1) is scaled by $c^M$ — therefore, since we assume $U$ is bounded, we may as well assume that $U$ maps into $[0,1]$. More precisely, we work throughout under the following:

Assumption 1. The function $U$ is a measurable mapping from $\mathbb{R}$ into $[0,1]$. Moreover, with $\mathbb{E}_\xi$ denoting expectation over the law of a standard gaussian random variable $\xi$, we have
$$\mathbb{E}_\xi[\xi\,U(\xi)] = \int z\,U(z)\,\varphi(z)\,dz \ne 0\,, \tag{1.2}$$

Date: November 5, 2021.
⋆Institute of Mathematics, University of Zurich. ◦Department of Mathematics and Computer Science, University of Basel. †Department of Mathematics, Massachusetts Institute of Technology. ‡Center for Mathematical Sciences and Applications, Harvard University.


where $\varphi$ denotes the standard gaussian density.

Assumption 2. Writing $\mathbb{E}_{\xi,\xi'}$ for expectation over i.i.d. standard gaussians $\xi, \xi'$, the quantity
$$(C_2)_1(U) \equiv \max\bigg\{ 1,\ \sup\bigg\{ \frac{\mathbb{E}_{\xi,\xi'}\big[(\xi-\xi')^2\,U(x+c\xi)\,U(x+c\xi')\big]}{\mathbb{E}_{\xi,\xi'}\big[U(x+c\xi)\,U(x+c\xi')\big]} : x \in \mathbb{R},\ \frac{2}{5} \le c \le \frac{7}{3} \bigg\} \bigg\}$$
is finite. This assumption implies that the quantity
$$C_2(U) \equiv \max\bigg\{ 1,\ \sup\bigg\{ \frac{\mathbb{E}_{\xi,\xi'}\big[(\xi-\xi')^2\,U(x+c\xi)\,U(x+c\xi')\big]}{\mathbb{E}_{\xi,\xi'}\big[U(x+c\xi)\,U(x+c\xi')\big]} : x \in \mathbb{R},\ \frac{1}{2} \le c \le 2 \bigg\} \bigg\}$$
is also finite, and indeed $C_2(U) \le (C_2)_1(U)$. (The bound on $(C_2)_1(U)$ further ensures that $C_2(U_\delta)$ is bounded, where $U_\delta$ is a smoothed approximation of $U$; see Lemma 3.9.)

See Remark 1.3 below for more discussion on the above assumptions — in particular, we will explain that assumption (1.2) only rules out an easier case of the problem. To state our main results we introduce some further notation. As above, let $\xi$ denote an independent standard gaussian random variable, and let $\mathbb{E}_\xi$ denote expectation over the law of $\xi$. Given $q \in [0,1)$ let
$$L_q(x) \equiv \log \mathbb{E}_\xi\,U\Big( x + (1-q)^{1/2}\xi \Big) = \log \int U\Big( x + (1-q)^{1/2}z \Big)\varphi(z)\,dz\,, \tag{1.3}$$
where $\varphi$ denotes the standard gaussian density as above. Let
$$F_q(x) = (L_q)'(x) = \frac{1}{(1-q)^{1/2}} \cdot \frac{\mathbb{E}_\xi\big[\xi\,U(x + (1-q)^{1/2}\xi)\big]}{\mathbb{E}_\xi\,U(x + (1-q)^{1/2}\xi)}\,. \tag{1.4}$$
We will sometimes abbreviate $L \equiv L_q$ and $F \equiv F_q$.
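Since later sketches refer to $L_q$ and $F_q$ numerically, we record a minimal numerical sketch here (our own illustration, not part of the paper): it evaluates (1.3) and (1.4) by Gauss–Hermite quadrature, taking the half-space activation as the default example. All names (`U_halfspace`, `L_q`, `F_q`) are ours.

```python
import numpy as np

# Gauss-Hermite rule for E[f(Z)], Z standard gaussian:
# E[f(Z)] = sum_k w_k f(sqrt(2) z_k) / sqrt(pi)
_nodes, _weights = np.polynomial.hermite.hermgauss(80)
_z = np.sqrt(2.0) * _nodes
_w = _weights / np.sqrt(np.pi)

def U_halfspace(x, kappa=0.0):
    """Half-space activation U(x) = 1{x >= kappa}."""
    return (np.asarray(x) >= kappa).astype(float)

def L_q(x, q, U=U_halfspace):
    """L_q(x) = log E_xi U(x + (1-q)^{1/2} xi), cf. (1.3)."""
    s = np.sqrt(1.0 - q)
    vals = np.array([np.dot(_w, U(xi + s * _z)) for xi in np.atleast_1d(x)])
    return np.log(vals)

def F_q(x, q, U=U_halfspace):
    """F_q(x) = (L_q)'(x), via the gaussian ratio in (1.4)."""
    s = np.sqrt(1.0 - q)
    x = np.atleast_1d(x)
    num = np.array([np.dot(_w, _z * U(xi + s * _z)) for xi in x])
    den = np.array([np.dot(_w, U(xi + s * _z)) for xi in x])
    return num / (s * den)
```

For discontinuous $U$ the quadrature is crude; a Monte Carlo average over $\xi$ would serve equally well in this sketch.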

Proposition 1.1 (proved in Section 3). If $U$ satisfies Assumption 1, then there exists a positive constant $\alpha_0(U) > 0$ such that for all $0 < \alpha \le \alpha_0(U)$ there exists a unique pair $(q,\psi) \in [0, 1/25] \times [0,\infty)$ satisfying
$$\begin{pmatrix} q \\ \psi \end{pmatrix} = \begin{pmatrix} q(\psi) \\ \alpha\Lambda(q) \end{pmatrix} \equiv \begin{pmatrix} \mathbb{E}[\tanh(\psi^{1/2}Z)^2] \\ \alpha\,\mathbb{E}[F_q(q^{1/2}Z)^2] \end{pmatrix}\,, \tag{1.5}$$
where $F_q$ is defined by (1.4) (and depends on $U$). Moreover we can take
$$\alpha_0(U) \equiv \frac{1}{4^{10} \cdot c_1 \cdot C_1(U)^6 \cdot C_2(U)^4}\,, \tag{1.6}$$
where $c_1$ is an absolute constant characterized by Lemma 3.7 and Corollary 3.8, and $C_1(U)$ is a finite constant depending only on $U$ which is characterized by Lemma 3.3. The solution $(q,\psi)$ of (1.5) satisfies
$$\frac{(\mathbb{E}_\xi[\xi\,U(\xi)])^2}{2} \le \frac{q}{\alpha} \le \frac{\psi}{\alpha} \le 3 \cdot C_1(U)^2 \tag{1.7}$$
for all $0 \le \alpha \le \alpha_0(U)$.

For any $U$ and $\alpha$ such that (1.5) has a unique solution $(q,\psi) \in [0,1) \times [0,\infty)$, the replica symmetric formula for the free energy of the corresponding perceptron model (1.1) is given by
$$\mathrm{RS} \equiv \mathrm{RS}(\alpha; U) = -\frac{\psi(1-q)}{2} + \mathbb{E}\bigg[ \log\Big( 2\cosh(\psi^{1/2}Z) \Big) + \alpha\,L_q(q^{1/2}Z) \bigg]\,, \tag{1.8}$$
where the expectation is over an independent standard gaussian random variable $Z$. In this paper we show:

Theorem 1.2 (main theorem). If $U$ satisfies Assumptions 1 and 2, then there exists a positive constant $\alpha_1 = \alpha_1(U) > 0$ such that, if $\mathbf{M}$ is an $M \times N$ matrix with i.i.d. standard gaussian entries and $M/N \to \alpha$ with $0 \le \alpha \le \alpha_1(U)$, then for the (generalized) Ising perceptron model (1.1) we have
$$\lim_{N\to\infty} \frac{1}{N}\log \mathbf{Z}(\mathbf{M}) = \mathrm{RS}(\alpha; U)\,,$$


where the limit is in probability. Moreover we can take
$$\alpha_1(U) \equiv \frac{1}{4^{16} \cdot c_1 \cdot C_1(U)^6 \cdot (C_2)_1(U)^4} \le \frac{\alpha_0(U)}{4^6}\,, \tag{1.9}$$
for $\alpha_0(U)$ as defined by (1.6).
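As an illustration of how (1.5) and (1.8) can be evaluated in practice, here is a naive damped fixed-point iteration followed by a quadrature evaluation of the RS formula, reusing the helpers from the sketch above. This is our own illustrative code (the names `solve_fixed_point` and `rs_formula` are ours, not the paper's); Proposition 1.1 guarantees uniqueness of the fixed point only for small $\alpha$, so the sketch assumes $\alpha$ small.

```python
def solve_fixed_point(alpha, U=U_halfspace, iters=200):
    """Damped iteration for (1.5): q = E[tanh(psi^{1/2} Z)^2],
    psi = alpha * E[F_q(q^{1/2} Z)^2].  Intended for small alpha."""
    q, psi = 0.01, 0.01
    for _ in range(iters):
        q_new = np.dot(_w, np.tanh(np.sqrt(psi) * _z) ** 2)
        psi_new = alpha * np.dot(_w, F_q(np.sqrt(q) * _z, q, U) ** 2)
        q, psi = 0.5 * (q + q_new), 0.5 * (psi + psi_new)
    return q, psi

def rs_formula(alpha, U=U_halfspace):
    """Replica symmetric free energy (1.8) at the solution of (1.5)."""
    q, psi = solve_fixed_point(alpha, U)
    term1 = -psi * (1.0 - q) / 2.0
    term2 = np.dot(_w, np.log(2.0 * np.cosh(np.sqrt(psi) * _z)))
    term3 = alpha * np.dot(_w, L_q(np.sqrt(q) * _z, q, U))
    return term1 + term2 + term3
```

Consistently with (1.8), the output of `rs_formula(alpha)` approaches $\log 2$ as $\alpha \to 0$.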

For $U$ bounded away from zero, as well as for $U(x) = \mathbf{1}\{x \ge \kappa\}$, the result of Theorem 1.2 was previously shown by Talagrand [Tal00, Tal11b]. Our proof is very different from Talagrand's, and uses the idea of "conditioning on the AMP iteration," as previously introduced by [DS18, Bol19] (see also [AS20, FW21, BY21]). By contrast, Talagrand's proof uses an interpolation approach, which seemingly necessitates more conditions on $U$. Our result extends to the more general class of functions $U$ satisfying Assumptions 1 and 2. See §1.2.3 below for further discussion and comparison.

Remark 1.3. We make some further comments on our assumptions:
1. From our perspective, Assumption 1 is relatively mild. It may be possible to relax the condition $U \le 1$ to accommodate functions $U(x)$ that do not grow too quickly in $|x|$, but we will not pursue this here. Next, if the condition (1.2) fails — meaning that $\mathbb{E}_\xi[\xi\,U(\xi)] = 0$ — then the fixed-point equation (1.5) is solved by $q = \psi = 0$, and the replica symmetric free energy (1.8) reduces to the annealed free energy
$$\mathrm{ann}(\alpha; U) = \frac{1}{N}\log \mathbb{E}\mathbf{Z}(\mathbf{M}) = \log 2 + \alpha \log \mathbb{E}U(\xi)\,. \tag{1.10}$$
In this case, it is known that the limiting free energy can be obtained by a direct first and second moment method approach, without the need of a conditioning scheme. This is done for the case of symmetric $U$ by [APZ19], and the argument of that paper can be extended to cover the case $\mathbb{E}_\xi[\xi\,U(\xi)] = 0$. Moreover it is expected that this case may be more tractable to analyze for finer properties of the solution space, following [PX21, ALS21] (further discussed in §1.2.5 below).
2. We view Assumption 2 as the slightly more restrictive condition, although we will show (by straightforward arguments) that it holds if $U$ is bounded away from zero, compactly supported, or logconcave (see Proposition 1.4 below). Moreover, Assumption 2 is essentially necessary to ensure that the function $F_q$ in (1.4) is Lipschitz — this is by an easy calculation, which we give in Lemma 3.14. This allows us to use existing results on AMP and state evolution ([BM11, Bol14]; see §1.3 and §2.1), which all require the message-passing functions to be Lipschitz.

Assumption 1 holds throughout this paper, even if not explicitly stated. However, we will point out explicitly each place where Assumption 2 is used.

Proposition 1.4 (proved in §3.4). Suppose $U$ satisfies Assumption 1. If in addition $U$ is bounded away from zero, compactly supported, or logconcave, then $U$ also satisfies Assumption 2.

1.2. Background and related work. In this subsection we give some background on the perceptron model, and survey the related work. Some high-level discussion of key ideas in this paper is given in §1.2.3–1.2.4.

The perceptron problem originates from a toy model of a single-layer neural network, as follows. Suppose we have $N+1$ input nodes, labelled $0 \le j \le N$. Likewise we have $N+1$ output nodes, labelled $0 \le i \le N$. For all $i \ne j$, between the $j$-th input node and the $i$-th output node there is an edge weight $w_{i,j}$, to be determined. It will be convenient to fix $w_{i,i} \equiv 0$ for all $i$. The system is given $M$ input "patterns" $g^1, \ldots, g^M$, which are vectors in $\mathbb{R}^{N+1}$. We then say that the system memorizes the pattern $g^a$ if
$$\mathrm{sgn}\bigg( \sum_{j=0}^{N} w_{i,j}\,(g^a)_j \bigg) = \mathrm{sgn}\Big( (g^a)_i \Big) \tag{1.11}$$
for all $0 \le i \le N$. One can then ask, given $M = \alpha N$ i.i.d. random patterns, whether there exists a choice of edge weights $w$ such that the system memorizes all $M$ patterns. The storage capacity $\alpha_{\mathrm{c}}$ of the model is the supremum of all $\alpha = M/N$ for which memorization of all $M$ given patterns is possible with probability $1 - o_N(1)$. Models of this type have been considered at least since the mid-20th century (e.g. [MP43, Heb49, Lit74, Hop82]).


One can consider the constraint (1.11) separately for each $0 \le i \le N$, and by symmetry it suffices to understand the case $i = 0$. Recall that $w_{0,0} \equiv 0$, and denote $J_i \equiv w_{0,i}$ for $1 \le i \le N$. Denote $g_{a,j} \equiv \mathrm{sgn}((g^a)_0)\,(g^a)_j$ for all $1 \le a \le M$ and $1 \le j \le N$, and note that the $g_{a,j}$ are i.i.d. standard gaussian random variables. Thus (1.11) is equivalent to
$$\frac{1}{N^{1/2}} \sum_{j=1}^{N} g_{a,j}\,J_j \ge \kappa$$
for $\kappa = 0$. Of course, one can then generalize the model by taking a non-zero parameter $\kappa$: taking $\kappa < 0$ weakens the original constraint (1.11), while taking $\kappa > 0$ gives a more restrictive constraint than (1.11). This is equivalent to the model (1.1) with $U(x) = \mathbf{1}\{x \ge \kappa\}$. The two most commonly studied variants of the model are the Ising perceptron where $J_i \in \{-1,+1\}$ (as in this paper), and the spherical perceptron where $J = (J_i)_{i\le N}$ is restricted to the sphere of radius $N^{1/2}$.

1.2.1. Non-rigorous results from statistical physics. In the physics literature, the spherical perceptron with $U(x) = \mathbf{1}\{x \ge \kappa\}$ for $\kappa \ge 0$ was analyzed in a series of celebrated works of Gardner and Derrida [Gar87, Gar88, GD88, GD89], using the non-rigorous replica method. This method also applies to the Ising perceptron with $U(x) = \mathbf{1}\{x \ge \kappa\}$ for any $\kappa \in \mathbb{R}$, but the original Gardner–Derrida analysis contained an error leading to incorrect predictions. A corrected replica calculation for the Ising model was first given by Krauth and Mézard [KM89]. The same results were rederived using the cavity method by Mézard [Mez89]. (While the replica and cavity methods are both non-rigorous, the cavity method may be generally considered to yield more transparent derivations.)

The Gardner–Derrida and Krauth–Mézard results cover the replica symmetric regime, where the system is expected to exhibit some form of correlation decay. The spherical perceptron with $U(x) = \mathbf{1}\{x \ge \kappa\}$ (also called the positive spherical perceptron) is expected to be replica symmetric only for $\kappa \ge 0$, whereas the Ising perceptron with $U(x) = \mathbf{1}\{x \ge \kappa\}$ is expected to be replica symmetric for all $\kappa \in \mathbb{R}$. More recently there have been several works in the physics literature investigating the negative spherical perceptron and its potential consequences in statistical applications, e.g. [FP16, FPS+17].

1.2.2. Rigorous results on the spherical perceptron. The mathematical literature contains numerous very strong results on the spherical perceptron for $U(x) = \mathbf{1}\{x \ge \kappa\}$, especially for $\kappa \ge 0$ (conjecturally the replica symmetric regime). For $\kappa = 0$, the storage capacity $\alpha_{\mathrm{c}} = 2$ has been known since the 1960s [Wen62, Cov65]. For general $\kappa \ge 0$, the storage capacity $\alpha_{\mathrm{c}}(\kappa)$ was proved by a short and elegant argument [Sto13], using convex duality together with Gordon's gaussian minimax comparison inequality [Gor85, Gor88]. However, perhaps the most striking result for this model is that of Shcherbina and Tirozzi [ST03], proving the Gardner free energy formula for the spherical perceptron for all $\kappa \ge 0$ and all $\alpha$ up to $\alpha_{\mathrm{c}}(\kappa)$. The proof of [ST03] makes crucial use of the classical Brunn–Minkowski inequality for volumes of bodies in euclidean space [Lus35, HO56]. The main result of [ST03] was reproved by Talagrand ([Tal11a, Ch. 3] and [Tal11b, Ch. 8]) with a perhaps slightly simpler argument, using instead the functional Brunn–Minkowski (Prékopa–Leindler) inequality [Pre71, Lei72, Pre73]. This inequality implies concentration of Lipschitz functionals under strongly logconcave measures [Mau91], which can be used to deduce concentration of overlaps and cavity equations (see e.g. [Tal11a, Thm. 3.1.11]).¹ As noted by [ST03] and [Tal11a, §3.4], similar concentration results can also be obtained using instead the Brascamp–Lieb inequality [BL76]; and indeed this idea appears in earlier work on the Hopfield model [BG98]. Thus, all existing results on the positive spherical perceptron (excluding the case $\kappa = 0$) use powerful tools from convex geometry.²

1.2.3. Rigorous results on the Ising perceptron. The mathematical literature on the Ising perceptron is far less advanced than for the spherical perceptron. For the half-space model, the free energy was computed heuristically by [KM89]; their method applies also to the more general model (1.1). One consequence of the [KM89] calculation is an explicit prediction $\alpha_\star$ for the storage capacity $\alpha_{\mathrm{c}}$ for the model $U(x) = \mathbf{1}\{x \ge \kappa\}$ — for $\kappa = 0$, the conjectured threshold $\alpha_\star$ is approximately 0.83.

¹In this work we have also used the result of [Mau91] (restated in Theorem 3.12), but only to prove Proposition 1.4, which is not required for the main theorem.
²The Prékopa–Leindler inequality generalizes the Brunn–Minkowski inequality, and can also be used to deduce the Brascamp–Lieb inequality [BL00]. For more on the relations among these inequalities we refer to the survey [Gar02].


In the rigorous literature, most existing results concern the half-space model $U(x) = \mathbf{1}\{x \ge 0\}$.³ For this model, it was shown by [KR98, Tal99a] that there is a small absolute constant $\epsilon > 0$ such that the transition must occur between $\epsilon$ and $1-\epsilon$: that is, the partition function (1.1) is non-zero with high probability for $\alpha \le \epsilon$, and zero with high probability for $\alpha \ge 1-\epsilon$. A more recent work [DS18] (further discussed below) uses some of the methods of this paper to show, under a certain variational hypothesis, that the partition function is non-zero with non-negligible probability for $\alpha < \alpha_\star$, where $\alpha_\star$ is the conjectured threshold from [KM89]. The recent work [Xu21] confirms that the model indeed has a sharp threshold.⁴

For the situation where we have a more general function $U$ in (1.1), Talagrand [Tal00] (see also [Tal11a, Ch. 2]) proves that the limiting free energy is given by the replica symmetric formula (1.8), for small enough $\alpha$, under the assumption that the function $u \equiv \log U$ is uniformly bounded. This corresponds to the case of our main result Theorem 1.2 where $u$ is bounded, which we prove at the end of Section 5. Even for bounded $u$, the two proofs are very different: [Tal00] uses an interpolation method to derive replica symmetric equations, while this paper uses first and second moments conditional on the AMP iteration. We remark also that the argument of [Tal00] seemingly needs to go through a smoothed approximation of $u$, while our proof for bounded $u$ requires no smoothing.

In comparison with previous work of Talagrand, the main new result of this work is that the limiting free energy is given by the replica symmetric formula (1.8), for small enough $\alpha$, for all $U$ satisfying Assumptions 1 and 2. A special case of this result, for the half-space model $U(x) = \mathbf{1}\{x \ge \kappa\}$, was previously obtained in [Tal11a, Ch. 9] (with partial results appearing in a previous work [Tal99b]).⁵ Talagrand's proof for the half-space model relies crucially on an estimate [Tal11b, Thm. 8.2.4] which says roughly that if $(u_i)_{i\le n}$ is a near-isotropic gaussian process, then the fraction of indices $i$ where $u_i \ge \kappa$ cannot be too small. The proof of this estimate uses a gaussian comparison inequality (see [Tal11a, Lem. 1.3.1] and [Tal11b, Propn. 8.2.2]), and does not extend for instance to the event $u_i \in A$ where $A$ is a bounded measurable subset of $\mathbb{R}$. In this paper we prove an analogous (weaker) estimate for general $A$ by different methods (Proposition 7.1), and use this in the proof of Theorem 1.2 in the case of unbounded $u$.

1.2.4. TAP, AMP, and conditioning. The main idea in the proof of Theorem 1.2, which we discuss further in §1.3 below, is to compute (first and second) moments of the partition function (1.1) conditional on the AMP filtration. The motivation originates from the TAP (Thouless–Anderson–Palmer) framework, which was introduced for the classical Sherrington–Kirkpatrick model [SK75] by [TAP77] (and further investigated by [dAT78, Ple82]). For the model (1.1), the TAP equations read
$$m \equiv \tanh(H) = \tanh\bigg( \frac{\mathbf{M}^{\mathsf t}n}{N^{1/2}} - b\,m \bigg)\,, \tag{1.12}$$
$$n \equiv F(h) = F\bigg( \frac{\mathbf{M}m}{N^{1/2}} - d\,n \bigg)\,, \tag{1.13}$$
where the functions $\tanh$ and $F$ are applied coordinatewise, $m \equiv \tanh(H)$ is a vector in $\mathbb{R}^N$, and $n \equiv F(h)$ is a vector in $\mathbb{R}^M$. For the model (1.1) at small $\alpha$, it is conjectured that the TAP equations (1.12) and (1.13) have a unique solution $(m_\star, n_\star)$, such that $m_\star$ approximates the mean value of a random configuration $J$ sampled from the Gibbs measure
$$\mu(J) \equiv \frac{1}{\mathbf{Z}(\mathbf{M})} \prod_{a\le M} U\bigg( \frac{(e^a)^{\mathsf t}\mathbf{M}J}{N^{1/2}} \bigg)\,. \tag{1.14}$$
Meanwhile, the vector $n_\star$ describes the distribution of the vector $\mathbf{M}J/N^{1/2}$ where $J$ is sampled from $\mu$; see [Mez89]. It is further expected that $N^{-1}\log\mathbf{Z}$ concentrates very well around a TAP free energy $\Phi(m_\star, n_\star)$, which in turn concentrates around the replica symmetric value (1.8). The TAP equations and TAP free energy can be derived as a dense limit of the belief propagation equations and Bethe free energy; see [Mez17]. For more recent work on the TAP framework in a variety of settings, we refer to [CPS18, CPS21, FMM21, AJ21, ABvSY21].

³The existing results for $U(x) = \mathbf{1}\{x \ge 0\}$ can likely be extended to cover $U(x) = \mathbf{1}\{x \ge \kappa\}$ for any $\kappa \in \mathbb{R}$.
⁴To be precise, the result of [DS18] is with gaussian noise $\mathbf{M}$ (as in this paper), while the other results [KR98, Tal99a, Xu21] are for the Bernoulli noise model where the $g_{a,i}$ are i.i.d. symmetric random signs. It is reasonable to expect that the results of [KR98, Tal99a, Xu21] can be transferred to the gaussian noise model.
⁵The function $U(x) = \mathbf{1}\{x \ge \kappa\}$ satisfies the hypothesis of Theorem 1.2: it clearly satisfies Assumption 1, and one can check that it satisfies Assumption 2 either by direct calculation or by applying Proposition 1.4.


As we commented in Remark 1.3 above, if we have $\mathbb{E}_\xi[\xi\,U(\xi)] = 0$ (i.e. if the assumption (1.2) does not hold), then the (unconditional) second moment method can be used to analyze the partition function $\mathbf{Z}$ from (1.1), following [APZ19]. If $\mathbb{E}_\xi[\xi\,U(\xi)] \ne 0$, however, it is well known that the unconditional second moment method does not say anything about the random variable $\mathbf{Z}$, at any positive $\alpha = M/N$. Since the TAP fixed point $(m_\star, n_\star)$ is described by a relatively simple set of equations (1.12) and (1.13), and is conjectured to carry a great deal of information about the random measure (1.14), it is natural to consider the second moment method conditional on the TAP solution $(m_\star, n_\star)$. The problem with this approach is that it is not in fact known that the equations (1.12) and (1.13) have a unique solution. A way around this is to use the AMP (approximate message passing) iteration, which constructs approximate solutions of the TAP equations [BM11, Bol14].

The idea of conditioning on the AMP iteration was introduced by [DS18, Bol19] and has been developed in subsequent works [AS20, FW21, BY21]. Of these prior works, [Bol19] and [BY21] concern the classical Sherrington–Kirkpatrick (SK) model with a gaussian coupling matrix (i.e., the Hamiltonian is a scalar multiple of $J^{\mathsf t}\mathbf{M}J$ where $\mathbf{M}$ is an $N \times N$ matrix with i.i.d. random gaussian entries). The work [FW21] concerns (more general) SK models with random orthogonally invariant coupling matrices, and uses a simplified "memory-free" AMP iteration that was developed and analyzed by [OW01, OCW16, CO19, Fan20]. The works [DS18] and [AS20] concern the perceptron model, but only use the AMP conditioning method to prove lower bounds. In the current work, we show that the AMP conditioning method gives sharp upper and lower bounds for the generalized perceptron (1.1) at small $\alpha$.

1.2.5. Other related work. As noted above, in the special case that $U$ satisfies $\mathbb{E}_\xi[\xi\,U(\xi)] = 0$ (i.e. if assumption (1.2) does not hold), the model (1.1) is mathematically much more tractable, and can be analyzed by an (unconditional) second moment method. The condition $\mathbb{E}_\xi[\xi\,U(\xi)] = 0$ holds for instance if $U$ is a bounded symmetric function. The second moment analysis was done for the cases $U(x) = \mathbf{1}\{|x| \le \kappa\}$ and $U(x) = \mathbf{1}\{|x| \ge \kappa\}$ in [APZ19]. For the model $U(x) = \mathbf{1}\{|x| \le \kappa\}$, much finer structural results (on the typical geometry of the solution space) were obtained by [PX21, ALS21]. These results were inspired in part by questions raised in the physics literature about the algorithmic accessibility of CSP solutions (see e.g. [BBC+16, BRS19]). Finally, for the perceptron model in statistical settings, there is an extensive literature which we will not describe here; we refer the reader for instance to [BKM+19, MZZ21] and many references therein.

1.3. AMP iteration. Our convention throughout is that if $f : \mathbb{R} \to \mathbb{R}$ and $z \equiv (z_j)_j$ is any vector, then
$$f(z) \equiv (f(z_j))_j \tag{1.15}$$
denotes the vector of the same length which results from applying $f$ componentwise to $z$. Recall $F \equiv F_q$ from (1.4). Let $m^{(0)} = 0 \in \mathbb{R}^N$, $n^{(0)} = 0 \in \mathbb{R}^M$, $m^{(1)} = q^{1/2}\mathbf{1} \in \mathbb{R}^N$, $n^{(1)} = (\psi/\alpha)^{1/2}\mathbf{1} \in \mathbb{R}^M$. The approximate message passing (AMP) iteration for the perceptron model is given by (cf. (A.3) and (A.2))
$$m^{(t+1)} \equiv \tanh(H^{(t+1)}) = \tanh\bigg( \frac{\mathbf{M}^{\mathsf t}n^{(t)}}{N^{1/2}} - b\,m^{(t-1)} \bigg)\,, \tag{1.16}$$
$$n^{(t+1)} \equiv F(h^{(t+1)}) = F\bigg( \frac{\mathbf{M}m^{(t)}}{N^{1/2}} - d\,n^{(t-1)} \bigg)\,, \tag{1.17}$$
where $b\,m^{(t-1)}$ and $d\,n^{(t-1)}$ are the Onsager correction terms, whose coefficients are defined by
$$\begin{pmatrix} b \\ d \end{pmatrix} = \begin{pmatrix} \alpha\,\mathbb{E}F'(q^{1/2}Z) \\ \mathbb{E}\tanh'(\psi^{1/2}Z) \end{pmatrix}\,. \tag{1.18}$$
We remark that since $\tanh'(x) = 1 - (\tanh x)^2$, it follows using (1.5) that $d = 1 - q$.
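The iteration (1.16)–(1.18) can be transcribed directly into code. The following is a minimal sketch of ours (not the paper's), reusing the quadrature helpers above; `amp_iterate` and `F_prime` are our names, and the derivative $F'$ in (1.18) is approximated by a central difference.

```python
def F_prime(x, q, U=U_halfspace, eps=1e-4):
    """Numerical derivative of F_q, for the Onsager coefficient b in (1.18)."""
    return (F_q(x + eps, q, U) - F_q(x - eps, q, U)) / (2 * eps)

def amp_iterate(M_mat, alpha, q, psi, t_max, U=U_halfspace):
    """AMP iteration (1.16)-(1.17); M_mat is the M x N gaussian disorder,
    (q, psi) the fixed point from Proposition 1.1, t_max >= 1 steps."""
    Mdim, N = M_mat.shape
    m_prev, n_prev = np.zeros(N), np.zeros(Mdim)              # m^{(0)}, n^{(0)}
    m = np.sqrt(q) * np.ones(N)                               # m^{(1)}
    n = np.sqrt(psi / alpha) * np.ones(Mdim)                  # n^{(1)}
    b = alpha * np.dot(_w, F_prime(np.sqrt(q) * _z, q, U))    # (1.18)
    d = 1.0 - q                                               # = E tanh'(psi^{1/2} Z)
    for _ in range(t_max):
        H = M_mat.T @ n / np.sqrt(N) - b * m_prev             # (1.16)
        h = M_mat @ m / np.sqrt(N) - d * n_prev               # (1.17)
        m_prev, n_prev = m, n
        m, n = np.tanh(H), F_q(h, q, U)
    return m, n, H, h
```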

Recall from the discussion of §1.2.4 that the main idea in the proof of Theorem 1.2 is to compute (first and second) moments of the partition function (1.1) conditional on the AMP filtration
$$\mathscr{F} \equiv \mathscr{F}(t) \equiv \sigma\bigg( \Big( \mathbf{M}m^{(s)},\, n^{(s+1)} : s \le t \Big),\ \Big( \mathbf{M}^{\mathsf t}n^{(\ell)},\, m^{(\ell+1)} : \ell \le t-1 \Big) \bigg) \tag{1.19}$$
in the limit $t \to \infty$. The computation relies on existing results on the asymptotic behavior of AMP in the large-$N$ limit [BM11, Bol14] (see also [DMM09, JM13, RV18, BMN20]). In §2.1 we review the relevant results from [BM11, Bol14] that are used in our proofs. The results from our conditional method of moments calculation are summarized as follows:

Theorem 1.5 (conditional first moment). If $U$ satisfies Assumptions 1 and 2, then there exists a positive constant $\alpha_0(U) > 0$ such that, if $\mathbf{M}$ is an $M \times N$ matrix with i.i.d. standard gaussian entries and $M/N \to \alpha$, and $\mathscr{F}(t)$ is the AMP filtration defined by (1.19), then for all $0 \le \alpha \le \alpha_0(U)$ we have
$$\mathbb{E}\Big( \mathbf{Z}(\mathbf{M})\,\Big|\,\mathscr{F}(t) \Big) \le \exp\bigg\{ N\Big( \mathrm{RS}(\alpha;U) + o_t(1) \Big) \bigg\}$$
with high probability (i.e., with probability $1 - o_N(1)$).

Theorem 1.5 implies the upper bound in Theorem 1.2 by standard arguments, using Markov's inequality. The proof of the upper bound in Theorem 1.2 is therefore given at the end of Section 4, after the proof of Theorem 1.5.

Theorem 1.6 (conditional second moment). Suppose $U$ satisfies Assumptions 1 and 2, and $0 \le \alpha \le \alpha_0(U)$ as defined by (1.6). If $\mathbf{M}$ is an $M \times N$ matrix with i.i.d. standard gaussian entries and $M/N \to \alpha$, we can construct a random variable $\bar{\mathbf{Z}} \le \mathbf{Z}$ such that
$$\mathbb{E}\Big( \bar{\mathbf{Z}}(\mathbf{M})\,\Big|\,\mathscr{F}(t) \Big) \ge \exp\bigg\{ N\Big( \mathrm{RS}(\alpha;U) - o_t(1) \Big) \bigg\} \tag{1.20}$$
with high probability, and for which we have the second moment estimate
$$\mathbb{E}\Big( \bar{\mathbf{Z}}(\mathbf{M})^2\,\Big|\,\mathscr{F}(t) \Big) \le \exp\bigg\{ 2N\Big( \mathrm{RS}(\alpha;U) + o_t(1) \Big) \bigg\}\,, \tag{1.21}$$
also with high probability.

In the bounded case $\|u\|_\infty < \infty$ (recall $u \equiv \log U$), Theorem 1.6 implies the lower bound in Theorem 1.2 by standard arguments, using the Azuma–Hoeffding martingale inequality. The proof of the lower bound in Theorem 1.2 in the bounded case is given at the end of Section 5, after the proof of Theorem 1.6. In the more general setting where $u$ may be unbounded, the proof of Theorem 1.2 requires further estimates, as we outline in the next subsection.

1.4. Concentration results for unbounded case. Assumption 1 implies that we must have
$$1 \ge U(x) > \epsilon_1\,\mathbf{1}\{x \in A(U)\} \tag{1.22}$$
where $\epsilon_1$ is a positive constant, and $A(U)$ is a subset of the real line of positive Lebesgue measure (which we denote $|A(U)|$). Moreover we can assume without loss that $A(U)$ is bounded, i.e., $A(U) \subseteq [-\kappa_{\max}(U), \kappa_{\max}(U)]$ for some finite $\kappa_{\max}(U)$. Following [Tal11b, §8.3], define the truncated logarithm
$$\log_\beta(x) \equiv \max\Big\{ -\beta,\ \log x \Big\}\,.$$
The following is an adaptation of [Tal11b, Propn. 9.2.6] (see also [Tal11b, Propn. 8.3.6]):

Proposition 1.7. Suppose $U$ satisfies Assumption 1, and let $\epsilon_1$ and $A(U)$ be as above. Then for $\beta = \exp(-12)$ we have
$$\mathbb{P}\bigg( \frac{1}{N}\bigg| \log_{N\beta}\Big( \frac{\mathbf{Z}}{2^N} \Big) - \mathbb{E}\log_{N\beta}\Big( \frac{\mathbf{Z}}{2^N} \Big) \bigg| \ge \frac{(\log N)^2}{N^{1/2}} \bigg) \le \frac{1}{N^2}$$
for all $N$ large enough (depending on $|A(U)|$, $\kappa_{\max}(U)$, and $\epsilon_1$).

Next let $\delta$ be a small positive constant, and consider the smoothed function
$$U_\delta(x) \equiv (U * \varphi_\delta)(x) = \int U(x + \delta z)\,\varphi(z)\,dz = \mathbb{E}_\xi\,U(x + \delta\xi)\,. \tag{1.23}$$
Let $\mathbf{Z}^{(\delta)}$ denote the perceptron partition function with $U_\delta$ in place of $U$:
$$\mathbf{Z}^{(\delta)} \equiv \sum_J \prod_{a\le M} U_\delta\bigg( \frac{(g^a, J)}{N^{1/2}} \bigg)\,. \tag{1.24}$$
Note that $U_\delta$ satisfies Assumption 1: it is a smooth mapping from $\mathbb{R}$ into $[0,1]$ for any $\delta > 0$, and condition (1.2) holds for $\delta$ small enough. We will show (see Lemma 3.9) that $C_2(U_\delta)$ can be bounded in terms of $(C_2)_1(U)$.
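The convolution (1.23) is again a one-dimensional gaussian expectation, so it can be evaluated with the same quadrature as above; a minimal sketch (the name `U_smoothed` is ours):

```python
def U_smoothed(U, delta):
    """Return U_delta(x) = E_xi U(x + delta*xi), the gaussian smoothing (1.23)."""
    def U_delta(x):
        x = np.atleast_1d(x)
        return np.array([np.dot(_w, U(xi + delta * _z)) for xi in x])
    return U_delta

# usage: a smoothed half-space activation, e.g. delta = 0.1
U_eps = U_smoothed(U_halfspace, 0.1)
```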

We then have the following approximation result:

Proposition 1.8. Suppose $U$ satisfies Assumption 1, and let $\epsilon_1$ and $A(U)$ be as above. Then we have
$$\limsup_{N\to\infty} \frac{1}{N}\bigg| \mathbb{E}\log_{N\beta}\Big( \frac{\mathbf{Z}^{(\delta)}}{2^N} \Big) - \mathbb{E}\log_{N\beta}\Big( \frac{\mathbf{Z}}{2^N} \Big) \bigg| \le o_\delta(1)$$
for $\beta = \exp(-12)$.

Propositions 1.7 and 1.8 are proved in Section 7. The proofs rely on a bound for near-isotropic gaussian processes, Proposition 7.1, which we mentioned in §1.2.3 above. Finally, we have the following:

Proposition 1.9. If $U$ satisfies Assumption 1, then we have
$$\lim_{\delta\downarrow 0} \mathrm{RS}(\alpha; U_\delta) = \mathrm{RS}(\alpha; U)$$
for all $0 \le \alpha \le \alpha_1(U)$ (as defined by (1.9)).

Proposition 1.10. Suppose $U$ satisfies Assumption 1, and let $\mathbf{Z}^{(\delta)}$ be as in (1.24). Then we have
$$\mathbb{P}\bigg( \Big| \log \mathbf{Z}^{(\delta)} - \mathbb{E}\log \mathbf{Z}^{(\delta)} \Big| \ge Nx \bigg) \le 32N \cdot \exp\bigg\{ -\frac{Nx^2}{32\,\delta^2\,C_1(U;\delta)^2} \bigg\}$$
for all $0 \le x \le 5\,\delta\,C_1(U;\delta)$.

The proof of Proposition 1.9 is given in Section 3, while the proof of Proposition 1.10 is given in Section 7. Then Propositions 1.7, 1.8, 1.9, and 1.10 can be combined to finish the proof of Theorem 1.2 in the unbounded case $\|u\|_\infty = \infty$. The argument goes roughly as follows: by Propositions 1.7 and 1.8, with high probability
$$\frac{1}{N}\log_{N\beta}\frac{\mathbf{Z}}{2^N} - o_N(1) = \frac{1}{N}\mathbb{E}\log_{N\beta}\frac{\mathbf{Z}}{2^N} = \frac{1}{N}\mathbb{E}\log_{N\beta}\frac{\mathbf{Z}^{(\delta)}}{2^N} + o_\delta(1)\,.$$
By applying Theorem 1.6 to $U_\delta$, and combining with Proposition 1.9 and Proposition 1.10, we obtain
$$\frac{1}{N}\mathbb{E}\log_{N\beta}\frac{\mathbf{Z}^{(\delta)}}{2^N} - o_N(1) = \mathrm{RS}(\alpha; U_\delta) - \log 2 = \mathrm{RS}(\alpha; U) - \log 2 + o_\delta(1)\,.$$
For $0 < \alpha \le \alpha_0(U)$, the above is $\ge -\beta/2$ by straightforward estimates (Corollary 3.8). It follows that
$$-\frac{\beta}{2} \le \mathrm{RS}(\alpha; U) - \log 2 = o_N(1) + \frac{1}{N}\log_{N\beta}\frac{\mathbf{Z}}{2^N} = o_N(1) + \frac{1}{N}\log\frac{\mathbf{Z}}{2^N}$$
with high probability, as desired (the truncation in $\log_{N\beta}$ can be dropped in the last step because the quantity stays above the truncation level $-\beta$). At the end of Section 7 we give the conclusion of the proof of Theorem 1.2, where the above sketch is made precise.

Organization. The remaining sections of the paper are organized as follows:
‚ In Section 2 we give a preliminary expression (see Theorem 2.11) for the first moment of the perceptron partition function conditional on $\mathscr{F}(t)$.
‚ In Section 3 we collect some basic technical results, including basic consequences of Assumptions 1 and 2. We also give the proofs of Propositions 1.1, 1.4, and 1.9.
‚ In Section 4 we analyze the conditional first moment calculations from Section 2 and complete the proof of Theorem 1.5. This leads to the upper bound in Theorem 1.2, presented at the end of the section.
‚ In Section 5 we prove Theorem 1.6, which bounds the first and second moments of the (truncated) perceptron partition function conditional on $\mathscr{F}(t)$. From this we deduce the lower bound in Theorem 1.2 for the case $\|u\|_\infty < \infty$.
‚ In Section 6 we prove a local central limit theorem (Proposition 6.13) which is required for the calculations of Sections 2–5.
‚ In Section 7 we prove Propositions 1.7, 1.8, and 1.10; and use these to conclude the proof of Theorem 1.2.
‚ Lastly, in Appendix A we prove a gaussian resampling identity (Lemma 2.15) which is used in the conditional moment calculations of Sections 2–5. We also give a heuristic review of the state evolution limit of AMP, which was rigorously established in earlier works [BM11, Bol14].


Acknowledgements. We are grateful to Andrew Lawrie, Joe Neeman, Elchanan Mossel, and Ofer Zeitouni for many helpful conversations. Research of S.N. is supported by SNSF grant 176918. Research of N.S. is supported by NSF CAREER grant DMS-1940092 and NSF–Simons grant DMS-2031883.

2. First moment conditional on AMP

We consider the perceptron model (1.1) with an independent copy $\mathbf{M}'$ of the disorder matrix $\mathbf{M}$ — this is clearly equivalent (in law) to the original model. The (random) weight of the configuration $J$ is
$$S \equiv S_J(\mathbf{M}') \equiv \exp\bigg\{ \bigg( \mathbf{1},\ u\bigg( \frac{\mathbf{M}'J}{N^{1/2}} \bigg) \bigg) \bigg\}\,, \tag{2.1}$$
where $u \equiv \log U : \mathbb{R} \to [-\infty, 0]$ is applied componentwise according to the convention (1.15). As in (1.1), the corresponding perceptron partition function is
$$\mathbf{Z}(\mathbf{M}') \equiv \sum_J S_J(\mathbf{M}')\,. \tag{2.2}$$
Let $m^{(s)}$ and $n^{(\ell)}$ be generated from the AMP iteration (1.16) and (1.17) with $\mathbf{M}'$ in place of $\mathbf{M}$ (and with the same initial values for $m^{(0)}$, $n^{(0)}$, $m^{(1)}$, $n^{(1)}$ as before). Then, similarly as in (1.19), let
$$\mathscr{F}'(t) \equiv \sigma\bigg( \Big( \mathbf{M}'m^{(s)},\, n^{(s+1)} : s \le t \Big),\ \Big( (\mathbf{M}')^{\mathsf t}n^{(\ell)},\, m^{(\ell+1)} : \ell \le t-1 \Big) \bigg)\,. \tag{2.3}$$
We emphasize that $\mathscr{F}'(t)$ in (2.3) is defined with respect to $\mathbf{M}'$ while $\mathscr{F}(t)$ in (1.19) was defined with respect to $\mathbf{M}$. This section is organized as follows:
‚ In §2.1 we give a brief review of known results [BM11, Bol14] on the state evolution limit of AMP.
‚ In §2.2 we decompose $\mathbf{Z}(\mathbf{M}')$ into two parts (see (2.30)): one part $\mathbf{Z}_\circ(\mathbf{M}')$ roughly captures the contribution of configurations $J \in \{-1,+1\}^N$ which lie close to $m^{(t)}$ in some sense (see (2.28)), while $\mathbf{Z}_\bullet(\mathbf{M}')$ is the remainder of the partition function. We then state the main result of this section, Theorem 2.11, which gives the conditional first moment upper bound for $\mathbf{Z}_\circ(\mathbf{M}')$.
‚ In §2.3 we state and prove Proposition 2.12, which gives a conditional first moment upper bound for a single configuration $J \in \{-1,+1\}^N$.
‚ In §2.4 we complete the proof of Theorem 2.11. We also supply some large deviations bounds, Lemmas 2.21 and 2.22, which will be used later to bound $\mathbf{Z}_\bullet(\mathbf{M}')$ (see Corollary 4.1 in §4.1).

The bound from Theorem 2.11 will be analyzed in Section 4 to conclude the proof of Theorem 1.5. Throughout this section, $U$ satisfies Assumptions 1 and 2.

2.1. Review of AMP state evolution. In this subsection we review the main results on approximate message passing (as introduced in §1.3) that will be used in our proofs. What follows is primarily based on [BM11, Bol14]. A more detailed review (with heuristic derivations) is given in Section A.

Definition 2.1 (state evolution recursions). Let $(q,\psi)$ be as given by Proposition 1.1, and abbreviate $F \equiv F_q$. Let
$$\rho_1 \equiv \lambda_1 \equiv \Big( \frac{1}{q} \Big)^{1/2}\mathbb{E}\tanh(\psi^{1/2}Z) = 0\,, \qquad \sigma_1 \equiv \gamma_1 \equiv \Big( \frac{\alpha}{\psi} \Big)^{1/2}\mathbb{E}F(q^{1/2}Z) \tag{2.4}$$
(cf. (A.14)). Next let $\xi, \xi'$ be independent standard gaussian random variables, and for $s \ge 1$ let
$$\rho_{s+1} \equiv \Theta(\sigma_s) \equiv \frac{1}{q}\,\mathbb{E}\bigg[ \tanh\bigg( \psi^{1/2}\Big\{ \sigma_s\xi + [1-(\sigma_s)^2]^{1/2}\xi' \Big\} \bigg) \tanh(\psi^{1/2}\xi) \bigg]\,,$$
$$\sigma_{s+1} \equiv \Xi(\rho_s) \equiv \frac{\alpha}{\psi}\,\mathbb{E}\bigg[ F\bigg( q^{1/2}\Big\{ \rho_s\xi + [1-(\rho_s)^2]^{1/2}\xi' \Big\} \bigg) F(q^{1/2}\xi) \bigg] \tag{2.5}$$
(cf. (A.20) and (A.27)). Supposing that $\gamma_1, \ldots, \gamma_{s-1}$ and $\lambda_1, \ldots, \lambda_{s-1}$ have been defined, we let
$$\lambda_s = \frac{\rho_s - \Lambda_{s-1}}{(1-\Lambda_{s-1})^{1/2}}\,, \qquad \gamma_s = \frac{\sigma_s - \Gamma_{s-1}}{(1-\Gamma_{s-1})^{1/2}} \tag{2.6}$$
(cf. (A.31)), where we have used the abbreviations
$$\Gamma_{s-1} \equiv \sum_{\ell\le s-1}(\gamma_\ell)^2\,, \qquad \Lambda_{s-1} \equiv \sum_{\ell\le s-1}(\lambda_\ell)^2\,. \tag{2.7}$$
The above recursions are standard in the AMP literature, so we defer the explanations to Section A. We will confirm in Lemma 3.10 that the recursions result in well-defined quantities for all $s \ge 1$.

We now explain how the constants given in Definition 2.1 describe the large-$N$ behavior of the AMP iteration. To this end, we define the (deterministic) matrices
$$A \equiv \begin{pmatrix} 1 & & & \\ \gamma_1 & (1-\Gamma_1)^{1/2} & & \\ \vdots & & \ddots & \\ \gamma_1 & \gamma_2 & \cdots & (1-\Gamma_{t-2})^{1/2} \end{pmatrix} \in \mathbb{R}^{(t-1)\times(t-1)}\,, \tag{2.8}$$
$$B \equiv \begin{pmatrix} 1 & & & \\ \lambda_1 & (1-\Lambda_1)^{1/2} & & \\ \vdots & & \ddots & \\ \lambda_1 & \lambda_2 & \cdots & (1-\Lambda_{t-1})^{1/2} \end{pmatrix} \in \mathbb{R}^{t\times t}\,. \tag{2.9}$$
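The matrices (2.8) and (2.9) share the same lower-triangular structure, so a single helper builds either one from its coefficient sequence; a sketch with our naming (note that each row has unit euclidean norm, by (2.7)):

```python
def lower_tri_from_coeffs(coeffs):
    """Build the matrix of (2.8)/(2.9) from the list (c_1, ..., c_{k-1}):
    row 1 is (1, 0, ...); row s+1 is (c_1, ..., c_s, (1 - C_s)^{1/2}, 0, ...)
    with C_s = sum_{l <= s} c_l^2, so every row has unit norm."""
    k = len(coeffs) + 1
    T = np.zeros((k, k))
    T[0, 0] = 1.0
    for s in range(1, k):
        T[s, :s] = coeffs[:s]
        T[s, s] = np.sqrt(1.0 - np.sum(np.asarray(coeffs[:s]) ** 2))
    return T

# usage, with (gam, lam) from the state_evolution sketch above:
#   A = lower_tri_from_coeffs(gam[:t - 2])   # (t-1) x (t-1), as in (2.8)
#   B = lower_tri_from_coeffs(lam[:t - 1])   # t x t, as in (2.9)
```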

It will follow from Lemma 3.10 below that in our setting we will have $\Gamma_s \in [0,1)$ and $\Lambda_s \in [0,1)$ for all $s \ge 0$, which implies that both $A$ and $B$ are non-singular matrices. As in (2.3), let $m^{(s)}$ and $n^{(\ell)}$ be generated from the AMP iteration (1.16) and (1.17) with $\mathbf{M}'$ in place of $\mathbf{M}$. Recall that $m^{(s)} \equiv \tanh(H^{(s)})$ and $n^{(s)} \equiv F(h^{(s)})$, where $F = F_q$ is given by (1.4). We define vectors $y^{(s)}$ and $x^{(s)}$ by setting
$$\frac{H[t-1]}{\psi^{1/2}} \equiv \frac{1}{\psi^{1/2}}\begin{pmatrix} (H^{(2)})^{\mathsf t} \\ \vdots \\ (H^{(t)})^{\mathsf t} \end{pmatrix} \equiv A\begin{pmatrix} (y^{(1)})^{\mathsf t} \\ \vdots \\ (y^{(t-1)})^{\mathsf t} \end{pmatrix} \equiv A\,y[t-1] \in \mathbb{R}^{(t-1)\times N}\,, \tag{2.10}$$
$$\frac{h[t]}{q^{1/2}} \equiv \frac{1}{q^{1/2}}\begin{pmatrix} (h^{(2)})^{\mathsf t} \\ \vdots \\ (h^{(t+1)})^{\mathsf t} \end{pmatrix} \equiv B\begin{pmatrix} (x^{(1)})^{\mathsf t} \\ \vdots \\ (x^{(t)})^{\mathsf t} \end{pmatrix} \equiv B\,x[t] \in \mathbb{R}^{t\times M}\,, \tag{2.11}$$
for $A$ and $B$ as in (2.8) and (2.9). Then the $x^{(s)}$ "behave like" i.i.d. standard gaussian vectors in $\mathbb{R}^M$, while the $y^{(s)}$ "behave like" i.i.d. standard gaussian vectors in $\mathbb{R}^N$. For an intuitive explanation we refer to the heuristic derivation of (A.29) and (A.30) given in Section A. The formal version is given by the next definition and lemma:

Definition 2.2 (pseudo-Lipschitz functions). Following [BM11], we say that a function $f : \mathbb{R}^\ell \to \mathbb{R}$ (where $\ell$ is any positive integer) is pseudo-Lipschitz of order $k$ if there exists a constant $L > 0$ such that
$$\|f(x) - f(y)\| \le L\Big( 1 + \|x\|^{k-1} + \|y\|^{k-1} \Big)\|x - y\|$$
for all $x, y \in \mathbb{R}^\ell$. We say for short that $f$ is a PL($k$) function.

Lemma 2.3 ([BM11, Lem. 1]). Suppose $U$ satisfies Assumptions 1 and 2. In particular, this guarantees that the function $F_q$ of (1.4) is Lipschitz (see Lemma 3.14). Let $\mathbf{M}$ be an $M \times N$ matrix with i.i.d. standard gaussian entries, such that $M/N = \alpha$. Assume $0 \le \alpha \le \alpha_0(U)$, and let $(q,\psi)$ be the solution given by Proposition 1.1. Then let $m^{(s)} \equiv \tanh(H^{(s)})$ and $n^{(\ell)} \equiv F_q(h^{(\ell)})$ be generated from the AMP iteration (1.16) and (1.17), with the same initial values for $m^{(0)}$, $n^{(0)}$, $m^{(1)}$, $n^{(1)}$ as before. If $f : \mathbb{R}^{t-1} \to \mathbb{R}$ is a PL($k$) function, then
$$\frac{1}{N}\sum_{i\le N} f\Big( H[t-1]\,e_i \Big) \xrightarrow{\ N\to\infty\ } \mathbb{E}f(\psi^{1/2}AZ)$$
where $Z$ here denotes a standard gaussian vector in $\mathbb{R}^{t-1}$, and the convergence holds in probability as $N \to \infty$ for any fixed $t$. Likewise, if $f : \mathbb{R}^t \to \mathbb{R}$ is a PL($k$) function, then
$$\frac{1}{M}\sum_{a\le M} f\Big( h[t]\,e_a \Big) \xrightarrow{\ N\to\infty\ } \mathbb{E}f(q^{1/2}BZ)$$
where $Z$ here denotes a standard gaussian vector in $\mathbb{R}^t$.

We remark that the results of [BM11] are for a more general setting where the AMP iteration starts from a random initialization with bounded moments up to order $2k-2$; the result then holds for any $f$ which is PL($k$). In this paper we start from an initialization with bounded moments of all finite orders, so in Lemma 2.3 we can take $f$ to be in PL($k$) for any finite $k$. We now present a few applications of Lemma 2.3 which illustrate how some of the recursions from Definition 2.1 naturally arise. First, it follows from Lemma 2.3 and the definition (2.5) that
$$\frac{(m^{(r)}, m^{(s)})}{Nq} = \frac{(\tanh(H^{(r)}), \tanh(H^{(s)}))}{Nq} \simeq \Theta\Big( (AA^{\mathsf t})_{r-1,s-1} \Big)\,.$$
In the above and throughout this paper, we write $f \simeq g$ to indicate that $f - g$ converges to zero in probability as $N \to \infty$. In the case $r = s$ we have
$$(AA^{\mathsf t})_{r-1,r-1} \stackrel{(2.8)}{=} \sum_{\ell\le r-2}(\gamma_\ell)^2 + (1 - \Gamma_{r-2}) \stackrel{(2.7)}{=} \Gamma_{r-2} + (1 - \Gamma_{r-2}) = 1\,.$$
If $r \ne s$, we can suppose without loss that $r < s$, in which case
$$(AA^{\mathsf t})_{r-1,s-1} \stackrel{(2.8)}{=} \sum_{\ell\le r-2}(\gamma_\ell)^2 + \gamma_{r-1}(1 - \Gamma_{r-2})^{1/2} \stackrel{(2.7)}{=} \Gamma_{r-2} + \gamma_{r-1}(1 - \Gamma_{r-2})^{1/2} \stackrel{(2.6)}{=} \sigma_{r-1}\,.$$
It follows that $\|m^{(r)}\|^2 \simeq Nq$ for all $r$, and for $r < s$ we have
$$\frac{(m^{(r)}, m^{(s)})}{Nq} \simeq \Theta(\sigma_{r-1}) \stackrel{(2.5)}{=} \rho_r \stackrel{(2.6)}{=} \Lambda_{r-1} + \lambda_r(1 - \Lambda_{r-1})^{1/2} \stackrel{(2.9)}{=} (BB^{\mathsf t})_{r,s}\,. \tag{2.12}$$
A similar calculation gives that $\|n^{(r)}\|^2 \simeq N\psi$ for all $r$, and for $r < s$ we have
$$\frac{(n^{(r)}, n^{(s)})}{N\psi} \simeq \Xi(\rho_{r-1}) = \sigma_r = (AA^{\mathsf t})_{r,s}\,. \tag{2.13}$$

Let $r^{(s)}$ be the Gram–Schmidt orthogonalization of the vectors $m^{(s)}$ for $s \ge 1$: thus $r^{(1)} = m^{(1)}/\|m^{(1)}\| = \mathbf{1}/N^{1/2}$,
$$r^{(2)} = \frac{m^{(2)} - (m^{(2)}, r^{(1)})\,r^{(1)}}{\|m^{(2)} - (m^{(2)}, r^{(1)})\,r^{(1)}\|}\,,$$
and so on. The $r^{(s)}$ form an orthonormal set in $N$-dimensional space (assuming the number of iterations is much smaller than the dimension). Likewise, let $c^{(\ell)}$ be the Gram–Schmidt orthogonalization of the vectors $n^{(\ell)}$ for $\ell \ge 1$; these form an orthonormal set in $M$-dimensional space. Let $B^N$, $A^N$ be the (random) matrices such that
$$\frac{m[t]}{(Nq)^{1/2}} \equiv \frac{1}{(Nq)^{1/2}}\begin{pmatrix} (m^{(1)})^{\mathsf t} \\ \vdots \\ (m^{(t)})^{\mathsf t} \end{pmatrix} = B^N\begin{pmatrix} (r^{(1)})^{\mathsf t} \\ \vdots \\ (r^{(t)})^{\mathsf t} \end{pmatrix} \equiv B^N r[t] \in \mathbb{R}^{t\times N}\,, \tag{2.14}$$
$$\frac{n[t-1]}{(N\psi)^{1/2}} \equiv \frac{1}{(N\psi)^{1/2}}\begin{pmatrix} (n^{(1)})^{\mathsf t} \\ \vdots \\ (n^{(t-1)})^{\mathsf t} \end{pmatrix} = A^N\begin{pmatrix} (c^{(1)})^{\mathsf t} \\ \vdots \\ (c^{(t-1)})^{\mathsf t} \end{pmatrix} \equiv A^N c[t-1] \in \mathbb{R}^{(t-1)\times M}\,. \tag{2.15}$$
It can be deduced from (2.12) and (2.13) that
$$\begin{pmatrix} A^N \\ B^N \end{pmatrix} \simeq \begin{pmatrix} A \\ B \end{pmatrix}\,. \tag{2.16}$$
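Numerically, the Gram–Schmidt data in (2.14)–(2.15) is exactly a QR factorization; the following sketch (our naming) extracts the orthonormal rows and the triangular coefficient matrix, e.g. $r[t]$ and $B^N$ from the stacked iterates $m[t]$.

```python
def orthonormalize_rows(X):
    """Given X (k x N), return (R, C) with X = C @ R, where R has orthonormal
    rows (the Gram-Schmidt basis) and C is lower-triangular with positive
    diagonal -- the role of (r[t], B^N) in (2.14) when X = m[t]/(Nq)^{1/2}."""
    Q, Rfac = np.linalg.qr(X.T)                 # X.T = Q @ Rfac, Q orthonormal cols
    signs = np.sign(np.diag(Rfac))
    Q, Rfac = Q * signs, (Rfac.T * signs).T     # fix signs so diagonal > 0
    return Q.T, Rfac.T                          # rows of Q.T orthonormal; Rfac.T lower-tri

# usage: r_t, B_N = orthonormalize_rows(m_stack / np.sqrt(N * q))
```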


(This means $A^N - A$ and $B^N - B$ converge entrywise to zero, in probability, as $N \to \infty$.) Since $r[t]$ and $c[t-1]$ have orthonormal rows, the above implies
$$\frac{m[t]\,m[t]^{\mathsf t}}{Nq} \stackrel{(2.14)}{=} B^N r[t]r[t]^{\mathsf t}(B^N)^{\mathsf t} = B^N(B^N)^{\mathsf t} \simeq \frac{h[t]\,h[t]^{\mathsf t}}{Mq} \in \mathbb{R}^{t\times t}\,,$$
$$\frac{n[t-1]\,n[t-1]^{\mathsf t}}{N\psi} \stackrel{(2.15)}{=} A^N c[t-1]c[t-1]^{\mathsf t}(A^N)^{\mathsf t} = A^N(A^N)^{\mathsf t} \simeq \frac{H[t-1]\,H[t-1]^{\mathsf t}}{N\psi} \in \mathbb{R}^{(t-1)\times(t-1)}\,,$$
where the approximations on the right-hand side use Lemma 2.3. This is of course consistent with the previous calculations (2.12) and (2.13) (cf. [BM11, eq. (3.18) and (3.19)]).

A further consequence of Lemma 2.3 is that for all $k, \ell \ge 1$ we have
$$\frac{(m^{(k+1)}, y^{(\ell)})}{Nq^{1/2}} = \frac{(\tanh(H^{(k+1)}), y^{(\ell)})}{Nq^{1/2}} \stackrel{(2.10)}{=} \frac{1}{Nq^{1/2}}\bigg( \tanh\bigg( \psi^{1/2}\sum_{\ell'\le t-1} A_{k,\ell'}\,y^{(\ell')} \bigg),\ y^{(\ell)} \bigg) \simeq \frac{A_{k,\ell}}{q^{1/2}}\,\mathbb{E}\Big[ Z\tanh(\psi^{1/2}Z) \Big] = \frac{A_{k,\ell}}{q^{1/2}}\,\psi^{1/2}\,\mathbb{E}\Big[ \tanh'(\psi^{1/2}Z) \Big] \stackrel{(1.5)}{=} \frac{A_{k,\ell}}{q^{1/2}}\,\psi^{1/2}(1-q)\,,$$
having used the gaussian integration by parts identity. Recall also that $m^{(1)} = q^{1/2}\mathbf{1}$, so Lemma 2.3 also implies
$$\frac{(m^{(1)}, y^{(\ell)})}{Nq^{1/2}} \simeq \mathbb{E}\xi = 0$$
for all $\ell \le t-1$, where $\xi$ is a standard gaussian random variable. The above calculations can be summarized as
$$\bigg\| \frac{y[t-1]\,m[t]^{\mathsf t}}{Nq^{1/2}} - \bigg( 0 \quad \frac{\psi^{1/2}}{q^{1/2}}(1-q)\,A^{\mathsf t} \bigg) \bigg\|_\infty \le \mathrm{ERR}_{t,1} \simeq 0\,, \tag{2.17}$$
where $0$ denotes the zero vector in $t-1$ dimensions, and $\mathrm{ERR}_{t,1}$ is an $\mathscr{F}(t)$-measurable random variable that converges to zero in probability as $N \to \infty$ (cf. [BM11, eq. (3.20) and (3.21)]). This concludes our review of the required results on the state evolution of AMP, and we turn next to the conditional moment calculations. We introduce some notation which will be used later in the paper:

where 0 denotes the zero vector in C´1 dimensions, and ERRC ,1 is an an ℱpCq-measurable random variable that con-verges to zero in probability as # Ñ8 (cf. [BM11, eq. (3.20) and (3.21)]). �is concludes our review of the requiredresults on the state evolution of AMP, and we turn next to the conditional moment calculations. We introduce somenotation which will be used later in the paper:

Remark 2.4 (bounds on �# and �# ). Since � and � are both non-singular (this will be veri�ed in Lemma 3.10below), we can de�ne a large �nite constant �C such that we have the bound

max"

}�#}8 , }p�#q´1}8 , }�#}8 , }p�#q´1}8

*

ď

ˆ

�CC

˙1{2(2.18)

with high probability. In the above, and throughout this paper, } ¨}8 denotes the entrywise maximum absolute valueof a vector or matrix. On the other hand, we write }D} for the euclidean norm of a vector D, and }�} for the spectralnorm a matrix �. It follows from (2.18) that we also have

max"

}�#}, }p�#q´1}, }�#}, }p�#q´1}

*

ď p�Cq1{2

with high probability.

The proof of the following proposition is deferred to §3.4. It amounts to checking that an Almeida–Thouless (AT) condition ([dAT78]; see Lemma 3.11) is satisfied.

Proposition 2.5. Suppose $U$ satisfies Assumptions 1 and 2. For $0 < \alpha \le \alpha_0(U)$ as defined by (1.6), the state evolution recursions from Definition 2.1 result in $\Gamma_t \to 1$ and $\Lambda_t \to 1$ as $t \to \infty$.


2.2. Positions of configurations relative to AMP iterates. We now define parameters $\pi(J)$ and $\nu(J)$ which summarize the position of configurations $J \in \{-1,+1\}^N$ relative to the vectors $r^{(s)}$ and $y^{(\ell)}$ from (2.14) and (2.10).

Definition 2.6 (parameters $\pi$ and $\nu$). Let $\mathscr{F}'(t)$ be as in (2.3). For $J \in \{-1,+1\}^N$, define
$$\pi(J) \equiv \frac{r[t]\,J}{N^{1/2}} = \bigg( \frac{(r^{(s)}, J)}{N^{1/2}} \bigg)_{s\le t} \in \mathbb{R}^t\,, \tag{2.19}$$
$$\nu(J) \equiv \frac{y[t-1]\,J}{N} = \bigg( \frac{(y^{(\ell)}, J)}{N} \bigg)_{\ell\le t-1} \in \mathbb{R}^{t-1}\,. \tag{2.20}$$
Note that for any given $J \in \{-1,+1\}^N$, its parameters $\pi(J)$ and $\nu(J)$ are measurable with respect to $\mathscr{F}'(t)$.

Recall that the vectors $r^{(s)}$ and $m^{(s)}$ ($1 \le s \le t$) are linearly related by (2.14), while the vectors $y^{(\ell)}$ and $H^{(\ell+1)}$ ($1 \le \ell \le t-1$) are linearly related by (2.10). For part of our calculation it is more convenient to work with $m^{(s)}$ and $H^{(\ell+1)}$ rather than with $r^{(s)}$ and $y^{(\ell)}$. For this reason we also define the following parameters:

Definition 2.7 (parameters $a$ and $\eta$). Given $\mathscr{F}'(t)$ as in (2.3), and given any $J \in \{-1,+1\}^N$, we decompose $J = J^1 + J^2$ where $J^1$ is the orthogonal projection of $J$ onto the span of the vectors $m^{(s)}$, $1 \le s \le t$. We let $a_s$ for $1 \le s \le t$ be the coefficients such that
$$J^1 = \sum_{s\le t} a_s\,\frac{m^{(s)}}{q^{1/2}} = \frac{m[t]^{\mathsf t}a}{q^{1/2}}\,. \tag{2.21}$$
Next let $v \equiv J^2/\|J^2\|$, and let $\eta \in \mathbb{R}^{t-1}$ be defined by
$$A^N(A^N)^{\mathsf t}\eta = \frac{H[t-1]\,v}{(N\psi)^{1/2}}\,. \tag{2.22}$$
Note that for any given $J \in \{-1,+1\}^N$, its parameters $a(J)$ and $\eta(J)$ are measurable with respect to $\mathscr{F}'(t)$.

The parameters $(\pi, \nu)$ of Definition 2.6 are related as follows to the parameters $(a, \eta)$ of Definition 2.7:

Lemma 2.8 (change of basis). Given $\mathscr{F}'(t)$ as in (2.3), suppose $J \in \{-1,+1\}^N$ has parameters $\pi(J)$, $\nu(J)$, $a(J)$, $\eta(J)$ as in Definitions 2.6 and 2.7. Then we have $\pi(J) = (B^N)^{\mathsf t}a(J)$, and
$$\nu(J) = \frac{y[t-1]\,m[t]^{\mathsf t}}{Nq^{1/2}}\,a(J) + \Big( 1 - \|\pi(J)\|^2 \Big)^{1/2}A^{-1}A^N(A^N)^{\mathsf t}\eta(J)\,.$$

Proof. For convenience we will often abbreviate $\pi \equiv \pi(J)$, etc. The expression (2.21) can be rewritten as
$$\frac{J^1}{N^{1/2}} \stackrel{(2.21)}{=} \frac{m[t]^{\mathsf t}a}{(Nq)^{1/2}} \stackrel{(2.14)}{=} r[t]^{\mathsf t}(B^N)^{\mathsf t}a\,,$$
so by comparing with (2.19) we see that $\pi(J) = (B^N)^{\mathsf t}a(J)$. Next we have
$$\frac{H[t-1]\,J^1}{\psi^{1/2}N} \stackrel{(2.21)}{=} \frac{H[t-1]\,m[t]^{\mathsf t}a}{\psi^{1/2}Nq^{1/2}} \stackrel{(2.10)}{=} A\,\frac{y[t-1]\,m[t]^{\mathsf t}a}{Nq^{1/2}}\,. \tag{2.23}$$
It is clear from (2.19) that $\|J^1\|/N^{1/2} = \|\pi\|$, and since $v \equiv J^2/\|J^2\|$, it follows that
$$\frac{H[t-1]\,J^2}{\psi^{1/2}N} = \frac{\|J^2\|}{N^{1/2}}\cdot\frac{H[t-1]\,v}{(N\psi)^{1/2}} = \Big( 1 - \|\pi\|^2 \Big)^{1/2}\frac{H[t-1]\,v}{(N\psi)^{1/2}} \stackrel{(2.22)}{=} \Big( 1 - \|\pi\|^2 \Big)^{1/2}A^N(A^N)^{\mathsf t}\eta\,. \tag{2.24}$$
Combining (2.10), (2.23), and (2.24) gives
$$\nu(J) \stackrel{(2.20)}{=} \frac{y[t-1]\,J}{N} \stackrel{(2.10)}{=} A^{-1}\frac{H[t-1]\,J}{\psi^{1/2}N} \stackrel{(2.23),(2.24)}{=} \frac{y[t-1]\,m[t]^{\mathsf t}}{Nq^{1/2}}\,a + \Big( 1 - \|\pi\|^2 \Big)^{1/2}A^{-1}A^N(A^N)^{\mathsf t}\eta\,.$$
This concludes the proof. □


Lemma 2.9 (approximate change of basis). Given $\mathscr{F}'(t)$ as in (2.3), suppose again that $J \in \{-1,+1\}^N$ has parameters $\pi(J)$, $\nu(J)$, $a(J)$, $\eta(J)$ as in Definitions 2.6 and 2.7. Define also $\dot\pi(J) \equiv B^{\mathsf t}a(J)$ and
$$\dot\nu(J) \equiv (A^N)^{\mathsf t}\bigg\{ \frac{\psi^{1/2}}{q^{1/2}}(1-q)\,\bar a(J) + \Big( 1 - \|\pi(J)\|^2 \Big)^{1/2}\eta(J) \bigg\}\,, \tag{2.25}$$
where $\bar a \equiv \bar a(J) \equiv (a_2, \ldots, a_t) \in \mathbb{R}^{t-1}$. Then
$$\max\bigg\{ \big\| \pi(J) - \dot\pi(J) \big\|_\infty + \big\| \nu(J) - \dot\nu(J) \big\|_\infty : J \in \{-1,+1\}^N \bigg\} \le \mathrm{ERR}_{t,2}\,,$$
where $\mathrm{ERR}_{t,2}$ is an $\mathscr{F}'(t)$-measurable random variable that converges to zero in probability as $N \to \infty$.

Proof. It follows trivially from the definition (2.19) and the Cauchy–Schwarz inequality that
$$\|\pi(J)\|_\infty \le \max\bigg\{ \frac{\|r^{(s)}\|\cdot\|J\|}{N^{1/2}} : s \le t \bigg\} = 1\,,$$
where we emphasize that the bound clearly holds uniformly over all $J \in \{-1,+1\}^N$. Therefore
$$\big\| \pi(J) - \dot\pi(J) \big\|_\infty \le \sup\bigg\{ \bigg\| \Big( (B^N)^{-1}(B^N - B) \Big)^{\mathsf t}u \bigg\|_\infty : \|u\|_\infty \le 1 \bigg\}\,.$$
The right-hand side above is $\mathscr{F}'(t)$-measurable and does not depend on $J$, and it follows from (2.16) that it tends to zero in probability as $N \to \infty$. Next, to compare $\nu(J)$ with $\dot\nu(J)$, we note that $\nu(J) - \dot\nu(J)$ can be expressed as $\mathrm{I}(J) + \mathrm{II}(J)$ where
$$\mathrm{I}(J) \equiv \bigg\{ \frac{y[t-1]\,m[t]^{\mathsf t}}{Nq^{1/2}} - \bigg( 0 \quad \frac{\psi^{1/2}}{q^{1/2}}(1-q)(A^N)^{\mathsf t} \bigg) \bigg\}\Big( (B^N)^{\mathsf t} \Big)^{-1}\pi(J)\,,$$
$$\mathrm{II}(J) \equiv \Big( 1 - \|\pi(J)\|^2 \Big)^{1/2}A^{-1}\big( A^N - A \big)(A^N)^{\mathsf t}\eta(J)\,.$$
Since $\|\pi(J)\|_\infty \le 1$ as noted above, it follows using (2.16) and (2.17) that $\|\mathrm{I}(J)\|_\infty$ can be bounded uniformly over $J$ by an $\mathscr{F}'(t)$-measurable quantity that tends to zero in probability as $N \to \infty$. Next we note that (2.22) combined with the Cauchy–Schwarz inequality gives, for all $J$,
$$\big\| A^N(A^N)^{\mathsf t}\eta(J) \big\|_\infty \le \max\bigg\{ \frac{\|H^{(\ell+1)}\|}{(N\psi)^{1/2}} : \ell \le t-1 \bigg\}\,.$$
The right-hand side above is $\mathscr{F}'(t)$-measurable, and it can be deduced from Lemma 2.3 that it converges in probability to 1 as $N \to \infty$. It follows by combining with (2.16) that $\|\mathrm{II}(J)\|_\infty$ can also be bounded uniformly over $J$ by an $\mathscr{F}'(t)$-measurable quantity that tends to zero in probability as $N \to \infty$. This proves the claim. □

We next use the AMP iteration to define a convenient change of measure on the discrete cube:

Definition 2.10 (change of measure). Let $P$ denote the uniform probability measure on $\{-1,+1\}^N$, and let $Q$ be the probability measure on the same space which is given by
$$\frac{dQ}{dP} = \prod_{i\le N}\frac{\exp\big( (H^{(t)})_i\,J_i \big)}{\cosh\big( (H^{(t)})_i \big)} = \frac{\exp\{ (H^{(t)}, J) \}}{\exp\{ (\mathbf{1}, \log\cosh H^{(t)}) \}}\,.$$
If $J$ is sampled from the measure $Q$, its expected value is exactly $\tanh(H^{(t)}) = m^{(t)}$. We now compute the expected values under $Q$ of the parameters from Definition 2.6. First we note that
$$\dot\pi_\star \equiv \frac{r[t]\,m^{(t)}}{N^{1/2}} = \frac{r[t]\,m[t]^{\mathsf t}e_t}{N^{1/2}} \stackrel{(2.14)}{=} q^{1/2}\,r[t]r[t]^{\mathsf t}(B^N)^{\mathsf t}e_t = q^{1/2}(B^N)^{\mathsf t}e_t\,, \tag{2.26}$$
where $e_s$ denotes the $s$-th standard basis vector in $\mathbb{R}^t$. Let us define also $\pi_\star \equiv q^{1/2}B^{\mathsf t}e_t$, and note that $\pi_\star \simeq \dot\pi_\star$ by (2.16). Next we note that
$$\dot\nu_\star \equiv \frac{y[t-1]\,m^{(t)}}{N} \stackrel{(2.17)}{\simeq} \psi^{1/2}(1-q)\big( A^{\mathsf t}e_{t-1} \big) \equiv \nu_\star \in \mathbb{R}^{t-1}\,, \tag{2.27}$$
where $e_\ell$ denotes the $\ell$-th standard basis vector in $\mathbb{R}^{t-1}$.

Recalling (1.1) and (2.1), we now define
$$\mathbb{T}_\circ \equiv \bigg\{ (\pi, \nu) : \max\Big\{ \|\pi - \pi_\star\|,\ \|\nu - \nu_\star\| \Big\} \le 16 \cdot C_1(U)\,\alpha^{1/2} \bigg\}\,, \tag{2.28}$$
where the constant $C_1(U)$ comes from Lemma 3.3 below. We also let
$$\mathbb{H}_\circ \equiv \bigg\{ J \in \{-1,+1\}^N : \big( \pi(J), \nu(J) \big) \in \mathbb{T}_\circ \bigg\}\,, \tag{2.29}$$
and we let $\mathbb{H}_\bullet \equiv \{-1,+1\}^N \setminus \mathbb{H}_\circ$. We then decompose $\mathbf{Z}(\mathbf{M}') = \mathbf{Z}_\circ(\mathbf{M}') + \mathbf{Z}_\bullet(\mathbf{M}')$ where
$$\mathbf{Z}_\circ(\mathbf{M}') \equiv \sum_{J\in\mathbb{H}_\circ} S_J(\mathbf{M}')\,, \qquad \mathbf{Z}_\bullet(\mathbf{M}') \equiv \sum_{J\in\mathbb{H}_\bullet} S_J(\mathbf{M}')\,. \tag{2.30}$$

The main result of this section is as follows:

Theorem 2.11. Suppose $U$ satisfies Assumptions 1 and 2, and let $\mathscr{F}'(t)$ be as in (2.3). Given $\varepsilon \in \mathbb{R}$, define
$$\varkappa(\pi, \nu) \equiv x[t]^{\mathsf t}\pi_\star + \bigg\{ x[t]^{\mathsf t}(\pi - \pi_\star) + N^{1/2}\,\varepsilon\,c[t-1]^{\mathsf t}(\nu - \nu_\star) \bigg\} \in \mathbb{R}^M\,,$$
for $\pi_\star$ and $\nu_\star$ as in Definition 2.10. (The parameter $\varepsilon$ will be fixed later in (4.6).) Then define
$$\Psi(\pi, \nu) \equiv \frac{\|\nu - \varepsilon(\nu - \nu_\star)\|^2}{2(1 - \|\pi\|^2)} - \frac{(\nu_\star, \nu)}{1 - q} + \frac{1}{N}\sum_{a\le M} L_{\|\pi\|^2}\big( \varkappa_a(\pi, \nu) \big)\,.$$
If $Q$ is the measure on $\{-1,+1\}^N$ from Definition 2.10, then we have
$$\frac{\mathbb{E}\big( \mathbf{Z}_\circ(\mathbf{M}')\,\big|\,\mathscr{F}'(t) \big)}{\exp\{ (\mathbf{1}, \log(2\cosh(H^{(t)}))) \}} \le \sum_{J\in\mathbb{H}_\circ} Q(J)\exp\bigg\{ N\Big[ \Psi\big( \pi(J), \nu(J) \big) + \mathrm{ERR}_{t,3} \Big] \bigg\}\,,$$
where $\mathrm{ERR}_{t,3}$ is an $\mathscr{F}'(t)$-measurable random variable that converges to zero in probability as $N \to \infty$.

The proof of Theorem 2.11 is given in §2.4.

2.3. First moment for a single configuration. The main result of this subsection is the following:

Proposition 2.12. Suppose $U$ satisfies Assumptions 1 and 2, and let $\mathscr{F}'(t)$ be as in (2.3). Define
$$\mathbb{A}(\pi, \dot\pi, \dot\nu, \beta) \equiv \frac{\|\dot\nu - \beta\|^2}{2(1 - \|\pi\|^2)} + \frac{1}{N}\sum_{a\le M} L_{\|\pi\|^2}\bigg( x[t]^{\mathsf t}\dot\pi + N^{1/2}c[t-1]^{\mathsf t}\beta \bigg)_a\,,$$
where the function $L$ is defined by (1.3). Recall $S_J(\mathbf{M}')$ from (2.1). There exists a finite constant $\wp_{t,1}$ such that for any large finite constant $\beta_{\max}$, it holds with probability $1 - o_N(1)$ that
$$\frac{1}{N}\log\mathbb{E}\Big( S_J(\mathbf{M}')\,\Big|\,\mathscr{F}'(t) \Big) \le \inf\bigg\{ \mathbb{A}\Big( \pi(J), \dot\pi(J), \dot\nu(J), \beta \Big) : \|\beta\| \le \beta_{\max} \bigg\} + \frac{\wp_{t,1}}{N}$$
uniformly over all $J \in \{-1,+1\}^N$ with $\|\pi(J)\| \le 4/5$.

The proof of Proposition 2.12 is given at the end of this subsection.

�e proof of Proposition 2.12 is given at the end of this subsection.

De�nition 2.13 (row and column subspaces). Given ℱ1pCq as in (2.3), de�ne the linear subspaces

+R ” +RpCq ” span"

e0pmpBqqt : 1 ď 0 ď ", 1 ď B ď C

*

,

+C ” +CpC ´ 1q ” span"

npℓqpe8qt : 1 ď 8 ď #, 1 ď ℓ ď C ´ 1*

.

Let +RC ” +R `+C. Let projR denote the orthogonal projection onto +R, and de�ne analogously projC and projRC.Note that pM1qRC ” projRCpM1q is measurable with respect to ℱ

1pCq.


Definition 2.14 (row and column events). We now let $\mathbf{M}$ be an independent copy of $\mathbf{M}'$, and define
$$\mathsf{R} \equiv \Big\{ \mathrm{proj}_{\mathrm{R}}(\mathbf{M}) = (\mathbf{M}')_{\mathrm{R}} \Big\} = \bigg\{ \frac{\mathbf{M}m^{(s)}}{N^{1/2}} = h^{(s+1)} + d\,n^{(s-1)}\ \text{for all } 1 \le s \le t \bigg\}\,, \tag{2.31}$$
$$\mathsf{C} \equiv \Big\{ \mathrm{proj}_{\mathrm{C}}(\mathbf{M}) = (\mathbf{M}')_{\mathrm{C}} \Big\} = \bigg\{ \frac{\mathbf{M}^{\mathsf t}n^{(\ell)}}{N^{1/2}} = H^{(\ell+1)} + b\,m^{(\ell-1)}\ \text{for all } 1 \le \ell \le t-1 \bigg\}\,. \tag{2.32}$$
We shall refer to $\mathsf{R}$ as the row event (since it constrains the rows of the matrix $\mathbf{M}$). Likewise we shall refer to $\mathsf{C}$ as the column event.

Our calculation is based on the following resampling principle (proved in §A.2):

Lemma 2.15 (resampling). Let $\mathscr{F}'(t)$ be as in (2.3). If $f : \mathbb{R}^{M\times N} \to \mathbb{R}$ is any bounded measurable function, then
$$\mathbb{E}\Big( f(\mathbf{M}')\,\Big|\,\mathscr{F}'(t) \Big) = \mathbb{E}\bigg( f(\mathbf{M})\,\bigg|\,\mathsf{R}, \mathsf{C}, (\mathbf{M}')_{\mathrm{RC}} \bigg)$$
where $\mathbf{M}$ denotes an independent copy of $\mathbf{M}'$; and the events $\mathsf{R}$ and $\mathsf{C}$ are defined by (2.31) and (2.32).

Definition 2.16 (configuration-dependent subspaces). Given $\mathscr{F}'(t)$ as in (2.3), and $J \in \{-1,+1\}^N$, recall from Definition 2.7 that we decompose $J = J^1 + J^2$, and let $v \equiv J^2/\|J^2\|$. We then define the linear subspaces
$$V_{\mathrm{P}} \equiv \mathrm{span}\bigg\{ e_a v^{\mathsf t} : 1 \le a \le M \bigg\}\,, \qquad V_{\mathrm{A}} \equiv \mathrm{span}\bigg\{ n^{(\ell)}v^{\mathsf t} : 1 \le \ell \le t-1 \bigg\}\,.$$
Note that $V_{\mathrm{A}}$ is a subspace of $V_{\mathrm{P}}$, and is also a subspace of $V_{\mathrm{C}}$. Let $\mathrm{proj}_{\mathrm{A}}$ denote the orthogonal projection onto $V_{\mathrm{A}}$, and note that $(\mathbf{M}')_{\mathrm{A}} \equiv \mathrm{proj}_{\mathrm{A}}(\mathbf{M}')$ is measurable with respect to $\mathscr{F}'(t)$.

Definition 2.17 (admissibility event). As in Definition 2.14, let $\mathbf{M}$ be an independent copy of $\mathbf{M}'$, and define
$$\mathsf{A} \equiv \Big\{ \mathrm{proj}_{\mathrm{A}}(\mathbf{M}) = (\mathbf{M}')_{\mathrm{A}} \Big\} \stackrel{(2.32)}{=} \bigg\{ \frac{n[t-1]\,\mathbf{M}v}{N\psi^{1/2}} = \frac{H[t-1]\,v}{(N\psi)^{1/2}} \bigg\}\,, \tag{2.33}$$
where the last identity holds assuming $\mathbf{M}$ belongs to the event $\mathsf{C}$ from (2.32). Note that $H[t-1]v$ is determined by the parameter $\eta(J)$ from Definition 2.7. We refer to $\mathsf{A}$ as the admissibility event, and note that the event $\mathsf{C}$ implies $\mathsf{A}$ (since $V_{\mathrm{A}} \subseteq V_{\mathrm{C}}$).

In the setting of the perceptron model, the calculation of Lemma 2.15 can be simplified as follows:

Lemma 2.18 (reduction of column constraints). If $h : \mathbb{R}^M \to \mathbb{R}$ is any bounded measurable function, then
$$\mathbb{E}\bigg( h(\mathbf{M}J)\,\bigg|\,\mathsf{R}, \mathsf{C}, (\mathbf{M}')_{\mathrm{RC}} \bigg) = \mathbb{E}\bigg( h(\mathbf{M}J)\,\bigg|\,\mathsf{R}, \mathsf{A}, (\mathbf{M}')_{\mathrm{RA}} \bigg)$$
where $\mathbf{M}$ denotes an independent copy of $\mathbf{M}'$ and the events $\mathsf{R}, \mathsf{C}, \mathsf{A}$ are defined by (2.31), (2.32), and (2.33).

where M denotes an independent copy of M1 and the events R,C,A are de�ned by (2.31), (2.32), and (2.33).

Proof. Let +CzR be the orthogonal complement of +R inside +R `+C: that is,

+R `+C “ +R k+CzR ,

where we use k to denote the sum of two orthogonal vector spaces. Note that +A is a subspace of +C which isorthogonal to +R, so it follows that +A is also a subspace of +CzR. Let projCzR denote the orthogonal projection onto+CzR. Note that +A is a subspace of +P, and +P is orthogonal to +R. We claim that

projCzRp+Pq “ +A . (2.34)

Since we already noted that +A Ď +CzR, it su�ces to show inclusion in the other direction. �e space +P is spannedby the elements e0vt. Let cpℓq, 1 ď ℓ ď C´1, be any orthonormal basis for the span of the vectors npℓq, 1 ď ℓ ď C´1.

Page 17: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 17

An orthonormal basis for +A is then given by the matrices cpℓqvt, 1 ď ℓ ď C ´ 1. On the other hand, the space +C isspanned by the elements cpℓqpe8qt. We therefore have

ˆ

e0vt ´ projA´

e0vt¯

, cpℓqpe8qt ´ projR´

cpℓqpe8qt¯

˙

ˆ

e0vt ´ projA´

e0vt¯

, cpℓqpe8qt˙

“ pcpℓqq0v8 ´ˆ

ÿ

:ďC´1pe0vt , cp:qvtqcp:qvt , cpℓqpe8qt

˙

“ pcpℓqq0v8 ´ pcpℓqq0v8 “ 0 .

It follows that for any �P P +P we have �P ´ projAp�Pq orthogonal to +CzR, which concludes the proof of (2.34). Itfollows that +P “ +A k +PzA where +PzA is the orthogonal complement of +A inside +P, and +PzA is orthogonal to+CzR. As a result, if M is an "ˆ# matrix with i.i.d. standard gaussian entries, we can decompose MP “ MA`MPzA

where MPzA “ projPzApMq is independent of MCzR. It follows that

ˆ

ℎpM�q

ˇ

ˇ

ˇ

ˇ

R,C, pM1qRC

˙

“ �

ˆ

ℎpMR�1 ` MP�

2q

ˇ

ˇ

ˇ

ˇ

R,A,C, pM1qRC

˙

“ �

ˆ

ℎpMR�1 ` pMA ` MPzAq�

2q

ˇ

ˇ

ˇ

ˇ

R,A,C, pM1qRC

˙

“ �

ˆ

ℎpMR�1 ` pMA ` MPzAq�

2q

ˇ

ˇ

ˇ

ˇ

R,A, pM1qRA

˙

“ �

ˆ

ℎpM�q

ˇ

ˇ

ˇ

ˇ

R,A, pM1qRA

˙

,

as claimed. �

Further towards the proof of Proposition 2.12, we record the following calculations:

Lemma 2.19. For $J \in \{-1,+1\}^N$, recall the decomposition $J = J^1 + J^2$, and define $\tilde\chi \equiv \mathbf{M}J^1/N^{1/2}$. On the event $\mathsf{R}$ from (2.31), we have
$$\tilde\chi = \frac{1}{q^{1/2}}\bigg\{ h[t]^{\mathsf t}a(J) + d\,n[t-1]^{\mathsf t}\bar a(J) \bigg\} = x[t]^{\mathsf t}\dot\pi(J) + N^{1/2}c[t-1]^{\mathsf t}\bigg( \dot\nu(J) - \Big( 1 - \|\pi(J)\|^2 \Big)^{1/2}(A^N)^{\mathsf t}\eta(J) \bigg)\,.$$
In the above, $\pi(J)$ is given by Definition 2.6; $a(J)$ and $\eta(J)$ are given by Definition 2.7; and $\bar a(J)$, $\dot\pi(J)$, and $\dot\nu(J)$ are defined by Lemma 2.9.

Proof. Fix $J$ and abbreviate $a \equiv a(J)$, etc. Conditional on the event $\mathsf{R}$ from (2.31), we have
$$\tilde\chi \equiv \frac{\mathbf{M}J^1}{N^{1/2}} \stackrel{(2.21)}{=} \sum_{s\le t}\frac{a_s\,\mathbf{M}m^{(s)}}{(Nq)^{1/2}} \stackrel{(2.31)}{=} \sum_{s\le t}\frac{a_s}{q^{1/2}}\Big( h^{(s+1)} + d\,n^{(s-1)} \Big)\,.$$
Recall the notation (2.11) and (2.15), and also that $n^{(0)} \equiv 0 \in \mathbb{R}^M$. Therefore the above can be rewritten as
$$\tilde\chi = \frac{h[t]^{\mathsf t}a}{q^{1/2}} + \frac{d\,n[t-1]^{\mathsf t}\bar a}{q^{1/2}}\,.$$
Combining with (2.11) and (2.15) gives
$$\tilde\chi \stackrel{(2.11)}{=} x[t]^{\mathsf t}B^{\mathsf t}a + \frac{(1-q)\,n[t-1]^{\mathsf t}\bar a}{q^{1/2}} \stackrel{(2.15)}{=} x[t]^{\mathsf t}B^{\mathsf t}a + \frac{\psi^{1/2}N^{1/2}(1-q)}{q^{1/2}}\,c[t-1]^{\mathsf t}(A^N)^{\mathsf t}\bar a\,.$$
Recalling the notation of Lemma 2.9 gives, with $\dot\pi \equiv B^{\mathsf t}a$ and $\dot\nu$ as in (2.25),
$$\tilde\chi = x[t]^{\mathsf t}\dot\pi + N^{1/2}c[t-1]^{\mathsf t}\bigg( \dot\nu - \Big( 1 - \|\pi\|^2 \Big)^{1/2}(A^N)^{\mathsf t}\eta \bigg)\,.$$
This concludes the proof. □


Lemma 2.20. Given $J \in \{-1,+1\}^N$, define the cumulant-generating function
$$K_J(\beta) \equiv \frac{1}{N}\log\mathbb{E}\bigg[ \exp\bigg\{ N^{1/2}\sum_{\ell\le t-1}\beta_\ell\,\big( c^{(\ell)}, \mathbf{M}v \big) \bigg\}\,S_J(\mathbf{M})\,\bigg|\,\mathsf{R} \bigg]$$
for $\beta \in \mathbb{R}^{t-1}$. Then, with $L$ as in (1.3), the function $K_J$ satisfies
$$K_J(\beta) - \frac{\|\beta\|^2}{2} = \frac{1}{N}\bigg( \mathbf{1},\ L_{\|\pi(J)\|^2}\bigg( \tilde\chi + N^{1/2}\Big( 1 - \|\pi(J)\|^2 \Big)^{1/2}c[t-1]^{\mathsf t}\beta \bigg) \bigg) \equiv \mathscr{L}_J(\beta)\,,$$
with $\pi(J)$ as in Definition 2.6 and $\tilde\chi$ as in Lemma 2.19.

Proof. Conditional on the event $\mathsf{R}$, it follows from Lemma 2.19 that $\mathbf{M}J^1/N^{1/2} = \tilde\chi$. We also have
$$\frac{\mathbf{M}J^2}{N^{1/2}} = \frac{\|J^2\|}{N^{1/2}}\,\mathbf{M}v \equiv \Big( 1 - \|\pi\|^2 \Big)^{1/2}\tilde Z\,,$$
where $\pi \equiv \pi(J)$, and $\tilde Z \equiv \mathbf{M}v$ is distributed as an independent standard gaussian vector in $\mathbb{R}^M$. It follows that
$$K_J(\beta) = \frac{1}{N}\sum_{a\le M}\log\mathbb{E}_\xi\bigg[ \exp\bigg\{ N^{1/2}\sum_{\ell\le t-1}\beta_\ell\,(c^{(\ell)})_a\,\xi \bigg\}\,U\bigg( \tilde\chi_a + \Big( 1 - \|\pi\|^2 \Big)^{1/2}\xi \bigg) \bigg]\,,$$
where $\xi$ denotes a standard gaussian random variable. Making a change of variable gives
$$K_J(\beta) = \frac{\|\beta\|^2}{2} + \frac{1}{N}\sum_{a\le M}\log\mathbb{E}_\xi\,U\bigg( \tilde\chi_a + \Big( 1 - \|\pi\|^2 \Big)^{1/2}\bigg\{ \xi + N^{1/2}\sum_{\ell\le t-1}\beta_\ell\,(c^{(\ell)})_a \bigg\} \bigg)\,,$$
from which the result follows. □

Having collected most of the necessary ingredients, we now prove the main result of this subsection. �e proofrequires one more slightly technical estimate which we defer to Proposition 6.13 in Section 6.

Proof of Proposition 2.12. With ℱ1pCq as in (2.3) and S�pM1q as in (2.1), let us abbreviate the quantity of interest as

�� ” �´

S�pM1qˇ

ˇ

ˇℱ1pCq

¯

.

By the resampling principle from Lemma 2.15, we can express

�� “ �´

S�pMqˇ

ˇ

ˇR,C, pM1qRC¯

,

where M is an independent copy of M1, and R and C are the row and column events of De�nition 2.14. ApplyingLemma 2.18 then gives the further simpli�cation

�� “ �´

S�pMqˇ

ˇ

ˇR,A, pM1qRA¯

, (2.35)

where A is the admissibility event de�ned by (2.33).Let +R be as in De�nition 2.13, and note that an orthonormal basis for +R is given by the elements e0prpBqqt for

1 ď 0 ď ", 1 ď B ď C. Denote

gR ”ˆ

pM, e0prpBqqtq : 1 ď 0 ď ", 1 ď B ď C

˙

P ℝ"C .

Likewise let +P and +A be as in De�nition 2.16: recall that +P is orthogonal to +R, and +A is a subpsace of +P. Anorthonormal basis for +P is given by the elements e0vt for 1 ď 0 ď ". Denote

gP ”ˆ

pM, e0vtq : 1 ď 0 ď "

˙

“ Mv P ℝ" . (2.36)

An orthonormal basis for +A is given by the elements cpℓqvt for 1 ď ℓ ď C ´ 1, and we shall denote

gA ”ˆ

pM, cpℓqvtq : 1 ď ℓ ď C ´ 1˙

“ crC ´ 1sMv P ℝC´1 . (2.37)

Page 19: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 19

Lastly, as in the proof of Lemma 2.18, let+PzA be the orthogonal complement of+A inside+P. Choose an orthonormalbasis for +PzA, and denote it H 9 for 1 ď 9 ď " ´ pC ´ 1q. We then let

gB ”ˆ

pM, H 9q : 1 ď 9 ď " ´ pC ´ 1q˙

P ℝ"´C`1 . (2.38)

Note that there is an orthogonal transformation of ℝ" which maps gP to the pair pgA , gBq. In what follows we let?R denote the probability density function for gR, so

?Rp6Rq “1

p2�q"C{2 exp"

´}6R}

2

2

*

. (2.39)

Likewise let ?A and ?B denote the densities for gA and gB respectively. Since the three subspaces+R, +A, and +B aremutually orthogonal, the joint density of pgR , gA , gBq is simply the product ?Rp6Rq?Ap6Aq?Bp6Bq.

�e weight S�pMq, as de�ned by (2.1), is a function of M�, which we decomposed in the proof of Lemma 2.20 as asum of M�1 and M�2. Note that M�1 is a function of gR, while M�2 is a function of gP which in turn is a function ofpgA , gBq. �us (2.1) can be rewri�en as a function Y� of pgR , gA , gBq: explicitly,

S�pMq “ź

0ď"

*

ˆ

ÿ

BďC

p� , rpBqq#1{2 pgRq0,B `

}�2}

#1{2 pgPq0˙

” Y�pgR , gA , gBq .

On the event R, the value of gR is �xed to a value 6R:

p6Rq0,B “ pMrrCstq0,B(2.14)“

ˆ

MmrCstpp�#qtq´1

p#@q1{2

˙

0,B

,

where the right-hand side can be computed from (2.31). Likewise, on the event A, the value of gA is �xed to a value6A. We then introduce a parameter � P ℝC´1, and de�ne

S� ,�pMq ” Y� ,�pgR , gA , gBq ” Y�pgR , gA , gBq exp"

#1{2p�, gAq*

. (2.40)

�en, for any � P ℝC´1, we can rewrite (2.35) as

�� “ �´

S�pMqˇ

ˇ

ˇR,A, pM1qRA¯

“ �

ˆ

Y� ,�pgR , gA , gBqexpp#1{2p�, 6Aqq

ˇ

ˇ

ˇ

ˇ

pgR , gAq “ p6R , 6Aq˙

“1

expp#1{2p�, 6Aqq

ż

Y� ,�p6R , 6A , 6Bq?Bp6Bq 36B . (2.41)

By contrast, the expected value of S� ,� given only the row constraints is

K�p� | 6Rq ” �´

S� ,�pMqˇ

ˇ

ˇR, pM1qR¯

“ �

ˆ

Y� ,�pgR , gA , gBqˇ

ˇ

ˇ

ˇ

gR “ 6R

˙

ż

?Ap6Aq

ż

Y� ,�p6R , 6A , 6Bq?Bp6Bq 36B 6A “ expp#K�p�qq , (2.42)

which was computed in Lemma 2.20 above. We then let p� ,�p¨ | 6Rq be the probability density function of 6A underthe measure that is biased by S� ,�pMq, conditional on the event R, that is to say,

p� ,�p6A | 6Rq 36A ”�pS� ,�pMq1tgA P 36Au |Rq

�pS� ,�pMq |Rq”

?Ap6Aq

K�p� | 6Rq

ż

Y� ,�p6R , 6A , 6Bq?Bp6Bq 36B 36A . (2.43)

�en, for any � P ℝC´1, we can rewrite (2.41) as

�� “K�p� | 6Rq ¨ p� ,�p6A | 6Rq

expt#1{2p�, 6Aqu ¨ ?Ap6Aq. (2.44)

We will show in Proposition 6.13 (deferred to Section 6) that there is a �nite constant ℘C ,0 such that for any �niteconstant �max, we have the uniform bound

max"

›p� ,�p¨ | 6Rq›

8: � P t´1,`1u# , }�p�q} ď 4

5 , }�} ď �max

*

ď ℘C ,0 (2.45)

Page 20: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

20 E. BOLTHAUSEN, S. NAKAJIMA, N. SUN, AND C. XU

with high probability. It therefore remains to estimate the other two terms on the right-hand side of (2.44). We thennote that De�nition 2.17 implies that, on the event A, we have

6A

#1{2 “gA#1{2

(2.37)“

crC ´ 1sMv#1{2

(2.15)“

p�#q´1nrC ´ 1sMv##1{2

(2.33)“

p�#q´1HrC ´ 1svp##q1{2

(2.22)“ p�#qt� . (2.46)

Substituting (2.46) into the formula for ?A (similar to (2.39)) gives

?Ap6Aq “1

p2�qpC´1q{2 exp"

´#}p�#qt�}2

2

*

. (2.47)

Meanwhile, it follows by combining (2.42) and (2.46) thatK�p� | 6Rq

expt#1{2p�, 6Aqu“ exp

"

#”

K�p�q ´ p�, p�#qt�qı

*

. (2.48)

Substituting (2.45), (2.47), and (2.48) into (2.44) gives��

p2�qC{2 ¨ ℘C ,0ď exp

"

#

K�p�q ´ p�, p�#qt�q `}p�#qt�}2

2

*

.

Recalling the calculation of K�p�q from Lemma 2.20 gives��

p2�qC{2 ¨ ℘C ,0ď exp

"

#

}�´ p�#qt�}2

2 ` ℒ�p�q

*

” exp!

#A�p�q)

, (2.49)

where A� is de�ned by the last identity. To simplify the above expression, we will recenter � around

� ” �p�q ” ´#1{2p1´ @qp�#qt�

@1{2p1´ }�}2q1{2(2.25)“ ´

9+

p1´ }�}2q1{2` p�#qt� . (2.50)

We then make a change of variables from � to �, via the de�nition

� ” �`�

p1´ }�}2q1{2. (2.51)

�is change of variables results in the simpli�cation

�´ p�#qt�(2.51)“ �`

p1´ }�}2q1{2´ p�#qt�

(2.50)“

� ´ 9+

p1´ }�}2q1{2.

�e computation of ˜� from Lemma 2.19 can also be rewri�en as

˜�

(2.50)“ xrCst 9�´ #1{2

´

1´ }�}2¯1{2

crC ´ 1st� . (2.52)

As a result the function ℒ� from (2.20) can be reparametrized as

ℒ�

ˆ

�`�

p1´ }�}2q1{2

˙

(2.52)“

1#

ˆ

1, !}�}2

´

xrCst 9�` #1{2crC ´ 1st�¯

˙

.

It follows by substituting the above calculations into (2.49) that

A�

ˆ

�`�

p1´ }�}2q1{2

˙

“} 9+ ´ �}2

2p1´ }�}2q `1#

ÿ

0ď"

!}�}2

ˆ

xrCst 9�` #1{2crC ´ 1st�˙

.

�e claim follows by taking ℘C ,1 ” plog℘C ,0 ` C logp2�qq{2. �

�e above completes the proof of Proposition 2.12, modulo Proposition 6.13 which is deferred to Section 6.

Page 21: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 21

2.4. First moment for partition function. We now collect some of the preceding results to complete the proofof the main result of this section:

Proof of �eorem 2.11. For any � P t´1,`1u# we can calculate (abbreviating + ” +p�q)

pHpCq , �q#

“p4C´1q

tHrC ´ 1s�#

(2.10)“

#1{2p4C´1qt�yrC ´ 1s�#

(2.20)“ p#1{2�t 4C´1 , +q

(2.27)“

p+˚ , +q

1´ @.

It follows by combining with De�nition 2.10 that�p`˝pM1q |ℱ1pCqq

exptp1, logp2 chpHpCqqqqu“

ÿ

�Pℍ˝

Qp�qˆ

Pp�qQp�q exptp1, log chpHpCqqqu

˙

´

S�pM1qˇ

ˇ

ˇℱ1pCq

¯

“ÿ

�Pℍ˝

Qp�q exp"

´#p+˚ , +q

1´ @

*

´

S�pM1qˇ

ˇ

ˇℱ1pCq

¯

. (2.53)

Combining Proposition 2.12 with Lemma 2.9 gives, with high probability,

´

S�pM1qˇ

ˇ

ˇℱ1pCq

¯

ď}+ ´ �}2

2p1´ }�}2q `1#

ÿ

0ď"

!}�}2

ˆ

xrCst�` #1{2crC ´ 1st�˙

` ERRC ,3 ,

uniformly over }�p�q} ď 4{5 and }�} ď �max. �e claim follows by se�ing � “ &p+ ´ +˚q. �

Recall from (2.30) that ` “ `˝ ` `‚ where `˝ is bounded by �eorem 2.11. In the remainder of this section weshow that the other quantity `‚ can be bounded by a priori estimates. For this purpose we prove a rough estimate on�p�q (Lemma 2.21), followed by a more precise estimate on +p�q (Lemma 2.22). In fact Lemma 2.22 is more precisethan what is needed to analyze `‚, but it will be needed later (in Section 4) in the analysis of `˝. We �rst state andprove the estimate for �p�q:

Lemma 2.21. Recall �p�q from De�nition 2.6 and 9�˚ from (2.26). For Q as in De�nition 2.10, we have

Qˆ"

� P t´1,`1u# :›

›�p�q ´ 9�˚›

› ě 3

ď

ˆ

66C@

˙C{2exp

"

´#32p1´ 3@1{2q

8

*

for all |3| ě 1{#1{2. (�e bound is vacuous unless #32 is large compared to C log C.)

Proof. Under the measure Q, the random vector � ´mpCq has independent entries of mean zero. We note also that

pm8q2 ” max

"

ˇ

ˇ

ˇ�8 ´ pmpCqq8

ˇ

ˇ

ˇ

2: �8 P t´1,`1u

*

ď

´

1` |pmpCqq8 |

¯2ď 1` 3|pmpCqq8 | ď 4 . (2.54)

�us for any 0 P ℝC we can bound

+maxp0q ”ÿ

8ď#

ˆ

ÿ

BďC

0BprpBqq8˙2pm8q

2 ď 4›

ÿ

BďC

0BrpBq›

2“ 4}0}2 .

It follows by the Azuma–Hoe�ding bound that

1#1{2

ˆ

ÿ

BďC

0BrpBq , � ´mpCq

˙

ě G

˙

ď exp"

´#G2

2+maxp0q

*

ď exp"

´#G2

8}0}2

*

. (2.55)

On the other hand, it follows from De�nition 2.6 and (2.26) that1

#1{2

ˆ

ÿ

BďC

0BrpBq , � ´mpCq

˙

´

0,�p�q ´ 9�˚¯

. (2.56)

Given 3 ą 0 and & P p0, 1{4s, note there exists a p3&q-net of r´43, 43sC of cardinality at mostR

8C1{2&

VC

ď

ˆ

8C1{2&` 1

˙C

. (2.57)

Page 22: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

22 E. BOLTHAUSEN, S. NAKAJIMA, N. SUN, AND C. XU

If � is any element of t´1,`1u# with 3 ď }�p�q ´ 9�˚} ď 23, and �net is an element of the p3&q-net at minimaldistance from �p�q, then }�net ´ 9�˚} ě 3p1´ &q, and

´

�net ´ 9�˚ ,�p�q ´ 9�˚¯

ě

›�p�q ´ 9�˚›

ˇ

ˇ

ˇ

ˇ

´

�net ´ �p�q,�p�q ´ 9�˚¯

ˇ

ˇ

ˇ

ˇ

ě 32p1´ 2&q .

�us, by taking 0 “ �net ´ 9�˚ and & “ @1{2 in (2.55) and (2.56), we obtain

3 ď›

›�p�q ´ 9�˚›

› ď 23˙

ď

ˆ

65C@

˙C{2exp

"

´#34p1´ 2@1{2q2

832p1´ @1{2q2

*

ď

ˆ

65C@

˙C{2exp

"

´#32p1´ 3@1{2q

8

*

.

Since 4: ě 3: for all : ě 0, as long as #32 ě 1 we can bound

›�p�q ´ 9�˚›

› ě 3

˙

ďÿ

:ě0

p65C{@qC{2

expt#p2:3q2p1´ 3@1{2q{8uď

2p65C{@qC{2

expt#32p1´ 3@1{2q{8u.

�is proves the claim. �

�e result for +p�q is very similar, although slightly more involved since we require a more precise estimate:

Lemma 2.22. Recall +p�q from De�nition 2.6, and 9+˚ from (2.27). For Q as in De�nition 2.10, we have

Qˆ"

� P t´1,`1u# :›

›+p�q ´ 9+˚›

› ě 3

ď

ˆ

66C@

˙C{2exp

"

´#32p1´ 8@1{2q

2

*

for all |3| ě 1{#1{2. (�e bound is vacuous unless #32 is large compared to C log C.)

Proof. For 1 P ℝC´1, denote

,maxp1q ”1#

ÿ

8ď#

ˆ

ÿ

ℓďC´11ℓ pypℓqq8

˙2pm8q

2 ď,0p1q ` 3,1p1q .

Recall the bound (2.54) from the proof of Lemma 2.21; it implies ,max ď,0 ` 3,1 where

,0p1q ”1#

ÿ

ℓďC´11ℓypℓq

2,

,1p1q ”1#

ÿ

8ď#

ˆ

ÿ

ℓďC´11ℓ pypℓqq8

˙2|pmpCqq8 | .

It follows from Lemma 2.3 that ,0p1q Ñ }1}2 in probability as # Ñ8. Lemma 2.3 also implies,1p1q

}1}2#Ñ8ÝÑ �

„ˆ

�/ ` p1´ �2q1{2/1˙2ˇ

ˇ

ˇ thp#1{2/qˇ

ˇ

ˇ

” F1p�q .

in probability, where � P r´1, 1s is a value that can depend on 1. However we can crudely bound

F1p�q ď

ˆ

�p/4q�rthp#1{2/q2s

˙1{2(1.5)“ p3@q1{2 .

It follows by the Azuma–Hoe�ding inequality that

1#

ˆ

ÿ

ℓďC´11ℓypℓq , � ´mpCq

˙

ě G

˙

ď exp"

´#G2

2,maxp1q

*

ď exp"

´#G2

2}1}2p1` 6@1{2q

*

. (2.58)

On the other hand, it follows from De�nition 2.6 and (2.27) that1#

ˆ

ÿ

ℓďC´11ℓypℓq , � ´mpCq

˙

´

1, +p�q ´ 9+˚¯

. (2.59)

Page 23: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 23

Given 3 ą 0 and & P p0, 1{4s, note there exists a p3&q-net of r´43, 43sC´1 with cardinality upper bounded by (2.57).If � is any element of t´1,`1u# with 3 ď }+p�q ´ 9+˚} ď 23, and +net is an element of the p3&q-net at minimaldistance from +p�q, then }+net ´ 9+˚} ě 3p1´ &q, and

´

+net ´ 9+˚ , +p�q ´ 9+˚¯

ě

›+p�q ´ 9+˚›

ˇ

ˇ

ˇ

ˇ

p+net ´ +p�q, +p�q ´ 9+˚q

ˇ

ˇ

ˇ

ˇ

ě 32p1´ 2&q .

�us, by taking & “ @1{2 and 1 “ +net ´ 9+˚ in (2.58) and (2.59), we obtain

3 ď›

›+p�q ´ 9+˚›

› ď 23˙

ď

ˆ

65C@

˙C{2exp

"

´#34p1´ 2@1{2q2

232p1´ @1{2q2p1` 6@1{2q

*

ď

ˆ

65C@

˙C{2exp

"

´#32p1´ 8@1{2q

2

*

.

Since 4: ě 3: for all : ě 0, as long as #32 ě 1 we can bound

›+p�q ´ 9+˚›

› ě 3

˙

ďÿ

:ě0

p65C{@qC{2

expt#p2:3q2p1´ 8@1{2q{2uď

2p65C{@qC{2

expt#32p1´ 8@1{2q{2u.

�e claim follows. �

3. Technical estimates

We now collect some technical results which will be used later in the proof. �is section is organized as follows:‚ In §3.1 we prove some basic consequences of Assumptions 1 and 2.‚ In §3.2 we give the proof of Proposition 1.1, which characterizes the replica symmetric �xed-point solution.

As a consequence of this analysis we obtain a rough estimate (Corollary 3.8) of the replica symmetric formula(1.8), which will be used in later sections. We also prove Proposition 1.9, showing that the replica symmetricformula for *� converges to the one for * as � Ó 0.

‚ In §3.3 we prove Lemma 3.11, which gives the Almeida–�ouless (AT) condition in our se�ing.‚ In §3.4 we give the proof of Proposition 1.4, showing that Assumption 2 holds if D ” log* is either bounded

or concave. We also give the proof of Proposition 2.5 (convergence of the state evolution recursions), whichamounts to checking that AT condition derived in Lemma 3.11 holds for 0 ă ď p*q. We conclude thesection with some further consequences (Lemmas 3.14 and 3.15) of Assumption 2.

�e following notation will be used throughout the paper:

De�nition 3.1. For 2 ą 0 and G P ℝ, let �G,2 denote the probability measure on the real line whose density (withrespect to the Lebesgue measure) is given by

3�G,23I

“ "G,2pIq ”*pG ` 2Iq!pIq

��r*pG ` 2�qs.

We use �G,2 , VarG,2 , and CovG,2 to denote expectation, variance, and covariance under �G,2 .

3.1. Preliminary bounds. In this subsection we prove some basic consequences of Assumptions 1 and 2. As before,� denotes a standard gaussian random variable, and �� denotes expectation over �.

Lemma 3.2. Suppose * satis�es Assumption 1, and let @G,2pIq ” *pG ` 2Iq!pIq as above. �en, given any & ą 0and any ! ă 8, it is possible to choose �1 small enough such that we have the bound

ż

ˇ

ˇ

ˇ*pG ` 2Iq ´*pG1 ` 21Iqˇ

ˇ

ˇ!pIq 3I ď &

as long as 2, 21 P r1{3, 3s, G, G1 P r´!, !s, and maxt|G ´ G1|, |2 ´ 21|u ď �1.

Proof. Given & ą 0, we can clearly choose !p&q large enough (depending only on &) such that !p&q ě !, andż

|I|ě!p&q

ˇ

ˇ

ˇ*pG ` 2Iq ´*pG1 ` 21Iqˇ

ˇ

ˇ!pIq 3I ď

ż

|I|ě!p&q!pIq 3I ď

&4 . (3.1)

Page 24: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

24 E. BOLTHAUSEN, S. NAKAJIMA, N. SUN, AND C. XU

If |I| ď !p&q, then the assumptions imply |G ` 2I| ď 4!p&q and |G1 ` 21I| ď 4!p&q, soż

|I|ď!p&q

ˇ

ˇ

ˇ*pG ` 2Iq ´*pG1 ` 21Iqˇ

ˇ

ˇ!pIq 3I ď

ż

ˇ

ˇ

ˇDpG ` 2Iq ´ DpG1 ` 21Iqˇ

ˇ

ˇ!pIq 3I

where DpGq ” *pGq1t|G| ď 4!p&qu. �en, since D P !1, it is well known that we can choose a function D which iscompactly supported and smooth, such that }D ´ D}1 ď &{4 (see e.g. [LL01, Lem. 2.19]). �erefore

ż

ˇ

ˇ

ˇDpG ` 2Iq ´ DpG ` 2Iqˇ

ˇ

ˇ!pIq 3I ď !p0qż

ˇ

ˇ

ˇDpG ` 2Iq ´ DpG ` 2Iqˇ

ˇ

ˇ 3I “!p0q}D ´ D}1

&4 ,

where this estimate holds for all G P ℝ and all 2 ě 1{2. We also haveż

ˇ

ˇ

ˇDpG ` 2Iq ´ DpG1 ` 21Iqˇ

ˇ

ˇ!pIq 3I ď }D1}8

ż

´

|G ´ G1| ` |2 ´ 21||I|¯

!pIq 3I ď 2}D1}8�1 ,

which can be made at most &{4 by taking �1 “ &{p8}D1}8q. Combining the above estimates givesż

|I|ď!p&q

ˇ

ˇ

ˇ*pG ` 2Iq ´*pG1 ` 21Iqˇ

ˇ

ˇ ď3&4 ,

and combining with the estimate (3.1) for |I| ě !p&q gives the conclusion. �

Lemma 3.3. Suppose* satis�es Assumption 1 . �ere exists a �nite constant �1p*q, depending on* only, such that

�G,2p|/|?q “

��p|�|?*pG ` 2�qq

��*pG ` 2�qď �1p*q `

ˆ

1.82 ¨ |G|2

˙?

for all 0 ď ? ď 200, 1{2 ď 2 ď 2, and G P ℝ. (We can assume, without loss, �1p*q ě 10.)

Proof. It follows from Assumption 1 that��*p2�q ą 0 for any 2 ą 0. Lemma 3.2 gives that��*p2�q is a continuousfunction of 1{2 ď 2 ď 2, so by compactness considerations we must have

21p*q ” max"

2, sup"

1��*p2�q

: 12 ď 2 ď 2

**

ă 8 (3.2)

(where we chose 21p*q ě 2 for convenience). Next, for any " ą 0, it holds for all 1{2 ď 2 ď 2 that

��

´

*p2�q; |2�| ě "¯

ď ℙ

ˆ

|�| ě"

2

˙

ď!p"{2q

"{2.

If we take ě 0p*q “ p8 log 21p*qq1{2 ě 2, then for all 1{2 ď 2 ď 2 we have

��

´

*p2�q; |2�| ď ¯

ě ��*p2�q ´!p {2q {2 ě

1221p*q

.

In what follows let pGq ” maxt 0p*q, |G|u. �en we can lower bound

��*pG ` 2�q “

ż

*p2Iq!

ˆ

I ´G

2

˙

3I “

ż

*p2Iq exp"

´G2

222 `GI

2

*

!pIq 3I

ě�p*p2�q; |2�| ď pGqq

exptp3{2q pGq2{22uě

1{p221p*qq

exptp3{2q pGq2{22u. (3.3)

Next we note that for any " ě 0 and �1 “ 1{10 we have

��p|�|? ; |�| ě "q “

ż

|I|ě"

|I|?

p2�q1{2exp

"

´I2

2

*

3I

ż

|I|ě"{p1`�1q1{2

p1` �1qp?`1q{2|I|?

p2�q1{2exp

"

´p1` �1qI2

2

*

3I

ď sup"

1.05?`1|I|?

exppI2{20q : I P ℝ*

ˆ

|�| ě"

p1` �1q1{2

˙

ď 20!p"{1.05q"{1.05 , (3.4)

Page 25: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 25

where 20 ě 5 is an absolute constant since we restricted 0 ď ? ď 200. Combining (3.3) with (3.4) gives��p|�|?*pG ` 2�qq

��*pG ` 2�qď

ˆ

1.82 ¨ pGq2

˙?

`��p|�|? ; |�| ě 1.82 ¨ pGq{2q

��*pG ` 2�q

ď

ˆ

1.82 ¨ pGq2

˙?

` 20 ¨!p1.82 ¨ pGq{p1.05 ¨ 2qq

1.82 ¨ pGq{p1.05 ¨ 2q ¨exptp3{2q pGq2{22u

1{p221p*qq.

(�e �rst inequality above also uses that * ď 1, from Assumption 1.) Recalling again the restrictions 1{2 ď 2 ď 2and 0 ď ? ď 200, we can simplify the above to obtain

��p|�|?*pG ` 2�qq

��*pG ` 2�qď

ˆ

1.82 ¨ pGq2

˙?

` 220 ¨ 21p*q ¨ !

ˆ

1.82 ¨ pGq1.05 ¨ 2

˙

exp"

3 pGq2

222

*

ď

ˆ

1.82 ¨ |G|2

˙?

`

1.82 ¨ 0p*q

1{2

˙200`

2 ¨ 20 ¨ 21p*q

p2�q1{2

*

ď

ˆ

1.82 ¨ |G|2

˙?

`

"

´

4 0p*q¯200

` 20 ¨ 21p*q

*

ˆ

1.82 ¨ |G|2

˙?

` �1p*q , (3.5)

where the last equality de�nes �1p*q. �e above choices guarantee �1p*q ě 20 ¨ 21p*q ě 10. �

Remark. �e bound from Lemma 3.3 is reasonably tight. To see this, consider the function

*pGq “ 1"

|G ´ 0| ď0

2

*

for 0 ą 0. If *pG ` �q “ 1, then G ` � ě 0{2, so � ě 0{2´ G. In the case that G ď 0, it implies |�| ě 0{2` |G|. Itfollows that for any G ď 0 we have

�G,2p|/|?q “

��p|�|?*pG ` �qq

��*pG ` �qě

ˆ

0

2 ` |G|˙?

ě

ˆ

0

2

˙?

` |G|? ,

where 0 ą 0 can be chosen to be arbitrarily large.

Next we combine Assumption 2 (which bounds VarG,2p/q) with the calculations of Lemma 3.3 to obtain boundson VarG,2p/2q and CovG,2p/, /2q:

Lemma 3.4. Suppose* satis�es Assumptions 1 and 2, and let �1p*q be as in Lemma 3.3. �en we have

VarG,2p/2q ď 2p*q ¨

1.82 ¨ |G|2

˙2` �1p*q

*

, (3.6)

CovG,2p/, /2q ď 2p*q

21{2 ¨

ˆ

1.82 ¨ |G|2

` �1p*q1{2˙

, (3.7)

for all 1{2 ď 2 ď 2 and all G P ℝ.

Proof. Let pGq be as in the proof of Lemma 3.3. From the de�nition of 2p*q (see Assumption 2),

pIq ”��,�1rp� ´ �1q2p� ` �1q2*pG ` 2�q*pG ` 2�1q; |� ` �1| ď 21{2 ¨ 1.82 ¨ pGq{2s

��,�1r*pG ` 2�q*pG ` 2�1qs

ď 2 2p*q ¨

ˆ

1.82 ¨ pGq2

˙2.

If � and �1 are independent standard gaussian random variables, then � ´ �1 and � ` �1 are independent gaussianrandom variables with mean zero and variance 2. It follows that

��,�1

p� ´ �1q2p� ` �1q2; |� ` �1| ě?

2"ı

“ 4 ¨��

|�|2; |�| ě "ı (3.4)ď 4 ¨ 20

!p"{1.05q"{1.05 . (3.8)

Page 26: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

26 E. BOLTHAUSEN, S. NAKAJIMA, N. SUN, AND C. XU

Combining (3.8) with our earlier bound (3.3) gives

pIIq ”��,�1rp� ´ �1q2p� ` �1q2*pG ` 2�q*pG ` 2�1q; |� ` �1| ě 21{2 ¨ 1.82 ¨ pGq{2s

��,�1r*pG ` 2�q*pG ` 2�1qs

ď 4 ¨ 20!p1.82 ¨ pGq{p1.05 ¨ 2qq

1.82 ¨ pGq{p1.05 ¨ 2q ¨exptp3{2q pGq2{22u

1{p221p*qq

(�e �rst inequality above also uses that * ď 1, from Assumption 1.) By combining the above bounds for thequantities (I) and (II), and recalling again that 1{2 ď 2 ď 2, we obtain

VarG,2p/2q “��,�1rp� ´ �1q2p� ` �1q2*pG ` 2�q*pG ` 2�1qs

2 ¨��,�1*pG ` 2�q*pG ` 2�1q

ď 2p*q ¨

ˆ

1.82 ¨ pGq2

˙2`

4 ¨ 20 ¨ 21p*q

1.82{1.05 ¨ !

ˆ

1.82 ¨ pGq1.05 ¨ 2

˙

exp"

3 pGq2

222

*

ď 2p*q ¨

1.82 ¨ |G|2

˙2`

ˆ

1.82 ¨ 0p*q

1{2

˙2*

`4 ¨ 20 ¨ 21p*q

p1.82{1.05q ¨ p2�q1{2

ď 2p*q ¨

1.82 ¨ |G|2

˙2` 14 ¨ 0p*q

2 ` 20 ¨ 21p*q

*

ď 2p*q ¨

1.82 ¨ |G|2

˙2` �1p*q

*

,

where the second-to-last inequality uses that we took 2p*q ě 1 (see Assumption 2), and the last inequality usesthe de�nition (3.5) of �1p*q from the proof of Lemma 3.3. �is proves (3.6). Combining with Assumption 2 and theCauchy–Schwarz inequality gives

CovG,2p/, /2q ď

"

VarG,2p/qVarG,2p/2q

*1{2ď 2p*q

21{2

1.82 ¨ |G|2

˙2` �1p*q

*1{2

ď 2p*q

21{2

ˆ

1.82 ¨ |G|2

` �1p*q1{2˙

,

where the last inequality again uses that 2p*q ě 1. �is proves (3.7). �

Remark 3.5. We include here an example of a function* that satis�es Assumption 1 but does not satisfy the bound(3.6) (and hence, by Lemma 3.4, must violate Assumption 2). For : ě 1 let 1: ” expp´100 ¨ 4:q, and let

�:pGq ”

ˆ

1!

G P r0, 1s)

` 1!

G P r2: ´ 1, 2:s)

˙

1:!pGq

”1: 5:pGq

!pGq.

�en clearly �: is a nonnegative measurable function supported on r0, 1s Y r2: ´ 1, 2:s, with

}�:}8 ď1:

!p2:q“ 1:p2�q1{2 exp

ˆ

4:2

˙

ďp2�q1{2

expp99 ¨ 4:q.

Let � be a large absolute constant, and de�ne G: ” �2: and

*pGq ”ÿ

:ě1�:pG: ` Gq .

From the above bound on }�:}8 it is clear that * satis�es Assumption 1. Next we note that��r�2�:p�qs

1:“

ż

I2 5:pIq 3I “13

ˆ

p2:q3 ´ p2: ´ 1q3 ` 1˙

“ 22:ˆ

1`$p1q

2:

˙

,

��r�4�:p�qs

1:“

ż

I4 5:pIq 3I “15

ˆ

p2:q5 ´ p2: ´ 1q5 ` 1˙

“ 24:ˆ

1`$p1q

2:

˙

.

Page 27: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 27

For any : ě 1, we have ��*p´G: ` �q ě ���:p�q “ 21: . For ℓ ě : ` 1 and 0 ď ? ď 4, we have��r|�|?�ℓ p´G: ` Gℓ ` Gqs

1:ď}�ℓ }8��p�4q

1:ď

3p2�q1{2 expp100 ¨ 4:qexpp99 ¨ 4ℓ q

ď3p2�q1{2

expp4:r74 ¨ 4ℓ´: ` p25 ¨ 4ℓ´: ´ 100qsqď

3p2�q1{2

expp74 ¨ 4ℓ qď

1expp70 ¨ 4ℓ q

.

On the other hand, for 1 ď ℓ ď : ´ 1 and 0 ď ? ď 4, we have (again taking � large enough)

��r|�|?�ℓ p´G: ` Gℓ ` Gqs

1:ď}�ℓ }8��r�4; |�| ě �2:{4s

1:ď

p2�q1{2 expp100 ¨ 4:qexpp99 ¨ 4ℓ ` �24:{33q

ďp2�q1{2 expp100 ¨ 4:qexpp99` �24:{33q

ďp2�q1{2

expp99` �4:qď

1expp70 ¨ 4:q

.

(In the �rst inequality above, we used that the support of �ℓ is contained in r0, 2ℓ s.) Altogether we conclude��r�2*p´G: ` �qs

��*p´G: ` �q“

ˆ

1`$p1q

expp4:q

˙

��r�2�:p�qs

���:p�q“

ˆ

1`$p1q

expp4:q

˙

22:

2 ,

��r�4*p´G: ` �qs

��*p´G: ` �q“

ˆ

1`$p1q

expp4:q

˙

��r�4�:p�qs

���:p�q“

ˆ

1`$p1q

expp4:q

˙

24:

2 .

Recalling the notation of De�nition 3.1, we obtain

Var´G: ,1p/2q “

ˆ

1`$p1q

expp4:q

˙"

24:

2 ´

ˆ

22:

2

˙2*

“ Θp24:q .

�us shows that * does not satisfy the bound (3.6), as claimed.

3.2. Estimates of the replica symmetric solution. In this subsection we give the proof of Proposition 1.1. As aconsequence we obtain a rough estimate (Corollary 3.8) of the replica symmetric formula which will be used laterin our analysis.

Lemma 3.6. Suppose* satis�es Assumption 1. As in Proposition 1.1, let @p#q ” �rthp#1{2/q2s. �en

max!

0, 1´ 4#)

ď3@

3#ď 1

for all # ě 0.

Proof. It is clear that @ is increasing with respect to # ě 0: indeed,3@

3#“ �

thp#1{2/q th1p#1{2/q/

#1{2

ą 0 ,

since th1pGq ą 0 for all G P ℝ, and G thpGq ě 0 for all G P ℝ. Integrating by parts gives3@

3#“ �

´

th1p#1{2/q¯2` thp#1{2/q th2p#1{2/q

“ �

ˆ

1´ 4 thp#1{2/q2 ` 3 thp#1{2/q4˙

.

Note that G “ thp#1{2/q2 P r0, 1s almost surely, and 1´ 4G ď 1´ 4G ` 3G2 ď 1 for all G P r0, 1s, so

1 ě3@

3#ě �

´

1´ 4 thp#1{2/q2¯

ě 1´ 4# ¨�p/2q “ 1´ 4# ,

for all # ě 0. �

Lemma 3.7. Suppose* satis�es Assumption 1. As in Proposition 1.1, let Ap@q ” �r�@p@1{2/q2s. �en

sup"ˇ

ˇ

ˇ

ˇ

3A

3@

ˇ

ˇ

ˇ

ˇ

: 0 ď @ ď12

*

ď 21 ¨ �1p*q6 ,

where 21 ě 1 is an absolute constant while �1p*q is the constant from Lemma 3.3.

Page 28: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

28 E. BOLTHAUSEN, S. NAKAJIMA, N. SUN, AND C. XU

Proof. For convenience we shall rewrite (1.4) as

�@pGq “��*

1pG ` p1´ @q1{2�q

��*pG ` p1´ @q1{2�q. (3.9)

Note that the above makes sense for any * satisfying Assumption 1, without any smoothness assumption, since* 1 can be interpreted as a distributional derivative (as in e.g. [LL01, Ch. 6]). Similarly one can make sense of thedistributional derivative *p:q for any integer : ě 1. We can then calculate

3A

3@“ �

2�@p@1{2/q3r�@p@

1{2/qs

3@

“ (I)´ (II)

where, abbreviating *p:q ” *p:qp@1{2/ ` p1´ @q1{2�q, we have

(I) “ �„

/

@1{2 �@p@1{2/q ¨ p�@q

1p@1{2/q

“ �

�@p@1{2/q ¨ p�@q

2p@1{2/q `´

p�@q1p@1{2/q

¯2

,

(II) “ �„

�@p@1{2/q

p1´ @q1{2

ˆ

��p�*2q

��*´p��*

1q��p�* 1q

p��*q2

˙

.

It follows by repeated applications of the inequality 201 ď 02 ` 12 thatˇ

ˇ

ˇ

ˇ

3A

3@

ˇ

ˇ

ˇ

ˇ

ď �ÿ

0ď:,?ď3�

ˆ

��r|�|:*p@1{2/ ` p1´ @q1{2�qs

��*p@1{2/ ` p1´ @q1{2�q

˙2?

for all 0 ď @ ď 1{2, where � is an absolute constant. It then follows from Lemma 3.3 thatˇ

ˇ

ˇ

ˇ

3A

3@

ˇ

ˇ

ˇ

ˇ

ď �ÿ

0ď:,?ď3�

´

p4@1{2|/|q: ` �1p*q¯2?

ď 21 ¨ �1p*q6

for all 0 ď @ ď 1{2, where 21 ě 1 is an absolute constant. �

Proof of Proposition 1.1. We seek a value @ P r0, 1{25s that satis�es the �xed-point equation (1.5), i.e., @ “ @p Ap@qq.�is is the same as a root @ P r0, 1{25s of the function

6p@q “@´1p@q

´ Ap@q . (3.10)

Note that @p0q “ 0, and it follows from Lemma 3.6 that @1p#q P r4{5, 1s for all # ď 1{20, so45# ď @p#q ď #

for all # ď 1{20. Consequently, if @p#q ď 1{25 then we must have # ď 1{20, that is to say,

sup"

p@q´1p@q : @ ď 125

*

ď120 .

It follows from Lemma 3.6 that p@´1q1p@q P r1, 5{4s for all @ ď 1{25. Combining with Lemma 3.7 gives1 ´ 21 ¨ �1p*q

6 ď36

3@ď

54 ` 21 ¨ �1p*q

6 ,

where 21 is the absolute constant from Lemma 3.7. It follows that as long as ď p*q as de�ned by (1.6), then forall 0 ď @ ď 1{25 we will have

12 ď

36

3@ď

2 .

At @ “ 0 we have 6p0q “ ´Ap0q, and it follows by Assumption 1 combined with Lemma 3.3 that´

��r�*p�qs¯2ď Ap0q “

ˆ

��*1p�q

��*p�q

˙2“

ˆ

��r�*p�qs

��*p�q

˙2ď �1p*q

2 .

Page 29: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 29

It follows that on the interval 0 ď @ ď 1{25, the function 6 has a unique root @, which must satisfyp��r�*p�qsq2

2 ď@

ď 2�1p*q

2

It follows from the earlier bound on # thatp��r�*p�qsq2

2 ď@

ď

#

ď

5@4 ď

5�1p*q2

2 ,

so this concludes the proof. �

Corollary 3.8. If the function* satis�es Assumptions 1 and 2, then for all 0 ă ď p*q we have

RSp ;*q

ěannp ;*q

´ 1.51 ¨ �1p*q

2 ělog 2

´ 1.53 ¨ �1p*q2 ,

where �1p*q is the constant from Lemma 3.3, and p*q is given by (1.6).

Proof. Let p@,#q be the solution from Proposition 1.1, and recall from (1.8) that

RSp ;*q ´ log 2 “ ´#p1´ @q

2 `�

"

log chp#1{2/q ` !@p@1{2/q

*

.

We herea�er abbreviate

ℓp@q ” �!@p@1{2/q “ �

log��*´

@1{2/ ` p1´ @q1{2�¯

.

Since chpGq ě 1 for all G P ℝ, we can lower bound

RSp ;*q ´ log 2 ě ´#

2 ` �!@p@1{2/q ě ´

#

2 `

"

ℓp0q ´ @ sup0ď@ď1{2

ˇ

ˇ

ˇ

ˇ

3ℓ

3@

ˇ

ˇ

ˇ

ˇ

*

.

Similarly as in the proof of Lemma 3.7, we can boundˇ

ˇ

ˇ

ˇ

3ℓ

3@

ˇ

ˇ

ˇ

ˇ

ď �ÿ

0ď:,?ď2�

„ˆ

��r|�|:*p@1{2/ ` p1´ @q1{2�qs

��*p@1{2/ ` p1´ @q1{2�q

˙?

ď 21 ¨ �1p*q2 (3.11)

for all 0 ď @ ď 1{2, where 21 is an absolute constant (and can be arranged to be the same as the 21 from Lemma 3.7).By combining the above bounds we conclude

RSp ;*q ´ plog 2` ℓp0qq

ě ´#

2 ´ @ sup0ď@ď1{2

ˇ

ˇ

ˇ

ˇ

3ℓ

3@

ˇ

ˇ

ˇ

ˇ

(3.11)ě ´

#

2 ´ @ ¨ 21 ¨ �1p*q2

(1.7)ě ´3�1p*q

12 ` 21 ¨ �1p*q

2

˙

(1.6)ě ´3�1p*q

12 `

1410�1p*q4 2p*q4

˙

ě ´1.51 ¨ �1p*q2 ,

where the last bound uses that we chose �1p*q ě 10 in the proof of Lemma 3.3. �en, recalling (1.10), we have

annp ;*q ´ log 2 “ ℓp0q “ ´ log 1�*p/q

(3.2)ě ´ log 21p*q ě ´ 21p*q ě ´

�1p*q2

50 ,

using that we also chose �1p*q ě 5 ¨ 21p*q ě 10 in the proof of Lemma 3.3. �e claim follows. �

Lemma 3.9. Suppose * satis�es Assumptions 1 and 2, and let *� “ * ˚ !� as in (1.23). �en, using the notation ofAssumption 2, we will have 2p*�q ď 4p 2q

1p*q for all � ď 1.

Proof. Let �, �1 be i.i.d. standard gaussian random variables. We need to bound the quantity��,�1rp� ´ �1q2*�pG ` 2�q*�pG ` 2�1qs

��,�1r*�pG ` 2�q*�pG ` 2�1qs”#�pG, 2q

��pG, 2q. (3.12)

Let �, �1 be independent copies of �, �1, and note that

#�pG, 2q “ ��,�1 ,�,�1

p� ´ �1q2*pG ` 2� ` ��q*pG ` 2�1 ` ��1q

.

Page 30: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

30 E. BOLTHAUSEN, S. NAKAJIMA, N. SUN, AND C. XU

Taking an orthogonal transformation of p�, �q gives another pair of i.i.d. standard gaussians,ˆ

-

.

˙

“1

p22 ` �2q1{2

ˆ

2 �´� 2

˙ˆ

��

˙

.

Likewise we like p- 1 , .1q be the pair obtained by the same transformation applied to p�1 , �1q. �en note that

p� ´ �1q2 “

ˆ

2p- ´ - 1q ´ �p. ´ .1q

p22 ` �2q1{2

˙2ď 2 ¨

22p- ´ - 1q2 ` �2p. ´ .1q2

22 ` �2 .

Rewriting #�pG, 2q in terms of the random variables -, - 1 , ., .1 gives

#�pG, 2q ď 2 ¨�-,-1 ,.,.1„ˆ

22p- ´ - 1q2 ` �2p. ´ .1q2

22 ` �2

˙

*pG ` p22 ` �2q1{2-q*pG ` p22 ` �2q1{2- 1q

“222

22 ` �2#0pG, p22 ` �2q1{2q `

4�2

22 ` �2�0pG, p22 ` �2q1{2q ,

where #0 and �0 are as in (3.12) but with* in place of*�. If 1{2 ď 2 ď 2 and � ď 1, then 1{2 ď p22`�2q1{2 ď 7{3,so Assumption 2 will give

#�pG, 2q ď

ˆ

222

22 ` �2 p 2q1p*q `

4�2

22 ` �2

˙

�0pG, p22 ` �2q1{2q ď 4p 2q

1p*q��pG, 2q .

�e claim follows. �

Proof of Proposition 1.9. Recall from Lemma 3.3 the constant �1p*q: it depends on the absolute constant 20, as wellas the constant 21p*q de�ned by (3.2). Let �, � be i.i.d. standard gaussians, and note

��*�p2�q “ ��,�*´

2� ` ��¯

“ ��*´

p22 ` �2q1{2�¯

.

From this it is clear that 21p*�q converges to 21p*q as � Ó 0. Next recall from Lemma 3.9 that if � ď 1 then we have 2p*�q ď 4p 2q

1p*q. Consequently, recalling (1.6) and (1.9), we have

p*�q(1.6)”

1410 ¨ 21 ¨ �1p*�q

6 ¨ 2p*�q4 ě

1416 ¨ 21 ¨ �1p*q6 ¨ p 2q1p*q4

(1.9)“ 1p*q . (3.13)

�is shows that for all 0 ă ď 1p*q, we also have ď p*�q for all � small enough, which means that the resultsof Proposition 1.1 apply for*� as well as for* . We see from the proof of Proposition 1.1 that the replica symmetric�xed point @� for *� is a root @� P r0, 1{25s of the function (cf. (3.10))

6�p@q “@´1p@q

´ A�p@q ,

where A� is de�ned as in (1.5) but with *� in place of * :

A�p@q “1

p1´ @q�

„ˆ

��r�*�p/ ` p1´ @q1{�qs

��*�p/ ` p1´ @q1{�q

˙2

.

It is clear that 6� converges uniformly to 6 over 0 ď @ ď 1{25, so @� converges to @, and consequently #� convergesto #. It is then straightforward to deduce from the formula (1.8) that RSp ;*�q converges to RSp ;*q as � Ó 0. �

3.3. Almeida–�ouless condition. Recall from De�nition 2.1 the state evolution recursions.

Lemma 3.10. Suppose * satis�es Assumption 1. �e recursions of De�nition 2.1 are well-de�ned: the recursions (2.5)lead to |�B | ď 1 and |�B | ď 1 for all B ě 1, and the recursions (2.6) leads to ΛB P r0, 1q and ΓB P r0, 1q for all B ě 0.

Proof. We abbreviate � ” �@ throughout this proof. We have �1 ” �1 and �1 ” �1 as in (2.4), and it follows that

0 ď p�1q2 “ p�1q

2 ď1@�

thp#1{2/q2ı

(1.5)“ 1 ,

Page 31: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 31

and likewise 0 ď p�1q2 “ p�1q

2 ď 1. �en for B ě 1 we have �B`1 and �B`1 de�ned by (2.5), and it follows by theCauchy–Schwarz inequality that

|�B`1| ď�rthp#1{2/q2s

@

(1.5)“ 1 , |�B`1| ď

�r�p@1{2/q2s

#(1.5)“ 1 .

�us |�B | ď 1 and |�B | ď 1 for all B ě 1, which con�rms that the recursions (2.5) are well-de�ned.It remains to verify that the quantitiesΛB´1 and ΓB´1 from (2.7) are strictly smaller than 1 for all B ě 1. �e claim

holds trivially in the base case B “ 0, since clearly Λ0 “ Γ0 “ 0. We therefore suppose inductively that we haveΛB´1 ă 1 and ΓB´1 ă 1. �is means that the quantities �B and �B are well-de�ned by the recursions (2.6). Denote"1 ” @1{2 and #1 ” p#{ q1{2. Next let .8 , -9 be a collection of i.i.d. standard gaussian random variables, and let

"8`1 ” thˆ

#1{2"

�1.1 ` . . . ` �8´1.8´1 ` p1´ Γ8´1q1{2.8

#9`1 ” �

ˆ

@1{2"

�1-1 ` . . . ` � 9´1-9´1 ` p1´Λ8´1q1{2-9

(cf. (A.29) and (A.30)). �is gives well-de�ned random variables ": , #: for all 1 ď : ď B ` 1, with �rp":q2s “ @

and �rp#:q2s “ #{ . If 2 ď : ă ℓ ď B ` 1, then

�p":"ℓ q

@“ �

ˆ

p�1q2 ` . . . ` p�:´2q

2 ` �:´1p1´ Γ:´2q1{2˙

(2.6)“ �p�:´1q

(2.5)“ �: , (3.14)

�p#:#ℓ q

#{ “ �

ˆ

p�1q2 ` . . . ` p�:´2q

2 ` �:´1p1´Λ:´2q1{2˙

(2.6)“ �p�:´1q

(2.5)“ �: . (3.15)

(cf. (2.12) and (2.13)). Now let '8 , �8 (8 ě 1) be the Gram–Schmidt orthogonalization of the random variables "8 , #8 :

'8`1 “1A8`1

"

"8`1 ´ÿ

9ď8

�p"8`1' 9q' 9

*

, (3.16)

�8`1 “128`1

"

#8`1 ´ÿ

9ď8

�p#8`1� 9q� 9

*

(3.17)

where A8`1 and 28`1 are the normalizing constants such that�rp'8`1q2s “ 1 and�rp�8`1q

2s “ 1. To see that AB`1 isa well-de�ned positive number, we apply the inductive hypothesis ΓB´1 ă 1: then follows from the above de�nition(together with the fact that th is a non-constant function) that "B`1 depends non-trivially on.B . On the other hand,the random variables ' 9 for 9 ď B can depend only on .1 , . . . , .B´1. It follows that the random variable

"B`1 ´ÿ

9ďB

�p"B`1' 9q' 9

has strictly positive variance, so AB`1 is well-de�ned and positive. Likewise, using the inductive hypothesisΛB´1 ă 1together with the fact that � is non-constant, we deduce that 2B`1 is also well-de�ned and positive. Next, since wesee from above that the quantities�p":"ℓ q and�p#:#ℓ q depend only on mint:, ℓu, it follows that there is a value; 9 such that �p"8`1' 9q “ @1{2; 9 for all 8 ě 9, and likewise there is a value H 9 such that �p#8`1� 9q “ p#{ q1{2H 9for all 8 ě 9. As in (2.7), let us abbreviate

!8 ”ÿ

9ď8

p; 9q2 , .8 ”

ÿ

9ď8

pH 9q2 .

It follows by the above calculations that

;8`1 “�p"8`2'8`1q

@1{2(3.16)“

@1{2

A8`1

"

�p"8`2"8`1q

@´ÿ

9ď8

p; 9q2*

(3.14)“

�8`1 ´ !8

p1´ !8q1{2,

H8`1 “�p#8`2�8`1q

p#{ q1{2(3.17)“

p#{ q1{2

A8`1

"

�p#8`2#8`1q

#{ ´ÿ

9ď8

pH 9q2*

(3.15)“

�8`1 ´ .8

p1´ .8q1{2.

Page 32: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

32 E. BOLTHAUSEN, S. NAKAJIMA, N. SUN, AND C. XU

Recalling that A8`1 and 28`1 are positive for all 8 ď B, we deduce

0 ăA8`1

@1{2 “1@1{2�

„ˆ

"8`1 ´ÿ

9ď8

�p"8`1' 9q' 9

˙21{2“ p1´ !8q1{2 ,

and similarly 0 ă p1 ´ .8q1{2, which implies !8 , .8 P r0, 1q. We see moreover that the sequences ;8 , <8 satisfy thesame recursions (2.6) as the sequences �8 , �8 , which implies ;8 “ �8 and <8 “ �8 for all 8 ě 1. �is proves thatΛ8 “ !8 and Γ8 “ .8 both lie in r0, 1q for all 8 ě 1. �erefore the recursions (2.6) give well-de�ned quantities �B and�B for all B ě 1, as desired. �

Lemma 3.11 (Almeida–�ouless condition). Suppose* satis�es Assumption 1, and moreover that

ATp ;*q ” ¨

"

´

p�@q1p@1{2/q2

¯

*"

´

th1p#1{2/q2¯

*

ď 1 . (3.18)

In this case, the recursions of De�nition 2.1 lead to ΓB Ñ 1 and ΛB Ñ 1 as B Ñ8.

Proof. We begin with a general observation. Let /, �, �1 be i.i.d. standard gaussians. Suppose 5 : ℝ Ñ ℝ is anyfunction with at most polynomial growth, and consider the function

A 5 pCq ” �

C1{2/ ` p1´ Cq1{2�¯

C1{2/ ` p1´ Cq1{2�1¯

,

which is de�ned for 0 ď C ď 1. Write /pCq ” C1{2/ ` p1´ Cq1{2�, and note that

A 5 pCq “ �

„ˆ

�� 5´

C1{2/ ` p1´ Cq1{2�¯

˙2

“ �

´

�� 5 p/pCqq¯2

ě 0 .

Next we di�erentiate with respect to C and apply gaussian integration by parts to obtain

pA 5 q1pCq “ �

"

´

�� 5 p/pCqq¯

��

5 1p/pCqq

ˆ

/

C1{2´

p1´ Cq1{2

˙*

“ �

´

�� 51p/pCqq

¯2

“ A 5 1pCq ě 0 .

It follows moreover that pA 5 q2pCq “ pA 5 1q1pCq “ A 5 2pCq ě 0 for all 0 ď C ď 1, so A 5 is convex.Now, returning to the state evolution recursions from De�nition 2.1, we will consider A( and A) for

(pGq ”

ˆ

#

˙1{2�p@1{2Gq , )pGq ”

ˆ

1@

˙1{2thp#1{2Gq .

Denote A() ” A( ˝ A) . Note that the �xed point equation (1.5) implies A(p1q “ 1 and A)p1q “ 1, so A()p1q “ 1. Wealso have from (2.4) that A(p0q “ �1, while A)p0q “ �1 “ 0; so if �1 “ 0 then A()p0q “ 0. However, the condition(3.18) is equivalent to pA()q1p1q ă 1, which implies pA()q1pCq ă 1 for all C P r0, 1s, and consequently

1´ A()p0q “ A()p1q ´ A()p0q “ż 1

0pA()q

1pCq 3C ď pA()q1p1q ă 1 .

�is shows that if pA()q1p1q ă 1 then we must have A()p0q “ A(p0q “ p�1q2 ą 0.

Next we argue that �2 ‰ 0. To this end, for the function ) we can directly calculate that for all 0 ď C ď 1,

pA)q1pCq “ A)1pCq ě A)1p0q “

´

�)1p/q¯2“

#

@

´

� th1p#1{2/q¯2 (1.5)“

#p1´ @q2

@ą 0 ,

so A)pCq is strictly increasing. �us, in the case �1 ą 0 we obtain

�2(2.5)“ �p�1q “ A)

´

p�1q1{2¯

ą A)p0q “ 0 .

Since ) is an odd function, in the case �1 ă 0 we obtain

�2(2.5)“ �p�1q “ ´A)

´

p´�1q1{2¯

ă ´A)p0q “ 0 .

In both cases we obtain �2 ‰ 0 as claimed.

Page 33: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 33

To conclude, note that Lemma 3.10 implies that ΛB Ò Λ8 ď 1 and ΓB Ò Γ8 ď 1 as B Ñ 8. If pA()q1p1q ă 1, theabove considerations give pΓ8q2 ě p�1q

2 “ p�1q2 ą 0, as well as

pΛ8q2 ě p�2q

2 (2.6)“

ˆ

�2 ´Λ1

p1´Λ1q1{2

˙2(2.4)“ p�2q

2 ą 0 .

Clearly we must also have �B Ñ 0 and �B Ñ 0 as B Ñ8, soˆ

�B�B

˙

(2.6)“

ˆ

ΛB´1 ` �Bp1´ΛB´1q1{2

ΓB´1 ` �Bp1´ ΓB´1q1{2

˙

BÑ8ÝÑ

ˆ

Λ8

Γ8

˙

ą

ˆ

00

˙

.

�us, for B large enough, we can express

�B`1(2.5)“ �p�Bq

(2.5)“ �p�p�B´1q “ A()

´

p�B´1q1{2¯

which shows that Γ8 must be a �xed point of A() , and Λ8 “ A)pΓ8q. Since we saw above that A()pCq is convex onthe interval 0 ď C ď 1, if pA()q1p1q ď 1 then the only �xed point of A()pCq on the interval 0 ď C ď 1 occurs at C “ 1,and thus we obtain Λ8 “ Γ8 “ 1. �

3.4. Logconcavity. In this subsection we review the proof of Proposition 1.4 which follows from well-known resultson logconcave measures. We then state and prove Lemmas 3.14 and 3.15, which give some further consequences ofAssumption 2. We also present the proof of Proposition 2.5.

�eorem 3.12 ([Mau91]). Suppose * satis�es Assumption 1 and is logconcave. Recall that ! denotes the standardgaussian density on ℝ, and let � be the probability measure on ℝ whose density (with respect to Lebesgue measure) is

3�

3I“*pIq!pIq

��*p�q.

�en for any measurable subset � Ď ℝ we have the concentration boundż

expˆ

3pI, �q2

4

˙

3�pIq ď1

�p�q,

where 3pI, �q denotes the minimum distance from I to �.

�eorem 3.12 is obtained as a consequence of the Prekopa–Leindler inequality (or functional Brunn–Minkowskiinequality) [Pre71, Pre73, Lei72] from convex geometry; see also [BL00] and [Tal11a, �m. 3.1.4]. In this paper weuse �eorem 3.12 only in the proof of Proposition 1.4, which is not needed for the main result �eorem 1.2. See §1.2.2for a discussion of results on the positive spherical perceptron which use convex geometry in more essential ways.By well-known arguments, �eorem 3.12 can be used to deduce the following:

�eorem 3.13 (see e.g. [Tal11a, �m. 3.1.4]). In the same se�ing as �eorem 3.12, if 5 : ℝÑ ℝ is Lipschitz, thenż

p 5 pHq ´ 5 pIqq2:

p16:q:3�pHq 3�pIq ď

ż

exp"

p 5 pHq ´ 5 pIqq2

16

*

3�pHq 3�pIq ď 4

for any integer : ě 1.

Note that the concentration bounds from �eorems 3.12 and 3.13 rely on the strong logconcavity of the gaussiandensity !pGq, and the bounds hold uniformly over all logconcave functions * . As a consequence we obtain:

Proof of Proposition 1.4. Suppose* satis�es Assumption 1. If* is bounded away from zero or compactly supported,then Assumption 2 holds by trivial calculations. In the case that * is logconcave, Assumption 2 follows from theabove result �eorem 3.13. �

Lemma 3.14. If* satis�es Assumption 1 and 2, then the function �@ of (1.4) satis�es

}p�@q1}8 ď

11´ @

ˆ

2p*q

2 ` 1˙

.

�erefore �@ is Lipschitz for any @ P r0, 1q.

Page 34: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

34 E. BOLTHAUSEN, S. NAKAJIMA, N. SUN, AND C. XU

Proof. From (1.4) and (3.9) we calculate

p�@q1pGq “

��*2pG ` p1´ @q1{2�q

��*pG ` p1´ @q1{2�q´

ˆ

��*1pG ` p1´ @q1{2�q

��*pG ` p1´ @q1{2�q

˙2. (3.19)

Applying gaussian integration by parts gives

p�@q1pGq “

11´ @

"

��rp�2 ´ 1q*pG ` p1´ @q1{2�qs

��*pG ` p1´ @q1{2�q´

ˆ

��r�*pG ` p1´ @q1{2�qs

��*pG ` p1´ @q1{2�q

˙*

“1

1´ @

"

12��,�1rp� ´ �1q2*pG ` 2�q*pG ` 2�1qs

��,�1r*pG ` 2�q*pG ` 2�1qs´ 1

*

.

�e result follows from Assumption 2. �

Proof of Proposition 2.5. In view of Lemma 3.11, it su�ces to check that the condition (3.18) holds for 0 ă ď p*q.By Lemma 3.14 and the fact that th1pGq P p0, 1q for all G P ℝ, we can bound

ATp ;*q ď p1´ @q2

ˆ

2p*q

2 ` 1˙2ďp3{2q2 ¨ 2p*q

2 ¨

p1´ @q2(1.7)ď 3 ¨ 2p*q

2 ¨

(1.6)ď

3410�1p*q6 2p*q2

ă 1 ,

having used that �1p*q ě 10 and 2p*q ě 1. �

Lemma 3.15. Suppose* satis�es Assumption 1 and 2. Letℱ1pCq be as in (2.3). �en

max"

}hpℓq}8 , }npℓq}8 , }HpBq}8 , }mpBq}8 : B ď C , ℓ ď C ´ 1*

ď #0.01

with probability 1´ >#p1q.

Proof. Note that Lemma 2.3 implies, for all 1 ď ℓ ď C ´ 1 and all 1 ď B ď C,

lim#Ñ8

1#

ÿ

8ď#

ˆ

pHpBqq8#1{2

˙101“ lim

#Ñ8

1"

ÿ

0ď"

ˆ

phpℓqq0@1{2

˙101“ �p/101q ,

where the convergence holds in probability. It follows that the event

”"

max"

1#

ÿ

8ď#

ˆ

pHpBqq8#1{2

˙101,

1"

ÿ

0ď"

ˆ

phpℓqq0@1{2

˙101: B ď C , ℓ ď C ´ 1

*

ď 2�p/101q

*

occurs with probability 1´ >#p1q. We claim that implies the desired bounds. Indeed, clearly implies

max"

}mpBq}8 : B ď C

*

ď max"

}HpBq}8 : B ď C

*

ď #1{2´

2#�p/101q¯1{101

ď #1{100 .

In the above, the �rst inequality uses that mpBq “ thpHpBqq and | thpGq| ď |G|; and the last bound holds for # largeenough (depending on #). Similarly, implies

max"

}hpℓq}8 : ℓ ď C ´ 1*

ď @1{2´

# �p/101q¯1{101

ď #1{100 ,

where the last bound holds for # large enough (depending on , @). Finally, it follows using Lemma 3.3 thatˇ

ˇ

ˇpnpℓqq0ˇ

ˇ

ˇ “

ˇ

ˇ

ˇ�pphpℓqq0qˇ

ˇ

ˇ

(1.4)“

ˇ

ˇ

ˇ

ˇ

1p1´ @q1{2

��r�*pphpℓqq0 ` p1´ @q1{2�qs

��*pphpℓqq0 ` p1´ @q1{2�q

ˇ

ˇ

ˇ

ˇ

ď�1p*q ` 4|phpℓqq0 |

p1´ @q1{2,

and combining with the previous bound on }hpℓq}8 gives

max"

}npℓq}8 : ℓ ď C ´ 1*

“ max"

}�phpℓqq}8 : ℓ ď C ´ 1*

ď�1p*q ` 4@1{2p# �p/101qq1{101

p1´ @q1{2ď #1{100 ,

Page 35: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 35

where the last bound holds for # large enough. �is proves the claim. �

4. Analysis of first moment

In this section we �nish analyzing the conditional �rst moment bound (�eorem 2.11) obtained in Section 2. �isleads to the proof of �eorem 1.5, our main result on the conditional �rst moment. From this we can deduce theupper bound in �eorem 1.2, as presented at the end of this section. For the reader’s convenience, we begin byreviewing some important notations. Recall from (2.26) that

�˚ ” @1{2�t 4C “ @1{2

¨

˚

˚

˚

˝

�1...

�C´1p1´ΛC´1q

1{2q

˛

P ℝC . (4.1)

Recall also from (2.27) that we de�ned

+˚ ” p1´ @q#1{2�t 4C´1 ” p1´ @q#1{2

¨

˚

˚

˚

˝

�1...

�C´2p1´ ΓC´2q

1{2q

˛

P ℝC´1 . (4.2)

Given � P ℝC with }�}2 ď 1, we denote 2p�q ” p1 ´ }�}2q1{2. Next, as in the statement of �eorem 2.11, given aparameter & P ℝ (see (4.6) below), we let

^p�, +q ” xrCst�˚ `"

xrCstp�´ �˚q ` #1{2 &crC ´ 1stp+ ´ +˚q

*

P ℝ" . (4.3)

We then recall the function ! from (1.3), and use it to de�ne

ℒp�, +q ” 1#

ÿ

0ď"

!}�}2p^0p�, +qq “1#

ÿ

0ď"

log��*p_0p�, +qq , (4.4)

�e bound in �eorem 2.11 is expressed in terms of the function

Ψp�, +q ”}+ ´ &p+ ´ +˚q}2

22p�q2 ´p+˚ , +q

1´ @` ℒp�, +q . (4.5)

Recall (2.30) that we decomposed `pM1q “ `˝pM1q ` `‚pM1q. �e rest of this section is organized as follows:‚ In §4.1 we use Lemmas 2.21 and 2.22 to prove Corollary 4.1, which gives a bound on `‚pM1q. �is takes care

of the case p�, +q R T˝ (see (2.28)), so in the rest of the section we restrict to p�, +q P T˝.‚ In §4.2 we prove Lemmas 4.2 and 4.3, which show that the point p�˚ , +˚q, as de�ned by (4.1) and (4.2), is

approximately a stationary point of the function Ψ of (4.5).‚ In §4.3 we prove Proposition 4.4, which bounds HessΨ for p�, +q P T˝.‚ In §4.4 we combine the results described above to conclude the proof of �eorem 1.5. We then use this to

conclude the proof of the upper bound in �eorem 1.2.Lastly, we now �x the parameter

& “ 45�1p*q 1{2 (1.6)

ď1

�1p*q2 2p*q. (4.6)

However, this choice of & will not become important until Lemma 4.10 below.

4.1. Azuma–Hoe�ding bounds.

Corollary 4.1. If* satisifes Assumptions 1 and 2, then with high probability we have

´

`‚pM1qˇ

ˇ

ˇℱ1pCq

¯

ď exp"

RSp ;*q ´ �1p*q2 ¯

*

for `‚pM1q as de�ned by (2.30).

Page 36: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

36 E. BOLTHAUSEN, S. NAKAJIMA, N. SUN, AND C. XU

Proof. Recall from Corollary 3.8 that for ď p*q we haveRSp ;*q ´ log 2

ě ´1.53 ¨ �1p*q

2 ,

where �1p*q ě 10 is the constant from Lemma 3.3. Recalling (2.30), we will �rst bound the case where }+p�q´+˚}is large. To this end, denote

`3pM1q ”ÿ

1"

3 ď }+p�q ´ +˚} ď 23*

Qp�qS�pM1q .

�anks to Assumption 1, in Proposition 2.12 we also have the trivial bound �pS�pM1q |ℱ1pCqq ď 1. Substituting thisinto the calculation (2.53) from the proof of �eorem 2.11 gives

�p`3pM1q |ℱ1pCqq

exptp1, logp2 chpHpCqqqquď

ÿ

�:3ď}+p�q´+˚}ď23Qp�q exp

"

´#p+˚ , +p�qq

1´ @

*

.

By Lemma 2.3 combined with Jensen’s inequality, we have

lim#Ñ8

p1, log chpHpCqqq#

“ � log chp#1{2/q ď log� chp#1{2/q

“ log� expp#1{2/q “#

2(1.7)ď

3�1p*q2

2 , (4.7)

where the convergence holds in probability as # Ñ8. It follows that, with high probability,exptp1, logp2 chpHpCqqqqu

expt#RSp ;*qu ď exp"

#

ˆ

1.53` 1.51˙

�1p*q2

*

ď exp"

3.05 ¨ #�1p*q2

*

.

Next, it follows from (2.8) and (2.27) that

}+˚}(2.27)“ #1{2p1´ @q}�t 4C´1}

(2.8)“ #1{2p1´ @q

ˆ

ÿ

ℓďC´2p�ℓ q

2 ` 1´ ΓC´2

˙1{2“ #1{2p1´ @q .

Note also that if 3 ď }+p�q ´ +˚} ď 23 then

´p+˚ , +p�qq

1´ @“ ´

}+˚}2 ` p+˚ , +p�q ´ +˚q

1´ @ď}+˚}}+p�q ´ +˚}

1´ @ď

23}+˚}1´ @

“ 2#1{23 .

Combining the above bounds gives, with high probability,�p`3pM1q |ℱ1pCqq

expt#RSp ;*qu ď exp"

#

ˆ

3.05 ¨ �1p*q2 ` 2#1{23

˙*

3 ď›

›+p�q ´ +˚›

› ď 23˙

ď exp"

´ #

ˆ

32

2.01 ´ 2#1{23 ´ 3.05 ¨ �1p*q2 ` >#p1q

˙*

,

where the last inequality is by Lemma 2.22. If we take 3 ě 30 ” 8 ¨ �1p*q 1{2, then we obtain�p`3pM1q |ℱ1pCqq

expt#RSp ;*qu ď exp"

´ #�1p*q2

ˆ

82

2.01 ´ 2 ¨ 31{2 ¨ 8´ 3.05´ >#p1q˙*

ď exp"

´ 1.1 ¨ #�1p*q2

*

.

�is concludes our analysis of the case where }+p�q ´ +˚} is large, so we next turn to the case that }�p�q ´ �˚} islarge. To this end, let us denote

`1pM1q ”ÿ

1"

}+p�q ´ +˚}

�1p*q 1{2 ď 8,}�p�q ´ �˚p�q}

�1p*q 1{2 ě 16*

S�pM1q .

Page 37: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 37

It follows from the previous bounds that�p`1pM1q |ℱ1pCqq

expt#RSp ;*qu ď exp"

#

ˆ

3.05 ¨ �1p*q2 ` 2#1{2 ¨ 8 ¨ �1p*q

1{2˙*

}�p�q ´ �˚p�q}

�1p*q 1{2 ě 16˙

ď exp"

#�1p*q2

ˆ

3.05` 2 ¨ 31{2 ¨ 8´ 162

8.01 ` >#p1q˙*

ď exp"

´ 1.2 ¨ #�1p*q2

*

,

where the second-to-last inequality is by Lemma 2.21. Recalling the de�nition (2.30) of `˝pM1q, we have

`‚pM1q ď `1pM1q `ÿ

:ě0`2:30pM

1q ,

where 30 “ 8 ¨ �1p*q 1{2 as above. It follows by combining the above bounds that�p`‚pM1q |ℱ1pCqq

expt#RSp ;*qu ď exp"

´ #�1p*q2

*

with high probability, which proves the claim. �

4.2. Stationarity at replica symmetric value. In this subsection we show that the functionΨp�, +q from (4.5) isapproximately stationary at the point p�˚ , +˚q.

Lemma 4.2. Suppose* satis�es Assumption 1 and 2. �en for all 1 ď B ď C we haveBΨ

B�Bp�˚ , +˚q » 0 ,

where » indicates convergence in probability as # Ñ8.

Proof. Recalling (1.4), (3.9), and (3.19), we can rewrite

p�@q1pGq “

��*2pG ` p1´ @q1{2�q

��*pG ` p1´ @q1{2�q´ p�@pGqq

2 . (4.8)

Recall from (4.3) the de�nition of _ ” _p�, +q. We then calculateBℒB�B

(4.4)“

1#

ÿ

0ď"

"

B^0

B�B

��*1p_0q

��*p_0q`B2

B�B

��r�* 1p_0qs��*p_0q

*

(3.9)“

1#

ÿ

0ď"

"

pxpBqq0�}�}2p^0q ´ �B��*

2p_0q��*p_0q

*

(4.8)“

1#

"

pxpBq , �}�}2p^0qq ´ �Bp1, p�}�}2q1p^qq ´ �B}�}�}2p^q}2*

It follows from (4.1) that }�˚}2 “ @, and 2˚ ” 2p�˚q “ p1´ @q1{2. We also note that

hpC`1q “ hrCst 4C(2.11)“ @1{2xrCst�t 4C

(4.1)“ xrCst�˚

(4.3)“ ^p�˚ , +˚q ” ^˚ .

It follows using (2.11) and Lemma 2.3 that at p�˚ , +˚q we havepxpBq , �}�˚}2p^˚qq

#» ΛC ,B�/�@p@

1{2/q(4.1)“ �˚,B �p�@q

1p@1{2/q ,

having again used gaussian integration by parts at the last step. As a consequenceBℒB�B

p�˚ , +˚q » ´�˚,B �”

�@p@1{2/q2

ı

(1.5)“ ´�˚,B# .

Substituting this into (4.5) givesBΨ

B�Bp�˚ , +˚q »

}+˚}2�˚,Bp1´ }�˚}2q2

´ �˚,B#(4.2)“ 0 ,

as claimed. �

Page 38: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

38 E. BOLTHAUSEN, S. NAKAJIMA, N. SUN, AND C. XU

Lemma 4.3. Suppose* satis�es Assumption 1 and 2. �en for all 1 ď ℓ ď C ´ 2 we haveBΨ

B+ℓp�˚ , +˚q » 0 ,

where » indicates convergence in probability as # Ñ8. For ℓ “ C ´ 1 we have

B+ℓp�˚ , +˚q » &#1{2

ˆ

�C´1 ´ p1´ ΓC´2q1{2˙

,

where the right-hand side is >Cp1q.

Proof. Similarly to the proof of Lemma 4.2, we calculate

BℒB+ℓ

(4.4)“

1#

ÿ

0ď"

B^0

B+ℓ

��*1p_0q

��*p_0q“

&pcpℓq , �}�}2p^qq

#1{2 .

It follows by recalling Lemma 2.3 that

BℒB+ℓ

p�˚ , +˚q “&pcpℓq , npC`1qq

#1{2 » &#1{2�ℓ .

Substituting this into (4.5) givesBΨ

B+ℓp�˚ , +˚q » &

"

´+˚,ℓ1´ @

` #1{2�ℓ

*

,

and combining with (4.2) gives the claim. �

4.3. Hessian calculation. In this subsection we analyze the Hessian of the function Ψ from (4.5) to prove:

Proposition 4.4. If* satis�es Assumptions 1 and 2, then the function ℒ of (4.4) satis�es

HessΨp�, +q “ˆ

Ψ�,� Ψ�,+

Ψ�,+ Ψ+,+

˙ˇ

ˇ

ˇ

ˇ

p�,+qď

ˆ

47�1p*q2 2p*q � 00 1´ 1.9&

˙

for all p�, +q P T˝ (as de�ned by (2.28)), for 0 ď ď p*q as de�ned by (1.6), and & “ &p ;*q as in (1.6).

�e proof of Proposition 4.4 is given at the end of this subsection. We divide the analysis into several steps. De�ne

�2pGq “��*

2pG ` 2�q

��*pG ` 2�q´

ˆ

��*1pG ` 2�q

��*pG ` 2�q

˙2“ p�1´22q1pGq (4.9)

�2pGq ”��r�*2pG ` 2�qs

��*pG ` 2�q´��r�* 1pG ` 2�qs

��*pG ` 2�q

��*1pG ` 2�q

��*pG ` 2�q(4.10)

De�ne the "-dimensional vectors A ” �2p�qp^q and B ” �2p�qp^q. Next let

02pGq ”��r�* 1pG ` 2�qs

��*pG ` 2�q, (4.11)

12pGq ”��r�2*2pG ` 2�qs

��*pG ` 2�q´

ˆ

��r�* 1pG ` 2�qs

��*pG ` 2�q

˙2, (4.12)

and de�ne the scalars 0 ” p1, 02p�qp^qq and 1 ” p1, 12p�qp^qq.

Lemma 4.5. For the function ℒ de�ned by (4.4) we have

ℒ�,� “1#

"

xrCspdiagAqxrCst `ˆ

xrCsBp∇2qt ` p∇2qpxrCsBqt˙

` 0 ¨ Hess 2 ` 1 ¨ p∇2qp∇2qt*

, (4.13)

ℒ�,+ “&

#1{2

"

xrCspdiagAqcrC ´ 1st ` p∇2qcrC ´ 1sB*

, (4.14)

ℒ+,+ “ &2"

crC ´ 1spdiagAqcrC ´ 1st*

(4.15)

for A, B, 0, and 1 as de�ned above.

Page 39: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 39

Proof. Note that _ is linear in +, with �rst derivativeB_0B+ℓ

“B^0

B+ℓ“ #1{2 &pcpℓqq0 .

It follows by di�erentiating (4.4) twice that

Bℒ2

B+:B+ℓ“

1#

ÿ

0ď"

"

��r*2p_0q

B_0B+:

B_0B+ℓs

��*p_0q´

ˆ

��r*1p_0q

B_0B+:s

��*p_0q

˙ˆ

��r*1p_0q

B_0B+ℓs

��*p_0q

˙*

“1#

ÿ

0ď"

A0B^0

B+:

B^0

B+ℓ“ &2

"

crC ´ 1spdiagAqcrC ´ 1st*

:,ℓ

,

which veri�es (4.15). On the other hand we note that _ depends on � both through ^ and through 2p�q, andB_0B�B

“B^0

B�B`B2

B�B� .

We use this to calculate the mixed partialBℒ2

B�BB+ℓ“

1#

"

ÿ

0ď"

A0B^0

B�B

B^0

B+ℓ`B2

B�B

ÿ

0ď"

B0B^0

B+ℓ

*

“&

#1{2

ˆ

xrCspdiagAqcrC ´ 1st ` p∇2qcrC ´ 1sB˙

B,ℓ

,

which veri�es (4.14). Finally, a similar calculation givesBℒ2

B�AB�B“

1#

"

ÿ

0ď"

A0B^0

B�B

B^0

B+ℓ`B2

B�B

ÿ

0ď"

B0B^0

B+ℓ` 0

B22

B�AB�B` 1

B2

B�A

B2

B�B

*

,

which implies (4.13). �

We now proceed to bound the quantities de�ned above.

Lemma 4.6. Suppose* satis�es Assumptions 1 and 2. With the notation from (4.10), we have}B}"1{2 “

}�2p�qp^q}

"1{2 ď 2p*q

ˆ

2.5 ¨ �1p*q ` 5.8 ¨}^}

"1{2

˙

.

for all 0.95 ď 2 ď 1.

Proof. Recalling the notation of De�nition 3.1, we �rst use gaussian integration by parts to rewrite (4.10) as

�2pGq “122

"

CovG,2p/2 , /q ´ 2 ¨�G,2p/q*

.

It follows by combining Lemmas 3.3 and 3.4 that for all 0.95 ď 2 ď 1,

|�2pGq| ď122

"

�1p*q `1.82 ¨ |G|

0.95

˙

` 2p*q

21{2

ˆ

1.82 ¨ |G|0.95 ` �1p*q

1{2˙*

.

Recall also that we assumed (without loss) �1p*q ě 10 and 2p*q ě 1. �erefore

|�2pGq| ď 2p*q

0.952

2` 1p2 ¨ 10q1{2

˙

�1p*q `

ˆ

2` 121{2

˙

1.82 ¨ |G|0.95

*

ď 2p*q

ˆ

2.5 ¨ �1p*q ` 5.8 ¨ |G|˙

.

where the last bound again uses that �1p*q ě 10. �e claim follows. �

Lemma 4.7. Suppose* satis�es Assumption 1. With the notation from (4.11), we have|0|

"“|p1, 02p�qp^qq|

"ď 1.1 ¨ �1p*q ` 3.7 ¨

}^}2

"

for all 0.95 ď 2 ď 1.

Page 40: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

40 E. BOLTHAUSEN, S. NAKAJIMA, N. SUN, AND C. XU

Proof. We use gaussian integration by parts to rewrite (4.11) as

02pGq “��r�2*pG ` 2�qs

��*pG ` 2�q´ 1 .

It follows by Lemma 3.3 (which uses only Assumption 1) that for all 0.95 ď 2 ď 1,

|02pGq| ď 1`ˆ

�1p*q `

ˆ

1.82 ¨ G0.95

˙2˙

ď 1.1 ¨ �1p*q ` 3.7 ¨ G2 ,

where the last bound uses that we took �1p*q ě 10. �

Lemma 4.8. Suppose* satis�es Assumptions 1 and 2. With the notation from (4.12), we have|1|

"“|p1, 12p�qp^qq|

"ď 2p*q

ˆ

4.6 ¨ �1p*q ` 17 ¨}^}2

"

˙

for all 0.95 ď 2 ď 1.

Proof. We use gaussian integration by parts to rewrite (4.12) as

12pGq “122

"

��rp�4 ´ 5�2 ` 2q*pG ` 2�qs

��*pG ` 2�q´

ˆ

��r�2*pG ` 2�qs

��*pG ` 2�q´ 1

˙2*

“122

"

VarG,2p/2q ´ 3 ¨�G,2p/2q ` 1*

.

It follows by combining Lemmas 3.3 and 3.4 that for all 0.95 ď 2 ď 1,

|12pGq| ď122

"

2p*q

1.82 ¨ G2

˙2` �1p*q

*

` 3ˆ

�1p*q `

ˆ

1.82 ¨ G0.95

˙2˙

` 1*

ď 2p*q

0.952

4` 110

˙

�1p*q ` 4 ¨ˆ

1.82 ¨ G0.95

˙2*

,

where the last bound uses that we took �1p*q ě 10 and 2p*q ě 1. �e claim follows. �

Corollary 4.9. If* satis�es Assumptions 1 and 2, then the function ℒ of (4.4) satis�es

Hessℒp�, +q “ˆ

ℒ�,� ℒ�,+

ℒ�,+ ℒ+,+

˙ˇ

ˇ

ˇ

ˇ

p�,+qď 2p*q

ˆ

�1p*q2 � 0

0 5 ¨ &2�

˙

for all p�, +q P T˝ (as de�ned by (2.28)), for 0 ď ď p*q as de�ned by (1.6), and & “ &p ;*q as in (1.6).

Proof. We will bound each of the terms computed in Lemma 4.5. Let D denote any vector inℝC and let E denote anyvector in ℝC´1. It follows from Lemma 2.3 that, with high probability,

sup"

}xrCstD}2

"“

1"

ÿ

BďC

DBxpBq›

2: D P ℝC´1 , }D} “ 1

*

ď 2 . (4.16)

Next, it follows from (4.1) that }�˚} “ @1{2, so the restriction p�, +q P T˝ (see (2.28)) implies}∇2}

2 ď }�} ď @1{2 ` 16 ¨ �1p*q 1{2 (1.7)

ď 18 ¨ �1p*q 1{2 (1.6)

ď18

45 ¨ �1p*q2 2p*q2ď

146 ¨ 2p*q2

, (4.17)

where the last bound uses that we assumed (without loss) �1p*q ě 10 and 2p*q ě 1. �erefore we certainly have2p�q “ p1´ }�}2q1{2 ě 0.95. It follows from (4.9) and Lemma 3.14 (which uses Assumption 2) that

}A}8 ď }�2p�q}8 “ }p�}�}q1}8 ď1

2p�q2

"

2p*q

2 ` 1*

ď 2p*q{2` 1

0.952 ď 1.7 ¨ 2p*q . (4.18)

Page 41: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 41

It follows that, with high probability, it holds for all unit vectors D, E that

Dtˆ

1#xrCspdiagAqxrCst

˙

D ď}A}8#

¨

›xrCstD›

2ď 3.4 ¨ 2p*q (4.19)

Dtˆ

1#1{2 xrCspdiagAqcrC ´ 1st

˙

E ď}A}8#1{2 ¨

›xrCstD›

› ¨

›crC ´ 1stE›

› ď 2.5 ¨ 2p*q 1{2 , (4.20)

Etˆ

crC ´ 1spdiagAqcrC ´ 1st˙

E ď }A}8 ¨›

›crC ´ 1stE›

2ď 1.7 ¨ 2p*q . (4.21)

Next, recalling (4.3), for all p�, +q P T˝ we have}^p�, +q}

"1{2

(4.16)ď 21{2}�} `

&}+ ´ +˚}

1{2

(2.28)ď 21{2}�} ` 16 ¨ &�1p*q

(4.17)ď 0.33 ¨ �1p*q , (4.22)

having used that �1p*q ě 10 and |&| ď 1{50. Combining (4.22) with Lemma 4.6 gives}B}"1{2 ď 2p*q

ˆ

2.5 ¨ �1p*q ` 5.8 ¨}^}

"1{2

˙

ď 4.5 ¨ �1p*q 2p*q .

Combining the above with (4.16) and (4.17) gives that with high probability, for all unit vectors D, E we have

Dtˆ

1#xrCsBp∇2qt

˙

D ď}xrCstD}}B}}∇2}

#ďp2"q1{2

#

ˆ

4.5 ¨ �1p*q 2p*q"1{2˙ˆ

36 ¨ �1p*q 1{2˙

(1.6)ď

21{2 ¨ 4.5 ¨ 3645 ¨ �1p*q

¨ ď 0.16 ¨ , (4.23)

again using that �1p*q ě 10. Similarly, with high probability, it holds for all unit vectors D, E that

Dtˆ

1#1{2 p∇2qcrC ´ 1sB

˙

E ď}B}}∇2}#1{2 ď

ˆ

4.5 ¨ �1p*q 2p*q 1{2˙ˆ

36 ¨ �1p*q 1{2˙

(1.6)ď

4.5 ¨ 3645 ¨ �1p*q

¨ 1{2 ď 0.11 ¨ 1{2 . (4.24)

Next, combining (4.22) with Lemma 4.8 gives|1|

"ď 2p*q

ˆ

4.6 ¨ �1p*q ` 17 ¨´

0.33 ¨ �1p*q¯2˙

ď 2.4 ¨ �1p*q2 2p*q .

Combining the above with (4.17) gives, for any unit vector D,

Dtˆ

1#1p∇2qp∇2qt

˙

D ď|1|}∇2}2#

ď

ˆ

2.4 ¨ �1p*q2 2p*q

˙

¨

ˆ

1845 ¨ �1p*q2 2p*q2

˙2 ď

47 . (4.25)

Finally, we note that the Hessian of 2p�q “ p1´ }�}2q1{2 can be calculated as

Hess 2p�q “ ´ 12p�q

"

� `��t

2p�q2

*

.

We can bound the above in operator norm by

}Hess 2p�q} ď 10.95

ˆ

1`}�}2

0.952

˙

(4.17)ď

10.95

ˆ

1`p1{46q2

0.952

˙

ď 1.1 .

Combining (4.22) with Lemma 4.7 gives|0|

"ď 1.1 ¨ �1p*q ` 3.7 ¨

´

0.33 ¨ �1p*q¯2ď 0.6 ¨ �1p*q

2 ,

so altogether we obtain, for any unit vector D,

Dtˆ

1#0 ¨ Hess 2

˙

D ď 1.1 ¨ 0.6 ¨ �1p*q2 ď 0.7 ¨ �1p*q

2 . (4.26)

To conclude, we note that substituting (4.21) into (4.15) implies}ℒ+,+}

&2 ď 1.7 ¨ 2p*q .

Page 42: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

42 E. BOLTHAUSEN, S. NAKAJIMA, N. SUN, AND C. XU

Substituting (4.20) and (4.24) into (4.14) implies}ℒ�,+}

1{2 &ď 2.5 ¨ 2p*q ` 0.11 ď 2.7 ¨ 2p*q .

Finally, substituting (4.19), (4.23), (4.25), and (4.26) into (4.13) gives}ℒ�,�}

ď 3.4 ¨ 2p*q ` 2 ¨ 0.16` 1

47 ` 0.7 ¨ �1p*q2 ď 0.8 ¨ �1p*q

2 2p*q .

Consequently, for any vector G ” p 9G, :Gq where 9G P ℝC and :G P ℝC´1, we have|GtpHessℒqG|

2p*qď 0.8 ¨ �1p*q

2 } 9G}2 ` 1.7 ¨ &2}:G}2 ` 2 ¨ 2.7 ¨ 1{2 &} 9G}}:G}

ď

ˆ

0.8 ¨ �1p*q2 ` 2.7

˙

} 9G}2 `´

1.7` 2.7¯

&2}:G}2 .

�e claim follows. �

Recalling (4.5), let us now denote

Pp�, +q ” }+ ´ &p+ ´ +˚q}2

22p�q2 ´p+˚ , +q

1´ @, (4.27)

so that Ψ “ P ` ℒ.

Lemma 4.10. If* satis�es Assumptions 1 and 2, then the function P of (4.27) satis�es

Hess Pp�, +q “ˆ

P�,� P�,+

P�,+ P+,+

˙ˇ

ˇ

ˇ

ˇ

p�,+qď

ˆ

1080 ¨ �1p*q2 � 0

0 p1´ 1.95 ¨ &q�

˙

for all p�, +q P T˝ (as de�ned by (2.28)), for 0 ď ď p*q as de�ned by (1.6), and & “ &p ;*q as in (1.6).

Proof. We �rst calculate the mixed partial derivatives

P�,� “}+ ´ &p+ ´ +˚q}2

2p�q4

"

1` 4��t

2p�q2

*

,

P�,+ “2p1´ &q

2p�q4�´

+ ´ &p+ ´ +˚q¯t,

P+,+ “p1´ &q2

2p�q2� “

p1´ &q2

1´ }�}2 � .

We have from (4.2) that }+˚} “ p1´ @q#1{2 ď #1{2. �en, for p�, +q P T˝ (as de�ned by (2.28)) we must have

}+} ď #1{2 ` 16 ¨ �1p*q 1{2 (1.7)

ď 18 ¨ �1p*q 1{2 (1.6)

ď18

45 ¨ �1p*q2ď

146 (4.28)

(very similarly to (4.17)). It follows using (4.17) and (4.28) that

}P�,�} ďp18 ¨ �1p*qq

2

0.954

ˆ

1`4 ¨ p1{46q2

0.952

˙

ď 46 ¨ �1p*q2 ,

}P�,+} ď2p18 ¨ �1p*qq

2

0.954 ď 720 ¨ �1p*q2

(4.6)ď

&2

43 ,

}P+,+} ďp1´ &q2

1´ p18 ¨ �1p*qq2 ď p1´ &q2 ` 2 ¨ p18 ¨ �1p*qq

2 (4.6)ď 1´ 2& ` 1.03 ¨ &2 .

Consequently, for any vector G ” p 9G, :Gq where 9G P ℝC and :G P ℝC´1, we have

|GtpHess PqG| ď 360 ¨ �1p*q2 } 9G}2 ` 2 ¨ 720 ¨ �1p*q

2 } 9G}}:G} `

ˆ

1´ 2& ` 1.03 ¨ &2˙

}:G}2

ď �1p*q2´

360` 720¯

} 9G}2 `

ˆ

1´ 2& ` 1.03 ¨ &2 `&2

43

˙

}:G}2 .

�e claim follows. �

Page 43: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 43

Proof of Proposition 4.4. It follows by combining Corollary 4.9 and Lemma 4.10 that

HessΨ “ Hess P ` Hessℒ ď

ˆ

�1p*q2p 2p*q ` 1080q � 0

0 p1´ 1.95 ¨ & ` 5 ¨ 2p*q&2q�

˙

.

We use the choice of & from (4.6) to bound

5 ¨ 2p*q&2 (4.6)ď

5&�1p*q2

ď 0.05 ¨ & ,

and the claim follows. �

4.4. Replica symmetric upper bound. In this subsection we give the proof of �eorem 1.5. We then use this toconclude the proof of the upper bound in �eorem 1.2.

Proof of �eorem 1.5. Recall from (2.30) that we decomposed `pM1q “ `˝pM1q``‚pM1q. For `˝pM1q, we will analyzethe bound from �eorem 2.11. Note that Lemma 2.3 implies

p1, logp2 chpHpCqqqq#

#Ñ8ÝÑ log`� log chp#1{2/q (4.29)

in probability. Recalling (4.1), (4.2), and (4.5), and applying Lemma 2.3 again, we have

Ψp�˚ , +˚q “ ´}+˚}2

2p1´ @q` ℒp�˚ , +˚q #Ñ8ÝÑ ´

#p1´ @q

2 ` �!@p@1{2/q (4.30)

in probability, for ! as in (1.3). It follows by comparing (4.29) and (4.30) with (1.8) thatp1, logp2 chpHpCqqqq

#`Ψp�˚ , +˚q

#Ñ8ÝÑ RSp ;*q (4.31)

in probability. Next, it follows by combining Lemmas 2.21 and 2.22 that

Qˆ"

� P t´1,`1u# :›

›�p�q ´ �˚›

› ě 31 and›

›+p�q ´ +˚›

› ě 32

ď exp"

´ #

'p1´ 3@1{2q

8 p31q2 ` p1´ 'q

p1´ 8@1{2q

2 p32q2 ` >#p1q

*

(4.32)

for any ' P r0, 1s. On the other hand, if p�, +q P T˝ (as de�ned by (2.28)) with }�´ �˚} ď 31 and }+´ +˚} ď 32,then it follows by combining Lemmas 4.2 and 4.3 with Proposition 4.4 that

Ψp�, +q ´Ψp�˚ , +˚q ď ∇Ψp�˚ , +˚qˆ

�´ �˚+ ´ +˚

˙

`47�1p*q

2 2p*q

2 p31q2 `

p1´ 1.9&q2 p32q

2

ď >#p1q ` >Cp1q `47�1p*q

2 2p*q

2 p31q2 `

p1´ 1.9&q2 p32q

2 . (4.33)

Let us take ' “ 4 1{2. �en, for 31 ď }�´ �˚} ď p1` q1{231, combining the }�´ �˚}2 terms in (4.32) and (4.33)results in

´4 1{2p1´ 3@1{2q

8 `47�1p*q

2 2p*q p1` q

2(1.6)ď

ˆ

´ 1` 3@1{2 `47p1` q

45�1p*q

˙

1{2

2 ď ´ 1{2

10 .

For 32 ď }+ ´ +˚} ď p1` q1{232, combining the }+ ´ +˚}2 terms in (4.32) and (4.33) results in

´p1´ 4 1{2qp1´ 8@1{2q

2 `p1` qp1´ 1.9&q

2(1.7)ď

ˆ

2` 4 ¨ 31{2�1p*q ` 1{2˙

1{2 ´1.9 ¨ &

2

ď 8�1p*q 1{2 ´

1.9 ¨ &2

(4.6)ď

ˆ

8´ 1.9 ¨ 45

2

˙

�1p*q 1{2 ď ´1000 ¨ 1{2 .

Substituting the above bounds into the result of �eorem 2.11 gives, with high probability,

�p`˝pM1q |ℱ1pCqq

expt#pRSp ;*q ` >Cp1qquď

ÿ

:1 ,:2ě0exp

"

´# 1{2

10

2ÿ

8“1p38q

2p1` q:8*

ď $p1q .

�e result follows by combining with the bound on `‚pM1q from Corollary 4.1. �

Page 44: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

44 E. BOLTHAUSEN, S. NAKAJIMA, N. SUN, AND C. XU

Proof of �eorem 1.2 upper bound. It follows from �eorem 1.5 and Markov’s inequality that for any & ą 0,

ˆ

1#

log`pM1q ě RSp ;*q ` &

ˇ

ˇ

ˇ

ˇ

ℱ1pCq

˙

ďexpp#>Cp1qq

expp#&q,

with high probability over the randomness of ℱ1pCq. It follows that

ˆ

1#

log`pM1q ě RSp ;*q ` &

˙

ď >#p1q `expp#>Cp1qq

expp#&q.

�e le�-hand side does not depend on C, so it follows that

lim sup#Ñ8

1#

log` ď RSp ;*q

in probability, which gives the upper bound in �eorem 1.2. �

5. Second moment conditional on AMP

In this section we give the proof of �eorem 1.6, our main result on the conditional second moment. From this wewill deduce the lower bound in �eorem 1.2 in the bounded case, as explained at the end of this section. �e lowerbound in the general case will be treated in Section 7. Recalling (2.28), we now restrict further to

T˚ ”

"

p�, +q : max!

}�p�q ´ �˚}, }+p�q ´ +˚})

ď >#p1q*

, (5.1)

so T˚ Ď T˝. �en, analogously to (2.29), we let

ℍ˚ ”

"

� P t´1,`1u# : p�p�q, +p�qq P T˚

*

,

so ℍ˚ Ď ℍ˝. Analogously to (2.30), we let

`˚pMq ”ÿ

�Pℍ˚

S�pMq ď `˝pMq ď `pMq . (5.2)

We will prove �eorem 1.6 for the random variable

¯ pMq ”ÿ

�Pℍ˚

S�pMq1"

}Mv�}2

"ď 5�1p*q

2*

ď `˚pMq , (5.3)

where v� “ �2{}�2} as in De�nition 2.7, and �1p*q is the constant from Lemma 3.3. �e remainder of this sectionis organized as follows:

‚ In §5.1 we prove the �rst moment lower bound (1.20), which is the �rst assertion of �eorem 1.6.‚ In §5.2 we introduce a parameter � “ �p� , q (De�nition 5.3) which captures the correlation of a pair

of con�gurations � , P t´1,`1u# . We then prove �eorem 5.9 which gives a preliminary bound on thesecond moment contribution from pairs with small � (see (5.29)). We also prove Corollary 5.10 which boundsthe second moment contribution from pairs with larger �.

‚ In §5.3 we further analyze the bound obtained in �eorem 5.9. We show in Proposition 5.11 that the bound isapproximately stationary at � “ 0, and then in Corollary 4.9 we control the second derivative of the boundwith respect to �.

‚ In §5.4 we combine the results of the preceding sections to conclude the proof of �eorem 1.6. From this wededuce the lower bound of �eorem 1.2 in the case }D} ă 8.

�e calculation of this section follows a similar outline as that of Sections 2 and 4, so we will point out the parallelsthroughout. As before, we let M be an independent copy of M1.

Page 45: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 45

5.1. First moment lower bound. In this subsection we prove (1.20), the �rst assertion of �eorem 1.6. To this end,we begin with the following result which essentially says that the upper bound of �eorem 2.11 is tight in the casep�, +q “ p�˚ , +˚q.

Proposition 5.1. Suppose* satis�es Assumptions 1 and 2. Letℱ1pCq be as in (2.3). For `˚ as in (5.2) we have

´

`˚pM1qˇ

ˇ

ˇℱ1pCq

¯

ě exp"

RSp ;*q ´ >Cp1q¯

*

with high probability.

Proof. Recall from the proof of Proposition 2.12 that

�� ” �´

S�pM1qˇ

ˇ

ˇℱ1pCq

¯

(2.44)“

K�p� | 6Rq ¨ p� ,�p6A | 6Rqexpt#1{2p�, 6Aqu ¨ ?Ap6Aq

. (5.4)

If � P ℍ˚, then it follows from Lemma 2.9 that �p�q » �˚ ” @1{2 4C´1, and

6A

#1{2(2.46)“ p�#qt�p�q »

1p1´ @q1{2

"

+˚ ´#1{2

@1{2 p1´ @q�t�˚

*

(2.27)“ 0 P ℝC´1 . (5.5)

Substituting this into the result of Proposition 6.13 givesp� ,�p6A | 6Rq#1{2| det �# |

»g� ,�p6Aq

#1{2| det �# |(6.24)“ 6� ,�

ˆ

´ p##q1{2„

�# �` 2p�qnrC ´ 1s�}�}2p^� ,�q

##1{2 ` >#p1q˙

, (5.6)

for ^� ,� as de�ned by (6.1). To evaluate the right-hand side above, note that � P ℍ˚ implies

�# �#1{2p1´ @q1{2

”�# �p�q

#1{2p1´ @q1{2(2.50)» ´

�+˚#1{2p1´ @q

(2.27)“ ´��t 4C´1

(2.8)“ ´

¨

˚

˚

˚

˝

�1...

�C´21

˛

.

It also implies ^� ,� » hpC`1q, and consequently

2p�qnrC ´ 1s�}�}2p^� ,�q

##p1´ @q1{2»

nrC ´ 1s�@phpC`1qq

##

(2.13)»

¨

˚

˚

˚

˝

�1...

�C´2�C´1

˛

.

Note moreover that Proposition 2.5 and Lemma 3.11 together imply �C´1 “ 1´>Cp1q. Substituting these calculationsinto (5.6) gives (cf. (2.45))

p� ,�p6A | 6Rq » #1{2| det �# |6� ,�

¨

˚

˚

˚

˝

p##q1{2

¨

˚

˚

˚

˝

>#p1q...

>#p1q>Cp1q

˛

˛

“ expt#>Cp1qu . (5.7)

Substituting (2.47), (2.48), and (5.7) into (2.44) gives (cf. (2.49))

�� “ exp"

#”

A�p�q ` >Cp1qı

*

.

It then follows from the proof of �eorem 2.11 that (cf. (2.53))�p`˚pM1q |ℱ1pCqq

exptp1, logp2 chpHpCqqqqu“ Qpℍ˚q exp

"

#”

Ψp�˚ , +˚q ` >Cp1qı

*

. (5.8)

We have Qpℍ˚q » 1 by the law of large numbers, so the claim follows by recalling (4.31). �

To �nish the proof of (1.20), it remains only to account for the restriction on }Mv} in (5.3):

Page 46: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

46 E. BOLTHAUSEN, S. NAKAJIMA, N. SUN, AND C. XU

Proof of �rst moment lower bound (1.20). We begin with an easy large deviations calculation. If � is a standard gauss-ian random variable, then it is well known that �2{2 is a gamma random variable with shape parameter 1{2, andmoment-generating function

� expˆ

��2

2

˙

ż 8

0

4´p1´�qG

2�1{2G1{2 3G “1

p1´ �q1{2,

for any � ă 1. If ' is a standard gaussian random vector in ℝ" , then for any ! ą 1 we have

ˆ

}'}2

"ě !

˙

ď exp"

´"

2 inf"

logp1´ �q ` !� : � P r0, 1q**

ď exp"

´"

2

!´ log !´ 1ı

*

. (5.9)

Now, recalling (5.2) and (5.3), let us take ! ” !1p*q ” 5�1p*q2 ě 500 and de�ne

`2pM1q ” `˚pM1q ´ ¯ pM1q ”ÿ

S�pM1q1"

}M1v�}2

"ą !

*

. (5.10)

It follows from Lemmas 2.15 and 2.18 that

´

`2pM1qˇ

ˇ

ˇℱ1pCq

¯

“ �

´

`2pMqˇ

ˇ

ˇR,A, pM1qRA¯

ďÿ

ˆ

}Mv�}2

"ą !

ˇ

ˇ

ˇ

ˇ

R,A, pM1qRA

˙

.

Recall from De�nition 2.13 that +C denotes the span of the vectors cpℓq for ℓ ď C ´ 1. Let us decompose Mv� P ℝ"

as pMv�q‖ ` pMv�qK where pMv�q‖ is the orthogonal projection of Mv� onto +C. Conditional on the events R andA, pMv�q‖ is �xed by the admissibility condition (see (2.34)), while pMv�qK behaves as an independent standardgaussian random vector in the orthogonal complement of +C. It follows that, conditional on R and A, }Mv�}2{" isequidistributed as

#}p�#qt�}"

`}'1}2

"“ >#p1q `

}'1}2

",

where '1 is a standard gaussian random vector in ℝ"´ℓ´1. It follows by applying (5.9) that

´

`2pMqˇ

ˇ

ˇℱpCq¯

ď 2#ℙˆ

}'}2

"ě !

˙

ď exp"

#

log 2´5 �1p*q

2

3

*

ď exp"

#

RSp ;*q ´ �1p*q

2

10

*

,

where the last bound uses the result of Corollary 3.8. Combining with the result of Proposition 5.1 gives

´

¯ pM1qˇ

ˇ

ˇℱ1pCq

¯

ě �

´

`˚pM1qˇ

ˇ

ˇℱ1pCq

¯

´�

´

`2pM1qˇ

ˇ

ˇℱ1pCq

¯

ě exp"

RSp ;*q ´ >Cp1q¯

*

,

with high probability. �

5.2. Expected weight of a correlated pair.

De�nition 5.2. Recall the function Y�pgR , gA , gBq from (2.40). Moreover recall that by (2.37) and (2.38) combined,the pair pgA , gBq is equivalent to gP ” Mv. We let ℚ�p¨q denote the measure on ℝ" such that

ℚ�p�q “�pS�pMq1tMv P �u |R, Aq

�pS�pMq |R, Aq“�pY�p6R , 6A , gBq1tp6A , gBq P �uq

�pY� 6R , 6A , gBqq.

Note that ℚ� depends on 6R and 6A, where 6R does not depend on �, but 6A does.

De�nition 5.3 (analogous to De�nition 2.6). Let � , P t´1,`1u# . Recall from De�nition 2.6 that we decompose� “ �1 ` �2 where �1 is the orthogonal projection of � onto the span of the vectors mpBq, 1 ď B ď C. Analogouslydecompose “ 1 ` 2. Recall that v ” �2{}�2}, and de�ne analogously v ” 2{} 2}. �en let

�p� , q ”

ˆ

�2

}�2}, 2

} 2}

˙

“ pv, v q ,

Page 47: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 47

so clearly we have ´1 ď �p� , q ď 1. We further denote

w ” 2 ´ p 2 , vqv} 2 ´ p 2 , vqv}

»v ´ �vp1´ �2q1{2

, (5.11)

so w is a unit vector in ℝ# orthogonal to v.

De�nition 5.4 (analogous to De�nition 2.16). Given ℱ1pCq as in (2.3), and � , P t´1,`1u# , recall from De�ni-

tion 5.3 that we decompose � “ �1 ` �2 and “ 1 ` 2, and de�ne corresponding unit vectors v and w. Let

+Pp q ” span"

e0wt : 1 ď 0 ď "

*

,

+Ap q ” span"

npℓqwt : 1 ď ℓ ď C ´ 1*

.

Note +Ap q is a subspace of +Pp q, and is also a subspace of the space +C from De�nition 2.13. Let projAp q denotethe orthogonal projection onto +Ap q, and note that pM1qAp qB ” projAp qpM1q is measurable with respect to ℱ

1pCq.

De�nition 5.5 (analogous to De�nition 2.17). As before, let M be an independent copy of M1. Let

Ap q ”!

projAp qpMq “ pM1qAp q)

(2.32)“

"

nrC ´ 1sMw##1{2 “

HrC ´ 1swp##q1{2

*

, (5.12)

where the last identity holds assuming M belongs to the event C from (2.32).

De�nition 5.6 (extension of De�nition 2.10). We now let P denote the uniform probability measure over pairsp� , q P pt´1,`1u#q2, and let Q be the probability measure on the same space which is given by

3Q3P

“exptpHpCq , � ` qu

expt2 ¨ p1, log chHpCqqu.

Note that � and are independent under Q, and each has mean mpCq.

Proposition 5.7 (analogous to Proposition 2.12). For ' P ℝ" de�ne

A2p� | 'q ”#p1´ @q

2p1´ �2q`

1#

ˆ

1, !@`�2p1´@q

ˆ

hpC`1q ` p1´ @q1{2�'

˙˙

.

�en, for � , P ℍ˚, we have

�pS�pM1qS pM1q1t}M1v}2{" ď !u |ℱ1pCqq

�pS�pM1q |ℱ1pCqqď

ż

1"

}'}2

"ď !

*

expt#A2p� | 'quℚ�p3'q

with ℚ� as in De�nition 5.2, and � “ �p� , q as in De�nition 5.3.

In preparation for the proof of Proposition 5.7, we record the following calculation:

Lemma 5.8 (analogous to Lemma 2.20). For � , P t´1,`1u# and ' P ℝ" , de�ne the cumulant-generating function

K |�p� | 'q ”1#

log�ˆ

S pMq exp"

#1{2�tcrC ´ 1sMw*ˇ

ˇ

ˇ

ˇ

R, pM1qR ,Mv “ '

˙

for � P ℝC´1. Next, with ! as in (1.3) and with ˜ as de�ned by Lemma 2.19, de�ne

ℒ |�p� | 'q ”1#

ˆ

1, !}�p q}2p1´�2q`�2

ˆ

˜ ` 2p�p qq

�' ` p1´ �2q1{2#1{2crC ´ 1st�ı

˙˙

, (5.13)

where 2p�p qq ” p1´ }�p q}2q1{2. �en the function K |� satis�es

K |�p� | 'q “}�}2

2 ` ℒ |�p� | 'q .

Page 48: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

48 E. BOLTHAUSEN, S. NAKAJIMA, N. SUN, AND C. XU

Proof. Conditional on the event R, it follows from Lemma 2.19 that M 1{#1{2 “ ˜ ” ˜ . We also have

M 2

#1{2 “} 2}

#1{2 Mv “ 2p�p qq´

�' ` p1´ �2q1{2/¯

, (5.14)

where / “ Mw is distributed as an independent gaussian vector in ℝ# . �us

K |�p� | 'q “1#

ÿ

0ď"

log��

exp"

#1{2ÿ

ℓďC´1�ℓ pcpℓqq0�

*

*

ˆ

˜0 ` 2p�p qq

!

�'0 ` p1´ �2q1{2�)

˙

,

where � denotes a standard gaussian random variable. Making a change of variable gives

K |�p� | 'q “}�}2

2 `1#

ÿ

0ď"

log��*

ˆ

˜0 ` 2p�p qq

"

�'0 ` p1´ �2q1{2„

� ` #1{2ÿ

ℓďC´1�ℓ pcpℓqq0

,

from which the result follows. �

Proof of Proposition 5.7. We follow a very similar outline as in the proof of Proposition 2.12. As in (5.10) above, letus write ! “ 5�1p*q

2. Given ℱ1pCq as in (2.3) and � , P t´1,`1u# , we abbreviate the quantity of interest as

�� , ” �

ˆ

S�pM1qS pM1q1"

}M1v}2

"ď !

ˇ

ˇ

ˇ

ℱ1pCq

˙

. (5.15)

It follows by the obvious generalization of Lemma 2.15 that

�� , “ �

ˆ

S�pMqS pMq1"

}Mv}2

"ď !

ˇ

ˇ

ˇ

R,C, pM1qRC

˙

,

where M is an independent copy of M1. Next, the obvious generalization of Lemma 2.18 gives the simpli�cation

�� , “ �

ˆ

S�pMqS pMq1"

}Mv}2

"ď !

ˇ

ˇ

ˇ

R,A,Ap q˙

,

where A and Ap q are as in De�nition 2.17 and De�nition 5.5 respectively. By the law of iterated expectations,

�� , “ �

ˆ

S�pMq1"

}Mv}2

"ď !

*

S pMqˇ

ˇ

ˇR,Mv,Ap qı

ˇ

ˇ

ˇ

ˇ

R,A˙

. (5.16)

We therefore �rst consider the calculation of

� |�p'q ” �

ˆ

S pMq

ˇ

ˇ

ˇ

ˇ

R,Mv “ ',Ap q˙

(5.17)

(where we assume that ' satis�es the constraints imposed by A).Towards the calculation of (5.17), recall the notation of De�nition 5.4, and let +Pp qzAp q be the orthogonal com-

plement of +Ap q inside +Pp q. Analogously to (2.36) and (2.37), de�ne gPp q and gAp q, for instance

gAp q ”ˆ

pM, cpℓqwtq : 1 ď ℓ ď C ´ 1˙

“ crC ´ 1sMw P ℝC´1 . (5.18)

Choose an orthonormal basis for+Pp qzAp q, and denote it H 9p q for 1 ď 9 ď "´pC´ 1q. Analogously to (2.38), let

gBp q ”ˆ

pM, H 9p qq : 1 ď 9 ď " ´ pC ´ 1q˙

P ℝ"´C`1 .

Note that there is an orthogonal transformation of ℝ" which maps gPp q to the pair pgAp q , gBp qq.�e weight S pMq, as de�ned by (2.1), is a function of M , which we decomposed in the proof of Lemma 5.8 as a

sum of M 1 and M 2. Recall that M 1 is a function of gR. Meanwhile (see e.g. (5.14)) M 2 is a linear combinationof gP “ Mv “ ' and gPp q “ Mw, where gPp q is equivalent to the pair pgAp q , gBp qq as noted above. �us S pMqcan be rewri�en as a function Y |� of pgR , gP , gAp q , gBp qq: explicitly,

S pMq “ź

0ď"

*

ˆ

ÿ

BďC

p , rpBqq#1{2 pgRq0,B `

} 2}

#1{2

´

�pgPq0 ` p1´ �2q1{2pgPp qq0¯

˙

” Y |�pgR , gP , gAp q , gBp qq ,

Page 49: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 49

with � ” �p� , q as given by De�nition 5.3. On the event Ap q, the value of gAp q is �xed to a value 6Ap q. We thenintroduce a parameter � P ℝC´1, and de�ne (analogously to (2.40))

S |� ,�pMq ” Y |� ,�pgR , gP , gAp q , gBp qq ” Y |�pgR , gP , gAp q , gBp qq exp"

#1{2p�, gAp qq*

.

�en, analogously to (2.41), for any � P ℝC´1 we can rewrite (5.17) as

� |�p'q “ �

ˆ

Y |� ,�pgR , gP , gAp q , gBp qqexpt#1{2p�, 6Ap qqu

ˇ

ˇ

ˇ

ˇ

pgR , gP , gAp qq “ p6R , ', 6Ap qq˙

“1

expt#1{2p�, 6Ap qqu

ż

Y |� ,�p6R , ', 6Ap q , 6Bp qq?Bp qp6Bp qq 36Bp q . (5.19)

By contrast, the expected value of S |� ,� given only the row constraints is (cf. (2.42))

K |�p� | 6R , 'q ” �

ˆ

S |� ,�pMq

ˇ

ˇ

ˇ

ˇ

R,Mv “ '

˙

“ �

ˆ

Y |� ,�pgR , gP , gAp q , gBp qqˇ

ˇ

ˇ

ˇ

pgR , gPq “ p6R , 'q˙

ż

?Ap qp6Ap qq

ż

Y |� ,�p6R , ', 6Ap q , 6Bp qq?Bp qp6Bp qq 36Bp q 36Ap q

“ exp"

#K |�p� | 'q

*

. (5.20)

�en, analogously to (2.43), we de�ne the probability density function

p |� ,�p6Ap q | 6R , 'q 36Ap q ”�pS |� ,�pMq1tgAp q P 36Ap qu |R,Mv “ 'q

�pS |� ,�pMq |R,Mv “ 'q

“?Ap qp6Ap qq

K |�p� | 6R , 'q

ż

Y |� ,�p6R , ', 6Ap q , 6Bp qq?Bp qp6Bp qq 36Bp q (5.21)

�en it follows similarly to (2.44) that we can rewrite (5.19) as

� |�p'q “K |�p� | 6R , 'q ¨ p |� ,�p6Ap q | 6R , 'qexpt#1{2p�, 6Ap qqu ¨ ?Ap qp6Ap qq

. (5.22)

We will show in Proposition 6.14 (deferred to Section 6) that (cf. (2.45))

max"

›p |� ,�p¨ | 6Rq›

8: � P t´1,`1u# , }�p�q} ď 4

5 , |�| ď45 , }�} ď �max

*

ď ℘C ,2 . (5.23)

It therefore remains to estimate the other two terms on the right-hand side of (5.22). We then note that De�nition 5.5implies that, on the event Ap q, we have (cf. (2.46))

6Ap q

#1{2(5.18)“

crC ´ 1sMw#1{2

(2.15)“

p�#q´1nrC ´ 1sMw##1{2

(5.12)“

p�#q´1HrC ´ 1swp##q1{2

(5.11)“

p�#q´1HrC ´ 1sp##q1{2

ˆ

v ´ �vp1´ �2q1{2

˙

(2.22)“

p�#qtr�p q ´ ��p�qs

p1´ �2q1{2“ >#p1q , (5.24)

where the last estimate holds thanks to the restriction � , P ℍ˚ (see (5.5)). Substituting (5.24) into the formula for?Ap q (similar to (2.47)) gives

?Ap qp6Ap qq “1

p2�qpC´1q{2 exp"

´#

2

p�#qtr�p q ´ ��p�qs

p1´ �2q1{2

2*

“ expt# ¨ >#p1qu . (5.25)

Meanwhile, it follows by combining (5.20) and (5.24) that (cf. (2.48))

� |�p'q ď exp"

#

K |�p� | 'q ´

ˆ

�,p�#qtr�p q ´ ��p�qs

p1´ �2q1{2

˙

` >#p1q*

(5.26)

Substituting (5.23), (5.25), and (5.26) into (5.22), and combining with Lemma 5.8, gives (cf. (2.49))� |�p'q

℘C ,2p2�qC{2ď exp

"

#

12

�´p�#qtr�p q ´ ��p�qs

p1´ �2q1{2

2` ℒ |�p� | 'q

*

” exp!

#A |�p� | 'q)

, (5.27)

Page 50: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

50 E. BOLTHAUSEN, S. NAKAJIMA, N. SUN, AND C. XU

where A |� is de�ned by the last identity. To simplify the above expression, we set � “ �p�q where

�p�q ”+˚

p1´ @q1{2p1´ �2q1{2(4.2)“ ´

#1{2p1´ @q1{2

p1´ �2q1{2�t 4C´1 .

Substituting this into (5.13), and recalling the de�nition of ˜ from Lemma 2.19, we obtain

ℒ |�p�p�q | 'q »1#

ˆ

1, !@p1´�2q`�2

´

hpC`1q ` p1´ @q1{2�'¯

˙

` >#p1q ” ℒ2p� | 'q , (5.28)

where ℒ2 is de�ned by the last identity. By substituting the above into (5.27), we see that the quantity from (5.17)can be upper bounded by

� |�p'q ď exp"

#

#p1´ @q

2p1´ �2q` ℒ2p� | 'q ` >#p1q

*

“ exp"

#”

A2p� |, 'q ` >#p1qı

*

,

for A2p� |, 'q as in the statement of the proposition. By comparing (5.16) with (5.17), we see that

�� , “ �pS� |R,Aqż

1"

}'}2

"ď !

*

� |�p'qℚ�p3'q ,

so the claim follows. �

Analogously to (2.29), we now de�ne

pℍ˚q2,˝ ”

"

p� , q P pℍ˚q2 :|�p� , q|

1{2 ď 10 ¨ �1p*q

*

,

so pℍ˚q2,˝ is a subset of pℍ˚q2. �en decompose ¯ 2pM1q ” ¯ 2,˝pM1q ` ¯ 2,‚pM1q where (cf. (2.30))

¯ 2,˝pM1q ”ÿ

p� , qPpℍ˚q2,˝

S�pM1qS pM1q1"

}M1v�}2

"ď !,

}M1v }2

"ď !

*

. (5.29)

We bound ¯ 2,˝pM1q as follows:

�eorem 5.9 (analogous to �eorem 2.11). Suppose * satis�es Assumptions 1 and 2, and let ℱ1pCq be as in (2.3).Recalling Proposition 5.7 and (5.28), letΨ2p� | 'q be de�ned by

Ψ2p� | 'q ´Ψp�˚ , +˚q ” ´#p1´ @q `A2p� | 'q “ ´#p1´ @q `#p1´ @q

2p1´ �2q` ℒ2p� | 'q .

For ¯ 2,˝pM1q as de�ned by (5.29), we have�p ¯ 2,˝pM1q |ℱ1pCqq

expt2 ¨ p1, logp2 chpHpCqqqquď

ÿ

p� , qPpℍ˚q2,˝

Qp� , qż

expt#Ψ2p� | 'quℚ�p3'q

for ℚ� as in De�nition 5.2 and Q as in De�nition 5.6.

Proof. We follow the proof of �eorem 2.11. Suppose � , P ℍ˚ with � “ �p� , q as given by De�nition 5.3.Recalling De�nition 2.6, the restriction � P ℍ˚ implies

HrC ´ 1s�##1{2

(2.10)“

�yrC ´ 1s�#

(2.20)“ �+p�q » �+˚

(2.27)“ #1{2p1´ @q��t 4C´1 .

It follows that, for all � P ℍ˚,pHpCq , �q#

» #p1´ @qp��tqC´1,C´1(2.8)“ #p1´ @q .

Since pHpCq ,HpC`1qq{p##q “ 1´ >Cp1q, we conclude that, for all � P ℍ˚.

pHpC`1q , �q

#» #p1´ @q ´ >Cp1q .

Page 51: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 51

Let �� , be as in (5.15). Combining with De�nition 5.6 gives

�p ¯ 2,˝pM1q |ℱ1pCqq

expt2 ¨ p1, logp2 chpHpCqqqquď

ÿ

p� , qPpℍ˚q2,˝

Qp� , qˆ

Pp� , q{Qp� , qexpt2 ¨ p1, log chpHpCqqqu

˙

�� ,

ďÿ

p� , qPpℍ˚q2,˝

Qp� , q�� ¨ �� , {��

expt2#r#p1´ @q ´ >Cp1qsu

ď exp!

#Ψp�˚ , +˚q)

ÿ

p� , qPpℍ˚q2,˝

Qp� , q�� , {��

expt#r#p1´ @q ´ >Cp1qsu, (5.30)

where the last bound is by the calculation (5.8) from the proof of Proposition 5.1. Combining with Proposition 5.7gives the claim. �

Corollary 5.10 (analogous to Corollary 4.1). Suppose * satis�es Assumptions 1 and 2, and let ℱ1pCq be as in (2.3).We then have the bound

´

¯ 2,‚pM1qˇ

ˇ

ˇℱ1pCq

¯

ď exp"

2#ˆ

RSp ;*q ´ 0.1 ¨ �1p*q2

˙*

for ¯ 2,‚pM1q “ ¯ 2pM1q ´ ¯ 2,˝pM1q as de�ned by (5.29).

Proof. For �� , as in (5.15) we also have trivially �� , ď 1, and combining this with the calculation (5.30) gives

�p ¯ 2,‚pM1q |ℱ1pCqq

expt2 ¨ p1, logp2 chpHpCqqqquď

ÿ

p� , qPpℍ˚q2,‚

Qp� , qexpt2#r#p1´ @q ´ >Cp1qsu

.

Combining Proposition 1.1 with Corollary 3.8 and (4.7) gives, with high probability,�p ¯ 2,‚pM1q |ℱ1pCqq

expt2#RSp ;*qu ď Q´

pℍ˚q2,‚¯

¨ exp"

# ¨ 2„

3` 1.53` 32 ` >Cp1q

�1p*q2 ¨

*

ď Q´

pℍ˚q2,‚¯

¨ exp"

12.1 ¨ # ¨ �1p*q2

*

.

For any � P t´1,`1u# , it follows by the Azuma–Hoe�ding inequality that

P t´1,`1u# :ˇ

ˇ

ˇ

ˇ

p� ´mpCq , ´mpCqq

#

ˇ

ˇ

ˇ

ˇ

ě G

˙

ď 2 exp"

´#G2

8

*

for any G ě 0. Recalling De�nition 5.3, it follows that for any � P ℍ˚,

P ℍ˚ : |�p� , q| ě ;

˙

ď 2 exp"

´#p1´ @ ` >#p1qq;2

8

*

(5.31)

for any ; ě 0. Taking ; “ 10 ¨ �1p*q 1{2 and summing over � gives

pℍ˚q2,‚¯

ďÿ

�Pℍ˚

P ℍ˚ : |�p� , q| ě 10 ¨ �1p*q 1{2˙

ď exp"

´ 12.4 ¨ # ¨ �1p*q2

*

.

It follows by combining the above bounds that�p ¯ 2,‚pM1q |ℱ1pCqq

expt2#RSp ;*qu ď exp"

´ 0.3 ¨ # ¨ �1p*q2

*

,

which concludes the proof. �

Page 52: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

52 E. BOLTHAUSEN, S. NAKAJIMA, N. SUN, AND C. XU

5.3. Analysis of second moment. In this subsection we analyze the bound from �eorem 5.9.

Proposition 5.11 (analogous to Lemmas 4.2 and 4.3). For A2p� | 'q as de�ned by Proposition 5.7, we have

3A2p� | 'q

3�

ˇ

ˇ

ˇ

ˇ

�“0“ >#p1q ` >Cp1q !1{2

provided that ' “ Mv is compatible with the admissibility condition (2.33), and satis�es the bound }'}2 ď "! where! “ !1p*q “ 5�1p*q

2 as de�ned above.

Proof. For ℒ2p� | 'q as de�ned by (5.28), we have

3A2p� | 'q

3�

ˇ

ˇ

ˇ

ˇ

�“0“3ℒ2p� | 'q

3�

ˇ

ˇ

ˇ

ˇ

�“0“p1´ @q1{2

#

ÿ

0ď"

��*1pphpC`1qq0 ` p1´ @q1{2�q

�*pphpC`1qq0 ` p1´ @q1{2�q'0

(3.9)“p1´ @q1{2p�@phpC`1qq, 'q

#“p1´ @q1{2pnpC`1q , 'q

#.

On the other hand, it follows from the admissibility condition that ' “ Mv must satisfycrC ´ 1s'#1{2

(2.34)“ p�#qt�p�q “ >#p1q , (5.32)

where the last step is by the restriction � P ℍ˚. �e span of the vectors cpℓq for ℓ ď C´ 1 — which is the same as thespan of the vectors npℓq for ℓ ď C ´ 1 — does not contain npC`1q, but recall from (2.13) that

pnpC`1q , npC´1qq

##(2.13)“ �C´1 .

It follows from Proposition 2.5 and Lemma 3.11 that �C´1 “ 1´ >Cp1q, so we can decompose

npC`1q

p##q1{2“ c‖ ` cK

where c‖ lies in the span of the vectors cpℓq and has norm 1 ´ >Cp1q, while cK is orthogonal to the vectors cpℓq andhas norm >Cp1q. It follows that

ˇ

ˇ

ˇ

ˇ

ˆ

npC`1q

##1{2 , '

˙ˇ

ˇ

ˇ

ˇ

“|pc‖ ` cK , 'q|

#1{2

(5.32)ď >#p1q `

}cK}}'}#1{2 ď >#p1q ` >Cp1qp !q1{2 ,

having used Cauchy–Schwarz together with the assumption }'}2 ď "!. In conclusion we �nd3A2p� | 'q

3�

ˇ

ˇ

ˇ

ˇ

�“0“ >#p1q ` >Cp1q !1{2 ,

as claimed. �

Lemma 5.12 (analogous to Corollary 4.9). Suppose* satis�es Assumptions 1 and 2. Recall 2p*q from Assumption 2,and �1p*q from Lemma 3.3. For ℒ2p� | 'q as de�ned by (5.28), we have the bound

ˇ

ˇ

ˇ

ˇ

32ℒ2p� | 'q

3�2

ˇ

ˇ

ˇ

ˇ

ď 420 ¨ �1p*q2 2p*q ¨

as long as |�| ď 4{5 and }'}2{" ď ! “ !1p*q “ 5�1p*q2.

Proof. Let 4p�q ” p1´ @q1{2p1´ �2q1{2. Denote

_ ” _p�; 'q “"

hpC`1q ` p1´ @q1{2�'

*

` p1´ @q1{2p1´ �2q1{2�1 ” ^p�; 'q ` 4p�q�1 .

�en the function ℒ2p� | 'q from Proposition 5.7 can be rewri�en as

ℒ2p� | 'q ”1#

ÿ

0ď"

log��*p_0p�, 'qq .

Page 53: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 53

Recalling the notation of (4.9), (4.10), (4.11), and (4.12), let us now de�ne A ” �2p^q, B ” �2p^q, 0 ” p1, 02p^qq,and 1 ” p1, 12p^qq, for 2 “ 4p�q and ^ “ ^p�; 'q. With this notation, we have

32ℒ2p� | 'q

3�2 “1#

ÿ

0ď"

"

��*1p_0q

32_03�2 `��*

2p_0qp3_03� q

2

��*p_0q´

ˆ

��*1p_0q

3_03�

��*p_0q

˙2*

.

We decompose the above as pIq ` pIIq ` pIIIq ` pIVq where (cf. Lemma 4.5)

pIq ” 1#p^ 1p�qqtpdiagAq^ 1p�q “

1´ @

#'tpdiagAq' ,

pIIq ” 2#4 1p�qBt^ 1p�q “ ´

2p1´ @q1{2�

#p1´ �2q1{2Bt' ,

pIIIq ” 1#42p�q0 “ ´

p1´ @q1{2

#p1´ �2q3{20 ,

pIVq ” 1#4 1p�2q1 “

p1´ @q�2

#p1´ �2q1 .

We bound each of the above terms, assuming |�| ď 4{5. Applying (4.18) gives

|pIq| ď 1#}A}8}'}2 ď

1#

1.7 ¨ 2p*q}'}2 ď 1.7 ¨ 2p*q ! “ 8.5 ¨ �1p*q

2 2p*q ¨ .

It follows from the above de�nition of ^ ” ^p�; 'q that}^}

"1{2 ď}hpC`1q} ` }'}

"1{2 ď 2@1{2 ` !1{2 ď 2.5 ¨ �1p*q , (5.33)

with high probability. Combining (5.33) with Lemma 4.6 gives

|pIIq| ď 83# }B}}'} ď

8"1{2}'}

3# 2p*q

ˆ

2.5 ¨ �1p*q ` 5.8 ¨}^}

"1{2

˙

ď8 !1{2

3 2p*q

ˆ

2.5 ¨ �1p*q ` 5.8 ¨ 2.5 ¨ �1p*q

˙

ď 105 ¨ �1p*q2 2p*q ¨

Next, combining (5.33) with Lemma 4.7 gives

|pIIIq| ď 4.7#|0| ď

4.7 ¨"#

"

1.1 ¨ �1p*q ` 3.7 ¨}^}2

"

*

ď4.7 ¨"#

"

1.1 ¨ �1p*q ` 3.7 ¨ 2.52 ¨ �1p*q2*

ď 110 ¨ �1p*q2 ¨ .

Finally, combining (5.33) with Lemma 4.8 gives

|pIVq| ď 1.8#|1| ď 1.8 ¨ 2p*q

ˆ

4.6 ¨ �1p*q ` 17 ¨}^}2

"

˙

¨

ď 1.8 ¨ 2p*q

ˆ

4.6 ¨ �1p*q ` 17 ¨´

2.5 ¨ �1p*q¯2˙

¨ ď 193 ¨ �1p*q2 2p*q ¨ .

Combining the above bounds gives the claim. �

Corollary 5.13 (analogous to Proposition 4.4). Suppose * satis�es Assumptions 1 and 2. For Ψ2p� | 'q as in thestatement of �eorem 5.9, we have the bound

ˇ

ˇ

ˇ

ˇ

32Ψ2p� | 'q

3�2

ˇ

ˇ

ˇ

ˇ

ď 610 ¨ �1p*q2 2p*q ¨ ,

as long as |�| ď 4{5 and }'}2{" ď !1p*q “ 5�1p*q2.

Page 54: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

54 E. BOLTHAUSEN, S. NAKAJIMA, N. SUN, AND C. XU

Proof. It follows from the de�nition thatˇ

ˇ

ˇ

ˇ

32Ψ2p� | 'q

3�2

ˇ

ˇ

ˇ

ˇ

ď#p1´ @qp1` 3�2q

p1´ �2q3`

ˇ

ˇ

ˇ

ˇ

32ℒ2p� | 'q

3�2

ˇ

ˇ

ˇ

ˇ

ď 62.6 ¨ # `ˇ

ˇ

ˇ

ˇ

32ℒ2p� | 'q

3�2

ˇ

ˇ

ˇ

ˇ

.

Applying Proposition 1.1 and Lemma 5.12 givesˇ

ˇ

ˇ

ˇ

32A2p� | 'q

3�2

ˇ

ˇ

ˇ

ˇ

ď

"

62.6 ¨ 3 ¨ �1p*q2 ` 420 ¨ �1p*q

2 2p*q

*

¨ .

�e claim follows. �

5.4. Conclusion of second moment. In this concluding subsection we �nish the proof of �eorem 1.6, and use itto deduce the lower bound of �eorem 1.2 in the case }D}8 ă 8.

Proof of �eorem 1.6 (conclusion). Recall that the proof of the �rst moment lower bound (1.20) was already given atthe end of §5.1. It therefore remains to show the second moment upper bound (1.21), and for this we follow the proofof �eorem 1.5. Recall from (5.29) that we decomposed ¯ 2pM1q as the sum of ¯ 2,˝pM1q and ¯ 2,‚pM1q. For ¯ 2,˝pM1q,we will analyze the bound from �eorem 5.9. We note that at � “ 0 we have

Ψ2p0 | 'q ´Ψp�˚ , +˚q “ ´#p1´ @q

2 ` ℒ2p0 | 'q(5.28)» ´

#p1´ @q

2 ` �!@p@1{2/q

(4.30)» Ψp�˚ , +˚q .

It follows by combining with (4.29) that

p1, logp2 chpHpCqqqq `Ψp� | 'q

2#Ñ8ÝÑ RSp ;*q

Next, for |�| ď 4{5, it follows by combining Proposition 5.11 and Corollary 5.13 that

Ψ2p� | 'q ´Ψ2p0 | 'q ď3Ψ2p� | 'q

3�

ˇ

ˇ

ˇ

ˇ

�“0¨ �`max

ˇ

ˇ

ˇ

32Ψ2p� | 'q

3�2

ˇ

ˇ

ˇ

ˇ

: |�| ď 45

*

¨�2

2

ď >Cp1q�` 610 ¨ �1p*q2 2p*q ¨ ¨

�2

2 .

Recalling that ď p*q as de�ned by (1.6), the above can be simpli�ed as

Ψ2p� | 'q ´Ψ2p0 | 'q(1.6)ď >Cp1q `

610 ¨ �2

2410�1p*q4 2p*q3ď >Cp1q `

�2

413 .

Substituting this into the bound from �eorem 5.9 gives�p ¯ 2,˝pM1q |ℱ1pCqq

expt2#rRSp ;*q ` >Cp1qsuď

ÿ

p� , qPpℍ˚q2,˝

Qp� , q exp"

#�2

413

*

.

It follows by combining with (5.31) that the right-hand side is bounded by a constant. Finally, we recall that�p ¯ 2,‚pM1q was bounded by Corollary 5.10, so the claim follows. �

Proof of �eorem 1.2 lower bound assuming }D}8 ă 8. It follows from the �rst bound from �eorem 1.6 that

ˆ

¯ 1"

¯ ě �p ¯ |ℱpCqq2

ˇ

ˇ

ˇ

ℱpCq

˙

ě�p ¯ |ℱpCqq

2(1.20)ě

expt#pRSp ;*q ´ >Cp1qqu2 , (5.34)

with high probability over the randomness of ℱpCq. On the other hand, the Cauchy–Schwarz inequality gives

ˆ

¯ 1"

¯ ě �p ¯ |ℱpCqq2

ˇ

ˇ

ˇ

ℱpCq

˙2ď �p ¯ 2 |ℱpCqq ¨ ℙ

ˆ

¯ ě �p ¯ |ℱpCqq2

ˇ

ˇ

ˇ

ˇ

ℱpCq

˙

.

Combining the above with the second bound from �eorem 1.6 gives, again with high probability,

ˆ

¯ ě expt#pRSp ;*q ´ >Cp1qqu2

ˇ

ˇ

ˇ

ˇ

ℱpCq

˙

(5.34)ě

expt2#pRSp ;*q ´ >Cp1qqu{4�p ¯ 2 |ℱpCqq

(1.21)ě

1{4expp2#>Cp1qq

. (5.35)

Page 55: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 55

Next let ℙ9 denote probability conditional on the �rst 9 rows of M, and let �9 denote expectation with respect to ℙ9 .�en, as in the proof of [Tal11b, Propn. 9.2.6], we take the martingale decomposition

1#

"

log` ´� log`*

“ÿ

9ď"

1#

"

�9 log` ´�9´1 log`*

”ÿ

9ď"

-9 .

To bound -9 , let ` 9 denote the normalized partition function without the 9-th factor,

` 9 ”ÿ

ź

0ď",0‰9

*

ˆ

pg0 , �q#1{2

˙

. (5.36)

Since ` 9 does not depend on the 9-th row of M, we can rewrite

#-9 “ �9 log `

` 9´�9´1 log `

` 9.

By Assumption 1 and the uniform bound on D ” log* , we have1

expp}D}8qď

`` 9ď 1 ,

which implies |#-9 | ď }D}8 almost surely. It follows from the Azuma–Hoe�ding bound that

ˆ

ˇ

ˇ

ˇ log` ´� log`ˇ

ˇ

ˇ ě #&

˙

ď 2 exp"

´#&2

2 p}D}8q2

*

”2

expp#Bp&qq . (5.37)

On the other hand, if we �x any & ą 0, then (5.35) implies

ˆ

1#

log` ě RSp ;*q ´ & ´log 2#

˙

ě >#p1q `1{4

expp2#>Cp1qqě

1{4expp#Bp&q{2q . (5.38)

Note that (5.37) and (5.38) contradict one another unless1#� log` ě RSp ;*q ´ 2& ´

log 2#

. (5.39)

It follows using (5.37) again that, for # large enough,

ˆ

1#

log` ď RSp ;*q ´ 4&˙

(5.39)ď ℙ

ˆ

log` ´� log` ď ´#&

˙

(5.37)ď >#p1q .

In the above, the le�-hand side does not depend on C, so it follows that

lim inf#Ñ8

1#

log` ě RSp ;*q

in probability. �is gives the lower bound in �eorem 1.2 in the case }D}8 ă 8. �

6. Local central limit theorem

In this section we state and prove Proposition 6.13 (used in the proofs of Proposition 2.12 and Proposition 5.1)and Proposition 6.14 (used in the proof of Proposition 5.7). Recall the calculation of ˜

� from Lemma 2.19. Given� P t´1,`1u# and � P ℝC´1, we de�ne

^ ” ^� ,� “ ˜� ` #1{22p�p�qqcrC ´ 1st� “

hrCst�@1{2 `

p1´ @qnrC ´ 1st�@1{2 ` #1{22p�p�qqcrC ´ 1st�

(2.15)“

hrCst�@1{2 ` nrC ´ 1st

"

p1´ @q�

@1{2 `2p�p�qq

#1{2 pp�#qtq´1�

*

. (6.1)

Let '0 (0 ď ") be independent scalar random variables, such that '0 has density given by (cf. De�nition 3.1)

"^0 ,2pIq ”*p^0 ` 2Iq!pIq

��*p^0 ` 2�q, (6.2)

Page 56: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

56 E. BOLTHAUSEN, S. NAKAJIMA, N. SUN, AND C. XU

where ^ ” ^� ,� as above, and 2 ” 2p�p�qq ” p1´ }�p�q}2q1{2. Note that

�'0 “��r�*p^0 ` 2�qs

��*p^0 ` 2�q(1.4)“ 2p�p�qq�}�p�q}2p^0q . (6.3)

Let n0 P ℝC´1 denote the 0-th column of the matrix nrC ´ 1s, and consider the random variable

] ”1

#1{2

ÿ

0ď"

p'0 ´�'0qn0 “nrC ´ 1sp' ´�'q

#1{2 P ℝC´1 . (6.4)

Let %� ,� denote the law of ] . We will compare %� ,� with the gaussian distribution on ℝC´1 that has mean zero andcovariance

Σ ” Σ� ,� ”1#

ÿ

0ď"

pVar '0qn0pn0qt P ℝpC´1qˆpC´1q . (6.5)

(We bound the singular values of Σ� ,� in Lemma 6.2 below.) �e majority of this section is occupied with provingthe following result:

Proposition 6.1 (local central limit theorem). Suppose* satis�es Assumptions 1 and 2. Recall that %� ,� is the law ofthe random variable] from (6.4). For any �nite constant �max, it holds with high probability that for all � P t´1,`1u#and all }�} ď �max, the measure %� ,� has a bounded continuous density ?� ,�. Moreover, again with high probability,

sup"

}?� ,� ´ 6� ,�}8 : � P t´1,`1u# , }�p�q} ď 45 , }�} ď �max

*

ď1

p2�qC´1#0.35 ď1#0.3 ,

where 6� ,� denotes the density of the centered gaussian distribution on ℝC´1 with covariance Σ ” Σ� ,�.

At the end of this section we will show that Proposition 6.1 readily implies the required results Propositions 6.13and 6.14. Towards the proof of Proposition 6.1, we introduce some notation. Write ?0 for the density function of therandom variable '0 ´�'0 , so in the notation of (6.2) we have

?0pIq “ "^0 ,2pI `�'0q “ "^0 ,2

ˆ

I `��r�*p^0 ` 2�qs

��*p^0 ` 2�q

˙

.

�e characteristic function of the random variable ] from (6.4) (i.e., the Fourier transform of the measure %� ,�) isgiven by the function

?psq ” ?� ,�psq ” � exppips,] qq “ź

0ď"

� exp"

ips, n0qp'0 ´�'0q#1{2

*

“ź

0ď"

?0

ˆ

ps, n0q#1{2

˙

, (6.6)

where ?0 denotes the Fourier transform of ?0 . �e Fourier transform of the gaussian density 6 ” 6� ,� is given by

6psq ” 6� ,�psq ” exp"

´ps,Σsq

2

*

“ź

0ď"

exp"

´ps, n0q2 Var '0

2#

*

”ź

0ď"

60psq . (6.7)

With ? ” ?� ,� as in (6.6) and 6 ” 6� ,� as in (6.7), we de�ne

�1p� , �q ”

ż

ˇ

ˇ

ˇ?� ,�psq ´ 6� ,�psqˇ

ˇ

ˇ1!

}s} ď #0.01)

3s , (6.8)

�2p� , �, &2q ”

ż

ˇ

ˇ

ˇ?� ,�psq ´ 6� ,�psqˇ

ˇ

ˇ1!

#0.01 ď }s} ď &2#1{2)

3s , (6.9)

�3p� , �, &2q ”

ż

ˇ

ˇ

ˇ?� ,�psq ´ 6� ,�psqˇ

ˇ

ˇ1!

}s} ě &2#1{2)

3s . (6.10)

In the analysis below we show that the integrals � 9p� , �q can be bounded uniformly over � P t´1,`1u# such that}�p�q} ď 4{5, and any bounded range of vectors �. �e remainder of this section is organized as follows:

‚ In §6.1 we bound the quantities �1 and �2 from (6.8) and (6.9).‚ In §6.2, in preparation for bounding �3 from (6.10), we prove rough estimates concerning the nondegeneracy

of the vectors arising from the AMP iteration.‚ In §6.3 we bound �3 from (6.10).

Page 57: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 57

‚ In §6.4 we combine the bounds from the preceding sections to �nish the proof of Proposition 6.1. We thenstate and prove Proposition 6.13 and 6.14.

�e analysis of this section is based on standard methods; see e.g. [Pet75, Bor17].

6.1. Fourier estimates at low and intermediate frequency. In this subsection we prove Lemmas 6.4 and 6.5,bounding the quantities �1 and �2 from (6.8) and (6.9).

Lemma 6.2. Suppose* satis�es Assumption 1 and 2, and let Σ be as in (6.5). Given any �max ă 8, there is a positiveconstant �1, depending on C and on �max, such that we have the bounds

inf"

pD,Σ� ,�Dq : }D} “ 1, � P t´1,`1u2 , }�p�q} ď45 , }�} ď �max

*

ě �1 , (6.11)

sup"

pD,Σ� ,�Dq : }D} “ 1, � P t´1,`1u2 , }�p�q} ď45 , }�} ď �max

*

ď1�1, (6.12)

with probability 1´ >#p1q.

Proof. Abbreviate E0 ” Var '0 , and note that Assumption 2 gives, with 2 “ 2p�p�qq ” p1´ }�p�q}2q2,

E0 “ Ep^0 , 2q ”12��rp� ´ �1q2*p^0 ` 2�q*p^0 ` 2�1qs

��,�1r*p^0 ` 2�q*p^0 ` 2�1qsď 2p*q

2 . (6.13)

It follows that, for any unit vector D P ℝC´1, and with �C as de�ned by Remark 2.4, we have

pD,ΣDq “1#

ÿ

0ď"

E0pn0 , Dq2 ď 2p*q

2#ÿ

0ď"

pn0 , Dq2

“ 2p*q

2#

›nrC ´ 1stD›

2 (2.15)“

2p*q#

2

›p�#qtD›

2ď 2p*q#�C

2 ,

which proves (6.11). Next, for any !, let "p!q Ď r"s denote the subset of indices 0 ď " satisfying the condition

m0 ” max"

|phpBqq0 |, |pnpℓqq0 | : B ď C , ℓ ď C ´ 1*

ď ! . (6.14)

It follows from Lemma 2.3 that with high probability we can bound

max"

1"

ÿ

0ď"

pphpBqq0q4 ,1"

ÿ

0ď"

ppnpℓqq0q4 : B ď C , ℓ ď C ´ 1*

ď ℘4

for a constant ℘4. As a result, for any �nite !, we can bound

1"

ÿ

0ď"

1!

|phpBqq0 | ě !)

ď1"

ÿ

0ď"

pphpBqq0q4

!4 ď℘4!4 ,

and similarly with npℓq in place of hpBq. It follows using the Cauchy–Schwarz that

1"

ÿ

0ď"

pnpℓq0 q21!

|phpBqq0 | ě !)

ď

ˆ

1"

ÿ

0ď"

pnpℓq0 q4˙1{2ˆ 1

"

ÿ

0ď"

1!

|phpBqq0 | ě !)

˙1{2ď

℘4!2 ,

1"

ÿ

0ď"

pnpℓq0 q21!

|pnp9qq0 | ě !)

ď

ˆ

1"

ÿ

0ď"

pnpℓq0 q4˙1{2ˆ 1

"

ÿ

0ď"

1!

|pnp9qq0 | ě !)

˙1{2ď

℘4!2 ,

where the bounds hold for all B ď C and all 9 , ℓ ď C´ 1. Combining these bounds gives, withm0 as de�ned in (6.14),1"

ÿ

0R"p!q

pnpℓq0 q2 “1"

ÿ

0ď"

pnpℓq0 q21!

|m0 | ě !)

ď2C℘4!2 . (6.15)

Page 58: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

58 E. BOLTHAUSEN, S. NAKAJIMA, N. SUN, AND C. XU

Next, it follows from the de�nition (6.1) of ^ ” ^� ,C that for all 0 ď ",

|^0 | ď}�}8@1{2

ÿ

BďC

|phpBqq0 | `ˆ

}�}8@1{2 `

�C}�}

#1{2

˙

ÿ

ℓďC´1|pnpℓqq0 |

ď�Cp1` }�}qp@#q1{2

ÿ

BďC

ˆ

|phpBqq0 | ` |pnpBqq0 |˙

. (6.16)

If }�} ď �max where (without loss) �max ě 1, then we obtain

max"

|^0 | : 0 P "p!q*

ď4�C�maxC!

p@#q1{2” !1 . (6.17)

It follows using Assumption 1 that for any �nite !1 we must have

inf"

EpG, 2q : 12 ď 2 ď 1, |G| ď !1

*

ě &p!1q ą 0 .

It follows that, for any unit vector D P ℝC´1, we have the lower bound

pD,ΣDq ě&p!1q

#

ÿ

0P"p!q

pn0 , Dq2 ě&p!1q

#

"

›nrC ´ 1stD›

ÿ

0R"p!q

pn0 , Dq2*

(2.15)ě &p!1q

"

#›

›p�#qtD›

1#

ÿ

0R"p!q

}n0}2*

(6.15)ě &p!1q

"

#

�C´

2C2℘4!2

*

ě&p!1q#

2�C,

where the last inequality can be arranged by taking ! large enough (note that ! depends on C, and !1 depends on !).�is proves the second assertion (6.12). �

Lemma 6.3 (Taylor expansion of characteristic function). Suppose * satis�es Assumptions 1 and 2. Let ?0 be as in(6.6), and recall that it depends on both � and �. It holds with high probability that

max"ˇ

ˇ

ˇ

ˇ

?0psq ´

ˆ

1´ps, n0q2 Var '0

2#

˙ˇ

ˇ

ˇ

ˇ

: � P t´1,`1u# , }�p�q} ď 45 , }�} ď #0.01

*

ď}s}3

#1.4

for all s P ℝC´1 and all 0 ď ".

Proof. It is well-known that for all G P ℝ we haveˇ

ˇ

ˇ

ˇ

4 8G ´

ˆ

1` 8G ´G2

2

˙ˇ

ˇ

ˇ

ˇ

ď|G|3

6 .

We also note that Lemma 3.3 implies the third moment bound

ˆ

ˇ

ˇ

ˇ'0 ´�'0ˇ

ˇ

ˇ

ď 8�p|'0 |3q “8��r|�|3*p^0 ` 2�qs

��*p^0 ` 2�qď 8

ˆ

�1p*q ` p8|^0 |q3˙

.

As a consequence, for all s P ℝC´1 we haveˇ

ˇ

ˇ

ˇ

?0psq ´

ˆ

1´ps, n0q2 Var '0

2#

˙ˇ

ˇ

ˇ

ˇ

ď|ps, n0q|3

6#3{2 �

ˆ

ˇ

ˇ

ˇ'0 ´�'0ˇ

ˇ

ˇ

ď4|ps, n0q|3

3#3{2

ˆ

�1p*q ` p8|^0 |q3˙

. (6.18)

By combining Lemma 3.15 with the bound (6.16) and the restriction }�} ď #0.01, we must have }^}8 ď #0.021 withhigh probability. �erefore, with high probability,

4|ps, n0q|3

3#3{2

ˆ

�1p*q ` p8|^0 |q3˙

ď4}s}3C3{2#0.03

3#3{2

ˆ

�1p*q ` p8|^0 |q3˙

ď}s}3

#1.4 .

Combining with (6.18) concludes the proof. �

Page 59: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 59

Lemma 6.4 (low-frequency estimate). Suppose* satis�es Assumption 1 and 2. In the notation of (6.8), we have

max"

�1p� , �q : � P t´1,`1u# , }�p�q} ď 45 , }�} ď #0.01

*

ď1

#0.38

with probability 1´ >#p1q.

Proof. Recall from (6.13) that Var '0 ď 2p*q{2. Combining with Lemma 3.15 gives, with high probability,ps, n0q2 Var '0

2# ď}s}2Cp}n0}8q2 2p*q

4# ď}s}2C 2p*q

4#0.98 ď}s}2

#0.97 ,

We have | logp1´ Gq ` G| ď G2 for all G small enough, so if }s} ď #0.01, then combining with Lemma 6.3 givesˇ

ˇ

ˇ

ˇ

log ?0psq `ps, n0q2 Var '0

2#

ˇ

ˇ

ˇ

ˇ

ď}s}3

#1.4 `

ˆ

}s}2

#0.97 `}s}3

#1.4

˙2

ď}s}3

#1.4 `

ˆ

2}s}2

#0.97

˙2ď

2}s}3

#1.4 ď1

#1.39 .

Summing the above over 0 ď " gives that the multiplicative error between ?psq and 6psq is small for all }s} ď #0.01.�erefore, with high probability, we have the bound

�1p� , �q ď

ż

6psq

"

expˆ

"

#1.39

˙

´ 1*

3s ďp2�qpC´1q{2

#0.385pdetΣq1{2ď

1#0.38 ,

uniformly over all � P t´1,`1u# and all }�} ď #0.01. �

Lemma 6.5 (moderate-frequency estimate). Suppose * satis�es Assumption 1 and 2. With the notation of (6.9), forany �nite constant �max, we can choose &2 depending on �max such that

sup"

�2p� , �, &2q : � P t´1,`1u# , }�p�q} ď 45 , }�} ď �max

*

ď1

expp#0.01q

with probability 1´ >#p1q.

Proof. It follows from the bound (6.18) in the proof of Lemma 6.3 that, with high probability, we have

|?psq| ď 6psq exp"

4}s}3

3#3{2

ÿ

0ď"

}n0}3´

�1p*q ` p8|^0 |q3¯

*

for all s P ℝC´1. Recall the bound (6.16) on ^ “ ^� ,�. If we assume without loss of generality that �max ě 1, thencombining (6.16) with Lemma 2.3 gives, with high probability,

sup"

43#

ÿ

0ď"

}n0}3´

�1p*q ` p8|^0 |q3¯

: � P t´1,`1u= , }�} ď �max

*

ď p�maxq3℘1 ,

where ℘1 is a �nite constant. On the other hand, by Lemma 6.2, with high probability

6psq ď exp"

´�1}s}2

2

*

.

It follows that, with high probability,

|?psq| ď exp"

´�1}s}2

2 `p�maxq

3℘1}s}3

#1{2

*

.

To ensure that the quadratic term dominates the cubic term, we restrict to }s} ď &2#1{2 where

&2 ”�1

4p�maxq3℘1.

For this choice of &2 we �nd that, with high probability, we have the bound

�2p� , �, &q ď

ż

}s}ě#0.01exp

"

´�1}s}2

4

*

3s ď1

expp#0.01q

uniformly over all � P t´1,`1u# and }�} ď �max. �

Page 60: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

60 E. BOLTHAUSEN, S. NAKAJIMA, N. SUN, AND C. XU

6.2. Non-degeneracy of TAP iterates. In this subsection we prove some preliminary results which will be usedin §6.3 to estimate the quantity �3 from (6.10).

Lemma 6.6. If � is any : ˆ" matrix such that ��t “ �: , then � has a : ˆ : submatrix* such that

| det*| ěˆ

:!":

˙1{2.

Proof. We argue by induction on :. If : “ 1 then � consists of a single row which is a unit vector in ℝ" , so clearly� must have an entry with absolute value at least 1{"1{2. Now suppose : ě 2 and that the claim has been provedup to : ´ 1. Denote the columns of � as b1 , . . . , b" where each b0 P ℝ: . Since

ÿ

0ď"

}b0}2 “ trp��tq “ : ,

there must exist at least one index 0 ď " with

}b0}2 ě:

".

We assume without loss that 0 “ 1. Let $ be a :ˆ : orthonormal matrix such that $b1 “ }b1}41, where 41 denotesthe �rst standard basis vector in ℝ: . Let � ” $�, and note that ��t “ $��t$t “ �: , so � also has orthonormalrows. We can further decompose

� “ $� “

ˆ

}b1} ˚

0 �

˙

where 0 denotes the zero vector in ℝ:´1, and � is a p: ´ 1q ˆ p" ´ 1q matrix with orthonormal rows. It followsfrom the inductive hypothesis that � has a p: ´ 1q ˆ p: ´ 1q submatrix * with

| det *| ěˆ

p: ´ 1q!p" ´ 1q:´1

˙1{2.

As a result, � has a : ˆ : submatrix * with

| det *| “ˇ

ˇ

ˇ

ˇ

detˆ

}b1} ˚

0 *

˙ˇ

ˇ

ˇ

ˇ

ě }b1} ¨ | det *| ě :1{2

"1{2

ˆ

p: ´ 1q!p" ´ 1q:´1

˙1{2ě

ˆ

:!":

˙1{2.

�e claim follows by noting that * “ $t* is a submatrix of the original matrix �. �

Corollary 6.7. If � is any : ˆ" matrix such that }��t ´ �:}8 ď 1{p3:q, then � has a : ˆ : submatrix* with

| det*| ě 13

ˆ

:!":

˙1{2. (6.19)

(In the above, as elsewhere, } ¨ }8 denotes the entrywise maximum absolute value of the matrix.)

Proof. Denote the rows of � as u1 , . . . , u: where each uℓ P ℝ" . Consider the Gram–Schmidt orthogonalization ofthese vectors: for each ℓ ď :, we decompose

uℓ ” uℓ ,‖ ` uℓ ,K ”ÿ

9ďℓ´12ℓ , 9u9 ` uℓ ,K

where uℓ ,‖ is the orthogonal projection of uℓ onto the span of u1 , . . . , uℓ´1. �en for all 9 ď ℓ ´ 1 we must have

0 “ puℓ ,K , u9q “ puℓ , u9q ´ÿ

8ďℓ´11t8 ‰ 9u2ℓ ,8pu8 , u9q ´ 2ℓ , 9}u9}2 .

Abbreviate & ” &p:q ” 1{p3:q, so the assumptions imply that }u9}2 ě 1 ´ & while |pu8 , u9q| ď & for all 8 ‰ 9.Rearranging the above gives an upper bound for |2ℓ , 9 | in terms of the other coe�cients 2ℓ ,8 (8 ‰ 9). If we furtherdenote 2max ” maxt|2ℓ , 9 | : ℓ ď :, 9 ď ℓ ´ 1u, then we have

2max ď&

1´ &

"

1` p: ´ 2q2max

*

.

Page 61: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 61

Rearranging the inequality gives the bound

2max ď&

1´ &

1´&p: ´ 2q

1´ &

˙

“&

1´ &p: ´ 1q ď3&2 .

From this bound we can deduce that for all ℓ ď : we have

}uℓ ,‖}2 “›

ÿ

9ďℓ´12ℓ , 9u9

ˆ

3&2

˙2"

pℓ ´ 1qp1` &q ` pℓ ´ 1qpℓ ´ 2q&*

ď

ˆ

3&2

˙2 4:3 “ & .

It follows that }uℓ ,K}2 “ }uℓ }2 ´ }uℓ ,‖}2 ě 1´ 2&. (We also have trivially }uℓ ,K}2 ď }uℓ }2 ď 1` &.) Let ' denotethe Gram–Schmidt matrix, so ' is : ˆ : lower triangular with entries

'ℓ , 9 “1

}uℓ ,K}

"

1tℓ “ 9u ´ 1tℓ ă 9u2ℓ , 9

*

.

Since ' is lower triangular, its determinant is simply the product of its diagonal entries, so1

p1` &q:ď det' “

ź

ℓď:

1}uℓ ,K}

ď1

p1´ 2&q:.

By construction, � “ '� is a : ˆ " matrix with orthonormal rows, so Lemma 6.6 implies that � has a : ˆ :

submatrix * with

| det *| ěˆ

:!":

˙1{2.

�erefore * “ '´1* is a : ˆ : submatrix of the original matrix �, with

| det*| “| det *|det' ě p1´ 2&q:

ˆ

:!":

˙1{2“

ˆ

1´ 23:

˙:ˆ:!":

˙1{2ě

13

ˆ

:!":

˙1{2,

where the bound holds for all : ě 1. �

Lemma 6.8. Suppose * satis�es Assumption 1 and 2. Recall from (2.15) that the matrix crC ´ 1s is pC ´ 1q ˆ" withorthonormal rows. Let "p!q Ď r"s be as de�ned in the proof of Lemma 6.2 (see (6.14)). If � “ �p!q is the submatrixof crC ´ 1s with column indices in "p!q, then with high probability it satis�es

}�}8 ďC�C!

p##q1{2.

It is possible to choose ! “ !pCq large enough such that, with high probability, }��t ´ �C´1}8 ď 1{p4Cq.

Proof. It follows using (2.15) that for each ℓ ď C ´ 1,´

pcpℓqq0¯2“

ˆ

ÿ

9ďC´1

pp�#q´1qℓ , 9pnp9qq0p##q1{2

˙2ďCp�Cq2

##

ÿ

9ďC´1

´

pnp9qq0¯2. (6.20)

Applying (6.20) for 0 P "p!q gives the claimed bound on }�p!q}8. On the other hand, by applying (6.20) for0 R "p!q and combining with the bound (6.15) from the proof of Lemma 6.2, we �nd, with high probability,

ÿ

0R"p!q

´

pcpℓqq0¯2ďCp�Cq2

##

ÿ

9ďC´1

ÿ

0R"p!q

´

pnp9qq0¯2 (6.15)

ď"C2p�Cq2

##¨

2C℘4!2 ,

which can be made ď 1{p4Cq by choosing ! large enough. �en, for any ℓ , 9 ď C ´ 1, we haveˇ

ˇ

ˇ

ˇ

ÿ

0P"p!q

pcpℓqq0pcp9qq0 ´ 1tℓ “ 9u

ˇ

ˇ

ˇ

ˇ

ˇ

ˇ

ˇ

ˇ

ÿ

0R"p!q

pcpℓqq0pcp9qq0ˇ

ˇ

ˇ

ˇ

ď14C ,

which shows that the matrix � “ �p!q satis�es }��t ´ �C´1}8 ď 1{p4Cq as desired. �

Corollary 6.9. Let"p!q Ď r"s be as in Lemma 6.8, where ! “ !pCq. With high probability, the matrix nrC ´ 1s hasdisjoint pC ´ 1q ˆ pC ´ 1q submatrices �1 , . . . , �t#0.9u, all involving only columns indexed by "p!q, such that each �8has minimal singular value lower bounded by a positive constant �2 (depending on C).

Page 62: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

62 E. BOLTHAUSEN, S. NAKAJIMA, N. SUN, AND C. XU

Proof. Let � “ �p!q be the submatrix of crC ´ 1s guaranteed by Lemma 6.8, so

}�}8 ďC�C!

p##q1{2, }��t ´ �} ď

14C .

�en � satis�es the conditions of Corollary 6.7, so it has a pC´ 1qˆ pC´ 1q submatrix*1 satisfying the determinantlower bound (6.19). Let �1 be the matrix obtained by deleting *1 from �. �en for all # large enough we have

›p�1qp�1qt ´ �C´1

14C ` C

ˆ

C�C!

p##q1{2

˙2ď

13C .

�us �1 also satis�es the conditions of Corollary 6.7, so it has a pC´1qˆpC´1q submatrix*2 which also satis�es thedeterminant lower bound (6.19). Repeating the same argument, we see that with high probability the original matrix� has disjoint pC ´ 1q ˆ pC ´ 1q submatrices*1 , . . . , *t#0.9u, all satisfying (6.19). Recalling (2.15), the correspondingsubmatrices of nrC ´ 1s are given by �8 ” p##q1{2�#*8 , and

| det�8 | ěp##qpC´1q{2| det*8 |

�Cě

ˆ

##

"

˙pC´1q{2ppC ´ 1q!q1{2

3�C” �2 .

Take any � “ �8 , and denote its singular values �1 ě . . . ě �C´1 ě 0. Note that �1 ď C}�}8 ď C!, where the lastbound holds since � only involves columns of nrC ´ 1s indexed by 0 P "p!q (as in Lemma 6.8). �en

�C´1 ě| det�|p�1qC´2 ě

�2pC!qC´2 ” �2 .

�is concludes the proof. �

6.3. Fourier estimates at high frequency. �e main result of this subsection is the following lemma:

Lemma 6.10 (high-frequency estimate). Suppose* satis�es Assumption 1 and 2. With the notation of (6.10), it holdsfor any �max ă 8 and any &2 ą 0 that

max"

�3p� , �, &2q : � P t´1,`1u# , }�p�q} ď 45 , }�} ď �max

*

ď1

expp#0.8q

with probability 1´ >#p1q.

Towards the proof of Lemma 6.10, recall that the random variable '0 has density given by (6.2). �us

?0pBq “ � exp"

iB´

'0 ´�'0¯

*

“"^0 ,2pBq

exppiB�'0q. (6.21)

We also denote @G,2pIq ” *pG ` 2Iq!pIq, and note that

"G,2pBq “@G,2pBq

��*pG ` 2�q“@G,2pBq

@G,2p0q. (6.22)

Note that Jensen’s inequality impliesˇ

ˇ

ˇ@G,2pBq ´ @G1 ,21pBqˇ

ˇ

ˇ “

ˇ

ˇ

ˇ

ˇ

ż

4 8BI´

*pG ` 2Iq ´*pG1 ` 21Iq¯

!pIq 3I

ˇ

ˇ

ˇ

ˇ

ď

ż

ˇ

ˇ

ˇ*pG ` 2Iq ´*pG1 ` 21Iqˇ

ˇ

ˇ!pIq 3I , (6.23)

and the last expression is bounded by Lemma 3.2.

Corollary 6.11. Suppose * satis�es Assumption 1. Given any & ą 0 and any ! ă 8, it is possible to choose largeenough (depending on & and !) such that

sup"

|"G,2pBq| : 12 ď 2 ď 2, |G| ď !, |B| ě

*

ď & ,

where "G,2 is as de�ned by (6.2).

Page 63: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 63

Proof. Recall from (6.22) the relation

"G,2pBq “@G,2pBq

��*pG ` 2�q“@G,2pBq

@G,2p0q.

By Assumption 1, the denominator @G,2p0q “ ��*pG ` 2�q is strictly positive for any given G P ℝ, 2 ą 0. On theother hand, it follows from Lemma 3.2 and (6.23) that @G,2p0q is continuous in pG, 2q. It follows that

inf"

��*pG ` 2�q : 12 ď 2 ď 2, |G| ď !

*

ă 8

for any �nite !. �erefore it su�ces to show the claim with @G,2 in place of "G,2 . By Lemma 3.2 again, given any& ą 0, we can choose �1 small enough such that

›@G,2 ´ @G1 ,21›

ż

ˇ

ˇ

ˇ*pG ` 2Iq ´*pG1 ` 21Iqˇ

ˇ

ˇ!pIq 3I ď&2

as long as 2, 21 P r1{2, 2s, G, G1 P r´!, !s, and maxt|G´ G1|, |2´ 21|u ď �1. Let tG8u be a �nite �1-net of r´!, !s, andlet t2 9u be a �nite �1-net of r1{2, 2s. It follows by the Riemann–Lebesgue lemma that there exists �nite such that

sup"

max8 , 9|@G8 ,2 9 pBq| : |B| ě

*

ď&2 .

For any |G| ď ! and 1{2 ď 2 ď 2, we can �nd G8 , 2 9 with maxt|G ´ G8 |, |2 ´ 2 9 |u ď �1, so

sup"

@G,2pBq : |B| ě

*

ď &

by combining the previous bounds. �is concludes the proof. �

Corollary 6.12. Suppose* satis�es Assumption 1. Let "G,2 be as de�ned by (6.2). �en

sup"

|"G,2pBq| : 12 ď 2 ď 2, |G| ď !, |B| ě &

*

ď 1´ &1 ă 1

for any �nite ! and any & ą 0, where &1 is a small positive constant depending on* , !, and &.

Proof. By Lemma 6.11, we can choose large enough such that

sup"

|"G,2pBq| : 12 ď 2 ď 2, |G| ď !, |B| ě

*

ď12 .

For any given G, 2, let ' be a random variable with density "G,2 . For any B ‰ 0,

|"G,2pBq| “

"

´

� cospB'q¯2`

´

� sinpB'q¯2*1{2

ă 1

by Jensen’s inequality. It follows from Lemma 3.2 and (6.23) that "G,2pBq is continuous in pG, 2, Bq, so

sup"

|"G,2pBq| : 12 ď 2 ď 2, |G| ď !, & ď |B| ď

*

ă 1

by compactness considerations. �e claim follows. �

Proof of Lemma 6.10. For any subset of indices ) “ t8p1q, . . . , 8pC ´ 1qu Ď r"s denote

))psq ”ź

ℓďC´1?8pℓqpsℓ q

for s P ℝC´1. It follows from (6.21) and Plancherel’s identity that the !2 norm of the function ?0pBq is the same asthe !2 norm of the function "^0 ,2pBq de�ned by (6.2). We also note that Assumption 1 implies

}"G,2}2 “

ż

*pG ` 2Iq2!pIq2

p��*pG ` 2�qq23I ď

1p1�q1{2

ż

*pG ` 2Iq!pIq

p��*pG ` 2�qq23I “

1��*pG ` 2�q

.

By compactness considerations (similarly as for (3.2)), we must have

inf"

��*pG ` 2�q : 12 ď 2 ď 2, |G| ď !1

*

ě 21p*, !1q .

Page 64: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

64 E. BOLTHAUSEN, S. NAKAJIMA, N. SUN, AND C. XU

If ) Ď "p!q (as de�ned by Lemma 6.8), then it follows by combining the above with (6.17) that

}))}2 “ź

ℓďC´1}?8pℓq}2 ď

ˆ

sup"

}"G,2}2 : 12 ď 2 ď 2, |G| ď !

*˙C´1ď

ˆ

121p*, !1q

˙C

” ℘5 .

Now let �1 , . . . , �t#0.9u be the submatrices of nrC ´ 1s guaranteed (with high probability) by Corollary 6.9. Let )8denote the subset of column indices involved in �8 , and note

|?psq| ďź

8ďt#0.9u

ˇ

ˇ

ˇ

ˇ

))8

ˆ

p�8qts

#1{2

˙ˇ

ˇ

ˇ

ˇ

.

Moreover, each individual factor ))8 has modulus at most one. Combining with the preceding !2 bound givesż

ˇ

ˇ

ˇ

ˇ

))8

ˆ

p�8qts

#1{2

˙ˇ

ˇ

ˇ

ˇ

23s “

#pC´1q{2p}))8 }2q2

| det�8 |ď#pC´1q{2p℘5q

2

p�2qC´1 .

It follows using the Cauchy–Schwarz inequality thatż

ˇ

ˇ

ˇ

ˇ

))1

ˆ

p�1qts

#1{2

˙

))2

ˆ

p�2qts

#1{2

˙ˇ

ˇ

ˇ

ˇ

3s ď#pC´1q{2p℘5q

2

p�2qC´1 .

On the other hand, if }s} ě &2#1{2, then the least singular value bound from Corollary 6.9 implies

max"

|ps, n0q|#1{2 : 0 P )8

*

“}p�8q

ts}8

#1{2 ě}p�8q

ts}

p#Cq1{2ě

�2&2

C1{2.

Recall again that for 0 P "p!q, |^0 | is bounded by (6.17). Combining with the result of Corollary 6.12 givesˇ

ˇ

ˇ

ˇ

))8

ˆ

p�8qts

#1{2

˙ˇ

ˇ

ˇ

ˇ

ď sup"

|"G,2pBq| : 12 ď 2 ď 2, |G| ď !1 , |B| ě

�2&2

C1{2

*

ď 1´ &1 ă 1 .

To conclude we note that the quantity �3p� , �, &2q from (6.10) can be bounded by �3,6 ` �3,? where �3,6 is the integralof 6� ,�, while �3,? is the integral of ?� ,�. By (6.7) and Lemma 6.2, we have with high probability

�3,6 ”

ż

ˇ

ˇ

ˇ6� ,�psqˇ

ˇ

ˇ1!

}s} ě &2#1{2)

3s ď1

expp#0.9q.

By the previous calculations, we also have with high probability

�3,? ď

ˇ

ˇ

ˇ

ˇ

ź

8“1,2))8

ˆ

p�8qts

#1{2

˙ˇ

ˇ

ˇ

ˇ

3s

*

¨ sup" #0.9ź

8“3

ˇ

ˇ

ˇ

ˇ

))8

ˆ

p�8qts

#1{2

˙ˇ

ˇ

ˇ

ˇ

: |s| ě &2#1{2*

ď#pC´1q{2p℘5q

2

p�2qC´1 p1´ &1q#0.85ď

1expp#0.8q

.

�is concludes the proof. �

6.4. Conclusion of local CLT. In this concluding subsection we prove the local CLT Proposition 6.1, and apply itto deduce Propositions 6.13 and 6.14.

Proof of Proposition 6.1. Recall that ?� ,� and 6� ,� are de�ned by (6.6) and (6.7). It follows by combining Lemmas 6.4,6.5, and 6.10 that for any �nite constant �max, we have

sup"ż

ˇ

ˇ

ˇ?� ,�psq ´ 6� ,�psqˇ

ˇ

ˇ 3s : � P t´1,`1u# , }�p�q} ď 45 , }�} ď �max

*

ď1

#0.35

with high probability. Inverting the Fourier transform shows that, with high probability, the random variable ]from (6.4) has a bounded continuous density function ?� ,�, which satis�es

sup"

}?� ,� ´ 6� ,�}8 : � P t´1,`1u# , }�p�q} ď 45 , }�} ď �max

*

ď1

p2�qC´1#0.35 ď1#0.3 ,

as claimed. �

Page 65: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 65

We now de�ne the transformed gaussian density

g� ,�pIq ” #1{2| det �# |6� ,�ˆ

#1{2�#pI ´ #1{2�q ´nrC ´ 1s�'

#1{2

˙

, (6.24)

where �' is as in (6.3).

Proposition 6.13 (density bound for �rst moment). Suppose* satis�es Assumptions 1 and 2. �en we have

sup"

›p� ,�p¨ | 6Rq ´ g� ,�p¨q›

8: � P t´1,`1u# , }�} ď �max

*

ď1

#0.25

with high probability, where p� ,�p¨ | 6Rq is as in (2.43), while g� ,� is as in (6.24).

Proof. Recall that Proposition 6.1 above estimates the density ?� ,� of the random variable ] from (6.4),

] “nrC ´ 1sp' ´�'q

#1{2 P ℝC´1 , (6.25)

where each '0 has density given by (6.2). On the other hand, let / P ℝ" be a random vector with independentcoordinates, such that /0 has density

?0pIq – *´

p ˜ �q0 ` 2I¯

exp"

#1{2�tcrC ´ 1se0I*

!pIq ,

where 2 “ 2p�p�qq, ˜� is as in Lemma 2.19, and– denotes equality up to a normalizing constant. We see from (2.43)

that p� ,�p¨ | 6Rq is the density of the random variable crC ´ 1s/, for / as we have just described. Note that

?0

ˆ

I ` #1{2�tcrC ´ 1se0˙

– *

ˆ

p ˜ �q0 ` 2!

I ` #1{2�tcrC ´ 1se0)

˙

!pIq

(6.1)“ *

´

p^� ,�q0 ` 2I¯

!pIq(6.2)– "^0 ,2pIq ,

so it follows that / ´ #1{2crC ´ 1st� is equidistributed as ' for ' as in (6.25). �us p� ,�p¨ | 6Rq is the same as thedensity of

crC ´ 1s´

' ` #1{2crC ´ 1st�¯

(2.15)“

p�#q´1nrC ´ 1s'p##q1{2

` #1{2�

(2.43)“

p�#q´1]

#1{2 `p�#q´1nrC ´ 1s�'

p##q1{2` #1{2� .

It follows by making a change of variables that

p� ,�pI | 6Rq “ #1{2| det �# |?� ,�ˆ

#1{2�#pI ´ #1{2�q ´nrC ´ 1s�'

#1{2

˙

.

Comparing with (6.24), we have›

›p� ,�p¨ | 6Rq ´ g� ,�p¨q›

8“ #1{2| det �# |

›?� ,� ´ 6� ,�

8,

so the result follows from Proposition 6.1. �

Proposition 6.14 (density bound for second moment). Suppose * satis�es Assumptions 1 and 2. �en the bound(5.23) holds with high probability, where p |� ,�p¨ | 6R , 'q is as in (5.21).

Proof. �rough we abbreviate 2 “ 2p�p qq. First we slightly modify the de�nition from (6.2): let 2 P ℝ" be arandom vector with independent coordinates, such that each 20 has density given by "^0 ,4p�q for

^ “ ^ |� ,�p'q ” ˜ ` 2 ¨

ˆ

�' ` p1´ �2q1{2#1{2crC ´ 1st�˙

and 4p�q “ 2 ¨ p1´ �2q1{2. In this de�nition, ˜ is as in Lemma 5.8, and � “ �p� , q. We de�ne also (cf. (6.4))

] 1 ”nrC ´ 1sp2 ´�2q

#1{2 P ℝC´1 . (6.26)

Page 66: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

66 E. BOLTHAUSEN, S. NAKAJIMA, N. SUN, AND C. XU

On the other hand, let / P ℝ" be a random vector with independent coordinates, such that �0 has density

?0pIq – *

ˆ

p ˜ q0 ` 2´

�'0 ` p1´ �2q1{2I¯

˙

exp"

#1{2�tcrC ´ 1se0I*

!pIq .

We see from (5.21) that p |� ,�p¨ | 6R , 'q is the density of the random variable crC ´ 1s/. Note that

?0

ˆ

I ` #1{2�tcrC ´ 1se0˙

– *

ˆ

^ |� ,�p'q ` 2 ¨ p1´ �2q1{2I

˙

!pIq – "^0 ,4p�q ,

which implies that / ´ #1{2crC ´ 1st� is equidistributed as 2. �us p |� ,�p¨ | 6R , 'q is the same as the density of

crC ´ 1s´

2 ` #1{2crC ´ 1st�¯

“p�#q´1] 1

#1{2 `p�#q´1nrC ´ 1s�2

p##q1{2` #1{2�

for 2 and ] 1 as de�ned above. It follows by a minor modi�cation of Proposition 6.1 (replacing ] from (6.4) with] 1 from (6.26)) that p |� ,�p¨ | 6R , 'q can be uniformly approximated by a gaussian density. �e claim follows. �

7. Concentration of partition function

In this section we prove Propositions 1.7, 1.8, and 1.10; and use these to conclude the proof of �eorem 1.2. �esection is organized as follows:

‚ As commented earlier, both Propositions 1.7 and 1.8 rely on a bound for near-isotropic gaussian processes,Proposition 7.1, which is proved in §7.1. See Remark 7.2 for further discussion of this result.

‚ In §7.2 we give the proof of Proposition 1.7.‚ In §7.3 we give the proof of Proposition 1.10, and use this to deduce that the free energy of the smoothed

model (1.24) is given by the replica symmetric formula (Corollary 7.10).‚ In §7.4 we give the proof of Proposition 1.8, and conclude the proof of �eorem 1.2.

Recall from §1.4 that Assumption 1 implies (1.22), where we can assume without loss that�p*q Ď r´�maxp*q, �maxp*qs

for some �nite �maxp*q.

7.1. Bounds for near-isotropic gaussian processes. �e following is a variant of [Tal11b, Cor. 8.2.5]:

Proposition 7.1. Let 2 P p0, 1{12s. Let v1 , . . . , v= be unit vectors in ℝ= such that pv8 , v9q ď 2 for all 8 ‰ 9. �en

ˆ

1=

ˇ

ˇ

ˇ

!

8 ď = : pg, v8q P �p*q)ˇ

ˇ

ˇ ď �

˙

ď �1{p252q

for all logp5{2q{plog =q ď � ď �0 “ �0p|�p*q|, �maxp*qq and = large enough.

Remark 7.2. We point out that there are two main di�erences between [Tal11b, Cor. 8.2.5] and Proposition 7.1.First, [Tal11b, Cor. 8.2.5] considers the event tpg, v8q ě 0u, and the proof relies crucially on Gordon’s inequality. Bycontrast, Proposition 7.1 considers the event tpg, v8q P �p*qu, where it does not seem possible to apply standardgaussian comparison inequalities. As a result we rely on more ad hoc arguments which yield a weaker bound, in thesense that [Tal11b, Cor. 8.2.5] holds for � polynomially small in = while Proposition 7.1 holds only for � decayinglogarithmically in =.

�e proof of Proposition 7.1 is given at the end of this subsection. We begin with some preparatory lemmas:

Lemma 7.3 (used in proof of Lemma 7.4). Let 2 P p0, 1q and denote �1p2q “ 1{ logp4{2q. For any P ℕ thereexists =0p2, q ă 8 such that the following holds for all = ě =0p2, q: if v1 , . . . , v= are unit vectors in ℝ= and< ď �1p2q log =, then there must exist distinct indices < ă 81 ă . . . ă 8 ď = such that

max"

›%<

´

v80 ´ v81¯›

› : 0, 1 ď

*

ď 2 ,

where %< denotes the orthogonal projection onto the span of tv1 , . . . , v<u.

Page 67: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 67

Proof. Suppose for contradiction that for all < ă 81 ă . . . ă 8 ď = we have

max"

›%<

´

v80 ´ v81¯›

› : 0, 1 ď

*

ą 2 . (7.1)

Let * denote the disjoint union of *<`1 , . . . , *= , where *8 is a copy of

�<

ˆ

%<v8 ,2

2

˙

ˆ

span!

v1 , . . . , v<)

˙

X �

ˆ

%<v8 ,2

2

˙

.

Note that if G P �p%<v8 , 2{2q then }G} ď }%<v8} ` 2{2 ď 3{2, so we have a natural mapping 8 : * Ñ �<p0, 3{2q.By the assumption (7.1), each point in �<p0, 3{2q has at most ´ 1 distinct preimages under the mapping 8, so

p= ´ <q vol �ˆ

%<v8 ,2

2

˙

“ vol* ď p ´ 1q vol �<ˆ

0,32

˙

.

If <1 “ dim spantv1 , . . . , v<u ď <, then it follows that

= ´ < ď p ´ 1qˆ

3{22{2

˙<1

ď

ˆ

32

˙�1p2q log =“ exp

"

logp3{2qlogp4{2q log =

*

,

which yields a contradiction for = large enough (depending on 2 and ). �

Lemma 7.4. Let 2 P p0, 1q and denote �1p2q “ 1{ logp4{2q. �ere exists =0p2q ă 8 such that the following holds forall = ě =0p2q: if v1 , . . . , v= are unit vectors in ℝ= with pv8 , v9q ď 2 for all 8 ‰ 9, then the vectors can be re-indexed insuch a way that

max"

›%<v<`1›

› : 1 ď < ď �1p2q log =*

ď p32q1{2 ,

where %< denotes the orthogonal projection onto the span of tv1 , . . . , v<u. (�e claim is non-trivial only if 2 ă 1{3.)

Proof. We shall assume the vectors are indexed such that for all 1 ď ℓ ď = we have›

›%ℓ´1vℓ›

› “ min"

›%ℓ´1v:›

› : ℓ ď : ď =

*

. (7.2)

Now suppose for the sake of contradiction that for some < ď �1p2q log = we have›

›%<v<`1›

(7.2)“ min

"

›%<v:›

› : < ` 1 ď : ď =

*

ą p32q1{2 . (7.3)

Take “ 2` r1{2s. By Lemma 7.3, for all = large enough we can �nd indices < ă 81 ă . . . ă 8 ď = such that

max"

›%<

´

v80 ´ v81¯›

› : 0, 1 ď

*

ď 2 . (7.4)

As a consequence, for any 0 ‰ 1 where 0, 1 ď , we haveˆ

p� ´ %<qv80 , p� ´ %<qv81˙

“ pv80 , v81 q ´ p%<v80 , %<v81 q

“ ´}%<v80 }2 `"

pv80 , v81 q ´´

%<v80 , %<pv81 ´ v80 q¯

*

ď ´2 ,

where the last bound uses (7.3), (7.4), and the assumption that pv8 , v9q ď 2 for all 8 ‰ 9. If we let

x0 ”p� ´ %<qv80

}p� ´ %<qv80 },

then the above implies that px0 , x1q ď ´2 for all 0 ‰ 1. It follows that

0 ď›

ÿ

x0›

2“

ÿ

0,1ď

px0 , x1q ď ´

1´ 2p ´ 1q¯

,

which gives a contradiction since we chose ě 2` 1{2. �

Page 68: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

68 E. BOLTHAUSEN, S. NAKAJIMA, N. SUN, AND C. XU

Lemma 7.5. Let v1 , . . . , v< be unit vectors in ℝ= (for any <, =) such that

max"

›%ℓ´1vℓ›

› : 1 ď ℓ ď <

*

ď 21 ď12 ,

where %ℓ´1 denotes the orthogonal projection onto the span of tv1 , . . . , vℓ´1u. Let g be a standard gaussian randomvector in ℝ= . �ere exists �0 “ �0p|�p*q|, �maxp*qq ą 0 such that

ˆ

1<

ˇ

ˇ

ˇ

!

8 ď < : pg, v8q P �p*q)ˇ

ˇ

ˇ ď �

˙

ď �1{p8p21q2q

for all 1{< ď � ď �0.

Proof. We shall assume without loss that <� is integer-valued. Let D8 ” pg, v8q, so that pD8q de�nes a (centered)gaussian random vector indexed by 8 ď =. For each 8 we can decompose D8 ” �8 ` �8 where �8 ” pg, %8´1v8q; atthe �rst step �1 “ 0. De�ne a parameter

B ” Bp*q ď max"

10, �maxp*q,

ˆ

ˇ

ˇ

ˇ log |�p*q|ˇ

ˇ

ˇ

˙1{2*

, (7.5)

and de�ne the random subset of indices� ”

"

8 ď < : |�8 | ď B

*

.

Let Ω� denote the event of interest,

Ω� ”

"

1<

ˇ

ˇ

ˇ

!

8 ď < : D8 P �p*q)ˇ

ˇ

ˇ ď �

*

.

On the event Ω� there must be a subset � Ď r<s of size <� such that D8 R �p*q for all 8 R �. �erefore

ℙpΩ�q ď ℙ

ˆ

|�| ď<

2

˙

`ÿ

|�|“<�

ˆ

D8 R �p*q @8 R �; |�| ą <

2

˙

. (7.6)

To bound the above we will consider a �xed subset �, without loss � “ t< ´ <� ` 1, . . . , <u. De�ne

�ℓ ” �

ˆ

p�8 , �8q : 1 ď 8 ď ℓ

˙

.

Let �0 ” 0 and de�ne the increasing sequence

�ℓ ” inf"

8 ą �ℓ´1 : 8 ď <, |�8 | ď B

*

.

Note that since �ℓ P �ℓ´1, the �ℓ are stopping times with respect to the �ltration �ℓ . We take the usual conventionthat inf ∅ ” 8, so the set of �nite stopping times corresponds exactly to the set �. Let 5 p8q ” 1tD8 R �p*qu. Itfollows from the assumption that �8 has the law of a gaussian random variable which is independent of �8 , and hasvariance between 1´ p21q2 ě 3{4 and 1. �erefore we have

?ℓ ” �

ˆ

1t�ℓ ă 8u 5 p�ℓ qˇ

ˇ

ˇ

ˇ

��ℓ´1

˙

“ 1t�ℓ ă 8uℙˆ

D�ℓ “ ��ℓ ` ��ℓ R �p*q

ˇ

ˇ

ˇ

ˇ

��ℓ´1

˙

ď max"

ˆ

/ R�p*q ´ G

˙

34

˙1{2ď � ď 1, |G| ď B

*

.

To bound the above, note that the set �´1p�p*q ´ Gq has Lebesgue measure at least |�p*q| (since � ď 1), and iscontained in the interval r´5B{2, 5B{2s (by the assumption B ě �maxp*q from (7.5), together with the restriction� ě p3{4q1{2). It follows that

?ℓ ď 1´ |�p*q|!ˆ

5B2

˙

ď 1´ 1p2�q1{2

exp"

´7B2

2

*

,

Page 69: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 69

where the last bound uses the assumption B2 ě | log |�p*q|| from (7.5). It then follows by iterated expectations that

ˆ

D8 R �p*q @8 R �; |�| ą <

2

˙

ď �

ź

9ď<{21t�9 ă 8u 5 p�9q

ď �

„ˆ

ź

9ď<{2´11t�9 ă 8u 5 p�9q

˙

ˆ

1t�r<{2s ă 8u 5 p�r<{2sq

ˇ

ˇ

ˇ

ˇ

��r<{2s´1

˙

ď

ˆ

1´ 1p2�q1{2

exp"

´7B2

2

*˙<{2ď exp

"

´< expp´7B2{2q

2p2�q1{2

*

.

Substituting this bound into (7.6) and accounting for the number of choices of � gives

ℙpΩ�q ď ℙ

ˆ

|�| ď<

2

˙

` exp"

<

ℋp�q ´ expp´7B2{2q2p2�q1{2

*

,

where ℋ denotes the binary entropy function, and satis�es ℋp�q ď � logp4{�q. If we take � “ expp´4B2q, then

ℋp�q ´ expp´7B2{2q2p2�q1{2

ď1

expp7B2{2q

ˆ

1` 4B2

exppB2{2q ´1

2p2�q1{2

˙

ď´1

6 expp7B2{2q ,

where the last bound uses the assumption B ě 10 from (7.5). It follows that

ℙpΩ�q ď ℙ

ˆ

|�| ď<

2

˙

` exp"

´<

6 expp7B2{2q

*

, (7.7)

and it remains to bound the probability that |�| ď <{2. To this end, note each �8 is a gaussian random variable withvariance at most p21q2, so

ℙp|�8 | ě Bq ď ℙp21|/| ě Bq ď21

Bexp

"

´B2

2p21q2

*

.

It follows by Markov’s inequality and the preceding bound that

ˆ

|�| ď<

2

˙

“ ℙ

ˆ

|�2 | ě<

2

˙

ď 2 max"

ℙp|�8 | ě Bq : 8 ď <

*

ď221B

exp"

´B2

2p21q2

*

ď12 exp

"

´B2

2p21q2

*

,

where the last bound follows trivially from the bounds 21 ď 1 and B ě 10 (from (7.5)). If < ě 1{�, thenB2

2 ¨ 6 expp7B2{2q “ 3B2

exppB2{2q ¨1�ď

1�ď < ,

so that (7.7) is dominated by the �rst term. It follows that

ℙpΩ�q ď exp"

´B2

2p21q2

*

“ �1{p8p21q2q ,

provided � “ expp´4B2q for B satisfying (7.5), and < ě 1{�. �is concludes the proof. �

Proof of Proposition 7.1. As in Lemma 7.4, let �1p2q “ 1{ logp4{2q. Let

< “

Z

12�1p2q log =

^

, ! “

Z

= ´ =1{2

<

^

.

By repeatedly applying Lemma 7.4, we see that there exists a re-indexing of v1 , . . . , v= such that

max"

›%ℓ<,8´1vℓ<`8›

2: 0 ď ℓ ď !´ 1, 1 ď 8 ď <

*

ď p32q1{2 ” 21 ď12 ,

where %ℓ<,8´1 denotes the orthogonal projection onto the span of tvℓ<`1 , . . . , vℓ<`8´1u. Let

#ℓ ”

ˇ

ˇ

ˇ

ˇ

!

1 ď 8 ď < : pg, vℓ<`8q P �p*q)

ˇ

ˇ

ˇ

ˇ

.

Page 70: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

70 E. BOLTHAUSEN, S. NAKAJIMA, N. SUN, AND C. XU

Note that if #ℓ ě 2<� for at least ={p2<q indices 0 ď ℓ ď ! ´ 1, then we will have pg, v8q P �p*q for at least =�indices 1 ď 8 ď =. It follows by combining with Markov’s inequality that

ˆ

1=

ˇ

ˇ

ˇ

!

8 ď = : pg, v8q P �p*q)ˇ

ˇ

ˇ ď �

˙

ď ℙ

ˆ

ÿ

ℓď!

1t#ℓ ď 2<�u ě=

3<

˙

ď3<=

ÿ

0ďℓď!´1ℙp#ℓ ď 2<�q .

Applying Lemma 7.5 gives, for 1{< ď 2� ď �0 “ �0p|�p*q|, �maxp*qq,

ˆ

1=

ˇ

ˇ

ˇ

!

8 ď = : pg, v8q P �p*q)ˇ

ˇ

ˇ ď �

˙

ď 4p2�q1{p242q .

�e claim follows. �

7.2. Polynomial concentration of free energy. In this subsection we give the proof of Proposition 1.7. Towardsthis end, we �rst state and prove Lemma 7.6 below. �is is an adaptation of [Tal11b, Propn. 8.2.6] (see also [Tal11b,Lem. 9.2.2]), using Proposition 7.1 in place of [Tal11b, Cor. 8.2.5].

Lemma 7.6. Let � be any probability measure on t´1,`1u# with weights proportional to Fp�q such that 0 ď Fp�q ď

1{2# for all � P t´1,`1u# , and, “

ÿ

Fp�q ě 4´#�

for � “ expp´12q. If ℙ denotes the law of a standard gaussian vector g in ℝ# , then

ˆ

ˆ"

� P t´1,`1u# :pg, �q#1{2 P �p*q

ď�

4

˙

(7.11)ď �11{2 ,

for expp14q{# ď � ď �0 “ �0p|�p*q|, �maxp*qq and # large enough.

Proof. First, it follows by a direct application of [Tal11b, Lem. 9.2.1] that since , ě expp´#�q, we have

�b2ˆ"

p�1 , �2q P t´1,`1u2# :p�1 , �2q

#ě p8�q1{2

ď1

expp2#�q. (7.8)

We then proceed to adapt the proof of [Tal11b, Propn. 8.2.6]. Let

&= ”

"

�1:= ” p�1 , . . . , �=q P t´1,`1u=# :p� : , �ℓ q

#ď p8�q1{2 @1 ď : ă ℓ ď =

*

.

It follows from (7.8) (and taking a union bound over all 1 ď : ă ℓ ď =) that

�b=p&=q(7.8)ě 1´ =2

2 expp2#�qě

12 , (7.9)

where the last inequality holds provided = ď expp#�q. Next de�ne

Ω�p�1:=q ”

"

g : 1=

ˇ

ˇ

ˇ

ˇ

"

ℓ ď = :pg, �ℓ q#1{2 P �p*q

*

ď �

*

.

If we take 2 “ p8�q1{2, then 2 ď 1{12 by the assumption � “ expp´12q, and so Proposition 7.1 implies that for every�1:= P &= we have the bound

´

Ω�p�1:=q

¯

ď �1{p252q , (7.10)for logp5{2q{plog =q ď � ď �0 and = large enough. De�ne the random variable

Υ� ”ÿ

�1:=P&=

�b=p�1:=q1!

g P Ω�p�1:=q

)

,

and note that Markov’s inequality combined with (7.10) gives

ˆ

Υ� ě14

˙

ď�Υ�

1{4 “ 4ÿ

�1:=P&=

�b=p�1:=qℙ´

Ω�p�1:=q

¯ (7.10)ď 4�1{p252q . (7.11)

Page 71: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 71

On the other hand, we can lower bound

Γ ” �

ˆ"

� P t´1,`1u# :pg, �q#1{2 P �p*q

“ÿ

�1:=

�b=p�1:=q1=

ˇ

ˇ

ˇ

ˇ

"

ℓ ď = :pg, �ℓ q#1{2 P �p*q

ˇ

ˇ

ˇ

ě �ÿ

�1:=P&=

�b=p�1:=q1!

g R Ω�p�1:=q

)

“ �

ˆ

�b=p&=q ´ Υ�

˙

(7.9)ě �

ˆ

12 ´ Υ�

˙

.

As a consequence, if Γ ď �{4, we must have Υ� ě 1{4. It follows that

ˆ

Γ ď�

4

˙

ď ℙ

ˆ

Υ� ě14

˙

(7.11)ď 4�1{p252q ,

again for logp5{2q{plog =q ď � ď �0 and = large enough. Recall moreover that for (7.9) to hold we must have= ď expp#�q, so we must ultimately require

�0 ě � ělogp5{2q#�

“logp5{p8�q1{2q

#�.

�e claim follows by recalling � “ expp´12q and 2 “ p8�q1{2. �

We now proceed to prove Proposition 1.7. �is is an adaptation of the proof of [Tal11b, Propn. 9.2.6], using theabove result Lemma 7.6 in place of [Tal11b, Propn. 8.2.6].

Proof of Proposition 1.7. As in the proof of �eorem 1.2 in the bounded case, let ℙ9 denote probability conditional onthe �rst 9 rows of M, and let�9 denote expectation with respect to ℙ9 . �en, as in the proof of [Tal11b, Propn. 9.2.6],we let ] ” `{2# and decompose

1#

"

log#�] ´� log#�]

*

“ÿ

9ď"

1#

"

�9 log#�] ´�9´1 log#�]

*

”ÿ

9ď"

-9 .

To bound -9 , recall (5.36) and denote

]9 ”` 9

2#”ÿ

F 9p�q ”ÿ

12#

ź

0ď",0‰9

*

ˆ

pg0 , �q#1{2

˙

.

Note that 0 ď] ď]9 ď 1. Since ]9 does not depend on the 9-th row of M, we can rewrite

#-9 “ �9

ˆ

log#�] ´ log#�]9

˙

´�9´1ˆ

log#�] ´ log#�]9

˙

.

Recall that 0 ď] ď]9 , so if ]9 ď 4´#� then log#�]9 “ ´#� “ log#�] . It follows that

! 9 ” log#�]9 ´ log#�] “ 1!

]9 ě 4´#�)

ˆ

log#�]9 ´ log#�]

˙

P r0, #�s .

Recall that ℙ9 denotes probability conditional on all rows of M except the 9-th one, and note �9´1 “ �9�9 where

�9 is expectation with respect to ℙ9 . We can rewrite -9 “ ´ 9G 9 ` :G 9 where

# 9G 9 ” �9

´

log#�]9 ´ log#�]¯

;]9 ě 4´#�

“ �9! 9 P r0, #�s ,

# :G 9 ” �9´1

´

log#�]9 ´ log#�]¯

;]9 ě 4´#�

“ �9´1! 9 “ �9p# 9G 9q . (7.12)

For comparison let -9 “ ´ 9I 9 ` :I 9 where 9I 9 ” 9G 9 ´ 94 9 and :I 9 ” :G 9 ´ :4 9 , for

# 94 9 ” �9

! 9 ;]]9

ă�1414

4#

“ �9

! 9 ;]]9

ă�1414

4#

P r0, #�s ,

#:4 9 ” �9´1

! 9 ;]]9

ă�1414

4#

“ �9´1„

! 9 ;]]9

ă�1414

4#

“ �9p# 94 9q .

Page 72: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

72 E. BOLTHAUSEN, S. NAKAJIMA, N. SUN, AND C. XU

Let � 9 be as in Lemma 7.6, and note the assumption *pGq ě �11tG P �p*qu implies

]]9

ě �1� 9

ˆ"

� P t´1,`1u# :pg9 , �q#1{2 P �p*q

” �1Γ9 .

Since 0 ď ! 9 ď #�, we can use Markov’s inequality to bound

0 ď �9p# 94 9q “ #:4 9 ď #��9´1„

1!

]9 ě 4´#�)

ℙ9

ˆ

]]9

ă�1414

4#

˙

ď #�

ˆ

�1414

#

˙11{2(7.13)

where the last inequality is by Lemma 7.6. It follows using Markov’s inequality again that

ˆ

ÿ

9ď"

ˇ

ˇ

ˇ-9 ´ -9

ˇ

ˇ

ˇ ě1

2#2

˙

ď 2#2ÿ

9ď"

´

94 9 ` :4 9

¯ (7.13)ď #3

ˆ

�1414

#

˙11{2. (7.14)

It remains to bound the random variables -9 “ ´ 9I 9 ` :I 9 . Using Jensen’s inequality,

expp#:I 9q ď �9´1 expp# 9I 9q ď 1`�9´1

]9

];]9 ě 4´#� ,

]9

4#�1414

.

It then follows by using Lemma 7.6 again that the above can be bounded by

1`�9´1„

]9

];]9 ě 4´#� ,

]9

4#�1414

ď 1`ż 4#{p�1414q

0ℙ9´1

ˆ

]9 ě 4´#�;]9

]ě D

˙

3D

ď 1` 4�1�0

`

ż 8

4{p�1�0q

ˆ

4�1D

˙11{23D ď �0 ” �0p|�p*q|, �maxp*q, �

1q . (7.15)

It follows that we can choose �0 small enough (depending on �0) such that for all 0 ď � ď �0,

�9´1„

expp#�|-9 |q

ď expp#�:I 9q ¨�9´1

expp#� 9I 9q

ď expp#�:I 9q ¨

ˆ

�9´1 expp# 9I 9q

˙�

ď p�0q2� ď 2 .

It follows by the martingale version of Bernstein’s inequality (see e.g. [Tal11b, eq. (A.41)]) that

ˆˇ

ˇ

ˇ

ˇ

ÿ

9ď"

-9

ˇ

ˇ

ˇ

ˇ

ě C

˙

ď 2 expˆ

´#C�

2 min"

1, C�2

for all C ě 0. In particular, taking C “ plog#q{#1{2 gives

ˆˇ

ˇ

ˇ

ˇ

ÿ

9ď"

-9

ˇ

ˇ

ˇ

ˇ

ělog##1{2

˙

ď expˆ

´�2plog#q2

2

˙

. (7.16)

�e claimed bound follows by combining (7.14) with (7.16). �

7.3. Exponential concentration for smoothed model. In this subsection we give the proof of Proposition 1.10,showing concentration for the log-partition function of the smoothed model (1.24).

�eorem 7.7 (Pisier [Pis86]). If 5 : ℝ= Ñ ℝ is �1, and - and. are independent standard gaussian random variablesin ℝ= , then for any convex function 6 : ℝÑ ℝ it holds that

�6´

5 p-q ´ 5 p.q¯

ď �6

ˆ

�2 p∇ 5 p-q, .q

˙

.

In particular, taking 6pGq “ exppBGq for any real number B gives

� exp"

5 p-q ´ 5 p.q¯

*

ď � exp"

B2�2

8 }∇ 5 p-q}2*

.

In the case that∇ 5 is bounded, this recovers the standard theorem [TIS76] (see also [Bor75]) on concentration of Lipschitzfunctionals of gaussian random variables.

Page 73: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 73

We also recall that if M is an " ˆ# matrix with i.i.d. standard gaussian entries and " ď # , then the maximumsingular value BmaxpMq satis�es the tail bound

ˆ

BmaxpMq ě p�2#q1{2 ` C

˙

ď2

expp22C2q(7.17)

for all C ě 0, where 22 and �2 are absolute constants. See for instance [RV10, Propn. 2.4] where the result is in factstated more generally for matrices with independent subgaussian entries (with mean zero and unit variance). Fromthis bound it is straightforward to deduce the following:

Lemma 7.8. IfM is an"ˆ# matrix with i.i.d. standard gaussian entries and" ď # , then we can take 22 “ 1{�2 ď 1in the bound (7.17). With this choice of constants, we have

� exp´

'BmaxpMq2¯

ď 16# expp2'�2#q

for all 0 ď ' ď 22{2 “ 1{p2�2q.

Proof. It follows by a change of variables that

�p'q ” � exp´

'BmaxpMq2¯

ż 8

0ℙ

ˆ

expp'BmaxpMq2q ě G

˙

3G

“ 2'ż 8

0D expp'D2q ¨ ℙ

ˆ

BmaxpMq ě D

˙

3D ď (I)` (II) ,

where (I) is the contribution to the integral from D ď p�2#q1{2, while (II) is the contribution from D ě p�2#q

1{2.We then have the trivial bound

(I) ď 2'ż p�2#q1{2

0D expp'D2q 3D ď 2' expp'�2#q

ż p�2#q1{2

0D 3D

“ '�2# expp'�2#q ď#

2 expp'�2#q ,

where the last inequality uses the assumption ' ď 22{2 “ 1{p2�2q. For the other term, it follows from the singularvalue tail bound (7.17) (and again using ' ď 22{2 “ 1{p2�2q) that

(II) ď 4'ż 8

0

ˆ

p�2#q1{2 ` D

˙

exp"

p�2#q1{2 ` D

¯2´ 22D

2*

3D

ď 4' expp'�2#q

ż 8

0

ˆ

p�2#q1{2 ` D

˙

exp"

2'p�2#q1{2D ´

22D2

2

*

3D .

Completing the square and making another change of variables gives

(II) ď 4' exp"ˆ

1` 2'22

˙

'�2#

*ż 8

´8

ˇ

ˇ

ˇ

ˇ

D `

ˆ

1` 2'22

˙

p�2#q1{2ˇ

ˇ

ˇ

ˇ

exp"

´22D

2

2

*

3D

ď4'

p22q1{2expp2'�2#q

ż 8

´8

ˇ

ˇ

ˇ

ˇ

D

p22q1{2` 2p�2#q

1{2ˇ

ˇ

ˇ

ˇ

exp"

´D2

2

*

3D

ď expp2'�2#q4'p2�q1{2

22

´

1` 2#1{2¯

ď 6p2�q1{2 ¨ # expp2'�2#q .

Combining the bounds for (I) and (II) gives the claimed bound. �

Lemma 7.9. Suppose * satis�es Assumption 1, and let `p�q be as de�ned by (1.24). If 5 “ log`p�q viewed as afunction of the gaussian disorder M, then there exists a �nite constant �1p* ;�q such that

� expˆ

B2}∇ 5 pMq}2˙

ď 16# ¨ exp"

# ¨ 6�2�1p* ;�q2B2*

for all |B| ď p22q1{2{p2�1p* ;�qq, where �2 and 22 are the constants from Lemma 7.8.

Page 74: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

74 E. BOLTHAUSEN, S. NAKAJIMA, N. SUN, AND C. XU

Proof. Recall that *� ” * ˚ !�, and denote D� ” log*�. Denote the probability meausure

��p�q ”F�p�q

`p�q“

1`p�q

ź

0ď"

*�

ˆ

pg0 , �q#1{2

˙

“1

`p�q

ź

0ď"

*�pΔ0q ,

where we abbreviate Δ0 “ pg0 , �q{#1{2. �enˇ

ˇ

ˇ

ˇ

35

360,8

ˇ

ˇ

ˇ

ˇ

ˇ

ˇ

ˇ

ˇ

ÿ

��p�qpD�q1pΔ0q

�8

#1{2

ˇ

ˇ

ˇ

ˇ

ď1

#1{2

ÿ

Fp�q|pD�q1pΔ0q| .

Note that if pD�q1 were uniformly bounded, then 5 would be �-Lipschitz with � “ }pD�q1}8{#1{2, and the desiredexponential concentration for log`p�qwould follow from standard concentration theorems for Lipschitz functionalsof gaussians. Since pD�q1 may be unbounded, we cannot conclude that 5 is Lipschitz. However, we note that

pD�q1pGq “

��r�*pG ` ��qs

���*pG ` ��qď �1p* ;�q

´

1` |G|¯

,

where the last bound holds by an obvious extension of Lemma 3.3 (using Assumption 1). �ereforeˇ

ˇ

ˇ

ˇ

35

360,8

ˇ

ˇ

ˇ

ˇ

ď�1p* ;�q#1{2

ˆ

1`ÿ

��p�q|Δ0 |

˙

“�1p* ;�q#1{2

ˆ

1` x|Δ0 |y�˙

,

where x¨y� denotes expectation over ��. It follows that

}∇ 5 }2 ď �1p* ;�q2ÿ

0ď"

ˆ

1` x|Δ0 |y�˙2ď 2�1p* ;�q2

ÿ

0ď"

ˆ

1` px|Δ0 |y�q2˙

ď 2�1p* ;�q2ÿ

0ď"

ˆ

1` xpΔ0q2y�˙

“ 2�1p* ;�q2"

" `}M�}2

#

*

ď 2�1p* ;�q2"

" ` BmaxpMq2*

,

where BmaxpMq denotes the maximum singular value of M, as above. Taking the expectation over M and applyingLemma 7.8 gives

� expˆ

B2}∇ 5 pMq}2˙

ď 16# ¨ exp"

" ` 2�2#¯

�1p* ;�q2B2*

,

where the bound holds provided |B| ď p22q1{2{p2�1p* ;�qq. �e result follows by recalling that we assumed " ď #

and �2 ě 1. �

Proof of Proposition 1.10. Let M1 be an independent copy of M. It follows by �eorem 7.7 and Lemma 7.9 that

� exp"

5 pMq ´� 5 pMq¯

*

ď � exp"

5 pMq ´ 5 pM1q¯

*

ď � exp"

B2�2

8 }∇ 5 pMq}2*

ď 16# ¨ exp"

# ¨ 8�2�1p* ;�q2B2*

,

for all |B| ď p22q1{2{p3�1p* ;�qq. �us, for G ě 0, it holds for 0 ď B ď p22q

1{2{p3�1p* ;�qq that

ˆ

5 pMq ´� 5 pMq ě #G

˙

ď � exp"

5 pMq ´� 5 pMq¯

´ #BG

*

ď 16# ¨ exp"

8�2�1p* ;�q2B2 ´ BG¯

*

.

A similar bound holds for G ď 0. In any case it is clear that we can take B small enough to obtain exponential decay.In particular, for G ě 0 small enough we can let

B “G

16�2 ¨ �1p* ;�q2 ďp22q

1{2

3�1p* ;�q ,

Page 75: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 75

where the last bound holds for G ď 5p�2q1{2�1p* ;�q. �is results in the bound

ˆ

ˇ

ˇ

ˇ 5 pMq ´� 5 pMqˇ

ˇ

ˇ ě #G

˙

ď 32# ¨ exp"

´#G2

32�2�1p* ;�q2

*

,

which concludes the proof. �

Corollary 7.10. Suppose* satis�es Assumptions 1 and 2, and let `p�q be as in (1.24). �en

lim#Ñ8

1#

log`p�q “ RSp ;*�q

for all 0 ă ď 1p*q.

Proof. Recall from (3.13) that if 0 ă ď 1p*q, then we will also have ď p*�q for � small enough. �e upperbound on `p�q follows from the upper bound in �eorem 1.2, which was already proved at the end of Section 4. Forthe lower bound on `p�q, we argue similarly as in the proof of the �eorem 1.2 lower bound for the case }D}8 ă 8,but using the concentration result from Proposition 1.10 in place of the Azuma–Hoe�ding bound. To this end, let¯ p�q be de�ned as ¯ from (5.3), but with *� in place of * . It follows from �eorem 1.6 (by the same calculationleading to (5.35)) that, with high probability,

ˆ

1#

log ¯ p�q ě RSp ;*�q ´ >Cp1qˇ

ˇ

ˇ

ˇ

ℱpCq

˙

ě1

expp#>Cp1qq.

On the other hand, it follows from Proposition 1.10 that, again with high probability,

ˆ

1#

log` ě 1#� log`p�q ` G

ˇ

ˇ

ˇ

ˇ

ℱpCq

˙

ď 35# ¨ exp"

´#G2

35�2�1p* ;�q2

*

for su�ciently small G ą 0. �e above two bounds are in contradiction with one another unless1#� log`p�q ě RSp ;*�q ´ >#p1q .

It then follows by another application of Proposition 1.10 that

ˆ

1#

log`p�q ď RSp ;*�q ´ >#p1q ´ G

˙

ď ℙ

ˆ

1#

log`p�q ď 1#� log`p�q ´ G

˙

ď 35# ¨ exp"

´#G2

35�2�1p* ;�q2

*

for su�ciently small G ą 0. �is yields the lower bound for `p�q and concludes the proof. �

7.4. Comparison with smoothed model and conclusion. In this subsection we prove Proposition 1.8 whichgives the comparison between the quantities ` and `p�q from (1.1) and (1.24). We then conclude the proof of themain theorem.

Proof of Proposition 1.8. Some of the steps below are similar to the steps in the proof of Proposition 1.7. Let

\: ”1

2#ÿ

"

ź

0ď:

*�

ˆ

pg0 , �q#1{2

˙*"

ź

:ă0ď"

*

ˆ

pg0 , �q#1{2

˙*

.

Recall ] ” `{2# , and write ] ” `p�q{2# . Note \0 “] , and \" “ ] . Let us also de�ne

\:,˝ ”1

2#ÿ

"

ź

0ă:

*�

ˆ

pg0 , �q#1{2

˙*"

ź

:ă0ď"

*

ˆ

pg0 , �q#1{2

˙*

”ÿ

F:,˝p�q . (7.18)

Note that \:,˝ ě maxt\:´1 ,\:u. We can then decompose1#�

ˆ

log#� ] ´ log#�]

˙

“1#

ÿ

:ď"

ˆ

log#� \: ´ log#� \:´1

˙

“ÿ

:ď"

H:

Page 76: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

76 E. BOLTHAUSEN, S. NAKAJIMA, N. SUN, AND C. XU

(compare with (7.12)). Let �:,˝ be the probability measure on t´1,`1u# with weights proportional to F:,˝p�q asde�ned by (7.18). Write x¨y:,˝ for expectation with respect to �:,˝. Abbreviate

*: ” *

ˆ

pg: , �q#1{2

˙

, *: ” *�

ˆ

pg: , �q#1{2

˙

.

Recalling that *pGq ą �11tG P �p*qu (from (1.22)), we have

\:\:,˝

“ x*:y:,˝ ě �1�:,˝

ˆ"

� P t´1,`1u# :pg: , �q#1{2 P �p*q

” �1Γ:,˝ .

For � small enough we will also have *�pGq ą �11tG P �p*qu, so we can also bound

\:´1\:,˝

“ x*:y:,˝ ě �1�:,˝

ˆ"

� P t´1,`1u# :pg: , �q#1{2 P �p*q

“ �1Γ:,˝ .

We also have from [Tal11b, Lem. 8.3.10] that if G, H, I ď 1, thenˇ

ˇ

ˇ

ˇ

log�pGIq ´ log�pHIqˇ

ˇ

ˇ

ˇ

ď

ˇ

ˇ

ˇ

ˇ

log� G ´ log� Hˇ

ˇ

ˇ

ˇ

¨ 1!

I ě 4´�)

.

Combining the above bounds gives

|#H: | ď �

ˇ

ˇ

ˇ

ˇ

log#�

´

\:,˝x*:y:,˝

¯

´ log#�

´

\:,˝x*:y:,˝

¯

ˇ

ˇ

ˇ

ˇ

ď �

„ˇ

ˇ

ˇ

ˇ

log#�x*:y:,˝ ´ log#�x*:y:,˝

ˇ

ˇ

ˇ

ˇ

;\:,˝ ě 4´#�

ď (I)` (II) ,

for (I) and (II) de�ned by

(I) ” #�ℙ

ˆ

\:,˝ ě 4´#� , �1Γ:,˝ ă414

4#

˙

,

(II) ” �„ˇ

ˇ

ˇ

ˇ

logˆ

1`x*:y:,˝ ´ x*:y:,˝

�1Γ:,˝

˙ˇ

ˇ

ˇ

ˇ

;\:,˝ ě 4´#� , Γ:,˝ ě414

4#

.

Combining with Lemma 7.6 gives (similarly to (7.13))

(I) ď #�

ˆ

414

#

˙11{2.

Meanwhile, using the bound logp1` Gq ď G together with the Cauchy–Schwarz inequality gives

(II) ď �„ˇ

ˇ

ˇ

ˇ

x*:y:,˝ ´ x*:y:,˝

�1Γ:,˝

ˇ

ˇ

ˇ

ˇ

;\:,˝ ě 4´#� , Γ:,˝ ě414

4#

ď1�1

"

´

x*: ´*:y:,˝

¯2

¨�

1pΓ:,˝q

2 ;\:,˝ ě 4´#� , Γ:,˝ ě414

4#

*1{2.

For the �rst factor we note that

´

x*: ´*:y:,˝

¯2

ď

B

p*: ´*:q2ı

F

:,˝

“ �

´

*�p�q ´*p�q¯2

ď >�p1q .

For the second factor, applying Lemma 7.6 again gives (similarly to (7.15))

1pΓ:,˝q

2 ;\:,˝ ě 4´#� , Γ:,˝ ě414

4#

ď

ż p4#{414q2

0ℙ

ˆ

414

4# ď Γ:,˝ ď1H1{2

˙

3H

ď

ˆ

�0

4

˙2`

ż p4#{414q2

p�0{4q2

ˆ

4H1{2

˙11{23H ď

412

p�0q7{2,

where �0 “ �0p|�p*q|, �maxp*qq is as in Proposition 7.1. Altogether it follows that

|#H: | ď #�

ˆ

414

#

˙11{2`

>�p1q46

�1p�0q7{4,

Page 77: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 77

and the claim follows by summing over : ď " “ # . �

We now �nally �nish the proof of the main theorem:

Proof of �eorem 1.2 (conclusion). �e proof of the upper bound was given at the end of Section 4, a�er the proof of�eorem 1.5. �e proof of the lower bound in the case }D}8 ă 8 was given at the end of Section 5, a�er the proofof �eorem 1.6. It remains to prove the lower bound in the case }D}8 “ 8. We follow the proof sketch given at theend of Section 1. It follows from Propositions 1.7 and 1.8 that

ˆˇ

ˇ

ˇ

ˇ

1#

log#�

ˆ

`2#

˙

´1#� log#�

ˆ

`p�q

2#

˙ˇ

ˇ

ˇ

ˇ

ěplog#q2

#1{2 ` >�p1q˙

ď >#p1q . (7.19)

Given & ą 0, we can choose � small enough such that the >�p1q error above is at most & in absolute value. ByProposition 1.9 together with Corollary 7.10, for 0 ă ď 1p*q and � small enough we have

ˆ

1#

log`p�q ď RSp ;*q ´ 2&˙

ď ℙ

ˆ

1#

log`p�q ď RSp ;*�q ´ &

˙

ď >#p1q , (7.20)

It follows from Corollary 3.8 that RSp ;*q ě log 2´ �{4 for 0 ă ď p*q, so taking & ď �{8 in the above gives

0 ď 1#

"

� log#�

ˆ

`p�q

2#

˙

´� logˆ

`p�q

2#

˙*

ď �ℙ

ˆ

1#

log`p�q

2#ď ´

�2

˙

(7.20)ď >#p1q .

It follows by combining with (7.19) and (7.20) that

ˆ

1#

log#�

ˆ

`2#

˙

ď RSp ;*q ´ log 2´ 4&˙

ď >#p1q .

Since RSp ;*q ´ log 2´ 4& ě ´�{4´ 4& ě ´�, it follows that in fact

ˆ

1#

log` ď RSp ;*q ´ 4&˙

ď >#p1q ,

as claimed. �

Appendix A. Review of AMP for perceptron

In §A.1 and A.2 we prove Lemma 2.15. In the rest of the section, we give a heuristic derivation of the stateevolution recursions introduced in De�nition 2.1. We emphasize that §A.1 and A.2 are rigorous, while §A.3–A.4 arenot (and are intended only to provide intuition). For rigorous derivations of the asymptotics described in §A.3–A.4,we again refer the reader to [BM11, Bol14].

A.1. Gaussian conditioning results. Suppose for simplicity that C , � : ℝ Ñ ℝ are two smooth functions. Let /denote a standard gaussian random variable. Suppose we have p@,#q such that (cf. (1.5))

ˆ

@

#

˙

ˆ

�rCp#1{2/q2s

�r�p@1{2/q2s

˙

. (A.1)

Let mp0q “ 0 P ℝ# , np0q “ 0 P ℝ" , mp1q “ @1{21 P ℝ# , np1q “ p#{ q1{21 P ℝ" . �e AMP iteration in this se�ingis given by (cf. (1.16) and (1.17))

mpC`1q “ C

ˆ

MtnpCq

#1{2 ´ �mpC´1q˙

P ℝ# , (A.2)

npC`1q “ �

ˆ

MmpCq

#1{2 ´ �npC´1q˙

P ℝ" , (A.3)

where the Onsager coe�cients are de�ned as (cf. (1.18))˜

¸

ˆ

��1p@1{2/q

�C1p#1{2/q

˙

.

A preliminary observation is the following:

Page 78: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

78 E. BOLTHAUSEN, S. NAKAJIMA, N. SUN, AND C. XU

Lemma A.1. Let M be an " ˆ # matrix with jointly gaussian entries. Suppose r is a unit vector in ℝ# , while c is aunit vector in ℝ" . �en �pM |Mrq “ Mrrt, �pM |Mtcq “ cctM, and

ˆ

M

ˇ

ˇ

ˇ

ˇ

Mr,Mtc˙

“ pMrqrt ` cpMtcqt ´ pctMrqcrt ” �´

r, c,Mr,Mtc¯

” � . (A.4)

Proof. Denote X0 ” e0rt for 0 ď ", and denote I8 ” cpe8qt for 8 ď # . �e equations Mr “ x and Mtc “ y areequivalent to the linear constraints

G0 “ e0Mr “ pM, e0rtq “ pM,X0q ,

H8 “ cMpe8qt “ pM, cpe8qtq “ pM, I8q ,

where p¨, ¨q above denotes the Frobenius inner product. Note then that

�pM |Mr “ xq “ÿ

0ď"

pM,X0qX0 “ÿ

0ď"

G0e0rt “ xrt “ Mrrt ,

�pM |Mtc “ yq “ÿ

8ď#

pM, I8qI8 “ÿ

8ď#

H8cpe8qt “ cyt “ cctM .

�e claim (A.4) follows by noting that �r “ x, �tc “ y, and � is in the span of the pX0 , I8q. �

Let rp1q , . . . , rpCq be the Gram–Schmidt orthogonalization of the vectors mp1q , . . . ,mpCq. Likewise let cp1q , . . . , cpCqbe the Gram–Schmidt orthogonalization of the vectors np1q , . . . , npCq. Let Mp1q ” M, and suppose recursively thatMpBq has been de�ned. Let MpBqrpBq “ xpBq, pMpBqqtcpBq “ ypBq, and de�ne (cf. (A.4))

MpB`1q ” MpBq ´ �ˆ

rpBq , cpBq , xpBq , ypBq˙

” MpBq ´ �pBq . (A.5)

We also de�ne a corresponding �-�eld

ℱ‹pCq ” �

ˆ

pxpBq : B ď Cq, pypBq : B ď Cq

˙

. (A.6)

�e next lemma records some basic facts about ℱ‹pCq.

Lemma A.2. For the AMP iteration described above, the random variablesˆ

´

mpBq , npBq , rpBq , cpBq : B ď C ` 1¯

Mmpℓq ,Mtnpℓq ,Mrpℓq ,Mtcpℓq , �pℓq : ℓ ď C¯

˙

are all measurable with respect to ℱ‹pCq.

Proof. Recall that the initial vectors mp0q, np0q, mp1q, np1q are �xed and deterministic, so they are measurable withrespect to the trivial �-�eld ℱ‹p0q. From these we can also obtain the deterministic vectors rp1q and cp1q. Next weconsider the �-�eld ℱ‹p1q: it is clear that �p1q is ℱ‹p1q-measurable. Next note that

xp1q “ Mrp1q “Mmp1q

}mp1q}, yp1q “ Mtcp1q “

Mtnp1q

}np1q},

so we see that Mmp1q and Mtnp1q are measurable with respect to ℱ‹p1q. We can then apply the AMP iteration (A.2)and (A.3) to obtain mp2q and np2q, so these are also measurable with respect to ℱ‹p1q. It follows by Gram–Schmidtorthogonalization that rp2q and cp1q are also ℱ‹p1q-measurable.

Now suppose inductively that the claim holds up to ℱ‹pC ´ 1q, and consider the �-�eld ℱ‹pCq. �en the matrix�pCq is clearly ℱ‹pCq-measurable. Next note that (A.5) implies

M “ Mp1q “ �p1q ` Mp2q “ . . . “

C´1ÿ

B“1�pBq ` MpCq ,

where the �pBq, B ď C ´ 1, are all measurable with respect to ℱ‹pC ´ 1q Ď ℱ‹pCq. �erefore

MrpCq “C´1ÿ

B“1�pBqrpCq ` xpCq

Page 79: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 79

is ℱ‹pCq-measurable, as is MtcpCq. Recall from the Gram–Schmidt orthogonalization that

rpCq “mpCq ´

ř

BďC´1pmpCq , rpBqqrpBq

}mpCq ´ř

BďC´1pmpCq , rpBqqrpBq},

which we can rearrange to obtain an expression for mpCq. It follows from this that MmpCq is ℱ‹pCq-measurable,as is MtnpCq. We can then apply the AMP iteration (A.2) and (A.3) to obtain mpC`1q and npC`1q, so these are alsomeasurable with respect to ℱ‹pCq. Finally, it follows by Gram–Schmidt orthogonalization that rpC`1q and cpC`1q arealso ℱ‹p1q-measurable. �is veri�es the inductive hypothesis and proves the claim. �

A.2. Projection and resampling. In this subsection we give the proof of Lemma 2.15. For notational convenience,the roles of M and M1 through this section are switched from the main body of the paper.

De�nition A.3 (similar to De�nition 2.13). Given ℱ‹pC ´ 1q as in (A.6), consider the linear subspaces

+RpCq ” span"

e0pmpBqqt : 1 ď 0 ď ", 1 ď B ď C

*

,

+CpCq ” span"

npℓqpe8qt : 1 ď 8 ď #, 1 ď ℓ ď C

*

.

It follows from Lemma A.2 that these (random) subspaces are measurable with respect to ℱ‹pC ´ 1q. Let +‹pCq “+RpCq `+CpCq, and let projC denote orthogonal projection onto +‹pCq.

We remark that +‹pCq is very similar to the (random) subspace +RC “ +RpCq ` +CpC ´ 1q which appears in theproof of Lemma 2.15. We will address the discrepancy between +‹pCq and +RC in the proof of Lemma 2.15, below.�e following is a straightforward consequence of the preceding lemmas and the de�nition:

Corollary A.4. Let M be an " ˆ # matrix with i.i.d. standard gaussian entries. Withℱ‹pCq as in (A.6),

´

ˇ

ˇℱ‹pCq¯

“ÿ

BďC

�pBq “ M ´ MpC`1q “ projCpMq ,

where projC is the orthogonal projection onto the (random) subspace+‹pCq from De�nition A.3. Moreover, conditional onℱ‹pC ´ 1q, MpC`1q is distributed as a standard gaussian element of the (ℱ‹pC ´ 1q-measurable) subspace +‹pCqK, and isindependent of ℱ‹pCq.

Proof. Note that the recursive de�nition (A.5) implies

MpCq “ MpC´1q ´ �pC´1q “ . . . “ M ´C´1ÿ

B“1�pBq . (A.7)

By induction, conditional on ℱ‹pC ´ 2q, the random matrix MpCq is distributed as a standard gaussian element of the(ℱ‹pC´2q-measurable) subspace+‹pC´1qK, and is independent of ℱ‹pC´1q. It follows that MpCq has jointly gaussianentries conditional on ℱ‹pC ´ 1q. We also have from Lemma A.2 that the vectors rpCq and cpCq are measurable withrespect to ℱ‹pC ´ 1q. It follows by applying Lemma A.1 (conditional on ℱ‹pC ´ 1q) that

´

MpCqˇ

ˇ

ˇℱ‹pCq¯

“ �ˆ

rpCq , cpCq , xpCq , ypCq˙

“ �pCq , (A.8)

and MpC`1q “ MpCq ´ �pCq and ℱ‹pCq are independent given ℱ‹pC ´ 1q. It follows that

´

ˇ

ˇℱ‹pCq¯

(A.5)“ �

ˆ

ÿ

BďC´1�pBq ` MpCq

ˇ

ˇ

ˇ

ˇ

ℱ‹pCq

˙

“ÿ

BďC´1�pBq `�

ˆ

MpCqˇ

ˇ

ˇ

ˇ

ℱ‹pCq

˙

(A.8)“

ÿ

BďC

�pBq(A.7)“ M ´ MpC`1q .

We also note that MpB`1qrpBq “ pMpBq ´ �pBqqrpBq “ 0 P ℝ" by the construction of �pBq, and as a result

�pB`1qrpBq “ˆ

Mrrt ` cctM ´ pctMrqcrt˙pB`1q

rpBq “ 0 P ℝ" ,

Page 80: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

80 E. BOLTHAUSEN, S. NAKAJIMA, N. SUN, AND C. XU

and likewise p�pB`1qqtcpBq “ 0 P ℝ# . One can then show by induction that for all B ă C we have MpCqrpBq “ 0 P ℝ" ,and likewise pMpCqqtcpBq “ 0 P ℝ# . �is implies, for all ℓ ď C,

ˆ

M ´ÿ

BďC

�pBq˙

rpℓq (A.7)“ MpC`1qrpℓq “ 0 P ℝ" ,

ˆ

M ´ÿ

BďC

�pBq˙t

cpℓq (A.7)“ pMpC`1qqtcpℓq “ 0 P ℝ# .

It follows from this that MpC`1q is orthogonal to +‹pCq. On the other hand

M ´ MpC`1q “ÿ

BďC

�pBq

lies in +‹pCq, so we see that this the orthogonal projection of M onto +‹pCq, as claimed. We also see that MpC`1q isthe orthogonal projection of MpCq onto +‹pCqK, which con�rms the inductive hypothesis. �

�e next result is similar to Lemma 2.15:

Lemma A.5. Let M be an " ˆ # matrix with i.i.d. gaussian entries, and use it to de�ne ℱ‹pCq as in (A.6). As inDe�nition A.3, let projC denote the orthogonal projection onto the ℱ‹pC ´ 1q-measurable subspace +‹pCq. �en, for anybounded measurable function 5 : ℝ"ˆ# Ñ ℝ, we have

ˆ

5 pMpC`1qq

ˇ

ˇ

ˇ

ˇ

ℱ‹pCq

˙

“ �

ˆ

5 pMpC`1qq

ˇ

ˇ

ˇ

ˇ

ℱ‹pC ´ 1q˙

(A.9)

“ �

ˆ

M1 ´ projCpM1q¯

ˇ

ˇ

ˇ

ˇ

ℱ‹pC ´ 1q˙

(A.10)

where M1 is an independent copy of M.

Proof. We saw in Corollary A.4 that conditional on ℱ‹pC´ 1q, the random matrix MpC`1q is distributed as a standardgaussian element of the (ℱ‹pC ´ 1q-measurable) subspace +‹pCqK. As a result, MpC`1q and ℱ‹pCq are independentconditional onℱ‹pC´1q, so the �rst claim (A.9) follows. Since M and M1 are independent, if we condition onℱ‹pC´1qthen the random matrix M1 ´ projCpM1q is also distributed as a standard gaussian element of +‹pCqK. �is implies(A.9). �

Remark A.6. We can also give a more explicit description of the projection of M1 onto +‹pCq, although it is notneeded in the above proof of Lemma A.5. De�ne M‚p1q ” M1, and recursively

M‚pC`1q ” M‚pCq ´ �‚pCq ” MpCq ´ �ˆ

rpCq , cpCq ,M‚pCqrpCq , pM‚pCqqtrpCq˙

. (A.11)

Note that �‚pCq is de�ned using the vectors rpCq and cpCq that came from M, not M1. As in De�nition A.3, we let projCdenote the orthogonal projection onto the ℱ‹pC ´ 1q-measurable subspace +‹pCq. We then claim that

projCpM1q “ M1 ´ M‚pC`1q (A.11)“

ÿ

BďC

�‚pCq . (A.12)

�is is very similar to the proof of Corollary A.4, but in fact simpler because M and M1 are independent, which impliesthat M1 is independent of the random subspace +‹pCq. Arguing as before, we have by construction M‚pB`1qrpBq “ 0and pM‚pB`1qqtcpBq “ 0. One can then show by induction that for all B ă C we have M‚pCqrpBq “ 0 and pM‚pCqqtcpBq “0. �is implies, for all ℓ ď C,

ˆ

M1 ´ÿ

BďC

�‚pBq˙

rpℓq (A.11)“ M‚pC`1qrpℓq “ 0 P ℝ" ,

ˆ

M1 ´ÿ

BďC

�‚pBq˙t

cpℓq (A.11)“ pM‚pC`1qqtcpℓq “ 0 P ℝ# .

It follows from this that MpC`1q is orthogonal to +‹pCq. �is veri�es (A.12), since we see that the right-hand side of(A.12) lies in +‹pCq.

Page 81: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 81

Proof of Lemma 2.15. Recall that, for notational convenience, the roles of M and M1 in this section are switched fromthe statement of Lemma 2.15. �us, for the purposes of the proof, we use M for the AMP iteration (A.2) and (A.3),and this de�nes ℱ‹pCq as in (A.6). We also let R and C be as in De�nition 2.14, but with M and M1 switched. �e�-�eld ℱpCq from (1.19) is very closely related to ℱ‹pC´ 1q, but is not exactly the same: indeed, we can see from theproof of Lemma A.2 that

ℱpCq “ �

ˆ

ℱ‹pC ´ 1q, xpCq˙

“ �

ˆ

ℱ‹pC ´ 1q,MmpCq , npC`1q˙

.

By a similar (but simpler) argument as in Corollary A.4, we see that

´

ˇ

ˇℱpCq¯

“ÿ

BďC´1�pBq ` rpCqpxpCqqt “ projRCpMq “ MRC ,

where projRC denotes orthogonal projection onto +RC as in De�nition 2.13, except that +RC here is de�ned for Mrather than M1. Conditional on ℱ‹pC ´ 1q, the random matrix M ´ projRCpMq is distributed as a standard gaussianelement of the ℱ‹pC ´ 1q-measurable vector space p+RCq

K, and is conditionally independent of ℱpCq. �erefore

´

5 pMqˇ

ˇ

ˇℱpCq¯

“ �

MRC `

´

M1 ´ projRCpM1q¯¯

ˇ

ˇ

ˇ

ˇ

ℱpCq

“ �

´

5 pM1qˇ

ˇ

ˇR,C,MRC

¯

.

�is concludes the proof. �

A.3. AMP iterates at C “ 2 and C “ 3. Returning to the AMP iteration (A.2) and (A.3) we have (cf. (2.10) and (2.11))

mp2q ” CpHp2qq “ C

ˆ

Mtnp1q

#1{2

˙

“ C

ˆ

#1{2Mt np1q

p##q1{2

˙

“ Cp#1{2Mtcp1qq “ Cp#1{2yp1qq ,

np2q ” �php2qq “ �

ˆ

Mmp1q

#1{2

˙

“ �

ˆ

@1{2Mmp1q

p#@q1{2

˙

“ �p@1{2Mrp1qq “ �p@1{2xp1qq . (A.13)

It follows using (A.1) that }mp2q}2 » #@, }np2q}2 » ##, and moreover (cf. (2.4))

pmp2q ,mp1qq

#@»

ˆ

1@

˙1{2�Cp#1{2/q ” �1 ” �1 ,

pnp2q , np1qq##

»

ˆ

#

˙1{2��p@1{2/q ” �1 ” �1 . (A.14)

�erefore in the Gram–Schmidt orthogonalization we have

rp2q “pmp2qqK

}pmp2qqK}»

mp2q ´ �1mp1q

r#@p1´ p�1q2qs1{2,

cp2q “pnp2qqK

}pnp2qqK}»

np2q ´ �1np1q

r##p1´ p�1q2qs1{2. (A.15)

We can express the m, n vectors in terms of the r, c vectors asmp2q

p#@q1{2» �1rp1q `

´

1´ p�1q2¯1{2

rp2q ,

np2q

p##q1{2» �1cp1q `

´

1´ p�1q2¯1{2

cp2q . (A.16)

At the next step of the AMP iteration we have (cf. (A.13))

mp3q ” CpHp3qq “ C

ˆ

Mtnp2q

#1{2 ´ �mp1q˙

“ C

ˆ

pMp2qqtpnp2qqK

#1{2 `p�p1qqtnp2q

#1{2 ´ �mp1q˙

,

np3q ” Cphp3qq “ �

ˆ

Mmp2q

#1{2 ´ �np1q˙

“ �

ˆ

Mp2qpmp2qqK

#1{2 `�p1qmp2q

#1{2 ´ �np1q˙

. (A.17)

Page 82: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

82 E. BOLTHAUSEN, S. NAKAJIMA, N. SUN, AND C. XU

In order to evaluate p�p1qqtnp2q{#1{2, we calculate"

rrtMt

#1{2

*p1qnp2q “

mp1q

#@1{2 pxp1q , np2qq “

mp1q

#@1{2 pxp1q , �p@1{2xp1qqq » �mp1q ,

"

Mtcct

#1{2

*p1qnp2q “

yp1q

#1{2pnp1q , np2qqp##q1{2

» #1{2�1yp1q ,

"

pctMrqrct

#1{2

*p1qnp2q “

1#1{2

1tM1# 1{2

pnp1q , np2qq#p@#q1{2

mp1q “ $

ˆ

1#1{2

˙

mp1q .

In order to evaluate �p1qmp2q{#1{2, we calculate"

Mrrt

#1{2

*p1qmp2q “

xp1q

#1{2pmp1q ,mp2qq

p#@q1{2» @1{2�1xp1q ,

"

cctM#1{2

*p1qmp2q “

np1q

##1{2 pyp1q ,mp2qq “

np1q

##1{2 pyp1q , Cp#1{2yp1qqq » �np1q ,

"

pctMrqcrt

#1{2

*p1qmp2q “

1#1{2

1tM1# 1{2

pmp1q ,mp2qq

#p@#q1{2np1q “ $

ˆ

1#1{2

˙

np1q .

Substituting this back into (A.17) gives the decomposition (cf. (2.10), (2.11), and (A.13))

mp3q ” CpHp3qq » C

ˆ

#1{2"

�1yp1q `´

1´ p�1q2¯1{2

yp2q*˙

,

np3q ” �php3qq » �

ˆ

@1{2"

�1xp1q `´

1´ p�1q2¯1{2

xp2q*˙

. (A.18)

It follows that (A.14) continues to hold (approximately) with mp3q, np3q in place of mp2q, np2q. We also see by com-bining (A.13) with (A.18) that

pHp2q ,Hp3qq##

»1#

ˆ

yp1q , �1yp1q `´

1´ p�1q2¯

yp2q˙

» �1 ” �1 ,

php2q , hp3qq"@

»1"

ˆ

xp1q ,�1xp1q `´

1´ p�1q2¯

xp2q˙

» �1 ” �1 , (A.19)

from which we obtain (cf. (2.5))

pmp2q ,mp3qq

#@»

1@�

«

C

ˆ

#1{2"

�1/ `´

1´ p�1q2¯1{2

Cp#1{2/q

ff

” �p�1q “ �p�1q ” �2 ,

pnp2q , np3qq##

» #�

«

ˆ

@1{2"

�1/ `´

1´ p�1q2¯1{2

�p@1{2/q

ff

” �p�1q “ �p�1q ” �2 . (A.20)

It follows that (cf. (2.6))ˆ

mp3q

p#@q1{2, rp2q

˙

»

ˆ

mp3q

p#@q1{2,

mp2q ´ �1mp1q

r#@p1´ p�1q2qs1{2

˙

»�2 ´ p�1q

2

r1´ p�1q2s1{2” �2 ,

ˆ

np3q

p##q1{2, cp2q

˙

»

ˆ

np3q

p##q1{2,

np2q ´ �1np1q

r##p1´ p�1q2qs1{2

˙

»�2 ´ p�1q

2

r1´ p�1q2s1{2” �2 . (A.21)

�en in the Gram–Schmidt orthogonalization we have (cf. (A.15))

rp3q »mp3q ´ p#@q1{2p�1rp1q ` �2rp2qqr#@p1´ p�1q2 ´ p�2q2qs1{2

cp3q »np3q ´ p##q1{2p�1cp1q ` �2cp2qqr##pp1´ p�1q2 ´ p�2q2qs1{2

. (A.22)

Page 83: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 83

We can express the m, n vectors in terms of the r, c vectors as (cf. (2.14), (2.15), and (A.16))mp3q

p#@q1{2» �1rp1q ` �2rp2q `

´

1´ p�1q2 ´ p�2q

2¯1{2

rp3q ,

np3q

p##q1{2» �1cp1q ` �2cp2q `

´

1´ p�1q2 ´ p�2q

2¯1{2

cp3q . (A.23)

A.4. AMP iterates at C “ 4. At the next step of the AMP iteration we have (cf. (A.17))

mp4q ” CpHp4qq “ C

ˆ

pMp3qqtpnp3qqK

#1{2 `p�p2qqtnp3q

#1{2 `p�p1qqtnp3q

#1{2 ´ �mp2q˙

,

np4q ” �php4qq “ �

ˆ

Mp3qpmp3qqK

#1{2 `�p2qmp3q

#1{2 `�p1qmp3q

#1{2 ´ �np2q˙

. (A.24)

For the purposes of evaluating p�pBqqnp3q for B “ 1, 2 we calculate"

rrtM#1{2

*p1qnp3q “

rp1q

#1{2 pxp1q , np3qq »

rp1q

#1{2# �

„ˆ

�1/ `´

1´ p�1q2¯1{2

˙

�p@1{2/q

“ p#@q1{2��1rp1q ,"

Mcct

#1{2

*p1qnp3q “

yp1q

#1{2 pcp1q , np3qq “ #1{2�1yp1q .

Substituting back into (A.24) gives (cf. (2.10), (2.11), (A.13), and (A.18))

mp4q ” CpHp4qq » C

ˆ

#1{2"

�1yp1q ` �2yp2q `´

1´ p�1q2 ´ p�2q

2¯1{2

yp3q*˙

,

np4q ” �php4qq » �

ˆ

@1{2"

�1xp1q ` �2xp2q `´

1´ p�1q2 ´ p�2q

2¯1{2

xp3q*˙

. (A.25)

It follows that (A.14) continues to hold (approximately) with mp4q, np4q in place of mp2q, np2q. Likewise, (A.21)continues to approximately hold with mp4q, np4q in place of mp3q, np3q. We also have (cf. (A.19))

pHp3q ,Hp4qq##

» p�1q2 ` �2

´

1´ p�1q2¯1{2 (A.21)

“ �2 ,

php3q , hp4qq"@

» p�1q2 ` �2

´

1´ p�1q2¯1{2 (A.21)

“ �2 , (A.26)

from which it we obtain (cf. (2.5) and (A.20))pmp4q ,mp3qq

#@» �p�2q ” �3 ,

pnp4q , np3qq##

» �p�2q ” �3 . (A.27)

It then follows that (cf. (2.6) and (A.21))ˆ

mp4q

p#@q1{2, rp3q

˙

»

ˆ

mp4q

p#@q1{2,mp3q{p#@q1{2 ´ �1rp1q ´ �2rp2q

r1´ p�1q2 ´ p�2q2s1{2

˙

»�3 ´ p�1q

2 ´ p�2q2

r1´ p�1q2 ´ p�2q2s1{2” �3 ,

ˆ

np4q

p##q1{2, cp3q

˙

»

ˆ

np4q

p##q1{2,np3q{p##q1{2 ´ �1cp1q ´ �2cp2q

r1´ p�1q2 ´ p�2q2s1{2

˙

»�3 ´ p�1q

2 ´ p�2q2

r1´ p�1q2 ´ p�2q2s1{2” �3 . (A.28)

In summary, using the notation (2.7), we have (cf. (2.10), (2.11), (A.13), (A.18), and (A.25))

mpC`1q ” CpHpC`1qq » C

ˆ

#1{2"

�1yp1q ` . . . ` �C´1ypC´1q ` p1´ ΓC´1q1{2ypCq

, (A.29)

npC`1q ” �phpC`1qq » �

ˆ

@1{2"

�1xp1q ` . . . ` �C´1xpC´1q ` p1´ΛC´1q1{2xpCq

. (A.30)

Page 84: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

84 E. BOLTHAUSEN, S. NAKAJIMA, N. SUN, AND C. XU

where the coe�cients are de�ned recursively: we start with �1 ” �1 and �1 ” �1 as in (A.14) (cf. (2.4)). For B ě 1we let �B`1 ” �p�Bq and �B`1 ” �p�Bq as in (A.20) and (A.27) (cf. (2.5)). �en, as in (A.21) and (A.28) (cf. (2.6)), wecan de�ne recursively the constants

�B “�B ´ΛB´1

p1´ΛB´1q1{2, �B “

�B ´ ΓB´1

p1´ ΓB´1q1{2. (A.31)

We use these to de�ne the matrices � and � as in (2.8) and (2.9). �en (A.29) and (A.30) can be rewri�en as (2.10) and(2.11). �e Gram–Schmidt orthogonalization (A.16) and (A.23) then correspond (approximately) to (2.14) and (2.15).

Page 85: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

GARDNER FORMULA FOR ISING PERCEPTRON MODELS AT SMALL DENSITIES 85

References[ABvSY21] A. Adhikari, C. Brennecke, P. von Soosten, and H.-T. Yau. Dynamical approach to the TAP equations for the

Sherrington-Kirkpatrick model. J. Stat. Phys., 183(3):Paper No. 35, 27, 2021.[AJ21] G. B. Arous and A. Jagannath. Sha�ering versus metastability in spin glasses. arXiv:2104.08299, 2021.[ALS21] E. Abbe, S. Li, and A. Sly. Proof of the contiguity conjecture and lognormal limit for the symmetric perceptron. arXiv:2102.13069,

2021.[APZ19] B. Aubin, W. Perkins, and L. Zdeborova. Storage capacity in symmetric binary perceptrons. J. Phys. A, 52(29):294003, 32, 2019.[AS20] A. E. Alaoui and M. Sellke. Algorithmic pure states for the negative spherical perceptron. arXiv:2010.15811, 2020.[BBC`16] C. Baldassi, C. Borgs, J. T. Chayes, A. Ingrosso, C. Lucibello, L. Saglie�i, and R. Zecchina. Unreasonable e�ectiveness of learning

neural networks: From accessible states and robust ensembles to basic algorithmic schemes. Proc. Nat. Acad. Sci. U.S.A.,113(48):E7655–E7662, 2016.

[BG98] A. Bovier and V. Gayrard. Hop�eld models as generalized random mean �eld models. In Mathematical aspects of spin glasses andneural networks, volume 41 of Progr. Probab., pages 3–89. Birkhauser Boston, Boston, MA, 1998.

[BKM`19] J. Barbier, F. Krzakala, N. Macris, L. Miolane, and L. Zdeborova. Optimal errors and phase transitions in high-dimensionalgeneralized linear models. Proc. Natl. Acad. Sci. USA, 116(12):5451–5460, 2019.

[BL76] H. J. Brascamp and E. H. Lieb. On extensions of the Brunn-Minkowski and Prekopa-Leindler theorems, including inequalities for logconcave functions, and with an application to the di�usion equation. J. Functional Analysis, 22(4):366–389, 1976.

[BL00] S. G. Bobkov and M. Ledoux. From Brunn-Minkowski to Brascamp-Lieb and to logarithmic Sobolev inequalities. Geom. Funct. Anal.,10(5):1028–1052, 2000.

[BM11] M. Bayati and A. Montanari. �e dynamics of message passing on dense graphs, with applications to compressed sensing. IEEETrans. Inform. �eory, 57(2):764–785, 2011.

[BMN20] R. Berthier, A. Montanari, and P.-M. Nguyen. State evolution for approximate message passing with non-separable functions. Inf.Inference, 9(1):33–79, 2020.

[Bol14] E. Bolthausen. An iterative construction of solutions of the TAP equations for the Sherrington-Kirkpatrick model. Comm. Math.Phys., 325(1):333–366, 2014.

[Bol19] E. Bolthausen. A Morita type proof of the replica-symmetric formula for SK. In Statistical mechanics of classical and disorderedsystems, volume 293 of Springer Proc. Math. Stat., pages 63–93. Springer, Cham, 2019.

[Bor75] C. Borell. �e Brunn-Minkowski inequality in Gauss space. Invent. Math., 30(2):207–216, 1975.[Bor17] A. Borovkov. Generalization and re�nement of the integro-local Stone theorem for sums of random vectors. �eory Probab. Appl.,

61(4):590–612, 2017.[BRS19] L. Budzynski, F. Ricci-Tersenghi, and G. Semerjian. Biased landscapes for random constraint satisfaction problems. Journal of

Statistical Mechanics: �eory and Experiment, 2019(2):023302, 2019.[BY21] C. Brennecke and H.-T. Yau. A note on the replica symmetric formula for the SK model. arXiv:2109.07354, 2021.[CO19] B. Cakmak and M. Opper. Memory-free dynamics for the Thouless–Anderson–Palmer equations of Ising models with arbitrary

rotation-invariant ensembles of random coupling matrices. Phys. Rev. E, 99(6):062140, 2019.[Cov65] T. M. Cover. Geometrical and statistical properties of systems of linear inequalities with applications in pa�ern recognition. IEEE.

Trans. Electron., 3:326–334, 1965.[CPS18] W.-K. Chen, D. Panchenko, and E. Subag. �e generalized TAP free energy. arXiv:1812.05066, 2018.[CPS21] W.-K. Chen, D. Panchenko, and E. Subag. �e generalized TAP free energy II. Comm. Math. Phys., 381(1):257–291, 2021.[dAT78] J. R. de Almeida and D. J. �ouless. Stability of the Sherrington–Kirkpatrick solution of a spin glass model. J. Phys. A, 11(5):983, 1978.[DMM09] D. L. Donoho, A. Maleki, and A. Montanari. Message-passing algorithms for compressed sensing. Proc. Natl. Acad. Sci.,

106(45):18914–18919, 2009.[DS18] J. Ding and N. Sun. Capacity lower bound for the Ising perceptron. arXiv:1809.07742, 2018.[Fan20] Z. Fan. Approximate message passing algorithms for rotationally invariant matrices. arXiv:2008.11892, 2020.[FMM21] Z. Fan, S. Mei, and A. Montanari. TAP free energy, spin glasses and variational inference. Ann. Probab., 49(1):1–45, 2021.[FP16] S. Franz and G. Parisi. �e simplest model of jamming. J. Phys. A, 49(14):145001, 2016.[FPS`17] S. Franz, G. Parisi, M. Sevelev, P. Urbani, and F. Zamponi. Universality of the SAT-UNSAT (jamming) threshold in non-convex

continuous constraint satisfaction problems. SciPost Physics, 2(3):019, 2017.[FW21] Z. Fan and Y. Wu. �e replica-symmetric free energy for Ising spin glasses with orthogonally invariant couplings. arXiv:2105.02797,

2021.[Gar87] E. Gardner. Maximum storage capacity in neural networks. Europhys. Le�., 4(4):481, 1987.[Gar88] E. Gardner. �e space of interactions in neural network models. J. Phys. A, 21(1):257, 1988.[Gar02] R. J. Gardner. �e Brunn-Minkowski inequality. Bull. Amer. Math. Soc. (N.S.), 39(3):355–405, 2002.[GD88] E. Gardner and B. Derrida. Optimal storage properties of neural network models. J. Phys. A, 21(1):271, 1988.[GD89] E. Gardner and B. Derrida. �ree un�nished works on the optimal storage capacity of networks. J. Phys. A, 22(12):1983, 1989.[Gor85] Y. Gordon. Some inequalities for Gaussian processes and applications. Israel J. Math., 50(4):265–289, 1985.[Gor88] Y. Gordon. On Milman’s inequality and random subspaces which escape through a mesh in R= . In Geometric aspects of functional

analysis (1986/87), volume 1317 of Lecture Notes in Math., pages 84–106. Springer, Berlin, 1988.[Heb49] D. O. Hebb. �e organization of behavior. Wiley, New York, 1949.

Page 86: G Ñ8 arXiv:2111.02855v1 [math.PR] 4 Nov 2021

86 E. BOLTHAUSEN, S. NAKAJIMA, N. SUN, AND C. XU

[HO56] H. Hadwiger and D. Ohmann. Brunn-Minkowskischer Satz und Isoperimetrie. Math. Z., 66:1–8, 1956.[Hop82] J. J. Hop�eld. Neural networks and physical systems with emergent collective computational abilities. Proc. Nat. Acad. Sci. U.S.A.,

79(8):2554–2558, 1982.[JM13] A. Javanmard and A. Montanari. State evolution for general approximate message passing algorithms, with applications to spatial

coupling. Inf. Inference, 2(2):115–144, 2013.[KM89] W. Krauth and M. Mezard. Storage capacity of memory networks with binary couplings. J. Physique, 50(20):3057–3066, 1989.[KR98] J. H. Kim and J. R. Roche. Covering cubes by random half cubes, with applications to binary neural networks. J. Comput. System Sci.,

56(2):223–252, 1998. Eighth Annual Workshop on Computational Learning �eory (COLT) (Santa Cruz, CA, 1995).[Lei72] L. Leindler. On a certain converse of Holder’s inequality. In Linear operators and approximation (Proc. Conf., Oberwolfach, 1971),

pages 182–184. Internat. Ser. Numer. Math., Vol. 20, 1972.[Lit74] W. A. Li�le. �e existence of persistent states in the brain. Math. Biosci., 19(1-2):101–120, 1974.[LL01] E. H. Lieb and M. Loss. Analysis, volume 14 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI,

second edition, 2001.[Lus35] L. Lusternik. Die Brunn–Minkowskische ungleichung fur beliebige messbare mengen. C. R. Acad. Sci. URSS, 8:55–58, 1935.[Mau91] B. Maurey. Some deviation inequalities. Geom. Funct. Anal., 1(2):188–197, 1991.[Mez89] M. Mezard. �e space of interactions in neural networks: Gardner’s computation with the cavity method. J. Phys. A, 22(12):2181,

1989.[Mez17] M. Mezard. Mean-�eld message-passing equations in the Hop�eld model and its generalizations. Phys. Rev. E, 95(2):022117, 2017.[MP43] W. S. McCulloch and W. Pi�s. A logical calculus of the ideas immanent in nervous activity. B. Math. Biophys., 5(4):115–133, 1943.[MZZ21] A. Montanari, Y. Zhong, and K. Zhou. Tractability from overparametrization: the example of the negative perceptron.

arXiv:2110.15824, 2021.[OCW16] M. Opper, B. Cakmak, and O. Winther. A theory of solving TAP equations for Ising models with general invariant random matrices.

J. Phys. A, 49(11):114002, 2016.[OW01] M. Opper and O. Winther. Adaptive and self-averaging Thouless–Anderson–Palmer mean-�eld theory for probabilistic modeling.

Phys. Rev. E, 64(5):056131, 2001.[Pet75] V. V. Petrov. Sums of independent random variables. Ergebnisse der Mathematik und ihrer Grenzgebiete, Band 82. Springer-Verlag,

New York-Heidelberg, 1975. Translated from the Russian by A. A. Brown.[Pis86] G. Pisier. Probabilistic methods in the geometry of Banach spaces. In Probability and analysis (Varenna, 1985), volume 1206 of

Lecture Notes in Math., pages 167–241. Springer, Berlin, 1986.[Ple82] T. Ple�a. Convergence condition of the tap equation for the in�nite-ranged ising spin glass model. J. Phys. A, 15(6):1971, 1982.[Pre71] A. Prekopa. Logarithmic concave measures with application to stochastic programming. Acta Sci. Math. (Szeged), 32:301–316, 1971.[Pre73] A. Prekopa. On logarithmic concave measures and functions. Acta Sci. Math. (Szeged), 34:335–343, 1973.[PX21] W. Perkins and C. Xu. Frozen 1-RSB structure of the symmetric Ising perceptron. In Proc. 53rd STOC, pages 1579–1588, 2021.[RV10] M. Rudelson and R. Vershynin. Non-asymptotic theory of random matrices: extreme singular values. In Proceedings of the

International Congress of Mathematicians. Volume III, pages 1576–1602. Hindustan Book Agency, New Delhi, 2010.[RV18] C. Rush and R. Venkataramanan. Finite sample analysis of approximate message passing algorithms. IEEE Trans. Inform. �eory,

64(11):7264–7286, 2018.[SK75] D. Sherrington and S. Kirkpatrick. Solvable model of a spin-glass. Phys. Rev. Le�., 35(26):1792, 1975.[ST03] M. Shcherbina and B. Tirozzi. Rigorous solution of the Gardner problem. Comm. Math. Phys., 234(3):383–422, 2003.[Sto13] M. Stojnic. Another look at the Gardner problem. arXiv:1306.3979, 2013.[Tal99a] M. Talagrand. Intersecting random half cubes. Random Structures Algorithms, 15(3-4):436–449, 1999. Statistical physics methods in

discrete probability, combinatorics, and theoretical computer science (Princeton, NJ, 1997).[Tal99b] M. Talagrand. Self-averaging and the space of interactions in neural networks. Random Structures Algorithms, 14(3):199–213, 1999.[Tal00] M. Talagrand. Intersecting random half-spaces: toward the Gardner-Derrida formula. Ann. Probab., 28(2):725–758, 2000.[Tal11a] M. Talagrand. Mean �eld models for spin glasses. Volume I, volume 54 of Ergebnisse der Mathematik und ihrer Grenzgebiete. 3. Folge. A

Series of Modern Surveys in Mathematics. Springer-Verlag, Berlin, 2011. Basic examples.[Tal11b] M. Talagrand. Mean �eld models for spin glasses. Volume II, volume 55 of Ergebnisse der Mathematik und ihrer Grenzgebiete. 3. Folge.

A Series of Modern Surveys in Mathematics. Springer, Heidelberg, 2011. Advanced replica-symmetry and low temperature.[TAP77] D. J. �ouless, P. W. Anderson, and R. G. Palmer. Solution of ‘solvable model of a spin glass’. Philosophical Magazine, 35(3):593–601,

1977.[TIS76] B. S. Tsirelson, I. A. Ibragimov, and V. N. Sudakov. Norms of Gaussian sample functions. In Proceedings of the Third Japan-USSR

Symposium on Probability Theory (Tashkent, 1975), pages 20–41. Lecture Notes in Math., Vol. 550, 1976.[Wen62] J. G. Wendel. A problem in geometric probability. Math. Scand., 11:109–111, 1962.[Xu21] C. Xu. Sharp threshold for the Ising perceptron model. Ann. Probab., 49(5):2399–2415, 2021.