
Dynamical approach to random matrix theory

László Erdős∗, Horng-Tzer Yau†

May 9, 2017

∗Partially supported by ERC Advanced Grant, RANMAT 338804†Partially supported by the NSF grant DMS-1307444 and a Simons Investigator Award


AMS Subject Classification (2010): 15B52, 82B44

Keywords: Random matrix, local semicircle law, Dyson sine kernel, Wigner-Dyson-Mehta conjecture, Tracy-Widom distribution, Dyson Brownian motion.


Preface

This book is a concise and self-contained introduction to the recent techniques used to prove local spectral universality for large random matrices. Random matrix theory is a fast expanding research area, and this book mainly focuses on the methods we participated in developing over the past few years. Many other interesting topics are not included, nor are several new developments within the framework of these methods. We have chosen instead to present key concepts that we believe are the core of these methods and should be relevant for future applications. We keep technicalities to a minimum to make the book accessible to graduate students. With this in mind, we include in this book the basic notions and tools for high dimensional analysis, such as large deviation bounds, entropy, the Dirichlet form and the logarithmic Sobolev inequality.

The material in this book originates from our joint works with a group of collaborators over the past several years. Not only were the main mathematical results in this book taken from these works, but the presentation of many sections follows the routes laid out in these papers. In alphabetical order, these coauthors were Paul Bourgade, Antti Knowles, Sandrine Péché, José Ramírez, Benjamin Schlein and Jun Yin. We would like to thank all of them.

This manuscript was developed and continuously improved over the last five years. We have taught this material in several regular graduate courses at Harvard, Munich and Vienna, in addition to various summer schools and short courses. We are thankful for the generous support of the Institute for Advanced Study, Princeton, where part of this manuscript was written during the special year devoted to random matrices in 2013-2014. L.E. also thanks Harvard University for the continuous support during his numerous visits. L.E. was partially supported by the SFB TR 12 grant of the German Science Foundation and the ERC Advanced Grant RANMAT 338804 of the European Research Council. H.-T. Y. would like to thank the National Center for Theoretical Sciences at the National Taiwan University, where part of the manuscript was written, for the hospitality and support during his several visits. H.-T. Y. gratefully acknowledges the support of NSF grant DMS-1307444 and a Simons Investigator award.

Finally, we are grateful for the editorial support of the publisher, to Amol Aggarwal, Johannes Alt and Patrick Lopatto for their careful reading of the manuscript, and to Alex Gontar for his help in composing the bibliography.


Contents

1 Introduction

2 Wigner matrices and their generalizations

3 Eigenvalue density
   3.1 Wigner semicircle law and other canonical densities
   3.2 The moment method
   3.3 The resolvent method and the Stieltjes transform

4 Invariant ensembles
   4.1 Joint density of eigenvalues for invariant ensembles
   4.2 Universality of classical invariant ensembles via orthogonal polynomials

5 Universality for generalized Wigner matrices
   5.1 Different notions of universality
   5.2 The three-step strategy

6 Local semicircle law for universal Wigner matrices
   6.1 Setup
   6.2 Spectral information on S
   6.3 Stochastic domination
   6.4 Statement of the local semicircle law
   6.5 Appendix: Behaviour of Γ and Γ and the proof of Lemma 6.3

7 Weak local semicircle law
   7.1 Proof of the weak local semicircle law, Theorem 7.1
   7.2 Large deviation estimates

8 Proof of the local semicircle law
   8.1 Tools
   8.2 Self-consistent equations on two levels
   8.3 Proof of the local semicircle law without using the spectral gap

9 Sketch of the proof of the local semicircle law using the spectral gap

10 Fluctuation averaging mechanism
   10.1 Intuition behind the fluctuation averaging
   10.2 Proof of Lemma 8.9
   10.3 Alternative proof of (8.47) of Lemma 8.9

11 Eigenvalue location: the rigidity phenomenon
   11.1 Extreme eigenvalues
   11.2 Stieltjes transform and regularized counting function
   11.3 Convergence speed of the empirical distribution function
   11.4 Rigidity of eigenvalues

12 Universality for matrices with Gaussian convolutions
   12.1 Dyson Brownian motion
   12.2 Derivation of Dyson Brownian motion and perturbation theory
   12.3 Strong local ergodicity of the Dyson Brownian motion
   12.4 Existence and restriction of the dynamics

13 Entropy and the logarithmic Sobolev inequality (LSI)
   13.1 Basic properties of the entropy
   13.2 Entropy on product spaces and conditioning
   13.3 Logarithmic Sobolev inequality
   13.4 Hypercontractivity
   13.5 Brascamp-Lieb inequality
   13.6 Remarks on the applications of the LSI to random matrices
   13.7 Extensions to the simplex, regularization of the DBM

14 Universality of the Dyson Brownian Motion
   14.1 Main ideas behind the proof of Theorem 14.1
   14.2 Proof of Theorem 14.1
   14.3 Restriction to the simplex via regularization
   14.4 Dirichlet form inequality for any β > 0
   14.5 From gap distribution to correlation functions: proof of Theorem 12.4
   14.6 Details of the proof of Lemma 14.8

15 Continuity of local correlation functions under matrix OU process
   15.1 Proof of Theorem 15.2
   15.2 Proof of the correlation function comparison Theorem 15.3

16 Universality of Wigner matrices in small energy window: GFT
   16.1 The Green function comparison theorems
   16.2 Conclusion of the three step strategy

17 Edge Universality

18 Further results and historical notes
   18.1 Wigner matrices: bulk universality in different senses
   18.2 Historical overview of the three-step strategy for bulk universality
   18.3 Invariant ensembles and log-gases
   18.4 Universality at the spectral edge
   18.5 Eigenvectors
   18.6 General mean field models
   18.7 Beyond mean field models: band matrices

1 Introduction

“Perhaps I am now too courageous when I try to guess the distribution of the distances between successive levels (of energies of heavy nuclei). Theoretically, the situation is quite simple if one attacks the problem in a simpleminded fashion. The question is simply what are the distances of the characteristic values of a symmetric matrix with random coefficients.”

Eugene Wigner on the Wigner surmise, 1956

Random matrices appeared in the literature as early as 1928, when Wishart [138] used them in statistics. The natural question regarding their eigenvalue statistics, however, was not raised until the pioneering work [137] of Eugene Wigner in the 1950s. Wigner’s original motivation came from nuclear physics, where he noticed from experimental data that gaps in energy levels of large nuclei tend to follow the same statistics irrespective of the material. Quantum mechanics predicts that energy levels are eigenvalues of a self-adjoint operator, but the correct Hamiltonian operator describing nuclear forces was not known at that time. In addition, the computation of the energy levels of large quantum systems would have been impossible even with the full Hamiltonian explicitly given. Instead of pursuing a direct solution of this problem, Wigner appealed to a phenomenological model to explain his observation. Wigner’s pioneering idea was to model the complex Hamiltonian by a random matrix with independent entries. All physical details of the system were ignored except one, the symmetry type: systems with time reversal symmetry were modeled by real symmetric random matrices, while complex Hermitian random matrices were used for systems without time reversal symmetry (e.g. with magnetic forces). This simple-minded model amazingly reproduced the correct gap statistics, indicating a profound universality principle working in the background.

Notwithstanding their physical significance, random matrices are also very natural mathematical objects, and their study could have been initiated by mathematicians driven by pure curiosity. Large collections of random numbers and vectors have long been known to exhibit universal patterns; the obvious examples are the law of large numbers and the central limit theorem. What are their analogues in the non-commutative setting, e.g. for matrices? Focusing on the spectrum, what do the eigenvalues of typical large random matrices look like?

As the first result of this type, Wigner proved a type of law of large numbers for the density of eigenvalues, which we now explain. The (real or complex) Wigner ensembles consist of N × N self-adjoint matrices H = (h_ij) with matrix elements having mean zero and variance 1/N that are independent up to the symmetry constraint h_ij = h̄_ji. The Wigner semicircle law states that the empirical density of the eigenvalues of H converges, as N → ∞, to the semicircle density

    ϱ_sc(x) = \frac{1}{2π} \sqrt{(4 − x^2)_+},

independently of the details of the distribution of the h_ij.

On the scale of individual eigenvalues, Wigner predicted that the fluctuations of the gaps are universal and that their distribution is given by a new law, the Wigner surmise. This might be viewed as the random matrix analogue of the central limit theorem.

After Wigner’s discovery, Dyson, Gaudin and Mehta achieved several fundamental mathematical results; in particular, they were able to compute the gap distribution and the local correlation functions of the eigenvalues for the Gaussian ensembles. These are called the Gaussian Orthogonal Ensemble (GOE) and the Gaussian Unitary Ensemble (GUE), corresponding to the two most important symmetry classes, the real symmetric and complex Hermitian matrices. The Wigner surmise turned out to be slightly wrong, and the correct law is given by the Gaudin distribution. Dyson and Mehta gradually formulated what is nowadays known as the Wigner-Dyson-Mehta (WDM) universality conjecture. As presented in the classical treatise of Mehta [105], this conjecture asserts that the local eigenvalue statistics of large random matrices with independent entries are universal, i.e., they do not depend on the particular distribution of the matrix elements. In particular, they coincide with those in the Gaussian case, which were computed explicitly.

On a more philosophical level, we can recast Wigner’s vision as the hypothesis that the eigenvalue gap distributions of large complicated quantum systems are universal in the sense that they depend only on the symmetry class of the physical system and not on other detailed structures. The Wigner-Dyson-Mehta universality conjecture is therefore merely a test of Wigner’s hypothesis for a special class of matrix models, the Wigner ensembles, which are characterized by the independence of their matrix elements. The other large class of matrix models is the invariant ensembles. They are defined by a Radon-Nikodym density of the form exp(−Tr V(H)) with respect to the flat Lebesgue measure on the space of real symmetric or complex Hermitian matrices H. Here V is a real valued function called the potential of the invariant ensemble. These distributions are invariant under orthogonal or unitary conjugation, but the matrix elements are not independent except in the Gaussian case. The universality conjecture for invariant ensembles asserts that the local eigenvalue gap statistics are independent of the function V. In contrast to the Wigner case, substantial progress was made for invariant ensembles in the last two decades. The key element of this success was that invariant ensembles, unlike Wigner matrices, have explicit formulas for the joint densities of the eigenvalues. These formulas express the eigenvalue correlation functions as determinants whose entries are given by functions of orthogonal polynomials. The limiting local eigenvalue statistics are thus determined by the asymptotic behavior of these formulas, in particular that of the orthogonal polynomials, as the size of the matrix tends to infinity. A key tool, the Riemann-Hilbert method [71], was brought into this subject by Fokas, Its and Kitaev, and the universality of eigenvalue statistics was established for large classes of invariant ensembles by Bleher-Its [20] and by Deift and collaborators [37]–[41].

The cornerstone of this spectacular progress in understanding the local statistics of invariant ensembles is the fact that there are explicit formulas representing the eigenvalue correlation functions in terms of orthogonal polynomials, a key observation made in the original work of Gaudin and Mehta for Gaussian random matrices. For Wigner ensembles there are no explicit formulas for the eigenvalue statistics, and the WDM conjecture remained open for almost fifty years with virtually no progress. The first significant advance in this direction was made by Johansson [85], who proved universality for complex Hermitian matrices under the assumption that the common distribution of the matrix entries has a substantial Gaussian component, i.e., the random matrix H is of the form H = H0 + aHG, where H0 is a general Wigner matrix, HG is a GUE matrix, and a is a certain, not too small, positive constant independent of N. His proof relied on an explicit formula by Brézin and Hikami [30, 31] that uses a certain version of the Harish-Chandra-Itzykson-Zuber formula [84]. These formulas are available for the complex Hermitian case only, which restricted the method to this symmetry class.

If local spectral universality is as ubiquitous as Wigner, Dyson and Mehta conjectured, there must be an underlying mechanism driving local statistics to their universal fixed point. In hindsight, the existence of such a mechanism is almost a synonym for “universality”. However, up to ten years ago there was no clear indication whether a solution of the WDM conjecture would rely on some yet to be discovered formulas or would come from some other deep insight.

The goal of this book is to give a self-contained introduction to a new approach to local spectral universality which, in particular, resolves the WDM conjecture. To keep technicalities to a minimum, we will consider the simplest class of matrices, which we call generalized Wigner matrices (Definition 2.1), and we will focus on proving universality in a simpler form, in the averaged energy sense (see Section 5.1), together with the necessary prerequisites. We stress that these are not the strongest results currently available, but they are good representatives of the general ideas we have developed in the past several years. We believe that these techniques are sufficiently mature to be presented in book format.

This approach consists of the following three steps:

Step 1. Local semicircle law: It provides an a priori estimate showing that the density of eigenvalues of generalized Wigner matrices is given by the semicircle law at very small microscopic scales, i.e., down to spectral intervals that contain N^ε eigenvalues.

Step 2. Universality for Gaussian divisible ensembles: It proves that the local statistics of Gaussian divisible ensembles H0 + aHG are the same as those of the Gaussian ensemble HG as long as a ≥ N^{−1/2+ε}, i.e., already for very small a.

Step 3. Approximation by a Gaussian divisible ensemble: It is a type of “density argument” that extends the local spectral universality from Gaussian divisible ensembles to all Wigner ensembles.

The conceptually novel point of our method is Step 2. The eigenvalue distributions of the Gaussian divisible ensembles, written in the form e^{−t/2} H0 + √(1 − e^{−t}) HG, are the same as those of the solution of a matrix valued Ornstein-Uhlenbeck (OU) process H_t for any time t > 0. Dyson [45] observed half a century ago that the dynamics of the eigenvalues of H_t is given by an interacting stochastic particle system, called the Dyson Brownian motion (DBM). In addition, the invariant measure of this dynamics is exactly the eigenvalue distribution of GOE or GUE. This invariant measure is also a Gibbs measure of point particles in one dimension interacting via a long range logarithmic potential. Using a heuristic physical argument, Dyson remarked [45] that the DBM reaches its “local equilibrium” on a short time scale t ≳ N^{−1}. We will refer to this as Dyson’s conjecture, although it was more an intuitive physical picture than an exact mathematical statement.
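As a numerical aside (our illustration, not part of the original text; a minimal numpy sketch in which all helper names are our own choices): the Gaussian divisible interpolation is easy to experiment with. The code draws a Bernoulli Wigner matrix H0 and an independent GOE matrix HG, then evaluates H_t = e^{−t/2} H0 + √(1 − e^{−t}) HG for a few times t; for each fixed t this matrix has the law of the matrix OU flow at time t started from H0, and it tends to HG as t → ∞.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50

def goe(n, rng):
    # GOE normalization: off-diagonal variance 1/n, diagonal variance 2/n.
    A = rng.normal(size=(n, n)) / np.sqrt(n)
    return (A + A.T) / np.sqrt(2)

# Initial Wigner matrix H0 with symmetric Bernoulli (+-1/sqrt(N)) entries.
B = rng.choice([-1.0, 1.0], size=(N, N)) / np.sqrt(N)
H0 = np.triu(B) + np.triu(B, 1).T

HG = goe(N, rng)

# Gaussian divisible interpolation: for each fixed t, H_t is distributed as
# the matrix OU process at time t started from H0.
for t in (0.0, 0.1, 1.0, 10.0):
    Ht = np.exp(-t / 2) * H0 + np.sqrt(1.0 - np.exp(-t)) * HG
    evals = np.linalg.eigvalsh(Ht)
    print(f"t={t:5.1f}  spectrum in [{evals[0]:+.2f}, {evals[-1]:+.2f}]")
```

Note that with a single fixed HG this is a deterministic path between H0 and HG; only the time marginals agree with the OU process, which is all that Step 2 uses.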

Since Dyson’s work in the sixties, there had been virtually no progress in proving this conjecture. Besides the limits of the available mathematical tools, one main factor was the vague formulation of the conjecture involving the notion of “local equilibrium”, which even nowadays is not well-defined for a Gibbs measure with a general long range interaction. Furthermore, a possible connection between Dyson’s conjecture and the solution of the WDM conjecture had never been elucidated in the literature.

In fact, “relaxation to local equilibrium” on a time scale t refers to the phenomenon that after time t the dynamics has changed the system, at least locally, from its initial state to a local equilibrium. It therefore appears counterintuitive that one may learn anything useful in this way about the WDM conjecture, since the latter concerns the initial state. The key point is that by applying local relaxation to all initial states (within a reasonable class) simultaneously, Step 2 generates a large set of random matrix ensembles for which universality holds. We prove that, for the purpose of universality, this set is sufficiently dense so that any Wigner matrix H is sufficiently close to a Gaussian divisible ensemble of the form e^{−t/2} H0 + √(1 − e^{−t}) HG with a suitably chosen H0. Originally, our motivation was driven not by Dyson’s conjecture, but by the desire to prove universality for Gaussian divisible ensembles with the Gaussian component as small as possible. As it turned out, our method used in Step 2 actually proved a precisely formulated version of Dyson’s conjecture.

The applicability of our approach goes well beyond its original scope; in recent years the method has been extended to invariant ensembles, sample covariance matrices, adjacency matrices of sparse random graphs and certain band matrices. At the end of the book we give a short overview of these applications. In the following, we outline the main contents of the book.

In Sections 2–4 we start with an elementary recapitulation of some well known concepts, such as the moment method, the orthogonal polynomial method, the emergence of the sine kernel, the Tracy-Widom distribution and the invariant ensembles. These sections are included only to provide background for the topics discussed in this book; details are often omitted, since these issues are discussed extensively in other books. The core material starts in Section 5 with a precise definition of different concepts of universality and with the formulation of a representative theorem (Theorem 5.1) that we eventually prove in this book. We also give there a detailed overview of the three step strategy.

The first step, the local version of Wigner’s semicircle law, is given in Section 6. For pedagogical reasons, we give the proof on several levels. A weaker version, with an easier proof, is given in Section 7; an ambitious reader may skip this section. The proof of the stronger version is found in Section 8, which gives the optimal result in the bulk of the spectrum. To obtain the optimal result at the edge, an additional idea is needed; this is sketched in Section 9, but we do not give all the technical details. This section may be skipped at first reading. A key ingredient of the proofs, the fluctuation averaging mechanism, is presented separately in Section 10. Important consequences of the local law, such as rigidity, speed of convergence and eigenvector delocalization, are given in Section 11.

The second step, the analysis of the Dyson Brownian motion and the proof of Dyson’s conjecture, is presented in Sections 12 and 14. Before the proof in Section 14, we added a separate Section 13 devoted to general tools of high dimensional analysis, where we introduce the concepts of entropy, the Dirichlet form, the Bakry-Émery method and the logarithmic Sobolev inequality. Some concepts (e.g. the Brascamp-Lieb inequality and hypercontractivity) are not used in this book, but we included them since they are useful related topics.

The third step, the perturbative argument comparing arbitrary Wigner matrices with Gaussian divisible matrices, is given in two versions. A shorter but somewhat weaker version, the continuity of the correlation functions, is found in Section 15. A technically more involved version, the Green function comparison theorem, is presented in Section 16. This completes the proof of our main result.


Separately, in Section 17 we prove universality at the edge; the main input is the strong form of the local semicircle law at the edge. Finally, we give a historical overview, discuss newer results and provide some outlook on related models in Section 18.

This book is not a comprehensive monograph on random matrices. Although we summarize a few general concepts at the beginning, we mostly focus on a concrete strategy and the necessary technical steps behind it. For readers interested in other aspects, in addition to the classical book of Mehta [105], several excellent works are available that present random matrices in a broader scope. The books by Anderson, Guionnet and Zeitouni [8] and by Pastur and Shcherbina [112] contain extensive material starting from the basics. Tao’s book [127] provides a different perspective on this subject and is self-contained as a graduate textbook. Forrester’s monograph [72] is a handbook of explicit formulas related to random matrices. Finally, [6] is an excellent comprehensive overview of the diverse applications of random matrix theory in mathematics, physics, neural networks and engineering.

Finally, we list a few notational conventions. In order to focus on the essentials, we will not follow the dependence of various constants on different parameters. In particular, we will use the generic letters C and c to denote positive constants, whose values may change from line to line and which may depend on some fixed basic parameters of the model. For two positive quantities A and B, we will write A ≲ B to indicate that there exists a constant C such that A ≤ CB. If A and B are comparable in the sense that A ≲ B and B ≲ A, then we write A ≍ B. We introduce the notation ⟦A, B⟧ := Z ∩ [A, B] for the set of integers between any two real numbers A < B.


2 Wigner matrices and their generalizations

Consider N ×N square matrices of the form

H = H^{(N)} = \begin{pmatrix} h_{11} & h_{12} & \cdots & h_{1N} \\ h_{21} & h_{22} & \cdots & h_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ h_{N1} & h_{N2} & \cdots & h_{NN} \end{pmatrix}   (2.1)

with centered entries

E h_{ij} = 0,   i, j = 1, 2, …, N.   (2.2)

The random variables h_ij, i, j = 1, …, N, are real or complex random variables subject to the symmetry constraint h_ij = h̄_ji, so that H is either Hermitian (complex) or symmetric (real). Let S = S^{(N)} denote the matrix of variances

S := \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1N} \\ s_{21} & s_{22} & \cdots & s_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ s_{N1} & s_{N2} & \cdots & s_{NN} \end{pmatrix},   s_{ij} := E|h_{ij}|^2.   (2.3)

We assume that

    \sum_{j=1}^{N} s_{ij} = 1   for every i = 1, 2, …, N,   (2.4)

i.e., the deterministic N × N matrix of variances, S = (s_ij), is symmetric and doubly stochastic. The size of the random matrix, N, is the fundamental limiting parameter throughout this book. Most results become sharp in the large N limit. Many quantities, such as the distribution of H and the matrix S, naturally depend on N, but for notational simplicity we will often omit this dependence from the notation.

The simplest case, when

    s_{ij} = \frac{1}{N},   i, j = 1, 2, …, N,

is called the Wigner matrix ensemble. We summarize this setup in the following definition:

Definition 2.1. An N × N symmetric or Hermitian random matrix (2.1) is called a universal Wigner matrix (ensemble) if the entries are centered (2.2), their variances s_ij = E|h_ij|^2 satisfy

    \sum_{j} s_{ij} = 1,   i = 1, 2, …, N,   (2.5)

and {h_ij : i ≤ j} are independent. An important subclass of the universal Wigner ensembles, the generalized Wigner matrices (ensembles), additionally requires that the variances be comparable, i.e., that

    0 < C_{inf} ≤ N s_{ij} ≤ C_{sup} < ∞,   i, j = 1, 2, …, N,   (2.6)

holds with some fixed positive constants C_inf, C_sup. For Hermitian ensembles, we additionally require that for each i, j the 2 × 2 covariance matrix be bounded by C/N in the matrix sense, i.e.,

    Σ_{ij} := \begin{pmatrix} E(\Re h_{ij})^2 & E(\Re h_{ij})(\Im h_{ij}) \\ E(\Re h_{ij})(\Im h_{ij}) & E(\Im h_{ij})^2 \end{pmatrix} ≤ \frac{C}{N}.

In the special case s_ij = 1/N and Σ_ij = (1/2N) I_{2×2} we recover the original definition of the Wigner matrices, or Wigner ensemble [137].
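Definition 2.1 is easy to instantiate numerically. The sketch below is our own illustration (not from the text): the profile s_ij = (1 + a_i a_j)/N with a zero-sum vector a is an assumed example choice that makes S symmetric and doubly stochastic with comparable entries, and the diagonal convention is simplified.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200

# Illustrative variance profile: s_ij = (1 + a_i a_j) / N with sum_j a_j = 0.
# Row sums are then exactly 1 (condition (2.5)), and N s_ij lies between
# 1 - max|a|^2 and 1 + max|a|^2 (comparability (2.6)).
a = 0.5 * np.cos(2 * np.pi * np.arange(N) / N)   # sums to zero
S = (1.0 + np.outer(a, a)) / N

assert np.allclose(S.sum(axis=1), 1.0)               # doubly stochastic
assert 0.7 < (N * S).min() <= (N * S).max() < 1.3    # comparable variances

# Sample a real symmetric generalized Wigner matrix with Gaussian entries,
# scaled entrywise by sqrt(s_ij).
G = rng.normal(size=(N, N)) * np.sqrt(S)
H = np.triu(G, 1) + np.triu(G, 1).T + np.diag(np.diag(G))
```

The flat profile a = 0 recovers the standard Wigner ensemble s_ij = 1/N.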


In this book we will focus on the generalized Wigner matrices. These are mean-field models in the sense that the typical sizes of the matrix elements h_ij are comparable, see (2.6). In Section 18.7 we will discuss models beyond the mean-field regime.

The most prominent Wigner ensembles are the Gaussian Orthogonal Ensemble (GOE) and the Gaussian Unitary Ensemble (GUE), i.e., real symmetric and complex Hermitian Wigner matrices whose rescaled matrix elements √N h_ij are standard Gaussian variables. More precisely, in the Hermitian case, √N h_ij for i ≠ j is a standard complex Gaussian variable, i.e., E|√N h_ij|^2 = 1 with independent real and imaginary parts of equal variance, while √N h_ii is a real Gaussian variable with variance 1. In the real symmetric case, we require that E|√N h_ij|^2 = 1 = E|√N h_ii|^2/2 for any i ≠ j.

For simplicity of the presentation, in the case of the Wigner ensembles we will assume that the entries $h_{ij}$, $i<j$, not only have identical variances but are also identically distributed. In this case we fix a distribution $\nu$ and we assume that the rescaled matrix elements $\sqrt{N}h_{ij}$ are distributed according to $\nu$. Depending on the symmetry type, the diagonal elements may have a slightly different distribution, but we will omit this subtlety from the discussion. The distribution $\nu$ will be called the single entry distribution of $H$. In the case of generalized Wigner matrices, we assume that the rescaled matrix elements $s_{ij}^{-1/2}h_{ij}$, which have unit variance, all have law $\nu$.

We will need some decay property of $\nu$, and we will consider two decay types. Either we assume that $\nu$ has subexponential decay, i.e., that there are constants $C, \vartheta > 0$ such that for any $s > 0$
\[
\int_{\mathbb{R}} \mathbf{1}(|x| > s)\,\mathrm{d}\nu(x) \leq C \exp\big(-s^{\vartheta}\big);
\tag{2.7}
\]

or we assume that $\nu$ has polynomial decay of arbitrarily high order, i.e.,
\[
\int_{\mathbb{R}} \mathbf{1}(|x| > s)\,\mathrm{d}\nu(x) \leq C_p (1+s)^{-p}, \qquad \forall\, s > 0,
\tag{2.8}
\]
for any $p \in \mathbb{N}$.

Another class of random matrices, which even predates Wigner, consists of the random sample covariance matrices. These are matrices of the form
\[
H = X^* X,
\tag{2.9}
\]
where $X$ is a rectangular $M \times N$ matrix with centered i.i.d. entries of variance $\mathbb{E}|X_{ij}|^2 = M^{-1}$. Note that the matrix elements of $H$ are not independent, but they are generated from the independent matrix elements of $X$ in a straightforward way. These matrices appear in statistical sampling and were first considered by Wishart [138]. In the case when the $X_{ij}$ are centered Gaussians, the random covariance matrices are called Wishart matrices or the Wishart ensemble.

3 Eigenvalue density

3.1 Wigner semicircle law and other canonical densities

For a real symmetric or complex Hermitian Wigner matrix $H$, let $\lambda_1 \leq \lambda_2 \leq \ldots \leq \lambda_N$ denote the eigenvalues. The empirical distribution of the eigenvalues follows a universal pattern, the Wigner semicircle law. To formulate it more precisely, note that the typical spacing between neighboring eigenvalues is of order $1/N$, so in a fixed interval $[a, b] \subset \mathbb{R}$ one expects macroscopically many (of order $N$) eigenvalues. More precisely, it can be shown (the first proof was given by Wigner [137]) that for any fixed real numbers $a \leq b$,

\[
\lim_{N\to\infty} \frac{1}{N}\,\#\big\{i : \lambda_i \in [a,b]\big\} = \int_a^b \varrho_{sc}(x)\,\mathrm{d}x, \qquad \varrho_{sc}(x) := \frac{1}{2\pi}\sqrt{(4-x^2)_+}\,,
\tag{3.1}
\]

where $(a)_+ := \max\{a, 0\}$ denotes the positive part of the number $a$. Note the emergence of the universal density, the semicircle law, which is independent of the details of the distribution of the matrix elements. The


semicircle law holds beyond Wigner matrices: it characterizes the eigenvalue density of the universal Wigner matrices (see Definition 2.1).
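The statement (3.1) is easy to probe numerically. The following sketch (with our own choices of $N$ and of the interval $[a,b]$; not part of the text) compares the empirical eigenvalue count of one sampled Wigner matrix with the semicircle integral:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1000
A = rng.normal(size=(N, N))
H = (A + A.T) / np.sqrt(2 * N)           # Wigner matrix, GOE normalization
eig = np.linalg.eigvalsh(H)

a, b = -1.0, 1.0
frac = np.mean((eig >= a) & (eig <= b))  # (1/N) #{i : lambda_i in [a, b]}
x = np.linspace(a, b, 20001)
# Riemann sum of the semicircle density over [a, b]
semi = np.sum(np.sqrt(np.clip(4 - x**2, 0.0, None)) / (2 * np.pi)) * (x[1] - x[0])
```

Already at $N = 1000$ the two quantities agree to a few decimal places.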

For the random covariance matrices (2.9), the empirical density of the eigenvalues $\lambda_i$ of $H$ converges to the Marchenko–Pastur law [103] in the limit $M, N \to \infty$ with $d = N/M$ fixed, $0 < d \leq 1$:
\[
\lim_{N\to\infty} \frac{1}{N}\,\#\big\{i : \lambda_i \in [a,b]\big\} = \int_a^b \varrho_{MP}(x)\,\mathrm{d}x, \qquad \varrho_{MP}(x) := \frac{1}{2\pi d}\,\frac{\sqrt{\big[(\lambda_+ - x)(x - \lambda_-)\big]_+}}{x}\,,
\tag{3.2}
\]
with $\lambda_\pm := (1 \pm \sqrt{d})^2$ being the spectral edges. Note that in the case $M \leq N$ the matrix $H$ has macroscopically many zero eigenvalues; since the nonzero spectra of $XX^*$ and $X^*X$ coincide, the Marchenko–Pastur law can then be applied to all nonzero eigenvalues with the roles of $M$ and $N$ exchanged.
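The Marchenko–Pastur law can be checked in the same spirit; the choice $d = 1/2$ and the test interval below are arbitrary choices of ours:

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 2000, 1000                            # d = N/M = 1/2
X = rng.normal(size=(M, N)) / np.sqrt(M)     # E|X_ij|^2 = 1/M
eig = np.linalg.eigvalsh(X.T @ X)

d = N / M
lo, hi = (1 - np.sqrt(d))**2, (1 + np.sqrt(d))**2
a, b = lo, (lo + hi) / 2
frac = np.mean((eig >= a) & (eig <= b))
x = np.linspace(a, b, 20001)
# Marchenko-Pastur density on [lambda_-, lambda_+]
rho = np.sqrt(np.clip((hi - x) * (x - lo), 0.0, None)) / (2 * np.pi * d * x)
mp = np.sum(rho) * (x[1] - x[0])
```

The empirical fraction `frac` matches the Marchenko–Pastur integral `mp` up to fluctuations of order $1/N$.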

3.2 The moment method

The eigenvalue density is commonly approached via the fairly robust moment method (see [8] for an exposé), which was also Wigner's original approach to prove the semicircle law [137]. The following proof is formulated for Wigner matrices with identically distributed entries, in particular with variances $s_{ij} = N^{-1}$, but a slight modification of the argument also applies to more general ensembles satisfying (2.5). The simplest moment is the second moment, $\operatorname{Tr} H^2$:

\[
\sum_{i=1}^N \lambda_i^2 = \operatorname{Tr} H^2 = \sum_{i,j=1}^N |h_{ij}|^2.
\]

Taking expectation and using (2.5) we have

\[
\frac{1}{N}\sum_i \mathbb{E}\,\lambda_i^2 = \frac{1}{N}\sum_{ij} s_{ij} = 1.
\]

In general, we can compute the traces of all even powers of $H$, i.e., $\mathbb{E}\operatorname{Tr} H^{2k}$, by expanding the product as
\[
\mathbb{E} \sum_{i_1, i_2, \ldots, i_{2k}} h_{i_1 i_2} h_{i_2 i_3} \cdots h_{i_{2k} i_1}.
\tag{3.3}
\]

Since the expectation of each matrix element vanishes, each factor $h_{xy}$ must be paired with at least one other copy $h_{yx} = \bar h_{xy}$, otherwise the expectation value is zero. In general, the sequence need not form a perfect pairing. If we identify $h_{yx}$ with $\bar h_{xy}$, the only restriction is that the matrix element $h_{xy}$ should appear at least twice in the sequence. Under this condition, the main contribution comes from the index sequences which satisfy the perfect pairing condition, namely that each $h_{xy}$ is paired with a unique $h_{yx}$ in (3.3). These sequences can be classified according to their complexity, and it turns out that the main contribution comes from the so-called backtracking sequences.

An index sequence $i_1 i_2 i_3 \ldots i_{2k} i_1$, returning to the original index $i_1$, is called backtracking if it can be successively generated by the substitution rule
\[
a \to aba, \qquad b \in \{1, 2, \ldots, N\}, \; b \neq a,
\tag{3.4}
\]

with an arbitrary index $b$. For example, we represent the term
\[
h_{i_1 i_2} h_{i_2 i_3} h_{i_3 i_2} h_{i_2 i_4} h_{i_4 i_5} h_{i_5 i_4} h_{i_4 i_2} h_{i_2 i_1}, \qquad i_1, i_2, \ldots, i_5 \in \{1, 2, \ldots, N\},
\tag{3.5}
\]
in the expansion of $\operatorname{Tr} H^8$ ($k = 4$) by a walk of length $2\times 4$ on the set $\{1, 2, \ldots, N\}$. This path is generated by the operation (3.4) in the following order:
\[
i_1 \to i_1 i_2 i_1 \to i_1 i_2 i_3 i_2 i_1 \to i_1 i_2 i_3 i_2 i_4 i_2 i_1 \to i_1 i_2 i_3 i_2 i_4 i_5 i_4 i_2 i_1, \qquad \text{with } i_1 \neq i_2,\; i_2 \neq i_3,\; i_3 \neq i_4,\; i_4 \neq i_5.
\]


It may happen that two non-successive labels coincide (e.g. $i_1 = i_3$), but the contribution of such terms is by a factor $1/N$ smaller than the leading term, so we can neglect them. Thus we assume that all labels $i_1, i_2, i_3, i_4, i_5$ are distinct. We may bookkeep these paths by two objects:

a) a graph on vertices labelled by $1, 2, 3, 4, 5$ (i.e., by the indices of the labels $i_j$), with edges defined by walking over the vertices in the order $1, 2, 3, 2, 4, 5, 4, 2, 1$, i.e., the order of the indices of the labels in (3.5);

b) an assignment of a distinct label $i_j$, $j = 1, 2, 3, 4, 5$, to each vertex.

It is easy to see that the graph generated by a backtracking sequence is a (double-edged) tree on the vertices $1, 2, 3, 4, 5$. Now we count the combinatorics of the objects a) and b) separately. We start with a).

Lemma 3.1. The number of graphs with $k$ double-edges, derived from backtracking sequences, is explicitly given by the Catalan number $C_k := \frac{1}{k+1}\binom{2k}{k}$.

Proof. There is a one-to-one correspondence between backtracking sequences and nonnegative one-dimensional random walks of length $2k$ that return to the origin at time $2k$. It is defined by the simple rule that a forward step $a \to b$ in the substitution rule (3.4) is represented by a step to the right in the 1d random walk, while the backtracking step $b \to a$ in (3.4) is a step to the left. Since at any moment the number of backtracking steps cannot exceed the number of forward steps, the walk remains on the nonnegative semi-axis.

From this interpretation, it is easy to show that the recursive relation
\[
C_n = \sum_{k=0}^{n-1} C_k C_{n-k-1}, \qquad C_0 = C_1 = 1,
\]
holds for the number $C_n$ of backtracking sequences of total length $2n$. Elementary combinatorics shows that the solution to this recursion is given by the Catalan numbers, which proves the lemma. We remark that $C_n$ has many other definitions. Alternatively, it is also the number of planar binary trees with $n+1$ vertices, and one could also use this definition to verify the lemma.
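The recursion in the proof is easy to run; the sketch below (our own helper name) checks it against the closed-form Catalan numbers:

```python
from math import comb

def catalan_via_recursion(m):
    # C_n = sum_{k=0}^{n-1} C_k C_{n-1-k}, C_0 = 1  (backtracking-path recursion)
    C = [1]
    for n in range(1, m + 1):
        C.append(sum(C[k] * C[n - 1 - k] for k in range(n)))
    return C

cats = catalan_via_recursion(10)
```

The first few values are $1, 1, 2, 5, 14, \ldots$, matching $\binom{2n}{n}/(n+1)$.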

The combinatorics of label-assignments in step b) for backtracking paths of length $2k$ is $N(N-1)\cdots(N-k) \approx N^{k+1}$ for any fixed $k$ in the $N\to\infty$ limit. It is easy to see that among all paths of length $2k$, the backtracking paths support the maximal number, $k+1$, of independent indices. The combinatorics of all non-backtracking paths is at most $CN^k$, i.e., smaller by a factor $1/N$, so they are negligible. Hence $\mathbb{E}\operatorname{Tr} H^{2k}$ can be computed fairly precisely for each finite $k$:

\[
\frac{1}{N}\,\mathbb{E}\operatorname{Tr} H^{2k} = \frac{1}{k+1}\binom{2k}{k} + O_k(N^{-1}).
\tag{3.6}
\]

Note that the number of independent labels, $N^{k+1}$, after dividing by $N$, exactly cancels the size of the $k$-fold product of variances, $(\mathbb{E}|h|^2)^k = N^{-k}$.

The expectation of the traces of odd powers of $H$ is negligible since they can never satisfy the pairing condition. One can easily check that
\[
\int_{\mathbb{R}} x^{2k}\,\varrho_{sc}(x)\,\mathrm{d}x = C_k,
\tag{3.7}
\]

i.e., the semicircle law is identified as the probability measure on $\mathbb{R}$ whose even moments are the Catalan numbers and whose odd moments vanish. This proves that
\[
\frac{1}{N}\,\mathbb{E}\operatorname{Tr} P(H) \to \int_{\mathbb{R}} P(x)\,\varrho_{sc}(x)\,\mathrm{d}x
\tag{3.8}
\]

for any polynomial P . With standard approximation arguments, the polynomials can be replaced with anycontinuous functions with compact support and even with characteristic functions of intervals. This proves


the semicircle law (3.1) for Wigner matrices in the “expectation sense”. To rigorously complete the proof of the semicircle law (3.1) without the expectation, we also need to control the variance of $\operatorname{Tr} P(H)$ on the left hand side of (3.8). Although this can also be done by the moment method, we will not go into further details here; see [8] for a detailed proof.

Finally, we also remark that there are index sequences in (3.3) that require computing higher than second moments of $h$. While the combinatorics of these terms is negligible, the proof above assumed polynomial decay (2.8), i.e., that all rescaled moments of $h$ are finite, $\mathbb{E}|h_{ij}|^m \leq C_m N^{-m/2}$. This condition can be relaxed by a cutoff argument, which we skip here; the interested reader should consult [8] for more details.
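The moment computation (3.6) can also be checked by simulation. The sketch below (with our own parameters) estimates $\frac{1}{N}\mathbb{E}\operatorname{Tr}H^{2k}$ for $k = 1, \ldots, 4$; since the error term in (3.6) is only $O_k(N^{-1})$, the tolerance is deliberately loose:

```python
import numpy as np
from math import comb

rng = np.random.default_rng(3)
N, samples = 300, 40
moments = np.zeros(4)
for _ in range(samples):
    A = rng.normal(size=(N, N))
    eig = np.linalg.eigvalsh((A + A.T) / np.sqrt(2 * N))
    # (1/N) Tr H^{2k} = mean of eig^{2k}, for k = 1..4
    moments += np.array([np.mean(eig ** (2 * k)) for k in range(1, 5)])
moments /= samples
catalan = np.array([comb(2 * k, k) / (k + 1) for k in range(1, 5)])  # 1, 2, 5, 14
```

The estimated moments approach the Catalan numbers as $N$ grows.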

3.3 The resolvent method and the Stieltjes transform

An alternative approach to the eigenvalue density is via the Stieltjes transform. The empirical measure of the eigenvalues is defined by
\[
\varrho_N(\mathrm{d}x) := \frac{1}{N}\sum_{j=1}^N \delta(x - \lambda_j)\,\mathrm{d}x.
\]

The Stieltjes transform of the empirical measure is defined by
\[
m(z) = m_N(z) := \frac{1}{N}\operatorname{Tr}\frac{1}{H - z} = \frac{1}{N}\sum_{j=1}^N \frac{1}{\lambda_j - z} = \int_{\mathbb{R}} \frac{\mathrm{d}\varrho_N(x)}{x - z}
\tag{3.9}
\]

for any $z = E + i\eta$, $E \in \mathbb{R}$, $\eta > 0$. Notice that $m$ is simply the normalized trace of the resolvent of the random matrix $H$ with spectral parameter $z$. The real part $E = \operatorname{Re} z$ will often be referred to as the “energy”, alluding to the quantum mechanical interpretation of the spectrum of $H$. An important property of the Stieltjes transform of any measure on $\mathbb{R}$ is that its imaginary part is positive whenever $\operatorname{Im} z > 0$.

For large $z$ one can expand $m_N$ as follows:
\[
m_N(z) = \frac{1}{N}\operatorname{Tr}\frac{1}{H - z} = -\frac{1}{Nz}\sum_{m=0}^{\infty} \operatorname{Tr}\Big(\frac{H}{z}\Big)^m,
\tag{3.10}
\]

so after taking the expectation, using (3.6) and neglecting the error terms, we get
\[
\mathbb{E}\,m_N(z) \approx -\sum_{k=0}^{\infty} \frac{1}{k+1}\binom{2k}{k}\, z^{-(2k+1)},
\tag{3.11}
\]

which, after some calculus, can be identified as the Laurent series of $\frac{1}{2}(-z + \sqrt{z^2 - 4})$. The approximation becomes exact in the $N\to\infty$ limit. Although the expansion (3.10) is valid only for large $z$, given that the limit is an analytic function of $z$, one can extend the relation
\[
\lim_{N\to\infty} \mathbb{E}\,m_N(z) = \frac{1}{2}\big(-z + \sqrt{z^2 - 4}\big)
\tag{3.12}
\]

by analytic continuation to the whole upper half plane $z = E + i\eta$, $\eta > 0$. It is an easy exercise to see that this is exactly the Stieltjes transform of the semicircle density, i.e.,
\[
m_{sc}(z) := \frac{1}{2}\big(-z + \sqrt{z^2 - 4}\big) = \int_{\mathbb{R}} \frac{\varrho_{sc}(x)\,\mathrm{d}x}{x - z}.
\tag{3.13}
\]

The square root function is chosen with a branch cut on the segment $[-2, 2]$ so that $\sqrt{z^2 - 4} \sim z$ at infinity. This guarantees that $\operatorname{Im} m_{sc}(z) > 0$ for $\operatorname{Im} z > 0$. Since the Stieltjes transform identifies the measure uniquely, and pointwise convergence of Stieltjes transforms implies weak convergence of measures, we obtain
\[
\mathbb{E}\,\varrho_N(\mathrm{d}x) \rightharpoonup \varrho_{sc}(x)\,\mathrm{d}x.
\tag{3.14}
\]


The relation (3.12) actually holds with high probability, i.e., for any $z$ with $\operatorname{Im} z > 0$,
\[
\lim_{N\to\infty} m_N(z) = \frac{1}{2}\big(-z + \sqrt{z^2 - 4}\big),
\tag{3.15}
\]

in probability. In the next sections we will prove this limit with an effective error term via the resolvent method.

The semicircle law can be identified in many different ways. The moment method in Section 3.2 utilized the fact that the moments of the semicircle density are given by the Catalan numbers (3.7), which also emerged as the normalized traces of powers of $H$, see (3.6). The resolvent method relies on the fact that $m_N$ approximately satisfies a self-consistent equation, $m_N \approx -(z + m_N)^{-1}$, which is very close to the quadratic equation that $m_{sc}$ from (3.13) satisfies:
\[
m_{sc}(z) = -\frac{1}{z + m_{sc}(z)}.
\]

In other words, in the resolvent method the semicircle density emerges via a specific relation for its Stieltjes transform. It turns out that this approach allows for a much more precise analysis, especially in the short scale regime, where $\operatorname{Im} z$ approaches $0$ as a function of $N$. Since the Stieltjes transform of a measure at spectral parameter $z = E + i\eta$ essentially identifies the measure around $E$ on scale $\eta > 0$, a precise understanding of $m_N(z)$ for small $\operatorname{Im} z$ will yield a local version of the semicircle law. This will be explained in Section 6.
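The convergence (3.15) can be observed directly. In the sketch below (our own parameters), the principal branch of the complex square root is used; for the chosen spectral parameter with $\operatorname{Re} z > 0$ in the upper half plane it coincides with the branch satisfying $\sqrt{z^2-4} \sim z$ at infinity:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 2000
A = rng.normal(size=(N, N))
eig = np.linalg.eigvalsh((A + A.T) / np.sqrt(2 * N))

def m_sc(z):
    # semicircle Stieltjes transform; the principal square root has the
    # right branch here since Re z > 0 and Im z > 0
    return (-z + np.sqrt(z * z - 4 + 0j)) / 2

z = 0.5 + 0.1j
m_N = np.mean(1.0 / (eig - z))
```

The empirical transform `m_N` agrees with `m_sc(z)` up to an error of order $1/(N\eta)$.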


4 Invariant ensembles

4.1 Joint density of eigenvalues for invariant ensembles

There is another natural way to define probability distributions on real symmetric or complex Hermitian matrices apart from directly imposing a given probability law $\nu$ on their entries. They are obtained by defining a density function directly on the set of matrices:

\[
\mathbb{P}(H)\,\mathrm{d}H := Z^{-1}\exp\big(-N\operatorname{Tr} V(H)\big)\,\mathrm{d}H.
\tag{4.1}
\]

Here $\mathrm{d}H = \prod_{i\leq j}\mathrm{d}H_{ij}$ is the flat Lebesgue measure (in the case of complex Hermitian matrices and $i < j$, $\mathrm{d}H_{ij}$ is the Lebesgue measure on the complex plane $\mathbb{C}$). The function $V : \mathbb{R} \to \mathbb{R}$ is assumed to grow mildly at infinity (some logarithmic growth would suffice) to ensure that the measure defined in (4.2) is finite, and $Z$ is the normalization factor. Probability distributions of the form (4.1) are called invariant ensembles since they are invariant under orthogonal or unitary conjugation (in the case of symmetric or Hermitian matrices, respectively). For example, in the Hermitian case, for any fixed unitary matrix $U$, the transformation

H → U∗HU

leaves the distribution (4.1) invariant thanks to TrV (U∗HU) = TrV (H).

Wigner matrices and invariant ensembles form two different universes, with quite different mathematical tools available for their study. In fact, these two classes are almost disjoint, the Gaussian ensembles being the only invariant Wigner matrices. This is the content of the following lemma:

Lemma 4.1 ([37] or Theorem 2.6.3 [105]). Suppose that the real symmetric or complex Hermitian matrix ensemble given in (4.1) has independent entries $h_{ij}$, $i \leq j$. Then $V(x)$ is a quadratic polynomial, $V(x) = ax^2 + bx + c$ with $a > 0$. This means that apart from a trivial shift and normalization, the ensemble is GOE or GUE.

For ensembles that remain invariant under the transformations $H \to U^* H U$ for any unitary matrix $U$ (or, in the case of symmetric matrices $H$, for any orthogonal matrix $U$), the joint probability density function of all the $N$ eigenvalues can be explicitly computed. These ensembles are typically given by

\[
\mathbb{P}(H)\,\mathrm{d}H = Z^{-1}\exp\Big(-\frac{\beta}{2}\, N \operatorname{Tr} V(H)\Big)\,\mathrm{d}H,
\tag{4.2}
\]
which is of the form (4.1) with a traditional extra factor $\beta/2$ that makes some later formulas nicer. The parameter $\beta$ is determined by the symmetry type: $\beta = 1$ for real symmetric ensembles and $\beta = 2$ for complex Hermitian ensembles.

The joint (symmetrized) probability density of the eigenvalues of $H$ can be computed explicitly:
\[
p_N(\lambda_1, \lambda_2, \ldots, \lambda_N) = \mathrm{const.}\prod_{i<j}(\lambda_i - \lambda_j)^{\beta}\, e^{-\frac{\beta}{2}N\sum_{j=1}^N V(\lambda_j)}.
\tag{4.3}
\]

In particular, for the Gaussian case, $V(\lambda) = \frac{1}{2}\lambda^2$ is quadratic and thus the joint distribution of the GOE ($\beta = 1$) and GUE ($\beta = 2$) eigenvalues is given by
\[
p_N(\lambda_1, \lambda_2, \ldots, \lambda_N) = \mathrm{const.}\prod_{i<j}(\lambda_i - \lambda_j)^{\beta}\, e^{-\frac{1}{4}\beta N \sum_{j=1}^N \lambda_j^2}.
\tag{4.4}
\]

In particular, the eigenvalues are strongly correlated. (In this section we neglect the ordering of the eigenvalues and we will consider symmetrized statistics.)

The emergence of the Vandermonde determinant in (4.3) is a result of integrating out the “angle” variables in (4.2), i.e., the unitary matrix in the diagonalization of $H = U\Lambda U^*$. For illustration, we now show this


formula for a $2\times 2$ matrix. Consider first the real case. By diagonalization, any real symmetric $2\times 2$ matrix can be written in the form

\[
H = \begin{pmatrix} x & z \\ z & y \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}, \qquad x, y, z \in \mathbb{R}.
\tag{4.5}
\]

Direct calculation shows that the Jacobian of the coordinate transformation from $(x, y, z)$ to $(\lambda_1, \lambda_2, \theta)$ is
\[
\lambda_1 - \lambda_2.
\tag{4.6}
\]
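The Jacobian (4.6) can be verified by finite differences; the map below is read off from (4.5), and the evaluation point is an arbitrary choice of ours:

```python
import numpy as np

def entries(l1, l2, th):
    # (x, y, z) of H = R(th) diag(l1, l2) R(th)^T as in (4.5)
    c, s = np.cos(th), np.sin(th)
    return np.array([l1 * c**2 + l2 * s**2,
                     l1 * s**2 + l2 * c**2,
                     (l1 - l2) * c * s])

p = np.array([1.3, -0.7, 0.4])          # (l1, l2, th)
h = 1e-6
# central-difference Jacobian of (x, y, z) w.r.t. (l1, l2, th)
J = np.column_stack([(entries(*(p + h * e)) - entries(*(p - h * e))) / (2 * h)
                     for e in np.eye(3)])
```

The determinant of `J` equals $\lambda_1 - \lambda_2 = 2$ at this point, up to discretization error.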

The complex case is slightly more complicated. We can write, with $z = u + iv$ and $x, y \in \mathbb{R}$,
\[
H = \begin{pmatrix} x & z \\ \bar z & y \end{pmatrix} = e^{iA}\begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}e^{-iA},
\tag{4.7}
\]
where $A$ is a Hermitian matrix with trace zero, thus it can be written in the form
\[
A = \begin{pmatrix} a & b + ic \\ b - ic & -a \end{pmatrix}, \qquad a, b, c \in \mathbb{R}.
\tag{4.8}
\]

This parametrization of $SU(2)$ with three real degrees of freedom is standard, but for our purpose we only need two of them in (4.7), in addition to the two degrees of freedom from the $\lambda$'s. The reason is that the two phases of the eigenvectors are redundant, while the trace zero condition only takes out one degree of freedom, leaving one superfluous parameter. We will see that eventually $a$ plays no role. First we evaluate the Jacobian at $A = 0$ from the formula (4.7); we only need to keep the leading order in $A$, which gives
\[
\begin{pmatrix} x & z \\ \bar z & y \end{pmatrix} = \Lambda + i[A, \Lambda] + O(\|A\|^2), \qquad \Lambda = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}.
\tag{4.9}
\]

Thus
\[
\begin{pmatrix} x & z \\ \bar z & y \end{pmatrix} = \begin{pmatrix} \lambda_1 & i(b + ic)(\lambda_2 - \lambda_1) \\ -i(b - ic)(\lambda_2 - \lambda_1) & \lambda_2 \end{pmatrix} + O(\|A\|^2),
\tag{4.10}
\]

and the Jacobian of the transformation from (x, y, z) to (λ1, λ2, b, c) at b = c = 0 is of the form

C(λ2 − λ1)2

1 0 0 00 1 0 00 0 i −i0 0 −1 −1

= C(λ1 − λ2)2 (4.11)

with some constant $C$. To compute the Jacobian not at the identity, we first notice that by rotation invariance the measure factorizes. This means that its density with respect to the Lebesgue measure can be written in the form $f(\Lambda)g(U)$ with some functions $f$ and $g$; in fact $g(U)$ is constant (the marginal on the unitary part is the Haar measure). The function $f$ may be computed at any point, in particular at $U = I$; this was the calculation (4.11), yielding $f(\Lambda) = C(\lambda_1 - \lambda_2)^2$. This proves (4.4) for $N = 2$, modulo the case of a multiple eigenvalue $\lambda_1 = \lambda_2$, where the parametrization of $U$ is even more redundant. But this set has zero measure, so it is negligible (see [8] for a precise argument). The formula (4.9) is the basis for the proof for general $N$, which we leave to the reader. The detailed proof can be found in [8] or [37].

It is often useful to think of the measure (4.4) as a Gibbs measure on $N$ “particles” or “points” $\boldsymbol\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_N)$ of the form
\[
\mu_N(\mathrm{d}\boldsymbol\lambda) = p_N(\boldsymbol\lambda)\,\mathrm{d}\boldsymbol\lambda = \frac{e^{-\beta N \mathcal{H}(\boldsymbol\lambda)}}{Z}, \qquad \mathcal{H}(\boldsymbol\lambda) := \frac{1}{2}\sum_{i=1}^N V(\lambda_i) - \frac{1}{N}\sum_{i<j}\log|\lambda_j - \lambda_i|
\tag{4.12}
\]

with the confining potential $V(\lambda)$ and a logarithmic interaction. (This connection was exploited first in [46].) We adopt the standard convention in random matrix theory that the Hamiltonian $\mathcal{H}$ expresses energy per


particle, in contrast to the standard statistical physics terminology, where the “Hamiltonian” refers to the total energy. This explains the unusual $N$ factor in the exponent. Notice that the Vandermonde determinant in (4.3) comes directly from integrating out the Haar measure, while the symmetry type of the ensemble appears through the exponent $\beta$. Only the “classical” cases $\beta = 1, 2$ or $4$ correspond to matrix ensembles of the form (4.2), namely to the real symmetric, complex Hermitian and quaternion self-dual matrices. We will not give the precise definition of the latter (see, e.g., Chapter 7 of [105] or [64]); we just mention that it is the natural generalization of symmetric or Hermitian matrices to quaternion entries, and such matrices have real eigenvalues.

We remark that despite the convenience of the explicit formula (4.3) or (4.12) for the joint density, computing various statistics, such as correlation functions or even the density of a single eigenvalue, is a highly nontrivial task. For example, the density involves “only” integrating out all but one eigenvalue, but the measure is far from being a product, so these integrations cannot be performed directly when $N$ is large. The measure (4.12) has a strong and long range interaction, while conventional methods of statistical physics are well suited for short range interactions. In fact, from this point of view $\beta$ can be any positive number and does not have to be restricted to the specific values $\beta = 1, 2, 4$. For other values of $\beta$ there is no invariant matrix ensemble behind the measure (4.12), but it is still a very interesting statistical mechanical system, called the log gas or $\beta$-ensemble. If the potential $V$ is quadratic, then (4.12) coincides with (4.4) and is called the Gaussian $\beta$-ensemble. We will briefly discuss log gases in Section 18.3.

4.2 Universality of classical invariant ensembles via orthogonal polynomials

The classical values $\beta = 1, 2, 4$ in the ensemble (4.12) are specific not only because they originate from an invariant matrix ensemble (4.3). For these specific values an extra mathematical structure emerges, namely the orthogonal polynomials with respect to the weight function $e^{-NV(\lambda)}$ on the real line. Owing to this structure, much more is known about these ensembles than about log gases with general $\beta$. This approach was originally applied by Mehta and Gaudin [105, 106] to compute the gap distribution for the Gaussian case, which involved the classical Hermite orthonormal polynomials. Dyson [47] computed the local correlation functions for a related ensemble (the circular ensemble), and this was extended to the standard Gaussian ensembles by Mehta [104]. Later, a general method using orthogonal polynomials and the Riemann–Hilbert problem was developed to tackle a very general class of invariant ensembles (see, e.g., [20, 37, 40–42, 71, 105, 111] and references therein).

For simplicity, to illustrate the connection, we will consider the Hermitian case $\beta = 2$ with the Gaussian potential $V(\lambda) = \lambda^2/2$ (which, by Lemma 4.1, is also a Wigner matrix ensemble, namely the GUE). To simplify the presentation further, for the purpose of this subsection only, we rescale the eigenvalues:
\[
x = \sqrt{N}\lambda,
\tag{4.13}
\]

which effectively removes the factor $N$ from the exponent in (4.3). (This simple scaling works only in the pure Gaussian case, and it is only a technical convenience to simplify formulas.)

After the rescaling and setting $\beta = 2$, the measure we will consider is given by a density which we denote by
\[
p_N(x_1, x_2, \ldots, x_N) = \mathrm{const.}\prod_{i<j}(x_i - x_j)^2 \prod_{j=1}^N e^{-\frac{1}{2}x_j^2}.
\tag{4.14}
\]

Let $P_k(x)$ be the $k$-th orthogonal polynomial on $\mathbb{R}$ with respect to the weight function $e^{-x^2/2}$, with leading coefficient 1. Let
\[
\psi_k(x) := \frac{e^{-x^2/4}\,P_k(x)}{\big\| e^{-x^2/4}\,P_k \big\|_{L^2(\mathbb{R})}}
\]
be the corresponding orthonormal function, i.e.,
\[
\int_{\mathbb{R}} \psi_k(x)\psi_\ell(x)\,\mathrm{d}x = \delta_{k,\ell}.
\tag{4.15}
\]


In the particular case of the Gaussian weight function, $P_k$ is given by the Hermite polynomials
\[
P_k(x) = H_k(x) := (-1)^k e^{x^2/2}\frac{\mathrm{d}^k}{\mathrm{d}x^k}e^{-x^2/2},
\]
and
\[
\psi_k(x) = \frac{P_k(x)}{(2\pi)^{1/4}(k!)^{1/2}}\,e^{-x^2/4},
\tag{4.16}
\]
but for the following discussion we will not need these explicit formulae, only the orthonormality relation (4.15).
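The formula (4.16) can be tested numerically with NumPy's probabilists' Hermite module, whose convention ($He_k$, weight $e^{-x^2/2}$) matches the polynomials used here; the grid below is our own choice:

```python
import numpy as np
from math import factorial, pi
from numpy.polynomial import hermite_e as He   # probabilists' Hermite He_k

def psi(k, x):
    # psi_k(x) = e^{-x^2/4} He_k(x) / ((2 pi)^{1/4} (k!)^{1/2}), cf. (4.16)
    return (He.hermeval(x, [0.0] * k + [1.0]) * np.exp(-x**2 / 4)
            / ((2 * pi)**0.25 * factorial(k)**0.5))

x = np.linspace(-20.0, 20.0, 80001)
dx = x[1] - x[0]
norm33 = np.sum(psi(3, x)**2) * dx            # should be 1
inner23 = np.sum(psi(2, x) * psi(3, x)) * dx  # should be 0
```

The grid sums reproduce the orthonormality relation (4.15) to high accuracy.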

The key observation is that, by simple properties of the Vandermonde determinant, we have
\[
\Delta_N(x) = \prod_{1\leq i<j\leq N}(x_j - x_i) = \det\big(x_i^{j-1}\big)_{i,j=1}^N = \det\big(P_{j-1}(x_i)\big)_{i,j=1}^N,
\tag{4.17}
\]

by exploiting that $P_j(x) = x^j + \ldots$ is a polynomial of degree $j$ with leading coefficient equal to one. Define the kernel
\[
K_N(x, y) := \sum_{k=0}^{N-1}\psi_k(x)\psi_k(y),
\tag{4.18}
\]

i.e., the projection kernel onto the subspace spanned by the first $N$ orthonormal functions. Then (4.17) immediately implies

\begin{align*}
p_N(x_1, \ldots, x_N) &= C_N\Big[\det\big(P_{j-1}(x_i)\big)_{i,j=1}^N\Big]^2\prod_{i=1}^N e^{-x_i^2/2}
\tag{4.19}\\
&= C_N'\Big[\det\big(\psi_{j-1}(x_i)\big)_{i,j=1}^N\Big]^2 = C_N'\det\big(K_N(x_i, x_j)\big)_{i,j=1}^N,
\end{align*}
where in the last step we used that the product of the matrices $\big(\psi_{j-1}(x_i)\big)_{i,j=1}^N$ and $\big(\psi_{j-1}(x_k)\big)_{j,k=1}^N$ is exactly $\big(K_N(x_i, x_k)\big)_{i,k=1}^N$; we did not keep track of the precise constants, for simplicity. The determinantal structure (4.19) of the joint density function is remarkable: it allows one to describe a function of $N$ variables in terms of a kernel of two variables only. This structure greatly simplifies the analysis.

We now define the correlation functions, which play a crucial role in describing universality.

Definition 4.2. Let $p_N(\lambda_1, \lambda_2, \ldots, \lambda_N)$ be the joint symmetrized probability distribution of the eigenvalues (without the rescaling (4.13)). For any $n \geq 1$, the $n$-point correlation function is defined by
\[
p_N^{(n)}(\lambda_1, \lambda_2, \ldots, \lambda_n) := \int_{\mathbb{R}^{N-n}} p_N(\lambda_1, \ldots, \lambda_n, \lambda_{n+1}, \ldots, \lambda_N)\,\mathrm{d}\lambda_{n+1}\cdots\mathrm{d}\lambda_N.
\tag{4.20}
\]
We also define the correlation functions in the rescaled variables, denoted by $\bar p_N^{(n)}(x_1, x_2, \ldots, x_n)$, in a similar way, starting from the rescaled density (4.14).

Remark. In other sections of this book we usually label the eigenvalues in increasing order, so that their probability density, denoted here by $\tilde p_N(\boldsymbol\lambda)$, is defined on the set
\[
\Xi^{(N)} := \{\lambda_1 < \lambda_2 < \ldots < \lambda_N\} \subset \mathbb{R}^N.
\]
For the purpose of Definition 4.2, however, we drop this restriction and consider $p_N(\lambda_1, \lambda_2, \ldots, \lambda_N)$ to be a symmetric function of the $N$ variables $\boldsymbol\lambda = (\lambda_1, \ldots, \lambda_N)$ on $\mathbb{R}^N$. The relation between the ordered and unordered densities is clearly $\tilde p_N(\boldsymbol\lambda) = N!\,p_N(\boldsymbol\lambda)\cdot\mathbf{1}(\boldsymbol\lambda \in \Xi^{(N)})$.

The significance of the correlation functions is that with their help one can compute the expectation value of any symmetrized observable. For example, for any test function $O$ of two variables we have, directly from the definition of the correlation functions, that
\[
\frac{1}{N(N-1)}\,\mathbb{E}\sum_{i\neq j} O(\lambda_i, \lambda_j) = \int_{\mathbb{R}\times\mathbb{R}} O(\lambda_1, \lambda_2)\, p_N^{(2)}(\lambda_1, \lambda_2)\,\mathrm{d}\lambda_1\mathrm{d}\lambda_2,
\tag{4.21}
\]


where the expectation is with respect to the probability density $p_N$, or, in this case, with respect to the original random matrix ensemble. A similar formula holds for observables of any number of variables.

To compute the correlation functions of a determinantal joint density (4.19), we start with the following prototype calculation for $N = 3$, $n = 2$:
\begin{align*}
\int_{\mathbb{R}}\mathrm{d}x_3\, &\det\begin{pmatrix} K_3(x_1,x_1) & K_3(x_1,x_2) & K_3(x_1,x_3) \\ K_3(x_2,x_1) & K_3(x_2,x_2) & K_3(x_2,x_3) \\ K_3(x_3,x_1) & K_3(x_3,x_2) & K_3(x_3,x_3) \end{pmatrix}
\tag{4.22}\\
&= \int_{\mathbb{R}}\mathrm{d}x_3\,\det\begin{bmatrix} K_3(x_2,x_1) & K_3(x_2,x_2) \\ K_3(x_3,x_1) & K_3(x_3,x_2) \end{bmatrix} K_3(x_1,x_3)\\
&\quad - \int_{\mathbb{R}}\mathrm{d}x_3\,\det\begin{bmatrix} K_3(x_1,x_1) & K_3(x_1,x_2) \\ K_3(x_3,x_1) & K_3(x_3,x_2) \end{bmatrix} K_3(x_2,x_3)\\
&\quad + \int_{\mathbb{R}}\mathrm{d}x_3\,\det\begin{bmatrix} K_3(x_1,x_1) & K_3(x_1,x_2) \\ K_3(x_2,x_1) & K_3(x_2,x_2) \end{bmatrix} K_3(x_3,x_3).
\end{align*}

From the definition (4.18) and the orthonormality of the $\psi$'s, we have the reproducing property
\[
\int_{\mathbb{R}}\mathrm{d}y\, K_N(x, y) K_N(y, z) = K_N(x, z)
\tag{4.23}
\]
and the normalization
\[
\int_{\mathbb{R}}\mathrm{d}x\, K_N(x, x) = N.
\tag{4.24}
\]
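Both (4.23) and (4.24) reduce to the statement that the Gram matrix of $\psi_0, \ldots, \psi_{N-1}$ is the identity, since $K_N(x,y) = \sum_k \psi_k(x)\psi_k(y)$. The sketch below (our own grid and the choice $N = 8$) checks this:

```python
import numpy as np
from math import factorial, pi
from numpy.polynomial import hermite_e as He

n_max = 8
x = np.linspace(-12.0, 12.0, 4001)
dx = x[1] - x[0]
# P[k, j] = psi_k(x_j), the orthonormal Hermite functions of (4.16)
P = np.array([
    He.hermeval(x, [0.0] * k + [1.0]) * np.exp(-x**2 / 4)
    / ((2 * pi)**0.25 * factorial(k)**0.5)
    for k in range(n_max)
])
G = (P * dx) @ P.T        # Gram matrix: G_kl ~ integral of psi_k psi_l
```

Writing $P$ for the matrix of values $\psi_k(x_j)$, the kernel on the grid is $K_N = P^{\top}P$, so $G \approx I$ gives $\int K_N(x,y)K_N(y,z)\,\mathrm{d}y \approx K_N(x,z)$, while $\int K_N(x,x)\,\mathrm{d}x = \operatorname{tr} G \approx N$.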

Thus (4.22) equals
\begin{align*}
&\det\begin{bmatrix} K_3(x_2,x_1) & K_3(x_2,x_2) \\ K_3(x_1,x_1) & K_3(x_1,x_2) \end{bmatrix}
- \det\begin{bmatrix} K_3(x_1,x_1) & K_3(x_1,x_2) \\ K_3(x_2,x_1) & K_3(x_2,x_2) \end{bmatrix}
+ 3\det\begin{bmatrix} K_3(x_1,x_1) & K_3(x_1,x_2) \\ K_3(x_2,x_1) & K_3(x_2,x_2) \end{bmatrix}
\tag{4.25}\\
&= \det\begin{bmatrix} K_3(x_1,x_1) & K_3(x_1,x_2) \\ K_3(x_2,x_1) & K_3(x_2,x_2) \end{bmatrix}.
\tag{4.26}
\end{align*}

It is easy to generalize this computation to get
\[
\bar p_N^{(n)}(x_1, \ldots, x_n) = \frac{(N-n)!}{N!}\,\det\big[K_N(x_i, x_j)\big]_{i,j=1}^n,
\tag{4.27}
\]

i.e., the correlation functions continue to have a determinantal structure. Here the constant is obtained from the normalization condition that $\bar p_N^{(n)}$ is a probability density. Thus we have an explicit formula for the correlation functions in terms of the kernel $K_N$. We note that this structure is very general: it is not restricted to Hermite polynomials, it only requires a system of orthogonal polynomials.

To understand the behavior of $K_N$, we first recall a basic algebraic property of the orthogonal polynomials, the Christoffel–Darboux formula:

\[
K_N(x, y) = \sum_{j=0}^{N-1}\psi_j(x)\psi_j(y) = \sqrt{N}\,\frac{\psi_N(x)\psi_{N-1}(y) - \psi_N(y)\psi_{N-1}(x)}{x - y}.
\tag{4.28}
\]

Since the Hermite polynomials and the orthonormal functions $\psi_N$ differ only by an exponential factor (4.16), and these factors in $\psi(x)\psi(y)$ cancel on both sides of (4.28), the identity (4.28) is just a property of the Hermite polynomials. We now sketch a proof of this identity. Multiplying both sides by $(x - y)$, we need to prove that

\[
\sum_{j=0}^{N-1}\psi_j(x)\psi_j(y)(x - y) = \sqrt{N}\,\big[\psi_N(x)\psi_{N-1}(y) - \psi_N(y)\psi_{N-1}(x)\big].
\tag{4.29}
\]


The left side is a polynomial of degree $N$ in $x$ and degree $N$ in $y$, up to common exponential factors. The multiplication by $x$ or $y$ can be computed by the following identity:
\[
x\,\psi_j(x) = \sqrt{j+1}\,\psi_{j+1}(x) + \sqrt{j}\,\psi_{j-1}(x),
\tag{4.30}
\]

which directly follows from the “three-term” relation for the Hermite polynomials. Collecting all the terms generated in this way, we obtain the right hand side of (4.29). Details can be found in Lemma 3.2.7 of [8].
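The identity (4.30) is exact and can be confirmed pointwise (here for $j = 5$, an arbitrary choice of ours):

```python
import numpy as np
from math import factorial, pi
from numpy.polynomial import hermite_e as He

def psi(k, x):
    # orthonormal Hermite function of (4.16)
    return (He.hermeval(x, [0.0] * k + [1.0]) * np.exp(-x**2 / 4)
            / ((2 * pi)**0.25 * factorial(k)**0.5))

x = np.linspace(-6.0, 6.0, 1001)
j = 5
lhs = x * psi(j, x)
rhs = np.sqrt(j + 1) * psi(j + 1, x) + np.sqrt(j) * psi(j - 1, x)
```

The two sides agree to machine precision on the whole grid.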

It is well known that orthogonal polynomials of high degree have precise asymptotic behavior (the Plancherel–Rotach asymptotics). For the Hermite orthonormal functions $\psi$ these formulas read as follows:

\[
\psi_{2m}(x) = \frac{(-1)^m}{N^{1/4}\sqrt{\pi}}\cos\big(\sqrt{N}x\big) + o(N^{-1/4}), \qquad
\psi_{2m+1}(x) = \frac{(-1)^m}{N^{1/4}\sqrt{\pi}}\sin\big(\sqrt{N}x\big) + o(N^{-1/4}),
\tag{4.31}
\]

as $N \to \infty$, for any $m$ such that $|2m - N| \leq C$. The approximation is uniform for $|x| \leq CN^{-1/2}$. We can thus compute that

\[
K_N(x, y) \approx \frac{1}{\pi}\,\frac{\sin(\sqrt{N}x)\cos(\sqrt{N}y) - \sin(\sqrt{N}y)\cos(\sqrt{N}x)}{x - y} = \frac{\sin\sqrt{N}(x - y)}{\pi(x - y)},
\tag{4.32}
\]
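The asymptotics (4.32) can be observed by evaluating the Christoffel–Darboux formula (4.28) exactly, generating $\psi_0, \ldots, \psi_N$ through the recurrence implied by (4.30); the values of $N$, $u$, $v$ below are arbitrary choices of ours:

```python
import numpy as np

def psi_list(nmax, x):
    # psi_0..psi_nmax via psi_{j+1} = (x psi_j - sqrt(j) psi_{j-1})/sqrt(j+1),
    # which is (4.30) rearranged; psi_0 = (2 pi)^{-1/4} e^{-x^2/4}
    out = [(2 * np.pi) ** (-0.25) * np.exp(-x**2 / 4)]
    out.append(x * out[0])
    for j in range(1, nmax):
        out.append((x * out[j] - np.sqrt(j) * out[j - 1]) / np.sqrt(j + 1))
    return out

N = 1000
pts = np.array([0.5, -0.7]) / np.sqrt(N)     # x, y of order N^{-1/2}
p = psi_list(N, pts)
x, y = pts
# exact Christoffel-Darboux kernel (4.28) vs. the sine-kernel approximation
K_cd = np.sqrt(N) * (p[N][0] * p[N - 1][1] - p[N][1] * p[N - 1][0]) / (x - y)
K_sine = np.sin(np.sqrt(N) * (x - y)) / (np.pi * (x - y))
```

At $N = 1000$ the two values already agree to within about a percent.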

i.e., the celebrated sine kernel emerged [47, 104].

To rewrite this formula into a canonical form, recall that we have performed the rescaling (4.13), $\lambda = x/\sqrt{N}$, where $\lambda$ is the original variable. The two-point function in the original variables is denoted by $p_N^{(2)}$ and in the rescaled variables by $\bar p_N^{(2)}$. The relation between $p_N^{(2)}$ and $\bar p_N^{(2)}$ is determined by
\[
p_N^{(2)}(\lambda_1, \lambda_2)\,\mathrm{d}\lambda_1\mathrm{d}\lambda_2 = \bar p_N^{(2)}(x_1, x_2)\,\mathrm{d}x_1\mathrm{d}x_2,
\tag{4.33}
\]
and thus we have
\[
p_N^{(2)}(\lambda_1, \lambda_2) = N\,\bar p_N^{(2)}\big(\sqrt{N}\lambda_1, \sqrt{N}\lambda_2\big).
\tag{4.34}
\]

Now we introduce another rescaling of the eigenvalues that rescales the typical gap between them to order 1. We set
\[
\lambda_j = \frac{a_j}{\varrho_{sc}(0)N}, \qquad \varrho_{sc}(0) = \frac{1}{\pi},
\tag{4.35}
\]

where $\varrho_{sc}$ is the semicircle density, see (3.1). Using the expression (4.20) for the correlation functions, we have, in terms of the original variables, that
\[
\frac{1}{[\varrho_{sc}(0)]^2}\,p_N^{(2)}\Big(\frac{a_1}{\varrho_{sc}(0)N},\, \frac{a_2}{\varrho_{sc}(0)N}\Big) = \det\begin{pmatrix} K^N_{11} & K^N_{12} \\ K^N_{21} & K^N_{22} \end{pmatrix},
\tag{4.36}
\]

where
\[
K^N_{ij} := \frac{1}{\varrho_{sc}(0)}\,\frac{1}{\sqrt{N-1}}\,K_N\Big(\frac{a_i}{\varrho_{sc}(0)\sqrt{N}},\, \frac{a_j}{\varrho_{sc}(0)\sqrt{N}}\Big) \to S(a_i - a_j), \qquad S(x) := \frac{\sin \pi x}{\pi x},
\tag{4.37}
\]

where we used (4.32). Due to the rescaling, this calculation reveals the correlation functions around $E = 0$. The general formula for any fixed energy $E$ in the bulk, i.e., $|E| < 2$, can be obtained similarly and is given by
\[
\frac{1}{[\varrho_{sc}(E)]^n}\,p_N^{(n)}\Big(E + \frac{\alpha_1}{N\varrho_{sc}(E)},\, E + \frac{\alpha_2}{N\varrho_{sc}(E)},\, \ldots,\, E + \frac{\alpha_n}{N\varrho_{sc}(E)}\Big) \rightharpoonup q^{(n)}_{\mathrm{GUE}}(\boldsymbol\alpha) := \det\big(S(\alpha_i - \alpha_j)\big)_{i,j=1}^n
\tag{4.38}
\]

as weak convergence of functions in the variables $\boldsymbol\alpha = (\alpha_1, \ldots, \alpha_n)$. Note that the limit is universal in the sense that it is independent of the energy $E$. Formula (4.38) holds for the GUE case; the corresponding expression for GOE is more involved [8, 105]:
\[
q^{(n)}_{\mathrm{GOE}}(\boldsymbol\alpha) := \det\big(K(\alpha_i - \alpha_j)\big)_{i,j=1}^n, \qquad
K(x) := \begin{pmatrix} S(x) & S'(x) \\ -\frac{1}{2}\operatorname{sgn}(x) + \int_0^x S(t)\,\mathrm{d}t & S(x) \end{pmatrix}.
\tag{4.39}
\]


Here the determinant is understood as the trace of the quaternion determinant after the canonical correspondence between quaternions $a\cdot 1 + b\cdot\mathrm{i} + c\cdot\mathrm{j} + d\cdot\mathrm{k}$, $a, b, c, d \in \mathbb{C}$, and $2\times 2$ complex matrices, given by
\[
1 \leftrightarrow \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
\mathrm{i} \leftrightarrow \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}, \qquad
\mathrm{j} \leftrightarrow \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \qquad
\mathrm{k} \leftrightarrow \begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix}.
\]

The main technical input is the refined asymptotic formulae (4.31) for the orthogonal polynomials. In the case of the classical orthogonal polynomials (appearing in the standard Gaussian Wigner and Wishart ensembles), they are usually obtained by Laplace asymptotics from an integral representation. For a general potential $V$ the corresponding analysis is quite involved and depends on the regularity properties of $V$. One successful approach was initiated by Fokas, Its and Kitaev [71] and by P. Deift and collaborators via the Riemann–Hilbert method, see [37] and references therein. An alternative method, using more direct techniques from the theory of orthogonal polynomials, was presented in [98, 101].

4.2.1 Edge universality: the Airy kernel

Near the spectral edges, i.e., for the energies E = ±2, a different scaling has to be used. Recall the formula

K_N(x, y) = \sqrt{N}\, \frac{\psi_N(x)\psi_{N-1}(y) - \psi_N(y)\psi_{N-1}(x)}{x - y}

in the rescaled variables x, y. We will need the following identity for the derivatives of the Hermite functions:

\psi_N'(x) = -\frac{x}{2}\,\psi_N(x) + \sqrt{N}\,\psi_{N-1}(x). \quad (4.40)

Thus we can rewrite

K_N(x, y) = \frac{\psi_N(x)\psi_N'(y) - \psi_N(y)\psi_N'(x)}{x - y} - \frac{1}{2}\,\psi_N(x)\psi_N(y). \quad (4.41)

Define a new rescaled function

\Psi_N(u) := N^{1/12}\, \psi_N\Big(2\sqrt{N} + \frac{u}{N^{1/6}}\Big). \quad (4.42)

The Plancherel-Rotach edge asymptotics for Ψ_N asserts that

\lim_{N\to\infty} |\Psi_N(z) - \mathrm{Ai}(z)| = 0 \quad (4.43)

in any compact domain in C, where Ai(x) is the Airy function, i.e.,

\mathrm{Ai}(x) = \frac{1}{\pi} \int_0^\infty \cos\Big(\frac{t^3}{3} + xt\Big)\,\mathrm{d}t.

It is well known that the Airy function is the solution of the second order differential equation y'' − xy = 0 with vanishing boundary condition at x = ∞.
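These defining properties can be checked numerically. The sketch below (our own illustration, relying on scipy.special.airy) verifies the differential equation by a central difference on Ai', and evaluates the Airy kernel A(u, v) defined just below, including its diagonal value A(u, u) = Ai'(u)² − u Ai(u)², which follows from the differential equation by l'Hôpital's rule.

```python
import numpy as np
from scipy.special import airy  # airy(x) returns (Ai, Ai', Bi, Bi')

def airy_ode_residual(x, h=1e-5):
    """Residual of y'' - x*y = 0 for y = Ai, with Ai'' approximated by a
    central difference of Ai'; should vanish up to discretization error."""
    ai = airy(x)[0]
    second = (airy(x + h)[1] - airy(x - h)[1]) / (2 * h)
    return second - x * ai

def airy_kernel(u, v):
    """Airy kernel A(u, v); on the diagonal we use the limit
    A(u, u) = Ai'(u)^2 - u * Ai(u)^2 (a consequence of Ai'' = x Ai)."""
    ai_u, aip_u = airy(u)[0], airy(u)[1]
    if u == v:
        return aip_u**2 - u * ai_u**2
    ai_v, aip_v = airy(v)[0], airy(v)[1]
    return (ai_u * aip_v - aip_u * ai_v) / (u - v)
```

The off-diagonal formula approaches the diagonal value continuously as v → u.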

We now define the Airy kernel by

A(u, v) := \frac{\mathrm{Ai}(u)\mathrm{Ai}'(v) - \mathrm{Ai}'(u)\mathrm{Ai}(v)}{u - v}\,.

Under the edge scaling (4.42), we have

N^{-1/6}\, K_N\Big(2\sqrt{N} + \frac{u}{N^{1/6}},\ 2\sqrt{N} + \frac{v}{N^{1/6}}\Big) \to A(u, v). \quad (4.44)

In these variables, we have, by (4.34), that

p_N^{(2)}\Big(2 + \frac{\alpha_1}{N^{2/3}},\ 2 + \frac{\alpha_2}{N^{2/3}}\Big) = N\, p_N^{(2)}\Big(2\sqrt{N} + \frac{\alpha_1}{N^{1/6}},\ 2\sqrt{N} + \frac{\alpha_2}{N^{1/6}}\Big).


By using (4.20), we can continue with

p_N^{(2)}\Big(2 + \frac{\alpha_1}{N^{2/3}},\ 2 + \frac{\alpha_2}{N^{2/3}}\Big) = N\, \frac{1}{N(N-1)} \det\Big[K_N\Big(2\sqrt{N} + \frac{\alpha_i}{N^{1/6}},\ 2\sqrt{N} + \frac{\alpha_j}{N^{1/6}}\Big)\Big]_{i,j=1}^2

\approx N^{-2/3} \det\Big[N^{-1/6} K_N\Big(2\sqrt{N} + \frac{\alpha_i}{N^{1/6}},\ 2\sqrt{N} + \frac{\alpha_j}{N^{1/6}}\Big)\Big]_{i,j=1}^2,

and similar formulas hold for any k-point correlation function. Using the limiting statement (4.44), in terms of the original variables, we obtain

N^{k/3}\, p_N^{(k)}\Big(2 + \frac{\alpha_1}{N^{2/3}},\ 2 + \frac{\alpha_2}{N^{2/3}},\ \ldots,\ 2 + \frac{\alpha_k}{N^{2/3}}\Big) \rightharpoonup \det\big(A(\alpha_i, \alpha_j)\big)_{i,j=1}^k \quad (4.45)

in a weak sense. In particular, the last formula with k = 2 implies, for any smooth test function O with compact support, that

\sum_{j\neq k} \mathbb{E}\, O\big(N^{2/3}(\lambda_j - 2),\ N^{2/3}(\lambda_k - 2)\big) = N(N-1)\, N^{-4/3} \int_{\mathbb{R}^2} \mathrm{d}\alpha_1 \mathrm{d}\alpha_2\, O(\alpha_1, \alpha_2)\, p_N^{(2)}\Big(2 + \frac{\alpha_1}{N^{2/3}},\ 2 + \frac{\alpha_2}{N^{2/3}}\Big)

\to \int_{\mathbb{R}^2} \mathrm{d}\alpha_1 \mathrm{d}\alpha_2\, O(\alpha_1, \alpha_2)\, \det\big(A(\alpha_i, \alpha_j)\big)_{i,j=1}^2\,. \quad (4.46)

A similar statement holds at the lower spectral edge, E = −2.


5 Universality for generalized Wigner matrices

5.1 Different notions of universality

The universality of eigenvalue statistics can be considered via several different notions of convergence which yield somewhat different concepts of universality. The local statistics can either be expressed in terms of local correlation functions rescaled around some energy E, or one may ask for the gap statistics for a gap λ_{j+1} − λ_j with a given (typically N-dependent) label j. These will be called fixed energy and fixed gap universality, and these two concepts do not coincide. To see this, notice that since eigenvalues fluctuate on a scale much larger than the typical eigenvalue spacing, the label j of the eigenvalue λ_j closest to a fixed energy E is not a deterministic function of E. Furthermore, one may look for cumulative statistics of gaps averaged over a mesoscopic scale, i.e., both concepts above have a natural averaged version. We now define these four concepts precisely.

The correlation functions and the gaps need to be rescaled by the limiting local density, ϱ(E), to get a universal limit. In the case of generalized Wigner matrices we have ϱ(E) = ϱ_sc(E), but the definitions below hold in a more general setup as well. We also remark that all results in this book hold for both real symmetric and complex Hermitian matrices. For simplicity of notation, we formulate all concepts and later state all results in terms of real symmetric matrices.

We recall the notation JA,BK := Z ∩ [A,B] for the set of integers between two real numbers A < B.

(i) Fixed energy universality (in the bulk): For any n ≥ 1, any smooth and compactly supported function F : R^n → R, and for any κ > 0, we have, uniformly in E ∈ [−2 + κ, 2 − κ],

\lim_{N\to\infty} \frac{1}{\varrho(E)^n} \int_{\mathbb{R}^n} \mathrm{d}\alpha\, F(\alpha)\, p_N^{(n)}\Big(E + \frac{\alpha}{N\varrho(E)}\Big) = \int_{\mathbb{R}^n} \mathrm{d}\alpha\, F(\alpha)\, q_{GOE}^{(n)}(\alpha)\,, \quad (5.1)

where α = (α_1, . . . , α_n). Here p^{(n)}_N is the n-point function of the matrix ensemble and q^{(n)}_GOE is the limiting n-point function of the GOE defined in (4.39). To shorten the argument of p^{(n)}_N, we used the convention that E + α = (E + α_1, . . . , E + α_n) for any α = (α_1, . . . , α_n) ∈ R^n.

(ii) Averaged energy universality (in the bulk, on scale N^{−1+ξ}): For any n ≥ 1, any smooth and compactly supported function F : R^n → R, for some 0 < ξ < 1 and for any κ > 0, we have, uniformly in E ∈ [−2 + κ, 2 − κ],

\lim_{N\to\infty} \frac{1}{\varrho(E)^n} \int_{E-b}^{E+b} \frac{\mathrm{d}x}{2b} \int_{\mathbb{R}^n} \mathrm{d}\alpha\, F(\alpha)\, p_N^{(n)}\Big(x + \frac{\alpha}{N\varrho(E)}\Big) = \int_{\mathbb{R}^n} \mathrm{d}\alpha\, F(\alpha)\, q_{GOE}^{(n)}(\alpha)\,, \quad (5.2)

where b = b_N := N^{-1+\xi}.

(iii) Fixed gap universality (in the bulk): Fix any positive number 0 < α < 1 and an integer n. For any smooth compactly supported function G : R^n → R and for any k, m ∈ JαN, (1 − α)NK we have

\lim_{N\to\infty} \Big| \mathbb{E}^{\mu_N} G\big(N\varrho(\lambda_k)(\lambda_k - \lambda_{k+1}), \ldots, N\varrho(\lambda_k)(\lambda_k - \lambda_{k+n})\big) \quad (5.3)

\qquad - \mathbb{E}^{GOE} G\big(N\varrho(\lambda_m)(\lambda_m - \lambda_{m+1}), \ldots, N\varrho(\lambda_m)(\lambda_m - \lambda_{m+n})\big) \Big| = 0,

where µN denotes the law of the random matrix ensemble under consideration.

(iv) Averaged gap universality (in the bulk, on scale N^{−1+ξ}): With the same notation as in (iii) and ℓ = N^ξ with 0 < ξ < 1, we have

\lim_{N\to\infty} \Big| \frac{1}{2\ell + 1} \sum_{j=k-\ell}^{k+\ell} \mathbb{E}^{\mu_N} G\big(N\varrho(\lambda_j)(\lambda_j - \lambda_{j+1}), \ldots, N\varrho(\lambda_j)(\lambda_j - \lambda_{j+n})\big) \quad (5.4)

\qquad - \mathbb{E}^{GOE} G\big(N\varrho(\lambda_m)(\lambda_m - \lambda_{m+1}), \ldots, N\varrho(\lambda_m)(\lambda_m - \lambda_{m+n})\big) \Big| = 0.


Note that in the bulk ℓ = N^ξ consecutive eigenvalues range over a scale N^{−1+ξ} in the spectrum, hence the name.

Although we have formulated the universality notions in terms of large N limits, all limit statements in this book have effective error bounds of the form N^{−c} for some c > 0. Among the four notions of universality stated here, fixed energy (resp. fixed gap) universality obviously implies averaged energy (resp. averaged gap) universality. However, fixed gap universality and fixed energy universality are not logical consequences of each other. On the other hand, under suitable conditions, averaged energy universality and averaged gap universality are equivalent. In Section 14 we will prove that the latter implies the former; this is the direction that we actually use in the proof. The opposite direction goes along similar arguments and will be omitted.

In this book, we will focus on establishing the averaged energy universality. From this, the averaged gap universality follows by the equivalence just mentioned. We now state precisely our representative universality theorem that will be proven in this book. It asserts averaged energy universality (in the bulk) for generalized Wigner matrices, where the scale of the energy average is reduced to N^{−1+ξ} for arbitrary ξ > 0.

For convenience we assume that the normalized matrix entries

\zeta_{ij} := \sqrt{N}\, h_{ij} \quad (5.5)

have a polynomial decay of arbitrarily high degree, i.e., for all p ∈ N there is a constant μ_p such that

\mathbb{E}|\zeta_{ij}|^p \le \mu_p \quad (5.6)

for all N, i, and j. Recall from (2.6) that s_{ij} ≍ N^{−1}, so this condition is equivalent to (2.8). We make this assumption in order to streamline notation, but in fact our results hold, with the same proof, provided (5.6) is valid for some large but fixed p. In fact, even p = 4 + ε is sufficient, see Section 18. On the other hand, if we strengthen the decay condition to uniform subexponential decay (2.7), then certain estimates concerning the local semicircle law (e.g., Theorem 6.7) become stronger, although we will not use them in this book (see [70]).

Theorem 5.1. Let H be an N × N generalized real symmetric Wigner matrix. Suppose that the rescaled matrix elements √N h_{ij} satisfy the decay condition (5.6). Then averaged energy universality holds in the sense of (5.2) on scale N^{−1+ξ} for any 0 < ξ < 1.

The rest of this book is devoted to the proof of Theorem 5.1 and related questions on eigenvectors. The proof of this theorem will be based on the following three-step strategy.

5.2 The three-step strategy

Step 1. Local semicircle law and delocalization of eigenvectors: It states that the density of eigenvalues is given by the semicircle law not only as a weak limit on macroscopic scales (3.1), but also in a high probability sense, with an effective convergence speed and down to short scales containing only N^ξ eigenvalues for any ξ > 0. This will imply the rigidity of eigenvalues, i.e., that the eigenvalues are near their classical locations in a sense to be made clear in Theorem 11.5. We also obtain precise estimates on the matrix elements of the Green function which in particular imply complete delocalization of eigenvectors.

Step 2. Universality for Gaussian divisible ensembles: The Gaussian divisible ensembles are matrices of the form H_t = e^{−t/2} H_0 + √(1 − e^{−t})\, H^G, where t ≥ 0 is a parameter, H_0 is a (generalized) Wigner matrix and H^G is an independent GOE matrix. The parametrization of H_t is chosen so that H_t can be obtained by an Ornstein-Uhlenbeck (OU) process starting from H_0. More precisely, consider the following matrix Ornstein-Uhlenbeck process

\mathrm{d}H_t = \frac{1}{\sqrt{N}}\, \mathrm{d}B_t - \frac{1}{2} H_t\, \mathrm{d}t \quad (5.7)


with initial data H_0, where B_t = (b_{ij,t})_{i,j=1}^N is a symmetric N × N matrix such that its matrix elements b_{ij,t} for i < j and b_{ii,t}/√2 are independent standard Brownian motions starting from zero. Then H_t and e^{−t/2} H_0 + √(1 − e^{−t})\, H^G have the same distribution.
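A minimal numerical sketch of this construction (our own code, not from the text): sample H_0, sample an independent GOE matrix H^G, and form H_t = e^{−t/2} H_0 + √(1 − e^{−t}) H^G. Note that the OU parametrization keeps the variance profile fixed: for a standard Wigner H_0, E (H_t)_{ij}² = e^{−t}/N + (1 − e^{−t})/N = 1/N for all t.

```python
import numpy as np

def goe(n, rng):
    """GOE matrix normalized so that E h_ij^2 = 1/n off the diagonal
    (and 2/n on it), i.e. the spectrum fills [-2, 2] as n grows."""
    a = rng.standard_normal((n, n))
    return (a + a.T) / np.sqrt(2 * n)

def gaussian_divisible(h0, t, rng):
    """H_t = e^{-t/2} H_0 + sqrt(1 - e^{-t}) H^G, equal in distribution
    to the matrix OU flow (5.7) started from H_0."""
    hg = goe(h0.shape[0], rng)
    return np.exp(-t / 2) * h0 + np.sqrt(1 - np.exp(-t)) * hg

# example: H_0 a Bernoulli Wigner matrix, evolved to time t = N^{-1/2}
rng = np.random.default_rng(0)
n = 300
b = rng.choice([-1.0, 1.0], size=(n, n)) / np.sqrt(n)
h0 = np.triu(b) + np.triu(b, 1).T          # symmetrize the Bernoulli draws
ht = gaussian_divisible(h0, n ** -0.5, rng)
```

At t = 0 the map returns H_0 itself; as t → ∞ it converges to a pure GOE matrix.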

The aim of Step 2 is to prove the bulk universality of H_t for t = N^{−τ} for the entire range 0 < τ < 1. This is connected to the local ergodicity of the Dyson Brownian motion which we now define.

Definition 5.2. Given a real parameter β ≥ 1, consider the following system of stochastic differential equations (SDE)

\mathrm{d}\lambda_i = \sqrt{\frac{2}{\beta N}}\, \mathrm{d}B_i + \Big( -\frac{\lambda_i}{2} + \frac{1}{N} \sum_{j\neq i} \frac{1}{\lambda_i - \lambda_j} \Big) \mathrm{d}t\,, \qquad i \in J1, NK\,, \quad (5.8)

where (B_i) is a collection of real-valued, independent standard Brownian motions. The solution of this SDE is called the Dyson Brownian motion (DBM).
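For illustration, here is a crude Euler-Maruyama discretization of (5.8) (our own sketch; an explicit scheme like this can misbehave when the step size is large relative to the smallest gap, so it is only meant for small experiments):

```python
import numpy as np

def dbm_step(lam, dt, beta, rng):
    """One explicit Euler-Maruyama step of the DBM (5.8)."""
    n = lam.size
    diff = lam[:, None] - lam[None, :]
    np.fill_diagonal(diff, np.inf)            # drop the j = i term: 1/inf = 0
    drift = -lam / 2 + np.sum(1.0 / diff, axis=1) / n
    noise = np.sqrt(2 * dt / (beta * n)) * rng.standard_normal(n)
    return lam + dt * drift + noise

def simulate_dbm(lam0, t, steps, beta=1.0, seed=0):
    """Run the DBM up to time t in the given number of steps; beta = 1
    corresponds to the real symmetric (GOE) case."""
    rng = np.random.default_rng(seed)
    lam = np.array(lam0, dtype=float)
    dt = t / steps
    for _ in range(steps):
        lam = dbm_step(lam, dt, beta, rng)
    return np.sort(lam)
```

Starting from equally spaced points, a short run already shows the characteristic repulsion between neighboring particles.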

In a seminal paper [45], Dyson observed that the eigenvalue flow of the matrix OU process is exactly the DBM with β = 1 or 2, corresponding to real symmetric or complex Hermitian ensembles, respectively. Furthermore, the invariant measure of the DBM is given by the Gaussian β-ensemble defined in (4.4). Dyson further conjectured that the time to "local equilibrium" for DBM is of order 1/N, while the time to global equilibrium is of order one. It should be noted that there is no standard notion of "local equilibrium"; we will instead take a practical point of view and interpret Dyson's conjecture as saying that the local statistics of the DBM at any time t ≫ N^{−1} satisfy the universality defined earlier in this section. In other words, Dyson's conjecture is exactly that the local statistics of H_t are universal for t = N^{−τ} for any 0 < τ < 1.

Step 3. Approximation by a Gaussian divisible ensemble: It is a simple density argument in the space of matrix ensembles which shows that for any probability distribution of the matrix elements there exists a Gaussian divisible distribution with a small Gaussian component, as in Step 2, such that the two associated Wigner ensembles have asymptotically identical local eigenvalue statistics. The general result comparing any two matrix ensembles with matching moments will be given in Theorem 16.1. Alternatively, to follow the evolution of the Green function under the OU flow, we can use the following continuity of the matrix OU process:

Step 3a. Continuity of eigenvalues under the matrix OU process. In Theorem 15.2 we will show that the changes of the local statistics in the bulk under the flow (5.7) up to time scales t ≪ N^{−1/2} are negligible, see Lemma 15.4. This can be used in combination with Step 2 to complete the proof of Theorem 5.1.

The three-step strategy outlined here is very general and it has been applied to many different models, as we will explain in Section 18. It can also be extended to the edges of the spectrum, yielding universality at the spectral edges. This will be reviewed in Section 17.


6 Local semicircle law for universal Wigner matrices

6.1 Setup

We recall the definition of the universal Wigner matrices (Definition 2.1); in particular, the matrix elements may have different distributions, but independence (up to symmetry) is always assumed. The fundamental data of this model is the N × N matrix of variances S = (s_{ij}), where

s_{ij} := \mathbb{E}|h_{ij}|^2,

and we assume that S is (doubly) stochastic:

\sum_j s_{ij} = 1 \quad (6.1)

for all i. We will assume the polynomial decay analogous to (5.6)

\mathbb{E}|h_{ij}|^p \le \mu_p\, [s_{ij}]^{p/2}, \quad (6.2)

where µp depends only on p and is uniform in i, j and N .

We introduce the parameter M := [\max_{i,j} s_{ij}]^{-1} that expresses the maximal size of s_{ij}:

s_{ij} \le M^{-1} \quad (6.3)

for all i and j. We regard N as the fundamental parameter and M = M_N as a function of N. In this section we do not assume a lower bound on s_{ij}. Notice that for generalized Wigner matrices M is comparable with N, and one may use N instead of M in all error bounds in this section. However, we wish to keep M as a separate parameter and assume only that

N^{\delta} \le M \le N \quad (6.4)

for some fixed δ > 0. For standard Wigner matrices, the h_{ij} are identically distributed, hence s_{ij} = 1/N and M = N. Another motivating example, where M may substantially differ from N, is the random band matrices that play a key role in interpolating between Wigner matrices and random Schrödinger operators (see Section 18.7).

Example 6.1. Random band matrices are characterized by translation invariant variances of the form

s_{ij} = \frac{1}{W}\, f\Big(\frac{|i - j|_N}{W}\Big), \quad (6.5)

where f is a smooth, symmetric probability density on R, W is a large parameter, called the band width, and |i − j|_N denotes the periodic distance on the discrete torus T of length N. In this case M is comparable with W, which is typically much smaller than N. The generalization to higher spatial dimensions is straightforward; in this case the rows and columns of H are labelled by a discrete d-dimensional torus T^d_L of length L with N = L^d.
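A concrete instance of this variance profile is easy to build (our own sketch, not from the text; we take f to be the triangular density f(x) = (1 − |x|)_+ for concreteness and renormalize the rows so that the stochasticity condition (6.1) holds exactly):

```python
import numpy as np

def band_profile(n, w):
    """Band-matrix variance profile s_ij = f(|i-j|_N / W)/W on the torus
    of length n, with f(x) = (1 - |x|)_+; rows renormalized to sum to 1."""
    i = np.arange(n)
    dist = np.abs(i[:, None] - i[None, :])
    dist = np.minimum(dist, n - dist)          # periodic distance |i-j|_N
    s = np.maximum(1.0 - dist / w, 0.0) / w
    return s / s.sum(axis=1, keepdims=True)    # enforce (6.1) exactly

s = band_profile(200, 10)
M = 1.0 / s.max()     # M = [max_ij s_ij]^{-1} is comparable with W = 10
```

Because the profile is circulant, all row sums agree and the renormalization preserves symmetry.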

Recall from (3.1) that the global eigenvalue density of H in the N → ∞ limit follows the celebrated Wigner semicircle law,

\varrho_{sc}(x) := \frac{1}{2\pi} \sqrt{(4 - x^2)_+}\,, \quad (6.6)

and its Stieltjes transform with the spectral parameter z = E + iη is defined by

m_{sc}(z) := \int_{\mathbb{R}} \frac{\varrho_{sc}(x)}{x - z}\, \mathrm{d}x\,. \quad (6.7)

The spectral parameter z will sometimes be omitted from the notation. The two endpoints ±2 of the support of ϱ_sc are called the spectral edges.


It is well known (see Section 3.3) that the Stieltjes transform m_sc is the unique solution of the quadratic equation

m(z) + \frac{1}{m(z)} + z = 0 \quad (6.8)

with Imm(z) > 0 for Im z > 0. Thus we have

m_{sc}(z) = \frac{-z + \sqrt{z^2 - 4}}{2}\,, \quad (6.9)

where the square root is chosen so that Im m_sc(z) > 0 for Im z > 0. In particular, √(z² − 4) ∼ z for large z, so that it cancels the −z term in the above formula. Under this convention, we collect basic bounds on m_sc in the following lemma.
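In numerical work the branch choice in (6.9) matters. A simple sketch (our own, not from the text): for Im z > 0, the product of the principal square roots of z − 2 and z + 2 gives exactly the branch of √(z² − 4) that behaves like +z at infinity, and one can check the defining relation (6.8) and the limiting density π^{−1} Im m_sc(E + i0) = ϱ_sc(E) directly.

```python
import numpy as np

def m_sc(z):
    """Stieltjes transform of the semicircle law for Im z > 0.
    sqrt(z-2)*sqrt(z+2) (principal branches) is the branch of sqrt(z^2-4)
    that is ~ z at infinity, so Im m_sc > 0 in the upper half plane."""
    z = complex(z)
    return (-z + np.sqrt(z - 2) * np.sqrt(z + 2)) / 2

# the quadratic equation (6.8) and the bound (6.10) can be checked pointwise:
z = 0.3 + 0.01j
m = m_sc(z)
```

As η ↓ 0 with |E| < 2, π^{−1} Im m_sc(E + iη) converges to ϱ_sc(E) = √(4 − E²)/(2π).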

Lemma 6.2. We have for all z = E + iη with η > 0 that

|m_{sc}(z)| = |m_{sc}(z) + z|^{-1} \le 1. \quad (6.10)

Furthermore, there is a constant c > 0 such that for E ∈ [−10, 10] and η ∈ (0, 10] we have

c \le |m_{sc}(z)| \le 1 - c\eta\,, \quad (6.11)

|1 - m_{sc}^2(z)| \asymp \sqrt{\kappa + \eta}\,, \quad (6.12)

as well as

\mathrm{Im}\, m_{sc}(z) \asymp \begin{cases} \sqrt{\kappa + \eta} & \text{if } |E| \le 2\,, \\ \frac{\eta}{\sqrt{\kappa + \eta}} & \text{if } |E| \ge 2\,, \end{cases} \quad (6.13)

where κ := ||E| − 2| denotes the distance of E to the spectral edges.

Proof. The proof is an elementary exercise using (6.9).

We define the Green function or the resolvent of H through

G(z) := (H - z)^{-1},

and denote its entries by G_{ij}(z). We recall from (3.9) that the Stieltjes transform of the empirical spectral measure

\varrho(\mathrm{d}x) = \varrho_N(\mathrm{d}x) := \frac{1}{N} \sum_\alpha \delta(\lambda_\alpha - x)\, \mathrm{d}x

for the eigenvalues λ_1 ≤ λ_2 ≤ · · · ≤ λ_N of H is

m(z) = m_N(z) := \int_{\mathbb{R}} \frac{\varrho_N(\mathrm{d}x)}{x - z} = \frac{1}{N} \mathrm{Tr}\, G(z) = \frac{1}{N} \sum_\alpha \frac{1}{\lambda_\alpha - z}\,. \quad (6.14)

We remark that every quantity related to the random matrix H, such as the eigenvalues, the Green function, the empirical density of states and its Stieltjes transform, depends on N, but this dependence will often be omitted from the notation for brevity. In some formulas, especially in statements of the main results, we will put back the N-dependence to stress its presence.

Since

\frac{1}{\pi}\, \mathrm{Im}\, \frac{1}{\lambda_\alpha - z} = \theta_\eta(E - \lambda_\alpha), \qquad \theta_\eta(x) := \frac{1}{\pi}\, \frac{\eta}{x^2 + \eta^2}\,,

is an approximation to the identity (i.e., a delta function) at the scale η = Im z, we have π^{−1} Im m_N(z) = (ϱ_N ∗ θ_η)(E), i.e., the imaginary part of m_N(z) is the density of the eigenvalues "at scale η". Thus the convergence of the Stieltjes transform m_N(z) to m_sc(z) as N → ∞ will show that the empirical local density of the eigenvalues around the energy E in a window of size η converges to the semicircle law ϱ_sc(E). Therefore the key task is to control m_N(z) for small η.
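The closeness of m_N to m_sc at mesoscopic η is easy to observe numerically. The sketch below (our own code; the scale η = 0.1 and the tolerance are illustrative, not sharp) samples a Gaussian Wigner matrix and compares (6.14) with the semicircle value:

```python
import numpy as np

def wigner(n, rng):
    """Gaussian Wigner matrix with s_ij = E h_ij^2 = 1/n (so M = N = n)."""
    a = rng.standard_normal((n, n))
    return (a + a.T) / np.sqrt(2 * n)

def m_N(h, z):
    """Empirical Stieltjes transform (6.14): N^{-1} sum_a 1/(lambda_a - z)."""
    return np.mean(1.0 / (np.linalg.eigvalsh(h) - z))

def m_sc(z):
    """Semicircle Stieltjes transform, branch with Im m_sc > 0 for Im z > 0."""
    z = complex(z)
    return (-z + np.sqrt(z - 2) * np.sqrt(z + 2)) / 2

rng = np.random.default_rng(1)
z = 0.2 + 0.1j                       # bulk energy, mesoscopic eta = 0.1
err = abs(m_N(wigner(500, rng), z) - m_sc(z))
# the local law predicts err of order 1/(N eta) = 0.02 here, up to N^eps factors
```

Shrinking η toward 1/N makes the error grow, consistent with the threshold discussed below.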


6.2 Spectral information on S

We will show that the diagonal matrix elements of the resolvent G_{ii}(z) satisfy a self-consistent system of equations of the form

-\frac{1}{G_{ii}(z)} \approx z + \sum_{j=1}^N s_{ij} G_{jj}(z) \quad (6.15)

with very high probability. This equation will be viewed as a small perturbation of the deterministic equation

-\frac{1}{m_i(z)} = z + \sum_{j=1}^N s_{ij}\, m_j(z) \quad (6.16)

which, under the side condition that Im m_i > 0, has the unique solution m_i(z) = m_sc(z) for every i (see (6.8)). Here the stochasticity condition (6.1) is essential. For the stability analysis of (6.16), the invertibility of the operator 1 − m_sc²(z)S plays a key role. Therefore an important parameter of the model is

\Gamma(z) := \Big\| \frac{1}{1 - m_{sc}^2(z)\, S} \Big\|_{\infty\to\infty}, \qquad \mathrm{Im}\, z > 0. \quad (6.17)

Note that S, being a stochastic matrix, satisfies −1 ≤ S ≤ 1, and 1 is an eigenvalue with eigenvector e := N^{−1/2}(1, 1, . . . , 1), Se = e. For convenience we assume that 1 is a simple eigenvalue of S (which holds if S is irreducible and aperiodic). Another important parameter is

\widetilde\Gamma(z) := \Big\| \frac{1}{1 - m_{sc}^2(z)\, S}\Big|_{e^\perp} \Big\|_{\infty\to\infty}, \quad (6.18)

i.e., the norm of (1 − m_sc² S)^{−1} restricted to the subspace orthogonal to the constants. Recalling that −1 ≤ S ≤ 1 and using the upper bound |m_sc| ≤ 1 from (6.11), we find that there is a constant c > 0 such that

c \le \widetilde\Gamma \le \Gamma\,. \quad (6.19)

Since |m_sc(z)| ≤ 1 − cη, we can expand (1 − m_sc² S)^{−1} into a geometric series. Using that ‖S‖_{ℓ∞→ℓ∞} ≤ 1 from (6.1), we obtain the trivial upper bound

\Gamma \le C\eta^{-1}. \quad (6.20)

We remark that we also have the following easy lower bound on Γ:

\Gamma \ge |1 - m_{sc}^2|^{-1} \ge \frac{1}{2}\,. \quad (6.21)

This can be seen by applying the matrix (1 − m_sc² S)^{−1} to the constant vector and using the definition of Γ.

For standard Wigner matrices, s_{ij} = 1/N, and for bounded energies |E| ≤ C, we easily obtain that

\Gamma(z) = \frac{1}{|1 - m_{sc}^2(z)|} \asymp \frac{1}{\sqrt{\kappa_E + \eta}}\,, \qquad \widetilde\Gamma(z) = 1, \quad (6.22)

where

\kappa_E := \min\{|E - 2|,\ |E + 2|\} \quad (6.23)

is the distance of E = Re z from the spectral edges. To see this, we note that in this case S = P_e, the orthogonal projection onto the direction e. Hence the operator [1 − m_sc²(z)S]^{−1} can be computed explicitly. The comparison relation in (6.22) follows from (6.12). In the general case, we have the same relations up to a constant factor:


Lemma 6.3. For generalized Wigner matrices (Definition 2.1) and for a bounded range of energies, say for any |E| ≤ 10, we have

\frac{c}{|1 - m_{sc}^2(z)|} \le \Gamma(z) \le \frac{C}{|1 - m_{sc}^2(z)|} \asymp \frac{C}{\sqrt{\kappa_E + \eta}}\,, \qquad c \le \widetilde\Gamma(z) \le C, \quad (6.24)

where the constant C depends on Cinf and Csup in (2.6) and the constant c in the lower bound is universal.

The proof will be given in Section 6.5.

6.3 Stochastic domination

The following definition introduces a notion of a high-probability bound that is suited for our purposes. It first appeared in [55], and it relieves us from the burden of keeping track of exceptional sets of small probability where some bound does not hold.

Definition 6.4 (Stochastic domination). Let

X = \big(X^{(N)}(u) : N \in \mathbb{N},\ u \in U^{(N)}\big), \qquad Y = \big(Y^{(N)}(u) : N \in \mathbb{N},\ u \in U^{(N)}\big)

be two families of nonnegative random variables, where U^{(N)} is a possibly N-dependent parameter set. We say that X is stochastically dominated by Y, uniformly in u, if for all (small) ε > 0 and (large) D > 0 we have

\sup_{u \in U^{(N)}} \mathbb{P}\big[X^{(N)}(u) > N^{\varepsilon}\, Y^{(N)}(u)\big] \le N^{-D}

for large enough N ≥ N_0(ε, D). Unless stated otherwise, throughout this book the stochastic domination will always be uniform in all parameters apart from the parameter δ in (6.4) and the sequence of constants μ_p in (5.6); thus N_0(ε, D) also depends on δ and μ_p. If X is stochastically dominated by Y, uniformly in u, we use the notation X ≺ Y. Moreover, if for some complex family X we have |X| ≺ Y, we also write X = O_≺(Y).

The following proposition collects some basic properties of the stochastic domination, the proofs are leftas an exercise.

Proposition 6.5. The relation ≺ satisfies the following properties:

i) ≺ is transitive: X ≺ Y and Y ≺ Z imply X ≺ Z;

ii) ≺ satisfies the familiar arithmetic rules of order relations, i.e., if X_1 ≺ Y_1 and X_2 ≺ Y_2 then X_1 + X_2 ≺ Y_1 + Y_2 and X_1 X_2 ≺ Y_1 Y_2;

iii) moreover, the following cancellation property holds:

if X ≺ Y + N^{-ε} X for some ε > 0, then X ≺ Y; \quad (6.25)

iv) furthermore, if X ≺ Y, EY ≥ N^{−C} and |X| ≤ N^C almost surely with some fixed exponent C, then for any ε > 0 and sufficiently large N ≥ N_0(ε) we have

\mathbb{E}X \le N^{\varepsilon}\, \mathbb{E}Y. \quad (6.26)

Later, in Lemma 10.1, the relation (6.26) will be extended to partial expectations.

We now define appropriate subsets of the spectral parameter z.

Definition 6.6 (Spectral domain). We call an N -dependent family

D ≡ D(N) ⊂z : |E| 6 10 , M−1 6 η 6 10

a spectral domain. (Recall that M ≡MN depends on N .)


We always consider families X^{(N)}(u) = X^{(N)}_i(z) indexed by u = (z, i), where z takes on values in some spectral domain D, and i takes on values in some finite (possibly N-dependent or empty) index set. The stochastic domination X ≺ Y of such families will always be uniform in z and i, and we usually do not state this explicitly. Usually, which spectral domain D is meant will be clear from the context, in which case we shall not mention it explicitly.

For example, using Chebyshev’s inequality and (6.2) one easily finds that

h_{ij} \prec (s_{ij})^{1/2} \prec M^{-1/2}, \quad (6.27)

uniformly in i and j, so that we may also write h_{ij} = O_≺((s_{ij})^{1/2}). The definition of ≺ with the polynomial factors N^ε and N^{−D} is tailored to the assumption (6.2). We remark that if the analogous subexponential decay (2.7) is assumed, then a stronger form of stochastic domination can be introduced, but we will not pursue this direction here.

6.4 Statement of the local semicircle law

The local semicircle law is very sensitive to the closeness of η = Im z to the real axis; when η is too small, the resolvent and its trace become strongly fluctuating. So our results will hold only above a certain threshold for η. We now define the lower threshold for η, depending on the energy E ∈ [−10, 10]:

\eta_E := \min\Big\{ \eta :\ \frac{1}{M\xi} \le \min\Big\{ \frac{M^{-\gamma}}{\widetilde\Gamma(E + \mathrm{i}\xi)^3}\,,\ \frac{M^{-2\gamma}}{\widetilde\Gamma(E + \mathrm{i}\xi)^4\, \mathrm{Im}\, m_{sc}(E + \mathrm{i}\xi)} \Big\} \ \text{holds for all}\ \xi \ge \eta \Big\}. \quad (6.28)

Although this expression looks complicated, we shall see that it comes out naturally in the analysis of the self-consistent equations for the Green functions. Here γ > 0 is a parameter that can be chosen arbitrarily small; for all practical purposes the reader can neglect it. For generalized Wigner matrices, M ≍ N, and from (6.24) we have

\eta_E \le C N^{-1+2\gamma}, \quad (6.29)

i.e., we will get the local semicircle law on the smallest possible scale η ≫ N^{−1}, modulo an M^γ correction with an arbitrarily small exponent. We remark that if we assume subexponential decay (2.7) instead of the polynomial decay (6.2), then the small M^γ correction can be replaced with a (log M)^C factor.

Finally we define our fundamental control parameter

\Pi(z) := \sqrt{\frac{\mathrm{Im}\, m_{sc}(z)}{M\eta}} + \frac{1}{M\eta}\,. \quad (6.30)

We can now state the main result of this section, which in this full generality first appeared in [55]. Previous results that have cumulatively led to this general formulation will be summarized at the end of the section.

Theorem 6.7 (Local semicircle law [55]). Consider a universal Wigner matrix satisfying the polynomial decay condition (6.2) and (6.3). Then, uniformly in the energy |E| ≤ 10 and η ∈ [η_E, 10], we have the bounds

\max_{i,j} \big|G_{ij}(z) - \delta_{ij}\, m_{sc}(z)\big| \prec \Pi(z) = \sqrt{\frac{\mathrm{Im}\, m_{sc}(z)}{M\eta}} + \frac{1}{M\eta}\,, \qquad z = E + \mathrm{i}\eta, \quad (6.31)

as well as

\big|m_N(z) - m_{sc}(z)\big| \prec \frac{1}{M\eta}\,. \quad (6.32)

Moreover, outside of the spectrum we have the stronger estimate

\big|m_N(z) - m_{sc}(z)\big| \prec \frac{1}{M(\kappa_E + \eta)} + \frac{1}{(M\eta)^2 \sqrt{\kappa_E + \eta}} \quad (6.33)


uniformly in z ∈ {z : 2 ≤ |E| ≤ 10, η_E ≤ η ≤ 10, Mη√(κ_E + η) ≥ M^γ} for any fixed γ > 0, where κ_E := ||E| − 2|. For generalized Wigner matrices, the threshold can be chosen as η_E = N^{−1+2γ}.

We point out two remarkable features of these bounds. The error term for the resolvent entries behaves essentially as (Mη)^{−1/2}, with an improvement near the edges where Im m_sc vanishes. The error bound for the Stieltjes transform, i.e., for the average of the diagonal resolvent entries, is one order better, (Mη)^{−1}, but without improvement near the edge.

The resolvent matrix element G_{ij} may be viewed as the scalar product ⟨e_i, G e_j⟩, where e_i is the i-th coordinate vector. In fact, a more general version of (6.31), the isotropic local law, also holds for generalized Wigner matrices:

Theorem 6.8 (Isotropic law [21]). For a generalized Wigner matrix with polynomial decay (5.6) and for any fixed unit vectors v, w, we have

\big|\langle \mathbf{v}, G(z) \mathbf{w}\rangle - m_{sc}(z)\langle \mathbf{v}, \mathbf{w}\rangle\big| \prec \sqrt{\frac{\mathrm{Im}\, m_{sc}(z)}{N\eta}} + \frac{1}{N\eta}\,, \quad (6.34)

uniformly in the set {z = E + iη : |E| ≤ ω^{−1}, N^{−1+ω} ≤ η ≤ ω^{−1}} for any fixed ω > 0.

The isotropic law was first proven in [88] for Wigner matrices under a vanishing third moment condition. The general case in the form above was given in [21]. We will not prove this result here since it is not needed for the proof of Theorem 5.1.

We will first prove a weaker version of Theorem 6.7 in Section 7, the so-called weak local semicircle law, where the error term is not optimal. After that, we will prove Theorem 6.7 in Section 8 using Γ instead of Γ̃. This yields the same estimate as given in Theorem 6.7, but only on a smaller set of the spectral parameter, for which the argument is somewhat simpler. The proof of Theorem 6.7 for the entire domain will only be sketched in Section 9, and we refer the reader to the original paper for the complete version. Section 7 is included mainly for pedagogical reasons, to introduce the ideas of the continuity argument and the self-consistent equations. Section 8 demonstrates how to use the vector self-consistent equations in a simpler setup. In Section 9 we sketch the analysis of the same self-consistent equation, but splitting it on the space orthogonal to the constant vector and exploiting a spectral gap. Finally, Section 10 is devoted to a key technical lemma, the fluctuation averaging lemma.

We close this section with a short summary of the main developments that have gradually led to the local semicircle law, Theorem 6.7, in its general form. In Section 8.2 we will demonstrate that all proofs of the local semicircle law rely on some version of a self-consistent equation. At the beginning this was a scalar equation for m_N, used in various forms by many authors, see e.g. Pastur [109], Girko [76] and Bai [10]. The self-consistent vector equation for v_i = G_{ii} − m_sc (see Section 8.2.2) first appeared in [69]. This allowed one to deviate from identical distributions for the h_{ij} and opened up the route to estimates on individual resolvent matrix elements. Finally, the self-consistent matrix equation for E|G_{xy}|² first appeared in [54] and yielded the diffusion profile for the resolvent. In this book we only need the vector equation.

The local semicircle law has three important features that have been gradually achieved. We explain them for the case when M is comparable with N; the general case is only a technical extension. First, the local law in the bulk holds down to the scale expressed by the lower bound η ≫ N^{−1}. This is the smallest possible scale on which m(z) can be controlled, since at scale η ≲ N^{−1} a few individual eigenvalues strongly influence its behavior and m(z) is not close to a deterministic quantity. This optimal scale in the bulk of the spectrum was first established for Wigner matrices in a series of papers [60-62], with the extension to generalized Wigner matrices in [69]. Second, the optimal speed of convergence in the entrywise bound (6.31) is N^{−1/2}, while in the bound (6.32) on m − m_sc it is N^{−1}. These optimal N-dependences were first achieved in [68] by introducing the fluctuation averaging mechanism. Third, near the spectral edge the stability of the self-consistent equation deteriorates, manifested by the behavior of Γ(z) in (6.24) when κ_E is small. This can be compensated by the fact that the density is small near the edge. After many attempts and weaker results


in [64, 68, 69], the optimal form of this compensation was eventually found in [70], basically by separating the analysis of the self-consistent equation onto the space orthogonal to the constants. Since Γ̃ does not deteriorate near the edge, see (6.24), the resulting estimate becomes optimal.

6.5 Appendix: Behaviour of Γ and Γ̃ and the proof of Lemma 6.3

In this appendix we give basic bounds on the parameters Γ and Γ̃. Readers interested only in the Wigner case may skip this section entirely; if s_{ij} = N^{−1}, then the explicit formulas in (6.22) already suffice. As it turns out, the behaviour of Γ and Γ̃ is intimately linked with the spectrum of S, more precisely with its spectral gaps. Recall that the spectrum of S lies in [−1, 1], with 1 being a simple eigenvalue.

Definition 6.9. Let δ_− be the distance from −1 to the spectrum of S, and δ_+ the distance from 1 to the spectrum of S restricted to e^⊥. In other words, δ_± are the largest numbers satisfying

S \ge -1 + \delta_-\,, \qquad S\big|_{e^\perp} \le 1 - \delta_+\,.

For generalized Wigner matrices the lower and upper spectral gaps satisfy δ_± ≥ a with the constant a := min{C_inf, C_sup}, see (2.6). This simple fact follows easily by splitting

S = (S - a\, e e^*) + a\, e e^*

and noticing that the first term is (1 − a) times a doubly stochastic matrix, hence its spectrum lies in [−1 + a, 1 − a].

Proof of Lemma 6.3. The lower bound on $\Gamma$ in (6.24) follows from $(1-m_{sc}^2 S)^{-1} e = (1-m_{sc}^2)^{-1} e$ combined with (6.12). For the upper bound, we first notice that $1-m_{sc}^2 S$ is invertible since $-1 \le S \le 1$ and $|m_{sc}| < 1$, see (6.11). Since $m_{sc}$ and its reciprocal are bounded (6.11), it is sufficient to bound the inverse of $m_{sc}^{-2} - S$. Since the spectrum of $S$ lies in the set $[-1+\delta_-,\, 1-\delta_+] \cup \{1\} \subset [-1+a,\, 1-a] \cup \{1\}$, and $|m_{sc}^{-2}| > 1$, we easily get
\[ \Big\| \frac{1}{1-m_{sc}^2 S} \Big\|_{\ell^2\to\ell^2} \le C \Big\| \frac{1}{m_{sc}^{-2} - S} \Big\|_{\ell^2\to\ell^2} \le \frac{C}{\min\{a,\, |m_{sc}^{-2} - 1|\}} \le \frac{C_a}{|1-m_{sc}^2|} . \tag{6.35} \]
In order to find the $\ell^\infty \to \ell^\infty$ norm, we solve $(1-m_{sc}^2 S)\mathbf{v} = \mathbf{u}$ directly using (6.10):
\begin{align*}
\|\mathbf{v}\|_\infty = \|\mathbf{u} + m_{sc}^2 S \mathbf{v}\|_\infty &\le \|\mathbf{u}\|_\infty + \|S\|_{\ell^1\to\ell^\infty} \|\mathbf{v}\|_1 \le \|\mathbf{u}\|_\infty + N^{1/2} \|S\|_{\ell^1\to\ell^\infty} \|\mathbf{v}\|_2 \\
&\le \|\mathbf{u}\|_\infty + N^{1/2} \|S\|_{\ell^1\to\ell^\infty} \Big\| \frac{1}{1-m_{sc}^2 S} \Big\|_{\ell^2\to\ell^2} \|\mathbf{u}\|_2 \\
&\le \Big( 1 + \frac{C_a N \|S\|_{\ell^1\to\ell^\infty}}{|1-m_{sc}^2|} \Big) \|\mathbf{u}\|_\infty .
\end{align*}
Here we used (6.35) and that $\|\mathbf{v}\|_1 = \sum_i |v_i| \le N^{1/2} \big( \sum_i |v_i|^2 \big)^{1/2}$ and $\|\mathbf{u}\|_2 \le N^{1/2} \|\mathbf{u}\|_\infty$. Since for generalized Wigner matrices $N \|S\|_{\ell^1\to\ell^\infty} = N \cdot \max_{ij} s_{ij} \le C_{\mathrm{sup}}$ from (2.6), this proves
\[ \Big\| \frac{1}{1-m_{sc}^2 S} \Big\|_{\ell^\infty\to\ell^\infty} \le \frac{C}{|1-m_{sc}^2|} , \]
where the constant depends on $C_{\mathrm{inf}}$ and $C_{\mathrm{sup}}$. This proves the upper bound on $\Gamma$ in (6.24).

Finally, we bound $\widetilde\Gamma$; the lower bound was already given in (6.19). For the upper bound we follow the argument above, but we restrict $S$ to $e^\perp$. Since the spectrum of this restriction lies in $[-1+a,\, 1-a]$, we immediately get the bound $C/a$ for the $\ell^2$-norm of $(1-m_{sc}^2 S)^{-1}\big|_{e^\perp}$ in the right hand side of (6.35). This can be lifted to the same estimate for the $\ell^\infty$ norm. This completes the proof of the lemma.
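For the Wigner case $s_{ij} = N^{-1}$ all of these norms reduce to scalar functions of $m_{sc}$, so the quantities in this proof are easy to probe numerically. The following minimal sketch (plain NumPy; the helper name `msc` and the tolerances are our own choices, not from the text) checks the defining relation $m_{sc} + (z+m_{sc})^{-1} = 0$ and the bound $|m_{sc}| < 1$ from (6.11) that the proof relies on:

```python
import numpy as np

def msc(z):
    # Stieltjes transform of the semicircle law: the root of
    # m^2 + z*m + 1 = 0 with Im m > 0 (for Im z > 0).
    m = (-z + np.sqrt(z * z - 4)) / 2
    if m.imag < 0:
        m = (-z - np.sqrt(z * z - 4)) / 2
    return m

z = 0.5 + 0.01j
m = msc(z)
residual = abs(m + 1.0 / (z + m))   # defining self-consistent equation
gamma = 1.0 / abs(1.0 - m * m)      # size of (1 - msc^2 S)^{-1} on constants
```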


This simple proof of Lemma 6.3 used both spectral gaps and that $s_{ij} = O(N^{-1})$. Lacking this information in the general case, the following proposition gives explicit bounds on $\Gamma$ and $\widetilde\Gamma$ depending on the spectral gaps $\delta_\pm$. We recall the notations $z = E + i\eta$, $\kappa = \kappa_E := \big| |E| - 2 \big|$, and define
\[ \theta \equiv \theta(z) := \begin{cases} \kappa + \dfrac{\eta}{\sqrt{\kappa+\eta}} & \text{if } |E| \le 2 \\[1ex] \sqrt{\kappa+\eta} & \text{if } |E| > 2 . \end{cases} \tag{6.36} \]

Proposition 6.10. For the matrix elements of $S$ we assume $0 \le s_{ij} = s_{ji} \le M^{-1}$ and $\sum_j s_{ij} = 1$. Then there is a universal constant $C$ such that the following holds uniformly in the domain $\{ z = E + i\eta : |E| \le 10,\ M^{-1} \le \eta \le 10 \}$, and in particular in any spectral domain $\mathbf{D}$.

(i) We have the estimate
\[ \frac{1}{C\sqrt{\kappa+\eta}} \le \Gamma(z) \le \frac{C \log N}{1 - \max_\pm \big| \frac{1 \pm m_{sc}^2}{2} \big|} \le \frac{C \log N}{\min\{\eta + E^2,\, \theta\}} . \tag{6.37} \]

(ii) In the presence of a gap $\delta_-$ we may improve the upper bound to
\[ \Gamma(z) \le \frac{C \log N}{\min\{\delta_- + \eta + E^2,\, \theta\}} . \tag{6.38} \]

(iii) For $\widetilde\Gamma$ we have the bounds
\[ C^{-1} \le \widetilde\Gamma(z) \le \frac{C \log N}{\min\{\delta_- + \eta + E^2,\, \delta_+ + \theta\}} . \tag{6.39} \]

Proof. The first bound of (6.37) follows from $(1-m_{sc}^2 S)^{-1} e = (1-m_{sc}^2)^{-1} e$ combined with (6.12). In order to prove the second bound of (6.37), we write
\[ \frac{1}{1-m_{sc}^2 S} = \frac{1}{2} \cdot \frac{1}{1 - \frac{1+m_{sc}^2 S}{2}} , \]
and observe that
\[ \Big\| \frac{1+m_{sc}^2 S}{2} \Big\|_{\ell^2\to\ell^2} \le \max_\pm \Big| \frac{1 \pm m_{sc}^2}{2} \Big| =: q . \tag{6.40} \]
Therefore
\[ \Big\| \frac{1}{1-m_{sc}^2 S} \Big\|_{\ell^\infty\to\ell^\infty} \le \sum_{n=0}^{n_0-1} \Big\| \frac{1+m_{sc}^2 S}{2} \Big\|_{\ell^\infty\to\ell^\infty}^n + \sqrt{N} \sum_{n=n_0}^{\infty} \Big\| \frac{1+m_{sc}^2 S}{2} \Big\|_{\ell^2\to\ell^2}^n \le n_0 + \sqrt{N}\, \frac{q^{n_0}}{1-q} \le \frac{C \log N}{1-q} , \]
where in the last step we chose $n_0 = C_0 \frac{\log N}{1-q}$ for large enough $C_0$. Here we used that $\|S\|_{\ell^\infty\to\ell^\infty} \le 1$ and (6.11) to estimate the summands in the first sum. This concludes the proof of the second bound of (6.37). The third bound of (6.37) follows from the elementary estimates
\[ \Big| \frac{1-m_{sc}^2}{2} \Big| \le 1 - c(\eta + E^2) , \qquad \Big| \frac{1+m_{sc}^2}{2} \Big| \le 1 - c\Big( (\operatorname{Im} m_{sc})^2 + \frac{\eta}{\operatorname{Im} m_{sc} + \eta} \Big) \le 1 - c\theta \tag{6.41} \]
for some universal constant $c > 0$, where in the last step we used Lemma 6.2.


The estimate (6.38) follows similarly. Due to the gap $\delta_-$ in the spectrum of $S$, we may replace the estimate (6.40) with
\[ \Big\| \frac{1+m_{sc}^2 S}{2} \Big\|_{\ell^2\to\ell^2} \le \max\Big\{ 1 - \delta_- - \eta - E^2 ,\ \Big| \frac{1+m_{sc}^2}{2} \Big| \Big\} . \tag{6.42} \]
Hence (6.38) follows using (6.41).

The lower bound of (6.39) was proved in (6.19). The upper bound is proved similarly to (6.38), except that (6.42) is replaced with
\[ \Big\| \frac{1+m_{sc}^2 S}{2} \Big|_{e^\perp} \Big\|_{\ell^2\to\ell^2} \le \max\Big\{ 1 - \delta_- - \eta - E^2 ,\ \min\Big\{ 1 - \delta_+ ,\ \Big| \frac{1+m_{sc}^2}{2} \Big| \Big\} \Big\} . \]
This concludes the proof of (6.39).


7 Weak local semicircle law

Before we prove the local semicircle law in the strong form, Theorem 6.7, for pedagogical reasons we first prove the following weaker version whose proof is easier. For simplicity, in this section we consider the Wigner case, i.e., $s_{ij} = 1/N$ and $M = N$. In the bulk, this weaker estimate is still effective for all $\eta$ down to the smallest scales $N^{-1+\varepsilon}$, but the power of $1/N$ in the error estimate is not optimal ($1/2$ instead of $1$ in (6.32)). Near the edge the bound is even weaker; the power of $1/N$ is reduced to $1/4$, indicating that this proof is not sufficiently strong near the edge.

Theorem 7.1 (Weak local semicircle law). Let $z = E + i\eta$ and $\kappa := \big| |E| - 2 \big|$. Let $H$ be a Wigner matrix, let $G(z) = (H-z)^{-1}$ be its resolvent, and set $m_N(z) = \frac{1}{N} \operatorname{Tr} G(z)$. We assume that the single entry distribution satisfies the decay condition (5.6). Choose any $\gamma > 0$. Then for $z \in \mathbf{D}_\gamma := \{ E + i\eta : |E| \le 10,\ N^{-1+\gamma} \le \eta \le 10 \}$ we have
\[ |m_N(z) - m_{sc}(z)| \prec \min\Big\{ \frac{1}{\sqrt{N\eta\kappa}} ,\ \frac{1}{(N\eta)^{1/4}} \Big\} . \tag{7.1} \]

For definiteness, we present the proof for the Hermitian case, but all formulas below carry over to the other symmetry classes with obvious modifications.

7.1 Proof of the weak local semicircle law, Theorem 7.1

The proof is divided into five steps.

7.1.1 Step 1. Schur complement formula.

The first step in proving the weak local semicircle law is to use the Schur complement formula, which we state in the following lemma. The proof is an elementary exercise.

Lemma 7.2 (Schur formula). Let $A$, $B$, $C$ be $n \times n$, $m \times n$ and $m \times m$ matrices. We define the $(m+n) \times (m+n)$ matrix $D$ as
\[ D := \begin{pmatrix} A & B^* \\ B & C \end{pmatrix} \tag{7.2} \]
and the $n \times n$ matrix $\widehat D$ as
\[ \widehat D := A - B^* C^{-1} B . \tag{7.3} \]
Then $\widehat D$ is invertible if $D$ is invertible, and for any $1 \le i, j \le n$ we have
\[ (D^{-1})_{ij} = (\widehat D^{-1})_{ij} \tag{7.4} \]
for the corresponding matrix elements.
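Schur's formula is an exact finite-dimensional identity, so it can be checked directly on random matrices. A small sketch (NumPy; the block dimensions and seed are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 5

# Random Hermitian block matrix D = [[A, B*], [B, C]] as in (7.2).
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)); A = A + A.conj().T
C = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m)); C = C + C.conj().T
B = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
D = np.block([[A, B.conj().T], [B, C]])

Dhat = A - B.conj().T @ np.linalg.solve(C, B)   # Schur complement (7.3)

# (7.4): the upper-left n x n block of D^{-1} equals Dhat^{-1}.
lhs = np.linalg.inv(D)[:n, :n]
rhs = np.linalg.inv(Dhat)
err = np.abs(lhs - rhs).max()
```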

Recall that $G_{ij} = G_{ij}(z)$ denotes the matrix element of the resolvent
\[ G_{ij} = \Big( \frac{1}{H-z} \Big)_{ij} . \]
Let $H^{(i)}$ be the matrix where all matrix elements of $H = (h_{ab})$ in the $i$-th column and row are set to be zero:
\[ \big( H^{(i)} \big)_{ab} := h_{ab} \cdot \mathbf{1}(a \ne i) \cdot \mathbf{1}(b \ne i) , \qquad a, b = 1, 2, \ldots, N . \]
In other words, $H^{(i)}$ is the $i$-th minor $H^{[i]}$ of $H$ augmented to an $N \times N$ matrix by adding a zero row and column. Recall that the minor $H^{[i]}$ is an $(N-1) \times (N-1)$ matrix with the $i$-th row and column removed:
\[ H^{[i]}_{ab} := h_{ab} , \qquad a, b \ne i . \]


Denote the Green function of $H^{(i)}$ by $G^{(i)}(z) = (H^{(i)} - z)^{-1}$, which is again an $N \times N$ matrix. Notice that
\[ G^{(i)}_{ab} = \begin{cases} (-z)^{-1} & \text{if } a = b = i \\ 0 & \text{if exactly one of } a \text{ or } b \text{ equals } i \\ G^{[i]}_{ab} = \big( (H^{[i]} - z)^{-1} \big)_{ab} & \text{if } a \ne i,\ b \ne i . \end{cases} \tag{7.5} \]
We warn the reader that in some earlier papers on the subject a different convention was used, where $H^{(i)}$ and $G^{(i)}$ denoted the $(N-1) \times (N-1)$ minors ($H^{[i]}$ with the current notation) and their Green functions. The current convention simplifies several formulas, although the mathematical contents of both versions are identical.

With similar conventions, we can define $G^{(ij)}$, etc. The superscript in parentheses for resolvents always means "after setting the corresponding row and column of $H$ to be zero" (in some terminology, this procedure is also described as "zeroing out the corresponding row and column"); in particular, by independence of matrix elements, this means that the matrix $G^{(ij)}$, say, is independent of the $i$-th and $j$-th rows and columns of $H$. This helps to decouple dependencies in formulae. Let

\[ \mathbf{a}_i = (h_{1i}, h_{2i}, \ldots, 0, \ldots, h_{Ni})^t \tag{7.6} \]
be the $i$-th column of $H$, after setting the $i$-th entry to zero. Using Lemma 7.2 for $n = 1$, $m = N-1$, and (7.5), (7.6), we have
\[ G_{ii} = \frac{1}{h_{ii} - z - \mathbf{a}_i \cdot G^{(i)} \mathbf{a}_i} , \tag{7.7} \]
where
\[ \mathbf{a}_i \cdot G^{(i)} \mathbf{a}_i = \sum_{k,l \ne i} h_{ik} G^{(i)}_{kl} h_{li} = \sum_{k,l \ne i} h_{ik} G^{[i]}_{kl} h_{li} . \tag{7.8} \]
(We sometimes use $\mathbf{u} \cdot \mathbf{v}$ instead of $\mathbf{u}^* \mathbf{v}$ or $(\mathbf{u}, \mathbf{v})$ for the usual Hermitian scalar product.) We now introduce some notation:

Definition 7.3 (Partial expectation and independence). Let $X \equiv X(H)$ be a random variable. For $i \in \{1, \ldots, N\}$ define the operations $P_i$ and $Q_i$ through
\[ P_i X := \mathbb{E}(X \,|\, H^{(i)}) , \qquad Q_i X := X - P_i X . \]
We call $P_i$ partial expectation in the index $i$. Moreover, we say that $X$ is independent of a set $\mathbb{T} \subset \{1, \ldots, N\}$ if $X = P_i X$ for all $i \in \mathbb{T}$.

We can decompose $\mathbf{a}_i \cdot G^{(i)} \mathbf{a}_i$ into its expectation and fluctuation,
\[ \mathbf{a}_i \cdot G^{(i)} \mathbf{a}_i = P_i\big[ \mathbf{a}_i \cdot G^{(i)} \mathbf{a}_i \big] + Z_i , \]
where
\[ Z_i := Q_i\big[ \mathbf{a}_i \cdot G^{(i)} \mathbf{a}_i \big] . \tag{7.9} \]
Since $G^{(i)}$ is independent of $\mathbf{a}_i$, we need to compute expectations and fluctuations of quadratic functions. The expectation is easy:
\[ P_i\big[ \mathbf{a}_i \cdot G^{(i)} \mathbf{a}_i \big] = P_i \sum_{k,l} \overline{(\mathbf{a}_i)_k}\, G^{(i)}_{kl}\, (\mathbf{a}_i)_l = \sum_{k,l \ne i} P_i\big[ h_{ik} G^{(i)}_{kl} h_{li} \big] = \frac{1}{N} \sum_{k \ne i} G^{(i)}_{kk} , \]
where in the last step we used that different matrix elements are independent, i.e., $P_i\big[ h_{ik} h_{li} \big] = \frac{1}{N} \delta_{kl}$. The summations always run over all indices from $1$ to $N$, apart from those that are explicitly excluded. We define
\[ m^{(i)}_N(z) := \frac{1}{N-1} \operatorname{Tr} G^{[i]}(z) = \frac{1}{N-1} \sum_{k \ne i} G^{(i)}_{kk}(z) , \]
where we used $G^{[i]}_{kk} = G^{(i)}_{kk}$ for $k \ne i$ from (7.5). Hence we have the identity
\[ G_{ii} = \frac{1}{h_{ii} - z - P_i\big[ \mathbf{a}_i \cdot G^{(i)} \mathbf{a}_i \big] - Z_i} = \frac{1}{h_{ii} - z - \big( 1 - \frac{1}{N} \big) m^{(i)}_N(z) - Z_i} . \tag{7.10} \]
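The Schur identity (7.7) behind (7.10) is exact for every fixed matrix, which makes it easy to verify on a sample. A short sketch (NumPy; the matrix size, seed, and spectral parameter are arbitrary choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 60
H = rng.standard_normal((N, N)) / np.sqrt(N)
H = (H + H.T) / np.sqrt(2)          # real symmetric Wigner-type matrix
z = 0.3 + 0.5j
G = np.linalg.inv(H - z * np.eye(N))

i = 0
idx = np.arange(N) != i             # indices of the minor H^[i]
Hi = H[np.ix_(idx, idx)]
Gi = np.linalg.inv(Hi - z * np.eye(N - 1))
a = H[idx, i]                       # i-th column of H without the i-th entry

# Schur formula (7.7): G_ii = 1 / (h_ii - z - a . G^(i) a)
schur = 1.0 / (H[i, i] - z - a @ Gi @ a)
err = abs(G[i, i] - schur)
```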


7.1.2 Step 2. Interlacing of eigenvalues

We now estimate the difference between $m_N$ and $m^{(i)}_N$. The first step is the following well-known lemma. We include a short proof for completeness; for simplicity we consider the randomized setup with a continuous distribution to avoid multiple eigenvalues. The general case easily follows from standard approximation arguments.

Lemma 7.4 (Interlacing of eigenvalues). Let $H$ be a symmetric or Hermitian $N \times N$ matrix with continuous distribution. Decompose $H$ as follows:
\[ H = \begin{pmatrix} h & \mathbf{a}^* \\ \mathbf{a} & B \end{pmatrix} , \tag{7.11} \]
where $\mathbf{a} = (h_{12}, \ldots, h_{1N})^*$ and $B = H^{[1]}$ is the $(N-1) \times (N-1)$ minor of $H$ obtained by removing the first row and first column from $H$. Denote by $\mu_1 \le \mu_2 \le \ldots \le \mu_N$ the eigenvalues of $H$ and by $\lambda_1 \le \lambda_2 \le \ldots \le \lambda_{N-1}$ the eigenvalues of $B$. Then with probability one the eigenvalues of $B$ are distinct, and the eigenvalues of $H$ and $B$ are interlaced:
\[ \mu_1 < \lambda_1 < \mu_2 < \lambda_2 < \mu_3 < \ldots < \mu_{N-1} < \lambda_{N-1} < \mu_N . \tag{7.12} \]

Proof. Since matrices with multiple eigenvalues form a lower dimensional submanifold within the set of all Hermitian matrices, the eigenvalues are distinct almost surely. Let $\mu$ be one of the eigenvalues of $H$ and let $\mathbf{v} = (v_1, \ldots, v_N)^t$ be a normalized eigenvector associated with $\mu$. From the eigenvalue equation $H\mathbf{v} = \mu \mathbf{v}$ and from (7.11) we find that
\[ h v_1 + \mathbf{a} \cdot \mathbf{w} = \mu v_1 , \qquad \text{and} \qquad \mathbf{a} v_1 + B \mathbf{w} = \mu \mathbf{w} , \tag{7.13} \]
with $\mathbf{w} = (v_2, \ldots, v_N)^t$. From these equations we obtain
\[ \mathbf{w} = (\mu - B)^{-1} \mathbf{a}\, v_1 \qquad \text{and thus} \qquad (\mu - h) v_1 = \mathbf{a} \cdot (\mu - B)^{-1} \mathbf{a}\, v_1 = \frac{v_1}{N} \sum_\alpha \frac{\xi_\alpha}{\mu - \lambda_\alpha} , \tag{7.14} \]
using the spectral representation of $B$, where we set
\[ \xi_\alpha = |\sqrt{N}\, \mathbf{a} \cdot \mathbf{u}_\alpha|^2 , \]
with $\mathbf{u}_\alpha$ being the normalized eigenvector of $B$ associated with the eigenvalue $\lambda_\alpha$. From the continuity of the distribution it also follows that $v_1 \ne 0$ almost surely, and thus we have
\[ \mu - h = \frac{1}{N} \sum_\alpha \frac{\xi_\alpha}{\mu - \lambda_\alpha} , \tag{7.15} \]
where the $\xi_\alpha$'s are strictly positive almost surely (notice that $\mathbf{a}$ and $\mathbf{u}_\alpha$ are independent). In particular, this shows that $\mu \ne \lambda_\alpha$ for any $\alpha$. In the open interval $\mu \in (\lambda_{\alpha-1}, \lambda_\alpha)$ the function
\[ \Phi(\mu) := \frac{1}{N} \sum_\alpha \frac{\xi_\alpha}{\mu - \lambda_\alpha} \]
is strictly decreasing from $\infty$ to $-\infty$, therefore there is exactly one solution to the equation $\mu - h = \Phi(\mu)$. A similar argument shows that there is also exactly one solution below $\lambda_1$ and above $\lambda_{N-1}$. This completes the proof.
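Strict interlacing can be observed directly on a random sample. A minimal check (NumPy; size and seed are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 8
H = rng.standard_normal((N, N)); H = (H + H.T) / 2   # symmetric, continuous law
mu = np.sort(np.linalg.eigvalsh(H))            # eigenvalues of H
lam = np.sort(np.linalg.eigvalsh(H[1:, 1:]))   # eigenvalues of the minor B = H^[1]

# Strict interlacing (7.12): mu_1 < lam_1 < mu_2 < ... < lam_{N-1} < mu_N
interlaced = all(mu[k] < lam[k] < mu[k + 1] for k in range(N - 1))
```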

We can now compare the normalized traces of G and G[i]:

Lemma 7.5. Under the conditions of Lemma 7.4, for any $1 \le i \le N$ we have
\[ \Big| m_N(z) - \Big( 1 - \frac{1}{N} \Big) m^{(i)}_N(z) \Big| \le \frac{C}{N\eta} , \qquad \eta = \operatorname{Im} z > 0 . \tag{7.16} \]


Proof. With the notations of Lemma 7.4, let
\[ F(x) := \frac{1}{N} \#\{ \mu_j \le x \} , \qquad F^{(i)}(x) := \frac{1}{N-1} \#\{ \lambda_j \le x \} \]
denote the normalized counting functions of the eigenvalues of $H$ and of the minor $H^{[i]}$, respectively. The interlacing property of the eigenvalues of $H$ and $H^{[i]}$ (see (7.12)) in terms of these functions means that
\[ \sup_x \big| N F(x) - (N-1) F^{(i)}(x) \big| \le 1 . \]
Then, after integrating by parts,
\begin{align*}
\Big| m_N(z) - \Big( 1 - \frac{1}{N} \Big) m^{(i)}_N(z) \Big| &= \Big| \int \frac{dF(x)}{x-z} - \Big( 1 - \frac{1}{N} \Big) \int \frac{dF^{(i)}(x)}{x-z} \Big| \\
&= \frac{1}{N} \Big| \int \frac{N F(x) - (N-1) F^{(i)}(x)}{(x-z)^2} \, dx \Big| \\
&\le \frac{1}{N} \int \frac{dx}{|x-z|^2} \le \frac{C}{N\eta} , \tag{7.17}
\end{align*}
which proves (7.16). Note that (7.16) would also hold without the prefactor $(1 - \frac{1}{N})$, since we have the trivial bound $|m^{(i)}_N| \le \eta^{-1}$.
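The bound (7.16) holds with the explicit constant $\pi$ coming from $\int |x-z|^{-2}\, dx = \pi/\eta$, so it can be tested numerically without any slack. A sketch (NumPy; the matrix size, seed, and $z$ are our own choices):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 200
H = rng.standard_normal((N, N)) / np.sqrt(N); H = (H + H.T) / np.sqrt(2)
z = -0.7 + 0.2j
eta = z.imag

mN = np.trace(np.linalg.inv(H - z * np.eye(N))) / N
mN_i = np.trace(np.linalg.inv(H[1:, 1:] - z * np.eye(N - 1))) / (N - 1)

# (7.16): |m_N - (1 - 1/N) m_N^{(i)}| <= pi / (N eta), by the proof above
diff = abs(mN - (1 - 1 / N) * mN_i)
```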

We can now rewrite (7.10) in the following form, after summing over $i$:
\[ m_N(z) = \frac{1}{N} \sum_i \frac{1}{-z - m_N(z) + \Omega_i} , \qquad \text{with} \quad \Omega_i := h_{ii} - Z_i + O\Big( \frac{1}{N\eta} \Big) . \tag{7.18} \]

At this stage we can explain the main idea of the proof of the local semicircle law (7.1). Equation (7.18) is the key self-consistent equation for $m_N$, the Stieltjes transform of the empirical eigenvalue density of $H$. Notice that if $\Omega_i$ were zero, then we would have the equation
\[ m = -\frac{1}{m+z} \tag{7.19} \]
complemented with the side condition that $\operatorname{Im} m > 0$. This is exactly the defining equation (6.8) of the Stieltjes transform of the semicircle density. In the next step we will give bounds on $h_{ii}$ and $Z_i$, and then we will effectively control the stability of (7.19) against small perturbations. This will give the desired estimate (7.1) on $|m_N - m_{sc}|$.
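Equation (7.19) can also be solved by simple fixed-point iteration: the map $m \mapsto -(z+m)^{-1}$ preserves the upper half-plane and, for $\operatorname{Im} z > 0$, contracts towards $m_{sc}(z)$. A minimal illustration (plain Python; the starting point, the value of $z$, and the iteration count are arbitrary choices of ours):

```python
z = 1.2 + 0.5j
m = 1j                    # any starting point in the upper half-plane
for _ in range(300):
    m = -1.0 / (z + m)    # iterate the self-consistent equation (7.19)

# the iteration converges to msc(z): the root of m^2 + z m + 1 = 0 with Im m > 0
residual = abs(m * m + z * m + 1.0)
```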

7.1.3 Step 3. Large deviation estimate of the fluctuations.

The quantity $Z_i$ defined in (7.9) will be viewed as a random variable in the probability space of the $i$-th column. We will need an upper bound on it in the large deviation sense, i.e., with very high probability. Since it is a quadratic function of the independent matrix elements of the $i$-th column, standard large deviation theorems do not directly apply. To focus on the main line of the proof and to keep technicalities to a minimum, we postpone the discussion of the full version of the quadratic large deviation bounds to Section 7.2. However, in order to get an idea of the size of $Z_i$, we compute its second moment as follows:
\[ P_i |Z_i|^2 = \sum_{k,l \ne i} \sum_{k',l' \ne i} P_i\Big[ \big( h_{ik} G^{(i)}_{kl} h_{li} - P_i\big[ h_{ik} G^{(i)}_{kl} h_{li} \big] \big) \overline{\big( h_{ik'} G^{(i)}_{k'l'} h_{l'i} - P_i\big[ h_{ik'} G^{(i)}_{k'l'} h_{l'i} \big] \big)} \Big] . \tag{7.20} \]

Since $\mathbb{E} h = 0$, the non-zero contributions to this sum come from index combinations in which all $h$ and $\bar h$ factors are paired. For pedagogical simplicity, assume that $\mathbb{E} h^2 = 0$; this holds, for example, if the distributions of the real and imaginary parts are the same. Then the $h$ factors in the above expression have to be paired in such a


way that $h_{ik} = h_{ik'}$ and $h_{il} = h_{il'}$, i.e., $k = k'$, $l = l'$. Note that the pairing $h_{ik} = h_{il}$ would give zero because the expectation is subtracted. The result is
\[ P_i |Z_i|^2 = \frac{1}{N^2} \sum_{k,l \ne i} |G^{(i)}_{kl}|^2 + \frac{m_4 - 1}{N^2} \sum_{k \ne i} |G^{(i)}_{kk}|^2 , \tag{7.21} \]
where $m_4 = \mathbb{E} |\sqrt{N} h|^4$ is the fourth moment of the single entry distribution. The first term can be computed:
\[ \frac{1}{N^2} \sum_{k,l \ne i} |G^{(i)}_{kl}|^2 = \frac{1}{N^2} \sum_{k,l \ne i} |G^{[i]}_{kl}|^2 = \frac{1}{N^2} \sum_{k \ne i} \big( |G^{[i]}|^2 \big)_{kk} = \frac{1}{N\eta} \cdot \frac{1}{N} \sum_{k \ne i} \operatorname{Im} G^{[i]}_{kk} = \frac{1}{N\eta} \Big( 1 - \frac{1}{N} \Big) \operatorname{Im} m^{(i)}_N , \tag{7.22} \]
where $|G|^2 = G G^*$. In the middle step we have used the Ward identity, valid for the resolvent $R(z) = (A-z)^{-1}$ of any Hermitian matrix $A$:
\[ |R(z)|^2 = \frac{1}{|A-E|^2 + \eta^2} = \frac{1}{\eta} \operatorname{Im} R(z) , \qquad z = E + i\eta . \tag{7.23} \]
We can estimate the second term in (7.21) by the general fact that for the resolvent $R = (A-z)^{-1}$ of any Hermitian matrix $A$ we have
\[ \sum_k |R_{kk}|^2 \le \sum_\alpha \frac{1}{|\zeta_\alpha - z|^2} , \tag{7.24} \]
where the $\zeta_\alpha$ are the eigenvalues of $A$. To see this, let $\mathbf{u}_\alpha$ be the normalized eigenvectors. Then by the spectral theorem
\[ R_{kk} = \sum_\alpha \frac{|u_\alpha(k)|^2}{\zeta_\alpha - z} , \]
and thus we have
\[ \sum_k |R_{kk}|^2 \le \sum_k \sum_{\alpha,\beta} \frac{|u_\alpha(k)|^2}{|\zeta_\alpha - z|} \frac{|u_\beta(k)|^2}{|\zeta_\beta - z|} \le \sum_\alpha \frac{1}{|\zeta_\alpha - z|^2} \sum_k |u_\alpha(k)|^2 \sum_\beta |u_\beta(k)|^2 = \sum_\alpha \frac{1}{|\zeta_\alpha - z|^2} , \]
where we have used the Schwarz inequality and that $\{ \mathbf{u}_\beta \}$ is an orthonormal basis. Applying this bound to the Green function of $H^{[i]}$ with eigenvalues $\mu_\alpha$, we have
\[ \frac{1}{N^2} \sum_{k \ne i} |G^{(i)}_{kk}|^2 = \frac{1}{N^2} \sum_{k \ne i} |G^{[i]}_{kk}|^2 \le \frac{1}{N\eta} \cdot \frac{1}{N} \sum_{\alpha=1}^{N-1} \frac{\eta}{|\mu_\alpha - z|^2} = \frac{1}{N\eta} \Big( 1 - \frac{1}{N} \Big) \operatorname{Im} m^{(i)}_N . \tag{7.25} \]
By (7.16), we can estimate $m^{(i)}_N$ by $m_N$. Thus the estimates (7.22) and (7.25) confirm that the size of $Z_i$ is roughly
\[ |Z_i| \lesssim \frac{1}{\sqrt{N\eta}} \sqrt{\operatorname{Im} m_N} \tag{7.26} \]
in the second moment sense. In Section 7.2 we will prove that this inequality actually holds in the large deviation sense, i.e., we have
\[ |Z_i| \prec \frac{1}{\sqrt{N\eta}} \sqrt{\operatorname{Im} m_N} . \tag{7.27} \]
The diagonal entry $h_{ii}$ can easily be estimated. Since the single entry distribution has finite moments (5.6), we have
\[ \mathbb{P}\big( |h_{ii}| \ge N^\varepsilon N^{-1/2} \big) \le C_p N^{-\varepsilon p} \]
for each $i$, for any $\varepsilon > 0$ and any $p$. Hence we can guarantee that all diagonal elements $h_{ii}$ simultaneously satisfy $|h_{ii}| \prec N^{-1/2}$.

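Both the Ward identity (7.23) and the diagonal bound (7.24) used above are deterministic statements about an arbitrary Hermitian matrix, so they can be verified exactly on a random sample. A sketch (NumPy; the size and spectral parameter are our own choices):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 100
H = rng.standard_normal((N, N)) / np.sqrt(N); H = (H + H.T) / np.sqrt(2)
z = 0.1 + 0.3j
eta = z.imag
G = np.linalg.inv(H - z * np.eye(N))

# Ward identity (7.23): sum_l |G_kl|^2 = Im G_kk / eta, for every k
ward_err = np.abs((np.abs(G) ** 2).sum(axis=1) - G.diagonal().imag / eta).max()

# diagonal bound (7.24): sum_k |G_kk|^2 <= sum_alpha |mu_alpha - z|^{-2}
ev = np.linalg.eigvalsh(H)
lhs = (np.abs(np.diag(G)) ** 2).sum()
rhs = (1.0 / np.abs(ev - z) ** 2).sum()
```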

7.1.4 Step 4. Initial estimate at the large scales

To control the error terms in the self-consistent equation (7.18) we need two inputs. First, from now on we assume that (7.27) holds. Since $N\eta$ is large, this implies that $Z_i$ is small provided that $\operatorname{Im} m_N$ is bounded. Second, we need to ensure that the denominator on the right hand side of (7.18) does not become too small. Since the main term in this denominator is $z + m_N(z)$, our task is to show that
\[ \frac{1}{|z + m_N(z)|} \prec 1 . \tag{7.28} \]

Notice that both inputs are in terms of the yet uncontrolled quantity $m_N$; they would be trivially available if $m_N$ were replaced with $m_{sc}$ (see Lemma 6.2). Since the smallness of $m_N - m_{sc}$ is our goal, to break this apparently circular argument we will use a bootstrap strategy. The convenient bootstrap parameter is $\eta$. We first establish the result for large $\eta$ in this section; this is called the initial estimate. Then, in the next section, step by step we reduce the value of $\eta$, using the control from the previous scale to estimate $\operatorname{Im} m_N$ and $|z + m_N(z)|^{-1}$. This control will use the large deviation bounds on $Z_i$, which hold with very high probability. Hence at each step an exceptional event of very small probability will have to be excluded. This is the main reason why the bootstrap argument is done in small discrete steps, although in essence this is a continuity argument. We will explain this important argument in more detail in the next section.

Both the initial estimate and the continuity argument use the following simple idea. By expanding $\Omega_i$ in the denominator in (7.18) and using (7.27) and $h_{ii} \prec N^{-1/2}$, we have
\[ \Big| m_N(z) + \frac{1}{z + m_N(z)} \Big| \prec \frac{1}{|z + m_N(z)|^2} \max_i |\Omega_i| \prec \frac{1}{|z + m_N(z)|^2} \Big[ \sqrt{\frac{\operatorname{Im} m_N}{N\eta}} + N^{-1/2} + (N\eta)^{-1} \Big] \tag{7.29} \]
provided that $\Omega_i$ can be considered a small perturbation of $z + m_N(z)$, i.e.,
\[ \frac{1}{|z + m_N(z)|} \max_i |\Omega_i| \ll 1 , \tag{7.30} \]
so that the expansion is justified. We can now use the following elementary lemma.

Lemma 7.6. Fix $z = E + i\eta$ with $|E| \le 20$, $0 < \eta \le 10$, and set $\kappa := \big| |E| - 2 \big|$. Suppose that $m$ satisfies the inequality
\[ \Big| m + \frac{1}{z + m} \Big| \le \delta \tag{7.31} \]
for some $\delta \le 1$. Then
\[ \min\Big\{ |m - m_{sc}(z)| ,\ \Big| m - \frac{1}{m_{sc}(z)} \Big| \Big\} \le \frac{C\delta}{\sqrt{\kappa + \eta + \delta}} \le C\sqrt{\delta} . \tag{7.32} \]

Proof. For $\delta \le 1$ and $|z| \le 20$, (7.31) implies that $|m| \le 22$. Write (7.31) as
\[ m + \frac{1}{z + m} =: \Delta , \qquad |\Delta| \le \delta , \]
and subtract this from the equation $m_{sc} + (z + m_{sc})^{-1} = 0$. After some simple algebra we get
\[ (m - m_{sc}) \Big[ m - \frac{1}{m_{sc}} \Big] = \Delta (m + z) , \tag{7.33} \]
i.e.,
\[ |m - m_{sc}| \Big| m - \frac{1}{m_{sc}} \Big| \le C\delta , \tag{7.34} \]


for some fixed constant $C$, where we have used (6.11).

We separate two cases. If $|1 - m_{sc}^2| \le C'\sqrt{\delta}$ for some large $C'$, then we write the above inequality as
\[ |m - m_{sc}| \Big| m - m_{sc} - \frac{1 - m_{sc}^2}{m_{sc}} \Big| \le C\delta . \tag{7.35} \]
We claim that this implies $|m - m_{sc}| \le 2C'\sqrt{\delta}/|m_{sc}|$. Indeed, if $|m - m_{sc}| > 2C'\sqrt{\delta}/|m_{sc}|$ were true, then the second factor in (7.35) would be at least $C'\sqrt{\delta}/|m_{sc}|$, so the left hand side of (7.35) would be at least $2(C'/|m_{sc}|)^2 \delta$. Since $m_{sc} \asymp 1$, see (6.11), we would get a contradiction if $C'$ is large enough. Thus we proved $|m - m_{sc}| \le 2C'\sqrt{\delta}/|m_{sc}| \le C\sqrt{\delta}$ in this case, which is in agreement with the first inequality in (7.32), since the condition $|1 - m_{sc}^2| \le C'\sqrt{\delta}$ also implies $\sqrt{\kappa + \eta} \lesssim \sqrt{\delta}$ by (6.12).

In the second case we have $|1 - m_{sc}^2| > C'\sqrt{\delta}$, i.e., $\sqrt{\kappa + \eta} \gtrsim \sqrt{\delta}$, so for (7.32) we need to prove that $|m - m_{sc}| \lesssim \delta/\sqrt{\kappa + \eta}$ or $|m - m_{sc}^{-1}| \lesssim \delta/\sqrt{\kappa + \eta}$. If $|m - m_{sc}| \le \frac{1}{2} |1 - m_{sc}^2|/|m_{sc}|$, then the second factor in (7.35) is at least $\frac{1}{2} |1 - m_{sc}^2|/|m_{sc}| \gtrsim \sqrt{\kappa + \eta}$, so we immediately get $|m - m_{sc}| \lesssim \delta/\sqrt{\kappa + \eta}$. We are left with the case $|m - m_{sc}| > \frac{1}{2} |1 - m_{sc}^2|/|m_{sc}|$, i.e., $|m - m_{sc}| \gtrsim \sqrt{\kappa + \eta}$. Rewrite (7.35) as
\[ \Big| m - \frac{1}{m_{sc}} + \frac{1 - m_{sc}^2}{m_{sc}} \Big| \, \Big| m - \frac{1}{m_{sc}} \Big| \le C\delta , \tag{7.36} \]
and repeat the previous argument with the roles of $m_{sc}$ and $1/m_{sc}$ interchanged. We conclude that either $|m - m_{sc}^{-1}| \le \frac{1}{2} |1 - m_{sc}^2|/|m_{sc}|$, in which case we immediately get $|m - m_{sc}^{-1}| \lesssim \delta/\sqrt{\kappa + \eta}$, or $|m - m_{sc}^{-1}| \gtrsim \sqrt{\kappa + \eta}$. In the latter case, combining it with $|m - m_{sc}| \gtrsim \sqrt{\kappa + \eta}$ from the previous argument, we would get $|m - m_{sc}| |m - m_{sc}^{-1}| \gtrsim \kappa + \eta$. Since $\kappa + \eta \gtrsim |1 - m_{sc}^2|^2 > (C')^2 \delta$, this would contradict (7.34) if $C'$ is large enough. This completes the proof of the lemma.
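Lemma 7.6 can be probed numerically: the perturbed equation $m + (z+m)^{-1} = \Delta$ is a quadratic in $m$, and both of its roots should stay within $C\sqrt{\delta}$ of $m_{sc}$ or of $1/m_{sc}$. A sketch (NumPy; the test point $z$, the constant $C = 10$, and the sampling of $\Delta$ are our own choices, not from the text):

```python
import numpy as np

def msc(z):
    m = (-z + np.sqrt(z * z - 4)) / 2
    return m if m.imag > 0 else (-z - np.sqrt(z * z - 4)) / 2

rng = np.random.default_rng(5)
z = 0.4 + 0.05j
m_ref = msc(z)
worst = 0.0
for _ in range(100):
    delta = 10.0 ** rng.uniform(-6, -1)
    Delta = delta * np.exp(2j * np.pi * rng.uniform())    # |Delta| = delta
    # m + 1/(z+m) = Delta  <=>  m^2 + (z - Delta) m + (1 - Delta*z) = 0
    for m in np.roots([1.0, z - Delta, 1.0 - Delta * z]):
        d = min(abs(m - m_ref), abs(m - 1.0 / m_ref))
        worst = max(worst, d / np.sqrt(delta))   # ratio in (7.32)
```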

Applying Lemma 7.6 in (7.29), we see that
\[ \min\Big\{ |m_N(z) - m_{sc}(z)| ,\ \Big| m_N(z) - \frac{1}{m_{sc}(z)} \Big| \Big\} \prec \frac{1}{|z + m_N(z)|} \min\Big\{ \big( \max_i |\Omega_i| \big)^{1/2} ,\ \frac{\max_i |\Omega_i|}{\sqrt{\kappa}} \Big\} , \tag{7.37} \]

i.e., a good bound on $\Omega_i$ directly yields an estimate on $m_N - m_{sc}$, provided a lower bound on $|z + m_N(z)|$ is given and the possibility that $|m_N(z) - m_{sc}^{-1}(z)|$ is small can be excluded.

Using this scheme, we now derive the initial estimate, which is (7.1) for any $\eta \ge N^{-1/16}$. To check (7.30), we start with the trivial bound $\operatorname{Im} m_N \le \eta^{-1}$. By (7.27), we have $Z_i \prec N^{-15/32}$ for $\eta \ge N^{-1/16}$. Hence we have
\[ \max_i |\Omega_i| \le \max_i |Z_i| + |h_{ii}| + C N^{-15/16} \prec N^{-15/32} . \tag{7.38} \]

Since $\operatorname{Im}(z + m_N(z)) \ge \eta \ge N^{-1/16}$, (7.30) is satisfied for $\eta \ge N^{-1/16}$. Thus (7.29) implies that
\[ \Big| m_N(z) + \frac{1}{z + m_N(z)} \Big| \prec \frac{1}{|z + m_N(z)|^2}\, O(N^{-15/32}) \prec N^{-11/32} \tag{7.39} \]
for any $\eta \ge N^{-1/16}$. The precise exponents do not matter here; our goal is to find some $\eta = N^{-c}$ such that the last equation holds with an error $N^{-c'}$ for some $c, c' > 0$.

Applying Lemma 7.6 we obtain, for any $\eta \ge N^{-1/16}$, that either
\[ |m_N(z) - m_{sc}(z)| \prec N^{-11/64} , \qquad \text{or} \qquad \Big| m_N(z) - \frac{1}{m_{sc}(z)} \Big| \prec N^{-11/64} . \tag{7.40} \]
However, the second option is excluded since $\operatorname{Im} m_N > 0$ and thus
\[ \Big| m_N(z) - \frac{1}{m_{sc}(z)} \Big| \ge \operatorname{Im} m_N - \operatorname{Im} \frac{1}{m_{sc}(z)} \ge c \operatorname{Im} m_{sc}(z) \ge c\eta \ge c N^{-1/16} , \tag{7.41} \]
where we used (6.13). This proves that
\[ |m_N(z) - m_{sc}(z)| \prec N^{-11/64} . \tag{7.42} \]


Once we know that $|m_N(z) - m_{sc}(z)|$ is small, we can simply perturb the relation
\[ \Big| \frac{1}{z + m_{sc}(z)} \Big| = |m_{sc}(z)| \le 1 \]
from Lemma 6.2 to obtain
\[ |m_N(z)| \le C , \qquad \text{and} \qquad \Big| \frac{1}{z + m_N(z)} \Big| \le C . \tag{7.43} \]
Thus we can bound $\Omega_i$ from (7.27) and $h_{ii} \prec N^{-1/2}$ by
\[ \max_i |\Omega_i| \prec \frac{1}{\sqrt{N\eta}} , \tag{7.44} \]
and, instead of (7.39), we get the stronger bound
\[ \Big| m_N(z) + \frac{1}{z + m_N(z)} \Big| \prec \frac{1}{\sqrt{N\eta}} . \]
Applying (7.32) once again, with $\delta = (N\eta)^{-1/2}$, and using (7.41) to exclude the possibility that $m_N$ is close to $1/m_{sc}$, we obtain the better bound
\[ |m_N(z) - m_{sc}(z)| \prec \min\Big( \frac{1}{(N\eta)^{1/4}} ,\ \frac{1}{\sqrt{N\eta\kappa}} \Big) , \qquad \text{for any } \eta \ge N^{-1/16} , \tag{7.45} \]
which is exactly (7.1) for $\eta \ge N^{-1/16}$.

7.1.5 Step 5. Continuity argument and completion of the proof

With (7.1) proven for any $\eta \ge N^{-1/16}$, we now proceed to reduce the scale of $\eta$, while $E$, the real part of $z$, is kept fixed. To do this, choose another scale $\eta_1 = \eta - N^{-4}$ slightly smaller than $\eta$ (but still $\eta_1 \ge N^{-1}$). Since
\[ |m_N'(z)| \le N^{-1} \sum_\alpha \frac{1}{|\lambda_\alpha - z|^2} \le \eta^{-2} , \tag{7.46} \]
we have the deterministic relation
\[ |m_N(z_1) - m_N(z)| \le N^{-2} , \qquad z = E + i\eta , \quad z_1 = E + i\eta_1 . \tag{7.47} \]

Hence (7.1) holds as well for $\eta_1$ instead of $\eta$, since $N^{-2}$ is smaller than all our error terms. But we cannot rely on this idea alone, for the obvious reason that we would need to apply this procedure at least $N^4$ times, and the many small errors would accumulate. We need to show that the estimate in (7.1) does not deteriorate even by the small amount $N^{-2}$. Since the definition of $\prec$ includes additional factors $N^\varepsilon$, it is not sufficiently sensitive to track such precision. For the continuity argument we will now abandon the formalism of stochastic domination and go back to tracking exceptional sets precisely. In Section 8 we will set up a continuity scheme fully within the framework of stochastic domination, but here we complete the proof in a more elementary way.

The idea is to show that for any small exponent $\varepsilon \in (0, \frac{\gamma}{10})$ there is a constant $C_1$ large enough such that if
\[ S(z) := \min\big\{ |m_N(z) - m_{sc}(z)| ,\ |m_N(z) - m_{sc}^{-1}(z)| \big\} \le C_1 N^\varepsilon \min\Big( \frac{1}{(N\eta)^{1/4}} ,\ \frac{1}{\sqrt{N\eta\kappa}} \Big) \tag{7.48} \]
on a set $A$ in the probability space, then we not only have the bound
\[ S(z_1) \le C_1 N^\varepsilon \min\Big( \frac{1}{(N\eta)^{1/4}} ,\ \frac{1}{\sqrt{N\eta\kappa}} \Big) + 2N^{-2} \tag{7.49} \]


that trivially follows from (7.47), but we also have that
\[ S(z_1) \le C_1 N^\varepsilon \min\Big( \frac{1}{(N\eta_1)^{1/4}} ,\ \frac{1}{\sqrt{N\eta_1\kappa}} \Big) \tag{7.50} \]
holds with the same $C_1$ and $\varepsilon$ on a set $A_1 \subset A$ with
\[ \mathbb{P}(A \setminus A_1) \le N^{-D} \quad \text{for any } D > 0 . \tag{7.51} \]

Thus the deterioration of the estimate from (7.47) to (7.50) can be avoided, at least with very high probability. To see this, we note that in the regime $\eta \ge N^{-1+5\varepsilon}$ the bound (7.49) implies that on the set $A$ we have
\[ \operatorname{Im} m_N(z_1) \le |m_N(z_1)| \le |m_{sc}(z_1)| + o(1) \le 2 , \tag{7.52} \]
and
\[ \frac{1}{|z_1 + m_N(z_1)|} \le 2 . \tag{7.53} \]
The last estimate is a perturbative consequence of the bounds $|z + m_{sc}(z)|^{-1} \le 1$ and $|z + m_{sc}^{-1}(z)|^{-1} \le C$, where we used either
\[ |(z_1 + m_N(z_1)) - (z + m_{sc}(z))| \le |m_N(z_1) - m_N(z)| + |z_1 - z| + |m_N(z) - m_{sc}(z)| = o(1) \tag{7.54} \]
or
\[ |(z_1 + m_N(z_1)) - (z + m_{sc}^{-1}(z))| \le |m_N(z_1) - m_N(z)| + |z_1 - z| + |m_N(z) - m_{sc}^{-1}(z)| = o(1) ; \]
one of the two follows from (7.47) and (7.48). Let $A_1$ be the intersection of three sets: $A$, the set on which $|h_{ii}| \le N^{-\frac{1}{2}+\varepsilon}$ holds, and the set on which (7.27) holds at the scale $\eta_1$, i.e., at $z_1 = E + i\eta_1$. Hence, together with (7.53), the condition (7.30) holds on $A_1$. Now we can use (7.37) together with (7.52) and (7.53) to prove that (7.50) holds if $C_1$ is chosen large enough.

At each step, we lose a set of probability $N^{-D}$ due to checking (7.27) at that scale. The estimate from the previous scale is used only to check (7.52), (7.53) and (7.30). Finally, the constant $C_1$ in the final bound on $S(z_1)$ comes from the constant in (7.32), which is uniform in $\eta$. Therefore $C_1$ does not deteriorate when passing to the next scale. The only price to pay is the loss of an exceptional set of probability $N^{-D}$. Since the number of steps is $N^4$, while $D$ can be arbitrarily large, this loss is affordable. Since the exponent $\varepsilon > 0$ was arbitrary, this proves
\[ S(z) = \min\big\{ |m_N(z) - m_{sc}(z)| ,\ |m_N(z) - m_{sc}^{-1}(z)| \big\} \prec \min\Big\{ \frac{1}{\sqrt{N\eta\kappa}} ,\ \frac{1}{(N\eta)^{1/4}} \Big\} \tag{7.55} \]
for any fixed $z \in \mathbf{D}_\gamma$. Using this relation for a discrete net $\widehat{\mathbf{D}}_\gamma := \mathbf{D}_\gamma \cap N^{-4}(\mathbb{Z} + i\mathbb{Z})$ and a union bound, we obtain that (7.55) holds simultaneously for all $z \in \widehat{\mathbf{D}}_\gamma$. Using the Lipschitz continuity of $S(z)$ (with a Lipschitz constant at most $N^2$), we get (7.55) simultaneously for all $z \in \mathbf{D}_\gamma$. Finally, we need to show that the estimate holds not only for the minimum, but for $|m_N(z) - m_{sc}(z)|$ itself. We have already seen this for any $\eta \ge N^{-1/16}$ in (7.45). Fixing $E = \operatorname{Re} z$, we consider $m_N(E + i\eta)$, $m_{sc}(E + i\eta)$ and $S(E + i\eta)$ as functions of $\eta$. Since these are continuous functions, by reducing $\eta$ we obtain that $m_N(E + i\eta)$ remains close to $m_{sc}(E + i\eta)$ as long as the right hand side of (7.55) is smaller than the separation
\[ \big| m_{sc}(E + i\eta) - m_{sc}^{-1}(E + i\eta) \big| \asymp \sqrt{\kappa_E + \eta} . \]
Since the right hand side of (7.55) increases as $\eta$ decreases, while the separation bound $\sqrt{\kappa_E + \eta}$ decreases, once $\eta$ is so small that the separation bound $\sqrt{\kappa_E + \eta}$ becomes smaller than the right hand side of (7.55), the difference between $m_{sc}$ and $m_{sc}^{-1}$ remains irrelevant for any smaller $\eta$. Thus (7.1) holds on the entire $\mathbf{D}_\gamma$. This proves the weak local semicircle law, Theorem 7.1, except for the large deviation estimate (7.27), which we will prove in the next subsection.
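The statement of Theorem 7.1 can be illustrated by direct simulation: even for $\eta$ well below the macroscopic scale, $m_N$ tracks $m_{sc}$. A sketch (NumPy; the matrix size, the bulk energy $E$, and the safety factor in the final comparison are our own choices, not from the text):

```python
import numpy as np

def msc(z):
    m = (-z + np.sqrt(z * z - 4)) / 2
    return m if m.imag > 0 else (-z - np.sqrt(z * z - 4)) / 2

rng = np.random.default_rng(7)
N = 400
H = rng.standard_normal((N, N)) / np.sqrt(N); H = (H + H.T) / np.sqrt(2)
ev = np.linalg.eigvalsh(H)

E = 0.5
eta = N ** (-0.8)                 # far below the macroscopic scale eta ~ 1
z = E + 1j * eta
mN = np.mean(1.0 / (ev - z))      # Stieltjes transform of the empirical law
err = abs(mN - msc(z))
```

Here the weak law (7.1) predicts an error of order $(N\eta)^{-1/4}$, which we check with a generous constant.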


7.2 Large deviation estimates

Finally, in order to estimate large sums of independent random variables as in (7.8), and later in (8.2), we will need a large deviation estimate for linear and quadratic functionals of independent random variables. The case of linear functionals is standard. Quadratic functionals were considered in [62, 69]; the current formulation is taken from [52].

Theorem 7.7 (Large deviation bounds). Let $\big( X_i^{(N)} \big)$ and $\big( Y_i^{(N)} \big)$ be independent families of random variables, and let $\big( a_{ij}^{(N)} \big)$ and $\big( b_i^{(N)} \big)$ be deterministic; here $N \in \mathbb{N}$ and $i, j = 1, \ldots, N$. Suppose that all entries $X_i^{(N)}$ and $Y_i^{(N)}$ are independent and satisfy
\[ \mathbb{E} X = 0 , \qquad \mathbb{E} |X|^2 = 1 , \qquad \|X\|_p := \big( \mathbb{E} |X|^p \big)^{1/p} \le \mu_p \tag{7.56} \]
for all $p \in \mathbb{N}$ and some constants $\mu_p$. Then we have the bounds
\[ \sum_i b_i X_i \prec \Big( \sum_i |b_i|^2 \Big)^{1/2} , \tag{7.57} \]
\[ \sum_{i,j} a_{ij} X_i Y_j \prec \Big( \sum_{i,j} |a_{ij}|^2 \Big)^{1/2} , \tag{7.58} \]
\[ \sum_{i \ne j} a_{ij} X_i X_j \prec \Big( \sum_{i \ne j} |a_{ij}|^2 \Big)^{1/2} . \tag{7.59} \]

Our proof in fact generalizes trivially to arbitrary multilinear estimates for quantities of the form $\sum^*_{i_1, \ldots, i_k} a_{i_1 \ldots i_k}(u)\, X_{i_1}(u) \cdots X_{i_k}(u)$, where the star indicates that the summation indices are constrained to be distinct.

For our purposes the most important inequality is the last one, (7.59). It confirms the intuition from Section 7.1.3 that for sums of the form $\sum_{i \ne j} a_{ij} X_i X_j$ the second moment calculation gives the correct order of magnitude, and it can be improved to a high probability statement in the large deviation sense. Indeed, the second moment calculation gives
\[ \mathbb{E} \Big| \sum_{i \ne j} a_{ij} X_i X_j \Big|^2 = \sum_{i \ne j} \sum_{i' \ne j'} a_{ij} \bar a_{i'j'}\, \mathbb{E}\big[ X_i X_j \bar X_{i'} \bar X_{j'} \big] = \sum_{i \ne j} \big[ |a_{ij}|^2 + a_{ij} \bar a_{ji} \big] \le 2 \sum_{i \ne j} |a_{ij}|^2 , \]
since only the pairings $i = i'$, $j = j'$ or $i = j'$, $j = i'$ give a nonzero contribution.
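The second moment bound above is easy to confirm by Monte Carlo, for instance with Rademacher variables (a simple distribution satisfying (7.56) with $\mu_p = 1$). A sketch (NumPy; sizes, seed, and the 5% tolerance are our own choices):

```python
import numpy as np

rng = np.random.default_rng(6)
N, trials = 30, 20000
a = rng.standard_normal((N, N)); np.fill_diagonal(a, 0.0)  # off-diagonal a_ij

X = rng.choice([-1.0, 1.0], size=(trials, N))   # independent, EX = 0, EX^2 = 1
# S = sum_{i != j} a_ij X_i X_j, vectorized over trials (diagonal of a is zero)
S = np.einsum('ti,ij,tj->t', X, a, X)
second_moment = (S ** 2).mean()
bound = 2.0 * (a ** 2).sum()                    # E|S|^2 <= 2 sum_{i!=j} |a_ij|^2
```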

To prepare the proof of Theorem 7.7, we first recall the following version of the Marcinkiewicz–Zygmund inequality.

Lemma 7.8. Let $X_1, \ldots, X_N$ be a family of independent random variables, each satisfying (7.56), and suppose that the family $(b_i)$ is deterministic. Then
\[ \Big\| \sum_i b_i X_i \Big\|_p \le (Cp)^{1/2} \mu_p \Big( \sum_i |b_i|^2 \Big)^{1/2} . \tag{7.60} \]

Proof. The proof is a simple application of Jensen's inequality. Writing $B^2 := \sum_i |b_i|^2$, we get, by the


classical Marcinkiewicz–Zygmund inequality [126] in the first line, that
\begin{align*}
\Big\| \sum_i b_i X_i \Big\|_p^p &\le (Cp)^{p/2} \Big\| \Big( \sum_i |b_i|^2 |X_i|^2 \Big)^{1/2} \Big\|_p^p = (Cp)^{p/2} B^p\, \mathbb{E}\Big[ \Big( \sum_i \frac{|b_i|^2}{B^2} |X_i|^2 \Big)^{p/2} \Big] \\
&\le (Cp)^{p/2} B^p\, \mathbb{E}\Big[ \sum_i \frac{|b_i|^2}{B^2} |X_i|^p \Big] \le (Cp)^{p/2} B^p \mu_p^p .
\end{align*}
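The inequality (7.60) can be illustrated by Monte Carlo, e.g. for Rademacher variables, where $\mu_p = 1$ and, by comparison with Gaussian moments, the crude constant $C = 1$ is empirically sufficient. A sketch (NumPy; the parameters and the choice $C = 1$ are our own, not from the text):

```python
import numpy as np

rng = np.random.default_rng(8)
N, trials, p = 50, 50000, 6
b = rng.standard_normal(N)
B = np.sqrt((b ** 2).sum())

X = rng.choice([-1.0, 1.0], size=(trials, N))   # Rademacher: mu_p = 1 for all p
S = X @ b
Lp = ((np.abs(S) ** p).mean()) ** (1.0 / p)     # Monte Carlo L^p norm of sum b_i X_i
# (7.60) with mu_p = 1 and C = 1:  ||sum b_i X_i||_p <= sqrt(p) * B
```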

Next, we prove the following intermediate result.

Lemma 7.9. Let $X_1, \ldots, X_N, Y_1, \ldots, Y_N$ be independent random variables, each satisfying (7.56), and suppose that the family $(a_{ij})$ is deterministic. Then for all $p \ge 2$ we have
\[ \Big\| \sum_{i,j} a_{ij} X_i Y_j \Big\|_p \le C p\, \mu_p^2 \Big( \sum_{i,j} |a_{ij}|^2 \Big)^{1/2} . \]

Proof. Write
\[ \sum_{i,j} a_{ij} X_i Y_j = \sum_j b_j Y_j , \qquad b_j := \sum_i a_{ij} X_i . \]
Note that $(b_j)$ and $(Y_j)$ are independent families. By conditioning on the family $(b_j)$, we therefore get from Lemma 7.8 and the triangle inequality that
\[ \Big\| \sum_j b_j Y_j \Big\|_p \le (Cp)^{1/2} \mu_p \Big\| \sum_j |b_j|^2 \Big\|_{p/2}^{1/2} \le (Cp)^{1/2} \mu_p \Big( \sum_j \|b_j\|_p^2 \Big)^{1/2} . \]
Using Lemma 7.8 again, we have
\[ \|b_j\|_p \le (Cp)^{1/2} \mu_p \Big( \sum_i |a_{ij}|^2 \Big)^{1/2} . \]
This concludes the proof.

Lemma 7.10. Let $X_1, \ldots, X_N$ be independent random variables, each satisfying (7.56), and suppose that the family $(a_{ij})$ is deterministic. Then we have
\[ \Big\| \sum_{i \ne j} a_{ij} X_i X_j \Big\|_p \le C p\, \mu_p^2 \Big( \sum_{i \ne j} |a_{ij}|^2 \Big)^{1/2} . \]

Proof. The proof relies on the identity (valid for $i \ne j$)
\[ 1 = \frac{1}{Z_N} \sum_{I \sqcup J = \mathbb{N}_N} \mathbf{1}(i \in I)\, \mathbf{1}(j \in J) , \tag{7.61} \]
where the sum ranges over all partitions of $\mathbb{N}_N = \{1, \ldots, N\}$ into two sets $I$ and $J$, and $Z_N := 2^{N-2}$ is independent of $i$ and $j$. Moreover, we have
\[ \sum_{I \sqcup J = \mathbb{N}_N} 1 = 2^N - 2 , \tag{7.62} \]


where the sum ranges over partitions into nonempty subsets $I$ and $J$. Now we may estimate
\[ \Big\| \sum_{i \ne j} a_{ij} X_i X_j \Big\|_p \le \frac{1}{Z_N} \sum_{I \sqcup J = \mathbb{N}_N} \Big\| \sum_{i \in I} \sum_{j \in J} a_{ij} X_i X_j \Big\|_p \le \frac{1}{Z_N} \sum_{I \sqcup J = \mathbb{N}_N} C p\, \mu_p^2 \Big( \sum_{i \ne j} |a_{ij}|^2 \Big)^{1/2} , \]
where we used that, for any partition $I \sqcup J = \mathbb{N}_N$, the families $(X_i)_{i \in I}$ and $(X_j)_{j \in J}$ are independent, and hence Lemma 7.9 is applicable. The claim now follows from (7.62).

As remarked above, the proof of Lemma 7.10 may be easily extended to multilinear expressions of the form
\[
\sum^{*}_{i_1,\dots,i_k} a_{i_1 \dots i_k} X_{i_1} \cdots X_{i_k} \,.
\]
We may now complete the proof of Theorem 7.7.

Proof of Theorem 7.7. The proof is a simple application of Chebyshev's inequality. Part (i) follows from Lemma 7.8, part (ii) from Lemma 7.9, and part (iii) from Lemma 7.10. We give the details for part (iii).

For $\varepsilon > 0$ and $D > 0$ we have
\[
\begin{aligned}
\mathbb{P}\bigg[ \Big| \sum_{i \neq j} a_{ij} X_i X_j \Big| \geq N^{\varepsilon} \Psi \bigg]
&\leq \mathbb{P}\bigg[ \Big| \sum_{i \neq j} a_{ij} X_i X_j \Big| \geq N^{\varepsilon} \Psi \,,\; \Big( \sum_{i \neq j} |a_{ij}|^2 \Big)^{1/2} \leq N^{\varepsilon/2} \Psi \bigg] + \mathbb{P}\bigg[ \Big( \sum_{i \neq j} |a_{ij}|^2 \Big)^{1/2} > N^{\varepsilon/2} \Psi \bigg] \\
&\leq \mathbb{P}\bigg[ \Big| \sum_{i \neq j} a_{ij} X_i X_j \Big| \geq N^{\varepsilon/2} \Big( \sum_{i \neq j} |a_{ij}|^2 \Big)^{1/2} \bigg] + N^{-D-1} \\
&\leq \bigg( \frac{Cp\,\mu_p^2}{N^{\varepsilon/2}} \bigg)^p + N^{-D-1}
\end{aligned}
\]
for arbitrary $D$. In the second step we used the definition of $\big(\sum_{i \neq j} |a_{ij}|^2\big)^{1/2} \prec \Psi$ with parameters $\varepsilon/2$ and $D+1$. In the last step we used Lemma 7.10 by conditioning on $(a_{ij})$. Given $\varepsilon$ and $D$, there is a large enough $p$ such that the first term on the last line is bounded by $N^{-D-1}$. Since $\varepsilon$ and $D$ were arbitrary, the proof is complete.

The claimed uniformity in $u$, in the case that $a_{ij}$ and $X_i$ depend on an index $u$, also follows from the above estimate.
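The quadratic bound can also be observed numerically. The following Monte Carlo sketch (illustrative sizes, not part of the proof) shows that $\sum_{i\neq j} a_{ij} X_i X_j$ is typically of the size $\big(\sum_{i\neq j}|a_{ij}|^2\big)^{1/2}$, far below the trivial bound $\sum_{i\neq j}|a_{ij}|$:

```python
import numpy as np

# Numerical sketch of the quadratic large deviation bound: for independent
# centred X_i the form sum_{i != j} a_ij X_i X_j has root-mean-square size
# (sum_{i != j} |a_ij|^2)^{1/2}.  All sizes here are illustrative choices.
rng = np.random.default_rng(1)
N, trials = 100, 2000
A = rng.standard_normal((N, N)) / N
np.fill_diagonal(A, 0.0)                        # restrict to i != j
hs_norm = np.sqrt(np.sum(np.abs(A)**2))         # (sum_{i != j} |a_ij|^2)^{1/2}
X = rng.choice([-1.0, 1.0], size=(trials, N))
vals = np.einsum('ti,ij,tj->t', X, A, X)        # sum_{i != j} a_ij X_i X_j per trial
typical = np.sqrt(np.mean(vals**2))             # root mean square over trials
print(typical / hs_norm)                        # O(1): the bound captures the true size
```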


8 Proof of the local semicircle law

In this section we start the proof of the local semicircle law, Theorem 6.7. This section can be read independently of the previous Section 7, so some basic definitions and facts are repeated for convenience. For those readers who may wish to compare this argument with the proof of the weak law, Theorem 7.1, we mention that the basic strategy is similar, except that we use an additional mechanism that we call fluctuation averaging. We point out that the first estimate in (7.29) was not optimal; there we estimated the average of $\Omega_i$ by its maximum. Since the $\Omega_i$'s are almost centred random variables with weak correlation, a more precise estimate of their average leads to a considerable improvement. We now recall that the basic steps to prove the weak law were

1) self-consistent equation for $m_N$;

2) interlacing of eigenvalues to compare $m_N$ and $m^{(i)}_N$;

3) quadratic large deviation estimate for the error term $Z_i$;

4) initial estimate for large $\eta$;

5) extension of the estimate to small $\eta$ by a continuity argument.

Exploiting the fluctuation averaging mechanism requires control on the individual matrix elements of the resolvent $G$ instead of just its normalized trace, $m_N = N^{-1} \mathrm{Tr}\, G$. Therefore, instead of considering the scalar equation for $m_N$, we will consider the vector self-consistent equation for the diagonal elements $G_{ii}$ of the Green function and investigate the stability of this equation. There is no direct analogue of the interlacing property for $G_{ii}$, thus we introduce new resolvent decoupling identities to compare the resolvents of the original matrix and its minors. We will still use the quadratic large deviation estimate to bound the error term. We also use a continuity argument similar to the one given in the weak law to derive a crude estimate on $G_{ii}$ (formulated in terms of a certain dichotomy), but instead of making many small steps and keeping track of the exceptional sets, we follow a genuinely continuous approach within the framework of stochastic domination. Finally, we use the fluctuation averaging lemma (Lemma 8.9) to boost the error estimate by one order, and an iteration argument to prove the local semicircle law, Theorem 6.7. We now start the rigorous proof. We will largely follow the presentation in [55].

8.1 Tools

In this subsection we collect some basic definitions and facts. First we repeat Definition 7.3 of the partial expectation:

Definition 8.1 (Partial expectation and independence). Let $X \equiv X(H)$ be a random variable. For $i \in \{1,\dots,N\}$ define the operations $P_i$ and $Q_i$ through
\[
P_i X := \mathbb{E}(X \,|\, H^{(i)}) \,, \qquad Q_i X := X - P_i X \,.
\]
We call $P_i$ partial expectation in the index $i$. Moreover, we say that $X$ is independent of a set $\mathbb{T} \subset \{1,\dots,N\}$ if $X = P_i X$ for all $i \in \mathbb{T}$.

Next, we define matrices with certain columns and rows zeroed out.

Definition 8.2. For $\mathbb{T} \subset \{1,\dots,N\}$ we set $H^{(\mathbb{T})}$ to be the $N \times N$ matrix defined by
\[
\big(H^{(\mathbb{T})}\big)_{ij} := \mathbf{1}(i \notin \mathbb{T})\,\mathbf{1}(j \notin \mathbb{T})\, h_{ij} \,, \qquad i,j = 1,2,\dots,N.
\]
Moreover, we define the resolvent of $H^{(\mathbb{T})}$ and its normalized trace through
\[
G^{(\mathbb{T})}_{ij}(z) := \big( H^{(\mathbb{T})} - z \big)^{-1}_{ij} \,, \qquad m^{(\mathbb{T})}(z) := \frac{1}{N} \mathrm{Tr}\, G^{(\mathbb{T})}(z) \,.
\]


We also set the notation
\[
\sum_i^{(\mathbb{T})} := \sum_{i \,:\, i \notin \mathbb{T}} \,.
\]
These definitions are the natural generalizations of $H^{(i)}$ and $G^{(i)}$ introduced in Section 7.1.1. In particular, notice that $H^{(\mathbb{T})}$ is the matrix obtained by setting all rows and columns in $\mathbb{T}$ to be zero. This is different from considering the minors obtained by removing the columns and rows in $\mathbb{T}$. Similarly, $G^{(\mathbb{T})}$ is still an $N \times N$ matrix, with $G^{(\mathbb{T})}_{ii} = -z^{-1}$ for $i \in \mathbb{T}$ and $G^{(\mathbb{T})}_{ij} = 0$ if $i \in \mathbb{T}$ and $j \notin \mathbb{T}$. We will denote $G^{(\{i\})}$ simply by $G^{(i)}$, and similarly for a few more indices. This is consistent with the notations we used in the earlier chapters.

The following resolvent decoupling identities form the backbone of all of our calculations. The idea behind them is that a resolvent matrix element $G_{ij}$ depends strongly on the $i$-th and $j$-th columns of $H$, but weakly on all other columns. The first identity determines how to make a resolvent matrix element $G_{ij}$ independent of an additional index $k \neq i,j$. The second identity expresses the dependence of a resolvent matrix element $G_{ij}$ on the matrix elements in the $i$-th or in the $j$-th column of $H$. We added a third identity that relates sums of off-diagonal resolvent entries with a diagonal one. The proofs are elementary.

Lemma 8.3 (Resolvent decoupling identities). For any real or complex Hermitian matrix $H$ and $\mathbb{T} \subset \{1,\dots,N\}$ the following identities hold.

i) First resolvent decoupling identity [69]: If $i, j, k \notin \mathbb{T}$ and $i, j \neq k$, then
\[
G^{(\mathbb{T})}_{ij} = G^{(\mathbb{T}k)}_{ij} + \frac{G^{(\mathbb{T})}_{ik} G^{(\mathbb{T})}_{kj}}{G^{(\mathbb{T})}_{kk}} \,. \tag{8.1}
\]

ii) Second resolvent decoupling identity [52]: If $i, j \notin \mathbb{T}$ satisfy $i \neq j$, then
\[
G^{(\mathbb{T})}_{ij} = -G^{(\mathbb{T})}_{ii} \sum_k^{(\mathbb{T}i)} h_{ik} G^{(\mathbb{T}i)}_{kj} = -G^{(\mathbb{T})}_{jj} \sum_k^{(\mathbb{T}j)} G^{(\mathbb{T}j)}_{ik} h_{kj} \,, \tag{8.2}
\]
where the superscript in the summation means omission, e.g., the summation in the first sum runs over all $k \notin \mathbb{T} \cup \{i\}$.

iii) Ward identity: For any $\mathbb{T} \subset \{1,\dots,N\}$ we have
\[
\sum_j \big| G^{(\mathbb{T})}_{ij} \big|^2 = \frac{1}{\eta} \operatorname{Im} G^{(\mathbb{T})}_{ii} \,. \tag{8.3}
\]

Proof. We will prove this lemma only for $\mathbb{T} = \emptyset$; the general case is a straightforward modification. We first consider (8.2). Recall the resolvent expansion stating that for any two matrices $A$ and $B$,
\[
\frac{1}{A+B} = \frac{1}{A} - \frac{1}{A+B}\, B\, \frac{1}{A} = \frac{1}{A} - \frac{1}{A}\, B\, \frac{1}{A+B} \,, \tag{8.4}
\]
provided that all the matrix inverses exist.

To obtain the first formula in (8.2), we use the first resolvent identity (8.4) at the $(ij)$-th matrix element with $A = H^{(i)} - z$ and $B = H - H^{(i)}$. Since $G^{(i)}_{ij} = (A^{-1})_{ij} = 0$ if $i \neq j$, we immediately have
\[
G_{ij} = -\sum_k G_{ii} h_{ik} G^{(i)}_{kj} - \sum_k G_{ik} h_{ki} G^{(i)}_{ij} = -G_{ii} \sum_{k \neq i} h_{ik} G^{(i)}_{kj} \,, \qquad i \neq j. \tag{8.5}
\]
The second identity in (8.2) follows in the same way by using the second identity in (8.4). To prove the identity (8.1), we let $A = H^{(k)} - z$ and $B = H - H^{(k)}$. Then from the first formula in (8.4) we have
\[
G_{ij} = G^{(k)}_{ij} - G_{ik} \sum_{\ell \neq k} h_{k\ell} G^{(k)}_{\ell j} = G^{(k)}_{ij} + \frac{G_{ik} G_{kj}}{G_{kk}} \,, \tag{8.6}
\]


and in the second step we used (8.2).

Finally, the Ward identity (8.3), already used in (7.23), is well known. It follows from the spectral decomposition of $H$ with eigenvalues $\lambda_\alpha$ and eigenvectors $\mathbf{u}_\alpha$, namely
\[
\sum_j |G_{ij}|^2 = \sum_j G_{ij} G^*_{ji} = \big[ |G|^2 \big]_{ii} = \sum_\alpha \frac{|u_\alpha(i)|^2}{|\lambda_\alpha - z|^2} = \frac{1}{\eta} \operatorname{Im} \sum_\alpha \frac{|u_\alpha(i)|^2}{\lambda_\alpha - z} = \frac{1}{\eta} \operatorname{Im} G_{ii} \,.
\]
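The Ward identity is an exact algebraic identity, so it can be checked directly on a sampled matrix. A minimal numerical sketch (the matrix size and spectral parameter are arbitrary choices):

```python
import numpy as np

# Direct numerical check of the Ward identity (8.3): for the resolvent
# G = (H - z)^{-1} of a Hermitian H at z = E + i*eta, the i-th row satisfies
# sum_j |G_ij|^2 = Im(G_ii) / eta.
rng = np.random.default_rng(2)
N = 50
W = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
H = (W + W.conj().T) / np.sqrt(2 * N)           # a Hermitian test matrix
z = 0.3 + 0.1j
G = np.linalg.inv(H - z * np.eye(N))
eta = z.imag
lhs = np.sum(np.abs(G[0, :])**2)                # sum_j |G_0j|^2
rhs = G[0, 0].imag / eta                        # Im G_00 / eta
print(lhs, rhs)                                 # the two numbers agree up to rounding
```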

8.2 Self-consistent equations on two levels

Using the notation from the previous section, the Schur formula (Lemma 7.2) can be written as
\[
\frac{1}{G^{(\mathbb{T})}_{ii}} = h_{ii} - z - \sum_{k,l}^{(\mathbb{T}i)} h_{ik} G^{(\mathbb{T}i)}_{kl} h_{li} \,, \tag{8.7}
\]
where $i \notin \mathbb{T} \subset \{1,\dots,N\}$. We can take $\mathbb{T} = \emptyset$ to get
\[
G_{ii} = \frac{1}{h_{ii} - z - \sum_{k,l}^{(i)} h_{ik} G^{(i)}_{kl} h_{li}} \,. \tag{8.8}
\]
The partial expectation with respect to the index $i$ in the denominator can be computed explicitly, since $h_{ik} h_{li}$ is independent of $G^{(i)}_{kl}$ and its expectation is nonzero only if $k = l$. We thus get
\[
P_i \bigg[ \sum_{k,l}^{(i)} h_{ik} G^{(i)}_{kl} h_{li} \bigg] = \sum_k^{(i)} s_{ik} G^{(i)}_{kk} = \sum_k^{(i)} s_{ik} G_{kk} - \sum_k^{(i)} s_{ik} \frac{G_{ik} G_{ki}}{G_{ii}} = \sum_k s_{ik} G_{kk} - \sum_k s_{ik} \frac{G_{ik} G_{ki}}{G_{ii}} \,,
\]
where in the second step we used (8.1). We will compare (8.8) with the defining equation of $m_{sc}$,
\[
m_{sc} = \frac{1}{-z - m_{sc}} \,, \tag{8.9}
\]
so we introduce the notation
\[
v_i := G_{ii} - m_{sc}
\]
for the difference. Recalling (6.1), we get the following system of self-consistent equations for $v_i$:
\[
v_i = \frac{1}{-z - m_{sc} - \big( \sum_k s_{ik} v_k - \Upsilon_i \big)} - m_{sc} \,, \tag{8.10}
\]
where
\[
\Upsilon_i := A_i + h_{ii} - Z_i \,, \qquad A_i := \sum_k s_{ik} \frac{G_{ik} G_{ki}}{G_{ii}} \,, \qquad Z_i := Q_i \bigg[ \sum_{k,l}^{(i)} h_{ik} G^{(i)}_{kl} h_{li} \bigg] \,. \tag{8.11}
\]
All these quantities depend on $z$, but we omit this from the notation. We will show that $\Upsilon_i$ is a small error term. For $h_{ii}$ this is clear by (6.27). The term $A_i$ will be small since off-diagonal resolvent entries are small. Finally, $Z_i$ will be small by the large deviation estimate (7.59) from Theorem 7.7.

Before we present more details, we heuristically show the power of this new system of self-consistent equations (8.10) and compare it with the single self-consistent equation used in Section 7 and, in fact, in all previous literature on the resolvent method for Wigner matrices.
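The Schur formula underlying (8.7)–(8.8) can be verified directly on a sampled matrix. In the following sketch $H^{(i)}$ is implemented as the $(N-1)\times(N-1)$ minor, which gives the same sum as the zeroed-out convention since the summation in (8.8) avoids the index $i$; the matrix size and $z$ are arbitrary choices:

```python
import numpy as np

# Direct numerical check of the Schur complement formula (8.8), the T = empty
# case of (8.7): 1/G_ii = h_ii - z - sum_{k,l != i} h_ik G^{(i)}_kl h_li.
rng = np.random.default_rng(6)
N, i = 30, 0
W = rng.standard_normal((N, N))
H = (W + W.T) / np.sqrt(2 * N)
z = 0.4 + 0.2j
G = np.linalg.inv(H - z * np.eye(N))
idx = np.arange(N) != i                         # indices with i removed
Hi = H[np.ix_(idx, idx)]                        # the minor H^{(i)}
Gi = np.linalg.inv(Hi - z * np.eye(N - 1))      # its resolvent
h = H[i, idx]                                   # the i-th row without h_ii
lhs = 1.0 / G[i, i]
rhs = H[i, i] - z - h @ Gi @ h
print(abs(lhs - rhs))                           # zero up to rounding
```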


8.2.1 A scalar self-consistent equation

Introduce the notation
\[
[a] = \frac{1}{N} \sum_i a_i \tag{8.12}
\]
for the average of a vector $(a_i)_{i=1}^N$. Consider the standard Wigner case, $s_{ij} = 1/N$. Then
\[
\sum_k s_{ik} v_k = \frac{1}{N} \sum_k v_k = [v] \;\big( = m_N - m_{sc} \big) \,.
\]
Neglecting $\Upsilon_i$ in (8.10) and taking the average of this relation over $i = 1, 2, \dots, N$, we get
\[
[v] \approx \frac{1}{-z - m_{sc} - [v]} - m_{sc} \,. \tag{8.13}
\]
Recall (8.9), the defining equation of $m_{sc}$:
\[
m_{sc} = \frac{1}{-z - m_{sc}} \,.
\]
Using Lemma 7.6, which asserts that this equation is stable under small perturbations, at least away from the spectral edges $z = \pm 2$, we conclude $[v] \approx 0$ from (8.13). This means that $m_N \approx m_{sc}$, and hence for the empirical density $\varrho_N \approx \varrho_{sc}$, i.e., we obtained Wigner's original semicircle law. This is exactly the idea we used in Section 7, except that the interlacing argument is replaced by Lemma 8.3 this time.
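The heuristic conclusion $m_N \approx m_{sc}$ is easy to observe numerically. The following sketch (illustrative parameters) compares the Stieltjes transform of a sampled Wigner matrix with $m_{sc}(z)$ computed from the quadratic equation (8.9), taking the root with positive imaginary part:

```python
import numpy as np

# Numerical illustration of m_N ≈ m_sc: the Stieltjes transform of a sampled
# Wigner matrix versus the explicit root m_sc(z) = (-z + sqrt(z^2 - 4))/2 of
# (8.9), with the branch chosen so that Im m_sc > 0 for the z used here.
rng = np.random.default_rng(3)
N = 1000
W = rng.standard_normal((N, N))
H = (W + W.T) / np.sqrt(2 * N)                  # standard Wigner matrix, s_ij = 1/N
z = 0.5 + 0.05j
m_N = np.mean(1.0 / (np.linalg.eigvalsh(H) - z))     # m_N = N^{-1} Tr G(z)
m_sc = (-z + np.sqrt(z * z - 4.0)) / 2.0
print(abs(m_N - m_sc))                          # small, of order 1/(N*eta)
```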

8.2.2 A vector self-consistent equation

If we are interested in individual resolvent matrix elements $G_{ii}$ instead of their average $\frac{1}{N}\mathrm{Tr}\,G$, then the scalar equation (8.13) discussed in the previous section is not sufficient. We have to consider (8.10) as a system of equations for the components of the vector $\mathbf{v} = (v_1,\dots,v_N)$. In order to analyse it, we will linearize this system of equations. There are two possible linearizations, depending on how we expand the error terms.

Linearization I. If we know that $\big( \sum_k s_{ik} v_k - \Upsilon_i \big)$ is small, we can expand the denominator in (8.10) to get
\[
v_i = m_{sc}^2 \Big( \sum_k s_{ik} v_k - \Upsilon_i \Big) + m_{sc}^3 \Big( \sum_k s_{ik} v_k - \Upsilon_i \Big)^2 + O\bigg( \Big( \sum_k s_{ik} v_k - \Upsilon_i \Big)^3 \bigg) \,, \tag{8.14}
\]
where we have used the defining equation (8.9) for $m_{sc}$, and used that $|m_{sc}^4| \leq 1$ in the error term. After rearranging (8.14) we get
\[
\big[(1 - m_{sc}^2 S)\mathbf{v}\big]_i = \mathcal{E}_i := -m_{sc}^2 \Upsilon_i + m_{sc}^3 \Big( \sum_k s_{ik} v_k - \Upsilon_i \Big)^2 + O\bigg( \Big( \sum_k s_{ik} v_k - \Upsilon_i \Big)^3 \bigg) \,. \tag{8.15}
\]

Linearization II. If we know that $v_i$ is small, then it is useful to rewrite (8.10) in the following form:
\[
-\sum_k s_{ik} v_k + \Upsilon_i = \frac{1}{m_{sc} + v_i} - \frac{1}{m_{sc}} \,, \tag{8.16}
\]
where we have also used the defining equation of $m_{sc}$. If $|v_i| = o(1)$, then we expand in $v_i$ to second order and multiply both sides by $m_{sc}^2$ to obtain
\[
\big[(1 - m_{sc}^2 S)\mathbf{v}\big]_i = \mathcal{E}_i := -m_{sc}^2 \Upsilon_i + \frac{1}{m_{sc}} v_i^2 + O(|v_i|^3) \,, \tag{8.17}
\]
where we have used the lower bound $|m_{sc}|^2 \geq c$ in the last term (see Lemma 6.2).


Notice that the definitions of $\mathcal{E}_i$ in (8.15) and (8.17) are different, although their leading behavior $-m_{sc}^2 \Upsilon_i$ is the same. In both cases we can continue the analysis by inverting the operator $(1 - m_{sc}^2 S)$ to obtain
\[
\mathbf{v} = \frac{1}{1 - m_{sc}^2 S}\, \mathcal{E} \,, \qquad \text{hence} \qquad \|\mathbf{v}\|_\infty \leq \Big\| \frac{1}{1 - m_{sc}^2 S} \Big\|_{\infty \to \infty} \|\mathcal{E}\|_\infty = \Gamma \|\mathcal{E}\|_\infty \,, \tag{8.18}
\]
and this relation shows how the quantity $\Gamma$, defined in (6.17), emerges. If the error term is indeed small and $\Gamma$ is bounded, then we obtain that $\|\mathbf{v}\|_\infty = \max_i |G_{ii} - m_{sc}|$ is small.

While the expansion logic behind equations (8.14) and (8.17) is the same and the resulting formulae are very similar, the structure of the main proof depends on which linearization of the self-consistent equation is used. In both cases we need to derive an a priori bound to ensure that the expansion is valid. Intuitively, the smallness of $\sum_k s_{ik} v_k - \Upsilon_i$ seems easier to establish than that of $v_i$, since both of its terms are averaged quantities and extra averaging typically helps. But these are random objects, and every estimate comes with an exceptional set in the probability space where it does not hold. It turns out that on the technical level it is worth minimizing the bookkeeping of these events, and this reason favors the second version of the linearization, which operates with controlling a single quantity, $\max_i |v_i|$. In this book, therefore, we will follow the linearization (8.17). We remark that the other option was used in [69, 70], which required first proving the weak semicircle law, Theorem 7.1, to provide the necessary a priori bound. The linearization (8.17) circumvents this step, and the a priori bound on $v_i$ will be proved directly.

8.3 Proof of the local semicircle law without using the spectral gap

In this section we prove a restricted version of Theorem 6.7, namely we replace the threshold $\eta_E$ with a larger threshold $\widetilde\eta_E$ defined as
\[
\widetilde\eta_E := \min \bigg\{ \eta \,:\; \frac{1}{M\xi} \leq \min \Big[ \frac{M^{-\gamma}}{\Gamma(E+\mathrm{i}\xi)^3} \,,\; \frac{M^{-2\gamma}}{\Gamma(E+\mathrm{i}\xi)^4 \operatorname{Im} m_{sc}(E+\mathrm{i}\xi)} \Big] \text{ holds for all } \xi \geq \eta \bigg\} \,. \tag{8.19}
\]
This definition is exactly the same as (6.28), but $\widetilde\Gamma$ is replaced with the larger quantity $\Gamma$; in other words, we do not make use of the spectral gap in $S$. This will pedagogically simplify the presentation, but it will prove the estimates in Theorem 6.7 only in the regime $\eta \geq \widetilde\eta_E$. In Section 9 we will give the proof for the entire regime $\eta \geq \eta_E$. We recall Lemma 6.3, showing that there is no difference between $\Gamma$ and $\widetilde\Gamma$ for generalized Wigner matrices away from the edges (both are of order 1), so readers interested in the local semicircle law only in the bulk should be content with the simpler proof. Near the spectral edges, however, there is a substantial difference. Note that even in the Wigner case (see (6.22)), $\widetilde\eta_E$ is much larger near the spectral edges than the optimal threshold $\eta_E$. For generalized Wigner matrices, while $\eta_E \asymp 1/N$, the threshold $\widetilde\eta_E$ is determined by the relation $N \widetilde\eta_E (\kappa_E + \widetilde\eta_E)^{3/2} \asymp 1$, where $\kappa_E = \big| |E| - 2 \big|$ is the distance of $E$ from the spectral edges.

We stress, however, that the proof given below does not use any model-specific upper bound on $\Gamma$, such as (6.22) or (6.24); only the trivial lower and upper bounds, (6.19) and (6.20), are needed. The actual size of $\Gamma$ enters only implicitly by determining the threshold $\widetilde\eta_E$. This makes the argument applicable to a wide class of problems beyond generalized Wigner matrices, including band matrices, see [55].

Definition 8.4. We call a deterministic nonnegative function $\Psi \equiv \Psi^{(N)}(z)$ an admissible control parameter if we have
\[
c M^{-1/2} \leq \Psi \leq M^{-c} \tag{8.20}
\]
for some constant $c > 0$ and large enough $N$. Moreover, after fixing a $\gamma > 0$, we call any (possibly $N$-dependent) subset
\[
\mathbf{D} = \mathbf{D}^{(N)} \subset \big\{ z \,:\, |E| \leq 10 \,,\; M^{-1} \leq \eta \leq 10 \big\}
\]
a spectral domain.


In this section we will mostly use the spectral domain
\[
\mathbf{S} := \big\{ z \,:\, |E| \leq 10 \,,\; \eta \in [\widetilde\eta_E, 10] \big\} \,, \tag{8.21}
\]
where we note that
\[
\widetilde\eta_E \geq \frac{1}{8} M^{-1+\gamma} \,, \tag{8.22}
\]
using the lower bound $\Gamma \geq c$ from (6.19) in the definition (8.19). Define the random control parameters
\[
\Lambda_o := \max_{i \neq j} |G_{ij}| \,, \qquad \Lambda_d := \max_i |G_{ii} - m_{sc}| \,, \qquad \Lambda := \max(\Lambda_o, \Lambda_d) \,, \tag{8.23}
\]
where the subscripts $d$ and $o$ refer to diagonal and off-diagonal elements. In the typical regime we will work in, all these quantities are small. The key quantity is $\Lambda$, and we will develop an iterative argument to control it. We first derive an estimate on $\Lambda_o + |\Upsilon_i|$ in terms of $\Lambda$. This will be possible only on the event where $\Lambda$ is already small, so we will need to introduce an indicator function $\phi = \mathbf{1}(\Lambda \leq M^{-c})$ with some small $c$. More generally, we will consider any indicator function $\phi$ such that $\phi\Lambda \prec M^{-c}$. Notice that this is a somewhat weaker concept than $\phi\Lambda \leq M^{-c}$ (even if the exponent $c$ is slightly adjusted), but it turns out to be more flexible, since algebraic manipulations involving $\prec$ (Proposition 6.5) can be directly used.

8.3.1 Large deviation bounds on Λo and Υi

Lemma 8.5. The following statements hold for any spectral domain $\mathbf{D}$. Let $\phi$ be the indicator function of some (possibly $z$-dependent) event. If $\phi\Lambda \prec M^{-c}$ for some $c > 0$ holds uniformly in $z \in \mathbf{D}$, then
\[
\phi \big( \Lambda_o + |Z_i| + |\Upsilon_i| \big) \;\prec\; \sqrt{\frac{\operatorname{Im} m_{sc} + \Lambda}{M\eta}} \tag{8.24}
\]
uniformly in $z \in \mathbf{D}$. Moreover, for any fixed ($N$-independent) $\eta > 0$ we have
\[
\Lambda_o + |Z_i| + |\Upsilon_i| \;\prec\; M^{-1/2} \tag{8.25}
\]
uniformly in $z \in \{ w \in \mathbf{D} \,:\, \operatorname{Im} w = \eta \}$.

In other words, (8.24) means that
\[
\Lambda_o + |Z_i| + |\Upsilon_i| \;\prec\; \sqrt{\frac{\operatorname{Im} m_{sc} + \Lambda}{M\eta}} \tag{8.26}
\]
on the event where $\Lambda \prec M^{-c}$ has been a priori established.

Proof of Lemma 8.5. We first observe that $\phi\Lambda \prec M^{-c} \ll 1$, together with the positive lower bound $|m_{sc}(z)| \geq c$, implies that
\[
\frac{\phi}{|G_{ii}|} \;\prec\; 1 \,. \tag{8.27}
\]
A simple iteration of the expansion formula (8.1) shows that
\[
\phi\,\big|G^{(\mathbb{T})}_{ij}\big| \prec M^{-c} \;\; \text{for } i \neq j \,, \qquad \phi\,\big|G^{(\mathbb{T})}_{ii}\big| \prec 1 \,, \qquad \frac{\phi}{\big|G^{(\mathbb{T})}_{ii}\big|} \prec 1 \tag{8.28}
\]
for any subset $\mathbb{T}$ of fixed cardinality.

We begin with the first statement in Lemma 8.5. First we estimate $Z_i$, which we split as
\[
\phi |Z_i| \;\leq\; \phi \bigg| \sum_k^{(i)} \big( |h_{ik}|^2 - s_{ik} \big) G^{(i)}_{kk} \bigg| + \phi \bigg| \sum_{k \neq l}^{(i)} h_{ik} G^{(i)}_{kl} h_{li} \bigg| \,. \tag{8.29}
\]
We estimate each term using Theorem 7.7, by conditioning on $G^{(i)}$ and using the fact that the family $(h_{ik})_{k=1}^N$ is independent of $G^{(i)}$. By (7.57) the first term of (8.29) is stochastically dominated by
\[
\phi \bigg[ \sum_k^{(i)} s_{ik}^2 \big| G^{(i)}_{kk} \big|^2 \bigg]^{1/2} \;\prec\; M^{-1/2} \,,
\]
where (8.28), (6.3) and (6.1) were used. For the second term of (8.29) we apply (7.59) from Theorem 7.7 with $a_{kl} = s_{ik}^{1/2} G^{(i)}_{kl} s_{li}^{1/2}$ and $X_k = s_{ik}^{-1/2} h_{ik}$. We find
\[
\phi \sum_{k,l}^{(i)} s_{ik} \big| G^{(i)}_{kl} \big|^2 s_{li} \;\leq\; \frac{\phi}{M} \sum_{k,l}^{(i)} s_{ik} \big| G^{(i)}_{kl} \big|^2 \;=\; \frac{\phi}{M\eta} \sum_k^{(i)} s_{ik} \operatorname{Im} G^{(i)}_{kk} \;\prec\; \frac{\operatorname{Im} m_{sc} + \Lambda}{M\eta} \,, \tag{8.30}
\]
where in the second step we used $s_{li} \leq M^{-1}$, in the third step the Ward identity (8.3), and in the last step (8.1) and the estimate $\phi/|G_{ii}| \prec 1$. Thus we get
\[
\phi |Z_i| \;\prec\; \sqrt{\frac{\operatorname{Im} m_{sc} + \Lambda}{M\eta}} \,, \tag{8.31}
\]
where we absorbed the bound $M^{-1/2}$ on the first term of (8.29) into the right-hand side of (8.31). Here we only needed to use $\operatorname{Im} m_{sc}(z) \geq c\eta$, as follows from an explicit estimate, see (6.13).

Next, we estimate $\Lambda_o$. We can iterate (8.2) once to get, for $i \neq j$,
\[
G_{ij} = -G_{ii} \sum_k^{(i)} h_{ik} G^{(i)}_{kj} = -G_{ii} G^{(i)}_{jj} \bigg( h_{ij} - \sum_{k,l}^{(ij)} h_{ik} G^{(ij)}_{kl} h_{lj} \bigg) \,. \tag{8.32}
\]
The term $h_{ij}$ is trivially $O_\prec(M^{-1/2})$. In order to estimate the other term, we invoke (7.58) from Theorem 7.7 with $a_{kl} = s_{ik}^{1/2} G^{(ij)}_{kl} s_{lj}^{1/2}$, $X_k = s_{ik}^{-1/2} h_{ik}$, and $Y_l = s_{lj}^{-1/2} h_{lj}$. As in (8.30), we find
\[
\phi \sum_{k,l}^{(i)} s_{ik} \big| G^{(ij)}_{kl} \big|^2 s_{lj} \;\prec\; \frac{\operatorname{Im} m_{sc} + \Lambda}{M\eta} \,,
\]
and thus
\[
\phi \Lambda_o \;\prec\; \sqrt{\frac{\operatorname{Im} m_{sc} + \Lambda}{M\eta}} \,, \tag{8.33}
\]
where we again absorbed the term $h_{ij} \prec M^{-1/2}$ into the right-hand side.

In order to estimate $A_i$ and $h_{ii}$ in the definition of $\Upsilon_i$, we use (8.28) to get
\[
\phi \big[ |A_i| + |h_{ii}| \big] \;\prec\; \phi \Lambda_o^2 + M^{-1/2} \;\leq\; \phi \Lambda_o + C \sqrt{\frac{\operatorname{Im} m_{sc}}{M\eta}} \;\prec\; \sqrt{\frac{\operatorname{Im} m_{sc} + \Lambda}{M\eta}} \,,
\]
where the second step follows from $\operatorname{Im} m_{sc} \geq c\eta$. Collecting (8.31) and (8.33), this completes the proof of (8.24).

The proof of (8.25) is almost identical to that of (8.24). The quantities $\big| G^{(i)}_{kk} \big|$ and $\big| G^{(ij)}_{kk} \big|$ are estimated by the trivial deterministic bound $\eta^{-1} = O(1)$. We omit the details.

8.3.2 Initial bound on Λ

In this subsection, we prove an initial bound asserting that $\Lambda \prec M^{-1/2}$ holds for $z$ with a large imaginary part $\eta$. This bound is rather easy to get after we have proved in (8.25) that the error term $\Upsilon_i$ in the self-consistent equation (8.16) is small.


Lemma 8.6. We have $\Lambda \prec M^{-1/2}$ uniformly in $z \in [-10, 10] + 2\mathrm{i}$.

Proof. We shall make use of the trivial bounds
\[
\big| G^{(\mathbb{T})}_{ij} \big| \leq \frac{1}{\eta} = \frac{1}{2} \,, \qquad |m_{sc}| \leq \frac{1}{\eta} = \frac{1}{2} \,, \tag{8.34}
\]
where the last inequality follows from the fact that $m_{sc}$ is the Stieltjes transform of a probability measure. From (8.25) we get
\[
\Lambda_o + |Z_i| \;\prec\; M^{-1/2} \,. \tag{8.35}
\]
Moreover, we use (8.1) and (8.32) to estimate
\[
|A_i| \;\leq\; \sum_j s_{ij} \bigg| \frac{G_{ij} G_{ji}}{G_{ii}} \bigg| \;\leq\; M^{-1} + \sum_j^{(i)} s_{ij} \big| G_{ji} G^{(i)}_{jj} \big| \, \bigg| h_{ij} - \sum_{k,l}^{(ij)} h_{ik} G^{(ij)}_{kl} h_{lj} \bigg| \;\prec\; M^{-1/2} \,,
\]
where the last step follows by using (7.58), exactly as in the estimate of the right-hand side of (8.32) in the proof of Lemma 8.5. We conclude that $|\Upsilon_i| \prec M^{-1/2}$.

Next, we write (8.16) as
\[
v_i = \frac{m_{sc} \big( \sum_k s_{ik} v_k - \Upsilon_i \big)}{m_{sc}^{-1} - \sum_k s_{ik} v_k + \Upsilon_i} \,. \tag{8.36}
\]
Using $|m_{sc}^{-1}| \geq 2$ and $|v_k| \leq 1$, as follows from (8.34), we find
\[
\bigg| m_{sc}^{-1} - \sum_k s_{ik} v_k + \Upsilon_i \bigg| \;\geq\; 1 + O_\prec(M^{-1/2}) \,.
\]
Using $|m_{sc}| \leq 1/2$ and taking the maximum over all $i$ in (8.36), we therefore conclude that
\[
\Lambda_d \;\leq\; \frac{\Lambda_d + O_\prec(M^{-1/2})}{2 + O_\prec(M^{-1/2})} \;=\; \frac{\Lambda_d}{2} + O_\prec(M^{-1/2}) \,, \tag{8.37}
\]
from which the claim follows, together with the estimate on $\Lambda_o$ from (8.35).
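The conclusion of Lemma 8.6 can be observed on a sampled matrix: at $\operatorname{Im} z = 2$ both $\Lambda_d$ and $\Lambda_o$ are of order $M^{-1/2} = N^{-1/2}$ for a standard Wigner matrix. A numerical sketch with an illustrative size:

```python
import numpy as np

# Numerical sketch of the initial estimate, Lemma 8.6: at z = E + 2i both
# Lambda_d = max_i |G_ii - m_sc| and Lambda_o = max_{i != j} |G_ij| are of
# order N^{-1/2} for a standard Wigner matrix.
rng = np.random.default_rng(4)
N = 400
W = rng.standard_normal((N, N))
H = (W + W.T) / np.sqrt(2 * N)
z = 0.0 + 2.0j
m_sc = (-z + np.sqrt(z * z - 4.0)) / 2.0
G = np.linalg.inv(H - z * np.eye(N))
Lambda_d = np.max(np.abs(np.diag(G) - m_sc))
Lambda_o = np.max(np.abs(G - np.diag(np.diag(G))))
print(Lambda_d * np.sqrt(N), Lambda_o * np.sqrt(N))   # both O(1) on the scale N^{-1/2}
```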

8.3.3 A rough bound on Λ and a continuity argument

The next step is to get a rough bound on $\Lambda$ by a continuity argument.

Proposition 8.7. We have $\Lambda \prec M^{-\gamma/3} \Gamma^{-1}$ uniformly in the domain $\mathbf{S}$ defined in (8.21).

Proof. The core of the proof is a continuity argument. The first task is to establish a gap in the range of $\Lambda$ by establishing a dichotomy. Roughly speaking, the following lemma asserts that, for all $z \in \mathbf{S}$, with high probability either $\Lambda \leq M^{-\gamma/2} \Gamma^{-1}$ or $\Lambda \geq M^{-\gamma/4} \Gamma^{-1}$; i.e., there is a gap, or forbidden region, in the range of $\Lambda$ with very high probability.

Lemma 8.8. We have the bound
\[
\mathbf{1}\big( \Lambda \leq M^{-\gamma/4} \Gamma^{-1} \big)\, \Lambda \;\prec\; M^{-\gamma/2} \Gamma^{-1}
\]
uniformly in $\mathbf{S}$.


Proof. Set
\[
\phi := \mathbf{1}\big( \Lambda \leq M^{-\gamma/4} \Gamma^{-1} \big) \,.
\]
Then by definition we have $\phi\Lambda \leq M^{-\gamma/4}\Gamma^{-1} \leq C M^{-\gamma/4}$, where in the last step we have used that $\Gamma$ is bounded below, see (6.19). Hence we may invoke (8.24) to estimate $\Lambda_o$ and $\Upsilon_i$ by $\sqrt{(\operatorname{Im} m_{sc} + \Lambda)/(M\eta)}$. In order to estimate $\Lambda_d$, we use (8.18) to get
\[
\phi \Lambda_d = \phi \max_i |v_i| \;\prec\; \Gamma \bigg( \Lambda^2 + \sqrt{\frac{\operatorname{Im} m_{sc} + \Lambda}{M\eta}} \bigg) \,. \tag{8.38}
\]
Recalling (6.19) and (8.24), we therefore get
\[
\phi \Lambda \;\prec\; \phi \Gamma \bigg( \Lambda^2 + \sqrt{\frac{\operatorname{Im} m_{sc} + \Lambda}{M\eta}} \bigg) \,. \tag{8.39}
\]
Next, by definition of $\phi$ we may estimate
\[
\phi \Gamma \Lambda^2 \;\leq\; M^{-\gamma/2} \Gamma^{-1} \,.
\]
Moreover, by definition of $\mathbf{S}$, we have
\[
\frac{1}{M\eta} \;\leq\; \min \bigg\{ \frac{M^{-\gamma}}{\Gamma^3} \,,\; \frac{M^{-2\gamma}}{\Gamma^4 \operatorname{Im} m_{sc}} \bigg\} \,. \tag{8.40}
\]
Together with the definition of $\phi$, we have
\[
\phi \Gamma \sqrt{\frac{\operatorname{Im} m_{sc} + \Lambda}{M\eta}} \;\leq\; \Gamma \sqrt{\frac{\operatorname{Im} m_{sc}}{M\eta}} + \Gamma \sqrt{\frac{\Gamma^{-1}}{M\eta}} \;\leq\; M^{-\gamma} \Gamma^{-1} + M^{-\gamma/2} \Gamma^{-1} \;\leq\; 2 M^{-\gamma/2} \Gamma^{-1} \,.
\]
(Notice that the middle inequality is the crucial place where the definition of $\widetilde\eta_E$ and the restriction $\eta \geq \widetilde\eta_E$ are used.) Plugging this into (8.39) yields $\phi\Lambda \prec M^{-\gamma/2}\Gamma^{-1}$, which is the claim.

If we knew that $\Lambda$ is excluded from the interval $[M^{-\gamma/2}\Gamma^{-1}, M^{-\gamma/4}\Gamma^{-1}]$, then we could immediately finish the proof of Proposition 8.7. We could argue that $\Lambda = \Lambda(E + \mathrm{i}\eta)$ is continuous in $\eta = \operatorname{Im} z$ and hence cannot jump from one side of the gap to the other; moreover, for $\eta = 2$ it is below the gap by Lemma 8.6, so $\Lambda$ is below the gap for all $z \in \mathbf{S}$ with high probability. For a pictorial illustration of this argument, see Figure 1.

However, Lemma 8.8 guarantees a gap in the range of $\Lambda$ only with a very high probability for each fixed $z$. We need to use a fine discrete grid in the space of $z$ to upgrade this statement to all $z$ with high probability. Then the continuity argument described in the previous paragraph will be valid with high probability. In the next step we explain the details.

The continuity argument

Fix $D \geq 10$. Lemma 8.8 implies that for each $z \in \mathbf{S}$ the probability that $\Lambda$ falls into the gap (or forbidden region) is very small, i.e., we have
\[
\mathbb{P}\big( M^{-\gamma/3} \Gamma(z)^{-1} \leq \Lambda(z) \leq M^{-\gamma/4} \Gamma(z)^{-1} \big) \;\leq\; N^{-D} \tag{8.41}
\]
for $N \geq N_0$, where $N_0 \equiv N_0(\gamma, D)$ does not depend on $z$. The whole argument below is valid for $N \geq N_0$.

Next, take a lattice $\Delta \subset \mathbf{S}$ such that $|\Delta| \leq N^{10}$ and for each $z \in \mathbf{S}$ there exists a $w \in \Delta$ such that $|z - w| \leq N^{-4}$. Then (8.41), combined with a union bound, gives
\[
\mathbb{P}\big( \exists\, w \in \Delta \,:\; M^{-\gamma/3} \Gamma(w)^{-1} \leq \Lambda(w) \leq M^{-\gamma/4} \Gamma(w)^{-1} \big) \;\leq\; N^{-D+10} \,. \tag{8.42}
\]


Figure 1: The $(\eta, \Lambda)$-plane for a fixed $E$ with the graph of $\eta \mapsto \Lambda(E + \mathrm{i}\eta)$. The shaded region is forbidden with high probability by Lemma 8.8. The initial estimate, Lemma 8.6, is marked with a black dot. The graph of $\Lambda = \Lambda(E + \mathrm{i}\eta)$ is continuous and lies beneath the shaded region. Note that this method does not control $\Lambda(E + \mathrm{i}\eta)$ in the regime $\eta \leq \widetilde\eta_E$.

From the definitions of $\Lambda(z)$, $\Gamma(z)$, and $\mathbf{S}$ (recall (6.19)), we immediately find that $\Lambda$ and $\Gamma$ are Lipschitz continuous on $\mathbf{S}$, with Lipschitz constant at most $M^2$. Hence (8.42) implies that
\[
\mathbb{P}\big( \exists\, z \in \mathbf{S} \,:\; 2 M^{-\gamma/3} \Gamma(z)^{-1} \leq \Lambda(z) \leq 2^{-1} M^{-\gamma/4} \Gamma(z)^{-1} \big) \;\leq\; N^{-D+10} \,,
\]
i.e., a slightly smaller gap is present simultaneously for all $z \in \mathbf{S}$, and not only on the discrete lattice $\Delta$. We conclude that there is an event $\Xi$ satisfying $\mathbb{P}(\Xi) \geq 1 - N^{-D+10}$ such that, for each $z \in \mathbf{S}$, either $\mathbf{1}(\Xi)\Lambda(z) \leq 2M^{-\gamma/3}\Gamma(z)^{-1}$ or $\mathbf{1}(\Xi)\Lambda(z) \geq 2^{-1}M^{-\gamma/4}\Gamma(z)^{-1}$. Since $\Lambda$ is continuous and $\mathbf{S}$ is by definition connected, we conclude that either
\[
\forall z \in \mathbf{S} \,:\; \mathbf{1}(\Xi)\Lambda(z) \leq 2 M^{-\gamma/3} \Gamma(z)^{-1} \tag{8.43}
\]
or
\[
\forall z \in \mathbf{S} \,:\; \mathbf{1}(\Xi)\Lambda(z) \geq 2^{-1} M^{-\gamma/4} \Gamma(z)^{-1} \,. \tag{8.44}
\]
Here the bounds (8.43) and (8.44) each hold surely, i.e., for every realization of $\Lambda(z)$.

It remains to show that (8.44) is impossible. In order to do so, it suffices to show that there exists a $z \in \mathbf{S}$ such that $\Lambda(z) < 2^{-1} M^{-\gamma/4} \Gamma(z)^{-1}$ with probability greater than $1/2$. But this holds for any $z$ with $\operatorname{Im} z = 2$, as follows from Lemma 8.6 and the bound $\Gamma \leq C\eta^{-1}$, see (6.20). This concludes the proof of Proposition 8.7.

8.3.4 Fluctuation averaging lemma

Recall the definition of the small control parameter $\Pi$ from (6.30) and the definition of the average $[a]$ of a vector $(a_i)_{i=1}^N$ from (8.12). We have shown in Lemma 8.5 that $|\Upsilon_i| \prec \Pi$, but in fact the average $[\Upsilon]$ is one order better: it is $\prec \Pi^2$. This is due to the fluctuation averaging phenomenon, which we state as the following lemma. We will explain its proof and related earlier results in Section 10.

We shall perform the averaging with respect to a family of complex weights $T = (t_{ik})$ satisfying
\[
0 \leq |t_{ik}| \leq M^{-1} \,, \qquad \sum_k |t_{ik}| \leq 1 \,. \tag{8.45}
\]


Typical example weights are $t_{ik} = s_{ik}$ and $t_{ik} = N^{-1}$. Note that in both of these cases $T$ commutes with $S$.

Lemma 8.9 (Fluctuation averaging). Fix a spectral domain $\mathbf{D}$ and a deterministic control parameter $\Psi$ satisfying (8.20). Let the weights $T = (t_{ik})$ satisfy (8.45).

(i) If $\Lambda \prec \Psi$, then we have
\[
\sum_k t_{ik} Q_k G_{kk} = O_\prec(\Psi^2) \,. \tag{8.46}
\]

(ii) If $\Lambda_d \prec M^{-c}$ for some $c > 0$ and $\Lambda_o \prec \Psi_o$ (with $\Psi_o$ also satisfying (8.20)), then we have
\[
\sum_k t_{ik} Q_k \frac{1}{G_{kk}} = O_\prec(\Psi_o^2) \,. \tag{8.47}
\]

(iii) Assume that $T$ commutes with $S$ and $\Lambda \prec \Psi$. Then we have
\[
\sum_k t_{ik} v_k = O_\prec(\Gamma \Psi^2) \,. \tag{8.48}
\]
If, additionally, we have
\[
\sum_k t_{ik} = 1 \qquad \text{for all } i \,, \tag{8.49}
\]
then
\[
\sum_k t_{ik} \big( v_k - [v] \big) = O_\prec(\widetilde\Gamma \Psi^2) \qquad \text{for all } i \,, \tag{8.50}
\]
where we defined $v_i := G_{ii} - m_{sc}$.

The estimates (8.46)–(8.48) and (8.50) are uniform in the index $i$.

Notice that the last statement (8.50) involves the parameter $\widetilde\Gamma$ instead of $\Gamma$, indicating that the spectral gap of $S$ is used in its proof. We now prove a simple corollary of the bound (8.47).

Corollary 8.10. Suppose that $\Lambda_d \prec M^{-c}$ for some $c > 0$ and $\Lambda_o \prec \Psi_o$ for some deterministic control parameter $\Psi_o$ satisfying (8.20). Suppose that the weights $T = (t_{ik})$ satisfy (8.45). Then we have
\[
\sum_k t_{ak} \Upsilon_k = O_\prec(\Psi_o^2) \,.
\]

Proof. The claim easily follows from the Schur complement formula (8.7), written in the form
\[
\Upsilon_i = A_i + Q_i \frac{1}{G_{ii}} \,.
\]
We may therefore estimate $\sum_k t_{ak} \Upsilon_k$ using the trivial bound $|A_i| \prec \Psi_o^2$ as well as the fluctuation averaging bound from (8.47).
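The gain from fluctuation averaging is visible numerically: the average $[v] = m_N - m_{sc}$ is one order smaller than the typical size of an individual $v_i = G_{ii} - m_{sc}$. A sketch with illustrative parameters:

```python
import numpy as np

# Numerical sketch of the fluctuation averaging phenomenon: the average
# [v] = m_N - m_sc is one order smaller than the typical size of the
# individual fluctuations v_i = G_ii - m_sc (compare (6.31) with (6.32)).
rng = np.random.default_rng(5)
N, eta = 1000, 0.1
W = rng.standard_normal((N, N))
H = (W + W.T) / np.sqrt(2 * N)
z = 0.2 + 1j * eta
m_sc = (-z + np.sqrt(z * z - 4.0)) / 2.0
G = np.linalg.inv(H - z * np.eye(N))
v = np.diag(G) - m_sc
print(np.max(np.abs(v)), np.abs(np.mean(v)))   # roughly (N*eta)^{-1/2} vs (N*eta)^{-1}
```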

8.3.5 The final iteration scheme

First note that Proposition 8.7 guarantees that $\phi \equiv 1$ may be chosen in Lemma 8.5, since the condition $\phi\Lambda \prec M^{-c}$ is satisfied. Therefore Lemma 8.5 asserts that $\Lambda_o$ is stochastically dominated by
\[
\sqrt{\frac{\operatorname{Im} m_{sc} + \Lambda}{M\eta}} \;\leq\; M^{-\varepsilon} \Lambda + \sqrt{\frac{\operatorname{Im} m_{sc}}{M\eta}} + \frac{M^{\varepsilon}}{M\eta} \,,
\]
where we have used the Schwarz inequality. The next lemma, the main estimate behind the proof of Theorem 6.7, extends this to a bound on $\Lambda_d$ by the same quantity. Thus, roughly speaking, we can estimate $\Lambda$ by $M^{-\varepsilon}\Lambda$ plus a deterministic error term. This gives a recursive relation on the upper bound for $\Lambda$, which will be the basic step of our iteration scheme.


Proposition 8.11. Let $\Psi$ be a control parameter satisfying
\[
c M^{-1/2} \leq \Psi \leq M^{-\gamma/3} \Gamma^{-1} \tag{8.51}
\]
and fix $\varepsilon \in (0, \gamma/3)$. Then on the domain $\mathbf{S}$ we have the implication
\[
\Lambda \prec \Psi \;\Longrightarrow\; \Lambda \prec F(\Psi) \,, \tag{8.52}
\]
where we defined
\[
F(\Psi) := M^{-\varepsilon} \Psi + \sqrt{\frac{\operatorname{Im} m_{sc}}{M\eta}} + \frac{M^{\varepsilon}}{M\eta} \,.
\]

Proof. Suppose that $\Lambda \prec \Psi$ for some deterministic control parameter $\Psi$ satisfying (8.51). We invoke Lemma 8.5 with $\phi = 1$ (recall the bound (6.19)) to get
\[
\Lambda_o + |Z_i| + |\Upsilon_i| \;\prec\; \sqrt{\frac{\operatorname{Im} m_{sc} + \Lambda}{M\eta}} \;\prec\; \sqrt{\frac{\operatorname{Im} m_{sc} + \Psi}{M\eta}} \,. \tag{8.53}
\]
Next, we estimate $\Lambda_d$. Define the $z$-dependent indicator function
\[
\psi := \mathbf{1}\big( \Lambda \leq M^{-\gamma/4} \big) \,.
\]
By (8.51), (6.19), and the assumption $\Lambda \prec \Psi$, we have $1 - \psi \prec 0$. On the event $\{\psi = 1\}$, the expansion (8.17) is rigorous, and we get the bound
\[
\psi |v_i| \;\leq\; C \psi \bigg| \sum_k s_{ik} v_k - \Upsilon_i \bigg| + C \psi \Lambda^2 \,.
\]
Using the fluctuation averaging estimate (8.48) to bound $\sum_k s_{ik} v_k$, and (8.53) to bound $\Upsilon_i$, we find
\[
\psi |v_i| \;\prec\; \Gamma \Psi^2 + \sqrt{\frac{\operatorname{Im} m_{sc} + \Psi}{M\eta}} \,, \tag{8.54}
\]
where we used the assumption $\Lambda \prec \Psi$ and the lower bound from (6.19), so that $C\psi\Lambda^2 \prec \Gamma\Psi^2$. Since the event $\{\psi = 0\}$ has very small probability (in our notation, this is expressed by $1 - \psi \prec 0$), we conclude
\[
\Lambda_d \;\prec\; \Gamma \Psi^2 + \sqrt{\frac{\operatorname{Im} m_{sc} + \Psi}{M\eta}} \,, \tag{8.55}
\]
which, combined with (8.53), yields
\[
\Lambda \;\prec\; \Gamma \Psi^2 + \sqrt{\frac{\operatorname{Im} m_{sc} + \Psi}{M\eta}} \,. \tag{8.56}
\]
Using the Schwarz inequality and the assumption $\Psi \leq M^{-\gamma/3}\Gamma^{-1}$, we conclude the proof.

Finally, we complete the proof of Theorem 6.7. It is easy to check that, on the domain $\mathbf{S}$, if $\Psi$ satisfies (8.51) then so does $F(\Psi)$. In fact, this step is the origin of the somewhat complicated definition of $\widetilde\eta_E$. We may therefore iterate (8.52). This yields a bound on $\Lambda$ that is essentially the fixed point of the map $\Psi \mapsto F(\Psi)$, which is given by $\Pi$, defined in (6.30) (up to the factor $M^{\varepsilon}$). More precisely, the iteration is started with $\Psi_0 := M^{-\gamma/3}\Gamma^{-1}$; the initial hypothesis $\Lambda \prec \Psi_0$ is provided by Proposition 8.7. For $k \geq 1$ we set $\Psi_k := F(\Psi_{k-1})$. Hence from (8.52) we conclude that $\Lambda \prec \Psi_k$ for all $k$. Choosing $k := \lceil \varepsilon^{-1} \rceil$ yields
\[
\Lambda \;\prec\; \sqrt{\frac{\operatorname{Im} m_{sc}}{M\eta}} + \frac{M^{\varepsilon}}{M\eta} \,.
\]
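The iteration $\Psi_k = F(\Psi_{k-1})$ can be traced explicitly. The following sketch iterates $F$ with illustrative numerical values of $M$, $\eta$, $\varepsilon$, and $\operatorname{Im} m_{sc}$, showing that after $\lceil \varepsilon^{-1} \rceil$ steps the bound has contracted to a constant multiple of the deterministic fixed point:

```python
import numpy as np

# A sketch of the final iteration scheme: the map
#   F(Psi) = M^{-eps} * Psi + sqrt(Im m_sc / (M*eta)) + M^eps/(M*eta)
# from Proposition 8.11 is iterated from an a priori bound Psi_0.  The
# iterates contract geometrically toward the deterministic error term.
# The numerical values of M, eta, eps, Im m_sc are illustrative choices.
M, eta, eps, im_msc = 10**6, 1e-3, 0.1, 0.5
target = np.sqrt(im_msc / (M * eta)) + M**eps / (M * eta)

def F(psi):
    return M**(-eps) * psi + target

psi = 0.1                                       # stand-in for the initial bound Psi_0
for _ in range(int(np.ceil(1 / eps))):          # ceil(1/eps) iterations
    psi = F(psi)
print(psi, target)                              # psi is within a constant factor of the fixed point
```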


Since $\varepsilon$ was arbitrary, we have proved that
\[
\Lambda \;\prec\; \Pi = \sqrt{\frac{\operatorname{Im} m_{sc}(z)}{M\eta}} + \frac{1}{M\eta} \,, \tag{8.57}
\]
which is (6.31).

which is (6.31)What remains is to prove (6.32). On the set ψ = 1, we once again use (8.17) to get

ψm2sc

(−∑k

sikvk + Υi

)= −ψvi +O(ψΛ2) . (8.58)

Averaging in (8.58) yieldsψm2

sc

(−[v] + [Υ]

)= −ψ[v] +O(ψΛ2) . (8.59)

By (8.57) and (8.53) with Ψ = Π, we have Λ + |Υi| ≺ Π. Moreover, by Corollary 8.10 (with the choice oftik = 1

N ) we have |[Υ]| ≺ Π2. Thus we get

ψ[v] = m2scψ[v] +O≺(Π2) .

Since 1− ψ ≺ 0, we conclude that [v] = m2sc[v] +O≺(Π2). Therefore

|[v]| ≺ Π2

|1−m2sc|6

(Immsc

|1−m2sc|

+1

|1−m2sc|Mη

)2

Mη6

(C +

Γ

)2

Mη6

C

Mη. (8.60)

Here in the third step we used the elementary explicit bound Immsc 6 C|1 −m2sc| from (6.12)–(6.13) and

the bound Γ > |1 −m2sc|−1 from (6.21). Since |mN −msc| = |[v]|, this concludes the proof of (6.32). The

proof of (6.33) is exactly the same, just in the third inequality of (8.60) we may use the stronger boundImmsc η/

√κ+ η from (6.13) in the regime |E| > 2. This completes the proof of Theorem 6.7 in the entire

regime S, i.e., for η > ηE .

8.3.6 Summary of the proof: a recapitulation

In order to highlight the main ideas again, we summarize the key steps leading to the proof of Theorem 6.7 restricted to the regime $\eta \ge \widetilde\eta_E$. As a guiding principle, we stress that the main control parameter in the proof is $\Lambda = \max_{i \ne j} \{ |G_{ii} - m_{sc}|, |G_{ij}| \}$. The main goal is to derive a closed, self-improving inequality for $\Lambda$, at least with very high probability. We needed the following ingredients:

1) A self-consistent system of equations for the diagonal elements of the resolvent, written as
\[
-(Sv)_i + \Upsilon_i = \frac{1}{m_{sc} + v_i} - \frac{1}{m_{sc}} , \qquad \text{where } v_i = G_{ii} - m_{sc} . \tag{8.61}
\]
Since $\Upsilon_i$ contains the resolvent $G^{(i)}$, we also need perturbation formulas connecting $G$ and $G^{(i)}$ to make these equations self-consistent.

2) A large deviation bound on the off-diagonal elements and on the fluctuation, i.e., on $\Lambda_o + |Z_i|$:
\[
\Lambda_o + |Z_i| + |\Upsilon_i| \prec \sqrt{\frac{\operatorname{Im} m_{sc} + \Lambda}{M\eta}} ; \tag{8.62}
\]

3) An initial estimate on $\Lambda$ for large $\eta$, where the a priori bounds $|G_{ij}| \le \eta^{-1} \le C$ are effective;

4) A dichotomy, showing that there is a forbidden region for $\Lambda$:
\[
\mathbf 1\big( \Lambda \le M^{-\gamma/4} \Gamma^{-1} \big)\, \Lambda \prec M^{-\gamma/2} \Gamma^{-1} ;
\]


5) A crude bound on $\Lambda$ down to small $\eta$, obtained from the dichotomy and the initial estimate via a continuity argument;

6) Application of the fluctuation averaging lemma, i.e., Lemma 8.9, to estimate $Sv$ in the self-consistent equation. Thus $Sv$ becomes of order $\Lambda^2$, i.e., one order higher than the trivial bound $|v_i| \le \Lambda$. This "boost" exploits the cancellations in averages of weakly correlated centered random variables, and it constitutes the crucial improvement over Section 7. Thus we can use (8.17) to obtain
\[
v_i \lesssim \Upsilon_i + O(\Lambda^2) . \tag{8.63}
\]

7) Combination of the large deviation bound (8.62) for $\Upsilon_i$ with (8.63) provides an estimate on $\Lambda_d = \max_i |v_i|$ in terms of $\Lambda$ that is better than the trivial bound $\Lambda_d \le \Lambda$. Together with the large deviation bound on $\Lambda_o$ in (8.62), we have a closed inequality for $\Lambda$:
\[
\Lambda \le M^{-\varepsilon} \Lambda + \sqrt{\frac{\operatorname{Im} m_{sc}}{M\eta}} + \frac{M^\varepsilon}{M\eta} .
\]
The error term $M^{-\varepsilon}\Lambda$ can be absorbed into the left side, and we have proved the estimate on $\Lambda$ asserted in Theorem 6.7. In practice, the last inequality was formulated in terms of a control parameter and we used an iteration scheme.

8) Averaging the self-consistent equation (8.17) in order to obtain a stronger estimate on the average quantity $[v]$. Fluctuation averaging is applied again, this time to the average of $\Upsilon$, so that it becomes one order higher. This concludes the proof of Theorem 6.7 for $\eta \ge \widetilde\eta_E$.


9 Sketch of the proof of the local semicircle law using the spectral gap

In Section 8.3 we proved the local semicircle law, Theorem 6.7, uniformly for $\eta \ge \widetilde\eta_E$ instead of the larger regime $\eta \ge \eta_E$. Recall that for generalized Wigner matrices the two thresholds $\widetilde\eta_E$ and $\eta_E$ are determined by the relations $N \widetilde\eta_E (\kappa_E + \widetilde\eta_E)^{3/2} \asymp 1$ and $\eta_E \asymp 1/N$. Hence these two thresholds coincide in the bulk but differ substantially near the edge. In this section we sketch the proof of Theorem 6.7 for any $\eta \ge \eta_E$, i.e., on the optimal domain for $\eta$. For the complete proof we refer to Section 6 of [55]. This section can be read independently of Section 8.3.

We point out that the difference between these two thresholds stems from the difference between $\Gamma$ and $\widetilde\Gamma$, see (6.28) and (8.19). The bound $\Gamma$ on the norm of $(1 - m_{sc}^2 S)^{-1}$ entered the proof when the self-consistent equation (8.17) was solved. The key idea in this section is that we can use $\widetilde\Gamma$ to solve the self-consistent equation (8.17) separately on the subspace of constants (the span of the vector $\mathbf e$) and on its orthogonal complement $\mathbf e^\perp$.

Step 1. We bound the control parameter $\Lambda = \Lambda(z)$ from (8.23) in terms of $\Theta := |m_N - m_{sc}|$. This is the content of the following lemma. From now on, we assume that $c \le \widetilde\Gamma \le C$, which is valid for generalized Wigner matrices, see (6.24). This will simplify several estimates in the argument that follows; for the general case, we refer to [55].

Lemma 9.1. Define the $z$-dependent indicator function
\[
\phi := \mathbf 1\big( \Lambda \le M^{-\gamma/4} \big) \tag{9.1}
\]
and the random control parameter
\[
q(\Theta) := \sqrt{\frac{\operatorname{Im} m_{sc} + \Theta}{M\eta}} + \frac{1}{M\eta} , \qquad \Theta = |m_N - m_{sc}| . \tag{9.2}
\]
Then we have
\[
\phi \Lambda \prec \Theta + q(\Theta) . \tag{9.3}
\]

Proof. For the whole proof we work on the event $\{\phi = 1\}$, i.e., every quantity is multiplied by $\phi$. We consistently drop these factors $\phi$ from our notation in order to avoid cluttered expressions. In particular, we use $\Lambda \le M^{-\gamma/4}$ throughout the proof.

We begin by estimating $\Lambda_o$ and $\Lambda_d$ in terms of $\Theta$. Recalling (6.19), we find that $\phi$ satisfies the hypotheses of Lemma 8.5, from which we get
\[
\Lambda_o + |\Upsilon_i| \prec r(\Lambda) , \qquad r(\Lambda) := \sqrt{\frac{\operatorname{Im} m_{sc} + \Lambda}{M\eta}} . \tag{9.4}
\]
In order to estimate $\Lambda_d$, from (8.17) we have, on the event $\{\phi = 1\}$, that
\[
v_i - m_{sc}^2 \sum_k s_{ik} v_k = O_\prec\big( \Lambda^2 + r(\Lambda) \big) ; \tag{9.5}
\]
here we used the bound (9.4) on $|\Upsilon_i|$. Next, we subtract the average $N^{-1}\sum_i$ from each side to get
\[
(v_i - [v]) - m_{sc}^2 \sum_k s_{ik} (v_k - [v]) = O_\prec\big( \Lambda^2 + r(\Lambda) \big) ,
\]
where we used $\sum_k s_{ik} = 1$. Note that the average over $i$ of the left-hand side vanishes, so that the average of the right-hand side also vanishes. Hence the right-hand side is perpendicular to $\mathbf e$. Inverting the operator $1 - m_{sc}^2 S$ on the subspace $\mathbf e^\perp$ and using $\widetilde\Gamma \le C$ therefore yields
\[
\big| v_i - [v] \big| \prec \Lambda^2 + r(\Lambda) . \tag{9.6}
\]


Combining with the bound $\Lambda_o \prec r(\Lambda)$ from (9.4) and recalling that $\Theta = |[v]|$, we therefore get
\[
\Lambda \prec \Theta + \Lambda^2 + r(\Lambda) . \tag{9.7}
\]
By definition of $\phi$ we have $\Lambda^2 \le M^{-\gamma/4} \Lambda$, so that the second term on the right-hand side of (9.7) may be absorbed into the left-hand side:
\[
\Lambda \prec \Theta + r(\Lambda) , \tag{9.8}
\]
where we used the cancellation property (6.25). Using (9.8) and the Cauchy-Schwarz inequality, we get
\[
r(\Lambda) \le \sqrt{\frac{\operatorname{Im} m_{sc} + \Lambda}{M\eta}} \prec \sqrt{\frac{\operatorname{Im} m_{sc} + \Theta}{M\eta}} + \sqrt{\frac{r(\Lambda)}{M\eta}} \le \sqrt{\frac{\operatorname{Im} m_{sc} + \Theta}{M\eta}} + M^{-\varepsilon} r(\Lambda) + \frac{M^\varepsilon}{M\eta}
\]
for any $\varepsilon > 0$. Since $\varepsilon > 0$ can be arbitrarily small, we conclude that
\[
r(\Lambda) \prec q(\Theta) . \tag{9.9}
\]
Clearly, (9.3) follows from (9.8) and (9.9).

Step 2. Having estimated $\Lambda$ in terms of $\Theta$, we can derive a closed relation involving only $\Theta$. More precisely, we will show that the following self-consistent equation holds:
\[
\phi \Big( (1 - m_{sc}^2) [v] - m_{sc}^{-1} [v]^2 \Big) = \phi \, O_\prec\big( q(\Theta)^2 + M^{-\gamma/4} \Theta^2 \big) . \tag{9.10}
\]
Considering the right-hand side as a small error, we will view (9.10) as a small perturbation of a quadratic equation for $[v]$. Recalling that $\Theta = |[v]|$, the error term can be determined self-consistently. Thus, up to the accuracy of the error terms, we can solve (9.10) for $[v]$.

We now give a heuristic proof of (9.10). There are two main issues which we ignore in this sketch. First, we work only on the event $\{\phi = 1\}$, i.e., we assume that an a priori bound $\Lambda \ll 1$ has already been proved uniformly for all $\eta \ge \eta_E$. It requires a separate argument to show that the complementary event $\{\phi = 0\}$ is negligible, and in some sense this constitutes the essential part of extending the proof of Theorem 6.7 from $\eta \ge \widetilde\eta_E$ to the larger regime $\eta \ge \eta_E$. Second, we will neglect certain subtleties of the $\prec$ relation. While most arithmetic involving $\prec$ works in the same way as the usual inequality relation $\le$, the cancellation rule (6.25) requires the coefficient of $X$ on the right-hand side to be small. We will disregard this requirement below and treat $\prec$ as the usual $\le$.

Since we work on the event $\{\phi = 1\}$, it is understood that every quantity below is multiplied by $\phi$, but we will suppress this fact in the formulas. Recall from (8.17) that
\[
v_i - m_{sc}^2 \sum_k s_{ik} v_k + m_{sc}^2 \Upsilon_i = m_{sc}^{-1} v_i^2 + O(\Lambda^3) . \tag{9.11}
\]
In order to take the average over $i$ and get a closed equation for $[v]$, we write, using (9.6),
\[
v_i^2 = \big( [v] + v_i - [v] \big)^2 = [v]^2 + 2 [v] (v_i - [v]) + O_\prec\Big( \big( \Lambda^2 + r(\Lambda) \big)^2 \Big) .
\]
Plugging this back into (9.11) and taking the average over $i$ gives
\[
(1 - m_{sc}^2) [v] + m_{sc}^2 [\Upsilon] = m_{sc}^{-1} [v]^2 + O_\prec\big( \Lambda^3 + r(\Lambda)^2 \big) .
\]
By Corollary 8.10 with $t_{ik} = \frac{1}{N}$, we can estimate
\[
[\Upsilon] \prec \Lambda_o^2 \prec r(\Lambda)^2 \prec q(\Theta)^2 , \tag{9.12}
\]


where we have used (8.24) and (9.9). Hence we have
\[
(1 - m_{sc}^2) [v] - m_{sc}^{-1} [v]^2 = O_\prec\big( \Lambda^3 + r(\Lambda)^2 \big) \le O_\prec\big( \Theta^3 + q(\Theta)^2 \big) ,
\]
where we have used (9.3). This shows (9.10).

Step 3. Finally, we need to solve the approximately quadratic relation (9.10) for $[v]$, keeping in mind that $\Theta = |[v]|$. This will give the main estimates (6.32) and (6.33) in Theorem 6.7. The entry-wise version of the local semicircle law, (6.31), will then follow directly from (9.3) on the event $\{\phi = 1\}$.

Recall the definition of $q$ from (9.2), so that
\[
q(\Theta)^2 \asymp \frac{\operatorname{Im} m_{sc} + \Theta}{M\eta} + \frac{1}{(M\eta)^2} , \tag{9.13}
\]
and recall from Lemma 6.2 that
\[
c \le |m_{sc}(z)| \le 1 - c\eta , \qquad |1 - m_{sc}^2(z)| \asymp \sqrt{\kappa + \eta} , \qquad \operatorname{Im} m_{sc}(z) \asymp
\begin{cases}
\sqrt{\kappa + \eta} & \text{if } |E| \le 2 , \\[2pt]
\dfrac{\eta}{\sqrt{\kappa + \eta}} & \text{if } |E| \ge 2 .
\end{cases} \tag{9.14}
\]
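For concreteness, the Stieltjes transform of the semicircle law is given explicitly by $m_{sc}(z) = \big(-z + \sqrt{z^2 - 4}\big)/2$ with the branch $\operatorname{Im} m_{sc}(z) > 0$ in the upper half plane. The following sketch verifies the qualitative behavior in (9.14) numerically; the comparison constants $0.1$ and $10$ are illustrative, not sharp.

```python
import cmath

def m_sc(z):
    # Stieltjes transform of the semicircle law: the root of
    # m^2 + z*m + 1 = 0 with Im m(z) > 0 in the upper half plane.
    s = cmath.sqrt(z * z - 4)
    m = (-z + s) / 2
    return m if m.imag >= 0 else (-z - s) / 2

eta = 0.01
checks = []
for E in (0.0, 1.0, 1.9, 2.5, 3.0):
    z = complex(E, eta)
    m = m_sc(z)
    kappa = abs(abs(E) - 2)
    ok = abs(m + 1 / (z + m)) < 1e-10             # self-consistency m = -1/(z+m)
    ok &= 0.3 < abs(m) < 1.0                      # c <= |m_sc| <= 1 - c*eta
    ok &= 0.1 < abs(1 - m * m) / (kappa + eta) ** 0.5 < 10.0
    # Im m_sc ~ sqrt(kappa+eta) inside [-2,2], ~ eta/sqrt(kappa+eta) outside
    target = (kappa + eta) ** 0.5 if abs(E) <= 2 else eta / (kappa + eta) ** 0.5
    ok &= 0.1 < m.imag / target < 10.0
    checks.append(ok)
print(all(checks))
```

The branch choice by the sign of the imaginary part avoids tracking the branch cut of the principal square root across $E = 0$.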

In particular, the coefficient of $\Theta^2 = [v]^2$ on the left-hand side of (9.10) is a nonvanishing constant, while on the right-hand side it is at most $M^{-\gamma/4} \ll 1$, so the latter can be absorbed into the former. We can thus rewrite (9.10) as
\[
[v]^2 - B [v] = D , \tag{9.15}
\]
where
\[
B = \frac{1 - m_{sc}^2 + O\big( (M\eta)^{-1} \big)}{m_{sc}^{-1} + O(M^{-\gamma/4})} \asymp 1 - m_{sc}^2 + O\big( (M\eta)^{-1} \big) , \qquad
|D| \le \frac{\dfrac{\operatorname{Im} m_{sc}}{M\eta} + \dfrac{1}{(M\eta)^2}}{m_{sc}^{-1} + O(M^{-\gamma/4})} \asymp \frac{\operatorname{Im} m_{sc}}{M\eta} + \frac{1}{(M\eta)^2} .
\]

This quadratic equation has the two solutions
\[
[v] = \frac{B \pm \sqrt{B^2 + 4D}}{2} ; \tag{9.16}
\]
the correct one is selected by a continuity argument. Fix an energy $|E| \le 2$, consider $[v] = [v(E + \mathrm i\eta)]$ as a function of $\eta$, and look at the two solution branches (9.16) as continuous functions of $\eta$.

It is easy to check that for large $\eta \asymp 1$ we have $|B| \asymp 1$ and $|D| \ll 1$, so the two solutions are well separated: one of them is close to $0$, the other one close to $B$. The a priori bound $|[v]| \le \eta^{-1} \ll 1$ guarantees that the correct branch is the one near $0$. A more careful analysis of (9.16) with the given coefficient functions $B(\eta)$ and $D(\eta)$ shows that on this branch the following inequality holds:
\[
\Theta = |[v]| \lesssim (M\eta)^{-1} . \tag{9.17}
\]

To see this more precisely, we distinguish two cases as we reduce $\eta$ from a very large value towards $0$. In the first regime, where $M\eta \sqrt{\kappa + \eta} \gg 1$, we have $|B| \asymp \sqrt{\kappa + \eta}$ and $|D| \lesssim \sqrt{\kappa + \eta}/(M\eta) \ll |B|^2$, so the two branches remain separated and the relevant branch satisfies $\Theta \lesssim |D/B| \lesssim (M\eta)^{-1}$. As we decrease $\eta$ and enter the regime $M\eta \sqrt{\kappa + \eta} \lesssim 1$, both solutions satisfy $\Theta \lesssim |B| + \sqrt{|D|}$. In this regime $|B| \lesssim (M\eta)^{-1}$ and $|D| \lesssim (M\eta)^{-2}$, so we again obtain (9.17). In both cases we conclude the final bound (6.32). The proof of (6.33) for the regime $|E| \ge 2$ is similar; the improvement comes from the fact that in this regime (9.14) gives a better bound on $\operatorname{Im} m_{sc}(z)$.
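The branch selection by continuity can be mimicked numerically. The model coefficients below only reproduce the orders of magnitude $|B| \asymp \sqrt{\kappa+\eta}$ and $|D| \asymp \operatorname{Im} m_{sc}/(M\eta) + (M\eta)^{-2}$ in the bulk; the values of $M$, $\kappa$, and the constant $10$ are hypothetical.

```python
import math

# Model coefficients: |B| ~ sqrt(kappa+eta) and
# |D| ~ Im m_sc/(M eta) + (M eta)^{-2}, with Im m_sc ~ sqrt(kappa+eta).
M, kappa = 1.0e6, 0.01

def coefficients(eta):
    b = math.sqrt(kappa + eta)
    d = math.sqrt(kappa + eta) / (M * eta) + 1.0 / (M * eta) ** 2
    return b, d

# Track the branch of [v]^2 - B[v] = D that is near 0 for large eta,
# following it by continuity as eta decreases; record |[v]| * M * eta.
v, ratios = 0.0, []
for k in range(200):
    eta = 0.94 ** k                 # decrease eta geometrically
    b, d = coefficients(eta)
    disc = math.sqrt(b * b + 4 * d)
    v = min((b + disc) / 2, (b - disc) / 2, key=lambda r: abs(r - v))
    ratios.append(abs(v) * M * eta)
worst = max(ratios)
print(worst)
```

The recorded ratios stay of order one, illustrating the bound $\Theta \lesssim (M\eta)^{-1}$ of (9.17) along the selected branch.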


10 Fluctuation averaging mechanism

10.1 Intuition behind the fluctuation averaging

Before starting the proof of Lemma 8.9, we explain why it is so important in our context. We also give a heuristic second moment calculation to exhibit the essential mechanism.

The leading error in the self-consistent equation (8.58) for $v_i$ is $\Upsilon_i$. Among the three summands of $\Upsilon_i$, see (8.11), typically $Z_i$ is the largest, thus we typically have
\[
v_i \prec Z_i \tag{10.1}
\]
in the regime where $\Gamma$ is bounded. The large deviation bounds in Theorem 7.7 show that $Z_i \prec \Lambda_o$, and a simple second moment calculation shows that this bound is essentially optimal. On the other hand, (8.3) shows that the typical size of the off-diagonal resolvent matrix elements is at least of order $(N\eta)^{-1/2}$, thus the estimate
\[
Z_i \prec \Lambda_o \prec \frac{1}{\sqrt{N\eta}}
\]
is essentially optimal in the generalized Wigner case ($M \asymp N$). Together with (10.1), this shows that the natural bound for $\Lambda$ is $(N\eta)^{-1/2}$, which is also reflected in the bound (6.31).

However, the bound (6.32) for the average, $m_N - m_{sc} = [v]$, is of order $\Lambda^2 \asymp (N\eta)^{-1}$, i.e., it is one order better than the bound for $v_i$. For the purpose of estimating $[v]$, it is the average $[\Upsilon]$ of the leading errors $\Upsilon_i$ that matters, see (8.59). Since $Z_i$, the leading term in $\Upsilon_i$, is a fluctuating quantity with zero expectation, the improvement comes from the fact that fluctuations cancel out in the average. The basic example of this phenomenon is the central limit theorem: the average of $N$ independent centered random variables is typically of order $N^{-1/2}$, much smaller than the size of each individual variable. In our case, however, the $Z_i$ are not independent. In fact, their correlations do not decay; at least in the Wigner case, each column of $H$ with an index other than $i$ plays the same role in the function $Z_i$. Thus standard central limit theorems for weakly correlated random variables do not apply here, and a new argument is needed. We first illustrate how the fluctuation averaging mechanism works by a second moment calculation.

Second moment calculation. First we claim that
\[
\Big| Q_k \frac{1}{G_{kk}} \Big| \prec \Psi_o . \tag{10.2}
\]
Indeed, from the Schur complement formula (8.7) we get $|Q_k (G_{kk})^{-1}| \le |h_{kk}| + |Z_k|$. The first term is estimated by $|h_{kk}| \prec M^{-1/2} \le \Psi_o$. The second term is estimated exactly as in (8.29) and (8.30), giving $|Z_k| \prec \Psi_o$. In fact, the same bound holds if $G_{kk}$ is replaced with $G^{(T)}_{kk}$, as long as the cardinality of $T$ is bounded, see (10.10) later.

Abbreviate $X_k := Q_k (G_{kk})^{-1}$ and compute the variance
\[
\mathbb E \Big| \sum_k t_{ik} X_k \Big|^2 = \sum_{k,l} t_{ik} t_{il}\, \mathbb E\, X_k \overline{X_l} = \sum_k t_{ik}^2\, \mathbb E\, X_k \overline{X_k} + \sum_{k \ne l} t_{ik} t_{il}\, \mathbb E\, X_k \overline{X_l} . \tag{10.3}
\]
Using the bounds (8.45) on $t_{ik}$ and (10.2), we find that the first term on the right-hand side of (10.3) is $O_\prec(M^{-1} \Psi_o^2) = O_\prec(\Psi_o^4)$, where we used that $\Psi_o$ is admissible, recalling (8.20). Let us therefore focus on the second term of (10.3). Using the fact that $k \ne l$, we apply the first resolvent decoupling formula (8.1) to $X_k$ and $X_l$ to get
\[
\mathbb E\, X_k \overline{X_l} = \mathbb E\, Q_k \Big( \frac{1}{G_{kk}} \Big)\, Q_l \Big( \frac{1}{\overline{G_{ll}}} \Big)
= \mathbb E\, Q_k \bigg( \frac{1}{G^{(l)}_{kk}} - \frac{G_{kl} G_{lk}}{G_{kk} G^{(l)}_{kk} G_{ll}} \bigg)\, Q_l \bigg( \frac{1}{\overline{G^{(k)}_{ll}}} - \frac{\overline{G_{lk}}\, \overline{G_{kl}}}{\overline{G_{ll}}\, \overline{G^{(k)}_{ll}}\, \overline{G_{kk}}} \bigg) . \tag{10.4}
\]
Notice that we used (8.1) in the form
\[
\frac{1}{G^{(T)}_{kk}} = \frac{1}{G^{(Tl)}_{kk}} - \frac{G^{(T)}_{kl} G^{(T)}_{lk}}{G^{(T)}_{kk} G^{(Tl)}_{kk} G^{(T)}_{ll}} \qquad \text{for any } k \ne l , \ k, l \notin T , \tag{10.5}
\]


with $T = \emptyset$. We multiply out the parentheses on the right-hand side of (10.4). The crucial observation is that if the random variable $Y$ is independent of $i$ (see Definition 7.3), then not only $Q_i Y = 0$ but also $\mathbb E\, Q_i(X) Y = \mathbb E\, Q_i(XY) = 0$ holds for any $X$. Hence, out of the four terms obtained from the right-hand side of (10.4), the only nonvanishing one is
\[
\mathbb E\, Q_k \bigg( \frac{G_{kl} G_{lk}}{G_{kk} G^{(l)}_{kk} G_{ll}} \bigg)\, Q_l \bigg( \frac{\overline{G_{lk}}\, \overline{G_{kl}}}{\overline{G_{ll}}\, \overline{G^{(k)}_{ll}}\, \overline{G_{kk}}} \bigg) \prec \Psi_o^4 ,
\]
where we used that the denominators are harmless, see (8.27)–(8.28). Together with (8.45), this concludes the proof of $\mathbb E \big| \sum_k t_{ik} X_k \big|^2 \prec \Psi_o^4$, which means that $\sum_k t_{ik} X_k$ is bounded by $\Psi_o^2$ in second moment sense.
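The cancellation mechanism in (10.3) can be tested in the simplest toy setting, where the $X_k$ are replaced by independent centered variables, so that the cross terms $\mathbb E\, X_k \overline{X_l}$ vanish exactly (for the true $X_k = Q_k (G_{kk})^{-1}$ they are nonzero but of higher order $\Psi_o^4$):

```python
import random

random.seed(0)
N, trials = 400, 2000

# Independent centered toy variables in place of X_k = Q_k (G_kk)^{-1}:
# the cross terms E[X_k X_l] in (10.3) then vanish exactly, and the
# average of N variables of unit size has second moment ~ 1/N.
single2 = 0.0   # Monte Carlo estimate of E|X_k|^2
avg2 = 0.0      # Monte Carlo estimate of E|N^{-1} sum_k X_k|^2
for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(N)]
    single2 += xs[0] ** 2 / trials
    avg2 += (sum(xs) / N) ** 2 / trials
print(single2, N * avg2)   # both close to 1: the average gains a factor N^{-1/2}
```

This is exactly the central limit mechanism invoked in Section 10.1; the point of Lemma 8.9 is that a comparable gain survives for the weakly dependent resolvent quantities.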

10.2 Proof of Lemma 8.9

Here we follow the presentation of the proof from [55]; we will comment on previous results in Section 10.3.1. We start with a simple lemma which summarizes the key properties of $\prec$ when combined with partial expectation.

Lemma 10.1. Suppose that the deterministic control parameter $\Psi$ satisfies $\Psi \ge N^{-C}$, and that for all $p$ there is a constant $C_p$ such that the nonnegative random variable $X$ satisfies $\mathbb E X^p \le N^{C_p}$. Suppose moreover that $X \prec \Psi$. Then for any fixed $n \in \mathbb N$ we have
\[
\mathbb E X^n \prec \Psi^n . \tag{10.6}
\]
(Note that this estimate involves deterministic quantities only, i.e., it means that $\mathbb E X^n \le N^\varepsilon \Psi^n$ for any $\varepsilon > 0$ if $N \ge N_0(n, \varepsilon)$.) Conversely, if (10.6) holds for any $n \in \mathbb N$, then $X \prec \Psi$. Moreover, we have
\[
P_i X^n \prec \Psi^n , \qquad Q_i X^n \prec \Psi^n \tag{10.7}
\]
uniformly in $i$. If $X = X(u)$ and $\Psi = \Psi(u)$ depend on some parameter $u$ and the above assumptions are uniform in $u$, then so are the conclusions.

This lemma is a generalization of (6.26), but notice that the majorant $\Psi$ has to be deterministic. For general random variables, $X \prec Y$ need not imply the analogous relation $\mathbb E[X \mid \mathcal F] \prec \mathbb E[Y \mid \mathcal F]$ for conditional expectations with respect to a $\sigma$-algebra $\mathcal F$.

Proof of Lemma 10.1. To prove (10.6), it is enough to consider the case $n = 1$; the case of larger $n$ follows immediately from the case $n = 1$, using the basic property of stochastic domination that $X \prec \Psi$ implies $X^n \prec \Psi^n$.

For the first claim, pick $\varepsilon > 0$. Then
\[
\mathbb E X = \mathbb E X \cdot \mathbf 1(X \le N^\varepsilon \Psi) + \mathbb E X \cdot \mathbf 1(X > N^\varepsilon \Psi) \le N^\varepsilon \Psi + \sqrt{\mathbb E X^2} \sqrt{\mathbb P(X > N^\varepsilon \Psi)} \le N^\varepsilon \Psi + N^{C_2/2 - D/2}
\]
for arbitrary $D > 0$. The first claim therefore follows by choosing $D$ large enough. For the converse statement, we use the Chebyshev inequality: for any $\varepsilon > 0$ and $D > 0$,
\[
\mathbb P(X > N^\varepsilon \Psi) \le \frac{\mathbb E X^n}{N^{\varepsilon n} \Psi^n} \le N^{\varepsilon - \varepsilon n} \le N^{-D} ,
\]
by choosing $n$ large enough.

Finally, the claim (10.7) follows from Chebyshev's inequality, using a high-moment estimate combined with Jensen's inequality for partial expectation. We omit the details, which are similar to those of the first claim.

We remark that it is tempting to generalize the converse direction of (10.6) and to try to verify the stochastic domination $X \prec Y$ between two random variables by proving that $\mathbb E X^n \le \mathbb E Y^n$. This implication is wrong in general, but it is correct if $Y$ is deterministic. This subtlety is the reason why many of the following theorems about


two random variables $A$ and $B$ are formulated in the form "if $A$ satisfies $A \prec \Psi$ for some deterministic $\Psi$, then $B \prec \Psi$", instead of the more natural (but sometimes wrong) direct assertion $A \prec B$.

We shall apply Lemma 10.1 to the resolvent entries of G. In order to verify its assumptions, we recordthe following bounds.

Lemma 10.2. Suppose that $\Lambda \prec \Psi$ and $\Lambda_o \prec \Psi_o$ for some deterministic control parameters $\Psi$ and $\Psi_o$, both satisfying (8.20). Fix $p \in \mathbb N$. Then for any $i \ne j$ and $T \subset \{1, \dots, N\}$ satisfying $|T| \le p$ and $i, j \notin T$, we have
\[
G^{(T)}_{ij} = O_\prec(\Psi_o) , \qquad \frac{1}{G^{(T)}_{ii}} = O_\prec(1) . \tag{10.8}
\]
Moreover, we have the rough bounds
\[
\big| G^{(T)}_{ij} \big| \le M \qquad \text{and} \qquad \mathbb E \Bigg| \frac{1}{G^{(T)}_{ii}} \Bigg|^n \le N^\varepsilon \tag{10.9}
\]
for any $\varepsilon > 0$ and $N \ge N_0(n, \varepsilon)$.

Proof. The bounds (10.8) follow easily by a repeated application of (8.1), the assumption $\Lambda \prec M^{-c}$, and the lower bound in (6.11). The deterministic bound $\big| G^{(T)}_{ij} \big| \le M$ follows immediately from $\eta \ge M^{-1}$ by definition of a spectral domain.

In order to prove (10.9), we use the Schur complement formula (8.7) applied to $1/G^{(T)}_{ii}$, where the expectation is estimated using (5.6) and $\big| G^{(T)}_{ij} \big| \le M$. (Recall (6.4).) This gives
\[
\mathbb E \Bigg| \frac{1}{G^{(T)}_{ii}} \Bigg|^p \le N^{C_p}
\]
for all $p \in \mathbb N$. Since $1/G^{(T)}_{ii} \prec 1$, (10.9) therefore follows from (10.6).

We now start the proof of Lemma 8.9. The main statement is (8.47); the other bounds will be relatively simple consequences. Finally, we present an alternative argument for (8.47) that organizes the stopping rule in the expansion somewhat differently.

Proof of Lemma 8.9. Part I: Proof of (8.47). First we claim that, for any fixed $p \in \mathbb N$, we have
\[
\Bigg| Q_k \frac{1}{G^{(T)}_{kk}} \Bigg| \prec \Psi_o \tag{10.10}
\]
uniformly for $T \subset \{1, \dots, N\}$, $|T| \le p$, and $k \notin T$. To simplify notation, for the proof we set $T = \emptyset$; the proof for nonempty $T$ is the same. From the Schur complement formula (8.7) we get $|Q_k (G_{kk})^{-1}| \le |h_{kk}| + |Z_k|$. The first term is estimated by $|h_{kk}| \prec M^{-1/2} \le \Psi_o$. The second term is estimated exactly as in (8.29) and (8.30):
\[
|Z_k| \prec \Bigg( \sum_{x \ne y}^{(k)} s_{kx} \big| G^{(k)}_{xy} \big|^2 s_{yk} \Bigg)^{1/2} \prec \Psi_o ,
\]
where in the last step we used that $\big| G^{(k)}_{xy} \big| \prec \Psi_o$, as follows from (10.8), and the bound $1/|G_{kk}| \prec 1$ (recall that $\Lambda \prec \Psi \le M^{-c}$). This concludes the proof of (10.10).

Abbreviate $X_k := Q_k (G_{kk})^{-1}$. We shall estimate $\sum_k t_{ik} X_k$ in probability by estimating its $p$-th moment by $\Psi_o^{2p}$, from which the claim (8.47) will easily follow using Chebyshev's inequality.

After this preparation, the rest of the proof of (8.47) can be divided into four steps.


Step 1: Coincidence structure in the expansion of the $L^p$ norm. Fix some even integer $p$ and write
\[
\mathbb E \Big| \sum_k t_{ik} X_k \Big|^p = \sum_{k_1, \dots, k_p} t_{ik_1} \cdots t_{ik_{p/2}}\, \overline{t_{ik_{p/2+1}}} \cdots \overline{t_{ik_p}}\, \mathbb E\, X_{k_1} \cdots X_{k_{p/2}} \overline{X_{k_{p/2+1}}} \cdots \overline{X_{k_p}} .
\]
Next, we regroup the terms in the sum over $\mathbf k := (k_1, \dots, k_p)$ according to the coincidence structure in $\mathbf k$ as follows: given a sequence of indices $\mathbf k$, define the partition $P(\mathbf k)$ of $\{1, \dots, p\}$ by the equivalence relation $r \sim s$ if and only if $k_r = k_s$. Denote the set of all partitions of $\{1, \dots, p\}$ by $\mathfrak P_p$. Then we write
\[
\mathbb E \Big| \sum_k t_{ik} X_k \Big|^p = \sum_{P \in \mathfrak P_p} \sum_{\mathbf k} t_{ik_1} \cdots t_{ik_{p/2}}\, \overline{t_{ik_{p/2+1}}} \cdots \overline{t_{ik_p}}\, \mathbf 1\big( P(\mathbf k) = P \big)\, V(\mathbf k) , \tag{10.11}
\]
where we defined
\[
V(\mathbf k) := \mathbb E\, X_{k_1} \cdots X_{k_{p/2}} \overline{X_{k_{p/2+1}}} \cdots \overline{X_{k_p}} .
\]
Given a partition $P$, for any $r \in \{1, \dots, p\}$ we denote by $[r]$ the block of $r$ in $P$, i.e., the set of all labels in the same block of the partition as $r$. Let $L \equiv L(P) := \{ r : [r] = \{r\} \} \subset \{1, \dots, p\}$ be the set of lone labels. We denote by $\mathbf k_L := (k_r)_{r \in L}$ the summation indices associated with lone labels.
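The bookkeeping of the coincidence structure is elementary; a minimal sketch (with $0$-based labels instead of $1, \dots, p$):

```python
def partition(k):
    # Partition P(k) of the labels {0, ..., p-1} induced by coincidences:
    # r ~ s if and only if k[r] == k[s].
    blocks = {}
    for r, kr in enumerate(k):
        blocks.setdefault(kr, []).append(r)
    return sorted(blocks.values())

def lone_labels(P):
    # Lone labels: labels whose block in the partition P is a singleton.
    return [b[0] for b in P if len(b) == 1]

P = partition((3, 5, 3, 7))
print(P, lone_labels(P))   # [[0, 2], [1], [3]] [1, 3]
```

For instance, the index tuple $(3, 5, 3, 7)$ has one two-element block (labels $0$ and $2$ share the index $3$) and two lone labels.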

Step 2: Resolution of dependence in weakly dependent random variables. The resolvent entry $G_{kk}$ depends strongly on the randomness in the $k$-th column of $H$, but only weakly on the randomness in the other columns. We conclude that if $r$ is a lone label, then all factors $X_{k_s}$ with $s \ne r$ in $V(\mathbf k)$ depend weakly on the randomness in the $k_r$-th column of $H$ (if $r$ is not a lone label, then this statement holds only for all factors $X_{k_s}$ with $k_s \ne k_r$). Thus, the idea is to make all resolvent entries inside the expectation of $V(\mathbf k)$ as independent of the indices $\mathbf k_L$ as possible (see Definition 7.3), using the first resolvent decoupling identity (8.1): for $x, y, u \notin T$ and $x, y \ne u$ ($x$ may be equal to $y$),
\[
G^{(T)}_{xy} = G^{(Tu)}_{xy} + \frac{G^{(T)}_{xu} G^{(T)}_{uy}}{G^{(T)}_{uu}} , \tag{10.12}
\]
and for $x, u \notin T$ and $x \ne u$,
\[
\frac{1}{G^{(T)}_{xx}} = \frac{1}{G^{(Tu)}_{xx}} - \frac{G^{(T)}_{xu} G^{(T)}_{ux}}{G^{(T)}_{xx} G^{(Tu)}_{xx} G^{(T)}_{uu}} . \tag{10.13}
\]

Definition 10.3. A resolvent entry $G^{(T)}_{xy}$ with $x, y \notin T$ is maximally expanded with respect to a set $B \subset \{1, \dots, N\}$ if $B \subset T \cup \{x, y\}$.

Given the set $\mathbf k_L$ of lone indices, we say that a resolvent entry $G^{(T)}_{xy}$ is maximally expanded if it is maximally expanded with respect to the set $B = \mathbf k_L$. The motivation behind this definition is that, using (8.1), we cannot add upper indices from the set $\mathbf k_L$ to a maximally expanded resolvent entry. We shall apply (8.1) to all resolvent entries in $V(\mathbf k)$. In this manner we generate a sum of monomials consisting of off-diagonal resolvent entries and inverses of diagonal resolvent entries. We then repeatedly apply (8.1) to each factor until either all factors are maximally expanded or a sufficiently large number of off-diagonal resolvent entries has been generated. The cap on the number of off-diagonal entries is introduced to ensure that this procedure terminates after a finite number of steps.

In order to define the precise algorithm, let $\mathcal A$ denote the set of monomials in the off-diagonal entries $G^{(T)}_{xy}$, with $T \subset \mathbf k_L$, $x \ne y$, and $x, y \in \mathbf k \setminus T$, as well as in the inverse diagonal entries $1/G^{(T)}_{xx}$, with $T \subset \mathbf k_L$ and $x \in \mathbf k \setminus T$. Starting from $V(\mathbf k)$, the algorithm will recursively generate sums of monomials in $\mathcal A$. Let $d(A)$ denote the number of off-diagonal entries in $A \in \mathcal A$. For $A \in \mathcal A$ we shall define $w_0(A), w_1(A) \in \mathcal A$ satisfying
\[
A = w_0(A) + w_1(A) , \qquad d(w_0(A)) = d(A) , \qquad d(w_1(A)) \ge \max\{2, d(A) + 1\} . \tag{10.14}
\]
The idea behind this splitting is to use (8.1) on one entry of $A$; the first term on the right-hand side of (8.1) gives rise to $w_0(A)$ and the second to $w_1(A)$. The precise definition of the algorithm applied to $A \in \mathcal A$ is as follows.


(1) If all factors of $A$ are maximally expanded or $d(A) \ge p + 1$, then stop the expansion of $A$.

(2) Otherwise choose some (arbitrary) factor of $A$ that is not maximally expanded. If this entry is off-diagonal, $G^{(T)}_{xy}$, we use (10.12) to decompose it into a sum of two terms, with $u$ the smallest element in $\mathbf k_L \setminus (T \cup \{x, y\})$. If the chosen entry is diagonal, $1/G^{(T)}_{xx}$, we use (10.13) to decompose it into two terms, with $u$ the smallest element in $\mathbf k_L \setminus (T \cup \{x\})$. The choice of $u$ as the smallest element is not important; we fix it only for definiteness of the algorithm. From the splitting of the factor $G^{(T)}_{xy}$ or $1/G^{(T)}_{xx}$ in the monomial $A$, we obtain a splitting $A = w_0(A) + w_1(A)$ (i.e., we replace $G^{(T)}_{xy}$ or $1/G^{(T)}_{xx}$ by the right-hand side of (10.12) or (10.13)).

It is clear that (10.14) holds with the algorithm just defined. In fact, in most cases the last inequality in (10.14) is an equality, $d(w_1(A)) = \max\{2, d(A) + 1\}$. The only exception is when (10.12) is used with $x = y$; then two new off-diagonal entries are added, i.e., $d(w_1(A)) = d(A) + 2$. Notice also that this algorithm contains some arbitrariness in the choice of the factor of $A$ to be expanded, but any choice will work for the proof.

We now apply this algorithm recursively to each entry $A^r := 1/G_{k_r k_r}$ in the definition of $V(\mathbf k)$. More precisely, we start with $A^r$ and define $A^r_0 := w_0(A^r)$ and $A^r_1 := w_1(A^r)$. In the second step of the algorithm we define four monomials,
\[
A^r_{00} := w_0(A^r_0) , \qquad A^r_{01} := w_1(A^r_0) , \qquad A^r_{10} := w_0(A^r_1) , \qquad A^r_{11} := w_1(A^r_1) ,
\]
and so on, at each iteration performing the steps (1) and (2) on each new monomial independently of the others. Note that the lower indices are binary sequences that describe the recursive application of the operations $w_0$ and $w_1$. In this manner we generate a binary tree whose vertices are given by finite binary strings $\sigma$. The associated monomials satisfy $A^r_{\sigma i} := w_i(A^r_\sigma)$ for $i = 0, 1$, where $\sigma i$ denotes the binary string obtained by appending $i$ to the right end of $\sigma$. See Figure 2 for an illustration of the tree. Notice that any monomial obtained along this recursion is of the form
\[
\frac{G^{(T_1)}_{x_1 y_1} G^{(T_2)}_{x_2 y_2} \cdots}{G^{(T'_1)}_{z_1 z_1} G^{(T'_2)}_{z_2 z_2} \cdots} , \tag{10.15}
\]
where all factors in the numerator are off-diagonal entries ($x_i \ne y_i$) and all factors in the denominator are diagonal entries. For later usage, we shall refer to the sum
\[
|T_1| + |T_2| + \dots + |T'_1| + |T'_2| + \dots \tag{10.16}
\]
as the total number of upper indices in the monomial (10.15). (The absolute value denotes the cardinality of the set.)

We stop the recursion at a tree vertex whenever the associated monomial satisfies the stopping rule of step (1). In other words, the set of leaves of the tree is the set of binary strings $\sigma$ such that either all factors of $A^r_\sigma$ are maximally expanded or $d(A^r_\sigma) \ge p + 1$.

We list a few obvious facts about this algorithm:

(i) $d(A^r_\sigma) \le p + 1$ for any vertex $\sigma$ of the tree (by the stopping rule in (1)).

(ii) The number of ones in any $\sigma$ is at most $p$ (since each application of $w_1$ increases $d(\cdot)$ by at least one).

(iii) The number of resolvent entries (in the numerator and denominator) in $A^r_\sigma$ is bounded by $4p + 1$ (since each application of $w_1$ increases the number of resolvent entries by at most four, and the application of $w_0$ does not change this number).

(iv) The maximal total number of upper indices in $A^r_\sigma$, for any tree vertex $\sigma$, is $(4p+1)p$ (since $|T_i| \le p$ and $|T'_i| \le p$ in the formula (10.16)).

(v) $\sigma$ contains at most $(4p+1)p$ zeros (since each application of $w_0$ increases the total number of upper indices by one).


Figure 2: The binary tree generated by applying the algorithm (1)–(2) to a monomial $A^r$. Each vertex of the tree is indexed by a binary string $\sigma$ and encodes a monomial $A^r_\sigma$. An arrow towards the left represents the action of $w_0$, and an arrow towards the right the action of $w_1$. The monomial $A^r_{11}$ satisfies the assumptions of step (1), and hence its expansion is stopped, so that the tree vertex $11$ has no children.

(vi) The maximal length of the string $\sigma$ (i.e., the depth of the tree) is at most $(4p+1)p + p = 4p^2 + 2p$.

(vii) The number of leaves of a tree is bounded by $(Cp^2)^p$, since this number is bounded by $\sum_{k=0}^{p} \binom{4p^2+2p}{k} \le (Cp^2)^p$, where $k$ is the number of ones in the string encoding the leaf. Denoting by $\mathcal L_r$ the set of leaves of the binary tree generated from $A^r$, we have $|\mathcal L_r| \le (Cp^2)^p$. In particular, the algorithm terminates after at most $(Cp^2)^p$ steps, since each step generates a new leaf.
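The counting bound in (vii) can be checked directly; the constant $C = 5$ below is one admissible (illustrative) choice for small even $p$:

```python
from math import comb

def leaf_bound(p):
    # Upper bound on the number of leaves: binary strings of length at
    # most 4p^2 + 2p containing at most p ones, cf. facts (ii) and (vi).
    return sum(comb(4 * p * p + 2 * p, k) for k in range(p + 1))

checks = {p: leaf_bound(p) <= (5 * p * p) ** p for p in (2, 4, 6, 8)}
print(checks)
```

For example, $p = 2$ gives $\binom{20}{0} + \binom{20}{1} + \binom{20}{2} = 211 \le (5 \cdot 4)^2 = 400$.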

By definition of the tree and of $w_0$ and $w_1$, we have the following resolution of dependence decomposition:
\[
X_{k_r} = Q_{k_r} \sum_{\sigma \in \mathcal L_r} A^r_\sigma , \tag{10.17}
\]
where each monomial $A^r_\sigma$ for $\sigma \in \mathcal L_r$ either consists entirely of maximally expanded resolvent entries or satisfies $d(A^r_\sigma) = p + 1$. (This is an immediate consequence of the stopping rule in (1).) Using (10.11) and (10.17), we have the representation
\[
V(\mathbf k) = \sum_{\sigma_1 \in \mathcal L_1} \cdots \sum_{\sigma_p \in \mathcal L_p} \mathbb E \big( Q_{k_1} A^1_{\sigma_1} \big) \cdots \big( Q_{k_p} A^p_{\sigma_p} \big) . \tag{10.18}
\]

Step 3: Bounding the individual terms in the expansion (10.18). We now claim that any nonzero term on the right-hand side of (10.18) satisfies
\[
\mathbb E \big( Q_{k_1} A^1_{\sigma_1} \big) \cdots \big( Q_{k_p} A^p_{\sigma_p} \big) = O_\prec\big( \Psi_o^{p + |L|} \big) . \tag{10.19}
\]

Proof of (10.19). Before embarking on the proof, we explain its idea. First notice that for any string $\sigma$ we have
\[
A^k_\sigma = O_\prec\big( \Psi_o^{b(\sigma) + 1} \big) , \tag{10.20}
\]
where $b(\sigma)$ is the number of ones in the string $\sigma$. Indeed, if $b(\sigma) = 0$ then this follows from (10.10); if $b(\sigma) \ge 1$, it follows from the last statement in (10.14), which guarantees that every one in the string $\sigma$ increases the exponent of $\Psi_o$ by at least one (we also use (10.8)). In particular, each $A^k_\sigma$ is bounded by at least one factor of $\Psi_o$.

If we used only the trivial bound $A^k_\sigma = O_\prec(\Psi_o)$ for each factor in (10.19), i.e., if we did not exploit the gain from the $w_1(A)$-type terms in the expansion, then the naive size of the left-hand side of (10.19) would only be $\Psi_o^p$. The key observation behind (10.19) is that each lone label $s \in L$ yields one extra factor $\Psi_o$ in the estimate. This is because the expectation in (10.18) would vanish if all other factors $\big( Q_{k_r} A^r_{\sigma_r} \big)$, $r \ne s$,


were independent of $k_s$. The expansion of the binary tree makes this dependence explicit by exhibiting $k_s$ as a lower index. But this requires performing an operation $w_1$ with the choice $u = k_s$ in (10.12) or (10.13). However, $w_1$ increases the number of off-diagonal entries by at least one. In other words, every index associated with a lone label must have a "partner" index in a different resolvent entry, which arose by an application of $w_1$. Such a partner index may only be obtained through the creation of at least one off-diagonal resolvent entry. The actual proof below shows that this effect applies cumulatively for all lone labels.

In order to give the rigorous proof of (10.19), we consider two cases. Consider first the case where for some $r = 1, \dots, p$ the monomial $A^r_{\sigma_r}$ on the left-hand side of (10.19) is not maximally expanded. Then $d(A^r_{\sigma_r}) = p + 1$, so that (10.8) yields $A^r_{\sigma_r} \prec \Psi_o^{p+1}$. Therefore the observation that $A^s_{\sigma_s} \prec \Psi_o$ for all $s \ne r$, together with (10.7), implies that the left-hand side of (10.19) is $O_\prec(\Psi_o^{2p})$. Since $|L| \le p$, (10.19) follows.

Consider now the case where $A^r_{\sigma_r}$ on the left-hand side of (10.19) is maximally expanded for every $r = 1, \dots, p$. The key observation is the following claim about any term on the left-hand side of (10.19) with nonzero expectation.

($\ast$) For each $s \in L$ there exists $r := \tau(s) \in \{1, \dots, p\} \setminus \{s\}$ such that the monomial $A^r_{\sigma_r}$ contains a resolvent entry with lower index $k_s$.

To prove (∗), suppose by contradiction that there exists an s ∈ L such that for all r ∈ 1, . . . , p\s thelower index ks does not appear in the monomial Arσr . To simplify notation, we assume that s = 1. Then,for all r = 2, . . . , p, since Arσr is maximally expanded, we find that Arσr is independent of k1. Therefore, wehave

\[ \mathbb{E}\big(Q_{k_1}A^1_{\sigma_1}\big)\big(Q_{k_2}A^2_{\sigma_2}\big)\cdots\big(Q_{k_p}A^p_{\sigma_p}\big) = \mathbb{E}\, Q_{k_1}\Big(A^1_{\sigma_1}\big(Q_{k_2}A^2_{\sigma_2}\big)\cdots\big(Q_{k_p}A^p_{\sigma_p}\big)\Big) = 0\,, \]

where in the last step we used that $\mathbb{E}\, Q_i(X)\,Y = \mathbb{E}\, Q_i(XY) = 0$ if $Y$ is independent of $i$. This concludes the proof of (∗).

The statement (∗) can be reformulated as asserting that, after the expansion, every lone label $s$ has a "partner" label $r = \tau(s)$ such that the index $k_s$ also appears as a lower index in the expansion of $A^r$ (there may be several such partner labels $r$; we may choose $\tau(s)$ to be any one of them).

For $r \in \{1, \dots, p\}$ we define $\ell(r) := \sum_{s \in L} \mathbf{1}(\tau(s) = r)$, the number of times that the label $r$ was chosen as a partner to some lone label $s$. We now claim that
\[ A^r_{\sigma_r} = O_\prec\big(\Psi_o^{1+\ell(r)}\big). \tag{10.21} \]

To prove (10.21), fix $r \in \{1, \dots, p\}$. By definition, for each $s \in \tau^{-1}(r)$ the index $k_s$ appears as a lower index in the monomial $A^r_{\sigma_r}$. Since $s \in L$ is by definition a lone label and $s \neq r$, we know that $k_s$ does not appear as an index in $A^r = 1/G_{k_r k_r}$, i.e., $k_r \neq k_s$. By the definition of the monomials associated with the tree vertex $\sigma_r$, it follows that $b(\sigma_r)$, the number of ones in $\sigma_r$, is at least $|\tau^{-1}(r)| = \ell(r)$, since each application of $w_1$ adds precisely one new lower index while an application of $w_0$ leaves the lower indices unchanged. Note that in this step it is crucial that $s \in \tau^{-1}(r)$ was a lone label. Recalling (10.20), we therefore get (10.21).

Using (10.21) and Lemma 10.1 we find
\[ \Big| \big(Q_{k_1}A^1_{\sigma_1}\big)\cdots\big(Q_{k_p}A^p_{\sigma_p}\big) \Big| \prec \prod_{r=1}^{p} \Psi_o^{1+\ell(r)} = \Psi_o^{p+|L|}\,. \]

This concludes the proof of (10.19).

Summing over the binary trees in (10.18) and using Lemma 10.1, we get from (10.19)

\[ V(\mathbf{k}) = O_\prec\big(\Psi_o^{p+|L|}\big). \tag{10.22} \]

Step 4. Summing over the expansion. We now return to the sum (10.11). We perform the summation by first fixing $P \in \mathcal{P}_p$, with associated lone labels $L = L(P)$. Using $|t_{ij}| \leq M^{-1}$ from (8.45), we have
\[ \bigg| \sum_{\mathbf{k}} \mathbf{1}\big(\mathcal{P}(\mathbf{k}) = P\big)\, t_{ik_1} \cdots t_{ik_{p/2}}\, \bar t_{ik_{p/2+1}} \cdots \bar t_{ik_p} \bigg| \leq M^{-p} \bigg| \sum_{\mathbf{k}} \mathbf{1}\big(\mathcal{P}(\mathbf{k}) = P\big) \bigg|\,. \]


Now the number of $\mathbf{k}$ with $|L|$ lone labels can easily be bounded by $M^{|L| + (p-|L|)/2}$, since each block of $P$ that is not contained in $L$ consists of at least two labels. Thus we can bound the last displayed quantity by
\[ \bigg| \sum_{\mathbf{k}} \mathbf{1}\big(\mathcal{P}(\mathbf{k}) = P\big)\, t_{ik_1} \cdots t_{ik_{p/2}}\, \bar t_{ik_{p/2+1}} \cdots \bar t_{ik_p} \bigg| \leq M^{-p} M^{|L| + (p-|L|)/2} = \big(M^{-1/2}\big)^{p - |L|}\,. \]

From (10.11) and (10.22) we get

\[ \mathbb{E}\bigg| \sum_k t_{ik} X_k \bigg|^p \prec \sum_{P \in \mathcal{P}_p} \big(M^{-1/2}\big)^{p-|L(P)|}\, \Psi_o^{p+|L(P)|} \leq C_p \Psi_o^{2p}\,, \]
where in the last step we used the lower bound from (8.20) and estimated the summation over $\mathcal{P}_p$ by a constant $C_p$ (which is bounded by $(Cp^2)^p$). Summarizing, we have proved that

\[ \mathbb{E}\bigg| \sum_k t_{ik} X_k \bigg|^p \prec \Psi_o^{2p} \tag{10.23} \]

for any $p \in 2\mathbb{N}$. We conclude the proof of (8.47) with a simple application of Chebyshev's inequality. Fix $\varepsilon > 0$ and $D > 0$. Using (10.23) and Chebyshev's inequality we find
\[ \mathbb{P}\bigg( \bigg| \sum_k t_{ik} X_k \bigg| > N^{\varepsilon} \Psi_o^2 \bigg) \leq N\, N^{-\varepsilon p} \]

for large enough $N \geq N_0(\varepsilon, p)$. Choosing $p \geq \varepsilon^{-1}(1 + D)$, we conclude the proof of (8.47).
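The conversion of a high-moment bound into a high-probability bound used here is elementary Markov inequality. A small numerical illustration (ours, not from the text, with a standard Gaussian playing the role of the averaged quantity):

```python
import numpy as np

# Markov's inequality for the p-th moment: P(|X| > t) <= E|X|^p / t^p.
# For X standard Gaussian, p = 4, t = 3: E X^4 = 3, so P(|X| > 3) <= 3/81.
rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)

p, t = 4, 3.0
moment = np.mean(np.abs(x) ** p)      # ~ 3 for a standard Gaussian
empirical = np.mean(np.abs(x) > t)    # much smaller than the bound
bound = moment / t ** p               # ~ 0.037

assert empirical <= bound
```

In the proof, the role of $t$ is played by $N^\varepsilon \Psi_o^2$, and taking $p$ large makes $N\,N^{-\varepsilon p}$ smaller than any fixed power $N^{-D}$.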

Remark 10.4. The first decoupling formula (8.1) is the only identity about the entries of $G$ that is needed in the proof of Lemma 8.9. In particular, the second decoupling formula (8.2) is never used, and the actual entries of $H$ never appear in the argument.

Part II. Proof of (8.46). The bound (8.46) may be proved by following the proof of (8.47) verbatim; the only modification is the bound
\[ \big| Q_k G^{(T)}_{kk} \big| = \big| Q_k\big(G^{(T)}_{kk} - m\big) \big| \prec \Psi\,, \]
which replaces (10.10). Here we again use the same upper bound $\Psi_o = \Psi$ for $\Lambda$ and $\Lambda_o$.

Part III. Proof of (8.48) and (8.50). In order to prove (8.48), we note that the last term in (8.17) is of order $O_\prec(\Psi^2)$. Hence we have
\[ w_a := \sum_i t_{ai} v_i = m_{sc}^2 \sum_{i,k} t_{ai} s_{ik} v_k - m_{sc}^2 \sum_i t_{ai} \Upsilon_i + O_\prec(\Psi^2) = m_{sc}^2 \sum_{i,k} s_{ai} t_{ik} v_k + O_\prec(\Psi^2)\,, \]

where in the last step we used Corollary 8.10 (recall that the proof of this corollary used only (8.47), which was already proved in Part I) to bound $\sum_i t_{ai} \Upsilon_i$ by $O_\prec(\Psi^2)$, and that the matrices $T$ and $S$ commute by assumption. Introducing the vector $\mathbf{w} = (w_a)_{a=1}^N$ we therefore have the equation

\[ \mathbf{w} = m_{sc}^2 S \mathbf{w} + O_\prec(\Psi^2)\,, \tag{10.24} \]
where the error term is understood in the sense of the $\ell^\infty$-norm (uniformly in the components of the vector $\mathbf{w}$). Inverting the matrix $1 - m_{sc}^2 S$ and recalling the definition (6.17) yields (8.48).

The proof of (8.50) is similar, except that we have to treat the subspace $\mathbf{e}^\perp$ separately. Using (8.49) we

write
\[ \sum_i t_{ai}\big(v_i - [v]\big) = \sum_i t_{ai} v_i - \sum_i \frac{1}{N}\, v_i\,, \]


and apply the above argument to each term separately. This yields
\[ \sum_i t_{ai}\big(v_i - [v]\big) = m_{sc}^2 \sum_i t_{ai} \sum_k s_{ik} v_k - m_{sc}^2 \sum_i \frac{1}{N} \sum_k s_{ik} v_k + O_\prec(\Psi^2) = m_{sc}^2 \sum_{i,k} s_{ai} t_{ik}\big(v_k - [v]\big) + O_\prec(\Psi^2)\,, \]

where we used (6.1) in the second step. Note that the error term on the right-hand side is perpendicular to $\mathbf{e}$ when regarded as a vector indexed by $a$, since all other terms in the equation are. Hence we may invert the matrix $1 - m_{sc}^2 S$ on the subspace $\mathbf{e}^\perp$, as above, to get (8.50). This completes the proof of Lemma 8.9.

10.3 Alternative proof of (8.47) of Lemma 8.9

We conclude this section with an alternative proof of the key statement (8.47) of Lemma 8.9. While the underlying argument remains similar, the following proof makes use of an additional decomposition of the space of random variables, which avoids the use of the stopping rule from Step (1) in the above proof of Lemma 8.9. This decomposition may be regarded as an abstract reformulation of the stopping rule.

Alternative proof of (8.47) of Lemma 8.9. As before, we set $X_k := Q_k (G_{kk})^{-1}$. For simplicity of presentation, we set $t_{ik} = N^{-1}$. The decomposition is defined using the operations $P_i$ and $Q_i$ introduced in Definition 7.3. It is immediate that $P_i$ and $Q_i$ are projections, that $P_i + Q_i = 1$, and that all of these projections commute with each other. For a set $A \subset \{1, \dots, N\}$ we use the notations $P_A := \prod_{i \in A} P_i$ and $Q_A := \prod_{i \in A} Q_i$.

Let $p$ be even, and adopt the convention that $X_{k_s}$ stands for $X_{k_s}$ itself when $s \leq p/2$ and for the complex conjugate $\bar X_{k_s}$ when $s > p/2$. Then we get
\[ \mathbb{E}\bigg| \frac{1}{N} \sum_k X_k \bigg|^p = \frac{1}{N^p} \sum_{k_1, \dots, k_p} \mathbb{E} \prod_{s=1}^p X_{k_s} = \frac{1}{N^p} \sum_{k_1, \dots, k_p} \mathbb{E} \prod_{s=1}^p \bigg( \prod_{r=1}^p \big(P_{k_r} + Q_{k_r}\big) X_{k_s} \bigg). \]

Introducing the notations $\mathbf{k} = (k_1, \dots, k_p)$ and $[\mathbf{k}] = \{k_1, \dots, k_p\}$, we therefore get by multiplying out the parentheses
\[ \mathbb{E}\bigg| \frac{1}{N} \sum_k X_k \bigg|^p = \frac{1}{N^p} \sum_{\mathbf{k}} \sum_{A_1, \dots, A_p \subset [\mathbf{k}]} \mathbb{E} \prod_{s=1}^p \big( P_{A_s^c} Q_{A_s} X_{k_s} \big). \tag{10.25} \]

Next, by the definition of $X_{k_s}$ we have $X_{k_s} = Q_{k_s} X_{k_s}$, which implies that $P_{A_s^c} X_{k_s} = 0$ if $k_s \notin A_s$. Hence we may restrict the summation to sets $A_s$ satisfying
\[ k_s \in A_s \tag{10.26} \]

for all $s$. Moreover, we claim that the right-hand side of (10.25) vanishes unless
\[ k_s \in \bigcup_{q \neq s} A_q \tag{10.27} \]
for all $s$. Indeed, suppose that $k_s \in \bigcap_{q \neq s} A_q^c$ for some $s$, say $s = 1$. In this case, for each $s = 2, \dots, p$, the factor $P_{A_s^c} Q_{A_s} X_{k_s}$ is independent of $k_1$ (see Definition 7.3). Thus we get

\[ \mathbb{E} \prod_{s=1}^p \big( P_{A_s^c} Q_{A_s} X_{k_s} \big) = \mathbb{E}\big( P_{A_1^c} Q_{A_1} Q_{k_1} X_{k_1} \big) \prod_{s=2}^p \big( P_{A_s^c} Q_{A_s} X_{k_s} \big) = \mathbb{E}\, Q_{k_1}\bigg( \big( P_{A_1^c} Q_{A_1} X_{k_1} \big) \prod_{s=2}^p \big( P_{A_s^c} Q_{A_s} X_{k_s} \big) \bigg) = 0\,, \]


where in the last step we used that $\mathbb{E}\, Q_i(X) = 0$ for any $i$ and any random variable $X$. We conclude that the summation on the right-hand side of (10.25) is restricted to indices satisfying (10.26) and (10.27). Under these two conditions we have
\[ \sum_{s=1}^p |A_s| \geq 2\, \big| [\mathbf{k}] \big|\,, \tag{10.28} \]

since each index $k_s$ must belong to at least two different sets $A_q$: to $A_s$ (by (10.26)) as well as to some $A_q$ with $q \neq s$ (by (10.27)).

Next, we claim that for $k \in A$ we have
\[ |Q_A X_k| \prec \Psi_o^{|A|}\,. \tag{10.29} \]
(Note that if we were considering the case $X_k = Q_k G_{kk}$ instead of $X_k = Q_k (G_{kk})^{-1}$, then (10.29) would have to be weakened to $|Q_A X_k| \prec \Psi^{|A|}$, in accordance with (8.46). Indeed, in that case and for $A = \{k\}$, we only have the bound $|Q_k G_{kk}| \prec \Psi$ and not $|Q_k G_{kk}| \prec \Psi_o$.)

Before proving (10.29), we show how it may be used to complete the proof. Using (10.25), (10.29), and Lemma 10.1, we find
\[ \mathbb{E}\bigg| \frac{1}{N} \sum_k X_k \bigg|^p \prec C_p \frac{1}{N^p} \sum_{\mathbf{k}} \Psi_o^{2|[\mathbf{k}]|} = C_p \sum_{u=1}^p \Psi_o^{2u}\, \frac{1}{N^p} \sum_{\mathbf{k}} \mathbf{1}\big(|[\mathbf{k}]| = u\big) \leq C_p \sum_{u=1}^p \Psi_o^{2u} N^{u-p} \leq C_p \big(\Psi_o + N^{-1/2}\big)^{2p} \leq C_p \Psi_o^{2p}\,, \]
where in the first step we estimated the summation over the sets $A_1, \dots, A_p$ by a combinatorial factor $C_p$ depending on $p$, in the fourth step we used the elementary inequality $a^n b^m \leq (a+b)^{n+m}$ for positive $a, b$, and in the last step we used (8.20) and the bound $M \leq N$. Thus we have proved (10.23), from which the claim follows exactly as in the first proof of (8.47).

What remains is the proof of (10.29). The case $|A| = 1$ (corresponding to $A = \{k\}$) follows from (10.10), exactly as in the first proof of (8.47). To simplify notation, for the case $|A| \geq 2$ we assume that $k = 1$ and $A = \{1, 2, \dots, t\}$ with $t \geq 2$. It suffices to prove that
\[ \bigg| Q_t \cdots Q_2\, \frac{1}{G_{11}} \bigg| \prec \Psi_o^t\,. \tag{10.30} \]

We start by writing, using the first decoupling formula (8.1),
\[ Q_2\, \frac{1}{G_{11}} = Q_2\, \frac{1}{G^{(2)}_{11}} + Q_2\, \frac{G_{12} G_{21}}{G_{11} G^{(2)}_{11} G_{22}} = Q_2\, \frac{G_{12} G_{21}}{G_{11} G^{(2)}_{11} G_{22}}\,, \]

where the first term vanishes since $G^{(2)}_{11}$ is independent of $2$ (see Definition 7.3). We now consider
\[ Q_3 Q_2\, \frac{1}{G_{11}} = Q_2 Q_3\, \frac{G_{12} G_{21}}{G_{11} G^{(2)}_{11} G_{22}}\,, \]

and apply again the first decoupling formula (8.1) with $k = 3$ to each resolvent entry on the right-hand side, and multiply everything out. The result is a sum of fractions of entries of $G$, whereby all entries in the numerator are off-diagonal and all entries in the denominator are diagonal. The leading order term vanishes,

\[ Q_2 Q_3\, \frac{G^{(3)}_{12} G^{(3)}_{21}}{G^{(3)}_{11} G^{(23)}_{11} G^{(3)}_{22}} = 0\,, \]


so that the surviving terms have at least three (off-diagonal) resolvent entries in the numerator. We may now continue in this manner; at each step the number of (off-diagonal) resolvent entries in the numerator increases by at least one.

More formally, we obtain a sequence $A_2, A_3, \dots, A_t$, where
\[ A_2 := Q_2\, \frac{G_{12} G_{21}}{G_{11} G^{(2)}_{11} G_{22}} \]
and $A_i$ is obtained by applying (8.1) with $k = i$ to each entry of $Q_i A_{i-1}$, and keeping only the nonvanishing terms. The following properties are easy to check by induction.

(i) $A_i = Q_i A_{i-1}$.

(ii) $A_i$ consists of the projection $Q_2 \cdots Q_i$ applied to a sum of fractions such that all entries in the numerator are off-diagonal and all entries in the denominator are diagonal.

(iii) The number of (off-diagonal) entries in the numerator of each term of $A_i$ is at least $i$.

By Lemma 10.1 combined with (ii) and (iii) we conclude that $|A_i| \prec \Psi_o^i$. From (i) we therefore get
\[ Q_t \cdots Q_2\, \frac{1}{G_{11}} = A_t = O_\prec\big(\Psi_o^t\big)\,. \]

This is (10.30). Hence the proof is complete.

10.3.1 History of the fluctuation averaging

Lemma 8.9 is a version of the fluctuation averaging mechanism, taken from [55], that is the most useful for our purpose of proving Theorem 6.7. We now briefly comment on its history.

The first version of the fluctuation averaging mechanism appeared in [68] for the Wigner case, where $[Z] = N^{-1} \sum_k Z_k$ was bounded by $\Lambda_o^2$ (recall $Z_i$ from (8.11)). Since $Q_k [G_{kk}]^{-1}$ is essentially $Z_k$, see (8.7), this corresponds to the first bound in (8.46). A different proof (with a better bound on the constants) was given in [70]. A conceptually streamlined version of the original proof was extended to sparse matrices [56] and to sample covariance matrices [113]. Finally, an extensive analysis in [52] treated the fluctuation averaging of general polynomials of resolvent entries and identified the order of cancellations depending on the algebraic structure of the polynomial. Moreover, in [52] an additional cancellation effect was found for the quantity $Q_i |G_{ij}|^2$. All proofs of the fluctuation averaging lemmas rely on computing expectations of high moments of the averages and carefully estimating various terms of different combinatorial structure. In [52] we developed a Feynman diagrammatic representation for bookkeeping the terms, but this is necessary only for the case of general polynomials. For the special cases stated in Lemma 8.9, the proof presented here is relatively simple.


11 Eigenvalue location: the rigidity phenomenon

The local semicircle law in Theorem 6.7 was proven for universal Wigner matrices, characterized by the upper bound $s_{ij} \leq 1/M$ on the variances of $h_{ij}$. This result implies rigidity estimates on the location of the eigenvalues with a precision depending on $M$; e.g., in the bulk spectrum the eigenvalues can typically be located with a precision slightly above $1/M$. For the sake of simplicity, from now on we restrict our presentation to the special case of generalized Wigner matrices, characterized by $C_{\mathrm{inf}}/N \leq s_{ij} \leq C_{\mathrm{sup}}/N$ with two positive constants $C_{\mathrm{inf}}$ and $C_{\mathrm{sup}}$, see Definition 2.1. So from now on the parameter $M$ will be replaced with $N$. For rigidity results in the general case, see Theorem 7.6 in [55].

In this section we will show that the eigenvalues of generalized Wigner matrices are quite rigid: they may fluctuate only on a scale slightly above $1/N$. This is a manifestation of the fact that the eigenvalues are strongly correlated; the typical fluctuation scale of $N$ independent, say Poisson, points in a finite interval would be $N^{-1/2}$.
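The $N^{-1/2}$ scale for independent points is easy to verify numerically; the following small experiment (ours, not part of the text) measures the fluctuation of the median of $N$ independent uniform points:

```python
import numpy as np

# The k-th order statistic of N iid uniform points on [0, 1] fluctuates on
# scale N^{-1/2}: its standard deviation is about sqrt(p(1-p)/N), p = k/N.
rng = np.random.default_rng(1)
N, trials = 400, 2000
k = N // 2                                   # the median
medians = np.sort(rng.random((trials, N)), axis=1)[:, k - 1]

observed_std = medians.std()
predicted_std = np.sqrt(0.25 / N)            # = 0.025 for N = 400

assert 0.8 * predicted_std < observed_std < 1.2 * predicted_std
```

Rigidity (Theorem 11.5 below) asserts that in the bulk the eigenvalues are instead pinned down on the much smaller scale $N^{-1+\varepsilon}$.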

11.1 Extreme eigenvalues

We first state a corollary to Theorem 6.7 on the extreme eigenvalues.

Corollary 11.1. For any generalized Wigner matrix, the largest eigenvalue of $H$ is bounded by $2 + N^{-2/3+\varepsilon}$ for any $\varepsilon > 0$, in the sense that for any $\varepsilon > 0$ and any $D \geq 1$ we have
\[ \mathbb{P}\Big( \max_{\alpha = 1, \dots, N} |\lambda_\alpha| \geq 2 + N^{-2/3+\varepsilon} \Big) \leq N^{-D} \tag{11.1} \]

for any N > N0(ε,D).

Proof. Set $\eta = N^{-2/3}$ and choose an energy $E = 2 + \kappa$ outside of the spectrum with some $\kappa \geq N^{-2/3+\varepsilon} \gg N^{\varepsilon/2}\eta$. From (6.33) with $z = E + i\eta$ we have
\[ |\operatorname{Im} m_N(z) - \operatorname{Im} m_{sc}(z)| \leq \frac{N^{\varepsilon/2}}{N\kappa} \ll \frac{1}{N\eta} \tag{11.2} \]

with very high probability. On the other hand, if there is an eigenvalue λ with |λ− E| 6 η then we have

ImmN (z) >1

2Nη Immsc(z) +

Nε/2

Nκ. (11.3)

Here the first inequality comes from
\[ \operatorname{Im} m_N(z) = \frac{1}{N} \sum_\alpha \frac{\eta}{|\lambda_\alpha - E|^2 + \eta^2} \geq \frac{1}{N}\, \frac{\eta}{|\lambda - E|^2 + \eta^2}\,, \]
while the second follows from $\operatorname{Im} m_{sc}(z) \asymp \eta/\sqrt{\kappa}$ from (6.13).

The equations (11.2) and (11.3), however, contradict each other, showing that with very high probability there is no eigenvalue with $|\lambda - E| \leq \eta = N^{-2/3}$. We can repeat this argument for a grid of energies $E = E_k = 2 + N^{-2/3+\varepsilon} + k N^{-2/3}$ for $k = 0, 1, \dots, C N^{2/3+D}$ for any finite $D$. We can then use the union bound for the exceptional sets of small probabilities to exclude the existence of eigenvalues between $2 + N^{-2/3+\varepsilon}$ and $N^D$. Finally, for a large enough $D$, it is trivial to prove that the bound $\|H\| \leq \sum_{ij} |h_{ij}| \leq N^D$ holds with very high probability. This excludes eigenvalues $|\lambda| \geq N^D$ and proves the corollary.

11.2 Stieltjes transform and regularized counting function

For any signed measure $\varrho$ on the real line, define its Stieltjes transform by
\[ S(z) = \int_{\mathbb{R}} \frac{\varrho(\lambda)}{\lambda - z}\, \mathrm{d}\lambda\,, \qquad z \in \mathbb{C} \setminus \mathbb{R}\,. \tag{11.4} \]
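As a quick sanity check (ours, not part of the text), one can evaluate (11.4) numerically for the semicircle density and compare with the closed-form Stieltjes transform $m_{sc}(z) = \frac{-z + \sqrt{z^2 - 4}}{2}$, where the branch of the square root is chosen so that $\operatorname{Im} m_{sc}(z) > 0$ for $\operatorname{Im} z > 0$:

```python
import numpy as np

# S(z) = int rho(x)/(x - z) dx for the semicircle density
# rho_sc(x) = sqrt(4 - x^2)/(2*pi) on [-2, 2], compared with the closed form
# m_sc(z) = (-z + sqrt(z^2 - 4))/2 on the branch with Im m_sc > 0.
z = 1.0 + 0.5j
lam = np.linspace(-2.0, 2.0, 400_001)
dx = lam[1] - lam[0]
rho = np.sqrt(np.clip(4.0 - lam**2, 0.0, None)) / (2.0 * np.pi)

numeric = np.sum(rho / (lam - z)) * dx          # Riemann sum for S(z)

closed = (-z + np.sqrt(z**2 - 4.0 + 0j)) / 2.0
if closed.imag < 0:                              # fix the branch if necessary
    closed = (-z - np.sqrt(z**2 - 4.0 + 0j)) / 2.0

assert abs(numeric - closed) < 1e-3
```

One can also verify that `closed` satisfies the familiar self-consistent equation $m_{sc}^2 + z\, m_{sc} + 1 = 0$.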


The Helffer-Sjöstrand formula (originally used to develop an alternative functional calculus for self-adjoint operators, see e.g. [36]) yields an expression for $\int_{\mathbb{R}} f(\lambda) \varrho(\lambda)\, \mathrm{d}\lambda$ for a large class of functions $f$ on the real line in terms of the Stieltjes transform of $\varrho$. More precisely, let $f \in C^1(\mathbb{R})$ with compact support and let $\chi(y)$ be a smooth cutoff function with support in $[-1, 1]$, with $\chi(y) = 1$ for $|y| \leq 1/2$ and with bounded derivatives. Define
\[ \tilde f(x + iy) := \big( f(x) + iy f'(x) \big) \chi(y) \]

to be an almost-analytic extension of $f$. With the standard convention $z = x + iy$ and $\partial_{\bar z} = \tfrac12\,(\partial_x + i \partial_y)$, we have
\[ f(\lambda) = \frac{1}{\pi} \int_{\mathbb{R}^2} \frac{\partial_{\bar z} \tilde f(x + iy)}{\lambda - x - iy}\, \mathrm{d}x\, \mathrm{d}y\,. \tag{11.5} \]

To see this identity, we use $\partial_{\bar z}(\lambda - z)^{-1} = 0$ to write
\[ \frac{1}{\pi} \int_{\mathbb{R}^2} \frac{\partial_{\bar z} \tilde f(x + iy)}{\lambda - x - iy}\, \mathrm{d}x\, \mathrm{d}y = \frac{1}{2\pi i} \int_{\mathbb{R}^2} \partial_{\bar z} \Big[ \frac{\tilde f(x + iy)}{\lambda - x - iy} \Big]\, \mathrm{d}\bar z\, \mathrm{d}z\,. \tag{11.6} \]
We can rewrite the last term as
\[ \frac{1}{2\pi i} \int_{\mathbb{R}^2} \mathrm{d}\Big[ \frac{\tilde f(x + iy)}{\lambda - x - iy}\, \mathrm{d}z \Big]\,, \tag{11.7} \]

where the operator $\mathrm{d}$ is the differential in the sense of one-forms. By Green's theorem and the compact support of $\tilde f$, we can integrate by parts down to a small circle $C_\varepsilon$ of radius $\varepsilon$ around $\lambda$. The contribution of the two dimensional integration inside the circle vanishes in the limit $\varepsilon \to 0$. Hence the last term is equal to

\[ \lim_{\varepsilon \to 0} \frac{1}{2\pi i} \int_{C_\varepsilon} \frac{\tilde f(x + iy)}{\lambda - x - iy}\, \mathrm{d}z = f(\lambda) \tag{11.8} \]
and this proves (11.5). Computing $\partial_{\bar z} \tilde f$ in (11.5) explicitly, we have

\[ f(\lambda) = \frac{1}{2\pi} \int_{\mathbb{R}^2} \frac{iy f''(x) \chi(y) + i\big( f(x) + iy f'(x) \big) \chi'(y)}{\lambda - x - iy}\, \mathrm{d}x\, \mathrm{d}y\,. \tag{11.9} \]

We remark that an analogous formula holds if we simply defined $\tilde f(x + iy) := f(x) \chi(y)$. The addition of the term $iy f'(x) \chi(y)$ makes $\partial_{\bar z} \tilde f(x + iy) = O(y)$ for $|y|$ small, which will be needed in the following estimates. In fact, the concept of an almost-analytic extension can be generalized to arbitrary order $n$: we could have defined $\tilde f$ such that $\partial_{\bar z} \tilde f(x + iy) = O(|y|^n)$, but we will not need this, as it would not improve our estimates.
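The $O(y)$ claim can be checked by a direct computation (ours), which also produces the integrand of (11.9): with $\tilde f = (f + iyf')\chi$,

```latex
\partial_{\bar z}\tilde f
  = \tfrac12\,(\partial_x + i\partial_y)\big[(f(x) + iyf'(x))\chi(y)\big]
  = \tfrac12\big[(f'(x) + iyf''(x))\chi(y) - f'(x)\chi(y)
      + i\,(f(x) + iyf'(x))\chi'(y)\big]
  = \tfrac12\big[\,iyf''(x)\chi(y) + i\,(f(x) + iyf'(x))\chi'(y)\,\big].
```

Since $\chi'(y) = 0$ for $|y| \leq 1/2$, only the first term survives near the real axis, so $\partial_{\bar z}\tilde f = O(|y|)$ there; inserting this expression into (11.5) gives (11.9).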

The following lemma, in slightly less general forms, appeared in [59, 68, 69]. It shows how to translate estimates on the Stieltjes transform $S(z)$ in the regime $\operatorname{Im} z \geq \eta$ into estimates on the regularized distribution function of $\varrho$. Given two real numbers $E_1 < E_2$, we wish to estimate $\varrho([E_1, E_2])$, the $\varrho$-measure of the interval $[E_1, E_2]$. Since the integral of the sharp indicator function cannot be controlled directly, we need to smooth it out; the function $f_2 - f_1$ in the lemma below plays this role. The scale of the smoothing, denoted by $\tilde\eta$ below, must be larger than $\eta$. Another feature of this lemma is that the condition on $S(z) = S(E + iy)$ is local: only $E$-values in a $\tilde\eta$-neighborhood of the endpoints $E_1, E_2$ are needed.

Lemma 11.2. Let $\varrho$ be a signed measure on $\mathbb{R}$ and let $S$ be its Stieltjes transform. Fix two energies $E_1 < E_2$. Suppose that for some $0 < \eta \leq 1/2$ and $0 < U_1 \leq U_2$ we have
\[ |S(x + iy)| \leq \frac{U_1}{y} \quad \text{for any } x \in [E_1, E_1 + \tilde\eta] \cup [E_2, E_2 + \tilde\eta] \text{ and } \eta \leq y \leq 1, \tag{11.10} \]
\[ |\operatorname{Im} S(x + iy)| \leq \frac{U_1}{y} \quad \text{for any } x \in [E_1, E_2 + \tilde\eta] \text{ and } 1/2 \leq y \leq 1, \tag{11.11} \]
\[ |\operatorname{Im} S(x + iy)| \leq \frac{U_2}{y} \quad \text{for any } x \in [E_1, E_1 + \tilde\eta] \cup [E_2, E_2 + \tilde\eta] \text{ and } 0 < y < \eta. \tag{11.12} \]


For $\tilde\eta \geq \eta$, define two functions $f_j = f_{E_j, \tilde\eta} : \mathbb{R} \to \mathbb{R}$ such that $f_j(x) = 1$ for $x \in (-\infty, E_j]$, $f_j(x)$ vanishes for $x \in [E_j + \tilde\eta, \infty)$; moreover $|f_j'(x)| \leq C \tilde\eta^{-1}$ and $|f_j''(x)| \leq C \tilde\eta^{-2}$ for some constant $C$. Then for some other constant $C > 0$ independent of $U_1$, $U_2$ and $\eta$, we have
\[ \bigg| \int (f_2 - f_1)(\lambda)\, \varrho(\lambda)\, \mathrm{d}\lambda \bigg| \leq C \|\varrho\|_{\mathrm{TV}} \Big[ U_1 |\log \tilde\eta| + \frac{U_2\, \eta^2}{\tilde\eta^2} \Big]\,, \tag{11.13} \]
where $\|\cdot\|_{\mathrm{TV}}$ denotes the total variation norm of signed measures. In addition, if we assume that $\varrho$ has compact support, then we also have
\[ \bigg| \int f_2(\lambda)\, \varrho(\lambda)\, \mathrm{d}\lambda \bigg| \leq C \|\varrho\|_{\mathrm{TV}} \Big[ U_1 |\log \tilde\eta| + \frac{U_2\, \eta^2}{\tilde\eta^2} \Big]\,, \tag{11.14} \]
where $C$ depends on the size of the support of $\varrho$.

Proof. In this proof we will consider only the case $\tilde\eta = \eta$ and $U_1 = U_2$; we denote these common values by $\eta$ and $U$. We may also assume $\|\varrho\|_{\mathrm{TV}} = 1$. These simplifications only streamline some notation; the general case is proven in exactly the same way.

Let $f = f_2 - f_1$, which is smooth and compactly supported. From the representation (11.9) and since $f$ is real, we have
\begin{align*}
\bigg| \int_{-\infty}^{\infty} f(\lambda) \varrho(\lambda)\, \mathrm{d}\lambda \bigg| = \bigg| \operatorname{Re} \int_{-\infty}^{\infty} f(\lambda) \varrho(\lambda)\, \mathrm{d}\lambda \bigg| &\leq C \bigg| \iint y f''(x) \chi(y) \operatorname{Im} S(x + iy)\, \mathrm{d}x\, \mathrm{d}y \bigg| \\
&\quad + C \iint \Big( |f(x)|\, |\chi'(y)|\, |\operatorname{Im} S(x + iy)| + |y|\, |f'(x)|\, |\chi'(y)|\, |\operatorname{Re} S(x + iy)| \Big)\, \mathrm{d}x\, \mathrm{d}y\,, \tag{11.15}
\end{align*}
for some universal $C > 0$, where $\chi$ is a smooth cutoff function with support in $[-1, 1]$, with $\chi(y) = 1$ for $|y| \leq 1/2$ and with bounded derivatives. Recall that $f'$ is $O(\eta^{-1})$ on two intervals of size $O(\eta)$ each, and $\chi'$ is supported in $1/2 \leq |y| \leq 1$. By (11.10), we can bound
\[ \iint |y|\, |f'(x)|\, |\chi'(y)|\, |\operatorname{Re} S(x + iy)|\, \mathrm{d}x\, \mathrm{d}y \leq C U \tag{11.16} \]

(note that due to $S(x + iy) = \overline{S(x - iy)}$, the bounds analogous to (11.10)-(11.11) also hold for negative $y$). Using that $f$ is bounded with compact support, we have, by (11.11),
\[ \iint |f(x)|\, |\chi'(y)|\, |\operatorname{Im} S(x + iy)|\, \mathrm{d}x\, \mathrm{d}y \leq C U\,. \tag{11.17} \]

For the first term on the right-hand side of (11.15), we split the integral into the two regimes $0 < y < \eta$ and $\eta < y < 1$. Note that by symmetry we only need to consider positive $y$. From (11.12), the integral over the first regime is easily bounded by
\[ \bigg| \iint_{0 < y < \eta} y f''(x) \chi(y) \operatorname{Im} S(x + iy)\, \mathrm{d}x\, \mathrm{d}y \bigg| = O\bigg( \iint_{0 < y < \eta} \big[ \mathbf{1}(E_1 \leq x \leq E_1 + \eta) + \mathbf{1}(E_2 \leq x \leq E_2 + \eta) \big]\, y\, \eta^{-2}\, \frac{U}{y}\, \mathrm{d}x\, \mathrm{d}y \bigg) = O(U)\,. \]

For the second regime, $y \in [\eta, 1]$, we can integrate by parts in $x$, then use $\partial_x \operatorname{Im} S(x + iy) = -\partial_y \operatorname{Re} S(x + iy)$, and then integrate by parts in $y$ to get
\begin{align*}
\iint_{y > \eta} y f''(x) \chi(y) \operatorname{Im} S(x + iy)\, \mathrm{d}x\, \mathrm{d}y &= - \iint_{y > \eta} f'(x)\, \partial_y\big(y \chi(y)\big) \operatorname{Re} S(x + iy)\, \mathrm{d}x\, \mathrm{d}y \tag{11.18} \\
&\quad - \int f'(x)\, \eta \chi(\eta) \operatorname{Re} S(x + i\eta)\, \mathrm{d}x\,. \tag{11.19}
\end{align*}


Recall that $f'(x) = 0$ unless $|x - E_j| < \eta$ for some $j = 1, 2$, and in this regime we have $f' = O(\eta^{-1})$. By (11.10), the last integral is easily bounded by $O(U)$. For the first integral, we use $\partial_y(y \chi(y)) = O(1)$ and $S(x + iy) = O(U/y)$ to get
\[ \bigg| \iint_{y > \eta} \partial_y\big(y \chi(y)\big) f'(x) S(x + iy)\, \mathrm{d}x\, \mathrm{d}y \bigg| \leq O\bigg( U \int_\eta^1 \frac{\mathrm{d}y}{y} \bigg) = O\big( U |\log \eta| \big)\,, \tag{11.20} \]

which is (11.13). To prove (11.14), we simply choose $E_1$ to be any energy to the left of the support of $\varrho$ and notice that
\[ \bigg| \int f_2(\lambda)\, \varrho(\lambda)\, \mathrm{d}\lambda \bigg| = \bigg| \int (f_2 - f_1)(\lambda)\, \varrho(\lambda)\, \mathrm{d}\lambda \bigg| \leq C\, U |\log \eta|\,. \tag{11.21} \]
This completes the proof of Lemma 11.2.

11.3 Convergence speed of the empirical distribution function

Define the empirical distribution function (or, in physics terminology, the integrated density of states)
\[ n_N(E) := \frac{1}{N}\, \big| \{ \alpha : \lambda_\alpha \leq E \} \big| = \int_{-\infty}^{E} \varrho_N(x)\, \mathrm{d}x\,, \]
where $\varrho_N$ is the empirical eigenvalue distribution
\[ \varrho_N(x) = \frac{1}{N} \sum_{\alpha=1}^N \delta(x - \lambda_\alpha)\,. \tag{11.22} \]
Similarly, we define the distribution function of the semicircle density
\[ n_{sc}(E) := \int_{-\infty}^{E} \varrho_{sc}(x)\, \mathrm{d}x\,. \]
We introduce the differences
\[ \varrho^\Delta := \varrho_N - \varrho_{sc}\,, \qquad m^\Delta := m_N - m_{sc}\,, \]
and recall that $m_N(z) = \frac{1}{N} \operatorname{Tr} G(z) = \int \varrho_N(x)\, (x - z)^{-1}\, \mathrm{d}x$ is the Stieltjes transform of the empirical density.

The following lemma, proven below, shows how the local semicircle law implies a convergence rate for the empirical distribution function.

Lemma 11.3. Suppose that for some $|E| \leq 10$ and $N^{-1+\varepsilon} \leq \tilde\eta \ll 1$ with some $\varepsilon > 0$ we know that
\[ |m_N(x + i\eta) - m_{sc}(x + i\eta)| \prec \frac{1}{N\eta} \tag{11.23} \]
holds for any $|x - E| \leq \tilde\eta$ and any $\eta \in [\tilde\eta, 10]$. Then we have
\[ \big| n_N(E) - n_{sc}(E) \big| \prec \tilde\eta\,. \tag{11.24} \]
For generalized Wigner matrices, the condition (11.23) holds with the choice $\tilde\eta = N^{-1+\varepsilon}$ uniformly in $|E| \leq 10$ (see (6.29) and (6.32)). Thus we have proved the following corollary:

Corollary 11.4. For generalized Wigner matrices, we have
\[ \sup_{|E| \leq 10} \big| n_N(E) - n_{sc}(E) \big| \leq N^{-1+\varepsilon} \tag{11.25} \]
with probability at least $1 - N^{-D}$ for any $D, \varepsilon > 0$, provided $N \geq N_0(\varepsilon, D)$ is sufficiently large.
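Corollary 11.4 is easy to probe numerically. The following sketch (ours, not part of the text) draws one GOE matrix and measures $\sup_E |n_N(E) - n_{sc}(E)|$, using the closed form $n_{sc}(E) = \tfrac12 + \tfrac{E\sqrt{4 - E^2}}{4\pi} + \tfrac{\arcsin(E/2)}{\pi}$:

```python
import numpy as np

# Draw one GOE matrix (a generalized Wigner matrix) and compare the
# empirical distribution function n_N with the semicircle one n_sc.
rng = np.random.default_rng(2)
N = 400
A = rng.standard_normal((N, N))
H = (A + A.T) / np.sqrt(2 * N)         # E h_ij^2 = 1/N off-diagonal, 2/N on it
evals = np.linalg.eigvalsh(H)           # sorted eigenvalues

def n_sc(E):
    """Distribution function of the semicircle density sqrt(4-x^2)/(2 pi)."""
    E = np.clip(E, -2.0, 2.0)
    return 0.5 + E * np.sqrt(4.0 - E**2) / (4.0 * np.pi) + np.arcsin(E / 2.0) / np.pi

grid = np.linspace(-2.5, 2.5, 2001)
n_N = np.searchsorted(evals, grid, side='right') / N
sup_diff = np.max(np.abs(n_N - n_sc(grid)))   # of order 1/N up to logs

assert sup_diff < 0.05
```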


Proof of Lemma 11.3. Since both the semicircle measure and the empirical measure (by Corollary 11.1) have compact support within $[-3, 3]$ with very high probability, (11.24) clearly holds for $E \leq -3$ and for $E \geq 3$. For an energy $|E| \leq 3$, we will apply Lemma 11.2 with the choice $\varrho = \varrho^\Delta$, $U_1 = U_2 = N^\varepsilon \tilde\eta$ and $E_2 = E$. The estimate (11.14) will immediately imply (11.24), provided the other assumptions of Lemma 11.2 are satisfied.

For this purpose, we need the bounds (11.10), (11.11) and (11.12) on the Stieltjes transform with the choice $U_1 = U_2 = N^\varepsilon \tilde\eta$. We denote this common value by $U := N^\varepsilon \tilde\eta$. The assumption (11.23) implies (11.10) (with very high probability) if we choose $U \geq N^{-1+\varepsilon}$. From Lemma 6.2 we find
\[ |\operatorname{Im} m_{sc}(x + iy)| \leq C \sqrt{\kappa_x + y}\,, \tag{11.26} \]
recalling the definition $\kappa_x = \min\{|x - 2|, |x + 2|\}$ from (6.23). By the spectral decomposition of $H$, it is easy to see that the function $y \mapsto y \operatorname{Im} m_N(x + iy)$ is monotone increasing. Thus we get, using (11.26) and (11.23), that
\[ y \operatorname{Im} m_N(x + iy) \leq \tilde\eta \operatorname{Im} m_N(x + i\tilde\eta) \prec \tilde\eta \Big( \sqrt{\kappa_x + \tilde\eta} + \frac{1}{N\tilde\eta} \Big) = \tilde\eta \sqrt{\kappa_x + \tilde\eta} + \frac{1}{N} \tag{11.27} \]

for $y \leq \tilde\eta$ and $|x| \leq 10$. Using the notation $m^\Delta := m_N - m_{sc}$ and recalling (11.26), we therefore get
\[ |y \operatorname{Im} m^\Delta(x + iy)| \prec \tilde\eta \sqrt{\kappa_x + \tilde\eta} + \frac{1}{N} \leq C \tilde\eta \tag{11.28} \]
for $y \leq \tilde\eta$ and $|x| \leq 10$; here we have used the assumption $\tilde\eta \geq \frac{1}{N}$ in the last step. Therefore, with the choice $U = N^\varepsilon \tilde\eta$, the bounds (11.10), (11.11) and (11.12) indeed hold.

Recall from Lemma 11.2 that $f_{E, \tilde\eta}$ denotes a smooth indicator function of $(-\infty, E]$ on scale $\tilde\eta$, namely $f_{E, \tilde\eta}(x) = 1$ for $x \in (-\infty, E]$ and $f_{E, \tilde\eta}(x) = 0$ for $x \geq E + \tilde\eta$, with derivative bounded by $C\tilde\eta^{-1}$. Using (11.14) from Lemma 11.2, we conclude that
\[ \bigg| \int f_{E, \tilde\eta}(\lambda)\, \varrho^\Delta(\lambda)\, \mathrm{d}\lambda \bigg| \prec \tilde\eta\,. \tag{11.29} \]

Hence we have
\[ n_N(E) - n_{sc}(E) - \int f_{E, \tilde\eta}(\lambda)\, \varrho^\Delta(\lambda)\, \mathrm{d}\lambda = - \int_E^{E + \tilde\eta} f_{E, \tilde\eta}(\lambda)\, \varrho_N(\lambda)\, \mathrm{d}\lambda + \int_E^{E + \tilde\eta} f_{E, \tilde\eta}(\lambda)\, \varrho_{sc}(\lambda)\, \mathrm{d}\lambda \leq C \tilde\eta\,. \]

Notice that we only used the positivity of $\varrho_N$ in the first term and the boundedness of $f$ and $\varrho_{sc}$ in the second. Conversely, we have

\[ n_N(E) - n_{sc}(E) - \int f_{E - \tilde\eta, \tilde\eta}(\lambda)\, \varrho^\Delta(\lambda)\, \mathrm{d}\lambda = \int_{E - \tilde\eta}^{E} \big( 1 - f_{E - \tilde\eta, \tilde\eta} \big)(\lambda)\, \varrho_N(\lambda)\, \mathrm{d}\lambda - \int_{E - \tilde\eta}^{E} \big( 1 - f_{E - \tilde\eta, \tilde\eta} \big)(\lambda)\, \varrho_{sc}(\lambda)\, \mathrm{d}\lambda \geq - C \tilde\eta\,. \]

Hence we have proved that
\[ - C \tilde\eta + \int f_{E - \tilde\eta, \tilde\eta}(\lambda)\, \varrho^\Delta(\lambda)\, \mathrm{d}\lambda \leq n_N(E) - n_{sc}(E) \leq \int f_{E, \tilde\eta}(\lambda)\, \varrho^\Delta(\lambda)\, \mathrm{d}\lambda + C \tilde\eta\,. \]
The right-hand side is directly bounded by (11.29); the left-hand side is estimated in the same way, using (11.29) for $f_{E - \tilde\eta, \tilde\eta}$ instead of $f_{E, \tilde\eta}$. This proves Lemma 11.3.

11.4 Rigidity of eigenvalues

We now state and prove the rigidity theorem concerning the eigenvalue locations for generalized Wigner matrices. Recall the definition of $\varrho_N$ from (11.22); hence we have
\[ \frac{j}{N} = \int_{-\infty}^{\lambda_j} \varrho_N(x)\, \mathrm{d}x\,. \tag{11.30} \]


We define the classical location of the $j$-th eigenvalue by the equation
\[ \frac{j}{N} = \int_{-\infty}^{\gamma_j} \varrho_{sc}(x)\, \mathrm{d}x\,; \tag{11.31} \]
in other words, $\gamma_j$ is the $j$-th $N$-quantile of the semicircle distribution.
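The classical locations can be computed by solving (11.31) numerically; a bisection sketch (ours, using the closed form of $n_{sc}$):

```python
import numpy as np

# gamma_j solves n_sc(gamma_j) = j/N, where n_sc is the semicircle
# distribution function, computed in closed form below.
def n_sc(E):
    E = min(max(E, -2.0), 2.0)
    return 0.5 + E * np.sqrt(4.0 - E * E) / (4.0 * np.pi) + np.arcsin(E / 2.0) / np.pi

def gamma(j, N, tol=1e-12):
    """j-th N-quantile of the semicircle law, by bisection on [-2, 2]."""
    lo, hi = -2.0, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if n_sc(mid) < j / N:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

N = 10
gammas = [gamma(j, N) for j in range(1, N + 1)]

assert abs(gammas[N // 2 - 1]) < 1e-9         # gamma_{N/2} = 0 by symmetry
assert all(a < b for a, b in zip(gammas, gammas[1:]))
```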

Theorem 11.5 (Rigidity of eigenvalues). For any generalized Wigner matrix ensemble, for any constant $\varepsilon > 0$ and any constant $D \geq 1$ we have
\[ \mathbb{P}\Big( \exists\, j : |\lambda_j - \gamma_j| \geq N^\varepsilon \big[ \min\big(j,\, N - j + 1\big) \big]^{-1/3} N^{-2/3} \Big) \leq N^{-D} \tag{11.32} \]
for any sufficiently large $N \geq N_0(\varepsilon, D)$.

Proof. We consider only the case $j \leq N/2$; the other case is similar. By (11.30) and (11.31) we have
\[ 0 = \int_{-\infty}^{\lambda_j} \varrho_N(x)\, \mathrm{d}x - \int_{-\infty}^{\gamma_j} \varrho_{sc}(x)\, \mathrm{d}x = \int_{-\infty}^{\lambda_j} \big( \varrho_N(x) - \varrho_{sc}(x) \big)\, \mathrm{d}x - \int_{\lambda_j}^{\gamma_j} \varrho_{sc}(x)\, \mathrm{d}x\,, \tag{11.33} \]

i.e.,
\[ \bigg| \int_{\lambda_j}^{\gamma_j} \varrho_{sc}(x)\, \mathrm{d}x \bigg| \leq \bigg| \int_{-\infty}^{\lambda_j} \big( \varrho_N(x) - \varrho_{sc}(x) \big)\, \mathrm{d}x \bigg| = \big| n_N(\lambda_j) - n_{sc}(\lambda_j) \big|\,, \]
and thus by the uniform bound (11.25)
\[ \bigg| \int_{\lambda_j}^{\gamma_j} \varrho_{sc}(x)\, \mathrm{d}x \bigg| \prec \frac{1}{N}\,. \tag{11.34} \]

Since $\varrho_{sc}(x) \asymp \sqrt{\kappa_x}$ for $x \in [-2, 1]$, we have
\[ n_{sc}(x) \asymp \kappa_x^{3/2}\,, \qquad \varrho_{sc}(x) = n'_{sc}(x) \asymp n_{sc}^{1/3}(x)\,, \qquad x \in [-2, 1]\,. \tag{11.35} \]

This also implies for $j \leq N/2$ that
\[ \gamma_j + 2 \asymp \Big( \frac{j}{N} \Big)^{2/3}\,, \qquad \varrho_{sc}(\gamma_j) \asymp \Big( \frac{j}{N} \Big)^{1/3}\,. \tag{11.36} \]

If we knew that $\varrho_{sc}(\gamma_j)$ and $\varrho_{sc}(\lambda_j)$ were comparable, then the monotonicity of $\varrho_{sc}$, the mean-value theorem and (11.34) would immediately imply
\[ |\lambda_j - \gamma_j|\, \varrho_{sc}(\gamma_j) \prec N^{-1}\,. \tag{11.37} \]
Combining this with the second asymptotic relation in (11.36), we would have
\[ |\lambda_j - \gamma_j| \prec N^{-1} \varrho_{sc}(\gamma_j)^{-1} \leq C N^{-1} (j/N)^{-1/3}\,. \tag{11.38} \]

For the complete proof, we first consider the indices $j \geq j_0 := N^{\varepsilon/2}$. Since $n_{sc}(\gamma_j) \geq N^{-1+\varepsilon/2}$ and $n_{sc}(\gamma_j) = n_N(\lambda_j)$, we have
\[ \big| n_{sc}(\lambda_j) - n_{sc}(\gamma_j) \big| \leq N^{-\varepsilon/4}\, n_{sc}(\gamma_j) \]
with very high probability by (11.25). This shows that $n_{sc}(\lambda_j) \asymp n_{sc}(\gamma_j)$, but then $\varrho_{sc}(\lambda_j) \asymp \varrho_{sc}(\gamma_j)$ by (11.35), as we presumed above.

Finally, for indices $j \leq j_0 = N^{\varepsilon/2}$ we use monotonicity and the rigidity (11.38) for the index $j_0$:
\[ \lambda_j \leq \lambda_{j_0} \leq \gamma_{j_0} + N^{-2/3 + \varepsilon/2} \leq -2 + N^{-2/3 + \varepsilon} \]
with very high probability; in the last step we used (11.36). For the lower bound on $\lambda_j$ we refer to (11.1). This shows that $|\lambda_j + 2| \leq N^{-2/3+\varepsilon}$ with very high probability, and we also have $|\gamma_j + 2| \leq N^{-2/3+\varepsilon}$, so $|\lambda_j - \gamma_j| \leq 2 N^{-2/3+\varepsilon}$. This proves the rigidity estimate (11.32) for all indices $j$.


12 Universality for matrices with Gaussian convolutions

12.1 Dyson Brownian motion

Consider the following matrix valued stochastic differential equation

\[ \mathrm{d}H(t) = \frac{1}{\sqrt{N}}\, \mathrm{d}B(t) - \frac{1}{2} H(t)\, \mathrm{d}t\,, \qquad t \geq 0\,, \tag{12.1} \]

with initial data H0, where B(t) is defined somewhat differently in the two symmetry classes, namely:

(i) in the real symmetric case (indicated by the superscript $(s)$), $B^{(s)}(t)$ is an $N \times N$ matrix such that $b^{(s)}_{ij}$ for $i < j$ and $b^{(s)}_{ii}/\sqrt{2}$ are independent standard Brownian motions, and $b^{(s)}_{ij} = b^{(s)}_{ji}$;

(ii) in the complex Hermitian case, $B^{(h)}(t)$ is an $N \times N$ matrix such that $\sqrt{2}\operatorname{Re}\big(b^{(h)}_{ij}\big)$, $\sqrt{2}\operatorname{Im}\big(b^{(h)}_{ij}\big)$ for $i < j$ and $b^{(h)}_{ii}$ are independent standard Brownian motions, and $b^{(h)}_{ji} = \big(b^{(h)}_{ij}\big)^*$.

We will drop the $s$ or $h$ superscript; the following formulas hold in both cases. In coordinates, we have
\[ \mathrm{d}h_{ij}(t) = \frac{\mathrm{d}b_{ij}(t)}{\sqrt{N}} - \frac{1}{2} h_{ij}(t)\, \mathrm{d}t\,, \tag{12.2} \]
where the entries $b_{ij}(t)$ have variance $t$ in the Hermitian case, while in the symmetric case the off-diagonal entries $b_{ij}(t)$ have variance $t$ and the diagonal entries have variance $2t$.

The equation (12.1) defines a matrix valued Ornstein-Uhlenbeck (OU) process. It plays a distinguished role in the theory of random matrices that was discovered by Dyson. Depending on typographical convenience, the time dependence will sometimes be indicated as an argument, $H(t)$, and sometimes as an index, $H_t$; we keep both notations in parallel, i.e., $H(t) = H_t$. In particular, unlike in some PDE literature, subscripts do not denote derivatives.

Next, we are concerned with the evolution of the eigenvalues $\lambda(t) = (\lambda_1(t), \lambda_2(t), \dots, \lambda_N(t))$ of $H_t$ along the OU flow. We label the eigenvalues in increasing order, i.e., we assume that $\lambda(t) \in \Sigma_N$, where we define the open simplex
\[ \Sigma_N := \big\{ \lambda \in \mathbb{R}^N : \lambda_1 < \lambda_2 < \dots < \lambda_N \big\}\,. \tag{12.3} \]
A theorem below will guarantee that the eigenvalues are simple and are continuous functions of $t$; hence the labelling is consistently preserved along the evolution.

In principle, the eigenvalues $\lambda_j(t)$ and the eigenvectors $\mathbf{u}_j(t)$ of $H_t$ are strongly related, and one would expect a coupled system of stochastic differential equations for them. It was Dyson's fundamental observation [45] that the eigenvalues themselves satisfy an autonomous system of stochastic differential equations (SDE) that does not involve the eigenvectors. This SDE is given in the following definition.

Definition 12.1. Given a real parameter $\beta \geq 1$, consider the following system of SDEs:
\[ \mathrm{d}\lambda_i = \frac{\sqrt{2}}{\sqrt{\beta N}}\, \mathrm{d}B_i + \bigg( - \frac{\lambda_i}{2} + \frac{1}{N} \sum_{j}^{(i)} \frac{1}{\lambda_i - \lambda_j} \bigg)\, \mathrm{d}t\,, \qquad i \in \llbracket 1, N \rrbracket\,, \tag{12.4} \]
where $(B_i)$ is a collection of real-valued, independent standard Brownian motions and $\sum_j^{(i)}$ denotes summation over all $j \neq i$. The solution of this SDE is called the Dyson Brownian Motion (DBM) with parameter $\beta$.
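A crude Euler discretization of (12.4) (ours; adequate only away from collisions and purely for illustration) already shows the characteristic behavior: the repulsion term keeps the paths ordered, so $\lambda(t)$ stays in $\Sigma_N$ over a short time:

```python
import numpy as np

rng = np.random.default_rng(4)
N, beta = 10, 2.0
dt, steps = 1e-5, 2000

lam = np.linspace(-1.5, 1.5, N)      # well-separated initial data in Sigma_N
for _ in range(steps):
    diff = lam[:, None] - lam[None, :]
    np.fill_diagonal(diff, np.inf)    # drop the j = i term: 1/inf = 0
    # drift per (12.4): -lam_i/2 + (1/N) sum_{j != i} 1/(lam_i - lam_j)
    drift = -lam / 2 + np.mean(1.0 / diff, axis=1)
    lam = lam + drift * dt + np.sqrt(2 * dt / (beta * N)) * rng.standard_normal(N)

assert np.all(np.diff(lam) > 0)       # the ordering lam_1 < ... < lam_N survives
```

A naive Euler scheme like this degenerates near collisions; the theorem below shows that the true (strong) solution never leaves $\Sigma_N$ for $t > 0$.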

Notice that we defined the DBM for any $\beta \geq 1$ and not only for the classical values $\beta = 1, 2$ that correspond to an OU matrix flow. The following theorem summarizes the main properties of the DBM. For the proof, see Lemma 4.3.3 and Proposition 4.3.5 of [8]. We remark that the authors in [8] considered (12.4) without the drift term $-\lambda_i/2$. However, any drift term of the form $-V'(\lambda_i)$ with $V \in C^2(\mathbb{R})$ could be added without further complications.


Theorem 12.2. Let $\beta \geq 1$ and suppose that the initial data satisfy $\lambda(0) \in \overline{\Sigma_N}$. Then there exists a unique (strong) solution to (12.4) in the space of continuous functions $(\lambda(t))_{t \geq 0} \in C(\mathbb{R}_+, \overline{\Sigma_N})$. Furthermore, for any $t > 0$ we have $\lambda(t) \in \Sigma_N$, and $\lambda(t)$ depends continuously on $\lambda(0)$. In particular, if $\lambda(0) \in \Sigma_N$, i.e., the multiplicity of the initial points is one, then $(\lambda(t))_{t \geq 0} \in C(\mathbb{R}_+, \Sigma_N)$, i.e., this property is preserved for all times along the evolution.

We point out that the solution is considered in the strong sense. This means that despite the singularity in (12.4), the DBM admits a solution for almost all realizations of the Brownian motions B_i. For the precise definition we recall the concept of a strong solution to a (scalar) SDE of the form

\[
\mathrm{d}X_t = a(X_t)\,\mathrm{d}B_t + b(X_t)\,\mathrm{d}t, \qquad X_0 = \xi, \quad t \ge 0, \tag{12.5}
\]

with respect to a fixed realization of a Brownian motion B_t, where a and b are given coefficient functions. The strong solution is a process (X_t)_{t≥0} on a filtered probability space (Ω, F_t) with continuous sample paths that satisfies the following properties:

• X_t is adapted to the filtration F_t = σ(G_t ∪ N), where
\[
\mathcal{G}_t = \sigma(B_s,\ s \le t,\ X_0), \qquad
\mathcal{N} := \big\{ N \subset \Omega : \exists\, G \subset \mathcal{G}_\infty \text{ with } N \subset G,\ \mathbb{P}(G) = 0 \big\},
\]
i.e., the filtration of X_t is the same as that of B_t after completing it with events of zero measure;

• X0 = ξ almost surely;

• For any time t, we have
\[
\int_0^t \big[ |b(X_s)|^2 + |a(X_s)|^2 \big]\,\mathrm{d}s < \infty
\]
almost surely;

• The integral version of (12.5), i.e.,
\[
X_t = X_0 + \int_0^t a(X_s)\,\mathrm{d}B_s + \int_0^t b(X_s)\,\mathrm{d}s,
\]
holds almost surely for any t ≥ 0.

For simplicity we only presented the definition for the scalar case, X_t ∈ ℝ; the extension to the vector-valued case, X_t ∈ ℝ^N, is straightforward.

The relation between the OU matrix flow (12.1) and the DBM (12.4) is given by the following theorem, essentially due to Dyson.

Theorem 12.3. [45] Let H_t solve the matrix-valued SDE (12.1) in the strong sense. Then its eigenvalue process satisfies (12.4), where the parameter β = 1 if the matrix B is real symmetric and β = 2 if B is complex Hermitian.

We remark that the Ornstein–Uhlenbeck drift term −(1/2)H_t dt in (12.1) is not essential. One may replace it with −V′(H)dt for any potential V ∈ C²(ℝ); then the same theorem holds, but the −λ_i/2 term has to be replaced with −V′(λ_i) in (12.4). In particular, the drift term can be removed; for example, the monograph [8] defines the DBM without the drift term. While there is no fundamental difference between these formulations, (12.1) is technically slightly more convenient since it preserves not only the zero expectation value but also the 1/N variance of the matrix elements if the variance of the initial data H_0 is 1/N.

If technical complications related to the singularities (λ_i − λ_j)^{−1} in (12.4) could be neglected, the proof of Theorem 12.3 would be a relatively straightforward Itô calculus combined with standard perturbation formulas for eigenvalues and eigenvectors of self-adjoint matrices. We will present this calculation in the next section. A rigorous proof is slightly more cumbersome since a priori the L² integrability of the singularity is not guaranteed. The actual proof first regularizes the singular term on a very short scale. Then a stopping time argument shows that the regularization can be removed since the dynamics has an effective level repulsion mechanism that keeps neighboring points separated. We will not present this argument here since it is not particularly instructive for this book. The full details can be found in Section 4.3.1 of [8] for the case when the term −(1/2)H_t dt is not present in (12.1). The necessary modifications for (12.1) (or for another drift term −V′(λ_i)) are straightforward.


12.2 Derivation of Dyson Brownian motion and perturbation theory

Let u_α, α = 1, 2, …, N, be the ℓ²-normalized eigenvectors of H = (h_{ij}) with (real) eigenvalues λ_α. For simplicity we assume that all eigenvalues are distinct. Fix an index pair (i, j) and denote by a dot the partial derivative with respect to h_{ij}:
\[
\dot f := \frac{\partial f}{\partial h_{ij}}\,.
\]

Differentiating Hu_α = λ_α u_α and u_α^* u_β = δ_{αβ} yields
\[
\dot H u_\alpha + H \dot u_\alpha = \dot\lambda_\alpha u_\alpha + \lambda_\alpha \dot u_\alpha, \tag{12.6}
\]
as well as
\[
\dot u_\alpha^* u_\beta + u_\alpha^* \dot u_\beta = 0, \qquad u_\alpha^* \dot u_\alpha = 0.
\]

Taking the inner product with u_α on both sides of (12.6), we get
\[
\dot\lambda_\alpha = u_\alpha^* \dot H u_\alpha. \tag{12.7}
\]

Moreover, multiplying (12.6) by u_β^* yields, for β ≠ α,
\[
u_\beta^* \dot H u_\alpha + u_\beta^* H \dot u_\alpha = \lambda_\alpha u_\beta^* \dot u_\alpha,
\]
i.e.,
\[
u_\beta^* \dot H u_\alpha + \lambda_\beta u_\beta^* \dot u_\alpha = \lambda_\alpha u_\beta^* \dot u_\alpha,
\]
where we have used that H is self-adjoint and λ_β is real in the last equation. Hence,
\[
\dot u_\alpha = \sum_{\beta \ne \alpha} (u_\beta^* \dot u_\alpha)\, u_\beta
= \sum_{\beta \ne \alpha} \frac{u_\beta^* \dot H u_\alpha}{\lambda_\alpha - \lambda_\beta}\, u_\beta. \tag{12.8}
\]

For simplicity of notation, from now on we consider only the real symmetric case; the complex Hermitian case can be treated similarly. In the real symmetric case, (12.7) reads as

\[
\frac{\partial \lambda_\alpha}{\partial h_{ij}} = u_\alpha(i)\, u_\alpha(j)\, [2 - \delta_{ij}], \tag{12.9}
\]

where u_α(1), u_α(2), …, u_α(N) denote the coordinates of the vector u_α. We may assume that the eigenvectors are real. From (12.8) we get

\[
\frac{\partial u_\alpha(k)}{\partial h_{ij}}
= \sum_{\beta \ne \alpha} \frac{u_\beta(i)\, u_\alpha(j) + u_\beta(j)\, u_\alpha(i)\,[1-\delta_{ij}]}{\lambda_\alpha - \lambda_\beta}\, u_\beta(k). \tag{12.10}
\]

Combining these last two formulas allows us to compute the second partial derivatives, i.e., for any fixed indices i, j, k, ℓ we have

\[
\begin{aligned}
\frac{\partial^2 \lambda_\alpha}{\partial h_{\ell j}\,\partial h_{ik}}
&= [2-\delta_{ik}]\Big[ \frac{\partial u_\alpha(i)}{\partial h_{\ell j}}\, u_\alpha(k)
   + u_\alpha(i)\, \frac{\partial u_\alpha(k)}{\partial h_{\ell j}} \Big] \\
&= [2-\delta_{ik}] \sum_{\beta\ne\alpha} \frac{1}{\lambda_\alpha-\lambda_\beta}
\Big[ \big( u_\beta(j)\, u_\alpha(\ell) + u_\beta(\ell)\, u_\alpha(j)\,[1-\delta_{j\ell}] \big) u_\beta(i)\, u_\alpha(k) \\
&\qquad\qquad\qquad\qquad
 + \big( u_\beta(\ell)\, u_\alpha(j) + u_\beta(j)\, u_\alpha(\ell)\,[1-\delta_{j\ell}] \big) u_\beta(k)\, u_\alpha(i) \Big] \\
&= [2-\delta_{ik}] \sum_{\beta\ne\alpha} \frac{1}{\lambda_\alpha-\lambda_\beta}
\big( u_\beta(j)\, u_\alpha(\ell) + u_\beta(\ell)\, u_\alpha(j)\,[1-\delta_{j\ell}] \big)
\big( u_\beta(i)\, u_\alpha(k) + u_\beta(k)\, u_\alpha(i) \big). 
\end{aligned} \tag{12.11}
\]
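The first-order perturbation formula (12.9) can be checked against a finite-difference computation; the sketch below (matrix size and increment are arbitrary illustrative choices) compares the two for a random symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5                                        # hypothetical small size
A = rng.standard_normal((N, N))
H = (A + A.T) / 2                            # random real symmetric matrix
lam, U = np.linalg.eigh(H)                   # columns of U are the eigenvectors u_alpha

eps = 1e-6
for i, j in [(0, 0), (1, 3)]:
    E = np.zeros((N, N))
    E[i, j] = E[j, i] = 1.0                  # perturb the symmetric entry h_ij
    numeric = (np.linalg.eigvalsh(H + eps * E) - lam) / eps
    exact = U[i, :] * U[j, :] * (2.0 - (i == j))   # formula (12.9)
    assert np.allclose(numeric, exact, atol=1e-3)
print("finite differences match (12.9)")
```

The factor [2 − δ_{ij}] appears because an off-diagonal perturbation moves both h_{ij} and h_{ji}.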


By (12.2), (12.9), (12.11) and using Itô's formula (neglecting the issue of the singularity), we have

\[
\begin{aligned}
\mathrm{d}\lambda_\alpha
&= \sum_{i\le k} \frac{\partial\lambda_\alpha}{\partial h_{ik}}\,\mathrm{d}h_{ik}
 + \frac{1}{2} \sum_{i\le k}\sum_{j\le \ell} \frac{\partial^2\lambda_\alpha}{\partial h_{ik}\,\partial h_{\ell j}}\,(\mathrm{d}h_{ik})(\mathrm{d}h_{\ell j}) \\
&= \sum_{i,k} u_\alpha(i)\, u_\alpha(k) \Big[ \frac{\mathrm{d}b_{ik}}{\sqrt{N}} - \frac{h_{ik}}{2}\,\mathrm{d}t \Big]
 + \frac{1}{2N} \sum_{i,k} \sum_{\beta\ne\alpha} \frac{1}{\lambda_\alpha - \lambda_\beta}
 \Big[ |u_\beta(i)|^2 |u_\alpha(k)|^2 + |u_\alpha(i)|^2 |u_\beta(k)|^2 \Big]\mathrm{d}t \\
&= \frac{1}{\sqrt N} \sum_{i,k} u_\alpha(i)\, u_\alpha(k)\,\mathrm{d}b_{ik}
 - \frac{\lambda_\alpha}{2}\,\mathrm{d}t
 + \frac{1}{N} \sum_{\beta\ne\alpha} \frac{1}{\lambda_\alpha - \lambda_\beta}\,\mathrm{d}t.
\end{aligned} \tag{12.12}
\]

In the second line we used b_{ik} = b_{ki} for i ≠ k and that (dh_{ik})(dh_{ℓj}) = (1/N)\,δ_{iℓ}δ_{kj}[1 + δ_{ik}]\,dt. Finally, in the last line we have used the equation Hu_α = λ_α u_α to get
\[
\sum_{i,k} u_\alpha(i)\, u_\alpha(k)\, h_{ik} = \sum_i u_\alpha(i)\, (Hu_\alpha)(i) = \lambda_\alpha \sum_i |u_\alpha(i)|^2 = \lambda_\alpha,
\]

which gives the −(λ_i/2)dt term in (12.4). In the first term on the right hand side of (12.12) we define a new real Gaussian process

\[
\widetilde B_\alpha := \sum_{i,k} u_\alpha(i)\, u_\alpha(k)\, b_{ik}.
\]

Clearly, \(\mathbb{E}\,\mathrm{d}\widetilde B_\alpha = 0\) and its covariance satisfies

\[
\mathbb{E}\,\mathrm{d}\widetilde B_\alpha\,\mathrm{d}\widetilde B_{\alpha'}
= \mathbb{E} \sum_{i,k}\sum_{\ell,j} u_\alpha(i)\, u_\alpha(k)\,\mathrm{d}b_{ik}\; u_{\alpha'}(\ell)\, u_{\alpha'}(j)\,\mathrm{d}b_{\ell j}
= 2 \sum_{i,k} u_\alpha(i)\, u_{\alpha'}(i)\, u_\alpha(k)\, u_{\alpha'}(k)\,\mathrm{d}t
= 2\,\delta_{\alpha\alpha'}\,\mathrm{d}t,
\]

where there are two pairings, (i, ℓ) = (k, j) and (i, ℓ) = (j, k), that contributed to the contractions. Thus \(\widetilde B_\alpha = \sqrt{2}\, B_\alpha\), where (B_α)_{α=1}^N is a standard (real) Brownian motion in ℝ^N, and this gives the martingale term in (12.4) with β = 1. A similar formula can be derived for the Hermitian case, where the parameter becomes β = 2; we omit the details.
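The covariance computation above can also be tested by Monte Carlo. The following sketch (sample size is an arbitrary choice) draws symmetric Gaussian increments with the variances used in the derivation (variance 2 on the diagonal, 1 off the diagonal) and checks that X_α := Σ_{i,k} u_α(i)u_α(k)b_{ik} has variance 2 and that different indices α decorrelate:

```python
import numpy as np

rng = np.random.default_rng(2)
N, samples = 8, 40000                        # hypothetical sizes

A = rng.standard_normal((N, N))
_, U = np.linalg.eigh((A + A.T) / 2)         # a fixed orthonormal eigenbasis

X = np.empty((samples, 2))
for s in range(samples):
    G = rng.standard_normal((N, N))
    B = (G + G.T) / np.sqrt(2)               # Var B_ii = 2, Var B_ik = 1 for i != k
    for a in range(2):
        X[s, a] = U[:, a] @ B @ U[:, a]      # X_alpha = sum_{i,k} u_a(i) u_a(k) b_ik

print(np.var(X[:, 0]))                       # close to 2
print(np.cov(X[:, 0], X[:, 1])[0, 1])        # close to 0
```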

In the next section we put the Dyson Brownian motion in a more general context which is closer to the interpretation of GUE as an invariant ensemble in the spirit of Section 4. It turns out that the measure on the eigenvalues, explicitly given in (4.3), generates a DBM in a canonical way such that this measure is invariant under the dynamics. This holds for any value β ≥ 1, i.e., even if there is no underlying matrix ensemble.

12.3 Strong local ergodicity of the Dyson Brownian motion

One key property of the DBM is that it leaves the Gaussian Wigner ensembles invariant. Moreover, the Gaussian measure is the only equilibrium of the DBM, and the DBM dynamics converges to this equilibrium from any initial condition. In this section we will quantify these ideas.

We start by introducing the concept of the Dirichlet form and the generator associated with any probability measure µ = µ_N on ℝ^N. For definiteness, the reader may keep the invariant β-ensemble (4.3) in mind, i.e.,

\[
\mu_N(\mathrm{d}\lambda) = \frac{e^{-\beta N H_N(\lambda)}}{Z}\,\mathrm{d}\lambda, \qquad
H_N(\lambda) := \frac{1}{2}\sum_{i=1}^N V(\lambda_i) - \frac{1}{N}\sum_{i<j} \log|\lambda_j - \lambda_i|. \tag{12.13}
\]

We will drop the subscript N, but most quantities depend on N.


We define the Dirichlet form associated with the measure µ on ℝ^N, which is just the homogeneous H¹ norm, i.e.,
\[
D_\mu(f) := \frac{1}{\beta N} \sum_{i=1}^N \int (\partial_i f)^2\,\mathrm{d}\mu
= \frac{1}{\beta N}\,\|\nabla f\|_{L^2(\mu)}^2\,, \qquad (\partial_i \equiv \partial_{\lambda_i}). \tag{12.14}
\]

We remark that in our earlier papers the Dirichlet form was defined with a 1/(2N) prefactor instead of 1/(βN). The current convention is suitable for considering the DBM for general β.

The symmetric operator associated with the Dirichlet form is called the generator and is denoted by L = L_µ. It satisfies
\[
D_\mu(f) = \langle f, (-L) f\rangle_{L^2(\mu)} = -\int f L f\,\mathrm{d}\mu.
\]
Notice that we follow the probabilistic convention of defining the generator as a negative operator, L ≤ 0. Formally, we have \(L = \frac{1}{\beta N}\Delta - (\nabla H)\cdot\nabla\), i.e.,
\[
L = \sum_{i=1}^N \frac{1}{\beta N}\,\partial_i^2
+ \sum_{i=1}^N \Big( -\frac{1}{2}V'(\lambda_i) + \frac{1}{N}\sum_{j\ne i} \frac{1}{\lambda_i-\lambda_j} \Big)\partial_i\,. \tag{12.15}
\]
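To see why (12.15) generates the Dirichlet form (12.14), one can integrate by parts against µ = e^{−βN H}/Z, discarding boundary terms; a formal one-step sketch:

```latex
\begin{aligned}
-\int f\,Lf\,\mathrm{d}\mu
&= -\sum_{i=1}^N \int f\Big(\frac{1}{\beta N}\,\partial_i^2 f
     - (\partial_i H)\,\partial_i f\Big)\mathrm{d}\mu \\
&= \frac{1}{\beta N}\sum_{i=1}^N \frac{1}{Z}\int \partial_i f\;
     \partial_i\!\big( f\, e^{-\beta N H} \big)\,\mathrm{d}\lambda
   + \sum_{i=1}^N \int f\,(\partial_i H)\,\partial_i f\,\mathrm{d}\mu \\
&= \frac{1}{\beta N}\sum_{i=1}^N \int (\partial_i f)^2\,\mathrm{d}\mu
   - \sum_{i=1}^N \int f\,(\partial_i H)\,\partial_i f\,\mathrm{d}\mu
   + \sum_{i=1}^N \int f\,(\partial_i H)\,\partial_i f\,\mathrm{d}\mu
 = D_\mu(f).
\end{aligned}
```

The term produced by differentiating e^{−βN H} cancels the drift contribution exactly; this is how the drift coefficients in (12.15) are determined by the Hamiltonian H_N in (12.13).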

In the special Gaussian case, when the confining potential is V(λ) = λ²/2, the corresponding measure restricted to Σ_N ⊂ ℝ^N is denoted by µ_G. Notice that for β = 1, 2 the measure µ_G coincides with the GOE, GUE measures, respectively (see also (4.12)). The generator (12.15) reads as
\[
L_G = \sum_{i=1}^N \frac{1}{\beta N}\,\partial_i^2
+ \sum_{i=1}^N \Big( -\frac{\lambda_i}{2} + \frac{1}{N}\sum_{j\ne i} \frac{1}{\lambda_i-\lambda_j} \Big)\partial_i\,. \tag{12.16}
\]

Now we consider the Dyson Brownian motion (12.4), i.e., the dynamics of the eigenvalues λ = (λ_1, λ_2, …, λ_N) ∈ Σ_N of H_t that evolves by (12.1). We write the distribution of λ of H_t at time t as f_t(λ)µ_G(dλ). Comparing the SDE (12.4) with L_G, we notice that Kolmogorov's forward equation for the evolution of the density f_t takes the form
\[
\partial_t f_t = L_G f_t\,, \qquad t > 0. \tag{12.17}
\]

We remark that the rigorous definition of L as a self-adjoint operator on L²(dµ) through the Friedrichs extension and the existence of the corresponding dynamics (12.17) restricted to the simplex Σ_N for any β ≥ 1 will be discussed in the separate Section 12.4. In particular, by the spectral theorem the operator (−L)^{1/2}e^{tL} is bounded for t > 0, thus we see that for any initial condition f_0 ∈ L²(dµ) we have f_t ∈ H¹(dµ) for any t > 0. The restriction β ≥ 1 is essential to define the dynamics on the simplex Σ_N without specifying additional boundary conditions; the same restriction was also necessary in Theorem 12.2 to define the strong solution to the DBM as a stochastic differential equation. Indeed, if β < 1, the particles cross each other in the SDE formulation. In this section we thus assume β ≥ 1. Nevertheless, some ideas and results presented here are still applicable to any β > 0 with a regularization; we will comment on this in Section 13.7.

By writing the distribution of the eigenvalues as f_tµ_G, our formulation of the problem has already taken into account Dyson's observation that the invariant measure for the DBM is µ_G since, as we will see, as t → ∞ the solution f_t converges to f_∞ = 1. A natural question regarding the DBM is how fast the dynamics reaches equilibrium. Dyson had already posed this question in 1962:

Dyson's conjecture [45]: The global equilibrium of the DBM is reached in a time of order one and the local equilibrium (in the bulk) is reached in a time of order 1/N. Dyson further remarked,

"The picture of the gas coming into equilibrium in two well-separated stages, with microscopic and macroscopic time scales, is suggested with the help of physical intuition. A rigorous proof that this picture is accurate would require a much deeper mathematical analysis."


We will prove that Dyson's conjecture is correct if the initial data of the DBM are the eigenvalues of a Wigner ensemble, which was Dyson's original interest. Our result is in fact valid for DBM with much more general initial data, which we now survey. Briefly, it will turn out that the global equilibrium is indeed reached within a time of order one, but local equilibrium is achieved much faster if an a priori estimate on the location of the eigenvalues (also called points) is satisfied. Recalling the definition of the classical locations γ_j from (11.31), the a priori bound (originally referred to as Assumption III in [63, 64]) is formulated as follows.

A priori Estimate: There exists a ξ > 0 such that average rigidity on scale N^{−1+ξ} holds, i.e.,
\[
Q = Q_\xi := \sup_{0 \le t \le N} \frac{1}{N} \int \sum_{j=1}^N (\lambda_j - \gamma_j)^2\, f_t(\lambda)\,\mu_G(\mathrm{d}\lambda) \le C N^{-2+2\xi} \tag{12.18}
\]

with a constant C uniform in N. (We assumed (12.18) with lower bound t ≥ 0 in the supremum, but in fact the proof shows that t ≥ N^{−1+2ξ} would be sufficient.)

Notice that (12.18) requires that the rigidity of the eigenvalues on scale N^{−1+ξ} holds, at least in an averaged sense. We recall that Theorem 11.5 guarantees that for Wigner eigenvalues rigidity holds on scale N^{−1+ξ} for any ξ > 0 (in the bulk), so (12.18) holds for any ξ > 0 in this case. However, our theory of the local ergodicity of the DBM applies to other models as well, so we wish to keep our formulation general, allowing situations where (12.18) holds only for larger ξ.

The main result on the local ergodicity of the Dyson Brownian motion (12.17) states that if the a priori estimate (12.18) is satisfied, then the local correlation functions of the measure f_tµ_G are the same as the corresponding ones for the Gaussian measure, µ_G = f_∞µ_G, provided that t is larger than N^{−1+2ξ}. The n-point correlation functions of the (symmetrized) probability measure ν are defined, similarly to (4.20), by

\[
p_{\nu,N}^{(n)}(x_1, x_2, \dots, x_n) := \int_{\mathbb{R}^{N-n}} \nu(x)\,\mathrm{d}x_{n+1} \dots \mathrm{d}x_N, \qquad x = (x_1, x_2, \dots, x_N). \tag{12.19}
\]

In particular, when ν = f_tµ_G, we denote the correlation functions by
\[
p_{t,N}^{(n)}(x_1, x_2, \dots, x_n) = p_{f_t\mu_G,N}^{(n)}(x_1, x_2, \dots, x_n). \tag{12.20}
\]

We also use p_{G,N}^{(n)} for p_{\mu_G,N}^{(n)}. In general, if H is an N × N Wigner matrix, then we will also use p_{H,N}^{(n)} for the correlation functions of the eigenvalues of H. Due to the convention that one can view the locations of eigenvalues as the coordinates of particles (or points), we have used x, instead of λ, in the last equation. From now on, we will use both conventions depending on which viewpoint we wish to emphasize. Notice that the probability distribution of the eigenvalues at time t, f_tµ_G, is the same as that of the Gaussian divisible matrix:

\[
H_t = e^{-t/2} H_0 + (1 - e^{-t})^{1/2} H_G, \tag{12.21}
\]

where H_0 is the initial Wigner matrix and H_G is an independent standard GUE (or GOE) matrix. This establishes the universality of the Gaussian divisible ensembles. The precise statement is the following theorem (notice that [64] uses somewhat different notation).
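The Gaussian divisible matrix (12.21) is straightforward to sample. The following sketch (matrix size and sample count are arbitrary illustrative choices) checks numerically that the interpolation preserves the 1/N variance of the off-diagonal entries in the real symmetric case, since e^{−t} + (1 − e^{−t}) = 1:

```python
import numpy as np

def wigner(N, rng):
    """Real symmetric Gaussian matrix with Var h_ij = 1/N off the diagonal."""
    G = rng.standard_normal((N, N))
    return (G + G.T) / np.sqrt(2 * N)

rng = np.random.default_rng(3)
N, t, samples = 20, 0.5, 3000                # hypothetical sizes
iu = np.triu_indices(N, k=1)

vals = []
for _ in range(samples):
    H0, HG = wigner(N, rng), wigner(N, rng)
    Ht = np.exp(-t / 2) * H0 + np.sqrt(1 - np.exp(-t)) * HG   # formula (12.21)
    vals.append(Ht[iu])
var = np.var(np.concatenate(vals))
print(var * N)                               # close to 1
```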

Theorem 12.4. [64, Theorem 2.1] Suppose that for some exponent ξ ∈ (0, 1/2), the average rigidity (12.18) holds for the solution f_t of the forward equation (12.17) on scale N^{−1+ξ}. Additionally, suppose that in the bulk the rigidity holds on scale N^{−1+ξ} even without averaging, i.e., for any κ > 0
\[
\sup_{\kappa N \le j \le (1-\kappa)N} |\lambda_j - \gamma_j| \prec N^{-1+\xi} \tag{12.22}
\]
holds for any t ∈ [N^{−1+2ξ}, N] if N ≥ N_0(ξ, κ) is large enough. Let E ∈ (−2, 2) and b = b_N > 0 be such that [E − b, E + b] ⊂ (−2, 2). Then for any ε > 0, any integer n ≥ 1 and any compactly supported smooth test function O : ℝ^n → ℝ, we have, for any t ∈ [N^{−1+2ξ}, N],
\[
\Big| \int_{E-b}^{E+b} \frac{\mathrm{d}E'}{2b} \int_{\mathbb{R}^n} \mathrm{d}\alpha\, O(\alpha)
\big( p_{t,N}^{(n)} - p_{G,N}^{(n)} \big)\Big( E' + \frac{\alpha}{N} \Big) \Big|
\le N^{\varepsilon} \Big[ \frac{N^{-1+\xi}}{b} + \sqrt{\frac{1}{bNt}} \Big] \|O\|_{C^1} \tag{12.23}
\]
for any N sufficiently large, N ≥ N_0(n, ξ, κ, ε).


The upper limit t ≤ N for the time interval in (12.18) and within the theorem is unimportant; one could have replaced it with N^ε for any ε > 0. Beyond t ≥ N^ε the measure f_tµ_G is already exponentially close to the equilibrium µ_G in the entropy sense, so all correlation functions are also close. Also notice that for simplicity of notation we replaced α/(Nϱ_{sc}(E)) by α/N in the argument of the correlation functions (compare (12.23) with (5.2)). This can easily be achieved by a change of variables and a redefinition of the test function O. This convention will be followed in the rest of the book.

In other words, this theorem states that if we have rigidity on scale N^{−1+ξ} for some ξ ∈ (0, 1/2), then the DBM has average energy universality in the bulk for any time t ≫ N^{−1+2ξ} on scale b ≫ max{N^{−1+ξ}, (Nt)^{−1}}. For generalized Wigner matrices we know that rigidity holds on the smallest possible scale, so we have the following corollary:

Corollary 12.5. Consider the matrix OU process with the initial matrix ensemble H_0 being a generalized Wigner ensemble and let |E| < 2. Then for any ε > 0, for any t ∈ [N^{−1+2ε}, N^ε] and for any b ≥ (Nt)^{−1} with b < ||E| − 2| we have
\[
\Big| \int_{E-b}^{E+b} \frac{\mathrm{d}E'}{2b} \int_{\mathbb{R}^n} \mathrm{d}\alpha\, O(\alpha)
\big( p_{t,N}^{(n)} - p_{G,N}^{(n)} \big)\Big( E' + \frac{\alpha}{N} \Big) \Big|
\le \frac{N^{3\varepsilon}}{\sqrt{bNt}}\,\|O\|_{C^1}, \tag{12.24}
\]
for any N ≥ N_0(n, ε), where p_{t,N}^{(n)} is the correlation function of H_t, or equivalently, that of \(e^{-t/2}H_0 + \sqrt{1-e^{-t}}\,H_G\).

Proof. Since H_0 is a generalized Wigner ensemble, so is \(e^{-t/2}H_0 + \sqrt{1-e^{-t}}\,H_G\) for all t. Hence the optimal rigidity estimate (11.32) holds for all times. Therefore, (12.18) and (12.22) hold with any ξ > 0. Choose ξ = ε/2 and apply Theorem 12.4. The right hand side of (12.23) is bounded by
\[
N^{\varepsilon} \Big[ \frac{N^{\varepsilon/2}}{Nb} + \frac{1}{\sqrt{bNt}} \Big] \le \frac{N^{3\varepsilon}}{\sqrt{bNt}}\,,
\]
where we have used that t ≤ N^ε and bNt ≥ 1.

The estimate (12.24) means that averaged energy universality holds in a window of size slightly bigger than (Nt)^{−1}. A large energy window of size b = N^{−δ}, for some small δ > 0, can already be achieved after a very short time, i.e., for any t ≥ N^{−1+2δ} (by choosing ε = δ/8). To achieve universality on very short energy scales, b = N^{−1+δ}, we need relatively large times, t ≥ N^{−δ/2} (by choosing ε = δ/16). In both cases the right hand side of (12.24) is bounded by CN^{−ε}. In the sense of universality on large energy windows of size b = N^{−δ}, our result essentially establishes Dyson's conjecture that the time to local equilibrium is of order N^{−1}. A better error estimate can also be achieved if both b and t are large; e.g., with the choice b = N^{−δ} and t ≥ N^{−δ}, we get convergence of order N^{−1/2+δ+ε} for any ε > 0.

Going back to Theorem 12.4, we also remark that there is considerable room in this argument even if optimal rigidity is not available (this is the case for random band matrices, see [55]). In fact, Theorem 12.4 provides universality, albeit only for relatively large times t ≥ N^{−δ} and with a large energy averaging, b = N^{−δ}, even if (12.18) and (12.22) hold only with ξ = (1−δ)/2. This restriction comes from the requirement that t ≥ N^{−1+2ξ} should be implied by t ≥ N^{−δ}. This exponent ξ slightly below 1/2 corresponds to a rigidity control on a scale just below N^{−1/2} in the bulk. Clearly, N^{−1/2} is a critical threshold; no local universality can be concluded with this argument unless a rigidity control on a scale below N^{−1/2} is established a priori.

12.4 Existence and restriction of the dynamics

This technical section was taken from Appendix A of [64]; we include it here for the reader's convenience. As in Section 12.3, we consider the Euclidean space ℝ^N with the normalized measure µ = exp(−βNH)/Z. The Hamiltonian H, of the form (12.13), is symmetric with respect to permutations of the variables x = (x_1, …, x_N), thus the measure can be restricted to the subset Σ_N ⊂ ℝ^N defined in (12.3). In this appendix we outline how to define the dynamics (12.17), with its generator formally given by \(L = \frac{1}{2N}\Delta - \frac{1}{2}(\nabla H)\cdot\nabla\),


on Σ_N. The condition β ≥ 1 and the specific factors \(\prod_{i<j}|x_j - x_i|^\beta\) will play a key role in the argument; in particular, we will see that β = 1 is the critical threshold for this method to work. We first recall the standard definition of the dynamics on ℝ^N. The quadratic form

\[
E(u, v) := \int_{\mathbb{R}^N} \nabla u \cdot \nabla v\,\mathrm{d}\mu
\]

is a closable Markovian symmetric form on L²(ℝ^N, dµ) with domain C_0^∞(ℝ^N) (see Example 1.2.1 and Theorem 3.1.3 of [74]). This form can be closed with form domain H¹(ℝ^N, dµ), defined as the closure of C_0^∞ in the norm ‖·‖²_+ = E(·,·) + ‖·‖²_2. The closure is called the Dirichlet form. It generates a strongly continuous Markovian semigroup T_t, t ≥ 0, on L² (Theorem 1.4.1 [74]) and it can be extended to a contraction semigroup on L¹(ℝ^N, dµ), ‖T_t f‖_1 ≤ ‖f‖_1 (Section 1.5 [74]). The generator L of the semigroup is defined via the Friedrichs extension (Theorem 1.3.1 [74]) and it is a positive self-adjoint operator on its natural domain D(L) with C_0^∞ as a core. The generator is given by \(L = \frac{1}{2N}\Delta - \frac{1}{2}(\nabla H)\cdot\nabla\) on its domain (Corollary 1.3.1 [74]). By the spectral theorem, T_t maps L² into D(L), thus with the notation f_t = T_t f for some f ∈ L², it holds that
\[
\partial_t f_t = L f_t, \quad t > 0, \qquad \text{and} \qquad \lim_{t\to 0+} \|f_t - f\|_2 = 0. \tag{12.25}
\]

Moreover, by approximating f by L² functions and using that T_t is a contraction in L¹ (Section 1.5 in [74]), the differential equation holds even if the initial condition f is only in L¹. In this case the convergence f_t → f, as t → 0+, holds only in L¹. We remark that T_t is also a contraction on L^∞, by duality.

Now we restrict the dynamics to Σ = Σ_N equipped with the probability measure µ_Σ(dx) := N!\,µ(dx)\,\mathbf{1}(x ∈ Σ). Here µ = µ_N is from (12.13) and the factor N! restores the normalization after the restriction to Σ. Repeating the general construction with ℝ^N replaced by Σ_N, we obtain the corresponding generator L^{(Σ)} and the semigroup T_t^{(Σ)} on the space H¹(Σ, dµ_Σ), which is defined as the closure of C_0^∞(Σ) with respect to the norm ‖·‖²_+ = E(·,·) + ‖·‖²_2 on Σ.

To establish the relation between L and L^{(Σ)}, we first define the symmetrized version of Σ,
\[
\widetilde\Sigma := \mathbb{R}^N \setminus \big\{ x : \exists\, i \ne j \text{ with } x_i = x_j \big\}.
\]

Denote X := C_0^∞(\widetilde\Sigma). The key information is that X is dense in H¹(ℝ^N, dµ), which is equivalent to the density of X in C_0^∞(ℝ^N, dµ). We will check this property below. Then the general argument above directly applies if ℝ^N is replaced by \widetilde\Sigma, and it shows that the generator L is the same (with the same domain) if we start from X instead of C_0^∞(ℝ^N, dµ) as a core.

Note that both L and L^{(Σ)} are local operators and L is symmetric with respect to permutations of the variables. For any function f defined on Σ, we denote its symmetric extension onto \widetilde\Sigma by \widetilde f. Clearly, L\widetilde f = L^{(Σ)} f for any f ∈ C_0^∞(Σ). Since the generator is uniquely determined by its action on its core, and the generator uniquely determines the dynamics, we see that for any f ∈ L¹(Σ, dµ_Σ), one can determine T_t^{(Σ)} f by computing T_t \widetilde f and restricting it to Σ. In other words, the dynamics (12.17) is well defined when restricted to Σ = Σ_N.

Finally, we have to prove the density of X in C_0^∞(ℝ^N, dµ), i.e., to show that if f ∈ C_0^∞(ℝ^N), then there exists a sequence f_n ∈ C_0^∞(\widetilde\Sigma) such that E(f − f_n, f − f_n) → 0. The structure of \widetilde\Sigma is complicated since, in addition to the codimension-one coalescence hyperplanes {x_i = x_j} (and {x_i = 0} in the case of Σ^+), it contains higher order coalescence subspaces with higher codimensions. We will show the approximation argument in a neighborhood of a point x such that x_i = x_j but x_i ≠ x_k for any other k ≠ i, j. The proof uses the fact that the measure dµ vanishes at least to first order, i.e., at least as |x_i − x_j|, around x, thanks to β ≥ 1. This is the critical case; the argument near higher order coalescence points is even easier, since they have higher codimension and the measure µ vanishes at even higher order.

exists a sequence fn ∈ C∞0 (Σ) such that E(f − fn, f − fn) → 0. The structure of Σ is complicated since inaddition to the one codimensional coalescence hyperplanes xi = xj (and xi = 0 in case of Σ+), it containshigher order coalescence subspaces with higher codimensions. We will show the approximation argument ina neighborhood of a point x such that xi = xj but xi 6= xk for any other k 6= i, j. The proof uses the factthat the measure dµ vanishes at least to first order, i.e., at least as |xi − xj |, around x, thanks to β > 1.This is the critical case; the argument near higher order coalescence points is even easier, since they havelower codimension and the measure µ vanishes at even higher order.

In a neighborhood of x we can change to local coordinates such that r := x_i − x_j remains the only relevant coordinate. Thus the task is equivalent to showing that any g ∈ C_0^∞(ℝ) can be approximated by a sequence g_ε ∈ C_0^∞(ℝ \ {0}) in the sense that
\[
\int_{\mathbb{R}} |g'(r) - g_\varepsilon'(r)|^2\, |r|\,\mathrm{d}r \to 0 \tag{12.26}
\]


as ε → 0 over a discrete set, e.g., ε = ε_n = 1/n. It is sufficient to consider only the positive semi-axis, i.e., r > 0. More precisely, define

\[
g_\varepsilon(r) := g(r)\,\varphi_\varepsilon(r), \tag{12.27}
\]

where φ_ε is a continuous cutoff function defined as φ_ε(r) = 0 for r ≤ ε², φ_ε(r) = 1 for r ≥ ε, and
\[
\varphi_\varepsilon(r) = C_\varepsilon(\log r - 2\log\varepsilon), \quad \varepsilon^2 \le r \le \varepsilon, \qquad
C_\varepsilon = \frac{1}{\log\varepsilon - \log\varepsilon^2} = |\log\varepsilon|^{-1}. \tag{12.28}
\]

A simple calculation shows that
\[
\int_0^\infty |g'(r) - g_\varepsilon'(r)|^2\, r\,\mathrm{d}r
\le C \int_0^\infty |g'(r)|^2\, |1 - \varphi_\varepsilon(r)|^2\, r\,\mathrm{d}r
+ C \int_0^\infty |g(r)|^2\, |\varphi_\varepsilon'(r)|^2\, r\,\mathrm{d}r. \tag{12.29}
\]

The first term converges to zero as ε → 0 by dominated convergence, since g ∈ H¹(ℝ_+, r dr) and φ_ε → 1 pointwise. The second term is bounded by
\[
\int_0^\infty |g(r)|^2\, |\varphi_\varepsilon'(r)|^2\, r\,\mathrm{d}r
\le C_\varepsilon^2\, \|g\|_\infty^2 \int_{\varepsilon^2}^{\varepsilon} \Big|\frac{1}{r}\Big|^2 r\,\mathrm{d}r
= \frac{\|g\|_\infty^2}{|\log\varepsilon|}\,,
\]
which also tends to zero since g ∈ L^∞. This shows that g_ε converges to g in H¹. The function g_ε is not yet smooth, but we can smooth it on a scale δ ≪ ε² by taking the convolution with a smooth function η_δ compactly supported in [−δ, δ]. It is an elementary exercise to see that η_δ ⋆ g_ε converges to g_ε in H¹ as δ → 0. This verifies (12.26). The same proof shows that C_0^∞(ℝ_+ \ {0}) is dense in H¹(ℝ_+, r dr) ∩ L^∞(ℝ_+, r dr). This completes the proof that for β ≥ 1 the dynamics is well defined in the space H¹(Σ, dµ_Σ).
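The mechanism behind the logarithmic cutoff (12.28) is that its Dirichlet energy \(C_\varepsilon^2\int_{\varepsilon^2}^{\varepsilon} r^{-1}\,\mathrm{d}r = |\log\varepsilon|^{-1}\) tends to zero. This can be seen numerically; a minimal sketch (grid sizes are arbitrary choices):

```python
import numpy as np

def cutoff_energy(eps, n=20000):
    """Trapezoid approximation of int (phi_eps')^2 r dr over [eps^2, eps]."""
    C = 1.0 / abs(np.log(eps))               # C_eps from (12.28)
    r = np.logspace(np.log10(eps**2), np.log10(eps), n)
    integrand = C**2 / r                     # (C/r)^2 * r, since phi' = C/r here
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r)))

for eps in [1e-2, 1e-4, 1e-8]:
    print(eps, cutoff_energy(eps), 1.0 / abs(np.log(eps)))   # last two columns agree
```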

In fact, the boundedness condition g ∈ L^∞(ℝ_+, r dr) can be removed in the above argument, and one can show that C_0^∞(ℝ_+ \ {0}) is dense in H¹(ℝ_+, r dr). This is a type of Meyers–Serrin theorem concerning the equivalence of two possible definitions of Sobolev spaces on the measure space (ℝ_+, r dr): the closure of C_0^∞ functions coincides with the set of functions with finite H¹-norm. Another formulation is the following: if we extend the functions on ℝ_+ to two-dimensional radial functions, i.e., G(x) := g(|x|), G_ε(x) := g_ε(|x|) for x ∈ ℝ², then this statement is equivalent to the fact that a point in two dimensions has zero capacity.

To see this stronger claim, we need a different cutoff function in (12.27). Let us now define
\[
\varphi_\varepsilon(r) := \log(a + b\log r), \qquad \varepsilon^2 \le r \le \varepsilon,
\]

with b := (1 − e)/\log\varepsilon, a := 2e − 1, and φ_ε(r) := 0 for r < ε², φ_ε(r) := 1 for r > ε. The first term on the right hand side of (12.29) goes to zero as before. For the last term in (12.29), we have
\[
\int_0^\infty |g(r)|^2\, |\varphi_\varepsilon'(r)|^2\, r\,\mathrm{d}r
= b^2 \int_{\varepsilon^2}^{\varepsilon} \frac{|g(r)|^2}{r^2 (a + b\log r)^2}\, r\,\mathrm{d}r.
\]

We view this as an integral on ℝ², by radially extending the function g to ℝ² via G(y) := g(|y|) for any y ∈ ℝ². Then the integral above becomes
\[
b^2 \int_{\varepsilon^2}^{\varepsilon} \frac{|g(r)|^2}{r^2 (a + b\log r)^2}\, r\,\mathrm{d}r
= b^2 \int_{\varepsilon^2 \le |y| \le \varepsilon} \frac{|G(y)|^2}{|y|^2 (a + b\log|y|)^2}\,\mathrm{d}y
\le C \int_{\varepsilon^2 \le |y| \le \varepsilon} \frac{|G(y)|^2}{|y|^2 (\log|y|)^2}\,\mathrm{d}y.
\]

Define the sequence \(\varepsilon_k := 2^{-2^k}\); clearly \(\varepsilon_k^2 = \varepsilon_{k+1}\). Setting
\[
I_k := \int_{\varepsilon_{k+1} \le |y| \le \varepsilon_k} \frac{|G(y)|^2}{|y|^2 (\log|y|)^2}\,\mathrm{d}y,
\]

we clearly have
\[
\sum_{k=1}^\infty I_k = \int_{|y| \le 1/2} \frac{|G(y)|^2}{|y|^2 (\log|y|)^2}\,\mathrm{d}y
\le C \int_{\mathbb{R}^2} |\nabla G|^2 = C' \int_0^\infty |g'(r)|^2\, r\,\mathrm{d}r,
\]


where in the last step we used the critical Hardy inequality with logarithmic correction (see, e.g., Theorem 2.8 in [48]). Since in our case g ∈ H¹(ℝ_+, r dr), from the last displayed inequality we have I_k → 0 as k → ∞. Therefore, we can use the cutoff function φ_{ε_k} to prove the claim.

The Meyers–Serrin theorem we just proved for (ℝ_+, r dr) can be extended to (ℝ^N, dµ) using local changes of coordinates if β ≥ 1. Thus one may prove that the form domain of the operator L^{(Σ)} is H¹(Σ, dµ_Σ) and the dynamics (12.25) is well defined on this space.


13 Entropy and the logarithmic Sobolev inequality (LSI)

13.1 Basic properties of the entropy

In this section, we prove a few well-known inequalities for the entropy which will be used in the next section for the logarithmic Sobolev inequality (LSI). We work on an arbitrary probability space (Ω, B) with an appropriate σ-algebra. We will use probabilistic and analytic notations and concepts in parallel. In particular, we interchangeably use the concepts of random variables and measurable functions on Ω. Similarly, both the expectation E_µ[f] and the integral notation ∫_Ω f dµ will be used. We will drop the Ω in the notation.

Definition 13.1. For any two probability measures ν, µ on the same probability space, we define the relative entropy of ν w.r.t. µ by
\[
S(\nu|\mu) := \int \log\frac{\mathrm{d}\nu}{\mathrm{d}\mu}\,\mathrm{d}\nu
= \int \frac{\mathrm{d}\nu}{\mathrm{d}\mu}\, \log\frac{\mathrm{d}\nu}{\mathrm{d}\mu}\,\mathrm{d}\mu \tag{13.1}
\]
if ν is absolutely continuous with respect to µ, and we set S(ν|µ) := ∞ otherwise.

Applying the definition (13.1) to ν = fµ for any probability density f with ∫ f dµ = 1, we may also write
\[
S_\mu(f) := S(f\mu|\mu) = \int f (\log f)\,\mathrm{d}\mu.
\]

In most cases, the reference measure µ will be canonical (e.g., the natural equilibrium measure), so we will often drop the subscript µ if there is no confusion. We may then call S_µ(f) = S(f) the entropy of f. Using the convexity of the function x ↦ x log x on ℝ_+, a simple Jensen inequality,
\[
0 = \Big( \int f\,\mathrm{d}\mu \Big) \log\Big( \int f\,\mathrm{d}\mu \Big) \le \int f \log f\,\mathrm{d}\mu,
\]
shows that the relative entropy is always non-negative, S(ν|µ) ≥ 0.

Proposition 13.2 (Entropy inequality, or Gibbs inequality). Let µ, ν be two probability measures on a common probability space and let X be a random variable. Then, for any α > 0, the following inequality holds as long as the right hand side is finite:
\[
\mathbb{E}_\nu[X] \le \alpha^{-1} S(\nu|\mu) + \alpha^{-1} \log \mathbb{E}_\mu\, e^{\alpha X}. \tag{13.2}
\]

Proof. First note that we only have to prove the case α = 1, since we can redefine αX → X. From the concavity of the logarithm and Jensen's inequality applied to the measure ν, we have
\[
\int X\,\mathrm{d}\nu - S(\nu|\mu)
= \int \log\Big[ e^X \frac{\mathrm{d}\mu}{\mathrm{d}\nu} \Big]\mathrm{d}\nu
\le \log \int e^X \frac{\mathrm{d}\mu}{\mathrm{d}\nu}\,\mathrm{d}\nu
= \log \mathbb{E}_\mu\, e^X,
\]
and this proves (13.2).

Although (13.2) is just an inequality, there is a variational characterization of the relative entropy behind it. Namely,
\[
S(\nu|\mu) = \sup_X \big[ \mathbb{E}_\nu[X] - \log \mathbb{E}_\mu\, e^X \big], \tag{13.3}
\]
where the supremum is over all bounded random variables. As we will not need this relation in this book, we leave it to the interested reader to prove it.

As a corollary of (13.2) we mention that for any set A we have the bound
\[
\mathbb{P}_\nu(A) \le \frac{\log 2 + S(\nu|\mu)}{\log \frac{1}{\mathbb{P}_\mu(A)}}. \tag{13.4}
\]


The proof is left as an exercise. This inequality will be used in the following context. Suppose that the relative entropy of ν with respect to µ is finite. Then, in order to show that a set has a small probability w.r.t. ν, we need to verify that this set is exponentially small w.r.t. µ. In this sense, entropy provides only a relatively weak link between two measures. However, it is still stronger than the total variation norm, as we now show.

For any two probability measures fµ and µ, the L^p distance between fµ and µ is defined by
\[
\Big[ \int |f - 1|^p\,\mathrm{d}\mu \Big]^{1/p}. \tag{13.5}
\]

When p = 1, it is called the total variation norm between fµ and µ. Entropy is a weaker measure of distance between two probability measures than the L^p distance for any p > 1, but stronger than the total variation norm. The former statement can be expressed, for example, by the elementary inequality
\[
\int f \log f\,\mathrm{d}\mu \le 2\Big[ \int |f-1|^p\,\mathrm{d}\mu \Big]^{1/p}
+ \frac{2}{p-1} \int |f-1|^p\,\mathrm{d}\mu, \qquad p > 1, \tag{13.6}
\]

whose proof is left as an exercise. (There is a more natural way to compare the L^p distance and the relative entropy in terms of the L log L Orlicz norm, but we will not use this norm in this book.) The latter statement will be made precise in the following proposition. Furthermore, we also remark the following easy relation between L^p norms and the entropy:
\[
\frac{\mathrm{d}}{\mathrm{d}p}\Big|_{p=1} \Big[ \int f^p\,\mathrm{d}\mu \Big]^{1/p} = \int f \log f\,\mathrm{d}\mu \tag{13.7}
\]
holds for any probability density f w.r.t. µ.
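The derivative relation (13.7) can be verified by a central difference on a finite probability space; a minimal sketch (all values are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5
mu = rng.random(n); mu /= mu.sum()
f = rng.random(n)
f /= np.sum(mu * f)                          # probability density: int f dmu = 1

def lp_norm(p):
    return np.sum(mu * f**p) ** (1.0 / p)    # [int f^p dmu]^{1/p}

h = 1e-5
numeric = (lp_norm(1 + h) - lp_norm(1 - h)) / (2 * h)   # d/dp at p = 1
entropy = np.sum(mu * f * np.log(f))                    # int f log f dmu
print(numeric, entropy)                      # the two agree
```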

Proposition 13.3 (Entropy and total variation norm; Pinsker's inequality). Suppose that ∫ f dµ = 1 and f ≥ 0. Then we have
\[
\frac{1}{2}\Big[ \int |f - 1|\,\mathrm{d}\mu \Big]^2 \le \int f \log f\,\mathrm{d}\mu. \tag{13.8}
\]

Proof. By a variational principle, we first rewrite
\[
\int |f - 1|\,\mathrm{d}\mu = \sup_{|g| \le 1} \Big[ \int f g\,\mathrm{d}\mu - \int g\,\mathrm{d}\mu \Big]. \tag{13.9}
\]

For any such function g, we have by the entropy inequality (13.2) that for any t > 0
\[
\int f g\,\mathrm{d}\mu - \int g\,\mathrm{d}\mu
\le t^{-1} \log \int e^{tg}\,\mathrm{d}\mu - \int g\,\mathrm{d}\mu + t^{-1} \int f \log f\,\mathrm{d}\mu. \tag{13.10}
\]

We now define the function
\[
h(t) := \log \int e^{tg}\,\mathrm{d}\mu
\]
for any t ≥ 0. A simple calculation shows that the second derivative of h is given by
\[
h''(t) = \langle g; g\rangle_{\omega_t}\,, \qquad \omega_t := \frac{e^{tg}\mu}{\int e^{tg}\,\mathrm{d}\mu}\,,
\]

where ω_t is a probability measure and
\[
\langle f; g\rangle_\omega := \int f g\,\mathrm{d}\omega - \Big( \int f\,\mathrm{d}\omega \Big)\Big( \int g\,\mathrm{d}\omega \Big)
\]
denotes the covariance. Recall that the covariance is positive definite, i.e., it satisfies the usual Schwarz inequality,
\[
|\langle f; g\rangle_\omega|^2 \le \langle f; f\rangle_\omega\, \langle g; g\rangle_\omega.
\]


Since $|g|\le 1$, we have $0\le\langle g; g\rangle_{\omega_t}\le 1$, so $h''(t)\le 1$. Thus by Taylor's theorem, $h(t)\le h(0)+th'(0)+t^2/2$, i.e.,
\[
t^{-1}\log\int e^{tg} d\mu \le \int g \, d\mu + t/2 .
\]
Together with (13.10), we have
\[
\int fg \, d\mu - \int g \, d\mu \le \frac{t}{2} + t^{-1}\int f\log f \, d\mu \le \sqrt{2\int f\log f \, d\mu}\,,
\]
where we optimized $t$ in the last step. Since this bound holds for any $g$ with $|g|\le 1$, using (13.9) we have proved (13.8).
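The Pinsker inequality (13.8) can be illustrated numerically (an illustrative check, not from the book); the three-point measure and density below are arbitrary choices.

```python
import math

# Discrete check of the Pinsker inequality (13.8):
#   [ int |f-1| dmu ]^2  <=  2 * int f log f dmu.
mu = [0.2, 0.3, 0.5]
f  = [2.0, 1.0, 0.6]     # density w.r.t. mu: 0.2*2 + 0.3*1 + 0.5*0.6 = 1

total = sum(m * x for x, m in zip(f, mu))                  # normalization
l1 = sum(m * abs(x - 1) for x, m in zip(f, mu))            # total variation norm
S  = sum(m * x * math.log(x) for x, m in zip(f, mu))       # relative entropy
```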

13.2 Entropy on product spaces and conditioning

Now we consider a product measure, i.e., we assume that the probability space has a product structure, $\Omega=\Omega_1\times\Omega_2$ and $\mu=\mu_1\otimes\mu_2$, $\nu=\nu_1\otimes\nu_2$, where $\mu_j$ and $\nu_j$ are probability measures on $\Omega_j$, $j=1,2$. For simplicity, we denote the elements of $\Omega$ by $(x,y)$ with $x\in\Omega_1$, $y\in\Omega_2$. Writing $\nu_j=f_j\mu_j$, clearly we have
\[
S(\nu_1\otimes\nu_2|\mu_1\otimes\mu_2) = S_{\mu_1\otimes\mu_2}(f_1\otimes f_2) = \iint_{\Omega_1\times\Omega_2} f_1(x)f_2(y)\big[\log f_1(x)+\log f_2(y)\big]\,\mu_1(dx)\,\mu_2(dy)
\]
\[
= S_{\mu_1}(f_1)+S_{\mu_2}(f_2) = S(\nu_1|\mu_1)+S(\nu_2|\mu_2), \tag{13.11}
\]
i.e., the entropy is additive for product measures.

This property of the entropy makes it an especially suitable tool to measure closeness in very high dimensional analysis. For example, if the probability space is a large product space $\Omega=\Omega_1^{\otimes N}$ equipped with two product measures $\mu=\mu_1^{\otimes N}$ and $\nu=f^{\otimes N}\mu$, then their relative entropy
\[
S_\mu(f^{\otimes N}) = N S_{\mu_1}(f) \tag{13.12}
\]
grows only linearly in $N$, the number of degrees of freedom. It is easy to check that any $L^p$ distance (13.5) for $p>1$ typically grows exponentially in $N$, which usually renders it useless. Thus the entropy is still stronger than the $L^1$ norm, but its growth in $N$ is much more manageable. An additional advantage is that the entropy is often easier to compute than any other $L^p$-norm (with the exception of $L^2$).

Next we discuss how the entropy decomposes under conditioning. For simplicity, we assume the product structure $\Omega=\Omega_1\times\Omega_2$ as before. Let $\omega$ be a probability measure on the product space $\Omega$. For any integrable function (random variable) $u(x,y)$ on $\Omega$ we denote its conditional expectation by
\[
\bar u = E_\omega[u\,|\,\mathcal{F}_1], \tag{13.13}
\]
where $\mathcal{F}_1$ is the $\sigma$-algebra of $\Omega_1$ canonically lifted to $\Omega$. Note that $\bar u=\bar u(x)$ depends only on the $x$ variable.

The conditional expectation $\bar u$ may also be characterized by the following relation: it is the unique measurable function on $\Omega_1$ such that for any bounded measurable (w.r.t. $\mathcal{F}_1$) function $O(x)$ the following identity holds:
\[
E_\omega\big[\bar u\,O\big] = E_\omega\big[u\,O\big].
\]
Written out in coordinates, it means that
\[
\iint_\Omega \bar u(x)\,O(x)\,\omega(dx\,dy) = \iint_\Omega u(x,y)\,O(x)\,\omega(dx\,dy). \tag{13.14}
\]
In particular, if $\omega(dx\,dy)=\omega(x,y)\,dx\,dy$ is absolutely continuous with respect to some reference product measure $dx\,dy$ on $\Omega$, then we have, somewhat formally,
\[
\bar u(x) = \frac{\int_{\Omega_2} u(x,y)\,\omega(x,y)\,dy}{\int_{\Omega_2}\omega(x,y)\,dy}. \tag{13.15}
\]


The notation $dx\,dy$ for the reference measure already indicates that in our applications $\Omega_j$ will be Euclidean spaces $\mathbb{R}^{n_j}$ of some dimension $n_j$, and the reference measure will be the Lebesgue measure.

We remark that the concept of conditioning can be defined in full generality with respect to any sub-$\sigma$-algebra; the product structure of $\Omega$ is not essential. However, we will not need the general definition in this book, and the above definition is conceptually simpler.

The conditional expectation gives rise to a trivial martingale decomposition:
\[
u = \bar u + (u-\bar u), \tag{13.16}
\]
where $\bar u$ is $\mathcal{F}_1$-measurable, while $u-\bar u$ has zero expectation on any $\mathcal{F}_1$-measurable set. Subtracting the expectation $E_\omega u = E_\omega\bar u$ and squaring this formula, we have the martingale decomposition of the variance of $u$:
\[
\mathrm{Var}_\omega(u) := E_\omega(u-E_\omega u)^2 = E_\omega(u-\bar u)^2 + E_\omega(\bar u-E_\omega u)^2 = E_\omega\,\overline{\mathrm{Var}}(u(x,\cdot)) + \mathrm{Var}_\omega(\bar u), \tag{13.17}
\]
where we defined the conditional variance
\[
\overline{\mathrm{Var}}(u(x,\cdot)) := E\big[(u-\bar u)^2\,\big|\,\mathcal{F}_1\big] = E\big[u^2\,\big|\,\mathcal{F}_1\big] - [\bar u]^2 .
\]
The identity (13.17) is elementary, but its interpretation is important. It means that the variance is additive w.r.t. the martingale decomposition (13.16). The first term $E_\omega\,\overline{\mathrm{Var}}(u(x,\cdot))$ is the expectation of the variance w.r.t. $y$ conditioned on $x$; the second term $\mathrm{Var}_\omega(\bar u)$ is the variance of the marginal w.r.t. $x$. In other words, we can compute the variance in two steps: first in $y$ for fixed $x$, then in $x$.

The martingale decomposition has an analogue for the entropy. For simplicity, we assume that $\omega$ has a density $\omega(x,y)$ w.r.t. a reference measure $dx\,dy$. Denote by $\bar\omega$ the marginal probability density on $\Omega_1$,
\[
\bar\omega(x) = \int_{\Omega_2}\omega(x,y)\,dy,
\]
and by $\bar f$ the marginal density of $f\omega$ w.r.t. $\bar\omega$, i.e.,
\[
\bar f(x) = \frac{\int_{\Omega_2} f(x,y)\,\omega(x,y)\,dy}{\bar\omega(x)}.
\]
In particular, for any test function $O(x)$ we have
\[
\iint_\Omega O(x)\,f(x,y)\,\omega(x,y)\,dx\,dy = \int_{\Omega_1} O(x)\,\bar f(x)\,\bar\omega(x)\,dx. \tag{13.18}
\]
Let
\[
\omega^x(y) := \frac{\omega(x,y)}{\bar\omega(x)}
\]
be the probability density on $\Omega_2$ conditioned on a fixed $x\in\Omega_1$. Define
\[
f^x(y) := \frac{f(x,y)}{\bar f(x)} \tag{13.19}
\]
to be the corresponding density of the measure $f\omega$ conditioned on a fixed $x\in\Omega_1$. Note that with these definitions we have
\[
(f\omega)^x(y) = \frac{f(x,y)\,\omega(x,y)}{\int_{\Omega_2} f(x,y)\,\omega(x,y)\,dy} = f^x(y)\,\omega^x(y).
\]

Now we are ready to state the following proposition on the additivity of entropy w.r.t. the martingale decomposition:


Proposition 13.4. With the notations above, we have
\[
S_\omega(f) = E^{\bar f\bar\omega} S_{\omega^x}(f^x) + S_{\bar\omega}(\bar f). \tag{13.20}
\]
In particular, the marginal entropy is bounded by the total entropy, i.e.,
\[
S_{\bar\omega}(\bar f) \le S_\omega(f). \tag{13.21}
\]
In other words, the entropy is additive w.r.t. the martingale decomposition, i.e., the entropy is the sum of the expectation of the entropy in $y$ conditioned on $x$ and the entropy of the marginal in $x$. The additivity of entropy in this sense is an important tool in the application of the LSI, as we will demonstrate shortly. It also indicates that entropy is an extensive quantity, i.e., the entropy of two probability measures on a product space $\Omega^N$ is often of order $N$, generalizing the formulas (13.11)–(13.12) to non-product measures.

Proof. We can decompose the entropy as follows:
\[
S_\omega(f) = \int_\Omega f\log f\,d\omega = \int_\Omega f\log(f/\bar f)\,d\omega + \int_\Omega f\log\bar f\,d\omega = \int_\Omega f\log(f/\bar f)\,d\omega + \int_{\Omega_1}\bar f\log\bar f\,d\bar\omega, \tag{13.22}
\]
where in the last step we used (13.18). The first term we can rewrite as
\[
\int f\log(f/\bar f)\,\omega(x,y)\,dx\,dy = \int dx\,\bar\omega(x)\bar f(x)\Big[\int f^x(y)\log f^x(y)\,\omega^x(y)\,dy\Big] = E^{\bar f\bar\omega} S_{\omega^x}(f^x). \tag{13.23}
\]
We have thus proved (13.20); the bound (13.21) follows since the first term in (13.20) is nonnegative.
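In a discrete setting the decomposition (13.20) is an exact identity and can be checked directly; the reference measure and density below are arbitrary illustrative choices.

```python
import math

# Check the entropy decomposition (13.20) for discrete measures on {0,1}x{0,1}.
omega = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.2}    # reference
nu    = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.2, (1, 1): 0.3}  # nu = f * omega
f = {s: nu[s] / omega[s] for s in omega}

S_total = sum(omega[s] * f[s] * math.log(f[s]) for s in omega)

# Marginals in x and the conditional densities f^x(y) = f(x,y)/fbar(x)
omega_x = {x: omega[(x, 0)] + omega[(x, 1)] for x in (0, 1)}
nu_x    = {x: nu[(x, 0)] + nu[(x, 1)] for x in (0, 1)}
fbar    = {x: nu_x[x] / omega_x[x] for x in (0, 1)}

S_marg = sum(omega_x[x] * fbar[x] * math.log(fbar[x]) for x in (0, 1))
# E^{fbar omegabar} S_{omega^x}(f^x): conditional entropy averaged over nu's marginal
S_cond = sum(nu_x[x] * sum((omega[(x, y)] / omega_x[x])
                           * (f[(x, y)] / fbar[x])
                           * math.log(f[(x, y)] / fbar[x]) for y in (0, 1))
             for x in (0, 1))
```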

13.3 Logarithmic Sobolev inequality

The logarithmic Sobolev inequality requires that the underlying probability space admits a concept of differentiation. While the theory can be developed for more general spaces, we restrict our attention to the probability space $\Omega=\mathbb{R}^N$ in this subsection. We will work with probability measures $\mu$ on $\mathbb{R}^N$ that are defined by a Hamiltonian $H$:
\[
d\mu(x) = \frac{e^{-H(x)}}{Z}\,dx, \tag{13.24}
\]
where $Z$ is a normalization factor. In Section 13.7 we will comment on how to extend the results of this section to certain subsets of $\mathbb{R}^N$, most importantly, to the simplex $\Sigma_N = \{x\in\mathbb{R}^N : x_1 < x_2 < \ldots < x_N\}$.

Note that for simplicity in this presentation we neglect the $\beta N$ prefactor compared with (12.13). Let $L$ be the generator of the dynamics associated with the Dirichlet form
\[
D(f) = D_\mu(f) = -\int f(Lf)\,d\mu := \int|\nabla f|^2 d\mu = \sum_{j=1}^N\int(\partial_j f)^2 d\mu, \qquad \partial_j = \partial_{x_j}. \tag{13.25}
\]
Formally we have $L = \Delta - (\nabla H)\cdot\nabla$. The operator $L$ is symmetric with respect to the measure $d\mu$, i.e.,
\[
\int f(Lg)\,d\mu = \int (Lf)g\,d\mu = -\int\nabla f\cdot\nabla g\,d\mu. \tag{13.26}
\]
We remark that in many books on probability, e.g. [43], the Dirichlet form (13.25) is defined with a factor $1/2$, but this convention is not compatible with the $1/(\beta N)$ prefactor in (12.14). The lack of this $1/2$ factor in (13.25) causes slight deviations from the customary form of the following theorems.

Definition 13.5. The probability measure $\mu$ on $\mathbb{R}^N$ satisfies the logarithmic Sobolev inequality (LSI) if there exists a constant $\gamma$ such that
\[
S(f) = \int f\log f\,d\mu \le \gamma\int|\nabla\sqrt f|^2 d\mu = \gamma D(\sqrt f) \tag{13.27}
\]
holds for any smooth density function $f\ge 0$ with $\int f\,d\mu = 1$. The smallest such $\gamma$ is called the logarithmic Sobolev inequality constant of the measure $\mu$.


A simple density argument shows that (13.27), stated for smooth densities, extends to any nonnegative $f\in C_0^\infty(\mathbb{R}^N)$, the space of smooth functions with compact support. In fact, it is easy to extend it to all $f$ with $\sqrt f\in H^1(d\mu)$. Since we will not use the space $H^1(d\mu)$ in this book, we will just use $f\in C_0^\infty(\mathbb{R}^N)$ in the LSI.

Theorem 13.6 (Bakry–Émery [13]). Consider a probability measure $\mu$ on $\mathbb{R}^N$ of the form (13.24), i.e., $\mu = e^{-H}/Z$. Suppose that a convexity bound holds for the Hamiltonian, i.e., with some positive constant $K$ we have
\[
\nabla^2 H(x) \ge K \tag{13.28}
\]
for any $x$ (in the sense of quadratic forms). Then the logarithmic Sobolev inequality (13.27) holds with an LSI constant $\gamma\le 2/K$, i.e.,
\[
S(f) \le \frac{2}{K}D(\sqrt f), \qquad \text{for any density } f \text{ with } \int f\,d\mu = 1. \tag{13.29}
\]
Furthermore, the dynamics
\[
\partial_t f_t = Lf_t, \qquad t>0, \tag{13.30}
\]
relaxes to equilibrium on the time scale $t\gg 1/K$ both in the sense of entropy and Dirichlet form:
\[
S(f_t) \le e^{-2tK}S(f_0), \qquad D(\sqrt{f_t}) \le \frac{2}{t}e^{-tK}S(f_0). \tag{13.31}
\]

Proof. Let $f_t$ be the solution to the evolution equation (13.30) with a given smooth initial condition $f_0$. A simple calculation shows that the derivative of the entropy $S(f_t)$ is given by
\[
\partial_t S(f_t) = \int(Lf_t)\log f_t\,d\mu + \int f_t\frac{Lf_t}{f_t}\,d\mu = -\int\frac{(\nabla f_t)^2}{f_t}\,d\mu = -4D(\sqrt{f_t}), \tag{13.32}
\]
where we used that $\int Lf_t\,d\mu = 0$ by (13.26). Similarly, we can compute the evolution of the Dirichlet form. Let $h_t := \sqrt{f_t}$ for simplicity; then
\[
\partial_t h_t = \frac{1}{2h_t}\partial_t h_t^2 = \frac{1}{2h_t}Lh_t^2 = Lh_t + \frac{1}{h_t}(\nabla h_t)^2.
\]
We compute (dropping the $t$ subscript for brevity)
\[
\partial_t D(\sqrt{f_t}) = \partial_t\int(\nabla h)^2 d\mu = 2\int\nabla h\cdot\nabla\partial_t h\,d\mu
\]
\[
= 2\int(\nabla h)\cdot(\nabla Lh)\,d\mu + 2\int(\nabla h)\cdot\nabla\frac{(\nabla h)^2}{h}\,d\mu
\]
\[
= 2\int(\nabla h)\cdot[\nabla,L]h\,d\mu + 2\int(\nabla h)\cdot L(\nabla h)\,d\mu + 2\int\sum_{ij}\partial_i h\Big[\frac{2(\partial_j h)\,\partial_i\partial_j h}{h} - \frac{(\partial_j h)^2\,\partial_i h}{h^2}\Big]d\mu
\]
\[
= -2\int(\nabla h)\cdot(\nabla^2 H)\nabla h\,d\mu - 2\int\sum_{ij}(\partial_i\partial_j h)^2 d\mu + 2\int\sum_{ij}\Big[\frac{2(\partial_j h)(\partial_i h)\,\partial_{ij}h}{h} - \frac{(\partial_j h)^2(\partial_i h)^2}{h^2}\Big]d\mu
\]
\[
= -2\int(\nabla h)\cdot(\nabla^2 H)\nabla h\,d\mu - 2\int\sum_{ij}\Big(\partial_{ij}h - \frac{(\partial_i h)(\partial_j h)}{h}\Big)^2 d\mu, \tag{13.33}
\]
where we used the commutator identity
\[
[\nabla, L] = -(\nabla^2 H)\nabla.
\]
Therefore, under the convexity condition (13.28), we have
\[
\partial_t D(\sqrt{f_t}) \le -2K D(\sqrt{f_t}). \tag{13.34}
\]


Integrating (13.34), we have
\[
D(\sqrt{f_t}) \le e^{-2tK}D(\sqrt{f_0}). \tag{13.35}
\]
This shows that equilibrium is achieved at $t=\infty$ with $f_\infty=1$, where both the entropy and the Dirichlet form vanish. Integrating (13.32) from $t=0$ to $t=\infty$, and using (13.35), we obtain
\[
-S(f_0) = -4\int_0^\infty D(\sqrt{f_t})\,dt \ge -4D(\sqrt{f_0})\int_0^\infty e^{-2tK}dt = -\frac{2}{K}D(\sqrt{f_0}). \tag{13.36}
\]
This proves the LSI (13.29) for any smooth function $f=f_0$. By a standard density argument, it can be extended to any function $f$ with finite Dirichlet form, $D(\sqrt f)<\infty$. For functions with unbounded Dirichlet form we interpret the LSI (13.29) as a tautology. In particular, (13.29) holds for $f_t$ as well, $S(f_t)\le(2/K)D(\sqrt{f_t})$. Inserting this back into (13.32), we have
\[
\partial_t S(f_t) \le -2K S(f_t).
\]
Integrating this inequality from time zero, we obtain the exponential relaxation of the entropy on the time scale $t\gg 1/K$:
\[
S(f_t) \le e^{-2tK}S(f_0). \tag{13.37}
\]
Finally, we can integrate (13.32) from time $t/2$ to $t$ to get
\[
S(f_t) - S(f_{t/2}) = -4\int_{t/2}^t D(\sqrt{f_\tau})\,d\tau.
\]
Using the positivity of the entropy $S(f_t)\ge 0$ on the left side and the monotonicity of the Dirichlet form (from (13.34)) on the right side, we get
\[
D(\sqrt{f_t}) \le \frac{2}{t}S(f_{t/2}), \tag{13.38}
\]
thus, using (13.37), we obtain exponential relaxation of the Dirichlet form on the time scale $t\gg 1/K$:
\[
D(\sqrt{f_t}) \le \frac{2}{t}e^{-tK}S(f_0).
\]

Standard example. Let $\mu$ be the centered Gaussian measure with variance $a^2$ on $\mathbb{R}^n$, i.e.,
\[
d\mu(x) = \frac{1}{(2\pi a^2)^{n/2}}e^{-x^2/2a^2}dx. \tag{13.39}
\]
Written in the form $e^{-H}/Z$ with the quadratic Hamiltonian $H(x)=x^2/2a^2$, we see that (13.28) holds with $K=a^{-2}$. Then (13.29) with the replacement $f\to f^2$ yields that
\[
\int f^2\log f^2\,d\mu \le 2a^2\int(\nabla f)^2 d\mu \tag{13.40}
\]
for any $f$ normalized as $\int f^2 d\mu = 1$. The constant in this inequality is optimal.
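The optimality of the constant in the Gaussian LSI can be illustrated numerically (a hedged sketch, not from the book): for $a=1$ and the exponential density $f = e^{\lambda x - \lambda^2/2}$ w.r.t. the standard Gaussian, the entropy and $\gamma D(\sqrt f) = 2D(\sqrt f)$ both equal $\lambda^2/2$, so (13.29) is saturated. The quadrature routine below is a simple trapezoid rule.

```python
import math

lam = 0.7

def integrate(g, lo=-12.0, hi=12.0, n=48000):
    # composite trapezoid rule; the integrands decay like Gaussians, so the
    # truncation at +/-12 is negligible
    h = (hi - lo) / n
    tot = 0.5 * (g(lo) + g(hi))
    for k in range(1, n):
        tot += g(lo + k * h)
    return tot * h

gauss = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
f = lambda x: math.exp(lam * x - lam * lam / 2)      # density w.r.t. gauss

norm = integrate(lambda x: f(x) * gauss(x))                    # should be 1
S = integrate(lambda x: f(x) * math.log(f(x)) * gauss(x))      # entropy
# |grad sqrt(f)|^2 = (lam/2)^2 f, so D(sqrt f) = (lam^2/4) * int f dmu
D = integrate(lambda x: (lam / 2) ** 2 * f(x) * gauss(x))
```

Both `S` and `2*D` come out as $\lambda^2/2 = 0.245$, exhibiting the equality case.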

Alternative formulation of the LSI. We remark that there is another formulation of the LSI (see [99]). To make this connection, let
\[
g(x) := f(x)\frac{1}{(2\pi a^2)^{n/4}}e^{-x^2/4a^2}, \tag{13.41}
\]
and notice that $\int g^2(x)\,dx = \int f^2 d\mu$, where $dx$ is the Lebesgue measure and $d\mu$ is of the form (13.39). Rewriting (13.40) as an inequality for $g$ and with the replacement $2\pi a^2\to a^2$, we get the following version of the LSI:
\[
\int_{\mathbb{R}^n} g^2\log\big(g^2/\|g\|^2\big)\,dx + n\big[1+\log a\big]\int_{\mathbb{R}^n} g^2\,dx \le (a^2/\pi)\int_{\mathbb{R}^n}|\nabla g|^2\,dx \tag{13.42}
\]
that holds for any $a>0$ and any function $g$, where $\|g\| = (\int g^2 dx)^{1/2}$ is the $L^2$ norm with respect to the Lebesgue measure.

Proposition 13.7 (LSI implies spectral gap). Let $\mu$ satisfy the LSI (13.27) with an LSI constant $\gamma$. Then for any $v\in L^2(\mu)$ with $\int v\,d\mu = 0$ we have
\[
\int v^2 d\mu \le \frac{\gamma}{2}\int|\nabla v|^2 d\mu = \frac{\gamma}{2}D(v), \tag{13.43}
\]
i.e., $\mu$ has a spectral gap of size at least $2/\gamma$.

Proof. By definition of the LSI constant, we have
\[
\int u\log u\,d\mu \le \gamma D(\sqrt u)
\]
for any $u$ with $\int u\,d\mu = 1$. For any bounded smooth function $v$ with $\int v\,d\mu = 0$, define $u = 1+\varepsilon v$ with $\varepsilon$ small enough that $u>0$. Then we have
\[
\varepsilon^{-2}\int(1+\varepsilon v)\log(1+\varepsilon v)\,d\mu \le \frac{\gamma}{4}\int\frac{|\nabla v|^2}{1+\varepsilon v}\,d\mu. \tag{13.44}
\]
Taking the limit $\varepsilon\to 0$, the left hand side converges to $\frac12\int v^2 d\mu$, while the right hand side converges to $\frac{\gamma}{4}\int|\nabla v|^2 d\mu$ by dominated convergence. This proves the proposition.
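For the standard Gaussian ($\gamma = 2a^2 = 2$), the spectral gap bound (13.43) can be checked directly using Gaussian moments; this is an illustrative check, not from the book. The linear function saturates the inequality, while $v(x)=x^2-1$ does not.

```python
# Spectral-gap consequence (13.43) for the standard Gaussian measure on R
# (LSI constant gamma = 2). Uses E x^2 = 1 and E x^4 = 3.
gamma = 2.0

# v(x) = x: Var(v) = 1 and (gamma/2) * E|v'|^2 = 1  -> equality
lhs1, rhs1 = 1.0, (gamma / 2) * 1.0

# v(x) = x^2 - 1 (mean zero): Var(v) = E x^4 - 2 E x^2 + 1 = 2,
# E|v'|^2 = 4 E x^2 = 4, so the bound reads 2 <= (gamma/2) * 4 = 4
lhs2, rhs2 = 3.0 - 2.0 + 1.0, (gamma / 2) * 4.0
```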

Proposition 13.8 (Concentration inequality, Herbst bound). Suppose that the measure $\mu$ satisfies the LSI with a constant $\gamma$. Let $F$ be a function with $E_\mu F = 0$. Then we have
\[
E_\mu\,e^F \le \exp\Big(\frac{\gamma}{4}\|\nabla F\|_\infty^2\Big), \tag{13.45}
\]
where
\[
\|\nabla F\|_\infty := \sup_x\sqrt{\sum_i|\partial_i F(x)|^2}.
\]
In particular, we have
\[
P_\mu\big(|F|\ge\alpha\big) \le \exp\Big(-\frac{\alpha^2}{\gamma\|\nabla F\|_\infty^2}\Big) \tag{13.46}
\]
for any $\alpha>0$.

Notice that we get an exponential tail estimate from the LSI. If we only have the spectral gap estimate (13.43), we can only bound the variance of $F$, which yields a quadratic tail estimate:
\[
P_\mu\big(|F|\ge\alpha\big) \le \frac{\gamma\|\nabla F\|_\infty^2}{2\alpha^2}.
\]
In our typical applications, we have $\gamma\|\nabla F\|_\infty^2\sim N^{-1}$. We often need to control the concentration of many ($\sim N^C$) different functions $F$ in parallel. Thus the simple union bound is applicable with the LSI but not with the spectral gap estimate.

Proof. Denote by
\[
u = u(t) := \frac{\exp\big(e^t F\big)}{E_\mu\exp\big(e^t F\big)}.
\]
By differentiation and the LSI, we have
\[
\frac{d}{dt}\Big[e^{-t}\log E_\mu\exp\big(e^t F\big)\Big] = e^{-t}E_\mu\,u\log u \le e^{-t}\gamma\,E_\mu|\nabla\sqrt u|^2. \tag{13.47}
\]


Clearly,
\[
E_\mu|\nabla\sqrt u|^2 \le \frac{e^{2t}}{4}\|\nabla F\|_\infty^2. \tag{13.48}
\]
Integrating from any $t<0$ to $0$ yields
\[
\Big[\log E_\mu\exp(F)\Big] - \Big[e^{-t}\log E_\mu\exp\big(e^t F\big)\Big] \le \frac{\gamma}{4}\|\nabla F\|_\infty^2\int_t^0 e^s\,ds. \tag{13.49}
\]
From the condition $E_\mu F = 0$ we have
\[
\lim_{t\to-\infty}\Big[e^{-t}\log E_\mu\exp\big(e^t F\big)\Big] = \lim_{\varepsilon\to 0}\frac{1}{\varepsilon}\log E_\mu e^{\varepsilon F} = \lim_{\varepsilon\to 0}\frac{1}{\varepsilon}\log E_\mu\big[1+\varepsilon F + O(\varepsilon^2 e^{\varepsilon F})\big] = 0
\]
by dominated convergence. Since $\int_{-\infty}^0 e^s\,ds = 1$, this proves the inequality (13.45). The concentration bound (13.46) follows from an exponential Markov inequality:
\[
P_\mu(F\ge\alpha) \le E_\mu\,e^{t(F-\alpha)} \le \exp\Big(-\alpha t + t^2\frac{\gamma}{4}\|\nabla F\|_\infty^2\Big),
\]
where we chose the optimal $t = 2\alpha/[\gamma\|\nabla F\|_\infty^2]$ in the last step. Changing $F\to -F$, we obtain the opposite bound, and thus we have proved (13.46).
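For the standard Gaussian the Herbst bound can be compared against the exact tail (an illustrative check, not from the book): with $F(x)=x$, $\gamma=2$, and $\|\nabla F\|_\infty = 1$, the bound (13.46) reads $P(|x|\ge\alpha)\le e^{-\alpha^2/2}$, while the exact tail is $\mathrm{erfc}(\alpha/\sqrt 2)$.

```python
import math

# Herbst bound (13.46) for F(x) = x under the standard Gaussian measure:
# predicted tail exp(-a^2/2), exact tail erfc(a/sqrt(2)).
for a in [0.5, 1.0, 2.0, 3.0]:
    exact = math.erfc(a / math.sqrt(2))
    bound = math.exp(-a * a / 2)
    assert exact <= bound          # the Herbst bound dominates the true tail

# ratio at a = 3: the bound is loose only by a modest factor
tail_ratio = math.erfc(3.0 / math.sqrt(2)) / math.exp(-9.0 / 2)
```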

Proposition 13.9 (Stability of the LSI constant). Consider two measures $\mu$, $\nu$ on $\mathbb{R}^N$ that are related by $\nu = g\mu$ with some bounded function $g$. Let $\gamma_\mu$ denote the LSI constant for $\mu$. Then
\[
\gamma_\nu \le \|g\|_\infty\|g^{-1}\|_\infty\gamma_\mu.
\]
Proof. Take an arbitrary function $f\ge 0$ with $\int f\,d\nu = 1$. Let $\alpha := \int f\,d\mu \le \|g^{-1}\|_\infty$, and by definition of the entropy we have
\[
S_\mu(f/\alpha) = \int(f/\alpha)\log(f/\alpha)\,d\mu.
\]
The following inequality for any two nonnegative numbers $a$, $b$ can be checked by elementary calculus:
\[
a\log a - b\log b - (1+\log b)(a-b) \ge 0.
\]
Hence, since the integrand below is nonnegative (apply the inequality with $a = f(x)$, $b = \alpha$),
\[
\int\big[f\log f - \alpha\log\alpha - (1+\log\alpha)(f-\alpha)\big]d\nu \le \|g\|_\infty\int\big[f\log f - \alpha\log\alpha - (1+\log\alpha)(f-\alpha)\big]d\mu = \|g\|_\infty\,\alpha\,S_\mu(f/\alpha).
\]
The left hand side of the last inequality is equal to
\[
S_\nu(f) - \big[\log\alpha - (\alpha-1)\big] \ge S_\nu(f),
\]
where we have used the concavity of the logarithm, $\log\alpha\le\alpha-1$. This proves that
\[
S_\nu(f) \le \|g\|_\infty\,\alpha\,S_\mu(f/\alpha).
\]
Suppose that the LSI holds for $\mu$ with constant $\gamma_\mu$. Then we have
\[
S_\nu(f) \le \|g\|_\infty\gamma_\mu\,\alpha\int\big(\nabla\sqrt{f/\alpha}\big)^2 d\mu = \|g\|_\infty\gamma_\mu\int\big(\nabla\sqrt f\big)^2 g^{-1}\,d\nu \le \|g\|_\infty\|g^{-1}\|_\infty\gamma_\mu\int\big(\nabla\sqrt f\big)^2 d\nu,
\]
and this proves the lemma.
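The elementary pointwise inequality used in this proof is just the convexity of $t\mapsto t\log t$ (the left side is its Bregman divergence); a brute-force grid check (illustrative, not from the book) confirms it.

```python
import math

# Check  a log a - b log b - (1 + log b)(a - b) >= 0  for a, b > 0 on a grid.
vals = [0.01 * k for k in range(1, 501)]   # a, b in (0, 5]
worst = min(a * math.log(a) - b * math.log(b) - (1 + math.log(b)) * (a - b)
            for a in vals for b in vals)
```

The minimum over the grid is attained at $a=b$, where the expression vanishes exactly.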


Proposition 13.10 (Tensorial property of the LSI). Consider two probability measures $\mu$, $\nu$ on $\mathbb{R}^n$, $\mathbb{R}^m$, respectively. Suppose that the LSI holds for them with LSI constants $\gamma_\mu$ and $\gamma_\nu$, respectively. Let $\omega=\mu\otimes\nu$ be the product measure on $\mathbb{R}^{m+n}$. Then the LSI holds for $\omega$ with LSI constant $\gamma_\omega\le\max\{\gamma_\mu,\gamma_\nu\}$.

Proof. We will use the notations introduced in Section 13.2 with $\Omega_1=\mathbb{R}^n$, $\Omega_2=\mathbb{R}^m$. Recall the additivity of the entropy (13.20) w.r.t. the martingale decomposition. In the current situation, $\omega=\mu\otimes\nu$ and thus $\bar\omega=\mu$ and $\omega^x=\nu$ for any $x$. Furthermore,
\[
\bar f(x) = \bar\omega(x)^{-1}\int f(x,y)\,\omega(x,y)\,dy = \int f(x,y)\,\nu(y)\,dy. \tag{13.50}
\]
By the additivity of entropy and the LSI w.r.t. $\mu$ and $\nu$, we have
\[
S_\omega(f) = E^{\bar f\mu}S_\nu(f^x) + S_\mu(\bar f) \le \gamma_\nu\int\bar f(x)\mu(x)\,dx\int\frac{|\nabla_y\sqrt{f(x,y)}|^2}{\bar f(x)}\,\nu(y)\,dy + \gamma_\mu\int\Big|\nabla_x\sqrt{\int f(x,y)\,\nu(y)\,dy}\Big|^2\mu(x)\,dx. \tag{13.51}
\]
The integral in the first term on the right hand side is equal to
\[
\iint|\nabla_y\sqrt{f(x,y)}|^2\,\mu(x)\nu(y)\,dx\,dy.
\]
The integral in the second term on the right hand side is bounded by
\[
\int\frac{\big|\int\nabla_x f(x,y)\,\nu(y)\,dy\big|^2}{4\int f(x,y)\,\nu(y)\,dy}\,\mu(x)\,dx \le \iint\big|\nabla_x\sqrt{f(x,y)}\big|^2\,\mu(x)\nu(y)\,dx\,dy,
\]
where we have written $\nabla_x f = 2(\nabla_x\sqrt f)\sqrt f$ and used the Schwarz inequality. Summarizing, we have proved that
\[
S_\omega(f) \le \gamma_\nu\iint|\nabla_y\sqrt{f(x,y)}|^2\,\omega(x,y)\,dx\,dy + \gamma_\mu\iint\big|\nabla_x\sqrt{f(x,y)}\big|^2\,\omega(x,y)\,dx\,dy,
\]
and this proves the proposition.

The following lemma is a useful tool to control the entropy flow w.r.t. a time dependent (non-equilibrium) measure.

Lemma 13.11 ([139]). Suppose we have the evolution equation $\partial_t f_t = Lf_t$ with $L$ defined via the Dirichlet form $\langle f,(-L)f\rangle_\mu = \sum_j\int(\partial_j f)^2\,d\mu$. Then for any time dependent probability density $\psi_t$ w.r.t. $\mu$, we have the entropy flow identity
\[
\partial_t S_\mu(f_t|\psi_t) = -4\sum_j\int(\partial_j\sqrt{g_t})^2\,\psi_t\,d\mu + \int g_t(L-\partial_t)\psi_t\,d\mu,
\]
where $g_t := f_t/\psi_t$ and
\[
S_\mu(f_t|\psi_t) := \int f_t\log\frac{f_t}{\psi_t}\,d\mu = S(f_t\mu|\psi_t\mu)
\]
is the relative entropy of $f_t\mu$ w.r.t. $\psi_t\mu$.

Proof. A simple computation yields that
\[
\frac{d}{dt}S_\mu(f_t|\psi_t) = \int(Lf_t)(\log g_t)\,d\mu + \int f_t\frac{Lf_t}{f_t}\,d\mu - \int(\partial_t\psi_t)\,g_t\,d\mu
\]
\[
= \int(Lf_t)(\log g_t)\,d\mu - \int(\partial_t\psi_t)\,g_t\,d\mu
\]
(the middle term vanishes since $\int Lf_t\,d\mu = 0$ by (13.26))
\[
= \int(g_t\psi_t)\,L(\log g_t)\,d\mu - \int g_t\,\partial_t\psi_t\,d\mu
\]
\[
= \int\psi_t\Big[g_t L(\log g_t) - g_t\frac{Lg_t}{g_t}\Big]d\mu + \int g_t(L-\partial_t)\psi_t\,d\mu.
\]


By the definition of $L$, we have
\[
g\Big[L(\log g) - \frac{Lg}{g}\Big] = -\sum_j\frac{(\partial_j g)^2}{g} = -4\sum_j(\partial_j\sqrt g)^2, \tag{13.52}
\]
and we have proved Lemma 13.11.

We remark that stochastic processes and their generators can be defined in more general setups; e.g., the underlying probability space may be different from $\mathbb{R}^N$ and the generator may involve discrete jumps. In this case, the stochastic generator $L$ is defined directly, without an a priori Dirichlet form. It can, however, be proved that $-g[L(\log g) - (Lg)/g]\ge 0$, which can then be viewed as a generalization of the "Dirichlet form" associated with a generator.

13.4 Hypercontractivity

We now present an interesting connection between the LSI of a probability measure $\mu$ and the hypercontractivity properties of the semigroup generated by $L=L_\mu$. Since this result will not be used later in this book, this section can be skipped.

To state the result, we define the semigroup $\{P_t\}_{t\ge 0}$ by $P_t f := f_t$, where $f_t$ solves the equation $\partial_t f_t = Lf_t$ with initial condition $f_0 = f$.

Theorem 13.12 (L. Gross [78]). For a measure $\mu$ on $\mathbb{R}^n$ and for any fixed constants $\beta>0$ and $\gamma>0$ the following two statements are equivalent:

(i) The generalized LSI
\[
\int f\log f\,d\mu \le \gamma\int|\nabla\sqrt f|^2 d\mu + \beta, \qquad \text{for any } f\ge 0 \text{ with } \int f\,d\mu = 1, \tag{13.53}
\]
holds;

(ii) The hypercontractivity estimate
\[
\|P_t f\|_{L^q(\mu)} \le \exp\Big(\beta\Big[\frac1p - \frac1q\Big]\Big)\|f\|_{L^p(\mu)} \tag{13.54}
\]
holds for all exponents satisfying
\[
\frac{q-1}{p-1} \le e^{4t/\gamma}, \qquad 1 < p \le q < \infty.
\]

Proof. We will only prove (i)⇒(ii), i.e., that the generalized LSI implies the hypercontractivity estimate; the proof of the converse statement can be found in [43]. First we assume that $f\ge 0$, hence $f_t\ge 0$. Direct differentiation yields the following identity:
\[
\frac{d}{dt}\log\|f_t\|_{p(t)} = \frac{\dot p(t)}{p(t)^2}\Big[-\frac{4(p(t)-1)}{\dot p(t)}D(u(t)) + \int u(t)^2\log\big(u(t)^2\big)\,d\mu\Big], \tag{13.55}
\]
with $\dot p(t) = \frac{d}{dt}p(t)$, and where we defined
\[
u(t) := f_t^{p(t)/2}\|f_t\|_{p(t)}^{-p(t)/2}, \qquad D(u) = \int(\nabla u)^2 d\mu.
\]
Now choose $p(t)$ to solve the differential equation
\[
\gamma = \frac{4(p(t)-1)}{\dot p(t)}, \quad\text{with } p(0)=p, \quad\text{i.e.,}\quad p(t) = 1+(p-1)e^{4t/\gamma},
\]
where $\gamma$ is the constant given in the theorem. Using (13.53) for the $L^1(\mu)$-normalized function $u(t)^2$, we have
\[
\frac{d}{dt}\log\|f_t\|_{p(t)} \le \frac{\beta\,\dot p(t)}{p(t)^2}.
\]
Integrating both sides from $t=0$ to some $T$, we have
\[
\log\|f_T\|_{p(T)} - \log\|f\|_p \le \beta\Big[\frac1p - \frac{1}{p(T)}\Big].
\]
Choosing $T$ such that $p(T)=q$, we have proved (13.54) for $f\ge 0$. The general case follows by separating the positive and negative parts of $f$.

Exercise. In this exercise we show that the idea of the LSI can be useful even if the invariant measure is not a probability measure. The sketch below follows the paper of Carlen–Loss [33], and it works for any parabolic equation of the type
\[
\partial_t f_t = \big[\nabla\cdot(D(x,t)\nabla) + b(x,t)\cdot\nabla\big]f_t,
\]
for any divergence free $b$ and any $D(x,t)\ge c>0$. For simplicity of notation, we consider only the heat equation on $\mathbb{R}^n$,
\[
\partial_t f_t = \Delta f_t. \tag{13.56}
\]
The invariant measure of this flow is the standard Lebesgue measure on $\mathbb{R}^n$.

(i) Check the formula (13.55) in this setup, i.e., show that the following identity holds:
\[
\frac{d}{dt}\log\|f_t\|_{p(t)} = \frac{1}{p(t)}\|f_t\|_{p(t)}^{-p(t)}\frac{d}{dt}\int f_t^{p(t)}dx - \frac{\dot p(t)}{p(t)^2}\log\big(\|f_t\|_p^p\big) \tag{13.57}
\]
\[
= \frac{\dot p(t)}{p(t)^2}\Big[-\frac{4(p(t)-1)}{\dot p(t)}\int(\nabla u(t))^2 dx + \int u(t)^2\log\big(u(t)^2\big)\,dx\Big],
\]
where all norms are w.r.t. the Lebesgue measure and
\[
u(t) = f_t^{p/2}\|f_t\|_p^{-p/2}, \qquad p = p(t).
\]

(ii) For any fixed $t$, choose the constant in (13.42) by setting
\[
a^2 = \frac{4\pi(p(t)-1)}{\dot p(t)}.
\]
Thus we have
\[
\frac{d}{dt}\log\|f_t\|_{p(t)} \le -\frac{n\,\dot p(t)}{p(t)^2}\Big[1 + \frac12\log\Big(\frac{4\pi(p(t)-1)}{\dot p(t)}\Big)\Big].
\]
For a suitable choice of the function $p(t)$ with $p(0)=1$, $p(T)=\infty$, show that
\[
\|f_T\|_\infty \le (4\pi T)^{-n/2}\|f_0\|_1. \tag{13.58}
\]
We remark that in the case $D\equiv 1$ and $b\equiv 0$, this heat kernel estimate is also a trivial consequence of the explicit formula for the heat kernel $e^{t\Delta}(x,y)$. The point is, as remarked earlier, that this proof works for a general class of second order parabolic equations.
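For Gaussian initial data the bound (13.58) can be checked against the exact heat flow (an illustrative check using the explicit heat kernel, not part of the exercise): $e^{T\Delta}$ maps the $N(0,s^2)$ density to the $N(0, s^2+2T)$ density in $n=1$, so $\|f_T\|_\infty = (2\pi(s^2+2T))^{-1/2}$ while $\|f_0\|_1 = 1$.

```python
import math

# Sanity check of (13.58) in n = 1 for Gaussian initial data:
# ||f_T||_inf = (2*pi*(s^2 + 2T))^{-1/2}  <=  (4*pi*T)^{-1/2} * ||f_0||_1,
# since 2*pi*(s^2 + 2T) >= 4*pi*T for all s^2 >= 0.
for s2 in [0.1, 1.0, 4.0]:
    for T in [0.1, 1.0, 10.0]:
        sup_fT = (2 * math.pi * (s2 + 2 * T)) ** -0.5
        bound = (4 * math.pi * T) ** -0.5
        assert sup_fT <= bound

# strict inequality whenever s^2 > 0, e.g. s^2 = 1, T = 1
gap = (4 * math.pi * 1.0) ** -0.5 - (2 * math.pi * 3.0) ** -0.5
```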


13.5 Brascamp-Lieb inequality

The following inequality will not be needed for the main results of this book, so the reader may skip it. Nevertheless, we include it here, since it is an important tool in the analysis of probability measures in very high dimensions, and it was used in universality proofs for the invariant ensembles [25]. We will formulate it on $\mathbb{R}^N$, and in Section 13.7 we comment on how to extend it to certain probability measures on the simplex $\Sigma_N$ defined in (12.3).

Theorem 13.13 (Brascamp–Lieb inequality [29]). Consider a probability measure $\mu$ on $\mathbb{R}^N$ of the form (13.24), i.e., $\mu=e^{-H}/Z$. Suppose that the Hamiltonian is strictly convex, i.e.,
\[
\nabla^2 H(x) \ge K > 0 \tag{13.59}
\]
as a matrix inequality for some positive constant $K$. Then for any smooth function $f\in L^2(\mu)$ we have
\[
\langle f; f\rangle_\mu \le \big\langle\nabla f,\big[\nabla^2 H\big]^{-1}\nabla f\big\rangle_\mu. \tag{13.60}
\]
Recall that $\langle f,g\rangle_\mu = \int fg\,d\mu$ denotes the scalar product and $\langle f; g\rangle_\mu = \langle f,g\rangle_\mu - \langle 1,f\rangle_\mu\langle 1,g\rangle_\mu$ is the covariance. With a slight abuse of notation, we also use $\langle F,G\rangle_\mu = \sum_{i=1}^N\langle F_i,G_i\rangle_\mu$ for the scalar product of any two vector-valued functions $F,G:\mathbb{R}^N\to\mathbb{R}^N$; this extended scalar product is used on the right hand side of (13.60).

Remark: Notice that the Brascamp–Lieb inequality is optimal in the sense that, if $\mu$ is a Gaussian measure, then we can find $f$ so that the inequality becomes an equality. Moreover, if we use the matrix inequality $\nabla^2 H\ge K$, then (13.60) reduces to the spectral gap inequality (13.43), but of course (13.60) is stronger. The following proof is due to Helffer–Sjöstrand [80] and Naddaf–Spencer [108].

Proof. Recall that $L$ is the generator of a reversible dynamics with reversible measure $\mu$, defined via (13.26). For the dynamics (13.30), we have
\[
\frac{d}{dt}\langle f, e^{tL}g\rangle = \langle f, Le^{tL}g\rangle = -\sum_j\langle\partial_{x_j}f,\partial_{x_j}e^{tL}g\rangle.
\]
Define
\[
G(t,\mathbf{x}) := \big(G_1(t,\mathbf{x}),\ldots,G_N(t,\mathbf{x})\big), \qquad G_j(t,\mathbf{x}) := \partial_{x_j}\big[e^{tL}g(\mathbf{x})\big].
\]
Recall the explicit form of $L$:
\[
L = \Delta - (\nabla H)\cdot\nabla = \Delta + \sum_k b_k\partial_{x_k}, \qquad b_k := -\partial_{x_k}H.
\]
Define a new generator $\mathcal{L}$ acting on vector-valued functions $G:\mathbb{R}^N\to\mathbb{R}^N$ by
\[
(\mathcal{L}G)_j(\mathbf{x}) := LG_j(\mathbf{x}) + \sum_k(\partial_{x_j}b_k)\,G_k(\mathbf{x}).
\]
Clearly, by explicit computation, we have
\[
\partial_t G(t,\mathbf{x}) = \mathcal{L}G(t,\mathbf{x}).
\]
From the decay estimate (13.31), the dynamics is mixing, i.e., $\lim_{t\to\infty}\langle f, e^{tL}g\rangle = 0$ for any smooth function $f$ with $\int f\,d\mu = 0$. Thus for any such function $f$ we have
\[
\langle f,g\rangle_\mu = -\int_0^\infty\frac{d}{dt}\langle f, e^{tL}g\rangle_\mu\,dt = \sum_j\int_0^\infty\langle\partial_{x_j}f, G_j(t,\mathbf{x})\rangle_\mu\,dt = \int_0^\infty\langle\nabla f, e^{t\mathcal{L}}\nabla g\rangle_\mu\,dt = \langle\nabla f,(-\mathcal{L})^{-1}\nabla g\rangle_\mu.
\]


Taking $g=f$ and using the definition of $\mathcal{L}$, we have
\[
-(\mathcal{L})_{i,j} = -L\delta_{i,j} - (\partial_{x_i}b_j) \ge (\partial_{x_i}\partial_{x_j}H)
\]
as an operator inequality (since $-L\ge 0$ and $-\partial_{x_i}b_j = \partial_{x_i}\partial_{x_j}H$), and we have proved that
\[
\langle f; f\rangle_\mu = \langle\nabla f,(-\mathcal{L})^{-1}\nabla f\rangle_\mu \le \big\langle\nabla f,\big[\nabla^2 H\big]^{-1}\nabla f\big\rangle_\mu.
\]
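The gain of (13.60) over the spectral gap bound can be seen already in a 2x2 Gaussian example (an illustrative computation, not from the book): for $H(\mathbf{x}) = \mathbf{x}\cdot A\mathbf{x}/2$ the covariance of the linear function $f(\mathbf{x})=\mathbf{v}\cdot\mathbf{x}$ is exactly $\mathbf{v}\cdot A^{-1}\mathbf{v}$ (the Brascamp–Lieb bound with equality), while the bound obtained from $\nabla^2 H\ge K$ with $K$ the smallest eigenvalue of $A$ only gives $|\mathbf{v}|^2/K$.

```python
import math

# Brascamp-Lieb vs. spectral gap for a 2x2 Gaussian Hamiltonian H = x.Ax/2
A = [[2.0, 0.5], [0.5, 1.0]]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
Ainv = [[A[1][1] / det, -A[0][1] / det],
        [-A[1][0] / det, A[0][0] / det]]
v = [1.0, -3.0]

# Brascamp-Lieb right hand side (= exact covariance of v.x): v . A^{-1} v
bl = sum(v[i] * Ainv[i][j] * v[j] for i in range(2) for j in range(2))

# Spectral gap bound |v|^2 / K with K = smallest eigenvalue of A
tr = A[0][0] + A[1][1]
K = (tr - math.sqrt(tr * tr - 4 * det)) / 2
sg = (v[0] ** 2 + v[1] ** 2) / K
```

Here `bl` is strictly smaller than `sg`, as expected from the remark above.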

13.6 Remarks on the applications of the LSI to random matrices

The Herbst bound provides strong concentration results for probability measures that satisfy the LSI. In this section we exploit this connection for random matrices. For definiteness, we will consider the Gaussian orthogonal ensemble (GOE), although the arguments below can be extended to more general Wigner as well as invariant ensembles. There are two ways to view Gaussian matrix ensembles in the context of the LSI: we can either work on the probability space of the matrix $H$ with Gaussian matrix elements, or we can work directly on the space of the eigenvalues $\boldsymbol\lambda$ by using the explicit formula (4.12) for invariant ensembles. Both measures satisfy the LSI (in the next Section 13.7 we will comment on how to generalize the LSI from $\mathbb{R}^N$ to the simplex $\Sigma_N$, (12.3), a version of the LSI that we will actually use). We will explore both possibilities, and we start with the matrix point of view.

13.6.1 LSI from the Wigner matrix point of view

Let $\mu=\mu_G$ be the standard Gaussian measure $\mu\sim\exp(-N\,\mathrm{Tr}\,H^2)$ on symmetric matrices. Notice that for this measure the family $\{x_{ij}=N^{1/2}h_{ij},\ i\le j\}$ is a collection of independent standard Gaussian random variables (up to a factor 2). Hence the LSI holds for every matrix element, and by its tensorial property, i.e., Proposition 13.10, the LSI holds for any function of the full matrix considered as a function of the collection $\{x_{ij}=N^{1/2}h_{ij},\ i\le j\}$. In particular, using the spectral gap estimate (13.43) for the Gaussian variables $x_{ij}$, we have for any function $F=F(H)$
\[
\langle F(H); F(H)\rangle_\mu \le C\sum_{i\le j}\int|\partial_{x_{ij}}F(H)|^2 d\mu = \frac{C}{N}\sum_{i\le j}\int|\partial_{h_{ij}}F(H)|^2 d\mu, \tag{13.61}
\]

where the additional $N^{-1}$ factor comes from the scaling $h_{ij}=N^{-1/2}x_{ij}$.

Suppose that $F(H)=R(\boldsymbol\lambda(H))$ is a real function of the eigenvalues $\boldsymbol\lambda=(\lambda_1,\ldots,\lambda_N)$; then by the chain rule we have
\[
\frac{C}{N}\sum_{i\le j}\int|\partial_{h_{ij}}R(\boldsymbol\lambda(H))|^2 d\mu = \frac{C}{N}\sum_{i\le j}\int\Big|\sum_\alpha\partial_{\lambda_\alpha}R(\boldsymbol\lambda)\frac{\partial\lambda_\alpha}{\partial h_{ij}}\Big|^2 d\mu.
\]
Expanding the square and using the perturbation formula (12.9) with real eigenvectors, we can compute
\[
\frac{1}{N}\sum_{i\le j}\int\Big|\sum_\alpha\partial_{\lambda_\alpha}R(\boldsymbol\lambda)\frac{\partial\lambda_\alpha}{\partial h_{ij}}\Big|^2 d\mu \le \frac{1}{N}\sum_{i\le j}\frac{2}{2-\delta_{ij}}\int\Big|\sum_\alpha\partial_{\lambda_\alpha}R(\boldsymbol\lambda)\frac{\partial\lambda_\alpha}{\partial h_{ij}}\Big|^2 d\mu
\]
\[
= \frac{1}{N}\sum_{i\le j}\frac{2}{2-\delta_{ij}}\sum_{\alpha,\beta}\int\partial_{\lambda_\alpha}R(\boldsymbol\lambda)\,\partial_{\lambda_\beta}R(\boldsymbol\lambda)\,\frac{\partial\lambda_\alpha}{\partial h_{ij}}\frac{\partial\lambda_\beta}{\partial h_{ij}}\,d\mu
\]
\[
= \frac{2}{N}\sum_{i\le j}\big[2-\delta_{ij}\big]\sum_{\alpha,\beta}\int\partial_{\lambda_\alpha}R(\boldsymbol\lambda)\,\partial_{\lambda_\beta}R(\boldsymbol\lambda)\,u_\alpha(i)u_\alpha(j)u_\beta(i)u_\beta(j)\,d\mu
\]
\[
= \frac{2}{N}\sum_{i,j}\sum_{\alpha,\beta}\int\partial_{\lambda_\alpha}R(\boldsymbol\lambda)\,\partial_{\lambda_\beta}R(\boldsymbol\lambda)\,u_\alpha(i)u_\alpha(j)u_\beta(i)u_\beta(j)\,d\mu = \frac{2}{N}\sum_\alpha\int|\partial_{\lambda_\alpha}R(\boldsymbol\lambda)|^2 d\mu, \tag{13.62}
\]
where we have used the orthogonality property and the normalization convention of the eigenvectors in the last step.

We remark that the annoying factor $[2-\delta_{ij}]$ can be avoided if we first consider $\lambda_\alpha$ as a function of all $\{x_{ij} : 1\le i,j\le N\}$ viewed as independent variables. Then the perturbation formula (12.9) becomes
\[
\frac{\partial\lambda_\alpha}{\partial h_{ij}}\Big|_{h_{ij}=h_{ji}} = u_\alpha(i)u_\alpha(j), \tag{13.63}
\]
i.e., the derivative is evaluated on the submanifold of Hermitian matrices. In this way we can keep the summation in (13.61) unrestricted and, up to a constant factor, we get the same final result as in (13.62).

In summary, we proved
\[
\langle F; F\rangle_\mu = \langle R(\boldsymbol\lambda(H)); R(\boldsymbol\lambda(H))\rangle_\mu \le \frac{C}{N}\sum_\alpha\int|\partial_{\lambda_\alpha}R(\boldsymbol\lambda)|^2 d\mu. \tag{13.64}
\]
Notice that this argument holds for any generalized Wigner matrix as long as a spectral gap estimate (13.43) holds for the distribution of every rescaled matrix element $N^{1/2}h_{ij}$. Furthermore, it can be generalized to Wigner matrices with Bernoulli random variables, for which there is a spectral gap (and LSI) in discrete form. Similar remarks apply to the following LSI estimates.

To see how good this estimate is, consider a local linear statistics of eigenvalues, i.e., the local average of $K$ consecutive eigenvalues with label near $M$; i.e., set
\[
R(\boldsymbol\lambda) := \frac{1}{K}\sum_\alpha A\Big(\frac{\alpha-M}{K}\Big)\lambda_\alpha,
\]
where $A$ is a smooth function of compact support and $1\le M,K\le N$. Computing the right hand side of (13.64) and combining it with (13.61), we get
\[
\langle F; F\rangle_\mu \le \frac{C}{NK^2}\sum_\alpha A^2\Big(\frac{\alpha-M}{K}\Big) \le \frac{C_A}{NK}, \tag{13.65}
\]
where the last constant $C_A$ depends on the function $A$. For $K=1$, this inequality estimates the square of the fluctuation of a single eigenvalue (choosing $A$ appropriately). The bound (13.65) is off by a factor $N$, since the true fluctuation is of order almost $1/N$ by rigidity, Theorem 11.5, at least in the bulk, i.e., if $\delta N\le M\le(1-\delta)N$. On the other hand, for $K=N$ the bound (13.65) is much more precise; it shows that the variance of a macroscopic average of the eigenvalues is at most of order $N^{-2}$. This is the correct order of magnitude; in fact, it is known that $\sum_\alpha A(\frac{\alpha}{N})\lambda_\alpha$ converges to a Gaussian random variable, see, e.g., [102, 117] and references therein. Hence the spectral gap argument yields the optimal (up to constant) result for any macroscopic average of eigenvalues.

Another common quantity of interest is the Stieltjes transform of the empirical eigenvalue distribution, i.e.,
\[
F(H) = G(\boldsymbol\lambda(H)) \quad\text{with}\quad G(\boldsymbol\lambda) = m_N(z) = \frac{1}{N}\sum_\alpha\frac{1}{\lambda_\alpha - z}, \qquad z = E+i\eta. \tag{13.66}
\]
In this case, using (13.64), we can estimate
\[
\langle m_N(z); m_N(z)\rangle_\mu \le \frac{C}{N}\sum_\alpha\int|\partial_{\lambda_\alpha}G(\boldsymbol\lambda)|^2 d\mu = \frac{C}{N^3}\sum_\alpha\int\frac{1}{|\lambda_\alpha-z|^4}\,d\mu \le \frac{C}{N^2\eta^4}, \tag{13.67}
\]
where we used $|\lambda_\alpha-z|\ge\operatorname{Im}z = \eta$.

This shows that the variance of $m_N(z)$ vanishes if $\eta\gg N^{-1/2}$. The last estimate can be improved if we know that the density of the empirical measure is bounded. Roughly speaking, if we know that in the summation over $\alpha$ the number of eigenvalues in an interval of size $\eta$ near $E$ is at most of order $N\eta$, then the last estimate can be improved to
\[
\frac{1}{N^3}\sum_\alpha\int\frac{1}{|\lambda_\alpha-z|^4}\,d\mu \lesssim \frac{1}{N^3}\sum_{\alpha:\,|\lambda_\alpha-E|\le\eta}\int\frac{1}{|\lambda_\alpha-z|^4}\,d\mu \le \frac{C}{N^3}(N\eta)\frac{1}{\eta^4} = \frac{C}{N^2\eta^3}. \tag{13.68}
\]


This shows that the variance of $m_N(z)$ vanishes if $\eta\gg N^{-2/3}$. Since vanishing fluctuations can be used to estimate the density, this argument can actually be made rigorous by a bootstrapping argument [61]. Notice that the scale $\eta\gg N^{-2/3}$ is still far from the resolution demonstrated in the local semicircle law, Theorem 6.7.

Whenever the LSI is available, the variance bounds can easily be lifted to a concentration estimate with an exponential tail. We now demonstrate this mechanism for the fluctuation of a single eigenvalue. In other words, we apply (13.46) with $F(H) = \lambda_\alpha(H) - \mathbb{E}_\mu \lambda_\alpha(H)$ for a fixed $\alpha$. From (12.9), we have
\[
\sum_{i\le j} |\nabla_{x_{ij}} F|^2 \le \frac{2}{N},
\]
thus (13.46) implies
\[
\mathbb{P}_\mu\big(|\lambda_\alpha - \mathbb{E}_\mu\lambda_\alpha| \ge t\big) \le \exp\big(-cNt^2\big) \tag{13.69}
\]

for any fixed $\alpha$. This inequality has two deficiencies compared with the rigidity bounds in Theorem 11.5. First, it controls $|\lambda_\alpha - \mathbb{E}_\mu\lambda_\alpha|$ only up to the scale $N^{-1/2}$, which is still far from the optimal $N^{-1}$ (in the bulk). Second, it gives no information on the difference between the expectation $\mathbb{E}_\mu\lambda_\alpha$ and the corresponding classical location $\gamma_\alpha$, defined as the $\alpha$-th $N$-quantile of the limiting density, see (11.31).

13.6.2 LSI from invariant ensemble point of view

Now we pass to the second point of view, where the basic measure $\mu_G$ is the invariant ensemble on the eigenvalues. One might hope that the situation improves since we look directly at the Gaussian eigenvalue ensemble defined in (12.13) with the Gaussian choice $V(\lambda) = \frac{1}{2}\lambda^2$. Notice that the role of $H$ in Theorem 13.6 will be played by $N\mathcal{H}_N$ defined in (12.13) (with $\beta = 1$). The Hessian of $\mathcal{H}_N$ is given by (all inner products and norms in the following equation are w.r.t. the standard inner product in $\mathbb{R}^N$)
\[
\big(v, \nabla^2 \mathcal{H}_N(x)\, v\big) \ge \frac{1}{2}\|v\|^2 + \frac{1}{N}\sum_{i<j}\frac{(v_i - v_j)^2}{(x_i - x_j)^2} \ge \frac{1}{2}\|v\|^2, \qquad v \in \mathbb{R}^N, \tag{13.70}
\]
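The first inequality in (13.70) is in fact an identity for the full Hessian quadratic form. As a quick numerical sanity check (not part of the book), the sketch below compares a second-order finite difference of $\mathcal{H}_N$ along a direction $v$ with the right-hand side of (13.70); the point $x$ and the direction $v$ are arbitrary.

```python
import math

def H_N(x):
    """Gaussian eigenvalue Hamiltonian: sum x_i^2/4 - (1/N) sum_{i<j} log(x_j - x_i)."""
    N = len(x)
    quad = sum(xi * xi / 4.0 for xi in x)
    inter = sum(math.log(x[j] - x[i]) for i in range(N) for j in range(i + 1, N)) / N
    return quad - inter

def hessian_form(x, v):
    """Right-hand side of (13.70): (1/2)|v|^2 + (1/N) sum_{i<j} (v_i - v_j)^2/(x_i - x_j)^2."""
    N = len(x)
    q = 0.5 * sum(vi * vi for vi in v)
    q += sum((v[i] - v[j]) ** 2 / (x[i] - x[j]) ** 2
             for i in range(N) for j in range(i + 1, N)) / N
    return q

x = [-1.3, -0.2, 0.7, 2.1]   # an ordered configuration in the simplex
v = [0.5, -1.0, 0.3, 0.8]    # an arbitrary direction
eps = 1e-4
# second directional difference approximates <v, Hess H_N(x) v>
num = (H_N([a + eps * b for a, b in zip(x, v)]) - 2.0 * H_N(x)
       + H_N([a - eps * b for a, b in zip(x, v)])) / eps ** 2
print(num, hessian_form(x, v))   # the two numbers agree up to O(eps^2)
```

The finite difference recovers the quadratic form exactly because the off-diagonal Hessian entries of the logarithmic interaction combine into the difference terms $(v_i - v_j)^2/(x_i - x_j)^2$.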

thus the convexity bound (13.28) holds with constant $K = N/2$ for $N\mathcal{H}_N$. Hence the spectral gap bound (13.43) implies that for any function $R(\lambda)$ we have
\[
\langle R(\lambda); R(\lambda)\rangle_{\mu_G} \le \frac{C}{N}\sum_\alpha \int |\partial_{\lambda_\alpha} R(\lambda)|^2\, d\mu_G. \tag{13.71}
\]
Notice that this bound has the same form as (13.64). A similar statement holds for the LSI, i.e., we have
\[
\int R \log R\, d\mu_G \le \frac{C}{N}\sum_\alpha \int \big|\partial_{\lambda_\alpha}\sqrt{R(\lambda)}\big|^2\, d\mu_G. \tag{13.72}
\]

We can apply the concentration inequality (13.46) to the function $R(\lambda) = \lambda_\alpha$ to get (13.69). Once again, we get exactly the same result as before, so there is no advantage in using the explicit invariant measure $\mu_G$.

We have seen that in both the matrix and the invariant ensemble setup, even for Gaussian ensembles, concentration estimates based on the spectral gap or the LSI do not yield optimal results for the eigenvalues on short scales. In the matrix setup, the proof of the optimal rigidity bounds for Wigner matrices, Theorem 11.5, uses a completely different approach, independent of the LSI. In the invariant ensemble setup, while the convexity bound (13.70) cannot be improved for a general vector $v$, it becomes much stronger if we consider it only for vectors $v$ satisfying $\sum_j v_j = 0$, due to the structure of the Hessian (13.70). In the next section, we will explore this idea to improve estimates on certain functions of the eigenvalues.


13.7 Extensions to the simplex, regularization of the DBM

In the above applications of either the spectral gap or the LSI, the measure $\mu_G$ was restricted to the subset $\Sigma = \Sigma_N = \{x \in \mathbb{R}^N : x_1 < x_2 < \ldots < x_N\}$. It is absolutely continuous with respect to the Lebesgue measure on $\Sigma_N$, but if we write it in the form (13.24) then the Hamiltonian $\mathcal{H}$ has to be infinite outside of $\Sigma_N$; in particular it is not differentiable, a property that is implicitly assumed in Section 13.3. This issue is closely related to the boundary terms in the integration by parts, especially in (13.33), that arise if one formally extends the proofs to $\Sigma$. Therefore, the results of Theorem 13.6 do not apply directly. The proper procedure involves an approximation and extension of the measure from $\Sigma_N$ to $\mathbb{R}^N$, using the results of Section 13.3 for the regularized measure on $\mathbb{R}^N$, and then removing the regularization. Whether the regularization can be removed for any $\beta > 0$ or only for $\beta \ge 1$ depends on the statement, as we explain now.

For simplicity, we work in the setup of the Gaussian $\beta$-ensemble, i.e., we set
\[
\mathcal{H}(x) = \sum_{i=1}^N \frac{1}{4}x_i^2 - \frac{1}{N}\sum_{i<j}\log(x_j - x_i),
\]
and define $\mu_\beta(x) = Z_\mu^{-1} e^{-N\beta \mathcal{H}(x)}$ on the simplex $\Sigma_N$, exactly as in (12.13) with the Gaussian potential $V(x) = \frac{1}{2}x^2$ and with any parameter $\beta > 0$. As usual, $Z_\mu$ is a normalization constant.

Recall from (12.4) that the DBM on the simplex $\Sigma_N$ is defined via the stochastic differential equation
\[
dx_i = \frac{\sqrt{2}}{\sqrt{\beta N}}\, dB_i + \Big(-\frac{1}{2}x_i + \frac{1}{N}\sum_{j\ne i}\frac{1}{x_i - x_j}\Big)\, dt \qquad \text{for } i = 1, \ldots, N, \tag{13.73}
\]

where $B_1, \ldots, B_N$ is a family of independent standard Brownian motions. The unique strong solution, with equilibrium measure $\mu_\beta$, exists only if $\beta \ge 1$ (Theorem 12.2). Since the DBM does not exist on $\Sigma_N$ unless $\beta \ge 1$, we cannot use the DBM, either in the SDE form (12.4) or in the PDE form (12.17), when $\beta < 1$; see the remark below (12.17). On the other hand, certain results may be extended to any $\beta > 0$ if their final formulations do not involve the DBM.
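As an illustration (not from the book), here is a minimal Euler-Maruyama discretization of (13.73). Re-sorting the coordinates after each step is a crude substitute for the almost sure preservation of ordering by the continuum dynamics; the sketch is only meaningful for $\beta \ge 1$, where the SDE is well posed, and all parameters below are arbitrary.

```python
import math, random

def dbm_step(x, dt, beta, rng):
    """One Euler-Maruyama step of the DBM (13.73), followed by re-sorting."""
    N = len(x)
    drift = [-0.5 * x[i] + sum(1.0 / (x[i] - x[j]) for j in range(N) if j != i) / N
             for i in range(N)]
    sigma = math.sqrt(2.0 * dt / (beta * N))
    return sorted(x[i] + drift[i] * dt + sigma * rng.gauss(0.0, 1.0) for i in range(N))

rng = random.Random(0)
beta, N, dt = 2.0, 8, 1e-4
x = sorted(rng.gauss(0.0, 1.0) for _ in range(N))   # arbitrary ordered initial data
for _ in range(2000):
    x = dbm_step(x, dt, beta, rng)
print(x)   # the points stay ordered and confined by the -x/2 drift
```

The $-\frac{1}{2}x_i$ term pulls the points back toward the origin while the pairwise repulsion $1/(x_i - x_j)$ keeps neighboring points apart, which is what stabilizes the discretization at small time steps.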

In this section, we present a regularization procedure to show that substantial parts of the main results of Theorem 13.6, i.e., the LSI for any $\beta > 0$ and the exponential relaxation of the entropy for $\beta \ge 1$, remain valid on the simplex $\Sigma_N$. A similar generalization holds for the Brascamp-Lieb inequality. In the next section, the same regularization will be used to show that the key Dirichlet form inequality (Theorem 14.3) also holds for any $\beta > 0$.

For later applications, we work with a slightly larger class of measures than just $\mu_\beta$. We consider measures on $\Sigma = \Sigma_N$ of the form
\[
\omega = Z_\omega^{-1} e^{-\beta N \widehat{\mathcal{H}}}\mu_\beta,
\]
where $\widehat{\mathcal{H}}(x) = \sum_j U_j(x_j)$ for some convex real valued functions $U_j$ on $\mathbb{R}$. The total Hamiltonian of $\omega$ is $\mathcal{H}_\omega := \mathcal{H} + \widehat{\mathcal{H}}$. Note that the $U_j$ are defined and convex on the entire $\mathbb{R}$. The entropy and the Dirichlet form are defined as before:
\[
S_\omega(f) = \int_\Sigma f \log f\, d\omega, \qquad D_\omega(f) = \frac{1}{\beta N}\int_\Sigma |\nabla f|^2\, d\omega.
\]

The corresponding DBM is given by
\[
dx_i = \frac{\sqrt{2}}{\sqrt{\beta N}}\, dB_i + \Big(-\frac{1}{2}x_i - U_i'(x_i) + \frac{1}{N}\sum_{j\ne i}\frac{1}{x_i - x_j}\Big)\, dt. \tag{13.74}
\]

We assume a lower bound on the Hessian of the total Hamiltonian,
\[
\mathcal{H}_\omega'' = \mathcal{H}'' + \widehat{\mathcal{H}}'' \ge \frac{1}{2} + \min_j \min_{x\in\mathbb{R}} U_j''(x) \ge K \tag{13.75}
\]
for some positive constant $K$ on the entire $\mathbb{R}^N$. This bound (13.75) plays the role of (13.28). Let $D_\omega$, $S_\omega$ and $L_\omega$ denote the Dirichlet form, entropy and generator corresponding to the measure $\omega$. Now we claim that Theorem 13.6 holds for the measure $\omega$ on $\Sigma_N$ in the following form:


Theorem 13.14. Assume (13.75). Then for any $\beta > 0$ the LSI holds, i.e.,
\[
S_\omega(f) \le \frac{2}{K} D_\omega(\sqrt{f}) \tag{13.76}
\]
for any nonnegative normalized density $f$ on $\Sigma_N$, $\int f\, d\omega = 1$, that satisfies $f \in L^\infty$ and $\nabla\sqrt{f} \in L^\infty$. For $\beta \ge 1$ the requirement that $f$ and $\nabla\sqrt{f}$ be bounded can be removed.

Moreover, the Brascamp-Lieb inequality also holds for any $\beta > 0$, i.e., for any bounded function $f \in L^2(\Sigma_N, d\omega)$ we have
\[
\langle f; f\rangle_\omega \le \big\langle \nabla f, \big[\mathcal{H}_\omega''\big]^{-1}\nabla f\big\rangle_\omega. \tag{13.77}
\]
For $\beta \ge 1$ the requirement that $f$ be bounded can be removed.

Furthermore, for any $\beta \ge 1$ the dynamics $\partial_t f_t = L_\omega f_t$ with initial condition $f_0 = f$ is well defined on $\Sigma_N$, and it approaches equilibrium in the sense that
\[
S_\omega(f_t) \le e^{-2tK} S_\omega(f). \tag{13.78}
\]

The inequalities above are understood in the usual sense that they are relevant only when the right hand side is finite. Moreover, by a standard density argument they extend to the closure of the corresponding spaces. For example, (13.76) holds for any $f$ that can be approximated by a sequence of bounded normalized densities $f_n \in L^\infty$ with $\nabla\sqrt{f_n} \in L^\infty$ such that $D_\omega(\sqrt{f_n} - \sqrt{f}) \to 0$.

Before starting the formal proof, we explain the key idea. For the proofs of (13.76) and (13.77), we extend the measure $\omega$ from $\Sigma$ to the entire $\mathbb{R}^N$ in a continuous way by relaxing the strict ordering $x_i < x_{i+1}$ imposed by $\Sigma$ but heavily penalizing the opposite order. In this way we can use Theorem 13.6 for the regularized measure, and we avoid the problematic boundary terms in the integration by parts in (13.33). At the end we remove the regularization using the additional boundedness assumptions on $f$ and $\nabla\sqrt{f}$. For $\beta \ge 1$ these additional assumptions are not necessary, using that $C_0^\infty(\Sigma)$ functions are dense in $H^1(\Sigma, d\omega)$. The entropy decay (13.78) will follow from the LSI and the time-integrated version of the entropy dissipation (13.32), which can be proven directly on $\Sigma$ if $\beta \ge 1$.

We remark that there is an alternative regularization method that mimics the proof of Theorem 13.6 directly on $\Sigma$ for $\beta \ge 1$. It is based on introducing carefully selected cutoff functions in order to approximate $f_t$ by a function compactly supported on $\Sigma$. The compact support renders the boundary terms in the integration by parts zero, but the cutoff does not commute with the dynamics, and the resulting error has to be tracked carefully. The advantage of this alternative method is that it also gives exponential decay of the Dirichlet form, and it is the closest in spirit to the strategy of the proof of Theorem 13.6 on $\mathbb{R}^N$. The disadvantage is that it works only for $\beta \ge 1$; in particular it does not yield the LSI for $\beta \in (0,1)$. We will not discuss this approach in this book; the interested reader may find the details in Appendix B of [64].

Proof. For any $\delta > 0$ define the extension $\mu_\beta^\delta = Z_{\mu,\delta}^{-1} e^{-\beta N \mathcal{H}_\delta}$ of the measure $\mu_\beta$ from $\Sigma_N$ to $\mathbb{R}^N$ by replacing the singular logarithm with a $C^2$-function. To that end, we introduce the approximation parameter $\delta > 0$ and define, for $x \in \mathbb{R}^N$,
\[
\mathcal{H}_\delta(x) := \sum_i \frac{1}{4}x_i^2 - \frac{1}{N}\sum_{i<j}\log_\delta(x_j - x_i), \tag{13.79}
\]
where we set
\[
\log_\delta(x) := \mathbf{1}(x \ge \delta)\log x + \mathbf{1}(x < \delta)\Big(\log\delta + \frac{x-\delta}{\delta} - \frac{1}{2\delta^2}(x-\delta)^2\Big), \qquad x \in \mathbb{R}.
\]

It is easy to check that $\log_\delta \in C^2(\mathbb{R})$, is concave, and satisfies
\[
\lim_{\delta\to 0}\log_\delta(x) = \begin{cases} \log x & \text{if } x > 0, \\ -\infty & \text{if } x \le 0. \end{cases}
\]
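The definition of $\log_\delta$ is easy to probe numerically. The sketch below (not from the book) verifies the $C^1$ matching at the gluing point $x = \delta$ by one-sided difference quotients, checks the monotonicity (13.80) at a sample point, and confirms that $\log_\delta$ agrees with $\log$ on $[\delta, \infty)$.

```python
import math

def log_delta(x, d):
    """Regularized logarithm from (13.79): log x for x >= d, and the
    second-order Taylor polynomial of log around d for x < d."""
    if x >= d:
        return math.log(x)
    return math.log(d) + (x - d) / d - (x - d) ** 2 / (2.0 * d * d)

d, h = 0.1, 1e-7
# one-sided difference quotients at the matching point x = d: both ~ 1/d = 10
left = (log_delta(d, d) - log_delta(d - h, d)) / h
right = (log_delta(d + h, d) - log_delta(d, d)) / h
print(left, right)
# monotonicity (13.80): log_delta decreases pointwise as delta decreases
print(log_delta(-0.5, 0.2), log_delta(-0.5, 0.1))   # the first value is larger
```

The same difference-quotient check applied to the first derivative confirms the $C^2$ statement, since the quadratic Taylor term makes the second derivative continuous at $x = \delta$ as well.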


The convergence is monotone decreasing, i.e.,
\[
\log_\delta(x) \ge \log_{\delta'}(x), \qquad x \in \mathbb{R}, \tag{13.80}
\]
for any $\delta \ge \delta'$. Furthermore, we have
\[
\partial_x \log_\delta(x) = \begin{cases} \dfrac{1}{x} & \text{if } x \ge \delta, \\[2mm] \dfrac{2\delta - x}{\delta^2} & \text{if } x \le \delta, \end{cases}
\]
and the lower bound
\[
\partial_x^2 \log_\delta(x) \ge \begin{cases} -\dfrac{1}{x^2} & \text{if } x \ge \delta, \\[2mm] -\dfrac{1}{\delta^2} & \text{if } x \le \delta, \end{cases}
\]
in particular $\mathcal{H}_\delta$ is convex with $\mathcal{H}_\delta'' \ge \frac{1}{2}$. Similarly, we can extend the measure $\omega$ to $\mathbb{R}^N$ by setting

\[
\omega_\delta := Z_{\omega,\delta}^{-1} e^{-\beta N \widehat{\mathcal{H}}}\mu_\beta^\delta, \tag{13.81}
\]
and its Hamiltonian still satisfies (13.75). By the monotonicity (13.80), we know that $e^{-\beta N \mathcal{H}_\delta(x)}$ is pointwise monotonically decreasing as $\delta \to 0$, and clearly $Z_{\mu,\delta}$ and $Z_{\omega,\delta}$ converge to 1 as $\delta \to 0$. Clearly, $\omega_\delta \to \omega$ as $\delta \to 0$ in the sense that for any set $A \subset \mathbb{R}^N$ we have $\omega_\delta(A) \to \omega(A \cap \Sigma_N)$. We remark that the regularized DBM corresponding to the measure $\omega_\delta$ is given by

\[
dx_i = \frac{\sqrt{2}}{\sqrt{\beta N}}\, dB_i + \Big(-\frac{1}{2}x_i - U_i'(x_i) + \frac{1}{N}\sum_{j<i}\log_\delta'(x_i - x_j) - \frac{1}{N}\sum_{j>i}\log_\delta'(x_j - x_i)\Big)\, dt. \tag{13.82}
\]

Given a function $f$ on $\Sigma_N$ such that $\int_{\Sigma_N} f\, d\omega = 1$ and $D_\omega(\sqrt{f}) = \frac{1}{\beta N}\int_{\Sigma_N}|\nabla\sqrt{f}|^2\, d\omega < \infty$, we extend it by symmetry to $\mathbb{R}^N$. To do so, we first define the symmetrized version of $\Sigma_N$, i.e.,
\[
\widetilde{\Sigma}_N := \mathbb{R}^N \setminus \big\{x : \exists\, i \ne j,\ x_i = x_j\big\}, \tag{13.83}
\]
which has the disjoint union structure
\[
\widetilde{\Sigma}_N = \bigcup_{\pi\in S_N} \pi(\Sigma_N),
\]
where $\pi$ runs through all $N$-element permutations and acts by permuting the coordinates of any point $x \in \mathbb{R}^N$. For any $x \in \widetilde{\Sigma}_N$ there is a unique $\pi$ such that $x \in \pi(\Sigma_N)$, and we then define the extension $\bar f$ by $\bar f(x) := f(\pi^{-1}(x))$. Clearly, $\pi^{-1}(x)$ is simply the point whose coordinates are those of $x = (x_1, x_2, \ldots, x_N)$ permuted in increasing order. Thus $\bar f$ is defined on $\mathbb{R}^N$ apart from a set of measure zero, and it is bounded since $f \in L^\infty$. Furthermore, $\nabla[(\bar f)^{1/2}]$ is also bounded since $\nabla\sqrt{f} \in L^\infty$.
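In code, the symmetric extension is just evaluation on the increasingly sorted coordinates. A toy sketch, not from the book, with a hypothetical gap observable $f$:

```python
def extend_symmetric(f):
    """Extension f-bar of a function f given on the ordered simplex:
    f-bar(x) := f(pi^{-1}(x)) = f(coordinates of x sorted increasingly)."""
    return lambda x: f(sorted(x))

f = lambda x: x[1] - x[0]          # hypothetical observable: the first gap
fbar = extend_symmetric(f)
print(fbar([3.0, 1.0, 2.0]))       # -> 1.0, same as f([1.0, 2.0, 3.0])
```

By construction $\bar f$ is symmetric under permutations of the coordinates, which is exactly the disjoint-union picture above.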

Since $\bar f$ is bounded, we have $\int_{\mathbb{R}^N}\bar f\, d\omega_\delta \to \int_{\Sigma_N} f\, d\omega = 1$, so there is a constant $C_\delta$ such that $f_\delta := C_\delta \bar f$ is a probability density, i.e., $\int_{\mathbb{R}^N} f_\delta\, d\omega_\delta = 1$, and clearly $C_\delta \to 1$ as $\delta \to 0$. Applying Theorem 13.6 to the measure $\omega_\delta$ and the function $f_\delta$ on $\mathbb{R}^N$, we see that the LSI holds for $\omega_\delta$, i.e.,
\[
S_{\omega_\delta}(f_\delta) \le \frac{2}{K} D_{\omega_\delta}(\sqrt{f_\delta}),
\]
or equivalently,
\[
\int_{\mathbb{R}^N} \bar f \log \bar f\, d\omega_\delta + \log C_\delta \le \frac{2C_\delta}{K}\, D_{\omega_\delta}\big(\sqrt{\bar f}\big).
\]
Now we let $\delta \to 0$. Using the boundedness of $\nabla[(\bar f)^{1/2}]$, the weak convergence of $\omega_\delta$ to $\omega$, and that $Z_{\omega,\delta}$ and $C_\delta$ converge to 1, the Dirichlet form on the right side of the last inequality converges to $D_\omega(\sqrt{f})$. The first term on the left side converges to $\int f\log f\, d\omega$ by dominated convergence, and the second term converges to zero. Thus we arrive at (13.76).


In the above argument, the boundedness of $f$ and $\nabla\sqrt{f}$ was only used to ensure that $f$, or rather its extension $\bar f$, has finite integral and that the Dirichlet form w.r.t. the regularized measure $\omega_\delta$, $D_{\omega_\delta}(\sqrt{\bar f})$, converges to $D_\omega(\sqrt{f})$. For $\beta \ge 1$ we can remove these conditions by using a different extension of $f$ to $\mathbb{R}^N$ if $f \in H^1(\Sigma, d\omega)$. We may assume that $D_\omega(\sqrt{f}) < \infty$; otherwise (13.76) is a tautology. We first still assume that $f \in L^\infty(\Sigma)$. We smoothly cut off $f$ to be zero at the boundary of $\Sigma_N$, i.e., we find a nonnegative sequence $f_\varepsilon \in C_0^\infty(\Sigma_N)$ such that $\sqrt{f_\varepsilon} \to \sqrt{f}$ in $H^1(\Sigma, d\omega)$ and $\int f_\varepsilon\, d\omega = 1$. For $\beta \ge 1$ the existence of a similar sequence, but with $f_\varepsilon \to f$ in $H^1$, was shown in Section 12.4. In fact, the same construction shows that we can also guarantee $\sqrt{f_\varepsilon} \to \sqrt{f}$ in $H^1(\Sigma, d\omega)$. Now we use the LSI for the smooth functions $f_\varepsilon$, i.e.,
\[
S_\omega(f_\varepsilon) \le \frac{2}{K} D_\omega(\sqrt{f_\varepsilon}),
\]

and we let $\varepsilon \to 0$. The right hand side converges to $D_\omega(\sqrt{f})$ by the above choice of $f_\varepsilon$. For the left hand side, recall that, apart from a smoothing which can be dealt with via standard approximation arguments, the cutoff function was constructed in the form $f_\varepsilon(x) = C_\varepsilon \phi_\varepsilon(x) f(x)$, where $\phi_\varepsilon(x) \in (0,1)$ with $\phi_\varepsilon \nearrow 1$ monotonically pointwise, and $C_\varepsilon$ is a normalization such that $C_\varepsilon \to 1$ as $\varepsilon \to 0$. Clearly,
\[
S_\omega(f_\varepsilon) = C_\varepsilon \log C_\varepsilon \int_\Sigma \phi_\varepsilon f\, d\omega + C_\varepsilon \int_\Sigma f\phi_\varepsilon \log\phi_\varepsilon\, d\omega + C_\varepsilon \int_\Sigma \phi_\varepsilon f\log f\, d\omega.
\]
The first term converges to zero since $\int \phi_\varepsilon f\, d\omega \le 1$ and $C_\varepsilon\log C_\varepsilon \to 0$. The second term also converges to zero by dominated convergence, since $|\phi_\varepsilon\log\phi_\varepsilon|$ is bounded by one and goes to zero pointwise. Finally, the last term converges to $S_\omega(f)$ by monotone convergence. This proves (13.76) for $\beta \ge 1$ for any bounded $f$. Finally we remove this last condition. Given any $f$ with $D_\omega(\sqrt{f}) < \infty$, we define $f_M := \min\{f, M\}$ and

√f) < ∞, we define fM := minf,M and

fM := CMfM , where CM is the normalization. Clearly, CM → 1 and fM f pointwise. Since (13.76) holds

for fM , we have

CM logCM

∫fMdω + CM

∫fM log fMdω 6

2

KCM

∫|∇√fM |2dω. (13.84)

Now we let M →∞. The first term on the left is just logCM → 0. The second term on the left converges toSω(f) by monotone convergence and CM → 1 and similarly the right hand side converges to (2/K)Dω(

√f)

by monotone convergence. This proves (13.76) for β > 1 without any additional condition on f .The Brascamp-Lieb inequality, (13.77), is proved similarly, starting from its regularized version on RN

\[
\langle f; f\rangle_{\omega_\delta} \le \big\langle \nabla f, \big[\mathcal{H}_\delta'' + \widehat{\mathcal{H}}''\big]^{-1}\nabla f\big\rangle_{\omega_\delta},
\]
which follows directly from Theorem 13.13. We can then take the limit $\delta \to 0$, using monotone convergence on the left and dominated convergence on the right, together with the facts that $\mathcal{H}_\delta'' \to \mathcal{H}''$ and that the inverse $\big[\mathcal{H}_\delta'' + \widehat{\mathcal{H}}''\big]^{-1}$ is uniformly bounded.

For the third part of the theorem, the proof of (13.78), we first note that the remark after (12.17) applies to the generator $L_\omega$ as well, i.e., $\beta \ge 1$ is necessary for the well-posedness of the equation $\partial_t f_t = L_\omega f_t$ on $\Sigma_N$ with initial condition $f_0$ supported on $\Sigma_N$. The construction of the dynamics in Section 12.4 also implies that $f_t \in H^1(d\omega)$ for any $t > 0$ if $f \in L^2(d\omega)$.

We now mimic the proof of the entropy dissipation (13.32) in our setup. Since we do not know that $D_\omega(\sqrt{f_t}) < \infty$, we introduce a regularization $c > 0$ to keep $f_t$ away from zero. We compute
\[
\frac{d}{dt}\int f_t \log(f_t + c)\, d\omega = \int (Lf_t)\log(f_t + c)\, d\omega + \int \frac{f_t\, L f_t}{f_t + c}\, d\omega
= -\frac{1}{\beta N}\int \frac{|\nabla f_t|^2}{f_t + c}\, d\omega - \int \frac{c\, L f_t}{f_t + c}\, d\omega
= -\frac{1}{\beta N}\int \frac{|\nabla f_t|^2}{f_t + c}\, d\omega - \frac{1}{\beta N}\int \frac{c\, |\nabla f_t|^2}{(f_t + c)^2}\, d\omega. \tag{13.85}
\]


Owing to the regularization $c > 0$, we used integration by parts for $H^1(d\omega)$ functions only. Dropping the last term, which is negative, and integrating from 0 to $t$, we obtain
\[
\int f_t\log(f_t + c)\, d\omega + \frac{1}{\beta N}\int_0^t ds \int \frac{|\nabla f_s|^2}{f_s + c}\, d\omega \le \int f_0\log(f_0 + c)\, d\omega. \tag{13.86}
\]
Since $f_t\log\frac{f_t + c}{f_t} \ge 0$, we have
\[
\int f_t\log f_t\, d\omega + \frac{1}{\beta N}\int_0^t ds \int \frac{|\nabla f_s|^2}{f_s + c}\, d\omega \le \int f_0\log(f_0 + c)\, d\omega. \tag{13.87}
\]

Note that both terms on the left hand side are nonnegative. Now we let $c \to 0$. By monotone convergence and $S_\omega(f_0) < \infty$, we get
\[
\int f_t\log f_t\, d\omega + \frac{1}{\beta N}\int_0^t ds \int \frac{|\nabla f_s|^2}{f_s}\, d\omega \le \int f_0\log f_0\, d\omega, \tag{13.88}
\]
or
\[
S_\omega(f_t) + 4\int_0^t D_\omega(\sqrt{f_s})\, ds \le S_\omega(f_0). \tag{13.89}
\]

This is the entropy dissipation inequality in time-integrated form. Notice that neither equality nor a differential version like (13.32) is claimed. Note that by integrating (13.85) between $\tau$ and $t$, a similar argument yields
\[
S_\omega(f_t) + 4\int_\tau^t D_\omega(\sqrt{f_s})\, ds \le S_\omega(f_\tau), \qquad t \ge \tau \ge 0. \tag{13.90}
\]
In particular the entropy decays:
\[
0 \le S_\omega(f_t) \le S_\omega(f_\tau), \qquad t \ge \tau \ge 0. \tag{13.91}
\]

Now we use the LSI (13.76) to estimate $D_\omega(\sqrt{f_s})$ in (13.90), recalling that the LSI holds for any $f_s$ since $\beta \ge 1$. We get
\[
S_\omega(f_t) + 2K\int_\tau^t S_\omega(f_s)\, ds \le S_\omega(f_\tau), \qquad t \ge \tau \ge 0. \tag{13.92}
\]
A standard calculus exercise shows that $S_\omega(f_t) \le e^{-2Kt}S_\omega(f_0)$ for all $t \ge 0$. One possible argument is to fix any $\delta > 0$ and choose $\tau = (n-1)\delta$, $t = n\delta$ with $n = 1, 2, \ldots$ in (13.92). By monotonicity of the entropy we have
\[
(1 + 2K\delta)\, S_\omega(f_{n\delta}) \le S_\omega(f_{(n-1)\delta})
\]
for any $n$, and by iteration we obtain
\[
S_\omega(f_{n\delta}) \le (1 + 2K\delta)^{-n}\, S_\omega(f_0)\,.
\]

Setting δ = t/n and letting n→∞ we get (13.78). This completes the proof of Theorem 13.14.
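The elementary limit used in this last step can be checked directly. The sketch below (illustrative values for $K$ and $t$) shows that $(1 + 2Kt/n)^{-n}$ decreases to $e^{-2Kt}$ as $n \to \infty$, so the iterated discrete bound indeed yields the exponential decay (13.78).

```python
import math

K, t = 0.7, 2.0                       # arbitrary illustrative values
exact = math.exp(-2.0 * K * t)
for n in (10, 100, 10000):
    print(n, (1.0 + 2.0 * K * t / n) ** (-n))
print("limit", exact)
```

Since $(1 + a/n)^n$ increases to $e^a$, the discrete bound $(1 + 2K\delta)^{-n}$ approaches the exponential from above, i.e., each finite-$n$ bound is slightly weaker than the limit.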


14 Universality of the Dyson Brownian Motion

In this section we assume $\beta \ge 1$, since the Dyson Brownian motion in a strong sense is well defined only for these values of $\beta$. However, some results hold for any $\beta > 0$; we will comment on these separately. We consider the Gaussian equilibrium measure $\mu = \mu_G \sim \exp(-\beta N\mathcal{H})$ on $\Sigma_N$, defined in (12.13) in Section 12.3 with the Gaussian potential $V(\lambda) = \frac{1}{2}\lambda^2$. From (12.14) we recall the corresponding Dirichlet form
\[
D_\mu(f) := \frac{1}{\beta N}\sum_{i=1}^N \int (\partial_i f)^2\, d\mu = \frac{1}{\beta N}\|\nabla f\|_{L^2(\mu)}^2, \qquad (\partial_i \equiv \partial_{\lambda_i}), \tag{14.1}
\]
and the generator $L = L_G$ defined via
\[
D_\mu(f) = -\int f L_G f\, d\mu, \tag{14.2}
\]
and explicitly given by $L_G = \frac{1}{\beta N}\Delta - (\nabla\mathcal{H})\cdot\nabla$. The corresponding dynamics is given by (12.17), i.e.,
\[
\partial_t f_t = L_G f_t, \qquad t \ge 0. \tag{14.3}
\]

In this section, we will drop all subscripts $G$.

As remarked in the previous section, the Hamiltonian $\mathcal{H}$ is convex, since the Hessian of the Hamiltonian of $\mu$ satisfies $\nabla^2(\beta N\mathcal{H}) \ge \beta N/2$ by (13.70). Taking into account the different normalizations of the Dirichlet forms in (13.25) and (14.1), Theorem 13.6 (actually, its extension to $\Sigma_N$ in Theorem 13.14 with $U_j \equiv 0$) guarantees that $\mu$ satisfies the LSI in the form
\[
S_\mu(f) \le 4 D_\mu(\sqrt{f}),
\]
and the relaxation time to equilibrium is of order one. Furthermore, for initial data with finite entropy, we have convergence to the equilibrium $\mu_G$ in total variation norm (13.78) at exponential speed on time scales beyond order one.

The following theorem is the main result of this section. It shows that, under a certain rigidity condition on the eigenvalues, the relaxation times of the DBM for observables depending only on the eigenvalue differences are in fact much shorter than order one. The main assumption is the a priori estimate (12.18), which we restate here.

A priori estimate: There exists $\xi > 0$ such that
\[
Q := \sup_{0\le t\le N}\frac{1}{N}\int \sum_{j=1}^N (\lambda_j - \gamma_j)^2\, f_t(\lambda)\, \mu_G(d\lambda) \le CN^{-2+2\xi} \tag{14.4}
\]
with a constant $C$ uniform in $N$. We also assume that after time $1/N$ the solution of the equation (14.3) satisfies the bound
\[
S_\mu(f_{1/N}) \le CN^m \tag{14.5}
\]
for some fixed $m$. Later, in Lemma 14.6, we will show that for $\beta = 1, 2$ this bound holds automatically.

Theorem 14.1 (Gap universality of the Dyson Brownian motion for short time). [64, Theorem 4.1]
Let $\beta \ge 1$ and assume (14.5). Fix $n \ge 1$ and an array of positive integers, $\mathbf{m} = (m_1, m_2, \ldots, m_n) \in \mathbb{N}_+^n$. Let $G : \mathbb{R}^n \to \mathbb{R}$ be a bounded smooth function with compact support and define
\[
G_{i,\mathbf{m}}(x) := G\big(N(x_i - x_{i+m_1}), N(x_{i+m_1} - x_{i+m_2}), \ldots, N(x_{i+m_{n-1}} - x_{i+m_n})\big). \tag{14.6}
\]
Then for any $\xi \in (0, \frac{1}{2})$ and any sufficiently small $\varepsilon > 0$, independent of $N$, there exist constants $C, c > 0$, depending only on $\varepsilon$ and $G$, such that for any $J \subset \{1, 2, \ldots, N - m_n\}$ we have
\[
\sup_{t\ge N^{-1+2\xi+\varepsilon}}\Big| \int \frac{1}{|J|}\sum_{i\in J} G_{i,\mathbf{m}}(x)\,(f_t\, d\mu - d\mu)\Big| \le CN^\varepsilon\sqrt{\frac{N^2 Q}{|J|\, t}} + Ce^{-cN^\varepsilon}. \tag{14.7}
\]


In particular, if (14.4) holds, then for any $t \ge N^{-1+2\xi+2\varepsilon+\delta}$ with some $\delta > 0$, we have
\[
\Big| \int \frac{1}{|J|}\sum_{i\in J} G_{i,\mathbf{m}}(x)\,(f_t\, d\mu - d\mu)\Big| \le \frac{C}{\sqrt{|J|N^{\delta-1}}} + Ce^{-cN^\varepsilon}. \tag{14.8}
\]
Thus the gap distribution, averaged over $J$ indices, coincides for $f_t\, d\mu$ and $d\mu$ provided that $|J|N^{\delta-1} \to \infty$.

We stated this result for the very general test functions (14.6) because we will need this form later. Obviously, controlling test functions of the form (14.6) for all $n$ is equivalent to controlling test functions of neighboring gaps only (maybe with a larger $n$), so the reader may think only of the case $m_1 = 1, m_2 = 2, \ldots, m_n = n$.

In our applications, $J$ is chosen to be the set of indices of the eigenvalues in the interval $[E - b, E + b]$, and thus $|J| \sim Nb$. This identifies the averaged gap distributions of eigenvalues. However, Theorem 12.4 concerns correlation functions and not gap distributions directly. It is intuitively clear that the information carried by gap distributions and by correlation functions is equivalent if both statistics are averaged on a scale larger than the typical fluctuation of a single eigenvalue (which is smaller than $N^{-1+\varepsilon}$ in the bulk by (11.32)). This standard fact will be proved in Section 14.5.

As pointed out after Theorem 12.4, the input of this theorem, the a priori estimate (14.4), identifies the location of the eigenvalues only on a scale $N^{-1+\xi}$, which is much weaker than the $1/N$ precision encoded in the rescaled eigenvalue differences in (14.7). Moreover, by the rigidity estimate (11.32), the a priori estimate (14.4) holds for any $\xi > 0$ if the initial data of the DBM is a generalized Wigner ensemble. Therefore, Theorem 14.1 holds for any $t \ge N^{-1+\varepsilon}$ with any $\varepsilon > 0$, and this establishes Dyson's conjecture (described in Section 12.3), in the sense of averaged gap distributions, for all generalized Wigner matrices.

14.1 Main ideas behind the proof of Theorem 14.1

The key method is to analyze the relaxation to equilibrium of the dynamics (13.30). This approach was firstintroduced in Section 5.1 of [63]; the presentation here follows [64].

Recall the convexity inequality (13.70) for the Hessian of the Hamiltonian:
\[
\big\langle v, \nabla^2\mathcal{H}(x)\, v\big\rangle \ge \frac{1}{2}\|v\|^2 + \frac{1}{N}\sum_{i<j}\frac{(v_i - v_j)^2}{(x_i - x_j)^2} \ge \frac{1}{2}\|v\|^2, \qquad v \in \mathbb{R}^N, \tag{14.9}
\]

which guarantees relaxation to equilibrium on a time scale of order one. The key idea in the proof of Theorem 14.1 is that the relaxation time is in fact much shorter than order one for local observables that depend only on the eigenvalue differences. The convexity bound (14.9) shows that the relaxation in the direction $v_i - v_j$ is much faster than order one provided that $x_i$ and $x_j$ are close. However, this effect is hard to exploit directly because all modes of different wavelengths are coupled. Our idea is to add an auxiliary strongly convex potential $\widehat{\mathcal{H}}(x)$ to the original Hamiltonian $\mathcal{H}$ to "speed up" the convergence to local equilibrium. On the other hand, we will also show that the cost of this speeding up can be effectively controlled if the a priori estimate (12.18) holds.

The auxiliary potential $\widehat{\mathcal{H}}(x)$ is defined by
\[
\widehat{\mathcal{H}}(x) := \sum_{j=1}^N U_j(x_j), \qquad U_j(x) := \frac{1}{2\tau}(x - \gamma_j)^2, \tag{14.10}
\]
i.e., it is a quadratic confinement on the scale $\sqrt{\tau}$ for each eigenvalue near its classical location, where the parameter $0 < \tau < 1$ will be chosen later on. The total Hamiltonian is given by
\[
\widetilde{\mathcal{H}} := \mathcal{H} + \widehat{\mathcal{H}}, \tag{14.11}
\]
where $\mathcal{H}$ is the Gaussian Hamiltonian given by (4.12) with $V(x) = \frac{1}{2}x^2$. The measure with Hamiltonian $\widetilde{\mathcal{H}}$,
\[
d\omega := \omega(x)\, dx, \qquad \omega := e^{-\beta N\widetilde{\mathcal{H}}}/\widetilde{Z}, \tag{14.12}
\]


will be called the local relaxation measure.

The local relaxation flow is defined to be the flow with the generator $\widetilde{L}$, characterized by the natural Dirichlet form w.r.t. $\omega$; explicitly,
\[
\widetilde{L} = L - \sum_j b_j\partial_j, \qquad b_j = U_j'(x_j) = \frac{x_j - \gamma_j}{\tau}. \tag{14.13}
\]
We will choose $\tau \ll 1$, so that the additional term $\widehat{\mathcal{H}}$ substantially increases the lower bound (13.70) on the Hessian, hence speeding up the dynamics so that the relaxation time is at most of order $\tau$.
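A trivial numerical illustration of the speed-up mechanism (not from the book; all values below are arbitrary): the quadratic potential $\widehat{\mathcal{H}}$ adds exactly $1/\tau$ to the convexity constant, so the Bakry-Émery relaxation time $\sim 1/(2K)$ drops from order one to order $\tau$.

```python
tau = 1e-3
gamma = [-1.0, 0.0, 1.0]              # placeholder classical locations gamma_j

def hat_H(x):
    """Auxiliary potential (14.10): quadratic confinement on scale sqrt(tau)."""
    return sum((xi - gi) ** 2 for xi, gi in zip(x, gamma)) / (2.0 * tau)

# Hess(hat_H) = Id/tau: the second directional difference equals |v|^2/tau exactly
x = [0.3, -0.2, 1.1]
v = [1.0, 2.0, -1.0]
eps = 1e-3
num = (hat_H([a + eps * b for a, b in zip(x, v)]) - 2.0 * hat_H(x)
       + hat_H([a - eps * b for a, b in zip(x, v)])) / eps ** 2
print(num, sum(b * b for b in v) / tau)

# convexity constant K and heuristic relaxation time 1/(2K)
K_mu, K_omega = 0.5, 0.5 + 1.0 / tau
print(1.0 / (2.0 * K_mu), 1.0 / (2.0 * K_omega))   # order one vs order tau
```

Since $\widehat{\mathcal{H}}$ is exactly quadratic, the finite difference reproduces $|v|^2/\tau$ without truncation error; the last two printed numbers contrast the order-one relaxation time for $\mu$ with the order-$\tau$ time for $\omega$.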

The idea of adding an artificial potential $\widehat{\mathcal{H}}$ to speed up the convergence may appear unnatural here. The current formulation is a streamlined version of a much more complicated approach that appeared in [63], which took ideas from the earlier work [59]. Roughly speaking, in the hydrodynamical limit, the short wavelength modes always have shorter relaxation times than the long wavelength modes. A direct implementation of this idea is extremely complicated due to the logarithmic interaction that couples short and long wavelength modes. Adding a strongly convex auxiliary potential $\widehat{\mathcal{H}}$ shortens the relaxation time of the long wavelength modes, but it does not affect the short modes, i.e., the local statistics, which are our main interest. The analysis of the new system is much simpler since the relaxation is now faster, uniformly for all modes. Finally, we need to compare the local statistics of the original system with those of the modified one. It turns out that the difference is governed by $(\nabla\widehat{\mathcal{H}})^2$, which can be directly controlled by the a priori estimate (14.4).

Our method for enhancing the convexity of $\mathcal{H}$ is reminiscent of a standard convexification idea concerning metastable states. To explain the similarity, consider a particle near one of the local minima of a double well potential, the wells being separated by a local maximum, or energy barrier. Although the potential is not globally convex, one may still study a reference problem defined by convexifying the potential around the well in which the particle initially resides. Before the particle reaches the energy barrier, there is no difference between these two problems. Thus questions concerning time scales shorter than the typical escape time can be conveniently answered by considering the convexified problem; in particular, the escape time in the metastability problem itself can be estimated by using convex analysis. Our DBM problem is already convex, but not sufficiently convex. The modification by adding $\widehat{\mathcal{H}}$ enhances convexity without altering the local statistics. This is similar to the convexification in the metastability problem, which does not alter events before the escape time.

14.2 Proof of Theorem 14.1

We will work with the measures $\mu$ and $\omega$ defined on the simplex $\Sigma_N$, and we present the proof as if Theorem 13.6 and its proof were valid not only for measures on $\mathbb{R}^N$ but also on $\Sigma_N$. Theorem 13.14 demonstrated that this is indeed true, except that the exponential decay to equilibrium was proven only in the entropy sense and not in the Dirichlet form sense as in (13.31). For pedagogical simplicity, we first neglect this technicality; in Section 14.3 we will comment on how to remedy this problem with the help of the regularization introduced in Section 13.7.

The core of the proof is divided into three theorems. For the flow with generator L, we have the followingestimates on the entropy and Dirichlet form.

Theorem 14.2. Let $\beta \ge 1$ be arbitrary. Consider the forward equation
\[
\partial_t q_t = \widetilde{L} q_t, \qquad t \ge 0, \tag{14.14}
\]
with the reversible measure $\omega$ defined in (14.12). Let the initial condition $q_0$ satisfy $\int q_0\, d\omega = 1$. Then we have the following estimates:
\[
\partial_t D_\omega(\sqrt{q_t}) \le -\frac{2}{\tau} D_\omega(\sqrt{q_t}) - \frac{1}{\beta N^2}\int \sum_{i,j=1}^N \frac{(\partial_i\sqrt{q_t} - \partial_j\sqrt{q_t})^2}{(x_i - x_j)^2}\, d\omega, \tag{14.15}
\]


\[
\frac{1}{\beta N^2}\int_0^\infty ds \int \sum_{i,j=1}^N \frac{(\partial_i\sqrt{q_s} - \partial_j\sqrt{q_s})^2}{(x_i - x_j)^2}\, d\omega \le D_\omega(\sqrt{q_0}), \tag{14.16}
\]
and the logarithmic Sobolev inequality
\[
S_\omega(q_0) \le C\tau\, D_\omega(\sqrt{q_0}) \tag{14.17}
\]
with a universal constant $C$. Thus the relaxation time to equilibrium is of order $\tau$:
\[
S_\omega(q_t) \le e^{-Ct/\tau} S_\omega(q_0), \qquad D_\omega(\sqrt{q_t}) \le e^{-Ct/\tau} D_\omega(\sqrt{q_0}). \tag{14.18}
\]

Proof. Let $h_t = \sqrt{q_t}$; then, by a calculation similar to (13.33), we have
\[
\partial_t D_\omega(h_t) = \partial_t \frac{1}{\beta N}\int (\nabla h_t)^2\, d\omega \le -\frac{2}{\beta N}\int \nabla h_t\, (\nabla^2\widetilde{\mathcal{H}})\, \nabla h_t\, d\omega. \tag{14.19}
\]
In our case, (13.70) and (14.10) imply that the Hessian of $\widetilde{\mathcal{H}}$ is bounded from below as
\[
\nabla h\, (\nabla^2\widetilde{\mathcal{H}})\, \nabla h \ge \frac{1}{\tau}\sum_j (\partial_j h)^2 + \frac{1}{2N}\sum_{i,j}\frac{(\partial_i h - \partial_j h)^2}{(x_i - x_j)^2}. \tag{14.20}
\]
This proves (14.15) and (14.16). The rest can be proved by straightforward arguments, analogously to (13.32)-(13.37).

Notice that the estimate (14.16) is additional information that we extracted from the Bakry-Émery argument by using the second term in the Hessian estimate (13.70). It plays a key role in the next theorem.

Theorem 14.3 (Dirichlet form inequality). Let $\beta \ge 1$ be arbitrary. Let $q$ be a probability density with respect to the local relaxation measure $\omega$ from (14.12), i.e., $\int q\, d\omega = 1$. Fix $n \ge 1$, an array of positive integers $\mathbf{m} \in \mathbb{N}_+^n$ and a smooth function $G : \mathbb{R}^n \to \mathbb{R}$ with compact support, as in Theorem 14.1. Then for any $J \subset \{1, 2, \ldots, N - m_n\}$ and any $t > 0$ we have
\[
\Big| \int \frac{1}{|J|}\sum_{i\in J} G_{i,\mathbf{m}}(x)\,(q\, d\omega - d\omega)\Big| \le C\Big(\frac{t\, D_\omega(\sqrt{q})}{|J|}\Big)^{1/2} + C\sqrt{S_\omega(q)}\, e^{-ct/\tau}. \tag{14.21}
\]

Proof. We give the proof for the case $\beta \ge 1$, since this is what is relevant for Theorem 14.1. The general case $\beta > 0$, under additional assumptions, will be discussed in Section 14.4. For simplicity of notation, we consider only the case $n = 1$, $m_1 = 1$, $G_{i,\mathbf{m}}(x) = G(N(x_i - x_{i+1}))$. Let $q_t$ satisfy
\[
\partial_t q_t = \widetilde{L} q_t, \qquad t \ge 0,
\]
with initial condition $q_0 = q$. We write
\[
\int \Big[\frac{1}{|J|}\sum_{i\in J} G(N(x_i - x_{i+1}))\Big](q - 1)\, d\omega \tag{14.22}
\]
\[
= \int \Big[\frac{1}{|J|}\sum_{i\in J} G(N(x_i - x_{i+1}))\Big](q - q_t)\, d\omega + \int \Big[\frac{1}{|J|}\sum_{i\in J} G(N(x_i - x_{i+1}))\Big](q_t - 1)\, d\omega.
\]

The second term in (14.22) can be estimated by (13.8), the decay of the entropy (14.18) and the boundednessof G; this gives the second term in (14.21).

To estimate the first term in (14.22), by the evolution equation $\partial_t q_t = \widetilde{L} q_t$ and the definition of $\widetilde{L}$ we have
\[
\int \frac{1}{|J|}\sum_{i\in J} G(N(x_i - x_{i+1}))\, q_t\, d\omega - \int \frac{1}{|J|}\sum_{i\in J} G(N(x_i - x_{i+1}))\, q_0\, d\omega
= -\frac{1}{\beta}\int_0^t ds \int \frac{1}{|J|}\sum_{i\in J} G'(N(x_i - x_{i+1}))\,\big[\partial_i q_s - \partial_{i+1} q_s\big]\, d\omega.
\]


From the Schwarz inequality and $\partial q = 2\sqrt{q}\,\partial\sqrt{q}$, the last term is bounded in absolute value by
\[
\frac{2}{\beta}\bigg[\int_0^t ds \int_{\mathbb{R}^N} \frac{N^2}{|J|^2}\sum_{i\in J}\big[G'(N(x_i - x_{i+1}))\big]^2 (x_i - x_{i+1})^2\, q_s\, d\omega\bigg]^{1/2} \tag{14.23}
\]
\[
\times \bigg[\int_0^t ds \int_{\mathbb{R}^N} \frac{1}{N^2}\sum_i \frac{1}{(x_i - x_{i+1})^2}\big[\partial_i\sqrt{q_s} - \partial_{i+1}\sqrt{q_s}\big]^2\, d\omega\bigg]^{1/2}
\le C\Big(\frac{D_\omega(\sqrt{q_0})\, t}{|J|}\Big)^{1/2},
\]
where we have used (14.16) and that $\big[G'(N(x_i - x_{i+1}))\big]^2 (x_i - x_{i+1})^2 \le CN^{-2}$, due to $G$ being smooth and compactly supported.

Alternatively, we could have directly estimated the left hand side of (14.21) by the total variation norm between $q\omega$ and $\omega$, which in turn can be estimated by the entropy and the Dirichlet form using the logarithmic Sobolev inequality, i.e., by
\[
C\int |q - 1|\, d\omega \le C\sqrt{S_\omega(q)} \le C\sqrt{\tau\, D_\omega(\sqrt{q})}. \tag{14.24}
\]
However, compared with this simple bound, the estimate (14.21) gains an extra factor $|J| \sim N$ in the denominator, i.e., it is in terms of the Dirichlet form per particle. The improvement is due to the fact that the observable in (14.21) depends only on the gaps, i.e., on differences of points. This allows us to exploit the additional term (14.16) gained in the Bakry-Émery argument. It is a manifestation of the general observation that gap-observables behave much better than point-observables.

The final ingredient in proving Theorem 14.1 is the following entropy and Dirichlet form estimates.

Theorem 14.4. Let $\beta \ge 1$ be arbitrary. Suppose that (13.70) holds and recall the definition of $Q$ from (12.18). Fix some (possibly $N$-dependent) $\tau > 0$ and consider the local relaxation measure $\omega$ with this $\tau$. Set $\psi := \omega/\mu$ and let $g_t := f_t/\psi$, with $f_t$ solving the evolution equation (14.3). Suppose there is a constant $m$ such that
\[
S(f_\tau\mu|\omega) \le CN^m. \tag{14.25}
\]
Fix any small $\varepsilon > 0$. Then for any $t \in [\tau N^\varepsilon, N]$ the entropy and the Dirichlet form satisfy the estimates
\[
S(g_t\omega|\omega) \le CN^2 Q\tau^{-1}, \qquad D_\omega(\sqrt{g_t}) \le CN^2 Q\tau^{-2}, \tag{14.26}
\]
where the constants depend on $\varepsilon$ and $m$.

Remark 14.5. It will not be difficult to check that if the initial data is given by a Wigner ensemble, then (14.25) holds without any further assumption; see Lemma 14.6.

Proof of Theorem 14.4. The evolution of the entropy $S(f_t\mu|\omega) = S(f_t\mu|\psi\mu)$ can be computed by Lemma 13.11 as
$$\partial_t S(f_t\mu|\omega) = -\frac{4}{\beta N}\sum_j \int (\partial_j \sqrt{g_t})^2\, \psi\,d\mu + \int (L g_t)\,\psi\,d\mu,$$
where $L$ is defined in (14.3) and we used that $\psi = \omega/\mu$ is time independent. Hence we have, by using (14.13),
$$\partial_t S(f_t\mu|\omega) = -\frac{4}{\beta N}\sum_j \int (\partial_j \sqrt{g_t})^2\,d\omega + \int L g_t\,d\omega + \sum_j \int b_j\,\partial_j g_t\,d\omega.$$
Since $\omega$ is $L$-invariant and time independent, the middle term on the right hand side vanishes. From the Schwarz inequality and $\partial g = 2\sqrt{g}\,\partial\sqrt{g}$, we have
$$\partial_t S(f_t\mu|\omega) \le -2D_\omega(\sqrt{g_t}) + CN\sum_j \int b_j^2\, g_t\,d\omega \le -2D_\omega(\sqrt{g_t}) + CN^2Q\tau^{-2}. \tag{14.27}$$
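The Schwarz step behind (14.27) is the standard splitting of the cross term; schematically, with constants depending on the normalization convention for $D_\omega$:

```latex
\sum_j \int b_j\,\partial_j g_t\,d\omega
 = 2\sum_j \int b_j\,\sqrt{g_t}\;\partial_j\sqrt{g_t}\,d\omega
 \le \frac{2}{\beta N}\sum_j \int (\partial_j\sqrt{g_t})^2\,d\omega
   + \frac{\beta N}{2}\sum_j \int b_j^2\, g_t\,d\omega .
```

The first term on the right is absorbed into the dissipative term $-\frac{4}{\beta N}\sum_j\int(\partial_j\sqrt{g_t})^2\,d\omega$, leaving $-2D_\omega(\sqrt{g_t})$, while the second term is of order $N\sum_j\int b_j^2\,g_t\,d\omega$.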


Notice that (14.27) is reminiscent of (13.32) for the derivative of the entropy of the measure $g_t\omega = f_t\mu$ with respect to $\omega$. The difference is, however, that $g_t$ does not satisfy the evolution equation with the generator $L$; the last term in (14.27) expresses the resulting error.

Together with the logarithmic Sobolev inequality (14.17), we have
$$\partial_t S(f_t\mu|\omega) \le -2D_\omega(\sqrt{g_t}) + CN^2Q\tau^{-2} \le -C\tau^{-1}S(f_t\mu|\omega) + CN^2Q\tau^{-2}. \tag{14.28}$$
Integrating the last inequality from $\tau$ to $t$ and using the assumption (14.25) together with $t \ge \tau N^{\varepsilon}$, we have proved the first inequality of (14.26). Using this result and integrating (14.27), we have
$$\int_\tau^t D_\omega(\sqrt{g_s})\,ds \le CN^2Q\tau^{-1}. \tag{14.29}$$
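The integration of (14.28) is a standard Grönwall argument; schematically (constants not tracked):

```latex
\partial_t S \le -\,C\tau^{-1}S + CN^2Q\tau^{-2}
\quad\Longrightarrow\quad
S(f_t\mu|\omega)\;\le\; e^{-C(t-\tau)/\tau}\,S(f_\tau\mu|\omega) + CN^2Q\tau^{-1}.
```

For $t \ge \tau N^{\varepsilon}$ the prefactor $e^{-C(t-\tau)/\tau} \le e^{-cN^{\varepsilon}}$ makes the contribution of the initial entropy $S(f_\tau\mu|\omega) \le CN^m$ from (14.25) negligible, which yields the first bound in (14.26).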

Notice that
$$D_\mu(\sqrt{f_s}) = \frac{1}{\beta N}\int \frac{|\nabla(g_s\psi)|^2}{g_s\psi}\,d\mu \le \frac{C}{\beta N}\int \Big[\frac{|\nabla g_s|^2}{g_s} + |\nabla\log\psi|^2\, g_s\Big]d\omega = C D_\omega(\sqrt{g_s}) + CN^2Q\tau^{-2} \tag{14.30}$$

by a Schwarz inequality. Thus from (14.29), after restricting the integration and using $t \ge 2\tau$, we get
$$CN^2Q\tau^{-1} \ge \int_{t-\tau}^t D_\omega(\sqrt{g_s})\,ds \ge \int_{t-\tau}^t \Big[\frac{1}{C}D_\mu(\sqrt{f_s}) - CN^2Q\tau^{-2}\Big]ds \ge \frac{\tau}{C}D_\mu(\sqrt{f_t}) - CN^2Q\tau^{-1}, \tag{14.31}$$

where, in the last step, we used that $D_\mu(\sqrt{f_t})$ is decreasing in $t$, which follows from the convexity of the Hamiltonian of $\mu$, see e.g. (13.33). Using the opposite inequality $D_\omega(\sqrt{g_t}) \le C D_\mu(\sqrt{f_t}) + CN^2Q\tau^{-2}$, which can be proven similarly to (14.30), we obtain
$$D_\omega(\sqrt{g_t}) \le CN^2Q\tau^{-2},$$
i.e., the second inequality of (14.26).

Finally, we complete the proof of Theorem 14.1. For any given $t > 0$ we now choose $\tau := tN^{-\varepsilon}$ and construct the local relaxation measure $\omega$ with this $\tau$ as in (14.12). Set $\psi = \omega/\mu$ and let $q := g_t = f_t/\psi$ be the density $q$ in Theorem 14.3. We would like to apply Theorem 14.4, and for this purpose we need to verify the assumption (14.25). By the definitions of $\omega$, $\mu$ and $\psi = \omega/\mu$, we have

$$S(f_\tau\mu|\omega) = \int f_\tau \log f_\tau\,d\mu - \int f_\tau \log\psi\,d\mu = S_\mu(f_\tau) - \int f_\tau \log\psi\,d\mu. \tag{14.32}$$

We can bound the last term by
$$-\int f_\tau \log\psi\,d\mu \le CN\int f_\tau W\,d\mu + \log\frac{Z}{\widetilde Z} \le CN^2Q\tau^{-1} \le CN^m \tag{14.33}$$
for some $m$, where we have used that $\widetilde Z \le Z$ since $\widetilde H \ge H$ (with $\widetilde H$ and $\widetilde Z$ denoting the Hamiltonian and normalization of $\omega$). To verify the assumption (14.25), it remains to prove that $S_\mu(f_\tau) \le CN^m$ for some $m$. This inequality follows from the following lemma.

Lemma 14.6. Let $\beta = 1, 2$. Suppose the initial data $f_0$ of the DBM is given by the eigenvalue distribution of a Wigner matrix. Then for any $\tau > 0$ we have
$$S_\mu(f_\tau) \le CN^2\big(1 - \log(1 - e^{-\tau})\big). \tag{14.34}$$

Proof. For simplicity, we consider only the case $\beta = 1$, i.e., real Wigner matrices. Recall that the probability measure $f_\tau\mu$ is the same as the eigenvalue distribution of the Gaussian divisible matrix (12.21):
$$H_\tau = e^{-\tau/2}H_0 + (1 - e^{-\tau})^{1/2}H^G, \tag{14.35}$$


where $H_0$ is the initial Wigner matrix and $H^G$ is an independent standard GOE matrix. Since the entropy is monotone w.r.t. taking a marginal, (13.21), we have
$$S_\mu(f_\tau) = S(f_\tau\mu|\mu) \le S(\mu_{H_\tau}|\mu_{H^G}),$$
where $\mu_{H_\tau}$ and $\mu_{H^G}$ are the laws of the matrices $H_\tau$ and $H^G$, respectively. Since both $\mu_{H_\tau}$ and $\mu_{H^G}$ are given by the product of the laws of the matrix elements, by the additivity of the entropy (13.11), $S(\mu_{H_\tau}|\mu_{H^G})$ equals the sum of the relative entropies of the matrix elements. Recall that the variances of the off-diagonal and diagonal entries of the GOE differ by a factor 2. For simplicity of notation, we consider only the off-diagonal terms. Let $\gamma = 1 - e^{-\tau}$ and denote by $g_\alpha$ the centered Gaussian density with variance $\alpha$, i.e.,
$$g_\alpha(x) := \frac{1}{\sqrt{2\pi\alpha}}\exp\Big(-\frac{x^2}{2\alpha}\Big).$$

Let $\varrho_\gamma$ be the probability density of $(1-\gamma)^{1/2}\zeta = e^{-\tau/2}\zeta$, where $\zeta$ is the random variable of an off-diagonal matrix element. By definition, the probability density of the corresponding matrix element $\zeta_\tau$ of $H_\tau$ is given by $\varrho_\gamma * g_{2\gamma/N}$. Therefore Jensen's inequality yields
$$S(\zeta_\tau|g_{2/N}) = S\Big(\int dy\, \varrho_\gamma(y)\, g_{2\gamma/N}(\cdot - y)\,\Big|\, g_{2/N}\Big) \le \int dy\, \varrho_\gamma(y)\, S\big(g_{2\gamma/N}(\cdot - y)\big|g_{2/N}\big). \tag{14.36}$$

By explicit computation one finds
$$S\big(g_{\sigma^2}(\cdot - y)\big|g_{s^2}\big) = \log\frac{s}{\sigma} + \frac{y^2}{2s^2} + \frac{\sigma^2}{2s^2} - \frac{1}{2}.$$
In our case, we have
$$S\big(g_{2\gamma/N}(\cdot - y)\big|g_{2/N}\big) = \frac{1}{2}\Big(\frac{N}{2}y^2 - \log\gamma + \gamma - 1\Big).$$
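The closed form above is the standard relative entropy between two Gaussians. As a quick sanity check (not part of the proof), one can compare it with a direct numerical evaluation of $\int p\log(p/q)$ for illustrative parameter values:

```python
import numpy as np

# Relative entropy S(p|q) = ∫ p log(p/q) for p = N(y, sig^2), q = N(0, s^2),
# compared against the closed form log(s/sig) + y^2/(2 s^2) + sig^2/(2 s^2) - 1/2.
y, sig, s = 0.3, 0.7, 1.2

x = np.linspace(-20.0, 20.0, 400_001)
dx = x[1] - x[0]
p = np.exp(-(x - y) ** 2 / (2 * sig**2)) / np.sqrt(2 * np.pi * sig**2)
q = np.exp(-x**2 / (2 * s**2)) / np.sqrt(2 * np.pi * s**2)

mask = p > 0  # avoid 0*log(0) where p underflows in the far tails
integrand = np.where(mask, p * np.log(np.where(mask, p, 1.0) / q), 0.0)
numeric = (integrand.sum() - 0.5 * (integrand[0] + integrand[-1])) * dx  # trapezoid rule

closed = np.log(s / sig) + y**2 / (2 * s**2) + sig**2 / (2 * s**2) - 0.5
print(abs(numeric - closed) < 1e-5)  # True
```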

We can now continue the estimate (14.36). Using $\int y^2\varrho_\gamma(y)\,dy \le 2/N$, we obtain
$$S(\zeta_\tau|g_{2/N}) \le C - \log\gamma,$$

and the claim follows.

We now return to the proof of Theorem 14.1. We can apply Lemma 14.6 with the choice $\tau = N^{-1+2\xi}$, where $\xi \in (0, \frac{1}{2})$ is from the assumption (14.4). Together with (14.32)–(14.33), this implies that (14.25) holds. Thus Theorem 14.4 and Theorem 14.3 imply for any $t \in [\tau N^{\varepsilon}, N]$ that

$$\Big|\int \frac{1}{N}\sum_{i\in J} G_{m,i}(\mathbf{x})\,(f_t\,d\mu - d\omega)\Big| \le C\Big(\frac{t\,D_\omega(\sqrt{q})}{|J|}\Big)^{1/2} + C\sqrt{S_\omega(q)}\,e^{-cN^{\varepsilon}} \tag{14.37}$$
$$\le C\Big(\frac{tN^2Q}{|J|\,\tau^2}\Big)^{1/2} + Ce^{-cN^{\varepsilon}} \le CN^{\varepsilon}\sqrt{\frac{N^2Q}{|J|\,t}} + Ce^{-cN^{\varepsilon}},$$

i.e., the local statistics of ftµ and ω are the same for any initial data fτ .

Applying the same argument to the Gaussian initial data, $f_0 = f_\tau = 1$, we can also compare $\mu$ and $\omega$. We have thus proved the estimate (14.7). Finally, if $t \ge N^{-1+2\xi+\delta+2\varepsilon}$, then the assumption (14.4) guarantees that
$$\sqrt{\frac{N^2Q}{|J|\,t}} \le \frac{1}{\sqrt{|J|\,N^{\delta-1}}},$$
which proves Theorem 14.1.


14.3 Restriction to the simplex via regularization

In the previous section we tacitly assumed that the Bakry–Émery analysis from Section 13.3 extends to $\Sigma$. Strictly speaking, this is not fully rigorous. There are two options to handle this technical point.

As we remarked after Theorem 13.14, there is a direct method via cutoff functions to make the Bakry–Émery argument rigorous on $\Sigma$ if $\beta \ge 1$ (see Appendix B of [64]). The same technique can also make every step in Section 14.2 rigorous on $\Sigma$.

We present here a more transparent argument that relies on the regularized measure $\omega_\delta$ on $\mathbb{R}^N$ already used in the proof of the LSI in Theorem 13.14. This path is technically simpler since it performs the core of the analysis on $\mathbb{R}^N$, and it uses an extra input, the strong solution to the DBM, Theorem 12.2, to remove the regularization at the end. We now explain the correct procedure of the proof of Theorem 14.1 using this regularization. The goal is to prove (14.37).

For any small $\delta > 0$ we introduce the regularized versions $\mu_\delta$ and $\omega_\delta$ of the measures $\mu$ and $\omega$ as in Section 13.7. Let $L_{\omega_\delta}$ be the generator of the regularized measure $\omega_\delta$ on $\mathbb{R}^N$, and let $f_{t,\delta}$ be the solution to $\partial_t f_{t,\delta} = L_{\omega_\delta} f_{t,\delta}$ on $\mathbb{R}^N$ with the same initial condition as for $\delta = 0$, i.e., $f_{0,\delta}$ is the extension of $f_0$ to the entire $\mathbb{R}^N$ with $f_{0,\delta}(\mathbf{x}) = 0$ for $\mathbf{x} \notin \Sigma_N$.

The three key theorems, Theorems 14.2–14.4, of Section 14.2 were formulated and proven for the regularized setup on $\mathbb{R}^N$, so they can be applied to the regularized dynamics. With the notation of these theorems, the conclusion is that if
$$S(f_{\tau,\delta}\mu_\delta|\omega_\delta) \le CN^m \tag{14.38}$$
(see (14.25)), then
$$\Big|\int \frac{1}{N}\sum_{i\in J}G_{m,i}(\mathbf{x})\,(f_{t,\delta}\,d\mu_\delta - d\omega_\delta)\Big| \le CN^{\varepsilon}\sqrt{\frac{N^2Q_\delta}{|J|\,t}} + Ce^{-cN^{\varepsilon}}, \qquad t \in [\tau N^{\varepsilon}, N], \tag{14.39}$$
where $Q_\delta$ is defined exactly as in (14.4) with the regularized measures, i.e.,
$$Q_\delta := \sup_{0\le t\le N}\frac{1}{N}\int \sum_{j=1}^N (x_j - \gamma_j)^2 f_{t,\delta}(\mathbf{x})\,\mu_\delta(d\mathbf{x}). \tag{14.40}$$

Now we let $\delta \to 0$. Since $\omega_\delta$ converges weakly to $\omega$ in the sense that $\omega_\delta(A) \to \omega(A \cap \Sigma_N)$ for any Borel set $A \subset \mathbb{R}^N$, we have $\int O(\mathbf{x})\,d\omega_\delta \to \int O(\mathbf{x})\,d\omega$ for any bounded observable $O$. Setting
$$O(\mathbf{x}) = \frac{1}{N}\sum_{i\in J}G_{m,i}(\mathbf{x}), \tag{14.41}$$

we see that the second term inside the absolute value on the left hand side of (14.39) converges to the corresponding term in (14.37). On the right hand side a similar argument shows that $Q_\delta \to Q$ as $\delta \to 0$. Strictly speaking, the observable $(x_j - \gamma_j)^2$ is unbounded, but a very simple cutoff argument shows that
$$\mathbb{P}^{f_{t,\delta}\mu_\delta}\Big(\max_j |x_j| \ge N^2\Big) \le e^{-cN}$$
uniformly in $\delta$ and $t \in [0, N]$.

For the first term on the left hand side of (14.39), we use the stochastic representation of the solution to $\partial_t f_{t,\delta} = L_{\omega_\delta}f_{t,\delta}$ to note that
$$\int_{\mathbb{R}^N} O(\mathbf{x})\,f_{t,\delta}(\mathbf{x})\,d\omega_\delta = \mathbb{E}^{f_0\omega}\,\mathbb{E}^{\mathbf{x}_0}_\delta\, O(\mathbf{x}(t)), \tag{14.42}$$
where $\mathbb{E}^{\mathbf{x}_0}_\delta$ denotes the expectation with respect to the law of the regularized DBM $(\mathbf{x}(t))_t$, see (13.82), starting from $\mathbf{x}_0$, and $\mathbb{E}^{f_0\omega}$ denotes the expectation of $\mathbf{x}_0$ with respect to the measure $f_0\omega$. Now we use the existence of the strong solution to the DBM (13.74) for any $\beta \ge 1$, which can be obtained exactly as in Theorem 12.2 for the $U_j \equiv 0$ case. Since $\mathbf{x}(t)$ is continuous and remains in the open set $\Sigma_N$, the probability


that up to a fixed time $t$ it remains away from a $\delta$-neighborhood of the boundary of $\Sigma_N$ tends to 1 as $\delta \to 0$, i.e.,
$$\lim_{\delta\to 0}\mathbb{P}^{\omega_\delta}\Big(\mathrm{dist}\big(\mathbf{x}(s), \partial\Sigma_N\big) \ge \delta \;\;\text{for all } s \in [0, t]\Big) = 1.$$

Notice that (13.74) and (13.82) are exactly the same for paths that stay away from the boundary of $\Sigma_N$. This means that the right hand side of (14.42) converges to $\mathbb{E}^{f_0\omega}\mathbb{E}^{\mathbf{x}_0}O(\mathbf{x}(t))$, where $\mathbb{E}^{\mathbf{x}_0}$ denotes expectation with respect to the law of (13.74). This proves that
$$\lim_{\delta\to 0}\int_{\mathbb{R}^N} O(\mathbf{x})\,f_{t,\delta}(\mathbf{x})\,d\omega_\delta = \int_{\Sigma_N} O(\mathbf{x})\,f_t(\mathbf{x})\,d\omega, \tag{14.43}$$

which completes the proof of (14.37).

Finally, we need to verify the condition (14.38). Following (14.32)–(14.33), we only need to show that $S_{\mu_\delta}(f_{\tau,\delta}) \le CN^m$ for sufficiently small $\delta > 0$. We first run the DBM for a very short time and replace the original matrix ensemble $H_0$ by $H_\sigma$, its tiny Gaussian convolution of variance $\sigma = N^{-10}$, see (14.35). This is not a restriction, since Theorem 14.1 anyway concerns $f_t$ for much larger times, $t \gg N^{-1}$. The corresponding eigenvalue density $f_\sigma$ supported on $\Sigma_N$ satisfies $S_\mu(f_\sigma) \le CN^{12}$ by (14.34). As a second cutoff, for some large $K$ we set $f_{\sigma,K} := C_K \min\{f_\sigma, K\}$ with a normalization constant $C_K$ such that $\int f_{\sigma,K}\,d\mu = 1$ and $C_K \to 1$ as $K \to \infty$. We will actually apply Theorem 14.1 to $f_{\sigma,K}$ as an initial condition. The regularized flow also starts with this initial condition, i.e., we define $f_{t,\delta}$ to be the solution of $\partial_t f_{t,\delta} = L_{\mu_\delta} f_{t,\delta}$ with $f_{0,\delta} := f_{\sigma,K}$. Since the entropy decays along the regularized flow, we immediately have $S_{\mu_\delta}(f_{\tau,\delta}) \le S_{\mu_\delta}(f_{0,\delta}) = S_{\mu_\delta}(f_{\sigma,K})$. Since $f_{\sigma,K}$ is supported on $\Sigma_N$ and it is bounded, clearly $S_{\mu_\delta}(f_{\sigma,K})$ converges to $S_\mu(f_{\sigma,K})$ as $\delta \to 0$. We then have
$$S_\mu(f_{\sigma,K}) = C_K\int \min\{f_\sigma, K\}\big[\log\min\{f_\sigma, K\} + \log C_K\big]d\mu \to S_\mu(f_\sigma)$$
as $K \to \infty$ by monotone convergence. Thus, first choosing $K$ sufficiently large so that $S_\mu(f_{\sigma,K}) \le 2S_\mu(f_\sigma) \le CN^{12}$, and then choosing $\delta$ sufficiently small, we achieve $S_{\mu_\delta}(f_{\tau,\delta}) \le CN^{12}$.

This verifies the condition (14.38) for some sufficiently small $\delta$, so that (14.39) holds. Letting $\delta \to 0$, we obtain (14.7), which completes the fully rigorous proof of Theorem 14.1.

14.4 Dirichlet form inequality for any β > 0

Here we extend the proof of the key Dirichlet form inequality, Theorem 14.3, from $\beta \ge 1$ to any $\beta > 0$. This extension will not be needed in this book, so the reader may skip this section, but we remark that the Dirichlet form inequality for $\beta > 0$ played a crucial role in the proof of universality for invariant ensembles for any $\beta > 0$, see [25].

Lemma 14.7. Let $\beta > 0$ and let $q$ be a probability density with respect to $\omega$ defined in (14.12) such that $\nabla\sqrt{q} \in L^\infty(d\omega)$ and $q \in L^\infty(d\omega)$. Then for any $J \subset \{1, 2, \ldots, N - m_n - 1\}$ and any $t > 0$ we have
$$\bigg|\int \frac{1}{|J|}\sum_{i\in J}G_{i,m}\, q\,d\omega - \int \frac{1}{|J|}\sum_{i\in J}G_{i,m}\,d\omega\bigg| \le C\sqrt{\frac{D_\omega(\sqrt{q})\,t}{|J|}} + C\sqrt{S_\omega(q)}\,e^{-ct/\tau}. \tag{14.44}$$
For $\beta \ge 1$, the conditions $\nabla\sqrt{q} \in L^\infty$ and $q \in L^\infty(d\omega)$ can be removed.

We emphasize that this lemma holds for any $\beta > 0$, i.e., it does not (directly) rely on the existence of the DBM. The parameter $t$ is not connected with the time parameter of a dynamics on $\Sigma_N$ (although it emerges as a time cutoff in a regularized dynamics on $\mathbb{R}^N$ within the proof).

Proof of Lemma 14.7. First we record the result when $\omega$ on $\Sigma_N$ is replaced with the regularized measure $\omega_\delta$ on $\mathbb{R}^N$, as defined in (13.81) with the choice $\mathcal{H}(\mathbf{x}) = \sum_j U_j(x_j)$, where $U_j$ is given in (14.10). In this case the proof of Theorem 14.3 applies verbatim even for $\beta > 0$, since now we work on $\mathbb{R}^N$ and complications


arising from the boundary are absent. We immediately obtain, for any probability density $q$ on $\mathbb{R}^N$ with $\int q\,d\omega_\delta = 1$, for any $J \subset \{1, 2, \ldots, N - m_n - 1\}$ and any $t > 0$, that
$$\bigg|\int \frac{1}{|J|}\sum_{i\in J}G_{i,m}\, q\,d\omega_\delta - \int \frac{1}{|J|}\sum_{i\in J}G_{i,m}\,d\omega_\delta\bigg| \le C\sqrt{\frac{D_{\omega_\delta}(\sqrt{q})\,t}{|J|}} + C\sqrt{S_{\omega_\delta}(q)}\,e^{-ct/\tau}. \tag{14.45}$$

Suppose now that $\sqrt{q} \in H^1(d\omega)$ and that $q$ is a bounded probability density on $\Sigma_N$ with respect to $\omega$. Similarly to the proof of (13.76) for the general $\beta > 0$ case, we may extend $q$ to $\Sigma$ by symmetrization. Let $\bar q$ denote this extension, which is bounded; moreover $\nabla \bar q^{1/2}$ is also bounded, since $q$ has these properties. Then there is a constant $C_\delta$ such that $q_\delta := C_\delta\, \bar q$ is a probability density with respect to $\omega_\delta$. Since $\bar q$ is bounded, we have $\int_{\mathbb{R}^N}\bar q\,d\omega_\delta \to \int_{\Sigma_N} q\,d\omega$ by dominated convergence, and thus $C_\delta \to 1$ as $\delta \to 0$.

Now we apply (14.45) with $q = q_\delta$. Taking the limit $\delta \to 0$, the left hand side converges to that of (14.44), since $G_{i,m}$ is a bounded smooth function and $q_\delta\,d\omega_\delta = C_\delta\,\bar q\,d\omega_\delta$ converges weakly to $q(\mathbf{x})\,\mathbf{1}(\mathbf{x} \in \Sigma_N)\,d\omega$ by dominated convergence. We thus have
$$\bigg|\int \frac{1}{|J|}\sum_{i\in J}G_{i,m}\, q\,d\omega - \int \frac{1}{|J|}\sum_{i\in J}G_{i,m}\,d\omega\bigg| \le C\limsup_{\delta\to 0}\sqrt{\frac{C_\delta\, D_{\omega_\delta}(\sqrt{\bar q}\,)\,t}{|J|}} + C\limsup_{\delta\to 0}\sqrt{S_{\omega_\delta}(C_\delta\,\bar q)}\,e^{-ct/\tau}. \tag{14.46}$$

By dominated convergence, using that $\nabla\bar q^{1/2} \in L^\infty$, we have $D_{\omega_\delta}(\bar q^{1/2}) \to D_\omega(\sqrt{q})$. Similarly, the entropy term also converges to $S_\omega(q)$ since $q \in L^\infty$. This proves (14.44) under the conditions $q \in L^\infty$ and $\nabla\sqrt{q} \in L^\infty$. Finally, these conditions can be removed for $\beta \ge 1$ by a simple approximation. We may assume $D_\omega(\sqrt{q}) < \infty$; otherwise (14.44) is an empty statement.

By the LSI (13.76) we also know that $S_\omega(q) < \infty$. First, we still keep the assumption that $q \in L^\infty$. Since $C_0^\infty(\Sigma)$ is dense in $H^1(d\omega)$ (see Section 12.4), we can find a sequence of densities $q_n \in L^\infty(\Sigma)$ such that $\nabla\sqrt{q_n} \in L^2(d\omega)$, $\sqrt{q_n} \to \sqrt{q}$ in $L^2(d\omega)$ and $\nabla\sqrt{q_n} \to \nabla\sqrt{q}$ in $L^2(d\omega)$. In fact, the construction in Section 12.4 guarantees that, apart from an irrelevant smoothing, $q_n$ may be chosen of the form $q_n = C_n\phi_n q$, where $\phi_n$ is a cutoff function with $0 \le \phi_n \le 1$ converging to 1 pointwise and $C_n \to 1$. We know that (14.44) holds for every $q_n$. Taking the limit $n \to \infty$, the left hand side converges, since
$$\Big|\int O\,(q_n - q)\,d\omega\Big| \le \|O\|_\infty\,\|\sqrt{q_n}-\sqrt{q}\|_{L^2(d\omega)}\,\|\sqrt{q_n}+\sqrt{q}\|_{L^2(d\omega)} \to 0,$$
where we have used that $\|\sqrt{q_n}+\sqrt{q}\|_2 \le \|\sqrt{q_n}\|_2 + \|\sqrt{q}\|_2$ and $\|\sqrt{q}\|_2^2 = \int q\,d\omega = 1$. Here $O$ is given by (14.41).

For the right hand side of (14.44) applied to the approximating function $q_n$, we note that $D_\omega(\sqrt{q_n}) \to D_\omega(\sqrt{q})$ directly by the choice of $q_n$. Finally, we need to show that $S_\omega(q_n) \to S_\omega(q)$ as $n \to \infty$. Clearly,
$$S_\omega(q_n) - S_\omega(q) = \int_\Sigma \big[q_n\log q_n - q\log q\big]d\omega = \int_\Sigma \big[C_n\phi_n q\log(C_n\phi_n) + (C_n\phi_n - 1)\,q\log q\big]d\omega,$$
and both integrals go to zero by dominated convergence and by $S_\omega(q) < \infty$. This proves (14.44) for any $q \in L^\infty$. Finally, this last condition can be removed by approximating any density $q$ by $q_M := C_M\min\{q, M\}$, where $C_M$ is the normalization and $C_M \to 1$ as $M \to \infty$. Similarly to the proof in (13.84), one can show that $S_\omega(q_M) \to S_\omega(q)$ and $D_\omega(\sqrt{q_M}) \to D_\omega(\sqrt{q})$. Thus we can pass to the limit in the inequality (14.44) written for $q_M$. This proves Lemma 14.7. Notice that the proof did not use the existence of the DBM; instead, it used the existence of the regularized DBM.

and both integrals go to zero by dominated convergence and by Sω(q) <∞. This proves (14.44) for any q ∈L∞. Finally, this last condition can be removed by approximating any density q with qM := CM maxq,M,where CM is the normalization and CM → 1 as M → ∞. Similarly to the proof in (13.84), one can showthat Sω(qM )→ Sω(q) and Dω(qM )→ Dω(q). Thus we can pass to the limit in the inequality (14.44) writtenup for qM . This proves Lemma 14.7. Notice that the proof did not use the existence of the DBM; instead,it used the existence of the regularized DBM.

14.5 From gap distribution to correlation functions: proof of Theorem 12.4.

Theorem 12.4 follows from Theorem 14.1 if we can show that the correlation function difference in (12.23) can be bounded in terms of the difference of the expectations of gap observables in (14.8). We state this as the following lemma, which clearly proves Theorem 12.4 with Theorem 14.1 as an input.


Lemma 14.8. Let $f$ be a probability density with respect to the Gaussian equilibrium measure (12.13). Suppose that the estimate (12.22) holds with some exponent $\xi > 0$ and that
$$\Big|\int \frac{1}{|J|}\sum_{i\in J}G_{i,m}(\mathbf{x})\,(f\,d\mu - d\mu)\Big| \le \frac{C}{\sqrt{|J|\,N^{\delta-1}}} \tag{14.47}$$
also holds for some $\delta > 0$ and for any $J \subset \{1, 2, \ldots, N - m_n\}$, where $G_{i,m}$ is defined in (14.6). Then for any $\varepsilon > 0$ and $N$ large enough, we have for any $b = b_N$ with $N^{-1} \ll b \ll 1$ that
$$\Big|\int_{E-b}^{E+b}\frac{dE'}{2b}\int_{\mathbb{R}^n}d\boldsymbol{\alpha}\; O(\boldsymbol{\alpha})\big(p^{(n)}_{f\mu,N} - p^{(n)}_{\mu,N}\big)\Big(E' + \frac{\boldsymbol{\alpha}}{N\varrho_{sc}(E)}\Big)\Big| \le N^{2\varepsilon}\Big[\frac{N^{-1+\xi}}{b} + \sqrt{\frac{N^{-\delta}}{b}}\Big]. \tag{14.48}$$

This is a fairly standard technical argument, and the details will be given in Section 14.6. Here we just summarize the main points. To understand the difference between (14.47) and (14.48), consider $n = 1$ for simplicity and let $m_1 = 1$, say. The observable (14.7) answers the question: "What is the empirical distribution of differences of consecutive eigenvalues?"; in other words, (14.7) directly identifies the gap distribution. Correlation functions answer the question: "What is the probability that there are two eigenvalues at a fixed distance from each other?"; in other words, they are not directly sensitive to the possible other eigenvalues in between. Of course these two questions are closely related, and it is easy to deduce the answers from each other under some reasonable assumptions on the distribution of the eigenvalues. We now prove that the correlation functions can be derived from (generalized) gap distributions, which is the content of Lemma 14.8.

14.6 Details of the proof of Lemma 14.8

By the definition of the correlation functions in (12.19), we have
$$\int_{E-b}^{E+b}\frac{dE'}{2b}\int_{\mathbb{R}^n}d\boldsymbol{\alpha}\;O(\boldsymbol{\alpha})\,p^{(n)}_{f\mu,N}\Big(E' + \frac{\boldsymbol{\alpha}}{N}\Big) = C_{N,n}\int_{E-b}^{E+b}\frac{dE'}{2b}\int \sum_{i_1\neq i_2\neq\cdots\neq i_n} O\big(N(x_{i_1}-E'), N(x_{i_1}-x_{i_2}), \ldots, N(x_{i_{n-1}}-x_{i_n})\big)\,f\,d\mu$$
$$= C_{N,n}\int_{E-b}^{E+b}\frac{dE'}{2b}\int \sum_{\mathbf{m}\in S_n}\sum_{i=1}^N Y_{i,\mathbf{m}}(E',\mathbf{x})\,f\,d\mu, \tag{14.49}$$
with $C_{N,n} := N^n(N-n)!/N! = 1 + O_n(N^{-1})$, where $S_n$ denotes the set of increasing sequences of positive integers $\mathbf{m} = (m_2, m_3, \ldots, m_n) \in \mathbb{N}_+^{n-1}$, $m_2 < m_3 < \cdots < m_n$, and we introduced
$$Y_{i,\mathbf{m}}(E',\mathbf{x}) := O\big(N(x_i - E'), N(x_i - x_{i+m_2}), \ldots, N(x_i - x_{i+m_n})\big). \tag{14.50}$$

We will set $Y_{i,\mathbf{m}} = 0$ if $i + m_n > N$. By the permutational symmetry of $p^{(n)}_{f\mu,N}$ we can assume that $O$ is symmetric, and thus we may restrict the summation to $i_1 < i_2 < \cdots < i_n$ at the price of an overall factor $n!$. Then we changed indices $i = i_1$, $i_2 = i + m_2$, $i_3 = i + m_3, \ldots$, and performed a resummation over all index differences encoded in $\mathbf{m}$. Apart from the first variable $N(x_{i_1} - E')$, the function $Y_{i,\mathbf{m}}$ is of the form (14.6), so (14.47) will apply. The dependence on the first variable will be negligible after the $dE'$ integration on a macroscopic interval.

To control the error terms in this argument, especially to show that the error terms in the potentially infinite sum over $\mathbf{m} \in S_n$ converge, one needs an a priori bound on the local density. This information is provided very precisely by the bulk rigidity estimate (12.22). The details of the rest of the argument will be presented in the next subsection.

Since (14.49) also holds if we set $f = 1$, to prove Lemma 14.8 we only need to estimate
$$\Theta := \bigg|\int_{E-b}^{E+b}\frac{dE'}{2b}\int \sum_{\mathbf{m}\in S_n}\sum_{i=1}^N Y_{i,\mathbf{m}}(E',\mathbf{x})\,(f-1)\,d\mu\bigg|. \tag{14.51}$$


Let $M$ be an $N$-dependent parameter, chosen at the end of the proof. Let
$$S_n(M) := \{\mathbf{m} \in S_n : m_n \le M\}, \qquad S_n^c(M) := S_n \setminus S_n(M),$$
and note that $|S_n(M)| \le M^{n-1}$. We have the simple bound $\Theta \le \Theta_M^{(1)} + \Theta_M^{(2)} + \Theta_M^{(3)}$, where
$$\Theta_M^{(1)} := \bigg|\int_{E-b}^{E+b}\frac{dE'}{2b}\int \sum_{\mathbf{m}\in S_n(M)}\sum_{i=1}^N Y_{i,\mathbf{m}}(E',\mathbf{x})\,(f-1)\,d\mu\bigg| \tag{14.52}$$
and
$$\Theta_M^{(2)} := \sum_{\mathbf{m}\in S_n^c(M)}\bigg|\int_{E-b}^{E+b}\frac{dE'}{2b}\int \sum_{i=1}^N Y_{i,\mathbf{m}}(E',\mathbf{x})\,f\,d\mu\bigg|. \tag{14.53}$$
We define $\Theta_M^{(3)}$ to be the same as $\Theta_M^{(2)}$ but with $f$ replaced by the constant 1, i.e., by the equilibrium measure $\mu$.

Step 1: Small $\mathbf{m}$ case; estimate of $\Theta_M^{(1)}$.

After performing the $dE'$ integration, we will eventually apply Theorem 14.1 to the function
$$G(u_1, u_2, \ldots) := \int_{\mathbb{R}} O(y, u_1, u_2, \ldots)\,dy,$$
i.e., to the quantity
$$\int_{\mathbb{R}}dE'\; Y_{i,\mathbf{m}}(E',\mathbf{x}) = \frac{1}{N}\,G\big(N(x_i - x_{i+m_2}), \ldots\big) \tag{14.54}$$

for each fixed $i$ and $\mathbf{m}$. For any $E$ and $0 < \zeta < b$ define the index sets $J = J_{E,b,\zeta}$ and $J^\pm = J^\pm_{E,b,\zeta}$ by
$$J := \big\{i : \gamma_i \in [E-b, E+b]\big\}, \qquad J^\pm := \big\{i : \gamma_i \in [E - (b \pm \zeta), E + b \pm \zeta]\big\},$$

where $\gamma_i$ was defined in (11.31). Clearly, $J^- \subset J \subset J^+$. With these notations, we have
$$\int_{E-b}^{E+b}dE'\sum_{i=1}^N Y_{i,\mathbf{m}}(E',\mathbf{x}) = \int_{E-b}^{E+b}dE'\sum_{i\in J^+} Y_{i,\mathbf{m}}(E',\mathbf{x}) + \Omega^+_{J,\mathbf{m}}(\mathbf{x}). \tag{14.55}$$

The error term $\Omega^+_{J,\mathbf{m}}$, defined by (14.55) indirectly, comes from those indices $i \notin J^+$ for which $x_i \in [E-b, E+b] + O(N^{-1})$, since $Y_{i,\mathbf{m}}(E',\mathbf{x}) = 0$ unless $|x_i - E'| \le C/N$, with the constant depending on the support of $O$. Thus
$$|\Omega^+_{J,\mathbf{m}}(\mathbf{x})| \le CN^{-1}\,\#\big\{i : |x_i - \gamma_i| \ge \zeta/2,\; |x_i - E| \le 2b\big\}$$
for any sufficiently large $N$, assuming $\zeta \gg 1/N$ and using that $O$ is a bounded function. The additional $N^{-1}$ factor comes from the $dE'$ integration. Due to the rigidity estimate (12.22), choosing $\zeta = N^{-1+\xi+\varepsilon'}$ with some $\varepsilon' > 0$, we get
$$\int |\Omega^+_{J,\mathbf{m}}(\mathbf{x})|\,f\,d\mu \le N^{-D} \tag{14.56}$$

for any $D$. We can also estimate
$$\int_{E-b}^{E+b}dE'\sum_{i\in J^+} Y_{i,\mathbf{m}}(E',\mathbf{x}) \le \int_{E-b}^{E+b}dE'\sum_{i\in J^-} Y_{i,\mathbf{m}}(E',\mathbf{x}) + CN^{-1}|J^+\setminus J^-|$$
$$= \int_{\mathbb{R}}dE'\sum_{i\in J^-} Y_{i,\mathbf{m}}(E',\mathbf{x}) + CN^{-1}|J^+\setminus J^-| + \Xi^+_{J,\mathbf{m}}(\mathbf{x}) \tag{14.57}$$
$$\le \int_{\mathbb{R}}dE'\sum_{i\in J} Y_{i,\mathbf{m}}(E',\mathbf{x}) + CN^{-1}|J^+\setminus J^-| + CN^{-1}|J\setminus J^-| + \Xi^+_{J,\mathbf{m}}(\mathbf{x}),$$


where the error term $\Xi^+_{J,\mathbf{m}}$, defined by (14.57), comes from indices $i \in J^-$ such that $x_i \notin [E-b, E+b] + O(1/N)$; it satisfies the same bound (14.56) as $\Omega^+_{J,\mathbf{m}}$. By the continuity of $\varrho$, the density of the $\gamma_i$'s is bounded by $CN$, thus $|J^+\setminus J^-| \le CN\zeta$ and $|J\setminus J^-| \le CN\zeta$. Therefore, summing up the formula (14.54) for $i \in J$, we obtain from (14.55), (14.56) and (14.57)
$$\int_{E-b}^{E+b}dE'\int\sum_{i=1}^N Y_{i,\mathbf{m}}(E',\mathbf{x})\,f\,d\mu \le \int \frac{1}{N}\sum_{i\in J}G\big(N(x_i - x_{i+m_2}), \ldots\big)\,f\,d\mu + C\zeta + CN^{-D}$$

for each $\mathbf{m} \in S_n$. A similar lower bound can be obtained analogously, and we get
$$\bigg|\int_{E-b}^{E+b}dE'\int\sum_{i=1}^N Y_{i,\mathbf{m}}(E',\mathbf{x})\,f\,d\mu - \int \frac{1}{N}\sum_{i\in J}G\big(N(x_i - x_{i+m_2}), \ldots\big)\,f\,d\mu\bigg| \le C\zeta + CN^{-D} \tag{14.58}$$

for each $\mathbf{m} \in S_n$. The error term $N^{-D}$ can be neglected. Adding up (14.58) for all $\mathbf{m} \in S_n(M)$, we get
$$\bigg|\int_{E-b}^{E+b}dE'\int\sum_{\mathbf{m}\in S_n(M)}\sum_{i=1}^N Y_{i,\mathbf{m}}(E',\mathbf{x})\,f\,d\mu - \int\sum_{\mathbf{m}\in S_n(M)}\frac{1}{N}\sum_{i\in J}G\big(N(x_i - x_{i+m_2}), \ldots\big)\,f\,d\mu\bigg| \le CM^{n-1}\zeta. \tag{14.59}$$

Clearly, the same estimate holds in equilibrium, i.e., if we set $f = 1$ in (14.59). Subtracting these two formulas and applying (14.47) to each summand in the second term of (14.58), we conclude that
$$\Theta_M^{(1)} = \bigg|\int_{E-b}^{E+b}\frac{dE'}{2b}\int\sum_{\mathbf{m}\in S_n(M)}\sum_{i=1}^N Y_{i,\mathbf{m}}(E',\mathbf{x})\,(f\,d\mu - d\mu)\bigg| \le CM^{n-1}\big(b^{-1}N^{-1+\xi+\varepsilon'} + b^{-1/2}N^{-\delta/2}\big), \tag{14.60}$$
where we have used that $|J| \le CNb$. This completes the estimate of $\Theta_M^{(1)}$.

Step 2: Large $\mathbf{m}$ case; estimate of $\Theta_M^{(2)}$ and $\Theta_M^{(3)}$.

For fixed $y \in \mathbb{R}$ and $\ell > 0$, let
$$\chi(y,\ell) := \sum_{i=1}^N \mathbf{1}\Big(x_i \in \Big[y - \frac{\ell}{N},\; y + \frac{\ell}{N}\Big]\Big)$$
denote the number of points in the interval $[y - \ell/N, y + \ell/N]$. Note that for a fixed $\mathbf{m} = (m_2, \ldots, m_n)$, we have
$$\sum_{i=1}^N |Y_{i,\mathbf{m}}(E',\mathbf{x})| \le C\cdot\chi(E',\ell)\cdot\mathbf{1}\big(\chi(E',\ell) \ge m_n\big) \le C\sum_{m=m_n}^N m\cdot\mathbf{1}\big(\chi(E',\ell) \ge m\big), \tag{14.61}$$

where $\ell$ denotes the maximum of $|u_1| + \cdots + |u_n|$ on the support of $O(u_1, \ldots, u_n)$. Since the summation over all increasing sequences $\mathbf{m} = (m_2, \ldots, m_n) \in \mathbb{N}_+^{n-1}$ with a fixed $m_n$ contains at most $m_n^{n-2}$ terms, we have
$$\sum_{\mathbf{m}\in S_n^c(M)}\bigg|\int_{E-b}^{E+b}dE'\int\sum_{i=1}^N Y_{i,\mathbf{m}}(E',\mathbf{x})\,f\,d\mu\bigg| \le C\int_{E-b}^{E+b}dE'\sum_{m=M}^N m^{n-1}\int\mathbf{1}\big(\chi(E',\ell) \ge m\big)\,f\,d\mu. \tag{14.62}$$

The rigidity bound (12.22) clearly implies
$$\int\mathbf{1}\big(\chi(E',\ell) \ge m\big)\,f\,d\mu \le C_{D,\varepsilon'}N^{-D}$$


for any $D > 0$ and $\varepsilon' > 0$, as long as $m \ge \ell N^{\varepsilon'}$. Therefore, choosing $D = 2n+2$, we see from (14.62) that $\Theta_M^{(2)}$ is negligible, e.g.
$$\Theta_M^{(2)} \le CN^{-1}$$
if $M \ge \ell N^{\varepsilon'}$. Since all rigidity bounds hold for the equilibrium measure as well, the last bound is also valid for $\Theta_M^{(3)}$. We now choose $M = \ell N^{\varepsilon'}$ and recall that $\ell$ is independent of $N$. Combining these bounds with (14.60), we obtain
$$\Theta \le CN^{\varepsilon'(n+1)}\big(b^{-1}N^{-1+\xi+\varepsilon'} + b^{-1/2}N^{-\delta/2}\big) + CN^{-1}.$$

Choosing ε′ = ε/(n+ 1), we conclude the proof of Lemma 14.8.


15 Continuity of local correlation functions under matrix OU process

We have completed the first two steps of the three-step strategy introduced in Section 5, i.e., the local semicircle law and the universality of Gaussian divisible ensembles (Theorem 12.4). In this section, we complete this strategy by proving a continuity result for the local correlation functions of the matrix OU process, in Theorems 15.2 and 15.3 below. This is Step 3a defined in Section 5. From these results, we obtain a weaker version of Theorem 5.1, namely averaged energy universality of Wigner matrices, but only on scales $b \ge N^{-1/2+\varepsilon}$. In Section 16.1, we will use the idea of "approximation by a Gaussian divisible ensemble" to prove Theorem 5.1 down to any scale $b \ge N^{-1+\varepsilon}$.

Theorem 15.1. [68, Theorem 2.2] Let $H$ be an $N\times N$ real symmetric or complex Hermitian Wigner matrix. In the Hermitian case we assume that the real and imaginary parts are i.i.d. Suppose that the distribution $\nu$ of the rescaled matrix elements $\sqrt{N}h_{ij}$ satisfies the decay condition (5.6). Fix a small $\varepsilon > 0$ and an integer $n \ge 1$, and let $O : \mathbb{R}^n \to \mathbb{R}$ be a continuous, compactly supported function. Then for any $|E| < 2$ and $b \in [N^{-1/2+\varepsilon}, N^{-\varepsilon}]$, we have
$$\lim_{N\to\infty}\frac{1}{2b}\int_{E-b}^{E+b}dE'\int_{\mathbb{R}^n}d\boldsymbol{\alpha}\;O(\boldsymbol{\alpha})\big(p^{(n)}_{H,N} - p^{(n)}_{G,N}\big)\Big(E' + \frac{\boldsymbol{\alpha}}{N}\Big) = 0. \tag{15.1}$$

To prove this theorem, we first recall the matrix OU process (12.1) defined by
$$dH_t = \frac{1}{\sqrt{N}}\,dB_t - \frac{1}{2}H_t\,dt, \tag{15.2}$$

with initial data $H_0$. The eigenvalue evolution of this process is the DBM, and recall that we denote the eigenvalue distribution at time $t$ by $f_t\,d\mu$, with $f_t$ satisfying (12.17). In this section, we assume that the initial data $H_0$ is an $N\times N$ Wigner matrix whose matrix element distribution satisfies the uniform polynomial decay condition (5.6). We have the following Green function continuity theorem for the matrix OU process.
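As a sanity check on (15.2), the distributional identity used repeatedly below (the solution at time $t$ equals $e^{-t/2}H_0$ plus an independent Gaussian component of variance $1-e^{-t}$, cf. (12.21)) can be verified numerically for a single rescaled matrix entry; here is a minimal sketch with illustrative parameters (entry variance normalized to 1, so the $1/\sqrt{N}$ prefactor is dropped):

```python
import numpy as np

# Scalar OU process dX = -X/2 dt + dB, modelling one (rescaled) matrix entry.
# The exact solution satisfies X_t ~ Normal(e^{-t/2} x0, 1 - e^{-t}).
rng = np.random.default_rng(0)
x0, t, n_steps, n_paths = 1.0, 1.0, 1000, 100_000
dt = t / n_steps

X = np.full(n_paths, x0)
for _ in range(n_steps):
    # Euler-Maruyama step: drift -X/2, diffusion coefficient 1
    X += -0.5 * X * dt + np.sqrt(dt) * rng.standard_normal(n_paths)

mean_exact = np.exp(-t / 2) * x0   # mean of the Gaussian-divisible representation
var_exact = 1.0 - np.exp(-t)       # variance of the independent Gaussian component
print(abs(X.mean() - mean_exact) < 0.02, abs(X.var() - var_exact) < 0.02)
```

At the matrix level the same identity holds entrywise and gives $H_t \stackrel{d}{=} e^{-t/2}H_0 + (1-e^{-t})^{1/2}H^G$, which is exactly how rigidity for $H_t$ is inherited in the proof of Theorem 15.1 below.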

Theorem 15.2 (Continuity of the Green function). Suppose that the initial data $H_0$ is an $N\times N$ Wigner matrix whose matrix element distribution satisfies the uniform polynomial decay condition (5.6). Let $\kappa > 0$ be arbitrary and suppose that for some small parameter $\sigma > 0$ and for any $y \ge N^{-1+\sigma}$, we have the following estimate on the diagonal elements of the resolvent for any $0 \le t \le 1$:
$$\max_{1\le k\le N}\;\max_{|E|\le 2-\kappa}\;\Big|\Big(\frac{1}{H_t - E - iy}\Big)_{kk}\Big| \prec N^{2\sigma} \tag{15.3}$$

with some constants $C, c$ depending only on $\sigma, \kappa$.

Let $G(t,z) = (H_t - z)^{-1}$ denote the resolvent and $m(t,z) = N^{-1}\operatorname{Tr}G(t,z)$. Suppose that $F(x_1, \ldots, x_n)$ is a function such that for any multi-index $\alpha = (\alpha_1, \ldots, \alpha_n)$ with $1 \le |\alpha| \le 5$ and for any $\varepsilon' > 0$ sufficiently small, we have
$$\max\Big\{|\partial^\alpha F(x_1, \ldots, x_n)| : \max_j |x_j| \le N^{\varepsilon'}\Big\} \le N^{C_0\varepsilon'} \tag{15.4}$$
and
$$\max\Big\{|\partial^\alpha F(x_1, \ldots, x_n)| : \max_j |x_j| \le N^2\Big\} \le N^{C_0} \tag{15.5}$$

for some constant $C_0$.

Let $\varepsilon > 0$ be arbitrary and choose an $\eta$ with $N^{-1-\varepsilon} \le \eta \le N^{-1}$. For any sequence of complex parameters $z_j = E_j \pm i\eta$, $j = 1, \ldots, n$, with $|E_j| \le 2 - \kappa$, there is a constant $C$ such that for any choice of the signs in the imaginary parts of the $z_j$ and for any $t \in [0,1]$, we have
$$\Big|\mathbb{E}F\big(m(t,z_1), \ldots, m(t,z_n)\big) - \mathbb{E}F\big(m(0,z_1), \ldots, m(0,z_n)\big)\Big| \le CN^{20\sigma+6\varepsilon}\big(t\sqrt{N}\big). \tag{15.6}$$


We defer the proof of Theorem 15.2 to the next subsection. The following result shows how the Green function continuity, i.e., an estimate of the type (15.6), can be used to compare correlation functions. The proof will be given in Section 15.2.

Next we will compare the local statistics of two Wigner matrix ensembles. We use the labels $\mathbf{v}$ and $\mathbf{w}$ to distinguish them, because later in Section 16 we will denote the matrix elements of the two ensembles by different letters, $v_{ij}$ and $w_{ij}$. For any two (generalized) Wigner matrix ensembles $H^{\mathbf{v}}$ and $H^{\mathbf{w}}$, we denote the probability laws of their eigenvalues $\boldsymbol{\lambda}^{\mathbf{v}}$ and $\boldsymbol{\lambda}^{\mathbf{w}}$ by $\mu^{\mathbf{v}}$ and $\mu^{\mathbf{w}}$, respectively. Denote by $m^{\mathbf{v}}(z)$, $m^{\mathbf{w}}(z)$ the Stieltjes transforms of the eigenvalues, i.e.,
$$m^{\mathbf{v}}(z) = \frac{1}{N}\sum_{j=1}^N \frac{1}{\lambda_j^{\mathbf{v}} - z},$$
and similarly for $m^{\mathbf{w}}(z)$. Let $p^{(n)}_{\mathbf{v},N}$ and $p^{(n)}_{\mathbf{w},N}$ be the $n$-point correlation functions of the eigenvalues w.r.t. $\mu^{\mathbf{v}}$ and $\mu^{\mathbf{w}}$. We remind the reader that sometimes we will use the notation $\mathbb{E}^{\mathbf{v}}F(\boldsymbol{\lambda})$ to denote the expectation $\mathbb{E}F(\boldsymbol{\lambda}^{\mathbf{v}})$. Although the setup is motivated by local correlation functions of eigenvalues, we point out that the concept of matrices or eigenvalues plays no role in the following theorem. It is purely a statement comparing two point processes on the real line, asserting that if the finite dimensional marginals of the Stieltjes transforms are close, then the local correlation functions are also close. It is essential that in (15.8) below the spectral parameter of the Stieltjes transform be at distance $N^{-1-\sigma}$ from the real axis for some $\sigma > 0$, i.e., below the scale of the eigenvalue spacing. Controlling the Stieltjes transform on this very short scale is necessary to identify and compare local correlation functions at scale $N^{-1}$.

The following theorem is a slightly modified version of Theorem 6.4 in [69].

Theorem 15.3 (Correlation function comparison). Let $\kappa > 0$ be arbitrary and suppose that for some small parameters $\sigma, \delta > 0$ the following two conditions hold.

(i) For any $\varepsilon > 0$ and any integer $k$,
$$\mathbb{E}\big[\operatorname{Im}m^{\mathbf{v}}(E + iN^{-1+\varepsilon})\big]^k + \mathbb{E}\big[\operatorname{Im}m^{\mathbf{w}}(E + iN^{-1+\varepsilon})\big]^k \le C \tag{15.7}$$
holds for any $|E| \le 2 - \kappa$ and $N \ge N_0(\varepsilon, k, \kappa)$;

(ii) For any sequence $z_j = E_j + i\eta_j$, $j = 1, \ldots, n$, with $|E_j| \le 2 - \kappa$ and $\eta_j = N^{-1-\sigma_j}$ for some $\sigma_j \le \sigma$, we have
$$\Big|\mathbb{E}\big(\operatorname{Im}m^{\mathbf{v}}(z_1)\cdots\operatorname{Im}m^{\mathbf{v}}(z_n)\big) - \mathbb{E}\big(\operatorname{Im}m^{\mathbf{w}}(z_1)\cdots\operatorname{Im}m^{\mathbf{w}}(z_n)\big)\Big| \le N^{-\delta}. \tag{15.8}$$

Then for any integer $n \ge 1$ there are positive constants $c_n = c_n(\sigma, \delta)$ such that for any $|E| \le 2 - 2\kappa$ and for any $C^1$ function $O : \mathbb{R}^n \to \mathbb{R}$ with compact support,
$$\bigg|\int_{\mathbb{R}^n}d\boldsymbol{\alpha}\;O(\boldsymbol{\alpha})\big(p^{(n)}_{\mathbf{v},N} - p^{(n)}_{\mathbf{w},N}\big)\Big(E + \frac{\boldsymbol{\alpha}}{N}\Big)\bigg| \le CN^{-c_n}, \tag{15.9}$$
where $C$ depends on $O$ and $N$ is sufficiently large.

We remark that in some applications we will use slightly different conditions. Instead of (15.8) we may assume
$$\Big|\mathbb{E}F\big(\operatorname{Im}m^{\mathbf{v}}(z_1), \ldots, \operatorname{Im}m^{\mathbf{v}}(z_n)\big) - \mathbb{E}F\big(\operatorname{Im}m^{\mathbf{w}}(z_1), \ldots, \operatorname{Im}m^{\mathbf{w}}(z_n)\big)\Big| \le N^{-\delta}, \tag{15.10}$$
where $F$ is as in Theorem 15.2. Then (15.8) holds, since we can approximate $\mathbb{E}\big[\operatorname{Im}m^{\mathbf{w}}(z_1)\cdots\operatorname{Im}m^{\mathbf{w}}(z_n)\big]$ by the expression in (15.10), where the function $F$ is chosen to be $F(x_1, \ldots, x_n) := x_1x_2\cdots x_n$ if $\max_j|x_j| \le N^c$,

and smoothly cut off to zero in the regime $\max_j|x_j| \ge N^{2c}$, for some small $c > 0$.

To see this, recall the elementary inequality
$$\theta_{\eta_1} \le \frac{\eta_2}{\eta_1}\,\theta_{\eta_2}, \quad\text{implying}\quad \operatorname{Im}m(E+i\eta_1) \le \frac{\eta_2}{\eta_1}\operatorname{Im}m(E+i\eta_2), \tag{15.11}$$
for any $\eta_2 \ge \eta_1 > 0$. Hence (15.7) implies that
$$\mathbb{E}\big[\operatorname{Im}m^{\mathbf{v}}(z)\big]^k \le C(N\eta)^k \tag{15.12}$$
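With $\theta_\eta(x) = \frac{\eta}{x^2+\eta^2}$ (so that $\operatorname{Im}m(E+i\eta) = \frac{1}{N}\sum_j \theta_\eta(E-\lambda_j)$, as in earlier chapters), the inequality (15.11) follows from a one-line computation:

```latex
\frac{\theta_{\eta_1}(x)}{\theta_{\eta_2}(x)}
 = \frac{\eta_1}{\eta_2}\cdot\frac{x^2+\eta_2^2}{x^2+\eta_1^2}
 \le \frac{\eta_1}{\eta_2}\cdot\frac{\eta_2^2}{\eta_1^2}
 = \frac{\eta_2}{\eta_1},
\qquad \eta_2\ge\eta_1>0,
```

since $t\mapsto (t+\eta_2^2)/(t+\eta_1^2)$ is maximal at $t=0$; summing $\theta_{\eta_1}(E-\lambda_j)\le \frac{\eta_2}{\eta_1}\theta_{\eta_2}(E-\lambda_j)$ over the eigenvalues gives the bound on $\operatorname{Im}m$.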

for any $k \ge 0$. Let $\eta = N^{-1+a}$, $a > 0$, and consider $n = 1$ for simplicity. Choosing $c = 2a$,
$$\mathbb{E}\big|F\big(\operatorname{Im}m^{\mathbf{v}}(z)\big) - \operatorname{Im}m^{\mathbf{v}}(z)\big| \le \mathbb{E}\,\operatorname{Im}m^{\mathbf{v}}(z)\,\mathbf{1}\big(\operatorname{Im}m^{\mathbf{v}}(z) \ge N^c\big) \le N^{-c(k-1)}\,\mathbb{E}\big[\operatorname{Im}m^{\mathbf{v}}(z)\big]^k \le N^{ka - c(k-1)} \le N^{-\delta}$$

provided that $k$ is large enough. Similar arguments apply for general $n$, and this implies that the difference between the expectation of $F$ and that of the product is negligible. This proves that the condition (15.8) can be replaced by the condition (15.10).

Notice that we pass through the continuity of traces of Green functions as an intermediate step to obtain continuity of the correlation functions. If we chose instead to follow the evolution of the correlation functions directly by differentiating them, we would encounter higher derivatives of the eigenvalues. By the formulas (12.9), (12.10) for the derivatives of eigenvalues and eigenvectors, higher derivatives of eigenvalues involve singularities of the form $(\lambda_i-\lambda_j)^{-n}$ for some positive integers $n$. These singularities are very difficult to control precisely. Our approach of using the Green function as an intermediate step completely avoids this difficulty, because the Green function has a natural regularization parameter, the imaginary part of the spectral parameter.

Proof of Theorem 15.1. Recall that the matrix OU process (15.2) can be solved by the formula (12.21), so that its distribution at any time is again a Wigner matrix ensemble if the initial data is a Wigner matrix ensemble. More precisely, if we denote the initial Wigner matrix by $H_0$, then the distribution of $H_t$ is the same as that of $e^{-t/2}H_0 + (1-e^{-t})^{1/2}H^G$. Hence, by (11.32), rigidity holds in this case, and (15.3) holds with any sufficiently small $\sigma>0$; we may choose $\sigma=\varepsilon_1$. Recall that $p^{(n)}_{t,N}$ denotes the correlation functions of $H_t$. We now apply Theorem 15.3 with $H_v=H_0$ and $H_w=H_t$. The assumption (15.8) can be verified by (15.6) if $t\le N^{-1/2-\varepsilon}$ for any $\varepsilon>0$. From (15.9), we have

$$
\lim_{N\to\infty}\int_{E-b}^{E+b}\frac{\mathrm{d}E'}{2b}\int_{\mathbb{R}^n}\mathrm{d}\alpha\, O(\alpha)\,\big(p^{(n)}_{0,N}-p^{(n)}_{t,N}\big)\Big(E'+\frac{\alpha}{N}\Big)=0.
\qquad (15.13)
$$
This compares the correlation functions of $H_0$ and $H_t$ if $t$ is not too large. To compare $H_t$ with $H_\infty=H^G$, by (12.24) we have, for $t=N^{-1/2-\varepsilon}$ and $b\ge N^{-1/2+10\varepsilon}$, that (15.13) holds with $p^{(n)}_{0,N}$ (the correlation functions of $H_0$) replaced by $p^{(n)}_{G,N}$. We have thus completed the proof of Theorem 15.1.

15.1 Proof of Theorem 15.2

The first ingredient in the proof of Theorem 15.2 is the following continuity estimate for the matrix OU process. To state it, we need to introduce the deformation of a matrix $H$ at certain matrix elements. For any $i\le j$, let $\theta^{ij}H$ denote a new matrix with matrix elements $(\theta^{ij}H)_{k\ell}=H_{k\ell}$ if $\{k,\ell\}\ne\{i,j\}$ and $(\theta^{ij}H)_{k\ell}=\theta^{ij}_{k\ell}H_{k\ell}$, with some $0\le\theta^{ij}_{k\ell}\le1$, if $\{k,\ell\}=\{i,j\}$. In other words, the deformation operation $\theta^{ij}$ multiplies the matrix elements $H_{ij}$ and $H_{ji}$ by a factor between 0 and 1 and leaves all other matrix elements intact.

Lemma 15.4. Suppose that $H_0$ is a Wigner ensemble and fix $t\in[0,1]$. Let $g$ be a smooth function of the matrix elements $(h_{ij})_{i\le j}$ and set
$$
M_t := \sup_{0\le s\le t}\,\sup_{i\le j}\,\sup_{\theta^{ij}}\,
\mathbb{E}\Big(\big(N^{3/2}|h_{ij}(s)|^3+\sqrt{N}\,|h_{ij}(s)|\big)\,\big|\partial^3_{h_{ij}}g(\theta^{ij}H_s)\big|\Big),
\qquad (15.14)
$$
where the last supremum runs through all deformations $\theta^{ij}$. Then
$$
\mathbb{E}g(H_t)-\mathbb{E}g(H_0) = O\big(t\sqrt{N}\big)\,M_t.
$$

This lemma also holds for generalized Wigner matrices. We refer the reader to [12] for the minor adjustment needed in that case.


Proof. Denote $\partial_{ij}=\partial_{h_{ij}}$; notice that despite the two indices, this is still a first order and not a second order partial derivative. By Itô's formula, we have
$$
\partial_t\,\mathbb{E}g(H_t) = -\sum_{i\le j}\Big(\mathbb{E}\big(h_{ij}(t)\,\partial_{ij}g(H_t)\big) - \frac{1}{2N}\,\mathbb{E}\big(\partial^2_{ij}g(H_t)\big)\Big).
$$
A Taylor expansion of the first derivative $\partial_{ij}g$ in the direction $h_{ij}$ yields
$$
\mathbb{E}\big(h_{ij}(t)\,\partial_{ij}g(H_t)\big)
= \mathbb{E}\,h_{ij}(t)\,\partial_{ij}g\big|_{h_{ij}(t)=0}
+ \mathbb{E}\big(h_{ij}(t)^2\,\partial^2_{ij}g\big|_{h_{ij}(t)=0}\big)
+ O\Big(\sup_{\theta^{ij}}\mathbb{E}\big|h_{ij}(t)^3\,\partial^3_{ij}g(\theta^{ij}H_t)\big|\Big)
$$
$$
= s_{ij}\,\mathbb{E}\big(\partial^2_{ij}g\big|_{h_{ij}(t)=0}\big)
+ O\Big(\sup_{\theta^{ij}}\mathbb{E}\big|h_{ij}(t)^3\,\partial^3_{ij}g(\theta^{ij}H_t)\big|\Big),
$$
$$
\mathbb{E}\big(\partial^2_{ij}g(H_t)\big)
= \mathbb{E}\big(\partial^2_{ij}g\big|_{h_{ij}(t)=0}\big)
+ O\Big(\sup_{\theta^{ij}}\mathbb{E}\big|h_{ij}(t)\,\partial^3_{ij}g(\theta^{ij}H_t)\big|\Big).
$$
Here the shorthand notation $\partial_{ij}g\big|_{h_{ij}(t)=0}$ means $(\partial_{ij}g)(\widetilde H_t)$, where $(\widetilde H_t)_{k\ell}=(H_t)_{k\ell}$ if $\{k,\ell\}\ne\{i,j\}$ and $(\widetilde H_t)_{ij}=(\widetilde H_t)_{ji}=0$. In the calculation above we used the independence of the matrix elements, in particular that $\widetilde H_t$ is independent of $h_{ij}(t)$. We also used the fact that the OU process preserves the first and second moments, i.e., $\mathbb{E}h_{ij}(t)=\mathbb{E}h_{ij}(0)=0$ and $\mathbb{E}|h_{ij}(t)|^2=\mathbb{E}|h_{ij}(0)|^2=1/N$. Thus we have
$$
\partial_t\,\mathbb{E}g(H_t) = N^{1/2}\,O\Big(\sup_{i\le j}\sup_{\theta^{ij}}\mathbb{E}\big(N^{3/2}|h_{ij}(t)|^3+N^{1/2}|h_{ij}(t)|\big)\,\big|\partial^3_{ij}g(\theta^{ij}H_t)\big|\Big).
$$
Integration over time finishes the proof.

The second ingredient in the proof of Theorem 15.2 is an estimate of the type (15.3), but now for all matrix elements, not only the diagonal ones, and for spectral parameters with imaginary parts slightly below $N^{-1}$.

Lemma 15.5. Suppose that for a Wigner matrix $H$ we have the estimate
$$
\max_{1\le k\le N}\ \sup_{|E|\le 2-\kappa}\ \sup_{\eta\ge N^{-1-\varepsilon}}
\Big|\operatorname{Im}\Big(\frac{1}{H-E\pm i\eta}\Big)_{kk}\Big| \prec N^{3\sigma+\varepsilon}.
\qquad (15.15)
$$
Then for any $\eta\ge N^{-1-\varepsilon}$ we have
$$
\sup_{1\le k,\ell\le N}\ \sup_{|E|\le 2-\kappa}
\Big|\Big(\frac{1}{H-E\pm i\eta}\Big)_{k\ell}\Big| \prec N^{3\sigma+\varepsilon}.
\qquad (15.16)
$$

We remark that (15.16) could be strengthened by including the supremum over $\eta\ge N^{-1-\varepsilon}$. This can be obtained by first establishing the estimate on a fine grid of $\eta$'s with spacing $N^{-10}$ and then extending the bound to all $\eta$ by using that the Green functions are Lipschitz continuous in $\eta$ with Lipschitz constant $\eta^{-2}$.

Proof. Let $\lambda_m$ and $u_m$ denote the eigenvalues and eigenvectors of $H$. By the definition of the Green function, we have
$$
\Big|\Big(\frac{1}{H-z}\Big)_{jk}\Big|
\le \sum_{m=1}^N \frac{|u_m(j)||u_m(k)|}{|\lambda_m-z|}
\le \Big[\sum_{m=1}^N \frac{|u_m(j)|^2}{|\lambda_m-z|}\Big]^{1/2}
\Big[\sum_{m=1}^N \frac{|u_m(k)|^2}{|\lambda_m-z|}\Big]^{1/2}.
\qquad (15.17)
$$
Define a dyadic decomposition
$$
U_n := \{m : 2^{n-1}\eta \le |\lambda_m-E| < 2^n\eta\}, \qquad n=1,2,\dots,n_0:=C\log N,
\qquad (15.18)
$$
$$
U_0 := \{m : |\lambda_m-E| < \eta\}, \qquad U_\infty := \{m : 2^{n_0}\eta \le |\lambda_m-E|\},
$$


and divide the summation over $m$ into $\cup_n U_n$:
$$
\sum_{m=1}^N \frac{|u_m(j)|^2}{|\lambda_m-z|}
= \sum_n\sum_{m\in U_n} \frac{|u_m(j)|^2}{|\lambda_m-z|}
\le C\sum_n\sum_{m\in U_n} \operatorname{Im}\frac{|u_m(j)|^2}{\lambda_m-E-i2^n\eta}
\le C\sum_n \operatorname{Im}\Big(\frac{1}{H-E-i2^n\eta}\Big)_{jj}.
$$
Now using (15.15) we can control the right hand side of (15.17) and conclude (15.16).
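Both steps of this proof, the Cauchy-Schwarz bound (15.17) and the dyadic bound by diagonal resolvent entries, can be checked numerically on a small random matrix. In the sketch below the explicit constant 4 comes from comparing $1/|\lambda-z|$ with $\operatorname{Im}(\lambda-E-i2^n\eta)^{-1}$ on each dyadic shell; it is an illustration, not the text's constant $C$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, j, k = 200, 3, 7
H = rng.standard_normal((N, N))
H = (H + H.T) / np.sqrt(2 * N)               # GOE-like normalization
lam, U = np.linalg.eigh(H)                   # u_m(j) = U[j, m]

E, eta = 0.1, 1e-3
z = E + 1j * eta
G = np.linalg.inv(H - z * np.eye(N))

# Cauchy-Schwarz step (15.17)
def weighted_sum(idx):
    return np.sum(np.abs(U[idx, :])**2 / np.abs(lam - z))

assert abs(G[j, k]) <= np.sqrt(weighted_sum(j) * weighted_sum(k)) + 1e-12

# Dyadic step: on the shell 2^{n-1} eta <= |lam - E| < 2^n eta one has
# 1/|lam - z| <= 4 * Im 1/(lam - E - i 2^n eta), so summing over all scales:
n0 = int(np.ceil(np.log2(4.0 / eta))) + 1    # enough scales to cover the spectrum
dyadic = sum(np.imag(1.0 / (lam - E - 1j * 2**n * eta)) @ np.abs(U[j, :])**2
             for n in range(n0 + 1))
assert weighted_sum(j) <= 4 * dyadic + 1e-12
```

Here `eigh` returns eigenvectors as columns, so `U[j, :]` collects the $j$-th component across all eigenvectors.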

Now we can finish the proof of Theorem 15.2. First note that from the trivial bound
$$
\operatorname{Im}\Big(\frac{1}{H-E-i\eta}\Big)_{jj}
\le \Big(\frac{y}{\eta}\Big)\operatorname{Im}\Big(\frac{1}{H-E-iy}\Big)_{jj},
\qquad \eta\le y,
\qquad (15.19)
$$
and (15.3), the assumption (15.15) in Lemma 15.5 holds. Therefore the bounds (15.16) on the matrix elements are available.

Next, we will first consider the specific function
$$
g(H) = G_{ab}(z),
\qquad (15.20)
$$
for a fixed index pair $a,b$, where $z=E+i\eta$ with $N^{-1-\varepsilon}\le\eta\le1$ and $|E|\le2-\kappa$. While this function is not of the form appearing in (15.6), once the comparison for $G_{ab}(z)$ is established, we take $a=b$ and average over $a$ to get the normalized trace of $G$, i.e., the Stieltjes transform. This will then prove (15.6) for the function $F(x)=x$, i.e., it compares $\mathbb{E}m(t,z)$ with $\mathbb{E}m(0,z)$. The case of a polynomial $F$ with several arguments is analogous, and similarly any function satisfying (15.4)–(15.5) can be sufficiently well approximated by a Taylor expansion.

Returning to the case (15.20), in order to apply Lemma 15.4 we need to bound the third derivatives of $g(H)$ to estimate $M_t$ from (15.14). We have
$$
\partial^3_{ij}G(z)_{ab} = -\sum_{\alpha,\beta} G(z)_{a\alpha_1}G(z)_{\beta_1\alpha_2}G(z)_{\beta_2\alpha_3}G(z)_{\beta_3 b},
$$
where $\{\alpha_k,\beta_k\}=\{i,j\}$ or $\{j,i\}$. By (15.16), the four factors
$$
G(z)_{a\alpha_1},\quad G(z)_{\beta_1\alpha_2},\quad G(z)_{\beta_2\alpha_3},\quad G(z)_{\beta_3 b}
$$
are bounded by $N^{4\sigma+\varepsilon}$ with very high probability, provided $N^{-1-\varepsilon}\le\eta\le1$. Consequently, we have proved that, uniformly in $E\in(-2+\kappa,2-\kappa)$ and $N^{-1-\varepsilon}\le\eta\le1$,
$$
\partial^3_{ij}G(z)_{ab} = O\big(N^{20\sigma+5\varepsilon}\big)
$$
with very high probability. The same argument holds if some matrix elements are reduced by deformation, i.e., we have
$$
\big|\partial^3_{ij}g(\theta^{ij}H_s)\big| \le N^{20\sigma+5\varepsilon},
$$
and thus (15.14) holds with $M_t=N^{20\sigma+6\varepsilon}$, using the bound $h_{ij}\prec N^{-1/2}$. Hence by Lemma 15.4, we have proved that for any $t\le1$,
$$
|\mathbb{E}g(H_t)-\mathbb{E}g(H_0)| \le CN^{20\sigma+6\varepsilon}\,\big(t\sqrt N\big).
$$

This completes the proof of Theorem 15.2.

15.2 Proof of the correlation function comparison Theorem 15.3

Define an approximate delta function on scale $\eta$ by
$$
\theta_\eta(x) := \frac{1}{\pi}\operatorname{Im}\frac{1}{x-i\eta}.
\qquad (15.21)
$$


We will choose $\eta\sim N^{-1-a}$ with some small $a>0$, i.e., slightly smaller than the typical eigenvalue spacing. This means that observables of the form $\theta_\eta$ have sufficient resolution to detect individual eigenvalues. Moreover, polynomials of such observables detect correlation functions. On the other hand,
$$
\frac{1}{N}\sum_i \theta_\eta(\lambda_i-E) = \frac{1}{\pi}\operatorname{Im}m(z),
$$
so expectation values of such observables are covered by the condition (15.8). The rest of the proof consists of making this idea precise. There are two technicalities to resolve. First, correlation functions involve distinct eigenvalues (see (15.25) below), while polynomials of the resolvent include an overcounting of coinciding eigenvalues; thus an inclusion-exclusion formula will be needed. Second, although $\eta$ is much smaller than the relevant scale $1/N$, it still does not give pointwise information on the correlation functions. However, the correlation functions in (4.38) are identified only as a weak limit, i.e., tested against a continuous function $O$. The continuity of $O$ can be used to show that the difference between the exact correlation functions and the ones smeared out on scale $\eta\sim N^{-1-a}$ is negligible. This last step requires an a priori upper bound on the density to ensure that not too many eigenvalues fall into an irrelevantly small interval; this bound is given in (15.7), and it will eventually be verified by the local semicircle law.

Proof of Theorem 15.3. For notational simplicity, we give the detailed proof only for the case of $n=3$-point correlation functions; the proof is analogous in the general case.

Set $\eta=N^{-1-a}$ for some small $a>0$; by following the proof one may check that any $a\le c\min\{\delta,\sigma\}/n^2$ will do. Let $O$ be a compactly supported test function and let

$$
O_\eta(\beta_1,\beta_2,\beta_3) := \frac{1}{N^3}\int_{\mathbb{R}^3}\mathrm{d}\alpha_1\mathrm{d}\alpha_2\mathrm{d}\alpha_3\,
O(\alpha_1,\alpha_2,\alpha_3)\,
\theta_\eta\Big(\frac{\beta_1-\alpha_1}{N}\Big)\cdots\theta_\eta\Big(\frac{\beta_3-\alpha_3}{N}\Big)
$$
be its smoothing on scale $N\eta$. We note that for any nonnegative $C^1$ function $O$ with compact support there is a constant, depending on $O$, such that
$$
O_\eta \le C\,O_{\eta'}, \qquad \text{for any } 0\le\eta\le\eta'\le1/N.
\qquad (15.22)
$$
We can apply this bound with $\eta'=1/N$ and combine it with (15.11), with $\eta_1=1/N$ and $\eta_2=N^{-1+\varepsilon}$ for any small $\varepsilon>0$, to get
$$
O_\eta \le CN^{3\varepsilon}\,O_{N^{-1+\varepsilon}}, \qquad \text{for any } 0\le\eta\le1/N.
\qquad (15.23)
$$

After the change of variables $x_j=E+\beta_j/N$ and with $E_j:=E+\frac{\alpha_j}{N}$, we have
$$
\int_{\mathbb{R}^3}\mathrm{d}\beta_1\mathrm{d}\beta_2\mathrm{d}\beta_3\, O_\eta(\beta_1,\beta_2,\beta_3)\,
p^{(3)}_{w,N}\Big(E+\frac{\beta_1}{N},\dots,E+\frac{\beta_3}{N}\Big)
\qquad (15.24)
$$
$$
= \int_{\mathbb{R}^3}\mathrm{d}\alpha_1\mathrm{d}\alpha_2\mathrm{d}\alpha_3\, O(\alpha_1,\alpha_2,\alpha_3)
\int_{\mathbb{R}^3}\mathrm{d}x_1\mathrm{d}x_2\mathrm{d}x_3\,
p^{(3)}_{w,N}(x_1,x_2,x_3)\,\theta_\eta(x_1-E_1)\theta_\eta(x_2-E_2)\theta_\eta(x_3-E_3).
$$

By definition of the correlation function, for any fixed $E,\alpha_1,\alpha_2,\alpha_3$,
$$
\int \mathrm{d}x_1\mathrm{d}x_2\mathrm{d}x_3\, p^{(3)}_{w,N}(x_1,x_2,x_3)\,\theta_\eta(x_1-E_1)\theta_\eta(x_2-E_2)\theta_\eta(x_3-E_3)
\qquad (15.25)
$$
$$
= \mathbb{E}_w\,\frac{1}{N(N-1)(N-2)}\sum_{i\ne j\ne k}
\theta_\eta\Big(\lambda_i-E-\frac{\alpha_1}{N}\Big)
\theta_\eta\Big(\lambda_j-E-\frac{\alpha_2}{N}\Big)
\theta_\eta\Big(\lambda_k-E-\frac{\alpha_3}{N}\Big),
$$
where $\mathbb{E}_w$ indicates expectation with respect to the $w$ variables. By the inclusion-exclusion principle,

$$
\mathbb{E}_w\,\frac{1}{N(N-1)(N-2)}\sum_{i\ne j\ne k}
\theta_\eta(\lambda_i-E_1)\theta_\eta(\lambda_j-E_2)\theta_\eta(\lambda_k-E_3)
= \mathbb{E}_wA_1+\mathbb{E}_wA_2+\mathbb{E}_wA_3,
\qquad (15.26)
$$


where
$$
A_1 := \frac{N^3}{N(N-1)(N-2)}\prod_{j=1}^3\Big[\frac{1}{N}\sum_i \theta_\eta(\lambda_i-E_j)\Big],
$$
$$
A_3 := \frac{2}{N(N-1)(N-2)}\sum_i \theta_\eta(\lambda_i-E_1)\theta_\eta(\lambda_i-E_2)\theta_\eta(\lambda_i-E_3),
$$
and $A_2:=B_1+B_2+B_3$ with the $B_j$ defined as follows:
$$
B_3 = -\frac{1}{N(N-1)(N-2)}\sum_i \theta_\eta(\lambda_i-E_1)\theta_\eta(\lambda_i-E_2)\sum_k \theta_\eta(\lambda_k-E_3);
$$

$B_1$ consists of the terms with $j=k$, and $B_2$ of the terms with $i=k$, from the triple sum (15.26). We remark that $A_3+A_2\le0$, and thus
$$
\int \mathrm{d}x_1\mathrm{d}x_2\mathrm{d}x_3\, p^{(3)}_{w,N}(x_1,x_2,x_3)\,\theta_\eta(x_1-E_1)\theta_\eta(x_2-E_2)\theta_\eta(x_3-E_3)
\le C\,\mathbb{E}_w\prod_{j=1}^3 \operatorname{Im}m(E_j+i\eta).
\qquad (15.27)
$$
The same bound holds for $p^{(3)}_{v,N}$ as well. Therefore, we can combine (15.23) and (15.7) to obtain, for any small $\varepsilon>0$,
$$
\int_{\mathbb{R}^3}\mathrm{d}\beta_1\mathrm{d}\beta_2\mathrm{d}\beta_3\, O_\eta(\beta_1,\beta_2,\beta_3)\,
\big(p^{(3)}_{w,N}+p^{(3)}_{v,N}\big)\Big(E+\frac{\beta_1}{N},\dots,E+\frac{\beta_3}{N}\Big) = O\big(N^{3\varepsilon}\big).
\qquad (15.28)
$$

To approximate $\mathbb{E}_wB_3$, we define $\varphi_{E_1,E_2}(x)=\theta_\eta(x-E_1)\theta_\eta(x-E_2)$. Recall $\eta=N^{-1-a}$ and let $\bar\eta=N^{-1-9a}$. Decompose $\theta_{\bar\eta}=\theta_1+\theta_2$, where $\theta_2(y)=\theta_{\bar\eta}(y)\mathbf{1}(|y|\ge\bar\eta N^{3a})$. Denote
$$
\widetilde\theta = (1-\psi(a))^{-1}\theta_1, \qquad \psi(a):=\int\theta_2(y)\,\mathrm{d}y \le N^{-3a},
\qquad (15.29)
$$
so that $\int\widetilde\theta=1$. Using $|\varphi'_{E_1,E_2}(x)|\le C\eta^{-2}\theta_\eta(x-E_1)$ and $|\varphi''_{E_1,E_2}(x)|\le C\eta^{-3}\theta_\eta(x-E_1)$, we have
$$
\big|\widetilde\theta*\varphi_{E_1,E_2}(x)-\varphi_{E_1,E_2}(x)\big|
\le \frac{1}{1-\psi(a)}\Big|\int\mathrm{d}y\,\theta_1(y)\big[\varphi_{E_1,E_2}(x-y)-\varphi_{E_1,E_2}(x)\big]\Big|
\qquad (15.30)
$$
$$
\le C\Big|\int\mathrm{d}y\,\theta_1(y)\big[-\varphi'_{E_1,E_2}(x)\,y+O\big(\varphi''_{E_1,E_2}(x)\big)y^2\big]\Big|
\le C\,\frac{\bar\eta^2N^{6a}}{\eta^3}\,\theta_\eta(x-E_1),
\qquad (15.31)
$$

where we used the symmetry of $\theta_1$, i.e., $\int\mathrm{d}y\,\theta_1(y)\,y=0$, and that $\theta_1(y)$ is supported on $|y|\le\bar\eta N^{3a}$. Together with (15.11), with the choice $\eta'=N^{-1+\varepsilon}$ for some small $\varepsilon>0$, we have
$$
\Big|\mathbb{E}\,\frac{1}{N^3}\sum_i\big[\widetilde\theta*\varphi_{E_1,E_2}(\lambda_i)-\varphi_{E_1,E_2}(\lambda_i)\big]\sum_k\theta_\eta(\lambda_k-E_3)\Big|
\le \frac{\bar\eta^2N^{6a}}{N\eta^3}\Big|\mathbb{E}\,\frac{1}{N^2}\sum_{i,k}\theta_\eta(\lambda_i-E_1)\theta_\eta(\lambda_k-E_3)\Big|
$$
$$
\le \frac{\bar\eta^2N^{8a+2\varepsilon}}{N\eta^3}\Big|\mathbb{E}\,\frac{1}{N^2}\sum_{i,k}\theta_{N^{-1+\varepsilon}}(\lambda_i-E_1)\theta_{N^{-1+\varepsilon}}(\lambda_k-E_3)\Big|
\le \big(N\bar\eta\big)^2N^{11a+2\varepsilon} \le N^{-6a},
\qquad (15.32)
$$
where we used the a priori bound (15.7) and then $\varepsilon\le a/2$ in the last two steps. Hence we can approximate $\mathbb{E}_wB_3$ by
$$
\mathbb{E}_wB_3 = \mathbb{E}_w\int\mathrm{d}y\,\varphi_{E_1,E_2}(y)\,N^{-3}\sum_{i,k}\widetilde\theta(\lambda_i-y)\,\theta_\eta(\lambda_k-E_3) + O\big(N^{-a}\big).
\qquad (15.33)
$$
Recall that $\theta_{\bar\eta}=\theta_1+\theta_2$ and $\widetilde\theta=(1-\psi(a))^{-1}\theta_1$. We wish to replace $\widetilde\theta$ on the right hand side of (15.33) by $(1-\psi(a))^{-1}\theta_{\bar\eta}$, with an error $O(N^{-a})$. For this purpose, we need to prove that the additional contribution from $\theta_2$ is bounded by $O(N^{-a})$.


Since $\theta_2(y)\le CN^{-3a}\theta_{N^{3a}\bar\eta}(y)$, we have
$$
\int\mathrm{d}y\,\varphi_{E_1,E_2}(y)\,\mathbb{E}_wN^{-3}\sum_{i,k}\theta_2(\lambda_i-y)\,\theta_\eta(\lambda_k-E_3)
\le \int\mathrm{d}y\,\varphi_{E_1,E_2}(y)\,\mathbb{E}_wN^{-3}\sum_{i,k}N^{-3a}\theta_{N^{3a}\bar\eta}(\lambda_i-y)\,\theta_\eta(\lambda_k-E_3).
\qquad (15.34)
$$
Now we show that the contribution of this error term to the right hand side of (15.24) is negligible. Recalling $E_1=E+\alpha_1/N$ and using $\int\mathrm{d}\alpha_1\,\theta_\eta(y-E_1)\le CN$, we have
$$
\Big|\int_{\mathbb{R}^3}\mathrm{d}\alpha_1\mathrm{d}\alpha_2\mathrm{d}\alpha_3\, O(\alpha_1,\alpha_2,\alpha_3)
\int\mathrm{d}y\,\theta_\eta(y-E_1)\theta_\eta(y-E_2)\,\mathbb{E}_wN^{-3}\sum_{i,k}N^{-3a}\theta_{N^{3a}\bar\eta}(\lambda_i-y)\,\theta_\eta(\lambda_k-E_3)\Big|
$$
$$
\le CN^{-3a}\int_{|\alpha_2|+|\alpha_3|\le C}\mathrm{d}\alpha_2\mathrm{d}\alpha_3\,\mathbb{E}_w\,
N^{-2}\sum_i\theta_\eta*\theta_{N^{3a}\bar\eta}(\lambda_i-E_2)\sum_k\theta_\eta(\lambda_k-E_3)
$$
$$
\le CN^{-3a}\int_{|\alpha_2|+|\alpha_3|\le C}\mathrm{d}\alpha_2\mathrm{d}\alpha_3\,\mathbb{E}_w\,
N^{-2}\sum_i\theta_\eta(\lambda_i-E_2)\sum_k\theta_\eta(\lambda_k-E_3),
\qquad (15.35)
$$
where we used that $\theta_\eta*\theta_{\eta'}\le C\theta_\eta$ if $\eta\ge\eta'$, with $C$ independent of $\eta,\eta'$. Notice that we have only used that $O$ is compactly supported and that $\|O\|_\infty$ is bounded. Using (15.11) and (15.7), we can bound the last line by $N^{-a+C\varepsilon}$.

As we will choose $\varepsilon\ll a$, we may neglect the $C\varepsilon$ exponent; we can thus approximate the contribution of $\mathbb{E}_wB_3$ to the right hand side of (15.24) by
$$
\int_{\mathbb{R}^3}\mathrm{d}\alpha_1\mathrm{d}\alpha_2\mathrm{d}\alpha_3\, O(\alpha_1,\alpha_2,\alpha_3)\,\mathbb{E}_wB_3
\qquad (15.36)
$$
$$
= (1-\psi(a))^{-1}\int_{\mathbb{R}^3}\mathrm{d}\alpha_1\mathrm{d}\alpha_2\mathrm{d}\alpha_3\, O(\alpha_1,\alpha_2,\alpha_3)
\int\mathrm{d}y\,\varphi_{E_1,E_2}(y)\,\mathbb{E}_wN^{-3}\sum_{i,k}\theta_{\bar\eta}(\lambda_i-y)\,\theta_\eta(\lambda_k-E_3) + O\big(N^{-a}\big).
$$

The same estimate holds if we replace the expectation $\mathbb{E}_w$ by $\mathbb{E}_v$. Recalling (15.8), we can thus estimate their difference by
$$
\Big|\int_{\mathbb{R}^3}\mathrm{d}\alpha_1\mathrm{d}\alpha_2\mathrm{d}\alpha_3\, O(\alpha_1,\alpha_2,\alpha_3)\,[\mathbb{E}_w-\mathbb{E}_v]B_3\Big|
\qquad (15.37)
$$
$$
\le C\int\mathrm{d}y\,\varphi_{E_1,E_2}(y)\,\Big|[\mathbb{E}_w-\mathbb{E}_v]\,N^{-3}\sum_{i,k}\theta_{\bar\eta}(\lambda_i-y)\,\theta_\eta(\lambda_k-E_3)\Big| + O\big(N^{-a}\big)
$$
$$
\le C(N\eta)^{-1}N^{-\delta} + O\big(N^{-a}\big) \le O\big(N^{-a}\big),
$$

provided that $a\le\delta/2$. Similar arguments can be used to prove that
$$
\int_{\mathbb{R}^3}\mathrm{d}\alpha_1\mathrm{d}\alpha_2\mathrm{d}\alpha_3\, O(\alpha_1,\alpha_2,\alpha_3)\,[\mathbb{E}_w-\mathbb{E}_v][A_1+A_2+A_3] = O\big(N^{-a}\big).
\qquad (15.38)
$$
Recalling (15.24), we have thus proved that
$$
\int_{\mathbb{R}^3}\mathrm{d}\beta_1\mathrm{d}\beta_2\mathrm{d}\beta_3\, O_\eta(\beta_1,\beta_2,\beta_3)\,
\big(p^{(3)}_{w,N}-p^{(3)}_{v,N}\big)\Big(E+\frac{\beta_1}{N},\dots,E+\frac{\beta_3}{N}\Big) = O\big(N^{-a}\big)
\qquad (15.39)
$$
for any compactly supported $O$ with $\|O\|_\infty$ bounded.

In order to prove Theorem 15.3, it remains to replace $O_\eta$ with $O$ in (15.39); at this point we will use that $O$ is differentiable. For this purpose, we only need to bound the error
$$
\int_{\mathbb{R}^3}\mathrm{d}\beta_1\mathrm{d}\beta_2\mathrm{d}\beta_3\, (O-O_\eta)(\beta_1,\beta_2,\beta_3)\,
p^{(3)}_{w,N}\Big(E+\frac{\beta_1}{N},\dots,E+\frac{\beta_3}{N}\Big) = O\big(N^{-a+\varepsilon}\big)
\qquad (15.40)
$$


and use the similar estimate with $w$ replaced by $v$. One can easily check that there is a $C^1$ nonnegative function $\widetilde O$ with compact support and $\|\widetilde O\|_\infty\le1$ such that
$$
|O-O_\eta| \le C(N\eta)\,\widetilde O_\eta.
\qquad (15.41)
$$
Hence (15.40) follows from applying (15.28) to $\widetilde O$. Choosing $\varepsilon$ much smaller than $a$, this completes the proof of Theorem 15.3.


16 Universality of Wigner matrices in small energy windows: GFT

We proved the universality of Wigner matrices in energy windows of size bigger than $N^{-1/2+\varepsilon}$ in Section 15. In this section, we will use the Green function comparison theorem to improve this to any energy window of size bigger than $N^{-1+\varepsilon}$. This is Step 3 of the three-step strategy introduced in Section 5. In the following, we first prove the Green function comparison theorem and then use it to prove our main universality theorem, Theorem 5.1.

16.1 The Green function comparison theorems

The main ingredient to prove this result is the following Green function comparison theorem, stating that the correlation functions of the eigenvalues of two matrix ensembles are identical on scale $1/N$ provided that the first four moments of all matrix elements of these two ensembles are almost identical. Here we do not assume that the real and imaginary parts are i.i.d., hence the $k$-th moment of $h_{ij}$ is understood as the collection of numbers
$$
\int h^s\,\bar h^{\,k-s}\,\nu_{ij}(\mathrm{d}h), \qquad s=0,1,2,\dots,k.
$$

Before stating the result we fix some notation. We will distinguish between the two ensembles by using different letters, $v_{ij}$ and $w_{ij}$, for their matrix elements, and we often use the notation $H^{(v)}$ and $H^{(w)}$ to indicate the difference. Alternatively, one could denote the matrix elements of $H$ by $h_{ij}$ and make the distinction between the two ensembles in the measure, especially in the expectation, by using the notations $\mathbb{E}_v$ and $\mathbb{E}_w$. Since the matrix elements will be replaced one by one from one distribution to the other, the latter notation would be cumbersome, so we follow the first convention in this section.

Theorem 16.1 (Green function comparison). [69, Theorem 2.3] Suppose that we have two generalized $N\times N$ Wigner matrices, $H^{(v)}$ and $H^{(w)}$, with matrix elements $h_{ij}$ given by the random variables $N^{-1/2}v_{ij}$ and $N^{-1/2}w_{ij}$, respectively, with $v_{ij}$ and $w_{ij}$ satisfying the uniform polynomial decay condition (5.6). Fix a bijective ordering map on the index set of the independent matrix elements,
$$
\phi : \{(i,j):1\le i\le j\le N\} \to \{1,\dots,\gamma(N)\}, \qquad \gamma(N):=\frac{N(N+1)}{2},
$$
and denote by $H_\gamma$ the generalized Wigner matrix whose matrix elements $h_{ij}$ follow the $v$-distribution if $\phi(i,j)\le\gamma$ and the $w$-distribution otherwise; in particular $H^{(w)}=H_0$ and $H^{(v)}=H_{\gamma(N)}$. Let $\kappa>0$ be arbitrary and suppose that, for any small parameter $\tau>0$ and for any $y\ge N^{-1+\tau}$, we have the following estimate on the diagonal elements of the resolvent:
$$
\max_{0\le\gamma\le\gamma(N)}\ \max_{1\le k\le N}\ \max_{|E|\le2-\kappa}
\Big|\Big(\frac{1}{H_\gamma-E-iy}\Big)_{kk}\Big| \prec N^{2\tau}
\qquad (16.1)
$$
with some constants $C,c$ depending only on $\tau,\kappa$. Moreover, we assume that the first four moments of $v_{ij}$ and $w_{ij}$ satisfy
$$
\big|\mathbb{E}\,\bar v_{ij}^{\,s}v_{ij}^{\,u} - \mathbb{E}\,\bar w_{ij}^{\,s}w_{ij}^{\,u}\big| \le N^{-\delta-2+(s+u)/2},
\qquad 1\le s+u\le 4,
\qquad (16.2)
$$
for some given $\delta>0$. Let $\varepsilon>0$ be arbitrary and choose an $\eta$ with $N^{-1-\varepsilon}\le\eta\le N^{-1}$. Consider a sequence of complex parameters $z_j=E_j\pm i\eta$, $j=1,\dots,n$, with $|E_j|\le2-2\kappa$ and an arbitrary choice of the $\pm$ signs. Let $G^{(v)}(z)=(H^{(v)}-z)^{-1}$ denote the resolvent and let $F(x_1,\dots,x_n)$ be a function such that for any multi-index $\alpha=(\alpha_1,\dots,\alpha_n)$ with $1\le|\alpha|\le5$ and for any $\varepsilon'>0$ sufficiently small, we have
$$
\max\Big\{|\partial^\alpha F(x_1,\dots,x_n)| : \max_j|x_j|\le N^{\varepsilon'}\Big\} \le N^{C_0\varepsilon'}
\qquad (16.3)
$$
and
$$
\max\Big\{|\partial^\alpha F(x_1,\dots,x_n)| : \max_j|x_j|\le N^2\Big\} \le N^{C_0}
\qquad (16.4)
$$
for some constant $C_0$.


Then there is a constant $C_1$, depending on $\vartheta$, $\sum_m k_m$ and $C_0$, such that for any $\eta$ with $N^{-1-\varepsilon}\le\eta\le N^{-1}$ and for any choices of the signs in the imaginary parts of the $z_j$, we have
$$
\Big|\mathbb{E}F\big(G^{(v)}_{a_1b_1}(z_1),\dots,G^{(v)}_{a_nb_n}(z_n)\big) - \mathbb{E}F\big(G^{(v)}\to G^{(w)}\big)\Big|
\le C_1N^{-1/2+C_1\varepsilon} + C_1N^{-\delta+C_1\varepsilon},
\qquad (16.5)
$$
where in the second term the arguments of $F$ are changed from the Green functions of $H^{(v)}$ to those of $H^{(w)}$ and all other parameters remain unchanged.

Moreover, for any $|E|\le2-2\kappa$, any $k\ge1$ and any compactly supported smooth test function $O:\mathbb{R}^n\to\mathbb{R}$, we have
$$
\lim_{N\to\infty}\int_{\mathbb{R}^n}\mathrm{d}\alpha\, O(\alpha)\,\big(p^{(n)}_{v,N}-p^{(n)}_{w,N}\big)\Big(E+\frac{\alpha}{N}\Big) = 0,
\qquad (16.6)
$$
where $p^{(n)}_{v,N}$ and $p^{(n)}_{w,N}$ denote the $n$-point correlation functions of the $v$- and $w$-ensembles, respectively.

Remark 1: We formulated Theorem 16.1 for functions of traces of monomials of the Green function because this is the form we need in the applications. However, the result (and the proof we are going to present) holds directly for matrix elements of monomials of Green functions as well; for the precise statement, see [69]. We also remark that Theorem 16.1 holds for generalized Wigner matrices if $C_{\sup}=\sup_{ij}Ns_{ij}<\infty$. The positive lower bound on the variances, $C_{\inf}>0$ in (2.6), is not necessary for this theorem.

Remark 2: Although we state Theorem 16.1 for Hermitian and symmetric ensembles, similar results holdfor real and complex sample covariance ensembles; the modification of the proof is obvious.

Proof of the Green function comparison Theorem 16.1: The basic idea is to estimate, by a resolvent expansion, the effect on the resolvent of changing the matrix elements one by one. Since each matrix element has a typical size of $N^{-1/2}$ and the resolvents are essentially bounded thanks to (16.1), a resolvent expansion up to fourth order will identify the change due to each element with a precision $O(N^{-5/2})$ (modulo some tiny corrections of order $N^{O(\tau)}$). The expectation values of the terms up to fourth order involve only the first four moments of the single-entry distributions, which can be directly compared. The error terms are negligible even when summed up $N^2$ times, the number of comparison steps needed to replace all matrix elements.

Now we turn to the one-by-one replacement. For notational simplicity, we consider the case when the test function $F$ has only $n=1$ variable and $k_1=1$, i.e., we consider the trace of a first order monomial; the general case follows analogously. Consider the telescopic sum of differences of expectations
$$
\mathbb{E}F\Big(\frac{1}{N}\operatorname{Tr}\frac{1}{H^{(v)}-z}\Big)-\mathbb{E}F\Big(\frac{1}{N}\operatorname{Tr}\frac{1}{H^{(w)}-z}\Big)
= \sum_{\gamma=1}^{\gamma(N)}\Big[\mathbb{E}F\Big(\frac{1}{N}\operatorname{Tr}\frac{1}{H_\gamma-z}\Big)-\mathbb{E}F\Big(\frac{1}{N}\operatorname{Tr}\frac{1}{H_{\gamma-1}-z}\Big)\Big].
\qquad (16.7)
$$
Let $E^{(ij)}$ denote the matrix whose matrix elements are zero everywhere except at the $(i,j)$ position, where it is 1, i.e., $E^{(ij)}_{k\ell}=\delta_{ik}\delta_{j\ell}$. Fix a $\gamma\ge1$ and let $(i,j)$ be determined by $\phi(i,j)=\gamma$. We will compare $H_{\gamma-1}$ with $H_\gamma$. Note that these two matrices differ only in the $(i,j)$ and $(j,i)$ matrix elements, and they can be written as
$$
H_{\gamma-1} = Q + \frac{1}{\sqrt N}\,W, \qquad W := w_{ij}E^{(ij)} + w_{ji}E^{(ji)},
$$
$$
H_\gamma = Q + \frac{1}{\sqrt N}\,V, \qquad V := v_{ij}E^{(ij)} + v_{ji}E^{(ji)},
$$
with a matrix $Q$ that has zero matrix elements at the $(i,j)$ and $(j,i)$ positions, and where we set $v_{ji}:=\bar v_{ij}$ for $i<j$ and similarly for $w$. Define the Green functions

with a matrix Q that has zero matrix element at the (i, j) and (j, i) positions and where we set vji := vijfor i < j and similarly for w. Define the Green functions

R :=1

Q− z, S :=

1

Hγ − z.


We first claim that the estimate (15.16) holds for the Green function $R$ as well. To see this, we use the resolvent expansion
$$
R = S + N^{-1/2}SVS + \dots + N^{-9/2}(SV)^9S + N^{-5}(SV)^{10}R.
$$
Since $V$ has at most two nonzero elements, when computing the $(k,\ell)$ matrix element of this identity, each term is a finite sum involving matrix elements of $S$ or $R$ and $v_{ij}$, e.g., $(SVS)_{k\ell}=S_{ki}v_{ij}S_{j\ell}+S_{kj}v_{ji}S_{i\ell}$. Using the bound (15.16) for the $S$ matrix elements, the subexponential decay of $v_{ij}$ and the trivial bound $|R_{ij}|\le\eta^{-1}$, we obtain that the estimate (15.16) holds for $R$ as well.
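Both this expansion and (16.8) below are instances of the same exact algebraic identity, obtained by iterating $S=R-N^{-1/2}RVS$; a numerical sketch (indices, seed and spectral parameter are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(2)
N, i, j = 50, 4, 9
H = rng.standard_normal((N, N))
H = (H + H.T) / np.sqrt(2 * N)

Q = H.copy(); Q[i, j] = Q[j, i] = 0.0        # reference matrix: (i,j) entries removed
V = np.zeros((N, N))
V[i, j] = V[j, i] = np.sqrt(N) * H[i, j]     # so that H = Q + V / sqrt(N)

z = 0.2 + 1e-2j
I = np.eye(N)
R = np.linalg.inv(Q - z * I)                 # R = (Q - z)^{-1}
S = np.linalg.inv(H - z * I)                 # S = (H - z)^{-1}

# Iterating S = R - N^{-1/2} R V S five times gives the exact identity
# S = sum_{m=0}^{4} (-1)^m N^{-m/2} (RV)^m R  -  N^{-5/2} (RV)^5 S
RV = R @ V
expansion = sum((-1)**m * N**(-m / 2) * np.linalg.matrix_power(RV, m) @ R
                for m in range(5))
remainder = -N**(-5 / 2) * np.linalg.matrix_power(RV, 5) @ S
assert np.allclose(S, expansion + remainder)
```

The remainder carries the explicit prefactor $N^{-5/2}$, which is the source of the error bound per replacement step.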

We can now start proving the main result by comparing the resolvents of $H_{\gamma-1}$ and $H_\gamma$ with the resolvent $R$ of the reference matrix $Q$. By the resolvent expansion,
$$
S = R - N^{-1/2}RVR + N^{-1}(RV)^2R - N^{-3/2}(RV)^3R + N^{-2}(RV)^4R - N^{-5/2}(RV)^5S,
\qquad (16.8)
$$
so we can write
$$
\frac{1}{N}\operatorname{Tr}S = \widehat R + \xi, \qquad \xi := \sum_{m=1}^4 N^{-m/2}R^{(m)}_v + N^{-5/2}\Omega_v,
$$
with
$$
\widehat R := \frac{1}{N}\operatorname{Tr}R, \qquad
R^{(m)}_v := (-1)^m\frac{1}{N}\operatorname{Tr}(RV)^mR, \qquad
\Omega_v := -\frac{1}{N}\operatorname{Tr}(RV)^5S.
\qquad (16.9)
$$
For each diagonal element in the computation of these traces, the contribution to $\widehat R$, $R^{(m)}_v$ and $\Omega_v$ is a sum of a few terms, e.g.,
$$
R^{(2)}_v = \frac{1}{N}\sum_k\big[R_{ki}v_{ij}R_{jj}v_{ji}R_{ik} + R_{ki}v_{ij}R_{ji}v_{ij}R_{jk} + R_{kj}v_{ji}R_{ii}v_{ij}R_{jk} + R_{kj}v_{ji}R_{ij}v_{ji}R_{ik}\big],
$$
and similar formulas hold for the other terms. Then we have

$$
\mathbb{E}F\Big(\frac{1}{N}\operatorname{Tr}\frac{1}{H_\gamma-z}\Big)
= \mathbb{E}F\big(\widehat R+\xi\big)
= \mathbb{E}\Big[F(\widehat R)+F'(\widehat R)\,\xi+F''(\widehat R)\,\xi^2+\dots+F^{(5)}(\widehat R+\xi')\,\xi^5\Big]
= \sum_{m=0}^{5}N^{-m/2}\,\mathbb{E}A^{(m)}_v,
\qquad (16.10)
$$
where $\xi'$ is a number between 0 and $\xi$ depending on $\widehat R$ and $\xi$; the $A^{(m)}_v$'s are defined by
$$
A^{(0)}_v = F(\widehat R), \qquad
A^{(1)}_v = F'(\widehat R)\,R^{(1)}_v, \qquad
A^{(2)}_v = F''(\widehat R)\big(R^{(1)}_v\big)^2 + F'(\widehat R)\,R^{(2)}_v,
$$
and similarly for $A^{(3)}_v$ and $A^{(4)}_v$. Finally,
$$
A^{(5)}_v = F'(\widehat R)\,\Omega_v + F^{(5)}(\widehat R+\xi')\big(R^{(1)}_v\big)^5 + \dots.
$$

The expectation values of the terms $A^{(m)}_v$, $m\le4$, with respect to $v_{ij}$ are determined by the first four moments of $v_{ij}$; for example
$$
\mathbb{E}A^{(2)}_v = F'(\widehat R)\Big[\frac{1}{N}\sum_k R_{ki}R_{jj}R_{ik}+\dots\Big]\mathbb{E}|v_{ij}|^2
+ F''(\widehat R)\Big[\frac{1}{N^2}\sum_{k,\ell}R_{ki}R_{j\ell}R_{\ell j}R_{ik}+\dots\Big]\mathbb{E}|v_{ij}|^2
$$
$$
+ F'(\widehat R)\Big[\frac{1}{N}\sum_k R_{ki}R_{ji}R_{jk}+\dots\Big]\mathbb{E}v_{ij}^2
+ F''(\widehat R)\Big[\frac{1}{N^2}\sum_{k,\ell}R_{ki}R_{j\ell}R_{\ell i}R_{jk}+\dots\Big]\mathbb{E}v_{ij}^2.
$$


Note that the coefficients involve up to four derivatives of $F$ and normalized sums of matrix elements of $R$. Using the estimate (15.16) for $R$ and the derivative bounds (16.3) for the typical values of $\widehat R$, we see that all these coefficients are bounded by $N^{C(\tau+\varepsilon)}$ with very high probability, where $C$ is an explicit constant. We use the bound (16.4) for the extreme values of $\widehat R$, but this event has very small probability by (15.16). Therefore, the coefficients of the moments $\mathbb{E}\,\bar v_{ij}^{\,s}v_{ij}^{\,u}$, $u+s\le4$, in the quantities $A^{(0)}_v,\dots,A^{(4)}_v$ are essentially bounded, modulo a factor $N^{C(\tau+\varepsilon)}$. Notice that the $k$-th moment of $v_{ij}$ appears only in the $m=k$ term, which already has a prefactor $N^{-k/2}$ in (16.10).

Finally, we have to estimate the error term $A^{(5)}_v$. All terms without $\Omega_v$ can be dealt with as before: after estimating the derivatives of $F$ by $N^{C(\tau+\varepsilon)}$, one can perform the expectation with respect to $v_{ij}$, which is independent of the $R^{(m)}_v$. For the terms involving $\Omega_v$ one can argue similarly, by appealing to the fact that the matrix elements of $S$ are also essentially bounded by $N^{C(\tau+\varepsilon)}$, see (15.16), and that $v_{ij}$ has subexponential decay. Alternatively, one can use the Hölder inequality to decouple $S$ from the rest and use (15.16) directly; for example,
$$
\mathbb{E}\big|F'(\widehat R)\,\Omega_v\big| = \frac{1}{N}\,\mathbb{E}\big|F'(\widehat R)\operatorname{Tr}(RV)^5S\big|
\le \frac{1}{N}\Big[\mathbb{E}\,\big(F'(\widehat R)\big)^2\operatorname{Tr}SS^*\Big]^{1/2}
\Big[\mathbb{E}\operatorname{Tr}(RV)^5(VR^*)^5\Big]^{1/2}
\le CN^{-\frac52+C(\tau+\varepsilon)}.
$$

Note that exactly the same perturbation expansion holds for the resolvent of $H_{\gamma-1}$, with $A_v$ replaced by $A_w$, i.e., $v_{ij}$ replaced by $w_{ij}$ everywhere. By the moment matching condition (16.2), the expectation values satisfy
$$
N^{-m/2}\big|\mathbb{E}A^{(m)}_v-\mathbb{E}A^{(m)}_w\big| \le N^{C(\tau+\varepsilon)-\delta-2}
$$
for $m\le4$ in (16.10). Choosing $\tau=\varepsilon$, we have
$$
\Big|\mathbb{E}F\Big(\frac{1}{N}\operatorname{Tr}\frac{1}{H_\gamma-z}\Big)-\mathbb{E}F\Big(\frac{1}{N}\operatorname{Tr}\frac{1}{H_{\gamma-1}-z}\Big)\Big|
\le CN^{-5/2+C\varepsilon} + CN^{-2-\delta+C\varepsilon}.
$$
After summing up in (16.7) we have thus proved that
$$
\Big|\mathbb{E}F\Big(\frac{1}{N}\operatorname{Tr}\frac{1}{H^{(v)}-z}\Big)-\mathbb{E}F\Big(\frac{1}{N}\operatorname{Tr}\frac{1}{H^{(w)}-z}\Big)\Big|
\le CN^{-1/2+C\varepsilon} + CN^{-\delta+C\varepsilon}.
$$

The proof can be easily generalized to functions of several variables. This concludes the proof of Theo-rem 16.1.

16.2 Conclusion of the three step strategy

In this section we put the previous results together to prove our main result, Theorem 5.1. Recall that the main result of Step 2, Theorem 12.4, asserts the universality of Gaussian divisible ensembles. However, in order to reach the energy window $N^{-1+\varepsilon}$, we need a substantially large Gaussian component; in terms of DBM, we need a time $t\ge N^{-c\varepsilon}$ for some $c>0$. The eigenvalue continuity result of the previous section is unfortunately restricted to $t\le N^{-1-\varepsilon}$. To overcome this deficiency of the continuity argument, we will approximate the Wigner ensemble by a suitably chosen Gaussian divisible ensemble. We thus term this step "Approximation by a Gaussian divisible ensemble".

We first review the classical moment matching problem for real random variables. For any real random variable $\xi$, denote by $m_k(\xi)=\mathbb{E}\xi^k$ its $k$-th moment. By the Schwarz inequality, the moments $m_1,m_2,\dots$ are not arbitrary numbers; for example $|m_1|^k\le m_k$ and $m_2^2\le m_4$, etc., but there are more subtle relations. For example, if $m_1=0$, then
$$
m_4m_2 - m_3^2 \ge m_2^3,
\qquad (16.11)
$$
which can be obtained from
$$
m_3^2 = \big[\mathbb{E}\xi^3\big]^2 = \big[\mathbb{E}\xi(\xi^2-1)\big]^2
\le \big[\mathbb{E}\xi^2\big]\big[\mathbb{E}(\xi^2-1)^2\big] = m_2\big(m_4-2m_2^2+1\big)
$$


and by noticing that (16.11) is scale invariant, so that it suffices to prove it for $m_2=1$. In fact, it is easy to see that (16.11) is saturated if and only if the support of $\xi$ consists of exactly two points (apart from the trivial case $\xi\equiv0$).
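As a quick numerical check of (16.11) (an illustration, not part of the proof), the inequality holds for arbitrary centred discrete laws:

```python
import numpy as np

rng = np.random.default_rng(3)
for _ in range(200):
    x = rng.standard_normal(5)               # atoms of a discrete law
    p = rng.random(5); p /= p.sum()          # weights
    x = x - p @ x                            # centre so that m1 = 0
    m2, m3, m4 = p @ x**2, p @ x**3, p @ x**4
    assert m4 * m2 - m3**2 >= m2**3 - 1e-12  # (16.11)
```

The derivation above is exactly Cauchy-Schwarz applied to $\xi$ and $\xi^2-m_2$, so the inequality holds for any centred variable with finite fourth moment.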

This restriction shows that, given a sequence of four admissible moments $m_1=0$, $m_2=1$, $m_3$, $m_4$, there may not exist a Gaussian divisible random variable $\xi$ with these moments; e.g., the moment sequence $(m_1,m_2,m_3,m_4)=(0,1,0,1)$ uniquely characterizes the standard Bernoulli variable ($\xi=\pm1$ with probability $1/2$ each). However, if we allow a bit of room in the fourth moment, then one can match any four admissible moments by a slightly Gaussian divisible variable. This is the content of the next lemma [69, Lemma 6.5]. This matching idea in the context of random matrices appeared earlier in [130]. For simplicity, we formulate it for real variables; the extension to complex variables is standard.

Lemma 16.2 (Moment matching). Let $m_3$ and $m_4$ be two real numbers such that
$$
m_4 - m_3^2 - 1 \ge 0, \qquad m_4\le C_2
\qquad (16.12)
$$
for some positive constant $C_2$. Let $\xi^G$ be a Gaussian random variable with mean 0 and variance 1. Then for any sufficiently small $\gamma>0$ (depending on $C_2$), there exists a real random variable $\xi_\gamma$ with subexponential decay and independent of $\xi^G$, such that the first four moments of
$$
\xi' = (1-\gamma)^{1/2}\xi_\gamma + \gamma^{1/2}\xi^G
\qquad (16.13)
$$
are $m_1(\xi')=0$, $m_2(\xi')=1$, $m_3(\xi')=m_3$ and $m_4(\xi')$, with
$$
|m_4(\xi')-m_4| \le C\gamma
\qquad (16.14)
$$
for some $C$ depending on $C_2$.

Proof. We first construct a random variable $X$ with first four moments $0,1,m_3,m_4$ satisfying $m_4-m_3^2-1\ge0$. We take the law of $X$ to be of the form
$$
p\,\delta_a + q\,\delta_{-b} + (1-p-q)\,\delta_0,
$$
where $a,b,p,q>0$ are parameters satisfying $p+q\le1$. The conditions $m_1(X)=0$ and $m_2(X)=1$ imply
$$
p = \frac{1}{a(a+b)}, \qquad q = \frac{1}{b(a+b)}.
$$
Furthermore, the condition $p+q\le1$ reads $ab\ge1$. By an explicit computation we find
$$
m_3(X) = a-b, \qquad m_4(X) = m_3(X)^2 + ab.
\qquad (16.15)
$$
Clearly, the condition $m_4-m_3^2-1\ge0$ implies that (16.15) has a solution with $ab\ge1$. This proves the existence of a random variable supported on three points with the given four moments $0,1,m_3,m_4$ satisfying $m_4-m_3^2-1\ge0$.

Our main task is to construct a Gaussian divisible variable satisfying the approximate four moment matching conditions stated in the lemma. For any real random variable $\zeta$, independent of $\xi^G$ and with first four moments $0$, $1$, $m_3(\zeta)$ and $m_4(\zeta)<\infty$, the first four moments of
$$
\zeta' = (1-\gamma)^{1/2}\zeta + \gamma^{1/2}\xi^G
\qquad (16.16)
$$
are $0$, $1$,
$$
m_3(\zeta') = (1-\gamma)^{3/2}m_3(\zeta)
\qquad (16.17)
$$
and
$$
m_4(\zeta') = (1-\gamma)^2m_4(\zeta) + 6\gamma - 3\gamma^2.
\qquad (16.18)
$$


Since we can match four moments by a random variable, for any $\gamma>0$ there exists a real random variable $\xi_\gamma$ whose first four moments are $0$, $1$,
$$
m_3(\xi_\gamma) = (1-\gamma)^{-3/2}m_3
\qquad (16.19)
$$
and
$$
m_4(\xi_\gamma) = m_3(\xi_\gamma)^2 + \big(m_4-m_3^2\big).
$$
With $m_4\le C_2$ we have $m_3^2\le C_2^{3/2}$, thus
$$
|m_4(\xi_\gamma)-m_4| \le C\gamma
\qquad (16.20)
$$
for some $C$ depending on $C_2$. Hence with (16.17) and (16.18), we obtain that $\xi'=(1-\gamma)^{1/2}\xi_\gamma+\gamma^{1/2}\xi^G$ satisfies $m_3(\xi')=m_3$ and (16.14). This completes the proof of Lemma 16.2.
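The three-point construction can be made fully concrete by solving $m_3=a-b$ and $ab=m_4-m_3^2$ and forming the weights $p,q$ as above. The helper below is a sketch of ours (the function name and the sample targets are not from the text):

```python
import numpy as np

def three_point_law(m3, m4):
    """Atoms/weights of a law with moments (0, 1, m3, m4); needs m4 - m3**2 >= 1."""
    K = m4 - m3**2                           # = ab; K >= 1 makes p + q <= 1
    assert K >= 1
    a = (m3 + np.sqrt(m3**2 + 4 * K)) / 2    # solve a - b = m3, ab = K
    b = a - m3
    p, q = 1 / (a * (a + b)), 1 / (b * (a + b))
    return np.array([a, -b, 0.0]), np.array([p, q, 1 - p - q])

atoms, w = three_point_law(m3=0.5, m4=2.0)
assert np.all(w >= 0) and abs(w.sum() - 1) < 1e-12
moments = [w @ atoms**k for k in range(1, 5)]
assert np.allclose(moments, [0.0, 1.0, 0.5, 2.0])
```

Note that $m_3(X)=a-b$ and $m_4(X)=m_3(X)^2+ab$ follow from the explicit weights, which is exactly (16.15).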

We now prove Theorem 5.1, i.e., that the limit in (15.1) holds. We restrict ourselves to the real symmetric case, since the moment matching Lemma 16.2 was formulated for real random variables; a similar argument works in the complex case. From the universality of Gaussian divisible ensembles (more precisely, (12.24)), for any $b=N^{-1+10\varepsilon}$ and $t\ge N^{-\varepsilon}$ we have
$$
\frac{1}{2b}\int_{E-b}^{E+b}\mathrm{d}E'\int_{\mathbb{R}^n}\mathrm{d}\alpha\, O(\alpha)\,\big(p^{(n)}_{t,N}-p^{(n)}_{G,N}\big)\Big(E'+\frac{\alpha}{N}\Big) \to 0,
\qquad (16.21)
$$
where $p^{(n)}_{t,N}$ is the eigenvalue correlation function of the matrix ensemble $H_t=e^{-t/2}H_0+\sqrt{1-e^{-t}}\,H^G$. The initial matrix ensemble $H_0$ can be any Wigner ensemble. Recall that $H$ is the Wigner ensemble for which we wish to prove (15.1). Given the first four moments $m_1=0$, $m_2=1$, $m_3$, $m_4$ of the rescaled matrix elements $\sqrt N h_{ij}$ of $H$, we first use Lemma 16.2 to construct a distribution $\xi_\gamma$ with $\gamma=1-e^{-t}$. This distribution has the property that the first four moments of $\xi'$, defined by (16.13), match the first four moments of the rescaled matrix elements of $H$ (the fourth moment up to an error $C\gamma$, by (16.14)). We now choose $H_0$ to be the Wigner ensemble with rescaled matrix elements distributed according to $\xi_\gamma$. Then, on the one hand, the matrix $H_t$ has universal local spectral statistics in the sense of (16.21); on the other hand, we can apply the Green function comparison Theorem 16.1 to the matrix ensembles $H$ and $H_t$, so that (15.9) holds for these two ensembles. Clearly, (15.1) follows from (16.21) and (15.9). Note that averaging in the energy parameter $E$ was necessary only for the first step (16.21); the comparison argument works for any fixed energy $E$.


17 Edge Universality

The main result on edge universality for generalized Wigner matrices is given by the following theorem. For simplicity, we formulate it only for the real symmetric case; the complex Hermitian case is analogous.

Theorem 17.1 (Universality of extreme eigenvalues). Suppose that we have two real symmetric generalized $N\times N$ Wigner matrices, $H^{(\mathbf v)}$ and $H^{(\mathbf w)}$, with matrix elements $h_{ij}$ given by the random variables $N^{-1/2}v_{ij}$ and $N^{-1/2}w_{ij}$, respectively, with $v_{ij}$ and $w_{ij}$ satisfying the uniform subexponential decay condition (2.7). Suppose that
\[
\mathbb{E}\, v_{ij}^2 = \mathbb{E}\, w_{ij}^2\,. \tag{17.1}
\]
Let $\lambda_N^{(\mathbf v)}$ and $\lambda_N^{(\mathbf w)}$ denote the largest eigenvalues of $H^{(\mathbf v)}$ and $H^{(\mathbf w)}$, respectively. Then there are $\varepsilon > 0$ and $\delta > 0$, depending on $\vartheta$ in (2.7), such that for any real parameter $s$ (which may depend on $N$) we have
\[
\mathbb{P}\big( N^{2/3}(\lambda_N^{(\mathbf v)} - 2) \le s - N^{-\varepsilon}\big) - N^{-\delta} \;\le\; \mathbb{P}\big( N^{2/3}(\lambda_N^{(\mathbf w)} - 2) \le s \big) \;\le\; \mathbb{P}\big( N^{2/3}(\lambda_N^{(\mathbf v)} - 2) \le s + N^{-\varepsilon}\big) + N^{-\delta} \tag{17.2}
\]
for $N \ge N_0$ sufficiently large, where $N_0$ is independent of $s$. An analogous result holds for the smallest eigenvalue $\lambda_1$.

This theorem shows that the statistics of the extreme eigenvalues depend only on the second moments of the centered matrix entries, but the result does not determine the distribution. For the Gaussian case, the corresponding distribution was identified by Tracy and Widom in [133,134]. Theorem 17.1 immediately gives the same result for Wigner matrices, but not yet for generalized Wigner matrices. In fact, the Tracy-Widom law for generalized Wigner matrices does not follow from Theorem 17.1; its proof in [24] required a quite different argument, see Theorem 18.7.
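Theorem 17.1 can be illustrated (though of course not proved) by a small Monte Carlo experiment comparing the rescaled largest eigenvalue of a Gaussian Wigner matrix with that of a Rademacher ($\pm 1$ entries) Wigner matrix; both ensembles have $\mathbb{E}\,v_{ij}^2 = 1$, so the matching condition (17.1) holds. The sample sizes and tolerances below are chosen loosely for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
N, trials = 200, 50

def largest(sample_entries):
    """Sample N^{2/3}(lambda_N - 2) for a real symmetric Wigner matrix."""
    vals = []
    for _ in range(trials):
        A = sample_entries((N, N))
        H = np.triu(A) + np.triu(A, 1).T      # symmetrize the upper triangle
        H /= np.sqrt(N)                       # Wigner normalization
        vals.append(np.linalg.eigvalsh(H)[-1])
    return N ** (2 / 3) * (np.array(vals) - 2)

goe = largest(lambda s: rng.standard_normal(s))
rad = largest(lambda s: rng.choice([-1.0, 1.0], size=s))

# Both samples are of order one and their means are comparable.
assert np.all(np.abs(goe) < 10) and np.all(np.abs(rad) < 10)
assert abs(goe.mean() - rad.mean()) < 2.0
print(goe.mean(), rad.mean())
```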

We first give an outline of the proof of Theorem 17.1. Denote by $\varrho_N^{(\mathbf v)}$ the empirical eigenvalue distribution
\[
\varrho_N^{(\mathbf v)}(x) = \frac{1}{N}\sum_{\alpha=1}^N \delta\big( x - \lambda_\alpha^{(\mathbf v)}\big)\,.
\]

Our goal is to compare the difference of $\varrho_N^{(\mathbf v)}$ and $\varrho_N^{(\mathbf w)}$ via their Stieltjes transforms, $m_N^{(\mathbf v)}(z)$ and $m_N^{(\mathbf w)}(z)$. We will be able to locate the largest eigenvalue via the distribution of certain functionals of the Stieltjes transforms. This will be done in the preparatory Lemma 17.2 and its Corollary 17.3. Therefore, if $m_N^{(\mathbf v)}(z)$ and $m_N^{(\mathbf w)}(z)$ are sufficiently close, we will be able to compare $\lambda_N^{(\mathbf v)}$ and $\lambda_N^{(\mathbf w)}$.

The main ingredient of the proof of Theorem 17.1 is the Green function comparison theorem at the edge (Theorem 17.4), which shows that the distributions of $m_N^{(\mathbf v)}(z)$ and $m_N^{(\mathbf w)}(z)$ on the critical scale $\operatorname{Im} z \approx N^{-2/3}$ are the same, provided the second moments of the matrix elements match. This will be done by a resolvent expansion, similarly to the Green function comparison theorem in the bulk (Theorem 16.1). However, the resolvent expansion is more efficient at the edge, for a reason that we explain now.

Recall (6.13), stating that $\operatorname{Im} m_{sc}(z) \asymp \sqrt{\eta}$ if $\kappa = \big| |E| - 2\big| \le \eta$ and $z = E + i\eta$. Since the extremal eigenvalue gaps are expected to be of order $N^{-2/3}$, we need to set $\eta \lesssim N^{-2/3}$ in order to identify individual eigenvalues. This is a smaller scale than that of the local semicircle law; in fact, $m_N(z)$ does not have a deterministic limit at this scale and we need to identify its distribution. Hence the reference size of $\operatorname{Im} m_N(z)$ which we should keep in mind is $N^{-1/3}$. The largest eigenvalue can be located via the Stieltjes transform as follows. By the definition of $m_N$, if there is an eigenvalue in a neighborhood of $E$ within distance $\eta$, then
\[
\operatorname{Im} m_N(E + i\eta) = \frac{1}{N}\sum_\alpha \frac{\eta}{(\lambda_\alpha - E)^2 + \eta^2} \gtrsim \frac{1}{N\eta} \gg N^{-1/3},
\]
and otherwise $\operatorname{Im} m_N(E + i\eta) \lesssim N^{-1/3}$. Based upon this idea, we will construct a certain functional of $\operatorname{Im} m_N$ that expresses the distribution of $\lambda_N$. In other words, we expect that if we can control the imaginary part of the trace of the Green function with precision $N^{-1/3}$, then we can identify the individual extremal eigenvalues.
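This dichotomy is easy to observe numerically. The following sketch (our own illustration, not part of the proof) samples a GOE matrix and evaluates $\operatorname{Im} m_N$ at the edge scale $\eta = N^{-2/3}$, once at the largest eigenvalue and once well outside the spectrum:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 400
W = rng.standard_normal((N, N))
H = (W + W.T) / np.sqrt(2 * N)          # GOE normalization, spectrum ~ [-2, 2]
lam = np.linalg.eigvalsh(H)
eta = N ** (-2.0 / 3.0)

def im_m(E):
    """Im m_N(E + i*eta) from the spectral decomposition."""
    return np.mean(eta / ((lam - E) ** 2 + eta ** 2))

# At E = lambda_N the closest eigenvalue alone contributes 1/(N*eta):
assert im_m(lam[-1]) >= 1.0 / (N * eta)
# Well above the spectrum there is no eigenvalue within eta, so Im m_N is small:
assert im_m(2.5) < N ** (-1.0 / 3.0)
print(im_m(lam[-1]), im_m(2.5))
```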


Roughly speaking, our basic principle is that whenever we understand $m_N(z)$ with a precision better than $(N\eta)^{-1}$, i.e., we can improve over the strong local semicircle law (6.32),
\[
\big| m_N(z) - m_{sc}(z)\big| \prec \frac{1}{N\eta}, \qquad \operatorname{Im} z = \eta,
\]
at some scale $\eta$, then we can "identify" the eigenvalue distribution at that scale precisely. It is instructive to compare the edge situation with the bulk, where the critical scale $\eta \sim 1/N$ identifies individual eigenvalues and the typical size of $\operatorname{Im} m_N$ is of order one. In other words, at the critical scale $\operatorname{Im} m_N$ is of order one in the bulk, while it is of order $N^{-1/3}$ at the edge. Owing to (6.31), this fact extends to each resolvent matrix element, i.e., $|G_{ij}| \lesssim N^{-1/3}$ for $i \ne j$ at the edge. Thus a resolvent expansion is more powerful near the edge, which explains why less moment matching is needed than in the bulk.
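The contrast between the two critical scales can be seen numerically. The sketch below samples a GOE matrix and evaluates $\operatorname{Im} m_N$ at the bulk critical scale $\eta = 1/N$ and at the edge critical scale $\eta = N^{-2/3}$; the numerical thresholds are generous and only meant to display the orders of magnitude:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1000
W = rng.standard_normal((N, N))
H = (W + W.T) / np.sqrt(2 * N)           # GOE normalization, spectrum ~ [-2, 2]
lam = np.linalg.eigvalsh(H)

def im_m(E, eta):
    return np.mean(eta / ((lam - E) ** 2 + eta ** 2))

bulk = im_m(0.0, 1.0 / N)                # bulk critical scale: Im m_N = O(1)
edge = im_m(2.0, N ** (-2.0 / 3.0))      # edge critical scale: Im m_N = O(N^{-1/3})

assert 0.05 < bulk < 5.0
assert edge < 10 * N ** (-1.0 / 3.0)
print(bulk, edge)
```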

Now we start the detailed proof of Theorem 17.1. We first introduce some notation. For any $E_1 \le E_2$, let
\[
\mathcal{N}(E_1, E_2) := \#\{ j : E_1 \le \lambda_j \le E_2\} = \operatorname{Tr} \mathbf{1}_{[E_1, E_2]}(H)
\]
be the unnormalized counting function of the eigenvalues. Set $E_+ := 2 + N^{-2/3+\varepsilon}$. From rigidity, Theorem 11.5, we know that $\lambda_N \le E_+$, i.e.,
\[
\mathcal{N}(E,\infty) = \mathcal{N}(E, E_+) \tag{17.3}
\]
with very high probability, so it is sufficient to consider the counting function $\mathcal{N}(E, E_+) = \operatorname{Tr}\chi_E(H)$, with $\chi_E(x) := \mathbf{1}_{[E, E_+]}(x)$. We will approximate the sharp cutoff function $\chi_E$ by its smoothed version $\chi_E \star \theta_\eta$ on scale $\eta \ll N^{-2/3}$, where we recall the notation $\theta_\eta(x)$ from (15.21). The following lemma shows that the error caused by this replacement is negligible.

Lemma 17.2. For any $\varepsilon > 0$ set $\ell_1 = N^{-2/3-3\varepsilon}$ and $\eta = N^{-2/3-9\varepsilon}$. Then there exist constants $C, c$ such that for any $E$ satisfying $|E - 2| \le \frac{3}{2} N^{-2/3+\varepsilon}$ and for any $D$ we have
\[
\mathbb{P}\Big( \big| \operatorname{Tr}\chi_E(H) - \operatorname{Tr}\chi_E \star \theta_\eta(H)\big| \le C\big( N^{-2\varepsilon} + \mathcal{N}(E - \ell_1, E + \ell_1)\big)\Big) \ge 1 - N^{-D} \tag{17.4}
\]
if $N \ge N_0(D)$ is large enough.

We will not give the detailed proof of this lemma, since it is a straightforward approximation argument. We only point out that $\theta_\eta$ is an approximate delta function on scale $\eta \ll N^{-2/3}$, and if it were compactly supported, then the difference between $\operatorname{Tr}\chi_E(H)$ and $\operatorname{Tr}\chi_E\star\theta_\eta(H)$ would clearly be bounded by the number of eigenvalues in an $\eta$-vicinity of $E$. Since $\theta_\eta(x) = \pi^{-1}\eta/(x^2+\eta^2)$, i.e., it has a quadratically decaying tail, this argument must be complemented by estimating the density of eigenvalues away from $E$. This estimate comes from two contributions. Eigenvalues within the $\ell_1$-vicinity of $E$ are directly estimated by the term $\mathcal{N}(E - \ell_1, E + \ell_1)$. Eigenvalues farther away come with an additional factor $\eta/\ell_1 = N^{-6\varepsilon}$ due to the decay of $\theta_\eta$, thus it is sufficient to estimate their density only up to a precision of a factor $N^{\varepsilon}$, which is provided by the optimal local law at the edge. For the precise details, see the proof of Lemma 6.1 in [70], which contains essentially the same argument.

Using (17.3) and the local law to estimate a slightly averaged version of $\mathcal{N}(E - \ell_1, E + \ell_1)$, we arrive at the following corollary.

Corollary 17.3. Let $|E - 2| \le N^{-2/3+\varepsilon}$ and $\ell = \frac{1}{2}\ell_1 N^{2\varepsilon} = \frac{1}{2}N^{-2/3-\varepsilon}$. Then the inequality
\[
\operatorname{Tr}\chi_{E+\ell}\star\theta_\eta(H) - N^{-\varepsilon} \le \mathcal{N}(E,\infty) \le \operatorname{Tr}\chi_{E-\ell}\star\theta_\eta(H) + N^{-\varepsilon} \tag{17.5}
\]
holds with probability bigger than $1 - N^{-D}$ for any $D$ if $N$ is large enough. Furthermore, with $F : \mathbb{R}\to[0,1]$ any smooth nonincreasing function such that $F(x) = 1$ for $x \le 1/9$ and $F(x) = 0$ for $x \ge 2/9$, we have
\[
\mathbb{E}\, F\big( \operatorname{Tr}\chi_{E-\ell}\star\theta_\eta(H)\big) \le \mathbb{P}\big( \mathcal{N}(E,\infty) = 0\big) \le \mathbb{E}\, F\big( \operatorname{Tr}\chi_{E+\ell}\star\theta_\eta(H)\big) + CN^{-D}. \tag{17.6}
\]


Proof. We have $E_+ - E \gg \ell$, thus $|E - \ell - 2| \le \frac{3}{2}N^{-2/3+\varepsilon}$, and therefore (17.4) holds for $E$ replaced with any $y \in [E - \ell, E]$ as well. We thus obtain
\begin{align*}
\operatorname{Tr}\chi_E(H) &\le \ell^{-1}\int_{E-\ell}^{E}\mathrm{d} y\, \operatorname{Tr}\chi_y(H)\\
&\le \ell^{-1}\int_{E-\ell}^{E}\mathrm{d} y\, \operatorname{Tr}\chi_y\star\theta_\eta(H) + C\ell^{-1}\int_{E-\ell}^{E}\mathrm{d} y\,\big[ N^{-2\varepsilon} + \mathcal{N}(y - \ell_1, y + \ell_1)\big]\\
&\le \operatorname{Tr}\chi_{E-\ell}\star\theta_\eta(H) + CN^{-2\varepsilon} + C\,\frac{\ell_1}{\ell}\,\mathcal{N}(E - 2\ell, E + \ell)
\end{align*}
with probability larger than $1 - N^{-D}$ for any $D > 0$. From the rigidity estimate in the form (11.25) and the conditions $|E - 2| \le N^{-2/3+\varepsilon}$, $\ell_1/\ell = 2N^{-2\varepsilon}$ and $\ell \le N^{-2/3}$, we can bound
\[
\frac{\ell_1}{\ell}\,\mathcal{N}(E - 2\ell, E + \ell) \le N^{1-2\varepsilon}\int_{E-2\ell}^{E+\ell}\varrho_{sc}(x)\,\mathrm{d} x + N^{-2\varepsilon}N^{\varepsilon} \le CN^{-\varepsilon}
\]
with very high probability, where we estimated the explicit integral using that the integration domain is in a $CN^{-2/3+\varepsilon}$-vicinity of the edge at $2$. We have thus proved
\[
\mathcal{N}(E, E_+) = \operatorname{Tr}\chi_E(H) \le \operatorname{Tr}\chi_{E-\ell}\star\theta_\eta(H) + N^{-\varepsilon}.
\]
By (17.3), we can replace $\mathcal{N}(E, E_+)$ by $\mathcal{N}(E,\infty)$. This proves the upper bound in (17.5); the lower bound can be proved similarly.

On the event that (17.5) holds, the condition $\mathcal{N}(E,\infty) = 0$ implies that $\operatorname{Tr}\chi_{E+\ell}\star\theta_\eta(H) \le 1/9$. Thus we have
\[
\mathbb{P}\big( \mathcal{N}(E,\infty) = 0\big) \le \mathbb{P}\big( \operatorname{Tr}\chi_{E+\ell}\star\theta_\eta(H) \le 1/9\big) + CN^{-D}. \tag{17.7}
\]
Together with the Markov inequality, this proves the upper bound in (17.6). For the lower bound, we use
\[
\mathbb{E}\, F\big( \operatorname{Tr}\chi_{E-\ell}\star\theta_\eta(H)\big) \le \mathbb{P}\big( \operatorname{Tr}\chi_{E-\ell}\star\theta_\eta(H) \le 2/9\big) \le \mathbb{P}\big( \mathcal{N}(E,\infty) \le 2/9 + N^{-\varepsilon}\big) = \mathbb{P}\big( \mathcal{N}(E,\infty) = 0\big),
\]
where we used the upper bound from (17.5) and that $\mathcal{N}$ is an integer. This completes the proof of the Corollary.

Since $\mathbb{P}(\lambda_N < E) = \mathbb{P}\big( \mathcal{N}(E,\infty) = 0\big)$, we will use (17.6) and the identity
\[
\operatorname{Tr}\chi_E\star\theta_\eta(H) = N\int_E^{E_+}\mathrm{d} y\, \operatorname{Im} m_N(y + i\eta) \tag{17.8}
\]
to relate the distribution of $\lambda_N$ to that of the Stieltjes transform $m_N$. The following Green function comparison theorem shows that the distribution of the right hand side of (17.8) depends only on the second moments of the matrix elements. Its proof will be given after we have completed the proof of Theorem 17.1.
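The identity (17.8) can be checked directly for any fixed spectrum. Note that with the normalization $\theta_\eta(x) = \pi^{-1}\eta/(x^2+\eta^2)$ quoted above, a factor $1/\pi$ appears on the right hand side; the precise constant depends on the convention for $\theta_\eta$ in (15.21). A small numerical sketch with toy eigenvalues:

```python
import numpy as np

lam = np.array([-1.5, -0.3, 0.2, 1.1, 1.9, 1.97])   # toy eigenvalues
N = len(lam)
E, E_plus, eta = 1.8, 2.2, 0.01

# Smoothed counting function Tr chi_E * theta_eta(H), via the exact
# antiderivative: int_E^{E+} theta_eta(l - y) dy
#   = (arctan((l - E)/eta) - arctan((l - E_plus)/eta)) / pi.
lhs = np.sum(np.arctan((lam - E) / eta) - np.arctan((lam - E_plus) / eta)) / np.pi

# (N/pi) * int_E^{E+} Im m_N(y + i*eta) dy, by the trapezoidal rule.
y = np.linspace(E, E_plus, 20001)
im_m = np.mean(eta / ((lam[None, :] - y[:, None]) ** 2 + eta ** 2), axis=1)
dy = y[1] - y[0]
rhs = (N / np.pi) * dy * (im_m.sum() - 0.5 * (im_m[0] + im_m[-1]))

assert abs(lhs - rhs) < 1e-3
print(lhs, rhs)   # both approximate the eigenvalue count in [E, E+]
```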

Theorem 17.4 (Green function comparison theorem on the edge). Suppose that the assumptions of Theorem 17.1, including (17.1), hold. Let $F : \mathbb{R}\to\mathbb{R}$ be a function whose derivatives satisfy
\[
\max_x \big| F^{(\alpha)}(x)\big|\big( |x| + 1\big)^{-C_1} \le C_1, \qquad \alpha = 1, 2, 3, \tag{17.9}
\]
with some constant $C_1 > 0$. Then there exists $\varepsilon_0 > 0$, depending only on $C_1$, such that for any $\varepsilon < \varepsilon_0$ and for any real numbers $E_1$ and $E_2$ satisfying
\[
|E_1 - 2| \le CN^{-2/3+\varepsilon}, \qquad |E_2 - 2| \le CN^{-2/3+\varepsilon},
\]
and setting $\eta = N^{-2/3-\varepsilon}$, we have
\[
\bigg| \mathbb{E}\, F\Big( N\int_{E_1}^{E_2}\mathrm{d} y\, \operatorname{Im} m^{(\mathbf v)}(y + i\eta)\Big) - \mathbb{E}\, F\Big( N\int_{E_1}^{E_2}\mathrm{d} y\, \operatorname{Im} m^{(\mathbf w)}(y + i\eta)\Big)\bigg| \le CN^{-1/6+C\varepsilon} \tag{17.10}
\]
for some constant $C$ and large enough $N$, depending only on $C_1$, $\vartheta$, $\delta_\pm$ and $C_0$ (in (14.26)).


Note that the factor $N$ in front of the integration gives a factor $N|E_1 - E_2| \sim N^{1/3+\varepsilon}$ on the left hand side, which compensates for the "natural" size of the imaginary part of the Stieltjes transform obtained from the local semicircle law and makes the argument of $F$ of order one. The bound (17.10) shows that the distributions of this order-one quantity with respect to the $\mathbf v$ and $\mathbf w$ ensembles are asymptotically the same.

Assuming that Theorem 17.4 holds, we now prove Theorem 17.1.

Proof of Theorem 17.1. By the rigidity of eigenvalues in the form of (11.25), we know that
\[
|\lambda_N - 2| \le N^{-2/3+\varepsilon}, \qquad \mathcal{N}\big( 2 - CN^{-2/3+\varepsilon}, 2 + CN^{-2/3+\varepsilon}\big) \le CN^{\varepsilon}
\]
with very high probability. Thus we can assume that $|s| \le N^{\varepsilon}$ holds for the parameter $s$ in Theorem 17.1; the statement is trivial for other values of $s$. We define $E := 2 + sN^{-2/3}$. With the left side of (17.6), for any sufficiently small $\varepsilon > 0$, we have
\[
\mathbb{E}_{\mathbf w}\, F\big( \operatorname{Tr}\chi_{E-\ell}\star\theta_\eta(H)\big) \le \mathbb{P}_{\mathbf w}\big( \mathcal{N}(E,\infty) = 0\big) \tag{17.11}
\]
with the choice
\[
\ell := \frac{1}{2}N^{-2/3-\varepsilon}, \qquad \eta := N^{-2/3-9\varepsilon}.
\]
The bound (17.10), applied to the case $E_1 = E - \ell$ and $E_2 = E_+$, shows that for sufficiently small $\varepsilon > 0$ there exists $\delta > 0$ such that
\[
\mathbb{E}_{\mathbf v}\, F\big( \operatorname{Tr}\chi_{E-\ell}\star\theta_\eta(H)\big) \le \mathbb{E}_{\mathbf w}\, F\big( \operatorname{Tr}\chi_{E-\ell}\star\theta_\eta(H)\big) + N^{-\delta} \tag{17.12}
\]
(note that $9\varepsilon$ plays the role of the $\varepsilon$ in the Green function comparison theorem). Then applying the right side of (17.6) to the left hand side of (17.12), we have
\[
\mathbb{P}_{\mathbf v}\big( \mathcal{N}(E - 2\ell, \infty) = 0\big) \le \mathbb{E}_{\mathbf v}\, F\big( \operatorname{Tr}\chi_{E-\ell}\star\theta_\eta(H)\big) + CN^{-D}. \tag{17.13}
\]
Combining these inequalities, we have
\[
\mathbb{P}_{\mathbf v}\big( \mathcal{N}(E - 2\ell, \infty) = 0\big) \le \mathbb{P}_{\mathbf w}\big( \mathcal{N}(E,\infty) = 0\big) + 2N^{-\delta} \tag{17.14}
\]
for sufficiently small $\varepsilon > 0$ and sufficiently large $N$. Recalling that $E = 2 + sN^{-2/3}$, this proves the first inequality of (17.2) and, by switching the roles of $\mathbf v$ and $\mathbf w$, the second inequality of (17.2) as well. This completes the proof of Theorem 17.1.

Proof of Theorem 17.4. We follow the notation and the setup of the proof of Theorem 16.1. First we prove a simpler version of (17.10), with $F(x) = x$ and without the integration. Namely, we show that for any $E$ with $|E - 2| \le N^{-2/3+\varepsilon}$ we have
\[
N\eta\,\Big| \mathbb{E}\, m_N^{(\mathbf v)}(z) - \mathbb{E}\, m_N^{(\mathbf w)}(z)\Big| = N\eta\,\Big| \mathbb{E}\,\frac{1}{N}\operatorname{Tr}\frac{1}{H^{(\mathbf v)} - z} - \mathbb{E}\,\frac{1}{N}\operatorname{Tr}\frac{1}{H^{(\mathbf w)} - z}\Big| \le CN^{-1/6+C\varepsilon}, \qquad z = E + i\eta. \tag{17.15}
\]

The prefactor $N\eta$ accounts for the factor $N$ times the integration from $E_1$ to $E_2$ in (17.10), since $|E_1 - E_2| \le C\eta$ up to irrelevant $N^{C\varepsilon}$ factors. We write
\[
\mathbb{E}\,\frac{1}{N}\operatorname{Tr}\frac{1}{H^{(\mathbf v)} - z} - \mathbb{E}\,\frac{1}{N}\operatorname{Tr}\frac{1}{H^{(\mathbf w)} - z} = \sum_{\gamma=1}^{\gamma(N)}\bigg[ \mathbb{E}\,\frac{1}{N}\operatorname{Tr}\frac{1}{H_\gamma - z} - \mathbb{E}\,\frac{1}{N}\operatorname{Tr}\frac{1}{H_{\gamma-1} - z}\bigg], \tag{17.16}
\]

with $R := (Q - z)^{-1}$, $S := (H_\gamma - z)^{-1}$ and
\[
\frac{1}{N}\operatorname{Tr} S = \widehat{R} + \xi_{\mathbf v}, \qquad \xi_{\mathbf v} := \sum_{m=1}^{4} N^{-m/2}\,\widehat{R}^{(m)}_{\mathbf v} + N^{-5/2}\,\Omega_{\mathbf v} \tag{17.17}
\]
with
\[
\widehat{R} := \frac{1}{N}\operatorname{Tr} R, \qquad \widehat{R}^{(m)}_{\mathbf v} := (-1)^m\,\frac{1}{N}\operatorname{Tr}(RV)^m R, \qquad \Omega_{\mathbf v} := -\frac{1}{N}\operatorname{Tr}(RV)^5 S. \tag{17.18}
\]
All these quantities depend on $\gamma$, or equivalently on the index pair $(i,j)$ with $\phi(i,j) = \gamma$ (see the proof of Theorem 16.1), i.e., $R = R(\gamma)$ etc., but we will omit this dependence from the notation. We also define the same quantities with $\mathbf v$ replaced by $\mathbf w$. By the moment matching condition (17.1),
\[
\mathbb{E}\,\widehat{R}^{(m)}_{\mathbf v} = \mathbb{E}\,\widehat{R}^{(m)}_{\mathbf w}, \qquad m = 0, 1, 2,
\]
so we only need to consider the $m = 3, 4$ terms in the summation in (17.17). Since $|E - 2| \le N^{-2/3+\varepsilon}$ and $\eta = N^{-2/3-\varepsilon}$, the strong local semicircle law (6.32) implies that
\[
\big| R_{ij}(z) - \delta_{ij}\, m_{sc}(z)\big| \prec \sqrt{\frac{\operatorname{Im} m_{sc}(z)}{N\eta}} + \frac{1}{N\eta} \le N^{-1/3+C\varepsilon}, \tag{17.19}
\]
and a similar bound holds for $S$. In particular, the off-diagonal terms of the Green functions are of order $N^{-1/3+C\varepsilon}$ and the diagonal terms are bounded, with high probability. As a crude bound, we immediately obtain that $|\widehat{R}^{(m)}_{\mathbf v}|, |\Omega_{\mathbf v}| \le N^{C\varepsilon}$ with high probability.

Hence we have
\[
\sum_{\gamma=1}^{\gamma(N)} (N\eta)\,\Big| \mathbb{E}\,\frac{1}{N}\operatorname{Tr}\frac{1}{H_\gamma - z} - \mathbb{E}\,\frac{1}{N}\operatorname{Tr}\frac{1}{H_{\gamma-1} - z}\Big| \le \sum_{m=3}^{4} N^2 (N\eta) N^{-m/2}\,\max_\gamma\Big| \mathbb{E}\big[ \widehat{R}^{(m)}_{\mathbf v} - \widehat{R}^{(m)}_{\mathbf w}\big]\Big| + \eta N^{1/2+C\varepsilon}. \tag{17.20}
\]
Given a fixed $\gamma$ with $\phi(i,j) = \gamma$, we have the explicit formula
\[
\widehat{R}^{(m)}_{\mathbf v} = \widehat{R}^{(m)}_{\mathbf v}(\gamma) = (-1)^m\,\frac{1}{N}\sum_k\;\sum_{\substack{a_\ell, b_\ell \in \{i,j\}\\ \ell = 1,\dots,m}}\big[ R_{k a_1} v_{a_1 b_1} R_{b_1 a_2}\cdots v_{a_m b_m} R_{b_m k}\big]. \tag{17.21}
\]

First we discuss the generic case, when all indices are different, in particular $i \ne j$; later we comment on the case of coinciding indices. Notice that generically, the first and the last resolvent factors, $R_{k a_1}$ and $R_{b_m k}$, are off-diagonal terms, thus their size is $N^{-1/3+C\varepsilon}$. Every other factor in (17.21) is $O(N^{C\varepsilon})$. Hence the generic contributions to the sum (17.20) give
\[
N^2 (N\eta) N^{-m/2}\,\widehat{R}^{(m)}_{\mathbf v} \lesssim N^{2-m/2}\, N^{-1/3+C\varepsilon}. \tag{17.22}
\]
For $m = 4$ this is $N^{-1/3+C\varepsilon}$, so it is negligible. For $m = 3$, we have
\[
N^2 (N\eta) N^{-3/2}\,\widehat{R}^{(3)}_{\mathbf v} \lesssim N^{1/6+C\varepsilon}, \tag{17.23}
\]
so the straightforward power counting is not sufficient. Our goal is to show that the expectation of this term is in fact much smaller. To see this, we may assume that on the right hand side of (17.21) all the Green functions except the first and the last one are diagonal, since otherwise we have an additional (third) off-diagonal term, which yields an extra factor $N^{-1/3+C\varepsilon}$; this would be sufficient when combined with the trivial power counting yielding (17.23). If both intermediate Green functions in (17.21) are diagonal, then we can replace each of them with $m_{sc}$, using the strong local semicircle law (17.19), at the expense of a multiplicative error $N^{-1/3+C\varepsilon}$. This improves the previous power counting of order $N^{1/6+C\varepsilon}$ to $N^{-1/6+C\varepsilon}$, as required in (17.15). Thus we only have to consider the contributions from the terms
\[
\frac{1}{N}\sum_k\,\sum_{a_\ell, b_\ell \in \{i,j\}} m_{sc}^2\,\mathbb{E}\big[ R_{k a_1} v_{a_1 b_1} v_{a_2 b_2} v_{a_3 b_3} R_{b_3 k}\big], \tag{17.24}
\]
contributing to $\widehat{R}^{(3)}_{\mathbf v}$, where the summation is restricted to the choices consistent with the fact that all the Green functions in the middle are diagonal, i.e., $b_1 = a_2$ and $b_2 = a_3$. Together with the restriction $a_\ell, b_\ell \in \{i,j\}$ and the assumption that all indices are distinct, this yields only two terms:
\[
\frac{1}{N}\sum_k m_{sc}^2\,\mathbb{E}\big[ R_{ki} v_{ij} v_{ji} v_{ij} R_{jk}\big] + \frac{1}{N}\sum_k m_{sc}^2\,\mathbb{E}\big[ R_{kj} v_{ji} v_{ij} v_{ji} R_{ik}\big]. \tag{17.25}
\]


Considering the prefactor $N^2(N\eta)N^{-3/2} \le N^{5/6}$ in (17.23), in order to prove (17.15) we need to show that both terms in (17.25) are bounded by $N^{-1+C\varepsilon}$, i.e., that they are at least a factor $N^{-1/3}$ smaller than the trivial power counting $N^{-2/3+C\varepsilon}$ that the two off-diagonal elements would give.

We will focus on the first term in (17.25). For the last resolvent entry, $R_{jk}$, we use the identity
\[
R_{jk} = R^{(i)}_{jk} + \frac{R_{ji} R_{ik}}{R_{ii}}
\]
from (8.1). The second term already contains two off-diagonal elements, $R_{ji} R_{ik}$, yielding an additional factor $N^{-1/3+C\varepsilon}$ with high probability (i.e., even without taking the expectation). So we focus on the term
\[
\frac{1}{N}\sum_k m_{sc}^2\,\mathbb{E}\big[ R_{ki} v_{ij} v_{ji} v_{ij} R^{(i)}_{jk}\big]. \tag{17.26}
\]

Recall the identity (8.2),
\[
R_{ki} = -R_{kk}\sum_\ell^{(i)} R^{(i)}_{k\ell}\big( N^{-1/2} v_{\ell i}\big).
\]
Hence we have
\[
(17.26) = -\frac{1}{N}\sum_k m_{sc}^2\, N^{-1/2}\sum_\ell^{(i)} \mathbb{E}\big[ R_{kk} R^{(i)}_{k\ell} v_{\ell i}\, v_{ij} v_{ji} v_{ij}\, R^{(i)}_{jk}\big]. \tag{17.27}
\]
We may again replace the diagonal term $R_{kk}$ with $m_{sc}$ at a negligible error, and we are left with
\[
-m_{sc}^3\, N^{-3/2}\sum_k\sum_\ell^{(i)} \mathbb{E}\big[ R^{(i)}_{k\ell} v_{\ell i}\, v_{ij} v_{ji} v_{ij}\, R^{(i)}_{jk}\big]. \tag{17.28}
\]
The expectation with respect to the variable $v_{\ell i}$ renders this term zero unless $\ell = j$, in which case we gain an additional factor $N^{-1/2+C\varepsilon}$ in the power counting compared with (17.26). So the contribution of the last displayed expression to (17.23) is improved from $N^{1/6+C\varepsilon}$ to $N^{-1/3+C\varepsilon}$.

The argument so far assumed that all indices are distinct. For each coinciding index we gain a factor $1/N$ from the restriction in the summation, while one off-diagonal element, accounted for as a factor $N^{-1/3+C\varepsilon}$, may be "lost" in the sense that it may become diagonal. The total balance of a coinciding index in the power counting is thus a factor $N^{-2/3}$, so we conclude that terms with coinciding indices are negligible. This proves the simplified version (17.15) of Theorem 17.4.

The argument so far assumed that all indices are distinct. In case of each coinciding index we gain afactor 1/N from the restriction in the summation and one off-diagonal element, accounted for as a factorN−1/3+Cε, may be ”lost” in a sense that it may become diagonal. The total balance of a coinciding index inthe power counting is thus a factor N−2/3, so we conclude that terms with coinciding indices are negligible.This proves the simplified version (16.5) of Theorem 17.4.

To prove (17.10), for simplicity we ignore the integration and prove only that
\[
\Big| \mathbb{E}\, F\big( N\eta \operatorname{Im} m^{(\mathbf v)}(z)\big) - \mathbb{E}\, F\big( N\eta \operatorname{Im} m^{(\mathbf w)}(z)\big)\Big| \le CN^{-1/6+C\varepsilon}, \qquad z = E + i\eta, \tag{17.29}
\]
for any $E$ with $|E - 2| \le N^{-2/3+\varepsilon}$. The $\eta$ factor, up to an irrelevant $N^{2\varepsilon}$, accounts for the integration domain of length $|E_2 - E_1| \le N^{-2/3+\varepsilon}$ in the arguments of (17.10); the general form (17.10) follows in the same way. We point out that (17.29), with $F(N\eta\operatorname{Im} m(z))$ replaced by $F(N\eta\, m(z))$, also holds by the same argument given below.

Taking into account the telescopic summation over all $\gamma$'s, for the proof of (17.29) it is sufficient to prove that
\[
N^2\,\bigg| \mathbb{E}\, F\Big( N\eta\,\frac{1}{N}\operatorname{Im}\operatorname{Tr}\frac{1}{H_\gamma - z}\Big) - \mathbb{E}\, F\Big( N\eta\,\frac{1}{N}\operatorname{Im}\operatorname{Tr}\frac{1}{H_{\gamma-1} - z}\Big)\bigg| \le CN^{-1/6+C\varepsilon}, \tag{17.30}
\]
and then sum this up over all $\gamma = \phi(i,j)$. As before, we assume the generic case, i.e., $i \ne j$. We Taylor expand the function $F$ around the point
\[
\Xi := N\eta \operatorname{Im} m^{(R)}(z), \qquad z = E + i\eta,
\]


where $m^{(R)} := \frac{1}{N}\operatorname{Tr} R$. Notice that $m^{(R)}$ is independent of $v_{ij}$ and $w_{ij}$. Recalling the definition
\[
\xi_{\mathbf v} = \sum_{m=1}^{4} N^{-m/2}\,\widehat{R}^{(m)}_{\mathbf v} + N^{-5/2}\,\Omega_{\mathbf v} \tag{17.31}
\]
from (17.17), and the similar definition for $\xi_{\mathbf w}$, we obtain
\[
\bigg| \mathbb{E}\, F\Big( N\eta\frac{1}{N}\operatorname{Im}\operatorname{Tr}\frac{1}{H_\gamma - z}\Big) - \mathbb{E}\, F\Big( N\eta\frac{1}{N}\operatorname{Im}\operatorname{Tr}\frac{1}{H_{\gamma-1} - z}\Big)\bigg| \le \Big| \mathbb{E}\, F'(\Xi)\big( N\eta\big[ \xi_{\mathbf v} - \xi_{\mathbf w}\big]\big)\Big| + \frac{1}{2}\Big| \mathbb{E}\, F''(\Xi)\big( \big[ N\eta\,\xi_{\mathbf v}\big]^2 - \big[ N\eta\,\xi_{\mathbf w}\big]^2\big)\Big| + \dots\,. \tag{17.32}
\]
We now substitute (17.31) into this expression. By the naive power counting,
\[
(N\eta) N^{-m/2}\big| \widehat{R}^{(m)}_{\mathbf v}\big| \le N^{-m/2 - 1/3 + C\varepsilon}, \qquad (N\eta) N^{-5/2}|\Omega_{\mathbf v}| \le N^{-13/6+C\varepsilon}
\]
from the previous proof. The contributions of all the $\Omega$ error terms, as well as of the $m = 4$ terms, clearly satisfy the aimed bound (17.30) and thus can be neglected.

First, we prove that the quadratic term (as well as all higher order terms) in the Taylor expansion (17.32) is negligible. Here the cancellation between the $\mathbf v$ and $\mathbf w$ contributions is essential. It is sufficient to consider
\[
(N\eta)^2\sum_{m, m' = 1}^{3} N^{-m/2 - m'/2}\Big[ \widehat{R}^{(m)}_{\mathbf v}\widehat{R}^{(m')}_{\mathbf v} - \widehat{R}^{(m)}_{\mathbf w}\widehat{R}^{(m')}_{\mathbf w}\Big]
\]
instead of $[N\eta\,\xi_{\mathbf v}]^2 - [N\eta\,\xi_{\mathbf w}]^2$. Since $F''$ is bounded, any term with $m + m' \ge 4$ is negligible by power counting, similarly to the above. The contribution of the $m = m' = 1$ term to the quadratic term in (17.32) is explicitly given by
\[
\mathbb{E}\, F''(\Xi)\Big( \frac{1}{N}\sum_k\big[ R_{ki} v_{ij} R_{jk} + R_{kj} v_{ji} R_{ik}\big]\Big)\Big( \frac{1}{N}\sum_{k'}\big[ R_{k'i} v_{ij} R_{jk'} + R_{k'j} v_{ji} R_{ik'}\big]\Big) - \mathbb{E}\, F''(\Xi)\big( \mathbf v \leftrightarrow \mathbf w\big).
\]
Since $R$ and $\Xi$ are independent of $v_{ij}$ and $w_{ij}$, the expectations with respect to $v_{ij}$ and $w_{ij}$ can be computed. Since their second moments coincide, the two expectations above exactly cancel each other.

We are left with the $m + m' = 3$ terms; we may assume $m = 2$ and $m' = 1$. We do not expect any more cancellation between the $\mathbf v$ and $\mathbf w$ terms, so we consider them separately. A typical such term contributing to (17.30), with all the prefactors, is of the form
\[
N^2 (N\eta)^2 N^{-3/2}\, F''(\Xi)\Big( \frac{1}{N}\sum_k R_{ki} v_{ij} R_{jj} v_{ji} R_{ik}\Big)\Big( \frac{1}{N}\sum_{k'} R_{k'i} v_{ij} R_{jk'}\Big).
\]
The key point is that generically there are at least four off-diagonal elements, $R_{ki}$, $R_{ik}$, $R_{k'i}$ and $R_{jk'}$ (we chose the "worst" term, where the resolvent $R_{jj}$ in the middle is diagonal). So together with the boundedness of $F''$, the direct power counting gives
\[
N^2 (N\eta)^2 N^{-3/2}\big( N^{-1/3+C\varepsilon}\big)^4 = N^{-1/6+C\varepsilon},
\]
which is exactly the precision required in (17.30). In case of each coinciding index (for example $k = i$), two off-diagonal elements may be "lost", but we gain a factor $1/N$ from the restriction of the summation. So terms with coinciding indices are negligible, as before. Here we discussed only the quadratic term in the Taylor expansion (17.32), but it is easy to see that the higher order terms are even smaller, without any cancellation effect.

So we are left with estimating the linear term in (17.32), i.e., we need to bound
\[
N^2 (N\eta)\sum_{m=1}^{3} N^{-m/2}\,\mathbb{E}\, F'(\Xi)\Big[ \widehat{R}^{(m)}_{\mathbf v} - \widehat{R}^{(m)}_{\mathbf w}\Big],
\]
where we have already taken into account that the $m = 4$ term, as well as the $\Omega_{\mathbf v}$ error term from (17.31), are negligible by direct power counting. If $F'(\Xi)$ were deterministic, then the previous argument leading to (17.15) would also prove the $N^{-1/6+C\varepsilon}$ bound for the linear term in (17.32). In fact, the exact cancellation between the expectations of the $\widehat{R}^{(m)}_{\mathbf v}$ and $\widehat{R}^{(m)}_{\mathbf w}$ terms for $m = 1$ and $m = 2$ relied only on computing the expectation with respect to $v_{ij}$ and $w_{ij}$ and on the matching of the first and second moments. Since $F'(\Xi)$ is independent of $v_{ij}$ and $w_{ij}$, this argument remains valid even with the $F'(\Xi)$ factor included. We thus need to control the $m = 3$ term, i.e., show that
\[
N^2 (N\eta) N^{-3/2}\,\mathbb{E}\, F'(\Xi)\,\widehat{R}^{(3)}_{\mathbf v} \le N^{-1/6+C\varepsilon}. \tag{17.33}
\]
The same bound then also holds if $\mathbf v$ is replaced with $\mathbf w$. Using the boundedness of $F'$ and the argument between (17.24) and (17.28) in the previous proof, we see that for (17.33) it is sufficient to show that
\[
\bigg| N^2 (N\eta) N^{-3/2}\, N^{-3/2}\sum_k\sum_\ell^{(i)}\mathbb{E}\, F'(\Xi)\Big[ R^{(i)}_{k\ell} v_{\ell i}\, v_{ij} v_{ji} v_{ij}\, R^{(i)}_{jk}\Big]\bigg| \le N^{-1/6+C\varepsilon}. \tag{17.34}
\]
Without the $F'(\Xi)$ factor, the expectation with respect to $v_{\ell i}$ would render this term zero in the generic case $\ell \ne j$. In order to exploit this effect, we need to expand $F'(\Xi)$ in the single variable $v_{\ell i}$.

Fix an index $\ell \ne j$ and let $Q^\ell$ be the matrix identical to $Q$ except that the $(\ell, i)$ and $(i, \ell)$ matrix elements are set to zero, and let $T = T^\ell = (Q^\ell - z)^{-1}$ be its resolvent. Clearly, $T$ is independent of $v_{ij}$ and $v_{\ell i}$, and the local law (17.19) holds for the matrix elements of $T$ as well. Using the resolvent expansion, we find
\[
\Xi = (N\eta)\operatorname{Im}\bigg[ \frac{1}{N}\sum_n\Big[ T_{nn} + N^{-1/2}\big( T_{n\ell} v_{\ell i} T_{in} + T_{ni} v_{i\ell} T_{\ell n}\big) + \dots\Big]\bigg], \tag{17.35}
\]
where the higher order terms carry a prefactor $N^{-1}$. We Taylor expand $F'(\Xi)$ around
\[
\Xi_0 := (N\eta)\operatorname{Im}\frac{1}{N}\sum_n T_{nn},
\]
and we obtain
\[
F'(\Xi) = F'(\Xi_0) + F''(\Xi_0)\,(N\eta)\operatorname{Im}\bigg[ \frac{1}{N^{3/2}}\sum_n\big[ T_{n\ell} v_{\ell i} T_{in} + T_{ni} v_{i\ell} T_{\ell n} + \dots\big]\bigg] + \dots\,. \tag{17.36}
\]
Since $\Xi_0$ is independent of $v_{\ell i}$, when we insert the above expansion of $F'(\Xi)$ into (17.34), the contribution of the leading term $F'(\Xi_0)$ gives zero after taking the expectation with respect to $v_{\ell i}$.

The two subleading terms in (17.36), which are linear in $v_{\ell i}$, generically contain two off-diagonal elements, $T_{n\ell}$ and $T_{in}$ (or $T_{ni}$ and $T_{\ell n}$), each of size $N^{-1/3+C\varepsilon}$. So the contribution of these terms to (17.34) by simple power counting is
\[
N^2 (N\eta) N^{-3/2}\, N^{-3/2}\, N\cdot N\,(N\eta) N^{-1/2}\big( N^{-1/3+C\varepsilon}\big)^4 \le N^{-1/6+C\varepsilon}.
\]
A similar argument shows that the higher order terms in the Taylor expansion (17.36), as well as the higher order terms in the resolvent expansion (17.35), give a negligible contribution.

As before, we presented the argument for generic indices; coinciding indices give smaller contributions, as explained before. We leave the details to the reader. This completes the proof of (17.30) and thus the proof of Theorem 17.4.


18 Further results and historical notes

Spectral statistics of large random matrices have been studied from many different perspectives. The main focus of this book was on one particular result: the Wigner-Dyson-Mehta bulk universality for Wigner matrices in the average energy sense (Theorem 5.1). In this last section we collect a few other recent results and open questions of this very active research area. We also give references to related results and summarize the history of the development.

18.1 Wigner matrices: bulk universality in different senses

Theorem 5.1, which is also valid for complex Hermitian ensembles, settles the average energy version of the Wigner-Dyson-Mehta conjecture for generalized Wigner matrices under the moment assumption (5.6) on the matrix elements. Theorem 5.1 was proved in increasing order of generality in [63], [69], [68] and [70]; see also [130] for the complex Hermitian Wigner ensemble and for a restricted class of real symmetric matrices (see comments about this class later on). The following theorem reduces the moment assumption to $4+\varepsilon$ moments. The proof in [53] also provides an effective speed of convergence in (18.2). We also point out that the condition (2.6) can be somewhat relaxed, see Corollary 8.3 in [55].

Theorem 18.1 (Universality with averaged energy). [53, Theorem 7.2] Suppose that $H = (h_{ij})$ is a complex Hermitian (respectively, real symmetric) generalized Wigner matrix. Suppose that for some constants $\varepsilon > 0$, $C > 0$,
\[
\mathbb{E}\,\big| \sqrt{N} h_{ij}\big|^{4+\varepsilon} \le C. \tag{18.1}
\]
Let $n \in \mathbb{N}$ and let $O : \mathbb{R}^n \to \mathbb{R}$ be a test function (i.e., compactly supported and continuous). Fix $|E_0| < 2$ and $\xi > 0$; then with $b_N = N^{-1+\xi}$ we have
\[
\lim_{N\to\infty}\int_{E_0 - b_N}^{E_0 + b_N}\frac{\mathrm{d} E}{2 b_N}\int_{\mathbb{R}^n}\mathrm{d}\alpha\, O(\alpha)\,\frac{1}{\varrho_{sc}(E)^n}\Big( p^{(n)}_N - p^{(n)}_{G,N}\Big)\Big( E + \frac{\alpha}{N\varrho_{sc}(E)}\Big) = 0\,. \tag{18.2}
\]
Here $\varrho_{sc}$ is the semicircle law defined in (3.1), $p^{(n)}_N$ is the $n$-point correlation function of the eigenvalue distribution of $H$ (4.20), and $p^{(n)}_{G,N}$ is the $n$-point correlation function of an $N\times N$ GUE (respectively, GOE) matrix.

A relatively straightforward consequence of Theorem 18.1 is the average gap universality formulated in the following corollary. The gap distributions of the comparison ensembles, i.e., the Gaussian ensembles, can be explicitly expressed in terms of a Fredholm determinant [37,40]. Recall the notation $\llbracket A, B\rrbracket := \mathbb{Z}\cap[A,B]$ for any real numbers $A < B$.

Corollary 18.2 (Gap universality with averaged label). Let $H$ be as in Theorem 18.1 and let $O$ be a test function of $n$ variables. Fix small positive constants $\xi, \alpha > 0$. Then for any integer $j_0 \in \llbracket \alpha N, (1-\alpha)N\rrbracket$ we have
\[
\lim_{N\to\infty}\frac{1}{2N^{\xi}}\sum_{|j - j_0| \le N^{\xi}}\big[ \mathbb{E} - \mathbb{E}^G\big]\, O\big( N(\lambda_j - \lambda_{j+1}), N(\lambda_j - \lambda_{j+2}), \dots, N(\lambda_j - \lambda_{j+n})\big) = 0. \tag{18.3}
\]
Here the $\lambda_j$'s are the ordered eigenvalues, and for brevity we omitted the rescaling factor $\varrho(\lambda_j)$ from the argument of $O$. Moreover, $\mathbb{E}$ and $\mathbb{E}^G$ denote the expectation with respect to the Wigner ensemble $H$ and the Gaussian (GOE or GUE) ensemble, respectively.

The next result shows that Theorem 18.1 holds under the same conditions without energy averaging.

Theorem 18.3 (Universality at fixed energy). [26] Consider a complex Hermitian (respectively, real symmetric) generalized Wigner matrix with the moment condition (18.1). Then for any $E$ with $|E| < 2$ we have
\[
\lim_{N\to\infty}\int_{\mathbb{R}^n}\mathrm{d}\alpha\, O(\alpha)\,\frac{1}{\varrho_{sc}(E)^n}\Big( p^{(n)}_N - p^{(n)}_{G,N}\Big)\Big( E + \frac{\alpha}{N\varrho_{sc}(E)}\Big) = 0\,. \tag{18.4}
\]


The fixed energy universality (18.4) for the $\beta = 2$ (complex Hermitian) case was already known earlier, see [57,66,131]. This case is exceptional, since the Harish-Chandra/Itzykson/Zuber identity allows one to compute the correlation functions of Gaussian divisible ensembles. This method relies on an algebraic identity and has no generalization to other symmetry classes.

Finally, the gap universality with fixed label asserts that (18.3) holds without averaging.

Theorem 18.4 (Gap universality with fixed label). [67, Theorem 2.2] Consider a complex Hermitian (respectively, real symmetric) generalized Wigner matrix and assume subexponential decay of the matrix elements instead of (18.1). Then Corollary 18.2 holds without averaging:
\[
\lim_{N\to\infty}\big[ \mathbb{E} - \mathbb{E}^G\big]\, O\big( N(\lambda_j - \lambda_{j+1}), N(\lambda_j - \lambda_{j+2}), \dots, N(\lambda_j - \lambda_{j+n})\big) = 0, \tag{18.5}
\]
for any $j \in \llbracket \alpha N, (1-\alpha)N\rrbracket$ with a fixed $\alpha > 0$. More generally, for any $k, m \in \llbracket \alpha N, (1-\alpha)N\rrbracket$ we have
\[
\lim_{N\to\infty}\Big| \mathbb{E}\, O\big( (N\varrho_k)(\lambda_k - \lambda_{k+1}), (N\varrho_k)(\lambda_k - \lambda_{k+2}), \dots, (N\varrho_k)(\lambda_k - \lambda_{k+n})\big) - \mathbb{E}^G O\big( (N\varrho_m)(\lambda_m - \lambda_{m+1}), \dots, (N\varrho_m)(\lambda_m - \lambda_{m+n})\big)\Big| = 0, \tag{18.6}
\]
where the local density $\varrho_k$ is defined by $\varrho_k := \varrho_{sc}(\gamma_k)$ with $\gamma_k$ from (11.31).

The second part (18.6) of this theorem asserts that the gap distribution is not only independent of the specific Wigner ensemble, but is also universal throughout the bulk spectrum. This is the counterpart of the statement that the appropriately rescaled correlation functions in (18.4) have a limit that is independent of $E$, see (4.38).

Prior to [67], universality for a single gap was only achieved in the special case of the Gaussian unitary ensemble (GUE) in [128]; that statement then immediately implies the same result for complex Hermitian Wigner matrices satisfying the four moment matching condition.

18.2 Historical overview of the three-step strategy for bulk universality

We summarize the recent history related to the bulk universality results in Section 18.1. The three-step strategy first appeared implicitly in [57] in the context of complex Hermitian matrices and was conceptualized for all symmetry classes in [63] by linking it to the DBM. Several key components of this strategy appeared in the later works [63,64,69]. We will review the history of Steps 1-3 separately.

History of Step 1. The semicircle law was proved by Wigner for energy windows of order one [137]. Various improvements were made to shrink the spectral windows; in particular, results down to scale N^{-1/2} were obtained by combining the results of [11] and [79]. The result at the optimal scale, N^{-1}, referred to as the local semicircle law, was established for Wigner matrices in a series of papers [60-62]. The method was based on a self-consistent equation for the Stieltjes transform of the eigenvalues, m(z), and the continuity in the imaginary part of the spectral parameter z. As a by-product, the optimal eigenvector delocalization estimate was proved. In order to deal with generalized Wigner matrices, one needed to consider the self-consistent equation for the vector (G_{ii}(z))_{i=1}^N of diagonal matrix elements of the Green function, since in this case there is no closed equation for m(z) = N^{-1} Tr G(z) [68, 69]. Together with the fluctuation averaging lemma, this method implied the optimal rigidity estimate of eigenvalues in the bulk in [68] and up to the edges in [70]. Furthermore, optimal control was obtained for the individual matrix elements of the resolvent G_{ij} and not only for its trace. The estimate on G_{ii} also provided a simple alternative proof of the eigenvector delocalization estimate. A comprehensive summary of these results can be found in [55]. Several further extensions concern weaker moment conditions and improvements of (log N)-powers, see e.g. [32, 77, 129] and references therein.
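As a small numerical illustration of Step 1 (not from the text; the matrix size and spectral parameter are ad hoc choices), one can check that the empirical Stieltjes transform m_N(z) = N^{-1} Tr G(z) of a sampled GUE-type matrix already agrees with m_sc(z) at a mesoscopic scale η = 20/N, well below scale one, in line with the local semicircle law:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000

# GUE-type Wigner matrix, normalized so the spectrum converges to [-2, 2]
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
H = (A + A.conj().T) / (2 * np.sqrt(N))
eigs = np.linalg.eigvalsh(H)

# empirical Stieltjes transform m_N(z) = N^{-1} Tr G(z) at mesoscopic eta ~ N^{-1+eps}
z = 0.0 + 1j * (20 / N)
m_N = np.mean(1.0 / (eigs - z))

# semicircle Stieltjes transform m_sc(z) = (-z + sqrt(z^2 - 4)) / 2, branch with Im > 0
m_sc = (-z + np.sqrt(z * z - 4)) / 2

print(abs(m_N - m_sc))  # small even though eta = 0.02 is far below order one
```

The deviation is of order (Nη)^{-1}, which is the typical size of the fluctuation of m_N at this scale.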

History of Step 2. The universality for Gaussian divisible ensembles was proved by Johansson [85] for complex Hermitian Wigner ensembles in a certain range of the bulk spectrum. This was the first result beyond exact Gaussian calculations going back to Gaudin, Dyson and Mehta. It was later extended to complex sample covariance matrices on the entire bulk spectrum by Ben Arous and Péché [19]. There were two major restrictions of this method: (i) the Gaussian component was required to be of order one, independent of N; (ii) it relies on an explicit formula by Brezin-Hikami [30, 31] for the correlation functions of eigenvalues which is valid only for Gaussian divisible ensembles with a unitary invariant Gaussian component. The size of the Gaussian component was reduced to N^{-1+ε} in [57] by using an improved formula for the correlation functions and the local semicircle law from [60-62].

To eliminate the usage of an explicit formula, a conceptual approach for Step 2 via the local ergodicity of Dyson Brownian motion was initiated in [63]. In [64], a general theorem for the bulk universality was formulated which applies to all classical ensembles, i.e., real and complex Wigner matrices, real and complex sample covariance matrices and quaternion Wigner matrices. The relaxation time to local equilibrium proved in these two papers was not optimal; the optimal relaxation time, conjectured by Dyson, was obtained later in [70]. The DBM approach in these papers yields the average gap and hence the average energy universality, while the method via the Brezin-Hikami formula gives universality in a fixed energy sense, but is strictly restricted to the complex Hermitian case. The fixed energy universality for all symmetry classes was recently achieved in [26]. The method in this paper was still based on DBM, but the analysis of DBM was very different from the one discussed in this book.

History of Step 3. Once universality was established for Gaussian divisible ensembles e^{-t/2} H_0 + (1 − e^{-t})^{1/2} H^G (here we used the convention of Chapter 5) for a sufficiently large class of H_0 and sufficiently small t, the final step is to approximate the local eigenvalue distribution of a general Wigner matrix by that of a Gaussian divisible one. The first approximation result was obtained via the reversal of the heat flow in [57], which required smoothness of the distribution of the matrix elements. Shortly after, Tao and Vu [130] proved the four moment theorem which asserts that the local statistics of two Wigner ensembles are the same provided that the first four moments of their matrix elements are identical. The third comparison method is the Green function comparison theorem, Theorem 16.1 [63, 68, 69], which was the main content of Section 16. It uses similar moment conditions to those appearing in [130], but it provides detailed information on the matrix elements of Green functions as well. Both [130] and the proof of the Green function comparison theorem [63, 68, 69] use the local semicircle law [60, 62] and Lindeberg's idea, which had appeared in the proof of the Wigner semicircle law by Chatterjee [34]. The approach of [130] requires additional difficult estimates on level repulsion, while Theorem 16.1 follows directly from the estimates on the matrix elements of the Green function from Step 1 via standard resolvent expansions. Another comparison method reviewed in this book, Theorem 15.2, can be viewed as a variation of the GFT.

In summary, the WDM conjecture was first solved in the complex Hermitian case and then for the real symmetric case. In the first case, the bulk universality for Hermitian matrices with smooth distribution was proved in [57]. The smoothness requirement was removed in [130] but the complex Bernoulli distribution was still excluded; this was finally added in [58], see also [66, 131] for related discussions.

The result of [130] also implies bulk universality of symmetric Wigner matrices under the restriction that the first four moments of the matrix elements match those of GOE. The major roadblock to the WDM conjecture for real symmetric matrices was the lack of Step 2 in this case. For symmetric matrices, we do not know any Brezin-Hikami type formula, which was the backbone of Step 2 in the Hermitian case. The resolution of this difficulty was to abandon any explicit formula and to prove Dyson's conjecture on the relaxation of the Dyson Brownian Motion, see Step 2. Together with the Green function comparison theorem [63], bulk universality was proved for real symmetric matrices without any smoothness requirement. In particular, this result covered the real Bernoulli case [68], which has been a major objective in this subject due to the application to adjacency matrices of random graphs. The flexibility of the DBM approach also allowed one to establish universality beyond the traditional Wigner ensembles. The first such result was bulk universality for generalized Wigner matrices [69], followed by many other models, see Section 18.6.

Finally, the technical condition assumed in all these papers, i.e., that the probability distributions of the matrix elements decay subexponentially, was reduced to the (4 + ε)-moment assumption (18.1) by using the universality of Erdős-Rényi matrices [53]. This yielded the bulk universality as formulated in Theorem 18.1.


18.3 Invariant ensembles and log-gases

The explicit formula (4.3) for the joint density function of the eigenvalues of the invariant matrix ensemble defined in (4.2) gives rise to a direct statistical physics interpretation: it can be viewed as a measure μ^{(N)} on N point particles on the real line. Inspired by the Gibbs formalism, we may write it as

\[
\mu^{(N)}(d\lambda) = p_N(\lambda)\, d\lambda = \mathrm{const}\cdot e^{-\beta N H(\lambda)}\, d\lambda, \qquad \lambda = (\lambda_1, \ldots, \lambda_N), \tag{18.7}
\]

where

\[
H(\lambda) := \sum_{j=1}^{N} \frac{1}{2} V(\lambda_j) - \frac{1}{N} \sum_{1 \le i < j \le N} \log(\lambda_j - \lambda_i).
\]

Clearly, H can be interpreted as the classical Hamiltonian function of a log-gas, i.e., a system of N interacting particles subject to a potential V and a two-body logarithmic interaction. While a direct relation to invariant ensembles exists only for the specific values β = 1, 2, 4, log-gases may be studied for any β > 0, i.e., at any inverse temperature.

In the case of invariant ensembles it is well known that, for V satisfying certain mild conditions, the sequence of one-point correlation functions, or densities, associated with (18.7) has a limit as N → ∞, and the limiting equilibrium density ϱ_V(s) can be obtained as the unique minimizer of the functional

\[
I(\nu) = \int_{\mathbb{R}} V(t)\nu(t)\, dt - \int_{\mathbb{R}} \int_{\mathbb{R}} \log|t-s|\, \nu(s)\nu(t)\, dt\, ds.
\]

This is the analogue of the Wigner semicircle law for log-gases.

We assume that ϱ = ϱ_V is supported on a single compact interval [A, B] and that ϱ ∈ C^2(A, B). Moreover, we assume that V is regular in the sense that ϱ is strictly positive on (A, B) and vanishes as a square root at the endpoints, i.e.,

\[
\varrho(t) = s_A \sqrt{t-A}\, \bigl(1 + O(t-A)\bigr), \qquad t \to A^+, \tag{18.8}
\]

for some constant s_A > 0, and a similar condition holds at the upper edge. It is known that these conditions are satisfied if, for example, V is strictly convex. In this case ϱ_V satisfies the equation

\[
\frac{1}{2} V'(t) = \int_{\mathbb{R}} \frac{\varrho_V(s)\, ds}{t-s} \tag{18.9}
\]

for any t ∈ (A, B). For the Gaussian case, V(x) = x^2/2, the equilibrium density is given by the semicircle law, ϱ_V = ϱ_sc, see (3.1).
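For the Gaussian case just mentioned, the identification ϱ_V = ϱ_sc can be observed numerically by sampling a GUE-type matrix, which realizes the β = 2 log-gas with V(x) = x²/2, and comparing the empirical eigenvalue density with the semicircle. A minimal sketch (matrix size and binning are ad hoc choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 2000

# GUE-type matrix: its eigenvalues form the beta = 2 log-gas with V(x) = x^2/2
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
H = (A + A.conj().T) / (2 * np.sqrt(N))
eigs = np.linalg.eigvalsh(H)

# empirical density on coarse bins vs. the semicircle rho_sc(x) = sqrt(4 - x^2)/(2 pi)
bins = np.linspace(-2, 2, 21)
hist, _ = np.histogram(eigs, bins=bins, density=True)
mids = (bins[:-1] + bins[1:]) / 2
rho_sc = np.sqrt(4 - mids**2) / (2 * np.pi)
err = np.max(np.abs(hist - rho_sc))
print(err)  # maximal bin-wise deviation, small already at N = 2000
```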

Under these conditions, the density of μ^{(N)} converges to ϱ_V even locally down to the optimal scale, and rigidity analogous to Theorem 11.5 holds:

\[
\mathbb{P}^{\mu^{(N)}}\Bigl( |\lambda_j - \gamma_{j,V}| \ge N^{-\frac{2}{3}+\xi} \bigl(\min\{j, N-j+1\}\bigr)^{-\frac{1}{3}} \Bigr) \le e^{-N^{c}},
\]

where the quantiles γ_{j,V} of the density ϱ_V are defined by

\[
\frac{j}{N} = \int_{A}^{\gamma_{j,V}} \varrho_V(x)\, dx, \tag{18.10}
\]

similarly to (11.31); see Theorem 2.4 of [24].

The higher order correlation functions, given in Definition 4.2 and rescaled by the local density ϱ_{β,V}(E)

around an energy E in the bulk, exhibit a universal behavior in the sense that they depend only on β and are independent of the potential V. A similar result holds at the edges of the support of the density ϱ_{β,V}. This can be viewed as a special realization of the universality hypothesis for invariant matrix ensembles. Historically, the universality for the special classical cases β = 1, 2, 4 has been extensively studied via orthogonal polynomial methods. These methods rely on explicit expressions of the correlation functions in terms of Fredholm determinants involving orthogonal polynomials. The key advantage of this method is that it yields explicit formulas for the limiting gap distributions or correlation functions. However, these explicit expressions are available only for the specific values β = 1, 2, 4. Within the framework of this approach, the β = 1, 2, 4 cases of Theorem 18.5 below were proved for very general potentials for β = 2 and for analytic potentials with some additional conditions for β = 1, 4. This is a whole subject by itself and we can only refer the reader to some reviews or books [39, 90, 115].

The first general result for nonclassical β values is the following theorem, proved in [23-25]. In these works, a new approach to the universality of β-ensembles was initiated. It departed from the previously mentioned traditional approach using explicit expressions for the correlation functions. Instead, it relies on comparing the probability measures of the β-ensembles to those of the Gaussian ones. This basic idea of comparing two probability measures directly was also used in the later works [118] and [18], albeit in a different way, where Theorem 18.5 was reproved under different sets of conditions on the potential V.

Theorem 18.5 (Universality with averaged energy). Assume V ∈ C^4(ℝ) is regular and let β > 0. Consider the β-ensemble μ^{(N)} = μ^{(N)}_{β,V} given in (18.7) with correlation functions p^{(n)}_{V,N} defined analogously to (4.20). For the Gaussian case, V(x) = x^2/2, the correlation functions are denoted by p^{(n)}_{G,N}. Let E_0 ∈ (A, B) lie in the interior of the support of ϱ and similarly let E'_0 ∈ (−2, 2) be inside the support of ϱ_sc. Then for b_N = N^{-1+ξ} with some ξ > 0 we have

\[
\begin{aligned}
\lim_{N\to\infty} \int_{\mathbb{R}^n} d\alpha\, O(\alpha) \Biggl[ &\int_{E_0 - b_N}^{E_0 + b_N} \frac{dE}{2 b_N} \frac{1}{\varrho(E)^n}\, p^{(n)}_{V,N}\Bigl( E + \frac{\alpha}{N \varrho(E)} \Bigr) \\
- &\int_{E'_0 - b_N}^{E'_0 + b_N} \frac{dE'}{2 b_N} \frac{1}{\varrho_{sc}(E')^n}\, p^{(n)}_{G,N}\Bigl( E' + \frac{\alpha}{N \varrho_{sc}(E')} \Bigr) \Biggr] = 0,
\end{aligned}
\tag{18.11}
\]

i.e., the correlation functions of μ^{(N)}_{β,V} averaged around E_0 asymptotically coincide with those of the Gaussian case. In particular, they are independent of E_0.

Universality at fixed energy, i.e., (18.11) without averaging in E, was proven by M. Shcherbina in [118] for any β > 0 under certain regularity assumptions on the potential. We remark that the gap distribution in the Gaussian case for any β > 0 can be described by an interesting stochastic process [135].

Theorem 18.5 immediately implies gap universality with averaged labels, exactly in the same way as Corollary 18.2 was deduced from Theorem 18.1. To formulate the gap universality with a fixed label, we recall the quantiles γ_{j,V} from (18.10) and we set

\[
\varrho^V_j := \varrho_V(\gamma_{j,V}), \qquad \varrho_j := \varrho_{sc}(\gamma_j) \tag{18.12}
\]

to be the limiting densities at the j-th quantiles. Let E^{μ_V} and E^{G} denote the expectations w.r.t. the measure μ_V and its Gaussian counterpart with V(λ) = λ^2/2.

Theorem 18.6 (Gap universality with fixed label). [67, Theorem 2.3] Consider the setup of Theorem 18.5 and assume in addition that β ≥ 1. Fix some α > 0; then

\[
\begin{aligned}
\lim_{N\to\infty} \Bigl| \mathbb{E}^{\mu_V} O\bigl( &(N\varrho^V_k)(\lambda_k - \lambda_{k+1}), (N\varrho^V_k)(\lambda_k - \lambda_{k+2}), \ldots, (N\varrho^V_k)(\lambda_k - \lambda_{k+n}) \bigr) \\
- \mathbb{E}^{\mu_G} O\bigl( &(N\varrho_m)(\lambda_m - \lambda_{m+1}), \ldots, (N\varrho_m)(\lambda_m - \lambda_{m+n}) \bigr) \Bigr| = 0
\end{aligned}
\tag{18.13}
\]

for any k, m ∈ ⟦αN, (1−α)N⟧. In particular, the distribution of the rescaled gaps w.r.t. μ_V does not depend on the index k in the bulk.

We point out that Theorem 18.5 holds for any β > 0, but Theorem 18.6 requires β ≥ 1. This is partly because the De Giorgi-Nash-Moser regularity theory was used in [67]. This restriction was later removed by Bekerman-Figalli-Guionnet [18] under a higher regularity assumption on V and some additional hypotheses.
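The rescaling (Nϱ_k)(λ_{k+1} − λ_k) appearing in these gap statements can be illustrated on the Gaussian side: after multiplying bulk gaps by N times the local semicircle density, the mean spacing is asymptotically one. A hedged sketch (GUE-type sampling; the size and the bulk window are ad hoc):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1000

# GUE-type matrix, spectrum ~ [-2, 2]
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
H = (A + A.conj().T) / (2 * np.sqrt(N))
lam = np.sort(np.linalg.eigvalsh(H))

k = np.arange(int(0.2 * N), int(0.8 * N))        # bulk indices
rho_k = np.sqrt(4 - lam[k]**2) / (2 * np.pi)     # local semicircle density near lambda_k
gaps = N * rho_k * (lam[k + 1] - lam[k])         # (N rho_k)(lambda_{k+1} - lambda_k)

print(gaps.mean())  # ~ 1: unit mean spacing after rescaling by the local density
```

The fluctuation of an individual rescaled gap around this unit mean is what the universal gap distributions describe.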


18.4 Universality at the spectral edge

So far we have stated results for the bulk of the spectrum. Similar results hold at the edge; in this case the "averaged" results are meaningless. In Theorem 17.1 we already presented a universality result for the largest eigenvalue of the Wigner ensemble. That proof easily extends to the joint distribution of finitely many extreme eigenvalues. The following theorem gives a bit more: it controls finite dimensional distributions of the first N^{1/4} eigenvalues and also identifies the limit distribution. We then give the analogous result for log-gases.

Theorem 18.7 (Universality at the edge for Wigner matrices). [24] Let H be a generalized Wigner ensemble with subexponentially decaying matrix elements. Fix n ∈ ℕ, κ < 1/4 and a test function O of n variables. Then for any Λ ⊂ ⟦1, N^κ⟧ with |Λ| = n, we have

\[
\Bigl| \bigl[ \mathbb{E} - \mathbb{E}^{G} \bigr] O\Bigl( \bigl( N^{2/3} j^{1/3} (\lambda_j - \gamma_j) \bigr)_{j \in \Lambda} \Bigr) \Bigr| \le N^{-\chi},
\]

with some χ > 0, where E^G is the expectation w.r.t. the standard GOE or GUE ensemble, depending on the symmetry class of H, and the γ_j are the semicircle quantiles (11.31).

The edge universality for Wigner matrices was first proved by Soshnikov [124] under the assumption that the distribution of the matrix elements is symmetric and has finite moments of all orders. Soshnikov used an elaborate moment matching method and the conditions on the moments are not easy to improve. A completely different method based on the Green function comparison theorem was initiated in [70]. This method is more flexible in many ways and it removes essentially all restrictions of the previous works. In particular, the optimal moment condition on the distribution of the matrix elements was obtained by Lee and Yin [97]. All these works assume that the variances of the matrix elements are identical. The main point of Theorem 18.7 is to consider generalized Wigner matrices, i.e., matrices with non-constant variances. It was proved in [70, Theorem 17.1] that the edge statistics of any generalized Wigner matrix coincide with those of a generalized Gaussian Wigner matrix with the same variances, but it was not shown that the statistics are independent of the variances themselves. Theorem 18.7 provides this missing step and thus it proves the edge universality for generalized Wigner matrices.

Theorem 18.8 (Universality at the edge for log-gases). [24] Let β ≥ 1 and let V (resp. \widetilde V) be in C^4(ℝ) and regular, such that the equilibrium density ϱ_V (resp. ϱ_{\widetilde V}) is supported on a single interval [A, B] (resp. [\widetilde A, \widetilde B]). Without loss of generality we assume that (18.8) holds for both densities with A = \widetilde A = 0 and with the same constant s_A. Fix n ∈ ℕ, κ < 2/5. Then for any Λ ⊂ ⟦1, N^κ⟧ with |Λ| = n, we have

\[
\Bigl| \bigl( \mathbb{E}^{\mu_V} - \mathbb{E}^{\mu_{\widetilde V}} \bigr) O\Bigl( \bigl( N^{2/3} j^{1/3} (\lambda_j - \gamma_j) \bigr)_{j \in \Lambda} \Bigr) \Bigr| \le N^{-\chi} \tag{18.14}
\]

with some χ > 0. Here the γ_j are the quantiles w.r.t. the density ϱ_V as in (18.10).

The history of edge universality runs parallel to the bulk one in the sense that most bulk methods were extended to the edge case. As in the bulk case, the edge universality was first proved via orthogonal polynomial methods for the classical values β = 1, 2, 4 in [38, 42, 110, 116]. The first general edge universality results were given independently in [24] (valid for general potentials and β ≥ 1) and, with a completely different method, in [91] (valid for strictly convex potentials and any β > 0). The later work [18] also covered the edge case and proved universality for any β > 0 under some higher regularity assumption on V and an additional condition that can be checked for convex V. Some open questions concern the universal behavior at non-regular edges, where even the scaling may change.

18.5 Eigenvectors

Universality questions have traditionally been formulated for eigenvalues, but they naturally extend to eigenvectors as well. They are closely related to two different important physical phenomena: delocalization and quantum ergodicity.


The concept of delocalization stems from the basic theory of random Schrödinger operators describing a single quantum particle subject to a random potential V in ℝ^d. If the potential decays at infinity, then the operator −Δ + V acting on L²(ℝ^d) has a discrete pure point spectrum with L²-eigenfunctions, localized or bound states, in the low energy regime. In the high energy regime it has absolutely continuous spectrum with bounded but not L²-normalizable generalized eigenfunctions. The latter are also called delocalized or extended states and they correspond to scattering and conductance. If the potential V is a stationary, ergodic random field, then the celebrated Anderson localization [9] occurs: at high disorder a dense pure point spectrum with exponentially decaying eigenfunctions appears even in the energy regime that would classically correspond to scattering states. The localized regime has been mathematically well understood, both for high disorder and for a low density of states (near the spectral edges), since the fundamental works of Fröhlich and Spencer [73] and later of Aizenman and Molchanov [1]. The local spectral statistics are Poisson [107], reflecting the intuition that exponentially decaying eigenfunctions typically have no overlap, so their energies are independent.

The low disorder regime, however, is mathematically wide open. In d = 3 or higher dimensions it is conjectured that −Δ + V has absolutely continuous spectrum (away from the spectral edges), the corresponding eigenfunctions are delocalized, and the local eigenvalue statistics of the restriction of −Δ + V to a large finite box follow the GOE local statistics. In other words, a phase transition is believed to occur as the strength of the disorder varies; this is called the Anderson metal-insulator transition. Currently even the existence of this extended states regime is not known; this is considered one of the most important open questions in mathematical physics. Only the Bethe lattice (regular tree) is understood [2, 86], but this model exhibits Poisson local statistics [3] due to its exponentially growing boundary.

The strong link between local eigenvalue statistics and delocalization for random Schrödinger operators naturally raises the question about the structure of the eigenvectors of an N × N Wigner matrix H. Complete delocalization in this context would mean that any ℓ²-normalized eigenvector u = (u_1, ..., u_N) of H is substantially supported on each coordinate; ideally |u_i|² ≈ N^{-1} for each i. Due to fluctuations, this can hold only in a somewhat weaker sense.

The first type of result provides an upper bound on the ℓ^∞-norm with very high probability:

Theorem 18.9. Let u_α, α = 1, 2, ..., N, be the ℓ²-normalized eigenvectors of a generalized Wigner matrix H satisfying the moment condition (5.6). Then for any (small) ε > 0 and (large) D > 0 we have

\[
\mathbb{P}\Bigl( \exists \alpha : \| u_\alpha \|_\infty \ge \frac{N^{\varepsilon}}{\sqrt{N}} \Bigr) \le N^{-D}. \tag{18.15}
\]

Under stronger decay conditions on H, the N^ε tolerance factor may be changed to a log-power and the probability estimate improved to subexponential. This theorem directly follows from the local semicircle law, Theorem 6.7. Indeed, by spectral decomposition,

\[
\operatorname{Im} G_{ii}(z) = \eta \sum_{\alpha} \frac{|u_\alpha(i)|^2}{(E - \lambda_\alpha)^2 + \eta^2} \ge \frac{|u_\alpha(i)|^2}{2\eta}, \qquad z = E + i\eta,
\]

where we chose E in the η-vicinity of λ_α. Since by (6.31) the left hand side is bounded by N^ε(Nη)^{-1} with very high probability, we obtain (18.15).
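The ℓ^∞-bound (18.15) is easy to observe numerically: for a sampled GUE-type matrix, √N max_α ‖u_α‖_∞ stays of order √(log N), far below the trivial bound √N. A quick sketch (the size is an ad hoc choice):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 500

# GUE-type matrix; columns of U are its l2-normalized eigenvectors
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
H = (A + A.conj().T) / (2 * np.sqrt(N))
_, U = np.linalg.eigh(H)

sup_norms = np.max(np.abs(U), axis=0)   # ||u_alpha||_inf for each eigenvector
print(np.sqrt(N) * sup_norms.max())     # O(sqrt(log N)), i.e. a modest constant here
```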

Although this bound prevents concentration of eigenvectors onto a set of size less than N^{1−ε}, it does not imply the "complete flatness" of eigenvectors in the sense that |u_α(i)| ≈ N^{-1/2}. Note that by the complete U(N) or O(N) symmetry, the eigenvectors of a GUE or GOE matrix are distributed according to the Haar measure on the N-dimensional unit sphere; moreover, the distribution of a particular coordinate u_α(i) of a fixed eigenvector u_α is asymptotically Gaussian. The same result holds for generalized Wigner matrices by Theorem 1.2 of [28] (a similar result was also obtained in [87, 132] under the condition that the first four moments match those of the Gaussian):

Theorem 18.10 (Asymptotic normality of eigenvectors). Let u_α be the eigenvectors of a generalized Wigner matrix with moment condition (5.6). Then there is a δ > 0 such that for any m ∈ ℕ and I ⊂ T_N := ⟦1, N^{1/4}⟧ ∪ ⟦N^{1−δ}, N − N^{1−δ}⟧ ∪ ⟦N − N^{1/4}, N⟧ with |I| = m, and for any unit vector q ∈ ℝ^N, the vector (√N |⟨q, u_α⟩|)_{α ∈ I} ∈ ℝ^m is asymptotically normal in the sense of finite moments.

Moreover, the study of eigenvectors leads to another fundamental problem of mathematical physics: quantum ergodicity. Recall that the quantum ergodicity theorem (Shnirel'man [122], Colin de Verdière [35] and Zelditch [140]) asserts that "most" eigenfunctions of the Laplacian on a compact Riemannian manifold with ergodic geodesic flow are completely flat. For d-regular graphs, under certain assumptions on the injectivity radius and the spectral gap of the adjacency matrices, similar results were proved for eigenvectors of the adjacency matrices [7]. A stronger notion of quantum ergodicity, the quantum unique ergodicity (QUE) proposed by Rudnick-Sarnak [114], demands that all high energy eigenfunctions become completely flat; it supposedly holds for negatively curved compact Riemannian manifolds. One case for which QUE was rigorously proved concerns arithmetic surfaces, thanks to tools from number theory and ergodic theory on homogeneous spaces [81, 82, 100]. For Wigner matrices, a probabilistic version of QUE was settled in Corollary 1.4 of [28]:

Theorem 18.11 (Quantum unique ergodicity for Wigner matrices). Let u_α be the eigenvectors of a generalized Wigner matrix with moment condition (5.6). Then there exists ε > 0 such that for any deterministic 1 ≤ α ≤ N and I ⊂ ⟦1, N⟧, and for any δ > 0, we have

\[
\mathbb{P}\Bigl( \Bigl| \sum_{i \in I} |u_\alpha(i)|^2 - \frac{|I|}{N} \Bigr| \ge \delta \Bigr) \le \frac{N^{-\varepsilon}}{\delta^2}. \tag{18.16}
\]
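The probabilistic QUE statement (18.16) says that the ℓ²-mass of an eigenvector on a deterministic index set I concentrates at |I|/N; for a sampled GUE-type matrix this is visible already at moderate N. A minimal sketch (the choices of α and I are ad hoc):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 1000

# GUE-type matrix; columns of U are its eigenvectors
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
H = (A + A.conj().T) / (2 * np.sqrt(N))
_, U = np.linalg.eigh(H)

alpha = N // 2                      # a fixed bulk eigenvector
I = np.arange(N // 2)               # deterministic index set with |I| = N/2
mass = np.sum(np.abs(U[I, alpha])**2)
print(abs(mass - len(I) / N))       # ~ N^{-1/2} fluctuation around |I|/N = 1/2
```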

18.6 General mean field models

Wigner matrices and their generalizations in Definition 2.1 are the most prominent mean field random matrix ensembles, but there are several other models of interest. Here we give a partial list. A more detailed account of the results and further citations can be found in the few representative references given below.

The three step strategy opens up a path to investigate local spectral statistics of a large class of matrices. In many examples the limiting density of the eigenvalues is no longer the semicircle, so the Dyson Brownian motion is not in global equilibrium and Step 2 is more complicated. In the spirit of Dyson's conjecture, however, gap universality already follows from local equilibration of the DBM. Indeed, a local version of the DBM was used as a basic tool in [25]. A recent development to compare local and global dynamics of the DBM was implemented by a coupling and homogenization argument in [26]. It turns out that as long as the initial condition of the DBM is regular on a certain scale, the local gap statistics reach their equilibrium, i.e., the Wigner-Dyson-Mehta distribution, on the corresponding time scale [92] (see also [65] for a similar result). This concept reduces the proof of universality to a proof of the corresponding local law.

One natural class of mean field ensembles are the deformed Wigner matrices; these are of the form H = V + W, where V is a diagonal matrix and W is a standard Wigner matrix. In fact, after a trivial rescaling, these matrices may be viewed as instances of the DBM with initial condition H_0 = V, recalling that the law of the DBM at time t is given by H_t = e^{-t/2} V + (1 − e^{-t})^{1/2} W. Assuming that the diagonal elements of V have a limiting density ϱ_V, the limiting density of H_t is given by the free convolution of ϱ_V with the semicircle density. The corresponding local law was proved in [93], followed by the proof of bulk universality in [96] and of edge universality in [95, 96].
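The free convolution with the semicircle just described can be computed from the self-consistent equation m(z) = ∫ dϱ_V(v)/(v − z − c² m(z)), where c² is the variance of the Gaussian component. The following sketch (a two-point ϱ_V, ad hoc t and z, and a damped fixed-point iteration as one standard way to solve such equations) compares the solution with a sampled deformed matrix:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 1000
t = 0.5

# H_t = e^{-t/2} V + (1 - e^{-t})^{1/2} W with V = diag(+-1) and W a GOE-type matrix
v = np.repeat([-1.0, 1.0], N // 2)
A = rng.standard_normal((N, N))
W = (A + A.T) / np.sqrt(2 * N)
Ht = np.exp(-t / 2) * np.diag(v) + np.sqrt(1 - np.exp(-t)) * W
eigs = np.linalg.eigvalsh(Ht)

# free convolution with the semicircle: solve m = mean(1 / (e^{-t/2} v - z - c^2 m)),
# c^2 = 1 - e^{-t} being the variance of the Gaussian component
z = 0.3 + 0.1j
c2 = 1 - np.exp(-t)
m = 1j
for _ in range(2000):                    # damped fixed-point iteration
    m = 0.5 * m + 0.5 * np.mean(1.0 / (np.exp(-t / 2) * v - z - c2 * m))

m_emp = np.mean(1.0 / (eigs - z))        # empirical Stieltjes transform of H_t
print(abs(m - m_emp))                    # small: sampled density matches the free convolution
```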

The free convolution of two arbitrary densities, ϱ_α ⊞ ϱ_β, naturally appears in random matrix theory as the limiting density of the sum of two asymptotically free matrices A and B whose limiting densities are given by ϱ_α and ϱ_β. Moreover, for any deterministic A and B, the matrices A and UBU* are asymptotically free [136], where U is a Haar distributed unitary matrix. Thus the eigenvalue density of A + UBU* is given by the free convolution. The corresponding local law on the optimal scale in the bulk was given in [15].

Another way to generate a matrix ensemble with a limiting density different from the semicircle law is to consider Wigner-type matrices, which are further generalizations of the universal Wigner matrices from Definition 2.1. These are complex Hermitian or real symmetric matrices with centered, independent (up to symmetry) matrix elements, but without the condition (2.5) that the sum of the variances in each row is constant. The limiting density is determined by the matrix of variances S by solving the corresponding Dyson-Schwinger equation. Depending on S, the density may exhibit a square root singularity similar to the edge of the semicircle law, or a cubic root singularity [5]. The corresponding optimal local law and bulk universality were proved in [4].
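The Dyson-Schwinger equation mentioned above is the quadratic vector equation m_i(z) = (−z − ∑_j S_ij m_j(z))^{-1}. As a sanity check (with a hypothetical variance profile, not one from the text), when the rows of S sum to one the vector solution collapses to the scalar semicircle transform, which has the closed form m_sc(i) = i(√5 − 1)/2 at z = i:

```python
import numpy as np

N = 200
# hypothetical variance profile; rows normalized to sum to 1 (a generalized Wigner profile)
i = np.arange(N)
S = 1.0 + 0.5 * np.cos(np.pi * (i[:, None] + i[None, :]) / N)
S = S / S.sum(axis=1, keepdims=True)

# damped iteration of the quadratic vector equation m_i = 1 / (-z - (S m)_i)
z = 1j
m = np.full(N, 1j)
for _ in range(500):
    m = 0.5 * m + 0.5 / (-z - S @ m)

# with unit row sums every component equals m_sc(i), the root of m^2 + z m + 1 = 0 with Im m > 0
m_sc = 1j * (np.sqrt(5) - 1) / 2
print(np.max(np.abs(m - m_sc)))  # essentially zero
```

For a genuinely non-constant row-sum profile the same iteration produces a non-trivial vector m, and the density is recovered from Im m_i at the real axis.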

Different complications arise for adjacency matrices of sparse graphs. Each edge of the Erdős-Rényi graph is chosen independently with probability p, so it is a mean field model with a semicircle density, but a typical realization of the matrix contains many zero elements if p ≪ 1. The optimal local law was proved in [56], bulk universality for p ≫ N^{-1/3} in [53] and for p ≫ N^{-1} in [83]. Another prominent model of random graphs is the uniform measure on d-regular graphs. The elements of the adjacency matrix are weakly dependent since their sum in every row is d. For d ≫ 1 the limiting density is the semicircle law, and the optimal local law was obtained in [17]. Bulk universality was shown in the regime 1 ≪ d ≪ N^{2/3} in [16].

Sample covariance matrices (2.9), and especially their deformations by finite rank deterministic matrices, play an important role in statistics. The first application of the three step strategy to prove bulk universality was presented in [64]. The edge universality was achieved in [89, 94]. For applications in principal component analysis, the main focus is on the extreme eigenvalues and the outliers, whose detailed analysis was given in [22]. Here we have listed references for papers using methods closely related to this book; there are many works in this subject which we are unable to include here.

18.7 Beyond mean field models: band matrices

Wigner predicted that universality should hold for any large quantum system, described by a Hamiltonian H, of sufficient complexity. After discretization of the underlying physical state space, the Hamilton operator is given by a matrix H whose matrix elements H_xy describe the quantum transition rate from state x to state y. Generalized Wigner matrices correspond to a mean-field situation where a transition from any state x to any y is possible. In contrast, random Schrödinger operators of the form −Δ + V defined on ℤ^d allow transitions only to neighboring lattice sites with a random on-site interaction, so they are prototypes of disordered quantum systems with a non-trivial spatial structure.

As explained in Section 18.5, from the point of view of the Anderson phase transition, generalized Wigner matrices are always in the delocalized regime. Random Schrödinger operators in d = 1 dimension, or more general random matrices H representing one dimensional Hamiltonians with short range random hoppings, are in the localized regime. It is therefore natural to vary the range of interaction to detect the phase transition.

One popular model interpolating between Wigner matrices and random Schrödinger operators is the random band matrix, see Example 6.1. In this model the physical state space, which labels the matrix elements, is equipped with a distance. Band matrices are characterized by the property that H_ij becomes negligible if dist(i, j) exceeds a certain parameter W, called the band width. A fundamental conjecture [75] states that the local spectral statistics of a band matrix H are governed by random matrix statistics for large W and by Poisson statistics for small W. The transition is conjectured to be sharp [75, 125] for band matrices in one spatial dimension around the critical value W = √N. In other words, for W ≫ √N we expect the universality results of [26, 57, 63, 67] to hold; furthermore, the eigenvectors of H are expected to be completely delocalized in this range. For W ≪ √N, one expects that the eigenvectors are exponentially localized. This is the analogue of the celebrated Anderson metal-insulator transition for random band matrices. The only rigorous work indicating the √N threshold concerns the second mixed moments of the characteristic polynomial for a special class of Gaussian band matrices [119, 121].
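The localization/delocalization contrast behind this conjecture can be seen in a crude numerical experiment comparing inverse participation ratios of bulk eigenvectors for a narrow and a wide band (all parameters ad hoc; this is only a heuristic probe, not a test of the conjectured sharp W ~ √N transition):

```python
import numpy as np

def band_matrix(N, W, rng):
    # symmetric random matrix with H_ij = 0 unless |i - j| <= W (1D band of width W),
    # normalized so each row carries unit total variance
    A = rng.standard_normal((N, N))
    H = (A + A.T) / np.sqrt(2)
    mask = np.abs(np.subtract.outer(np.arange(N), np.arange(N))) <= W
    return H * mask / np.sqrt(2 * W + 1)

def mean_ipr(H):
    # inverse participation ratio sum_i |u(i)|^4 of bulk eigenvectors:
    # ~ 3/N for flat (delocalized) vectors, O(1) for localized ones
    _, U = np.linalg.eigh(H)
    ipr = np.sum(np.abs(U)**4, axis=0)
    n = H.shape[0]
    return ipr[n // 4 : 3 * n // 4].mean()

rng = np.random.default_rng(6)
N = 400
narrow = mean_ipr(band_matrix(N, 1, rng))   # very narrow band: localized regime
wide = mean_ipr(band_matrix(N, N, rng))     # W ~ N: mean-field, delocalized
print(narrow, wide)                         # narrow is an order of magnitude larger
```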

The localization length for band matrices in one spatial dimension was recently investigated in numerous works. For a general distribution of the matrix entries, eigenstates were proved to be localized [115] for W ≪ N^{1/8}, and delocalization of most eigenvectors in a certain averaged sense holds for W ≫ N^{6/7} [50, 51], improved to W ≫ N^{4/5} [54]. The Green's function (H − z)^{-1} was controlled down to the scale Im z ≫ W^{-1} in [69], implying a lower bound of order W for the localization length of all eigenvectors. When the entries are Gaussian with some specific covariance profiles, supersymmetry techniques are applicable to obtain stronger results. This approach was first developed by physicists (see [49] for an overview); the rigorous analysis was initiated by Spencer (see [125] for an overview), with an accurate estimate on the expected density of states on arbitrarily short scales for a three-dimensional band matrix ensemble in [44]. More recent works include universality for W ≥ cN [120], and the control of the Green's function down to the optimal scale Im z ≫ N^{-1}, hence delocalization in a strong sense for all eigenvectors, when W ≫ N^{6/7} [14] with the first four moments matching the Gaussian ones (both results require a block structure and hold in part of the bulk spectrum).

While delocalization and Wigner-Dyson-Mehta spectral statistics are expected to occur simultaneously, there is no rigorous argument directly linking them. The Dyson Brownian motion, the cornerstone of the three-step strategy, proves universality for matrices where each entry has a nontrivial Gaussian component, and the comparison ideas require that the second moments match exactly. Therefore this approach cannot be applied to matrices with many zero entries. However, a combination of the quantum unique ergodicity with a mean field reduction strategy in [27] yields Wigner-Dyson-Mehta bulk universality for a large class of band matrices with general distribution in the large band width regime W > cN. In contrast to the bulk, universality at the spectral edge is much better understood: the extreme eigenvalues follow the Tracy-Widom law for W ≫ N^{5/6}, an essentially optimal condition [123].
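For concreteness, the kind of matrix flow the three-step strategy runs on can be sketched as follows (an illustration of ours, not from the text; goe and ou_flow are our own helper names). The Ornstein-Uhlenbeck flow H_t = e^{−t/2} H_0 + (1 − e^{−t})^{1/2} G, with G an independent GOE matrix, keeps the variance profile fixed while giving every entry a Gaussian component of size of order t; its eigenvalues evolve by Dyson Brownian motion.

```python
import numpy as np

def goe(N, rng):
    """GOE-normalized Wigner matrix; its spectrum concentrates on [-2, 2]."""
    A = rng.standard_normal((N, N))
    return (A + A.T) / np.sqrt(2 * N)

def ou_flow(H0, t, rng):
    """Ornstein-Uhlenbeck matrix flow H_t = e^{-t/2} H_0 + sqrt(1 - e^{-t}) G.
    The variance of each entry is preserved along the flow, and for t > 0
    every entry of H_t carries a nontrivial Gaussian component."""
    G = goe(H0.shape[0], rng)
    return np.exp(-t / 2) * H0 + np.sqrt(1 - np.exp(-t)) * G

rng = np.random.default_rng(0)
N = 300
# start from a non-Gaussian (Bernoulli) Wigner matrix
B = rng.choice([-1.0, 1.0], size=(N, N))
H0 = (B + B.T) / np.sqrt(2 * N)

for t in (0.0, 0.1, 1.0):
    vals = np.linalg.eigvalsh(ou_flow(H0, t, rng))
    print(f"t = {t:.1f}: spectrum in [{vals[0]:+.3f}, {vals[-1]:+.3f}], "
          f"N^-1 Tr H_t^2 = {np.mean(vals**2):.3f}")
```

Because the second moments are preserved exactly, the global spectral data (support near [−2, 2], normalized trace of H_t² near 1) stay fixed along the flow; what the three steps control is the much finer local eigenvalue statistics.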


References

[1] M. Aizenman and S. Molchanov. Localization at large disorder and at extreme energies: an elementary derivation. Comm. Math. Phys., 157(2):245–278, 1993.

[2] M. Aizenman, R. Sims, and S. Warzel. Absolutely continuous spectra of quantum tree graphs with weak disorder. Comm. Math. Phys., 264(2):371–389, 2006.

[3] M. Aizenman and S. Warzel. The canopy graph and level statistics for random operators on trees. Math. Phys. Anal. Geom., 9(4):291–333 (2007), 2006.

[4] O. Ajanki, L. Erdos, and T. Kruger. Quadratic vector equations on complex upper half-plane. arXiv:1506.05095, June 2015.

[5] O. Ajanki, L. Erdos, and T. Kruger. Universality for general Wigner-type matrices. arXiv:1506.05098, June 2015.

[6] G. Akemann, J. Baik, and P. Di Francesco, editors. The Oxford handbook of random matrix theory. Oxford University Press, Oxford, 2011.

[7] N. Anantharaman and E. Le Masson. Quantum ergodicity on large regular graphs. Duke Math. J., 164(4):723–765, 2015.

[8] G. W. Anderson, A. Guionnet, and O. Zeitouni. An introduction to random matrices, volume 118 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 2010.

[9] P. W. Anderson. Absence of diffusion in certain random lattices. Phys. Rev., 109:1492–1505, 1958.

[10] Z. D. Bai. Convergence rate of expected spectral distributions of large random matrices. I. Wigner matrices. Ann. Probab., 21(2):625–648, 1993.

[11] Z. D. Bai, B. Miao, and J. Tsay. Convergence rates of the spectral distributions of large Wigner matrices. Int. Math. J., 1(1):65–90, 2002.

[12] Z. D. Bai and Y. Q. Yin. Limit of the smallest eigenvalue of a large-dimensional sample covariance matrix. Ann. Probab., 21(3):1275–1294, 1993.

[13] D. Bakry and M. Emery. Diffusions hypercontractives. In Seminaire de probabilites, XIX, 1983/84, volume 1123 of Lecture Notes in Math., pages 177–206. Springer, Berlin, 1985.

[14] Z. Bao and L. Erdos. Delocalization for a class of random block band matrices. Probab. Theory Related Fields, pages 1–104, 2016.

[15] Z. Bao, L. Erdos, and K. Schnelli. Local law of addition of random matrices on optimal scale. arXiv:1509.07080, September 2015.

[16] R. Bauerschmidt, J. Huang, A. Knowles, and H.-T. Yau. Bulk eigenvalue statistics for random regular graphs. arXiv:1505.06700, May 2015.

[17] R. Bauerschmidt, A. Knowles, and H.-T. Yau. Local semicircle law for random regular graphs. arXiv:1503.08702, March 2015.

[18] F. Bekerman, A. Figalli, and A. Guionnet. Transport maps for β-matrix models and universality. Comm. Math. Phys., 338(2):589–619, 2015.

[19] G. Ben Arous and S. Peche. Universality of local eigenvalue statistics for some sample covariance matrices. Comm. Pure Appl. Math., 58(10):1316–1357, 2005.


[20] P. Bleher and A. Its. Semiclassical asymptotics of orthogonal polynomials, Riemann-Hilbert problem, and universality in the matrix model. Ann. of Math. (2), 150(1):185–266, 1999.

[21] A. Bloemendal, L. Erdos, A. Knowles, H.-T. Yau, and J. Yin. Isotropic local laws for sample covariance and generalized Wigner matrices. Electron. J. Probab., 19:no. 33, 53, 2014.

[22] A. Bloemendal, A. Knowles, H.-T. Yau, and J. Yin. On the principal components of sample covariance matrices. Probab. Theory Related Fields, 164(1-2):459–552, 2016.

[23] P. Bourgade, L. Erdos, and H.-T. Yau. Bulk universality of general β-ensembles with non-convex potential. J. Math. Phys., 53(9):095221, 19, 2012.

[24] P. Bourgade, L. Erdos, and H.-T. Yau. Edge universality of beta ensembles. Comm. Math. Phys., 332(1):261–353, 2014.

[25] P. Bourgade, L. Erdos, and H.-T. Yau. Universality of general β-ensembles. Duke Math. J., 163(6):1127–1190, 2014.

[26] P. Bourgade, L. Erdos, H.-T. Yau, and J. Yin. Fixed energy universality for generalized Wigner matrices. Comm. Pure Appl. Math., pages 1–67, 2015.

[27] P. Bourgade, L. Erdos, H.-T. Yau, and J. Yin. Universality for a class of random band matrices. arXiv:1602.02312, February 2016.

[28] P. Bourgade and H.-T. Yau. The Eigenvector Moment Flow and local Quantum Unique Ergodicity. arXiv:1312.1301, December 2013.

[29] H. Brascamp and E. H. Lieb. Best constants in Young's inequality, its converse, and its generalization to more than three functions. Adv. Math., 20(2):151–173, 1976.

[30] E. Brezin and S. Hikami. Correlations of nearby levels induced by a random potential. Nuclear Phys. B, 479(3):697–706, 1996.

[31] E. Brezin and S. Hikami. Spectral form factor in a random matrix theory. Phys. Rev. E, 55(4):4067–4083, 1997.

[32] C. Cacciapuoti, A. Maltsev, and B. Schlein. Bounds for the Stieltjes transform and the density of states of Wigner matrices. Probab. Theory Related Fields, 163(1-2):1–59, 2015.

[33] E. A. Carlen and M. Loss. Optimal smoothing and decay estimates for viscously damped conservation laws, with applications to the 2-D Navier-Stokes equation. Duke Math. J., 81(1):135–157 (1996), 1995. A celebration of John F. Nash, Jr.

[34] T. Chen. Localization lengths and Boltzmann limit for the Anderson model at small disorders in dimension 3. J. Stat. Phys., 120(1-2):279–337, 2005.

[35] Y. Colin de Verdiere. Ergodicite et fonctions propres du laplacien. Comm. Math. Phys., 102(3):497–502, 1985.

[36] E. B. Davies. The functional calculus. J. London Math. Soc., 52(1):166–176, 1995.

[37] P. Deift. Orthogonal polynomials and random matrices: a Riemann-Hilbert approach, volume 3 of Courant Lecture Notes in Mathematics. New York University, Courant Institute of Mathematical Sciences, New York; American Mathematical Society, Providence, RI, 1999.

[38] P. Deift and D. Gioev. Universality at the edge of the spectrum for unitary, orthogonal, and symplectic ensembles of random matrices. Comm. Pure Appl. Math., 60(6):867–910, 2007.


[39] P. Deift and D. Gioev. Universality in random matrix theory for orthogonal and symplectic ensembles. Int. Math. Res. Pap., 2007(2):Art. ID rpm004, 116, 2007.

[40] P. Deift and D. Gioev. Random matrix theory: invariant ensembles and universality, volume 18 of Courant Lecture Notes in Mathematics. Courant Institute of Mathematical Sciences, New York; American Mathematical Society, Providence, RI, 2009.

[41] P. Deift, T. Kriecherbauer, K. T.-R. McLaughlin, S. Venakides, and X. Zhou. Strong asymptotics of orthogonal polynomials with respect to exponential weights. Comm. Pure Appl. Math., 52(12):1491–1552, 1999.

[42] P. Deift, T. Kriecherbauer, K. T.-R. McLaughlin, S. Venakides, and X. Zhou. Uniform asymptotics for polynomials orthogonal with respect to varying exponential weights and applications to universality questions in random matrix theory. Comm. Pure Appl. Math., 52(11):1335–1425, 1999.

[43] J.-D. Deuschel and D. W. Stroock. Large deviations, volume 137 of Pure and Applied Mathematics. Academic Press, Inc., Boston, MA, 1989.

[44] M. Disertori, H. Pinson, and T. Spencer. Density of states for random band matrices. Comm. Math. Phys., 232(1):83–124, 2002.

[45] F. J. Dyson. A Brownian-motion model for the eigenvalues of a random matrix. J. Math. Phys., 3:1191–1198, 1962.

[46] F. J. Dyson. Statistical theory of the energy levels of complex systems. I, II, and III. J. Math. Phys., 3:140–156, 157–165, 166–175, 1962.

[47] F. J. Dyson. Correlations between eigenvalues of a random matrix. Comm. Math. Phys., 19:235–250, 1970.

[48] D. E. Edmunds and H. Triebel. Sharp Sobolev embeddings and related Hardy inequalities: the critical case. Math. Nachr., 207:79–92, 1999.

[49] K. Efetov. Supersymmetry in disorder and chaos. Cambridge University Press, Cambridge, 1997.

[50] L. Erdos and A. Knowles. Quantum diffusion and delocalization for band matrices with general distribution. Ann. Henri Poincare, 12(7):1227–1319, 2011.

[51] L. Erdos and A. Knowles. Quantum diffusion and eigenfunction delocalization in a random band matrix model. Comm. Math. Phys., 303(2):509–554, 2011.

[52] L. Erdos, A. Knowles, and H.-T. Yau. Averaging fluctuations in resolvents of random band matrices. Ann. Henri Poincare, 14(8):1837–1926, 2013.

[53] L. Erdos, A. Knowles, H.-T. Yau, and J. Yin. Spectral statistics of Erdos-Renyi graphs II: Eigenvalue spacing and the extreme eigenvalues. Comm. Math. Phys., 314(3):587–640, 2012.

[54] L. Erdos, A. Knowles, H.-T. Yau, and J. Yin. Delocalization and diffusion profile for random band matrices. Comm. Math. Phys., 323(1):367–416, 2013.

[55] L. Erdos, A. Knowles, H.-T. Yau, and J. Yin. The local semicircle law for a general class of random matrices. Electron. J. Probab., 18:no. 59, 58, 2013.

[56] L. Erdos, A. Knowles, H.-T. Yau, and J. Yin. Spectral statistics of Erdos-Renyi graphs I: Local semicircle law. Ann. Probab., 41(3B):2279–2375, 2013.

[57] L. Erdos, S. Peche, J. A. Ramirez, B. Schlein, and H.-T. Yau. Bulk universality for Wigner matrices. Comm. Pure Appl. Math., 63(7):895–925, 2010.


[58] L. Erdos, J. Ramirez, B. Schlein, T. Tao, V. Vu, and H.-T. Yau. Bulk universality for Wigner Hermitian matrices with subexponential decay. Math. Res. Lett., 17(4):667–674, 2010.

[59] L. Erdos, J. A. Ramirez, B. Schlein, and H.-T. Yau. Universality of sine-kernel for Wigner matrices with a small Gaussian perturbation. Electron. J. Probab., 15:no. 18, 526–603, 2010.

[60] L. Erdos, B. Schlein, and H.-T. Yau. Local semicircle law and complete delocalization for Wigner random matrices. Comm. Math. Phys., 287(2):641–655, 2009.

[61] L. Erdos, B. Schlein, and H.-T. Yau. Semicircle law on short scales and delocalization of eigenvectors for Wigner random matrices. Ann. Probab., 37(3):815–852, 2009.

[62] L. Erdos, B. Schlein, and H.-T. Yau. Wegner estimate and level repulsion for Wigner random matrices. Int. Math. Res. Not., 2010(3):436–479, 2010.

[63] L. Erdos, B. Schlein, and H.-T. Yau. Universality of random matrices and local relaxation flow. Invent. Math., 185(1):75–119, 2011.

[64] L. Erdos, B. Schlein, H.-T. Yau, and J. Yin. The local relaxation flow approach to universality of the local statistics for random matrices. Ann. Inst. H. Poincare Probab. Statist., 48(1):1–46, 2012.

[65] L. Erdos and K. Schnelli. Universality for Random Matrix Flows with Time-dependent Density. arXiv:1504.00650, April 2015.

[66] L. Erdos and H.-T. Yau. A comment on the Wigner-Dyson-Mehta bulk universality conjecture for Wigner matrices. Electron. J. Probab., 17:no. 28, 5, 2012.

[67] L. Erdos and H.-T. Yau. Gap universality of generalized Wigner and β-ensembles. J. Eur. Math. Soc. (JEMS), 17(8):1927–2036, 2015.

[68] L. Erdos, H.-T. Yau, and J. Yin. Universality for generalized Wigner matrices with Bernoulli distribution. J. Comb., 2(1):15–81, 2011.

[69] L. Erdos, H.-T. Yau, and J. Yin. Bulk universality for generalized Wigner matrices. Probab. Theory Related Fields, 154(1-2):341–407, 2012.

[70] L. Erdos, H.-T. Yau, and J. Yin. Rigidity of eigenvalues of generalized Wigner matrices. Adv. Math., 229(3):1435–1515, 2012.

[71] A. S. Fokas, A. R. Its, and A. V. Kitaev. The isomonodromy approach to matrix models in 2D quantum gravity. Comm. Math. Phys., 147(2):395–430, 1992.

[72] P. J. Forrester. Log-gases and random matrices, volume 34 of London Mathematical Society Monographs Series. Princeton University Press, Princeton, NJ, 2010.

[73] J. Frohlich and T. Spencer. Absence of diffusion in the Anderson tight binding model for large disorder or low energy. Comm. Math. Phys., 88(2):151–184, 1983.

[74] M. Fukushima, Y. Oshima, and M. Takeda. Dirichlet forms and symmetric Markov processes, volume 19 of de Gruyter Studies in Mathematics. Walter de Gruyter & Co., Berlin, 1994.

[75] Y. V. Fyodorov and A. D. Mirlin. Scaling properties of localization in random band matrices: a σ-model approach. Phys. Rev. Lett., 67(18):2405–2409, 1991.

[76] V. L. Girko. Asymptotics of the distribution of the spectrum of random matrices. Uspekhi Mat. Nauk, 44(4(268)):7–34, 256, 1989.

[77] F. Gotze, A. Naumov, and A. Tikhomirov. Local semicircle law under moment conditions. Part I: The Stieltjes transform. arXiv:1510.07350, October 2015.


[78] L. Gross. Logarithmic Sobolev inequalities. Amer. J. Math., 97(4):1061–1083, 1975.

[79] A. Guionnet and O. Zeitouni. Concentration of the spectral measure for large matrices. Electron. Comm. Probab., 5:119–136 (electronic), 2000.

[80] B. Helffer and J. Sjostrand. On the correlation for Kac-like models in the convex case. J. Stat. Phys., 74(1-2):349–409, 1994.

[81] R. Holowinsky. Sieving for mass equidistribution. Ann. of Math. (2), 172(2):1499–1516, 2010.

[82] R. Holowinsky and K. Soundararajan. Mass equidistribution for Hecke eigenforms. Ann. of Math. (2), 172(2):1517–1528, 2010.

[83] J. Huang, B. Landon, and H.-T. Yau. Bulk universality of sparse random matrices. J. Math. Phys., 56(12):123301, 19, 2015.

[84] C. Itzykson and J. B. Zuber. The planar approximation. II. J. Math. Phys., 21(3):411–421, 1980.

[85] K. Johansson. Universality of the local spacing distribution in certain ensembles of Hermitian Wigner matrices. Comm. Math. Phys., 215(3):683–705, 2001.

[86] A. Klein. Absolutely continuous spectrum in the Anderson model on the Bethe lattice. Math. Res. Lett., 1(4):399–407, 1994.

[87] A. Knowles and J. Yin. Eigenvector distribution of Wigner matrices. Probab. Theory Related Fields, 155(3-4):543–582, 2013.

[88] A. Knowles and J. Yin. The isotropic semicircle law and deformation of Wigner matrices. Comm. Pure Appl. Math., 66(11):1663–1750, 2013.

[89] A. Knowles and J. Yin. Anisotropic local laws for random matrices. arXiv:1410.3516, October 2014.

[90] T. Kriecherbauer and M. Shcherbina. Fluctuations of eigenvalues of matrix models and their applications. arXiv:1003.6121, March 2010.

[91] M. Krishnapur, B. Rider, and B. Virag. Universality of the stochastic Airy operator. Comm. Pure Appl. Math., 69(1):145–199, 2016.

[92] B. Landon and H.-T. Yau. Convergence of local statistics of Dyson Brownian motion. arXiv:1504.03605, April 2015.

[93] J. O. Lee and K. Schnelli. Local deformed semicircle law and complete delocalization for Wigner matrices with random potential. J. Math. Phys., 54(10):103504, 62, 2013.

[94] J. O. Lee and K. Schnelli. Tracy-Widom Distribution for the Largest Eigenvalue of Real Sample Covariance Matrices with General Population. arXiv:1409.4979, September 2014.

[95] J. O. Lee and K. Schnelli. Edge universality for deformed Wigner matrices. Rev. Math. Phys., 27(8):1550018, 94, 2015.

[96] J. O. Lee, K. Schnelli, B. Stetler, and H.-T. Yau. Bulk universality for deformed Wigner matrices. Ann. Probab., 44(3):2349–2425, 2016.

[97] J. O. Lee and J. Yin. A necessary and sufficient condition for edge universality of Wigner matrices. Duke Math. J., 163(1):117–173, 2014.

[98] E. Levin and D. S. Lubinsky. Universality limits in the bulk for varying measures. Adv. Math., 219(3):743–779, 2008.


[99] E. H. Lieb and M. Loss. Analysis, volume 14 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, second edition, 2001.

[100] E. Lindenstrauss. Invariant measures and arithmetic quantum unique ergodicity. Ann. of Math. (2), 163(1):165–219, 2006.

[101] D. S. Lubinsky. A new approach to universality limits involving orthogonal polynomials. Ann. of Math. (2), 170(2):915–939, 2009.

[102] A. Lytova and L. Pastur. Central limit theorem for linear eigenvalue statistics of random matrices with independent entries. Ann. Probab., 37(5):1778–1840, 2009.

[103] V. A. Marcenko and L. A. Pastur. Distribution of eigenvalues in certain sets of random matrices. Mat. Sb. (N.S.), 72 (114):507–536, 1967.

[104] M. L. Mehta. A note on correlations between eigenvalues of a random matrix. Comm. Math. Phys., 20:245–250, 1971.

[105] M. L. Mehta. Random matrices. Academic Press, Inc., Boston, MA, second edition, 1991.

[106] M. L. Mehta and M. Gaudin. On the density of eigenvalues of a random matrix. Nuclear Phys. B, 18:420–427, 1960.

[107] N. Minami. Local fluctuation of the spectrum of a multidimensional Anderson tight binding model. Comm. Math. Phys., 177(3):709–725, 1996.

[108] A. Naddaf and T. Spencer. On homogenization and scaling limit of some gradient perturbations of a massless free field. Comm. Math. Phys., 183(1):55–84, 1997.

[109] L. Pastur. Spectra of random selfadjoint operators. Uspekhi Mat. Nauk, 28(1(169)):3–64, 1973.

[110] L. Pastur and M. Shcherbina. On the edge universality of the local eigenvalue statistics of matrix models. Mat. Fiz. Anal. Geom., 10(3):335–365, 2003.

[111] L. Pastur and M. Shcherbina. Bulk universality and related properties of Hermitian matrix models. J. Stat. Phys., 130(2):205–250, 2008.

[112] L. Pastur and M. Shcherbina. Eigenvalue distribution of large random matrices, volume 171 of Mathematical Surveys and Monographs. American Mathematical Society, Providence, RI, 2011.

[113] N. S. Pillai and J. Yin. Universality of covariance matrices. Ann. Appl. Probab., 24(3):935–1001, 2014.

[114] Z. Rudnick and P. Sarnak. The behaviour of eigenstates of arithmetic hyperbolic manifolds. Comm. Math. Phys., 161(1):195–213, 1994.

[115] J. Schenker. Eigenvector localization for random band matrices with power law band width. Comm. Math. Phys., 290(3):1065–1097, 2009.

[116] M. Shcherbina. Edge universality for orthogonal ensembles of random matrices. J. Stat. Phys., 136(1):35–50, 2009.

[117] M. Shcherbina. Central limit theorem for linear eigenvalue statistics of the Wigner and sample covariance random matrices. Zh. Mat. Fiz. Anal. Geom., 7(2):176–192, 2011.

[118] M. Shcherbina. Change of variables as a method to study general β-models: bulk universality. J. Math. Phys., 55(4):043504, 23, 2014.

[119] T. Shcherbina. On the second mixed moment of the characteristic polynomials of 1D band matrices. Comm. Math. Phys., 328(1):45–82, 2014.


[120] T. Shcherbina. Universality of the local regime for the block band matrices with a finite number of blocks. J. Stat. Phys., 155(3):466–499, 2014.

[121] T. Shcherbina. Universality of the second mixed moment of the characteristic polynomials of the 1D band matrices: real symmetric case. J. Math. Phys., 56(6):063303, 23, 2015.

[122] A. I. Snirel'man. Ergodic properties of eigenfunctions. Uspekhi Mat. Nauk, 29(6(180)):181–182, 1974.

[123] S. Sodin. The spectral edge of some random band matrices. Ann. of Math. (2), 172(3):2223–2251, 2010.

[124] A. Soshnikov. Universality at the edge of the spectrum in Wigner random matrices. Comm. Math. Phys., 207(3):697–733, 1999.

[125] T. Spencer. Random banded and sparse matrices. In The Oxford handbook of random matrix theory, pages 471–488. Oxford Univ. Press, Oxford, 2011.

[126] D. W. Stroock. Probability theory, an analytic view. Cambridge University Press, Cambridge, 1993.

[127] T. Tao. Topics in random matrix theory, volume 132 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2012.

[128] T. Tao. The asymptotic distribution of a single eigenvalue gap of a Wigner matrix. Probab. Theory Related Fields, 157(1-2):81–106, 2013.

[129] T. Tao and V. Vu. Random matrices: localization of the eigenvalues and the necessity of four moments. Acta Math. Vietnam., 36(2):431–449, 2011.

[130] T. Tao and V. Vu. Random matrices: universality of local eigenvalue statistics. Acta Math., 206(1):127–204, 2011.

[131] T. Tao and V. Vu. The Wigner-Dyson-Mehta bulk universality conjecture for Wigner matrices. Electron. J. Probab., 16:no. 77, 2104–2121, 2011.

[132] T. Tao and V. Vu. Random matrices: universal properties of eigenvectors. Random Matrices Theory Appl., 1(1):1150001, 27, 2012.

[133] C. A. Tracy and H. Widom. Level-spacing distributions and the Airy kernel. Comm. Math. Phys., 159(1):151–174, 1994.

[134] C. A. Tracy and H. Widom. On orthogonal and symplectic matrix ensembles. Comm. Math. Phys., 177(3):727–754, 1996.

[135] B. Valko and B. Virag. Continuum limits of random matrices and the Brownian carousel. Invent. Math., 177(3):463–508, 2009.

[136] D. Voiculescu. Limit laws for random matrices and free products. Invent. Math., 104(1):201–220, 1991.

[137] E. P. Wigner. Characteristic vectors of bordered matrices with infinite dimensions. Ann. of Math. (2), 62:548–564, 1955.

[138] J. Wishart. The generalised product moment distribution in samples from a normal multivariate population. Biometrika, 20A(1/2):32–52, 1928.

[139] H.-T. Yau. Relative entropy and hydrodynamics of Ginzburg-Landau models. Lett. Math. Phys., 22(1):63–80, 1991.

[140] S. Zelditch. Uniform distribution of eigenfunctions on compact hyperbolic surfaces. Duke Math. J., 55(4):919–941, 1987.


Index

β-ensemble, 17

Admissible control parameter, 50
Airy function, 21
Airy kernel, 21
Anderson localization, 154
Anderson transition, 154, 156

Backtracking sequence, 11
Bakry-Emery argument, 95
Band width, 156
Brascamp-Lieb inequality, 102, 107

Catalan numbers, 12
Christoffel-Darboux formula, 19
Classical exponent, 17
Classical location of eigenvalues, 79
Concentration inequality, 97
Correlation function, 18

Deformed Wigner matrix, 155
Delocalization, 24, 154
Determinantal correlation, 19
Dirichlet form, 83, 94, 111
Dirichlet form inequality, 114, 119
Dyson Brownian motion (DBM), 7, 25, 80
Dyson conjecture, 7, 25, 84, 112

Edge universality, 140, 153
Empirical distribution function, 77
Empirical eigenvalue distribution, 77
Empirical eigenvalue measure, 13
Energy parameter, 13
Energy universality; averaged, fixed, 23, 148
Entropy inequality, 91

Fluctuation averaging, 31, 46, 56, 63
Free convolution, 155

Gap universality; averaged, fixed, 23, 148
Gaudin distribution, 5
Gaussian β-ensemble, 17, 25
Gaussian divisible ensemble, 6, 24, 25
Gaussian Orthogonal Ensemble (GOE), 5, 10
Gaussian Unitary Ensemble (GUE), 5, 10
Generalized Wigner matrix, 6, 9
Generator, 84
Gibbs inequality, 90
Gibbs measure, 16
Green function, 27
Green function comparison, 134, 140, 142
Green function continuity, 125
Gross' hypercontractivity bound, 100

Haar measure, 16, 155
Harish-Chandra-Itzykson-Zuber formula, 6
Helffer-Sjostrand formula, 75
Herbst bound, 97
Hermite polynomial, 18

Integrated density of states, 77
Interlacing of eigenvalues, 36
Invariant ensemble, 6, 15, 151
Invariant measure, 25
Isotropic local law, 31

Level repulsion, 81
Local equilibrium, 7, 25
Local relaxation flow, 113
Local relaxation measure, 113
Local semicircle law, 6, 24, 30
Log gas, 17, 151
Logarithmic Sobolev inequality (LSI), 94

Marchenko-Pastur law, 11
Marcinkiewicz-Zygmund inequality, 44
Mean field model, 10
Moment matching, 138
Moment method, 11

Ornstein-Uhlenbeck process, 6, 24, 80, 125

Partial expectation, 35, 46
Pinsker inequality, 91
Plancherel-Rotach asymptotics, 20, 21

Quadratic large deviation, 44
Quantum (unique) ergodicity, 155, 157
Quaternion determinant, 21

Random band matrix, 26, 156
Random Schrodinger operator, 26, 154
Regular graph, 156
Regular potential, 151
Relative entropy, 90
Resolvent, 13, 27
Resolvent decoupling identities, 47
Riemann-Hilbert method, 6
Rigidity of eigenvalues, 24, 79, 151

Sample covariance matrix, 10, 156
Schur formula, 34


Self-consistent equation, 31, 37, 46
Sine-kernel, 20
Single entry distribution, 10
Sparse graph, 156
Spectral domain, 29, 50
Spectral edge, 26
Spectral gap, 98
Spectral parameter, 13
Stieltjes transform, 13, 26, 74
Stochastic domination, 29

Three-step strategy, 6, 24, 149
Tracy-Widom law, 140

Universal Wigner matrix, 9

Vandermonde determinant, 15

Ward identity, 47
Weak local law, 31, 34
Wigner ensemble, 5, 9
Wigner semicircle law, 10, 26
Wigner surmise, 5
Wigner-Dyson-Mehta (WDM) conjecture, 5, 157
Wigner-type matrix, 155
Wishart matrix, 10
