25
On Martingales, Markov Chains and Concentration David Aldous 28 June 2018 David Aldous On Martingales, Markov Chains and Concentration

On Martingales, Markov Chains and Concentrationaldous/Talks/shepp.pdf · When I was a Ph.D. student (1974-77) Markov Chains were a long-established and still active topic in Applied

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: On Martingales, Markov Chains and Concentrationaldous/Talks/shepp.pdf · When I was a Ph.D. student (1974-77) Markov Chains were a long-established and still active topic in Applied

On Martingales, Markov Chains and Concentration

David Aldous

28 June 2018

David Aldous On Martingales, Markov Chains and Concentration

Page 2: On Martingales, Markov Chains and Concentrationaldous/Talks/shepp.pdf · When I was a Ph.D. student (1974-77) Markov Chains were a long-established and still active topic in Applied

When I was a Ph.D. student (1974-77) Markov Chains were along-established and still active topic in Applied Probability; andMartingales were a long-established and still active topic in TheoreticalProbability. But (according to memory) there wasn’t much connectionbetween those topics. Maybe martingales were a potentially useful toolfor studying Markov Chains, but were they actually being used?

Here are the results of a MathSciNet search on “year = 1977” and“anywhere = martingale and Markov chain”.

David Aldous On Martingales, Markov Chains and Concentration

Page 3: On Martingales, Markov Chains and Concentrationaldous/Talks/shepp.pdf · When I was a Ph.D. student (1974-77) Markov Chains were a long-established and still active topic in Applied

4/26/2018 MR: Publications results for "(Anywhere=(Markov chain) AND Anywhere=(martingale) AND Publication Type=(Journals)) AND pubyear=1977 "

https://mathscinet.ams.org/mathscinet/search/publdoc.html?arg3=1977&co4=AND&co5=AND&co6=AND&co7=AND&dr=pubyear&pg4=AUCN&pg5=AUCN&pg6=ALLF&pg7=ALLF&pg8=ET&r=1&review_format=html&s4=&s5=&

CitationsFrom References: 0From Reviews: 1

Click here to activate Remote Access

Univ of Calif, Berkeley

ISSN 2167-5163

Publications results for "(Anywhere=(Markov chain) AND Anywhere=(martingale) AND Publication Type=(Journals)) AND pubyear=1977 "MR0482943 (58 #2978) Reviewed Bolthausen, E.

On rates of convergence in a random central limit theorem and in the central limit theorem for Markov chains. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 38 (1977), no. 4, 279–286.

60F05 (60J10)

Let be a positive recurrent irreducible Markov chain with countable state space , and assume that a map (reals) isgiven. Fix a state and define and . Using renewal theory, optional stopping results formartingales and some well-known Berry-Esseen type results for i.i.d. summands, it is proved, given specified constants and ,

and the standard normal distribution, that and for some imply

Reviewed by Roy V. Erickson © Copyright 2018, American Mathematical Society

Privacy Statement

, n = 0, 1,⋯Xn I φ: I → R

a ∈ I τ := infn ≥ 1: = aXn := φ( )φτ ∑τ

n=1Xn

μ σ

(t) := ( φ( ) − nμ ≤ σt)Fn Pa ∑n

1Xk n

1/2Φ |τ < ∞Ea |4∨c | < ∞Ea ϕτ |c c ≥ 3

‖ − Φ‖ = o( ) for allδ > 1/(6(c + 1)).Fn n−1/3+δ

David Aldous On Martingales, Markov Chains and Concentration

Page 4: On Martingales, Markov Chains and Concentrationaldous/Talks/shepp.pdf · When I was a Ph.D. student (1974-77) Markov Chains were a long-established and still active topic in Applied

Fast forward to 1987

David Aldous On Martingales, Markov Chains and Concentration

Page 5: On Martingales, Markov Chains and Concentrationaldous/Talks/shepp.pdf · When I was a Ph.D. student (1974-77) Markov Chains were a long-established and still active topic in Applied

4/26/2018 MR: Publications results for "(Anywhere=(Markov chain) AND Anywhere=(martingale) AND Publication Type=(Journals)) AND pubyear=1987 "

https://mathscinet.ams.org/mathscinet/search/publications.html?pg4=AUCN&s4=&co4=AND&pg5=AUCN&s5=&co5=AND&pg6=ALLF&s6=Markov+chain&co6=AND&pg7=ALLF&s7=martingale&co7=AND&dr=pubyear&yrop=eq&

Stochastic Process. Appl.(1)

Year

1987(7)

MR0943579 Reviewed Aalen, Odd O. Dynamic modelling and causality. Scand. Actuar. J. 1987, no. 3-4, 177–190. 62P10 (60G44 62M1062P05)

MR0926552 Reviewed Moore, Marc Inférence statistique dans les processus stochastiques: aperçu historique. (French) [Statisticalinference in stochastic processes: a historical overview] Canad. J. Statist. 15 (1987), no. 3, 185–207. (Reviewer: W. A. O'N. Waugh) 62-02(01A60 60-02 62M99)

MR0925936 Reviewed Aldous, David; Shepp, Larry The least variable phase type distribution is Erlang. Comm. Statist. Stochastic Models3 (1987), no. 3, 467–473. 60J27 (90B22)

MR0917251 Reviewed Bierth, K.-J. An expected average reward criterion. Stochastic Process. Appl. 26 (1987), no. 1, 123–140. (Reviewer:Uriel G. Rothblum) 90C39 (90C40)

MR0900114 Reviewed Sonin, I. M. A theorem on separation of jets and some properties of random sequences. Stochastics 21 (1987), no.3, 231–249. (Reviewer: Robert Kertz) 60J10 (60F99 60G44)

MR0889799 Reviewed Fayolle, Guy; Iasnogorodski, Rudolph Criteria for the nonergodicity of stochastic processes: application to theexponential back-off protocol. J. Appl. Probab. 24 (1987), no. 2, 347–354. (Reviewer: Linn I. Sennott) 60G40 (60K99 68M10 94A05)

MR0877725 Reviewed Rosenkrantz, Walter A. Approximate counting: a martingale approach. Stochastics 20 (1987), no. 2, 111–120.(Reviewer: S. Csibi) 68R99 (60G42 68Q25)

© Copyright 2018, American Mathematical Society

Privacy Statement

David Aldous On Martingales, Markov Chains and Concentration

Page 6: On Martingales, Markov Chains and Concentrationaldous/Talks/shepp.pdf · When I was a Ph.D. student (1974-77) Markov Chains were a long-established and still active topic in Applied

I will first talk about Aldous-Shepp (1987) and then about some recentwork.

Consider the rather trivial n + 1-state continuous-time chain (Xt) withstates 0, 1, . . . , n and transition rates

qi,i−1 = 1, 1 ≤ i ≤ n.

The hitting time T = Tn,0 from n to 0 has relative variability

var(T )/(ET )2 = 1/n.

It is natural to conjecture that this is the smallest possible value for anyhitting time of any n + 1-state continuous-time chain, and Aldous-Shepp(1987) gave a short proof, based on the following general identity.

David Aldous On Martingales, Markov Chains and Concentration

Page 7: On Martingales, Markov Chains and Concentrationaldous/Talks/shepp.pdf · When I was a Ph.D. student (1974-77) Markov Chains were a long-established and still active topic in Applied

Lemma

Let T be a first hitting time and write h(i) := EiT .

Then vari T = Ei

∑s≤T (h(Xs)− h(Xs−))2 = Ei

∫ T

0a(Xt) dt

where a(i) :=∑

j qij (h(i)− h(j))2.

Now w.l.o.g. label states as 0, 1, . . . , n, consider the hitting time on state0, and re-order so that 0 = h(0) < h(1) ≤ h(2) ≤ . . . ≤ h(n). Easy to seethat for any path i0, i1, . . . from m to 0 we have∑

u

(h(iu)− h(iu−1))2 ≥m∑i=1

(h(i)− h(i − 1))2.

So

varmT0 ≥m∑i=1

(h(i)− h(i − 1))2 (first equality in Lemma)

≥ m−1

(m∑i=1

(h(i)− h(i − 1))

)2

(Cauchy-Schwarz)

≥ m−1h2(m) ≥ n−1(EmT0)2.

David Aldous On Martingales, Markov Chains and Concentration

Page 8: On Martingales, Markov Chains and Concentrationaldous/Talks/shepp.pdf · When I was a Ph.D. student (1974-77) Markov Chains were a long-established and still active topic in Applied

This result is mentioned in treatments of “phase-type” distributions butneither Larry nor I pursued the topic further.

The “theory” way to understand the lemma is that t → E(T |Xs , s ≤ t)must be a martingale, and in fact it is

Mt := E(T |Xs , s ≤ t) = h(Xt∧T ) + t ∧ T (1)

for h(i) := EiT . Next, M2t has a Doob-Meyer decomposition into a

martingale Qt and a predictable process, and the decomposition is

M2t −M2

0 = Qt +

∫ t

0

a(Xs)ds, t ≤ T

fora(i) :=

∑j

qij (h(i)− h(j))2.

From these ingredients and optional sampling we get a pair of generalidentities.

David Aldous On Martingales, Markov Chains and Concentration

Page 9: On Martingales, Markov Chains and Concentrationaldous/Talks/shepp.pdf · When I was a Ph.D. student (1974-77) Markov Chains were a long-established and still active topic in Applied

Lemma

Let T be a first hitting time within a continuous-time chain, and

h(i) := EiT .

Then

EiT = Ei

∫ T

0

b(Xt) dt, vari T = Ei

∫ T

0

a(Xt) dt

wherea(i) :=

∑j

qij (h(i)− h(j))2

b(i) :=∑j

qij(h(i)− h(j)).

These are curiously hard to find in Applied Probability textbooks, andindeed are not helpful for explicit calculations in a specific model.

David Aldous On Martingales, Markov Chains and Concentration

Page 10: On Martingales, Markov Chains and Concentrationaldous/Talks/shepp.pdf · When I was a Ph.D. student (1974-77) Markov Chains were a long-established and still active topic in Applied

As background to my second topic, recall the method of boundeddifferences; for a RV Z of the form

Z = f (ξ1, . . . ξn); for independent (ξi )

where f has the property

|f (x)− f (x′)| ≤ 1 whenever x, x′ differ in only one coordinate

we haveP(|Z − EZ | ≥ λn1/2) ≤ 2 exp(−λ2/2).

Basic example of a general concentration inequality – a key point isthat one can bound a difference |Z −EZ | even when you don’t know EZ .

I will describe a rather different CI, which holds for Markov chains with aspecial property.

David Aldous On Martingales, Markov Chains and Concentration

Page 11: On Martingales, Markov Chains and Concentrationaldous/Talks/shepp.pdf · When I was a Ph.D. student (1974-77) Markov Chains were a long-established and still active topic in Applied

Consider a finite-state continuous-time Markov chain and a hitting timeT = TA on some subset A of states, and suppose

h(i) := EiT <∞ ∀i .

Neither “finite” nor “continuous” is actually important here. What isimportant is the next assumption:

h(j) ≤ h(i) for each possible transition i → j . (2)

This is a very restrictive assumption – not obvious that any “interesting”chain satisfies this assumption.

Proposition

Under condition (2), for any initial state,

var T

ET≤ maxh(i)− h(j) : i → j a possible transition.

David Aldous On Martingales, Markov Chains and Concentration

Page 12: On Martingales, Markov Chains and Concentrationaldous/Talks/shepp.pdf · When I was a Ph.D. student (1974-77) Markov Chains were a long-established and still active topic in Applied

We can rewrite this as

κ := maxh(i)− h(j) : i → j a possible transition

s.d.(T )

ET≤√

κ

ETwhere the right side is always ≤ 1. So we get a weak concentrationinequality if κ/ET is small.

The Proposition is an immediate corollary of the previous lemma [show]

David Aldous On Martingales, Markov Chains and Concentration

Page 13: On Martingales, Markov Chains and Concentrationaldous/Talks/shepp.pdf · When I was a Ph.D. student (1974-77) Markov Chains were a long-established and still active topic in Applied

Are there any interesting chains where our “strong monotonicity”condition holds?

Our applications are all in the context of chains (Zt) whose states aresubsets S of a given discrete space and whose transitions are of the formS → S ∪ v. In words “increasing set-valued processes”.

And our applications use hitting times T of the form

T := inft : Zt ⊇ B for some B ∈ B

for a specified collection B of subsets B.

In applications we boundκ := maxh(i)− h(j) : i → j a possible transition by using somenatural coupling of the processes started from i and from j .

The rest of the talk is 3 examples which fit this context. The first twoare straightforward to implement.

David Aldous On Martingales, Markov Chains and Concentration

Page 14: On Martingales, Markov Chains and Concentrationaldous/Talks/shepp.pdf · When I was a Ph.D. student (1974-77) Markov Chains were a long-established and still active topic in Applied

Example 1: a general growth process (Zt) on the lattice Z2.The states are finite vertex-sets S , the possible transitions areS → S ∪ v where v is a vertex adjacent to S . For each such transition,we assume the transition rates are bounded above and below:

0 < c∗ ≤ q(S ,S ∪ v) ≤ c∗ <∞. (3)

Initially Z0 = 0, where 0 denotes the origin. The “monotonicity”condition we impose is that these rates are increasing in S :

if v , v ′ are adjacent to S then q(S ,S ∪ v) ≤ q(S ∪ v ′,S ∪ v , v ′) .(4)

Note that we do not assume any kind of spatial homogeneity.

Proposition

Let B be an arbitrary subset of vertices Z2 \ 0, and considerT := inft : Zt ∩ B is non-empty.. Under assumptions (3, 4),

var T ≤ ET/c∗.

David Aldous On Martingales, Markov Chains and Concentration

Page 15: On Martingales, Markov Chains and Concentrationaldous/Talks/shepp.pdf · When I was a Ph.D. student (1974-77) Markov Chains were a long-established and still active topic in Applied

Proof. Condition (4) allows us to couple versions (Z ′t ,Z′′t ) of the process

starting from states S ′ ⊂ S ′′, such that in the coupled process we haveZ ′t ⊆ Z ′′t for all t ≥ 0. In particular, h(S) := EST satisfies themonotonicity condition (2). To deduce the result from Lemma 1 we needto show that, for any given possible transition S0 → S0 ∪ v0, we have

h(S0) ≤ h(S0 ∪ v0) + 1/c∗. (5)

Now by running the process started at S0 until the first time T ∗ thisprocess contain v0, and then coupling the future of that process to theprocess started at S0 ∪ v0, we have h(S0) ≤ ES0T

∗ + h(S0 ∪ v0).And ES0T

∗ ≤ 1/c∗ by (3), establishing (5).

David Aldous On Martingales, Markov Chains and Concentration

Page 16: On Martingales, Markov Chains and Concentrationaldous/Talks/shepp.pdf · When I was a Ph.D. student (1974-77) Markov Chains were a long-established and still active topic in Applied

Two recent arXiv posts deal with specific models of suchnon-homogeneous growth processes.

Asymptotic behavior of the Eden model with positively homogeneousedge weights by Bubeck and Gwynne

Nucleation and growth in two dimensions by Bollobas et al.

David Aldous On Martingales, Markov Chains and Concentration

Page 17: On Martingales, Markov Chains and Concentrationaldous/Talks/shepp.pdf · When I was a Ph.D. student (1974-77) Markov Chains were a long-established and still active topic in Applied

Example 2: A multigraph process(This is one aspect of broad study in Aldous-Li (2018) A Framework forImperfectly Observed Networks).

Take a network – a finite connected graph (V,E) with edge-weightsw = (we), where we > 0 ∀e ∈ E.

Define a multigraph-valued process as follows. Initially we have thevertex-set V and no edges. For each vertex-pair e = (vy) ∈ E, edges vyappear at the times of a Poisson (rate we) process, independent overe ∈ E.

So at time t the state of the process, Zt say, is a multigraph withNe(t) ≥ 0 copies of edge e, where (Ne(t), e ∈ E) are independentPoisson(twe) random variables.

David Aldous On Martingales, Markov Chains and Concentration

Page 18: On Martingales, Markov Chains and Concentrationaldous/Talks/shepp.pdf · When I was a Ph.D. student (1974-77) Markov Chains were a long-established and still active topic in Applied

Background motivation: model of imperfectly observed network. Howwell can we estimate some aspect of the true network from observedmultigraph?

Leads to a huge range of questions; one basic issue is to say somethingabout connectivity properties of true network. This must relate toconnectivity properties of the observed multigraph.

David Aldous On Martingales, Markov Chains and Concentration

Page 19: On Martingales, Markov Chains and Concentrationaldous/Talks/shepp.pdf · When I was a Ph.D. student (1974-77) Markov Chains were a long-established and still active topic in Applied

We study how long until Zt has various connectivity properties.Specifically, consider

T ′k = inft : Zt is k-edge-connectedTk = inft : Zt contains k edge-disjoint spanning trees.

Here we regard the Ne(t) copies of e as disjoint edges. Remarkably,Lemma 1 enables us to give a simple proof of a “weak concentration”bound which does not depend on the underlying weighted graph.

Proposition

s.d.(Tk)

ETk≤ 1√

k, k ≥ 1.

Via a continuization device, the same bound holds in the discrete-timemodel where edges e arrive IID with probabilities proportional to we .

We conjecture that some similar result holds for T ′k . But proving this byour methods would require some structure theory (beyond Menger’stheorem) for k-edge-connected graphs, and it is not clear whetherrelevant theory is known.

David Aldous On Martingales, Markov Chains and Concentration

Page 20: On Martingales, Markov Chains and Concentrationaldous/Talks/shepp.pdf · When I was a Ph.D. student (1974-77) Markov Chains were a long-established and still active topic in Applied

Proof. Here the states S are multigraphs over V, and h(S) is theexpectation, starting at S , of the time until the process contains kedge-disjoint spanning trees. Monotonicity property is clear. What arethe possible values of h(S)− h(S ∪ e), where S ∪ e denotes theresult of adding an extra copy of e to the multigraph S?

Consider the “min-cut” over proper subsets S ⊂ V

γ := minS

w(S ,Sc)

where w(S ,Sc) =∑

v∈S,y∈Sc wvy . Because a spanning tree must have atleast one edge across the min-cut,

ETk ≥ k/γ. (6)

On the other hand we claim

h(S)− h(S ∪ e) ≤ 1/γ.

Given this, Lemma 1 establishes the proposition.

David Aldous On Martingales, Markov Chains and Concentration

Page 21: On Martingales, Markov Chains and Concentrationaldous/Talks/shepp.pdf · When I was a Ph.D. student (1974-77) Markov Chains were a long-established and still active topic in Applied

Claim : h(S)− h(S ∪ e) ≤ 1/γ.

To prove this, take the natural coupling (Zt ,Z+t ) of the processes started

from S and from S ∪ e, and run the coupled process until Z+t contains

k edge-disjoint spanning trees. At this time, the process Zt eithercontains k edge-disjoint spanning trees, or else contains k − 1 spanningtrees plus two trees (regard as edge-sets t1 and t2) such thatt1 ∪ t2 ∪ e is a spanning tree. So the extra time we need to run (Zt) isat most the time until some arriving edge links t1 and t2, which hasmean at most 1/γ. This establishes the Claim.

David Aldous On Martingales, Markov Chains and Concentration

Page 22: On Martingales, Markov Chains and Concentrationaldous/Talks/shepp.pdf · When I was a Ph.D. student (1974-77) Markov Chains were a long-established and still active topic in Applied

Perhaps the most interesting example – from The Incipient GiantComponent in Bond Percolation . . . (2016).

Example 3: Bond percolation on a general network.

An edge e of weight we becomes open at an Exponential(we)random time.

In this process we can consider

C (t) = max size (number of vertices) in a connectedcomponent of open edges at time t.

And consider “emergence of the giant component”. Studiedextensively on many non-random and specific models of randomnetworks. Can we say anything about (almost) arbitrary networks?

Traditional setting: number of vertices n→∞ asymptotics.

David Aldous On Martingales, Markov Chains and Concentration

Page 23: On Martingales, Markov Chains and Concentrationaldous/Talks/shepp.pdf · When I was a Ph.D. student (1974-77) Markov Chains were a long-established and still active topic in Applied

Suppose (after time-scaling) there exist constants δ > 0,K <∞ suchthat

limn

ECn(δ)/n = 0; limn

ECn(K )/n > 0. (7)

In the language of random graphs, this condition says a giant componentemerges (with non-vanishing probability) at some random time of order 1.

Proposition

Given a sequence of networks satisfying (1), there exist constantsτn ∈ [δ,K ] such that, for every sequence εn ↓ 0 sufficiently slowly, therandom times

Tn := inft : Cn(t) ≥ εnn

satisfyTn − τn →p 0.

The Proposition asserts, informally, that the “incipient” time at whichthe giant component starts to emerge is deterministic to first order.

David Aldous On Martingales, Markov Chains and Concentration

Page 24: On Martingales, Markov Chains and Concentrationaldous/Talks/shepp.pdf · When I was a Ph.D. student (1974-77) Markov Chains were a long-established and still active topic in Applied

The proof of the Proposition is more complicated (we need to allow rare“bad” transitions) and less explicit (use a compactness argument) – butonly 2-3 pages. See The Incipient Giant Component in Bond Percolation. . . (2016),

Why care about this kind of result?Bond percolation can be re-interpreted as the SI epidemic. TheProposition can be re-interpreted as saying that, in the SI epidemic on anarbitrary network with an ”infectiousness” parameter λ, there is always acritical value λcrit such that, starting with Ω(1) but o(n) infectives,(λ < λcrit): w.h.p. o(n) ever infected(λ > λcrit): w.h.p. Ω(n) ever infected.

Would like to extend to more realistic SIR models, but needs differentarguments.

David Aldous On Martingales, Markov Chains and Concentration

Page 25: On Martingales, Markov Chains and Concentrationaldous/Talks/shepp.pdf · When I was a Ph.D. student (1974-77) Markov Chains were a long-established and still active topic in Applied

Just for fun, a math example;Take vertices as integers 1, 2, 3, . . . ,N and edge-weights

wij = g.c.d.(i , j)

with normalization 1/(N logN). Here are 6 realizations of CN(·) forN = 72, 000.

David Aldous On Martingales, Markov Chains and Concentration