Probabilistic Combinatorics

Embed Size (px)

Citation preview

  • 7/29/2019 Probabilistic Combinatorics

    1/42

    Probabilistic Combinatorics

    Unoffi

    cial Lecture NotesHilary Term 2011, 16 lecturesO. M. Riordan [email protected]

    These notes give APPROXIMATELY the contents of the lectures. They arenot official in any way. They are also not intended for distribution, only as arevision aid. I would be grateful to receive corrections by e-mail. (But pleasecheck the course webpage first in case the correction has already been made.)

    Horizontal lines indicate approximately the division between lectures.

    Recommended books:For much of the course The Probabilistic Method (second edition, Wiley,

    2000) by Alon and Spencer is the most accessible reference.

    Very good books containing a lot of material, especially about randomgraphs, are Random Graphs by Bollobas, and Random Graphs by Janson,Luczak and Rucinski. However, do not expect these books to be easy to read!

    Remark on notation: simply means , though the latter is sometimes usedfor emphasis.

    0 Introduction: what is probabilistic combina-

    torics?

    The first question is what is combinatorics! This is hard to define exactly, but

    should become clearer through examples, of which the main one is graph theory.Roughly speaking, combinatorics is the study of discrete structures. Herediscrete means either finite, or infinite but discrete in the sense that the integersare, as opposed to the reals. Usually in combinatorics, there are some underlyingobjects whose internal structure we ignore, and we study structures built onthem: the most common example is graph theory, where we do not care whatthe vertices are, but study the abstract structure of graphs on a given set ofvertices. Abstractly, a graph is just a set of unordered pairs of vertices, i.e., asymmetric irreflexive binary relation on its vertex set. More generally, we mightstudy collections of subsets of a given vertex set, for example.

    Turning to probabilistic combinatorics, this is combinatorics with random-ness involved. It can mean two things: (a) the use of randomness (e.g., randomgraphs) to solve deterministic combinatorial problems, or (b) the study of ran-dom combinatorial objects for their own sake. Historically, the subject startedwith (a), but after a while, the same objects (e.g., random graphs) come upagain and again, and one realizes that it is not only important, but also inter-esting, to study these in themselves, as well as their applications. The subjecthas also lead to new developments in probability theory.

    The course will mainly be organized around proof techniques. However,each technique will be illustrated with examples, and one particular example

    1

  • 7/29/2019 Probabilistic Combinatorics

    2/42

    (random graphs) will occur again and again, so by the end of the course we will

    have covered aim (b) in this special case as well as aim (a) above.The first few examples will be mathematically very simple; nevertheless theywill show the power of the method in general. Of course, modern applicationsare usually not quite so simple.

    1 The first moment method

    Perhaps the most basic inequality in probability is the union bound: if A1 andA2 are two events, then P(A1 A2) P(A1) + P(A2). More generally,

    P

    A1 An

    n

    i=1P(Ai).

    This totally trivial fact is already useful.

    Example 1.1 (Ramsey numbers). Recall the definition: the Ramsey numberR(k, ) is the smallest n such that every red/blue colouring of the edges of thecomplete graph Kn contains either a red Kk or a blue K.

    Theorem 1.1 (Erdos, 1947). If n, k 1 are integers such that nk21(k2) < 1,then R(k, k) > n.

    Proof. Colour the edges ofKn red/blue at random so that each edge is red withprobability 1/2 and blue with probability 1/2, and the colours of all edges areindependent.

    There are nk copies of Kk in Kn. Let Ai be the event that the ith copy is

    monochromatic. Then

    P(Ai) = 2

    1

    2

    (k2)= 21(

    k2).

    Thus

    P( monochromatic Kk)

    P(Ai) =

    n

    k

    21(

    k2) < 1.

    Thus, in the random colouring, the probability that there is no monochromaticKk is greater than 0. Hence it is possible that the random colouring is good(contains no monochromatic Kk), i.e., there exists a good colouring.

    To deduce an explicit bound on R(k, k) involves a little calculation.

    Corollary 1.2. R(k, k) 2k/2 for k 3.Proof. Set n = 2k/2. Then

    n

    k

    21(

    k2) n

    k

    k!21(

    k2) 2

    k2/2

    k!21k

    2/2+k/2 =21+k/2

    k!,

    which is smaller than 1 if k 3.

    2

  • 7/29/2019 Probabilistic Combinatorics

    3/42

    Remark. The result above is very simple, and may seem weak. But the best

    known lower bounds proved by non-random methods are roughly k(log k)C

    with Cconstant, i.e., grow only slightly faster than polynomially. This is tiny comparedwith the exponential bound above. Note that the upper bounds are roughly 4k,so exponential is the right order: the constant is unknown.

    Often, the first-moment method simply refers to using the union bound asabove. But it is much more general than that. We recall another basic termfrom probability.

    Definition. The first moment of a random variable X is simply its mean, orexpectation, written E[X].

    Recall that expectation is linear. If X and Y are (real-valued) random vari-ables and is a (constant!) real number, then E[X + Y] = E[X] + E[Y], andE[X] = E[X]. Crucially, these ALWAYS hold, irrespective of any relationship(or not) between X and Y.

    If A is an event, then its indicator function IA is the random variable whichtakes the value 1 when A holds and 0 when A does not hold.

    Let A1, . . . , An be events, let Ii be the indicator function of Ai, and setX =

    Ii, so X is the (random) number of the events Ai that hold. Then

    E[X] =ni=1

    E[Ii] =ni=1

    P(Ai).

    We use the following trivial observation: for any random variable X, it cannotbe true that X is always smaller than its mean, or always larger: ifE[X] = ,then P(X ) > 0 and P(X ) > 0.Example 1.2 (Ramsey numbers again).

    Theorem 1.3. Letn, k 1 be integers. Then

    R(k, k) > n

    n

    k

    21(

    k2).

    Proof. Colour the edges of Kn as before. Let X denote the (random) numberof monochromatic copies of Kk in the colouring. Then

    = E[X] =

    n

    k

    21(

    k2).

    Since P(X ) > 0, there exists a colouring with at most monochromaticcopies of Kk. Pick one vertex from each of these monochromatic Kks thismay involve picking the same vertex more than once. Delete all the selectedvertices. Then we have deleted at most vertices, and we are left with a goodcolouring of Km for some m n . Thus R(k, k) > m n .

    The type of argument above is often called a deletion argument. Instead oftrying to avoid bad things in our random structure, we first ensure that therearent too many, and then fix things (here by deleting) to get rid of those few.

    3

  • 7/29/2019 Probabilistic Combinatorics

    4/42

    Corollary 1.4. R(k, k) (1 o(1))e1k2k/2.Here we are using standard asymptotic notation (see handout). Explicitly,

    we mean that for any > 0 there is a k0 such that R(k, k) (1 )e1k2k/2for all k k0.Proof. Exercise: take n = e1k2k/2.

    We now give a totally different example of the first-moment method.

    Example 1.3 (Sum-free sets).

    Definition. A set S R is sum-free if there do not exist a,b,c S such thata + b = c.

    Note that {1, 2} is not sum-free, since 1 + 1 = 2. The set {2, 3, 7, 8, 12} issum-free, for example.

    Theorem 1.5 (Erdos, 1965). LetS = {s1, s2, . . . , sn} be a set of n 1 non-zerointegers. There is some A S such that A is sum-free and |A| > n/3.

    End L1 Start L2

    Proof. We use a trick: fix a prime p of the form 3k + 2 not dividing any si; forexample, take p > max |si|.

    Let I = {k + 1, . . . , 2k + 1}. Then I is sum-free modulo p: there do not exista,b,c I such that a + b c mod p.

    Pick r uniformly at random from 1, 2, . . . , p 1, and set ti = rsi mod p, soeach ti is a random element of {1, 2, . . . , p

    1}.

    For each i, as r runs from 1 to p1, ti takes each possible value 1, 2, . . . , p1exactly once. Hence

    P(ti I) = |I|p 1 =

    k + 1

    3k + 1>

    1

    3.

    By the first moment method, we have

    E[#i such that ti I] =ni=1

    P(ti I) > n/3.

    It follows that there is some r such that, for this particular r, the number ofi with ti

    I is greater than n/3. For this r, let A = {si : ti

    I}, so A

    S

    and |A| > n/3. If we had si, sj , sk A with si + sj = sk then we would haversi + rsj = rsk, and hence ti + tj tk mod p, which contradicts the fact thatI is sum-free modulo p.

    The proof above is an example of an averaging argument. This particularexample is not so easy to come up with without a hint, although it is hopefullyeasy to follow.

    4

  • 7/29/2019 Probabilistic Combinatorics

    5/42

    Example 1.4 (2-colouring hypergraphs). A hypergraph H is simply an ordered

    pair (V, E) where V is a set of vertices and E is a set of hyperedges, i.e., a setof subsets of V. Hyperedges are often called simply edges.Note that E is a set, so each possible hyperedge (subset of V) is either

    present or not, just as each possible edge of a graph is either present or not.If we want to allow multiple copies of the same hyperedge, we could definemulti-hypergraphs in analogy with multigraphs.

    H is r-uniform if |e| = r for all e E, i.e., if every hyperedge consists ofexactly r vertices. In particular, a 2-uniform hypergraph is simply a graph.

    An example of a 3-uniform hypergraph is the Fano plane shown in the figure.This has 7 vertices and 7 hyperedges; in the drawing, the 6 straight lines and thecircle each represent a hyperedge. (As usual, how they are drawn is irrelevant,all that matters is which vertices each hyperedge contains.)

    A (proper) 2-colouring of a hypergraph H is a red/blue colouring of thevertices such that every hyperedge contains vertices of both colours. (If H is2-uniform, this is same as a proper 2-colouring of H as a graph.) We say that His 2-colourable if it has a 2-colouring. (Sometimes this is called having propertyB.)

    Let m(r) be the minimum m such that there exists a non 2-colourable r-uniform hypergraph with m edges. It is easy to check that m(2) = 3. It isharder to check that m(3) = 7 (there is no need to do this!).

    Theorem 1.6. For r 2 we have m(r) 2r1.Proof. Let H = (V, E) be any r-uniform hypergraph with m < 2r1 edges.Colour the vertices red and blue randomly: each red with probability 1/2 andblue with probability 1/2, with different vertices coloured independently. Forany e E, the probability that e is monochromatic is 2(1/2)r = (1/2)r1.By the union bound, it follows that the probability that there is at least onemonochromatic edge is at most m(1/2)r1 < 1. Thus there exists a goodcolouring.

    We can also obtain a bound in the other direction; this is slightly harder.

    Theorem 1.7 (Erdos, 1964). If r is large enough then m(r) 3r22r.Proof. Fix r 3. Let V be a set of n vertices, where n (which depends on r)will be chosen later.

    5

  • 7/29/2019 Probabilistic Combinatorics

    6/42

    Let e1, . . . , em be chosen independently and uniformly at random from allnr

    possible hyperedges on V. Although repetitions are possible, the hypergraphH = (V, {e1, . . . , em})

    certainly has at most m hyperedges.Let be any red/blue colouring of V. (Not a random one this time.) Then

    has either at least n/2 red vertices, or at least n/2 blue ones. It follows that at

    least (crudely)n/2r

    of the possible hyperedges are monochromatic with respect

    to .Let p = p denote the probability that e1 (a hyperedge chosen at random

    from all possibilities) is monochromatic with respect to . Then

    p

    n/2r nr = (n/2)(n/2 1) (n/2 r + 1)

    n(n 1) (n r + 1)

    n/2 r + 1n r + 1r

    n/2 rn rr

    = 2r

    1 rn rr

    .

    Set n = r2. Then p 2r(1 1/(r 1))r. Since (1 1/(r 1))r e1 asr , we see that p 132r if r is large enough, which we assume from nowon.

    The probability that a given, fixed colouring is a proper 2-colouring ofour random hypergraph H is simply the probability that none of e1, . . . , emis monochromatic with respect to . Since e1, . . . , em are independent, this isexactly (1 p)m.

    By the union bound, the probability that H is 2-colourable is at most the

    sum over all possible

    of the probability that

    is a 2-colouring, which is atmost 2n(1 p)m.Recall that n = r2, and set m = 3r22r. Then using the standard fact

    1 x ex, we have

    2n(1 p)m 2nepm 2r2e 3r22r

    32r = 2r2

    er2

    < 1.

    Therefore there exists an r-uniform hypergraph H with at most m edges andno 2-colouring.

    End L2 Start L3

    Remark. Why does the first moment method work? Often, there is some com-plicated event A whose probability we want to know (or at least bound). Forexample, A might be the event that the random colouring is a 2-colouringof a fixed (complicated) hypergraph H. Often, A is constructed by taking theunion or intersection of simple events A1, . . . , Ak.

    If A1, . . . , Ak are independent, then

    P(A1 Ak) =

    P(Ai) and P(A1 Ak) = 1

    (1P(Ai)).

    6

  • 7/29/2019 Probabilistic Combinatorics

    7/42

    If A1, . . . , Ak are mutually exclusive, then

    P(A1 Ak) =

    P(Ak).

    (For example, this gives us the probability 2(1/2)|e| that a fixed hyperedge ismonochromatic in a random 2-colouring.)

    In general, the relationship between the Ai may be very complicated. How-ever, if X is the number of Ai that hold, then we always have E[X] =

    P(Ai)

    andP(Ai) = P(X > 0)

    P(Ai).

    The key point is that while the left-hand side is complicated, the right-handside is simple: we evaluate it by looking at one simple event at a time.

    So far we have used the expectation only via the observations P(XE[X]) >

    0 and P(X E[X]) > 0, together with the union bound. A slightly moresophisticated (but still simple) way to use it is via Markovs inequality.

    Lemma 1.8 (Markovs inequality). If X is a random variable taking only non-negative values and t > 0, thenP(X t) E[X]/t.Proof. E[X] tP(X t) + 0P(X < t).

    We now start on one of our main themes, the study of the random graphG(n, p).

    Definition. Given an integer n 1 and a real number 0 p 1, the randomgraph G(n, p) is the graph with vertex set [n] = {1, 2, . . . , n} in which each

    possible edge ij, 1 i < j n is present with probability p, independently ofthe others.Thus, for any graph H on [n],

    P

    G(n, p) = H

    = pe(H)(1 p)(n2)e(H).

    For example, if p = 1/2, then all 2(n2) graphs on [n] are equally likely.

    Remark. It is important to remember that we work with labelled graphs. Forexample, the probability that G(3, p) is a star with three vertices is 3p2(1 p),since there are three (isomorphic) graphs on {1, 2, 3} that are stars.

    We use the notation G(n, p) for the probability space of graphs on [ n] with

    the probabilities above. All of G G(n, p), G = G(n, p) and G G(n, p) meanexactly the same thing, namely that G is a random graph with this distribution.

    This model of random graphs is often called the Erd osRenyi model, al-though in fact it was first defined by Gilbert. Erdos and Renyi introduced anessentially equivalent model, and were the real founds of the theory of randomgraphs, so associating the model with their names is reasonable!

    7

  • 7/29/2019 Probabilistic Combinatorics

    8/42

    Example 1.5 (High girth and chromatic number). Let us recall some defini-

    tions. The girth g(G) of a graph G is the length of the shortest cycle in G, or if G contains no cycles. The chromatic number(G) is the least k such thatG has a k-colouring of the vertices in which adjacent vertices receive differentcolours, and the independence number (G) is the maximum number of verticesin an independent set I in G, i.e., a set of vertices of G no two of which arejoined by an edge.

    Recall that (G) |G|/(G).Theorem 1.9 (Erdos, 1959). For any k and there exists a graph G with(G) k and g(G) .

    There are non-random proofs of this, but it is not so easy.The idea of the proof is to consider G(n, p) for suitable n and p. We will

    show separately that (a) very likely there are few short cycles, and (b) verylikely there is no large independent set. Then it is likely that the properties in(a) and (b) both hold, and after deleting a few vertices (to kill the short cycles),we obtain the graph we need.

    Proof. Fix k, 3. There aren(n 1) (n r + 1)

    2r

    possible cycles of length r in G(n, p): the numerator counts sequences of rdistinct vertices, and the denominator accounts for the fact that each cyclecorresponds to 2r sequences, depending on the choice of starting point anddirection.

    Let Xr be the number of r-cycles in G(n, p). Then

    E[Xr] =n(n 1) (n r + 1)

    2rpr n

    rpr

    2r.

    Set p = n1+1/, and let X be the number of short cycles, i.e., cycles withlength less than . Then X = X3 + X4 + + X1, so

    E[X] =1r=3

    E[Xr] 1r=3

    (np)r

    2r=

    1r=3

    nr/

    2r= O(n

    1 ) = o(n).

    By Markovs inequality it follows that

    P(X n/2) E[X]n/2

    0.

    Set m = n11/(2). Let Y be the number of independent sets in G(n, p) ofsize (exactly) m. Then

    E[Y] =

    n

    m

    (1 p)(m2 ) en

    m

    mep(

    m2 ) =en

    mep

    m12

    m.

    8

  • 7/29/2019 Probabilistic Combinatorics

    9/42

    Now

    pm

    1

    2 pm

    2 n1+

    1 n1

    12

    2 =n

    12

    2 .

    Thus p(m 1)/2 2log n ifn is large enough, which we may assume. But then

    E[Y] en

    mn2m

    0,

    and by Markovs inequality we have

    P(Y > 0) = P(Y 1) E[Y] 0,i.e., P((G) m) 0.

    Combining the two results above, we have P(X n/2 or (G) m) 0.Hence, if n is large enough, there exists some graph G with n vertices, withfewer than n/2 short cycles, and with (G) < m.

    Construct G by deleting one vertex from each short cycle of G. Theng(G) , and |G| n n/2 = n/2. Also, (G) (G) < m. Thus

    (G) |G|

    (G) n/2

    m n/2

    n112

    =1

    2n

    12 ,

    which is larger than k if n is large enough.

    End L3 Start L4

    Our final example of the first moment method will be something ratherdifferent.

    Example 1.6 (Antichains).

    A set system on V is just a collection A of subsets of V, i.e., a subset of thepower set P(V). Thus A corresponds to the edge set of a hypergraph on V;although the concepts are the same, the viewpoints tend to be slightly different.

    The r-th layer Lr of P(V) is just {A V : |A| = r}.We often think of an arrow from A to B whenever A, B P(V) with A B

    and A = B. A chain is a sequence A1, . . . , At of subsets of V with A1 A2 At, i.e., each subset is strictly contained in the next. An antichain is justa collection of sets A1, . . . , Ar with no arrows, i.e., no Ai contained in any otherAj .

    Suppose that |V| = n. How many sets can a chain contain? At most n + 1,since we can have at most one set from each layer. Also, every maximal chainC is of the form (C0, C1, . . . , C n) with Cr Lr; in other words, we start fromC0 =

    and add elements one-by-one in any order, finishing with Cn = V.

    A harder question is: how many sets can an antichain A P(V) contain?An obvious candidate for a large antichain is the biggest layer Lr.

    Theorem 1.10 (LYM Inequality). LetA P(V) be an antichain, where |V| =n. Then

    nr=0

    |A Lr||Lr|

    1.

    9

  • 7/29/2019 Probabilistic Combinatorics

    10/42

    Proof. Let C = (C0, C1, . . . , C n) be a maximal chain chosen uniformly at ran-

    dom. Since Cr is uniformly random from Lr, we have

    P(Cr A) = |A Lr||Lr|

    .

    Now |C A| 1 otherwise for two sets A, B in the intersection, since they arein C we have we have A B or B A, which is impossible for two sets in A.Thus

    1 E[|C A|] =n

    r=0

    P(Cr A) =n

    r=0

    |A Lr||Lr|

    .

    Corollary 1.11 (Sperners Theorem). If |V| = n and A

    P(V) is an an-

    tichain, then |A| maxr |Lr| = nn/2.Proof. Exercise.

    2 The second moment method

    Suppose (Xn) is a sequence of random variables, each taking non-negative in-teger values. IfE[Xn] 0 as n , then we have P(Xn > 0) = P(Xn 1) E[Xn] 0. Under what conditions can we show that P(Xn > 0) 1? SimplyE[Xn] is not enough: it is easy to find examples where E[Xn] , butP(Xn = 0) 1. Rather, we need some control on the difference between Xnand E[Xn].

    Definition. The variance Var[X] of a random variable X is defined by

    Var[X] = E

    (X EX)2 = EX2 EX2.We recall a basic fact from probability.

    Lemma 2.1 (Chebyshevs Inequality). LetX be a random variable and t 0.Then

    P

    |X EX| t Var[X]t2

    .

    Proof. We have

    P

    |X EX| t = P(X EX)2 t2.Applying Markovs inequality this is at most E

    (XEX)2/t2 = Var[X]/t2.

    In practice, we usually use this as follows.

    Corollary 2.2. Let(Xn) be a sequence of random variables withE[Xn] = n >0 and Var[Xn] = o(2n). ThenP(Xn = 0) 0.

    10

  • 7/29/2019 Probabilistic Combinatorics

    11/42

    Proof.

    P(Xn = 0) P|Xn n| n Var[Xn]2n 0.Remark. The mean = E[X] is usually easy to calculate. Since Var[X] =E[X2] 2, this means that knowing the variance is equivalent to knowing thesecond momentE[X2]. In particular, with n = E[Xn], the condition Var[Xn] =o(2n) is exactly equivalent to E[X

    2n] = (1 + o(1))

    2n, i.e., E[X

    2n] 2n:

    Var[Xn] = o(2n) E[Xn]2 2n.

    Sometimes the second moment is more convenient to calculate than the variance.

    Suppose that X = I1 + + Ik, where each Ii is the indicator function ofsome event Ai. We have seen that E[X] is easy to calculate; E[X2] is not too

    much harder:

    E[X2] = E

    i

    Iij

    Ij

    = E

    i

    j

    IiIj

    =i

    j

    E[IiIj ] =k

    i=1

    kj=1

    P(AiAj).

    Example 2.1 (K4s in G(n, p)).

    Theorem 2.3. Letp = p(n) be a function of n.

    1. If n4p6 0 as n , thenP(G(n, p) contains a K4) 0.2. If n4p6 as n , thenP(G(n, p) contains a K4) 1.

    Proof. Let X (really Xn, as the distribution depends on n) denote the numberof K4s in G(n, p). For each set S of 4 vertices from [n], let AS be the event thatS spans a K4 in G(n, p). Then

    = E[X] =S

    P(AS) =

    n

    4

    p6 =

    n(n 1)(n 2)(n 3)4!

    p6 n4p6

    24.

    In case 1 it follows that E[X] 0, so P(X > 0) 0, as required.For the second part of the result, we have E[X2] =

    S

    T P(ASAT). The

    contributions from all terms where S and T meet in a given number of verticesare as follows:

    |S T| contribution

    0

    n4

    n44

    p12 n424 n4

    24p12 2

    1n4

    4n43

    p12 = (n7p12)

    2n4

    42

    n42

    p11 = (n6p11)

    3n4

    43

    n41

    p9 = (n5p9)

    4n4

    p6 =

    11

  • 7/29/2019 Probabilistic Combinatorics

    12/42

    Recall that by assumption n4p6 , so and the last contribution is o(

    2

    ). How do the other contributions compare to

    2

    ? Firstly, since2 = (n8p12), we have n7p12 = o(2). For the others, note that n4p6 0,i.e., p6 = o(n4), so p1 = o(n2/3). Thus

    n6p11

    n8p12= n2p1 = o(n2n2/3) = o(1),

    andn5p9

    n8p12= n3p3 = o(n3n2) = o(1).

    Putting this all together, E[X2] = 2 + o(2), so Var[X] = o(2), and byCorollary 2.2 we have P(X = 0) 0.

    End L4 Start L5

    Remark. Sometimes one considers the 2nd factorial moment of X, namelyE2[X] = E[X(X1)]. IfX is the number of events Ai that hold, then X(X1)is the number of ordered pairs of distinct events that hold, so

    E2[X] =i

    j=i

    P(Ai Aj).

    Since X2 = X(X 1) + X, we have E[X2] = E2[X] + E[X], so if we know themean, then knowing any ofE[X2], E2[X] or Var[X] lets us easily calculate theothers.

    When applying the second moment method, our ultimate aim is alwaysto estimate the variance, showing that it is small compared to the square of

    the mean, so Corollary 2.2 applies. So far we first calculated E[X2], due tothe simplicity of the formula

    i

    j P(Ai Aj). However, this involves some

    unnecessary work when many of the events are independent. We can avoidthis by directly calculating the variance.

    Suppose as usual that X = I1 + . . . + Ik, with Ii the indicator function ofAi. Then

    Var[X] = E[X2] (E[X])2

    =i

    j

    P(Ai Aj)

    i

    P(Ai)

    j

    P(Aj)

    =i

    j

    P(Ai Aj) P(Ai)P(Aj)

    .

    Write i j if i = j and Ai and Aj are dependent. The contribution from termswhere Ai and Aj are independent is zero by definition, so

    Var[X] =i

    P(Ai) P(Ai)2

    +i

    ji

    P(Ai Aj) P(Ai)P(Aj)

    E[X] +i

    ji

    P(Ai Aj).

    12

  • 7/29/2019 Probabilistic Combinatorics

    13/42

    Note that the second last line is an exact formula for the variance. The last

    line is just an upper bound, but this upper bound is often good enough.The bound above gives another standard way of applying the 2nd momentmethod.

    Corollary 2.4. If = E[X] and =iji P(Ai Aj) = o(2), thenP(X > 0) 1.Proof. We have

    Var[X]

    2 +

    2=

    1

    +

    2 0.

    Now apply Chebyshevs inequality in the form of Corollary 2.2.

    Definition. Let P be a property of graphs (e.g., contains a K4). A functionp(n) is called a threshold function for P in the model G(n, p) if

    p(n) = o(p(n)) implies that P(G(n, p(n)) has P) 0, and p(n)/p(n) implies that P(G(n, p(n)) has P) 1.Theorem 2.3 says that n2/3 is a threshold function for G(n, p) to contain a

    K4. Note that threshold functions are not quite uniquely defined (e.g., 2n2/3

    is also one). Also, not every property has a threshold function, although manydo.

    Example 2.2 (Small subgraphs in G(n, p)). Fix a graph H with v vertices ande edges. What is the threshold for copies of H to appear in G(n, p)? (Here His small in the sense that its size is fixed as n .)

    Let X be the number of copies of H in G(n, p). There are n(n

    1) (n

    v + 1) injective maps : V(H) [n]. Two maps , give the same copy ifand only if = for some automorphism of H. Thus there are

    n(n 1) (n v + 1)aut(H)

    possible copies, where aut(H) is the number of automorphisms of H. (Forexample, aut(Cr) = 2r.) It follows that

    E[X] =n(n 1) (n v + 1)

    aut(H)pe n

    vpe

    aut(H)= (nvpe).

    This suggests that the threshold should be p = nv/e.Unfortunately, this cant quite be right. Consider for example H to be a K4

    with an extra edge hanging off, so v = 5 and e = 7. Our proposed thresholdis p = n5/7, which is much smaller than p = n2/3. Consider the rangein between, where p/n5/7 but p/n2/3 0. Then E[X] , but theprobability that G(n, p) contains a K4 tends to 0, so the probability that G(n, p)contains a copy of H tends to 0.

    The problem is that H contains a subgraph K4 which is hard to find, becauseits e/v ratio is larger than that of H.

    13

  • 7/29/2019 Probabilistic Combinatorics

    14/42

    Definition. The edge density d(H) of a graph H is e(H)/|H|, i.e., 1/2 times

    the average degree of H.Definition. H is balanced if = H H implies d(H) d(H), and strictlybalanced if = H H and H = H implies d(H) < d(H).

    Examples of strictly balanced graphs are complete graphs, trees, and con-nected regular graphs. Examples of balanced but not strictly balanced graphsare disconnected regular graphs, or a triangle with an extra edge attached (mak-ing four vertices and four edges).

    For balanced graphs, p = nv/e does turn out to be the threshold.

    Theorem 2.5. Let H be a balanced graph with |H| = v and e(G) = e. Thenp(n) = nv/e is a threshold function for the property G(n, p) contains a copyof H.

    Proof. Let X denote the number of copies of H in G(n, p), and set = EX, so = (nvpe). Ifp/nv/e 0 then 0, so P(X 1) 0.

    Suppose that p/nv/e , i.e., that nvpe .End L5 Start L6

    Let H1, . . . , H N list all possible copies of H with vertices in [n], and let Aidenote the event that the ith copy Hi is present in G = G(n, p). Let

    =i

    ji

    P(Ai Aj) =i

    ji

    P(Hi Hj G).

    Note that in all terms above, the graphs Hi and Hj share at least one edge.

    We split the sum by the number r of vertices of Hi Hj and the number sof edges of Hi Hj . Note that Hi Hj is a non-empty subgraph of Hi, whichis isomorphic to H. Since H is balanced, we thus have

    s

    r= d(Hi Hj) d(H) = e

    v,

    so s re/v.The contribution to from terms with given r and s is

    n2vrp2es

    =

    2/(nrps)

    .

    Nownrps

    nrpre/v = (nvpe)r/v = (r/v).

    Since and r/v > 0, it follows that nrps , so the contribution fromthis pair (r, s) is o(2).

    Since there are only a fixed number of pairs to consider, it follows that = o(2). Hence by Corollary 2.4, P(X > 0) 1.

    In general the threshold is n1/d(H), where H is the densest subgraph of

    H. The proof is almost the same.

    14

  • 7/29/2019 Probabilistic Combinatorics

    15/42

    Remark. If H is strictly balanced and p = cnv/e, then tends to a constant

    and the rth factorial momentEr[X] =

    E

    [X(X 1) (X r + 1)] satisfiesEr[X] r, from which one can show that the number of copies of H hasasymptotically a Poisson distribution. We shall not do this.

    3 The ErdosLovasz local lemma

    Suppose that we have some bad events A1, . . . , An, and we want to show thatits possible that no Ai holds, no matter how unlikely. If

    P(Ai) < 1 then the

    union bound gives what we want. What about ifP(Ai) is large? In general

    of course it might be that

    Ai always hold. One trivial case where we can rulethis out is when the Ai are independent. Then

    P (Aci ) =P(Aci ) =ni=1

    (1 P(Ai)) > 0,

    provided each Ai has probability less than 1.What if each Ai depends only on a few others?Recall that A1, . . . , An are independent if for all disjoint S, T [n] we have

    P

    iS

    Ai iT

    Aci

    =iS

    P(Ai)iT

    P(Aci ).

    This is not the same as each pair of events being independent (see below).

    Definition. An event A is independent of a set {B1, . . . , Bn} of events if for

    all disjoint S, T [n] we haveP

    A iS

    Bi iT

    Bci

    = P(A),

    i.e., if knowing that certain Bi hold and certain others do not does not affectthe probability that A holds.

    For example, suppose that each of the following four sequences of coin tosseshappens with probability 1/4: TTT, THH, HTH and HHT. Let Ai be theevent that the ith toss is H. Then one can check that any two events Ai areindependent, but {A1, A2, A3} is not a set of independent events. Similarly, A1is not independent of {A2, A3}, since P(A1 | A2 A3) = 0.

    Recall that a digraph on a vertex set V is a set of ordered pairs of distinctelements of V, i.e., a graph in which each edge has an orientation, there areno loops, and there is at most one edge from a given i to a given j, but we mayhave edges in both directions. We write i j if there is an edge from i to j.Definition. A digraph D on [n] is called a dependency digraph for the eventsA1, . . . , An if for each i the event Ai is independent of the set of events {Aj :j = i, i j}.

    15

  • 7/29/2019 Probabilistic Combinatorics

    16/42

    Theorem 3.1 (Local Lemma, general form). Let D be a dependency digraph

    for the events A1, . . . , An. Suppose that there are real numbers 0 < xi < 1 suchthatP(Ai) xi

    j : ij(1 xj)

    for eachi. Then

    P

    ni=1

    Aci

    ni=1

    (1 xi) > 0.

    Roughly speaking, Ai is allowed to depend on Aj when i j. Moreprecisely, Ai must be independent of the remaining Aj as a set, not just indi-vidually.

    Proof. We claim that for any subset S of [n] and any i /

    S we have

    P

    Aci

    jSAcj

    1 xi, (1)

    i.e., that

    P

    Ai

    jSAcj

    xi. (2)

    Assuming the claim, then

    P

    n

    i=1Aci

    = P(Ac1)P(A

    c2 | A

    c1)P(A

    c3 | A

    c1 Ac2)

    (1 x1)(1 x2)(1 x3)

    =ni=1

    (1 xi).

    It remains to prove the claim. For this we use induction on |S|.For the base case |S| = 0 we have

    P

    Ai

    jSAcj

    = P(Ai) xi

    j : ij(1 xj) xi,

    as required.

    Now suppose the claim holds whenever |S| < r, and consider S with |S| = r,and i / S. Let S1 = {j S : i j} and S0 = S\ S1 = {j S : i j}, andconsider B =

    jS1 Acj and C =

    jS0 Acj .

    The left-hand side of (2) is

    P(Ai | B C) = P(Ai B C)P(B C) =

    P(Ai B C)P(C)

    P(C)

    P(B C) =P(Ai B | C)P(B | C)

    .

    (3)

    16

  • 7/29/2019 Probabilistic Combinatorics

    17/42

    To bound the numerator, note that P(Ai B | C) P(Ai | C) = P(Ai), sinceAi is independent of the set of events {Aj : j S0}. Hence, by the assumptionof the theorem,

    P(Ai B | C) P(Ai) xi

    j : ij(1 xj). (4)

    For the denominator in (3), write S1 as {j1, . . . , ja} and S0 as {k1, . . . , kb}. Then

    P(B | C) = P(Acj1 Acja | Ack1 Ackb)

    =

    at=1

    P(Acjt | Ack1 Ackb Acj1 Acjt1).

    In each conditional probability in the product, we condition on the intersection

    of at most a + b 1 = r 1 events, and Ajt is not one of them, so (1) applies,and thus

    P(B | C) a

    t=1

    (1 xjt)

    =jS1

    (1 xj)

    j : ij(1 xj)

    since S1 {j : i j}. Together with (3) and (4) this gives P(Ai | B C) xi,which is exactly (2). This completes the proof by induction.

    End L6 Start L7

    Dependency digraphs are slightly slippery. First recall that given the eventsA1, . . . , An, we cannot construct D simply by taking i j if Ai and Aj aredependent. Considering three events such that each pair is independent but{A1, A2, A3} is not, a legal dependency digraph must have at least one edgefrom vertex 1 (since A1 does depend on {A2, A3}), and similarly from eachother vertex.

    The same example shows that (even minimal) dependency digraphs are notunique: in this case there are 8 minimal dependency digraphs.

    There is an important special case where dependency digraphs are easy toconstruct; we state it as a trivial lemma.

    Lemma 3.2. Suppose that (X)F is a set of independent random variables,and that A1, . . . , An are events where Ai is determined by {X : Fi} forsome Fi F. Then the (di)graph in which i j and j i if and only ifFi Fj = is a dependency digraph for A1, . . . , An.Proof. For each i, the events {Aj : j = i, i j} are (jointly) determined by thevariables (X : F\Fi), and Ai is independent of this family of variables.

    17

  • 7/29/2019 Probabilistic Combinatorics

    18/42

    We now turn to a more user friendly version of the local lemma. The out-

    degree of a vertex i in a digraph D is simply the number of j such that i j.Theorem 3.3 (Local Lemma, Symmetric version). Let A1, . . . , An be eventshaving a dependency digraph D with maximum out-degree at most d. IfP(Ai) p for all i and ep(d + 1) 1, thenP(Aci ) > 0.Proof. Set xi = 1/(d + 1) for all i and apply Theorem 3.1. We have |{j : i j}| d, so

    xi

    j : ij(1 xj) 1

    d + 1

    d

    d + 1

    d 1

    e(d + 1) p P(Ai),

    and Theorem 3.1 applies.

    Remark. Considering d +1 disjoint events with probability 1/(d +1) shows thatthe constant (here e) must be at least 1. In fact, the constant e is best possiblefor large d.

    Example 3.1 (Hypergraph colouring).

    Theorem 3.4. LetH be an r-uniform hypergraph in which each edge meets atmost d other edges. If d + 1 2r1/e then H has a 2-colouring.Proof. Colour the vertices randomly in the usual way, each red/blue with prob-ability 1/2, independently of the others. Let Ai be the event that the ith edgeei is monochromatic, so P(Ai) = 21

    r = p.By Lemma 3.2 we may form a dependency digraph for the Ai by joining i to

    j (both ways) ifei and ej share one or more vertices. The maximum out-degreeis at most d by assumption, and

    ep(d + 1) e21r 2r

    2e= 1,

    so Theorem 3.3 gives P(Aci ) > 0, i.e., there exists a good colouring.Example 3.2 (Ramsey numbers again).

    Theorem 3.5. If k 3 and e21(k2)k2 nk2 1 then R(k, k) > n.Proof. Colour the edges of Kn as usual, each red or blue with probability 1/2,independently of each other. For each S [n] with |S| = k let AS be the event

    that the complete graph on S is monochromatic, so P(AS) = 21

    (k2)

    .For the dependency digraph, by Lemma 3.2 we may join S and T if theyshare an edge, i.e., if |S T| 2. The maximum degree d is

    d = |{T : |S T| 2}| n. The conditions are satisfied provided wehave

    p3 x(1 x)3n(1 y)nk (5)

    and (1 p)(k2) y(1 x)(k2)n(1 y)nk . (6)It turns out that

    p =1

    6

    nx =

    1

    12n3/2k = 30

    n log n y = nk

    satisfies (5) and (6) if k is large enough. This gives the following result.

    19

  • 7/29/2019 Probabilistic Combinatorics

    20/42

    Theorem 3.7. There exists a constant c > 0 such that R(3, k) ck2/(log k)2

    if k is large enough.Proof. The argument above shows that R(3, 30

    n log n) > n. Rearrange this

    to get the result.

    Remark. This bound is best possible apart from one factor of log k. Removingthis factor is not easy, and was a major achievement of J.H. Kim.

    4 The Chernoff bounds

    Generally speaking, we are ultimately interested in whether the random graphG(n, p) has some property almost always (with probability tending to one asn ), or almost never. For example, this is enough to allow us to show theexistence of graphs with various combinations of properties, using the fact thatif two or three properties individually hold almost always, then their intersectionholds almost always. Sometimes, however, we need to consider a number k ofproperties (events) that tends to infinity as n . This means that we wouldlike tighter bounds on the probability that individual events fail to hold.

    For example, let G = G(n, p) and consider its maximum degree (G). Forany d we have P((G) d) nP(dv d) where v is any given vertex. In turnthis is at most nP(X d) where X Bi(n, d). To show that P((G) d) 0for some d = d(n) we would need a bound of the form

    P(X d) = o(1/n). (7)Recall that if X Bi(n, p) then = E[X] = np and 2 = Var[X] = np(1 p).(For example, if p = 1/2 then = n/2 and = n/2.) Chebyshevs inequalitygives P(X +) 2; to use this for (7) we need much larger than n.If p = 1/2 this gives n, which is useless.

    On the other hand, the central limit theorem suggests that

    P(X + ) = P

    X

    P(N(0, 1) ) e2/2,

    where N(0, 1) is the standard normal distribution. But this is only valid for constant as n , so again it is no use for (7).

    Our next aim is to prove a bound similar to the above, but valid no matterhow depends on n.

    Theorem 4.1. Suppose that n

    1 and p, x

    (0, 1). Let X

    Bi(n, p). Then

    P(X nx) p

    x

    x 1 p1 x1xn

    if x p,

    and

    P(X nx) p

    x

    x 1 p1 x1xn

    if x p.

    20

  • 7/29/2019 Probabilistic Combinatorics

    21/42

    Note that the exact expression is in some sense not so important; what

    matters is (a) the proof technique, and (b) that it is exponential in n if x andp are fixed.

    Proof. The idea is simply to apply Markovs inequality to the random variableetX for some number t that we will choose so as to optimize the bound.

    Consider X as a sum X1 + . . . + Xn where the Xi are independent withP(Xi = 1) = p and P(Xi = 0) = 1 p. Then

    E[etX ] = E[etX1etX2 etXn ]

    = E[etX1 ] E[etXn ]

    = E[etX1 ]n

    = (pet + (1 p)e0)n,

    where we used independence for the second equality.For any t > 0, using the fact that y ety is increasing and Markovs

    inequality, we have

    P(X nx) = P(etX etnx) (8) E[etX ]/etnx= [(pet + 1 p)etx]n.

    To get the best bound we minimize over t (by differentiating and equating tozero).

    For x > p, it turns out that the minimum occurs when

    et = xp 1 p1 x 1,

    so t > 0 (otherwise the argument above is invalid), and we get

    P(X nx)

    x1 p1 x + 1 pp

    x

    x1 x1 pxn

    =

    px

    x 1 p1 x1xn

    ,

    proving the first part of the theorem. (For x = p we get t = 0 which works too,but when x = p the result is trivial in any case.)

    For the second part, argue similarly but with t < 0 at the minimum, so ety

    is a decreasing function, and (8) becomes P(X nx) = P(etX etnx), or letY = n

    X, so Y

    Bi(n, 1

    p). Then P(X

    nx) = P(Y

    n(1

    x)), and

    apply the first part.

    Remark. Theorem 4.1 gives the best possible bound among bounds of the formP(X nx) g(x, p)n where g(x, p) is some function of x and p.

    In the form above, the bound is a little hard to use. Here are some morepractical forms.

    21

  • 7/29/2019 Probabilistic Combinatorics

    22/42

    Corollary 4.2. LetX Bi(n, p). Then for h,t > 0P

    X np + nh e2h2nand

    P

    X np + t e2t2/n.Also, for 0 1 we have

    P

    X (1 + )np e2np/4and

    P

    X (1 )np e2np/2.Proof. Fix p. For x > p or x < p Theorem 4.1 gives P(X nx) ef(x)n orP(X

    nx)

    ef(x)n, where

    f(x) = x log

    x

    p

    + (1 x)log

    1 x1 p

    .

    We aim to bound f(x) from below by some simpler function. Note that f(p) = 0.Also,

    f(x) = log x logp log(1 x) + log(1 p),so f(p) = 0 and

    f(x) =1

    x+

    1

    1 x .If f(x) a for all a between p and p + h then (e.g., by Taylors Theorem) weget f(p + h) ah2/2.

    Now for any x we have f

    (x)

    infx>0

    {1/x + 1/(1

    x)} = 4, so f(p + h)2h2, giving the first bound; the second is the same bound in different notation,

    setting t = nh.For the third bound, if p x p(1 + ) 2p then f(x) 1/x 1/(2p),

    giving f(p + p) 2p22 12p , which gives the result.For the final bound, if 0 < x p then f(x) 1/x 1/p, giving f(pp)

    2p2

    21p .

    End L8 Start L9

    Remark. Recall that =

    np(1 p), so when p is small then np np.The central limit theorem suggests that the probability of a deviation this largeshould be around e

    2np/2 as in the final bound above. The second bound is

    weaker, but has the advantage of being correct!In general, think of the bounds as of the form ec

    2

    for the probability ofbeing standard deviations away from the mean. Alternatively, deviations onthe scale of the mean are exponentially unlikely.

    The Chernoff bounds apply more generally than just to binomial distribu-tions; they apply to other sums of independent variables where each variablehas bounded range.

    22

  • 7/29/2019 Probabilistic Combinatorics

    23/42

    Example 4.1 (The maximum degree of G(n, p)).

    Theorem 4.3. Let p = p(n) satisfy np 10 log n, and let be the maximumdegree of G(n, p). Then

    P np + 3

    np log n 0

    as n .Proof. Let d = np + 3

    np log n. As noted at the start of the section,

    P( d) nP(dv d) nP(X d)

    where dv Bi(n 1, p) is the degree of a given vertex, and X Bi(n, p).Applying the second bound in Corollary 4.2 with = 3log n/(np)

    1, we

    have

    nP(X d) ne2np/4 = ne9(log n)/4 = nn9/4 = n5/4 0,

    giving the result

    Note that for large n there will be some vertices with degrees any givennumber of standard deviations above the average. The result says however thatall degrees will be at most C

    log n standard deviations above. This is best

    possible, apart from the constant.

    5 The phase transition in G(n, p)

    [ Summary of what we know about G(n, p) in various ranges; most interestingnear p = 1/n. ]

    5.1 Branching processes

    Let Z be a probability distribution on the non-negative integers. The GaltonWatson branching process with offspring distribution Z is defined as follows:

    Generation 0 consists of a single individual.

    Generation t + 1 consists of the children of individuals in generation t(formally, its the disjoint union of the sets of children of the individualsin generation t).

    The number of children of each individual has distribution Z, and is in-dependent of everything else, i.e., of the history so far, and of other indi-viduals in the same generation.

    23

  • 7/29/2019 Probabilistic Combinatorics

    24/42

    We write Xt for the number of individuals in generation t, and X = (X0, X1, . . .)

    for the random sequence of generation sizes. Note that X0 = 1, and given thatXt = k, then Xt+1 is the sum of k independent copies of Z.Let = EZ. Then EX0 = 1. Also E[Xt+1 | Xt = k] = k. Thus

    E[Xt+1] =k

    P(Xt = k)E[Xt+1 | Xt = k]

    =k

    P(Xt = k)k = E[Xt].

    Hence E[Xt] = t for all t.

    The branching process survives if Xt > 0 for all t, and dies out or goesextinct if Xt = 0 for some t.

    If = E[Z] < 1, then for any t we have

    P(X survives) P(Xt > 0) E[Xt] = t.

    Letting t shows that P(X survives) = 0.What if > 1? Note that any branching process with P(Z = 0) > 0 may

    die out the question is, can it survive?We recall some basic probabilities of probability generating functions.

    Definition. If Z is a random variable taking non-negative integer values, theprobability generating function of Z is the function fZ : [0, 1] R defined by

    fZ(x) = E[xZ] =

    k=0P(Z = k)xk.

    The following facts are easy to check

    fZ is continuous on [0, 1].

    fZ(0) = P(Z = 0) and fZ(1) = 1.

    fZ is increasing, with fZ(1) = 1.

    IfP(Z 2) > 0, then fZ is strictly convex.For the last two observations, note that for 0 < x 1 we have

    fZ(x) =

    k=1

    kP(Z = k)xk1

    0,

    andfZ(x) =k2

    k(k 1)P(Z = k)xk2 0,

    with strict equality if P(Z 2) > 0. (We are implicitly assuming that E[Z]exists.)

    24

  • 7/29/2019 Probabilistic Combinatorics

    25/42

    End L9 Start L10

    Let t = P(Xt = 0). Then 0 = 0 and

    t+1 =k

    P(Xt+1 = 0 | X1 = k)P(X1 = k) =k

    kt P(Z = k) = fZ(t),

    since, given the number of individuals in the first generation, the descendantsof each of them form an independent copy of the branching process.

    Let XZ denote the GaltonWatson branching process with offspring dis-tribution Z. Let = (Z) denote the extinction probability of XZ, i.e., theprobability that the process dies out.

    Theorem 5.1. For any probability distribution Z on the non-negative integers,(Z) is equal to the smallest non-negative solution to fZ(x) = x.

    Proof. As above, let t = P(Xt = 0), so 0 = 0

    1

    2 . Since the events

    {Xt = 0} are nested and their union is the event that the process dies out, wehave t as t .

    As shown above, t+1 = fZ(t). Since fZ is continuous, taking the limit ofboth sides gives = fZ(), so is a non-negative solution to fZ(x) = x.

    Let x0 be the smallest non-negative solution to fZ(x) = x, so x0 . Then0 = 0 x0. Since fZ is increasing, this gives

    1 = f(0) f(x0) = x0.Similarly, by induction we obtain t x0 for all t, so taking the limit, x0,and hence = x0.

    Corollary 5.2. IfE[Z] > 1 then(Z) < 1, i.e., the probability that XZ survivesis positive. IfE[Z] < 1 then(Z) = 1.

    Proof. The question is simply whether the curves fZ(x) and x meet anywherein [0, 1] other than at x = 1; sketch the graphs!

    For the first statement, suppose for convenience that E[Z] is defined. ThenfZ(1) > 1, so there exists > 0 such that fZ(1 ) < 1 . Since fZ(0) 0,applying the Intermediate Value Theorem to fZ(x) x, there must be somex [0, 1 ] for which fZ(x) = x. But then x 1 < 1.

    We have already proved the second; it can be reproved using the convexityof fZ.

    Note that when P(Z 2) > 0 and E[Z] > 1, then there is a unique solutionto fZ(x) = x in [0, 1); this follows from the strict convexity of fZ.

    Remark. We have left out the case E[Z] = 1; it is an exercise to show that if

    E[Z] = 1 then, except in a trivial case, the process always dies.

    Definition. For c > 0, a random variable Z has the Poisson distribution withmean c, written Z Po(c), if

    P(Z = k) =ck

    k!ec

    for k = 0, 1, 2, . . ..

    25

  • 7/29/2019 Probabilistic Combinatorics

    26/42

    Basic properties (exercise): E[Z] = c, E2[Z] = c2. (More generally, the rth

    factorial moment isEr[Z] = c

    r

    ).Lemma 5.3. Suppose n and p 0 with np c where c > 0 is constant.Let Zn have the binomial distribution Bi(n, p), and let Z Po(c). Then Znconverges in distribution to Z, i.e., for each k, P(Zn = k) P(Z = k).Proof. For k fixed,

    P(Zn = k) =

    n

    k

    pk(1 p)nk n

    k

    k!pk(1 p)n = (np)

    k

    k!enp+O(np

    2) ck

    k!ec,

    since np c and np2 0.As we shall see shortly, there is a very close connection between components

    in G(n, c/n) and the GaltonWatson branching process XPo(c) where the off

    -spring distribution is Poisson with mean c. The extinction probability of thisprocess will be especially important.

    Theorem 5.4. Let c > 0. Then the extinction probability = (c) of thebranching process XPo(c) satisfies the equation

    = ec(1).

    Furthermore, < 1 if and only if c > 1.

    Proof. The probability generating function of the Poisson distribution withmean c is given by

    f(x) = k=0

    ck

    k!ecxk = ecxec = ec(x1) = ec(1x).

    The result now follows from Theorem 5.1.

    5.2 Component exploration

    In the light of Lemma 5.3, it is intuitively clear that the Poisson branchingprocess gives a good local approximation to the neighbourhood of a vertex ofG(n, c/n). To make this precise, we shall explore the component of a vertexin a certain way. First we describe the (simpler) exploration for the branchingprocess.

    Exploration process (for branching process).Start with Y0 = 1, meaning one live individual (the root). At each step,select a live individual; it has Zt children and then dies. (So several steps maycorrespond to a single generation.) Let Yt be the number of individuals aliveafter t steps. Then

    Yt =

    Yt1 + Zt 1 if Yt1 > 00 if Yt1 = 0.

    26

  • 7/29/2019 Probabilistic Combinatorics

    27/42

    The process dies out if Ym = 0 for some m; in this case the total number of

    individuals is min{m : Ym = 0}.Until it hits zero, the sequence (Yt) is a random walk with i.i.d. steps Z1 1, Z2 1, . . . taking values in {1, 0, 1, 2, . . .}. Each step has expectation E[Z1] = 1, so < 1 implies negative drift and the walk will hit 0, i.e., theprocess will die, with probability 1. If > 1 then the drift is positive, so theprobability of never hitting 0 is positive, i.e., the probability that the processsurvives is positive.

    Component exploration in G(n, p).Let v be a fixed vertex of G = G(n, p). At each stage, each vertex u of G

    will be live, unreached, or processed. Yt will be the number of live verticesafter t steps; there will be exactly t processed vertices, and n t Yt unreachedvertices.

    At t = 0, v is live and all other vertices are unreached, so Y0 = 1.At each step t, pick a live vertex w, if there is one. For each unreached w,check whether ww E(G); if so, make w live. After completing these checks,set w to be processed.

    Let Rt be the number of w which become live during step t. (Think of thisas the number of vertices Reached in step t.) Then

    Yt =

    Yt1 + Rt 1 if Yt1 > 00 if Yt1 = 0.

    The process stops at the first m for which Ym = 0, and it is easy to checkthat we have reached all vertices in the component Cv of G containing v. Inparticular, |Cv| = m.

    End L10 Start L11

    So far, G could be any graph. Now suppose that G = G(n, p). Then eachedge is present with probability p independently of the others. No edge is testedtwice (we only check edges from live to unreached vertices, and then one endbecomes processed). Since n (t 1) Yt1 vertices are unreached at the startof step t, it follows that given Y0, . . . , Y t1, the number Rt of vertices reachedin step t has the distribution

    Rt Bi(n (t 1) Yt1, p). (9)

    5.3 Vertices in small components

    Given a vertex v of a graph G, let Cv denote the component of G containing v,and |Cv| its size, i.e., number of vertices.

    Let k(c) denote the probability that |XPo(c)| = k, where |X| =

    t0 Xt de-notes the total number of individuals in all generations of the branching processX.

    27

  • 7/29/2019 Probabilistic Combinatorics

    28/42

    Lemma 5.5. Suppose that p = p(n) satisfies np c where c > 0 is constant.Let v be a given vertex of G(n, p). For each constant k we have

    P(|Cv| = k) k(c)as n .Proof. The idea is simply to show that the random walks (Yt) and (Yt) havealmost the same probability of first hitting zero at t = k. We do this bycomparing the probabilities of individual trajectories.

    Define (Yt) and (Rt) as in the exploration above. Then |Cv| = k if and onlyif Yk = 0 and Yt > 0 for all t < k. Let Sk be the set of all possible sequencesy = (y0, . . . , yk) of values for Y = (Y0, . . . , Y k) with this property. (I.e., y0 = 1,yk = 0, yt > 0 for t < k, and each yt is an integer with yt yt1 1.) Then

    P(|Cv| = k) =ySk

    P(Y = y).

    Similarly,

    k(c) = P

    |XPo(c)| = k

    =ySk

    P(Y = y).

    Fix any sequence y Sk. For each t let rt = yt yt1 + 1, so (rt) is thesequence of Rt values corresponding to Y = y. From (9) we have

    P(Y = y) =k

    t=1

    P

    Bi(n (t 1) yt1, p) = rt

    .

    In each factor, t

    1, yt1 and rt are constants. As n

    we have n

    (t

    1) yt1 n, so (n (t 1) yt1)p c. Applying Lemma 5.3 to each factorin the product, it follows that

    P(Y = y) k

    t=1

    P

    Po(c) = rt

    .

    But this is just P(Y = y), from the exploration for the branching process.Summing over the finite number of possible sequences y gives the result.

    We write Nk(G) for the number of vertices of a graph G in components withk vertices. (So Nk(G) is k times the number of k-vertex components in G.)

    Corollary 5.6. Suppose that np

    c where c > 0 is constant. For each fixed k

    we haveENk(G(n, p)) nk(c) as n .Proof. The expectation is simply

    v P(|Cv| = k) = nP(|Cv| = k) nk(c).

    Lemma 5.5 tells us that the branching process predicts the expected numberof vertices in components of each fixed size k. It is not hard to use the secondmoment method to show that in fact this number is concentrated around itsmean.

    28

  • 7/29/2019 Probabilistic Combinatorics

    29/42

    Definition. Let (Xn) be a sequence of real-valued random variables and a a

    (constant) real number. Then Xn converges to a in probability, written Xnp

    a,if for all (fixed) > 0 we have P(|Xn a| > ) 0 as n .Lemma 5.7. Suppose thatE[Xn] a andE[X2n] a2. Then Xn p a.Proof. Note that Var[Xn] = E[X2n] (EXn)2 a2 a2 = 0, and apply Cheby-shev.

    In fact, whenever we showed that some quantity Xn was almost alwayspositive by using the second moment method, we really showed more, that

    Xn/E[Xn]p 1, i.e., that Xn is concentrated around its mean.

    Lemma 5.8. Let c > 0 and k be constant, and let Nk = Nk(G(n,c/n)). Then

    Nk/np

    k(c).

    Proof. We have already shown that E(Nk/n) k(c).Let Iv be the indicator function of the event that |Cv| = k, so Nk =

    v Iv

    andN2k =v

    w

    IvIw = A + B,

    whereA =v

    w

    IvIwI{Cv=Cw}

    is the part of the sum from vertices in the same component, and

    B =

    v wIvIwI{Cv=Cw}

    is the part from vertices in different components. [Note that we can split thesum even though its random whether a particular pair of vertices are in thesame component or not.]

    If Iv = 1, then |Cv| = k, so

    w IwI{Cv=Cw} = k. Hence A = kNk kn,and E[A] = o(n2).

    Since all vertices v are equivalent, we can rewrite E[B] as

    nP(|Cv| = k)E

    w

    IwI{Cv=Cw} |Cv| = k

    where v is any fixed vertex. Now

    w IwI{Cv=Cw} is just Nk(GCv), the number

    of vertices of G

    Cv in components of size k. Exploring Cv as before, given

    that |Cv| = k we have not examined any of the edges among the n k verticesnot in Cv, so G Cv has the distribution of G(n k,c/n). Since n k n,Lemma 5.5 gives

    E[B] nP(|Cv| = k)(n k)k(c) (nk(c))2.Hence, E[N2k ] = E[A] + E[B] (nk(c))2, i.e., E[(Nk/n)2] k(c)2.

    Lemma 5.7 now gives the result.

    29

  • 7/29/2019 Probabilistic Combinatorics

    30/42

    End L11 Start L12

    Let NK(G) denote the number of vertices v of G with |Cv| K, and letK(c) = P

    |XPo(c)| K

    .

    With G = G(n, c/n), we have seen that for k fixed, Nk(G)/np k(c).

    Summing over k = 1, . . . , K , it follows that if K is fixed, then

    NK(G)n

    p K(c). (10)

    What if we want to consider components of sizes growing with n? Then wemust be more careful.

    Recall that = (c) denotes the extinction probability of the branchingprocess XPo(c), so

    k k(c) = . In other words,

    K(c) =Kk=1

    k(c) (c) as K .

    If c > 1, then Nn(G)/n = 1, while n(c) (c) < 1, so we cannot extendthe formula (10) to arbitrary K = K(n). But we can allow K to grow at somerate.

Lemma 5.9. Let c > 0 be constant, and suppose that k = k(n) satisfies k → ∞ and k ≤ n^{1/4}. Then the number N_{≤k} of vertices of G(n, c/n) in components with at most k vertices satisfies N_{≤k}/n →p η(c).

Proof. [Sketch; non-examinable] The key point is that since k → ∞, we have P(|X_{Po(c)}| ≤ k) → η(c).

We omit the comparison of the branching process and the graph in this case. For enthusiasts, one possibility is to repeat the proof of Lemma 5.5, being more careful with the estimates. One can use the fact that P(Bi(n - m, c/n) = k) and P(Po(c) = k) agree to within a factor 1 + O((k + m + 1)²/n) when k, m ≤ n^{1/4}, to show that all sequences y corresponding to the random walk ending by time k have essentially the same probability in the graph and branching-process explorations.

For each fixed k, we know almost exactly how many vertices are in components of size k. Does this mean that we know the whole component structure? Not quite: if c > 1, so η < 1, then Lemma 5.9 tells us that there are around (1 - η)n vertices in components of size at least n^{1/4}, say. But are these components really of around that size, or much larger?

To answer this, we return to the exploration process.

    5.4 The phase transition

Recall that our exploration of the component of vertex v in G(n, c/n) leads to a random walk (Y_t)_{t=0}^m with Y_0 = 1, Y_m = 0, and at each step Y_t = Y_{t-1} + R_t - 1 where, conditional on the process so far, R_t has the binomial distribution Bi(n - (t - 1) - Y_{t-1}, c/n).

Suppose first that c < 1. The distribution of R_t is dominated by a Bi(n, c/n) distribution. More precisely, we can define independent variables R^+_t ∼ Bi(n, c/n) so that R_t ≤ R^+_t holds for all t for which R_t is defined. [To be totally explicit, construct the random variables step by step. At step t, we want R_t to be Bi(x, c/n) for some x ≤ n that depends on what has happened so far. Toss x coins to determine R_t, and then n - x further coins, taking the total number of heads to be R^+_t.]

Let (Y^+_t) be the walk with Y^+_0 = 1 and increments R^+_t - 1, so Y_t ≤ Y^+_t for all t until our exploration in G(n, p) stops. Then for any k we have

P(|C_v| > k) ≤ P(Y_0, . . . , Y_k > 0) ≤ P(Y^+_0, . . . , Y^+_k > 0) ≤ P(Y^+_k > 0).

But Y^+_k has an extremely simple distribution:

Y^+_k + k - 1 = Σ_{i=1}^k R^+_i ∼ Bi(nk, c/n),

so

P(Y^+_k > 0) = P(Y^+_k + k - 1 ≥ k) = P(Bi(nk, c/n) ≥ k) = P(Bi(nk, c/n) ≥ ck + (1 - c)k).

Letting δ = min{(1 - c)/c, 1}, the Chernoff bound gives that this final probability is at most e^{-δ²ck/4}. If we set k = A log n with A > 8/(δ²c), then we have P(|C_v| > k) ≤ e^{-2 log n} = 1/n², proving the following result.

We say that an event (formally a sequence of events) holds with high probability, or whp, if its probability tends to 1 as n → ∞.

Theorem 5.10. Let c < 1 be constant. There is a constant A > 0 such that whp every component of G(n, c/n) has size at most A log n.

Proof. Define A as above. Then the expected number of vertices in components larger than A log n is at most n · n^{-2} = 1/n = o(1).
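Before turning to c > 1, it is worth recording the drift of the exploration walk explicitly (a small aside of ours): conditional on the history,

E[Y_t - Y_{t-1}] = (n - (t - 1) - Y_{t-1}) c/n - 1,

which is approximately c - 1 as long as only o(n) vertices have been used. For c < 1 this drift is negative, which is what the domination argument above exploits; for c > 1 it is positive as long as more than n/c vertices remain untouched, as described next.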

Now suppose that c > 1. Then our random walk has positive drift, at least to start with. Once the number n - t - Y_t of unexplored vertices becomes smaller than n/c, this is no longer true.

Fix any ε > 0, and let k^+ = (1 - 1/c - ε)n. Now let R^-_t be independent increments with the distribution Bi(n/c + εn, c/n), and let (Y^-_t) be the corresponding random walk. As long as we have used up at most k^+ vertices, our increments R_t dominate R^-_t. It follows that for any k ≤ k^+ we have

P(|C_v| = k) ≤ P(Y^-_k = 0) ≤ P(Y^-_k ≤ 0).


Once again, Y^-_k has a simple distribution: it is 1 - k + Bi(nk(c^{-1} + ε), c/n). Hence

P(Y^-_k ≤ 0) = P(Bi(nk(c^{-1} + ε), c/n) ≤ k - 1) ≤ P(Bi(nk(c^{-1} + ε), c/n) ≤ k).

The binomial has mean μ = k + εck, so k = (1 - δ)μ for δ = εc/(1 + εc), which is a positive constant. By the Chernoff bound, this probability is at most e^{-δ²μ/2} ≤ e^{-δ²k/2}.

Choose a constant A such that A > 6/δ² and let k^- = A log n. Then for k^- ≤ k ≤ k^+ we have

P(|C_v| = k) ≤ e^{-δ²k/2} ≤ e^{-δ²k^-/2} ≤ e^{-3 log n} = 1/n³.

Applying the union bound over k^- ≤ k ≤ k^+ and over all n vertices v, it follows that whp there are no vertices at all in components of size between k^- and k^+. In other words, whp all components are either small, i.e., of size at most k^- = O(log n), or large, i.e., of size at least k^+ = (1 - 1/c - ε)n.

From Lemma 5.9, we know that whp there are almost exactly ηn vertices in small components; hence there are almost exactly (1 - η)n vertices in large components.
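For example (a numerical aside of ours, not in the printed notes): for c = 2 the equation η = e^{-2(1-η)} gives η ≈ 0.203, so in G(n, 2/n) about 20% of the vertices lie in small components and about 80% in large ones, whp.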

Given a graph G, let L_i(G) denote the number of vertices in the ith largest component. Note that which component is the ith largest may be ambiguous, if there are ties, but the value of L_i(G) is unambiguous.

We close this section by showing that almost all vertices not in small components are in a unique giant component.

Theorem 5.11. Let c > 0 be constant, and let G = G(n, c/n). If c < 1, there is a constant A such that L_1(G) ≤ A log n holds whp. If c > 1, then L_1(G)/n →p 1 - η, and there is a constant A such that L_2(G) ≤ A log n holds whp.

Proof. We have almost completed the proof. The only remaining ingredient is to show in the c > 1 case that there cannot be two large components.

The simplest way to show this is just to choose ε so that 1 - 1/c - ε > (1 - η)/2. Then we do not have enough vertices in large components to have two or more large components. But is this possible? Such an ε exists if and only if 1 - 1/c > (1 - η)/2, i.e., if and only if η > 2/c - 1.

Recall that η = η(c) is the smallest solution to η = e^{-c(1-η)}. Furthermore, for 0 ≤ x < η we have x < e^{-c(1-x)}, while for η < x < 1 we have x > e^{-c(1-x)}. So what we have to show is that x = 2/c - 1 falls into the first case, i.e., that

2/c - 1 < e^{-c(1-(2/c-1))} = e^{2-2c}.

Multiplying by c, let f(c) = ce^{2-2c} + c - 2, so our task is to show f(c) > 0 for c > 1. This is easy by calculus: we have f(1) = 0, f'(1) = 0 (indeed f'(c) = (1 - 2c)e^{2-2c} + 1) and f''(c) > 0 for c > 1.

(In fact f''(c) = 4(c - 1)e^{2-2c}.)

End L12 Start L13


    6 Correlation and concentration

6.1 The Harris–Kleitman Lemma

In this section we turn to the following simple question and its generalizations. Does conditioning on G = G(n, p) containing a triangle make G more or less likely to be connected? Note that if we condition on a fixed set E of edges being present, then this is the same as simply adding E to G(n, p), which does increase the chance of connectedness. But conditioning on at least one triangle being present is not so simple.

Let X be any finite set, the ground set. For 0 ≤ p ≤ 1 let X_p be a random subset of X obtained by selecting each element independently with probability p. A property of subsets of X is just some collection A ⊆ P(X) of subsets of X. For example, the property 'contains element 1 or element 3' may be identified with the collection A of all subsets A of X with 1 ∈ A or 3 ∈ A.

We write P^X_p(A) for

P(X_p ∈ A) = Σ_{A∈A} p^{|A|}(1 - p)^{|X|-|A|}.

Most of the time, we omit X from the notation, writing P_p(A) for P^X_p(A).

We say that A ⊆ P(X) is an up-set, or increasing property, if A ∈ A and A ⊆ B ⊆ X implies B ∈ A. Similarly, A is a down-set or decreasing property if A ∈ A and B ⊆ A implies B ∈ A. Note that A is an up-set if and only if A^c = P(X) \ A is a down-set.

To illustrate the definitions, consider the (for us) most common special case. Here X consists of all (n choose 2) possible edges of K_n, and X_p is then simply the edge-set of G(n, p). Then a property of subsets of X is just a set of graphs on [n], e.g., all connected graphs. A property is increasing if it is preserved by adding edges, and decreasing if it is preserved by deleting edges.

Lemma 6.1 (Harris's Lemma). If A, B ⊆ P(X) are up-sets and 0 ≤ p ≤ 1 then

P_p(A ∩ B) ≥ P_p(A)P_p(B).

In other words, P(X_p ∈ A and X_p ∈ B) ≥ P(X_p ∈ A)P(X_p ∈ B), i.e., P(X_p ∈ A | X_p ∈ B) ≥ P(X_p ∈ A), i.e., increasing properties are positively correlated.

Proof. We use induction on n = |X|. The base case n = 0 makes perfect sense and holds trivially, though you can start from n = 1 if you prefer.

Suppose that |X| = n ≥ 1 and that the result holds for smaller sets X. Without loss of generality, let X = [n] = {1, 2, . . . , n}.

For any C ⊆ P(X) let

C_0 = {C ∈ C : n ∉ C} ⊆ P([n - 1])

and

C_1 = {C \ {n} : C ∈ C, n ∈ C} ⊆ P([n - 1]).


Thus C_0 and C_1 correspond to the members of C not containing and containing n respectively, except that for C_1 we delete n from every set to obtain a collection of subsets of [n - 1]. Note that

P_p(C) = (1 - p)P_p(C_0) + pP_p(C_1).   (11)

More precisely,

P^{[n]}_p(C) = (1 - p)P^{[n-1]}_p(C_0) + pP^{[n-1]}_p(C_1).

Suppose now that A and B ⊆ P([n]) are up-sets. Then A_0, A_1, B_0 and B_1 are all up-sets in P([n - 1]). Also, A_0 ⊆ A_1 and B_0 ⊆ B_1. Let a_0 = P_p(A_0) etc, so certainly a_0 ≤ a_1 and b_0 ≤ b_1.

Since (A ∩ B)_i = A_i ∩ B_i, by (11) and the induction hypothesis we have

P_p(A ∩ B) = (1 - p)P_p(A_0 ∩ B_0) + pP_p(A_1 ∩ B_1) ≥ (1 - p)a_0b_0 + pa_1b_1 = x,

say. On the other hand

P_p(A)P_p(B) = ((1 - p)a_0 + pa_1)((1 - p)b_0 + pb_1) = y,

say. So it remains to show that x ≥ y. But

x - y = ((1 - p) - (1 - p)²)a_0b_0 - p(1 - p)a_0b_1 - p(1 - p)a_1b_0 + (p - p²)a_1b_1 = p(1 - p)(a_1 - a_0)(b_1 - b_0) ≥ 0,

recalling that a_0 ≤ a_1 and b_0 ≤ b_1.
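Harris's Lemma is easy to sanity-check by brute force on a tiny ground set. The sketch below (ours, purely illustrative; the function names are made up) enumerates all subsets of a four-element ground set, builds the up-sets generated by two small families, and verifies that P_p(A ∩ B) ≥ P_p(A)P_p(B).

    from itertools import combinations

    def all_subsets(ground):
        """All subsets of the ground set, as frozensets."""
        out = []
        for r in range(len(ground) + 1):
            out.extend(frozenset(c) for c in combinations(ground, r))
        return out

    def up_set(generators, ground):
        """The up-set generated by `generators`: all sets containing some generator."""
        return {S for S in all_subsets(ground) if any(g <= S for g in generators)}

    def prob(family, ground, p):
        """P_p(family) = sum over members A of p^|A| (1-p)^(|X|-|A|)."""
        n = len(ground)
        return sum(p ** len(A) * (1 - p) ** (n - len(A)) for A in family)

    ground = frozenset({1, 2, 3, 4})
    A = up_set([frozenset({1, 2}), frozenset({3})], ground)   # "contains both 1 and 2, or contains 3"
    B = up_set([frozenset({2, 3}), frozenset({4})], ground)
    p = 0.3
    lhs = prob(A & B, ground, p)
    rhs = prob(A, ground, p) * prob(B, ground, p)
    print(lhs, rhs, lhs >= rhs)   # Harris's Lemma predicts lhs >= rhs

Replacing the generators by any other families (or looping over randomly chosen ones) gives the same conclusion, as the lemma guarantees.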

Harris's Lemma has an immediate corollary concerning two down-sets, or one up- and one down-set.

Corollary 6.2. If U is an up-set and D_1 and D_2 are down-sets, then

P_p(U ∩ D_1) ≤ P_p(U)P_p(D_1)

and

P_p(D_1 ∩ D_2) ≥ P_p(D_1)P_p(D_2).

Proof. Exercise, using the fact that D_i^c is an up-set.

An up-set U is called principal if it is of the form U = {A ⊆ X : S ⊆ A} for some fixed S ⊆ X. Recall that conditioning on such a U is the same as simply adding the elements of S into our random set X_p.

Corollary 6.3. If D_1 and D_2 are down-sets and U_0 is a principal up-set, then

P_p(D_1 | U_0 ∩ D_2) ≥ P_p(D_1 | U_0).

Proof. Thinking of the elements of the set S ⊆ X corresponding to U_0 as always present, this can be viewed as a statement about two down-sets D'_1 and D'_2 in P(X \ S). Apply Harris's Lemma to these.


6.2 Janson's inequalities

We have shown (from the Chernoff bounds) that, roughly speaking, if we have many independent events and the expected number that hold is large, then the probability that none holds is very small. What if our events are not quite independent, but each depends on only a few others?

    End L13 Start L14

Let E_1, . . . , E_k be sets of possible edges of G(n, p), and let A_i be the event that all edges in E_i are present in G(n, p). Note that each A_i is a principal up-set. Let X be the number of A_i that hold. (For example, if the E_i list all possible edge sets of triangles, then X is the number of triangles in G(n, p).)

As usual, let μ = E[X] = Σ_i P(A_i). As in Chapter 2, write i ∼ j if i ≠ j and E_i ∩ E_j ≠ ∅, and let

Δ = Σ_i Σ_{j∼i} P(A_i ∩ A_j).

Theorem 6.4. In the setting above, we have P(X = 0) ≤ e^{-μ+Δ/2}.

Before turning to the proof, note that

P(X = 0) = P(A_1^c ∩ · · · ∩ A_k^c) = P(A_1^c)P(A_2^c | A_1^c) · · · P(A_k^c | A_1^c ∩ · · · ∩ A_{k-1}^c) ≥ ∏_{i=1}^k P(A_i^c) = ∏_{i=1}^k (1 - P(A_i)),

where we used Harris's Lemma and the fact that the intersection of two or more down-sets is again a down-set. In the (typical) case that all the P(A_i) are small, the final bound is roughly e^{-Σ_i P(A_i)} = e^{-μ}, so (if Δ is small) Theorem 6.4 is saying that the probability that X = 0 is not much larger than the minimum it could possibly be.

Proof. Let r_i = P(A_i | A_1^c ∩ · · · ∩ A_{i-1}^c). Note that

P(X = 0) = P(A_1^c ∩ · · · ∩ A_k^c) = ∏_{i=1}^k (1 - r_i).   (12)

Our aim is to show that r_i is not much smaller than P(A_i). Fix i, and let D_1 be the intersection of those A_j^c where j < i and j ∼ i. Let D_0 be the intersection of those A_j^c where j < i and j ≁ i. Thus

r_i = P(A_i | D_0 ∩ D_1).

For any three events,

P(A | B ∩ C) = P(A ∩ B ∩ C)/P(B ∩ C) ≥ P(A ∩ B ∩ C)/P(C) = P(A ∩ B | C).


Thus r_i ≥ P(A_i ∩ D_1 | D_0) = P(A_i | D_0)P(D_1 | A_i ∩ D_0).

Now D_0 depends only on the presence of edges in ∪_{j<i, j≁i} E_j, which is disjoint from E_i. Hence P(A_i | D_0) = P(A_i). Also, A_i is a principal up-set and D_0 and D_1 are down-sets, so by Corollary 6.3 we have P(D_1 | A_i ∩ D_0) ≥ P(D_1 | A_i). Thus

r_i ≥ P(A_i)P(D_1 | A_i) = P(A_i ∩ D_1) = P(A_i \ ∪_{j<i, j∼i} A_j) ≥ P(A_i) - Σ_{j<i, j∼i} P(A_i ∩ A_j).

Hence

1 - r_i ≤ 1 - P(A_i) + Σ_{j<i, j∼i} P(A_i ∩ A_j) ≤ exp(-P(A_i) + Σ_{j<i, j∼i} P(A_i ∩ A_j)).

Substituting into (12), and noting that each pair {i, j} with i ∼ j contributes P(A_i ∩ A_j) to the exponent for exactly one of i and j, we obtain

P(X = 0) ≤ exp(-Σ_i P(A_i) + Δ/2) = e^{-μ+Δ/2},

as required.

Theorem 6.5. In the setting above, if Δ ≥ μ, then P(X = 0) ≤ e^{-μ²/(2Δ)}.

Proof. We deduce this from Theorem 6.4 by applying it to random subfamilies of our events. Fix 0 ≤ r ≤ 1. For a (deterministic) set S ⊆ {1, . . . , k}, write μ_S = Σ_{i∈S} P(A_i) and Δ_S = Σ_{i∈S} Σ_{j∈S, j∼i} P(A_i ∩ A_j). Applying Theorem 6.4 to the events A_i, i ∈ S, gives

P(X = 0) ≤ P(no A_i with i ∈ S holds) ≤ e^{-μ_S+Δ_S/2}.   (13)

Now choose S randomly, including each index independently with probability r, independently of the graph.

Then μ_S and Δ_S become random variables. By linearity of expectation we have

E[μ_S] = Σ_i rP(A_i) = rμ

and

E[Δ_S] = Σ_i Σ_{j∼i} P(A_i ∩ A_j)P(i, j ∈ S) = r²Δ.

Thus E[μ_S - Δ_S/2] = rμ - r²Δ/2. Since a random variable cannot always be smaller than its mean, there exists some set S such that μ_S - Δ_S/2 ≥ rμ - r²Δ/2. Applying (13) to this particular set S it follows that

P(X = 0) ≤ e^{-rμ+r²Δ/2}.

This bound is valid for any 0 ≤ r ≤ 1; to get the best result we optimize, which simply involves setting r = μ/Δ ≤ 1. Then we obtain

P(X = 0) ≤ e^{-μ²/Δ+μ²/(2Δ)} = e^{-μ²/(2Δ)}.
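To see that this choice of r is best (a routine remark of ours): the exponent -rμ + r²Δ/2 in the bound is a quadratic in r with derivative -μ + rΔ, so it is minimized at r = μ/Δ; the hypothesis Δ ≥ μ ensures that this value of r lies in [0, 1].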

    Together Theorems 6.4 and 6.5 give the following.

Corollary 6.6. Under the assumptions of Theorem 6.4 we have

P(X = 0) ≤ exp(-min{μ/2, μ²/(2Δ)}).

Proof. If Δ < μ then μ - Δ/2 > μ/2, so Theorem 6.4 gives the bound; if Δ ≥ μ, apply Theorem 6.5.
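As a concrete illustration (our example; the calculations are routine but not carried out in the notes), take the E_i to be the edge sets of all (n choose 3) possible triangles in K_n, and let p = c/n with c > 0 fixed, so that X is the number of triangles in G(n, c/n). Then

μ = (n choose 3)(c/n)³ → c³/6,

while Δ sums P(A_i ∩ A_j) = p⁵ over the O(n⁴) ordered pairs of distinct triangles sharing an edge, so Δ = O(n⁴p⁵) = O(1/n) → 0. Theorem 6.4 then gives

P(G(n, c/n) contains no triangle) ≤ e^{-μ+Δ/2} = e^{-c³/6+o(1)},

which is of the right order: the number of triangles is asymptotically Poisson with mean c³/6.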

    End L14 Start L15

How do the second moment method and Janson's inequalities compare? Suppose that X is the number of events A_i that hold, let μ = E[X], and let Δ = Σ_i Σ_{j∼i} P(A_i ∩ A_j), as in the context of Corollary 2.4. Then Corollary 2.4 says that if μ → ∞ and Δ = o(μ²) (i.e., μ²/Δ → ∞), then P(X = 0) → 0. More concretely, if μ ≥ L and μ²/Δ ≥ L, then the proof of Corollary 2.4 gives

P(X = 0) ≤ 2/L.

Janson's inequality, in the form of Corollary 6.6, has more restrictive assumptions: the events A_i have to be events of a specific type. When this holds, the Δ there is the same as before. When μ ≥ L and μ²/Δ ≥ L, the conclusion is that

P(X = 0) ≤ e^{-L/2}.

Both bounds imply that P(X = 0) → 0 when μ and μ²/Δ both tend to infinity, but when Janson's inequalities apply, the concrete bound they give is exponentially stronger than that from the second moment method.


6.3 Clique and chromatic number of G(n, p)

We shall illustrate the power of Janson's inequality by using it to study the chromatic number of G(n, p). The ideas are more important than the details of the calculations. We start by looking at something much simpler: the clique number.

Throughout this section p is constant with 0 < p < 1.

As usual, let ω(G) = max{k : G contains a copy of K_k} be the clique number of G. For k = k(n) let X_k be the number of copies of K_k in G = G(n, p), and

μ_k = E[X_k] = (n choose k) p^{(k choose 2)}.

Note that

μ_{k+1}/μ_k = [(n choose k+1)/(n choose k)] p^{(k+1 choose 2)-(k choose 2)} = [(n - k)/(k + 1)] p^k,   (14)

which is a decreasing function of k. It follows that the ratio is at least 1 up to some point and then at most 1, so μ_k first increases from μ_0 = 1, μ_1 = n, . . . , and then decreases.

We define

k_0 = k_0(n, p) = min{k : μ_k < 1}.

Lemma 6.7. With 0 < p < 1 fixed we have k_0 ∼ 2 log_{1/p} n = 2 log n/log(1/p) as n → ∞.

Proof. Using standard binomial bounds,

(n/k)^k p^{k(k-1)/2} ≤ μ_k ≤ (en/k)^k p^{k(k-1)/2}.

Taking the kth root it follows that

μ_k^{1/k} = Θ((n/k) p^{(k-1)/2}) = Θ((n/k) p^{k/2}).

Let ε > 0 be given.

If k ≤ (1 - ε)2 log_{1/p} n then k/2 ≤ (1 - ε) log_{1/p} n, so (1/p)^{k/2} ≤ n^{1-ε}, i.e., p^{k/2} ≥ n^{-1+ε}. Thus μ_k^{1/k} is at least a constant times n·n^{-1+ε}/log n = n^ε/log n, so μ_k^{1/k} > 1 if n is large. Hence μ_k > 1, so k_0 > k.

Similarly, if k ≥ (1 + ε)2 log_{1/p} n then p^{k/2} ≤ n^{-1-ε}, and if n is large enough it follows that μ_k < 1, so k_0 ≤ k. So for any fixed ε we have

(1 - ε)2 log_{1/p} n ≤ k_0 ≤ (1 + ε)2 log_{1/p} n

if n is large enough, i.e., k_0 ∼ 2 log_{1/p} n.
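The quantity k_0(n, p) is easy to compute exactly for concrete n, which is a useful reality check since the convergence in Lemma 6.7 is slow. A small sketch (ours, purely illustrative; the function names are made up):

    import math

    def log_mu(n, k, p):
        """log of mu_k = C(n, k) * p^(k choose 2), computed in log-space to avoid overflow."""
        log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        return log_binom + (k * (k - 1) / 2) * math.log(p)

    def k0(n, p):
        """k_0(n, p) = min{k : mu_k < 1}."""
        k = 1
        while log_mu(n, k, p) >= 0:
            k += 1
        return k

    for n in (10**3, 10**4, 10**5, 10**6):
        print(n, k0(n, 0.5), round(2 * math.log2(n), 1))   # exact k_0 vs the asymptotic 2 log_{1/p} n

Already at n = 10^6 the exact k_0 is noticeably smaller than 2 log_2 n ≈ 39.9, reflecting the lower-order terms hidden in the o(·) of Lemma 6.7.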

Note for later that if k ∼ k_0 then

(1/p)^k = n^{2+o(1)},   (15)

so from (14) we have

μ_{k+1}/μ_k = [(n - O(log n))/Θ(log n)] · n^{-2+o(1)} = n^{-1+o(1)}.   (16)

Lemma 6.8. With 0 < p < 1 fixed we have P(ω(G(n, p)) > k_0) → 0 as n → ∞.

Proof. We have ω(G(n, p)) > k_0 if and only if X_{k_0+1} > 0, which has probability at most E[X_{k_0+1}] = μ_{k_0+1}. Now μ_{k_0} < 1 by definition, so by (16) we have μ_{k_0+1} ≤ n^{-1+o(1)}, so μ_{k_0+1} → 0.

Let Δ_k be the expected number of ordered pairs of distinct k-cliques sharing at least one edge. This is exactly the quantity Δ appearing in Corollaries 2.4 and 6.6 when X is the number of k-cliques.

Lemma 6.9. Suppose that k ∼ k_0. Then

Δ_k/μ_k² ≤ max{n^{-2+o(1)}, n^{-1+o(1)}/μ_k}.

In particular, if μ_k → ∞ then Δ_k = o(μ_k²).

Proof. We have

Δ_k = Σ_{s=2}^{k-1} (n choose k)(k choose s)(n-k choose k-s) p^{2(k choose 2)-(s choose 2)},

so

Δ_k/μ_k² = Σ_{s=2}^{k-1} α_s,

where

α_s = [(k choose s)(n-k choose k-s)/(n choose k)] p^{-(s choose 2)}.

Let

β_s = α_{s+1}/α_s = [(k - s)/(s + 1)]·[(k - s)/(n - 2k + s + 1)]·p^{-s},

so

β_s = n^{-1+o(1)} (1/p)^s.   (17)

In particular, using (15) we have β_s < 1 for s ≤ k/4, say, and β_s > 1 for s ≥ 3k/4. In between we have β_{s+1}/β_s ∼ 1/p > 1, so β_s is increasing as s runs from k/4 to 3k/4.

It follows that there is some s_0 ∈ [k/4, 3k/4] such that β_s ≤ 1 for s ≤ s_0 and β_s > 1 for s > s_0. In other words, the sequence α_s decreases and then increases. Hence, max{α_s : 2 ≤ s ≤ k - 1} = max{α_2, α_{k-1}}, so

Δ_k/μ_k² = Σ_{s=2}^{k-1} α_s ≤ k max{α_2, α_{k-1}} = n^{o(1)} max{α_2, α_{k-1}}.


    End L15 Start L16

Either calculating directly, or using α_0 ∼ 1, α_2 = α_0β_0β_1 and the approximate formula for β_s in (17), one can check that α_2 = n^{-2+o(1)}. Similarly, α_k = 1/μ_k and α_{k-1} = α_k/β_{k-1} = n^{-1+o(1)}/μ_k, using (17) and (15).

Theorem 6.10. Let 0 < p < 1 be fixed. Define k_0 = k_0(n, p) as above, and let G = G(n, p). Then

P(k_0 - 2 ≤ ω(G) ≤ k_0) → 1.

Proof. The upper bound is Lemma 6.8. For the lower bound, μ_{k_0-1} ≥ 1 by the definition of k_0. So by (16) we have μ_{k_0-2} ≥ n^{1-o(1)}, and in particular μ_{k_0-2} → ∞. Let k = k_0 - 2. Then by Lemma 6.9 we have Δ_k = o(μ_k²). Hence by the second moment method (Corollary 2.4) we have P(ω(G) < k_0 - 2) = P(X_k = 0) → 0.

Note that we have pinned down the clique number to one of three values; with only a very little more care, we can pin it down to two values or, almost all the time, a single value! (Usually, μ_{k_0-1} is much larger than one, and μ_{k_0} much smaller than one.)

Using Janson's inequality, we can get a very tight bound on the probability that the clique number is significantly smaller than expected.

Theorem 6.11. Under the assumptions of Theorem 6.10 we have

P(ω(G) < k_0 - 3) ≤ e^{-n^{2-o(1)}}.

Note that this is an almost inconceivably small probability: the probability that G(n, p) contains no edges at all is (1 - p)^{(n choose 2)} = e^{-Θ(n²)}.

Proof. Let k = k_0 - 3. Then arguing as above we have μ_k ≥ n^{2-o(1)}. Hence by Lemma 6.9 we have Δ_k/μ_k² ≤ n^{-2+o(1)}, so μ_k²/Δ_k ≥ n^{2-o(1)}. Thus by Janson's inequality (Corollary 6.6) we have P(X_k = 0) ≤ e^{-n^{2-o(1)}}.

Why is such a good error bound useful? Because it allows us to study the chromatic number, by showing that with high probability every subgraph of a decent size contains a fairly large independent set.

Theorem 6.12 (Bollobás). Let 0 < p < 1 be constant and let G = G(n, p). Then for any fixed ε > 0 we have

P( (1 - ε) n/(2 log_x n) ≤ χ(G) ≤ (1 + ε) n/(2 log_x n) ) → 1

as n → ∞, where x = 1/(1 - p).


Proof. Apply Theorem 6.10 to the complement G^c of G, noting that G^c ∼ G(n, 1 - p). Writing α(G) for the independence number of G, we find that with probability tending to 1 we have α(G) = ω(G^c) ≤ k_0(n, 1 - p) ∼ 2 log_x n. Since χ(G) ≥ n/α(G), this gives the lower bound.

For the upper bound, let m = n/(log n)², say. For each subset V of V(G) with |V| = m, let E_V be the event that G[V] contains an independent set of size at least k = k_0(m, 1 - p) - 3. Note that

k ∼ 2 log_x m ∼ 2 log_x n.

Applying Theorem 6.11 to the complement of G[V], which has the distribution of G(m, 1 - p), we have

P(E_V^c) ≤ e^{-m^{2-o(1)}} = e^{-n^{2-o(1)}}.

Let E = ∩_{|V|=m} E_V. Considering the (n choose m) ≤ 2^n possible sets separately, the union bound gives

P(E^c) ≤ Σ P(E_V^c) ≤ 2^n e^{-n^{2-o(1)}} → 0.

It follows that E holds with high probability. But when E holds one can colour by greedily choosing independent sets of size at least k for the colour classes, until at most m vertices remain, and then simply using one colour for each remaining vertex. Since we use at most n/k sets of size at least k, this shows that when E holds, then

χ(G(n, p)) ≤ n/k + m = (1 + o(1)) n/(2 log_x n) + n/(log n)² ∼ n/(2 log_x n),

completing the proof.
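For instance (a numerical aside of ours): when p = 1/2 we have x = 2, so Theorem 6.12 says that χ(G(n, 1/2)) = (1 + o(1)) n/(2 log_2 n) whp; the colour classes produced in the proof have size about 2 log_2 n, matching the typical independence number.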

    7 Postscript: other models

There are several standard models of random graphs on the vertex set [n] = {1, 2, . . . , n}. We have focussed on G(n, p), where each possible edge is included independently with probability p.

The model originally studied by the founders of the theory of random graphs, Erdős and Rényi, is slightly different. Fix n ≥ 1 and 0 ≤ m ≤ N = (n choose 2). The random graph G(n, m) is the graph with vertex set [n] obtained by choosing exactly m edges randomly, with all (N choose m) possible sets of m edges equally likely.

For most questions (but not, for example, 'is the number of edges even?'), G(n, p) and G(n, m) behave very similarly, provided we choose the density parameters in a corresponding way, i.e., take p ≈ m/N.

Often, we consider random graphs of different densities simultaneously. In G(n, m), there is a natural way to do this, called the random graph process. This is the random sequence (G_m)_{m=0,1,...,N} of graphs on [n] obtained by starting with no edges, and adding edges one-by-one in a random order, with all N! orders equally likely. Note that each individual G_m has the distribution of G(n, m): we take the first m edges in a random order, so all possibilities are equally likely. The key point is that in the sequence (G_m), we define all the G_m together, in such a way that if m_1 < m_2, then G_{m_1} ⊆ G_{m_2}. (This is called a coupling of the distributions G(n, m) for different m.)

There is a similar coupling in the G(n, p) setting, the continuous-time random graph process. This is the random sequence (G_t)_{t∈[0,1]} defined as follows: for each possible edge e, let U_e be a random variable with the uniform distribution on the interval [0, 1], with the different U_e independent. Let the edge set of G_t be {e : U_e ≤ t}. (Formally this defines a random function t ↦ G_t from [0, 1] to the set of graphs on [n].) One can think of U_e as giving the time at which the edge e is born; G_t consists of all edges born by time t. For any p, G_p has the distribution of G(n, p), but again these distributions are coupled in the natural way: if p_1 < p_2 then G_{p_1} ⊆ G_{p_2}.
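The continuous-time coupling is only a couple of lines of code. The sketch below (ours, purely illustrative; the function names are made up) draws one set of birth times U_e and then reads off G_t for any thresholds t, so that the graphs at different densities are automatically nested.

    import random

    def birth_times(n, seed=None):
        """Assign an independent uniform birth time U_e to each possible edge of K_n."""
        rng = random.Random(seed)
        return {(u, v): rng.random() for u in range(n) for v in range(u + 1, n)}

    def snapshot(times, t):
        """Edge set of G_t: all edges born by time t. For fixed t this has the law of G(n, t)."""
        return {e for e, u in times.items() if u <= t}

    n = 100
    times = birth_times(n, seed=1)
    g_small, g_large = snapshot(times, 0.01), snapshot(times, 0.05)
    print(len(g_small), len(g_large), g_small <= g_large)   # nesting: G_{0.01} is a subgraph of G_{0.05}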

Of course there are many other random graph models not touched on in this course (as well as many more results about G(n, p)). These include other classical models, such as the configuration model of Bollobás, and also new inhomogeneous models introduced as more realistic models for networks in the real world.
