Random growth models
Laura Florescu
NYU - Courant Institute
Advisor: Joel Spencer
April 26, 2017
Laura Florescu Random growth models 1 / 46
Research theme
For my Ph.D., the unifying research theme is
Variations of randomness in networks
1) how much determinism in random diffusions de-randomizes the process (rotor walks, constrained diffusions)
2) how superposing "planted" information on a random network changes the structure so as to recover the planted one (stochastic block models)
Rotor walks
Deterministic analogue of random walk, proposed by physicists as 'Eulerian Walkers' in the late 90's, and by Jim Propp in 2003.
Attach arrows (rotors) at each site pointing in any direction. At each step, rotate the arrow counter-clockwise by 90 degrees and then move the particle in that direction.
In the square grid Z^2, successive exits could repeatedly cycle through the sequence North, East, South, West.
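The rule above can be sketched in a few lines. This is a minimal illustration, not code from the talk; the rotor order N, E, S, W is one convention (the direction of rotation is itself a modeling choice), and unset rotors are initialized i.i.d. uniformly.

```python
import random

DIRS = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # N, E, S, W (one cyclic convention)

def rotor_walk(steps, seed=0):
    """Single-particle rotor walk on Z^2 from the origin with uniformly
    random initial rotors; returns the set of visited sites."""
    rng = random.Random(seed)
    rotor = {}                 # site -> index into DIRS
    pos = (0, 0)
    visited = {pos}
    for _ in range(steps):
        i = rotor.get(pos)
        if i is None:
            i = rng.randrange(4)      # i.i.d. uniform initial rotor at a fresh site
        i = (i + 1) % 4               # rotate the rotor to the next direction
        rotor[pos] = i
        dx, dy = DIRS[i]
        pos = (pos[0] + dx, pos[1] + dy)   # move the particle along the rotor
        visited.add(pos)
    return visited

print(len(rotor_walk(10_000)))
```

Note that once the rotors are fixed, the walk is entirely deterministic; all randomness sits in the initial configuration.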
Motivation
Applications:
1 Theoretical Computer Science
1 Randomized Mergesort on Parallel Disks (Barve, Grove, Vitter 1996)
2 Model of mobile agents exploring a territory
3 Load balancing in distributed computing (Friedrich et al.)
4 Design principles for navigation problems and optimal transport in networks (Li et al.)
5 Broadcasting information in networks (Doerr, Friedrich, Künnemann, Sauerwald 2009)
2 Physics: model of self-organized criticality, connections to abelian sandpile model (Holroyd, Levine, Mészáros, Peres, Propp, Wilson 2008), (Priezzhev, Dhar, Dhar, Krishnamurthy 1996)
Why care, and what to ask
Compare with random walk.
hitting times of sets,
number of visits to a site,
number of sites visited,
recurrent/transient configurations and mechanisms.
Can one do something better with rotor walk? For rumor spreading, yes.
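One concrete way to compare with random walk is to measure the number of distinct sites visited in t steps by each process. The experiment below is an illustration written for this note, not from the talk; conjecturally the rotor range grows like t^{2/3} while the simple random walk range grows like t/log t.

```python
import random

DIRS = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # N, E, S, W

def range_rotor(t, seed=0):
    """Distinct sites visited in t steps by a rotor walk on Z^2
    with i.i.d. uniform initial rotors."""
    rng = random.Random(seed)
    rotor, pos, visited = {}, (0, 0), {(0, 0)}
    for _ in range(t):
        i = (rotor.get(pos, rng.randrange(4)) + 1) % 4
        rotor[pos] = i
        pos = (pos[0] + DIRS[i][0], pos[1] + DIRS[i][1])
        visited.add(pos)
    return len(visited)

def range_srw(t, seed=0):
    """Distinct sites visited in t steps by a simple random walk on Z^2."""
    rng = random.Random(seed)
    pos, visited = (0, 0), {(0, 0)}
    for _ in range(t):
        dx, dy = DIRS[rng.randrange(4)]
        pos = (pos[0] + dx, pos[1] + dy)
        visited.add(pos)
    return len(visited)

t = 100_000
print("rotor:", range_rotor(t), " srw:", range_srw(t))
```

For large t the rotor walk visits noticeably more distinct sites than the random walk, consistent with the t^{2/3} versus t/log t comparison.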
Recurrence/transience
A configuration ρ is recurrent if the rotor walk with this initial configuration ρ returns to the origin infinitely often (x_n = o for infinitely many n);
otherwise, we say that ρ is transient.
Background
Rotor walks on trees: the uniform rotor walk on the b-ary tree is transient for b ≥ 3 and recurrent for b = 2.
Escape rates: if you make n particles perform rotor walk one after another, how many never come back?
On Z^d with d ≥ 3, a positive fraction; on Z^2, order n/log n.
Background
1 Random walk comparison in terms of expected number of particles at any given vertex by Cooper and Spencer
2 Rotor-router version of IDLA converges to a d-dimensional Euclidean ball by Levine and Peres
Range
Initial configuration of i.i.d. rotors on Z^2?
Figure: The set of sites visited after the 18th excursion and the 54th, respectively. Excursion = 4 consecutive visits to o.
i.i.d. rotors
Question: How many distinct sites would a particle typically visit?
We know random walk visits t/log t sites in t steps.
Physicists conjectured (from simulations) in the early 90s that the range of rotor walk on Z^2 is Θ(t^{2/3}).
i.i.d. rotors
Theorem
The number of sites visited by i.i.d. rotor walk on Z^d in t steps is Ω(t^{d/(d+1)}).
Figure: The set of sites visited on excursions 1 through 20.
Comb lattice
Theorem
The number of sites visited by i.i.d. rotor walk on the comb lattice in t steps is Θ(t^{2/3}).
It is of note that this result contrasts with random walk on C_2, which expects to visit (1/(2√(2π)) + o(1)) √t log t sites, as shown in Pach, Tardos.
Manhattan lattice
Figure: Set of sites visited by rotor walk on the Manhattan lattice after the 2nd and 11th excursion respectively.
F-lattice
Figure: Set of sites visited by rotor walk on the F-lattice at the first and second excursion after 100000 steps respectively.
Recurrence of directed lattices
Theorem
The F- and Manhattan lattices are recurrent, through a connection to critical percolation.
Idea: connection to the stochastic pin-ball model / mirror model / critical percolation.
Stochastic pin-ball: place at each vertex x a mirror and orient it either NW or NE. Start a particle and bounce it off the mirrors.
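The pin-ball dynamics can be sketched as follows. This is an illustrative toy (not the talk's construction): mirrors are assigned i.i.d. uniformly at random, a NE mirror "/" reflects E↔N and W↔S, and a NW mirror "\" reflects E↔S and W↔N, which are the standard reflection rules for the mirror model.

```python
import random

VEC = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}
REFLECT = {
    "/": {"E": "N", "N": "E", "W": "S", "S": "W"},   # NE-oriented mirror
    "\\": {"E": "S", "S": "E", "W": "N", "N": "W"},  # NW-oriented mirror
}

def pinball(steps, seed=0):
    """Bounce a particle off i.i.d. uniform mirrors on Z^2; return its path.
    Once the mirrors are revealed, the trajectory is deterministic."""
    rng = random.Random(seed)
    mirrors = {}
    pos, d = (0, 0), "E"
    path = [pos]
    for _ in range(steps):
        m = mirrors.setdefault(pos, rng.choice("/\\"))  # reveal mirror on first visit
        d = REFLECT[m][d]                               # reflect the direction
        dx, dy = VEC[d]
        pos = (pos[0] + dx, pos[1] + dy)
        path.append(pos)
    return path

path = pinball(50)
print(len(set(path)), "distinct sites on the trajectory")
```

The trajectories of this model trace contours of critical bond percolation, which is the mechanism behind the recurrence proof.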
Manhattan lattice
Figure: Mirrors and rotor walk on the Manhattan lattice.
Conclusions for rotor walks
Theorem
Proved the lower bound of a long-standing physics conjecture (R_t = Ω(t^{d/(d+1)}) for all configurations on Z^d)
For the comb, R_t = Θ(t^{2/3})
Showed Manhattan and F-lattices are recurrent
Showed some results about escape rates (not in this talk)
Community detection
Goal: Detect communities in networks.
Community detection (or graph clustering)
Find groups of similar nodes (clusters or communities) by observing their pairwise interactions (the graph or network)
Figure: Community detection
Long-standing question: When can algorithms extract meaningful clusters?
Our goal: Answer this question on a generative model, the general Stochastic Block Model
Figure: Red edges added with P = p and blue edges with P = q.
At the intersection of machine learning, statistics, probability, theoretical CS and statistical physics.
SBM, Information theory threshold
When p = a/n and q = b/n
Theorem (Mossel, Neeman, Sly, 2012)
It is information-theoretically impossible to cluster when
(a − b)^2 < 2(a + b).
Proves a conjecture of [Decelle, Krzakala, Moore, Zdeborová '11].
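The condition is easy to turn into a predicate. A trivial helper (written for this note) that tests whether a sparse two-community SBM with p = a/n, q = b/n lies on the detectable side of the threshold:

```python
def detectable(a: float, b: float) -> bool:
    """Mossel-Neeman-Sly condition: detection in the sparse two-community
    SBM (p = a/n, q = b/n) is possible exactly when (a-b)^2 > 2(a+b)."""
    return (a - b) ** 2 > 2 * (a + b)

print(detectable(5, 1))   # (5-1)^2 = 16 > 2*(5+1) = 12 -> True
print(detectable(4, 2))   # (4-2)^2 = 4 <= 2*(4+2) = 12 -> False
```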
Binary symmetric broadcast model
T: Galton-Watson tree with offspring distribution of mean b.
Root R labeled uniformly +1/−1; each child takes its parent's label with P = 1 − η and the opposite label with P = η.
Goal: reconstruct the value of R from the labels at level n.
Theorem (Evans, Kenyon, Peres, Schulman '00)
The probability of correct reconstruction of the value of R tends to 1/2 as n → ∞ if
(1 − 2η)^2 ≤ pc(T),
where pc(T) is the critical probability for percolation on T.
Can think of pc(T) as the edge density at which the tree is connected.
Take trees with offspring distribution Pois((a+b)/2) and take 1 − η = a/(a+b).
Then the threshold reduces to (a − b)^2 ≤ 2(a + b)!
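The broadcast process is simple to simulate. The sketch below (an illustration for this note, not code from the talk) grows a Pois((a+b)/2) Galton-Watson tree level by level with flip probability η = b/(a+b) and reconstructs the root by majority vote over the labels at depth n; above the threshold the success rate stays bounded away from 1/2.

```python
import math
import random

def poisson(mean, rng):
    """Knuth's inversion-by-products method; fine for small means."""
    L, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def broadcast_leaves(depth, mean, eta, rng):
    """Run the broadcast process; return (root label, labels at given depth)."""
    root = rng.choice([-1, 1])
    level = [root]
    for _ in range(depth):
        nxt = []
        for label in level:
            for _ in range(poisson(mean, rng)):
                # child copies the parent's label with prob 1 - eta
                nxt.append(label if rng.random() < 1 - eta else -label)
        level = nxt
    return root, level

def majority_success_rate(a, b, depth, trials=200, seed=0):
    """Fraction of trials in which majority vote at the given depth
    recovers the root label."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        root, leaves = broadcast_leaves(depth, (a + b) / 2, b / (a + b), rng)
        guess = 1 if sum(leaves) >= 0 else -1
        wins += guess == root
    return wins / trials

print(majority_success_rate(7, 1, depth=5))  # (a-b)^2 = 36 > 2(a+b) = 16
```

With a = 7, b = 1 the Kesten-Stigum condition holds comfortably, so the empirical success rate is well above 1/2; below the threshold it drifts toward 1/2 as the depth grows.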
SBM, Computational threshold
Dyer, Frieze 1989: p = a/n > q = b/n fixed
Condon, Karp 2001: a − b ≳ n^{1/2}
McSherry 2001: a − b ≳ √(b log n)
Coja-Oghlan 2010: a − b ≳ √b
Massoulié 2013 and Mossel, Neeman, Sly 2013: detection possible and efficient when
(a − b)^2 > 2(a + b).
Ingenious spectral methods
Bipartite Stochastic Block Model
Figure: Bipartite stochastic model on V1 and V2. Red edges added with P = δp(n1, n2) and blue edges with P = (2 − δ)p(n1, n2).
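A sampler for this bipartite model can be sketched directly from the figure: pairs whose planted labels agree get an edge with probability δp, disagreeing pairs with probability (2 − δ)p. The parameter names below are taken from the slide; the half/half split of each side is an assumption for illustration.

```python
import numpy as np

def sample_bsbm(n1, n2, p, delta, seed=None):
    """Sample a biadjacency matrix M of the bipartite block model together
    with the planted labelings of V1 and V2."""
    rng = np.random.default_rng(seed)
    sigma1 = np.repeat([1, -1], [n1 // 2, n1 - n1 // 2])   # planted labels on V1
    sigma2 = np.repeat([1, -1], [n2 // 2, n2 - n2 // 2])   # planted labels on V2
    # aligned pairs: delta*p; crossing pairs: (2-delta)*p, so mean density is p
    probs = np.where(np.outer(sigma1, sigma2) == 1, delta * p, (2 - delta) * p)
    M = (rng.random((n1, n2)) < probs).astype(int)
    return M, sigma1, sigma2

M, s1, s2 = sample_bsbm(100, 400, p=0.05, delta=1.5, seed=0)
print(M.shape, round(M.mean(), 3))  # overall density concentrates around p
```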
Background
Intermediate step in recovering solutions in planted problems [Feldman, Perkins, Vempala '14] - reduction.
Planted constraint satisfaction problems (CSP), planted k-SAT
Reducing planted problems on n variables will give vertex sets of size n1 = n, n2 = n^{k−1} (so n2 ≈ n^2 for k = 3).
Planted CSPs random k-SAT
Planted random k-SAT: Form a truth assignment σ of literals, then select each clause independently from the k-tuples of literals where at least one literal is set to 1 by σ.
More generally, a CSP consists of
a set of variables X = {x1, . . . , xn}
for each variable xi, a finite set Di of possible values (its domain)
a set of constraints restricting the values that the variables can simultaneously take
Note: the BSBM is an intermediate step in planted CSPs.
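The planted k-SAT distribution described above can be sampled by rejection: draw a uniform k-clause and keep it only if the planted assignment satisfies it. A minimal sketch written for this note (clause representation and function name are my own):

```python
import random

def planted_ksat(n, m, k=3, seed=0):
    """Sample m clauses over n variables, each satisfied by a planted
    truth assignment sigma. A clause is a list of (variable, negated?) pairs."""
    rng = random.Random(seed)
    sigma = [rng.choice([False, True]) for _ in range(n)]   # planted assignment
    clauses = []
    while len(clauses) < m:
        vars_ = rng.sample(range(n), k)
        clause = [(v, rng.choice([False, True])) for v in vars_]
        # literal (v, neg) evaluates to True under sigma iff sigma[v] != neg
        if any(sigma[v] != neg for v, neg in clause):
            clauses.append(clause)      # keep only clauses sigma satisfies
    return sigma, clauses

sigma, clauses = planted_ksat(n=20, m=50)
print(len(clauses), "clauses, all satisfied by the planted assignment")
```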
Constraint satisfaction problems
Other uses
scheduling jobs on machines
car sequencing
vehicle routing
resource allocation
error-correcting codes
genetic regulatory networks
financial clusters
register allocation in compilers
BSBM, formal
Goal: get the planted assignment σ (on V1 for the bipartite stochastic model)
Detection: compute v that agrees with σ on a 1/2 + ε fraction of vertices
Recovery: compute v that agrees with σ on a 1 − o(1) fraction of vertices
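Since the two communities are only identifiable up to swapping, agreement is measured up to a global sign flip. A small helper (an illustration, not from the talk):

```python
import numpy as np

def agreement(v, sigma):
    """Fraction of vertices on which the ±1 vectors v and sigma agree,
    maximized over the global sign flip."""
    v, sigma = np.sign(v), np.sign(sigma)
    match = np.mean(v == sigma)
    return max(match, 1 - match)

sigma = np.array([1, 1, 1, -1, -1, -1])
print(agreement(np.array([1, 1, -1, -1, -1, -1]), sigma))  # 5/6 agreement
print(agreement(-sigma, sigma))                            # 1.0: flip is free
```

Detection then means agreement ≥ 1/2 + ε; recovery means agreement = 1 − o(1).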
Previous work - Spectral methods
Applying some classical results to the bipartite model using the spectrum with p = O(1/n1) recovers the partition
Typical analysis of spectral algorithms: 2nd singular value > spectral norm of the noise matrix M − EM;
for p ≪ 1/n1, σ2(EM) = Θ̃(p√(n1 n2)), while the norm of the noise is ‖M − EM‖ = Θ̃(√(p n2)).
Questions
1 Here σ2 < ‖M − EM‖. Is SVD doomed for p ≪ 1/n1?
2 What is the optimal threshold for detection in BSBM?
Results - Impossibility
Theorem (F., Perkins, COLT’16)
If n2 ≫ n1 and
p ≤ 1 / ((δ − 1)^2 √(n1 n2)),
then no algorithm can detect the partition.
Idea: Couple to a broadcast model on a multi-type Galton-Watson tree. Show that, conditioned on the labels of a log n boundary of the tree, the label of the root is asymptotically independent of the rest of the graph.
Results - sharp reconstruction
Theorem (F., Perkins, COLT’16)
Let n2 ≫ n1. Then there is a polynomial-time algorithm that detects the partition V1 = A1 ∪ B1 if
p > (1 + ε) / ((δ − 1)^2 √(n1 n2))
for any fixed ε > 0.
Idea: reduce to an SBM on the graph on V1 induced by paths of length 2 in the bipartite graph.
Implications for planted k-SAT
Detection in the block model exhibits a sharp threshold at
m* = Θ(n^{r/2}) hyperedges/clauses
Spectral algorithms
Standard SVD: Compute the left singular vector of M (the adjacency matrix) corresponding to the 2nd singular value, round its signs to get v; compare σ and v
Diagonal deletion SVD: Set the diagonal entries of M M^T to 0, compute the second eigenvector, round its signs to get v; compare σ and v
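Both algorithms are a few lines of linear algebra. The sketch below follows the two descriptions above; the toy instance and its parameters (a dense, far-above-threshold regime) are chosen purely for illustration, not taken from the talk.

```python
import numpy as np

def standard_svd(M):
    """Signs of the left singular vector of M for the 2nd singular value."""
    U, _, _ = np.linalg.svd(M, full_matrices=False)   # singular values descending
    return np.sign(U[:, 1])

def diagonal_deletion(M):
    """Signs of the eigenvector of the 2nd largest eigenvalue of M M^T
    with its diagonal zeroed out."""
    B = M @ M.T
    np.fill_diagonal(B, 0)                 # "delete" the diagonal
    _, vecs = np.linalg.eigh(B)            # eigenvalues ascending
    return np.sign(vecs[:, -2])

# toy planted bipartite instance, dense enough for both algorithms
rng = np.random.default_rng(0)
n1, n2, p, delta = 100, 400, 0.3, 1.8
sigma1 = np.repeat([1, -1], n1 // 2)
sigma2 = np.repeat([1, -1], n2 // 2)
probs = np.where(np.outer(sigma1, sigma2) == 1, delta * p, (2 - delta) * p)
M = (rng.random((n1, n2)) < probs).astype(float)

for algo in (standard_svd, diagonal_deletion):
    v = algo(M)
    acc = max(np.mean(v == sigma1), np.mean(v == -sigma1))
    print(algo.__name__, acc)
```

In the sparse regime the two algorithms separate: diagonal deletion keeps working down to much smaller p, which is the content of the next theorem.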
Results - spectral
Theorem (F., Perkins, COLT’16)
Let n2 ≫ n1, with n1 → ∞. Then
1 If p_D > (n1 n2)^{−1/2}, then whp the diagonal deletion SVD algorithm recovers the partition V1 = A1 ∪ B1.
2 If p_V > n1^{−2/3} n2^{−1/3}, then whp the standard SVD algorithm recovers the partition.
When n2 = n^2, p_D ≈ n^{−3/2}, p_V ≈ n^{−4/3}.
Thresholds origins

DiagD: B = MM^T − D_V;  SVD: B' = B + D_V
σ: partition, e2(B): second largest eigenvector of B, D_V: degrees.

DiagD: sin(B, EB) ≤ C ‖B − EB‖ / λ2
SVD: sin(B', EB') ≤ C (‖B − EB‖ + ‖D_V − E D_V‖) / λ2
by the Sin Theta Theorem: the sin of the angle between eigenvector spaces is at most norm / eigenvalue gap.

DiagD: ≤ C n1^{1/2} n2^{1/2} p / ((δ−1)^2 n1 n2 p^2) = O(1/log n1)
SVD: ≤ (C n1^{1/2} n2^{1/2} p + C √(n2 p log n1)) / ((δ−1)^2 n1 n2 p^2) = O(1/log n1)
(using the asymptotics of the 2nd eigenvalue λ2)

‖e2(B) − σ/√n1‖ = O(log^{-1} n1) (by a special case of the Sin Theta Theorem).
Conclude by rounding the signs of e2(B).

Laura Florescu Random growth models 43 / 46
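The last two lines can be checked numerically: on a dense, strongly biased planted instance, the second eigenvector of B = MM^T with the diagonal deleted lands close to the normalized partition vector σ/√n1. A small self-contained experiment (the parameter choices are mine, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2 = 100, 400

# Balanced planted partitions on both sides.
sigma = np.repeat([1.0, -1.0], n1 // 2)
tau = np.repeat([1.0, -1.0], n2 // 2)

# Biadjacency matrix: P(M_ij = 1) = 0.5 + 0.4 * sigma_i * tau_j.
M = (rng.random((n1, n2)) < 0.5 + 0.4 * np.outer(sigma, tau)).astype(float)

# B = M M^T with the diagonal (which carries the degrees D_V) deleted.
B = M @ M.T
np.fill_diagonal(B, 0.0)

# e2(B): eigenvector of the second largest eigenvalue (eigh is ascending).
w, V = np.linalg.eigh(B)
e2 = V[:, -2]

# Distance to sigma / sqrt(n1), up to the eigenvector's sign ambiguity.
dist = min(np.linalg.norm(e2 - sigma / np.sqrt(n1)),
           np.linalg.norm(e2 + sigma / np.sqrt(n1)))
```

Rounding the signs of e2 then recovers the planted side, matching the concluding step of the argument.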
Conclusions for this part

Theorem
Can efficiently detect the partition in the BSBM if p > (1 + ε) / ((δ−1)^2 √(n1 n2)).
Cannot detect if p ≤ 1 / ((δ−1)^2 √(n1 n2)).

The spectral method still works if λ2 exceeds the norm of the noise matrix.
Modifying the adjacency matrix improves recovery significantly.

Laura Florescu Random growth models 44 / 46