
Randomization in Graph Optimization Problems

David Karger
MIT
http://theory.lcs.mit.edu/~karger

Randomized Algorithms

Flip coins to decide what to do next
Avoid hard work of making the “right” choice
Often faster and simpler than deterministic algorithms
Different from average-case analysis
» Input is worst case
» Algorithm adds randomness

Methods

Random selection
» if most candidate choices are “good”, then a random choice is probably good
Monte Carlo simulation
» simulations estimate event likelihoods
Random sampling
» generate a small random subproblem
» solve, extrapolate to whole problem
Randomized rounding for approximation

Cuts in Graphs

Focus on undirected graphs
A cut is a vertex partition
Value is number (or total weight) of crossing edges

Optimization with Cuts

Cut values determine the solution of many graph optimization problems:
» min-cut / max-flow
» multicommodity flow (sort of)
» bisection / separator
» network reliability
» network design
Randomization helps solve these problems

Presentation Assumption

For this entire presentation, we consider unweighted graphs (all edges have weight/capacity one)
All results apply unchanged to arbitrarily weighted graphs
» Integer weights = parallel edges
» Rational weights scale to integers
» Analysis unaffected
» Some implementation details differ

Basic Probability

Conditional probability
» Pr[A ∩ B] = Pr[A] Pr[B | A]
Independent events multiply:
» Pr[A ∩ B] = Pr[A] Pr[B]
Linearity of expectation:
» E[X + Y] = E[X] + E[Y]
Union bound:
» Pr[X ∪ Y] ≤ Pr[X] + Pr[Y]

Random Selection for Minimum Cuts

Random choices are good when problems are rare

Minimum Cut

Smallest cut of the graph
Cheapest way to separate it into 2 parts
Various applications:
» network reliability (small cuts are weakest)
» subtour elimination constraints for TSP
» separation oracle for network design
Not s-t min-cut

Max-flow/Min-cut

s-t flow: edge-disjoint packing of s-t paths
s-t cut: a cut separating s and t
[FF]: s-t max-flow = s-t min-cut
» max-flow saturates all s-t min-cuts
» most efficient way to find s-t min-cuts
[GH]: min-cut is the “all-pairs” s-t min-cut
» find using n flow computations

Flow Algorithms

Push-relabel [GT]:
» push “excess” around graph till it’s gone
» max-flow in O*(mn) (note: O* hides logs)
  – recent O*(m^(3/2)) [GR]
» min-cut in O*(mn²) --- “harder” than flow
Pipelining [HO]:
» save push/relabel data between flows
» min-cut in O*(mn) --- “as easy” as flow

Contraction

Find an edge that doesn’t cross the min-cut
Contract (merge) its endpoints into 1 vertex

Contraction Algorithm

Repeat n - 2 times:
» find a non-min-cut edge
» contract it (keep parallel edges)
Each contraction decrements the vertex count
At the end, 2 vertices are left
» they define a unique cut
» which corresponds to a min-cut of the starting graph
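A minimal Python sketch of one contraction trial (the function name and edge-list representation are mine, not from the talk; it assumes a connected multigraph on vertices 0..n-1):

    import random

    def contract(edges, n):
        """One trial of the contraction algorithm: merge endpoints of
        random edges until 2 super-vertices remain; return the edges
        crossing the unique remaining cut."""
        parent = list(range(n))            # union-find over super-vertices

        def find(v):
            while parent[v] != v:
                parent[v] = parent[parent[v]]   # path halving
                v = parent[v]
            return v

        edges = list(edges)
        for _ in range(n - 2):             # n - 2 contractions leave 2 vertices
            u, v = random.choice(edges)    # uniformly random surviving edge
            parent[find(u)] = find(v)      # contract its endpoints
            # drop edges that became self-loops; parallel edges survive
            edges = [(a, b) for (a, b) in edges if find(a) != find(b)]
        return edges

Taking min(len(contract(edges, n)) for _ in range(7 * n * n)) then yields the min-cut with the failure probability bounded on the slides that follow.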

Picking an Edge

Must contract only non-min-cut edges
[NI]: O(m)-time algorithm to pick such an edge
» n contractions: O(mn) time for min-cut
» slightly faster than flows
If only we could find an edge faster…
Idea: min-cut edges are few

Randomize

Repeat until 2 vertices remain:
    pick a random edge
    contract it

(keep fingers crossed)

Analysis I

Min-cut is small---few edges
» Suppose the graph has min-cut c
» Then minimum degree is at least c
» Thus at least nc/2 edges
Random edge is probably safe
» Pr[min-cut edge] ≤ c/(nc/2) = 2/n
(easy generalization to the capacitated case)

Analysis II

Algorithm succeeds if it never accidentally contracts a min-cut edge
Contracts the vertex count from n down to 2
When k vertices remain, the chance of error is 2/k
» thus, the chance of being right is 1 - 2/k
Pr[always right] is the product of the probabilities of being right each time

Analysis III

Pr[success] = ∏ Pr[contraction is safe when k vertices remain]
            = (1 - 2/n)(1 - 2/(n-1)) ⋯ (1 - 2/3)
            = ((n-2)/n)((n-3)/(n-1)) ⋯ (2/4)(1/3)
            = 2/n(n-1)
            ≈ 2/n²

…not too good!

Repetition

Repetition amplifies success probability
» basic failure probability is 1 - 2/n²
» so repeat 7n² times

Pr[complete failure] = Pr[fail 7n² times]
                     = (Pr[fail once])^(7n²)
                     = (1 - 2/n²)^(7n²)
                     < 10⁻⁶

How fast?

Easy to perform 1 trial in O(m) time
» just use an array of edges, no fancy data structures
But need n² trials: O(mn²) time
Simpler than flows, but slower

An improvement [KS]

When k vertices remain, the error probability is 2/k
» big when k is small
Idea: once k is small, change the algorithm
» the algorithm needs to be safer
» but it can afford to be slower
Amplify by repetition!
» Repeat the base algorithm many times

Recursive Algorithm

Algorithm RCA(G, n):
{G has n vertices}
repeat twice
    randomly contract G to n/√2 vertices
    RCA(G, n/√2)

(each phase has a 50-50 chance of avoiding the min-cut)
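A Python sketch of RCA, under the same edge-list representation as before; contract_to is my helper that performs one random contraction phase and relabels the result, and the ceiling 1 + n/√2 is the usual target that makes each phase safe with probability about 1/2:

    import math
    import random

    def contract_to(edges, n, target):
        """Randomly contract from n down to `target` super-vertices and
        return the surviving edges relabeled over 0..target-1."""
        parent = list(range(n))
        def find(v):
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v
        edges = list(edges)
        for _ in range(n - target):
            u, v = random.choice(edges)
            parent[find(u)] = find(v)
            edges = [(a, b) for (a, b) in edges if find(a) != find(b)]
        label = {r: i for i, r in enumerate(sorted({find(v) for v in range(n)}))}
        return [(label[find(a)], label[find(b)]) for (a, b) in edges]

    def rca(edges, n):
        """Recursive contraction: contract to ~n/sqrt(2) vertices,
        recurse twice, keep the smaller of the two cuts found."""
        if n <= 6:                          # small base case: direct trials
            return min((contract_to(edges, n, 2) for _ in range(n * n)), key=len)
        target = int(math.ceil(1 + n / math.sqrt(2)))
        return min((rca(contract_to(edges, n, target), target) for _ in range(2)),
                   key=len)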

Main Theorem

On any capacitated, undirected graph, Algorithm RCA
» runs in O*(n²) time with simple structures
» finds the min-cut with probability Ω(1/log n)
Thus, O(log n) repetitions suffice to find the minimum cut (failure probability 10⁻⁶) in O(n² log² n) time.

Proof Outline

Graph has O(n²) (capacitated) edges
So O(n²) work to contract, then two subproblems of size n/√2
» T(n) = 2 T(n/√2) + O(n²) = O(n² log n)
Algorithm fails if both iterations fail
» An iteration succeeds if its contractions and its recursion succeed
» P(n) = 1 - [1 - ½ P(n/√2)]² = Ω(1/log n)

Failure Modes

Monte Carlo algorithms always run fast and probably give you the right answer
Las Vegas algorithms probably run fast and always give you the right answer
To make a Monte Carlo algorithm Las Vegas, need a way to check the answer
» repeat till the answer is right
No fast min-cut check is known (flow is slow!)

How do we verify a minimum cut?

Enumerating Cuts

The probabilistic method, backwards

Cut Counting

Original CA finds any given min-cut with probability at least 2/n(n-1)
Only one cut is found per run
These are disjoint events, so their probabilities add
So there are at most n(n-1)/2 min-cuts
» otherwise the probabilities would sum to more than one
Tight
» a cycle has exactly this many min-cuts
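The tightness claim is easy to verify directly; a small self-contained check (all names mine) that an n-cycle has exactly n(n-1)/2 cuts of value 2:

    from itertools import combinations

    def connected(edges, n):
        """Union-find connectivity check."""
        parent = list(range(n))
        def find(v):
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v
        for u, v in edges:
            parent[find(u)] = find(v)
        return len({find(v) for v in range(n)}) == 1

    n = 7
    cycle = [(i, (i + 1) % n) for i in range(n)]
    # removing any 2 of the n cycle edges disconnects it, so every pair
    # of edges is a min-cut of value 2: n(n-1)/2 of them, matching the bound
    mincuts = sum(not connected([e for e in cycle if e not in pair], n)
                  for pair in combinations(cycle, 2))
    assert mincuts == n * (n - 1) // 2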

Enumeration

RCA as stated has constant probability of finding any given min-cut
If run O(log n) times, the probability of missing a given min-cut drops to 1/n³
But there are only n² min-cuts
So the probability of missing any is at most 1/n
So, with probability 1 - 1/n, we find all of them
» O(n² log³ n) time

Generalization

If G has min-cut c, a cut of value αc is an α-mincut
Lemma: the contraction algorithm finds any given α-mincut with probability Ω(n^(-2α))
» Proof: just add the factor α to the basic analysis
Corollary: O(n^(2α)) α-mincuts
Corollary: can find all of them in O*(n^(2α)) time
» Just change the contraction factor in RCA

Summary

A simple, fast min-cut algorithm
» Random selection avoids rare problems
Generalization to near-minimum cuts
Bound on the number of small cuts
» Probabilistic method, backwards

Network Reliability

Monte Carlo estimation

The Problem

Input:
» Graph G with n vertices
» Edge failure probabilities
  – For simplicity, fix a single p
Output:
» FAIL(p): the probability that G is disconnected by edge failures

Approximation Algorithms

Computing FAIL(p) is #P-complete [V]
» Exact algorithm seems unlikely
Approximation scheme
» Given G, p, ε, outputs an ε-approximation
» May be randomized:
  – succeeds with high probability
» Fully polynomial (FPRAS) if runtime is polynomial in n, 1/ε

Monte Carlo Simulation

Flip a coin for each edge, test the graph
k failures in t trials: FAIL(p) ≈ k/t
E[k/t] = FAIL(p)
How many trials are needed for confidence?
» “bad luck” on trials can yield a bad estimate
» clearly need at least 1/FAIL(p)
Chernoff bound: O*(1/ε²FAIL(p)) trials suffice to give probable accuracy within ε
» Time O*(m/ε²FAIL(p))
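A direct Python sketch of the naive estimator (the representation and names are mine):

    import random

    def estimate_fail(edges, n, p, trials):
        """Naive Monte Carlo: fail each edge independently with
        probability p; report the fraction of trials that disconnect."""
        failures = 0
        for _ in range(trials):
            surviving = [e for e in edges if random.random() >= p]
            parent = list(range(n))        # union-find connectivity test
            def find(v):
                while parent[v] != v:
                    parent[v] = parent[parent[v]]
                    v = parent[v]
                return v
            for u, v in surviving:
                parent[find(u)] = find(v)
            failures += len({find(v) for v in range(n)}) > 1
        return failures / trials           # k/t, unbiased for FAIL(p)

Per the bound quoted above, trials on the order of 1/ε²FAIL(p) (with log factors) make this estimate ε-accurate with high probability.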

Chernoff Bound

Random variables Xᵢ ∈ [0,1]
Sum X = ∑ Xᵢ
Bound deviation from expectation:
Pr[ |X - E[X]| ≥ εE[X] ] < exp(-ε²E[X]/4)
If E[X] ≥ 4(log n)/ε², “tight concentration”
» Deviation by ε has probability < 1/n
No one variable is a big part of E[X]

Application

Let Xᵢ = 1 if trial i is a failure, else 0
Let X = X₁ + … + X_t
Then E[X] = t·FAIL(p)
Chernoff says X is within relative ε of E[X] with probability 1 - exp(-ε² t FAIL(p)/4)
So choose t to cancel the other terms
» “High probability”: t = O(log n / ε²FAIL(p))
» Deviation by ε has probability < 1/n

Review

Contraction Algorithm
» O(n^(2α)) α-mincuts
» Enumerate in O*(n^(2α)) time

Network reliability problem

Random edge failures
» Estimate FAIL(p) = Pr[graph disconnects]
Naïve Monte Carlo simulation
» Chernoff bound---“tight concentration”
  Pr[ |X - E[X]| ≥ εE[X] ] < exp(-ε²E[X]/4)
» O(log n / ε²FAIL(p)) trials expect O(log n / ε²) network failures---good for Chernoff
» So estimate within ε in O*(m/ε²FAIL(p)) time

Rare Events

When FAIL(p) is too small, it takes too long to collect sufficient statistics
Solution: skew the trials to make the interesting event more likely
But in a way that lets you recover the original probability

DNF Counting

Given a DNF formula (OR of ANDs):
(e₁ ∧ e₂ ∧ e₃) ∨ (e₁ ∧ e₄) ∨ (e₂ ∧ e₆)
Each variable is set true with probability p
Estimate Pr[formula true]
» #P-complete
[KL, KLM]: FPRAS
» Skew to make true outcomes “common”
» Time linear in formula size

Rewrite problem

Assume p = 1/2
» Count satisfying assignments
“Satisfaction matrix”
» S_ij = 1 if the ith assignment satisfies the jth clause
We want the number of nonzero rows
Randomly sampling rows won’t work
» Might be too few nonzeros

New sample space

So normalize every nonzero row to sum to one (divide by its number of nonzeros)
» Now the sum of all nonzeros is the desired value
» So it is sufficient to estimate the average nonzero

Sampling Nonzeros

We know the number of nonzeros per column
» to satisfy a given clause, all variables in the clause must be true
» all other variables are unconstrained
Estimate the average by random sampling
» Know the number of nonzeros per column
» So can pick a random column
» Then pick a random true-for-column assignment
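A compact Python sketch of this sampler (the clause representation is my assumption: each clause is a dict mapping a variable index to its forced boolean value; p = 1/2 as on the previous slide):

    import random

    def dnf_count(clauses, n, samples):
        """Estimate the number of satisfying assignments of a DNF formula
        by sampling a uniformly random nonzero of the satisfaction matrix:
        pick a column weighted by its (known) nonzero count, then a random
        satisfying row for that column; average the normalized entries."""
        col = [2 ** (n - len(c)) for c in clauses]   # nonzeros per column
        total = sum(col)                             # all nonzeros in matrix
        acc = 0.0
        for _ in range(samples):
            j = random.choices(range(len(clauses)), weights=col)[0]
            assign = dict(clauses[j])                # clause's forced variables
            for v in range(n):
                if v not in assign:
                    assign[v] = random.random() < 0.5    # free variables
            satisfied = sum(all(assign[x] == b for x, b in c.items())
                            for c in clauses)        # nonzeros in this row
            acc += 1.0 / satisfied                   # normalized entry value
        return total * acc / samples                 # ~ number of nonzero rows

Dividing the result by 2^n converts the count back into Pr[formula true].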

Few Samples Needed

Suppose k clauses. Then E[sample] > 1/k
» 1 ≤ satisfied clauses ≤ k
» 1 ≥ sample value ≥ 1/k
Adding O(k log n / ε²) samples gives a “large” mean
So Chernoff says the sample mean is probably a good estimate

Reliability Connection

Reliability as DNF counting:
» Variable per edge, true if the edge fails
» A cut fails if all its edges do (AND of edge variables)
» The graph fails if some cut does (OR of cuts)
» FAIL(p) = Pr[formula true]
Problem: the DNF has 2ⁿ clauses

Focus on Small Cuts

Fact: FAIL(p) > p^c
Theorem: if p^c = 1/n^(2+δ), then Pr[a cut of value > αc fails] < n^(-αδ)
Corollary: FAIL(p) ≈ Pr[some α-mincut fails], where α = 1 + 2/δ
Recall: O(n^(2α)) α-mincuts
Enumerate with RCA, run DNF counting

Proof of Theorem

Given p^c = 1/n^(2+δ)
At most n^(2α) cuts have value αc
Each fails with probability p^(αc) = 1/n^(α(2+δ))
Pr[any cut of value αc fails] = O(n^(-αδ))
Sum over all α

Algorithm

RCA can enumerate all α-minimum cuts with high probability in O(n^(2α)) time
Given the α-minimum cuts, can ε-estimate the probability that one fails via Monte Carlo simulation for DNF counting (formula size O(n^(2α)))
Corollary: when FAIL(p) < n^(-(2+δ)), can ε-approximate it in O(cn^(2+4/δ)) time

Combine

For large FAIL(p), use naïve Monte Carlo
For small FAIL(p), use RCA/DNF counting
Balance: ε-approximation in O(mn^(3.5)/ε²) time
Implementations show this is practical for hundreds of nodes
Again, no way to verify correctness

Summary

Naïve Monte Carlo simulation works well for common events
Need to adapt for rare events
Cut structure and DNF counting let us do this for network reliability

Random Sampling

More min-cut algorithms

Random Sampling

General tool for faster algorithms:
» pick a small, representative sample
» analyze it quickly (small)
» extrapolate to the original (representative)
Speed-accuracy tradeoff
» a smaller sample means less time
» but also less accuracy

Min-cut Duality

[Edmonds]: min-cut = max tree packing
» convert to a directed graph
» “source” vertex s (doesn’t matter which)
» pack spanning trees directed away from s
[Gabow] “augmenting trees”:
» add a tree in O*(m) time
» min-cut c (via max packing) in O*(mc)
» great if m and c are small…

Example

[Figure: a graph with min-cut 2, packing 2 directed spanning trees; the directed min-cut is also 2]

Sampling for Approximation

Random Sampling

The [Gabow] scheme is great if m, c are small
Random sampling
» reduces m, c
» scales cut values (in expectation)
» if we pick half the edges, we expect half of each cut
So find tree packings and cuts in samples
Problem: maybe some large deviations

Sampling Theorem

Given graph G, build a sample G(p) by including each edge with probability p
A cut of value v in G has expected value pv in G(p)
Definition: “constant” ρ = 8 (ln n) / ε²
Theorem: With high probability, all cuts in G(ρ/c) have (1 ± ε) times their expected values.
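Building G(ρ/c) is a one-liner; a minimal sketch with the theorem's constant (the function and parameter names are mine):

    import math
    import random

    def skeleton(edges, n, c, eps):
        """Sample each edge with probability rho/c, rho = 8 ln n / eps^2,
        so that w.h.p. every cut lands within (1 +/- eps) of its
        expectation (the Sampling Theorem)."""
        rho = 8 * math.log(n) / eps ** 2
        p = min(1.0, rho / c)
        return [e for e in edges if random.random() < p], p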

A Simple Application

[Gabow] packs trees in O*(mc) time
Build G(ρ/c)
» minimum expected cut ρ
» by the theorem, min-cut is probably near ρ
» find its min-cut in O*(m) time using [Gabow]
» it corresponds to a near-minimum cut of G
Result: (1+ε) times min-cut in O*(m) time

Proof of Sampling: Idea

Chernoff bound says the probability of a large deviation in a cut's value is small
Problem: exponentially many cuts. Perhaps some deviate a great deal
Solution: we showed there are few small cuts
» only small cuts are likely to deviate much
» but they are few, so the Chernoff bound applies to all of them

Proof of Sampling

Sampled with probability ρ/c:
» a cut of value αc has mean αρ
» [Chernoff]: it deviates from its expected size by more than ε with probability at most n^(-3α)
At most n^(2α) cuts have value αc
Pr[any cut of value αc deviates] = O(n^(-α))
Sum over all α

Las Vegas Algorithms

Finding Good Certificates

Approximate Tree Packing

Break the edges into c/ρ random groups
Each looks like a sample at rate ρ/c
» O*(mρ/c) edges
» each has min expected cut ρ
» so the theorem says min-cut ≥ (1 - ε)ρ
So each group has a packing of size (1 - ε)ρ
[Gabow] finds it in time O*(m/c) per group
» so the overall time is c · O*(m/c) = O*(m)

Las Vegas Algorithm

The packing algorithm is Monte Carlo
Previously we found an approximate cut (faster)
If the two are close, each “certifies” the other
» the cut is at least the optimum cut
» the packing is at most the optimum cut
If they aren't close, re-run both
Result: Las Vegas, expected time O*(m)

Exact Algorithm

Randomly partition the edges into two groups
» each is like a 1/2-sample: ε = O*(c^(-1/2))
Recursively pack trees in each half
» c/2 - O*(c^(1/2)) trees each
Merge the packings
» gives a packing of size c - O*(c^(1/2))
» augment to a maximum packing: O*(mc^(1/2))
T(m,c) = 2T(m/2, c/2) + O*(mc^(1/2)) = O*(mc^(1/2))

Nearly Linear Time

Analyze Trees

Recall: [G] packs c (directed-)edge-disjoint spanning trees
Corollary: in such a packing, some tree crosses the min-cut only twice
To find the min-cut:
» find a tree packing
» find the smallest cut that ≤ 2 tree edges cross
Problem: the packing takes O*(mc) time

Constraint trees

Min-cut c:
» c directed trees
» 2c directed min-cut edges
» On average, two min-cut edges per tree
Definition: a tree 2-crosses a cut if at most 2 of its edges cross it

Finding the Cut

From the crossing tree edges, deduce the cut
Remove the tree edges
No other tree edges cross
So each component is on one side
And opposite its “neighbor’s” side

Sampling

Solution: use G(ρ/c) with ε = 1/8
» pack O*(ρ) trees in O*(m) time
» the original min-cut has ≈ (1+ε)ρ edges in G(ρ/c)
» some tree 2-crosses it in G(ρ/c)
» …and thus 2-crosses it in G
Analyze the O*(ρ) trees in G
» time O*(m) per tree
» Monte Carlo

Simplify

First discuss the case where only one tree edge crosses the min-cut

Analyzing a tree

Root the tree, so a cut corresponds to a subtree
Use a dynamic program up from the leaves to determine subtree cuts efficiently
Given the cuts at the children of a node, compute the cut at the parent
Definitions:
» v↓ is the set of nodes below v
» C(v) is the value of the cut at subtree v↓

The Dynamic Program

[Figure: a node u with children v and w; the edges with least common ancestor u run between v↓ and w↓ (“keep” vs. “discard”)]

C(u) = C(v) + C(w) - 2 C(v↓, w↓)

where C(v↓, w↓) counts the edges with least common ancestor u.

Algorithm: 1-Crossing Trees

Compute the edges’ LCAs: O(m)
Compute “cuts” at the leaves
» Cut values = degrees
» each edge is incident on at most two leaves
» total time O(m)
Dynamic program upwards: O(n)
Total: O(m + n)

2-Crossing Trees

Cut corresponds to two subtrees v↓ and w↓ (keep/discard):

C(v↓ ∪ w↓) = C(v) + C(w) - 2 C(v↓, w↓)

n² table entries
Fill in O(n²) time with a dynamic program

Linear Time

Bottleneck is the C(v, w) computations
Avoid them: find the right “twin” w for each v

min over w of [ C(v) + C(w) - 2 C(v↓, w↓) ]

Compute using the addpath and minpath operations of dynamic trees [ST]
Result: O(m log³ n) time (messy)

How do we verify a minimum cut?

Network Design

Randomized Rounding

Problem Statement

Given vertices, and a cost c_vw to buy an edge from v to w, find the minimum-cost purchase that creates a graph with the desired connectivity properties
Example: minimum-cost k-connected graph
Generally NP-hard
Recent approximation algorithms [GW], [JV]

Integer Linear Program

Variable x_vw = 1 if we buy edge vw
Solution cost: ∑ x_vw c_vw
Constraint: for every cut, ∑ x_vw ≥ k
Relaxing integrality gives a tractable LP
» Exponentially many cuts
» But separation oracles exist (e.g., min-cut)
What is the integrality gap?

Randomized Rounding

Given LP solution values x_vw
Build a graph where edge vw is present with probability x_vw
Expected cost is at most OPT: ∑ x_vw c_vw
The expected number of edges crossing any cut satisfies that cut's constraint
If the expected number is large for every cut, the sampling theorem applies
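A sketch of the rounding step in Python (x and cost are dicts keyed by edge, my representation; solving the LP itself is assumed done elsewhere):

    import random

    def round_lp(x, cost):
        """Randomized rounding: buy edge e independently with probability
        x[e], its fractional LP value. The expected cost equals the LP
        cost, and every cut's expected crossing count meets its
        constraint, so the sampling theorem bounds the deviation."""
        bought = [e for e in x if random.random() < x[e]]
        return bought, sum(cost[e] for e in bought)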

k-connected subgraph

The fractional solution is k-connected
So every cut has (expected) ≥ k edges crossing it in the rounded solution
Sampling theorem says every cut has at least k - (k log n)^(1/2) edges
Close approximation for large k
Can often repair: e.g., get a k-connected subgraph at cost 1 + ((log n)/k)^(1/2) times the minimum

Nonuniform Sampling

Concentrate on the important things

[Benczur-Karger, Karger, Karger-Levine]

s-t Min-Cuts

Recall: if G has min-cut c, then in G(ρ/c) all cuts approximate their expected values to within ε.

Applications:
» Min-cut in O*(mc) time [G]; approximate/exact in O*((m/c)·c) = O*(m)
» s-t min-cut of value v in O*(mv); approximate in O*(mv/c) time

Trouble if c is small and v is large.

The Problem

Cut sampling relied on the Chernoff bound
Chernoff bounds require that no one edge is a large fraction of the expectation of a cut it crosses
If the sampling rate is 1/c, each edge across a min-cut is too significant
But: if an edge only crosses large cuts, then sampling rate 1/c is OK!

Biased Sampling

The original sampling theorem is weak when
» m is large
» c is small
But if m is large
» then G has dense regions
» where c must be large
» and where we can therefore sample more sparsely

Problem               Old Time      New Time
Approx. s-t min-cut   O*(mn)        O*(n²/ε²)
Approx. s-t max-flow  O*(m^(3/2))   O*(mn^(1/2)/ε)
Flow of value v       O*(mv)        O*(n^(11/9)v)
Approx. bisection     O*(m²)        O*(n²/ε²)

(m → n/ε² in weighted, undirected graphs)

Strong Components

Definition: A k-strong component is a maximal vertex-induced subgraph with min-cut ≥ k.

[Figure: a graph partitioned into strong components of strengths 2, 2, 3, and 3]

Nonuniform Sampling

Definition: An edge is k-strong if its endpoints are in the same k-strong component.
Stricter than having k-connected endpoints.
Definition: The strong connectivity c_e of edge e is the largest k for which e is k-strong.
Plan: sample dense regions lightly

Nonuniform Sampling

Idea: if an edge is k-strong, then it is in a k-connected graph
So it is “safe” to sample it with probability 1/k
Problem: if we sample edges with different probabilities, E[cut value] gets messy
Solution: if we sample e with probability p_e, give it weight 1/p_e
Then E[cut value] = original cut value

Compression Theorem

Definition: Given compression probabilities p_e, the compressed graph G[p_e]
» includes edge e with probability p_e, and
» gives it weight 1/p_e if included
Note E[G[p_e]] = G
Theorem: G[ρ/c_e]
» approximates all cuts by ε
» has O(nρ) edges
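A sketch of compression in Python, assuming the strong connectivities c_e have already been (approximately) computed, as discussed below:

    import math
    import random

    def compress(weighted_edges, strength, n, eps):
        """Build G[rho/c_e]: keep edge (u, v, w) with probability
        p_e = rho/c_e and scale its weight by 1/p_e, so every cut keeps
        its expectation while only O(n rho) edges survive in expectation."""
        rho = 8 * math.log(n) / eps ** 2
        out = []
        for u, v, w in weighted_edges:
            p = min(1.0, rho / strength[(u, v)])   # c_e = strong connectivity
            if random.random() < p:
                out.append((u, v, w / p))          # reweight: E[cut] is exact
        return out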

Proof (approximation)

Basic idea: in a k-strong component, edges get sampled with probability ≥ ρ/k
» the original sampling theorem works there
Problem: some edges may be in stronger components, and so sampled less often
Induct up from the strongest components:
» apply the original sampling theorem inside
» then “freeze” the component so it doesn’t affect weaker parts

Strength Lemma

Lemma: ∑ 1/c_e ≤ n
» Consider a connected component C of G
» Suppose C has min-cut k
» Then every edge e in C has c_e ≥ k
» So the k edges crossing C’s min-cut have ∑ 1/c_e ≤ k·(1/k) = 1
» Delete these edges (“cost” 1)
» Repeat ≤ n - 1 times: no more edges!

Proof (edge count)

Edge e is included with probability ρ/c_e
So the expected number of edges is ∑ ρ/c_e
We saw ∑ 1/c_e ≤ n
So the expected number is at most nρ

Construction

To sample, we must find the edge strengths
» we can’t, but an approximation suffices
Sparse certificates identify weak edges:
» construct one in linear time [NI]
» it contains all edges crossing cuts of size ≤ k
» iterate until strong components emerge
Iterate for the 2^i-strong edges, for all i
» tricks turn it strongly polynomial

Certificate Algorithm

Repeat k times:
» Find a spanning forest
» Delete it
Each iteration deletes one edge from every cut (the forest is spanning)
So at the end, any edge crossing a cut of size ≤ k has been deleted
[NI]: merge all k iterations into O(m) time
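A straightforward (slow) Python rendering of the loop; [NI]'s trick merges all k rounds into a single O(m) pass, which this sketch does not attempt:

    def sparse_certificate(edges, n, k):
        """Peel off k spanning forests; their union contains every edge
        that crosses any cut of size <= k (the certificate)."""
        remaining = list(edges)
        certificate = []
        for _ in range(k):
            parent = list(range(n))          # union-find for one forest
            def find(v):
                while parent[v] != v:
                    parent[v] = parent[parent[v]]
                    v = parent[v]
                return v
            forest, rest = [], []
            for u, v in remaining:
                if find(u) != find(v):
                    parent[find(u)] = find(v)
                    forest.append((u, v))    # edge joins two components
                else:
                    rest.append((u, v))
            certificate.extend(forest)
            remaining = rest
        return certificate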

Flows

Uniform sampling led to flow algorithms
» Randomly partition the edges
» Merge the flows found in each partition element
Compression is problematic
» Edge capacities are changed
» So flow path capacities are distorted
» A flow in the compressed graph doesn’t fit in the original graph

Smoothing

If an edge has strength c_e, divide it into ρ/c_e parallel edges of capacity c_e/ρ
» Creates ∑ ρ/c_e ≤ nρ edges
Now each edge is only a 1/ρ fraction of any cut of its strong component
So sampling a 1/ρ fraction works
So dividing into groups works
Yields a (1-ε) max-flow in O*((mn)^(1/2)·v) time

Cleanup

The approximate max-flow can be made exact with augmenting paths
Integrality problems
» Augmenting paths are fast for small integer flows
» But the breakup done by smoothing ruins integrality
Surmountable
» Flows in the dense and sparse parts are separable
Result: max-flow in O*(n^(11/9)v) time

Proof By Picture

[Figure sequence, each panel showing a graph from source s to sink t: (1) the dense regions; (2) compress the dense regions; (3) solve the sparse flow; (4) replace the dense parts, keeping the flow in the sparse bits; (5) “fill in” the dense parts]

Conclusions

Conclusion

Randomization is a crucial tool for algorithm design
It often yields algorithms that are faster or simpler than their traditional counterparts
In particular, it gives significant improvements for core problems in graph algorithms

Randomized Methods

Random selection
» if most candidate choices are “good”, then a random choice is probably good
Monte Carlo simulation
» simulations estimate event likelihoods
Random sampling
» generate a small random subproblem
» solve, extrapolate to the whole problem
Randomized rounding for approximation

Random Selection

When most choices are good, make one at random
Recursive contraction algorithm for minimum cuts
» Extremely simple (also to implement)
» Fast in theory and in practice [CGKLS]

Monte Carlo

To estimate an event's likelihood, run trials
Slow for very rare events
Bias the samples to reveal the rare event
FPRAS for network reliability

Random Sampling

Generate a representative subproblem
Use it to estimate the solution to the whole problem
» Gives an approximate solution
» May be quickly repaired to an exact solution
Bias the sample toward the “important” or “sensitive” parts of the problem
New max-flow and min-cut algorithms

Randomized Rounding

Convert fractional solutions to integral ones
Get approximation algorithms for integer programs
“Sampling” from a well-designed sample space of feasible solutions
Good approximations for network design

Generalization

Our techniques work because undirected graphs are matroids
All our results extend to, or are special cases of, matroid problems:
» Packing bases
» Finding minimum “quotients”
» Matroid optimization (MST)

Directed Graphs?

Directed graphs are not matroids
Directed graphs can have lots of minimum cuts
Sampling doesn’t appear to work
Residual graphs for flows are directed
» This precludes obvious recursive solutions to flow problems

Open problems

Flow in O(nv) time (i.e., replace m by n)
» Eliminate the v dependence
» Apply to weighted graphs with large flows
» Flow in O*(m) time?
Las Vegas algorithms
» Finding good certificates
Deterministic algorithms
» Deterministic construction of “samples”
» Deterministically compress a graph

Randomization in Graph Optimization Problems

David Karger
MIT
http://theory.lcs.mit.edu/~karger
[email protected]