
Page 1: Fast Regression Algorithms Using Spectral Graph Theory

Fast Regression Algorithms Using Spectral Graph Theory

Richard Peng

Page 2: Fast Regression Algorithms Using Spectral Graph Theory

OUTLINE

• Regression: why and how
• Spectra: fast solvers
• Graphs: tree embeddings

Page 3: Fast Regression Algorithms Using Spectral Graph Theory

LEARNING / INFERENCE

Find (hidden) pattern in (noisy) data

Input signal, s → Output

Page 4: Fast Regression Algorithms Using Spectral Graph Theory

REGRESSION

• p ≥ 1: convex
• Convex constraints, e.g. linear equalities

Minimize: |x|_p

Subject to: constraints on x


Page 5: Fast Regression Algorithms Using Spectral Graph Theory

APPLICATION 0: LASSO

Widely used in practice:
• Structured output
• Robust to noise

[Tibshirani `96]: Min |x|_1 s.t. Ax = s
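As an illustration only (my addition, not from the slides), the LASSO formulation above can be prototyped in a few lines; cvxpy and the random A, s below are assumptions, and any LP-capable solver would do.

# Sketch: min |x|_1 subject to Ax = s (the formulation above), with cvxpy as a stand-in solver.
# A and s are random placeholders, not data from the talk.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, d = 20, 50                          # toy sizes: 20 measurements, 50 unknowns
A = rng.standard_normal((n, d))        # placeholder measurement matrix
s = A @ rng.standard_normal(d)         # placeholder observed signal

x = cp.Variable(d)
problem = cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == s])
problem.solve()
print(problem.value)                   # optimal |x|_1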

Page 6: Fast Regression Algorithms Using Spectral Graph Theory

APPLICATION 1: IMAGES

No bears were harmed in the making of these slides

Poisson image processing

Min Σ_{i~j ∈ E} (x_i - x_j - s_{i~j})^2
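A hedged sketch of how this objective is minimized: its normal equations form a graph Laplacian system, so one Laplacian solve recovers x. The tiny 4-cycle, its s values, and the use of scipy's conjugate gradient are illustrative assumptions, not part of the talk.

# Sketch: min Σ (x_i - x_j - s_ij)^2 over edges reduces to the Laplacian system (B^T B) x = B^T s,
# where B is the edge-vertex incidence matrix. The toy graph and s below are made up.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]          # a 4-cycle standing in for an image grid
s = np.array([1.0, 0.5, -1.0, -0.5])              # target difference s_ij for each edge
n, m = 4, len(edges)

rows = np.repeat(np.arange(m), 2)
cols = np.array(edges).ravel()
vals = np.tile([1.0, -1.0], m)
B = sp.csr_matrix((vals, (rows, cols)), shape=(m, n))   # row e has +1 at i, -1 at j

L = B.T @ B                                       # this is exactly the graph Laplacian
x, info = spla.cg(L + 1e-9 * sp.eye(n), B.T @ s)  # tiny shift: L is singular on the constant vector
print(x - x.mean())                               # solution is unique up to an additive constant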

Page 7: Fast Regression Algorithms Using Spectral Graph Theory

APPLICATION 2: MIN CUT

Remove fewest edges to separate vertices s and t

Min Σ_{ij ∈ E} |x_i - x_j|

s.t. x_s = 0, x_t = 1

(Figure: vertices on the s side labeled 0, vertices on the t side labeled 1.)

Fractional solution = integral solution
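For concreteness (again my addition), here is the same L1 formulation on a 4-vertex toy graph; cvxpy is an assumed stand-in for the specialized algorithms discussed later in the talk.

# Sketch: min cut as L1 regression, min Σ |x_i - x_j| over edges, s.t. x_s = 0, x_t = 1.
# Tiny hand-built graph with s = vertex 0 and t = vertex 3; the LP optimum equals the min cut (2 here).
import cvxpy as cp

edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
n = 4
x = cp.Variable(n)
cut = sum(cp.abs(x[i] - x[j]) for i, j in edges)
prob = cp.Problem(cp.Minimize(cut), [x[0] == 0, x[3] == 1])
prob.solve()
print(prob.value)          # 2.0: the fractional optimum matches the integral min cut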

Page 8: Fast Regression Algorithms Using Spectral Graph Theory

REGRESSION ALGORITHMS

Convex optimization:
• 1940~1960: simplex, tractable
• 1960~1980: ellipsoid, poly time
• 1980~2000: interior point, efficient

Õ(m^1/2) interior point steps
• m = # non-zeros
• Õ hides log factors


Page 9: Fast Regression Algorithms Using Spectral Graph Theory

EFFICIENCY MATTERS

• m > 10^6 for most images
• Even bigger (10^9):
  • Videos
  • 3D medical data

Page 10: Fast Regression Algorithms Using Spectral Graph Theory

Õ(m^1/2)

KEY SUBROUTINE

Each step of interior point algorithms finds a step direction by solving a linear system Ax = b

Page 11: Fast Regression Algorithms Using Spectral Graph Theory

MORE REASONS FOR FAST SOLVERS

[Boyd-Vandenberghe `04], Figure 11.20: "The growth in the average number of Newton iterations (on randomly generated SDPs) … is very small"

Page 12: Fast Regression Algorithms Using Spectral Graph Theory

LINEAR SYSTEM SOLVERS

• [1st century CE] Gaussian elimination: O(m^3)
• [Strassen `69]: O(m^2.8)
• [Coppersmith-Winograd `90]: O(m^2.3755)
• [Stothers `10]: O(m^2.3737)
• [Vassilevska Williams `11]: O(m^2.3727)

Total: > m^2

Page 13: Fast Regression Algorithms Using Spectral Graph Theory

NOT FAST NOT USED:

• Preferred in practice: coordinate descent, subgradient methods
• Solution quality traded for time

Page 14: Fast Regression Algorithms Using Spectral Graph Theory

FAST GRAPH-BASED L2 REGRESSION [SPIELMAN-TENG `04]

Input: Linear system Ax = b where A is related to a graph
Output: Solution to Ax = b
Runtime: Nearly linear, Õ(m)

More in 12 slides

Page 15: Fast Regression Algorithms Using Spectral Graph Theory

GRAPHS USING ALGEBRA

Fast convergence + low cost per step = state-of-the-art algorithms

Ax=b

Page 16: Fast Regression Algorithms Using Spectral Graph Theory

LAPLACIAN PARADIGM

[Daitch-Spielman `08]: min-cost flow
[Christiano-Kelner-Mądry-Spielman-Teng `11]: approx maximum flow / min cut

Ax=b

Page 17: Fast Regression Algorithms Using Spectral Graph Theory

EXTENSION 1

[Chin-Mądry-Miller-P `12]: regression, image processing, grouped L2

Page 18: Fast Regression Algorithms Using Spectral Graph Theory

EXTENSION 2

[Kelner-Miller-P `12]: k-commodity flow
Dual: k-variate labeling of graphs


Page 19: Fast Regression Algorithms Using Spectral Graph Theory

EXTENSION 3

[Miller-P `13]: faster for structured images / separable graphs

Page 20: Fast Regression Algorithms Using Spectral Graph Theory

NEED: FAST LINEAR SYSTEM SOLVERS

Implication of fast solvers:
• Fast regression routines
• Parallel, work-efficient graph algorithms

minimize Ax=b

Page 21: Fast Regression Algorithms Using Spectral Graph Theory

OTHER APPLICATIONS

• [Tutte `66]: planar embedding
• [Boman-Hendrickson-Vavasis `04]: PDEs
• [Orecchia-Sachdeva-Vishnoi `12]: balanced cut / graph separator

Page 22: Fast Regression Algorithms Using Spectral Graph Theory

OUTLINE

• Regression: why and how
• Spectra: linear system solvers
• Graphs: tree embeddings

Page 23: Fast Regression Algorithms Using Spectral Graph Theory

PROBLEM

Given: matrix A, vector b
Size of A:
• n-by-n
• m non-zeros

Ax=b

Page 24: Fast Regression Algorithms Using Spectral Graph Theory

SPECIAL STRUCTURE OF A

A = Deg – Adj
• Deg: diag(degree)
• Adj: adjacency matrix

[Gremban-Miller `96]: extensions to SDD matrices


A_ij = deg(i) if i = j
A_ij = -w(ij) otherwise
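A minimal sketch of this structure, assuming a small made-up weighted graph; it just builds Deg and Adj explicitly and checks the defining property.

# Sketch: the graph Laplacian A = Deg - Adj for a small weighted graph (weights are made up).
import numpy as np

n = 4
w = {(0, 1): 2.0, (1, 2): 1.0, (2, 3): 3.0, (0, 3): 1.0}   # edge -> weight
Adj = np.zeros((n, n))
for (i, j), wij in w.items():
    Adj[i, j] = Adj[j, i] = wij
Deg = np.diag(Adj.sum(axis=1))
A = Deg - Adj                           # A_ii = deg(i),  A_ij = -w(ij) for i != j
print(np.allclose(A @ np.ones(n), 0))   # rows sum to zero: the all-ones vector is in the null space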

Page 25: Fast Regression Algorithms Using Spectral Graph Theory

UNSTRUCTURED GRAPHS

• Social networks
• Intermediate systems of other algorithms are almost adversarial

Page 26: Fast Regression Algorithms Using Spectral Graph Theory

NEARLY LINEAR TIME SOLVERS [SPIELMAN-TENG `04]

Input: n-by-n graph Laplacian A with m non-zeros, vector b
Where: b = Ax for some x
Output: Approximate solution x' s.t. |x - x'|_A < ε|x|_A
Runtime: Nearly linear, O(m log^c n log(1/ε)) expected

• Runtime is cost per bit of accuracy
• Error in the A-norm: |y|_A = √(y^T A y)
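The guarantee is stated in the A-norm; the snippet below (an illustration, with scipy's conjugate gradient standing in for the nearly linear time solver) shows how that error measure is computed on a toy path-graph Laplacian.

# Sketch: measuring solver error in the A-norm, |y|_A = sqrt(y^T A y).
# scipy's CG is only a generic stand-in here, not the Spielman-Teng solver.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def a_norm(y, A):
    return np.sqrt(y @ (A @ y))

# Laplacian of the path 0-1-2-3, plus a tiny shift so the system is nonsingular
L = sp.diags([[-1, -1, -1], [1, 2, 2, 1], [-1, -1, -1]], offsets=[-1, 0, 1], format="csr")
A = L + 1e-8 * sp.eye(4)
x_true = np.array([0.0, 1.0, -1.0, 0.5])
b = A @ x_true
x_approx, _ = spla.cg(A, b)
print(a_norm(x_true - x_approx, A) / a_norm(x_true, A))   # this ratio is the ε in the guarantee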

Page 27: Fast Regression Algorithms Using Spectral Graph Theory

HOW MANY LOGS

Runtime: O(m log^c n log(1/ε))

Value of c: I don't know

[Spielman]: c ≤ 70
[Koutis]: c ≤ 15
[Miller]: c ≤ 32
[Teng]: c ≤ 12
[Orecchia]: c ≤ 6

When n = 10^6, log^6 n > 10^6

Page 28: Fast Regression Algorithms Using Spectral Graph Theory

PRACTICAL NEARLY LINEAR TIME SOLVERS [KOUTIS-MILLER-P `10]

Input: n-by-n graph Laplacian A with m non-zeros, vector b
Where: b = Ax for some x
Output: Approximate solution x' s.t. |x - x'|_A < ε|x|_A
Runtime: O(m log^2 n log(1/ε))

• Runtime is cost per bit of accuracy
• Error in the A-norm: |y|_A = √(y^T A y)

Page 29: Fast Regression Algorithms Using Spectral Graph Theory

PRACTICAL NEARLY LINEAR TIME SOLVERS [KOUTIS-MILLER-P `11]

Input: n-by-n graph Laplacian A with m non-zeros, vector b
Where: b = Ax for some x
Output: Approximate solution x' s.t. |x - x'|_A < ε|x|_A
Runtime: O(m log n log(1/ε))

• Runtime is cost per bit of accuracy
• Error in the A-norm: |y|_A = √(y^T A y)

Page 30: Fast Regression Algorithms Using Spectral Graph Theory

STAGES OF THE SOLVER

• Iterative methods
• Spectral sparsifiers
• Low stretch spanning trees

Page 31: Fast Regression Algorithms Using Spectral Graph Theory

ITERATIVE METHODS

Numerical analysis: can solve systems in A by iteratively solving a spectrally similar, but easier, B
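A hedged sketch of that idea: preconditioned Richardson iteration, where each step does one "easy" solve with B. This is a generic textbook iteration, not the recursive solver from the papers; the matrices below are dense toys of my choosing.

# Sketch: solve A x = b by iterating with an easier, spectrally similar B:
#     x <- x + B^{-1} (b - A x)
# If A ≼ B ≼ kA, the error contracts by roughly a (1 - 1/k) factor per step.
import numpy as np

def preconditioned_richardson(A, B, b, iters=200):
    x = np.zeros_like(b, dtype=float)
    for _ in range(iters):
        x = x + np.linalg.solve(B, b - A @ x)   # one "easy" solve with B per iteration
    return x

# toy SDD system: a path-graph Laplacian plus the identity; B = 2*diag(A) is a crude but
# spectrally similar "easier" matrix for diagonally dominant A
A = np.array([[ 2., -1.,  0.,  0.],
              [-1.,  3., -1.,  0.],
              [ 0., -1.,  3., -1.],
              [ 0.,  0., -1.,  2.]])
B = 2.0 * np.diag(np.diag(A))
b = np.array([1.0, 0.0, -1.0, 2.0])
x = preconditioned_richardson(A, B, b)
print(np.linalg.norm(A @ x - b))       # residual should be near machine precision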

Page 32: Fast Regression Algorithms Using Spectral Graph Theory

WHAT IS SPECTRALLY SIMILAR?

A ≺ B ≺ kA for some small k

• Ideas from scalars hold!
• A ≺ B: for any vector x, |x|_A^2 ≤ |x|_B^2

[Vaidya `91]: Since A is the Laplacian of a graph G, B should be the Laplacian of a graph H!
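As a purely illustrative aside (my addition), spectral dominance is just a statement about quadratic forms, so it can be spot-checked numerically with random vectors; the helper below is not a proof, only a sanity check.

# Sketch: A ≺ B means x^T A x <= x^T B x for every x; a randomized spot check.
import numpy as np

def appears_dominated(A, B, trials=1000, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x = rng.standard_normal(A.shape[0])
        if x @ (A @ x) > x @ (B @ x) + 1e-9:
            return False                 # found a violating direction
    return True

# example: a graph Laplacian is dominated by 2x itself
L = np.array([[1., -1., 0.], [-1., 2., -1.], [0., -1., 1.]])
print(appears_dominated(L, 2 * L))       # True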

Page 33: Fast Regression Algorithms Using Spectral Graph Theory

‘EASIER’ H

Goal: H with fewer edges that’s similar to G

Ways of being easier:
• Fewer vertices
• Fewer edges

Can reduce vertex count if edge count is small

Page 34: Fast Regression Algorithms Using Spectral Graph Theory

GRAPH SPARSIFIERS

Sparse equivalents of graphs that preserve something

• Spanners: distance, diameter
• Cut sparsifiers: all cuts
• What we need: spectrum

Page 35: Fast Regression Algorithms Using Spectral Graph Theory

WHAT WE NEED: ULTRASPARSIFIERS

[Spielman-Teng `04]: ultrasparsifiers with n - 1 + O(m log^p n / k) edges imply solvers with O(m log^p n) running time.

• Given: G with n vertices, m edges, parameter k
• Output: H with n vertices, n - 1 + O(m log^p n / k) edges
• Goal: G ≺ H ≺ kG

Page 36: Fast Regression Algorithms Using Spectral Graph Theory

EXAMPLE: COMPLETE GRAPH

O(n log n) random edges (with scaling) suffice w.h.p.

Page 37: Fast Regression Algorithms Using Spectral Graph Theory

GENERAL GRAPH SAMPLING MECHANISM

• For each edge e, flip a coin with Pr(keep) = P(e)
• Rescale kept edges to maintain expectation

Number of edges kept (in expectation): ∑_e P(e)

Also need to prove concentration
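A small sketch of this mechanism; the dictionary format, the toy graph, and the probabilities below are placeholder assumptions.

# Sketch: keep each edge e independently with probability P(e) and rescale its weight by 1/P(e),
# so the expected weight of every edge (and hence the expected quadratic form) is unchanged.
import random

def sample_graph(weights, P, seed=0):
    """weights, P: dicts mapping an edge (u, v) to its weight / keep-probability."""
    random.seed(seed)
    kept = {}
    for e, w in weights.items():
        if random.random() < min(1.0, P[e]):
            kept[e] = w / min(1.0, P[e])   # rescale to maintain expectation
    return kept

# expected number of kept edges is sum_e P(e); concentration needs a separate argument
weights = {(0, 1): 1.0, (1, 2): 2.0, (2, 0): 1.0}
P = {(0, 1): 0.9, (1, 2): 0.5, (2, 0): 0.5}
print(sample_graph(weights, P))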

Page 38: Fast Regression Algorithms Using Spectral Graph Theory

EFFECTIVE RESISTANCE

• View the graph as a circuit
• R(u,v): pass 1 unit of current from u to v, measure the resistance of the circuit


Page 39: Fast Regression Algorithms Using Spectral Graph Theory

EE101

Effective resistance in general: solve Gx = e_uv, where e_uv is the indicator vector of the pair (+1 at u, -1 at v); then R(u,v) = x_u – x_v.
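A direct (and slow) rendering of this recipe for small dense Laplacians, using the pseudoinverse in place of a fast solver; the Laplacian G below is a placeholder example.

# Sketch: effective resistance via one Laplacian solve, exactly as stated above:
# solve G x = e_uv (with +1 at u, -1 at v) and read off R(u, v) = x_u - x_v.
import numpy as np

def effective_resistance(G, u, v):
    e_uv = np.zeros(G.shape[0])
    e_uv[u], e_uv[v] = 1.0, -1.0
    x = np.linalg.pinv(G) @ e_uv        # pinv stands in for a fast Laplacian solver
    return x[u] - x[v]

# path graph 0-1-2 with unit weights: R(0, 2) = 1 + 1 = 2 (the series rule on the next slide)
G = np.array([[1., -1., 0.], [-1., 2., -1.], [0., -1., 1.]])
print(effective_resistance(G, 0, 2))    # 2.0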


Page 40: Fast Regression Algorithms Using Spectral Graph Theory

(REMEDIAL?) EE101

• Single edge: R(e) = 1/w(e)
• Series: R(u, v) = R(e_1) + … + R(e_l)

Example: a single edge u-v of weight w_1 has R(u, v) = 1/w_1; a u-v path through two edges of weights w_1, w_2 has R(u, v) = 1/w_1 + 1/w_2.

Page 41: Fast Regression Algorithms Using Spectral Graph Theory

SPECTRAL SPARSIFICATION BY EFFECTIVE RESISTANCE

[Spielman-Srivastava `08]: Setting P(e) to W(e)·R(e)·O(log n) gives G ≺ H ≺ 2G*

*Ignoring probabilistic issues

[Foster `49]: ∑_e W(e)R(e) = n - 1, so the spectral sparsifier has O(n log n) edges

Ultrasparsifier? Solver???
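A quick numerical illustration of Foster's identity above on a small random weighted graph (the sizes and weights are arbitrary choices of mine).

# Sketch: check ∑_e W(e) R(e) = n - 1 on a random weighted complete graph.
import numpy as np

rng = np.random.default_rng(0)
n = 6
W = np.triu(rng.uniform(0.5, 2.0, (n, n)), 1)     # random weights above the diagonal
W = W + W.T
L = np.diag(W.sum(axis=1)) - W                    # weighted graph Laplacian
Lplus = np.linalg.pinv(L)

total = 0.0
for u in range(n):
    for v in range(u + 1, n):
        e = np.zeros(n)
        e[u], e[v] = 1.0, -1.0
        total += W[u, v] * (e @ Lplus @ e)        # W(e) times effective resistance R(e)
print(total)                                      # ≈ n - 1 = 5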

Page 42: Fast Regression Algorithms Using Spectral Graph Theory

THE CHICKEN AND EGG PROBLEM

How to find effective resistance?

[Spielman-Srivastava `08]: computing them uses a solver
[Spielman-Teng `04]: building a solver needs a sparsifier

Page 43: Fast Regression Algorithms Using Spectral Graph Theory

OUR WORK AROUND

• Use upper bounds of effective resistance, R'(u,v)
• Modify the problem

Page 44: Fast Regression Algorithms Using Spectral Graph Theory

RAYLEIGH’S MONOTONICITY LAW

Rayleigh’s Monotonicity Law: R(u, v) can only increase when edges are removed


Calculate effective resistance w.r.t. a tree T

Page 45: Fast Regression Algorithms Using Spectral Graph Theory

SAMPLING PROBABILITIES ACCORDING TO TREE

Sample Probability: edge weight times effective resistance of tree path


Goal: small total stretch

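A literal little sketch of this quantity; the adjacency-dict tree format and the BFS helper are conventions of mine, not from the talk.

# Sketch: stretch of an off-tree edge (u, v) of weight w = w * (effective resistance of its tree path)
#       = w * sum of 1/w(f) over tree edges f on the unique u-v path in T.
from collections import deque

def tree_path_resistance(tree, u, v):
    """tree: adjacency dict {vertex: [(neighbour, weight), ...]} describing a spanning tree."""
    parent = {u: (None, None)}
    queue = deque([u])
    while queue:                          # BFS from u; the u-v path in a tree is unique
        a = queue.popleft()
        for b, w in tree[a]:
            if b not in parent:
                parent[b] = (a, w)
                queue.append(b)
    r, node = 0.0, v
    while parent[node][0] is not None:    # walk from v back up to u, adding resistances 1/w
        a, w = parent[node]
        r += 1.0 / w
        node = a
    return r

def stretch(tree, u, v, w_uv):
    return w_uv * tree_path_resistance(tree, u, v)

# unit-weight example: path 0-1-2 as the tree, off-tree edge (0, 2) has stretch 1 * (1 + 1) = 2
T = {0: [(1, 1.0)], 1: [(0, 1.0), (2, 1.0)], 2: [(1, 1.0)]}
print(stretch(T, 0, 2, 1.0))              # 2.0, matching the example two slides ahead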

Page 46: Fast Regression Algorithms Using Spectral Graph Theory

GOOD TREES EXIST

Every graph has a spanning tree with total stretch O(m log n)*

So ∑_e W(e)R'(e) = O(m log n), and sampling keeps O(m log^2 n) edges: still too many!

*Hiding log log n

More in 12 slides (again!)

Page 47: Fast Regression Algorithms Using Spectral Graph Theory

‘GOOD’ TREE???

Unit weight case: stretch ≥ 1 for all edges


Stretch = 1+1 = 2

Page 48: Fast Regression Algorithms Using Spectral Graph Theory

WHAT ARE WE MISSING?

Need:
• G ≺ H ≺ kG
• n - 1 + O(m log^p n / k) edges

Generated:
• G ≺ H ≺ 2G
• n - 1 + O(m log^2 n) edges

Haven't used k!

Page 49: Fast Regression Algorithms Using Spectral Graph Theory

USE K, SOMEHOW

• Tree is good!
• Increase weights of tree edges by a factor of k


G ≺ G’ ≺ kG

Page 50: Fast Regression Algorithms Using Spectral Graph Theory

RESULT

• Tree heavier by a factor of k
• Tree effective resistances decrease by a factor of k


Stretch = 1/k + 1/k = 2/k

Page 51: Fast Regression Algorithms Using Spectral Graph Theory

NOW SAMPLE?

Expected in H:
• Tree edges: n - 1
• Off-tree edges: O(m log^2 n / k)

Total: n - 1 + O(m log^2 n / k)

Page 52: Fast Regression Algorithms Using Spectral Graph Theory

BUT WE CHANGED G!

G ≺ G’ ≺ kG
G’ ≺ H ≺ 2G’

So G ≺ H ≺ 2kG

Page 53: Fast Regression Algorithms Using Spectral Graph Theory

WHAT WE NEED: ULTRASPARSIFIERS

[Spielman-Teng `04]: ultrasparsifiers with n - 1 + O(m log^p n / k) edges imply solvers with O(m log^p n) running time.

• Given: G with n vertices, m edges, parameter k
• Output: H with n vertices, n - 1 + O(m log^p n / k) edges
• Goal: G ≺ H ≺ kG

Achieved: G ≺ H ≺ 2kG with n - 1 + O(m log^2 n / k) edges

Page 54: Fast Regression Algorithms Using Spectral Graph Theory

• Input: Graph Laplacian G
• Compute low stretch tree T of G
• T ← (log^2 n) · T
• H ← G + T
• H ← Sample_T(H)
• Solve G by iterating on H and solving recursively, but reuse T

PSEUDOCODE OF O(M LOG N) SOLVER

Page 55: Fast Regression Algorithms Using Spectral Graph Theory

EXTENSIONS / GENERALIZATIONS

• [Koutis-Levin-P `12]: sparsify mildly dense graphs in O(m) time
• [Miller-P `12]: general matrices: find a ‘simpler’ matrix that’s similar, in O(m + n^(2.38+a)) time


Page 56: Fast Regression Algorithms Using Spectral Graph Theory

SUMMARY OF SOLVERS

• Spectral graph theory allows one to find similar, easier-to-solve graphs
• Backbone: good trees


Page 57: Fast Regression Algorithms Using Spectral Graph Theory

SOLVERS USING GRAPH THEORY

Fast solvers for graph Laplacians use combinatorial graph theory

Ax=b

Page 58: Fast Regression Algorithms Using Spectral Graph Theory

OUTLINE

• Regression: why and how
• Spectra: linear system solvers
• Graphs: tree embeddings

Page 59: Fast Regression Algorithms Using Spectral Graph Theory

LOW STRETCH SPANNING TREE

Sampling probability: edge weight times effective resistance of tree path
Unit weight case: length of tree path

Low stretch spanning tree: small total stretch

Page 60: Fast Regression Algorithms Using Spectral Graph Theory

DIFFERENT THAN USUAL TREES

n^(1/2)-by-n^(1/2) unit weighted mesh

The ‘haircomb’ is both a shortest path tree and a max weight spanning tree, but:
• some edges have stretch(e) = O(1)
• other edges have stretch(e) = Ω(n^(1/2))
• total stretch = Ω(n^(3/2))

Page 61: Fast Regression Algorithms Using Spectral Graph Theory

A BETTER TREE FOR THE GRID

Recursive ‘C’

Page 62: Fast Regression Algorithms Using Spectral Graph Theory

LOW STRETCH SPANNING TREES

[Elkin-Emek-Spielman-Teng `05], [Abraham-Bartal-Neiman `08]: Any graph has a spanning tree with total stretch O(m log n)*

*Hiding log log n

Page 63: Fast Regression Algorithms Using Spectral Graph Theory

ISSUE: RUNNING TIME

Algorithms given by [Elkin-Emek-Spielman-Teng `05] and [Abraham-Bartal-Neiman `08] take O(n log^2 n + m log n) time

Reason: O(log n) shortest path computations

Page 64: Fast Regression Algorithms Using Spectral Graph Theory

SPEED UP

[Koutis-Miller-P `11]:
• Round edge weights to powers of 2
• k = log n, total work = O(m log n)

[Orlin-Madduri-Subramani-Williamson `10]: Shortest path on graphs with k distinct weights can run in O(m log_(m/n) k) time

Hiding log log n factors; we actually improve these bounds

Page 65: Fast Regression Algorithms Using Spectral Graph Theory

• [Blelloch-Gupta-Koutis-Miller-P-Tangwongsan `11]: current framework parallelizes to O(m^(1/3+a)) depth
• Combined with the Laplacian paradigm: fast parallel graph algorithms


PARALLEL ALGORITHM?

Page 66: Fast Regression Algorithms Using Spectral Graph Theory

• Before this work: parallel time > state of the art sequential time

• Our result: parallel work close to sequential, and O(m^(2/3)) time

PARALLEL GRAPH ALGORITHMS?

Page 67: Fast Regression Algorithms Using Spectral Graph Theory

FUNDAMENTAL PROBLEM

Long-standing open problem: theoretical speedups for BFS / shortest path in directed graphs

Sequential algorithms are too fast!

Page 68: Fast Regression Algorithms Using Spectral Graph Theory

First step of the framework by [Elkin-Emek-Spielman-Teng `05]: shortest path

PARALLEL ALGORITHM?

Page 69: Fast Regression Algorithms Using Spectral Graph Theory

• Workaround: use the earlier algorithm by [Alon-Karp-Peleg-West `95]
• Idea: repeated clustering
• Based on ideas from [Cohen `93, `00] for approximating shortest paths

PARALLEL TREE EMBEDDING

Page 70: Fast Regression Algorithms Using Spectral Graph Theory

PARALLEL TREE EMBEDDING

Page 71: Fast Regression Algorithms Using Spectral Graph Theory

THE BIG PICTURE

• Need fast linear system solvers for graph regression
• Need combinatorial graph algorithms for fast solvers


Page 72: Fast Regression Algorithms Using Spectral Graph Theory

ONGOING / FUTURE WORK

• Better regression?
• Faster/parallel solver?
• Sparse approximate (pseudo) inverse?
• Other types of systems?

Page 73: Fast Regression Algorithms Using Spectral Graph Theory

THANK YOU!

Questions?