Doubling Dimension: a short survey Anupam Gupta Carnegie Mellon University Barriers in Computational Complexity II, CCI, Princeton

Metric space M = (V, d)

(finite) set V of points

symmetric, non-negative distances d(x,y)

triangle inequality: d(x,y) ≤ d(x,z) + d(z,y)

(Figure: three points x, y, z.)
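
A minimal sketch of checking these axioms on a small finite point set. The helper names `is_metric` and `euclid` and the sample points are illustrative assumptions, not part of the talk.

```python
from itertools import product

def is_metric(points, d, tol=1e-12):
    """True if d is symmetric, non-negative, and obeys the triangle inequality."""
    for x, y in product(points, repeat=2):
        if d(x, y) < -tol or abs(d(x, y) - d(y, x)) > tol:
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, y) > d(x, z) + d(z, y) + tol:
            return False
    return True

def euclid(p, q):
    """Euclidean distance between two coordinate tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

pts = [(0, 0), (1, 0), (2, 0), (0, 1)]
print(is_metric(pts, euclid))                           # True
print(is_metric(pts, lambda p, q: euclid(p, q) ** 2))   # False
```

Squared Euclidean distance fails because the three collinear points violate the triangle inequality: 4 > 1 + 1.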

doubling dimension

Dimension dim_D(M) is the smallest k such that every set S with diameter D_S can be covered by 2^k sets of diameter ½ D_S.

λ = 2^(dim_D) = doubling constant
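
A rough sketch of estimating this quantity on a finite metric. Note the assumptions: it uses the ball-based variant (cover each ball B(x, r) by balls of radius r/2), which can differ from the diameter-based definition above by a constant factor in the dimension, and it uses a greedy cover, so it only gives an upper-bound-style estimate. The function names are hypothetical.

```python
import math

def greedy_half_cover(ball, d, r):
    """Greedily choose centers so every point of `ball` is within r/2 of one."""
    centers = []
    for p in ball:
        if all(d(p, c) > r / 2 for c in centers):
            centers.append(p)
    return centers

def doubling_dim_estimate(points, d):
    """log2 of the largest greedy half-radius cover over all balls B(x, r)."""
    radii = sorted({d(x, y) for x in points for y in points if x != y})
    worst = 1
    for x in points:
        for r in radii:
            ball = [y for y in points if d(x, y) <= r]
            worst = max(worst, len(greedy_half_cover(ball, d, r)))
    return math.log2(worst)

# 16 evenly spaced points on a line: the estimate stays bounded by 2.
line = list(range(16))
print(doubling_dim_estimate(line, lambda a, b: abs(a - b)))

# Equidistant metric on 8 points: every point needs its own half-radius ball.
uniform = list(range(8))
print(doubling_dim_estimate(uniform, lambda a, b: 0 if a == b else 1))  # 3.0
```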

doubling generalizes geometric dimension

Take k-dimensional Euclidean space R^k

Claim: dim_D(R^k) = Θ(k)

Easy to see for boxes: 2^3 boxes cover a larger box in R^3.

The argument for spheres is a bit more involved.
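
The box case can be made concrete with a short sketch. The function name `halve_box` is an assumption for illustration; it splits an axis-aligned box into 2^k sub-boxes of half the side length (and therefore half the diameter).

```python
from itertools import product

def halve_box(lo, hi):
    """Split the box [lo, hi] in R^k into 2^k boxes with half the side length."""
    mid = [(a + b) / 2 for a, b in zip(lo, hi)]
    pieces = []
    for choice in product((0, 1), repeat=len(lo)):
        # For each axis, take either the lower half or the upper half.
        sub_lo = tuple(l if c == 0 else m for l, m, c in zip(lo, mid, choice))
        sub_hi = tuple(m if c == 0 else h for m, h, c in zip(mid, hi, choice))
        pieces.append((sub_lo, sub_hi))
    return pieces

pieces = halve_box((0, 0, 0), (1, 1, 1))
print(len(pieces))  # 8
```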

facts about doubling

The notion of doubling dimension behaves smoothly under metric distortion

definition closed under taking submetrics

jargon: “doubling” = family of metrics with doubling dimension bounded by some absolute constant c independent of n.

Suppose a metric (X,d) has doubling dimension k.

If any subset S ⊆ X of points has all inter-point distances lying between δ and Δ

then |S| ≤ (2Δ/δ)^k

useful property of doubling

Proof: recursively apply the definition…
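
The recursion can be spelled out as follows. The integer t (number of times the definition is applied) is an auxiliary symbol introduced here, not taken from the slides.

```latex
\begin{align*}
&\operatorname{diam}(S) \le \Delta
  \;\Longrightarrow\; S \text{ is covered by } 2^{kt} \text{ sets of diameter} \le \Delta / 2^{t}
  \quad \text{(apply the definition $t$ times)} \\
&t = \lfloor \log_2(\Delta/\delta) \rfloor + 1
  \;\Longrightarrow\; \Delta / 2^{t} < \delta
  \;\Longrightarrow\; \text{each covering set contains at most one point of } S \\
&|S| \;\le\; 2^{kt} \;\le\; 2^{k\,(\log_2(\Delta/\delta) + 1)} \;=\; \left(\frac{2\Delta}{\delta}\right)^{k}
\end{align*}
```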

(Figure: this 2-dimensional set has O((Δ/δ)^2) points.)

what is not a doubling metric?

The equidistant metric U_n on n points has doubling dimension Θ(log n)

Hence low doubling dimension captures the fact that the metric does not have large (near)-equidistant metrics.

the picture thus far…

(Figure: the class of metrics with doubling dimension k contains Euclidean dimension Θ(k); metrics with >> 2^k nearly-equidistant points lie outside it.)

btw, just to check

Natural Q: Do all doubling metrics embed into ℓ2 with distortion O(1)?

No.

The Laakso fractals require Ω(√(log n)) distortion to embed into ℓ2 with any number of dimensions. [GKL’03]

In fact, the right behavior is Θ(√(dim_D log n)) [KLMN’04, ABN’05, JLM’09]

Many geometric algorithms can be extended to doubling spaces…

Near neighbor search
Compact routing
Distance labeling
Network triangulation
Sensor placements
Small-world networks
Traveling Salesman
Sparse Spanners
Approx. inference
Network Design
Clustering problems
Well-separated pair decomposition
Data structures
Learnability

a substantial(?) generalization

(Figure: doubling dimension k generalizes Euclidean dimension Θ(k).)

example application

Assign labels L(x) to each host x in a metric space.
Looking just at L(x) and L(y), can infer distance d(x,y).

Results

labels with (O(1)/ε)^dim × log n bits
estimates within (1 + ε) factor

Contrast with lower bound of n-bit labels in general for any factor < 2

(Figure: hosts x and y carry bit-string labels such as 010001 and 110001; a function f of the two labels returns ≈ d(x,y).)

[Arora 95] showed that TSP on R^k was (1+ε)-approximable in time

[Talwar 04] extended the first result to metrics with doubling dimension k

another example

Can we get the PTAS as well?

example in action: sparse spanners for doubling metrics

spanners

Given a metric M = (V, d), a graph G = (V, E) is an (m, ε)-spanner if
1) number of edges in G is m
2) d(x,y) ≤ d_G(x,y) ≤ (1 + ε) d(x,y)

A reasonable goal: ε = 0.1, m = O(n)

Fact: For the equidistant metric U_n, if ε < 1 then G = K_n
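
A small sketch for checking condition 2) and the Fact about U_n. The helper names `graph_distances` and `stretch` are illustrative assumptions; graph distances are computed with plain Floyd-Warshall, which is fine for small examples.

```python
from itertools import combinations

def graph_distances(n, edges):
    """All-pairs shortest paths for weighted undirected edges (u, v, w)."""
    inf = float("inf")
    dist = [[0 if i == j else inf for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        dist[u][v] = dist[v][u] = min(dist[u][v], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

def stretch(n, d, edges):
    """Largest ratio d_G(x, y) / d(x, y) over all pairs of distinct points."""
    dg = graph_distances(n, edges)
    return max(dg[i][j] / d(i, j) for i, j in combinations(range(n), 2))

n = 5
uniform = lambda i, j: 0 if i == j else 1
complete = [(i, j, 1) for i, j in combinations(range(n), 2)]
print(stretch(n, uniform, complete))      # 1.0
print(stretch(n, uniform, complete[1:]))  # 2.0
```

Dropping even one edge of K_n forces a two-hop detour between its endpoints, so the stretch jumps to 2 and no ε < 1 can be met.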

spanners for doubling metrics

Theorem: Given any metric M, and any ε < ½, we can efficiently find a spanner G with stretch (1 + ε)

and number of edges m = n (1 + 1/ε)^O(dim_D(M))

Hence, for doubling metrics, linear-sized spanners!

Generalizes a similar theorem for Euclidean metrics.

standard tool: nets

Nets: A set of points N is an r-net of a set S if
– d(u,v) ≥ r for any distinct u,v ∈ N
– For every w ∈ S \ N, there is a u ∈ N with d(u,w) < r

Fact: If a metric has doubling dim k and N is an r-net

⇒ B(x,2r) ∩ N has O(1)^k points.
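
A minimal sketch of one standard way to build an r-net: a single greedy scan. The name `greedy_net` and the grid example are assumptions for illustration; the talk does not say how nets are computed.

```python
import math
from itertools import combinations

def greedy_net(points, d, r):
    """Scan once, keeping each point at distance >= r from all kept points."""
    net = []
    for p in points:
        if all(d(p, q) >= r for q in net):
            net.append(p)
    return net

grid = [(x, y) for x in range(8) for y in range(8)]
r = 2
net = greedy_net(grid, math.dist, r)

# Packing: net points are pairwise at least r apart.
assert all(math.dist(u, v) >= r for u, v in combinations(net, 2))
# Covering: every other point has a net point at distance < r.
assert all(any(math.dist(u, w) < r for u in net) for w in grid if w not in net)
# Local sparsity: how many net points fall in any ball B(x, 2r).
print(max(sum(1 for u in net if math.dist(x, u) <= 2 * r) for x in grid))
```

A rejected point was rejected because some kept point lies within distance r of it, which is exactly the covering condition.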

recursive nets

Suppose all the points are at least unit distance apart.

Take a 2-net N_1 of these points. Now take a 4-net N_2 of this net.

And so on…

(Figure: nested nets at scales 2, 4, 8, 16.)

recursive nets

N_0 = V

N_t is a 2^t-net of the set N_(t-1)

(Figure: nested nets N_1, N_2, N_3, N_4.)

N_t is a 2^(t+1)-net of the set V (almost)

the spanner construction

Connect each net point in N_t to other net points at distance at most O(1/ε) · 2^t
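
A sketch of the construction on a small example, under stated assumptions: nets are built greedily, points are at least unit distance apart, and the hidden constant in O(1/ε) is replaced by a hypothetical parameter `reach_factor`. The value 6 below is chosen only to keep the demo small and is not claimed to give a (1 + ε) guarantee; the code measures the stretch rather than asserting one.

```python
import math
from itertools import combinations

def greedy_net(candidates, d, r):
    """Keep candidates at distance >= r from all previously kept ones."""
    net = []
    for p in candidates:
        if all(d(p, q) >= r for q in net):
            net.append(p)
    return net

def net_spanner(n, d, reach_factor):
    """Edges (p, q) joining points of N_t with d(p, q) <= reach_factor * 2^t."""
    edges = set()
    level, t = list(range(n)), 0              # N_0 = V
    while True:
        reach = reach_factor * 2 ** t
        edges.update((p, q) for p, q in combinations(level, 2) if d(p, q) <= reach)
        if len(level) <= 1:
            break
        t += 1
        level = greedy_net(level, d, 2 ** t)  # N_t is a 2^t-net of N_(t-1)
    return edges

def max_stretch(n, d, edges):
    """Largest ratio d_G(x, y) / d(x, y), with d_G from Floyd-Warshall."""
    dist = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for p, q in edges:
        dist[p][q] = dist[q][p] = d(p, q)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return max(dist[i][j] / d(i, j) for i, j in combinations(range(n), 2))

pts = [(x, y) for x in range(10) for y in range(10)]  # unit-spaced grid
d = lambda i, j: math.dist(pts[i], pts[j])
edges = net_spanner(len(pts), d, reach_factor=6)
print(len(edges), "edges out of", len(pts) * (len(pts) - 1) // 2)
print("measured stretch:", max_stretch(len(pts), d, edges))
```

Any `reach_factor` of at least 2 keeps the graph connected, since each point of N_(t-1) is within 2^t of some point of N_t and both belong to N_(t-1).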

the number of edges

Number of points in N_t within O(1/ε) · 2^t of some net point: at most O(1/ε)^k

Number of levels = O(log diameter)

Number of nodes in net at each level ≤ n

Hence, number of edges ≤ n × log diameter × O(1/ε)^k

Can be improved to n × O(1/ε)^k

the stretch factor

spanners for doubling metrics

Theorem: Given any metric M, and any ε < ½, we can efficiently find an (m, ε)-spanner G with

number of edges m = n (1 + 1/ε)^O(dim_D(M))

Hence, for doubling metrics, linear-sized spanners!

example in action: TSP for doubling metrics

plan of attack

We have PTASs for TSP for points in constant-dimensional ℓ2.

If we could embed doubling metrics into constant-dimensional ℓ2 in a way that maintains distances to within (1+ε) (in expectation), we’d be done.

completely ridiculous strategy, but maybe we’ll get somewhere.

embedding doubling trees into ℓ2

Recall: embedding doubling metrics into ℓ2 requires Ω(√(log n)) distortion, regardless of dimension.

however…

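Theorem: if a doubling metric is also a tree metric, it embeds into ℓ2 with distortion O(1) and dimension O(1), where the hidden constants depend only on the doubling constant λ (the slide annotates them as O(1)^poly(λ) and poly(λ)).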

embedding doubling metrics into doubling trees

Bad news: 2-d grids require Ω(log n) distortion to embed into distributions over trees

Good news: All doubling metrics embed into distributions over doubling trees with distortion O(log n).

plan of attack (revised)

We have PTASs for TSP for points in constant-dimensional ℓ2.

If we could embed doubling metrics into constant-dimensional ℓ2 in a way that maintains distances to within (1+ε) (in expectation), we’d be done.

Arora’s simpler TSP idea

Given any TSP tour of length L in d-dim space, find B = (log n/δ)^d portals in each cluster

and show there exists a portal-respecting tour which increases length by ≤ δ L

Now dynamic program to find best portal-respecting tour

⇒ runtime ~ (n log n) · B^B

define portals, choosing δ = ε/O(log n)

OPT tour of length L* in original doubling metric embeds into O(1)-dim space with length L = O(log n) L*

increase in length = δ L = ε L*

and now find the best portal-respecting tour in original doubling metric!
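
The parameter choice can be checked line by line. Here c is a hypothetical constant standing in for the O(log n) embedding loss, and d = O(1) is the dimension of the target space; the last line assumes the runtime factor reads B^B.

```latex
\begin{align*}
L &\le c \log n \cdot L^{*}
  && \text{(length of OPT after embedding)} \\
\delta &= \frac{\varepsilon}{c \log n}
  && \text{(choice of portal parameter)} \\
\delta L &\le \frac{\varepsilon}{c \log n} \cdot c \log n \cdot L^{*} = \varepsilon L^{*}
  && \text{(extra length of the portal-respecting tour)} \\
B &= \left(\frac{\log n}{\delta}\right)^{d} = \left(\frac{c \log^{2} n}{\varepsilon}\right)^{d}
  && \text{(portals per cluster, polylogarithmic in } n\text{)} \\
B^{B} &= 2^{B \log_2 B} = 2^{\mathrm{polylog}(n)}
  && \text{(quasi-polynomial for fixed } d \text{ and } \varepsilon\text{)}
\end{align*}
```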

recap for TSP

embedded doubling metric randomly into doubling trees

embedded those into constant-dimensional ℓ2

use that to find clusters/portals and claim existence of (1+ε) OPT tour

find best tour in original metric using dynamic programming.

Talwar’s algorithm does it better, dependence on dim_D, not on λ

low dimensional embeddings (and dimensionality reduction)

dimensionality reduction

If a Euclidean metric embeds into R^k for some dimension k with distortion O(1), the Euclidean metric has doubling dimension O(k).

Conversely, we want to efficiently find a Euclidean embedding into R^O(k) with distortion O(1).

We just saw: embed any metric with doubling dimension k into a distribution over 2^O(k)-dimensional ℓ1 spaces with distortion O(log n) · 2^O(k).

Better: O(k)-dimensional ℓ2 space with distortion O*(log n).

a more general bound

Example Theorem: Any metric with doubling dimension dim_D embeds into Euclidean space with T dimensions with distortion O(log n · √(dim_D / T))

(where T ∈ [dim_D log log n, log n])

All these techniques are ultimately limited by the fact that they embed all doubling metrics, and not just Euclidean ones.
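
A quick worked evaluation at the two ends of the allowed range for T. This assumes the distortion bound has the form O(log n · √(dim_D / T)); that form is an assumption here, and it is consistent with the Θ(√(dim_D log n)) bound quoted earlier in the talk.

```latex
\begin{align*}
T = \log n: \quad
  & \log n \cdot \sqrt{\frac{\dim_D}{\log n}} = \sqrt{\dim_D \cdot \log n} \\
T = \dim_D \log\log n: \quad
  & \log n \cdot \sqrt{\frac{\dim_D}{\dim_D \log\log n}} = \frac{\log n}{\sqrt{\log\log n}}
\end{align*}
```

Since dim_D ≤ log n for every n-point metric, the first line is never more than log n.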

special cases of interest

Columns: distortion using O(dim_D(M)) Euclidean dimensions; distortion using O(log n) Euclidean dimensions
Rows: general metrics; Euclidean metrics

Annotations on the table entries:

This generalizes the result we talked about in Lecture #2: any metric embeds into Euclidean space with O(log n) distortion.

This is just the Johnson-Lindenstrauss lemma.

If the metric is doubling, this quantity is √(log n).

In general, this is never more than O(log n).

Again generalizes the previous result.

weaken requirements?

Low-dimensional projection preserving near-neighbors: O(log dim_D · poly(ε^-1)) dimension random projection [IN05?]

(random projections also work for points on smooth manifolds)

Give low-dim set of points approximating d(x,y)^0.99

Again, can get similar dimensionality… [GK10, BRS10]

one more useful tool..

Given a metric M, want to partition it randomly into pieces of “small” diameter such that “nearby” vertices lie in different pieces only with “small” probability.

“random metric decompositions”

“padded” decompositions

A metric (V,d) admits β-padded decompositions if, for every Δ, we can output a random partition

V = V_1 ⊎ V_2 ⊎ … ⊎ V_k

such that

1. each V_j has diameter ≤ Δ

2. Pr[ B(x,ρ) split ] ≤ β · ρ/Δ
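
A sketch of one well-known way to produce a random partition with pieces of diameter at most Δ, in the style of Calinescu, Karloff and Rabani: pick a random radius and a random order of centers, and let each point join the first center within that radius. This is a swapped-in illustration, not necessarily the construction behind the theorem in the talk, and the code only checks the diameter condition, not the padding probability. The name `random_partition` is hypothetical.

```python
import math
import random
from itertools import combinations

def random_partition(points, d, delta, rng=random):
    """Random partition of `points` into clusters of diameter <= delta."""
    radius = rng.uniform(delta / 4, delta / 2)   # random scale in [delta/4, delta/2]
    order = list(points)
    rng.shuffle(order)                           # random priority over centers
    clusters = {}
    for p in points:
        # p itself is in `order` at distance 0, so a center always exists.
        center = next(c for c in order if d(p, c) <= radius)
        clusters.setdefault(center, []).append(p)
    return list(clusters.values())

rng = random.Random(0)
grid = [(x, y) for x in range(6) for y in range(6)]
delta = 4
parts = random_partition(grid, math.dist, delta, rng=rng)
print(len(parts), "pieces")
```

Every point of a cluster is within the chosen radius of its center, so any two points of a cluster are at distance at most twice the radius, which is at most Δ.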

the facts

Thm: Doubling metrics admit O(dim_D)-padded decompositions

Useful wherever padded decompositions are useful

E.g.: can prove that all doubling metrics embed into ℓ2 with distortion O(√(dim_D log n))

last slide: some questions

For specific metric space problems, can we match the performance for their geometric counterparts?

Which problems admit algorithms whose performance can be parameterized using such a notion of dimension?

Other notions of dimension that are algorithmically significant?

thank you!
