Sketching and Embedding are Equivalent for Norms
Alexandr Andoni (Simons Institute)
Robert Krauthgamer (Weizmann Institute)
Ilya Razenshteyn (CSAIL MIT)
Sketching
• Compress a massive object to a small sketch
• Rich theories: high-dimensional vectors, matrices, graphs
• Similarity search, compressed sensing, numerical linear algebra
• Dimension reduction (Johnson, Lindenstrauss 1984): a random projection onto a low-dimensional subspace preserves distances
[Figure: n points in d dimensions mapped to a low-dimensional subspace]
When is sketching possible?
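The distance-preservation claim is easy to check numerically. A minimal sketch, assuming NumPy; the dimensions and the 1/√k scaling of the Gaussian matrix are illustrative choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, k = 100, 1000, 200   # n points in d dimensions, projected down to k
X = rng.normal(size=(n, d))

# Random Gaussian projection, scaled so squared lengths are preserved
# in expectation -- one standard realization of a JL-style projection.
G = rng.normal(size=(d, k)) / np.sqrt(k)
Y = X @ G

# Compare one pairwise distance before and after projection.
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
ratio = proj / orig
print(ratio)  # close to 1
```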
Similarity search
• Motivation: similarity search
• Model similarity as a metric
• Sketching may speed up computation and allow indexing
• Interesting metrics:
  • Euclidean ℓ2: d(x, y) = (∑i |xi – yi|^2)^{1/2}
  • Manhattan, Hamming ℓ1: d(x, y) = ∑i |xi – yi|
  • ℓp distances: d(x, y) = (∑i |xi – yi|^p)^{1/p} for p ≥ 1
  • Edit distance, Earth Mover’s Distance, etc.
Sketching metrics
• Alice and Bob each hold a point from a metric space (say x and y)
• Both send s-bit sketches to Charlie
• For r > 0 and D > 1, Charlie must distinguish:
  • d(x, y) ≤ r
  • d(x, y) ≥ Dr
• Shared randomness, 1% probability of error allowed
• Trade-off between s and D
[Figure: Alice (holding x) and Bob (holding y) send bit-string sketches to Charlie, who must decide: d(x, y) ≤ r or d(x, y) ≥ Dr?]
Near Neighbor Search via sketches
• Near Neighbor Search (NNS):
  • Given an n-point dataset P
  • A query q within r of some data point
  • Return any data point within Dr of q
• Sketches of size s imply NNS with space n^{O(s)} and a 1-probe query
• Proof idea: amplify the probability of error to 1/n by increasing the sketch size to O(s log n); then the sketch of q determines the answer
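The amplification step can be illustrated with a toy simulation, assuming NumPy; the per-comparison success probability 0.9 and the repetition count are illustrative stand-ins, not the actual sketching protocol:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model: a single sketch comparison is assumed correct with
# probability 0.9. A majority vote over t = Theta(log n) independent
# repetitions drives the failure probability below 1/n, so a union
# bound over all n data points lets the sketch of q determine the answer.
p_correct = 0.9
n = 1000
t = 51                       # odd number of repetitions, Theta(log n)

trials = 100_000
votes = rng.random((trials, t)) < p_correct   # True = repetition correct
err = 1.0 - (votes.sum(axis=1) > t / 2).mean()
print(err)  # empirically far below 1/n
```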
Sketching the real line
• Distinguish |x – y| ≤ 1 vs. |x – y| ≥ 1 + ε
• Randomly shifted pieces of size w = 1 + ε/2
• Repeat O(1/ε^2) times
• Overall:
  • D = 1 + ε
  • s = O(1/ε^2)
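One common way to instantiate this scheme (an assumption here, since the slide only names the shifted pieces) labels each piece with a pseudorandom ±1 bit and compares labels: close pairs land in the same piece more often, so their bits agree noticeably more often than far pairs'. A sketch, assuming NumPy and a simple pairwise-independent hash for the piece labels:

```python
import numpy as np

rng = np.random.default_rng(2)

eps = 0.5
w = 1 + eps / 2            # randomly shifted pieces of width 1 + eps/2
reps = 40_000              # O(1/eps^2) suffices in theory; large for a clear gap
P = 2_147_483_647          # prime for a simple pairwise-independent hash

shifts = rng.uniform(0, w, size=reps)
a = rng.integers(1, P, size=reps)
b = rng.integers(0, P, size=reps)

def sketch(x):
    # Piece index of x in each randomly shifted grid, then a
    # pseudorandom bit per piece via a pairwise-independent hash.
    cells = np.floor((x + shifts) / w).astype(np.int64)
    return ((a * cells + b) % P) % 2

x = 0.0
agree_close = (sketch(x) == sketch(1.0)).mean()      # |x - y| <= 1
agree_far = (sketch(x) == sketch(1 + eps)).mean()    # |x - y| >= 1 + eps
print(agree_close, agree_far)  # agree_close noticeably above agree_far
```

Close pairs share a piece with constant probability (in which case the bits always agree), while far pairs here never share a piece, so their bits agree only about half the time; O(1/ε^2) repetitions separate the two agreement rates.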
Sketching ℓp for 0 < p ≤ 2
• (Indyk 2000): can reduce sketching of ℓp with 0 < p ≤ 2 to sketching reals via random projections
• If G1, G2, …, Gd are i.i.d. N(0, 1), then ∑i xiGi – ∑i yiGi is distributed as ‖x – y‖2 · N(0, 1)
• For 0 < p < 2, use p-stable distributions instead
• Again, get D = 1 + ε with s = O(1/ε^2)
• For p > 2, sketching ℓp is hard: to achieve D = O(1), one needs sketch size s = Θ̃(d^{1–2/p}) (Bar-Yossef, Jayram, Kumar, Sivakumar 2002), (Indyk, Woodruff 2005)
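The 2-stability fact can be checked numerically: the median of |⟨x – y, G⟩| over independent Gaussian G recovers ‖x – y‖2 up to the known median of |N(0, 1)| ≈ 0.6745. A sketch, assuming NumPy; the dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

d, k = 500, 10_000
x = rng.normal(size=d)
y = rng.normal(size=d)

# 2-stability: <x, G> - <y, G> = <x - y, G> is distributed as
# ||x - y||_2 * N(0, 1) for Gaussian G, so a median over k
# independent projections estimates ||x - y||_2.
G = rng.normal(size=(k, d))
proj = G @ (x - y)
est = np.median(np.abs(proj)) / 0.6745   # median of |N(0,1)| ~= 0.6745
true = np.linalg.norm(x - y)
print(est / true)  # close to 1
```

For 0 < p < 2 the same estimator works with p-stable samples in place of the Gaussians (e.g. Cauchy for p = 1).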
Anything else?
• A map f: X → Y is an embedding with distortion D′ if, for all a, b ∈ X:
  dX(a, b) / D′ ≤ dY(f(a), f(b)) ≤ dX(a, b)
• Reductions for geometric problems
• If Y has s-bit sketches for approximation D, then X gets sketches of s bits and approximation DD′
Metrics with good sketches
• A metric X admits sketches with s, D = O(1) if:
  • X = ℓp for p ≤ 2, or
  • X embeds into ℓp for p ≤ 2 with distortion O(1)
• Are there any other metrics with efficient sketches (D and s both O(1))?
• We don’t know!
• Some new techniques waiting to be discovered?
• No new techniques?!
The main result
If a normed space X admits sketches of size s and approximation D, then for every ε > 0 the space X embeds (linearly) into ℓ1 – ε with distortion O(sD / ε)
[Diagram: embedding into ℓp, p ≤ 2 ⇒ efficient sketches (Kushilevitz, Ostrovsky, Rabani 1998), (Indyk 2000); the theorem above supplies the converse arrow for norms]
• A vector space X with ‖·‖: X → R≥0 is a normed space if:
  • ‖x‖ = 0 iff x = 0
  • ‖αx‖ = |α|·‖x‖
  • ‖x + y‖ ≤ ‖x‖ + ‖y‖
• Every norm gives rise to a metric: define d(x, y) = ‖x - y‖
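A quick numerical spot-check of the axioms, using as the example the trace norm (sum of singular values), which reappears later as Example 2. A sketch, assuming NumPy; the matrix sizes and scalar are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)

def nuclear_norm(A):
    # Trace (nuclear) norm: sum of the singular values of A.
    return np.linalg.svd(A, compute_uv=False).sum()

# Spot-check homogeneity and the triangle inequality on random matrices.
A = rng.normal(size=(5, 5))
B = rng.normal(size=(5, 5))
alpha = -2.5

homog = np.isclose(nuclear_norm(alpha * A), abs(alpha) * nuclear_norm(A))
triangle = nuclear_norm(A + B) <= nuclear_norm(A) + nuclear_norm(B) + 1e-9
print(homog, triangle)
```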
Sanity check
• ℓp spaces: p > 2 is hard, 1 ≤ p ≤ 2 is easy, p < 1 is not a norm
• Can classify mixed norms ℓp(ℓq): in particular, ℓ1(ℓ2) is easy, while ℓ2(ℓ1) is hard! (Jayram, Woodruff 2009), (Kalton 1985)
• A non-example: edit distance is not a norm, and its sketchability is largely open (Ostrovsky, Rabani 2005), (Andoni, Jayram, Pătraşcu 2010)
No embeddings → no sketches
• In the contrapositive: if a normed space does not embed into ℓ1 – ε, then it does not have good sketches
• Can convert sophisticated non-embeddability results into lower bounds for sketches
Example 1: the Earth Mover’s Distance
• For x ∈ R^{[Δ]×[Δ]} with ∑i,j xi,j = 0, define the Earth Mover’s Distance ‖x‖EMD as the cost of the best transportation of the positive part of x to the negative part (the Monge–Kantorovich norm)
• Best upper bounds:
  • D = O(1/ε) and s = Δ^ε (Andoni, Do Ba, Indyk, Woodruff 2009)
  • D = O(log Δ) and s = O(1) (Charikar 2002), (Indyk, Thaper 2003), (Naor, Schechtman 2005)
No embedding into ℓ1 – ε with distortion O(1) (Naor, Schechtman 2005) ⇒ no sketches with D = O(1) and s = O(1)
Example 2: the Trace Norm
• For an n × n matrix A, define the Trace Norm (the Nuclear Norm) ‖A‖ to be the sum of its singular values
• Previously: lower bounds only for certain restricted classes of sketches (Li, Nguyen, Woodruff 2014)
Any embedding into ℓ1 requires distortion Ω(n^{1/2}) (Pisier 1978) ⇒ any sketch must satisfy sD = Ω(n^{1/2} / log n)
The plan of the proof
If a normed space X admits sketches of size s and approximation D, then for every ε > 0 the space X embeds (linearly) into ℓ1 – ε with distortion O(sD / ε)
[Proof plan: sketches ⇒ weak embedding into ℓ2 (via information theory) ⇒ linear embedding into ℓ1 – ε (via nonlinear functional analysis)]
A map f: X → Y is (s1, s2, τ1, τ2)-threshold if:
• dX(x1, x2) ≤ s1 implies dY(f(x1), f(x2)) ≤ τ1
• dX(x1, x2) ≥ s2 implies dY(f(x1), f(x2)) ≥ τ2
Here the weak embedding is a (1, O(sD), 1, 10)-threshold map from X to ℓ2
Sketch → Threshold map
X has a sketch of size s and approximation D
⇓ (proved in the contrapositive)
There is a (1, O(sD), 1, 10)-threshold map from X to ℓ2

Contrapositive chain:
• No (1, O(sD), 1, 10)-threshold map from X to ℓ2
• ⇒ Poincaré-type inequalities on X (via convex duality)
• ⇒ every sketch of ℓ∞^k(X), the direct sum with ‖(x1, …, xk)‖ = maxi ‖xi‖, with approximation Θ(sD) has size Ω(k)
• ⇒ X has no sketches of size s and approximation D, by the direct sum theorem for information complexity (Andoni, Jayram, Pătraşcu 2010)
Sketching direct sums
X has sketches of size s and approximation D ⇒ ℓ∞^k(X) has sketches of size O(s) and approximation Dk
Protocol: Alice holds (a1, a2, …, ak), Bob holds (b1, b2, …, bk); using shared random signs (σ1, σ2, …, σk) ∈ {±1}^k, they send sketch(∑i σi ai) and sketch(∑i σi bi)
• Upper bound (always): ‖∑i σi(ai – bi)‖ ≤ ∑i ‖ai – bi‖ ≤ k · maxi ‖ai – bi‖
• Lower bound (with probability ≥ 1/2): maxi ‖ai – bi‖ ≤ ‖∑i σi(ai – bi)‖
• Crucially uses the linear structure of X (not enough to be just a metric!)
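The two bounds can be checked empirically in ℓ2 (an illustrative choice of norm; the argument works in any normed space): the upper bound is the triangle inequality and holds for every sign pattern, while the lower bound should hold for at least half of the sign patterns. A sketch, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(5)

k, d = 8, 20
A = rng.normal(size=(k, d))   # Alice's points a_1, ..., a_k
B = rng.normal(size=(k, d))   # Bob's points b_1, ..., b_k
diffs = A - B

row = np.linalg.norm(diffs, axis=1)   # ||a_i - b_i|| for each i
upper = row.sum()                     # sum_i ||a_i - b_i||
lower = row.max()                     # max_i ||a_i - b_i||

trials = 2000
hits = 0
for _ in range(trials):
    sigma = rng.choice([-1.0, 1.0], size=k)
    combined = np.linalg.norm(sigma @ diffs)   # ||sum_i sigma_i (a_i - b_i)||
    # Upper bound: triangle inequality, holds for every sign pattern.
    assert combined <= upper + 1e-9
    # Lower bound: should hold with probability >= 1/2 over the signs.
    if combined >= lower:
        hits += 1
print(hits / trials)
</```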
Threshold map → linear embedding
(1, O(sD), 1, 10)-threshold map from X to ℓ2
⇓ (the new step)
Uniform embedding into ℓ2: g: X → ℓ2 s.t. L(‖x1 – x2‖) ≤ ‖g(x1) – g(x2)‖ ≤ U(‖x1 – x2‖), where:
• L and U are non-decreasing
• L(t) > 0 for t > 0
• U(t) → 0 as t → 0
⇓ (Aharoni, Maurey, Mityagin 1985), (Nikishin 1973)
Linear embedding into ℓ1 – ε with distortion O(sD / ε)
Threshold map → uniform embedding
• A map f: X → ℓ2 such that:
  • ‖x1 – x2‖ ≤ 1 implies ‖f(x1) – f(x2)‖ ≤ 1
  • ‖x1 – x2‖ ≥ Θ(sD) implies ‖f(x1) – f(x2)‖ ≥ 10
• Building on (Johnson, Randrianarivony 2006):
  • Take a 1-net N of X and make f Lipschitz on N
  • Extend f from N to a Lipschitz function on the whole of X
Open problems
• Extend to as general a class of metrics as possible
• Connection to linear sketches?
  • Sketches of the form x → Ax
  • Conjecture: sketches of size s and approximation D can be converted to linear sketches with f(s) measurements and approximation g(D)
• Spaces that admit no non-trivial sketches (s = Ω(d) for D = O(1)): is there anything besides ℓ∞?
• Can one strengthen the theorem to “sketchability implies embeddability into ℓ1”?
  • Equivalent to an old open problem from functional analysis
• Sketches imply NNS; is there a reverse implication?