Sketching and Embedding are Equivalent for Norms
Alexandr Andoni (Simons Institute)
Robert Krauthgamer (Weizmann Institute)
Ilya Razenshteyn (CSAIL MIT)
Sketching
• Compress a massive object to a small sketch
• Rich theories: high-dimensional vectors, matrices, graphs
• Similarity search, compressed sensing, numerical linear algebra
• Dimension reduction (Johnson, Lindenstrauss 1984): a random projection onto a low-dimensional subspace preserves distances
[Figure: n points in d dimensions mapped to a low-dimensional subspace]
When is sketching possible?
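The distance-preservation claim is easy to check numerically. A minimal sketch, assuming NumPy; the dimensions and the 1/√k scaling of the Gaussian matrix are illustrative choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, k = 100, 1000, 200   # n points in d dimensions, projected down to k
X = rng.normal(size=(n, d))

# Random Gaussian projection, scaled so squared lengths are preserved
# in expectation -- one standard realization of a JL-style projection.
G = rng.normal(size=(d, k)) / np.sqrt(k)
Y = X @ G

# Compare one pairwise distance before and after projection.
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
ratio = proj / orig
print(ratio)  # close to 1
```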
Similarity search
• Motivation: similarity search
• Model similarity as a metric
• Sketching may speed up computation and allow indexing
• Interesting metrics:
  • Euclidean ℓ2: d(x, y) = (∑i |xi – yi|^2)^{1/2}
  • Manhattan, Hamming ℓ1: d(x, y) = ∑i |xi – yi|
  • ℓp distances: d(x, y) = (∑i |xi – yi|^p)^{1/p} for p ≥ 1
  • Edit distance, Earth Mover’s Distance, etc.
Sketching metrics
• Alice and Bob each hold a point from a metric space (say x and y)
• Both send s-bit sketches to Charlie
• For r > 0 and D > 1, Charlie must distinguish:
  • d(x, y) ≤ r
  • d(x, y) ≥ Dr
• Shared randomness, 1% probability of error allowed
• Trade-off between s and D
[Figure: Alice (holding x) and Bob (holding y) send bit-string sketches to Charlie, who must decide: d(x, y) ≤ r or d(x, y) ≥ Dr?]
Near Neighbor Search via sketches
• Near Neighbor Search (NNS):
  • Given an n-point dataset P
  • A query q within r of some data point
  • Return any data point within Dr of q
• Sketches of size s imply NNS with space n^{O(s)} and a 1-probe query
• Proof idea: amplify the probability of error to 1/n by increasing the sketch size to O(s log n); then the sketch of q determines the answer
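The amplification step can be illustrated with a toy simulation, assuming NumPy; the per-comparison success probability 0.9 and the repetition count are illustrative stand-ins, not the actual sketching protocol:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model: a single sketch comparison is assumed correct with
# probability 0.9. A majority vote over t = Theta(log n) independent
# repetitions drives the failure probability below 1/n, so a union
# bound over all n data points lets the sketch of q determine the answer.
p_correct = 0.9
n = 1000
t = 51                       # odd number of repetitions, Theta(log n)

trials = 100_000
votes = rng.random((trials, t)) < p_correct   # True = repetition correct
err = 1.0 - (votes.sum(axis=1) > t / 2).mean()
print(err)  # empirically far below 1/n
```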
Sketching the real line
• Distinguish |x – y| ≤ 1 vs. |x – y| ≥ 1 + ε
• Randomly shifted pieces of size w = 1 + ε/2
• Repeat O(1/ε^2) times
• Overall:
  • D = 1 + ε
  • s = O(1/ε^2)
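One common way to instantiate this scheme (an assumption here, since the slide only names the shifted pieces) labels each piece with a pseudorandom ±1 bit and compares labels: close pairs land in the same piece more often, so their bits agree noticeably more often than far pairs'. A sketch, assuming NumPy and a simple pairwise-independent hash for the piece labels:

```python
import numpy as np

rng = np.random.default_rng(2)

eps = 0.5
w = 1 + eps / 2            # randomly shifted pieces of width 1 + eps/2
reps = 40_000              # O(1/eps^2) suffices in theory; large for a clear gap
P = 2_147_483_647          # prime for a simple pairwise-independent hash

shifts = rng.uniform(0, w, size=reps)
a = rng.integers(1, P, size=reps)
b = rng.integers(0, P, size=reps)

def sketch(x):
    # Piece index of x in each randomly shifted grid, then a
    # pseudorandom bit per piece via a pairwise-independent hash.
    cells = np.floor((x + shifts) / w).astype(np.int64)
    return ((a * cells + b) % P) % 2

x = 0.0
agree_close = (sketch(x) == sketch(1.0)).mean()      # |x - y| <= 1
agree_far = (sketch(x) == sketch(1 + eps)).mean()    # |x - y| >= 1 + eps
print(agree_close, agree_far)  # agree_close noticeably above agree_far
```

Close pairs share a piece with constant probability (in which case the bits always agree), while far pairs here never share a piece, so their bits agree only about half the time; O(1/ε^2) repetitions separate the two agreement rates.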
Sketching ℓp for 0 < p ≤ 2
• (Indyk 2000): can reduce sketching of ℓp with 0 < p ≤ 2 to sketching reals via random projections
• If G1, G2, …, Gd are i.i.d. N(0, 1), then ∑i xiGi – ∑i yiGi is distributed as ‖x – y‖2 · N(0, 1)
• For 0 < p < 2, use p-stable distributions instead
• Again, get D = 1 + ε with s = O(1/ε^2)
• For p > 2, sketching ℓp is hard: to achieve D = O(1), one needs sketch size s = Θ̃(d^{1–2/p}) (Bar-Yossef, Jayram, Kumar, Sivakumar 2002), (Indyk, Woodruff 2005)
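The 2-stability fact can be checked numerically: the median of |⟨x – y, G⟩| over independent Gaussian G recovers ‖x – y‖2 up to the known median of |N(0, 1)| ≈ 0.6745. A sketch, assuming NumPy; the dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

d, k = 500, 10_000
x = rng.normal(size=d)
y = rng.normal(size=d)

# 2-stability: <x, G> - <y, G> = <x - y, G> is distributed as
# ||x - y||_2 * N(0, 1) for Gaussian G, so a median over k
# independent projections estimates ||x - y||_2.
G = rng.normal(size=(k, d))
proj = G @ (x - y)
est = np.median(np.abs(proj)) / 0.6745   # median of |N(0,1)| ~= 0.6745
true = np.linalg.norm(x - y)
print(est / true)  # close to 1
```

For 0 < p < 2 the same estimator works with p-stable samples in place of the Gaussians (e.g. Cauchy for p = 1).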
Anything else?
• A map f: X → Y is an embedding with distortion D′ if, for all a, b ∈ X:
  dX(a, b) / D′ ≤ dY(f(a), f(b)) ≤ dX(a, b)
• Reductions for geometric problems
• If Y has s-bit sketches for approximation D, then X gets sketches of s bits and approximation DD′
Metrics with good sketches
• A metric X admits sketches with s, D = O(1) if:
  • X = ℓp for p ≤ 2, or
  • X embeds into ℓp for p ≤ 2 with distortion O(1)
• Are there any other metrics with efficient sketches (D and s both O(1))?
• We don’t know!
• Some new techniques waiting to be discovered?
• No new techniques?!
The main result
If a normed space X admits sketches of size s and approximation D, then for every ε > 0 the space X embeds (linearly) into ℓ1 – ε with distortion O(sD / ε)
[Diagram: embedding into ℓp, p ≤ 2 ⇒ efficient sketches (Kushilevitz, Ostrovsky, Rabani 1998), (Indyk 2000); the theorem above supplies the converse arrow for norms]
• A vector space X with ‖·‖: X → R≥0 is a normed space if:
  • ‖x‖ = 0 iff x = 0
  • ‖αx‖ = |α|·‖x‖
  • ‖x + y‖ ≤ ‖x‖ + ‖y‖
• Every norm gives rise to a metric: define d(x, y) = ‖x - y‖
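A quick numerical spot-check of the axioms, using as the example the trace norm (sum of singular values), which reappears later as Example 2. A sketch, assuming NumPy; the matrix sizes and scalar are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)

def nuclear_norm(A):
    # Trace (nuclear) norm: sum of the singular values of A.
    return np.linalg.svd(A, compute_uv=False).sum()

# Spot-check homogeneity and the triangle inequality on random matrices.
A = rng.normal(size=(5, 5))
B = rng.normal(size=(5, 5))
alpha = -2.5

homog = np.isclose(nuclear_norm(alpha * A), abs(alpha) * nuclear_norm(A))
triangle = nuclear_norm(A + B) <= nuclear_norm(A) + nuclear_norm(B) + 1e-9
print(homog, triangle)
```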
Sanity check
• ℓp spaces: p > 2 is hard, 1 ≤ p ≤ 2 is easy, p < 1 is not a norm
• Can classify mixed norms ℓp(ℓq): in particular, ℓ1(ℓ2) is easy, while ℓ2(ℓ1) is hard! (Jayram, Woodruff 2009), (Kalton 1985)
• A non-example: edit distance is not a norm, and its sketchability is largely open (Ostrovsky, Rabani 2005), (Andoni, Jayram, Pătraşcu 2010)
No embeddings → no sketches
• In the contrapositive: if a normed space does not embed into ℓ1 – ε, then it does not have good sketches
• Can convert sophisticated non-embeddability results into lower bounds for sketches
Example 1: the Earth Mover’s Distance
• For x ∈ R^{[Δ]×[Δ]} with ∑i,j xi,j = 0, define the Earth Mover’s Distance ‖x‖EMD as the cost of the best transportation of the positive part of x to the negative part (the Monge–Kantorovich norm)
• Best upper bounds:
  • D = O(1/ε) and s = Δ^ε (Andoni, Do Ba, Indyk, Woodruff 2009)
  • D = O(log Δ) and s = O(1) (Charikar 2002), (Indyk, Thaper 2003), (Naor, Schechtman 2005)
No embedding into ℓ1 – ε with distortion O(1) (Naor, Schechtman 2005) ⇒ no sketches with D = O(1) and s = O(1)
Example 2: the Trace Norm
• For an n × n matrix A, define the Trace Norm (the Nuclear Norm) ‖A‖ to be the sum of its singular values
• Previously: lower bounds only for certain restricted classes of sketches (Li, Nguyen, Woodruff 2014)
Any embedding into ℓ1 requires distortion Ω(n^{1/2}) (Pisier 1978) ⇒ any sketch must satisfy sD = Ω(n^{1/2} / log n)
The plan of the proof
If a normed space X admits sketches of size s and approximation D, then for every ε > 0 the space X embeds (linearly) into ℓ1 – ε with distortion O(sD / ε)
[Proof plan: sketches ⇒ weak embedding into ℓ2 (via information theory) ⇒ linear embedding into ℓ1 – ε (via nonlinear functional analysis)]
A map f: X → Y is (s1, s2, τ1, τ2)-threshold if:
• dX(x1, x2) ≤ s1 implies dY(f(x1), f(x2)) ≤ τ1
• dX(x1, x2) ≥ s2 implies dY(f(x1), f(x2)) ≥ τ2
Here the weak embedding is a (1, O(sD), 1, 10)-threshold map from X to ℓ2
Sketch → Threshold map
X has a sketch of size s and approximation D
⇓ (proved in the contrapositive)
There is a (1, O(sD), 1, 10)-threshold map from X to ℓ2

Contrapositive chain:
• No (1, O(sD), 1, 10)-threshold map from X to ℓ2
• ⇒ Poincaré-type inequalities on X (via convex duality)
• ⇒ every sketch of ℓ∞^k(X), the direct sum with ‖(x1, …, xk)‖ = maxi ‖xi‖, with approximation Θ(sD) has size Ω(k)
• ⇒ X has no sketches of size s and approximation D, by the direct sum theorem for information complexity (Andoni, Jayram, Pătraşcu 2010)
Sketching direct sums
X has sketches of size s and approximation D ⇒ ℓ∞^k(X) has sketches of size O(s) and approximation Dk
Protocol: Alice holds (a1, a2, …, ak), Bob holds (b1, b2, …, bk); using shared random signs (σ1, σ2, …, σk) ∈ {±1}^k, they send sketch(∑i σi ai) and sketch(∑i σi bi)
• Upper bound (always): ‖∑i σi(ai – bi)‖ ≤ ∑i ‖ai – bi‖ ≤ k · maxi ‖ai – bi‖
• Lower bound (with probability ≥ 1/2): maxi ‖ai – bi‖ ≤ ‖∑i σi(ai – bi)‖
• Crucially uses the linear structure of X (not enough to be just a metric!)
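The two bounds can be checked empirically in ℓ2 (an illustrative choice of norm; the argument works in any normed space): the upper bound is the triangle inequality and holds for every sign pattern, while the lower bound should hold for at least half of the sign patterns. A sketch, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(5)

k, d = 8, 20
A = rng.normal(size=(k, d))   # Alice's points a_1, ..., a_k
B = rng.normal(size=(k, d))   # Bob's points b_1, ..., b_k
diffs = A - B

row = np.linalg.norm(diffs, axis=1)   # ||a_i - b_i|| for each i
upper = row.sum()                     # sum_i ||a_i - b_i||
lower = row.max()                     # max_i ||a_i - b_i||

trials = 2000
hits = 0
for _ in range(trials):
    sigma = rng.choice([-1.0, 1.0], size=k)
    combined = np.linalg.norm(sigma @ diffs)   # ||sum_i sigma_i (a_i - b_i)||
    # Upper bound: triangle inequality, holds for every sign pattern.
    assert combined <= upper + 1e-9
    # Lower bound: should hold with probability >= 1/2 over the signs.
    if combined >= lower:
        hits += 1
print(hits / trials)
</```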
Threshold map → linear embedding
(1, O(sD), 1, 10)-threshold map from X to ℓ2
⇓ (the new step)
Uniform embedding into ℓ2: g: X → ℓ2 s.t. L(‖x1 – x2‖) ≤ ‖g(x1) – g(x2)‖ ≤ U(‖x1 – x2‖), where:
• L and U are non-decreasing
• L(t) > 0 for t > 0
• U(t) → 0 as t → 0
⇓ (Aharoni, Maurey, Mityagin 1985), (Nikishin 1973)
Linear embedding into ℓ1 – ε with distortion O(sD / ε)
Threshold map → uniform embedding
• A map f: X → ℓ2 such that:
  • ‖x1 – x2‖ ≤ 1 implies ‖f(x1) – f(x2)‖ ≤ 1
  • ‖x1 – x2‖ ≥ Θ(sD) implies ‖f(x1) – f(x2)‖ ≥ 10
• Building on (Johnson, Randrianarivony 2006):
  • Take a 1-net N of X and make f Lipschitz on N
  • Extend f from N to a Lipschitz function on the whole of X
Open problems
• Extend to as general a class of metrics as possible
• Connection to linear sketches?
  • Sketches of the form x → Ax
  • Conjecture: sketches of size s and approximation D can be converted to linear sketches with f(s) measurements and approximation g(D)
• Spaces that admit no non-trivial sketches (s = Ω(d) for D = O(1)): is there anything besides ℓ∞?
• Can one strengthen the theorem to “sketchability implies embeddability into ℓ1”?
  • Equivalent to an old open problem from functional analysis
• Sketches imply NNS; is there a reverse implication?