Efficient Sketches for Earth-Mover Distance, with Applications
David WoodruffIBM Almaden
Joint work with Alexandr Andoni, Khanh Do Ba, and Piotr Indyk
(Planar) Earth-Mover Distance
• For multisets A, B of points in [∆]2, |A|=|B|=N,
  EMD(A, B) = min_{π: A→B bijection} ∑_{a∈A} ||a − π(a)||

  i.e., the min cost of a perfect matching between A and B

• Example (for the two point sets pictured on the slide): EMD(·, ·) = 6 + 3√2
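For tiny multisets the definition can be evaluated directly. The following is a brute-force sketch (exponential in N, for illustration only); `emd` is a hypothetical helper, not the paper's sketching algorithm:

```python
# Brute-force planar EMD: minimize matching cost over all bijections.
# Only feasible for very small N (N! permutations).
from itertools import permutations
from math import hypot

def emd(A, B):
    """Min cost of a perfect matching between equal-size 2D multisets."""
    assert len(A) == len(B)
    return min(
        sum(hypot(a[0] - b[0], a[1] - b[1]) for a, b in zip(A, perm))
        for perm in permutations(B)
    )
```

This serves only as a ground-truth check against which a sketch-based estimate could be compared.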
Geometric Representation of EMD
• Map A, B to k-dimensional vectors F(A), F(B)
  – Image space of F is “simple,” e.g., k is small
  – Can estimate EMD(A, B) from F(A), F(B) via some efficient recovery algorithm E

[Diagram: A, B ↦ F(A), F(B) ∈ R^k; recovery algorithm E outputs ≈ EMD(A, B).]
Geometric Representation of EMD: Motivation
• Visual search and recognition:– Approximate nearest neighbor under EMD
• Reduces to approximate NN under simpler distances• Has been applied to fast image search and recognition in
large collections of images [Indyk-Thaper’03, Grauman-Darrell’05, Lazebnik-Schmid-Ponce’06]
• Data streaming computation:– Estimating the EMD between two point sets given as a
stream• Need mapping F to be linear: adding new point a to A
translates to adding F(a) to F(A)• Important open problem in streaming [“Kanpur List ’06”]
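The linearity requirement can be illustrated with a generic random linear map applied to the characteristic vector χ(A) ∈ R^{∆²}. The matrix S below is an arbitrary stand-in for illustration, not the paper's mapping F:

```python
# If F is linear in the characteristic vector chi(A), then streaming a new
# point a just adds F(chi({a})) to the running sketch. S is an arbitrary
# random matrix standing in for the paper's mapping.
import numpy as np

delta, k = 4, 6                      # grid side length and sketch dimension
rng = np.random.default_rng(0)
S = rng.standard_normal((k, delta * delta))

def chi(points):
    """Characteristic vector of a multiset of points in [delta]^2."""
    v = np.zeros(delta * delta)
    for (x, y) in points:
        v[x * delta + y] += 1
    return v

A = [(0, 0), (1, 2)]
sketch = S @ chi(A)                  # sketch of the current multiset
sketch += S @ chi([(3, 3)])          # stream update: point (3,3) arrives
assert np.allclose(sketch, S @ chi(A + [(3, 3)]))  # linearity
```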
Prior and New Results

Paper                            Recovery   Dimension   Approx.
[Charikar’02, Indyk-Thaper’03]   ℓ1         O(∆²)       O(log ∆)
[Naor-Schechtman’06]             ℓ1         Any         Ω(log^{1/2} ∆)
Our result                       Non-norm   O(∆^ε)      O(1/ε)

Main Theorem: For any ε ∈ (0, 1), there exists a distribution over linear mappings F: R^{∆²} → R^{∆^ε} such that for multisets A, B ⊆ [∆]² of equal size, we can produce an O(1/ε)-approximation to EMD(A, B) from F(A), F(B) with probability 2/3.
Geometric Representation of EMD: Implications

• Streaming:

  Paper        Space                  Approximation
  [Indyk’04]   log^{O(1)}(∆N)         O(log ∆)
  Our result   ∆^ε · log^{O(1)}(∆N)   O(1/ε)

• Approximate nearest neighbor:

  Paper                           Space                              Query time            Approximation
  [Andoni-Indyk-Krauthgamer’09]   s^{2+ε} · 2^{∆^{1/α}} · ∆^{O(1)}   s^ε                   O((α/ε) log log s)
  Our result                      s · 2^{∆^ε} · log^{O(1)}(s∆N)      (∆ log(s∆N))^{O(1)}   O(1/ε)

  * N = number of points
  * s = number of data points (multisets) to preprocess; α > 1 is a free parameter
Proof Outline

• Old [Agarwal-Varadarajan’04, Indyk’07]:
  – Extend EMD to EEMD, which:
    • Handles sets of unequal size |A| ≤ |B| in a grid of side length k:
      EEMD(A, B) = min_{S ⊆ B, |S| = |A|} EMD(A, S) + k · |B \ S|
    • Is induced by a norm ||·||_EEMD, i.e., EEMD(A, B) = ||χ(A) − χ(B)||_EEMD, where χ(A) ∈ R^{∆²} is the characteristic vector of A
  – Decomposition of EEMD into a weighted sum of small EEMDs, with O(1/ε) distortion
• New:
  – Linear sketching of “sum-norms”

[Diagram: EMD over [∆]² decomposes into ∆^{O(1)} terms of EEMD over [∆^ε]².]
Old Idea [Indyk ’07]

[Diagram: EMD over [∆]² decomposes first into EEMD instances over [∆^{1/2}]², then recursively into ∆^{O(1)} terms of EEMD over [∆^ε]².]

• Solve EEMD in each of the ∆ cells of a coarse grid, each a problem over [∆^{1/2}]²
• Solve one additional EEMD problem over [∆^{1/2}]², with edge lengths scaled by ∆^{1/2}
• Total cost is the sum of the two phases
• The algorithm outputs a matching, so its cost is at least the EMD cost
• Indyk shows that if we put a random shift of the [∆^{1/2}]² grid on top of the [∆]² grid, the algorithm’s cost is at most a constant factor times the true EMD cost
• Recursive application gives multiple [∆^ε]² grids on top of each other, and results in an O(1/ε)-approximation
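The randomly shifted coarse grid can be sketched as follows. This is an illustrative helper under assumed names (`shifted_cells` is not from the paper); it only shows the cell-assignment step, not the full recursion:

```python
# Overlay a coarse grid of the given side length with a uniformly random
# shift, and bucket each point of [delta]^2 by its coarse cell. Nearby
# points land in the same cell with probability ~ 1 - O(dist/side) over
# the random shift, which drives the constant-factor charging argument.
import random

def shifted_cells(points, side):
    """Map each point to its cell in a randomly shifted grid of side `side`."""
    sx, sy = random.randrange(side), random.randrange(side)
    cells = {}
    for (x, y) in points:
        cell = ((x + sx) // side, (y + sy) // side)
        cells.setdefault(cell, []).append((x, y))
    return cells
```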
Main New Technical Theorem

For a normed space X = (R^t, ||·||_X) and M ∈ X^n, denote

  ||M||_{1,X} = ∑_i ||M_i||_X = ||M_1||_X + ||M_2||_X + … + ||M_n||_X

Given C > 0 and λ > 0, if C/λ ≤ ||M||_{1,X} ≤ C, there is a distribution over linear mappings

  μ: X^n → X^{(λ log n)^{O(1)}}

such that we can produce an O(1)-approximation to ||M||_{1,X} from μ(M) w.h.p.
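As a concrete instance of the sum-norm, here with X taken to be ℓ2 purely for illustration:

```python
# ||M||_{1,X} with X = (R^t, l2): the l1 norm of the vector of blockwise
# l2 norms, i.e., the cascaded norm L1(l2).
import numpy as np

M = np.array([[3.0, 4.0],    # M1: ||M1||_2 = 5
              [0.0, 2.0],    # M2: ||M2||_2 = 2
              [1.0, 0.0]])   # M3: ||M3||_2 = 1
sum_norm = np.linalg.norm(M, axis=1).sum()
assert sum_norm == 8.0
```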
Proof Outline: Sum of Norms

• First attempt:
  – Sample (uniformly) a few Mi’s and compute ||Mi||_X
  – Problem: the sum could be concentrated in one block (e.g., M2 contains most of the mass)
• Second attempt:
  – Sample Mi with probability proportional to ||Mi||_X [Indyk’07]
  – Problem: how to do this online?
  – Techniques from [JW09, MW10]?
    • Need to sample/retrieve whole blocks, not just individual coordinates
Proof Outline: Sum of Norms (cont.)

• Our approach:
  – Split into exponential levels:
    • Assume ||M||_{1,X} ≤ C
    • S_k = {i ∈ [n] : ||M_i||_X ∈ (T_k, 2T_k]}, where T_k = C/2^k
    • It suffices to estimate |S_k| for each level k. How?
  – For each level k, subsample from [n] at a rate such that event E_k (“isolation” of level k) holds with probability proportional to |S_k|
  – Repeat the experiment several times and count the number of successes

[Diagram: M = (M1, M2, …, Mn) partitioned into levels S1, S2, S3, …, Sℓ; each subsample is checked for event E_k (Y/N).]
Proof Outline: Event E_k

• E_k ↔ “isolation” of level k:
  – Exactly one i ∈ S_k gets subsampled
  – Nothing from S_{k'} for k' < k
• Verification of trial success/failure:
  – Hash the subsampled elements
    • Each cell maintains the vector sum of the subsampled M_i’s that hash there
  – E_k holds roughly (we “accept”) when:
    • One cell has X-norm in (0.9·T_k, 2.1·T_k]
    • All other cells have X-norm ≤ 0.9·T_k
  – The check fails only if:
    • Elements from lighter levels contribute a lot to one cell, or
    • Elements from heavier levels are subsampled and collide
  – Both are unlikely if the hash table is big enough
  – The estimator under-estimates |S_k|; if |S_k| > 2^k/polylog(n), it gives an O(1)-approximation
  – Remark: the triangle inequality of the norm gives control over the impact of collisions
Sketch and Recovery Algorithm

Sketch:
– For each level k, create t hash tables
– For each hash table:
  • Subsample from [n], including each i ∈ [n] w.p. p_k = 2^{−k}
  • Each cell maintains the sum of the M_i’s that hash to it

Recovery algorithm:
– For each level k, count the number c_k of “accepting” hash tables
– Return ∑_k T_k · (c_k/t) · (1/p_k)

– For every k, the estimator under-estimates |S_k|
– If |S_k| > 2^k/polylog(n), the estimator is Ω(|S_k|)
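The level-set decomposition underlying this estimator can be illustrated without the hashing machinery. This toy sketch (all names are illustrative) buckets the block norms into geometric levels and checks that ∑_k T_k·|S_k| already determines ||M||_{1,X} up to a factor of 2; the actual algorithm then estimates each |S_k| by subsampling at rate p_k = 2^{−k} and hashing:

```python
# Bucket block norms into geometric levels (T_k, 2T_k], T_k = C/2^k.
# Since every norm in level k lies in (T_k, 2T_k], the weighted count
# sum_k T_k * |S_k| is within a factor 2 of the true sum-norm.

def level_sets(norms, C):
    """Partition indices i by the level k with ||M_i||_X in (C/2^k, C/2^(k-1)]."""
    levels = {}
    for i, v in enumerate(norms):
        if v <= 0:
            continue            # zero blocks belong to no level
        k = 1
        while not (C / 2**k < v <= C / 2**(k - 1)):
            k += 1              # terminates for any 0 < v <= C
        levels.setdefault(k, []).append(i)
    return levels

norms = [5.0, 2.0, 1.0, 0.25]   # the blockwise X-norms ||M_i||_X
C = sum(norms)                  # a trivial upper bound on ||M||_{1,X}
lv = level_sets(norms, C)
approx = sum((C / 2**k) * len(s) for k, s in lv.items())
assert sum(norms) / 2 <= approx <= sum(norms)
```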
EMD Wrapup

• We achieve a linear embedding of EMD
  – with O(1/ε) distortion (constant for any fixed ε),
  – into a space of strongly sublinear dimension, namely ∆^ε.
• Open problems:
  – Getting a (1+ε)-approximation / proving impossibility
  – Reducing the dimension to log^{O(1)} ∆ / proving a lower bound
What We Did

• We showed that in a data stream, one can sketch ||M||_{1,X} = ∑_i ||M_i||_X using space about the space complexity of computing (or sketching) ||·||_X
• This quantity is known as a cascaded norm, written L1(X)
• Cascaded norms have many applications [CM, JW]
• Can we generalize this? E.g., what about L2(X), i.e., (∑_i ||M_i||_X²)^{1/2}?
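For concreteness, the cascaded norms in question can be computed directly on a small matrix whose rows are the blocks M_i (an illustrative example, with the inner norm taken to be ℓ1):

```python
# L1(l1) sums the rows' l1 norms; Lp(l1) takes the lp norm of that same
# vector of row norms. The next slide's point is that the first is easy
# to sketch while the second is not.
import numpy as np

M = np.array([[1.0, -2.0],
              [0.0,  3.0]])
row_l1 = np.abs(M).sum(axis=1)           # blockwise l1 norms: (3, 3)
L1_of_L1 = row_l1.sum()                  # 3 + 3 = 6
L2_of_L1 = np.sqrt((row_l1**2).sum())    # sqrt(9 + 9) = sqrt(18)
assert L1_of_L1 == 6.0
assert np.isclose(L2_of_L1, np.sqrt(18))
```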
Cascaded Norms [JW09]

• No!
• L2(L1), i.e., (∑_i ||M_i||_1²)^{1/2}, requires Ω(n^{1/2}) space, where n is the number of different i, but the sketching complexity of L1 is O(log n)
• More generally, for p ≥ 1, Lp(L1), i.e., (∑_i ||M_i||_1^p)^{1/p}, takes Θ(n^{1−1/p}) space
• So, L1(X) is very special
Thank You!