Optimal Bounds for Johnson-Lindenstrauss Transforms and Streaming Problems with Sub-Constant Error
T.S. Jayram David Woodruff
IBM Almaden
Data Stream Model
• Have a stream of m updates to an n-dimensional vector v
  – "add x to coordinate i"
  – Insertion model: all updates x are positive
  – Turnstile model: x can be positive or negative
  – Stream length and updates are < poly(n)
• Estimate statistics of v
  – # of distinct elements F0
  – Lp-norm |v|_p = (Σ_i |v_i|^p)^{1/p}
  – Entropy
  – and so on
• Goal: output a (1+ε)-approximation with limited memory
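As a point of reference, here is a minimal sketch of the model itself (not any of the paper's algorithms): applying a turnstile stream of updates and computing F0 and the Lp-norms exactly, which takes Θ(n) memory; the streaming algorithms below aim to (1+ε)-approximate these quantities in far less space.

```python
from collections import defaultdict

def apply_stream(updates):
    """Apply turnstile updates ("add x to coordinate i") to a vector v."""
    v = defaultdict(int)
    for i, x in updates:          # turnstile model: x may be negative
        v[i] += x
    return v

def F0(v):
    """Number of distinct (i.e., non-zero) coordinates of v."""
    return sum(1 for x in v.values() if x != 0)

def Lp(v, p):
    """L_p norm |v|_p = (sum_i |v_i|^p)^(1/p)."""
    return sum(abs(x) ** p for x in v.values()) ** (1.0 / p)

# Toy stream: coordinate 2 is incremented then decremented, so it does
# not count toward F0.
v = apply_stream([(0, 3), (2, 1), (2, -1), (5, 4)])
```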
Lots of "Optimal" Papers
• Lots of "optimal" results
  – "An optimal algorithm for the distinct elements problem" [KNW]
  – "Fast moment estimation in optimal space" [KNPW]
  – "A near-optimal algorithm for estimating entropy of a stream" [CCM]
  – "Optimal approximations of the frequency moments of data streams" [IW]
  – "A near-optimal algorithm for L1-difference" [NW]
  – "Optimal space lower bounds for all frequency moments" [W]
• This paper
  – Optimal Bounds for Johnson-Lindenstrauss Transforms and Streaming Problems with Sub-Constant Error
What Is Optimal?
• F0 = # of non-zero entries in v
• "For a stream of indices in {1, …, n}, our algorithm computes a (1+ε)-approximation using an optimal O(ε⁻² + log n) bits of space with 2/3 success probability… This probability can be amplified by independent repetition."
• If we want high success probability, say 1 − 1/n, independent repetition increases the space by a multiplicative log n factor
• So "optimal" algorithms are only optimal among algorithms with constant success probability
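The amplification step referred to above is the standard median trick; a minimal sketch (with a made-up toy estimator) of why repetition costs a multiplicative log 1/δ factor:

```python
import math
import random

def amplify(estimator, delta, c=48):
    """Median of Theta(log 1/delta) independent runs.

    If each run is correct w.p. 2/3, a Chernoff bound says the median of
    c*log(1/delta) runs is correct w.p. >= 1 - delta.  The space cost of
    a streaming algorithm grows by the same log(1/delta) factor, since
    all repetitions run in parallel.
    """
    reps = max(1, int(c * math.log(1.0 / delta)))
    samples = sorted(estimator() for _ in range(reps))
    return samples[len(samples) // 2]

# Toy estimator: returns the true value 100 w.p. 2/3, garbage otherwise.
def noisy():
    return 100 if random.random() < 2 / 3 else random.choice([0, 10**6])

random.seed(0)
est = amplify(noisy, delta=0.01)
```

The paper's lower bounds show this blowup is not an artifact of the median trick: for the problems above, any algorithm with error δ must pay the log 1/δ factor.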
Can We Improve the Lower Bounds?
x ∈ {0,1}^{ε⁻²}    y ∈ {0,1}^{ε⁻²}
Gap-Hamming: either Δ(x,y) > ½ + ε or Δ(x,y) < ½ − ε (Δ = fraction of differing coordinates)
Lower bound of Ω(ε⁻²) bits with 1/3 error probability
But an upper bound of O(ε⁻²) bits with 0 error probability
Our Results
Streaming Results
• Independent repetition is optimal!
• Estimating Lp-norm in turnstile model up to 1+ε w.p. 1−δ
  – Ω(ε⁻² log n log 1/δ) bits for any p
  – [KNW] get O(ε⁻² log n log 1/δ) for 0 ≤ p ≤ 2
• Estimating F0 in insertion model up to 1+ε w.p. 1−δ
  – Ω(ε⁻² log 1/δ + log n) bits
  – [KNW] get O(ε⁻² log 1/δ) for ε⁻² > log n
• Estimating entropy in turnstile model up to 1+ε w.p. 1−δ
  – Ω(ε⁻² log n log 1/δ) bits
  – Improves the Ω(ε⁻² log n) bound of [KNW]
Johnson-Lindenstrauss Transforms
• Let A be a random matrix so that for any fixed q ∈ R^d, with probability 1−δ,
|Aq|₂ = (1 ± ε) |q|₂
• [JL] A can be an (ε⁻² log 1/δ) × d matrix
  – Gaussians or sign variables work
• [Alon] A needs to have Ω((ε⁻² log 1/δ) / log 1/ε) rows
• Our result: A needs to have Ω(ε⁻² log 1/δ) rows
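A minimal pure-Python sketch of the sign-variable construction from [JL] referenced above (the dimensions and seed are illustrative, not from the paper): A has k rows of i.i.d. ±1/√k entries, and |Aq|₂ concentrates around |q|₂.

```python
import math
import random

def jl_sign_matrix(k, d, rng):
    """k x d matrix of i.i.d. +-1/sqrt(k) entries (sign variables)."""
    s = 1.0 / math.sqrt(k)
    return [[s if rng.random() < 0.5 else -s for _ in range(d)]
            for _ in range(k)]

def matvec(A, q):
    return [sum(a * x for a, x in zip(row, q)) for row in A]

def l2(v):
    return math.sqrt(sum(x * x for x in v))

rng = random.Random(1)
d = 2000                 # ambient dimension (illustrative)
k = 512                  # ~ eps^-2 * log(1/delta) rows for some eps, delta
q = [rng.gauss(0, 1) for _ in range(d)]

# |Aq|_2 / |q|_2 should be close to 1
ratio = l2(matvec(jl_sign_matrix(k, d, rng), q)) / l2(q)
```

The paper's result says the row count of this construction is tight: no distribution over matrices with o(ε⁻² log 1/δ) rows can achieve the (1 ± ε, δ)-guarantee.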
Communication Complexity Separation
Alice has x, Bob has y; goal: compute f(x,y) ∈ {0,1}
D_{ρ,1/3}(f) = communication of the best 1-way deterministic protocol that errs w.p. 1/3 on distribution ρ
[KNR]: R||_{1/3}(f) = max over product distributions μ × λ of D_{μ×λ,1/3}(f)
Communication Complexity Separation
Alice has x, Bob has y; goal: compute f(x,y) ∈ {0,1}
VC-dimension: maximum number r of columns for which all 2^r rows occur in the communication matrix restricted to these columns
[KNR]: R||_{1/3}(f) = Θ(VC-dimension(f))
Our result: there exist f and g, each with VC-dimension k, but:
R||_δ(f) = Θ(k log 1/δ) while R||_δ(g) = Θ(k)
Our Techniques
Lopsided Set Intersection (LSI)
Universe [U], U = ε⁻² · δ⁻¹
Alice has S ⊆ [U] with |S| = ε⁻²; Bob has T ⊆ [U] with |T| = 1/δ
Is S ∩ T = ∅?
- Alice cannot describe S with o(ε⁻² log U) bits
- If S and T are uniform, then with constant probability S ∩ T = ∅
- R||_{1/3}(LSI) ≥ D_{uniform,1/3}(LSI) = Ω(ε⁻² log 1/δ)
Lopsided Set Intersection (LSI2)
Universe [U], U = ε⁻² · δ⁻¹
Alice has S ⊆ [U] with |S| = ε⁻²; Bob has T ⊆ [U] with |T| = 1
Is S ∩ T = ∅?
- R||_{δ/3}(LSI2) ≥ R||_{1/3}(LSI) = Ω(ε⁻² log 1/δ)
- Union bound over the elements of Bob's set in an LSI instance
Low Error Inner Product
Universe [U], U = ε⁻² · δ⁻¹
Alice has x ∈ {0, ε}^U with |x|₂ = 1; Bob has y ∈ {0, 1}^U with |y|₂ = 1
Does ⟨x,y⟩ = 0?
- Estimating ⟨x, y⟩ up to ε w.p. 1−δ solves LSI2 w.p. 1−δ
- R||_δ(inner productε) = Ω(ε⁻² log 1/δ)
L2-estimationε
Universe [U], U = ε⁻² · δ⁻¹
Alice has x ∈ {0, ε}^U with |x|₂ = 1; Bob has y ∈ {0, 1}^U with |y|₂ = 1
What is |x−y|₂?
- |x−y|₂² = |x|₂² + |y|₂² − 2⟨x, y⟩ = 2 − 2⟨x,y⟩
- Estimating |x−y|₂ up to a (1+Θ(ε))-factor solves inner-productε
- So R||_δ(L2-estimationε) = Ω(ε⁻² log 1/δ)
- The log 1/δ factor is new, but we want an Ω(ε⁻² log n log 1/δ) lower bound
- Can use a known trick to get an extra log n factor
Augmented Lopsided Set Intersection (ALSI2)
Universe [U] = [ε⁻² · δ⁻¹]
Alice has S1, …, Sr ⊆ [U] with |Si| = ε⁻² for all i
Bob has j ∈ [U], an index i* ∈ {1, 2, …, r}, and the suffix S_{i*+1}, …, Sr
Is j ∈ S_{i*}?
R||_{1/3}(ALSI2) = Ω(r ε⁻² log 1/δ)
Reduction of ALSI2 to L2-estimationε
- Alice encodes each Si as a vector xi (as in L2-estimationε) and forms x = Σ_i 10^i · xi
- Bob forms y = 10^{i*} · y_{i*} + Σ_{i>i*} 10^i · xi, using his known suffix S_{i*+1}, …, Sr
- Then y − x = 10^{i*} y_{i*} − Σ_{i≤i*} 10^i · xi
- |y−x|₂ is dominated by 10^{i*} |y_{i*} − x_{i*}|₂
- Set r = Θ(log n)
- R||_δ(L2-estimationε) = Ω(ε⁻² log n log 1/δ)
- Streaming Space ≥ R||_δ(L2-estimationε)
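A toy numeric check of the scale-10 embedding in the reduction (block vectors and sizes are made up for illustration): blocks below i* are scaled by at most 10^{i*−1}, so they contribute only a small fraction of |y−x|₂², and the i*-th block dominates.

```python
r, m = 5, 4                  # r blocks, each of dimension m (toy sizes)
i_star = 3                   # Bob's queried index (0-based here)
xs = [[1.0] * m for _ in range(r)]   # stand-ins for Alice's block vectors x_i
y_top = [0.0] * m                    # Bob's candidate y_{i*} (here y_{i*} != x_{i*})

# Alice's vector: block i scaled by 10^i
x = [10**i * c for i in range(r) for c in xs[i]]
# Bob's vector: zeros below i*, his candidate at scale 10^{i*},
# then Alice's suffix blocks (which Bob knows) at their scales
y = ([0.0] * (m * i_star)
     + [10**i_star * c for c in y_top]
     + [10**i * c for i in range(i_star + 1, r) for c in xs[i]])

diff_sq = sum((a - b) ** 2 for a, b in zip(y, x))
# Contribution of the i*-th block alone
top_sq = sum((10**i_star * (a - b)) ** 2 for a, b in zip(y_top, xs[i_star]))
```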
Lower Bounds for Johnson-Lindenstrauss
Use public randomness to agree on a JL matrix A
Alice has x ∈ {−n^{O(1)}, …, n^{O(1)}}^t; Bob has y ∈ {−n^{O(1)}, …, n^{O(1)}}^t
Alice sends Ax; Bob computes |Ax − Ay|₂ = |A(x−y)|₂
- Can estimate |x−y|₂ up to 1+ε w.p. 1−δ
- #rows(A) = Ω(r ε⁻² log 1/δ / log n)
- Set r = Θ(log n)
Low-Error Hamming Distance
Universe = [n]; Δ(x,y) = Hamming distance between x and y
Alice has x ∈ {0,1}^n; Bob has y ∈ {0,1}^n
• R||_δ(Δ(x,y)ε) = Ω(ε⁻² log 1/δ log n)
• Reduction to ALSI2
• Gap-Hamming to LSI2 reductions with low error
• Implies our lower bounds for estimating
  – Any Lp-norm
  – Distinct elements
  – Entropy
Conclusions
• Prove the first streaming space lower bounds that depend on the error probability δ
  – Optimal for Lp-norms, distinct elements
  – Improves the lower bound for entropy
  – Optimal dimensionality bound for JL transforms
• Adds several twists to augmented indexing proofs
  – Augmented indexing with a small set in a large domain
  – Proof builds upon lopsided set disjointness lower bounds
  – Uses multiple Gap-Hamming to Indexing reductions that handle low error
ALSI2 to Hamming Distance
S1, …, Sr ⊆ [U] = [ε⁻² · δ⁻¹] with |Si| = ε⁻² for all i
Bob has j ∈ [U], i* ∈ {1, 2, …, r}, and S_{i*+1}, …, Sr
- Let t = ε⁻² log 1/δ
- Use the public coin to generate t random strings b1, …, bt ∈ {0,1}^U
- Alice sets xi = majority_{k ∈ Si} b_{i,k}
- Bob sets yi = b_{i,j}
- Embed multiple copies by duplicating coordinates at different scales
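A toy simulation of the agreement gap behind this majority encoding (set size, trial count, and seed are illustrative): with |S| = ε⁻², the majority bit of the set's random bits agrees with a member's bit w.p. ½ + Θ(ε), but with a non-member's bit w.p. exactly ½, so the Hamming distance between Alice's and Bob's strings gaps by Θ(εt) according to whether j ∈ S.

```python
import random

random.seed(2)
eps = 0.1
size = int(1 / eps**2)        # |S| = eps^-2 = 100
t = 10000                     # number of public random strings (trials)
S = set(range(size))          # Alice's set
j_in, j_out = 0, size + 1     # a member and a non-member of S

def majority(bits):
    return 1 if 2 * sum(bits) > len(bits) else 0

agree_in = agree_out = 0
for _ in range(t):
    b = [random.randint(0, 1) for _ in range(size + 2)]  # one public string
    x_bit = majority([b[k] for k in S])   # Alice's bit: majority over her set
    agree_in += (x_bit == b[j_in])        # agrees w.p. 1/2 + Theta(eps)
    agree_out += (x_bit == b[j_out])      # agrees w.p. exactly 1/2
```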