39
1 Streaming Algorithms for Geometric Problems Piotr Indyk MIT

Streaming Algorithms for Geometric Problems

  • Upload
    nemo

  • View
    44

  • Download
    3

Embed Size (px)

DESCRIPTION

Streaming Algorithms for Geometric Problems. Piotr Indyk MIT. Streaming: The Model. Single pass over the data: e 1 , e 2 , …, e n Bounded storage Fast processing time per element. Recap: Norm estimation. Norm estimation: Stream elements: (i,b) , i=1…m Interpretation: x i =x i +b - PowerPoint PPT Presentation

Citation preview

  • Streaming Algorithms for Geometric Problems Piotr IndykMIT

  • Streaming: The ModelSingle pass over the data: e1, e2, ,enBounded storage Fast processing time per element

  • Recap: Norm estimationNorm estimation:Stream elements: (i,b) , i=1mInterpretation: xi=xi+bWant to (1+)-approximate ||x||pNote: the frequency moment Fp is a special caseAlgorithms with (log n+1/)O(1) space:p=2: [Alon-Matias-Szegedy96]p(0,2]: [Indyk00]Idea: maintain Ax

  • Recap: Other algorithmsCompute the number of distinct elements F0Exactly: (n) bits of space(1+) -approximation: O(1/2 *log n) bits [Flajolet-Martin, JCSS85] ,Compute the median Exactly: (n) (50% ) -approximation: O(1/ *polylog n) [Paterson-Munro, TCS80] ,

  • Geometric Data Stream Algorithms as Data StructuresData structures that support:Insert(p) to PPossibly: Delete(p) from PCompute-Some-Property(P) Use space that is sub-linear in |P|

  • Dichotomy Insertions

    Merge and Reduce [MP80], core-sets

    Deterministic

    Insertions + Deletions

    Randomized linear mappings [FM85], [AMS96]

    Randomized

  • Insertions and Deletions

  • Minimum Weight Bi-chromatic Matching Estimate the cost of MWBM

  • Minimum Weight Matching Estimate the cost of MWM

  • Minimum Spanning Tree Estimate the cost of MST

  • Facility Location Goal: choose a set F of facilities to minimize the sum of the distances to nearest facility plus the number of facilities times f Again, report the cost

  • ApproachAssume P{1}2Reduce to vector problemsImpose square grids G0Gk, with side lengths 20,21, , 2k , shifted at random.For each square cell c in Gi, let nP(c) be the number of points from P in c.The algorithms will maintain certain statistics over nP(.), which will allow it to approximately solve the problems 21113115

  • EstimatorsMST:i 2i c Gi [nP(c)>0]MWM:i 2i c Gi [nP(c) is odd]MWBM:i 2i c Gi |nG(c)-nB(c)|Fac. Loc.:i 2i c Gi min[nP(c), Ti]K-median:i 2i c Gi - B(Q, 2^i) nP(c) (const. factor)

    Maintain #non-zero entries in nP [FM85]Maintain L1 difference [I00]

  • Results [Indyk04]*follows from [Charikar02,Indyk-Thaper03]

    Space: (log +log n)O(1) 1+ [Frahling-Indyk-Sohler05]

  • Probabilistic embeddings into HSTs 2111311Known [Bartal96, Charikar-Chekuri-Goel-Guha-Plotkin98]: ||p-q|| Dtree (p,q) E[ Dtree(p,q) ] ||p-q|| * O(log )T5

  • MSTE[Cost(MST in T)] O(log ) Cost(MST)Cost(MST in T) Cost(T) How to compute Cost(T) ?Sum over all levels i, of the #nodes at i, times 2iNode c exists iff ni(c)>021113115

  • MatchingAlgorithm:Match what you can at the current levelOdd leftovers wait for the next levelRepeatOptimal on the HSTCost=i 2i c Gi [nP(c) is odd]

    011110011

  • Insertions-only

  • k-median/k-center k is given Goal: choose k medians/centers to minimize: k-median: the sum of the distances k-center: the max distance

  • Metric clustering problemsk-center [Charikar-Chekuri-Feder-Motwani97]k-median [Guha-Mishra-Motwani-OCallaghan00, Meyerson01, Charikar-OCallaghan-Panigrahy03] Bounds:Poly(K,log n) spaceO(1)-approximation

  • Geometric ProblemsDiameter, Minimum Enclosing Ball [Agarwal-Har-Peled01, Feigenbaum-Kannan-Zhang02 (Algorithmica), Hershberger-Suri04]K-center [AHP01]K-median [Har-Peled-Mazumdar04]Range searching via -approximations:[Suri-Toth-Zhou04][Bagchi-Chaudhary-Eppstein-Goodrich04]

  • Dominant Approach: Merge and ReduceMain ideas:Design an (off-line) algorithm that computes a sketch of the inputSmall sizeSufficient to solve the problemA sketch of sketches is a sketch

  • Tree Computationp1p2p3p4p5p7p6p8p9p10p11p12p13p15p14p16

  • AlgorithmSpace: (sketch size)*log nTime: sketch computation timeQuestion: Where do sketches come from ?

  • Idea I: solution=sketchConsider k-median[Guha-Mishra-Motwani-OCallaghan00] : approximate k-median of approximate weighted k-medians is an approximate k-medianResult:Constant depth treeSpace: kn , >0O(1) -approximationWorks for any metric space231231k=3

  • Use the solution, ctd.-Approximations: find a subset SP , such that for any rectangle/halfspace/etc R, |RS|/|S| = |RP|/|P| [Matousek] : approximation of a union of approximations is an approximation[BCEG04] : convert it into streaming algorithm, applications1/2 space [STZ04] : better/optimal bounds for rectangles and halfspaces

  • Idea 2: Core-Sets [AHP01]Assume we want to minimize CP(o) SP is an -core-set for P, if for any o, and a set T: CPT (o) < (1+) CST (o)Note: this must hold for all o, not just the optimal oneThe latter is called a weak coreset [Badoiu-HarPeled-Indyk02]o

  • Example: Core-set for MEBCompute extremal points:Choose densely spaced direction v1 vk I.e., for any u there is vi such that u*vi ||u||2 / (1+)For each direction maintain extremal pointk=O(1/)(d-1)/2 suffice

  • Stream Algorithms via Core-setsDiameter/MEB/width: O(1/)(d-1)/2 log n space [AHPV01+02+04]k-center: O(k/d) log n [HP01]k-median: O(k/d) log n [HPM04] (k+d+log n+1/)O(1) [Chen06]Faster algorithms and other results: [Chan04], [Suri-Hershberger03]

  • Core-sets + Randomized Maps(1+)-approximate k-median under insertions and deletions [Frahling-Sohler05]

  • Insertions and Deletions, ctd

  • HistogramsView x as a function x:[1n] [1M]Approximate it using piecewise constant function h, with B pieces (buckets)Problem can be formulated in 2D as well (buckets become rectangular tiles)

  • Results: 1D[Gilbert-Guha-Indyk-Kotidis-Muthukrishnan-Strauss, STOC02] :Maintains h with B pieces such that ||x-h||2 (1+)||x-hOPT||2Under increments/decrements of xSpace: poly(B,1/,log n)Time: poly(B,1/,log n)

  • Results: 2D[Thaper-Guha-Indyk-Koudas, SIGMOD02] :Maintains h with B log (nM) tiles such that ||x-h||2 (1+)||x-hOPT||2Under increments/decrements of xSpace/Update time: poly(B,1/,log n)Histogram reconstruction time: poly(B,1/, n)[Muthukrishnan-Strauss, FSTTCS03] :Maintains h with 4B tilesTime: poly(B,1/, log(nM))

  • General ApproachMaintain sketches Ax of xThis allows us to estimate the error of any given h, via ||x-h|| ||Ax-Ah|| Construct h:EnumerationGreedyDynamic Programming

  • ConclusionsAlgorithms for geometric data streamsInsertions-only: merge and reduceInsertions and deletions: randomized linear embeddings

  • Open ProblemsHigh dimensions:Diameter: 21/2-approx, O(d2 n1/2 ) space, follows from [Goel-Indyk-Varadarajan, SODA01]c-approx, O( dn1/(c2 - 1) ) [Indyk, SODA03]Conjecture: 21/2-approx, O(d polylog n) spaceMin-width cylinder: 18-approx, O(d) space [Chan04]Other problems ?

  • Open ProblemsRange queries:General lower bounds ? (Not just for - approximations)(1/2) -bit bound for general queries follows from LB for dot product [Indyk-Woodruff, FOCS03] , and is tight (for randomized algorithms)What about e.g., half-space queries ? O(1/4/3) is known [STZ04]Other problems [STZ04]

  • Open ProblemsMatchings, Facility Location, etc:Replace log by O(1) or even 1+Possible for MST [Frahling-Indyk-Sohler??]Related to computing bi-chromatic matching [Agarwal-Varadarajan04]Min-sum clustering ?