Ad-hoc Distributed Spatial Joins on Mobile Devices Panos Kalnis, Xiaochen Li National University of...

Ad-hoc Distributed Spatial Joins on Mobile Devices

Panos Kalnis, Xiaochen Li (National University of Singapore)

Nikos Mamoulis (The University of Hong Kong)

Spiridon Bakiras (Hong Kong University of Science and Technology)

Motivation

Users are equipped with a mobile device (e.g., PDA)

Ad-hoc spatial queries

Combine data from remote servers

[Figure: two remote servers, Hotels and Restaurants]

“Find hotels which are within 500m of a seafood restaurant”

Servers do not collaborate with each other

The query is executed on the mobile device

Mediators?

Services may only allow end-user connections (e.g., subscribers only)

Access through mediators may be more expensive

Requests are ad-hoc; existing mediators may not support them

[Figure: Hotels and Restaurants servers accessed through a Mediator]

Telecommunication companies typically charge by the volume of transferred data (e.g., GPRS) rather than by connection time.

Goal: Minimize the amount of transferred data.

Solution

Ask aggregate queries to estimate the data distribution (i.e., statistics)

Partition the space recursively to achieve sub-linear transfer cost

Choose the physical operator independently for each partition

Related Work

Hash-based methods (e.g., PBSM): require all data to be transferred

R-tree based methods (e.g., [Tan et al., TKDE, 2000]): require access to internal index

Mediators:

HERMES: Statistics from previous queries

DISCO, Garlic: Statistics during initialization

Tukwila: Optimize parts of the execution tree

Operators

WINDOW query: return all objects intersecting a window w

COUNT query: return the number of objects intersecting w

ε-RANGE query: return all objects within range ε from a point p

NO access to the internal indices!
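The three operators can be sketched as a small in-memory stand-in for a remote server. This is only an illustration under assumptions: the class name `SpatialServer`, the point-only data model, and the tuple layout of a window are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]
Window = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed layout


@dataclass
class SpatialServer:
    """Hypothetical stand-in for a remote server that hides its index."""
    points: List[Point]

    def window(self, w: Window) -> List[Point]:
        # WINDOW query: all objects intersecting window w
        xmin, ymin, xmax, ymax = w
        return [(x, y) for x, y in self.points
                if xmin <= x <= xmax and ymin <= y <= ymax]

    def count(self, w: Window) -> int:
        # COUNT query: only the number of objects is returned, not the objects
        return len(self.window(w))

    def eps_range(self, p: Point, eps: float) -> List[Point]:
        # eps-RANGE query: all objects within distance eps of point p
        return [q for q in self.points
                if hypot(q[0] - p[0], q[1] - p[1]) <= eps]


hotels = SpatialServer([(1.0, 1.0), (4.0, 4.0), (9.0, 9.0)])
in_window = hotels.window((0.0, 0.0, 5.0, 5.0))
n_in_window = hotels.count((0.0, 0.0, 5.0, 5.0))
near_origin = hotels.eps_range((0.0, 0.0), 2.0)
```

The client sees only these three calls; a COUNT answer is a single number, which is what makes it a cheap source of statistics.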

Query Types

Intersection Join: Find hotels which are inside parks

ε-Range Join: Find restaurants which are within 500m of a hotel

Iceberg Semi-join: Find hotels which are close to at least 3 restaurants
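To make the ε-range join and iceberg semi-join predicates concrete, here is a minimal sketch that evaluates them locally on point data. The function names and the parameter `m` (minimum number of matches) are assumptions for illustration; the slides give the query types only by example.

```python
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]


def dist(a: Point, b: Point) -> float:
    return hypot(a[0] - b[0], a[1] - b[1])


def eps_range_join(r: List[Point], s: List[Point], eps: float):
    # All pairs (a, b) with a in R, b in S and distance at most eps
    return [(a, b) for a in r for b in s if dist(a, b) <= eps]


def iceberg_semi_join(r: List[Point], s: List[Point], eps: float, m: int):
    # Objects of R that have at least m objects of S within distance eps
    return [a for a in r if sum(1 for b in s if dist(a, b) <= eps) >= m]


hotels = [(0.0, 0.0), (10.0, 10.0)]
restaurants = [(0.0, 1.0), (1.0, 0.0), (0.0, -1.0), (10.0, 11.0)]

pairs = eps_range_join(hotels, restaurants, 1.5)
popular = iceberg_semi_join(hotels, restaurants, 1.5, 3)
```

Only the first hotel has three restaurants nearby, so it is the only one the iceberg semi-join keeps.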

Hash Based Spatial Join

Each partition must fit in memory

Recursive evaluation

Retrieve statistics for each subpart

Inefficient HBSJ
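The recursive evaluation described for HBSJ can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes point data, an ε-range join predicate, quadrant partitioning, a hypothetical `capacity` (objects that fit in memory), and a `max_depth` guard. Local lists stand in for the remote WINDOW and COUNT calls.

```python
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]
Window = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def window_query(data: List[Point], w: Window, closed: bool = False) -> List[Point]:
    # Half-open windows assign each R object to exactly one partition;
    # the root window must therefore extend past the largest coordinates.
    xmin, ymin, xmax, ymax = w
    if closed:
        return [(x, y) for x, y in data if xmin <= x <= xmax and ymin <= y <= ymax]
    return [(x, y) for x, y in data if xmin <= x < xmax and ymin <= y < ymax]


def quadrants(w: Window) -> List[Window]:
    xmin, ymin, xmax, ymax = w
    xm, ym = (xmin + xmax) / 2, (ymin + ymax) / 2
    return [(xmin, ymin, xm, ym), (xm, ymin, xmax, ym),
            (xmin, ym, xm, ymax), (xm, ym, xmax, ymax)]


def hbsj(r, s, w, eps, capacity, depth=0, max_depth=16):
    # S is probed with the window enlarged by eps so that pairs that
    # straddle a partition boundary are not lost.
    s_win = (w[0] - eps, w[1] - eps, w[2] + eps, w[3] + eps)
    n_r = len(window_query(r, w))               # stands in for COUNT on R
    n_s = len(window_query(s, s_win, True))     # stands in for COUNT on S
    if n_r == 0 or n_s == 0:
        return []                               # prune: nothing can join here
    if n_r + n_s <= capacity or depth >= max_depth:
        r_part = window_query(r, w)             # download both partitions
        s_part = window_query(s, s_win, True)
        return [(a, b) for a in r_part for b in s_part
                if hypot(a[0] - b[0], a[1] - b[1]) <= eps]
    pairs = []
    for q in quadrants(w):                      # partition does not fit: recurse
        pairs += hbsj(r, s, q, eps, capacity, depth + 1, max_depth)
    return pairs


hotels = [(1.0, 1.0), (2.0, 8.0), (7.0, 7.0), (9.0, 2.0)]
restaurants = [(1.5, 1.0), (7.0, 7.4), (5.0, 5.0), (9.0, 9.0)]
result = hbsj(hotels, restaurants, (0.0, 0.0, 10.0, 10.0), 0.5, capacity=3)
```

Every level of recursion costs additional COUNT queries, which is why a purely recursive HBSJ can be inefficient when the extra statistics do not lead to pruning.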

Nested Loop Spatial Join

Recursive HBSJ : 4 QRY + 2 RCV + 5 RCV

NLSJ : 2 RCV + 2 SND + 2 RES

Inefficient NLSJ
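A nested-loop join against non-collaborating servers might look like the sketch below: the objects of one dataset are downloaded and each is sent as a query to the other server. The class `SpatialServer`, the `queries_received` counter, and the use of an ε-range probe (instead of a window probe) are assumptions made for illustration.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]
Window = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class SpatialServer:
    """Hypothetical stand-in for a remote server."""
    points: List[Point]
    queries_received: int = 0  # counts requests, to show the message overhead

    def window(self, w: Window) -> List[Point]:
        self.queries_received += 1
        xmin, ymin, xmax, ymax = w
        return [(x, y) for x, y in self.points
                if xmin <= x <= xmax and ymin <= y <= ymax]

    def eps_range(self, p: Point, eps: float) -> List[Point]:
        self.queries_received += 1
        return [q for q in self.points
                if hypot(q[0] - p[0], q[1] - p[1]) <= eps]


def nlsj(outer: SpatialServer, inner: SpatialServer, w: Window, eps: float):
    pairs = []
    for p in outer.window(w):                 # download the outer objects in w
        for q in inner.eps_range(p, eps):     # one probe per outer object
            pairs.append((p, q))
    return pairs


hotels = SpatialServer([(1.0, 1.0), (7.0, 7.0)])
restaurants = SpatialServer([(1.5, 1.0), (7.0, 7.4), (5.0, 5.0), (9.0, 9.0)])
pairs = nlsj(hotels, restaurants, (0.0, 0.0, 10.0, 10.0), 0.5)
```

The number of probes grows with the size of the outer dataset, so NLSJ pays off mainly when one side of the window holds few objects.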

Cost Model

TCP/IP: $MTU = MSS + B_H$

$T_B(B_D) = B_D + B_H \lceil B_D / MSS \rceil$

c1: download |R_w| objects from R and |S_w| objects from S and join them on the PDA

c2, c3: download |R_w| objects from R, send them as window queries to S and retrieve the results (c3: the same with the roles of R and S swapped)

c4: repartition w, retrieve detailed statistics and apply the algorithm recursively
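The comparison between the download-both option (c1) and the nested-loop options (c2, c3) can be illustrated with a small byte-counting sketch. It assumes that each packet carries at most MSS = MTU − B_H payload bytes plus a B_H-byte header. The MTU and header sizes are typical values. The object size, query size, and matches per query are hypothetical. Treating c3 as the symmetric case with S as the outer dataset is also an assumption.

```python
MTU = 1500        # assumed: typical Ethernet MTU in bytes
B_H = 40          # assumed: TCP/IP header bytes per packet (no options)
MSS = MTU - B_H   # payload bytes per packet

OBJ_BYTES = 20    # hypothetical size of one transferred object
QRY_BYTES = 24    # hypothetical size of one query message


def transferred_bytes(payload: int) -> int:
    # Payload plus one header per packet needed to carry it
    packets = -(-payload // MSS)  # ceiling division on integers
    return payload + B_H * packets


def cost_download_both(n_r: int, n_s: int) -> int:
    # c1: fetch both partitions and join on the device
    return transferred_bytes(n_r * OBJ_BYTES) + transferred_bytes(n_s * OBJ_BYTES)


def cost_nested_loop(n_outer: int, matches_per_query: int) -> int:
    # c2 / c3: fetch the outer partition, then one query and one result
    # message per outer object (matches_per_query is an assumed estimate)
    download = transferred_bytes(n_outer * OBJ_BYTES)
    per_query = (transferred_bytes(QRY_BYTES)
                 + transferred_bytes(matches_per_query * OBJ_BYTES))
    return download + n_outer * per_query


costs = {
    "c1": cost_download_both(10, 1000),
    "c2": cost_nested_loop(10, 2),     # R (10 objects) as outer
    "c3": cost_nested_loop(1000, 0),   # S (1000 objects) as outer
}
cheapest = min(costs, key=costs.get)
```

With these made-up numbers the small dataset is the better outer side, because every probe pays a fixed per-packet header.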

UpJoin (Uniform Partition Join)

Decide if datasets are uniform

If HBSJ is cheaper and both datasets are uniform then perform HBSJ

If NLSJ is cheaper and the largest dataset is uniform then perform NLSJ

Else repartition

Uniformity check

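[Figure: window w split into four quadrants w’0, w’1, w’2, w’3 with object counts D_w’0, D_w’1, D_w’2, D_w’3; each quadrant count D_w’i is compared with the count D_w of the whole window]

% variation from uniform distribution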

Note: UpJoin will not repartition if the cost for retrieving statistics is larger than the cost of joining
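The UpJoin decision for one window can be sketched as below. The uniformity test is an assumption, since the exact formula is not legible in the slides: a dataset is treated as uniform when every quadrant count deviates from a quarter of the total by at most a fraction `alpha`. The function names and the cost arguments are hypothetical.

```python
from typing import Sequence


def is_uniform(quadrant_counts: Sequence[int], alpha: float) -> bool:
    # Assumed test: each quadrant holds roughly a quarter of the objects
    total = sum(quadrant_counts)
    if total == 0:
        return True
    expected = total / 4
    return all(abs(c - expected) <= alpha * expected for c in quadrant_counts)


def upjoin_step(r_counts, s_counts, cost_hbsj, cost_nlsj, cost_stats, alpha):
    r_uniform = is_uniform(r_counts, alpha)
    s_uniform = is_uniform(s_counts, alpha)
    # Do not repartition if statistics cost more than simply joining
    if cost_stats > min(cost_hbsj, cost_nlsj):
        return "HBSJ" if cost_hbsj <= cost_nlsj else "NLSJ"
    if cost_hbsj <= cost_nlsj and r_uniform and s_uniform:
        return "HBSJ"
    larger_uniform = r_uniform if sum(r_counts) >= sum(s_counts) else s_uniform
    if cost_nlsj < cost_hbsj and larger_uniform:
        return "NLSJ"
    return "REPARTITION"


d1 = upjoin_step([25, 25, 25, 25], [24, 26, 25, 25], 100, 300, 10, 0.2)
d2 = upjoin_step([70, 10, 10, 10], [25, 25, 25, 25], 100, 300, 10, 0.2)
d3 = upjoin_step([2, 1, 0, 1], [250, 251, 249, 250], 900, 300, 10, 0.2)
d4 = upjoin_step([70, 10, 10, 10], [25, 25, 25, 25], 100, 300, 500, 0.2)
```

In the fourth call R is skewed, yet the window is joined directly because fetching more statistics would cost more than the join itself.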

Inefficient UpJoin

SR-Join (Similarity Related Join)

[Figure labels: Area; % variation of density]

Identify dense and sparse quadrants

If the distribution is similar then apply HBSJ or NLSJ

Else repartition

Experimental setup

Implementation

Server: Unix

Client: HP-Ipaq PDA (WiFi network, 400MHz RISC CPU, 64MB RAM, Windows Pocket PC)

Datasets:

Synthetic: 1K – 10K points, varying skew

Real: Roads and railways of Germany

Setting the parameters

[Plots: α (for UpJoin) and ρ (for SR-Join); each panel labeled “Uniform”]

Real Dataset

[Plot; label: “Uniform”]

Comparison with SemiJoin

SemiJoin: Use intermediate levels of R-Tree index

We cannot use it in practice, because we cannot access the index

[Plot; label: “Uniform”]

Conclusions

Distributed spatial joins on mobile devices

No mediator, non-collaborative servers, limited set of supported operators

Two algorithms: UpJoin and SR-Join

Both estimate the datasets’ distribution

Future work

Support multi-way spatial joins

Improve the accuracy of the cost model

Questions?
