Trajectory Data Modeling and Processing in HBase

Faisal Moeen Orakzai

Fachgebiet Datenbanksysteme und Informationsmanagement
Technische Universität Berlin

A thesis submitted for the degree of Master of Science (M.Sc.) in Computer Science

August 10th, 2014

Advisor
Dipl.-Inf. Alexander S. Alexandrov

Reviewers
Prof. Dr. rer. nat. Volker Markl
Prof. Dr. Esteban Zimányi
Abstract

The number of devices equipped with location sensors has increased exponentially in the last couple of years. These devices generate huge amounts of movement data that are difficult to process or query because of the lack of scalability of existing approaches and systems. There have been efforts in this direction, but these are limited to research prototypes, and such systems have neither attracted users nor made it into industry. In this thesis, we design and implement Güting's moving objects algebra on the open-source key-value store HBase. We discuss various spatial indexing strategies to improve query performance and present our strategy based on space-filling curves. To enable efficient querying using space-filling curves, we present the design and implementation of a query processing layer on top of Apache Phoenix and compare the performance of our implementation with existing work.
Zusammenfassung

The number of devices equipped with location sensors has increased exponentially in recent years. These devices generate enormous amounts of movement data that are difficult to process or query because of the lack of scalability of existing approaches and systems. There have been efforts in this direction, but they are limited to research prototypes, and such systems have neither attracted users nor been adopted by industry. In this thesis, we design and implement Güting's moving objects algebra on the open-source key-value store HBase. We discuss various spatial indexing strategies to improve query performance and present our strategy based on space-filling curves. To enable efficient querying using space-filling curves, we present the design and implementation of a query processing layer on top of Apache Phoenix and compare the performance of our implementation with existing work.
Acknowledgements

I would like to thank Prof. Dr. Ralf Hartmut Güting for his help during the course of this thesis. I especially appreciate Jiamin Lu for his help with Parallel Secondo and for answering all my questions immediately, regardless of the time, even while he was on holiday. I would also like to thank Johannes Kirschnick and all colleagues from the database systems research group (DIMA) at TU Berlin for their continuous support and the numerous comments in the past months.
Contents

List of Figures
List of Tables

1 Introduction

2 Background
2.1 Spatio-Temporal Algebras
2.1.1 Allen's Temporal Concepts
2.1.2 SQL-2011
2.1.3 Güting's Spatio-temporal Algebra
2.1.4 Hermes
2.2 Güting's Spatio-temporal Algebra
2.2.1 Data Types
2.2.1.1 mpoint & mregion
2.2.1.2 Other Data-Types
2.2.2 Operators
2.2.3 Example Queries
2.2.4 SECONDO
2.3 Spatio-temporal Indexes [1]
2.3.1 Multidimensional Indexes
2.3.1.1 R-Tree
2.3.1.2 3D R-Tree
2.3.1.3 STR-Tree
2.3.1.4 TB-Tree
2.3.2 Multi-Version R-Trees
2.3.2.1 HR-Tree
2.3.2.2 HR+-Tree
2.3.2.3 MVR-Tree
2.3.3 Grid Based Index
2.3.3.1 SETI
2.3.3.2 MTSB-Tree
2.3.3.3 CSE-Tree
2.4 Space-Filling Curves
2.4.1 Z-Order Curve
2.4.2 Hilbert Curve
2.4.3 GeoHash
2.4.3.1 Introduction
2.4.3.2 How to Calculate Geo-Hash
2.4.3.3 Objectives to Achieve in Geo-Hash Indexing

3 Distributed Platforms for Querying
3.1 Distributed Spatial Data Processing Platforms
3.1.1 Spatial-Hadoop
3.1.2 Hadoop-GIS
3.2 Distributed Online Querying Platforms
3.2.1 Cassandra
3.2.2 Stinger
3.2.3 HBase
3.2.4 Phoenix
3.3 Parallel Secondo
3.3.1 Architecture
3.3.2 Parallel Query Execution
3.3.2.1 PS-Matrix
3.3.2.2 Distributed Data Types
3.3.2.3 Distributed Operators
3.4 HBase
3.4.1 Log Structured Merge Trees
3.4.2 Architecture
3.4.3 Write Process
3.4.4 Read Process
3.4.5 Data Model
3.5 Choice of Platform for Güting's Algebra
3.5.1 Schema Design
3.5.2 Indexing
3.5.3 Partitioning Control
3.5.4 Co-location
3.5.5 Scan Performance
3.5.6 Transactions
3.5.7 Latency
3.6 Summary

4 Algebra Implementation
4.1 Motivation behind the use of Apache Phoenix
4.2 Implementation Approaches
4.2.1 Use of Struct
4.2.2 Binary Objects
4.2.3 Data Type Flattening
4.3 Data Structures
4.3.1 Spatial Data Types
4.3.1.1 Point
4.3.1.2 Points
4.3.1.3 Line
4.3.1.4 DLine
4.3.1.5 Region
4.3.2 Basic Unit Data Types
4.3.3 Spatial Unit Data Types
4.3.4 Basic Range Data Types
4.3.5 Temporal Range Data Types
4.3.6 Basic Temporal Data Types
4.3.7 Spatio-Temporal Data Types
4.3.7.1 MPoint
4.4 Operators

5 Indexing Strategy & Querying Framework
5.1 Indexing in HBase
5.2 Indexing Strategies
5.2.1 Maintaining a Global Index
5.2.2 Maintaining Local Indexes
5.2.3 Maintaining Distributed Indexes
5.2.4 SFC-based Indexing for HBase
5.3 Spatial Index Design for LSMT
5.3.1 Co-location
5.3.2 Lesser Size of Unwanted Data Scan
5.3.3 Lesser Scans
5.4 Our Approach
5.4.1 Preliminary Choices
5.4.2 Choice of SFC Index
5.4.2.1 Choice of Geo-Hash
5.4.3 Indexing a Region
5.4.3.1 Single-Level Single-Hash (SLSH)
5.4.3.2 Multiple Hashes per Region
5.4.4 Physical Approaches for Building the Index
5.4.4.1 Single-Index Approach
5.4.4.2 Multi-Index Approach
5.4.5 Optimization
5.4.6 Index Implementation
5.4.6.1 Schema Design for GET Requests
5.4.6.2 Schema Design for SCAN Requests
5.5 The Querying Framework
5.5.1 Güting's Algebra
5.5.2 SFC Plugins
5.5.3 Query Translator
5.5.4 Query Optimizer
5.5.5 Stats-Store
5.5.6 Hash Coverage Algorithm
5.5.7 Client-side Filter
5.5.8 Meta-Store
5.5.8.1 Schema Meta-Data
5.5.8.2 Algebra Meta-Data
5.6 Future Work

6 Benchmark & Results
6.1 Experimental Setup
6.2 The Dataset
6.3 Query Selection
6.4 Results
6.4.1 Query-1
6.4.2 Query-2
6.4.3 Query-3
6.4.4 Query-4
6.4.5 Query-5

7 Conclusion
7.1 Summary
7.2 Outlook

References
List of Figures

2.1 A moving point
2.2 Type Constructors
2.3 Operators for moving types
2.4 Operators with moving results
2.5 Operators
2.6 Two views of R-Tree [1]: (a) Objects & Minimum Bounding Boxes; (b) R-Tree
2.7 TB-Tree
2.8 HR-Tree
2.9 MV3R-Tree
2.10 Spatial Query using Hilbert Curve
2.11 Z-Order Calculation
2.12 Z-Order Curve
2.13 Hilbert Curve
2.14 Geo-Hash Precision Coverage
2.15 Relative Distances
2.16 Geo-Hash Calculation
3.1 Hadoop-GIS Architecture
3.2 PS-Matrix
3.3 Parallel Secondo Infrastructure
3.4 Some examples of the flist data type
3.5 Multipage blocks iteratively merged across LSM-trees
3.6 HBase Architecture
3.7 An HBase table with two column families
5.1 Implementation Block Diagram
5.2 GeoHash Edge Case
5.3 Meta-Store
6.1 Query-1 Results
6.2 Query-2 Results
6.3 Query-3 Performance
6.4 Query-4 Performance
6.5 Query-5 Performance
List of Tables

3.1 Platform Selection Criteria
4.1 Unit Data Types
4.2 Spatial Unit Data Types
4.3 Basic Range Data Types
4.4 Basic Temporal Data Types
5.1 Number of points in each grid level
5.2 Index for Movement Table along with hash-length
5.3 Constant Spatial Entities
5.4 Index Types
6.1 BerlinMOD Datasets
6.2 Benchmark queries by index and input types
1 Introduction
Research on Moving Object Databases has been going on since 1995. These databases are more complex than relational databases because of the continuously changing dimension of time. The main goal has been to represent moving entities in databases and to enable a user to ask all kinds of questions about such movements. This requires extensions of the DBMS data model and query language. Further, the DBMS implementation needs to be extended at all levels, for example by providing data structures for representing moving objects, efficient algorithms for query operations, indexing and join techniques, extensions of the query optimizer, and extensions of the user interface to visualize and animate moving objects.
A model of data together with some operations on it is captured by the concept of an abstract data type (ADT). Güting et al. [2, 3] proposed in 2000 a type system and operations carefully designed for handling the temporal aspect of the data. Further work by the same group defines the discrete model and develops algorithms for the operations [4]. The model has also been extended to a network-based representation of moving objects (or trajectories) [5]. SECONDO [6, 7, 8, 9] and Hermes [10] are two prototypical implementations of this data model. SECONDO, developed at the University of Hagen, is very extensible; it provides querying at two levels, SQL and its executable language, and uses Berkeley DB as the underlying database engine. Hermes, in contrast, has been developed at the University of Piraeus, Greece, and uses SQL as its query language with Oracle as the underlying database.
Trajectories are hard to process because of the continuously changing time dimension, especially at large data sizes. Little work deals with processing trajectories in a distributed fashion. Parallel SECONDO, the parallel/distributed version of SECONDO, is an effort in this direction, but it lacks a user base and open-source community support. It is a research project that has neither been benchmarked for performance and scalability against other systems nor been proven in industry. It has its own executable query language (no SQL support) that gives the user the power to run queries at the slave databases and collect the results at the master, but the user has to parallelize queries manually to get the required results. It would therefore be interesting to see how well the known open-source distributed data stores tackle the problem of trajectory storage and querying.
HBase is an open-source implementation of Google's Bigtable [11] key-value store. It is a distributed, scalable, and fault-tolerant system that gives fast query response times over massive data. Data is automatically sharded to various nodes based on 'regions', which are horizontal partitions of the data. Data is kept sorted, which means that similar data is stored together; this results in faster range queries. On the one hand HBase shines because of its scalability, fault tolerance, and efficiency in online querying, but on the other hand it has limited support for defining a schema, which makes it difficult to model spatio-temporal data storage. Being a key-value store, HBase supports querying only by the key, which makes it difficult to design an effective storage strategy with acceptable performance for the spatial and temporal domains. There is limited support for indexing, and HBase does not allow plugging in custom indexes.
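HBase's sorted row-key order is what makes key-range scans cheap. The effect can be sketched with a sorted list of keys standing in for an HBase region (a toy illustration, not the HBase client API; the row keys are made up for the example):

```python
import bisect

# Toy model of HBase's sorted key space: rows are kept ordered by row key,
# so a range scan is a contiguous slice rather than a full-table filter.
rows = sorted([
    ("geo:u336x:t1", "p1"), ("geo:u336x:t2", "p2"),
    ("geo:u339k:t1", "p3"), ("geo:z0000:t1", "p4"),
])
keys = [k for k, _ in rows]

def scan(start_key, stop_key):
    """Return all rows with start_key <= key < stop_key, like an HBase Scan."""
    lo = bisect.bisect_left(keys, start_key)
    hi = bisect.bisect_left(keys, stop_key)
    return rows[lo:hi]
```

A query for every row under a common key prefix thus touches only one contiguous block, which is why a row-key design that sorts similar data together is crucial when the key is the only access path.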
In this thesis we present the design and implementation of Güting's algebra in HBase and compare its performance with Parallel SECONDO. We discuss possible spatial index integration scenarios, devise a spatial indexing strategy based on space-filling curves, and present the design and implementation of a query pre-processing layer on top of Apache Phoenix that enables support for spatial indexing in HBase. We benchmark the performance of both systems using the parallel BerlinMOD benchmark for parallel moving object databases. We also investigate the effect of our work, i.e. the implementation of the spatial index and Güting's algebra, on query performance when compared to raw HBase, and present the results.
The remainder of this thesis is structured as follows: Chapter 2 introduces spatio-temporal algebras and spatio-temporal indexing structures with a focus on Güting's algebra and the GeoHash index. Chapter 3 describes some of the relevant distributed platforms for online querying and the criteria that led us to the choice of HBase. Chapter 4 presents our implementation of Güting's algebra. Chapter 5 proposes various indexing approaches in HBase using space-filling curves and presents our index design and querying framework. Chapter 6 presents our experimental design and results. Finally, Chapter 7 concludes the discussion and provides some ideas for future development.
2 Background
This chapter reviews the background required to understand the ideas presented in the rest of the thesis. We first introduce some existing spatio-temporal algebras in section 2.1 and then explain in more detail Güting's algebra, which we have implemented as part of this thesis, in section 2.2. Section 2.3 gives an overview of various types of spatio-temporal indexes and briefly explains a few indexing structures. We discuss space-filling curves in section 2.4 as an alternative strategy for indexing spatial data using conventional one-dimensional indexes. The GeoHash index, our choice for the implementation in this thesis, is explained in more detail in section 2.4.3.
2.1 Spatio-Temporal Algebras

2.1.1 Allen's Temporal Concepts

Allen et al. [12] were among the first to propose temporal concepts such as intervals and the relationships between them. The concepts of recent temporal SQL standards resemble their work to a great extent. They proposed relations such as during, before, after, overlaps, equals and meets. Most of the temporal concepts of SQL-2011 can be represented using the relations they proposed.
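On closed intervals represented as (start, end) pairs, these relations can be sketched in a few lines. This is an illustrative Python sketch using our own tuple encoding, not code from Allen's paper:

```python
# Allen's interval relations on closed intervals (start, end).
# Predicate names follow the relations listed above; the tuple
# representation is our own choice for this sketch.

def equals(x, y):
    return x == y

def before(x, y):       # x ends strictly before y starts
    return x[1] < y[0]

def after(x, y):        # inverse of before
    return before(y, x)

def meets(x, y):        # x ends exactly where y starts
    return x[1] == y[0]

def overlaps(x, y):     # x starts first; they share a proper sub-interval
    return x[0] < y[0] < x[1] < y[1]

def during(x, y):       # x lies strictly inside y
    return y[0] < x[0] and x[1] < y[1]
```

For example, (1, 3) meets (3, 5), and (2, 3) is during (1, 5); temporal predicates of the SQL standard can be expressed as combinations of such relations.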
2.1.2 SQL-2011

SQL-2011 was published in December 2011, replacing SQL-2008 as the SQL standard. This version added the ability to create and manipulate temporal tables. Many temporal concepts were added that make querying over temporal data very easy. For completeness' sake, a few of the data types and operators are listed below [13]:
PERIOD Defines a period with a start and an end date: PERIOD(DATE '2010-01-01', DATE '2011-01-01')

OVERLAPS The predicate X OVERLAPS Y returns true if X and Y have at least one time point in common.

CONTAINS The predicate X CONTAINS Y returns true if Y is a subset of X.

PRECEDES The predicate X PRECEDES Y returns true if X occurs before Y and they do not overlap.

SUCCEEDS The predicate X SUCCEEDS Y returns true if X occurs after Y and they do not overlap.
Since SQL-2011 handles data that changes over time, movement data can also be modeled in a temporal database and queried. One drawback of this approach is that it focuses more on the state or version of the data than on its movement aspects. The operators are also not intuitive for querying moving objects, which makes it really hard to design analytical queries.
2.1.3 Güting's Spatio-temporal Algebra

This algebra focuses on moving objects and was proposed by Prof. Güting from the University of Hagen in 2000. From here onwards we refer to it as Güting's algebra to differentiate it from other algebras. During the course of this thesis, this algebra was implemented in HBase. Section 2.2 discusses it in detail with examples.
2.1.4 Hermes

HERMES [14] is a prototype database engine that defines a powerful query language for trajectory databases, enabling mobility-centric applications such as Location-Based Services (LBS). The querying model of HERMES is an extended version of Güting's algebra. HERMES extends the data definition and manipulation language of an object-relational DBMS (ORDBMS) with spatio-temporal semantics and functionality based on advanced spatio-temporal indexing and query processing techniques. HERMES has been implemented on top of the Oracle RDBMS.
2.2 Güting's Spatio-temporal Algebra

This section describes and discusses Güting's approach to modeling moving and evolving spatial objects based on the use of abstract data types. We introduce data types for moving points together with a set of operations on such entities. Some of the related auxiliary data types, such as pure spatial or temporal types and time-dependent real numbers, are also discussed. Chapter 4 discusses the implementation of a collection of these types and operations in HBase using the Apache Phoenix framework to obtain a complete data model and query language.
2.2.1 Data Types

2.2.1.1 mpoint & mregion

Both mpoint and mregion are extensions of the purely spatial data types point and region, respectively. An mpoint and an mregion can be described as mappings from time into space, that is

mpoint = time → point
mregion = time → region

More generally, we can introduce a type constructor τ which transforms any given atomic data type α into a type τ(α) with semantics

τ(α) = time → α

and we can denote the types mpoint and mregion also as τ(point) and τ(region), respectively.
A value of type mpoint describes the position of a point as a function of time. This can be represented as a curve in the 3-D space (x, y, t), as shown in figure 2.1. The assumption is that the space as well as the time dimension is continuous. This means that if the position of a point is asked for at a time instant that lies between the recorded timestamps, the data type will still return a position. A value of type mregion is a set of volumes in the 3-D space (x, y, t). Any intersection of that set of volumes with a plane t = t0 yields a region value, describing the moving region at time t0. It is possible that this intersection is empty, and an empty region is also a proper region value.

Figure 2.1: A moving point. [15]

Figure 2.2: Type Constructors [2]
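The view of an mpoint as a continuous function from time into space can be sketched as piecewise-linear interpolation over recorded (t, x, y) samples. This is only an illustrative Python sketch (the thesis implementation targets HBase/Phoenix); the class name and sample representation are our own, with at() loosely mirroring the algebra's at operator:

```python
import bisect

class MPoint:
    """A moving point: time-sorted (t, x, y) samples, linearly interpolated."""

    def __init__(self, samples):
        self.samples = sorted(samples)            # order by timestamp
        self.times = [s[0] for s in self.samples]

    def at(self, t):
        """Position at instant t, or None outside the definition time
        (an mpoint may be a partial function)."""
        if not self.times or t < self.times[0] or t > self.times[-1]:
            return None
        i = bisect.bisect_left(self.times, t)
        if self.times[i] == t:                    # exact sample hit
            return self.samples[i][1:]
        t0, x0, y0 = self.samples[i - 1]          # interpolate between the
        t1, x1, y1 = self.samples[i]              # two enclosing samples
        f = (t - t0) / (t1 - t0)
        return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))
```

Asking for the position between two recorded timestamps, e.g. MPoint([(0, 0.0, 0.0), (10, 10.0, 20.0)]).at(5), returns the interpolated point (5.0, 10.0), matching the continuity assumption above.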
2.2.1.2 Other Data-Types

Figure 2.2 shows the type constructors used to construct data-types of various forms, together with their signatures. The data-types are divided into five categories. BASE types include the basic types that every database supports, e.g. int, real, string and bool. SPATIAL data-types include conventional spatial types like point, line and region. The difference between these types and the ones used in other spatial databases is that here a line or a region can be disconnected: a line is in fact a collection of possibly disconnected lines, and a region is a collection of possibly disconnected regions. As can be seen in figure 2.2, points is an additional data-type that represents a collection of points. The third category is TIME. It contains only one data-type, instant, which corresponds to the timestamp type supported by conventional databases and represents time as a long value. The fourth category is RANGE. RANGE data-types can be built over either BASE or TIME types; they represent intervals. periods is a data-type belonging to the temporal range category, while the basic range types include rint, rbool, rreal, etc. The fifth category is TEMPORAL, which is further divided into spatio-temporal and basic temporal types. mpoint and mregion, discussed previously, belong to the spatio-temporal category; the basic temporal types mint, mbool and mreal are the moving versions of the basic types.

Figure 2.3: Operators for moving types [2]
2.2.2 Operators

Figure 2.3 shows some of the operators that can be applied to a moving data-type. at gives the value of a moving object at a particular point in time. minvalue and maxvalue give the minimum and maximum values of a moving object; for both functions, a total order must exist on α. start and stop return the minimum and maximum of a moving value's (time) domain, and duration gives the total length of the time intervals over which a moving object is defined. We can also use the functions startvalue(x) and stopvalue(x) as abbreviations for at(x, start(x)) and at(x, stop(x)), respectively. Whereas all these operations assume the existence of moving objects, const offers a canonical way to build spatio-temporal objects: const(x) is the "moving" object that yields x at any time.
In particular, for moving spatial objects we have operations such as mdistance and visits. mdistance computes the distance between two moving points at all times and hence returns a time-changing real number, a type that we call mreal ("moving real"; mreal = τ(real)). visits returns the positions of the moving point given as the first argument at the times when it was inside the moving region provided as the second argument. Here it becomes clear that a value of type mpoint may also be a partial function, in the extreme case a function where the point is undefined at all times. Operations may also involve pure spatial or pure temporal types and other auxiliary types. line is a data-type describing a curve in 2-D space that may consist of several disjoint pieces and may also be self-intersecting. region is a type for regions in the plane that may consist of several disjoint faces with holes. Figure 2.5 summarizes the operators that are part of the BerlinMOD benchmark and have been implemented by us.

Figure 2.4: Operators with moving results
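Over a sampled representation, several of these operators reduce to one-liners. The following Python sketch approximates mdistance by evaluating the distance only at shared sample instants, whereas the algebra defines it at all times; the function names mirror the operators above, but the list-of-samples encoding is our own:

```python
import math

# A moving point is a time-sorted list of (t, x, y) samples;
# a moving real (mreal) is a list of (t, value) samples.

def start(mp):
    return mp[0][0]                # minimum of the time domain

def stop(mp):
    return mp[-1][0]               # maximum of the time domain

def duration(mp):
    return stop(mp) - start(mp)    # total length of the definition time

def startvalue(mp):
    return mp[0][1:]               # shorthand for at(mp, start(mp))

def mdistance(a, b):
    """Sampled approximation of the time-changing distance between
    two moving points, evaluated at their shared instants only."""
    pos_b = {t: (x, y) for t, x, y in b}
    return [(t, math.hypot(x - pos_b[t][0], y - pos_b[t][1]))
            for t, x, y in a if t in pos_b]

def minvalue(mreal):
    return min(v for _, v in mreal)
```

With these, a "came closer than some distance" condition becomes a simple filter on minvalue(mdistance(a, b)).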
2.2.3 Example Queries

When the above-mentioned data types and operators are implemented in a DBMS, we can have a relation as follows:

flights(id: string; from: string; to: string; route: mpoint)

Now the query "Give me all flights from Düsseldorf that are longer than 5000 km" can be asked:

SELECT id
FROM flights
WHERE from = "DUS" AND length(trajectory(route)) > 5000

This query projects the route, i.e. an mpoint, into space. We can also project it onto the time dimension as follows:

SELECT to
FROM flights
WHERE from = "SFO" AND duration(route) ≤ 2.0

We can use the projections into space and time to solve spatio-temporal questions like "Find all pairs of planes that during their flight came closer to each other than 500 meters!":

SELECT A.id, B.id
FROM flights A, flights B
WHERE A.id ≠ B.id AND minvalue(mdistance(A.route, B.route)) ≤ 0.5

Figure 2.5: Operators [16]
2.2.4 SECONDO
SECONDO [6] is an extensible DBMS developed at University of Hagen. SECONDO
does not have a fixed datamodel, but is open for implementation of new models. It has
following three major components which can be used together or independently:
1. The kernel, which offers query processing over a set of implemented algebras, each
offering some type constructors and operators.
2. The optimizer,which implements the essential part of an SQL-like language.
3. The graphical user interface which is extensible by viewers for new data types and
which provides a sophisticated viewer for spatial and spatio-temporal (moving)
objects.
Many algebras have been implemented in SECONDO, such as relations, spatial data
types, R-trees, or midi objects (music files), each with suitable operations. Each com-
ponent of SECONDO is extensible, e.g. the kernel can be extended by algebras, the
optimizer by optimization rules and cost functions, and the GUI by viewers and dis-
play functions. SECONDO is of importance to us because it implements Guting's
algebra, which can be used within its SQL-like interface.
2.3 Spatio-temporal Indexes [1]
There are three kinds of spatio-temporal indexes, distinguished by their approach.
1. The first kind extends a multidimensional index such as the R-tree with a temporal
dimension, e.g. the 3D R-tree [17] or the STR-tree [17].
2. The second kind builds a separate R-tree for each timestamp and shares
intersecting parts between consecutive R-trees. Examples of this
type are the MR-tree [18], HR-tree [19], HR+-tree [20], and MV3R-tree [20].
3. The third kind divides the spatial space into grids and builds a temporal index
for each grid. This category includes SETI [21] and the MTSB-tree [22].
(a) Objects & Minimum Bounding Boxes (b) R-Tree
Figure 2.6: Two views of R-Tree [1]
2.3.1 Multidimensional Indexes
2.3.1.1 R-Tree
The R-Tree is one of the most widely used spatial indexes and is employed by many
spatial databases. It forms the basis of many varieties of spatial as well as spatio-
temporal indexes. Understanding the R-Tree is necessary to grasp the 3D R-Tree, which
can also index the temporal dimension in addition to the spatial dimensions. The R-Tree
is a height-balanced data structure. Each node represents a region which is the minimum
bounding box (MBB) of all its child nodes. Each node can have many children and
contains an entry for every child; this entry represents the MBB of the referenced child
node. Whenever a point or a region needs to be searched, the search descends the tree
using the MBBs of the nodes as keys for finding the right nodes. R-trees can also be used
for nearest neighbor queries using either depth-first search or best-first search [23].
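The descent described above can be sketched as follows. This is a minimal, illustrative search over a hand-built two-level tree; the node layout and names are assumptions for the sketch, not a production R-tree implementation:

```python
# Minimal R-tree search sketch: nodes store the MBBs of their children,
# and a query descends only into children whose MBB intersects the query box.

def intersects(a, b):
    """MBBs as (xmin, ymin, xmax, ymax); True if the boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def search(node, query, results):
    """Collect all leaf entries whose MBB intersects the query box."""
    for mbb, child in node["entries"]:
        if intersects(mbb, query):
            if node["leaf"]:
                results.append(child)          # child is a data object id
            else:
                search(child, query, results)  # child is a subtree
    return results

# A hand-built two-level tree: each root MBB covers its leaf's entries.
leaf1 = {"leaf": True, "entries": [((0, 0, 2, 2), "A"), ((3, 1, 5, 4), "B")]}
leaf2 = {"leaf": True, "entries": [((8, 8, 9, 9), "C")]}
root = {"leaf": False, "entries": [((0, 0, 5, 4), leaf1), ((8, 8, 9, 9), leaf2)]}

print(search(root, (1, 1, 4, 3), []))  # → ['A', 'B'] (leaf2 is pruned)
```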
2.3.1.2 3D R-Tree
The 3D R-Tree is an extension of the R-Tree which also takes the time dimension into
account when calculating MBBs. Instead of storing 2D MBBs, it stores a 3D MBB for
each segment, which enlarges the bounding box even when the segment itself is small.
This reduces the discrimination capability of the index. The temporal aspect of a query
can be based on either a time instant or a time period. The insertion and deletion of
data are the same as for the R-Tree.
Figure 2.7: TB-Tree [17]
2.3.1.3 STR-Tree
The STR-tree (Spatio-Temporal R-Tree) is an extension of the 3D R-tree which supports
efficient querying of trajectories. It differs from the 3D R-tree in its insertion and split
strategy: the STR-Tree improves on the R-Tree by keeping segments belonging to the
same trajectory together.
2.3.1.4 TB-Tree
The TB-tree (Trajectory-Bundle tree) [24] is an extension of the R-Tree which bundles
the segments of the same trajectory into the same leaf node. A TB-Tree consists of a
set of leaf nodes, each containing a partial trajectory, organized in a tree hierarchy. In
simple words, a trajectory spans a set of disconnected leaf nodes. Figure 2.7 shows a
trajectory, symbolized by the gray band, fragmented across six nodes c1, c3, etc. The
shown part of the TB-tree structure illustrates how this trajectory is stored. The leaf
nodes representing the trajectory are connected through a linked list.
Figure 2.8: HR-Tree
2.3.2 Multi-Version R-Trees
A different solution for spatio-temporal indexing than adding a temporal dimen-
sion to an R-Tree is to construct an R-Tree for each timestamp and then index the
R-Trees by time. Any temporal index can be used, but B-Trees serve the purpose well:
a B-Tree can be used to locate the R-Trees for a time instant or a time period, and
these R-Trees are then drilled down to locate the objects of interest. Constructing an
R-Tree for each timestamp is space consuming. To optimize this approach, only
the part of the R-Tree which differs from the previous timestamp is created for the
new timestamp; thus consecutive R-Trees share branches. The indexing structures
that use this strategy are the MR-tree [25] (Multiversion R-tree), HR-tree [19] (Historical
R-tree) and HR+-tree [20].
2.3.2.1 HR-Tree
Figure 2.8 shows an example of an HR-tree which stores spatial objects for timestamps 0
and 1. Since none of the spatial objects in A0 move, the entire branch is shared by both
trees R0 and R1. In this case, it is not necessary to recreate the entire branch in R1;
instead, a pointer is created to point to branch A0 in R0.
2.3.2.2 HR+-Tree
The HR+-Tree is an improvement of the HR-Tree. It allows entries belonging to different
timestamps to be stored in the same node. In simple words, if there is a small change in
the movement of an object, the node can be shared between different R-trees. For this
reason, the HR+-Tree consumes approximately 20% less space than an HR-Tree yet is
several times faster [20]. For a single-timestamp query, the querying time is the same.
Figure 2.9: MV3R-Tree
2.3.2.3 MV3R-Tree
Although the HR-Tree and HR+-Tree save a lot of space by not storing a complete R-Tree
for each timestamp, they still suffer from a lot of duplication, which costs space.
They are good at timestamp queries but perform poorly on interval queries. The MV3R-
Tree [26] uses a combination of multiversion B-Trees and 3D R-Trees to overcome these
disadvantages. An MV3R-tree consists of two structures: a multiversion R-tree (MVR-
tree) and a small auxiliary 3D R-tree built on the leaves of the MVR-tree in order to
process interval queries. Figure 2.9 shows an overview of the MV3R-tree.
2.3.3 Grid Based Index
The spatial and temporal dimensions differ in that the spatial dimension has a fixed
domain while the temporal domain is continuously growing; the rate of change is also
higher in the temporal domain. The 3D R-Tree handles both dimensions equally, which
results in a lot of overlapping bounding boxes. This leads to poor performance when
the data size grows. Grid-based indexes handle this by partitioning the data spatially;
within each partition, the data is indexed on the temporal dimension. The SETI index
(Scalable and Efficient Trajectory Index) [21] was the first grid-based index.
2.3.3.1 SETI
SETI partitions the data into spatial cells. This partitioning can be either fixed
or dynamic. Each cell contains the trajectories present within its boundaries. If a
trajectory spans multiple cells, it is split into multiple pieces in such a way
that each cell only stores the piece lying inside its boundaries. Each cell is represented
by a data page and each trajectory piece is stored in the file as a tuple. The number of files
may increase with time. The lifetime of each page file is indexed using an R*-tree.
Hence SETI has a sparse temporal index and is lightweight.
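The cell-wise splitting can be sketched with a toy example, assuming a fixed uniform grid and point-sampled trajectories. For simplicity, segments crossing a cell boundary are not geometrically clipped here, although SETI splits them exactly at the boundary:

```python
from collections import defaultdict

def cell_of(x, y, cell_size):
    """Map a point to the (column, row) index of its fixed grid cell."""
    return (int(x // cell_size), int(y // cell_size))

def partition_trajectory(points, cell_size):
    """Split one point-sampled trajectory (a list of (x, y, t) samples)
    into per-cell pieces: a new piece starts whenever the trajectory
    enters another cell. Boundary segments are not clipped here."""
    pieces = defaultdict(list)      # cell -> list of trajectory pieces
    current, current_cell = [], None
    for (x, y, t) in points:
        c = cell_of(x, y, cell_size)
        if c != current_cell and current:
            pieces[current_cell].append(current)
            current = []
        current_cell = c
        current.append((x, y, t))
    if current:
        pieces[current_cell].append(current)
    return pieces

pts = [(1, 1, 0), (2, 1, 1), (6, 1, 2), (7, 2, 3)]
print(dict(partition_trajectory(pts, cell_size=5)))
# → {(0, 0): [[(1, 1, 0), (2, 1, 1)]], (1, 0): [[(6, 1, 2), (7, 2, 3)]]}
```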
2.3.3.2 MTSB-Tree
MTSB-Tree is a variant of SETI which differs in temporal indexing strategy. Unlike
SETI, it uses the TSB-tree (Time Split B-tree) [27] to index the time dimension within
each cell. Compared to R*-tree that is used by SETI, the advantage of using TSB-tree
is that it provides results sorted by time. So it is better for those queries returning
trajectories close in spatial as well as temporal dimensions.
2.3.3.3 CSE-Tree
Another variant of SETI is CSE-tree (Compressed Start-End tree) [28] which uses
different temporal indexes for each cell. If the cell is frequently updated, it uses B+-
Tree whereas, for rarely updated data, it uses sorted dynamic array .
2.4 Space-Filling Curves
Indexes like the R-Tree work efficiently on moderate data sizes, but their performance
degrades as the data grows. Multi-dimensional queries may lead to a multitude of disk
seeks, which can kill performance. Insertion is another problem: updating R-Trees or
their variants can become a bottleneck. Space-filling curves convert multi-dimensional
data into a single dimension, which can then be indexed with a B-Tree. As B-Trees
perform extremely well on 1-D data, query performance increases significantly. The
resulting 1-D data can also be sorted, which makes range queries perform far better.
The performance of range queries also depends on the type of space-filling curve being
used: a curve which keeps nearby data close together after the transformation performs
better. Figure 2.10 shows a polygon inside an area mapped by a Hilbert Curve. As
each grid cell has been assigned a number using the Hilbert Curve, the polygon can
easily be retrieved using the following SQL query.
SELECT *
FROM regiontable
WHERE hilbert_value=3 OR (hilbert_value>=7
      AND hilbert_value<=12 AND hilbert_value<>11)

Figure 2.10: Spatial Query using Hilbert Curve [29]
Some well-known space-filling curves are briefly explained in the following.
2.4.1 Z-Order Curve
Z-order, also known as Morton order, maps multidimensional data to one dimension
while preserving the locality of the data points. The z-value of a multidimensional point
is calculated by interleaving the binary representations of its coordinate values. Fig-
ure 2.11 explains the calculation of the z-value for each grid cell, whereas Figure 2.12
shows the first three orders of the Z-curve. Once the data are sorted into this ordering,
any one-dimensional data structure can be used, such as binary search trees, B-trees,
skip lists or (with low significant bits truncated) hash tables. The resulting ordering can
equivalently be described as the order one would get from a depth-first traversal of a
quadtree; because of this close connection with quadtrees, the Z-ordering can be used to
efficiently construct quadtrees and related higher-dimensional data structures. [30]
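The bit interleaving described above can be sketched as follows; this is a minimal illustration, and the function name and fixed bit width are assumptions for the example:

```python
def interleave(x, y, bits=16):
    """Compute the z-value of (x, y) by interleaving the bits of the two
    coordinates: x occupies the even bit positions, y the odd ones."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)        # i-th bit of x -> bit 2i
        z |= ((y >> i) & 1) << (2 * i + 1)    # i-th bit of y -> bit 2i+1
    return z

# For a 2x2 grid this yields the familiar Z shape: 0, 1, 2, 3.
print([interleave(x, y, 2) for (x, y) in [(0, 0), (1, 0), (0, 1), (1, 1)]])
# → [0, 1, 2, 3]

# Sorting by z-value keeps spatially close points near each other.
points = [(7, 7), (1, 1), (0, 0), (6, 7), (1, 0)]
print(sorted(points, key=lambda p: interleave(*p)))
# → [(0, 0), (1, 0), (1, 1), (6, 7), (7, 7)]
```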
2.4.2 Hilbert Curve
The Hilbert curve is similar to the z-order curve but follows its own ordering. This
ordering also preserves the proximity of points in most cases. However, as with the z-order curve,
Figure 2.11: Z-Order Calculation
Figure 2.12: 1st, 2nd and 3rd order Z-Curves [31]
Figure 2.13: Approximations of Hilbert Curve [31]
there are edge cases where multiple range queries might be required to retrieve a region.
Figure 2.13 shows different approximations of Hilbert Curve.
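The Hilbert index of a grid cell can be computed with the standard iterative algorithm, shown here as an illustrative sketch for an n×n grid where n is a power of two:

```python
def xy2d(n, x, y):
    """Hilbert index of cell (x, y) in an n x n grid (n a power of two).
    At each level the chosen quadrant contributes s*s cells to the index,
    and the coordinates are rotated/flipped so the same traversal pattern
    repeats at the next, finer level."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                  # rotate/flip the quadrant
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

# The first-order 2x2 curve visits (0,0), (0,1), (1,1), (1,0) in order.
print([xy2d(2, x, y) for (x, y) in [(0, 0), (0, 1), (1, 1), (1, 0)]])
# → [0, 1, 2, 3]
```

Unlike the z-order curve, consecutive Hilbert indices always correspond to cells that are direct neighbors in the grid.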
2.4.3 GeoHash
2.4.3.1 Introduction
Geo-Hash is a hierarchical spatial data structure which divides space into grid-
shaped buckets. One of the benefits of Geo-Hash is that it supports multiple precisions
simply by removing or adding characters at the end of the hash value: the fewer
characters used, the lower the precision. This allows data that is not of interest to be
filtered out at a much lower precision, using cheaper operations. Another property of
the Geo-Hash index is the co-location of neighboring coordinates: nearby places are
usually, but not always, represented by similar prefixes, and the longer a shared prefix
is, the closer the two places are.
A Geo-Hash is a kind of space-filling curve that turns multidimensional values into
a single-dimensional one. This requires each of the dimensions to have a fixed domain.
For spatial values, we want to convert latitude and longitude to a single value. The
longitude dimension has the range [-180.0, 180.0], and the latitude dimension has the
range [-90.0, 90.0]. Although the Geo-Hash preserves spatial locality, there are edge
cases where this does not hold true. A precision is designated when calculating a Geo-
Hash. The highest precision that an 8-byte long can hold is 12 characters. We can increase
Figure 2.14: Geo-Hash Precision Coverage [32]
the precision further, but then the hash no longer fits in a long. When end characters
are removed from a Geo-Hash, its precision decreases and it represents a larger
area of the map. Twelve characters, i.e. full precision, represent a point; any Geo-Hash
with fewer characters represents an area on the map, i.e. a bounding box around an
area. Figure 2.14 illustrates the variation in the size of the represented area when a
Geo-Hash is truncated.
In HBase, we can use a Geo-Hash as a prefix for querying. All points within
the space represented by the Geo-Hash match the common prefix. This means that
we can use HBase's prefix scan on the rowkeys to retrieve points that are relevant
to the query. As rowkeys are sorted, close points are stored together on
disk. But as figure 2.14 shows, if we choose a lower precision, we might retrieve a
lot of unwanted data. Let's look at some real points. Consider these three locations:
LaGuardia Airport (40.77 N, 73.87 W), JFK International Airport (40.64 N, 73.78
W), and Central Park (40.78 N, 73.97 W). Their coordinates Geo-Hash to the values
dr5rzjcw2nze, dr5x1n711mhd, and dr5ruzb8wnfr respectively. We can look at those
points on the map in figure 2.15 and see that Central Park is closer to LaGuardia
Figure 2.15: Relative Distances of Points [32]
than JFK. In absolute terms, Central Park to LaGuardia is about 5 miles, whereas
Central Park to JFK is about 14 miles. Because they’re closer to each other spatially,
we expect Central Park and LaGuardia to share more common prefix characters than
Central Park and JFK. [32]
sort <(echo "dr5rzjcw2nze"; echo "dr5x1n711mhd"; echo "dr5ruzb8wnfr")
dr5ruzb8wnfr
dr5rzjcw2nze
dr5x1n711mhd
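HBase's prefix scan over sorted rowkeys can be simulated on a sorted Python list. The rowkeys below reuse the three Geo-Hashes from the text; the helper itself is an illustrative sketch, not the HBase client API:

```python
from bisect import bisect_left

def prefix_scan(sorted_rowkeys, prefix):
    """Return all rowkeys starting with `prefix`, touching only one
    contiguous range of the sorted keys, as an HBase prefix scan would."""
    start = bisect_left(sorted_rowkeys, prefix)
    out = []
    for key in sorted_rowkeys[start:]:
        if not key.startswith(prefix):
            break  # sorted order: no later key can match the prefix
        out.append(key)
    return out

rowkeys = sorted(["dr5rzjcw2nze",   # LaGuardia
                  "dr5x1n711mhd",   # JFK
                  "dr5ruzb8wnfr"])  # Central Park

print(prefix_scan(rowkeys, "dr5r"))  # → ['dr5ruzb8wnfr', 'dr5rzjcw2nze']
print(prefix_scan(rowkeys, "dr5"))   # the coarser prefix matches all three
```

Note how the shorter prefix "dr5" pulls in JFK as well: exactly the precision-versus-unwanted-results tradeoff discussed above.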
2.4.3.2 How to Calculate Geo-Hash
Although we represent Geo-Hashes as Base32-encoded character strings, in
reality a Geo-Hash is a sequence of bits representing an increasingly granular sub-
partition of longitude and latitude. For example, 40.78 N is a latitude. It falls in the
upper half of the [-90.0, 90.0] range, so its first Geo-Hash bit is 1. 40.78 is in the lower
half of the range [0.0, 90.0], so its second bit is 0. The third bit is 1 because 40.78 falls
in the upper half of the third range [0.0, 45.0], and so on. We represent this binary value as a
Figure 2.16: Geo-Hash Calculation [32]
sequence of ASCII characters using Base32 encoding. So if the coordinate is ≥ the
midpoint, the bit is 1; otherwise, it is 0. This process is repeated, again cutting the
range in half and selecting a 1 or 0 depending on which half contains the target point.
This is done for both the longitude and latitude values. Then the bits of the two
dimensions are interleaved to create the hash.
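The bit-partitioning and Base32 encoding just described can be sketched as follows; this is an illustrative implementation of the standard Geo-Hash algorithm, not the code used later in this thesis:

```python
# Geo-Hash Base32 alphabet: digits plus letters, excluding a, i, l, o.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision=12):
    """Encode (lat, lon) as a Geo-Hash of `precision` characters by
    halving the ranges, interleaving lon/lat bits (lon first), and
    packing every 5 bits into one Base32 character."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    bits = []
    even = True  # the first bit comes from the longitude
    while len(bits) < precision * 5:
        rng = lon_range if even else lat_range
        val = lon if even else lat
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:          # upper half -> 1-bit, keep [mid, hi]
            bits.append(1)
            rng[0] = mid
        else:                   # lower half -> 0-bit, keep [lo, mid]
            bits.append(0)
            rng[1] = mid
        even = not even
    chars = []
    for i in range(0, len(bits), 5):
        idx = 0
        for b in bits[i:i + 5]:
            idx = (idx << 1) | b
        chars.append(BASE32[idx])
    return "".join(chars)

print(geohash_encode(42.605, -5.603, 5))  # → 'ezs42'
```

Truncation simply drops trailing bits, so a lower-precision hash is always a prefix of the higher-precision one computed for the same point.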
2.4.3.3 Objectives to Achieve in Geo-Hash Indexing
Although the Geo-Hash is an efficient way of converting multiple dimensions into a
single one, it comes with some complications. If the data is indexed using Geo-Hashes,
it has to be queried using Geo-Hash strings. As the size of the grid cells represented
by Geo-Hashes is fixed, the exact representation of an area is a challenge. With this in
mind, we define the following objectives for our index and query design.
1. Fewer Results: Representing an area with a single Geo-Hash might result in a
grid cell far larger than the actual area. If this grid cell is used to index the area,
we might get a lot of unwanted results which have to be filtered at the client.
We need to figure out a way to reduce the number of unwanted
results.
2. Fewer Scans: We can reduce the number of unwanted results by representing an
area with high-precision Geo-Hashes, but this means a lot of hashes are required to
cover a single area. For instance, if we have indexed a column with hashes of length
11 and we want to query it with a comparatively large area that requires 1000
hashes of length 11 to be represented, we would have to send 1000 different scan
requests, which is highly suboptimal. We need to figure out a way to solve this
problem.
3
Distributed Platforms for
Querying
This chapter provides an overview of existing distributed querying platforms. We cat-
egorize the platforms into three types. Section 3.1 describes well-known distributed
spatial platforms. In section 3.2, we briefly discuss existing distributed platforms for
online querying. Parallel Secondo, being the only distributed Moving Object Database,
is described in detail in section 3.3. As the thesis deals with the implementation of
Guting's algebra over HBase, section 3.4 discusses HBase in detail. We conclude this
chapter by describing some of the criteria we considered before selecting HBase for our
implementation.
3.1 Distributed Spatial Data Processing Platforms
3.1.1 Spatial-Hadoop
SpatialHadoop [33] is a MapReduce extension to Apache Hadoop developed at the
University of Minnesota. It has been designed specifically to work with spatial data and
can be used to analyze huge spatial datasets on a cluster of machines. It enables efficient
processing of spatial data by providing spatial data types to be used in MapReduce
jobs, including point, rectangle and polygon [34]. Spatial indexes such as the grid file,
R-tree and R+-tree can be built in HDFS. To efficiently read these indexes in MapReduce
jobs, InputFormats and RecordReaders are provided. Spatial operations are implemented
as MapReduce jobs which access these spatial indexes. It also allows developers to
implement custom spatial operations which can benefit from the spatial indexes. Spatial-
Hadoop comes bundled with Pigeon [35], a spatial extension to Pig which makes
querying easier and more intuitive. All operations in Pigeon are introduced as user-defined
functions (UDFs), which decouples it from users' existing deployments of Pig. The
spatial functionality of Pigeon is based on the ESRI Geometry API, a native Java open-
source library for spatial functionality licensed under the Apache Public License. Pigeon
uses the same function names as PostGIS to ease its use for existing PostGIS
users. To give a feel for the language, the following example computes the
union of all ZIP codes in each city:
zip_codes = LOAD 'zips' AS (zip, city, geom);
zip_by_city = GROUP zip_codes BY city;
zip_union = FOREACH zip_by_city
            GENERATE group AS city, ST_Union(geom);
3.1.2 Hadoop-GIS
Hadoop-GIS [36] is a scalable and high-performance spatial data warehousing system
for running large-scale spatial queries on Hadoop. Hadoop-GIS is based on MapReduce
and improves spatial query performance by using spatial partitioning. It has a cus-
tomizable spatial query engine called RESQUE, which implements operators and
measurement functions to provide geometric computations and implicitly parallel spatial
query execution on MapReduce. It also implements effective methods for amending
query results by handling boundary objects. Hadoop-GIS constructs a global
index based on its partitioning strategy and a customizable on-demand local spatial
index to achieve efficient query processing performance for local operations. To sup-
port declarative spatial queries, an integration with Hive has also been developed. The
architecture of Hadoop-GIS is shown in figure 3.1.
3.2 Distributed Online Querying Platforms
3.2.1 Cassandra
Apache Cassandra is a distributed key-value store developed at Facebook [37]. It
can handle very large amounts of data spread out across many commodity servers.
Figure 3.1: Hadoop-GIS Architecture [36]
Cassandra provides high availability without a single point of failure by replicating
data to servers across multiple data centers. It also provides the option of choosing
between synchronous and asynchronous replication for each update. Its elasticity
allows read and write throughput to increase linearly as new machines are added,
with no downtime or interruption to applications. Its architecture is a mixture of
Google's BigTable [11] and Amazon's Dynamo [38]. As in Amazon's Dynamo, every
node in the cluster has the same role, so there is no single point of failure, unlike HBase.
It resembles HBase in that the data model provides a structured key-value store
where columns are added only to specified keys, which means that different keys can
have different numbers of columns in any given family. Cassandra is a write-oriented
system, whereas HBase was designed for high performance on read-intensive workloads.
Cassandra can be queried using an SQL-style language called CQL (Cassandra Query
Language).
3.2.2 Stinger
Stinger is an improvement of Hive [39] (originally developed at Facebook). Hive provides
an SQL-like language that performs reasonably well for running data-warehouse-style
analytical queries on huge amounts of data; however, it is not suitable for online queries.
Stinger is a community-wide initiative to build interactive querying support into Hive
and claims performance improvements of up to 100x. Significant performance improve-
ments over the original Hive include the introduction of the ORCFile format, a new
query optimizer for complex query operations and a vectorized query engine.
3.2.3 HBase
HBase [40] is an open-source, distributed, column-oriented database system based on
Google's BigTable [11]. It runs on top of Apache Hadoop [42] and Apache ZooKeeper [41]
and uses the Hadoop Distributed Filesystem (HDFS) [42]. HDFS is an open-source
implementation of Google's file system GFS [43] which provides fault tolerance and
replication for the data stored on it. HBase is written in Java. It provides linear
and modular scalability, strictly consistent row-based data access, and automatic,
configurable sharding of data. HBase has limited support for full-fledged schema creation
but supports tables, columns and column families. As it is a key-value store, access is
based on the key only. HBase can be accessed either through its API for real-
time access or by using MapReduce jobs in Hadoop for analytical batch-processing
use cases. Column families can contain many columns. Each row may have a different
set of columns. The table cells are versioned and are stored as an uninterpreted array
of bytes.
3.2.4 Phoenix
Apache Phoenix [44] is a SQL layer over HBase. It is available as a client-embedded
JDBC driver which enables low-latency queries over HBase data in an SQL-like language.
It takes a SQL query, compiles it into a series of HBase scans, and orchestrates the
results to deliver regular JDBC result sets to the client. It stores the table metadata
in an HBase table in a versioned format; versioning enables snapshot queries over
prior versions to run using the correct schema automatically. Under the hood, it uses
the HBase API and, for efficiency, implements most of its functionality using
coprocessors and custom filters. This results in performance on the order of seconds
for big datasets.
3.3 Parallel Secondo
Parallel SECONDO [45] scales up the capability of processing extensible data models
in the SECONDO database system to a cluster of computers. It makes almost all the
operators of SECONDO available to be run in parallel on individual SECONDO nodes
by using the MapReduce framework. The drawback is that the queries have to be
written in the SECONDO executable language, which is non-intuitive and more complex
than SQL. On the other hand, using the executable language, the user can write parallel
queries without learning too many details about the underlying Hadoop platform.
3.3.1 Architecture
Parallel SECONDO couples the Hadoop framework with discrete SECONDO databases
deployed on the nodes of a cluster of computers, as shown in Figure 3.3. Its deployment
is flexible: it can be deployed either on a single computer or on a cluster. Both Hadoop
and Parallel SECONDO are deployed on the same cluster but can be used independently.
Hadoop uses HDFS (Hadoop Distributed File System) for data exchange, whereas
Parallel SECONDO uses a distributed file system called PSFS (Parallel SECONDO
File System), specially prepared for Parallel SECONDO. Each individually deployable
component in Hadoop is called a node; in Parallel SECONDO it is called a Data Server.
A Data Server contains a compact version of SECONDO called Mini-SECONDO and
its database, together with a PSFS node. A single cluster machine can contain many
Data Servers. This increases performance on machines with multiple hard disks: a
Data Server can be deployed on each disk for higher throughput. When parallel queries
are processed, the MapReduce framework is used, which means HDFS is used to assign
tasks to Data Servers; however, most intermediate data is exchanged through PSFS.
Parallel SECONDO contains a master Data Server and many slave Data Servers. The
entry point to the system is only through the master. Parallel SECONDO comes with
a number of PQC (Parallel Query Converter) operators, also called Hadoop operators,
which convert a parallel query into a Hadoop job with various tasks to be executed at
the slave Data Servers. Data Servers process these tasks in parallel inside the Hadoop
framework. The master database stores the metadata of the whole system.
3.3.2 Parallel Query Execution
To understand the system better, let's see how parallel queries are written in the
SECONDO executable language. The SECONDO executable language is more complex
than SQL but allows easier querying than writing MapReduce jobs.
Figure 3.2: PS-Matrix [45]
3.3.2.1 PS-Matrix
Parallel SECONDO uses the concept of a PS-Matrix to distribute data over the cluster,
as shown in Figure 3.2. A SECONDO object is partitioned using two functions, d(x)
and d(y). d(x) divides the data into R rows. These rows are distributed over the
cluster; a single Data Server can contain more than one row. After distribution, d(y)
is used to divide each row into C columns. Hence a PS-Matrix is composed of R×C
pieces.
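The partitioning can be illustrated with a toy sketch; the function ps_matrix and the modulo mapping of d(x) and d(y) onto row and column indices are assumptions made for this example, not Parallel SECONDO's actual implementation:

```python
def ps_matrix(tuples, R, C, dx, dy):
    """Partition tuples into an R x C PS-Matrix: d(x) picks the row,
    d(y) picks the column (both taken modulo the matrix dimensions)."""
    matrix = [[[] for _ in range(C)] for _ in range(R)]
    for t in tuples:
        matrix[dx(t) % R][dy(t) % C].append(t)
    return matrix

# Partition 0..5 into a 2x2 PS-Matrix: rows by value, columns by value // 2.
m = ps_matrix(range(6), 2, 2, lambda t: t, lambda t: t // 2)
print(m)
# → [[[0, 4], [2]], [[1, 5], [3]]]
```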
3.3.2.2 Distributed Data Types
To represent the PS-Matrix, Parallel SECONDO provides a data type flist. It wraps
existing SECONDO objects and makes them distributable by Parallel SECONDO.
After division of an object into a PS-Matrix, the piece data belonging to the flist is
distributed and kept in the slave Data Servers, but the partition scheme is kept in the
master SECONDO as an flist object.
An flist object can be distributed to slave Data Servers in two ways: either the piece
data is stored in the Mini-SECONDO databases belonging to the Data Servers as objects,
or it is stored as files in the PSFS node of the Data Servers. Hence, there are two kinds
of flist objects in Parallel SECONDO.
1. Distributed Local Objects (DLO): In a DLO, large SECONDO objects
are divided into an N×1 PS-Matrix. Each row of this matrix is saved in a slave Mini-
SECONDO database as a SECONDO object, called a sub-object. The sub-objects
that belong to the same flist have the same name in different slave databases.
Theoretically, a DLO flist can wrap all available SECONDO data types.
2. Distributed Local Files (DLF): In a DLF, the data is also divided into an
R×C PS-Matrix, but the difference with DLO is that each piece is saved as a
PSFS file, called a sub-file. During parallel operations, sub-files can be exchanged
between Data Servers. At present, only relations can be saved as sub-files.
An flist object can wrap any kind of SECONDO object, but this is not always optimal.
For example, objects of very small size should not be stored as an flist; rather, they
should be duplicated to the various nodes. The process of duplication is good for small
objects but is very heavy for larger objects, because the objects contain data
Figure 3.3: Parallel Secondo Infrastructure [46]
Figure 3.4: Some examples of flist data type
in nested-list format. If a relation has millions of tuples, there is a lot of overhead
involved in transforming it. Therefore, Parallel SECONDO introduces a new data kind
called DELIVERABLE. Data types belonging to this kind can be duplicated to the
slaves at runtime and can hence be used in parallel queries.
3.3.2.3 Distributed Operators
There are three types of operators specifically designed for Parallel SECONDO: flow
operators, assist operators and Hadoop operators. Flow operators are responsible for
the distribution and collection of objects to and from the slave nodes; in other words,
they connect sequential queries with parallel queries. Two operators of this type, spread
and collect, are explained in the following sections. Assist operators are designed to
work with Hadoop operators: they enable flist objects to be used with normal operators.
Hadoop operators are based on either the Map or the Reduce phase of a Hadoop job.
They allow running sequential SECONDO queries at the slave nodes of Parallel
SECONDO. Explaining each of the operators is beyond the scope of this thesis, but a
few important distributed operators deserve an informal explanation, which will help
to understand Parallel SECONDO better.
1. Spread Operator: The spread operator partitions a SECONDO relation into
a PS-Matrix, distributes the pieces over the cluster, and returns a DLF flist. It
divides a relation into the rows of the PS-Matrix according to a partition attribute
AI. Each row can be further partitioned into columns if another partition attribute
AJ is provided. As it returns a DLF object, each piece of the relation is exported
as a sub-file. The following is an example of the use of the spread operator. The
ffeed operator ("file feed") reads the file 'QueryPoints' from the /home/fmorakzai/data
directory and passes it on to the spread operator. The spread operator partitions
the data into 4 files, each stored on one of the 4 slave SECONDO nodes. When
the data has been stored in the sub-files on each node, a reference to them is
returned to the master node, which stores the metadata as a DLF object.
let QueryPoints_p = "QueryPoints" ffeed['/home/fmorakzai/data';;]
spread[; Id, 4, TRUE;];
33
3. DISTRIBUTED PLATFORMS FOR QUERYING
2. Collect Operator: The collect operator performs the opposite function
to that of the spread operator. It takes as input a DLF-kind flist object,
collects the constituent sub-files distributed over the cluster, and returns a stream
of tuples from the sub-files. The following example explains its operation. The
collect operator reads the QueryPoints_p flist object that was created in the
previous example from the slave nodes and combines the sub-files at the master
into a single tuple stream. This stream is then passed to the count operator,
which counts the number of tuples and returns the result.
query QueryPoints_p collect[] count;
3. Para Operator: An flist can wrap all available SECONDO data types and
work with various SECONDO operators. However, not all operators can recognize
this new data type or know how to process it. To solve this, Parallel
SECONDO implements the para operator, which unwraps flist objects and returns
their embedded data types. After this, the object can pass the type checking of
existing operators.
4. HadoopMap Operator: hadoopMap creates an flist object of either DLO or
DLF kind after the provided sequential operators have been processed by the slaves
in parallel during the map step of the template Hadoop job. The operators provided
as an argument are not evaluated on the master node, but delivered to and processed
on the slaves. Let's take the creation of a distributed B-Tree as an example. The
original sequential query for the creation of a B-Tree index looks like:
let singleTable_btree = singleTable createbtree[Licence];
The singleTable exists at the master and the index is also created at the master,
just like a normal SECONDO B-Tree index. To create a distributed index, we first
need to distribute/partition the singleTable to the slave nodes and store it in the
slave SECONDO databases as a table. The following query does that: the spread
operator distributes the data to the slave nodes as sub-files based on the ID attribute
of the single table, and the hadoopMap operator accepts the DLF object created by
the spread operator and runs the consume operator at each node. The consume
operator stores the sub-files as a table in the slave SECONDO databases, thus
creating a DLO.
let distributedTable = singleTable feed
spread[;ID, 10, TRUE;]
hadoopMap[; . consume];
After running this query, singleTable is distributed over the slave SECONDO nodes
and is represented by the DLO distributedTable. We can create a local B-Tree
for this table at each slave node by passing the table name and the B-Tree creation
function to the hadoopMap operator as follows.
let distributedTable_btree = distributedTable
hadoopMap[; . createbtree[ID]];
This will create local B-Tree indexes and return a reference distributedTable_btree
as a DLO.
3.4 HBase
HBase has been described briefly in section 3.2.3. This section discusses the architecture
of HBase in more detail. HBase is based on the concept of Log-Structured Merge-
Trees (LSM-trees) [47], just like Google's BigTable [11]. To understand the functioning
of HBase, it is necessary to first understand how LSM-trees work.
3.4.1 Log-Structured Merge-Trees
Log-structured merge-trees [47], also known as LSM-trees, store the incoming data
in a logfile first, completely sequentially. Once the log has the modification saved, it
then updates an in-memory store that holds the most recent updates for fast lookup.
When the system has accumulated enough updates and starts to fill up the in-memory
store, it flushes the sorted list of key→record pairs to disk, creating a new store file.
At this point, the updates to the log can be thrown away, as all modifications have
been persisted. The store files are arranged similar to B-trees, but are optimized for
sequential disk access where all nodes are completely filled and stored as either single-
page or multipage blocks. Updating the store files is done in a rolling merge fashion,
that is, the system packs existing on-disk multipage blocks together with the flushed
in-memory data until the block reaches its full capacity, at which point a new one is
started.
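As an illustration, the write path just described can be sketched in a few lines of Python. This is a minimal model for exposition only; the class and attribute names are illustrative and are not HBase's actual classes.

```python
# Minimal sketch of the LSM-tree write path: a modification is appended to a
# sequential log first, then applied to a sorted in-memory store; once the
# in-memory store fills up, it is flushed to an immutable, sorted store file.

class LSMStore:
    def __init__(self, memstore_limit=4):
        self.log = []            # write-ahead log: purely sequential appends
        self.memstore = {}       # most recent updates, sorted on flush
        self.store_files = []    # "on-disk" files: sorted (key, value) lists
        self.memstore_limit = memstore_limit

    def put(self, key, value):
        self.log.append((key, value))   # 1. persist the modification in the log
        self.memstore[key] = value      # 2. update the in-memory store
        if len(self.memstore) >= self.memstore_limit:
            self.flush()

    def flush(self):
        # Write the sorted key->record pairs as a new store file; the log
        # entries covered by this flush can now be thrown away.
        self.store_files.append(sorted(self.memstore.items()))
        self.memstore.clear()
        self.log.clear()
```

Note how the flush converts many random writes into one sequential write of a fully sorted file, which is the core of the LSM-tree's write performance.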
Figure 3.5: Multipage blocks iteratively merged across LSM-trees
Figure 3.5 shows how a multipage block is merged from the in-memory tree into the
next on-disk tree. Merging writes out a new block with the combined result. Eventually,
the trees are merged into the larger blocks. As more flushes are taking place over time,
creating many store files, a background process aggregates the files into larger ones so
that disk seeks are limited to only a few store files. The on-disk tree can also be split
into separate trees to spread updates across multiple store files. All of the stores are
always sorted by key, so no reordering is required to fit new keys in between existing
ones. Lookups are done in a merging fashion in which the in-memory store is searched
first, and then the on-disk store files are searched next. That way, all the stored data,
no matter where it currently resides, forms a consistent view from a client's perspective.
Deletes are a special case of update wherein a delete marker is stored and is used
during the lookup to skip deleted keys. When the pages are rewritten asynchronously,
the delete markers and the key they mask are eventually dropped. An additional
feature of the background processing for housekeeping is the ability to support predicate
deletions. These are triggered by setting a time-to-live (TTL) value that retires entries,
for example, after 20 days. The merge processes will check the predicate and, if true,
drop the record from the rewritten blocks.
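The merging lookup, delete markers, and TTL-based predicate deletes described above can be sketched as follows. This is a simplified model, not HBase's implementation; entries carry an explicit write time here so the TTL predicate can be checked.

```python
import time

TOMBSTONE = object()  # delete marker stored in place of a normal value

def merged_get(key, memstore, store_files, ttl=None, now=None):
    """Search the in-memory store first, then store files newest-first.
    Each entry is (value, write_time); TOMBSTONE markers mask deleted keys,
    and TTL-expired entries are skipped as if dropped by compaction."""
    now = time.time() if now is None else now
    sources = [memstore] + list(reversed(store_files))
    for source in sources:
        if key in source:
            value, written = source[key]
            if value is TOMBSTONE:
                return None                  # masked by a delete marker
            if ttl is not None and now - written > ttl:
                return None                  # retired by the TTL predicate
            return value
    return None
```

Because the sources are searched from newest to oldest, the first match always reflects the most recent modification, giving the consistent client view described above.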
LSM-trees work at disk transfer rates and scale much better to handle large amounts
of data. They also guarantee a very consistent insert rate, as they transform random
writes into sequential writes using the logfile plus in-memory store. The reads are
independent from the writes, so we get no contention between these two operations.
The stored data is always in an optimized layout. So, we have a predictable and
consistent boundary on the number of disk seeks to access a key, and reading any
number of records following that key doesn't incur any extra seeks. In general, what
could be emphasized about an LSM-tree-based system is cost transparency: we know
Figure 3.6: HBase Architecture
that if we have five storage files, access will take a maximum of five disk seeks, whereas
we have no way to determine the number of disk seeks an RDBMS query will take,
even if it is indexed.
The next sections will explain the storage architecture.
3.4.2 Architecture
Figure 3.6 shows an overview of how HBase and HDFS are combined to store data.
The figure shows that HBase handles basically two kinds of file types: one is used for
the write-ahead log and the other for the actual data storage. The files are primarily
handled by the HRegionServers. In certain cases, the HMaster will also have to perform
low-level file operations. The actual files are divided into blocks when stored within
HDFS.
HBase consists of one master and many slave nodes. The slave nodes are called
Region Servers because they contain regions of tables. A table is partitioned based
on its key, and each partition (called a region) is assigned to a Region Server,
which ensures efficient read and write access to the regions it is responsible for.
A region of a table is called an HRegion. A Region Server can host more than one
HRegion belonging to one or more tables. To store an HRegion, the Region Server
uses an LSM-tree as explained before. The actual storage file on disk is
called an HFile, whereas the storage in memory is known as the MemStore. When the
MemStore gets full, it is flushed to disk as an HFile. Thus, a single HRegion can be
stored in multiple HFiles on disk. To improve disk access, these files are later merged
into a single file through a process called compaction. Every Region Server maintains
a Write-Ahead Log (WAL) known as the HLog. Updates to any of the regions it
maintains are first written to the WAL and then stored in the corresponding MemStore.
The WAL (HLog) as well as all HFiles are stored on HDFS for the purpose of
replication and fault-tolerance. Each Region Server has a DFS Client which communi-
cates with HDFS. HDFS DataNode and the Region Server can both exist on the same
machine thus preventing writes or reads over the network.
3.4.3 Write Process
When a client issues a write request, it is routed to the HRegionServer, which hands
the details to the matching HRegion instance. The first step is to write the data to
the write ahead log (the WAL), represented by the HLog class. The WAL is a standard
Hadoop SequenceFile and it stores HLogKey instances. These keys contain a sequential
number as well as the actual data and are used to replay not-yet-persisted data after
a server crash. Once the data is written to the WAL, it is placed in the MemStore. At
the same time, it is checked to see if the MemStore is full and, if so, a flush to disk is
requested. The request is served by a separate thread in the HRegionServer, which
writes the data to a new HFile located in HDFS. It also saves the last written sequence
number so that the system knows what was persisted so far.
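The role of the sequence numbers in crash recovery can be sketched as follows. The function name and entry layout are illustrative, not HBase's real API; the point is that only edits newer than the last flushed sequence number need to be replayed.

```python
# Sketch of crash recovery from the WAL: each log entry carries a sequence
# number (as HLogKey does). After a crash, only entries whose sequence number
# is greater than the last persisted one are replayed into the MemStore.

def replay_wal(wal_entries, last_flushed_seq):
    """wal_entries: list of (seq, key, value) in log order.
    Returns the rebuilt MemStore containing only not-yet-persisted edits."""
    memstore = {}
    for seq, key, value in wal_entries:
        if seq > last_flushed_seq:
            memstore[key] = value   # later entries overwrite earlier ones
    return memstore
```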
3.4.4 Read Process
When a client issues a read request, it is routed to the HRegionServer, which hands
the details to the matching HRegion instance. As the request is always based on the
key, the key is searched in the MemStore. If found, the KeyValue instance is returned.
If the key cannot be found in the MemStore, it is searched for in the existing HFiles,
as more than one HFile may exist. To optimize the search, each HFile contains a
Bloom Filter which can tell with 100% confidence if the file does not contain the key.
If the Bloom Filter suggests that the file contains the key, there is high probability of
its existence. Based on the suggestion of Bloom Filter, the key is searched in the file.
Figure 3.7: An HBase table with two column families [48]
HFile is stored as B-Tree which greatly enhances the search efficiency. If the key is
found, a KeyValue instance is returned.
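The "100% confidence on absence" property is what makes the Bloom filter safe as a pre-check. The following toy sketch illustrates it; HBase's actual implementation uses bit-packed arrays and different hash functions, so treat this purely as an illustration of the principle.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: a 'no' answer is definitive (no false negatives),
    while a 'yes' answer only means the key is probably present."""

    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes = size, hashes
        self.bits = [False] * size

    def _positions(self, key):
        # Derive several bit positions from salted hashes of the key.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # If any bit is unset, the key was definitely never added.
        return all(self.bits[pos] for pos in self._positions(key))
```

A read path can therefore skip an HFile whenever `might_contain` returns False, and only pay the B-Tree search when it returns True.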
3.4.5 Data Model
HBase's data model is very different from what you have likely worked with or know of
in relational databases. As described in the original Bigtable paper [11], it is a sparse,
distributed, persistent multidimensional sorted map, which is indexed by a row key,
column key, and a timestamp. The easiest and most naive way to describe HBase's
data model is in the form of tables consisting of rows and columns. The concepts of
rows and columns are slightly different from those in RDBMSs. We define some of the
concepts first:
1. Table: HBase organizes data into tables. Table names are Strings and are composed
of characters that are safe for use in a file system path.
2. Row: Within a table, data is stored according to its row. Rows are identified
uniquely by their row key. Row keys do not have a data type and are always
treated as a byte[] (byte array).
3. Column Family: Data within a row is grouped by column family. Column
families also impact the physical arrangement of data stored in HBase. For this
reason, they must be defined up front and are not easily modified. Every row in a
table has the same column families, although a row need not store data in all its
families. Column families are Strings and composed of characters that are safe
for use in a file system path.
4. Column Qualifier: Data within a column family is addressed via its column
qualifier, or simply, column. Column qualifiers need not be specified in advance.
Column qualifiers need not be consistent between rows. Like row keys, column
qualifiers do not have a data type and are always treated as a byte[].
5. Cell: A combination of row key, column family, and column qualifier uniquely
identifies a cell. The data stored in a cell is referred to as that cell's value. Values
also do not have a data type and are always treated as a byte[].
6. Timestamp: Values within a cell are versioned. Versions are identified by their
version number, which by default is the timestamp of when the cell was written.
If a timestamp is not specified during a write, the current timestamp is used. If
the timestamp is not specified for a read, the latest one is returned. The number
of cell value versions retained by HBase is configured for each column family. The
default number of cell versions is three.
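The six concepts above can be summarized as a nested sorted map. The following minimal Python model is illustrative only (it is not an HBase client API), but it captures the cell addressing and the default versioning behaviour described here.

```python
# Sketch of the Bigtable/HBase data model:
# (row key, column family, column qualifier) -> {timestamp: value}.
# A read without an explicit timestamp returns the latest version.

class SparseTable:
    def __init__(self):
        self.cells = {}   # (row, family, qualifier) -> {timestamp: value}

    def put(self, row, family, qualifier, value, ts):
        self.cells.setdefault((row, family, qualifier), {})[ts] = value

    def get(self, row, family, qualifier, ts=None):
        versions = self.cells.get((row, family, qualifier), {})
        if not versions:
            return None
        if ts is None:                 # no timestamp given: latest version wins
            ts = max(versions)
        return versions.get(ts)
```

The map is sparse in exactly the sense described: a row stores nothing for families or qualifiers it does not use.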
3.5 Choice of Platform for Guting’s Algebra
A number of NoSQL databases are available in the market each with a particular set
of features. Deciding about the platform of choice is not an easy task. Before going for
HBase, we outlined some criteria which we considered important for the platform for
implementation of the algebra. These criteria are described in the following.
3.5.1 Schema Design
One of the features we were looking for, was the ability to design flexible schemas.
Although almost all of the online NoSQL platforms are Key-Value stores, some of them
provide an abstraction over physical storage which allows users to design schemas. This
greatly enhances the usability of the system and makes it intuitive. It also makes it
easier for people from RDBMS community to migrate to the system. All of the systems
under discussion hold schema design capabilities.
3.5.2 Indexing
As the objective of the thesis is to have online querying support for trajectories or
moving objects, indexing is an important criterion. Moving Objects Database queries
are normally multi-dimensional, and a distributed system with indexing support performs
better for such queries. The system should not only provide indexing support but should
also allow the development of custom indexes. This is important because different types of
applications require different kinds of indexes that perform efficiently for that particular
use-case. All of the systems under discussion possess indexing capabilities. Indexing in
Cassandra is limited as it only helps if the column to be indexed has low cardinality.
We also require the platform to have the support of multi-dimensional indexing. By
this we mean that it should allow indexes on different columns of the same table.
3.5.3 Partitioning Control
Trajectory data is difficult to process when the data grows bigger especially when it
is distributed over a cluster of computers. Imagine a trajectory X stored at node
A. If a new point that is part of trajectory X, arrives and is stored at node B, the
trajectory becomes distributed which makes it difficult to process. A single operation
on trajectory X will require other parts of this trajectory to be retrieved over the
network which is very costly. Some queries require data belonging to a certain time
period to be processed together. In that case the data should be partitioned by time.
Keeping this in view, the system should give us the control to partition the data that
suits us best for the implementation of Guting’s algebra.
3.5.4 Co-location
Partitioning of data allows us to have similar data on a single node, but this is not
enough: it protects us from network latencies but not from disk seeks. Keeping in
view the nature of queries based on Guting's algebra, we believe that co-location of
data is crucial. If we have an area query asking for all the points inside a region
fulfilling a certain criterion, we want all those points to be located together on disk.
If the data is co-located, we can access all those points with a single seek and then
operate at disk transfer rate. We can get co-location of data in HBase by carefully
designing its key; in Cassandra, however, it is difficult to achieve, as Cassandra hashes
the key and does not guarantee co-location on disk.
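The effect of key design on co-location can be illustrated with a small sketch. The key layout below (a fixed-width trajectory id followed by a timestamp) is a hypothetical example, not the schema used later in the thesis; it only demonstrates why order-preserving keys keep related rows adjacent while hashed keys scatter them.

```python
import hashlib

def composite_key(trajectory_id, timestamp):
    """Order-preserving key: all points of one trajectory sort together,
    ordered by time, so a single seek + sequential scan retrieves them."""
    return f"{trajectory_id:08d}{timestamp:013d}"

def hashed_key(trajectory_id, timestamp):
    """Cassandra-style hashed key: lexicographic order no longer follows
    trajectory or time, so a trajectory's points scatter over the keyspace."""
    return hashlib.md5(f"{trajectory_id}:{timestamp}".encode()).hexdigest()

points = [(7, 1000), (7, 2000), (8, 1500), (7, 3000)]
colocated = sorted(composite_key(t, ts) for t, ts in points)
# All three keys of trajectory 7 are adjacent in the sorted key order:
assert [k[:8] for k in colocated] == ["00000007"] * 3 + ["00000008"]
```

Since HBase stores rows sorted by key, the `composite_key` layout maps directly onto physically contiguous storage within a region.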
3.5.5 Scan Performance
Most of the queries that we can think of using Guting's algebra are range queries or
require a scan, e.g. "Give me the license details of all taxis within 3 km of Berlin Hbf
right now". The platform should give reasonable scan performance. Apache Cassandra
is good at point queries; Apache HBase, on the other hand, gives reasonable scan
performance because of the way it sorts and stores the data co-located based on its key.
3.5.6 Transactions
As our aim is to have the algebra implemented for on-line querying, the system is
expected to be deployed in production and to accept a lot of live input. This means
that there will be a lot of inserts and updates. We mention updates because of the way
Guting's mpoint data type is stored using the object-based approach, i.e. in a single
row: the row has to be updated each time a new point arrives that belongs to the
trajectory. This motivates the support of transactions. HBase provides row-level
consistency and transactional support, whereas Apache Cassandra does not.
Apache Stinger, or in other words Apache Hive, does not support insertions at all.
3.5.7 Latency
Our objective is to have Guting's algebra in an on-line environment, which means that
whatever platform we choose for our implementation should have low query processing
latency. Apache HBase and Cassandra are both good at this. Although Project Stinger
improved the performance of Apache Hive many times over, it is still a platform for
offline analytical queries.
3.6 Summary
In this chapter we discussed various open-source distributed platforms for query pro-
cessing that we could use for the implementation of Guting's algebra and that could give
comparable performance to Parallel SECONDO, which to our knowledge is the only
parallel moving object database. We explained the criteria for our choice of the right
Ser. Feature Stinger Cassandra HBase
1. Index partial partial yes
2. Partitioning Control no no yes
3. Co-location no no yes
4. Scan Performance yes no yes
5. Transactions no no yes
6. Low Latency no yes yes
Table 3.1: Platform Selection Criteria
platform. Table 3.1 summarizes the discussed platforms’ capabilities according to our
shortlisted criteria. After this comparison, we chose Apache HBase as our platform.
4 Algebra Implementation
This chapter presents the implementation of Guting's algebra over HBase. In section 4.1
we discuss the motivation behind the use of Apache Phoenix instead of raw
HBase. Section 4.2 discusses the choice of data structure for implementation of data-
types. Section 4.3 presents the data structures of various types implemented in Apache
Phoenix. As the operators have already been explained in chapter 2, we conclude this
chapter by summarizing some points regarding the implementation of operators.
4.1 Motivation behind the use of Apache Phoenix
Initial work on the thesis was done on raw HBase. Although this approach could lead
to better performance, the following drawbacks led to the choice of Apache Phoenix.
1. The schema design could be highly optimized for a particular dataset (BerlinMOD
for this thesis), but there was no generalized way of describing it, which means
that the same effort would have to be repeated for every new dataset.
2. Querying support for the moving object data was provided in the form of a Java
API. Operators like "feed" and "filter" were provided, but the user had to
know the storage structure and the schema design intricacies to write an optimal
query.
3. No SQL interface was available, which means no automatic optimization was
possible.
4. For optimization purposes, co-processors were used, which run in the HBase Region
Server process space. If the code crashed, it would take down the region server
with it. Using such an algebra implementation requires a lot of trust from the user.
4.2 Implementation Approaches
Keeping in view the above drawbacks, Apache Phoenix was chosen to implement the
algebra. At the time of writing of this thesis, Phoenix did not support the definition of
custom data types. To overcome this, the following approaches were considered.
4.2.1 Use of Struct
Most commercial databases support a STRUCT data type, which allows users to define
structures of their own and store custom objects. Support for the STRUCT data type is
work in progress in Phoenix, which led to the consideration of the following approaches.
4.2.2 Binary Objects
Use of binary objects is another way of defining custom data types. Phoenix supports
a binary data type in which any custom object/data can be stored. This approach was
avoided because data in binary form is hard to handle, especially during debugging
of operators, loading scripts, and the data types themselves. Also, decoding the binary
data each time an operation has to be performed is costly, especially with a huge
number of rows.
4.2.3 Data Type Flattening
Another approach for the implementation of data types is flattening custom data types
to the natively supported data types of the system. This means that all custom data
types are flattened to arrays of native types. This approach has two benefits. First,
it does not require a decoding process, and operations can be performed on the native
data types. Second, it makes the development process simple, as the data is easily
readable while debugging.
4.3 Data Structures
4.3.1 Spatial Data Types
For the implementation of Guting’s data-types, we used the flattening approach. In
the following, the representation of various implemented data types is shown.
4.3.1.1 Point
float[3]-->{1,x,y}
A point is represented by an array of 3 float numbers. The first element, "1", denotes
the type code, and the next two numbers represent the x and y coordinates of the point.
The type code is used by the operators to identify the type of object passed to them.
4.3.1.2 Points
float[2n+6]-->{2,bb{xmin,ymin,xmax,ymax},numPoints,
x1,y1,x2,y2,...xn,yn}
The Points data type is represented by an array of type float. The first element of the
array represents the type code. The next 4 elements represent the bounding
box covering all points. The bounding box helps in indexing the data type; it also
prevents the need to parse all objects when applying an operator, since only those
objects are parsed whose bounding box intersects with the area of interest.
The next element represents the number of points n contained in the array.
The points themselves are stored starting at index 6 up to index 6 + 2n - 1 (0-based).
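A sketch of this flattening for Point and Points, following the layouts given above. The helper names are illustrative (they are not Phoenix functions), and 0-based array indices are assumed.

```python
# Flattened representations: Point -> {1, x, y};
# Points -> {2, bbox(xmin, ymin, xmax, ymax), numPoints, x1, y1, ..., xn, yn}.

def encode_point(x, y):
    return [1.0, x, y]                       # leading 1.0 is the type code

def encode_points(pts):
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    bbox = [min(xs), min(ys), max(xs), max(ys)]
    flat = [c for p in pts for c in p]       # x1, y1, x2, y2, ...
    return [2.0] + bbox + [float(len(pts))] + flat

def decode_points(arr):
    assert arr[0] == 2.0, "type check via the leading type code"
    n = int(arr[5])                          # numPoints sits after the bbox
    coords = arr[6:6 + 2 * n]
    return [(coords[i], coords[i + 1]) for i in range(0, 2 * n, 2)]
```

Because the bounding box occupies fixed slots near the front of the array, an operator can inspect `arr[1:5]` and discard irrelevant objects without ever running the full decoder.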
4.3.1.3 Line
float[n]-->{3,bb{xmin,ymin,xmax,ymax},numPoints,length,
x1,y1,x2,y2,...xn,yn}
Line data type is represented by a float array. The first element of the array represents
the type code. Next four elements represent the bounding box for the line. The
bounding box is stored to make spatial queries faster as the lines can be filtered just by
comparing the bounding box with query parameter instead of comparing all the points
inside the line with the query parameter. The next element represents the number of
points contained in the line. The next element is the length of the line. This parameter
is put here to improve the performance of queries based on line length. After that the
points of the line are stored in the array. If there are n points in the line, the size of
the array will be 2n+ 7.
4.3.1.4 DLine
float[n]-->{4,bb{xmin,ymin,xmax,ymax},numLines,length,
numPts,x1,y1,x2,y2,...xn,yn,
numPts,x1,y1,x2,y2,...xn,yn,
numPts,...}
The DLine data type represents a line with disconnections or breaks. In Guting's
data types, Line can represent a disconnected line, but for the sake of compatibility
with other spatial libraries, Line has been used here to represent a connected line,
whereas DLine is used to represent a line with breaks. DLine is represented as a float
array. The first element represents the type code. The next four elements represent
the bounding box of the DLine. The next element represents the number of connected
lines in the data type, after which the total length of the DLine is stored. From here
on, the data of the individual lines is stored. Each connected line starts with an
element giving the number of points in the line. This tells the decoder how many
elements of the array to scan next: after twice this number of elements, the next line
begins.
4.3.1.5 Region
float[n]-->{5,bb{xmin,ymin,xmax,ymax},numFaces,
numPts,x1,y1,x2,y2,...xn,yn,
numPts,x1,y1,x2,y2,...xn,yn,...}
The Region data type represents an area/region. The difference between this and the
conventional Region type used in the geo-spatial domain is that this data type can
represent a disconnected region. A disconnected region consists of more than one
region, which may or may not share common area; in simple words, it is a collection
of regions. This data type is represented by a float array whose first element is the
type code. The next four floats represent the bounding box of the area covered by all
regions combined. This helps in faster query processing, as the operator does not need
to parse the whole
Ser. Data Type Array Type   Array Structure
1    UInt      long[3]      {11,int,time}
2    UBool     long[2]      {12,+-time}
3    UReal     double[3]    {13,double,time}
4    UString   varchar[n+3] {14,size,varchar,time}
Table 4.1: Unit Data Types
Ser. Data Type Array Type Array Structure
1    UPoint    float[4]   {21,x,y,time}
2    UPoints   float[n]   {22,Points,time}
3    ULine     float[n]   {23,Line,time}
4    URegion   float[n]   {24,Region,time}
Table 4.2: Spatial Unit Data Types
data type to check whether it lies in the area of interest. The next float represents the
total number of disconnected regions the object contains. From here on, the individual
regions are represented. Each region starts with numPts, which is the number of points
on this region's boundary. numPts and numFaces are required for parsing: using this
information, the decoder of an operator knows where a region starts and ends in
the float array.
4.3.2 Basic Unit Data Types
As explained before, a data type can be converted to a unit type by adding a time
attribute to it. Table 4.1 shows the data structures for implemented basic unit data
types. All of the following types are implemented as arrays.
4.3.3 Spatial Unit Data Types
The data structures of the spatial data types have been explained before. Spatial unit
data types add a time attribute to them. Table 4.2 shows the data structure of the
implemented spatial unit data types. The spatial data types embedded in spatial unit
data types do not contain their type code.
Ser. Data Type Array Type Array Structure
1    RINT      int[n]     {31,numComponents,min,max,s,e,s,e...}
2    RBOOL     int[n]     {32,numComponents,min,max,s,e,s,e...}
3    RREAL     double[n]  {33,numComponents,min,max,s,e,s,e...}
Table 4.3: Basic Range Data Types
4.3.4 Basic Range Data Types
Basic range data types represent ranges of basic types, e.g. int, bool and real. Table 4.3
shows the structure of these types. Each data type starts with the type code, followed
by the number of ranges it contains. min and max denote the bounds of all the ranges
stored in the object of the data type. Each range is represented by a start value s and
an end value e.
4.3.5 Temporal Range Data Types
Periods is the temporal range data type. It is represented by an array of long with
following data structure:
long[n]-->{36,numComponents,min,max,s,e,lc,rc,s,e,lc,rc...}
Just like the other data types, Periods starts with the type code. The next number,
numComponents, denotes the number of periods the object contains. min and max denote
the minimum and maximum timestamps that bound all of the periods.
4.3.6 Basic Temporal Data Types
By basic temporal data types we mean the moving versions of the basic types, e.g.
MInt and MBool. The following is the flattened representation of the basic moving int
type. For MBool and MReal, int is replaced by the corresponding basic data type.
float[n]-->{5*,min,max,no-components,periods(deftime),
{numPoints,n1,int,t1,int,t2.....int,tn1},
{numPoints,n2,int,t1,int,t2....,int,tn2}...}
The first element of the array denotes the data type code. This is used by the operators
for type checking and for selecting the corresponding decoder. min and
Ser. Data Type Array Type Type Code
1 MInt float[n] 51
2 MBool float[n] 52
3 MReal float[n] 53
4 MString varchar[n] 54
Table 4.4: Basic Temporal Data Types
max represent the bounds of all the objects contained in the type. The no-components
variable shows the number of connected components the data type object holds. The
periods element contains the temporal boundaries of each object it holds. After this
element, all the individual connected basic moving objects are represented. Table 4.4
shows the array type and type code of each basic moving type.
4.3.7 Spatio-Temporal Data Types
Spatio-temporal data types are also called moving spatial types and include MPoint,
MPoints, MLine and MRegion. During the course of the thesis, only MPoint has been
implemented, as this is the type with the most real-world use-cases. This also means
that the implemented operators only support MPoint in their operations. It is not
difficult to extend the operators to also support the other moving spatial types. The
following section explains the flattened representation of MPoint.
4.3.7.1 MPoint
float[n]-->{61,bbox,no-components,periods(deftime),
{n1,x1,y1,t1,x2,y2,t2.....xn1,yn1,tn1},
{n2,x1,y1,t1....,xn2,yn2,tn2}...}
The MPoint data type is represented by a float array. The first element is the type
code. The next four elements represent the bounding box covering the whole MPoint.
The following element represents the number of components in the MPoint: as an
MPoint can have breaks, this number is the number of connected MPoints. The next
few elements represent the Periods during which the MPoint is defined. After the
Periods, the connected MPoints are stored. The first element of a connected MPoint
denotes its number of points, and the next elements represent the points themselves.
Each point is represented by a set of 3 numbers {x, y, t}, where t is the time instant
and x and y are the coordinates of the object at that instant. This is followed by the
next connected MPoint, and so on.
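As an example of how such a representation is used, the following sketch evaluates an MPoint at a time instant. It assumes the flat array has already been decoded into its connected components (lists of (x, y, t) samples) and that the position moves linearly between consecutive samples; the linear interpolation is an assumption made for this illustration, and the function name is not part of the implemented operator set.

```python
# Evaluate a decoded MPoint at an instant t. components: list of connected
# components, each a time-ordered list of (x, y, t) samples.

def at_instant(components, t):
    """Return the interpolated (x, y) at instant t, or None if t falls in a
    gap between connected components (where the MPoint is undefined)."""
    for comp in components:
        for (x1, y1, t1), (x2, y2, t2) in zip(comp, comp[1:]):
            if t1 <= t <= t2:
                f = 0.0 if t2 == t1 else (t - t1) / (t2 - t1)
                return (x1 + f * (x2 - x1), y1 + f * (y2 - y1))
    return None
```

The Periods element of the flattened layout serves exactly this purpose at the storage level: it lets an operator decide whether t falls into a defined interval before decoding any points.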
4.4 Operators
All the operators in figure 2.5 were implemented as custom functions in Apache Phoenix,
except for concat and circle. Guting's algebra contains many other operators, but
these are the ones used in the BerlinMOD benchmark. As Apache Phoenix
does not support custom data types, it cannot distinguish between the data types we
implemented, which means it cannot do type checking either. Each operator therefore
does type checking inside the function using the type codes. For pure spatial functions
we use the JTS Topology Suite [49], which conforms to the Simple Features Specification for
SQL published by the Open GIS Consortium [50]. The operators have to parse the data
type objects before JTS can be used. For efficiency, instead of parsing the
complete object and applying the operator, only the metadata of the data-type object
is read to see whether the object is relevant. This filters out irrelevant objects before an
operation is applied. For example, if an intersection is required between a Region column
and an MPoint column, the operator first checks whether the bounding box of the MPoint
stored in its metadata overlaps with the region. If it does not, a null is returned instead
of parsing the object and applying the intersection to each contained connected MPoint.
All the operators operate on array types instead of casting them to objects, for performance
reasons, although this makes the code less readable and less intuitive.
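The bounding-box pre-filtering described above can be sketched as follows. The helper names are illustrative (not the actual Phoenix functions), and the bounding box is assumed to sit in array slots 1 to 4, as in the layouts of this chapter.

```python
# Pre-filter flattened objects by bounding box before full decoding:
# only objects whose bbox overlaps the query region are worth parsing.

def bboxes_overlap(a, b):
    """a, b: (xmin, ymin, xmax, ymax) rectangles."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def prefilter(flat_objects, query_bbox):
    """Yield only the flattened arrays whose metadata bbox (slots 1..4)
    intersects the query region; everything else is skipped unparsed."""
    for arr in flat_objects:
        if bboxes_overlap(tuple(arr[1:5]), query_bbox):
            yield arr
```

For skewed data this check discards most rows with four float comparisons each, which is why the bounding box is stored in the metadata of every flattened type.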
5 Indexing Strategy & Querying Framework
This chapter presents our index design for Guting's algebra in HBase. We start by
presenting different index deployment strategies. We motivate our use of an SFC (Space
Filling Curve) based index, present three logical and two physical index design
strategies, and discuss their querying methodology. We then present our querying
framework, which is capable of benefiting from our index implementation, describe its
components in detail with examples, and discuss the techniques we used to optimize
index access. We conclude the chapter by describing future work.
5.1 Indexing in HBase
Any data type that can be sorted by HBase and is used as a table key can be
considered indexed, because HBase optimizes its row access based on keys
and uses techniques like Bloom filters for faster query processing. In Guting's spatio-
temporal algebra, time and space are first-class citizens and most of the operators work
on these dimensions. As time is represented as a long, it can be sorted and handled
by HBase without any extra effort; a spatial data type like Point or Region, however,
requires a different approach to indexing, and an MPoint involves both the spatial and
temporal dimensions, which is even more complex to index. In the following sections,
our approach to spatial indexing for composite/complex data types is explained. We
also present arguments to support our choice of an indexing strategy based on Space Filling
Curves.
5.2 Indexing Strategies
A lot of work has been published on spatial, temporal and spatio-temporal indexes.
Some surpass others on certain performance criteria. Before even considering their
performance, it is important to consider how they can be deployed in a distributed
environment. Indexes like the R-Tree, KD-Tree or R+-Tree can either be deployed
as a global index serving the whole cluster or as a local index on each slave node.
In addition, there exist indexes like SETI which are especially designed for
distributed systems. The following sections discuss the pros and cons of each
deployment approach.
5.2.1 Maintaining a Global Index
Maintaining a global index is the easiest approach, as it stores all the indexing
information in a single place, which in most cases is the master node. This allows the
master node to determine which rows to access before sending the query to the slave
nodes, and to send the query only to those slave nodes where the data is actually
present. The advantages of maintaining a global index are the following:
1. Easy maintenance, as the index is stored at a single location.
2. Index-based joins are processed as local joins, which avoids distributed join
operations and greatly improves query performance.
3. Almost all non-distributed indexes can be customized to be used as a global index
for a distributed system.
These advantages seem tempting, but when the global indexing approach is considered
in light of scalability, fault-tolerance and throughput, the following problems can be
identified:
1. When the data size grows, the size of its global index also grows and at some point
exceeds the capacity of a single node.
2. Due to the limited memory capacity of a single node and the large size of a global
index, it is not possible to cache the whole index.
3. The performance of most non-distributed indexes degrades as they grow. As a
global index is responsible for indexing the data of all slave nodes, its size grows
dramatically, which leads to quicker performance degradation.
4. When the number of queries grows, the master node becomes a bottleneck because
every query is routed through it for index lookups. Some strategies can partially
mitigate this problem, but they add further complexity and sacrifice simplicity.
5. HBase is a scalable and fault-tolerant system but does not support integration
with custom indexes; a custom index can only be implemented outside of its
indexing framework. A custom index (e.g., an R-Tree) therefore does not inherit
fault-tolerance and scalability from HBase, and a lot of complex work is needed
for intelligent replication to make the index fault-tolerant.
5.2.2 Maintaining Local Indexes
In this approach each slave node maintains its local index. Parallel SECONDO (the
system with which we compare our implementation) uses this approach and builds local
spatial and B-Tree indexes at the slave nodes. This approach has following benefits.
1. This approach is more scalable than global indexing approach. Each index only
handles the data stored in the node where it is deployed.
2. The index can be cached in local nodes because of less size of the data.
There are a few disadvantages of using local indexes as well:
1. Although local index joins can be performed, a global join is still required.
2. Local indexes can not be used unless a complete framework is developed for
support of custom local indexes. All queries will have to be intercepted and
re-written in the HBase co-processors to use local indexes.
3. When the size of a region grows, HBase splits the region into two and moves
the second part to another node in the cluster. As local indexes would be
implemented without support for custom indexes in HBase, features like index
splitting, replication and fault-tolerance would have to be implemented from
scratch.
5.2.3 Maintaining Distributed Indexes
There are a few distributed spatio-temporal indexes, such as SETI. They are scalable,
but it is hard to integrate them into HBase: the index would have to be built outside
HBase and queries would have to be rewritten before being sent to HBase. This
amounts to processing two queries on two different systems instead of writing one
query for HBase. Another drawback is that these indexes do not handle scalability
and fault-tolerance automatically; a lot of effort is needed to add these qualities to
them.
5.2.4 SFC based Indexing for HBase
Space Filling Curves (SFCs) have been discussed in chapter 2.4. As discussed before,
the indexing strategy of HBase is based on sorting the keys and storing the data in
HFiles like a B-Tree to optimize disk access. In other words, HBase can only index a
single dimension. As we intend to index trajectory data, which is multi-dimensional,
SFCs can be used to convert multiple dimensions into a single dimension. The spatial
dimensions of a data type can be reduced to a single dimension, which can either be a
number or a base-32 encoded string; this value can then be sorted and indexed by
HBase. The benefit of this approach is that it is scalable, and we do not need to take
care of fault-tolerance, replication etc. ourselves. It may not be the most optimal
approach, but the benefits of scalability, fault-tolerance and simplicity are a strong
motivation for us to use it. Things get trickier when we want to index a Region or
mpoint data type; in the coming sections, we discuss how to tackle these problems.
5.3 Spatial Index Design for LSMT
Log Structured Merge Trees (LSMTs) have been discussed before. HBase is an
LSMT-based key-value store and, as mentioned before, indexes data based on the key.
The design of the key heavily affects querying performance: data is sorted according
to the key, and HBase optimizes its disk access to retrieve data based on it. HBase
allows us to design a composite key to optimize the performance of our queries for
our use-case. Before presenting our approach to indexing spatial data, we outline
three crucial objectives for our design with an example. Let's say we have the following
query to process: "Find all cars within 200 meters of Theodor-Heuss Platz". Keeping
this query in mind, the following three objectives should be taken care of while
designing a spatial index:
5.3.1 Co-location
All points located close to each other in space should also be stored close to each
other on disk. Considering the above query, if all cars within 200 meters of
Theodor-Heuss Platz are co-located on disk, retrieving them requires a single seek and
the result is returned at disk transfer speed. If the cars are spread over the disk,
multiple seeks are required and the results are returned at disk seek speed, which is
many times slower than the transfer speed. Our choice of an SFC-based index already
achieves this, as SFC hashes are similar for nearby points (except for a few edge
cases) and HBase stores the hashes in sorted order.
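As an illustration of this co-location property, the following minimal GeoHash encoder (a sketch written for this text, not the library implementation used in the thesis) shows that two nearby points produce keys sharing a long common prefix, so HBase's key-sorted storage places their rows next to each other:

```python
# Minimal GeoHash encoder, for illustration only. Nearby points share key
# prefixes, so HBase's lexicographically sorted keys co-locate them on disk.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lon, precision=11):
    """Encode a lat/lon pair into a base-32 GeoHash string."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    bits, even = [], True  # GeoHash interleaves bits, longitude first
    while len(bits) < precision * 5:
        rng, val = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits.append(1)
            rng[0] = mid
        else:
            bits.append(0)
            rng[1] = mid
        even = not even
    # Every 5 bits select one base-32 character
    return "".join(BASE32[int("".join(map(str, bits[i:i + 5])), 2)]
                   for i in range(0, precision * 5, 5))

# Two cars a few hundred meters apart near Theodor-Heuss Platz (coordinates
# are illustrative) share a long prefix, so their rows sort next to each other.
car_a = geohash(52.5096, 13.2730)
car_b = geohash(52.5101, 13.2745)
```

By construction, a shorter GeoHash is always a prefix of the longer GeoHash of the same point, which is what makes the prefix-based querying discussed later possible.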
5.3.2 Lesser Size of Unwanted Data Scan
The amount of unwanted data returned should be small. Unwanted data is data which
does not fulfill the query criteria; in the above example query, the cars beyond 200
meters of Theodor-Heuss Platz are unwanted data. GeoHash has been explained in
detail in section 2.4.3. The grid sizes of GeoHash are fixed and depend on the level of
detail chosen. If GeoHash or any other SFC is used for indexing, a region is
represented by the grid cell it lies in. In the example query, we want to find all cars in
the circle of radius 200 meters centered at Theodor-Heuss Platz. To process this
query, the grid cell completely containing the circle is found. This grid cell covers an
area greater than that of the circle, so retrieving all cars in the cell also brings some
extra cars which are not inside the circle; these extra cars are then filtered out. If the
cell is too big, more unwanted data is retrieved, which is costly. This happens when
the grid cell of one level is slightly smaller than the circle and the grid cell of the next
coarser level has to be used instead, because the difference in area between grid cells
of two consecutive levels is significantly large. To prevent this, instead of finding a
single grid cell which covers the query circle, we can find multiple smaller grid cells
that together cover it.
5.3.3 Lesser Scans
When we represent a circle with smaller grid cells, the number of grid cells required
to cover the circle can become very large. For example, the circle within which we
want to find the cars may be very big and require 1000 grid cells of level 11;
processing this query would then require 1000 scans in HBase. The number of scans
should therefore be kept small.
5.4 Our Approach
We achieve the objectives of section 5.3 at two levels, i.e., indexing and querying. For
indexing, we present three different strategies for indexing a region and argue which
of them can be used for primary or secondary indexing. At the querying level, we
present a querying framework which optimizes queries using data statistics and
schema meta-data to achieve the above-mentioned objectives.
5.4.1 Preliminary Choices
5.4.2 Choice of SFC Index
HBase sorts every table on its key attribute. It optimizes disk access by using Bloom
filters and by storing the HFiles like a B-Tree, which allows it to quickly reach the
required key. Range queries are also fast because the data is stored in sorted order.
This relieves HBase of the complexity of maintaining a separate index for each Region
in each Region Server and of creating a new index after a region split.
The decision to use the built-in index support for table keys was based on the fact
that it lets us inherit all the properties of HBase, like scalability, replication and
fault-tolerance. The problem is that it only provides one-dimensional indexing and
querying support, while we want to index multi-dimensional data like points, lines,
regions and mpoints. To solve this problem, we use Space Filling Curves such as the
Z-order curve, the Hilbert curve or GeoHash. An SFC converts multi-dimensional
data into a single dimension, which allows us to query HBase for this attribute by
using it as a table key. A (two-dimensional) point is converted into a one-dimensional
hash value, which can be sorted and stored by HBase and queried efficiently.
5.4.2.1 Choice of Geo-Hash
SFCs like the Z-order curve, the Hilbert curve or GeoHash all pursue the same
objective: they convert multiple dimensions into a single dimension while trying to
maintain data locality, and each of them has edge cases where it fails to do so. We
could not find any study comparing their performance, so the choice was hard to
make. We chose GeoHash for our indexing approach because of its acceptance in the
open-source and GIS communities; Amazon, for example, uses it to index spatial data
in DynamoDB. From now on, we will use grid cell and hash interchangeably, as a grid
cell is represented by a hash; likewise, grid level and hash-length represent the same
concept. GeoHash divides the dimensions into hierarchical grids, where the deepest
level is 12, which represents a point. Unlike in an R-Tree, the size of the grid cells is
fixed. This means that GeoHash cannot exactly represent data types covering an
area, e.g., a Region, a Line or an mpoint, with a single hash value; if a single hash is
demanded, it usually represents an area far greater than the actual area covered.
Guting's data types can have standard, spatial or temporal dimensions. The standard
dimension can be handled by HBase without hassle. In this thesis, we focused on the
spatial dimension and left the temporal dimension as future work. The spatial
dimension can be of type point or region. Indexing a point is simple and requires a
12-character GeoHash to be used as a key in an HBase table. In the following
sections, we present our approach to indexing data types that represent an area, such
as a region or an mpoint.
5.4.3 Indexing a Region
Indexing a point with an SFC is easy, but data types of higher dimensionality than a
point are more complex. The reason is as follows. A point can be exactly represented
by a length-12 GeoHash, which makes exact matches possible when querying point
data. A region, in contrast, usually cannot be exactly represented by a GeoHash grid
cell: a region can have any shape or size, whereas a grid cell in GeoHash is always
fixed, and the probability that a region has exactly the same shape and size as a grid
cell is very low. Indexing a region is therefore always an approximation, because the
index cannot deliver exact matches. In other words, we consider a region index a
'filter': we use it to retrieve candidate results and then perform exact spatial
operations to obtain the true results. The same approach is used in an R-Tree, which
can only index objects in terms of their bounding boxes. An object whose bounding
box matches the query is retrieved, but then the actual object is checked to see if it
satisfies the query criteria; if it does, it is retained, otherwise it is discarded. The
drawback of the SFC-based approach is that objects are indexed based on the grid
cells in which they lie, which are fixed in advance and often cover far more area than
the bounding box of the actual object. This means that after querying the index,
more rows remain for the actual operator to be applied on than with an R-Tree. In
the following sections we present three different logical approaches for indexing a
region, i.e., SLSH, SLMH & MLMH, and two approaches for physically maintaining
the index, i.e., single-index & multi-index.
5.4.3.1 Single-Level Single-Hash (SLSH)
In this approach a single region is represented by a single hash. A single hash has a
unique level, which is why we call this approach SLSH (Single-Level Single-Hash).
Using a single hash to index a region is the simplest solution, but it can also be
highly inefficient. Consider a small region r which can easily be covered by a grid cell
a11 of hash-length 11. If r lies on the border between two grid cells of length 11, i.e.,
a11 & b11, it can no longer be indexed using either a11 or b11; we have to choose a
bigger grid cell c10, i.e., a grid cell of length 10. As discussed in section 2.4.3, the
difference in area between grid cells of two consecutive levels is very big, so c10 is
many times bigger than either a11 or b11. If we choose c10 for indexing r, which is
even smaller than a11, c10 indexes r very inaccurately, which can lead to low query
performance. Let's assume that the area of r is 5% of the area of c10, and that we
query the index to find regions intersecting with a point stream whose points are
equally distributed over the space. As r has been indexed by c10, 95% of whose area
falsely represents r, the region r would be wrongly returned in the results 95% of the
time. Generalized over the complete indexed dataset, this example tells us that the
number of false results will be large. These results are then transferred to the client
for filtering out the extra ones, which is heavy on the network. This goes against our
'lesser unwanted data' objective.
There is a benefit to this approach as well. As the regions are indexed with bigger
grid cells, more and more regions are represented by a single cell, which means all of
these regions are stored close to each other. This satisfies our co-location objective
with fewer chances of edge cases. Also, only one hash per region is stored in the
index, which keeps the index small: the data can ideally be accessed using a single
disk seek. The approach is thus light on disk but heavy on the network because of the
large number of results transferred to the client.
Although this approach is not optimal, we implemented it because it is the only
approach we can use to build a primary index. In HBase, a table has only one
primary index, which is the key, and there can only be one key per row, so we can
only use a single hash value as the key of a row. SLSH gives us exactly one hash per
region, which we can use in a rowkey. For a single region column, we can have one
primary and many secondary indexes. For secondary indexes, we propose two
approaches based on multiple hashes in the following.
Table 5.1 shows, for a particular table, the number of points grouped by the
hash-length of their bounding box. To index these hashes, we have the two
approaches discussed in the following.
5.4.3.2 Multiple Hashes per Region
To prevent the retrieval of a large set of irrelevant results, we can use multiple hashes
to index a single region. We present two approaches for doing so: one uses multiple
hashes of the same level, the other uses multiple hashes of different levels. We call
these approaches SLMH & MLMH respectively, and present them below.
1. Single-Level Multi-Hash (SLMH): We can index a region more accurately
if we use multiple grid cells of a granular level. The more granular the level, the
more accurate the index, satisfying the 'lesser unwanted data' objective. One
approach is to use the most granular level, i.e., level 11, for all regions, but then
we might need to insert hundreds of hashes into the index for a big region. This
increases the size of the index and takes more time in disk access, but greatly
enhances query performance by reducing the number of unwanted tuples
transmitted to the client. If too coarse a grid level is chosen, the index becomes
more generic but smaller, which means faster disk access. The choice of a
wrong level can thus lead to poor performance, and it is hard for a common
database user to understand which grid level is the right one, because it all
depends on the data. To address this, we present a simple algorithm which
chooses the level of a region automatically. The algorithm takes an input
variable MAX_HASHES_PER_REGION, which tells it the maximum number
of hashes it can use to index a single object; this indirectly bounds the size of
the index. If we have one million rows to be indexed and this parameter is set
to 100, we can be sure that the index won't grow beyond 100 million rows.
Algorithm 1 shows the algorithm. It takes as parameters the region for which
hashes are to be found and the max-hashes parameter. We start with the grid
cell which completely covers the region and then drill deeper into more granular
levels; the deeper we go, the more hashes are required to cover the region. If,
by going deeper, the total number of hashes returned becomes greater than the
limit MAX_HASHES_PER_REGION, we return the previous list of hashes,
which contains fewer hashes than the maximum allowed. Using this algorithm,
regions of different sizes are represented by hashes of a level proportionate to
their size, ensuring that the level is neither too fine for the region nor too
coarse.
Data: region, MAX_HASHES_PER_REGION
Result: list of hashes
hashLength = findSingleCoveringHash(region);
listOfHashes = findHashes(region, hashLength);
while hashLength < MAX_HASH_LENGTH do
    newHashes = findHashes(region, ++hashLength);
    if newHashes.size() > MAX_HASHES_PER_REGION then
        return listOfHashes;
    else
        listOfHashes = newHashes;
    end
end
return listOfHashes;
Algorithm 1: Algorithm to determine the hashes for a region
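Algorithm 1 can be sketched as runnable code. The sketch below is illustrative only: GeoHash cells are replaced by a binary subdivision of the unit square into 2^L x 2^L cells per level, and all names are our own stand-ins for the thesis implementation.

```python
def cells_covering(rect, level):
    """All cells of the given level (unit square split into 2^level x 2^level
    cells) that intersect the query rectangle (x1, y1, x2, y2)."""
    n = 2 ** level
    x1, y1, x2, y2 = rect
    return [(level, i, j)
            for i in range(int(x1 * n), min(int(x2 * n), n - 1) + 1)
            for j in range(int(y1 * n), min(int(y2 * n), n - 1) + 1)]

def slmh_hashes(region, max_hashes_per_region, max_level=12):
    """Drill down level by level (Algorithm 1): return the finest covering
    whose cell count still respects MAX_HASHES_PER_REGION."""
    chosen = cells_covering(region, 0)  # level 0: a single cell covers all
    for level in range(1, max_level + 1):
        candidate = cells_covering(region, level)
        if len(candidate) > max_hashes_per_region:
            return chosen  # previous level was the finest admissible one
        chosen = candidate
    return chosen
```

A large region thus settles on a coarse level with few cells, while a tiny region is pushed down to the most granular level, mirroring the "level proportionate to size" behaviour described above.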
This is an improvement over SLSH, as it satisfies the 'lesser unwanted data' as well
as the 'lesser scans' objective because the levels are of appropriate granularity for
each region. The limitation of this approach is that it should
only be used for building secondary indexes in HBase. Secondary indexes are
maintained in a separate table from the main table; they only contain the indexed
hash bundled with the primary key of the main table. If a single row is indexed
using, for example, 50 hashes, each hash also carries the rowid, which helps in
joining the results with the main table. As the rowid is generally small compared
to the whole row, this duplication does not produce much overhead. If SLMH
were used as a primary index, each row would have to be stored as many times
as it has hashes, which wastes a lot of space and greatly reduces query
performance, because a range scan then takes longer due to the duplicated data.
As part of this thesis, we implemented this approach for maintaining secondary
indexes.
2. Multi-Level Multi-Hash (MLMH): This approach uses multiple hashes
belonging to different grid levels for indexing. The use of multiple hashes again
suggests that it should be used as a secondary index. This approach represents
a region more accurately, and the index size is smaller as well. When a region
is to be indexed, a coverage algorithm is used to find the hashes/grid cells best
covering the region. If a region is big, then unlike the SLMH approach, which
might require hundreds of hashes to cover it, this approach uses a few hashes
of bigger size to cover most of the region and, for accuracy, uses granular
hashes to cover the small areas of the region not covered by the bigger hashes.
Imagine a big region which could be represented by a single grid cell of length
10 but has a very small part extending into a neighboring grid cell. An SLMH
index would dig deeper and represent it more accurately using grid cells of
level 11, increasing the accuracy but also significantly increasing the number of
hashes required. An MLMH index will use one grid cell of level 10 and a few of
the more granular level-11 cells to index the region. This approach is thus both
more accurate and faster.
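The multi-level idea can be sketched as a classic quadtree cover (again over simplified unit-square cells rather than real GeoHash cells; the function names are our own): a cell fully inside the region is kept at its coarse level, a partially overlapping cell is split, and at the maximum level any intersecting cell is kept.

```python
def intersects(r, c):
    """Open-interval overlap test between rectangles (x1, y1, x2, y2)."""
    return r[0] < c[2] and c[0] < r[2] and r[1] < c[3] and c[1] < r[3]

def contains(r, c):
    """True if rectangle r fully contains cell c."""
    return r[0] <= c[0] and r[1] <= c[1] and c[2] <= r[2] and c[3] <= r[3]

def mlmh_cover(region, level=0, x=0.0, y=0.0, max_level=6):
    """Multi-level cover of `region` over the unit square. Cells are returned
    as (level, x, y): fully contained cells stay coarse, partially overlapping
    cells are split into four children, and at max_level any intersecting
    cell is kept."""
    size = 0.5 ** level
    cell = (x, y, x + size, y + size)
    if not intersects(region, cell):
        return []
    if contains(region, cell) or level == max_level:
        return [(level, x, y)]
    half = size / 2
    cover = []
    for dx in (0.0, half):
        for dy in (0.0, half):
            cover += mlmh_cover(region, level + 1, x + dx, y + dy, max_level)
    return cover
```

For a region that fills one coarse cell plus a small sliver next to it, the cover mixes the single coarse cell with a handful of fine cells, instead of refining the whole region as a single-level cover would.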
Hash-Length Count
11 150000
10 0
9 600
8 100
7 2000
6 0
5 0
4 0
3 0
2 0
1 0
Table 5.1: Number of points in each grid level
5.4.4 Physical Approaches for Building the Index
The above types of indexes can be implemented either using a single index or a collection
of indexes. These approaches are discussed below.
5.4.4.1 Single-Index Approach
In this approach, all hashes are stored in a single index, which makes the indexing
process easy to handle. Consider an index with the statistics shown in table 5.1. As
we have hash values of lengths 7, 8, 9 and 11, we have to send one get/scan request
per hash-length, i.e., four requests, and combine the results at the client. We used
this approach for all our experiments because it is easier, cleaner and more intuitive
for the query optimizer to optimize queries on. Section 5.4.6 explains in detail, with
examples, why we need multiple get/scan requests.
5.4.4.2 Multi-Index Approach
Another approach is to build four different indexes, one for each of the hash-lengths
7, 8, 9 and 11. To process a spatial query, we formulate four different queries, one per
index, and combine the results at the end. The difference from the single-index
approach is that in the single-index approach multiple scan requests are sent to one
index, whereas in the multi-index approach the same number of requests is sent to
different indexes. In the scenario under discussion, both approaches require the
execution of four scan requests; which approach is faster is a question that needs to
be looked into further.
5.4.5 Optimization
As the number of lines covered by hash-lengths 9 and 10 is very small, they can be
merged with the hashes of level 8. We can thus build only two indexes, with
hash-lengths 8 and 11; all hashes of lengths 9 and 10 are trimmed to length 8. Let us
see what happens when a line with hash-length 8 and hash value 'dr65h8p6' is the
input to a query finding intersections. Here we assume, for simplification purposes,
that the line has been translated by the query optimizer into a single hash but the
operator has not been rewritten yet.
As we have two indexes, one of length 8 and the other of length 11, the input
bounding box is queried against both as follows:
WHERE Index8.Line = 'dr65h8p6';
and:
WHERE Index11.Line LIKE 'dr65h8p6%';
The results are then merged at the client.
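The merging decision can be sketched as a small heuristic (illustrative only; it differs in detail from the hand-chosen merge above, and all names are our own): lengths whose entry count falls below a threshold are trimmed into the closest shorter length that is kept.

```python
def merge_sparse_levels(stats, min_count):
    """Map each hash length present in the stats to the index it should live
    in: a length with fewer than `min_count` entries is merged into the
    closest shorter length already kept. Returns {length: target_length}."""
    mapping, kept = {}, None
    for length in sorted(l for l, c in stats.items() if c > 0):
        if stats[length] >= min_count or kept is None:
            kept = length  # this length gets its own index
        mapping[length] = kept
    return mapping

def trim(hash_value, mapping):
    """Trim a hash to the length of the index it was merged into."""
    return hash_value[:mapping[len(hash_value)]]
```

With statistics like those of table 5.1, the sparse middle lengths collapse into the shortest well-populated one, leaving only a couple of indexes to query.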
5.4.6 Index Implementation
We implemented an SLSH index for primary indexing and an SLMH index for
secondary indexing using the single-index approach. We use our coverage algorithm
to find the hashes that effectively cover a region. These hashes are stored in a
separate HBase table as keys, and the key of the original data-holding table is stored
as the value. For every spatial query, the index is queried first to get potential
matches, and using those matches, the table containing the actual data is queried.
Building an index for a region is an easy task, but querying it requires a few tweaks.
To understand how such an index can be queried, let's take an example query:
SELECT M.name
FROM Movement M
WHERE intersects(M.Route, 'dr65h8p6d542');
Here we want to find all the people who crossed a specific point. Route is a column
containing lines that are indexed using a single hash. Although the index used in this
example is SLSH, the querying process we are about to explain holds for SLMH as
well. The lengths of the hashes vary from 7 to 11, as shown in table 5.1. Let's say we
have the 8 hash entries shown in table 5.2. As we can see, the query should return
the rows belonging to Faisal, Alex, David and Bob. As HBase only supports get and
scan operations over the key, this SQL is translated by Phoenix into HBase get and
scan operations. Let's discuss how we can retrieve the correct results using a partial
scan or, in simple words, using a LIKE or = operator. The reason we discuss this
example in terms of the LIKE and = operators will become clear in later sections,
where we explain how we translate Guting's operators into LIKE or = operators for
better performance.
Let's see what happens when we use a hash of length 10 to query, e.g.:
WHERE M.Route = 'dr65h8p6d5';
This returns only Faisal, but we also want Alex, David and Bob to be returned.
Let's try trimming the query point to 7 characters so that we can retrieve the other
relevant results as well:
WHERE M.Route LIKE 'dr65h8p%';
This query returns the whole table because it is too generic. From this we
understand that we have to issue five queries, as there are five different lengths of
hashes stored in the table. Our modified query becomes:
WHERE M.Route = 'dr65h8p'
OR M.Route = 'dr65h8p6'
OR M.Route = 'dr65h8p6d'
OR M.Route = 'dr65h8p6d5'
OR M.Route = 'dr65h8p6d54';
This query retrieves only the correct results. For this process we need to know which
grid levels the index contains; if we don't, we have to issue a query with 11 conditions
in the WHERE clause, one for each hash-length, which is highly suboptimal because
the chance of an index containing all levels is small. To make the querying process
efficient, we use data statistics, which we maintain in our stats-store explained in
section 5.5.5.
Hash Name Hash-Length
dr65h8p6d5 Faisal 10
dr65h8p6d Alex 9
dr65h8p6 David 8
dr65h8p Bob 7
dr65h8p6d1 Terry 11
dr65h8p6d2 Olena 11
dr65h8p6d3 Charlotte 11
dr65h8p6d4 Ivan 11
Table 5.2: Index for Movement Table along with hash-length
The above example explains how to query the index when a point is the input. The
method also works if a region is the input, but the region should be represented by a
hash that is greater than or equal in length to the most granular hash in the index.
Let's take the same query, but this time we want to find all the people who ever
crossed my field. Assume I have a big field represented by a hash of length 8. The
query to find all those people is:
SELECT M.name
FROM Movement M
WHERE intersects(M.Route, 'dr65h8p6');
In the previous example we had a point as input, with a hash of length 12, which we
trimmed to make it suitable for matching hashes of smaller length. In this example,
however, the input hash is smaller than some hashes in the index. In this case, we use
the LIKE operator for all hash lengths greater than or equal to 8. The modified
WHERE clause looks like this:
WHERE M.Route = 'dr65h8p'
OR M.Route LIKE 'dr65h8p6%'
For matching the input with smaller hashes in the index, we trim it and use the =
operator, whereas for hashes that are greater than or equal in length to the input
hash, we use the LIKE operator. The more conditions in the WHERE clause, the
more get/scan requests. In our experiments we had three different lengths of hashes
in our SLMH index, and we saw that sending 3 get/scan requests did not incur much
overhead.
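These trimming rules can be collected into one routine (a sketch; the function name and the list of index lengths, which in our framework would come from the stats-store, are illustrative):

```python
def build_route_predicates(query_hash, index_lengths):
    """One predicate per hash length present in the index: lengths shorter
    than the query hash use '=' on the trimmed hash; for all longer-or-equal
    lengths a single LIKE prefix match suffices."""
    predicates = []
    for length in sorted(set(index_lengths)):
        if length < len(query_hash):
            predicates.append("M.Route = '%s'" % query_hash[:length])
        else:
            predicates.append("M.Route LIKE '%s%%'" % query_hash)
            break  # the LIKE already covers every longer length
    return " OR ".join(predicates)
```

For the length-8 field hash and an index with lengths 7 through 11, this reproduces the two-condition WHERE clause above; for a length-12 point hash it reproduces the all-equality form.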
5.4.6.1 Schema Design for GET Requests
As can be seen, we use get and scan interchangeably when we talk about requests;
which kind of request is used depends entirely on the schema design. For example, if
the key of the index contains only the hash, the ids of all regions having that hash in
their coverage are stored against the same key. This does not mean that the ids are
appended to the same key; HBase stores this as follows. Assume our index has two
columns, hash and id, where id is the identifier of the region. We index a region with
hash=dr65h8p and id=1, which is stored in HBase as a row. Now we index another
region with hash=dr65h8p and id=2. If we insert this into the index, HBase does not
overwrite the previous row; instead it versions it: it attaches a timestamp to the new
row and stores it right next to the previous one. All other rows with the same hash
are stored in the same way. If we now query the index for hash=dr65h8p, we get a
row with id=2, because by default HBase returns the most recent version it stored,
which in our case was the one with id=2. To use this kind of schema design, we
increase the number of versions HBase stores per row to an appropriate number and
tell it to return all versions of a row whenever a get request is sent. We use this
design approach because it makes our query translation and optimization more
intuitive: it allows us to use the = operator to retrieve all the regions belonging to a
particular hash. It would be interesting to see which schema design approach
performs better. With this schema design, the WHERE clause of our SQL query
looks like this:
WHERE hash = 'dr65h8p'
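The versioning behaviour described above can be mimicked with a small in-memory model (a toy stand-in for HBase, purely to make the semantics concrete; it is not the HBase API):

```python
import itertools

class VersionedIndex:
    """Toy model of the GET-oriented schema: the row key is the hash alone
    and every region id is stored as one version of that row. As in HBase,
    get() returns only the newest version unless more are requested."""
    def __init__(self, max_versions=1):
        self.max_versions = max_versions
        self.rows = {}                 # hash -> [(seq, region_id)], newest first
        self.seq = itertools.count()   # stands in for HBase's timestamps

    def put(self, hash_value, region_id):
        versions = self.rows.setdefault(hash_value, [])
        versions.insert(0, (next(self.seq), region_id))
        del versions[self.max_versions:]  # old versions beyond the limit are pruned

    def get(self, hash_value, versions=1):
        return [rid for _, rid in self.rows.get(hash_value, [])[:versions]]

index = VersionedIndex(max_versions=100)
index.put("dr65h8p", 1)
index.put("dr65h8p", 2)
```

With the default single-version get, only id=2 comes back; this is exactly why the schema must be configured with enough versions and queried with all of them.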
5.4.6.2 Schema Design for SCAN Requests
This schema design approach uses a composite key in HBase. Consider again an
index with two columns, hash and id, where id is the identifier of the region. Instead
of having two columns, we merge them into a composite key column of the format
hash-id. Say we have two regions, both with hash=dr65h8p, one with id=1 and the
other with id=2. As composite keys, they become dr65h8p-1 and dr65h8p-2
respectively. If we want to retrieve all the regions belonging to the hash dr65h8p, we
have to issue a SCAN request in HBase, because now there are two rows for the same
hash. Using Phoenix (a SQL layer on top of HBase), we use the LIKE operator
instead of the = operator, and our WHERE clause looks like:
WHERE hash LIKE 'dr65h8p%'
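How such a LIKE turns into a single range scan over the sorted composite keys can be seen in a few lines (again a toy model of HBase's sorted key space, not the real client API):

```python
import bisect

def prefix_scan(sorted_keys, prefix):
    """Simulate an HBase SCAN: seek to the first key >= prefix, then read
    sequentially until a key no longer starts with the prefix."""
    matches = []
    for key in sorted_keys[bisect.bisect_left(sorted_keys, prefix):]:
        if not key.startswith(prefix):
            break
        matches.append(key)
    return matches

keys = sorted(["dr65h8n-3", "dr65h8p-1", "dr65h8p-2", "dr65h8q-7"])
```

Because the keys are stored lexicographically sorted, all rows of one hash are contiguous, so the scan touches only the matching range rather than the whole table.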
5.5 The Querying Framework
In the previous section we presented the design of our spatial index. To let the user
query trajectory data transparently, a querying framework is required that hides the
complexities of our approach and optimizes the query. Figure 5.1 shows the block
diagram of our implementation. The following sections describe, one by one, the
components which as a whole form our querying framework.
5.5.1 Guting’s Algebra
This module contains our implementation of Guting’s algebra (data-types and opera-
tors) within the framework of Apache Phoenix. The operators are designed in a way
that they can take geometric information in the lat-long or SFC hash format. All inter-
nal operations are performed in lat-long format as the JTS library works on WGS-84
geometry. Therefore the types in SFC hash format are converted to lat-long format
before applying the operation. The operators themselves have no knowledge of the SFC encoding; the conversion is performed through the SFC plugins supplied as a package with the algebra.
5.5.2 SFC Plugins
The algebra has no knowledge about matters related to SFCs. During the course of
the thesis, we used GeoHash for all purposes but any other kind of SFC can also
be integrated by implementation of a simple interface. The algebra or the query
translator, only require encode and decode methods but for rest of the modules re-
quire some advanced methods like findNorthNeighbour(), findAllNeighbours() and
findChildCells() etc. to be implemented.
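Such a plugin contract might look as follows (a Python sketch; the actual implementation is a Java interface, and the `precision` parameter is our assumption — only the method names quoted above come from the text):

```python
from abc import ABC, abstractmethod

class SFCPlugin(ABC):
    """Contract an SFC implementation must fulfil. encode/decode suffice
    for the algebra and the query translator; the other modules also need
    neighbourhood and refinement methods."""

    @abstractmethod
    def encode(self, lat: float, lon: float, precision: int) -> str: ...

    @abstractmethod
    def decode(self, sfc_hash: str) -> tuple: ...  # (lat, lon) of cell centre

    @abstractmethod
    def findNorthNeighbour(self, sfc_hash: str) -> str: ...

    @abstractmethod
    def findAllNeighbours(self, sfc_hash: str) -> list: ...

    @abstractmethod
    def findChildCells(self, sfc_hash: str) -> list: ...
```

A GeoHash plugin would implement this interface; plugging in a different SFC means providing another implementation without touching the algebra.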
5. INDEXING STRATEGY & QUERYING FRAMEWORK
Constant Spatial Entity   Basic Data Type    Format                                n
Point                     float[1+2n]        {TypeID,lat,long}                     1
Bounding Box              float[1+2n]        {TypeID,lat1,long1,...,lat4,long4}    4
Line                      float[5+2(n-2)]    {TypeID,lat1,long1,...,latn,longn}    ≥ 2
Region                    float[7+2(n-3)]    {TypeID,lat1,long1,...,latn,longn}    ≥ 3

Table 5.3: Constant Spatial Entities
We distribute SFC plug-ins and Guting’s algebra in the same package so that the
operators can use the encoding and decoding functions of the respective SFC imple-
mentation.
5.5.3 Query Translator
To make spatio-temporal querying transparent, a naive query translator has been implemented. The translator adds support for a space-filling-curve based index; a variety of space filling curves can be plugged in by implementing a simple interface. As part of this thesis, GeoHash indexing support was built, so let’s take GeoHash as an example to explain the functioning of this translator. It receives a SQL query as input and parses it for the existence of spatial entities. For the purpose of translation, a spatial entity can either be a constant entity or a meta entity. The types of constant spatial entities are described in Table 5.3, where n = number of points in the spatial entity.
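The float-array layouts of Table 5.3 can be sketched as follows (a Python illustration; the concrete TypeID values below are assumptions, not the ones used in the thesis):

```python
# Serialize constant spatial entities as flat float arrays, as in Table 5.3:
# the first element is a TypeID, followed by 2n coordinates.
POINT_ID, BBOX_ID, LINE_ID, REGION_ID = 1.0, 2.0, 3.0, 4.0  # assumed IDs

def encode_point(lat, lon):
    return [POINT_ID, lat, lon]                 # 3 floats

def encode_bbox(corners):
    """Exactly 4 (lat, lon) corners -> 9 floats, cf. the double[9] input
    mentioned later for bounding boxes."""
    assert len(corners) == 4
    return [BBOX_ID] + [c for pt in corners for c in pt]

def encode_region(points):
    """At least 3 (lat, lon) points -> 1 + 2n floats."""
    assert len(points) >= 3
    return [REGION_ID] + [c for pt in points for c in pt]

print(len(encode_bbox([(0, 0), (0, 1), (1, 1), (1, 0)])))  # 9
```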
As a second step, meta spatial entities are detected. A meta spatial entity is a
table or a column in the table storing a spatial attribute. For the translator, only
those spatial columns are important which store the spatial attribute using space filling
curves. This information is stored in the meta-data store. All spatial constant entities
are translated to their respective space filling curve representation by the translator.
Let’s take spatial query 4 from the BerlinMOD Benchmark [16], presented here as Query-1.
Query-1
SELECT PP.Pos AS Pos, C.Licence AS Licence
FROM dataScar C, QueryPoints PP
WHERE C.Trip passes PP.Pos;
Figure 5.1: Implementation Block Diagram
This query finds out which licence plate numbers belong to vehicles that have passed the points from QueryPoints. As Phoenix only supports equi-joins for now, this query needs to be divided into two queries, which we do manually at the client; this process is not part of the Query Translator. We use this example only to show how a query with joins based on Guting’s operators can be handled, although this is not part of the scope of this thesis. The first query looks like this:
Query-2
SELECT PP.Pos AS Pos
FROM QueryPoints PP;
This gets all the points in the QueryPoints table. As this table is very small, i.e. 100 points, these points are added to the WHERE clause of the second query as constants.
Query-3
SELECT C.Licence AS Licence
FROM dataScar C, QueryPoints PP
WHERE passes(C.Trip , [1,52.40092d,13.52795d])
OR passes(C.Trip , [1,52.46290d,13.55138d])
OR ...;
The query translator queries the meta-store to determine whether the C.Trip attribute is space-filling-curve enabled and, if so, of what type. It then translates all the points to
the respective representation using the corresponding space filling curve plugin. In our
case, it will use the GeoHash plugin to translate the points and the query would look
something like this:
Query-4
SELECT C.Licence AS Licence
FROM dataScar C, QueryPoints PP
WHERE passes(C.Trip , ’u33d5g6c2r9f’)
OR passes(C.Trip ,’u33dkqefd1fw’)
OR ...;
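The translation from a lat-long pair to such a 12-character base-32 hash can be sketched in a few lines (a standard textbook GeoHash encoder written here for illustration; the thesis uses its own GeoHash plugin):

```python
def geohash_encode(lat, lon, precision=12):
    """Encode a lat-long pair as a base-32 GeoHash of `precision` characters
    by repeatedly bisecting the longitude and latitude ranges."""
    base32 = "0123456789bcdefghjkmnpqrstuvwxyz"
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    bits, even = [], True                 # bits alternate: lon, lat, lon, ...
    while len(bits) < precision * 5:      # 5 bits per base-32 character
        rng, val = (lon_rng, lon) if even else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits.append(1)
            rng[0] = mid
        else:
            bits.append(0)
            rng[1] = mid
        even = not even
    return "".join(base32[int("".join(map(str, bits[i:i + 5])), 2)]
                   for i in range(0, precision * 5, 5))

print(geohash_encode(57.64911, 10.40744, 11))  # u4pruydqqvj (well-known example)
```

Note the prefix property that the indexing strategy relies on: truncating a hash yields the hash of the enclosing coarser cell.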
The points in our implementation of GeoHash are base-32 encoded and are represented by 12 characters. Although support for GeoHash has been added to the implemented Guting’s Algebra operators, translating in the query translator is far better than letting the operators perform the transformation themselves. Imagine a relation with 2 billion rows and the following query:
Query-5
SELECT M.Loc
FROM MOVEMENT M
WHERE intersects(M.Loc, [1,52.40092d,13.52795d]);
If M.Loc is a Space Filling Curve (SFC) enabled column, the operator intersects will be called by Phoenix/HBase for each row, which means that the lat-long point would be translated to the corresponding SFC hash 2 billion times. If the query translator is used, it is translated only once, which gives a huge performance boost. The translated query which HBase has to execute is therefore:
Query-6
SELECT M.Loc
FROM MOVEMENT M
WHERE intersects(M.Loc, ’u33d5g6c2r9f’);
The same procedure is applied to the other data types mentioned in Table 5.3. Let’s consider an example where all locations which intersect with a bounding box are to be retrieved. In that case the intersects function is passed a double[9] array representing a bounding box. The query translator converts this double array into an array of type varchar[5], where the first element of the array is the type identifier of the SFC based bounding box. The above query will then look like the following:
Query-7
SELECT M.Loc
FROM MOVEMENT M
WHERE intersects(M.Loc, [’51’, ’u33d5g6c2r9f’, ’u33d5g6c2r9g’,
’u33d5g6c2r9h’, ’u33d5g6c2r9i’]);
The second performance benefit of using this translator is that it helps in converting a scan query into a point query, which is done by the Query Optimizer. After translating the query into an SFC based query, the Query Translator passes it on to the Query Optimizer which, as the name suggests, optimizes the query. The Query Optimizer is explained in the following.
5.5.4 Query Optimizer
The query optimizer takes an SFC based translated query as input and transforms a scan query into a point query using information from the meta-store. Before that, the query has to fulfill some criteria: if both parameters of the operator are of type point, the optimizer converts the Guting operator into a basic equality operator. The Query Optimizer gets this information from the type-id of the input constant spatial entity, and from the meta-store for meta spatial entities. Let’s take Query-6 above as an example. The signature of intersects involves two SFC based point parameters, so the query can be transformed into the following:
Query-8
SELECT M.Loc
FROM MOVEMENT M
WHERE M.Loc = ’u33d5g6c2r9f’;
This increases query performance manifold, as the scan query has now turned into a point query: HBase can optimally locate the rows and filter them based on equality.
This approach is also applied to operators involving parameters of types other than point. To understand how the query optimizer optimizes a query involving parameters that represent an area or a line, let’s take Query-7 as an example. When this query is passed to the optimizer, it identifies the spatial operators and the datatypes of their parameters; in the current example, it will find out that intersects is a spatial operator and that the parameters provided to it are a point and a bounding box. This query, if sent to HBase, would result in a full table scan because Phoenix does not understand the semantics of the operator. The optimizer converts this full-table-scan query into a range query by replacing the intersects operator with a LIKE operator. This is done by taking the longest common prefix of all 4 SFC points and appending a literal ’%’. Here is how the query looks after this transformation:
Query-9
SELECT M.Loc
FROM MOVEMENT M
WHERE M.Loc LIKE ’u33d5g6c2r9%’;
This greatly enhances the performance of the query and prevents a full table scan. The only problem with this query is that it returns more results than required, because some information was lost while converting the bounding box to a single SFC hash: in this example, 12-character hashes were converted to a single 11-character hash. The results are retrieved very fast but need to be filtered at the client, which is done by the implemented Client Filter.
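The prefix computation behind this rewrite can be sketched as follows (a Python illustration using os.path.commonprefix; the function name `bbox_to_like_pattern` is ours):

```python
from os.path import commonprefix

def bbox_to_like_pattern(corner_hashes):
    """Replace intersects(col, bbox) by  col LIKE '<common prefix>%' :
    take the longest common prefix of the corner hashes and append '%'."""
    return commonprefix(corner_hashes) + "%"

corners = ["u33d5g6c2r9f", "u33d5g6c2r9g", "u33d5g6c2r9h", "u33d5g6c2r9i"]
print(bbox_to_like_pattern(corners))  # u33d5g6c2r9%
```

Note that corner hashes sharing only a short prefix make the pattern very unselective; that is exactly the grid-edge case discussed below.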
This does not always lead to an optimization; in the edge cases discussed previously in the GeoHash explanation, the area to be queried lies at the edge of SFC grids. Let’s take the example shown in Figure 5.2: the small box in the GeoHash grid represents the bounding box we want to use in our example query. After translation the query would look something like this:
Query-10
SELECT M.Loc
FROM MOVEMENT M
WHERE intersects(M.Loc, [’51’, ’dr72h8p6c2r9f’,
’dr72hb0u33d5g’, ’dr5ruzb5cr7m6’, ’dr5ruxz2gty6n’]);
If the longest common prefix is taken for all 4 points, we get only dr, a 2-character geohash which represents a very large area. If the query optimizer sends a query with a WHERE clause such as WHERE M.Loc LIKE ’dr%’, it will probably bring the whole database to the client and fail. For such cases we need a mechanism that makes HBase do fewer disk accesses and shifts most of the filtering to the server, while the client filter is used only for fine trimming of the result.
The optimizer solves this problem with a user parameter which specifies the maximum number of hashes, i.e. the maximum number of WHERE conditions, it may generate. If this parameter is set to 2, Query-10 above becomes:
Query-11
SELECT M.Loc
FROM MOVEMENT M
WHERE M.Loc LIKE ’dr72h%’ OR M.Loc LIKE ’dr5ru%’;
In this case, a single intersects operator is transformed into two LIKE operators, each handling a 5-character hash. This greatly reduces the number of results returned to the client as well as the number of disk hits. This was a simple case where each two of the four hashes had 5 characters in common, which does not happen often. Although this significantly
reduces the size of the result returned to the client, the grids being searched are still very large. If the maximum hash split parameter allows for more splits, more granular grids can be chosen. If the parameter is increased too much, the number of results will be significantly reduced, but the query can take more time to execute because each LIKE operator is turned into an HBase scan. If the data is sparse, many scans can return an empty set and waste resources, thus increasing the overall execution time. This parameter should therefore be chosen very carefully. The coverage calculation is optimized based on the statistics of the stored data: the coverage of the whole stored data is divided into grids, and for each grid the total number of points it contains is stored. The optimizer uses these statistics to find the grid hashes that better cover the desired area. The objective of this coverage algorithm is to find hashes which reduce the number of total returned results. The algorithm is described in a separate section.
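The split shown in Query-11 can be sketched as follows (simplified: prefixes are shortened uniformly and statistics are ignored; the function name `split_hashes` is ours):

```python
def split_hashes(hashes, max_hashes):
    """Find the longest prefix length at which the distinct prefixes of the
    input hashes number at most max_hashes; one LIKE condition per prefix."""
    for length in range(min(map(len, hashes)), 0, -1):
        prefixes = sorted({h[:length] for h in hashes})
        if len(prefixes) <= max_hashes:
            return prefixes
    return []

corners = ["dr72h8p6c2r9f", "dr72hb0u33d5g", "dr5ruzb5cr7m6", "dr5ruxz2gty6n"]
print(split_hashes(corners, 2))  # ['dr5ru', 'dr72h'] -> two LIKE conditions
```

With max_hashes = 2 the four corner hashes collapse to the two 5-character prefixes of Query-11; a larger budget keeps more, finer prefixes.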
The job of the optimizer becomes tricky when the column to be queried is of an area based data-type, e.g. a bounding box or a region. Let us consider the conditional clause of an input query to the optimizer:
Query-12
WHERE intersects(M.Loc, [’51’, ’dr72h8p6c2r9f’, ’dr72hb0u33d5g’,
’dr5ruzb5cr7m6’, ’dr5ruxz2gty6n’]);
Let’s assume that M.Loc is of type SFC based bounding box. The optimizer gets the information about its hash length from the stats-store and uses it to decide the length of the input hashes: if the column hash length is less than the smallest input hash, all input hashes are trimmed to the column hash length. The length parameter is passed to the coverage algorithm, which then determines hashes of only that length. Suppose the length of M.Loc is 5; the coverage algorithm will return hashes of length 5 only, and Query-12 above will be rewritten as:
Query-13
WHERE M.Loc = ’dr72h’ OR M.Loc = ’dr5ru’;
Figure 5.2: GeoHash Edge Case [32]

The above-mentioned approach is used if an index has a fixed length, which is the case for the multi-index approach mentioned in section 5.4.4.2. If a single-index approach is used, we might have hashes of different lengths in the same index; for this, we ask the coverage algorithm to provide hashes of those levels only. During the
course of our thesis, we used the BerlinMOD Benchmark[16] data and constructed
the SLSH approach for primary indexes and the SLMH approach for secondary indexes. The optimizer checks the meta-store to find out what kind of index the column holds, and the stats-store to find out how many hash lengths it contains, and generates queries automatically.
The query can either be on the index table or on the main table. In either case, if the table or the index contains all the columns requested in the query, the Query Optimizer can generate queries automatically. If the query is on the main table, the optimizer can read the meta information to find out whether the column has a secondary spatial index and query the secondary index if it contains the requested columns. But if the index does not contain all the requested columns, the query needs to be split into two queries, one for the index and the other for the main table. This scenario is currently not handled automatically by the Query Optimizer; it is merely an implementation challenge which requires more work-hours. Currently we do this process manually and merge the results of the two queries programmatically; we consider the automation future work.
As a last step, the Query Optimizer appends the original WHERE clause at the end of the generated query with an AND logical operator. The optimizer does this to push the operator to the Region Servers. Apache Phoenix evaluates the conditions in order; in our case, the conditions at the beginning of the WHERE clause are the ones comparing hashes and are evaluated first. As hashes are just approximations of the actually covered area, these conditions filter out most of the unwanted data quickly. After evaluation of all hash conditions, the actual operator is applied on the filtered results to obtain the true results. Pushing the operator to the Region Servers gives us two benefits. First, the operators are applied by many machines at the same time, so the intermediate results are filtered much faster than a single client doing all the filtering. Second, the unwanted results are not sent to the client, which improves network performance.
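This final rewrite step can be sketched as plain string assembly (a toy illustration; the helper name is ours, and the real optimizer works on the query rather than on raw strings):

```python
def append_original_predicate(hash_conditions, original_predicate):
    """Combine the hash approximations with the exact operator: the hash
    conditions come first and prune most rows, then the pushed-down
    operator computes the true result on the Region Servers."""
    approx = " OR ".join(hash_conditions)
    return f"WHERE ({approx}) AND {original_predicate}"

where = append_original_predicate(
    ["M.Loc LIKE 'dr72h%'", "M.Loc LIKE 'dr5ru%'"],
    "intersects(M.Loc, bbox)")
print(where)
```

The parentheses matter: the OR-ed hash conditions must be grouped before the AND so that the exact operator applies to every pre-filtered row.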
5.5.5 Stats-Store
The statistics-store contains various statistics about the data which help the Query Optimizer and the Coverage Algorithm optimize their operations. The whole data is divided into grids of different granularity levels; the levels to calculate statistics for are provided in a properties file. Currently, statistics can only be stored for the GeoHash SFC. The granularity levels vary from 1 to 12, where level 1 represents the biggest area a base-32 encoded GeoHash can represent. The provided dataset may not cover such a big area and can belong to a country or a city, which requires much more granular GeoHashes. To cater for this, the user has to provide the bounds of the dataset, which are stored as part of the statistics. Based on this information, the parameter MIN_HASH_LEVEL is determined. For example, if this parameter is calculated to be 6, the user can tell the statistics module to calculate and store the statistics for levels 6 and above. As this thesis only deals with SELECTs rather than INSERTs, the statistics module is not integrated with the system; instead the statistics are calculated offline in batch mode. Keeping in view the size of the datasets involved, a MapReduce job has been written for this purpose. The statistics-store is a simple XML file which can be accessed by the split/max-hash algorithm. It stores the following information:
1. Maximum Spatial Bounds: The spatial bounds of the data are calculated offline using a MapReduce job. This information is used to calculate the highest grid level, which is used by the query optimizer in the transformation of queries. The spatial information in the BerlinMOD dataset is in a grid format rather than lat-long; we use this information to transform the spatial attributes to lat-long format.
2. Maximum Temporal Bounds: This information can be used for designing
an index for temporal periods. We have left this for future work.
3. Total count of points for each grid cell: The stats-store maintains a list of grid cells for the top 11 levels and the number of points each cell holds. This information is used by the coverage algorithm to decide which cell to descend into.
4. Total count of objects for each level: The stats-store contains the total count of objects for each level. This information is used by the index builder to minimize the number of different-length hashes in an index, which helps in reducing the number of HBase scan/get operations for processing a query.
5. Hash-length Frequency for Indexes: The stats-store contains the frequency of hashes of each length for all SFC indexes. This information is used by the Query Optimizer to formulate WHERE clause conditions only for those lengths that exist in the index.
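As an illustration, a stats-store along these lines could be stored and read like this (the XML element and attribute names below are our own guess for the sketch; the thesis does not specify the exact layout):

```python
import xml.etree.ElementTree as ET

# Hypothetical stats-store fragment: dataset bounds plus per-cell counts.
STATS_XML = """
<StatsStore>
  <SpatialBounds minLat="52.3" minLon="13.0" maxLat="52.7" maxLon="13.8"/>
  <GridCells level="6">
    <Cell hash="u33d5g" points="1842"/>
    <Cell hash="u33d5h" points="97"/>
  </GridCells>
</StatsStore>
"""

def load_cell_counts(xml_text):
    """Return {geohash: point count} for every grid cell in the store."""
    root = ET.fromstring(xml_text)
    return {c.get("hash"): int(c.get("points")) for c in root.iter("Cell")}

counts = load_cell_counts(STATS_XML)
print(counts["u33d5g"])  # 1842
```

Loading the file once into such a dictionary matches the description of the store being read into memory before queries run.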
5.5.6 Hash Coverage Algorithm
The hash-coverage algorithm takes as input a list of hashes, determines the coverage area, and returns the hashes which cover that area. It accepts a parameter containing the list of levels for which coverage hashes are to be generated. The algorithm uses the statistics of the data from the statistics store to find the best hashes in a greedy manner. The pseudocode is presented in Algorithm 2.
The algorithm starts with the highest level and finds the hashes representing it. If the number of hashes does not exceed the specified maximum, the hashes for the next level are found. These hashes are stored in a list where each element holds the hashes of a single level. A hash which is completely covered by the hashes of the more granular level represents the area accurately; we keep such hashes and remove all the hashes they cover from the granular level. We continue this process until we get the best coverage in the list of levels provided. If the total number of hashes exceeds the specified maximum, we trim the hashes, starting from the most granular level. Those hashes are removed first which can be best covered by adding a hash at the immediately higher level. If the lowest level becomes empty, we move to the next higher level and repeat the process.
Data: listOfLevels, region, maxHashes
Result: listOfHashes

sortAscending(listOfLevels);
foreach level in listOfLevels do
    listOfHashes[level] = findHashes(region, level);
    if listOfHashes[level-1] exists then
        foreach hash in listOfHashes[level-1] do
            if hash is completely covered by listOfHashes[level] then
                remove the hashes covered by hash from listOfHashes[level];
            else
                remove hash from listOfHashes[level-1];
            end
        end
    end
    if size(listOfHashes) > maxHashes then
        break;
    end
end
while size(listOfHashes) > maxHashes do
    level = LOWEST_LEVEL;
    commonPrefix = the (level-1)-character prefix shared by most of listOfHashes[level];
    add commonPrefix to listOfHashes[level-1];
    remove the hashes with commonPrefix from listOfHashes[level];
    if size(listOfHashes[level]) == 0 then
        level = level + 1;
    end
end

Algorithm 2: Coverage Algorithm
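The trimming phase of Algorithm 2 can be rendered in Python as follows (a simplified, statistics-free sketch: the "most common prefix" is found by counting, level numbers double as hash lengths, and the function name is ours):

```python
from collections import Counter

def trim_to_max(hashes_by_level, max_hashes):
    """While the total number of hashes exceeds max_hashes, replace the
    hashes at the most granular level that share the most frequent
    (level-1)-character prefix by that single coarser hash."""
    total = lambda: sum(len(v) for v in hashes_by_level.values())
    levels = sorted(hashes_by_level)
    i = len(levels) - 1                      # start at the lowest level
    while total() > max_hashes and i > 0:
        level = levels[i]
        cells = hashes_by_level[level]
        if not cells:
            i -= 1                           # level empty: move one level up
            continue
        prefix, cnt = Counter(h[:level - 1] for h in cells).most_common(1)[0]
        if cnt < 2:                          # merging one hash gains nothing
            i -= 1
            continue
        hashes_by_level.setdefault(level - 1, []).append(prefix)
        hashes_by_level[level] = [h for h in cells if not h.startswith(prefix)]
    return hashes_by_level

cover = {4: [], 5: ["dr72h", "dr72j", "dr5ru", "dr5rv"]}
print(trim_to_max(cover, 3))  # {4: ['dr72'], 5: ['dr5ru', 'dr5rv']}
```

In the example, two of the four 5-character hashes share the 4-character prefix dr72 and are merged into it, bringing the total down to the budget of 3.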
5.5.7 Client-side filter
When the queries are rewritten by the optimizer, there can be two scenarios: either the rewritten part only involves points, or it also involves area datatypes. If the rewritten part only involves points, the results are exact and do not need to be filtered at the client end. If the user wrote a query which covers more results than required, he needs to filter the results himself. Normally, the Query Optimizer pushes the actual operator to the Region Servers so that no filtering is required at the client, but there are scenarios where some data needs to be filtered. For example, a secondary index may contain many object IDs against a single hash, and a single object ID may be present against many hashes. When such a result is returned to the client, it needs to be filtered. The client filter receives the results in the form of a JDBC ResultSet and does the filtering. The user can also use the client filter to filter his own results. Currently we handle such situations manually; the design of a generic client-side filter is recommended as a future task.
5.5.8 Meta-Store
The meta-store contains information about the schema design useful for the Query Translator and Query Optimizer. In our implementation the meta-store is kept at the client end for easier experimentation and testing. It is strongly recommended to keep this information in HDFS so that other clients connecting to HBase can use it to optimize spatio-temporal queries. The meta-store is created in XML format for easy parsing and is loaded into memory before running queries. It contains meta information about the schema and indexes as well as the algebra, and has two parts which are discussed in the following.
5.5.8.1 Schema Meta-Data
Schema meta-data contains information about the schema that is useful for the query translator and query optimizer. The general schema information is maintained by Phoenix, which can also be queried for some of it, but to keep things simple and robust, we keep all the meta-data in an XML file at the client. The root node of the meta-store is <MetaStore>, which contains a list of <Table> nodes. It is important to include all tables which contain at least one spatial column; there can be many spatial columns in a table. As a column can store spatial information either in lat-long format or SFC format, this information is stored as an attribute of each spatial column node. If a column is spatial but is not SFC based, the Query Translator won’t translate input arguments to SFC hashes, and the Query Optimizer will skip the conditional clauses based on such columns.
Each spatial column can be indexed using SFC hashes, each of which is represented as a column of a table. For now, we only support SFC based indexing, which means that we have to store the SFC indexing attributes in the <column> node, which is a child of the <IndexColumns> node. There can be more than one index on a column, depending on the number of levels chosen for indexing. The higher the level,
the more results will be returned, which means that the work to be done by the client-side filter will be significantly greater. Each index has an attribute of type IndexType which tells what kind of index it contains. There are three possible options, which have been discussed before and are summarized in Table 5.4:

Index Type                 Abbreviation
Single-Level Single-Hash   SLSH
Single-Level Multi-Hash    SLMH
Multi-Level Multi-Hash     MLMH

Table 5.4: Index Types

The Query Optimizer can rewrite the
query so that, instead of querying the original table, the index is queried. The requirement in this case is that the secondary index contains all the columns requested in the query and that the names of the columns are the same. If this is not the case, the query needs to be split into two queries: one for the index and, based on its results, another for the main table. Currently query splitting is not supported by the Query Optimizer but is planned as future work.
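For illustration, a schema meta-store along these lines could be parsed as follows (the element and attribute names here are our own invention for the sketch; the actual layout is shown in Figure 5.3):

```python
import xml.etree.ElementTree as ET

# Hypothetical meta-store fragment: a table with one SFC-based column.
META_XML = """
<MetaStore>
  <Table name="MOVEMENT">
    <Column name="LOC" spatial="true" sfc="true" sfcType="GeoHash"/>
    <Column name="LICENCE" spatial="false" sfc="false"/>
  </Table>
</MetaStore>
"""

def is_sfc_column(xml_text, table, column):
    """True if the column stores its spatial attribute as SFC hashes,
    i.e. the Query Translator must translate constants used against it."""
    root = ET.fromstring(xml_text)
    for t in root.iter("Table"):
        if t.get("name") == table:
            for c in t.iter("Column"):
                if c.get("name") == column:
                    return c.get("sfc") == "true"
    return False

print(is_sfc_column(META_XML, "MOVEMENT", "LOC"))      # True
print(is_sfc_column(META_XML, "MOVEMENT", "LICENCE"))  # False
```

This is the lookup the translator performs before deciding whether to rewrite a predicate on a given column.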
Figure 5.3 shows the meta-data of two tables. The first table represents the movement of cars and has the columns ID, LICENCE, LOC and MPOINT. LOC denotes the current location of the car; it is stored as a key in the HBase table, which means that it is also indexed. LOC is a point and is represented by 12 characters; all queries involving the LOC attribute will be translated to SFC queries. Another spatial column in this table is MPOINT, which represents the motion of the car over time. As a table can have only one index, it is not possible to index this column in the same table; the <IndexColumn> node contains the information about its secondary indexes. In the current example, MPOINT has two secondary indexes, MOVE_INDEX_1 of length 10 and MOVE_INDEX_2 of length 8. The IndexType attribute tells that their type is SLSH. The SFCLength attribute denotes the length of the stored hashes; if this attribute is set to 0, the hashes are of variable length, as in the case of the ROUTE column of the ROAD table. Each secondary index also stores information about its sister columns present in the index. The secondary indexes of the MPOINT column are bundled with the ID and LICENCE attributes, which means that if the query is diverted to the index instead of the main table, these two columns can also be retrieved from the index. This optimizes the query in that a second query to fetch these columns from the main table does not need to be sent.

Figure 5.3: Meta-Store
The second table, ROADS, stores the roads of an area. It has two columns, ID and ROUTE. ROUTE is of type line, so it cannot be stored as an HBase key; instead, its bounding box is indexed as the key. The <IndexColumns> node shows two indexes for this column: one in the ROADS table itself, which serves as the key, and the other in the MOV_INDEX_12 table. Both indexes are of type MLMH. This information is mentioned to document the indexing type and is not needed by the querying framework; the querying process is the same for all types of indexes. The information that is helpful is the length of the index. If it is fixed and greater than 0, it represents a kind of multi-index approach where a single index can only hold hashes of the same length; this information is used by the Query Optimizer to generate queries best fitting this length. If the mentioned length is 0, the index contains hashes of different lengths; in this case, the Query Optimizer uses the information from the stats-store to know
what variety of hash-lengths are available and formulates the query accordingly.
5.5.8.2 Algebra Meta-Data
This store contains the meta-information about the implemented Guting’s algebra operators. The Query Translator uses this information to identify Guting’s operators in a query. It could also be used for type checking and query validation; no work has been done by us in this direction, and we consider it part of our future work.
5.6 Future Work
When the user types a query in Phoenix, it is parsed and type checked by Phoenix. This means that if the user has written a wrong query, e.g. comparing a Varchar with a Long, the parser will throw an error informing the user about the actual problem. In our implementation, the query is directly handed over to the Query Translator, which hands it over to the Query Optimizer after translation. The Query Optimizer transforms the query and sends it to Phoenix. By then the query is different from what the user wrote, and Phoenix can only perform checks on the transformed query; if an exception is thrown, it is hard to debug what went wrong. We suggest that a mechanism be developed by which Phoenix does the type checking first, before the query is handed over to our Query Translator.
The Query Optimizer is an important part of our implementation and improves the performance of queries many times over. It can convert whole-table scans into point queries and choose the best coverage for input geometries. In its current state, it cannot automatically handle secondary indexes which do not contain the other required columns, and some manual intervention is required. We propose to improve the optimizer so that it can automatically split such queries and join the results for the user.
Query performance can be further enhanced if we also index time periods in a manner similar to the spatial domain. We believe that a custom SFC based index can be devised for time periods as well, but this is left for future work.
The client-side filter can filter out results based on equality. In some scenarios, Guting’s operators are needed to filter the result set; currently these scenarios are handled manually. The design of a generic client-side filter is recommended.
We currently index the regions using the SLMH approach, which gave us good performance during our experiments. We consider the MLMH approach to be more optimal and expect it to perform better than SLMH for huge datasets. Our coverage algorithm can be modified slightly to find the best multi-level hashes for such an index; it would be interesting to compare its performance with SLMH.
We implemented Guting’s data-types using float arrays. Encoding and decoding the types is CPU heavy, although we try to skip it where possible (by only reading the meta-data of the data-type), and the implementation of the operators is complex and non-intuitive because, for performance reasons, most operations are performed directly on float arrays. Parsing the data-types and constructing objects is also an extra overhead, which we managed to reduce significantly thanks to our indexing approach: it lets us apply the actual operators to filtered data which is significantly smaller than the table. If the query cannot be transformed by the Query Optimizer for some reason, e.g. because no index is available, the operators will make the query perform very poorly. The Struct data-type in Phoenix is under implementation; if it is released in the next version, we propose to port the algebra to the Struct data-type, which will essentially involve rewriting most of the algebra.
6
Benchmark & Results
This chapter presents the results from our experiments conducted using BerlinMOD
Benchmark data. The experiments focus on a comparison between Parallel Secondo,
raw HBase/Phoenix and our implementation of Guting’s Algebra over HBase/Phoenix.
We first present our experimental setup and describe the BerlinMOD Benchmark
datasets. We explain our choice of queries and provide a query-wise analysis of ex-
perimental results.
6.1 Experimental Setup
We conducted our experiments on a cluster of four machines. Each machine was equipped with 2 Intel Xeon E5530 CPUs (4 cores, 8 hardware contexts) and 48GB RAM. The machines’ disk arrays read 500 MB/sec, according to hdparm; the cluster consequently has 32 cores and 64 hardware threads. We used HBase version 0.94.18, Hadoop version 1.2.1, Apache Phoenix version 3.0 and Parallel SECONDO version 3.3.0, all in their default configuration. We used OpenJDK version 7 for Parallel Secondo and Oracle Java version 7 for the rest of the platforms. For all index building purposes, the MAX_HASHES_PER_REGION parameter was set to 100. The points and regions used as input for our test queries were sampled from the QueryPoints and QueryRegions tables that are part of the BerlinMOD Benchmark.
ScaleFactor   Days   Vehicles   Trips     Size
0.005         2      141        1,797     64.5MB
0.05          6      447        15,045    561MB
0.2           13     894        62,510    2.2GB
1.0           28     2,000      292,940   11GB

Table 6.1: BerlinMOD Datasets
6.2 The Dataset
For our experiments we used the data of the BerlinMOD [16] benchmark. BerlinMOD has been designed at the University of Hagen under Dr. Ralf Guting, who originally proposed the algebra we implemented as part of this thesis. The benchmark data has been primarily designed to measure the performance of queries on moving object data, specifically mpoints. The data is sampled from the moving point data of a set of cars whose driving is simulated on the street network of Berlin. The simulation models the behavior of typical workers who commute between their homes and work places on all working days and make some trips to other places in their free time. The available sampled data is mapped to the street network. The dataset also comes with a generator which allows us to generate data ourselves based on various parameters; although the data is mapped to the street network, disturbed data can also be generated. The benchmark data is generated based on real-world spatial data imported from a tool called bbbike (http://bbbike.de), which contains real-world spatial data of the streets of Berlin. Currently, data at four scale factors is available, and we used all four in our experiments to check scalability. The datasets we used and their characteristics are given in Table 6.1.
6.3 Query Selection
We selected five different queries to test our implementation against other platforms.
These queries have been selected based on the indexed data-type and the provided
input. As our index implementation as well as our query optimization revolves around
handling various index and input combinations of the area and point data-types, we
tried to cover all possible scenarios and see how our implementation performs when
compared to the other platforms. Table 6.2 shows the selected queries by their index
and input types.

Query  Index   Input
1.     Point   Point
2.     Point   Region
3.     Region  Point
4.     Region  Region
5.     Region  Region

Table 6.2: Benchmark queries by index and input types
6.4 Results
In the following, we present the query-wise results of our experiments.
6.4.1 Query-1
SELECT C.Licence AS Licence
FROM dataScar C
WHERE equals(C.loc,Point);

This query retrieves the licenses of all cars which are currently present at a particular
point. This is a simple point comparison. As seen in Figure 6.1, raw Phoenix performs
very poorly and does not scale. The reason is that Phoenix/HBase does not support a
point data-type. We tried to use the latitude component of the point as a key, but
HBase still has to perform a scan to match the longitude component. The performance
is very bad for scale-1.0 because of the increase in the amount of data to scan. Parallel
SECONDO gives a constant processing time; the overhead comes from running a Hadoop
job to obtain the results. For our implementation, this is a simple HBase get/scan
request, because the point as a whole is indexed as part of the key, which allows HBase
to retrieve the results optimally.
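To make the key design concrete, the following sketch (a simplification: the encoder and the in-memory dictionary stand in for our row-key layout and the HBase table) shows why a point-equality predicate reduces to an exact key match once the full GeoHash is part of the key.

```python
# Minimal GeoHash encoder: interleave longitude/latitude bits, 5 bits per
# base32 character. Because the encoding is deterministic, equals(C.loc, Point)
# becomes an exact row-key lookup rather than a scan.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lon, precision=12):
    """Encode a (lat, lon) pair as a GeoHash string by bit interleaving."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    bits, even, ch, result = 0, True, 0, []
    while len(result) < precision:
        rng, val = (lon_rng, lon) if even else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        ch = (ch << 1) | (1 if val >= mid else 0)
        if val >= mid:
            rng[0] = mid
        else:
            rng[1] = mid
        even = not even
        bits += 1
        if bits == 5:
            result.append(BASE32[ch])
            bits, ch = 0, 0
    return "".join(result)

# A dict stands in for the HBase table: row key -> licence.
table = {geohash(52.5200, 13.4050): "B-AB 123",
         geohash(52.4500, 13.3000): "B-CD 456"}

# The equality predicate is answered with a single get on the composite key.
print(table.get(geohash(52.5200, 13.4050)))  # -> B-AB 123
```

Because the hash is deterministic, the same coordinates always produce the same key, so HBase can answer the query with one get instead of scanning for a matching longitude.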
Figure 6.1: Query-1 Results
6.4.2 Query-2
SELECT C.Licence AS Licence
FROM dataScar C
WHERE inside(C.loc,Region);

This query is similar to Query-1 but takes a region as input. Raw Phoenix could not
return results for scale-0.2 and scale-1.0, because it has to retrieve all the points at
the client to perform the inside operation. In our implementation, Phoenix issues a
simple get/scan request on the key, which contains the GeoHash of the point.
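The corresponding scan can be sketched as follows; the keys, licence values and the prefix_scan helper are illustrative stand-ins for an HBase scan with start and stop row keys:

```python
# Because GeoHash keys sort lexicographically, every point whose hash starts
# with the region's covering prefix lies in one contiguous key range, so
# inside() becomes a single range scan instead of a full table read.
import bisect

rows = sorted([
    ("u336xp000000", "B-AA 1"),
    ("u33dc0cpt000", "B-BB 2"),
    ("u33dc6zzzzzz", "B-CC 3"),
    ("u33e11111111", "B-DD 4"),
])
keys = [k for k, _ in rows]

def prefix_scan(prefix):
    """Return all rows whose key starts with `prefix` (one HBase scan)."""
    lo = bisect.bisect_left(keys, prefix)
    hi = bisect.bisect_left(keys, prefix + "\x7f")  # just past the prefix range
    return rows[lo:hi]

# A region covered by the GeoHash cell "u33dc" needs exactly one scan:
print(prefix_scan("u33dc"))  # the two candidate cars inside the cell
```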
Figure 6.2: Query-2 Results

6.4.3 Query-3

SELECT C.Licence AS Licence
FROM dataScar C
WHERE passes(C.Trip,Point);

This query retrieves the licenses of all vehicles which have been at a particular point
at any time in their history. C.Trip is of type mpoint. Raw Phoenix fails to return
results for scale-0.2 and scale-1.0, because it has to transfer all the data to the client
for processing. It retrieves all the points from HBase, accumulates them into a line
for each car, and then uses the JTS library to find the intersection between the input
point and this line. Our implementation performs better for two reasons. First, we
use a secondary index to retrieve the licenses that are potential candidates and query
the main table for only those licenses. Second, we push the operation to the HBase
Region Servers, which means that the operator is applied to the potential candidates
on each slave node and only true results are returned. Our mpoint data-type
additionally carries its exact bounding box in its header, which helps us further filter
out irrelevant results and improve performance.
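The candidate filtering can be sketched as follows. This is a simplification of our mpoint layout: the class and method names are illustrative, and while a real mpoint interpolates between units, this sketch only tests the sampled positions.

```python
# Sketch of the bounding-box filter kept in the mpoint header: before testing
# whether a trajectory passes a point, the region server checks the precomputed
# bounding box and rejects the candidate without touching the sample list when
# the point falls outside it.
class MPoint:
    def __init__(self, samples):           # samples: [(t, x, y), ...]
        self.samples = samples
        xs = [x for _, x, _ in samples]
        ys = [y for _, _, y in samples]
        self.bbox = (min(xs), min(ys), max(xs), max(ys))  # header field

    def passes(self, x, y, eps=1e-9):
        x1, y1, x2, y2 = self.bbox
        if not (x1 - eps <= x <= x2 + eps and y1 - eps <= y <= y2 + eps):
            return False                   # cheap reject: no sample scan needed
        # Simplified: only sampled positions are tested here; the real operator
        # also checks the interpolated movement between samples.
        return any(abs(sx - x) < eps and abs(sy - y) < eps
                   for _, sx, sy in self.samples)

trip = MPoint([(0, 13.30, 52.45), (1, 13.35, 52.48), (2, 13.40, 52.52)])
print(trip.passes(13.35, 52.48))  # True: inside bbox and on a sample
print(trip.passes(14.00, 52.50))  # False: rejected by the bbox alone
```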
6.4.4 Query-4
SELECT C.Licence AS Licence
FROM dataScar C
WHERE intersects(C.Trip,Region);

This query finds all cars which have been in an area at any time in their history. Both
parameters of the operator are of type area. The Query Optimizer formulates the
optimal hash for the input region and sends the query to the secondary index.
Operation-wise, this query is similar to Query-3: the secondary index contains hashes
of lengths 5, 6 and 7, while the input region can be covered by a hash of length 8. This
means that the hash has to be trimmed to satisfy the WHERE clause conditions. The
same happens in Query-3, where a point represented by a hash of length 12 is trimmed
to match the index.

Figure 6.3: Query-3 Performance

Figure 6.4 shows the performance comparison for this query. Raw Phoenix could only
process the query on scale-0.005 and crashed for all other scales, as it could not handle
the amount of data flowing in.
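The trimming step can be sketched as follows; the trim_to_index helper and the tuple of indexed lengths are illustrative assumptions rather than the actual optimizer code:

```python
# The secondary index only materializes hashes of lengths 5-7, so a finer hash
# covering the input (length 8 for a small region, length 12 for a point) is
# cut back to the longest length the index actually contains before it is used
# as a lookup key.
INDEX_LENGTHS = (5, 6, 7)        # hash lengths materialized in the index

def trim_to_index(hash_str, index_lengths=INDEX_LENGTHS):
    """Trim a GeoHash to the longest indexed length not exceeding it."""
    usable = [n for n in index_lengths if n <= len(hash_str)]
    if not usable:
        raise ValueError("hash shorter than any indexed length")
    return hash_str[:max(usable)]

print(trim_to_index("u33dc0cp"))      # length-8 region hash -> 'u33dc0c'
print(trim_to_index("u33dc0cpt2k8"))  # length-12 point hash -> 'u33dc0c'
```

Trimming widens the lookup cell, so the results are a superset of the true answer and the exact WHERE clause predicate is still evaluated on the candidates afterwards.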
6.4.5 Query-5
SELECT C.Licence AS Licence
FROM dataScar C
WHERE length(trajectory(at(C.Trip,Region)))>10;

This query is operationally similar to Query-4, as the index performance is the same.
It additionally performs more spatial operations on the filtered data, which makes it
more CPU-heavy than the previous query. Figure 6.5 shows that raw Phoenix could
only process this query on the scale-0.005 data. Our implementation performs better
than Parallel SECONDO and scales well.
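The extra per-candidate work can be sketched as follows, with two simplifications: a rectangle stands in for the general region, and Euclidean distance over sampled positions stands in for the real length computation:

```python
# Restrict the trip to the region with at(), project it to a polyline with
# trajectory_length(), and sum the segment lengths before applying the > 10
# predicate -- this is the CPU-heavy part that runs on each filtered candidate.
from math import hypot

def at(samples, rect):
    """Keep only the samples inside rect = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = rect
    return [(x, y) for x, y in samples if x1 <= x <= x2 and y1 <= y <= y2]

def trajectory_length(points):
    """Length of the polyline through consecutive points."""
    return sum(hypot(bx - ax, by - ay)
               for (ax, ay), (bx, by) in zip(points, points[1:]))

trip = [(0, 0), (3, 4), (6, 8), (20, 20)]
inside = at(trip, (0, 0, 10, 10))      # drops the last sample
print(trajectory_length(inside) > 10)  # 5 + 5 = 10, not > 10 -> False
```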
Figure 6.4: Query-4 Performance
Figure 6.5: Query-5 Performance
7
Conclusion
This master's thesis presented the implementation of Guting's Algebra in HBase,
different index designs and a querying framework, which together enable the execution
of Guting's Algebra queries in HBase. We conclude this work with a summary of the
topics covered in each chapter and an outlook on some of our ideas for future
development.
7.1 Summary
Chapter 2 provided the background for this thesis. It covered different algebras that
could be used to process trajectory data and explained Guting’s Algebra in detail. We
discussed different kinds of indexing techniques with a focus on GeoHash.
Chapter 3 presented different distributed platforms for spatio-temporal data
processing. We discussed some of the open-source platforms for online querying and
explained Parallel SECONDO as well as HBase in detail, as they form the basis of our
thesis. We wrapped up the chapter by presenting the various criteria we considered
before choosing HBase as the platform for our implementation.
Chapter 4 presented our implementation of Guting’s data-types. We discussed
the internal representation of each data-type and explained important data-types like
mpoint in more detail.
Chapter 5 started with an explanation of indexing in HBase. We discussed different
index deployment strategies and motivated the suitability of SFC-based indexes for
HBase. We presented three different index designs for indexing regions and discussed
their querying process. We presented a querying framework which enables the use of
our index implementation for Guting's Algebra queries and explained its various
components, such as the Query Translator and the Query Optimizer, in detail. Finally,
we presented a hash-coverage algorithm for determining the hashes to be sent as input
to a query on our index.
Finally, Chapter 6 reported the results of our experiments on five selected queries.
The experiments compared the scalability and the execution performance of Parallel
SECONDO, raw HBase and our implementation. We analyzed the experiment results
and offered possible explanations for the observed performance differences between
these systems.
7.2 Outlook
Our work in this thesis has delivered promising results and can be extended in many
ways to enhance performance and usability. One of the major areas to work on is the
indexing of temporal periods. Unlike the spatial dimension, which has fixed bounds,
the temporal dimension is always growing. In the future, we would like to find out how
SFCs can be used to index temporal periods effectively. We also intend to discover
ways to combine SFC-based spatial and temporal indexes such that a single key can
be used to index both dimensions.
We further want to improve the querying framework in two ways. First, we want to
add support for splitting queries automatically. Second, we plan to integrate it with
the Query Optimizer of Phoenix, so that the other features of Phoenix, e.g. type
checking and meta-data storage, can be reused instead of maintaining separate
meta-stores. We also plan to integrate our index building process with Phoenix in such
a way that a user can use SQL to create an index by specifying the corresponding type,
i.e. SLSH, SLMH, etc.
We intend to compare the different indexing strategies, i.e. SLSH, SLMH and MLMH,
in terms of performance. We also want to study the effect of the max-hash parameter
in the SLMH and MLMH indexes and find out how an optimal value can be determined.
We plan to improve our implementation of the data-types by using the Struct data-type
of Phoenix whenever it is released. Finally, we intend to contribute this work to Apache
Phoenix.
References
[1] Yu Zheng and Xiaofang Zhou. Computing with spatial
trajectories. 2011. iii, ix, 12, 13, 15
[2] Ralf Hartmut Guting, Michael H Bohlen, Martin Erwig,
Christian S Jensen, Nikos A Lorentzos, Markus Schneider,
and Michalis Vazirgiannis. A foundation for repre-
senting and querying moving objects. ACM Trans-
actions on Database Systems (TODS), 25(1):1–42, 2000.
1, 8, 9
[3] Luca Forlizzi, Ralf Hartmut Guting, Enrico Nardelli, and
Markus Schneider. A data model and data structures for
moving objects databases, 29. ACM, 2000. 1
[4] Shashi Shekhar and Hui Xiong. Moving Object
Databases. In Encyclopedia of GIS, pages 732–732.
Springer, 2008. 1
[5] Hartmut Guting, Teixeira de Almeida, and Zhiming Ding.
Modeling and querying moving objects in networks.
The VLDB Journal, 15(2):165–190, 2006. 1
[6] Ralf Hartmut Guting, Thomas Behr, Victor Almeida,
Zhiming Ding, Frank Hoffmann, Markus Spiekermann, and
LG Datenbanksysteme fur neue Anwendungen. SEC-
ONDO: An extensible DBMS architecture and prototype.
FernUniversitat, Fachbereich Informatik, 2004. 1, 12
[7] Ralf Hartmut Guting, Victor Almeida, Dirk Ansorge,
Thomas Behr, Zhiming Ding, Thomas Hose, Frank Hoff-
mann, Markus Spiekermann, and Ulrich Telle. Secondo:
An extensible DBMS platform for research pro-
totyping and teaching. In Data Engineering, 2005.
ICDE 2005. Proceedings. 21st International Conference
on, pages 1115–1116. IEEE, 2005. 1
[8] Victor Teixeira de Almeida, Ralf Hartmut Guting, and
Thomas Behr. Querying Moving Objects in SEC-
ONDO. In MDM, 6, page 47, 2006. 1
[9] Ralf Hartmut Guting, Thomas Behr, and Christian Dunt-
gen. Secondo: A platform for moving objects database
research and for publishing and integrating research im-
plementations. Fernuniv., Fak. fur Mathematik u. Infor-
matik, 2010. 1
[10] Nikos Pelekis, Elias Frentzos, Nikos Giatrakos, and Yan-
nis Theodoridis. HERMES: aggregative LBS via a
trajectory DB engine. In Proceedings of the 2008
ACM SIGMOD international conference on Management
of data, pages 1255–1258. ACM, 2008. 1
[11] Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C
Hsieh, Deborah A Wallach, Mike Burrows, Tushar Chan-
dra, Andrew Fikes, and Robert E Gruber. Bigtable:
A distributed storage system for structured
data. ACM Transactions on Computer Systems (TOCS),
26(2):4, 2008. 2, 27, 28, 35, 39
[12] James F Allen. An Interval-Based Representation of
Temporal Knowledge. In IJCAI, 81, pages 221–226,
1981. 5
[13] Krishna Kulkarni and Jan-Eike Michels. Temporal fea-
tures in SQL: 2011. ACM SIGMOD Record, 41(3):34–
43, 2012. 6
[14] Nikos Pelekis, E Frentzos, N Giatrakos, and Y Theodor-
idis. HERMES: A trajectory DB engine for
mobility-centric applications. International Journal
of Knowledge-based Organizations, 2011. 6
[15] Martin Erwig, Ralf Hartmut Gu, Markus Schneider,
Michalis Vazirgiannis, et al. Spatio-temporal data
types: An approach to modeling and query-
ing moving objects in databases. GeoInformatica,
3(3):269–296, 1999. 8
[16] Christian Duntgen, Thomas Behr, and Ralf Hartmut
Guting. BerlinMOD: a benchmark for moving ob-
ject databases. The VLDB Journal, 18(6):1335–1368,
2009. 11, 70, 78, 88
[17] Dieter Pfoser, Christian S Jensen, Yannis Theodoridis,
et al. Novel approaches to the indexing of mov-
ing object trajectories. In Proceedings of VLDB, pages
395–406. Citeseer, 2000. 12, 14
[18] Kyung-Chang Kim and Suk-Woo Yun. MR-Tree: a
cache-conscious main memory spatial index struc-
ture for mobile GIS. In Web and Wireless Geographical
Information Systems, pages 167–180. Springer, 2005. 12
[19] Mario A Nascimento and Jefferson RO Silva. Towards
historical R-trees. In Proceedings of the 1998 ACM
symposium on Applied Computing, pages 235–240. ACM,
1998. 12, 15
[20] Yufei Tao and Dimitris Papadias. Efficient historical R-
trees. In Scientific and Statistical Database Management,
2001. SSDBM 2001. Proceedings. Thirteenth International
Conference on, pages 223–232. IEEE, 2001. 12, 15
[21] V Prasad Chakka, Adam C Everspaugh, and Jignesh M Pa-
tel. Indexing large trajectory data sets with SETI.
Ann Arbor, 1001:48109–2122, 2003. 12, 16
[22] Panfeng Zhou, Donghui Zhang, Betty Salzberg, Gene
Cooperman, and George Kollios. Close pair queries
in moving object databases. In Proceedings of the
13th annual ACM international workshop on Geographic
information systems, pages 2–11. ACM, 2005. 12
[23] Gísli R Hjaltason and Hanan Samet. Distance browsing
in spatial databases. ACM Transactions on Database
Systems (TODS), 24(2):265–318, 1999. 13
[24] Xiaolei Li, Jiawei Han, Jae-Gil Lee, and Hector Gonza-
lez. Traffic density-based discovery of hot routes
in road networks. In Advances in Spatial and Temporal
Databases, pages 441–459. Springer, 2007. 14
[25] Xiaomei Xu, Jiawei Han, and Wei Lu. RT-tree: an im-
proved R-tree index structure for spatiotemporal
databases. In Proceedings of the 4th international sympo-
sium on spatial data handling, 2, pages 1040–1049. IGU
Commission on GIS, 1990. 15
[26] Yufei Tao and Dimitris Papadias. The mv3r-tree: A
spatio-temporal access method for timestamp and
interval queries. 2001. 16
[27] David Lomet and Betty Salzberg. The performance
of a multiversion access method. In ACM SIGMOD
Record, 19, pages 353–363. ACM, 1990. 17
[28] Longhao Wang, Yu Zheng, Xing Xie, and Wei-Ying Ma.
A flexible spatio-temporal indexing scheme for
large-scale GPS track retrieval. In Mobile Data Man-
agement, 2008. MDM’08. 9th International Conference on,
pages 1–8. IEEE, 2008. 17
[29] Ed Katibah, Milan Stojic, Michael Rys, and Nicholas Dritsas.
Tuning Spatial Point Data Queries in SQL Server
2012. In http://social.technet.microsoft.com/, pages 441–
459. Microsoft, 2013. 18
[30] Marshall Bern, David Eppstein, and Shang-Hua Teng.
Parallel construction of quadtrees and quality tri-
angulations. International Journal of Computational
Geometry & Applications, 9(06):517–532, 1999. 18
[31] Jonathan K Lawder and Peter JH King. Using space-
filling curves for multi-dimensional indexing. In
Advances in Databases, pages 20–35. Springer, 2000. 19,
20
[32] Nick Dimiduk, Amandeep Khurana, Mark Henry Ryan, and
Michael Stack. HBase in action. Manning Shelter Island,
2013. 21, 22, 23, 77
[33] Ahmed Eldawy and Mohamed F Mokbel. A demon-
stration of spatialhadoop: an efficient mapreduce
framework for spatial data. Proceedings of the VLDB
Endowment, 6(12):1230–1233, 2013. 25
[34] Ahmed Eldawy, Yuan Li, Mohamed F Mokbel, and Ravi
Janardan. CG Hadoop: computational geome-
try in MapReduce. In Proceedings of the 21st ACM
SIGSPATIAL International Conference on Advances in
Geographic Information Systems, pages 284–293. ACM,
2013. 25
[35] Ahmed Eldawy and Mohamed F Mokbel. Pigeon: A
spatial MapReduce language. In Data Engineering
(ICDE), 2014 IEEE 30th International Conference on,
pages 1242–1245. IEEE, 2014. 26
[36] Ablimit Aji, Fusheng Wang, Hoang Vo, Rubao Lee, Qiaol-
ing Liu, Xiaodong Zhang, and Joel Saltz. Hadoop GIS:
a high performance spatial data warehousing sys-
tem over mapreduce. Proceedings of the VLDB En-
dowment, 6(11):1009–1020, 2013. 26, 27
[37] Avinash Lakshman and Prashant Malik. Cassandra: a
decentralized structured storage system. ACM
SIGOPS Operating Systems Review, 44(2):35–40, 2010.
26
[38] Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gu-
navardhan Kakulapati, Avinash Lakshman, Alex Pilchin,
Swaminathan Sivasubramanian, Peter Vosshall, and
Werner Vogels. Dynamo: amazon’s highly available
key-value store. In ACM SIGOPS Operating Systems
Review, 41, pages 205–220. ACM, 2007. 27
[39] Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zheng
Shao, Prasad Chakka, Suresh Anthony, Hao Liu, Pete
Wyckoff, and Raghotham Murthy. Hive: a warehous-
ing solution over a map-reduce framework. Pro-
ceedings of the VLDB Endowment, 2(2):1626–1629, 2009.
27
[40] Apache HBase. 28
[41] Patrick Hunt, Mahadev Konar, Flavio Paiva Junqueira,
and Benjamin Reed. ZooKeeper: Wait-free Coordina-
tion for Internet-scale Systems. In USENIX Annual
Technical Conference, 8, page 9, 2010. 28
[42] Apache Hadoop. 28
[43] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung.
The Google file system. In ACM SIGOPS Operating
Systems Review, 37, pages 29–43. ACM, 2003. 28
[44] Apache Phoenix. 28
[45] Jiamin Lu and Ralf Hartmut Guting. Parallel SEC-
ONDO: A practical system for large-scale process-
ing of moving objects. In Data Engineering (ICDE),
2014 IEEE 30th International Conference on, pages 1190–
1193. IEEE, 2014. 28, 30
[46] Jiamin Lu and Ralf Hartmut Guting. Simple and ef-
ficient coupling of Hadoop with a database en-
gine. In Proceedings of the 4th annual Symposium on
Cloud Computing, page 32. ACM, 2013. 32
[47] Patrick O'Neil, Edward Cheng, Dieter Gawlick, and Elizabeth
O'Neil. The log-structured merge-tree (LSM-
tree). Acta Informatica, 33(4):351–385, 1996. 35
[48] Amandeep Khurana. Introduction to HBase Schema
Design. Usenix;login, 37(5):1626–1629, 2012. 39
[49] JTS Topology Suite. 52
[50] Open GIS Consortium. 52
Declaration
I herewith declare that I have produced this paper without the prohibited
assistance of third parties and without making use of aids other than those
specified; notions taken over directly or indirectly from other sources have
been identified as such. This paper has not previously been presented in
identical or similar form to any other German or foreign examination board.
The thesis work was conducted from 1st April to 10th August 2014 under
the supervision of Alexander S. Alexandrov at the DIMA research group
at TU Berlin.
Berlin, 10th August 2014