GeoMesa presentation from the LocationTech Tour, Washington, DC, November 14th, 2013. Presented by Anthony Fox (@algoriffic) of CCRi. GeoMesa is an open source project that adds spatio-temporal indexing, querying, and visualization capabilities to Accumulo. Learn more at http://geomesa.github.io/
Anthony Fox
Director, Data Science and System Architecture
Commonwealth Computer Research, Inc.
What is this talk about?
Indexing, querying, visualizing, and analyzing spatio-temporal data at scale.
Using open-source.
Why?
● Volume of spatio-temporal data is increasing exponentially
● Traditional multi-dimensional indexing techniques are straining to keep up
How?
• Storage - leverage distributed databases like Accumulo.
• Compute - parallelize spatio-temporal queries and analytics using MapReduce.
GeoMesa enables geospatial analytics within the Hadoop ecosystem.
What is GeoMesa?
• A flexible spatio-temporal index built on Accumulo.
• An implementation of GeoTools interfaces to make integration seamless.
• A set of GeoServer plugins for OGC compliant access to data.
Integration
What is Accumulo?
“The Accumulo sorted distributed key/value store is a robust, high performance data storage and retrieval system” http://accumulo.apache.org
Based on Google BigTable.
Adds cell-level security and a server-side programming model in the form of composable iterators.
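To make "composable iterators" concrete, here is a minimal conceptual sketch in Python. It is only an analogy: real Accumulo iterators are Java classes that run on the tablet servers, and none of the names below (`scan`, `filter_iterator`, `transform_iterator`) come from the Accumulo API. The point it illustrates is that each stage consumes a sorted key/value stream and yields another, so stages can be stacked.

```python
def scan(table, start, end):
    """Yield (key, value) pairs with start <= key < end, in sorted key order."""
    for key in sorted(table):
        if start <= key < end:
            yield key, table[key]


def filter_iterator(source, predicate):
    """Pass through only the entries the predicate accepts."""
    for key, value in source:
        if predicate(key, value):
            yield key, value


def transform_iterator(source, fn):
    """Rewrite each value, leaving the key (and therefore the order) unchanged."""
    for key, value in source:
        yield key, fn(value)


# Hypothetical data: a dict standing in for a sorted key/value table.
table = {"row1": 5, "row2": 12, "row3": 20, "row9": 7}

# Stack the stages: range scan, then filter, then transform.
stack = transform_iterator(
    filter_iterator(scan(table, "row1", "row4"), lambda k, v: v > 10),
    lambda v: v * 2,
)
result = list(stack)
```

Because every stage is lazy, the filter runs as entries stream past rather than after the whole range has been collected, which is the property that makes filtering at scan time attractive.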
http://accumulo.apache.org/1.4/user_manual/Accumulo_Design.html
How Do We Store Multi-Dimensional Data in a Dictionary?
• Space Filling Curves project multiple dimensions into a single dimension
• Base32 encoding induces an Accumulo-friendly lexicographic ordering
• Recursive nesting facilitates storing different resolutions of data
• GeoHashes are common in web services
http://blog.notdot.net/2009/11/Damn-Cool-Algorithms-Spatial-indexing-with-Quadtrees-and-Hilbert-Curves
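The bullets above can be illustrated with a short GeoHash encoder. This is a sketch of the standard public GeoHash scheme (bits interleaved starting with longitude, then Base32 encoded), not GeoMesa's own code; the function name `geohash_encode` is chosen here for illustration.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard GeoHash alphabet


def geohash_encode(lat, lon, precision=7):
    """Encode a latitude/longitude pair as a GeoHash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate: longitude first, then latitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits = bits << 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits = bits << 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one Base32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


coarse = geohash_encode(42.6, -5.6, precision=5)
fine = geohash_encode(42.6, -5.6, precision=8)
```

The coarse hash is a prefix of the fine one, which is the recursive nesting property: a prefix scan over sorted string keys retrieves everything inside the larger cell.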
How Does GeoMesa’s Index Work?
Constructs a key beginning with a shard id for horizontal scalability.
Uses Space Filling Curves to encode spatio-temporal data in Accumulo keys.
Stacks server side iterators to apply (E)CQL standard queries in parallel at scan time.
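A rough sketch of what such a key could look like, given a GeoHash string for the location. The layout (`shard~geohash~hour~id`), the `~` separator, the shard count, and the function names are all assumptions made for illustration; the slides do not specify GeoMesa's actual key format.

```python
import zlib
from datetime import datetime, timezone

NUM_SHARDS = 4  # hypothetical shard count


def shard_for(feature_id, num_shards=NUM_SHARDS):
    """Pick a shard from a stable hash of the feature id so writes spread evenly."""
    return zlib.crc32(feature_id.encode("utf-8")) % num_shards


def index_key(feature_id, geohash, when, num_shards=NUM_SHARDS):
    """Build a hypothetical row key: shard id, then space, then time, then id."""
    shard = shard_for(feature_id, num_shards)
    return f"{shard:02d}~{geohash}~{when:%Y%m%d%H}~{feature_id}"


def scan_ranges(geohash_prefix, num_shards=NUM_SHARDS):
    """Return one (start, end) key range per shard covering a GeoHash cell."""
    return [
        (f"{s:02d}~{geohash_prefix}", f"{s:02d}~{geohash_prefix}\uffff")
        for s in range(num_shards)
    ]


key = index_key("evt-1", "ezs42", datetime(2013, 11, 14, 9, tzinfo=timezone.utc))
ranges = scan_ranges("ezs")
```

The trade-off this shows: the shard prefix keeps spatially clustered data (hot spots) from landing on a single server, at the cost that a spatial query has to issue one range per shard. Those ranges can be scanned in parallel.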
What is the GeoMesa Model?
How Does GeoMesa Perform?
GDELT - Global Database of Events, Language, and Tone
Leetaru, Kalev and Schrodt, Philip. (2013). GDELT: Global Data on Events, Language, and Tone, 1979-2012. International Studies Association Annual Conference, April 2013. San Diego, CA. http://gdelt.utdallas.edu/about.html
220 million geocoded events from 1979 to the present.
Exhibits pathologies common in spatio-temporal data sets:
• Hot spots
• Bad geocoding
GDELT
GDELT assigns an Event Code to each event.
Codes are based on CAMEO - Conflict and Mediation Event Observations.
There are 20 top level CAMEO codes.
John Beieler developed a visualization of every protest (one of the top level categories) on the planet since 1979.
http://www.foreignpolicy.com/articles/2013/08/22/mapped_what_every_protest_in_the_last_34_years_looks_like
GDELT
http://geomesa.github.io/gdelt.html
How?
Storage, Querying, Filtering
Aggregation and analysis
Visualization
Using Open Source
Distributed Spatial Computations
● Scalding greatly simplifies Map/Reduce
● AccumuloSource is an implementation of a Scalding source/sink
● GeoMesa allows developers to work with SimpleFeatures in a Map/Reduce job
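As a rough picture of the kind of job this enables, the sketch below counts features per coarse GeoHash cell using a map phase and a reduce phase. It is a plain-Python stand-in, not Scalding, AccumuloSource, or SimpleFeature code; the feature dictionaries and field names are hypothetical.

```python
from collections import defaultdict


def map_phase(features, prefix_len=3):
    """Emit (cell, 1) for each feature, keyed by a coarse GeoHash prefix."""
    for feature in features:
        yield feature["geohash"][:prefix_len], 1


def reduce_phase(pairs):
    """Sum the counts emitted for each cell."""
    totals = defaultdict(int)
    for cell, count in pairs:
        totals[cell] += count
    return dict(totals)


# Hypothetical features; a real job would read them from the index.
features = [
    {"id": "evt-1", "geohash": "ezs42"},
    {"id": "evt-2", "geohash": "ezs4r"},
    {"id": "evt-3", "geohash": "u4pru"},
]

counts = reduce_phase(map_phase(features))
```

In a distributed run the map phase executes next to the data and only the small (cell, count) pairs are shuffled to the reducers.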
Performance
PostGIS 1000 responses in > 30 seconds
GeoMesa 1000 responses in < 1 second
Roadmap
• Enhance integration with cell level security
• Build statistical index and query optimization
  o Bring Your Own Space Filling Curve
  o “VACUUM ANALYZE”
• Integrate GeoWebCache and Hadoop
• Ease developer on-ramping
• Grow community through LocationTech