52
Benchmarking Geospatial RDF Stores George Garbis, Kostis Kyzirakos, Manolis Koubarakis Dept. of Informatics and Telecommunications, National and Kapodistrian University of Athens, Greece 1st International Workshop on Benchmarking RDF Systems (BeRSys 2013)

Geographica: A Benchmark for Geospatial RDF Stores

Embed Size (px)

DESCRIPTION

Geospatial extensions of SPARQL like GeoSPARQL and stSPARQL have recently been defined and corresponding geospatial RDF stores have been implemented. However, there is no widely used benchmark for evaluating geospatial RDF stores which takes into account recent advances to the state of the art in this area. In this paper, we develop a benchmark, called Geographica, which uses both real-world and synthetic data to test the o ffered functionality and the performance of some prominent geospatial RDF stores.

Citation preview

Page 1: Geographica: A Benchmark for Geospatial RDF Stores

Benchmarking Geospatial RDF Stores

George Garbis, Kostis Kyzirakos, Manolis Koubarakis

Dept. of Informatics and Telecommunications, National and Kapodistrian University of Athens, Greece

1st International Workshop on Benchmarking RDF Systems (BeRSys 2013)

Page 2: Geographica: A Benchmark for Geospatial RDF Stores

25/26/2013

Outline

• Motivation

• SPARQL extensions for querying geospatial data expressed in RDF

• State-of-the-art geospatial RDF stores

• Related benchmarks

• The benchmark Geographica

• Evaluating the performance of geospatial RDF stores using Geographica

• Conclusions

Page 3: Geographica: A Benchmark for Geospatial RDF Stores

4

Geospatial data on the Web

Page 4: Geographica: A Benchmark for Geospatial RDF Stores

5

Open Government Data

Page 5: Geographica: A Benchmark for Geospatial RDF Stores

6

Linked geospatial data – Ordnance Survey

Page 6: Geographica: A Benchmark for Geospatial RDF Stores

7

Linked geospatial data – Open Street Map

Page 7: Geographica: A Benchmark for Geospatial RDF Stores

8

Linked geospatial data – http://www.linkedopendata.gr

Page 8: Geographica: A Benchmark for Geospatial RDF Stores

9

Linked geospatial data – Wildfire Monitoring

Page 9: Geographica: A Benchmark for Geospatial RDF Stores

105/26/2013

Outline

• Motivation

• SPARQL extensions for querying geospatial data expressed in RDF

• State-of-the-art geospatial RDF stores

• Related benchmarks

• The benchmark Geographica

• Evaluating the performance of geospatial RDF stores using Geographica

• Conclusions

Page 10: Geographica: A Benchmark for Geospatial RDF Stores

Our Contributions

• The data model stRDF and the query language stSPARQL

• The system Strabon

11

[ ISWC 2012 ]

Page 11: Geographica: A Benchmark for Geospatial RDF Stores

The Data Model stRDF

• stRDF stands for spatiotemporal RDF.

• It is an extension of the W3C standard RDF for the representation of geospatial data that may change over time.

• stRDF extends RDF with:

• Spatial literals encoded in OGC standards Well-Known Text or GML

• New datatypes for spatial literals (strdf:WKT, strdf:GML and strdf:geometry)

• Valid time of triples (ignored in this talk) [ ESWC 2013 ]

Page 12: Geographica: A Benchmark for Geospatial RDF Stores

stRDF: An example

Page 13: Geographica: A Benchmark for Geospatial RDF Stores

stRDF: An example (WKT)

ex:BurntArea1 rdf:type noa:BurntArea.

ex:BurntArea1 noa:hasID "1"^^xsd:decimal.

ex:BurntArea1 noa:hasArea "23.7636"^^xsd:double.

ex:BurntArea1 strdf:hasGeometry "POLYGON(( 38.16 23.7, 38.18 23.7, 38.18 23.8, 38.16 23.8, 38.16 23.7)); <http://spatialreference.org/ref/epsg/4121/>"^^strdf:WKT .

Spatial Literal (OpenGIS Simple Features)

Spatial Data Type Well-Known Text

Page 14: Geographica: A Benchmark for Geospatial RDF Stores

stRDF: An example (GML)

ex:BurntArea1 rdf:type noa:BurntArea.

ex:BurntArea1 noa:hasID "1"^^xsd:decimal.

ex:BurntArea1 noa:hasArea "23.7636"^^xsd:double.

ex:BurntArea1 strdf:hasGeometry

"<gml:Polygon

srsName='http://www.opengis.net/def/crs/EPSG/0/4121'>

<gml:outerBoundaryIs>

<gml:LinearRing>

<gml:coordinates>38.16,23.70 38.18,23.70 38.18,

23.80 38.16,23.80,38.16 23.70

</gml:coordinates>

</gml:LinearRing>

</gml:outerBoundaryIs>

</gml:Polygon>"^^strdf:GML .

Spatial Literal (GML Simple Features Profile)

Spatial Data Type GML

Page 15: Geographica: A Benchmark for Geospatial RDF Stores

• Find all burnt forests close to a citySELECT ?BA ?BAGEO

WHERE {

?R rdf:type noa:Region .

?R strdf:hasGeometry ?RGEO .

?R noa:hasLandCover ?F .

?F rdfs:subClassOf clc:Forests .

?CITY rdf:type dbpedia:City .

?CITY strdf:hasGeometry ?CGEO .

?BA rdf:type noa:BurntArea .

?BA strdf:hasGeometry ?BAGEO .

FILTER(strdf:anyInteract(?RGEO,?BAGEO) &&

strdf:distance(?BAGEO,?CGEO) < 0.02) }

stSPARQL: An example

Spatial Functions(OGC Simple Feature Access)

Page 16: Geographica: A Benchmark for Geospatial RDF Stores

stSPARQL: Geospatial SPARQL 1.1

• We start from SPARQL 1.1.

• We add a SPARQL extension function for each function defined in the OGC standard OpenGIS Simple Feature Access – Part 2: SQL option (ISO 19125) for adding geospatial data to relational DBMSs and SQL.

• We add appropriate geospatial extensions to SPARQL 1.1 Update language

Page 17: Geographica: A Benchmark for Geospatial RDF Stores

stSPARQL (cont’d)

• Basic functions

• Get a property of a geometry (e.g., strdf:srid)

• Get the desired representation of a geometry (e.g., strdf:AsText)

• Test whether a certain condition holds (e.g., strdf:IsEmpty, strdf:IsSimple)

• Functions for testing topological spatial relationships (e.g., strdf:equals, strdf:intersects)

• OGC Simple Features Access, Egenhofer, RCC-8

• Spatial analysis functions

• Construct new geometric objects from existing geometric objects (e.g., strdf:buffer, strdf:intersection, strdf:convexHull)

• Spatial metric functions (e.g., strdf:distance, strdf:area)

• Spatial aggregate functions (e.g., strdf:union, strdf:extent)

Page 18: Geographica: A Benchmark for Geospatial RDF Stores

stSPARQL (cont’d)

• SELECT clause

Construction of new geometries (e.g., strdf:buffer(?geo, 0.1))

Spatial aggregate functions (e.g., strdf:extent(?geo))

Metric functions (e.g., strdf:area(?geo))

• FILTER clause

Functions for testing topological spatial relationships between spatial terms (e.g., strdf:contains(?G1, strdf:union(?G2, ?G3)))

Numeric expressions involving spatial metric functions (e.g.,strdf:area(?G1)<=2*strdf:area(?G2)+1)

• HAVING clause

Spatial aggregate functions and spatial metric functions or functions testing for topological relationships between spatial terms (e.g., strdf:area(strdf:union(?geo))>1)

Page 19: Geographica: A Benchmark for Geospatial RDF Stores

• Isolate the parts of the burnt areas that lie in coniferous forests.

SELECT ?burntArea (strdf:intersection(?baGeom, strdf:union(?fGeom)) AS ?burntForest) WHERE {

?burntArea rdf:type noa:BurntArea; noa:hasGeometry [ noa:hasSerialization ?baGeo ].

?forest rdf:type noa:Region; clc:hasLandCover noa:coniferousForest;

clc:hasGeometry [ clc:hasSerialization ?fGeom ].

FILTER(strdf:intersects(?baGeom,?fGeom)) } GROUP BY ?burntArea ?baGeom

stSPARQL: An example

Page 20: Geographica: A Benchmark for Geospatial RDF Stores

GeoSPARQL

Core

Topology VocabularyExtension

- relation family

Geometry Extension - serialization - version

Geometry TopologyExtension

- serialization - version - relation family

Query RewriteExtension

- serialization - version - relation family

RDFS Entailment Extension

- serialization - version - relation family

Parameters• Serialization

• WKT• GML

• Relation Family• Simple

Features• RCC8• Egenhofer

Page 21: Geographica: A Benchmark for Geospatial RDF Stores

System Language Index Geometries CRS support

Comments on Functionality

Strabon stSPARQL/ GeoSPARQL*

R-tree-over-GiST

WKT / GML support

Yes • OGC-SFA• Egenhofer• RCC-8

Parliament GeoSPARQL R-Tree WKT / GML support

Yes •OGC-SFA•Egenhofer•RCC-8

Brodt et al. (RDF-3X)

SPARQL R-Tree WKT support No OGC-SFA

Perry SPARQL-ST R-Tree GeoRSS GML Yes RCC8

AllegroGraph Extended SPARQL

Distribution sweeping technique

2D point geometries

Partial •Buffer•Bounding Box•Distance

OWLIM Extended SPARQL

Custom 2D point geometries (W3C Basic Geo Vocabulary)

No •Point-in-polygon•Buffer•Distance

Virtuoso SPARQL R-Tree 2D point geometries (in WKT)

Yes SQL/MM (subset)

uSeekM GeoSPARQL R-tree-over GiST

WKT support No OGC-SFA

Page 22: Geographica: A Benchmark for Geospatial RDF Stores

235/26/2013

Outline

• Motivation

• SPARQL extensions for querying geospatial data expressed in RDF

• State-of-the-art geospatial RDF stores

• Related benchmarks

• The benchmark Geographica

• Evaluating the performance of geospatial RDF stores using Geographica

• Conclusions

Page 23: Geographica: A Benchmark for Geospatial RDF Stores

245/26/2013

Related Work

• Benchmarks for SPARQL query processing:

• LUBM (JWS 2005)

• DBpedia SPARQL Benchmark (ISWC 2011)

• …

• Benchmarks for geospatial relational DBMS:

• Sequoia 2000 (SIGMOD 1993)

• VESPA (BNCOD 2000)

• Jackpine (ICDE 2011)

• Benchmarks for spatial indexing and query processing operations

• Benchmarks for geospatial RDF stores:

• A Benchmark for Spatial Semantic Web Systems (Kolas, SSWS 2008)

Page 24: Geographica: A Benchmark for Geospatial RDF Stores

255/26/2013

The Benchmark Geographica

• Aim: measure the performance of today’s geospatial RDF stores

• Organized around two workloads:

• Real-world workload:

• Based on existing linked geospatial datasets and known application scenarios

• Synthetic workload:

• Measure performance in a controlled environment where we can play around with properties of the data and the queries.

• Γεωγραφικά: 17-volume geographical

encyclopedia by Στράβων (AD 17)

Page 25: Geographica: A Benchmark for Geospatial RDF Stores

265/26/2013

Real-World Workload

• Datasets: Real-world datasets for the geographic area of Greece playing an important role in the LOD cloud or having complex geometries

• LinkedGeoData (LGD) for rivers and roads in Greece

• GeoNames for Greece

• DBpedia for Greece

• Greek Administrative Geography (GAG)

• CORINE land cover (CLC) for Greece

• Hotspots

Page 26: Geographica: A Benchmark for Geospatial RDF Stores

275/26/2013

Real-World WorkloadDatasets

Dataset Size # of Triples

# of Points

# of Lines(max/min/avg points/line)

# of Polygons(max/min/avg

points/polygon)

GAG 33MB 4K - - 325 (15K/4/400)

CLC 401MB 630K - - 45K (5K/4/140)

LGD 29MB 150K - 12K (1.6K/2/21)

-

GeoNames 45MB 400K 22K - -

DBpedia 89MB 430K 8K - -

Hotspots 90MB 450K - - 37K (4/4/4)

user
Να κανατσεκάρω τα νούμερα στις παρενθέσεις.Μήπως να τα βάλω σε ξεχωριστή στήλη??
Page 27: Geographica: A Benchmark for Geospatial RDF Stores

285/26/2013

Real-World WorkloadParts

• For this workload, Geographica has two parts (following Jackpine):

• Micro part: Tests primitive spatial functions offered by geospatial RDF stores

• Macro part: Simulates some typical application scenarios

Page 28: Geographica: A Benchmark for Geospatial RDF Stores

295/26/2013

Real-World WorkloadMicro part

• 29 queries that consist of one or two triple patterns and a spatial function.

• Functions included:

• Spatial analysis: boundary, envelope, convex hull, buffer, area

• Topological: equals, intersects, overlaps, crosses, within, distance, disjoint

• As used in spatial selections and spatial joins

• Spatial aggregates: extent, union

• Functions are applied to many representative types of geometries .

Page 29: Geographica: A Benchmark for Geospatial RDF Stores

305/26/2013

Example – spatial analysis

• Construct the boundary of all polygons of CLC

PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

SELECT ( geof:boundary(?o1) as ?ret )

WHERE {

GRAPH <http://geographica.di.uoa.gr/dataset/clc>

{ ?s1 <http://geo.linkedopendata.gr/corine/ontology#asWKT> ?o1} }

Page 30: Geographica: A Benchmark for Geospatial RDF Stores

315/26/2013

Example – spatial selection

• Find all points in Geonames that are contained in a given polygon.

PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

SELECT ?s1 ?o1

WHERE {

GRAPH <http://geographica.di.uoa.gr/dataset/geonames>

{ ?s1 <http://www.geonames.org/ontology#asWKT> ?o1 }

FILTER( geof:sfWithin(?o1, "GIVEN_POLYGON_IN_WKT"^^<http://www.opengis.net/ont/geosparql#wktLiteral>)). }

Page 31: Geographica: A Benchmark for Geospatial RDF Stores

325/26/2013

Example – spatial join

• Find all pairs of GAG polygons that overlap

PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

SELECT ?s1 ?s2

WHERE {

GRAPH <http://geographica.di.uoa.gr/dataset/gag>

{?s1 <http://geo.linkedopendata.gr/gag/ontology/asWKT> ?o1}

GRAPH <http://geographica.di.uoa.gr/dataset/clc>

{?s2 <http://geo.linkedopendata.gr/corine/ontology#asWKT> ?o2}

FILTER( geof:sfOverlaps(?o1, ?o2) )

}

Page 32: Geographica: A Benchmark for Geospatial RDF Stores

Query Point Query Line Query Polygon

Point Within BufferDistance

WithinDisjoint

Line EqualsCrosses

IntersectsDisjoint

Polygon Intersects EqualsOverlaps

Real-World WorkloadMicro part

• Spatial Selections

• Spatial JoinsPoint Line Polygon

Point Equals Intersects IntersectsWithin

Line IntersectsWithinCrosses

Polygon WithinTouchesOverlaps

Page 33: Geographica: A Benchmark for Geospatial RDF Stores

345/26/2013

Real-World WorkloadMacro part

• Reverse Geocoding: Attribute a street address and place to a given point.

• Queries:

• Find the closest populated place (from GeoNames)

• Find the closest street (from LGD)

Page 34: Geographica: A Benchmark for Geospatial RDF Stores

355/26/2013

Real-World WorkloadMacro part

• Web Map Search and Browsing

• Queries:

• Find the co-ordinates of a given POI based on thematic criteria (from GeoNames)

• Find roads in a given bounding box around these co-ordinates (from LGD)

• Find other POI in a given bounding box around these co-ordinates (from LGD)

Page 35: Geographica: A Benchmark for Geospatial RDF Stores

365/26/2013

Real-World WorkloadMacro part

• Rapid Mapping for Fire Monitoring: representative of typical rapid mapping tasks carried out by space agencies in the case of an emergency

Page 36: Geographica: A Benchmark for Geospatial RDF Stores

375/26/2013

Real-World WorkloadMacro part

• Rapid Mapping for Fire Monitoring

• Queries:

• Simple tasks retrieving background mapping information:

• Find the land cover of areas inside a given bounding box (from CLC)

• Find primary roads inside a given bounding box (from LGD)

• Find capitals of prefectures inside a given bounding box (from GAG)

• (Often) more complex main mapping tasks:

• Find municipality boundaries inside a bounding box (from GAG)

• Find coniferous forests which are on fire (from CLC and Hotspots)

• Find road segments which may be damaged by fire (from LGD and Hotspots)

Page 37: Geographica: A Benchmark for Geospatial RDF Stores

385/26/2013

Experimental EvaluationReal-world workload

• Geospatial RDF stores tested: Strabon, Parliament, uSeekM

• Machine: Intel Xeon E5620, 12MB L3 cache, 2.4GHz, 24GB RAM, 4 HDD with RAID-5

• Micro part:

• Metric: response time

• Run 3 times and compute the median

• Time out: 1 hour

• Run both on warm caches and cold caches

• Macro part:

• Run each scenario many times for one hour with warm caches

• Metric: Average time for a complete execution

Page 38: Geographica: A Benchmark for Geospatial RDF Stores

395/26/2013

ResultsData Storage

• Strabon is the slowest given the PostGIS tables and indices it is building.

• uSeekM does much better using the Sesame native store

• Parliament slows down due to geo:asWKT subproperty inferencing

System Strabon uSeekM Parliament

Storage time 550 sec 214 sec 250 sec

Page 39: Geographica: A Benchmark for Geospatial RDF Stores

405/26/2013

ResultsMicro part (cold caches)

Page 40: Geographica: A Benchmark for Geospatial RDF Stores

415/26/2013

ResultsMacro part

Scenario Strabon uSeekM Parliament

Reverse Geocoding 65 sec 0.77 sec 2.6 sec

Map Search and Browsing

0.9 sec 0.6 sec 22.2 sec

Rapid Mapping for Fire Monitoring

207.4 sec - -

Page 41: Geographica: A Benchmark for Geospatial RDF Stores

425/26/2013

Synthetic Workload

• Goal: Evaluate performance in a controlled environment where we can vary the thematic and spatial selectivity of queries

• Thematic selectivity: the fraction of the total geographic features of a dataset that satisfy the non-spatial part of a query

• Spatial selectivity: the fraction of the total geographic features of a dataset which satisfy the topological relation in the FILTER clause of a query

Page 42: Geographica: A Benchmark for Geospatial RDF Stores

435/26/2013

Synthetic Workload

• Dataset: As in VESPA, the produced datasets are geographic features on a synthetic map:

• States in a country ( (n/3)2 )

• Land ownership (n2)

• Roads (n)

• POI (n2)

Page 43: Geographica: A Benchmark for Geospatial RDF Stores

445/26/2013

Synthetic WorkloadOntology

• Based roughly on the ontology of OpenStreetMap and the GeoSPARQL vocabulary

• Tagging each feature with a key enables us to select a known fraction of features in a uniform way

Page 44: Geographica: A Benchmark for Geospatial RDF Stores

455/26/2013

Synthetic WorkloadQuery template for spatial selections

SELECT ?s

WHERE {?s ns:hasGeometry ?g.?s c:hasTag ?tag.?g ns:asWKT ?wkt.?tag ns:hasKey “THEMA”

FILTER(FUNCTION(?wkt, “GEOM”))}

• Parameters:

• ns: specifies the kind of feature (and geometry type) examined

• THEMA: defines the thematic selectivity of the query using another parameter k

• FUNCTION: specifies the topological function examined

• GEOM: specifies a rectangle that controls the spatial selectivity of the query

Page 45: Geographica: A Benchmark for Geospatial RDF Stores

465/26/2013

Synthetic WorkloadQuery template for spatial joins

SELECT ?s1 ?s2

WHERE {?s1 ns1:hasGeometry ?g1.?s1 c:hasTag ?tag1.?g1 ns:asWKT ?wkt1.?tag1 ns:hasKey “THEMA”

?s2 ns2:hasGeometry ?g2.?s2 c:hasTag ?tag2.?g2 ns2:asWKT ?wkt2.?tag2 ns2:hasKey “THEMA’”

FILTER(FUNCTION(?wkt1, ?wkt2))}

Page 46: Geographica: A Benchmark for Geospatial RDF Stores

475/26/2013

Experimental EvaluationSynthetic workload

• Geospatial RDF stores tested: Strabon, Parliament, uSeekM

• Machine: Intel Xeon E5620, 12MB L3 cache, 2.4GHz, 24GB RAM, 4 HDD with RAID-5

• Details:

• Metric: response time

• Run 3 times and compute the median

• Time out: 1 hour

• Run both on warm caches and cold caches

Page 47: Geographica: A Benchmark for Geospatial RDF Stores

485/26/2013

ResultsData Storage

• We generate the synthetic dataset with n=512 and k=9. This results in:

• 28900 states

• 262144 land ownerships

• 512 roads

• 262144 points of interest

• Size: 3,880,224 triples (745 MB)

System Strabon uSeekM Parliament

Storage time 221 sec 406 sec 462 sec

Page 48: Geographica: A Benchmark for Geospatial RDF Stores

495/26/2013

ResultsSynthetic Workload (Spatial Selections)

IntersectsTag 1, cold caches

IntersectsTag 512, cold caches

Page 49: Geographica: A Benchmark for Geospatial RDF Stores

505/26/2013

ResultsSynthetic Workload (Spatial Joins)

Intersects

Page 50: Geographica: A Benchmark for Geospatial RDF Stores

515/26/2013

Outline

• Motivation

• SPARQL extensions for querying geospatial data expressed in RDF

• State-of-the-art geospatial RDF stores

• Related benchmarks

• The benchmark Geographica

• Evaluating the performance of geospatial RDF stores using Geographica

• Conclusions

Page 51: Geographica: A Benchmark for Geospatial RDF Stores

525/26/2013

Conclusions

• We defined Geographica, a new comprehensive benchmark for geospatial RDF stores, and used it to compare 3 relevant systems (Strabon, Parliament, uSeekM).

• More implementation work is necessary in adding features to other geospatial RDF stores beyond the ones tested.

• More real-world scenarios can be added.

• Next target: spatiotemporal RDF stores

Page 52: Geographica: A Benchmark for Geospatial RDF Stores

535/26/2013

Advertisement

Strabon: http://strabon.di.uoa.gr

Geographica: http://geographica.di.uoa.gr

Tutorials/Survey paper

More at ESWC

Paper: Storing and Querying the Valid Time of Triples in Linked Geospatial Data

Demo: Sextant, a web tool for browsing and mapping Linked Geospatial Data http://strabon.di.uoa.gr:8080/sextant/

Project networking: TELEIOS http://www.earthobservatory.eu