KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association www.kit.edu
Finding Spatial Equivalences Across Multiple RDF Datasets
Juan Salas, Andreas Harth
Outline
Motivation
NeoGeo Vocabularies
Geospatial Datasets
Integration Challenges
Finding Geometric Equivalences
Conclusion
Motivation
Geodata is becoming increasingly relevant.
  Location-based services
  Mobile applications
  Ever-increasing amount of sensor data (phones, satellites)
Different sources.
Many formats:
GML, KML, Shapefile, GPX, WKT, RDF?…
Applications require integrated access to geodata.
NeoGeo Vocabularies
Geometry Vocabulary – http://geovocab.org/geometry
  Representation of georeferenced geometric shapes.
Spatial Ontology – http://geovocab.org/spatial
  Representation and reasoning on topological relations based on the Region Connection Calculus (RCC).
Geospatial Datasets
GADM-RDF – http://gadm.geovocab.org
  RDF representation of the administrative regions of the GADM project: http://gadm.org
NUTS-RDF – http://nuts.geovocab.org
  RDF representation of Eurostat's NUTS nomenclature.
They serve as:
  New geospatial information on the Semantic Web.
  Bridges between already published spatial datasets.
  Proof-of-concept platforms.
Integration Challenges
Vocabularies – http://geovocab.org/doc/survey.html
  Survey of several well-known Linked Data datasets (Ordnance Survey, GeoLinkedData.es, LinkedGeoData.org, GeoNames, DBpedia).
  Identified properties and classes mapped to the NeoGeo vocabularies published at GeoVocab.org.
Instances
  Finding equivalences between regions across multiple datasets at the geometry level.
Finding Geometric Equivalences
NUTS-RDF and GADM-RDF have different:
  Sampling values
  Scales
  Starting points
  Rounding effects
Geometric shapes will not be vertex by vertex equivalent.
A sensible criterion for finding geometric equivalences is needed.
Algorithm Overview
[Figure: overview diagram with the stages "WGS-84, Plate Carrée projection", "Hausdorff distance" and "spatial:EQ"]
1. Retrieve sample data
The algorithm requires:
  WGS-84 coordinate reference system.
  Plate Carrée projection:
    X = longitude
    Y = latitude
Coordinates are treated as Cartesian.
  This distorts all parameters (area, shape, distance, direction).
  Geometric shapes are equally distorted in both datasets.
  Local reprojections (e.g. UTM) are avoided.
Units are given in decimal degrees.
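As a minimal sketch of what treating coordinates as Cartesian means, the snippet below maps longitude and latitude straight to x and y and computes a polygon area with the shoelace formula, so the result is in square degrees. The function names and the 2° x 1° box are hypothetical and only serve as an illustration.

```python
def plate_carree(lon, lat):
    """Plate Carrée: longitude becomes x and latitude becomes y, unchanged."""
    return (lon, lat)


def polygon_area(ring):
    """Shoelace formula over a ring of (x, y) vertices."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical 2 x 1 degree box, given as (longitude, latitude) pairs.
box = [(8.0, 48.0), (10.0, 48.0), (10.0, 49.0), (8.0, 49.0)]
ring = [plate_carree(lon, lat) for lon, lat in box]
print(polygon_area(ring))  # 2.0 (square degrees)
```

Areas computed this way are not comparable across latitudes, which is acceptable here only because both datasets are distorted in the same way.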
2. Similarity threshold function
The Hausdorff distance provides a measure of similarity between geometric shapes.
It can be intuitively defined as the largest distance from a point on either shape to the closest point on the other shape.
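The intuitive definition can be sketched in a few lines. This version is a discrete approximation that only looks at the vertices of both shapes, not at every point on their boundaries, and the function names are illustrative rather than taken from the slides.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a vertex of a to its nearest vertex of b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two vertex lists."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Two hypothetical shapes that differ in a single vertex.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
shifted = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.5)]
print(hausdorff(square, shifted))  # 0.5
```

The cost is quadratic in the number of vertices, which is why the vertex counts matter for the running time.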
2. Similarity threshold function
Smaller regions need a lower Hausdorff Distance threshold than larger regions.
2. Similarity threshold function
NUTS Name         NUTS Area   GADM Name         Hausdorff Distance   Midpoint Value
ESPAÑA            53.47       España            1.63                 10.39
                              Tamanghasset      19.15
ΕΛΛΑΔΑ / ELLADA   13.16       Ellas or Ellada   1.05                 3.7
                              Bulgaria          6.34
ÖSTERREICH        10.07       Österreich        0.18                 2.06
                              Ceská republika   3.93
Hedmark           4.61        Hedmark           0.48                 2.93
                              Oppland           2.45
Somme             0.78        Somme             0.32                 0.5
                              Oise              0.67
We calculate the midpoint value between the Hausdorff Distances for a correct guess and the lowest wrong guess.
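As a worked example, the midpoint is simply the mean of the two distances. The sketch below recomputes it for four rows of the table. The helper name is hypothetical, and the table shows the results rounded (10.39, 3.7, 2.06, 0.5). The Hedmark row is left out because its listed value, 2.93, equals the sum of the two distances rather than half of it.

```python
def midpoint(correct, lowest_wrong):
    """Mean of the distance to the correct match and to the best wrong one."""
    return (correct + lowest_wrong) / 2.0


# (Hausdorff distance of the correct guess, of the lowest wrong guess)
rows = {
    "ESPAÑA": (1.63, 19.15),
    "ΕΛΛΑΔΑ / ELLADA": (1.05, 6.34),
    "ÖSTERREICH": (0.18, 3.93),
    "Somme": (0.32, 0.67),
}
for name, (correct, wrong) in rows.items():
    print(name, round(midpoint(correct, wrong), 3))
```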
2. Similarity threshold function
We perform regression on the midpoint values to obtain the Hausdorff Distance threshold function.
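The slides do not state which regression model was fitted, so the following is only a sketch under an explicit assumption: a logarithmic curve, threshold = a * ln(area) + b, fitted by ordinary least squares. The function names are hypothetical, and the input is the five (NUTS area, midpoint value) pairs listed in the table.

```python
import math


def fit_log_curve(areas, midpoints):
    """Least-squares fit of midpoint = a * ln(area) + b (assumed model)."""
    xs = [math.log(area) for area in areas]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(midpoints) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, midpoints))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def threshold(area, a, b):
    """Hausdorff distance threshold for a region of the given area."""
    return a * math.log(area) + b


# (NUTS area, midpoint value) pairs as listed in the table.
areas = [53.47, 13.16, 10.07, 4.61, 0.78]
midpoints = [10.39, 3.7, 2.06, 2.93, 0.5]
a, b = fit_log_curve(areas, midpoints)
print(round(threshold(6.21, a, b), 2))
```

Five sample points are far too few to reproduce the threshold values used later in the slides. The point is only that the threshold grows with the area of the region.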
3. Finding spatial equivalences
NUTS Name         NUTS Area   GADM Name   Hausdorff Distance   Threshold Function   spatial:EQ
HRVATSKA          6.21        Hrvatska    1.14                 3.49                 Yes
NEDERLAND         4.83        Nederland   0.39                 2.96                 Yes
LIETUVA           9.15        Lietuva     0.47                 4.31                 Yes
ΕΛΛΑΔΑ / ELLADA   13.17       Bulgaria    6.35                 5.08                 No
UNITED KINGDOM    33.03       France      12.6                 7.02                 No
Córdoba           1.41        Sevilla     1.41                 0.37                 No
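The decision in the last column can be sketched as a single comparison: a candidate pair is linked with spatial:EQ when its Hausdorff distance is below the threshold for the region's area. The function name is hypothetical, and the numbers are the ones from the table.

```python
def is_spatially_equal(hausdorff_distance, threshold):
    """A pair qualifies for spatial:EQ when the distance is under the threshold."""
    return hausdorff_distance < threshold


# (NUTS name, GADM name, Hausdorff distance, threshold function value)
candidates = [
    ("HRVATSKA", "Hrvatska", 1.14, 3.49),
    ("NEDERLAND", "Nederland", 0.39, 2.96),
    ("LIETUVA", "Lietuva", 0.47, 4.31),
    ("ΕΛΛΑΔΑ / ELLADA", "Bulgaria", 6.35, 5.08),
    ("UNITED KINGDOM", "France", 12.6, 7.02),
    ("Córdoba", "Sevilla", 1.41, 0.37),
]
verdicts = [is_spatially_equal(dist, thr) for _, _, dist, thr in candidates]
for (nuts, gadm, _, _), equal in zip(candidates, verdicts):
    print(f"{nuts} / {gadm}: {'spatial:EQ' if equal else 'no link'}")
```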
Poor Geospatial Information
Sometimes location is approximated as a single point.
This can lead to false assertions while calculating containment relations.
<http://dbpedia.org/resource/Germany> geo:lat 52.516666; geo:long 13.383333 .
<http://nuts.geovocab.org/id/DE30_geometry> rdf:type ngeo:Polygon .
Germany is not contained in Berlin.
Other properties must be considered to calculate containment relations (e.g. rdf:type).
Other spatial relations (e.g. spatial:EQ) cannot be calculated.
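The pitfall can be reproduced with a plain point-in-polygon test. In the sketch below the point is the DBpedia coordinate for Germany quoted above, while the rectangle is a hypothetical stand-in for the DE30 (Berlin) polygon, not its real geometry.

```python
def point_in_polygon(point, ring):
    """Ray casting: toggle on every edge crossed by a ray going right."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# DBpedia's point for Germany, as (longitude, latitude).
germany_point = (13.383333, 52.516666)
# Hypothetical rectangle standing in for the DE30 (Berlin) polygon.
berlin_ring = [(13.09, 52.34), (13.76, 52.34), (13.76, 52.68), (13.09, 52.68)]
print(point_in_polygon(germany_point, berlin_ring))  # True
```

A purely geometric test therefore reports Germany as lying within Berlin, which is the false assertion the slide warns about.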
Optimizations
The cost of calculating the Hausdorff distance depends on the number of vertices.
The Ramer-Douglas-Peucker algorithm simplifies geometric shapes by dropping vertices that lie within a chosen maximum separation of the simplified outline.
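A compact recursive sketch of Ramer-Douglas-Peucker on an open polyline follows. It measures the distance to the chord segment between the end points (a common variant of the perpendicular distance), and the function names and sample line are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Ramer-Douglas-Peucker simplification of an open polyline."""
    if len(points) < 3:
        return list(points)
    # Find the vertex farthest from the chord between the end points.
    index, farthest = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > farthest:
            index, farthest = i, d
    if farthest <= tolerance:
        return [points[0], points[-1]]
    # Keep that vertex and simplify both halves independently.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right


line = [(0.0, 0.0), (1.0, 0.1), (2.0, -0.1), (3.0, 5.0), (4.0, 6.0), (5.0, 7.0)]
print(simplify(line, 0.2))
# [(0.0, 0.0), (2.0, -0.1), (3.0, 5.0), (5.0, 7.0)]
```

A larger tolerance removes more vertices and speeds up the distance calculation, at the price of a less exact Hausdorff distance.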
Optimizations
Region Name          NUTS Points   GADM Points   Hausdorff Distance (Original)   Time [ms] (Original)   Hausdorff Distance (0.2 Simplif.)   Time [ms] (0.2 Simplif.)
Finland              389           107783        1.3996                          30353                  1.3483                              2504
Croatia              175           193180        1.1374                          7830                   1.1366                              1108
Schleswig-Holstein   118           28001         0.7281                          1870                   0.7257                              296
Iceland              320           7610          0.4163                          567                    0.4613                              66
Karlsruhe            47            1021          0.1062                          35                     0.1906                              13
Seine-Saint-Denis    6             30            0.0812                          1                      0.0716                              2
Spatial Databases
The algorithm also works well with spatial databases (e.g. PostgreSQL / PostGIS):
SELECT g.gadm_id, n.nuts_id
FROM nuts n
-- Cheap pre-filter: bounding boxes must overlap.
INNER JOIN gadm g ON (n.geometry && g.geometry)
-- Areas may differ by at most 10%.
WHERE n.shape_area BETWEEN (g.shape_area * 0.9) AND (g.shape_area * 1.1)
  -- Hausdorff distance of the simplified shapes must be under the threshold.
  AND ST_HausdorffDistance(
        ST_SimplifyPreserveTopology(n.geometry, 0.5),
        ST_SimplifyPreserveTopology(g.geometry, 0.5)
      ) < g.max_hausdorff_dist;
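The two cheap conditions of the query can be mirrored outside the database as a pre-filter that runs before the expensive Hausdorff comparison. This is a sketch with hypothetical helper names: the first check corresponds to the bounding-box overlap operator `&&`, the second to the `BETWEEN` clause on the areas.

```python
def bounding_box(ring):
    """Return (min_x, min_y, max_x, max_y) of a list of (x, y) vertices."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_overlap(a, b):
    """True when two (min_x, min_y, max_x, max_y) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def areas_compatible(nuts_area, gadm_area, tolerance=0.1):
    """True when the NUTS area lies within +/- tolerance of the GADM area."""
    return gadm_area * (1 - tolerance) <= nuts_area <= gadm_area * (1 + tolerance)


# Hypothetical rings, only to exercise the filter.
nuts_ring = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
gadm_ring = [(0.1, 0.0), (2.1, 0.0), (2.1, 1.0), (0.1, 1.0)]
far_ring = [(10.0, 10.0), (12.0, 10.0), (12.0, 11.0), (10.0, 11.0)]
print(boxes_overlap(bounding_box(nuts_ring), bounding_box(gadm_ring)))  # True
print(boxes_overlap(bounding_box(nuts_ring), bounding_box(far_ring)))   # False
print(areas_compatible(1.05, 1.0))  # True
```

Only pairs that pass both checks would need the Hausdorff distance computed.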
Evaluation
Not every NUTS region matches a GADM region.
Many NUTS regions represent parts or aggregations of GADM administrative boundaries.
1,671 NUTS regions => 965 matches & 13 false positives.
NUTS UKF2: Leicestershire, Rutland and Northamptonshire
GADM 2_13988: Leicestershire
Evaluation
NUTS Region   NUTS Area   Incorrect GADM guess    Hausdorff Distance
UKM34         0.0214      East Renfrewshire       0.1862
FR106         0.0334      Val-De-Marne            0.1644
BE321         0.0654      Soignies                0.3521
BE353         0.1188      Thuin                   0.2834
CH061         0.1672      Aargau                  0.3653
LT            9.5204      Latvija                 2.5098
LI            0.0205      Appenzell Innerrhoden   0.2783
UKM28         0.0689      North Lanarkshire       0.3478
BE331         0.1013      Liège                   0.335
BE353         0.1188      Thuin                   0.2834
CH061         0.1672      Aargau                  0.3653
SE3           60.585      Norge                   7.8658
BE321         0.0654      Soignies                0.3521
Conclusion
NeoGeo vocabularies:
  Survey and mappings to other vocabularies.
NUTS-RDF and GADM-RDF datasets:
  GADM-RDF links to DBpedia, UK Ordnance Survey and NUTS-RDF.
  Linked Data Services for accessing/querying spatial indices (withinRegion, boundingBox).
Work on spatial similarity metrics:
  Promising results.
Future Work
NeoGeo vocabularies:
  Temporal context.
Datasets:
  More Earth and space science data.
  Add more instance mappings.
Spatial similarity:
  Improve precision.
  Develop tools to support the mapping process.
More experiments:
  Querying of integrated data and reasoning.
Acknowledgements
European Commission's Seventh Framework Programme FP7/2007-2013 (PlanetData, Grant 257641)