
Finding Spatial Equivalences Across Multiple RDF Datasets


KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association www.kit.edu

Juan Salas, Andreas Harth


Outline

Motivation

NeoGeo Vocabularies

Geospatial Datasets

Integration Challenges

Finding Geometric Equivalences

Conclusion


Motivation

Geodata is becoming increasingly relevant:
• Location-based services
• Mobile applications
• Ever-increasing amount of sensor data (phones, satellites)

Geodata comes from different sources, in many formats:

GML, KML, Shapefile, GPX, WKT, RDF?…

Applications require integrated access to geodata.


NeoGeo Vocabularies

Geometry Vocabulary – http://geovocab.org/geometry
Representation of georeferenced geometric shapes.

Spatial Ontology – http://geovocab.org/spatial
Representation of, and reasoning on, topological relations based on the Region Connection Calculus (RCC).


Geospatial Datasets

GADM-RDF – http://gadm.geovocab.org
RDF representation of the administrative regions of the GADM project: http://gadm.org

NUTS-RDF – http://nuts.geovocab.org
RDF representation of Eurostat's NUTS nomenclature.

They serve as:
• New geospatial information on the Semantic Web.
• Bridges between already published spatial datasets.
• Proof-of-concept platforms.


Integration Challenges

Vocabularies – http://geovocab.org/doc/survey.html

A survey of several well-known Linked Data datasets (Ordnance Survey, GeoLinkedData.es, LinkedGeoData.org, GeoNames, DBpedia). The properties and classes identified were mapped to the NeoGeo vocabularies published at GeoVocab.org.

Instances – Finding equivalences between regions across multiple datasets at the geometry level.


Finding Geometric Equivalences

NUTS-RDF and GADM-RDF have different:
• Sampling values
• Scales
• Starting points
• Rounding effects

As a result, geometric shapes will not be equivalent vertex by vertex.

A sensible criterion for finding geometric equivalences is needed.
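The following minimal sketch shows why an exact comparison fails. It uses two hypothetical rings (lists of (lon, lat) vertices) that trace the same unit square. The second ring starts at a different corner, has an extra vertex sampled on one edge, and carries a small rounding error. Comparing the vertex lists directly reports a mismatch, even though the two shapes are practically identical.

```python
# Two hypothetical encodings of the same unit square, as (lon, lat) vertex lists.
ring_a = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
# Different starting point, an extra vertex sampled on an edge, and a rounding error.
ring_b = [(1.0, 0.0), (1.0, 0.5), (1.0, 1.0), (0.0, 0.9999999), (0.0, 0.0)]

print(ring_a == ring_b)            # False: vertex order and count differ
print(set(ring_a) == set(ring_b))  # False: ignoring order does not help either
```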


Algorithm Overview

1. Retrieve sample data (WGS-84, Plate Carrée projection)
2. Similarity threshold function (Hausdorff distance)
3. Finding spatial equivalences (spatial:EQ)


1. Retrieve sample data

The algorithm requires:
• The WGS-84 coordinate reference system.
• The Plate Carrée projection: X = longitude, Y = latitude.

Coordinates are treated as Cartesian. This distorts all parameters (area, shape, distance, direction), but:
• Geometric shapes are equally distorted in both datasets.
• Local reprojections (e.g. UTM) are avoided.
• Units are given in decimal degrees (areas in square degrees).
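The following minimal sketch illustrates the distortion. It computes polygon areas with the shoelace formula, treating (lon, lat) directly as (x, y). The helper `degree_box` and the two sample cells are hypothetical. A 1° × 1° cell has the same planar area at the equator and at 60° N, although on the ground the northern cell covers only roughly half as much area. Because both datasets are measured the same way, their areas remain comparable with each other.

```python
def planar_area(ring):
    """Shoelace formula: polygon area, treating (lon, lat) as Cartesian (x, y)."""
    s = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0  # result in square degrees


def degree_box(lon, lat):
    """Hypothetical 1 x 1 degree cell with its south-west corner at (lon, lat)."""
    return [(lon, lat), (lon + 1, lat), (lon + 1, lat + 1), (lon, lat + 1)]


equator_cell = degree_box(10.0, 0.0)
nordic_cell = degree_box(10.0, 60.0)
print(planar_area(equator_cell), planar_area(nordic_cell))  # 1.0 1.0
```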


2. Similarity threshold function

The Hausdorff distance provides a measure of similarity between geometric shapes.

Intuitively, it is the greatest distance from any point of one shape to the nearest point of the other shape.
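Formally, let A and B be point sets and d a distance. Here d is Euclidean, since coordinates are treated as Cartesian. The Hausdorff distance is then:

$$
d_H(A, B) = \max\Big\{ \sup_{a \in A} \inf_{b \in B} d(a, b),\; \sup_{b \in B} \inf_{a \in A} d(a, b) \Big\}
$$

The following minimal sketch restricts A and B to the polygons' vertices. This is a discrete approximation and can differ from the exact distance between polygon boundaries. Its cost is O(n·m) in the vertex counts. The sample squares are hypothetical.

```python
import math


def directed_hausdorff(a_pts, b_pts):
    """Largest distance from a point of A to its nearest point of B."""
    return max(min(math.dist(a, b) for b in b_pts) for a in a_pts)


def hausdorff(a_pts, b_pts):
    """Symmetric discrete Hausdorff distance between two vertex sets."""
    return max(directed_hausdorff(a_pts, b_pts), directed_hausdorff(b_pts, a_pts))


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
# Same square with a different starting vertex and a small rounding error.
resampled = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0001), (0.0, 0.0)]
# Same shape, moved three degrees east.
shifted = [(x + 3.0, y) for x, y in square]

print(round(hausdorff(square, resampled), 4))  # 0.0001: practically identical
print(hausdorff(square, shifted))              # 3.0: clearly different regions
```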


2. Similarity threshold function

Smaller regions need a lower Hausdorff distance threshold than larger regions.


2. Similarity threshold function

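| NUTS Name | NUTS Area (deg²) | Correct GADM Match | Hausdorff Distance (deg) | Lowest Wrong GADM Guess | Hausdorff Distance (deg) | Midpoint Value |
|---|---|---|---|---|---|---|
| ESPAÑA | 53.47 | España | 1.63 | Tamanghasset | 19.15 | 10.39 |
| ΕΛΛΑΔΑ / ELLADA | 13.16 | Ellas or Ellada | 1.05 | Bulgaria | 6.34 | 3.7 |
| ÖSTERREICH | 10.07 | Österreich | 0.18 | Česká republika | 3.93 | 2.06 |
| Hedmark | 4.61 | Hedmark | 0.48 | Oppland | 2.45 | 2.93 |
| Somme | 0.78 | Somme | 0.32 | Oise | 0.67 | 0.5 |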

For each region, we calculate the midpoint between the Hausdorff distance of the correct match and that of the lowest (closest) wrong guess.
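The following minimal sketch shows the midpoint arithmetic for four of the rows above. A threshold placed at the midpoint accepts the correct match while rejecting the closest wrong guess.

```python
# (NUTS name, Hausdorff distance of correct match, Hausdorff distance of lowest wrong guess)
rows = [
    ("ESPAÑA", 1.63, 19.15),
    ("ΕΛΛΑΔΑ / ELLADA", 1.05, 6.34),
    ("ÖSTERREICH", 0.18, 3.93),
    ("Somme", 0.32, 0.67),
]


def midpoint(correct_hd, wrong_hd):
    """Value halfway between the correct match and the closest wrong guess."""
    return (correct_hd + wrong_hd) / 2.0


midpoints = {name: round(midpoint(c, w), 3) for name, c, w in rows}
print(midpoints)
```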


2. Similarity threshold function

We perform regression on the midpoint values to obtain the Hausdorff distance threshold function.
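The slides do not state the form of the regression. The following sketch assumes a logarithmic model, threshold(area) = a + b·ln(area), fitted by ordinary least squares to the (NUTS area, midpoint) pairs from the table above. This assumption is suggested by the threshold column of the next table, which is closely reproduced by roughly 2.11·ln(area) − 0.36. That is an inference from the numbers, not something the slides confirm. The function name `fit_log_threshold` is hypothetical.

```python
import math


def fit_log_threshold(areas, midpoints):
    """Least-squares fit of midpoint ~ a + b * ln(area); returns (threshold_fn, a, b)."""
    xs = [math.log(area) for area in areas]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(midpoints) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, midpoints))
    b = sxy / sxx              # slope on ln(area)
    a = mean_y - b * mean_x    # intercept

    def threshold(area):
        """Maximum Hausdorff distance accepted for a region of this area."""
        return a + b * math.log(area)

    return threshold, a, b


# (NUTS area, midpoint value) pairs from the table above.
areas = [53.47, 13.16, 10.07, 4.61, 0.78]
mids = [10.39, 3.7, 2.06, 2.93, 0.5]
threshold, a, b = fit_log_threshold(areas, mids)
print(f"threshold(area) ~ {a:.2f} + {b:.2f} * ln(area)")
```

A logarithmic curve becomes negative for sufficiently small areas, so a practical version would likely need a lower bound. The slides do not say how this case is handled.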


3. Finding spatial equivalences

| NUTS Name | NUTS Area (deg²) | GADM Name | Hausdorff Distance (deg) | Threshold Function | spatial:EQ |
|---|---|---|---|---|---|
| HRVATSKA | 6.21 | Hrvatska | 1.14 | 3.49 | Yes |
| NEDERLAND | 4.83 | Nederland | 0.39 | 2.96 | Yes |
| LIETUVA | 9.15 | Lietuva | 0.47 | 4.31 | Yes |
| ΕΛΛΑΔΑ / ELLADA | 13.17 | Bulgaria | 6.35 | 5.08 | No |
| UNITED KINGDOM | 33.03 | France | 12.6 | 7.02 | No |
| Córdoba | 1.41 | Sevilla | 1.41 | 0.37 | No |
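The decision rule behind the spatial:EQ column can be sketched as follows. A pair is asserted equivalent only when its Hausdorff distance falls below the threshold for that area. The strict comparison matches the `<` used in the PostGIS query later in the deck. The function name `is_spatial_eq` is hypothetical.

```python
# Rows from the table above: (NUTS name, GADM name, Hausdorff distance, threshold value)
candidates = [
    ("HRVATSKA", "Hrvatska", 1.14, 3.49),
    ("NEDERLAND", "Nederland", 0.39, 2.96),
    ("LIETUVA", "Lietuva", 0.47, 4.31),
    ("ΕΛΛΑΔΑ / ELLADA", "Bulgaria", 6.35, 5.08),
    ("UNITED KINGDOM", "France", 12.6, 7.02),
    ("Córdoba", "Sevilla", 1.41, 0.37),
]


def is_spatial_eq(hausdorff_distance, threshold):
    """Assert spatial:EQ only when the shapes are closer than the area-dependent threshold."""
    return hausdorff_distance < threshold


decisions = [is_spatial_eq(hd, thr) for _, _, hd, thr in candidates]
for (nuts, gadm, _, _), eq in zip(candidates, decisions):
    print(f"{nuts} vs {gadm}: {'Yes' if eq else 'No'}")
```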


Poor Geospatial Information

Sometimes a location is approximated as a single point. This can lead to false assertions when calculating containment relations:

<http://dbpedia.org/resource/Germany> geo:lat 52.516666; geo:long 13.383333 .

<http://nuts.geovocab.org/id/DE30_geometry> rdf:type ngeo:Polygon .

The point given for Germany lies inside the polygon of Berlin (NUTS region DE30). A naive containment check would therefore conclude that Germany is within Berlin, but Germany is not contained in Berlin.

• Other properties (e.g. rdf:type) must be considered when calculating containment relations.
• Other spatial relations (e.g. spatial:EQ) cannot be calculated from a single point.
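The following minimal even-odd ray-casting test shows how the false assertion arises. The polygon is a hypothetical, rough bounding box around Berlin (about 13.09 to 13.77° E, 52.33 to 52.68° N), not the actual DE30 geometry. The point is the one published for dbpedia:Germany.

```python
def point_in_polygon(point, ring):
    """Even-odd ray casting: cast a ray towards +x and count edge crossings."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical rough box around Berlin as (lon, lat); NOT the real DE30 polygon.
berlin_box = [(13.09, 52.33), (13.77, 52.33), (13.77, 52.68), (13.09, 52.68)]
# Point published for dbpedia:Germany (it is actually a location in Berlin).
germany_point = (13.383333, 52.516666)

print(point_in_polygon(germany_point, berlin_box))  # True -> naive "Germany within Berlin"
```

The geometry alone cannot tell a country's representative point from a city's extent. This is why additional properties such as rdf:type are needed before asserting containment.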


Optimizations

The cost of calculating the Hausdorff distance depends on the number of vertices.

The Ramer-Douglas-Peucker algorithm simplifies geometric shapes by discarding vertices that lie within an arbitrary maximum separation (tolerance) of the simplified line.
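The following minimal recursive sketch implements Ramer-Douglas-Peucker on an open polyline. It uses the 0.2 tolerance from the timing table on a hypothetical sample line. The algorithm keeps the vertex farthest from the chord if it is more than `epsilon` away, and recurses on both halves. Otherwise it drops every interior vertex.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.dist(p, a)
    # Project p onto the segment, clamping to the endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.dist(p, (ax + t * dx, ay + t * dy))


def rdp(points, epsilon):
    """Ramer-Douglas-Peucker simplification of an open polyline."""
    if len(points) < 3:
        return list(points)
    # Find the interior vertex farthest from the chord between the endpoints.
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [points[0], points[-1]]  # all interior vertices within tolerance
    left = rdp(points[: index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    return left[:-1] + right  # avoid duplicating the split vertex


line = [(0.0, 0.0), (1.0, 0.05), (2.0, -0.05), (3.0, 2.0), (4.0, 2.1), (5.0, 2.0)]
print(rdp(line, 0.2))
```

Plain RDP can introduce self-intersections in polygons. The PostGIS query later in the deck uses `ST_SimplifyPreserveTopology`, which is designed to avoid that.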


Optimizations

| Region Name | NUTS Points | GADM Points | Hausdorff Distance (original) | Time [ms] (original) | Hausdorff Distance (0.2 simplification) | Time [ms] (0.2 simplification) |
|---|---|---|---|---|---|---|
| Finland | 389 | 107783 | 1.3996 | 30353 | 1.3483 | 2504 |
| Croatia | 175 | 193180 | 1.1374 | 7830 | 1.1366 | 1108 |
| Schleswig-Holstein | 118 | 28001 | 0.7281 | 1870 | 0.7257 | 296 |
| Iceland | 320 | 7610 | 0.4163 | 567 | 0.4613 | 66 |
| Karlsruhe | 47 | 1021 | 0.1062 | 35 | 0.1906 | 13 |
| Seine-Saint-Denis | 6 | 30 | 0.0812 | 1 | 0.0716 | 2 |
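The trade-off in the table can be quantified with a little arithmetic. The sketch below computes the speedup and the change in Hausdorff distance for each region. For large shapes such as Finland, simplification is about 12 times faster. For the tiny Seine-Saint-Denis shape it is slower (1 ms vs 2 ms), so simplification appears to pay off mainly for shapes with many vertices.

```python
# (region, Hausdorff original, time original [ms], Hausdorff simplified, time simplified [ms])
results = [
    ("Finland", 1.3996, 30353, 1.3483, 2504),
    ("Croatia", 1.1374, 7830, 1.1366, 1108),
    ("Schleswig-Holstein", 0.7281, 1870, 0.7257, 296),
    ("Iceland", 0.4163, 567, 0.4613, 66),
    ("Karlsruhe", 0.1062, 35, 0.1906, 13),
    ("Seine-Saint-Denis", 0.0812, 1, 0.0716, 2),
]

speedups = {}
for region, hd_orig, t_orig, hd_simp, t_simp in results:
    speedups[region] = t_orig / t_simp       # > 1 means simplification was faster
    change = hd_simp - hd_orig               # effect of simplification on the distance
    print(f"{region:20s} speedup x{speedups[region]:5.1f}  Hausdorff change {change:+.4f}")
```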


Spatial Databases

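The algorithm also works well with spatial databases (e.g. PostgreSQL/PostGIS):

-- Candidate pairs: NUTS and GADM regions whose bounding boxes overlap (&&)
SELECT g.gadm_id, n.nuts_id
FROM nuts n
INNER JOIN gadm g ON (n.geometry && g.geometry)
-- Areas must agree within +/-10%
WHERE n.shape_area BETWEEN (g.shape_area * 0.9) AND (g.shape_area * 1.1)
  -- Compare simplified shapes (tolerance 0.5) to keep the Hausdorff computation cheap
  AND ST_HausdorffDistance(
        ST_SimplifyPreserveTopology(n.geometry, 0.5),
        ST_SimplifyPreserveTopology(g.geometry, 0.5)
      ) < g.max_hausdorff_dist;  -- per-region threshold, presumably precomputed from the threshold function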
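For readers without a spatial database, the following in-memory sketch mirrors the three filters of the query: bounding-box overlap, area agreement within ±10%, and Hausdorff distance below a threshold. It uses a vertex-based Hausdorff distance and skips simplification. The region ids, sample squares and `threshold` lambda are hypothetical. Computing the threshold from the GADM area is an assumption based on the query's use of `g.max_hausdorff_dist`.

```python
import math


def bbox(ring):
    xs, ys = zip(*ring)
    return min(xs), min(ys), max(xs), max(ys)


def bboxes_overlap(r1, r2):
    """Bounding-box intersection test, similar in spirit to PostGIS's && operator."""
    ax0, ay0, ax1, ay1 = bbox(r1)
    bx0, by0, bx1, by1 = bbox(r2)
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1


def area(ring):
    """Shoelace area in square degrees (lon/lat treated as x/y)."""
    n = len(ring)
    s = sum(ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1] for i in range(n))
    return abs(s) / 2.0


def hausdorff(a, b):
    """Discrete (vertex-based) Hausdorff distance."""
    def directed(p, q):
        return max(min(math.dist(x, y) for y in q) for x in p)
    return max(directed(a, b), directed(b, a))


def find_equivalences(nuts, gadm, threshold):
    """nuts, gadm: dicts id -> ring; threshold: area -> max Hausdorff distance."""
    matches = []
    for n_id, n_ring in nuts.items():
        n_area = area(n_ring)
        for g_id, g_ring in gadm.items():
            if not bboxes_overlap(n_ring, g_ring):
                continue  # cheap spatial pre-filter
            g_area = area(g_ring)
            if not (0.9 * g_area <= n_area <= 1.1 * g_area):
                continue  # areas must agree within +/-10% (BETWEEN is inclusive)
            if hausdorff(n_ring, g_ring) < threshold(g_area):
                matches.append((g_id, n_id))
    return matches


nuts = {"N1": [(0, 0), (1, 0), (1, 1), (0, 1)], "N2": [(5, 5), (7, 5), (7, 7), (5, 7)]}
gadm = {"G1": [(1, 0), (1, 1), (0, 1.01), (0, 0)], "G2": [(5, 5), (6, 5), (6, 6), (5, 6)]}
print(find_equivalences(nuts, gadm, lambda a: 0.1 + 0.2 * a))  # [('G1', 'N1')]
```

Here N2 and G2 overlap but fail the area filter (4 vs 1 square degree), so the costly Hausdorff computation is never reached for them.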


Evaluation

Not every NUTS region matches a GADM region. Many NUTS regions represent parts or aggregations of GADM administrative boundaries.

1,671 NUTS regions: 965 matches and 13 false positives.

Example of an aggregation:
NUTS UKF2: Leicestershire, Rutland and Northamptonshire
GADM 2_13988: Leicestershire


Evaluation

| NUTS Region | NUTS Area (deg²) | Incorrect GADM Guess | Hausdorff Distance (deg) |
|---|---|---|---|
| UKM34 | 0.0214 | East Renfrewshire | 0.1862 |
| FR106 | 0.0334 | Val-De-Marne | 0.1644 |
| BE321 | 0.0654 | Soignies | 0.3521 |
| BE353 | 0.1188 | Thuin | 0.2834 |
| CH061 | 0.1672 | Aargau | 0.3653 |
| LT | 9.5204 | Latvija | 2.5098 |
| LI | 0.0205 | Appenzell Innerrhoden | 0.2783 |
| UKM28 | 0.0689 | North Lanarkshire | 0.3478 |
| BE331 | 0.1013 | Liège | 0.335 |
| BE353 | 0.1188 | Thuin | 0.2834 |
| CH061 | 0.1672 | Aargau | 0.3653 |
| SE3 | 60.585 | Norge | 7.8658 |
| BE321 | 0.0654 | Soignies | 0.3521 |


Conclusion

NeoGeo vocabularies: survey and mappings to other vocabularies.

NUTS-RDF and GADM-RDF datasets:
• GADM-RDF links to DBpedia, UK Ordnance Survey and NUTS-RDF.
• Linked Data Services for accessing/querying spatial indices (withinRegion, boundingBox).

Work on spatial similarity metrics: promising results.


Future Work

NeoGeo vocabularies:
• Temporal context.

Datasets:
• More Earth and space science data.
• Add more instance mappings.

Spatial similarity:
• Improve precision.
• Develop tools to support the mapping process.

More experiments: Querying of integrated data and reasoning.


Acknowledgements

European Commission's Seventh Framework Programme FP7/2007-2013 (PlanetData, Grant 257641)