GeoLinkedData Asunción Gómez-Pérez, Alexander de Leon, Victor Saquicela, Luis M. Vilches, Oscar Corcho, and Boris Villazón-Terrazas Facultad de Informática, Universidad Politécnica de Madrid Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid http://www.oeg-upm.net Phone: 34.91.3366605, Fax: 34.91.3524819

GeoLinkedData

DESCRIPTION

GeoLinked Data (.es) is an open initiative whose aim is to enrich the Web of Data with Spanish geospatial data. This initiative started off by publishing diverse information sources belonging to the Spanish National Geographic Institute. Such sources are made available as RDF (Resource Description Framework) knowledge bases according to the Linked Data principles. With this work, Spain has joined the Linked Data initiative, in which the United Kingdom and Germany are already participating. In this presentation, we provide an overview of the process that has been followed for the development of this initiative.

Page 1: GeoLinkedData

GeoLinkedData

Asunción Gómez-Pérez, Alexander de Leon, Victor Saquicela, Luis M. Vilches,

Oscar Corcho, and Boris Villazón-Terrazas

Facultad de Informática, Universidad Politécnica de Madrid

Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid

http://www.oeg-upm.net

Phone: 34.91.3366605, Fax: 34.91.3524819

Page 2: GeoLinkedData

ToC

• Motivation

• Related Work

• GeoLinkedData
  • Identification of the data sources
  • Vocabulary development
  • Generation of the RDF data
  • Publication of the RDF data
  • Data cleansing
  • Linking the RDF data
  • Enable effective discovery

• Future Work

Page 3: GeoLinkedData

Motivation

99.171 % English

0.019 % Spanish

Source: Billion Triples dataset at http://km.aifb.kit.edu/projects/btc-2010/

Thanks to Aidan and Richard

Page 4: GeoLinkedData

Related Work

| N | Provider | Provenance Data | Topics | Datasets | URI Pattern | Vocabulary | Geometry | Lang |
|---|----------|-----------------|--------|----------|-------------|------------|----------|------|
| 1 | Ordnance Survey | Ordnance Survey | Administrative units | Administrative gazetteer for Great Britain | http://data.ordnancesurvey.co.uk/id/.+ | Spatial Relations Ontology, Administrative | Point | en |
| 2 | LinkedGeoData | OpenStreetMap | Points of interest | OpenStreetMap database (post offices, traffic lights, bus stops) | http://linkedgeo.data.org/triplify/+ | LGD Ontology, WGS84 Geo Positioning | Point | en |
| 3 | GeoNames | GeoNames | Toponyms | Data sources used by GeoNames | http://sws.geonames.org/id/+ | GeoNames Ontology, WGS84 Geo Positioning | Point | en, es, de, ca, nb, it, da, fr |
| 4 | DBpedia | Wikipedia | General knowledge | Wikipedia | http://dbpedia.org/resource/+ | DBpedia Ontology, WGS84 Geo Positioning | Point | 92 lang. |

Page 5: GeoLinkedData

GeoLinkedData

• It is an open initiative whose aim is to enrich the Web of Data with Spanish geospatial data.

• It started off by publishing diverse information sources from the National Geographic Institute of Spain (IGN).

• http://geo.linkeddata.es

• More recently, it has added sources from the National Statistics Institute (INE).

Page 6: GeoLinkedData

Process for Publishing Linked Data on the Web

• Identification of the data sources

• Vocabulary development

• Generation of the RDF data

• Publication of the RDF data

• Linking the RDF data

• Data cleansing

• Enable effective discovery

Page 7: GeoLinkedData

1. Identification and selection of the data sources

IGN

INE

Page 8: GeoLinkedData

2. Lightweight Ontology Development

[Diagram: ontology network with legend and ontology specification]

Reused vocabularies:

• hydrOntology: hydrographical phenomena (rivers, lakes, etc.)

• SCOVO (statistics ontology): scv:Dataset, scv:Dimension, scv:Item

• FAO Geopolitical ontology: names and international code systems for territories and groups

• WGS84 Geo Positioning (W3C vocabulary): an RDF vocabulary

• GML (GML specification): ontology for OGC Geography Markup Language

• W3C Time (time ontology): vocabulary for instants, intervals, durations, etc.

Other sources shown in the diagram: UNESCO Thesaurus, EGM / ERM, GeoNames

Relations shown in the diagram: hasStatisticalData, hasLat/Long, hasGeometry, hasLocation/isLocated

Following the INSPIRE (INfrastructure for SPatial InfoRmation in Europe) recommendation: hydrOntology, SCOVO, FAO Geopolitical, WGS84, GML, and Time

Classes 33 33

Object Properties 44 44

Data Properties 318 318

Page 9: GeoLinkedData

3. Generation of the RDF Data

• INE data: NOR2O

• IGN data: ODEMapster

• IGN geospatial column: Geometry2RDF

Page 10: GeoLinkedData

3. Generation of the RDF Data – NOR2O

Industry Production Index

Province

Year

NOR2O

Page 11: GeoLinkedData

3. Generation of the RDF Data – R2O & ODEMapster

• R2O is an extensible, fully declarative language to describe mappings between relational database schemas and ontologies.

• The ODEMapster processor generates Semantic Web instances from relational instances, based on the mapping description expressed in the R2O document.
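The idea of a declarative mapping applied by a generic processor can be sketched in a few lines of Python. This is only an illustration of the principle, not R2O syntax and not the ODEMapster implementation: the `MAPPING` dictionary layout, the `river` table, the `example.org` resource URIs, and the sample row are all hypothetical. Only the class URI is taken from these slides.

```python
import sqlite3

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical declarative mapping: one table to one class, columns to properties.
# It imitates the idea behind R2O, not its actual document format.
MAPPING = {
    "table": "river",
    "class": "http://geo.linkeddata.es/ontology/R%C3%ADo",
    "uri_template": "http://example.org/resource/river/{id}",  # assumed pattern
    "properties": {
        "name": "http://www.w3.org/2000/01/rdf-schema#label",
    },
}


def rows_to_ntriples(conn, mapping):
    """Apply the mapping to every row of the table and return N-Triples lines."""
    columns = ["id"] + list(mapping["properties"])
    query = "SELECT {} FROM {}".format(", ".join(columns), mapping["table"])
    lines = []
    for row in conn.execute(query):
        record = dict(zip(columns, row))
        subject = mapping["uri_template"].format(id=record["id"])
        # One rdf:type triple per row.
        lines.append("<{}> <{}> <{}> .".format(subject, RDF_TYPE, mapping["class"]))
        # One literal triple per mapped column.
        for column, predicate in mapping["properties"].items():
            literal = str(record[column]).replace("\\", "\\\\").replace('"', '\\"')
            lines.append('<{}> <{}> "{}" .'.format(subject, predicate, literal))
    return lines


def demo():
    """Build a tiny in-memory database with one made-up row and map it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE river (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO river (id, name) VALUES (1, 'Arroyo de ejemplo')")
    return rows_to_ntriples(conn, MAPPING)


if __name__ == "__main__":
    print("\n".join(demo()))
```

Keeping the mapping as data means the same processor can be reused for another table by changing only the mapping, which is the property the two bullets above attribute to R2O and ODEMapster.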

Page 12: GeoLinkedData

3. Generation of the RDF Data – R2O & ODEMapster

• Creation of the R2O Mappings

Page 13: GeoLinkedData

3. Generation of the RDF Data – R2O & ODEMapster

Excerpt of the R2O document

Page 14: GeoLinkedData

3. Generation of the RDF Data – Geometry2RDF

Oracle SDO_UTIL package

SELECT TO_CHAR(SDO_UTIL.TO_GML311GEOMETRY(geometry)) AS Gml311Geometry

FROM "BCN200"."BCN200_0301L_RIO" c

WHERE c.Etiqueta = 'Arroyo'
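Once the geometry is available as GML, turning it into RDF is mostly a parsing exercise. The sketch below is not the Geometry2RDF implementation. It assumes the query returns a `gml:LineString` with a `gml:posList` in "longitude latitude" order, and it emits one blank node per point using the W3C WGS84 `lat` and `long` properties. The sample coordinates are made up.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"

# Made-up sample of the kind of GML string the query might return (assumption).
SAMPLE_GML = (
    '<gml:LineString xmlns:gml="http://www.opengis.net/gml">'
    "<gml:posList>-3.7 40.4 -3.6 40.5</gml:posList>"
    "</gml:LineString>"
)


def gml_linestring_to_points(gml_text):
    """Parse a gml:LineString and return its coordinates as (lon, lat) pairs."""
    root = ET.fromstring(gml_text)
    pos_list = root.find("{%s}posList" % GML_NS)
    if root.tag != "{%s}LineString" % GML_NS or pos_list is None:
        raise ValueError("expected a gml:LineString with a gml:posList")
    values = [float(v) for v in (pos_list.text or "").split()]
    if len(values) % 2 != 0:
        raise ValueError("posList must hold coordinate pairs")
    # Assumed axis order: first value longitude, second value latitude.
    return list(zip(values[0::2], values[1::2]))


def points_to_ntriples(points):
    """Emit WGS84 lat/long triples, one blank node per point."""
    lines = []
    for i, (lon, lat) in enumerate(points):
        node = "_:p%d" % i
        lines.append('%s <%slat> "%s" .' % (node, GEO, lat))
        lines.append('%s <%slong> "%s" .' % (node, GEO, lon))
    return lines


if __name__ == "__main__":
    print("\n".join(points_to_ntriples(gml_linestring_to_points(SAMPLE_GML))))
```

The axis order and coordinate reference system depend on the database configuration, so they should be checked against the actual query output before reusing this sketch.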

Page 15: GeoLinkedData

3. Generation of the RDF Data – Geometry2RDF

Page 16: GeoLinkedData

3. Generation of the RDF Data – Geometry2RDF

Page 17: GeoLinkedData

3. Generation of the RDF data – RDF graphs

So far:

• 7 RDF named graphs

• 1412248 triples

Named graphs (IGN: BTN25, BCN200; INE: IPI; …):

• http://geo.linkeddata.es/dataset/IGN/BTN25

• http://geo.linkeddata.es/dataset/IGN/BCN200

• http://geo.linkeddata.es/dataset/INE/IPI

Page 18: GeoLinkedData

4. Publication of the RDF Data

• Virtuoso 6.1.0: SPARQL

• Pubby 0.3: Linked Data, HTML

• Including provenance support
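A Linked Data front end such as Pubby typically serves either an HTML page or an RDF description of the same resource, depending on the HTTP `Accept` header. A minimal sketch of how a client might ask for one or the other follows. The helper name `build_request` is hypothetical, and whether the live server still answers this way is an assumption, so the network call is left commented out.

```python
import urllib.request


def build_request(resource_uri, want_rdf=True):
    """Build an HTTP request asking for RDF/XML or HTML for the same URI."""
    accept = "application/rdf+xml" if want_rdf else "text/html"
    return urllib.request.Request(resource_uri, headers={"Accept": accept})


if __name__ == "__main__":
    req = build_request("http://geo.linkeddata.es/ontology/R%C3%ADo")
    print(req.get_header("Accept"))
    # To actually fetch the description (requires network access):
    # with urllib.request.urlopen(req) as response:
    #     print(response.read().decode("utf-8"))
```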

Page 19: GeoLinkedData

4. Publication of the RDF Data

Page 20: GeoLinkedData

4. Publication of the RDF Data - License

• License for GeoLinkedData
  • Creative Commons Attribution-ShareAlike 3.0
  • GNU Free Documentation License

• Each dataset (IGN, INE, etc.) will have its own specific license.

Page 21: GeoLinkedData

5. Data cleansing

• Lack of documentation of the IGN datasets

• Broken links: Spain, IGN resources

• Lack of documentation of the ontology

• Missing English and Spanish labels

• Building a Spanish ontology that imports some concepts from another ontology (in English). Options:

  • Import the English ontology and add annotations, such as Spanish labels, to its terms.

  • Import the English ontology, create new concepts and properties with Spanish names, and map them to their English equivalents.

  • Re-declare the terms needed from the English ontology (using the same URIs as in the English ontology) and add Spanish labels.

  • Create your own classes and properties that model the same things as the English ontology.

SELECT DISTINCT ?graph WHERE { GRAPH ?graph { ?s ?p ?o . } }

Page 22: GeoLinkedData

5. Data cleansing

• URIs in Spanish
  • http://geo.linkeddata.es/ontology/Río

• RDF allows UTF-8 characters in URIs.

• But Linked Data URIs have to be URLs as well.

• So non-US-ASCII characters have to be percent-encoded (%code):

• http://geo.linkeddata.es/ontology/R%C3%ADo
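The encoding step can be reproduced with the Python standard library. In this sketch the helper name `to_url` is hypothetical. The base URI and the term "Río" are the ones shown on this slide.

```python
from urllib.parse import quote, unquote

BASE = "http://geo.linkeddata.es/ontology/"


def to_url(local_name):
    """Percent-encode a local name (as UTF-8 bytes) and append it to the base URI."""
    return BASE + quote(local_name, safe="")


if __name__ == "__main__":
    url = to_url("Río")
    print(url)           # the dereferenceable URL form
    print(unquote(url))  # back to the human-readable form
```

The character "í" is two bytes in UTF-8 (C3 AD), which is why a single letter becomes `%C3%AD`.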

Page 23: GeoLinkedData

6. Linking of the RDF Data

• Silk - A Link Discovery Framework for the Web of Data

• First set of links: Provinces of Spain
  • 86% accuracy

Links: GeoLinkedData to DBpedia and GeoNames
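Link discovery of this kind usually compares labels of resources in two datasets and emits an owl:sameAs link when they are similar enough. The sketch below shows only that principle with the standard library's `difflib`. It is not Silk and does not use Silk's link specification language. The `example.org` URIs, labels, and threshold are hypothetical.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def similarity(a, b):
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link(source, target, threshold=0.8):
    """Link each source URI to its most similar target URI by label.

    source and target map URI -> label. A link is kept only when the best
    score reaches the threshold.
    """
    links = []
    for s_uri, s_label in source.items():
        best_uri, best_score = None, 0.0
        for t_uri, t_label in target.items():
            score = similarity(s_label, t_label)
            if score > best_score:
                best_uri, best_score = t_uri, score
        if best_uri is not None and best_score >= threshold:
            links.append((s_uri, OWL_SAME_AS, best_uri))
    return links


# Hypothetical sample data, for illustration only.
SOURCE = {
    "http://example.org/source/Madrid": "Madrid",
    "http://example.org/source/Sevilla": "Sevilla",
}
TARGET = {
    "http://example.org/target/Madrid": "Madrid",
    "http://example.org/target/Seville": "Seville",
}

if __name__ == "__main__":
    for triple in link(SOURCE, TARGET):
        print("<%s> <%s> <%s> ." % triple)
```

Raising the threshold drops the "Sevilla" to "Seville" link while keeping the exact match. Automatically generated links therefore need to be evaluated, as the accuracy figure above suggests.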

Page 24: GeoLinkedData

7. Enable effective discovery

Page 25: GeoLinkedData

DEMO

http://geo.linkeddata.es/

Page 26: GeoLinkedData

Provinces

Page 27: GeoLinkedData

Industry Production Index – Capital of Province

Page 28: GeoLinkedData

Rivers

Page 29: GeoLinkedData

Beaches

Page 30: GeoLinkedData

Future Work

• Generate more datasets from other domains, e.g. universities in Spain.

• Identify more links to DBpedia and GeoNames.

• Cover more complex geometrical information: beyond Point and LineString-like data, we will also handle information represented as polygons.

Page 31: GeoLinkedData

Go raibh maith agaibh (Thank you)
