GeoLinked Data (.es) is an open initiative whose aim is to enrich the Web of Data with Spanish geospatial data. This initiative started off by publishing diverse information sources belonging to the Spanish National Geographic Institute. Such sources are made available as RDF (Resource Description Framework) knowledge bases according to the Linked Data principles. With this work, Spain has joined the Linked Data initiative, in which the United Kingdom and Germany are already participating. In this presentation, we provide an overview of the process that has been followed for the development of this initiative.
GeoLinkedData
Asunción Gómez-Pérez, Alexander de Leon, Victor Saquicela, Luis M. Vilches,
Oscar Corcho, and Boris Villazón-Terrazas
Facultad de Informática, Universidad Politécnica de Madrid
Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid
http://www.oeg-upm.net
Phone: 34.91.3366605, Fax: 34.91.3524819
ToC
• Motivation
• Related Work
• GeoLinkedData
  • Identification of the data sources
  • Vocabulary Development
  • Generation of the RDF data
  • Publication of the RDF data
  • Data cleansing
  • Linking the RDF data
  • Enable effective discovery
• Future Work
Motivation
99.171 % English
0.019 % Spanish
Source: Billion Triples dataset at http://km.aifb.kit.edu/projects/btc-2010/
Thanks to Aidan and Richard
Related Work
N | Provider | Data Provenance | Topics | Datasets | URI Pattern | Vocabulary | Geometry | Lang
1 | Ordnance Survey | Ordnance Survey | Administrative Units | Administrative gazetteer for Great Britain | http://data.ordnancesurvey.co.uk/id/.+ | Spatial Relations Ontology, Administrative | Point | en
2 | LinkedGeoData | OpenStreetMap | Points of interest | OpenStreetMap database (post offices, traffic lights, bus stops) | http://linkedgeo.data.org/triplify/+ | LGD Ontology, WGS84 Geo Positioning | Point | en
3 | GeoNames | GeoNames | Toponyms | Data sources used by GeoNames | http://sws.geonames.org/id/+ | GeoNames Ontology, WGS84 Geo Positioning | Point | en, es, de, ca, nb, it, da, fr
4 | DBpedia | Wikipedia | General Knowledge | Wikipedia | http://dbpedia.org/resource/+ | DBpedia Ontology, WGS84 Geo Positioning | Point | 92 lang.
GeoLinkedData
• It is an open initiative whose aim is to enrich the Web of Data with Spanish geospatial data.
• It started off by publishing diverse information sources from the National Geographic Institute of Spain (IGN).
• http://geo.linkeddata.es
• More recently, it has also covered sources from the National Statistics Institute (INE).
Process for Publishing Linked Data on the Web
• Identification of the data sources
• Vocabulary development
• Generation of the RDF data
• Publication of the RDF data
• Linking the RDF data
• Data cleansing
• Enable effective discovery
1. Identification and selection of the data sources
IGN
INE
2. Lightweight Ontology Development
[Figure: ontology network, with a legend distinguishing ontologies from specifications]
• Reused vocabularies:
  • hydrOntology: hydrographical phenomena (rivers, lakes, etc.)
  • SCOVO (statistics ontology): scv:Dataset, scv:Item, scv:Dimension
  • FAO Geopolitical ontology: names and international code systems for territories and groups
  • WGS84 Geo Positioning (W3C vocabulary): an RDF vocabulary
  • GML (GML Specification): ontology for OGC Geography Markup Language
  • W3C Time (time ontology): vocabulary for instants, intervals, durations, etc.
• Relations shown in the figure: hasStatisticalData, hasLat/Long, hasGeometry, hasLocation/isLocated
• Other sources shown in the figure: UNESCO Thesaurus, EGM / ERM, GeoNames, …
• The vocabulary follows the INSPIRE (INfrastructure for SPatial InfoRmation in Europe) recommendation and combines hydrOntology, SCOVO, FAO Geopolitical, WGS84, GML, and Time.
• Classes: 33
• Object Properties: 44
• Data Properties: 318
3. Generation of the RDF Data
• INE: NOR2O
• IGN: ODEMapster
• IGN (geospatial column): Geometry2RDF
3. Generation of the RDF Data – NOR2O
• Input: Industry Production Index, by Province and Year
• Tool: NOR2O
3. Generation of the RDF Data – R2O & ODEMapster
• R2O is an extensible, fully declarative language to describe mappings between relational database schemas and ontologies.
• The ODEMapster processor generates Semantic Web instances from relational instances based on the mapping description expressed in the R2O document
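The idea of a declarative mapping that a processor turns into RDF instances can be sketched in Python. This is only an illustration of the principle: the `MAPPING` dictionary is not R2O syntax, `rows_to_ntriples` is not part of ODEMapster, and the table name, column names, and resource URI pattern are hypothetical. Only the class URI comes from these slides.

```python
from typing import Dict, Iterable, List

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping description (NOT R2O syntax): one table mapped to one
# ontology class, and each listed column mapped to one property.
MAPPING = {
    "table": "rivers",  # assumed table name
    "class": "http://geo.linkeddata.es/ontology/R%C3%ADo",
    "id_column": "id",  # assumed primary key column
    "uri_template": "http://example.org/resource/rio/{id}",  # assumed pattern
    "properties": {
        "name": "http://www.w3.org/2000/01/rdf-schema#label",
    },
}


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def rows_to_ntriples(rows: Iterable[Dict[str, object]], mapping: dict) -> List[str]:
    """Turn relational rows into N-Triples lines according to the mapping."""
    triples = []
    for row in rows:
        subject = mapping["uri_template"].format(id=row[mapping["id_column"]])
        # One rdf:type triple per row, then one triple per mapped column.
        triples.append(f"<{subject}> <{RDF_TYPE}> <{mapping['class']}> .")
        for column, prop in mapping["properties"].items():
            value = row.get(column)
            if value is None:
                continue  # skip NULL columns instead of emitting empty literals
            triples.append(f'<{subject}> <{prop}> "{escape_literal(str(value))}" .')
    return triples


if __name__ == "__main__":
    for line in rows_to_ntriples([{"id": 1, "name": "Arroyo"}], MAPPING):
        print(line)
```

Keeping the mapping as data rather than code is the point of the declarative approach: the same processor can be reused for other tables by changing only the mapping description.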
3. Generation of the RDF Data – R2O & ODEMapster
• Creation of the R2O Mappings
3. Generation of the RDF Data – R2O & ODEMapster
Excerpt of the R2O document
3. Generation of the RDF Data – Geometry2RDF
Oracle SDO_UTIL package
SELECT TO_CHAR(SDO_UTIL.TO_GML311GEOMETRY(geometry)) AS Gml311Geometry
FROM "BCN200"."BCN200_0301L_RIO" c
WHERE c.Etiqueta = 'Arroyo'
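Once the geometry is available as GML, it can be turned into RDF. The following Python sketch shows one possible way to do that; it is not the Geometry2RDF implementation. It assumes the GML is a LineString with a two-dimensional `gml:posList`, that coordinates are already WGS84 in longitude-latitude order, and that each vertex becomes a resource under a hypothetical base URI. The sample GML string is hand-written for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

GML_NS = "http://www.opengis.net/gml"
GEO_NS = "http://www.w3.org/2003/01/geo/wgs84_pos#"


def parse_pos_list(gml: str) -> List[Tuple[float, float]]:
    """Extract (x, y) pairs from the first gml:posList in a GML fragment."""
    root = ET.fromstring(gml)
    pos_list = root.find(f".//{{{GML_NS}}}posList")
    if pos_list is None or not pos_list.text:
        raise ValueError("no gml:posList found")
    numbers = [float(value) for value in pos_list.text.split()]
    if len(numbers) % 2 != 0:
        raise ValueError("expected an even number of coordinates (2D)")
    return list(zip(numbers[0::2], numbers[1::2]))


def points_to_ntriples(base_uri: str, points: List[Tuple[float, float]]) -> List[str]:
    """Emit geo:long / geo:lat triples, assuming x is longitude and y is latitude."""
    triples = []
    for index, (x, y) in enumerate(points):
        subject = f"{base_uri}/point/{index}"  # hypothetical URI pattern
        triples.append(f'<{subject}> <{GEO_NS}long> "{x}" .')
        triples.append(f'<{subject}> <{GEO_NS}lat> "{y}" .')
    return triples


if __name__ == "__main__":
    sample = (
        '<gml:LineString xmlns:gml="http://www.opengis.net/gml">'
        '<gml:posList srsDimension="2">-3.7 40.4 -3.6 40.5</gml:posList>'
        "</gml:LineString>"
    )
    for line in points_to_ntriples("http://example.org/resource/rio/1", parse_pos_list(sample)):
        print(line)
```

If the source data uses a projected coordinate system, a reprojection step would be needed before emitting WGS84 latitude and longitude values.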
3. Generation of the RDF data – RDF graphs
So far:
• 7 RDF Named Graphs
• 1412248 triples
Named graphs include:
• IGN, BTN25: http://geo.linkeddata.es/dataset/IGN/BTN25
• IGN, BCN200: http://geo.linkeddata.es/dataset/IGN/BCN200
• INE, IPI: http://geo.linkeddata.es/dataset/INE/IPI
• …
4. Publication of the RDF Data
• Virtuoso 6.1.0: SPARQL endpoint
• Pubby 0.3: Linked Data and HTML views
• Including provenance support
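A published SPARQL endpoint can be queried over HTTP using the standard SPARQL Protocol. The Python sketch below builds such a request for a query that lists the named graphs in the store. The endpoint address is an assumption (the slides only give the site http://geo.linkeddata.es), and the request is built but not sent, so the example runs without network access.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Assumed endpoint location; check the site for the actual address.
ENDPOINT = "http://geo.linkeddata.es/sparql"

# Lists every named graph that contains at least one triple.
QUERY = "SELECT DISTINCT ?graph WHERE { GRAPH ?graph { ?s ?p ?o } }"


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL Protocol GET request asking for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    request = build_sparql_request(ENDPOINT, QUERY)
    print(request.get_method(), request.full_url)
    # To run the query for real, pass the request to urllib.request.urlopen
    # and parse the response body with json.loads.
```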
4. Publication of the RDF Data - License
• License for GeoLinkedData
  • Creative Commons Attribution-ShareAlike 3.0
  • GNU Free Documentation License
• Each dataset (IGN, INE, etc.) will have its own specific license.
5. Data cleansing
• Lack of documentation of the IGN datasets
• Broken links: Spain, IGN resources
• Lack of documentation of the ontology
• Missing English and Spanish labels
• Options for building a Spanish ontology that imports concepts from another ontology (in English):
  • Import the English ontology and add annotations, such as Spanish labels, to its terms.
  • Import the English ontology, create new concepts and properties with Spanish names, and map those to the English equivalents.
  • Re-declare the terms of the English ontology that we need (using the same URIs as in the English ontology) and add Spanish labels.
  • Create your own classes and properties that model the same things as the English ontology.
select DISTINCT ?graph where {GRAPH ?graph {?s ?p ?o.}.}
5. Data cleansing
• URIs in Spanish
  • http://geo.linkeddata.es/ontology/Río
• RDF allows UTF-8 characters in URIs
• But Linked Data URIs have to be URLs as well
• So non-US-ASCII characters have to be percent-encoded (%-codes)
• http://geo.linkeddata.es/ontology/R%C3%ADo
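The encoding step can be reproduced with the Python standard library. This is a minimal sketch that percent-encodes only the path of the URI as UTF-8; the helper name `iri_to_uri` is introduced here for illustration, and the sketch deliberately leaves the host, query, and fragment untouched.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Percent-encode non-ASCII characters in the path of an IRI (UTF-8)."""
    parts = urlsplit(iri)
    # Keep "/" as the path separator and "%" so that an already encoded
    # path is not encoded a second time.
    path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(iri_to_uri("http://geo.linkeddata.es/ontology/Río"))
```

Because "%" is treated as safe, applying the function to an already encoded URI returns it unchanged.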
6. Linking of the RDF Data
• Silk - A Link Discovery Framework for the Web of Data
• First set of links: Provinces of Spain
  • 86% accuracy
Datasets involved: GeoLinkedData, DBpedia, GeoNames
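The general idea behind label-based link discovery can be sketched in Python. This is not Silk and does not use Silk's configuration language or its similarity metrics; it simply compares accent-insensitive labels with `difflib` from the standard library and emits `owl:sameAs` links above a threshold. The URIs, labels, and the 0.9 threshold are hypothetical.

```python
import unicodedata
from difflib import SequenceMatcher
from typing import Dict, List

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalize(label: str) -> str:
    """Lowercase a label and strip accents, so 'Ávila' matches 'Avila'."""
    decomposed = unicodedata.normalize("NFKD", label)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold().strip()


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two labels after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def discover_links(source: Dict[str, str], target: Dict[str, str],
                   threshold: float = 0.9) -> List[str]:
    """Link each source resource to its best-matching target, if good enough."""
    links = []
    for s_uri, s_label in source.items():
        best_uri, best_score = None, 0.0
        for t_uri, t_label in target.items():
            score = similarity(s_label, t_label)
            if score > best_score:
                best_uri, best_score = t_uri, score
        if best_uri is not None and best_score >= threshold:
            links.append(f"<{s_uri}> <{OWL_SAME_AS}> <{best_uri}> .")
    return links


if __name__ == "__main__":
    source = {"http://example.org/src/avila": "Ávila"}  # hypothetical data
    target = {"http://example.org/tgt/Avila": "Avila",
              "http://example.org/tgt/Madrid": "Madrid"}
    print("\n".join(discover_links(source, target)))
```

Label similarity alone produces some wrong or missing links, which is why a generated link set is typically evaluated against a manually checked sample before publication.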
7. Enable effective discovery
Provinces
Industry Production Index – Capital of Province
Rivers
Beaches
Future Work
• Generate more datasets from other domains, e.g. universities in Spain.
• Identify more links to DBpedia and GeoNames.
• Cover complex geometrical information, i.e. not only Point and LineString-like data; we will also treat information representation through polygons.
Thank you