Publishing Linked Data – There is no One-Size-Fits-All Formula
Asunción Gómez-Pérez
Facultad de Informática, Universidad Politécnica de Madrid Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid
http://www.oeg-upm.net [email protected]
Acknowledgements: O.Corcho, D. Garijo, D. Vila, L.Vilches, B. Villazón Our partners at: BNE, IGN, …
LOV SYMPOSIUM: LINKING AND OPENING VOCABULARIES. 18th June, 2012
Work distributed under the license Creative Commons Attribution-Noncommercial-Share Alike 3.0
LOV-HIVE Symposium. 18th June 2012
Table of contents
1. The concept
2. Foundations
3. The process
4. Examples
• Libraries: http://datos.bne.es
• Geo: http://geo.linkeddata.es/
• Meteorology: http://aemet.linkeddata.es/
• Travelling: http://webenemasuno.linkeddata.es/
Complex queries using data from heterogeneous Web pages
http://www.aemet http://www.viaf.org/
*Picture attribution: http://commons.wikimedia.org/wiki/User:Gugerell
A Cervantes enthusiast from Germany is visiting Madrid and wants to know more about Cervantes’ work and life
http://www.bne.es/
http://elviajero.elpais.com/
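Data Integration
[Figure: a single graph that combines statements held in six databases (BNE, VIAF, AEMET, IGN, Prisa, DBpedia). Statements recoverable from the slide:]
• M. Cervantes, creator, Don Quixote; translated into: Hebrew; year of publication: 1960; located: VIAF
• M. Cervantes, author (Autor), El Quijote; year of publication (Año de Publicación): 1605; located in (Ubicado en): BNE
• M. Cervantes, birthPlace, Alcalá de Henares
• Alcalá de Henares, same as, Alcalá de Henares (the same place in another database)
• Alcalá de Henares, temperature (Temperatura): 20º
• Alcalá de Henares, guide (guía): Tapas Siglo de Oro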
Linked Data: why is it important?
• It facilitates data integration
  • From heterogeneous sources
  • In different formats
  • At different granularity
  • In different languages
  • From different countries
© Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig
Foundations
• Unique identifiers: a URI identifies or names a resource
• RDF(S) models
  • Data level: Cervantes, is creator of, El Quijote
  • Model level: Person, is creator of, Work
  • Cervantes (http://datos.bne.es/resource/XX1718747) is a Person (http://iflastandards.info/ns/fr/frbr/frbrer/C1005)
  • El Quijote (http://datos.bne.es/resource/XX3383563) is a Work (http://iflastandards.info/ns/fr/frbr/frbrer/C1001)
• Equivalence links to other datasets (Same As)
  • Cervantes, Same As, http://viaf.org/viaf/17220427
  • Cervantes, Same As, http://dbpedia.org/resource/Miguel_de_Cervantes
• Data navigation
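The foundations on this slide (URIs, typed statements, Same As links, navigation) can be illustrated with a small in-memory sketch. It is not how datos.bne.es stores its data. The pairing of XX3383563 with El Quijote is an assumption read off the slide layout, and `objects` is a hypothetical helper.

```python
# Namespaces as they appear on the slides.
BNE = "http://datos.bne.es/resource/"
FRBR = "http://iflastandards.info/ns/fr/frbr/frbrer/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

CERVANTES = BNE + "XX1718747"
QUIJOTE = BNE + "XX3383563"  # assumption: the URI that names El Quijote

# Every statement is a (subject, predicate, object) triple of URIs.
TRIPLES = {
    (CERVANTES, RDF_TYPE, FRBR + "C1005"),   # Cervantes is a Person
    (QUIJOTE, RDF_TYPE, FRBR + "C1001"),     # El Quijote is a Work
    (CERVANTES, FRBR + "P2010", QUIJOTE),    # Cervantes is creator of El Quijote
    (CERVANTES, OWL_SAME_AS, "http://viaf.org/viaf/17220427"),
    (CERVANTES, OWL_SAME_AS, "http://dbpedia.org/resource/Miguel_de_Cervantes"),
}


def objects(subject, predicate):
    """Return, sorted, every object reachable from subject through predicate."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


# Data navigation: follow one link type at a time, starting from a URI.
works = objects(CERVANTES, FRBR + "P2010")
aliases = objects(CERVANTES, OWL_SAME_AS)
```

Following the Same As results leads out of the BNE dataset into VIAF and DBpedia, which is the navigation the slide refers to.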
Foundations: aligning models with OWL EquivalentClass
[Figure: classes and instances from several vocabularies, connected by EquivalentClass and Same As links. Elements recoverable from the slide:]
• Person classes: http://iflastandards.info/ns/fr/frbr/frbrer/C1005, http://xmlns.com/foaf/0.1/Person, http://schema.org/Person
• Municipality classes: Municipality (http://dbpedia.org/resource/Municipalities_of_Spain), Municipio (http://geo.linkeddata.es/ontology/Municipio)
• Person, birthPlace, Municipality
• Alcalá de Henares (http://dbpedia.org/page/Alcal%C3%A1_de_Henares) is a Municipality
• Alcalá de Henares (http://geo.linkeddata.es/resource/Alcalá de Henares) is a Municipio
• The two Alcalá de Henares resources are connected by Same As
Lessons learnt
1. Reuse existing models
2. Align the data and the concepts
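A sketch of what alignment buys: once classes are declared equivalent and instances declared the same, a query against one vocabulary also finds data described with the other. The short `owl:`/`rdf:` strings stand in for the full property URIs, the choice of which pairs to align is an assumption about the slide, and `build_find` and `instances_of` are hypothetical helpers.

```python
EQUIVALENT_CLASS = "owl:equivalentClass"
SAME_AS = "owl:sameAs"
RDF_TYPE = "rdf:type"

GEO_ALCALA = "http://geo.linkeddata.es/resource/Alcalá de Henares"
DBP_ALCALA = "http://dbpedia.org/page/Alcal%C3%A1_de_Henares"
GEO_MUNICIPIO = "http://geo.linkeddata.es/ontology/Municipio"
DBP_MUNICIPALITY = "http://dbpedia.org/resource/Municipalities_of_Spain"

# Alignment statements (assumed pairs) and one typed instance.
ALIGNMENTS = [
    (GEO_MUNICIPIO, EQUIVALENT_CLASS, DBP_MUNICIPALITY),
    (GEO_ALCALA, SAME_AS, DBP_ALCALA),
]
DATA = [(GEO_ALCALA, RDF_TYPE, GEO_MUNICIPIO)]


def build_find(pairs):
    """Union-find over aligned pairs; returns a function mapping a URI to its group root."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
    return find


find = build_find((s, o) for s, _, o in ALIGNMENTS)


def instances_of(cls):
    """URIs typed with cls or an equivalent class, together with their Same As aliases."""
    target = find(cls)
    hits = {find(s) for s, p, o in DATA if p == RDF_TYPE and find(o) == target}
    known = {s for s, _, _ in ALIGNMENTS + DATA} | {o for _, _, o in ALIGNMENTS}
    return sorted(uri for uri in known if find(uri) in hits)


municipalities = instances_of(DBP_MUNICIPALITY)
```

Asking for instances of the DBpedia class returns both Alcalá de Henares URIs, although only the geo.linkeddata.es one carries a type statement.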
Process phases: Specification, Modelling, RDF Generation, Links Generation, Publication, Exploitation
Methodology
• Data sources analysis
• URI Design
• License definition
Identification and selection of data sources
• Spanish National Geographic Institute (IGN)
• Spanish National Statistics Institute (INE)
• National Library of Spain (BNE)
• Meteorological Office (AEMET)
1. Identification and selection of the data sources
• Spanish National Geographic Institute
  • Multilingual (Spanish, Basque, Galician, Catalan)
  • Conceptualization mismatches
  • Granularity (scale concept)
  • Domain vocabulary
    • Hydrographic information: reservoir (embalse), lagoon (albufera), river (río), etc.
    • Transport: dual carriageway (vía desdoblada), railway (ferrocarril), …
    • Administrative units: municipality (municipio)
  • Particularities
    • Longitude and latitude
• Spanish National Statistics Institute
  • Monolingual
  • Numerical information
  • Particularities
    • Geo (textual level) and temporal
1. Identification and selection of the data sources: Geographical information
IGN-E
1. Identification and selection of the data sources: Statistical information
Specification
• Records in the MARC 21 format
• 3.9 million bibliographic records
• 4.2 million authority records
• Version: November 2011
URI design
• Meaningful URIs versus opaque URIs
• Separate the TBox (ontology model) from the ABox (instance data)
• Base URIs
  http://linkeddata.es/
  http://datos.bne.es/
  http://geo.linkeddata.es/
  http://otalex.linkeddata.es/
• Ontology (TBox URIs)
  http://iflastandards.info/ns/fr/frbr/frbrer/C1005
  http://phenomenontology.linkeddata.es/ontology/{concept|property}
  http://phenomenontology.linkeddata.es/ontology/Municipio
  We use the RDF Data Cube Vocabulary and/or other vocabularies
• Data (ABox URIs)
  http://datos.bne.es/resource/XX1718747
  http://geo.linkeddata.es/resource/{resource type}/{resource name}
  http://geo.linkeddata.es/resource/Municipio/Badajoz
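The two URI templates above can be applied mechanically. The sketch below mints TBox and ABox URIs from a base URI. The function names are hypothetical, and percent-encoding names that contain spaces or accents is an assumption, since the slides do not say how such names are encoded in the published data.

```python
from urllib.parse import quote

GEO_BASE = "http://geo.linkeddata.es/"


def mint_ontology_uri(term, base=GEO_BASE):
    """TBox pattern: {base}ontology/{concept|property}."""
    return f"{base}ontology/{quote(term, safe='')}"


def mint_resource_uri(resource_type, resource_name, base=GEO_BASE):
    """ABox pattern: {base}resource/{resource type}/{resource name}."""
    return (
        f"{base}resource/{quote(resource_type, safe='')}"
        f"/{quote(resource_name, safe='')}"
    )


badajoz = mint_resource_uri("Municipio", "Badajoz")
municipio_class = mint_ontology_uri("Municipio")
```

Keeping `ontology/` and `resource/` as separate path segments is what separates the TBox from the ABox in the URI itself.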
Modelling: Ontology
• Ontologies:
  • A set of terms
  • A set of explicit assumptions regarding the intended meaning of the terms
    • Almost always including concepts and their classification
    • Almost always including properties between concepts
• Shared understanding of a domain of interest
• Ontologies are expressed in OWL or RDF(S), both based on RDF
• The NeOn methodology helps to build ontologies
2. Vocabulary development
• Features
  • Lightweight: taxonomies and a few properties
  • Agreed-upon vocabularies, to avoid mapping problems
  • Multilingual: linked data are multilingual
• The NeOn methodology can help to
  • Re-engineer non-ontological resources into ontologies
    • Pros: reuses domain terminology already agreed on by domain experts
  • Remove from heavyweight ontologies the features that you don’t need
  • Reuse existing vocabularies
The Ontology for BNE: based on IFLA vocabularies
Geolinkeddata ontology
Following the INSPIRE (INfrastructure for SPatial InfoRmation in Europe) recommendation, the ontology reuses hydrOntology, SCOVO, FAO Geopolitical, WGS84, GML and Time.
[Figure: ontology network with a legend (Ontology, Specification, Thesaurus). Components shown:]
• hydrOntology: hydrographical phenomena (rivers, lakes, etc.)
• FAO Geopolitical ontology: names and international code systems for territories and groups
• WGS84 (W3C vocabulary): WGS84 Geo Positioning, an RDF vocabulary
• GML (GML specification): ontology for OGC Geography Markup Language
• Statistics ontology (SCOVO): scv:Dataset, scv:Dimension, scv:Item
• Time ontology (W3C Time): vocabulary for instants, intervals, durations, etc.
• Other resources in the figure: UNESCO, EGM / ERM, GeoNames…
• Properties connecting the components: hasStatisticalData, hasLat/Long, hasGeometry, hasLocation/isLocated
Classes 33 33
Object Properties 44 44
Data Properties 318 318 reused
3. Generation of RDF
• From the data sources
  • Geographic information (databases)
  • Statistical information (.xsl)
  • Geospatial information
  • Bibliographic information (MARC 21)
• Different technologies for RDF generation
  • NOR2O (from Excel, XML, text files, …)
  • R2O and ODEMapster (from databases)
  • Geometry2RDF and SPh2RDF (for geo data)
  • Marimba for libraries
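To make the generation step concrete, here is a minimal sketch that turns tabular rows into N-Triples. It does not reproduce NOR2O, R2O, Geometry2RDF or Marimba. The CSV column `name`, the sample rows and the use of `rdfs:label` are assumptions made for the example.

```python
import csv
import io
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
MUNICIPIO = "http://geo.linkeddata.es/ontology/Municipio"
RESOURCE_BASE = "http://geo.linkeddata.es/resource/Municipio/"


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def rows_to_ntriples(csv_text, lang="es"):
    """Emit one type triple and one label triple per CSV row."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["name"].strip()
        subject = RESOURCE_BASE + quote(name, safe="")
        lines.append(f"<{subject}> <{RDF_TYPE}> <{MUNICIPIO}> .")
        lines.append(f'<{subject}> <{RDFS_LABEL}> "{escape_literal(name)}"@{lang} .')
    return lines


# Hypothetical input: a one-column table of municipality names.
SAMPLE = "name\nBadajoz\nAlcalá de Henares\n"
ntriples = rows_to_ntriples(SAMPLE)
```

Real sources need more than this (MARC 21 fields, geometries, statistical dimensions), which is why the deck lists a different tool for each kind of source.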
Libraries: Marimba uses the ontology to generate RDF
Marimba links with other resources: VIAF, DNB, SUDOC, LIBRIS, DBpedia
BNE resource http://datos.bne.es/resource/XX1718747 is linked by Same As to:
• VIAF: http://viaf.org/viaf/17220427
• DNB: http://d-nb.info/gnd/11851993X
• SUDOC: http://www.idref.fr/026774771/id
• LIBRIS: http://libris.kb.se/resource/auth/45369
• DBpedia: http://dbpedia.org/resource/Miguel_de_Cervantes
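One simple way to propose Same As links is to compare normalized labels across two datasets. The sketch below shows only that idea; it is not the matching logic of Marimba or Silk. The labels are invented for the example (the slides give only the URIs), and `normalize` and `discover_links` are hypothetical helpers.

```python
import unicodedata

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalize(label):
    """Lowercase, strip accents and collapse whitespace before comparing labels."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def discover_links(local, remote):
    """Return (local_uri, owl:sameAs, remote_uri) for every pair of matching labels."""
    index = {}
    for uri, label in remote.items():
        index.setdefault(normalize(label), []).append(uri)
    links = []
    for uri, label in local.items():
        for target in index.get(normalize(label), []):
            links.append((uri, OWL_SAME_AS, target))
    return links


# Invented labels attached to URIs taken from the slide.
LOCAL = {"http://datos.bne.es/resource/XX1718747": "Cervantes Saavedra, Miguel de"}
REMOTE = {
    "http://viaf.org/viaf/17220427": "cervantes saavedra,  miguel de",
    "http://dbpedia.org/resource/Miguel_de_Cervantes": "Miguel de Cervantes",
}
links = discover_links(LOCAL, REMOTE)
```

Exact matching after normalization finds the first label pair but misses the second, where the name is written in a different order. Dedicated link discovery tools use configurable similarity measures for that reason.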
Publication
• Data publication
• Metadata publication using VoID
• To facilitate discovery
  • Register your dataset in CKAN
  • Use sitemap4rdf to generate the sitemap
  • Upload the sitemap to Google and Sindice
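As an illustration of metadata publication, the sketch below assembles a small VoID description in Turtle. The VoID terms used (`void:Dataset`, `void:sparqlEndpoint`, `void:uriSpace`, `void:vocabulary`) belong to the VoID vocabulary; the dataset URI, title and endpoint passed in are hypothetical placeholders, not the values actually published for datos.bne.es.

```python
def void_description(dataset_uri, title, sparql_endpoint, uri_space, vocabularies):
    """Build a Turtle VoID description; title and uri_space must not contain double quotes."""
    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f'    dcterms:title "{title}" ;',
        f"    void:sparqlEndpoint <{sparql_endpoint}> ;",
        f'    void:uriSpace "{uri_space}"',
    ]
    for vocabulary in vocabularies:
        lines[-1] += " ;"  # the previous statement continues
        lines.append(f"    void:vocabulary <{vocabulary}>")
    lines[-1] += " ."  # close the description
    return "\n".join(lines)


# Hypothetical values for the example.
TTL = void_description(
    dataset_uri="http://example.org/void#bne",
    title="BNE linked data",
    sparql_endpoint="http://example.org/sparql",
    uri_space="http://datos.bne.es/resource/",
    vocabularies=["http://iflastandards.info/ns/fr/frbr/frbrer/"],
)
```

A description like this tells dataset catalogues and crawlers where the data lives and which vocabularies it uses.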
Exploitation
• SPARQL queries, for example counting the works (?Obras) of which Cervantes is the author:
  SELECT (COUNT(DISTINCT ?Obras) AS ?total) WHERE { <http://datos.bne.es/resource/XX1718747> <http://iflastandards.info/ns/fr/frbr/frbrer/P2010> ?Obras }
  Here http://datos.bne.es/resource/XX1718747 is the URI of Cervantes and P2010 is the "is author" property.
• Web interface:
http://linkeddata3.dia.fi.upm.es/bne-demo
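A query such as the Cervantes work count can be sent to a SPARQL endpoint over HTTP. The sketch below only builds the request URL, following the SPARQL Protocol convention of a `query` parameter on a GET request. The endpoint address is a placeholder, since the slides do not give one, and the function names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

CERVANTES = "http://datos.bne.es/resource/XX1718747"
IS_CREATOR_OF = "http://iflastandards.info/ns/fr/frbr/frbrer/P2010"
ENDPOINT = "http://example.org/sparql"  # placeholder, not the real BNE endpoint


def count_works_query(author_uri):
    """SPARQL 1.1 query counting the distinct works linked from an author."""
    return (
        "SELECT (COUNT(DISTINCT ?Obras) AS ?total) WHERE { "
        f"<{author_uri}> <{IS_CREATOR_OF}> ?Obras }}"
    )


def request_url(endpoint, query):
    """Encode the query as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


query = count_works_query(CERVANTES)
url = request_url(ENDPOINT, query)
```

Both URIs are wrapped in angle brackets in the generated query, which SPARQL requires for full IRIs.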
Table of contents
1. The concept
2. Foundations
3. The process
4. Examples
• Libraries: http://datos.bne.es
  • http://linkeddata3.dia.fi.upm.es/bne-demo
• Geo: http://geo.linkeddata.es/
• Meteorology: http://aemet.linkeddata.es/
• Travelling: http://webenemasuno.linkeddata.es/
[Figure: screenshot of a map-based web interface with selectable layers (Capas), including El Viajero travel guides, and an observation panel for an AEMET weather station.]
Station MADRID, RETIRO, 21:40 26/5/2011
• Mean wind direction: 276 degrees
• Wind run: 13 Hm
• Mean wind speed: 2.2 m/s
• Direction of maximum wind speed: 251 degrees
• Air temperature: 18.5 degrees C
• Relative humidity: 75 %
• Dew point temperature: 13.9 degrees C
• Maximum wind speed: 4.7 m/s
• Precipitation: 0 litres/m2
• Pressure: 938.4 hPa
• Pressure reduced to sea level: 1013.6 hPa
El Viajero entries visible on the map: Las chicas de Antón Martín; Reflejos versallescos en; Un paseo por Madrid; Visitando El Escorial
There is no One-Size-Fits-All Formula
[Table: vocabularies and tools used in each phase for the BNE, IGN, AEMET, PRISA and INE datasets. Values per phase:]
• Modelling: Scovo, Data Cube, SSN ontology, SIOC, DC, hydrOntology, WGS84, Time
• RDF generation: MARiMbA, geometry2rdf, NOR2O, CSV parser
• Links generation: Silk, NOR2O; link targets: DNB, VIAF, LIBRIS, DBpedia, GeoNames, Geolinkeddata.es
• Publication: sitemap4rdf, Pubby
• Exploitation: map4rdf, SPARQL
Lessons learnt
• URI
  • Follow existing design guidelines for new URIs
  • Reuse existing URIs from authoritative sources
• Models
  • Reuse existing models when available
  • Create new models from authoritative sources
  • Do not forget to align your model with existing models
• Generation
  • Vertical domains usually require specific tools for generation
• Link
  • Generic link discovery tools perform well in vertical domains
  • Link to other data sets using
    • Equivalence links (sameAs), e.g. Dbpedia:cervantes sameAs bne:Cervantes
    • Typed links, e.g. Person birthPlace Municipality
• Discovery
  • Use sitemap4rdf to allow search engines to find your data
• Use an iterative-incremental life cycle in your development
Learn about Linked Data with UPM official courses in one week