Methodological Guidelines for Publishing Linked Data
Boris Villazón-Terrazas, Oscar Corcho
Facultad de Informática, Universidad Politécnica de Madrid,
Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid
http://www.oeg-upm.net
{bvillazon,ocorcho}@fi.upm.es
Phone: 34.91.3366605, Fax: 34.91.3524819
Slides available at: http://www.slideshare.net/boricles/
Acknowledgements: Asunción Gómez-Pérez, Luis M. Vilches, Victor Saquicela, Alexander de León, and many others that we may have omitted.
Work distributed under the license Creative Commons Attribution-Noncommercial-Share Alike 3.0
Main References
Wood, David (Ed) Linking Government Data - 2011
Methodological Guidelines for Publishing Government Linked Data
Boris Villazón-Terrazas, Luis M. Vilches, Oscar Corcho, Asunción Gómez-Pérez
Best Practices for Publishing Linked Data
W3C Editor’s Draft – Government Linked Data Working Group
Michael Hausenblas, Bernadette Hyland, Boris Villazón-Terrazas
https://dvcs.w3.org/hg/gld/raw-file/bcb72f87b5cc/bp/index.html
Cookbook for Open Government Linked Data
W3C Editor’s Draft – Government Linked Data Working Group
Bernadette Hyland, Boris Villazón-Terrazas, Sarven Capadisli
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Guidelines for Publishing Linked Data
• The process of publishing Linked Data has an iterative incremental life cycle model.
• The guidelines are based on our experience in the production of Linked Data in several governmental contexts, and have been applied in real case scenarios.
Specification
• Identification and analysis of the data sources
• URI design
• Definition of the license
Specification: Identification and analysis of the data sources
We have to distinguish
• Open and publish data that government agencies have not yet opened up and published
  • Task that may require contacting specific government data owners to get access to their legacy data
• Reuse and leverage data already opened up and published by government agencies
  • Task to look for these data in public government catalogs
    • Open Government Data
    • datacatalogs.org
    • Open Government Catalog
Specification: Identification and analysis of the data sources
After we have identified and selected the government data sources
• Search and compile all the available data and documentation about those resources
• Identify the schema of those resources, including conceptual components and their relationships
• Identify the items in the domain, i.e., things whose properties and relations are described in the data sources
Specification: GeoLinkedData – Identification of the data sources
• IGN (National Geographic Institute of Spain)
  • Agreement with the IGN
  • Oracle & MySQL
• INE (National Statistics Institute of Spain)
  • Data sources available in a public data catalog
Specification: URI Design
• Use meaningful URIs, instead of opaque URIs, when possible
• Separate TBox (ontology model) from ABox (instances) URIs
• Base URI
  http://data.gov.bo/
  http://health.data.gov.bo/
• TBox URIs
  http://data.gov.bo/ontology/{class|property}
• ABox URIs
  http://data.gov.bo/resource/
  http://data.gov.bo/resource/province/Tiraque
Specification: GeoLinkedData - URI design
• Base URI
  http://linkeddata.es/
  http://geo.linkeddata.es/
• TBox URIs
  http://geo.linkeddata.es/ontology/{concept|property}
  http://geo.linkeddata.es/ontology/Provincia
• ABox URIs
  http://geo.linkeddata.es/resource/{r. type}/{r. name}
  http://geo.linkeddata.es/resource/Provincia/Madrid
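The TBox/ABox URI patterns above can be sketched as two small helper functions. This is a minimal sketch, not the authors' tooling: the function names are hypothetical, and replacing spaces with underscores before percent-encoding is an assumption that the slides do not state.

```python
from urllib.parse import quote

BASE_URI = "http://geo.linkeddata.es/"  # base URI shown on the slide


def tbox_uri(term, base=BASE_URI):
    """Build the URI of an ontology term (concept or property)."""
    return f"{base}ontology/{quote(term, safe='')}"


def abox_uri(resource_type, name, base=BASE_URI):
    """Build the URI of an instance as resource/{type}/{name}."""
    # Assumption: spaces become underscores; other characters are percent-encoded.
    slug = quote(name.strip().replace(" ", "_"), safe="")
    return f"{base}resource/{quote(resource_type, safe='')}/{slug}"


if __name__ == "__main__":
    print(tbox_uri("Provincia"))
    print(abox_uri("Provincia", "Madrid"))
```

Keeping the two builders separate mirrors the recommendation to separate ontology URIs from instance URIs.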
Specification: Definition of the license
• Several possibilities
  • The UK Open Government License
  • Open Database License
  • Public Domain Dedication and License
  • Open Data Commons Attribution License
  • The Creative Commons Licenses
It is also possible to reuse and apply an existing license of the government data sources.
Specification: GeoLinkedData - Definition of the license
• Reusing the original license of the government data sources. IGN and INE data sources have their own license, similar to the Attribution-Share Alike 2.5 Generic License
  http://creativecommons.org/licenses/by-sa/2.5/
Modelling: Ontology
• An ontology is an engineering artifact, which provides:
  • A set of terms
  • A set of explicit assumptions regarding the intended meaning of the terms
  • Almost always including concepts and their classification
  • Almost always including properties between concepts
• Shared understanding of a domain of interest
• Ontologies expressed in OWL or RDF(S), both based on RDF
Modelling: Reuse available vocabularies
• Search for suitable vocabularies (Linked Open Vocabularies)
• Are there suitable vocabularies?
  • Yes: build the vocabulary by reusing available vocabularies
  • No: …
Modelling: Reuse available non-ontological resources
• Search for suitable non-ontological resources
  • Highly reliable Web sites
  • Domain-related sites
  • Government catalogs
• Are there suitable resources?
  • Yes: build the vocabulary by transforming available resources
  • No: build the vocabulary from scratch
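The two decision flows above can be read as one chain: reuse vocabularies if any fit, otherwise transform non-ontological resources, otherwise start from scratch. The sketch below encodes that reading; chaining the two flowcharts this way is an assumption, and the names `Strategy` and `choose_strategy` are hypothetical.

```python
from enum import Enum


class Strategy(Enum):
    REUSE_VOCABULARIES = "build the vocabulary by reusing available vocabularies"
    TRANSFORM_RESOURCES = "build the vocabulary by transforming available resources"
    FROM_SCRATCH = "build the vocabulary from scratch"


def choose_strategy(suitable_vocabularies, suitable_resources):
    """Pick a modelling strategy from the results of the two searches.

    Both arguments are lists of candidates judged suitable by the modeller;
    an empty list means the corresponding search found nothing usable.
    """
    if suitable_vocabularies:
        return Strategy.REUSE_VOCABULARIES
    if suitable_resources:
        return Strategy.TRANSFORM_RESOURCES
    return Strategy.FROM_SCRATCH


if __name__ == "__main__":
    print(choose_strategy([], ["a government catalog"]).value)
```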
Modelling: GeoLinkedData
[Figure: vocabularies in the GeoLinkedData model]
• WGS84 Geo Positioning: an RDF vocabulary
• scv:Dimension, scv:Item, scv:Dataset
• Hydrographical phenomena (rivers, lakes, etc.)
• Vocabulary for instants, intervals, durations, etc.
• Ontology for OGC Geography Markup Language
• Names and international code systems for territories and groups
Classes: 33, Object Properties: 44, Data Properties: 318
http://neon-toolkit.org/
Generation: Transformation
• Take the data sources selected in the specification activity and transform them to RDF according to the vocabulary created in the modelling activity
• Some tools
  • CSV and spreadsheets
    • RDF extension of Google Refine, XLWrap, RDF123, NOR2O
  • RDB
    • D2R Server, ODEMapster, W3C RDB2RDF WG – R2RML
  • XML
    • GRDDL, ReDeFer
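To make the transformation step concrete, here is a minimal sketch that turns CSV rows into N-Triples with the Python standard library only. It is not how NOR2O or the other listed tools work. The base URI, the column names, and the sample row are hypothetical.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical base URI, not from the slides
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(value):
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def csv_to_ntriples(text, resource_type, key_column):
    """Return N-Triples lines: one typed, labelled resource per CSV row."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        key = row[key_column].strip()
        slug = quote(key.replace(" ", "_"), safe="")
        subject = f"<{BASE}resource/{resource_type}/{slug}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{BASE}ontology/{resource_type}> .")
        lines.append(f'{subject} <{RDFS_LABEL}> "{escape_literal(key)}" .')
        for column, value in row.items():
            # Skip the key column, empty cells, and overflow cells without a header.
            if column is None or column == key_column or not value:
                continue
            predicate = f"<{BASE}ontology/{quote(column, safe='')}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


if __name__ == "__main__":
    sample = "name,code\nMadrid,28\n"  # illustrative row only
    print("\n".join(csv_to_ntriples(sample, "Provincia", "name")))
```

Every non-key column becomes a plain string literal here. A real mapping would assign datatypes and reuse properties from the vocabulary built in the modelling activity.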
Generation: GeoLinkedData - Transformation
[Figure: tools applied to each data source]
• INE: NOR2O
• IGN: ODEMapster
• IGN geospatial column: Geometry2RDF
Generation: GeoLinkedData - Transformation
• R2O is an extensible, fully declarative language to describe mappings between relational database schemas and ontologies.
• The ODEMapster processor generates RDF instances from relational instances based on the mapping description expressed in the R2O document
www.oeg-upm.net/index.php/en/downloads/9-r2o-odempaster
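The idea of a declarative mapping driving the generation of RDF instances from relational rows can be illustrated as follows. This is not R2O syntax and not ODEMapster: the mapping is a plain Python dictionary, the table and column names are hypothetical, and an in-memory SQLite database stands in for the real sources.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base URI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical declarative mapping: which table feeds which class,
# which column identifies a row, and which columns become properties.
MAPPING = {
    "table": "province",
    "class": BASE + "ontology/Provincia",
    "subject_prefix": BASE + "resource/Provincia/",
    "key": "name",
    "columns": {"code": BASE + "ontology/code"},
}


def rows_to_triples(conn, mapping):
    """Generate (subject, predicate, object) tuples from a mapped table."""
    columns = [mapping["key"], *mapping["columns"]]
    # Identifiers come from the trusted mapping, never from user input.
    query = f'SELECT {", ".join(columns)} FROM {mapping["table"]}'
    triples = []
    for row in conn.execute(query):
        record = dict(zip(columns, row))
        subject = mapping["subject_prefix"] + str(record[mapping["key"]])
        triples.append((subject, RDF_TYPE, mapping["class"]))
        for column, prop in mapping["columns"].items():
            if record[column] is not None:
                triples.append((subject, prop, str(record[column])))
    return triples


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE province (name TEXT, code TEXT)")
    conn.execute("INSERT INTO province VALUES ('Madrid', '28')")
    for triple in rows_to_triples(conn, MAPPING):
        print(triple)
```

Because the mapping is data rather than code, changing the target ontology only requires editing the dictionary, which is the main appeal of a declarative mapping language.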
Generation: GeoLinkedData - Transformation
• Creation of the R2O Mappings
Generation: GeoLinkedData - Transformation
• Tool for generating RDF from geometrical information
• The geometry could be available in GML or WKT
• The RDF generated follows our Geometry Model
http://www.oeg-upm.net/index.php/en/downloads/151-geometry2rdf
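As a rough illustration of turning WKT into RDF, the sketch below handles only a WKT `POINT`. It does not reproduce Geometry2RDF or the authors' Geometry Model, which these slides do not show. Instead it assumes the W3C WGS84 Geo vocabulary (`geo:lat`, `geo:long`) and assumes that the WKT coordinates are ordered longitude, then latitude.

```python
import re

GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Matches e.g. "POINT (-3.7 40.4)"; other geometry types are out of scope.
POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def wkt_point_to_triples(subject, wkt):
    """Convert a WKT POINT into WGS84 Geo triples for the given subject URI."""
    match = POINT_RE.match(wkt)
    if match is None:
        raise ValueError(f"not a WKT POINT: {wkt!r}")
    x, y = match.groups()
    # Assumption: x is the longitude and y is the latitude.
    return [
        (subject, RDF_TYPE, GEO + "Point"),
        (subject, GEO + "lat", y),
        (subject, GEO + "long", x),
    ]


if __name__ == "__main__":
    # Hypothetical subject URI and coordinates, for illustration only.
    for triple in wkt_point_to_triples("http://example.org/resource/p1", "POINT (-3.7 40.4)"):
        print(triple)
```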
Generation: GeoLinkedData - Transformation
Oracle SDO_UTIL package

SELECT TO_CHAR(SDO_UTIL.TO_GML311GEOMETRY(geometry)) AS Gml311Geometry
FROM "BCN200"."BCN200_0301L_RIO" c
WHERE c.Etiqueta = 'Arroyo'
Generation: Data Cleansing
• To find possible errors, identified by Hogan et al.
  • HTTP-level issues, such as accessibility and dereferenceability, e.g., HTTP URIs return 40x/50x errors
  • reasoning issues, such as namespace without vocabulary, e.g., rss:item term invented
  • malformed/incompatible datatypes, e.g., “true” as xsd:int
• To fix the identified errors
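The malformed-datatype case ("true" as xsd:int) can be detected with a simple lexical check. The sketch below covers only a handful of XSD datatypes and skips whitespace normalisation. It is an illustration of the kind of check involved, not the procedure used by Hogan et al., and the function name is hypothetical.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Lexical forms for a few common datatypes (simplified; no whitespace handling).
LEXICAL_FORMS = {
    XSD + "int": re.compile(r"[+-]?\d+"),
    XSD + "integer": re.compile(r"[+-]?\d+"),
    XSD + "decimal": re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)"),
    XSD + "boolean": re.compile(r"true|false|1|0"),
}


def is_valid_literal(lexical, datatype):
    """Return False when the lexical form cannot belong to the datatype."""
    pattern = LEXICAL_FORMS.get(datatype)
    if pattern is None:
        return True  # datatypes outside the table are not checked
    if not pattern.fullmatch(lexical):
        return False
    if datatype == XSD + "int":
        # xsd:int is restricted to the 32-bit signed range.
        return -2**31 <= int(lexical) <= 2**31 - 1
    return True


if __name__ == "__main__":
    print(is_valid_literal("true", XSD + "int"))
    print(is_valid_literal("true", XSD + "boolean"))
```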
Generation: GeoLinkedData – Data Cleansing
• Errors
  • Some resources with the same name were mixed up. For example, Granada municipality belongs to Granada province, and La Granada municipality belongs to Barcelona province.
  • Autonomous communities that have only one province, e.g., Murcia Region, were missing some municipalities, while their corresponding provinces, e.g., Murcia Province, have the correct number of municipalities.
  • Some hydrographical resources were missing parts of their geometrical information.
Generation: Linking
• Identify suitable data sets as linking targets
  http://ckan.net
• Discover relationships between data items
  • Silk Framework: http://www4.wiwiss.fu-berlin.de/bizer/silk/
  • LIMES: http://aksw.org/Projects/limes
• Validate the relationships discovered
  • sameAs Validator: http://oegdev.dia.fi.upm.es:8080/sameAs/
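The discovery step can be sketched as label matching with a similarity threshold. This is a deliberately naive stand-in for Silk or LIMES, which use much richer link specifications. The URIs, labels, and the 0.9 threshold below are hypothetical, and the candidates produced would still need the validation step.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalize(label):
    """Lower-case a label and collapse runs of whitespace."""
    return " ".join(label.casefold().split())


def discover_links(source, target, threshold=0.9):
    """Propose owl:sameAs candidates between two {uri: label} dictionaries.

    For every source resource the best-scoring target is kept, provided its
    string similarity reaches the threshold.
    """
    links = []
    for s_uri, s_label in source.items():
        best_uri, best_score = None, 0.0
        for t_uri, t_label in target.items():
            score = SequenceMatcher(None, normalize(s_label), normalize(t_label)).ratio()
            if score > best_score:
                best_uri, best_score = t_uri, score
        if best_uri is not None and best_score >= threshold:
            links.append((s_uri, OWL_SAME_AS, best_uri, best_score))
    return links


if __name__ == "__main__":
    source = {"http://example.org/a/Madrid": "Madrid",
              "http://example.org/a/Granada": "Granada"}
    target = {"http://example.org/b/1": "madrid",
              "http://example.org/b/2": "La Granada"}
    for link in discover_links(source, target):
        print(link)
```

A high threshold keeps near-homonyms such as Granada and La Granada apart, which is exactly the kind of confusion a validator is meant to catch.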
Generation: GeoLinkedData - Linking
[Figure: GeoLinkedData resources linked to GeoNames and DBpedia]
http://geo.linkeddata.es/.../Madrid
http://sws.geonames.org/6355233/
http://dbpedia.org/resource/Madrid
Generation: GeoLinkedData - Linking
http://oegdev.dia.fi.upm.es:8080/sameAs/
Publication: Dataset Publication
• Tools for storing RDF
  • Virtuoso Universal Server, Jena, Sesame, 4Store, YARS, OWLIM
• SPARQL endpoint and Linked Data frontend
  • Pubby, Talis Platform, Fuseki
Publication: Metadata Publication
• VoID allows expressing metadata about RDF datasets
• Open Provenance Model
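A VoID description is itself a small RDF document. The sketch below builds one in Turtle as a string. The VoID and Dublin Core terms used (`void:Dataset`, `void:sparqlEndpoint`, `void:triples`, `dcterms:title`, `dcterms:license`) are real, but the dataset URI, endpoint URL, and triple count in the usage example are hypothetical placeholders.

```python
def void_description(dataset_uri, title, sparql_endpoint, license_uri, triples):
    """Return a minimal VoID description of a dataset in Turtle syntax."""
    safe_title = title.replace("\\", "\\\\").replace('"', '\\"')
    return "\n".join([
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f'    dcterms:title "{safe_title}" ;',
        f"    dcterms:license <{license_uri}> ;",
        f"    void:sparqlEndpoint <{sparql_endpoint}> ;",
        f"    void:triples {int(triples)} .",
    ])


if __name__ == "__main__":
    print(void_description(
        dataset_uri="http://example.org/dataset",        # hypothetical
        title="Example dataset",                         # hypothetical
        sparql_endpoint="http://example.org/sparql",     # hypothetical
        license_uri="http://creativecommons.org/licenses/by-sa/2.5/",
        triples=1000,                                    # placeholder count
    ))
```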
Publication: Dataset discovery
• Register the dataset into CKAN Registry
• Generate sitemap files for your dataset, by using sitemap4rdf
• Submit the sitemap location to Google and Sindice
http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation
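To show what a sitemap file contains, the sketch below writes a plain sitemap following the standard Sitemaps protocol. It does not reproduce the output of sitemap4rdf, whose format is not described in these slides, and the resource URL in the usage example is hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    # Serialise the sitemap namespace as the default namespace.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap(["http://example.org/resource/Provincia/Madrid"]))
```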
Publication: GeoLinkedData – Dataset publication
[Figure: publication stack serving SPARQL, Linked Data, and HTML]
• Pubby 0.3, including provenance support
  http://www4.wiwiss.fu-berlin.de/pubby/
• Virtuoso 6.1.0
Exploitation: GeoLinkedData
map4rdf: http://oegdev.dia.fi.upm.es/projects/map4rdf/
• Google Maps viewer of RDF resources
• Resources with spatial information
• Extensible with Google plugins
• Used in other applications like Aemet, GoodRelations
[Figure: map4rdf queries the triplestore via SPARQL]