Methodological Guidelines for Publishing Linked Data

Boris Villazón-Terrazas, Oscar Corcho
Facultad de Informática, Universidad Politécnica de Madrid
Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid
http://www.oeg-upm.net
{bvillazon,ocorcho}@fi.upm.es
Phone: 34.91.3366605, Fax: 34.91.3524819

Slides available at: http://www.slideshare.net/boricles/

Acknowledgements: Asunción Gómez-Pérez, Luis M. Vilches, Victor Saquicela, Alexander de León, and many others that we may have omitted.

Work distributed under the license Creative Commons Attribution-Noncommercial-Share Alike 3.0

Main References

Wood, David (Ed) Linking Government Data - 2011

Methodological Guidelines for Publishing Government Linked Data

Boris Villazón-Terrazas, Luis M. Vilches, Oscar Corcho, Asunción Gómez-Pérez

Best Practices for Publishing Linked Data

W3C Editor’s Draft – Government Linked Data Working Group

Michael Hausenblas, Bernadette Hyland, Boris Villazón-Terrazas

https://dvcs.w3.org/hg/gld/raw-file/bcb72f87b5cc/bp/index.html

Cookbook for Open Government Linked Data

W3C Editor’s Draft – Government Linked Data Working Group

Bernadette Hyland, Boris Villazón-Terrazas, Sarven Capadisli

http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Guidelines for Publishing Linked Data

• The process of publishing Linked Data has an iterative incremental life cycle model.

• The guidelines are based on our experience in the production of Linked Data in several governmental contexts, and have been applied in real case scenarios.


Specification

• Identification and analysis of the data sources
• URI design
• Definition of the license

Specification: Identification and analysis of the data sources

We have to distinguish between two cases:

• Opening and publishing data that government agencies have not yet opened up and published
  • This task may require contacting specific government data owners to get access to their legacy data
• Reusing and leveraging data already opened up and published by government agencies
  • This task consists of looking for these data in public government catalogs
    • Open Government Data
    • datacatalogs.org
    • Open Government Catalog

Specification: Identification and analysis of the data sources

After we have identified and selected the government data sources:

• Search and compile all the available data and documentation about those resources
• Identify the schema of those resources, including conceptual components and their relationships
• Identify the items in the domain, i.e., things whose properties and relations are described in the data sources

Specification: GeoLinkedData – Identification of the data sources

• IGN (National Geographic Institute of Spain)
  • Agreement with the IGN
  • Oracle & MySQL
• INE (National Statistic Institute of Spain)
  • Data sources available in a public data catalog

Specification: GeoLinkedData – Analysis of the data sources

[Figure: sample of a data source with the columns Province, Year, and Industry Production Index]

Specification: URI Design

• Use meaningful URIs, instead of opaque URIs, when possible
• Separate TBox (ontology model) URIs from ABox (instances) URIs
  • Base URI
    http://data.gov.bo/
    http://health.data.gov.bo/
  • TBox URIs
    http://data.gov.bo/ontology/{class|property}
  • ABox URIs
    http://data.gov.bo/resource/
    http://data.gov.bo/resource/province/Tiraque
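The URI patterns above can be sketched as two small helper functions. This is only an illustration: the function names, and the choice to replace spaces with underscores before percent-encoding, are assumptions and not part of the guidelines.

```python
from urllib.parse import quote

BASE_URI = "http://data.gov.bo/"  # base URI taken from the slide


def tbox_uri(term: str) -> str:
    """Build the URI of an ontology class or property (TBox)."""
    return f"{BASE_URI}ontology/{quote(term, safe='')}"


def abox_uri(resource_type: str, name: str) -> str:
    """Build the URI of an instance (ABox): resource/{type}/{name}."""
    # Assumption: spaces become underscores so the URI stays readable.
    slug = quote(name.strip().replace(" ", "_"), safe="")
    return f"{BASE_URI}resource/{quote(resource_type, safe='')}/{slug}"


print(tbox_uri("Province"))
print(abox_uri("province", "Tiraque"))
```

Keeping both builders on top of one base URI makes the TBox/ABox separation visible in every URI that is minted.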

Specification: GeoLinkedData - URI design

• Base URI
  http://linkeddata.es/
  http://geo.linkeddata.es/
• TBox URIs
  http://geo.linkeddata.es/ontology/{concept|property}
  http://geo.linkeddata.es/ontology/Provincia
• ABox URIs
  http://geo.linkeddata.es/resource/{resource type}/{resource name}
  http://geo.linkeddata.es/resource/Provincia/Madrid

Specification: Definition of the license

• Several possibilities
  • The UK Open Government License
  • Open Database License
  • Public Domain Dedication and License
  • Open Data Commons Attribution License
  • The Creative Commons Licenses

It is also possible to reuse and apply an existing license of the government data sources.

Specification: GeoLinkedData - Definition of the license

• Reusing the original license of the government data sources. IGN and INE data sources have their own license, similar to the Attribution-Share Alike 2.5 Generic License
  http://creativecommons.org/licenses/by-sa/2.5/

Modelling: Ontology

• An ontology is an engineering artifact, which provides:
  • A set of terms
  • A set of explicit assumptions regarding the intended meaning of the terms
    • Almost always including concepts and their classification
    • Almost always including properties between concepts
• Shared understanding of a domain of interest
• Ontologies are expressed in OWL or RDF(S), both based on RDF

Modelling: Reuse available vocabularies

• Search for suitable vocabularies
  • Linked Open Vocabularies
• Are there suitable vocabularies?
  • Yes: build the vocabulary by reusing available vocabularies
  • No: continue with the reuse of non-ontological resources

Modelling: Reuse available non-ontological resources

• Search for suitable non-ontological resources
  • Highly reliable Web sites
  • Domain-related sites
  • Government catalogs
• Are there suitable resources?
  • Yes: build the vocabulary by transforming available resources
  • No: build the vocabulary from scratch

Modelling: GeoLinkedData

[Figure: vocabularies used in the GeoLinkedData model, with the following labels]

• WGS84 Geo Positioning: an RDF vocabulary
• scv:Dataset, scv:Item, scv:Dimension
• hydrographical phenomena (rivers, lakes, etc.)
• Vocabulary for instants, intervals, durations, etc.
• Ontology for OGC Geography Markup Language
• Names and international code systems for territories and groups

• Classes: 33
• Object Properties: 44
• Data Properties: 318

http://neon-toolkit.org/


Generation

• Transformation
• Data cleansing
• Linking

Generation: Transformation

• Take the data sources selected in the specification activity and transform them to RDF according to the vocabulary created in the modelling activity
• Some tools
  • CSV and spreadsheets
    • RDF extension of Google Refine, XLWrap, RDF123, NOR2O
  • RDB
    • D2R Server, ODEMapster, W3C RDB2RDF WG – R2RML
  • XML
    • GRDDL, ReDeFer
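To make the transformation step concrete, here is a minimal sketch that turns CSV rows into N-Triples using only the Python standard library. It does not use any of the tools listed above. The sample values, the observation URI pattern, and the property names (province, year, value) are invented for illustration; only the base URIs follow the GeoLinkedData URI design.

```python
import csv
import io
from urllib.parse import quote

RESOURCE = "http://geo.linkeddata.es/resource/"
ONTOLOGY = "http://geo.linkeddata.es/ontology/"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Invented sample data; real figures would come from the data source.
SAMPLE_CSV = """Province,Year,IndustryProductionIndex
Madrid,2010,98.5
Granada,2010,87.1
"""


def rows_to_ntriples(csv_text):
    """Emit three N-Triples lines per CSV row (hypothetical properties)."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = quote(row["Province"])
        year = row["Year"]
        value = row["IndustryProductionIndex"]
        province = f"<{RESOURCE}Provincia/{name}>"
        # Hypothetical URI pattern for one observation per province and year.
        obs = f"<{RESOURCE}IndustryProductionIndex/{name}_{year}>"
        lines.append(f"{obs} <{ONTOLOGY}province> {province} .")
        lines.append(f'{obs} <{ONTOLOGY}year> "{year}"^^<{XSD}gYear> .')
        lines.append(f'{obs} <{ONTOLOGY}value> "{value}"^^<{XSD}decimal> .')
    return lines


for line in rows_to_ntriples(SAMPLE_CSV):
    print(line)
```

In practice the mapping from columns to properties would be driven by the vocabulary produced in the modelling activity rather than hard-coded.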

Generation: GeoLinkedData - Transformation

• INE: NOR2O
• IGN: ODEMapster
• IGN geospatial column: Geometry2RDF

Generation: GeoLinkedData - Transformation

[Figure: table with the columns Province, Year, and Industry Production Index, transformed with NOR2O]

Generation: GeoLinkedData - Transformation

• R2O is an extensible, fully declarative language to describe mappings between relational database schemas and ontologies.
• The ODEMapster processor generates RDF instances from relational instances based on the mapping description expressed in the R2O document

www.oeg-upm.net/index.php/en/downloads/9-r2o-odempaster
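The idea of a declarative mapping that drives the generation of RDF from relational rows can be sketched as follows. This is not R2O syntax and does not use ODEMapster; the mapping dictionary, the table and column names, and the placeholder population value are all assumptions made for the example, and an in-memory SQLite database stands in for the real one.

```python
import sqlite3

BASE = "http://geo.linkeddata.es/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical declarative mapping: one table to one class,
# one column to one property.
MAPPING = {
    "table": "provincia",
    "class": BASE + "ontology/Provincia",
    "id_column": "nombre",
    "uri_template": BASE + "resource/Provincia/{id}",
    "properties": {"poblacion": BASE + "ontology/poblacion"},
}


def generate_triples(conn, mapping):
    """Read the mapped table and return (subject, predicate, object) tuples."""
    columns = [mapping["id_column"], *mapping["properties"]]
    query = f'SELECT {", ".join(columns)} FROM {mapping["table"]}'
    triples = []
    for row in conn.execute(query):
        subject = mapping["uri_template"].format(id=row[0])
        triples.append((subject, RDF_TYPE, mapping["class"]))
        for value, prop in zip(row[1:], mapping["properties"].values()):
            triples.append((subject, prop, str(value)))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE provincia (nombre TEXT, poblacion INTEGER)")
conn.execute("INSERT INTO provincia VALUES ('Madrid', 100)")  # placeholder value
for triple in generate_triples(conn, MAPPING):
    print(triple)
```

Because the mapping is data rather than code, changing the target vocabulary only requires editing the mapping, which is the main appeal of the declarative approach.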

Generation: GeoLinkedData - Transformation

• Creation of the R2O Mappings

Generation: GeoLinkedData - Transformation

[Figure: excerpt of the R2O document]

Generation: GeoLinkedData - Transformation

• Geometry2RDF: tool for generating RDF from geometrical information
• The geometry could be available in GML or WKT
• The RDF generated follows our Geometry Model

http://www.oeg-upm.net/index.php/en/downloads/151-geometry2rdf
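As a rough illustration of turning WKT into RDF, the sketch below parses a WKT POINT and emits N-Triples. It is not Geometry2RDF and does not follow the authors' Geometry Model, which is not shown in the slides; it uses the lat and long properties of the WGS84 Geo Positioning vocabulary instead. The xsd:double datatype, the "x y" equals "longitude latitude" reading of the WKT, and the sample coordinates are assumptions.

```python
import re

GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"

# Matches "POINT (x y)" with optional sign and decimals.
_POINT = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def wkt_point_to_ntriples(subject_uri, wkt):
    """Convert a WKT POINT into two N-Triples lines (long and lat)."""
    match = _POINT.match(wkt)
    if match is None:
        raise ValueError(f"not a WKT POINT: {wkt!r}")
    # Assumption: the WKT stores x as longitude and y as latitude.
    lon, lat = match.groups()
    return [
        f'<{subject_uri}> <{GEO}long> "{lon}"^^<{XSD_DOUBLE}> .',
        f'<{subject_uri}> <{GEO}lat> "{lat}"^^<{XSD_DOUBLE}> .',
    ]


# Approximate, illustrative coordinates only.
for line in wkt_point_to_ntriples(
    "http://geo.linkeddata.es/resource/Provincia/Madrid", "POINT (-3.7 40.4)"
):
    print(line)
```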

Generation: GeoLinkedData - Transformation

Oracle SDO_UTIL package

SELECT TO_CHAR(SDO_UTIL.TO_GML311GEOMETRY(geometry)) AS Gml311Geometry
FROM "BCN200"."BCN200_0301L_RIO" c
WHERE c.Etiqueta='Arroyo'

Generation: Data Cleansing

• To find possible errors, such as those identified by Hogan et al.
  • HTTP-level issues, such as accessibility and dereferenceability, e.g., HTTP URIs return 40x/50x errors
  • Reasoning issues, such as namespace without vocabulary, e.g., rss:item term invented
  • Malformed/incompatible datatypes, e.g., "true" as xsd:int
• To fix the identified errors
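Two of the error classes above lend themselves to simple automated checks. The sketch below is a simplified illustration, not the procedure used by Hogan et al.: it checks only the lexical pattern of a few XSD datatypes (not, for instance, the 32-bit range of xsd:int) and classifies HTTP status codes without performing any request. The function names are hypothetical.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Lexical patterns for a few datatypes; value ranges are not checked.
LEXICAL_FORMS = {
    XSD + "int": re.compile(r"[+-]?\d+"),
    XSD + "boolean": re.compile(r"true|false|1|0"),
    XSD + "decimal": re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)"),
}


def is_well_formed(lexical, datatype):
    """Return False when the literal does not match its datatype pattern."""
    pattern = LEXICAL_FORMS.get(datatype)
    if pattern is None:
        return True  # datatypes without a pattern are left unchecked
    return pattern.fullmatch(lexical.strip()) is not None


def is_http_error(status):
    """Return True for 4xx and 5xx status codes."""
    return 400 <= status < 600


print(is_well_formed("true", XSD + "int"))      # the example from the slide
print(is_well_formed("true", XSD + "boolean"))
print(is_http_error(404))
```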

Generation: GeoLinkedData – Data Cleansing

• Errors
  • Some resources with the same name were mixed up. For example, Granada municipality belongs to Granada province, and La Granada municipality belongs to Barcelona province.
  • Autonomous communities that only have one province, e.g., Murcia Region, were missing some municipalities, but their corresponding provinces, e.g., Murcia Province, have the correct number of municipalities.
  • Some hydrographical resources were missing parts of their geometrical information.

Generation: Linking

• Identify suitable data sets as linking targets
  http://ckan.net
• Discover relationships between data items
  • Silk Framework: http://www4.wiwiss.fu-berlin.de/bizer/silk/
  • LIMES: http://aksw.org/Projects/limes
• Validate the relationships discovered
  • sameAs Validator: http://oegdev.dia.fi.upm.es:8080/sameAs/
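The discovery step can be illustrated with a naive label-similarity matcher. This is only a sketch: Silk and LIMES use their own link specification languages and far richer metrics, and the 0.9 threshold, the function names, and the Granada and La_Granada URIs below are assumptions chosen for the example.

```python
import unicodedata
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalize(label):
    """Lower-case, strip accents, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", label)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.lower().split())


def discover_links(source, target, threshold=0.9):
    """Return candidate owl:sameAs links with their similarity score."""
    links = []
    for s_uri, s_label in source.items():
        for t_uri, t_label in target.items():
            score = SequenceMatcher(
                None, normalize(s_label), normalize(t_label)
            ).ratio()
            if score >= threshold:
                links.append((s_uri, OWL_SAME_AS, t_uri, round(score, 2)))
    return links


source = {
    "http://geo.linkeddata.es/resource/Provincia/Madrid": "Madrid",
    "http://geo.linkeddata.es/resource/Provincia/Granada": "Granada",
}
target = {
    "http://dbpedia.org/resource/Madrid": "Madrid",
    "http://dbpedia.org/resource/La_Granada": "La Granada",
}
links = discover_links(source, target)
for link in links:
    print(link)
```

With this threshold, "Granada" and "La Granada" are not linked, which is the kind of near-match that the validation step is meant to catch when a looser threshold lets it through.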

Generation: GeoLinkedData - Linking

[Figure: links between a GeoLinkedData resource and the corresponding GeoNames and DBpedia resources]

• GeoLinkedData: http://geo.linkeddata.es/.../Madrid
• GeoNames: http://sws.geonames.org/6355233/
• DBpedia: http://dbpedia.org/resource/Madrid

Generation: GeoLinkedData - Linking

http://oegdev.dia.fi.upm.es:8080/sameAs/

Publication

• Dataset publication
• Metadata publication
• Dataset discovery

Publication: Dataset Publication

• Tools for storing RDF
  • Virtuoso Universal Server, Jena, Sesame, 4Store, YARS, OWLIM
• SPARQL endpoint and Linked Data frontend
  • Pubby, Talis Platform, Fuseki
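Once a SPARQL endpoint is available, clients can query it over HTTP using the SPARQL protocol. The sketch below only builds the GET request and does not send it. The endpoint location http://geo.linkeddata.es/sparql is an assumption, and the query reuses the Provincia class URI from the URI design slide.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://geo.linkeddata.es/sparql"  # assumed endpoint location

QUERY = """
SELECT ?province WHERE {
  ?province a <http://geo.linkeddata.es/ontology/Provincia> .
} LIMIT 10
"""


def build_sparql_request(endpoint, query):
    """Build a SPARQL protocol GET request asking for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
# To run the query against a live endpoint, pass the request to
# urllib.request.urlopen(request) and parse the JSON response.
```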

Publication: Metadata Publication

• VoID allows expressing metadata about RDF datasets
• Open Provenance Model
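A VoID description can be as small as a handful of triples. The following sketch assembles one in Turtle as a plain string. The dataset URI and the SPARQL endpoint URL are assumptions, the license URI is the one referenced earlier in the slides, and the title is assumed to contain no characters that need escaping.

```python
def void_description(dataset_uri, title, sparql_endpoint, license_uri):
    """Return a minimal VoID description of a dataset in Turtle."""
    return "\n".join([
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f'    dcterms:title "{title}" ;',  # assumes no quotes in the title
        f"    dcterms:license <{license_uri}> ;",
        f"    void:sparqlEndpoint <{sparql_endpoint}> .",
    ])


description = void_description(
    "http://geo.linkeddata.es/dataset",   # hypothetical dataset URI
    "GeoLinkedData",
    "http://geo.linkeddata.es/sparql",    # assumed endpoint location
    "http://creativecommons.org/licenses/by-sa/2.5/",
)
print(description)
```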

Publication: Dataset discovery

• Register the dataset into CKAN Registry
  http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation
• Generate sitemap files for your dataset, by using sitemap4rdf
• Submit the sitemap location to Google and Sindice
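For readers unfamiliar with sitemaps, the sketch below builds a plain sitemap file in the sitemaps.org format for a list of resource URIs. It is not the output of sitemap4rdf, whose exact format is not shown in the slides; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(resource_uris):
    """Return a sitemap XML string with one <url><loc> entry per URI."""
    ET.register_namespace("", SITEMAP_NS)  # serialize as default namespace
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for uri in resource_uris:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = uri
    return ET.tostring(urlset, encoding="unicode")


uris = ["http://geo.linkeddata.es/resource/Provincia/Madrid"]
sitemap_xml = build_sitemap(uris)
print(sitemap_xml)
```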

Publication: GeoLinkedData – Dataset publication

[Figure: publication architecture offering SPARQL, Linked Data, and HTML access]

• Pubby 0.3, including provenance support
  http://www4.wiwiss.fu-berlin.de/pubby/
• Virtuoso 6.1.0

Publication: GeoLinkedData – Dataset discovery

Exploitation


Streaming resources

Exploitation: GeoLinkedData

map4rdf: http://oegdev.dia.fi.upm.es/projects/map4rdf/

• Google Maps viewer of RDF resources
• Resources with spatial information
• Extensible with Google plugins
• Used in other applications like Aemet, Goodrelations

[Figure: map4rdf querying the triplestore through SPARQL]

DEMO: http://geo.linkeddata.es/browser

Provinces

Capital of Province

Provinces – Industry Production Index

Beaches
