
Publishing Linked Data – There is no One-Size-Fits-All Formula

Asunción Gómez-Pérez

Facultad de Informática, Universidad Politécnica de Madrid Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid

http://www.oeg-upm.net [email protected]

Acknowledgements: O.Corcho, D. Garijo, D. Vila, L.Vilches, B. Villazón Our partners at: BNE, IGN, …

LOV SYMPOSIUM: LINKING AND OPENING VOCABULARIES. 18th June, 2012

Work distributed under the license Creative Commons Attribution-Noncommercial-Share Alike 3.0

LOV-HIVE Symposium. 18th June 2012

Table of contents

1. The concept
2. Foundations
3. The process
4. Examples
   • Libraries: http://datos.bne.es
   • Geo: http://geo.linkeddata.es/
   • Meteorology: http://aemet.linkeddata.es/
   • Travelling: http://webenemasuno.linkeddata.es/


Complex queries using data from heterogeneous Web pages

Scenario: a Cervantes enthusiast from Germany is visiting Madrid and wants to know more about Cervantes' work and life.

Web pages involved:
• http://www.aemet
• http://www.viaf.org/
• http://www.bne.es/
• http://elviajero.elpais.com/

*Picture attribution: http://commons.wikimedia.org/wiki/User:Gugerell


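Data Integration

[Figure: a graph integrating data about Cervantes from six databases: BNE, VIAF, AEMET, IGN, Prisa and DBpedia. Labels recoverable from the diagram:]

• BNE: M. Cervantes, author, El Quijote; year of publication, 1605; located in, BNE
• VIAF: M. Cervantes, creator, Don Quixote; translated into, Hebrew; year of publication, 1960; located, VIAF
• DBpedia: M. Cervantes, birthPlace, Alcalá de Henares
• IGN: Alcalá de Henares, Same as, Alcalá de Henares
• AEMET: Alcalá de Henares, temperature, 20º
• Prisa: Alcalá de Henares, guide, Tapas Siglo de Oro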


Linked Data: why is it important?

• Facilitate data integration
  • From heterogeneous sources
  • In different formats
  • Different granularity
  • In different languages
  • From different countries

© Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig


Foundations

• Unique identifiers: a URI identifies or names a resource
  • Cervantes: http://datos.bne.es/resource/XX1718747
  • El Quijote: http://datos.bne.es/resource/XX3383563
• RDF(S) models
  • Cervantes is creator of El Quijote
  • Cervantes is a Person (http://iflastandards.info/ns/fr/frbr/frbrer/C1005)
  • El Quijote is a Work (http://iflastandards.info/ns/fr/frbr/frbrer/C1001)
  • Person is creator of Work
• Equivalence links to other datasets (Same As)
  • Cervantes in VIAF: http://viaf.org/viaf/17220427
  • Cervantes in DBpedia: http://dbpedia.org/resource/Miguel_de_Cervantes
• Data navigation
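The statements on this slide can be written down as RDF triples. The following is a minimal sketch using only the Python standard library, not the tooling used by the authors. It assumes that XX3383563 identifies El Quijote, that C1001 is the Work class, and that the frbrer property P2010 (the one used in the SPARQL query later in this deck) expresses "is creator of".

```python
# Standard W3C terms.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# URIs quoted on the slides.
FRBRER = "http://iflastandards.info/ns/fr/frbr/frbrer/"
CERVANTES = "http://datos.bne.es/resource/XX1718747"
QUIJOTE = "http://datos.bne.es/resource/XX3383563"  # assumed to be El Quijote

# Each triple is (subject, predicate, object); every term here is a URI.
TRIPLES = [
    (CERVANTES, RDF_TYPE, FRBRER + "C1005"),   # Cervantes is a Person
    (QUIJOTE, RDF_TYPE, FRBRER + "C1001"),     # El Quijote is a Work (assumed)
    (CERVANTES, FRBRER + "P2010", QUIJOTE),    # is creator of (assumed reading)
    (CERVANTES, OWL_SAME_AS, "http://viaf.org/viaf/17220427"),
    (CERVANTES, OWL_SAME_AS, "http://dbpedia.org/resource/Miguel_de_Cervantes"),
]


def to_ntriples(triples):
    """Serialize URI-only triples in N-Triples syntax, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def equivalents(resource, triples):
    """Return, sorted, the URIs that `resource` is declared owl:sameAs."""
    return sorted(o for s, p, o in triples if s == resource and p == OWL_SAME_AS)


if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
    print(equivalents(CERVANTES, TRIPLES))
```

Following the owl:sameAs objects returned by `equivalents` is what the slide calls data navigation: a client moves from the BNE description of Cervantes to the VIAF and DBpedia descriptions.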


Foundations: aligning models with OWL EquivalentClass

[Figure: two alignment examples. Labels recoverable from the diagram:]

• Person (http://xmlns.com/foaf/0.1/Person) EquivalentClass Person (http://schema.org/Person)
• Person (http://iflastandards.info/ns/fr/frbr/frbrer/C1005) birthPlace Municipality
• Alcalá de Henares (http://dbpedia.org/page/Alcal%C3%A1_de_Henares) is a Municipality (http://dbpedia.org/resource/Municipalities_of_Spain)
• Alcalá de Henares (http://geo.linkeddata.es/resource/Alcalá de Henares) is a Municipio (http://geo.linkeddata.es/ontology/Municipio)
• Municipality EquivalentClass Municipio
• Alcalá de Henares (DBpedia) Same As Alcalá de Henares (geo.linkeddata.es)

Lessons learnt
1. Reuse existing models
2. Align the data and the concepts.
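One way to see what an EquivalentClass alignment buys a data consumer is to treat equivalent classes as one group when asking for the instances of a class. The sketch below is an illustration under assumptions, not an OWL reasoner: it groups classes with a small union-find, and the two `example.org` resources are hypothetical.

```python
# Class URIs taken from the slide.
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
SCHEMA_PERSON = "http://schema.org/Person"

# Pairs declared equivalent; equivalence is symmetric and transitive.
EQUIVALENT_CLASSES = [(FOAF_PERSON, SCHEMA_PERSON)]


def build_canonical_map(pairs):
    """Map every class URI to one representative of its equivalence group."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the lexicographically smaller URI as the representative.
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep
    return {uri: find(uri) for uri in list(parent)}


def instances_of(cls, typed_resources, canonical):
    """Return, sorted, the resources typed with `cls` or an equivalent class."""
    target = canonical.get(cls, cls)
    return sorted(r for r, c in typed_resources if canonical.get(c, c) == target)


# Hypothetical data: two datasets typing people with different vocabularies.
TYPED = [
    ("http://example.org/a/alice", FOAF_PERSON),
    ("http://example.org/b/bob", SCHEMA_PERSON),
]

if __name__ == "__main__":
    canonical = build_canonical_map(EQUIVALENT_CLASSES)
    print(instances_of(FOAF_PERSON, TYPED, canonical))
```

Without the alignment, a query for `foaf:Person` would miss the resource typed with `schema.org/Person`; with it, both are returned.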


3. The process

• Specification
• Modelling
• RDF Generation
• Links Generation
• Publication
• Exploitation


Methodology: Specification

• Data sources analysis
• URI design
• License definition


Identification and selection of data sources

• Spanish Geographical Institute (IGN)
• Spanish Statistical Institute (INE)
• Spanish National Library (BNE)
• Meteorological Office (AEMET)


1. Identification and selection of the data sources

• Spanish Geographical Institute
  • Multilingual (Spanish, Basque, Galician, Catalan)
  • Conceptualization mismatches
  • Granularity (scale concept)
  • Domain vocabulary
    • Hydrographic information: reservoir (embalse), lagoon (albufera), river (río), etc.
    • Transport: dual carriageway (vía desdoblada), railway (ferrocarril), …
    • Administrative units: municipality (municipio)
  • Particularities
    • Longitude and latitude
• Spanish Statistical Institute
  • Monolingual
  • Numerical information
  • Particularities
    • Geo (textual level) and temporal


1. Identification and selection of the data sources: Geographical information

IGN-E


1. Identification and selection of the data sources: Statistical information


Specification

• Records in the MARC 21 format
• 3.9 million bibliographic records
• 4.2 million authority records
• Version: November 2011


URI design

• Meaningful URIs versus opaque URIs
• Separate the TBox (ontology model) from the ABox (instance data)
• Base URI
  http://linkeddata.es/
  http://datos.bne.es/
  http://geo.linkeddata.es/
  http://otalex.linkeddata.es/
• Ontology (TBox URIs)
  http://iflastandards.info/ns/fr/frbr/frbrer/C1005
  http://phenomenontology.linkeddata.es/ontology/{concept|property}
  http://phenomenontology.linkeddata.es/ontology/Municipio
  We use the RDF Data Cube Vocabulary and/or other vocabularies
• Data (ABox URIs)
  http://datos.bne.es/resource/XX1718747
  http://geo.linkeddata.es/resource/{resource type}/{resource name}
  http://geo.linkeddata.es/resource/Municipio/Badajoz
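The two URI templates above can be applied mechanically. The sketch below is a small illustration, assuming that names are percent-encoded when they contain spaces or accented characters; that encoding policy is a choice made for this example and is not stated on the slide.

```python
from urllib.parse import quote

GEO_BASE = "http://geo.linkeddata.es/"


def ontology_uri(base, term):
    """TBox URI following the template {base}ontology/{concept|property}."""
    return f"{base}ontology/{quote(term, safe='')}"


def resource_uri(base, resource_type, name):
    """ABox URI following {base}resource/{resource type}/{resource name}."""
    return (
        f"{base}resource/{quote(resource_type, safe='')}/{quote(name, safe='')}"
    )


if __name__ == "__main__":
    print(ontology_uri(GEO_BASE, "Municipio"))
    print(resource_uri(GEO_BASE, "Municipio", "Badajoz"))
    # Percent-encoding here is an assumption of this sketch.
    print(resource_uri(GEO_BASE, "Municipio", "Alcalá de Henares"))
```

Keeping the `ontology/` and `resource/` path segments apart is what separates the TBox from the ABox in the URI space: the model and the data can then be versioned and served independently.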


Modelling: Ontology

• Ontologies:
  • A set of terms
  • A set of explicit assumptions regarding the intended meaning of the terms
    • Almost always including concepts and their classification
    • Almost always including properties between concepts
• Shared understanding of a domain of interest
• Ontologies are expressed in OWL or RDF(S), both based on RDF
• The NeOn methodology helps to build ontologies


2. Vocabulary development

• Features
  • Lightweight
    • Taxonomies and a few properties
  • Agreed-upon vocabularies
    • To avoid mapping problems
  • Multilingual
    • Linked data are multilingual
• The NeOn methodology can help to
  • Re-engineer non-ontological resources into ontologies
    • Pros: use domain terminology already agreed upon by domain experts
  • Remove from heavyweight ontologies those features that you don't need
  • Reuse existing vocabularies


The Ontology for BNE: based on IFLA vocabularies


Geolinkeddata ontology

[Figure: ontology network. Legend: ontology, specification, thesaurus. Labels recoverable from the diagram:]

• hydrOntology: hydrographical phenomena (rivers, lakes, etc.)
• FAO Geopolitical ontology: names and international code systems for territories and groups
• WGS84 (W3C vocabulary): WGS84 Geo Positioning, an RDF vocabulary
• GML (GML specification): ontology for OGC Geography Markup Language
• SCOVO (statistics ontology): scv:Dimension, scv:Item, scv:Dataset
• W3C Time (time ontology): vocabulary for instants, intervals, durations, etc.
• Thesauri: UNESCO, EGM / ERM, GeoNames…
• Linking properties: hasStatisticalData, hasLat/Long, hasGeometry, hasLocation/isLocated

Following the INSPIRE (INfrastructure for SPatial InfoRmation in Europe) recommendation: hydrOntology, SCOVO, FAO Geopolitical, WGS84, GML, and Time.

Classes 33 33
Object Properties 44 44
Data Properties 318 318 reused


3. Generation of RDF

• From the data sources
  • Geographic information (databases)
  • Statistical information (.xls)
  • Geospatial information
  • Bibliographic information (MARC 21)
• Different technologies for RDF generation
  • NOR2O (from Excel, XML, text files, …)
  • R2O and ODEMapster (from databases)
  • Geometry2RDF and SPh2RDF (for geo data)
  • Marimba for libraries
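To make the generation step concrete, here is a minimal sketch of turning tabular records into RDF. It is not how NOR2O, R2O or Marimba work internally; it is a generic illustration built on the Python standard library. The CSV columns (`name`, `lang`) are hypothetical, and the percent-encoding of names is an assumption of this example.

```python
import csv
import io
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
MUNICIPIO = "http://geo.linkeddata.es/ontology/Municipio"
RESOURCE_BASE = "http://geo.linkeddata.es/resource/Municipio/"

# Hypothetical input; the column names are invented for this example.
SAMPLE_CSV = "name,lang\nBadajoz,es\nAlcalá de Henares,es\n"


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def rows_to_ntriples(csv_text):
    """Emit a type triple and a language-tagged label triple per CSV row."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = RESOURCE_BASE + quote(row["name"], safe="")
        label = escape_literal(row["name"])
        lang = row["lang"]
        lines.append(f"<{subject}> <{RDF_TYPE}> <{MUNICIPIO}> .")
        lines.append(f'<{subject}> <{RDFS_LABEL}> "{label}"@{lang} .')
    return lines


if __name__ == "__main__":
    print("\n".join(rows_to_ntriples(SAMPLE_CSV)))
```

Each source format on the slide (databases, spreadsheets, geometries, MARC 21) needs its own reader in place of `csv.DictReader`, which is why the deck lists a different tool per source.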


Libraries: Marimba uses the ontology to generate RDF


Marimba links with other resources: VIAF, DNB, SUDOC, LIBRIS, DBpedia


http://datos.bne.es/resource/XX1718747 is linked with Same As to:

• VIAF: http://viaf.org/viaf/17220427
• DNB: http://d-nb.info/gnd/11851993X
• SUDOC: http://www.idref.fr/026774771/id
• LIBRIS: http://libris.kb.se/resource/auth/45369
• DBpedia: http://dbpedia.org/resource/Miguel_de_Cervantes


Publication

• Data publication
• Metadata publication using VoID
• To facilitate discovery:
  • Register your dataset in CKAN
  • Use sitemap4rdf to generate the sitemap
  • Upload the sitemap to Google and Sindice
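A VoID description is itself a small RDF document. The sketch below builds one in Turtle with plain string formatting; the VoID and Dublin Core terms used (`void:Dataset`, `void:uriSpace`, `void:vocabulary`, `dcterms:title`) are real, but the dataset URI and title passed in the usage example are hypothetical and not taken from the slides.

```python
def escape_literal(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def void_description(dataset_uri, title, uri_space, vocabularies):
    """Return a minimal VoID description of a dataset in Turtle syntax."""
    if not vocabularies:
        raise ValueError("at least one vocabulary URI is required")
    vocab_list = ", ".join(f"<{v}>" for v in vocabularies)
    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f'    dcterms:title "{escape_literal(title)}" ;',
        f'    void:uriSpace "{escape_literal(uri_space)}" ;',
        f"    void:vocabulary {vocab_list} .",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        void_description(
            dataset_uri="http://datos.bne.es/void#dataset",  # hypothetical
            title="datos.bne.es",                            # hypothetical
            uri_space="http://datos.bne.es/resource/",
            vocabularies=["http://iflastandards.info/ns/fr/frbr/frbrer/"],
        )
    )
```

A real description would usually also state the license chosen in the specification phase and the access points of the dataset; they are left out here because the slides do not give them.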


Exploitation

• SPARQL queries. Example: count the works of which Cervantes (URI http://datos.bne.es/resource/XX1718747) is author:

  SELECT DISTINCT (COUNT(?Obras) AS ?total) WHERE { <http://datos.bne.es/resource/XX1718747> <http://iflastandards.info/ns/fr/frbr/frbrer/P2010> ?Obras }

• Web interface

http://linkeddata3.dia.fi.upm.es/bne-demo
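A query like the one on this slide is sent to an endpoint as the `query` parameter of an HTTP request, as defined by the SPARQL protocol. The sketch below only builds the request URL and does not contact any server. The endpoint address `http://datos.bne.es/sparql` is an assumption; the slides do not state where the endpoint lives.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://datos.bne.es/sparql"  # assumed address, not from the slides
CERVANTES = "http://datos.bne.es/resource/XX1718747"
IS_AUTHOR = "http://iflastandards.info/ns/fr/frbr/frbrer/P2010"


def count_query(subject, predicate):
    """Build a SPARQL 1.1 query counting the objects of subject/predicate."""
    return (
        "SELECT (COUNT(?work) AS ?total) WHERE { "
        f"<{subject}> <{predicate}> ?work }}"
    )


def query_url(endpoint, query):
    """Encode the query as the `query` parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    url = query_url(ENDPOINT, count_query(CERVANTES, IS_AUTHOR))
    print(url)
    # Round-trip check: the decoded parameter is the original query text.
    print(parse_qs(urlsplit(url).query)["query"][0])
```

To run the query, the URL can be fetched with any HTTP client, typically with an `Accept` header asking for SPARQL results in JSON or XML.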


4. Examples

• Libraries: http://datos.bne.es
  • http://linkeddata3.dia.fi.upm.es/bne-demo
• Geo: http://geo.linkeddata.es/
• Meteorology: http://aemet.linkeddata.es/
• Travelling: http://webenemasuno.linkeddata.es/

[Screenshot: map application with selectable layers, including El Viajero, showing AEMET observations for one weather station and nearby El Viajero travel guides.]

Station MADRID, RETIRO, 21:40, 26/5/2011

• Mean wind direction: 276 degrees
• Wind run: 13 Hm
• Mean wind speed: 2.2 m/s
• Direction of maximum wind speed: 251 degrees
• Air temperature: 18.5 °C
• Relative humidity: 75 %
• Dew point temperature: 13.9 °C
• Maximum wind speed: 4.7 m/s
• Precipitation: 0 litres/m2
• Pressure: 938.4 hPa
• Pressure reduced to sea level: 1013.6 hPa

El Viajero guides shown: Las chicas de Antón Martín; Reflejos versallescos en; Un paseo por Madrid; Visitando El Escorial


There is no One-Size-Fits-All Formula

[Table: vocabularies and tools used in each phase for each dataset. Columns: BNE, IGN, AEMET, PRISA, INE. Rows: Modelling, RDF generation, Links generation, Publication, Exploitation. Values shown:]

• Vocabularies: hydrOntology, WGS84, Time, SCOVO, Data Cube, SSN ontology, SIOC, DC
• RDF generation tools: MARiMbA, geometry2rdf, NOR2O, CSV parser
• Link generation tool: Silk
• Link targets: DNB, VIAF, LIBRIS, DBpedia, GeoNames, Geolinkeddata.es
• Publication tools: sitemap4rdf, Pubby
• Exploitation: map4rdf, SPARQL


Lessons learnt

• URI
  • Follow existing design guidelines for new URIs
  • Reuse existing URIs from authoritative sources
• Models
  • Reuse existing models when available
  • Create new models from authoritative sources
  • Do not forget to align your model with existing models
• Generation
  • Vertical domains usually require specific tools for generation
• Link
  • Generic link discovery tools perform well in vertical domains
  • Link to other data sets using
    • Equivalence links (sameAs), e.g. bne:Cervantes sameAs Dbpedia:cervantes
    • Typed links, e.g. Person birthPlace Municipality
• Discovery
  • Use sitemap4rdf to allow search engines to find your data
• Use an iterative-incremental life cycle in your development
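The idea behind generic link discovery can be sketched in a few lines. This is a deliberately naive stand-in that matches resources on normalized labels; it is not how Silk works, which supports configurable comparison rules and similarity thresholds. All URIs and labels in the example data are hypothetical.

```python
import unicodedata

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalize(label):
    """Lowercase, strip accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", label)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


def discover_links(source, target):
    """Propose sameAs triples for resources whose normalized labels match.

    `source` and `target` map resource URIs to labels.
    """
    index = {}
    for uri, label in target.items():
        index.setdefault(normalize(label), []).append(uri)
    links = []
    for uri, label in source.items():
        for match in index.get(normalize(label), []):
            links.append((uri, OWL_SAME_AS, match))
    return links


# Hypothetical datasets describing the same municipality.
SOURCE = {"http://example.org/geo/1": "Alcalá de Henares"}
TARGET = {
    "http://example.org/other/a": "ALCALA DE  HENARES",
    "http://example.org/other/b": "Badajoz",
}

if __name__ == "__main__":
    for triple in discover_links(SOURCE, TARGET):
        print(triple)
```

Exact label matching produces false positives for homonyms, so proposed links are usually reviewed or combined with other evidence such as type or coordinates before publication.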

Learn about Linked Data with UPM official courses in one week
