Linked Data Basics. Anja Jentzsch, Freie Universität Berlin. 17 April 2012. Tutorial: Practical Cross-Dataset Queries on the Web of Data, WWW2012, Lyon, France.

Linked Data Basics Slot in WWW2012 Tutorial: Practical Cross-Dataset Queries on the Web of Data http://latc-project.eu/events/www2012-tutorial-cross-dataset-queries

Page 1: Linked Data Basics

Linked Data Basics

Anja Jentzsch, Freie Universität Berlin

17 April 2012

Tutorial: Practical Cross-Dataset Queries on the Web of Data

WWW2012, Lyon, France

Page 2: Linked Data Basics

Architecture of the classic Web

[Figure: Web browsers and search engines work over HTML documents A, B, and C, which are connected by hyperlinks.]

Single global document space

Small set of simple standards

1. HTML as document format

2. HTTP URLs as

• globally unique IDs

• retrieval mechanism

3. Hyperlinks to connect everything

Page 3: Linked Data Basics

Web 2.0 APIs and Mashups

No single global data space

Shortcomings

1. APIs have proprietary interfaces

2. Mashups are based on a fixed set of data sources

3. No hyperlinks between data items within different APIs

[Figure: a mashup built on top of four separate Web APIs (A, B, C, and D).]

Page 4: Linked Data Basics

Web APIs slice the Web into Walled Gardens

Image: Bob Jagensdorf, http://flickr.com/photos/darwinbell/, CC-BY

Page 5: Linked Data Basics

Extend the Web with a single global data space

1. by using RDF to publish structured data on the Web

2. by setting links between data items within different data sources

[Figure: Linked Data. Data sources A, B, C, D, and E each publish RDF, and RDF links connect data items across the sources.]

Page 6: Linked Data Basics

Linked Data Principles

Set of best practices for publishing structured data on the Web in accordance with the general architecture of the Web.

1. Use URIs as names for things.

2. Use HTTP URIs so that people can look up those names.

3. When someone looks up a URI, provide useful RDF information.

4. Include RDF statements that link to other URIs so that they can discover related things.

Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html, 2006
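Principles 2 and 3 amount to an ordinary HTTP lookup in which the client asks for RDF rather than HTML. The following is a minimal Python sketch, assuming the server honours an `Accept` header listing RDF media types (a common deployment choice, not something the principles prescribe). `build_lookup_request` is a hypothetical helper; the request is only constructed here, not sent.

```python
from urllib.request import Request


def build_lookup_request(uri: str) -> Request:
    """Prepare an HTTP GET that asks for an RDF serialization of `uri`."""
    # Assumption: the server can answer with Turtle or RDF/XML.
    accept = "text/turtle, application/rdf+xml;q=0.9"
    return Request(uri, headers={"Accept": accept})


request = build_lookup_request("http://dbpedia.org/resource/Berlin")
print(request.get_method(), request.full_url)
print(request.get_header("Accept"))
```

Passing `request` to `urllib.request.urlopen` would perform the actual lookup.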

Page 7: Linked Data Basics

The RDF Data Model

[Figure: RDF graph describing one person.]

pd:chris rdf:type foaf:Person .

pd:chris foaf:name "Chris Bizer" .

pd:chris foaf:based_near dbpedia:Berlin .
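The graph on this slide consists of three subject-predicate-object triples. As a rough illustration, they can be held in a plain Python set; the `Triple` type and the `objects` helper are hypothetical names chosen for this sketch, and literals are marked by surrounding double quotes.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # a prefixed name, or a double-quoted string for a literal


graph = {
    Triple("pd:chris", "rdf:type", "foaf:Person"),
    Triple("pd:chris", "foaf:name", '"Chris Bizer"'),
    Triple("pd:chris", "foaf:based_near", "dbpedia:Berlin"),
}


def objects(graph, subject, predicate):
    """Return the objects of all triples with the given subject and predicate."""
    return sorted(t.obj for t in graph
                  if t.subject == subject and t.predicate == predicate)


print(objects(graph, "pd:chris", "foaf:based_near"))
```

Because the graph is a set, adding the same statement twice has no effect, which matches the RDF view of a graph as a set of triples.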

Page 8: Linked Data Basics

Data Items are identified with HTTP URIs

pd:chris = http://www.bizer.de#chris

dbpedia:Berlin = http://dbpedia.org/resource/Berlin
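Prefixed names such as `pd:chris` are shorthand for full HTTP URIs. A small sketch of the expansion follows; the `pd` and `dbpedia` namespaces are taken from this slide, the `foaf` and `rdf` namespaces are the standard ones, and `expand` is a hypothetical helper.

```python
PREFIXES = {
    "pd": "http://www.bizer.de#",
    "dbpedia": "http://dbpedia.org/resource/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(prefixed_name: str) -> str:
    """Turn a prefixed name like 'pd:chris' into a full URI."""
    prefix, _, local_name = prefixed_name.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix}")
    return PREFIXES[prefix] + local_name


print(expand("pd:chris"))
print(expand("dbpedia:Berlin"))
```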

[Figure: the RDF graph from the previous slide: pd:chris has rdf:type foaf:Person, foaf:name "Chris Bizer", and foaf:based_near dbpedia:Berlin.]

Page 9: Linked Data Basics

Resolving URIs over the Web

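[Figure: resolving dbpedia:Berlin over the Web adds triples about Berlin to the graph.]

pd:chris rdf:type foaf:Person .

pd:chris foaf:name "Chris Bizer" .

pd:chris foaf:based_near dbpedia:Berlin .

dbpedia:Berlin dp:population "3.450.889" .

dbpedia:Berlin skos:subject dp:Cities_in_Germany .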

Page 10: Linked Data Basics

Dereferencing URIs over the Web

[Figure: dereferencing dp:Cities_in_Germany adds further cities to the graph of the previous slide.]

dbpedia:Hamburg skos:subject dp:Cities_in_Germany .

dbpedia:Muenchen skos:subject dp:Cities_in_Germany .
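The last two slides show a client growing its graph by dereferencing the URIs it encounters. The sketch below imitates that behaviour without network access: `FAKE_WEB` is a hypothetical stand-in for the Web that maps each URI to the triples a server might return, and `follow_links` is an illustrative helper, not part of any library.

```python
from collections import deque

# Hypothetical stand-in for the Web: URI -> triples returned on lookup.
FAKE_WEB = {
    "pd:chris": {
        ("pd:chris", "rdf:type", "foaf:Person"),
        ("pd:chris", "foaf:name", '"Chris Bizer"'),
        ("pd:chris", "foaf:based_near", "dbpedia:Berlin"),
    },
    "dbpedia:Berlin": {
        ("dbpedia:Berlin", "dp:population", '"3.450.889"'),
        ("dbpedia:Berlin", "skos:subject", "dp:Cities_in_Germany"),
    },
    "dp:Cities_in_Germany": {
        ("dbpedia:Berlin", "skos:subject", "dp:Cities_in_Germany"),
        ("dbpedia:Hamburg", "skos:subject", "dp:Cities_in_Germany"),
        ("dbpedia:Muenchen", "skos:subject", "dp:Cities_in_Germany"),
    },
}


def follow_links(start, max_lookups=10):
    """Breadth-first lookup of URIs, merging all returned triples."""
    graph, seen, queue = set(), set(), deque([start])
    while queue and len(seen) < max_lookups:
        uri = queue.popleft()
        if uri in seen:
            continue
        seen.add(uri)
        for triple in FAKE_WEB.get(uri, ()):
            graph.add(triple)  # set union removes duplicate statements
            for term in (triple[0], triple[2]):
                if not term.startswith('"'):  # literals cannot be looked up
                    queue.append(term)
    return graph


merged = follow_links("pd:chris")
print(len(merged))
```

A real client would replace the dictionary lookup with an HTTP request and an RDF parser; the merging step stays the same.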

Page 11: Linked Data Basics

RDF

• RDF is just a data model, it requires a serialization format

• For transmission over the network

• For storage as files

• Multiple serialization formats have been defined

• RDF/XML

• Turtle

• N-Triples

• RDFa

• ...

• It’s all triples!

• Syntax doesn’t matter much and can be chosen case-by-case for pragmatic reasons
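To make the "it's all triples" point concrete, here is a simplified sketch that writes triples in the line-based N-Triples syntax. It is deliberately incomplete: language tags, datatypes, and blank nodes are ignored, and `to_ntriples` and `escape_literal` are hypothetical helper names.

```python
def escape_literal(text: str) -> str:
    """Escape the characters that may not appear raw in a quoted literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def to_ntriples(triples):
    """Serialize (subject, predicate, object, object_is_literal) tuples."""
    lines = []
    for subject, predicate, obj, is_literal in triples:
        obj_text = '"' + escape_literal(obj) + '"' if is_literal else f"<{obj}>"
        lines.append(f"<{subject}> <{predicate}> {obj_text} .")
    return "\n".join(lines)


triples = [
    ("http://www.bizer.de#chris", "http://xmlns.com/foaf/0.1/name",
     "Chris Bizer", True),
    ("http://www.bizer.de#chris", "http://xmlns.com/foaf/0.1/based_near",
     "http://dbpedia.org/resource/Berlin", False),
]
print(to_ntriples(triples))
```

The same list of tuples could equally be written as Turtle or RDF/XML; only the surface syntax would change.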

Page 12: Linked Data Basics

Properties of the Web of Linked Data

• Global, distributed data space built on a simple set of standards

• RDF, URIs, HTTP

• Entities are connected by links

• creating a global data graph that spans data sources and

• enables the discovery of new data sources

• Provides for data-coexistence

• Everyone can publish data to the Web of Linked Data

• Everyone can express their personal view on things

• Everybody can use the vocabularies/schema that they like

Page 13: Linked Data Basics

W3C Linking Open Data Project

• Grassroots community effort to

• publish existing open license datasets as Linked Data on the Web

• interlink things between different data sources

Page 14: Linked Data Basics

LOD Data Sets on the Web: May 2007

• 12 data sets

• Over 500 million RDF triples

• Around 120,000 RDF links between data sources

Page 15: Linked Data Basics

LOD Data Sets on the Web: November 2007

• 28 data sets

Page 16: Linked Data Basics

LOD Data Sets on the Web: September 2008

• 45 data sets

• Over 2 billion RDF triples

Page 17: Linked Data Basics

LOD Data Sets on the Web: July 2009

• 95 data sets

• Over 6.5 billion RDF triples

Page 18: Linked Data Basics

LOD Data Sets on the Web: September 2010

• 203 data sets

• Over 24.7 billion RDF triples

• Over 436 million RDF links between data sources

Page 19: Linked Data Basics

LOD Data Sets on the Web: September 2011

• 295 data sets

• Over 31 billion RDF triples

• Over 504 million RDF links between data sources

Page 20: Linked Data Basics

LOD Data Set statistics as of 09/2011

LOD Cloud Data Catalog on CKAN

• http://www.ckan.net/group/lodcloud

More statistics

• http://lod-cloud.net/state/

Page 21: Linked Data Basics

Uptake in the Government Domain

• The EU is pushing Linked Data (LOD2, LATC, Eurostat)

• W3C Government Linked Data (GLD) Working Group

Page 22: Linked Data Basics

Uptake in the Libraries Community

• Institutions publishing Linked Data

• Library of Congress (subject headings)

• German National Library (PND dataset and subject headings)

• Swedish National Library (Libris - catalog)

• Hungarian National Library (OPAC and Digital Library)

• British Library

• Europeana project

Page 23: Linked Data Basics

Uptake in the Libraries Community

• W3C Library Linked Data Incubator Group (2010)

• OKFN Working Group on Bibliographic Data (2010)

• Goals:

• Integrate Library Catalogs on global scale

• Interconnect resources between repositories (by topic, by location, by historical period, by ...)

Page 24: Linked Data Basics

Uptake in the Media Industry

• Publish data as RDF or embed as RDFa

• Goal: Drive traffic to websites via search engines

Page 25: Linked Data Basics

schema.org

• vocabularies jointly proposed by Google, Microsoft (Bing), and Yahoo! for embedding data into HTML pages (Microdata)

• available since June 2011

Page 26: Linked Data Basics

Linked Data Applications

[Figure: a Web of Data in which Things from data sources A, B, C, D, and E are connected by typed links. Applications built on top of it:]

Search Engines

Linked Data Mashups

Linked Data Browsers

Page 30: Linked Data Basics

Lower Data Integration Costs

• Data Publisher

• publishes data as RDF

• sets identity links

• reuses terms or publishes mappings

• Third Parties

• set identity links pointing at your data

• publish mappings to the Web

• Data Consumer

• has to do the rest

• using record linkage and schema matching techniques

The overall data integration effort is split between the data publisher, the data consumer and third parties.
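On the consumer side, identity links from publishers and third parties can be combined so that all URIs for one real-world thing resolve to a single identifier. The sketch below uses a union-find structure for this; it is one possible technique, not a method named in these slides. The `example.org` URI is a hypothetical third-party identifier, and `canonical_ids` is an illustrative helper.

```python
def canonical_ids(same_as_links):
    """Map every URI to one representative per owl:sameAs cluster."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for left, right in same_as_links:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            # Keep the lexicographically smaller URI as the representative.
            parent[max(root_left, root_right)] = min(root_left, root_right)
    return {uri: find(uri) for uri in list(parent)}


links = [
    ("http://dbpedia.org/resource/Berlin", "http://sws.geonames.org/2950159"),
    # Hypothetical link set by a third party:
    ("http://example.org/city/berlin", "http://sws.geonames.org/2950159"),
]
ids = canonical_ids(links)
print(ids["http://example.org/city/berlin"])
```

Treating owl:sameAs as transitive in this way is a simplification; in practice consumers may want to check link quality before merging.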

Page 31: Linked Data Basics

Is your data 5 star?

★ Make your stuff available on the Web (whatever format) under an open license.

★ ★ Make it available as structured data (e.g., Excel instead of image scan of a table) so that it can be reused.

★ ★ ★ Use non-proprietary, open formats (e.g., CSV instead of Excel).

★ ★ ★ ★ Use URIs to identify things, so that people can point at your stuff and serve RDF from it.

★ ★ ★ ★ ★ Link your data to other data to provide context.

Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html, 2010

Page 32: Linked Data Basics

How to publish Linked Data

Tasks:

1. Make data available as RDF via HTTP

2. Set RDF links pointing at other data sources

3. Make your data self-descriptive

4. Reuse common vocabularies

Tom Heath, Christian Bizer: Linked Data: Evolving the Web into a Global Data Space

http://linkeddatabook.com/

Page 33: Linked Data Basics

Make Data available as RDF via HTTP

• Ready-to-use tools (examples)

• D2R Server

• provides for mapping relational databases into RDF and for serving them as Linked Data

• Pubby

• Linked Data Frontend for SPARQL Endpoints

• More tools

• http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/PublishingTools
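The general idea behind mapping a relational database to RDF can be sketched in a few lines: each row becomes a resource and each column value becomes a triple. This is an illustration of the principle only and does not use D2R Server or its mapping language; the `example.org` namespaces, the `city` table, and `rows_to_triples` are all assumptions made for this sketch.

```python
import sqlite3

BASE = "http://example.org/resource/"   # hypothetical resource namespace
VOCAB = "http://example.org/vocab/"     # hypothetical vocabulary namespace


def rows_to_triples(connection, table, key_column):
    """Turn every row of `table` into (subject, predicate, literal) triples."""
    # `table` is interpolated into SQL, so it must come from trusted code.
    cursor = connection.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key_column]}"
        for column, value in record.items():
            if column != key_column and value is not None:
                triples.append((subject, f"{VOCAB}{table}_{column}", str(value)))
    return triples


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT, population INTEGER)")
connection.execute("INSERT INTO city VALUES (1, 'Berlin', 3450889)")
city_triples = rows_to_triples(connection, "city", "id")
for triple in city_triples:
    print(triple)
```

Serving these triples at the subject URIs over HTTP is the second half of the task, which tools such as those listed above take care of.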

Page 34: Linked Data Basics

Set RDF links to other data sources

• Examples of RDF links

<http://dbpedia.org/resource/Berlin> owl:sameAs <http://sws.geonames.org/2950159> .

<http://richard.cyganiak.de/foaf.rdf#cygri> foaf:topic_interest <http://dbpedia.org/resource/Semantic_Web> .

<http://example-bookshop.com/book006251587X> owl:sameAs <http://www4.wiwiss.fu-berlin.de/bookmashup/books/006251587X> .

Page 35: Linked Data Basics

How to generate RDF links?

• Pattern-based approaches

• Exploit naming conventions within URIs (for instance ISBNs, ISINs, …)

• Similarity-based approaches

• Compare items within different data sources using various similarity metrics

• Ready to use tools (Examples)

• Silk Link Discovery Framework

• provides a declarative language for specifying link conditions which may combine different similarity metrics

• More tools

• http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/EquivalenceMining
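Both approaches can be sketched with the Python standard library. The pattern-based function assumes the bookshop URI layout from the earlier RDF link example, with the ISBN at the end of the URI. The similarity-based function compares labels with `difflib.SequenceMatcher`, which stands in for the richer metrics of a framework such as Silk and is not Silk's API. The function names, the 0.9 threshold, and the `example.org` URI are assumptions.

```python
import re
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
ISBN_IN_URI = re.compile(r"book([0-9]{9}[0-9X])$")


def isbn_link(bookshop_uri):
    """Pattern-based: derive the target URI from the ISBN in the source URI."""
    match = ISBN_IN_URI.search(bookshop_uri)
    if match is None:
        return None
    target = ("http://www4.wiwiss.fu-berlin.de/bookmashup/books/"
              + match.group(1))
    return (bookshop_uri, OWL_SAME_AS, target)


def similar_labels(source, target, threshold=0.9):
    """Similarity-based: link items whose labels are nearly identical."""
    links = []
    for source_uri, source_label in source.items():
        for target_uri, target_label in target.items():
            score = SequenceMatcher(
                None, source_label.lower(), target_label.lower()).ratio()
            if score >= threshold:
                links.append((source_uri, OWL_SAME_AS, target_uri))
    return links


book_link = isbn_link("http://example-bookshop.com/book006251587X")
city_links = similar_labels(
    {"http://dbpedia.org/resource/Berlin": "Berlin"},
    {"http://sws.geonames.org/2950159": "berlin",
     "http://example.org/geo/bern": "Bern"},  # hypothetical non-match
)
print(book_link)
print(city_links)
```

Comparing every source item with every target item is quadratic; link discovery tools typically add blocking or indexing to avoid that cost.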

Page 36: Linked Data Basics

Make your Data Self-Descriptive

• Increase the usefulness of your data and ease data integration

• Aspects of self-descriptiveness

• Enable clients to retrieve the schema

• Reuse terms from common vocabularies

• Publish schema mappings for proprietary terms

• Provide provenance metadata

• Provide licensing metadata

• Provide data-set-level metadata using voiD

• Refer to additional access methods using voiD
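As an illustration of data-set-level metadata, the sketch below emits a small voiD description in Turtle that covers licensing, size, and an additional access method (a SPARQL endpoint). The voiD and Dublin Core terms used are part of those vocabularies; the helper `void_description` and every `example.org` URI are hypothetical, and the title is assumed to contain no characters that need escaping.

```python
def void_description(dataset_uri, title, license_uri, sparql_endpoint,
                     triple_count):
    """Return a Turtle document describing one dataset with voiD."""
    return "\n".join([
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f'    dcterms:title "{title}" ;',
        f"    dcterms:license <{license_uri}> ;",
        f"    void:sparqlEndpoint <{sparql_endpoint}> ;",
        f"    void:triples {int(triple_count)} .",
    ])


description = void_description(
    dataset_uri="http://example.org/dataset/cities",
    title="Example city data",
    license_uri="http://creativecommons.org/licenses/by/3.0/",
    sparql_endpoint="http://example.org/sparql",
    triple_count=1200,
)
print(description)
```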

Page 37: Linked Data Basics

Enable Clients to retrieve the Schema

Clients can resolve the URIs that identify vocabulary terms in order to get their RDFS or OWL definitions.

Some data on the Web:

<http://richard.cyganiak.de/foaf.rdf#cygri>
    foaf:name "Richard Cyganiak" ;
    rdf:type <http://xmlns.com/foaf/0.1/Person> .

Resolve unknown term http://xmlns.com/foaf/0.1/Person

RDFS or OWL definition:

<http://xmlns.com/foaf/0.1/Person>
    rdf:type owl:Class ;
    rdfs:label "Person" ;
    rdfs:subClassOf <http://xmlns.com/foaf/0.1/Agent> ;
    rdfs:subClassOf <http://xmlns.com/wordnet/1.6/Agent> .
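Once a client has retrieved the definition of an unknown term, it can use it, for example to conclude that a foaf:Person is also a foaf:Agent. The sketch below applies rdfs:subClassOf transitively; the schema triples are copied from this slide rather than fetched over HTTP, and `inferred_types` is a hypothetical helper.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
FOAF = "http://xmlns.com/foaf/0.1/"
CYGRI = "http://richard.cyganiak.de/foaf.rdf#cygri"

data = {(CYGRI, RDF_TYPE, FOAF + "Person")}

# What a lookup of the foaf:Person URI is assumed to have returned.
schema = {
    (FOAF + "Person", SUBCLASS_OF, FOAF + "Agent"),
    (FOAF + "Person", SUBCLASS_OF, "http://xmlns.com/wordnet/1.6/Agent"),
}


def inferred_types(resource, data, schema):
    """Collect the stated types of `resource` plus all their superclasses."""
    types = {o for s, p, o in data if s == resource and p == RDF_TYPE}
    frontier = list(types)
    while frontier:
        cls = frontier.pop()
        for s, p, o in schema:
            if s == cls and p == SUBCLASS_OF and o not in types:
                types.add(o)
                frontier.append(o)
    return types


for type_uri in sorted(inferred_types(CYGRI, data, schema)):
    print(type_uri)
```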

Page 38: Linked Data Basics

Reuse Terms from Common Vocabularies

• Common Vocabularies

• Friend-of-a-Friend for describing people and their social network

• SIOC for describing forums and blogs

• SKOS for representing topic taxonomies

• Organization Ontology for describing the structure of organizations

• GoodRelations provides terms for describing products and business entities

• Music Ontology for describing artists, albums, and performances

• Review Vocabulary provides terms for representing reviews

• Common sources of identifiers (URIs) for real world objects

• LinkedGeoData and Geonames locations

• GeneID and UniProt life science identifiers

Page 39: Linked Data Basics

Linked Data Sets: Distribution of used vocabularies

Page 40: Linked Data Basics

Conclusion

• Linked Data provides a standardized data access interface

• Linked Data allows for the development of a variety of tools to integrate, enhance, and view the data

• The Web of Data is growing rapidly

• There are active deployment communities in different domains

• Web search is evolving into query answering

• Search engines will increasingly rely on structured data from the Web

Page 41: Linked Data Basics

Thanks

Questions?

References

• Tom Heath, Christian Bizer: Linked Data: Evolving the Web into a Global Data Space. http://linkeddatabook.com/

• Christian Bizer, Tom Heath, Tim Berners-Lee: Linked Data – The Story So Far. http://tomheath.com/papers/bizer-heath-berners-lee-ijswis-linked-data.pdf

• Linking Open Data Project Wiki. http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData

Email: [email protected]

Twitter: @anjeve