1 SAMT 2009 – Tutorial Web of Data in the Context of Multimedia (WoDMM) Graz, Austria - 2 Dec 2009
Web of Data in the Context of Multimedia Part 1: Linked Open Data: Vision, Concepts and Technologies
Bernhard Haslhofer, Bernhard Schandl, Andreas Langegger,
Wolfgang Halb, Tobias Bürger
Agenda
Introduction
Producing Linked Data
Existing Data Sets
Linking Data
Consuming Linked Data
Multimedia Interlinking
Multimedia Annotations
Enriching Personal Media Collections
Introduction: The Web of Data Vision
Web of Data Vision
There is lots of data about the movie “The Shining” available on the Web…
Title: The Shining
Genre: Horror/Thriller
Release Date: 23 May 1980 (USA)
Distributors: Warner Bros. Pictures Distribution
Starring: Jack Nicholson, Shelley Duvall, Danny Lloyd, …
Produced by: Stanley Kubrick
Running Time: 146 min (original), 144 min (cut), 120 min (European cut)
Web of Data Vision
…but only in a human-readable representation (HTML)
Applications can access these data only if
they parse the HTML representations (difficult, error-prone), or
they use data-source-specific APIs (e.g., Web Services), if there are any
Metadata about (multimedia) resources are still locked in closed data silos
Web of Data Vision
The Web is successful because it provides
Uniform encoding (HTML)
Uniform addressing (URI)
Uniform transportation (HTTP)
for the exchange of documents.
Why not apply the same mechanism to the underlying data?
Web of Data Vision
The Web of Data vision is to use the Web to provide access not only to documents but also to the underlying data
The Linked Data Principles
Use URIs as names for things
Use HTTP URIs so that people can look up those names
When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
Include links to other URIs, so that they can discover more things
The Enabling Technologies
URI
Uniform Resource Identifiers (URIs) identify things
Use dereferenceable HTTP URIs in the Web of Data
RDF
A data model for representing metadata on the Web
Several statements (triples) form a graph
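A minimal sketch of how statements add up to a graph, using plain Python tuples rather than an RDF library. The `http://example.org/vocab#` namespace and its property names are hypothetical; the DBpedia resource URIs follow the pattern used elsewhere in this tutorial.

```python
# Each RDF statement is a (subject, predicate, object) triple.
DBR = "http://dbpedia.org/resource/"
EX = "http://example.org/vocab#"  # hypothetical vocabulary namespace

triples = [
    (DBR + "The_Shining_(film)", EX + "title", '"The Shining"'),
    (DBR + "The_Shining_(film)", EX + "producer", DBR + "Stanley_Kubrick"),
    (DBR + "Stanley_Kubrick", EX + "name", '"Stanley Kubrick"'),
]


def to_ntriples(statements):
    """Serialize triples in an N-Triples-like form; literals are pre-quoted."""
    lines = []
    for s, p, o in statements:
        obj = o if o.startswith('"') else f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


def neighbours(statements, node):
    """Outgoing edges of a node: triples sharing a URI connect into a graph."""
    return [(p, o) for s, p, o in statements if s == node]


print(to_ntriples(triples))
print(neighbours(triples, DBR + "The_Shining_(film)"))
```

The object of the second triple is the subject of the third, which is what turns a list of statements into a graph.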
RDF
Links are an intrinsic RDF feature
RDF serialization formats: RDF/XML, N3, Turtle, etc.
RDFS
A language for describing vocabularies in a machine-understandable way
OWL
A more expressive language for describing vocabularies and/or ontologies in a machine-understandable way
SKOS
A language for describing controlled vocabularies
SPARQL
A query language and protocol for accessing RDF data on the Web
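A small sketch of the "protocol" part: a SPARQL query is sent to an endpoint as the `query` parameter of an HTTP request. The endpoint `http://dbpedia.org/sparql` is an assumption (it is not named in these slides), and the code only builds the request URL without sending it.

```python
from urllib.parse import urlencode


def sparql_get_url(endpoint, query):
    """Build the GET URL for a SPARQL protocol request (query in the 'query' parameter)."""
    return endpoint + "?" + urlencode({"query": query})


query = """
SELECT ?p ?o WHERE {
  <http://dbpedia.org/resource/The_Shining_(film)> ?p ?o .
} LIMIT 10
"""

# Assumed endpoint URL; replace with the endpoint of the data set you query.
url = sparql_get_url("http://dbpedia.org/sparql", query)
print(url)
```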
Linked Data Implementation Best Practices
How to publish vocabularies
Hash-based URIs
E.g., http://example.com/example1#ClassA
Suited to group the description of a moderate number of related terms into one document
Agent can retrieve terms with a single HTTP request
Slash-based URIs
E.g., http://example.com/example1/ClassB
Suited to split the descriptions of terms in large vocabularies into one document per term
No need for the agent to download a massive document to find the description of a term
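The difference between the two styles follows from how HTTP clients treat fragments: the part after `#` is stripped before the request is made. The sketch below illustrates this with the standard library; the term names `#ClassB` and `/ClassA` are added here for illustration only.

```python
from urllib.parse import urldefrag


def document_for(term_uri):
    """Return the URL an HTTP client actually requests for a term URI.

    The fragment (after '#') is never sent to the server, so all hash-based
    terms of a vocabulary resolve to the same document.
    """
    return urldefrag(term_uri).url


hash_terms = [
    "http://example.com/example1#ClassA",
    "http://example.com/example1#ClassB",  # hypothetical second term
]
slash_terms = [
    "http://example.com/example1/ClassA",  # hypothetical second term
    "http://example.com/example1/ClassB",
]

print({document_for(t) for t in hash_terms})   # one shared document
print({document_for(t) for t in slash_terms})  # one document per term
```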
How to publish vocabularies
E.g., extended configuration for hash namespace
How to publish Linked Data
Distinguish between
Non-information resource: http://dbpedia.org/resource/The_Shining_%28film%29
Information resources:
http://dbpedia.org/page/The_Shining_%28film%29 (HTML)
http://dbpedia.org/data/The_Shining_%28film%29 (RDF)
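A server-side sketch of this distinction, assuming the DBpedia-style `/resource/`, `/page/` and `/data/` layout shown above. A request for the non-information resource is answered with a 303 redirect to one of the information resources, chosen from the `Accept` header. The function name and the simplified header check are illustrative, not DBpedia's actual implementation.

```python
RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def redirect_for(resource_uri, accept_header):
    """Return (status, location) for a request to a non-information resource URI."""
    if not resource_uri.startswith(RESOURCE_PREFIX):
        return 404, None
    local_name = resource_uri[len(RESOURCE_PREFIX):]
    # Simplified negotiation: a real server would also honour q-values in Accept.
    wants_rdf = "application/rdf+xml" in accept_header
    target = "data" if wants_rdf else "page"
    return 303, f"http://dbpedia.org/{target}/{local_name}"


print(redirect_for(RESOURCE_PREFIX + "The_Shining_%28film%29", "application/rdf+xml"))
print(redirect_for(RESOURCE_PREFIX + "The_Shining_%28film%29", "text/html"))
```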
The Linking Open Data Project
Some clarifications
Open Data: a philosophy, practice, or policy that data are freely available to everyone without restrictions from copyright, patents, and so on
Linked Data: best practices for exposing, sharing, and connecting data using URIs and RDF
Linking Open Data: a W3C community project with the goal of extending the Web with a data commons by publishing various open data sets as RDF on the Web and by setting links between data items from different sources
As of October 2007
Available Tools - Overview
RDF APIs
Jena Semantic Web Framework (Java) http://jena.sourceforge.net/
Sesame (Java) http://www.openrdf.org/
ARC (PHP) http://arc.semsol.org/
Redland RDF – Ruby interface (Ruby) http://librdf.org/docs/ruby.html
RDFLib (Python) http://www.rdflib.net/
Triple Stores
Jena Semantic Web Framework (Java) http://jena.sourceforge.net
Sesame (Java) http://www.openrdf.org/
OpenLink Virtuoso http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/
ARC (PHP) http://arc.semsol.org/
…
References
RDF Primer: http://www.w3.org/TR/rdf-primer/
OWL 2 Primer: http://www.w3.org/TR/2009/REC-owl2-primer-20091027/
Best Practice Recipes for Publishing RDF Vocabularies: http://www.w3.org/TR/swbp-vocab-pub/
How to Publish Linked Data on the Web: http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/
Pedantic Web Group: http://pedantic-web.org/
Linking Open Data Project: http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
Linked Data Publishing Steps
The basic tenets of Linked Data are to:
use the RDF data model to publish structured data on the Web
use RDF links to interlink data from different data sources
RDF data can be
contained in a single file (RDF/XML) or
embedded in an existing file with RDFa
Principles
Resources: all items of interest are called resources
Resource identifiers: Uniform Resource Identifiers (URIs) are used; use of HTTP URIs is strongly suggested
Representation: a representation of an information resource is a stream of bytes (HTML, JPG, RDF, …)
Dereferencing HTTP URIs
Content Negotiation
Example:
http://dbpedia.org/resource/Graz (URI identifying the non-information resource Graz)
http://dbpedia.org/data/Graz (information resource with an RDF/XML representation describing Graz)
http://dbpedia.org/page/Graz (information resource with an HTML representation describing Graz)
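From the client side, content negotiation comes down to the `Accept` header of the request. The sketch below builds such a request with Python's standard library but does not send it; the helper name `rdf_request` is illustrative.

```python
from urllib.request import Request


def rdf_request(uri):
    """Build (but do not send) an HTTP GET request that asks for RDF/XML."""
    return Request(uri, headers={"Accept": "application/rdf+xml"})


req = rdf_request("http://dbpedia.org/resource/Graz")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would send it and follow redirects; with a server set up as described above, the client would be expected to end up at the `/data/` document.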
Content Negotiation with RDFa
When RDF is embedded in another representation (e.g. via RDFa), no content negotiation is needed
One representation is used for both humans and machines
(X)HTML and RDF are combined in one XHTML+RDFa representation
RDFa example
Plain (X)HTML:
... All content on this site is licensed under <a href="http://creativecommons.org/licenses/by/3.0/"> a Creative Commons License </a>.
(X)HTML with RDF embedded:
... All content on this site is licensed under <a rel="license" href="http://creativecommons.org/licenses/by/3.0/"> a Creative Commons License </a>.
More information on RDFa: http://www.w3.org/TR/xhtml-rdfa-primer/
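To illustrate what a machine can read from the annotated link, here is a deliberately small sketch that extracts `rel`/`href` pairs as triples. It is not a conforming RDFa processor (no prefixes, no `about`/`property` handling), and the page URI `http://example.org/page` is hypothetical.

```python
from html.parser import HTMLParser


class RelLinkExtractor(HTMLParser):
    """Collect (subject, predicate, object) for <a rel="..." href="..."> links."""

    def __init__(self, page_uri):
        super().__init__()
        self.page_uri = page_uri
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and "rel" in attributes and "href" in attributes:
            self.triples.append(
                (self.page_uri, attributes["rel"], attributes["href"])
            )


markup = (
    "All content on this site is licensed under "
    '<a rel="license" href="http://creativecommons.org/licenses/by/3.0/">'
    "a Creative Commons License</a>."
)

extractor = RelLinkExtractor("http://example.org/page")  # hypothetical page URI
extractor.feed(markup)
print(extractor.triples)
```

The plain (X)HTML variant without `rel="license"` yields no triples, which is exactly the information the RDFa attribute adds.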
Choosing URIs
Resources are named with URI references
Choose "good" URIs for your resources
Cool URIs for the Semantic Web: http://www.w3.org/TR/cooluris/
Choosing URIs
Use HTTP URIs for everything
Define your URIs in an HTTP namespace under your control
Keep implementation details out of your URIs; short, mnemonic names are better:
http://dbpedia.org/resource/Graz vs.
http://www.confuseme.com:2020/demos/xyz/cgi-bin/resources.php?id=Graz
Try to keep your URIs stable and persistent: Cool URIs don’t change!
Use some kind of primary key inside your URIs (e.g. when dealing with books use the ISBN, etc.)
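A short sketch of the last point, assuming a hypothetical namespace `http://example.org/books/`: the URI is derived only from the natural key, so it stays the same regardless of how the data are stored or served.

```python
from urllib.parse import quote

BASE = "http://example.org/books/"  # hypothetical namespace under your control


def mint_uri(isbn):
    """Build a stable URI from a natural key (here an ISBN), ignoring hyphens and spaces."""
    key = isbn.replace("-", "").replace(" ", "")
    return BASE + quote(key, safe="")


print(mint_uri("0-7475-8108-8"))
```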
Which vocabularies to use?
Reuse terms from well-known vocabularies wherever possible
Only define new terms yourself if you can’t find required terms in existing vocabularies
List of well-known vocabularies: http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Some well-known vocabularies
Friend-of-a-Friend (FOAF), vocabulary for describing people.
Dublin Core (DC) defines general metadata attributes.
Semantically-Interlinked Online Communities (SIOC), vocabulary for representing online communities.
Description of a Project (DOAP), vocabulary for describing projects.
Simple Knowledge Organization System (SKOS), vocabulary for representing taxonomies and loosely structured knowledge.
Music Ontology provides terms for describing artists, albums and tracks.
Review Vocabulary, vocabulary for representing reviews.
Creative Commons (CC), vocabulary for describing license terms.
More best practices
You can mix terms from different vocabularies, e.g. rdfs:label and foaf:depiction
Use URI references from well-established data sources, e.g.:
Geonames
DBpedia
MusicBrainz
RDF Book Mashup
Defining your own terms
If you need to define your own terms, use RDFS or OWL.
1. Do not define new vocabularies from scratch
2. Provide for both humans and machines
3. Make term URIs dereferenceable
4. Make use of other people's terms
5. State all important information explicitly
6. Do not create over-constrained, brittle models; leave some flexibility for growth
What should the RDF contain?
What triples should go into the RDF representation that is returned (after a 303 redirect) in response to dereferencing a URI identifying a non-information resource?
Description
Backlinks
(Related descriptions)
Metadata
Syntax: RDF/XML (+ maybe other serializations)
Serving linked data
Things must be identified with dereferenceable HTTP URIs
When the MIME type application/rdf+xml is requested, a data source must return an RDF/XML description of the identified resource
Provide RDF links to other resources so that clients can navigate the Web of Data by following RDF links
Serving static RDF files
RDF files generated manually or by some software
Configure the server for correct MIME types
When using Apache and the server is not yet configured to return the correct MIME type, add to httpd.conf or .htaccess:
AddType application/rdf+xml .rdf
File size (should be < 1 MB)
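The same extension-to-type mapping can be sketched in Python's standard `mimetypes` registry, which is useful for checking what a Python-based static file server would report for `.rdf` files. The file name `people.rdf` is hypothetical; this mirrors the Apache `AddType` line and does not replace it.

```python
import mimetypes

# Register the RDF/XML media type for the .rdf extension, mirroring the AddType line.
mimetypes.add_type("application/rdf+xml", ".rdf")

content_type, _ = mimetypes.guess_type("people.rdf")  # hypothetical file name
print(content_type)
```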
Serving relational databases
Several tools exist to generate RDF from relational databases
Linked Data Publishing Steps
For more information:
Tutorial "How to Publish Linked Data on the Web" by Chris Bizer, Richard Cyganiak, Tom Heath
http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/
RDB2RDF
W3C Incubator Group, active in 2008 and early 2009, to examine existing approaches for generating RDF from relational databases: http://www.w3.org/2005/Incubator/rdb2rdf/
Survey of Current Approaches for Mapping of Relational Databases to RDF: http://www.w3.org/2005/Incubator/rdb2rdf/RDB2RDF_SurveyReport.pdf
Incubator Group Report: http://www.w3.org/2005/Incubator/rdb2rdf/XGR-rdb2rdf/
RDB2RDF Working Group
W3C Working Group started in 2009 to standardize a language for mapping relational data and relational database schemas into RDF and OWL, tentatively called the RDB2RDF Mapping Language, R2RML: http://www.w3.org/2001/sw/rdb2rdf/
First public results to be expected in early 2010
Participants from industry and academia
Tools for Publishing Linked Data
Pubby Linked Data Frontend
Provides Linked Data from a SPARQL endpoint
data must be available in RDF already
if not: use a wrapper
Originally developed for DBpedia (Richard Cyganiak, Chris Bizer - FU Berlin)
Provides dereferenceable HTTP-URIs
Simple HTML interface for browsing
Handles 303 redirects correctly
Content Negotiation (HTML, RDF/XML, N3)
Java Web application
Pubby - Architecture
SPARQL endpoint must support DESCRIBE queries
supports multiple datasets (SPARQL endpoints)
Pubby Setup How-To
Download and extract from: http://www4.wiwiss.fu-berlin.de/pubby/download
Set up a Servlet Container (e.g. Tomcat, Jetty...)
Use ant to build the WAR or copy the /webapp folder
Edit the configuration file WEB-INF/config.n3
file location can be set by the context parameter "config-file" in web.xml
N3 syntax
General server section (instance of conf:Configuration)
conf:webBase - root URL of the web application, e.g. http://localhost/
Sub-section for each dataset
conf:sparqlEndpoint - SPARQL endpoint URI
conf:datasetBase - common URI prefix of resources in the dataset
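As a rough sketch of how these settings fit together, the helper below renders a minimal `config.n3` from the three properties named above. Two details are assumptions that do not come from these slides and should be checked against the Pubby documentation: the `conf:` namespace URI and the `conf:dataset` property that attaches the dataset sub-section. The endpoint and base URIs are example values.

```python
def pubby_config(web_base, sparql_endpoint, dataset_base):
    """Render a minimal Pubby-style config.n3 (a sketch, not a complete configuration)."""
    return (
        # Assumed namespace URI for the conf: prefix; verify against the Pubby docs.
        "@prefix conf: <http://richard.cyganiak.de/2007/pubby/config.rdf#> .\n"
        "\n"
        "<> a conf:Configuration ;\n"
        f"    conf:webBase <{web_base}> ;\n"
        # conf:dataset is assumed to be the property linking to a dataset section.
        "    conf:dataset [\n"
        f"        conf:sparqlEndpoint <{sparql_endpoint}> ;\n"
        f"        conf:datasetBase <{dataset_base}>\n"
        "    ] .\n"
    )


config = pubby_config(
    web_base="http://localhost/",
    sparql_endpoint="http://dbpedia.org/sparql",   # example endpoint (assumption)
    dataset_base="http://dbpedia.org/resource/",
)
print(config)
```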
Wrapping Relational Databases
D2RQ-Map and D2R-Server (FU Berlin) http://sites.wiwiss.fu-berlin.de/suhl/bizer/d2r-server/index.html
Triplify (Uni Leipzig) http://triplify.org
OpenLink Virtuoso RDF Views http://virtuoso.openlinksw.com/wiki/main/Main/
D2RQ-Map and D2R-Server
Java
Wraps any JDBC-accessible relational database as RDF
2 components
D2RQ-Map (wrapping component): dumps + virtual
D2R-Server (adds SPARQL endpoint)
Can be used in Jena applications (Assembler)
Automatic generation of mapping file (simple)
shell script: "generate-mapping"
D2R-Server How-To
Download/Extract
Generate mapping file automatically first:
generate-mapping -o mapping.n3 -d driver.class.name -u db-user -p db-password jdbc:url:...
Inspect the generated mapping
Model your desired target graph
you should always know what you want...
also study your source database model
Adjust the mapping
Mapping language
d2rq:Database
d2rq:ClassMap
d2rq:PropertyBridge
Rather expressive:
Joins
Conditions
Value translations
Wrapping Spreadsheets
Excel2RDF, RDF123, TopBraid Composer
XLWrap (A. Langegger) http://xlwrap.sourceforge.net/
supports cross tables, repetitive patterns in spreadsheets
arbitrary target graphs
powerful expressions (extensible, user-defined functions)
XLWrap Example
Example Template Graph
[ xl:uri "'http://example.org/revenue_' & URLENCODE(SHEETNAME(A1) & '_' & B2 & '_' & A4)"^^xl:Expr ] a ex:Revenue ;
    ex:country "DBP_LOCALITY(SHEETNAME(A1))"^^xl:Expr ;
    ex:year "DBP_YEAR(B2)"^^xl:Expr ;
    ex:product "A4"^^xl:Expr ;
    ex:itemsSold "B4"^^xl:Expr ;
    ex:revenue "C4"^^xl:Expr .
Existing Relevant Linked Data Sets
What is out there?
The Linked Data Cloud
(a success story, 2007-2009)
[Figures: Linked Data Cloud diagrams, snapshots dated 10/2007 and 03/2008]
Some numbers
Data Set                  # of RDF Triples
ACM RKB                   > 12,000,000
AudioScrobbler            > 600,000,000
BBC Music + Programmes    > 20,000,000
Bio2RDF                   > 2,000,000,000
data.gov                  > 5,000,000,000
DBpedia                   > 470,000,000
Freebase                  > 100,000,000
Geonames                  > 90,000,000
Linked Geo Data           > 3,000,000,000
MusicBrainz               > 60,000,000
RDF Book Mashup           > 100,000,000
US Census Data            > 1,000,000,000
Source: http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets/Statistics
DBpedia: the Linked Data Hub
DBpedia [Auer07] is a Linked Data representation of Wikipedia content.
(Semi-)structured data are extracted (mostly from infoboxes) and are published as RDF.
DBpedia 3.4 (Nov. 2009) describes 2.9 million things, including persons, places, organizations, ...
Each thing has a URI; most of them have a label and an abstract, in up to 91 languages.
DBpedia provides > 8 million links to other data sets (web pages and RDF), and 75,000 YAGO categories.
DBpedia Vocabularies
DBpedia uses its own set of terms, which are derived from the naming in Wikipedia Infoboxes.
Additionally used: RDF standard terms, FOAF.
The importance of DBpedia lies not primarily in the data it provides, but in the names (i.e., HTTP URIs) it provides.
BBC Linked Data
The BBC is continually expanding its Linked Data services.
BBC Programmes publishes all broadcast programmes and provides URIs for them.
BBC Music provides data about music artists and links to DBpedia and MusicBrainz.
BBC uses Linked Data technology to link their internal, heterogeneous data sets [Kobi09].
Used vocabularies: RDF(S), FOAF, DC, Music Ontology, ...
DBtune
DBtune [Rai08] is a collection of music-related Linked Data sets, amongst others:
Jamendo (Creative Commons music site)
MusicBrainz (community music metadata)
AudioScrobbler (last.fm playcounts)
MySpace
URIs for entities can be retrieved by lookup services or through links to DBpedia and other data sets.
Used vocabularies: RDF(S), FOAF, DC, Music Ontology, ...
Linked Movie Data Base
LMDB [Hass09] is an RDF representation of parts of the data from the Internet Movie Database (IMDb).
Data about ~40,000 films, ~30,000 actors, ~8,000 directors, etc.
Links to
DBpedia
YAGO
flickr
RDF book mashup
MusicBrainz
Geonames
Vocabularies: RDF(S), FOAF, Movie Ontology, SKOS, DC
There is more ...
flickr wrapper: generates links to flickr pictures on demand, represents them as RDF
Geonames: information about places (cities, countries)
Linked Geo Data: RDF representation of OpenStreetMap (geo data)
revyu.com: provides reviews for any kind of entity (including movies and music)
New York Times: > 5,000 concepts from their archive
and many more ...
Data is out there ... use it!
References & Credits
The Linked Data Cloud is maintained by Richard Cyganiak and Anja Jentzsch.
[Auer07] S. Auer et al., DBpedia: A Nucleus for a Web of Open Data, Proc. ISWC 2007
[Kobi09] G. Kobilarov et al., Media Meets Semantic Web – How the BBC Uses DBpedia and Linked Data to Make Connections, Proc. ESWC 2009
[Rai08] Y. Raimond et al., A Web of Musical Information, Proc. ISMIR 2008
[Hass09] O. Hassanzadeh et al., Linked Movie Data Base, Proc. LDOW 2009
Linking Data
Linking Data
RDF links enable Linked Data browsers and crawlers to navigate between data sources and to discover additional data.
Properties used to link data depend on the application domain, e.g. :person1 foaf:knows :person2
Use owl:sameAs for URI aliases when two URIs refer to the same thing
Linking Data
Example: equivalent URIs (owl:sameAs) for http://data.linkedmdb.org/resource/film/2014
1. http://dbpedia.org/resource/The_Shining_(film)
2. http://data.linkedmdb.org/resource/film/2014
3. http://rdf.freebase.com/ns/guid.9202a8c04000641f800000000046c3da
Linking can be done
manually
automatically
Setting RDF links manually
Usually only done in very small datasets like personal FOAF profiles (e.g. stating who you know by setting foaf:knows links)
Look up the URI for the resource that you want to link to:
uriqr.com
sindice.com
(SPARQL) queries to existing datasets
Use the URI identifying the resource, http://dbpedia.org/resource/The_Shining_(film),
not the URI for the document about it, http://dbpedia.org/page/The_Shining_(film)
Auto-generating RDF links
Manual approach not feasible for larger datasets
Record linkage is a well-known problem in the DB world
Specialised pattern-based algorithms for individual problems
Use of unique identifiers (e.g. ISBN for books, ISIN for financial securities, ISO-3166 country codes, etc.)
Example: http://dbpedia.org/resource/Harry_Potter_and_the_Half-Blood_Prince has a property dbpprop:isbn 747581088
An owl:sameAs link can easily be generated to http://www4.wiwiss.fu-berlin.de/bookmashup/books/0747581088
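The identifier-based approach from this example can be sketched in a few lines. Note that the property value (747581088) has lost its leading zero, while the RDF Book Mashup URI uses the full ten-character ISBN, so the sketch pads the value. The function name is illustrative, and padding to ten characters assumes an ISBN-10.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
BOOKMASHUP = "http://www4.wiwiss.fu-berlin.de/bookmashup/books/"


def same_as_from_isbn(resource_uri, isbn_value):
    """Generate an owl:sameAs triple from an ISBN-10 property value."""
    # Numeric storage drops leading zeros (747581088), so pad back to 10 characters.
    isbn10 = str(isbn_value).zfill(10)
    return (resource_uri, OWL_SAME_AS, BOOKMASHUP + isbn10)


link = same_as_from_isbn(
    "http://dbpedia.org/resource/Harry_Potter_and_the_Half-Blood_Prince",
    747581088,
)
print(link)
```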
Tools for data interlinking
Input: 2 (or more) datasets + linkage specification
Output: links between the datasets
Domain-independent tools: SILK, ODD-Linker, RDF-AI, Knofuss
Domain-specific tools: LD-Mapper (Music Ontology), RKB co-reference resolution system (publications)
Overview
RKB CRS LD-Mapper ODD RDF-AI Silk Knofuss
Ontologies multi multi single single single multi
Automation semi automatic semi semi semi semi
User input ad-hoc program none link spec. query
dataset structure alignment method
links spec. alignment method
merged ontology
Input format Java Prolog LinQL XML Silk-LSL (XML) OWL
Matching techniques string
string, similarity propagation string, Wordnet string string
string, adaptive learning
Onto. alignment no no no no no yes, as input
Output owl:sameAs owl:sameAs linkset linkset
alignment format, merged dataset linkset
alignment format, merged dataset
Data access API local copy ODBC local copy SPARQL local copy
Domain publications Music Ontology independent independent independent independent
Overview generated by François Scharffe and Jérôme Euzenat http://melinda.inrialpes.fr
102 SAMT 2009 – Tutorial Web of Data in the Context of Multimedia (WoDMM) Graz, Austria - 2 Dec 2009
Specification formats
datasets resources links matching
Silk-LSL SPARQL endpoint, graphs name
resources to interlink, resources type
link condition (for each resource)
string matching, matchers combination
KnoFuss local copy (SPARQL query) fusion method string matching (for each resource)
RDF-AI local copy resource descriptions link description fuzzy string, wordnet
LD-Mapper local copy resource query link description string matching
ODD-linker local copy resource description (table.column) link description,
synonym, hyponym, weightedJaccard, token intersect
Overview generated by François Scharffe and Jérôme Euzenat http://melinda.inrialpes.fr
103 SAMT 2009 – Tutorial Web of Data in the Context of Multimedia (WoDMM) Graz, Austria - 2 Dec 2009
SILK
Silk - A Linking Framework for the Web of Data
http://www4.wiwiss.fu-berlin.de/bizer/silk/
Silk Link Specification Language (Silk-LSL)
The user specifies the type of resources to be linked and the comparison technique to be used
Silk supports a range of string comparison techniques, numerical and date similarity measures, and concept distances in a taxonomy
104 SAMT 2009 – Tutorial Web of Data in the Context of Multimedia (WoDMM) Graz, Austria - 2 Dec 2009
SILK
Input: 2 datasets behind SPARQL endpoints + LSL specification
Output: linkset with owl:sameAs (or other user specified property) links between resources
Silk LSL example on next slide
105 SAMT 2009 – Tutorial Web of Data in the Context of Multimedia (WoDMM) Graz, Austria - 2 Dec 2009
<Interlink id="cities">
  <LinkType>owl:sameAs</LinkType>
  <!-- Source resources: DBpedia cities, bound to variable ?a -->
  <SourceDataset dataSource="dbpedia" var="a">
    <RestrictTo>
      ?a rdf:type dbpedia:City
    </RestrictTo>
  </SourceDataset>
  <!-- Target resources: GeoNames populated places, bound to variable ?b -->
  <TargetDataset dataSource="geonames" var="b">
    <RestrictTo>
      ?b rdf:type gn:P
    </RestrictTo>
  </TargetDataset>
  <!-- Average of a label comparison and a population comparison -->
  <LinkCondition>
    <AVG>
      <Compare metric="jaroSimilarity">
        <Param name="str1" path="?a/rdfs:label" />
        <Param name="str2" path="?b/gn:name" />
      </Compare>
      <Compare metric="numSimilarity">
        <Param name="num1" path="?a/dbpedia:populationTotal" />
        <Param name="num2" path="?b/gn:population" />
      </Compare>
    </AVG>
  </LinkCondition>
  <Thresholds accept="0.9" verify="0.7" />
  <Output acceptedLinks="accepted_links.n3"
          verifyLinks="verify_links.n3"
          mode="truncate" />
</Interlink>
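The logic of the link condition above can be sketched in Python as follows. This is not Silk's implementation: difflib's ratio stands in for jaroSimilarity (it is a different string measure), the numeric similarity is assumed to be the ratio of the smaller to the larger value, and the thresholds are assumed to be inclusive. All function names and the sample city records are hypothetical.

```python
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Stand-in for jaroSimilarity: difflib ratio on lower-cased strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def num_similarity(x: float, y: float) -> float:
    """Assumed numeric similarity: smaller value divided by larger value."""
    if x == y:
        return 1.0
    if x <= 0 or y <= 0:
        return 0.0
    return min(x, y) / max(x, y)


def classify(source: dict, target: dict, accept: float = 0.9, verify: float = 0.7) -> str:
    """Average both comparisons (like <AVG>) and apply the two thresholds."""
    score = (
        string_similarity(source["label"], target["name"])
        + num_similarity(source["population"], target["population"])
    ) / 2
    if score >= accept:
        return "accept"   # would be written to the accepted links file
    if score >= verify:
        return "verify"   # would be written to the file for manual review
    return "reject"


# Hypothetical records standing in for a DBpedia city and GeoNames places.
dbpedia_city = {"label": "Graz", "population": 250000}
geonames_match = {"name": "Graz", "population": 245000}
geonames_other = {"name": "Vienna", "population": 1700000}

print(classify(dbpedia_city, geonames_match))
print(classify(dbpedia_city, geonames_other))
```

Averaging lets a strong label match compensate for slightly differing population figures, which are rarely identical across datasets.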
106 SAMT 2009 – Tutorial Web of Data in the Context of Multimedia (WoDMM) Graz, Austria - 2 Dec 2009
Web of Data in the Context of Multimedia Part 1: Linked Open Data: Vision, Concepts, and Technologies:
Consuming Linked Data
Bernhard Haslhofer, Bernhard Schandl, Andreas Langegger,
Wolfgang Halb, Tobias Bürger
107 SAMT 2009 – Tutorial Web of Data in the Context of Multimedia (WoDMM) Graz, Austria - 2 Dec 2009
Agenda
URI Discovery
Data Discovery
Data Set Discovery
Tools and Libraries to Access and Consume Linked Data
Mashups and Browsers
Programmatic Access to Linked Data
108 SAMT 2009 – Tutorial Web of Data in the Context of Multimedia (WoDMM) Graz, Austria - 2 Dec 2009
URI Discovery
1. Querying specific data sources, e.g., http://lookup.dbpedia.org
2. Using dedicated search engines, e.g.,
Falcons http://iws.seu.edu.cn/services/falcons/
Sindice http://sindice.com
SWSE http://www.swse.org
Watson http://watson.kmi.open.ac.uk
109 SAMT 2009 – Tutorial Web of Data in the Context of Multimedia (WoDMM) Graz, Austria - 2 Dec 2009
Example Search Engine: Falcons
(1) Entry of keywords
(2) Results of objects
(3) Class hierarchy to refine search
Try it yourself: http://iws.seu.edu.cn/services/falcons/
110 SAMT 2009 – Tutorial Web of Data in the Context of Multimedia (WoDMM) Graz, Austria - 2 Dec 2009
Data Discovery
1. Manual link traversal (i.e. follow rdfs:seeAlso or owl:sameAs links)
2. Use of co-reference services, e.g. http://sameas.org
3. Use of URI-based query engines, e.g. http://sindice.org
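Link traversal amounts to a graph search over link predicates. The sketch below is an illustration only: an in-memory list of triples stands in for the data that a real client would obtain by dereferencing each URI over HTTP, links are followed only from subject to object, and the rdfs:seeAlso target under example.org is hypothetical.

```python
from collections import deque

OWL_SAME_AS = "owl:sameAs"
RDFS_SEE_ALSO = "rdfs:seeAlso"


def discover(start, triples, predicates=(OWL_SAME_AS, RDFS_SEE_ALSO)):
    """Breadth-first traversal; returns all URIs reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for s, p, o in triples:
            if s == current and p in predicates and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen


LINKEDMDB = "http://data.linkedmdb.org/resource/film/2014"
DBPEDIA = "http://dbpedia.org/resource/The_Shining_(film)"
FREEBASE = "http://rdf.freebase.com/ns/guid.9202a8c04000641f800000000046c3da"
REVIEW = "http://example.org/reviews/the-shining"  # hypothetical

triples = [
    (LINKEDMDB, OWL_SAME_AS, DBPEDIA),
    (LINKEDMDB, OWL_SAME_AS, FREEBASE),
    (DBPEDIA, RDFS_SEE_ALSO, REVIEW),
    (DBPEDIA, "rdfs:label", "The Shining"),  # not a link predicate, ignored
]

found = discover(LINKEDMDB, triples)
print(sorted(found))
```

Co-reference services such as the one named above precompute this kind of closure for owl:sameAs, so that a client does not have to crawl it.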
111 SAMT 2009 – Tutorial Web of Data in the Context of Multimedia (WoDMM) Graz, Austria - 2 Dec 2009
Sameas.org
112 SAMT 2009 – Tutorial Web of Data in the Context of Multimedia (WoDMM) Graz, Austria - 2 Dec 2009
Sindice.org
113 SAMT 2009 – Tutorial Web of Data in the Context of Multimedia (WoDMM) Graz, Austria - 2 Dec 2009
Data Set Discovery
1. Manually, i.e., by browsing and selecting a data set from http://esw.w3.org/topic/SparqlEndpoints
2. (Semi-)automatically, i.e., by exploiting voiD, the Vocabulary of Interlinked Datasets
114 SAMT 2009 – Tutorial Web of Data in the Context of Multimedia (WoDMM) Graz, Austria - 2 Dec 2009
Describing Datasets
The problem:
Only human-readable descriptions of datasets are available
Automation of tasks is impossible, such as
Efficient & effective search
Selection of datasets (for apps, interlinking targets)
Generation of maps, etc.
Solution: voiD, the “Vocabulary of Interlinked Datasets”, provides a formal description of
What a dataset is about (topic, technical details).
How and under which conditions to access it.
How the dataset is interlinked with other datasets.
Qualitative level: type of interlinking.
Quantitative level: number of links, resources, etc.
How to discover the metadata.
cf. K. Alexander, R. Cyganiak, M. Hausenblas, and J. Zhao "Describing Linked Datasets" Proceedings of LDOW 2009, 2009.
115 SAMT 2009 – Tutorial Web of Data in the Context of Multimedia (WoDMM) Graz, Austria - 2 Dec 2009
voiD – Core concepts
A dataset is a set of RDF triples that are published, maintained or aggregated by a single provider.
A dataset is authoritative with respect to a certain URI namespace if it contains information about resources named by URIs in this namespace, and is published by the URI owner.
A linkset LS is a set of RDF triples where, for all triples t_i = ⟨s_i, p_i, o_i⟩ ∈ LS, the subject is in one dataset, i.e. all s_i are described in DS1, and the object is in another dataset, i.e. all o_i are described in DS2.
cf. K. Alexander, R. Cyganiak, M. Hausenblas, and J. Zhao "Describing Linked Datasets" Proceedings of LDOW 2009, 2009.
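The linkset definition can be turned into a simple check. The sketch below is an approximation under a stated assumption: "described in a dataset" is reduced to "the URI starts with that dataset's namespace", which is a simplification of the definition above. The function name is hypothetical.

```python
def is_linkset(triples, ds1_namespace: str, ds2_namespace: str) -> bool:
    """True if every subject is in DS1's namespace and every object in DS2's."""
    return all(
        s.startswith(ds1_namespace) and o.startswith(ds2_namespace)
        for s, _p, o in triples
    )


DS1 = "http://data.linkedmdb.org/resource/"
DS2 = "http://dbpedia.org/resource/"

links = [
    (
        "http://data.linkedmdb.org/resource/film/2014",
        "owl:sameAs",
        "http://dbpedia.org/resource/The_Shining_(film)",
    ),
]

# A triple whose object stays inside DS1 is not a link between the two datasets.
internal = links + [
    (
        "http://data.linkedmdb.org/resource/film/2014",
        "rdfs:seeAlso",
        "http://data.linkedmdb.org/resource/film/2014",
    ),
]

print(is_linkset(links, DS1, DS2))
print(is_linkset(internal, DS1, DS2))
```

Note that the direction matters: swapping the two namespaces makes the first check fail, because a linkset is defined from DS1 subjects to DS2 objects.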
116 SAMT 2009 – Tutorial Web of Data in the Context of Multimedia (WoDMM) Graz, Austria - 2 Dec 2009
voiD Vocabulary
cf. K. Alexander, R. Cyganiak, M. Hausenblas, and J. Zhao "Describing Linked Datasets" Proceedings of LDOW 2009, 2009.
117 SAMT 2009 – Tutorial Web of Data in the Context of Multimedia (WoDMM) Graz, Austria - 2 Dec 2009
voiD Usage Example
cf. K. Alexander, R. Cyganiak, M. Hausenblas, and J. Zhao "Describing Linked Datasets" Proceedings of LDOW 2009, 2009.
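The usage example on this slide was a figure that is not reproduced here. As a substitute, the following Python sketch prints a small voiD description in Turtle. It is a hypothetical illustration, not the example from the slide: the dataset and linkset URIs under example.org are invented, and only the voiD terms void:Dataset, void:Linkset, void:subset, void:target and void:linkPredicate are used.

```python
VOID_TEMPLATE = """\
@prefix void: <http://rdfs.org/ns/void#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .

<{source}> a void:Dataset ;
    void:subset <{linkset}> .

<{target}> a void:Dataset .

<{linkset}> a void:Linkset ;
    void:linkPredicate {predicate} ;
    void:target <{source}> , <{target}> .
"""


def void_description(source: str, target: str, linkset: str,
                     predicate: str = "owl:sameAs") -> str:
    """Fill the Turtle template with two dataset URIs and one linkset URI."""
    return VOID_TEMPLATE.format(
        source=source, target=target, linkset=linkset, predicate=predicate
    )


# All three URIs are hypothetical identifiers for the dataset descriptions.
turtle = void_description(
    source="http://example.org/void#LinkedMDB",
    target="http://example.org/void#DBpedia",
    linkset="http://example.org/void#LinkedMDB2DBpedia",
)
print(turtle)
```

A consumer can read such a description to learn which datasets are interlinked and by which predicate, without downloading the datasets themselves.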
118 SAMT 2009 – Tutorial Web of Data in the Context of Multimedia (WoDMM) Graz, Austria - 2 Dec 2009
Tools and Applications to Consume Linked Data
Linked Data browsers
To explore things and datasets and to navigate between them.
Tabulator Browser (MIT, USA), Marbles (FU Berlin, DE), OpenLink RDF Browser (OpenLink, UK), Zitgist DataViewer (Zitgist, USA), Disco Hyperdata Browser (FU Berlin, DE), Fenfire (DERI, Ireland)
Linked Data mashups
Sites that mash up (i.e. combine) Linked Data.
Revyu.com (KMI, UK), DBtune Slashfacet (Queen Mary, UK), DBpedia Mobile (FU Berlin, DE), Semantic Web Pipes (DERI, Ireland)
Search engines
To search for Linked Data.
Falcons (IWS, China), Sindice (DERI, Ireland), MicroSearch (Yahoo, Spain), Watson (Open University, UK), SWSE (DERI, Ireland), Swoogle (UMBC, USA)
Listing on this slide by T. Heath, M. Hausenblas, C. Bizer, R. Cyganiak, O. Hartig
119 SAMT 2009 – Tutorial Web of Data in the Context of Multimedia (WoDMM) Graz, Austria - 2 Dec 2009
Example Linked Data Browser: Marbles
Server-based Linked Data browser.
Formats RDF data for XHTML clients using Fresnel.
Unique feature: Indicates the origin of displayed data using colored dots.
Support for different views:
Full view: all available data is displayed.
Summary view: returns a short textual summary about a resource.
Photo view: provides a photo for a given resource.
Retrieves data from multiple sources by (a) issuing parallel queries to multiple Linked Data search engines and (b) by following owl:sameAs and rdfs:seeAlso links.
120 SAMT 2009 – Tutorial Web of Data in the Context of Multimedia (WoDMM) Graz, Austria - 2 Dec 2009
Example Linked Data Browser: Marbles (2)
(1) Entry of query URL
(2) Data display
(3) Sources
Try it yourself: http://marbles.sourceforge.net/
121 SAMT 2009 – Tutorial Web of Data in the Context of Multimedia (WoDMM) Graz, Austria - 2 Dec 2009
Example Linked Data Browser: gFacet
cf. Heim, P., Ziegler, J., and Lohmann, S. "gFacet: A Browser for the Web of Data" In Proceedings of IMC-SSW 2008, 2008.
Try it yourself: http://gFacet.org
122 SAMT 2009 – Tutorial Web of Data in the Context of Multimedia (WoDMM) Graz, Austria - 2 Dec 2009
Example Mashup: Revyu.com
Try it yourself: http://revyu.com
Picture from revyu.com
123 SAMT 2009 – Tutorial Web of Data in the Context of Multimedia (WoDMM) Graz, Austria - 2 Dec 2009
Example Mashup: DBpedia Mobile
Try it yourself: http://wiki.dbpedia.org/DBpediaMobile
Pictures from DBpedia Mobile
124 SAMT 2009 – Tutorial Web of Data in the Context of Multimedia (WoDMM) Graz, Austria - 2 Dec 2009
Libraries to Programmatically Access the Web of Data
SPARQL JavaScript Library, http://www.thefigtrees.net/lee/blog/2006/04/sparql_calendar_demo_a_sparql.html
ARC for PHP, http://arc.semsol.org/
RAP – RDF API for PHP, http://www4.wiwiss.fu-berlin.de/bizer/rdfapi/index.html
SPARQL Wrapper (Python), http://sparql-wrapper.sourceforge.net/
PySPARQL (Python), http://code.google.com/p/pysparql/
DARQ (Distributed ARQ), http://darq.sourceforge.net/
Semantic Web Client library (SWClLib) for Java, http://www4.wiwiss.fu-berlin.de/bizer/ng4j/semwebclient/
Listing on this slide by O. Hartig, J. Sequeda
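Underneath most of these libraries sits the SPARQL protocol, in which a query is sent to an endpoint as the "query" parameter of an HTTP request. The sketch below uses only the Python standard library and deliberately stops before sending the request, so that it runs offline. The helper name is hypothetical, and http://dbpedia.org/sparql is assumed to be the DBpedia endpoint.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build the GET URL for a SPARQL protocol query request."""
    return endpoint + "?" + urlencode({"query": query})


QUERY = """\
SELECT ?same WHERE {
  <http://dbpedia.org/resource/The_Shining_(film)>
      <http://www.w3.org/2002/07/owl#sameAs> ?same
}"""

url = sparql_get_url("http://dbpedia.org/sparql", QUERY)

# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(url)
```

To run the query, the URL can be fetched with any HTTP client, for example urllib.request, with an Accept header such as application/sparql-results+json to request JSON results.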
125 SAMT 2009 – Tutorial Web of Data in the Context of Multimedia (WoDMM) Graz, Austria - 2 Dec 2009
What comes next
Part 2: Linked Multimedia