DBpedia and the Emerging Web of Linked Data Sören Auer

Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data


DESCRIPTION

Over the past 4 years, the Semantic Web activity has gained momentum with the widespread publishing of structured data as RDF. The Linked Data paradigm has therefore evolved from a practical research idea into a very promising candidate for addressing one of the biggest challenges of computer science: the exploitation of the Web as a platform for data and information integration. To translate this initial success into a world-scale reality, a number of research challenges need to be addressed: the performance gap between relational and RDF data management has to be closed, coherence and quality of data published on the Web have to be improved, provenance and trust on the Linked Data Web must be established and generally the entrance barrier for data publishers and users has to be lowered. This tutorial will discuss approaches for tackling these challenges. As an example of a successful Linked Data project we will present DBpedia, which leverages Wikipedia by extracting structured information and by making this information freely accessible on the Web. The tutorial will also outline some recent advances in DBpedia, such as the mappings Wiki, DBpedia Live as well as the recently launched DBpedia benchmark.


Page 1: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

DBpedia and the Emerging Web of Linked Data

Sören Auer

Page 2: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Creating Knowledge out of Interlinked Data

Sören Auer – SBBD: DBpedia and the Emerging Web of Linked Data 5.10.2011 Page 2

http://lod2.eu

• 2000 Mathematics and Computer Science studies in Hagen, Dresden and Екатеринбург

• Managing director of adVIS GmbH – SME focused on Web-Application and Content Management technology

• IT consultant for various companies (T-Mobile AG, RDL Corp., Science Computing AG)

• 2006 doctorate in Information Systems / Computer Science at Universität Leipzig

• 2006-2008 post-doctoral researcher at the DB Group at University of Pennsylvania (USA)

• Head of AKSW research group – DBpedia, OntoWiki, LinkedGeoData, Triplify

• Research interests: Information Systems, Database and Web Technologies, Semantic Web and Knowledge Engineering, Adaptive Methodologies, HCI, E-Science, Digital Libraries

• Coordinator of the EU FP7 IP Project “LOD2 – Creating Knowledge out of Interlinked Data”

• Work as expert for W3C, EU FP6/FP7/CIP, University City Keystone Innovation Zone, Swiss National Science Foundation

Dr. Sören Auer

Page 3: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data


1. The Vision & Big Picture

2. Linked Data 101

3. The Linked Data Life-cycle

Agenda

Page 4: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Why the Semantic Web won't work (soon)

1. Reasoning does not scale on the Web

• IR / one-dimensional indexing scales (Google)

• Next step: conjunctive querying (OWL-QL?, dynamic scale-out / clustering)

• Web-scalable DL reasoning is out of sight (maybe fragments or fuzzy reasoning have some chances)

2. Even if it scaled, it would not be affordable

• "What is the only former Yugoslav republic in the European Union?"

• 2880 POWER7 cores, 16 Terabytes memory, 4 Terabytes clustered storage (IBM Watson) still cannot answer this question

Page 5: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Why do we need the Data Web?

Problem: Try to search for these things on the current Web:

• Apartments near German-Russian bilingual childcare in Berlin.

• ERP service providers with offices in Vienna and London.

• Researchers working on multimedia topics in Eastern Europe.

Information is available on the Web, but opaque to current search.

Solution: complement text on Web pages with structured linked open data and intelligently combine/integrate such structured information from different sources.

[Diagram: berlin.de (has everything about childcare in Berlin) and Immobilienscout.de (knows all about real estate offers in Germany) each run a DB behind a Web server; both Web servers expose HTML and RDF to a search engine.]

Page 6: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

From the Document Web to the Semantic Data Web

Web (since 1992)
• HTTP
• HTML/CSS/JavaScript

Social Web (since 2003)
• Folksonomies/Tagging
• Reputation, sharing
• Groups, relationships

Data Web (since 2006)
• URI de-referencability
• Web Data integration
• RDF serializations

Semantic Web (Vision 1998, starting ???)
• Reasoning
• Logic, Rules
• Trust

Page 7: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Web 1.0: Many Web sites containing unstructured, textual content

Web 2.0: Few large Web sites specialized on specific content types

Web 3.0: Many Web sites containing & semantically syndicating arbitrarily structured content

[Illustration labels: Pictures + Video + Encyclopedic articles]

Page 8: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

The Long Tail of Information Domains

[Figure: long-tail curve of information domains with popularity on the vertical axis. Domain labels include Pictures, News, Video, Recipes, Calendar, Gene sequences, Itinerary of King George, Talent management, Requirements-Engineering and Special interest communities. Marked regions: "Currently supported structured content types", "SemWeb supported structured content" and "Not or insufficiently supported content types".]

The Long Tail by Chris Anderson (Wired, Oct. '04), adapted to information domains

Page 9: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Linked Data in a Nutshell

1. Uses RDF Data Model

[Graph: SBC -organizes-> SBBD2011; SBBD2011 -starts-> 3.10.2011; SBBD2011 -takesPlaceIn-> Florianopolis]

2. Is serialised in triples:
SBC organizes SBBD2011
SBBD2011 starts "2011-10-03"^^xsd:date
SBBD2011 takesPlaceIn Florianopolis

3. Uses Content-negotiation

Page 10: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

The emerging Web of Data

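[Figure: snapshots of the growing Web of Data from 2007 to 2009, surrounded by the stages of the Linked Data life-cycle (create, enrich, classify, fuse, interlink, repair) and related tools: Virtuoso, SemMF, SILK, PoolParty, DL-Learner, Sindice, Sigma, ORE, OntoWiki, MonetDB, DXX Engine, WiQA.]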

Page 11: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Data Access and Integration (Conceptual Level)

Data Models
• RDBMS: organize data in relations, rows, cells (Oracle, DB2, MS-SQL)
• Entity-attribute-value (EAV): HELP medical record system, TrialDB
• Column-oriented DBMS: collocates column values rather than row values (Vertica, C-Store, MonetDB)
• Triple/Quad Stores: RDF data model (Virtuoso, Oracle, Sesame)
• Others: XML, hierarchical, tree, graph-oriented DBMS

Data Access
• Procedural APIs: ODBC, JDBC
• Object-relational mappings (ORM): NeXT's EOF / WebObjects, ADO.NET Entity Framework, Hibernate
• Query Languages: Datalog, SQL, SPARQL, XPath/XQuery
• Linked Data: de-referencable URIs, RDF serialization formats

Data Integration
• Enterprise Information Integration: sets of heterogeneous data sources appear as a single, homogeneous data source
• Data Warehousing: based on extract, transform, load (ETL); Global-As-View (GAV)
• Research: Mediators, Ontology-based, P2P, Web service-based
• Data Web: URIs as entity identifiers, HTTP as data access protocol, Local-As-View (LAV)

Page 12: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data


1. The Vision & Big Picture

2. Linked Data 101

(based on Michael Hausenblas' slides)

3. The Linked Data Life-cycle

Agenda

Page 13: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data


Orientation

Page 14: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data


Linked Data 101

Linked Data provides a standardised API for:

Data and metadata discovery

Data integration

Distributed query

Page 15: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data


Linked Data principles

1. Use URIs to identify the “things” in your data

2. Use http:// URIs so people (and machines) can look them up on the web

3. When a URI is looked up, return a description of the thing (in RDF format)

4. Include links to related things

http://www.w3.org/DesignIssues/LinkedData.html

Page 16: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Linked Data principles

• They are principles, not implementation advice
• Not humans or machines but humans and machines!
• Content negotiation (e.g. HTML and RDF/XML)
• HTML + RDFa

Metcalfe's Law: http://en.wikipedia.org/wiki/Metcalfe%27s_law

Page 17: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Linked Data example

Page 18: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

HTTP URIs

A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource. [RFC3986]

Syntax:
URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

Example:
   foo://example.com:8042/over/there?name=ferret#nose
   \_/   \______________/\_________/ \_________/ \__/
    |           |            |            |        |
 scheme     authority       path        query   fragment
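As a rough illustration, the five components of the RFC 3986 example can be pulled apart with Python's standard urllib.parse module. This is only a sketch; the dictionary name and its keys are chosen here for illustration.

```python
from urllib.parse import urlsplit

# The example URI from RFC 3986 shown on the slide.
uri = "foo://example.com:8042/over/there?name=ferret#nose"

parts = urlsplit(uri)

# Collect the generic components in the order of the syntax rule.
components = {
    "scheme": parts.scheme,
    "authority": parts.netloc,
    "path": parts.path,
    "query": parts.query,
    "fragment": parts.fragment,
}

for name, value in components.items():
    print(f"{name:9} = {value}")
```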

Page 19: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

HTTP URIs

URI references: An RDF URI reference is a Unicode string that does not contain any control characters (#x00 - #x1F, #x7F - #x9F) and that would produce a valid URI character sequence representing an absolute URI when subjected to UTF-8 encoding along with %-escaping of non-US-ASCII octets.

Qualified Names (QNames): XML's way to allow namespaced elements/attributes, of the form QName = Prefix ':' LocalPart

Compact URIs (CURIEs): Generic, abbreviated syntax for expressing URIs
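To make the abbreviation idea concrete, here is a minimal sketch of expanding a prefixed name into a full URI. The helper expand_curie and the PREFIXES table are hypothetical names introduced for this example; the namespace URIs are the ones used on the Turtle slides of this deck.

```python
# Prefix table; any other mapping of prefix to namespace URI works the same way.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dbp": "http://dbpedia.org/resource/",
    "dbpp": "http://dbpedia.org/property/",
}


def expand_curie(curie: str, prefixes: dict = PREFIXES) -> str:
    """Expand 'prefix:local' into a full URI (hypothetical helper)."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + local


print(expand_curie("dbp:Leipzig"))
```

The sketch ignores details of the CURIE syntax such as default prefixes and safe CURIEs in square brackets.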

Page 20: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data


HTTP

The Hypertext Transfer Protocol (HTTP) is an application-level protocol for distributed, collaborative, hypermedia information systems. It is a generic, stateless, protocol which can be used for many tasks beyond its use for hypertext, such as name servers and distributed object management systems, through extension of its request methods, error codes and headers. A feature of HTTP is the typing and negotiation of data representation, allowing systems to be built independently of the data being transferred.

Page 21: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data


HTTP

HTTP messages consist of requests from client to server and responses from server to client

Set of methods is predefined: GET, POST, PUT, DELETE, HEAD (OPTIONS)

Page 22: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

HTTP

Status codes

• Informational 1xx: provisional response (100 Continue)
• Successful 2xx: request successfully received, understood, and accepted (201 Created)
• Redirection 3xx: further action needs to be taken by user agent to fulfill the request (301 Moved Permanently)
• Client Error 4xx: client erred (405 Method Not Allowed)
• Server Error 5xx: server encountered an unexpected condition (501 Not Implemented)
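The five classes can be derived from the first digit of the status code. The following sketch uses Python's standard http.HTTPStatus enumeration for the reason phrases; the STATUS_CLASSES table and the describe helper are names made up for this example.

```python
from http import HTTPStatus

# Names of the five classes, keyed by the first digit of the status code.
STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(code: int) -> str:
    """Return the code, its reason phrase and its class."""
    status = HTTPStatus(code)  # raises ValueError for unknown codes
    return f"{code} {status.phrase} ({STATUS_CLASSES[code // 100]})"


for code in (100, 201, 301, 405, 501):
    print(describe(code))
```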

Page 23: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

HTTP

Request:

GET /html/rfc2616 HTTP/1.1
Host: tools.ietf.org
User-Agent: Mozilla/5.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

Response:

HTTP/1.x 200 OK
Date: Thu, 05 Mar 2009 08:17:33 GMT
Server: Apache/2.2.11
Content-Location: rfc2616.html
Last-Modified: Tue, 20 Jan 2009 09:16:04 GMT
Content-Type: text/html; charset=UTF-8

Page 24: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

HTTP

Content Negotiation (CN): selecting the representation for a given response when multiple representations are available

Three types of CN: server-driven, agent-driven and transparent

Example:

curl -I -H "Accept: application/rdf+xml" http://dbpedia.org/resource/Galway

HTTP/1.1 303 See Other
Content-Type: application/rdf+xml
Location: http://dbpedia.org/data/Galway.rdf
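Server-driven negotiation can be sketched as follows: the server parses the Accept header, ranks the media ranges by their q value and returns the first representation it can offer. This is a simplified illustration, not how DBpedia or any particular server implements it; parse_accept and negotiate are hypothetical helpers, and the precedence of more specific media ranges is ignored.

```python
def parse_accept(header: str) -> list:
    """Parse an Accept header into (media_range, q) pairs, best first."""
    ranges = []
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        media_range, q = fields[0], 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        ranges.append((media_range, q))
    # sorted() is stable, so equal q values keep their header order.
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def negotiate(header: str, available: list):
    """Pick the first available type matched by the best-ranked range."""
    for media_range, q in parse_accept(header):
        if q == 0:
            continue  # q=0 means "not acceptable"
        for candidate in available:
            if media_range in ("*/*", candidate) or (
                media_range.endswith("/*")
                and candidate.startswith(media_range[:-1])
            ):
                return candidate
    return None


available = ["text/html", "application/rdf+xml"]
print(negotiate("application/rdf+xml", available))
print(negotiate("text/html,application/xml;q=0.9,*/*;q=0.8", available))
```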

Page 25: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

HTTP

Caching (see Cache-Control header field) is essential for scalability
http://webofdata.wordpress.com/2009/11/23/linked-open-data-http-caching/

HTTPbis IETF WG chaired by Mark Nottingham, mainly about: patches, clarifications, deprecation of unused features, documentation of security properties

Page 26: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

REST - HTTP

Representational State Transfer (REST)

Data elements and examples:
• resource: the intended conceptual target of a hypertext reference
• resource identifier: URL, URN
• representation: HTML document, JPEG image
• representation metadata: media type, last-modified time
• resource metadata: source link, alternates, vary
• control data: if-modified-since, cache-control

http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm
http://webofdata.wordpress.com/2009/10/09/linked-data-for-restafarians/

Page 27: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Web's Standard Retrieval Algorithm

1. parse the URI and find the HTTP protocol
2. look up the DNS name to determine the associated IP address
3. open a TCP stream to port 80 at the IP address determined above
4. format an HTTP GET request for the resource and send that to the server
5. read the response from the server
6. from the status code (200) determine that a representation of the resource is available
7. inspect the returned Content-Type
8. pass the entity-body to its HTML rendering engine

http://www.w3.org/2001/tag/doc/selfDescribingDocuments
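Steps 1 to 7 of the retrieval algorithm can be sketched with plain sockets. This is a deliberately minimal illustration under several assumptions: only http:// URIs, no chunked transfer encoding, no redirects, and step 8 (rendering) is left out. The function names are made up for this example.

```python
import socket
from urllib.parse import urlsplit


def build_get_request(uri: str):
    """Steps 1 and 4: parse the URI and format an HTTP GET request."""
    parts = urlsplit(uri)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles http:// URIs")
    host, port = parts.hostname, parts.port or 80
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    request = (
        f"GET {target} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n\r\n"
    )
    return host, port, request.encode("ascii")


def parse_response(raw: bytes):
    """Steps 5 to 7: read status code and Content-Type, keep the body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status = int(lines[0].split()[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers.get("content-type"), body


def retrieve(uri: str):
    """Steps 2 and 3: DNS lookup and TCP connection, then the rest."""
    host, port, request = build_get_request(uri)
    # create_connection resolves the DNS name and opens the TCP stream.
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(request)
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return parse_response(b"".join(chunks))
```

Calling retrieve("http://example.org/") would perform the network steps; the two helper functions can be exercised without any network access.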

Page 28: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

RDF

• A data model: directed, labeled graph
• Triple: (subject predicate object)
  subject: URIref or bNode
  predicate: URIref
  object: URIref or bNode or literal

Page 29: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

RDF Triple

• Inspired by linguistic categories

• Allowed usage:

Subject: URI or blank node

Predicate: URI (also called property)

Object: URI or blank node or literal

Example: Burkhard Jung (Subject) isMayorOf (Predicate) Leipzig (Object)
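The position rules for a triple can be expressed as a small type check. The classes below are a hand-rolled sketch for illustration, not the API of any RDF library; the property URI under example.org is a placeholder, since the slide gives no namespace for isMayorOf.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class URIRef:
    value: str


@dataclass(frozen=True)
class BNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None


def is_valid_triple(subject, predicate, obj) -> bool:
    """Check the term kinds allowed in each position of a triple."""
    return (
        isinstance(subject, (URIRef, BNode))
        and isinstance(predicate, URIRef)
        and isinstance(obj, (URIRef, BNode, Literal))
    )


jung = URIRef("http://dbpedia.org/resource/Burkhard_Jung")
leipzig = URIRef("http://dbpedia.org/resource/Leipzig")
is_mayor_of = URIRef("http://example.org/isMayorOf")  # placeholder property URI

print(is_valid_triple(jung, is_mayor_of, leipzig))
print(is_valid_triple(Literal("Leipzig"), is_mayor_of, leipzig))
```

The second call is rejected because a literal may only appear in the object position.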

Page 30: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Example RDF Graph

[Graph, written as triples:]
Leipzig hasAreaCode 0341
Leipzig hasMayor Burkhard Jung
Leipzig locatedIn Saxony
Leipzig latitude 51.3333
Leipzig longitude 12.3833
Saxony locatedIn Germany
Burkhard Jung isMayorOf Leipzig
Burkhard Jung born 1958-03-07
Burkhard Jung isMemberOf Social Democratic Party

Page 31: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Literals

• Representation of data values
• Serialization as strings
• Interpretation based on the datatype
• Literals without datatype are treated as strings

[Graph excerpt: Leipzig latitude 51.3333; Leipzig longitude 12.3833; Leipzig hasMayor Burkhard Jung; Burkhard Jung isMayorOf Leipzig; Burkhard Jung born 1958-03-07]
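How a datatype drives the interpretation of a literal's string form can be sketched with a small lookup table. The CONVERTERS mapping and the literal_value helper are assumptions of this example and cover only a handful of XSD datatypes.

```python
from datetime import date

XSD = "http://www.w3.org/2001/XMLSchema#"

# Map a few XSD datatypes to Python constructors (not an exhaustive list).
CONVERTERS = {
    XSD + "float": float,
    XSD + "integer": int,
    XSD + "date": date.fromisoformat,
    XSD + "string": str,
}


def literal_value(lexical: str, datatype=None):
    """Interpret a literal's string form according to its datatype."""
    if datatype is None:
        return lexical  # literals without datatype are treated as strings
    convert = CONVERTERS.get(datatype)
    if convert is None:
        raise ValueError(f"no converter registered for {datatype}")
    return convert(lexical)


print(literal_value("51.3333", XSD + "float"))
print(literal_value("1958-03-07", XSD + "date"))
print(literal_value("0341"))
```

Keeping the area code untyped preserves its leading zero, which an integer datatype would drop.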

Page 32: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data


RDF Serialization

N3: "Notation 3" - extensive formalism

N-Triples: part of N3

Turtle: Extension of N-Triples (shortcuts)

Source: http://www.w3.org/DesignIssues/Notation3.html

Page 33: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Turtle Syntax

• URIs in angle brackets
• Literals in quotes
• Triples separated by dot
• Whitespace is ignored

Page 34: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Turtle Syntax: Shortcuts

<http://dbpedia.org/resource/Leipzig> <http://dbpedia.org/property/hasMayor> <http://dbpedia.org/resource/Burkhard_Jung> ;
    <http://www.w3.org/2000/01/rdf-schema#label> "Leipzig"@de ;
    <http://www.w3.org/2003/01/geo/wgs84_pos#lat> "51.333332"^^<http://www.w3.org/2001/XMLSchema#float> ;
    <http://www.w3.org/2003/01/geo/wgs84_pos#lon> "12.383333"^^<http://www.w3.org/2001/XMLSchema#float> .

Shortcuts for namespace prefixes:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dbp: <http://dbpedia.org/resource/> .
@prefix dbpp: <http://dbpedia.org/property/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .

dbp:Leipzig dbpp:hasMayor dbp:Burkhard_Jung .
dbp:Leipzig rdfs:label "Leipzig"@de .
dbp:Leipzig geo:lat "51.333332"^^xsd:float .
dbp:Leipzig geo:lon "12.383333"^^xsd:float .

Page 35: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Turtle Syntax: Shortcuts

Group triples with the same subject using ";" instead of ".":

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dbp: <http://dbpedia.org/resource/> .
@prefix dbpp: <http://dbpedia.org/property/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .

dbp:Leipzig dbpp:hasMayor dbp:Burkhard_Jung ;
    rdfs:label "Leipzig"@de ;
    geo:lat "51.333332"^^xsd:float ;
    geo:lon "12.383333"^^xsd:float .

Also group triples with the same subject and predicate using ",":

@prefix dbp: <http://dbpedia.org/resource/> .
@prefix dbpp: <http://dbpedia.org/property/> .

dbp:Leipzig dbpp:locatedIn dbp:Saxony, dbp:Germany ;
    dbpp:hasMayor dbp:Burkhard_Jung .
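The two grouping shortcuts can be reproduced mechanically. The sketch below groups triples first by subject and then by predicate; it assumes the terms are already written in Turtle form (prefixed names or quoted literals) and does not emit the @prefix declarations. to_turtle is a name chosen for this example.

```python
def to_turtle(triples):
    """Serialise (s, p, o) strings, grouping with ';' and ','."""
    grouped = {}
    for s, p, o in triples:
        grouped.setdefault(s, {}).setdefault(p, []).append(o)

    blocks = []
    for s, predicates in grouped.items():
        # One "predicate object, object" entry per distinct predicate.
        entries = [f"{p} {', '.join(objs)}" for p, objs in predicates.items()]
        blocks.append(f"{s} " + " ;\n    ".join(entries) + " .")
    return "\n".join(blocks)


triples = [
    ("dbp:Leipzig", "dbpp:locatedIn", "dbp:Saxony"),
    ("dbp:Leipzig", "dbpp:locatedIn", "dbp:Germany"),
    ("dbp:Leipzig", "dbpp:hasMayor", "dbp:Burkhard_Jung"),
]
print(to_turtle(triples))
```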

Page 36: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

XML Syntax of RDF

• Turtle is intuitively readable and machine processable

• but: better tool support and programming libraries for XML

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:dbpp="http://dbpedia.org/property/"
         xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
  <rdf:Description rdf:about="http://dbpedia.org/resource/Leipzig">
    <dbpp:hasMayor rdf:resource="http://dbpedia.org/resource/Burkhard_Jung" />
    <rdfs:label xml:lang="de">Leipzig</rdfs:label>
    <geo:lat rdf:datatype="http://www.w3.org/2001/XMLSchema#float">51.3333</geo:lat>
    <geo:lon rdf:datatype="http://www.w3.org/2001/XMLSchema#float">12.3833</geo:lon>
  </rdf:Description>
</rdf:RDF>

Page 37: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

RDF/JSON

• JSON = JavaScript Object Notation
• Compact format for data exchange between applications
• JSON documents are valid JavaScript
• Programming language independent, since parsers exist for all popular programming languages
• Less overhead when parsing and serialising than XML

{ "S" : { "P" : [ O ] } }
• Subject: URI, BNode
• Predicate: URI
• Object:
  type: "uri", "literal" or "bnode"
  value: data value
  lang: language tag
  datatype: URI of the datatype
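The nested subject / predicate / object-list layout described above can be built from a list of triples in a few lines. This is a sketch using only Python's json module; to_rdf_json is a hypothetical helper, and each object is assumed to be given as a dictionary with the keys type and value plus, optionally, lang or datatype.

```python
import json


def to_rdf_json(triples):
    """Build the subject -> predicate -> [objects] structure."""
    document = {}
    for subject, predicate, obj in triples:
        document.setdefault(subject, {}).setdefault(predicate, []).append(obj)
    return document


triples = [
    ("http://dbpedia.org/resource/Leipzig",
     "http://dbpedia.org/property/hasMayor",
     {"type": "uri", "value": "http://dbpedia.org/resource/Burkhard_Jung"}),
    ("http://dbpedia.org/resource/Leipzig",
     "http://www.w3.org/2000/01/rdf-schema#label",
     {"type": "literal", "value": "Leipzig", "lang": "de"}),
]
document = to_rdf_json(triples)
print(json.dumps(document, indent=2))
```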

Page 38: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

JSON Example

{
  "http://dbpedia.org/resource/Leipzig" : {
    "http://dbpedia.org/property/hasMayor":
      [ { "type":"uri", "value":"http://dbpedia.org/resource/Burkhard_Jung" } ],
    "http://www.w3.org/2000/01/rdf-schema#label":
      [ { "type":"literal", "value":"Leipzig", "lang":"de" } ],
    "http://www.w3.org/2003/01/geo/wgs84_pos#lat":
      [ { "type":"literal", "value":"51.3333", "datatype":"http://www.w3.org/2001/XMLSchema#float" } ],
    "http://www.w3.org/2003/01/geo/wgs84_pos#lon":
      [ { "type":"literal", "value":"12.3833", "datatype":"http://www.w3.org/2001/XMLSchema#float" } ]
  }
}

Page 39: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

RDFa Syntax

• RDFa = Resource Description Framework in attributes
• Embedding RDF in XHTML
• UTF-8 and UTF-16, since it extends XML-based XHTML
• Due to embedding in HTML more overhead than other serialisations
• Less readable

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html version="XHTML+RDFa 1.0" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
      xmlns:dbpp="http://dbpedia.org/property/"
      xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
  <head><title>Leipzig</title></head>
  <body about="http://dbpedia.org/resource/Leipzig">
    <h1 property="rdfs:label" xml:lang="de">Leipzig</h1>
    <p>Leipzig is a city in Germany. Leipzig's mayor is
      <a href="Burkhard_Jung" rel="dbpp:hasMayor">Burkhard Jung</a>.
      It is located at latitude <span property="geo:lat" datatype="xsd:float">51.3333</span>
      and longitude <span property="geo:lon" datatype="xsd:float">12.3833</span>.</p>
  </body>
</html>

Page 40: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Vocabularies

• Schema layer of RDF
• Defines terms (classes and properties)
• Typically RDFS or OWL family
• Common vocabularies:
  Dublin Core, SKOS
  FOAF, SIOC, vCard
  DOAP
  Core Organization Ontology
  VoID

http://www.slideshare.net/prototypo/introduction-to-linked-data-rdf-vocabularies

Page 41: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Vocabularies: Friend-of-a-Friend (FOAF)

Defines classes and properties for representing information about people and their relationships

Soeren rdf:type foaf:Person .
Soeren currentProject http://OntoWiki.net .
Soeren foaf:homepage http://aksw.org/Soeren .
Soeren foaf:knows http://sembase.at/Tassilo .
Soeren foaf:sha1 09ac456515dee .

Page 42: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Vocabularies: Semantically-Interlinked Online Communities (SIOC)

Represents content from blogs, wikis, forums, mailing lists, chats etc.

Page 43: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Vocabularies: Simple Knowledge Organization System (SKOS)

Supports the use of thesauri, classification schemes, subject heading systems and taxonomies

Page 44: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Instance data

Instances are associated with one or several classes:

Boddingtons rdf:type Ale .
Grafentrunk rdf:type Bock .
Hoegaarden rdf:type White .
Jever rdf:type Pilsner .

Page 45: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data


The Linked Open Data cloud

Page 46: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Linked Open Data cloud

[Figure: snapshots of the Linked Open Data cloud diagram from 2007 to 2010]

Page 47: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Linked Open Data cloud

http://lod-cloud.net/

[Figure legend, dataset domains: Media, Government, Geo, Publications, User-generated, Life sciences, Cross-domain]

Page 48: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data


LOD cloud stats

triples distribution

links distribution

http://lod-cloud.net/state/

Page 49: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

TimBL’s 5-star plan for open data

★ Make your data available on the Web under an open license

★★ Make it available as structured data (Excel sheet instead of image scan of a table)

★★★ Use a non-proprietary format (CSV file instead of an Excel sheet)

★★★★ Use Linked Data format (URIs to identify things, RDF to represent data)

★★★★★ Link your data to other people’s data to provide context

More: http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/

Page 50: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Why go for the 5th star?

Central Contractor Registration (CCR)

Geonames

http://webofdata.wordpress.com/2011/05/22/why-we-link/

Page 51: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Effort distribution

[Figure: a fixed overall data integration effort, split between publisher's effort, third-party effort and consumer's effort]

Page 52: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Datasets

A dataset is a set of RDF triples that are published, maintained or aggregated by a single provider

Page 53: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data


Linksets

An RDF link is an RDF triple whose subject and object are described in different datasets

A linkset is a collection of such RDF links between two datasets
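The definition can be turned into a small filter: keep exactly those triples whose subject is described in one dataset and whose object is described in the other. The sketch assumes each dataset is represented simply by the set of resources it describes; the identifiers in the sample data, including the Geonames one, are placeholders.

```python
def extract_linkset(triples, described_a, described_b):
    """Return the RDF links between two datasets.

    described_a and described_b are the sets of resources described
    in each dataset (a simplifying assumption of this sketch).
    """
    links = []
    for s, p, o in triples:
        if (s in described_a and o in described_b) or (
            s in described_b and o in described_a
        ):
            links.append((s, p, o))
    return links


dbpedia = {"dbp:Galway"}
geonames = {"gn:Galway"}  # placeholder identifier
triples = [
    ("dbp:Galway", "owl:sameAs", "gn:Galway"),
    ("dbp:Galway", "rdfs:label", '"Galway"'),
]
print(extract_linkset(triples, dbpedia, geonames))
```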

Page 54: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Describing Datasets - VoID

• General dataset metadata
• Access metadata
• Structural metadata
• Describing linksets
• Deployment and discovery of voiD files

http://www.w3.org/TR/void/

Page 55: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

General dataset metadata

• Dataset homepage
• Publisher
• Title and description
• Categorisation
• Licensing
• Technical features

Page 56: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

General dataset metadata

:DBpedia a void:Dataset ;
    dcterms:title "DBpedia" ;
    dcterms:description "RDF data extracted from Wikipedia" ;
    dcterms:contributor :FU_Berlin ;
    dcterms:contributor :Uni_Leipzig ;
    dcterms:contributor :Openlink ;
    dcterms:source <http://dbpedia.org/resource/Wikipedia> ;
    void:feature <http://www.w3.org/ns/formats/RDF_XML> ;
    dcterms:modified "2008-11-17"^^xsd:date .

:Geonames a void:Dataset ;
    dcterms:subject <http://dbpedia.org/resource/Location> .

:GeoSpecies a void:Dataset ;
    dcterms:license <http://creativecommons.org/licenses/by-sa/3.0/us/> .

Page 57: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Access metadata

• SPARQL endpoints
• RDF data dumps
• Root resources
• URI lookup endpoints
• OpenSearch description documents

Page 58: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Access metadata

:exampleDS a void:Dataset ;
    void:sparqlEndpoint <http://example.org/sparql> ;
    void:dataDump <http://example.org/dump1.rdf> ;
    void:uriLookupEndpoint <http://api.example.org/search?qt=term> .

Page 59: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Structural metadata

Provides high-level information about the schema and internal structure of a dataset and can be helpful when exploring or querying datasets:

• Example resources
• Patterns for resource URIs
• Vocabularies
• Dataset partitions
• Statistics

Page 60: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Structural metadata

:DBpedia a void:Dataset ;
    void:exampleResource <http://dbpedia.org/resource/Berlin> .

:LiveJournal a void:Dataset ;
    void:vocabulary <http://xmlns.com/foaf/0.1/> .

:DBpedia a void:Dataset ;
    void:classPartition [ void:class foaf:Person ; void:entities 312000 ] ;
    void:propertyPartition [ void:property foaf:name ; void:triples 312000 ] .

Page 61: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data


Describing linksets

Page 62: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Describing linksets

:DBpedia a void:Dataset ;
    void:subset :DBpedia2Geonames .

:Geonames a void:Dataset .

:DBpedia2Geonames a void:Linkset ;
    void:target :DBpedia ;
    void:target :Geonames ;
    void:linkPredicate owl:sameAs .

Page 63: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Deployment and discovery

• Choosing URIs for datasets
• Publishing a VoID file alongside a dataset (Turtle, RDFa)
• SPARQL Service Description Vocabulary: http://www.w3.org/TR/sparql11-service-description/
• Discovery via a well-known URI, based on [RFC5785] and registered with IANA: http://www.example.com/.well-known/void
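Given any URI on a site, the location of its VoID description under the well-known path can be derived by keeping scheme and host and replacing the rest. A minimal sketch with the standard library follows; well_known_void is a name made up here, and the input URI is an arbitrary example.

```python
from urllib.parse import urlsplit, urlunsplit


def well_known_void(uri: str) -> str:
    """Return the /.well-known/void URI on the same host as `uri`."""
    parts = urlsplit(uri)
    # Keep scheme and authority, drop path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/.well-known/void", "", ""))


print(well_known_void("http://www.example.com/data/some/resource?x=1"))
```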

Page 64: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Consumption - Essentials

Linked Data provides for a global data-space with a uniform API (due to RDF as the data model)

Access methods:

• Dereference URIs via HTTP GET (RDF/XML, RDFa, etc.)
• SPARQL (‘the SQL of RDF’)
• Data dumps (RDF/XML, etc.)
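Dereferencing a URI with HTTP GET usually relies on content negotiation: the client states the RDF serialization it wants in the Accept header. The sketch below only constructs such a request with the Python standard library and does not send it; the helper name `rdf_request` and the default media type are illustrative assumptions.

```python
from urllib.request import Request


def rdf_request(uri, media_type="application/rdf+xml"):
    """Build (but do not send) an HTTP GET request asking for an RDF serialization."""
    return Request(uri, headers={"Accept": media_type}, method="GET")


req = rdf_request("http://dbpedia.org/resource/Berlin")
# Passing `req` to urllib.request.urlopen would perform the actual lookup;
# this is left out so that the sketch runs without network access.
```

Other serializations could be requested by passing a different media type, for example `text/turtle`, provided the server offers it.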

Page 65: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Consumption - Technologies

• Linked Data access mechanisms (HTTP interface & RDF parsing) are widely supported on all major platforms and languages, such as Java, Python, PHP, C/C++/.NET, etc.
• Command line tools (curl, rapper, etc.)
• Online tools
  – http://redbot.org/ (HTTP/low-level)
  – http://sindice.com/developers/inspector (RDF/data-level)
• Structured query: SPARQL (more later)

Page 66: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Consumption - Technologies

A distributed setup creates the need for a central point of access (indexer, aggregator):

Sindice, an index of the Web of Data http://sindice.com/

Sig.ma, Web of Data aggregator & browser http://sig.ma/

Relationship discovery http://relfinder.semanticweb.org/

Page 67: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Technologies – FYN

http://dbpedia.org/resource/Galway

Page 68: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Technologies – Sig.ma

http://sig.ma/search?q=Galway

Sig.ma is a Web of Data platform enabling entity visualisation and consolidation both for humans and machines (API)

Page 69: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Technologies – sameas.org

Sameas.org is a service to find co-references on the Web of Data

http://sameas.org/html?q=Galway

Page 70: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Linked Data Benefits: Uniformity

• All Linked Data datasets share a uniform data model, the RDF statement data model
• Information is represented in facts expressed as (subject, predicate, object) triples
• Components: globally unique IRI/URI entity identifiers & typed data values (literals) as objects

Page 71: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Linked Data Benefits: Dereferenceability

• URIs are not just used for identifying entities, but also (as URLs) for locating and retrieving resources that describe these entities on the Web

Page 72: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Linked Data Benefits: Coherence

• Triples containing URIs from different namespaces as subject and object establish a link between the entity identified by the subject and the entity identified by the object (typed RDF links)

[Figure: Berlin isCapitalOf Germany; Germany isMemberOf European Union; the triples span Knowledge base 1 and Knowledge base 2]

Page 73: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Linked Data Benefits: Integrability

• The RDF data model is based on a single mechanism for representing information (triples), which makes it very easy to attain a syntactic and simple semantic integration of different Linked Data sets
• Higher-level semantic integration can be achieved by employing schema and instance matching techniques and expressing the matches found as additional triple facts
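Because every dataset is a set of triples, syntactic integration amounts to a set union, and matches found by instance matching are simply further triples. The sketch below illustrates this with in-memory tuples; the `kb1:` and `kb2:` identifiers and the `merge` helper are hypothetical.

```python
def merge(*graphs):
    """Syntactic integration: the union of all triples from the given graphs."""
    merged = set()
    for graph in graphs:
        merged |= set(graph)
    return merged


# Two hypothetical knowledge bases with their own namespaces.
kb1 = {("kb1:Berlin", "kb1:isCapitalOf", "kb1:Germany")}
kb2 = {("kb2:Germany", "kb2:isMemberOf", "kb2:European_Union")}

# A match found by instance matching, expressed as one more triple.
links = {("kb1:Germany", "owl:sameAs", "kb2:Germany")}

merged = merge(kb1, kb2, links)
```

No schema alignment step is needed before the union; the owl:sameAs triple is what later allows a consumer to treat both Germany identifiers as the same entity.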

Page 74: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Linked Data Benefits: Timeliness

• Publishing and updating Linked Data is relatively simple, which facilitates timely availability
• Once a Linked Data source is updated, it is straightforward to access and use the updated data source (time-consuming and error-prone extraction, transformation and loading is not required)

Page 75: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Agenda

1. The Vision & Big Picture
2. Linked Data 101
3. The Linked Data Life-cycle

Page 76: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

What works now? What has to be done?

• Web - a global, distributed platform for data, information and knowledge integration
• exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF

Achievements

1. Extension of the Web with a data commons (25B facts)
2. Vibrant, global RTD community
3. Industrial uptake begins (e.g. BBC, Thomson Reuters, Eli Lilly)
4. Emerging governmental adoption in sight
5. Establishing Linked Data as a deployment path for the Semantic Web

Challenges

1. Coherence: relatively few, expensively maintained links
2. Quality: partly low-quality data and inconsistencies
3. Performance: still substantial penalties compared to relational
4. Data consumption: large-scale processing, schema mapping and data fusion still in their infancy
5. Usability: establishing direct end-user tools and network effect

[Figure: snapshots dated July 2007, April 2008, September 2008 and July 2009]

Page 77: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Linked Data Lifecycle

[Figure: cycle of lifecycle stages]
• Extraction
• Storage / Querying
• Manual revision / authoring
• Inter-linking / Fusing
• Classification / Enrichment
• Quality Analysis
• Evolution / Repair
• Search / Browsing / Exploration

Page 78: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Extraction

Page 79: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Extraction

From unstructured sources
• NLP, text mining, annotation

From semi-structured sources
• DBpedia, LinkedGeoData, SCOVO/DataCube

From structured sources
• RDB2RDF

Page 80: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Transforming Wikipedia into a Knowledge Base

Extract structured information from Wikipedia and make this information available on the Web as LOD:
• ask sophisticated queries against Wikipedia (e.g. universities in Brandenburg, mayors of elevated towns, soccer players)
• link other data sets on the Web to Wikipedia data
• represents a community consensus

The recently launched DBpedia Live transforms Wikipedia into a structured knowledge base

Page 81: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Structure in Wikipedia

• Title
• Abstract
• Infoboxes
• Geo-coordinates
• Categories
• Images
• Links
  – other language versions
  – other Wikipedia pages
  – To the Web
  – Redirects
  – Disambiguations

Page 82: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Infobox templates

Wikitext syntax:

{{Infobox Korean settlement
| title = Busan Metropolitan City
| img = Busan.jpg
| imgcaption = A view of the [[Geumjeong]] district in Busan
| hangul = 부산 광역시
...
| area_km2 = 763.46
| pop = 3635389
| popyear = 2006
| mayor = Hur Nam-sik
| divs = 15 wards (Gu), 1 county (Gun)
| region = [[Yeongnam]]
| dialect = [[Gyeongsang]]
}}

RDF representation (http://dbpedia.org/resource/Busan):

dbp:Busan dbpp:title "Busan Metropolitan City"
dbp:Busan dbpp:hangul "부산 광역시"@Hang
dbp:Busan dbpp:area_km2 "763.46"^^xsd:float
dbp:Busan dbpp:pop "3635389"^^xsd:int
dbp:Busan dbpp:region dbp:Yeongnam
dbp:Busan dbpp:dialect dbp:Gyeongsang
...
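The step from infobox wikitext to triples can be sketched in a few lines. This is a deliberately simplified illustration and not the DBpedia extraction framework: it assumes one `| key = value` pair per line, treats a value consisting of a single `[[wiki link]]` as a resource, and emits everything else as an untyped literal. The function name `infobox_to_triples` is hypothetical.

```python
import re

# A value that consists of exactly one wiki link, e.g. [[Yeongnam]].
LINK = re.compile(r"^\[\[([^\]|]+)\]\]$")


def infobox_to_triples(subject, wikitext):
    """Turn '| key = value' infobox lines into (subject, property, object) tuples."""
    triples = []
    for line in wikitext.splitlines():
        line = line.strip()
        if not line.startswith("|") or "=" not in line:
            continue  # skips the '{{Infobox ...' opener and the closing '}}'
        key, value = (part.strip() for part in line[1:].split("=", 1))
        link = LINK.match(value)
        if link:
            # Wiki links become resources in the dbp: namespace.
            obj = "dbp:" + link.group(1).replace(" ", "_")
        else:
            # Everything else is kept as a plain (untyped) literal.
            obj = '"' + value + '"'
        triples.append((subject, "dbpp:" + key, obj))
    return triples


sample = """{{Infobox Korean settlement
| title = Busan Metropolitan City
| area_km2 = 763.46
| region = [[Yeongnam]]
}}"""
triples = infobox_to_triples("dbp:Busan", sample)
```

Datatype detection (for instance recognising 763.46 as xsd:float) is omitted here; in the slides that is the job of the dedicated parsers.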

Page 83: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

A vast multi-lingual, multi-domain knowledge base

DBpedia extraction results in:
• descriptions of ca. 3.4 million things (1.5 million classified in a consistent ontology, including 312,000 persons, 413,000 places, 94,000 music albums, 49,000 films, 15,000 video games, 140,000 organizations, 146,000 species, 4,600 diseases)
• labels and abstracts for these 3.2 million things in up to 92 different languages; 1,460,000 links to images and 5,543,000 links to external web pages; 4,887,000 external links into other RDF datasets, 565,000 Wikipedia categories, and 75,000 YAGO categories
• altogether over 1 billion pieces of information (i.e. RDF triples): 257M from the English edition, 766M from other language editions
• DBpedia Live (http://live.dbpedia.org/sparql/) & Mappings Wiki (http://mappings.dbpedia.org) integrate the community into a refinement cycle
• Upcoming: DBpedia inline

Page 84: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

2011/05/12 CONSEGI - Sören Auer: DBpedia 84

DBpedia Architecture

[Figure: DBpedia architecture diagram. Components shown:]
• Sources: Wikipedia dumps, Wikipedia OAI-PMH, Wikipedia database, live Wikipedia (update stream, article queue)
• Extraction: Extraction Manager, Extraction Job, PageCollections, Destinations (N-Triple dumps via N-Triple serializer, SPARQL-Update destination)
• Extractors: Generic Infobox, Mapping-based Infobox, Label, Geo, Redirect, Disambiguation, Image, Abstract, Pagelink, Category
• Parsers: DateTime, Units, String-List, Numbers, Geo
• Ontology mappings
• Publication: Virtuoso triple store, SPARQL endpoint, Linked Data
• Consumers on the Web: RDF browser, HTML browser, SPARQL clients, DBpedia apps

Page 85: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Hierarchies

DBpedia Ontology Schema:
• manually created for DBpedia (infoboxes)
• 275 classes + 1335 properties; 20 million triples

YAGO:
• large hierarchy linking Wikipedia leaf categories to WordNet
• 250,000 classes

UMBEL (Upper Mapping and Binding Exchange Layer):
• 20,000 classes derived from OpenCyc

Wikipedia Categories:
• not a class hierarchy (e.g. cycles), represented using SKOS
• 415,000+ categories

Page 86: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

DBpedia SPARQL Endpoint

http://dbpedia.org/sparql, hosted on an OpenLink Virtuoso server, can answer SPARQL queries like:
• Give me all sitcoms that are set in NYC
• All tennis players from Moscow
• All films by Quentin Tarantino
• All German musicians that were born in Berlin in the 19th century
• All soccer players with shirt number 11, playing for a club having a stadium with over 40,000 seats, who were born in a country with over 10 million inhabitants

Page 87: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

DBpedia SPARQL Endpoint

SELECT ?name ?birth ?description ?person WHERE {
    ?person dbp:birthPlace dbp:Berlin .
    ?person skos:subject dbp:Cat:German_musicians .
    ?person dbp:birth ?birth .
    ?person foaf:name ?name .
    ?person rdfs:comment ?description .
    FILTER (LANG(?description) = 'en') .
} ORDER BY ?name
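A query like the one above is typically sent to the endpoint over HTTP, with the query text passed in the `query` parameter of a GET request as defined by the SPARQL protocol. The sketch below only builds the request URL and sends nothing; the helper name `sparql_get_url` is hypothetical, and the shortened query assumes that the `dbp:` prefix is predefined on the endpoint.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://dbpedia.org/sparql"


def sparql_get_url(endpoint, query):
    """Encode a SPARQL query into the URL of an HTTP GET request."""
    return endpoint + "?" + urlencode({"query": query})


# Shortened variant of the query above, for illustration only.
query = "SELECT ?person WHERE { ?person dbp:birthPlace dbp:Berlin . } LIMIT 10"
url = sparql_get_url(ENDPOINT, query)

# Round trip: decoding the URL yields the original query text again.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

The result format would normally be selected with an Accept header such as `application/sparql-results+json`, which is omitted here.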

Page 88: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

DBpedia Applications

• DBpedia Mobile: location-aware mobile client for DBpedia
  – Uses current location and DBpedia to display a map
  – Can navigate into other knowledge bases
• DBpedia Query Builder: user front end for building queries
• DBpedia Relationship Finder: finds relations between two objects in DBpedia

Page 89: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

DBpedia Applications

Page 90: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

DBpedia Applications: Relfinder

http://www.visualdataweb.org/relfinder.php

Page 91: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

DBpedia Applications: Zemanta

Page 92: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

DBpedia Applications: Faceted-Browser

Page 93: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

DBpedia Applications (3rd party)

Muddy Boots (BBC): Annotate actors in BBC News with DBpedia identifiers

Open Calais (Reuters): named entity recognition; entities are connected via owl:sameAs to DBpedia, Freebase, Geonames

Faviki: social bookmarking tool that uses DBpedia in the backend to group tags and to provide multi-language support

Topbraid Composer: ontology editor, which links entities to DBpedia based on their labels

Page 94: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Universität Leipzig ▪ Agile Knowledge Engineering and Semantic Web (AKSW)

Authors: Sören Auer, Jens Lehmann, Slide 94

LinkedGeoData

Conversion, interlinking and publishing of OpenStreetMap.org* data sets as RDF.

* ”Wikipedia for geographic data”

Page 95: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Motivation

● Ease information integration tasks that require spatial knowledge, such as
  ● Offerings of bakeries next door
  ● Map of distributed branches of a company
  ● Historical sights along a bicycle track
● Therefore use RDF/OWL in order to overcome structural and semantic heterogeneity.
  ● Requires a vocabulary, which we try to establish.
● LOD cloud contains data sets with spatial features
  ● e.g. Geonames, DBpedia, US census, EuroStat
  ● But: they are restricted to popular or large entities like countries, famous places etc.
  ● Therefore they lack buildings, roads, mailboxes, etc.

Page 96: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

OpenStreetMap - Datamodel

● Basic entities are:
  ● Nodes: latitude, longitude
  ● Ways: sequence of nodes
  ● Relations: associations between any number of nodes, ways and relations

● Each entity may be described with tags (= key-value pairs)

Page 97: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Example: Leipzig's zoo

Page 99: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Data/Mapping Example

node_id   | k                | v
----------+------------------+-----------------------------------------------
259212302 | name             | Universität Leipzig, Mathematik und Informatik
259212302 | amenity          | university
259212302 | addr:street      | Johannisgasse
259212302 | addr:postcode    | 04103
259212302 | addr:housenumber | 26
259212302 | addr:city        | Leipzig

lgd:node259212302 a lgdo:University ;
    rdfs:label "Universität Leipzig, Mathematik und Informatik" ;
    lgdo:hasCity "Leipzig" ;
    lgdo:hasHouseNumber "26" ;
    lgdo:hasPostalCode "04103" ;
    lgdo:hasStreet "Johannisgasse" ;
    georss:point "51.3369334 12.385401" ;
    geo:lat 51.3369334 ;
    geo:long 12.385401 .

Page 100: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Mapping Types

● Three mapping types
  ● Text
    – (5, name, Leipzig) → lgd:node5 rdfs:label "Leipzig"
    – (5, name:de, Leipzig) → lgd:node5 rdfs:label "Leipzig"@de
  ● Datatypes
    – (6, seats, 4) → lgd:node6 lgdo:seats "4"^^xsd:integer
  ● Classes/Object Properties
    – (7, place, city) → lgdn:7 a lgdo:City
    – (7, religion, pastafarian) → lgdn:7 lgdo:religion lgdo:Pastafarian
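The three mapping types can be illustrated with a small rule-based function. This is a sketch under simplifying assumptions and not the LinkedGeoData implementation: the lookup tables `datatypes` and `classes` are hypothetical stand-ins for the real mapping configuration, only class mappings (not object properties) are covered, and all subjects use the `lgd:node` prefix.

```python
def map_tag(node_id, key, value, datatypes, classes):
    """Map one OSM (node_id, key, value) tag to a triple, or None if unmapped."""
    subject = f"lgd:node{node_id}"
    # Text mapping: 'name' becomes a label, 'name:<lang>' a language-tagged label.
    if key == "name":
        return (subject, "rdfs:label", f'"{value}"')
    if key.startswith("name:"):
        lang = key.split(":", 1)[1]
        return (subject, "rdfs:label", f'"{value}"@{lang}')
    # Datatype mapping: keys with a known datatype become typed literals.
    if key in datatypes:
        return (subject, f"lgdo:{key}", f'"{value}"^^{datatypes[key]}')
    # Class mapping: known (key, value) pairs assign the node to a class.
    if (key, value) in classes:
        return (subject, "a", classes[(key, value)])
    return None


# Hypothetical mapping tables, for illustration only.
datatypes = {"seats": "xsd:integer"}
classes = {("place", "city"): "lgdo:City"}

label = map_tag(5, "name:de", "Leipzig", datatypes, classes)
seats = map_tag(6, "seats", "4", datatypes, classes)
city = map_tag(7, "place", "city", datatypes, classes)
```

Tags that match no rule return None, which leaves the decision of whether to drop them or emit a generic property to the caller.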

Page 101: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Access

● REST interface (based on PostGIS DB, full OSM dataset loaded, > 1 billion triples)
  ● Supports limited queries (e.g. circular/rectangular area, filtering by labels)
● SPARQL endpoints (based on Virtuoso DB, subset of OSM dataset, ~222M triples)
  ● Static (http://linkedgeodata.org/sparql)
  ● Live (http://live.linkedgeodata.org/sparql)

● Downloads (http://downloads.linkedgeodata.org)

● Monthly updates on the above datasets envisioned

Page 102: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

LinkedGeoData Live

● OpenStreetMap provides full dumps and minutely changesets for download

● Changesets are numbered, e.g. ”001/234/567.osc.gz”

● We also convert the changesets to sets of added and removed triples (relative to our store) and publish them

● 001/234/567.added.nt.gz

● 001/234/567.removed.nt.gz

● Advantage: Other users could easily sync their RDF store with LinkedGeoData
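Synchronising a local store from the published added/removed files can be sketched as plain set operations. The example treats each N-Triples line as an opaque string and keeps the store in memory; a real setup would read the gzipped files and issue updates against a triple store. The helper names and the sample statements are hypothetical.

```python
def parse_nt(text):
    """Collect the non-empty, non-comment lines of an N-Triples document."""
    lines = (line.strip() for line in text.splitlines())
    return {line for line in lines if line and not line.startswith("#")}


def apply_changeset(store, added, removed):
    """Return the store after deleting the removed and inserting the added triples."""
    return (set(store) - set(removed)) | set(added)


# Hypothetical statements, for illustration only.
store = parse_nt("""
<http://example.org/n1> <http://example.org/p> "old" .
<http://example.org/n2> <http://example.org/p> "kept" .
""")
removed = parse_nt('<http://example.org/n1> <http://example.org/p> "old" .')
added = parse_nt('<http://example.org/n1> <http://example.org/p> "new" .')

store = apply_changeset(store, added, removed)
```

Changesets would have to be applied in their numbered order, since each one is relative to the state left by the previous one.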

Page 104: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

DBpedia Mapping – Step By Step

Given a DBpedia point, query LGD points within a type-specific maximum distance

Basic idea (performed with Silk):

● Compute spatial score

● Compute name similarity (rdfs:label)

● Combine both scores

● Depending on final score, either automatically accept/reject links or mark for manual verification.
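The scoring steps can be sketched as follows. This is not the Silk configuration used for LinkedGeoData: the great-circle distance, the string similarity from Python's difflib, the equal weighting and the accept/reject thresholds are all assumptions chosen for illustration.

```python
import math
from difflib import SequenceMatcher


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def spatial_score(dist_km, max_km):
    """1.0 at distance zero, falling linearly to 0.0 at the maximum distance."""
    return max(0.0, 1.0 - dist_km / max_km)


def name_score(a, b):
    """Case-insensitive string similarity of two labels, between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_score(p, q, max_km, w_spatial=0.5):
    """Weighted combination of the spatial score and the name similarity."""
    d = haversine_km(p["lat"], p["long"], q["lat"], q["long"])
    return w_spatial * spatial_score(d, max_km) + (1 - w_spatial) * name_score(p["label"], q["label"])


def decide(score, accept=0.9, reject=0.5):
    """Accept or reject automatically, otherwise mark for manual verification."""
    if score >= accept:
        return "accept"
    if score < reject:
        return "reject"
    return "verify"


# Hypothetical candidate pair with identical position and label.
dbpedia_point = {"label": "Leipzig Zoo", "lat": 51.35, "long": 12.37}
lgd_point = {"label": "Leipzig Zoo", "lat": 51.35, "long": 12.37}
score = link_score(dbpedia_point, lgd_point, max_km=1.0)
```

The type-specific maximum distance from the slide corresponds to `max_km`, which would be larger for cities than for individual buildings.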

Page 105: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Statistics (2011-Feb-23)

● 222.539.712 Triples
● 6.666.865 Ways
● 5.882.306 Nodes
● Among them
  ● 352.673 PlaceOfWorship
  ● 60.573 RailwayStation
  ● 59.468 Recycling
  ● 50.955 Town
  ● 30.099 Toilet
  ● 7.222 City

Page 106: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Conclusion

● OpenStreetMap

● immensely successful project for collaboratively creating free spatial data

● Community uses key value structures, which provide a rich source of information

● Key strength: broad coverage

● LGD Contributions

● Established mapping to DBpedia

● Geonames mapping partially done (37 different entity types: cities, churches, ...)

● Facet-based LGD Browser provides an interface for OSM/LGD, which highlights its structural aspects

● Live sync

● Goal: Make LGD as useful (successful) as DBpedia for the geospatial domain

Page 107: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Extraction: Relational Data

Many different approaches (D2R, Virtuoso RDF Views, Triplify, …)

No agreement on a formal semantics of RDB2RDF mapping
• LOD readiness, SPARQL-SQL translation

W3C RDB2RDF WG

Tool               | Triplify                    | D2RQ           | Virtuoso RDF Views
-------------------+-----------------------------+----------------+--------------------------
Technology         | Scripting languages (PHP)   | Java           | Whole middleware solution
SPARQL endpoint    | -                           | X              | X
Mapping language   | SQL                         | RDF based      | RDF based
Mapping generation | Manual                      | Semi-automatic | Manual
Scalability        | Medium-high (but no SPARQL) | Medium         | High

Page 108: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Extraction Challenges

From unstructured sources
• Deploy existing NLP approaches (OpenCalais, Ontos API)
• Develop standardized, LOD-enabled interfaces between NLP tools (NLP2RDF)

From semi-structured sources
• Efficient bi-directional synchronization

From structured sources
• Declarative syntax and semantics of data model transformations (W3C WG RDB2RDF)

Orthogonal challenges
• Using LOD as background knowledge
• Provenance

Page 109: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Storage and Querying

Page 110: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

RDF Data Management

• Still slower than relational data management by a factor of 5-50 (BSBM, DBpedia Benchmark)
• Performance increases steadily
• Comprehensive, well-supported open-source and commercial implementations are available:
  • OpenLink’s Virtuoso (os+commercial)
  • Big OWLIM (commercial), Swift OWLIM (os)
  • 4store (os)
  • Talis (hosted)
  • Bigdata (distributed)
  • Allegrograph (commercial)
  • Mulgara (os)

Page 111: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

DBpedia Benchmark

• Uses DBpedia as data and a selection of 25 frequently executed queries
• Can generate fractions and multiples of DBpedia’s size
• Does not resemble relational data

Performance differences observed with other benchmarks are amplified

[Figure: chart titled "Geometric Mean"]

Page 112: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Storage and Querying Challenges

• Reduce the performance gap between relational and RDF data management
• SPARQL Query extensions
• Spatial/semantic/temporal data management
• More advanced query result caching
• View maintenance / adaptive reorganization based on common access patterns
• More realistic benchmarks

Page 113: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Authoring

Page 114: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Two Kinds of Semantic Wikis

1. Semantic (Text) Wikis
• Authoring of semantically annotated texts

2. Semantic Data Wikis
• Direct authoring of structured information (i.e. RDF, RDF-Schema, OWL)

Page 115: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

OntoWiki – a semantic data wiki

• Versatile domain-independent tool
• Serves as Linked Data / SPARQL endpoint on the Data Web
• Open-source project hosted at Google code
• Not just a Wiki UI, but a whole framework for the development of Semantic Web applications
• Developed in PHP based on the Zend framework
• Very active developer and user community
• More than 500 downloads monthly
• Large number of use cases

Page 116: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

OntoWiki

Dynamic views on knowledge bases

Page 117: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

OntoWiki

RDF triples on resource details page

Page 118: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

OntoWiki

Dynamic suggestions from the Data Web

Page 119: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Catalogus Professorum Lipsiensis

Page 120: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

OntoWiki: Caucasian Spiders

Page 121: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

RDFauthor in OntoWiki

Page 122: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Semantic Portal with OntoWiki: Vakantieland

Page 123: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

RDFaCE - RDFa Content Editor

Page 124: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

RDFaCE Architecture

Page 125: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Integrating various NLP APIs

Page 126: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

© CC-BY-NC-ND by ~Dezz~ (residae on flickr)

Linking

Page 127: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

LOD Linking

Automatic

Semi-automatic
• SILK
• LIMES

Manual
• Sindice integration into UIs
• Semantic Pingback

Page 128: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

LIMES 0.3: Basic Idea

• Uses the characteristics of metric spaces
• Especially consequences of the triangle inequality
  ◦ d(x, y) ≤ d(x, z) + d(z, y)
  ◦ d(x, z) - d(z, y) ≤ d(x, y) ≤ d(x, z) + d(z, y)
• Basic idea
  ◦ Use pessimistic approximations of distances instead of computing them
  ◦ Only compute distances when needed

Page 129: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Overview

[Figure: LIMES workflow: Knowledge sources → Computation of exemplars → Filtering → Similarity computation → Serialization]

Page 130: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Computation of Exemplars

• Assumption: number of exemplars is given
• Goal: Segment target data set

Page 135: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Computation of Exemplars

NB: Distances from exemplars to all other points are known

Page 136: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Filtering

[Figure: points x, y and z]

1. Measure distance from each x to each exemplar

Page 137: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Filtering

2. Apply d(x, y) - d(y, z) > t ⇒ d(x, z) > t

Page 138: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Similarity Computation

d(x, y) - d(y, z) < t ⇒ compute d(x, z)
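The filtering and similarity computation steps can be sketched for points in the Euclidean plane. This is an illustrative reading of the slides and not the LIMES code: exemplars are supplied by the caller, every target point z is assigned to its nearest exemplar y with d(y, z) precomputed, and for a source point x the bound d(x, y) - d(y, z) > t is used to skip z without computing d(x, z). All function names are hypothetical.

```python
import math


def build_index(targets, exemplars):
    """Assign each target point to its nearest exemplar and store d(y, z)."""
    index = {y: [] for y in exemplars}
    for z in targets:
        y = min(exemplars, key=lambda e: math.dist(e, z))
        index[y].append((z, math.dist(y, z)))
    return index


def match(sources, index, t):
    """Return pairs (x, z) with d(x, z) <= t and the number of d(x, z) computations."""
    matches, computed = [], 0
    for x in sources:
        for y, members in index.items():
            d_xy = math.dist(x, y)  # step 1: distance from x to the exemplar
            for z, d_yz in members:
                # Step 2: by the triangle inequality d(x, z) >= d(x, y) - d(y, z),
                # so a lower bound above t rules z out without computing d(x, z).
                if d_xy - d_yz > t:
                    continue
                computed += 1  # counts only source-target distance computations
                if math.dist(x, z) <= t:
                    matches.append((x, z))
    return matches, computed


# Hypothetical data: two clusters of target points and one source point.
targets = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
exemplars = [(0.0, 0.0), (10.0, 10.0)]
sources = [(0.0, 0.5)]

matches, computed = match(sources, build_index(targets, exemplars), t=1.0)
brute_force = [(x, z) for x in sources for z in targets if math.dist(x, z) <= 1.0]
```

In this toy setting the filter computes two source-target distances where brute force needs four, while returning the same matches.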

Page 139: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Serialization

• Results are returned as RDF
• For example, mapping DBpedia and Drugbank:

@prefix drugbank: <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/> .
@prefix dbpedia: <http://dbpedia.org/ontology/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

dbpedia:Cefaclor owl:sameAs drugbank:DB00833 .
dbpedia:Clortermine owl:sameAs drugbank:DB01527 .
dbpedia:Prednicarbate owl:sameAs drugbank:DB01130 .
dbpedia:Linezolid owl:sameAs drugbank:DB00601 .
dbpedia:Valaciclovir owl:sameAs drugbank:DB00577 .
…

Page 140: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Experiments

• Q1: What is the best number of exemplars?
• Q2: What is the relation between the similarity threshold q and the total number of comparisons?
• Q3: Does the assignment of S and T matter?
• Q4: How does LIMES compare to SILK?

Page 141: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Q1 and Q2

• Experiments on synthetic data
• Knowledge bases of sizes 2000, 3000, 5000, 7500 and 10000
• Varied number of exemplars
• Varied thresholds
• Experiments were repeated 5 times
• Average results are presented

Page 142: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Q1 and Q2

[Figure: line chart; x-axis ticks from 0 to 300, y-axis ticks from 0 to 120,000,000; one series per threshold (0.75, 0.8, 0.85, 0.9, 0.95) plus brute force]

Page 143: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Q1 and Q2
• Q1
◦ Best number of exemplars depends on q
◦ For q > 0.9, best number lies around |T|^(1/2)
• Q2
◦ As expected, number of comparisons diminishes with growing q

Page 144: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Q3 (order of S and T)
• Experiments on synthetic data
• Knowledge bases of sizes 1000, 2000, 3000, …, 10000
• Number of exemplars was |T|^(1/2)
• Experiments were repeated 5 times
• Average results are presented

Page 145: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Q3

T\S     1000  2000  3000  4000  5000  6000  7000  8000  9000  10000
1000    0.20  0.37  0.53  0.69  0.88  1.04  1.14  1.40  1.58  1.67
2000    0.36  0.64  0.88  1.24  1.37  1.63  1.97  2.25  2.50  2.70
3000    0.51  0.86  1.17  1.57  2.00  2.09  2.69  2.91  3.35  3.58
4000    0.70  1.11  1.59  2.00  2.45  2.88  3.10  3.61  3.94  4.50
5000    0.85  1.36  1.87  2.28  2.81  3.39  3.91  4.20  4.84  5.54
6000    1.02  1.60  2.14  2.81  3.29  3.93  4.44  4.96  5.39  6.08
7000    1.22  1.86  2.58  3.15  3.66  4.35  5.11  5.69  6.44  6.62
8000    1.41  2.04  2.78  3.43  4.06  4.98  5.51  6.55  7.14  7.53
9000    1.63  2.36  2.99  3.85  4.72  5.44  6.25  6.88  7.59  8.20
10000   1.80  2.62  3.51  4.25  4.97  6.01  6.33  7.81  8.31  9.15

Green = S first is more time-efficient

Overall less than 5% difference

Page 146: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Q4 (comparison with SILK)
• 3 experiments on real data
◦ Drugs
◦ Diseases
◦ SimCities
• Number of exemplars was |T|^(1/2)
• Comparison of runtime with SILK
• Experiments were repeated thrice
• Best runtimes are presented

Page 147: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Q4

[Figure: bar chart for Drugbank, SimCities and Diseases; y-axis from 0.00% to 100.00%; series: LIMES (0.95), LIMES (0.90), LIMES (0.85), LIMES (0.80), LIMES (0.75), SILK]

Page 148: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Q4
• We outperform SILK 2 by 1.5 orders of magnitude
• The larger the data sources, the higher our speedup (64 for SimCities)
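For readers who want the two figures on a common scale, the following arithmetic sketch converts between "orders of magnitude" and plain speedup factors. It only restates the numbers from the slide; the variable names are made up for the example.

```python
import math

# 1.5 orders of magnitude expressed as a plain factor.
factor_for_1_5_orders = 10 ** 1.5      # roughly 31.6

# A speedup factor of 64 expressed in orders of magnitude.
orders_for_factor_64 = math.log10(64)  # roughly 1.81

print(round(factor_for_1_5_orders, 1), round(orders_for_factor_64, 2))
```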

Page 149: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Creating a network effect around Linking Data: Semantic Pingback
• Update and notification services for LOD
• Downward compatible with Pingback (blogosphere)
• http://aksw.org/Projects/SemanticPingBack
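Downward compatibility with Pingback suggests that a classic ping can still be sent. Classic Pingback is an XML-RPC call, `pingback.ping(sourceURI, targetURI)`; the sketch below only builds that request body with Python's standard library and sends nothing. The example URIs and the helper name `build_ping` are placeholders, and how Semantic Pingback extends this call is not shown here.

```python
import xmlrpc.client

def build_ping(source_uri, target_uri):
    """Build the XML-RPC request body of a classic pingback.ping call."""
    return xmlrpc.client.dumps((source_uri, target_uri), methodname="pingback.ping")

# Placeholder URIs: the source links to the target and notifies it.
payload = build_ping("http://example.org/source", "http://example.org/target")
print(payload)
```

A client would POST this body to the Pingback endpoint advertised by the target resource.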

Page 150: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Visualizing Pingbacks in OntoWiki

Page 151: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Interlinking Challenges
Only 5% of the information on the Data Web is actually linked
• Make sense of work in the de-duplication/record linkage literature
• Consider the open world nature of Linked Data
• Use LOD background knowledge
• Zero-configuration linking
• Explore active learning approaches, which integrate users in a feedback loop
• Maintain a 24/7 linking service: Linked Open Data Around-The-Clock project (LATC-project.eu)

Page 152: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Enrichment

Page 153: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Enrichment & Repair
Linked Data is mainly instance data and !!!
The ORE (Ontology Repair and Enrichment) tool helps improve an OWL ontology by fixing inconsistencies and suggesting further axioms.
• Ontology Debugging: uses OWL reasoning to detect inconsistencies and unsatisfiable classes, and identifies the most likely sources of the problems. The user can create a repair plan while maintaining full control.
• Ontology Enrichment: uses the DL-Learner framework to suggest definitions and super classes for existing classes in the KB. Works if instance data is available, for harmonising schema and data.
http://aksw.org/Projects/ORE

Page 154: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Analysis

Quality

Page 155: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Linked Data Quality Analysis
Quality on the Data Web varies a lot
• Hand-crafted or expensively curated knowledge bases (e.g. DBLP, UMLS) vs. knowledge extracted from text or Web 2.0 sources (DBpedia)
Research Challenge
• Establish measures for assessing the authority, provenance and reliability of Data Web resources

Page 156: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Evolution
(© CC-BY-SA by alasis on flickr)

Page 157: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

EvoPat – Pattern based KB Evolution
• Unified method for both data evolution and ontology refactoring.
• Modularized, declarative definition of evolution patterns is relatively simple compared to an imperative description of evolution.
• Allows domain experts and knowledge engineers to amend the ontology structure and modify data with just a few clicks.
• Combined with the RDF representation of evolution patterns and their exposure on the Linked Data Web, EvoPat facilitates the development of an evolution pattern ecosystem.
• Patterns can be shared and reused on the Data Web.
• Declarative definition of bad smells and corresponding evolution patterns promotes the (semi-)automatic improvement of information quality.
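To make the contrast between declarative and imperative evolution concrete, here is a toy sketch. It is not EvoPat's pattern format or API: the dictionary layout, the `rename-property` pattern, the `apply_pattern` interpreter and the `ex:` names are all hypothetical. The idea it illustrates is that a pattern is plain data with variables, and one generic interpreter applies any such pattern to the triples.

```python
# Hypothetical, simplified pattern description: plain data, no imperative logic.
RENAME_PROPERTY = {
    "name": "rename-property",
    "variables": ["old", "new"],
    "select_predicate": "{old}",    # which triples the pattern selects
    "replace_predicate": "{new}",   # what the selected predicate becomes
}

def apply_pattern(pattern, triples, **bindings):
    """Generic interpreter: bind the pattern variables, then rewrite the triples."""
    missing = [v for v in pattern["variables"] if v not in bindings]
    if missing:
        raise ValueError(f"unbound pattern variables: {missing}")
    old = pattern["select_predicate"].format(**bindings)
    new = pattern["replace_predicate"].format(**bindings)
    return [(s, new if p == old else p, o) for s, p, o in triples]

triples = [
    ("ex:a", "ex:birthPlace", "ex:Leipzig"),
    ("ex:a", "ex:name", "A"),
]
evolved = apply_pattern(RENAME_PROPERTY, triples, old="ex:birthPlace", new="ex:placeOfBirth")
print(evolved)
```

Because the pattern is data rather than code, it could in principle be published and reused like any other resource, which is the property the slide emphasizes.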

Page 158: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Evolution Patterns

Page 159: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Page 160: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Exploration

Page 161: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

An ecosystem of LOD visualizations
LOD Exploration Widgets
• Spatial faceted-browsing
• Faceted-browsing
• Statistical visualization
• Entity-/faceted-based browsing
• Domain specific visualizations
• …
LOD Datasets
Choreography layer
• Dataset analysis (size, vocabularies, property histograms etc.)
• Selection of suitable visualization widgets

Page 162: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

TODO: Put ULEI slides

Faceted spatial-semantic browsing component

Page 163: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Faceted spatial-semantic browsing - Availability
• Pure JavaScript; requires only a SPARQL endpoint with Cross-Origin Resource Sharing (CORS) enabled for data access.
• Operates on local spatial regions; does not depend on global meta-data about the data.
Source code:
• https://github.com/AKSW/SpatialSemanticBrowsingWidgets
Online Demo - LinkedGeoData Browser:
• http://browser.linkedgeodata.org
Next steps
• Polygon/curve markers, domain specific visualization templates, integration of other sources, mobile interface
Publication:
• Claus Stadler, Jens Lehmann, Konrad Höffner, Sören Auer: LinkedGeoData: A Core for a Web of Spatial Open Data. To appear in Semantic Web Journal - Special Issue on Linked Spatiotemporal Data and Geo-Ontologies.
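A client that works on local spatial regions against a plain SPARQL endpoint could restrict its query to the visible map area. The sketch below is an assumption about how such a query might look, not code from the widget: it uses the W3C WGS84 `geo:lat`/`geo:long` properties, and the function name `bbox_query` and the example coordinates are made up.

```python
def bbox_query(min_lat, min_long, max_lat, max_long, limit=100):
    """Build a SPARQL query for resources inside a latitude/longitude bounding box."""
    return f"""PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?s ?lat ?long WHERE {{
  ?s geo:lat ?lat ;
     geo:long ?long .
  FILTER (?lat >= {min_lat} && ?lat <= {max_lat} &&
          ?long >= {min_long} && ?long <= {max_long})
}}
LIMIT {limit}"""

# Example region; the coordinates are illustrative only.
query = bbox_query(51.3, 12.3, 51.4, 12.5)
print(query)
```

Since only the bounding box changes between requests, no global metadata about the dataset is needed to issue such a query.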

Page 164: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Generic entity-based exploration with OntoWiki
http://fintrans.publicdata.eu

Page 165: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Domain-specific visualization:
http://energy.publicdata.eu

Page 166: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Visualization of statistical data (Data Cube vocab.)
http://scoreboard.lod2.eu

Page 167: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Page 168: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Page 169: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Page 170: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Page 171: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Visual Query Builder

Page 172: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Relationship Finder in CPL

Page 173: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Distributed Social Semantic Networking

Page 174: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Distributed Social Semantic Networking
Social networks are walled gardens
• Take users' data out of their hands
• Predefined privacy & data security regulations
• Infrastructure of a single provider (lock-in)
• Facebook (600M+ users) = Web inside the Web
• Interoperability is limited to proprietary APIs
Social networks should be open and evolving
• Allow users to control what to enter & keep control over their data
• Users should be able to host the data on infrastructure which is under their direct control, the same way as they host their own website (TBL)
We need a truly Distributed Social Semantic Network (DSSN)
• Initial approaches appeared with GNU social and more recently Diaspora
• A DSSN should be based on semantic resource descriptions and de-referenceability so as to ensure versatility, reusability and openness in order to accommodate unforeseen usage scenarios
• A number of standards and best practices for social Semantic Web applications, such as FOAF, WebID and Semantic Pingback, have emerged

Page 175: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

DSSN Architecture
(1) Resources announce services and feeds, feeds announce services – in particular a push service.
(2) Applications initiate ping requests to spin the Linked Data network.
(3) Applications subscribe to feeds on push services and receive instant notifications on updates.
(4) Update services are able to modify resources and feeds (e.g. on request of an application).
(5) Personal and global search services index social network resources and are used by applications.
(6) Access to resources & services can be delegated to applications by a WebID, i.e. an application can act in the name of the WebID owner.
(7) The majority of all access operations are executed through standard web requests.
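Step (3) of the architecture, subscribing to a feed on a push service and being notified on updates, can be modelled in a few lines. This is an in-memory sketch only: the class `PushService`, its methods and the example feed URIs are hypothetical, and a real deployment would deliver notifications over standard web requests rather than local callbacks.

```python
from collections import defaultdict

class PushService:
    """Toy push service: keeps subscriber callbacks per feed URI."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, feed_uri, callback):
        self._subscribers[feed_uri].append(callback)

    def publish(self, feed_uri, update):
        """Notify every subscriber of the feed; return how many were notified."""
        for callback in self._subscribers[feed_uri]:
            callback(feed_uri, update)
        return len(self._subscribers[feed_uri])

inbox = []
hub = PushService()
hub.subscribe("http://example.org/alice/feed", lambda feed, upd: inbox.append((feed, upd)))
delivered = hub.publish("http://example.org/alice/feed", "new status")
print(delivered, inbox)
```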

Page 176: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

DSSN Mobile Client
• Open-source, MVC architecture
• Platform independent, based on HTML5, CSS, JavaScript
• jQuery, jQuery Mobile, jQuery UI
• rdfQuery – simple triple store in JavaScript
• PhoneGap (Apache Device ready) native apps for iOS, Android, Blackberry OS, WebOS, Symbian, Bada
• http://aksw.org/Projects/MobileSocialSemanticWeb

Page 177: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Page 178: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

DSSN Mobile Browsing

Page 179: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

DSSN Mobile Editing

Page 180: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

EU-FP7 LOD2 Project Overview . Page 182

LOD Lifecycle supported by Debian based LOD2 Stack (released next week)
• Interlinking/ Fusing
• Classification/ Enrichment
• Quality Analysis
• Evolution / Repair
• Search/ Browsing/ Exploration
• Extraction
• Storage/ Querying
• Manual revision/ authoring

Page 181: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

First release of the LOD2 Stack: stack.lod2.eu & demo.lod2.eu/lod2demo

Page 182: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Page 183: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

AKSW Team

Page 184: Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

Thanks for your attention!

Sören Auer
http://www.uni-leipzig.de/~auer/ | http://aksw.org | http://[email protected]