Over the past 4 years, the Semantic Web activity has gained momentum with the widespread publishing of structured data as RDF. The Linked Data paradigm has therefore evolved from a practical research idea into a very promising candidate for addressing one of the biggest challenges of computer science: the exploitation of the Web as a platform for data and information integration. To translate this initial success into a world-scale reality, a number of research challenges need to be addressed: the performance gap between relational and RDF data management has to be closed, coherence and quality of data published on the Web have to be improved, provenance and trust on the Linked Data Web must be established, and generally the entrance barrier for data publishers and users has to be lowered. This tutorial will discuss approaches for tackling these challenges. As an example of a successful Linked Data project we will present DBpedia, which leverages Wikipedia by extracting structured information and by making this information freely accessible on the Web. The tutorial will also outline some recent advances in DBpedia, such as the mappings Wiki, DBpedia Live as well as the recently launched DBpedia benchmark.
DBpedia and the Emerging Web of Linked Data
Sören Auer
Creating Knowledge out of Interlinked Data
Sören Auer – SBBD: DBpedia and the Emerging Web of Linked Data 5.10.2011 Page 2
http://lod2.eu
• 2000 Mathematics and Computer Science studies in Hagen, Dresden and Yekaterinburg (Екатеринбург)
• Managing director of adVIS GmbH – SME focused on Web-Application and Content Management technology
• IT consultant for various companies (T-Mobile AG, RDL Corp., Science Computing AG)
• 2006 doctorate in Information Systems / Computer Science at Universität Leipzig
• 2006-2008 post-doctoral researcher at the DB Group at University of Pennsylvania (USA)
• Head of AKSW research group – DBpedia, OntoWiki, LinkedGeoData, Triplify
• Research interests: Information Systems, Database and Web Technologies, Semantic Web and Knowledge Engineering, Adaptive Methodologies, HCI, E-Science, Digital Libraries
• Coordinator of the EU FP7 IP Project “LOD2 – Creating Knowledge out of Interlinked Data”
• Work as expert for W3C, EU FP6/FP7/CIP, University City Keystone Innovation Zone, Swiss National Science Foundation
Dr. Sören Auer
1. The Vision & Big Picture
2. Linked Data 101
3. The Linked Data Life-cycle
Agenda
1. Reasoning does not scale on the Web
• IR / one dimensional indexing scales (Google)
• Next step: conjunctive querying (OWL-QL?, dynamic scale-out / clustering)
• Web-scalable DL reasoning is out of sight (maybe for fragments; fuzzy reasoning has some chances)
2. Even if it scaled, it would not be affordable
• “What is the only former Yugoslav republic in the European Union?”
• 2880 POWER7 cores, 16 Terabytes memory, 4 Terabytes clustered storage (IBM Watson) still cannot answer this question
Why the Semantic Web won't work (soon)
Problem: Try to search for these things on the current Web:
• Apartments near German-Russian bilingual childcare in Berlin.
• ERP service providers with offices in Vienna and London.
• Researchers working on multimedia topics in Eastern Europe.
Information is available on the Web, but opaque to current search.
Why do we need the Data Web?
[Figure: berlin.de (has everything about childcare in Berlin) and Immobilienscout.de (knows all about real estate offers in Germany), each a DB behind a Web server, publishing HTML and RDF that a search engine consumes]
Solution: complement text on Web pages with structured linked open data & intelligently combine/integrate such structured information from different sources:
From the Document Web to the Semantic Data Web
Web (since 1992)
• HTTP
• HTML/CSS/JavaScript
Semantic Web (Vision 1998, starting ???)
• Reasoning
• Logic, Rules
• Trust
Social Web (since 2003)
• Folksonomies/Tagging
• Reputation, sharing
• Groups, relationships
Data Web (since 2006)
• URI de-referencability
• Web Data integration
• RDF serializations
Web 1.0 Web 2.0 Web 3.0
Many Web sites containing unstructured, textual content
Few large Web sites are specialized in specific content types
Many Web sites containing & semantically syndicating arbitrarily structured content
[Figure: The Long Tail of Information Domains. Popularity plotted over content types: a few currently supported structured content types (pictures, video, encyclopedic articles, news, recipes, calendar) form the head; content types that are not or insufficiently supported (gene sequences, the itinerary of King George, talent management, requirements engineering, special-interest communities) form the long tail, targeted by SemWeb-supported structured content. Adapted from The Long Tail by Chris Anderson (Wired, Oct. 2004) to information domains.]
Linked Data in a Nutshell
1. Uses RDF Data Model
[Figure: RDF graph in which SBC organizes SBBD2011, SBBD2011 starts 3.10.2011, and SBBD2011 takesPlaceIn Florianopolis]
2. Is serialised in triples:
SBC organizes SBBD2011
SBBD2011 starts "2011-10-03"^^xsd:date
SBBD2011 takesPlaceIn Florianopolis
3. Uses Content-negotiation
The emerging Web of Data
[Figure: snapshots of the Web of Data from 2007 to 2009, surrounded by tools supporting the life-cycle steps create, enrich, classify, interlink, fuse and repair: Virtuoso, SemMF, SILK, PoolParty, DL-Learner, Sindice, Sigma, ORE, OntoWiki, MonetDB, DXX Engine, WiQA]
Conceptual Level: Data Access and Integration
Data Models
• RDBMS: organize data in relations, rows, cells (Oracle, DB2, MS-SQL)
• Entity-attribute-value (EAV): HELP medical record system, TrialDB
• Column-oriented DBMS: collocate column values rather than row values (Vertica, C-Store, MonetDB)
• Triple/Quad Stores: RDF data model (Virtuoso, Oracle, Sesame)
• Others: XML, hierarchical, tree, graph-oriented DBMS
Data Access
• Procedural APIs: ODBC, JDBC
• Object-relational mappings (ORM): NeXT's EOF / WebObjects, ADO.NET Entity Framework, Hibernate
• Query Languages: Datalog, SQL; SPARQL; XPATH/XQuery
• Linked Data: de-referencable URIs, RDF serialization formats
Data Integration
• Data Warehousing: based on extract, transform, load (ETL); Global-As-View (GAV)
• Enterprise Information Integration: sets of heterogeneous data sources appear as a single, homogeneous data source
• Research: Mediators, Ontology-based, P2P, Web service-based
• Data Web: URIs as entity identifiers, HTTP as data access protocol, Local-As-View (LAV)
1. The Vision & Big Picture
2. Linked Data 101
(based on Michael Hausenblas' slides)
3. The Linked Data Life-cycle
Agenda
Orientation
Linked Data 101
Linked Data provides a standardised API for:
• Data and metadata discovery
• Data integration
• Distributed query
Linked Data principles
1. Use URIs to identify the “things” in your data
2. Use http:// URIs so people (and machines) can look them up on the web
3. When a URI is looked up, return a description of the thing (in RDF format)
4. Include links to related things
http://www.w3.org/DesignIssues/LinkedData.html
Linked Data principles
• They are principles, not implementation advice
• Not humans or machines, but humans and machines!
• Content negotiation (e.g. HTML and RDF/XML)
• HTML + RDFa
• Metcalfe's Law: http://en.wikipedia.org/wiki/Metcalfe%27s_law
Linked Data example
HTTP URIs
A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource. [RFC3986]
Syntax: URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
Example:
   foo://example.com:8042/over/there?name=ferret#nose
   \_/   \______________/\_________/ \_________/ \__/
    |           |            |            |        |
 scheme     authority       path        query   fragment
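Python's standard library performs this generic split. A minimal sketch applying `urllib.parse.urlsplit` to the RFC 3986 example above (Python calls the authority component `netloc`):

```python
from urllib.parse import urlsplit

# Split the RFC 3986 example URI into its generic components.
parts = urlsplit("foo://example.com:8042/over/there?name=ferret#nose")

print(parts.scheme)    # foo
print(parts.netloc)    # example.com:8042  (the authority)
print(parts.path)      # /over/there
print(parts.query)     # name=ferret
print(parts.fragment)  # nose

# The authority can be broken down further into host and port.
print(parts.hostname, parts.port)  # example.com 8042
```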
HTTP URIs
• URI references: an RDF URI reference is a Unicode string that does not contain any control characters (#x00 - #x1F, #x7F - #x9F) and would produce a valid URI character sequence representing an absolute URI when subjected to UTF-8 encoding along with %-escaping of non-US-ASCII octets.
• Qualified Names (QNames): XML's way to allow namespaced elements/attributes, as in QName = Prefix ':' LocalPart
• Compact URIs (CURIEs): a generic, abbreviated syntax for expressing URIs
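Expanding a QName or CURIE essentially concatenates the namespace bound to the prefix with the local part. A minimal sketch, assuming the prefix bindings used in the Turtle examples of this tutorial; `expand` and `compact` are hypothetical helpers, and neither checks the extra XML-name restrictions that a QName's LocalPart must satisfy:

```python
from typing import Dict, Optional

# Namespace prefixes, as declared in the Turtle examples of this tutorial.
PREFIXES: Dict[str, str] = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dbp": "http://dbpedia.org/resource/",
    "dbpp": "http://dbpedia.org/property/",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
}


def expand(curie: str, prefixes: Dict[str, str] = PREFIXES) -> str:
    """Expand 'prefix:reference' into a full URI by concatenation."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + reference


def compact(uri: str, prefixes: Dict[str, str] = PREFIXES) -> str:
    """Abbreviate a URI with the longest matching namespace, if any."""
    best: Optional[str] = None
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace) and (
            best is None or len(namespace) > len(prefixes[best])
        ):
            best = prefix
    if best is None:
        return uri  # no known namespace: keep the full URI
    return f"{best}:{uri[len(prefixes[best]):]}"


print(expand("dbpp:hasMayor"))  # http://dbpedia.org/property/hasMayor
print(compact("http://www.w3.org/2003/01/geo/wgs84_pos#lat"))  # geo:lat
```

Choosing the longest matching namespace in `compact` matters when one namespace happens to be a prefix of another.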
HTTP
The Hypertext Transfer Protocol (HTTP) is an application-level protocol for distributed, collaborative, hypermedia information systems. It is a generic, stateless, protocol which can be used for many tasks beyond its use for hypertext, such as name servers and distributed object management systems, through extension of its request methods, error codes and headers. A feature of HTTP is the typing and negotiation of data representation, allowing systems to be built independently of the data being transferred.
HTTP
HTTP messages consist of requests from client to server and responses from server to client
A predefined set of methods: GET, POST, PUT, DELETE, HEAD (OPTIONS)
HTTP
Status codes:
• Informational 1xx: provisional response (100 Continue)
• Successful 2xx: request successfully received, understood, and accepted (201 Created)
• Redirection 3xx: further action needs to be taken by the user agent to fulfill the request (301 Moved Permanently)
• Client Error 4xx: client erred (405 Method Not Allowed)
• Server Error 5xx: server encountered an unexpected condition (501 Not Implemented)
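The class of a status code is given by its first digit. A small sketch using Python's `http.HTTPStatus` enum for the reason phrases; `describe` is a hypothetical helper:

```python
from http import HTTPStatus

# The five status-code classes, keyed by the first digit of the code.
CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(code: int) -> str:
    """Return the reason phrase and class of a status code."""
    cls = CLASSES.get(code // 100)
    if cls is None:
        raise ValueError(f"{code} is outside the 1xx to 5xx range")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:  # a code that is not in Python's registry
        phrase = "Unknown"
    return f"{code} {phrase} ({cls})"


for code in (100, 201, 301, 303, 405, 501):
    print(describe(code))
```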
HTTP
Request:
GET /html/rfc2616 HTTP/1.1
Host: tools.ietf.org
User-Agent: Mozilla/5.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Response:
HTTP/1.1 200 OK
Date: Thu, 05 Mar 2009 08:17:33 GMT
Server: Apache/2.2.11
Content-Location: rfc2616.html
Last-Modified: Tue, 20 Jan 2009 09:16:04 GMT
Content-Type: text/html; charset=UTF-8
HTTP
Content negotiation: selecting the representation for a given response when multiple representations are available
Three types of content negotiation: server-driven, agent-driven, and transparent
Example:
curl -I -H "Accept: application/rdf+xml" http://dbpedia.org/resource/Galway
HTTP/1.1 303 See Other
Content-Type: application/rdf+xml
Location: http://dbpedia.org/data/Galway.rdf
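A minimal sketch of the server-driven variant that produces a 303 redirect like the DBpedia example. The `/page/` target for HTML and all function names are assumptions for illustration, and the Accept parser is simplified: it ranks by q-value only and ignores the rule that more specific media ranges take precedence over wildcards.

```python
from typing import Dict, List, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media range, q) pairs, highest q first."""
    ranges = []
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        media, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if media:
            ranges.append((media, q))
    # sorted() is stable, so ranges with equal q keep their header order
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def matches(media_range: str, media_type: str) -> bool:
    """True if a media range such as */* or text/* covers media_type."""
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.startswith(media_range[:-1])
    return media_range == media_type


# Hypothetical redirect targets, modelled on DBpedia's /data and /page layout.
TARGETS: Dict[str, str] = {
    "application/rdf+xml": "http://dbpedia.org/data/{name}.rdf",
    "text/html": "http://dbpedia.org/page/{name}",
}


def negotiate(resource_uri: str, accept: str) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers): 303 to the best representation, else 406."""
    name = resource_uri.rsplit("/", 1)[-1]
    for media_range, q in parse_accept(accept):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        for media_type, template in TARGETS.items():
            if matches(media_range, media_type):
                return 303, {"Location": template.format(name=name)}
    return 406, {}


print(negotiate("http://dbpedia.org/resource/Galway", "application/rdf+xml"))
```

With the browser's Accept header from the earlier request example, the same resource would be redirected to the HTML page instead.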
HTTP
Caching (see the Cache-Control header field) is essential for scalability: http://webofdata.wordpress.com/2009/11/23/linked-open-data-http-caching/
HTTPbis IETF WG, chaired by Mark Nottingham, mainly about: patches, clarifications, deprecating unused features, documenting security properties
REST - HTTP
Representational State Transfer (REST) data elements, with modern Web examples:
• resource: the intended conceptual target of a hypertext reference
• resource identifier: URL, URN
• representation: HTML document, JPEG image
• representation metadata: media type, last-modified time
• resource metadata: source link, alternates, vary
• control data: if-modified-since, cache-control
http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm
http://webofdata.wordpress.com/2009/10/09/linked-data-for-restafarians/
Web's Standard Retrieval Algorithm
1. parse the URI and find the HTTP protocol
2. look up the DNS name to determine the associated IP address
3. open a TCP stream to port 80 at the IP address determined above
4. format an HTTP GET request for the resource and send it to the server
5. read the response from the server
6. from the status code (200), determine that a representation of the resource is available
7. inspect the returned Content-Type
8. pass the entity-body to its HTML rendering engine
http://www.w3.org/2001/tag/doc/selfDescribingDocuments
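Steps 1 to 7 can be sketched with Python's standard `socket` and `urllib.parse` modules. The helper names are hypothetical, only plain `http://` URIs are handled, and step 8 (rendering) is left out; `socket.create_connection` covers the DNS lookup and TCP connection of steps 2 and 3.

```python
import socket
from typing import Dict, Tuple
from urllib.parse import urlsplit


def format_get(uri: str, accept: str = "text/html") -> bytes:
    """Steps 1 and 4: parse an http:// URI and format an HTTP/1.1 GET request."""
    parts = urlsplit(uri)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles http:// URIs")
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [
        f"GET {target} HTTP/1.1",
        f"Host: {parts.netloc}",
        f"Accept: {accept}",
        "Connection: close",  # lets step 5 read until the server closes
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response_head(raw: bytes) -> Tuple[int, Dict[str, str]]:
    """Steps 6 and 7: extract the status code and the header fields."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    status_line, *header_lines = head.split("\r\n")
    code = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return code, headers


def fetch(uri: str, accept: str = "text/html") -> bytes:
    """Steps 2, 3 and 5: resolve the host, connect over TCP, read the reply."""
    parts = urlsplit(uri)
    with socket.create_connection((parts.hostname, parts.port or 80)) as sock:
        sock.sendall(format_get(uri, accept))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

`parse_response_head(fetch(uri))` performs a live request and returns the status code (step 6) and the header fields, including the Content-Type inspected in step 7.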
RDF
• A data model: a directed, labeled graph
• Triple: (subject predicate object)
– subject … URIref or bNode
– predicate … URIref
– object … URIref, bNode, or literal
RDF Triple
• Inspired by linguistic categories
• Allowed usage:
Subject : URI or blank node
Predicate: URI (also called properties)
Object : URI, blank node, or literal
Example: Burkhard Jung (subject) isMayorOf (predicate) Leipzig (object)
Example RDF Graph
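[Figure: example RDF graph with the triples Leipzig hasAreaCode 0341; Leipzig hasMayor Burkhard Jung; Leipzig locatedIn Saxony; Leipzig latitude 51.3333; Leipzig longitude 12.3833; Saxony locatedIn Germany; Burkhard Jung isMayorOf Leipzig; Burkhard Jung isMemberOf Social Democratic Party; Burkhard Jung born 1958-03-07]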
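A graph is nothing more than a set of triples, so a plain Python set is enough for a first sketch. The edges below are one reading of the example graph; plain strings stand in for URIs and literals, and `match` and `located_in` are hypothetical helpers showing triple-pattern matching and following links from node to node:

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# One reading of the example graph's edges; plain strings stand in for
# URIs and literals to keep the sketch short.
GRAPH: Set[Triple] = {
    ("Leipzig", "hasAreaCode", "0341"),
    ("Leipzig", "hasMayor", "Burkhard Jung"),
    ("Leipzig", "locatedIn", "Saxony"),
    ("Leipzig", "latitude", "51.3333"),
    ("Leipzig", "longitude", "12.3833"),
    ("Saxony", "locatedIn", "Germany"),
    ("Burkhard Jung", "isMayorOf", "Leipzig"),
    ("Burkhard Jung", "isMemberOf", "Social Democratic Party"),
    ("Burkhard Jung", "born", "1958-03-07"),
}


def match(graph: Set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def located_in(graph: Set[Triple], place: str) -> List[str]:
    """Follow locatedIn edges transitively (Leipzig -> Saxony -> Germany)."""
    found: List[str] = []
    frontier = [place]
    while frontier:
        current = frontier.pop()
        for _, _, parent in match(graph, s=current, p="locatedIn"):
            if parent not in found:
                found.append(parent)
                frontier.append(parent)
    return found


print(match(GRAPH, s="Burkhard Jung"))
print(located_in(GRAPH, "Leipzig"))  # ['Saxony', 'Germany']
```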
Literals
• Representation of data values
• Serialization as strings
• Interpretation based on the datatype
• Literals without a datatype are treated as strings
[Figure: the literal values in the example graph: Leipzig latitude 51.3333 and longitude 12.3833; Burkhard Jung born 1958-03-07; Burkhard Jung isMayorOf Leipzig, Leipzig hasMayor Burkhard Jung]
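Interpreting a literal means mapping its lexical form to a value according to its datatype. A minimal sketch covering a handful of XSD datatypes; the converter table and `literal_value` are assumptions for illustration and far from complete:

```python
from datetime import date
from typing import Callable, Dict, Optional

XSD = "http://www.w3.org/2001/XMLSchema#"

# Lexical-to-value mappings for a few XSD datatypes (far from exhaustive).
CONVERTERS: Dict[str, Callable[[str], object]] = {
    XSD + "string": str,
    XSD + "integer": int,
    XSD + "int": int,
    XSD + "float": float,
    XSD + "double": float,
    XSD + "date": date.fromisoformat,
    XSD + "boolean": lambda s: s.strip() in ("true", "1"),
}


def literal_value(lexical: str, datatype: Optional[str] = None) -> object:
    """Interpret a literal's lexical form according to its datatype URI."""
    if datatype is None:
        return lexical  # literals without a datatype are treated as strings
    try:
        convert = CONVERTERS[datatype]
    except KeyError:
        raise ValueError(f"no converter for datatype {datatype}") from None
    return convert(lexical)


print(literal_value("51.3333", XSD + "float"))    # 51.3333
print(literal_value("1958-03-07", XSD + "date"))  # 1958-03-07
print(literal_value("0341"))                      # 0341
print(literal_value("0341", XSD + "integer"))     # 341
```

The last two calls show why the datatype matters: read as xsd:integer, the area code 0341 from the example graph would lose its leading zero.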
RDF Serialization
N3: "Notation 3" - extensive formalism
N-Triples: part of N3
Turtle: Extension of N-Triples (shortcuts)
Source: http://www.w3.org/DesignIssues/Notation3.html
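N-Triples is the simplest of these serializations: one triple per line, full URIs in angle brackets, each triple terminated by a dot. A minimal serializer sketch, assuming hypothetical `URI` and `Literal` types and leaving out blank nodes and the escaping of non-ASCII characters:

```python
from typing import Iterable, NamedTuple, Optional, Tuple, Union


class URI(NamedTuple):
    value: str


class Literal(NamedTuple):
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


Term = Union[URI, Literal]


def term(t: Term) -> str:
    """Serialize one term: <uri>, "lexical"@lang or "lexical"^^<datatype>."""
    if isinstance(t, URI):
        return f"<{t.value}>"
    if isinstance(t, Literal):
        # escape backslashes first, then quotes and line breaks
        escaped = (t.lexical.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
        if t.lang:
            return f'"{escaped}"@{t.lang}'
        if t.datatype:
            return f'"{escaped}"^^<{t.datatype}>'
        return f'"{escaped}"'
    raise TypeError(f"unsupported term {t!r}")


def ntriples(triples: Iterable[Tuple[Term, Term, Term]]) -> str:
    """One triple per line, terms separated by spaces, terminated by ' .'"""
    return "".join(f"{term(s)} {term(p)} {term(o)} .\n" for s, p, o in triples)


LEIPZIG = URI("http://dbpedia.org/resource/Leipzig")
print(ntriples([
    (LEIPZIG, URI("http://www.w3.org/2000/01/rdf-schema#label"),
     Literal("Leipzig", lang="de")),
    (LEIPZIG, URI("http://www.w3.org/2003/01/geo/wgs84_pos#lat"),
     Literal("51.333332", datatype="http://www.w3.org/2001/XMLSchema#float")),
]), end="")
```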
Turtle Syntax
• URIs in angle brackets
• Literals in quotes
• Triples terminated by a dot
• Whitespace is ignored
Turtle Syntax: Shortcuts
<http://dbpedia.org/resource/Leipzig> <http://dbpedia.org/property/hasMayor> <http://dbpedia.org/resource/Burkhard_Jung> ;
    <http://www.w3.org/2000/01/rdf-schema#label> "Leipzig"@de ;
    <http://www.w3.org/2003/01/geo/wgs84_pos#lat> "51.333332"^^<http://www.w3.org/2001/XMLSchema#float> ;
    <http://www.w3.org/2003/01/geo/wgs84_pos#lon> "12.383333"^^<http://www.w3.org/2001/XMLSchema#float> .
Shortcuts for namespace prefixes:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dbp: <http://dbpedia.org/resource/> .
@prefix dbpp: <http://dbpedia.org/property/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
dbp:Leipzig dbpp:hasMayor dbp:Burkhard_Jung .
dbp:Leipzig rdfs:label "Leipzig"@de .
dbp:Leipzig geo:lat "51.333332"^^xsd:float .
dbp:Leipzig geo:lon "12.383333"^^xsd:float .
Turtle Syntax: Shortcuts
Group triples with the same subject using ";" instead of ".":
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dbp: <http://dbpedia.org/resource/> .
@prefix dbpp: <http://dbpedia.org/property/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
dbp:Leipzig dbpp:hasMayor dbp:Burkhard_Jung ;
    rdfs:label "Leipzig"@de ;
    geo:lat "51.333332"^^xsd:float ;
    geo:lon "12.383333"^^xsd:float .
Group triples with the same subject and predicate using ",":
@prefix dbp: <http://dbpedia.org/resource/> .
@prefix dbpp: <http://dbpedia.org/property/> .
dbp:Leipzig dbpp:locatedIn dbp:Saxony, dbp:Germany ;
    dbpp:hasMayor dbp:Burkhard_Jung .
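The two Turtle shortcuts amount to grouping: first by subject (pairs separated by ";"), then by subject and predicate (objects separated by ","). A sketch, assuming terms are already written as prefixed names and leaving out the @prefix declarations; `to_turtle` is a hypothetical helper:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def to_turtle(triples: Iterable[Tuple[str, str, str]]) -> str:
    """Serialize triples using Turtle's ';' and ',' abbreviations.

    Terms are assumed to be written already as prefixed names or <URIs>.
    """
    # subject -> predicate -> objects, preserving first-seen order
    grouped: Dict[str, Dict[str, List[str]]] = defaultdict(dict)
    for s, p, o in triples:
        objects = grouped[s].setdefault(p, [])
        if o not in objects:
            objects.append(o)
    blocks = []
    for s, predicates in grouped.items():
        # ',' separates objects sharing a subject and predicate
        pairs = [f"{p} {', '.join(objs)}" for p, objs in predicates.items()]
        # ';' separates predicate-object pairs sharing a subject
        blocks.append(f"{s} " + " ;\n    ".join(pairs) + " .")
    return "\n".join(blocks)


print(to_turtle([
    ("dbp:Leipzig", "dbpp:locatedIn", "dbp:Saxony"),
    ("dbp:Leipzig", "dbpp:locatedIn", "dbp:Germany"),
    ("dbp:Leipzig", "dbpp:hasMayor", "dbp:Burkhard_Jung"),
]))
```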
XML Syntax of RDF
• Turtle is intuitively readable and machine processable
• but: better tool support and programming libraries exist for XML
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:dbpp="http://dbpedia.org/property/"
    xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
  <rdf:Description rdf:about="http://dbpedia.org/resource/Leipzig">
    <dbpp:hasMayor rdf:resource="http://dbpedia.org/resource/Burkhard_Jung" />
    <rdfs:label xml:lang="de">Leipzig</rdfs:label>
    <geo:lat rdf:datatype="http://www.w3.org/2001/XMLSchema#float">51.3333</geo:lat>
    <geo:lon rdf:datatype="http://www.w3.org/2001/XMLSchema#float">12.3833</geo:lon>
  </rdf:Description>
</rdf:RDF>
RDF/JSON
• JSON = JavaScript Object Notation
• Compact format for data exchange between applications
• JSON documents are valid JavaScript
• Programming-language independent, since parsers exist for all popular programming languages
• Less overhead when parsing and serialising than XML
{ "S" : { "P" : [ O ] } }
• Subject: URI, BNode
• Predicate: URI
• Object:
  Type: "uri", "literal" or "bnode"
  Value: data value
  Lang: language tag
  Datatype: URI of the datatype
JSON Example
{ "http://dbpedia.org/resource/Leipzig" : {
    "http://dbpedia.org/property/hasMayor" :
      [ { "type" : "uri", "value" : "http://dbpedia.org/resource/Burkhard_Jung" } ],
    "http://www.w3.org/2000/01/rdf-schema#label" :
      [ { "type" : "literal", "value" : "Leipzig", "lang" : "de" } ],
    "http://www.w3.org/2003/01/geo/wgs84_pos#lat" :
      [ { "type" : "literal", "value" : "51.3333", "datatype" : "http://www.w3.org/2001/XMLSchema#float" } ],
    "http://www.w3.org/2003/01/geo/wgs84_pos#lon" :
      [ { "type" : "literal", "value" : "12.3833", "datatype" : "http://www.w3.org/2001/XMLSchema#float" } ]
  }
}
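Building the { "S" : { "P" : [ O ] } } structure programmatically and letting a JSON library serialize it avoids hand-written syntax errors. A sketch with Python's `json` module; `rdf_json` is a hypothetical helper, and objects are passed in already shaped as RDF/JSON object dictionaries:

```python
import json
from typing import Dict, Iterable, List, Tuple

RdfJson = Dict[str, Dict[str, List[Dict[str, str]]]]

XSD_FLOAT = "http://www.w3.org/2001/XMLSchema#float"


def rdf_json(triples: Iterable[Tuple[str, str, Dict[str, str]]]) -> RdfJson:
    """Nest (subject, predicate, object) triples as { S : { P : [ O ] } }."""
    doc: RdfJson = {}
    for s, p, o in triples:
        doc.setdefault(s, {}).setdefault(p, []).append(o)
    return doc


doc = rdf_json([
    ("http://dbpedia.org/resource/Leipzig",
     "http://dbpedia.org/property/hasMayor",
     {"type": "uri", "value": "http://dbpedia.org/resource/Burkhard_Jung"}),
    ("http://dbpedia.org/resource/Leipzig",
     "http://www.w3.org/2003/01/geo/wgs84_pos#lat",
     {"type": "literal", "value": "51.3333", "datatype": XSD_FLOAT}),
])
# json.dumps takes care of quoting and commas
print(json.dumps(doc, indent=2))
```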
RDFa Syntax
• RDFa = Resource Description Framework in attributes
• Embedding RDF in XHTML
• UTF-8 and UTF-16, since it extends XML-based XHTML
• Due to embedding in HTML, more overhead than other serialisations
• Less readable
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html version="XHTML+RDFa 1.0" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:dbpp="http://dbpedia.org/property/"
    xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
  <head><title>Leipzig</title></head>
  <body about="http://dbpedia.org/resource/Leipzig">
    <h1 property="rdfs:label" xml:lang="de">Leipzig</h1>
    <p>Leipzig is a city in Germany. Leipzig's mayor is
      <a href="http://dbpedia.org/resource/Burkhard_Jung" rel="dbpp:hasMayor">Burkhard Jung</a>.
      It is located at latitude <span property="geo:lat" datatype="xsd:float">51.3333</span>
      and longitude <span property="geo:lon" datatype="xsd:float">12.3833</span>.</p>
  </body>
</html>
Vocabularies
• Schema layer of RDF
• Defines terms (classes and properties)
• Typically RDFS or OWL family
• Common vocabularies: Dublin Core, SKOS, FOAF, SIOC, vCard, DOAP, Core Organization Ontology, VoID
http://www.slideshare.net/prototypo/introduction-to-linked-data-rdf-vocabularies
Vocabularies: Friend-of-a-Friend (FOAF)
Defines classes and properties for representing information about people and their relationships
Soeren rdf:type foaf:Person .
Soeren currentProject http://OntoWiki.net .
Soeren foaf:homepage http://aksw.org/Soeren .
Soeren foaf:knows http://sembase.at/Tassilo .
Soeren foaf:sha1 09ac456515dee .
Vocabularies: Semantically-Interlinked Online Communities (SIOC)
Represents content from blogs, wikis, forums, mailing lists, chats etc.
Vocabularies: Simple Knowledge Organization System (SKOS)
Supports the use of thesauri, classification schemes, subject heading systems and taxonomies
Instance data
Instances are associated with one or several classes:
Boddingtons rdf:type Ale .
Grafentrunk rdf:type Bock .
Hoegaarden rdf:type White .
Jever rdf:type Pilsner .
The Linked Open Data cloud
[Figure: the Linked Open Data cloud in 2007, 2008, 2009 and 2010]
Linked Open Data cloud
http://lod-cloud.net/
Media
Government
Geo
Publications
User-generated
Life sciences
Cross-domain
LOD cloud stats
triples distribution
links distribution
http://lod-cloud.net/state/
TimBL’s 5-star plan for open data
★ Make your data available on the Web under an open license
★★ Make it available as structured data (Excel sheet instead of image scan of a table)
★★★ Use a non-proprietary format (CSV file instead of an Excel sheet)
★★★★ Use Linked Data format (URIs to identify things, RDF to represent data)
★★★★★ Link your data to other people’s data to provide context
More: http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/
Why go for the 5th star?
Central Contractor Registration (CCR)
Geonames
http://webofdata.wordpress.com/2011/05/22/why-we-link/
Effort distribution
[Figure: a fixed overall data integration effort, split into the publisher's effort, the consumer's effort, and third-party effort]
Datasets
A dataset is a set of RDF triples that are published, maintained or aggregated by a single provider
Linksets
• An RDF link is an RDF triple whose subject and object are described in different datasets
• A linkset is a collection of such RDF links between two datasets
Describing Datasets - VoID
• General dataset metadata
• Access metadata
• Structural metadata
• Describing linksets
• Deployment and discovery of VoID files
http://www.w3.org/TR/void/
General dataset metadata
• Dataset homepage
• Publisher
• Title and description
• Categorisation
• Licensing
• Technical features
General dataset metadata
:DBpedia a void:Dataset ;
    dcterms:title "DBpedia" ;
    dcterms:description "RDF data extracted from Wikipedia" ;
    dcterms:contributor :FU_Berlin ;
    dcterms:contributor :Uni_Leipzig ;
    dcterms:contributor :Openlink ;
    dcterms:source <http://dbpedia.org/resource/Wikipedia> ;
    void:feature <http://www.w3.org/ns/formats/RDF_XML> ;
    dcterms:modified "2008-11-17"^^xsd:date .
:Geonames a void:Dataset ;
    dcterms:subject <http://dbpedia.org/resource/Location> .
:GeoSpecies a void:Dataset ;
    dcterms:license <http://creativecommons.org/licenses/by-sa/3.0/us/> .
Access metadata
• SPARQL endpoints
• RDF data dumps
• Root resources
• URI lookup endpoints
• OpenSearch description documents
Access metadata
:exampleDS a void:Dataset ;
    void:sparqlEndpoint <http://example.org/sparql> ;
    void:dataDump <http://example.org/dump1.rdf> ;
    void:uriLookupEndpoint <http://api.example.org/search?qt=term> .
Structural metadata
Provides high-level information about the schema and internal structure of a dataset and can be helpful when exploring or querying datasets:
• Example resources
• Patterns for resource URIs
• Vocabularies
• Dataset partitions
• Statistics
Structural metadata
:DBpedia a void:Dataset ;
    void:exampleResource <http://dbpedia.org/resource/Berlin> .
:LiveJournal a void:Dataset ;
    void:vocabulary <http://xmlns.com/foaf/0.1/> .
:DBpedia a void:Dataset ;
    void:classPartition [
        void:class foaf:Person ;
        void:entities 312000 ;
    ] ;
    void:propertyPartition [
        void:property foaf:name ;
        void:triples 312000 ;
    ] .
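Most of these statistics can be computed from the triples themselves. A sketch, assuming the dataset fits in memory as (subject, predicate, object) string tuples; the dictionary keys mirror VoID property names, but `void_statistics` and the toy dataset are hypothetical:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple, Union

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"

Stats = Dict[str, Union[int, Dict[str, int]]]


def void_statistics(triples: Iterable[Tuple[str, str, str]]) -> Stats:
    """Compute VoID-style counts for a dataset given as (s, p, o) triples."""
    unique = set(triples)  # an RDF graph is a set: duplicates count once
    # class partition: number of distinct subjects typed with each class
    class_partition = Counter(o for _, p, o in unique if p == RDF_TYPE)
    # property partition: number of triples using each property
    property_partition = Counter(p for _, p, _ in unique)
    return {
        "void:triples": len(unique),
        "void:distinctSubjects": len({s for s, _, _ in unique}),
        "void:properties": len(property_partition),
        "void:classPartition": dict(class_partition),
        "void:propertyPartition": dict(property_partition),
    }


# Hypothetical toy dataset: two people, one duplicated triple.
DATA = [
    ("http://example.org/a", RDF_TYPE, FOAF + "Person"),
    ("http://example.org/a", FOAF + "name", "A"),
    ("http://example.org/b", RDF_TYPE, FOAF + "Person"),
    ("http://example.org/b", FOAF + "name", "B"),
    ("http://example.org/b", FOAF + "name", "B"),
]
print(void_statistics(DATA))
```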
Describing linksets
Describing linksets
:DBpedia a void:Dataset ; void:subset :DBpedia2Geonames .
:Geonames a void:Dataset .
:DBpedia2Geonames a void:Linkset ; void:target :DBpedia ; void:target :Geonames ; void:linkPredicate owl:sameAs .
Deployment and discovery
• Choosing URIs for datasets
• Publishing a VoID file alongside a dataset: Turtle, RDFa
• SPARQL Service Description Vocabulary: http://www.w3.org/TR/sparql11-service-description/
• Discovery via a well-known URI, based on RFC 5785 and registered with IANA: http://www.example.com/.well-known/void
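Locating a dataset's VoID file via the well-known URI only needs the scheme and authority of any URI from that dataset. A sketch with Python's `urllib.parse`; `well_known_void` and the resource path in the example are hypothetical:

```python
from urllib.parse import urlsplit, urlunsplit


def well_known_void(uri: str) -> str:
    """Return the /.well-known/void location on the host that serves `uri`."""
    parts = urlsplit(uri)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URI: {uri!r}")
    # keep scheme and authority, replace the path, drop query and fragment
    return urlunsplit((parts.scheme, parts.netloc, "/.well-known/void", "", ""))


print(well_known_void("http://www.example.com/data/item42#this"))
# http://www.example.com/.well-known/void
```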
Consumption - Essentials
Linked Data provides a global data space with a uniform API (due to RDF as the data model)
Access methods:
• Dereference URIs via HTTP GET (RDF/XML, RDFa, etc.)
• SPARQL ('the SQL of RDF')
• Data dumps (RDF/XML, etc.)
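Both HTTP-based access methods can be prepared with Python's `urllib.request`. A sketch: the helper names are hypothetical, the query reuses the tutorial's illustrative dbpp:hasMayor property (which need not exist in the live DBpedia data), and requesting SPARQL JSON results assumes the endpoint supports that format.

```python
from urllib.parse import urlencode
from urllib.request import Request


def dereference_request(uri: str, accept: str = "application/rdf+xml") -> Request:
    """HTTP GET for a Linked Data URI, asking for an RDF serialization."""
    return Request(uri, headers={"Accept": accept})


def sparql_request(endpoint: str, query: str) -> Request:
    """SPARQL protocol query via GET: the query goes in the `query` parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


QUERY = """PREFIX dbpp: <http://dbpedia.org/property/>
SELECT ?mayor WHERE { <http://dbpedia.org/resource/Leipzig> dbpp:hasMayor ?mayor }"""

deref = dereference_request("http://dbpedia.org/resource/Galway")
query = sparql_request("http://dbpedia.org/sparql", QUERY)
print(deref.get_header("Accept"))  # application/rdf+xml
print(query.full_url)
# Passing either object to urllib.request.urlopen() performs the network call;
# urlopen follows a 303 See Other redirect automatically.
```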
Consumption - Technologies
• Linked Data access mechanisms are supported on all major platforms and in all major languages (HTTP interface & RDF parsing), such as Java, Python, PHP, C/C++/.NET, etc.
• Command line tools (curl, rapper, etc.)
• Online tools
– http://redbot.org/ (HTTP/low-level)
– http://sindice.com/developers/inspector (RDF/data-level)
• Structured query: SPARQL (more later)
Consumption - Technologies
• A distributed setup needs a central point of access (indexer, aggregator)
• Sindice, an index of the Web of Data: http://sindice.com/
• Sig.ma, a Web of Data aggregator & browser: http://sig.ma/
• Relationship discovery: http://relfinder.semanticweb.org/
Technologies – FYN
http://dbpedia.org/resource/Galway
Technologies – Sig.ma
http://sig.ma/search?q=Galway
Sig.ma is a Web of Data platform enabling entity visualisation and consolidation both for humans and machines (API)
Technologies – sameas.org
Sameas.org is a service to find co-references on the Web of Data
http://sameas.org/html?q=Galway
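Since owl:sameAs is symmetric and transitive, co-references form equivalence classes. A service such as sameas.org may work quite differently internally; this sketch only shows the underlying idea with a small union-find over hypothetical example.org links, and `coreference_sets` is a hypothetical helper:

```python
from typing import Dict, Iterable, List, Set, Tuple


def coreference_sets(same_as: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group URIs into the equivalence classes implied by owl:sameAs links.

    owl:sameAs is symmetric and transitive, so the classes are the connected
    components of the link graph, computed here with a small union-find.
    """
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same_as:
        parent[find(a)] = find(b)  # merge the two classes

    groups: Dict[str, Set[str]] = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return sorted(groups.values(), key=sorted)


# Hypothetical links; only the DBpedia URI is taken from the tutorial.
LINKS = [
    ("http://dbpedia.org/resource/Galway", "http://example.org/geo/galway"),
    ("http://example.org/geo/galway", "http://example.net/places/galway"),
    ("http://example.org/a", "http://example.org/b"),
]
for group in coreference_sets(LINKS):
    print(sorted(group))
```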
• All Linked Data datasets share a uniform data model, the RDF statement data model
• Information is represented in facts expressed as (subject, predicate, object) triples
• Components: globally unique IRI/URI entity identifiers & typed data values (literals) as objects
Linked Data Benefits: Uniformity
• URIs not just used for identifying entities, but also (as URLs) for locating and retrieving resources that describe these entities on the Web
Linked Data Benefits: De-referencability
• Triples containing URIs from different namespaces as subject and object establish a link between the entity identified by the subject and the entity identified by the object (typed RDF links)
Linked Data Benefits: Coherence
[Figure: Berlin isCapitalOf Germany and Germany isMemberOf European Union, with the triples drawn from knowledge base 1 and knowledge base 2]
• The RDF data model is based on a single mechanism for representing information (triples), which makes it very easy to attain a syntactic and simple semantic integration of different Linked Data sets.
• Higher-level semantic integration can be achieved by employing schema and instance matching techniques and expressing the found matches again as additional triple facts
Linked Data Benefits: Integrability
• Publishing and updating Linked Data is relatively simple, thus facilitating timely availability
• Once a Linked Data source is updated, it is straightforward to access and use the updated data source (time-consuming and error-prone extraction, transformation and loading are not required)
Linked Data Benefits: Timeliness
1. The Vision & Big Picture
2. Linked Data 101
3. The Linked Data Life-cycle
Agenda
Achievements
1. Extension of the Web with a data commons (25B facts)
2. Vibrant, global RTD community
3. Industrial uptake begins (e.g. BBC, Thomson Reuters, Eli Lilly)
4. Emerging governmental adoption in sight
5. Establishing Linked Data as a deployment path for the Semantic Web.
What works now? What has to be done?
Challenges
1. Coherence: Relatively few, expensively maintained links
2. Quality: partly low quality data and inconsistencies
3. Performance: Still substantial penalties compared to relational
4. Data consumption: large-scale processing, schema mapping and data fusion still in its infancy
5. Usability: Establishing direct end-user tools and network effect
• Web: a global, distributed platform for data, information and knowledge integration
• Exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF
[Figure: LOD cloud snapshots from July 2007, April 2008, September 2008 and July 2009]
Linked Data Lifecycle
[Figure: the Linked Data lifecycle with the stages Extraction; Storage/Querying; Manual revision/authoring; Interlinking/Fusing; Classification/Enrichment; Quality Analysis; Evolution/Repair; Search/Browsing/Exploration]
Extraction
From unstructured sources
• NLP, text mining, annotation
From semi-structured sources
• DBpedia, LinkedGeoData, SCOVO/DataCube
From structured sources
• RDB2RDF
Extraction
Extract structured information from Wikipedia and make this information available on the Web as LOD:
• ask sophisticated queries against Wikipedia (e.g. universities in Brandenburg, mayors of elevated towns, soccer players),
• link other data sets on the Web to Wikipedia data
• represents a community consensus
The recently launched DBpedia Live transforms Wikipedia into a structured knowledge base.
Transforming Wikipedia into a Knowledge Base
Structure in Wikipedia
• Title
• Abstract
• Infoboxes
• Geo-coordinates
• Categories
• Images
• Links
  – other language versions
  – other Wikipedia pages
  – to the Web
  – redirects
  – disambiguations
Infobox templates
Wikitext syntax:
{{Infobox Korean settlement
| title = Busan Metropolitan City
| img = Busan.jpg
| imgcaption = A view of the [[Geumjeong]] district in Busan
| hangul = 부산 광역시
...
| area_km2 = 763.46
| pop = 3635389
| popyear = 2006
| mayor = Hur Nam-sik
| divs = 15 wards (Gu), 1 county (Gun)
| region = [[Yeongnam]]
| dialect = [[Gyeongsang]]
}}
RDF representation (http://dbpedia.org/resource/Busan):
dbp:Busan dbpp:title "Busan Metropolitan City" .
dbp:Busan dbpp:hangul "부산 광역시"@Hang .
dbp:Busan dbpp:area_km2 "763.46"^^xsd:float .
dbp:Busan dbpp:pop "3635389"^^xsd:int .
dbp:Busan dbpp:region dbp:Yeongnam .
dbp:Busan dbpp:dialect dbp:Gyeongsang .
...
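The generic infobox extraction illustrated by the Busan example can be sketched in a few lines of Python. This is a simplified illustration, not code from the DBpedia extraction framework: it assumes one `| key = value` parameter per line with the closing `}}` on its own line, ignores nested templates, and applies hypothetical typing rules (whole number → `xsd:int`, decimal → `xsd:float`, a lone wiki link → resource). The `dbp:`/`dbpp:` prefixes mirror the example.

```python
import re

# Prefixes mirror the slide's abbreviations (dbp: resources, dbpp: properties).
RESOURCE = "dbp:"
PROPERTY = "dbpp:"

# A value consisting of a single wiki link such as [[Yeongnam]] or [[Target|label]].
LINK = re.compile(r"^\[\[([^\]|]+)(?:\|[^\]]*)?\]\]$")


def parse_infobox(wikitext):
    """Return (template_name, {key: value}) for a simple, one-parameter-per-line infobox."""
    lines = [line.strip() for line in wikitext.strip().splitlines()]
    name = lines[0].lstrip("{").strip()
    params = {}
    for line in lines[1:]:
        # Parameter lines look like "| key = value"; other lines (e.g. "}}") are skipped.
        if line.startswith("|") and "=" in line:
            key, value = line[1:].split("=", 1)
            params[key.strip()] = value.strip()
    return name, params


def to_rdf_object(value):
    """Turn an infobox value into an RDF term using simple, assumed typing heuristics."""
    link = LINK.match(value)
    if link:
        return RESOURCE + link.group(1).strip().replace(" ", "_")
    if re.fullmatch(r"-?\d+", value):
        return f'"{value}"^^xsd:int'
    if re.fullmatch(r"-?\d+\.\d+", value):
        return f'"{value}"^^xsd:float'
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def infobox_to_triples(subject, wikitext):
    """Generic infobox extraction: one triple per non-empty infobox parameter."""
    _, params = parse_infobox(wikitext)
    return [
        (RESOURCE + subject, PROPERTY + key, to_rdf_object(value))
        for key, value in params.items()
        if value
    ]
```

Applied to the Busan template, `infobox_to_triples("Busan", text)` yields triples such as `("dbp:Busan", "dbpp:pop", '"3635389"^^xsd:int')`. The mapping-based extraction mentioned later replaces such guessed types and property names with ontology-backed mappings.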
A vast multi-lingual, multi-domain knowledge base
DBpedia extraction results in:
• descriptions of ca. 3.4 million things (1.5 million classified in a consistent ontology, including 312,000 persons, 413,000 places, 94,000 music albums, 49,000 films, 15,000 video games, 140,000 organizations, 146,000 species and 4,600 diseases)
• labels and abstracts for these 3.2 million things in up to 92 different languages; 1,460,000 links to images and 5,543,000 links to external web pages; 4,887,000 external links into other RDF datasets, 565,000 Wikipedia categories, and 75,000 YAGO categories
• altogether over 1 billion pieces of information (i.e. RDF triples): 257M from the English edition, 766M from other language editions
• DBpedia Live (http://live.dbpedia.org/sparql/) and the Mappings Wiki (http://mappings.dbpedia.org) integrate the community into a refinement cycle
• Upcoming: DBpedia inline
2011/05/12 CONSEGI - Sören Auer: DBpedia 84
DBpedia Architecture
• Page collections (inputs): Wikipedia dumps, Wikipedia OAI-PMH, Database Wikipedia, Live Wikipedia
• Live components: update stream, article queue
• Extraction Manager, running extraction jobs
• Extractors: Generic Infobox, Label, Geo, Redirect, Disambiguation, Image, Abstract, Pagelink, Category, Mapping-based Infobox (using the Ontology Mappings)
• Parsers: DateTime, Units, String-List, Numbers, Geo
• Destinations: N-Triple serializer (N-Triple dumps), SPARQL-Update destination
• Storage and access: OpenLink Virtuoso triple stores with SPARQL endpoints and Linked Data interfaces, used on the Web by RDF browsers, HTML browsers, SPARQL clients and DBpedia apps
Hierarchies
DBpedia Ontology Schema:
manually created for DBpedia (infoboxes); 275 classes + 1,335 properties; 20 million triples
YAGO:
large hierarchy linking Wikipedia leaf categories to WordNet; 250,000 classes
UMBEL (Upper Mapping and Binding Exchange Layer):
20,000 classes derived from OpenCyc
Wikipedia Categories:
not a class hierarchy (e.g. contains cycles), represented using SKOS; 415,000+ categories
DBpedia SPARQL Endpoint
http://dbpedia.org/sparql, hosted on an OpenLink Virtuoso server, can answer SPARQL queries like:
• Give me all sitcoms that are set in NYC?
• All tennis players from Moscow?
• All films by Quentin Tarantino?
• All German musicians that were born in Berlin in the 19th century?
• All soccer players with shirt number 11 who play for a club having a stadium with over 40,000 seats and who were born in a country with over 10 million inhabitants?
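DBpedia SPARQL Endpoint
# Prefixes added so the query parses; dbp: (resources) and dbpp: (properties) follow the abbreviations used above
PREFIX dbp:  <http://dbpedia.org/resource/>
PREFIX dbpp: <http://dbpedia.org/property/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?name ?birth ?description ?person WHERE {
  ?person dbpp:birthPlace dbp:Berlin .
  ?person skos:subject <http://dbpedia.org/resource/Category:German_musicians> .
  ?person dbpp:birth ?birth .
  ?person foaf:name ?name .
  ?person rdfs:comment ?description .
  FILTER (LANG(?description) = 'en')
} ORDER BY ?name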
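Clients send such queries to the endpoint over plain HTTP. A minimal Python sketch of the SPARQL 1.1 Protocol "query via GET" form (the `query` URL parameter plus an `Accept` header requesting SPARQL JSON results); it only builds the request and flattens a parsed result document, and does not assume that the 2011-era vocabulary above is still served by the endpoint today.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint URL taken from the slide.
DBPEDIA_ENDPOINT = "http://dbpedia.org/sparql"


def build_sparql_request(query, endpoint=DBPEDIA_ENDPOINT):
    """Build a SPARQL 1.1 Protocol query-via-GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def bindings(results_json):
    """Flatten a parsed SPARQL JSON result document into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in results_json["results"]["bindings"]
    ]
```

To execute a query, pass the request to `urllib.request.urlopen` and hand the `json.load`-ed response to `bindings`; variables that are unbound in a row (e.g. from an `OPTIONAL`) are simply absent from that row's dict.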
DBpedia Applications
DBpedia Mobile: location-aware mobile client for DBpedia; uses the current location and DBpedia to display a map; can navigate into other knowledge bases
DBpedia Query Builder: user front end for building queries
DBpedia Relationship Finder: finds relations between two objects in DBpedia
DBpedia Applications
DBpedia Applications: RelFinder
http://www.visualdataweb.org/relfinder.php
DBpedia Applications: Zemanta
DBpedia Applications: Faceted-Browser
DBpedia Applications (3rd party)
Muddy Boots (BBC): Annotate actors in BBC News with DBpedia identifiers
Open Calais (Reuters): named entity recognition; entities are connected via owl:sameAs to DBpedia, Freebase, Geonames
Faviki: social bookmarking tool; uses DBpedia in the backend to group tags etc. and for multi-language support
TopBraid Composer: ontology editor that links entities to DBpedia based on their labels
Universität Leipzig ▪ Agile Knowledge Engineering and Semantic Web (AKSW)
Authors: Sören Auer, Jens Lehmann, Slide 94
LinkedGeoData
Conversion, interlinking and publishing of OpenStreetMap.org* data sets as RDF.
* ”Wikipedia for geographic data”
Motivation
● Ease information integration tasks that require spatial knowledge, such as
● Offerings of bakeries next door
● Map of distributed branches of a company
● Historical sights along a bicycle track
● Therefore, use RDF/OWL in order to overcome structural and semantic heterogeneity.
● Requires a vocabulary, which we try to establish.
● The LOD cloud contains data sets with spatial features
● e.g. Geonames, DBpedia, US census, EuroStat
● But: they are restricted to popular or large entities like countries, famous places etc.
● Therefore they lack buildings, roads, mailboxes, etc.
OpenStreetMap - Datamodel
● Basic entities are:
● Nodes: latitude, longitude
● Ways: sequence of nodes
● Relations: associations between any number of nodes, ways and relations
● Each entity may be described with tags (= key-value pairs)
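The three OSM entity types can be modelled as plain data classes. This Python sketch is our own illustration of the data model described above, not code from OSM or LinkedGeoData; the class and field names are assumptions, and giving relation members a type, id and role follows OSM's usual model.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A point: latitude/longitude plus free-form key-value tags."""
    id: int
    lat: float
    lon: float
    tags: dict = field(default_factory=dict)


@dataclass
class Way:
    """An ordered sequence of node ids (e.g. a road or a building outline)."""
    id: int
    node_ids: list
    tags: dict = field(default_factory=dict)

    def is_closed(self):
        # A way whose first and last node coincide forms a closed ring (e.g. an area outline).
        return len(self.node_ids) > 2 and self.node_ids[0] == self.node_ids[-1]


@dataclass
class Member:
    """A relation member: its type ("node", "way" or "relation"), its id and its role."""
    type: str
    ref: int
    role: str = ""


@dataclass
class Relation:
    """An association between any number of nodes, ways and other relations."""
    id: int
    members: list
    tags: dict = field(default_factory=dict)
```

For instance, the university building from the example below would be `Node(259212302, 51.3369334, 12.385401, {"amenity": "university", "addr:city": "Leipzig"})`; everything beyond coordinates and node order lives in the tag dictionaries, which is what the RDF mapping has to interpret.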
Example: Leipzig's zoo
Data/Mapping Example
node_id   | k                | v
----------+------------------+-----------------------------------------------
259212302 | name             | Universität Leipzig, Mathematik und Informatik
259212302 | amenity          | university
259212302 | addr:street      | Johannisgasse
259212302 | addr:postcode    | 04103
259212302 | addr:housenumber | 26
259212302 | addr:city        | Leipzig
lgd:node259212302 a lgdo:University ;
    rdfs:label "Universität Leipzig, Mathematik und Informatik" ;
    lgdo:hasCity "Leipzig" ;
    lgdo:hasHouseNumber "26" ;
    lgdo:hasPostalCode "04103" ;
    lgdo:hasStreet "Johannisgasse" ;
    georss:point "51.3369334 12.385401" ;
    geo:lat 51.3369334 ;
    geo:long 12.385401 .
Mapping Types
● Three mapping types
● Text
  – (5, name, Leipzig) → lgd:node5 rdfs:label "Leipzig"
  – (5, name:de, Leipzig) → lgd:node5 rdfs:label "Leipzig"@de
● Datatypes
  – (6, seats, 4) → lgd:node6 lgdo:seats "4"^^xsd:integer
● Classes/Object Properties
  – (7, place, city) → lgdn:7 a lgdo:City
  – (7, religion, pastafarian) → lgdn:7 lgdo:religion lgdo:Pastafarian
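The three mapping types can be sketched as a single dispatch over (node id, key, value) tags. This Python sketch is not LinkedGeoData's converter: the key sets below are a hypothetical configuration, subjects are written uniformly as `lgd:node<id>` (the slide also uses the `lgdn:` abbreviation), `rdf:type` stands for Turtle's `a`, and literal values are not escaped, for brevity.

```python
import re

# Hypothetical mapping configuration; LinkedGeoData's real rules are far more extensive.
INTEGER_KEYS = {"seats"}
CLASS_KEYS = {"place", "amenity"}
OBJECT_KEYS = {"religion"}


def camel(value):
    """'place_of_worship' -> 'PlaceOfWorship'"""
    return "".join(part.capitalize() for part in value.split("_"))


def map_tag(node_id, key, value):
    """Map one OSM tag (node_id, key, value) to a single triple."""
    s = f"lgd:node{node_id}"
    # 1. Text: name / name:<lang> become (language-tagged) labels.
    if key == "name":
        return (s, "rdfs:label", f'"{value}"')
    if key.startswith("name:"):
        return (s, "rdfs:label", f'"{value}"@{key[len("name:"):]}')
    # 2. Datatypes: numeric values of known keys become typed literals.
    if key in INTEGER_KEYS and re.fullmatch(r"\d+", value):
        return (s, f"lgdo:{key}", f'"{value}"^^xsd:integer')
    # 3a. Classes: e.g. place=city -> rdf:type lgdo:City.
    if key in CLASS_KEYS:
        return (s, "rdf:type", "lgdo:" + camel(value))
    # 3b. Object properties: the value becomes a resource.
    if key in OBJECT_KEYS:
        return (s, f"lgdo:{key}", "lgdo:" + camel(value))
    # Fallback (assumption): keep the value as a plain literal.
    return (s, f"lgdo:{key}", f'"{value}"')
```

Keeping the key sets as data rather than code mirrors the idea that the mapping is configuration: adding a new class-producing key such as `shop` only means extending `CLASS_KEYS`.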
Access
● REST interface (based on a PostGIS DB, full OSM dataset loaded, > 1 billion triples)
● Supports limited queries (e.g. circular/rectangular area, filtering by labels)
● SPARQL endpoints (based on Virtuoso DB, subset of the OSM dataset, ~222M triples)
● Static (http://linkedgeodata.org/sparql)
● Live (http://live.linkedgeodata.org/sparql)
● Downloads (http://downloads.linkedgeodata.org)
● Monthly updates of the above datasets envisioned
LinkedGeoData Live
● OpenStreetMap provides full dumps and minutely changesets for download
● Changesets are numbered, e.g. "001/234/567.osc.gz"
● We also convert the changesets to sets of added and removed triples (relative to our store) and publish them
● 001/234/567.added.nt.gz
● 001/234/567.removed.nt.gz
● Advantage: Other users could easily sync their RDF store with LinkedGeoData
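A consumer keeping a local copy in sync could proceed as in this Python sketch. The AAA/BBB/CCC naming follows the example file names above; everything else is an assumption: the local mirror directory is hypothetical, triples are compared as N-Triples strings (assuming a canonical serialization and no blank nodes), and removals are applied before additions. A real deployment would issue SPARQL Update requests or use a store API instead of a Python set.

```python
import gzip


def changeset_path(sequence_number):
    """Map a sequence number to its path stem, e.g. 1234567 -> '001/234/567'."""
    digits = f"{sequence_number:09d}"
    return f"{digits[0:3]}/{digits[3:6]}/{digits[6:9]}"


def parse_ntriples(lines):
    """Collect non-empty, non-comment N-Triples lines as strings (no real RDF parsing)."""
    return {line.strip() for line in lines if line.strip() and not line.lstrip().startswith("#")}


def apply_changeset(store, added_lines, removed_lines):
    """Delete the removed triples, then insert the added ones (order is an assumption)."""
    store -= parse_ntriples(removed_lines)
    store |= parse_ntriples(added_lines)
    return store


def load_changeset(base_dir, sequence_number):
    """Read <base_dir>/AAA/BBB/CCC.added.nt.gz and .removed.nt.gz from a hypothetical local mirror."""
    stem = f"{base_dir}/{changeset_path(sequence_number)}"
    with gzip.open(stem + ".added.nt.gz", "rt", encoding="utf-8") as added:
        added_lines = added.readlines()
    with gzip.open(stem + ".removed.nt.gz", "rt", encoding="utf-8") as removed:
        removed_lines = removed.readlines()
    return added_lines, removed_lines
```

Because each changeset is expressed relative to the publisher's store, consumers must apply them strictly in sequence-number order, starting from the dump their copy was loaded from.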
DBpedia Mapping – Step By Step
Given a DBpedia point, query LGD points within a type-specific maximum distance
Basic idea (performed with Silk):
● Compute spatial score
● Compute name similarity (rdfs:label)
● Combine both scores
● Depending on the final score, either automatically accept/reject links or mark them for manual verification.
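The scoring steps can be sketched outside Silk as follows. This Python sketch is not a Silk link specification: the haversine distance, the linear decay of the spatial score up to the type-specific maximum distance, `difflib`'s ratio as name similarity, the equal weights and the 0.9/0.6 accept/reject thresholds are all illustrative assumptions.

```python
import math
from difflib import SequenceMatcher

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def spatial_score(distance_m, max_distance_m):
    """1.0 at the same spot, falling linearly to 0.0 at the type-specific maximum distance."""
    return max(0.0, 1.0 - distance_m / max_distance_m)


def name_score(a, b):
    """Case-insensitive label similarity in [0, 1] (difflib ratio as a stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def classify_link(dbp, lgd, max_distance_m, w_spatial=0.5, accept=0.9, reject=0.6):
    """Combine both scores and decide: 'accept', 'reject' or 'review' (manual verification)."""
    d = haversine_m(dbp["lat"], dbp["lon"], lgd["lat"], lgd["lon"])
    if d > max_distance_m:
        return "reject", 0.0
    score = (w_spatial * spatial_score(d, max_distance_m)
             + (1 - w_spatial) * name_score(dbp["label"], lgd["label"]))
    if score >= accept:
        return "accept", score
    if score < reject:
        return "reject", score
    return "review", score
```

A DBpedia "Leipzig" and an LGD node "Leipzig Hbf" at the same coordinates score about 0.89 under these settings, so they land in the manual-verification band rather than being linked automatically.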
Statistics (2011-Feb-23)
● 222,539,712 triples
● 6,666,865 ways
● 5,882,306 nodes
● Among them:
● 352,673 PlaceOfWorship
● 60,573 RailwayStation
● 59,468 Recycling
● 50,955 Town
● 30,099 Toilet
● 7,222 City
Conclusion
● OpenStreetMap
● immensely successful project for collaboratively creating free spatial data
● Community uses key value structures, which provide a rich source of information
● Key strength: broad coverage
● LGD Contributions
● Established mapping to DBpedia
● Geonames mapping partially done (37 different entity types: cities, churches, ...)
● Facet-based LGD Browser provides an interface for OSM/LGD, which highlights its structural aspects
● Live sync
● Goal: Make LGD as useful (successful) as DBpedia for the geospatial domain
Many different approaches (D2R, Virtuoso RDF Views, Triplify, …)
No agreement on a formal semantics of RDB2RDF mapping
• LOD readiness, SPARQL-SQL translation
W3C RDB2RDF WG
Extraction: Relational Data
Tool               | Triplify                    | D2RQ           | Virtuoso RDF Views
Technology         | Scripting languages (PHP)   | Java           | Whole middleware solution
SPARQL endpoint    | -                           | X              | X
Mapping language   | SQL                         | RDF based      | RDF based
Mapping generation | Manual                      | Semi-automatic | Manual
Scalability        | Medium-high (but no SPARQL) | Medium         | High
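The tools in the table differ in how mappings are written. As a neutral point of reference, the Python sketch below produces a default, table-driven mapping loosely modelled on the idea of the W3C RDB2RDF Direct Mapping (row → subject IRI, column → property, table → class). It is not how Triplify, D2RQ or Virtuoso RDF Views work internally; the base IRI and example table are hypothetical, and IRI percent-encoding and foreign keys are ignored.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base IRI
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def literal(value):
    """Render a SQL value as an N-Triples literal (simplified datatype handling)."""
    if isinstance(value, int):
        return f'"{value}"^^<{XSD}integer>'
    if isinstance(value, float):
        return f'"{value}"^^<{XSD}double>'
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def direct_mapping(conn, table, key):
    """Map every row of `table` to triples: one subject per row, one property per column.

    `table` and `key` are interpolated into SQL and IRIs, so they must be trusted identifiers.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"<{BASE}{table}/{key}={record[key]}>"
        triples.append((subject, RDF_TYPE, f"<{BASE}{table}>"))
        for column, value in record.items():
            if value is not None:  # SQL NULLs produce no triple
                triples.append((subject, f"<{BASE}{table}#{column}>", literal(value)))
    return triples


# Usage with a small in-memory table (illustrative data).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE People (ID INTEGER PRIMARY KEY, fname TEXT, addr INTEGER)")
demo.execute("INSERT INTO People VALUES (7, 'Bob', NULL)")
demo_triples = direct_mapping(demo, "People", "ID")
```

Such a default mapping needs no configuration, which is why it is a useful baseline; the hand-written SQL or RDF-based mappings in the table exist precisely to replace its generic IRIs and properties with meaningful vocabulary.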
From unstructured sources
• Deploy existing NLP approaches (OpenCalais, Ontos API)
• Develop standardized, LOD-enabled interfaces between NLP tools (NLP2RDF)
From semi-structured sources
• Efficient bi-directional synchronization
From structured sources
• Declarative syntax and semantics of data model transformations (W3C WG RDB2RDF)
Orthogonal challenges
• Using LOD as background knowledge
• Provenance
Extraction Challenges
Storage and Querying
RDF data management is still 5-50 times slower than relational data management (BSBM, DBpedia Benchmark)
Performance increases steadily
Comprehensive, well-supported open-source and commercial implementations are available:
• OpenLink’s Virtuoso (os+commercial)
• Big OWLIM (commercial), Swift OWLIM (os)
• 4store (os)
• Talis (hosted)
• Bigdata (distributed)
• Allegrograph (commercial)
• Mulgara (os)
RDF Data Management
• Uses DBpedia as data and a selection of 25 frequently executed queries
• Can generate fractions and multiples of DBpedia's size
• Does not resemble relational data
Performance differences observed with other benchmarks are amplified
DBpedia Benchmark
[Chart: Geometric Mean]
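The benchmark results are aggregated as a geometric mean. Assuming per-query results such as queries per second, a geometric mean keeps a single very fast or very slow query from dominating the aggregate, as this small Python sketch shows (the numbers are illustrative, not benchmark results).

```python
import math


def geometric_mean(values):
    """n-th root of the product, computed via logarithms to avoid overflow."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive numbers")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def arithmetic_mean(values):
    return sum(values) / len(values)
```

For three queries answered at 1, 10 and 100 queries per second, the arithmetic mean is 37 while the geometric mean is 10; a tenfold speed-up on any single query raises the geometric mean by the same factor regardless of which query it is.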
• Reduce the performance gap between relational and RDF data management
• SPARQL query extensions
• Spatial/semantic/temporal data management
• More advanced query result caching
• View maintenance / adaptive reorganization based on common access patterns
• More realistic benchmarks
Storage and Querying Challenges
Authoring
1. Semantic (Text) Wikis
• Authoring of semantically annotated texts
2. Semantic Data Wikis
• Direct authoring of structured information (i.e. RDF, RDF-Schema, OWL)
Two Kinds of Semantic Wikis
Versatile domain-independent tool
Serves as Linked Data / SPARQL endpoint on the Data Web
Open-source project hosted at Google Code
Not just a Wiki UI, but a whole framework for the development of Semantic Web applications
Developed in PHP, based on the Zend framework
Very active developer and user community
More than 500 downloads monthly
Large number of use cases
OntoWiki – a semantic data wiki
OntoWiki: Dynamic views on knowledge bases
OntoWiki
RDF triples on resource details page
OntoWiki
Dynamic suggestions from the Data Web
Catalogus Professorum Lipsiensis
OntoWiki: Caucasian Spiders
RDFauthor in OntoWiki
Semantic Portal with OntoWiki: Vakantieland
RDFaCE: RDFa Content Editor
RDFaCE Architecture
Integrating various NLP APIs
© CC-BY-NC-ND by ~Dezz~ (residae on flickr)
Linking
Automatic
Semi-automatic
• SILK
• LIMES
Manual
• Sindice integration into UIs
• Semantic Pingback
LOD Linking
LIMES 0.3: Basic Idea
Uses the characteristics of metric spaces, especially consequences of the triangle inequality:
◦ $d(x, y) \le d(x, z) + d(z, y)$
◦ $d(x, z) - d(z, y) \le d(x, y) \le d(x, z) + d(z, y)$
Basic idea:
◦ Use pessimistic approximations of distances instead of computing them
◦ Only compute distances when needed
Overview
Knowledge sources → Computation of exemplars → Filtering → Similarity computation → Serialization
Computation of Exemplars
Assumption: number of exemplars is given
Goal: Segment target data set
NB: Distances from exemplars to all other points are known
Filtering
[Figure: points x, y (exemplar) and z]
1. Measure the distance from each x to each exemplar
2. Apply $d(x, y) - d(y, z) > t \Rightarrow d(x, z) > t$
Similarity Computation
If $d(x, y) - d(y, z) \le t$: compute $d(x, z)$
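The exemplar-based filtering described above can be sketched in Python. This is an illustration of the technique, not LIMES' implementation: it works with a distance threshold t on 2-D points (LIMES itself is configured with similarity thresholds), picks exemplars by farthest-first traversal (an assumed heuristic; the LIMES paper's exact procedure may differ), and uses $|T|^{1/2}$ exemplars as in the experiments. The triangle inequality guarantees that the filter never discards a true match, which the brute-force comparison confirms.

```python
import math
import random


def dist(a, b):
    """Euclidean distance; any metric satisfying the triangle inequality works."""
    return math.dist(a, b)


def compute_exemplars(targets, k):
    """Pick k exemplars by farthest-first traversal and assign each target to its closest one.

    Returns {exemplar_index: [(target_index, d(exemplar, target)), ...]}.
    """
    if not targets:
        return {}
    exemplars = [0]
    while len(exemplars) < min(k, len(targets)):
        candidates = [i for i in range(len(targets)) if i not in exemplars]
        nxt = max(candidates, key=lambda i: min(dist(targets[i], targets[e]) for e in exemplars))
        exemplars.append(nxt)
    clusters = {e: [] for e in exemplars}
    for ti, t in enumerate(targets):
        closest = min(exemplars, key=lambda e: dist(t, targets[e]))
        clusters[closest].append((ti, dist(t, targets[closest])))
    return clusters


def limes_match(sources, targets, threshold, k):
    """Return (matches, comparisons): pairs with d(s, t) <= threshold, and distance computations used."""
    clusters = compute_exemplars(targets, k)
    matches, comparisons = [], 0
    for si, s in enumerate(sources):
        for e, members in clusters.items():
            d_se = dist(s, targets[e])
            comparisons += 1
            for ti, d_et in members:
                # Triangle inequality: d(s, t) >= d(s, e) - d(e, t); skip if that bound exceeds t.
                if d_se - d_et > threshold:
                    continue
                d = dist(s, targets[ti])
                comparisons += 1
                if d <= threshold:
                    matches.append((si, ti, d))
    return matches, comparisons


def brute_force(sources, targets, threshold):
    return [(si, ti, dist(s, t)) for si, s in enumerate(sources)
            for ti, t in enumerate(targets) if dist(s, t) <= threshold]


# Usage on synthetic 2-D points (illustrative data only).
rng = random.Random(42)
SOURCES = [(rng.random(), rng.random()) for _ in range(200)]
TARGETS = [(rng.random(), rng.random()) for _ in range(200)]
found, n_cmp = limes_match(SOURCES, TARGETS, 0.05, k=int(len(TARGETS) ** 0.5))
```

The saving comes from `d_se` being computed once per source and exemplar, while the `d_et` values are precomputed when the target set is segmented, matching the note that distances from exemplars to all other points are known.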
Serialization
Results are returned as RDF, for example when mapping DBpedia and Drugbank:
@prefix drugbank: <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/> .
@prefix dbpedia: <http://dbpedia.org/ontology/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
dbpedia:Cefaclor owl:sameAs drugbank:DB00833 .
dbpedia:Clortermine owl:sameAs drugbank:DB01527 .
dbpedia:Prednicarbate owl:sameAs drugbank:DB01130 .
dbpedia:Linezolid owl:sameAs drugbank:DB00601 .
dbpedia:Valaciclovir owl:sameAs drugbank:DB00577 .
….
Experiments
Q1: What is the best number of exemplars?
Q2: What is the relation between the similarity threshold q and the total number of comparisons?
Q3: Does the assignment of S and T matter?
Q4: How does LIMES compare to SILK?
Q1 and Q2
Experiments on synthetic data
• Knowledge bases of sizes 2000, 3000, 5000, 7500 and 10000
• Varied number of exemplars
• Varied thresholds
• Experiments were repeated 5 times
• Average results are presented
Q1 and Q2
[Chart: LIMES with thresholds 0.75, 0.80, 0.85, 0.90 and 0.95 compared with brute force]
Q1 and Q2
Q1
◦ Best number of exemplars depends on q
◦ For q > 0.9, the best number lies around $|T|^{1/2}$
Q2
◦ As expected, the number of comparisons diminishes with growing q
Q3 (order of S and T)
Experiments on synthetic data
• Knowledge bases of sizes 1000, 2000, 3000, …, 10000
• Number of exemplars was $|T|^{1/2}$
• Experiments were repeated 5 times
• Average results are presented
Q3
T\S   | 1000 | 2000 | 3000 | 4000 | 5000 | 6000 | 7000 | 8000 | 9000 | 10000
------+------+------+------+------+------+------+------+------+------+------
1000  | 0.20 | 0.37 | 0.53 | 0.69 | 0.88 | 1.04 | 1.14 | 1.40 | 1.58 | 1.67
2000  | 0.36 | 0.64 | 0.88 | 1.24 | 1.37 | 1.63 | 1.97 | 2.25 | 2.50 | 2.70
3000  | 0.51 | 0.86 | 1.17 | 1.57 | 2.00 | 2.09 | 2.69 | 2.91 | 3.35 | 3.58
4000  | 0.70 | 1.11 | 1.59 | 2.00 | 2.45 | 2.88 | 3.10 | 3.61 | 3.94 | 4.50
5000  | 0.85 | 1.36 | 1.87 | 2.28 | 2.81 | 3.39 | 3.91 | 4.20 | 4.84 | 5.54
6000  | 1.02 | 1.60 | 2.14 | 2.81 | 3.29 | 3.93 | 4.44 | 4.96 | 5.39 | 6.08
7000  | 1.22 | 1.86 | 2.58 | 3.15 | 3.66 | 4.35 | 5.11 | 5.69 | 6.44 | 6.62
8000  | 1.41 | 2.04 | 2.78 | 3.43 | 4.06 | 4.98 | 5.51 | 6.55 | 7.14 | 7.53
9000  | 1.63 | 2.36 | 2.99 | 3.85 | 4.72 | 5.44 | 6.25 | 6.88 | 7.59 | 8.20
10000 | 1.80 | 2.62 | 3.51 | 4.25 | 4.97 | 6.01 | 6.33 | 7.81 | 8.31 | 9.15
Green = S first is more time-efficient
Overall less than 5% difference
Q4 (comparison with SILK)
3 experiments on real data
◦ Drugs
◦ Diseases
◦ SimCities
Number of exemplars was $|T|^{1/2}$
Comparison of runtime with SILK
Experiments were repeated thrice
Best runtimes are presented
Q4
[Chart: runtime (0–100% scale) on Drugbank, SimCities and Diseases for LIMES at thresholds 0.95, 0.90, 0.85, 0.80 and 0.75 and for SILK]
Q4
We outperform SILK 2 by 1.5 orders of magnitude
The larger the data sources, the higher our speedup (64 for SimCities)
Update and notification services for LOD
Backward compatible with Pingback (blogosphere)
http://aksw.org/Projects/SemanticPingBack
Creating a network effect around Linking Data: Semantic Pingback
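Since Semantic Pingback stays compatible with classic Pingback, a client announces a new link with the standard Pingback 1.0 XML-RPC call `pingback.ping(sourceURI, targetURI)`. This Python sketch shows only that request; the RDF-specific behaviour of a Semantic Pingback server (fetching the source and extracting the triples that link to the target) is not shown, and the example.org URIs are placeholders.

```python
import xmlrpc.client


def pingback_request_body(source_uri, target_uri):
    """XML-RPC body of a Pingback 1.0 call: pingback.ping(sourceURI, targetURI)."""
    return xmlrpc.client.dumps((source_uri, target_uri), methodname="pingback.ping")


def send_pingback(server_url, source_uri, target_uri):
    """Send the ping to the server announced by the target (X-Pingback header or link rel="pingback")."""
    return xmlrpc.client.ServerProxy(server_url).pingback.ping(source_uri, target_uri)
```

Reusing an established protocol means existing blog software can already receive these notifications, which is what lets a Linked Data resource participate in the blogosphere's network effect.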
Visualizing Pingbacks in OntoWiki
Only 5% of the information on the Data Web is actually linked
• Make sense of work in the de-duplication/record linkage literature
• Consider the open-world nature of Linked Data
• Use LOD background knowledge
• Zero-configuration linking
• Explore active learning approaches, which integrate users in a feedback loop
• Maintain a 24/7 linking service: Linked Open Data Around-The-Clock project (LATC-project.eu)
Interlinking Challenges
Enrichment
Linked Data is mainly instance data and !!!
The ORE (Ontology Repair and Enrichment) tool allows users to improve an OWL ontology by fixing inconsistencies and making suggestions for adding further axioms.
• Ontology Debugging: uses OWL reasoning to detect inconsistencies and unsatisfiable classes and to detect the most likely sources of the problems. The user can create a repair plan while maintaining full control.
• Ontology Enrichment: uses the DL-Learner framework to suggest definitions and super classes for existing classes in the KB. Works if instance data is available; serves to harmonise schema and data.
http://aksw.org/Projects/ORE
Enrichment & Repair
Quality Analysis
Quality on the Data Web varies a lot:
• hand-crafted or expensively curated knowledge bases (e.g. DBLP, UMLS) vs. knowledge bases extracted from text or Web 2.0 sources (DBpedia)
Research challenge:
• Establish measures for assessing the authority, provenance and reliability of Data Web resources
Linked Data Quality Analysis
Evolution
© CC-BY-SA by alasis on flickr
• A unified method for both data evolution and ontology refactoring
• The modularized, declarative definition of evolution patterns is relatively simple compared to an imperative description of evolution
• Allows domain experts and knowledge engineers to amend the ontology structure and modify data with just a few clicks
• Combined with the RDF representation of evolution patterns and their exposure on the Linked Data Web, EvoPat facilitates the development of an evolution pattern ecosystem
• Patterns can be shared and reused on the Data Web
• The declarative definition of bad smells and corresponding evolution patterns promotes the (semi-)automatic improvement of information quality
EvoPat – Pattern-based KB Evolution
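To illustrate why a declarative pattern is simpler than imperative evolution code, the Python sketch below treats a pattern as pure data (a graph pattern to match plus delete/insert templates) applied by one generic engine. It is not EvoPat's pattern format or engine; the matcher handles only basic triple patterns, and the `ex:` vocabulary and the rename pattern are hypothetical.

```python
def is_var(term):
    return term.startswith("?")


def match(patterns, graph, binding=None):
    """Yield every variable binding under which all triple patterns occur in `graph`."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        for pattern_term, term in zip(first, triple):
            if is_var(pattern_term):
                if candidate.setdefault(pattern_term, term) != term:
                    break  # variable already bound to a different term
            elif pattern_term != term:
                break  # constant does not match
        else:
            yield from match(rest, graph, candidate)


def instantiate(templates, binding):
    return {tuple(binding.get(t, t) for t in template) for template in templates}


def apply_pattern(pattern, graph):
    """Match on the unchanged graph, then apply all deletions and insertions at once."""
    deleted, inserted = set(), set()
    for b in list(match(pattern["where"], graph)):
        deleted |= instantiate(pattern["delete"], b)
        inserted |= instantiate(pattern["insert"], b)
    return (graph - deleted) | inserted


# A hypothetical pattern: rename ex:zip to ex:postalCode, but only for cities.
RENAME_CITY_ZIP = {
    "where":  [("?c", "rdf:type", "ex:City"), ("?c", "ex:zip", "?v")],
    "delete": [("?c", "ex:zip", "?v")],
    "insert": [("?c", "ex:postalCode", "?v")],
}
```

Because the pattern is just data, it could itself be serialized and published, which is the property the slide relies on for sharing and reusing patterns on the Data Web.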
Evolution Patterns
Exploration
An ecosystem of LOD visualizations
LOD Exploration Widgets:
• Spatial faceted-browsing
• Faceted-browsing
• Statistical visualization
• Entity-/faceted-based browsing
• Domain-specific visualizations
• …
LOD Datasets Choreography Layer:
• Dataset analysis (size, vocabularies, property histograms etc.)
• Selection of suitable visualization widgets
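The choreography layer's two steps, dataset analysis and widget selection, can be sketched as a property histogram feeding simple rules. The rules, the 5% share cut-off and the always-available generic widget in this Python sketch are hypothetical choices for illustration, not the LOD2 implementation; `geo:lat`/`geo:long` and `qb:dataSet` are simply typical indicators of spatial and statistical (Data Cube) data.

```python
from collections import Counter

# Hypothetical rules: suggest a widget when all of its indicator properties are
# sufficiently frequent in the dataset's property histogram.
WIDGET_RULES = [
    ("Spatial faceted-browsing", {"geo:lat", "geo:long"}),
    ("Statistical visualization", {"qb:dataSet"}),
]
GENERIC_WIDGET = "Faceted-browsing"


def property_histogram(triples):
    """Count how often each property (predicate) occurs."""
    return Counter(p for _, p, _ in triples)


def suggest_widgets(triples, min_share=0.05):
    hist = property_histogram(triples)
    total = sum(hist.values())
    if total == 0:
        return []
    suggestions = [
        widget for widget, props in WIDGET_RULES
        if all(hist[p] / total >= min_share for p in props)
    ]
    suggestions.append(GENERIC_WIDGET)  # assumed to work for any dataset
    return suggestions
```

A LinkedGeoData-like sample with coordinates and labels would get the spatial browser plus generic faceted browsing, whereas a dataset of `qb:Observation`s would be routed to the statistical view.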
Faceted spatial-semantic browsing component
Pure JavaScript; requires only a SPARQL endpoint for data access; Cross-Origin Resource Sharing (CORS) enabled
Operates on local spatial regions; does not depend on global metadata about the data
Source code:
• https://github.com/AKSW/SpatialSemanticBrowsingWidgets
Online demo, LinkedGeoData Browser:
• http://browser.linkedgeodata.org
Next steps:
• Polygon/curve markers, domain-specific visualization templates, integration of other sources, mobile interface
Publication:
• Claus Stadler, Jens Lehmann, Konrad Höffner, Sören Auer: LinkedGeoData: A Core for a Web of Spatial Open Data. To appear in Semantic Web Journal, Special Issue on Linked Spatiotemporal Data and Geo-Ontologies.
Faceted spatial-semantic browsing - Availability
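Because the browsing component works on local spatial regions over a plain SPARQL endpoint, each map view boils down to a region query. This Python sketch builds such a bounding-box query, assuming resources carry WGS84 `geo:lat`/`geo:long` values as in the LinkedGeoData example earlier; it is not the query the widgets actually send, and boxes crossing the antimeridian are not handled.

```python
GEO_NS = "http://www.w3.org/2003/01/geo/wgs84_pos#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"


def bbox_query(min_lat, min_lon, max_lat, max_lon, limit=1000):
    """SPARQL query for resources whose geo:lat/geo:long lie inside a rectangular region."""
    if min_lat > max_lat or min_lon > max_lon:
        raise ValueError("invalid bounding box")
    # float()/int() ensure only numbers end up in the query text.
    return f"""PREFIX geo: <{GEO_NS}>
PREFIX rdfs: <{RDFS_NS}>
SELECT ?s ?label ?lat ?long WHERE {{
  ?s geo:lat ?lat ;
     geo:long ?long .
  OPTIONAL {{ ?s rdfs:label ?label }}
  FILTER (?lat >= {float(min_lat)} && ?lat <= {float(max_lat)} &&
          ?long >= {float(min_lon)} && ?long <= {float(max_lon)})
}}
LIMIT {int(limit)}"""
```

Since the query only depends on the visible rectangle, a client can issue it whenever the map is panned or zoomed, without first fetching any global statistics about the dataset.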
Generic entity-based exploration with OntoWiki: http://fintrans.publicdata.eu
Domain-specific visualization: http://energy.publicdata.eu
Visualization of statistical data (Data Cube vocabulary): http://scoreboard.lod2.eu
Visual Query Builder
Relationship Finder in CPL
Distributed Social Semantic Networking
Social networks are walled gardens:
• they take users' data out of their hands,
• predefined privacy & data security regulations,
• infrastructure of a single provider (lock-in),
• Facebook (600M+ users) = Web inside the Web,
• interoperability is limited to proprietary APIs.
Social networks should be open and evolving:
• allow users to control what to enter and keep control over their data,
• users should be able to host the data on infrastructure under their direct control, the same way as they host their own website (TBL).
We need a truly Distributed Social Semantic Network (DSSN):
• initial approaches appeared with GNU social and, more recently, Diaspora,
• a DSSN should be based on semantic resource descriptions and de-referenceability so as to ensure versatility, reusability and openness in order to accommodate unforeseen usage scenarios,
• a number of standards and best practices for social Semantic Web applications, such as FOAF, WebID and Semantic Pingback, have emerged.
Distributed Social Semantic Networking
(1) Resources announce services and feeds, feeds announce services – in particular a push service.
(2) Applications initiate ping requests to spin the Linked Data network
(3) Applications subscribe to feeds on push services and receive instant notifications on updates.
(4) Update services are able to modify resources and feeds (e.g. on request of an application)
(5) Personal and global search services index social network resources and are used by applications
(6) Access to resources & services can be delegated to applications by a WebID, i.e. an application can act on behalf of the WebID owner
(7) The majority of all access operations is executed through standard web requests.
DSSN Architecture
• Open-source, MVC architecture
• Platform-independent, based on HTML5, CSS, JavaScript
• jQuery, jQuery Mobile, jQuery UI
• rdfQuery: simple triple store in JavaScript
• PhoneGap (Apache Device ready) native apps for iOS, Android, BlackBerry OS, WebOS, Symbian, Bada
• http://aksw.org/Projects/MobileSocialSemanticWeb
DSSN Mobile Client
DSSN Mobile Browsing
DSSN Mobile Editing
EU-FP7 LOD2 Project Overview . Page 182
http://lod2.eu
LOD Lifecycle supported by the Debian-based LOD2 Stack (released next week):
• Extraction
• Storage/Querying
• Manual revision/authoring
• Interlinking/Fusing
• Classification/Enrichment
• Quality Analysis
• Evolution/Repair
• Search/Browsing/Exploration
First release of the LOD2 Stack: stack.lod2.eu & demo.lod2.eu/lod2demo
AKSW Team
Thanks for your attention!
Sören Auer | http://www.uni-leipzig.de/~auer/ | http://aksw.org