From Document Web to a Web of Linked Data Dr. Sören Auer AKSW, Institut für Informatik

Linked Data Tutorial


DESCRIPTION

This tutorial explains the Data Web vision, some preliminary standards and technologies, and some tools and technological building blocks developed by the AKSW research group at Universität Leipzig.


Page 1: Linked Data Tutorial

From Document Web to a Web of Linked Data

Dr. Sören Auer, AKSW, Institut für Informatik

Page 2: Linked Data Tutorial

Overview

1. The Linked Data Web Vision
2. Data Web Technologies
3. Publishing relational data on the Web
4. DBpedia – transforming Wikipedia into a knowledge base
5. OntoWiki – a Linked Data Wiki
6. OpenStreetMap – linked open geo data

Page 3: Linked Data Tutorial

From the Document Web to the Linked Open Data Web (and beyond)

Web (since 1992)
• HTTP
• HTML/CSS/JavaScript

Semantic Web (Vision 1998, starting ???)
• Reasoning
• Logic, Rules
• Trust

Social Web (since 2003)
• Folksonomies/Tagging
• Reputation, sharing
• Groups, relationships

Data Web (since 2006)
• URI dereferenceability
• CBD (Concise Bounded Description)
• RDF serializations

Page 4: Linked Data Tutorial

Data Access and Integration (Conceptual Level)

Data Models

RDBMS
• Organize data in relations, rows, cells
• Oracle, DB2, MS-SQL

Entity-attribute-value (EAV)
• HELP medical record system, TrialDB

Column-oriented DBMS
• Collocates column values rather than row values
• Vertica, C-Store, MonetDB

Triple/Quad Stores
• RDF data model
• Virtuoso, Oracle, Sesame

Others
• XML, hierarchical, tree, graph-oriented DBMS

Data Access

Procedural APIs
• ODBC
• JDBC

Query Languages
• Datalog, SQL
• SPARQL
• XPath/XQuery

Object-relational mappings (ORM)
• NeXT's EOF / WebObjects
• ADO.NET Entity Framework
• Hibernate

Linked Data
• De-referenceable URIs
• RDF serialization formats

Data Integration

Enterprise Information Integration
• Sets of heterogeneous data sources appear as a single, homogeneous data source

Data Warehousing
• Based on extract, transform, load (ETL)
• Global-As-View (GAV)

Data Web
• URIs as entity identifiers
• HTTP as data access protocol
• Local-As-View (LAV)

Research
• Mediators
• Ontology-based
• P2P
• Web service-based

Page 5: Linked Data Tutorial

Web 1.0: Many Web sites containing unstructured, textual content

Web 2.0: Few large Web sites specialized in specific content types (pictures + video + encyclopedic articles)

Web 3.0: Many Web sites containing and semantically syndicating arbitrarily structured content

Page 6: Linked Data Tutorial

The Long Tail of Information Domains

[Figure: long-tail curve with popularity on the vertical axis and information domains on the horizontal axis]
• Currently supported structured content types (head of the curve): pictures, news, video, recipes, calendar
• Not or insufficiently supported content types (long tail), the target of SemWeb-supported structured content: gene sequences, itinerary of King George, talent management, requirements engineering, special interest communities, …

The Long Tail by Chris Anderson (Wired, Oct. '04), adapted to information domains

Page 7: Linked Data Tutorial

Why Do We Need Another Web?

Try to search for these things on the current Web:
• Apartments near German-French bilingual childcare in Leipzig.
• ERP service providers with offices in Vienna and Berlin.
• Researchers working on DB-related topics in South-East Asia.

Information to answer such search queries is available on the Web, but opaque to current Web search.

The (Semantic) Data Web makes it possible to complement text on Web pages with structured data and to intelligently combine and integrate such structured information from different sources:
• Leipzig.de: has everything about childcare in Leipzig.
• Immobilienscout.de: knows all about real estate offers in Germany.

[Figure: each site runs a Web server on top of its DB and delivers HTML as well as RDF to search engines]

Page 8: Linked Data Tutorial

Overview

1. The Linked Data Web Vision
2. Data Web Technologies
3. Publishing relational data on the Web
4. DBpedia – transforming Wikipedia into a knowledge base
5. OntoWiki – a Linked Data Wiki
6. Virtuoso – Knowledge Store
7. OpenStreetMap – free and open geo data

Page 9: Linked Data Tutorial

RDF - Resource Description Framework

Distinguishes two fundamental base types:

Resources
• Complex abstract or concrete entities
• Uniquely identified by a URI:
  – http://DBpedia.org/resource/Vienna

Literals
• Concrete data values
• Optionally typed (e.g. xsd:string, xsd:dateTime etc.) or language-tagged (e.g. en, de):
  – "2008-05-31T09:30:00"^^xsd:dateTime
  – "Wien"@de
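The two base types can be sketched in a few lines of Python. This is only an illustration: the class names `Resource` and `Literal` and the `nt()` method are assumptions made for this sketch and do not come from the tutorial or from any RDF library. The output follows the N-Triples conventions (URIs in angle brackets, `^^` for datatypes, `@` for language tags).

```python
from dataclasses import dataclass
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class Resource:
    """An entity identified by a URI (hypothetical helper class)."""
    uri: str

    def nt(self) -> str:
        # N-Triples writes URIs in angle brackets.
        return f"<{self.uri}>"


@dataclass(frozen=True)
class Literal:
    """A concrete data value, optionally typed or language-tagged."""
    value: str
    datatype: Optional[str] = None  # full datatype URI, e.g. XSD + "dateTime"
    lang: Optional[str] = None      # language tag, e.g. "de"

    def nt(self) -> str:
        if self.datatype and self.lang:
            raise ValueError("use either a datatype or a language tag, not both")
        # Escape backslashes and quotes so the value stays a valid quoted string.
        text = self.value.replace("\\", "\\\\").replace('"', '\\"')
        if self.lang:
            return f'"{text}"@{self.lang}'
        if self.datatype:
            return f'"{text}"^^<{self.datatype}>'
        return f'"{text}"'


vienna = Resource("http://DBpedia.org/resource/Vienna")
name_de = Literal("Wien", lang="de")
timestamp = Literal("2008-05-31T09:30:00", datatype=XSD + "dateTime")
```

Calling `name_de.nt()` renders the language-tagged literal from the slide, and `timestamp.nt()` renders the typed one with the full datatype URI instead of the `xsd:` prefix.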

Page 10: Linked Data Tutorial

RDF Statement / Triple Paradigm

RDF/XML:

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/metadata/dublin_core#">
  <rdf:Description rdf:about="http://OntoWiki.net">
    <dc:Creator>Sören Auer</dc:Creator>
  </rdf:Description>
</rdf:RDF>

Triple:

Subject (Resource): http://OntoWiki.net
Predicate (Resource): dc:Creator
Object (Resource/Literal): "Sören Auer"

RDF/N3:

<http://OntoWiki.net> <http://purl.org/metadata/dublin_core#Creator> "Sören Auer" .

Page 11: Linked Data Tutorial

RDF Document / Model / Graph

– Simple Knowledge Base
– Combines multiple RDF Statements

[Figure: RDF graph]
http://OntoWiki.net --dc:Creator--> http://aksw.org/staff/Soeren
http://aksw.org/staff/Soeren --foaf:Name--> "Sören Auer"
http://aksw.org/staff/Soeren --foaf:Email--> [email protected]

Page 12: Linked Data Tutorial

RDF Serialization

RDF/XML:

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/metadata/dublin_core#">
  <rdf:Description rdf:about="http://OntoWiki.net">
    <dc:Creator>
      <rdf:Description rdf:about="http://aksw.org/staff/Soeren">
        <dc:Name>Sören Auer</dc:Name>
        <dc:Email>[email protected]</dc:Email>
      </rdf:Description>
    </dc:Creator>
  </rdf:Description>
</rdf:RDF>

Triples:

<http://OntoWiki.net> <http://purl.org/metadata/dublin_core#Creator> <http://aksw.org/staff/Soeren> .
<http://aksw.org/staff/Soeren> <http://purl.org/metadata/dublin_core#Name> "Sören Auer" .
<http://aksw.org/staff/Soeren> <http://purl.org/metadata/dublin_core#Email> "[email protected]" .

[Figure: the same graph: http://OntoWiki.net --Creator--> http://aksw.org/staff/Soeren, which has Name "Sören Auer" and Email [email protected]]
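How a nested RDF/XML description flattens into triples can be sketched with Python's standard XML parser. This is a rough illustration that only handles the `rdf:Description` / `rdf:about` shape used on this slide, not RDF/XML in general; the function names `describe` and `to_triples` are assumptions for this sketch, and the e-mail statement is left out.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/metadata/dublin_core#">
  <rdf:Description rdf:about="http://OntoWiki.net">
    <dc:Creator>
      <rdf:Description rdf:about="http://aksw.org/staff/Soeren">
        <dc:Name>Sören Auer</dc:Name>
      </rdf:Description>
    </dc:Creator>
  </rdf:Description>
</rdf:RDF>
"""


def describe(node):
    """Yield (subject, predicate, object) for one rdf:Description element."""
    subject = node.get(f"{{{RDF_NS}}}about")
    for prop in node:
        # ElementTree names tags "{namespace}local"; joined they form the property URI.
        namespace, local = prop.tag[1:].split("}")
        predicate = namespace + local
        nested = prop.find(f"{{{RDF_NS}}}Description")
        if nested is not None:
            # Nested description: the object is a resource that is described further.
            yield (subject, predicate, nested.get(f"{{{RDF_NS}}}about"))
            yield from describe(nested)
        else:
            # Plain text content: the object is a literal.
            yield (subject, predicate, prop.text)


def to_triples(xml_text):
    root = ET.fromstring(xml_text.strip())
    triples = []
    for node in root.findall(f"{{{RDF_NS}}}Description"):
        triples.extend(describe(node))
    return triples


TRIPLES = to_triples(DOC)
```

The result contains one triple linking `http://OntoWiki.net` to the staff resource and one triple attaching the name literal to that resource, which mirrors the triple listing on the slide.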

Page 13: Linked Data Tutorial

RDF Schema

Restrict combinations of resources / literals

Structuring of vocabularies

Instantiation / classification

Provisioning of special resources:
• Classes (concepts, frames): http://www.w3.org/2000/01/rdf-schema#Class
• Attributes (properties, slots, roles): http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
• Instances (objects): http://www.w3.org/1999/02/22-rdf-syntax-ns#type

[Figure: questionable statement: http://OntoWiki.net dc:creator 16.11.2007 ?]

Page 14: Linked Data Tutorial

RDF-S Class & Property Hierarchies

Beer rdf:type rdfs:Class
BottomFermentedBeer rdfs:subClassOf Beer
Bock rdfs:subClassOf BottomFermentedBeer
Lager rdfs:subClassOf BottomFermentedBeer
Pilsner rdfs:subClassOf BottomFermentedBeer

hasContent rdf:type rdf:Property
hasAlcoholicContent rdfs:subPropertyOf hasContent
hasOriginalWortContent rdfs:subPropertyOf hasContent

Page 15: Linked Data Tutorial

RDF-S Properties

… are defined and used independently from classes

Domain: association with one or multiple classes

Range: defines the values the property can assume
– Instances of a certain class
– Literals typed with a certain XML Schema data type

hasAlcoholicContent rdf:type owl:DatatypeProperty
hasAlcoholicContent rdf:type owl:FunctionalProperty
hasAlcoholicContent rdfs:domain Beer
hasAlcoholicContent rdfs:range xsd:float
hasAlcoholicContent rdfs:subPropertyOf hasContent

brews rdf:type owl:ObjectProperty
brews rdfs:domain Brewery
brews rdfs:range Beer
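What domain and range statements let a reasoner conclude can be sketched as follows. This is a simplified illustration, not a full RDF-S reasoner: the lookup tables mirror the statements above, while the subject `ExampleBrewery` and the function name `infer_types` are hypothetical.

```python
RDF_TYPE = "rdf:type"

# Schema statements from the slide as (property -> class) lookups.
DOMAIN = {"brews": "Brewery", "hasAlcoholicContent": "Beer"}
RANGE = {"brews": "Beer"}  # the xsd:float range concerns literals and is skipped here


def infer_types(triples):
    """Derive rdf:type statements from rdfs:domain and rdfs:range."""
    inferred = set()
    for subject, predicate, obj in triples:
        if predicate in DOMAIN:
            # Whoever uses the property as subject belongs to its domain class.
            inferred.add((subject, RDF_TYPE, DOMAIN[predicate]))
        if predicate in RANGE:
            # Whatever appears as object belongs to its range class.
            inferred.add((obj, RDF_TYPE, RANGE[predicate]))
    return inferred


# "ExampleBrewery" is a made-up subject used only for this illustration.
FACTS = [("ExampleBrewery", "brews", "Jever")]
INFERRED = infer_types(FACTS)
```

From the single fact that `ExampleBrewery` brews `Jever`, the sketch derives that the subject is a `Brewery` and the object is a `Beer`, without either type being stated explicitly.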

Page 16: Linked Data Tutorial

RDF-S Instances

Are associated with one (or multiple) class(es):

Boddingtons rdf:type Ale
Grafentrunk rdf:type Bock
Hoegaarden rdf:type White
Jever rdf:type Pilsner
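Combined with the class hierarchy from the earlier slide, an instance's direct type implies membership in all superclasses. The following is a minimal sketch that assumes each class has at most one superclass (true for the beer example, not for RDF-S in general); the dictionaries and the function name `all_classes` are assumptions for this illustration.

```python
# rdfs:subClassOf statements from the class hierarchy slide.
SUBCLASS_OF = {
    "BottomFermentedBeer": "Beer",
    "Bock": "BottomFermentedBeer",
    "Lager": "BottomFermentedBeer",
    "Pilsner": "BottomFermentedBeer",
}

# rdf:type statements for instances whose class appears in that hierarchy.
INSTANCE_OF = {"Grafentrunk": "Bock", "Jever": "Pilsner"}


def all_classes(instance):
    """Return the direct class of an instance followed by all its superclasses."""
    classes = []
    current = INSTANCE_OF.get(instance)
    while current is not None:
        classes.append(current)
        # Walk one step up the hierarchy; stops at a class without a superclass.
        current = SUBCLASS_OF.get(current)
    return classes
```

For example, `all_classes("Jever")` walks from `Pilsner` via `BottomFermentedBeer` up to `Beer`.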

Page 17: Linked Data Tutorial


Semantic Web Layer Cake

Page 18: Linked Data Tutorial

Linked Data - Paradigm

• Use URIs as names for things

• Use HTTP URIs so that people can look up those names.

• When someone looks up a URI, provide useful information.

• Include links to other URIs, so that they can discover more things.

Page 19: Linked Data Tutorial

Linked Data – Publishing RDF

• De-referenceable RDF URIs, e.g.: http://dbpedia.org/resource/Busan

• Different HTTP response depending on the HTTP Accept header
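The Accept-header switch can be sketched as a small content negotiation routine. This is an illustrative simplification: the mapping `REPRESENTATIONS` and the labels `"data"` and `"page"` are hypothetical, wildcards such as `*/*` are not matched, and a real server would typically redirect to a separate document URL for the chosen representation.

```python
def parse_accept(header):
    """Return the media types of an Accept header, most preferred first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_type, quality = fields[0].lower(), 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    quality = float(field[2:])
                except ValueError:
                    quality = 0.0
        if media_type and quality > 0:
            # Sort by descending quality, then by position in the header.
            entries.append((-quality, position, media_type))
    return [media_type for _, _, media_type in sorted(entries)]


# Hypothetical mapping from media type to the kind of document served.
REPRESENTATIONS = {
    "application/rdf+xml": "data",  # RDF description for Data Web clients
    "text/html": "page",            # human-readable page for browsers
}


def choose(accept_header, default="page"):
    """Pick the representation for the first supported media type."""
    for media_type in parse_accept(accept_header):
        if media_type in REPRESENTATIONS:
            return REPRESENTATIONS[media_type]
    return default
```

A browser sending `text/html,application/xhtml+xml;q=0.9` would get the page, while a client sending `application/rdf+xml` would get the RDF data for the same URI.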

Page 20: Linked Data Tutorial

Benefits of using the RDF Data Model in the Linked Data Context

• Clients can look up every URI in an RDF graph over the Web to retrieve additional information.
• Information from different sources merges naturally.
• The data model enables you to set RDF links between data from different sources.
• The data model allows you to represent information that is expressed using different schemata in a single model.
• Combined with schema languages such as RDF-S or OWL, the data model allows you to use as much or as little structure as you need, meaning that you can represent tightly structured data as well as semi-structured data.

Page 21: Linked Data Tutorial

Linking Open Data (LOD) Cloud


Page 22: Linked Data Tutorial

Data Web Moving Targets

Base technologies (RDF, SPARQL, HTTP etc.) are developed, standardized and ready to use

Big issues:
• Scalability
• User interfaces
• Search engines
• Business models
• (Reasoning)

Page 23: Linked Data Tutorial

Data Web Business Models

• Advertisement (page view) based businesses will probably not be first movers

• Large Web companies will probably not be first movers

• Data Web should focus on fragmented markets with many players which require widest distribution of information, e.g. realtors, online shops, transportation service providers, public information, geo data etc.


Page 24: Linked Data Tutorial

Overview

1. The Linked Data Web Vision
2. Data Web Technologies
3. Publishing relational data on the Web
4. DBpedia – transforming Wikipedia into a knowledge base
5. OntoWiki – a Linked Data Wiki
6. OpenStreetMap – free and open geo data

Page 25: Linked Data Tutorial

Triplify Motivation

• Growth of semantic representations is still outpaced by the traditional Web
• Overcome the chicken-and-egg dilemma of missing semantic representations and search facilities on the Web
• Triplify leverages relational representations behind existing Web applications:
  – often open source, deployed hundreds of thousands of times
  – structure and semantics encoded in relational database schemas (behind Web apps) are not accessible to Web search engines, mashups etc.

[Figure: monthly Web application downloads at SourceForge]

Page 26: Linked Data Tutorial

Triplify Big Picture


Page 27: Linked Data Tutorial

Triplify Approach: Simplicity

• Expose semantics as simply as possible
  – No (new) mapping languages
  – Few lines of code – easy to plug in
  – Simple, reusable configurations
• Available for the most popular Web app languages
  – PHP (ready), Ruby/Python under development
• Works with the most popular Web app DBs
  – MySQL (extensively tested); PHP-PDO DBs (SQLite, Oracle, DB2, MS SQL, PostgreSQL etc.) should work; not needed for Virtuoso
• Triplify exposes RDF/N-Triples, Linked Data and RDF/JSON

Page 28: Linked Data Tutorial

Triplify Solution: SQL SELECT queries map relational data to RDF

Triplify configuration:
• A number of SQL queries selecting the information that should be made publicly available.

A special SQL query result structure is required (in order to convert results into RDF):
• The first column must contain identifiers for generating instance URIs (i.e. the primary key of the DB table)
• Column names are used to generate property URIs; renaming columns allows reusing properties from existing vocabularies such as Dublin Core, FOAF, SIOC
  – e.g. SELECT id, name AS 'foaf:name' FROM users
• Individual cells contain data values or references to other instances (these eventually become the objects of the resulting triples)
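The effect of renaming columns can be tried out with an in-memory SQLite database. This is a sketch rather than Triplify's actual code (Triplify itself is a PHP script): the `users` table, its single row and the function name `select_to_statements` are made up for illustration, and the alias is written with double quotes, the standard SQL identifier quoting, instead of the MySQL-style single quotes on the slide.

```python
import sqlite3


def select_to_statements(conn, class_path, sql):
    """Run a SELECT and turn each row into (subject, property, value) statements."""
    cursor = conn.execute(sql)
    # cursor.description holds the result column names, including aliases.
    columns = [description[0] for description in cursor.description]
    statements = []
    for row in cursor.fetchall():
        subject = f"{class_path}/{row[0]}"  # first column: instance identifier
        for column, value in zip(columns[1:], row[1:]):
            statements.append((subject, column, value))
    return statements


# Hypothetical table and data, only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Alice')")

STATEMENTS = select_to_statements(
    conn, "user", 'SELECT id, name AS "foaf:name" FROM users'
)
```

The column alias becomes the property name of the resulting statement, so no separate mapping language is needed beyond the SELECT itself.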

Page 29: Linked Data Tutorial

Example: WordPress Blog Posts

Associate the URL path fragment 'post' with a number of SQL patterns:

http://blog.aksw.org/triplify/post/(xxx)

(1) SELECT id,
           post_author   AS 'sioc:has_creator->user',
           post_title    AS 'dc:title',
           post_content  AS 'sioc:content',
           post_date     AS 'dcterms:created^^xsd:dateTime',
           post_modified AS 'dcterms:modified^^xsd:dateTime'
    FROM posts
    WHERE post_status='publish' (AND id=xxx)

(2) SELECT post_id id, tag_label AS 'tag:taggedWithTag'
    FROM post2tag INNER JOIN tag ON (post2tag.tag_id=tag.tag_id)
    (WHERE id=xxx)

(3) SELECT post_id id, category_id AS 'belongsToCategory->category'
    FROM post2cat
    (WHERE id=xxx)

Callouts:
• 'property->class' marks an object property (the cell references another instance)
• 'property^^datatype' marks a datatype property (the cell is a typed literal)

Page 30: Linked Data Tutorial

RDF Conversion

Query results:

(1) id | post_author | post_title          | post_content        | post_date    | post_modified
    1  | 5           | New DBpedia release | Today we released … | 200810201635 | 200810201635

(2) id | tag:taggedWithTag
    1  | DBpedia
    1  | Release
    ..

(3) id | belongsToCategory
    1  | 34

Resulting triples for http://blog.aksw.org/triplify/post/1:

http://blog.aksw.org/triplify/post/1 sioc:has_creator http://blog.aksw.org/triplify/user/5
http://blog.aksw.org/triplify/post/1 dc:title "New DBpedia release"
http://blog.aksw.org/triplify/post/1 sioc:content "Today we released …"
http://blog.aksw.org/triplify/post/1 dcterms:modified "20081020T1635"^^xsd:dateTime
http://blog.aksw.org/triplify/post/1 dcterms:created "20081020T1635"^^xsd:dateTime
http://blog.aksw.org/triplify/post/1 tag:taggedWithTag "DBpedia"
http://blog.aksw.org/triplify/post/1 tag:taggedWithTag "Release"
http://blog.aksw.org/triplify/post/1 belongsToCategory http://blog.aksw.org/triplify/category/34
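The conversion from a result row to triples can be sketched in Python as follows. This is an approximation of the idea, not Triplify's PHP implementation: the function name `row_to_triples` is hypothetical, the `->` and `^^` column-name conventions are taken from the WordPress example, and skipping NULL cells is an assumption of this sketch.

```python
BASE = "http://blog.aksw.org/triplify/"


def row_to_triples(class_path, columns, row):
    """Turn one SQL result row into (subject, predicate, object) triples."""
    subject = f"<{BASE}{class_path}/{row[0]}>"  # first column holds the identifier
    triples = []
    for column, value in zip(columns[1:], row[1:]):
        if value is None:
            continue  # assumption: NULL cells produce no triple
        if "->" in column:
            # 'property->class': the cell references another instance.
            predicate, target = column.split("->", 1)
            obj = f"<{BASE}{target}/{value}>"
        elif "^^" in column:
            # 'property^^datatype': the cell is a typed literal.
            predicate, datatype = column.split("^^", 1)
            obj = f'"{value}"^^{datatype}'
        else:
            # Plain column name: the cell is an untyped literal.
            predicate, obj = column, f'"{value}"'
        triples.append((subject, predicate, obj))
    return triples


COLUMNS = ["id", "sioc:has_creator->user", "dc:title", "dcterms:created^^xsd:dateTime"]
ROW = (1, 5, "New DBpedia release", "20081020T1635")
TRIPLES = row_to_triples("post", COLUMNS, ROW)
```

Each cell after the identifier yields exactly one triple, and the column name alone decides whether the object is a link to another instance, a typed literal or a plain literal.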

Page 31: Linked Data Tutorial

Example Config

<?php

include('../wp-config.php');

// Namespace prefixes used in the column aliases below
$triplify['namespaces']=array(
  'vocabulary'=>'http://triplify.org/vocabulary/Wordpress/',
  'foaf'=>'http://xmlns.com/foaf/0.1/',
  /* … */ );

// One entry per URL path fragment; each maps to one or more SQL queries
$triplify['queries']=array(
  'post'=>array(
    "SELECT id,post_author 'sioc:has_creator->user',post_date 'dcterms:created',post_title 'dc:title',
       post_content 'sioc:content',post_modified 'dcterms:modified'
       FROM {$table_prefix}posts WHERE post_status='publish'",
    "SELECT post_id id,tag_id 'tag:taggedWithTag' FROM {$table_prefix}post2tag",
    "SELECT post_id id,category_id 'belongsToCategory' FROM {$table_prefix}post2cat",
  ),
  'tag'=>"SELECT tag_ID id,tag 'tag:tagName' FROM {$table_prefix}tags",
  'category'=>"SELECT cat_ID id,cat_name 'skos:prefLabel',category_parent 'skos:narrower'
       FROM {$table_prefix}categories",
  'user'=>array(
    "SELECT id,user_login 'foaf:accountName',SHA(CONCAT('mailto:',user_email)) 'foaf:mbox_sha1sum',
       user_url 'foaf:homepage',display_name 'foaf:name' FROM {$table_prefix}users",
    "SELECT user_id id,meta_value 'foaf:firstName' FROM {$table_prefix}usermeta WHERE meta_key='first_name'",
    "SELECT user_id id,meta_value 'foaf:family_name' FROM {$table_prefix}usermeta WHERE meta_key='last_name'",
  ),
  'comment'=>"SELECT comment_ID id,comment_post_id 'sioc:reply_of',comment_author AS 'foaf:name',
       SHA(CONCAT('mailto:',comment_author_email)) 'foaf:mbox_sha1sum',
       comment_author_url 'foaf:homepage',comment_date AS 'dcterms:created',
       comment_content 'sioc:content',comment_karma,comment_type
       FROM {$table_prefix}comments WHERE comment_approved='1'",
);

// Properties whose values reference instances of another class
$triplify['objectProperties']=array(
  'sioc:has_creator'=>'user',
  'tag:taggedWithTag'=>'tag',
  'belongsToCategory'=>'category',
  'skos:narrower'=>'category',
  'sioc:reply_of'=>'post');

// RDF class assigned to the instances of each URL path fragment
$triplify['classMap']=array(
  'user'=>'foaf:Person',
  'post'=>'sioc:Post',
  'tag'=>'tag:Tag',
  'category'=>'skos:Concept');

$triplify['TTL']=0; // Caching

$triplify['db']=new PDO('mysql:host='.DB_HOST.';dbname='.DB_NAME,DB_USER,DB_PASSWORD);
?>

Page 32: Linked Data Tutorial

Triplify Temporal Extension

Problem: How do next-generation search engines know that something changed on the Data Web?

Different solutions:
• Always try to crawl everything: currently deployed on the Web
• Ping a central update notification service: PingTheSemanticWeb.com – will probably not scale if the Data Web really gets deployed
• Each linked data endpoint publishes an update log: Triplify Update Logs

Page 33: Linked Data Tutorial

Triplify Temporal Extension: special update path and vocabulary

http://example.com/Triplify/update

  http://example.com/Triplify/update/2007 rdf:type update:UpdateCollection .
  http://example.com/Triplify/update/2008 rdf:type update:UpdateCollection .

http://example.com/Triplify/update/2008

  http://example.com/Triplify/update/2008/Jan rdf:type update:UpdateCollection .
  http://example.com/Triplify/update/2008/Feb rdf:type update:UpdateCollection .

Nesting continues until we finally reach a URL that exposes all updates performed in a certain second in time…

http://example.com/Triplify/update/2008/Jan/01/17/58/06

  http://example.com/Triplify/update/2008/Jan/01/17/58/06/user123
      update:updatedResource http://example.com/Triplify/users/JohnDoe ;
      update:updatedAt "20080101T17:58:06"^^xsd:dateTime ;
      update:updatedBy http://example.com/Triplify/users/JohnDoe .
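The nested update URLs follow directly from a timestamp. A minimal sketch, assuming English three-letter month names beyond the Jan and Feb shown on the slide and zero-padded two-digit components; the function name `update_collections` is hypothetical.

```python
from datetime import datetime

# Assumption: months are abbreviated like "Jan" and "Feb" on the slide.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def update_collections(base, when):
    """Return the nested update URLs for a timestamp, from year down to second."""
    parts = [
        f"{when.year:04d}", MONTHS[when.month - 1], f"{when.day:02d}",
        f"{when.hour:02d}", f"{when.minute:02d}", f"{when.second:02d}",
    ]
    urls = []
    path = base.rstrip("/") + "/update"
    for part in parts:
        # Each level extends the previous one by one time component.
        path = f"{path}/{part}"
        urls.append(path)
    return urls


URLS = update_collections("http://example.com/Triplify", datetime(2008, 1, 1, 17, 58, 6))
```

A crawler that last visited at a known time only needs to descend into the collections that are newer than that time, instead of re-crawling the whole endpoint.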

Page 34: Linked Data Tutorial

Triplify Spatial Extension

How to publish geo data using Triplify?

OpenStreetMap – 160 GB of geo data, lots of POIs: hotels, gas stations, universities …

Request pattern: /near/<Lat>,<Lon>/<Radius>/<Tag>

http://LinkedGeoData.org/near/48.213056,16.359722/1000/Hotel

Result:

http://LinkedGeoData.org/point/212331
http://LinkedGeoData.org/point/944523
http://LinkedGeoData.org/point/234091
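How such a request could be answered is sketched below. This is not the LinkedGeoData implementation: it assumes the coordinate pair is latitude followed by longitude (the example values match Vienna in that order), that the radius is given in metres, and it uses a brute-force haversine distance check over a made-up dictionary of points.

```python
import math


def parse_near(path):
    """Split '/near/<lat>,<lon>/<radius>/<tag>' into its components."""
    _, keyword, point, radius, tag = path.split("/")
    if keyword != "near":
        raise ValueError("not a 'near' request")
    lat, lon = (float(value) for value in point.split(","))
    return lat, lon, float(radius), tag


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (haversine formula)."""
    earth_radius = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * earth_radius * math.asin(math.sqrt(a))


def near(path, points):
    """Return the names of all points with the requested tag inside the radius."""
    lat, lon, radius, tag = parse_near(path)
    matches = []
    for name, (point_lat, point_lon, point_tag) in points.items():
        if point_tag == tag and distance_m(lat, lon, point_lat, point_lon) <= radius:
            matches.append(name)
    return matches


# Hypothetical points: (latitude, longitude, tag).
POINTS = {
    "point/1": (48.2140, 16.3600, "Hotel"),       # roughly 100 m away
    "point/2": (48.3000, 16.5000, "Hotel"),       # several kilometres away
    "point/3": (48.2131, 16.3598, "University"),  # close, but wrong tag
}
```

With these sample points, `near("/near/48.213056,16.359722/1000/Hotel", POINTS)` keeps only the first entry. A production system would use a spatial index instead of scanning every point.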

Page 35: Linked Data Tutorial

RDB2RDF tool comparison

Tool               | Triplify                    | D2RQ           | Virtuoso RDF Views
Technology         | Scripting languages (PHP)   | Java           | Whole middleware solution
SPARQL endpoint    | -                           | X              | X
Mapping language   | SQL                         | RDF based      | RDF based
Mapping generation | Manual                      | Semi-automatic | Manual
Scalability        | Medium-high (but no SPARQL) | Medium         | High

More at: http://esw.w3.org/topic/Rdb2RdfXG/StateOfTheArt

Page 36: Linked Data Tutorial

Marrying DBs with RDF & Ontologies

Criterion  | Relational Databases               | RDF & Ontologies
Data Model | Relational (tables, columns, rows) | Triples (subject, predicate, object)

Further criteria compared on the slide: schema and data separation, implicit information, scalability, schema flexibility, Web data integration readiness

• Using DBs for storage and querying of RDF & ontologies
• Publishing DB content as RDF

Overview

1. The Linked Data Web Vision
2. Data Web Technologies
3. Publishing relational data on the Web
4. DBpedia – transforming Wikipedia into a knowledge base
5. OntoWiki – a Linked Data Wiki
6. OpenStreetMap – free and open geo data

Page 38: Linked Data Tutorial

Transforming Wikipedia into a Knowledge Base

☺ Wikipedia is the 8th most popular website (according to Alexa.com)
☺ Maybe the finest example of truly collaboratively created content (>8M articles in >200 languages written by >300,000 authors)
☺ Covers all possible topics and domains; articles are the result of a "community consensus"
☹ Many inconsistencies can be found on different pages/language versions
☹ Not very well integrated with other data sources
☹ Lacks structured representations of content that facilitate querying and search

Simple questions that are hard to answer:
• What do Art Nouveau and Berlin have in common?
• Who are the mayors of central European towns at an elevation of more than 1000 m?
• Which films are longer than 4 hours and had a budget of less than $1 million?

The information required to answer these is contained in Wikipedia!

How can we reveal the structure and semantics of Wikipedia content?

Page 39: Linked Data Tutorial

Structure in Wikipedia

• Title
• Abstract
• Infoboxes
• Geo-coordinates
• Categories
• Images
• Links
  – Other language versions
  – Other Wikipedia pages
  – To the Web
  – Redirects
  – Disambiguations

Page 40: Linked Data Tutorial

Infobox templates

Wikitext syntax:

{{Infobox Korean settlement
| title = Busan Metropolitan City
| img = Busan.jpg
| imgcaption = A view of the [[Geumjeong]] district in Busan
| hangul = 부산 광역시
...
| area_km2 = 763.46
| pop = 3635389
| popyear = 2006
| mayor = Hur Nam-sik
| divs = 15 wards (Gu), 1 county (Gun)
| region = [[Yeongnam]]
| dialect = [[Gyeongsang]]
}}

RDF representation (http://dbpedia.org/resource/Busan):

dbp:Busan dbpp:title "Busan Metropolitan City"
dbp:Busan dbpp:hangul "부산 광역시"@Hang
dbp:Busan dbpp:area_km2 "763.46"^^xsd:float
dbp:Busan dbpp:pop "3635389"^^xsd:int
dbp:Busan dbpp:region dbp:Yeongnam
dbp:Busan dbpp:dialect dbp:Gyeongsang
...
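The mapping from infobox attributes to triples can be approximated in a few lines. This is a strongly simplified sketch and not the DBpedia extraction framework: it only recognizes `| key = value` lines, whole-value `[[links]]`, integers and decimals, and the function name `infobox_to_triples` is hypothetical.

```python
import re

# A value that consists of exactly one wiki link, e.g. [[Yeongnam]].
LINK = re.compile(r"^\[\[([^\]|]+)\]\]$")


def infobox_to_triples(subject, wikitext):
    """Turn '| key = value' infobox lines into (subject, property, object) triples."""
    triples = []
    for line in wikitext.splitlines():
        line = line.strip()
        if not line.startswith("|") or "=" not in line:
            continue  # template name, closing braces, elided lines
        key, value = (part.strip() for part in line[1:].split("=", 1))
        if not value:
            continue
        link = LINK.match(value)
        if link:
            # Links to other articles become resources.
            obj = "dbp:" + link.group(1).replace(" ", "_")
        elif re.fullmatch(r"-?\d+", value):
            obj = f'"{value}"^^xsd:int'
        elif re.fullmatch(r"-?\d+\.\d+", value):
            obj = f'"{value}"^^xsd:float'
        else:
            obj = f'"{value}"'
        triples.append((subject, "dbpp:" + key, obj))
    return triples


SAMPLE = """{{Infobox Korean settlement
| title = Busan Metropolitan City
| area_km2 = 763.46
| pop = 3635389
| region = [[Yeongnam]]
}}"""

TRIPLES = infobox_to_triples("dbp:Busan", SAMPLE)
```

Values that mix text and links, such as the image caption in the example, would need more careful parsing than this sketch provides.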

Page 41: Linked Data Tutorial

Class Hierarchy

• 200k people (70k athletes, 65k artists, 18k office holders)
• 193k places (100k areas, 40k cities, 10k rivers)
• 187k works (71k music albums, 24k singles, 31k films, 15k books)
• 87k species
• 70k organisations (20k educational institutions, 18k companies, 12k radio stations)
• 22k buildings (8k airports, 5k stations, 2k stadiums, 1k bridges)
• 12k planets

And more… (events, diseases, proteins, drugs, aircraft, automobiles, ships, astronauts, architects, scientists)

Page 42: Linked Data Tutorial

Extraction results

• Extraction algorithm applied to the English Wikipedia content (http://dumps.wikimedia.org/enwiki)
• <1h needed to extract templates and convert them to RDF (>2M English Wikipedia articles, >10GB raw data)
• Roughly 30M facts extracted from infobox templates alone
• Sample checks reveal: ~90% accuracy, 9% redundant information, 1% erroneous
• Multi-domain ontology covering a large body of domains
• Extraction results and source code of the extraction algorithm available at http://dbpedia.org

Dataset (en)                                                                                   | Triples
Articles                                                                                       | 7.6M
Abstracts                                                                                      | 2.1M
External Links                                                                                 | 3.2M
Categories                                                                                     | 7.3M
Infoboxes                                                                                      | 29.3M
Persons                                                                                        | 560k
Yago Classes                                                                                   | 2M
Wordnet Classes                                                                                | 338k
Geo-coordinates                                                                                | 450k
Mapping to Flickr, DBLP, Eurostat, CIA-Factbook, Musicbrainz, Project Gutenberg, US Census, … | 100k
Mapping to OpenCyc                                                                             | 45k

Page 43: Linked Data Tutorial

DBpedia Components

[Figure: DBpedia architecture]
• Sources: Wikipedia dumps (article texts, DB tables)
• Extraction: infobox, articles, categories, …
• DBpedia datasets, loaded into Virtuoso and MySQL
• Published via SPARQL endpoint and Linked Data
• Clients: Query Builder, SNORQL Browser, traditional Web browser, Web 2.0 mashups, Semantic Web browsers
• Interlinked with other open data: OpenCyc, Wordnet, Freebase, Geonames, …

Page 44: Linked Data Tutorial

User Interfaces


Page 45: Linked Data Tutorial

DBpedia SPARQL Endpoint (1)

• http://dbpedia.org/sparql
• Hosted on an OpenLink Virtuoso server
• Can answer SPARQL queries like:
  – Give me all sitcoms that are set in NYC
  – All tennis players from Moscow
  – All films by Quentin Tarantino
  – All German musicians that were born in Berlin in the 19th century
  – All soccer players with jersey number 11, playing for a club having a stadium with over 40,000 seats, and born in a country with over 10 million inhabitants

Page 46: Linked Data Tutorial

DBpedia SPARQL Endpoint (2)

SELECT ?name ?birth ?description ?person WHERE {
  ?person dbp:birthPlace dbp:Berlin .
  ?person skos:subject dbp:Cat:German_musicians .
  ?person dbp:birth ?birth .
  ?person foaf:name ?name .
  ?person rdfs:comment ?description .
  FILTER (LANG(?description) = 'en') .
} ORDER BY ?name
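Sending a query to the endpoint amounts to an HTTP request that carries the query text in a `query` parameter, as defined by the SPARQL protocol. The sketch below only builds the request URL and performs no network access; the shortened query and the function name `request_url` are illustrative, and prefix declarations are omitted as on the slide, which assumes the endpoint predefines them.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://dbpedia.org/sparql"

# Shortened variant of the query on the slide; prefixes are assumed to be predefined.
QUERY = """SELECT ?name WHERE {
  ?person foaf:name ?name .
} ORDER BY ?name"""


def request_url(endpoint, query):
    """Build a GET URL that passes the query text in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


URL = request_url(ENDPOINT, QUERY)

# Decoding the URL again gives back the original query text.
DECODED = parse_qs(urlsplit(URL).query)["query"][0]
```

The resulting URL could then be fetched with any HTTP client, typically with an Accept header that names the desired result format.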

Page 47: Linked Data Tutorial

Overview

1. The Linked Data Web Vision
2. Data Web Technologies
3. Publishing relational data on the Web
4. DBpedia – transforming Wikipedia into a knowledge base
5. OntoWiki – a Linked Data Wiki
6. Virtuoso – Knowledge Store
7. OpenStreetMap – free and open geo data

Page 48: Linked Data Tutorial

OntoWiki

1. Semantic Wiki
2. Differences
3. Similarities
4. Architecture
5. Use Cases

Page 49: Linked Data Tutorial

Semantic Wiki

• Wiki with added semantics
• Goal: Wiki pages + background knowledge base
• Examples: Semantic MediaWiki, Rhizome, IkeWiki

Page 50: Linked Data Tutorial

Conceptual Differences: Views over Articles

[Figure: Wiki articles vs. resource views]

Page 51: Linked Data Tutorial

Conceptual Differences: Forms over Code

[Figure: Wiki code vs. forms]

Page 52: Linked Data Tutorial

Conceptual Similarities: Wikiwiki Concepts

• Everyone can edit anything
• Content is edited in the same way as structure is
• Activity can be watched and reviewed by everyone

(Ward Cunningham)

Page 53: Linked Data Tutorial

Versioning

• Everything can be undone
• Philosophy: make it easy to correct mistakes

Page 54: Linked Data Tutorial

OntoWiki Application Framework: Interfaces

• SPARQL Endpoint
• Linked Data Endpoint
• WebDAV
• REST API
• Command Line Interface
• LDAP

Page 55: Linked Data Tutorial

Extensibility

• Plugins
• Views/Templates
• Themes
• Localizations

Page 56: Linked Data Tutorial

Access Control

• Model-based
• Action-based
• (Statement-based)

Page 57: Linked Data Tutorial

Other Features

• Facet-based browsing
• Inline editing
• Auto-adaptive user interface
• Resource auto-suggestion
• SPARQL Query Editor

Page 58: Linked Data Tutorial

Architecture


Page 59: Linked Data Tutorial

Vision

• Generic data wiki for RDF models
  – No data model mismatch (structured vs. unstructured)
• Application framework for:
  – Knowledge-intensive applications
  – Agile processes
  – Distributed user groups

Page 60: Linked Data Tutorial

SoftWiki*

Problem: Requirements Engineering with large, spatially distributed stakeholder groups

Solution: comprehensive ontology for representing RE-relevant knowledge + adapted OntoWiki application

Application of text-mining methods for duplicate detection

* Work in a BMBF-funded project with UniDuE, T-Systems, QA-Systems, LeCoS, ProDV

Page 61: Linked Data Tutorial


Page 62: Linked Data Tutorial

Caucasian Spiders

• Faunistic database on spiders of the Caucasus
• Taxonomy
• Localities
• 240k triples

Page 63: Linked Data Tutorial


Page 64: Linked Data Tutorial

Professor Catalogue

• Professor catalogue with 800 entries and 60 schema elements
• OntoWiki used as backend for data entry
• Custom front-end

Page 65: Linked Data Tutorial


Page 66: Linked Data Tutorial


Page 67: Linked Data Tutorial

Semantic Wikis: Related Work

               | OntoWiki                       | Semantic MediaWiki              | IkeWiki
Main developer | Uni Leipzig AKSW               | AIFB Karlsruhe                  | Salzburg Research
Technology     | PHP/MySQL                      | PHP/MySQL (MediaWiki extension) | Java/Postgres
Base artifacts | Facts                          | (annotated) texts               | (annotated) texts
Authoring      | WYSIWYG facts / forms          | Wiki syntax / semantic forms    | WYSIWYG / forms
Other          | Data Web development framework | Planned Wikipedia deployment    | Visual KB browser

Page 68: Linked Data Tutorial

Vakantieland*

One of the largest tourist information sites in NL (>100,000 daily page views, >20,000 points of interest)

The traditional relational DB system was too inflexible to capture the increasingly heterogeneous content types

• Development of an OntoWiki-based Data Web application
• Geo-data integration from OpenStreetMap
• Semantic search
• Integration of DBpedia data
• Comprehensive performance tuning

* Work with Ceriel Jakobs and Michael Martin, partially funded by SenterNovem

Page 69: Linked Data Tutorial

Overview

1. The Linked Data Web Vision
2. Data Web Technologies
3. Publishing relational data on the Web
4. DBpedia – transforming Wikipedia into a knowledge base
5. OntoWiki – a Linked Data Wiki
6. OpenStreetMap – linked open geo data

Page 70: Linked Data Tutorial

Linked Open Geo Data

Spatial data is crucial for the Data Web in order to interlink geographically linked resources.

The OpenStreetMap project (OSM) collects, organizes and publishes geo data the wiki way:
• 80,000 OSM users have collected data about 22M km of ways (roads, highways etc.) on earth; 25k km are added daily
• OSM contains a vast number of point-of-interest descriptions, e.g. shops, amenities, sports venues, businesses, touristic and historic sights.

Goal: publish OSM geo data, interlink it with other data sources and provide efficient means for browsing and authoring:
• OpenStreetMap data extraction works on the basis of OSM database dumps; a bi-directional live integration of OSM and our Linked Geo Data browser and editor is currently in the works.
• Triplify spatial data publishing: the Triplify script for publishing linked data from relational databases is extended for publishing geo data, in particular with regard to the retrieval of information about geographical areas.
• The Linked Geo Data browser and editor is a facet-based browser for geo content, which uses an OLAP-inspired hypercube for quickly retrieving aggregated information about any user-selected area on earth.

Page 71: Linked Data Tutorial

Faceted Linked-Geo-Data Browser


Page 72: Linked Data Tutorial

AKSW Linked Data Web Building Blocks

Layers: Foundations (marrying databases with RDF and ontologies), Tools, Applications (bringing the Data Web to end users)

• DBpedia: "Semantification" of Wikipedia
• Triplify: "Semantification" of (small) Web applications
• OntoWiki: Collaborative creation of explicit knowledge via Semantic Wikis
• OWLDB: Extending DBs for ontology handling / revealing implicit information
• Vakantieland: Building Data Web applications
• SoftWiki: Distributed, stakeholder-driven Requirements Engineering
• RDF Query Subsumption & View Maintenance: Scaling database-backed triple stores
• xOperator: Combining Instant Messaging with the Data Web
• OpenResearch.org: A semantic wiki for the sciences
• DL-Learner: Machine Learning for Ontologies

Page 73: Linked Data Tutorial

Thanks!

Dr. Sören Auer
[email protected]

Research group Agile Knowledge Engineering & Semantic Web (AKSW): http://aksw.org
• http://Triplify.org
• http://DBpedia.org
• http://OntoWiki.net
• http://OpenResearch.org
• http://aksw.org/projects/xOperator
• DL-Learner.org
• Cofundos.org