www.capsenta.com 1
Linked Data
Juan F. Sequeda – Daniel P. Miranker
Capsenta
Semantic Tech & Business Conference 2012
June 4, 2012
Outline
Part 1: Introduction to Linked Data
Part 2: Linked Data Principles
Part 3: Linked Data Architectures
Part 4: Linked Enterprise Data
Part 1: Introduction to Linked Data
The Web is a Data Shredder
Structured Data
Unstructured Data
Thanks Martin Hepp
The Web of Documents
Search
Crawler
Search Engine
What would we like?
Make it easy for computers/software to find THINGS
Do you SEARCH or do you FIND?
Search for football players who went to the University of Texas at Austin and played for the Dallas Cowboys as cornerback
Why can’t we just FIND it…
Guess how I FOUND out?
On a Semantic Web
Besides publishing documents on the web, which computers can’t understand easily,
let’s publish on the web something that computers can understand:
DATA
The Semantic Web is a web of data
The current web is a web of documents
But wait… doesn’t the web already have data?
Current Data on the Web
Relational Databases
APIs
XML
CSV
XLS
…
Can’t computers and applications already consume that data on the web?
Yes! But it is all in different formats and data models!
This makes it hard to integrate data
The data in different data sources aren’t linked
For example, how do I state that the Juan Sequeda in Facebook is the same as the Juan Sequeda in Twitter?
Or if I create a mashup from different services, I have to learn different APIs
and I get different formats of data back
Data is Siloed
Wouldn’t it be great if we had a standard way of publishing data
on the Web?
We have a standardized way of publishing documents on the web, right?
HTML
Then why can’t we have a standard way of publishing data
on the Web?
Good question! And the answer is YES. There is!
RDF
Resource Description Framework (RDF)
Data Model = a way to model data
  e.g. relational databases use the relational data model
RDF is a graph data model
RDF is a Graph
<JuanSequeda> <firstName> “Juan”
<JuanSequeda> <lastName> “Sequeda”
<JuanSequeda> <livesIn> “Austin”
<JuanSequeda> <knows> <DanielMiranker>
..
<DanielMiranker> <firstName> “Daniel”
<DanielMiranker> <lastName> “Miranker”
<DanielMiranker> <livesIn> “Austin”
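To make the "RDF is a graph" idea concrete, here is a minimal Python sketch that stores the triples from this slide as plain tuples and indexes them by subject. Plain strings stand in for URIs and literals, and the helper names `build_graph` and `objects` are illustrative assumptions, not part of any RDF library.

```python
from collections import defaultdict

# Triples copied from the slide; plain strings stand in for URIs and literals.
TRIPLES = [
    ("JuanSequeda", "firstName", "Juan"),
    ("JuanSequeda", "lastName", "Sequeda"),
    ("JuanSequeda", "livesIn", "Austin"),
    ("JuanSequeda", "knows", "DanielMiranker"),
    ("DanielMiranker", "firstName", "Daniel"),
    ("DanielMiranker", "lastName", "Miranker"),
    ("DanielMiranker", "livesIn", "Austin"),
]


def build_graph(triples):
    """Index triples as subject -> list of (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def objects(graph, subject, predicate):
    """Return every object reachable from `subject` over `predicate`."""
    return [obj for pred, obj in graph[subject] if pred == predicate]


graph = build_graph(TRIPLES)
print(objects(graph, "JuanSequeda", "knows"))
```

Each subject is a node, each predicate a labeled edge, and following the `knows` edge from one node leads to another node that has edges of its own.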
RDF can be serialized in different ways
RDF/XML
RDFa (RDF in HTML)
N3
Turtle
JSON
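As a rough illustration of what a serialization is, the sketch below writes triples in the line-based N-Triples style: one triple per line, URIs in angle brackets, literals in double quotes, and a closing period. The `http://example.org/` namespace is a placeholder assumption, and the escaping is deliberately simplified (only backslashes and double quotes), so treat it as a sketch rather than a complete serializer.

```python
EX = "http://example.org/"  # hypothetical namespace, for illustration only


def term(value, is_literal=False):
    """Render one term: <uri> for resources, "text" for literals."""
    if is_literal:
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return f"<{value}>"


def to_ntriples(triples):
    """Serialize (subject, predicate, object, object_is_literal) tuples."""
    lines = []
    for subject, predicate, obj, obj_is_literal in triples:
        lines.append(f"{term(subject)} {term(predicate)} {term(obj, obj_is_literal)} .")
    return "\n".join(lines)


triples = [
    (EX + "JuanSequeda", EX + "firstName", "Juan", True),
    (EX + "JuanSequeda", EX + "knows", EX + "DanielMiranker", False),
]
output = to_ntriples(triples)
print(output)
```

The other serializations listed above carry the same triples; only the surface syntax differs.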
RDFa
RDF/XML
RDF/N-Triples
RDF/Turtle
So does that mean that I have to publish my data in RDF now?
You don’t have to… but we would like you to
An example
Document on the Web
Databases back up documents
Isbn               Title                         Author        PublisherID  ReleasedDate
978-0-596-15381-6  Programming the Semantic Web  Toby Segaran  1            July 2009
…                  …                             …             …            …

PublisherID  PublisherName
1            O’Reilly Media
…            …

This is a THING: a book titled “Programming the Semantic Web” by Toby Segaran, …
THINGS have PROPERTIES: a book has a title, an author, …
Let’s represent the data in RDF
<book> <title> “Programming the Semantic Web”
<book> <isbn> “978-0-596-15381-6”
<book> <author> “Toby Segaran”
<book> <publisher> <Publisher>
<Publisher> <name> “O’Reilly”
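One way to picture the step from tables to RDF is the Python sketch below, which turns each row into a resource and each column into a property. The `http://example.org/` base URI and the URI patterns are assumptions made for illustration; the column names and values come from the example tables.

```python
BASE = "http://example.org/"  # hypothetical namespace, for illustration only

books = [
    {
        "Isbn": "978-0-596-15381-6",
        "Title": "Programming the Semantic Web",
        "Author": "Toby Segaran",
        "PublisherID": 1,
    }
]
publishers = [{"PublisherID": 1, "PublisherName": "O'Reilly Media"}]


def rows_to_triples(books, publishers):
    """Map each row to a resource and each column to a property."""
    triples = []
    for pub in publishers:
        pub_uri = f"{BASE}publisher{pub['PublisherID']}"
        triples.append((pub_uri, "name", pub["PublisherName"]))
    for book in books:
        book_uri = f"{BASE}isbn{book['Isbn']}"
        triples.append((book_uri, "isbn", book["Isbn"]))
        triples.append((book_uri, "title", book["Title"]))
        triples.append((book_uri, "author", book["Author"]))
        # The foreign key becomes a link to the publisher resource.
        triples.append((book_uri, "publisher", f"{BASE}publisher{book['PublisherID']}"))
    return triples


triples = rows_to_triples(books, publishers)
for triple in triples:
    print(triple)
```

Note how the foreign key `PublisherID` stops being an opaque number and becomes an edge between two resources.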
Remember that we are on the web
Everything on the web is identified by a URI
And now let’s link the data to other data
<http://…/isbn978> <title> “Programming the Semantic Web”
<http://…/isbn978> <isbn> “978-0-596-15381-6”
<http://…/isbn978> <author> “Toby Segaran”
<http://…/isbn978> <publisher> <http://…/publisher1>
<http://…/publisher1> <name> “O’Reilly”
And now consider the data from Revyu.com
<http://…/isbn978> <hasReview> <http://…/review1>
<http://…/review1> <description> “Awesome Book”
<http://…/review1> <hasReviewer> <http://…/reviewer>
<http://…/reviewer> <name> “Juan Sequeda”
Let’s start to link data
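<http://…/isbn978> <title> “Programming the Semantic Web”
<http://…/isbn978> <isbn> “978-0-596-15381-6”
<http://…/isbn978> <author> “Toby Segaran”
<http://…/isbn978> <publisher> <http://…/publisher1>
<http://…/publisher1> <name> “O’Reilly”
<http://…/isbn978> <owl:sameAs> <http://…/isbn978> (the two isbn978 URIs come from different data sets)
<http://…/isbn978> <hasReview> <http://…/review1>
<http://…/review1> <description> “Awesome Book”
<http://…/review1> <hasReviewer> <http://…/reviewer>
<http://…/reviewer> <name> “Juan Sequeda”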
Juan Sequeda publishes data too
<http://juansequeda.com/id> <name> “Juan Sequeda”
<http://juansequeda.com/id> <livesIn> <http://dbpedia.org/Austin>
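Let’s link more data
<http://…/isbn978> <hasReview> <http://…/review1>
<http://…/review1> <description> “Awesome Book”
<http://…/review1> <hasReviewer> <http://…/reviewer>
<http://…/reviewer> <name> “Juan Sequeda”
<http://…/reviewer> <owl:sameAs> <http://juansequeda.com/id>
<http://juansequeda.com/id> <name> “Juan Sequeda”
<http://juansequeda.com/id> <livesIn> <http://dbpedia.org/Austin>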
And more
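<http://…/isbn978> <title> “Programming the Semantic Web”
<http://…/isbn978> <isbn> “978-0-596-15381-6”
<http://…/isbn978> <author> “Toby Segaran”
<http://…/isbn978> <publisher> <http://…/publisher1>
<http://…/publisher1> <name> “O’Reilly”
<http://…/isbn978> <owl:sameAs> <http://…/isbn978> (the two isbn978 URIs come from different data sets)
<http://…/isbn978> <hasReview> <http://…/review1>
<http://…/review1> <description> “Awesome Book”
<http://…/review1> <hasReviewer> <http://…/reviewer>
<http://…/reviewer> <name> “Juan Sequeda”
<http://…/reviewer> <owl:sameAs> <http://juansequeda.com/id>
<http://juansequeda.com/id> <name> “Juan Sequeda”
<http://juansequeda.com/id> <livesIn> <http://dbpedia.org/Austin>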
Data on the Web that is in RDF and is linked to other RDF data is
LINKED DATA
Linked Data makes the web appear as ONE
GIANT HUGE
GLOBAL
DATABASE!
I can query a database with SQL. Is there a way to query Linked Data
with a query language?
Yes! There is actually a standardized language for that
SPARQL
FIND all the reviews on the book “Programming the Semantic Web”
by people who live in Austin
SPARQL
SELECT ?review ?comment
WHERE {
  isbn:978 ex:hasReview ?review .
  ?review ex:description ?comment .
  ?review ex:hasReviewer ?person .
  ?person ex:livesIn dbpedia:Austin .
}
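To give a feel for how such a query is answered, here is a small Python sketch that matches the four triple patterns of the query above against an in-memory list of triples. It makes several assumptions: the owl:sameAs links are taken as already merged (so the reviewer is a single node, here called `ex:juan`), the predicate is written `ex:livesIn` to match the graph, and the second review is invented purely to show a non-matching case. It is not how a real SPARQL engine is implemented.

```python
TRIPLES = [
    ("isbn:978", "ex:hasReview", "ex:review1"),
    ("ex:review1", "ex:description", "Awesome Book"),
    ("ex:review1", "ex:hasReviewer", "ex:juan"),
    ("ex:juan", "ex:livesIn", "dbpedia:Austin"),
    # Hypothetical second review, added only to show filtering.
    ("isbn:978", "ex:hasReview", "ex:review2"),
    ("ex:review2", "ex:description", "Too dense"),
    ("ex:review2", "ex:hasReviewer", "ex:someone"),
    ("ex:someone", "ex:livesIn", "dbpedia:Dallas"),
]

PATTERNS = [
    ("isbn:978", "ex:hasReview", "?review"),
    ("?review", "ex:description", "?comment"),
    ("?review", "ex:hasReviewer", "?person"),
    ("?person", "ex:livesIn", "dbpedia:Austin"),
]


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if result.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return result


def query(patterns, triples):
    """Join the patterns one by one, keeping only consistent bindings."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


results = [(b["?review"], b["?comment"]) for b in query(PATTERNS, TRIPLES)]
print(results)
```

Only the review whose reviewer lives in Austin survives all four patterns; the variables `?review` and `?comment` are then projected out, as the SELECT clause asks.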
This looks cool, but let’s be realistic. What is the incentive to publish Linked Data on the Web?
What was your incentive to publish an HTML page in 1990?
1) Share data in documents
2) Because your neighbor was doing it
… later on …
3) Marketing, Advertising, …, SEO
So why should we publish Linked Data in 2012?
1) Share data as data
2) Because your neighbor is doing it
… later on …
3) Marketing, Advertising, …, SEO
Linked Data Publishers
US and UK Government
BBC
NY Times
Best Buy
Sears
Kmart
Overstock
… too many more to name
Linked Open Data
http://www.w3.org/DesignIssues/LinkedData.html
May 2007
Oct 2007
Nov 2007
Feb 2008
Mar 2008
Sept 2008
Mar 2009 (1)
Mar 2009 (2)
July 2009
September 2010
September 2011
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
YOU GET THE PICTURE
IT’S BIG and getting
BIGGER and
BIGGER
Part 2: Linked Data Principles
Linked Data is a set of best practices to publish and interlink data on the web
Linked Data Principles
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up (dereference) those names.
3. When someone looks up a URI, provide useful information.
4. Include links to other URIs so that they can discover more things.
1. Use URIs as names for things
http://juansequeda.com/foaf.rdf#me
http://www.w3.org/People/Berners-Lee/card#i
http://xmlns.com/foaf/0.1/knows
http://dbpedia.org/resource/Austin,_Texas
http://xmlns.com/foaf/0.1/based_near
2. Use HTTP URIs so that people can look up (dereference) those names.
An HTTP client can look up the URI using the HTTP protocol and retrieve a description
http://dbpedia.org/resource/Austin,_Texas
What’s with the redirection (303)?
http://upload.wikimedia.org/wikipedia/commons/0/06/AustinSkylineLouNeffPoint-2010-03-29-b.JPG
http://dbpedia.org/page/Austin,_Texas
http://dbpedia.org/resource/Austin,_Texas
  Identifies the abstract concept of “the city of Austin, Texas”
http://dbpedia.org/page/Austin,_Texas (Accept: text/html)
  Identifies an HTML document that describes “the city of Austin, Texas”
http://dbpedia.org/data/Austin,_Texas.xml (Accept: application/rdf+xml)
  Identifies an RDF document that describes “the city of Austin, Texas”
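The redirection can be sketched as a small routing function: a request for the resource URI is never answered directly, but redirected with a 303 to a document chosen from the Accept header. The Python below is a simplified model of that behavior, assuming the `/resource/`, `/page/` and `/data/` path layout shown above; it ignores q-values and other details of real content negotiation, and `negotiate` is a hypothetical name.

```python
def negotiate(path, accept):
    """Return (status, location) for a request to a resource URI path."""
    prefix = "/resource/"
    if not path.startswith(prefix):
        return 404, None
    name = path[len(prefix):]
    # The thing itself cannot be sent over HTTP, so redirect (303 See Other)
    # to a document that describes it, in the format the client asked for.
    if "application/rdf+xml" in accept:
        return 303, f"/data/{name}.xml"
    return 303, f"/page/{name}"


print(negotiate("/resource/Austin,_Texas", "text/html"))
print(negotiate("/resource/Austin,_Texas", "application/rdf+xml"))
```

A browser and a Linked Data client can therefore use the same URI for the city and each receive a description it can read.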
Minting HTTP URIs
If you own the domain name and run a web server at that location, mint URIs in this namespace
I own the domain capsenta.com
I run the webserver http://capsenta.com
I can mint URIs in this namespace
  http://capsenta.com/person/Juan-Sequeda
Cool URIs
Don’t misuse a namespace that you don’t own
  http://www.imdb.com/title
Avoid implementation details
  http://capsenta.com/person.php?id=123&format=rdf
Use natural keys
  http://capsenta.com/person/123
http://www.w3.org/TR/cooluris/
3. When someone looks up a URI, provide useful information.
How do we provide useful information in document form on the web? HTML
How do we provide useful information in data form on the web? RDF
What to publish?
Literal Triples
  <http://dbpedia.org/resource/Austin,_Texas> <http://xmlns.com/foaf/0.1/name> “City of Austin”
Outgoing Link Triples
  <http://dbpedia.org/resource/Austin,_Texas> <http://www.w3.org/2002/07/owl#sameAs> <http://rdf.freebase.com/ns/m/0vzm>
Incoming Link Triples
  <http://dbpedia.org/resource/Dakota_Johnson> <http://dbpedia.org/ontology/birthPlace> <http://dbpedia.org/resource/Austin,_Texas>
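As a sketch of how a server might assemble the description of one resource, the Python below sorts triples into the three groups named on this slide. It assumes, for simplicity, that any object starting with `http://` is a URI and anything else is a literal; the `describe` helper is hypothetical, and the triples are the ones from the slide.

```python
AUSTIN = "http://dbpedia.org/resource/Austin,_Texas"

TRIPLES = [
    (AUSTIN, "http://xmlns.com/foaf/0.1/name", "City of Austin"),
    (AUSTIN, "http://www.w3.org/2002/07/owl#sameAs", "http://rdf.freebase.com/ns/m/0vzm"),
    ("http://dbpedia.org/resource/Dakota_Johnson",
     "http://dbpedia.org/ontology/birthPlace", AUSTIN),
]


def describe(resource, triples):
    """Group the triples that mention `resource` by the role they play."""
    description = {"literal": [], "outgoing": [], "incoming": []}
    for subject, predicate, obj in triples:
        if subject == resource:
            # Simplifying assumption: objects that look like HTTP URIs are links.
            kind = "outgoing" if obj.startswith("http://") else "literal"
            description[kind].append((subject, predicate, obj))
        elif obj == resource:
            description["incoming"].append((subject, predicate, obj))
    return description


description = describe(AUSTIN, TRIPLES)
for kind, found in description.items():
    print(kind, len(found))
```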
What to publish?
Description of the data set
  Semantic Sitemaps
  voiD (Vocabulary of Interlinked Datasets)
Provenance metadata
Licensing information
Vocabularies (or Schemas or Ontologies)
Create your own using RDFS/OWL/SKOS
Reuse vocabularies
  Dublin Core: metadata attributes
  Friend of a Friend (FOAF): persons and relationships
  Semantically Interlinked Online Communities (SIOC): describing users, posts, blogs, etc.
  Description of a Project (DOAP)
  Music Ontology
  Programmes Ontology: TV and radio programs
  Good Relations: describing products and services
  Review Vocabulary
  Basic Geo (WGS84) Vocabulary
4. Include links to other URIs so that they can discover more things.
Set external RDF links into other data sources on the Web
  Subject of the triple is a URI in the namespace of one data set
  Object of the triple is a URI in the namespace of another data set
Connect siloed data islands
Enable discovery
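The definition of an external RDF link can be checked mechanically. The Python sketch below keeps the triples whose subject is in our own data set and whose object is a URI somewhere else, using the host name as a stand-in for "namespace" (a simplifying assumption). The second and third sample triples are hypothetical and only show what gets filtered out.

```python
from urllib.parse import urlparse


def host(term):
    """Return the host part of a URI, or '' for a literal."""
    return urlparse(term).netloc


def external_links(triples, own_host):
    """Keep triples that point from our data set into another one."""
    return [
        (subject, predicate, obj)
        for subject, predicate, obj in triples
        if host(subject) == own_host and host(obj) not in ("", own_host)
    ]


TRIPLES = [
    ("http://juansequeda.com/foaf.rdf#me",
     "http://xmlns.com/foaf/0.1/based_near",
     "http://dbpedia.org/resource/Austin,_Texas"),
    # Hypothetical triples: a literal value and an internal link.
    ("http://juansequeda.com/foaf.rdf#me", "http://xmlns.com/foaf/0.1/name", "Juan Sequeda"),
    ("http://juansequeda.com/foaf.rdf#me",
     "http://xmlns.com/foaf/0.1/knows",
     "http://juansequeda.com/foaf.rdf#friend"),
]

links = external_links(TRIPLES, "juansequeda.com")
print(links)
```

Only the `based_near` triple qualifies, because it is the only one that leaves the publisher's own namespace.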
4) Include links to other things
Relationship Link Triples
  <http://juansequeda.com/foaf.rdf#me> <http://xmlns.com/foaf/0.1/based_near> <http://dbpedia.org/resource/Austin,_Texas>
Identity Link Triples
  <http://dbpedia.org/resource/Austin,_Texas> <http://www.w3.org/2002/07/owl#sameAs> <http://rdf.freebase.com/ns/m/0vzm>
Vocabulary Link Triples
  <http://capsenta.com/vocab/name> <http://www.w3.org/2002/07/owl#equivalentProperty> <http://xmlns.com/foaf/0.1/name>
Which predicate for linking to choose?
Depends on your domain
Is it widely used?
  owl:sameAs
  foaf:knows
  foaf:based_near
  …
If you create your own, relate it to a widely used predicate
Part 3: Linked Data Architectures
Static RDF Files
Small amount of data (personal FOAF file)
Use RDF/XML serialization
Save as .rdf file and upload it to your server
  http://www.capsenta.com/company.rdf
  http://www.capsenta.com/company.rdf#this
Configure MIME types
  AddType application/rdf+xml .rdf
Make RDF discoverable from HTML
  <link rel="alternate" type="application/rdf+xml" href="company.rdf">
RDF in HTML (RDFa)
Another syntax for RDF
Useful if you have template HTML pages
Drupal 7 will do this out of the box
Triplestores (aka RDF db, …)
Commercial
  Oracle, IBM, OntoText (OWLIM), Franz (AllegroGraph), OpenLink (Virtuoso), C&P (Stardog), Ontoprise (OntoBroker), Meronymy
Open Source
  Jena, Sesame, Mulgara, 4Store (Garlik), BigData (Systap)
RDB2RDF
Upcoming W3C RDB2RDF Standards
  R2RML: mapping language
  Direct Mapping: default automatic mapping
Two Approaches
  Dynamic (SPARQL to SQL)
  ETL (Dump RDB to RDF)
Ultrawrap
  Supports W3C standard and more
  SPARQL as fast as SQL
Unstructured to RDF
Unstructured → Entity Extractor → Triplestore
Semi-structured to RDF
Semi-structured → XML2RDF, XLS2RDF, CSV2RDF → Triplestore
RDB to RDF
Relational Database → RDB2RDF ETL → Triplestore
Relational Database → RDB2RDF (SPARQL to SQL)
CMS with RDFa, Semantic Wiki
Creating Linked Data
Type of Data: Structured, Semi-structured, Unstructured
Data Preparation: Entity Extractor; XML2RDF, XLS2RDF, CSV2RDF
Data Storage: RDB, Data source with API, Triplestore
Data Publication: Web Server, Linked Data Interface, RDB2RDF (e.g. Ultrawrap), CMS with RDFa, Semantic Wiki, Custom Linked Data Wrapper
Result: Linked Data
Thanks Heath and Bizer
Consuming Linked Data
Creating Linked Data
Linked Data
Schema Mapping Record Linkage
Data Access
Application
Provenance Tracking
Schema Matching
Renaming
  <ex:name> → <foaf:name>
  owl:equivalentClass and owl:equivalentProperty
  rdfs:subClassOf or rdfs:subPropertyOf
Structural Transformation
  <ex:Juan> <ex:lives> “Austin” .
  becomes
  <ex:Juan> <foaf:based_near> <db:Austin> .
  <db:Austin> <rdfs:label> “Austin” .
SPARQL CONSTRUCT, RIF, R2R
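Both kinds of mapping can be sketched in a few lines of Python. Renaming swaps one predicate for another; the structural transformation replaces a literal place name with a resource and moves the name onto an rdfs:label. The lookup table `PLACES` is a hypothetical assumption (in practice this step is what tools such as SPARQL CONSTRUCT or R2R rules express), and the prefixed names are used as plain strings.

```python
RENAMES = {"ex:name": "foaf:name"}
PLACES = {"Austin": "db:Austin"}  # hypothetical lookup from place label to URI


def map_triple(triple):
    """Return the target-vocabulary triples for one source triple."""
    subject, predicate, obj = triple
    if predicate == "ex:lives" and obj in PLACES:
        place = PLACES[obj]
        # Structural transformation: the literal becomes a resource with a label.
        return [(subject, "foaf:based_near", place), (place, "rdfs:label", obj)]
    # Renaming: swap the predicate if a mapping exists, otherwise keep it.
    return [(subject, RENAMES.get(predicate, predicate), obj)]


def map_all(triples):
    """Map every triple, dropping duplicates while keeping order."""
    mapped = []
    for triple in triples:
        for new_triple in map_triple(triple):
            if new_triple not in mapped:
                mapped.append(new_triple)
    return mapped


source = [("ex:Juan", "ex:name", "Juan"), ("ex:Juan", "ex:lives", "Austin")]
mapped = map_all(source)
for triple in mapped:
    print(triple)
```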
Record Linkage
Different URIs that identify the same thing
Create owl:sameAs links between them
Manual lookup: Sindice
(Semi-)automatic: SILK
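Once owl:sameAs links exist, a consumer usually wants to know which URIs end up naming the same thing. The Python sketch below groups URIs into clusters with a small union-find structure, treating sameAs as symmetric and transitive. The `http://example.org/` URIs are hypothetical; the other URIs appear earlier in the deck. Discovering the links in the first place is the job of tools like the ones listed above and is not shown.

```python
def same_as_clusters(links):
    """Group URIs connected by sameAs links into clusters."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for left, right in links:
        parent[find(left)] = find(right)

    clusters = {}
    for uri in list(parent):
        clusters.setdefault(find(uri), set()).add(uri)
    return list(clusters.values())


LINKS = [
    ("http://dbpedia.org/resource/Austin,_Texas", "http://rdf.freebase.com/ns/m/0vzm"),
    ("http://example.org/reviewer", "http://juansequeda.com/id"),           # hypothetical
    ("http://juansequeda.com/id", "http://juansequeda.com/foaf.rdf#me"),    # hypothetical
]

clusters = same_as_clusters(LINKS)
for cluster in clusters:
    print(sorted(cluster))
```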
Provenance
Keep track of where the data is coming from
  Quality
  Trust
Named Graphs
SPARQL GRAPH
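The idea behind named graphs can be sketched by adding a fourth element to each triple that records which source it came from. In the Python below the graph names `graph:bookstore` and `graph:revyu` and both helper functions are hypothetical; the point is only that provenance questions become simple filters once every statement carries its source.

```python
# Each statement is (subject, predicate, object, graph name).
QUADS = [
    ("isbn:978", "ex:title", "Programming the Semantic Web", "graph:bookstore"),
    ("isbn:978", "ex:hasReview", "ex:review1", "graph:revyu"),
    ("ex:review1", "ex:description", "Awesome Book", "graph:revyu"),
]


def triples_from(quads, graph):
    """Return only the triples asserted in one named graph."""
    return [(s, p, o) for s, p, o, g in quads if g == graph]


def sources_of(quads, subject):
    """List the named graphs that say something about `subject`."""
    return sorted({g for s, _, _, g in quads if s == subject})


print(triples_from(QUADS, "graph:revyu"))
print(sources_of(QUADS, "isbn:978"))
```

An application that only trusts one source can restrict itself to that graph, which is roughly what the GRAPH keyword in SPARQL allows a query to do.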
Centralized
Creating Linked Data → Triplestore → SPARQL → Application
Centralized
Advantages
  Include the datasets that you need
  Complex queries and high performance
  Reasoning
Drawbacks
  Depends on RDF dumps or crawling
  Effort to set up the centralized triplestore
  Queried data may be out of date
Federated
Application → SPARQL → Federator
Federator → SPARQL → Triplestore
Federator → SPARQL → RDB2RDF → Relational Database
Federated
Advantages
  Include the datasets that you need
  Queried data is up to date
Drawbacks
  Requires existence of a SPARQL endpoint
  Effort to set up federator
Linked Traversal
Application → SPARQL → Linked Traversal Query Engine → Linked Data
Linked Data sources: Triplestore, Relational Database via RDB2RDF
Linked Traversal
Advantages
  No need to know the data sources in advance
  Does not depend on the existence of SPARQL endpoints or RDF dumps
  Queried data is up to date
Drawbacks
  Query execution time is slow
  Unsuitable for some queries
  Results may be incomplete
  Still in research
Applications
Linked Data Browsers
  http://browse.semanticweb.org/
Linked Data (Semantic Web) Search Engines
  Falcons, SWSE, VisiNav, Sindice, Sigma, Swoogle, Watson
Search Engines
  Google, Bing, Yahoo!
Faceted Browsers
  http://dbpedia.neofonie.de/browse/
Domain Specific Applications
BBC World Cup
Seevl.net
Linked Life Data
Government apps
Part 4: Linked Enterprise Data
Consume Linked (Open) Data
Publish Linked (Open) Data
Use Linked Data Principles internally
Linked Enterprise Data
Linked Data can be used as an architectural style for integrating data in the Enterprise
1. Standard Data Access Mechanism: HTTP
2. Standard Address & Identifier Scheme: URI
3. Standard Data Model: RDF
Linked Enterprise Data
Information creation information sharing
Produce and consume data specific to your needs but also produce it in a way that it can be connected to other data in the enterprise
Distributed but connected!
Data that you create, may benefit others! Share it!
Benefits of RDF/Linked Data
RDF (graphs) is a least common denominator
  Text, CSV, XML, XLS, RDB to RDF
  Imagine modeling a social network in XML
Dynamic and Flexible
  Adding a column to a table in my RDBMS takes 6 months to authorize!
  With RDF, simply add the triple!
  Incremental
Benefits of RDF/Linked Data
Power of the URI and Links
  Universal identifier
  Create a “foreign key” to a table that I have no control of
Scalability in months, not only seconds
  “More can be done with less and faster”
  “Cooperation without coordination”
What’s next?
W3C Linked Data Platform Working Group
  http://www.w3.org/2012/ldp/charter
Linked Data Basic Profile 1.0
  http://www.w3.org/Submission/ldbp/
Summary
Linked Data Checklist
Does your data link to other data sets?
Do you provide provenance metadata?
Do you provide licensing metadata?
Do you reuse common vocabularies?
Do you map proprietary vocabulary terms to common vocabularies?
Do you provide other access methods?
Thanks Heath & Bizer
Acknowledgements
RiBS Lab – UT Austin
Olaf Hartig – Humboldt University Berlin
Patrick Sinclair – BBC
Jamie Taylor – Google
Tom Heath & Chris Bizer. Linked Data: Evolving the Web into a Global Data Space
David Wood (Ed.). Linking Enterprise Data
Thanks!
Juan F. Sequeda
@juansequeda
Daniel P. Miranker
www.capsenta.com