wael-elrifai
A description of using linked data to create clear and unambiguous systems across the Internet or within your enterprise.
London - New York - Dubai - Mumbai 2011
Dealing with the “new” data in the
“Cloud” – Linked Data
Table of Contents
Definitions
History
The Modigliani Test
Link Data
Raw Data
Resource Description Framework
Linked Data Principles
Publishing Linked Data
Faceted Browsers
On-the-fly Mashups
SPARQL
What is a Linked Data Application
Characteristics of a Linked Data Application
Contact Us
Definitions
RDF: The RDF data model is similar to classic conceptual
modelling approaches such as Entity-Relationship or Class
diagrams, as it is based upon the idea of making statements about
resources (in particular Web resources) in the form of subject-
predicate-object expressions. These expressions are known as
triples in RDF terminology. The subject denotes the resource, and
the predicate denotes traits or aspects of the resource and
expresses a relationship between the subject and the object. For
example, one way to represent the notion "The sky has the colour
blue" in RDF is as the triple: a subject denoting "the sky", a
predicate denoting "has the colour", and an object denoting "blue".
RDF is an abstract model with several serialization formats (i.e.,
file formats), and so the particular way in which a resource or
triple is encoded varies from format to format.
SPARQL: SPARQL Protocol and RDF Query Language (pronounced
"sparkle") is an RDF query language.
Linked Data: Linked Data describes a method of publishing
structured data, so that it can be interlinked and become more
useful. It builds upon standard Web technologies, such as HTTP
and URIs - but rather than using them to serve web pages for
human readers, it extends them to share information in a way that
can be read automatically by computers. This enables data from
different sources to be connected and queried.
History
Linked Data Design Issues by Tim Berners-Lee July 2006
Linked Open Data Project WWW2007
First LOD Cloud May 2007
BBC publishes Linked Data 2008
NY Times announcement SemTech2009 - ISWC09
Data.gov.uk publishes Linked Data 2010
[Figures: LOD cloud snapshots for May 2007, Mar 2008, Sept 2008, Mar 2009, July 2009]
The Modigliani Test
Show me all the locations of all the original paintings
of Modigliani
Daniel Koller (@dakoller) showed that you can find
this with a SPARQL query on DBpedia
So what is Linked Data?
Do you SEARCH or do you FIND?
Search for
Football Players who went to the University of
Texas at Austin, played for the Dallas Cowboys as
Cornerback
Why can’t we just FIND it…
Using the current Web (= Internet + links + docs)
is terribly inefficient
So what is the problem?
We aren’t always interested in documents
• We are interested in THINGS
• These THINGS might be in documents
We can read an HTML document rendered in a browser and find
what we are searching for
• This is hard for computers. It’s typically based on
guesswork from some primitive NLP engine, or simple
keyword search
What do we need to do?
Make it easy for computers/software to find THINGS
How can we do that?
• Besides publishing documents on the web
- which computers can’t understand easily
• Let’s publish something that computers can
understand
RAW DATA!
But don’t we already publish raw data in
RDBMS, XML, CSV, etc?
Yes!
But it’s not in a consistent format, and it is very
difficult to integrate (or “link”).
For example, how do I know that the
Wael Elrifai on Facebook is the same
as the Wael Elrifai on Twitter?
Don’t we already have a standard
way of publishing on the web?
We have a standardized way of
publishing documents on the web, right?
HTML
Then why can’t we have a standard way
of publishing data on the Web?
In fact, we do have one.
Resource Description Framework (RDF)
A data model
• A way to model data
• e.g. relational databases use the relational data model
RDF is a triple data model
Labeled Graph
Subject, Predicate, Object
<Wael> <was born in> <Beirut>
<Beirut> <is part of> <the Lebanon>
<Wael> <likes> <the Semantic Web>
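The three statements above can be held in code as plain (subject, predicate, object) tuples. This is only a minimal sketch of the triple model, not an RDF library; the `match` helper is a hypothetical name introduced here for illustration.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# The three statements from the slide, stored as (subject, predicate, object).
TRIPLES: List[Triple] = [
    ("Wael", "was born in", "Beirut"),
    ("Beirut", "is part of", "the Lebanon"),
    ("Wael", "likes", "the Semantic Web"),
]


def match(triples: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple that agrees with the given terms (None = wildcard)."""
    for triple in triples:
        if s is not None and triple[0] != s:
            continue
        if p is not None and triple[1] != p:
            continue
        if o is not None and triple[2] != o:
            continue
        yield triple


# Everything we know about Wael, and what Beirut is part of.
about_wael = list(match(TRIPLES, s="Wael"))
where_is_beirut = list(match(TRIPLES, s="Beirut", p="is part of"))
```

Because the object of one triple (Beirut) is the subject of another, the tuples already form a small labeled graph.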
RDF can be serialized in different ways
RDF/XML
RDFa (RDF in HTML)
N3
Turtle
JSON
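To make "serialized in different ways" concrete, here is a rough sketch that writes triples in an N-Triples-like line format (one statement per line, IRIs in angle brackets, literals in quotes). The `http://example.org/` namespace and the helper names are assumptions for illustration, and the rule "anything starting with http is an IRI" is a simplification rather than what a real serializer does.

```python
from typing import Iterable, Tuple

EX = "http://example.org/"  # hypothetical namespace, not from the slides


def term(value: str) -> str:
    """Write an IRI as <...> and anything else as a quoted string literal."""
    if value.startswith(("http://", "https://")):
        return "<" + value + ">"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def to_ntriples(triples: Iterable[Tuple[str, str, str]]) -> str:
    """Serialize triples one per line, each terminated by ' .'."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


doc = to_ntriples([
    (EX + "Wael", EX + "wasBornIn", EX + "Beirut"),
    (EX + "Wael", EX + "name", "Wael Elrifai"),
])
print(doc)
```

The same two statements could equally be written as RDF/XML, RDFa or Turtle; only the encoding changes, not the triples.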
So does that mean that I have to
publish my data in RDF now?
You don’t have to… but it sure
would be nice.
Document on the Web
Databases back up documents
ISBN | Title | Author | PublisherID | ReleaseDate
978-0-596-15381-6 | Programming the Semantic Web | Toby Segaran | 1 | July 2009
… | … | … | … | …

PublisherID | PublisherName
1 | O’Reilly Media
… | …
This is a THING: A book titled “Programming the Semantic Web” by Toby Segaran, …
THINGS have PROPERTIES: A book has a title, an author, …
Let’s represent the data in RDF
<book> <title> "Programming the Semantic Web"
<book> <isbn> "978-0-596-15381-6"
<book> <author> "Toby Segaran"
<book> <publisher> <Publisher>
<Publisher> <name> "O’Reilly"
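The step from table rows to a graph can be sketched as follows: each row becomes a node, each column becomes a property, and the PublisherID foreign key becomes a link between nodes. The node names (`isbn…`, `publisher…`) and the function name are assumptions made for this illustration.

```python
# Rows as they might come out of the two relational tables.
BOOKS = [
    {"Isbn": "978-0-596-15381-6",
     "Title": "Programming the Semantic Web",
     "Author": "Toby Segaran",
     "PublisherID": 1},
]
PUBLISHERS = [{"PublisherID": 1, "PublisherName": "O’Reilly Media"}]


def rows_to_triples(books, publishers):
    """Turn each row into a node and each column into a property of that node."""
    triples = []
    for pub in publishers:
        node = f"publisher{pub['PublisherID']}"
        triples.append((node, "name", pub["PublisherName"]))
    for row in books:
        book = f"isbn{row['Isbn']}"
        triples.append((book, "title", row["Title"]))
        triples.append((book, "isbn", row["Isbn"]))
        triples.append((book, "author", row["Author"]))
        # The foreign key is no longer a number to join on: it is a link.
        triples.append((book, "publisher", f"publisher{row['PublisherID']}"))
    return triples


triples = rows_to_triples(BOOKS, PUBLISHERS)
for triple in triples:
    print(triple)
```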
Remember that we are on the web
Everything on the web is identified by a URL
And now let’s link the data to other data
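<http://…/isbn978> <title> "Programming the Semantic Web"
<http://…/isbn978> <isbn> "978-0-596-15381-6"
<http://…/isbn978> <author> "Toby Segaran"
<http://…/isbn978> <publisher> <http://…/publisher1>
<http://…/publisher1> <name> "O’Reilly"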
And now consider the data from Revyu.com
<http://…/isbn978> <hasReview> <http://…/review1>
<http://…/review1> <description> "Awesome Book"
<http://…/review1> <reviewer> <http://…/reviewer>
<http://…/reviewer> <name> "Wael Elrifai"
Let’s start to link data
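<http://…/isbn978> <title> "Programming the Semantic Web"
<http://…/isbn978> <isbn> "978-0-596-15381-6"
<http://…/isbn978> <author> "Toby Segaran"
<http://…/isbn978> <publisher> <http://…/publisher1>
<http://…/publisher1> <name> "O’Reilly"
<http://…/isbn978> <sameAs> <http://…/isbn978> (the book’s URI in the other dataset)
<http://…/isbn978> <hasReview> <http://…/review1>
<http://…/review1> <description> "Awesome Book"
<http://…/review1> <hasReviewer> <http://…/reviewer>
<http://…/reviewer> <name> "Wael Elrifai"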
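What the sameAs link buys you can be sketched in a few lines: once two URIs are declared to name the same THING, the statements made about either one can be folded together. The hosts `books.example` and `reviews.example` are hypothetical stand-ins for the elided URIs, and the small union-find used here is just one possible way to merge identities, not a description of how any particular triple store does it.

```python
SAME_AS = "sameAs"

# Hypothetical URIs standing in for the two datasets' names for the same book.
BOOK_A = "http://books.example/isbn978"
BOOK_B = "http://reviews.example/isbn978"

TRIPLES = [
    (BOOK_A, "title", "Programming the Semantic Web"),
    (BOOK_A, "author", "Toby Segaran"),
    (BOOK_B, "hasReview", "http://reviews.example/review1"),
    ("http://reviews.example/review1", "description", "Awesome Book"),
    (BOOK_A, SAME_AS, BOOK_B),
]


def merge_same_as(triples):
    """Rewrite every term to one canonical name per sameAs group."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == SAME_AS:
            root_s, root_o = find(s), find(o)
            if root_s != root_o:
                # Keep the alphabetically smaller URI as the canonical name.
                parent[max(root_s, root_o)] = min(root_s, root_o)

    return {(find(s), p, find(o)) for s, p, o in triples if p != SAME_AS}


merged = merge_same_as(TRIPLES)
```

After merging, the title, the author and the review all hang off a single node, which is what lets one query span both datasets.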
Data on the Web that is in RDF and
is linked to other RDF data is
LINKED DATA
Linked Data Principles
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up
(dereference) those names.
3. When someone looks up a URI, provide
useful information.
4. Include links to other URIs so that they can
discover more things.
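Principles 2 and 3 amount to an ordinary HTTP request that asks for data rather than a web page. A minimal sketch using Python's standard library is shown below; the choice of Accept header values and the DBpedia resource URI are assumptions for illustration, and the actual network call is left commented out.

```python
import urllib.request

# Ask for RDF (Turtle preferred, RDF/XML as a fallback) instead of HTML.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def build_lookup_request(uri: str) -> urllib.request.Request:
    """Prepare a GET request that dereferences a Linked Data URI."""
    if not uri.startswith(("http://", "https://")):
        raise ValueError("Linked Data names should be HTTP URIs: " + uri)
    return urllib.request.Request(uri, headers={"Accept": RDF_ACCEPT})


request = build_lookup_request("http://dbpedia.org/resource/London")

# To actually look the name up (needs network access):
# with urllib.request.urlopen(request) as response:
#     rdf_text = response.read().decode("utf-8")
```

The returned description would itself contain further URIs (principle 4), each of which can be looked up the same way.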
Linked Data makes the web appear as
a single global database!
The same can be done inside your company!
What if you wanted to know your company’s
EBITDA for Catalonia in 2010?
You could have an EDW pre-aggregate and
distribute the data, have an analyst calculate it on
the spot, or…
Linked data in your internal semantic
web could relate all transactions to
linked financial formulae!
You ask the question, tell your system
where to look (as part of the question,
this can be prebuilt) and voilà!
I can query a database with SQL. Is
there a way to query Linked Data with a
query language?
Yes! There is actually a standardized
language for that: SPARQL
FIND all the reviews on the book
“Programming the Semantic Web”
by people who live in London
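<http://…/isbn978> <title> "Programming the Semantic Web"
<http://…/isbn978> <isbn> "978-0-596-15381-6"
<http://…/isbn978> <author> "Toby Segaran"
<http://…/isbn978> <publisher> <http://…/publisher1>
<http://…/publisher1> <name> "O’Reilly"
<http://…/isbn978> <sameAs> <http://…/isbn978> (the book’s URI in the other dataset)
<http://…/isbn978> <hasReview> <http://…/review1>
<http://…/review1> <description> "Awesome Book"
<http://…/review1> <hasReviewer> <http://…/reviewer>
<http://…/reviewer> <name> "Wael Elrifai"
<http://…/reviewer> <sameAs> <http://waelworldwide.com>
<http://waelworldwide.com> <name> "Wael Elrifai"
<http://waelworldwide.com> <livesIn> <http://dbpedia.org/London>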
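The FIND question above is, in essence, a set of triple patterns with variables that must all match at once. The sketch below imitates that idea over an in-memory list of triples; it is not SPARQL and not a real query engine. The `books.example` and `reviews.example` hosts are hypothetical stand-ins for the elided URIs, and the book is assumed to have a single URI already (its sameAs link resolved).

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical URIs; the slides elide the real hosts.
BOOK = "http://books.example/isbn978"
REVIEW = "http://reviews.example/review1"
REVIEWER = "http://reviews.example/reviewer"
PERSON = "http://waelworldwide.com"
LONDON = "http://dbpedia.org/London"

GRAPH: List[Triple] = [
    (BOOK, "title", "Programming the Semantic Web"),
    (BOOK, "hasReview", REVIEW),
    (REVIEW, "description", "Awesome Book"),
    (REVIEW, "hasReviewer", REVIEWER),
    (REVIEWER, "sameAs", PERSON),
    (PERSON, "livesIn", LONDON),
]


def unify(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend the binding so the pattern equals the triple, or return None."""
    extended = dict(binding)
    for want, have in zip(pattern, triple):
        if want.startswith("?"):  # a variable: bind it, or check the old value
            if extended.setdefault(want, have) != have:
                return None
        elif want != have:        # a constant: must match exactly
            return None
    return extended


def query(graph: List[Triple], patterns: List[Triple]) -> List[Binding]:
    """Return every variable binding that satisfies all patterns together."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = unify(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


results = query(GRAPH, [
    ("?book", "title", "Programming the Semantic Web"),
    ("?book", "hasReview", "?review"),
    ("?review", "hasReviewer", "?reviewer"),
    ("?reviewer", "sameAs", "?person"),
    ("?person", "livesIn", LONDON),
    ("?review", "description", "?text"),
])
reviews = [row["?text"] for row in results]
```

Note that the answer depends on following links across three sources: the book data, the review data, and the reviewer's own page that says where he lives.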
This looks cool, but let’s be realistic.
What is the incentive to publish
Linked Data?
What was your incentive to publish
an HTML (Intranet) page in 1990?
1) Share data in documents
2) Because your neighbor was doing it
So why should we publish
Linked Data in 2011?
1) Share data as data
2) Because your neighbor is doing it
You’ll be among good company…
Linked Data Publishers
UK Government
US Government
BBC
Open Calais – Thomson Reuters
Freebase
NY Times
Best Buy
CNET
DBpedia
How can I publish Linked Data?
Publishing Linked Data
• Legacy Data in Relational Databases
• D2R Server
• Virtuoso
• Triplify
• Ultrawrap
• CMS
• Drupal 7
• Native RDF Stores
• Databases for RDF (Triple Stores)
• AllegroGraph, Jena, Sesame, Virtuoso
• Talis Platform (Linked Data in the Cloud)
• In HTML with RDFa
Consuming Linked Data by Humans
HTML Browsers
RDF can be serialized in RDFa
Have you heard of
•Yahoo’s SearchMonkey
•Google Rich Snippets?
They are consuming RDFa
But WHY?
Because there is life beyond ten
blue links
Google and Yahoo are starting to crawl
RDFa!
The Semantic Web is a reality!
The Reality
• Yahoo is crawling data that is in RDFa and
Microformats under specific vocabularies
• FOAF
• GoodRelations
• Google is crawling RDFa and Microformats that
use the Google vocabulary
Linked Data Browsers
Tabulator
•http://www.w3.org/2005/ajar/tab
OpenLink
•http://ode.openlinksw.com/
Zitgist DataViewer
•http://dataviewer.zitgist.com/
Marbles
•http://www5.wiwiss.fu-berlin.de/marbles/
Explorator
•http://www.tecweb.inf.puc-rio.br/explorator
Faceted Browsers
http://dbpedia.neofonie.de
http://dev.semsol.com/2010/semtech/
On-the-fly Mashups
http://sig.ma
What’s next?
Time to create new and innovative
ways to interact with Linked Data
This may be one of the Killer Apps that we have all been
waiting for
http://en.wikipedia.org/wiki/File:Mosaic_browser_plaque_ncsa.jpg
Where can I find SPARQL Endpoints?
DBpedia: http://dbpedia.org/sparql
MusicBrainz: http://dbtune.org/musicbrainz/sparql
U.S. Census: http://www.rdfabout.com/sparql
Semantic Crunchbase: http://cb.semsol.org/sparql
http://esw.w3.org/topic/SparqlEndpoints
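An endpoint is just a URL that accepts a query. Under the SPARQL protocol a query can be sent as the `query` parameter of an HTTP GET, so preparing a request is mostly a matter of URL encoding. The sketch below only builds the URL and does not contact the endpoint; the sample query and the helper name are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def sparql_get_url(endpoint: str, query: str) -> str:
    """Attach a SPARQL query to an endpoint URL as the 'query' parameter."""
    separator = "&" if "?" in endpoint else "?"
    return endpoint + separator + urlencode({"query": query})


QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"
url = sparql_get_url("http://dbpedia.org/sparql", QUERY)

# Round-trip check: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(url)
```

Fetching `url` with any HTTP client would then return the result set, in whatever formats the endpoint supports.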
• Querying a single dataset is quite boring
compared to:
• Issuing SPARQL queries over multiple datasets
• How can you do this?
1. Issue follow-up queries to different endpoints
2. Query a central collection of datasets
3. Build a store with copies of relevant datasets
4. Use a query federation system
Follow-up Queries
• Idea: issue follow-up queries over other
datasets based on results from previous
queries
• Substituting placeholders in query templates
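The follow-up idea can be sketched with a query template whose placeholder is filled from the results of an earlier query. Everything here is illustrative: the template text, the shape of the `first_results` rows and the example URIs are assumptions, and a real application would send each generated query to a second endpoint.

```python
from string import Template

# Query template with one placeholder ($uri) to be filled per result row.
FOLLOW_UP = Template("SELECT ?p ?o WHERE { <$uri> ?p ?o } LIMIT 10")

# Characters that must not appear inside <...> in a query.
FORBIDDEN = set('<>"{}|^`\\ \n\t')


def follow_up_queries(results, variable):
    """Build one follow-up query per row of an earlier result set."""
    queries = []
    for row in results:
        uri = row[variable]
        if FORBIDDEN & set(uri):
            raise ValueError("not a safe URI for substitution: " + repr(uri))
        queries.append(FOLLOW_UP.substitute(uri=uri))
    return queries


# Hypothetical rows returned by a first query against one dataset.
first_results = [
    {"person": "http://waelworldwide.com"},
    {"person": "http://example.org/people/alice"},
]
queries = follow_up_queries(first_results, "person")
for q in queries:
    print(q)
```

Checking the substituted value before building the query matters, since result data from one dataset is being pasted into a query for another.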
Getting Started
• Finding URIs
• Finding Additional Data
• Finding SPARQL Endpoints
What is a Linked Data application?
Software system that makes use of data on the
web from multiple datasets AND that benefits
from links between the datasets
Characteristics of Linked Data Applications
• Consume data that is published on the web following
the Linked Data principles
• Discover further information by following the links
between different data sources
• Combine the consumed linked data with data from
other sources (not necessarily Linked Data)
• Expose the combined data back to the web
following the Linked Data principles
• Offer value to end-users
Examples
• http://data-gov.tw.rpi.edu/wiki
• http://dbrec.net/
• http://fanhu.bz/
• http://data.nytimes.com/schools/schools.html
• http://sig.ma
• http://visinav.deri.org/semtech2010/
Hot Research Topics
• Interlinking Algorithms
• Provenance and Trust
• Dataset Dynamics
• UI
• Distributed Query
Contact
PEAK Consulting
Headquarters
90 Long Acre, Covent Garden
London WC2E 9RZ
United Kingdom
Tel: +44 (0)207 849 3422
Fax: +44 (0)207 990 9478
United States
11 Penn Plaza, 5th floor
New York, NY 1000
United States
Tel: +1 (212) 946 4824
Fax: +1 (212) 946 2801
United Arab Emirates
Unit P12 Rimal, The Walk
PO Box 487 177 Dubai
United Arab Emirates
Tel: +44 (0)207 849 3422
Fax: +44 (0)207 990 9478
http://www.peakconsulting.eu