Beyond the Visionaries: Taking Linked Data Architecture to the Next Level
Richard Cyganiak
Semantic Web Conference, 7 March 2014
“The Semantic Web”, Scientific American, 2001 (Berners-Lee, Hendler, Lassila)
“The Semantic Web will bring structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users. […] If properly designed, the Semantic Web can assist the evolution of human knowledge as a whole.”
“The Semantic Web, in naming every concept simply by a URI, lets anyone express new concepts that they invent with minimal effort. Its unifying logical language will enable these concepts to be progressively linked into a universal Web. This structure will open up the knowledge and workings of humankind to meaningful analysis by software agents, providing a new class of tools by which we can live, work and learn together.”
Uptake 2014?
[Figure: Linking Open Data cloud diagram, as of September 2011. Datasets are grouped by domain: Media, Geographic, Publications, Government, Cross-domain, Life sciences, User-generated content.]
But…
• Uptake so far only in some areas
• Still driven by “early adopters”
• No killer app
Technology Adoption Lifecycle
Source: http://www.biznology.com/2013/07/why-your-social-business-platform-doesnt-have-100-adoption/
Geoffrey A. Moore, Crossing the Chasm
Early adopters want technology & performance
Early majority wants solutions & convenience
The Web of Linked Data in engineering terms
• A single global database
• Anyone can query from anywhere
• Decentralized; anyone with a web server can play
• Anyone can say anything about anything
• Maybe let’s start simple: read-only, open data
Architecture
Source: http://www.w3.org/2005/Talks/1107-iswc-tbl/
Architecture: Publishers
Source: http://www.w3.org/2005/Talks/1107-iswc-tbl/
Architecture
Source: http://www.w3.org/2001/sw/
Why RDF?
• Graph data model!
• The real world doesn’t fit in tables.
• Networks are everywhere.
• Graphs merge easily; tables and trees don’t!
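To make “graphs merge easily” concrete: if an RDF graph is treated as a set of (subject, predicate, object) triples, merging graphs from different publishers is plain set union, and statements about the same URI connect without any schema alignment. The sketch below assumes plain Python tuples instead of a real RDF library, and the example.org URIs are hypothetical. (A full RDF merge must also keep blank nodes from different graphs apart; this sketch uses only URIs and literals.)

```python
# Each RDF triple is modelled as a (subject, predicate, object) tuple;
# a graph is simply a set of such triples. Literals are written in quotes.
Triple = tuple[str, str, str]

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"

# Hypothetical data from two independent publishers that happen to use
# the same URI for Alice.
publisher_a: set[Triple] = {
    ("http://example.org/alice", FOAF_NAME, '"Alice"'),
}
publisher_b: set[Triple] = {
    ("http://example.org/alice", FOAF_KNOWS, "http://example.org/bob"),
    ("http://example.org/bob", FOAF_NAME, '"Bob"'),
}


def merge(*graphs: set[Triple]) -> set[Triple]:
    """Merge graphs by set union: no join keys or matching schemas needed."""
    merged: set[Triple] = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph: set[Triple], subject: str) -> list[tuple[str, str]]:
    """Return all (predicate, object) pairs for one subject, sorted."""
    return sorted((p, o) for s, p, o in graph if s == subject)


merged = merge(publisher_a, publisher_b)
for predicate, obj in describe(merged, "http://example.org/alice"):
    print(predicate, obj)
```

Combining two relational tables the same way would require matching columns or an explicit join; here the shared URI does that work.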
The “Early Majority” wants convenience
• Most existing data is in Excel and relational (SQL) databases.
• Developers love JSON and have trouble with RDF.
• The most common import/export format by far is CSV.
• There is little immediate benefit from publishing LOD.
Data publishing with immediate use: Exhibit
Source: http://web.mit.edu/newsoffice/2011/data-visualization-loc.html
Lessons from Exhibit
Good
• Uses a popular format (JSON) for publishing
• Immediate benefit to the publisher (an attractive “data exhibit” that lets users explore the data)
Bad
• No “mashing up” of data from multiple sources
• Data format is not designed for re-use of data
• An Exhibit is a “data island”: no connections, no links
In praise of CSV
The lowest common denominator
Why it’s great…
• Dead simple
• Edit with Excel or Google Spreadsheets
• Clean up with OpenRefine
• Import into any SQL database
• Export from many business apps
• Visualize/chart with Excel, etc.
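As one example of “import into any SQL database”, a CSV file can be loaded into SQLite with nothing but Python’s standard library. This is a minimal sketch: the table name, the `load_csv_into_sqlite` helper and the sample rows are hypothetical, and every cell is stored as text because CSV itself carries no column types.

```python
import csv
import io
import sqlite3


def quote_ident(name: str) -> str:
    """Quote a string as an SQL identifier, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def load_csv_into_sqlite(conn: sqlite3.Connection, table: str, csv_file) -> None:
    """Create `table` from the CSV header row and insert every data row.

    Columns are declared without types, so SQLite keeps each cell as the
    text string that was in the CSV.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(quote_ident(h) for h in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f"CREATE TABLE {quote_ident(table)} ({columns})")
    conn.executemany(
        f"INSERT INTO {quote_ident(table)} VALUES ({placeholders})", reader
    )
    conn.commit()


# Hypothetical sample data standing in for a spreadsheet export.
products_csv = io.StringIO("id,label,price\n1,Widget,2.50\n2,Gadget,4.00\n")

conn = sqlite3.connect(":memory:")
load_csv_into_sqlite(conn, "products", products_csv)
# Types have to be recovered at query time, e.g. with CAST.
rows = conn.execute(
    "SELECT label FROM products WHERE CAST(price AS REAL) > 3"
).fetchall()
print(rows)
```

The need for `CAST` in the query already hints at the weaknesses listed next: the file says nothing about what its columns mean.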
…and why it’s not so great
• Nowhere to put metadata
• Not self-descriptive
• No way to address, identify or link records
• No specification of character encoding
• Many different variants and dialects (TSV, semicolons)
• Lots of bad CSV out there
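The dialect problem means a consumer often has to *guess* how a file is delimited. Python’s `csv.Sniffer` does exactly that, heuristically, from a sample. The two inputs below are hypothetical: the same table as a semicolon-separated export (common where the comma is the decimal separator) and as TSV.

```python
import csv
import io

# Same hypothetical table in two dialects.
SEMICOLON_CSV = "name;price\nWidget;2,50\nGadget;4,00\n"
TSV = "name\tprice\nWidget\t2.50\nGadget\t4.00\n"


def read_rows(text: str) -> list[list[str]]:
    """Guess the delimiter from the text itself, then parse with it.

    Only comma, semicolon and tab are considered as candidates.
    """
    dialect = csv.Sniffer().sniff(text, delimiters=",;\t")
    return list(csv.reader(io.StringIO(text), dialect))


for sample in (SEMICOLON_CSV, TSV):
    print(read_rows(sample))
```

Because the sniffer works from character frequencies, it can guess wrong on short or irregular files; nothing in the CSV itself settles the question, and the same goes for the character encoding.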
Architecture, costs & benefits
Source: http://www.w3.org/2005/Talks/1107-iswc-tbl/
Architecture: RDF+CSV
[Diagram: the architecture above, extended with CSV sources, a CSV-to-RDF converter and declarative CSV-to-RDF mappings.]
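A minimal sketch of the converter-plus-mappings idea: the mapping is data (a URI template for each row’s subject and a predicate for each column), and one generic converter applies any such mapping to any CSV file, emitting N-Triples. The mapping format, the `csv_to_ntriples` function and the example.org URIs are hypothetical; this is not the syntax of CSVW, Linked CSV or Tarql.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical declarative mapping: how to mint a subject URI per row,
# and which predicate each column maps to.
MAPPING = {
    "subject_template": "http://example.org/product/{id}",
    "columns": {
        "label": "http://www.w3.org/2000/01/rdf-schema#label",
        "price": "http://example.org/vocab#price",
    },
}


def escape_literal(value: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def csv_to_ntriples(csv_file, mapping: dict) -> list[str]:
    """Apply the mapping to every CSV row and return N-Triples lines."""
    triples = []
    for row in csv.DictReader(csv_file):
        # Percent-encode cell values before substituting them into the URI.
        safe_row = {k: quote(v or "", safe="") for k, v in row.items() if k is not None}
        subject = mapping["subject_template"].format(**safe_row)
        for column, predicate in mapping["columns"].items():
            value = row.get(column) or ""
            if not value:
                continue  # empty cell: emit no triple
            triples.append(f'<{subject}> <{predicate}> "{escape_literal(value)}" .')
    return triples


data = io.StringIO("id,label,price\n1,Widget,2.50\n2,Gadget,4.00\n")
triples = csv_to_ntriples(data, MAPPING)
print("\n".join(triples))
```

Keeping the mapping separate from the converter means publishers can keep their CSV workflow unchanged and only write (or reuse) a mapping.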
“CSV on the Web” Working Group at W3C
https://www.w3.org/2013/csvw/
Linked CSV (Jeni Tennison)
http://jenit.github.io/linked-csv/
Tarql (DERI/Insight work)
https://github.com/cygri/tarql
Using SPARQL as a CSV-to-RDF mapping language
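The evaluation model behind “SPARQL as a mapping language” can be sketched in a few lines: each CSV row becomes one solution, with one variable per column header, and a CONSTRUCT-style template is instantiated once per row. This Python sketch only emulates that model for illustration; it is not Tarql and does not parse SPARQL. The template, the computed `?uri` variable (the role a `BIND` clause would play in a query) and the example.org URIs are assumptions, and treating empty cells as unbound is a choice made in this sketch.

```python
import csv
import io

# CONSTRUCT-style template: triple patterns over ?variables, written as
# tuples of N-Triples terms rather than in SPARQL syntax.
TEMPLATE = [
    ("?uri", "<http://www.w3.org/2000/01/rdf-schema#label>", "?label"),
    ("?uri", "<http://example.org/vocab#price>", "?price"),
]


def literal(value: str) -> str:
    """Format a cell value as an N-Triples string literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def bind_row(row: dict) -> dict:
    """One CSV row -> one solution mapping ?<header> to a literal.

    Empty cells stay unbound. ?uri is computed from the (assumed
    URI-safe) id column, using a hypothetical URI pattern.
    """
    solution = {f"?{name}": literal(value) for name, value in row.items() if value}
    solution["?uri"] = f"<http://example.org/product/{row['id']}>"
    return solution


def instantiate(template, solution) -> list[str]:
    """Fill in the template; as in SPARQL CONSTRUCT, skip any triple
    that would still contain an unbound variable."""
    triples = []
    for pattern in template:
        terms = [solution.get(term, term) for term in pattern]
        if not any(term.startswith("?") for term in terms):
            triples.append(" ".join(terms) + " .")
    return triples


def tarql_like(csv_file, template) -> list[str]:
    triples = []
    for row in csv.DictReader(csv_file):
        triples.extend(instantiate(template, bind_row(row)))
    return triples


result = tarql_like(io.StringIO("id,label,price\n1,Widget,2.50\n2,Gadget,\n"), TEMPLATE)
print("\n".join(result))
```

The appeal of this approach is that the mapping language is one that Linked Data developers already know.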
CSV-LD proposal (Gregg Kellogg)
• Re-use JSON-LD’s contexts
• Embed a context in the CSV file
• Or link to the context file in the 2nd line
https://www.w3.org/2013/csvw/wiki/CSV-LD
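Why re-using a JSON-LD context works for CSV: if a context maps each column header to an IRI, every row can be read as a JSON-LD node object unchanged. This sketch does not reproduce the CSV-LD proposal’s own syntax for embedding or linking the context; it only wraps parsed rows in a JSON-LD document. `@base` and the `"id": "@id"` keyword alias are standard JSON-LD features; the context, the `csv_to_jsonld` helper and the example.org URIs are hypothetical.

```python
import csv
import io
import json

# Hypothetical JSON-LD context for a product CSV: column headers become
# JSON-LD terms, and the "id" column is aliased to @id, resolved
# against @base.
CONTEXT = {
    "@base": "http://example.org/product/",
    "id": "@id",
    "label": "http://www.w3.org/2000/01/rdf-schema#label",
    "price": "http://example.org/vocab#price",
}


def csv_to_jsonld(csv_file, context: dict) -> dict:
    """Wrap every CSV row, with its original keys, in one JSON-LD
    document that uses the shared context. Empty cells are dropped."""
    rows = [{k: v for k, v in row.items() if v}
            for row in csv.DictReader(csv_file)]
    return {"@context": context, "@graph": rows}


doc = csv_to_jsonld(io.StringIO("id,label,price\n1,Widget,2.50\n2,Gadget,4.00\n"), CONTEXT)
print(json.dumps(doc, indent=2))
```

Read by a JSON-LD processor, the first row states that `http://example.org/product/1` has the `rdfs:label` "Widget", without the CSV itself changing.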
Clients and Servers
Back in 1996…
• On the early World Wide Web (WWW), there was a single dominant client (Netscape)
80% market share in 1996
Clients for the Web of Data
• Sometimes only SPARQL
• Sometimes a data portal custom-built for a specific dataset and use case
• Generic RDF browsers like Tabulator, Marbles, Disco, LinkSailor
• Many very different ideas!
We still don’t know what a client for the Web of Linked Data really is.
• Data publishers don’t know how to test their data.
• Data publishers don’t get immediate benefit from publishing.
• Client development is splintered.
• Improving the architecture is hard because we don’t know the use case.
Architecture
Source: http://www.w3.org/2005/Talks/1107-iswc-tbl/
Double Bus and Mash-up Sites
The generic client (“Netscape”) for the Web of Data could be a mash-up engine
Features
• Engine for data mash-up sites
• Can work on arbitrary datasets (not hard-coded)
• May cache data locally for performance
• SPARQL over all mash-up data
• Full-text search over all mash-up data
• Plug-ins for advanced user interfaces
• Many systems are already 75% there!
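A toy sketch of the core of such an engine: a local cache of triples from many sources, queries across all of them, and full-text search over literal values. Everything here is hypothetical and deliberately small: `MashupStore` is not an existing system, `match` supports a single triple pattern (a tiny subset of what SPARQL offers), and the search is a naive case-insensitive substring scan rather than a real full-text index. Literals are marked by a leading quote character.

```python
from dataclasses import dataclass, field
from typing import Optional

Triple = tuple[str, str, str]

LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
ABOUT = "http://schema.org/about"
REVIEW_BODY = "http://schema.org/reviewBody"


@dataclass
class MashupStore:
    """Hypothetical mash-up engine core: cached sources, queried together."""
    sources: dict[str, set[Triple]] = field(default_factory=dict)

    def add_source(self, name: str, triples: set[Triple]) -> None:
        # Cache the data locally; re-adding a source replaces its old copy.
        self.sources[name] = set(triples)

    def all_triples(self) -> set[Triple]:
        merged: set[Triple] = set()
        for triples in self.sources.values():
            merged |= triples
        return merged

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> list[Triple]:
        """Match one triple pattern across all sources; None is a wildcard."""
        return sorted(t for t in self.all_triples()
                      if (s is None or t[0] == s)
                      and (p is None or t[1] == p)
                      and (o is None or t[2] == o))

    def search(self, text: str) -> list[Triple]:
        """Naive full-text search: substring match over literal objects."""
        needle = text.lower()
        return sorted(t for t in self.all_triples()
                      if t[2].startswith('"') and needle in t[2].lower())


store = MashupStore()
store.add_source("products.csv", {
    ("http://example.org/product/1", LABEL, '"Widget"'),
})
store.add_source("reviews", {
    ("http://example.org/review/9", ABOUT, "http://example.org/product/1"),
    ("http://example.org/review/9", REVIEW_BODY, '"Great widget"'),
})
print(store.search("widget"))
print(store.match(o="http://example.org/product/1"))
```

The point of the design is that nothing in the store is specific to one dataset: any source that can be turned into triples (including converted CSV) becomes queryable together with the rest.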
Technology Adoption Lifecycle
Source: http://www.biznology.com/2013/07/why-your-social-business-platform-doesnt-have-100-adoption/
Geoffrey A. Moore, Crossing the Chasm
Summary
• Embrace convenient, popular formats like CSV
• The “early majority” will publish data if there is an immediate benefit
• Focus on data integrators, not just data publishers
• A generic mash-up engine could solve several architectural problems