Data Science meets Linked Data
Alasdair J G Gray
http://www.alasdairjggray.co.uk
@[email protected]
SICSA Data Science Theme Launch
3 July 2014
BBC World Cup
BBC Linked Data Platform
Olympics 2012
Linking Data
Linked Data Principles
1. Global ID: URI
2. Resolvable ID
3. Useful content: HTML for humans, RDF for machines
4. Link to other resources
Like the Web, but for data!
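A minimal sketch of the four principles, assuming a hypothetical resource URI under `example.org` and a hypothetical link target under `example.com`. Principle 3 is shown as simplified content negotiation on the HTTP `Accept` header (Turtle RDF for machines, HTML for humans); principle 4 is an `owl:sameAs` link. It is an illustration only: it ignores q-values and is not an HTTP server.

```python
# Minimal sketch of the four Linked Data principles (standard library only).
# Both URIs are hypothetical placeholders, not real resources.
RESOURCE_URI = "http://example.org/id/drug/imatinib-mesylate"  # 1. global ID (a URI)
OTHER_URI = "http://example.com/compound/42"                   # 4. another dataset's URI
LABEL = "Imatinib Mesylate"

SUPPORTED = ("text/turtle", "text/html")


def choose_representation(accept_header: str) -> str:
    """2./3. When the URI is resolved, pick RDF (Turtle) for machines or HTML for humans.

    Simplified content negotiation: returns the first supported media type in the
    order listed and ignores q-values.
    """
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type in SUPPORTED:
            return media_type
    return "text/html"  # fall back to the human-readable page


def render(media_type: str) -> str:
    """3./4. Useful content about the resource, including a link to another resource."""
    if media_type == "text/turtle":
        return (
            "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
            "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
            f'<{RESOURCE_URI}> rdfs:label "{LABEL}" ;\n'
            f"    owl:sameAs <{OTHER_URI}> .\n"
        )
    return f'<html><body><h1>{LABEL}</h1><a href="{OTHER_URI}">same as</a></body></html>'


print(render(choose_representation("text/turtle, text/html;q=0.5")))
```

The same URI serves both audiences; only the representation changes, which is what makes the data "like the Web".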
“RDF and OWL do not solve the interoperability problem, they just lay it bare on the table!”
Challenge 1: Matching
Administrative Data Research Centre - Scotland | Alasdair J G Gray | 3 July 2014
John Grant, Fisherman
Fiona Sinclair
Ian Grant, Smithy, Born: 1861
Stuart Adam, Wheelwright
Morag Scott
Flora Adam, Seamstress, Born: 1866
Married: 1884
John Grant, Farmer
Fiona Grant
Iain Grant, Born: 1860
Messy data
Probabilistic matches
Schema matching
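To make "probabilistic matches" concrete, a minimal sketch that scores candidate pairs from the records above on name and birth year. The weights, the five-year tolerance and the 0.7 threshold are illustrative assumptions, and this is a simple weighted similarity, not a full probabilistic record-linkage model such as Fellegi-Sunter.

```python
from difflib import SequenceMatcher

# Hypothetical weights and threshold, chosen for illustration only.
NAME_WEIGHT = 0.7
YEAR_WEIGHT = 0.3
YEAR_TOLERANCE = 5       # years apart at which the birth-year score drops to 0
MATCH_THRESHOLD = 0.7


def name_similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; tolerates spelling variants such as Ian/Iain."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def year_similarity(a: int, b: int) -> float:
    """1.0 for identical years, falling linearly to 0 at YEAR_TOLERANCE years apart."""
    return max(0.0, 1.0 - abs(a - b) / YEAR_TOLERANCE)


def match_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted combination of name and birth-year similarity."""
    return (NAME_WEIGHT * name_similarity(rec_a["name"], rec_b["name"])
            + YEAR_WEIGHT * year_similarity(rec_a["born"], rec_b["born"]))


ian = {"name": "Ian Grant", "born": 1861}
iain = {"name": "Iain Grant", "born": 1860}
flora = {"name": "Flora Adam", "born": 1866}

for other in (iain, flora):
    score = match_score(ian, other)
    verdict = "probable match" if score >= MATCH_THRESHOLD else "non-match"
    print(f"{ian['name']} vs {other['name']}: {score:.2f} ({verdict})")
```

The output is a score, not a yes/no answer: "Ian Grant, 1861" and "Iain Grant, 1860" are only probably the same person, and that uncertainty has to travel with the link.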
Gleevec® = Imatinib Mesylate
Drugbank
ChemSpider
PubChem
Imatinib Mesylate
Imatinib Mesylate
YLMAHDNUQAMNNX-UHFFFAOYSA-N
Challenge 2: Reusing mappings
Link: skos:closeMatch (Reason: non-salt form)
Link: skos:exactMatch (Reason: drug name)
Link: owl:sameAs
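One way to make mappings reusable is to store the justification with each link, so each consumer can decide which links suit its task (for example, whether the non-salt form should count as the same compound). A minimal sketch, assuming hypothetical `drugbank:`, `chemspider:` and `pubchem:` identifiers and an empty reason for the `owl:sameAs` link, since none is given. It follows chains only for `owl:sameAs` and `skos:exactMatch`, which are transitive; `skos:closeMatch` is not transitive in SKOS, so close matches are added one hop only.

```python
from dataclasses import dataclass

SKOS = "http://www.w3.org/2004/02/skos/core#"
OWL = "http://www.w3.org/2002/07/owl#"


@dataclass(frozen=True)
class Link:
    source: str
    target: str
    predicate: str
    reason: str = ""  # justification recorded alongside the mapping


# Hypothetical identifiers standing in for the records on the slide.
LINKS = [
    Link("drugbank:imatinib-mesylate", "chemspider:imatinib", SKOS + "closeMatch", "non-salt form"),
    Link("drugbank:imatinib-mesylate", "chemspider:imatinib-mesylate", SKOS + "exactMatch", "drug name"),
    Link("chemspider:imatinib-mesylate", "pubchem:imatinib-mesylate", OWL + "sameAs"),
]

TRANSITIVE = {OWL + "sameAs", SKOS + "exactMatch"}  # symmetric and transitive


def neighbours(node: str, links: list, predicates: set) -> set:
    """Identifiers directly linked to `node` by one of `predicates`, in either direction."""
    found = set()
    for link in links:
        if link.predicate in predicates:
            if link.source == node:
                found.add(link.target)
            elif link.target == node:
                found.add(link.source)
    return found


def equivalents(start: str, links: list) -> set:
    """All identifiers reachable from `start` via transitive equivalence links."""
    seen, frontier = {start}, [start]
    while frontier:
        for other in neighbours(frontier.pop(), links, TRANSITIVE) - seen:
            seen.add(other)
            frontier.append(other)
    return seen


def related(start: str, links: list, include_close: bool) -> set:
    """Equivalents, optionally widened by one hop of skos:closeMatch."""
    result = equivalents(start, links)
    if include_close:
        for node in list(result):
            result |= neighbours(node, links, {SKOS + "closeMatch"})
    return result


print(sorted(related("drugbank:imatinib-mesylate", LINKS, include_close=False)))
print(sorted(related("drugbank:imatinib-mesylate", LINKS, include_close=True)))
```

A regulatory query might use only the strict set, while a search for related compounds could also accept the non-salt form; the same links serve both because the reason is kept.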
Challenge: Multiple Identities
Andy Law's Third Law
“The number of unique identifiers assigned to an individual is never less than the number of Institutions involved in the study”
http://bioinformatics.roslin.ac.uk/lawslaws/
P12047
X31045
P12047
GB:29384
RS_2353
http://rdf.ebi.ac.uk/resource/chembl/molecule/CHEMBL1642
https://www.ebi.ac.uk/chembl/compound/inspect/CHEMBL1642
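The two ChEMBL addresses above identify the same compound: one is the RDF resource URI, the other the human-readable web page. A minimal sketch of recognising both forms and grouping them under the shared accession `CHEMBL1642`; the regular expressions cover only these two URI patterns and are assumptions, not an official ChEMBL rule set.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional

# Patterns for the two URI forms shown on the slide (assumed, not exhaustive).
PATTERNS = [
    re.compile(r"^http://rdf\.ebi\.ac\.uk/resource/chembl/molecule/(CHEMBL\d+)$"),
    re.compile(r"^https://www\.ebi\.ac\.uk/chembl/compound/inspect/(CHEMBL\d+)$"),
]


def chembl_id(uri: str) -> Optional[str]:
    """Return the ChEMBL accession embedded in a known URI form, else None."""
    for pattern in PATTERNS:
        match = pattern.match(uri)
        if match:
            return match.group(1)
    return None


def group_by_entity(uris: List[str]) -> Dict[str, List[str]]:
    """Group URIs that carry the same ChEMBL accession; unknown forms are skipped."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for uri in uris:
        key = chembl_id(uri)
        if key is not None:
            groups[key].append(uri)
    return dict(groups)


uris = [
    "http://rdf.ebi.ac.uk/resource/chembl/molecule/CHEMBL1642",
    "https://www.ebi.ac.uk/chembl/compound/inspect/CHEMBL1642",
]
print(group_by_entity(uris))
```

Pattern matching only works within one provider; identifiers from different institutions (such as those above) still need explicit mapping links.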
Challenge: Open Data Licenses
5★ of linked data
Licenses: who can reuse the data
Interoperability of licenses
Non-commercial: academic use, teaching, industry
Challenges: Privacy
Challenge: Query Performance
Response time
Data freshness
Reliability
Volume of requests
Hosting resources
[Figure: Data Warehouse architecture (Data Source, Data Source → Data Warehouse ← Queries) versus Mediator architecture (Queries → Mediator → Data Source, Data Source)]
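The trade-off between the two architectures, data freshness versus response time and load on the sources, can be sketched with in-memory stand-ins. The class names, keys and values are hypothetical; real systems would query remote endpoints (for example over SPARQL) rather than Python dictionaries.

```python
from typing import Dict, List, Optional


class DataSource:
    """In-memory stand-in for a remote data source; counts how often it is queried."""

    def __init__(self, records: Dict[str, str]):
        self.records = dict(records)
        self.calls = 0

    def lookup(self, key: str) -> Optional[str]:
        self.calls += 1
        return self.records.get(key)


class DataWarehouse:
    """Copies all sources up front; queries hit the local copy (fast, may be stale)."""

    def __init__(self, sources: List[DataSource]):
        self.sources = sources
        self.copy: Dict[str, str] = {}
        self.reload()

    def reload(self) -> None:
        self.copy = {}
        for source in self.sources:
            self.copy.update(source.records)  # bulk load; no per-query source calls

    def query(self, key: str) -> Optional[str]:
        return self.copy.get(key)


class Mediator:
    """Forwards each query to the sources at query time (fresh, but costs source calls)."""

    def __init__(self, sources: List[DataSource]):
        self.sources = sources

    def query(self, key: str) -> Optional[str]:
        for source in self.sources:
            value = source.lookup(key)
            if value is not None:
                return value
        return None


s1 = DataSource({"drug:1": "label v1"})
s2 = DataSource({"drug:2": "other label"})
warehouse, mediator = DataWarehouse([s1, s2]), Mediator([s1, s2])

s1.records["drug:1"] = "label v2"   # a source changes after the warehouse was loaded
print(warehouse.query("drug:1"))     # still the loaded copy until reload()
print(mediator.query("drug:1"))      # the source's current value
```

The warehouse answers without touching the sources but serves stale data between reloads; the mediator is always current but its response time and reliability depend on every source it calls.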
In Data we Trust
How can we trust the data we’ve got back?
How can we ensure that it hasn’t been tampered with on the way?
Trusty URIs
[Image: red padlock. Source: http://www.intelsat.com/wp-content/uploads/2014/03/Red-padlock.jpg]
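Trusty URIs tackle tampering by embedding a cryptographic hash of the content in the URI itself, so anyone who retrieves the data can recompute the hash and compare. The sketch below shows only that core idea: it hashes raw bytes with SHA-256 and appends a base64url digest to a hypothetical base URI after a `.`. The actual Trusty URI approach is more involved (for example, RDF content is normalised before hashing), and the URI layout here is an assumption.

```python
import base64
import hashlib
import hmac


def content_hash(content: bytes) -> str:
    """SHA-256 digest of the content, base64url-encoded without padding (43 chars)."""
    digest = hashlib.sha256(content).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def make_hash_uri(base_uri: str, content: bytes) -> str:
    """Append the content hash to a (hypothetical) base URI."""
    return f"{base_uri}.{content_hash(content)}"


def verify(uri: str, content: bytes) -> bool:
    """Recompute the hash of the received content and compare it with the one in the URI."""
    expected = uri.rsplit(".", 1)[-1]  # base64url never contains '.', so this is the hash
    return hmac.compare_digest(expected, content_hash(content))


data = b'<http://example.org/drug/1> <http://www.w3.org/2000/01/rdf-schema#label> "Imatinib Mesylate" .\n'
uri = make_hash_uri("http://example.org/np/drug1", data)
print(uri)
print(verify(uri, data))                             # content unchanged
print(verify(uri, data.replace(b"Imatinib", b"X")))  # content altered in transit
```

Because the check needs only the URI and the bytes received, it works whichever server or mirror delivered the data.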
Contact Details
[email protected]
@gray_alasdair
“There is lots of data we all use every day, and it’s not part of the web. I can see my bank statements on the web, and my photographs, and I can see my appointments in a calendar. But can I see my photos in a calendar to see what I was doing when I took them? Can I see bank statement lines in a calendar?
No. Why not? Because we don’t have a web of data. Because data is controlled by applications and each application keeps it to itself.”
Tim Berners-Lee