Data Science meets Linked Data
Alasdair J G Gray
http://www.alasdairjggray.co.uk
@gray_alasdair
[email protected]
SICSA Data Science Theme Launch
3 July 2014

What are the research and technical challenges of linked data that are relevant to data science? This presentation introduces the ideas of linked data using the BBC sport web site as an example. It then identifies several research challenges that remain to be addressed.

BBC World Cup

BBC Linked Data Platform

Olympics 2012

Linking Data

Linked Data Principles

1. Global ID: URI
2. Resolvable ID
3. Useful content
   HTML for humans
   RDF for machines
4. Link to other resources

Like the Web, but for data!

“RDF and OWL do not solve the interoperability problem, they just lay it bare on the table!”
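
Principles 2 and 3 are commonly realised through HTTP content negotiation: the same URI is requested with a different Accept header depending on whether a person or a program is asking. The sketch below only builds the two requests and does not send them. The resource URI and the choice of Turtle as the RDF serialisation are assumptions for illustration and do not come from the slides.

```python
from urllib.request import Request

HTML = "text/html"      # representation for humans
TURTLE = "text/turtle"  # one RDF serialisation for machines (assumed choice)


def build_request(uri: str, for_machine: bool) -> Request:
    """Build (but do not send) a GET request for a resource URI."""
    accept = TURTLE if for_machine else HTML
    return Request(uri, headers={"Accept": accept})


# Hypothetical resource URI, used only for illustration.
uri = "http://example.org/resource/team/scotland"

human = build_request(uri, for_machine=False)
machine = build_request(uri, for_machine=True)

# Same global ID, two different representations requested.
print(human.full_url, human.get_header("Accept"))
print(machine.full_url, machine.get_header("Accept"))
```

Passing either request to `urllib.request.urlopen` would perform the lookup; whether a given server honours the Accept header depends on how it is configured.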

Challenge 1: Matching

Administrative Data Research Centre - Scotland | Alasdair J G Gray | 3 July 2014

John Grant, Fisherman

Fiona Sinclair

Ian Grant, Smithy

Born: 1861

Stuart Adam, Wheelwright

Morag Scott

Flora Adam, Seamstress
Born: 1866

Married: 1884

John Grant, Farmer

Fiona Grant

Iain Grant
Born: 1860

Messy data
Probabilistic matches
Schema matching
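
A probabilistic match between two person records can be sketched as a weighted score over fuzzy name similarity and closeness of birth year. The example below uses the names and years from the records above. The weights, the five-year tolerance, and the use of `difflib` are assumptions chosen for illustration, not the method used in the project.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Person:
    name: str
    born: int


def name_similarity(a: str, b: str) -> float:
    """Fuzzy string similarity in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def year_similarity(a: int, b: int, tolerance: int = 5) -> float:
    """1.0 for the same year, falling linearly to 0.0 at `tolerance` years apart."""
    return max(0.0, 1.0 - abs(a - b) / tolerance)


def match_score(p: Person, q: Person, name_weight: float = 0.7) -> float:
    """Weighted combination of name and birth-year similarity (assumed weights)."""
    return (name_weight * name_similarity(p.name, q.name)
            + (1.0 - name_weight) * year_similarity(p.born, q.born))


ian = Person("Ian Grant", 1861)
iain = Person("Iain Grant", 1860)
flora = Person("Flora Adam", 1866)

# A likely match scores much higher than an unrelated pair.
print(round(match_score(ian, iain), 2))
print(round(match_score(ian, flora), 2))
```

A score is only evidence, not a decision: a threshold still has to be chosen, and messy data means some true matches will fall below it.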

Gleevec® = Imatinib Mesylate

Drugbank
ChemSpider
PubChem

Imatinib
Mesylate
Imatinib Mesylate
YLMAHDNUQAMNNX-UHFFFAOYSA-N

Challenge 2: Reusing mappings

Link: skos:closeMatch
Reason: non-salt form

Link: skos:exactMatch
Reason: drug name

Link: owl:sameAs
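
One way to make mappings reusable is to store the reason alongside each link, so that a consumer can decide which links are acceptable for a given task. The sketch below records the three links from the slide and filters them by reason. The record identifiers (`ex:recordA` and so on) are hypothetical placeholders, and the filtering function is an assumption for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mapping:
    source: str
    target: str
    predicate: str
    reason: Optional[str]  # None where the slide gives no reason


# Predicates and reasons are taken from the slide; identifiers are hypothetical.
MAPPINGS = [
    Mapping("ex:recordA", "ex:recordB", "skos:closeMatch", "non-salt form"),
    Mapping("ex:recordB", "ex:recordC", "skos:exactMatch", "drug name"),
    Mapping("ex:recordA", "ex:recordC", "owl:sameAs", None),
]


def usable(mappings, accepted_reasons):
    """Keep only the mappings whose stated reason the consumer accepts."""
    return [m for m in mappings if m.reason in accepted_reasons]


# A strict consumer accepts only name-based links.
strict = usable(MAPPINGS, {"drug name"})
# A relaxed consumer also treats salt and non-salt forms as equivalent.
relaxed = usable(MAPPINGS, {"drug name", "non-salt form"})

print([m.predicate for m in strict])
print([m.predicate for m in relaxed])
```

The link without a stated reason is excluded by both consumers, which illustrates why an unexplained `owl:sameAs` is hard to reuse safely.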

Challenge: Multiple Identities

Andy Law's Third Law
“The number of unique identifiers assigned to an individual is never less than the number of Institutions involved in the study”

http://bioinformatics.roslin.ac.uk/lawslaws/

P12047
X31045
P12047

GB:29384
RS_2353

http://rdf.ebi.ac.uk/resource/chembl/molecule/CHEMBL1642

https://www.ebi.ac.uk/chembl/compound/inspect/CHEMBL1642
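
The two ChEMBL URIs above differ in scheme, host, and path, yet appear to identify the same compound. A minimal sketch of reconciling them is to compare the last path segment. Treating that segment as the local identifier is an assumption that happens to hold for these two URIs and will not hold for URI schemes in general.

```python
from urllib.parse import urlparse


def local_id(uri: str) -> str:
    """Return the last path segment of a URI (assumed to be the local identifier)."""
    path = urlparse(uri).path.rstrip("/")
    return path.rsplit("/", 1)[-1]


rdf_uri = "http://rdf.ebi.ac.uk/resource/chembl/molecule/CHEMBL1642"
web_uri = "https://www.ebi.ac.uk/chembl/compound/inspect/CHEMBL1642"

# Different URIs, same local identifier.
print(local_id(rdf_uri), local_id(web_uri))
same_compound = local_id(rdf_uri) == local_id(web_uri)
```

A string heuristic like this is fragile, which is one reason explicit, published mappings between identifiers are preferable.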

Challenge Open Data: Licenses

5★ of linked data
Licenses: who can reuse the data
Interoperability of licenses
Non-commercial: academic use, teaching, industry

Challenges: Privacy

Challenge: Query Performance

Response time
Data freshness
Reliability
Volume of requests
Hosting resources

[Diagram: Data Source, Data Source, Data Warehouse, Queries]
[Diagram: Data Source, Data Source, Mediator, Queries]
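
The two architectures in the diagrams trade response time against data freshness. The sketch below models each data source as an in-memory dictionary: the warehouse copies everything up front and answers from its copy, while the mediator consults the sources at query time. The class names, the dictionary-based sources, and the sample values are all assumptions for illustration.

```python
class Warehouse:
    """Copies data from all sources in advance; queries hit the local copy."""

    def __init__(self, sources):
        self._sources = sources
        self._store = {}
        self.refresh()

    def refresh(self):
        # Reload the local copy from every source.
        self._store = {}
        for source in self._sources:
            self._store.update(source)

    def query(self, key):
        return self._store.get(key)


class Mediator:
    """Holds no data; forwards each query to the sources at request time."""

    def __init__(self, sources):
        self._sources = sources

    def query(self, key):
        for source in self._sources:
            if key in source:
                return source[key]
        return None


# Hypothetical sources and values.
source_a = {"match:1": "0-0"}
source_b = {"match:2": "2-1"}
warehouse = Warehouse([source_a, source_b])
mediator = Mediator([source_a, source_b])

source_a["match:1"] = "1-0"           # a source is updated
stale = warehouse.query("match:1")    # warehouse still has the old copy
fresh = mediator.query("match:1")     # mediator sees the update
warehouse.refresh()
refreshed = warehouse.query("match:1")
print(stale, fresh, refreshed)
```

In a real deployment the mediator pays for freshness with network round trips and depends on every source being available, while the warehouse pays with hosting resources and a refresh schedule.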

In Data we Trust

How can we trust the data we’ve got back?
How can we ensure that it hasn’t been tampered with on the way?
Trusty URIs

http://www.intelsat.com/wp-content/uploads/2014/03/Red-padlock.jpg
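
The general idea behind a hash-bearing identifier is that the URI itself carries a cryptographic digest of the content, so anyone holding the URI can check that the retrieved data is unchanged. The sketch below is a simplified illustration of that idea only. It is not the actual Trusty URI encoding, and the base URI, the dot separator, and the hex SHA-256 digest are assumptions.

```python
import hashlib


def hashed_uri(base_uri: str, content: bytes) -> str:
    """Append a SHA-256 hex digest of the content to a base URI (assumed format)."""
    digest = hashlib.sha256(content).hexdigest()
    return f"{base_uri}.{digest}"


def verify(uri: str, content: bytes) -> bool:
    """Check that the content matches the digest embedded in the URI."""
    expected = uri.rsplit(".", 1)[-1]
    return hashlib.sha256(content).hexdigest() == expected


# Hypothetical base URI and data.
original = b"<ex:a> <ex:knows> <ex:b> ."
uri = hashed_uri("http://example.org/data/graph1", original)

ok = verify(uri, original)
tampered = verify(uri, b"<ex:a> <ex:knows> <ex:c> .")
print(ok, tampered)
```

This check detects modification of the bytes; it says nothing about whether the original publisher was trustworthy in the first place.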

Contact Details

[email protected]
@gray_alasdair

“There is lots of data we all use every day, and it’s not part of the web. I can see my bank statements on the web, and my photographs, and I can see my appointments in a calendar. But can I see my photos in a calendar to see what I was doing when I took them? Can I see bank statement lines in a calendar?

No. Why not? Because we don’t have a web of data. Because data is controlled by applications and each application keeps it to itself.”

Tim Berners-Lee