STANFORD UNIVERSITY LIBRARIES. The Linked Data Snowball, or Why We Need Reconciliation. April 4th, 2016. The AAC / Getty Workshop on Reconciliation of Linked Open Data. Rob Sanderson / [email protected] / @azaroth42

Linked Data Snowball, or Why We Need Reconciliation

Page 1: Linked Data Snowball, or Why We Need Reconciliation

STANFORD UNIVERSITY LIBRARIES

The Linked Data Snowball or Why We Need Reconciliation

April 4th, 2016

The AAC / Getty Workshop on Reconciliation of Linked Open Data

Rob Sanderson / [email protected] / @azaroth42

Page 2: Linked Data Snowball, or Why We Need Reconciliation

STANFORD UNIVERSITY LIBRARIES

The Linked Data Snowball or Why We Need Reconciliation

April 4th, 2016

The AAC / Getty Workshop on Reconciliation of Linked Open Data

Rob Sanderson / [email protected] / @azaroth42 web.stanford.edu/~azaroth/#me

[email protected] / +azaroth42 orcid: 0000-0003-4441-6852

Page 3: Linked Data Snowball, or Why We Need Reconciliation

STANFORD UNIVERSITY LIBRARIES

The Linked Data Snowball or Why We Need Reconciliation

April 4th, 2016

The AAC / Getty Workshop on

Rob Sanderson / [email protected] / @azaroth42 web.stanford.edu/~azaroth/#me

[email protected] / +azaroth42 orcid: 0000-0003-4441-6852

http://www.informatik.uni-trier.de/~ley/pers/hd/s/Sanderson:Robert http://academic.research.microsoft.com/Author/2765999

http://www.scopus.com/authid/detail.url?authorId=8988953600 www.researchgate.net/profile/Rob_Sanderson

facebook.com/rob.sanderson / linkedin.com/pub/robert-sanderson/1/172/5a6/ [email protected] / [email protected]

public.lanl.gov/rsanderson / gondolin.hist.liv.ac.uk/~azaroth [email protected] / [email protected]

Reconciliation of Linked Open Data

Page 4: Linked Data Snowball, or Why We Need Reconciliation

Linked Data?

1.  Use URIs as names for things
2.  Use HTTP URIs so that people can look up those names
3.  When someone looks up a URI, provide useful information, using the standards
4.  Include links to other URIs, so they can discover more things

Page 5: Linked Data Snowball, or Why We Need Reconciliation

Linked Data?

1.  Use URIs as names for things
2.  Use HTTP URIs so that people can look up those names
3.  When someone looks up a URI, provide useful information, using the standards
4.  Include links to other URIs, so they can discover more things
5.  Link your data to other people's data to provide context

Page 6: Linked Data Snowball, or Why We Need Reconciliation
Page 7: Linked Data Snowball, or Why We Need Reconciliation

Why So Many?

Each "No" leads to the URI box, read here as minting a new URI:

1.  Do I know the URI, or can I find it?
2.  Understand and agree with the model used?
3.  Understand and agree with the description?
4.  Agree the URI identifies the same entity?
5.  Agree description is complete?

Yes to all five: Hooray, you reused a URI! Now start again with the next entity :(
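The chain of questions on this slide can be sketched as a short function. This is only an illustration: the check names are invented labels for the five questions, and treating every "No" as "mint a new URI" is a reading of the flowchart, not something the slide states in words.

```python
from itertools import product

# Hypothetical labels for the five questions on the slide, in order.
CHECKS = (
    "know_or_can_find_uri",
    "agree_with_model",
    "agree_with_description",
    "agree_same_entity",
    "agree_description_complete",
)


def decide(answers: dict) -> str:
    """Reuse a URI only if every question is answered 'yes'."""
    for check in CHECKS:
        # A missing answer is treated as 'no'.
        if not answers.get(check, False):
            return f"mint new URI (failed: {check})"
    return "reuse existing URI"


# Count how many of the possible yes/no combinations end in reuse.
reuse_paths = sum(
    decide(dict(zip(CHECKS, combo))) == "reuse existing URI"
    for combo in product([True, False], repeat=len(CHECKS))
)
```

Only one combination of answers ends in reuse, which is one way to see why new URIs pile up.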

Page 14: Linked Data Snowball, or Why We Need Reconciliation

Many Special and Unique Snowflakes

Page 15: Linked Data Snowball, or Why We Need Reconciliation

Become a Huge Snowball of Technical Debt

Page 16: Linked Data Snowball, or Why We Need Reconciliation

Option 1: Balance the Equation

$$
\text{Cost(Create URI)} + \text{Cost(Maintain URI)}
\;\le\;
\text{Cost(Find Good URI)} + \text{Cost(Understand Model)} + \text{Cost(Understand Content)}
+ \min\bigl(\text{Risk(Reliability)} + \text{Cost(Network Latency)},\ \text{Risk(Out of Date)} + \text{Cost(Cache Content)}\bigr)
- \text{Value(Connected Graph)}
$$
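To make the comparison concrete, the two sides can be written as small functions. This is a sketch under assumptions: the function and parameter names are invented here, and the numbers are arbitrary units chosen for illustration, not measurements.

```python
def mint_cost(create: float, maintain: float) -> float:
    """Left-hand side: cost of creating and maintaining your own URI."""
    return create + maintain


def reuse_cost(find, understand_model, understand_content,
               reliability_risk, latency, out_of_date_risk, cache,
               graph_value) -> float:
    """Right-hand side: cost of reusing someone else's URI."""
    live = reliability_risk + latency      # dereference the remote URI each time
    cached = out_of_date_risk + cache      # keep a local copy instead
    return (find + understand_model + understand_content
            + min(live, cached) - graph_value)


def minting_wins(mint: float, reuse: float) -> bool:
    """True while the inequality holds, i.e. minting is no more costly."""
    return mint <= reuse


# Made-up numbers in arbitrary units.
mint = mint_cost(create=2, maintain=3)
reuse_today = reuse_cost(4, 3, 2, 1, 2, 2, 1, graph_value=4)
reuse_valued = reuse_cost(4, 3, 2, 1, 2, 2, 1, graph_value=8)
```

With these numbers minting wins at a graph value of 4 and loses at 8. Balancing the equation means moving the terms on the reuse side until the second case is the normal one.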

Page 17: Linked Data Snowball, or Why We Need Reconciliation

Option 1 Likelihood

Page 18: Linked Data Snowball, or Why We Need Reconciliation

Option 1 Likelihood

Botticelli: http://vocab.getty.edu/ulan/500015254

Page 19: Linked Data Snowball, or Why We Need Reconciliation

Option 1 Likelihood

Botticelli: http://vocab.getty.edu/ulan/500015254 :)

Page 20: Linked Data Snowball, or Why We Need Reconciliation

Option 1 Likelihood

Botticelli: http://vocab.getty.edu/ulan/500015254 :(

Page 21: Linked Data Snowball, or Why We Need Reconciliation

Option 2: Reconciliation

YCBA's URIs Princeton's URIs

Page 22: Linked Data Snowball, or Why We Need Reconciliation

Option 2: Reconciliation

YCBA's Entities

Princeton's Entities

Shared Entities but not Shared URIs

Page 23: Linked Data Snowball, or Why We Need Reconciliation

Option 2: Reconciliation

1. Algorithmically discover this intersection given the descriptions of the entities

Page 24: Linked Data Snowball, or Why We Need Reconciliation

Option 2: Reconciliation

2. Assert that the entity which two URIs identify is actually the same entity

=
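Steps 1 and 2 can be sketched in a few lines. Everything here is an assumption made for illustration: the example.org URIs, the two tiny collections, and the matching rule (normalized name plus birth year) are invented, and real reconciliation would need richer descriptions and fuzzier comparison.

```python
import unicodedata


def normalize(name: str) -> str:
    """Lowercase, strip accents and punctuation, and sort the name tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    return " ".join(sorted(cleaned.split()))


def match_key(description: dict) -> tuple:
    """Hypothetical matching rule: same normalized name and same birth year."""
    return (normalize(description["name"]), description["born"])


def reconcile(left: dict, right: dict) -> list:
    """Step 1: find the intersection. Step 2: assert it as triples."""
    index = {match_key(desc): uri for uri, desc in right.items()}
    triples = []
    for uri, desc in left.items():
        other = index.get(match_key(desc))
        if other is not None:
            triples.append((uri, "owl:sameAs", other))
    return triples


# Invented URIs and records standing in for two institutions' data.
collection_a = {
    "http://a.example.org/person/1": {"name": "Turner, Joseph Mallord William", "born": 1775},
    "http://a.example.org/person/2": {"name": "Constable, John", "born": 1776},
}
collection_b = {
    "http://b.example.org/agent/77": {"name": "Joseph Mallord William Turner", "born": 1775},
    "http://b.example.org/agent/78": {"name": "Thomas Gainsborough", "born": 1727},
}

same_as = reconcile(collection_a, collection_b)
```

Both collections keep their own URIs; the only new data is the assertion linking the shared entity.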

Page 25: Linked Data Snowball, or Why We Need Reconciliation

Option 2: Reconciliation

Page 26: Linked Data Snowball, or Why We Need Reconciliation

Option 2a: Reconciliation (distributed authority)

Page 27: Linked Data Snowball, or Why We Need Reconciliation

Option 2b: Reconciliation (centralized authority)

Page 28: Linked Data Snowball, or Why We Need Reconciliation

Benefits of Reconciliation

End User:

•  Has access to more information, more easily, improving research, discovery and navigation
•  Potential for new UIs, new research questions, reasoning

Institution:

•  Efficiency (= reduced cost) and improved quality of description
•  Increased prestige when descriptions are reused
•  Usage across the network is valuable business intelligence

Community:

•  Network effects spread faster and further, increasing awareness of cultural heritage
•  Gives easier access to other communities' data

Page 29: Linked Data Snowball, or Why We Need Reconciliation

Real Benefit of Reconciliation

Reconciliation is a damage-limiting step for the network, on the way to balancing the equation from Option 1

By linking entity descriptions together:

•  the cost of discovery and understanding is reduced

•  the costs of creating and maintaining the resources are shared across the community, not duplicated

•  the value of the connected graph is increased

•  the likelihood of new entities (requiring reconciliation) is reduced

Page 30: Linked Data Snowball, or Why We Need Reconciliation

But How Can A Machine Know??

Algorithms won't be perfect, but can be good enough.

•  What use cases will the reconciled data be used to fulfill?

•  What is the cost of a false positive for those use cases?

Precision: What % of matches are correct?

Recall: What % of the possible matches were found?

Can make trade-offs of precision vs recall for different use cases.

Machine can record its certainty, and policy can provide a threshold.
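The trade-off between precision and recall, and the role of a policy threshold, can be shown with a toy calculation. The candidate pairs, their certainty scores, and the set of true matches below are all made up for the example.

```python
def precision_recall(scored: dict, truth: set, threshold: float) -> tuple:
    """Accept candidate matches at or above the threshold, then score them."""
    accepted = {pair for pair, score in scored.items() if score >= threshold}
    true_positives = len(accepted & truth)
    # Precision: what share of accepted matches are correct?
    precision = true_positives / len(accepted) if accepted else 1.0
    # Recall: what share of the possible matches were found?
    recall = true_positives / len(truth) if truth else 1.0
    return precision, recall


# Invented candidate matches with the machine's recorded certainty.
scored = {
    ("a1", "b1"): 0.95,
    ("a2", "b2"): 0.80,
    ("a3", "b9"): 0.75,   # a false positive if accepted
    ("a4", "b4"): 0.40,
}
truth = {("a1", "b1"), ("a2", "b2"), ("a4", "b4")}

strict = precision_recall(scored, truth, threshold=0.9)
lenient = precision_recall(scored, truth, threshold=0.3)
```

A strict threshold gives perfect precision but finds only one of the three true matches; a lenient one finds them all at the price of accepting the wrong pair. Which is preferable depends on the cost of a false positive for the use case.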

Page 31: Linked Data Snowball, or Why We Need Reconciliation

How Can We Improve It?

Several different relationships to express similarity:

•  owl:sameAs – always exactly the same (transitive)
•  skos:exactMatch – the same for most purposes (transitive)
•  skos:closeMatch – the same for some purposes (not transitive)

The context of a resource in the network is important

•  Starting simple with high precision gives a better context, so the results can be used to bootstrap iteratively and incrementally
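The practical effect of transitivity can be sketched with a small union-find: links made with a transitive predicate chain into one cluster, while skos:closeMatch links are left as pairwise statements. The short identifiers are invented, and merging owl:sameAs and skos:exactMatch into the same clusters is a simplification made for this sketch.

```python
TRANSITIVE = {"owl:sameAs", "skos:exactMatch"}


def clusters(links: list) -> list:
    """Group identifiers connected by chains of transitive links."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for subject, predicate, obj in links:
        if predicate in TRANSITIVE:
            parent[find(subject)] = find(obj)    # merge the two clusters
        else:
            find(subject)                        # register, but do not merge
            find(obj)

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(sorted(group) for group in groups.values())


# Invented identifiers for illustration.
links = [
    ("a:1", "skos:exactMatch", "b:1"),
    ("b:1", "skos:exactMatch", "c:1"),
    ("c:1", "skos:closeMatch", "d:1"),
]
result = clusters(links)
```

Here a:1, b:1 and c:1 end up in one cluster, while d:1 stays on its own even though it is a close match for c:1. This is why a single wrong transitive link is expensive: it merges everything on both sides.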

Page 32: Linked Data Snowball, or Why We Need Reconciliation

Trust and Community

"Efficiency (= reduced cost) and improved quality of description"

•  Efficiency comes from not duplicating descriptive effort...
•  Which requires trusting other institutions in the community
•  We need to work together, not...

Page 34: Linked Data Snowball, or Why We Need Reconciliation

Entities to Reconcile

As a community, we need to pick where to start. Suggest starting with least controversial / most unique:

•  Physical objects
•  People
•  Places
•  Events (specific, like Exhibitions)

A small sub-domain (by time?) to make overlap more likely

Page 35: Linked Data Snowball, or Why We Need Reconciliation

Q. Can I Reconcile a String?

Named Entity Recognition: "snowflake" = [thing] (strings to things)

Reconciliation: [thing] = [thing] (things to things)

Page 36: Linked Data Snowball, or Why We Need Reconciliation

The Hard Question

How can we be more useful than DBpedia for our own entities?

Page 37: Linked Data Snowball, or Why We Need Reconciliation

The Hard Question

How can we be more useful than DBpedia for our own entities?

•  Focus on unique selling points
•  Demonstrate value early, both internally and to the broader community
•  By working together to increase the value of the network

Page 38: Linked Data Snowball, or Why We Need Reconciliation

STANFORD UNIVERSITY LIBRARIES

Thank You!

April 4th, 2016

Rob Sanderson / [email protected] / @azaroth42 web.stanford.edu/~azaroth/#me

[email protected] / +azaroth42 orcid: 0000-0003-4441-6852

http://www.informatik.uni-trier.de/~ley/pers/hd/s/Sanderson:Robert http://academic.research.microsoft.com/Author/2765999

http://www.scopus.com/authid/detail.url?authorId=8988953600 www.researchgate.net/profile/Rob_Sanderson

facebook.com/rob.sanderson / linkedin.com/pub/robert-sanderson/1/172/5a6/ [email protected] / [email protected]

public.lanl.gov/rsanderson / gondolin.hist.liv.ac.uk/~azaroth [email protected] / [email protected]

Page 40: Linked Data Snowball, or Why We Need Reconciliation

STANFORD UNIVERSITY LIBRARIES

Thank You!

April 4th, 2016

[email protected]

Page 42: Linked Data Snowball, or Why We Need Reconciliation

Thank You!

rsanderson@getty.edu

April 25th, 2016

Page 43: Linked Data Snowball, or Why We Need Reconciliation

Thank You!

rsanderson@getty.edu

Based on my slides from the Andrew W. Mellon Foundation Reconciliation Workshop. With recognition and thanks to all of the participants.