Meeting on Semantic Web and Archives, Libraries and Museums, Fundación Ramón Areces, Madrid, Spain. 10th April 2014. Adrian Stevenson, Senior Technical Innovations Coordinator, Mimas, University of Manchester, UK. @adrianstevenson. “Il n’y a pas de hors-texte”: Challenges for Archival Linked Data

"Il n´y a pas de hors-texte": challenges for Archival Linked Data. Adrian Stevenson

Presented at the Semantic Web in Archives, Libraries and Museums day at the Fundación Ramón Areces, 10 April 2014.

Page 1

Meeting on Semantic Web and Archives, Libraries and Museums Fundación Ramón Areces, Madrid, Spain. 10th April 2014

Adrian Stevenson

Senior Technical Innovations Coordinator Mimas, University of Manchester, UK

@adrianstevenson

“Il n’y a pas de hors-texte” – Challenges for Archival Linked Data

Page 2

“Il n’y a pas de hors-texte” ‘Of Grammatology’

Jacques Derrida, 1967

Page 3

“There is nothing outside the text”

Page 4

“There is nothing outside context”

Speaker notes:
Generally considered to be a more accurate translation. It got me thinking about archival context, and about how the process of creating Linked Data is somewhat like deconstruction: breaking down what we have, thinking about the things in it, then reconstructing. This process possibly problematizes the notion of archival context: the RDF model problematizes ISAD(G), archival context and document-centric ways of thinking.
Page 5

http://archiveshub.ac.uk

Speaker notes:
The Archives Hub is the UK aggregator of archival descriptions from archive repositories across the UK. This core data forms the basis of the Linked Data. The Hub holds approximately 500,000 component-level descriptions.
Page 6

http://archiveshub.ac.uk/locah/

Page 7

Deconstruction / Context

• Archives Hub data is held in Encoded Archival Description (EAD) XML form

• Need to think about:
  – knowing what we want to say about our ‘things’
  – data modelling
  – defining relationships
  – selecting vocabularies
  – deciding on identifiers
  – HTTP URIs
  – creating RDF/XML
  – linking to external resources
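As a rough illustration of several of these steps (data modelling, minting HTTP URIs as identifiers, and generating RDF), here is a minimal Python sketch. It assumes a tiny, un-namespaced EAD-like fragment with illustrative values, placeholder `example.org` namespaces rather than the Archives Hub's real URI patterns, and a made-up `origination` property. It emits N-Triples rather than RDF/XML purely for brevity, and it does not attempt the linking to external resources.

```python
import re
import xml.etree.ElementTree as ET

# Placeholder namespaces for illustration only; not the Archives Hub's real URI patterns.
BASE = "http://example.org/id/"
DEF = "http://example.org/def/"
DCT = "http://purl.org/dc/terms/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# A deliberately tiny, un-namespaced EAD-like fragment with illustrative values.
EAD = """
<ead>
  <archdesc level="fonds">
    <did>
      <unitid>GB 0000 EX1</unitid>
      <unittitle>Papers of Beatrice Webb</unittitle>
      <origination><persname>Webb, Martha Beatrice, 1858-1943</persname></origination>
    </did>
  </archdesc>
</ead>
"""

def slug(text: str) -> str:
    """Turn free text into a URI-safe path segment."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

def literal(text: str) -> str:
    """Serialise a plain N-Triples literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

def ead_to_ntriples(xml_text: str) -> list[str]:
    did = ET.fromstring(xml_text.strip()).find("archdesc/did")
    unit_id = did.findtext("unitid").strip()
    # Decide on an identifier: mint an HTTP URI for the archival resource.
    resource = f"<{BASE}archivalresource/{slug(unit_id)}>"
    triples = [
        f"{resource} <{DCT}identifier> {literal(unit_id)} .",
        f"{resource} <{DCT}title> {literal(did.findtext('unittitle').strip())} .",
    ]
    # Each originating name becomes a 'thing' with its own URI, linked to the resource.
    for name in did.iterfind("origination/persname"):
        label = name.text.strip()
        agent = f"<{BASE}person/{slug(label)}>"
        triples.append(f"{resource} <{DEF}origination> {agent} .")
        triples.append(f"{agent} <{RDFS_LABEL}> {literal(label)} .")
    return triples

for line in ead_to_ntriples(EAD):
    print(line)
```

Even in this toy form, the person named in `origination` stops being a string inside a document and becomes a resource with its own URI, which is what makes later linking possible.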

Page 8

Figure: Archives Hub Model. Diagram of the entities (Archival Resource, Finding Aid, EAD Document, Biographical History, Agent, Person, Family, Organisation, Repository, Place, Concept, Concept Scheme, Genre, Function, Book, Language, Level, Extent, Object, Postcode Unit, Creation, Birth, Death, Temporal Entity) and the relationships between them (maintainedBy/maintains, administeredBy/administers, accessProvidedBy/providesAccessTo, origination, associatedWith, topic/page, hasPart/partOf, encodedAs/encodes, hasBiogHist/isBiogHistFor, foaf:focus, Is-a, inScheme, representedBy, level, language, extent, participates in, at time, product of, in).
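The model pairs most relationships with an inverse (maintainedBy/maintains, hasPart/partOf and so on). One possible use of such pairs, sketched here assuming an in-memory set of triples and a placeholder `ex:` prefix, is to materialise the inverse of each link so that the description of either resource mentions it. Whether the Hub actually materialised inverses is not stated in the slides.

```python
# Inverse pairs read off the model diagram. The "ex:" prefix is a placeholder,
# not the real Archives Hub vocabulary namespace.
INVERSES = {
    "ex:maintainedBy": "ex:maintains",
    "ex:administeredBy": "ex:administers",
    "ex:accessProvidedBy": "ex:providesAccessTo",
    "ex:hasPart": "ex:partOf",
    "ex:encodedAs": "ex:encodes",
    "ex:hasBiogHist": "ex:isBiogHistFor",
    "ex:topic": "ex:page",
}
# Make the lookup work in both directions.
INVERSES.update({inv: prop for prop, inv in INVERSES.items()})

Triple = tuple[str, str, str]

def with_inverses(triples: set[Triple]) -> set[Triple]:
    """Return the input plus the inverse of every triple whose predicate has a known inverse."""
    result = set(triples)
    for s, p, o in triples:
        if p in INVERSES:
            result.add((o, INVERSES[p], s))
    return result

graph = {
    ("ex:fonds1", "ex:hasPart", "ex:series1"),
    ("ex:fonds1", "ex:maintainedBy", "ex:repository1"),
}
for triple in sorted(with_inverses(graph)):
    print(triple)
```

Materialising inverses trades storage for convenience: a consumer reading only the series' description still learns which fonds it belongs to.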

Page 9

http://data.archiveshub.ac.uk

Speaker notes:
Talk through the page a bit. There are currently 1,495,168 statements (triples) in the Linked Data subset.
Page 10
Page 11

Visualisation Prototype Using Timemap
– Google Maps and SIMILE
– http://code.google.com/p/timemap/

Early stages with this. It will give the location and ‘extent’ of an archive, and will link through to the Archives Hub.
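A sketch of the kind of data such a prototype needs: each archive becomes an item with a date range, a map point, its extent and a link through to the Hub. The record values, the `hub_url` placeholder and the item field names (loosely modelled on the item objects Timemap loads) are assumptions; the Timemap documentation should be checked for its actual schema.

```python
import json
from dataclasses import dataclass

@dataclass
class ArchiveRecord:
    # Illustrative fields; in practice these would be drawn from the Linked Data.
    title: str
    start_year: int
    end_year: int
    lat: float
    lon: float
    extent: str
    hub_url: str  # placeholder for the record's Archives Hub page

def to_timemap_item(rec: ArchiveRecord) -> dict:
    """Shape one record as a map-and-timeline item (field names are assumptions)."""
    return {
        "title": rec.title,
        "start": str(rec.start_year),
        "end": str(rec.end_year),
        "point": {"lat": rec.lat, "lon": rec.lon},
        # Pop-up data: the archive's extent and a link through to the Hub.
        "options": {"extent": rec.extent, "url": rec.hub_url},
    }

records = [
    ArchiveRecord("Example papers", 1880, 1940, 53.4668, -2.2339,
                  "12 boxes", "http://example.org/hub/record/1"),
]
print(json.dumps([to_timemap_item(r) for r in records], indent=2))
```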

Page 12

wraggelabs.com/shed/presentations/anzsi

Speaker notes:
‘Every story has a beginning’
Page 13

http://archiveshub.ac.uk/linkinglives/

Page 14

Linking Lives

• Linking Lives is a project to create an end-user interface based on Linked Data

• A biographical interface, providing information about individuals that is taken from a variety of sources

• Aim is to place archival descriptions within a much broader context
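To illustrate the idea of drawing biographical information from several sources, here is a minimal sketch that groups statements by field while recording which source asserted each value. The source names and field names are hypothetical; the values are borrowed from the Beatrice Webb mock-up, but which source supplies which value is invented for the example, and this is not how Linking Lives was built.

```python
from collections import defaultdict

# Illustrative statements about one person, each tagged with the source asserting it.
# The assignment of values to sources is invented for this example.
STATEMENTS = [
    ("archives-hub", "epithet", "social reformer and historian"),
    ("archives-hub", "lifeDates", "1858-1943"),
    ("viaf", "lifeDates", "1858-1943"),
    ("dbpedia", "placeOfBirth", "Gloucester"),
    ("dbpedia", "knows", "Sidney Webb"),
]

def build_profile(statements):
    """Group values by field, recording every source that asserts each value."""
    profile = defaultdict(lambda: defaultdict(list))
    for source, field, value in statements:
        profile[field][value].append(source)
    return profile

for field, values in build_profile(STATEMENTS).items():
    for value, sources in values.items():
        print(f"{field}: {value} (from {', '.join(sources)})")
```

Keeping the source alongside each value lets an interface show agreement between sources (both supply the same life dates here) and lets users see where each piece of information came from.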

Page 15

Martha Beatrice Webb

Place of birth: Gloucester, England
Place of death: Liphook, Hampshire, England

Life dates: 1858-1943
Epithet: social reformer and historian
Family name: Webb

Image

from: Beatrice Webb letters
Beatrice Webb (1858 - 1943). Fabian Socialist, social reformer, writer, historian, diarist. Wife, collaborator and assistant of Sidney Webb, later Lord Passfield. Together they contributed to the radical ideology first of the Liberal Party and later of the Labour Party.

from: Beatrice Webb, A summer holiday in Scotland, 1884.
Beatrice Webb (1858-1943), née Potter, social reformer and diarist. Married to Sidney Webb, pioneers of social science. She was involved in many spheres of political and social activity including the Labour Party, Fabianism, social observation, investigations into poverty, development of socialism, the foundation of the National Health Service and post war welfare state, the London School of

Biographical Notes

Works

Our Partnership
My Apprenticeship
The case for the factory acts
Beatrice Webb’s diaries; edited by Margaret Cole
The Diary

Knows

http://dbpedia.org/page/George_Bernard_Shaw

http://dbpedia.org/page/Sidney_Webb,_1st_Baron_Passfield

Speaker notes:
This mock-up of the Linking Lives interface shows the way data is brought together.
Page 16
Speaker notes:
External data is key to Linked Data. We link to VIAF and, through VIAF, to DBpedia. We are looking at linking to the BNB (British National Bibliography).
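Reaching DBpedia "through" VIAF means following a chain of identity links. A minimal sketch, assuming the links are held as a simple adjacency map of owl:sameAs-style statements (the notes do not say which property is used) and using placeholder URIs rather than real VIAF or DBpedia identifiers:

```python
from collections import deque

# Hypothetical identity links: Hub person -> VIAF -> DBpedia (placeholder URIs).
SAME_AS = {
    "http://example.org/hub/person/webb-beatrice": ["http://example.org/viaf/123"],
    "http://example.org/viaf/123": ["http://example.org/dbpedia/Beatrice_Webb"],
}

def equivalents(start: str, links: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk over identity links, returning every URI reachable from start."""
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        uri = queue.popleft()
        for nxt in links.get(uri, []):
            if nxt not in seen:  # guards against cycles such as A -> B -> A
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order

print(equivalents("http://example.org/hub/person/webb-beatrice", SAME_AS))
```

This follows links only in the direction they are stored; since owl:sameAs is symmetric, a fuller version would also index each link in reverse.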
Page 17
Speaker notes:
Current unfinished version of the interface.
Page 18

Why?

• Telling stories
• Placing archives in a global information space
• External data forms part of the user interface
  – moving away from the silo approach
• Dynamic links to other content
• Extensible
• An exemplar: shows what can be done

Page 19

Some Challenges / Lessons Learnt

• Steep learning curve
• Difficult data, URI persistence
• Linking data not straightforward
• Keeping data up to date
• How sustainable are the data sources?
• Can you track the provenance of data sources?
• Are data licensing issues covered?

Speaker notes:
Data modelling can be hard and takes time. Vocabularies can be hard. Transforming the data is hard; XSLT is hard. There are not many tools. Is it worth the investment?
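Two of these questions, provenance and keeping data up to date, can at least be made answerable by recording where each statement came from. A common RDF approach is to load each source into its own named graph; the sketch below models that with plain Python quads, plus a hypothetical per-source "last fetched" date. The graph URIs, dates and 180-day threshold are all illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Quad:
    subject: str
    predicate: str
    obj: str
    graph: str  # named graph = the source this triple was loaded from

# Hypothetical record of when each source graph was last refreshed.
LAST_FETCHED = {
    "http://example.org/graph/viaf": date(2014, 3, 1),
    "http://example.org/graph/dbpedia": date(2013, 6, 1),
}

def sources_of(quads: list[Quad], subject: str) -> list[str]:
    """Which sources contributed statements about this subject?"""
    return sorted({q.graph for q in quads if q.subject == subject})

def stale_sources(today: date, max_age_days: int) -> list[str]:
    """Sources not refreshed within max_age_days of today."""
    return sorted(g for g, d in LAST_FETCHED.items() if (today - d).days > max_age_days)

quads = [
    Quad("ex:webb", "ex:lifeDates", "1858-1943", "http://example.org/graph/viaf"),
    Quad("ex:webb", "ex:knows", "ex:shaw", "http://example.org/graph/dbpedia"),
]
print(sources_of(quads, "ex:webb"))
print(stale_sources(date(2014, 4, 10), max_age_days=180))
```

This answers "where did this come from?" and "what needs refreshing?", but not the sustainability or licensing questions, which depend on the data providers rather than on the data model.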
Page 20

Data Modelling

• Steep learning curve
  – RDF terminology “confusing”
  – Lack of archival examples

• Complexity
  – Archival description is hierarchical and multi-level
  – RDF may be at odds with ISAD(G)

Speaker notes:
Steep learning curve:
- RDF/Linked Data modelling terminology
- Lack of archive-domain examples (though you now have LOCAH!)
- A certain level of expertise is needed
Dirty data:
- e.g. entries such as ‘Joe Bloggs and others’ rather than just a name, or access points that do not have rules or a source associated with them
- Extent data is highly variable
Complexity:
- “Lower-level” units are interpreted in the context of the higher levels of description
- They are arguably “incomplete” without the contextual data
- Relations are asserted, e.g. member-of/component-of, but there is no requirement or expectation that data consumers will follow the links describing the relations
From Pete’s blog post: “In a post on the Archives Hub blog, Jane emphasised the value of the “Linked Data” approach in making things mentioned in our data into “first-class citizens”. One consequence of the multi-level approach in archival description practice is a strong sense of the importance of “context”, and that the descriptions of the “lower level” units should be read and interpreted in the context of the higher levels of description (perhaps even that they are in some sense “incomplete” without that “contextual” data). In contrast, the “Linked Data” approach typically involves exposing “bounded descriptions” of individual resources. Now, certainly, yes, those “bounded descriptions” include assertions of relationships with other resources (including the sort of part-whole/member-of/component-of relationships present here), and those links can be followed by consumers to obtain further information on the other resources – however, there is no requirement or expectation that consumers will do so. So, there is arguably a (perhaps unavoidable) element of tension between the strongly “contextual” emphasis of EAD and ISAD(G) and the “bounded descriptions” of “Linked Data”. Rather than seeing that as an insurmountable hurdle, however, I think it provides an area that the project can usefully explore and evaluate.”
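The tension between bounded descriptions and archival context can be made concrete. In the sketch below, each unit's description holds only its own title and a partOf link; a consumer that reads only the item sees "Letter, 1884", and the multi-level context is recovered only if it chooses to follow the links upward. The unit URIs and titles are hypothetical.

```python
# Hypothetical bounded descriptions: each unit knows only its own title and parent.
UNITS = {
    "ex:fonds": {"title": "Papers of an example family", "partOf": None},
    "ex:series": {"title": "Correspondence", "partOf": "ex:fonds"},
    "ex:item": {"title": "Letter, 1884", "partOf": "ex:series"},
}

def context_path(unit: str, units: dict) -> list[str]:
    """Follow partOf links upward and return titles from the top level down."""
    titles = []
    seen = set()
    current = unit
    while current is not None and current not in seen:  # stop at the top or on a cycle
        seen.add(current)
        titles.append(units[current]["title"])
        current = units[current]["partOf"]
    return list(reversed(titles))

print(" > ".join(context_path("ex:item", UNITS)))
```

Nothing in the data forces a consumer to make this walk, which is exactly the point made in the quoted blog post.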
Page 21

Hub data inconsistencies

• Winston Leonard Churchill
• Sir Winston Leonard Spencer Churchill
• Churchill, Sir, Winston Leonard Spencer, 1874-1965, knight, prime minister and historian
• Churchill, Winston Leonard, 1874-1965, prime minister
• Churchill, Sir Winston, 1874-1965, knight, statesman and historian

Speaker notes:
Names are often entered into the Hub in different ways, despite the use of Rules.
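One crude way to group such variants is to reduce each heading to a match key. This sketch handles only the two patterns on this slide ("Forenames Surname" and "Surname, Forenames, dates, epithet"), uses an illustrative honorifics list, and keys on surname plus first forename. That key is a simplification chosen for the example: it would also merge different people who share a surname and first forename, so life dates, where present, would be needed to disambiguate.

```python
import re
from collections import defaultdict

HONORIFICS = {"sir", "dame", "lord", "lady"}  # illustrative, far from complete
YEAR = re.compile(r"\d{4}")

def match_key(name: str) -> tuple[str, str]:
    """Reduce a personal-name heading to (surname, first forename), lower-cased.

    Handles only two patterns: 'Forenames Surname' and
    'Surname, Forenames, dates, epithet'.
    """
    if "," in name:
        surname, *rest = [part.strip() for part in name.split(",")]
        forenames = []
        for part in rest:
            if YEAR.search(part):
                break  # the dates end the name proper; epithets follow
            forenames.extend(part.split())
    else:
        *forenames, surname = name.split()
    forenames = [w for w in forenames if w.lower() not in HONORIFICS]
    return surname.lower(), (forenames[0].lower() if forenames else "")

VARIANTS = [
    "Winston Leonard Churchill",
    "Sir Winston Leonard Spencer Churchill",
    "Churchill, Sir, Winston Leonard Spencer, 1874-1965, knight, prime minister and historian",
    "Churchill, Winston Leonard, 1874-1965, prime minister",
    "Churchill, Sir Winston, 1874-1965, knight, statesman and historian",
]

clusters = defaultdict(list)
for variant in VARIANTS:
    clusters[match_key(variant)].append(variant)
for key, names in clusters.items():
    print(key, len(names))
```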
Page 22

Understanding Vocabs & Ontologies

Speaker notes:
One of the challenges of doing Linked Data is the plethora of vocabularies. It is hard to decide what we should use. Daniel Suara highlighted this.
Page 23

Linking Names

http://archiveshub.ac.uk/blog/2013/08/hub-viaf-namematching/

Page 24

Linking Subjects

Speaker notes:
But matching strings is not easy, e.g. matching subjects in the Hub with subjects in LCSH.
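To show why this is not easy, here is a minimal sketch of matching subject strings against headings: normalise case, trailing full stops and dash spacing, try an exact match, then fall back to a fuzzy match from the standard library's `difflib` above a similarity cutoff. The subject strings and the small stand-in for LCSH are hypothetical, the 0.8 cutoff is arbitrary, and the normalisation turns any hyphen into a subdivision separator (applied to both sides, but crude). This is not the matching the Hub actually used.

```python
import difflib
import re

# Hypothetical Hub subject strings and a tiny stand-in for LCSH headings.
HUB_SUBJECTS = ["Poor - Great Britain", "Socialism.", "Women social reformers"]
LCSH_HEADINGS = ["Poor--Great Britain", "Socialism", "Women social reformers--Great Britain"]

def normalise(heading: str) -> str:
    """Lower-case, treat any dash run as a subdivision separator, drop trailing full stops."""
    heading = re.sub(r"\s*-+\s*", "--", heading.strip().lower())
    return heading.rstrip(".")

def best_match(subject: str, headings: list[str], cutoff: float = 0.8):
    """Exact match on the normalised form first, then the closest fuzzy match above cutoff."""
    index = {normalise(h): h for h in headings}
    key = normalise(subject)
    if key in index:
        return index[key], 1.0
    close = difflib.get_close_matches(key, list(index), n=1, cutoff=cutoff)
    if close:
        return index[close[0]], difflib.SequenceMatcher(None, key, close[0]).ratio()
    return None, 0.0

for s in HUB_SUBJECTS:
    print(s, "->", best_match(s, LCSH_HEADINGS))
```

The first two subjects match once formatting differences are normalised away. "Women social reformers" finds nothing at the 0.8 cutoff; lowering the cutoff to 0.7 would pair it with the narrower "Women social reformers--Great Britain", which may or may not be what a cataloguer wants. That judgement is where string matching alone runs out.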
Page 25

Thoughts on What Next?

• We still need more convincing use / business cases
  – Clear articulation of what researchers actually gain by bringing diverse data together

• We still need more and better tools
  – But this depends on use cases

• Cultural heritage not working together enough
  – Better collaboration on things like name URIs

• Coordinated, consistent approach for vocabs

Speaker notes:
Quotes from the Linking Lives evaluation: Researchers want a clearer idea of what is covered, and they don’t always understand the results they see or why they get certain results in response to their searches. I can’t help thinking that, bearing this in mind, bringing diverse sources together may make it more difficult for users to understand and interpret results. “They remained cautious about the principle of bringing sources together.” On serendipitous searching, there was a feeling that it could potentially be useful but also that it could actually distract the researcher from what is relevant. “I think at PhD level there’s a kind of artistry to how you make your way through… I’ve certainly never come across a search engine that can do the same or be as complex as your own thinking patterns.” Whilst it could be said that it is not important for users to understand how data is pulled together under the hood, our research suggested that potential users, particularly advanced researchers, do indeed have an interest in how and why this information has been gathered together in a particular way.
Page 26

Adrian Stevenson
@adrianstevenson
More on Linked Data at:
http://archiveshub.ac.uk/linkinglives/
http://data.archiveshub.ac.uk/
http://archiveshub.ac.uk/locah/

Page 27

This presentation is available under a Creative Commons Non-Commercial-Share Alike licence: http://creativecommons.org/licenses/by-nc/2.0/uk/