co-funded by the European Union

Linked Data Mapping Cultures: An Evaluation of Metadata Usage and Distribution in a Linked Data Environment

Konstantin Baierer, Evelyn Dröge, Vivien Petras, Violeta Trkulja
Berlin School of Library and Information Science, Humboldt-Universität zu Berlin

Presentation at the International Conference on Dublin Core and Metadata Applications, Austin, October 9, 2014


Page 1: Dc 2014 baierer-droege

co-funded by the European Union

Linked Data Mapping Cultures An Evaluation of Metadata Usage and Distribution

in a Linked Data Environment

Konstantin Baierer, Evelyn Dröge, Vivien Petras, Violeta Trkulja Berlin School of Library and Information Science, Humboldt-Universität zu Berlin

Presentation at the International Conference on Dublin Core and Metadata Applications Austin, October 9, 2014

Page 2: Dc 2014 baierer-droege

Outline

Linked Data Mapping Cultures 2 09.10.2014

1. Linked Data mapping cultures

2. Digitised Manuscripts to Europeana

3. EDM and DM2E model

4. Evaluation: aim, datasets, methods

5. Results of the evaluation

6. Conclusion

Page 3: Dc 2014 baierer-droege

Linked Data mapping cultures

• Linked Data offers great expressivity

With great freedom comes great responsibility

• Data in DM2E:

– Different data formats

– Different data curation backgrounds

= Different cultures in Linked Data

• Data providers ≠ data mapping institutions

• Mapping is influenced by policies, technology, best practices, personal preferences…


Page 4: Dc 2014 baierer-droege

Digitised Manuscripts to Europeana (DM2E)


Heterogeneous object data in independent resources

Page 5: Dc 2014 baierer-droege

EDM and DM2E model

EDM = Europeana Data Model

• Used to describe Cultural Heritage Objects (CHOs)

• Very generic but can be specialized

DM2E model: Specialization of EDM for manuscripts


dm2e: <http://onto.dm2e.eu/schemas/dm2e/1.0/>

dm2edata: <http://data.dm2e.eu/data/>

edm: <http://www.europeana.eu/schemas/edm/>

Page 6: Dc 2014 baierer-droege

DM2E model: Example


[Figure: example graph in the DM2E model, connecting the following resources]

• ore:Aggregation: dm2edata:aggregation/uib/wab/Ms-115/Ms-115-2

• edm:ProvidedCHO: dm2edata:item/uib/wab/Ms-115/Ms-115-2

– dc:type dm2e:Paragraph

– Literal: “remark Ms-115,1[2]et2[1] from Wittgenstein Nachlass MS 115”@en

• foaf:Person: dm2edata:agent/uib/wab/Ludwig_Wittgenstein

– skos:prefLabel “Ludwig Wittgenstein”@de

• foaf:Organization: dm2edata:agent/uib/wab/Wittgenstein_Archives

• edm:WebResource: http://wab.uib.no/cost-a32_fax/115/Ms-115%2c1.jpg

Page 7: Dc 2014 baierer-droege

Aim of the evaluation

• Evaluation of datasets from the DM2E project

– Based on mappings to the DM2E model

• Aim: discover similarities and differences between datasets from different mapping institutions


Do mapping preferences of individual institutions influence the resulting data from a mapping process?

Page 8: Dc 2014 baierer-droege

Analyzed datasets

• Datasets as of May 1, 2014

• Analyzed datasets:

– Eight data providers (DP I – DP VIII)

– Ten datasets (Dataset 1 – 10)

– Six mapping institutions (MI A – F)

– Variety of metadata formats


DP        Dataset      Metadata format      MI
DP I      Dataset 1    proprietary format   MI A
DP I      Dataset 2    proprietary format   MI A
DP II     Dataset 3    MAB2                 MI B
DP II     Dataset 4    MAB2                 MI B
DP III    Dataset 5    METS/MODS            MI C
DP IV     Dataset 6    METS/MODS            MI C
DP V      Dataset 7    TEI P5               MI D
DP VI     Dataset 8    EAD                  MI D
DP VII    Dataset 9    TEI P5               MI E
DP VIII   Dataset 10   TEI P5               MI F

DP: Data Provider; MI: Mapping institution

Page 9: Dc 2014 baierer-droege

Evaluation methods

• Count (SPARQL)

– per dataset

– globally

– per rdf:type and dc:type

• Create metrics (Python)

• Visualize (Google Charts)

• All visualizations: http://data.dm2e.eu/visualize
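The counting step can be illustrated with a minimal sketch. The slides state that the counts were obtained with SPARQL; the plain-Python version below is only a stand-in that mirrors the three groupings (per dataset, globally, per rdf:type and dc:type). The sample statements, the shortened identifiers, and the function names are hypothetical and not taken from the DM2E data.

```python
from collections import Counter

# Hypothetical sample: (dataset, subject, predicate, object) tuples.
# Identifiers are shortened placeholders, not real DM2E URIs.
QUADS = [
    ("Dataset 1", "item/1", "rdf:type", "edm:ProvidedCHO"),
    ("Dataset 1", "item/1", "dc:type", "dm2e:Page"),
    ("Dataset 1", "item/1", "dc:title", '"Page 1"'),
    ("Dataset 1", "item/2", "rdf:type", "edm:ProvidedCHO"),
    ("Dataset 1", "item/2", "dc:type", "dm2e:Manuscript"),
    ("Dataset 2", "item/3", "rdf:type", "edm:ProvidedCHO"),
    ("Dataset 2", "item/3", "dc:type", "bibo:Book"),
    ("Dataset 2", "agent/1", "rdf:type", "foaf:Person"),
]


def count_statements(quads):
    """Return (statements per dataset, global number of statements)."""
    per_dataset = Counter(dataset for dataset, _, _, _ in quads)
    return per_dataset, sum(per_dataset.values())


def count_by_type(quads, type_predicate):
    """Count distinct subjects per (dataset, type value) for one typing predicate."""
    typed = {(dataset, s, o) for dataset, s, p, o in quads if p == type_predicate}
    return Counter((dataset, o) for dataset, _, o in typed)


per_dataset, total = count_statements(QUADS)
print(per_dataset, total)
print(count_by_type(QUADS, "rdf:type"))
print(count_by_type(QUADS, "dc:type"))
```

Grouping by dataset first keeps the per-dataset and global figures consistent, since the global count is simply the sum of the per-dataset counts.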

Page 10: Dc 2014 baierer-droege

Results

Page 11: Dc 2014 baierer-droege

CHO types

Dataset     bibo:Series  bibo:Book  dm2e:Manuscript  dm2e:Paragraph  bibo:Journal  bibo:Issue  fabio:Article  bibo:Letter  dm2e:Page
Dataset 1   -            -          24               -               -             -           -              -            10,427
Dataset 2   -            1,251      10               -               -             -           -              -            530,314
Dataset 3   4,552        39,873     -                -               -             -           -              -            -
Dataset 4   -            -          175              -               -             -           -              -            46,006
Dataset 5   -            -          1,012            -               -             -           -              -            307,202
Dataset 6   -            2,916      -                -               -             -           -              -            472,994
Dataset 7   -            1,295      -                -               -             -           -              -            416,172
Dataset 8   -            -          -                -               -             -           -              3,630        34,596
Dataset 9   -            -          -                -               1             346         42,173         -            159,277
Dataset 10  -            -          20               9,635           -             -           -              -            -
Total       4,552        45,335     1,241            9,635           1             346         42,173         3,630        1,976,988
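As a quick consistency check, the column totals of the CHO type table can be recomputed from the rows. The sketch below assumes that "-" means zero and that the three values given for Dataset 2 belong to bibo:Book, dm2e:Manuscript and dm2e:Page, which is the placement under which every column sum agrees with the Total row. The variable and function names are illustrative only.

```python
COLUMNS = ["bibo:Series", "bibo:Book", "dm2e:Manuscript", "dm2e:Paragraph",
           "bibo:Journal", "bibo:Issue", "fabio:Article", "bibo:Letter", "dm2e:Page"]

# Rows transcribed from the table; "-" is treated as 0 (assumption).
ROWS = {
    "Dataset 1": [0, 0, 24, 0, 0, 0, 0, 0, 10427],
    "Dataset 2": [0, 1251, 10, 0, 0, 0, 0, 0, 530314],  # column placement assumed
    "Dataset 3": [4552, 39873, 0, 0, 0, 0, 0, 0, 0],
    "Dataset 4": [0, 0, 175, 0, 0, 0, 0, 0, 46006],
    "Dataset 5": [0, 0, 1012, 0, 0, 0, 0, 0, 307202],
    "Dataset 6": [0, 2916, 0, 0, 0, 0, 0, 0, 472994],
    "Dataset 7": [0, 1295, 0, 0, 0, 0, 0, 0, 416172],
    "Dataset 8": [0, 0, 0, 0, 0, 0, 0, 3630, 34596],
    "Dataset 9": [0, 0, 0, 0, 1, 346, 42173, 0, 159277],
    "Dataset 10": [0, 0, 20, 9635, 0, 0, 0, 0, 0],
}
TOTAL = [4552, 45335, 1241, 9635, 1, 346, 42173, 3630, 1976988]


def column_sums(rows):
    """Sum each CHO type column over all datasets."""
    return [sum(column) for column in zip(*rows.values())]


# Collect every column whose recomputed sum differs from the Total row.
mismatches = {
    name: (got, want)
    for name, got, want in zip(COLUMNS, column_sums(ROWS), TOTAL)
    if got != want
}
print(mismatches or "all column totals match")
```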


Page 12: Dc 2014 baierer-droege

Distribution of classes


Page 13: Dc 2014 baierer-droege

Distribution of properties


Page 14: Dc 2014 baierer-droege

Usage of different ontologies


Page 15: Dc 2014 baierer-droege

Resources vs. literals


Page 16: Dc 2014 baierer-droege

Literal statements


Page 17: Dc 2014 baierer-droege

Predicate-Object-Equality-Ratio (POER-n)


Triples:

S1 P1 O1
S2 P1 O1
S3 P1 O1
S3 P2 O2
S4 P1 O2
S4 P2 O2
S4 P2 O3

POER-n:

POER-1: 85.71 %
POER-2: 57.14 %
POER-3: 57.14 %
POER-4: 0 %

POER-1 in DM2E datasets: 0.08 – 2.48 %

[Figure: graph of the example triples]
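The slide gives no formula for POER-n, so the following is a sketch under an assumed definition: POER-n is the share of triples whose predicate-object pair occurs more than n times in the dataset. Under that assumption, the four percentages above are reproduced only if the fifth triple reads S4 P1 O1; with S4 P1 O2 as listed, the same function would return 71.43 % for POER-1. The function name and the adjusted example set are illustrative, not the authors' implementation.

```python
from collections import Counter


def poer(triples, n):
    """Percentage of triples whose (predicate, object) pair occurs more than n times.

    This definition is an assumption inferred from the example percentages.
    """
    if not triples:
        return 0.0
    pair_counts = Counter((p, o) for _, p, o in triples)
    shared = sum(1 for _, p, o in triples if pair_counts[(p, o)] > n)
    return 100.0 * shared / len(triples)


EXAMPLE = [
    ("S1", "P1", "O1"),
    ("S2", "P1", "O1"),
    ("S3", "P1", "O1"),
    ("S3", "P2", "O2"),
    ("S4", "P1", "O1"),  # assumption: listed on the slide as "S4 P1 O2"
    ("S4", "P2", "O2"),
    ("S4", "P2", "O3"),
]

for n in range(1, 5):
    print(f"POER-{n}: {poer(EXAMPLE, n):.2f} %")
```

A low POER-1, as reported for the DM2E datasets, would then mean that few statements repeat the same predicate-object combination across subjects.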

Page 18: Dc 2014 baierer-droege

Average number of statements (ANOS)


Page 19: Dc 2014 baierer-droege

Conclusion

• Linked Data quality assurance is vital

• Structural metrics help everybody

• Ontology engineering as a cyclic process

• “Ontology pruning”

• People > data in metadata mapping


Page 20: Dc 2014 baierer-droege

Thank you for your attention!

Konstantin Baierer

Evelyn Dröge

Berlin School of Library and Information Science

Humboldt-Universität zu Berlin

www.ibi.hu-berlin.de

Digitised Manuscripts to Europeana

www.dm2e.eu

