About the Challenges of Linked Open Data (LOD) in Libraries

Ansgar Scherp
104th Bibliothekartag, Nuremberg, Germany, May 2015

Page 2: Index Newly Acquired Media

• Ancient world: Library of Alexandria
• Today: database-oriented systems
• Tomorrow: Linked Open Data in libraries on the Web

Source: http://en.wikipedia.org/wiki/Library_of_Alexandria

Page 3: Linked Open Data (LOD) in Libraries

• Publishing and interlinking of data
• Of different quality and purpose
• From different sources on the Web

World Wide Web          Linked Data
Documents               Data
Hyperlinks              Typed links
HTML                    RDF
Addresses (URIs)        Addresses (URIs)

Example (World Wide Web): http://www.bibliothekartag2015.de
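As a minimal sketch of what "typed links" means in practice, the following Python snippet (standard library only) writes a few RDF statements in N-Triples syntax. The predicates `dcterms:creator`, `dcterms:title`, and `owl:sameAs` are real vocabulary terms chosen here for illustration; all resource URIs under `example.org` and `example.com` are hypothetical and not taken from the talk.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement. Subject and predicate are URIs; the object is a URI or a plain literal."""
    subject: str
    predicate: str
    obj: str
    obj_is_uri: bool = True


def escape_literal(text: str) -> str:
    """Escape characters that N-Triples does not allow unescaped inside a literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(triples: list[Triple]) -> str:
    """Serialize triples as N-Triples: one '<s> <p> o .' statement per line."""
    lines = []
    for t in triples:
        obj = f"<{t.obj}>" if t.obj_is_uri else f'"{escape_literal(t.obj)}"'
        lines.append(f"<{t.subject}> <{t.predicate}> {obj} .")
    return "\n".join(lines)


# Real vocabulary terms (Dublin Core, OWL); the resource URIs are hypothetical.
DCT_CREATOR = "http://purl.org/dc/terms/creator"
DCT_TITLE = "http://purl.org/dc/terms/title"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

BOOK = "http://example.org/library/book/42"
PERSON = "http://example.org/library/person/7"

graph = [
    # Unlike an HTML hyperlink, the predicate states how the two resources relate.
    Triple(BOOK, DCT_CREATOR, PERSON),
    Triple(BOOK, DCT_TITLE, "An example title", obj_is_uri=False),
    # A URI alias: another dataset's identifier for the same real-world person.
    Triple(PERSON, OWL_SAME_AS, "http://example.com/authority/person/7"),
]

print(to_ntriples(graph))
```

Each output line is a self-describing, typed link, which is what lets data from different sources be merged simply by concatenating statements.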

Page 4: Linked (Library) Data: A Success Story

Page 5: Current (Technological) Topics *)

Core computer science:
1. Entity resolution
2. Schema matching
3. Distributed data management
4. Automatic indexing
5. Indexing non-textual content
6. Data provenance

Non-technical but equally important:
• Quality management (e.g., of automated indexing)
• Legal aspects
• Job market

*) Disclaimer: No guarantee of completeness

Page 6: 1. Entity Resolution

• Intra-library
  • Identify the author of a new publication
• Inter-library
  • Linking records via, e.g., authors

Helmut Kohl      vs.   Helmut Kohl
Ansgar Scherp    vs.   Ansgar Scherp
ZBW/DNB                DBLP
1995                   2005
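The name collisions above can be illustrated with a toy matcher. This is a minimal sketch under stated assumptions, not the procedure used at ZBW or DNB: the records are hypothetical, names are compared after normalization, and a match additionally requires that the co-author sets overlap with a Jaccard similarity of at least 0.2, an arbitrary threshold chosen for illustration.

```python
import unicodedata
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AuthorRecord:
    """A hypothetical author record as it might appear in one catalog."""
    record_id: str
    name: str
    coauthors: frozenset[str] = field(default_factory=frozenset)


def normalize_name(name: str) -> str:
    """Lower-case, strip accents, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def same_person(r1: AuthorRecord, r2: AuthorRecord, threshold: float = 0.2) -> bool:
    """Match only if the names agree AND the co-author sets overlap enough.

    The threshold is an illustrative assumption, not a tuned value.
    """
    if normalize_name(r1.name) != normalize_name(r2.name):
        return False
    return jaccard(r1.coauthors, r2.coauthors) >= threshold


# Hypothetical records from two catalogs; co-author names are placeholders.
record_a = AuthorRecord("catalog-A:1", "Helmut Kohl", frozenset({"Author X", "Author Y"}))
record_b = AuthorRecord("catalog-B:9", "Helmut  KOHL", frozenset({"Author Z"}))
record_c = AuthorRecord("catalog-B:3", "Helmut Kohl", frozenset({"Author X"}))

print(same_person(record_a, record_b))  # False: same name, no shared co-authors
print(same_person(record_a, record_c))  # True: same name, Jaccard = 1/2
```

Name agreement alone would merge all three records; the extra co-author evidence separates the likely homonym, although in real catalogs such simple features are often not enough.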

Page 7: 1. Entity Resolution in LOD

• Use URI aliases to connect LOD resources
• Aliases describe the same things in the real world
• Service for sameAs links: .org
• Resolution by name, co-authors, title, and venue is often not sufficient

Dataset                                  Persons       Organizations
DBpedia                                  364,000       148,000
Library of Congress Authorities          3,800,000     900,000
German National Library Authority File   1,797,911     1,262,404
Virtual International Authority File     10 million    3.25 million

Source: J. Neubert, K. Tochtermann: Linked Library Data: Offering a Backbone for the Semantic Web, CiCIS, 2012.
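Because `owl:sameAs` is symmetric and transitive, a set of pairwise alias links implies larger groups of URIs that all denote the same entity. A minimal sketch of computing these groups with a union-find (disjoint-set) structure, assuming the links have already been collected from an alias service or from authority files; all URIs are hypothetical.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}
        self.size: dict[str, int] = {}

    def find(self, x: str) -> str:
        # Register unseen elements lazily as singleton sets.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree below the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def same_as_clusters(links: list[tuple[str, str]]) -> list[set[str]]:
    """Group URIs into the equivalence classes implied by owl:sameAs links."""
    uf = UnionFind()
    for a, b in links:
        uf.union(a, b)
    groups: dict[str, set[str]] = defaultdict(set)
    for uri in list(uf.parent):
        groups[uf.find(uri)].add(uri)
    return sorted(groups.values(), key=min)


# Hypothetical alias links between records in different authority files.
links = [
    ("http://example.org/dnb/p1", "http://example.org/viaf/v1"),
    ("http://example.org/viaf/v1", "http://example.org/lc/n1"),
    ("http://example.org/dblp/a9", "http://example.org/dbpedia/X"),
]

for cluster in same_as_clusters(links):
    print(sorted(cluster))
```

Transitivity also spreads mistakes: a single wrong alias link merges two otherwise separate groups, which is one reason why link quality matters so much in this setting.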

Page 8: 2. Schema Matching: STW and TheSoz

Mapping between the Standard Thesaurus Wirtschaft (STW) and TheSoz (GESIS):
• ~5,000 manually created mappings (mostly 2004/2005)
• Also connected to GND and AGROVOC
• OAEI Library Track for ontology matching (since 2012)
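The STW/TheSoz mappings above were created manually. Many automatic ontology matchers use label similarity as one of their signals; a minimal sketch of that single step, using Python's `difflib.SequenceMatcher` as a stand-in string similarity, could propose candidate mappings for human review. The concept IDs and labels are invented, and the 0.85 threshold is an assumption.

```python
from difflib import SequenceMatcher


def label_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] computed by difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_mappings(
    source: dict[str, list[str]],
    target: dict[str, list[str]],
    threshold: float = 0.85,
) -> list[tuple[str, str, float]]:
    """Propose (source_id, target_id, score) pairs whose best-matching label pair
    scores at or above the threshold, highest scores first."""
    proposals = []
    for s_id, s_labels in source.items():
        for t_id, t_labels in target.items():
            best = max(label_similarity(a, b) for a in s_labels for b in t_labels)
            if best >= threshold:
                proposals.append((s_id, t_id, round(best, 3)))
    return sorted(proposals, key=lambda p: -p[2])


# Invented concepts and labels, loosely modeled on two thesauri.
stw_like = {
    "stw:A": ["Labour market", "Job market"],
    "stw:B": ["Unemployment"],
}
thesoz_like = {
    "thesoz:1": ["labor market"],
    "thesoz:2": ["unemployment"],
    "thesoz:3": ["employment policy"],
}

for s, t, score in candidate_mappings(stw_like, thesoz_like):
    print(s, "->", t, score)
```

Pure string similarity catches spelling variants such as "Labour"/"labor" but misses synonyms with different wording, so its output is best treated as suggestions for a human reviewer.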

Page 9: VIAF (Virtual International Authority File)

• Combines multiple name authority files (http://viaf.org/)
• Aims to lower the costs and increase the utility of library authority files
• Matches and links widely used authority files and makes that information available on the Web

Page 10: Subject Indexing in …

• Auto-completion suggests terms from PND, STW, …
• The author confirms by selecting terms
• The keyword is matched to the semantic concept
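The workflow above (suggest terms, let the author confirm, store the concept) can be sketched as a prefix search over sorted vocabulary labels, each mapped to a concept URI, so that the author's selection yields a concept rather than a free-text keyword. This is a minimal sketch; the labels and URIs are hypothetical stand-ins for PND or STW entries, and a production system would more likely use a search index.

```python
import bisect


class PrefixSuggester:
    """Suggest vocabulary labels by prefix; each label maps to a concept URI."""

    def __init__(self, label_to_uri: dict[str, str]) -> None:
        # Keep (case-folded label, original label, URI) sorted for binary search.
        self._entries = sorted(
            (label.casefold(), label, uri) for label, uri in label_to_uri.items()
        )
        self._keys = [key for key, _, _ in self._entries]

    def suggest(self, prefix: str, limit: int = 5) -> list[tuple[str, str]]:
        """Return up to `limit` (label, uri) pairs whose label starts with `prefix`."""
        p = prefix.casefold()
        # All matches form one contiguous run starting at the insertion point.
        start = bisect.bisect_left(self._keys, p)
        results = []
        for key, label, uri in self._entries[start:]:
            if not key.startswith(p) or len(results) == limit:
                break
            results.append((label, uri))
        return results


# Hypothetical labels and concept URIs, for illustration only.
vocab = {
    "Labour market": "http://example.org/concept/1",
    "Labour law": "http://example.org/concept/2",
    "Land use": "http://example.org/concept/3",
    "Monetary policy": "http://example.org/concept/4",
}
suggester = PrefixSuggester(vocab)

# The author types "lab" and confirms one suggestion; the stored
# subject is then the concept URI, not the typed string.
choices = suggester.suggest("lab")
chosen_label, chosen_uri = choices[0]
print(chosen_label, chosen_uri)
```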

Page 12: 4. Automated Indexing in GERHARD

• ~1 million web documents
• ~10,000 concepts from the UDC
• 3 languages (EN, DE, FR)

Page 13: 4. Automated Indexing at ZBW

• 1.6 million documents with STW annotations in LOD
• Average of 5 descriptors per document
• Multi-labeling of scientific documents using a kNN classifier with entity detection and the HITS algorithm
  • Experiments on 62,000 open-access documents
  • Average recall of 40% and precision of 40%
  • Outperforms current approaches such as Maui
  • Moreover, it does not require expensive training phases
• Integrate automatic classification methods into a semi-automatic workflow (“human in the loop”)
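The slide names a kNN classifier combined with entity detection and the HITS algorithm; how these are combined is not described here, and the sketch below does not reproduce it. As a much simpler illustration of the kNN multi-labeling idea alone, assume each document is a bag of words: find the k most similar already-indexed documents by cosine similarity, let them vote (weighted by similarity) for their descriptors, and keep the top-voted ones. A helper then computes precision and recall, the two measures reported above. All documents, descriptors, and the parameters k = 2 and max_labels = 2 are invented.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Term-frequency vector from whitespace tokenization (no stemming or stop words)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_labels(
    doc: Counter,
    indexed: list[tuple[Counter, set[str]]],
    k: int = 2,
    max_labels: int = 2,
) -> set[str]:
    """Multi-label kNN: the k nearest neighbours vote for their descriptors."""
    neighbours = sorted(indexed, key=lambda item: cosine(doc, item[0]), reverse=True)[:k]
    votes: Counter = Counter()
    for vec, labels in neighbours:
        sim = cosine(doc, vec)
        for label in labels:
            votes[label] += sim
    return {label for label, score in votes.most_common(max_labels) if score > 0}


def precision_recall(predicted: set[str], gold: set[str]) -> tuple[float, float]:
    """Precision = correct / predicted; recall = correct / gold."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Invented, already-indexed documents with invented descriptors.
indexed_docs = [
    (bag_of_words("labour market unemployment wages"), {"Labour market", "Unemployment"}),
    (bag_of_words("central bank interest rates inflation"), {"Monetary policy", "Inflation"}),
    (bag_of_words("minimum wages labour market effects"), {"Labour market", "Minimum wage"}),
]

new_doc = bag_of_words("effects of unemployment on wages in the labour market")
predicted = knn_labels(new_doc, indexed_docs, k=2, max_labels=2)
p, r = precision_recall(predicted, gold={"Labour market", "Wages"})
print(sorted(predicted), p, r)  # ['Labour market', 'Unemployment'] 0.5 0.5
```

Because kNN only stores the indexed documents and compares at query time, there is no separate training phase, which fits the point made on the slide.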

Page 14: 6. Data Provenance

• VIAF: inter-organizational, cross-border, and thus cross-lingual record linkage
• Records may come from different libraries
• But how to …
  • track metadata (re)use?
  • refer to the original metadata when library A uses (part of) a record from library B?
• Digitally signing and publishing metadata as LOD
• Allows building a network of trust
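The proposal above is to sign published metadata so that reuse can be traced back to its origin. Real digital signatures use public-key cryptography (for example RSA or Ed25519), which the Python standard library does not provide. The sketch below therefore swaps in an HMAC over a canonical serialization of a record: it shows how a fingerprint binds the metadata content and how tampering is detected, but, unlike a public-key signature, verification requires sharing the secret key. The record fields, the key, and the `derivedFrom` field name are hypothetical.

```python
import hashlib
import hmac
import json


def canonical_bytes(record: dict) -> bytes:
    """Deterministic serialization: sorted keys, no extra whitespace, UTF-8."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def sign_record(record: dict, key: bytes) -> str:
    """HMAC-SHA256 tag over the canonical record (stand-in for a real signature)."""
    return hmac.new(key, canonical_bytes(record), hashlib.sha256).hexdigest()


def verify_record(record: dict, tag: str, key: bytes) -> bool:
    """Constant-time comparison of the recomputed tag and the supplied tag."""
    return hmac.compare_digest(sign_record(record, key), tag)


# Hypothetical record published by "library B".
key_b = b"secret key held by library B (hypothetical)"
record = {
    "id": "http://example.org/libraryB/record/123",
    "title": "An example title",
    "creator": "http://example.org/libraryB/person/7",
}
tag = sign_record(record, key_b)

# "Library A" reuses the record and keeps a provenance note pointing back to
# the original and its tag (in LOD, PROV-O's prov:wasDerivedFrom could express the link).
reused = {"derivedFrom": record["id"], "originalTag": tag}

ok = verify_record(record, tag, key_b)  # True: content unchanged
tampered = dict(record, title="A modified title")
still_ok = verify_record(tampered, tag, key_b)  # False: content changed
print(ok, still_ok)
```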

Page 15: Summary

• Libraries as an innovation driver for Linked Open Data
• Interesting research topics for computer science
• Both data and expertise are available

• Present of representing metadata and record linkage

Interested? Contact me:
Ansgar Scherp
http://zbw.eu/en/research/knowledge-discovery