DM2E Project meeting Bergen: WP2: Contextualization, Dominique Ritze (University of Mannheim)

co-funded by the European Union

Contextualisation

Dominique Ritze

Motivation

18.06.2014

Who is George Grote?

Which resources share the same subjects?

Presenter notes:
GND (Gemeinsame Normdatei, the universal authority file published by the German National Library): persons, subject headings, corporate bodies

Example

Work: Der zerbrochene Krug

Presenter notes:
GND (Gemeinsame Normdatei, the German National Library's universal authority file)

Example

Author: Ludwig Wittgenstein

Presenter notes:
VIAF (Virtual International Authority File) links the national authority files

Example

Owner: Prinz Eugen von Savoyen

Presenter notes:
DBpedia/Wikipedia

Example

Subject: Administration

Example

Place: Berlin

Overview

• Silk as Contextualisation Tool

• System Integration

• Contextualisation Progress and Results

• Challenges

• Applicability and Reusability

• Future Plans

Contextualisation with Silk

• Silk: Link Discovery Framework (UMA)

• Definition of linkage rules to create links between Linked Data resources

• http://context.dm2e.eu

Presenter notes:
Structured information

Integration of Silk

• Silk is integrated in OmNom as Web Service

use generated configuration

show links

Access to Contextualisation Results

• Contextualization results (Linksets) are kept separate from ingested data

• Linksets are further described and versioned

• Additional linkset properties (tbd):
– Automatically created
– Manually created
– Recall-oriented (exploratory, but with wrong links)
– Precision-oriented (incomplete, but high quality)
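The linkset description above could be modelled roughly as follows. This is only an illustrative sketch: the class and field names (`Linkset`, `Creation`, `Orientation`, `precision_oriented`) and the example dataset names are hypothetical, and the slides mark the actual properties as still to be defined.

```python
from dataclasses import dataclass
from enum import Enum


class Creation(Enum):
    AUTOMATIC = "automatically created"
    MANUAL = "manually created"


class Orientation(Enum):
    RECALL = "recall-oriented"        # exploratory, may contain wrong links
    PRECISION = "precision-oriented"  # incomplete, but high quality


@dataclass(frozen=True)
class Linkset:
    """Hypothetical description of one versioned linkset, stored apart from the ingested data."""
    source_dataset: str
    target_resource: str
    version: int
    creation: Creation
    orientation: Orientation


def precision_oriented(linksets):
    """Keep only the linksets that were built for high-quality links."""
    return [ls for ls in linksets if ls.orientation is Orientation.PRECISION]


linksets = [
    Linkset("dataset-a", "GND", 1, Creation.AUTOMATIC, Orientation.RECALL),
    Linkset("dataset-a", "GND", 2, Creation.MANUAL, Orientation.PRECISION),
]
selected = precision_oriented(linksets)
```

Keeping the description separate from the links themselves would let a consumer choose between an exploratory and a high-quality linkset without touching the ingested data.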

Used Linked Data Resources

[Figure: the Linked Data resources used (GeoNames, GND, LCSH, DBpedia, Freebase, DDC, LinkedGeoData), grouped by entity type: places, subjects, agents]

Example Process

• Manual creation of linkage rules, e.g. compare skos:prefLabel with rdfs:label using Levenshtein distance, link if distance < 2

• Let Silk run to find the links
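The example rule above can be sketched in plain Python. This is not Silk's configuration format or API; it is a minimal illustration of the comparison itself, assuming the labels are already loaded into two hypothetical dictionaries that map resource URIs to their skos:prefLabel and rdfs:label values. The function names and the example URIs are invented for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def find_links(pref_labels, rdfs_labels, max_distance=2):
    """Link every pair of resources whose labels differ by less than max_distance."""
    links = []
    for source_uri, pref_label in pref_labels.items():
        for target_uri, label in rdfs_labels.items():
            if levenshtein(pref_label, label) < max_distance:
                links.append((source_uri, target_uri))
    return links


# Hypothetical example data
source = {"ex:place/1": "Berlin", "ex:place/2": "Frankfurt a.M."}
target = {"geo:berlin": "Berlin", "geo:frankfurt": "Frankfurt am Main"}
links = find_links(source, target)
```

With this data only the two "Berlin" resources are linked; "Frankfurt a.M." and "Frankfurt am Main" are more than one edit apart, so a threshold of 2 does not connect them.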

Results

• Contextualised all datasets that are currently ingested
– No qualitative analysis so far

• Increased the number of existing links by 20% (performance requirement)

• Different amounts of links were found
– Dingler (UBER): 134 unique links
– Deutsches Textarchiv (BBAW): 9946 unique links

• Potential to find more links

Links in Pubby

Presenter notes:
GND (Gemeinsame Normdatei, the German National Library's universal authority file)

Links to DBPedia

Links to GeoNames

Links in Pubby

Presenter notes:
GND (Gemeinsame Normdatei, the German National Library's universal authority file)

Challenges

• In most cases, only a preferred label is available
– Nancy, France vs. Nancy, Kentucky

• Very specific rules are required for different spellings/abbreviations
– Frankfurt am Main vs. Frankfurt a.M. vs. Frankfurt a/M

• Unstructured data is not captured
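One way to picture the "very specific rules" for spellings and abbreviations is a normalisation step applied to labels before they are compared. The sketch below is an assumption about how such a step might look, not the rules used in the project; the abbreviation table and the function name `normalise_place` are hypothetical, and a real rule set would need many more entries.

```python
import re

# Hypothetical abbreviation table covering only the example from the slide
ABBREVIATIONS = {
    "a.m.": "am main",
    "a/m": "am main",
    "a. m.": "am main",
}


def normalise_place(label: str) -> str:
    """Lower-case a place label, collapse whitespace and expand known trailing abbreviations."""
    text = re.sub(r"\s+", " ", label.strip().lower())
    for short, full in ABBREVIATIONS.items():
        if text.endswith(" " + short):
            text = text[: -len(short)] + full
    return text


variants = ["Frankfurt am Main", "Frankfurt a.M.", "Frankfurt a/M"]
normalised = {normalise_place(v) for v in variants}
```

All three variants collapse to the same string, so a strict label comparison would then match them. Normalisation of this kind does not help with the first challenge: "Nancy" in France and "Nancy" in Kentucky share the same label, so they cannot be told apart when only a preferred label is available.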

Unstructured Data

• Place: Wren Library, Trinity College Cambridge

• Agent: Georg Tanner, Maximilian II.

Results for Unstructured Data

• Codices provenance

• WAB description

Presenter notes:
Most information is not included in other data fields

Applicability and Reusability

• Created linkage rules can be reused, but an adaptation might be necessary

• Knowledge about the Silk framework and the similarity functions is required

• Access to the datasets is required (as dump or in a triplestore)

• Quality of the links is not ensured

Future Work

• Evaluation of the detected links
– Iterative process to improve the links

• Can we use existing information, e.g. already known connections, to strengthen/weaken links?

• Which questions can be answered based on the links?
– Where have the resources been published?
– MarineLives: map of the ship routes
