co-funded by the European Union
Contextualisation
Dominique Ritze
Motivation
18.06.2014
Who is George Grote?
Which resources share the same subjects?
Example
Work: Der zerbrochene Krug
Example
Author: Ludwig Wittgenstein
Example
Owner: Prinz Eugen von Savoyen
Example
Subject: Administration
Example
Place: Berlin
Overview
• Silk as Contextualisation Tool
• System Integration
• Contextualisation Progress and Results
• Challenges
• Applicability and Reusability
• Future Plans
Contextualisation with Silk
• Silk: Link Discovery Framework (UMA)
• Definition of linkage rules to create links between Linked Data resources
• http://context.dm2e.eu
Integration of Silk
• Silk is integrated in OmNom as Web Service
use generated configuration
show links
Access to Contextualisation Results
• Contextualisation results (linksets) are kept separate from ingested data
• Linksets are further described and versioned
• Additional linkset properties (tbd):
– Automatically created
– Manually created
– Recall-oriented (exploratory, but with wrong links)
– Precision-oriented (incomplete, but high quality)
Used Linked Data Resources
• Resources: Geonames, GND, LCSH, DBPedia, Freebase, DDC, LinkedGeodata
• Linked entity types: Places, Subjects, Agents
Example Process
• Manual creation of linkage rules, e.g. compare skos:prefLabel with rdfs:label using Levenshtein distance, link if distance < 2
• Let Silk run to find the links
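The example rule above can be sketched in plain Python. This only illustrates the comparison logic (link two resources if the Levenshtein distance between their labels is below 2); it is not Silk's own configuration format or API. The function names `levenshtein` and `find_links`, the URIs and the sample labels are assumptions made for this sketch.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def find_links(source_labels, target_labels, max_distance=2):
    """Return (source_uri, target_uri) pairs whose labels differ by
    fewer than max_distance edits."""
    links = []
    for s_uri, s_label in source_labels.items():
        for t_uri, t_label in target_labels.items():
            if levenshtein(s_label, t_label) < max_distance:
                links.append((s_uri, t_uri))
    return links


# Hypothetical sample data: URI -> label
# (think skos:prefLabel on the source side, rdfs:label on the target side).
source = {"ex:place/1": "Berlin", "ex:place/2": "Wien"}
target = {"geo:berlin": "Berlin", "geo:bern": "Bern", "geo:wein": "Wein"}
links = find_links(source, target)
```

With this sample data only "Berlin" is linked. "Bern" is two edits away from "Berlin", and the transposition in "Wien"/"Wein" also counts as two edits, so neither pair passes a threshold of distance < 2.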
Results
• Contextualised all datasets that are currently ingested
– No qualitative analysis so far
• Increased the number of existing links by 20% (performance requirement)
• Different amounts of links were found
– Dingler (UBER): 134 unique links
– Deutsches Textarchiv (BBAW): 9946 unique links
• Potential to find more links
Links in Pubby
Links to DBPedia
Links to GeoNames
Links in Pubby
Challenges
• In most cases, only a preferred label is available
– Nancy, France vs. Nancy, Kentucky
• Very specific rules are required for different spellings/abbreviations
– Frankfurt am Main vs. Frankfurt a.M. vs. Frankfurt a/M
• Unstructured data is not captured
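One possible way to handle spelling variants such as the Frankfurt examples is to normalise labels before comparing them. The following is a minimal sketch, assuming a small hand-written abbreviation table; the name `normalise_label` and the pattern are illustrative assumptions, not rules used in the project.

```python
import re

# Hypothetical abbreviation table: (pattern, expansion).
# A real rule set would have to be curated for each dataset.
ABBREVIATIONS = [
    # Matches "a.M.", "a. M.", "a/M" and similar, but not "am Main".
    (re.compile(r"\ba\s*[./]\s*M\b\.?"), "am Main"),
]


def normalise_label(label: str) -> str:
    """Expand known abbreviations and collapse repeated whitespace."""
    for pattern, expansion in ABBREVIATIONS:
        label = pattern.sub(expansion, label)
    return re.sub(r"\s+", " ", label).strip()


variants = ["Frankfurt am Main", "Frankfurt a.M.", "Frankfurt a/M"]
normalised = {normalise_label(v) for v in variants}
```

After normalisation all three variants collapse to the same string, so a strict distance threshold can match them. This addresses spelling variants only; it does not disambiguate places that share a label, such as the two Nancys.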
• Place: Wren Library, Trinity College Cambridge
• Agent: Georg Tanner, Maximilian II.
Unstructured Data
Results unstructured data
• Codices provenance
• WAB description
Applicability and Reusability
• Created linkage rules can be reused, but an adaptation might be necessary
• Knowledge about the Silk framework and the similarity functions is required
• Access to the datasets is required (as a dump or in a triple store)
• Quality of the links is not ensured
Future Work
• Evaluation of the detected links
– Iterative process to improve the links
• Can we use existing information, e.g. already known connections, to strengthen/weaken links?
• Questions that can be answered based on the links?
– Where have the resources been published?
– MarineLives: map of the ship routes