co-funded by the European Union

Preliminary Results from the Contextualization

Dominique Ritze, Klaus Thoden

All WP Meeting Athens - Preliminary Results of the Contextualisation - Klaus Thoden

Why contextualization?

• Disambiguation
• Linked Open Data

27.11.13

Contextualization in Year 1

• Baseline
• Identification of global identifiers
• Authority and type of identity
• BBAW, SBB, NLI, UB Frankfurt, MPIWG, ÖNB:
  – mostly contextualization of persons and corporate bodies
• What more can we do?

Sources

• GND/VIAF – Persons, corporations, titles
• LCSH, DDC – Subject headings
• Wikipedia/DBpedia – Everything
• GeoNames – Places
• InPho – Argumentation structure
• ISIL – Libraries
• CERL – Historical places, printers

GND

<title>Der zerbrochene Krug</title> → GND

VIAF

<author>Ludwig Wittgenstein</author> → VIAF

Wikipedia

<subfield code="a">Aus der Bibliothek des Prinzen Eugen von Savoyen</subfield> → Wikipedia/DBpedia

LCSH

<subject>Administration</subject> → LCSH

DDC

The 1914 - 1918 Collection of the American Jewish Joint Distribution Committee is comprised of the records of the New York headquarters for the period from the Joint's origins providing emergency relief through World War I.

→ DDC

GeoNames

<pubPlace>Berlin</pubPlace> → GeoNames

Workflow

• Ingestion through Omnom
• Contextualization in DM2E Triplestore
• Common input vocabulary – but not really consistent
• Saved as independent triples – no change of original data
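
The last point, storing links as independent triples without changing the original data, can be sketched in a few lines of Python. This is only an illustration: the `example.org` URIs are hypothetical, the use of `owl:sameAs` as link predicate is an assumption (the slides do not name the predicate), and plain Python sets stand in for the triplestore.

```python
# Minimal sketch: links live in their own set of triples, separate from the data.
# Assumptions: example.org URIs are made up; owl:sameAs is an assumed link predicate.
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Stand-in for the ingested original data (never modified below).
original_graph = {
    ("http://example.org/agent/a1",
     "http://xmlns.com/foaf/0.1/name",
     "C. Brodley"),
}


def add_link(link_graph, source_uri, target_uri, predicate=SAME_AS):
    """Record a link as its own triple; the original graph is not touched."""
    link_graph.add((source_uri, predicate, target_uri))
    return link_graph


def to_ntriples(graph):
    """Serialize triples as N-Triples lines (URIs in <>, other values as literals)."""
    lines = []
    for s, p, o in sorted(graph):
        if o.startswith("http"):
            obj = f"<{o}>"
        else:
            obj = '"' + o.replace('"', '\\"') + '"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


snapshot = set(original_graph)
links = set()
add_link(links, "http://example.org/agent/a1", "http://d-nb.info/gnd/118650130")
print(to_ntriples(links))
```

Because the links are kept apart from the source data, a wrong link can be dropped again without re-ingesting anything.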

SILK Demo

• Workbench to create Linkage Rules with a GUI
• Transformations and Normalizations
• Similarity metrics to compare values
• Aggregators to combine various comparisons
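
The three building blocks of a linkage rule (transformation, similarity metric, aggregator) can be illustrated outside the workbench. The sketch below is not Silk's API: the function names are hypothetical, `difflib.SequenceMatcher` is swapped in for whatever metric a real rule would use, and the threshold of 0.8 is an arbitrary assumption.

```python
from difflib import SequenceMatcher


def normalize(value):
    """Transformation: lowercase, replace punctuation by spaces, collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in value.lower())
    return " ".join(cleaned.split())


def string_similarity(a, b):
    """Similarity metric: difflib ratio in [0, 1] (a stand-in, not a Silk metric)."""
    return SequenceMatcher(None, a, b).ratio()


def average(scores):
    """Aggregator: combine several comparison scores into one confidence value."""
    return sum(scores) / len(scores)


def linkage_rule(record_a, record_b, fields, threshold=0.8):
    """Compare the given fields and return (confidence, link accepted?)."""
    scores = [
        string_similarity(normalize(record_a[f]), normalize(record_b[f]))
        for f in fields
    ]
    confidence = average(scores)
    return confidence, confidence >= threshold


a = {"title": "Der zerbrochene Krug", "place": "Berlin"}
b = {"title": "der zerbrochene krug.", "place": " BERLIN"}
print(linkage_rule(a, b, ["title", "place"]))
```

Normalizing before comparing is what lets the two records match despite differences in case, punctuation and whitespace.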

Structured Data

Unique Identifier

• project data: a1 with GND “118650130”
• GND: a2 with GND “118650130”
• comparison: equals
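
A comparison on a unique identifier reduces to an equality test once both values are in the same form. The sketch below assumes that one side may store the bare number and the other the full `http://d-nb.info/gnd/` URI; the record layout and function names are hypothetical.

```python
GND_PREFIX = "http://d-nb.info/gnd/"


def gnd_id(value):
    """Reduce a GND reference (bare number or full URI) to the bare identifier."""
    value = value.strip()
    if value.startswith(GND_PREFIX):
        value = value[len(GND_PREFIX):]
    return value


def same_entity(record_a, record_b):
    """Link two records only if both carry a GND identifier and the identifiers are equal."""
    id_a = gnd_id(record_a.get("gnd", ""))
    id_b = gnd_id(record_b.get("gnd", ""))
    return bool(id_a) and id_a == id_b


a1 = {"gnd": "118650130"}
a2 = {"gnd": "http://d-nb.info/gnd/118650130"}
print(same_entity(a1, a2))
```

Records without an identifier are never linked by this rule, which is why the other comparisons on datatype properties are needed.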

Structured Data

Datatype Properties

• project data: a1 with name “C. Brodley”
• GND: a2 with name “Brodley, Carla”
• comparison: similarity
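
Comparing “C. Brodley” with “Brodley, Carla” only works after both names are brought into a common shape. The following is one possible sketch, not the metric used in the demo: the parsing rules and the score values (1.0, 0.8, 0.5, 0.0) are assumptions chosen for illustration.

```python
def parse_name(name):
    """Split a name into (given-name tokens, surname).

    Handles 'Surname, Given' and 'Given Surname'; initials lose their dot.
    """
    name = name.strip()
    if "," in name:
        surname, given = (part.strip() for part in name.split(",", 1))
    else:
        parts = name.split()
        if not parts:
            return [], ""
        surname, given = parts[-1], " ".join(parts[:-1])
    given_tokens = [t.strip(".").lower() for t in given.split() if t.strip(".")]
    return given_tokens, surname.lower()


def name_similarity(a, b):
    """Score two person names; the score values are illustrative assumptions."""
    given_a, surname_a = parse_name(a)
    given_b, surname_b = parse_name(b)
    if not surname_a or surname_a != surname_b:
        return 0.0
    if not given_a or not given_b:
        return 0.5  # surname matches, given name missing on one side
    first_a, first_b = given_a[0], given_b[0]
    if first_a == first_b:
        return 1.0
    # An initial is compatible with a full given name starting with the same letter.
    if (len(first_a) == 1 or len(first_b) == 1) and first_a[0] == first_b[0]:
        return 0.8
    return 0.0


print(name_similarity("C. Brodley", "Brodley, Carla"))
```

An initial can only ever be compatible with a full name, never proof of identity, so the score stays below 1.0 in that case.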

Structured Data

Combination of Datatype Properties

• project data: a1 with name “C. Brodley” and year “1991”
• GND: a2 with name “Brodley, Carla” and year “1991”
• comparison: similarity of name combined with similarity of year
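
Combining the two comparisons means feeding their scores into an aggregator. The sketch below shows two common choices, minimum and weighted average. The name score of 0.8 is an assumed input, and the year metric, its tolerance and the weights are hypothetical.

```python
def year_similarity(a, b, max_distance=2):
    """1.0 for identical years, falling linearly to 0.0 at max_distance years apart."""
    distance = abs(int(a) - int(b))
    return max(0.0, 1.0 - distance / max_distance)


def aggregate_min(scores):
    """Strict aggregator: the link is only as good as its weakest comparison."""
    return min(scores.values())


def aggregate_weighted(scores, weights):
    """Lenient aggregator: weighted average of all comparison scores."""
    total_weight = sum(weights[key] for key in scores)
    return sum(scores[key] * weights[key] for key in scores) / total_weight


scores = {
    "name": 0.8,  # assumed result of a name comparison such as "C. Brodley" vs "Brodley, Carla"
    "year": year_similarity("1991", "1991"),
}
print(aggregate_min(scores))
print(aggregate_weighted(scores, {"name": 2, "year": 1}))
```

The minimum rejects a pair as soon as one property disagrees, while the weighted average lets a strong match on one property compensate for a weaker one.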

Structured Data

Excluding Links

• project data: a1 with name “C. Brodley”
• GND: a2 with name “Brodley, Carla”
• comparison: similarity of name
• conflicting dates: year of birth “1956”, year of death “1820”
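
A similar name is not enough when other properties contradict each other: a person born in 1956 cannot be the one who died in 1820. A minimal sketch of such an exclusion check follows. Which record carries which date is an assumption here, as are the field names and the threshold.

```python
def dates_conflict(record_a, record_b):
    """True if one record's birth year lies after the other record's death year."""
    for first, second in ((record_a, record_b), (record_b, record_a)):
        born = first.get("birth")
        died = second.get("death")
        if born is not None and died is not None and int(born) > int(died):
            return True
    return False


def accept_link(name_score, record_a, record_b, threshold=0.7):
    """Reject the link on conflicting dates, otherwise decide by the name score."""
    if dates_conflict(record_a, record_b):
        return False
    return name_score >= threshold


# Assumed assignment of the dates to the two records.
a1 = {"name": "C. Brodley", "birth": "1956"}
a2 = {"name": "Brodley, Carla", "death": "1820"}
print(accept_link(0.8, a1, a2))
```

The exclusion runs before the threshold test, so even a perfect name match cannot override contradictory dates.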

Unstructured Data

Real-world example

Limitations

• Needs high computing power
• No on-the-fly change of linkage rules
• Not well-suited for structured data
• Sparse metadata: get information out of transcriptions? Named Entity Recognition?
• Know your data!
• Results have to be checked.

DM2E Silk Workbench

• Put behind SSO
• No user management
• Keep own sources (at least GND)
• Possibly leave the contextualization job to a few power users