Europeana and Enrichment Antoine Isaac Europeana PGA meeting Sept 25th, 2014

Enrichment and Europeana

DESCRIPTION

Presentation at the Europeana Project Group Assembly, Sept. 25th, 2014

Page 1: Enrichment and Europeana

Europeana and Enrichment

Antoine Isaac

Europeana PGA meeting

Sept 25th, 2014

Semantic extraction?

Recognizing and extracting named entities and keywords, analyzing the sentiment of a document, extracting facts and relations between those facts and named entities, categorizing documents, recognizing and extracting concepts and finally adding them as metadata or annotations.

Market study on technical options for semantic feature extraction http://pro.europeana.eu/web/network/europeana-tech/-/wiki/Main/Market+study+on+technical+options+for+semantic+feature+extraction

Semantic enrichment?

In a linked data environment, enrichment refers to the creation of new links between the enriched resources and another data resource. […] link to controlled vocabularies or authority files (contextualization)

Automatic Enrichments with Controlled Vocabularies in Europeana: Challenges and Consequences

Stiller, Petras, Gäde, Isaac. Euromed 2014

Semantic enrichment?

• Analysis: the pre-enrichment phase focuses on the analysis of the metadata fields in the original resource descriptions, the selection of potential resources to link to, and the derivation of rules to match and link the original fields to the contextual resource.

• Linking: the process of automatically matching the values of the metadata fields to values of the contextual resources and adding contextual links (whose values are most often based on equivalent relationships) to the dataset.

• Augmentation: the process of selecting the values from the contextual resource to be added to the original object description. This might not only include (multilingual) synonyms of terms to be enriched but also further information, for example broader or narrower concepts.

Automatic Enrichments with Controlled Vocabularies in Europeana: Challenges and Consequences, Euromed 2014
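The three phases above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Europeana enrichment implementation: the vocabulary, its `example.org` URIs, the field rules and the exact-label matching strategy are all hypothetical. Analysis is reduced to a table saying which fields are matched against which vocabulary; linking adds (field, URI) pairs; augmentation copies multilingual labels and broader concepts from the contextual resource.

```python
# Hypothetical toy vocabulary: the URIs and labels are illustrative only,
# not real Europeana or authority-file identifiers.
VOCABULARY = {
    "http://example.org/concept/painting": {
        "prefLabel": {"en": "painting", "fr": "peinture", "de": "Gemälde"},
        "broader": ["http://example.org/concept/visual-art"],
    },
    "http://example.org/concept/visual-art": {
        "prefLabel": {"en": "visual art", "fr": "arts visuels"},
        "broader": [],
    },
}

# Analysis (hypothetical rules): which source fields are matched against which vocabulary.
FIELD_RULES = {"dc:type": VOCABULARY, "dc:subject": VOCABULARY}


def normalize(value: str) -> str:
    """Trivial normalization: trim whitespace and lower-case."""
    return value.strip().lower()


def build_label_index(vocab: dict) -> dict:
    """Map every normalized label, in every language, to its concept URI."""
    index = {}
    for uri, concept in vocab.items():
        for label in concept["prefLabel"].values():
            index[normalize(label)] = uri
    return index


def link(record: dict, rules: dict) -> list:
    """Linking: look up each field value by exact normalized label."""
    links = []
    for fld, vocab in rules.items():
        index = build_label_index(vocab)
        for value in record.get(fld, []):
            uri = index.get(normalize(value))
            if uri is not None:
                links.append((fld, uri))
    return links


def augment(links: list, vocab: dict) -> dict:
    """Augmentation: copy the multilingual labels and broader concepts of each linked concept."""
    extra = {}
    for _field, uri in links:
        concept = vocab[uri]
        extra[uri] = {
            "labels": dict(concept["prefLabel"]),
            "broader": list(concept["broader"]),
        }
    return extra


record = {"dc:type": ["Peinture"], "dc:title": ["La Joconde"]}
links = link(record, FIELD_RULES)
# links == [("dc:type", "http://example.org/concept/painting")]
extra = augment(links, VOCABULARY)
```

Here the French value "Peinture" is linked to the painting concept, and augmentation brings in its English and German labels plus the broader "visual art" concept. Exact label lookup like this cannot tell homonyms apart, so real enrichment needs more careful matching rules.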

Characteristics of enrichment

• Adding new data on top of existing data
  – Normalization, by contrast, focuses on syntactic aspects and adds no new semantics

• (Semi-)automatic
  – For manual enrichment, see discussion on Annotations

• Connecting to internal or external datasets

Where does it happen?

• Ingestion from providers
  – Harvesting metadata and content

• Consolidating Europeana’s "master" database
  – De-referencing
  – Enrichment

• Leveraging data for search
  – Augmenting the Solr index
  – Query enrichment and translation

Not de-referencing?

• In provider data, a CHO (cultural heritage object) with a link is semantically equivalent to a CHO with the same link and the contextual entity materialized next to it

• It is simply « richer » (more structured, « semantic ») metadata given by providers

Not index augmentation?

• One semantic link can lead to different indexes

• Enrichment shouldn’t be considered as something that feeds directly into application- or tool-specific databases
  – NB: it should be exchangeable

• Yet enrichment should be designed in coordination with what happens later
  – Augmentation is the post-production of linking
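To illustrate how a single semantic link can feed several indexes, here is a hypothetical sketch in which augmentation happens downstream of linking: the exchangeable data keeps only (field, URI) pairs, and a search-index document is derived from them afterwards. The concept data, the `example.org` URIs and field names such as `place_label_en` are assumptions for illustration; they are not Europeana's actual Solr schema.

```python
# Hypothetical contextual data; in practice it would come from the linked vocabulary.
CONCEPTS = {
    "http://example.org/place/paris": {
        "prefLabel": {"en": "Paris", "fr": "Paris", "ru": "Париж"},
        "altLabel": {"en": ["City of Light"]},
        "broader": ["http://example.org/place/france"],
    },
    "http://example.org/place/france": {
        "prefLabel": {"en": "France", "de": "Frankreich"},
        "altLabel": {},
        "broader": [],
    },
}


def index_fields(links, concepts):
    """Derive index fields from exchangeable (field, URI) links.

    Field names like 'place_label_en' are hypothetical, chosen for the example.
    """
    doc = {}

    def add(name, value):
        # Append without duplicates, keeping insertion order.
        values = doc.setdefault(name, [])
        if value not in values:
            values.append(value)

    for fld, uri in links:
        add(f"{fld}_uri", uri)
        concept = concepts.get(uri, {})
        for lang, label in concept.get("prefLabel", {}).items():
            add(f"{fld}_label_{lang}", label)
        for lang, labels in concept.get("altLabel", {}).items():
            for label in labels:
                add(f"{fld}_label_{lang}", label)
        # One hop up the hierarchy, so a search for "France" also finds objects about Paris.
        for parent in concept.get("broader", []):
            add(f"{fld}_uri", parent)
            for lang, label in concepts.get(parent, {}).get("prefLabel", {}).items():
                add(f"{fld}_label_{lang}", label)
    return doc


doc = index_fields([("place", "http://example.org/place/paris")], CONCEPTS)
print(doc["place_label_en"])  # ['Paris', 'City of Light', 'France']
```

The stored link stays a single URI; which labels, languages and hierarchy levels end up in the index is a choice made later, per application.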

Not query enrichment/translation?

• Tools used may be the same (NLP)
• But the evaluation criteria change
• These enrichments are ‘lost’, not exchangeable
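Query enrichment can reuse the same kind of multilingual labels, but its result lives only in the rewritten query. The sketch below assumes a hypothetical hard-coded label table and a generic `AND`/`OR` syntax rather than any particular search engine's; it expands each query term into the labels of the concept it matches. Nothing is written back to the records, which is why such enrichments cannot be exchanged.

```python
# Hypothetical label table: each set holds the multilingual labels of one concept.
LABEL_SETS = [
    {"painting", "peinture", "gemälde", "schilderij"},
    {"map", "carte", "karte", "kaart"},
]


def expand_query(query: str, label_sets=LABEL_SETS) -> str:
    """Turn each term into an OR-group of the labels of any concept it matches.

    The expansion exists only inside the returned query string; nothing is
    stored with the records.
    """
    groups = []
    for term in query.lower().split():
        variants = {term}
        for labels in label_sets:
            if term in labels:
                variants |= labels
        if len(variants) == 1:
            groups.append(term)
        else:
            groups.append("(" + " OR ".join(sorted(variants)) + ")")
    return " AND ".join(groups)


print(expand_query("Peinture Rembrandt"))
# (gemälde OR painting OR peinture OR schilderij) AND rembrandt
```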

Ground material for enrichment

Metadata is the primary focus of most efforts. Content can also be used:
• Extraction of visual features
  – Text transcription
  – Map alignment
  – Image-based similarity (Ecreative)

• Extraction of audio features (ESounds)

Linking is king

• Object/object
  – Cross-dataset de-duplication: equivalence/similarity links
  – Other relations: derivation, part-of, FRBRization
  – Clustering into hierarchical objects or collections
  – NB: neglected, though Europeana can contribute something

• Object/Context
  – Agents
  – Concepts
  – Places
  – Periods and Events
  – Documentation, e.g., Wikipedia articles

• Context/Context (vocabulary alignment)
  – Matching concepts
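Context/context linking (vocabulary alignment) can start from something as simple as shared labels. The following naive lexical matcher is assumed here for illustration only; it is not Amalgame or any other tool named in this deck. It case-folds labels and strips accents, then proposes candidate `skos:exactMatch` pairs between two vocabularies given as hypothetical `{URI: [labels]}` dictionaries.

```python
import unicodedata


def normalize(label: str) -> str:
    """Case-fold and strip accents, so 'Gemälde' and 'gemalde' compare equal."""
    decomposed = unicodedata.normalize("NFKD", label.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def align(vocab_a: dict, vocab_b: dict) -> list:
    """Return sorted (uri_a, uri_b) pairs whose concepts share a normalized label.

    The pairs are only candidate skos:exactMatch links: lexical matching
    alone yields false positives (homonyms), so they still need review.
    """
    index_b = {}
    for uri, labels in vocab_b.items():
        for label in labels:
            index_b.setdefault(normalize(label), set()).add(uri)
    pairs = set()
    for uri_a, labels in vocab_a.items():
        for label in labels:
            for uri_b in index_b.get(normalize(label), ()):
                pairs.add((uri_a, uri_b))
    return sorted(pairs)


# Hypothetical vocabularies with example.org URIs.
vocab_a = {"http://example.org/a/1": ["Gemälde"], "http://example.org/a/2": ["Skulptur"]}
vocab_b = {"http://example.org/b/x": ["gemalde", "painting"], "http://example.org/b/y": ["sculpture"]}
print(align(vocab_a, vocab_b))
# [('http://example.org/a/1', 'http://example.org/b/x')]
```

Note that the German "Skulptur" and English "sculpture" are not matched: purely lexical alignment misses cross-language equivalences unless the vocabularies carry labels in common languages.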

Europeana enrichment

• Bringing multilingual, structured data
• Collaborative/strategy aspect
• Likely to interest providers (Einside)

Should we be interested in other kinds of enrichment?

• Non-semantic tagging with simple words
• Translation
• Named entity recognition
• Language detection for metadata fields
• Group editing, when not actioned by providers

Europeana-related projects in the picture

• Object/object
  – De-duplication: equivalence/similarity links
  – Other relations: derivation (ESounds), part-of, FRBRization (TEL)
  – Clustering (EF-OCLC)

• Object/Context
  – Agents
  – Concepts (PATHS, EConnect, LOCloud, MIMO)
  – Places (EConnect, LOCloud)
  – Periods and Events (PATHS, ECloud)
  – Documentation, e.g., Wikipedia articles (PATHS, LOCloud)

• Vocabulary alignment
  – EConnect (Amalgame), EFG, EUScreen, ATHENAplus?, PartagePlus

• Non-semantic tagging with simple words
• Translation
• Named entity recognition
• Language detection for metadata fields
• Group editing, when not actioned by providers (ESounds)

Other categories?

Next steps?

• Agree on categories
• Agree on APIs for enrichment services
• Address post-processing for applications (Solr indexing)
• Evaluation
  – Informativeness measure, completeness
• Showing it?

APIs for enrichment services

• Input: record, field, collection?
  – Meta-enrichers

• Problem: API results often assume application needs and return data elements that are useful beyond the URI of the entity: they are APIs for enrichment + de-referencing.

• Keeping track of provenance (data field, version of enrichment tool…)

• Example of Sounds music information retrieval
• Exchanging enrichment data: cf. EDMpaths
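One way to keep enrichment results exchangeable, and separate from de-referencing, is to return only the target URI plus provenance. The record type below is a hypothetical sketch, not an agreed Europeana API or EDM serialization; its field names are assumptions chosen to mirror the provenance items on the slide (source data field, version of the enrichment tool).

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Enrichment:
    """One enrichment link plus its provenance (hypothetical structure).

    Only the entity URI is carried: labels, coordinates, etc. for the
    target would be fetched by a separate de-referencing step.
    """
    record_id: str     # identifier of the enriched object (CHO)
    source_field: str  # metadata field the link was derived from
    source_value: str  # literal value that was matched
    target_uri: str    # contextual entity or object linked to
    tool: str          # name of the enrichment tool
    tool_version: str  # version, so results can be traced and re-run

    def to_json(self) -> str:
        return json.dumps(asdict(self), ensure_ascii=False, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "Enrichment":
        return cls(**json.loads(text))


# All identifiers below are hypothetical.
e = Enrichment(
    record_id="record/1",
    source_field="dc:subject",
    source_value="Peinture",
    target_uri="http://example.org/concept/painting",
    tool="example-enricher",
    tool_version="0.1",
)
assert Enrichment.from_json(e.to_json()) == e
print(e.to_json())
```

Round-tripping through JSON shows that the record can be stored or exchanged as-is, independently of any application that later de-references `target_uri`.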

Example: Europeana enrichment console prototype

Antoine Isaac

@EuropeanaTech