The mapping process – some observations
Robina Clayphan, EDLF
Local schemas > ESE
Data Flow
Management of the process
• Sheer complexity of managing the hundreds of files going through the steps in the process
• Keeping track of the status of the files
– straightforward ones: in the right place for the next step
– problem ones: refer back to provider or a developer
• Use of SharePoint document libraries and rapid establishment of procedures that all must adhere to
• The management of the process evolved during implementation - a very steep learning curve
• Maintenance of authority files
– getting meta-metadata from the providers (types etc.)
– collection IDs
(sort of) Policy issues
• Inclusion criterion: must have a link giving direct access to the digital object
– check if URLs in data actually resolve to the object described
• Often:
– resolve to metadata page with e.g. PDF icon: how many clicks are acceptable? Need for policy decision
– granularity mismatch: link at title level only
• Sometimes:
– 404 page not found: refer to provider; persistence of URLs
– need a plug-in (e.g. DjVu): is that OK?
• Occasionally: a log-in required for restricted access resources
• Need for providers to ensure they only provide links to resources that can be accessed
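The link outcomes listed above could be sorted automatically before a person looks at them. The following is a minimal sketch, not the procedure actually used: the category names, the `LinkResponse` type and the choice of MIME types are all assumptions made for illustration, and the HTTP status and Content-Type are assumed to have been fetched already.

```python
from dataclasses import dataclass


@dataclass
class LinkResponse:
    """Hypothetical container for what a URL check returned."""
    status: int        # HTTP status code
    content_type: str  # value of the Content-Type header


# Assumed groupings; a real policy decision would define these.
PLUGIN_TYPES = ("image/vnd.djvu",)
OBJECT_PREFIXES = ("image/", "audio/", "video/", "application/pdf")
PAGE_TYPES = ("text/html", "application/xhtml+xml")


def classify_link(resp: LinkResponse) -> str:
    """Sort a response into one of the cases named on the slide."""
    if resp.status == 404:
        return "not_found"        # refer to provider
    if resp.status in (401, 403):
        return "login_required"   # restricted access resource
    if resp.status != 200:
        return "other_error"
    # Drop parameters such as "; charset=utf-8" before comparing.
    ctype = resp.content_type.split(";")[0].strip().lower()
    if ctype in PLUGIN_TYPES:     # checked first: DjVu also starts with "image/"
        return "needs_plugin"
    if ctype.startswith(OBJECT_PREFIXES):
        return "direct_object"
    if ctype in PAGE_TYPES:
        return "metadata_page"    # may be one or more clicks from the object
    return "unknown"
```

A "metadata_page" result still needs a human check, since a Content-Type alone cannot say how many clicks away the object is or whether the link is at title level only.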
Data level problems 1
• Trying to understand decision-making process of the original metadata creators
– What they meant by e.g. dc:date, dc:source
• Trying to discern the (implicit) data model of the original metadata creators
– What is the dc:relation referring to?
• Understanding data in a foreign language or foreign script
– Is negyedévenként really Hungarian for terminally?
– And, if so, why is it in dc:format?
Data level problems 2
• Questions to developers that arose from examining the data
– All records have two instances of dc:identifier: the first a URL, the second (possibly) a shelfmark. Need to map each instance to a different ESE element - can it be done?
– All records have two instances of dc:rights, the first appropriate, the second not. Is it possible to just display the first and ignore the second?
– Where values had been divided between multiple instances of the same element, could they be concatenated with punctuation for a better display? E.g. spatial1, spatial2, spatial3 used for a geographic hierarchy. Another collection had up to 14 instances of dc:subject.
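The three questions above are all about treating repeated elements by position. As a rough sketch of what was being asked for (not the Europeana team's implementation), the sample record, the `remap` function and the output keys `object_url`, `shelfmark`, `rights` and `subject` are invented for illustration and are not ESE element names.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace

# Invented sample record, shaped like the cases described on the slide.
SAMPLE = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:identifier>http://example.org/object/1</dc:identifier>
  <dc:identifier>MS 1234</dc:identifier>
  <dc:rights>Public domain</dc:rights>
  <dc:rights>internal note</dc:rights>
  <dc:subject>maps</dc:subject>
  <dc:subject>rivers</dc:subject>
  <dc:subject>Danube</dc:subject>
</record>"""


def values(record, name):
    """Return the text of every dc:<name> child, in document order."""
    return [(el.text or "").strip() for el in record.findall(f"{{{DC}}}{name}")]


def remap(record):
    """Apply the three positional rules; output keys are hypothetical."""
    ids = values(record, "identifier")
    rights = values(record, "rights")
    subjects = values(record, "subject")
    return {
        # First identifier is the URL, second is (possibly) a shelfmark.
        "object_url": ids[0] if len(ids) > 0 else None,
        "shelfmark": ids[1] if len(ids) > 1 else None,
        # Keep only the first rights statement, ignore the rest.
        "rights": rights[0] if rights else None,
        # Concatenate repeated values with punctuation for display.
        "subject": "; ".join(v for v in subjects if v),
    }


result = remap(ET.fromstring(SAMPLE))
```

Positional rules like these only hold if every record in a collection follows the same order, which is exactly the kind of thing the provider knows and the mapper has to guess.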
Normalisation level
• At the normalisation stage you can see if your interpretation of the record actually makes sense when it has been processed against the source data.
• Apply the Quality Control Checklist
• Edit mapping and repeat!
(my) Conclusion
• All indicates:
– that it is easier if the mapping and normalising is done as close to source as possible, ideally by the providers
  • they are the ones who understand what the data means and can make sensible mapping decisions
  • they understand the language and script
– Tools would be nice!
[Diagram: Local schemas > ESE data flow. Recoverable step labels: #0 Transform data to populate local repository; #5 Export data to Europeana.]
[Diagram: EuropeanaLocal Content Provider Model - to illustrate movement of metadata only. Components: content provider local systems; content provider repositories; Aggregator (with provider?); EuropeanaLocal Parallel Test Environment; Europeana. Flow labels: customised transformations to e.g. OAI-DC; harvesting of e.g. OAI-DC; mapping and transformation to ESE, including <europeana> elements; no metadata transformations.]
Issues for EuropeanaLocal
• Currently a great deal of manual effort goes into metadata transformation:
– at provider sites: local format to repository format
– by the Europeana development team: harvested format to ESE
– normalisation by Europeana development team
• Where will this work happen in EuropeanaLocal?
– feasibility of central Europeana staff handling hundreds more collections?
• Can we minimise the current manual overhead?
• What are the possibilities for automating all or some of the transformation work?