Translation Proofing – Quantitative Tools for Connecting Metadata Dialects
Ted Habermann, Director of Earth Science, The HDF Group, [email protected]
Metadata in Multiple Dialects
Documentation Repository
ISO 19115, 19115-2, 19119 and extensions
THREDDS
HDF, netCDF (NcML)
FGDC, Data.Gov
SensorML
WCS, WMS, WFS, SOS
Open Provenance Model, PROV
DIF, ECS, ECHO
KML
Translation Lossiness
Documentation dialects generally have significant overlap because the concepts being documented (who, where, what, when, and why?) are shared across many communities and dialects.
At the same time, there are differences…
[Figure: overlap (AB) between dialects A and B, shown for a more lossy and a less lossy translation.]
The idea of lossiness is familiar from data compression. How can we quantify the lossiness of a translation?
Characterizing the Source
The distribution of elements in any metadata collection reflects the requirements of the data providers and users. Some elements are more common (important?) than others.
This heterogeneity needs to be considered when evaluating the translation.
448 CSDGM Records
161,151 Elements and Attributes
10,713 Place Keywords
1 /metadata/USGSErp/MetadataNotes
264 elements occur < 100 times
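A distribution like the one summarized here can be tallied by walking each record and counting element paths. The sketch below is a minimal illustration, assuming the records are available as XML strings. The function name `count_element_paths` and the abbreviated sample record are hypothetical, and attributes are ignored for brevity.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def count_element_paths(xml_records):
    """Count how often each element path occurs across a collection of records."""
    counts = Counter()

    def walk(node, parent_path):
        # Build a slash-separated path such as /metadata/idinfo and tally it.
        path = f"{parent_path}/{node.tag}"
        counts[path] += 1
        for child in node:
            walk(child, path)

    for record in xml_records:
        walk(ET.fromstring(record), "")
    return counts


# Hypothetical, heavily abbreviated record used only to exercise the function.
sample = "<metadata><idinfo><keywords><place/><place/></keywords></idinfo></metadata>"
distribution = count_element_paths([sample, sample])
```

Sorting the resulting counter with `distribution.most_common()` lists the most frequent elements first, which makes the long tail of rarely used elements easy to see.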
Lossiness = Distribution + Crosswalk
Actual Distribution (collection & community) + Reference Crosswalk
In order to calculate the lossiness of a translation we need the actual distribution of elements in the source and a reference crosswalk that gives the destinations that the source elements are mapped to.
Source → Destination
ESIP Winter 2014
Three Examples
January 8-10, 2014
Element     #      %   Translated?   % Translated
A         134    66%        1             66%
B          50    25%        1             25%
C          20    10%        1             10%
Total     204   100%                     100%

Element A occurs 134 times and makes up 66% of the source.
Element B occurs 50 times and makes up 25% of the source.
Element C occurs 20 times and makes up 10% of the source.

Element     #      %   Translated?   % Translated
A         134    66%        1             66%
B          50    25%        0              0%
C          20    10%        1             10%
Total     204   100%                      75%

Element     #      %   Translated?   % Translated
A         134    66%        1             66%
B          50    25%        1             25%
C          20    10%        0              0%
Total     204   100%                      90%

100% of elements translated: lossiness = 0%
75% of elements translated: lossiness = 25%
90% of elements translated: lossiness = 10%
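Worked from the raw counts rather than the rounded per-element percentages, the three cases come out as follows. This is a restatement of the tables, rounded to one decimal place. Totals obtained by summing rounded percentages can differ from these by about a point.

$$
\begin{aligned}
\text{all translated:} \quad & \frac{134 + 50 + 20}{204} = 100\%, & \text{lossiness} &= 0\% \\
\text{B not translated:} \quad & \frac{134 + 20}{204} = \frac{154}{204} \approx 75.5\%, & \text{lossiness} &\approx 24.5\% \\
\text{C not translated:} \quad & \frac{134 + 50}{204} = \frac{184}{204} \approx 90.2\%, & \text{lossiness} &\approx 9.8\%
\end{aligned}
$$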
Calculating Lossiness
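Actual Distribution (collection & community) + Reference Crosswalk (Source → Destination)

$$
\text{Lossiness} = 1 - \sum_{n=1}^{\text{number of elements}} \frac{\text{Number of Occurrences}_n}{\text{Total Number of Elements}} \times
\begin{cases}
1 & \text{if in crosswalk} \\
0 & \text{if not}
\end{cases}
$$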
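The calculation on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than the author's implementation. It assumes the distribution is a mapping from element name to occurrence count and the crosswalk is a mapping from source element to destination element. The function name `lossiness` and the destination names are hypothetical.

```python
def lossiness(distribution, crosswalk):
    """Return 1 minus the occurrence-weighted fraction of elements the crosswalk maps."""
    total = sum(distribution.values())
    if total == 0:
        raise ValueError("distribution is empty")
    # An element contributes its occurrences only if the crosswalk has a destination for it.
    translated = sum(
        count for element, count in distribution.items() if element in crosswalk
    )
    return 1 - translated / total


# Counts from the three examples: A occurs 134 times, B 50 times, C 20 times.
distribution = {"A": 134, "B": 50, "C": 20}

# Hypothetical crosswalks: source element -> destination element name.
all_mapped = {"A": "a", "B": "b", "C": "c"}
without_b = {"A": "a", "C": "c"}
without_c = {"A": "a", "B": "b"}

for label, crosswalk in [
    ("all mapped", all_mapped),
    ("B missing", without_b),
    ("C missing", without_c),
]:
    print(f"{label}: lossiness = {lossiness(distribution, crosswalk):.1%}")
```

Because the function works from raw counts, its results can differ by about a percentage point from figures obtained by adding up rounded per-element percentages.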
Acknowledgements
This work was partially supported by contract number NNG10HP02C from NASA.
Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author and do not necessarily reflect the views of NASA or The HDF Group.