Preparing Community Land Model Validation Data for Reuse and Preservation Madison Langseth, Will Wieder, and Gary Strand
// global attributes:
    :title = "3x3minute regridded HWSD - Weighted average of topsoil carbon content (kg C m-2)" ;
    :creator = "Will Wieder" ;
    :creator_email = "[email protected]" ;
    :institution = "NCAR (National Center for Atmospheric Research, USA)" ;
    :references = "FAO/IIASA/ISRIC/ISSCAS/JRC, 2012. Harmonized World Soil Database (version 1.2). FAO, Rome, Italy and IIASA, Laxenburg, Austria." ;
    :references2 = "HWSD Documentation URL: http://webarchive.iiasa.ac.at/Research/LUC/External-World-soil-database/HWSD_Documentation.pdf" ;
    :source = "Original data from HWSD and processed with ESRI ArcGIS 10.0" ;
    :modified = "Will Wieder Thu Sep 6 15:00:17 MDT 2012" ;
    :history1 = "changed missing values to -1" ;
    :history2 = "Flipped lat (-90:90)" ;
    :processing = "HWSD regridded from native resolution (30 arc-second) to 3 arc-minute using Environmental Systems Research Institute ArcGIS 10.0. \n",
        " Topsoil carbon content was converted from percent by weight to kg C m-2 and calculated as a weighted average based on the SHARE and SEQ attributes from the original HWSD. \n",
        " SUM_t_c_12 = (sum(SEQ(SHARE*T_OC) )*3*T_BULK_DENSITY). \n",
        " Each parameter was exported as a separate netCDF file." ;
Data Curation in the Long Tail of Science: Preparing Community Land Model Validation Data for Reuse and Preservation
Introduction
Long-tail science is argued to account for the majority of scientific output [1]. It tends to be conducted by small research teams with limited budgets, which limits a team's ability to curate its data properly for reuse and preservation. The data set curated in this project is a small set of global soil properties [2] that was processed for use with the Community Land Model (CLM). The Data Curation Profile Toolkit [3] was used to determine the scientist's curation needs for the data set. A series of formal interviews established that the scientist required assistance in documenting the data workflow, updating the metadata, and eventually archiving the data with an appropriate repository.
Acknowledgements
The authors would like to thank Matthew Mayernik, Mary Marlino, and Patricia Steinkamp of NCAR, Carol Tenopir and Suzanne Allard of UTK, and Carole Palmer and Cheryl Thompson of UIUC for their support throughout this project. The authors would also like to thank the Institute of Museum and Library Services for providing funding for the DCERC project and the National Center for Atmospheric Research, funded by the National Science Foundation, for use of its resources during the project.
Madison L. Langseth*, Will Wieder**, and Gary Strand**
References
1. Heidorn, P. B. (2008). Shedding light on the dark data in the long tail of science. Library Trends, 57(2), 280-299.
2. FAO/IIASA/ISRIC/ISSCAS/JRC (2012). Harmonized World Soil Database (version 1.2). FAO, Rome, Italy and IIASA, Laxenburg, Austria. http://webarchive.iiasa.ac.at/Research/LUC/External-World-soil-database/HTML/
3. Data Curation Profile Toolkit. http://datacurationprofiles.org/
4. Oak Ridge National Laboratory Distributed Active Archive Center. http://daac.ornl.gov
[Workflow diagram. Recoverable labels: Interview Stakeholders; Output: Data Curation Profile; Document Data Workflow; Metadata Update; Climate and Forecast (CF) Metadata Conventions; Data Appraisal and Selection; Provenance Identification; Archiving Assistance; Complete Documentation for ORNL DAAC [4]; Output: Data Set Description]
// global attributes:
    :modified = "Will Wieder Thu Sep 6 15:00:17 MDT 2012" ;
    :history1 = "changed missing values to -1" ;
    :history2 = "Flipped lat (-90:90)" ;
Updated Metadata
(See Fig. 1)
Fig. 1. Sample metadata section (a) prior to curation work and (b) after curation work.
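The processing note in the curated metadata gives the topsoil carbon formula only in compact form. The sketch below is one possible reading of it, not the authors' ArcGIS workflow. It assumes the usual HWSD conventions, which the poster does not state: T_OC is percent by weight, T_BULK_DENSITY is in kg per dm^3, topsoil spans 0-30 cm, and SHARE is the relative area of each soil unit in a mapping unit. Under those assumptions the factor 3 is a unit conversion: 0.3 m x 1000 / 100. The function name and the per-unit weighting are hypothetical.

```python
from typing import Sequence

# Assumed unit conversion behind the "3" in the processing note:
# 0.3 m topsoil depth * 1000 (kg/dm^3 -> kg/m^3) / 100 (percent -> fraction).
UNIT_FACTOR = 0.3 * 1000 / 100


def topsoil_carbon_kg_m2(
    shares: Sequence[float],
    t_oc: Sequence[float],
    t_bulk_density: Sequence[float],
) -> float:
    """Share-weighted topsoil carbon (kg C m-2) for one mapping unit.

    Each position in the three sequences describes one soil unit
    (one SEQ entry). This per-unit weighting is an assumption.
    """
    if not (len(shares) == len(t_oc) == len(t_bulk_density)):
        raise ValueError("all inputs must have the same length")
    total_share = sum(shares)
    if total_share <= 0:
        raise ValueError("shares must sum to a positive value")
    # Carbon stock of each soil unit, weighted by its share of the area.
    weighted = sum(
        share * oc * UNIT_FACTOR * density
        for share, oc, density in zip(shares, t_oc, t_bulk_density)
    )
    return weighted / total_share
```

For a mapping unit with a single soil unit at 1% organic carbon and a bulk density of 1.4, this reading gives 1.0 x 3 x 1.4 = 4.2 kg C m-2.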
* School of Information Science, University of Tennessee, Knoxville ([email protected]) ** National Center for Atmospheric Research, USA
Lessons Learned
• Curator Engagement: Early involvement in the data life cycle saves time.
• Communication: Regular meetings with scientists ensure that everyone is on the same page.
• Documentation: Basic metadata goes a long way in understanding data. Documenting trial or unused data is important for provenance identification.
Results
• Data workflow was documented visually and in a README file for the scientist’s reference.
• Existing metadata content was verified for accuracy against original data set documentation.
• Additional metadata was added to include provenance and detailed data descriptions.
• The authors appraised and selected the data to be submitted to an external repository.
• The data set has been submitted to the Oak Ridge National Laboratory Distributed Active Archive Center to be archived.
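The metadata verification described above can be partly automated. The minimal sketch below checks a set of global attributes against the six file-description attributes recommended by the CF conventions (title, institution, source, history, references, comment). The attribute names in the two dictionaries follow Fig. 1, with values abbreviated. The function name is hypothetical, and the poster does not say which CF attributes the authors targeted.

```python
from typing import List, Mapping

# Global attributes the CF conventions recommend for describing file contents.
CF_RECOMMENDED = ("title", "institution", "source", "history", "references", "comment")


def missing_cf_attributes(attrs: Mapping[str, str]) -> List[str]:
    """Return the CF-recommended attribute names absent from attrs."""
    return [name for name in CF_RECOMMENDED if name not in attrs]


# Attribute names as shown in Fig. 1 (a), before curation.
before = {
    "modified": "Will Wieder Thu Sep 6 15:00:17 MDT 2012",
    "history1": "changed missing values to -1",
    "history2": "Flipped lat (-90:90)",
}

# Attribute names added in Fig. 1 (b); values abbreviated for illustration.
after = dict(
    before,
    title="3x3minute regridded HWSD",
    institution="NCAR",
    source="Original data from HWSD",
    references="Harmonized World Soil Database (version 1.2)",
)

print(missing_cf_attributes(before))
print(missing_cf_attributes(after))
```

An exact-name check like this treats history1 and history2 as distinct from the CF history attribute, so the curated example would still be flagged for history and comment.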
Scientist’s Needs Assessment
Output
Data set visualization