

Preparing Community Land Model Validation Data for Reuse and Preservation Madison Langseth, Will Wieder, and Gary Strand

// global attributes:
    :title = "3x3minute regridded HWSD - Weighted average of topsoil carbon content (kg C m-2)" ;
    :creator = "Will Wieder" ;
    :creator_email = "[email protected]" ;
    :institution = "NCAR (National Center for Atmospheric Research, USA)" ;
    :references = "FAO/IIASA/ISRIC/ISSCAS/JRC, 2012. Harmonized World Soil Database (version 1.2). FAO, Rome, Italy and IIASA, Laxenburg, Austria." ;
    :references2 = "HWSD Documentation URL: http://webarchive.iiasa.ac.at/Research/LUC/External-World-soil-database/HWSD_Documentation.pdf" ;
    :source = "Original data from HWSD and processed with ESRI ArcGIS 10.0" ;
    :modified = "Will Wieder Thu Sep 6 15:00:17 MDT 2012" ;
    :history1 = "changed missing values to -1" ;
    :history2 = "Flipped lat (-90:90)" ;
    :processing = "HWSD regridded from native resolution (30 arc-second) to 3 arc-minute using Environmental Systems Research Institute ArcGIS 10.0. \n",
        " Topsoil carbon content was converted from percent by weight to kg C m-2 and calculated as a weighted average based on the SHARE and SEQ attributes from the original HWSD. \n",
        " SUM_t_c_12 = (sum(SEQ(SHARE*T_OC) )*3*T_BULK_DENSITY). \n",
        " Each parameter was exported as a separate netCDF file." ;
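The processing note in this metadata block describes a share-weighted conversion from percent carbon by weight to kg C m-2. A minimal Python sketch of one reading of that formula follows. The function name, the normalisation by total share, and the interpretation of the factor 3 (a 30 cm topsoil layer with bulk density in kg dm-3) are assumptions made for illustration; the poster does not state them, and the original calculation was carried out in ArcGIS.

```python
def topsoil_carbon_kg_m2(components, bulk_density):
    """Share-weighted topsoil organic carbon stock in kg C m-2 (illustrative).

    components: iterable of (share, t_oc_percent) pairs, one per soil unit (SEQ)
                within a mapping unit; share may be a percentage or a fraction.
    bulk_density: topsoil bulk density, assumed here to be in kg dm-3.
    """
    components = list(components)
    total_share = sum(share for share, _ in components)
    if total_share <= 0:
        raise ValueError("shares must sum to a positive value")
    # Share-weighted mean organic carbon content (% by weight).
    # Dividing by the total share is an assumption; the metadata formula omits it.
    weighted_oc = sum(share * t_oc for share, t_oc in components) / total_share
    # Assumed unit conversion: (t_oc / 100) * (bulk_density * 1000 kg m-3) * 0.3 m
    # simplifies to t_oc * bulk_density * 3.
    return weighted_oc * 3.0 * bulk_density


# Hypothetical mapping unit: 60% of the area at 1.0% C and 40% at 2.0% C.
stock = topsoil_carbon_kg_m2([(60, 1.0), (40, 2.0)], bulk_density=1.4)
```

As in the metadata formula, bulk density is applied once outside the sum; a per-soil-unit bulk density would need to move inside the weighted sum.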

Data Curation in the Long Tail of Science: Preparing Community Land Model Validation Data for Reuse and Preservation

Introduction

Long tail science is argued to account for the majority of scientific output [1]. It tends to be conducted by small research teams with limited budgets, which limits a team’s ability to curate its data properly for reuse and preservation. The data set curated in this project is a small data set of global soil properties [2] that was processed for use with the Community Land Model (CLM). The Data Curation Profile Toolkit [3] was used to determine the scientist’s needs with respect to curation of the data set. A series of formal interviews established that the scientist required assistance in documenting the data workflow, updating the metadata, and eventually archiving the data with an appropriate repository.

Acknowledgements

The authors would like to thank Matthew Mayernik, Mary Marlino, and Patricia Steinkamp of NCAR, Carol Tenopir and Suzanne Allard of UTK, and Carole Palmer and Cheryl Thompson of UIUC for their support throughout this project. The authors would also like to thank the Institute of Museum and Library Services for providing funding for the DCERC project and the National Center for Atmospheric Research, funded by the National Science Foundation, for use of its resources during the project.

Madison L. Langseth*, Will Wieder**, and Gary Strand**

References

1.  Heidorn, P. B. (2008). Shedding light on the dark data in the long tail of science. Library Trends, 57(2), 280-299.
2.  FAO/IIASA/ISRIC/ISSCAS/JRC (2012). Harmonized World Soil Database (version 1.2). FAO, Rome, Italy and IIASA, Laxenburg, Austria. http://webarchive.iiasa.ac.at/Research/LUC/External-World-soil-database/HTML/
3.  Data Curation Profile Toolkit. http://datacurationprofiles.org/
4.  Oak Ridge National Laboratory Distributed Active Archive Center. http://daac.ornl.gov

[Workflow diagram labels: Metadata Update; Interview Stakeholders; Document Data Workflow; Climate and Forecast (CF) Metadata Conventions; Output: Data Curation Profile; Data Appraisal and Selection; Complete Documentation for ORNL DAAC [4]; Data Set Description; Output; Provenance Identification; Archiving Assistance]

// global attributes:
    :modified = "Will Wieder Thu Sep 6 15:00:17 MDT 2012" ;
    :history1 = "changed missing values to -1" ;
    :history2 = "Flipped lat (-90:90)" ;
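The two history entries record a missing-value change and a latitude flip. The sketch below shows what such steps could look like on a small in-memory grid, using plain Python lists so it runs without extra libraries. The function name, the treatment of None and NaN as the original missing markers, and the sample values are all hypothetical; the poster does not say how missing values were originally encoded or which tool performed these steps.

```python
import math


def flip_lat_and_fill(lat, grid, fill_value=-1.0):
    """Order latitude south-to-north (-90:90) and replace missing cells.

    lat: list of latitude values, one per grid row.
    grid: list of rows (lists of numbers); None or NaN marks a missing cell,
          which is an assumption about the input encoding.
    """
    lat = list(lat)
    rows = [list(row) for row in grid]
    # Flip only if latitude currently runs north-to-south.
    if lat and lat[0] > lat[-1]:
        lat.reverse()
        rows.reverse()
    filled = [
        [fill_value if (v is None or math.isnan(v)) else v for v in row]
        for row in rows
    ]
    return lat, filled


# Hypothetical 3 x 2 grid stored north-to-south with two missing cells.
new_lat, new_grid = flip_lat_and_fill(
    [90, 0, -90],
    [[1.0, float("nan")], [2.0, 3.0], [None, 4.0]],
)
```

Recording each such step in the metadata, as the history attributes do, lets a later user reproduce or reverse it.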

Updated Metadata (see Fig. 1)

Fig. 1. Sample metadata section (a) prior to curation work and (b) after curation work.
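One way to compare the two metadata samples in Fig. 1 is to check which of the global attributes recommended by the CF conventions for describing file contents (title, institution, source, history, references, comment) are present. The sketch below is illustrative only: the helper name is hypothetical, the attribute values are abbreviated from the poster, and this is not the procedure the authors report using.

```python
# Global attributes the CF conventions recommend for describing file contents.
CF_RECOMMENDED = ("title", "institution", "source", "history", "references", "comment")


def missing_cf_attributes(attrs):
    """Return the CF-recommended global attribute names absent from attrs."""
    return [name for name in CF_RECOMMENDED if name not in attrs]


# Attribute names as shown in the poster's sample metadata; values abbreviated.
before = {
    "modified": "Will Wieder Thu Sep 6 15:00:17 MDT 2012",
    "history1": "changed missing values to -1",
    "history2": "Flipped lat (-90:90)",
}
after = dict(
    before,
    title="3x3minute regridded HWSD - Weighted average of topsoil carbon content (kg C m-2)",
    creator="Will Wieder",
    institution="NCAR (National Center for Atmospheric Research, USA)",
    references="FAO/IIASA/ISRIC/ISSCAS/JRC, 2012. Harmonized World Soil Database (version 1.2).",
    source="Original data from HWSD and processed with ESRI ArcGIS 10.0",
)

missing_before = missing_cf_attributes(before)
missing_after = missing_cf_attributes(after)
```

Under this exact-name check, numbered attributes such as history1 and history2 do not count as history, so a check of this kind would flag them for consolidation.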

* School of Information Science, University of Tennessee, Knoxville ([email protected])
** National Center for Atmospheric Research, USA

Lessons Learned

•  Curator Engagement: Early involvement in the data life cycle saves time.
•  Communication: Regular meetings with scientists ensure that everyone is on the same page.
•  Documentation: Basic metadata goes a long way in understanding data. Documenting trial or unused data is important for provenance identification.

Results

•  Data workflow was documented visually and in a README file for the scientist’s reference.
•  Existing metadata content was verified for accuracy against original data set documentation.
•  Additional metadata was added to include provenance and detailed data descriptions.
•  The authors appraised and selected the data to be submitted to an external repository.
•  The data set has been submitted to the Oak Ridge National Laboratory Distributed Active Archive Center to be archived.

[Workflow diagram labels: Scientist’s Needs Assessment; Output; Data set visualization]