Research Data Management System project:
Best Practices in Research Data Management*
*Adaptation of the NECDMC
Today’s Objectives
• Why manage data?
• Identify common data management issues
• Best practices for managing data
• Support: how the library and TTS can help you and your lab
What is Data?
• “Research data, unlike other types of information, is collected, observed, or created, for purposes of analysis to produce original research results” (University of Edinburgh).
• Observational
• Experimental
• Simulation data
• Derived or compiled data
Why Should I Manage it?
• Transparency & Integrity
• Compliance
Science & Personal Benefits
• Who uses your data now?
• Who COULD use your data?
• Shared/Open Data
• Scientific progress
• Impact on your career
• Citation counts
What if I Don’t Consider RDM?
Data Sharing and Management Snafu in 3 Short Acts: A data management horror story by Karen Hanson, Alisa Surkis and Karen Yacobucci.
http://www.youtube.com/watch?v=N2zK3sAtr-4
Nine “Issues” in Research Data Management
• Responsibility
• Data Management Plans
• Records Management
• File Management
• File Naming
• Metadata
• Backup and Security
• Ownership and Retention
• Long Term Planning
Issue: Responsibility
• Best Practices
  • Define roles and assign responsibilities for data management
  • Identify skills needed to perform tasks outlined in the DMP and match to available staff
  • Develop training plans for continuity
  • Assign responsible parties and monitor results
Issue: Data Management Plans
Data Life Cycle
Creating data → Processing data → Analysing data → Preserving data → Giving access to data → Re-using data
Creating a Data Management Plan
• “the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project;
• the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies);
• policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements;
• policies and provisions for re-use, re-distribution, and the production of derivatives; and
• plans for archiving data, samples, and other research products, and for preservation of access to them”
Issue: Data Management Plans
• Best Practices
  • What types of data will be created?
  • Who will own, have access to, and be responsible for managing these data?
  • What equipment and methods will be used to capture and process data?
  • Where will data be stored during and after?
Issue: File Management
• Does this sound familiar?
  • Inconsistently labeled files
  • in multiple versions…
  • inside poorly structured folders…
  • stored on multiple media…
  • in multiple locations…
  • and in various formats…
Issue: File Naming
• Best Practices
  • Avoid special characters in a file name.
  • Use capitals or underscores instead of periods or spaces.
  • Use 25 or fewer characters.
  • Use documented & standardized descriptive information about the project/experiment.
  • Use the ISO 8601 date format: YYYYMMDD.
  • Include a version number.
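The naming rules above can be sketched as a small helper. This is only an illustration, not part of the curriculum: the function name `build_file_name`, the part order (project, description, date, version), and the choice to count the 25 characters without the extension are all assumptions.

```python
import re
from datetime import date

MAX_LENGTH = 25  # assumption: the limit is applied to the name without its extension


def _clean(text: str) -> str:
    """Drop special characters and spaces; start each word with a capital."""
    words = [w for w in re.split(r"[^A-Za-z0-9]+", text) if w]
    return "".join(w[:1].upper() + w[1:] for w in words)


def build_file_name(project: str, description: str, day: date,
                    version: int, extension: str) -> str:
    """Hypothetical helper: Project_Description_YYYYMMDD_vNN.ext"""
    stem = "_".join([
        _clean(project),
        _clean(description),
        day.strftime("%Y%m%d"),   # ISO 8601 basic date format
        f"v{version:02d}",        # two-digit version number
    ])
    if len(stem) > MAX_LENGTH:
        raise ValueError(f"{stem!r} is longer than {MAX_LENGTH} characters")
    return f"{stem}.{extension.lstrip('.').lower()}"


if __name__ == "__main__":
    print(build_file_name("RNA", "gel 1", date(2014, 3, 7), 2, "tif"))
```

Raising an error for an over-long name, rather than truncating it silently, keeps the descriptive parts documented and predictable.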
Need Help? Contact
Issue: Metadata
What is Metadata?
• “Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use or manage an information resource.”
-- NISO, Understanding Metadata, 2004, pg. 1
• A love note to the future…
  • How will someone make sense of your data, e.g. the cells and values of your spreadsheet?
  • What universal or disciplinary standards could be used to label your data?
  • How can you describe a data set to make it discoverable?
Why Use Metadata?
• Find data from other researchers to support your research
• Use the data that you do find
• Help other professionals find and use data from your research
• Use your own data in the future when you may have forgotten details of the research
• Help ensure consistency and clarity of data through the use of technical standards and controlled vocabularies
Common metadata fields
• Title
• Creator
• Identifier
• Subject
• Funders
• Rights
• Access information
• Language
• Dates
• Location
• Methodology
• Data processing
• Sources
• List of file names
• File Formats
• File structure
• Variable list
• Code lists
• Versions
• Checksums
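A few of the fields listed above (title, creator, language, list of file names, file formats, checksums) can be gathered automatically. The sketch below is a minimal illustration under stated assumptions: the function `describe_dataset`, the dictionary keys, and the use of SHA-256 for checksums are choices made here, not a metadata standard.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum a file in chunks so large data files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def describe_dataset(folder: Path, title: str, creator: str,
                     language: str = "en") -> dict:
    """Hypothetical helper: build a small metadata record for one folder."""
    files = sorted(p for p in folder.iterdir() if p.is_file())
    return {
        "title": title,
        "creator": creator,
        "language": language,
        "file_names": [p.name for p in files],
        "file_formats": sorted({p.suffix.lstrip(".").lower() for p in files}),
        "checksums": {p.name: sha256_of(p) for p in files},
    }


if __name__ == "__main__":
    # Demonstration on a throwaway folder with made-up content.
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "readings.csv").write_text("sample,value\nA,1\n", encoding="utf-8")
        record = describe_dataset(folder, "Example readings", "Example Lab")
        print(json.dumps(record, indent=2))
```

Storing the record as plain JSON next to the data keeps the description readable without special software.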
What else?
• Standard conventions are used to describe content in a way that ensures units such as date, time, location, etc. are entered consistently among the researchers in your group
• Controlled vocabularies are lists of predefined terms that ensure consistency of use, and help disambiguate similar concepts. Use the controlled vocabulary that best matches your research.
  • You might create a short list of terms to choose from when populating a specific piece of data
  • For example, subject terms used in research about biometric sensing might be taken from a controlled vocabulary list such as Medical Subject Headings (MeSH)
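A short controlled list can be enforced at data entry. The sketch below is an assumption-laden illustration: the three terms are placeholders invented for this example (they have not been checked against MeSH), and `normalise_subject` is a hypothetical helper name.

```python
# Placeholder terms for illustration only; a real list would be taken from the
# controlled vocabulary your group has agreed on.
CONTROLLED_TERMS = ["Heart Rate", "Skin Temperature", "Gait"]

# Map a case-insensitive key to the one canonical spelling of each term.
_LOOKUP = {term.casefold(): term for term in CONTROLLED_TERMS}


def normalise_subject(raw: str) -> str:
    """Return the canonical term, or raise if the entry is not on the list."""
    key = " ".join(raw.split()).casefold()  # collapse stray whitespace
    try:
        return _LOOKUP[key]
    except KeyError:
        raise ValueError(f"{raw!r} is not in the controlled vocabulary") from None


if __name__ == "__main__":
    print(normalise_subject("  heart   rate "))
```

Rejecting unknown terms, instead of storing them as typed, is what keeps entries consistent across the researchers in a group.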
Issue: Metadata
• Biology and health-specific metadata examples
Issue: Metadata
• Best Practices: Create a Data Dictionary
  • Describe the contents of data files
  • Define the parameters and their units
  • Explain the formats for dates, time, geographic coordinates, and other parameters
  • Define any coded values
  • Describe quality flags or qualifying values
  • Define missing values
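One way to make a data dictionary useful is to keep it machine-readable and check rows against it. The sketch below assumes a made-up four-column file; the column names, the `-999` missing-value code, and the quality codes are all hypothetical examples, not values from the curriculum.

```python
from datetime import datetime

# Hypothetical data dictionary: one entry per column of an imagined data file.
DATA_DICTIONARY = {
    "sample_id": {"description": "Identifier written on the tube label"},
    "collected_on": {"description": "Collection date", "date_format": "%Y%m%d"},
    "temp_c": {"description": "Bath temperature", "units": "degrees Celsius",
               "numeric": True, "missing": "-999"},
    "quality": {"description": "Quality flag",
                "codes": {"0": "good", "1": "suspect", "2": "bad"}},
}


def check_row(row: dict) -> list:
    """Return a list of problems found in one row (empty list means none)."""
    problems = []
    for column, rule in DATA_DICTIONARY.items():
        if column not in row:
            problems.append(f"{column}: column is absent")
            continue
        value = row[column]
        if "missing" in rule and value == rule["missing"]:
            continue  # the documented missing-value code is always acceptable
        if "codes" in rule and value not in rule["codes"]:
            problems.append(f"{column}: {value!r} is not a defined code")
        if "date_format" in rule:
            try:
                datetime.strptime(value, rule["date_format"])
            except ValueError:
                problems.append(f"{column}: {value!r} is not in YYYYMMDD format")
        if rule.get("numeric"):
            try:
                float(value)
            except ValueError:
                problems.append(f"{column}: {value!r} is not a number")
    return problems


if __name__ == "__main__":
    print(check_row({"sample_id": "S1", "collected_on": "2014-03-07",
                     "temp_c": "warm", "quality": "9"}))
```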
Need Help? Contact
Metadata and the ELN
• Any searchable field in the Agilent or LabArchives ELN technically contains metadata
• In both ELNs, you can add tags/keywords to experiments, data files, and image files
• In some cases you can create a pre-defined list of tags/keywords to choose from
Agilent: Searchable fields
Agilent: Funding Source via menu; Project Focus via menu
Agilent: Associate metadata with an experiment using keywords
LabArchives: Associate metadata with an experiment using tags
LabArchives: Associate keyword metadata with an image file
Issue: Backup & Security
• How often should data be backed up?
• How many copies of data should you have?
• Where can you store your data?
• How much server space can I get?
Issue: Backup & Security
• Best Practices
  • Make 3 copies (original + external/local + external/remote)
  • Have them geographically distributed (local vs. remote)
  • Use a hard drive (e.g. Vista backup, Mac Time Machine, UNIX rsync) or tape backup system
  • Cloud storage: some examples of private sector storage resources include Amazon S3, Elephant Drive, Jungle Disk, Mozy, and Carbonite
  • Unencrypted storage is ideal because it keeps your data most easily readable by you and others in the future. If you do need to encrypt your data because it involves human subjects:
    • Keep passwords and keys on paper (2 copies), and in a PGP (Pretty Good Privacy) encrypted digital file
  • Uncompressed storage is also ideal, but if you need to compress to conserve space, limit compression to your 3rd backup copy
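The "3 copies" rule can be sketched as copy-then-verify. This is a minimal illustration, not a replacement for a real backup tool: the function `back_up` is hypothetical, and the two destination folders in the demonstration merely stand in for an external local drive and a remote location.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def checksum(path: Path) -> str:
    """SHA-256 of a file (reads the whole file; fine for a small sketch)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def back_up(original: Path, destinations: list) -> list:
    """Copy `original` into each destination folder and verify every copy."""
    expected = checksum(original)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / original.name
        shutil.copy2(original, target)  # copy2 also keeps file timestamps
        if checksum(target) != expected:
            raise OSError(f"copy at {target} does not match the original")
        copies.append(target)
    return copies


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        original = root / "readings.csv"
        original.write_text("sample,value\nA,1\n", encoding="utf-8")
        # Stand-ins for an external local drive and a remote location.
        copies = back_up(original, [root / "local_drive", root / "remote_site"])
        print(f"original + {len(copies)} verified copies")
```

Comparing checksums after each copy catches silent corruption that a simple "file exists" check would miss.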
Issue: Ownership & Retention
• How long is long enough?
Issue: Ownership & Retention
• Intellectual Property Policy
• IRB data retention policy
• Funders’ data retention policy
• Publishers’ data retention policy
• Federal and State laws
Issue: Long-Term Planning
• What will happen to my data after my project ends?
• How can I appraise the value of my data?
• What are my options for archiving and preserving my data?
• What are my options for publishing and sharing data?
Open vs. Proprietary Formats Used in Research Labs
Issue: Long-Term Planning
• Best Practices
  • When choosing a file format, select a consistent format that can be read well into the future and is independent of changes in applications.
  • Non-proprietary: files in open, documented standard, unencrypted, uncompressed, ASCII formats will be readable into the future.
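As one illustration of an open, uncompressed, plain-text format, tabular data can be saved as CSV using only the Python standard library. The helper name `save_as_csv` and the sample rows are assumptions for this sketch; UTF-8 is used here, which produces the same bytes as ASCII whenever the content is pure ASCII.

```python
import csv
import tempfile
from pathlib import Path


def save_as_csv(rows: list, columns: list, path: Path) -> None:
    """Write a list of dictionaries as plain-text CSV with a header row."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "readings.csv"
        save_as_csv([{"sample": "A", "value": 1}], ["sample", "value"], target)
        print(target.read_text(encoding="utf-8"))
```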
Works Cited
Lamar Soutter Library, University of Massachusetts Medical School. 2014. “New England Collaborative Data Management Curriculum: Module 1.” http://library.umassmed.edu/necdmc.
DataONE. 2013. “Best Practices for Data Management.” http://www.dataone.org/best-practices.
MIT Libraries. 2013. “Data Management and Publishing.” MIT. http://libraries.mit.edu/guides/subjects/data-management/index.html.
Office of Research Integrity. 2013. “Data Management.” United States Department of Health and Human Services. United States Federal Government. http://ori.hhs.gov/education/products/rcradmin/topics/data/open.shtml.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 United States License.
Learn More
• Data Management Principles & Education:
  • Tufts Libraries Data Management Guide
  • Research Data MANTRA
  • DataONE: Best Practices
  • UK Data Archives
  • MIT Data Management and Publishing Guide
• Data Management Plans
  • Digital Curation Centre
  • DMPTool2
  • DataONE: Data Management Planning
Find Help
• Data Management Plans and Metadata services:
  • Medford/Somerville Campus: [names/contact info]
  • Boston/Grafton Campus: [librarian names/contact info]
• Data storage and security services + ELN support:
  • [TTS contact info]