Data Management for Research Aaron Collie, MSU Libraries Lisa Schmidt, University Archives


Introductions
• Please tell us your name and department
• A brief description of your primary research area
• What do you consider to be your research data?

Optional:
• Experience managing research data?
• Experience writing a data management plan?

cc http://www.flickr.com/photos/quinnanya/


Agenda

• Introductions
• Background
  • Definitions
  • Upfront Decisions
  • Data Sharing Impacts
• Fundamental Practices
  • File Organization
  • Data Documentation
  • Reliable Backup
• Data Lifecycle Strategy


Why are we here?


But why are we really here?

An Impetus: NSF recently released a mandate that all grant applications submitted after January 18th, 2011 must include a supplemental “Data Management Plan”

An Effect: This mandate from NSF has had a domino effect, and many funders now require, or state guidelines for, data management of grant-funded research

A Challenge: Data management (and oftentimes research methods in general) is an area that has not traditionally received a full treatment in most graduate and doctoral curricula


What is meant by “data management”?

Fundamental Practices
• File Organization
• Data Documentation
• Reliable Backups

Data Lifecycle
• Digital Sustainability
• Scholarly Communication
• Data Publishing
• Research Impact


• Effective January 18, 2011, NSF will not evaluate any proposal missing a DMP
• May be up to two pages long
• PI may state that project will not generate data or samples
• DMP is reviewed as part of intellectual merit or broader impacts of application, or both
• Costs to implement DMP may be included in proposal’s budget


NSF’s Data Management Guidelines
• Policies for re-use, re-distribution, and creation of derivatives
• Plans for archiving data, samples, and other research outcomes, maintaining access
• Types of data, samples, physical collections, software generated
• Standards for data and metadata format and content
• Access and sharing policies, with stipulations for privacy, confidentiality, security, intellectual property, or other rights or requirements


Other Federal Policies

NASA “promotes the full and open sharing of all data”

“requires that data…be submitted to and archived by designated national data centers.”

“expects the timely release and sharing of final research data”

"IMLS encourages sharing of research data."

“…should describe how the project team will manage and disseminate data generated by the project”


Upfront Decisions for Researchers
• What is the expected lifespan of the data?
• Besides the researcher(s) on the project, who else should be given access to the data?
• Does the dataset include any sensitive information?
• Who owns or controls the research data?
• Should any restrictions be placed on the dataset?
• How are the data stored and preserved?


Upfront Decisions for Researchers (cont.)
• How might the data be used, reused, and repurposed?
• How are the data described and organized?
• Who are the expected and potential audiences for the datasets?
• What publications or discoveries have resulted from the datasets?
• How should the data be made accessible?


Data Sharing Impacts
• Reinforces open scientific inquiry
• Encourages diversity of analysis and opinion
• Promotes new research, testing of new or alternative hypotheses and methods of analysis
• Supports studies on data collection methods and measurement

Cc http://www.flickr.com/photos/pinchof_10/


Data Sharing Impacts (cont.)

Facilitates education of new researchers

Enables exploration of topics not envisioned by initial investigators

Permits creation of new datasets by combining data from multiple sources


File Organization Practices: Overview

1. Create a file plan for your research project

2. Design a file naming convention that works for your project

3. Agree on a version control method to assist with file synchronization

4. Carefully choose file formats to maximize usefulness

“When I was a freshmen I named my assignments Paper Paperr Paperrr Paperrrr”-Undergrad


1. Create a file plan for your research project

File plan as a classification system
• Indexed – makes it easier to locate folders/files
• Primary subjects – main functions of research project
  • Secondary subjects – more specific activities of project, including research data
    • Tertiary subjects – limit by date or equivalent
      – File Name (naming conventions)


1. Create a file plan for your research project (cont.)

Example documentation of Directory Hierarchy: /[Project]/[Grant Number]/[Event]/[Date]

Example documentation of File Naming Convention: [investigator]_[method]_[descriptor]_[YYYYMMDD]_[version].[ext]
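The two documented patterns above can be turned into small helper functions, so that every path and file name in a project is generated the same way. The following is a minimal sketch, not a prescribed tool: the function names, the sample values, and the choice of YYYYMMDD for the [Date] folder are assumptions made for illustration.

```python
from datetime import date
from pathlib import PurePosixPath


def build_directory(project: str, grant_number: str, event: str, day: date) -> PurePosixPath:
    """Return /[Project]/[Grant Number]/[Event]/[Date] as a POSIX-style path.

    Assumption: the [Date] folder uses the same YYYYMMDD form as file names.
    """
    return PurePosixPath("/", project, grant_number, event, day.strftime("%Y%m%d"))


def build_file_name(investigator: str, method: str, descriptor: str,
                    day: date, version: str, ext: str) -> str:
    """Return [investigator]_[method]_[descriptor]_[YYYYMMDD]_[version].[ext]."""
    stem = "_".join([investigator, method, descriptor, day.strftime("%Y%m%d"), version])
    return f"{stem}.{ext}"


# Hypothetical values, chosen only to illustrate the pattern.
day = date(2011, 1, 17)
folder = build_directory("KrillStudy", "GRANT0001", "Cruise01", day)
name = build_file_name("sharpeW", "sem", "krillMicrograph", day, "v001", "tif")
print(folder / name)
```

Generating names from one function keeps the order of the parts and the date format consistent across everyone on the project.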


2. Design a file naming convention that works for your project

Why file naming conventions?
• Enable better access/retrieval of files
• Create logical sequences for file sorting
• More easily identify what you’re searching for


2. Design a file naming convention that works for your project (cont.)

• Meaningful but short (255 character limit)
• Descriptive while still making sense
• Capital letters or underscores differentiate between words
• Surname first followed by initials of first name
• More on handout


2. Design a file naming convention that works for your project (cont.)

This: sharpeW_krillMicrograph_backscatter3_20110117.tif
Not this: KrillData2011.tif

This: borgesJ_collocation_20080414.xml
Not this: Borges_Textbase.xml
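A naming convention is easier to keep when file names can be checked automatically. The sketch below tests names against one possible reading of the convention: underscore-separated parts, an 8-digit date, an optional version tag, and an extension. The regular expression and the function name are assumptions for illustration, not a rule taken from the slides.

```python
import re

# Assumed pattern: underscore-separated alphanumeric parts, then an
# 8-digit date, an optional v/d version tag, and a file extension.
PATTERN = re.compile(
    r"^[A-Za-z0-9]+(?:_[A-Za-z0-9]+)*"   # investigator and descriptive parts
    r"_(?P<date>\d{8})"                  # date as YYYYMMDD
    r"(?:_(?P<version>[vd]\d{3}))?"      # optional version tag, e.g. v002
    r"\.(?P<ext>[A-Za-z0-9]+)$"          # extension
)


def follows_convention(file_name: str) -> bool:
    """Return True when the file name matches the assumed pattern."""
    return PATTERN.match(file_name) is not None


for candidate in [
    "sharpeW_krillMicrograph_backscatter3_20110117.tif",
    "KrillData2011.tif",
    "borgesJ_collocation_20080414.xml",
    "Borges_Textbase.xml",
]:
    print(candidate, follows_convention(candidate))
```

The check only looks at the shape of the name; it does not confirm that the 8 digits form a real calendar date.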


3. Agree on a version control method to assist with file synchronization

Version number of the record is indicated in the file name with “v” followed by the version number

Letter “d” indicates draft

Examples of simple version control:
waltM_lakeLansing_fieldNotes_20091012_v002.doc
petersK_OrgChart2009_d001.svg
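The version tag at the end of the file name can be incremented by a small script instead of by hand. This is a minimal sketch that assumes the tag is always "v" or "d" followed by three digits, placed directly before the extension; the function name is hypothetical.

```python
import re

# Assumed tag shape: _v### or _d### immediately before the extension.
VERSION_TAG = re.compile(r"_(?P<kind>[vd])(?P<number>\d{3})(?=\.[^.]+$)")


def next_version(file_name: str) -> str:
    """Return the file name with its version or draft number raised by one."""
    match = VERSION_TAG.search(file_name)
    if match is None:
        raise ValueError(f"no version tag found in {file_name!r}")
    bumped = int(match.group("number")) + 1
    tag = f"_{match.group('kind')}{bumped:03d}"
    return file_name[:match.start()] + tag + file_name[match.end():]


print(next_version("waltM_lakeLansing_fieldNotes_20091012_v002.doc"))
print(next_version("petersK_OrgChart2009_d001.svg"))
```

Saving each revision under the next number, rather than overwriting the previous file, keeps earlier versions available for comparison.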


4. Carefully choose file formats to maximize usefulness

• Non-proprietary
• Open, documented standard
• Common usage by research community
• Standard representation (ASCII, Unicode)
• Unencrypted
• Uncompressed


Documentation Practices: Overview

1. At minimum create a README file that you can use to document your project

2. Utilize standards for describing data including Metadata Standards

3. If applicable, use in-line code commentary to explain code

(cc) Will Scullin


1. At minimum create a README file that you can use to document your project

At minimum, store documentation in readme.txt file or equivalent, with data

Resource: http://libraries.mit.edu/guides/subjects/data-management/metadata.html
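A README can be generated from a template so that every project folder starts with the same headings. The sketch below is one way to do this, not a required layout: the headings, the function name, and the sample values are all assumptions chosen for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical headings; adapt them to what the project needs to record.
README_TEMPLATE = """\
PROJECT TITLE: {title}
CREATOR: {creator}
DATE CREATED: {created}
DESCRIPTION: {description}
FILE NAMING CONVENTION: {naming}
"""


def write_readme(folder: Path, **fields: str) -> Path:
    """Write readme.txt into the given folder and return its path."""
    path = folder / "readme.txt"
    path.write_text(README_TEMPLATE.format(**fields), encoding="utf-8")
    return path


# Demonstration in a temporary folder standing in for a data directory.
with tempfile.TemporaryDirectory() as tmp:
    readme = write_readme(
        Path(tmp),
        title="Example project",
        creator="Example investigator",
        created="2011-01-17",
        description="Sample images and field notes.",
        naming="[investigator]_[method]_[descriptor]_[YYYYMMDD]_[version].[ext]",
    )
    print(readme.read_text(encoding="utf-8"))
```

Plain text is used on purpose: it stays readable without any particular software, which matches the file format advice earlier in the deck.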


2. Utilize standards for describing data including Metadata Standards

• “Data about data”
• Standardized way of describing data
• Explains who, what, where, when of data creation and methods of use
• Provides the essential tools for discovery, such as a bibliographic citation


2. Utilize standards for describing data including Metadata Standards

Basic project metadata:

• Title • Language • File Formats

• Creator • Dates • File Structure

• Identifier • Location • Variable List

• Subject • Methodology • Code Lists

• Funders • Data Processing • Versions

• Rights • Sources • Checksums

• Access Information

• List of File Names
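Two items in the list above, the list of file names and the checksums, can be produced together by a short script. The following is a minimal sketch that walks a folder and records a SHA-256 checksum for each file; the function names and the choice of SHA-256 are assumptions, since the slides do not name an algorithm.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_manifest(folder: Path) -> dict:
    """Map each file's path (relative to folder) to its checksum."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


# Demonstration with two small sample files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "notes.txt").write_text("field notes\n", encoding="utf-8")
    (root / "raw").mkdir()
    (root / "raw" / "counts.csv").write_text("site,count\nA,3\n", encoding="utf-8")
    print(json.dumps(file_manifest(root), indent=2))
```

Recomputing the manifest later and comparing it with the stored one shows whether any file has changed or gone missing.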


Documentation Practices: Example Metadata Standards

Dublin Core: Easy-to-create-and-maintain descriptive format to facilitate cross-domain resource discovery on the Web

Darwin Core: Facilitates reference and sharing of biological diversity datasets

Data Documentation Initiative (DDI): Methodology for content, presentation, transport, and preservation of metadata about datasets in the social and behavioral sciences
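As an illustration of what a standards-based description looks like, the sketch below writes a few Dublin Core elements as XML using Python's standard library. The element names (title, creator, date, format) and the namespace come from the Dublin Core element set; the "metadata" wrapper element, the function name, and the sample values are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict) -> str:
    """Return an XML string with one dc:<name> element per field."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


# Sample values for illustration only.
record = dublin_core_record({
    "title": "Krill micrograph, backscatter 3",
    "creator": "Sharpe, W.",
    "date": "2011-01-17",
    "format": "image/tiff",
})
print(record)
```

A repository or discipline may expect a different wrapper or a richer schema, so the target archive's own guidance takes precedence over this layout.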


Documentation Practices: Example Metadata Standards

Directory Interchange Format: Descriptive format for exchanging information about earth science data

ISO 19115:2003: Describes geographic data such as maps and charts

PBCore: Supports description and exchange of media assets, including both individual clips and full, edited, aired productions


Documentation Practices: Example Metadata Standards

Science Data Literacy Project: Metadata for astronomy, biology, ecology and oceanography

VRA Core: Data standard for description of works of visual culture as well as images that document them


3. If applicable, use in-line code commentary to explain code

Example of R code commentary

# Cumulative normal distribution: P(Z <= q) for each quantile q
pnorm(c(-1.96, 0, 1.96))


Backup Practices: Overview

1. Avoid single points of failure
2. Understand the different types of storage
3. Ensure data redundancy
4. Aim for geographic distribution of data


1. Avoid single points of failure

A single point of failure exists when a single event (e.g. a dropped hard drive) would be enough to destroy all data on a device

Good practices for avoiding single points of failure:
• Use managed networked storage whenever possible
• Move data off of portable media
• Never rely on one copy of data
• Do not rely on CD or DVD copies to be readable
• Be wary of software lifespans (e.g. ANGEL)


2. Understand the different types of storage

• Flash Drives
• Internal Hard Drives
• External Hard Drives
• Server and Web Storage
• Managed Networked Storage
• Cloud Storage


3. Ensure data redundancy

Backup Do’s:
• Make 3 copies
  – E.g. original + external/local + external/remote
  – E.g. original + 2 formats on 2 drives in 2 locations
• Geographically distribute and secure
  – Local vs. remote, depending on needed recovery time
• Personal computer, external hard drives, departmental, or university servers may be used
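The "original + external/local + external/remote" pattern can be scripted so that each copy is verified as soon as it is written. This is a minimal sketch under stated assumptions: the two destination folders stand in for a local external drive and a remote location, and the function names are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a (small) file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def back_up(original: Path, destinations: list) -> list:
    """Copy the original into each destination folder and verify every copy."""
    expected = sha256_of(original)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(original, folder / original.name))
        if sha256_of(copy) != expected:
            raise IOError(f"checksum mismatch for {copy}")
        copies.append(copy)
    return copies


# Demonstration: temporary folders stand in for the real storage locations.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    original = root / "working" / "fieldNotes_20091012_v002.txt"
    original.parent.mkdir()
    original.write_text("sample observations\n", encoding="utf-8")
    copies = back_up(original, [root / "external_local", root / "external_remote"])
    print(len(copies) + 1, "copies in total")
```

In real use the two destinations would sit on different devices in different places; two folders on the same disk would still be a single point of failure.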


3. Ensure data redundancy (cont.)

Backup Don’ts:
• Do not rely on one copy
• Do not use CDs and DVDs
• Do not rely on ANGEL

(cc) George Ornbo


3. Ensure data redundancy (cont.)

Backup Maybe: Cloud storage
• Amazon S3
• Google
• MS Azure
• DuraCloud
• Rackspace

Note that many enterprise cloud storage services charge for data transfers in and out

$$$


Research is…

Define a question → Gather information → Form a hypothesis → Test the hypothesis → Analyze the data → Interpret the data → Publish results → Retest


Define a question → Gather information → Form a hypothesis → Test the hypothesis → Analyze the data → Interpret the data → Publish results → Retest → ?


The scientific method “is often misrepresented as a fixed sequence of steps,” rather than being seen for what it truly is, “a highly variable and creative process” (AAAS 2000:18).

Gauch, Hugh G. Scientific Method in Practice. New York: Cambridge University Press, 2010. Print. (Emphasis added)


The Research Depth Chart

Scientific Method

Research Design

Research Method

Research Tasks

[Axis labels: More Specific, More Generic]


Source: DDI Structural Reform Group. “Overview of the DDI Version 3.0 Conceptual Model.” DDI Alliance. 2004. http://opendatafoundation.org/ddi/srg/Papers/DDIModel_v_4.pdf

The Data Management Depth Chart

Research Data Lifecycle Model


The Data Management Depth Chart

Research Data Lifecycle Model

Research Data Management Tasks

???

???


The Data Management Depth Chart

Research Data Lifecycle Model

???

Data Management Plan

Research Data Management Tasks


Data are brainstormed

Study Concept


Data are brainstormed

DMP • Data type, purpose & value

MSU

• University Research Council guidelines
• Research Facilitation and Dissemination
• Lifecycle Data Management Planning
• Research Data Management Guidance

YOU • Start your Data Management Plan!


Data are collected and secured

Study Concept

Data Collection


Data are collected

DMP • Data format, size & short term storage

MSU

• ATS Andrew File System (AFS)
• Institute for Cyber Enabled Research
• MSU Libraries Data Services
• MSU Libraries Campus Data Resources

YOU • File Plan, File Naming, Backup Plan


Data are normalized and processed

Study Concept

Data Collection

Data Processing


Data are processed

DMP • Data transformations & structures

MSU

• LCT Computing Courses
• High Performance Computing Center
• Consortium of Research Consulting Services

YOU • Documentation, Methodology


Data are distributed

Data Distribution

Study Concept

Data Collection

Data Processing


Data are distributed

DMP • Data sharing, security & rights

MSU

• Human Research Protection Program
• University Research Council guidelines
• MSU Libraries Copyright Permissions Center
• MSU Google Apps

YOU • Roles, Responsibilities, Resources


Data are discoverable

Data Distribution

Study Concept

Data Collection

Data Processing

Data Discovery


Data are discoverable

DMP • Data publishing & metadata

MSU

• Development of Copyrighted Materials
• MSU Libraries Data Citation Guide

YOU • README, Metadata Standard


Data are analyzed

Data Distribution

Data Discovery

Data Analysis

Study Concept

Data Collection

Data Processing


Data are analyzed

DMP • Standards & workflow documentation

MSU

• Center for Statistical Training and Consulting
• Statistical Consulting Services

YOU • Code Commentary, Documentation


Data are stored and preserved

Data Distribution

Data Discovery

Data Analysis

Study Concept

Data Collection

Data Processing

Data Archiving


Data are preserved

DMP • Long term storage & management

MSU

• VPRGS Repositories and Archives
• Lifecycle Data Management Planning
• Databib.org!

YOU • Embrace stewardship


Data can be used and reused

Data Distribution

Data Discovery

Data Analysis

Study Concept

Data Collection

Data Processing

Data Archiving

Repurposing


Data can be used and reused

DMP • Broader impact

MSU

• Research Data Management CAFE
• MSU Research Centers and Institutes
• MSU Libraries Data Citation Guide

YOU • Publish your data!


Research Data Management Guidance

Face-to-face Advising
• Writing Data Management Plans
• Planning for Digital Projects
• Managing Digital Information

Group Training
• New Faculty Orientation
• Faculty Seminars
• Classroom Instruction

lib.msu.edu/about/rdmg


In Conclusion…
• Upfront Decisions Researchers Need to Make
• General Good Practices for Managing Research Data
• NSF, NIH, IMLS and Other Funders’ Requirements
• Lifecycle of Research Data


Contact

Lisa M. Schmidt
Electronic Records Archivist
University Archives & Historical [email protected]

Aaron Collie
Digital Curation Librarian
MSU [email protected]