Data Management Stephanie Wright University of Washington [email protected] SPATIAL / IsoCamp June 2015 Tips & Tools

Data Management: Tips & Tools

Data Management

Stephanie Wright
University of Washington
[email protected]

SPATIAL / IsoCamp
June 2015

Tips & Tools

Who Am I?

• Computing Trainer
• Cruise Ship Lecturer (Love Boat)
• Library Merger Manager
• Atmospheric Sciences Librarian
• Assessment Librarian
• Data Services Coordinator

http://guides.lib.washington.edu/swright

Disclaimer: I am not a scientist. I am a librarian…

Disclaimer: I am not a scientist. More like this…

What Do I Do?

• Data Management Plans (DMPs)
• Courses
• Consultations
• Research Projects
• DataONE, RDA, eScience Institute
• Institutional Data Repository (DRUW)

Why?

THEN vs. NOW (image comparisons)

A Real Life Example

my spreadsheet (screenshots):
• Many tables
• No headings
• Embedded figures

?

One More Example

https://www.youtube.com/watch?v=66oNv_DJuPc

Data Sharing and Management Snafu in 3 Short Acts 

Why Does It Matter?

From Flickr by tomhilton

http://www.sparc.arl.org/issues/open-data/data-sharing-initiative/policies

… “Federal agencies investing in research and development (more than $100 million in annual expenditures) must have clear and coordinated policies for increasing public access to research products.”

“The best thing to do with your data will be thought of by someone else.”

“We need open data because we don’t just want to use a car we want to poke around in the engine, see how it works and then rebuild it.”

~ Rufus Pollock, Founder and President of the Open Knowledge Foundation (www.okfn.org)

From Flickr by cogdog

Wicherts JM, Bakker M, Molenaar D (2011) Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results. PLoS ONE 6(11): e26828. doi:10.1371/journal.pone.0026828

How To Do It?

Data planning is more efficient than data forensics.

DATA MANAGEMENT PLANNING
• What will be collected
• Methods
• Standards
• Sharing/access
• Long-term storage

COLLECTING
• Keep raw data raw
• Use scripts to process data
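The two COLLECTING tips fit together in a small script: the raw file is only ever read, and each cleaning step writes a new file. A minimal Python sketch; the folder names (`data/raw/`, `data/processed/`), the `temp_c` column, and the rule of dropping rows with a blank temperature are illustrative assumptions, not part of the slides.

```python
"""Keep raw data raw: read the raw file, write cleaned rows to a new file.

Hypothetical layout: data/raw/ holds untouched instrument exports and
data/processed/ holds script output; the 'temp_c' column is an example.
"""
import csv
import stat
from pathlib import Path


def protect_raw(raw_file: Path) -> None:
    """Remove write permission so the raw file is not edited by accident."""
    mode = raw_file.stat().st_mode
    raw_file.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def process(raw_file: Path, out_file: Path) -> int:
    """Copy rows with a non-blank 'temp_c' value to out_file; return the count.

    The raw file is opened for reading only and is never rewritten.
    """
    out_file.parent.mkdir(parents=True, exist_ok=True)
    kept = 0
    with raw_file.open(newline="") as src, out_file.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row["temp_c"].strip():  # example cleaning rule (assumption)
                writer.writerow(row)
                kept += 1
    return kept
```

Typical use: call `protect_raw(Path("data/raw/readings.csv"))` once after downloading, then rerun `process(...)` as often as needed; if a cleaning rule changes, the raw file is still there to start over from.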

ORGANIZING
• Machine readable
• Human readable
• Works well with default ordering

AVOID
• spaces
• punctuation
• special characters
• case sensitivity
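The AVOID list can be applied automatically with a small renaming helper. This Python sketch is an illustration, not a standard tool: it folds accented letters to ASCII, turns every run of spaces, punctuation, or special characters into a single underscore, and lowercases everything so two names never differ only by case. Lowercase is an assumed choice; CamelCase, as in the DOE example names, works just as well if used consistently.

```python
import re
import unicodedata


def clean_name(name: str) -> str:
    """Rewrite a file name without spaces, punctuation, special characters,
    or mixed case. The extension is kept (lowercased).
    """
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension present
        stem, ext = name, ""
    # Fold accented letters to plain ASCII (e.g. "é" -> "e").
    stem = unicodedata.normalize("NFKD", stem).encode("ascii", "ignore").decode()
    # Replace each run of non-alphanumeric characters with one underscore,
    # trim stray underscores at the ends, and lowercase.
    stem = re.sub(r"[^A-Za-z0-9]+", "_", stem).strip("_").lower()
    return f"{stem}.{ext.lower()}" if ext else stem
```

For example, `clean_name("My Data (FINAL) v2.xlsx")` gives `my_data_final_v2.xlsx`.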

20130503_DOEProject_DesignDocument_Smith_v2-01.docx
20130709_DOEProject_MasterData_Jones_v1-00.xlsx
20130825_DOEProject_Ex1Test1_Data_Gonzalez_v3-03.xlsx
20130825_DOEProject_Ex1Test1_Documentation_Gonzalez_v3-03.xlsx
20131002_DOEProject_Ex1Test2_Data_Gonzalez_v1-01.xlsx
20141023_DOEProject_ProjectMeetingNotes_Kramer_v1-00.docx

Eaffinis_nanaimo_2010_counts.xls
(study organism _ site name _ year _ what was measured)
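Because each part of a name like `Eaffinis_nanaimo_2010_counts.xls` sits in a fixed position, a script can both build and read such names. A minimal Python sketch, assuming the organism_site_year_measurement order shown above and an underscore as the only separator; `build_name` and `parse_name` are hypothetical helpers.

```python
import re

# Each component may contain only letters and digits, so "_" is unambiguous.
TOKEN = re.compile(r"[A-Za-z0-9]+")


def build_name(organism: str, site: str, year: int, measured: str,
               ext: str = "xls") -> str:
    """Assemble <organism>_<site>_<year>_<measured>.<ext>, rejecting unsafe parts."""
    parts = [organism, site, f"{year:04d}", measured]
    for part in parts:
        if not TOKEN.fullmatch(part):
            raise ValueError(f"not a safe file-name component: {part!r}")
    return "_".join(parts) + "." + ext


def parse_name(filename: str) -> dict:
    """Split a name made by build_name back into its components.

    Raises ValueError if the name does not have exactly four parts.
    """
    stem, _, ext = filename.rpartition(".")
    organism, site, year, measured = stem.split("_")
    return {"organism": organism, "site": site, "year": int(year),
            "measured": measured, "ext": ext}
```

The same fixed-position idea applies to the DOE-style names above (date, project, description, person, version).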

YYYYMMDD
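Dates written as YYYYMMDD sort chronologically under a plain string sort, which is why they work well with default ordering. A small Python illustration with made-up file names, contrasting YYYYMMDD with a month-first MMDDYYYY format:

```python
from datetime import date


def stamp(d: date) -> str:
    """Format a date as YYYYMMDD."""
    return d.strftime("%Y%m%d")


# Hypothetical file dates spanning a year boundary.
dates = [date(2014, 1, 15), date(2013, 5, 3), date(2013, 10, 2)]

iso_names = [stamp(d) + "_notes.txt" for d in dates]
us_names = [d.strftime("%m%d%Y") + "_notes.txt" for d in dates]

# Plain string sort, as a file listing sorted by name would do.
print(sorted(iso_names))  # 2013-05-03, 2013-10-02, 2014-01-15: chronological
print(sorted(us_names))   # 01152014 (Jan 2014) lands first: not chronological
```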

Noble WS (2009) A Quick Guide to Organizing Computational Biology Projects. PLoS Computational Biology 5(7). doi:10.1371/journal.pcbi.1000424

• Pick a method that works for you and stick to it
• DOCUMENT IT!
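As one concrete method, the sketch below creates a project skeleton loosely modeled on the top-level folders described by Noble (2009): data, doc, src, bin, results. It also writes a README that documents the layout. The folder descriptions and the `make_project` helper are assumptions for illustration; adapt them to whatever method you pick.

```python
from datetime import date
from pathlib import Path

# Top-level folders loosely following Noble (2009); adjust to your own method.
FOLDERS = ["data", "doc", "src", "bin", "results"]


def make_project(root: Path) -> list[Path]:
    """Create the folder skeleton plus a README that documents it."""
    created = []
    for name in FOLDERS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    readme = root / "README.txt"
    if not readme.exists():  # never overwrite existing documentation
        readme.write_text(
            f"Project created {date.today():%Y%m%d}\n"
            "data/    raw and processed data (keep raw data raw)\n"
            "doc/     notes, manuscripts, data management plan\n"
            "src/     source code for analysis scripts\n"
            "bin/     compiled programs and helper tools\n"
            "results/ analysis output, one dated subfolder per experiment\n"
        )
    return created
```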

METADATA
• Who?
• What?
• Where?
• When?
• How?
• Why?

Digital context

• Name of the data set

• The name(s) of the data file(s) in the data set

• Date the data set was last modified

• Example data file records for each data type file

• Pertinent companion files

• List of related or ancillary data sets

• Software (including version number) used to prepare/read the data set

• Data processing that was performed

Personnel & stakeholders

• Who collected

• Who to contact with questions

• Funders

Scientific context

• Scientific reason why the data were collected

• What data were collected

• What instruments (including model & serial number) were used

• Environmental conditions during collection

• Temporal & spatial resolution

• Standards or calibrations used

Information about parameters

• How each was measured or produced

• Units of measure

• Format used in the data set

• Precision & accuracy if known

Information about data

• Definitions of codes used

• Quality assurance & control measures

• Known problems that limit data use (e.g. uncertainty, sampling problems)
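The checklist above can double as a template. This Python sketch stores a dataset's metadata as a JSON record and lists which checklist items are still empty. It is an informal README-style record, not a formal metadata standard such as EML or ISO 19115. The section and field names are hypothetical labels for a subset of the items above.

```python
import json
from pathlib import Path

# Hypothetical section/field names covering part of the checklist above.
CHECKLIST = {
    "digital_context": ["dataset_name", "file_names", "last_modified",
                        "software", "processing"],
    "personnel": ["collected_by", "contact", "funders"],
    "scientific_context": ["reason", "data_collected", "instruments",
                           "resolution", "calibrations"],
    "parameters": ["methods", "units", "precision"],
    "data": ["codes", "qa_qc", "known_problems"],
}


def missing_fields(record: dict) -> list[str]:
    """Return 'section.field' for every checklist item that is absent or empty."""
    gaps = []
    for section, fields in CHECKLIST.items():
        block = record.get(section, {})
        for field in fields:
            if not block.get(field):
                gaps.append(f"{section}.{field}")
    return gaps


def save_record(record: dict, path: Path) -> None:
    """Write the record as indented UTF-8 JSON, readable by people and scripts."""
    path.write_text(json.dumps(record, indent=2, ensure_ascii=False),
                    encoding="utf-8")
```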

WORKFLOW

Simple: Flow chart

Temperature data + Salinity data
→ Data import into Excel → Data in spreadsheet
→ Quality control & data cleaning → "Clean" T & S data
→ Analysis: mean, SD → Summary statistics
→ Graph production

Simple: Commented script
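A commented script version of the temperature and salinity flow chart might look like this Python sketch. It uses the standard `statistics` module in place of Excel. The sample values, the -99 missing-value code, and the plausible-range limits used for quality control are all hypothetical.

```python
"""Temperature & salinity workflow as a commented script (sketch).

Assumptions: rows are (temperature_C, salinity_psu) pairs; -99 is a
missing-value code; the QC ranges below are illustrative, not standards.
"""
from statistics import mean, stdev

# Step 1: import data (inline here; a real script would read the raw file).
raw = [
    (12.1, 31.5),
    (12.4, 31.7),
    (-99.0, 31.6),  # -99 = missing temperature (hypothetical code)
    (12.0, 45.0),   # salinity outside the assumed plausible range
    (12.3, 31.4),
]

# Step 2: quality control & data cleaning -> "clean" T & S data.
T_RANGE = (-2.0, 35.0)  # assumed plausible temperature range, deg C
S_RANGE = (0.0, 42.0)   # assumed plausible salinity range, psu


def passes_qc(t: float, s: float) -> bool:
    """Keep a row only if both values fall inside the assumed ranges."""
    return T_RANGE[0] <= t <= T_RANGE[1] and S_RANGE[0] <= s <= S_RANGE[1]


clean = [(t, s) for t, s in raw if passes_qc(t, s)]

# Step 3: analysis -> summary statistics (mean and sample SD).
temps = [t for t, _ in clean]
sals = [s for _, s in clean]
summary = {
    "n": len(clean),
    "temp_mean": mean(temps), "temp_sd": stdev(temps),
    "sal_mean": mean(sals), "sal_sd": stdev(sals),
}

# Step 4: report; graph production would follow here (e.g. with matplotlib).
for key, value in summary.items():
    print(f"{key}: {value:.3f}" if isinstance(value, float) else f"{key}: {value}")
```

Each comment maps to a box in the flow chart, so the script documents the workflow and reruns it.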

Resulting output

More Fancy: Kepler, Taverna

From Flickr by cogdog

BACKING UP: 3 places, 3 ways

From Flickr by lippo

From Flickr by see phar

Original

Near

Far

What software? What hardware? What personnel?

How often? Set up reminders!

Test system
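For the "Test system" step, one common check is to compare checksums of the original and each copy. A Python sketch, assuming the near and far copies are reachable as ordinary folders (for example an external drive and a mounted network share). `back_up` is a hypothetical helper, and scheduling it (e.g. with cron or Windows Task Scheduler) is left to you.

```python
import hashlib
import shutil
from pathlib import Path


def sha256(path: Path) -> str:
    """Checksum a file in 1 MiB chunks so large files do not fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def back_up(original: Path, destinations: list[Path]) -> dict[Path, bool]:
    """Copy `original` into each destination folder and verify every copy.

    Returns {copy_path: checksum_matches_original}.
    """
    expected = sha256(original)
    results = {}
    for dest in destinations:
        dest.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(original, dest))  # copy2 also keeps timestamps
        results[copy] = sha256(copy) == expected
    return results
```

A `False` in the result means that copy is not trustworthy; restoring a file from each location now and then tests the rest of the system.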

SHARING

Repositories
• Institutional
• Disciplinary
• Journal
• re3data.org

Sustainable formats
• Open, non-proprietary
• Commonly used in your discipline
• Not encrypted or compressed
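Exporting to plain CSV is one way to meet all three sustainable-format criteria. A Python sketch using only the standard `csv` module; `export_csv` is a hypothetical helper.

```python
import csv
from pathlib import Path


def export_csv(rows: list[dict], path: Path) -> None:
    """Write records as plain-text UTF-8 CSV with a header row.

    CSV is open, non-proprietary, widely readable, and not compressed.
    """
    if not rows:
        raise ValueError("nothing to export")
    fieldnames = list(rows[0])  # column order taken from the first record
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

If your data currently live in .xlsx files, pandas can do the conversion (`pandas.read_excel(...)` followed by `DataFrame.to_csv(...)`), though `read_excel` needs an Excel engine such as openpyxl installed.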

Review your DMP: Did you do what you said you would?

Photo credit Michael Ham

How Do I Learn More?

• Funding Mandates:
  http://chronicle.com/article/Where-Should-You-Keep-Your/231065/
  http://datapub.cdlib.org/2013/02/28/the-new-ostp-policy-what-it-means/

• File Naming Conventions: http://www.exadox.com/en/articles/file-naming-convention-ten-rules-best-practice

• Folder Structures: http://www.damlearningcenter.com/resources/articles/best-practices-for-folder-organization/

• Metadata: http://www.dcc.ac.uk/resources/metadata-standards

• DataONE Primer: https://www.dataone.org/best-practices

• Software Carpentry: http://software-carpentry.org/

• Research Data Alliance: https://rd-alliance.org/

• Your Library: http://guides.lib.washington.edu/dmg

Tools

• Data Mgmt Planning:
  DMPTool https://dmptool.org/

• Metadata:
  Morpho https://www.dataone.org/software-tools/morpho
  NOAA MERMaid http://www.ncddc.noaa.gov/metadata-standards/mermaid/

• Workflows:
  Kepler https://kepler-project.org/
  Taverna http://www.taverna.org.uk/

• Sharing:
  re3data http://www.re3data.org/
  GitHub https://github.com/

• Miscellaneous:
  EZID http://ezid.cdlib.org/
  ImpactStory https://impactstory.org/
  ORCID http://orcid.org/

Any Other Questions? Stephanie Wright

Web data.blogspot.com

Twitter @UWLibsData

Email [email protected]