Data Management
Stephanie Wright
University of [email protected]
SPATIAL / IsoCamp
June 2015
Tips & Tools
Who Am I?
• Computing Trainer
• Cruise Ship Lecturer (Love Boat)
• Library Merger Manager
• Atmospheric Sciences Librarian
• Assessment Librarian
• Data Services Coordinator
HTTP://GUIDES.LIB.WASHINGTON.EDU/SWRIGHT
Disclaimer: I am not a scientist. I am a librarian…
Disclaimer: I am not a scientist. More like this…
What Do I Do?
• Data Management Plans (DMPs)
• Courses
• Consultations
• Research Projects
• DataONE, RDA, eScience Institute
• Institutional Data Repository (DRUW)
Why?
THEN NOW
A Real Life Example
[Figure: "my spreadsheet", annotated with its problems]
• Many tables
• No headings
• Embedded figures
One More Example
https://www.youtube.com/watch?v=66oNv_DJuPc
Data Sharing and Management Snafu in 3 Short Acts
Why Does It Matter?
From Flickr by tomhilton
HTTP://WWW.SPARC.ARL.ORG/ISSUES/OPEN-DATA/DATA-SHARING-INITIATIVE/POLICIES
… “Federal agencies investing in research and development (more than $100 million in annual expenditures) must have clear and coordinated policies for increasing public access to research products.”
“The best thing to do with your data will be thought of by someone else.”
“We need open data because we don’t just want to use a car we want to poke around in the engine, see how it works and then rebuild it.”
~ Rufus Pollock, Founder and President of Open Knowledge Foundation (www.okfn.org)
From Flickr by cogdog
WICHERTS JM, BAKKER M, MOLENAAR D (2011) WILLINGNESS TO SHARE RESEARCH DATA IS RELATED TO THE STRENGTH OF THE EVIDENCE AND THE QUALITY OF REPORTING OF STATISTICAL RESULTS. PLOS ONE 6(11): E26828. DOI:10.1371/JOURNAL.PONE.0026828
How To Do It?
Data planning is more efficient than data forensics.
DATA MANAGEMENT PLANNING
• What will be collected
• Methods
• Standards
• Sharing/access
• Long-term storage
COLLECTING
• Keep raw data raw
• Use scripts to process data
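One way to keep raw data raw is to have a script read the raw file and write its results somewhere else, so the original is never edited by hand. The sketch below assumes a CSV input; the function name `process_raw`, the `_processed` suffix, and the cleaning rule (drop rows with an empty cell) are all hypothetical choices for illustration, not part of the original slides.

```python
import csv
from pathlib import Path


def process_raw(raw_path: Path, out_dir: Path) -> Path:
    """Write a cleaned copy of a raw CSV; the raw file is only ever read."""
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{raw_path.stem}_processed.csv"
    with raw_path.open(newline="") as src, out_path.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames or [])
        writer.writeheader()
        for row in reader:
            # Hypothetical cleaning rule: keep only rows where every cell
            # is a non-empty string (short or malformed rows are dropped).
            if all(isinstance(v, str) and v.strip() for v in row.values()):
                writer.writerow(row)
    return out_path
```

Because every change lives in the script, rerunning it on the untouched raw file reproduces the processed file exactly.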
ORGANIZING
• Machine readable
• Human readable
• Works well with default ordering
AVOID
• spaces
• punctuation
• special characters
• case sensitivity
20130503_DOEProject_DesignDocument_Smith_v2-01.docx
20130709_DOEProject_MasterData_Jones_v1-00.xlsx
20130825_DOEProject_Ex1Test1_Data_Gonzalez_v3-03.xlsx
20130825_DOEProject_Ex1Test1_Documentation_Gonzalez_v3-03.xlsx
20131002_DOEProject_Ex1Test2_Data_Gonzalez_v1-01.xlsx
20141023_DOEProject_ProjectMeetingNotes_Kramer_v1-00.docx
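The example names above appear to follow a pattern of date (YYYYMMDD), project, description, author, and version, joined by underscores. A minimal sketch of building and checking names in that pattern follows; the regular expression and the helper names `build_name` and `parse_name` are assumptions made for illustration, and your own convention may differ.

```python
import re
from datetime import date, datetime
from typing import Dict, Optional

# Assumed pattern: YYYYMMDD_Project_Description_Author_vMAJOR-MINOR.ext
NAME_RE = re.compile(
    r"^(?P<date>\d{8})_(?P<project>[A-Za-z0-9]+)_(?P<description>[A-Za-z0-9_]+)"
    r"_(?P<author>[A-Za-z0-9]+)_v(?P<major>\d+)-(?P<minor>\d{2})\.(?P<ext>[a-z0-9]+)$"
)


def _safe(part: str) -> str:
    """Strip spaces, punctuation, and special characters from one name part."""
    return re.sub(r"[^A-Za-z0-9]", "", part)


def build_name(day: date, project: str, description: str, author: str,
               major: int, minor: int, ext: str) -> str:
    """Assemble a file name that sorts chronologically by default."""
    return (f"{day:%Y%m%d}_{_safe(project)}_{_safe(description)}_{_safe(author)}"
            f"_v{major}-{minor:02d}.{ext.lower()}")


def parse_name(filename: str) -> Optional[Dict[str, str]]:
    """Return the parts of a conforming name, or None if it does not conform."""
    match = NAME_RE.match(filename)
    if match is None:
        return None
    try:
        datetime.strptime(match["date"], "%Y%m%d")  # reject impossible dates
    except ValueError:
        return None
    return match.groupdict()
```

For instance, `parse_name("my data (final).xlsx")` returns `None`, which makes it easy to flag files that drift from the convention.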
Eaffinis_nanaimo_2010_counts.xls
• Study organism: Eaffinis
• Site name: nanaimo
• Year: 2010
• What was measured: counts
YYYYMMDD
NOBLE, WILLIAM S. (2009) "A QUICK GUIDE TO ORGANIZING COMPUTATIONAL BIOLOGY PROJECTS." PLOS COMPUTATIONAL BIOLOGY. 5(7): DOI/10.1371/JOURNAL.PCBI.1000424
• Pick a method that works for you and stick to it
• DOCUMENT IT!
METADATA
• Who?
• What?
• Where?
• When?
• How?
• Why?
Digital context
• Name of the data set
• The name(s) of the data file(s) in the data set
• Date the data set was last modified
• Example data file records for each data type file
• Pertinent companion files
• List of related or ancillary data sets
• Software (including version number) used to prepare/read the data set
• Data processing that was performed
Personnel & stakeholders
• Who collected
• Who to contact with questions
• Funders
Scientific context
• Scientific reason why the data were collected
• What data were collected
• What instruments (including model & serial number) were used
• Environmental conditions during collection
• Temporal & spatial resolution
• Standards or calibrations used
Information about parameters
• How each was measured or produced
• Units of measure
• Format used in the data set
• Precision & accuracy if known
Information about data
• Definitions of codes used
• Quality assurance & control measures
• Known problems that limit data use (e.g. uncertainty, sampling problems)
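The checklists above can be captured in a small machine-readable record stored next to the data. The sketch below is one possible shape, not a metadata standard: the class `DatasetMetadata`, its field names, and the helper functions are all hypothetical, and only a subset of the checklist items is shown.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class DatasetMetadata:
    # Digital context
    name: str = ""
    files: List[str] = field(default_factory=list)
    last_modified: str = ""          # e.g. YYYYMMDD
    software: str = ""               # include the version number
    processing: str = ""
    # Personnel & stakeholders
    collected_by: str = ""
    contact: str = ""
    funders: List[str] = field(default_factory=list)
    # Scientific context
    purpose: str = ""
    instruments: str = ""            # include model & serial number
    # Information about parameters and data
    units: Dict[str, str] = field(default_factory=dict)
    codes: Dict[str, str] = field(default_factory=dict)
    known_problems: str = ""


def missing_fields(meta: DatasetMetadata) -> List[str]:
    """List the fields that are still empty, as a completeness check."""
    return [key for key, value in asdict(meta).items() if not value]


def to_json(meta: DatasetMetadata) -> str:
    """Serialize the record so it can be saved alongside the data files."""
    return json.dumps(asdict(meta), indent=2)
```

Running `missing_fields` before depositing a data set gives a quick list of what still needs documenting.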
WORKFLOW
Simple: Flow chart
• Inputs: Temperature data, Salinity data
• Data import into Excel
• Data in spreadsheet
• Quality control & data cleaning
• “Clean” T & S data
• Analysis: mean, SD
• Summary statistics
• Graph production
Simple: Commented script
Resulting output
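As an illustration of what a simple commented script might look like, here is a minimal sketch of a temperature and salinity workflow: quality control, then mean and SD. The readings, the plausible-range thresholds, and the function names are invented for the example and do not come from the slides.

```python
import statistics
from typing import Dict, List

# Hypothetical readings; -999.0 and 99.0 stand in for sensor error values.
temperature_c = [12.1, 12.4, 11.9, -999.0, 12.6]
salinity_psu = [30.2, 30.5, 29.9, 30.1, 99.0]


def quality_control(values: List[float], low: float, high: float) -> List[float]:
    """Step 1: keep only readings inside an assumed plausible range."""
    return [v for v in values if low <= v <= high]


def summarize(values: List[float]) -> Dict[str, float]:
    """Step 2: summary statistics (count, mean, sample standard deviation)."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
    }


# Step 3: run the workflow; the thresholds below are illustrative only.
clean_temperature = quality_control(temperature_c, low=-2.0, high=40.0)
clean_salinity = quality_control(salinity_psu, low=0.0, high=42.0)

for label, values in [("temperature", clean_temperature), ("salinity", clean_salinity)]:
    print(label, summarize(values))
```

The comments double as documentation: each step in the flow chart maps to one labeled step in the script.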
More Fancy: Kepler, Taverna
From Flickr by cogdog
BACKING UP: 3 places, 3 ways
From Flickr by lippo
From Flickr by see phar
Original
Near
Far
What software?
What hardware?
What personnel?
How often?
Set up reminders!
Test system
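One simple way to test a backup system is to compare checksums of the original file against its near and far copies. The sketch below uses SHA-256 from Python's standard library; the function names and the idea of passing explicit copy paths are assumptions for illustration, and real backup tools may offer their own verification.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large data files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backups(original: Path, copies: List[Path]) -> Dict[str, bool]:
    """Report, per copy, whether it exists and matches the original exactly."""
    expected = sha256_of(original)
    return {
        str(copy): copy.is_file() and sha256_of(copy) == expected
        for copy in copies
    }
```

A `False` entry in the result means that copy is missing or has diverged from the original and should be refreshed.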
SHARING
Repositories
• Institutional
• Disciplinary
• Journal
• re3data.org
Sustainable formats
• Open, non-proprietary
• Commonly used in your discipline
• Not encrypted or compressed
Review your DMP
• Did you do what you said you would?
Photo credit Michael Ham
How Do I Learn More?
• Funding Mandates
  http://chronicle.com/article/Where-Should-You-Keep-Your/231065/
  http://datapub.cdlib.org/2013/02/28/the-new-ostp-policy-what-it-means/
• File Naming Conventions
  http://www.exadox.com/en/articles/file-naming-convention-ten-rules-best-practice
• Folder Structures
  http://www.damlearningcenter.com/resources/articles/best-practices-for-folder-organization/
• Metadata
  http://www.dcc.ac.uk/resources/metadata-standards
• DataONE Primer
  https://www.dataone.org/best-practices
• Software Carpentry
  http://software-carpentry.org/
• Research Data Alliance
  https://rd-alliance.org/
• Your Library
  http://guides.lib.washington.edu/dmg
Tools
• Data Mgmt Planning
  DMPTool https://dmptool.org/
• Metadata
  Morpho https://www.dataone.org/software-tools/morpho
  NOAA MERMaid http://www.ncddc.noaa.gov/metadata-standards/mermaid/
• Workflows
  Kepler https://kepler-project.org/
  Taverna http://www.taverna.org.uk/
• Sharing
  re3data http://www.re3data.org/
  GitHub https://github.com/
• Miscellaneous
  EZID http://ezid.cdlib.org/
  ImpactStory https://impactstory.org/
  ORCID http://orcid.org/
Any Other Questions?
Stephanie Wright
Web data.blogspot.com
Twitter @UWLibsData
Email [email protected]