Software Sustainability Institute
www.software.ac.uk
Better Software, Better Research
Why reproducibility is important for your research
http://dx.doi.org/10.6084/m9.figshare.1126304
EMCSR14, St. Andrews, 5 August 2014
Neil Chue Hong (@npch), Software Sustainability Institute
ORCID: 0000-0002-8876-7606 | [email protected]
Original slides licensed under CC-BY as indicated
Supported by Project funding from
The Software Sustainability Institute
A national facility for cultivating world-class research through software
• Better software enables better research
• Software reaches boundaries in its development cycle that prevent improvement, growth and adoption
• Providing the expertise and services needed to negotiate to the next stage
• Developing the policy and tools to support the community developing and using research software
Supported by EPSRC Grant EP/H043160/1
Four Paradigms of Research
Empirical
Theoretical
Computational
Data Exploration
“Scientific publications have at least two goals: (i) to announce a result and (ii) to convince readers that the result is correct. Papers in experimental science should describe the results and provide a clear enough protocol to allow successful repetition and extension.”
Jill Mesirov, Accessible Reproducible Research, DOI: 10.1126/science.1179653
Raise standards for preclinical cancer research
47 out of 53 “landmark” publications could not be replicated
Begley, Ellis. Nature, 483, 2012. doi:10.1038/483531a
SIGMOD Reproducibility
• SIGMOD conference offered to attempt to repeat/reproduce papers accepted at conference 2008-2012
• “High burden on reviewers when setting up experiments”. Use of VMs advocated
Bonnet et al, SIGMOD Record, June 2011 (Vol. 40, No. 2), doi: 10.1145/2034863.2034873
Water Swap Reaction Coordinate
A water-swap reaction coordinate for the calculation of absolute protein-ligand binding free energies
Woods CJ, Malaisree M, Hannongbua S, Mulholland AJ
J. Chem. Phys. (2011) vol. 134, pp. 054114
http://dx.doi.org/10.1063/1.3519057
Long Time Scale GPU Dynamics Reveal the Mechanism of Drug Resistance of the Dual Mutant I223R/H275Y Neuraminidase from H1N1-2009 Influenza Virus
Biochemistry, (2012), vol. 51, pp 4364-4375
http://dx.doi.org/10.1021/bi300561n
Computational science is hard to make truly re-***-ble (perhaps impossible?)
[Figure labelled “test”, with four cases:]
repeat: same experiment, same lab
replicate: same experiment, different lab
reproduce: same experiment, different set up
reuse: different experiment, some of same
Figure by Carole Goble adapted from Drummond C, Replicability is not Reproducibility: Nor is it Good Science, online, and Peng RD, Reproducible Research in Computational Science, Science 2 Dec 2011: 1226-1227.
[Figure stages: Design, Execution, Result Analysis, Collection, Publish, Peer Review, Peer Reuse, Prediction]
Can I repeat & defend my method?
Can I review / reproduce and compare my results / method with your results / method?
Can I review / replicate and certify your method?
Can I transfer your results into my research and reuse this method?
Figure by Carole Goble adapted from: Mesirov, J. Accessible Reproducible Research, Science, DOI: 10.1126/science.1179653
Group Exercise 1
• Pick a respectable journal from your field (or use arXiv)
• Choose a research article from the journal:
  - What makes you think that you could repeat it?
  - What makes you think that you could extend it?
• What do you think makes a research article more or less reproducible?
Software Infrastructure and Environments for Reproducible and Extensible Research
1. Open licensing should be used for data and code
2. Workflow tracking should be carried out during the research process
3. Data must be available and accessible
4. Code and methods must be available and accessible
5. All 3rd party data and software should be cited
Stodden V and Miguez S, (2014) Best Practices for Computational Science: Software Infrastructure and Environments for Reproducible and Extensible Research, DOI: 10.5334/jors.ay
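As a minimal sketch of what points 2 and 5 might look like in practice, the snippet below records the interpreter and the versions of third-party packages alongside a piece of work, so that the software used can later be identified and cited. The function name `describe_environment` and the package names passed to it are illustrative assumptions and are not taken from the paper.

```python
import json
import platform
import sys
from importlib import metadata


def describe_environment(packages):
    """Return a dict recording the interpreter and the versions of the named packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed: record None so the gap is visible rather than silent.
            versions[name] = None
    return {
        "python": platform.python_version(),
        "platform": sys.platform,
        "packages": versions,
    }


if __name__ == "__main__":
    # Hypothetical package list; replace with whatever your analysis imports.
    env = describe_environment(["pip", "numpy"])
    print(json.dumps(env, indent=2, sort_keys=True))
```

Saving this record next to each set of results is one possible way to keep the software behind them identifiable later.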
10 Simple Rules for Reproducible Computational Research
1. For Every Result, Keep Track of How It Was Produced
2. Avoid Manual Data Manipulation Steps
3. Archive the Exact Versions of All External Programs Used
4. Version Control All Custom Scripts
5. Record All Intermediate Results, When Possible in Standardized Formats
6. For Analyses That Include Randomness, Note Underlying Random Seeds
7. Always Store Raw Data behind Plots
8. Generate Hierarchical Analysis Output, Allowing Layers of Increasing Detail to Be Inspected
9. Connect Textual Statements to Underlying Results
10. Provide Public Access to Scripts, Runs, and Results
Sandve GK, Nekrutenko A, Taylor J, Hovig E (2013) Ten Simple Rules for Reproducible Computational Research. DOI:10.1371/journal.pcbi.1003285
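A small sketch of how rules 1, 6 and 7 could be combined: the analysis takes its random seed as an explicit argument, and the result is stored together with the seed and a checksum of the raw data. The function names, the bootstrap analysis and the data values are made up for illustration and do not come from the paper.

```python
import hashlib
import json
import random


def run_analysis(raw_data, seed):
    """Bootstrap the mean of raw_data using an explicitly seeded generator."""
    rng = random.Random(seed)  # local generator, so the seed fully determines the draws
    means = []
    for _ in range(200):
        sample = [rng.choice(raw_data) for _ in raw_data]
        means.append(sum(sample) / len(sample))
    return sum(means) / len(means)


def provenance_record(raw_data, seed, result):
    """Bundle the seed, a checksum of the raw data and the result into one record."""
    digest = hashlib.sha256(json.dumps(raw_data).encode("utf-8")).hexdigest()
    return {"seed": seed, "raw_data_sha256": digest, "result": result}


if __name__ == "__main__":
    data = [2.1, 3.4, 1.9, 4.2, 2.8]  # made-up values for illustration
    seed = 20140805  # arbitrary; what matters is that it is written down
    record = provenance_record(data, seed, run_analysis(data, seed))
    print(json.dumps(record, indent=2))
```

Running the analysis twice with the same seed and data should give the same number, which is what makes the stored record useful for checking a result later.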
Group Exercise 2
• Now that you’ve seen what others think is important, go back to your chosen paper and decide:
  - Whether you can identify the software and data used
  - Whether you can download the software and data used
  - Whether you could describe to someone else the steps needed to install, configure and run the software to do something similar to the experiment presented in the paper
• What are the challenges you face?
The reproducibility spectrum
Peng RD (2011) Reproducible Research in Computational Science. DOI:10.1126/science.1213847
The Ladder of Academic Software Reusability
Brown CT (2013) http://ivory.idyll.org/blog/ladder-of-academic-software-notsuck.html
5 Stars of Research Software
• Community: There is a community infrastructure
• Open: Software has permissive license
• Defined: Accurate metadata for the software
• Extensible: Usable, modifiable for my purpose
• Runnable: I can access and run software
(CODER)
c.f. 5 Stars of Linked Data (Berners-Lee), 5 Stars of Online Journals (Shotton)
“Golden Star”, originally by Ssolbergj, CC-BY
http://www.software.ac.uk/blog/2013-04-09-five-stars-research-software
Group Exercise 3
• Now that you’ve looked at someone else’s paper let’s think about your own work
• For the piece of research you’re currently doing:
  - What objections might your supervisor/boss make if you said you wanted to make your research reproducible?
  - What might your supervisor/boss misunderstand about reproducible research?
  - What would be the biggest barriers to making your research reproducible?
  - What would be your main motivation for making your research reproducible?
Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011/
Special Issue: Reproducible Research, Computing in Science and Engineering, July/August 2012, 14(4)
Howison and Herbsleb (2013) "Incentives and Integration In Scientific Software Production", CSCW 2013.
Five selfish reasons to make your research reproducible
1. It will make it easier to build up your own research group
2. You won’t be panicking so much about writing about your results near to that deadline
3. You are less likely to let mistakes get through to the published article
4. You’ll get more collaborators
5. It will make you more productive
Reproducibility isn’t about making other people’s lives easier, it’s about making you a more productive researcher… be selfish!
What you can do now
• Read the Best Practices for Scientific Computing http://dx.doi.org/10.1371/journal.pbio.1001745
• Make the code and data you use available through a repository, under version control
  http://software.ac.uk/resources/guides/choosing-repository-your-software-project
  http://www.software.ac.uk/blog/2013-09-30-top-tips-version-control
• Publish your software in a journal http://bit.ly/softwarejournals
• Ask for software and data if you’re reviewing a paper
• Forge a career in research, and change it for those coming behind you
• The DOI for this presentation: 10.6084/m9.figshare.1126304
• Acknowledgements: the SSI team, particularly Steve Crouch, Dave De Roure, Carole Goble, Mike Jackson; C Titus Brown; Dan Katz; Jennifer Schopf; Victoria Stodden; Arfon Smith; Greg Wilson; Robin Wilson.
• The Software Sustainability Institute is a collaboration between the universities of Edinburgh, Manchester, Oxford and Southampton. Supported by EPSRC Grant EP/H043160/1.
Reproducibility isn’t just a set of things to do
… it’s about instilling knowledge
http://bit.ly/datasharingpanda
Purposes
Achieve legal compliance
Create heritage value
Enable continued access to data
Encourage software reuse
Manage systems and services
http://www.software.ac.uk/attach/SoftwarePreservationBenefitsFramework.pdf
Approaches
Preservation (techno-centric)
Emulation (data-centric)
Migration (functionality-centric)
Transition (process-centric)
Hibernation (knowledge-centric)
Software Sustainability: preservation vs sustainability
Image courtesy of RBG Kew – not for reuse
Image courtesy of London Permaculture under CC-by-nc-sa license
Preservation?
Sustainability?
Software Preservation vs Software Sustainability
• There are several approaches we have identified that could be classed as “sustainability”
• The choice depends on a number of factors, which change through time
Procrastination
Deprecation
Hibernation
Cultivation
Migration
Emulation
Preservation
http://www.software.ac.uk/resources/approaches-software-sustainability