
Page 1: Original slides licensed  under CC-BY as indicated

Software Sustainability Institute

www.software.ac.uk

Better Software, Better Research

Why reproducibility is important for your research

http://dx.doi.org/10.6084/m9.figshare.1126304

EMCSR14, St. Andrews, 5 August 2014

Neil Chue Hong (@npch), Software Sustainability Institute

ORCID: 0000-0002-8876-7606 | [email protected]

Original slides licensed under CC-BY as indicated

Supported by Project funding from

Page 2: Original slides licensed  under CC-BY as indicated


The Software Sustainability Institute

A national facility for cultivating world-class research through software

• Better software enables better research

• Software reaches boundaries in its development cycle that prevent improvement, growth and adoption

• Providing the expertise and services needed to negotiate to the next stage

• Developing the policy and tools to support the community developing and using research software

Supported by EPSRC Grant EP/H043160/1

Page 3: Original slides licensed  under CC-BY as indicated


Four Paradigms of Research

Empirical

Theoretical

Computational

Data Exploration

Page 4: Original slides licensed  under CC-BY as indicated

“Scientific publications have at least two goals: (i) to announce a result and (ii) to convince readers that the result is correct. Papers in experimental science should describe the results and provide a clear enough protocol to allow successful repetition and extension.”

Jill Mesirov, Accessible Reproducible Research, DOI: 10.1126/science.1179653

Page 5: Original slides licensed  under CC-BY as indicated


Raise standards for preclinical cancer research

47 out of 53 “landmark” publications could not be replicated

Begley, Ellis. Nature, 483, 2012, doi:10.1038/483531a

Page 6: Original slides licensed  under CC-BY as indicated


SIGMOD Reproducibility

• SIGMOD conference offered to attempt to repeat/reproduce papers accepted at conference 2008-2012

• “High burden on reviewers when setting up experiments”: use of VMs advocated

Bonnet et al, SIGMOD Record, June 2011 (Vol. 40, No. 2), doi: 10.1145/2034863.2034873

Page 7: Original slides licensed  under CC-BY as indicated


Water Swap Reaction Coordinate

A water-swap reaction coordinate for the calculation of absolute protein-ligand binding free energies
Woods CJ, Malaisree M, Hannongbua S, Mulholland AJ
J. Chem. Phys. (2011) vol. 134, pp. 054114
http://dx.doi.org/10.1063/1.3519057

Long Time Scale GPU Dynamics Reveal the Mechanism of Drug Resistance of the Dual Mutant I223R/H275Y Neuraminidase from H1N1-2009 Influenza Virus
Biochemistry (2012), vol. 51, pp. 4364-4375
http://dx.doi.org/10.1021/bi300561n

Page 8: Original slides licensed  under CC-BY as indicated

Computational science is hard to make truly re-***-ble (perhaps impossible?)

Page 9: Original slides licensed  under CC-BY as indicated

[Figure, centre label "test":]

repeat: same experiment, same lab

replicate: same experiment, different lab

reproduce: same experiment, different set up

reuse: different experiment, some of same

Figure by Carole Goble, adapted from Drummond C, Replicability is not Reproducibility: Nor is it Good Science (online), and Peng RD, Reproducible Research in Computational Science, Science, 2 Dec 2011: 1226-1227.

Page 10: Original slides licensed  under CC-BY as indicated

[Figure: research cycle with the stages Design, Execution, Result Analysis, Collection, Publish, Peer Review, Peer Reuse, Prediction]

Can I repeat & defend my method?

Can I review / reproduce and compare my results / method with your results / method?

Can I review / replicate and certify your method?

Can I transfer your results into my research and reuse this method?

Figure by Carole Goble adapted from: Mesirov, J. Accessible Reproducible Research Science DOI: 10.1126/science.1179653

Page 11: Original slides licensed  under CC-BY as indicated


Group Exercise 1

• Pick a respectable journal from your field (or use arXiv)

• Choose a research article from the journal: What makes you think that you could repeat it? What makes you think that you could extend it?

• What do you think makes a research article more or less reproducible?

Page 12: Original slides licensed  under CC-BY as indicated


Software Infrastructure and Environments for Reproducible and Extensible Research

1. Open licensing should be used for data and code

2. Workflow tracking should be carried out during the research process

3. Data must be available and accessible

4. Code and methods must be available and accessible

5. All 3rd party data and software should be cited

Stodden V and Miguez S, (2014) Best Practices for Computational Science: Software Infrastructure and Environments for Reproducible and Extensible Research, DOI: 10.5334/jors.ay
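
One way to make "data must be available and accessible" checkable is to publish a checksum manifest alongside the data, so that a reader can confirm the files they downloaded are the files the analysis used. The sketch below is only an illustration under that assumption; it is not taken from Stodden and Miguez, and the function names (`sha256_of`, `build_manifest`, `verify_manifest`) are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir):
    """Map each file under data_dir (by relative path) to its checksum."""
    root = Path(data_dir)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(data_dir, manifest):
    """Return the relative paths that were added, removed or changed."""
    current = build_manifest(data_dir)
    return sorted(
        name
        for name in set(manifest) | set(current)
        if manifest.get(name) != current.get(name)
    )
```

A manifest built at publication time can be stored with the paper's code; anyone re-running the work calls `verify_manifest` on their copy and gets an empty list when nothing differs.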

Page 13: Original slides licensed  under CC-BY as indicated


10 Simple Rules for Reproducible Computational Research

1. For Every Result, Keep Track of How It Was Produced
2. Avoid Manual Data Manipulation Steps
3. Archive the Exact Versions of All External Programs Used
4. Version Control All Custom Scripts
5. Record All Intermediate Results, When Possible in Standardized Formats
6. For Analyses That Include Randomness, Note Underlying Random Seeds
7. Always Store Raw Data behind Plots
8. Generate Hierarchical Analysis Output, Allowing Layers of Increasing Detail to Be Inspected
9. Connect Textual Statements to Underlying Results
10. Provide Public Access to Scripts, Runs, and Results

Sandve GK, Nekrutenko A, Taylor J, Hovig E (2013) Ten Simple Rules for Reproducible Computational Research. DOI:10.1371/journal.pcbi.1003285
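
Several of these rules can be followed with very little code. The sketch below is a minimal illustration, not code from Sandve et al.: it assumes a toy analysis (`run_analysis`, a hypothetical stand-in) and shows one way to keep a result together with how it was produced (rule 1), the interpreter and platform versions (rule 3) and the random seed (rule 6).

```python
import json
import platform
import random
import sys
from datetime import datetime, timezone


def run_analysis(seed):
    """Toy stand-in for a real analysis: the mean of 1000 pseudo-random draws."""
    rng = random.Random(seed)  # Rule 6: the seed is explicit, never implicit
    draws = [rng.random() for _ in range(1000)]
    return sum(draws) / len(draws)


def run_with_provenance(seed):
    """Return the result together with a record of how it was produced."""
    return {
        "result": run_analysis(seed),
        "seed": seed,                              # Rule 6: note the seed
        "python_version": sys.version.split()[0],  # Rule 3: exact versions
        "platform": platform.platform(),           # Rule 3: exact versions
        "command": sys.argv,                       # Rule 1: how it was run
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    # The seed value is arbitrary; what matters is that it is written down.
    record = run_with_provenance(seed=20140805)
    print(json.dumps(record, indent=2))
```

Running the script twice with the same seed gives the same `result` value, and the JSON record can be archived next to the output it describes.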

Page 14: Original slides licensed  under CC-BY as indicated


Group Exercise 2

• Now that you’ve seen what others think is important, go back to your chosen paper and decide:
  ◦ Whether you can identify the software and data used
  ◦ Whether you can download the software and data used
  ◦ Whether you could describe to someone else the steps needed to install, configure and run the software to do something similar to the experiment presented in the paper

• What are the challenges you face?

Page 15: Original slides licensed  under CC-BY as indicated


The reproducibility spectrum

Peng RD (2011) Reproducible Research in Computational Science. DOI:10.1126/science.1213847

Page 16: Original slides licensed  under CC-BY as indicated


The Ladder of Academic Software Reusability

Brown CT (2013) http://ivory.idyll.org/blog/ladder-of-academic-software-notsuck.html

Page 17: Original slides licensed  under CC-BY as indicated


5 Stars of Research Software

• Community: there is a community infrastructure

• Open: software has a permissive license

• Defined: accurate metadata for the software

• Extensible: usable, modifiable for my purpose

• Runnable: I can access and run the software

(The initials spell CODER.)

c.f. 5 Stars of Linked Data (Berners-Lee), 5 Stars of Online Journals (Shotton)

Image: “Golden Star”, originally by Ssolbergj, CC-BY

http://www.software.ac.uk/blog/2013-04-09-five-stars-research-software
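
The five criteria can be treated as a simple checklist when assessing a piece of software. The sketch below is only an illustration: the class and method names are hypothetical, and it assumes each criterion is worth one star independently of the others, which the slide does not state.

```python
from dataclasses import dataclass, fields


@dataclass
class SoftwareStars:
    """One flag per criterion on the slide; all names here are hypothetical."""
    community: bool = False   # there is a community infrastructure
    open: bool = False        # the software has a permissive license
    defined: bool = False     # accurate metadata exists for the software
    extensible: bool = False  # usable and modifiable for my purpose
    runnable: bool = False    # I can access and run the software

    def stars(self):
        """Count the criteria met (assumes one star per criterion)."""
        return sum(bool(getattr(self, f.name)) for f in fields(self))

    def missing(self):
        """List the criteria not yet met, in slide order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]
```

For example, `SoftwareStars(open=True, runnable=True).missing()` lists the three criteria still to be addressed.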

Page 18: Original slides licensed  under CC-BY as indicated


Group Exercise 3

• Now that you’ve looked at someone else’s paper, let’s think about your own work

• For the piece of research you’re currently doing:
  ◦ What objections might your supervisor/boss make if you said you wanted to make your research reproducible?
  ◦ What might your supervisor/boss misunderstand about reproducible research?
  ◦ What would be the biggest barriers to making your research reproducible?
  ◦ What would be your main motivation for making your research reproducible?

Page 19: Original slides licensed  under CC-BY as indicated

Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011/

Special Issue: Reproducible Research, Computing in Science and Engineering, July/August 2012, 14(4)

Howison and Herbsleb (2013) "Incentives and Integration In Scientific Software Production", CSCW 2013.

Page 20: Original slides licensed  under CC-BY as indicated


Five selfish reasons to make your research reproducible

1. It will make it easier to build up your own research group

2. You won’t be panicking so much about writing about your results near to that deadline

3. You are less likely to let mistakes get through to the published article

4. You’ll get more collaborators

5. It will make you more productive

Page 21: Original slides licensed  under CC-BY as indicated


Reproducibility isn’t about making other people’s lives easier, it’s about making you a more productive researcher… be selfish!

Page 22: Original slides licensed  under CC-BY as indicated


What you can do now

• Read the Best Practices for Scientific Computing http://dx.doi.org/10.1371/journal.pbio.1001745

• Make the code and data you use available through a repository, under version control
  http://software.ac.uk/resources/guides/choosing-repository-your-software-project
  http://www.software.ac.uk/blog/2013-09-30-top-tips-version-control

• Publish your software in a journal http://bit.ly/softwarejournals

• Ask for software and data if you’re reviewing a paper

• Forge a career in research, and change it for those coming behind you

• The DOI for this presentation: 10.6084/m9.figshare.1126304

• Acknowledgements: the SSI team, particularly Steve Crouch, Dave De Roure, Carole Goble, Mike Jackson; C Titus Brown; Dan Katz; Jennifer Schopf; Victoria Stodden; Arfon Smith; Greg Wilson; Robin Wilson.

• The Software Sustainability Institute is a collaboration between the universities of Edinburgh, Manchester, Oxford and Southampton. Supported by EPSRC Grant EP/H043160/1.

Page 23: Original slides licensed  under CC-BY as indicated


Reproducibility isn’t just a set of things to do

… it’s about instilling knowledge

http://bit.ly/datasharingpanda

Page 24: Original slides licensed  under CC-BY as indicated


Purposes

Achieve legal compliance

Create heritage value

Enable continued access to data

Encourage software reuse

Manage systems and services


http://www.software.ac.uk/attach/SoftwarePreservationBenefitsFramework.pdf

Page 25: Original slides licensed  under CC-BY as indicated


Approaches

Preservation (techno-centric)

Emulation (data-centric)

Migration (functionality-centric)

Transition (process-centric)

Hibernation (knowledge-centric)


Page 26: Original slides licensed  under CC-BY as indicated


Software Sustainability: preservation vs sustainability

Image courtesy of RBG Kew – not for reuse

Image courtesy of London Permaculture under CC-by-nc-sa license

Preservation?

Sustainability?

Page 27: Original slides licensed  under CC-BY as indicated


Software Preservation vs Software Sustainability

• There are several approaches we have identified that could be classed as “sustainability”

• The choice depends on a number of factors, which change through time

Procrastination

Deprecation

Hibernation

Cultivation

Migration

Emulation

Preservation

http://www.software.ac.uk/resources/approaches-software-sustainability