Reproducible research: First steps


Richard Layton, May 6, 2015

First steps towards reproducible research

Credibility turns on the success or failure of attempts to reproduce findings.

Kenneth Rogoff & Carmen Reinhart

In economic models

• coding errors

• selective exclusion of available data

• unconventional weighting of summary statistics

Thomas Herndon, Michael Ash, & Robert Pollin (2013). Does high public debt consistently stifle economic growth? A critique of Reinhart and Rogoff. Working Paper Series 322. Political Economy Research Institute, UMass Amherst.


Jason deBruyn (Jan 23, 2015) Trial involving disgraced scientist and bunk Duke research to begin Monday. Triangle Business Journal.

Anil Potti

In cancer therapy models

• data falsification

• retracted journal articles

• terminated clinical trials

• civil suit by patients


1000 years of temperature variation: the “hockey stick” graph by Michael Mann

In climate science models

• flawed research methods

• evasion of FOIA requests

• leaked emails

• media hype

Fred Pearce (Feb 9, 2010). Climate change debate overheated after sceptic grasped 'hockey stick'. The Guardian.

“Computational science today faces a credibility crisis.”

Victoria Stodden, UIUC

Without access to the code and data that underlie scientific discoveries, published findings are all but impossible to verify.

What can reproducible research do for you?

Your closest collaborator is you six months ago, but you don't reply to emails.

Paul Wilson, Engineering Physics, UW–Madison

This workflow is probably familiar.

If you do anything “by hand” once, you’ll do it 100 times.

Karl Broman, Biostatistics & Medical Informatics, UW–Madison

Principle 1.

Blend computing, results, and narrative.

Open a script, write content, and embed the code that creates output:

Some narrative.

<<>>=
hist(co2)
@

Discuss result.

More narrative.

Render the text and code outputs into a report:

Report title

Introduction. Some narrative.

[histogram produced by hist(co2)]

Discuss result. More narrative.


Changes in the script? Render a new report.
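A minimal, runnable version of this cycle can be sketched with base R's `utils::Sweave()`, which reads the same `<<>>=` and `@` chunk delimiters. The helper `render_report()`, the file name `report.Rnw`, and the chunk label `hist` are hypothetical. The sketch writes the script, runs its code chunk, and leaves `report.tex` plus the figure `report-hist.pdf` in a temporary folder; turning the `.tex` file into a PDF still needs a LaTeX installation.

```r
# Sketch only: render_report(), report.Rnw, and the chunk label "hist"
# are hypothetical names chosen for this example.
render_report <- function(dir = tempfile("report-")) {
  dir.create(dir, showWarnings = FALSE)
  old <- setwd(dir)        # Sweave writes its output files to the working dir
  on.exit(setwd(old))

  # The script: narrative and code in one file
  rnw <- c(
    "\\documentclass{article}",
    "\\begin{document}",
    "Some narrative.",
    "",
    "<<hist, fig=TRUE, echo=FALSE>>=",
    "hist(co2)",
    "@",
    "",
    "Discuss result.",
    "\\end{document}"
  )
  writeLines(rnw, "report.Rnw")

  # Run the chunk, save the figure, and splice the results into report.tex
  utils::Sweave("report.Rnw", quiet = TRUE)
  file.path(dir, c("report.tex", "report-hist.pdf"))
}

out <- render_report()
```

Editing `rnw` and calling `render_report()` again regenerates the text and the figure together, so no result is ever pasted into the report by hand.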

[Figure: an example .Rnw script and the report rendered from it.]

The same report in Markdown: an .Rmd file, rendered the same way.

Edit the output option in the .Rmd header and render again. Nothing else in the file changes, and the result is the same report in a different output format.
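A small sketch of that edit, assuming a hypothetical helper `write_rmd()`: it writes the same .Rmd body twice with different `output:` values in the YAML header, then checks that only that one line differs. `html_document` and `word_document` are standard rmarkdown output formats; the file names and content are illustrative.

```r
# Sketch only: write_rmd() and the temporary file names are hypothetical.
# The YAML header's output field picks the format; the body never changes.
write_rmd <- function(path, output = "html_document") {
  header <- c("---",
              "title: \"Report title\"",
              paste0("output: ", output),
              "---")
  body <- c("",
            "Some narrative.",
            "",
            "```{r}",
            "hist(co2)",
            "```",
            "",
            "Discuss result.")
  writeLines(c(header, body), path)
  invisible(path)
}

html_rmd <- write_rmd(tempfile(fileext = ".Rmd"), "html_document")
word_rmd <- write_rmd(tempfile(fileext = ".Rmd"), "word_document")

# Compare the two files line by line: only the output option differs
changed <- which(readLines(html_rmd) != readLines(word_rmd))
changed  # 3
```

Each file can then be rendered with `rmarkdown::render(path)`, which needs the rmarkdown package and pandoc. The format can also be chosen at render time without editing the file, for example `rmarkdown::render("report.Rmd", output_format = "word_document")`.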

Principle 2. Organize for reproducibility from the beginning.

1. Everything is a script

2. Every script is connected

3. File management is planned
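The three rules above can be sketched in base R. Each function below stands in for one script file (a hypothetical "Data" script that gathers and wrangles data, and a hypothetical "Design" script that analyses it and draws a graph), the folder layout is created before any step runs, and every step reads only the file the previous step wrote. Folder, file, and function names are assumptions, and the built-in `co2` data replace a `read(xlsx)` step so the sketch needs no extra packages.

```r
# Sketch only: folder names, file names, and function names are hypothetical.
# Each function stands in for one script file in a real project.

make_project <- function(root = tempfile("project-")) {
  # Planned layout: data/ for generated data, results/ for figures
  for (d in c("data", "results")) {
    dir.create(file.path(root, d), recursive = TRUE, showWarnings = FALSE)
  }
  root
}

# "Data" script: gather raw monthly data, wrangle it, write a CSV
data_script <- function(root) {
  raw <- data.frame(time = as.numeric(time(co2)), ppm = as.numeric(co2))
  raw$year <- floor(raw$time)
  write.csv(raw[, c("year", "ppm")], file.path(root, "data", "co2.csv"),
            row.names = FALSE)
}

# "Design" script: read the CSV, compute annual means, write a PDF graph
design_script <- function(root) {
  monthly <- read.csv(file.path(root, "data", "co2.csv"))
  annual <- aggregate(ppm ~ year, data = monthly, FUN = mean)
  pdf(file.path(root, "results", "co2-annual.pdf"))
  on.exit(dev.off())
  plot(ppm ~ year, data = annual, type = "b")
  annual
}

# Driver: runs the scripts in order; each reads what the previous one wrote
run_all <- function(root = make_project()) {
  data_script(root)
  annual <- design_script(root)
  list(root = root, annual = annual)
}

res <- run_all()
nrow(res$annual)  # 39 annual means, 1959 to 1997
```

In a real project each function body would live in its own `.R` file, and the report's code chunks would call `source()` on those files, so re-rendering the report re-runs the whole chain.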

[Figure: three connected files. A Data script (# gather data: read(xlsx); # wrangle data: write(csv)). A Design script (# analysis: read(csv); # create graph: write(PDF), write(PNG)). A Report .Rmd that blends narrative with code chunks calling source(gather), source(design), and include(graph). Rendering the .Rmd yields a reproducible report, in contrast to non-reproducible documents.]

Your future self thanks you.

Summary: two principles.

• Organize for reproducibility from the beginning.

• Explicitly link computing, results, and narrative.

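To learn more:

Victoria Stodden, Friedrich Leisch, & Roger D. Peng (Eds.) (2014). Implementing Reproducible Research.

Christopher Gandrud (2015). Reproducible Research with R and RStudio.

Yihui Xie (2013). Dynamic Documents with R and knitr.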

One Script to rule them all,

One Script to find them,

One Script to bring them all

And in the Markdown bind them.

Image credits

1. Image of Reinhart and Rogoff, reprinted under Creative Commons license, courtesy of The Commentator, http://www.thecommentator.com/privacy_policy.

2. Image of Anil Potti, from WPDE.com, http://www.carolinalive.com/ © 2015 Sinclair Communications, LLC.

3. “Hockey stick” graph from Mann, Bradley, & Hughes, Nature, 1998. Reprinted from The Guardian, © 2015 Guardian News and Media Limited, http://www.theguardian.com/environment/2010/feb/02/hockey-stick-graph-climate-change.

4. Image of Victoria Stodden, from YouTube, speaking on "Reproducible Research: A Digital Curation Agenda" at the 7th International Digital Curation Conference, University of Bath, Bristol, UK, Dec 6, 2011. Creative Commons attribution license.

5. Bing images for the MATLAB logo, Microsoft Word, Excel, & PowerPoint, and for Adobe PDF are reprinted under Creative Commons license.

6. Other unattributed clipart courtesy of https://openclipart.org/, used with permission.