Lenhard Group Retreat - October 2015 Reproducible research in R Liz Ing-Simmons

Reproducible research (and literate programming) in R

Page 1: Reproducible research (and literate programming) in R

Lenhard Group Retreat - October 2015

Reproducible research in R

Liz Ing-Simmons

Page 2: Reproducible research (and literate programming) in R

Lenhard Group Retreat - October 2015

Reproducible research (and literate programming) in R

Liz Ing-Simmons

Page 3: Reproducible research (and literate programming) in R

The worst kind of collaborator

(This is good motivation for reproducibility, too)

Page 4: Reproducible research (and literate programming) in R

What is reproducibility?

Page 5: Reproducible research (and literate programming) in R

What is reproducibility?

• Replicable:
  – Results can be reproduced from an independent analysis (different lab, model system, software…)

• Reproducible:
  – Results can be reproduced using your code and data

• Both are important!
  – Making analysis reproducible means being explicit about what you’ve done, which makes it easier to replicate
  – and has other benefits (more on this later)

• Partial reproducibility is better than none

(The two terms are sometimes defined the other way round, depending on who you ask…)

Page 6: Reproducible research (and literate programming) in R

Reproducibility tools for R

• packrat
  – Manage and track dependencies for projects

• switchr
  – Switch between different package libraries

• knitr
  – Report generation from combined text and code

• R Markdown (rmarkdown package)
  – Simple formatting syntax for text and code blocks

You can use knitr with other languages too!
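As a rough sketch of the packrat workflow listed above, the helpers below wrap the three calls most projects need. The function names `start_tracking`, `record_versions`, and `rebuild_library` are made up for illustration, and nothing runs until you call them, because packrat changes the project directory.

```r
# Hypothetical wrappers around packrat; defining them has no side effects.

# Give the project its own private library and take a first snapshot.
start_tracking <- function(project = ".") {
  packrat::init(project = project, enter = FALSE)
}

# Record the package versions currently used by the project.
record_versions <- function(project = ".") {
  packrat::snapshot(project = project, prompt = FALSE)
}

# Reinstall the recorded versions, e.g. on a collaborator's machine.
rebuild_library <- function(project = ".") {
  packrat::restore(project = project, prompt = FALSE)
}
```

Calling `start_tracking()` once per project and `record_versions()` after installing or updating packages would be one way to use these.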

Page 7: Reproducible research (and literate programming) in R

Literate programming

• Documents that combine code, results, and documentation that tells you what the code is doing

• Encourages you to be explicit about what you’re trying to do
  – can make it easier to spot mistakes
  – better code: more readable, more understandable, more reusable

• Bonus: make pretty reports for your collaborators

• Some journals now encourage you to submit code as supplementary material

Page 8: Reproducible research (and literate programming) in R

Anatomy of an R Markdown document

YAML header: title, author, document options

Code block: enclosed in ```, with language and options specified

Text: including section headers and links
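To make the three parts concrete, here is a minimal sketch that assembles a tiny R Markdown document as a character vector in R. The title, the author placeholder, the chunk label `summary-cars`, and the link text are made up for illustration.

```r
fence <- strrep("`", 3)  # three backticks open and close a code block

rmd_lines <- c(
  # YAML header: title, author, document options
  "---",
  "title: \"Example analysis\"",
  "author: \"Your Name\"",
  "output: html_document",
  "---",
  "",
  # Text: section headers and links use Markdown syntax
  "# Introduction",
  "",
  "Built with [knitr](http://yihui.name/knitr).",
  "",
  # Code block: language (r), chunk label, and options inside the braces
  paste0(fence, "{r summary-cars, echo=TRUE}"),
  "summary(cars)",
  fence
)

cat(rmd_lines, sep = "\n")
```

Saving these lines with `writeLines(rmd_lines, "example.Rmd")` would give a file that can be knitted.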

Page 9: Reproducible research (and literate programming) in R

A sample .Rmd

• In RStudio, you can use the ‘Knit HTML’ button (or PDF)

• In an R session, use knitr::knit2html()
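A hedged sketch of knitting from an R session. The slide names `knitr::knit2html()`; this example instead assumes the rmarkdown package and pandoc are installed and calls `rmarkdown::render()`, which is what RStudio's Knit button runs for documents with a YAML header. The file name and contents are made up.

```r
fence <- strrep("`", 3)

# Write a throwaway .Rmd into the session's temporary directory.
rmd_file <- file.path(tempdir(), "demo.Rmd")
writeLines(c(
  "---",
  "title: \"Demo\"",
  "output: html_document",
  "---",
  "",
  paste0(fence, "{r}"),
  "1 + 1",
  fence
), rmd_file)

# Render to HTML next to the input file; the output path is returned.
html_file <- rmarkdown::render(rmd_file, quiet = TRUE)
file.exists(html_file)
```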

Page 10: Reproducible research (and literate programming) in R

Anatomy of an Rmd

Table of contents

‘Short’ or ‘long’ version – with code included or without

Controls printing of warnings/messages

Custom figure / cache paths

Stop on error!
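The annotations above map onto knitr chunk options, usually set once in a setup chunk near the top of the document. A sketch, assuming knitr is installed; the variable `show_code` and the directory prefixes are made up. The table of contents is the exception: it is requested in the YAML header (for example `toc: true` under `html_document`), not through `opts_chunk`.

```r
library(knitr)

show_code <- FALSE  # hypothetical switch: TRUE gives the 'long' version

opts_chunk$set(
  echo       = show_code,        # include code in the report or not
  warning    = FALSE,            # do not print warnings
  message    = FALSE,            # do not print messages
  fig.path   = "figures/demo-",  # custom prefix for figure files
  cache.path = "cache/demo-",    # custom prefix for cache files
  error      = FALSE             # stop knitting at the first error
)
```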

Page 11: Reproducible research (and literate programming) in R

Anatomy of an Rmd

Second-level header

Links

(you can use similar syntax to insert image files)

Load all packages (do not cache!)

Keep functions in one place

Page 12: Reproducible research (and literate programming) in R

Anatomy of an Rmd

Cache data loading / processing

Code formatting within text using backticks: `function()`

Control figure size for a specific chunk

It’s a good idea to name chunks – the chunk names are used to name the figure files
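A small sketch of a named chunk with its own figure size, knitted from a character vector so that it is self-contained. It assumes knitr is installed; the label `speed-dist` is made up, and the figure is written under the temporary directory to keep the working directory clean. Adding `cache=TRUE` to a chunk header works the same way as the options shown here.

```r
library(knitr)

fence <- strrep("`", 3)

# Figure files go to a temporary folder; forward slashes keep the path portable.
fig_prefix <- paste0(normalizePath(tempdir(), winslash = "/"), "/")

rmd <- c(
  "The chunk below uses `plot()` on the built-in `cars` data.",
  "",
  # chunk label first, then per-chunk options
  paste0(fence, "{r speed-dist, fig.width=4, fig.height=3, fig.path='",
         fig_prefix, "'}"),
  "plot(cars)",
  fence
)

md <- knit(text = rmd, quiet = TRUE)

# The chunk label is reused in the figure's file name.
list.files(tempdir(), pattern = "^speed-dist")
```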

Page 13: Reproducible research (and literate programming) in R

Anatomy of an Rmd

Code can be included for demonstration but not evaluated

Here the data is loaded from the package instead
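A sketch of a demonstration-only chunk followed by one that is actually run, assuming knitr is installed. The file name `my_data.csv` and both chunk labels are made up; because of `eval=FALSE` the missing file is never read, and the built-in `cars` data stands in for data shipped with a package.

```r
library(knitr)

fence <- strrep("`", 3)

rmd <- c(
  # shown in the report, never run
  paste0(fence, "{r load-from-file, eval=FALSE}"),
  "my_data <- read.csv(\"my_data.csv\")",
  fence,
  "",
  # run as usual: use a dataset that ships with R instead
  paste0(fence, "{r load-from-package}"),
  "my_data <- datasets::cars",
  "nrow(my_data)",
  fence
)

md <- knit(text = rmd, quiet = TRUE)
cat(md, sep = "\n")
```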

Page 14: Reproducible research (and literate programming) in R

Anatomy of an Rmd

Format tables with knitr::kable()

You can include citations from (e.g.) a BibTeX file in an Rmd!

(but it’s not worth it for only two references)

Include session info to track package versions used!

I also add the time the document was created

You can include evaluated R code in the text by using `r `
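The closing section of a report could look roughly like this in R, assuming knitr is installed. The caption text is made up, and the inline-code form appears only in a comment, since it belongs in the document text rather than in a chunk.

```r
library(knitr)

# Format a data frame as a Markdown table.
tab <- kable(head(cars, 3), caption = "First rows of the cars data")
print(tab)

# Record when the document was built. In the text of an .Rmd the same
# value can be inserted inline by writing `r Sys.time()`.
built_at <- format(Sys.time(), "%Y-%m-%d %H:%M:%S")
cat("Document created:", built_at, "\n")

# Record the R and package versions used for this run.
sessionInfo()
```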

Page 15: Reproducible research (and literate programming) in R

Other tips and tricks

• You can set multiple figure devices e.g. dev=c('pdf', 'png')

• Disable lazy loading for very large caches (cache.lazy = FALSE)

• ‘dependson’ can be used to set dependencies between chunks
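These three tips could be written as follows, assuming knitr is installed. The chunk labels `fit-model` and `load-data` are made up, and `dependson` is shown in a comment because it is set in an individual chunk header rather than globally.

```r
library(knitr)

opts_chunk$set(
  dev        = c("pdf", "png"),  # save every figure in both formats
  cache.lazy = FALSE             # load cached objects eagerly (very large caches)
)

# dependson is set per chunk, in the chunk header of the .Rmd, e.g.
#   {r fit-model, cache=TRUE, dependson="load-data"}
# so the cache of 'fit-model' is rebuilt when the cached chunk 'load-data' changes.
```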

Page 16: Reproducible research (and literate programming) in R

Other tips and tricks

• File paths:
  – Either relative to the Rmd location or set as a variable

• Consider directory structure
  – (e.g. nicercode.github.io/blog/2013-04-05-projects/)

• Use set.seed() if using any random numbers
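A minimal base R sketch of the path and seed tips. The folder name `data`, the file name `counts.csv`, and the seed value are made up for illustration.

```r
# Keep the data location in one variable; relative paths are resolved
# from the folder containing the .Rmd when the document is knitted.
data_dir <- "data"                                # hypothetical folder name
counts_file <- file.path(data_dir, "counts.csv")  # hypothetical file name

# Fix the random seed so that sampling gives the same result on every run.
set.seed(2015)
first_draw <- sample(1:100, 5)

set.seed(2015)
second_draw <- sample(1:100, 5)

identical(first_draw, second_draw)
```

Setting the seed once near the top of the document is usually enough; it is repeated here only to show that the two draws match.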

Page 17: Reproducible research (and literate programming) in R

Resources

yihui.name/knitr
Official site with examples and documentation (there’s also a knitr book)

kbroman.org/knitr_knutshell/
Really good knitr tutorial

kbroman.org/steps2rr/
Other reproducibility tips

rmarkdown.rstudio.com/
R Markdown info, including cheatsheets

Page 18: Reproducible research (and literate programming) in R

Other reproducibility tools

• Jupyter (formerly IPython Notebook):
  – Similar in concept to knitr but for interactive use (jupyter.org/)

• Make (and similar tools):
  – Automated building of project outputs

• Docker (Rocker):
  – Containers for code, like a lightweight virtual machine