Lenhard Group Retreat - October 2015
Reproducible research (and literate programming) in R
Liz Ing-Simmons
The worst kind of collaborator
(This is good motivation for reproducibility, too)
What is reproducibility?
• Replicable:
  – Results can be reproduced from an independent analysis (different lab, model system, software…)
• Reproducible:
  – Results can be reproduced using your code and data
• Both are important!
  – Making analysis reproducible means being explicit about what you’ve done, which makes it easier to replicate
  – and has other benefits (more on this later)
• Partial reproducibility is better than none
(Or maybe the two definitions are the other way round, depending on who you ask…)
Reproducibility tools for R
• packrat
  – Manage and track dependencies for projects
• switchr
  – Switch between different package libraries
• knitr
  – Report generation from combined text and code
• R Markdown (rmarkdown package)
  – Simple formatting syntax for text and code blocks
You can use knitr with other languages too!
Literate programming
• Documents that combine code, results, and documentation that tells you what the code is doing
• Encourages you to be explicit about what you’re trying to do
  – can make it easier to spot mistakes
  – better code: more readable, more understandable, more reusable
• Bonus: make pretty reports for your collaborators
• Some journals now encourage you to submit code as supplementary material
Anatomy of an Rmarkdown document
YAML header: Title, author, document options
Code block: Enclosed in ```, language and options specified
Text: Including section headers and links
A sample .Rmd
• In RStudio, you can use the ‘Knit HTML’ button (or the PDF option)
• In an R session, use knitr::knit2html()
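The three parts described above (YAML header, text, code block) can be put together in a minimal sketch. The file name, title, author and chunk name below are placeholders, not taken from the slides. `knitr::knit()` is used here because it only needs knitr itself and produces Markdown; `knit2html()` or the RStudio button would go on to produce HTML.

```r
library(knitr)

# Three backticks, built programmatically so this example stays readable
fence <- strrep("`", 3)

# A tiny R Markdown document: YAML header, text, and one named code chunk
rmd_lines <- c(
  "---",
  "title: \"Example report\"",      # placeholder title
  "author: \"A. N. Author\"",       # placeholder author
  "---",
  "",
  "## A section header",
  "",
  "Some text with a [link](http://yihui.name/knitr).",
  "",
  paste0(fence, "{r example-chunk, echo=TRUE}"),
  "x <- 1:10",
  "mean(x)",
  fence
)

# Write the document to a temporary directory and knit it to Markdown
rmd_file <- file.path(tempdir(), "example.Rmd")
writeLines(rmd_lines, rmd_file)
md_file <- knit(rmd_file, output = file.path(tempdir(), "example.md"), quiet = TRUE)

# The knitted file contains the text, the code, and its result
md_text <- readLines(md_file)
cat(md_text, sep = "\n")
```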
Anatomy of an Rmd
Table of contents
‘Short’ or ‘long’ version: with code included or without
Controls printing of warnings/messages
Custom figure / cache paths
Stop on error!
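As a rough illustration of the options listed above, a setup chunk near the top of the Rmd might look like the sketch below. The `show_code` switch and the `figures/` and `cache/` prefixes are hypothetical names chosen for this example; the original slide's exact settings are not recoverable from the text.

```r
library(knitr)

# Hypothetical switch: TRUE gives the 'long' version with code shown,
# FALSE gives the 'short' version with code hidden
show_code <- TRUE

opts_chunk$set(
  echo       = show_code,          # include or hide code in the output
  warning    = FALSE,              # do not print warnings
  message    = FALSE,              # do not print messages
  fig.path   = "figures/example-", # custom prefix for figure files
  cache.path = "cache/example-",   # custom prefix for cache files
  error      = FALSE               # stop knitting when a chunk throws an error
)

# The table of contents is a document option rather than a chunk option;
# with rmarkdown it is typically requested in the YAML header
# (toc: true under html_document).
```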
Anatomy of an Rmd
Second-level header
Links
(you can use similar syntax to insert image files)
Load all packages (do not cache!)
Keep functions in one place
Anatomy of an Rmd
Cache data loading / processing
Code formatting within text using backticks: `function()`
Control figure size for a specific chunk
It’s a good idea to name chunks: the chunk names are used to name the figure files
Anatomy of an Rmd
Code can be included for demonstration but not evaluated
Here the data is loaded from the package instead
Anatomy of an Rmd
Format tables with knitr::kable()
You can include citations from (e.g.) a BibTeX file in an Rmd!
(but it’s not worth it for two)
Include session info to track package versions used!
I also add the time the document was created
You can include evaluated R code in the text by using `r `
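A small sketch of the table, timestamp and session-info steps mentioned above. The built-in `iris` data set is a stand-in for whatever data the original document used, and the variable names are illustrative.

```r
library(knitr)

# Format a small table for the report (iris is only a stand-in data set)
tab <- kable(head(iris, 3), format = "markdown")
print(tab)

# Record when the document was created
built_at <- format(Sys.time(), "%Y-%m-%d %H:%M:%S")

# In the Rmd text, an inline expression such as `r built_at`
# would insert this value into the sentence around it.

# Record R and package versions, usually in the last chunk of the document
info <- sessionInfo()
print(info)
```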
Other tips and tricks
• You can set multiple figure devices, e.g. dev = c('pdf', 'png')
• Disable lazy loading for very large caches (cache.lazy = FALSE)
• ‘dependson’ can be used to set dependencies between chunks
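The first two tips can be set once for the whole document, for example as in this sketch; the chunk names in the comment are hypothetical.

```r
library(knitr)

opts_chunk$set(
  dev        = c("pdf", "png"), # save every figure as both PDF and PNG
  cache.lazy = FALSE            # avoid lazy loading, which can fail for very large caches
)

# 'dependson' is normally set per chunk, in the chunk header, e.g.
#   {r plot-data, cache=TRUE, dependson="load-data"}
# so that the cache of plot-data is invalidated when load-data changes.
```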
Other tips and tricks
• File paths:
  – Either relative to the Rmd location or set as a variable
• Consider directory structure
  – (e.g. nicercode.github.io/blog/2013-04-05-projects/)
• Use set.seed() if using any random numbers
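A short sketch of both tips, using only base R. The directory and file names are made up for illustration, and the seed value is arbitrary.

```r
# Keep the data location in one variable so it only needs changing in one place
data_dir    <- "data"                             # hypothetical directory
counts_file <- file.path(data_dir, "counts.csv")  # hypothetical file name

# Setting the seed makes random draws repeatable between runs
set.seed(2015)
first_draw <- rnorm(5)

set.seed(2015)
second_draw <- rnorm(5)

identical(first_draw, second_draw)  # TRUE: same seed, same numbers
```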
Resources
yihui.name/knitr
  Official site with examples and documentation (there’s also a knitr book)
kbroman.org/knitr_knutshell/
  Really good knitr tutorial
kbroman.org/steps2rr/
  Other reproducibility tips
rmarkdown.rstudio.com/
  R Markdown info including cheatsheets
Other reproducibility tools
• Jupyter (formerly IPython Notebook):
  – Similar in concept to knitr but for interactive use (jupyter.org/)
• Make (and similar tools):
  – Automated building of project outputs
• Docker (Rocker):
  – Containers for code, like a lightweight virtual machine