ann-loraine
Slides from Workshop 2 of the Workshops in Next-Generation Science, held at the UNC Charlotte City Center Campus in May 2014
Workshops in Next-Generation Science at UNC Charlotte 2014
Workshop 2 - R, RStudio, & reproducible research with knitr
R, RStudio, & reproducible research with knitr
wings 2014
No programming experience necessary
"we wanted users to be able to begin in an interactive environment, where they did not consciously think of themselves as programming. Then as their needs became clearer and their sophistication increased, they should be able to slide gradually into programming..." John Chambers, Stages in the Evolution of S
Why use R?
• Free & open source
• Has a lot of support
  – Popular in many domains (finance, business analytics, statistics, biology)
• Many libraries available for biological data analysis through the Bioconductor project
  – Such as edgeR (used today)
• Now has an easy-to-use, free user interface called RStudio
RStudio
• A very nice graphical user interface for R
• It's free!
• Integrates well with knitr
  – a tool for writing statistical reports with R Markdown
R Markdown ".Rmd"
• Lets you write a report that combines results and commands
• Sounds weird, but once you get used to it, it's very powerful
• Catch mistakes before publication
  – Ask a friend to run & review your data analysis
knitr & R Markdown enable literate programming
• A way to do "literate programming" – Developed by Donald Knuth, Stanford Computer Science professor
• Literate programming: Write programs that explain what they are doing while they are doing it.
• Practical application: Data analysis reports
Plan for Today
• Introduce R and RStudio
  – Part 1: Functions & plots
  – Part 2: Markdown
  – Part 3: See how statistical testing works in R
• Differential expression analysis walk-through (may extend into Workshop 3)
• Goal: Get you started!
  – Lots of Web resources for further study
Let's get started!
Start RStudio
• RStudio has panes
  – with minimize and maximize buttons (top right)
• Panes have tabs
[Screenshot: the Console, where you type commands, and the Environment tab, which shows variables you've defined]
Make new project (Part 1)
• Select File > Project > New Project...
• Choose New Directory
Make new project (Part 2)
• Choose Empty Project
Make new project (Part 3)
• Choose Empty Project
• Enter "wings2014"
• Click Create Project
Project name in upper right corner
• Open folder wings2014
• See wings2014.Rproj file
• Tip: After you quit, double-click wings2014.Rproj to start RStudio with the correct directory settings
Enter commands in Console
> symbol is the prompt
• Type a command or expression at the prompt, then press ENTER
• R evaluates what you type and prints the result
• R then returns the prompt
Practice: Try arithmetic expressions
• Add +
• Subtract -
• Multiply *
• Raise to a power **
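A few expressions you could try at the prompt are sketched below. The numbers are arbitrary; the comments show what R prints. R's usual power operator is `^`; the parser translates `**` into `^`, so both power lines give the same result.

```r
# Arithmetic at the R prompt; each line prints its result
2 + 3     # [1] 5
10 - 4    # [1] 6
6 * 7     # [1] 42
2 ** 10   # [1] 1024
2 ^ 10    # [1] 1024  (same thing: ** is translated into ^)
```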
• Expressions return values as one-element vectors.
• [1] indicates that the value next to it has this index.
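To see the one-element-vector point in action, the lines below (our own example) ask R for the length of a single result and then print a longer vector. How many values fit on each printed line depends on the console width, so the indices shown at the start of later lines will vary.

```r
length(2 + 3)      # [1] 1     (the result is a vector with one element)
is.vector(2 + 3)   # [1] TRUE
1:30               # a 30-element vector; each printed line starts with the index of its first value
```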
Practice: Save results to variables
• Use '=' to assign result to a variable – Nothing printed
• Type variable name to see what's in it
• Use variables in expressions
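A minimal sketch of saving and reusing results; the variable names `total`, `doubled`, and `also_total` are only illustrative. Many R users write `<-` instead of `=` for assignment; for simple assignments at the prompt like these, the two behave the same way.

```r
total = 2 + 3        # assign with '='; nothing is printed
total                # type the name to see its value: [1] 5
doubled = total * 2  # use a variable in another expression
doubled              # [1] 10
also_total <- 2 + 3  # '<-' is the other common assignment operator
```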
Variables refer to objects
• Environment tab shows objects created thus far
• Most of what you do in R involves manipulating objects saved to variable names
  – Use objects as inputs to functions
R functions
• R has many functions
  – math
  – plotting
  – statistical tests
• Functions take inputs called arguments
• Most functions have many possible arguments
  – Usually have reasonable defaults
How to use a function in 4 steps
1. Type the function name
2. Type "(" (open parenthesis)
   – RStudio types the closing parenthesis for you
3. Type the arguments; if there is more than one, separate them with "," (comma)
4. Press ENTER
sqrt calculates the square root of its argument
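Following the four steps with `sqrt`: the function name, an open parenthesis, the argument, a closing parenthesis, then ENTER. The values are our own examples; the last two lines show that the argument can also be a variable or a whole vector.

```r
sqrt(16)            # [1] 4
sqrt(2)             # [1] 1.414214
area = 81
sqrt(area)          # [1] 9      (the argument can be a variable)
sqrt(c(1, 4, 9))    # [1] 1 2 3  (sqrt works element by element on a vector)
```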
Practice: rnorm function
• rnorm creates a vector of numbers randomly sampled from a normal distribution with a specified mean and standard deviation
rnorm(10,5,5)
– rnorm is the function name
– the arguments are the sample size (10), the mean (5), and the standard deviation (5)
Practice: rnorm function
• Mean and standard deviation are optional
• If you don't specify them, they default to:
  – 0 for the mean
  – 1 for the sd
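The calls below sketch the rnorm slides: first with all three arguments given by position (sample size, mean, standard deviation), then relying on the defaults. The output is random and changes on every call. `set.seed()` is not covered in the slides, but it is the standard way to make random draws repeatable; the seed value 42 is arbitrary.

```r
# 10 draws from a normal distribution with mean 5 and sd 5
rnorm(10, 5, 5)

# leaving out mean and sd uses the defaults, mean = 0 and sd = 1
rnorm(10)

# with the same seed, the defaults give exactly the same draws as mean 0, sd 1
set.seed(42)
a = rnorm(10)
set.seed(42)
b = rnorm(10, 0, 1)
identical(a, b)   # [1] TRUE
```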
R tip!
• Use the UP arrow key to retrieve the previous command
  – Saves typing
Practice: R allows named arguments
Order can vary
rnorm(10,mean=5,sd=2)
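A small check that argument order does not matter once arguments are named. Resetting the seed (an arbitrary value, 1) before each call makes the three draws directly comparable.

```r
set.seed(1)
x1 = rnorm(10, mean = 5, sd = 2)
set.seed(1)
x2 = rnorm(10, sd = 2, mean = 5)   # named arguments in a different order
set.seed(1)
x3 = rnorm(10, 5, 2)               # positional: n, mean, sd
identical(x1, x2)   # [1] TRUE
identical(x1, x3)   # [1] TRUE
```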
• Type help(rnorm) to list arguments, defaults
• help is a function
  – it takes the name of another function as its argument
help shows how to use a func1on
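Besides `help(rnorm)`, R offers the `?` shorthand, and `args()` prints only a function's arguments and their defaults, which can be quicker when you just need a reminder.

```r
help(rnorm)   # opens the rnorm help page (in RStudio, in the Help tab)
?rnorm        # shorthand for help(rnorm)
args(rnorm)   # shows just the argument list: function (n, mean = 0, sd = 1)
```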
Now you know how to...
• Calculate values & see the result
• Save output to variables
• Use the Environment tab to view variables
• Use R functions
Next: plotting!!!
R plotting functions
• Many options
  – generic x-y plots, scatter plots
  – barplots
  – dendrograms
  – histograms
  ... and much more
• Highly configurable!
  – log or linear scale axes
  – different characters or colors for points
  ... and much more
Practice: Generic x-y plot (scatter plot)
• The named argument main sets the plot title
• Note: Enclose text in quotes
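A minimal scatter plot sketch. The data are simulated with `rnorm`; the seed, sample size, and title are our own choices.

```r
set.seed(2)
x = rnorm(50)
y = x + rnorm(50, sd = 0.5)                 # y loosely follows x
plot(x, y, main = "My first scatter plot")  # main sets the title (text in quotes)
```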
Practice: Try other options
• col - color of points (in quotes)
• pch - point character
  – a numeric code
  – or a letter (in quotes)
... and many more
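The same simulated data drawn twice with different options; the colors and symbols are arbitrary examples. In R, `pch = 19` is a solid circle, and a quoted character such as `"+"` is drawn as the symbol itself.

```r
set.seed(3)
x = rnorm(50)
y = rnorm(50)
plot(x, y, col = "red", pch = 19)    # solid red circles (numeric pch code)
plot(x, y, col = "blue", pch = "+")  # a quoted character used as the point symbol
```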
Practice: Histogram (hist)
• main - plot title (in quotes)
• col - color of bars (in quotes)
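A histogram sketch with a title and bar color; the sample size, seed, and color name are illustrative. `hist()` also returns the bin counts (invisibly), so saving its result lets you check that every value landed in a bar.

```r
set.seed(4)
values = rnorm(1000, mean = 5, sd = 2)
h = hist(values, main = "1000 draws from a normal distribution", col = "lightblue")
sum(h$counts)   # [1] 1000
```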
Practice: Adding to a plot (1)
• abline - "a b line": adds a straight line
• Arguments:
  – v or h for the location of a vertical or horizontal line
  – a and b for the y intercept and slope
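A sketch of the three ways to call `abline`, on simulated data whose intercept and slope (1 and 2) we chose so that the red line should run through the cloud of points.

```r
set.seed(5)
x = rnorm(50)
y = 2 * x + 1 + rnorm(50, sd = 0.5)
plot(x, y, main = "Adding lines with abline")
abline(v = 0, col = "gray")        # vertical line at x = 0
abline(h = 0, col = "gray")        # horizontal line at y = 0
abline(a = 1, b = 2, col = "red")  # y intercept a = 1, slope b = 2
```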
Practice: Adding to a plot (2)
• points - adds points to a plot
• Arguments:
  – x, y: x & y values for the points
  – other options, same as for plot!
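Adding a second group of points to an existing plot. Points outside the current axis limits are not drawn, so this sketch sets `xlim` and `ylim` in the first `plot` call to make room for both groups; the shifted second group is made-up data.

```r
set.seed(6)
x = rnorm(30)
y = rnorm(30)
# make the axes wide enough for both groups of points
plot(x, y, pch = 19, main = "Adding points",
     xlim = range(x, x + 1), ylim = range(y, y + 1))
# add a shifted second group; points() takes the same col and pch options as plot()
points(x + 1, y + 1, pch = 17, col = "red")
```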
Take-home: In R you can "script" a plot
• Using plotting commands like points, abline, and lines, you can add more data to a plot, element by element
• Most plotting commands accept the same options, like
  – pch - point character
  – col - color
• Learning one plotting command helps you learn many.
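Putting the take-home message together, here is one plot assembled step by step with `plot`, `lines`, `points`, and `abline`, reusing `pch` and `col` throughout. The two series are arbitrary formulas chosen only for illustration.

```r
x = 1:10
y = x^2
plot(x, y, pch = 19, col = "blue", main = "A plot built element by element")
lines(x, y, col = "blue")                     # connect the first series
points(x, 2 * x + 20, pch = 17, col = "red")  # add a second series...
lines(x, 2 * x + 20, col = "red")             # ...and connect it too
abline(h = 50, lty = 2)                       # dashed horizontal reference line
```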
Practice: Graphics demo
• Enter demo(graphics)
• Press ENTER to see the next plot
Part 2 - R Markdown
How to install knitr
• Go to Packages tab
• Not checked?
  – Check it
• Not installed?
  – Select Tools > Install Packages...
  – Enter knitr
  – Click Install
• May need to restart RStudio
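The Packages-tab steps also have console equivalents. This sketch only checks whether knitr is installed and attaches it if it is; the install command is left in a message, since installing needs an internet connection.

```r
# requireNamespace() returns TRUE or FALSE without attaching the package
has_knitr = requireNamespace("knitr", quietly = TRUE)
if (has_knitr) {
  library(knitr)   # attach knitr: same effect as ticking its box in the Packages tab
} else {
  message("knitr is not installed; run install.packages(\"knitr\") to install it")
}
```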
Setup - to enable better coding!
Go to Tools > Global Preferences > Panes
• Top right: Console
• Lower right: Environment, History, Files, Plots, Help
• Top left: Source
• Lower left: everything else
Practice: Make R Markdown file
• Click the "new" file icon
• Choose R Markdown
  – Creates an example R Markdown document
• Take a moment to scan the document
R Markdown has plain text with formatting instructions
• A row of "===" under "Title" makes it a top-level heading
R Markdown has code chunks
• A code chunk starts with three backticks followed by {r} and ends with three more backticks
• Chunks have a gray background
knitr "knits" code & text
• Makes an HTML document (web page) that combines
  – code
  – output from code
  – your text explanations
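To see the pieces of an R Markdown file side by side, the sketch below writes a tiny `.Rmd` file (its contents and temporary location are our own example) and, if knitr is installed, knits it. `knitr::knit()` produces a Markdown file that combines text, code, and output; RStudio's Knit HTML button goes one step further and converts that to HTML. The backticks are built with `strrep()` only to avoid writing literal triple backticks inside this snippet.

```r
fence = strrep("`", 3)            # three backtick characters
rmd_lines = c(
  "Title",
  "=====",                        # a row of '=' makes "Title" a top-level heading
  "",
  "Some plain-text explanation.",
  "",
  paste0(fence, "{r}"),           # a code chunk starts with three backticks and {r}
  "summary(cars)",                # cars is a data set that ships with R
  fence                           # ...and ends with three more backticks
)
rmd_file = file.path(tempdir(), "Example.Rmd")
writeLines(rmd_lines, rmd_file)

# if knitr is installed, run the chunk and show the combined Markdown output
if (requireNamespace("knitr", quietly = TRUE)) {
  md_file = knitr::knit(rmd_file, output = file.path(tempdir(), "Example.md"), quiet = TRUE)
  cat(readLines(md_file), sep = "\n")
}
```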
Practice: Knit HTML
• Save the file as "Example.Rmd"
• Click Knit HTML
• A preview appears
• An HTML file appears
• Click Example.html in the Files tab
  – choose View in Web Browser
knitr makes an HTML document (a Web page)
• Images are embedded
• You can email it, save it in a Dropbox, etc.
Practice: Edit Example
• Edit plain text
• Edit code chunks
Practice: Run commands in Markdown
• Put cursor inside a code chunk
• Press CTRL-ENTER
  – or click Run
Shortcut: Chunks menu (top right)
• Put cursor in a chunk
• Use Run Current Chunk to run the entire chunk
• Or Run All
Practice: Edit Markdown, make plot look nicer
• Use col to add color
• Use las to change the orientation of the y axis numbers
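One way the edited chunk might look. The slides don't name the data set, so this sketch assumes `cars`, a small data set that ships with R; the color is arbitrary. `las = 1` makes all axis numbers horizontal, which mainly changes the y axis.

```r
# cars: built-in data set with columns speed and dist (our choice of example data)
plot(cars, col = "darkgreen", las = 1)   # las = 1: horizontal axis numbers
```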
Practice: Run the new code
• Put cursor inside a code chunk
• Press CTRL-ENTER
  – or click Run
Practice: knit your Markdown
Statistical tests in R
• Tests are implemented as functions
  – Usually return list objects
• A list is
  – an object that can contain other objects of many types
• Previously, you saw vectors
  – Output of the rnorm command
  – Vectors are like lists that only contain one type of object (e.g., numbers only)
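A quick contrast between the two kinds of objects; the names and values in the list are made up. The last line shows why vectors hold one type only: combining a number and text with `c()` converts everything to character.

```r
v = c(1.5, 2, 3)                                         # a vector: all numbers
l = list(name = "pollen", counts = c(10, 20), ok = TRUE) # a list can mix types
class(v)      # [1] "numeric"
class(l)      # [1] "list"
l$counts      # [1] 10 20
c(1, "a")     # [1] "1" "a"   (the number became text)
```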
Practice: Start a new section
• Heading, smaller than the title heading
• Make a new code chunk
• Make new vectors
• Run t.test
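A sketch of the new chunk, with two made-up groups of 10 simulated values (the group names, means, and seed are our own). Called with two vectors, `t.test` runs a Welch two-sample t-test by default, which does not assume equal variances.

```r
set.seed(7)
control = rnorm(10, mean = 5, sd = 1)     # hypothetical group 1
treatment = rnorm(10, mean = 6, sd = 1)   # hypothetical group 2
result = t.test(control, treatment)       # Welch two-sample t-test
result                                    # typing the name prints a summary
```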
Tip: Markdown help
• Using R Markdown opens a Web page with more info
• Markdown Quick Reference shows Markdown codes in the Help tab
Practice: Run the code
• The t.test output is saved in result
• result is a list
• Put cursor inside the chunk
• Press CTRL-ENTER
  – or click Run
Practice: Type result (the variable name) in the console for a summary
Practice: Result is a list with named components
• Use the names function to find what it contains
• Use $ to retrieve named components
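Exploring a t-test result object; the data are simulated and the seed is arbitrary. The exact set of component names can differ slightly between R versions, so treat the comment as a partial list.

```r
set.seed(8)
result = t.test(rnorm(10), rnorm(10, mean = 1))
names(result)         # includes "statistic", "p.value", "conf.int", "estimate", ...
result$p.value        # a single number
result$conf.int       # confidence interval for the difference in means
result[["p.value"]]   # same as result$p.value
```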
Differential expression analysis walk-through
Effects of mild chronic heat stress on gene expression in tomato pollen
Goals
• Show you how to structure a data analysis
  – A useful framework you can use in many settings
• Give you an example differential gene expression analysis for RNA-Seq
  – Use it as a starting point for other projects
  – Tip: Review the edgeR user guide for other example data analyses
Structure of the data analysis
• Introduction
  – explain the experimental design
  – state questions (no more than 3, ideally 2)
• Analysis
  – describe steps of analysis, with results
  – explain judgment calls, like P value cutoffs
• Conclusion
  – answer the original questions
• State limitations of the analysis
• Session info, including software versions used
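For the session info step, R's `sessionInfo()` reports the R version, the platform, and the versions of attached packages; putting it in the last chunk of a report records the software used. `packageVersion()` is a narrower option when you only need one package's version.

```r
sessionInfo()              # R version, platform, attached packages and their versions
R.version.string           # just the R version as a single string
packageVersion("stats")    # version of one specific package
```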
Adapted from Jeff Leek's Data Analysis course, Coursera
Practice: Setup
• Go to https://bitbucket.org/lorainelab/tomatopollen
Download repository
Move to Desktop
• Subfolders correspond to analysis chunks
  – See README.md for details
• Open DifferentialExpression
Folder name suffix based on repo version
Double-click the ".Rproj" file in the DifferentialExpression folder
• Opens a new RStudio window
Review of the experiment
• Tomato plants subjected to chronic mild heat stress & control
  – Greenhouse C
  – Greenhouse B
• Mature pollen grains harvested in batches over eight weeks, ~10 plants per batch
  – One treatment sample, one control sample per collection
• RNA extracted, sent to UCLA for sequencing
  – 10 libraries: 5 treatments, 5 controls; 69 base paired-end sequencing
Next: Step-by-step walk-through of R Markdown