Enter the Tidyverse
BIO5312 FALL 2017
STEPHANIE J. SPIELMAN, PHD
What is the "tidyverse"?
A collection of R packages largely developed by Hadley Wickham and others at RStudio
These packages have emerged as staples of modern-day data science in the past 5-10 years
We will focus on:
• Visualization/plotting with ggplot2
• Data management and "wrangling" with dplyr and tidyr
• Document presentation with R Markdown
Focus is on tidy data frames
• Each variable forms a column.
• Each observation forms a row.
• Each type of observational unit forms a table.
Tidy data provides a consistent approach to data management that greatly facilitates downstream analysis and visualization
Working with tidy data
The package dplyr can manipulate and manage tidy data
The package tidyr can rearrange data to convert to/from tidy data
The package ggplot2 is used for visualization/plotting
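As a minimal sketch of the kind of rearranging tidyr does, the made-up table below (hypothetical sites and counts) hides one variable, the year, in its column names. `pivot_longer()` turns it into one row per site-year observation, and `pivot_wider()` undoes that. This assumes tidyr 1.0.0 or later; the 2017-era equivalents were `gather()` and `spread()`.

```r
library(tidyr)

# Hypothetical "wide" data: the year is stored in the column names,
# so each row holds two observations instead of one.
wide <- data.frame(
  site  = c("north", "south"),
  y2016 = c(12, 30),
  y2017 = c(15, 27),
  stringsAsFactors = FALSE
)

# Tidy version: one row per site-year observation.
# names_prefix = "y" strips the leading "y" so year holds "2016"/"2017".
long <- pivot_longer(wide, cols = c(y2016, y2017),
                     names_to = "year", names_prefix = "y",
                     values_to = "count")
print(long)

# pivot_wider() converts back to the wide layout, re-adding the "y" prefix.
back <- pivot_wider(long, names_from = year, values_from = count,
                    names_prefix = "y")
print(back)
```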
The fundamental verbs of dplyr
filter()      keep rows that match a condition
select()      keep (select) columns
mutate()      create new columns
group_by()    establish a data grouping
tally()       count observations in each group
summarize()   calculate summary statistics
arrange()     sort rows
There are more functions, but these ones are key!
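A short sketch exercising each verb above on the built-in iris data; the column choices and the `sepal.ratio` column are illustrative only.

```r
library(dplyr)

# filter() rows, select() columns, then mutate() a new column.
setosa <- iris %>%
  filter(Species == "setosa") %>%
  select(Species, Sepal.Length, Sepal.Width) %>%
  mutate(sepal.ratio = Sepal.Length / Sepal.Width)

# group_by() sets the grouping; tally() counts rows within each group (column n).
counts <- iris %>%
  group_by(Species) %>%
  tally()

# summarize() one statistic per group, then arrange() largest first.
means <- iris %>%
  group_by(Species) %>%
  summarize(mean.sepal = mean(Sepal.Length)) %>%
  arrange(desc(mean.sepal))

print(counts)
print(means)
```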
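The pipe operator %>%
"Pipes" the output from one function/operation as the input to the next

## Find the mean of iris sepal lengths
mean.sepal <- mean(iris$Sepal.Length)

## Using %>%
mean.sepal <- iris$Sepal.Length %>% mean()
iris$Sepal.Length %>% mean() -> mean.sepal

## Does NOT work: this runs mean(iris, Sepal.Length), which returns NA with a warning
iris %>% mean(Sepal.Length) -> mean.sepal
## Works: summarize() finds Sepal.Length inside iris (returns a one-row data frame)
iris %>% summarize(mean.sepal = mean(Sepal.Length))

The "forward assignment" operator -> follows the logical flow of piping

## Start simple: display data
head(iris)

## Using %>%
iris %>% head()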
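To see why the pipe helps once several steps are chained, here is a sketch comparing a nested call with its piped equivalent. Both keep only the setosa rows and then compute the mean sepal length; the nested form has to be read inside-out, the piped form top to bottom.

```r
library(dplyr)

# Nested: the innermost call (filter) runs first.
nested <- summarize(filter(iris, Species == "setosa"),
                    mean.sepal = mean(Sepal.Length))

# Piped: the same steps, written in the order they happen.
piped <- iris %>%
  filter(Species == "setosa") %>%
  summarize(mean.sepal = mean(Sepal.Length))

print(piped)
```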
dplyr demo
Commands in the demo are on sjspielman.org/bio5312_fall2017/day2_tidyverse1
Visualizing with ggplot2
The package ggplot2 is a graphics package that implements a grammar of graphics
◦ Operates on data frames, not vectors like base R
◦ Explicitly differentiates between the data and the representation of the data
The ggplot2 grammar
Grammar element*     What it is
Data                 The data frame being plotted
Geometrics (geoms)   The geometric shape that will represent the data: point, boxplot, histogram, violin, bar, etc.
Aesthetics           How data columns map onto visual properties of the geometric object: x/y position, color, size, shape, etc.
*Table is a tiny subset of what ggplot2 has to offer
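A minimal sketch mapping the three grammar elements onto code. `layer_data()` is a ggplot2 helper that returns the per-point data ggplot2 computed for a layer, which makes the data-to-aesthetic mapping visible without drawing anything.

```r
library(ggplot2)

p <- ggplot(iris,                                  # Data: the iris data frame
            aes(x = Sepal.Length, y = Petal.Length)) +  # Aesthetics: columns -> x and y
  geom_point()                                     # Geometry: one point per row

# One row per drawn point, with the mapped x and y values.
pts <- layer_data(p)
head(pts[, c("x", "y")])
```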
Example: scatterplot
> ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) + geom_point()
[Plot: scatterplot with Sepal.Length on the x axis and Petal.Length on the y axis]
• Pass in the data frame as your first argument
• Aesthetics map the data onto plot characteristics, here the x and y axes
• Display the data geometrically as points
Example: scatterplot with color
> ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) + geom_point(color = "red")
[Plot: the same scatterplot with every point colored red]
Example: scatterplot with aes color
> ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) + geom_point()
[Plot: the same scatterplot with points colored by Species; legend "Species" lists setosa, versicolor, virginica]
• Placing color inside aes() maps it to the data: each species gets its own color, and a legend is added.
Example: scatterplot with aes color, shape
> ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species, shape = Species)) + geom_point()
[Plot: points colored and shaped by Species, with a single combined legend "Species"]
Aesthetics can also be placed inside the relevant geom
> ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) + geom_point(aes(color = Species, shape = Species))
[Plot: identical to the previous plot]
> ## Remember dplyr! The data frame can be piped into ggplot()
> iris %>% ggplot(aes(x = Sepal.Length, y = Petal.Length)) +
    geom_point(aes(color = Species, shape = Species))
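A short sketch, using `layer_data()` to compare the drawn points, showing that the global and geom-level placements give the same result when the plot has a single layer. The difference matters once a second layer is added: mappings in `ggplot()` are inherited by every layer, while mappings inside a geom apply only to that geom.

```r
library(ggplot2)
library(dplyr)  # provides %>%

# Mappings in ggplot() are inherited by every layer.
p_global <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length,
                             color = Species, shape = Species)) +
  geom_point()

# Mappings inside geom_point() apply to that layer only.
p_local <- iris %>%
  ggplot(aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point(aes(color = Species, shape = Species))

d_global <- layer_data(p_global)
d_local  <- layer_data(p_local)
```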
Aesthetics are for mapping only
> ### Color all points blue?
> ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = "blue")) + geom_point()
[Plot: the points are not blue; a legend titled "colour" shows a single key labeled "blue"]
Aesthetics are for mapping only
> ### Correctly color all points blue: set color in the geom, outside aes()
> ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) + geom_point(color = "blue")
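A sketch that checks the difference with `layer_data()`: inside `aes()`, "blue" is treated as a constant data value and passed through the default color scale, so the points get the scale's first color; outside `aes()`, it sets the color directly.

```r
library(ggplot2)

# Mapped: "blue" is a one-level category, colored by the default scale.
p_mapped <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = "blue")) +
  geom_point()

# Set: a fixed visual property, no scale and no legend.
p_set <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point(color = "blue")

unique(layer_data(p_mapped)$colour)  # a single default-palette color, not "blue"
unique(layer_data(p_set)$colour)     # "blue"
```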
Example: multiple geoms
> ### Use some fake data:
> fake.data <- data.frame(t = 1:10, y = runif(10, 1, 100))
> ggplot(fake.data, aes(x = t, y = y)) + geom_point() + geom_line()
[Plot: ten points over t = 1 to 10 connected by a line, with y on the y axis]
Make sure aesthetic mappings are properly applied
> ggplot(fake.data, aes(x = t, y = y, size = y)) + geom_point() + geom_line()
[Plot: both the points and the line change size with y; legend "y"]
Make sure aesthetic mappings are properly applied
> ### Map size only inside geom_point(), so it does not affect the line
> ggplot(fake.data, aes(x = t, y = y)) + geom_point( aes(size=y) ) + geom_line()
[Plot: only the points change size with y; the line has constant width; legend "y"]
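A minimal sketch confirming with `layer_data()` that mapping `size` inside `geom_point()` leaves the line layer alone. The `set.seed()` call is an addition so the random fake data are reproducible. Note that in ggplot2 3.4.0 and later, line thickness is the `linewidth` aesthetic, so mapping `size` globally as in the first version also triggers a deprecation warning for `geom_line()`.

```r
library(ggplot2)

set.seed(1)  # added so the random fake data are reproducible
fake.data <- data.frame(t = 1:10, y = runif(10, 1, 100))

# size is mapped only in geom_point(), so only the point layer uses it.
p <- ggplot(fake.data, aes(x = t, y = y)) +
  geom_point(aes(size = y)) +
  geom_line()

point_sizes <- layer_data(p, 1)$size  # layer 1: the points
line_layer  <- layer_data(p, 2)       # layer 2: the line

# Points vary in size; the line has no size mapping.
length(unique(point_sizes))
```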
Histograms
> ggplot(iris, aes(x = Sepal.Length)) + geom_histogram()
[Plot: histogram of Sepal.Length, with count on the y axis]
Histograms
> ggplot(iris, aes(x = Sepal.Length)) + geom_histogram( fill = "orange" )
[Plot: the same histogram with orange bars]
Histograms
> ggplot(iris, aes(x = Sepal.Length)) + geom_histogram( fill = "orange", color = "brown" )
[Plot: orange bars with brown outlines]
Histograms
> ggplot(iris, aes(x = Sepal.Length)) + geom_histogram( fill = "orange", color = "brown" ) +
    xlab("Sepal Length") + ylab("Count") + ggtitle("Histogram of iris sepal lengths")
[Plot: the same histogram titled "Histogram of iris sepal lengths", with axes labeled "Sepal Length" and "Count"]
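By default `geom_histogram()` uses 30 bins and prints a message suggesting you pick a better value. A sketch that sets `binwidth` explicitly (0.25 cm is an arbitrary choice) and uses `labs()`, which sets axis labels and the title in one call as an alternative to `xlab() + ylab() + ggtitle()`:

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(binwidth = 0.25, fill = "orange", color = "brown") +
  labs(x = "Sepal Length", y = "Count",
       title = "Histogram of iris sepal lengths")

bins <- layer_data(p)
sum(bins$count)  # each of the 150 flowers falls into exactly one bin
```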
Boxplots
> ggplot(iris, aes(x = "", y = Sepal.Length)) + geom_boxplot()
[Plot: a single boxplot of Sepal.Length]
Boxplots
> ggplot(iris, aes(x = "", y = Sepal.Length)) + geom_boxplot(fill = "green")
[Plot: the same boxplot filled green]
Boxplots
> ggplot(iris, aes(x = Species, y = Sepal.Length)) + geom_boxplot(fill = "green")
[Plot: one green boxplot per species: setosa, versicolor, virginica]
Boxplots
> ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) + geom_boxplot()
[Plot: one boxplot per species, each filled with its own color; legend "Species"]
Boxplots: customizing the fill mappings
> ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) + geom_boxplot() + scale_fill_manual(values=c("red", "blue", "purple"))
[Plot: boxplots filled red (setosa), blue (versicolor), and purple (virginica)]
scale_fill_manual() also tweaks the legend
> ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) + geom_boxplot() + scale_fill_manual(values=c("red", "blue", "purple"), name = "Species name", labels=c("SETOSA", "VERSICOLOR", "VIRGINICA"))
[Plot: legend titled "Species name" with entries SETOSA, VERSICOLOR, VIRGINICA]
• values and labels are applied in factor-level order (setosa, versicolor, virginica), so list them in that order.
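A sketch of a position-independent alternative, assuming you want each color tied to a species name rather than to level order: the `scale_fill_manual()` documentation states that a named `values` vector is matched by name. The sketch uses `datasets::iris` so it starts from the default level order.

```r
library(ggplot2)

iris_df <- datasets::iris  # fresh copy with the default level order

# Named values: each color is matched to the level with that name,
# so reordering the factor levels would not change the colors.
p <- ggplot(iris_df, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_boxplot() +
  scale_fill_manual(values = c(setosa = "red", versicolor = "blue", virginica = "purple"),
                    name = "Species name")

# One row per box; group 1 = setosa, 2 = versicolor, 3 = virginica.
boxes <- layer_data(p)
boxes[, c("group", "fill")]
```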
Changing the order
> ### Ordering depends on factor levels
> levels(iris$Species)
[1] "setosa"     "versicolor" "virginica"
> ### Change order of levels
> iris$Species <- factor(iris$Species, levels=c("virginica", "setosa", "versicolor"))
> levels(iris$Species)
[1] "virginica"  "setosa"     "versicolor"
> ### Replot
> ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
    geom_boxplot() + scale_fill_manual(values=c("red", "blue", "purple"))
[Plot: boxes now appear in the order virginica, setosa, versicolor, filled red, blue, and purple respectively]
Grouped boxplots
This will apply to violin plots as well.
> ## Create another categorical variable for grouping purposes
> ## ifelse(condition, value if TRUE, value if FALSE)
> ## Because the data are grouped by Species, median() is each species' own median
> iris %>%
    group_by(Species) %>%
    mutate(size = ifelse( Sepal.Width > median(Sepal.Width) , "big" , "small" )) -> iris2
> head(iris2)
Source: local data frame [150 x 6]
Groups: Species [3]

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species  size
         <dbl>       <dbl>        <dbl>       <dbl>  <fctr> <chr>
1          5.1         3.5          1.4         0.2  setosa   big
2          4.9         3.0          1.4         0.2  setosa small
3          4.7         3.2          1.3         0.2  setosa small
4          4.6         3.1          1.5         0.2  setosa small
5          5.0         3.6          1.4         0.2  setosa   big
6          5.4         3.9          1.7         0.4  setosa   big
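Because `ifelse()` uses a strict `>`, flowers exactly at their species' median are labeled "small", so at most 25 of each species' 50 flowers can be "big". A sketch that recreates `iris2` from a fresh copy of iris (the `ungroup()` call is an addition so later steps are not grouped) and tallies the labels with `count()`:

```r
library(dplyr)

iris2 <- datasets::iris %>%
  group_by(Species) %>%
  mutate(size = ifelse(Sepal.Width > median(Sepal.Width), "big", "small")) %>%
  ungroup()

# Number of flowers per Species/size combination (column n).
tab <- iris2 %>% count(Species, size)
print(tab)
```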
Grouped boxplots
> ggplot(iris2, aes( x = Species, fill=size, y=Sepal.Width)) + geom_boxplot()
[Plot: for each species, side-by-side "big" and "small" boxplots of Sepal.Width; legend "size"]
Grouped boxplots
> ggplot(iris2, aes( x = size, fill = Species, y=Sepal.Width)) + geom_boxplot()
[Plot: for each size (big, small), side-by-side boxplots of Sepal.Width for the three species; legend "Species"]
Detour: scale_color_manual() customizes color
> ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) + geom_point(aes(color = Species)) + scale_color_manual(values=c("cornflowerblue", "deepskyblue4", "lightcyan4"))
[Plot: scatterplot colored by Species with the three custom colors; legend "Species" lists virginica, setosa, versicolor]
Detour round 2: scale_<fill/color>_??
There are many scales to use besides the default and custom (manual) ones.
◦ scale_<fill/color>_brewer() uses pre-made color schemes from colorbrewer.org
◦ scale_color_gradient() takes a low and a high color and maps a continuous variable onto the spectrum between them
See here: http://ggplot2.tidyverse.org/reference/#scales
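A sketch of both scale families listed above. The palette name "Set2" and the gradient endpoint colors are arbitrary illustrative choices.

```r
library(ggplot2)

# Discrete fill from a pre-made ColorBrewer palette.
p_brewer <- ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_boxplot() +
  scale_fill_brewer(palette = "Set2")

# Continuous color: Petal.Width mapped along a low-to-high spectrum.
p_gradient <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Petal.Width)) +
  geom_point() +
  scale_color_gradient(low = "lightblue", high = "darkblue")

layer_data(p_brewer)$fill  # the three Set2 colors, one per species
```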
Violin plot
> ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) + geom_violin()
[Plot: one violin per species in the order virginica, setosa, versicolor; legend "Species"]
Barplot
> ggplot(iris, aes(x = Species, fill = Species)) + geom_bar()
[Plot: one bar per species, each reaching a count of 50; legend "Species"]
Stacked/grouped barplot
> ## Uses iris2 from the grouped boxplots example, which adds the size column
> head(iris2)
Stacked/grouped barplot
> ggplot(iris2, aes(x = Species, fill = size)) + geom_bar()
[Plot: one bar per species, stacked into big and small segments; legend "size"]
Stacked/grouped barplot
> ggplot(iris2, aes(x = Species, fill = size)) + geom_bar( position = "dodge" )
[Plot: big and small bars side by side within each species; legend "size"]
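`geom_bar()` counts the rows in each group for you; when the counts are already computed, `geom_col()` plots them as given. A sketch comparing the two, recreating `iris2` so the block runs on its own:

```r
library(ggplot2)
library(dplyr)

iris2 <- datasets::iris %>%
  group_by(Species) %>%
  mutate(size = ifelse(Sepal.Width > median(Sepal.Width), "big", "small")) %>%
  ungroup()

# Pre-computed counts, one row per Species/size combination (column n).
counts <- iris2 %>% count(Species, size)

# geom_bar() counts rows itself; geom_col() uses the y values you supply.
p_bar <- ggplot(iris2, aes(x = Species, fill = size)) + geom_bar(position = "dodge")
p_col <- ggplot(counts, aes(x = Species, y = n, fill = size)) + geom_col(position = "dodge")
```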
Density plot
> ggplot(iris, aes(x = Sepal.Length, fill = Species)) + geom_density()
What does the tail of the setosa distribution look like?
[Plot: overlapping, opaque density curves of Sepal.Length for each species; legend "Species"]
Density plot
> ggplot(iris, aes(x = Sepal.Length, fill = Species)) + geom_density( alpha = 0.5 )
[Plot: the same densities drawn semi-transparent, so overlapping regions stay visible]
Themes
Gray background and grid not working for you? Me neither.
◦ Other built-in themes: http://ggplot2.tidyverse.org/reference/ggtheme.html
◦ Customize your theme: http://ggplot2.tidyverse.org/reference/theme.html
◦ Use somebody else's themes:
  ◦ https://cran.r-project.org/web/packages/ggthemes/vignettes/ggthemes.html
  ◦ https://cran.r-project.org/web/packages/cowplot/vignettes/introduction.html
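A small sketch of the first two routes: a complete built-in theme, and `theme()` to adjust individual elements on top of one. The particular themes and the `legend.position = "bottom"` tweak are illustrative choices.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point()

# Built-in complete theme: white background, axis lines, no grid.
p_classic <- p + theme_classic()

# Start from theme_bw(), then move the legend below the plot.
p_custom <- p + theme_bw() + theme(legend.position = "bottom")

# ggplotGrob() renders the plot to a gtable without displaying it.
g <- ggplotGrob(p_custom)
```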