Stat 437 Lecture Notes 1
Xiongzhi Chen
Washington State University
Contents
• Set up RStudio
– Install R and Rstudio; Rstudio: a snapshot; Rstudio
• Objects in R: I
– Scalars in R; Vectors in R: I, II; The seq command; Matrices in R: I to IV; Data frames in R: I to IV
• Objects in R: II
– Character vectors in R; Strings in R; Factors in R: I, II; Logic operators in R: I to IV; Lists in R: I to III; Set operations in R: I, II; “Coerce” in R; length and dim
• R markdown
– Install R markdown; Create an R markdown file; Structure of a markdown file; A sample markdown file; Basic syntax: I to IV; Latex in markdown
• Data visualization
– Why data visualization?; R packages for visualization; Basic principles for plotting
• Scatter plot, density plot, boxplot, bar plot
– Scatter plot matrix; Density plot; Boxplot; Scatter plot matrix: ggpairs; Bar plot
• Visualization with factors
– Look into iris data set; Faceting with 1 factor; Non-faceting with 1 factor; Faceting with 2 factors; Visualization with 3 factors
• Mathematical expressions
– Math expressions in R; Subset of diamonds data; Base layer; Math symbols in axis titles; Math symbols in legend title; Math symbols in legend labels; Math symbols in strip names; Math symbols in plot
• Other ggplot2 tweaks
– Not covered; License and session Information
Set up RStudio
Install R and Rstudio
• Rstudio free version at: https://www.rstudio.com/products/rstudio/download/
• R at: https://www.r-project.org/
• install an R package by install.packages("package_name")
• install R packages “tidyverse”, “ggplot2”, “markdown”, “igraph”, “plotly”, “ggmap”
Rstudio: a snapshot
Rstudio
• Upper Left panel: R scripts, R markdown file, R project file, View data, etc
• Lower Left panel: R console, R markdown log, etc
• Upper Right panel: R workspace, History, etc
• Lower Right panel: Files in working directory, Plots, Help, etc
Objects in R: I
Scalars in R
> x = 3 # assign value 3 to variable x
> y = 2
> x+y # addition
[1] 5
> x*y # multiplication
[1] 6
> x/y # division
[1] 1.5
> x%%y # modulo
[1] 1
> x^y # exponentiation
[1] 9
> x/0
[1] Inf
> 0/0 # undefined
[1] NaN
Vectors in R: I
> z = c(1,2,3) # a vector of 3 components
> v = c(5,6,7)
> z+v # vector addition
[1]  6  8 10
> z*v # paired componentwise product
[1]  5 12 21
> z/v # paired componentwise division
[1] 0.2000000 0.3333333 0.4285714
> z%*%v # inner product
     [,1]
[1,]   38
> 2*z # scalar-vector multiplication
[1] 2 4 6
Vectors in R: II
> z = c(1,2,3)
> v = c(5,6,7)
> z[1] # access the 1st component of z
[1] 1
> t(v) # transpose of vector
     [,1] [,2] [,3]
[1,]    5    6    7
> z%*%t(v) # outer product
     [,1] [,2] [,3]
[1,]    5    6    7
[2,]   10   12   14
[3,]   15   18   21
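As a cross-check on the inner and outer products, the sketch below recomputes them with `sum`, `drop` and `outer`. These three functions are base R but do not appear in the original notes, so treat this as an optional alternative rather than the method used in the course.

```r
z <- c(1, 2, 3)
v <- c(5, 6, 7)

# inner product as a plain number (z %*% v returns a 1-by-1 matrix)
inner <- sum(z * v)

# drop() removes the matrix wrapper from the 1-by-1 result of %*%
inner_from_matrix <- drop(z %*% v)

# outer product without an explicit transpose
outer_zv <- outer(z, v)

inner
inner_from_matrix
outer_zv
```

Using `sum(z * v)` is convenient when the inner product feeds into later scalar arithmetic, since a 1-by-1 matrix does not always behave like a number.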
The seq command
> # arguments of seq and their defaults
> seq(from = 1, to = 1, by = ((to - from)/(length.out - 1)),
+     length.out = NULL, along.with = NULL, ...)
Usage:
> seq(0,1,by=0.1)
 [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
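The other arguments in the signature work the same way. The following sketch, with values chosen only for illustration, shows `length.out`, `along.with` and a negative `by`.

```r
# five equally spaced points on [0, 1]; the step is (1 - 0) / (5 - 1) = 0.25
a <- seq(0, 1, length.out = 5)

# one index per element of another vector
w <- c("a", "b", "c")
idx <- seq(along.with = w)

# a decreasing sequence needs a negative step
d <- seq(10, 1, by = -3)

a
idx
d
```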
Matrices in R: I
> matrix(1:6,nrow=2,ncol=3) # a 2-by-3 matrix
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> x = c(1,3,5) # a 3-component vector
> y = c(2,4,6) # a 3-component vector
> # stack x and y as 2 rows to obtain a 2-by-3 matrix
> rbind(x,y)
  [,1] [,2] [,3]
x    1    3    5
y    2    4    6
> # stack x and y as 2 columns to obtain a 3-by-2 matrix
> cbind(x,y)
     x y
[1,] 1 2
[2,] 3 4
[3,] 5 6
Matrices in R: II
> x=matrix(1:6,nrow=2,ncol=3) # a 2-by-3 matrix
> x
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> x[,1] # 1st column of x
[1] 1 2
> x[2,] # 2nd row of x
[1] 2 4 6
> x[1,2] # (1,2)-entry of x
[1] 3
> t(x) # transpose of x
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
Matrices in R: III
> x=matrix(1:6,nrow=2,ncol=3) # a 2-by-3 matrix
> x
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> y = rbind(c(0,1,0),c(1,1,1))
> y
     [,1] [,2] [,3]
[1,]    0    1    0
[2,]    1    1    1
> x %*%t(y) # ordinary (row-by-column) matrix product
     [,1] [,2]
[1,]    3    9
[2,]    4   12
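The product with `%*%` only works when the number of columns on the left matches the number of rows on the right, which is why `y` is transposed. A short sketch of the dimension rule, and of how `%*%` differs from the entrywise `*`, using the same `x` and `y`:

```r
x <- matrix(1:6, nrow = 2, ncol = 3)      # 2-by-3
y <- rbind(c(0, 1, 0), c(1, 1, 1))        # 2-by-3

# (2-by-3) times (3-by-2) gives a 2-by-2 matrix
p <- x %*% t(y)
dim(p)

# * multiplies entry by entry and needs equal dimensions
h <- x * y
h

# (3-by-2) times (2-by-3) gives a 3-by-3 matrix, a different object from p
q <- t(x) %*% y
dim(q)
```

Calling `x %*% y` directly would stop with a "non-conformable arguments" error, because both matrices are 2-by-3.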
Matrices in R: IV
> x=matrix(1:6,nrow=2,ncol=3) # a 2-by-3 matrix
> x
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> y = rbind(c(0,1,0),c(1,1,1))
> y
     [,1] [,2] [,3]
[1,]    0    1    0
[2,]    1    1    1
> x + y # matrix addition
     [,1] [,2] [,3]
[1,]    1    4    5
[2,]    3    5    7
> 2*x # scalar multiplication
     [,1] [,2] [,3]
[1,]    2    6   10
[2,]    4    8   12
Data frames in R: I
> x <- data.frame("SN" = 1:2, "Age" = c(21,15),
+                 "Name" = c("John","Dora"))
> x
  SN Age Name
1  1  21 John
2  2  15 Dora
> x$SN # access SN
[1] 1 2
> x[,1] # access SN
[1] 1 2
> class(x$SN) # check object type for SN
[1] "integer"
> class(x$Name) # check object type for Name
[1] "factor"
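The "factor" class reported for Name depends on the R version: the session information at the end of these notes lists R 3.5.0, where data.frame converted strings to factors by default. From R 4.0.0 on, the default is `stringsAsFactors = FALSE`, so the same call returns "character". The sketch below sets the argument explicitly so that the result does not depend on the installed version; the object names `x_chr` and `x_fac` are made up for this illustration.

```r
# keep Name as a character vector
x_chr <- data.frame(SN = 1:2, Age = c(21, 15),
                    Name = c("John", "Dora"),
                    stringsAsFactors = FALSE)

# convert Name to a factor, as in the output shown in these notes
x_fac <- data.frame(SN = 1:2, Age = c(21, 15),
                    Name = c("John", "Dora"),
                    stringsAsFactors = TRUE)

class(x_chr$Name)
class(x_fac$Name)
```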
Data frames in R: II
> x <- data.frame("SN" = 1:2, "Age" = c(21,15),
+                 "Name" = c("John","Dora"))
> x
  SN Age Name
1  1  21 John
2  2  15 Dora
> x$SN[2] # access the 2nd entry of SN
[1] 2
> x[1,2] # access the 1st entry of Age
[1] 21
Caution: do not transpose a data.frame when it contains different types of objects
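To see why, here is a small sketch with the same data frame. `t()` first converts the data frame to a matrix, and a matrix can hold only one type, so the numeric columns are turned into strings.

```r
x <- data.frame(SN = 1:2, Age = c(21, 15), Name = c("John", "Dora"))

tx <- t(x)        # the data frame is converted to a matrix, then transposed

is.data.frame(tx) # FALSE: the result is a matrix
typeof(tx)        # "character": every entry is now a string
tx["Age", ]       # the ages as strings, no longer usable in arithmetic
```

If a transposed layout is really needed, one option is to transpose only the columns that share a type, for example `t(x[, c("SN", "Age")])`.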
Data frames in R: III
Import (malaria related death) data as data.frame:
> Y = read.csv("dataMalyria.csv",header = TRUE,sep=",",
+     colClasses=c("country"=NA,"percent"="numeric",
+     "labels"=NA))
> head(Y)
     country percent labels
1    Lesotho       0    <1%
2  Mauritius       0    <1%
3 Seychelles       0    <1%
4 Cabo Verde       0    <1%
5    Algeria       0    <1%
6      Egypt       0    <1%
Data frames in R: IV
Import (malaria related death) data as data.frame:
> str(Y) # object structure of Y
'data.frame': 53 obs. of 3 variables:
 $ country: Factor w/ 53 levels "Algeria","Angola",..: 25 32 41 7 1 15 27 33 50 47 ...
 $ percent: num 0 0 0 0 0 0 0 0 0 0 ...
 $ labels : Factor w/ 5 levels " <1% "," 1-4% ",..: 1 1 1 1 1 1 1 1 1 1 ...
> dim(Y) # dimension of Y
[1] 53 3
> Y$id = 1:53 # append a column to Y
> Y[1:3,] # display the first 3 rows of Y
     country percent labels id
1    Lesotho       0    <1%  1
2  Mauritius       0    <1%  2
3 Seychelles       0    <1%  3
Objects in R: II
Character vectors in R
> w = c("a","b","c") # a vector of 3 character components
> w[2] # access the 2nd component
[1] "b"
> # 1st 10 upper case letters in the alphabet
> LETTERS[seq( from = 1, to = 10 )]
 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J"
> # 1st 10 lower case letters in the alphabet
> letters[seq( from = 1, to = 10 )]
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
> Q = c("Go","WSU","Cougs","!")
> Q
[1] "Go" "WSU" "Cougs" "!"
> # concatenate two character vectors
> c(w,Q)
[1] "a" "b" "c" "Go" "WSU" "Cougs" "!"
Strings in R
> w = "Go cougs!"
> w
[1] "Go cougs!"
> v = "Data analytics"
> v
[1] "Data analytics"
> # concatenate two strings
> paste(w,v,sep = " ")
[1] "Go cougs! Data analytics"
Factors in R: I
> grades = c("A","F","D","C","B") # character vector
> grades
[1] "A" "F" "D" "C" "B"
> class(grades)
[1] "character"
> gradesF = factor(grades) # gradesF is now a factor
> gradesF
[1] A F D C B
Levels: A B C D F
> class(gradesF)
[1] "factor"
> # levels of the factor "gradesF"
> levels(gradesF)
[1] "A" "B" "C" "D" "F"
> # levels are ordered alphabetically
Factors in R: II
> x = c(1,3,2) # numeric vector
> b = factor(x) # change x into a factor
> b
[1] 1 3 2
Levels: 1 2 3
> levels(b) # levels are ordered from smallest to largest
[1] "1" "2" "3"
> # relabel levels of b
> d = factor(x,labels = c("3Level","1Level","2Level"))
> d
[1] 3Level 2Level 1Level
Levels: 3Level 1Level 2Level
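The default ordering of levels (alphabetical or smallest to largest) can be overridden. The sketch below uses the `levels` and `ordered` arguments of `factor`, which are not used in the examples above; the grade ordering is an assumption made for illustration.

```r
grades <- c("A", "F", "D", "C", "B")

# levels = fixes the order of the levels instead of the alphabetical default
g1 <- factor(grades, levels = c("F", "D", "C", "B", "A"))
levels(g1)

# ordered = TRUE additionally makes comparisons such as > meaningful
g2 <- factor(grades, levels = c("F", "D", "C", "B", "A"), ordered = TRUE)
g2[1] > g2[2]      # compares "A" with "F" using the level order

# as.integer() returns the position of each value in the level order
as.integer(g1)
```

The level order matters later for plotting, because ggplot2 arranges legend entries and facets in the order of the factor levels.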
Logic operators in R: I
> x = 0 # assign 0 to x
> x >0
[1] FALSE
> x == 0
[1] TRUE
> !x # return TRUE
[1] TRUE
> y = 1
> y >= 1
[1] TRUE
> !y # return FALSE
[1] FALSE
> x & y # "and"; return FALSE
[1] FALSE
> x | y # "or"; return TRUE
[1] TRUE
Logic operators in R: II
> x = 1
> y = -1
> x >0 & y > 0 # "and"
[1] FALSE
> x > 0 | y > 0 # "or"
[1] TRUE
> x >0 & !(y>0)
[1] TRUE
Logic operators in R: III
> x = c(1,2,3) # a 3-component vector
> x >0 # returns a 3-component logic vector
[1] TRUE TRUE TRUE
> x > 2 # returns a 3-component logic vector
[1] FALSE FALSE TRUE
> # return indices of entries of x that are greater than 2
> which(x>2)
[1] 3
> # take the subvector of x whose entries are not smaller than 2
> x[x >=2]
[1] 2 3
Logic operators in R: IV
> x = c(1,2,3) # a 3-component vector
> y = c(-1,4,-1) # a 3-component vector
> # compare x and y entrywise; return a 3-component vector
> x > y
[1] TRUE FALSE TRUE
> x == y
[1] FALSE FALSE FALSE
> x >= y
[1] TRUE FALSE TRUE
> any(x>y)
[1] TRUE
> all(x>y)
[1] FALSE
Lists in R: I
> x = vector("list",3) # a list with 3 components
> # assign a vector to its 1st component
> x[[1]] = c(1,2,3)
> # assign a string to its 2nd component
> x[[2]] = "Second part of x"
> # assign a matrix to its 3rd component
> x[[3]] = matrix(1:6,nrow=3)
> x
[[1]]
[1] 1 2 3

[[2]]
[1] "Second part of x"

[[3]]
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
Lists in R: II
> x = vector("list",3) # a list with 3 components
> x[[1]] = c(1,2,3)
> x[[2]] = "Second part of x"
> x[[3]] = matrix(1:6,nrow=3)
> x[[2]] # show 2nd component of x
[1] "Second part of x"
Lists in R: III
> a = c(1,2,3)
> b = "Second part of x"
> c = matrix(1:6,nrow=3)
> y = list("vector" = a, "string" = b, "matrix" = c)
> y
$vector
[1] 1 2 3

$string
[1] "Second part of x"

$matrix
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
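Components of a named list can be reached in several ways. The sketch below rebuilds the list `y` and contrasts `$`, double brackets and single brackets; only the list contents come from the notes, the access patterns are added for illustration.

```r
y <- list(vector = c(1, 2, 3),
          string = "Second part of x",
          matrix = matrix(1:6, nrow = 3))

names(y)              # the component names
y$matrix[2, 1]        # row 2, column 1 of the matrix component

v1 <- y[["vector"]]   # double brackets return the component itself
v2 <- y["vector"]     # single brackets return a list of length 1

class(v1)
class(v2)
length(y)             # number of components
```

The difference matters in practice: `sum(y[["vector"]])` works, while `sum(y["vector"])` fails because a list is not numeric.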
Set operations in R: I
> x = c(1,2,3) # a 3-component vector
> 1 %in% x # check membership
[1] TRUE
> c(2,3) %in% x
[1] TRUE TRUE
> y = c("stat","115","lecture")
> "stat" %in% y
[1] TRUE
> "time" %in% y
[1] FALSE
Set operations in R: II
> x = c(1,2,3) # a 3-component vector
> y = c(-1,4,-1) # a 3-component vector
> union(x, y)
[1] 1 2 3 -1 4
> intersect(x, y)
numeric(0)
> setdiff(x, y)
[1] 1 2 3
“Coerce” in R
• as.numeric coerces an object to be numeric
• as.factor coerces an object to be a factor
• as.matrix . . .
• as.logical . . .
• as.data.frame . . .
• and so on . . .
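A few coercions in action, as a minimal sketch with made-up values. The factor case deserves attention: `as.numeric` applied to a factor returns the internal level codes, not the numbers shown in the labels.

```r
# strings that look like numbers convert cleanly
n1 <- as.numeric("3.14")

# text that is not a number becomes NA (with a warning, silenced here)
n2 <- suppressWarnings(as.numeric("abc"))

# zero is FALSE, every other number is TRUE
l1 <- as.logical(c(0, 1, 2))

# a factor stores integer codes, so as.numeric() returns the codes
f <- factor(c("10", "20", "10"))
codes <- as.numeric(f)
values <- as.numeric(as.character(f))   # go through character to get 10 20 10

# a data frame whose columns share one type converts to a matrix of that type
m <- as.matrix(data.frame(a = 1:2, b = 3:4))

codes
values
dim(m)
```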
length and dim
• length returns the number of components of a vector
> a = 1:10
> length(a)
[1] 10
• dim returns the dimension of a matrix or data frame
> x=dim(matrix(1:6,nrow=3,ncol=2))
> x
[1] 3 2
> x[1]
[1] 3
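The two functions behave differently depending on the object type, which is a common source of confusion. A short sketch with illustrative objects `m` and `df`, which are not part of the original notes:

```r
m <- matrix(1:6, nrow = 3, ncol = 2)
length(m)        # total number of entries in the matrix
nrow(m)          # same as dim(m)[1]
ncol(m)          # same as dim(m)[2]

dim(1:10)        # NULL: a plain vector has no dim attribute

df <- data.frame(a = 1:4, b = letters[1:4])
length(df)       # number of columns, because a data frame is a list of columns
dim(df)          # rows, then columns
```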
R markdown
Install R markdown
> install.packages("markdown")
> install.packages("knitr")
In Rstudio, follow “Tools > Global Options > Sweave”, and set “Weave Rnw files using” as “knitr”
More details and video tutorial at: Course website
Create an R markdown file
In Rstudio, follow “File > New File > R markdown . . . ”
More details and video tutorial at: Course website
Structure of a markdown file
• Header (that typesets the output document)
• Main body (that contains the contents)
– R chunk (that contains R code)
– Text chunk (that contains non-code text or latex commands)
More details and video tutorial at: Course website
A sample markdown file
Basic syntax: I
Online tutorial: https://rmarkdown.rstudio.com/authoring_basics.html
Online tutorial: https://bookdown.org/yihui/rmarkdown/r-code.html
Basic syntax: II
Some things to go over carefully:
• Adjust figure size in the output document when the figure is generated by an R chunk
• Enable the current R chunk to use results produced by previous R chunks
• Basic latex commands
Basic syntax: III
To adjust figure size when the figure is generated by an R chunk:
• use fig.width and fig.height to set graphical device size as in
{r eval=TRUE,fig.width = 3,fig.height=4}
• use out.width and out.height to set output size as in
{r eval=TRUE,out.width = 5,out.height=6}
More details at: https://bookdown.org/yihui/rmarkdown/r-code.html
Basic syntax: IV
To enable the current R chunk to use results produced by previous R chunks:
• name a chunk as “chunk1” and cache results as in
{r chunk1,eval=TRUE,cache=TRUE}
• use dependson= to refer to “chunk1” as in
{r chunk2,dependson="chunk1",eval=TRUE}
More details at: https://yihui.name/knitr/options/
Latex in markdown
• To include latex packages, add - \usepackage{package_name} in the header, such as:
header-includes:
- \usepackage{bbm}
- \usepackage{amssymb}
- \usepackage{amsmath}
- \usepackage{graphicx,float}
• For Latex commands, please use a quick reference: https://wch.github.io/latexsheet/
• Caution: not all Latex commands work in markdown
Data visualization
Why data visualization?
Data visualization
• provides preliminary understanding of data
• helps present and disseminate knowledge
• is a relatively under-developed subject of data science
R packages for visualization
• ggplot2: create plots
• GGally: extend ggplot2
• ggmap: provide maps
• igraph: produce graphs
• plotly: create interactive web-based plots
• Other specialized packages
Basic principles for plotting
• data usually need to be a data frame
• build plot layer by layer
• basic components of a plot command:
– data, mapping, scales
– geometric objects, coordinate system
– facet, statistical transformations
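The components listed above can be sketched as one plot built one layer at a time. This is a minimal illustration on the iris data, assuming ggplot2 is installed; the axis name "Sepal length" and the step-by-step reassignment of `p` are choices made here, not taken from the notes.

```r
library(ggplot2)

# data and mapping: which data frame, and which columns go on which axis
p <- ggplot(data = iris, mapping = aes(x = Sepal.Length, y = Petal.Length))

# geometric object: draw one point per observation, coloured by species
p <- p + geom_point(aes(colour = Species))

# scale: control how the x values are displayed
p <- p + scale_x_continuous(name = "Sepal length")

# facet: one panel per species
p <- p + facet_wrap(~ Species, nrow = 1)

# theme: overall appearance
p <- p + theme_bw()

p
```

Nothing is drawn until `p` is printed, so each intermediate `p` can be inspected or reused as a base for other plots.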
Scatter plot, density plot, boxplot, bar plot
Scatter plot matrix
A scatter plot can be used to show any “visible” relationship between two variables.
Iris data:
• 4 variables: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width
• species: setosa, versicolor, and virginica
• 150 observations for each variable
Scatter plot matrix
> pairs(iris[,1:4], col=iris$Species)
[Figure: scatter plot matrix of Sepal.Length, Sepal.Width, Petal.Length and Petal.Width, with points coloured by Species]
Density plot
Density plot can be used to:
• visually check model assumptions
• visually compare a response’s behavior under different conditions
Example: iris data set
Density plot
> library(ggplot2)
> ggplot(iris, aes(x=Sepal.Length, color=Species)) +
+   geom_density(linetype = "dashed") + theme_bw()
[Figure: dashed density curves of Sepal.Length (x-axis) against density (y-axis), one curve per Species: setosa, versicolor, virginica]
Boxplot
A boxplot does not present as much distributional information as a density plot, but it can be used to visually check:
• median of data
• range of data
• skewness of data
• outliers in data
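The quantities a boxplot displays can also be computed directly. The sketch below uses base R on one species of the iris data; `boxplot.stats` uses the usual 1.5 times interquartile range rule for outliers, which may differ slightly from the hinges ggplot2 draws.

```r
# sepal lengths of one species
s <- iris$Sepal.Length[iris$Species == "virginica"]

median(s)                    # centre line of the box
range(s)                     # smallest and largest observation
quantile(s, c(0.25, 0.75))   # approximate edges of the box

# five-number summary used for the box and whiskers, and the outliers
b <- boxplot.stats(s)
b$stats
b$out

# a rough check of skewness: mean above the median suggests right skew
mean(s) - median(s)
```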
Boxplot
> library(ggplot2)
> # in ggplot2 3.3.0 and later, fun.y is deprecated in favour of fun
> ggplot(iris, aes(x=Species,y=Sepal.Length))+geom_boxplot()+
+   theme_bw()+stat_summary(fun.y=mean,geom="point",shape=23,size=4)
[Figure: boxplots of Sepal.Length (y-axis) for each Species (x-axis: setosa, versicolor, virginica), with each group mean marked by a diamond]
Scatter plot matrix: ggpairs
> library(GGally)
> ggpairs(iris, aes(colour = Species, alpha = 0.4))
[Figure: ggpairs plot matrix for Sepal.Length, Sepal.Width, Petal.Length, Petal.Width and Species, coloured by Species; the upper panels report the overall and per-species correlations]
Bar plot
> library(ggplot2)
> ggplot(mpg, aes(x=drv,y=hwy,fill=class))+theme_bw()+
+   geom_bar(stat='identity', position='dodge')
[Figure: dodged bar plot of hwy (y-axis) against drv (x-axis: 4, f, r), filled by class: 2seater, compact, midsize, minivan, pickup, subcompact, suv]
Visualization with factors
Look into iris data set
> library(ggplot2)
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
Faceting with 1 factor
> library(ggplot2)
> ggplot(iris, aes(x=Sepal.Length,y=Petal.Length))+
+   theme_bw()+geom_point()+
+   facet_wrap(~Species,nrow=1)
[Figure: Petal.Length (y-axis) against Sepal.Length (x-axis), one panel per Species: setosa, versicolor, virginica]
Non-faceting with 1 factor
> library(ggplot2)
> ggplot(iris, aes(x=Sepal.Length,y=Petal.Length,
+   shape=Species,colour=Species))+
+   theme_bw()+geom_point()
[Figure: Petal.Length (y-axis) against Sepal.Length (x-axis) in a single panel, with Species shown by point shape and colour]
Faceting with 2 factors
> library(ggplot2)
> head(diamonds)
# A tibble: 6 x 10
  carat cut   color clarity depth table price     x     y     z
  <dbl> <ord> <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 0.23  Ideal E     SI2      61.5    55   326  3.95  3.98  2.43
2 0.21  Prem~ E     SI1      59.8    61   326  3.89  3.84  2.31
3 0.23  Good  E     VS1      56.9    65   327  4.05  4.07  2.31
4 0.290 Prem~ I     VS2      62.4    58   334  4.2   4.23  2.63
5 0.31  Good  J     SI2      63.3    58   335  4.34  4.35  2.75
6 0.24  Very~ J     VVS2     62.8    57   336  3.94  3.96  2.48
Faceting with 2 factors
> library(ggplot2)
> diamondsA = diamonds[diamonds$color %in% c("E","J","G"), ]
> ggplot(diamondsA, aes(x=carat,y=price))+theme_bw()+
+   geom_point(aes(colour=depth))+facet_grid(color~cut)
[Figure: price (y-axis) against carat (x-axis), coloured by depth, in a grid of panels with rows for color (E, G, J) and columns for cut (Fair, Good, Very Good, Premium, Ideal)]
Visualization with 3 factors
> library(ggplot2)
> ggplot(diamondsA, aes(x=carat,y=price))+theme_bw()+
+   geom_point(aes(colour=clarity))+facet_grid(color~cut)
[Figure: price (y-axis) against carat (x-axis), coloured by clarity (I1, SI2, SI1, VS2, VS1, VVS2, VVS1, IF), in a grid of panels with rows for color (E, G, J) and columns for cut (Fair, Good, Very Good, Premium, Ideal)]
Mathematical expressions
Math expressions in R
• Plotmath documentation
• expression and paste commands
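Before applying these commands to ggplot2, it may help to see them in isolation. The sketch below builds plotmath objects with `expression` and `paste`, and uses them in a base-graphics plot. `bquote` is a base R function that is not covered in these notes; it is included as an assumption-labelled extra for inserting a computed value into a math expression.

```r
# expression() stores unevaluated plotmath; nothing is drawn yet
e1 <- expression(paste("carat ", pi["1,m"]))
class(e1)

# square brackets give subscripts, ^ gives superscripts
e2 <- expression(gamma^2)

# bquote() substitutes the current value of m where .(m) appears
m <- 2.5
e3 <- bquote(bar(x) == .(m))

# any text argument of a base plot accepts these objects
plot(1:10, (1:10)^2, xlab = e1, ylab = e2, main = e3)
```

Inside `expression`, names such as `pi`, `gamma` and `lambda` are rendered as Greek letters rather than evaluated as R objects.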
Subset of diamonds data
Use dplyr and piping:
> library(dplyr)
> dB = diamonds %>%
+   filter(color %in% c("E","J","G")) %>%
+   filter(cut %in% c("Ideal","Premium"))
> head(dB)
# A tibble: 6 x 10
  carat cut   color clarity depth table price     x     y     z
  <dbl> <ord> <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 0.23  Ideal E     SI2      61.5    55   326  3.95  3.98  2.43
2 0.21  Prem~ E     SI1      59.8    61   326  3.89  3.84  2.31
3 0.23  Ideal J     VS1      62.8    56   340  3.93  3.9   2.46
4 0.31  Ideal J     SI2      62.2    54   344  4.35  4.37  2.71
5 0.2   Prem~ E     SI2      60.2    62   345  3.79  3.75  2.27
6 0.32  Prem~ E     I1       60.9    58   345  4.38  4.42  2.68
Base layer
> library(ggplot2)
> p1 = ggplot(dB, aes(x=carat,y=price))+theme_bw()+
+   geom_point(aes(colour=depth))+facet_grid(cut~color)
> p1
[Figure: price (y-axis) against carat (x-axis), coloured by depth, in a grid of panels with rows for cut (Premium, Ideal) and columns for color (E, G, J)]
Math symbols in axis titles
Create math expressions for axis titles:
> xs = expression(paste("carat ", pi["1,m"], sep=""))
> ys = expression(paste("price ", gamma^2, sep=""))
> ms = c("Price vs carat")
Math symbols in axis titles
> p2=p1 + ggtitle(ms)+xlab(xs)+ylab(ys)+
+   theme(plot.title = element_text(hjust = 0.5))
> p2
[Figure: the base plot with a centred title and math symbols in the axis titles: x-axis "carat π" with subscript "1,m", y-axis "price γ" with superscript 2; legend: depth]
Math symbols in legend title
> p2 + labs(col = expression(paste("my ",lambda, sep="")))
[Figure: the same plot with the legend title changed to "my λ"]
Subset of diamonds data
> library(dplyr)
> dC = diamonds %>% filter(color %in% c("E","J","G")) %>%
+   filter(cut %in% c("Ideal","Premium")) %>%
+   filter(clarity %in% c("SI1","VS1"))
> head(dC)
# A tibble: 6 x 10
  carat cut   color clarity depth table price     x     y     z
  <dbl> <ord> <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 0.21  Prem~ E     SI1      59.8    61   326  3.89  3.84  2.31
2 0.23  Ideal J     VS1      62.8    56   340  3.93  3.9   2.46
3 0.33  Ideal J     SI1      61.1    56   403  4.49  4.55  2.76
4 0.23  Ideal G     VS1      61.9    54   404  3.93  3.95  2.44
5 0.31  Prem~ G     SI1      61.8    58   553  4.35  4.32  2.68
6 0.7   Ideal E     SI1      62.5    57  2757  5.7   5.72  3.57
Base layer
> library(ggplot2)
> p3 = ggplot(dC, aes(x=carat,y=price))+theme_bw()+
+   geom_point(aes(colour=clarity))+facet_grid(cut~color)
> p3
[Figure: price (y-axis) against carat (x-axis), coloured by clarity (SI1, VS1), in a grid of panels with rows for cut (Premium, Ideal) and columns for color (E, G, J)]
Math symbols in legend labels
> library(ggplot2)
> p3+scale_color_discrete(labels =
+   c(expression(italic(omega)),"Any"))
[Figure: the same plot with the clarity legend labels replaced by "ω" and "Any"]
Math symbols in strip names
Command:
• Both factors
facet_grid(factorA ~ factorB, labeller = label_parsed)
• One factor
facet_grid(factorA ~ factorB,labeller = labeller(factorA=label_parsed))
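`label_parsed` treats each strip label as the text of a plotmath expression. The base R sketch below is only an analogy for that idea, not ggplot2's implementation: it shows how an expression turns into text with `as.character` (which is what happens when expressions are used as factor labels) and how `parse(text = ...)` turns such text back into an expression.

```r
# an expression written as in the notes
e <- expression(paste(pi[0], "=", 0.5, sep = ""))

# as text: this is the kind of string that ends up as a factor level
txt <- as.character(e)
txt

# back from text to an expression that plotmath can draw
reparsed <- parse(text = txt)
class(reparsed)

# labels can also be written directly as text in plotmath syntax
labs_txt <- c("pi[0] == 0.5", "lambda[z] == 0.6", "zeta[0] == 0.7")
labs_expr <- parse(text = labs_txt)
length(labs_expr)
```

A label that is not valid R syntax, such as one containing an unquoted space, cannot be parsed, so levels meant for `label_parsed` should be checked with `parse(text = ...)` first.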
Math symbols in strip names
Create math expressions for levels of factors:
> ColorFStg = c(expression(paste(pi[0],"=", 0.5,sep="")),
+   expression(paste(lambda[z],"=", 0.6,sep="")),
+   expression(paste(zeta[0],"=", 0.7,sep="")))
> dC$colorF = dC$color
> dC$colorF = factor(dC$color, labels =ColorFStg)
> dC[,c(1:4,7,11)] %>% group_by(colorF) %>% slice(1)
# A tibble: 3 x 6
# Groups: colorF [3]
  carat cut     color clarity price colorF
  <dbl> <ord>   <ord> <ord>   <int> <ord>
1  0.21 Premium E     SI1       326 "paste(pi[0], \"=\", 0.5, ~
2  0.23 Ideal   G     VS1       404 "paste(lambda[z], \"=\", 0~
3  0.23 Ideal   J     VS1       340 "paste(zeta[0], \"=\", 0.7~
Math symbols in strip names
> ggplot(dC, aes(x=carat,y=price))+theme_bw()+
+   geom_point(aes(colour=clarity))+
+   facet_grid(cut~colorF,labeller = label_parsed)
[Figure: price (y-axis) against carat (x-axis), coloured by clarity (SI1, VS1), with rows for cut (Premium, Ideal) and column strips labelled with the math expressions π0=0.5, λz=0.6, ζ0=0.7]
Math symbols in plot
> ggplot(dC, aes(x=carat,y=price))+theme_bw()+
+   geom_line(aes(linetype=clarity))+
+   facet_grid(cut~colorF,labeller = label_parsed)
[Figure: line plot of price (y-axis) against carat (x-axis), with line type by clarity (SI1, VS1), rows for cut (Premium, Ideal) and column strips labelled π0=0.5, λz=0.6, ζ0=0.7]
Other ggplot2 tweaks
Not covered
The following have not been covered:
• some statistical transforms: stat_XXX
• lines, shapes for x-y plot: geom_XXX
• axis, legend and strip adjustment: theme
• figure margin adjustment: margin
Information on these can be found in the ggplot2 book or at https://stackoverflow.com
License and session Information
License
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base

other attached packages:
[1] bindrcpp_0.2.2 dplyr_0.7.8 GGally_1.4.0 ggplot2_3.1.0
[5] knitr_1.21

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0 RColorBrewer_1.1-2 pillar_1.3.1
 [4] compiler_3.5.0 plyr_1.8.4 highr_0.7
 [7] bindr_0.1.1 tools_3.5.0 digest_0.6.18
[10] viridisLite_0.3.0 evaluate_0.12 tibble_1.4.2
[13] gtable_0.2.0 pkgconfig_2.0.2 rlang_0.3.0.1
[16] cli_1.0.1 rstudioapi_0.8 yaml_2.2.0
[19] xfun_0.4 withr_2.1.2 stringr_1.3.1
[22] grid_3.5.0 tidyselect_0.2.5 reshape_0.8.8
[25] glue_1.3.0 R6_2.3.0 fansi_0.4.0
[28] rmarkdown_1.11 reshape2_1.4.3 purrr_0.2.5
[31] magrittr_1.5 scales_1.0.0 codetools_0.2-15
[34] htmltools_0.3.6 assertthat_0.2.0 colorspace_1.3-2
[37] labeling_0.3 utf8_1.1.4 stringi_1.2.4
[40] lazyeval_0.2.1 munsell_0.5.0 crayon_1.3.4