The fourth lecture of intro to R programming
Lecture 4
Xiaotong Suo
Homework 1
Question 3
Today's agenda
Data input/output
Graphics
Data input/output
R can write matrices and data frames to a file using the function write.table, and read data from a file using read.table.
If you have a tab-delimited file, use the function read.delim instead. If the file is comma-separated, use read.csv.
    Year,Student,Major
    2009, John Doe,Statistics
    2009, Bart Simpson, Mathematics I
The above is an example of a comma-separated file. A tab-delimited file is the same except that tabs are used as the separator.
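As a minimal sketch of the difference between the two formats, the snippet below parses a small comma-separated sample and its tab-delimited equivalent. The sample is passed through the text argument so that no file is needed; the object names (csv_text, students, students_tab) are illustrative and not from the lecture.

```r
# Illustrative sample data (hypothetical, modeled on the slide's example).
csv_text <- "Year,Student,Major
2009,John Doe,Statistics
2009,Bart Simpson,Mathematics"

# read.csv assumes a header row and a comma separator.
students <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Same data with tabs instead of commas, read with read.delim.
tab_text <- gsub(",", "\t", csv_text)
students_tab <- read.delim(text = tab_text, stringsAsFactors = FALSE)

print(students)
print(all.equal(students, students_tab))
```

Both calls are thin wrappers around read.table with different defaults for sep and header, which is why the two results should agree.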
Data input/output continued
The data set airquality is available in R and gives weather measurements in New York City over some period of time. Load that data set in a data frame and save it to a file.
Data input/output continued
    dt = airquality
    write.table(dt, "Airquality.dt", col.names = TRUE, row.names = FALSE, sep = " ", na = "Missing")
You could also use write.csv. See the help documentation for details.
Data input/output continued
Things to keep in mind when reading from or writing to a file:
Header: whether the file has a first row giving the names of the variables.
Separator: which field separator is used: space, comma, or tab.
Missing data character string: which character strings represent missing data.
Factors: do you want to allow R to convert character variables to factors? Use the options stringsAsFactors and as.is.
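The points above map directly onto arguments of read.table. The following is a small sketch using made-up data (the semicolon separator, the "n/a" missing-value string, and the column names are assumptions chosen for illustration):

```r
# Hypothetical input: header row, ";" as separator, "n/a" marking missing data.
txt <- "name;score\nann;3.5\nbob;n/a"

# Keep character columns as character vectors.
as_chr <- read.table(text = txt, header = TRUE, sep = ";",
                     na.strings = "n/a", stringsAsFactors = FALSE)

# Convert character columns to factors instead.
as_fct <- read.table(text = txt, header = TRUE, sep = ";",
                     na.strings = "n/a", stringsAsFactors = TRUE)

print(class(as_chr$name))
print(class(as_fct$name))
print(is.na(as_chr$score))
```

Because "n/a" is declared as a missing-value string, the score column can still be read as numeric rather than as text.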
Data input/output continued
The general syntax of read.table:
    mydata = read.table("filename.dat", header = FALSE, sep = " ", dec = ".", col.names = c("V1", "V2"), na.strings = "NA")
Data input/output continued
Let's try it with the file just saved.
    dtNew = read.table("Airquality.dt", header = TRUE, sep = " ", dec = ".", na.strings = "Missing")
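One way to check that writing and reading back preserve the data is a round trip through a temporary file. This sketch uses tempfile() instead of a fixed file name so that it does not leave a file in the working directory; that choice, and the name path, are assumptions for illustration.

```r
# Write airquality to a temporary file, marking missing values as "Missing".
path <- tempfile(fileext = ".dt")
write.table(airquality, path, col.names = TRUE, row.names = FALSE,
            sep = " ", na = "Missing")

# Read it back, telling R that "Missing" stands for NA.
dtNew <- read.table(path, header = TRUE, sep = " ", dec = ".",
                    na.strings = "Missing")

# The dimensions and the number of missing Ozone values should match.
print(dim(dtNew))
print(sum(is.na(dtNew$Ozone)) == sum(is.na(airquality$Ozone)))

unlink(path)  # remove the temporary file
```

If na.strings were omitted when reading, the columns containing "Missing" would be read as text rather than numbers.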
Data input/output continued
As mentioned earlier, if you have a tab-delimited file, use the function read.delim instead. If the file is comma-separated, use read.csv.
Another function to read text data is read.fwf, which works with fixed-width text data. See the user manual for more detail.
Yet another function to read data from a file is scan. It is more efficient when reading data of a single mode. See the user manual.
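A brief sketch of both functions, using two small temporary files whose contents are invented for illustration (the field widths and column names are assumptions):

```r
# scan: read a file that contains values of a single mode (here, numbers).
num_path <- tempfile()
writeLines(c("1 2 3", "4 5 6"), num_path)
values <- scan(num_path, what = numeric(), quiet = TRUE)
print(values)

# read.fwf: read fixed-width fields (4 characters, then 2 characters).
fwf_path <- tempfile()
writeLines(c("2009AB", "2010CD"), fwf_path)
fixed <- read.fwf(fwf_path, widths = c(4, 2), col.names = c("year", "code"))
print(fixed)

unlink(c(num_path, fwf_path))  # clean up the temporary files
```

Note that scan returns a plain vector, not a data frame, so any row and column structure has to be rebuilt afterwards (for example with matrix()).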
Data input/output
Exercise: The file Earmarksbymember08.xls is an Excel file available in coursework. Load this file in R.
Graphics
R has powerful graphical capabilities.
To plot a graph you need a graphical device. If you launch your plot right away, R will automatically create a graphical device for you.
On macOS, use the function quartz() to create a graphical device.
On Windows systems, use windows().
A graphical device can also be a file. Your graphs are then sent to that file. Use the functions pdf() or postscript().
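As a small sketch of a file device, the code below sends one plot to a PDF file. The temporary file name and the page size are assumptions chosen for illustration; the important step is dev.off(), which closes the device so that the file is actually completed.

```r
# Open a PDF file as the graphical device (size in inches).
out <- tempfile(fileext = ".pdf")
pdf(out, width = 6, height = 4)

# Any plotting commands issued now are drawn into the file.
plot(airquality$Temp, type = "l", ylab = "Temp")

# Close the device; until this is called the file may be incomplete.
dev.off()

print(file.exists(out))
```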
Graphics continued
Example: the airquality data set.
    dt = airquality
    names(dt)
    boxplot(dt$Temp)
    plot(dt$Temp, type = "l")
    plot(dt$Temp, dt$Wind, type = "p")
    plot(dt$Temp, dt$Wind, type = "p", xlab = "Temperature", ylab = "Wind", main = "Wind vs Temp. in NY city May-Sept. 73")
Graphics
Continuing with the airquality data set, suppose we want a boxplot of the data for each month.
    dt$Month = as.factor(dt$Month)
    boxplot(Temp ~ Month, data = dt, names = c("May", "June", "July", "August", "Sept."))
Graphics
What if we want to have multiple graphics on the same graphical device? There are many ways to do this.
One simple possibility is layout.
Graphics
Example: the airquality data set.
    m = matrix(c(1, 2), ncol = 2)
    layout(m)
    layout.show(2)
    boxplot(dt$Temp, main = "Boxplot")
    plot(dt$Temp, type = "l", main = "Time series plot")
Graphics
Example: the airquality data set.
    m = matrix(c(1, 3, 2, 3), 2, 2)
    layout(m)
    layout.show(3)
    boxplot(dt$Temp, main = "Boxplot Temp. in NY city")
    plot(dt$Temp, type = "l", main = "Temp. in NY city")
    plot(dt$Temp, dt$Wind, type = "p", xlab = "Temp", ylab = "Wind", main = "xyplot")
Graphics continued
What if we want to put multiple graphs on the same plot?
Issue par(new = TRUE) first, before the next plotting command.
Graphics continued
A few plotting functions in R:
plot(x): plot the values of vector x.
plot(x, y): bivariate plot of y as a function of x.
boxplot(x): box-and-whiskers plot.
hist(x): produce a histogram of x.
... and many others. See the R manual by typing help.start().
Graphics continued
Example:
    n = 10000; X = rnorm(n)
    hist(X, breaks = 200, prob = TRUE, col = "blue", xlim = c(-4, 4), ylim = c(0, 0.4))
    par(new = TRUE)
    curve(dnorm, xlim = c(-4, 4), ylim = c(0, 0.4), lwd = 2, col = "red", xlab = "", ylab = "")
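With par(new = TRUE), the two plots only line up because both calls are given the same xlim and ylim. As an alternative sketch (not the approach used in the lecture), curve() accepts add = TRUE, which draws on the existing plot and reuses its axes. The seed and the temporary PDF device are assumptions added so that the example is reproducible and runs without a screen.

```r
set.seed(1)                       # assumption: fixed seed for reproducibility
X <- rnorm(10000)

out <- tempfile(fileext = ".pdf") # draw into a file so no screen is needed
pdf(out)

# Histogram on the density scale (freq = FALSE), as in the lecture example.
h <- hist(X, breaks = 200, freq = FALSE, col = "blue",
          xlim = c(-4, 4), ylim = c(0, 0.4))

# Overlay the standard normal density on the same axes.
curve(dnorm(x), from = -4, to = 4, add = TRUE, lwd = 2, col = "red")

dev.off()
```

The functions lines() and points() work the same way for adding data series to an existing plot.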
Graphics
Example:
    X = rnorm(100); Y = rnorm(100)
    m = matrix(c(1, 2), ncol = 2)
    layout(m)
    plot(X, Y)
    plot(X, Y, xlab = "100 Normal rvs", ylab = "100 Normal rvs", col = "blue", pch = 4, main = "Example of plot in R")
Graphics continued
Exercise: The Californian freeway performance measurement system. The data is flow-occ-table.txt in coursework. Download the file to your computer and load it in R using read.table. Practice with the following code.
Graphics continued
    dt = read.table("flow-occ-table.txt", header = TRUE, sep = ",")
    names(dt)
    Ind = complete.cases(dt)
    sum(Ind); length(dt[, 1])
    attach(dt)
Graphics con7nued
m=matrix(c(1,5,2,5,3,5,4,5),ncol=4) layout(m) boxplot(Flow1,Flow2,Flow3,names=c(Flow1,Flow2,Flow3) main=Boxplots ows) boxplot(Occ1,Occ2,Occ3,names=c(Flow1,Flow2,Flow3), main=Boxplots Occup.)
plot(Occ2,Flow2,type=p,col=blue, main=Flow vs Occup. for Lane 2)
plot(Occ3,Flow3,type=p,col=red, main=Flow vs Occup. for Lane 3)
Graphics continued
    plot(Occ1, type = "l", xlim = c(0, 1700), ylim = c(0, 0.5), col = "green")
    par(new = TRUE)
    plot(Occ2, type = "l", xlim = c(0, 1700), ylim = c(0, 0.5), col = "blue")
    par(new = TRUE)
    plot(Occ3, type = "l", xlim = c(0, 1700), ylim = c(0, 0.5), col = "red", main = "Occup. for Lane 1, 2 and 3")
Graphics
    legend(x = "top", legend = c("Lane 1", "Lane 2", "Lane 3"), col = c("green", "blue", "red"), lty = c(1, 1, 1))
ggplot2
http://cran.r-project.org/web/packages/ggplot2/index.html
Produces much nicer plots. Install the package in R first, then type library(ggplot2).
Control structures
So far we have learned some of the basic aspects of R: working with its basic objects, input/output, and graphics. Here, we learn the more general task of writing computer programs using R.
Control structures continued
An important component of a programming language is its control structures, which are used to implement repetitive tasks.
The R programming language has control structures similar to those of C.
For loops
Loops are used to carry out a sequence of related operations without having to write the code for each step explicitly. For instance, suppose we want to calculate:
$$\sum_{i=1}^{10} i$$
For loops continued
    x = 0
    for (i in 1:10) {
      x = x + i
    }
For loops
In the above program, x is an accumulator variable, meaning that its value is repeatedly updated while the program runs.
Always remember to initialize accumulator variables (to zero in the example).
To clarify, we can add a print statement inside the loop body.
    x = 0
    for (i in 1:10) {
      x = x + i
      print(c(i, x))
    }
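As a quick sanity check on the accumulator loop, the result can be compared with the built-in sum() and with the closed form n(n+1)/2. This is only a sketch to confirm the loop does what is intended; the variable n is introduced here for illustration.

```r
n <- 10

# Accumulator loop: x starts at zero and grows by i at each step.
x <- 0
for (i in 1:n) {
  x <- x + i
}

# Compare with the vectorized sum and the closed-form expression.
print(x)
print(x == sum(1:n))
print(x == n * (n + 1) / 2)
```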
For loops
The general structure of for loops:
    for (var in seq) expr
or
    for (var in seq) {
      expr
    }
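In this structure, seq does not have to be a run of integers: the loop variable takes each element of any vector in turn. The sketch below shows both forms; the names squares, k, and name are illustrative.

```r
# Single-expression form: fill a preallocated vector with squares.
squares <- numeric(5)
for (k in seq_along(squares)) squares[k] <- k^2
print(squares)

# Braced form, looping over a character vector of column names.
for (name in c("Temp", "Wind")) {
  avg <- mean(airquality[[name]])
  cat(name, "mean:", avg, "\n")
}
```

Preallocating the result vector (numeric(5)) before the loop avoids growing it one element at a time.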
For loops continued
Exercise: Given a matrix A, write a for loop that calculates the sum of each row of A.
For loops continued
This is an example of a trivial for loop. There is rarely a need to write such loops in R because it provides a simple class of functions to do just that: the apply functions.
Often the apply functions even lead to faster code (but not always).
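To make the comparison concrete, here is one possible sketch of the row-sum task written three ways: an explicit loop, apply, and the specialized rowSums. The matrix A and the name row_totals are made up for illustration.

```r
A <- matrix(1:6, nrow = 2)   # rows are (1, 3, 5) and (2, 4, 6)

# Explicit for loop over the rows.
row_totals <- numeric(nrow(A))
for (r in seq_len(nrow(A))) {
  row_totals[r] <- sum(A[r, ])
}

# apply: the second argument 1 means "apply the function over rows".
by_apply <- apply(A, 1, sum)

# rowSums: a dedicated function for this specific task.
by_rowsums <- rowSums(A)

print(row_totals)
print(by_apply)
print(by_rowsums)
```

All three give the same totals; the apply and rowSums versions are shorter and state the intent more directly.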
Next lecture
More control structures
R in Statistics (linear regression, etc.)