Mathematics 101: Elementary Statistics
Mathematics 101: Introduction to R
Olive R. Cawiding
Department of Mathematics and Computer Science, College of Science, University of the Philippines, Gov. Pack Road, Baguio City 2600, Philippines
What is R?
• The R project was started by Robert Gentleman and Ross Ihaka of the Statistics Department of the University of Auckland in 1995.
• R is a computer programming language. Programmers will find it more familiar than most statistical packages, and for new computer users the step up to programming will not be so large.
• R is an open-source statistical environment.
Advantages of R
• R is free.
• It has excellent graphing capabilities.
• It has easy-to-learn syntax with many built-in statistical functions.
Special Symbols
> is called the prompt. It indicates where you are to type. If a command is too long to fit on one line, a + is shown as the continuation prompt.
# is used for comments.
Introduction to R
R works as a calculator. The elementary arithmetic operations are the usual +, -, *, /, and ^.
In addition, all of the common arithmetic functions are available, such as log, exp, sin, cos, tan, sqrt, and so on.
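As a small illustration of the calculator behaviour described above, the following sketch evaluates a few expressions. The variable names are chosen here only for illustration and do not come from the slides.

```r
# Operator precedence follows the usual rules of arithmetic
total   = 2 + 3 * 4      # multiplication first: 14
grouped = (2 + 3) * 4    # parentheses change the order: 20
power   = 2^10           # exponentiation: 1024
half    = 7 / 2          # ordinary division: 3.5

# Common functions
root  = sqrt(16)         # 4
one   = log(exp(1))      # log() is the natural logarithm, so this is 1
crest = sin(pi / 2)      # trigonometric functions take radians; pi is built in
```

Typing a name such as total at the prompt prints its value.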
Entering Data with c
The most useful R command for quickly entering small data sets is the c function, which combines its arguments into a vector.
Example. The grades of six Math 101 students are 80, 85, 88, 95, 95, and 98.
> grades = c(80, 85, 88, 95, 95, 98)
> assign("grades", c(80, 85, 88, 95, 95, 98)) # equivalent to the line above
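The c function also accepts existing vectors, which gives a convenient way to extend a data set. In this sketch the vector late and its two values are hypothetical and serve only as an illustration.

```r
grades = c(80, 85, 88, 95, 95, 98)

late = c(72, 91)                # hypothetical grades that arrived later
all_grades = c(grades, late)    # c() joins the two vectors end to end

all_grades                      # 80 85 88 95 95 98 72 91
length(all_grades)              # 8
```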
Accessing Data
> grades[2] # prints out the second entry of the variable
[1] 85
> grades[-4] # prints out all entries except the fourth one
[1] 80 85 88 95 98
> grades[c(1,2,3)] # prints out the first, second, and third entries
[1] 80 85 88
> grades[grades>90] # prints out all grades greater than 90
[1] 95 95 98
> grades[grades<88 | grades>95] # prints out grades less than 88 or greater than 95
[1] 80 85 98
> grades==95 # tests each entry for equality with 95
[1] FALSE FALSE FALSE  TRUE  TRUE FALSE
> which(grades==95) # positions of the entries equal to 95
[1] 4 5
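The comparisons above produce logical vectors, and these can be stored, counted, and combined. The sketch below uses the same grades. The names high and middle are introduced here only for illustration.

```r
grades = c(80, 85, 88, 95, 95, 98)

high = grades > 90         # FALSE FALSE FALSE TRUE TRUE TRUE
grades[high]               # 95 95 98
sum(high)                  # TRUE counts as 1, so this is the number above 90: 3
mean(high)                 # proportion of grades above 90: 0.5

# & means "and", just as | means "or"
middle = grades[grades >= 85 & grades <= 95]   # 85 88 95 95
```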
Applying Functions
> mean(grades)
[1] 90.16667
> median(grades)
[1] 91.5
> max(grades)
[1] 98
> min(grades)
[1] 80
> range(grades)
[1] 80 98
> length(grades)
[1] 6
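Measures of spread are applied in the same way as the functions above. This sketch computes the sample variance from its definition and compares it with the built-in var function. The names n and by_hand are introduced here only for illustration.

```r
grades = c(80, 85, 88, 95, 95, 98)

sum(grades)                     # 541
n = length(grades)

# Sample variance from the definition: squared deviations divided by n - 1
by_hand = sum((grades - mean(grades))^2) / (n - 1)

var(grades)                     # agrees with by_hand
sd(grades)                      # standard deviation, the square root of the variance
summary(grades)                 # minimum, quartiles, median, mean, and maximum
```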
Editing Data
> grades[6]=99
> grades
[1] 80 85 88 95 95 99
> grades[7]=65
> grades
[1] 80 85 88 95 95 99 65
> grades[8:10]=c(75,78,79)
> grades
[1] 80 85 88 95 95 99 65 75 78 79
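Entries can also be removed or inserted, not just overwritten. The following sketch starts from a seven-entry version of the grades vector. The inserted values 70 and 60 are hypothetical.

```r
grades = c(80, 85, 88, 95, 95, 99, 65)

grades = grades[-7]                      # drop the seventh entry
grades = append(grades, 70, after = 2)   # insert 70 after the second entry
grades                                   # 80 85 70 88 95 95 99

grades[10] = 60                          # assigning past the end fills the gap with NA
grades                                   # 80 85 70 88 95 95 99 NA NA 60
```

Missing values (NA) affect later calculations. For example, mean(grades) returns NA unless na.rm = TRUE is supplied.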
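> data.entry(grades) # opens a spreadsheet-like editor; changes are saved to grades
> grades = de(grades) # same editor, but the result must be assigned
> grades = edit(grades) # opens grades in an editor and returns the edited copy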
Entering Categorical Data
A survey asks people if they smoke or not. Five responses are as follows:
Yes, No, No, Yes, No
> x=c("Yes", "No", "No", "Yes", "No")
> x
[1] "Yes" "No" "No" "Yes" "No"
Categorical Data
> table(x)
x
No Yes
3 2
> factor(x)
[1] Yes No No Yes No
Levels: No Yes
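Counts of categorical data are often reported as proportions, and the order of the levels can be set explicitly instead of alphabetically. This is a minimal sketch with the same five responses. The names counts, props, and xf are introduced here only for illustration.

```r
x = c("Yes", "No", "No", "Yes", "No")

counts = table(x)              # No 3, Yes 2
props = prop.table(counts)     # No 0.6, Yes 0.4

# Choose the level order instead of the default alphabetical order
xf = factor(x, levels = c("Yes", "No"))
levels(xf)                     # "Yes" "No"
table(xf)                      # Yes 2, No 3
```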
Sorting Data
Suppose we have a sample of 30 tax accountants from all the states and territories of Australia and their individual state of origin is specified by a character vector of state mnemonics as
> state = c("tas", "sa", "qld", "nsw", "nsw", "nt",
"wa", "wa","qld", "vic", "nsw", "vic", "qld", "qld",
"sa", "tas", "sa", "nt", "wa", "vic", "qld", "nsw",
"nsw", "wa", "sa", "act", "nsw", "vic", "vic", "act")
> statef = factor(state)
> levels(statef)
[1] "act" "nsw" "nt"  "qld" "sa"  "tas" "vic" "wa"
Suppose we have the incomes of the same tax accountants in another vector (in suitably large units of money).
> incomes = c(60, 49, 40, 61, 64, 60, 59, 54, 62, 69,
70, 42, 56, 61, 61, 61, 58, 51, 48, 65, 49, 49, 41,
48, 52, 46, 59, 46, 58, 43)
To calculate the sample mean income for each state we can now use the special function tapply():
> incmeans = tapply(incomes, statef, mean)
> incmeans
   act    nsw     nt    qld     sa    tas    vic     wa
44.500 57.333 55.500 53.600 55.000 60.500 56.000 52.250
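The third argument of tapply can be any function that summarizes a vector, not only mean. The sketch below repeats the two vectors so that it runs on its own, then computes the number of accountants and the largest income in each state. The names counts and maxes are introduced here only for illustration.

```r
state = c("tas", "sa", "qld", "nsw", "nsw", "nt", "wa", "wa", "qld", "vic",
          "nsw", "vic", "qld", "qld", "sa", "tas", "sa", "nt", "wa", "vic",
          "qld", "nsw", "nsw", "wa", "sa", "act", "nsw", "vic", "vic", "act")
incomes = c(60, 49, 40, 61, 64, 60, 59, 54, 62, 69,
            70, 42, 56, 61, 61, 61, 58, 51, 48, 65,
            49, 49, 41, 48, 52, 46, 59, 46, 58, 43)
statef = factor(state)

counts = tapply(incomes, statef, length)   # sample size per state
maxes  = tapply(incomes, statef, max)      # largest income per state

counts["nsw"]    # 6 accountants from nsw
maxes["act"]     # 46
```

Checking the group sizes first is useful because a mean based on only two observations, as for act, is much less stable than one based on six.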
Generating Sequences
> s = seq(-5, 5, by=.2)
> s
 [1] -5.0 -4.8 -4.6 -4.4 -4.2 -4.0 -3.8 -3.6 -3.4 -3.2 -3.0 -2.8 -2.6 -2.4 -2.2
[16] -2.0 -1.8 -1.6 -1.4 -1.2 -1.0 -0.8 -0.6 -0.4 -0.2  0.0  0.2  0.4  0.6  0.8
[31]  1.0  1.2  1.4  1.6  1.8  2.0  2.2  2.4  2.6  2.8  3.0  3.2  3.4  3.6  3.8
[46]  4.0  4.2  4.4  4.6  4.8  5.0
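A few other common ways of generating regular sequences are sketched here. The variable names are chosen only for illustration.

```r
first_ten = 1:10                        # the colon operator counts in steps of 1
evens = seq(2, 20, by = 2)              # 2 4 6 ... 20
grid = seq(0, 1, length.out = 5)        # five equally spaced values: 0 0.25 0.5 0.75 1
pattern = rep(c(1, 2), times = 3)       # repeat a vector: 1 2 1 2 1 2

length(seq(-5, 5, by = 0.2))            # 51 values from -5 to 5
```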
Data Sets in R
> data() # lists the available data sets
Some examples of data sets in R are
AirPassengers
trees
Orange
cars
faithful
> unclass(AirPassengers) # prints the values with the time-series class removed
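Before working with a built-in data set it helps to look at its size and structure. This sketch inspects three of the data sets listed above using standard functions. It assumes the default datasets package is attached, as it is in a normal R session.

```r
dim(trees)              # 31 rows and 3 columns
names(trees)            # "Girth" "Height" "Volume"
head(cars, 3)           # first three rows of cars (columns speed and dist)
mean(cars$speed)        # $ extracts one column of a data frame

class(AirPassengers)    # "ts": a time series rather than a plain vector
length(AirPassengers)   # 144 monthly values
```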
Input from User
Suppose a group of 25 people is surveyed on the number of beer bottles they can finish in one night. These are the results.
3, 4, 1, 1, 3, 4, 3, 3, 1, 3, 2, 1, 2, 1, 2, 3, 2, 3, 1, 1, 1, 1, 4, 3, 1
> beer=scan()
1: 3 4 1 1 3 4 3 3 1 3 2 ...
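Called with no arguments, scan waits for values typed at the keyboard. In a script, where nothing can be typed, one option is to pass the values through the text argument of scan, as in this sketch with the same 25 survey results.

```r
# Whitespace-separated values supplied as a string instead of typed input
beer = scan(text = "3 4 1 1 3 4 3 3 1 3 2 1 2 1 2 3 2 3 1 1 1 1 4 3 1")
length(beer)            # 25

# Comma-separated values need the sep argument
beer2 = scan(text = "3,4,1,1,3", sep = ",")
beer2                   # 3 4 1 1 3
```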
Tables
> table(beer)
beer
1 2 3 4
10 4 8 3
Bar Charts
> barplot(table(beer))
> barplot(table(beer)/length(beer))
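Dividing the table by length(beer) converts counts to relative frequencies. The built-in prop.table function does the same thing, as this sketch shows. The beer vector is entered with c so that the example runs on its own, and the name rel is introduced here only for illustration.

```r
beer = c(3, 4, 1, 1, 3, 4, 3, 3, 1, 3, 2, 1, 2, 1, 2, 3, 2, 3, 1, 1, 1, 1, 4, 3, 1)

rel = prop.table(table(beer))   # same values as table(beer)/length(beer)
rel                             # 0.40 0.16 0.32 0.12
sum(rel)                        # relative frequencies always add up to 1
```

Passing rel to barplot, for example barplot(rel, xlab = "Bottles", ylab = "Proportion"), draws the same chart with labelled axes.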
Pie Charts
> beer.counts=table(beer)
> pie(beer.counts)
> names(beer.counts)=c("1 bottle", "2 bottles",
"3 bottles", "4 bottles")
> pie(beer.counts)
> pie(beer.counts,col=c("purple","green2","cyan",
"white"))
Stem-and-Leaf Display
> data()
> AirPassengers
> stem(AirPassengers)
The decimal point is 2 digit(s) to the right of the |
1 | 011222223333344444
1 | 55555566677777788888889999
2 | 0000000112333333344444444
2 | 6667777778889
3 | 00111111222224444
3 | 5555666666666799
4 | 00011111222234
4 | 66677779
5 | 114
5 | 56
6 | 12
Histograms
> x=scan()
1: 29.6 28.2 19.6 13.7 13.0 7.8 3.4 2.0 1.9 1.0 0.7 0.4 0.4 0.3 0.3
16: 0.3 0.3 0.3 0.2 0.2 0.2 0.1 0.1 0.1 0.1 0.1
27:
Read 26 items
> hist(x)
> hist(x, probability=TRUE)
> rug(jitter(x))
> hist(x,breaks=10)
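The class boundaries and counts behind a histogram can be inspected without drawing it by setting plot = FALSE. In this sketch the data are entered with c so that the example runs on its own, and the boundaries 0, 5, ..., 30 are an assumption made for illustration, not a choice stated in the slides.

```r
x = c(29.6, 28.2, 19.6, 13.7, 13.0, 7.8, 3.4, 2.0, 1.9, 1.0, 0.7, 0.4, 0.4,
      0.3, 0.3, 0.3, 0.3, 0.3, 0.2, 0.2, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1)

# Compute the histogram with explicit class boundaries, but draw nothing
h = hist(x, breaks = seq(0, 30, by = 5), plot = FALSE)

h$breaks          # 0 5 10 15 20 25 30
h$counts          # 20 1 2 1 0 2
sum(h$counts)     # 26, the number of observations
```

When breaks is a single number, as in hist(x, breaks=10), R treats it only as a suggestion and may choose a different number of classes.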
Constructing Frequency Distribution Tables with R
> Math101 = c(50,50,50,50,50,50,51,52,53,53)
> Math101[11:20] = c(57,59,59,60,60,60,62,62,62,62)
> Math101[21:30] = c(63,65,66,66,68,68,68,68,68,69)
> Math101[31:40] = c(69,69,69,69,70,71,71,71,71,72)
> Math101[41:50] = c(74,75,75,75,75,75,76,76,76,76)
> Math101[51:60] = c(77,77,77,77,78,79,79,79,79,79)
> Math101[61:70] = c(80,80,80,81,81,81,81,82,82,82)
> Math101[71:80] = c(82,82,82,83,83,84,84,84,84,84)
> Math101[81:90] = c(84,84,85,85,86,86,87,87,87,87)
> Math101[91:100] = c(87,87,88,89,89,91,92,94,94,96)
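> cbs = seq(49.5, 99.5, by=5) # class boundaries
> scores = cut(Math101, cbs) # assigns each grade to its class interval
> table(scores) # frequency of each class
> transform(table(scores)) # the same counts as a data frame
> freq_table = transform(table(scores))
> transform(freq_table, RF=prop.table(Freq), CF=cumsum(Freq)) # adds relative and cumulative frequencies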
        scores Freq   RF  CF
1  (49.5,54.5]   10 0.10  10
2  (54.5,59.5]    3 0.03  13
3  (59.5,64.5]    8 0.08  21
4  (64.5,69.5]   13 0.13  34
5  (69.5,74.5]    7 0.07  41
6  (74.5,79.5]   19 0.19  60
7  (79.5,84.5]   22 0.22  82
8  (84.5,89.5]   13 0.13  95
9  (89.5,94.5]    4 0.04  99
10 (94.5,99.5]    1 0.01 100
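The same steps can be checked by hand on a smaller data set. The sketch below uses ten hypothetical scores and a class width of 10, both chosen only for illustration. It also adds a relative cumulative frequency column (RCF) that is not part of the table above, and it uses as.data.frame to turn the table into a data frame.

```r
small = c(52, 57, 61, 63, 68, 71, 74, 75, 78, 83)   # hypothetical scores
bounds = seq(49.5, 89.5, by = 10)                   # class boundaries
classes = cut(small, bounds)                        # class interval of each score

ft = as.data.frame(table(classes))                  # columns: classes, Freq
ft = transform(ft,
               RF  = prop.table(Freq),              # relative frequency
               CF  = cumsum(Freq),                  # cumulative frequency
               RCF = cumsum(Freq) / sum(Freq))      # relative cumulative frequency
ft
```

The last value of CF equals the number of observations and the last value of RCF equals 1, which gives a quick check that no score fell outside the class boundaries.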
> hist(Math101, main="Final Grades", xlab="Grades",
ylab="Frequency", col="red", xaxt='n')
> axis(1, cbs, labels = TRUE)
References
W. N. Venables, D. M. Smith and the R Core Team. An Introduction to R. Notes on R: A Programming Environment for Data Analysis and Graphics. Version 3.5.1 (2018-07-02).