
Mathematics 101: Elementary Statistics

Mathematics 101: Introduction to R

Olive R. Cawiding

Department of Mathematics and Computer Science, College of Science, University of the Philippines, Gov. Pack Road, Baguio City 2600, Philippines

[email protected]


What is R?

• The R project was started by Robert Gentleman and Ross Ihaka of the Statistics Department of the University of Auckland in 1995.

• R is a computer programming language. For programmers, it will feel more familiar than other statistical packages; for new computer users, the next step to programming will not be so large.

• R is an open-source statistical environment.


Advantages of R

• R is free.

• It has excellent graphing capabilities.

• It has an easy-to-learn syntax with many built-in statistical functions.


Special Symbols

> is called the prompt. It indicates where you are to type. If a command is too long to fit on one line, a + is shown as the continuation prompt.

# is used for comments.


Introduction to R

R works as a calculator. The elementary arithmetic operations are the usual +, -, *, /, and ^.

In addition, all of the common arithmetic functions are available, such as log, exp, sin, cos, tan, sqrt, and so on.
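The calculator behaviour described above can be sketched as follows. The particular expressions and variable names are illustrative assumptions, not taken from the original slides.

```r
# Arithmetic follows the usual operator precedence.
a = 2 + 3 * 4        # multiplication before addition: 14
b = (2 + 3) * 4      # parentheses change the order: 20
p = 2^10             # exponentiation: 1024
r = sqrt(16)         # square root: 4
l = log(exp(1))      # log() is the natural logarithm, so this is 1
print(c(a, b, p, r, l))
```

Note that log() uses base e by default; log10() and log2() are available for the other common bases.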


Entering Data with c

The most useful R command for quickly entering small data sets is the c function, which combines its arguments into a vector.

Example. The grades of six Math 101 students are 80, 85, 88, 95, 95, and 98.

> grades = c(80, 85, 88, 95, 95, 98)

> assign("grades", c(80, 85, 88, 95, 95, 98)) # equivalent form using assign


Accessing Data

> grades[2] # prints out second entry of variable

[1] 85

> grades[-4] # prints out all entries except the fourth one

[1] 80 85 88 95 98

> grades[c(1,2,3)] # prints out first, second, and third entry

[1] 80 85 88


Accessing Data

> grades[grades>90] # prints out all grades greater than 90

[1] 95 95 98

> grades[grades<88 | grades>95] # prints out grades less than 88 or greater than 95

[1] 80 85 98

> grades==95 # TRUE where the grade equals 95

[1] FALSE FALSE FALSE TRUE TRUE FALSE

> which(grades==95) # positions of the entries equal to 95

[1] 4 5
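The logical comparisons above return TRUE/FALSE vectors, which can be counted or combined. A minimal sketch, reusing the grades vector from the slides; the names high, n_high, middle, and first_high are introduced here only for illustration.

```r
grades = c(80, 85, 88, 95, 95, 98)

high = grades > 90                            # logical vector, one entry per grade
n_high = sum(high)                            # TRUE counts as 1, so this counts grades above 90
middle = grades[grades >= 85 & grades <= 95]  # & requires both conditions to hold
first_high = which(high)[1]                   # position of the first grade above 90

print(n_high)
print(middle)
print(first_high)
```

Here | means "or" and & means "and", applied entry by entry.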


Applying Functions

> mean(grades)

[1] 90.16667

> median(grades)

[1] 91.5

> max(grades)

[1] 98

> min(grades)

[1] 80

> range(grades)

[1] 80 98

> length(grades)

[1] 6
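A few more built-in functions work the same way on a whole vector. This is a short sketch; sum, sort, diff, sd, var, and summary are standard R functions, but they are not among those listed in the original slides.

```r
grades = c(80, 85, 88, 95, 95, 98)

total = sum(grades)                       # 541
avg = total / length(grades)              # the mean computed by hand
sorted = sort(grades, decreasing = TRUE)  # largest grade first
spread = diff(range(grades))              # max minus min: 18
s = sd(grades)                            # sample standard deviation

print(summary(grades))                    # minimum, quartiles, mean, and maximum at once
```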


Editing Data

> grades[6]=99

> grades

[1] 80 85 88 95 95 99

> grades[7]=65

> grades

[1] 80 85 88 95 95 99 65

> grades[8:10]=c(75,78,79)

> grades

[1] 80 85 88 95 95 99 65 75 78 79
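One thing to watch when extending a vector by assignment: if a position is skipped, R fills the gap with NA (a missing value). A minimal sketch with a hypothetical vector g:

```r
g = c(80, 85, 88)
g[5] = 70                    # position 4 is never assigned
print(g)                     # the fourth entry is NA

m_all = mean(g)              # NA, because one entry is missing
m_ok = mean(g, na.rm = TRUE) # drops the NA first: (80 + 85 + 88 + 70) / 4
print(m_ok)
```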


Editing Data

> data.entry(grades)

> grades=de(grades)

> grades=edit(grades)


Entering Categorical Data

A survey asks people if they smoke or not. Five responses are as follows:

Yes, No, No, Yes, No

> x=c("Yes", "No", "No", "Yes", "No")

> x

[1] "Yes" "No" "No" "Yes" "No"


Categorical Data

> table(x)

x

No Yes

3 2

> factor(x)

[1] Yes No No Yes No

Levels: No Yes
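A factor stores each response as an integer code pointing into its sorted list of levels. The sketch below makes that visible; the variable names are illustrative.

```r
x = c("Yes", "No", "No", "Yes", "No")
f = factor(x)

lev = levels(f)            # levels are sorted alphabetically: "No" "Yes"
codes = as.integer(f)      # each response as a level number: 2 1 1 2 1
counts = table(f)          # frequency of each level
n_no = counts[["No"]]      # pick out one count by name
props = prop.table(counts) # relative frequencies

print(codes)
print(props)
```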


Sorting Data

Suppose we have a sample of 30 tax accountants from all the states and territories of Australia, and their individual state of origin is specified by a character vector of state mnemonics as

> state = c("tas", "sa", "qld", "nsw", "nsw", "nt",

"wa", "wa","qld", "vic", "nsw", "vic", "qld", "qld",

"sa", "tas", "sa", "nt", "wa", "vic", "qld", "nsw",

"nsw", "wa", "sa", "act", "nsw", "vic", "vic", "act")


Sorting Data

> statef = factor(state)

> levels(statef)


Sorting Data

Suppose we have the incomes of the same tax accountants in another vector (in suitably large units of money).

> incomes = c(60, 49, 40, 61, 64, 60, 59, 54, 62, 69,

70, 42, 56, 61, 61, 61, 58, 51, 48, 65, 49, 49, 41,

48, 52, 46, 59, 46, 58, 43)


Sorting Data

To calculate the sample mean income for each state we can now use the special function tapply():

> incmeans = tapply(incomes, statef, mean)

> incmeans

   act    nsw     nt    qld     sa    tas    vic     wa
44.500 57.333 55.500 53.600 55.000 60.500 56.000 52.250
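tapply() splits the first vector into groups defined by the factor and applies the given function to each group. A small sketch with hypothetical data (group and value are not from the slides) shows that any function of a vector can be used, not only mean:

```r
group = factor(c("a", "b", "a", "b", "a"))  # hypothetical grouping factor
value = c(10, 20, 30, 40, 80)               # hypothetical measurements

means = tapply(value, group, mean)    # a: (10 + 30 + 80) / 3, b: (20 + 40) / 2
sizes = tapply(value, group, length)  # number of observations per group

print(means)
print(sizes)
```

The result is a named vector, so a single group can be picked out with, for example, means[["a"]].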


Generating Sequences

> s=seq(-5, 5, by=.2)

> s

 [1] -5.0 -4.8 -4.6 -4.4 -4.2 -4.0 -3.8 -3.6 -3.4 -3.2 -3.0 -2.8 -2.6 -2.4 -2.2
[16] -2.0 -1.8 -1.6 -1.4 -1.2 -1.0 -0.8 -0.6 -0.4 -0.2  0.0  0.2  0.4  0.6  0.8
[31]  1.0  1.2  1.4  1.6  1.8  2.0  2.2  2.4  2.6  2.8  3.0  3.2  3.4  3.6  3.8
[46]  4.0  4.2  4.4  4.6  4.8  5.0
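Besides a step size, seq() can take the desired number of points, and there are shortcuts for common cases. A brief sketch; length.out, the colon operator, and rep are standard R but do not appear in the original slides.

```r
s = seq(-5, 5, by = 0.2)
n = length(s)                    # 51 values from -5 to 5

a = seq(0, 1, length.out = 5)    # five equally spaced points: 0.00 0.25 0.50 0.75 1.00
b = 1:5                          # shorthand for the integers 1, 2, 3, 4, 5
r = rep(c(1, 2), times = 3)      # repeat a pattern: 1 2 1 2 1 2

print(a)
print(r)
```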


Data Sets in R

> data()

Some examples of data sets in R are

AirPassengers

trees

Orange

cars

faithful

> unclass(AirPassengers)


Input from User

Suppose a group of 25 people are surveyed on the number of beer bottles they can finish in one night. These are the results.

3,4,1,1,3,4,3,3,1,3,2,1,2,1,2,3,2,3,1,1,1,1,4,3,1

> beer=scan()

1: 3 4 1 1 3 4 3 3 1 3 2 ...
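scan() with no arguments waits for values typed at the console, which does not work inside a script. One possible alternative, sketched here, passes the survey results as text; the text, sep, and quiet arguments are part of scan(), but this usage is an addition to the slides.

```r
# Read the 25 survey results from a comma-separated string instead of the keyboard.
beer = scan(text = "3,4,1,1,3,4,3,3,1,3,2,1,2,1,2,3,2,3,1,1,1,1,4,3,1",
            sep = ",", quiet = TRUE)

print(length(beer))  # number of responses read
print(table(beer))   # how many people gave each answer
```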


Tables

> table(beer)

beer

1 2 3 4

10 4 8 3


Bar Charts

> barplot(table(beer))


Bar Charts

> barplot(table(beer)/length(beer))
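Dividing the table by length(beer) turns counts into proportions; prop.table() does the same in one step. A minimal sketch, with axis labels and title chosen here only for illustration:

```r
beer = c(3, 4, 1, 1, 3, 4, 3, 3, 1, 3, 2, 1, 2, 1, 2, 3, 2, 3, 1, 1, 1, 1, 4, 3, 1)

props = prop.table(table(beer))  # same values as table(beer)/length(beer)
print(props)

# Labels below are assumptions, not from the slides.
barplot(props, xlab = "Bottles", ylab = "Proportion", main = "Beer survey")
```

The bars have the same shape as in the count version; only the vertical scale changes.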


Pie Charts

> beer.counts=table(beer)

> pie(beer.counts)

> names(beer.counts)=c("1 bottle", "2 bottles",

"3 bottles", "4 bottles")

> pie(beer.counts)

> pie(beer.counts,col=c("purple","green2","cyan",

"white"))


Stem-and-Leaf Display

> data()

> AirPassengers


Stem-and-Leaf Display

> stem(AirPassengers)

The decimal point is 2 digit(s) to the right of the |

1 | 011222223333344444

1 | 55555566677777788888889999

2 | 0000000112333333344444444

2 | 6667777778889

3 | 00111111222224444

3 | 5555666666666799

4 | 00011111222234

4 | 66677779

5 | 114

5 | 56

6 | 12


Histograms

> x=scan()

1: 29.6 28.2 19.6 13.7 13.0 7.8 3.4 2.0 1.9 1.0 0.7 0.4 0.4 0.3 0.3

16: 0.3 0.3 0.3 0.2 0.2 0.2 0.1 0.1 0.1 0.1 0.1

27:

Read 26 items

> hist(x)

> hist(x, probability=TRUE)

> rug(jitter(x))

> hist(x,breaks=10)
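When breaks is a single number, hist() treats it as a suggestion and chooses nearby "pretty" break points, so the number of bars may differ from the value given. A sketch that inspects the result without drawing, using the same data as above:

```r
x = c(29.6, 28.2, 19.6, 13.7, 13.0, 7.8, 3.4, 2.0, 1.9, 1.0, 0.7, 0.4, 0.4,
      0.3, 0.3, 0.3, 0.3, 0.3, 0.2, 0.2, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1)

h = hist(x, breaks = 10, plot = FALSE)  # compute the histogram without plotting it

print(h$breaks)   # the class boundaries R actually chose
print(h$counts)   # number of observations in each class
```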




Constructing Frequency Distribution Tables with R

Math101=c(50,50,50,50,50,50,51,52,53,53)

Math101[11:20]=c(57,59,59,60,60,60,62,62,62,62)

Math101[21:30]=c(63,65,66,66,68,68,68,68,68,69)

Math101[31:40]=c(69,69,69,69,70,71,71,71,71,72)

Math101[41:50]=c(74,75,75,75,75,75,76,76,76,76)

Math101[51:60]=c(77,77,77,77,78,79,79,79,79,79)

Math101[61:70]=c(80,80,80,81,81,81,81,82,82,82)

Math101[71:80]=c(82,82,82,83,83,84,84,84,84,84)

Math101[81:90]=c(84,84,85,85,86,86,87,87,87,87)

Math101[91:100]=c(87,87,88,89,89,91,92,94,94,96)



cbs=seq(49.5,99.5,by=5)

scores=cut(Math101, cbs)

table(scores)

transform(table(scores))

freq_table=transform(table(scores))

transform(freq_table,RF=prop.table(Freq),CF=cumsum(Freq))



        scores Freq   RF  CF
1  (49.5,54.5]   10 0.10  10
2  (54.5,59.5]    3 0.03  13
3  (59.5,64.5]    8 0.08  21
4  (64.5,69.5]   13 0.13  34
5  (69.5,74.5]    7 0.07  41
6  (74.5,79.5]   19 0.19  60
7  (79.5,84.5]   22 0.22  82
8  (84.5,89.5]   13 0.13  95
9  (89.5,94.5]    4 0.04  99
10 (94.5,99.5]    1 0.01 100
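The same steps can be followed on a smaller data set, where the counts are easy to check by hand. The sketch below uses a hypothetical vector demo and as.data.frame() in place of transform(); it is an illustration, not the slides' own code.

```r
demo = c(50, 52, 57, 60, 62, 63, 68, 71, 75, 76)  # hypothetical scores
cbs = seq(49.5, 79.5, by = 10)                    # class boundaries: 49.5 59.5 69.5 79.5

classes = cut(demo, cbs)             # intervals are open on the left, closed on the right
ft = as.data.frame(table(classes))   # columns: classes, Freq
ft$RF = ft$Freq / sum(ft$Freq)       # relative frequency
ft$CF = cumsum(ft$Freq)              # cumulative frequency

print(ft)
```

Using boundaries that end in .5 means no whole-number score can fall exactly on a boundary.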



> hist(Math101, main="Final Grades", xlab="Grades",

ylab="Frequency", col="red", xaxt='n')

> axis(1, cbs, labels = TRUE)
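By default hist() picks its own class boundaries, which need not coincide with cbs. To make the bars match a frequency table built with cut(), one option is to pass the same boundaries through breaks. A sketch with a hypothetical vector demo:

```r
demo = c(50, 52, 57, 60, 62, 63, 68, 71, 75, 76, 79, 81, 84, 88, 96)  # hypothetical scores
cbs = seq(49.5, 99.5, by = 5)              # class boundaries, as in the slides

h = hist(demo, breaks = cbs, plot = FALSE) # counts per class, without drawing
tab = table(cut(demo, cbs))                # the same classes via cut()
print(all(h$counts == as.vector(tab)))     # both approaches agree

# Draw the histogram with the x-axis ticks placed at the class boundaries.
hist(demo, breaks = cbs, main = "Final Grades", xlab = "Grades",
     ylab = "Frequency", col = "red", xaxt = "n")
axis(1, at = cbs)
```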


References

W. N. Venables, D. M. Smith and the R Core Team. An Introduction to R. Notes on R: A Programming Environment for Data Analysis and Graphics. Version 3.5.1 (2018-07-02).
