
Statistical Programming with R - site.iugaza.edu.pssite.iugaza.edu.ps/biqelan/files/2019/09/01RIntro.pdf · Simple R Expressions A user types expressions to the R interpreter. R responds


Statistical Programming with R

Lecture 1: Basic Concepts

Bisher M. Iqelan

Department of Mathematics, Faculty of Science,

The Islamic University of Gaza

2019-2020, Semester 1


Simple R Expressions

A user types expressions to the R interpreter. R responds by computing and printing the answers. (The line beginning with [1] after each expression is the answer from the machine.)

> # "*" is the symbol for multiplication.

> # Everything following a # sign is assumed to be a

> # comment and is ignored by R.

> 1 + 2

[1] 3

> 1/2

[1] 0.5

> 17^2

[1] 289

> 1 + 2 * 3

[1] 7

> (1 + 2) * 3

[1] 9
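The same precedence rules cover powers and unary minus. The short script below (variable names are only illustrative) checks a few cases that often surprise beginners: ^ binds more tightly than unary minus, and ^ groups from the right.

```r
# ^ binds more tightly than unary minus, so -2^2 means -(2^2)
a <- -2^2       # -4
b <- (-2)^2     # 4
# ^ is right-associative: 2^3^2 means 2^(3^2) = 2^9
c1 <- 2^3^2     # 512
# * and / bind more tightly than + and -
d <- 1 + 2 * 3  # 7
print(c(a, b, c1, d))
```

When in doubt, adding parentheses costs nothing and makes the intended order explicit.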

Bisher M. Iqelan (IUG) R Programming: Basic Concepts 1st Semester 2019 1 / 28


R Functions

> sqrt(2)

[1] 1.414214

> exp(2)

[1] 7.389056

> sin(1)

[1] 0.841471

> 4 * atan(1)

[1] 3.141593

> abs(3-7) # Absolute value of 3-7

[1] 4
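Many built-in functions accept extra arguments. As a sketch using only base R (log() with its base argument, log10(), round() and the built-in constant pi), note also that the trigonometric functions work in radians:

```r
l1 <- log(100, base = 10)  # logarithm to base 10
l2 <- log10(100)           # shortcut for the same thing
e1 <- log(exp(2))          # log() without a base is the natural logarithm
r1 <- round(pi, 2)         # round to 2 decimal places: 3.14
s1 <- sin(pi / 2)          # angles are measured in radians
print(c(l1, l2, e1, r1, s1))
```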


Named storage

R has a workspace that provides a way of naming the values produced by computations. A name/value pair stored by R is called a variable. To assign the value 10 to the variable x, you can enter

> x=10 # or x<-10 (<- read as a single symbol)

From now on x has the value 10 and can be used in subsequent arithmetic expressions.

> x

[1] 10

> x + x

[1] 20

A variable's value can be changed by assigning a new value to the name.

> x=12 # Names are case-sensitive:

> x + x # X and x do not refer to the same variable.

[1] 24
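To make the case-sensitivity remark concrete, this small script stores two different values under x and X. At the top level the two assignment forms, = and <-, behave the same way.

```r
x <- 12      # lowercase x
X = 5        # uppercase X is a separate variable
x + X        # 17
x <- x + 1   # the old value of x is used on the right, then replaced
x            # 13
```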


Vectors

Rather than working with individual data values, R computations operate on vectors of values.

This reflects the fact that statistical computations generally take place on collections of values rather than individual ones.

The key point about vectors is that they contain values which are all of the same basic type (numbers, complex numbers, character strings, etc.).

For the time being, we'll confine ourselves to discussing numeric vectors (vectors whose elements are numbers).

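Because a vector holds values of a single basic type, combining values of different types forces a conversion (coercion) to a common type. A short illustration using base R's typeof(); the variable names are only for this example:

```r
typeof(c(1, 5, 9))    # "double": ordinary numbers
typeof(1:3)           # "integer": the : operator produces integers
mixed <- c(1, TRUE)   # TRUE is converted to the number 1
as_text <- c(1, "a")  # the number 1 is converted to the string "1"
typeof(as_text)       # "character"
```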

Vectors Continued...

A numeric vector is a list of numbers. The c() function is used to collect (concatenate) things together into a vector. We can type

> c(-1, 5, 9)

[1] -1 5 9

Again, we can assign this to a named object:

> X <- c(-1, 5, 9) # now X is a 3-element vector

To see the contents of X, simply type

> X

[1] -1 5 9

If you now type lowercase x, you obtain the value assigned earlier. (Why? Names are case-sensitive, so x and X are different variables.)

> x

[1] 12


Sequences

One very useful way of generating vectors is the sequence operator ':'. The expression z1:z2 generates the sequence of integers ranging from z1 to z2, counting downwards when z1 is greater than z2.

> 1:45

[1] 1 2 3 4 5 6 7 8 9 10 11 12 13

[14] 14 15 16 17 18 19 20 21 22 23 24 25 26

[27] 27 28 29 30 31 32 33 34 35 36 37 38 39

[40] 40 41 42 43 44 45

> 7:-5

[1] 7 6 5 4 3 2 1 0 -1 -2 -3 -4 -5
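The : operator only steps by 1 (or -1). For other step sizes, base R's seq() accepts a by or a length.out argument. The sketch also shows a common pitfall: 1:n with n = 0 counts down and gives c(1, 0), whereas seq_len(n) returns an empty vector.

```r
s1 <- seq(0, 1, by = 0.25)        # 0.00 0.25 0.50 0.75 1.00
s2 <- seq(1, 10, length.out = 4)  # 1 4 7 10
n <- 0
bad  <- 1:n          # 1 0: counts down from 1 to 0
good <- seq_len(n)   # integer(0): an empty vector
print(s1); print(s2); print(bad); print(good)
```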


Combining Vectors

The function c() can be used to combine both vectors and scalars into larger vectors.

> y = c(1, 2, 3, 4)

> c(y, 10)

[1] 1 2 3 4 10

> c(y, y)

[1] 1 2 3 4 1 2 3 4

In fact, R stores scalar values like 10 as vectors of length one, so that all arguments in the expression above are vectors.

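One way to confirm that a single number is stored as a vector of length one is to ask for its length() and test it with is.vector(). The sketch also shows that c() flattens nested calls into one vector.

```r
y <- c(1, 2, 3, 4)
length(10)       # 1: a "scalar" is a length-one vector
is.vector(10)    # TRUE
yy <- c(y, y)
length(yy)       # 8
nested <- c(c(1, 2), c(3, c(4, 5)))  # c() flattens everything
nested           # 1 2 3 4 5
```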

Vector Arithmetic

Because 'everything is a vector', R handles vector arithmetic quite easily and intuitively.

> (w <- 1:5) # create and print a vector of consecutive integers

[1] 1 2 3 4 5

> w+2 # scalar addition

[1] 3 4 5 6 7

> 2*w #scalar multiplication

[1] 2 4 6 8 10

> w^2 #raise each component to the second power

[1] 1 4 9 16 25

> 2^w #raise 2 to the first through fifth power

[1] 2 4 8 16 32

> w # w itself has not been changed

[1] 1 2 3 4 5

> w<-w*2

> w #it is now changed

[1] 2 4 6 8 10

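Vectorized arithmetic does in one expression what would otherwise need an explicit loop. The sketch below computes the squares of w both ways and checks that they agree; the loop is shown only for comparison and is not the usual R style.

```r
w <- 1:5
# Element-by-element loop, for comparison only
squares <- numeric(length(w))
for (i in seq_along(w)) {
  squares[i] <- w[i]^2
}
vectorized <- w^2                # the vectorized equivalent
identical(squares, vectorized)   # TRUE
```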

More examples of vector arithmetic:

> z = c(1, 3, 2, 10, 5); w = 1:5 # use a semicolon to separate statements

> z+w

[1] 2 5 5 14 10

> z*w

[1] 1 6 6 40 25

> z/w

[1] 1.0000000 1.5000000 0.6666667 2.5000000 1.0000000

> z^w

[1] 1 9 8 10000 3125

> sum(z) #sum of elements in z

[1] 21

> cumsum(z) #cumulative sum vector

[1] 1 4 6 16 21

> max(z) #maximum

[1] 10

> min(z) #minimum

[1] 1

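The cumulative and ordinary summaries are related: the last element of cumsum(z) is sum(z), and base R's diff() undoes cumsum() apart from the first element. A small check with the same z:

```r
z <- c(1, 3, 2, 10, 5)
cs <- cumsum(z)             # 1 4 6 16 21
cs[length(cs)] == sum(z)    # TRUE
diff(cs)                    # 3 2 10 5, i.e. z without its first element
mean(z)                     # 4.2, the same as sum(z) / length(z)
```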

The Recycling Rule

When the vectors have different lengths, the shorter one is extended by recycling: its values are repeated, starting at the beginning. For example, consider what happens when vectors of different sizes are combined:

> c(1, 2, 3, 4) + c(1, 2)

[1] 2 4 4 6

This result is explained by the recycling rule which is used by R to define the meaning of this kind of calculation. Here is how the recycling rule works.

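$$
\begin{pmatrix}1\\2\\3\\4\end{pmatrix} + \begin{pmatrix}1\\2\end{pmatrix}
\;\xrightarrow{\text{recycle}}\;
\begin{pmatrix}1\\2\\3\\4\end{pmatrix} + \begin{pmatrix}1\\2\\1\\2\end{pmatrix}
\;\xrightarrow{\text{add}}\;
\begin{pmatrix}2\\4\\4\\6\end{pmatrix}
$$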

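The recycling step can be made explicit with base R's rep_len(), which repeats a vector to a given length. The sketch also shows that R still recycles when the longer length is not a multiple of the shorter one, but issues a warning; withCallingHandlers() is used here only so the warning text can be displayed without stopping the script.

```r
auto   <- c(1, 2, 3, 4) + c(1, 2)              # implicit recycling
manual <- c(1, 2, 3, 4) + rep_len(c(1, 2), 4)  # explicit: 1 2 1 2
identical(auto, manual)                         # TRUE

# Lengths 3 and 2: recycling still happens (2 4 4), with a warning
odd <- withCallingHandlers(
  c(1, 2, 3) + c(1, 2),
  warning = function(w) {
    message("Warning caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
odd
```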

Binary Operations

The following binary operations all obey the recycling rule.

• +    addition

• -    subtraction

• *    multiplication

• /    division

• ^    raising to a power

• %%   remainder after division (modulo)

• %/%  integer division

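Each of these operators recycles in the same way as +. A short sketch (the vector a is only illustrative) applying them to a vector and a single number, which is recycled to the vector's length:

```r
a <- c(7, 8, 9)
a - 2     # 5 6 7
a / 2     # 3.5 4.0 4.5
a ^ 2     # 49 64 81
a %% 2    # 1 0 1
a %/% 2   # 3 4 4
```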

The Integer Division, Remainder and Modulo Operators

The value of the integer division z1 %/% z2 is computed by dividing z1 by z2 and then rounding down to the next lowest integer.

> 11 %/% 3

[1] 3

The result of the remainder expression z1 %% z2 is defined as z1 − z2 × (z1 %/% z2).

> 11 %% 3

[1] 2 # 11 - 3*(11 %/% 3) = 11 - 3*3 = 2

floor() and ceiling() round to the nearest integer that is not greater than, or not less than, their argument, respectively.

> floor(11/3)

[1] 3

> 11 %/% 3 == floor(11/3)

[1] TRUE

> ceiling(11/3)

[1] 4
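Because %/% rounds down (like floor()), the identity z1 == z2 * (z1 %/% z2) + z1 %% z2 also holds for negative numbers, and the remainder then takes the sign of the divisor. A sketch checking this on a few illustrative pairs:

```r
z1 <- c(11, -11, 11, 13.5)
z2 <- c(3, 3, -3, 2)
q <- z1 %/% z2          # 3 -4 -4 6
r <- z1 %% z2           # 2  1 -1 1.5
all(z1 == z2 * q + r)   # TRUE
```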


The Integer Division, Remainder and Modulo Operators

Continued...

The modulo operator is useful in integer computations,

> 11 %% 3

[1] 2

> 1:10 %% 2

[1] 1 0 1 0 1 0 1 0 1 0

but it can also be used with more general numbers

> 13.5 %% 2

[1] 1.5

> 13.5 %/% 2

[1] 6
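A typical use of the pair %/% and %% is splitting a quantity into whole units and a remainder. As a hypothetical example with made-up durations, converting seconds into minutes and seconds:

```r
total <- c(59, 60, 125, 200)  # durations in seconds (made-up values)
minutes <- total %/% 60       # whole minutes: 0 1 2 3
seconds <- total %% 60        # leftover seconds: 59 0 5 20
paste(minutes, "min", seconds, "s")
```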


Extracting elements from vectors

Individual elements can be extracted from vectors by specifying their index. The third element can be extracted from v = 3:8 as follows.

> v[3]

[1] 5

Square brackets ([ ]) are used for subscripting, and can be applied to any subscriptable value. It is also possible to extract subvectors by specifying vectors of indices.

> v[c(2, 4)]

[1] 4 6

The sequence operator provides a useful way of extracting consecutive elements from a vector.

> v[2:4]

[1] 4 5 6
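A few more indexing cases with the same v = 3:8: an index may repeat, an index past the end returns NA, and index 0 selects nothing.

```r
v <- 3:8
v[c(1, 1, 2)]   # 3 3 4: indices may repeat
v[length(v)]    # 8: the last element
v[10]           # NA: there is no tenth element
v[0]            # integer(0): an empty vector
```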


Negative subscripts

Negative indices can be used to exclude certain elements. For example, we can select all but the second element of v as follows:

> v[-2]

[1] 3 5 6 7 8

The third through fifth elements of v can be excluded as follows:

> v[-(3:5)]

[1] 3 4 8

Do not mix positive and negative subscripts. To see what happens, consider

> v[c(-2,4)]

Error in v[c(-2, 4)] : only 0's may be mixed with negative subscripts

The problem is that it is not clear what is to be extracted: do we want the fourth element of v before or after removing the second one?

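The ambiguity disappears if the two steps are written separately. The sketch below shows both readings; which one is wanted depends on the problem at hand.

```r
v <- 3:8
v[-2][4]      # remove the 2nd element first, then take the 4th: 7
v[4]          # the 4th element of the original v: 6
v[-c(2, 4)]   # drop the 2nd and 4th elements together: 3 5 7 8
```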

Changing Vector Subsets

As well as extracting the values at particular positions in a vector, it is possible to reset their values. This is done by putting the subset to be modified on the left-hand side of the assignment, with the replacement value(s) on the right.

> y = 1:10

> y[4:6] = 0

> y

[1] 1 2 3 0 0 0 7 8 9 10
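The replacement value is recycled over the selected positions, which is why a single 0 fills all three slots above. A sketch of two related cases: a vector of replacement values is matched position by position, and assigning past the end lengthens the vector, filling any gap with NA.

```r
y <- 1:10
y[4:6] <- 0                 # one value recycled over three positions
y[c(1, 10)] <- c(100, 200)  # values matched position by position
y                           # 100 2 3 0 0 0 7 8 9 200
z <- 1:3
z[5] <- 9                   # position 4 did not exist, so it becomes NA
z                           # 1 2 3 NA 9
```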


Special Numerical Values: Infinity

Division of 1 by 0 is undefined in ordinary arithmetic, but the IEEE floating-point standard, which R follows, defines the result to be infinite. R produces this kind of special result.

> 1 / 0

[1] Inf

Here, Inf represents positive infinity. There is also a negative infinity.

> -1 / 0

[1] -Inf


Properties of Infinity

Infinities have all the properties you would expect. For example

> 1 + Inf

[1] Inf

and

> 1000 / Inf

[1] 0
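Base R provides is.infinite() and is.finite() for testing these values. Infinite results also arise from overflow, when a number is too large to be stored as a double. A short sketch:

```r
vals <- c(1 / 0, -1 / 0, 1000 / Inf, 2^1024)
vals               # Inf -Inf 0 Inf  (2^1024 overflows)
is.infinite(vals)  # TRUE TRUE FALSE TRUE
is.finite(vals)    # FALSE FALSE TRUE FALSE
exp(-Inf)          # 0
```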


Special Numerical Values: Not a Number

R also has a special value, called NaN, which indicates that a numerical result is undefined.

> 0 / 0

[1] NaN

NaN is also produced by subtracting infinity from infinity.

> Inf - Inf

[1] NaN

Some mathematical functions will also produce NaN results.

> sqrt(-1)

[1] NaN

Warning message:

In sqrt(-1) : NaNs produced
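NaN counts as missing for is.na(), but NA is not NaN; and because NaN is not equal to anything, comparisons with it give NA. A sketch of why is.nan() and is.na() should be used rather than ==:

```r
is.na(NaN)     # TRUE: NaN is treated as missing
is.nan(NA)     # FALSE: NA is not "not a number"
is.nan(0 / 0)  # TRUE
NaN == NaN     # NA, so == cannot be used to detect NaN
```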


Special Numerical Values: Not Available

R has a particular value which is used to indicate that a value is missing or not available. The value is indicated by NA. In general, an arithmetic expression which contains NA produces NA as a result (a rare exception is NA^0, which is 1).

> 1 + sin(NA)

[1] NA

The value NA is usually used for statistical observations where the value could not be recorded, for example, when a survey researcher visits a house and no one is home.

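Because comparing anything with NA also gives NA, missing values are located with is.na(). A brief sketch with a made-up vector of survey responses:

```r
responses <- c(4, NA, 7, NA, 5)  # hypothetical data; NA = no answer
responses == NA                  # NA NA NA NA NA: not useful
is.na(responses)                 # FALSE TRUE FALSE TRUE FALSE
sum(is.na(responses))            # 2 missing values (TRUE counts as 1)
which(is.na(responses))          # positions 2 and 4
```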

Some Functions for Vectors

• unique() - returns a vector containing one element for each unique value in the vector.

• duplicated() - returns a logical vector which tells whether each element of a vector duplicates an earlier one.

• rev() - reverses the order of elements in a vector.

• sort() - sorts the elements in a vector.

• append() - appends or inserts elements in a vector.

• sum() - returns the sum of the elements of a vector.

• min() - returns the minimum value in a vector.

• max() - returns the maximum value in a vector.

• range() - returns a vector containing the minimum and maximum values in a vector.

• prod() - returns the product of all the values present in a vector.


Examples

> x=1:4; y=c(5,-3,4,8,2); z=6:3

> rev(x)

[1] 4 3 2 1

> sort(y)

[1] -3 2 4 5 8

> m=append(x,z)

> m

[1] 1 2 3 4 6 5 4 3

> unique(m)

[1] 1 2 3 4 6 5

> duplicated(m)

[1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE
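Several of these functions take optional arguments. The sketch below uses append()'s after argument and sort()'s decreasing argument with the same x, y and z, and confirms that unique(m) keeps exactly the elements that are not duplicated.

```r
x <- 1:4; y <- c(5, -3, 4, 8, 2); z <- 6:3
append(x, z, after = 2)     # 1 2 6 5 4 3 3 4: z inserted after the 2nd element
sort(y, decreasing = TRUE)  # 8 5 4 2 -3
m <- append(x, z)
identical(unique(m), m[!duplicated(m)])  # TRUE
```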


Summary Functions: min, max and range

The functions min and max return the minimum and maximum values contained in any of their arguments, and the function range returns a vector of length 2 containing the minimum and maximum of the values in the arguments.

> max(1:100)

[1] 100

> max(1:100, Inf)

[1] Inf

> range(1:100)

[1] 1 100
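Because min(), max() and range() look at all their arguments together, several vectors can be summarized at once. Base R's which.min() and which.max() return the position, rather than the value, of the extreme element. A sketch with an illustrative y:

```r
y <- c(5, -3, 4, 8, 2)
max(3, c(1, 9), 4)  # 9: the maximum over all arguments
range(y, 100)       # -3 100
which.max(y)        # 4: y[4] is the largest value
which.min(y)        # 2
```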


Summary Functions: sum and prod

The functions sum and prod compute the sum and product of all the elements in their arguments.

> sum(1:100)

[1] 5050

> prod(1:10)

[1] 3628800
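These two results can be checked against familiar formulas: 1 + 2 + ... + n equals n(n+1)/2, and 1 × 2 × ... × n is n!, which base R computes with factorial(). A quick check:

```r
n <- 100
sum(1:n) == n * (n + 1) / 2           # TRUE
all.equal(prod(1:10), factorial(10))  # TRUE
```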


Summary Functions and NA

In any of these summary functions, the presence of NA or NaN values in any of the arguments produces a result which is NA or NaN.

> min(NA, 100)

[1] NA

NA and NaN values can be disregarded by specifying an additional argument of na.rm = TRUE.

> min(10, 20, NA, na.rm = TRUE)

[1] 10
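The na.rm argument works the same way for the other summary functions. According to R's help page for min and max (?Extremes), a vector containing NaN but no NA gives NaN, while NA takes precedence when both are present. A short sketch:

```r
x <- c(10, 20, NA)
sum(x)                  # NA
sum(x, na.rm = TRUE)    # 30
range(x, na.rm = TRUE)  # 10 20
min(NaN, 1)             # NaN
min(NA, NaN)            # NA
```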


Cumulative Summaries

There are also cumulative variants of the summary functions.

> cumsum(1:10)

[1] 1 3 6 10 15 21 28 36 45 55

> cumprod(1:10)

[1] 1 2 6 24 120

[6] 720 5040 40320 362880 3628800

> cummax(1:10)

[1] 1 2 3 4 5 6 7 8 9 10

> cummax(10:1)

[1] 10 10 10 10 10 10 10 10 10 10

> cummin(1:10)

[1] 1 1 1 1 1 1 1 1 1 1

> cummin(10:1)

[1] 10 9 8 7 6 5 4 3 2 1

These cumulative summary functions do not have a na.rm argument.
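Since the cumulative functions have no na.rm argument, one missing value propagates to every later position. One possible workaround, assuming it is acceptable for the analysis to treat missing values as 0, is to replace them before accumulating:

```r
x <- c(1, NA, 3, 4)
cumsum(x)            # 1 NA NA NA
x0 <- x
x0[is.na(x0)] <- 0   # assumption: missing values may be treated as zero
cumsum(x0)           # 1 1 4 8
```

Whether zero is a sensible substitute depends entirely on the data; for other summaries a different choice may be needed.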


Non-vectorized functions

Although most functions in R are vectorized, returning objects which are the same size and shape as their input, some will always return a single logical value. any() tests if any of the elements of its arguments meet a particular condition; all() tests if they all do.

> x = c(7,3,12,NA,13,8)

> any(is.na(x))

[1] TRUE

> all(x > 0)

[1] NA

> all(x > 0, na.rm=TRUE)

[1] TRUE
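NA only makes the answer unknown when it could change it. In this sketch any(x > 12) is TRUE without na.rm, because one element (13) already satisfies the condition; for empty input, any() is FALSE and all() is TRUE.

```r
x <- c(7, 3, 12, NA, 13, 8)
any(x > 12)       # TRUE: 13 > 12, whatever the missing value is
all(x > 0)        # NA: the missing value might be negative
any(logical(0))   # FALSE
all(logical(0))   # TRUE
```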


Non-vectorized functions Cont.

identical() tests if two objects are exactly the same.

> set.seed(5)

> x <- rnorm(3)

> x

[1] -0.8408555 1.3843593 -1.2554919

> set.seed(5)

> y <- rnorm(3)

> y

[1] -0.8408555 1.3843593 -1.2554919

> identical(x, y)

[1] TRUE

> z <- c(-0.8408555, 1.3843593, -1.2554919)

> identical(x, z)

[1] FALSE
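The last comparison is FALSE because z holds the values of x only as printed, rounded to seven decimal places, while identical() demands exact equality. For comparing floating-point numbers up to a small tolerance, base R has all.equal(), usually wrapped in isTRUE() when a plain TRUE/FALSE is needed. A minimal sketch:

```r
a <- 0.1 + 0.2
a == 0.3                   # FALSE: tiny rounding error in binary arithmetic
identical(a, 0.3)          # FALSE
isTRUE(all.equal(a, 0.3))  # TRUE: equal within the default tolerance
```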


End of lecture 1. Thank you!