R Tutorial.pdf


  • 8/2/2019 R Tutorial.pdf


    An Easy Introduction To R for

    IE 460, IE 508 and IE 586 Course Participants

    Ismail Basoglu

    February 23, 2012


    1 Introduction

    This document contains an easy introduction to the programming language R. With the help of the examples given in this document, you should be able to gain basic knowledge of R, which will help you

    to use predefined functions of statistical forecasting models and carry out an effective analysis of given data or time series in the IE 460 Statistical Forecasting and Time Series course,

    to run statistical tests, build statistical models and apply inferential methods on the topics of the IE 508 Statistical Inference course,

    to create financial applications and implement Monte Carlo methods in the IE 586 Quantitative Finance course.

    To learn this programming language well, we recommend that you try each and every step of the applications presented in this document.

    You can download the latest version of R from http://cran.r-project.org/. Windows users should click the Windows link, then the base link, where they will find the download link for the *.exe file.

    Once you have installed R, we recommend that you write your code in script files. Click File in the menu bar, then New script, and write your code inside this script. When your script file contains complete code, you can press Ctrl+A (select all) and then Ctrl+R to run it in the R console quickly. You can save your script files and open them again later by clicking File, then Open script.

    Have fun!


    2 R Works with Vectors

    2.1 Creating Vectors

    In order to assign a value to a specified variable (e.g. 3 to x), we do the following:

    x <- 3

    or

    x = 3
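A minimal sketch of both assignment forms, together with c() and the colon operator, the two most common ways to build a vector by hand (the variable names are only illustrative):

```r
x <- 3            # assignment with the arrow operator
y = 3             # assignment with a single equals sign
identical(x, y)   # both forms store the same value
# [1] TRUE

v <- c(2, 7, 1.5) # c() combines values into a vector
w <- 1:5          # the colon operator builds the integer sequence 1, 2, 3, 4, 5
length(v)         # number of elements in v
# [1] 3
v[2]              # R indexes vectors from 1, so this is the second element
# [1] 7
```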

    We will use the operator


    x


    If we care about the step size rather than the length of the sequence, we can use the by parameter instead of length.out.
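A short comparison of the two parameters of seq(); the numbers are arbitrary:

```r
# length.out fixes how many points the sequence has
a <- seq(0, 1, length.out = 5)
a
# [1] 0.00 0.25 0.50 0.75 1.00

# by fixes the step size instead; the length follows from it
b <- seq(0, 1, by = 0.25)
b
# [1] 0.00 0.25 0.50 0.75 1.00

# if the step does not divide the range, seq() stops before passing 'to'
seq(0, 1, by = 0.3)
# [1] 0.0 0.3 0.6 0.9
```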

    x

    >=: greater than or equal to

    ==: equal to (do not forget that a single = symbol is used for assigning values)

    !=: not equal to

    In the following sequence of examples, we create a vector and use it in different logical expressions. The result is a logical vector: if a vector element satisfies the expression, the corresponding index holds TRUE, otherwise FALSE. You can use & (and) and | (or) to combine logical expressions element-wise; && and || compare only single values and are meant for conditions such as if (...).
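A small sketch of element-wise comparisons on an illustrative vector, including the difference between & / | and && / ||:

```r
x <- c(1, 4, 6, 3, 8)
x > 3              # element-wise comparison
# [1] FALSE  TRUE  TRUE FALSE  TRUE
x >= 3 & x != 8    # & combines two logical vectors element by element
# [1] FALSE  TRUE  TRUE  TRUE FALSE
x < 2 | x == 8     # | is the element-wise "or"
# [1]  TRUE FALSE FALSE FALSE  TRUE
x[x > 3]           # a logical vector can also select elements
# [1] 4 6 8

# && and || expect single TRUE/FALSE values, e.g. inside if ()
if (length(x) > 0 && x[1] == 1) print("first element is 1")
# [1] "first element is 1"
```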

    x


    x


    x


    vec


    x


    y


    x


    x


    exp(4.60517) # close to 100; 4.60517 is a rounded value of log(100), hence the small error

    # [1] 99.99998

    exp(log(100)) # no visible rounding error

    # [1] 100

    exp(seq(-2,2,0.4))

    # [1] 0.1353353 0.2018965 0.3011942 0.4493290 0.6703200 1.0000000 1.4918247

    # [8] 2.2255409 3.3201169 4.9530324 7.3890561

    gamma(5) # equivalent to factorial(4)

    # [1] 24

    gamma(5.5) # equivalent to factorial(4.5)

    # [1] 52.34278

    x


    3 Probability and Statistical Basis of R

    3.1 Probability Functions in R

    For each of the well-known distributions commonly used in probability theory and statistics, R provides four functions. Let us define these functions for the normal distribution first and then list the other probability distributions that are available in R.

    dnorm(x,y,z): returns the pdf (probability density function) value of x in a normal distribution with mean y and standard deviation z.

    pnorm(x,y,z): returns the cdf (cumulative distribution function) value of x in a normal distribution with mean y and standard deviation z.

    qnorm(x,y,z): returns the inverse cdf (quantile) value of x in a normal distribution with mean y and standard deviation z. Clearly x must be in the unit interval (x ∈ [0, 1]).

    rnorm(x,y,z): returns a vector of random variates (RVs) of length x. The variates follow a normal distribution with mean y and standard deviation z.

    Check out the following examples about normal distribution:

    dnorm(0.5) # if no parameter is defined, R assumes a std. normal distribution

    # [1] 0.3520653

    dnorm(0,2,1)

    # [1] 0.05399097

    dnorm(3,3,5)

    # [1] 0.07978846

    pnorm(0) # the area below the curve

    # on the left side of "0" in a std. normal distribution

    # [1] 0.5

    pnorm(2)

    # [1] 0.9772499

    pnorm(5,3,1)

    # [1] 0.9772499

    # following are the inverse of the previous "pnorm()" functions

    qnorm(0.5)

    # [1] 0

    qnorm(0.9772499)

    # [1] 2.000001

    qnorm(0.9772499,3,1)

    # [1] 5.000001

    rnorm(20,2,1) # will generate 20 RVs which follow normal dist.

    # with mean 2 and std. dev. 1

    # [1] 2.31502453 0.37445729 2.04994863 1.89381118 0.63099383 1.50837615

    # [7] 0.57363369 2.84601422 2.54003868 3.43652548 0.88941281 3.36373629

    # [13] 0.58945290 2.44678124 -0.05360271 2.73920472 2.73643684 1.79465998

    # [19] 1.30906099 2.18648566
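Because rnorm() returns different numbers on every call, your output will not match the values above. A minimal sketch of how set.seed() makes the draws reproducible (the seed 460 is arbitrary), plus two checks of the relations used in the examples: qnorm() inverts pnorm(), and a normal with mean 3 and standard deviation 1 evaluated at 5 gives the same cdf value as the standard normal at 2.

```r
set.seed(460)            # any integer works; the same seed gives the same draws
a <- rnorm(5, 2, 1)
set.seed(460)
b <- rnorm(5, 2, 1)
identical(a, b)
# [1] TRUE

# qnorm() undoes pnorm(): it recovers the original point up to rounding
qnorm(pnorm(1.3, 3, 1), 3, 1)
# [1] 1.3

# standardizing: P(X <= 5) for X ~ N(3, 1) equals P(Z <= 2) for Z ~ N(0, 1)
all.equal(pnorm(5, 3, 1), pnorm(2))
# [1] TRUE
```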


    Here is a list of useful distributions that are available for computation in R; R also provides other distributions that are not in this list. For each distribution below, you can obtain the cdf by changing the initial letter d to p, the inverse cdf by changing it to q and the random variate generator by changing it to r. Apart from the normal distribution, please practice and learn the d, p, q, r functions of the first nine distributions in this list.³

    dpois(x,y) : returns the pmf (probability mass function) value of x in a Poisson distribution with mean (rate) y.

    dbinom(x,y,z) : returns the pmf value of x in a binomial distribution with number of trials (size) y and success probability z.

    dgeom(x,y) : returns the pmf value of x (the number of failures before the first success) in a geometric distribution with success probability y.

    dunif(x,y,z) : returns the pdf value of x in a uniform distribution with lower bound y and upper bound z.

    dexp(x,y) : returns the pdf value of x in an exponential distribution with rate parameter y.

    dgamma(x,y,scale=z) : returns the pdf value of x in a gamma distribution with shape parameter y and scale parameter z. (If you pass z without writing scale=, R treats z as the rate parameter, which is equal to 1/scale.)

    dchisq(x,y,z) : returns the pdf value of x in a chi-square distribution with y degrees of freedom and non-centrality parameter z.

    dt(x,y,z) : returns the pdf value of x in a t-distribution with y degrees of freedom and non-centrality parameter z.

    df(x,y,z,a) : returns the pdf value of x in an F-distribution with numerator degrees of freedom y, denominator degrees of freedom z and non-centrality parameter a.

    dcauchy(x,y,z) : returns the pdf value of x in a Cauchy distribution with location parameter y and scale parameter z.

    dnbinom(x,y,z) : returns the pmf value of x (the number of failures before the y-th success) in a negative binomial distribution with target number of successes (dispersion parameter) y and success probability z.

    dhyper(x,y,z,a) : returns the pmf value of x (the number of white balls drawn) in a hypergeometric distribution with y white balls, z black balls and a balls drawn from the whole population.

    dlnorm(x,y,z) : returns the pdf value of x in a log-normal distribution with log-mean y and log-standard deviation z.

    dbeta(x,y,z) : returns the pdf value of x in a beta distribution with shape-1 parameter y and shape-2 parameter z.

    dlogis(x,y,z) : returns the pdf value of x in a logistic distribution with location parameter y and scale parameter z.

    dweibull(x,y,z) : returns the pdf value of x in a Weibull distribution with shape parameter y and scale parameter z.

    ³ For IE 586 students, it is sufficient to practice and learn the first five distributions in the list.
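A few sanity checks that show how the d, p, q, r prefixes fit together for some of the distributions above; the parameter values are arbitrary:

```r
# Poisson with mean 2: summing the pmf over 0..3 gives the cdf at 3
dpois(0:3, 2)
all.equal(sum(dpois(0:3, 2)), ppois(3, 2))
# [1] TRUE

# exponential with rate 2: qexp() inverts pexp()
qexp(pexp(1.5, 2), 2)
# [1] 1.5

# gamma: scale = 2 is the same as passing rate 0.5 positionally
all.equal(dgamma(2, 3, scale = 2), dgamma(2, 3, 0.5))
# [1] TRUE

# binomial: 2 successes in 4 trials with success probability 0.5 (6/16)
dbinom(2, 4, 0.5)
# [1] 0.375

# random variates: five draws from a uniform distribution on [1, 3]
runif(5, 1, 3)
```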


    3.2 Statistical Functions in R

    You can find the mean of a vector with the function mean(), its standard deviation with sd(), its variance with var() and its median with median(). The function summary() additionally reports the minimum, the maximum and the 25th and 75th percentiles (the first and third quartiles; the median is the second quartile).
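A small sketch on an illustrative vector. Note that var() and sd() use the sample formulas, dividing by n - 1:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
mean(x)
# [1] 5
var(x)      # sum of squared deviations (32) divided by n - 1 = 7
# [1] 4.571429
sd(x)       # square root of var(x)
median(x)
# [1] 4.5
summary(x)  # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
quantile(x, c(0.25, 0.75))  # the first and third quartiles on their own
```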

    x


    4 Creating Functions and Defining Loops in R

    4.1 Creating Functions in R

    We use the following structure to define a function that is not already available in R.

    # f
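A minimal sketch of the general structure, using a hypothetical function name and arguments: arguments may have default values, and the function returns the value given to return() (or, without it, the last evaluated expression).

```r
# hypothetical example: total cost with a fixed part and a per-unit part
cost_sketch <- function(units, fixed = 100, per_unit = 2.5) {
  cost <- fixed + per_unit * units   # vectorized: also works for a vector of units
  return(cost)
}

cost_sketch(10)                    # uses the default fixed and per_unit values
# [1] 125
cost_sketch(c(10, 20), fixed = 0)  # named arguments override the defaults
# [1] 25 50
```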


    # EXAMPLE 02

    # A function that yields the perimeter and the area of a triangle

    # given corner coordinates in R2

    # Check "www.mathopenref.com/coordtriangleareabox.html" for the explanation

    triangle
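The code of this example is not reproduced here. As an illustration only, here is a hypothetical re-implementation (the name triangle_sketch and its arguments are ours) that computes the perimeter from the side lengths and the area from the coordinate formula explained on the page cited above:

```r
# hypothetical sketch: p1, p2, p3 are corner coordinates given as c(x, y)
triangle_sketch <- function(p1, p2, p3) {
  side <- function(p, q) sqrt(sum((p - q)^2))   # Euclidean distance between two corners
  perimeter <- side(p1, p2) + side(p2, p3) + side(p3, p1)
  # coordinate formula: |Ax(By - Cy) + Bx(Cy - Ay) + Cx(Ay - By)| / 2
  area <- abs(p1[1] * (p2[2] - p3[2]) +
              p2[1] * (p3[2] - p1[2]) +
              p3[1] * (p1[2] - p2[2])) / 2
  list(perimeter = perimeter, area = area)
}

triangle_sketch(c(0, 0), c(3, 0), c(0, 4))   # a 3-4-5 right triangle
# $perimeter
# [1] 12
#
# $area
# [1] 6
```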


    coora


    # EXAMPLE 04

    f


    simmax2unif(100000)

    # expectation

    # 0.665354266

    system.time(x
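Judging by its name and its output (0.665..., close to 2/3), simmax2unif() appears to estimate the expected maximum of two independent Uniform(0, 1) variates; this is our assumption, since its code is not shown here. The exact value is 2/3, because the maximum has density 2x on [0, 1]. Here is a vectorized sketch of such a simulation (the name sim_max_unif and the seed are hypothetical), timed with system.time():

```r
# hypothetical vectorized version: estimate E[max(U1, U2)], U1, U2 ~ Uniform(0, 1)
sim_max_unif <- function(n) {
  u1 <- runif(n)
  u2 <- runif(n)
  mean(pmax(u1, u2))   # pmax() takes the element-wise maximum of the two vectors
}

set.seed(586)
est <- sim_max_unif(100000)
est                                  # should be close to the exact value 2/3
system.time(sim_max_unif(100000))    # elapsed time in seconds; varies by machine
```

Working on whole vectors with runif() and pmax() avoids an explicit for-loop, which is usually much faster in R.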


    Here is a basic root finding algorithm that uses a while-loop:

    # a root finding algorithm

    # finds the unique real root of a continuous function in an interval

    # the function should cross the x-axis (change sign) in the interval, not just touch it as a tangent

    findroot
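The comments above describe a bracketing method. One standard way to implement it with a while-loop is bisection; the sketch below assumes that the function changes sign between the two bounds. The name bisect_root and its tol argument are illustrative and not necessarily what findroot uses.

```r
bisect_root <- function(f, lbound, ubound, tol = 1e-8) {
  if (f(lbound) * f(ubound) > 0) stop("f must change sign between lbound and ubound")
  while (ubound - lbound > tol) {
    mid <- (lbound + ubound) / 2
    # keep the half-interval whose end points still have opposite signs
    if (f(lbound) * f(mid) <= 0) {
      ubound <- mid
    } else {
      lbound <- mid
    }
  }
  (lbound + ubound) / 2
}

bisect_root(function(x) x^2 - 2, 0, 2)   # approximately sqrt(2) = 1.414214
```

Each pass halves the interval, so the loop needs about log2((ubound - lbound) / tol) iterations.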


    5 Drawing Plot Diagrams and Histograms in R

    We would like to draw a plot of the density function of the standard normal distribution on the interval (-4, 4). We create a dense vector of points on the x-axis (it should be dense to give a good approximation of the curve) and evaluate the function at these points to obtain a second vector.

    x
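A minimal sketch along the lines of Figures 1 to 3, using plot(), lines(), abline() and hist(); the grid size, line types, colors and number of bins are our own illustrative choices:

```r
x <- seq(-4, 4, length.out = 401)   # dense grid on [-4, 4]
y <- dnorm(x)                       # standard normal density at each grid point
plot(x, y, type = "l", main = "Standard normal density", xlab = "x", ylab = "density")
lines(x, dnorm(x, 0, 1.5), lty = 2) # add a second (dashed) curve with sd 1.5
abline(v = 0, col = "gray")         # vertical reference line at x = 0

z <- rnorm(1000, 3, 1.5)            # normal RVs with mean 3 and sd 1.5
hist(z, breaks = 30, freq = FALSE)  # freq = FALSE puts densities on the y-axis
zs <- sort(z)
lines(zs, dnorm(zs, 3, 1.5))        # overlay the true density on the histogram
```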


    Figure 1: Plot diagrams for the density function of standard normal distribution

    Figure 2: Histograms of a vector of normal RVs with mean 3 and standard deviation 1.5

    Figure 3: Adding lines on existing diagrams with lines() (1-2) and abline() (3) commands


    6 Basic User Information

    6.1 Scanning and Printing Data

    Assume that you have data⁷ written in a text file in the following format.

    3 25 94.9 12547 32556 56

    89 567

    435 342.1

    76.5 983.2

    0 343

    # There are 15 real values

    You can use the command scan() to store this data in a vector by scanning it from left to right and top to bottom. Spaces and line breaks separate the values, so each value is stored in a new index.

    x
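A self-contained sketch that writes a few hypothetical values to a temporary file, reads them back with scan(), and prints them with cat() and write():

```r
# hypothetical file contents, written to a temporary file for the example
tf <- tempfile(fileext = ".txt")
writeLines(c("1.5 2 3", "4 5.25", "6"), tf)

x <- scan(tf, quiet = TRUE)   # left to right, top to bottom; quiet hides "Read 6 items"
x
# [1] 1.50 2.00 3.00 4.00 5.25 6.00

cat(x, "\n")                  # prints the values separated by spaces
write(x, file = tf, ncolumns = 3)   # writes the values back, 3 per line
readLines(tf)
# [1] "1.5 2 3"  "4 5.25 6"
```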


    x


    apropos("exp")

    # [1] ".__C__expression" ".expand_R_libs_env_var" ".Export"

    # [4] ".mergeExportMethods" ".standard_regexps" "as.expression"

    # [7] "as.expression.default" "char.expand" "dexp"

    # [10] "exp" "expand.grid" "expand.model.frame"

    # [13] "expm1" "expression" "getExportedValue"

    # [16] "getNamespaceExports" "gregexpr" "is.expression"

    # [19] "namespaceExport" "path.expand" "pexp"

    # [22] "qexp" "regexpr" "rexp"

    # [25] "SSbiexp" "USPersonalExpenditure"

    If you need to see all the objects that you have created in your work session, simply type objects() (ls() does the same).

    objects()

    # [1] "a" "b" "circle" "coora"

    # [5] "coorb" "coorc" "error" "f"

    # [9] "findroot" "fixedcost" "func" "int"

    # [13] "lbound" "marginalcost" "n" "orderingcostlist"

    # [17] "res" "simmax2unif" "simmax2unif_2" "totalcost"

    # [21] "triangle" "ubound" "units" "vec"

    # [25] "x" "xest" "xinv" "y"

    # [29] "y1" "y2" "y3" "y4"

    # [33] "y5" "y6" "z"

    You can save your R session together with the objects that you have created by clicking File, then Save Workspace in the menu bar. You can reopen a saved workspace later by double-clicking the saved file.
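The same can be done from code with save.image() and load(). A minimal sketch, using a temporary file name instead of the default .RData in the working directory:

```r
a <- 1:3
b <- "hello"
ws <- tempfile(fileext = ".RData")   # temporary file for the example
save.image(file = ws)                # saves every object in the workspace

rm(a, b)                             # remove the objects ...
load(ws)                             # ... and restore them from the file
a
# [1] 1 2 3
```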