

Chapter9 r studio2



Slides to accompany Chapter 9 of the eBook: Introduction to Data Science


Page 1: Chapter9 r studio2

JEFFREY STANTON

SCHOOL OF INFORMATION STUDIES

SYRACUSE UNIVERSITY

Installing and Using R-Studio


Overview of R-Studio

R-Studio is an “IDE”, or integrated development environment. As an IDE, R-Studio provides a convenient user interface for developing R code.

R-Studio’s main screen is divided into four panes:

Upper left: Code window
Lower left: R console
Upper right: Data workspace and command history browser
Lower right: File browser, plots, package manager, help


Installing R-Studio

Make sure to install R first, before trying to install R-Studio; generally it makes sense to install or upgrade to the latest version of R before installing R-Studio

The free software download is available at http://www.rstudio.org/

If you reach a page that asks you to choose between installing R-Studio Server and installing R-Studio as a desktop application, choose the desktop application

After installing, run R-Studio and type a command in the console window such as “2+2”
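You can try a couple of commands in the console to confirm that R-Studio has found your R installation. This is a minimal sketch. R.version.string is a standard built-in variable, not mentioned on the slide, that reports which version of R is running. It is a handy way to check that you installed or upgraded R first.

```r
# Simple arithmetic: the console should print [1] 4
2 + 2

# Built-in character string describing the R version R-Studio is using
R.version.string
```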


Creating Your First Function

We are going to build up slowly towards creating a function that calculates the statistical “mode” (the most frequently occurring value in a vector)

The upper left-hand pane displays a blank space under the tab title “Untitled1.” Click in that pane and type the following code:

MyMode <- function(myVector) { return(myVector) }


What Does it Do?

The name of the function is MyMode

The function receives one “argument” when it is called; within the function, the argument is known as myVector

The function does not do anything yet, except for returning a copy of myVector



Before You Can Use Your New Function

Before you can actually “call” this function from the R command line, you have to tell R that it exists!

To do this, highlight the whole function with your mouse, all the way from the first “M” to the final “}”, and then click the “Run” button just above and to the right of the code

You can check that your function is defined by looking in the Workspace area in the upper right pane, scrolling down to the Functions list, and seeing MyMode in the list
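You can also check from the console instead of the Workspace pane. The sketch below uses the base R functions exists(), is.function(), and ls(). It redefines the placeholder MyMode so that it runs on its own.

```r
# The placeholder version of MyMode from this slide
MyMode <- function(myVector)
{
  return(myVector)
}

# exists() returns TRUE if an object with this name has been defined
exists("MyMode")

# is.function() confirms that MyMode is a function rather than data
is.function(MyMode)

# ls() lists every object currently defined in the workspace
ls()
```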



Let’s Test it Out

Type the two lines of code below into the R console, which is the lower left pane; don’t type the “>”, which is the command prompt

The first line makes a small vector of numbers called “tinyData” using the “c()” concatenate function

The second line passes tinyData to our function

The R console will display the result: can you predict what it will be?

> tinyData <- c(1,2,1,2,3,3,3,4,5,4,5)
> MyMode(tinyData)
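For reference, here is the same test as a self-contained script, with the placeholder MyMode included so it runs on its own. This version of MyMode only returns its argument, so the output is tinyData itself. The base R function identical() is an addition to the slide, used here to confirm that.

```r
MyMode <- function(myVector)
{
  return(myVector)
}

tinyData <- c(1,2,1,2,3,3,3,4,5,4,5)

# MyMode() simply hands back its argument at this stage
MyMode(tinyData)
# [1] 1 2 1 2 3 3 3 4 5 4 5

# identical() checks that the output matches the input exactly
identical(MyMode(tinyData), tinyData)
# [1] TRUE
```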


Adding New Stuff to MyMode

In the new version of MyMode, we have added a call to the built-in R function unique(), which returns the values of the vector it receives with duplicates removed

Don’t forget to highlight the whole function with your mouse, all the way from the first “M” to the final “}”, and then click the “Run” button just above and to the right of the code

You can avoid having to do that every time by ticking the “Source on Save” checkbox and then saving your code file after each change
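The Run button and Source on Save both ask R to read and execute the code in your file. From the console, the base R function source() does the same job. In the sketch below, a temporary file made with tempfile() stands in for your own saved script, and its contents are the current version of MyMode. The file location is an assumption, so substitute the path of your own file.

```r
# Stand-in for your saved script: write the current MyMode code to a temporary .R file
scriptPath <- tempfile(fileext = ".R")
writeLines(c(
  "MyMode <- function(myVector)",
  "{",
  "  uniqueValues <- unique(myVector)",
  "  return(uniqueValues)",
  "}"
), scriptPath)

# source() parses and runs every line in the file, which defines MyMode
source(scriptPath)
exists("MyMode")
```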

Run MyMode(tinyData) again from the R console command line and look at the result; you should be able to predict what it will be!

MyMode <- function(myVector)
{
  uniqueValues <- unique(myVector)
  return(uniqueValues)
}
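It helps to see how unique() behaves on its own before relying on it inside MyMode. This short sketch uses only base R. Note that the result keeps the order of first appearance rather than sorting the values, which matters on the next slide.

```r
tinyData <- c(1,2,1,2,3,3,3,4,5,4,5)

# unique() keeps the first occurrence of each value and drops the repeats
unique(tinyData)
# [1] 1 2 3 4 5

# The result follows the order of first appearance; it is not sorted
unique(c(5, 5, 1, 3, 1))
# [1] 5 1 3

# unique() works on character vectors too
unique(c("b", "a", "b"))
# [1] "b" "a"
```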


Finishing Up MyMode

We have added two new lines to this version: the first one is easy, and the second one is hard

The first line, uniqueCounts <- tabulate(myVector), counts how many times each whole number from 1 up to the largest value appears in myVector: the first element of the result is the number of 1s, the second is the number of 2s, and so on. For example, if there are a total of three 1s in the vector, then the first element returned by tabulate() would be three

The second line uses the [ ] notation to pick a single item out of uniqueValues, but which one? The function which.max() returns the index (i.e., the ordinal position) of the element with the largest value in its argument, uniqueCounts; that index then selects the corresponding element of uniqueValues

MyMode <- function(myVector)
{
  uniqueValues <- unique(myVector)
  uniqueCounts <- tabulate(myVector)
  return(uniqueValues[which.max(uniqueCounts)])
}
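The sketch below runs each step of the final MyMode by hand on tinyData, so you can see the intermediate values that the slide describes. It uses only base R. The extra tabulate(c(3, 3, 3)) call is an addition that shows tabulate() counts the whole numbers 1, 2, 3, ... rather than the distinct values in the vector.

```r
tinyData <- c(1,2,1,2,3,3,3,4,5,4,5)

# Step 1: the distinct values, in order of first appearance
uniqueValues <- unique(tinyData)

# Step 2: tabulate() counts each whole number 1, 2, ..., max(tinyData)
uniqueCounts <- tabulate(tinyData)
uniqueCounts
# [1] 2 2 3 2 2

# Whole numbers that never occur still get a slot, with a count of zero
tabulate(c(3, 3, 3))
# [1] 0 0 3

# Step 3: which.max() gives the position of the largest count
# (the first such position if there is a tie)
which.max(uniqueCounts)
# [1] 3

# Step 4: [ ] picks the value at that position
uniqueValues[which.max(uniqueCounts)]
# [1] 3
```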


Now Test!

Make sure to select all of your MyMode() code and click Run (or tick Source on Save and save the file)

Then test your final function using the R console command line; type MyMode(tinyData) just as before

You can try making more vectors like tinyData with different sets of numbers in them
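Here is a self-contained version of that test, with the final MyMode copied from the previous slide. The second vector, moreData, is a hypothetical example. Make up your own as well.

```r
# Final version of MyMode from the previous slide
MyMode <- function(myVector)
{
  uniqueValues <- unique(myVector)
  uniqueCounts <- tabulate(myVector)
  return(uniqueValues[which.max(uniqueCounts)])
}

tinyData <- c(1,2,1,2,3,3,3,4,5,4,5)
MyMode(tinyData)
# [1] 3

# Hypothetical extra test: 2 occurs three times, 1 occurs twice
moreData <- c(1, 1, 2, 2, 2)
MyMode(moreData)
# [1] 2
```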

Your goal is to try to “break” MyMode(), i.e., to find a flaw in it; the chapter in “Introduction to Data Science” exposes one of the flaws in this code
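If you want to compare notes after trying on your own, here is one input that breaks this version. It may or may not be the flaw the chapter discusses. The problem is that tabulate() counts the whole numbers 1, 2, 3, ... while unique() lists values in order of first appearance. The positions therefore only line up when the data happen to start with the small whole numbers in order. MyModeFixed() is a hypothetical name. Counting positions with the base R function match() is one possible repair, offered as a sketch rather than as the chapter's solution.

```r
MyMode <- function(myVector)
{
  uniqueValues <- unique(myVector)
  uniqueCounts <- tabulate(myVector)
  return(uniqueValues[which.max(uniqueCounts)])
}

# 5 is clearly the mode, but unique() gives c(5, 1) while tabulate()
# gives c(1, 0, 0, 0, 2), so which.max() points at position 5,
# which does not exist in uniqueValues
MyMode(c(5, 5, 1))
# [1] NA

# Hypothetical repair: count positions in uniqueValues instead of raw values
MyModeFixed <- function(myVector)
{
  uniqueValues <- unique(myVector)
  # match() turns each value into its position in uniqueValues, e.g. c(1, 1, 2)
  uniqueCounts <- tabulate(match(myVector, uniqueValues))
  return(uniqueValues[which.max(uniqueCounts)])
}

MyModeFixed(c(5, 5, 1))
# [1] 5
```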


Review

In this segment you installed R-Studio and fired it up

You created your first custom function, MyMode(), designed to calculate the statistical mode

You “sourced” MyMode() so that R became aware of the definition of the function and then you tested it with a little bit of data

If you followed along in “Introduction to Data Science” you found at least one way in which MyMode() failed as well as some suggestions for fixing it up


Chapter Challenge

The Chapter Challenge for this chapter of Introduction to Data Science asks you to create a function that creates a distribution of sampling means from an input vector

You will have to refer to the previous chapter to remind yourself of the code that creates sampling distributions of means

Hint: One of the most important things to think about early on is what arguments your function will need to receive; in this case you will obviously need to pass in the vector of data, but what else will the function need to know in order to create a sampling distribution?
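As one illustration of that hint, the sketch below assumes the function needs three things: the data vector, the size of each sample, and how many samples to draw. The names SampleMeans, sampleSize, and numSamples are hypothetical. The code also assumes sampling with replacement, using base R's sample() and replicate(). Check it against the previous chapter's code, which may differ.

```r
# Hypothetical function: builds a sampling distribution of means
SampleMeans <- function(myVector, sampleSize, numSamples)
{
  # Each repetition draws sampleSize values with replacement and averages them
  replicate(numSamples, mean(sample(myVector, size = sampleSize, replace = TRUE)))
}

set.seed(42)  # makes the random draws repeatable
means <- SampleMeans(c(1,2,1,2,3,3,3,4,5,4,5), sampleSize = 5, numSamples = 1000)

length(means)   # one mean per sample
summary(means)  # center and spread of the sampling distribution
```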