Slides to go with Chapter 9 of the eBook: Introduction to Data Science
JEFFREY STANTON
SCHOOL OF INFORMATION STUDIES
SYRACUSE UNIVERSITY
Installing and Using R-Studio
Overview of R-Studio
R-Studio is an “IDE” – an integrated development environment. As an IDE, R-Studio provides a convenient user interface for developing R code
R-Studio’s main screen is divided into four panes:
Upper left: Code Window
Lower left: R-Console
Upper right: Data Workspace and command history browser
Lower right: File browser, plots, package manager, help
Installing R-Studio
Make sure to install R first, before trying to install R-Studio; generally it makes sense to install or upgrade to the latest version of R before installing R-Studio
The free software download is available at http://www.rstudio.org/
If you reach a page where you are asked to choose between installing R-Studio server and installing R-Studio as a desktop application, choose the desktop application
After installing, run R-Studio and type a command in the console window such as “2+2”
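As a minimal sketch of that first check, the lines below are what you might type at the console prompt. The second command is an addition that is not in the original slides: R.version.string is a built-in R variable that reports which version of R is running underneath R-Studio, which is handy given the advice to install the latest R first.

```r
# Simple arithmetic at the console; R echoes the answer on the next line
2 + 2

# Assumption: an extra check, not from the slides, showing the R version in use
R.version.string
```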
Creating Your First Function
We are going to build up slowly towards creating a function that calculates the statistical “mode” (the most frequently occurring value in a vector)
The upper left hand pane displays a blank space under the tab title “Untitled1.” Click in that pane and type the code below:
MyMode <- function(myVector) { return(myVector) }
What Does it Do?
The name of the function is MyMode
The function receives one “argument” when it is called; within the function, the argument is known as myVector
The function does not do anything yet, except for returning a copy of myVector
MyMode <- function(myVector) { return(myVector) }
Before You Can Use Your New Function
Before you can actually “call” this function from the R command line, you have to tell R that it exists!
The way to do this is to highlight the whole function with your mouse – all the way from the first “M” to the final “}” – and then click the “Run” button just above and to the right of the code
You can check that your function is defined by looking in the Workspace area in the upper right pane, scrolling down to the Functions list, and seeing MyMode in the list
MyMode <- function(myVector) { return(myVector) }
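The Run button simply sends the highlighted code to R. As a rough sketch of the same idea from the console, the lines below write the function to a temporary file (a stand-in for your saved script; the file and the variable name scriptFile are hypothetical), load it with source(), and then confirm that R knows about the function.

```r
# Stand-in for a saved script file (hypothetical; your file will have its own name)
scriptFile <- tempfile(fileext = ".R")
writeLines("MyMode <- function(myVector) { return(myVector) }", scriptFile)

# source() runs every line in the file, which defines MyMode in the workspace
source(scriptFile)

# Both checks print TRUE once the function has been defined
exists("MyMode")
is.function(MyMode)
```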
Let’s Test it Out
Type the code below into the R console, which is the lower left pane; don’t type the “>” – that is the command prompt
The first line makes a small vector of numbers called “tinyData” using the “c()” concatenate function
The second line passes tinyData to our function
The R console will display the result: Can you predict what it will be?
> tinyData <- c(1,2,1,2,3,3,3,4,5,4,5)
> MyMode(tinyData)
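To see what c() does on its own, here is a small sketch that does not give away the answer to the question above. The vector names are made up for illustration.

```r
# c() combines individual values into one vector
smallVector <- c(10, 20, 30)

# c() also accepts existing vectors and joins them end to end
longerVector <- c(smallVector, 40, 50)

length(smallVector)   # 3
length(longerVector)  # 5
```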
Adding New Stuff to MyMode
In the code below, we have added a call to a built-in R function called unique() that returns the values in the vector it receives with the duplicates removed
Don’t forget to highlight the whole function with your mouse – all the way from the first “M” to the final “}” – and then click the “Run” button just above and to the right of the code
You can save yourself having to do that every time by clicking the checkbox “Source on Save” and then saving your code file after you make each change
Run MyMode(tinyData) again from the R console command line and see what the result looks like; you should be able to predict what it will be!
MyMode <- function(myVector){
  uniqueValues <- unique(myVector)
  return(uniqueValues)
}
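If you would like to see unique() by itself before running the new version of MyMode(), a sketch with a different, made-up vector follows. It keeps one copy of each value, in the order the values first appear.

```r
# A made-up vector with repeats (not the tinyData from the slides)
repeats <- c(7, 7, 8, 9, 8)

# unique() drops the duplicates and keeps first-appearance order
unique(repeats)   # 7 8 9
```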
Finishing Up MyMode
We have added two new lines to this version: The first one is easy, the second one is hard
The first line, uniqueCounts <- tabulate(myVector), counts up how many times each unique value appears in myVector; if the lowest element in the vector is 1 and there are a total of three 1s in the vector, then the first element returned by tabulate() would be three
The second line uses the [ ] notation to pick a single item out of uniqueValues, but which one? The function which.max() returns the index (i.e., the ordinal number) of the element with the largest value in its argument, uniqueCounts
MyMode <- function(myVector){
  uniqueValues <- unique(myVector)
  uniqueCounts <- tabulate(myVector)
  return(uniqueValues[which.max(uniqueCounts)])
}
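The two new pieces can be tried separately at the console. The sketch below uses a made-up vector and hypothetical variable names to show what tabulate(), which.max(), and the [ ] notation each return.

```r
# A made-up vector: three 1s, one 2, two 3s
demoVector <- c(1, 1, 1, 2, 3, 3)

# tabulate() counts how often 1, 2, 3, ... occur, in that order
demoCounts <- tabulate(demoVector)
demoCounts              # 3 1 2

# which.max() gives the position of the largest count, not the count itself
which.max(demoCounts)   # 1

# Square brackets pick out the element at that position
unique(demoVector)[which.max(demoCounts)]   # 1
```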
Now Test!
Make sure to select all of your MyMode() code and click Run (or use Source on Save and do a save)
Then test your final function using the R console command line; type MyMode(tinyData) just as before
You can try making more vectors like tinyData with different sets of numbers in them
Your goal is to try to “break” MyMode(), i.e., to find a flaw in it; the chapter in “Introduction to Data Science” exposes one of the flaws in this code
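One way to go hunting for a flaw is to compare what MyMode() reports against a plain count of the values. The sketch below does that with table(), a built-in R function that is not used in the slides; trialData is a made-up vector, and you should swap in your own.

```r
# Final version of MyMode() from the slides
MyMode <- function(myVector) {
  uniqueValues <- unique(myVector)
  uniqueCounts <- tabulate(myVector)
  return(uniqueValues[which.max(uniqueCounts)])
}

# A made-up test vector; change the numbers and their order to probe for problems
trialData <- c(4, 4, 1, 2, 4, 2)

# What MyMode() reports
MyMode(trialData)

# table() shows how many times each value really occurs, so you can check by eye
table(trialData)
```

If the two disagree for some vector, you have found a case worth studying.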
Review
In this segment you installed R-Studio and fired it up
You created your first custom-designed function, called MyMode() and designed to calculate the statistical mode
You “sourced” MyMode() so that R became aware of the definition of the function and then you tested it with a little bit of data
If you followed along in “Introduction to Data Science” you found at least one way in which MyMode() failed as well as some suggestions for fixing it up
Chapter Challenge
The Chapter Challenge for this chapter of Introduction to Data Science asks you to create a function that creates a sampling distribution of means from an input vector
You will have to refer to the previous chapter to remind yourself of the code that creates sampling distributions of means
Hint: One of the most important things to think about early on is what arguments your function will need to receive; in this case you will obviously need to pass in the vector of data, but what else will the function need to know in order to create a sampling distribution?
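As a reminder of the building blocks only, and not a solution to the challenge, the sketch below draws one random sample from a made-up vector and takes its mean, then repeats that step several times with replicate(). The data, the sample size, and the number of repetitions are arbitrary assumptions chosen for illustration; the previous chapter's own code may differ.

```r
set.seed(1)   # makes the random draws repeatable

# Made-up data standing in for whatever vector you want to sample from
exampleData <- c(2, 4, 4, 5, 7, 9, 10, 12)

# One sample mean: draw 4 values at random (with replacement) and average them
mean(sample(exampleData, size = 4, replace = TRUE))

# Many sample means: repeat the same step 50 times
manyMeans <- replicate(50, mean(sample(exampleData, size = 4, replace = TRUE)))
length(manyMeans)   # 50
```

Wrapping these steps in a function, and deciding which of the fixed numbers should become arguments, is the challenge itself.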