1 R BASICS
In this chapter, we offer a brief introduction to R. First, we argue why any business
school student should master at least one computer language. For students at Business
Analytics, Data Analytics and Data Science programs, they should understand two. To
support this argument, we will discuss an important concept: Open-Source. What is the
future of the business education? We think that there are three areas: domain knowledge,
Programming, and Data. Then, we will explain why the top two choices are R and Python
in terms of programming languages for business school students. After that, we will
explain how to install R software, how to launch and quit R, whether R is case sensitive,
and how to assign a value or values to a variable. In a sense, we assume that any reader,
especially a business-major student, knows nothing about this wonderful software. The
second half of the book/course is devoted to Python, starting from Chapter 14, Python basics.
In particular, this chapter covers the following:
• Business school students should master at least one programming language
• Comparisons between R, Python, Matlab and SAS
• The concepts of Open-Source
• Five steps to download and install R, how to launch and quit R
• R basics and some embedded functions
• Three ways to assign values to a variable
• Choosing meaningful variable names
• Numerical vs. string variables
• Inputting data via scan()
• Finding embedded n-letter functions
• Three types of computers: Windows, Mac and Chromebook
• A list of all chapters
• Finding help, hidden variables
• rm(list=ls()) vs. rm(list=ls(all=TRUE))
• RStudio
1.1 ONE PROGRAMMING LANGUAGE PLEASE
Our society has entered a so-called big data era. This means that we could use big data to
solve many issues, such as choosing a better way to develop good drugs, and optimizing
our operations by analyzing a huge amount of data. Using finance as an example, for a
topic called Financial Statement Analysis, traditionally, investors or financial analysts
just analyze a few companies. Is it possible to analyze ALL companies available?
Another example is to apply Benford's Law, also called the Law of the First Digit. It is
easy to apply it to a few dozen companies. Since the SEC (Securities and Exchange
Commission) makes all the financial statements available from 2009 onward, could we
apply Benford's Law to all public companies that filed Balance Sheets, Income
Statements and Cash Flow Statements in Q2 2021? By the way, in the second quarter of
2021, 73,662 companies (unique CIKs) filed various types of reports. The
related code can be found at the end of this chapter.
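Benford's Law states that a leading digit d (from 1 to 9) appears with probability log10(1 + 1/d). The following is a minimal sketch of that idea, not the program referred to above. The vector amounts is made-up illustrative data, not taken from any SEC filing.

```r
# Expected first-digit probabilities under Benford's Law
d <- 1:9
benford <- log10(1 + 1/d)
names(benford) <- d
round(benford, 3)

# Hypothetical amounts, for illustration only (not real financial data)
amounts <- c(1234, 187, 2045, 96, 1520, 310, 118, 7421, 1999, 265)

# Extract the leading digit of each amount
first_digit <- as.integer(substr(as.character(amounts), 1, 1))

# Observed share of each leading digit, 1 through 9
observed <- table(factor(first_digit, levels = d)) / length(amounts)
observed
```

With real data, the observed shares would be compared with the expected ones; a large gap for many digits is a signal that the numbers deserve a closer look.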
Using the SEC (the US Securities and Exchange Commission) quarterly indices as an
example, could our students process those files in the first place? There is a research area
called Market Microstructure. The database used by this area is called the TAQ (Trade
and Quote) Database. It is huge: several gigabytes for just one
day’s data. The third example is related to the Census data. Could we link the
demographic data from the Census data in 2010 and 2020 to predict the success or failure
of private schools? To accomplish those tasks, researchers/students need a programming
language. It is our prediction that within 5 years, all business school students would be
required to learn one programming language. Later in this book we will show how to
process relatively big data sets such as the SEC filings, Census and NYSE high-
frequency data.
1.2 THE CONCEPT OF OPEN SOURCE
Open source means free and available for everyone to use. We use YouTube as an
example. Many of us have watched many interesting, educational or funny videos. To
edit those videos, individual producers could use free software to do so. The top three
free video editing programs are Shotcut, DaVinci Resolve 17, and Lightworks. For
the open software component, we have R, Python, Perl, Octave, and Julia, all free. For
open data, we will devote a whole chapter to it: Chapter 3, Open-source data. Open code
suggests that researchers and users share their code (programs) with other potential users.
A typical example is GitHub, which describes its objective as follows:
GitHub is a development platform inspired by the way you work.
From open source to business, you can host and review code,
manage projects, and build software alongside 50 million developers.
For this course/book, we focus on R and Python, plus business-related open-source
data. We have generated over 1,000 programs written in R and
Python. In addition, we have generated three utility functions to search, show and
download those programs. We will explain those three functions in later chapters.
1.3 FUTURE OF BUSINESS EDUCATION: 3 AREAS
For business education at the college/university level over the next 5 to 10 years,
we could summarize the key areas as follows: domain knowledge,
programming and data skills. Using finance as an example, this combination is called OSF (Open
Source Finance); see Kane and Masters (2009). Its 3-word summary is Finance,
Programming, and Data. Those three areas are vitally important for the next level of
development. The first word represents our current situation/design.
Word #1: Finance
Students will take related courses, such as Corporate Finance, Investment,
Portfolio Theory, Financial Modeling, Options Theory, Econometrics, Fixed
Income, and Business Statistics.
Word #2: Programming
As mentioned in the previous section, business school students should master
at least one programming language. Students in Business Analytics,
Data Analytics or Data Science programs should understand at least 2
computer languages. Among the many good open-source languages, R and Python
are the top two choices.
Word #3: Data
Students will be trained to use their programming skills to handle big data sets
by writing various programs. Here are several typical examples: Download all
the SEC quarterly index files from Q1 1993 up to Q2 2021, and generate
related R and Python (Pickle) data sets (see the data sets at
https://www.sec.gov/Archives/edgar/full-index/). Download one or two days
of high frequency data at
ftp://ftp.nyxdata.com/Historical%20Data%20Samples/, then estimate stock
spreads by using bid and ask prices. Download and process the 2010 Census
Summary File 1 data sets at
https://www.census.gov/data/datasets/2010/dec/summary-file-1.html.
Students at research schools should understand the CRSP (Center for
Research in Security Prices, a financial database maintained by the University of
Chicago) Database, Compustat (an accounting database maintained by
Standard & Poor's) and the TAQ (Trade and Quote, a high-frequency database
maintained by the New York Stock Exchange) Database. For the CRSP Database, see
our paper titled "CRSP for Teaching", Yan (2018).
1.4 DATA, INFORMATION AND DECISION
According to Merriam-Webster, data could be defined as: facts or information used
usually to calculate, analyze, or plan something. Data could be classified into two big
categories: numeric and non-numerical data. For example, U.S. GDP growth rate for
2020 was -3.49%, a 5.65% decline from 2019. A typical data set is a two-dimensional data
set. For example, from Yahoo!Finance, we could find the historical data for IBM. The top
part is shown below.
A typical example for non-numerical data is speech. Anand et al. (2021) examine the
impact of financial disclosures’ readability on future shareholder activism, as expressed
by shareholder-initiated proxy proposals. They find that the semantic complexity of the
MD&A section of the 10-K filings significantly predicts future shareholder proposals.
MD&A (Management Discussion and Analysis) is a section, of a public company's
annual report or quarterly filings, where the management discusses the company’s
performance. 10-K is the annual financial statement submitted to the SEC.
1.5 WHY R?
In the previous sections, we mentioned a few languages: R, Python, Octave, Perl and
Julia. R and Python are the top choices for many master’s degree programs from
quantitative finance, business analytics and data science. Matlabhelp.com (2021) has the
following comparisons between Matlab, R, and Python.
Table 1.1 Comparisons between Matlab, R and Python
MATLAB | R | Python
MATLAB is a commercial tool, therefore, it is not open-source. | R is not a commercial tool or programming language, therefore, it an open and free source. | Python is not a commercial tool or programming language, therefore, it an open and free source.
MATLAB speed is not faster than R. | R speed is not faster than Python. | Python speed is much high than R and MATLAB.
MATLAB is used in many applications which include voice recognition, image processing, and many more. | R is only used in statistical analysis. | Python is the same used as MATLAB but its major application or basic use is in web designing or programming.
MATLAB is easy to use than R and Python. | R is slightly difficult for understanding and writing a code than MATLAB and Python. | Python is less difficult than R and MATLAB.
MATLAB has various functions library. | Same as MATLAB R also have a wide range of libraries. | Same as MATLAB and R Python also have a vast library.
MATLAB is a high-level language. | R is not a high level or low-level language it is interpreted language. | Same as MATLAB and R Python also have a vast library.
Hayes (2019) shows the following image. From it, we can see that Python and R are
number 1 and number 3, respectively.
Based on our many years of teaching experience, we have the following table. SAS and
Matlab are both expensive. However, they are used intensively by the financial
industry; SAS in particular is used intensively by the banking industry. Since they are paid
software, their support is the best. In the following table, 5 is the best grade and 1 is
the worst.
Table 1.2 Comparison between R, Python, Matlab and SAS.
                    R    Python   SAS   Matlab
Cost                5    5        2     2
Easy to learn       5    4        2     2
Data handling       4    4        5     2
Big data handling   2    2        5     2
Current Job         4    4        5     5
Future trend        5    5        3     4
Support             2    2        5     5
In the above table, there are two entries related to data: Data handling and Big data
handling. Big data is defined here as a data set of several gigabytes. For example, the
daily TAQ (Trade and Quote) data could be treated as big data in terms of this
book/course. Based on our multi-year experience of teaching programming languages to
finance students, R is the best one to start with. There are several reasons behind
this. First, R is relatively easy to learn, compared with Python. Second, R is used quite
intensively in many areas. Third, many R packages, 18,651 available as of 1/4/2022,
could greatly help new learners.
1.6 RELATED COURSE MATERIALS
To help students take this course online, we have generated many data sets in
various forms, such as csv, RData, pickle (Python data), and sas7bdat (SAS data format).
For more detail, please see Chapter 3, Open-source data. Since for each lecture, we
expect to have at least two in-class exercises, we have generated many in-class exercises.
Typing .ice, readers will see a list. For the first week (chapters 1 and 2), we have the
following result.
Hands-on practice is the most important aspect of learning a computer language. In our paper,
Teaching programming skills to finance students: how to design and teach a great
course, we have summarized 7 critical factors for a successful programming course.
Those 7 factors are: strong motivation, a good textbook, a hands-on learning
environment, being data-intensive, a challenging term project, multiple supporting R
datasets, and an easy way to download such R datasets. In this book, we aim to address
those factors. We have produced at least one video for each chapter.
1.7 ALL CHAPTERS
The list of chapters for the whole course/book is shown below.
The whole book is divided into three parts. Part I is for R from chapters 1 to 13, and Part
II is related to Python from Chapter 14 to Chapter 21. Part III shows a few good projects.
This book is designed for a one-semester course (including both Parts I and II). For each
week, students are expected to learn 2 chapters.
1.8 HOW TO DOWNLOAD AND INSTALL R
To install R, we have the following 5 steps.
Step 1: Go to http://www.r-project.org
Step 2: Click "CRAN" under "Download" (on the left-hand side)
Step 3: Choose a mirror address
Step 4: Choose the appropriate software (PC, Mac)
Step 5: Click "base"
When done, an R icon will appear on your desktop.
1.9 HOW TO LAUNCH R AND QUIT R?
To start R, double click the icon on your desktop.
To quit, just type q() from the R prompt (>).
> q() # first way to quit
Anything after # is a comment.
# this is a comment line
# > is the R prompt
When quitting, the program will ask "Save workspace image?", that is,
whether to keep all your variables and functions for future use. At this stage,
just answer no.
See below for another way to quit.
# [click] "file" on the menu bar - - > "exit"
To quit R without saving, we use the q("no") command.
> q("no") # quit R without saving variables and functions
> q("yes") # quit R and keep variables and functions
1.10 POWER FUNCTION: ^ or **
For the power function, we can use either ^ or **, shown below.
>2^3
[1] 8
> 10**2
[1] 100
The first one, 2^3, means 2 raised to the power of 3, i.e., 2*2*2=8.
1.11 3 WAYS TO ASSIGN A VALUE TO A VARIABLE
The first way to assign a value to a variable is to use "<-".
> x<-10
To show the value of a variable, simply type its name.
> x
[1] 10
The second and third ways to assign a value to a variable use "=" and "->".
> y=2
> 10->x
The “->” assignment could make our debugging efforts easier. Assume that we want to
test a program to estimate the present value of $100 received in two years with an 8%
annual discount rate. The related formula is shown below.
$$pv = \frac{fv}{(1+R)^n}, \qquad (1)$$
where pv is the present value, fv is the future value, R is the period rate, and n is the
number of periods. We could type the following code to get our result.
> 100/(1+0.08)^2
[1] 85.73388
Suppose we then change our mind and want to assign the result to a variable, such as pv.
To save time, we simply use the up-arrow key to recall the previous command, then
add "->pv" at the end of the above command.
> 100/(1+0.08)^2->pv
> pv
[1] 85.73388
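The same calculation can also be written with named variables, which keeps formula (1) visible in the code. This is a minimal sketch; the names fv, R and n simply follow the symbols defined for formula (1).

```r
fv <- 100     # future value
R  <- 0.08    # discount rate per period
n  <- 2       # number of periods

pv <- fv / (1 + R)^n   # formula (1)
pv                     # 85.73388

pv * (1 + R)^n         # compounding pv forward for n periods recovers fv
```

Writing the inputs as variables makes it easy to change one of them, such as the rate, and recall the formula with the up-arrow key.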
To assign a set of values, we use c(1,2.6,4.3,5.25), where “c” stands for concatenate.
> X<-c(1,2,4,6) # assign a vector (column values)
To assign a set of consecutive integers, we could use n1:n2, such as 1:10.
> y<-1:50
> x<-c(1:5,8:12)
> x
[1] 1 2 3 4 5 8 9 10 11 12
We can input data from high to low, i.e., reversing the order.
> y<-5:1
The rev() function could be used to reverse an input data set.
> x<-5:1
> x<-rev(1:5) # same as the above
Try the following code and print x to see the result.
> x<-1.5:10
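For reference, the colon operator steps by 1 from the starting value and stops before exceeding the end value, so the starting value does not have to be an integer. A small sketch of what the code above produces:

```r
x <- 1.5:10
x            # 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
length(x)    # 9
```

The last value is 9.5 rather than 10, because the next step (10.5) would go past the end value.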
1.12 CASE SENSITIVE, AND PUT SEVERAL COMMANDS ON ONE-LINE
In R, we don't need to declare a variable before assigning a value to it.
# a variable is not formally defined before its assignment
> fv<-100
R is case sensitive, which means that upper-case X and lower-case x are different variables.
> x<-10 # lower case x
> X # capital letter of x
Error: object 'X' not found
To put several R commands on one-line, semi-colons are used.
> fv<-10; pv<-100; n<-10; rate<-0.05
1.13 ls() AND rm() FUNCTIONS
Sometimes, we need to check all existing variables (objects). For this reason, we use the
ls() function.
> ls()
When a variable is no longer needed, we could remove it from the memory.
> rm(x) # remove variable called x
To remove several variables (objects) simultaneously, we use commas to separate them.
> rm(x,y,pv) # remove x, y and pv
To remove all variables (objects), we have the code below.
> rm(list=ls()) # remove all variables (objects)
The 2nd way to remove all objects (variables) is given below.
# [click] "Misc" - - > "Remove all objects … "
To print a character variable (a string) on the screen, we could use the functions cat() or
print(). Remember to enclose our sentences in double or single quotation marks.
> cat("hello, world!\n\n\n") #\n is for a new line
hello, world!
>
In the above output, there are two blank lines. The print() function could also be used.
> print('hello R!')
[1] "hello R!"
Note that print() does not interpret "\n" as a new line; it shows the characters as typed.
> print('hello world\n')
[1] "hello world\n"
We could also print a defined variable.
> x<-'this is great'
> print(x)
[1] "this is great"
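Both functions can also display numbers, and cat() can combine text and values in one call. The following is a small sketch; the variable pv and its value are reused from the present value example purely for illustration.

```r
pv <- 85.73388

# cat() joins its arguments, separating them with a space
cat("The present value is", pv, "\n")
cat("pv =", round(pv, 2), "\n")

# print() shows one object and adds the [1] index
print(pv)
```

For messages that mix text and results, cat() is usually the more convenient of the two.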
1.14 NEXT LINE SYMBOL (+), BACK TO THE R PROMPT
When one command occupies multiple lines, the symbol + will appear. Assume that we
intend to assign 1 to 10 to x.
> x<-1:10
Suppose we hit the enter key before finishing the whole command, as shown
below. In other words, we use several lines to finish the command.
> x<-1:
+ 10
> x
[1] 1 2 3 4 5 6 7 8 9 10
Often, especially as beginners, we type a few wrong keys, such as a double or
single quotation mark without a matching one. Sometimes, we simply don't want to figure
out where the issue is since it might be too time-consuming. Instead, we just want to go
back to the R prompt and retype the command. In those cases, we hit the 'Esc' key, at
the top-left of our keyboard, to return to the R prompt (>).
> x<-'9"(999asdfklj
+
> # use 'Esc' to come back to the R prompt
1.15 seq() FUNCTION
The seq() function is used to generate a set of values.
> x<-seq(1, 19, by = 2)
> x
[1] 1 3 5 7 9 11 13 15 17 19
The following command uses pi as the increment.
> x<-seq(1, 11, by = pi)
> x
[1] 1.000000 4.141593 7.283185 10.424778
The complete command has the following format.
>seq(from=1,to=3, by =0.5)
[1] 1.0 1.5 2.0 2.5 3.0
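Besides by, seq() also accepts length.out, which fixes the number of values instead of the step size, and by can be negative. A brief sketch:

```r
x <- seq(0, 1, length.out = 5)   # five equally spaced values from 0 to 1
x                                # 0.00 0.25 0.50 0.75 1.00

y <- seq(10, 1, by = -3)         # a negative step counts downward
y                                # 10 7 4 1
```

Using length.out is handy when we know how many points we need, for example for a grid of interest rates, but not the spacing between them.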
1.16 rep() AND length() FUNCTIONS
The rep() function is used to repeat the same value n times, shown below.
> x=rep(0,100)
> head(x)
[1] 0 0 0 0 0 0
To find the number of values (observations) for a vector, we apply the length()
function, shown below.
> length(x)
[1] 100
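The rep() function can also repeat a whole vector, and its times and each arguments control how the repetition is done. A short sketch:

```r
a <- rep(c(1, 2), times = 3)   # repeat the whole vector: 1 2 1 2 1 2
b <- rep(c(1, 2), each = 3)    # repeat each element:     1 1 1 2 2 2
a
b
length(a)                      # 6
```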
1.17 USING MEANINGFUL VARIABLE NAMES
For clarity, it is always a good idea to choose meaningful variable names, such as pv for
present value, fv for future value, pv_f for the present value function, and
pv_annuity_f for the present value function for annuity. By using those names, we and
other users would understand programs more easily.
1.18 POSITION AND KEYWORD APPROACH
There are two ways to input data: position and keyword. In the following one-line code,
we use the position-variable approach. In other words, the meaning of the input variable
depends on its position in the set of input variables.
> x<-seq(1,3,0.5) # position variable approach
For the keyword approach, we add a keyword in front of each input value, such as
from=1. One advantage of the keyword approach is that the order of input variables does
not matter. Hence the following three statements are equivalent.
> seq(from=1,to=3,by=0.5) # they are equivalent
> seq(to=3,from=1,by=0.5)
> seq(by=0.5,to=3,from=1)
In the next chapter, we will come back to these two types of input methods when
discussing how to write our own functions.
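The equivalence of the two approaches can be checked with identical(), which returns TRUE only when two objects are exactly the same. The sketch below also shows that the two approaches can be mixed: positional values first, then keywords.

```r
a  <- seq(1, 3, 0.5)                     # position approach
b  <- seq(by = 0.5, to = 3, from = 1)    # keyword approach, in any order
c1 <- seq(1, 3, by = 0.5)                # mixed: positions first, then a keyword

identical(a, b)     # TRUE
identical(a, c1)    # TRUE
```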
1.19 INPUTTING DATA VIA scan()
Another easy way to input data from your keyboard is to use the scan() function.
> x<-scan()
1: 1
2: 3
3: 4
4: 2.5
5: 5
6:
Read 5 items
> x
[1] 1.0 3.0 4.0 2.5 5.0
For inputting multiple columns, we can input them as a vector first. Then use the
matrix() function to convert it. The desired input format (two columns) is given in the
right panel below.
> x<-scan()
1: 1 3 3 6 5 6 7 8
9:
Read 8 items
> y<-matrix(x,4,2,byrow=T)
> y
[,1] [,2]
[1,] 1 3
[2,] 3 6
[3,] 5 6
[4,] 7 8
In the above example, we convert a vector by row. On the other hand, if we convert a
vector by column, we must change our code a little bit.
> x<-scan()
1: 1 3 5 7 3 6 6 8
9:
Read 8 items
> y<-matrix(x,4,2,byrow=F) # or use default y<-matrix(x,4,2)
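The two conversions above lead to the same matrix, which the following sketch verifies. To keep it reproducible without keyboard input, it uses the text argument of scan(), which reads values from a string; quiet=TRUE suppresses the "Read 8 items" message. The names x1, x2, y1 and y2 are chosen here for illustration only.

```r
# scan(text = ...) reads from a string instead of the keyboard
x1 <- scan(text = "1 3 3 6 5 6 7 8", quiet = TRUE)  # values listed row by row
x2 <- scan(text = "1 3 5 7 3 6 6 8", quiet = TRUE)  # values listed column by column

y1 <- matrix(x1, 4, 2, byrow = TRUE)
y2 <- matrix(x2, 4, 2)          # byrow = FALSE is the default

identical(y1, y2)               # TRUE
dim(y1)                         # 4 2
```

Spelling out TRUE and FALSE instead of T and F is a little safer, since T and F are ordinary variables that could be overwritten.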
1.20 GETTING DATA FROM THE clipboard
Here we discuss a very simple way to get data from Excel. This method works for small
data sets. Assume that we have the following Excel spread sheet.
We can highlight and copy the data, then issue the following command (for Windows
users).
> x<-read.table("clipboard")
> x
V1 V2
1 1 2
2 3 4
3 5 6
Mac users should use the following commands to read data from the clipboard.
data <- read.table(pipe("pbpaste"))
data <- read.table(pipe("pbpaste"), header=T)
data <- read.table(pipe("pbpaste"), sep="\t", header=T)
Note that pipe("pbpaste") is the proper way to address the clipboard in Mac OS X,
while in Windows that would be "clipboard", as shown above. The
'\t' stands for a tab. Assume that our data might have a header (i.e., column
names), shown below.
For this case, we simply add "header=T", shown below. Here the T stands for TRUE.
> x<-read.table("clipboard", header=T)
> x
date ret
1 19990102 0.0034
2 19990202 0.0451
3 19990204 -0.0023
The same approach works when we copy data from a Notepad or MS Word file. We
should pay attention to the last row. Two different ways in which the last line can end are
shown below.
For the format shown in the above left panel, we will get the following warning message
when issuing the command x<-read.table("clipboard"). Fortunately, the variable
will still take its intended values.
> x<-read.table("clipboard")
Warning message:
In read.table("clipboard") :
incomplete final line found by readTableHeader on 'clipboard'
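Since the clipboard differs across operating systems, a platform-independent way to practice is to pass the copied text directly through the text argument of read.table(). The sketch below is only an illustration: the variable txt is a stand-in for what would be on the clipboard, using the same three rows as the header example in this section.

```r
# A stand-in for the clipboard content: a header line plus three rows
txt <- "date ret
19990102 0.0034
19990202 0.0451
19990204 -0.0023"

x <- read.table(text = txt, header = TRUE)
x
names(x)    # "date" "ret"
nrow(x)     # 3
```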
1.21 USING R AS A CALCULATOR
R can be used as a calculator since it is straightforward to call various embedded R
functions. For example, the mean() function is for the average.
> x<-1:50
> mean(x)
[1] 25.5
You can try other functions as well, such as max(), min(), median(), sd() and
var().
> x<-1:50
> max(x)
[1] 50
> min(x)
[1] 1
> median(x)
[1] 25.5
> sd(x)
[1] 14.57738
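These functions are related to each other, which gives a quick way to check our understanding. For instance, the standard deviation is the square root of the variance, and the mean is the sum divided by the number of observations. A short sketch:

```r
x <- 1:50

var(x)                           # 212.5
sqrt(var(x))                     # 14.57738, the same as sd(x)
all.equal(sd(x), sqrt(var(x)))   # TRUE

sum(x) / length(x)               # 25.5, the same as mean(x)
```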
The following table summarizes a set of the most widely used functions.
Table 1.3: A list of some basic functions
Function        Meaning                              Examples
mean(x)         Mean                                 x<-1:10;mean(x)            # [1] 5.5
median(x)       Median                               x<-1:10;median(x)          # [1] 5.5
min(x)          Minimum                              x<-1:10;min(x)             # [1] 1
max(x)          Maximum                              x<-1:10;max(x)             # [1] 10
var(x)          Variance                             x<-1:10;var(x)             # [1] 9.166667
sd(x)           Standard deviation                   x<-1:10;sd(x)              # [1] 3.02765
exp(x)          Exponential function                 exp(2.3)                   # [1] 9.974182
log(x)          Natural log function                 log(4.5)                   # [1] 1.504077
log10(x)        Log function based on 10             log10(4.3)                 # [1] 0.6334685
sum(x)          Take the summation                   x<-1:10;sum(x)             # [1] 55
sort(x)         Sort in ascending order              x<-c(6,-1,3);sort(x)       # [1] -1 3 6
range(x)        Range of a variable                  x<-1.5:10;range(x)         # [1] 1.5 9.5
diff(x)         Difference for a vector              x<-c(1,2.3,4.5);diff(x)    # [1] 1.3 2.2
ceiling(x)      Smallest integer not less than x     x<-9.5;ceiling(x)          # [1] 10
floor(x)        Largest integer not greater than x   x<-9.5;floor(x)            # [1] 9
as.integer(x)   Take the integer value               x<-9.5;as.integer(x)       # [1] 9
prod(x)         Get product of a vector              x<-1:3;prod(x)             # [1] 6
quantile(x)     Quantiles                            x<-1:100;quantile(x)
                                                     #   0%    25%    50%    75%   100%
                                                     # 1.00  25.75  50.50  75.25 100.00
Sometimes, we need to change the directory for convenience. The related procedure is
given below.
# change the directory
# [click] File -- > "Change dir…" [choose your working directory]
1.22 USING THE UP- AND DOWN-ARROW KEYS
We can use the up- and down-arrow keys to recall the previous command and modify it.
> x<-1:500
> y<-10:510
After issuing a set of command lines, we can use both the Up and Down Arrow keys to
move back and forth to recall and correct the ‘old’ commands. This is extremely
convenient to check and modify our code since we could recall the previous command
with a new input or a minor modification.
1.23 FINDING HELP
There exist several ways to find information for a specific R function. If we are interested
in the mean function, we could issue ?mean, help(mean) or example(mean).
>?mean
The command help(mean) achieves the same goal as ?mean.
>help(mean)
To get examples for a specific function, we can use the example() function.
>example(mean)
We can also use the help on the menu bar.
> # [click] "Help" - -> "FAQ on R"
The following picture shows all the entries after clicking the “help” icon on the menu
bar.
When unsure about the spelling of a function in question, we use the apropos() function.
> apropos("mea")
[1] "colMeans" "influence.measures" "kmeans"
[4] "mean" "mean.data.frame" "mean.Date"
[7] "mean.default" "mean.difftime" "mean.POSIXct"
[10] "mean.POSIXlt" "rowMeans" "weighted.mean"
A related function is find(), which reports where a given object is defined, e.g., find("mean") returns "package:base".
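The following sketch shows a few ways of searching by name. The exact list returned by apropos() depends on the R version and on which packages are loaded, so no full output is shown here.

```r
hits <- apropos("mea")     # all names containing "mea"
"mean" %in% hits           # TRUE

apropos("^mean")           # a regular expression: names starting with "mean"

find("mean")               # where the object is defined: "package:base"
```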
1.24 .nLetterFunctions()
To help readers/students collect all n-letter embedded functions, we have generated a
function called .nLetterFunctions. For example, to get all 4-letter functions, we
issue .nLetterFunctions(4), shown below.
Note that there is a dot in front of nLetterFunctions. For each function, we can use the
help() function to get more related information, such as help(week).
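For readers who have not yet loaded the course utilities, a rough equivalent can be sketched with base R alone. This is not the implementation of .nLetterFunctions(); it is only an approximation that lists the names in the base package with exactly n characters, and that list also includes a few objects that are not functions.

```r
n  <- 4
nm <- ls("package:base")       # names of all objects in the base package
four <- nm[nchar(nm) == n]     # keep the names with exactly n characters

head(four)
length(four)                   # the count depends on the R version
"mean" %in% four               # TRUE
```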
1.25 HIDDEN VARIABLES AND FUNCTIONS
If we want to define a hidden variable, we could start our variable name with a dot, shown
below.
> x<-10
> .x<-100
For the above code, x and .x are different variables. We can use ls() to see the
existence of x.
> ls()
[1] "x"
To see all hidden variables or functions, we must specify all=TRUE, shown below.
> ls(all=TRUE)
[1] ".x" "x"
To remove all our defined variables or functions, we can use rm(list=ls()). However,
if we want to remove all objects including hidden ones, we specify all=TRUE.
> ls(all=TRUE)
[1] ".x"
> rm(list=ls(all=TRUE)) # remove all including hidden ones
> ls(all=TRUE)
character(0)
Another way to remove all objects, including hidden ones, is to click "Misc" on the
menu bar, then "Remove all objects". In addition to one dot, we could have multiple
dots in front of a variable or function name, shown below.
> ..path<-"http://datayyy.com/"
> ...path<-"http://datayyy.com/"
> ....path<-"http://datayyy.com/"
> ls(all=T)
[1] "....path" "...path" "..path" ".m" ".mean" "a"
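One detail about the code in this section: the full name of the argument of ls() is all.names. Writing all=TRUE works because R allows an argument name to be abbreviated when the abbreviation is unambiguous. A small sketch, using the same x and .x as at the start of the section:

```r
x  <- 10
.x <- 100

ls()                      # hidden names such as .x are not listed
ls(all.names = TRUE)      # the full argument name; all=TRUE is an abbreviation

exists(".x")              # TRUE: a hidden variable is an ordinary variable
.x + x                    # 110
```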
1.26 RStudio
It is worth mentioning a platform called RStudio as an alternative to the R console. From
their webpage at https://rstudio.com/products/rstudio/download/, readers/students
can download and install it. After launching RStudio for the first time, we will see the
following three panels.
The left-panel is quite like our normal R console. We can type our code there. After
trying something on the left panel, those values will show up in the top right panel,
shown below.
On the left panel Show on the top right panel
To write a savable program, click "File", then "New". We will see a fourth panel
pop up, shown below.
There are several advantages of using RStudio. First, we have several panels to work with.
Second, its color-coded display reduces the chance of typing errors. Third, when typing,
some symbols might pop up automatically. According to Barr (2013), there are 6 reasons
why we should use RStudio. Another advantage is that the layout of RStudio is like that
of Spyder, a Python editor. One disadvantage is that RStudio is more complex than the R
console. Occasionally, some code does not work on RStudio as it does on the R
console.
1.27 MORE ON HELP
After clicking "Help" on the menu bar of the R console, we will see FAQ on R, FAQ on
R for Windows, manuals (in PDF), etc., shown below.
On the other hand, the bottom right panel in RStudio is a help window, shown below.
1.29 WINDOWS, MAC AND CHROMEBOOK
When learning a programming language, readers/students need a personal computer to
write their code and run programs. We have two main types of computers: a PC (Windows)
and a Mac. For most functionalities, there is no difference between them. On the other
hand, there exist some minor differences, such as how to define a path and where to find our
downloaded data sets or programs. To help Mac users, we have generated a help menu.
By typing .macUsers (or .mac as a shortcut), the following menu would pop up.
A few students might use Chromebooks. A Chromebook is essentially a laptop with
Google’s Chrome OS (Operating System) on it instead of Windows or MacOS. Dube
(2020) discussed 5 pros and cons of using a Chromebook. To help Chromebook users, we
have generated a help menu. By typing .chromebook (or .cb for a short-cut), the
following menu would pop up.
One alternative for both Mac and Chromebook users is to use a so-called Virtual Lab
discussed in the next section.
1.30 VIRTUAL LAB
This section is optional for students/readers who can run both R and Python on their
own computers. The procedure is different for students/readers who are outside Geneseo. For
readers/students at Geneseo, go to the website at http://go.geneseo.edu/publicvirtuallab,
shown below.
After clicking “Public Lab”, we will see the following image.
Check the box for “File transfer”, then click on “Allow”. Enter your Username and
Password, then click on “Submit”, shown below.
We will have the following desktop.
After clicking on the R icon, we can launch R (not shown here to save space). To sign off
this virtual lab, click the appropriate icon on the top right, shown below.
Later in the book, we will explain how to upload our data and files to this virtual lab, and
how to output our result or data sets to OneDrive and other personal devices. In Chapter
14, Python Basics, we will explain how to launch Python by using this virtual lab.
1.31 SUMMARY
In this chapter, we discussed how to install R, its basics and value assignments. Those are
the most basic concepts for learning a computer language. Later in the book/course, we
will use those concepts repeatedly. Thus, a new user should not lose confidence if
he/she feels overwhelmed by those new concepts.
In the next chapter, Chapter 2, Simple functions using R, we will discuss how to write
a one-line R function. Then, we will explain how to extend it to a multi-line function. In
addition, we explain how to add comments to make our functions more readable. For
example, we could add the objective of the program, the formula used for the function,
definitions of input variables, any default values, and a few concrete examples. Because
of those extra comments or help, our functions become self-explanatory.
REFERENCES
Anand, Abhinav, Xing Huan, and Jalaj Pathak, 2021, Does financial disclosure
readability predict shareholder activism?
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3991459
Barr, Andrew, 2013, Top 6 reasons you need to be using RStudio,
https://www.r-bloggers.com/2013/02/top-6-reasons-you-need-to-be-using-rstudio/
Blackmagic, 2021, DaVinci Resolve 17,
https://www.blackmagicdesign.com/products/davinciresolve/
Dube, Kat, 2020, 5 Pros And Cons Of Using A Chromebook,
http://datayyy.com/webs/chromebook.html
Github, 2021, http://github.com
Hayes, Bob, 2019, Programming Languages Most Used and Recommended by Data
Scientists,
https://businessoverbroadway.com/2019/01/13/programming-languages-most-used-and-recommended-by-data-scientists/
Kane, David and Joseph D. Masters, 2009, Open Source Finance, Investing,
https://joi.pm-research.com/content/18/1/92
Lightworks, 14, https://www.lwks.com
Matlabhelp.com, 2021, Comparison of R, MATLAB and Python,
https://www.matlabhelp.com/comparison-of-r-matlab-and-python/
Merriam-Webster, data’s definition, https://www.merriam-webster.com/dictionary/data
Shotcut, 2021, http://shotcut.com
Yan, Yuxing, 2018, CRSP for Teaching, SSRN working papers,
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3303504
Yan, Yuxing, 2016, Teaching programming skills to finance students: how to design and
teach a great course, https://link.springer.com/article/10.1186/s40854-017-0081-x
Video #1: DANL100 Chapter 1: R basics
https://www.youtube.com/watch?v=xP6PTpcuciU
Video #2, How to download and install R (6m39s) (for Windows)
https://www.youtube.com/watch?v=ZoPJGmpYJzw
Video #3, Programming in R - Getting Started - Installing R and RStudio on a Mac
(5m59s), https://www.youtube.com/watch?v=Ywj6yNfc5nM
APPENDIX A: Copy and paste the following line onto the R window.
source("http://datayyy.com/rpy/week1.txt")
Appendix B: Finding the contents of Chapter 1.
To view the contents of Chapter 1, we type .c1. Note that there is a dot in front of c1.
Appendix C: Typing .uu to see the menu related to utility functions, shown below.
EXERCISES
1.1 What are the advantages of using R?
1.2 What does ‘Open-Source’ mean?
1.3 Please offer a few examples of open-source software.
1.4 What are three types of computers?
1.5 Compare R with Python and SAS. What are their advantages and disadvantages?
1.6 What is the home page of R?
1.7 How would we assign a value to a new variable?
1.8 How many ways are there to assign values to a new variable?
1.8 What is the difference between ls() and rm()?
1.9 Generate a vector from 2 to 15, then from 20 to 40. Estimate its mean, standard
deviation and median.
1.10 What might be the disadvantages of using R?
1.11 Is R case-sensitive?
1.12 Is R free?
1.13 How would we get help for R?
1.14 How would we add a comment?
1.15 Is it difficult to install R?
1.16 Will the R software compile a comment line?
1.17 Does a space play a role in R’s command?
1.18 How would we download manuals related to R?
1.19 Input values for x range from 1 to 100 and 202 to 300.
1.20 Reverse the input values in the above exercise.
1.21 How many 5-letter embedded functions are there in R?
1.22 How would we find the titles of all chapters?
1.23 How would we get support when using R?
1.24 What is RStudio? From where could we download and install it?
1.25 What are the advantages of using Rstudio?
1.26 How could colorful code help a programmer?
1.27 What is the meaning of the following code?
>q("no")
1.28 From where could we find help for learning R?
1.29 According to Barr (2013), what are the advantages of using Rstudio?
1.30 What are the usages of the up- and down-arrow keys?
1.31 What are the scenarios when using the symbol “->” to assign a value?
1.32 Why would we call it “the magic use of the tab-key”?
1.33 What is the difference between two functions: seq() and rep()?
1.34 How many 6-letter long embedded functions are there?
1.35 What is the usage of the seq() function? Offer a few examples.
1.36 What is the usage of the rep() function? Offer a few examples.
1.37 Both R and Python are open-source languages. True or False?
1.38 Debug the following code.
>q("Yes")
© by Yuxing Yan, [email protected], 1/4/2022.