1 R BASICS

In this chapter, we offer a brief introduction to R. First, we argue why any business

school student should master at least one computer language. Students in Business

Analytics, Data Analytics and Data Science programs should understand two. To

support this argument, we will discuss an important concept: Open Source. What is the

future of business education? We think that it rests on three areas: domain knowledge,

Programming, and Data. Then, we will explain why the top two choices are R and Python

in terms of programming languages for business school students. After that, we will

explain how to install R software, how to launch and quit R, whether R is case sensitive,

and how to assign a value or values to a variable. In a sense, we assume that any reader,

especially a business-major student, knows nothing about this wonderful software. The

second half of the book/course is devoted to Python starting from Chapter 14, Python basics. In particular, we will cover the following:

• Business school students should master at least one programming language

• Comparisons between R, Python, Matlab and SAS

• The concepts of Open-Source

• Five steps to download and install R, how to launch and quit R

• R basics and some embedded functions

• Three ways to assign values to a variable

• Choosing meaningful variable names

• Numerical vs. string variables

• Inputting data via scan()

• Finding embedded n-letter functions

• Three types of computers: Windows, Mac and Chromebook

• A list of all chapters

• Finding help, hidden variables

• rm(list=ls()) vs. rm(list=ls(all=TRUE))

• RStudio

1.1 ONE PROGRAMMING LANGUAGE PLEASE

Our society has entered a so-called big data era. This means that we could use big data to

solve many issues, such as choosing a better way to develop good drugs, and optimizing

our operations by analyzing a huge amount of data. Using finance as an example, for a

topic called Financial Statement Analysis, traditionally, investors or financial analysts

just analyze a few companies. Is it possible to analyze ALL companies available?

Another example is to apply Benford's Law, also called the Law of the First Digit. It is

easy to apply it to a few dozen companies. Since the SEC (Securities and Exchange

Commission) makes all the financial statements available from 2009 onward, could we

apply Benford's Law to all public companies that filed Balance Sheets, Income

Statements and Cash Flow Statements in Q2 2021? In the second quarter of

2021, 73,662 companies (unique CIKs) filed various types of reports. The

related code can be found at the end of this chapter.

Using the SEC (the US Securities and Exchange Commission) quarterly indices as an

example, could our students process those files in the first place? There is a research area

called Market Microstructure. The database used in this area is called the TAQ (Trade

and Quote) Database. Its size is huge: several gigabytes for just one

day’s data. The third example is related to the Census data. Could we link the

demographic data from the Census data in 2010 and 2020 to predict the success or failure

of private schools? To accomplish those tasks, researchers/students need a programming

language. It is our prediction that within 5 years, all business school students would be

required to learn one programming language. Later in this book we will show how to

process relatively big data sets such as the SEC filings, Census and NYSE high-

frequency data.

1.2 THE CONCEPT OF OPEN SOURCE

Open Source is defined as free and available to use for everyone. We use YouTube as an

example. Many of us have watched many interesting, educational or funny videos. To

edit those videos, individual producers could use free software. Three popular

free video editing programs are Shotcut, DaVinci Resolve 17, and Lightworks. For

the open software component, we have R, Python, Perl, Octave, and Julia, all free. For

open data, we devote a whole chapter to it: Chapter 3, Open-source data. Open code

means that researchers and users share their code (programs) with other potential users.

A typical example is GitHub, which describes its objective as follows:

GitHub is a development platform inspired by the way you work.

From open source to business, you can host and review code,

manage projects, and build software alongside 50 million developers.

For this course/book, we focus on R and Python, plus business-related open-source

data. We have generated over 1,000 programs written in R and

Python. In addition, we have generated three utility functions to search, show and

download those programs. We will explain those three functions in later chapters.

1.3 FUTURE OF BUSINESS EDUCATION: 3 AREAS

Over the next 5 to 10 years, business education at the college/university level could be summarized in three areas: domain knowledge, programming, and data skills. In finance, this view is called OSF (Open Source Finance); see Kane and Masters (2009). Its 3-word summary is Finance, Programming, and Data. Those three areas are vitally important for the next level of development. The first word represents our current situation/design.

Word #1: Finance

Students will take related courses, such as Corporate Finance, Investment,

Portfolio Theory, Financial Modeling, Options Theory, Econometrics, Fixed

Income, and Business Statistics.

Word #2: Programming

As mentioned in the previous section, business school students should master

at least one programming language. Students in Business Analytics,

Data Analytics or Data Science programs should understand at least 2

computer languages. Among the many good open-source languages, R and Python

are the top two choices.

Word #3: Data

Students will be trained to use their programming skills to handle big data sets

by writing various programs. Here are several typical examples: Download all

the SEC quarterly index files from Q1 1993 up to Q2 2021, and generate

related R and Python (Pickle) data sets (see the data sets at

https://www.sec.gov/Archives/edgar/full-index/). Download one or two days

of high frequency data at

ftp://ftp.nyxdata.com/Historical%20Data%20Samples/, then estimate stock

spreads by using bid and ask prices. Download and process the 2010 Census

Summary File 1 and its data sets at

https://www.census.gov/data/datasets/2010/dec/summary-file-1.html.

Students at research schools should understand the CRSP (Center for

Research in Security Prices, a financial database maintained by the University of

Chicago) Database, Compustat (an accounting database maintained by

Standard & Poor’s) and the TAQ (Trade and Quote, a high-frequency database

maintained by the New York Stock Exchange) Database. For the CRSP Database, see

our paper titled “CRSP for Teaching”, Yan (2018).

1.4 DATA, INFORMATION AND DECISION

According to Merriam-Webster, data could be defined as: facts or information used

usually to calculate, analyze, or plan something. Data could be classified into two big

categories: numerical and non-numerical data. As a numerical example, the U.S. GDP growth rate for

2020 was -3.49%, a 5.65 percentage-point decline from 2019. A typical data set is two-dimensional

(rows and columns). For example, from Yahoo! Finance, we could find the historical data for IBM. The top

part is shown below.

A typical example for non-numerical data is speech. Anand et al. (2021) examine the

impact of financial disclosures’ readability on future shareholder activism, as expressed

by shareholder-initiated proxy proposals. They find that the semantic complexity of the

MD&A section of the 10-K filings significantly predicts future shareholder proposals.

MD&A (Management Discussion and Analysis) is a section of a public company's

annual report or quarterly filings, where the management discusses the company’s

performance. The 10-K is the annual report filed with the SEC.

1.5 WHY R?

In the previous sections, we mentioned a few languages: R, Python, Octave, Perl and

Julia. R and Python are the top choices for many master’s degree programs from

quantitative finance, business analytics and data science. Matlabhelp.com (2021) has the

following comparisons between Matlab, R, and Python.

Table 1.1 Comparisons between Matlab, R and Python

| MATLAB | R | Python |
|---|---|---|
| MATLAB is a commercial tool; therefore, it is not open-source. | R is not a commercial tool or programming language; therefore, it is open and free. | Python is not a commercial tool or programming language; therefore, it is open and free. |
| MATLAB is not faster than R. | R is not faster than Python. | Python is much faster than R and MATLAB. |
| MATLAB is used in many applications, which include voice recognition, image processing, and many more. | R is only used in statistical analysis. | Python is used in the same way as MATLAB, but its major application or basic use is in web design or programming. |
| MATLAB is easier to use than R and Python. | R is slightly more difficult to understand and to write code in than MATLAB and Python. | Python is less difficult than R and MATLAB. |
| MATLAB has various function libraries. | Same as MATLAB, R also has a wide range of libraries. | Same as MATLAB and R, Python also has a vast library. |
| MATLAB is a high-level language. | R is not a high-level or low-level language; it is an interpreted language. | Same as MATLAB and R, Python also has a vast library. |

Hayes (2019) shows the following image. From it, we can see that Python and R are

number 1 and number 3, respectively.

Based on our many years of teaching experience, we have the following table. SAS and

Matlab are two expensive software packages. However, they are used intensively by the financial

industry; SAS, in particular, is used intensively by the banking industry. Since they are paid

software, their support is the best. In the following table, 5 is the best grade and 1 is

the worst.

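Table 1.2 Comparison between R, Python, Matlab and SAS.

| | R | Python | SAS | Matlab |
|---|---|---|---|---|
| Cost | 5 | 5 | 2 | 2 |
| Easy to learn | 5 | 4 | 2 | 2 |
| Data handling | 4 | 4 | 5 | 2 |
| Big data handling | 2 | 2 | 5 | 2 |
| Current Job | 4 | 4 | 5 | 5 |
| Future trend | 5 | 5 | 3 | 4 |
| Support | 2 | 2 | 5 | 5 |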

In the above table, there are two entries related to data: Data handling and Big data

handling. Big data is defined here as data with a size of several gigabytes. For example, the

daily TAQ (Trade and Quote) data could be treated as big data in terms of this

book/course. Based on our multi-year experience of teaching programming languages to

finance-major students, R is the best one to start with. There are several reasons behind

this. First, R is relatively easy to learn, compared with Python. Second, R is used quite

intensively in many areas. Third, many R packages, 18,651 available as of 1/4/2022,

could greatly help new learners.

1.6 RELATED COURSE MATERIALS

To help students learn this course online, we have generated many data sets in

various forms, such as csv, RData, pickle (Python data), and sas7bdat (SAS data format).

For more detail, please see Chapter 3, Open-source data. Since we expect to have

at least two in-class exercises for each lecture, we have generated many of them.

After typing .ice, readers will see a list. For the first week (chapters 1 and 2), we have the

following result.

Hands-on practice is the most important aspect of learning a computer language. In our paper, “Teaching programming skills to finance students: how to design and teach a great course”, we summarized 7 critical factors for a successful programming course: strong motivation, a good textbook, a hands-on learning environment, being data-intensive, a challenging term project, multiple supporting R datasets, and an easy way to download such R datasets. In this book, we target those objectives. We have produced at least one video for each chapter.

1.7 ALL CHAPTERS

A list of all chapters in the course/book is shown below.

The whole book is divided into three parts. Part I is for R from chapters 1 to 13, and Part

II is related to Python from Chapter 14 to Chapter 21. Part III shows a few good projects.

This book is designed for a one-semester course (including both Parts I and II). For each

week, students are expected to learn 2 chapters.

1.8 HOW TO DOWNLOAD AND INSTALL R

To install R, we have the following 5 steps.

Step 1: Go to http://www.r-project.org

Step 2: Click "CRAN" under "Download" (on the left-hand side)

Step 3: Choose a mirror address

Step 4: Choose the appropriate software (PC, Mac)

Step 5: Click "base"

When done, an R icon will appear on your desktop.

1.9 HOW TO LAUNCH R AND QUIT R?

To start R, double-click the icon on your desktop.

To quit, just type q() at the R prompt (>).

> q() # first way to quit

Anything after # is a comment.

# this is a comment line

# > is the R prompt

When quitting, R will ask “Save workspace image?”, that is,

whether to keep all your variables and functions for future use. At this stage,

just answer no.

See below for another way to quit.

# [click] "File" on the menu bar --> "Exit"

To quit R without saving, we use the q("no") command.

> q("no") # quit R without saving variables and functions

> q("yes") # quit R and keep variables and functions

1.10 POWER FUNCTION: ^ or **

For the power function, we can use either ^ or **, as shown below.

> 2^3

[1] 8

> 10**2

[1] 100

For the first one, 2^3 means 2 raised to the power of 3, i.e., 2*2*2=8.
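As a small sketch of how the power operator interacts with other operators, the code below shows that the power is applied before a leading minus sign and that a chain of powers is evaluated from right to left. The variable names (a, b, c1, c2, d, e) are ours and are used only for illustration.

```r
# ^ and ** are the same operator in R
a <- 2^3          # 8
b <- 2**3         # 8, same as 2^3

# the power is applied before the minus sign: -2^2 means -(2^2)
c1 <- -2^2        # -4
c2 <- (-2)^2      # 4

# a chain of powers is evaluated from right to left: 2^(3^2) = 2^9
d <- 2^3^2        # 512

# a fractional power gives a root
e <- 9^0.5        # 3

print(c(a, b, c1, c2, d, e))
```

When in doubt about the order of evaluation, adding parentheses is the safest choice.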

1.11 3 WAYS TO ASSIGN A VALUE TO A VARIABLE

The first way to assign a value to a variable is to use “<-”.

> x<-10

To show the value of a variable, simply type its name.

> x

[1] 10

The second and third ways to assign a value to a variable are to use “=” and “->”.

> y=2

> 10->x

The “->” assignment could make our debugging efforts easier. Assume that we want to

test a program to estimate the present value of $100 received in two years with an 8%

annual discount rate. The related formula is shown below.

$$pv = \frac{fv}{(1+R)^n}, \qquad (1)$$

where pv is the present value, fv is the future value, R is the period rate, and n is the

number of periods. We could type the following code to get our result.

> 100/(1+0.08)^2

[1] 85.73388

Suppose that we then decide to assign the result to a variable, such as pv.

To save time, we simply use the up-arrow key to recall the previous command, then

add “->pv” at the end of it.

> 100/(1+0.08)^2->pv

> pv

[1] 85.73388
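The same present value calculation can also be sketched with named variables instead of typed numbers, which makes it easier to change an input later. The names fv, R and n follow equation (1); pv1, pv2 and pv3 are our own names, used only to show that the three assignment forms give the same result.

```r
# inputs (same values as in the text)
fv <- 100      # future value
R  <- 0.08     # annual discount rate
n  <- 2        # number of years

# equation (1): pv = fv / (1 + R)^n
pv <- fv / (1 + R)^n
print(pv)      # 85.73388, as in the text

# the three assignment forms store the same value
pv1 <- fv / (1 + R)^n
pv2 = fv / (1 + R)^n
fv / (1 + R)^n -> pv3
print(identical(pv1, pv2) && identical(pv2, pv3))
```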

To assign a set of values, we use c(1,2.6,4.3,5.25), where “c” stands for concatenate.

> X<-c(1,2,4,6) # assign a vector (column values)

To assign a set of consecutive integers, we could use n1:n2, such as 1:10.

> y<-1:50

> x<-c(1:5,8:12)

> x

[1] 1 2 3 4 5 8 9 10 11 12

We can input data from high to low, i.e., reversing the order.

> y<-5:1

The rev() function could be used to reverse an input data set.

> x<-5:1

> x<-rev(1:5) # same as the above

Try the following code and print x to see the result.

> x<-1.5:10
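For readers who want to check their answer to the exercise above, here is a short sketch. It assumes nothing beyond base R: the colon operator starts at the first value and moves in steps of 1 (or -1) without passing the second value.

```r
# starts at 1.5 and adds 1 until the next value would exceed 10
x <- 1.5:10
print(x)            # 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
print(length(x))    # 9

# counting down works the same way, in steps of -1
y <- 10:1.5
print(y)            # 10 9 8 7 6 5 4 3 2
```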

1.12 CASE SENSITIVE, AND PUT SEVERAL COMMANDS ON ONE-LINE

In R, we don’t need to define a variable before using it.

# a variable is not formally defined before its assignment

> fv<-100

R is case sensitive, which means that upper-case X and lower-case x are different variables.

> x<-10 # lower case x

> X # capital letter of x

Error: object 'X' not found

To put several R commands on one-line, semi-colons are used.

> fv<-10; pv<-100; n<-10; rate<-0.05

1.13 ls() AND rm() FUNCTIONS

Sometimes, we need to check all existing variables (objects). For this reason, we use the

ls() function.

> ls()

When a variable is no longer needed, we could remove it from the memory.

> rm(x) # remove variable called x

To remove several variables (objects) simultaneously, we use comma to separate them.

> rm(x,y,pv) # remove x, y and pv

To remove all variables (objects), we have the code below.

> rm(list=ls()) # remove all variables (objects)

The 2nd way to remove all objects (variables) is given below.

# [click] "Misc" --> "Remove all objects…"

To print a character variable (a string) on the screen, we could use the functions cat() or

print(). Remember to enclose the text in double or single quotation marks.

> cat("hello, world!\n\n\n") #\n is for a new line

hello, world!

>

In the above output, there are two blank lines. The print() function could also be used.

> print('hello R!')

[1] "hello R!"

Note that print() does not turn “\n” into a new line; it shows the two characters as typed.

> print('hello world\n')

[1] "hello world\n"

We could also print a defined variable.

> x<-'this is great'

> print(x)

[1] "this is great"
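As a brief sketch of how cat() and print() can show a number together with some text, consider the code below. The value of pv is taken from Section 1.11; the variable msg is our own name, used only for illustration.

```r
pv <- 85.73388

# cat() joins its arguments with a space and needs an explicit "\n"
cat("The present value is", pv, "\n")

# round() controls the number of decimals shown
cat("The present value is", round(pv, 2), "\n")

# print() shows the value with its index, [1], and ends the line itself
print(pv)

# paste() builds a string first; it can then be printed or passed to cat()
msg <- paste("pv =", round(pv, 2))
print(msg)
```

In short, cat() is convenient for plain messages, while print() shows an object the same way the R console does.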

1.14 NEXT LINE SYMBOL (+), BACK TO THE R PROMPT

When one command occupies multiple lines, the symbol + will appear. Assume that we

intend to assign 1 to 10 to x.

> x<-1:10

For some reason, we hit the Enter key before we finish the whole command, shown

below. In other words, we use several lines to finish the command.

> x<-1:

+ 10

> x

[1] 1 2 3 4 5 6 7 8 9 10

Often, especially as beginners, we type a few wrong keys, such as a double or

single quotation mark without a matching one. Sometimes, we simply don’t want to figure

out where the issue is since it might be too time-consuming. Instead, we just want to go

back to the R prompt and retype the command. In those cases, we hit the ‘Esc’ key, on

the top-left of our keyboard, to return to the R prompt (>).

> x<-'9"(999asdfklj

+          # [press] 'Esc' to come back to the R prompt

>

1.15 seq() FUNCTION

The seq() function is used to generate a set of values.

> x<-seq(1, 19, by = 2)

> x

[1] 1 3 5 7 9 11 13 15 17 19

The following command uses pi as the increment.

> x<-seq(1, 11, by = pi)

> x

[1] 1.000000 4.141593 7.283185 10.424778

The complete command has the following format.

>seq(from=1,to=3, by =0.5)

[1] 1.0 1.5 2.0 2.5 3.0
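Besides by, the seq() function also accepts length.out, which fixes the number of values instead of the step size. The following is a minimal sketch; the variable names x1, x2 and x3 are ours.

```r
# by: fix the step size
x1 <- seq(from = 0, to = 1, by = 0.25)        # 0.00 0.25 0.50 0.75 1.00

# length.out: fix the number of values; R works out the step
x2 <- seq(from = 0, to = 1, length.out = 5)   # the same five values

# a negative step counts down
x3 <- seq(from = 10, to = 1, by = -3)         # 10 7 4 1

print(x1)
print(x2)
print(x3)
```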

1.16 rep() AND length() FUNCTIONS

The rep() function is used to repeat the same value n times, shown below.

> x=rep(0,100)

> head(x)

[1] 0 0 0 0 0 0

To find the number of values (observations) for a vector, we apply the length()

function, shown below.

> length(x)

[1] 100
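The rep() function can also repeat a whole vector, or each of its elements, through its times and each arguments. A short sketch follows; the variable names and the string "buy" are ours, chosen for illustration.

```r
# times: repeat the whole vector
x1 <- rep(c(1, 2), times = 3)   # 1 2 1 2 1 2

# each: repeat every element before moving to the next one
x2 <- rep(c(1, 2), each = 3)    # 1 1 1 2 2 2

# rep() works for strings as well
x3 <- rep("buy", 4)

print(x1)
print(x2)
print(x3)
print(length(x1))               # 6
```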

1.17 USING MEANINGFUL VARIABLE NAMES

For clarity, it is always a good idea to generate meaningful variables, such as pv for

present value, fv for future value, pv_f for the present value function, and

pv_annuity_f for the present value function for annuity. By using those names, we and

other users would understand programs more easily.

1.18 POSITION AND KEYWORD APPROACH

There are two ways to input data: position and keyword. In the following one-line code,

we use the position-variable approach. In other words, the meaning of the input variable

depends on its position in the set of input variables.

> x<-seq(1,3,0.5) # position variable approach

For the keyword approach, we add a keyword in front of each input value, such as

from=1. One advantage of the keyword approach is that the order of input variables does

not play a role. Hence, the following three statements are equivalent.

> seq(from=1,to=3,by=0.5) # they are equivalent

> seq(to=3,from=1,by=0.5)

> seq(by=0.5,to=3,from=1)

In the next chapter, we will come back to these two types of input methods when

discussing how to write our own functions.
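The two approaches can be checked against each other, and they can also be mixed in one call. The sketch below uses our own variable names (a, b, d, e) and relies only on the seq() function discussed above.

```r
a <- seq(1, 3, 0.5)                    # position only
b <- seq(by = 0.5, to = 3, from = 1)   # keywords only, in any order
d <- seq(1, 3, by = 0.5)               # mixed: positions first, then a keyword

print(isTRUE(all.equal(a, b)))         # TRUE
print(isTRUE(all.equal(a, d)))         # TRUE

# with positions, the order matters: here 3 is 'from' and 1 is 'to'
e <- seq(3, 1, -0.5)
print(e)                               # 3.0 2.5 2.0 1.5 1.0
```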

1.19 INPUTTING DATA VIA scan()

Another easy way to input data from your keyboard is to use the scan() function.

> x<-scan()

1: 1

2: 3

3: 4

4: 2.5

5: 5

6:

Read 5 items

> x

[1] 1.0 3.0 4.0 2.5 5.0

For inputting multiple columns, we can input them as a vector first, then use the

matrix() function to convert it into the desired format (here, four rows and two columns).

> x<-scan()

1: 1 3 3 6 5 6 7 8

9:

Read 8 items

> y<-matrix(x,4,2,byrow=T)

> y

     [,1] [,2]
[1,]    1    3
[2,]    3    6
[3,]    5    6
[4,]    7    8

In the above example, we convert a vector by row. On the other hand, if we convert a

vector by column, we must change our code a little bit.

> x<-scan()

1: 1 3 5 7 3 6 6 8

9:

Read 8 items

> y<-matrix(x,4,2,byrow=F) # or use default y<-matrix(x,4,2)
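To confirm that the two conversions above lead to the same matrix without typing at the keyboard, we could use the text argument of scan(), which reads the values from a string. This is only a sketch; the names y_row and y_col are ours.

```r
# scan(text = ...) reads the values from a string instead of the keyboard
x <- scan(text = "1 3 3 6 5 6 7 8")

# fill the matrix row by row
y_row <- matrix(x, nrow = 4, ncol = 2, byrow = TRUE)

# the same matrix, with the data ordered column by column (the default)
y_col <- matrix(c(1, 3, 5, 7, 3, 6, 6, 8), nrow = 4, ncol = 2)

print(y_row)
print(identical(y_row, y_col))   # TRUE
```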

1.20 GETTING DATA FROM THE clipboard

Here we discuss a very simple way to get data from Excel. This method works for small

data sets. Assume that we have the following Excel spreadsheet.

We can highlight and copy the data, then issue the following command (for Windows

users).

> x<-read.table("clipboard")

> x

  V1 V2
1  1  2
2  3  4
3  5  6

Mac users should use the following commands to read data from the clipboard.

data <- read.table(pipe("pbpaste"))

data <- read.table(pipe("pbpaste"), header=T)

data <- read.table(pipe("pbpaste"), sep="\t", header=T)

Note that pipe("pbpaste") is the proper way to address the clipboard in Mac OS X,

while in Windows that would be "clipboard", as discussed above. The

'\t' in sep="\t" stands for a tab. Assume that our data has a header (i.e., column

names), shown below.

For this case, we simply add “header=T”, shown below. Here T stands for TRUE.

> x<-read.table("clipboard", header=T)

> x

      date     ret
1 19990102  0.0034
2 19990202  0.0451
3 19990204 -0.0023

The same approach works when we copy data from a Notepad or MS Word file. We

should pay attention to the last row. Two different ways in which the last line can end are

shown below.

For the format shown in the above left panel, we will get the following warning message

when issuing the command x<-read.table("clipboard"). Fortunately, the variable

will still take its intended values.

> x<-read.table("clipboard")

Warning message:

In read.table("clipboard") :

incomplete final line found by readTableHeader on 'clipboard'
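To try read.table() on any computer without copying anything to the clipboard, we could pass the data as a string through its text argument. The sketch below reuses the three rows with a header shown earlier; the variable name txt is ours.

```r
# the same three rows as in the header example, stored in a string
txt <- "date ret
19990102 0.0034
19990202 0.0451
19990204 -0.0023"

# text = ... replaces "clipboard"; header = TRUE uses the first row as column names
x <- read.table(text = txt, header = TRUE)
print(x)

# a column is accessed by its name
print(mean(x$ret))
```

Once the code works with text=, the text argument can be replaced with "clipboard" (Windows) or pipe("pbpaste") (Mac) as described above.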

1.21 USING R AS A CALCULATOR

R can be used as a calculator since it is straightforward to call various embedded R

functions. For example, the mean() function is for the average.

> x<-1:50

> mean(x)

[1] 25.5

You can try other functions as well, such as max(), min(), median(), sd() and

var().

> x<-1:50

> max(x)

[1] 50

> min(x)

[1] 1

> median(x)

[1] 25.5

> sd(x)

[1] 14.57738

The following table summarizes a set of the most widely used functions.

Table 1.3: A list of some basic functions

| Function | Meaning | Example |
|---|---|---|
| mean(x) | Mean | x<-1:10; mean(x) # [1] 5.5 |
| median(x) | Median | x<-1:10; median(x) # [1] 5.5 |
| min(x) | Minimum | x<-1:10; min(x) # [1] 1 |
| max(x) | Maximum | x<-1:10; max(x) # [1] 10 |
| var(x) | Variance | x<-1:10; var(x) # [1] 9.166667 |
| sd(x) | Standard deviation | x<-1:10; sd(x) # [1] 3.02765 |
| exp(x) | Exponential function | exp(2.3) # [1] 9.974182 |
| log(x) | Natural log function | log(4.5) # [1] 1.504077 |
| log10(x) | Log function based on 10 | log10(4.3) # [1] 0.6334685 |
| sum(x) | Take the summation | x<-1:10; sum(x) # [1] 55 |
| sort(x) | Sort in ascending order | x<-c(6,-1,3); sort(x) # [1] -1 3 6 |
| range(x) | Range of a variable | x<-1.5:10; range(x) # [1] 1.5 9.5 |
| diff(x) | Difference for a vector | x<-c(1,2.3,4.5); diff(x) # [1] 1.3 2.2 |
| ceiling(x) | Smallest integer not less than x | x<-9.5; ceiling(x) # [1] 10 |
| floor(x) | Largest integer not greater than x | x<-9.5; floor(x) # [1] 9 |
| as.integer(x) | Take the integer value | x<-9.5; as.integer(x) # [1] 9 |
| prod(x) | Get product of a vector | x<-1:3; prod(x) # [1] 6 |
| quantile(x) | Sample quantiles | x<-1:100; quantile(x) # 0%: 1.00, 25%: 25.75, 50%: 50.50, 75%: 75.25, 100%: 100.00 |
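Some of the functions in Table 1.3 are related to each other. As a small sketch, the code below checks that sd() is the square root of var() and that var() divides by n-1 (the sample variance). The names n and v_manual are ours.

```r
x <- 1:10

# variance and standard deviation
print(var(x))                          # 9.166667
print(sd(x))                           # 3.02765

# sd() is the square root of var()
print(isTRUE(all.equal(sd(x), sqrt(var(x)))))

# var() uses n-1 in the denominator
n <- length(x)
v_manual <- sum((x - mean(x))^2) / (n - 1)
print(isTRUE(all.equal(v_manual, var(x))))

# summary() reports several of these statistics at once
print(summary(x))
```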

Sometimes, we need to change the working directory for convenience. The related procedure is

given below.

# change the directory

# [click] File --> "Change dir…" [choose your working directory]
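The working directory can also be changed with code instead of the menu, by using the getwd() and setwd() functions. In the sketch below, tempdir() is used only because it is a folder that always exists; readers would replace it with their own folder. The name old_dir is ours.

```r
# show the current working directory and remember it
old_dir <- getwd()
print(old_dir)

# move to another folder (tempdir() is just a folder that always exists)
setwd(tempdir())
print(getwd())

# go back to where we started
setwd(old_dir)
print(getwd())
```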

1.22 USING THE UP- AND DOWN-ARROW KEYS

We can use the up- and down-arrow keys to recall the previous command and modify it.

> x<-1:500

> y<-10:510

After issuing a set of command lines, we can use both the Up and Down Arrow keys to

move back and forth to recall and correct the ‘old’ commands. This is extremely

convenient to check and modify our code since we could recall the previous command

with a new input or a minor modification.

1.23 FINDING HELP

There exist several ways to find information about a specific R function. If we are interested in the mean function, we could issue ?mean, help(mean) or example(mean).

> ?mean

The command help(mean) achieves the same goal as ?mean.

> help(mean)

To get examples for a specific function, we can use the example() function.

> example(mean)

We can also use the help on the menu bar.

> # [click] "Help" --> "FAQ on R"

The following picture shows all the entries after clicking “Help” on the menu bar.

When unsure about the spelling of a function in question, we use the apropos() function.

> apropos("mea")

 [1] "colMeans"           "influence.measures" "kmeans"
 [4] "mean"               "mean.data.frame"    "mean.Date"
 [7] "mean.default"       "mean.difftime"      "mean.POSIXct"
[10] "mean.POSIXlt"       "rowMeans"           "weighted.mean"

A related function is find(), which reports where an object with a given name is defined.
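The difference between apropos() and find() can be sketched as follows. The exact list returned by apropos() depends on the R version and on the packages that are loaded, so no full output is shown; the variable name hits is ours.

```r
# apropos(): all object names that contain the given letters
hits <- apropos("mean")
print("mean" %in% hits)     # TRUE

# a regular expression narrows the search: names that start with "mean"
print(apropos("^mean"))

# find(): where an object with exactly this name is defined
print(find("mean"))         # includes "package:base"
```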

1.24 .nLetterFunctions()

To help readers/students collect all n-letter embedded functions, we have generated a

function called .nLetterFunctions. For example, to get all 4-letter functions, we

issue .nLetterFunctions(4), shown below.

Note that there is a dot in front of nLetterFunctions. For each function, we can use the

help() function to get more related information, such as help(week).

1.25 HIDDEN VARIABLES AND FUNCTIONS

If we want to define a hidden variable, we start its name with a dot, as shown below.

> x<-10

> .x<-100

For the above code, x and .x are different variables. We can use ls() to see the existence of x.

> ls()

[1] "x"

To see all hidden variables or functions, we must specify all=TRUE, as shown below.

> ls(all=TRUE)

[1] ".x" "x"

To remove all our defined variables or functions, we can use rm(list=ls()). However, this leaves hidden objects in place. To remove all objects including hidden ones, we specify all=TRUE.

> rm(list=ls()) # remove visible objects only

> ls(all=TRUE)

[1] ".x"

> rm(list=ls(all=TRUE)) # remove all including hidden ones

> ls(all=TRUE)

character(0)

Another way to remove all objects, including hidden ones, is to click “Misc” on the menu bar, then “Remove all objects”. In addition to one dot, we could have multiple dots in front of a variable or function name, as shown below.

> ..path<-"http://datayyy.com/"

> ...path<-"http://datayyy.com/"

> ....path<-"http://datayyy.com/"

> ls(all=T)

[1] "....path" "...path" "..path" ".m" ".mean" "a"
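To experiment with hidden variables without deleting anything from our own workspace, one option is to work inside a separate environment. The sketch below is only an illustration: the environment e and the variable names are ours. Note that all=TRUE in the text is a shortened form of the full argument name all.names=TRUE.

```r
# a separate environment, so that our own workspace is left untouched
e <- new.env()
assign("x", 10, envir = e)
assign(".x", 100, envir = e)

visible   <- ls(e)                      # "x"
all_names <- ls(e, all.names = TRUE)    # ".x" "x"
print(visible)
print(all_names)

# rm(list = ls()) style: removes the visible object only
rm(list = ls(e), envir = e)
after_rm <- ls(e, all.names = TRUE)     # ".x"
print(after_rm)

# with all.names = TRUE, the hidden object is removed as well
rm(list = ls(e, all.names = TRUE), envir = e)
print(length(ls(e, all.names = TRUE)))  # 0
```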

1.26 RStudio

It is worth mentioning a platform called RStudio as an alternative to the R console. From

their webpage at https://rstudio.com/products/rstudio/download/, readers/students

can download and install it. After launching RStudio for the first time, we will see the

following three panels.

The left panel is much like the normal R console. We can type our code there. After we run commands in the left panel, the resulting variables and their values show up in the top right panel, shown below.

[Figure: commands typed in the left panel, and the resulting values shown in the top right panel]

To write a savable program, click “File”, then “New File”, then “R Script”. A fourth panel will pop up, shown below.

There are several advantages of using RStudio. First, we have several panels to work with. Second, its syntax coloring reduces the chance of typing errors. Third, when typing, some symbols or suggestions might pop up automatically. Barr (2013) lists 6 reasons why we should use RStudio. Another advantage is that the structure of RStudio is like that of Spyder, a Python editor. One disadvantage is that RStudio is more complex than the R console. Occasionally, some code that works in the R console does not work properly in RStudio.

1.27 MORE ON HELP

After clicking “Help” on the menu bar of the R console, we will see FAQ on R, FAQ on

R for Windows, manuals (in PDF), etc., shown below.

On the other hand, the bottom right panel of RStudio is a help window, shown below.

1.29 WINDOWS, MAC AND CHROME BOOK

When learning a programming language, readers/students need a personal computer to write their code and run programs. The two most common types are a PC (Windows) and a Mac. For most functionalities, there is no difference between them. On the other hand, there exist some minor differences, such as how to define a path and where to find our downloaded data sets or programs. To help Mac users, we have generated a help menu. By typing .macUsers (or .mac as a shortcut), the following menu would pop up.

A few students might use Chromebooks. A Chromebook is essentially a laptop with

Google’s Chrome OS (Operating System) on it instead of Windows or MacOS. Dube

(2020) discussed 5 pros and cons of using a Chromebook. To help Chromebook users, we

have generated a help menu. By typing .chromebook (or .cb for a short-cut), the

following menu would pop up.

One alternative for both Mac and Chromebook users is to use a so-called Virtual Lab

discussed in the next section.
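The path differences between Windows and Mac mentioned in this section can be reduced by letting R build the path for us. The sketch below uses only base R functions; the folder name "data", the file name "ibm.csv" and the folder "C:/temp" are hypothetical and serve only as an illustration.

```r
# file.path() joins folder and file names with "/", which R accepts on all platforms
f <- file.path("data", "ibm.csv")     # hypothetical folder and file
print(f)                              # "data/ibm.csv"

# "~" stands for the user's home folder; path.expand() shows the full path
print(path.expand("~"))

# "windows" on a PC; "unix" on a Mac
print(.Platform$OS.type)

# on Windows, a backslash inside a string must be doubled; "/" works as well
p1 <- "C:\\temp\\ibm.csv"             # hypothetical path
p2 <- "C:/temp/ibm.csv"               # the same location, written with "/"
print(p1)
print(p2)
```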

1.30 VIRTUAL LAB

This section is optional for students/readers who can run both R and Python on their own computers. The steps will differ for students/readers outside Geneseo. Readers/students at Geneseo should go to the website at http://go.geneseo.edu/publicvirtuallab, shown below.

After clicking “Public Lab”, we will see the following image.

Check the box for “File transfer”, then click on “Allow”. Enter your Username and

Password, then click on “Submit”, shown below.

We will have the following desktop.

After clicking on the R icon, we can launch R (not shown here to save space). To sign off

this virtual lab, click the appropriate icon on the top right, shown below.

Later in the book, we will explain how to upload our data and files to this virtual lab, and

how to output our result or data sets to OneDrive and other personal devices. In Chapter

14, Python Basics, we will explain how to launch Python by using this virtual lab.

1.31 SUMMARY

In this chapter, we discussed how to install R, its basics and value assignments. Those are

the most basic concepts for learning a computer language. Later in the book/course, we

will use those concepts repeatedly. Thus, a new user should not lose confidence if

he/she feels overwhelmed by those new concepts at first.

In the next chapter, Chapter 2, Simple functions using R, we will discuss how to write

a one-line R function. Then, we will explain how to extend it to a multi-line function. In

addition, we explain how to add comments to make our functions more readable. For

example, we could add the objective of the program, the formula used for the function,

definitions of input variables, any default values, and a few concrete examples. Because

of those extra comments or help, our functions become self-explanatory.
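To give a flavor of what is coming, here is a minimal sketch of a one-line function and its commented multi-line version. The present-value example and the names pv_f and pv_f2 are hypothetical choices for illustration; they are not necessarily the examples used in Chapter 2.

```r
# One-line function (hypothetical example): present value of one future cash flow
pv_f <- function(fv, r, n) fv / (1 + r)^n

# The same calculation as a multi-line function with explanatory comments
pv_f2 <- function(fv, r, n = 1) {
  # Objective: estimate the present value of one future cash flow
  # Formula  : pv = fv / (1 + r)^n
  # fv       : future value
  # r        : discount rate per period
  # n        : number of periods (default is 1)
  # Example  : pv_f2(100, 0.1, 1)   # about 90.91
  pv <- fv / (1 + r)^n
  return(pv)
}

pv_f(100, 0.1, 1)   # one-line version
pv_f2(100, 0.1)     # multi-line version, using the default n = 1
```

Both calls return the same value; the second version is longer, but a reader can understand it without any outside explanation.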

REFERENCES

Barr, Andrew, 2013, Top 6 reasons you need to be using RStudio, https://www.r-bloggers.com/2013/02/top-6-reasons-you-need-to-be-using-rstudio/

Anand, Abhinav, Xing Huan, and Jalaj Pathak, 2021, Does financial disclosure

readability predict shareholder activism?

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3991459

Blackmagic, DaVinci Resolve 17, 2021,

https://www.blackmagicdesign.com/products/davinciresolve/

Dube, Kat, 2020, 5 Pros And Cons Of Using A Chromebook,

http://datayyy.com/webs/chromebook.html

Github, 2021, http://github.com

Hayes, Bob, 2019, Programming Languages Most Used and Recommended by Data Scientists, https://businessoverbroadway.com/2019/01/13/programming-languages-most-used-and-recommended-by-data-scientists/

Kane, David and Joseph D. Masters, 2009, Open Source Finance, Investing,

https://joi.pm-research.com/content/18/1/92

Lightworks, 14, https://www.lwks.com

Matlabhelp.com, Comparison of R, MATLAB and Python:

https://www.matlabhelp.com/comparison-of-r-matlab-and-python/

Merriam-Webster, data’s definition, https://www.merriam-webster.com/dictionary/data

Shotcut, 2021, http://shotcut.com

Yan, Yuxing, 2018, CRSP for Teaching, SSRN working papers,

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3303504

Yan, Yuxing, 2016, Teaching programming skills to finance students: how to design and

teach a great course, https://link.springer.com/article/10.1186/s40854-017-0081-x

Video #1: DANL100 Chapter 1: R basics

https://www.youtube.com/watch?v=xP6PTpcuciU

Video #2, How to download and install R (6m39s) (for Windows)

https://www.youtube.com/watch?v=ZoPJGmpYJzw

Video #3, Programming in R - Getting Started - Installing R and RStudio on a Mac

(5m59s), https://www.youtube.com/watch?v=Ywj6yNfc5nM

APPENDIX A: Copy and paste the following line into the R window.

source("http://datayyy.com/rpy/week1.txt")

Appendix B: Finding the contents of Chapter 1.

To view the contents of Chapter 1, we type .c1. Note that there is a dot in front of c1.
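The leading dot matters because R treats objects whose names start with a dot as hidden. The sketch below illustrates this behavior with two made-up names, .demo and visible_var; neither is part of the course material.

```r
# Names starting with a dot are hidden from a plain ls() listing
.demo <- "a hidden variable"    # hypothetical hidden variable
visible_var <- 1                # hypothetical ordinary variable

ls()                    # shows visible_var but not .demo
ls(all.names = TRUE)    # shows both, including .demo
```

This is also why rm(list=ls()) leaves dot-prefixed objects in place, while rm(list=ls(all=TRUE)) removes them as well.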

Appendix C: Type .uu to see the menu related to utility functions, shown below.

EXERCISES

1.1 What are the advantages of using R?

1.2 What does ‘Open-Source’ mean?

1.3 Please offer a few examples of open-source software.

1.4 What are three types of computers?

1.5 Compare R with Python and SAS. What are their advantages and disadvantages?

1.6 What is the home page of R?

1.7 How would we assign a value to a new variable?

1.8 How many ways are there to assign values to a new variable?

1.9 What is the difference between ls() and rm()?

1.10 Generate a vector that contains the values from 2 to 15, followed by the values from 20 to 40. Compute its mean, standard deviation and median.

1.11 What might be the disadvantages of using R?

1.12 Is R case-sensitive?

1.13 Is R free?

1.14 How would we get help for R?

1.15 How would we add a comment?

1.16 Is it difficult to install R?

1.17 Will the R software compile a comment line?

1.18 Does a space play a role in R commands?

1.19 How would we download manuals related to R?

1.20 Assign to x the values from 1 to 100 and from 202 to 300.

1.21 Reverse the input values in the above exercise.

1.22 How many 5-letter embedded functions are there in R?

1.23 How would we find the titles of all chapters?

1.24 How would we get support when using R?

1.25 What is RStudio? Where could we download it, and how would we install it?

1.26 What are the advantages of using RStudio?

1.27 How could colorful code help a programmer?

1.28 What is the meaning of the following code?

>q("no")

1.29 From where could we find help for learning R?

1.30 According to Barr (2013), what are the advantages of using RStudio?

1.31 What are the usages of the up- and down-arrow keys?

1.32 In what scenarios would we use the symbol “->” to assign a value?

1.33 Why would we call it “the magic use of the tab-key”?

1.34 What is the difference between two functions: seq() and rep()?

1.35 How many 6-letter embedded functions are there in R?

1.36 What is the usage of the seq() function? Offer a few examples.

1.37 What is the usage of the rep() function? Offer a few examples.

1.38 Both R and Python are open-source languages. True or False?

1.39 Debug the following code.

>q("Yes")

© by Yuxing Yan, [email protected], 1/4/2022.