A Presentation on Choices In Setting Up R for Business Analytics By Mandava Geetha Bhargava 162031005

Page 1: Choices in setting up r for business analytics

A Presentation on Choices In Setting Up

R for Business Analytics

By

Mandava Geetha Bhargava

162031005

R Language

• R is a programming language and software environment for statistical analysis, graphical representation, and reporting. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Core Team.

• The language was named R partly after the first letter of the first names of its two authors (Robert Gentleman and Ross Ihaka) and partly as a play on the name of the Bell Labs language S.

• The core of R is an interpreted computer language that allows branching and looping as well as modular programming using functions. For efficiency, R can integrate with procedures written in C, C++, .NET, Python, or FORTRAN.

Features Of R Language

• The following are the important features of R −

• R is a well-developed, simple, and effective programming language that includes conditionals, loops, user-defined recursive functions, and input and output facilities.

• R has an effective data handling and storage facility.

• R provides a suite of operators for calculations on arrays, lists, vectors, and matrices.

• R provides a large, coherent, and integrated collection of tools for data analysis.

• R provides graphical facilities for data analysis and display, either directly on screen or printed on paper.
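A few of the language features listed above can be seen in a short sketch. The function name factorial_rec and the sales figures are made up for illustration; only base R is used.

```r
# User-defined recursive function with a conditional.
factorial_rec <- function(n) {
  if (n <= 1) {
    return(1)
  }
  n * factorial_rec(n - 1)
}

# Loop over a vector and accumulate a running total.
monthly_sales <- c(120, 95, 143)   # hypothetical figures
running_total <- 0
for (s in monthly_sales) {
  running_total <- running_total + s
}

# Arithmetic operators work element by element on vectors;
# %*% performs matrix multiplication.
with_tax <- monthly_sales * 1.1
m <- matrix(1:4, nrow = 2)
m_squared <- m %*% m

print(factorial_rec(5))
print(running_total)
print(m_squared)
```

The same operators apply to whole vectors and matrices, which is why explicit loops are often unnecessary in R.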

Compatibility

• R is freely available under the GNU General Public License, and precompiled binary versions are provided for various operating systems, including Linux, Windows, and Mac.

Microsoft Windows:

• This remains the most widely used operating system on the planet. If you are experienced in Windows-based computing and are active on analytical projects, it would not make sense for you to move to another operating system unless the transition brings significant cost savings and minimal business disruption. In addition, compatibility issues are minimal for Microsoft Windows, and extensive help documentation is available. However, some R packages may not function well under Windows; in that case, running multiple operating systems is your next option.

Mac OS and iOS:

• The reasons for choosing Mac OS remain its considerable appeal in aesthetically designed software and its performance in art or graphics-related work, but Mac OS is not a standard operating system for enterprise systems or statistical computing. However, open source R is claimed to be quite optimized for it and can be used by existing Mac users.

Linux:

• This is the operating system of choice for many R users because it shares R's open source credentials and so is a much better fit for all R packages. In addition, it is customizable for large-scale data analytics. The most popular versions of Linux are Ubuntu/Debian, Red Hat Enterprise Linux, OpenSUSE, CentOS, and Linux Mint.

• (a) Ubuntu Linux is recommended for people making the transition to Linux for the first time. Ubuntu Linux had a marketing agreement with Revolution Analytics for an earlier version of Ubuntu, and many R packages can be installed in a straightforward way. Ubuntu/Debian packages are also available.

• (b) Red Hat Enterprise Linux is officially supported by Revolution Analytics for its enterprise module.

Multiple operating systems

• Virtualization versus dual boot: if you use more than one operating system on your PC, you can choose between virtualization, for example VMware Player from VMware (http://www.vmware.com/products/player/), which gives you a virtual machine on your computer dedicated to R-based computing, and dual booting, which gives you a choice of operating system at startup.

• To dual boot, you can use a USB installer from Ubuntu's Netbook Remix (http://www.ubuntu.com/desktop/getubuntu/windows-installer). A software program called Wubi helps with the dual installation of Linux and Windows.

Vendors of R Language Products

• You can choose between two kinds of R installations. One is free and open source and is available at http://r-project.org; the other is commercial and is offered by several vendors, including Revolution Analytics.

Commercial Vendors of R Language Products:

• Revolution Analytics: http://www.revolutionanalytics.com/

• XL Solutions: http://www.experience-rplus.com/

• Information Builders: http://www.informationbuilders.com/products/webfocus/PredictiveModeling.html

• Blue Reference (Inference for R): http://inferenceforr.com/default.aspx

• R for RExcel: http://www.statconn.com/

R-PACKAGES

• At the time of this presentation, the CRAN package repository features 10080 available packages, such as:

• apex - Phylogenetic Methods for Multiple Gene Data

• AppliedPredictiveModeling - Functions and Data Sets for 'Applied Predictive Modeling'

• ApacheLogProcessor - Process the Apache Web Server Access Log Files

• apdesign - An Implementation of the Additive Polynomial Design Matrix
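To see which packages are already present on a machine, base R can list the local library. The following is a minimal sketch; the helper is_installed is a made-up name for illustration, and it only inspects the local library rather than querying CRAN.

```r
# installed.packages() returns a matrix with one row per installed package.
pkgs <- installed.packages()
installed_names <- rownames(pkgs)

cat("Packages installed locally:", length(installed_names), "\n")

# Hypothetical helper: TRUE if a package with this name is installed locally.
is_installed <- function(pkg) {
  pkg %in% installed_names
}

print(is_installed("stats"))                      # ships with every R installation
print(is_installed("AppliedPredictiveModeling"))  # TRUE only if it was installed from CRAN
```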

Local Environment Setup – Windows Installation

• You can download the Windows installer version of R from the CRAN website by clicking on R-3.2.2 for Windows (32/64 bit) and saving it in a local directory.

• The download is a Windows installer (.exe) with a name of the form "R-version-win.exe". You can just double-click it and run the installer, accepting the default settings. On 32-bit Windows it installs the 32-bit version; on 64-bit Windows it installs both the 32-bit and 64-bit versions.

• After installation, you can locate the program under the Windows Program Files in the directory structure "R\R-3.2.2\bin\i386\Rgui.exe". Clicking this icon brings up the R GUI, which is the R console for R programming.
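Once the console is open, a quick way to confirm the installation is to ask R what it is and where it lives. This is a small sketch using base R only; the variable names are illustrative.

```r
# Version string of the running R build.
version_text <- R.version.string

# Directory that R was installed into.
install_dir <- R.home()

cat("Running:", version_text, "\n")
cat("Installed under:", install_dir, "\n")
```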

Local Environment Setup – Linux Installation

• R is available as a binary for many Linux distributions from R Binaries (https://cran.r-project.org/bin/linux/).

• The installation instructions vary from distribution to distribution; the steps for each are given under the corresponding directory at that link. However, if you are in a hurry and your distribution uses yum, you can install R as follows −

• $ yum install R

• This command installs the core functionality of R along with the standard packages. If you need an additional package, launch the R prompt and install it from there; for example, the plotrix package can be added for drawing graphs.
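Adding a package such as plotrix from the R prompt could look like the sketch below. The helper name ensure_package is an assumption made for illustration; install.packages and requireNamespace are base R functions. The plotrix call is left commented out because it needs network access to CRAN.

```r
# Hypothetical helper: install a package only if it is not already available,
# then report whether it can be loaded.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # downloads from CRAN; needs network access
  }
  requireNamespace(pkg, quietly = TRUE)
}

# Interactive use, matching the package named on the slide:
# ensure_package("plotrix")
# library(plotrix)

# Offline demonstration with a package that ships with R.
has_graphics <- ensure_package("graphics")
print(has_graphics)
```

Checking first avoids re-downloading a package on every run of a script.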

Small Exercise

• > mystring <- "HELLO,WORLD!"

• > print(mystring)

• [1] "HELLO,WORLD!"

Operating System Subchoice: 32- or 64-bit

• Given a choice between a 32-bit and a 64-bit version of an operating system like Linux Ubuntu, keep in mind that the 64-bit version would speed up processing by an approximate factor of 2. However, you need to check whether your current hardware can support 64-bit operating systems; if so, you may want to ask your information technology manager to upgrade at least some of the operating systems in your analytics work environment to 64-bit versions. Smaller hardware like netbooks may not support 64-bit Linux, whereas Windows Home Edition computers may have a 32-bit version installed. There are cost differences due to both hardware and software. One more advantage of 64-bit computing is the support from Revolution Analytics for its version of R Enterprise.
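To find out whether the R build you are running is 32-bit or 64-bit, you can inspect the pointer size that R reports. This is a minimal sketch using base R; the variable name pointer_bits is illustrative.

```r
# .Machine$sizeof.pointer is the pointer size in bytes: 4 on 32-bit builds, 8 on 64-bit builds.
pointer_bits <- .Machine$sizeof.pointer * 8

cat("This R build is", pointer_bits, "bit\n")
cat("Architecture reported by R:", R.version$arch, "\n")
```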

R-Objects

• Unlike in other programming languages such as C and Java, variables in R are not declared with a data type. Instead, variables are assigned R-objects, and the data type of the R-object becomes the data type of the variable. There are many types of R-objects. The frequently used ones are −

• Vectors

• Lists

• Matrices

• Arrays

• Factors

• Data Frames
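The six object types listed above can each be created in one line. The values below are made up for illustration; the last lines show that a variable simply takes on the type of whatever object is assigned to it.

```r
v  <- c(10, 20, 30)                         # vector
l  <- list(name = "Q1", values = v)         # list: elements may differ in type
m  <- matrix(1:6, nrow = 2, ncol = 3)       # matrix: two dimensions
a  <- array(1:24, dim = c(2, 3, 4))         # array: any number of dimensions
f  <- factor(c("east", "west", "east"))     # factor: categorical values
df <- data.frame(region = c("east", "west"),
                 sales  = c(100, 150))      # data frame: table of columns

# No declaration is needed: the variable's type follows the assigned object.
x <- 42L
first_type <- class(x)
x <- "forty-two"
second_type <- class(x)

print(first_type)
print(second_type)
```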

DATA TYPES OF R LANGUAGE

• Logical (TRUE, FALSE)

• Numeric (12.3, 5, 999)

• Integer (2L, 34L, 0L)

• Complex (3+2i)

• Character ('a', "good boy", "TRUE", "24.5")

• Raw ("Hello" is stored as 48 65 6c 6c 6f)
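Each of these data types can be checked with class(). The sketch below builds one value of each type; the list name examples is illustrative.

```r
examples <- list(
  logical   = TRUE,
  numeric   = 12.3,
  integer   = 2L,
  complex   = 3+2i,
  character = "good boy",
  raw       = charToRaw("Hello")   # bytes of the string
)

# Print the class of every example value.
for (name in names(examples)) {
  cat(name, "->", class(examples[[name]]), "\n")
}

# Quoted values are character data, even when they look like other types.
print(class("TRUE"))
print(class("24.5"))

# The raw vector prints as hexadecimal bytes.
print(examples$raw)
```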

Hardware Choices: Cost-Benefit Tradeoffs for Additional Hardware for R

• Hardware costs represent a significant expense for an analytics environment, and hardware also depreciates remarkably over a short period of time. Thus, it is advisable to examine your legacy hardware and your future analytical computing needs and decide accordingly among the various hardware options available for R. Other analytical software can charge by the number of processors, or for servers, which can be more expensive than workstations, or for grid computing, which can be very costly if it is even available. R, by contrast, is well suited for all kinds of hardware environments with flexible costs.

• Because R is memory intensive (it limits the size of the data analyzed to the RAM size of the machine unless special formats or chunking are used), the speed at which R can process data depends on the size of the datasets used and the number of users analyzing a dataset concurrently.

• Thus the defining issue is not R but the size of the data being analyzed and the frequency, repeatability, and level of detail of analysis required.
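The chunking mentioned above can be sketched in base R by reading a file a fixed number of lines at a time, so that only one chunk is in memory at once. This is an illustrative sketch, not a tuned implementation: the demo file, the column name sales, and the chunk size are all assumptions.

```r
# Write a small demo file (hypothetical data) so the sketch is self-contained.
path <- tempfile(fileext = ".csv")
write.csv(data.frame(sales = 1:1000), path, row.names = FALSE)

chunk_size <- 250                     # rows held in memory at any one time
con <- file(path, open = "r")
header <- readLines(con, n = 1)       # consume the header line

total <- 0
rows_seen <- 0
repeat {
  lines <- readLines(con, n = chunk_size)
  if (length(lines) == 0) break       # end of file reached
  chunk <- read.csv(text = lines, header = FALSE, col.names = "sales")
  total <- total + sum(chunk$sales)   # keep only the running aggregate
  rows_seen <- rows_seen + nrow(chunk)
}
close(con)
unlink(path)

cat("Rows processed:", rows_seen, "\n")
cat("Total sales:", total, "\n")
```

Only aggregates that can be updated incrementally (sums, counts, minima, maxima) fit this pattern directly; other analyses may need the special formats the slide refers to.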

Choices Between Local, Cluster, and Cloud Computing

• Local Computing

• Cluster Computing

• Cloud Computing

Interface Choices: Command Line Versus GUI

• R can be used in various ways depending on the level of customization. The main GUIs suitable for business analyst audiences are as follows:

1. R Commander

2. Rattle

3. Deducer and JGR

4. GrapheR

5. RKWard

6. Red-R

7. Others, including SciViews-K
