Introduction to Computational Statistics

Page 1: Introduction to Computational Statistics

Computational Statistics

Setia Pramana

2015

Computational Statistics 1

Course Outline

• Introduction
  – Different Statistical Software

• Data Preparation, Management, Manipulation, Summarization with:
  – SPSS
  – R (R Commander)
  – MS Excel

• Data Tabulation and Visualization

• Generating Different Statistical Distributions (with Rcmdr)
• Simple Linear Regression and Correlation
• Basic R Programming
• Developing a Simple Graphical User Interface in R
• Resampling Methods
• Statistical Inference (point and interval estimation)

• Hypothesis testing: one- and two-sample tests (t-tests for mean differences, plus tests for proportions and variances)

• Analysis of Variance (ANOVA): one-way and two-way ANOVA

• Introduction to Design of Experiments

• Final Project

Course Workload

• 20% theory, 80% practice
• Group project (5 students)
• Presentation every week
• R code will be provided
• Slides are available at: http://www.slideshare.net/hafidztio/

Reference Books

• John Maindonald and W. John Braun. Data Analysis and Graphics Using R: An Example-Based Approach. 3rd edition. Cambridge University Press: Cambridge, 2010.
• John Fox. The R Commander: A Basic-Statistics Graphical User Interface to R. Journal of Statistical Software, Volume 14, Issue 9, September 2005.
• Chris Beeley. Web Application Development with R Using Shiny. Packt Publishing: Birmingham, 2013.
• SPSS Statistics Base User's Guide 17.0. Polar Engineering and Consulting: Chicago, 2007.

• Jurusan Komputasi Statistik. Modul Mata Kuliah Komputasi Statistik. 2014.
• G. Jay Kerns. Introduction to Probability and Statistics Using R. E-book. GNU Free Documentation License. 2010.
• Geof H. Givens and Jennifer A. Hoeting. Computational Statistics. 2nd edition. John Wiley and Sons: New Jersey, 2013.
• Jochen Voss. Statistical Computing. E-book. 2011.
• Brent B. Welch, Ken Jones and Jeffrey Hobbs. Practical Programming in Tcl and Tk. 4th edition. Prentice Hall PTR: New Jersey, 2003.

Other Materials

• https://sites.google.com/site/biostatinfocore/home/rworkshop

• https://sites.google.com/site/biostatinfocore/biostatistics-workshop

Introduction

Statistics?

What is Statistics?

• Statistics is “the science which deals with collection, classification and tabulation of numerical facts as the basis for explanation, description and comparison of phenomenon”.

Observations on the Bills of Mortality (1662)

Recorded plague-related deaths over 100 years

What is Statistics?

• Exploring data: Using graphical and numerical techniques to study patterns and departures from patterns (in order to interpret data)

• Sampling and experimentation: Clarifying the question, deciding on methods of collection and analysis to produce valid information.

• Anticipating patterns: Exploring random phenomena using probability and simulation. Probability is our tool for anticipating distributions...

• Statistical inference: Estimating population parameters and testing hypotheses

“Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.” (H. G. Wells)

Areas of Statistics

Two areas of statistics:

• Descriptive Statistics: collection, presentation, and description of sample data.
• Inferential Statistics: making decisions and drawing conclusions about populations.

Descriptive Statistics

What is your conclusion?

The fatality rate is:

– 40% in the group of drivers who did not wear seat belts
– 20% in the group of drivers who did wear seat belts

• Seat belts appear to save lives
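
As a minimal sketch of this descriptive step, the two fatality rates can be computed in R from a 2 x 2 table of counts. The counts below are hypothetical, chosen only so that the rates match the 40% and 20% quoted above; the slide does not give the underlying numbers, and the object names are made up for this example.

```r
# Hypothetical crash counts chosen only to reproduce the 40% / 20% rates
# on the slide; they are NOT data from an actual study.
crashes <- matrix(
  c(40, 60,    # seat belt not worn: 40 deaths, 60 survivors
    20, 80),   # seat belt worn:     20 deaths, 80 survivors
  nrow = 2, byrow = TRUE,
  dimnames = list(belt = c("not worn", "worn"),
                  outcome = c("died", "survived"))
)

# Row-wise proportions: share of each outcome within each seat-belt group
rates <- prop.table(crashes, margin = 1)
fatality_rate <- rates[, "died"]
print(round(100 * fatality_rate, 1))   # fatality rate in percent, per group

# Ratio of the two fatality rates (relative risk of not wearing a belt)
relative_risk <- fatality_rate["not worn"] / fatality_rate["worn"]
print(unname(relative_risk))
```

These numbers only describe the sample; on their own they do not say whether the difference would hold for all drivers.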

Inferential Statistics

• Are results applicable to the population of all drivers? (generalization)

• Does wearing seat belts save lives? (assess strength of evidence)

• Is the fatality rate of those not wearing seat belts higher than the fatality rate of those wearing seat belts? (comparison)

• How many lives can be saved by wearing seat belts? (prediction)

• Do other variables influence the conclusion? For example: the age of driver, alcohol use, type of car, speed at impact (ask more questions)
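
The comparison question above could be approached with a two-sample test of proportions. The sketch below uses R's built-in `prop.test` with hypothetical counts (40 deaths among 100 unbelted drivers and 20 deaths among 100 belted drivers). These counts are assumptions for illustration only and are not taken from the slides.

```r
# Hypothetical counts (assumed for illustration, not from the slides):
# 40 deaths among 100 unbelted drivers, 20 deaths among 100 belted drivers.
deaths  <- c(no_belt = 40, belt = 20)
drivers <- c(no_belt = 100, belt = 100)

# Two-sample test of proportions.
# Alternative hypothesis: the fatality rate without a belt is higher.
result <- prop.test(x = deaths, n = drivers, alternative = "greater")

print(result$estimate)   # sample fatality rates in the two groups
print(result$p.value)    # strength of evidence against equal rates
print(result$conf.int)   # one-sided confidence interval for the difference
```

The test only addresses the comparison. The questions about generalization and about other variables (age, alcohol use, speed at impact) depend on how the data were collected and would need a richer model.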

Statistics and Technology

• Electronic technology has had a tremendous effect on the field of statistics.

• Many statistical techniques are repetitive in nature, and computers and calculators are good at repetitive work.

• Many statistical software packages are available: R, MINITAB, SYSTAT, STATA, SAS, Statgraphics, SPSS, and MS Excel, as well as calculators.

Available Statistical Packages

• Proprietary: Excel, SPSS, MINITAB, SAS, Stata, Statistica, and many more
• Free software: LibreOffice Calc, R, CSPro, WinBUGS, Epi Info, and many more

Microsoft Excel

Which one do you use?

Why?

Statistical Software Used

R is HOT!

http://r4stats.com/articles/popularity/

What is R?

• A language and environment for statistical computing and graphics.

• An integrated suite of software facilities for data manipulation, calculation and graphical display.

• First appeared in 1996, created by Prof. Ross Ihaka and Robert Gentleman of the University of Auckland, New Zealand.
• GNU software, and therefore free. Similar to the S language.
• Open source, maintained and developed by a community of developers.
• Works on Windows, Unix, and Mac OS.

R includes

• An effective data handling and storage facility
• A suite of operators for calculations on arrays, in particular matrices
• A large, coherent, integrated collection of intermediate tools for data analysis
• Graphical facilities for data analysis and display, either on-screen or on hardcopy
• A well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities

http://www.r-project.org/
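
A short, illustrative sketch of some of these facilities in base R: matrix operators, a conditional inside a user-defined recursive function, and a loop. The names `m`, `gram`, `factorial_rec` and `total` are made up for this example.

```r
# Operators for calculations on arrays and matrices
m <- matrix(1:6, nrow = 2)   # 2 x 3 matrix, filled column by column
m_squared <- m^2             # element-wise square, no explicit loop needed
gram <- m %*% t(m)           # 2 x 2 matrix product of m with its transpose

# A user-defined recursive function that uses a conditional
factorial_rec <- function(n) {
  if (n <= 1) {
    return(1)
  }
  n * factorial_rec(n - 1)
}

# A loop that accumulates the sum of the integers 1 to 10
total <- 0
for (i in 1:10) {
  total <- total + i
}

# Output facilities: print results to the console
print(gram)
print(factorial_rec(5))
print(total)
```

In everyday R code the loop would usually be written as `sum(1:10)`; it is spelled out here only to show the loop syntax.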

Why R?

• It is not only statistical software but also a language

• About 5,000 ready-made add-on packages (http://cran.r-project.org/web/packages/)

• Many application areas: see http://cran.r-project.org/web/views/ and http://www.revolutionanalytics.com/r-language-features-applications-and-extensions#thirdparty

• Access to powerful, cutting-edge analytics

• Flexible (complex or standard statistical practices, Bayesian modelling, GIS map building, building interactive web applications, building interactive tests, etc.)
• We can make our own package and publish it
• Great graphics and data visualization
• Can be used on high-performance computing clusters
• Well supported by the R community (http://www.inside-r.org/r-resources-web)
• And many more

• Can be integrated with other languages (C/C++, Java).

• R can interact with many data sources and other statistical packages (SAS, Stata, SPSS, and Minitab).

• For high-performance computing tasks, R can use multiple cores, either on a single machine or across a network.

But…

• R has no warranty

• Command-line interface: difficult for some users

• Users must learn a new way of thinking about data and about the sequence of a data analysis

• That’s all… I guess

Companies using R in 2013

• The New York Times routinely uses R for interactive and print data visualization.

• Google has more than 500 R users.
• The FDA supports the use of R for clinical trials of new drugs.
• The National Weather Service uses R to predict the extent of flooding events.
• Zillow uses R to model housing prices.
• The Consumer Financial Protection Bureau uses R and other open source tools.
• Twitter uses R for data science applications on the Twitter database.
• FourSquare uses R to develop its recommendation engine.
• Facebook uses R to model all sorts of user behaviour.

Source: Revolution Analytics

R Library/packages

[Diagram: R base packages and add-on packages such as lme4, IsoGene, foreign, survival, zoo, ggplot2, reshape2, nlme]

My R Packages

• IsoGene
• IsoGeneGUI
• nea
• neaGUI
• biclustGUI
• OCRME
• More detail: http://setiopramono.wordpress.com/r-programming/

R For Cutting Edge Technologies

R Graphics and Visualization

• R provides a wide range of graphics and visualizations
• Basic plots: bar plots, basic 3D plots, heat maps, etc.
• Geographic maps
• Projection maps
• Social network graphs
• Animated graphics and movies (animation)
• Motion charts (googleVis)
• Interactive graphics (rggobi)
• Image formats: BMP, JPEG, PDF, PNG, etc.
• And many more

R Graphics

• RCircos (https://gjabel.wordpress.com/)
• A map of worldwide email traffic
• Facebook connections between city centers around the world

R Graphical User Interfaces

• R uses a command-line interface, which advanced users prefer: it allows direct control, is more accurate and flexible, and makes the analysis reproducible.

• It requires good knowledge of the language, which makes it difficult for beginners or less frequent users.

• R provides tools for building GUIs (R GUIs).

R GUI Projects

• Integrated development environments (IDEs) and script editors, which aim to provide feature-rich environments for editing R scripts and code: RStudio (www.rstudio.com) and Architect (www.Openanalytics.eu)

• Web-based applications: Rweb (Banfield, 1999), R.Net (www.u.arizona.edu/~ryckman/Net.php), or gWidgetsWWW (Verzani, 2012)

• Python: OpenMeta-Analyst (Wallace et al., 2012)

• Java: JGR (Java GUI for R), Deducer (Fellows, 2012), and Glotaran (Snellenburg, 2012)

• PHP: R-php (http://dssm.unipa.it/R-php/)

• Other extensions connect R to graphical toolkits for developing menus and dialog boxes: Tcl/Tk, GTK.

RStudio

• Download from Rstudio.com

• Powerful IDE (Integrated Development Environment) for R.

RGUI Developed using tcltk

RGUI: R Commander

• Rcommander.com
• Helpful for R beginners
• Installed from within R

RGUI using C#: Wires

• Developed by STIS students

• For Spatial Data Analysis

• Still under development

RGUI: Web Based App

WebBUGS

• Conducting Bayesian Statistical Analysis Online

• Combines OpenBUGS and R

www.webbugs.psychstat.org

RGUI: Shiny

• A new package from RStudio for building interactive web applications with R.
• Really easy!
• Build useful web applications with only a few lines of code; no JavaScript required.
• Self-learning: http://shiny.rstudio.com/
• http://www.showmeshiny.com/
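
To give a feel for "only a few lines of code", here is a minimal sketch of a Shiny app, assuming the shiny package is installed. The input and output IDs (`n`, `hist`) and the app contents are hypothetical choices for illustration, not taken from the course material.

```r
library(shiny)  # assumes the shiny package is installed

# User interface: a title, one slider, and one plot area
ui <- fluidPage(
  titlePanel("Histogram of random normal draws"),
  sliderInput("n", "Number of observations:",
              min = 10, max = 1000, value = 100),
  plotOutput("hist")
)

# Server logic: redraw the histogram whenever the slider value changes
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = NULL, xlab = "Value")
  })
}

# Combine both parts into an app object.
# In an interactive session, start it with: runApp(app)
app <- shinyApp(ui = ui, server = server)
```

The app object is stored rather than launched so that the script can also be sourced non-interactively; the user interface and the server function are the two pieces every Shiny app needs.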

RGUI using Shiny: FAST

Figure 5. FAST main page

Dynamic Report Generation

• Sweave
• knitr
• markdown

Want to Learn R? Need Help?

Lots of self-learning resources: http://www.rdatamining.com/resources/onlinedocs

Blogs:

Software   # Blogs   Blogs Source
R          550       R-Bloggers.com
Python     60        SciPy.org
SAS        40        PROC-X.com, sasCommunity.org Planet
Stata      11        Stata-Bloggers.com

User groups: Stockholm R User Group, etc. Indonesia/Jakarta?

https://sites.google.com/site/biostatinfocore/introduction-to-r

Need Help?

Number of R- or SAS-related posts to Stack Overflow by week.