Introduction to Data Science with R and Spotfire

Dr. Brand Niemann
Director and Senior Data Scientist/Data Journalist
Semantic Community

Data Science
Data Science for Random Forests

November 2, 2015


Overview

• Learning Path: Data Science with R
  • Play Video: 15 Minute Introduction
• Kaggle Competition
  • How Much Did It Rain Part II
• Kaggle Rain II in Spotfire
  • TIBCO Spotfire TERR Tools


Learning Path: Data Science with R

• Publisher: O'Reilly Media
• Released: August 2015
• Run time: 24 hours 34 minutes
• The R programming language has arguably become the single most important tool for computational statistics, visualization, and data science. With this Learning Path, master all the features you'll need as a data scientist, from the basics to more advanced techniques including R Graph and machine learning. You'll work your data like never before.
• Learning to Program with R, by Stuart Greenlee, 04:18:15
• Introduction to Data Science with R, by Garrett Grolemund, 08:36:40
• Expert Data Wrangling with R, by Garrett Grolemund, 03:50:39
• Writing Great R Code, by Richard Cotton, 00:59:13
• Data Science with Microsoft Azure and R, by Stephen Elston, 06:48:46


Play Video: 15 Minute Introduction

https://player.oreilly.com/videos/9781491940303?toc_id=220077


https://www.kaggle.com/c/how-much-did-it-rain-ii


https://www.kaggle.com/c/how-much-did-it-rain-ii/data

• train.CSV: 1219 MB
• test.CSV: 633 MB
• sample_solution.CSV: 12 MB
• sample_dask.py
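Files of this size can be awkward to load whole, so one option is to compute per-column summary statistics in a single streaming pass. The following is a minimal Python sketch of that idea, not the method used in the slides. The function name `summarize_csv` is hypothetical, the column names are taken from the data dictionary in this deck, and the three sample rows are made up for illustration. It assumes missing values appear as empty cells or as `nan`.

```python
import csv
import io
import math


def summarize_csv(handle, columns):
    """Stream a CSV once and return count/missing/mean/min/max per column."""
    stats = {
        name: {"count": 0, "missing": 0, "sum": 0.0,
               "min": math.inf, "max": -math.inf}
        for name in columns
    }
    for row in csv.DictReader(handle):
        for name in columns:
            cell = (row.get(name) or "").strip()
            # Assumption: a missing reading is an empty cell or the text "nan".
            if cell == "" or cell.lower() == "nan":
                stats[name]["missing"] += 1
                continue
            value = float(cell)
            s = stats[name]
            s["count"] += 1
            s["sum"] += value
            s["min"] = min(s["min"], value)
            s["max"] = max(s["max"], value)
    summary = {}
    for name, s in stats.items():
        n = s["count"]
        summary[name] = {
            "count": n,
            "missing": s["missing"],
            "mean": s["sum"] / n if n else None,
            "min": s["min"] if n else None,
            "max": s["max"] if n else None,
        }
    return summary


# Tiny made-up sample standing in for train.CSV (values are illustrative only).
sample = io.StringIO(
    "radardist_km,Ref,Expected\n"
    "10,,0.254\n"
    "10,21.5,0.254\n"
    "2,30.5,1.016\n"
)
result = summarize_csv(sample, ["Ref", "Expected"])
print(result["Ref"])
print(result["Expected"])
```

To run it on a downloaded file, the in-memory sample would be replaced by a handle from `open("train.CSV", newline="")`, assuming the file has been unpacked under that name.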


Observations

• The preceding slides show the statistical characteristics of the three data sets.
• The approach treats a stochastic problem with a deterministic model.
• Marshall–Palmer relation: $Z = aR^b$, where a and b are adjustable parameters, Z (mm⁶ m⁻³) is the radar reflectivity, and R (mm h⁻¹) is the rainfall rate.
• Data Dictionary:
  • radardist_km: Distance of the gauge from the radar whose observations are being reported.
  • Ref: Radar reflectivity, in dBZ.
  • RefComposite: Maximum reflectivity in the vertical column above the gauge, in dBZ.
  • RhoHV: Correlation coefficient (unitless).
  • Zdr: Differential reflectivity, in dB.
  • Kdp: Specific differential phase (deg/km).
  • Expected: Actual gauge observation, in mm, at the end of the hour.
• Try Insert Calculated Column and/or Regression Modeling.
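A calculated column based on the Marshall–Palmer relation could look like the following Python sketch. It converts a reflectivity reading in dBZ to linear Z and inverts the relation to get a rainfall rate. The function names are hypothetical, and the defaults a = 200 and b = 1.6 are the commonly quoted Marshall–Palmer values, which the slides do not state; treat them as assumptions to be tuned.

```python
import math


def rain_rate_from_dbz(dbz, a=200.0, b=1.6):
    """Invert Z = a * R**b for R (mm/h), given reflectivity in dBZ."""
    z_linear = 10.0 ** (dbz / 10.0)  # dBZ -> Z in mm^6 m^-3
    return (z_linear / a) ** (1.0 / b)


def dbz_from_rain_rate(rate, a=200.0, b=1.6):
    """Forward relation: reflectivity in dBZ for a rainfall rate in mm/h."""
    return 10.0 * math.log10(a * rate ** b)


# Assumed coefficients a=200, b=1.6; print a few illustrative conversions.
for dbz in (20.0, 30.0, 40.0):
    print(dbz, rain_rate_from_dbz(dbz))
```

Because the rate is an instantaneous estimate in mm/h while Expected is an hourly gauge total in mm, comparing the two would still require aggregating the radar readings over the hour.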


Insert Calculated Column
Regression Modeling
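For the regression route, one possible approach is to estimate the adjustable parameters a and b of Z = aR^b by ordinary least squares after taking logarithms. In dBZ the relation becomes dBZ = 10·log10(a) + 10·b·log10(R), which is linear in log10(R). The Python sketch below illustrates this under stated assumptions: `fit_marshall_palmer` is a hypothetical helper, and the data are synthetic and noise-free, generated from a = 200 and b = 1.6 purely to check that the fit recovers them.

```python
import math


def fit_marshall_palmer(dbz_values, rain_rates):
    """Least-squares fit of dBZ = A + B*log10(R); returns (a, b)."""
    xs = [math.log10(r) for r in rain_rates]
    ys = list(dbz_values)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                   # B = 10 * b
    intercept = mean_y - slope * mean_x  # A = 10 * log10(a)
    a = 10.0 ** (intercept / 10.0)
    b = slope / 10.0
    return a, b


# Synthetic, noise-free data generated from Z = 200 * R**1.6 (illustrative only).
rates = [0.5, 1.0, 2.0, 5.0, 10.0, 25.0]
dbz = [10.0 * math.log10(200.0 * r ** 1.6) for r in rates]
a_hat, b_hat = fit_marshall_palmer(dbz, rates)
print(a_hat, b_hat)
```

Rows with a rainfall rate of zero have to be excluded before fitting, since the logarithm is undefined there. With real, noisy gauge data the fitted values will also depend on which variable is treated as the response.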


TIBCO Spotfire TERR Tools

MORE TO FOLLOW