Technologies Fueling Predictive Analytics Discussion & Demos

Page 1: Technologies Fueling Predictive Analytics Discussion & Demos

Technologies Fueling Predictive Analytics Discussion & Demos

Page 2: Technologies Fueling Predictive Analytics Discussion & Demos

Terrorist Surveillance

Page 3: Technologies Fueling Predictive Analytics Discussion & Demos

Winning Baseball Games

Page 4: Technologies Fueling Predictive Analytics Discussion & Demos

BIG DATA LEARNING

Page 5: Technologies Fueling Predictive Analytics Discussion & Demos

RIGHT TOOL FOR THE RIGHT JOB

1. Data Discovery
2. Model Prototyping and Selection
3. Integration into broader data strategy
   1. POC Level Solution
   2. Robust Solution
4. Consumable location(s)

Page 6: Technologies Fueling Predictive Analytics Discussion & Demos

DATA DISCOVERY TOOLS

Page 7: Technologies Fueling Predictive Analytics Discussion & Demos

MODEL BUILDING & SELECTION TOOLS

MS R OPEN

On-Premise / Cloud

On-Premise

R OPEN CRAN AZUREML

Cloud

Page 8: Technologies Fueling Predictive Analytics Discussion & Demos

AZURE ML

Algorithm Marketplace

Cloud Sharing API Integration

Page 9: Technologies Fueling Predictive Analytics Discussion & Demos

DEMO OF AZURE ML

Page 10: Technologies Fueling Predictive Analytics Discussion & Demos

WHAT IS R (AND MS R OPEN)?

• Scalable
• Open Source
• Global Community
• Eco-System

Page 11: Technologies Fueling Predictive Analytics Discussion & Demos

MICROSOFT R CLIENT

MRAN Parallel ScaleR Prod. Locally

Page 12: Technologies Fueling Predictive Analytics Discussion & Demos

FORCES CHALLENGING THE IMPLEMENTATION OF R

Page 13: Technologies Fueling Predictive Analytics Discussion & Demos

MICROSOFT R SERVER

Efficiency Speed and Scalability

Peace of Mind Agility

Page 14: Technologies Fueling Predictive Analytics Discussion & Demos

MICROSOFT R SERVER

• 100% Open R Source
• CRAN, MRAN, GitHub Connectivity
• Big-Data Connectivity
• Scalable Analytics
• Multi-Platform
• In-Database, In-Cluster Processing
• Choice of IDE

R Server Technology

DeployR IDE

ConnectR

ScaleR

DistributedR

CRAN

Microsoft R Open

Licensed Components

Open Source Components

Page 15: Technologies Fueling Predictive Analytics Discussion & Demos

MICROSOFT R SERVER

• 100% Open R Source
• CRAN, MRAN, GitHub Connectivity
• Big-Data Connectivity
• Scalable Analytics
• Multi-Platform
• In-Database, In-Cluster Processing
• Choice of IDE

Page 16: Technologies Fueling Predictive Analytics Discussion & Demos

COMPONENTS OF R SERVER

Page 17: Technologies Fueling Predictive Analytics Discussion & Demos

REVOSCALER

Not available in MS R open

Not available in MS R open

MS R Client

MS R Server

Distributed Execution

Enhanced File Format

Improved Functions

Stream Data to Disk
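The streaming and distributed-execution ideas on this slide can be illustrated with a small sketch. It is written in Python rather than R, and it is not how RevoScaleR is implemented; it only shows the general pattern of summarizing one block of rows at a time and then merging the partial results, so the full data set never has to sit in memory. The names `Summary`, `summarize`, `merge`, and `chunks` are hypothetical, and the in-memory list stands in for blocks read from disk.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Summary:
    """Running count, mean, and sum of squared deviations (m2)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    @property
    def variance(self) -> float:
        # Sample variance; undefined for fewer than two observations.
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


def summarize(chunk: Iterable[float]) -> Summary:
    """Summarize one chunk with Welford's single-pass update."""
    s = Summary()
    for x in chunk:
        s.n += 1
        delta = x - s.mean
        s.mean += delta / s.n
        s.m2 += delta * (x - s.mean)
    return s


def merge(a: Summary, b: Summary) -> Summary:
    """Combine two partial summaries (pairwise formula of Chan et al.)."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Summary(n, mean, m2)


def chunks(values: List[float], size: int) -> Iterator[List[float]]:
    """Hypothetical stand-in for reading fixed-size blocks of rows from disk."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


# Illustrative data only; a real workload would stream blocks from a file.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
total = Summary()
for block in chunks(data, size=3):
    total = merge(total, summarize(block))

print(total.n, total.mean, total.variance)
```

Because `merge` only needs two partial summaries, the same function could combine results computed on separate workers, which is the basic shape of a distributed execution.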

Page 18: Technologies Fueling Predictive Analytics Discussion & Demos

REVOSCALER FUNCTIONS

Data Preparation
• Data import: delimited, fixed, SAS, SPSS, ODBC
• Variable creation & transformation
• Recode variables
• Factor variables
• Missing value handling
• Sort, Merge, Split
• Aggregate by category (means, sums)

Descriptive Statistics
• Min/Max, Mean, Median (approx.)
• Quantiles (approx.)
• Standard Deviation
• Variance
• Correlation
• Covariance
• Sum of Squares (cross product matrix for set variables)
• Pairwise Cross tabs
• Risk Ratio & Odds Ratio
• Cross-Tabulation of Data (standard tables & long form)
• Marginal Summaries of Cross Tabulations

Statistical Tests
• Chi Square Test
• Kendall Rank Correlation
• Fisher’s Exact Test

Sampling
• Subsample (observations & variables)
• Random sampling

Predictive Models
• Sum of Squares (cross product matrix for set variables)
• Multiple Linear Regression
• Generalized Linear Models (GLM): exponential family distributions (binomial, Gaussian, inverse Gaussian, Poisson, Tweedie); standard link functions (cauchit, identity, log, logit, probit); user-defined distributions & link functions
• Covariance & Correlation Matrices
• Logistic Regression
• Classification & Regression Trees
• Predictions/scoring for models
• Residuals for all models

Variable Selection
• Stepwise Regression

Simulation
• Simulation (e.g. Monte Carlo)
• Parallel Random Number Generation

Cluster Analysis
• K-Means

Classification
• Decision Trees
• Decision Forests
• Gradient Boosted Decision Trees
• Naïve Bayes

Combination
• rxDataStep
• rxExec
• PEMA-R API Custom Algorithms
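As a small worked example of one item in the list above, the risk ratio and odds ratio can be computed from a 2x2 table of counts. This is a plain Python sketch of the textbook formulas, not the RevoScaleR function; the helper name `risk_and_odds_ratio` and the counts are made up for illustration.

```python
from typing import Tuple


def risk_and_odds_ratio(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Return (risk ratio, odds ratio) for a 2x2 table of counts.

                   outcome   no outcome
    exposed           a          b
    unexposed         c          d
    """
    risk_exposed = a / (a + b)      # share of the exposed group with the outcome
    risk_unexposed = c / (c + d)    # share of the unexposed group with the outcome
    risk_ratio = risk_exposed / risk_unexposed
    odds_ratio = (a * d) / (b * c)  # cross-product ratio
    return risk_ratio, odds_ratio


# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed have the outcome.
rr, odds = risk_and_odds_ratio(20, 80, 10, 90)
print(rr, odds)
```

With these counts the risk ratio is 0.2 / 0.1 = 2 and the odds ratio is (20 × 90) / (80 × 10) = 2.25. The sketch assumes every cell is non-zero; zero cells would need a continuity correction or an exact method.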

Page 19: Technologies Fueling Predictive Analytics Discussion & Demos

Microsoft R Server

DeployR DevelopR

ConnectR

ScaleR

DistributedR

R+CRAN

RSR

Connector

DISTRIBUTED R: WRITE ONCE, DEPLOY ANYWHERE

Code Portability Across Platforms

• Workstations & Servers: Linux, Windows
• Hadoop: Hortonworks, Cloudera, MapR; + HD Insights, + Hadoop Spark
• EDW: Teradata + SQL Server v16
• In the Cloud: Azure Marketplace + Azure ML

Roadmap

Page 20: Technologies Fueling Predictive Analytics Discussion & Demos

R VS MS R VS R SERVER

Attribute | R | Microsoft R Open | Microsoft R Server
Data size | In-memory | In-memory | In-memory or Disk Based
Speed of Analysis | Single threaded | Multi-threaded | Multi-threaded, parallel processing 1:N servers
Support | Community | Community | Community + Commercial
Analytic Breadth & Depth | 7500+ innovative analytic packages | 7500+ innovative analytic packages | 7500+ innovative packages + commercial parallel high-speed functions
License | Open Source | Open Source | Commercial license, supported release with indemnity

Page 21: Technologies Fueling Predictive Analytics Discussion & Demos

DEMO OF MS OPEN / R SERVER

Page 22: Technologies Fueling Predictive Analytics Discussion & Demos

A NOD TO OTHER TECHNOLOGIES

Page 23: Technologies Fueling Predictive Analytics Discussion & Demos

CONSIDER PLATFORM AS A SERVICE (PAAS)

1. Security & Governance

4. Rapid Improvement

2. Sharing & Collaboration

3. Easier licensing

Page 24: Technologies Fueling Predictive Analytics Discussion & Demos


Conduct a 1-2 hour workshop with business stakeholders to identify opportunities to adopt Big Data and Advanced Analytics solutions:

• Joint Strategy session
• Identify various Big Data solution design patterns
• Brainstorm Big Data and Advanced Analytics use cases
• Discuss opportunities for PoCs and PoTs

DISCOVER YOUR DATA’S POTENTIAL

Page 25: Technologies Fueling Predictive Analytics Discussion & Demos

Marc [email protected]

Karla [email protected]

Mike [email protected]

Advanced Analytics Tour

Questions?