Technologies Fueling Predictive Analytics: Discussion & Demos


Terrorist Surveillance

Winning Baseball Games

BIG DATA LEARNING

RIGHT TOOL FOR THE RIGHT JOB

1. Data Discovery
2. Model Prototyping and Selection
3. Integration into broader data strategy

1. POC Level Solution
2. Robust Solution

4. Consumable location(s)

DATA DISCOVERY TOOLS

MODEL BUILDING & SELECTION TOOLS

MS R OPEN

On-Premise / Cloud
On-Premise

R OPEN CRAN AZUREML

Cloud

AZURE ML

Algorithm Marketplace

Cloud Sharing API Integration

DEMO OF AZURE ML

WHAT IS R (AND MS R OPEN)?

Scalable
Open Source
Global Community

Eco-System

MICROSOFT R CLIENT

MRAN Parallel ScaleR Prod. Locally

FORCES CHALLENGING THE IMPLEMENTATION OF R

MICROSOFT R SERVER

Efficiency Speed and Scalability

Peace of Mind Agility

MICROSOFT R SERVER

• 100% Open Source R
• CRAN, MRAN, GitHub Connectivity
• Big-Data Connectivity
• Scalable Analytics
• Multi-Platform
• In-Database, In-Cluster Processing
• Choice of IDE

R Server Technology

DeployR IDE

ConnectR

ScaleR

DistributedR

CRAN

Microsoft R Open

Licensed Components
Open Source Components


COMPONENTS OF R SERVER

REVOSCALER

Not available in MS R Open

MS R Client

MS R Server

Distributed Execution

Enhanced File Format

Improved Functions

Stream Data to Disk
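The "stream data to disk" idea can be illustrated with a minimal sketch: read a delimited file a few rows at a time and keep only running totals, so memory use does not grow with file size. This is plain Python and is not the RevoScaleR API; the helper names (`iter_chunks`, `streaming_summary`), the `score` column, and the tiny in-memory sample are all assumptions made for illustration.

```python
import csv
import io
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size rows from any row iterator."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return
        yield chunk


def streaming_summary(lines, column, chunk_size=2):
    """Compute count, min, max and mean of one numeric column chunk by chunk."""
    reader = csv.DictReader(lines)
    count, total = 0, 0.0
    lo, hi = float("inf"), float("-inf")
    for chunk in iter_chunks(reader, chunk_size):
        # Only the current chunk is held in memory at any time.
        values = [float(row[column]) for row in chunk]
        count += len(values)
        total += sum(values)
        lo = min(lo, min(values))
        hi = max(hi, max(values))
    mean = total / count if count else float("nan")
    return {"n": count, "min": lo, "max": hi, "mean": mean}


# Hypothetical stand-in for a large delimited file on disk.
sample = io.StringIO("id,score\n1,10\n2,20\n3,30\n4,40\n5,50\n")
summary = streaming_summary(sample, "score")
print(summary)
```

In practice the `io.StringIO` object would be replaced by an open file handle, and the chunk size would be tuned to the available memory.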

REVOSCALER FUNCTIONS

Data Preparation
• Data import: delimited, fixed, SAS, SPSS, ODBC
• Variable creation & transformation
• Recode variables
• Factor variables
• Missing value handling
• Sort, Merge, Split
• Aggregate by category (means, sums)

Descriptive Statistics
• Min/Max, Mean, Median (approx.)
• Quantiles (approx.)
• Standard Deviation
• Variance
• Correlation
• Covariance
• Sum of Squares (cross product matrix for set variables)
• Pairwise Cross tabs
• Risk Ratio & Odds Ratio
• Cross-Tabulation of Data (standard tables & long form)
• Marginal Summaries of Cross Tabulations

Statistical Tests
• Chi Square Test
• Kendall Rank Correlation
• Fisher’s Exact Test

Sampling
• Subsample (observations & variables)
• Random sampling

Predictive Models
• Sum of Squares (cross product matrix for set variables)
• Multiple Linear Regression
• Generalized Linear Models (GLM): exponential family distributions (binomial, Gaussian, inverse Gaussian, Poisson, Tweedie); standard link functions (cauchit, identity, log, logit, probit); user-defined distributions & link functions
• Covariance & Correlation Matrices
• Logistic Regression
• Classification & Regression Trees
• Predictions/scoring for models
• Residuals for all models

Variable Selection
• Stepwise Regression

Simulation
• Simulation (e.g. Monte Carlo)
• Parallel Random Number Generation

Cluster Analysis
• K-Means

Classification
• Decision Trees
• Decision Forests
• Gradient Boosted Decision Trees
• Naïve Bayes

Combination
• rxDataStep
• rxExec
• PEMA-R API Custom Algorithms
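Many of the functions listed above scale because each data chunk can be summarized independently and the partial summaries merged afterwards, which is the general idea behind a parallel external memory algorithm. The sketch below shows that pattern for mean and variance in plain Python, using the standard pairwise merge formula for sums of squared deviations. It is not the PEMA-R or RevoScaleR API; the `Moments` class, `process_chunk`, `merge`, and the sample chunks are hypothetical names and data chosen for illustration.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class Moments:
    """Partial summary of a chunk: count, mean, and sum of squared deviations."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    @property
    def variance(self):
        """Sample variance (n - 1 denominator)."""
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


def process_chunk(values):
    """Summarize one chunk on its own, as a single worker would."""
    n = len(values)
    if n == 0:
        return Moments()
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)
    return Moments(n, mean, m2)


def merge(a, b):
    """Combine two partial summaries without revisiting the raw data."""
    n = a.n + b.n
    if n == 0:
        return Moments()
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta ** 2 * a.n * b.n / n
    return Moments(n, mean, m2)


# Hypothetical chunks, e.g. blocks of one file or partitions on several nodes.
chunks = [[2.0, 4.0, 4.0], [4.0, 5.0], [5.0, 7.0, 9.0]]
partials = [process_chunk(c) for c in chunks]
total = reduce(merge, partials, Moments())
print(total.n, total.mean, total.variance)
```

Because `merge` is associative, the partial results can be combined in any order, which is what allows the per-chunk step to run on separate cores or servers.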

Microsoft R Server

DeployR DevelopR

ConnectR

ScaleR

DistributedR

R+CRAN

RSR Connector

DISTRIBUTED R: WRITE ONCE, DEPLOY ANYWHERE

Code Portability Across Platforms

Workstations & Servers: Linux, Windows

Hadoop: Hortonworks, Cloudera, MapR + HDInsight + Hadoop Spark

EDW: Teradata + SQL Server v16

In the Cloud: Azure Marketplace + Azure ML

Roadmap

R VS MS R VS R SERVER

Feature | R | Microsoft R Open | Microsoft R Server
Data size | In-memory | In-memory | In-memory or disk-based
Speed of analysis | Single-threaded | Multi-threaded | Multi-threaded, parallel processing 1:N servers
Support | Community | Community | Community + commercial
Analytic breadth & depth | 7500+ innovative analytic packages | 7500+ innovative analytic packages | 7500+ innovative packages + commercial parallel high-speed functions
License | Open source | Open source | Commercial license, supported release with indemnity

DEMO OF MS R OPEN / R SERVER

A NOD TO OTHER TECHNOLOGIES

CONSIDER PLATFORM AS A SERVICE (PAAS)

1. Security & Governance
2. Sharing & Collaboration
3. Easier licensing
4. Rapid Improvement


Conduct a 1-2 hour workshop with business stakeholders to identify opportunities to adopt Big Data and Advanced Analytics solutions:

• Joint strategy session
• Identify various Big Data solution design patterns
• Brainstorm Big Data and Advanced Analytics use cases
• Discuss opportunities for PoCs and PoTs

DISCOVER YOUR DATA’S POTENTIAL

Marc Lobree, Marc.lobree@neudesic.com

Karla Benefiel, Karla.benefiel@neudesic.com

Mike Rossi, Mike.rossi@neudesic.com

Advanced Analytics Tour

Questions?