Technologies Fueling Predictive Analytics: Discussion & Demos
Terrorist Surveillance
Winning Baseball Games
BIG DATA LEARNING
RIGHT TOOL FOR THE RIGHT JOB
1. Data Discovery
2. Model Prototyping and Selection
3. Integration into broader data strategy
   1. POC Level Solution
   2. Robust Solution
4. Consumable location(s)
DATA DISCOVERY TOOLS
MODEL BUILDING & SELECTION TOOLS
MS R OPEN
On-Premise / Cloud
On-Premise
R OPEN CRAN AZUREML
Cloud
AZURE ML
Algorithm Marketplace
Cloud Sharing API Integration
DEMO OF AZURE ML
WHAT IS R (AND MS R OPEN)?
Scalable
Open Source
Global Community
Eco-System
MICROSOFT R CLIENT
MRAN Parallel ScaleR Prod. Locally
FORCES CHALLENGING THE IMPLEMENTATION OF R
MICROSOFT R SERVER
Efficiency Speed and Scalability
Peace of Mind Agility
MICROSOFT R SERVER
• 100% Open R Source
• CRAN, MRAN, GitHub Connectivity
• Big-Data Connectivity
• Scalable Analytics
• Multi-Platform
• In-Database, In-Cluster Processing
• Choice of IDE
R Server Technology
DeployR IDE
ConnectR
ScaleR
DistributedR
CRAN
Microsoft R Open
Licensed Components
Open Source Components
COMPONENTS OF R SERVER
REVOSCALER
Not available in MS R Open
Not available in MS R Open
MS R Client
MS R Server
Distributed Execution
Enhanced File Format
Improved Functions
Stream Data to Disk
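The idea behind distributed execution and streaming data to disk can be illustrated with a small chunked computation: each chunk is summarized on its own, and the partial summaries are merged, so the full data set never has to sit in memory at once. The sketch below is plain Python and is not RevoScaleR's implementation; the names `Moments`, `summarize_chunk`, `read_in_chunks`, and `chunked_moments` are hypothetical, and the merge step uses the well-known pairwise update for mean and variance.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Moments:
    """Partial summary of a data set: count, mean, and sum of squared deviations."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def merge(self, other: "Moments") -> "Moments":
        """Combine two partial summaries without revisiting the raw data."""
        if other.n == 0:
            return self
        if self.n == 0:
            return other
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Moments(n, mean, m2)

    @property
    def variance(self) -> float:
        """Sample variance (n - 1 denominator); NaN for fewer than two values."""
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


def summarize_chunk(chunk: List[float]) -> Moments:
    """Summarize one chunk that fits in memory."""
    n = len(chunk)
    if n == 0:
        return Moments()
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)
    return Moments(n, mean, m2)


def read_in_chunks(values: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield successive lists of at most chunk_size values (stand-in for a file reader)."""
    chunk: List[float] = []
    for v in values:
        chunk.append(v)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def chunked_moments(values: Iterable[float], chunk_size: int = 1000) -> Moments:
    """Compute count, mean, and variance one chunk at a time."""
    total = Moments()
    for chunk in read_in_chunks(values, chunk_size):
        total = total.merge(summarize_chunk(chunk))
    return total
```

Because `merge` only needs the partial summaries, the same function could combine results computed by separate workers or servers, which is the property that makes this style of algorithm suitable for distributed execution.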
REVOSCALER FUNCTIONS

Data Preparation
• Data import: delimited, fixed, SAS, SPSS, ODBC
• Variable creation & transformation
• Recode variables
• Factor variables
• Missing value handling
• Sort, Merge, Split
• Aggregate by category (means, sums)

Descriptive Statistics
• Min/Max, Mean, Median (approx.)
• Quantiles (approx.)
• Standard Deviation
• Variance
• Correlation
• Covariance
• Sum of Squares (cross product matrix for set variables)
• Pairwise Cross tabs
• Risk Ratio & Odds Ratio
• Cross-Tabulation of Data (standard tables & long form)
• Marginal Summaries of Cross Tabulations

Statistical Tests
• Chi Square Test
• Kendall Rank Correlation
• Fisher’s Exact Test

Sampling
• Subsample (observations & variables)
• Random sampling

Predictive Models
• Sum of Squares (cross product matrix for set variables)
• Multiple Linear Regression
• Generalized Linear Models (GLM): exponential family distributions (binomial, Gaussian, inverse Gaussian, Poisson, Tweedie); standard link functions (cauchit, identity, log, logit, probit); user-defined distributions & link functions
• Covariance & Correlation Matrices
• Logistic Regression
• Classification & Regression Trees
• Predictions/scoring for models
• Residuals for all models

Variable Selection
• Stepwise Regression

Simulation
• Simulation (e.g. Monte Carlo)
• Parallel Random Number Generation

Cluster Analysis
• K-Means

Classification
• Decision Trees
• Decision Forests
• Gradient Boosted Decision Trees
• Naïve Bayes

Combination
• rxDataStep
• rxExec
• PEMA-R API Custom Algorithms
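The function list marks the median and quantiles as approximate. One common reason for that trade-off is that an exact quantile needs the data sorted, while an approximation can be built from a compact summary in a single pass. The sketch below illustrates this with a fixed-width histogram in plain Python; it is an assumption about the general technique, not a description of how RevoScaleR computes its values, and the names `build_histogram`, `merge_histograms`, and `approx_quantile` are hypothetical. The value range (`lo`, `hi`) is assumed to be known in advance, for example from a min/max pass.

```python
from typing import Iterable, List


def build_histogram(values: Iterable[float], lo: float, hi: float, bins: int) -> List[int]:
    """Count values into equal-width bins over [lo, hi]; out-of-range values are clamped."""
    if hi <= lo or bins <= 0:
        raise ValueError("need hi > lo and bins > 0")
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        idx = min(max(idx, 0), bins - 1)
        counts[idx] += 1
    return counts


def merge_histograms(a: List[int], b: List[int]) -> List[int]:
    """Histograms over the same bins combine by adding counts, so chunks can be processed separately."""
    if len(a) != len(b):
        raise ValueError("histograms must use the same bins")
    return [x + y for x, y in zip(a, b)]


def approx_quantile(counts: List[int], lo: float, hi: float, q: float) -> float:
    """Estimate the q-th quantile (0 <= q <= 1) by linear interpolation inside the target bin."""
    total = sum(counts)
    if total == 0:
        raise ValueError("empty histogram")
    width = (hi - lo) / len(counts)
    target = q * total
    cumulative = 0
    for i, c in enumerate(counts):
        if c > 0 and cumulative + c >= target:
            fraction = (target - cumulative) / c
            return lo + (i + fraction) * width
        cumulative += c
    return hi
```

For data inside the assumed range, the error of the estimate is bounded by the bin width, so more bins give a tighter answer at the cost of a larger summary.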
Microsoft R Server
DeployR DevelopR
ConnectR
ScaleR
DistributedR
R+CRAN
RSR Connector
DISTRIBUTED R
WRITE ONCE, DEPLOY ANYWHERE
Workstations & Servers
Linux
Windows
Code Portability Across Platforms
Hadoop
Hortonworks
Cloudera
MapR
+ HDInsight + Hadoop Spark
EDW
Teradata + SQL Server v16
In the Cloud
Azure Marketplace + Azure ML
Roadmap
R VS MS R VS R SERVER
Feature | R | Microsoft R Open | Microsoft R Server
Data size | In-memory | In-memory | In-memory or disk-based
Speed of analysis | Single-threaded | Multi-threaded | Multi-threaded, parallel processing across 1:N servers
Support | Community | Community | Community + commercial
Analytic breadth & depth | 7500+ innovative analytic packages | 7500+ innovative analytic packages | 7500+ innovative packages + commercial parallel high-speed functions
License | Open source | Open source | Commercial license, supported release with indemnity
DEMO OF MS OPEN / R SERVER
A NOD TO OTHER TECHNOLOGIES
CONSIDER PLATFORM AS A SERVICE (PAAS)
1. Security & Governance
2. Sharing & Collaboration
3. Easier licensing
4. Rapid Improvement
Conduct a 1-2 hour workshop with business stakeholders to identify opportunities to adopt Big Data and Advanced Analytics solutions:
• Joint strategy session
• Identify various Big Data solution design patterns
• Brainstorm Big Data and Advanced Analytics use cases
• Discuss opportunities for PoCs and PoTs
DISCOVER YOUR DATA’S POTENTIAL
Marc [email protected]
Karla [email protected]
Mike [email protected]
Advanced Analytics Tour
Questions?