Revolution Confidential
Revolution Analytics
Overview of Revolution R Enterprise
Joseph B. Rickert, Marketing Manager
For the Dallas R User’s Group
Agenda
• Revolution Analytics Today
• Revolution R Enterprise
• Revolution Analytics in the Enterprise
• Big Data with RevoScaleR
• Deploying R Throughout the Enterprise with RevoDeployR
Corporate Overview & Quick Facts
Founded: 2008 (as REvolution Computing)
Office locations: Palo Alto (HQ), Seattle (Eng)
CEO: David Rich
Number of employees: 40+
Number of customers: 100+
Investors: Northbridge Venture Partners, Intel Capital, Presidio Ventures
“Revolution Analytics is the leading commercial provider of software and support for the
open-source R statistical computing language.”
OPEN SOURCE ANALYTICS FOR THE ENTERPRISE
The professor who invented analytic software for the experts now wants to take it to the masses
Most advanced statistical analysis software available
Half the cost of commercial alternatives
2M+ Users
2,500+ Applications
Statistics
Predictive Analytics
Data Mining
Visualization
Finance
Life Sciences
Manufacturing
Retail
Telecom
Social Media
Government
Power
Productivity
Enterprise Readiness
Revolution R Enterprise
Productivity
Revolution R Enterprise has the open-source R engine at its core
2,500 community packages and growing exponentially
R Engine: Language, Libraries
Community Packages
Technical Support
Web Services API
Big Data Analysis
Developer IDE
Build Assurance
Parallel Tools
Multi-Threaded Math Libraries
A network of partners for integrated, large-scale data analysis
Advanced Analytics
Deployment / Consumption
Data Infrastructure
Revolution R Enterprise
Performance
Performance: Intel MKL Math Libraries
Open Source R vs. Revolution R Enterprise, on a 4-core laptop

Computation                        Open Source R 2.13.2   Revolution R Enterprise 5.0   Speedup
Linear Algebra (1)
  Matrix Multiply                  174.6 sec              10.4 sec                      15.8x
  Cholesky Factorization           25.7 sec               1.4 sec                       17.6x
  Linear Discriminant Analysis     224.4 sec              20.1 sec                      7.6x
General R Benchmarks (2)
  R Benchmarks (Matrix Functions)  24.9 sec               3.8 sec                       5.5x
  R Benchmarks (Program Control)   4.7 sec                4.6 sec                       Not appreciable

1. http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php
2. http://r.research.att.com/benchmarks/
Revolution R Enterprise
Big Data Analysis
A common analytic platform across big data architectures
Hadoop · File-based · In-database
Two Big Data problems: capacity and speed
• Capacity: problems handling the size of data sets or models
  – Data too big to fit into memory
  – Even if it can fit, there are limits on what can be done
  – Even simple data management can be extremely challenging
• Speed: even without a capacity limit, computation may be too slow to be useful
RevoScaleR: Big Data Analysis for Revolution R Enterprise
• Distributed Statistical Algorithms: address performance by distributing computations between cores and computers
• External Memory Programming Framework: addresses capacity through a collection of functions for chunking through massive data files
• XDF File Format: a novel high-speed file format designed specifically to support statistical analyses
• R Language Interface: a familiar, high-productivity programming paradigm for R users
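To make "chunking through massive data files" concrete, here is a minimal Python sketch. It does not reproduce the XDF format, whose layout these slides do not describe. Instead it assumes a hypothetical file of raw little-endian float64 values and reads that file a fixed number of rows at a time, so only one block is ever held in memory.

```python
import os
import struct
import tempfile
from typing import Iterator, List

VALUE = struct.Struct("<d")  # one little-endian float64 (hypothetical layout, not XDF)

def write_values(path: str, values: List[float]) -> None:
    """Write values as a flat run of float64 records."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}d", *values))

def read_blocks(path: str, block_rows: int) -> Iterator[List[float]]:
    """Yield the file's values block_rows at a time; only one block is in memory."""
    with open(path, "rb") as f:
        while True:
            raw = f.read(block_rows * VALUE.size)
            if not raw:
                break
            n = len(raw) // VALUE.size
            yield list(struct.unpack(f"<{n}d", raw[: n * VALUE.size]))

def chunked_sum(path: str, block_rows: int = 1000) -> float:
    """An external-memory computation: accumulate a running total block by block."""
    total = 0.0
    for block in read_blocks(path, block_rows):
        total += sum(block)
    return total

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "values.bin")
        write_values(path, [float(i) for i in range(10_001)])
        print(chunked_sum(path, block_rows=256))  # 50005000.0
```

The block size trades memory for I/O calls. A real big-data format would also store column metadata and let a reader skip unneeded columns, which this sketch does not attempt.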
The basis for a solution for capacity, speed, distributed and streaming data: PEMAs
• Parallel external memory algorithms (PEMAs) solve both capacity and speed problems, and can deal with distributed and streaming data
• External memory algorithms allow computations to be split into pieces, so that not all of the data has to be in memory at one time
• Such algorithms can be parallelized and distributed “automatically”
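A PEMA depends on intermediate results that can be updated one chunk at a time and then combined. Below is a minimal Python sketch that uses mean and variance as the example statistic. Each chunk produces a count, a mean, and a sum of squared deviations (M2). Any two partial results are then merged with the pairwise update of Chan, Golub and LeVeque. This is a generic illustration of the idea, not RevoScaleR's implementation.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List

@dataclass
class Moments:
    """Intermediate result: count, mean, and sum of squared deviations (M2)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, chunk: Iterable[float]) -> "Moments":
        """Fold one chunk of raw data into this result (Welford's update)."""
        for x in chunk:
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)
        return self

    def merge(self, other: "Moments") -> "Moments":
        """Combine two partial results without revisiting the raw data."""
        if other.n == 0:
            return Moments(self.n, self.mean, self.m2)
        if self.n == 0:
            return Moments(other.n, other.mean, other.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Moments(n, mean, m2)

    @property
    def variance(self) -> float:
        """Sample variance (n - 1 denominator)."""
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")

def chunks(data: List[float], size: int) -> Iterator[List[float]]:
    """Split data into consecutive pieces, standing in for blocks read from disk."""
    for i in range(0, len(data), size):
        yield data[i : i + size]

if __name__ == "__main__":
    data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    # Each chunk could be processed on a different core or compute node.
    partials = [Moments().update(c) for c in chunks(data, 3)]
    total = Moments()
    for p in partials:
        total = total.merge(p)
    print(total.n, total.mean, total.variance)
```

Because `merge` is associative, partial results can be combined in any grouping. That is what allows the same algorithm to run chunk by chunk, thread by thread, or node by node.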
RevoScaleR on a Multicore Server
[Diagram: RevoScaleR on a multicore processor (4, 8, 16+ cores), Core 0 (Thread 0) through Core n (Thread n), with data blocks read from disk into shared memory]
• A RevoScaleR algorithm is provided a data source as input
• The algorithm loops over the data, reading a block at a time. Blocks of data are read by a separate worker thread (Thread 0)
• Other worker threads (Threads 1..n) process the data block from the previous iteration of the data loop and update intermediate results objects in memory
• When all of the data is processed, a master results object is created from the intermediate results objects
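The block-reading loop described above can be sketched with Python's standard `threading` and `queue` modules. It shows the pattern under stated assumptions: one reader thread, several workers that each keep their own intermediate result, and a final combine step. It is not RevoScaleR's code. Because of CPython's global interpreter lock, it illustrates the structure rather than delivering a real multicore speedup.

```python
import queue
import threading
from typing import List, Sequence, Tuple

_DONE = object()  # sentinel: tells a worker there are no more blocks

def reader(data: Sequence[float], block_size: int, q: queue.Queue, n_workers: int) -> None:
    """Reader thread (the slide's Thread 0): push one block at a time onto the queue."""
    for start in range(0, len(data), block_size):
        q.put(list(data[start : start + block_size]))
    for _ in range(n_workers):
        q.put(_DONE)  # one sentinel per worker, queued after every data block

def worker(q: queue.Queue, partial: List[float]) -> None:
    """Worker thread: fold each block into its own [count, sum, sum of squares]."""
    while True:
        block = q.get()
        if block is _DONE:
            return
        partial[0] += len(block)
        partial[1] += sum(block)
        partial[2] += sum(x * x for x in block)

def multicore_mean_var(data: Sequence[float], block_size: int = 4, n_workers: int = 3) -> Tuple[float, float]:
    # Bounded queue: the reader can only run a few blocks ahead of the workers.
    q: queue.Queue = queue.Queue(maxsize=2 * n_workers)
    partials = [[0, 0.0, 0.0] for _ in range(n_workers)]
    workers = [threading.Thread(target=worker, args=(q, p)) for p in partials]
    for t in workers:
        t.start()
    read_thread = threading.Thread(target=reader, args=(data, block_size, q, n_workers))
    read_thread.start()
    read_thread.join()
    for t in workers:
        t.join()
    # Master step: combine the per-thread intermediate results.
    n = sum(p[0] for p in partials)
    s = sum(p[1] for p in partials)
    ss = sum(p[2] for p in partials)
    mean = s / n
    # Textbook sum-of-squares formula: fine for a demo, less stable than merging M2 values.
    variance = (ss - n * mean * mean) / (n - 1)
    return mean, variance

if __name__ == "__main__":
    print(multicore_mean_var([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0], block_size=3, n_workers=2))
```

Each worker writes only to its own `partial` list, so the workers never contend for a shared results object. The shared state is confined to the queue and the final combine step.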
RevoScaleR for Distributed Computing Clusters
[Diagram: a master node (RevoScaleR) connected to four compute nodes (RevoScaleR), each with its own data partition]
• Portions of the data source are made available to each compute node
• RevoScaleR on the master node assigns a task to each compute node
• Each compute node independently processes its data and returns its intermediate results to the master node
• The master node aggregates all of the intermediate results from each compute node and produces the final result
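The master and compute-node exchange above works whenever the intermediate results can be added together. Here is a minimal Python sketch that makes two assumptions: simple linear regression stands in for the analysis, and a thread pool stands in for the cluster's compute nodes. Neither is RevoScaleR's API. Each "node" returns sums over its own partition, and the master adds them and solves the normal equations.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Partition = List[Tuple[float, float]]  # (x, y) rows held by one compute node

def node_task(partition: Partition) -> Tuple[float, float, float, float, float]:
    """Compute node: sufficient statistics for y = a + b*x, using only its own rows."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in partition:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return n, sx, sy, sxx, sxy

def master_fit(partitions: Sequence[Partition]) -> Tuple[float, float]:
    """Master node: assign one task per partition, aggregate, then solve."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        results = list(pool.map(node_task, partitions))
    # Aggregate: element-wise sum of every node's intermediate results.
    n, sx, sy, sxx, sxy = (sum(col) for col in zip(*results))
    # Solve the 2x2 normal equations (X'X) beta = X'y for intercept a and slope b.
    det = n * sxx - sx * sx
    b = (n * sxy - sx * sy) / det
    a = (sy - b * sx) / n
    return a, b

if __name__ == "__main__":
    rows = [(float(x), 1.0 + 2.0 * x) for x in range(10)]
    parts = [rows[0:3], rows[3:5], rows[5:8], rows[8:10]]  # four "compute nodes"
    print(master_fit(parts))  # (1.0, 2.0)
```

Only five numbers per node cross the "network", however many rows each partition holds. That small, fixed-size intermediate result is what makes this kind of algorithm cheap to distribute.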
Platform-agnostic Big Data Analytics
• Set the “compute context” to define the hardware (one line of code)
• The native job scheduler handles distribution, monitoring, failover, etc.
• The same code runs on other supported architectures: just change the compute context
• Supported architectures:
  – Windows: Microsoft HPC Server
  – Linux: Platform Computing LSF (coming 2012)
42 seconds instead of 6 minutes
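The "change one line, keep the analysis code" idea can be sketched in Python by passing the execution backend into the analysis as an object. `LocalContext`, `PoolContext` and `grand_mean` are hypothetical names made up for this illustration. They are not RevoScaleR functions, and a thread pool stands in for a real cluster scheduler.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

class LocalContext:
    """Hypothetical compute context: run every task sequentially in this process."""
    def map(self, fn: Callable, items: Sequence) -> List:
        return [fn(item) for item in items]

class PoolContext:
    """Hypothetical compute context: fan tasks out to a pool of workers."""
    def __init__(self, workers: int) -> None:
        self.workers = workers

    def map(self, fn: Callable, items: Sequence) -> List:
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return list(pool.map(fn, items))

def partial_count_and_sum(partition: Sequence[float]) -> Tuple[int, float]:
    """Per-partition intermediate result."""
    return len(partition), sum(partition)

def grand_mean(partitions: Sequence[Sequence[float]], context) -> float:
    """Analysis code: identical no matter which context executes it."""
    results = context.map(partial_count_and_sum, partitions)
    n = sum(r[0] for r in results)
    total = sum(r[1] for r in results)
    return total / n

partitions = [[1.0, 2.0], [3.0, 4.0, 5.0], [6.0]]
context = LocalContext()             # the one line to change ...
# context = PoolContext(workers=3)   # ... to run the same analysis on another backend
print(grand_mean(partitions, context))  # 3.5
```

The design choice is that the analysis never asks where it is running. Only the object passed as `context` knows how tasks are dispatched.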
R and Hadoop
• Hadoop offers a scalable infrastructure for processing massive amounts of data
  – Storage: HDFS, HBase
  – Distributed computing: MapReduce
• R is a statistical programming language for developing advanced analytic applications
• Currently, writing analytics for Hadoop requires a combination of Java, Pig, Python, …
• The RHadoop project makes it possible to write PEMAs for Hadoop using the R language alone
Massively parallel/distributed analytics: RevoConnectR for Hadoop
[Diagram components: Revolution R Client; R Map or Reduce tasks; Job Tracker; Task Node; HDFS; HBase; Thrift]
Write MapReduce analytics using only R code with these R packages:
• rmr: R and MapReduce
• rhdfs: R and HDFS
• rhbase: R and HBase
More information at: bit.ly/r-hadoop
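With rmr, the map and reduce steps are written as R functions. The underlying pattern can be shown in any language. The Python below simulates map, shuffle (group by key), and reduce in a single process to compute a mean per key. It does not use Hadoop or rmr, and the `(group, value)` record format is an assumption made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, float]  # hypothetical input record: (group, value)

def mapper(record: Record) -> Iterator[Tuple[str, Tuple[int, float]]]:
    """Map: emit (key, (count, sum)) so partial results stay combinable."""
    key, value = record
    yield key, (1, value)

def reducer(key: str, values: Iterable[Tuple[int, float]]) -> Tuple[str, float]:
    """Reduce: add up the partial counts and sums for one key, then take the mean."""
    n = 0
    total = 0.0
    for count, s in values:
        n += count
        total += s
    return key, total / n

def run_mapreduce(records: Iterable[Record]) -> Dict[str, float]:
    shuffled: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for record in records:                  # map phase
        for key, value in mapper(record):
            shuffled[key].append(value)     # shuffle: group emitted values by key
    return dict(reducer(k, v) for k, v in shuffled.items())  # reduce phase

if __name__ == "__main__":
    data = [("a", 1.0), ("b", 10.0), ("a", 3.0), ("b", 20.0), ("c", 5.0)]
    print(run_mapreduce(data))  # {'a': 2.0, 'b': 15.0, 'c': 5.0}
```

Emitting `(count, sum)` pairs instead of raw values keeps the reduce step associative. A combiner on each task node could therefore pre-aggregate before the shuffle, which is the same PEMA idea applied to MapReduce.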
In-Database Execution with IBM Netezza
Revolution R Enterprise
Enterprise Deployment
Revolution R Web Services: RevoDeployR
[Diagram: Data Sources & Creation of Analytics (R / Statistical Modeling Expert) → Revolution “RevoDeployR” → Consumption of Analytics & Results (Data Analysis, Business Intelligence, Interactive Web Apps, Cloud / SaaS); Deployment Expert]
Thank you.
www.revolutionanalytics.com 650.646.9545 Twitter: @RevolutionR
The leading commercial provider of software and support for the popular open source R statistics language.