david-kun
Enterprise R Platform
eRum 2016
Motivation
The “Joel Test” for Data Science
1. Can new hires get set up in the environment to run analyses on their first day?
2. Can data scientists utilize the latest tools/packages without help from IT?
3. Can data scientists use on-demand and scalable compute resources without help from IT/dev ops?
4. Can data scientists find and reproduce past experiments and results, using the original code, data, parameters, and software versions?
5. Does collaboration happen through a system other than email?
6. Can predictive models be deployed to production without custom engineering or infrastructure work?
7. Is there a single place to search for past research and reusable data sets, code, etc.?
8. Do your data scientists use the best tools money can buy?
Source: https://blog.dominodatalab.com/joel-test-data-science/
+1 Can you access your statistics without using R?
Ideal Process
From code to production
[Diagram: SCM, CI, Package Repo, and three Prod environments]
• CI: documentation, unit tests, build
• Upload to repository
• Deploy to separate applications
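The build and upload steps can be sketched with base R alone. This is a minimal sketch, assuming a CRAN-style directory layout (`src/contrib`) for the package repository. The package `demoPkg`, the helper `build_and_publish`, and all paths are hypothetical, and this is not the CI setup used in the talk.

```r
# Sketch: build a source package and publish it to a local CRAN-style repo.
# All names (demoPkg, build_and_publish) are hypothetical illustrations.

# Create a throwaway package skeleton so the example is self-contained.
pkg_dir <- file.path(tempfile("src"), "demoPkg")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE)
writeLines(c(
  "Package: demoPkg",
  "Version: 0.1.0",
  "Title: Demo Package",
  "Description: A throwaway package used to illustrate a build step.",
  "Author: Jane Doe",
  "Maintainer: Jane Doe <jane@example.com>",
  "License: GPL-3"
), file.path(pkg_dir, "DESCRIPTION"))
writeLines("export(hello)", file.path(pkg_dir, "NAMESPACE"))
writeLines("hello <- function() \"hello\"", file.path(pkg_dir, "R", "hello.R"))

build_and_publish <- function(pkg_dir, repo_dir) {
  contrib <- file.path(repo_dir, "src", "contrib")
  dir.create(contrib, recursive = TRUE, showWarnings = FALSE)
  # R CMD build writes the tarball into the current working directory.
  old_wd <- setwd(contrib)
  on.exit(setwd(old_wd))
  status <- system2(file.path(R.home("bin"), "R"),
                    c("CMD", "build", shQuote(pkg_dir)))
  stopifnot(status == 0)
  # Regenerate the PACKAGES index so install.packages() can find the tarball.
  tools::write_PACKAGES(contrib, type = "source")
  list.files(contrib)
}

repo_dir <- tempfile("repo")
published <- build_and_publish(pkg_dir, repo_dir)
print(published)
```

In a CI job, the same function would run after documentation and unit tests have passed, with `repo_dir` pointing at the shared repository instead of a temporary directory.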
laiR
The R package repository
1. Multiple repositories
• Shared and Private repositories
• CRAN & Bioconductor
2. Upload any tar.gz with a DESCRIPTION
3. Search packages
4. Dependency management across all repositories
5. Zero-config installation
6. Maintain multiple versions of the same package
7. For a demo see: https://lair.ownr.io (demo / demo123)
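Because every upload carries a DESCRIPTION file, a repository has the metadata it needs to index versions and dependencies. The following is a minimal sketch using base R's `read.dcf` and `package_version`. The package `riskmodel` and its fields are made up for illustration, and this is not laiR's implementation.

```r
# Sketch: read the metadata a repository could index from a DESCRIPTION file.
# The package "riskmodel" and its fields are made up for illustration.
desc_file <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: riskmodel",
  "Version: 1.10.0",
  "Imports: data.table (>= 1.9.6), Rcpp",
  "Depends: R (>= 3.2.0)"
), desc_file)

# read.dcf returns a character matrix; take the first (only) record.
desc <- read.dcf(desc_file)[1, ]

# Split the comma-separated Imports field and drop version constraints.
imports <- trimws(strsplit(desc[["Imports"]], ",")[[1]])
import_names <- sub("\\s*\\(.*\\)$", "", imports)

# package_version compares component-wise, so 1.10.0 is newer than 1.9.0.
is_newer <- package_version(desc[["Version"]]) > package_version("1.9.0")

print(import_names)
print(is_newer)
```

Comparing versions component-wise rather than as strings matters once a repository keeps several versions of the same package side by side.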
laiR
The “Joel Test” for Data Science
1. Can new hires get set up in the environment to run analyses on their first day?
2. Can data scientists utilize the latest tools/packages without help from IT? ✅
3. Can data scientists use on-demand and scalable compute resources without help from IT/dev ops?
4. Can data scientists find and reproduce past experiments and results, using the original code, data, parameters, and software versions?
5. Does collaboration happen through a system other than email? ✅
6. Can predictive models be deployed to production without custom engineering or infrastructure work?
7. Is there a single place to search for past research and reusable data sets, code, etc.? ✅
8. Do your data scientists use the best tools money can buy?
+1 Can you access your statistics without using R?
roveR
R container management
1. Preconfigured, separated R environments
2. Open-source R package
3. Linking R environments to laiR installations
4. Release packages into laiR
5. Install dependencies with specific versions using the laiR API
6. Install rover using the following command:
install.packages("rover",
                 repos = c("https://lair.functionalfinances.com/repos/cran/",
                           "https://lair.functionalfinances.com/repos/shared/"),
                 method = "curl")
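To avoid repeating the repository URLs on every call, they could be registered once per session through R's standard `repos` option. This is a sketch under that assumption. The URLs are the ones from the slide, the vector names `CRAN` and `shared` are labels chosen here, and the install call is left commented out because it needs network access to that server.

```r
# Sketch: register the laiR repositories once for the session.
# The vector names "CRAN" and "shared" are labels chosen for illustration.
lair_repos <- c(
  CRAN   = "https://lair.functionalfinances.com/repos/cran/",
  shared = "https://lair.functionalfinances.com/repos/shared/"
)
options(repos = lair_repos)

# Later installs pick up both repositories without a repos argument:
# install.packages("rover", method = "curl")
print(getOption("repos"))
```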
roveR + laiR
The “Joel Test” for Data Science
1. Can new hires get set up in the environment to run analyses on their first day? ✅
2. Can data scientists utilize the latest tools/packages without help from IT? ✅
3. Can data scientists use on-demand and scalable compute resources without help from IT/dev ops?
4. Can data scientists find and reproduce past experiments and results, using the original code, data, parameters, and software versions? ✅
5. Does collaboration happen through a system other than email? ✅
6. Can predictive models be deployed to production without custom engineering or infrastructure work? ✅
7. Is there a single place to search for past research and reusable data sets, code, etc.? ✅
8. Do your data scientists use the best tools money can buy?
+1 Can you access your statistics without using R? ✅
exposeR
REST API for R containers
1. Access selected R calculations via REST
2. roveR container management (CRUD)
3. Select functions to expose from packages in a container
4. Allows other applications to integrate R calculations
5. R in a dedicated environment
6. Zero-config installation
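The idea of exposing only selected functions can be illustrated with a small whitelist dispatcher in base R. This is a sketch, not exposeR's implementation: the names `exposed` and `call_exposed` are hypothetical, and a real service would sit behind an HTTP layer that parses the request into a function name and an argument list.

```r
# Sketch: dispatch a request to one of a fixed set of exposed functions.
# Not exposeR's implementation; `exposed` and `call_exposed` are hypothetical.
exposed <- list(
  "stats::median" = stats::median,
  "stats::sd"     = stats::sd
)

call_exposed <- function(name, args) {
  if (!name %in% names(exposed)) {
    stop("function is not exposed: ", name)
  }
  do.call(exposed[[name]], args)
}

# A request for an exposed function is evaluated with the supplied arguments.
result <- call_exposed("stats::median", list(x = c(1, 3, 10)))
print(result)

# Anything outside the whitelist is rejected with an error message.
rejected <- tryCatch(call_exposed("base::system", list("ls")),
                     error = function(e) conditionMessage(e))
print(rejected)
```

Keeping the whitelist explicit means a client can only reach the calculations that were deliberately selected, which is the property the slide describes.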
exposeR + roveR + laiR
The “Joel Test” for Data Science
1. Can new hires get set up in the environment to run analyses on their first day? ✅
2. Can data scientists utilize the latest tools/packages without help from IT? ✅
3. Can data scientists use on-demand and scalable compute resources without help from IT/dev ops?
4. Can data scientists find and reproduce past experiments and results, using the original code, data, parameters, and software versions? ✅
5. Does collaboration happen through a system other than email? ✅
6. Can predictive models be deployed to production without custom engineering or infrastructure work? ✅
7. Is there a single place to search for past research and reusable data sets, code, etc.? ✅
8. Do your data scientists use the best tools money can buy?
+1 Can you access your statistics without using R? ✅
Reference Implementation
The Delta-Lloyd case study
[Figure: R deployment process (Functional Finances Ltd, January 13, 2016), with four stages: Sandbox, Development, Test, and Production. Recoverable labels: R Container and R Container (MPI grid); roveR::create, roveR::install, roveR::release; GIT commit, clone, checkout; Jenkins checkout; test, build, install; laiR release and download; execute via Shiny and CLI.]
Reference Implementation
The “Joel Test” for Data Science
1. Can new hires get set up in the environment to run analyses on their first day? ✅
2. Can data scientists utilize the latest tools/packages without help from IT? ✅
3. Can data scientists use on-demand and scalable compute resources without help from IT/dev ops? ✅
4. Can data scientists find and reproduce past experiments and results, using the original code, data, parameters, and software versions? ✅
5. Does collaboration happen through a system other than email? ✅
6. Can predictive models be deployed to production without custom engineering or infrastructure work? ✅
7. Is there a single place to search for past research and reusable data sets, code, etc.? ✅
8. Do your data scientists use the best tools money can buy? ✅
+1 Can you access your statistics without using R? ✅