20
1 Enterprise R platform Enterprise R Platform eRum 2016

ownR presentation eRum 2016

Embed Size (px)

Citation preview

Page 1: ownR presentation eRum 2016

1Enterprise R platform

Enterprise R Platform

eRum 2016

Page 2: ownR presentation eRum 2016

2Enterprise R platform

Motivation

1. Can new hires get set up in the environment to run analyses on their first day?

2. Can data scientists utilize the latest tools/packages without help from IT?

3. Can data scientists use on-demand and scalable compute resources without help from IT/dev ops?

4. Can data scientists find and reproduce past experiments and results, using the original code, data, parameters, and software versions?

The “Joel Test” for Data Science

Page 3: ownR presentation eRum 2016

3Enterprise R platform

Motivation

5. Does collaboration happen through a system other than email?

6. Can predictive models be deployed to production without custom engineering or infrastructure work?

7. Is there a single place to search for past research and reusable data sets, code, etc.?

8. Do your data scientists use the best tools money can buy?

Source: https://blog.dominodatalab.com/joel-test-data-science/

+1 Can you access your statistics without using R?

The “Joel Test” for Data Science

Page 4: ownR presentation eRum 2016

4Enterprise R platform

Ideal ProcessFrom code to production

SCM Package Repo

ProdProdProd

Page 5: ownR presentation eRum 2016

5Enterprise R platform

Ideal ProcessFrom code to production

SCM Package Repo

ProdProdProdCI

• Documentation• Unit tests• Build

Page 6: ownR presentation eRum 2016

6Enterprise R platform

Ideal ProcessFrom code to production

SCM Package Repo

ProdProdProdCI

• Upload to repository

Page 7: ownR presentation eRum 2016

7Enterprise R platform

Ideal ProcessFrom code to production

SCM Package Repo

ProdProdProdCI

• Deploy to separate applications

Page 8: ownR presentation eRum 2016

8Enterprise R platform

laiR

1. Multiple repositories

• Shared and Private repositories

• CRAN & Bioconductor

2. Upload any tar.gz with a DESCRIPTION

3. Search packages

4. Dependency management across all repositories

5. Zero-config installation

6. Maintain multiple versions of the same package

7. For a demo see: https://lair.ownr.io (demo / demo123)

The R package repository

Page 9: ownR presentation eRum 2016

9Enterprise R platform

laiR

1. Can new hires get set up in the environment to run analyses on their first day?

2. Can data scientists utilize the latest tools/packages without help from IT? ✅

3. Can data scientists use on-demand and scalable compute resources without help from IT/dev ops?

4. Can data scientists find and reproduce past experiments and results, using the original code, data, parameters, and software versions?

The “Joel Test” for Data Science

Page 10: ownR presentation eRum 2016

10Enterprise R platform

laiR

5. Does collaboration happen through a system other than email? ✅

6. Can predictive models be deployed to production without custom engineering or infrastructure work?

7. Is there a single place to search for past research and reusable data sets, code, etc.? ✅

8. Do your data scientists use the best tools money can buy?

+1 Can you access your statistics without using R?

The “Joel Test” for Data Science

Page 11: ownR presentation eRum 2016

11Enterprise R platform

roveR

1. Preconfigured, separated R environments

2. Open-source R package

3. Linking R environments to laiR installations

4. Release packages into laiR

5. Install dependencies with specific versions using laiR API

6. Install rover using the following command:

R container management

install.packages("rover", repos = c("https://lair.functionalfinances.com/repos/cran/",

"https://lair.functionalfinances.com/repos/shared/"),method = "curl")

Page 12: ownR presentation eRum 2016

12Enterprise R platform

roveR + laiR

1. Can new hires get set up in the environment to run analyses on their first day? ✅

2. Can data scientists utilize the latest tools/packages without help from IT? ✅

3. Can data scientists use on-demand and scalable compute resources without help from IT/dev ops?

4. Can data scientists find and reproduce past experiments and results, using the original code, data, parameters, and software versions? ✅

The “Joel Test” for Data Science

Page 13: ownR presentation eRum 2016

13Enterprise R platform

roveR + laiR

5. Does collaboration happen through a system other than email? ✅

6. Can predictive models be deployed to production without custom engineering or infrastructure work? ✅

7. Is there a single place to search for past research and reusable data sets, code, etc.? ✅

8. Do your data scientists use the best tools money can buy?

+1 Can you access your statistics without using R? ✅

The “Joel Test” for Data Science

Page 14: ownR presentation eRum 2016

14Enterprise R platform

exposeR

1. Access selected R calculations via REST

2. roveR container management - CRUD

3. Select functions to expose from packages in container

4. Allows other application to integrate R calculation

5. R in a dedicated environment

6. Zero-config installation

REST API for R containers

Page 15: ownR presentation eRum 2016

15Enterprise R platform

exposeR + roveR + laiR

1. Can new hires get set up in the environment to run analyses on their first day? ✅

2. Can data scientists utilize the latest tools/packages without help from IT? ✅

3. Can data scientists use on-demand and scalable compute resources without help from IT/dev ops?

4. Can data scientists find and reproduce past experiments and results, using the original code, data, parameters, and software versions? ✅

The “Joel Test” for Data Science

Page 16: ownR presentation eRum 2016

16Enterprise R platform

exposeR + roveR + laiR

5. Does collaboration happen through a system other than email? ✅

6. Can predictive models be deployed to production without custom engineering or infrastructure work? ✅

7. Is there a single place to search for past research and reusable data sets, code, etc.? ✅

8. Do your data scientists use the best tools money can buy?

+1 Can you access your statistics without using R? ✅

The “Joel Test” for Data Science

Page 17: ownR presentation eRum 2016

17Enterprise R platform

Reference ImplementationThe Delta-Lloyd case study

Dev

elop

men

t

R DEPLOYMENT PROCESS Functional Finances Ltd | January 13, 2016

R ContainerroveR::create

GITcommit

test, build, install

Jenkinscheckoutcheckout

test, build, install

laiR

roveR::release

roveR::install

Development Start

R Container

roveR::install

roveR::createTest Start Jenkins

Shiny

CLI

execute

execute

download

release

laiR

R Container(MPI grid)

roveR::install

roveR::createProduction Start

Shiny

CLI

execute

execute

Test

Prod

uctio

n

roveR::install

roveR::install

Sand

box

R Container(MPI grid)

roveR::createGITcommit

test, build, install

clone

checkoutModel Development

Start

Page 18: ownR presentation eRum 2016

18Enterprise R platform

Reference Implementation

1. Can new hires get set up in the environment to run analyses on their first day? ✅

2. Can data scientists utilize the latest tools/packages without help from IT? ✅

3. Can data scientists use on-demand and scalable compute resources without help from IT/dev ops? ✅

4. Can data scientists find and reproduce past experiments and results, using the original code, data, parameters, and software versions? ✅

The “Joel Test” for Data Science

Page 19: ownR presentation eRum 2016

19Enterprise R platform

Reference Implementation

5. Does collaboration happen through a system other than email? ✅

6. Can predictive models be deployed to production without custom engineering or infrastructure work? ✅

7. Is there a single place to search for past research and reusable data sets, code, etc.? ✅

8. Do your data scientists use the best tools money can buy? ✅

+1 Can you access your statistics without using R? ✅

The “Joel Test” for Data Science

Page 20: ownR presentation eRum 2016

20Enterprise R platform

Questions?

[email protected]