
(HLS201) Using AWS and Data Science to Analyze Vaccine Yield | AWS re:Invent 2014


DESCRIPTION

Producing vaccines is a significant and complex effort that spans manufacturing, biological materials, streaming data, and difficult computational problems. In this session, speakers from Merck and Booz Allen Hamilton discuss how they partnered to use AWS and data science techniques to pioneer new approaches for analyzing vaccine production yields. Their solution combines a shared data lake service built on AWS services (such as Amazon EC2 and Amazon VPC) with Hadoop MapReduce, HDFS, Hive, and R, which together provide the data science infrastructure and the analyses that modeled complex biological processes. As a result of this project, Merck analyzed 12 years of vaccine manufacturing data from 16 data sources, conducted more than 15 billion calculations, and received the InformationWeek Elite Business Innovation Award for its innovative application of data science to improving vaccine yield rates and saving lives.


Page 1: (HLS201) Using AWS and Data Science to Analyze Vaccine Yield | AWS re:Invent 2014

11.018.14

Brian Keller, Data Science Lead, Booz Allen Hamilton

Jerry Megaro, Director, Advanced Analytics and Innovation, Merck Manufacturing

Nic Perez, Cloud Architecture Lead, Booz Allen Hamilton

Making a difference with data

Page 2

- George W. Merck (1950)

Page 4

The Rotavirus Vaccine Disconnect

[Map: rotavirus deaths worldwide (each • = 1,000 deaths), with * marking sales of RotaTeq®]

5.6 billion people in the world do not have access to our products.

90% of RotaTeq sales are in the USA.

1. Broor S, Ghosh D, Mathur P. Molecular epidemiology of rotaviruses in India. Indian J Med Res. 2003;118:59-67.

Page 5

BUSINESS KNOWLEDGE

Page 6

Parametric models → Let the data tell the story

Input/output modeling → Data experiments to enable discovery

Avoid failure → Failure is powerful… learn fast and adjust

Narrow scope of analysis → Ask bigger questions using atypical data

Page 7

[Diagram: layered data science reference architecture]

Human Insight + Actions

Visualization, Reporting, Dashboards, and Query Interface

Services (SOA)

Analytics and Discovery: Machine Learning, Free-Computation, Alerting, Geographic, Language Translation, Entity Relationship, Event Grab, Streaming Analytics

Views and Indexes: Dense/Sparse, Streaming indexes

Data Management: HDFS/Data Lake, Metadata Tagging

Data Sources: Structured, Unstructured, Streaming

Infrastructure/Management: Provisioning, Deployment, Monitoring, Workflow
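The diagram places metadata tagging over the HDFS data lake, with views and indexes built on top, but the slide does not describe the mechanism. Below is a minimal sketch, assuming raw records are stored unchanged alongside a set of metadata tags, and that a "view" is simply the set of records carrying every requested tag. The `DataLake` class, the tag names, and the records are hypothetical; an in-memory dictionary stands in for HDFS.

```python
from collections import defaultdict
from typing import Dict, List, Set


class DataLake:
    """Hypothetical in-memory stand-in for a tagged data lake.

    Raw records are kept as-is; an inverted index maps each metadata
    tag to the IDs of the records that carry it.
    """

    def __init__(self) -> None:
        self.records: Dict[int, dict] = {}
        self.tag_index: Dict[str, Set[int]] = defaultdict(set)

    def ingest(self, record: dict, tags: Set[str]) -> int:
        """Store a raw record and index it under each of its tags."""
        record_id = len(self.records)
        self.records[record_id] = record
        for tag in tags:
            self.tag_index[tag].add(record_id)
        return record_id

    def view(self, *tags: str) -> List[dict]:
        """Return records carrying every requested tag, in ingest order."""
        if not tags:
            return [self.records[i] for i in sorted(self.records)]
        matching = set.intersection(*(self.tag_index.get(t, set()) for t in tags))
        return [self.records[i] for i in sorted(matching)]


# Hypothetical records from two sources, tagged by source and batch.
lake = DataLake()
lake.ingest({"batch": "B1", "temp_c": 37.0}, {"source:bioreactor", "batch:B1"})
lake.ingest({"batch": "B1", "titer": 5.2}, {"source:lab_assay", "batch:B1"})
lake.ingest({"batch": "B2", "temp_c": 36.5}, {"source:bioreactor", "batch:B2"})
print(lake.view("batch:B1"))
print(lake.view("source:bioreactor", "batch:B2"))
```

Keeping records raw and pushing structure into tags means a new view (for example, all bioreactor readings for one batch) can be defined without re-ingesting data, which is the general appeal of the data lake pattern.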

Page 9

Resulted in…

Page 10

Resulted in…

Page 11

Resulted in…

Winner of the InformationWeek Elite Business Innovation Award

Page 14

Similarity Matrix

[Figure: pairwise similarity matrix for Batches 1 to 5, with both axes ordered by increasing yield; color scale shows similarity score from low to high]

One highlighted region: clustering in this region indicates that parameter similarity is associated with high yield.

Another highlighted region: clustering in this region indicates that parameter similarity is associated with low yield.
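The slide does not say how batch similarity was scored. Below is a minimal sketch, assuming each batch is summarized as a vector of process parameters, each parameter is z-scored across batches so that units do not dominate, and batches are compared with cosine similarity. The parameter names, values, and yields are hypothetical, and cosine similarity is a swapped-in choice, not necessarily the measure the speakers used.

```python
import math
from typing import Dict, List, Tuple


def standardize(params: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Z-score each process parameter across batches so that units
    (degrees, pH, hours) do not dominate the comparison."""
    batches = list(params)
    n_params = len(params[batches[0]])
    scaled: Dict[str, List[float]] = {b: [] for b in batches}
    for k in range(n_params):
        column = [params[b][k] for b in batches]
        mean = sum(column) / len(column)
        sd = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
        for b in batches:
            scaled[b].append((params[b][k] - mean) / sd if sd > 0 else 0.0)
    return scaled


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def similarity_matrix(
    params: Dict[str, List[float]], yields: Dict[str, float]
) -> Tuple[List[str], List[List[float]]]:
    """Order batches by increasing yield and return the pairwise
    similarity matrix in that order (rows and columns follow `order`)."""
    scaled = standardize(params)
    order = sorted(params, key=lambda b: yields[b])
    matrix = [[cosine(scaled[i], scaled[j]) for j in order] for i in order]
    return order, matrix


# Hypothetical parameters per batch: [temperature (C), pH, duration (h)]
params = {
    "Batch 1": [37.0, 7.1, 48.0],
    "Batch 2": [36.5, 7.0, 50.0],
    "Batch 3": [37.2, 7.2, 47.0],
    "Batch 4": [38.0, 7.4, 44.0],
    "Batch 5": [37.8, 7.3, 45.0],
}
# Hypothetical yields (fraction of a target titer).
yields = {"Batch 1": 0.62, "Batch 2": 0.55, "Batch 3": 0.70, "Batch 4": 0.91, "Batch 5": 0.85}

order, matrix = similarity_matrix(params, yields)
for name, row in zip(order, matrix):
    print(name, " ".join(f"{s:+.2f}" for s in row))
```

With rows and columns ordered by yield, a block of high similarity in the high-yield corner is the kind of pattern the slide's annotations point to; with real data, the number of pairwise comparisons grows quadratically in the number of batches and linearly in the number of parameters.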

Page 15

Lots of Data Experiments (and Failures) That Lead to the Final Predictive Model…

[Figure panels labeled "moderate similarity" and "high similarity"]

Page 16

Business Decision Makers, Researchers, External Partners

Page 17

[Diagram: solution architecture on AWS]

Business Users; Data Scientists

Angular, D3.js Web UI: Insights, Accelerated Reasoning

JAX-RS/Tomcat-based REST services on AWS Elastic Beanstalk

Hadoop, Solr search solution; cell-level visibility and life science informatics via custom Solr plug-ins

Amazon EC2; Amazon Elastic MapReduce; flexible data processing pipelines

Redshift-based data marts

AES-encrypted S3 data lake

Legacy enterprise RDS

Security: VPC; Enterprise Active Directory
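The architecture runs its data processing pipelines on Amazon EC2 and Elastic MapReduce, but the slide does not show a job. Below is a minimal sketch of the map, shuffle/sort, and reduce steps in the style of Hadoop Streaming, assuming input lines of the hypothetical form `batch,parameter,value` and a job that averages each parameter per batch. The function names and input format are hypothetical; `run_local` only simulates on one machine the sort that Hadoop performs between the two stages.

```python
from itertools import groupby
from typing import Iterable, Iterator, List


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'batch:parameter<TAB>value' for each 'batch,parameter,value' line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        batch, parameter, value = line.split(",")
        yield f"{batch}:{parameter}\t{float(value)}"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Average the values for each key. Input must arrive sorted by key,
    which Hadoop's shuffle/sort phase guarantees between map and reduce."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        values = [float(v) for _, v in group]
        yield f"{key}\t{sum(values) / len(values)}"


def run_local(lines: Iterable[str]) -> List[str]:
    """Simulate map -> shuffle/sort -> reduce in a single process."""
    return list(reducer(sorted(mapper(lines))))


# Hypothetical sensor readings.
readings = ["B1,temp_c,37.0", "B2,temp_c,36.5", "B1,temp_c,37.4", "B2,ph,7.0"]
for out in run_local(readings):
    print(out)
```

Under Hadoop Streaming, the mapper and reducer would each run as a separate process reading lines from standard input and writing tab-separated key/value lines to standard output; keeping them as pure functions over iterables, as here, makes the logic testable without a cluster.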

Page 19

Reference Architecture – Privileged Identity Management

Page 21

Reference Architecture – Identity Analytics

Page 22

– Monitor, identify, and alert on abnormal user activity

– Govern administrative rights; policy-based enforcement

– Hardened virtual appliance; do not allow direct RDP/SSH access to management/security appliances

– Identity Analytics (IA) has visibility into every log (firewall/router logs, crypto logs, application logs, systemd logs, OS logs, SCCM, etc.)
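The slide lists what Identity Analytics should do but not how abnormal activity is detected. Below is a minimal sketch, assuming per-user daily event counts have already been extracted from the logs named above, that flags a user whose count today sits more than a chosen number of standard deviations above that user's own history. The per-user z-score test is a swapped-in technique, not necessarily what the speakers used, and the function name, threshold, and users are hypothetical.

```python
import statistics
from typing import Dict, List


def flag_abnormal_activity(
    history: Dict[str, List[int]], today: Dict[str, int], threshold: float = 3.0
) -> List[str]:
    """Return users whose event count today is more than `threshold`
    standard deviations above their own historical mean."""
    alerts = []
    for user, counts in history.items():
        if len(counts) < 2 or user not in today:
            continue  # not enough baseline, or no activity today
        mean = statistics.mean(counts)
        sd = statistics.stdev(counts)
        if sd == 0:
            # Perfectly flat baseline: treat any increase as abnormal.
            if today[user] > mean:
                alerts.append(user)
            continue
        if (today[user] - mean) / sd > threshold:
            alerts.append(user)
    return sorted(alerts)


# Hypothetical daily event counts per account over the past five days.
history = {
    "alice": [10, 12, 11, 9, 13],
    "bob": [40, 38, 42, 41, 39],
    "svc_backup": [5, 5, 5, 5, 5],
}
today = {"alice": 11, "bob": 120, "svc_backup": 5}
print(flag_abnormal_activity(history, today))
```

Comparing each user against their own baseline, rather than a global average, keeps naturally busy service accounts from drowning out a quiet account that suddenly spikes; accounts with no history are skipped here and would need a separate rule.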

Page 23

Reference Architecture – Cryptography

Page 25

exploredatascience.com

github.com/booz-allen-hamilton

Page 26

http://bit.ly/awsevals