Deploying Data Science with Docker and AWS



  • Deploying Data Science with Docker and AWS

    Audience: Cambridge AWS Meetup Group

    Presenter: Matt McDonnell, Data Scientist at Metail

    Date: 9th June 2016

  • Context

    Lots of event stream data

    Many AWS components

    Outputs:

    - Business Intelligence
    - Bespoke Analysis
    - Productionised Science

  • What? Goal: moving laptop analyses onto a server

    Turn:

    an analysis script that retrieves data from a DB, Looker, the web, etc.,

    runs the analysis, and

    outputs results as CSV, PNG, etc. to the local hard disk

    Into:

    an automated process running on a server
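    The laptop-style script described above can be sketched as follows. This is a minimal illustration, not Metail's actual code: `fetch_data` stands in for a real query against a database, Looker, or a web API, and the aggregation is deliberately trivial.

```python
import csv
import tempfile
from pathlib import Path

def fetch_data():
    """Stand-in for retrieving data from a DB, Looker, or the web."""
    return [{"date": "2016-06-01", "sessions": 120},
            {"date": "2016-06-02", "sessions": 145}]

def run_analysis(rows):
    """Trivial aggregation standing in for the real analysis."""
    return {"total_sessions": sum(r["sessions"] for r in rows),
            "days": len(rows)}

def write_results(summary, out_dir):
    """Write the summary as a one-row CSV to the local disk."""
    out_path = Path(out_dir) / "results.csv"
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=sorted(summary))
        writer.writeheader()
        writer.writerow(summary)
    return out_path

if __name__ == "__main__":
    path = write_results(run_analysis(fetch_data()), tempfile.gettempdir())
    print(f"Wrote {path}")
```

    Once containerised, exactly this script runs unchanged on a server.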

  • Why? Production scheduled tasks, e.g. daily Firm Wide Metrics processing

    Make use of more powerful Amazon Web Services (AWS) cloud resources for large scale analysis

    Ease of deployment for Data Science analysts

    Build consistent development environment

  • How?

    Containerize applications and runtime using Docker to produce images

    Store images on AWS Elastic Container Registry (ECR)

    Run images either locally, or Amazon Elastic Container Service (ECS)

    Use AWS Lambda functions to trigger scheduled tasks (or react to events)
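    The build-and-push half of that workflow might look like the sketch below (the repository name `py-analysis`, account ID, and region are placeholders; `aws ecr get-login` was the standard authentication helper at the time).

```shell
# Build the image locally from the project Dockerfile
docker build -t py-analysis .

# Authenticate Docker to ECR (emits and runs a docker login command)
$(aws ecr get-login --region eu-west-1)

# Tag and push to the private ECR repository
docker tag py-analysis 123456789012.dkr.ecr.eu-west-1.amazonaws.com/py-analysis:latest
docker push 123456789012.dkr.ecr.eu-west-1.amazonaws.com/py-analysis:latest
```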

  • What is Docker?

    "Docker containers wrap up a piece of software in a complete filesystem that contains everything it needs to run: code, runtime, system tools, system libraries, anything you can install on a server. This guarantees that it will always run the same, regardless of the environment it is running in." -- docker.com

    Public code: store Dockerfile on GitHub, use Travis to automatically build image on DockerHub

    Private code: private Dockerfile, build locally, push image to AWS Elastic Container Registry

  • Example application: retrieve market data

    PyAnalysis: application code built on the PCR image

    PCR: Python Component Runtime Base Docker image
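    A Dockerfile for such an application could look like this sketch. The base image tag `pcr:latest` and file names are hypothetical stand-ins for the PCR base image and the real application code.

```dockerfile
# Hypothetical PCR (Python Component Runtime) base image
FROM pcr:latest

# Install the analysis dependencies
COPY requirements.txt /app/
RUN pip install -r /app/requirements.txt

# Add the application code and set the entry point
COPY analysis.py /app/
WORKDIR /app
CMD ["python", "analysis.py"]
```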

  • Where? Amazon Web Services Cloud

    Elastic Container Service (ECS): defines the task that runs the container, and runs tasks on a cluster of EC2 nodes

    EC2 instance set up to act as a node; needs to be an AWS ECS-optimized AMI

    Needs an IAM Role that has the AmazonEC2ContainerServiceforEC2Role policy attached, plus policies allowing access to any AWS resources needed, e.g. S3

    Lambda function to trigger the ECS task: a cron equivalent using CloudWatch scheduled events

  • EC2 Instance Security Group

    The EC2 instance used by ECS can be locked down: there is no need to SSH in to it, so no inbound ports are needed

  • EC2 Instance AMI

    Use the latest available Amazon ECS-Optimized AMI: it has Docker and the ECS Container Agent already installed
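    One way to find the latest ECS-optimized AMI from the CLI is to filter Amazon-owned images by name and sort by creation date; this is a sketch, and the name pattern matches the Amazon Linux variants current at the time.

```shell
aws ec2 describe-images --owners amazon \
  --filters "Name=name,Values=amzn-ami-*-amazon-ecs-optimized" \
  --query 'sort_by(Images, &CreationDate)[-1].ImageId'
```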

  • EC2 Instance Details

    Enable Auto-assign Public IP so ECS can connect and assign a custom IAM Role as a hook for access permissions

  • EC2 Instance IAM Role

    Attach the AmazonEC2ContainerServiceforEC2Role Policy and any extra access Policies for containers on the instance

  • ECS Task

    ECS task retrieves image and runs it
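    A minimal task definition for the example might look like the JSON below, registered with `aws ecs register-task-definition`. The family name, image URI, and resource limits are illustrative placeholders.

```json
{
  "family": "py-analysis",
  "containerDefinitions": [
    {
      "name": "py-analysis",
      "image": "123456789012.dkr.ecr.eu-west-1.amazonaws.com/py-analysis:latest",
      "memory": 512,
      "cpu": 256,
      "essential": true
    }
  ]
}
```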

  • Lambda function

    Use the lambda-canary blueprint as a basis for cron job equivalents

  • Lambda function

    cron job equivalent via CloudWatch scheduled event
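    The scheduled event can be set up from the CLI roughly as follows; the rule name, schedule, and Lambda ARN are placeholders (the Lambda function also needs a resource permission allowing CloudWatch Events to invoke it).

```shell
# Rule fires daily at 06:00 UTC (schedule expression is an example)
aws events put-rule --name daily-analysis \
  --schedule-expression "cron(0 6 * * ? *)"

# Point the rule at the Lambda function (ARN is a placeholder)
aws events put-targets --rule daily-analysis \
  --targets Id=1,Arn=arn:aws:lambda:eu-west-1:123456789012:function:run-analysis-task
```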

  • Lambda Function

    Simple Lambda function to run task on ECS
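    Such a handler can be sketched in a few lines of boto3. The cluster and task definition names are placeholders, and the parameter-building is split into a pure function so it can be exercised without AWS credentials.

```python
def build_run_task_params(cluster, task_definition, count=1):
    """Assemble keyword arguments for ecs.run_task; kept pure for easy testing."""
    return {"cluster": cluster, "taskDefinition": task_definition, "count": count}

def lambda_handler(event, context):
    """Entry point invoked by the CloudWatch scheduled event."""
    import boto3  # imported here so the module loads without AWS credentials
    ecs = boto3.client("ecs")
    params = build_run_task_params(cluster="default", task_definition="py-analysis")
    response = ecs.run_task(**params)
    return {"tasks_started": len(response.get("tasks", []))}
```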

  • Lambda function IAM role

    AWS will create a default IAM Role for the Lambda function; the ecs:RunTask action needs to be added so the function can run the container
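    The extra permission is a small policy statement attached to the Lambda function's role, along these lines (scoping `Resource` to specific task definitions is tighter than the wildcard shown here):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["ecs:RunTask"],
      "Resource": "*"
    }
  ]
}
```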

  • Demo / Q&A

    Blog posts

    Scheduled Downloads using AWS EC2 and Docker, on Medium (me)

    Better Together: Amazon ECS and AWS Lambda (not me)

    Code samples

    Docker images




    Twitter @mattmcd

    Email or