© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data analytics and Deep Learning on AWS with Jupyter and MXNet Amazon Web Services Adrian White, Research & Technical Computing July 20, 2017


Data analytics and Deep Learning on AWS with Jupyter and MXNet
Amazon Web Services

Adrian White, Research & Technical Computing
July 20, 2017

Agenda

1:30 Introduction to AWS (20 mins)
1:50 Demo: Alces Flight & Landsat-8 (15 mins)
2:05 Lab: Jupyter Notebooks on AWS (50 mins)
3:00 Finish

Why use AWS for Research?

• Time to Science: Access research infrastructure in minutes
• Low Cost: Pay-as-you-go pricing
• Elastic: Easily add or remove capacity
• Globally Accessible: Easily collaborate with researchers around the world
• Secure: A collection of tools to protect data and privacy
• Scalable: Access to effectively limitless capacity

AWS Global Infrastructure

• 16 Regions
• 43 Availability Zones
• 70 Edge Locations

[Diagram: the AWS platform, grouped by layer]

• INFRASTRUCTURE: Regions, Availability Zones, Points of Presence
• CORE SERVICES: Compute (VMs, Auto-scaling, & Load Balancing); Storage (Object, Blocks, Archival, Import/Export); Databases (Relational, NoSQL, Caching, Migration); Networking (VPC, DX, DNS, CDN)
• SECURITY & COMPLIANCE: Access Control, Identity Management, Key Management & Storage, Monitoring & Logs, Assessment and reporting, Resource & Usage Auditing, Configuration Compliance, Web application firewall
• HYBRID ARCHITECTURE: Data Backups, Integrated App Deployments, Direct Connect, Identity Federation, Integrated Resource Management, Integrated Networking
• ANALYTICS: Data Warehousing, Hadoop/Spark, Streaming Data Collection, Streaming Data Analysis, Machine Learning, Elastic Search, Business Intelligence
• APP SERVICES: Queuing & Notifications, Workflow, Search, Email, Transcoding
• MOBILE SERVICES: API Gateway, Identity, Sync, Single Integrated Console, Push Notifications, Mobile Analytics
• DEVELOPMENT & OPERATIONS: One-click App Deployment, DevOps Resource Management, Application Lifecycle Management, Containers, Triggers, Resource Templates
• IoT: Rules Engine, Device Shadows, Device SDKs, Registry, Device Gateway
• ENTERPRISE APPS: Virtual Desktops, Sharing & Collaboration, Corporate Email, Backup
• MARKETPLACE: Business Apps, Business Intelligence, Databases, DevOps Tools, Networking, Security, Storage
• TECHNICAL & BUSINESS SUPPORT: Account Management, Support, Professional Services, Training & Certification, Security & Pricing Reports, Partner Ecosystem, Solutions Architects

More on AWS Instance Types

Broad Set of Compute Instance Types for HPC and Deep Learning

• General purpose: M4, M3
• Compute optimized: C5, C4, C3
• Storage and I/O optimized: I3, I2, D2, HS1
• GPU or FPGA enabled: P2, G2, F1
• Memory optimized: X1, R4, R3

Instances Sizes: R4 Example M4

GPU and FPGA Instances

P2: GPU instance
• Up to 16 NVIDIA GK210 (8 x K80) GPUs in a single instance, with peer-to-peer PCIe GPU interconnect
• Supporting a wide variety of use cases including deep learning, HPC simulations, financial computing, and batch rendering

F1: FPGA instance
• Up to 8 Xilinx Virtex® UltraScale+™ VU9P FPGAs in a single instance, with peer-to-peer PCIe and bidirectional ring interconnects
• Designed for hardware-accelerated applications including financial computing, genomics, accelerated search, and image processing

P2 GPU Instances

• Up to 16 K80 GPUs in a single instance
• Including peer-to-peer PCIe GPU interconnect
• Supporting a wide variety of use cases including deep learning, HPC simulations, and batch rendering

Instance Size   GPUs   GPU Peer to Peer   vCPUs   Memory (GiB)   Network Bandwidth*
p2.xlarge       1      -                  4       61             1.25 Gbps
p2.8xlarge      8      Y                  32      488            10 Gbps
p2.16xlarge     16     Y                  64      732            20 Gbps

*In a placement group
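The P2 size table can be turned into a small lookup when sizing a job. The following is a minimal sketch, not an AWS API: the `P2Size` type and the `smallest_p2_for` helper are hypothetical names introduced for illustration, and the figures are copied from the table above.

```python
from typing import NamedTuple, Optional


class P2Size(NamedTuple):
    name: str
    gpus: int
    vcpus: int
    memory_gib: int


# Values copied from the P2 table on the slide, ordered smallest to largest.
P2_SIZES = [
    P2Size("p2.xlarge", 1, 4, 61),
    P2Size("p2.8xlarge", 8, 32, 488),
    P2Size("p2.16xlarge", 16, 64, 732),
]


def smallest_p2_for(gpus_needed: int, memory_gib_needed: int = 0) -> Optional[P2Size]:
    """Return the smallest P2 size that meets both requirements, or None."""
    for size in P2_SIZES:
        if size.gpus >= gpus_needed and size.memory_gib >= memory_gib_needed:
            return size
    return None
```

For example, a job that needs 4 GPUs does not fit on `p2.xlarge`, so the helper returns the `p2.8xlarge` entry; a request for more than 16 GPUs returns `None`, which signals that a multi-instance cluster is needed.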

F1 FPGA Instances

• Up to 8 Xilinx Virtex UltraScale Plus VU9P FPGAs in a single instance with four high-speed DDR-4 per FPGA
• Largest size includes high performance FPGA interconnects via PCIe Gen3 (FPGA Direct), and bidirectional ring (FPGA Link)
• Designed for hardware-accelerated applications including financial computing, genomics, accelerated search, and image processing

Instance Size   FPGAs   FPGA Link   FPGA Direct   vCPUs   Memory (GiB)   NVMe Instance Storage   Network Bandwidth*
f1.2xlarge      1       -           -             8       122            1 x 480                 5 Gbps
f1.16xlarge     8       Y           Y             64      976            4 x 960                 30 Gbps

*In a placement group

A GPU is effective at processing the same set of operations in parallel – single instruction, multiple data (SIMD). A GPU has a well-defined instruction-set, and fixed word sizes – for example single, double, or half-precision integer and floating point values.

An FPGA is effective at processing the same or different operations in parallel – multiple instructions, multiple data (MIMD). An FPGA does not have a predefined instruction-set, or a fixed data width.

Parallel Processing in GPUs and FPGAs

[Figure: block diagrams comparing a CPU (one core, with control logic, ALUs, cache, and DRAM), a GPU (with DRAM), and an FPGA (with block RAM and DRAM).]

Each GPU in P2 has 2880 of these cores.

Each FPGA in F1 has more than 2M of these cells.
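The SIMD and MIMD distinction described above can be sketched in a few lines. This is only a model of the two programming styles: plain Python runs the operations one after another, whereas a GPU or FPGA would run them in parallel. The function names `simd` and `mimd` are hypothetical.

```python
from typing import Callable, List, Sequence


def simd(op: Callable[[float], float], lanes: Sequence[float]) -> List[float]:
    """SIMD style: one operation is applied to every data element."""
    return [op(x) for x in lanes]


def mimd(ops: Sequence[Callable[[float], float]], data: Sequence[float]) -> List[float]:
    """MIMD style: each unit applies its own operation to its own element."""
    if len(ops) != len(data):
        raise ValueError("need exactly one operation per data element")
    return [op(x) for op, x in zip(ops, data)]
```

Calling `simd(lambda x: x * 2, [1, 2, 3])` doubles every element with the same instruction, while `mimd([lambda x: x + 1, lambda x: x * 10], [1, 2])` gives each element a different one.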

But it’s not about servers…

Physical Virtualisation Containerization Serverless

Evolving Compute Abstractions

AWS Lambda

AWS Lambda – How it Works

Bring your own code
• Node.js, Java, Python
• Java = Any JVM based language such as Scala, Clojure, etc.
• Bring your own libraries

Flexible invocation paths
• Event or RequestResponse invoke options
• Existing integrations with various AWS services

Simple resource model
• Select memory from 128MB to 1.5GB in 64MB steps
• CPU & Network allocated proportionately to RAM
• Reports actual usage

Fine grained permissions
• Uses IAM role for Lambda execution permissions
• Uses Resource policy for AWS event sources
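As a rough illustration of "bring your own code" and the memory model, here is a minimal Python handler together with the memory settings listed on the slide. The `(event, context)` signature is the one Lambda uses for Python functions; the `numbers` key in the event and the `is_valid_memory` helper are hypothetical, and the memory limits are those stated on the slide, which may differ from current limits.

```python
import json

# Memory settings as described on the slide: 128 MB to 1.5 GB (1536 MB) in 64 MB steps.
VALID_MEMORY_MB = list(range(128, 1536 + 1, 64))


def is_valid_memory(memory_mb: int) -> bool:
    """Check a memory setting against the range and step size from the slide."""
    return memory_mb in VALID_MEMORY_MB


def lambda_handler(event, context):
    """Entry point that Lambda calls once per invocation.

    `event` carries the invocation payload; the "numbers" key is a
    made-up example field, not part of any AWS event format.
    """
    numbers = event.get("numbers", [])
    return {"statusCode": 200, "body": json.dumps({"sum": sum(numbers)})}
```

The handler can be exercised locally by calling `lambda_handler({"numbers": [1, 2, 3]}, None)` before it is deployed.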

Lambda in the context of Grid Computing

Source: “Occupy the Cloud: Distributed Computing for the 99%”, https://arxiv.org/pdf/1702.04024.pdf
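The pattern behind using Lambda for grid-style work is to map a stateless function over many inputs, with one invocation per input. The sketch below shows that fan-out locally, with a thread pool standing in for Lambda. It is not the API of the cited paper or of any AWS service, and `simulate_task` and `fan_out` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate_task(seed: int) -> int:
    """Hypothetical stateless task: the result depends only on its input."""
    return sum(i * i for i in range(seed))


def fan_out(func, inputs, max_workers=4):
    """Run func over all inputs concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, inputs))
```

Because each task is independent, the same `fan_out(simulate_task, range(1000))` call shape scales from a laptop to thousands of concurrent function invocations without changing the task itself.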

Batch & HPC

On-demand, Auto Scaling Clusters On AWS

CfnCluster AWS Batch

AWS Batch automatically provisions compute resources tailored to the needs of your jobs using Amazon EC2 and EC2 Spot

Alces Flight is available in the AWS Marketplace and bundles 1000+ commonly used applications: https://aws.amazon.com/marketplace/

CfnCluster is provided by AWS to quickly provision configurable clusters and grid computing environments.
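For AWS Batch, submitting work comes down to naming a job, a queue, and a job definition. The sketch below builds the arguments for the `submit_job` call in boto3 (the AWS SDK for Python) and keeps the actual submission in a separate function, since that step needs boto3 and AWS credentials. The job, queue, and definition names in the usage note are hypothetical, and both the queue and the job definition are assumed to exist already.

```python
def build_batch_job_request(job_name, job_queue, job_definition, command):
    """Build keyword arguments for the AWS Batch SubmitJob call."""
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        # Override the container command defined in the job definition.
        "containerOverrides": {"command": list(command)},
    }


def submit(request):
    """Submit the job and return its ID. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed

    batch = boto3.client("batch")
    return batch.submit_job(**request)["jobId"]
```

A call such as `submit(build_batch_job_request("ndvi-scene-1", "research-queue", "ndvi-job", ["python", "ndvi.py"]))` would queue one job; Batch then provisions EC2 or Spot capacity to run it.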

Alces Flight: Personal on-demand HPC

1000+ popular scientific applications

• Pre-installed

• Multiple versions, complete with libraries and various compiler optimizations, ready to run

Available via the AWS Marketplace (the cloud’s “App Store”)

http://alces-flight.com/ for more information

Self-scaling HPC clusters are instantly ready to compute, are billed by the hour, and use the AWS Spot market by default, which keeps costs low.

Flight is accessible

All the traditional command-line tools will be familiar, but you can also create an Alces “session” and immediately launch a desktop view of your cluster to run graphical apps.

Command Line (ssh) Graphical Console

Demo: Alces Flight & Landsat on AWS

Data Analytics


Evolution of Data Analytics

Batch → Real time → Prediction

Services shown: Amazon Kinesis, Amazon Redshift, AWS Batch, Amazon EMR, AWS IoT, Amazon SNS, Amazon Kinesis Analytics, Amazon Machine Learning, Amazon Rekognition

Use an optimal combination of highly interoperable services

• Amazon Redshift: Data Warehouse
• Amazon Elastic MapReduce: Semi-structured
• Amazon Simple Storage Service: Data Storage
• Amazon Glacier: Archive
• Amazon DynamoDB: NoSQL
• Amazon Machine Learning: Predictive Models
• Amazon Kinesis: Streaming
• Other Apps

Machine Learning

The Circle of ML

[Diagram: a cycle of Business Problem, Data, ML Model, and ML Application, surrounded by the Front-End team, Data Engineering team, Analysts / DS team, and DevOps team.]

The Circle of ML

[Diagram: a cycle of Business Problem, Data, ML Data, and ML Application, surrounded by the Front-End team, Data Engineering team, Analysts / DS team, and DevOps team.]

Heavy Lifting by AWS

Dive Deep as much as you need

• Hardware – Distributed computing, GPU, FPGA, Greengrass
• DL – MXNet, NeMo, TensorFlow, Caffe, Torch, Theano
• Platform – Data Science Environment (Notebooks, Model Hosting and Retraining)
• Simple API Services

[Axis labels: Usage & Simplicity; Control]

Jupyter Notebooks on AWS

Research customers are increasingly doing exploratory data science and analytics work using notebooks.

Jupyter on AWS allows researchers to take advantage of any AWS compute node type:
• Large memory, CPU optimized, IO optimized
• GPU nodes (e.g. multiple K80 GPUs)

Researchers can also access Batch, HPC and Spark/MLlib clusters with Jupyter.

How to:
• Run Jupyter Notebook and JupyterHub on Amazon EMR
• Creating and Using a Jupyter Instance on AWS
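As a rough idea of what running Jupyter on an EC2 instance involves, the fragment below is a minimal sketch of a classic Jupyter Notebook configuration file. It is not taken from the how-to guides named above. The port number is an assumption, and the `NotebookApp` settings apply to the classic Notebook server; newer Jupyter releases use different setting names.

```python
# ~/.jupyter/jupyter_notebook_config.py (a sketch, not a hardened setup)
c = get_config()  # provided by Jupyter when it loads this file

c.NotebookApp.ip = "localhost"      # listen only on the instance itself
c.NotebookApp.port = 8888           # assumed port; any free port works
c.NotebookApp.open_browser = False  # the instance has no desktop browser
```

With the server bound to localhost, one common way to reach it is SSH port forwarding from a laptop to the instance (forwarding local port 8888 to port 8888 on the instance), which avoids opening the notebook port to the internet.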

Demo: Jupyter on AWS

Distributed Deep Learning on AWS

• Distributed training across GPUs or CPUs using MXNet
• Spin up a cluster in minutes
• Automatically add or remove cluster nodes
• Supports Amazon EFS shared filesystem
• Available on GitHub

https://github.com/awslabs/deeplearning-cfn
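The idea behind synchronous data-parallel training is that each worker computes a gradient on its own shard of the batch, the gradients are averaged, and every worker applies the same update. The sketch below shows that arithmetic in plain Python for a one-parameter model. It is not MXNet code and does not use the deeplearning-cfn template; in practice the framework handles the gradient exchange. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y)


def gradient(w: float, shard: Sequence[Sample]) -> float:
    """Mean gradient of the squared error (w*x - y)^2 with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def split(batch: Sequence[Sample], workers: int) -> List[Sequence[Sample]]:
    """Split a batch into equal shards, one per worker."""
    if len(batch) % workers != 0:
        raise ValueError("batch size must be divisible by the number of workers")
    size = len(batch) // workers
    return [batch[i * size:(i + 1) * size] for i in range(workers)]


def sync_step(w: float, batch: Sequence[Sample], workers: int, lr: float) -> float:
    """One synchronous step: per-shard gradients are averaged, then applied."""
    grads = [gradient(w, shard) for shard in split(batch, workers)]
    return w - lr * sum(grads) / len(grads)
```

With equal-sized shards, the averaged gradient equals the full-batch gradient, so `sync_step` gives the same result for one worker or several; adding workers changes how the work is divided, not the update.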

Research Programs at AWS

Global Data Egress Waiver

Why? Researchers strongly need Predictable Budgets

Who? Available to Degree-granting / Research Institutions in APAC (and elsewhere)

What? Waives data egress charges from Qualified Accounts (capped at 15% of Total Spend)

How? Contract Addendum Required. Talk to your Account Team.

All qualifying research customers should use this!
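The cap can be read as simple arithmetic: egress charges are waived up to 15% of total spend. The sketch below assumes that reading; how "Total Spend" is defined (for example, the billing period it covers and whether it includes egress) is set by the contract addendum and is not specified here. The function name is hypothetical.

```python
WAIVER_CAP_FRACTION = 0.15  # "capped at 15% of Total Spend" (from the slide)


def egress_waiver(total_spend: float, egress_charges: float) -> float:
    """Egress amount waived, assuming the cap is 15% of total spend."""
    if total_spend < 0 or egress_charges < 0:
        raise ValueError("amounts must be non-negative")
    return min(egress_charges, WAIVER_CAP_FRACTION * total_spend)
```

Under this reading, an account with 1,000 of total spend and 100 of egress has the full 100 waived, while 200 of egress on the same spend is waived only up to the cap of 150.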

AWS Research Cloud Program

Science first, not servers. Researchers are not professional IT people (nor do they wish to be).

Simple and easily explained procedures to get set up with cloud access.

Budget management tools to ensure that over-spends do not happen.

Large catalog of scientific solutions from partners, including instant clusters from AWS Marketplace.

Fast track to invoice-backed billing & Egress Waiver.

Best practices to ensure both data and research budgets are safe and privacy is protected.

IT’S ABOUT SCIENCE, NOT SERVERS.

aws.amazon.com/rcp

We recognise that whilst research is often a compute-intensive activity, most researchers are not IT experts.

We want to simplify research in the cloud with easy-to-use tools for researchers and their students, and share the catalogue of “researcher-obsessed” products and services created by many of our partners.

AWS Researcher’s Handbook

The 150-page “missing manual” for science in the cloud.

Written by Amazon’s Research Computing community for scientists.

• Explains foundational concepts about how AWS can accelerate time-to-science in the cloud.

• Step-by-step best practices for securing your environment to ensure your research data is safe and your privacy is protected.

• Tools for budget management that will help you control your spending, limit costs, and prevent over-runs.

• Catalogue of scientific solutions from partners chosen for their outstanding work with scientists.

aws.amazon.com/rcp

Lab: Deep Learning on AWS with Jupyter and MXNet