© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data analytics and Deep Learning on AWS with Jupyter and MXNet
Amazon Web Services
Adrian White, Research & Technical Computing
July 20, 2017
Agenda
1:30 Introduction to AWS (20 mins)
1:50 Demo: Alces Flight & Landsat-8 (15 mins)
2:05 Lab: Jupyter Notebooks on AWS (50 mins)
3:00 Finish
Why use AWS for Research?
Time to Science: Access research infrastructure in minutes
Low Cost: Pay-as-you-go pricing
Elastic: Easily add or remove capacity
Globally Accessible: Easily collaborate with researchers around the world
Secure: A collection of tools to protect data and privacy
Scalable: Access to effectively limitless capacity
AWS platform overview (diagram)
Application layers: Enterprise Apps; Development & Operations; Mobile Services; App Services; Analytics
Services shown in these layers: Data Warehousing, Hadoop/Spark, Streaming Data Collection, Machine Learning, Elastic Search, Virtual Desktops, Sharing & Collaboration, Corporate Email, Backup, Queuing & Notifications, Workflow, Search, Transcoding, One-click App Deployment, Identity, Sync, Single Integrated Console, Push Notifications, DevOps Resource Management, Application Lifecycle Management, Containers, Triggers, Resource Templates, API Gateway, Streaming Data Analysis, Business Intelligence, Mobile Analytics
IoT: Rules Engine, Device Shadows, Device SDKs, Registry, Device Gateway
Technical & Business Support: Account Management, Support, Professional Services, Training & Certification, Security & Pricing Reports, Partner Ecosystem, Solutions Architects
Marketplace: Business Apps, Business Intelligence, Databases, DevOps Tools, Networking, Security, Storage
Hybrid Architecture: Data Backups, Integrated App Deployments, Direct Connect, Identity Federation, Integrated Resource Management, Integrated Networking
Security & Compliance: Access Control, Identity Management, Key Management & Storage, Monitoring & Logs, Assessment and Reporting, Resource & Usage Auditing, Configuration Compliance, Web Application Firewall
Core Services
• Compute: VMs, Auto-scaling, & Load Balancing
• Storage: Object, Blocks, Archival, Import/Export
• Databases: Relational, NoSQL, Caching, Migration
• Networking: VPC, DX, DNS, CDN
Infrastructure: Regions, Availability Zones, Points of Presence
Broad Set of Compute Instance Types for HPC and Deep Learning
General purpose: M4, M3
Compute optimized: C5, C4, C3
Memory optimized: X1, R4, R3
Storage and I/O optimized: I3, I2, D2, HS1
GPU or FPGA enabled: P2, G2, F1
GPU and FPGA Instances
P2: GPU instance
• Up to 16 NVIDIA GK210 (8 x K80) GPUs in a single instance, with peer-to-peer PCIe GPU interconnect
• Supporting a wide variety of use cases including deep learning, HPC simulations, financial computing, and batch rendering
F1: FPGA instance
• Up to 8 Xilinx Virtex® UltraScale+™ VU9P FPGAs in a single instance, with peer-to-peer PCIe and bidirectional ring interconnects
• Designed for hardware-accelerated applications including financial computing, genomics, accelerated search, and image processing
P2 GPU Instances
• Up to 16 K80 GPUs in a single instance
• Including peer-to-peer PCIe GPU interconnect
• Supporting a wide variety of use cases including deep learning, HPC simulations, and batch rendering

Instance Size   GPUs   GPU Peer to Peer   vCPUs   Memory (GiB)   Network Bandwidth*
p2.xlarge       1      -                  4       61             1.25 Gbps
p2.8xlarge      8      Y                  32      488            10 Gbps
p2.16xlarge     16     Y                  64      732            20 Gbps
*In a placement group
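As a small illustration of reading the P2 table, the sketch below picks the smallest listed size that satisfies a GPU and memory requirement. The figures are copied from the table; the helper name `smallest_p2_for` is hypothetical and is not part of any AWS tool or SDK.

```python
# Figures copied from the P2 table; names below are illustrative only.
P2_SIZES = [
    # (instance size, GPUs, vCPUs, memory in GiB), ordered smallest first
    ("p2.xlarge", 1, 4, 61),
    ("p2.8xlarge", 8, 32, 488),
    ("p2.16xlarge", 16, 64, 732),
]


def smallest_p2_for(gpus_needed, memory_gib_needed=0):
    """Return the smallest listed P2 size meeting both requirements, or None."""
    for name, gpus, _vcpus, memory_gib in P2_SIZES:
        if gpus >= gpus_needed and memory_gib >= memory_gib_needed:
            return name
    return None  # no single P2 instance in the table is large enough


print(smallest_p2_for(1))
print(smallest_p2_for(4))
print(smallest_p2_for(2, memory_gib_needed=500))
print(smallest_p2_for(32))
```

A request that exceeds the largest size returns `None`, which is the point at which a multi-node cluster would be the next thing to consider.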
F1 FPGA Instances
• Up to 8 Xilinx Virtex UltraScale Plus VU9P FPGAs in a single instance, with four high-speed DDR4 channels per FPGA
• Largest size includes high performance FPGA interconnects via PCIe Gen3 (FPGA Direct), and bidirectional ring (FPGA Link)
• Designed for hardware-accelerated applications including financial computing, genomics, accelerated search, and image processing

Instance Size   FPGAs   FPGA Link   FPGA Direct   vCPUs   Memory (GiB)   NVMe Instance Storage (GB)   Network Bandwidth*
f1.2xlarge      1       -           -             8       122            1 x 480                      5 Gbps
f1.16xlarge     8       Y           Y             64      976            4 x 960                      30 Gbps
*In a placement group
Parallel Processing in GPUs and FPGAs
A GPU is effective at processing the same set of operations in parallel – single instruction, multiple data (SIMD). A GPU has a well-defined instruction set and fixed word sizes – for example single, double, or half-precision integer and floating point values.
An FPGA is effective at processing the same or different operations in parallel – multiple instructions, multiple data (MIMD). An FPGA does not have a predefined instruction set or a fixed data width.
[Diagram: one CPU core (control, ALUs, cache, DRAM) compared with a GPU (many cores, DRAM) and an FPGA (cells, block RAM, DRAM)]
Each GPU in P2 has 2880 of these cores.
Each FPGA in F1 has more than 2M of these cells.
AWS Lambda – How it Works
Bring your own code
• Node.js, Java, Python
• Java = Any JVM based language such as Scala, Clojure, etc.
• Bring your own libraries
Flexible invocation paths
• Event or RequestResponse invoke options
• Existing integrations with various AWS services
Simple resource model
• Select memory from 128 MB to 1.5 GB in 64 MB steps
• CPU & Network allocated proportionately to RAM
• Reports actual usage
Fine grained permissions
• Uses IAM role for Lambda execution permissions
• Uses Resource policy for AWS event sources
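To make the resource model and the "bring your own code" point concrete, here is a minimal Python sketch. It enumerates the memory settings implied by the slide (128 MB to 1.5 GB in 64 MB steps) and shows a function in the `(event, context)` form used for Python Lambda handlers. The event shape with a `numbers` key and the helper name `valid_memory_sizes_mb` are assumptions made for illustration.

```python
def valid_memory_sizes_mb(min_mb=128, max_mb=1536, step_mb=64):
    """Memory settings from 128 MB to 1.5 GB (1536 MB) in 64 MB steps."""
    return list(range(min_mb, max_mb + 1, step_mb))


def handler(event, context):
    """Handler in the (event, context) form; the event layout is hypothetical."""
    numbers = event.get("numbers", [])
    return {"count": len(numbers), "total": sum(numbers)}


sizes = valid_memory_sizes_mb()
print(len(sizes), sizes[0], sizes[-1])

# Local invocation for testing; Lambda would supply a real context object.
print(handler({"numbers": [1, 2, 3]}, None))
```

Because CPU and network are allocated in proportion to RAM, choosing a larger memory setting is also the way to get more compute per invocation.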
Lambda in the context of Grid Computing
Source: “Occupy the Cloud: Distributed Computing for the 99%”, https://arxiv.org/pdf/1702.04024.pdf
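The grid-computing use of Lambda comes down to mapping a stateless function over many independent inputs. The sketch below shows that pattern locally, with a thread pool standing in for concurrent Lambda invocations. Nothing here calls AWS, and `simulate` and `run_map` are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(seed):
    """Hypothetical stand-in for one independent task, e.g. one parameter setting."""
    # A small deterministic computation so that results are reproducible.
    value = seed
    for _ in range(1000):
        value = (value * 1103515245 + 12345) % (2 ** 31)
    return value % 100


def run_map(func, inputs, max_workers=8):
    """Apply func to every input concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, inputs))


results = run_map(simulate, range(16))
print(len(results))
```

Each task depends only on its own input, so the tasks can run anywhere and in any order. That independence is what makes the pattern a fit for short-lived functions.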
On-demand, Auto Scaling Clusters On AWS
CfnCluster: CfnCluster is provided by AWS to quickly provision configurable clusters and grid computing environments.
AWS Batch: AWS Batch automatically provisions compute resources tailored to the needs of your jobs using Amazon EC2 and EC2 Spot.
Alces Flight: Alces Flight is available in the AWS Marketplace and bundles 1000+ commonly used applications. https://aws.amazon.com/marketplace/
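For AWS Batch, submitting work means naming a job, a job queue, and a job definition. The sketch below only builds the request as a dictionary, so it runs without AWS access. The keys match the parameters of the boto3 Batch client's `submit_job` call, while the queue, definition, and command values are hypothetical placeholders.

```python
def build_submit_job_request(job_name, job_queue, job_definition, command):
    """Assemble keyword arguments for the boto3 Batch client's submit_job call."""
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        # Override the container command defined in the job definition.
        "containerOverrides": {"command": list(command)},
    }


request = build_submit_job_request(
    job_name="landsat-tile-0001",        # hypothetical job name
    job_queue="research-queue",          # hypothetical queue
    job_definition="process-tile:1",     # hypothetical definition and revision
    command=["python", "process.py", "--tile", "0001"],
)
print(request)

# With boto3 installed and AWS credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("batch").submit_job(**request)
```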
Alces Flight: Personal on-demand HPC
1000+ popular scientific applications
• Pre-installed
• Multiple versions, complete with libraries and various compiler optimizations, ready to run
Available via the AWS Marketplace (the cloud’s “App Store”)
http://alces-flight.com/ for more information
Self-scaling HPC clusters are instantly ready to compute. They are billed by the hour and use the AWS Spot market by default, which keeps costs low.
Flight is accessible
All the traditional command-line tools will be familiar, but you can also create an Alces “session” and immediately launch a desktop view of your cluster to run graphical apps.
Access options: Command Line (ssh), Graphical Console
Evolution of Data Analytics
Stages: Batch, Real time, Prediction
Services shown: Amazon Kinesis, Amazon Redshift, AWS Batch, Amazon EMR, AWS IoT, Amazon SNS, Amazon Kinesis Analytics, Amazon Machine Learning, Amazon Rekognition
Use an optimal combination of highly interoperable services
Amazon Redshift: Data Warehouse
Amazon Elastic MapReduce: Semi-structured
Amazon Simple Storage Service: Data Storage
Amazon Glacier: Archive
Amazon DynamoDB: NoSQL
Amazon Machine Learning: Predictive Models
Amazon Kinesis: Streaming
Other Apps
The Circle of ML
Teams: Front-End team, Data Engineering team, Analysts / DS team, DevOps team
Stages: Business Problem, Data, ML Model, ML Application
Heavy Lifting by AWS
Dive Deep as much as you need
Hardware – Distributed computing, GPU, FPGA, Greengrass
DL – MXNet, NeMo, TensorFlow, Caffe, Torch, Theano
Platform – Data Science Environment (Notebooks, Model Hosting and Retraining)
Simple API Services
Axis labels: Usage & Simplicity; Control
Jupyter Notebooks on AWS
Research customers are increasingly doing exploratory data science and analytics work using notebooks.
Jupyter on AWS allows researchers to take advantage of any AWS compute node type:
• Large memory, CPU optimized, IO optimized
• GPU nodes (e.g. multiple K80 GPUs)
Researchers can also access Batch, HPC and Spark/MLlib clusters with Jupyter.
How to:
• Run Jupyter Notebook and JupyterHub on Amazon EMR
• Creating and Using a Jupyter Instance on AWS
Distributed Deep Learning on AWS
• Distributed training across GPUs or CPUs using MXNet
• Spin up a cluster in minutes
• Automatically add or remove cluster nodes
• Supports an Amazon EFS shared file system
• Available on GitHub: https://github.com/awslabs/deeplearning-cfn
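The idea behind distributed training can be sketched without any framework. Each worker computes a gradient on its own shard of the data, the gradients are averaged, and every worker applies the same update. The example below is a plain-Python illustration of that synchronous data-parallel step on a toy linear model. It is not MXNet code and does not reflect how the deeplearning-cfn template is implemented; all names are hypothetical.

```python
def shard(data, num_workers):
    """Split the dataset into contiguous shards, one per worker."""
    size = (len(data) + num_workers - 1) // num_workers
    return [data[i:i + size] for i in range(0, len(data), size)]


def local_gradient(w, b, batch):
    """Mean-squared-error gradient for y ~ w*x + b on one worker's shard."""
    n = len(batch)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in batch) / n
    return grad_w, grad_b


def train(data, num_workers=4, steps=2000, lr=0.05):
    """Synchronous data-parallel gradient descent, simulated in one process."""
    w, b = 0.0, 0.0
    shards = shard(data, num_workers)
    for _ in range(steps):
        # In a real cluster each gradient would be computed on a separate node.
        grads = [local_gradient(w, b, s) for s in shards]
        # Average the workers' gradients, then apply one shared update.
        grad_w = sum(g[0] for g in grads) / len(grads)
        grad_b = sum(g[1] for g in grads) / len(grads)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 3x + 1, so training should recover w near 3 and b near 1.
data = [(k / 10.0, 3.0 * (k / 10.0) + 1.0) for k in range(20)]
w, b = train(data)
print(round(w, 2), round(b, 2))
```

With equally sized shards, averaging the per-worker gradients gives the same update as computing the gradient over the full dataset, which is why adding workers changes the speed of a step rather than its result.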
Global Data Egress Waiver
Why? Researchers strongly need predictable budgets.
Who? Available to degree-granting / research institutions in APAC (and elsewhere).
What? Waives data egress charges from Qualified Accounts (capped at 15% of Total Spend).
How? Contract Addendum required. Talk to your Account Team.
All qualifying research customers should use this!
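A short worked example of the cap, under one reading of the slide: egress charges are waived up to 15% of the account's total spend, and anything above that is billed as usual. How "Total Spend" is defined, including whether it counts egress itself, is set by the contract addendum and is an assumption here. The function name is hypothetical.

```python
def egress_waiver(total_spend, egress_charges, cap_fraction=0.15):
    """Return (waived, still_billed) egress amounts under the assumed cap."""
    cap = cap_fraction * total_spend
    waived = min(egress_charges, cap)
    return waived, egress_charges - waived


# Egress below the cap: fully waived.
print(egress_waiver(10000.0, 1000.0))

# Egress above the cap: only the capped amount is waived.
print(egress_waiver(10000.0, 2000.0))
```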
AWS Research Cloud Program
Science first, not servers. Researchers are not professional IT people (nor do they wish to be).
Simple and easily explained procedures to get set up with cloud access.
Budget management tools to ensure that over-spends do not happen.
Large catalog of scientific solutions from partners, including instant clusters from AWS Marketplace.
Fast track to invoice-backed billing & Egress Waiver.
Best practices to ensure both data and research budgets are safe and privacy is protected.
IT’S ABOUT SCIENCE, NOT SERVERS.
aws.amazon.com/rcp
We recognise that whilst research is often a compute-intensive activity, most researchers are not IT experts.
We want to simplify research in the cloud with easy-to-use tools for researchers and their students, and share the catalogue of “researcher-obsessed” products and services created by many of our partners.
AWS Researcher’s Handbook
The 150-page “missing manual” for science in the cloud.
Written by Amazon’s Research Computing community for scientists.
• Explains foundational concepts about how AWS can accelerate time-to-science in the cloud.
• Step-by-step best practices for securing your environment to ensure your research data is safe and your privacy is protected.
• Tools for budget management that will help you control your spending, limit costs, and prevent any over-runs.
• Catalogue of scientific solutions from partners chosen for their outstanding work with scientists.
aws.amazon.com/rcp