© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Julien Simon
Principal Technical Evangelist, AI and Machine Learning
@julsimon
Build, train, and deploy machine
learning models at scale
ML is still too complicated for everyday developers
Collect and prepare
training data
Choose and
optimize your ML
algorithm
Set up and manage
environments for
training
Train and
tune model
(trial and error)
Deploy model
in production
Scale and manage
the production
environment
Amazon SageMaker
Collect and prepare
training data
Choose and
optimize your ML
algorithm
Set up and manage
environments for
training
Deploy model
in production
Scale and manage
the production
environment
Easily build, train, and deploy Machine Learning models
Train and
tune model
(trial and error)
Amazon SageMaker
Pre-built
notebooks for
common
problems
K-Means Clustering
Principal Component Analysis
Neural Topic Modelling
Factorization Machines
Linear Learner
XGBoost
Latent Dirichlet Allocation
Image Classification
Seq2Seq
And more!
ALGORITHMS
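Several of these built-in algorithms are classical techniques that SageMaker re-implements for scale. As an illustration of the first one, here is a minimal from-scratch sketch of K-Means (Lloyd's algorithm) in plain Python with NumPy — not SageMaker's optimized implementation, just the idea it scales up:

```python
import numpy as np

def kmeans(X, k, n_iter=20):
    """Naive Lloyd's algorithm: alternate point assignment and centroid update."""
    # Simple deterministic init: spread starting centroids across the dataset.
    centroids = X[:: max(1, len(X) // k)][:k].astype(float)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

# Two well-separated blobs should end up in two distinct clusters.
X = np.vstack([np.zeros((50, 2)), np.ones((50, 2)) * 10])
centroids, labels = kmeans(X, k=2)
```

The SageMaker version of this algorithm streams over sharded data and runs distributed, but the assignment/update loop is the same idea.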
Apache MXNet
TensorFlow
Caffe2, CNTK,
PyTorch, Torch
FRAMEWORKS
Set up and manage
environments for
training
Train and tune
model (trial and
error)
Deploy model
in production
Scale and manage the
production environment
Built-in, high-
performance
algorithms
Build
Amazon SageMaker
Pre-built
notebooks for
common
problems
Built-in, high-
performance
algorithms
One-click
training
Hyperparameter
optimization
Build Train
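Hyperparameter optimization boils down to searching the parameter space for the configuration with the best validation metric. The names and ranges below are illustrative (not the SageMaker tuning API); this is a plain-Python sketch of random search over a toy objective:

```python
import random

def train_and_score(learning_rate, batch_size):
    """Hypothetical objective: in real life this launches a training job and
    returns a validation metric; here it is a toy function with a known
    optimum at learning_rate=0.1, batch_size=64."""
    return -((learning_rate - 0.1) ** 2) - ((batch_size - 64) / 64) ** 2

def random_search(n_trials=200, seed=42):
    """Sample hyperparameters at random, keep the best-scoring trial."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {
            "learning_rate": rng.uniform(0.001, 1.0),
            "batch_size": rng.choice([16, 32, 64, 128, 256]),
        }
        score = train_and_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

best_params, best_score = random_search()
```

SageMaker's tuner is smarter than this (it uses Bayesian optimization to pick the next trial from previous results), but the train-score-compare loop is the same shape.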
Deploy model
in production
Scale and manage
the production
environment
Amazon SageMaker
Fully managed
hosting with auto-
scaling
One-click
deployment
Pre-built
notebooks for
common
problems
Built-in, high-
performance
algorithms
One-click
training
Hyperparameter
optimization
Build Train Deploy
[Architecture diagram: training code and inference code (each with helper code) are packaged as containers in Amazon ECR. Model Training (on EC2) reads the training data and ground truth and produces model artifacts. Model Hosting (on EC2) loads those artifacts behind an Inference Endpoint; the client application sends inference requests and receives inference responses.]
Amazon SageMaker
Open Source Containers for TF and MXNet
https://github.com/aws/sagemaker-tensorflow-containers
https://github.com/aws/sagemaker-mxnet-containers
• Customize them
• Run them locally for development and testing
• Run them on SageMaker for training and prediction at scale
Bring your own container
https://github.com/aws/sagemaker-container-support
• Integration with SageMaker Python SDK Estimators, including:
• Downloading user-provided Python code
• Deserializing hyperparameters (preserving their Python types)
• bin/entry.py, the Docker entrypoint required by SageMaker
• Reading in the metadata files provided to the container during training
• nginx + Gunicorn HTTP server for serving inference requests
https://github.com/awslabs/amazon-sagemaker-examples/tree/master/advanced_functionality/scikit_bring_your_own
https://github.com/awslabs/amazon-sagemaker-examples/tree/master/advanced_functionality/r_bring_your_own
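Under the hood, SageMaker talks to any serving container through a simple HTTP contract: GET /ping for health checks and POST /invocations for predictions. Here is a framework-free sketch of that contract as a plain WSGI app (the real containers above put nginx + Gunicorn in front of something like this; the toy "model" is of course a stand-in):

```python
import io
import json

def app(environ, start_response):
    """Minimal WSGI app implementing SageMaker's serving contract:
    GET /ping for health checks, POST /invocations for predictions."""
    path = environ.get("PATH_INFO", "")
    method = environ.get("REQUEST_METHOD", "GET")

    if path == "/ping" and method == "GET":
        start_response("200 OK", [("Content-Type", "application/json")])
        return [b"{}"]

    if path == "/invocations" and method == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length))
        # Toy "model": sum the input features.
        prediction = sum(payload["features"])
        start_response("200 OK", [("Content-Type", "application/json")])
        return [json.dumps({"prediction": prediction}).encode()]

    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not Found"]

def call(app, method, path, body=b""):
    """Tiny harness: invoke the WSGI app directly, no server needed."""
    captured = {}
    environ = {
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    def start_response(status, headers):
        captured["status"] = status
    result = b"".join(app(environ, start_response))
    return captured["status"], result

status, body = call(app, "POST", "/invocations",
                    json.dumps({"features": [1, 2, 3]}).encode())
```

Any container that answers these two routes on port 8080 can be trained and hosted by SageMaker, which is why bringing your own framework (scikit-learn, R, anything) works.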
Amazon EC2 C5 instances
C4: 36 vCPUs (“Haswell”), 60 GiB memory, 4 Gbps to EBS
C5: 72 vCPUs (“Skylake”), 144 GiB memory, 12 Gbps to EBS, AVX-512
C5 vs. C4: 2X vCPUs, 2X performance, 2.4X memory, 3X EBS throughput
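The C5-versus-C4 multipliers follow directly from the spec sheet; a quick arithmetic check:

```python
# Published specs from the slide: C4 ("Haswell") vs. C5 ("Skylake").
c4 = {"vcpus": 36, "memory_gib": 60, "ebs_gbps": 4}
c5 = {"vcpus": 72, "memory_gib": 144, "ebs_gbps": 12}

vcpu_ratio = c5["vcpus"] / c4["vcpus"]              # "2X vCPUs"
memory_ratio = c5["memory_gib"] / c4["memory_gib"]  # "2.4X memory"
ebs_ratio = c5["ebs_gbps"] / c4["ebs_gbps"]         # "3X throughput"
```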
C5: Next Generation Compute-Optimized Instances with Intel® Xeon® Scalable Processor
AWS Compute optimized instances support the new Intel® AVX-512 advanced instruction set, enabling you to more efficiently run vector processing workloads with single and double floating point precision, such as AI/machine learning or video processing.
25% improvement in
price/performance over C4
Faster TensorFlow training on C5
https://aws.amazon.com/blogs/machine-learning/faster-training-with-optimized-tensorflow-1-6-on-amazon-ec2-c5-and-p3-instances/
Amazon EC2 P3 Instances
• P3.2xlarge, P3.8xlarge, P3.16xlarge
• Up to eight NVIDIA Tesla V100 GPUs in a single instance
• 40,960 CUDA cores, 5120 Tensor cores
• 128 GB of GPU memory
• 1 PetaFLOP of computational performance – 14x better than P2
• 300 GB/s GPU-to-GPU communication (NVLink) – 9x better than P2
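These per-instance figures are aggregates of NVIDIA's published per-GPU Tesla V100 specs (5,120 CUDA cores, 640 Tensor cores, 16 GB HBM2, ~125 TFLOPS mixed precision per GPU); a quick check for the 8-GPU P3.16xlarge:

```python
# NVIDIA Tesla V100 per-GPU specs, multiplied out for a P3.16xlarge.
gpus = 8
cuda_cores_per_gpu = 5120
tensor_cores_per_gpu = 640
memory_gb_per_gpu = 16
mixed_precision_tflops_per_gpu = 125

total_cuda = gpus * cuda_cores_per_gpu        # 40,960 CUDA cores
total_tensor = gpus * tensor_cores_per_gpu    # 5,120 Tensor cores
total_memory = gpus * memory_gb_per_gpu       # 128 GB of GPU memory
total_tflops = gpus * mixed_precision_tflops_per_gpu  # ~1 PetaFLOP
```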
The fastest, most powerful GPU instances in the cloud
DigitalGlobe
https://aws.amazon.com/solutions/case-studies/digitalglobe-machine-learning/
• Operating Earth imaging
satellites and providing image
analysis services.
• Over 100 PB of imagery.
• Extensive use of Machine
Learning on SageMaker to
extract information from images.
• Working with the AWS ML Lab,
built a predictive model reducing
cloud storage costs by 50%.
DEMOS
Thank you!
Julien Simon
Principal Technical Evangelist, AI and Machine Learning
@julsimon
https://aws.amazon.com/sagemaker
https://github.com/awslabs/amazon-sagemaker-examples
https://github.com/aws/sagemaker-python-sdk
https://github.com/aws/sagemaker-spark
https://medium.com/@julsimon
https://youtube.com/juliensimonfr
https://gitlab.com/juliensimon/dlnotebooks