This One Weird API Request Will Save You Thousands Joshua Burgin, General Manager EC2 Spot Zeev Stolin, DevOps, Gett

Page 1: This One Weird API Request Will Save You Thousands

This One Weird API Request Will

Save You Thousands

Joshua Burgin, General Manager EC2 Spot

Zeev Stolin, DevOps, Gett

Page 2: This One Weird API Request Will Save You Thousands

AWS EC2 Consumption Models

On-Demand

Pay for compute capacity by the hour with no long-term commitments

For spiky workloads, or to define needs

Reserved

Make a low, one-time payment and receive a significant discount on the hourly charge

For committed utilization

Spot

Bid for unused capacity, charged at a Spot Price which fluctuates based on supply and demand

For time-insensitive, transient and cost-sensitive workloads

Page 3: This One Weird API Request Will Save You Thousands

Spare capacity at scale

AWS has more than a million active customers in 190 countries.

On average, every week, AWS customers are using more compute capacity on EC2 Spot Instances than customers in 2012 were running across all of EC2.

Page 4: This One Weird API Request Will Save You Thousands

With Spot the rules are simple

Markets where the price of compute changes based on supply and demand

You’ll never pay more than your bid. When the market exceeds your bid you get 2 minutes to wrap up your work
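The two rules above can be sketched as a small shell function. This is an illustration only: `spot_hourly_charge` and its arguments are hypothetical names, and the function models the rule as stated on the slide rather than the actual EC2 billing logic. The sample prices are the c3.2xlarge figures from the price grid in this deck.

```shell
#!/bin/sh
# Hypothetical helper (not an AWS tool): models the Spot rules from the slide.
# Prints the hourly price charged while the instance keeps running, or
# "interrupted" once the market price has risen above the bid.
spot_hourly_charge() {
    bid="$1"      # maximum price you are willing to pay, USD per hour
    market="$2"   # current Spot market price, USD per hour
    awk -v bid="$bid" -v market="$market" 'BEGIN {
        if (market + 0 > bid + 0) {
            print "interrupted"   # two-minute warning, then termination
        } else {
            print market          # you pay the market price, not your bid
        }
    }'
}

spot_hourly_charge 0.44 0.08   # bid at the On-Demand price, market is lower
spot_hourly_charge 0.05 0.08   # market is above the bid
```

The first call prints the market price and the second prints `interrupted`, which is the whole argument for bidding generously: a higher bid does not raise what you pay, it only lowers the chance of interruption.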

Page 5: This One Weird API Request Will Save You Thousands

Show me the markets!

C3 Spot prices by Availability Zone, with the On-Demand price for comparison

Size   1b      1c      1a      On-Demand
8XL    $0.27   $0.29   $0.50   $1.76
4XL    $0.30   $0.16   $0.21   $0.88
2XL    $0.07   $0.08   $0.08   $0.44
XL     $0.05   $0.04   $0.04   $0.22
L      $0.01   $0.04   $0.01   $0.11

Each instance family

Each instance size

Each Availability Zone

In every region

Is a separate Spot Market
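As a rough illustration of how quickly markets multiply, the sketch below enumerates one pool per instance size and Availability Zone for the C3 family shown on this slide. The function name `list_pools` and the `family.size/zone` output format are made up for this example; they are not AWS identifiers.

```shell
#!/bin/sh
# Hypothetical helper: print one line per Spot market (capacity pool)
# for a single instance family in a single region.
list_pools() {
    family="$1"   # e.g. c3
    sizes="$2"    # space-separated instance sizes
    zones="$3"    # space-separated Availability Zone labels
    for size in $sizes; do
        for zone in $zones; do
            echo "$family.$size/$zone"
        done
    done
}

# Five sizes across three zones gives fifteen separate markets.
list_pools c3 "large xlarge 2xlarge 4xlarge 8xlarge" "1a 1b 1c"
```

Adding a second family or a second region multiplies the count again, which is why being flexible about instance type and zone opens up so many more places to find cheap capacity.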

Page 6: This One Weird API Request Will Save You Thousands

Bid Price Vs Market Price

[Chart: market price plotted against 25%, 50%, and 75% bid levels]

You pay the market price

Page 7: This One Weird API Request Will Save You Thousands

JUST BID ON-DEMAND!

Page 8: This One Weird API Request Will Save You Thousands

Example customer cases

Page 9: This One Weird API Request Will Save You Thousands

Why use Spot – customer examples

“The company has saved tens of thousands of dollars. That’s between 20 and 30 percent of our total monthly AWS bill.”

Gal Aviv, Research & Development Group Manager

Page 10: This One Weird API Request Will Save You Thousands

Why use Spot – customer examples

The raw data from the CMS experiment in the Large Hadron Collider (LHC) is recorded every 25 nanoseconds at a rate of approximately 1 petabyte per second.

Page 11: This One Weird API Request Will Save You Thousands

Recent Innovations

- Spot Bid Advisor

- Spot fleet

- Spot blocks

- New! Spot console

Page 12: This One Weird API Request Will Save You Thousands

Spot Bid Advisor

1) We make this easy using the Spot bid advisor

2) With deliberate pool selection and bidding, you will keep your Spot instance as long as you need to.

3) And with new features like Spot fleet diversified we do the heavy lifting for you...

Page 13: This One Weird API Request Will Save You Thousands

Spot Bid Advisor – aws-spot-labs

Page 14: This One Weird API Request Will Save You Thousands

Spot fleet – fly like a pro

Launch Thousands of Spot Instances: with one RequestSpotFleet call.

Get Best Price: Find the lowest priced horsepower that works for you.

or

Get Diversified Resources: Diversify your fleet. Grow your availability.

And

Apply Custom Weighting: Create your own capacity unit based on your application needs.
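Custom weighting is easiest to see with numbers. The sketch below uses the weights from the fleet configuration in this deck (c3.large = 2 through c3.8xlarge = 32) and a target capacity of 100 units. The function name `instances_needed` is hypothetical, and the round-up behaviour is an assumption about how the fleet avoids falling below its target.

```shell
#!/bin/sh
# Hypothetical helper: how many instances of a given weight are needed
# to reach a target capacity expressed in your own capacity units.
instances_needed() {
    target="$1"   # target capacity in units (e.g. vCPUs)
    weight="$2"   # units contributed by one instance of this type
    # Integer ceiling division, so capacity never ends up below the target.
    echo $(( (target + weight - 1) / weight ))
}

for pair in "c3.large 2" "c3.xlarge 4" "c3.2xlarge 8" "c3.4xlarge 16" "c3.8xlarge 32"; do
    set -- $pair
    echo "$1: $(instances_needed 100 "$2") instances"
done
```

Because the unit is yours to define, the same request can be filled by 50 small instances or 4 large ones, whichever pool is cheapest at the time.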

Page 15: This One Weird API Request Will Save You Thousands

Spot fleet – continued innovation

One-Time Fleets [May 2016]

CloudWatch Metrics for Spot Fleets [Mar 2016]

Modify Your Fleet [Oct 2015]

Distribute Your Fleet Across Multiple Capacity Pools [Sep 2015]

Weighted Bidding for EC2 Spot Instances [Aug 2015]

Spot instances in the lowest priced Availability Zone [Jul 2015]

Page 16: This One Weird API Request Will Save You Thousands

Spot fleet – super easy

aws ec2 request-spot-fleet --spot-fleet-request-config file://config.json

{
  "IamFleetRole": "arn:aws:iam::781603563322:role/fleet-role",
  "TargetCapacity": 100,
  "SpotPrice": "0.03",
  "ValidFrom": "2015-09-15T00:56:19Z",
  "ValidUntil": "2016-09-14T07:00:00Z",
  "TerminateInstancesWithExpiration": true,
  "LaunchSpecifications": [
    { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.large", "WeightedCapacity": 2, "SubnetId": "subnet-d0dc51fb" },
    { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.large", "WeightedCapacity": 2, "SubnetId": "subnet-64531413" },
    { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.large", "WeightedCapacity": 2, "SubnetId": "subnet-0b1b8052" },
    { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.xlarge", "WeightedCapacity": 4, "SubnetId": "subnet-d0dc51fb" },
    { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.xlarge", "WeightedCapacity": 4, "SubnetId": "subnet-64531413" },
    { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.xlarge", "WeightedCapacity": 4, "SubnetId": "subnet-0b1b8052" },
    { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.4xlarge", "WeightedCapacity": 16, "SubnetId": "subnet-d0dc51fb" },
    { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.4xlarge", "WeightedCapacity": 16, "SubnetId": "subnet-64531413" },
    { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.4xlarge", "WeightedCapacity": 16, "SubnetId": "subnet-0b1b8052" },
    { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.8xlarge", "WeightedCapacity": 32, "SubnetId": "subnet-d0dc51fb" },
    { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.8xlarge", "WeightedCapacity": 32, "SubnetId": "subnet-64531413" },
    { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.8xlarge", "WeightedCapacity": 32, "SubnetId": "subnet-0b1b8052" },
    { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.2xlarge", "WeightedCapacity": 8, "SubnetId": "subnet-d0dc51fb" },
    { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.2xlarge", "WeightedCapacity": 8, "SubnetId": "subnet-64531413" },
    { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.2xlarge", "WeightedCapacity": 8, "SubnetId": "subnet-0b1b8052" }
  ]
}

Page 17: This One Weird API Request Will Save You Thousands

Spot Console – New [June 2016]

An easy-to-use interface that lets you launch Spot instances, fleets & Spot blocks in seconds

Helps you select and bid on the EC2 instances that meet your application’s requirements

Simple-to-use dashboard lets you modify and manage your application’s compute capacity

Page 18: This One Weird API Request Will Save You Thousands

Spot Launch Wizard – with fleet and blocks

Page 19: This One Weird API Request Will Save You Thousands

Spot Console – Target Capacity & Automated Bidding

Page 20: This One Weird API Request Will Save You Thousands

Spot Console – Dashboard

Page 21: This One Weird API Request Will Save You Thousands

Using a single

additional Parameter

Run continuously

for up to 6 hours

Save up to 50% off

On-Demand pricing

Spot blocks
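The slide does not name the parameter. In the AWS CLI of that period it was `--block-duration-minutes` on `aws ec2 request-spot-instances`, given in whole hours expressed as minutes up to 360; treat that as an assumption to verify against current documentation. In the sketch below, `block_duration_minutes` is a hypothetical helper, `specification.json` is a placeholder file name, and the command is only printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper: convert a block length in hours to the minutes value,
# rejecting anything outside the 1 to 6 hour range described on the slide.
block_duration_minutes() {
    hours="$1"
    case "$hours" in
        1|2|3|4|5|6) echo $(( hours * 60 )) ;;
        *) echo "block duration must be 1 to 6 whole hours" >&2; return 1 ;;
    esac
}

# Dry run: print the request for a 6-hour block instead of sending it.
minutes=$(block_duration_minutes 6) &&
echo aws ec2 request-spot-instances --instance-count 1 \
    --block-duration-minutes "$minutes" \
    --launch-specification file://specification.json
```

Dropping the `echo` would submit the request for real, so the dry-run form is the safer starting point.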


Page 22: This One Weird API Request Will Save You Thousands

Spot in action...

Best Practices

Example Workloads

• Hadoop

• Stateless Applications (e.g. web tiers)

Page 23: This One Weird API Request Will Save You Thousands

EC2 Best practices

Fault tolerance

for Spot

Stateless, Multi-AZ, Loosely coupled, Instance Flexibility

Page 24: This One Weird API Request Will Save You Thousands

EC2 Spot – Hadoop

Page 25: This One Weird API Request Will Save You Thousands

Core nodes

Master

Node

Master instance group

Hadoop cluster

Core instance group

HDFS HDFS

Core nodes run TaskTracker and DataNode (HDFS)

Process data with mappers and reducers, store data with HDFS or DataNode

Page 26: This One Weird API Request Will Save You Thousands

Core nodes

Master

Node

Master instance group

Hadoop cluster

Core instance group

HDFS HDFS

Can Add Core

Nodes:

More CPU

More Memory

More HDFS Space

HDFS

Page 27: This One Weird API Request Will Save You Thousands

Task Nodes – Multiple Instance Types

Master

Node

Hadoop cluster

HDFS HDFS

YARN -

heterogeneous

instances

Can add and

remove task nodes

c3.8xl, r3.8xl, r3.4xl, etc

Spot the

opportunity

Core instance group

Page 28: This One Weird API Request Will Save You Thousands

Multiple capacity pools

How flexible are you?

2. Across Families

3. Across Zones

1. Within family

Page 29: This One Weird API Request Will Save You Thousands

Review and launch

{

"AllocationStrategy": "diversified",

"TargetCapacity": 1000,

"SpotPrice": "0.005",

"TerminateInstancesWithExpiration": true,

"LaunchSpecifications": [

{

"ImageId": "ami-0d4cfd66",

"InstanceType": "c3.xlarge",

"WeightedCapacity": 4,

"SpotPrice": "0.0263",

"SubnetId": "subnet-d0dc51fb"

},

{

"ImageId": "ami-0d4cfd66",

"InstanceType": "c3.2xlarge",

"WeightedCapacity": 8,

"SpotPrice": "0.0263",

"SubnetId": "subnet-d0dc51fb"

},

{

Page 30: This One Weird API Request Will Save You Thousands

Results - Hadoop

Requested 1000

vCores over 30 days

Minimum 848 vCores

Mode 1008 vCores

Average 1005 vCores

Average Price of

$0.0118 per vCore

Savings of over 81%

Page 31: This One Weird API Request Will Save You Thousands

Capitalizing on two minute warning

When the Spot price exceeds your bid price, the instance will receive a two-minute warning

Check for the 2 minute Spot instance termination notification every 5 seconds, leveraging a script invoked at instance launch

Page 32: This One Weird API Request Will Save You Thousands

Sample script – two minutes left!

1) Check for 2 minute warning

2) If YES, run shutdown scripts

3) OTHERWISE, do nothing

4) Then sleep for 5 seconds

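#!/bin/bash
# Poll the instance metadata service for a Spot termination time.
while true
do
    if curl -s http://169.254.169.254/latest/meta-data/spot/termination-time | grep -q '.*T.*Z'; then
        # A timestamp is present: the instance is marked for termination.
        /env/bin/runterminationscripts.sh
        # Stop polling so the shutdown scripts run only once.
        break
    else
        # Spot instance not yet marked for termination.
        sleep 5
    fi
done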

Page 33: This One Weird API Request Will Save You Thousands

EMRFS - Amazon S3 as HDFS

• No need to scale HDFS
  – Capacity
  – Replication for durability

• Amazon S3 scales with your data
  – Both in IOPs and data storage
  – Massively parallel

Spot block for HDFS

• For core nodes if HDFS cluster lives for less than 6 hours

Page 34: This One Weird API Request Will Save You Thousands

Hadoop on EC2 Spot – takeaways

Your Work

Run task nodes separately with EC2 Spot fleet

Spot blocks for core/HDFS clusters that live less than 6 hours

What EC2 Spot fleet does for you

Saves you money

Heterogeneous instance management

Scale on the unit that matters to you

Accelerate results (time is money)

Page 35: This One Weird API Request Will Save You Thousands

Web Applications with Spot

Page 36: This One Weird API Request Will Save You Thousands

Stateless Web Application

Elastic Load

Balancing

Stateless

Web Servers

(Spot)

Stateless

Web Servers

(Spot)

Session

State Data

Spot fleet

Availability Zone A

Availability Zone B

Stateless

Web Servers

(Spot)

Stateless

Web Servers

(Spot)

Page 37: This One Weird API Request Will Save You Thousands

Diversification with EC2 Spot fleet

Multiple EC2 Spot instances

selected

Multiple Availability Zones

selected

Pick the instances with similar performance characteristics, e.g. c3.large, m3.large, m4.large, r3.large, c4.large.

Page 38: This One Weird API Request Will Save You Thousands

Multiple capacity pools

How flexible are you?

2. Across Families

3. Across Zones

1. Within family

Page 39: This One Weird API Request Will Save You Thousands

Review and Launch

{

"AllocationStrategy": "diversified",

"TargetCapacity": 50,

"SpotPrice": "0.01",

"LaunchSpecifications": [

{

"ImageId": "ami-0d4cfd66",

"InstanceType": "c3.large",

"SpotPrice": "0.105"

},

{

"ImageId": "ami-0d4cfd66",

"InstanceType": "c4.large",

"SpotPrice": "0.11"

},

{

"ImageId": "ami-0d4cfd66",

"InstanceType": "m3.large",

"SpotPrice": "0.133"

},

{

Page 40: This One Weird API Request Will Save You Thousands

Results - Web Application

50 instances requested, over 30 days.

- Never dropped below 45 instances

- 85% discount if you wanted 50 and could withstand dropping to 45

[Chart: number of instances (axis 30 to 55) and average price per instance (axis 0 to 0.12) over the 30 days]

- If you only wanted 45 the discount is still 83%

Page 41: This One Weird API Request Will Save You Thousands

Some additional considerations

Elastic Load Balancing

Two minute warning

Page 42: This One Weird API Request Will Save You Thousands

Since Spot fleet is configured to span across multiple Availability Zones, we highly recommend enabling cross-zone load balancing for the load balancer.

To allow in-flight requests to complete when de-registering Spot instances that are about to be terminated, connection draining can be enabled on the load balancer with a timeout of 90 seconds.

Elastic Load Balancing
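Both recommendations map to attributes of a Classic Load Balancer. The sketch below builds the attribute document and prints, without running, an `aws elb modify-load-balancer-attributes` command for it. The helper name `elb_attributes_json` is hypothetical, and `my-load-balancer` is the placeholder name used by the sample script in this deck.

```shell
#!/bin/sh
# Hypothetical helper: JSON attribute document that enables cross-zone
# load balancing and connection draining with the given timeout (seconds).
elb_attributes_json() {
    drain_timeout="$1"
    printf '{"CrossZoneLoadBalancing":{"Enabled":true},"ConnectionDraining":{"Enabled":true,"Timeout":%d}}\n' "$drain_timeout"
}

# Dry run: print the command with the 90-second timeout from the slide.
echo aws elb modify-load-balancer-attributes \
    --load-balancer-name my-load-balancer \
    --load-balancer-attributes "'$(elb_attributes_json 90)'"
```

A 90-second drain fits inside the two-minute warning, leaving some margin for the instance to notice the termination notice and deregister itself.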

Page 43: This One Weird API Request Will Save You Thousands

Capitalizing on two minute warning

When the Spot price exceeds your bid price, the instance will receive a two-minute warning

Check for the 2 minute Spot instance termination notification every 5 seconds, leveraging a script invoked at instance launch

Page 44: This One Weird API Request Will Save You Thousands

Sample script – two minutes left!

1) Check for 2 minute warning

2) If YES, detach instance from ELB

3) OTHERWISE, do nothing

4) Sleep for 5 seconds

#!/bin/bash
# Poll for the Spot termination notice; on notice, leave the load balancer
# and flush session state, then stop polling.
while true
do
    if curl -s http://169.254.169.254/latest/meta-data/spot/termination-time | grep -q '.*T.*Z'; then
        instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
        aws elb deregister-instances-from-load-balancer \
            --load-balancer-name my-load-balancer \
            --instances "$instance_id"
        /env/bin/flushsessiontoDBonterminationscript.sh
        break
    fi
    sleep 5
done

Page 45: This One Weird API Request Will Save You Thousands

For those of you - Using Auto Scaling

Two Auto Scaling groups

• On-demand + Reserved for base use

• Add an additional Auto Scaling group with Spot

Both Auto Scaling groups behind the same Elastic Load Balancer.

Use the bid advisor to select the right instance type for your application.

Page 46: This One Weird API Request Will Save You Thousands

Web Application Architecture with Spot

Elastic Load

Balancing

Stateless

Web Servers

Stateless

Web Servers

On Demand Auto

Scaling group

Session

State Data

Stateless Web

Servers (Spot)

Stateless Web

Servers (Spot)

Spot Auto

Scaling group

Availability Zone A

Availability Zone B

On-Demand

ASG

Spot ASG

Page 47: This One Weird API Request Will Save You Thousands

Gett's AWS Spot Instances

Usage

2016

Page 48: This One Weird API Request Will Save You Thousands
Page 49: This One Weird API Request Will Save You Thousands

Gett is the largest and fastest growing on-demand mobility company in EMEA

• 5,000+ corporate accounts

• 300% annual growth since inception

• $500+ million in funding

• $500 million annual revenue

• 50,000+ cabs globally

• 60 cities worldwide

• 30M+ passengers

Page 50: This One Weird API Request Will Save You Thousands

Gett is the leader in on-demand mobility in Israel

• 7,500 vehicles

• National coverage

• 50,000+ rides daily

• 1.5 million users

• 1,700+ corporate accounts

• 80% brand recognition

Page 51: This One Weird API Request Will Save You Thousands
Page 52: This One Weird API Request Will Save You Thousands
Page 53: This One Weird API Request Will Save You Thousands
Page 54: This One Weird API Request Will Save You Thousands

Why use spots?

Gett has a small Mobile App

That Generates a Lot of traffic

Page 55: This One Weird API Request Will Save You Thousands

Why use spots?

… traffic that requires a lot of CPU power and memory to process:

In Production:

~300 EC2 Instances

For BI + Staging + Development:

~350-400 Additional EC2 Instances

Page 56: This One Weird API Request Will Save You Thousands

In Production

We replaced ~70% of the On-Demand Instances with Spot Instances. We left only 3 On-Demand Instances per service and the rest are Spot Instances

Result:

65% Cost Saving for production

Availability is improved (due to the additional HW redundancy)

Latency is improved (due to the additional HW resources)

Page 57: This One Weird API Request Will Save You Thousands

In Production (numbers)

~300 servers for ~30 services

3 on demand instances per service ~ 90 for high availability

200 spots of m3.large, m3.xlarge, c3.2xlarge ~ 400k $ saved annually
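The split follows directly from the numbers on the slide. A minimal sketch, with `fleet_split` as a hypothetical helper name and the rounded slide figures (300 servers, 30 services, 3 On-Demand instances per service) as inputs:

```shell
#!/bin/sh
# Hypothetical helper: given a total server count, the number of services,
# and the On-Demand floor per service, print
# "<on-demand count> <spot count> <spot share in percent>".
fleet_split() {
    total="$1"
    services="$2"
    od_per_service="$3"
    on_demand=$(( services * od_per_service ))
    spot=$(( total - on_demand ))
    echo "$on_demand $spot $(( spot * 100 / total ))"
}

fleet_split 300 30 3
```

With the rounded inputs this gives 90 On-Demand and 210 Spot instances, a 70% Spot share, which lines up with the "~70% replaced" and "~200 spots" figures quoted in the talk.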

Page 58: This One Weird API Request Will Save You Thousands

In BI, Staging, and Development

We replaced almost all of the On-Demand Instances with Spot Instances.

Result:

85% Cost Saving for these environments.

Spot allows us to run as many staging environments as we need.

The cost saving is tremendous!

Page 59: This One Weird API Request Will Save You Thousands

In BI, Staging, and Development (numbers)

In order to use agile methodology we need a lot of staging environments

On-Demand: 15 environments, 20 m3.large servers each ($0.146 per hour × 20 × 15 × 24 × 365 ~ $400k annually)

Spot: 15 environments, 20 m3.large servers each ($0.0211 per hour × 20 × 15 × 24 × 365 ~ $50k annually)

~ 350k $ saved annually
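The arithmetic on this slide can be reproduced with a couple of one-line helpers. The function names `annual_cost` and `savings_percent` are made up for this sketch; the prices and counts are the ones quoted above.

```shell
#!/bin/sh
# Hypothetical helper: yearly cost in USD for always-on servers.
# Arguments: hourly price, servers per environment, number of environments.
annual_cost() {
    awk -v price="$1" -v servers="$2" -v envs="$3" \
        'BEGIN { printf "%.0f\n", price * servers * envs * 24 * 365 }'
}

# Hypothetical helper: percentage saved when paying the Spot price
# instead of the On-Demand price for the same hours.
savings_percent() {
    awk -v od="$1" -v spot="$2" 'BEGIN { printf "%.1f\n", (1 - spot / od) * 100 }'
}

echo "On-Demand: \$$(annual_cost 0.146 20 15) per year"
echo "Spot:      \$$(annual_cost 0.0211 20 15) per year"
echo "Saving:    $(savings_percent 0.146 0.0211)%"
```

The exact products come to roughly $384k and $55k per year, which the slide rounds to ~400k and ~50k; the percentage, about 85.5%, matches the 85% saving quoted for these environments.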

Page 60: This One Weird API Request Will Save You Thousands

Other Notes

Using spot-request resources for each service (for persistence).

Smart Bid price mechanism based on Spot bid advisor.

→ Bid price is based on instance type.

We use fulfillment option of spot-requests for persistency.

We use Terraform (from HashiCorp) and Green/Blue deployment.

Page 61: This One Weird API Request Will Save You Thousands

Wrapping it all up

Page 62: This One Weird API Request Will Save You Thousands

Using Spot, OD and RI Together

Data Science

New app development Test and Development

Internal IT

Page 63: This One Weird API Request Will Save You Thousands

- CloudTrail events

- Dedicated Instance on Spot for HIPAA workloads

- ECS to automatically scale Spot fleet using

CloudWatch Metrics

Coming soon!

Page 64: This One Weird API Request Will Save You Thousands

Getting started

Try the Bid Advisor

Fire up the Spot Console

Block some time
