Deep Dive on Elastic Load Balancing and Best Practices

David Pessis, Engineering Manager, Elastic Load Balancing

July 13, 2016

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.


Page 1: Deep Dive on Elastic Load Balancing

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

David Pessis, Engineering Manager – Elastic Load Balancing

July 13, 2016

Deep Dive on Elastic Load Balancing and Best Practices

Page 2: Deep Dive on Elastic Load Balancing

Elastic Load Balancing automatically distributes incoming application traffic across multiple Amazon EC2 instances.

Page 3: Deep Dive on Elastic Load Balancing

Secure, Elastic, Integrated, Cost Effective

Page 4: Deep Dive on Elastic Load Balancing

[Diagram: a single EC2 instance]

Page 5: Deep Dive on Elastic Load Balancing

Load Balancer used to route incoming requests to multiple EC2 instances.

[Diagram: ELB in front of three EC2 instances]

Page 6: Deep Dive on Elastic Load Balancing

EC2-Classic

• Load balance over classic EC2 instances.
• Support for public IP addresses only.
• No control over the load balancer security group.

EC2-VPC

• Load balance over EC2 instances within a VPC.
• Support for both public and private IP addresses.
• Full control over the load balancer security group.
• Tightly integrated into the associated VPC and subnets.

Page 7: Deep Dive on Elastic Load Balancing

Architecture

[Diagram: Amazon Route 53, an ELB VPC containing two ELB nodes, and a Customer VPC containing two EC2 instances, spread across us-west-1a and us-west-1b]

Page 8: Deep Dive on Elastic Load Balancing

TCP/SSL

• Incoming client connection bound to server connection
• No header modification
• Proxy Protocol prepends source and destination IP and ports to request
• Round robin algorithm used for request routing

HTTP/HTTPS

• Connection terminated at the load balancer and pooled to the server
• Headers may be modified
• X-Forwarded-For header contains client IP address
• Least outstanding requests algorithm used for request routing
• Sticky session support available
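As a rough illustration of the Proxy Protocol point above, the sketch below parses a version 1 (human-readable) Proxy Protocol line on the backend side. It assumes the load balancer sends the v1 text format for TCP4/TCP6 connections; the function name `parse_proxy_v1`, the `ProxyHeader` type, and the sample addresses are illustrative and not taken from the slides.

```python
from typing import NamedTuple, Tuple


class ProxyHeader(NamedTuple):
    protocol: str      # "TCP4" or "TCP6"
    source_ip: str     # original client address
    dest_ip: str       # address the client connected to
    source_port: int
    dest_port: int


def parse_proxy_v1(data: bytes) -> Tuple[ProxyHeader, bytes]:
    """Split a Proxy Protocol v1 line from the payload that follows it."""
    line, sep, rest = data.partition(b"\r\n")
    if not sep:
        raise ValueError("header line is not terminated by CRLF")
    parts = line.decode("ascii").split(" ")
    if len(parts) != 6 or parts[0] != "PROXY":
        raise ValueError("not a Proxy Protocol v1 TCP header")
    _, proto, src, dst, sport, dport = parts
    if proto not in ("TCP4", "TCP6"):
        raise ValueError("unsupported protocol field: " + proto)
    return ProxyHeader(proto, src, dst, int(sport), int(dport)), rest


# Example with documentation-range addresses (assumed values).
raw = b"PROXY TCP4 198.51.100.22 203.0.113.7 35646 80\r\nGET / HTTP/1.1\r\n\r\n"
header, payload = parse_proxy_v1(raw)
print(header.source_ip, header.source_port)
```

The header is read once, at the start of the connection; everything after the first CRLF is the unmodified client payload.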

Page 9: Deep Dive on Elastic Load Balancing

Health checks allow for traffic to be shifted away from failed instances.

Page 10: Deep Dive on Elastic Load Balancing

Health Checks

Health checks ensure that request traffic is shifted away from a failed instance.

[Diagram: ELB in front of three EC2 instances]

Page 11: Deep Dive on Elastic Load Balancing

Health Checks

• Support for TCP and HTTP health checks.
• Customize the frequency and failure thresholds.
• Must return a 2xx response.
• Consider the depth and accuracy of your health checks.
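The role of the failure thresholds can be sketched as a small state machine: an instance changes state only after a run of consecutive check results that disagree with its current state. This is a minimal model under assumed threshold values, not the load balancer's actual implementation; `HealthTracker` and its parameters are hypothetical names.

```python
class HealthTracker:
    """Track instance health from a stream of pass/fail check results."""

    def __init__(self, healthy_threshold: int = 2, unhealthy_threshold: int = 2,
                 healthy: bool = True) -> None:
        self.healthy_threshold = healthy_threshold      # passes needed to recover
        self.unhealthy_threshold = unhealthy_threshold  # failures needed to fail
        self.healthy = healthy
        self._streak = 0  # consecutive results that disagree with current state

    def record(self, check_passed: bool) -> bool:
        """Record one check result and return the resulting health state."""
        if check_passed == self.healthy:
            self._streak = 0  # result agrees with the current state
        else:
            self._streak += 1
            needed = self.healthy_threshold if check_passed else self.unhealthy_threshold
            if self._streak >= needed:
                self.healthy = check_passed
                self._streak = 0
        return self.healthy


tracker = HealthTracker(healthy_threshold=2, unhealthy_threshold=3)
for result in (False, False, False, True, True):
    print(result, "->", tracker.record(result))
```

Higher thresholds make the state less sensitive to a single slow or dropped check, at the cost of reacting more slowly to a real failure.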

Page 12: Deep Dive on Elastic Load Balancing

Idle timeouts allow for connections to be closed by the load balancer when no longer in use.

Page 13: Deep Dive on Elastic Load Balancing

Idle Timeouts

• Length of time that an idle connection should be kept open.
• For both client and backend connections.
• Defaults to 60 seconds, but can be set between 1 and 3,600 seconds.
• Timeouts should decrease as you go up the stack.

Page 14: Deep Dive on Elastic Load Balancing

Idle Timeouts

[Diagram: timeout values (15s, 9s, 3s) annotated on the connections between the ELB, EC2 instances, Amazon S3, Amazon RDS, and Amazon SWF]

Page 15: Deep Dive on Elastic Load Balancing

Using multiple Availability Zones

Page 16: Deep Dive on Elastic Load Balancing

Multiple Availability Zones

[Diagram: Amazon Route 53, an ELB VPC with one ELB node per zone, and a Customer VPC with one EC2 instance per zone, across us-west-1a and us-west-1b]

Page 17: Deep Dive on Elastic Load Balancing

Multiple Availability Zones

[Diagram: Amazon Route 53, an ELB VPC with two ELB nodes, and a Customer VPC with a single EC2 instance, across us-west-1a and us-west-1b]

Page 18: Deep Dive on Elastic Load Balancing

Always associate two or more subnets in different zones with the load balancer.

Page 19: Deep Dive on Elastic Load Balancing

Using multiple Availability Zones does bring a few challenges.

Page 20: Deep Dive on Elastic Load Balancing

Traffic Imbalances

[Chart: request count over time]

Page 21: Deep Dive on Elastic Load Balancing

Imbalanced Instance Capacity

[Diagram: Amazon Route 53, an ELB VPC with two ELB nodes, and a Customer VPC with an unequal number of EC2 instances in us-west-1a and us-west-1b]

Page 22: Deep Dive on Elastic Load Balancing

Cross-Zone Load Balancing

[Diagram: Amazon Route 53, an ELB VPC with two ELB nodes, and a Customer VPC with an unequal number of EC2 instances in us-west-1a and us-west-1b]

Page 23: Deep Dive on Elastic Load Balancing

Traffic Imbalances: Cross-Zone Enabled

[Chart: request count over time]

Page 24: Deep Dive on Elastic Load Balancing

Cross-Zone Load Balancing

• Load balancer absorbs impact of DNS caching.
• Eliminates imbalances in backend instance utilization.
• Requests distributed evenly across multiple Availability Zones.
• Check connection limits before enabling.
• No additional bandwidth charge for cross-zone traffic.
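The effect on backend utilization can be worked through with simple arithmetic. The sketch below assumes that DNS round robin splits client traffic evenly between zones, and that each load balancer node spreads its share evenly over the instances it can reach; the function name and the instance counts are illustrative only.

```python
from typing import Dict


def per_instance_share(instances_per_zone: Dict[str, int],
                       cross_zone: bool) -> Dict[str, float]:
    """Return the fraction of total traffic received by each instance in a zone."""
    if not instances_per_zone or any(n < 1 for n in instances_per_zone.values()):
        raise ValueError("every zone needs at least one instance")
    zone_count = len(instances_per_zone)
    total_instances = sum(instances_per_zone.values())
    shares = {}
    for zone, count in instances_per_zone.items():
        if cross_zone:
            # Every node can route to every instance, so all instances are equal.
            shares[zone] = 1.0 / total_instances
        else:
            # A zone's even share of traffic is split among that zone's instances.
            shares[zone] = (1.0 / zone_count) / count
    return shares


fleet = {"us-west-1a": 1, "us-west-1b": 3}  # assumed, deliberately imbalanced
print(per_instance_share(fleet, cross_zone=False))
print(per_instance_share(fleet, cross_zone=True))
```

With one instance in the first zone and three in the second, the lone instance carries half of all traffic when cross-zone is disabled and a quarter when it is enabled.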

Page 25: Deep Dive on Elastic Load Balancing

Understanding DNS

• Each load balancer domain may contain multiple records.
• Round robin used to balance traffic between Availability Zones.
• DNS records will change over time; never target IP addresses directly.
• After being removed from DNS, IP addresses are drained and quarantined for up to 7 days.

Page 26: Deep Dive on Elastic Load Balancing

DNS Optimization

DNS caching by clients and ISPs can often cause clients to target a specific IP address or stop resolving at all.

Register a wildcard CNAME or ALIAS within Amazon Route 53.

// Create a wildcard CNAME or ALIAS in Route 53.
*.example.com ALIAS … elb-12345.us-east-1.elb.amazonaws.com
*.example.com CNAME elb-12345.us-east-1.elb.amazonaws.com

// Prepend random content for each lookup made by the application.
PROMPT> dig +short 25a8ade5-6557-4a54-a60e-8f51f3b195d1.example.com
192.0.2.1
192.0.2.2
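On the application side, the random-prefix lookup could look roughly like the sketch below. It assumes a wildcard record already exists for the domain; `unique_lookup_name` and `resolve_fresh` are hypothetical helper names, and `example.com` is a placeholder.

```python
import socket
import uuid
from typing import List


def unique_lookup_name(wildcard_domain: str) -> str:
    """Build a hostname that no resolver can have cached yet."""
    return f"{uuid.uuid4()}.{wildcard_domain}"


def resolve_fresh(wildcard_domain: str, port: int = 80) -> List[str]:
    """Resolve a one-off name under the wildcard and return its IP addresses."""
    host = unique_lookup_name(wildcard_domain)
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


# Only the name is built here; resolve_fresh() needs a real wildcard record.
print(unique_lookup_name("example.com"))
```

Because each generated name is new, intermediate caches cannot answer it from memory, so the lookup reaches the authoritative records for the load balancer.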

Page 27: Deep Dive on Elastic Load Balancing

SSL Offloading

• Support for both SSL and HTTPS is provided.
• Support for latest ciphers and protocols, including Elliptic Curve ciphers and Perfect Forward Secrecy.
• Ability to fully customize ciphers and protocols to be used by each load balancer.
• SSL Negotiation Suites provided to remove complexity of selecting ciphers and protocols.

Page 28: Deep Dive on Elastic Load Balancing

SSL Negotiation Policies

• Provide selection of ciphers and protocols that adhere to the latest industry best practices.
• Balance security best practices with clients' ability to negotiate a connection; generated using traffic to Amazon.com.
• Released on a regular cadence or when new vulnerabilities are published.
• Default for all new load balancers.

Page 29: Deep Dive on Elastic Load Balancing

POODLE Mitigation

Within 24 hours, 62% of load balancers migrated to the latest SSL Negotiation Policy, disabling SSLv3.

Page 30: Deep Dive on Elastic Load Balancing

“@awscloud Thank-you #AWS for making it so easy to prevent #sslv3 #poodleattack Only took about 3 clicks of my mouse.”

@granticini

Page 31: Deep Dive on Elastic Load Balancing

Amazon CloudWatch Metrics

• 13 CloudWatch metrics provided for each load balancer.
• Provide detailed insight into the health of the load balancer and application stack.
• CloudWatch alarms can be configured to notify or take action should any metric go outside of the acceptable range.
• All metrics provided at 1-minute granularity.

Page 32: Deep Dive on Elastic Load Balancing

HealthyHostCount

• The count of the number of healthy instances in each Availability Zone.
• Most common cause of unhealthy hosts is health check exceeding the allocated timeout.
• Test by making repeated requests to the backend instance from another EC2 instance.
• View at the zonal dimension.

Page 33: Deep Dive on Elastic Load Balancing

Latency

• Measures the elapsed time, in seconds, from when the request leaves the load balancer until the response is received.
• Test by sending requests to the backend instance from another instance.
• Use the min, average, and max CloudWatch stats to provide upper and lower bounds for latency.
• Debug individual requests using access logs.

Page 34: Deep Dive on Elastic Load Balancing

Surge Queue and Spillovers

• Count of the number of requests that could not be sent to backend instances.
• Queue up to 1,024 requests per load balancer node, after which 503 errors will be returned.
• Often caused by not being able to open connections to the backend instance.
• Normally a sign of an under-scaled application.
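The relationship between the surge queue and spillovers can be modelled as a bounded queue with a rejection counter. This is a simplified sketch for one load balancer node using the 1,024 limit quoted above; the `SurgeQueue` class and its method names are hypothetical, not part of any AWS API.

```python
from collections import deque
from typing import Deque, Optional


class SurgeQueue:
    """Bounded queue of pending requests; overflow is counted as spillover."""

    def __init__(self, capacity: int = 1024) -> None:
        self.capacity = capacity
        self._pending: Deque[str] = deque()
        self.spillover_count = 0  # requests rejected (answered with a 503)

    def offer(self, request: str) -> bool:
        """Queue a request; return False if it spilled over."""
        if len(self._pending) >= self.capacity:
            self.spillover_count += 1
            return False
        self._pending.append(request)
        return True

    def take(self) -> Optional[str]:
        """Hand the oldest pending request to a backend, if there is one."""
        return self._pending.popleft() if self._pending else None

    def __len__(self) -> int:
        return len(self._pending)


queue = SurgeQueue()
accepted = sum(queue.offer(f"req-{i}") for i in range(1030))
print(accepted, len(queue), queue.spillover_count)
```

A queue that stays non-empty means backends are not accepting connections fast enough, which is why a growing queue points to an under-scaled application before any 503 is returned.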

Page 35: Deep Dive on Elastic Load Balancing

CloudWatch and Auto Scaling

• All load balancer metrics can be used for Auto Scaling.
• Allows you to scale dynamically, based on the load balancer's view of the application.
• Important to consider all metrics when using Auto Scaling; a policy based on one metric may not be aware of resource contention on another.
• You may be at peak multiple times a day.

Page 36: Deep Dive on Elastic Load Balancing

Access Logs

• Provide detailed information on each request processed by the load balancer.
• Includes request time, client IP address, latencies, request path, and server responses.
• Delivered to an S3 bucket every 5 or 60 minutes.

Page 37: Deep Dive on Elastic Load Balancing

Access Logs

Logs indexed by date, but include the IP address of the load balancer node itself.

[Diagram: ELB nodes in the ELB VPC delivering logs to Amazon S3]

Page 38: Deep Dive on Elastic Load Balancing

Access Logs

• timestamp
• elb name
• client:port
• backend:port
• request_processing_time
• backend_processing_time
• response_processing_time
• elb_status_code
• backend_status_code
• received_bytes
• sent_bytes
• “request”

2014-02-15T23:39:43.945958Z my-test-loadbalancer 192.168.131.39:2817 10.0.0.1:80 0.000073 0.001048 0.000057 200 200 0 29 "GET http://www.example.com:80/ HTTP/1.1"
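A log entry in this layout can be split into named fields with the standard library. The sketch below assumes space-separated fields with the request in double quotes, in the order listed above; the dictionary keys and the sample line (including its backend address) are illustrative, and any trailing fields beyond the twelve listed are ignored.

```python
import shlex
from typing import Dict

FIELDS = [
    "timestamp", "elb", "client", "backend",
    "request_processing_time", "backend_processing_time",
    "response_processing_time", "elb_status_code", "backend_status_code",
    "received_bytes", "sent_bytes", "request",
]


def parse_access_log_line(line: str) -> Dict[str, object]:
    """Split one access log line into a dict keyed by field name."""
    tokens = shlex.split(line)  # keeps the quoted request as a single token
    if len(tokens) < len(FIELDS):
        raise ValueError("expected at least %d fields" % len(FIELDS))
    entry: Dict[str, object] = dict(zip(FIELDS, tokens))
    for name in ("request_processing_time", "backend_processing_time",
                 "response_processing_time"):
        entry[name] = float(tokens[FIELDS.index(name)])  # seconds
    return entry


sample = ('2014-02-15T23:39:43.945958Z my-test-loadbalancer '
          '192.168.131.39:2817 10.0.0.1:80 0.000073 0.001048 0.000057 '
          '200 200 0 29 "GET http://www.example.com:80/ HTTP/1.1"')
entry = parse_access_log_line(sample)
print(entry["client"], entry["backend_processing_time"], entry["request"])
```

Status codes and byte counts are left as strings here, since a real log may contain placeholder values that are not numeric.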

Page 39: Deep Dive on Elastic Load Balancing

“Everything fails all the time”

Werner Vogels, CTO, Amazon.com

Page 40: Deep Dive on Elastic Load Balancing

Be prepared to do nothing!

Page 41: Deep Dive on Elastic Load Balancing

Mitigation, Isolation, Restore Redundancy

Page 42: Deep Dive on Elastic Load Balancing

Mitigation

• All load balancers scaled to handle loss of single Availability Zone.
• Route 53 health checks shift traffic away from the failed Availability Zone.
• Completed within 150 seconds.
• No other external or control plane dependencies.

Page 43: Deep Dive on Elastic Load Balancing

Isolation

• Other zones must remain unaffected.
• Avoid dependencies between zones.
• Be careful of work generated as a result of the event.
• Operating at reduced capacity, but stable.

Page 44: Deep Dive on Elastic Load Balancing

Constant Work

Health checkers and edge locations perform the same volume of activity whether endpoints are healthy or unhealthy.

[Chart: system activity over time, with time to react marked]

Work on Failure

When nothing is failing, volume of API calls is zero. When failure occurs, volume of API calls spikes.

[Chart: system activity over time, with time to react marked]
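The contrast between the two designs can be shown with a toy count of the work done per interval. This is only an illustration of the idea, assuming one unit of work per endpoint reported; the function names are hypothetical and do not describe how the health checkers are actually built.

```python
from typing import List


def calls_work_on_failure(endpoint_healthy: List[bool]) -> int:
    """Work happens only for failed endpoints, so it spikes during an event."""
    return sum(1 for healthy in endpoint_healthy if not healthy)


def calls_constant_work(endpoint_healthy: List[bool]) -> int:
    """Every endpoint's state is reported each interval, healthy or not."""
    return len(endpoint_healthy)


normal = [True] * 100                # nothing failing
outage = [False] * 50 + [True] * 50  # assumed: half the endpoints fail

for label, state in (("normal", normal), ("outage", outage)):
    print(label, calls_work_on_failure(state), calls_constant_work(state))
```

The constant-work design costs more when everything is healthy, but its load during a failure is identical to its load on a normal day, so the failure itself cannot overload the system that reacts to it.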

Page 45: Deep Dive on Elastic Load Balancing

Restore Redundancy

• Restoring the system to full capacity.
• Avoid putting additional load on the system by rushing this step.
• Ensure that recovered resources are left in a consistent state.
• Fully recovered when done.

Page 46: Deep Dive on Elastic Load Balancing

Remember to complete your evaluations!

Page 47: Deep Dive on Elastic Load Balancing

Thank You!

Twitter: @davidpessis