AWS Summit Benelux 2013 - Architecting for High Availability

ARCHITECTING FOR HIGH AVAILABILITY

Carlos Conde, Sr. Mgr. Solutions Architecture

“LET’S BUILD

A ________ WEB

APPLICATION”

“LET’S BUILD

A HIGHLY AVAILABLE

________ WEB

APPLICATION”

“LET’S BUILD

A HIGHLY AVAILABLE

AND SCALABLE

________ WEB

APPLICATION”

“LET’S BUILD A HIGHLY AVAILABLE,

DURABLE AND SCALABLE

________ WEB APPLICATION”

“LET’S BUILD A HIGHLY AVAILABLE, DURABLE, RESILIENT

AND SCALABLE ________ WEB APPLICATION”

AWS BUILDING BLOCKS

Inherently Fault-Tolerant Services

Amazon S3

Amazon DynamoDB

Amazon CloudFront

Amazon SWF

Amazon SQS

Amazon SNS

Amazon SES

Amazon Route53

Elastic Load Balancing

AWS IAM

AWS Elastic Beanstalk

Amazon ElastiCache

Amazon EMR

Amazon Redshift

Amazon CloudSearch

Fault-Tolerant with the right architecture

Amazon EC2

Amazon EBS

Amazon RDS

Amazon VPC

1. DESIGN FOR FAILURE

2. USE MULTIPLE AZs

3. BUILD FOR SCALE

4. DECOUPLE COMPONENTS

« Everything fails all the time »

Werner Vogels

CTO of Amazon

YOUR GOAL

APPLICATIONS SHOULD CONTINUE TO FUNCTION

EVEN IF THE UNDERLYING PHYSICAL HARDWARE

FAILS OR IS REMOVED OR REPLACED

#1 DESIGN FOR FAILURE

AVOID SINGLE POINTS OF

FAILURE

ASSUME EVERYTHING FAILS,

AND WORK BACKWARDS

HEALTH CHECKS
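
As an illustration of the health-check idea, here is a minimal sketch of threshold-based health tracking. The class name and the default thresholds are hypothetical and are not taken from the talk; the point is only that a target changes state after several consecutive contradicting checks, not after a single blip.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    """Flips state only after enough consecutive checks contradict it."""
    unhealthy_threshold: int = 2  # hypothetical default
    healthy_threshold: int = 3    # hypothetical default
    healthy: bool = True
    streak: int = 0               # consecutive results contradicting the current state

    def record(self, check_passed: bool) -> bool:
        """Record one check result and return the current health state."""
        if check_passed == self.healthy:
            self.streak = 0       # result agrees with current state: reset the streak
            return self.healthy
        self.streak += 1
        limit = self.healthy_threshold if check_passed else self.unhealthy_threshold
        if self.streak >= limit:
            self.healthy = check_passed
            self.streak = 0
        return self.healthy
```

A load balancer or monitoring loop would call `record()` once per probe and stop routing traffic to the target while it reports unhealthy.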

#2 USE MULTIPLE

AVAILABILITY ZONES

US-WEST (N. California)

EU-WEST (Ireland)

ASIA PAC (Tokyo)

ASIA PAC (Singapore)

US-WEST (Oregon)

SOUTH AMERICA (Sao Paulo)

US-EAST (Virginia)

GOV CLOUD

ASIA PAC (Sydney)

AMAZON RDS

MULTI-AZ

#3 BUILD FOR SCALE

AMAZON CLOUDWATCH: MONITORING FOR AWS RESOURCES

AUTO SCALING: SCALE UP/DOWN EC2 CAPACITY

HEALTH CHECKS + AUTO SCALING = SELF-HEALING
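
The self-healing equation can be sketched as a reconciliation step: drop whatever the health checks flag, then launch replacements until the desired capacity is restored. This is a simplified model, not the Auto Scaling API; `reconcile` and the `launch` callback are hypothetical names.

```python
from typing import Callable, Dict


def reconcile(instances: Dict[str, bool], desired: int,
              launch: Callable[[], str]) -> Dict[str, bool]:
    """Return a fleet with unhealthy instances removed and capacity topped up.

    `instances` maps instance id -> health-check result.
    `launch` is a hypothetical callback that starts an instance and returns its id.
    """
    fleet = {iid: ok for iid, ok in instances.items() if ok}
    while len(fleet) < desired:
        fleet[launch()] = True  # a new instance is assumed healthy until checked
    return fleet
```

Running this step on every health-check cycle keeps the fleet at its desired size without operator intervention.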

#4 DECOUPLE COMPONENTS

BUILD LOOSELY

COUPLED SYSTEMS

The looser they are coupled,

the bigger they scale,

the more fault tolerant they get…

RECEIVE -> TRANSCODE -> PUBLISH & NOTIFY

AMAZON SQS: SIMPLE QUEUE SERVICE

ARCHITECTURE

DESIGN PATTERN

SQS VISIBILITY TIMEOUT
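
To make the visibility timeout concrete, here is a small in-memory model of the behavior. It is not the SQS API; `VisibilityQueue` and its methods are hypothetical. A received message is hidden from other consumers for the timeout period and reappears if the worker never deletes it, for example because the worker crashed mid-task.

```python
import time


class VisibilityQueue:
    """In-memory model of a queue with a visibility timeout (illustration only)."""

    def __init__(self, visibility_timeout: float, clock=time.monotonic):
        self.visibility_timeout = visibility_timeout
        self._clock = clock
        self._messages = {}  # message id -> (body, invisible_until)
        self._next_id = 0

    def send(self, body):
        self._next_id += 1
        self._messages[self._next_id] = (body, float("-inf"))  # visible immediately
        return self._next_id

    def receive(self):
        """Return (id, body) of the first visible message and hide it, or None."""
        now = self._clock()
        for mid, (body, invisible_until) in self._messages.items():
            if invisible_until <= now:
                self._messages[mid] = (body, now + self.visibility_timeout)
                return mid, body
        return None

    def delete(self, mid):
        """Called by the worker once the task has succeeded."""
        self._messages.pop(mid, None)
```

Because the message is only removed on an explicit delete, a failed worker costs a delay of one timeout period and no lost work.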

BUFFERING

CLOUDWATCH METRICS FOR AMAZON SQS + AUTO SCALING
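
One way to read this slide: the queue depth reported by CloudWatch (for example the `ApproximateNumberOfMessagesVisible` metric) drives the number of workers. The sizing rule below is a hypothetical sketch; the function name, the per-worker backlog target and the bounds are assumptions, not values from the talk.

```python
import math


def desired_workers(queue_depth: int, msgs_per_worker: int,
                    min_workers: int = 1, max_workers: int = 10) -> int:
    """Size the worker fleet from the backlog, clamped to [min_workers, max_workers]."""
    needed = math.ceil(queue_depth / msgs_per_worker)
    return max(min_workers, min(max_workers, needed))
```

The queue acts as the buffer: producers keep publishing at their own rate while the worker fleet grows or shrinks behind it.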

[Flowchart: START, CHECK IMAGE, TOO BIG? (YES / NO), RESIZE IMAGE, CAT CHECK, CAT? (YES: “OMG, IT’S A CAT!” / NO), TRANSCODE, PUBLISH & NOTIFY, REJECT, STOP]

TASKS

DECISIONS

HISTORY

STATELESS!

STATELESS SCALES

HORIZONTALLY

AMAZON SWF ENABLES RESILIENT, SCALABLE,

DISTRIBUTED WORKFLOWS

WORKFLOW ACTORS

DECIDERS: COORDINATION LOGIC

1. Poll for work on a decision list (long polling: 60 seconds)

2. Evaluate workflow execution history (SWF sends full history in JSON format)

3. Return decision to Amazon SWF (usually scheduling another task)
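
The three decider steps can be sketched as a pure function from history to decision. The event and decision dictionaries below are a simplified, hypothetical shape (real SWF history events carry more fields and reference activities indirectly); the step names come from the pipeline on the earlier slides.

```python
PIPELINE = ["receive", "transcode", "publish_and_notify"]  # steps from the slides


def decide(history):
    """Return the next decision given the full event history.

    The decider keeps no state of its own: everything it needs is in `history`,
    so any decider process can pick up any decision task.
    """
    completed = [e["activity"] for e in history
                 if e["type"] == "ActivityTaskCompleted"]
    for step in PIPELINE:
        if step not in completed:
            return {"decision": "ScheduleActivityTask", "activity": step}
    return {"decision": "CompleteWorkflowExecution"}
```

In a real decider this function would sit between a long-poll for a decision task and the call that returns the decision to Amazon SWF.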

WORKERS: EXECUTION LOGIC

1. Poll for work on a specific task list (long polling: 60 seconds)

2. Execute work, send heartbeats (SWF sends input data from deciders)

3. Return success / failure (detailed data can be provided to deciders)
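
Steps 2 and 3 of the worker can be sketched as follows. `run_activity`, `process` and `heartbeat` are hypothetical names; in a real worker the heartbeat callback and the returned result would be reported to Amazon SWF, which this sketch leaves out.

```python
def run_activity(chunks, process, heartbeat):
    """Process `chunks` in order, heartbeating after each one.

    Returns a result dict describing success or failure, with enough detail
    for a decider to choose what to do next.
    """
    done = 0
    try:
        for chunk in chunks:
            process(chunk)
            done += 1
            heartbeat(f"{done}/{len(chunks)}")  # progress report for long-running work
    except Exception as exc:
        return {"status": "failed", "reason": str(exc), "completed": done}
    return {"status": "completed", "completed": done}
```

Reporting progress per chunk lets a missed heartbeat be detected long before the overall task timeout expires.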

SWF IS WATCHING

TRACKING:

Execution tracking: time to start, time to finish, …

Time to finish for overall workflow

Timeouts controlled for each of these (and more)

Heartbeats for long-running activities (optional)

Decider is informed of timeouts: schedule retries, “mitigation” strategies or cleanup tasks
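
A decider's reaction to a timeout might look like the sketch below: retry the activity a bounded number of times, then fall back to a cleanup task. The retry budget, the `cleanup` activity name and the simplified event shape are all assumptions made for illustration.

```python
MAX_ATTEMPTS = 3  # hypothetical retry budget


def on_timeout(history, activity):
    """Decide what to schedule after `activity` timed out, based on the history."""
    timeouts = sum(1 for e in history
                   if e["type"] == "ActivityTaskTimedOut"
                   and e["activity"] == activity)
    if timeouts < MAX_ATTEMPTS:
        return {"decision": "ScheduleActivityTask", "activity": activity}  # retry
    # Budget exhausted: schedule a (hypothetical) cleanup task instead.
    return {"decision": "ScheduleActivityTask", "activity": "cleanup"}
```

Because the attempt count is derived from the history, the retry logic stays stateless like the rest of the decider.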

NO NEW LANGUAGE

TO LEARN

YOUR CODE IS YOUR WORKFLOW LANGUAGE

AMAZON SWF MAINTAINS STATE

ALL HORIZONTAL SCALING

PATTERNS APPLY

CHAINED TASKS

WITHOUT DECISIONS?

USE AMAZON SQS

RECEIVE -> TRANSCODE -> PUBLISH & NOTIFY

TASK GRAPH WITH DECISIONS?

USE AMAZON SWF

[Task graph: RECEIVE DATA, SANITY CHECK, CHECK FORMAT, ADJUST FORMAT, TRANSCODE, PUBLISH & NOTIFY, REJECT; edge labels: GOOD, SPAM, OK, LONG]

1. DESIGN FOR FAILURE

2. USE MULTIPLE AZs

3. BUILD FOR SCALE

4. DECOUPLE COMPONENTS

YOUR GOAL

APPLICATIONS SHOULD CONTINUE TO FUNCTION

EVEN IF THE UNDERLYING PHYSICAL HARDWARE

FAILS OR IS REMOVED OR REPLACED

AWS ARCHITECTURE CENTER: http://aws.amazon.com/architecture

AWS TECHNICAL ARTICLES: http://aws.amazon.com/articles

AWS BLOG: http://aws.typepad.com

AWS PODCAST: http://aws.amazon.com/podcast

THANK YOU!
