AWS Summit Tel Aviv - Startup Track - Architecting for High Availability

AWS Summit 2013 Tel Aviv Oct 16 – Tel Aviv, Israel

Alex Sinner

Solutions Architect, Amazon Web Services

ARCHITECTING FOR HIGH AVAILABILITY

“LET’S BUILD

A ________ WEB

APPLICATION”

“LET’S BUILD

A HIGHLY AVAILABLE

________ WEB

APPLICATION”

“LET’S BUILD

A HIGHLY AVAILABLE

AND SCALABLE

________ WEB

APPLICATION”

“LET’S BUILD A HIGHLY AVAILABLE,

SCALABLE, AND RESILIENT

________ WEB APPLICATION”

AWS BUILDING BLOCKS

Inherently Fault-Tolerant Services

Amazon S3

Amazon DynamoDB

Amazon CloudFront

Amazon SWF

Amazon SQS

Amazon SNS

Amazon SES

Amazon Route53

Elastic Load Balancing

AWS IAM

AWS Elastic Beanstalk

Amazon ElastiCache

Amazon EMR

Amazon Redshift

Amazon CloudSearch

Fault-Tolerant with the right architecture

Amazon EC2

Amazon EBS

Amazon RDS

Amazon VPC

1. DESIGN FOR FAILURE

2. USE MULTIPLE AZs

3. BUILD FOR SCALE

4. DECOUPLE COMPONENTS

« Everything fails all the time »

Werner Vogels

CTO of Amazon

YOUR GOAL

APPLICATIONS SHOULD CONTINUE TO FUNCTION

EVEN IF THE UNDERLYING PHYSICAL HARDWARE

FAILS OR IS REMOVED OR REPLACED

#1 DESIGN FOR FAILURE

AVOID SINGLE POINTS OF

FAILURE

ASSUME EVERYTHING FAILS,

AND WORK BACKWARDS

HEALTH CHECKS

#2 USE MULTIPLE

AVAILABILITY ZONES

US-WEST (N. California)

EU-WEST (Ireland)

ASIA PAC (Tokyo)

ASIA PAC (Singapore)

US-WEST (Oregon)

SOUTH AMERICA (São Paulo)

US-EAST (Virginia)

GOV CLOUD

ASIA PAC (Sydney)

AMAZON RDS

MULTI-AZ
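
Why multiple Availability Zones matter can be shown with a small calculation. The sketch below is an illustration only, not an AWS API: it spreads a fleet round-robin across zones and reports how many instances would remain if any single zone were lost. The zone names and the fleet size are assumptions made up for the example.

```python
from collections import Counter
from itertools import cycle


def spread_across_azs(instance_count, zones):
    """Assign instances to zones round-robin so no zone holds more than one extra."""
    if not zones:
        raise ValueError("at least one Availability Zone is required")
    placement = Counter({zone: 0 for zone in zones})
    for _, zone in zip(range(instance_count), cycle(zones)):
        placement[zone] += 1
    return dict(placement)


def surviving_capacity(placement, failed_zone):
    """Return how many instances remain if one whole zone becomes unavailable."""
    return sum(count for zone, count in placement.items() if zone != failed_zone)


# Hypothetical zone names and fleet size, for illustration only.
placement = spread_across_azs(6, ["zone-a", "zone-b", "zone-c"])
worst_case = min(surviving_capacity(placement, zone) for zone in placement)
```

With six instances in three zones, losing any one zone still leaves four instances serving traffic, whereas a single-zone deployment would lose all six.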

#3 BUILD FOR SCALE

AMAZON CLOUDWATCH: MONITORING FOR AWS RESOURCES

AUTO SCALING: SCALE UP/DOWN EC2 CAPACITY
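
How a monitoring metric can drive capacity is sketched below as a plain Python function. It is a simplified model of a threshold policy, not the Auto Scaling API: the 70% and 30% CPU thresholds, the one-instance step, and the minimum and maximum fleet sizes are all assumed values chosen for the example.

```python
def desired_capacity(current, avg_cpu_percent, *, scale_out_above=70.0,
                     scale_in_below=30.0, minimum=2, maximum=10):
    """Return the new instance count after one evaluation of a simple step policy."""
    if avg_cpu_percent > scale_out_above:
        target = current + 1      # load is high: add one instance
    elif avg_cpu_percent < scale_in_below:
        target = current - 1      # load is low: remove one instance
    else:
        target = current          # inside the comfort band: do nothing
    # Never go below the minimum or above the maximum fleet size.
    return max(minimum, min(maximum, target))


# Feed a made-up series of average CPU readings through the policy.
capacity = 2
history = []
for cpu in [85.0, 90.0, 75.0, 50.0, 20.0, 10.0, 5.0]:
    capacity = desired_capacity(capacity, cpu)
    history.append(capacity)
```

The fleet grows while the metric stays above the upper threshold and shrinks back to the minimum once the load drops, which is the scale up/down behaviour named on the slide.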

HEALTH CHECKS

+ AUTO SCALING

=

SELF-HEALING
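
The self-healing idea can be sketched as a reconcile step: drop whatever fails its health check, then launch replacements until the desired count is met again. This is a toy model under stated assumptions. `launch_instance`, the instance IDs, and the health check are hypothetical stand-ins, not calls to any AWS service.

```python
import itertools

_ids = itertools.count(1)


def launch_instance():
    """Stand-in for launching a server; returns a new hypothetical instance ID."""
    return f"i-{next(_ids):04d}"


def reconcile(fleet, is_healthy, desired):
    """Drop instances that fail the health check, then launch replacements."""
    healthy = [instance for instance in fleet if is_healthy(instance)]
    while len(healthy) < desired:
        healthy.append(launch_instance())
    return healthy


fleet = [launch_instance() for _ in range(3)]   # i-0001, i-0002, i-0003
failed = {"i-0002"}                             # pretend this one stopped responding
fleet = reconcile(fleet, lambda instance: instance not in failed, desired=3)
```

After the reconcile step the failed instance is gone and a fresh one has taken its place, so the fleet is back at its desired size without manual intervention.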

WalkMe Architecture for High Availability

© Copyright 2013 WalkMe Inc. - Confidential

The WalkMe Platform

A one-of-a-kind platform to guide and engage prospects, customers, employees, or partners through any Web experience

WalkMe reduces complexity to empower advanced selling, support, training, and improved user experience

Using WalkMe increases conversion rates,

reduces support costs, accelerates training and

improves customer experience

No integration or changes to the underlying

website required.

Introducing the Holistic Approach to Automated Engagement

• Surveys: Pinpointed feedback, right on time

• Search: Pinpointed to the site and any other relevant resource (such as help desk)

• Promotion: Personalized (“happy b-day”, “top up”, “bag for your camera?”)

• Announcements: To all users or to groups (“scheduled maintenance”, “sale on shirts”, “happy 4th of July”)

• Launchers & Permalinks: Boost the effectiveness of your existing FAQ, chat and social support

• Task List: Onboard new users, introduce a new version

• Analytics & Goals: Straightforward measurement and improvement of critical paths

• Segmented Display: Right people, right time

Online Support, Employee Training, Advanced Online Selling, Improved User Experience, Onboarding

Selected Customers

And many more…

The Basics

i. A WalkMe customer creates WalkThrus using the WalkMe Editor.

ii. The customer adds the WalkMe JavaScript code to their website.

iii. The customer publishes the WalkThrus to their users.

iv. Our customers’ users get WalkMe when they browse the website.

v. Our customers can access WalkMe dashboards to view usage analytics.

Challenges

• Maximum availability for the client-side experience (100%)

• Low latency for fetching the WalkMe files

• Very high traffic volume from our customers’ users (over 1B requests a month)

• Analyzing billions of records for WalkMe analytics
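
To put the traffic figure in perspective, the short calculation below converts "over 1B requests a month" into an average request rate. The 30-day month and the 5x peak-to-average factor are assumptions for illustration; the slides do not give the real traffic profile.

```python
REQUESTS_PER_MONTH = 1_000_000_000      # "over 1B requests a month" from the slide
SECONDS_PER_MONTH = 30 * 24 * 60 * 60   # assumes a 30-day month

average_rps = REQUESTS_PER_MONTH / SECONDS_PER_MONTH

# Assumption: peak traffic is 5x the average; the real ratio is not given.
ASSUMED_PEAK_FACTOR = 5
peak_rps = average_rps * ASSUMED_PEAK_FACTOR

print(f"average: {average_rps:.0f} req/s, assumed peak: {peak_rps:.0f} req/s")
```

Even as a flat average this is several hundred requests every second, around the clock, which is why serving the files from a small fixed set of servers becomes hard to sustain.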

Evolution – Phase 1

Problems:

• Low availability

• High latency

• Hard to scale

• Database availability

Evolution – Phase 2

Solution:

• Using Amazon CloudFront to host the static files.

Problems:

• High volume of analytics causes scaling and availability issues

• Database availability

Evolution – Phase 3

Solution:

• Adding Amazon RDS Multi-AZ

• Adding AWS Elastic Beanstalk

New Challenge:

• Collection of billions of records for Big Data analytics (RDS is a bottleneck)

Evolution – Phase 4

Solution:

• Analytics Big Data requests are sent to CloudFront.

• Analyzing CloudFront logs using Hadoop.
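
The log-analysis idea can be sketched in a few lines: each analytics event is a request to CloudFront, so counting events means counting matching log lines. The code below is a single-process stand-in for the kind of counting a Hadoop job would do at scale, not WalkMe's actual pipeline. It reads column names from the `#Fields:` header of a tab-separated CloudFront access log. The sample log lines, the `/track.gif` path, and the `event` query parameter are all made up for the example.

```python
from collections import Counter
from urllib.parse import parse_qs

# Made-up sample in the shape of a CloudFront access log (tab-separated rows).
FIELDS = ("#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) "
          "cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query")


def row(stem, query):
    """Build one sample log row; only the path and query string vary."""
    return "\t".join(["2013-10-16", "09:00:01", "FRA2", "35", "192.0.2.10", "GET",
                      "d111111abcdef8.cloudfront.net", stem, "200", "-",
                      "Mozilla/5.0", query])


SAMPLE_LOG = "\n".join([
    "#Version: 1.0",
    FIELDS,
    row("/track.gif", "event=play&walkthru=42"),
    row("/track.gif", "event=play&walkthru=7"),
    row("/walkme.js", "-"),
    row("/track.gif", "event=close&walkthru=42"),
])


def count_events(log_text, path="/track.gif", param="event"):
    """Count values of one query parameter across requests to one path."""
    fields = []
    counts = Counter()
    for line in log_text.splitlines():
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()   # column names for later rows
            continue
        if not line or line.startswith("#"):
            continue                                  # skip other header lines
        record = dict(zip(fields, line.split("\t")))
        if record.get("cs-uri-stem") != path:
            continue                                  # not an analytics request
        for value in parse_qs(record.get("cs-uri-query", "")).get(param, []):
            counts[value] += 1
    return counts


counts = count_events(SAMPLE_LOG)
```

Because the events land in log files rather than in a database, the collection path has no database write to slow it down, and the counting can run later as a batch job.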

Solution

Thank You

1. DESIGN FOR FAILURE

2. USE MULTIPLE AZs

3. BUILD FOR SCALE

4. DECOUPLE COMPONENTS

#4 DECOUPLE COMPONENTS

BUILD LOOSELY

COUPLED SYSTEMS

The looser they are coupled,

the bigger they scale,

the more fault tolerant they get…

[Diagram: UPLOAD, ANALYZE, and REPORT & NOTIFY components]

AMAZON SQS: SIMPLE QUEUE SERVICE

ARCHITECTURE

DESIGN PATTERN

SQS VISIBILITY TIMEOUT
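
What a visibility timeout buys you is easiest to see in a small model. The class below is an in-memory simulation written for illustration, not the SQS API: real SQS uses receipt handles and real clocks, while this sketch uses message IDs and an explicit `now` argument so the behaviour is deterministic. The 30-second timeout and the message body are assumed values.

```python
class VisibilityQueue:
    """In-memory model of a queue with an SQS-style visibility timeout."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._messages = {}          # message id -> body
        self._invisible_until = {}   # message id -> time it becomes visible again
        self._next_id = 0

    def send(self, body):
        self._next_id += 1
        self._messages[self._next_id] = body
        self._invisible_until[self._next_id] = 0.0
        return self._next_id

    def receive(self, now):
        """Return (id, body) of the first visible message and hide it, or None."""
        for message_id, body in self._messages.items():
            if self._invisible_until[message_id] <= now:
                self._invisible_until[message_id] = now + self.visibility_timeout
                return message_id, body
        return None

    def delete(self, message_id):
        """Called by a worker once it has finished processing the message."""
        self._messages.pop(message_id, None)
        self._invisible_until.pop(message_id, None)


queue = VisibilityQueue(visibility_timeout=30)
queue.send("analyze upload-1")

first = queue.receive(now=0)      # worker A takes the message, then crashes
hidden = queue.receive(now=10)    # still invisible, so worker B gets nothing
retry = queue.receive(now=31)     # timeout expired: worker B receives it
queue.delete(retry[0])            # worker B finishes and deletes it
empty = queue.receive(now=100)    # nothing left to process
```

A message is only removed when a worker explicitly deletes it, so a worker that dies mid-task does not lose the work: the message simply reappears for another worker after the timeout.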

BUFFERING

CLOUDWATCH METRICS FOR AMAZON SQS

+ AUTO SCALING
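
Scaling workers on queue depth can be sketched as a single function. In practice the input would come from a CloudWatch metric for the queue, such as `ApproximateNumberOfMessagesVisible`; here it is just a number passed in. The target of 100 messages per worker and the fleet limits are assumptions chosen for the example.

```python
import math


def workers_for_backlog(visible_messages, *, messages_per_worker=100,
                        minimum=1, maximum=20):
    """Return how many workers to run for the current queue backlog."""
    needed = math.ceil(visible_messages / messages_per_worker)
    # Keep at least `minimum` workers running and never exceed `maximum`.
    return max(minimum, min(maximum, needed))


idle = workers_for_backlog(0)         # empty queue: keep the minimum
busy = workers_for_backlog(250)       # 250 messages at 100 per worker
capped = workers_for_backlog(5000)    # backlog too large: hit the maximum
```

The queue absorbs bursts while the worker fleet catches up, so producers never have to wait for consumers to scale.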

1. DESIGN FOR FAILURE

2. USE MULTIPLE AZs

3. BUILD FOR SCALE

4. DECOUPLE COMPONENTS

YOUR GOAL

APPLICATIONS SHOULD CONTINUE TO FUNCTION

EVEN IF THE UNDERLYING PHYSICAL HARDWARE

FAILS OR IS REMOVED OR REPLACED

AWS ARCHITECTURE CENTER http://aws.amazon.com/architecture

AWS TECHNICAL ARTICLES http://aws.amazon.com/articles

AWS BLOG http://aws.typepad.com

AWS PODCAST http://aws.amazon.com/podcast

THANK YOU!
