Transcript
Page 1: (DVO313) Building Next-Generation Applications with Amazon ECS

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Matt DeBergalis

@debergalis

Cofounder and VP of Product

October 2015

DVO313

Building Next-Generation Applications

with Amazon ECS

Page 2: (DVO313) Building Next-Generation Applications with Amazon ECS

What to expect from the session

• Overview of the connected client application architecture, where rich

web + mobile clients maintain persistent network connections to cloud

microservices, and Meteor, a JavaScript application platform for building

these apps.

• Discussion of the unique devops requirements needed to manage

connected client apps and microservices.

• Reasons for delivering Galaxy, Meteor’s cloud runtime, on Amazon ECS.

• Deep dive into our use of Amazon ECS.

Page 3: (DVO313) Building Next-Generation Applications with Amazon ECS

Open source

The JavaScript app

platform, for mobile

and web

Open source (MIT)

10th most starred

project on GitHub

Fully supported

Galaxy runtime

Deploy, operate, and

monitor apps and

services

Built on Amazon ECS

Launched to public on

10/5

Team of over 30,

hundreds of OSS

contributors

$30M+ raised from

Andreessen Horowitz,

Matrix, others

100+ development

and training partners

Cloud platform Complete ecosystem

Page 4: (DVO313) Building Next-Generation Applications with Amazon ECS

JavaScriptCLRJVM

Meteor: a JavaScript application platform

Page 5: (DVO313) Building Next-Generation Applications with Amazon ECS

Galaxy

Proxy

App A (dead) Galaxy Server Galaxy Server

E C S C L U S T E R

E L B

Proxy Scheduler

A Z 1 A Z 2

App A v2

App A v2

App A v2

App A v2

App A (dead)

App A (dead)

App A (dead)

Page 6: (DVO313) Building Next-Generation Applications with Amazon ECS

WEBSITES

Links and forms

Page-based

Viewed in a browser

APPS

Modern UI/UX

No refresh button

Browser, mobile, and more

Page 7: (DVO313) Building Next-Generation Applications with Amazon ECS

WEBSITES

Stateless

Request / response

Presentation on the wire

APPS

Stateful

Publish / subscribe

Data on the wire

Page 8: (DVO313) Building Next-Generation Applications with Amazon ECS

The architecture required for modern

apps is different

Page 9: (DVO313) Building Next-Generation Applications with Amazon ECS

Mainframe PC WebConnected

client

Page 10: (DVO313) Building Next-Generation Applications with Amazon ECS

1. Stateful clients and servers with

persistent network connections.

2. Reactive architecture where data

is pushed to clients in real time.

3. Code running on the client and

in the cloud. The app spans the

network.

Connected client

Page 11: (DVO313) Building Next-Generation Applications with Amazon ECS

Connected client app

Cloud

Client

Application

microserviceBilling Geo

Page 12: (DVO313) Building Next-Generation Applications with Amazon ECS

HTML

Templates

App

Logic

Microservices Database

Reactive UI update system

Native mobile container

Speculative client-side updates

Client-side data store

Custom data sync protocol

Realtime database monitoring

Build & update system

Assemble it yourself

Off the shelf

Build / integrate

Modern app architecture

Page 13: (DVO313) Building Next-Generation Applications with Amazon ECS

HTML

Templates

App

Logic

Microservices Database

HTML

Templates

App

Logic

Microservices Database

Reactive UI update system

Native mobile container

Speculative client-side updates

Client-side data store

Custom data sync protocol

Realtime database monitoring

Build & update system

Open-source

JavaScript app platform

Off the shelf

Build / integrate

Assemble it yourself With Meteor

Modern app architecture

Page 14: (DVO313) Building Next-Generation Applications with Amazon ECS

JavaScript is the only reasonable

language for cross platform app

development

Page 15: (DVO313) Building Next-Generation Applications with Amazon ECS

The architecture required for

modern apps is different

Page 16: (DVO313) Building Next-Generation Applications with Amazon ECS

The devops required for

modern apps is different

Page 17: (DVO313) Building Next-Generation Applications with Amazon ECS

The devops required for

modern apps is more complex

Page 18: (DVO313) Building Next-Generation Applications with Amazon ECS

• Persistent, stateful connections.

• Seamless application updates –

“hot code push”.

• Client tracking and metrics.

• Complex array of microservices.

• Mobile considerations (builds, push

notifications).

Connected client devops

Page 19: (DVO313) Building Next-Generation Applications with Amazon ECS

• Scalable multi-tenant: 100k users,

1MM processes, 100M sessions.

• Accessible to developers without

sophisticated devops background.

• Suitable for expert teams and complex

apps.

• High availability of user apps and the

Galaxy infrastructure.

• Online updates of all Galaxy components.

• Smooth path to customer-managed cloud.

• Use off-the-shelf parts wherever possible.

Design requirements

Page 20: (DVO313) Building Next-Generation Applications with Amazon ECS

C o n n e c t e d c l i e n t m a n a g e m e n t

Application logic and services

MetricsHot code deploySession mgmt

Container management

IaaS resources

Web

Galaxy: connected client management

I n f r a s t r u c t u r e

Mobile Device

Page 21: (DVO313) Building Next-Generation Applications with Amazon ECS

• O(100k) independent user processes that need isolation.

• Granular, efficient – essential in multi-tenant.

• Surprisingly important: fast spin-up.

• Speed and responsiveness is an essential part of a great

developer experience.

• Fast spin-up lets us build around a “single-shot” container

model.

• Layering as a path to user-supplied binaries.

Containers and orchestration

Page 22: (DVO313) Building Next-Generation Applications with Amazon ECS

• Lots of exciting options here: ECS, Kubernetes, Marathon, …

• Service argument is compelling. Same case we make for Galaxy to

our customers.

• Integration with other parts of AWS saves us time and code.

Example: services automatically register containers with Elastic Load

Balancing (ELB).

• Support for multiple Availability Zones.

• Bottom line: ECS got us to market faster than the alternatives.

ECS container management

Page 23: (DVO313) Building Next-Generation Applications with Amazon ECS

Implementation

Page 24: (DVO313) Building Next-Generation Applications with Amazon ECS

Logs

Metrics

Galaxy UI

App

images

App

state

Cluster 1

Manager

app app

app app

Cluster 2

app app

app app

Cluster 3

app app

app appDeveloper Admin

Manager

Manager

F R O N T E N D B A C K E N D

Page 25: (DVO313) Building Next-Generation Applications with Amazon ECS

Galaxy

E C S C L U S T E R

A Z 1 A Z 2

App AApp A App AApp A

Page 26: (DVO313) Building Next-Generation Applications with Amazon ECS

Galaxy

E C S C L U S T E R

E L B

A Z 1 A Z 2

App AApp A App AApp A

Proxy Proxy

Page 27: (DVO313) Building Next-Generation Applications with Amazon ECS

Galaxy

E C S C L U S T E R

E L B

A Z 1 A Z 2

App AApp A App AApp A

App CService BService B

Proxy SchedulerProxy SchedulerProxy

Page 28: (DVO313) Building Next-Generation Applications with Amazon ECS

Galaxy

Proxy

E C S C L U S T E R

E L B

Proxy

A Z 1 A Z 2

App AApp A App AApp A

Scheduler

App CService BService B

Galaxy UI Galaxy UI

Page 29: (DVO313) Building Next-Generation Applications with Amazon ECS

Deeper Dive

Custom scheduler

Connected client proxy

User metrics

Page 30: (DVO313) Building Next-Generation Applications with Amazon ECS

• Need fine-grained control over how individual tasks are allocated to

container instances and across Availability Zones.

• Container health depends on high-level behavior of app processes, not

just low-level checks.

• Need rate limits and backoff policy when restarting application

containers. (Not our code; potentially not the same policy for all users.)

• Users need visibility into container health.

• Need to ensure that system-essential containers (proxy, Galaxy UI) can

be scheduled even if resources are over-committed.

Scheduling containers

Page 31: (DVO313) Building Next-Generation Applications with Amazon ECS

ECS default scheduler not designed to do these kinds of things.

That’s okay! Instead, ECS provides cluster state and task

management APIs needed to write our own. ~1,500 lines of Go.

• High availability app containers must be

distributed across Availability Zones.

• App containers should be evenly

distributed across instances in an

Availability Zone.

• Container instances should be roughly

equally loaded.

• Each container instance must have

space to run a proxy and a scheduler.

Also implements rate-limiting, application health checks, and

coordinated version updates.

Writing a custom scheduler

Page 32: (DVO313) Building Next-Generation Applications with Amazon ECS

Logs

Metrics

Galaxy UI

App

images

App

state

Cluster 1

Scheduler

app app

app app

Cluster 2

app app

app app

Cluster 3

app app

app appDeveloper Admin

Scheduler

Scheduler

F R O N T E N D B A C K E N D

State sync

Page 33: (DVO313) Building Next-Generation Applications with Amazon ECS

policies

schedulerECS

APIApp state

Desired config

<app,version,containers,HA>

StartTask

StopTask

ListTasks

DescribeTasks

Actual config

[<container status,exit code>]

State sync

Page 34: (DVO313) Building Next-Generation Applications with Amazon ECS

• To ensure the scheduler stays alive, we create an ECS service

calling for exactly 1 scheduler task.

• If the scheduler goes down, crashed containers will no longer be

restarted, and users won't be able to launch new containers or stop

old ones. Reasonable failure mode.

• We’re considering changing to a “keep <n> running” model,

using Amazon DynamoDB to broker a leadership election between

the set.

Scheduling the scheduler

Page 35: (DVO313) Building Next-Generation Applications with Amazon ECS

• Manages the persisent connection between clients and

the appropriate application backend / microservice process.

• Implements stable sessions + coordinated version updates.

• Share nothing architecture. Any proxy can serve any request.

• High availability: multiple proxies in multiple Availability Zones.

• Scheduled as an ECS service; binds to ELB.

Connected client proxy

Page 36: (DVO313) Building Next-Generation Applications with Amazon ECS

Proxy

App A Galaxy ServerApp A App A App A Galaxy Server

E C S C l u s t e r

E L B

Proxy

App B App C App D

Scheduler

ELB routes traffic to any proxy. Any proxy can route to any app container.

ELB routes traffic on ports 80 and 443. The ELB is configured in TCP pass-through mode so that we can use

WebSockets.

A Z 1 A Z 2

Stable sessions

Page 37: (DVO313) Building Next-Generation Applications with Amazon ECS

Proxy

App A Galaxy ServerApp A App A App A Galaxy Server

E C S C l u s t e r

E L B

Proxy

App B App C App D

Scheduler

A Z 1 A Z 2

Proxy routes initial request to random container, and applies a cookie to the client with the ID of the selected container.

On subsequent connections (XHR or interrupted WebSocket), proxy uses cookie to determine backend.

Stable sessions

Page 38: (DVO313) Building Next-Generation Applications with Amazon ECS

Proxy

Galaxy ServerApp A App A App A Galaxy Server

E C S C l u s t e r

E L B

Proxy

App B App C App D

Scheduler

A Z 1 A Z 2

If desired backend is unavailable, proxy selects new backend and reapplies a cookie.

App A (dead)

App A

Stable sessions

Page 39: (DVO313) Building Next-Generation Applications with Amazon ECS

Proxy

App A v1 Galaxy Server

App A v1

App A v1

App A v1

Galaxy Server

E C S C l u s t e r

E L B

Proxy Scheduler

A Z 1 A Z 2

v1

App updates require the cooperation of the scheduler and proxy components.

Coordinated version updates

Page 40: (DVO313) Building Next-Generation Applications with Amazon ECS

Proxy

App A v1 Galaxy Server

App A v1

App A v1

App A v1

Galaxy Server

E C S C l u s t e r

E L B

Proxy Scheduler

A Z 1 A Z 2

App A v2

App A v2

App A v2

App A v2

v1

First step is to spin up new containers in parallel with the old.

(This can be done in a rolling fashion, not shown here.)

Coordinated version updates

Page 41: (DVO313) Building Next-Generation Applications with Amazon ECS

Proxy

App A (dead) Galaxy Server Galaxy Server

E C S C l u s t e r

E L B

Proxy Scheduler

A Z 1 A Z 2

App A v2

App A v2

App A v2

App A v2

App A (dead)

App A v1

App A v1

v1

Once new containers pass health checks, scheduler starts to tear down old client connections and the old

containers.

Coordinated version updates

Page 42: (DVO313) Building Next-Generation Applications with Amazon ECS

Proxy

App A (dead) Galaxy Server Galaxy Server

E C S C l u s t e r

E L B

Proxy Scheduler

A Z 1 A Z 2

App A v2

App A v2

App A v2

App A v2

App A (dead)

App A (dead)

App A (dead)

v1 v2

Proxy recognizes code update in progress, ignores session cookie, and routes client to new container

(establishing new stable session).

Coordinated version updates

Page 43: (DVO313) Building Next-Generation Applications with Amazon ECS

• Galaxy collects metrics on CPU, memory, network traffic,

and a count of connected clients from each running app.

• collector process (one per container instance) streams container

metrics via Docker Remote API, and poll proxy metrics on a known

port.

• aggregator process (singleton) polls each collector, computes

aggregate rollups (hourly, daily), stores each time series in

DynamoDB.

• Aggregator expires old metrics. Tables are sharded by time range.

• Galaxy server reads directly from DynamoDB.

Metrics

Page 44: (DVO313) Building Next-Generation Applications with Amazon ECS

With the amount of growth we have seen after our launch last year, keeping

the servers alive has been an uphill battle until Galaxy came along.

– Tigran Sloyan, Codefights

Galaxy … solved many of the ongoing challenges we had with our previous

server stack: load balancing across sticky sessions, scaling processes, etc.

– Shawn Young, Classcraft

Loosely coupled architecture working well for us

High availability strategy works: apps stayed up during IaaS outages

Our experience so far

Page 45: (DVO313) Building Next-Generation Applications with Amazon ECS

Multiple clusters

Additional AWS regions

On-prem (customer-supplied IAM credentials)

Free tier and instance cost optimizations

… and more …

What’s next

Page 46: (DVO313) Building Next-Generation Applications with Amazon ECS

The JavaScript app platform

www.meteor.com

Galaxy available now!

Page 47: (DVO313) Building Next-Generation Applications with Amazon ECS

Thank you!

Matt DeBergalis – @debergalis

Page 48: (DVO313) Building Next-Generation Applications with Amazon ECS

Remember to complete

your evaluations!