© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. David Dooling & Ryan Richt October 2015 Cloud First New Architecture for New Infrastructure @ddgenome & @ryan_richt ARC401

(ARC401) Cloud First: New Architecture for New Infrastructure

Page 1: (ARC401) Cloud First: New Architecture for New Infrastructure

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

David Dooling & Ryan Richt

October 2015

Cloud First: New Architecture for New Infrastructure

@ddgenome & @ryan_richt

ARC401

Page 2: (ARC401) Cloud First: New Architecture for New Infrastructure

What to Expect from the Session

Theory of Cloud

Page 3: (ARC401) Cloud First: New Architecture for New Infrastructure
Page 4: (ARC401) Cloud First: New Architecture for New Infrastructure

Scientists Turned Developers Turned Architects

Page 5: (ARC401) Cloud First: New Architecture for New Infrastructure

Ryan

David

Scientists Turned Developers Turned Architects

Page 6: (ARC401) Cloud First: New Architecture for New Infrastructure

Monsanto

Page 7: (ARC401) Cloud First: New Architecture for New Infrastructure
Page 8: (ARC401) Cloud First: New Architecture for New Infrastructure

Theory of Cloud

Page 9: (ARC401) Cloud First: New Architecture for New Infrastructure

Theory of Cloud

Automated – Software defined everything

Elastic – Unlimited scale + pay-as-you-go

Highly Available – Multi-AZ/region + shards/replicas

Security – “Do over” + Correct by construction

Horizontally Scalable – Provision more like things any time

Page 10: (ARC401) Cloud First: New Architecture for New Infrastructure

Theory of Cloud → Cloud Architecture

Automated → Higher-Order Automation

Elastic → Ephemeral Environments

Highly Available → Fault Tolerant

Security → Secure by Construction

Horizontally Scalable → Parallel, Commodity

Page 11: (ARC401) Cloud First: New Architecture for New Infrastructure

Higher-Order Automation

Automated Tests

Continuous Integration

Continuous Delivery

Automated Infrastructure

Automated Fault Detection

Automated Recovery

…and automated tools to build more automation!

Page 12: (ARC401) Cloud First: New Architecture for New Infrastructure

Fault Tolerant

Fallacies of Internal Apps

1. The hardware is reliable

2. The network is reliable

3. The database is reliable

4. Other services are available

5. Inside the network is secure

6. …

Page 13: (ARC401) Cloud First: New Architecture for New Infrastructure

Fault Tolerant

Fallacies of 1st Generation Cloud

1. Other people’s fault-tolerant code is actually fault tolerant

2. Everything is stateless

3. Everything can be retried

4. Applications should handle all faults

5. Data is magically handled by someone else
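
Fallacy 3 ("Everything can be retried") can be made concrete with a small sketch. It assumes the caller knows whether an operation is idempotent; `TransientError`, `call_with_retry`, and the backoff values are hypothetical names and numbers chosen for illustration, not anything from the talk.

```python
import time


class TransientError(Exception):
    """Stand-in for a timeout or dropped connection (hypothetical)."""


def call_with_retry(operation, *, idempotent, attempts=3, base_delay=0.01,
                    sleep=time.sleep):
    """Retry `operation` on TransientError, but only if it is idempotent.

    A non-idempotent operation (for example "charge the customer") is not
    retried, because the first attempt may have succeeded before the
    failure was reported.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError as err:
            last_error = err
            if not idempotent:
                raise
            if attempt < attempts - 1:
                # Exponential backoff between attempts.
                sleep(base_delay * (2 ** attempt))
    raise last_error
```

Passing `sleep` as a parameter keeps the sketch testable without real delays.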

Page 14: (ARC401) Cloud First: New Architecture for New Infrastructure

Elastic, Ephemeral, Cost-Effective

[Two plots of cost versus time, each comparing Cloud with On Prem: “Dynamic Env Replication” and “Experiments”]

Page 15: (ARC401) Cloud First: New Architecture for New Infrastructure

A Do-Over for Secure by Construction

Secure by Assumption

Secure by Design

Security Automation

Page 16: (ARC401) Cloud First: New Architecture for New Infrastructure

Horizontally Scalable

1. The overhead of scaling grows at most linearly with additional nodes

2. Reads and writes both scale out

3. The system can continue to provide this scalability under loss of any node

* This (CAP) requires apps to understand conflicts
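
One way an application can "understand conflicts" is to use data whose concurrent updates merge deterministically. The sketch below is a grow-only counter (a simple CRDT), picked purely as an illustration; the talk does not say which conflict-handling strategy was used, and the class and node names are hypothetical.

```python
class GCounter:
    """Grow-only counter: each node increments its own slot, and replicas
    merge by taking the per-node maximum, so merges commute."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Taking the maximum per node makes merging idempotent and
        # order-independent, so replicas converge after exchanging state.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)


# Two replicas accept writes independently, then exchange state.
a, b = GCounter("node-a"), GCounter("node-b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
```

After the exchange both replicas report the same total, even though neither saw the other's writes when they happened.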

Page 17: (ARC401) Cloud First: New Architecture for New Infrastructure

Infrastructure Automation

Page 18: (ARC401) Cloud First: New Architecture for New Infrastructure

Federation – 1000 VPCs

[Diagram: a grid of Amazon VPC icons]

Page 19: (ARC401) Cloud First: New Architecture for New Infrastructure

Cloud Architecture

Page 20: (ARC401) Cloud First: New Architecture for New Infrastructure

Cloud Architecture

Page 21: (ARC401) Cloud First: New Architecture for New Infrastructure

Cloud Architecture

Page 22: (ARC401) Cloud First: New Architecture for New Infrastructure

Cloud Architecture

Page 23: (ARC401) Cloud First: New Architecture for New Infrastructure

Cloud Architecture

Page 24: (ARC401) Cloud First: New Architecture for New Infrastructure

AWS CloudFormation

"IPAddress" : {
  "Type" : "AWS::EC2::EIP",
  "DependsOn" : "AttachGateway",
  "Properties" : {
    "Domain" : "vpc",
    "InstanceId" : { "Ref" : "WebServerInstance" }
  }
},
"InstanceSecurityGroup" : {
  "Type" : "AWS::EC2::SecurityGroup",
  "Properties" : {
    "VpcId" : { "Ref" : "VPC" },
    "GroupDescription" : "Enable SSH access via port 22",
    "SecurityGroupIngress" : [
      {"IpProtocol":"tcp","FromPort":"22","ToPort":"22","CidrIp" : { "Ref" : "SSHLocation"}},
      {"IpProtocol":"tcp","FromPort":"80","ToPort":"80","CidrIp" : "0.0.0.0/0"}
    ]
  }
},
"WebServerInstance" : {
  "Type" : "AWS::EC2::Instance",
  "DependsOn" : "AttachGateway",
  "Metadata" : {
    "Comment" : "Install a simple application", …

Page 25: (ARC401) Cloud First: New Architecture for New Infrastructure

Cloud Architecture

Page 26: (ARC401) Cloud First: New Architecture for New Infrastructure

CloudFormation Template Generator

https://github.com/MonsantoCo/cloudformation-template-generator

Page 27: (ARC401) Cloud First: New Architecture for New Infrastructure

CloudFormation Template Generator

Referential Integrity
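
Referential integrity here means that every logical name a template refers to is actually declared. The CloudFormation Template Generator is a Scala library; the Python sketch below is not its API. It only illustrates the check on a plain JSON-style template: `find_refs` and `dangling_refs` are hypothetical helpers, and only `Ref` and `DependsOn` are inspected.

```python
def find_refs(node):
    """Yield every logical name referenced via {"Ref": name} or "DependsOn"."""
    if isinstance(node, dict):
        if set(node) == {"Ref"} and isinstance(node["Ref"], str):
            yield node["Ref"]
            return
        for key, value in node.items():
            if key == "DependsOn":
                # DependsOn may be a single name or a list of names.
                yield from ([value] if isinstance(value, str) else value)
            else:
                yield from find_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_refs(item)


def dangling_refs(template):
    """Return referenced names that are neither parameters nor resources."""
    declared = set(template.get("Parameters", {})) | set(template.get("Resources", {}))
    return sorted(
        {ref for ref in find_refs(template.get("Resources", {}))
         if ref not in declared and not ref.startswith("AWS::")}  # pseudo parameters
    )


# A cut-down template in the shape of the fragment on the CloudFormation slide.
template = {
    "Parameters": {"SSHLocation": {"Type": "String"}},
    "Resources": {
        "VPC": {"Type": "AWS::EC2::VPC", "Properties": {"CidrBlock": "10.0.0.0/16"}},
        "InstanceSecurityGroup": {
            "Type": "AWS::EC2::SecurityGroup",
            "Properties": {
                "VpcId": {"Ref": "VPC"},
                "GroupDescription": "Enable SSH access via port 22",
                "SecurityGroupIngress": [
                    {"IpProtocol": "tcp", "FromPort": "22", "ToPort": "22",
                     "CidrIp": {"Ref": "SSHLocation"}},
                ],
            },
        },
        "IPAddress": {
            "Type": "AWS::EC2::EIP",
            "DependsOn": "AttachGateway",
            "Properties": {"Domain": "vpc", "InstanceId": {"Ref": "WebServerInstance"}},
        },
    },
}
missing = dangling_refs(template)
```

In this cut-down template `AttachGateway` and `WebServerInstance` are referenced but never declared, so they are reported. A typed generator can rule out such mistakes at compile time rather than at stack-creation time.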

Page 28: (ARC401) Cloud First: New Architecture for New Infrastructure

Auto Scaling Group

Page 29: (ARC401) Cloud First: New Architecture for New Infrastructure

CFTG: Security Groups

Page 30: (ARC401) Cloud First: New Architecture for New Infrastructure

Stax

Create and manage CloudFormation stacks in AWS

$ ./stax --help
Usage: stax [OPTIONS] COMMAND [COMMAND_ARGS]
  add               Add functionality to an existing VPC
  auto-services     Lanch multiple services on fleet using template/NAME.services file
  check             Run various tests against an existing stax
  clean             Remove keys and buckets of non-existant stacks
  connect [TARGET]  Connect to bastion|gateway|service in the VPC stax over SSH
  create            Create a new VPC stax in AWS
  describe          Describe the stax created from this host
  delete            Delete the existing VPC stax
  dockerip-update   Fetch docker IP addresses and update related files
  fleet             Run various fleetctl commands against the fleet cluster
  help              Output this message
  history           View history of recently created/deleted stax
  list              List all completely built and running stax
  rds PASSWORD      Create an RDS instance in the DB subnet
  rds-delete RDSIN  Delete RDS instance RDSIN
  remove ADD        Remove the previously added ADD
  services          List servers that are available to run across a stax
  slack             Post usage report to Slack, define hook in stax.config
  sleep             Turn on/off bastion host which allows ssh access into the VPC
  start SERVICE     Start service SERVICE in the fleet cluster
  test              Automated test to exercise functionality of stax
  update            Update an existing VPC with changes from Cloudformation
  validate          Validate CloudFormation template

For more help, check the docs: https://github.com/MonsantoCo/stax

Page 31: (ARC401) Cloud First: New Architecture for New Infrastructure

$ ./stax create
[ ---- ] creating stax
[ NAME ] vpc-stax-36918-outfitting
[ ---- ] creating parameter file
[ ---- ] checking for valid json file format
[ ---- ] creating ssh key pair in aws
[ ---- ] creating key pair
[ ---- ] create bucket
[ ---- ] creating bucket vpc-stax-36918-outfitting
[ ---- ] uploading template
[ ---- ] validate template
[ ---- ] validating template https://s3.amazonaws.com/…
[ ---- ] uploading vpc assets
[ ---- ] creating stax in aws
[ ---- ] stax creation complete
[ ---- ] querying aws
[ ---- ] query complete
[ ---- ] see run/vpc-stax-36918-outfitting.json for details

Page 32: (ARC401) Cloud First: New Architecture for New Infrastructure

$ ./stax connect
[ ---- ] checking if stax build is complete
[ ---- ] describe stax
[ NAME ] vpc-stax-36918-outfitting
[ ---- ] querying aws
[ ---- ] query complete
[ ---- ] see run/vpc-stax-36918-outfitting.json for details
[ ---- ] stack vpc-stax-36918-outfitting build complete
[ ---- ] connecting to stax: bastion

       __|  __|_  )
       _|  (     /   Amazon Linux AMI
      ___|\___|___|

https://aws.amazon.com/amazon-linux-ami/2014.09-release-notes/
[ec2-user@ip-10-183-1-195 ~]$

Page 33: (ARC401) Cloud First: New Architecture for New Infrastructure

Stax as a Service – Create

Page 34: (ARC401) Cloud First: New Architecture for New Infrastructure

Stax as a Service – List

Page 35: (ARC401) Cloud First: New Architecture for New Infrastructure

Stax as a Service – Describe

Page 36: (ARC401) Cloud First: New Architecture for New Infrastructure

Stax as a Service – Services

Page 37: (ARC401) Cloud First: New Architecture for New Infrastructure

Monsanto

Page 38: (ARC401) Cloud First: New Architecture for New Infrastructure

Microservices Lifecycle

Page 39: (ARC401) Cloud First: New Architecture for New Infrastructure

Microservices: Cupcakes, Not Wedding Cakes

Page 40: (ARC401) Cloud First: New Architecture for New Infrastructure

A modern language for software engineering

• Algebraic Data Types (ADTs) – Scala, Haskell, Swift, OCaml, SML
• Enforced Immutability – Scala, Haskell, Clojure, Erlang, OCaml, SML
• Pattern Matching & Destructuring Assignment – CoffeeScript, Scala, Haskell, Swift, OCaml, Erlang, SML
• Type-Level Programming – Haskell, Scala, C++
• Futures, Actors, Async – Erlang, Scala, Java
• Type classes – Haskell, Scala, ~OCaml

Hybrid OO/FP

Provides transition from and backward compatibility with Java

Page 41: (ARC401) Cloud First: New Architecture for New Infrastructure

Scala: A Modern Language for Software Engineering

Advanced Abstractions

• Algebraic Data Types (ADTs)
• Enforced Immutability
• Pattern Matching & Destructuring Assignment
• Type-Level Programming
• Futures, Actors, Async
• Type classes

Advanced Type Constraints

• Advanced Generics & Variance
• Higher Kinds
• F-bounded Polymorphism
• Self-Types
• Type Projections
• Type Members
• Path Dependent Types
• Type Refinements

Turing-complete!

Page 42: (ARC401) Cloud First: New Architecture for New Infrastructure
Page 43: (ARC401) Cloud First: New Architecture for New Infrastructure
Page 44: (ARC401) Cloud First: New Architecture for New Infrastructure

Project-as-a-Service 1 – Create Code Repo/Wiki/Issues

Page 45: (ARC401) Cloud First: New Architecture for New Infrastructure

Project-as-a-Service 2 – Simple Service Template

Runs giter8 to create a fully functional service written in Scala based on our current best practices:

• Standard libraries (Slick, Spray, Akka, etc.) for microservices

• Automated tests with ScalaTest

• Administrative REST endpoints

• Built-in (remote) logging and metric capabilities

• Auto-Docker-ization

• Local Vagrant environment

Page 46: (ARC401) Cloud First: New Architecture for New Infrastructure

Project-as-a-Service 3 – CI & Dockerization

New check-in → Test and Build → Dockerize

Page 47: (ARC401) Cloud First: New Architecture for New Infrastructure

Project-as-a-Service 4 – Continuous Deployment

Page 48: (ARC401) Cloud First: New Architecture for New Infrastructure

Components: fleet, Router, Route Updater, Registrator

1. A commit is made to GitHub.

https://github.com/MonsantoCo/etcd-aws-cluster

https://github.com/MonsantoCo/docker-aws

https://github.com/MonsantoCo/fleet-client

Page 49: (ARC401) Cloud First: New Architecture for New Infrastructure

2. GitHub notifies Jenkins that new code is available. Jenkins runs automated tests to validate that code is functional.

Page 50: (ARC401) Cloud First: New Architecture for New Infrastructure

3. Jenkins builds a Docker container and pushes it to our private Docker registry.

service-1:1

Page 51: (ARC401) Cloud First: New Architecture for New Infrastructure

4. Jenkins registers the service with etcd, our key/value store, since it does not exist there yet.

service-1:1

service-1 => 1 (diagram labels: name, version, revision)

Page 52: (ARC401) Cloud First: New Architecture for New Infrastructure

5. Jenkins calls fleet to deploy the container running our service.

service-1:1

service-1 => 1

service v1 rev1 at 10.183.0.100:8080

Page 53: (ARC401) Cloud First: New Architecture for New Infrastructure

6. Registrator notices the service is deployed and registers the location in etcd.

service-1:1

service-1 => 1
service-1-1 => [10.183.0.100:8080]

service v1 rev1 at 10.183.0.100:8080

Page 54: (ARC401) Cloud First: New Architecture for New Infrastructure

7. When a request is received, the router determines the current revision for the service as well as the location of the service.

service-1:1

service-1 => 1
service-1-1 => [10.183.0.100:8080]

service v1 rev1 at 10.183.0.100:8080

Page 55: (ARC401) Cloud First: New Architecture for New Infrastructure

8. When the next commit (rev 2) is received, Jenkins tests, builds, and pushes it, then looks up the current revision in etcd. The new revision is newer, so Jenkins continues but does not update the current revision.

service-1:1
service-1:2

service-1 => 1
service-1-1 => [10.183.0.100:8080]

service v1 rev1 at 10.183.0.100:8080
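
Steps 4 and 8 amount to a small decision: register a service that is not yet known, or continue with a newer revision without touching the current one. The sketch below uses a plain dictionary as a stand-in for etcd; `plan_deploy` and its return values are hypothetical, and the "skip" branch for a revision that is not newer is an assumption the slides only imply.

```python
def plan_deploy(store, name, version, revision):
    """Decide what the build job does with a freshly built revision.

    `store` stands in for etcd and maps "name-version" to the current revision.
    """
    key = f"{name}-{version}"
    current = store.get(key)
    if current is None:
        # Step 4: the service is unknown, so register it with this revision.
        store[key] = revision
        return "registered"
    if revision > current:
        # Step 8: newer revision, deploy it but leave the current revision
        # alone; promotion happens later, after a smoke test.
        return "deploy"
    # Assumption: a revision that is not newer is not deployed again.
    return "skip"


store = {}
first = plan_deploy(store, "service", 1, 1)
second = plan_deploy(store, "service", 1, 2)
```

A real key/value store would need the "register if absent" step to be atomic; a dictionary lookup followed by a write is only safe in this single-threaded sketch.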

Page 56: (ARC401) Cloud First: New Architecture for New Infrastructure

9. Jenkins deploys the new container to fleet. It runs side-by-side with the previous revision at a different location.

service-1:1
service-1:2

service-1 => 1
service-1-1 => [10.183.0.100:8080]

service v1 rev1 at 10.183.0.100:8080
service v1 rev2 at 10.183.0.100:8081

Page 57: (ARC401) Cloud First: New Architecture for New Infrastructure

10. Registrator notices the new service is deployed and registers the location in etcd under a different key.

service-1:1
service-1:2

service-1 => 1
service-1-1 => [10.183.0.100:8080]
service-1-2 => [10.183.0.100:8081]

service v1 rev1 at 10.183.0.100:8080
service v1 rev2 at 10.183.0.100:8081

Page 58: (ARC401) Cloud First: New Architecture for New Infrastructure

11. Traffic continues to flow to the old service as the current revision has not changed.

service-1:1
service-1:2

service-1 => 1
service-1-1 => [10.183.0.100:8080]
service-1-2 => [10.183.0.100:8081]

service v1 rev1 at 10.183.0.100:8080
service v1 rev2 at 10.183.0.100:8081

Page 59: (ARC401) Cloud First: New Architecture for New Infrastructure

12. Traffic can be directed to a particular version by using a header for testing purposes.

X-Service-Revision: 2

service-1:1
service-1:2

service-1 => 1
service-1-1 => [10.183.0.100:8080]
service-1-2 => [10.183.0.100:8081]

service v1 rev1 at 10.183.0.100:8080
service v1 rev2 at 10.183.0.100:8081
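
The routing decision in steps 7 and 12 can be sketched as a lookup against two maps: the current revision per service, and the registered locations per revision. The dictionaries below stand in for etcd, and `route` is a hypothetical function; how the real router chooses among several locations is not stated, so a random choice is assumed here.

```python
import random

REVISION_HEADER = "X-Service-Revision"


def route(routes, locations, name, version, headers=None, choose=random.choice):
    """Return the address that should receive a request.

    `routes` maps "name-version" to the current revision; `locations` maps
    "name-version-revision" to a list of "host:port" strings.
    """
    key = f"{name}-{version}"
    headers = headers or {}
    if REVISION_HEADER in headers:
        # Step 12: a test request pins a specific revision.
        revision = int(headers[REVISION_HEADER])
    else:
        # Step 7: normal traffic follows the current revision.
        revision = routes[key]
    candidates = locations.get(f"{key}-{revision}", [])
    if not candidates:
        raise LookupError(f"no running instance of {key} revision {revision}")
    return choose(candidates)


routes = {"service-1": 1}
locations = {
    "service-1-1": ["10.183.0.100:8080"],
    "service-1-2": ["10.183.0.100:8081"],
}
default_target = route(routes, locations, "service", 1)
pinned_target = route(routes, locations, "service", 1, {REVISION_HEADER: "2"})
```

With the state shown on the slide, normal traffic still reaches revision 1 on port 8080, while a request carrying the header reaches revision 2 on port 8081.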

Page 60: (ARC401) Cloud First: New Architecture for New Infrastructure

13. Periodically, Route Updater queries etcd to look for cases where there is a revision deployed that is newer than the current route.

service-1:1
service-1:2

service-1 => 1
service-1-1 => [10.183.0.100:8080]
service-1-2 => [10.183.0.100:8081]

service v1 rev1 at 10.183.0.100:8080
service v1 rev2 at 10.183.0.100:8081

Page 61: (ARC401) Cloud First: New Architecture for New Infrastructure

14. If there is a newer revision, Route Updater will attempt to call the smoketest endpoint. If this returns true, it updates the current route.

/admin/smoketest

service-1:1
service-1:2

service-1 => 2
service-1-1 => [10.183.0.100:8080]
service-1-2 => [10.183.0.100:8081]

service v1 rev1 at 10.183.0.100:8080
service v1 rev2 at 10.183.0.100:8081
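
Steps 13 and 14 describe a periodic pass that promotes a newer revision only after its smoke test passes. A minimal sketch, again with dictionaries standing in for etcd: `update_routes` and `deployed_revisions` are hypothetical names, and `smoketest` is an injected callable standing in for a call to the service's /admin/smoketest endpoint.

```python
def deployed_revisions(locations, service):
    """Revisions of `service` that have at least one registered location."""
    revisions = []
    for key, addresses in locations.items():
        prefix, _, revision = key.rpartition("-")
        if prefix == service and revision.isdigit() and addresses:
            revisions.append(int(revision))
    return revisions


def update_routes(routes, locations, smoketest):
    """One pass of the updater: promote the newest revision if it is healthy.

    Returns the (service, revision) pairs that were promoted.
    """
    promoted = []
    for service, current in list(routes.items()):
        newest = max(deployed_revisions(locations, service), default=current)
        if newest <= current:
            continue  # nothing newer is deployed
        addresses = locations[f"{service}-{newest}"]
        # Assumption: every instance of the new revision must pass.
        if all(smoketest(address) for address in addresses):
            routes[service] = newest
            promoted.append((service, newest))
    return promoted
```

Injecting `smoketest` keeps the promotion rule separate from the HTTP call, so the rule can be exercised without a running service.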

Page 62: (ARC401) Cloud First: New Architecture for New Infrastructure

15. Now traffic will start flowing to the new revision of the service automatically.

service-1:1
service-1:2

service-1 => 2
service-1-1 => [10.183.0.100:8080]
service-1-2 => [10.183.0.100:8081]

service v1 rev1 at 10.183.0.100:8080
service v1 rev2 at 10.183.0.100:8081

Page 63: (ARC401) Cloud First: New Architecture for New Infrastructure

16. Route Updater will notice that there is a stale revision running. It will instruct the service to cleanly exit by making a call to the /admin/shutdown endpoint.

/admin/shutdown

service-1:1
service-1:2

service-1 => 2
service-1-1 => [10.183.0.100:8080]
service-1-2 => [10.183.0.100:8081]

service v1 rev1 at 10.183.0.100:8080
service v1 rev2 at 10.183.0.100:8081
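
Finding the stale instances in step 16 is the mirror image of promotion: any registered location whose revision is older than the current route is a shutdown candidate. In this sketch `shut_down_stale` is a hypothetical function and `shutdown` is an injected callable standing in for the call to /admin/shutdown; the HTTP details of that call are not given in the slides.

```python
def stale_instances(routes, locations):
    """Return (location_key, address) pairs older than the current route."""
    stale = []
    for key, addresses in locations.items():
        service, _, revision = key.rpartition("-")
        if service in routes and revision.isdigit() and int(revision) < routes[service]:
            stale.extend((key, address) for address in addresses)
    return stale


def shut_down_stale(routes, locations, shutdown):
    """Ask every stale instance to exit cleanly; return the addresses asked."""
    asked = []
    for _key, address in stale_instances(routes, locations):
        shutdown(address)
        asked.append(address)
    return asked
```

The function only asks instances to exit. Removing their locations from the store is left to whatever watches the containers, which is Registrator's role in step 17.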

Page 64: (ARC401) Cloud First: New Architecture for New Infrastructure

17. Registrator will notice the container is no longer running and remove its location from etcd.

service-1:1
service-1:2

service-1 => 2
service-1-1 => [10.183.0.100:8080]
service-1-2 => [10.183.0.100:8081]

service v1 rev2 at 10.183.0.100:8081

Page 65: (ARC401) Cloud First: New Architecture for New Infrastructure

18. The system continues as-is until a new revision is deployed.

service-1:1
service-1:2

service-1 => 2
service-1-2 => [10.183.0.100:8081]

service v1 rev2 at 10.183.0.100:8081

Page 66: (ARC401) Cloud First: New Architecture for New Infrastructure

Logging with ScalaLogging and ELK

Comprehensive

• Service – log4j
• Container – logspout
• CoreOS – journal forwarder
• Bastion/NAT – rsyslog
• ELB – S3 (ELK coming soon)
• S3 – S3 (ELK coming soon)
• CloudTrail – S3 → TrailDash
• RDS – (coming soon)

Easy to use

• Standard ScalaLogging interface
• Auto custom formats (stack traces)
• JSON-format log messages
• Direct-to-ELK writing
• Standard Fields (container ID, code version, service name, etc.)
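
The idea of JSON-format log messages carrying standard fields can be sketched with Python's standard `logging` module. This is not the team's Scala logging library: `JsonFormatter`, `build_logger`, and the field names (`service_name`, `code_version`, `container_id`) are assumptions made for illustration.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with fixed standard fields."""

    def __init__(self, standard_fields):
        super().__init__()
        self.standard_fields = dict(standard_fields)

    def format(self, record):
        entry = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            **self.standard_fields,
        }
        if record.exc_info:
            # Keep stack traces inside the JSON object, not on separate lines.
            entry["stack_trace"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def build_logger(name, stream, standard_fields):
    """Create a logger that writes JSON lines to `stream`."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(standard_fields))
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


stream = io.StringIO()
log = build_logger("service-1", stream, {
    "service_name": "service",
    "code_version": "1",
    "container_id": "abc123",  # hypothetical value
})
log.info("started on port %d", 8080)
entry = json.loads(stream.getvalue())
```

Because every line is a self-describing JSON object, a log pipeline can index the standard fields without per-service parsing rules.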

Page 67: (ARC401) Cloud First: New Architecture for New Infrastructure

Automating Kamon and Prometheus

Instrumentation & Shipping

• Kamon to Prometheus Exporter, preserves more metrics than Prometheus JVM
• Improved tracing
• Improved complex data mapping
• Periodically collect and push Spray metrics to Kamon

Auto-discovery, Dashboards, Alerts

• Custom Docker containers with more automation – etcd discovery
• Custom default dashboards
• Auto EC2/EBS/RDS standup
• OAuth integration
• SNS notification integration
• Default Alerts

https://github.com/MonsantoCo/spray-kamon-metrics

Page 68: (ARC401) Cloud First: New Architecture for New Infrastructure

What’s Next

Page 69: (ARC401) Cloud First: New Architecture for New Infrastructure

Improvements & Evolution

AWS Service Catalog – API?

EC2 Container Service

AWS IAM

• EC2 CS Roles

• RDS Roles – per VPC/DB Subnet Groups

Amazon API Gateway

VPC Flow Logs – CloudFormation support?

Inverting control for deployment

CloudFormation update predictability


Page 70: (ARC401) Cloud First: New Architecture for New Infrastructure

Higher-Order Automation

Automated Tests

Continuous Integration

Continuous Delivery

Automated Infrastructure

Automated Fault Detection

Automated Recovery

…and automated tools to build more automation!

Page 71: (ARC401) Cloud First: New Architecture for New Infrastructure

Monsanto IT

Page 72: (ARC401) Cloud First: New Architecture for New Infrastructure

Acknowledgements

Larry Anderson

Chris Coffman

TJ Corrigan

Phil Cryer

Dave D’Alessandro

Daniel Solano Gómez

Justin Honold

Kyle Jones

Jessica Kerr

Kevin Meredith

Jorge Montero

Brian Rodgers

Chris Shafer

Niranjan Vengavasi

Dick Wall

Russ Wilson

Stuart Wong

Page 73: (ARC401) Cloud First: New Architecture for New Infrastructure

Thank you!

engineering.monsanto.com

@MonsantoPlatformEng

@ddgenome @ryan_richt

Page 74: (ARC401) Cloud First: New Architecture for New Infrastructure

Remember to complete your evaluations!

Page 75: (ARC401) Cloud First: New Architecture for New Infrastructure

Related Sessions

ARC309 - From Monolithic to Microservices: Evolving Architecture Patterns in the Cloud

Thursday, Oct 8, 4:15 PM - 5:15 PM – Palazzo N

MBL203 - From Drones to Cars: Connecting the Devices in Motion to the Cloud

Friday, Oct 9, 10:15 AM - 11:15 AM – Delfino 4005

Page 76: (ARC401) Cloud First: New Architecture for New Infrastructure

http://engineering.monsanto.com/code

@MonsantoPlatformEng

https://github.com/MonsantoCo/cloudformation-template-generator

https://github.com/MonsantoCo/docker-aws

https://github.com/MonsantoCo/etcd-aws-cluster

https://github.com/MonsantoCo/fleet-client

https://github.com/MonsantoCo/spray-kamon-metrics

https://github.com/MonsantoCo/stax

More to come…

@ddgenome @ryan_richt