Nathan Handler [email protected] / @nathanhandler PaaSTA Autoscaling at Yelp

PaaSTA: Autoscaling at Yelp


Page 1: PaaSTA: Autoscaling at Yelp

PaaSTA: Autoscaling at Yelp

Nathan Handler [email protected] / @nathanhandler

Page 2: PaaSTA: Autoscaling at Yelp

Who am I?

● Nathan Handler / @nathanhandler
● Site Reliability Engineer at Yelp
● PaaSTA Developer and Maintainer
● Ubuntu/Debian GNU/Linux Developer
● freenode IRC staff member

Page 3: PaaSTA: Autoscaling at Yelp

Yelp’s Mission

Connecting people with great local businesses.

Page 4: PaaSTA: Autoscaling at Yelp

Yelp Stats

As of Q3 2016

[Figure: four headline statistics: 97M, 32, 74%, 115M]

Page 5: PaaSTA: Autoscaling at Yelp

History

● Monolithic Python application (~3M LoC)
● Builds/deployments took a long time
  ○ Bottleneck on how often we can deploy
● Mistakes are painful
  ○ Large impact
  ○ Difficult to find
  ○ Slow to fix

Page 6: PaaSTA: Autoscaling at Yelp

Service Oriented Architecture v1

● Split features into different applications
● Smaller services allowed for faster pushes
● Easier to reason about issues
● Able to scale services independently

Page 7: PaaSTA: Autoscaling at Yelp

What is a service?

● Standalone application
● Stateless
● Separate git repository
● Typically at Yelp:
  ○ HTTP API
  ○ Python, Pyramid, uWSGI
  ○ virtualenv

Page 8: PaaSTA: Autoscaling at Yelp

Deploying Services v1

● Statically defined list of hosts to deploy a service on
● Operations decides which hosts to deploy to
● Monitoring manually configured in Nagios
● Manual deployment system via rsync

Page 9: PaaSTA: Autoscaling at Yelp

Does Not Scale

[Figure: growth from 1x to 2x to 3x]


Page 11: PaaSTA: Autoscaling at Yelp

PaaSTA

● Yelp's Platform as a Service
● Builds, Deploys, Connects, and Monitors services
● Glue around existing and established open source tools

https://github.com/yelp/paasta
#paasta on irc.freenode.net


Page 13: PaaSTA: Autoscaling at Yelp

PaaSTA Components

[Diagram: Developer, git push, git pull, git push, docker push, Docker Registry, Marathon, Sensu]

Page 14: PaaSTA: Autoscaling at Yelp

PaaSTA Components

[Diagram: Developer, git push]

Page 15: PaaSTA: Autoscaling at Yelp

A simple service

.
├── Dockerfile
├── htdocs
│   ├── index.php
│   └── status
└── Makefile

Page 16: PaaSTA: Autoscaling at Yelp

$ cat Makefile

DOCKER_TAG ?= $(USER)-dev

test:
	@echo 'Unit testing'

itest: cook-image
	paasta local-run --healthcheck --service devops

cook-image:
	docker build -t $(DOCKER_TAG) .


Page 18: PaaSTA: Autoscaling at Yelp

Docker

Containers: like lightweight VMs
Provides a language (Dockerfile) for describing container images
Reproducible builds (mostly)
Provides software flexibility

docker.com

Page 19: PaaSTA: Autoscaling at Yelp

$ cat Dockerfile

FROM ubuntu:xenial
MAINTAINER Nathan Handler <[email protected]>

RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get -y install \
    apache2 \
    libapache2-mod-php

ENV APACHE_RUN_USER www-data
ENV APACHE_RUN_GROUP www-data
ENV APACHE_LOG_DIR /var/log/apache2
ENV APACHE_LOCK_DIR /var/lock/apache2
ENV APACHE_PID_FILE /var/run/apache2.pid

RUN rm -f /var/www/html/index.html
COPY htdocs /var/www/html/

CMD ["/usr/sbin/apache2", "-D", "FOREGROUND", "-C", "listen 8888"]
EXPOSE 8888

Page 20: PaaSTA: Autoscaling at Yelp

$ paasta check

✓ yelpsoa-config directory for devops found in /nail/etc/services
✓ deploy.yaml exists for a Jenkins pipeline
✗ No 'security-check' entry was found in your deploy.yaml.
Please add a security-check entry *AFTER* the itest entry in deploy.yaml
so your docker image can be checked against known security vulnerabilities.
More info: http://servicedocs.yelpcorp.com/docs/paasta_tools/cli/security_check.html
✗ No 'performance-check' entry was found in your deploy.yaml.
Please add a performance-check entry *AFTER* the security-check entry in deploy.yaml
so your docker image can be checked for performance regressions.
More info: http://servicedocs.yelpcorp.com/docs/paasta_tools/cli/performance_check.html
✓ Jenkins build pipeline found
✓ Git repo found in the expected location.
✓ Found Dockerfile
✓ A Makefile is present
✓ The Makefile contains a tab character
✓ The Makefile contains a docker tag
✓ The Makefile responds to `make cook-image`
✓ The Makefile responds to `make itest`
✓ The Makefile responds to `make test`
✓ Found marathon.yaml file.
✓ All entries in deploy.yaml correspond to a marathon or chronos entry
✓ All marathon instances have a corresponding deploy.yaml entry
✓ monitoring.yaml found for Sensu monitoring
✓ Your service uses Sensu and team 'nhandler' will get alerts.
✓ Found smartstack.yaml file
✓ Instance 'demo' of your service is using smartstack port 20973 and will be automatically load balanced
✓ Successfully validated schema: marathon-nova-devc.yaml

Page 21: PaaSTA: Autoscaling at Yelp

PaaSTA Components

[Diagram: Developer, git push, git pull, git push, docker push, Docker Registry]

Page 22: PaaSTA: Autoscaling at Yelp

$ cat deploy.yaml

---
pipeline:
- instancename: itest
- instancename: push-to-registry
- instancename: dev.everything

Page 23: PaaSTA: Autoscaling at Yelp

Jenkins Pipeline


Page 24: PaaSTA: Autoscaling at Yelp

$ git tag

paasta-nova-devc.demo-20160503T231914-deploy
paasta-nova-devc.demo-20160503T234021-stop
paasta-nova-devc.demo-20160503T234202-start

Page 25: PaaSTA: Autoscaling at Yelp

Declarative control

Describe end goal, not path
Helps us achieve fault tolerance.

"Deploy 6de16ff2 to prod"
vs.
"Commit 6de16ff2 should be running in prod"

Gas pedal vs. Cruise Control
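The difference between "deploy X" and "X should be running" can be illustrated with a small reconciliation function. This is a minimal sketch and not PaaSTA's code: the `ServiceState` type, the `reconcile` function, the action strings, and the commit `a1b2c3d4` are all hypothetical names chosen for illustration. Only the sha `6de16ff2` comes from the slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServiceState:
    """What is running, or should be running, for one service instance."""
    git_sha: str
    instances: int


def reconcile(desired: ServiceState, actual: ServiceState) -> List[str]:
    """Return the actions needed to move `actual` towards `desired`.

    The function only compares states, so it can be re-run after any
    failure. Once the states match it returns no actions.
    """
    actions = []
    if actual.git_sha != desired.git_sha:
        actions.append(f"bounce to {desired.git_sha}")
    if actual.instances != desired.instances:
        actions.append(f"scale {actual.instances} -> {desired.instances}")
    return actions


# "Commit 6de16ff2 should be running" is the desired state.
desired = ServiceState(git_sha="6de16ff2", instances=3)
# a1b2c3d4 is a made-up older commit.
print(reconcile(desired, ServiceState(git_sha="a1b2c3d4", instances=3)))
print(reconcile(desired, desired))
```

Because the desired state is data rather than a one-off command, a crashed or interrupted deploy is repaired by simply running the comparison again.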

Page 26: PaaSTA: Autoscaling at Yelp

$ paasta info

Description: A demo PaaSTA service for OSCON 2016
External Link: http://conferences.oreilly.com/oscon/open-source-us/public/schedule/detail/49358
Monitored By: team nhandler
Runbook: Please set a `runbook` field in your monitoring.yaml. Like "y/rb-mesos". Docs: https://trac.yelpcorp.com/wiki/HowToService/Monitoring/monitoring.yaml
Git Repo: [email protected]:services/devops
Jenkins Pipeline: https://jenkins.yelpcorp.com/view/services-devops
Deployed to the following clusters:
 - nova-devc (N/A)
Smartstack endpoint(s):
 - http://169.254.255.254:20973 (demo)
Dashboard(s):
 - https://uchiwa.yelpcorp.com/#/events?q=devops (Sensu Alerts)

Page 27: PaaSTA: Autoscaling at Yelp

PaaSTA Components

[Diagram: Developer, git push, git pull, git push, docker push, Docker Registry, Marathon]

Page 28: PaaSTA: Autoscaling at Yelp

Scheduling: Mesos + Marathon

Mesos is an "SDK for distributed systems", not batteries-included.
Requires a framework
• Marathon
• Chronos
Can run many frameworks on the same cluster
Supports Docker as task executor

mesosphere.io
mesos.apache.org

Page 29: PaaSTA: Autoscaling at Yelp

$ cat marathon-nova-devc.yaml

---
main:
  cpus: 0.1
  instances: 3
  mem: 500

Page 30: PaaSTA: Autoscaling at Yelp

PaaSTA Components

[Diagram: Developer, git push, git pull, git push, docker push, Docker Registry, Marathon]

Page 31: PaaSTA: Autoscaling at Yelp

Bounce Strategies

● Brutal: Stops old versions and starts the new version, without regard to safety.
● Upthendown: Brings up the new version of the service and waits until all instances are healthy before stopping the old versions.
● Downthenup: Stops any old versions and waits for them to die before starting the new version.
● Crossover: Starts the new version, and gradually kills instances of the old versions as new instances become healthy.
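The four bounce strategies can be compared by looking at what each would do in a single pass, given how many old and new instances currently exist. The function below is a sketch under assumptions and not PaaSTA's implementation: the name `bounce_step`, its parameters, and the one-pass model are hypothetical.

```python
def bounce_step(strategy, target, old_running, new_running, new_healthy):
    """Return (new_to_start, old_to_kill) for one pass of a bounce.

    target      -- number of instances the new version should reach
    old_running -- instances of old versions still running
    new_running -- instances of the new version already started
    new_healthy -- instances of the new version passing healthchecks
    """
    missing_new = max(target - new_running, 0)

    if strategy == "brutal":
        # Start the new version and stop every old instance at once.
        return missing_new, old_running
    if strategy == "upthendown":
        # Old instances are only stopped once all new ones are healthy.
        old_to_kill = old_running if new_healthy >= target else 0
        return missing_new, old_to_kill
    if strategy == "downthenup":
        # New instances are only started once no old ones remain.
        new_to_start = missing_new if old_running == 0 else 0
        return new_to_start, old_running
    if strategy == "crossover":
        # Keep just enough old instances to cover new ones that are
        # not healthy yet, and retire the rest.
        old_to_keep = max(target - new_healthy, 0)
        return missing_new, max(old_running - old_to_keep, 0)
    raise ValueError(f"unknown bounce strategy: {strategy}")


# Three old instances, three new ones started, one of them healthy so far.
for name in ("brutal", "upthendown", "downthenup", "crossover"):
    print(name, bounce_step(name, target=3, old_running=3,
                            new_running=3, new_healthy=1))
```

In this state only crossover retires exactly one old instance, which is why it keeps total healthy capacity roughly constant during a deploy.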

Page 32: PaaSTA: Autoscaling at Yelp

Discovery in PaaSTA: Smartstack

[Diagram: two Mesos slaves, box1 running the service and box2 running the client, each with nerve, synapse, and HAProxy; ZooKeeper between them; arrows labelled Metadata, HTTP request, and healthcheck]

Page 33: PaaSTA: Autoscaling at Yelp

Latency Zones

[Diagram: nested zones labelled Superregion, Region, Habitat]

Page 34: PaaSTA: Autoscaling at Yelp

$ cat smartstack.yaml

---
demo:
  advertise: [region]
  discover: region
  proxy_port: 20973

Page 35: PaaSTA: Autoscaling at Yelp

Why Autoscale?

● Reduce/Eliminate Manual Scaling
● Stop Overprovisioning
● Save Money on our Infrastructure Bill
● Increase Reliability

Page 36: PaaSTA: Autoscaling at Yelp

$ paasta status

Pipeline: https://jenkins.yelpcorp.com/view/services-devops

cluster: nova-devc
  instance: demo
    Git sha: 6de16ff2
    State: Running - Desired state: Started
    Marathon: Healthy - up with (3/3) instances. Status: Running.
    Mesos: Healthy - (3/3) tasks in the TASK_RUNNING state.
    Smartstack:
      Name LastCheck LastChange Status
      useast1-devc - Healthy - in haproxy with (3/3) total backends UP in this namespace.

Page 37: PaaSTA: Autoscaling at Yelp

Autoscaling Should Be Easy


Page 38: PaaSTA: Autoscaling at Yelp

$ cat marathon-nova-devc.yaml

---
main:
  cpus: 0.1
  instances: 3
  mem: 500

Page 39: PaaSTA: Autoscaling at Yelp

$ cat marathon-nova-devc.yaml

---
main:
  cpus: 0.1
  min_instances: 3
  max_instances: 5
  mem: 500

Page 40: PaaSTA: Autoscaling at Yelp

Autoscaling Should Be Flexible


Page 41: PaaSTA: Autoscaling at Yelp

Metrics Providers

● mesos_cpu: Tries to use CPU usage to predict when to autoscale (Default)
● http: Makes a request to an HTTP endpoint on your service. Expects a JSON-formatted dictionary with a 'utilization' field containing a number between 0 and 1
● uwsgi: Uses the percentage of non-idle workers as the utilization metric
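Two of the providers above reduce to very small calculations, sketched here in Python. This is illustrative only and not PaaSTA's code: the function names are hypothetical, and the worker status strings `"idle"` and `"busy"` are an assumption about what a uWSGI stats source reports.

```python
import json


def uwsgi_utilization(worker_statuses):
    """Fraction of workers that are not idle, as a number from 0 to 1."""
    if not worker_statuses:
        raise ValueError("no workers reported")
    busy = sum(1 for status in worker_statuses if status != "idle")
    return busy / len(worker_statuses)


def http_utilization(response_body):
    """Read the 'utilization' field from the JSON an http provider expects."""
    utilization = float(json.loads(response_body)["utilization"])
    if not 0.0 <= utilization <= 1.0:
        raise ValueError(f"utilization out of range: {utilization}")
    return utilization


# Two of four workers are busy, so utilization is 0.5.
print(uwsgi_utilization(["busy", "idle", "busy", "idle"]))
# A service could serve this body from its metrics endpoint.
print(http_utilization('{"utilization": 0.25}'))
```

Whatever the source, every provider ends up producing the same kind of value, a utilization between 0 and 1, which is what lets the decision policies stay independent of how the number was measured.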

Page 42: PaaSTA: Autoscaling at Yelp

Decision Policies

● pid: Uses a PID controller to determine when to autoscale a service
● threshold: Autoscales when a service’s utilization exceeds a certain threshold.
● bespoke: Allows a service author to implement their own autoscaling.
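The threshold and pid policies can be sketched as follows. This is a simplified illustration, not PaaSTA's implementation: the function and class names, the one-instance step size, the `threshold` band of 0.1, and the PID gains are all assumptions made for the example. The setpoint of 0.5 and the 3 to 12 instance range match the configuration shown in this talk.

```python
def threshold_policy(instances, utilization, setpoint, threshold,
                     min_instances, max_instances):
    """Step one instance up or down when utilization leaves the band
    setpoint +/- threshold, then clamp to the configured limits."""
    if utilization > setpoint + threshold:
        instances += 1
    elif utilization < setpoint - threshold:
        instances -= 1
    return max(min_instances, min(max_instances, instances))


class PIDController:
    """Textbook PID controller. A positive output means utilization is
    above the setpoint, so the service should scale up."""

    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.previous_error = None

    def update(self, measurement, dt):
        error = measurement - self.setpoint
        self.integral += error * dt
        if self.previous_error is None:
            derivative = 0.0  # no history on the first sample
        else:
            derivative = (error - self.previous_error) / dt
        self.previous_error = error
        return (self.kp * error
                + self.ki * self.integral
                + self.kd * derivative)


# Utilization 0.9 is above 0.5 + 0.1, so go from 3 to 4 instances.
print(threshold_policy(3, 0.9, setpoint=0.5, threshold=0.1,
                       min_instances=3, max_instances=12))

# Proportional-only controller (hypothetical gains).
pid = PIDController(kp=1.0, ki=0.0, kd=0.0, setpoint=0.5)
print(pid.update(0.75, dt=1.0))
```

The threshold policy reacts only to the current reading, while the PID controller also accounts for accumulated and changing error, which can reduce oscillation for services with noisy load.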

Page 43: PaaSTA: Autoscaling at Yelp

$ cat marathon-nova-devc.yaml

---
main:
  cpus: 0.1
  min_instances: 3
  max_instances: 12
  mem: 500
  autoscaling:
    metrics_provider: http
    endpoint: metrics.json
    decision_policy: threshold
    setpoint: 0.5

Page 44: PaaSTA: Autoscaling at Yelp

Autoscaling


Page 45: PaaSTA: Autoscaling at Yelp

Cluster Autoscaler

● Examines all resources tracked by PaaSTA
  ○ Scales based on the "worst" value
● Supports scaling pools independently
● Supports a configurable target_utilization
● Supports both ASGs and SFRs in AWS
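One way to read "scales based on the worst value" is sketched below: compute utilization per resource, take the highest, and size the pool so that this resource lands at the target utilization. The formula, function names, and example numbers are assumptions for illustration, not PaaSTA's actual algorithm.

```python
import math


def worst_utilization(used, total):
    """Highest used/total ratio across all tracked resources."""
    return max(used[name] / total[name] for name in total)


def target_capacity(current_capacity, used, total, target_utilization,
                    min_capacity, max_capacity):
    """Capacity at which the worst resource would sit at the target,
    clamped to the pool's configured limits."""
    worst = worst_utilization(used, total)
    wanted = math.ceil(current_capacity * worst / target_utilization)
    return max(min_capacity, min(max_capacity, wanted))


# Hypothetical pool: CPU is the most utilized resource at 6/7.
used = {"cpus": 6.0, "mem_gb": 3.03, "disk_gb": 10.0}
total = {"cpus": 7.0, "mem_gb": 42.85, "disk_gb": 153.81}

print(target_capacity(current_capacity=5, used=used, total=total,
                      target_utilization=0.8,
                      min_capacity=2, max_capacity=20))
```

Sizing on the worst resource means a pool that is CPU-bound is not held back by plentiful memory or disk, and vice versa.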

Page 46: PaaSTA: Autoscaling at Yelp

$ paasta metastatus

Cluster: mesosstage
Dashboards:
  Marathon RO: http://marathon.paasta-mesosstage.yelp/
  Smartstack: http://paasta-mesosstage.yelp:3212
  Chronos RO: http://chronos.paasta-mesosstage.yelp/
  Mesos: http://mesos.paasta-mesosstage.yelp
Mesos Status: OK
  quorum: masters: 3 configured quorum: 2
  frameworks:
    framework: chronos-2.4.0 count: 1
    framework: marathon count: 1
  CPUs: 1.00 / 7 in use (14.29%)
  Memory: 3.03 / 42.85GB in use (7.07%)
  Disk: 10.00 / 153.81GB in use (6.50%)
  tasks: running: 9 staging: 1 starting: 0
  slaves: active: 7 inactive: 0
Marathon Status: OK
  marathon apps: 5
  marathon tasks: 9
  marathon deployments: 0
Chronos Status: OK
  Enabled chronos jobs: 1

Page 47: PaaSTA: Autoscaling at Yelp

Terraform'ed PaaSTA Pool

module "nova-devc-useast1a-demo" {
  source                  = ".../terraform-modules/paasta_spot_cluster"
  cluster                 = "nova-devc"
  ecosystem               = "${var.ecosystem}"
  min_capacity            = 2
  max_capacity            = 20
  initial_target_capacity = 5
  ami_type                = "paasta"
  spot_price              = 0.1337
  instance_profile        = "paasta"
  pool                    = "demo"
}

Page 48: PaaSTA: Autoscaling at Yelp

Scaling Safely

● Scaling up is safe and easy
● Scaling down can be risky

Page 49: PaaSTA: Autoscaling at Yelp

PaaSTA Maintenance

● Utilizes Mesos Maintenance Primitives
● Attempts to dynamically reserve all available resources on a host
● Scales up the service by the number of instances running on the host that is in maintenance
● Terminates the host once it is fully drained or it reaches its timeout
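The last two steps of the drain procedure can be expressed as two small functions. This sketch is hypothetical and does not call Mesos or PaaSTA: the function names, the dictionary layout, and the service name `other` are assumptions, while `devops` is the demo service used elsewhere in this talk.

```python
def instances_while_draining(desired, running_on_host):
    """Temporarily raise each service's instance count by the number of
    its instances on the draining host, so replacements start elsewhere
    before anything is killed."""
    return {
        service: count + running_on_host.get(service, 0)
        for service, count in desired.items()
    }


def ready_to_terminate(tasks_left_on_host, seconds_draining, timeout_seconds):
    """A host can be terminated once it is empty or the drain timed out."""
    return tasks_left_on_host == 0 or seconds_draining >= timeout_seconds


# One devops instance lives on the host that is going into maintenance.
print(instances_while_draining({"devops": 3, "other": 2}, {"devops": 1}))
print(ready_to_terminate(tasks_left_on_host=2, seconds_draining=10,
                         timeout_seconds=600))
```

Scaling up first and terminating last is what keeps a scale-down from reducing the capacity a service actually has available.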

Page 50: PaaSTA: Autoscaling at Yelp

PaaSTA Components

[Diagram: Developer, git push, git pull, git push, docker push, Docker Registry, Marathon, Sensu]

Page 51: PaaSTA: Autoscaling at Yelp

$ cat monitoring.yaml

---
team: nhandler
page: true
notification_email: [email protected]


Page 53: PaaSTA: Autoscaling at Yelp

Why not AWS (or something else)?

● More advanced decision policies
● Infrastructure-Agnostic
● Better integration with PaaSTA/Mesos

Page 54: PaaSTA: Autoscaling at Yelp

@YelpEngineering

fb.com/YelpEngineers

engineeringblog.yelp.com

github.com/yelp