Zero to Prod in Crazy Time John Martinez | Adobe Cloud Services

Zero to Production in Crazy Time: Adobe’s Transformation

DESCRIPTION

Adobe has quickly scaled from nothing to a huge presence in the AWS cloud. This is the story from the trenches: how we screwed up, learned and evolved our use of Chef to help get us to today. Taming Chef to work in the AWS cloud while trying to build a platform at a large scale was not as easy as we originally planned, and we’re consistently trying to make it better. We’ll share some tips and tricks from our experience.

Page 1: Zero to Production in Crazy Time: Adobe’s Transformation

Zero to Prod in Crazy Time

John Martinez | Adobe Cloud Services

Page 2: Zero to Production in Crazy Time: Adobe’s Transformation

About Me

• Currently working as a Cloud Operations Engineer at Adobe

• I get to figure out new stuff, and make really old stuff work in AWS

• 20+ years doing UNIX/Linux work

• Learned about cloud computing at Netflix

• Working at Adobe feeds my habit - photography

Page 3: Zero to Production in Crazy Time: Adobe’s Transformation

About Ops People

Some people see us as Ninjas; I really see us as Storm Troopers

Page 4: Zero to Production in Crazy Time: Adobe’s Transformation

Cloud Platforms @ Adobe

• Creative Cloud

• Marketing Cloud

• Digital Publishing Suite

• PhoneGap

• Typekit

• Acrobat.com

• EchoSign

• Revel

• ...and growing...

Page 5: Zero to Production in Crazy Time: Adobe’s Transformation

How We Got Started

• Creative Cloud went live in late April 2012

• AWS from the start

• We needed to do SOMETHING

• Yes, it was really that scientific of a decision

• Chef vs. Puppet

• That learning curve

Page 6: Zero to Production in Crazy Time: Adobe’s Transformation

#EPICFAIL #1

• Not socializing the need for Chef to the dev team

• Once sold, keep momentum going

• The “let’s make this more complicated than it needs to be” syndrome

• Start with easy stuff first, then graduate

• Ops guy admits: the dev people know how to use software engineering methods for creating and maintaining infrastructure code: USE IT

Page 7: Zero to Production in Crazy Time: Adobe’s Transformation

Tweaking Knobs

• EC2 AMIs: bake or configure?

• Baking positive: fast boot times

• Baking negative: too static

• Configure positive: very dynamic

• Configure negative: can take forever to boot

• We settled on a mostly dynamic configuration, with some static baking

• knife-ec2 is great, but what about autoscale?

• The CloudFormation connection
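
One way to picture the CloudFormation connection is a template in which an Auto Scaling group launches instances whose user-data starts chef-client. The sketch below builds such a template as a Python dict. It is an illustration, not the template used at Adobe: the function name, the AMI ID, the instance type, and the assumption that chef-client and its configuration are already baked into the AMI are all hypothetical.

```python
import json


def build_template(ami_id, instance_type, user_data, min_size=1, max_size=2):
    """Return a CloudFormation template (as a dict) with a launch
    configuration and an Auto Scaling group that uses it."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "LaunchConfig": {
                "Type": "AWS::AutoScaling::LaunchConfiguration",
                "Properties": {
                    "ImageId": ami_id,
                    "InstanceType": instance_type,
                    # Fn::Base64 makes CloudFormation encode the script.
                    "UserData": {"Fn::Base64": user_data},
                },
            },
            "Group": {
                "Type": "AWS::AutoScaling::AutoScalingGroup",
                "Properties": {
                    "LaunchConfigurationName": {"Ref": "LaunchConfig"},
                    "MinSize": str(min_size),
                    "MaxSize": str(max_size),
                    # An empty string means "all zones in this region".
                    "AvailabilityZones": {"Fn::GetAZs": ""},
                },
            },
        },
    }


# Hypothetical first-boot script: assumes chef-client and
# /etc/chef/first-boot.json are already present on the AMI.
USER_DATA = "#!/bin/bash\nchef-client -j /etc/chef/first-boot.json\n"

template = build_template("ami-00000000", "m1.small", USER_DATA)
print(json.dumps(template, indent=2))
```

Every instance the group launches runs the same script, so no knife-ec2 call is needed at scale-up time.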

Page 8: Zero to Production in Crazy Time: Adobe’s Transformation

#EPICFAIL #2

• Get Chef, don’t actually use it

• Back to that learning curve (Hint: Training)

• Issue with compressed timelines and small staff

• In the heat of deploying prod, doing stupid things

• Losing track of what got deployed where

• Who’s doing what?

• Not sleeping sucks

Page 9: Zero to Production in Crazy Time: Adobe’s Transformation

Out of the Rubble

• Now that we’re live: refactor time (a.k.a. Fix all the broken stuff)

• Chef development for reals

• OMG: WINDOWS?!?!

• Not a lot of expertise in-house or outside

• Ops guy admits: learned to love dev tools like Jenkins and Git

Page 10: Zero to Production in Crazy Time: Adobe’s Transformation

It’s Alive!

• Did it gradually over time

• Started with simple recipes, graduated to more complicated ones

• Using Environments to deploy the right thing in the right place

• It’s AWS, stupid: you SHOULD kill your instances

• CloudFormation to AutoScale to Chef Client

Page 11: Zero to Production in Crazy Time: Adobe’s Transformation

It’s Alive (v1)

[Diagram. Components: EC2 Instances; S3 Bucket (validator key); CloudFormation; Auto Scale Group; Hosted Chef (Cookbooks, Environment, Roles, Data bags). Labeled steps: 0. Manual (Editor (vi), Perforce); 1. knife upload; cfn-create-stack; 4. Chef Client Bootstrap (Data Bag Key, Recipes). Steps 2 and 3 carry no label.]

Page 12: Zero to Production in Crazy Time: Adobe’s Transformation

More Automation (v2)

[Diagram. Components: EC2 Instances; S3 Bucket (validator key); CloudFormation; Auto Scale Group; Hosted Chef (Cookbooks, Environment, Roles, Data bags). Labeled steps: 0. Automated (Git, Jenkins); 1. knife upload; Jenkins CFN; 4. Chef Client Bootstrap (Data Bag Key, Recipes). Steps 2 and 3 carry no label.]

Page 13: Zero to Production in Crazy Time: Adobe’s Transformation

On Bootstrapping EC2 Instances

• Biggest issue with Chef in AWS: straying from knife-ec2

• Read the bootstrap document and reverse engineer it

• http://wiki.opscode.com/display/chef/Client+Bootstrap+Fast+Start+Guide

• http://wiki.opscode.com/display/chef/EC2+Bootstrap+Fast+Start+Guide

• user-data is your friend

• Use it for node identity

• Resist the devil: don’t send any API keys or passwords or embarrassing things via user-data!!!

• Windows works this way, too, but learn PowerShell
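
The user-data advice above can be sketched as a small generator that puts only node identity (Chef environment and role) into the first-boot script and refuses to emit anything that looks like a secret. This is a minimal sketch under stated assumptions: the function name, the bucket name, the forbidden-word list, and fetching the validator key with `aws s3 cp` are hypothetical, and `/etc/chef/client.rb` is assumed to exist on the AMI already.

```python
import json

# Words that suggest a secret has leaked into user-data (illustrative list).
FORBIDDEN = ("secret", "password", "private key", "aws_access_key")


def build_user_data(environment, role, key_bucket):
    """Render a first-boot shell script that carries only node identity."""
    first_boot = json.dumps({"run_list": ["role[%s]" % role]})
    script = "\n".join([
        "#!/bin/bash",
        "mkdir -p /etc/chef",
        # The validator key is fetched at boot, never embedded in user-data.
        "aws s3 cp s3://%s/validation.pem /etc/chef/validation.pem" % key_bucket,
        "echo '%s' > /etc/chef/first-boot.json" % first_boot,
        # -E selects the Chef environment, -j supplies the run list.
        "chef-client -E %s -j /etc/chef/first-boot.json" % environment,
        "",
    ])
    lowered = script.lower()
    for word in FORBIDDEN:
        if word in lowered:
            raise ValueError("user-data must not contain %r" % word)
    return script


print(build_user_data("production", "web", "example-chef-keys"))
```

The word check is deliberately crude. Its purpose is to fail loudly during template generation rather than after an instance has exposed its user-data.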

Page 14: Zero to Production in Crazy Time: Adobe’s Transformation

#EPICFAIL #3

Oh crap, Opscode is DOWN!!!

Page 15: Zero to Production in Crazy Time: Adobe’s Transformation

#EPICFAIL #3

• Failing to architect for failure (double BAM)

• Even though we built a hot AWS architecture, we still got bit

• What does it mean when Hosted Chef is down for us?

• Talk to Opscode...really, talk to them, they want to help
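
As one illustration of architecting for a Chef server outage, a bootstrap script could try an ordered list of servers and use the first one that answers. The sketch below is an assumption about how that might look, not a description of what was deployed: the function name, the URLs, and the injected health probe are all hypothetical.

```python
def pick_chef_server(candidates, is_healthy):
    """Return the first URL in candidates for which is_healthy(url) is true."""
    for url in candidates:
        try:
            if is_healthy(url):
                return url
        except OSError:
            # Treat network errors as "down" and try the next server.
            continue
    raise RuntimeError("no Chef server reachable: %s" % ", ".join(candidates))


# Hypothetical endpoints: a hosted server first, a private one as fallback.
SERVERS = ["https://hosted.example.com", "https://private.example.com"]


def fake_probe(url):
    """Stand-in for a real health check; pretends the hosted server is down."""
    return url != "https://hosted.example.com"


print(pick_chef_server(SERVERS, fake_probe))
```

Passing the probe in as a function keeps the selection logic testable without any network access.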

Page 16: Zero to Production in Crazy Time: Adobe’s Transformation

How We’re Trying to Improve

• Mostly around availability

• Augment Hosted Chef with Private Chef

• Mostly around security

• Use the tools at your disposal

• IAM policies for EC2 roles and S3 bucket security

• Mostly around performance

• Refactoring AWS-related code to use AWS SDK for Ruby

• AMI factory from base Amazon Linux or Ubuntu AMIs (bonus points for Windows)
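
For the IAM point above, a policy attached to an EC2 instance role can limit instances to reading a single object, such as the validator key, from S3. The sketch below generates such a policy document. The function name, bucket, and key are hypothetical, and the policy is a minimal example rather than the one in use at Adobe.

```python
import json


def validator_key_policy(bucket, key):
    """Return an IAM policy document allowing read access to one S3 object."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Read-only: no listing, writing, or deleting.
                "Action": ["s3:GetObject"],
                "Resource": ["arn:aws:s3:::%s/%s" % (bucket, key)],
            }
        ],
    }


policy = validator_key_policy("example-chef-keys", "validation.pem")
print(json.dumps(policy, indent=2))
```

With a role like this, instances obtain temporary credentials from the instance metadata service, so no long-lived API keys need to travel in user-data.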

Page 17: Zero to Production in Crazy Time: Adobe’s Transformation

The End

• Operational scripts, template examples and other bits

• https://github.com/Adobe-CloudOps

• Contact me:

• @johnmartinez

• Questions? Suggestions? Come talk to me after!