Lessons learned trying to implement DevOps in a rapidly growing environment

Dev ops lessons learned - Michael Collins



Local Community

Sustainable!


Thank you!

Lessons learned trying to implement DevOps in a rapidly growing environment

• Lament of a Failed DevOps Manager

• Origin of this talk

• Excuses

Introductions

Michael Collins, Principal Systems Architect

http://www.demonware.net/

@ook

Demonware

• Online services for Console Games

• Middleware

• SaaS APIs

• Cross platform SDKs

• Consultancy & Design

• Part of Activision Blizzard

Demonware

• 435+ million gamers

• 3.2 million+ concurrent online gamers

• 95+ games

• 300,000+ requests per second at peak

• Average query response time of < 0.01 second

• Collect 500,000+ metrics a minute

• 100 billion+ API calls per month
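
As a rough cross-check of the figures above, the monthly call volume can be turned into an average rate and compared with the quoted peak. This is a back-of-the-envelope sketch only: it assumes a 30-day month, treats "API calls" and "requests" as the same unit, and uses the slide's lower bounds ("+") as if they were exact values.

```python
# Back-of-the-envelope: average request rate implied by the monthly volume,
# compared with the quoted peak. Inputs are the slide's lower-bound figures.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: 30-day month

calls_per_month = 100e9  # "100 billion+ API calls per month"
peak_rps = 300_000       # "300,000+ requests per second at peak"

average_rps = calls_per_month / SECONDS_PER_MONTH
peak_to_average = peak_rps / average_rps

print(f"average: {average_rps:,.0f} requests/second")  # average: 38,580 requests/second
print(f"peak is {peak_to_average:.1f}x the average")   # peak is 7.8x the average
```

Under those assumptions the peak is roughly eight times the average load.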

Lessons learned trying to implement DevOps in a rapidly growing environment

• What is rapidly growing?

• 50-100% annual growth

• People, Scale & Complexity

• How Applicable are our lessons?

• This talk == Not technical

• For DW Tech talks see:

• Erlang and First-Person Shooters in online games - Malcolm Dowse - Erlang Factory London 2011

• PyCon.ie 2011 Keynote - Damien Marshall

• Puppet at Demonware - Ruaidhrí Power - PuppetConf ’12

[Chart, 2007 to 2014, y-axis 0 to 3500: New Servers Needed, Retired Servers, Reused Servers, Servers Cumulative, Ops Staff]


A brief history of "DevOps" at Demonware

• Early years (2003 - 2007)

• Focused on P2P, handful of Services, minimal data persistence, 10s of servers, random hardware, Golden Images, Shell Scripts

• root for (almost) everybody!

• “NoOps”

• Early 2007 - Removed root access for developers

• April 2008 - Started to dabble with Puppet

• September 2008 - Automated installs (preseed), Standard Hardware, Puppet based installs for production base

• Spring 2009 - Started to build OS packages for our stack

• June 2009 - DW Engineers attend Velocity for first time


A brief history of "DevOps" at Demonware

• Summer 2010 - Rushed switch to Cobbler/CentOS, more Puppet driven by custom ENC - disabled noop

• January 2011 - First Ops Intern

• Early 2011 - Ops Re-Org, enter DevOps team

• August 2011 - First Engineer moved from Dev to Ops

• September 2011 - Move to Continuous Deployment for Puppet to Prod

• February 2012 - Work with DTO Solutions on “Dev Environment provisioning blueprint”

• March 2012 - Disband DevOps team, new Org Structure - first official Ops Software Engineer job title

• September 2012 - Rundeck in Production

• October 2012 - Internal hack-a-thon week to kickstart “Ops API”

• November 2012 - Ops API first release, read-only cached access to Inventory system

• December 2012 - Our current Build engineer started

• December 2012 - Prototype v1 of internal IaaS API for bare metal provisioning

• February 2013 - First engineer transferred from Ops team to another team (Datawarehouse)

• March 2013 - First release of Build Engineering automated developer environment setup tool


Initial Thoughts on our DevOps History

• Suspect Typical Evolution for Traditional Busy Ops & Dev?

• “DevOps” almost exclusively focused on Ops :-(


• Big Wins

• Building internal APIs

• Continuous deployment of Puppet to Production

• Big Losses

• Restricting Prod Access

• Starting with Prod & trying to retrofit

• Being stereotypical BOFHs


10 Lessons Learned


1 - Be able to clearly articulate what DevOps is

What is DevOps?


What does DevOps mean for …

• You

• Day to Day

• Big Picture

• Your Organization

• Colleagues

• Boss

• Teams you work with

• Leadership

• Can you explain it Clearly, Articulately & Concisely to everyone you deal with?


DevOps for me @ Demonware

• For me

• Day to day - “Automate all the things!”

• Big Picture - “Service Delivery Pipeline, Organizational Sympathy”

• For Demonware

• Colleagues - “Buzzword Bullsh*t - almost Cloud”

• My Boss - “DevOps within Ops, Visible Ops”

• Old Boss - “Developer Self Service; You build it, you run it”

• Teams I work with - “We have to write puppet? What happened to the puppet guys? Wow puppet sucks”

• Leadership - “Bridge Dev vs Ops divide, maintain agility as we grow”

• Everyone - “Something Michael rambles about”

• PS. Above Quotes Fabricated

2 - Trust your developers

• My Single Biggest Mistake

• Revoking developer access to Production

• Ops: be a good Customer for your Developers; provide:

• Requirements

• Bug Reports

• Examples

• Metrics & Data

3 - Start with Dev

• Working on Automation for over 5 years

• Almost exclusively focused on Production

• Never quite useable in Development

• Don’t do this

• In 2013 easy to start with Dev

• Packer, Vagrant, Docker, Boxen etc

• First day: sign in, push “make go now”, get coffee, work

4 - Toolchains not Tools

• Demonware - “We build & run services which use Erlang, Python, RabbitMQ, MySQL & Cassandra with Hadoop for Data Analytics”

• DevOps@Demonware were “The Puppet guys”

• Demonware Ops have:

• Nagios guy

• Elasticsearch/Logstash/Kibana girl

• Graphite guy

4 - Toolchains not Tools

• CfEngine vs Puppet vs Chef vs Ansible

• Apache vs Lighttpd vs Nginx vs Jetty

• Who cares?

• What matters is:

• Using Configuration Management

• Using an HTTP server


Distinguish between Tools & Toolchain Components

• Knowledge not Trade

• Components not Things

• Bezos Amazon Service mandate

• Containers / VMs / APIs / PaaS

• Describe not Prescribe


5 - Service Delivery Pipelines

[Diagram labels: Development, Operations, Build, Run; two rows of four "?" placeholders]


5 - Service Delivery Pipelines

[Diagram labels: Development, Operations, Build, Run]


DevOps Toolchain & Service Delivery

• Not my idea

• DTO Solutions

• ITIL Service Delivery

• Many Others

• http://dev2ops.org/category/devops-toolchain-project/


6 - Organizational Sympathy

• Mechanical Sympathy

• “Hardware and software working together in harmony”

• Martin Thompson, High Performance Low Latency Specialist

• Blog & Mailing List

6 - Organizational Sympathy

• Understand your organization

• Goals, Processes etc

• Then decide which Toolchain elements make sense to re-use

• And what you have to build

• Your organization is not Etsy, Facebook or Twitter

• You can’t map their Toolchain & Processes without appropriate Transformations


PHB Alert

7 - Organizational Flexibility

• Org Structure not sacred

• Annual re-orgs normal?

• Examples

• Valve

• Internally

• Good - Engineers continuing to work together post “re-org”

• Bad - Ops Area, Dev Area :-(


7 - Organizational Flexibility

• Spend time in different roles

• Google “Mission Control”

• Sit with other teams

• Gatecrash scrums

• Understand your colleagues' POV

8 - Communication is Hard

• Timezones Suck

• Cultural differences are Hard

• Managing Growth without missteps is impossible?

• Most Nerds^wEngineers pick crappy mediums

• Face, VC, Voice, IM, Mail …

• No Silver Bullets

• Best Writing Advice for Engineers I've Ever Seen. Period.


9 - Hiring Matters

• The biggest contribution I have made to Demonware is managing to hire people who are smarter than me

• Especially crucial for “DevOps”

10 - Metrics & Data

• Business Metrics not CPU utilization

• Data justifies

• Change

• Resources

• Experiments


TL;DR


“How does <X> make it easier to deploy and run our services?”


Aside - Puppet Continuous Deployment

• Problem

• Automation just for system build & service prop

• “Just stopping puppet, will fix later” - Divergence not Convergence

• Solution

• Sledgehammer

• Toolchain

• Code Review & Aggressive pushing (Git & Gerrit & Fan-out)

• Monitoring & Alerting based on Puppet (Internal Daemon & Nagios)

• “Positive” Policy enforcement - Disease build “bears”

• Testing - dcinabox

• Result

• Most production hosts 100% puppet managed (working on staging)

• In large clusters Drain & Rebuild easier than troubleshooting
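
One way to picture the "Monitoring & Alerting based on Puppet" item is a Nagios-style check that flags hosts whose Puppet agent has not run recently, which is the "just stopping puppet, will fix later" case. This is a minimal sketch, not Demonware's internal daemon: the function name, the thresholds, and the idea of passing in a last-run timestamp are assumptions made for illustration. Only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) follow the standard Nagios plugin convention.

```python
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_puppet_freshness(last_run_epoch, now=None,
                           warn_after=3600, crit_after=4 * 3600):
    """Classify a host by the age of its last Puppet run.

    last_run_epoch: Unix timestamp of the last run, or None if unknown.
    warn_after / crit_after: hypothetical thresholds in seconds.
    Returns (exit_code, message) in the style of a Nagios plugin.
    """
    if last_run_epoch is None:
        return UNKNOWN, "UNKNOWN - no Puppet run recorded"
    if now is None:
        now = time.time()
    age = int(now - last_run_epoch)
    if age >= crit_after:
        return CRITICAL, f"CRITICAL - last Puppet run {age}s ago"
    if age >= warn_after:
        return WARNING, f"WARNING - last Puppet run {age}s ago"
    return OK, f"OK - last Puppet run {age}s ago"
```

A wrapper script would print the message and exit with the returned code. Where the timestamp comes from (an agent report, an inventory system) is deliberately left open here.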

Looking Forward

• Distributed Configuration

• Promise Theory, Cluster State Transitions, Multiple Sources of Truth, Constraint Solving

• Distributed System Platform Blocks

• Netflix / Twitter OSS Stacks

• Separating Infrastructure, Platform & Applications

• Containers

• DC wide cluster scheduling

• Scaling Organizations

• Remote Workers?

• Embedded Ops?

• Flat organizations?

DevOps Lessons Learned

1. Be able to clearly articulate what DevOps is at multiple Levels of Detail

2. Trust your developers

3. Start with Dev

4. Toolchains not Tools

5. Service Delivery Pipelines

6. Organizational Sympathy

7. Organizational Flexibility

8. Communication is hard

9. Hiring Matters

10. Metrics & Data


Surprise - We are Hiring!

[email protected]

• http://www.demonware.net/

• @demonware


• Also food & some drinks later are on us …


Questions?

Random

• Contenders for inclusion:

• Operational Acceptance

• Versioning

• Release Management

• Repository Management

• Agile!11!