What is DevOps? Book Extract / Summary. Mike Loukides, O’Reilly

DevOps 201607

Page 1: DevOps 201607

What is DevOps? Book Extract / Summary

Mike Loukides, O’Reilly

Page 2: DevOps 201607

The NoOps Debate
• Adrian Cockcroft’s article about NoOps at Netflix ignited a controversy that had been smouldering for some time.
• John Allspaw gave a detailed response to Adrian’s article.
• The key here is not semantics and vocabulary but the concepts and principles that improve operations and provide resilient services.

Page 3: DevOps 201607

BOFH (“Bastard Operator from Hell”)
• Serves as a reminder of the days when disasters happened, operations fought fires, and the blame game was on.
• Being told “We need 125 servers online ASAP, and there’s no time to automate it” is a recipe for disaster. - Sascha Bates

Page 4: DevOps 201607

O’Reilly’s Velocity Conference

Page 5: DevOps 201607

Infrastructure as Code
• If you’re going to do operations reliably, you need to make it reproducible and programmatic.

Page 6: DevOps 201607

Autonomous Correction
• Software to detect a misbehaving EC2 instance automatically, destroy the bad instance, spin up a new one, and configure it, all without interrupting service.
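
The detect, destroy, replace, and configure cycle on this slide can be sketched in a few lines of Python. This is a minimal simulation under stated assumptions: `Fleet`, `Instance`, and `heal` are hypothetical names invented for illustration, and the fleet is an in-memory stand-in, not a real EC2 client.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List


@dataclass
class Instance:
    instance_id: str
    healthy: bool = True
    configured: bool = False


class Fleet:
    """In-memory stand-in for a cloud provider (hypothetical, not an EC2 API)."""

    def __init__(self, size: int) -> None:
        self._ids = count(1)
        self.instances: Dict[str, Instance] = {}
        for _ in range(size):
            self.launch()

    def launch(self) -> Instance:
        inst = Instance(f"i-{next(self._ids):04d}")
        inst.configured = True  # configuration is applied as part of launch
        self.instances[inst.instance_id] = inst
        return inst

    def terminate(self, instance_id: str) -> None:
        del self.instances[instance_id]


def heal(fleet: Fleet) -> List[str]:
    """Replace every unhealthy instance and return the ids that were replaced."""
    replaced = []
    # Iterate over a snapshot because the fleet is modified inside the loop.
    for inst in list(fleet.instances.values()):
        if not inst.healthy:
            fleet.launch()  # start the replacement first so capacity never drops
            fleet.terminate(inst.instance_id)
            replaced.append(inst.instance_id)
    return replaced
```

In a real deployment `heal` would run on a schedule or in response to health-check events; launching the replacement before terminating the bad instance is one way to honour "without interrupting service".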

Page 7: DevOps 201607

Modern applications, running in the cloud, still need to be resilient.

James Urquhart

Page 8: DevOps 201607

Operations doesn’t go away; it becomes part of development.

Page 9: DevOps 201607

Not an uber developer who understands:
• big data,
• web performance optimization,
• application middleware,
• fault tolerance in a massively distributed environment.
We need operations specialists on the development teams.

Page 10: DevOps 201607

Cooperation, not isolation
Cooperate and collaborate with the developers who create the applications.

Page 11: DevOps 201607

… movement informally known as “DevOps.”

Page 12: DevOps 201607

Amazon’s EBS outage 2011 - Netflix
• Demonstrates how the nature of “operations” has changed.
• Netflix knew how to design for reliability.
• They understood resilience: spreading data across zones.
• Resilience was a property of the application.
• EBS was down, but Netflix’s Chaos Monkey had already ensured resilience.

Page 13: DevOps 201607

The best thing about the EBS outage was that his guys weren’t running around like crazy trying to fix things.

JD Long tweet

Page 14: DevOps 201607

The bonding needs to be fluid, but that’s precisely the point.

The task — providing a solid, stable application for customers — is the same.

Page 15: DevOps 201607

Operations is crucial to success,
• but operations can only succeed to the extent that it collaborates with developers and participates in the development of applications that can monitor and heal themselves.

Page 16: DevOps 201607

It’s not about firefighting but about eliminating fires

Page 17: DevOps 201607

It’s important not to divorce developers from the consequences of their work, since the fires are frequently set by their code.

Allspaw points out

Page 18: DevOps 201607

Pinpointing rather than finger-pointing

Post-mortems that try to figure out the causality of an outage are old world.

Page 19: DevOps 201607

So What is DevOps?
• Instead of interminable up-front planning:
• “minimum viable product,”
• continuous integration,
• continuous deployment.

The Tool Set

Page 20: DevOps 201607

Perl
• No sysadmin worth his salt came without a portfolio of Perl scripts.
• Perl was designed as a programming language for automating system administration.

Page 21: DevOps 201607

Puppet and Chef
• Automate configuration.
• Ensure every machine has an identical software configuration.
• Ensure every machine is running the right services.
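
The idea behind this slide, declaring a desired state once and converging every machine to it, can be illustrated with a small Python sketch. It is not Puppet or Chef syntax and does not use either tool's API; `converge` and the resource names in the usage note are hypothetical, and a machine is modelled as a plain dictionary mapping a resource name to its state.

```python
from typing import Dict, List


def converge(desired: Dict[str, str], actual: Dict[str, str]) -> List[str]:
    """Bring `actual` in line with `desired` in place and return the actions taken.

    Running it a second time returns an empty list (the operation is idempotent).
    """
    actions = []
    # Create or correct every resource that differs from the declaration.
    for name, state in sorted(desired.items()):
        if actual.get(name) != state:
            actions.append(f"set {name} -> {state}")
            actual[name] = state
    # Remove anything that is present but not declared.
    for name in sorted(set(actual) - set(desired)):
        actions.append(f"remove {name}")
        del actual[name]
    return actions
```

Applied to two machines that start out different, for example `{"nginx": "stopped"}` and `{"ntp": "running", "telnet": "running"}`, with the same declaration `{"nginx": "running", "ntp": "running"}`, both end up in the same state, which is the "identical software configuration" the slide refers to.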

Page 22: DevOps 201607

Vagrant
• Ensure that all your virtual machines are constructed identically from the start.

Page 23: DevOps 201607

Chaos Monkey?
• Randomly kills instances and services within the application.
• Resilience embedded in operations.
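
A toy version of the random-kill experiment can be written in Python. This is a sketch under stated assumptions, not Netflix's implementation: `Service` and `unleash_monkey` are hypothetical names, replicas are just integers in a set, and a seeded `random.Random` is passed in so that a run can be repeated.

```python
import random
from typing import List


class Service:
    """A service backed by several interchangeable replicas (simulated)."""

    def __init__(self, replicas: int) -> None:
        self.alive = set(range(replicas))

    def kill(self, replica: int) -> None:
        self.alive.discard(replica)

    def handle_request(self) -> str:
        # Any surviving replica can answer; fail only if none is left.
        if not self.alive:
            raise RuntimeError("no replicas left to serve the request")
        return f"served by replica {min(self.alive)}"


def unleash_monkey(service: Service, kills: int, rng: random.Random) -> List[int]:
    """Kill `kills` randomly chosen live replicas and return the victims."""
    victims = rng.sample(sorted(service.alive), kills)
    for replica in victims:
        service.kill(replica)
    return victims
```

The point of the exercise is the check that follows the kills: if `handle_request` still succeeds after the monkey has run, resilience is a property of the application rather than of any single instance.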

Page 24: DevOps 201607

DTrace
• Provides insight into almost every aspect of system behavior.
• One of the big challenges facing modern operations groups is developing analytic tools and metrics that can take advantage of the available data to predict failures.

Page 25: DevOps 201607

EMT training sessions are essential for operations staff so that they understand how to handle themselves and communicate with each other in an emergency.

Jesse Robbins

Page 26: DevOps 201607

Ending thoughts
• A Hadoop cluster to monitor the Hadoop cluster.
• Operations groups are playing a huge role in the deployment of new, more efficient protocols for the web, like SPDY.
• A lot of our “best practices” for TCP tuning were developed in the days of ISDN and 56 Kbps analog modems, and haven’t been adapted to the reality of Gigabit Ethernet, OC48* fiber,

Page 27: DevOps 201607

Sincere Thanks to Mike Loukides & O’Reilly Media

Vishwanath Ramdas
This is a book summary extract for wiki reference.