Raimonds Simanovskis
Agile Operations
or
How to sleep better at night
Raimonds Simanovskis
github.com/rsim
@rsim
The easiest Business Intelligence tool on the Web
Before Agile...
Customer, Developer, Tester
Analyze → Design → Code → Test
Agile Cross-Functional Team
Development vs Operations
Agile | SysAdmins, Support
DevOps
Customer, Developer, Tester, SysAdmin, Support, DBA
How to apply Agile values and practices to Operations?
Agile Values
Individuals and interactions over processes and tools
Working production system over comprehensive documentation
Customer collaboration over SLA negotiation
Responding to change over following a plan
Infrastructure as code
Typical system administration
Installation instructions
Development server: OS, DB, Pkg1, Pkg2, App1
Test server: OS, DB, Pkg1, Pkg2, Pkg3, App1
Production servers: OS1, OS2, DB, Pkg1, Pkg2, Pkg3, App1
Automate infrastructure build
Version control system
Development sandbox
Test server
Production servers
Local sandbox tools
Vagrant + VirtualBox
Infrastructure provisioning and configuration tools
Sprinkle
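To make "infrastructure as code" concrete, here is a minimal Python sketch. It is not how Sprinkle or any other provisioning tool works internally. It assumes a hypothetical package catalogue (`PACKAGES`, where each package lists what it requires) and hypothetical per-environment roles (`ROLES`), mirroring the OS/DB/Pkg/App stacks above. Because this description lives in version control, every environment is built from the same source. The plan skips packages that are already present, so re-running it is safe.

```python
from graphlib import TopologicalSorter

# Hypothetical package catalogue: each package lists the packages it requires.
PACKAGES = {
    "os": set(),
    "db": {"os"},
    "pkg1": {"os"},
    "pkg2": {"pkg1"},
    "pkg3": {"pkg1"},
    "app1": {"db", "pkg2"},
}

# Hypothetical roles: the top-level packages each environment needs.
ROLES = {
    "development": {"app1"},
    "test": {"app1", "pkg3"},
    "production": {"app1", "pkg3"},
}


def required_closure(targets):
    """Return the targets plus everything they require, transitively."""
    seen, stack = set(), list(targets)
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(PACKAGES[name])
    return seen


def install_plan(role, installed=frozenset()):
    """Dependency-ordered list of packages still missing on a server with this role."""
    needed = required_closure(ROLES[role])
    # TopologicalSorter takes {node: predecessors} and yields predecessors first.
    order = TopologicalSorter({name: PACKAGES[name] for name in needed}).static_order()
    # Idempotent: packages that are already installed are skipped.
    return [name for name in order if name not in installed]
```

For example, `install_plan("production", installed={"os", "db"})` lists only the missing packages, each one after its prerequisites.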
Continuous deployment
From development to production
Development → Operations → Production
Fear of change
From big releases to small deployments
Faster feedback
Problems localized faster
Reduces risk
Reduces overhead
Getting started
Continuous integration
Automated deployment
Real-time alerts
Root cause analysis
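Automated deployment replaces manual copying with a script. One common pattern is sketched below in Python. It is an assumption, not necessarily the approach used in this talk. The script copies each build into its own `releases/<version>` directory and then atomically repoints a `current` symlink, so rolling back is just repointing to an older release. The sketch assumes a POSIX filesystem (where `os.replace` on a symlink is an atomic rename) and version names that sort chronologically. `deploy` and its parameters are hypothetical.

```python
import os
import shutil
from pathlib import Path


def deploy(build_dir: Path, app_root: Path, version: str, keep: int = 3) -> Path:
    """Copy a build into releases/<version> and atomically point `current` at it."""
    releases = app_root / "releases"
    releases.mkdir(parents=True, exist_ok=True)
    release = releases / version
    shutil.copytree(build_dir, release)  # raises if this version was already deployed

    # Build the new symlink under a temporary name, then rename it over `current`.
    # On POSIX the rename is atomic, so the `current` path always exists.
    tmp_link = app_root / "current.tmp"
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    tmp_link.symlink_to(release.resolve(), target_is_directory=True)
    os.replace(tmp_link, app_root / "current")

    # Keep the newest `keep` releases for fast rollback (assumes names sort by age).
    for old in sorted(releases.iterdir())[:-keep]:
        shutil.rmtree(old)
    return release
```

Keeping several old releases on disk makes a bad small deployment cheap to undo, which supports the "reduces risk" point above.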
Good practices
Zero-downtime deployments
Feature flags
Gradual rollouts
A/B split testing
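Feature flags, gradual rollouts and A/B split tests can all rest on one idea: hash each user into a stable bucket. The Python sketch below shows that idea. It is not the API of any particular feature-flag library. The flag names and percentages in `ROLLOUT` are hypothetical. Hashing with SHA-256 instead of Python's built-in `hash()` keeps buckets stable across processes and restarts, so a user does not flip between variants.

```python
import hashlib

# Hypothetical flag configuration: percentage of users who get each feature.
ROLLOUT = {"new_checkout": 10, "dark_mode": 100, "beta_reports": 0}


def bucket(name: str, user_id: str, buckets: int = 100) -> int:
    """Stable bucket: the same (name, user) pair always maps to the same number."""
    digest = hashlib.sha256(f"{name}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def is_enabled(flag: str, user_id: str, rollout: dict = ROLLOUT) -> bool:
    """Feature flag with gradual rollout: raising the percentage only adds users."""
    return bucket(flag, user_id) < rollout.get(flag, 0)


def ab_variant(experiment: str, user_id: str, variants=("A", "B")) -> str:
    """A/B split: each user consistently sees one variant of an experiment."""
    return variants[bucket(experiment, user_id, len(variants))]
```

A rollout then becomes a configuration change (10%, then 50%, then 100%) rather than a new deployment, and setting the percentage back to 0 switches the feature off.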
Monitoring and self-healing
What to monitor?
CPU, Memory, Disk, Network
Everything is fine!
Start with end-user experience
Remote user → HTTP → Our server
Is it alive? Time to respond?
One-time failure or frequent failures?
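The three questions above can be turned into a small external check. The sketch below uses only Python's standard library and is one possible design, not a specific monitoring product. It probes a URL over HTTP the way a remote user would and measures the response time. It then uses a sliding window to separate a one-time failure (just logged) from frequent failures (alert). The window size, failure threshold and "slow" limit are hypothetical defaults.

```python
import http.client
import time
import urllib.request
from collections import deque


def check(url: str, timeout: float = 5.0):
    """One end-user style probe over HTTP: returns (alive, seconds_to_respond)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            alive = resp.status < 400
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (4xx/5xx), refused connections and timeouts all land here.
        alive = False
    return alive, time.monotonic() - start


class EndUserMonitor:
    """Tells one-time failures apart from frequent ones using a sliding window."""

    def __init__(self, window: int = 5, threshold: int = 3, slow_after: float = 2.0):
        self.results = deque(maxlen=window)  # True = good probe, False = failed or slow
        self.threshold = threshold
        self.slow_after = slow_after

    def record(self, alive: bool, seconds: float) -> str:
        ok = alive and seconds <= self.slow_after
        self.results.append(ok)
        if self.results.count(False) >= self.threshold:
            return "alert"  # frequent failures: wake someone up
        return "ok" if ok else "one-off"  # a single failure is only logged
```

Treating a slow response as a failure reflects the end-user view: the server may be "alive" while users are still waiting.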
Real user monitoring
Preventive error log analysis
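Preventive error log analysis means finding trouble in the logs before users report it. A minimal Python sketch follows. Its log format (a line containing `ERROR` or `FATAL` followed by a message) and its thresholds are assumptions. It groups error lines by a normalised "signature", with numbers and hex ids replaced, and flags signatures that are new or growing compared with a baseline, such as yesterday's counts.

```python
import re
from collections import Counter

# Assumed log format: an ERROR or FATAL marker followed by the message text.
ERROR_RE = re.compile(r"\b(ERROR|FATAL)\b\s*(.*)")


def signature(message: str) -> str:
    """Normalise a message so the same error with different ids groups together."""
    message = re.sub(r"0x[0-9a-fA-F]+", "<hex>", message)
    return re.sub(r"\d+", "<n>", message).strip()


def summarize(lines) -> Counter:
    """Count error signatures in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[signature(match.group(2))] += 1
    return counts


def needs_attention(today: Counter, baseline: dict, growth: float = 2.0) -> list:
    """Signatures that are new, or at least `growth` times more frequent than before."""
    flagged = []
    for sig, count in today.items():
        before = baseline.get(sig, 0)
        if before == 0 or count >= growth * before:
            flagged.append(sig)
    return sorted(flagged)
```

Running this daily and reviewing only the flagged signatures keeps the review short while still catching new kinds of errors early.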
What to do when something is wrong?
Application 1, Application 2, Database and Operating system, watched by a Monitoring application
Alert
Restart!
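The alert-then-restart loop above can be sketched as a small watchdog. This Python example is an illustration, not a replacement for a real process supervisor. The `Watchdog` class, its health-check callable and the restart limit are all hypothetical. If the service has exited or fails its health check, the watchdog sends an alert and restarts it. After too many restarts it stops and escalates, because repeated restarts usually hide a problem that needs root cause analysis.

```python
import subprocess


class Watchdog:
    """Self-healing sketch: alert and restart a service when it dies or looks unhealthy."""

    def __init__(self, cmd, health_check, alert=print, max_restarts=3):
        self.cmd = cmd                    # command that starts the service
        self.health_check = health_check  # callable returning True when the service is fine
        self.alert = alert                # where alerts go (print by default)
        self.max_restarts = max_restarts
        self.restarts = 0
        self.proc = subprocess.Popen(cmd)

    def tick(self) -> str:
        """One monitoring cycle: returns 'ok', 'restarted' or 'escalated'."""
        if self.proc.poll() is None and self.health_check():
            return "ok"
        self.alert(f"service unhealthy (exit code: {self.proc.poll()})")
        if self.restarts >= self.max_restarts:
            self.alert("restart limit reached, a human needs to look at this")
            return "escalated"
        if self.proc.poll() is None:  # still running but unhealthy: stop it first
            self.proc.terminate()
            try:
                self.proc.wait(timeout=10)
            except subprocess.TimeoutExpired:
                self.proc.kill()
                self.proc.wait()
        self.proc = subprocess.Popen(self.cmd)
        self.restarts += 1
        return "restarted"
```

A monitoring loop would call `tick()` every few seconds. The health check could be the HTTP probe from the end-user monitoring step.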
Fault tolerant systems
Design for failure
Fail fast
Collect failure data
Restore to known state
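The four principles above can be combined in a few lines. The Python sketch below is illustrative only. `apply_change`, `failure_log` and the validation callable are hypothetical names, not part of any tool mentioned in the talk. A change is applied to a copy of the current state and validated immediately (fail fast). Any exception is recorded with its traceback (collect failure data), and the caller gets the untouched known-good state back (restore to known state). Callers can therefore always rely on receiving a valid state (design for failure).

```python
import copy
import time
import traceback

failure_log = []  # hypothetical sink for collected failure data


def apply_change(state: dict, change, validate) -> dict:
    """Apply `change` to a copy of `state`; on any failure keep the known-good state."""
    try:
        candidate = change(copy.deepcopy(state))
        validate(candidate)  # fail fast: reject a bad state before it is used
        return candidate
    except Exception as exc:
        # Collect failure data for later root cause analysis.
        failure_log.append({
            "time": time.time(),
            "error": repr(exc),
            "traceback": traceback.format_exc(),
            "state": copy.deepcopy(state),
        })
        return state  # restore: the original was never modified
```

Working on a copy is what makes the restore step trivial here. In real infrastructure the "known state" is usually a versioned configuration or a previous release.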
Monitoring tools
Managing infrastructure with code is fun!
OS, DB, Pkg1, Pkg2, App1