
Automated everything…now what?

Pieter Lexis

Kumina bv

cfgmgmtcamp 2014

Automated everything …now what? - Pieter Lexis Feb. 3rd 2014 1 / 24


What I’ll discuss today

• Quick-wins for, and automations to, cfgmgmt
• Advanced and cool things with basic cfgmgmt in place
• How to covertly get your client to do cfgmgmt for you


whoami

Pieter Lexis

• SysAdmin for 5 years (full-time for the past 7 months)

• MSc System and Network Engineering
• Protocol-nerd
• Security-geek


Kumina

• Founded in 2007
• 100% based on FLOSS
• Everything Puppet-based (but moving to Ansible)

Stats

• Almost 4 FTE
• 42 KSLOC of puppet code
• 220+ Machines
• 10k+ checks in Icinga


Quick-wins with cfgmgmt – Monitoring and trending

What monitoring?

The phone rings

Client: “We’re getting errors while connecting to the site”
You: *open the site*
Your browser: zOMG Cert has expired


Quick-wins with cfgmgmt – Monitoring and trending

That monitoring

Monitor everything and its dependencies by default.

• What to check
  • Validity date
  • OCSP revocation status
  • Key+chain matches the one in cfgmgmt

• Where to check
  • On disk
  • In the TLS connection
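
The validity-date check can be sketched as a small Icinga-style plugin. This is a minimal illustration, not the check used at Kumina: the function names and the 30/7-day thresholds are assumptions, and the OCSP and key+chain checks are left out. It uses only the Python standard library.

```python
import socket
import ssl
import time

# Exit codes understood by Icinga/Nagios plugins.
OK, WARNING, CRITICAL = 0, 1, 2


def days_left(not_after: str, now: float) -> float:
    """Days between `now` (epoch seconds) and a certificate's notAfter field."""
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def classify(days: float, warn_days: float = 30, crit_days: float = 7) -> int:
    """Map remaining validity to a plugin status (thresholds are assumptions)."""
    if days <= crit_days:
        return CRITICAL
    if days <= warn_days:
        return WARNING
    return OK


def check_tls(host: str, port: int = 443) -> int:
    """Check the certificate that is actually served in the TLS connection."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=10) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                not_after = tls.getpeercert()["notAfter"]
    except ssl.SSLCertVerificationError:
        # An expired or otherwise invalid certificate fails the handshake.
        return CRITICAL
    return classify(days_left(not_after, time.time()))
```

The same `classify(days_left(...))` pair can be fed the notAfter value of the certificate file on disk, so both places to check share one threshold policy.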


Quick-wins with cfgmgmt – Built-in security

Which database?

The phone rings

Client: “One of our devs accidentally wrote to the production database”
You: “Uhm what?”
Client: “How could that have happened?”


Quick-wins with cfgmgmt – Built-in security

The production database!

Security in depth and principle of least privilege by default

• Firewall
  • Deny all by default
  • Allow specific source host/port

• Database
  • Don’t create 'user'@'%'
  • Different passwords
  • Don’t give every dev the production password ;-)
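
The database rules could be enforced by whatever generates the accounts. The sketch below is an assumption about how that might look: `grant_statements` is a hypothetical helper, and the chosen privileges are only an example. It refuses wildcard hosts and gives every account its own random password.

```python
import secrets


def grant_statements(user: str, database: str, hosts: list[str]) -> list[str]:
    """Build MySQL statements that create one account per allowed source host."""
    statements = []
    for host in hosts:
        if "%" in host:
            # 'user'@'%' would allow connections from anywhere.
            raise ValueError(f"wildcard host not allowed: {host!r}")
        password = secrets.token_urlsafe(24)  # a different password per account
        statements.append(
            f"CREATE USER '{user}'@'{host}' IDENTIFIED BY '{password}';"
        )
        statements.append(
            f"GRANT SELECT, INSERT, UPDATE, DELETE ON `{database}`.* "
            f"TO '{user}'@'{host}';"
        )
    return statements
```

Calling `grant_statements("shop", "shop_prod", ["192.0.2.10"])` yields a `CREATE USER` and a `GRANT` statement for that single host, while passing `"%"` raises a `ValueError`.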


Quick-wins with cfgmgmt – KISS, DRY etc.

Why copy+paste?

sshd[32597]: Invalid user florian from 203.0.113.15
sshd[32610]: Invalid user jorge from 203.0.113.15
sshd[32633]: Invalid user dennis from 203.0.113.15
sshd[32636]: Invalid user kate from 203.0.113.15
sshd[32709]: Invalid user george from 203.0.113.15


Quick-wins with cfgmgmt – KISS, DRY etc.

When you can generalize

Fix once, test once, deploy everywhere

• Ops is going to use Dev techniques
• Why not use Dev best practices?

• KISS – Add only the things you need
• DRY – Use includes and generalize
• Testing – “Test-driven Ops”

• Basic firewall rules
• Extend for each server
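
The "basic firewall rules, extend for each server" idea might look like the sketch below. The rule set and the helper names `allow` and `rules_for` are assumptions for illustration; only standard iptables options are used.

```python
# Shared base: written once, tested once, included on every server.
BASE_RULES = [
    "iptables -P INPUT DROP",  # deny all by default
    "iptables -A INPUT -i lo -j ACCEPT",
    "iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT",
]


def allow(source: str, port: int, proto: str = "tcp") -> str:
    """Allow one specific source host or network to reach one port."""
    return f"iptables -A INPUT -p {proto} -s {source} --dport {port} -j ACCEPT"


def rules_for(extra: list[str]) -> list[str]:
    """Full rule set for a server: the shared base plus its own additions."""
    return BASE_RULES + extra


# Each server only states what is specific to it.
web_rules = rules_for([allow("0.0.0.0/0", 443)])
db_rules = rules_for([allow("192.0.2.10", 3306)])
```

A fix to `BASE_RULES` then reaches the web and database servers alike, without anyone copying rules between them.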


Quick-wins with cfgmgmt – Conclusion

What is a solid cfgmgmt base?

3 Mantras

• Monitor everything and its dependencies
• Security in depth and principle of least privilege
• Fix once, test once, deploy everywhere

Does your cfgmgmt …

• Deploy and configure services?
• Manage the firewall?
• Manage databases?
• Monitor and trend everything it deploys?
• All of the above
• More?


After basic cfgmgmt

Now what?

Automate even more


After basic cfgmgmt – Do more with data

Get a grip

Ever made one of these?

grep 192.0.2. /var/log/apache2/*-access.log |\
grep -v '^-' | grep '20/Dec' | awk '{print $NF}' |\
sort -u | xargs -l host

• Sucks, doesn’t it?
• And you’ll write it anew the next time


After basic cfgmgmt – Do more with data

Get a grip

Collect and parse logs

• Fluentd
• Flume
• Logstash
• Splunk

And display them

• Kibana
• Splunk
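
What "parse" buys you over an ad hoc grep pipeline can be shown with a small sketch. It is not how any of the tools above work internally. It assumes Apache's common log format, and the names `LOG_LINE`, `parse` and `clients_on` are made up for the example.

```python
import re
from collections import Counter

# Assumes Apache common log format; combined-format lines also match.
LOG_LINE = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse(line: str):
    """Turn one access-log line into a dict of named fields, or None."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


def clients_on(lines, day: str) -> Counter:
    """Count requests per client on a given day such as '20/Dec'."""
    counts = Counter()
    for line in lines:
        record = parse(line)
        if record and record["time"].startswith(day):
            counts[record["client"]] += 1
    return counts
```

Once lines are structured records, the next question ("which paths returned a 500 yesterday?") is a filter on existing fields instead of a new pipeline written from scratch.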




After basic cfgmgmt – A grip on security

A new exploit is out
And there are no patches yet

Being in control with cfgmgmt

1. You know where $vulnerable_app lives
2. There’s probably a way to detect infection
3. You control the monitoring app
4. Just deploy these checks automatically
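
The four steps above can be sketched as follows. Everything named here is hypothetical: the inventory dict stands in for whatever cfgmgmt knows about deployed apps, `check_php5_infection` is an invented check command, and `generic-service` is assumed to exist as a service template. The output uses the Icinga 1.x object syntax.

```python
# Hypothetical inventory: host -> apps that cfgmgmt deployed there.
INVENTORY = {
    "web1.example.com": ["apache2", "php5"],
    "web2.example.com": ["nginx"],
    "db1.example.com": ["mysql"],
}


def hosts_running(app: str, inventory: dict) -> list:
    """Step 1: cfgmgmt already knows where the vulnerable app lives."""
    return sorted(host for host, apps in inventory.items() if app in apps)


def icinga_service(host: str, app: str, check_command: str) -> str:
    """Steps 3 and 4: emit a service object for the monitoring system."""
    return (
        "define service {\n"
        f"    host_name            {host}\n"
        f"    service_description  Infection check for {app}\n"
        f"    check_command        {check_command}\n"
        "    use                  generic-service\n"
        "}\n"
    )


config = "".join(
    icinga_service(host, "php5", "check_php5_infection")
    for host in hosts_running("php5", INVENTORY)
)
```

Step 2, the detection itself, lives in the check command and depends entirely on the exploit in question.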


After basic cfgmgmt – Don’t wake up

Being on-call sucks

• Getting false-positives sucks
• Getting false-negatives sucks even more

A disk is almost full

• You shouldn’t care about this at night

The disk will be full within an hour

• You should care about this at night
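
The difference between "almost full" and "full within an hour" is a trend, not a threshold. A minimal sketch, assuming the trending system can hand over recent (timestamp, bytes used) samples; the function names and the one-hour horizon are assumptions, and the fit is a plain least-squares line.

```python
def seconds_until_full(samples, capacity):
    """Extrapolate (epoch_seconds, used_bytes) samples with a least-squares line.

    Returns the estimated seconds until `capacity` is reached, or None when
    usage is not growing or there is too little data.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None
    _, last_used = samples[-1]
    return max(0.0, (capacity - last_used) / slope)


def should_page(samples, capacity, horizon=3600):
    """Wake someone up only if the disk is predicted to fill within `horizon`."""
    eta = seconds_until_full(samples, capacity)
    return eta is not None and eta <= horizon
```

With this, a disk that is 90% full but barely growing stays quiet at night, while a disk at 20% that gains 1% per minute pages.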


One step beyond

Has this ever happened to you?

Client: “Could you point dev.mydomain.com to our website?”
You: *code code code*
Client: “Ok, that’s good, but I need this PHP variable altered”
You: *code code code* (in the config you just touched)
Client: “Now I need this small change”

Lather, rinse, repeat


One step beyond

Create less work for yourself

• Filling in templates
• You know the (un)safe values
• Why not have your client fill those in?


One step beyond

Involve the client I

What your client does

• Realize what they need changed
• Edit the cfgmgmt code
• Submit {patches,merge-requests}


One step beyond

Involve the client II

What you do

• Give them access to the VCS
• Verify patches
• Apply patches and deploy
• Explain why setting X isn’t correct


One step beyond

Make it easier on the client I

What your client does

• Fill in a web form


One step beyond

Make it easier on the client II

What the application does

• Check if the value is safe
• Add the config to your cfgmgmt code
• Deploy

What you do

• Build the backend
• Set sane defaults
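
The "check if the value is safe" step of such a backend might look like the sketch below. The allowlist, its limits and the function names are assumptions for illustration; the setting names are real php.ini directives, chosen because the earlier example was a PHP variable.

```python
import re

# Hypothetical allowlist: setting name -> test that decides if a value is safe.
SAFE_SETTINGS = {
    "memory_limit": lambda v: (
        re.fullmatch(r"[0-9]{1,4}M", v) is not None and int(v[:-1]) <= 512
    ),
    "display_errors": lambda v: v in ("On", "Off"),
    "max_execution_time": lambda v: (
        re.fullmatch(r"[0-9]{1,3}", v) is not None and 1 <= int(v) <= 300
    ),
}


def validate(setting: str, value: str):
    """Reject settings that are not offered and values outside the safe range."""
    check = SAFE_SETTINGS.get(setting)
    if check is None:
        raise ValueError(f"setting not offered to clients: {setting}")
    if not check(value):
        raise ValueError(f"unsafe value for {setting}: {value!r}")
    return setting, value


def render_php_ini(settings: dict) -> str:
    """Render validated settings as php.ini lines for the cfgmgmt code."""
    pairs = sorted(validate(name, value) for name, value in settings.items())
    return "".join(f"{name} = {value}\n" for name, value in pairs)
```

The form only ever offers the keys of `SAFE_SETTINGS`, so the client can change what you have declared safe and nothing else. Committing the rendered fragment to the VCS and triggering a deploy is left out here.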


Automated everything …now what?