R.I. Pienaar's talk "Managing Puppet using MCollective" at Puppet Camp Ghent, 2013 and at Puppet Camp New York 2013.
R.I.Pienaar
Puppet Camp Ghent
Managing Puppet using MCollective
R.I.Pienaar | [email protected] | http://devco.net | @ripienaar
Who am I?
• Puppet user since 0.22.x
• Architect of MCollective
• Author of Extlookup and Hiera
• Developer at Puppet Labs London
• Blog at http://devco.net
• Tweets at @ripienaar
• Volcane on IRC
The Problem?
• Puppet needs management just like other software
• Enabling, disabling, ad-hoc runs, custom environments etc
• The Puppet Master is a finite resource that needs protection
• Orchestrated deploys
Available on yum.puppetlabs.com and apt.puppetlabs.com
http://srt.ly/mcpuppet
package { ["mcollective-puppet-agent", "mcollective-puppet-client"]: ensure => present }
MCollective Puppet Agent
Obtaining The Agent Status
Obtaining Statuses
$ mco puppet status
* [ ============================================================> ] 11 / 11
node8.example.net: Currently stopped; last completed run 14 minutes 16 seconds ago ....
Summary of Applying:
false = 11
Summary of Daemon Running:
stopped = 11
Summary of Enabled:
enabled = 10
disabled = 1
Summary of Idling:
false = 11
Finished processing 11 / 11 hosts in 72.05 ms
Per node status
Estate wide summary
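The estate-wide summary is a tally of one field across all per-node replies. A minimal Python sketch of that aggregation follows. The reply dictionaries and their `enabled` and `applying` keys are hypothetical stand-ins chosen to mirror the summary headings; they are not the agent's confirmed reply schema.

```python
from collections import Counter

def summarize(replies, field):
    """Count how many nodes reported each value of `field`."""
    return Counter(reply[field] for reply in replies)

# Hypothetical replies: 11 nodes, node9 disabled, nothing applying.
replies = [
    {"node": f"node{i}.example.net", "enabled": i != 9, "applying": False}
    for i in range(11)
]

enabled = summarize(replies, "enabled")
applying = summarize(replies, "applying")
print(f"enabled = {enabled[True]}, disabled = {enabled[False]}")
print(f"applying false = {applying[False]}")
```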
$ mco puppet count
Total Puppet nodes: 11
Nodes currently enabled: 10
Nodes currently disabled: 1
Nodes currently doing puppet runs: 5
Nodes currently stopped: 6
Nodes with daemons started: 10
Nodes without daemons started: 1
Daemons started but idling: 6
Obtaining Statuses
$ mco rpc puppet last_run_summary
* [ ============================================================> ] 28 / 28
. . .
Summary of Config Retrieval Time:
Average: 20.13
Summary of Total Resources:
Average: 435
Summary of Total Time:
Average: 39.33
Finished processing 28 / 28 hosts in 311.23 ms
Obtaining Statuses
Running Puppet
$ mco puppet runonce
* [ ============================================================> ] 11 / 11
node9.example.net Request Aborted Puppet is disabled: 'machine under maintenance'
Finished processing 11 / 11 hosts in 2593.85 ms
$ mco puppet count
Total Puppet nodes: 11
Nodes currently enabled: 10
Nodes currently disabled: 1
Nodes currently doing puppet runs: 2
Nodes currently stopped: 9
Nodes with daemons started: 10
Nodes without daemons started: 1
Daemons started but idling: 8
Doing Basic Runs
Puppet 3 disable message
Run with default configured splay and splaylimit
Run with no splay, still subject to enable/disable
$ mco puppet runonce -f
* [ ============================================================> ] 11 / 11
node9.example.net Request Aborted Puppet is disabled: 'machine under maintenance'
Finished processing 11 / 11 hosts in 2661.99 ms
Doing Basic Runs
Force splay and set a custom splay limit
$ mco puppet runonce --splay --splaylimit 120
* [ ============================================================> ] 11 / 11
node9.example.net Request Aborted Puppet is disabled: 'machine under maintenance'
Finished processing 11 / 11 hosts in 2661.99 ms
Doing Basic Runs
Selects 2 tags in a specific Puppet Environment
$ mco puppet runonce --tag webserver --tag syslog --environment development
* [ ============================================================> ] 11 / 11
node9.example.net Request Aborted Puppet is disabled: 'machine under maintenance'
Finished processing 11 / 11 hosts in 2661.99 ms
Tags and Environment
Do a noop run, gathers reports and audit information
$ mco puppet runonce --noop
* [ ============================================================> ] 11 / 11
node9.example.net Request Aborted Puppet is disabled: 'machine under maintenance'
Finished processing 11 / 11 hosts in 2661.99 ms
Doing noop Runs
When puppet.conf has noop=true, do an actual run on demand
$ mco puppet runonce --tag webserver --no-noop
* [ ============================================================> ] 11 / 11
node9.example.net Request Aborted Puppet is disabled: 'machine under maintenance'
Finished processing 11 / 11 hosts in 2661.99 ms
Doing no-noop Runs
Does a single run against a different Puppet Master
$ mco puppet runonce --server secops.example.net:8134 --tag compliance
* [ ============================================================> ] 11 / 11
node9.example.net Request Aborted Puppet is disabled: 'machine under maintenance'
Finished processing 11 / 11 hosts in 2661.99 ms
Choosing a Master
Preventing Puppet Runs
The Big Red Button
Disables Puppet; does not change the disable reason on nodes that are already disabled
$ mco puppet disable "we f'd up, stop the train!"
* [ ============================================================> ] 11 / 11
node9.example.net Request Aborted Could not disable Puppet: Already disabled
Summary of Enabled:
disabled = 11
Finished processing 11 / 11 hosts in 90.06 ms
The Big Green Button
Enables the nodes whose disable message matches /stop the train/; nodes disabled for other reasons stay disabled
$ mco puppet enable -S 'puppet().disable_message=/stop the train/'
* [ ============================================================> ] 10 / 10
Summary of Enabled:
enabled = 10
Finished processing 10 / 10 hosts in 90.06 ms
Operating On Groups Of Hosts
Selective Runs
Run using a filter: all web servers with fact cluster=a
$ mco puppet runonce -W "cluster=a roles::webserver"
* [ ============================================================> ] 5 / 5
Finished processing 5 / 5 hosts in 90.06 ms
cluster=a is a Facter fact; roles::webserver is a Puppet class
Selective Runs
Run using a filter: nodes where we manage /srv/www
$ mco puppet runonce -S "resource('File[/srv/www]').managed=true"
* [ ============================================================> ] 5 / 5
Finished processing 5 / 5 hosts in 90.06 ms
Any Puppet resource
Selective Runs
Run using a filter: nodes whose most recent run had config_version xyz and more than 5 resource failures
$ mco puppet runonce -S "resource().failed_resources>5 and resource().config_version=xyz"
* [ ============================================================> ] 5 / 5
Finished processing 5 / 5 hosts in 90.06 ms
Runs all nodes with a maximum concurrency
$ mco puppet runall 7
2013-01-19 20:58:59: Running all nodes with a concurrency of 7
2013-01-19 20:58:59: Discovering enabled Puppet nodes to manage
2013-01-19 20:59:02: Found 11 enabled nodes
2013-01-19 20:59:06: node3.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:07: node1.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:09: node4.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:10: node6.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:12: node0.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:13: node5.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:17: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:21: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:25: node9.example.net schedule status: Puppet is currently applying a catalog, cannot run now
2013-01-19 20:59:29: node8.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:33: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:38: node2.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:41: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:46: middleware.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:50: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:55: node7.example.net schedule status: Started a background Puppet run
Roll Out A Change Quickly
Does not attempt to manage disabled nodes
2013-01-19 20:58:59: Running all nodes with a concurrency of 7
2013-01-19 20:58:59: Discovering enabled Puppet nodes to manage
2013-01-19 20:59:02: Found 11 enabled nodes
Roll Out A Change Quickly
Starts the first 6 quickly, but takes into account 1 other run that an administrator is doing at the same time
2013-01-19 20:59:02: Found 11 enabled nodes
2013-01-19 20:59:06: node3.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:07: node1.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:09: node4.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:10: node6.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:12: node0.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:13: node5.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:17: Currently 7 nodes applying the catalog; waiting for less than 7
Roll Out A Change Quickly
node9 was already being run by an administrator or the normal schedule, so it was skipped in favour of the next node
2013-01-19 20:59:17: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:21: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:25: node9.example.net schedule status: Puppet is currently applying a catalog, cannot run now
2013-01-19 20:59:29: node8.example.net schedule status: Started a background Puppet run
Roll Out A Change Quickly
Regularly checks the concurrency and starts more nodes as soon as possible.
Average node run time 34.39 s; total time 55 seconds
2013-01-19 20:59:29: node8.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:33: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:38: node2.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:41: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:46: middleware.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:50: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:55: node7.example.net schedule status: Started a background Puppet run
Roll Out A Change Quickly
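The behaviour visible in the runall log (start nodes until the concurrency limit is reached, wait, skip nodes that are already applying a catalog) can be sketched as a simple loop. This is a Python illustration of that scheduling idea only, not the mco implementation. `run_all`, `FakeEstate`, and the tick-based timing are all hypothetical names and assumptions made for the example.

```python
def run_all(nodes, concurrency, applying, start_run, wait):
    """Start a run on every node, never letting more than `concurrency` runs be active."""
    started, skipped = [], []
    pending = list(nodes)
    while pending:
        if applying() >= concurrency:
            wait()  # estate is busy: pause, then re-check the count
            continue
        node = pending.pop(0)
        # start_run returns False when the node is already applying a catalog
        (started if start_run(node) else skipped).append(node)
    return started, skipped


class FakeEstate:
    """Toy stand-in for the estate; a run lasts a fixed number of wait() calls."""

    def __init__(self, run_ticks, busy):
        self.run_ticks = run_ticks
        self.remaining = dict(busy)  # node -> ticks left for runs already in progress
        self.peak = len(self.remaining)

    def applying(self):
        return len(self.remaining)

    def start(self, node):
        if node in self.remaining:
            return False
        self.remaining[node] = self.run_ticks
        self.peak = max(self.peak, len(self.remaining))
        return True

    def wait(self):
        # One tick passes; runs with no ticks left are finished.
        self.remaining = {n: t - 1 for n, t in self.remaining.items() if t > 1}


nodes = ["node3", "node1", "node4", "node6", "node0", "node5",
         "node9", "node8", "node2", "middleware", "node7"]
estate = FakeEstate(run_ticks=2, busy={"node9": 5})  # node9 is mid-run already
started, skipped = run_all(nodes, 7, estate.applying, estate.start, estate.wait)
print("started:", started)
print("skipped:", skipped)
print("peak concurrent runs:", estate.peak)
```

The loop re-checks the live count of applying nodes rather than counting only its own starts, which is why a run begun by someone else still consumes a slot.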
Does runonce in batches of 5 with a 5 minute sleep per batch. Press ^C after any batch to stop.
15 minute total run time.
$ mco puppet runonce --batch 5 --batch-sleep 300
* [ ============================================================> ] 11 / 11
Finished processing 11 / 11 hosts in 903686.29 ms
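The 15 minute figure follows from the batch arithmetic: 11 nodes in batches of 5 is 3 batches. A small Python sketch of that calculation is below. It assumes one sleep per batch, including the last; that assumption is inferred from the reported 903686.29 ms and is not confirmed against the mco documentation. The function name is hypothetical.

```python
import math

def batched_rollout(nodes, batch_size, batch_sleep):
    """Return (number of batches, total seconds spent sleeping)."""
    batches = math.ceil(nodes / batch_size)
    # Assumption: one sleep after every batch, including the final one.
    return batches, batches * batch_sleep

batches, sleep_total = batched_rollout(nodes=11, batch_size=5, batch_sleep=300)
print(f"{batches} batches, {sleep_total} s sleeping ({sleep_total / 60:.0f} minutes)")
```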
Roll Out A Change Slowly
Wait 5 minutes
Advanced Status And Performance Metrics
Distribution of various metrics.
$ mco puppet summary
Summary statistics for 28 nodes:
Total resources: ▂▇▂▁▁▃▁▂▂▂▄▁▂▁▁▁▁▁▂▁ min: 332.0 max: 695.0
Out Of Sync resources: ▇▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ min: 0.0 max: 2.0
Failed resources: ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ min: 0.0 max: 0.0
Changed resources: ▇▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ min: 0.0 max: 2.0
Config Retrieval time (seconds): ▆▇▅▄▁▃▃▁▁▁▃▁▁▄▂▁▁▁▁▁ min: 2.7 max: 57.1
Total run-time (seconds): ▇▃▄▄▄▃▂▂▂▂▃▂▁▁▁▁▁▂▁▁ min: 7.0 max: 125.1
Time since last run (seconds): ▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂ min: 10.0 max: 89.0k
Performance Analysis
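Each bar line in the summary reads as a histogram: the range from min to max is cut into equal buckets and the bar height shows how many nodes fall into each one. A minimal Python sketch of that idea follows. The bucket count of 20, the 7 bar glyphs, and the sample values are assumptions taken from how the output looks, not from the mco source.

```python
BARS = "▁▂▃▄▅▆▇"

def sparkline_histogram(values, buckets=20):
    """Render the distribution of `values` as one bar character per bucket."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / buckets or 1  # fall back to 1 when all values are equal
    counts = [0] * buckets
    for value in values:
        index = min(int((value - lo) / width), buckets - 1)
        counts[index] += 1
    peak = max(counts)
    # Scale each count to a bar height relative to the fullest bucket.
    return "".join(BARS[round(c / peak * (len(BARS) - 1))] for c in counts)

# Hypothetical config retrieval times in seconds.
times = [2.7, 3.1, 4.0, 5.5, 5.9, 6.2, 8.0, 12.4, 30.5, 57.1]
line = sparkline_histogram(times)
print(f"Config Retrieval time (seconds): {line} min: {min(times)} max: {max(times)}")
```

A long flat tail with a bump at the right-hand end, as in the "Time since last run" line, points at a few outlier nodes worth investigating.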
Distribution of various metrics.
Config Retrieval time (seconds): ▆▇▅▄▁▃▃▁▁▁▃▁▁▄▂▁▁▁▁▁ min: 2.7 max: 57.1
Total run-time (seconds): ▇▃▄▄▄▃▂▂▂▂▃▂▁▁▁▁▁▂▁▁ min: 7.0 max: 125.1
Performance Analysis
Distribution of config retrieval time.
$ mco plot resource config_retrieval_time
[Plot: "Information about Puppet managed resources". Y-axis: Nodes (0 to 8). X-axis: Config Retrieval Time (0 to 60).]
Performance Analysis
Slow machines
Find machines with config_retrieval_time over 30 seconds: all the dev servers.
$ mco find -S "resource().config_retrieval_time > 30"
dev3.example.net
dev4.example.net
dev7.example.net
dev6.example.net
dev8.example.net
dev9.example.net
dev10.example.net
Performance Analysis
Maintenance Windows and Access Control
Only cert=manager can enable and disable the Puppet Agent, indicating maintenance periods
policy default deny
allow   cert=manager   enable disable   *   *
allow   cert=sysadmin   runonce status   *   *
allow   cert=developer   *   environment=development   *
Puppet State As ACL
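To make the default-deny policy concrete, here is a simplified Python sketch of how such a file could be evaluated. It assumes tab-separated fields in the order verdict, caller, actions, facts, classes, and that the first matching line wins. It handles only a single `fact=value` test and ignores the classes field. `parse_policy` and `authorized` are hypothetical helpers, not part of MCollective.

```python
POLICY = (
    "policy default deny\n"
    "allow\tcert=manager\tenable disable\t*\t*\n"
    "allow\tcert=sysadmin\trunonce status\t*\t*\n"
    "allow\tcert=developer\t*\tenvironment=development\t*\n"
)

def parse_policy(text):
    """Split a policy into its default verdict and a list of rules."""
    default, rules = "deny", []
    for line in text.strip().splitlines():
        fields = line.split("\t")
        if fields[0].startswith("policy default"):
            default = fields[0].split()[-1]
        else:
            verdict, caller, actions, facts, classes = fields
            rules.append((verdict, caller, actions.split(), facts, classes))
    return default, rules

def authorized(policy, caller, action, facts):
    """Return True if the first matching rule allows the request."""
    default, rules = parse_policy(policy)
    for verdict, rule_caller, actions, fact_rule, _classes in rules:
        if rule_caller not in ("*", caller):
            continue
        if actions != ["*"] and action not in actions:
            continue
        if fact_rule != "*":
            name, _, value = fact_rule.partition("=")
            if facts.get(name) != value:
                continue
        return verdict == "allow"  # classes field is ignored in this sketch
    return default == "allow"

print(authorized(POLICY, "cert=manager", "disable", {}))
print(authorized(POLICY, "cert=sysadmin", "disable", {}))
print(authorized(POLICY, "cert=developer", "runonce", {"environment": "development"}))
```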
Puppet State As ACL
policy default deny
allow   cert=manager   stop start   *   *
allow   cert=noc   stop start   puppet().enabled=false
allow   cert=developer   *   environment=development   *
NOC can start and stop services only during a maintenance window.
Manager user can always override maintenance windows.
What is MCollective?
• Ruby framework for writing Orchestration systems
• Provides Authentication, Authorization and Auditing
• No direct communication between client and nodes
Questions?
twitter: @ripienaar
email: [email protected]
blog: www.devco.net
github: ripienaar
freenode: Volcane