Puppet Deployment at OnApp Wai Keen Woon CTO, CDN Division [email protected]


DESCRIPTION

From PuppetCamp Southeast Asia 2012 in Kuala Lumpur, Malaysia.


WARNING

<ObligatoryPlug>

About OnApp

OnApp launched July 1st 2010

Backed by LDC

The leading cloud management software for hosts

The instant global CDN for hosts

Deep industry knowledge

100+ employees in US, EU, APAC

A leading provider of software for hosts

Vital Statistics

1 in 3 public clouds

800+ cloud deployments

300+ global clients

Customer Stories

Instant CDN that gives you…

75+ PoPs

low cost, high margin

get paid for idle capacity

OK.

</ObligatoryPlug>

Systems Overview

• Core & Development
  ◦ ~20 physical servers
  ◦ ~200 VMs
  ◦ Homogeneous environment – 64-bit Debian everywhere
  ◦ Mainly use OpenVZ and KVM for virtualization
• CDN Delivery Edge Servers
  ◦ 100+ servers in 60+ cities
  ◦ Running on the OnApp platform – either Xen or KVM
• Puppet integral to our setup – since day 1

Why Puppet?

• More reliable configuration of servers. Less need to “run ssh in a for loop” and miss something.
• Self-documenting – our manifests are almost able to bootstrap an empty server.
  ◦ Our manifests can't bootstrap an empty environment yet.
  ◦ Limitation – manifests describe what/where/how something is set up, but don't describe *why*.
• Nice syntax – easy on the eyes. Comprehensive built-in resource types. Able to fall back to dumb ways of doing things if required (use file, exec et al.).

Core Infra Environments

• Systems manifest describes everything.
• Three environments: alpha (α), beta (β), omega (Ω).

What Would OnApp Setup...

• Essential utilities (tcpdump, less, vim, etc.).
• Users & their SSH keys, sudoers.
  ◦ Developer's shell => /bin/false if production
• Base firewall rules.
• Nagios agent.
• Set uniform locality settings: UTC timezone, en_US.UTF-8 locale.
• SMTP that smarthosts to our central relay.
• Syslogd for remote logs to central logging server.
• Finally, the services.

Core Infra Manifest Excerpt

# Env config definitions (blue on the original slide)
$portal_domain  = "portal.alpha.onappcdn.com"
$portal_db_host = "portal.alpha.onappcdn.com"
$portal_db_user = "aflexi_webportal"

$auth_nameservers = {
  "ns1" => "175.143.72.214",
  "ns2" => "175.143.72.214",
  "ns3" => "175.143.72.214",
  "ns4" => "175.143.72.214",
}

$monitoring_host_server = [
  "monitoring.alpha.onappcdn.com",
  "dns.alpha.onappcdn.com"
]

# Node definitions (red on the original slide)
node "monitoring.alpha.onappcdn.com" {
  include base
  include s_db_monitoring
  include s_monitoring_server
  include collectd::rrdcached
  include s_munin
  include s_monitoring_alerts
  include s_monitoring_graph
}

# Class definitions (green on the original slide)
class collectd::rrdcached {
  package { "rrdcached":
    ensure => latest,
  }

  service { "rrdcached":
    ensure => running,
  }
}

Package Repo Integration

• Jenkins builds debs of our code and stores them in an apt repository for the environment they are built for.
• Puppet keeps packages up-to-date (ensure => latest) and restarts services on package upgrades.

    puppet-agent[25431]: (/Stage[main]/Debian/Exec[apt-get-update]/returns) executed successfully
    puppet-agent[25431]: (/Stage[main]/Python::Aflexi::Mq/Package[python-aflexi-mqcore]/ensure) ensure changed '7065.20120530.113915-1' to '7066.20120604.090916-1'
    puppet-agent[25431]: (/Stage[main]/S_mq/Service[worker-rabbitmq]) Triggered 'refresh' from 1 events
    puppet-agent[25431]: Finished catalog run in 16.08 seconds
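Upgrade events like the one in the log excerpt can be pulled out of syslog mechanically. The following is a minimal sketch, assuming the log line format shown in the excerpt; the function name `package_upgrades` and the regular expression are illustrative and not part of the original deployment.

```python
import re

# Matches the "ensure changed" event that Puppet logs for a Package resource,
# in the format shown in the log excerpt (an assumption about the log layout).
UPGRADE_RE = re.compile(
    r"Package\[(?P<name>[^\]]+)\]/ensure\) "
    r"ensure changed '(?P<old>[^']*)' to '(?P<new>[^']*)'"
)


def package_upgrades(log_text):
    """Return (package, old_version, new_version) tuples found in log_text."""
    return [
        (m.group("name"), m.group("old"), m.group("new"))
        for m in UPGRADE_RE.finditer(log_text)
    ]


# Two lines copied from the log excerpt.
SAMPLE = (
    "puppet-agent[25431]: (/Stage[main]/Python::Aflexi::Mq/"
    "Package[python-aflexi-mqcore]/ensure) ensure changed "
    "'7065.20120530.113915-1' to '7066.20120604.090916-1'\n"
    "puppet-agent[25431]: Finished catalog run in 16.08 seconds\n"
)

if __name__ == "__main__":
    for name, old, new in package_upgrades(SAMPLE):
        print(f"{name}: {old} -> {new}")
```

Fed a whole syslog file, the same function gives a quick record of which build of each package reached a host.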

Nagios Integration

• Plugs into Nagios – uses “exported resources”.

Nagios Integration

Server manifest (exports the service that is checked):

    @@nagios_service { "check_load_$fqdn":
      check_command       => "check_nrpe_1arg!check_load",
      use                 => "generic-service",
      host_name           => $fqdn,
      service_description => "check_load",
      tag                 => $domain,
    }

Nagios service manifest (collects the resources to check):

    Nagios_service <<| tag == "onappcdn.com" |>> {
      target  => "/etc/n3/conf.d/services.cfg",
      require => Package["nagios3"],
      notify  => Exec["reload-nagios"],
    }
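The export/collect flow can be pictured with a toy model. This is only a sketch of the idea, not how Puppet implements it: `Resource`, `ExportStore` and `export_load_check` are hypothetical names, the store stands in for the shared storeconfigs database, and the hostnames are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """One exported resource, e.g. a nagios_service declared with '@@'."""
    kind: str
    title: str
    tag: str
    params: dict = field(default_factory=dict)


class ExportStore:
    """Toy stand-in for the database that all nodes export into."""

    def __init__(self):
        self._resources = {}

    def export(self, resource):
        # Keyed by (kind, title): exporting again on a later run replaces the entry.
        self._resources[(resource.kind, resource.title)] = resource

    def collect(self, kind, tag, **overrides):
        # Return matching resources with the collector's extra attributes applied.
        return [
            Resource(r.kind, r.title, r.tag, {**r.params, **overrides})
            for r in self._resources.values()
            if r.kind == kind and r.tag == tag
        ]


def export_load_check(store, fqdn, domain):
    """What each monitored node contributes during its own Puppet run."""
    store.export(Resource(
        kind="nagios_service",
        title=f"check_load_{fqdn}",
        tag=domain,
        params={"host_name": fqdn, "service_description": "check_load"},
    ))


if __name__ == "__main__":
    store = ExportStore()
    # Hypothetical hosts, for illustration only.
    export_load_check(store, "web1.example.com", "example.com")
    export_load_check(store, "db1.example.org", "example.org")
    for res in store.collect("nagios_service", tag="example.com",
                             target="/etc/n3/conf.d/services.cfg"):
        print(res.title, res.params["target"])
```

The point of the model is the division of labour: each node describes its own check, and only the Nagios server decides where the collected definitions are written and what gets reloaded.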

Nagios Integration

• What's logged on the Nagios server when puppet runs?

    puppet-agent[15293]: (/Stage[main]/Nagios::Monitor_private/Nagios_host[hrm.onappcdn.com]/ensure) created
    puppet-agent[15293]: (/Stage[main]/Nagios::Monitor_private/Nagios_service[check_load_hrm.onappcdn.com]/ensure) created
    nagios3: Nagios 3.2.1 starting... (PID=5601)
    puppet-agent[15293]: (/Stage[main]/Nagios::Base/Exec[reload-nagios]) Triggered 'refresh' from 8 events

Monitoring Puppet Itself

• Lots of tools/dashboards out there to achieve this.
• For us: “grep -i err */syslog”. Dumb, but works until we need to Really Address it.
• Common issues:
  ◦ Puppet gets “stuck”. And only one puppet instance can run at any one time.
  ◦ Manifest errors – syntax, merge issues.
  ◦ Badly-written manifests (vague dependencies, conditions/commands not robust enough).
  ◦ An important dependent resource failing (e.g. apt-get install fails due to dpkg-configure error).
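The grep one-liner can be narrowed a little without reaching for a dashboard. The sketch below assumes a central log directory laid out as `<root>/<host>/syslog`, which is what the `*/syslog` glob suggests; the function names are hypothetical. Unlike the plain grep, it only keeps lines that also mention puppet.

```python
import re
from pathlib import Path

# Same case-insensitive match as "grep -i err".
ERR_RE = re.compile(r"err", re.IGNORECASE)


def puppet_error_lines(lines):
    """Yield lines that mention puppet and match 'err', case-insensitively."""
    for line in lines:
        if "puppet" in line.lower() and ERR_RE.search(line):
            yield line.rstrip("\n")


def scan_log_root(root):
    """Scan <root>/<host>/syslog files and return {host: [matching lines]}."""
    results = {}
    for path in sorted(Path(root).glob("*/syslog")):
        with path.open(errors="replace") as handle:
            hits = list(puppet_error_lines(handle))
        if hits:
            results[path.parent.name] = hits
    return results


if __name__ == "__main__":
    import sys

    # Usage: python scan_puppet_errors.py /path/to/central/logs
    for host, hits in scan_log_root(sys.argv[1]).items():
        print(f"{host}: {len(hits)} suspicious line(s)")
```

This is still a substring match, so it inherits grep's false positives; it only saves wading through errors from unrelated daemons.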

File/Dir Organization

• We use git to revision control our puppet manifests.
• Style we adopted mainly comes from Hunter Haugen*
• A branch for each environment, plus a “common” branch.
• Each branch checked out as a separate directory in /etc/puppet/environments/$env
• And puppetmaster's includedir configured to that directory.

• Common branch
    Manifests/
      alpha.pp
      beta.pp
    Modules/
      Base/
      Users/
• Alpha env branch
    Modules/
      Python/
      Services/
      Nameserver/
• Beta env branch
    Modules/
      Python/
      Services/
      Nameserver/

* - http://hunnur.com/blog/2010/10/dynamic-git-branch-puppet-environments/

File/Dir Organization

• Common goes into its own branch – for convenience; less merging needed for manifests that we are Really Sure won't differ between environments.
• System manifest goes into common/manifests/$env.pp
  ◦ Initially tried putting manifest into alpha/beta/omega branches as site.pp – merge hell.
• Introduced extra variable – $effective_env
  ◦ Abstracts the puppet environment name from the environment that the manifest runs in.

File/Dir Organization

• Hotfixes branch off omega and are merged to alpha/beta/omega.
• Development branches off alpha.
  ◦ This branch can be trialed as a separate environment (use --environment to specify custom env on puppet client).
  ◦ Merge to alpha → beta → omega.
  ◦ Or merge as feature branch to any other environment.
• “git diff branchA branchB” – differences are shown clearly between environments.

Edge Servers

• Our edge servers are hosted on OnApp cloud (only).
• When creating an edge server, the cloud control panel
  ◦ Instantiates a VM from a lightly-customized Debian image.
  ◦ Configures the package repositories.
  ◦ Issues a puppet run to set up.
• Advantage of setting it up through puppet instead of a “gold image” – our system can be installed on bare metal if needed, and can be reproducibly installed on $future_debian_release.

Edge Servers

• Our edge servers are hosted on OnApp cloud (only).
• When creating an edge server, the control panel instantiates a VM from a lightly-customized Debian image, and issues a puppet run to set it up.

Edge Servers – External Node Classifier

• No text manifest – all code, using an “external node classifier”.
• Assign variables and classes specific to the edge server through the node classifier, e.g. its password and the services it runs.
• In Python:

    import yaml  # PyYAML

    output = {}
    output["classes"] = ["class1", "class2"]
    output["parameters"] = {"param1": "value1"}
    print(yaml.dump(output))

Edge Servers – External Node Classifier

• This YAML-encoded structure...

    $ puppet-nodeclassifier 85206671.onappcdn.com
    classes: [base, nginx]
    parameters: { edge_secret_key: 86zFsrM7Ma, monitoring_domain: monitoring.alpha.onappcdn.com }

• ... is equivalent to this textual manifest:

    node "85206671.onappcdn.com" {
      $edge_secret_key = "86zFsrM7Ma"
      $monitoring_domain = "monitoring.alpha.onappcdn.com"
      include base
      include nginx
    }
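The equivalence can be made concrete by rendering a classifier result back into manifest text. This is a minimal sketch for illustration only: `render_node_manifest` is a hypothetical helper, it quotes every parameter naively as a string without escaping, and Puppet itself never performs this conversion.

```python
def render_node_manifest(fqdn, classification):
    """Render ENC output (dict with 'classes' and 'parameters') as a node block."""
    lines = [f'node "{fqdn}" {{']
    # Parameters become top-scope style variable assignments.
    for name, value in classification.get("parameters", {}).items():
        lines.append(f'  ${name} = "{value}"')
    # Classes become include statements.
    for cls in classification.get("classes", []):
        lines.append(f"  include {cls}")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Values taken from the classifier output shown on the slide.
    classification = {
        "classes": ["base", "nginx"],
        "parameters": {
            "edge_secret_key": "86zFsrM7Ma",
            "monitoring_domain": "monitoring.alpha.onappcdn.com",
        },
    }
    print(render_node_manifest("85206671.onappcdn.com", classification))
```

Going the other way, from a database row per edge server to this dictionary, is what makes the classifier attractive when nodes are created by a control panel rather than by hand.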

Edge Servers Storedconfigs

• Puppet stores facts about the edge servers into MySQL.
• We make minimal use of this – for example, sizing nginx's in-memory cache depending on the amount of memory the server has.
• Could probably use it more, e.g. set the number of threads based on CPU core count.
• The data's always there if we ever want to query it...
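Fact-driven sizing of this kind boils down to a small calculation. The sketch below is only an illustration: the talk does not state the actual rule, so the 25% fraction, the 64 MB floor and both function names are invented assumptions, and the inputs are taken as strings because facts are commonly reported that way.

```python
def nginx_cache_mb(memory_mb, fraction=0.25, floor_mb=64):
    """Size the in-memory cache as a fraction of RAM, never below a floor.

    fraction and floor_mb are made-up defaults, not values from the talk.
    """
    return max(floor_mb, int(float(memory_mb) * fraction))


def worker_threads(cpu_cores):
    """One thread per CPU core, with at least one thread."""
    return max(1, int(cpu_cores))


if __name__ == "__main__":
    # Hypothetical fact values for one edge server.
    facts = {"memory_mb": "2048", "cpu_cores": "4"}
    print("cache size (MB):", nginx_cache_mb(facts["memory_mb"]))
    print("threads:", worker_threads(facts["cpu_cores"]))
```

In a real deployment the same arithmetic would more likely sit in the template that renders the nginx configuration, with the facts supplied by the agent.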

Q&A

• Questions? Comments?
• P.S. – final plug – we're hiring sysadmins!
