Empowering developers to deploy their own data stores.
A story of Terraform, Puppet and rage
Tomas Doran @bobtfish
Devops = Workflow
• Iterate on the things you do often
• Hide complexity
• Empower others
Artisanal hand-crafted servers
• A thing of the past (mostly)
• Need to be able to scale up and down in hours
  • If not minutes
• Need to allow people to experiment
• Cloud is expensive, unless you use it!
2-layer architecture
• ‘Infra’ layer
  • DNS / puppet / apt - basic services
  • A(WS)?nycast - failover / HA
• ‘App’ layer
  • Smartstack - Service discovery + routing
  • Paasta (Mesos + Marathon) - Scheduling + Orchestration
  • search24-reviews-uswest1aprod - ugh!
The hardest thing
• Remembering the trailing . on PTR records
  • For some people!
  • Why make them do this?
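The joke has a real cause: in a BIND zone file, a name without a trailing dot is relative to the zone's $ORIGIN, so a PTR target typed as `host.example.com` is read as `host.example.com.` with the origin appended. Below is a minimal Python sketch of generating the record instead of hand-writing it, using the standard library's `ipaddress` `reverse_pointer` property. The function names, the TTL and the `example.com` domain are assumptions for illustration, not the tooling from the talk.

```python
import ipaddress


def fqdn(name: str) -> str:
    """Return name as an absolute DNS name (exactly one trailing dot)."""
    return name.rstrip(".") + "."


def ptr_record(ip: str, hostname: str, ttl: int = 300) -> str:
    """Build a zone-file PTR line mapping ip -> hostname.

    Owner and target are both written as absolute names, so neither is
    re-qualified by the zone's $ORIGIN, whatever the caller typed.
    """
    owner = fqdn(ipaddress.ip_address(ip).reverse_pointer)
    return f"{owner} {ttl} IN PTR {fqdn(hostname)}"


if __name__ == "__main__":
    # Hypothetical host and domain, for illustration only.
    print(ptr_record("10.29.0.3", "search1-reviews-uswest1aprod.example.com"))
```

Because `fqdn` normalises the input, it no longer matters whether a person remembered the dot.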
Next logical step
• Datastore PaaS
• Elasticsearch clusters are the ‘easy’ case
  • No ‘master’ - all machines are equal
  • Automatic sharding/replication
• ASG + ELB
• Zookeeper for discovery
Environment server
• curl http://10.29.0.3:8142 (A(WS)nycast puppetmaster)
  { "habitat": "uswest1aprod" }
• "habitat", "region", "superregion", "ecosystem"
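A minimal client sketch for the environment server, assuming it answers a plain GET on the address from the slide with a flat JSON object. The slide only shows `habitat` in the reply, so which of the four location keys are actually returned is an assumption; `fetch_environment` and `parse_environment` are hypothetical names.

```python
import json
import urllib.request

# Location keys named on the slide.
LOCATION_KEYS = ("habitat", "region", "superregion", "ecosystem")


def parse_environment(body: str) -> dict:
    """Parse the environment server's JSON reply, keeping only location keys."""
    data = json.loads(body)
    return {key: data[key] for key in LOCATION_KEYS if key in data}


def fetch_environment(url: str = "http://10.29.0.3:8142", timeout: float = 2.0) -> dict:
    """Ask the (anycast) environment server where this machine lives."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_environment(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Parse a canned reply so the example runs without network access.
    print(parse_environment('{"habitat": "uswest1aprod"}'))
```

Keeping parsing separate from the HTTP call means the same function can be exercised offline, for example from a test or a custom fact.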
puppet data in modules
• Hostname: search1-reviews-uswest1aprod
  • Parse out cluster name
  elasticsearch_cluster { 'reviews': }
  puppet/modules/elasticsearch_cluster/data/cluster/reviews.yaml
• Can locate the ‘data’ directory somewhere else!
• Reuse the same YAML for service discovery + provisioning
• Commit hook validation
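The hostname-to-data lookup can be sketched in a few lines of Python. This assumes hostnames follow the `<role><index>-<cluster>-<habitat>` pattern seen in `search1-reviews-uswest1aprod`, that cluster names contain no hyphens, and that the data directory is the one on the slide; `parse_hostname` and `cluster_data_file` are hypothetical helpers, not the Puppet code from the talk.

```python
import re
from pathlib import PurePosixPath

# Assumed layout: <role><index>-<cluster>-<habitat>, e.g. search1-reviews-uswest1aprod.
# Cluster names must not contain hyphens, or the split becomes ambiguous.
HOSTNAME_RE = re.compile(
    r"^(?P<role>[a-z]+)(?P<index>\d+)-(?P<cluster>[a-z0-9_]+)-(?P<habitat>[a-z0-9]+)$"
)

DATA_DIR = PurePosixPath("puppet/modules/elasticsearch_cluster/data/cluster")


def parse_hostname(hostname: str) -> dict:
    """Split a (possibly fully qualified) hostname into its parts."""
    match = HOSTNAME_RE.match(hostname.split(".")[0])
    if match is None:
        raise ValueError(f"unrecognised hostname: {hostname!r}")
    parts = match.groupdict()
    parts["index"] = int(parts["index"])
    return parts


def cluster_data_file(hostname: str) -> PurePosixPath:
    """Path of the per-cluster YAML shared by provisioning and discovery."""
    return DATA_DIR / f"{parse_hostname(hostname)['cluster']}.yaml"
```

Because both Puppet and service discovery read the same file, a commit hook that loads every `*.yaml` under `DATA_DIR` catches a bad cluster definition before either system sees it.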
puppet ENC
• External Node Classifier
  • Puppetmaster calls a script, which returns the node definition
  • Create the node definition from EC2 tags
  puppet::role::elasticsearch_cluster => cluster_name=reviews
• Stop needing individual hostnames!
• Pre-allocate names using GENERATE
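Puppet runs an ENC with the node's name as its argument and reads a YAML node definition from stdout. A minimal sketch of the tag-to-definition mapping, assuming hypothetical EC2 tag names `puppet_role` and `cluster_name`; fetching the tags from the EC2 API is left out, and the YAML is hand-rendered to stay dependency-free.

```python
import sys

# Hypothetical tag names: the slide only says the definition comes from EC2 tags.
ROLE_TAG = "puppet_role"
CLUSTER_TAG = "cluster_name"


def node_definition(tags: dict) -> dict:
    """Map an instance's EC2 tags to a Puppet node definition."""
    role = tags[ROLE_TAG]
    params = {"cluster_name": tags[CLUSTER_TAG]} if CLUSTER_TAG in tags else {}
    return {"classes": {f"puppet::role::{role}": params}}


def to_enc_yaml(definition: dict) -> str:
    """Render the definition as the YAML an ENC prints on stdout.

    Only handles the classes -> class -> {param: scalar} shape built above,
    and writes values unquoted, so it suits simple identifiers like `reviews`.
    """
    lines = ["---", "classes:"]
    for cls, params in definition["classes"].items():
        if not params:
            lines.append(f"  {cls}: {{}}")
            continue
        lines.append(f"  {cls}:")
        lines.extend(f"    {key}: {value}" for key, value in params.items())
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # A real ENC would look up the tags for the node named in sys.argv[1];
    # here they are hard-coded.
    example_tags = {ROLE_TAG: "elasticsearch_cluster", CLUSTER_TAG: "reviews"}
    sys.stdout.write(to_enc_yaml(node_definition(example_tags)))
```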
Hostnames
• Bad abstraction for contextual information
  • Which db server is the master? Does it have ‘master’ in its FQDN?
  • If it does, what happens when you promote another machine?
• Need key => value for cattle, not pets
• Customize your monitoring system to actually tell you what’s wrong!
  • ‘The master db has crashed’ vs ‘A db has crashed’
  • ‘10-46-11-54 is dead’ vs ‘zookeeper::10-46-11-54 is dead’
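One way to get alerts like ‘zookeeper::10-46-11-54 is dead’ is to build the message from key => value metadata rather than from the hostname. A minimal sketch, assuming a hypothetical `Instance` record whose `is_master` flag is looked up at alert time, so a promotion changes the alert text without renaming any machine.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """Key => value facts about a machine (hypothetical fields)."""
    ip: str
    role: str
    is_master: bool = False


def ip_hostname(ip: str) -> str:
    """IP-derived name such as 10-46-11-54: unique, but says nothing by itself."""
    return ip.replace(".", "-")


def alert_subject(inst: Instance) -> str:
    """Describe a dead machine by what it does, not just what it is called."""
    label = f"{inst.role}::{ip_hostname(inst.ip)}"
    if inst.is_master:
        return f"The master {inst.role} has crashed ({label})"
    return f"{label} is dead"
```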
Terraform
• Got most of the pieces
  • Machines auto-configure themselves after launch
• Remaining step is actually launching machines
• Terraform is awesome…
  • IF you treat it as a low level abstraction
  • IF you keep things in composable units
  • IF you add enough workflow to not run with scissors
Low level
• Terraform is the most generic abstraction possible
  • Maps a JSON (HCL) DSL => CRUD APIs
  • Cannot do implicit mapping
• But puppet / ansible / whatever can???
  • ‘Name’ tag => namevar
  • Only works in some cases - not everything has tags!
• Implicit mapping is evil
  • Duplicates will screw up your day
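The difference between the two styles can be sketched in Python. Implicit mapping finds a resource by matching its `Name` tag; explicit mapping, as Terraform does, records the exact cloud ID for each resource address in state. The data shapes and IDs below are hypothetical.

```python
from typing import Dict, List, Optional

# A live instance as a cloud API might list it: an ID plus tags (hypothetical shape).
Instance = Dict[str, str]


def find_implicit(instances: List[Instance], name: str) -> Optional[Instance]:
    """Puppet-style lookup: match the 'Name' tag against the resource title.

    Returns None when nothing matches, which a converge run reads as
    "absent, so create it". Untagged instances are invisible here.
    """
    matches = [inst for inst in instances if inst.get("Name") == name]
    if len(matches) > 1:
        raise LookupError(f"{len(matches)} instances are tagged Name={name!r}")
    return matches[0] if matches else None


def find_explicit(instances: List[Instance], state: Dict[str, str], address: str) -> Optional[Instance]:
    """Terraform-style lookup: state records the exact ID for each resource."""
    wanted = state.get(address)
    return next((inst for inst in instances if inst["id"] == wanted), None)
```

With two instances sharing a `Name` tag, the implicit lookup has no correct answer, while the explicit one still resolves to the single ID stored in state.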
Implicit mapping example - puppet AWS
• BUG - prefetch method eats exceptions (fixed now)
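The slide does not spell out the consequence of the bug. One plausible failure mode, sketched here as a Python analogy (the puppet AWS module itself is Ruby), is that a prefetch which swallows an API error returns an empty listing. Everything then looks absent and gets created again, producing exactly the duplicates warned about earlier. All names below are hypothetical.

```python
from typing import Callable, List


def prefetch_swallowing(list_instances: Callable[[], List[str]]) -> List[str]:
    """The buggy pattern: any API error turns into 'no instances exist'."""
    try:
        return list_instances()
    except Exception:
        return []


def prefetch_strict(list_instances: Callable[[], List[str]]) -> List[str]:
    """Let the error propagate so the run stops instead of acting on bad data."""
    return list_instances()


def plan_creates(wanted: List[str], existing: List[str]) -> List[str]:
    """Resources a converge run would create because it thinks they are absent."""
    return [name for name in wanted if name not in existing]


def throttled_api() -> List[str]:
    """Stand-in for a cloud API call that fails, e.g. due to rate limiting."""
    raise RuntimeError("API rate limit exceeded")
```

With the swallowing version, `plan_creates(["zk1", "zk2"], prefetch_swallowing(throttled_api))` asks for both instances to be created even though they may already exist.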
Terraform modules
• Reusable abstraction (in theory)
• Don’t try to use them like puppet!
  • Flat hierarchy (do not nest modules)
  • Use version tags
  • Use other git repos
• Or just generate resources as JSON
• KISS
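Terraform reads `*.tf.json` files alongside HCL, so generating configuration is ordinary JSON serialisation. A minimal sketch that emits one flat module call per cluster, pinned to a version tag in a separate git repo with Terraform's `git::<url>//<subdir>?ref=<tag>` source syntax. The repo URL, module name and inputs such as `instance_count` are hypothetical.

```python
import json

# Hypothetical module repository, kept separate from the config repo.
MODULE_REPO = "git::https://git.example.com/terraform-modules.git"


def cluster_module(cluster: str, module: str, version: str, **inputs) -> dict:
    """One flat (non-nested) module call per cluster, pinned to a version tag."""
    return {
        f"{module}_{cluster}": {
            "source": f"{MODULE_REPO}//{module}?ref={version}",
            "cluster_name": cluster,
            **inputs,
        }
    }


def render_tf_json(*modules: dict) -> str:
    """Merge module calls into the body of a *.tf.json file."""
    merged: dict = {}
    for mod in modules:
        overlap = merged.keys() & mod.keys()
        if overlap:
            raise ValueError(f"duplicate module names: {sorted(overlap)}")
        merged.update(mod)
    return json.dumps({"module": merged}, indent=2, sort_keys=True)
```

Generating the JSON keeps the Terraform side trivially simple: no nesting, no logic in HCL, and an upgrade is a change to the `ref` tag.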
State
• Why is there even state?
• How to cope with state
  • Atlas
  • Workflow (locking!) is your problem
• Remote state
  • Shard terraform for (team) concurrency
  • S3 store
  • Many read, few write
• Wrap it yourself (make, Jenkins, don’t install terraform in $PATH)
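If every terraform run goes through one wrapper on one machine, locking can be as simple as an exclusive lock per state shard. A minimal sketch using POSIX `flock` (Unix only), assuming a hypothetical lock directory and shard names. Only writers take the lock, so read-only plans can run concurrently, matching "many read, few write".

```python
import fcntl
import os
from contextlib import contextmanager
from typing import Iterator

# Hypothetical location on the machine that runs every terraform job.
LOCK_DIR = "/tmp/terraform-locks"


@contextmanager
def shard_write_lock(shard: str, lock_dir: str = LOCK_DIR) -> Iterator[None]:
    """Hold an exclusive, non-blocking lock for one terraform state shard.

    Raises BlockingIOError immediately if another job already holds it,
    so a second writer fails fast instead of clobbering remote state.
    """
    os.makedirs(lock_dir, exist_ok=True)
    fd = os.open(os.path.join(lock_dir, f"{shard}.lock"), os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the flock
```

This only protects runs on a single host; jobs on several machines would need a shared lock instead.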
Jenkins
• Provides the workflow
• ‘awsadmin’ machine + IAM Role as slave
• Makefile-based workflow
• Jenkins Job Builder to template things
Split up the steps
• Refresh state (upload refreshed state)
• Plan + save as artifact
• Filter plan!
• Approve plan
• Apply plan, save state
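The steps map onto separate terraform invocations, with approval as a manual gate between the plan and apply jobs. The slide does not say what "filter plan" checked; one reading, sketched below, is a policy that rejects any saved plan that deletes or replaces resources. It reads the JSON from `terraform show -json`, which needs Terraform 0.12 or later. The binary path, plan file name and `filter_plan` policy are assumptions.

```python
import json
from typing import Dict, List

# Hypothetical pinned binary: the slides advise against relying on $PATH.
TERRAFORM = "./bin/terraform"

# One command per Jenkins stage; "approve" is a manual gate, not a command.
STEPS: Dict[str, List[str]] = {
    "refresh": [TERRAFORM, "refresh"],
    "plan": [TERRAFORM, "plan", "-out=plan.tfplan"],
    "show": [TERRAFORM, "show", "-json", "plan.tfplan"],
    "apply": [TERRAFORM, "apply", "plan.tfplan"],
}


def destructive_changes(plan_json: str) -> List[str]:
    """Addresses of resources the saved plan would delete or replace."""
    plan = json.loads(plan_json)
    return [
        rc["address"]
        for rc in plan.get("resource_changes", [])
        if "delete" in rc["change"]["actions"]
    ]


def filter_plan(plan_json: str, allow_destroy: bool = False) -> None:
    """Hypothetical policy: fail the job if the plan destroys anything."""
    doomed = destructive_changes(plan_json)
    if doomed and not allow_destroy:
        raise SystemExit(f"plan destroys {len(doomed)} resource(s): {', '.join(doomed)}")
```

Applying the saved `plan.tfplan` rather than re-planning guarantees that what gets applied is exactly what was approved.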
Cluster provisioning workflow
• Commit some files to git
• Push to a branch
• Jenkins runs
• Gated approval/application process
• Abstract away the scary parts
• Enforce workflow
Nirvana
• Self-service cluster provisioning
  • Developers define their own clusters
  • 1 click from Ops to approve
• Costs are accounted to the owning team
  • AWS metadata added as needed
  • All metadata validated
• Clusters built around best practices
  • Can abstract further in future
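Validation and metadata can both be derived from the developer's cluster definition. A minimal sketch, assuming a hypothetical schema (`name`, `owner_team`, `instance_count`, `instance_type`) and hypothetical tag names. It collects every error so one CI run reports them all, and adds the owning team as a tag for cost accounting.

```python
import re
from typing import Dict, List

# Hypothetical schema for a developer-submitted cluster definition.
REQUIRED = {"name": str, "owner_team": str, "instance_count": int, "instance_type": str}
NAME_RE = re.compile(r"^[a-z][a-z0-9_]*$")  # no hyphens, so hostnames stay parseable


def validate(cluster: dict) -> List[str]:
    """Return every problem found, rather than stopping at the first."""
    errors = []
    for key, typ in REQUIRED.items():
        if not isinstance(cluster.get(key), typ):
            errors.append(f"{key}: expected {typ.__name__}")
    if isinstance(cluster.get("name"), str) and not NAME_RE.match(cluster["name"]):
        errors.append("name: must match [a-z][a-z0-9_]*")
    if isinstance(cluster.get("instance_count"), int) and cluster["instance_count"] < 1:
        errors.append("instance_count: must be at least 1")
    return errors


def aws_tags(cluster: dict, habitat: str) -> Dict[str, str]:
    """Metadata added on the developer's behalf, e.g. for cost accounting."""
    return {
        "cluster_name": cluster["name"],
        "owner_team": cluster["owner_team"],
        "habitat": habitat,
    }
```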
P.S. We’re hiring!
@bobtfish
engineeringblog.yelp.com
github.com/Yelp
github.com/bobtfish