Cloud patterns applied


Making the most of EC2 at EyeEm

• Site Reliability Engineer at EyeEm

• How do computers even work?

• Started as an operations guy in a scientific datacenter

• Now mostly developing and making users and developers happy

ME

@LarsFronius lars@eyeem.com

Resilience, Development, Culture.

—Paul Hammond

“If you think you can prevent failure, then you aren’t developing your ability to respond.”

• Have as few machines as possible containing application state

• Test restores of stateful machines

• …all the time.

• Throw away stateless servers

• Make sure they come back up with their expected behaviour

—John Allspaw, Richard Cook

“The goal of operations is to have every day be just another boring day. Achieving this boredom depends on foreseeing the future performance of the system and making adjustments accordingly.”

• Distributed datastores

• Many small servers, rather than few big

3 instances at 33% of traffic each; one fails and the remaining two carry 50% each.

50% traffic increase on a single instance

8 instances at 12.5% of traffic each; one fails and the remaining seven carry 14.3% each.

~14.3% traffic increase on a single instance
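The arithmetic behind the two diagrams can be checked with a few lines. This is a minimal sketch that assumes traffic is spread evenly and that the survivors absorb the failed instance's share; the function name is made up for illustration.

```python
def load_increase_after_failure(instances: int, failed: int = 1) -> float:
    """Relative traffic increase per surviving instance after `failed` instances die.

    With evenly spread traffic each survivor goes from 1/instances to
    1/(instances - failed) of the total, an increase of failed/(instances - failed).
    """
    if not 0 <= failed < instances:
        raise ValueError("need at least one surviving instance")
    return failed / (instances - failed)


for n in (3, 8):
    print(f"{n} instances, one fails: +{load_increase_after_failure(n):.1%} per survivor")
```

For three instances this gives +50.0%, for eight it gives +14.3% (exactly 1/7), which is the case for many small servers rather than a few big ones.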

• Mark endpoints dead

• Test timeouts
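One way to picture "mark endpoints dead" together with "test timeouts" is a client-side pool that skips an endpoint for a while after it times out. The sketch below is an illustration only: `EndpointPool`, the cool-down period and the injected `clock` are assumptions, not something shown in the slides. Injecting the clock is what makes the timeout behaviour testable without waiting.

```python
import time
from typing import Callable, Dict, List


class EndpointPool:
    """Tries endpoints in order and skips those marked dead during a cool-down."""

    def __init__(self, endpoints: List[str], cooldown: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._endpoints = list(endpoints)
        self._cooldown = cooldown
        self._clock = clock  # injectable so tests can fake the passage of time
        self._dead_until: Dict[str, float] = {}

    def mark_dead(self, endpoint: str) -> None:
        self._dead_until[endpoint] = self._clock() + self._cooldown

    def alive(self) -> List[str]:
        now = self._clock()
        return [e for e in self._endpoints
                if self._dead_until.get(e, float("-inf")) <= now]

    def call(self, request: Callable[[str], str]) -> str:
        """Try each live endpoint once; mark it dead on timeout or connection error."""
        for endpoint in self.alive():
            try:
                return request(endpoint)
            except (TimeoutError, ConnectionError):
                self.mark_dead(endpoint)
        raise RuntimeError("no live endpoints left")


def flaky_request(endpoint: str) -> str:
    # Hypothetical request function: one endpoint always times out.
    if endpoint == "redis-a":
        raise TimeoutError("simulated timeout")
    return f"ok from {endpoint}"


pool = EndpointPool(["redis-a", "redis-b"])
print(pool.call(flaky_request))  # ok from redis-b
print(pool.alive())              # ['redis-b']
```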

• Single responsibility servers / services

• Security Groups define the interface through which services are allowed to talk to one another

• …and can be used to assign server role.

Security groups, their role tags and inbound rules (diagram):

• Base Security Group

• Backend Security Group, role=backend

• Database Security Group, role=database: Allow Inbound Backend 3306

• Redis Security Group, role=feeds: Allow Inbound Backend 6379

• Metrics Security Group, role=metrics: Allow Inbound Base 8125

• Two stacks use this layout: production (branch=master) and feature_x staging (branch=feature_x)
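The two jobs security groups do here, defining who may talk to whom and carrying the server role, can be modelled in a few lines. This is a sketch and not the real AWS API: the `SecurityGroup` class and both helper functions are hypothetical, and it assumes every instance is also a member of the Base group. The ports and role tags are the ones from the diagram.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class SecurityGroup:
    name: str
    tags: Dict[str, str] = field(default_factory=dict)
    # (source group name, port) pairs that are allowed inbound
    ingress: Set[Tuple[str, int]] = field(default_factory=set)


def role_of(groups: List[SecurityGroup]) -> Optional[str]:
    """Return the first 'role' tag found on an instance's security groups."""
    for group in groups:
        if "role" in group.tags:
            return group.tags["role"]
    return None


def may_connect(source: List[SecurityGroup], target: List[SecurityGroup],
                port: int) -> bool:
    """True if any target group allows `port` inbound from any source group."""
    source_names = {g.name for g in source}
    return any((name, port) in g.ingress for g in target for name in source_names)


base = SecurityGroup("base")
backend = SecurityGroup("backend", {"role": "backend"})
database = SecurityGroup("database", {"role": "database"}, {("backend", 3306)})
redis = SecurityGroup("redis", {"role": "feeds"}, {("backend", 6379)})
metrics = SecurityGroup("metrics", {"role": "metrics"}, {("base", 8125)})

print(role_of([backend, base]))                              # backend
print(may_connect([backend, base], [database, base], 3306))  # True
print(may_connect([redis, base], [database, base], 3306))    # False
print(may_connect([redis, base], [metrics, base], 8125))     # True
```

Because the rule on the metrics group names Base as its source, any instance in Base can send metrics without a rule per service.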

{
  "Outputs": {
    "ApiEndpoint": {
      "Description": "DNS Endpoint to feature_xAPI staging",
      "Value": {"Ref": "apiendpoint"}
    },
    "backend1PrivateDNS": {
      "Description": "Private DNSName of the newly created EC2 backend1 instance",
      "Value": {"Fn::GetAtt": ["backend1", "PrivateDnsName"]}
    },
    "backend1PublicDNS": {
      "Description": "Public DNSName of the newly created EC2 backend1 instance",
      "Value": {"Fn::GetAtt": ["backend1", "PublicDnsName"]}
    },
    "db1PrivateDNS": {
      "Description": "Private DNSName of the newly created EC2 db1 instance",
      "Value": {"Fn::GetAtt": ["db1", "PrivateDnsName"]}
    },
    "db1PublicDNS": {
      "Description": "Public DNSName of the newly created EC2 db1 instance",
      "Value": {"Fn::GetAtt": ["db1", "PublicDnsName"]}
    },
    "redis1PrivateDNS": {
      "Description": "Private DNSName of the newly created EC2 redis1 instance",
      "Value": {"Fn::GetAtt": ["redis1", "PrivateDnsName"]}
    },
    "redis1PublicDNS": {
      "Description": "Public DNSName of the newly created EC2 redis1 instance",
      "Value": {"Fn::GetAtt": ["redis1", "PublicDnsName"]}
    }
  },
  "Resources": {
    "apiendpoint": {
      "Properties": {
        "HostedZoneId": "Z3HTG0V9588TAA",
        "Name": "api.feature_x.eyeem.com.",
        "ResourceRecords": [
          {"Fn::GetAtt": ["backend1", "PublicDnsName"]}
        ],
        "TTL": 300,
        "Type": "CNAME"
      },
      "Type": "AWS::Route53::RecordSet"
    },
    "backend1": {
      "Properties": {
        "IamInstanceProfile": {"Ref": "puppetprovisioningprofile"},
        "ImageId": "ami-f2191786",
        "InstanceType": "c3.large",
        "KeyName": "eyeem-prod-new",
        "SecurityGroups": [{"Ref": "backendsg"}, "puppeteers"],
        "Tags": [
          {"Key": "background_tasks", "Value": "false"},
          {"Key": "branch", "Value": "feature_x"},
          {"Key": "jenkins_access", "Value": ""},
          {"Key": "puppetbranch", "Value": "master"},
          {"Key": "service_discovery", "Value": "true"}
        ],
        "UserData": {
          "Fn::Base64": {
            "Fn::Join": [
              "",
              [
                "#!/bin/bash\ncurl https://s3-eu-west-1.amazonaws.com/eyeem-deb-packages/gpg-key.asc | apt-key add -\necho \"deb http://eyeem-deb-packages.s3-website-eu-west-1.amazonaws.com $(lsb_release -cs) stable\" > /etc/apt/sources.list.d/eyeem.list\necho \"Package: *\nPin: origin eyeem-deb-packages.s3-website-eu-west-1.amazonaws.com\nPin-Priority: 550\" > /etc/apt/preferences.d/eyeem\naptitude update\naptitude install -y python-boto\nfetch_file s3://eyeem-configuration-management/provisioning/sg_tags.py > sg_tags.py\nexport puppetbranch=$(python sg_tags.py puppetbranch)\nif [ $puppetbranch != \"\" ]; then\n fetch_file \"s3://eyeem-configuration-management/provisioning-${puppetbranch}/base.user-data\" > ./base.sh\nelse\n fetch_file \"s3://eyeem-configuration-management/provisioning/base.user-data\" > ./base.sh\nfi\nbash ./base.sh\n",
                "\ncurl -X PUT -H 'Content-Type:' --data-binary '{\"Status\":\"SUCCESS\",\"Reason\":\"we made it here.\",\"UniqueId\":\"puppetwait\",\"Data\":\"Its gonna be alright.\"}' '",
                {"Ref": "backend1puppetwaithandle"},
                "'"
              ]
            ]
          }
        }
      },
      "Type": "AWS::EC2::Instance"
    },
    "backend1puppetwaitcondition": {
      "Properties": {
        "Handle": {"Ref": "backend1puppetwaithandle"},
        "Timeout": "7200"
      },
      "Type": "AWS::CloudFormation::WaitCondition"
    },
    "backend1puppetwaithandle": {
      "Type": "AWS::CloudFormation::WaitConditionHandle"
    },
    "backendsg": {
      "Properties": {
        "GroupDescription": "backend",
        "SecurityGroupIngress": [
          {"CidrIp": "0.0.0.0/0", "FromPort": "22", "IpProtocol": "tcp", "ToPort": "22"},
          {"CidrIp": "0.0.0.0/0", "FromPort": "80", "IpProtocol": "tcp", "ToPort": "80"}
        ],
        "Tags": [
          {"Key": "branch", "Value": "feature_x"},
          {"Key": "monitoring", "Value": "false"},
          {"Key": "puppetbranch", "Value": "master"},
          {"Key": "role", "Value": "backend"}
        ]
      },
      "Type": "AWS::EC2::SecurityGroup"
    },
    "db1": {
      "Properties": {
        "IamInstanceProfile": {"Ref": "dbprovisioningprofile"},
        "ImageId": "ami-25488752",
        "InstanceType": "c3.large",
        "KeyName": "eyeem-prod-new",
        "SecurityGroups": [{"Ref": "dbsg"}, "puppeteers"],
        "Tags": [
          {"Key": "restore_from_extract", "Value": "true"}
        ],
        "UserData": {
          "Fn::Base64": {
            "Fn::Join": [
              "",
              [
                "#!/bin/bash\ncurl https://s3-eu-west-1.amazonaws.com/eyeem-deb-packages/gpg-key.asc | apt-key add -\necho \"deb http://eyeem-deb-packages.s3-website-eu-west-1.amazonaws.com $(lsb_release -cs) stable\" > /etc/apt/sources.list.d/eyeem.list\necho \"Package: *\nPin: origin eyeem-deb-packages.s3-website-eu-west-1.amazonaws.com\nPin-Priority: 550\" > /etc/apt/preferences.d/eyeem\naptitude update\naptitude install -y python-boto\nfetch_file s3://eyeem-configuration-management/provisioning/sg_tags.py > sg_tags.py\nexport puppetbranch=$(python sg_tags.py puppetbranch)\nif [ $puppetbranch != \"\" ]; then\n fetch_file \"s3://eyeem-configuration-management/provisioning-${puppetbranch}/base.user-data\" > ./base.sh\nelse\n fetch_file \"s3://eyeem-configuration-management/provisioning/base.user-data\" > ./base.sh\nfi\nbash ./base.sh\n",
                "\ncurl -X PUT -H 'Content-Type:' --data-binary '{\"Status\":\"SUCCESS\",\"Reason\":\"we made it here.\",\"UniqueId\":\"puppetwait\",\"Data\":\"Its gonna be alright.\"}' '",
                {"Ref": "db1puppetwaithandle"},
                "'"
              ]
            ]
          }
        }
      },
      "Type": "AWS::EC2::Instance"
    },
    "db1puppetwaitcondition": {
      "Properties": {
        "Handle": {"Ref": "db1puppetwaithandle"},
        "Timeout": "7200"
      },
      "Type": "AWS::CloudFormation::WaitCondition"
    },
    "db1puppetwaithandle": {
      "Type": "AWS::CloudFormation::WaitConditionHandle"
    },
    "dbprovisioningprofile": {
      "Properties": {
        "Path": "/",
        "Roles": ["extract-access"]
      },
      "Type": "AWS::IAM::InstanceProfile"
    },
    "dbsg": {
      "Properties": {
        "GroupDescription": "db",
        "SecurityGroupIngress": [
          {"FromPort": "3306", "IpProtocol": "tcp", "SourceSecurityGroupName": {"Ref": "backendsg"}, "ToPort": "3306"},
          {"CidrIp": "0.0.0.0/0", "FromPort": "22", "IpProtocol": "tcp", "ToPort": "22"}
        ],
        "Tags": [
          {"Key": "branch", "Value": "feature_x"},
          {"Key": "monitoring", "Value": "false"},
          {"Key": "puppetbranch", "Value": "master"},
          {"Key": "role", "Value": "db"}
        ]
      },
      "Type": "AWS::EC2::SecurityGroup"
    },
    "puppetprovisioningprofile": {
      "Properties": {
        "Path": "/",
        "Roles": ["puppet-provisioning"]
      },
      "Type": "AWS::IAM::InstanceProfile"
    },
    "redis1": {
      "Properties": {
        "IamInstanceProfile": {"Ref": "puppetprovisioningprofile"},
        "ImageId": "ami-25488752",
        "InstanceType": "c3.large",
        "KeyName": "eyeem-prod-new",
        "SecurityGroups": [{"Ref": "redissg"}, "puppeteers"],
        "Tags": [],
        "UserData": {
          "Fn::Base64": {
            "Fn::Join": [
              "",
              [
                "#!/bin/bash\ncurl https://s3-eu-west-1.amazonaws.com/eyeem-deb-packages/gpg-key.asc | apt-key add -\necho \"deb http://eyeem-deb-packages.s3-website-eu-west-1.amazonaws.com $(lsb_release -cs) stable\" > /etc/apt/sources.list.d/eyeem.list\necho \"Package: *\nPin: origin eyeem-deb-packages.s3-website-eu-west-1.amazonaws.com\nPin-Priority: 550\" > /etc/apt/preferences.d/eyeem\naptitude update\naptitude install -y python-boto\nfetch_file s3://eyeem-configuration-management/provisioning/sg_tags.py > sg_tags.py\nexport puppetbranch=$(python sg_tags.py puppetbranch)\nif [ $puppetbranch != \"\" ]; then\n fetch_file \"s3://eyeem-configuration-management/provisioning-${puppetbranch}/base.user-data\" > ./base.sh\nelse\n fetch_file \"s3://eyeem-configuration-management/provisioning/base.user-data\" > ./base.sh\nfi\nbash ./base.sh\n",
                "\ncurl -X PUT -H 'Content-Type:' --data-binary '{\"Status\":\"SUCCESS\",\"Reason\":\"we made it here.\",\"UniqueId\":\"puppetwait\",\"Data\":\"Its gonna be alright.\"}' '",
                {"Ref": "redis1puppetwaithandle"},
                "'"
              ]
            ]
          }
        }
      },
      "Type": "AWS::EC2::Instance"
    },
    "redis1puppetwaitcondition": {
      "Properties": {
        "Handle": {"Ref": "redis1puppetwaithandle"},
        "Timeout": "7200"
      },
      "Type": "AWS::CloudFormation::WaitCondition"
    },
    "redis1puppetwaithandle": {
      "Type": "AWS::CloudFormation::WaitConditionHandle"
    },
    "redissg": {
      "Properties": {
        "GroupDescription": "redis",
        "SecurityGroupIngress": [
          {"FromPort": "6379", "IpProtocol": "tcp", "SourceSecurityGroupName": {"Ref": "backendsg"}, "ToPort": "6379"},
          {"CidrIp": "0.0.0.0/0", "FromPort": "22", "IpProtocol": "tcp", "ToPort": "22"}
        ],
        "Tags": [
          {"Key": "branch", "Value": "feature_x"},
          {"Key": "monitoring", "Value": "false"},
          {"Key": "puppetbranch", "Value": "master"},
          {"Key": "role", "Value": "redis"}
        ]
      },
      "Type": "AWS::EC2::SecurityGroup"
    }
  }
}

–json.org

“JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate.”

eyeemstack create --machines backend db feed --restore_db extract --branch feature_x

• A Python tool built on top of troposphere, a Python library for generating CloudFormation templates
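To give a feel for what generating such a template from a short command means, here is a dependency-free sketch that builds a much smaller CloudFormation-style document from a branch name and a machine list. It deliberately uses plain dictionaries rather than troposphere, and `build_template`, `PORTS` and the `image_id` argument are hypothetical; the slides do not show how eyeemstack works internally. The output is illustrative and not a deployable stack.

```python
import json
from typing import Dict, List

# Ports the backend may reach, taken from the security group slides.
PORTS = {"db": 3306, "feed": 6379}


def build_template(branch: str, machines: List[str], image_id: str) -> Dict[str, dict]:
    """Build one security group and one instance per requested machine."""
    resources: Dict[str, dict] = {}
    for machine in machines:
        group = f"{machine}sg"
        ingress = []
        if machine in PORTS and "backend" in machines:
            port = str(PORTS[machine])
            ingress.append({
                "IpProtocol": "tcp",
                "FromPort": port,
                "ToPort": port,
                "SourceSecurityGroupName": {"Ref": "backendsg"},
            })
        resources[group] = {
            "Type": "AWS::EC2::SecurityGroup",
            "Properties": {
                "GroupDescription": machine,
                "SecurityGroupIngress": ingress,
                # Using the machine name as the role tag is an assumption.
                "Tags": [
                    {"Key": "branch", "Value": branch},
                    {"Key": "role", "Value": machine},
                ],
            },
        }
        resources[f"{machine}1"] = {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "ImageId": image_id,
                "InstanceType": "c3.large",
                "SecurityGroups": [{"Ref": group}],
            },
        }
    return {"Resources": resources}


template = build_template("feature_x", ["backend", "db", "feed"], image_id="ami-00000000")
print(json.dumps(template, indent=2, sort_keys=True))
```

The point of the wrapper is that a developer types three machine names and a branch, and never edits the JSON by hand.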

vagrant up backend db feed

class eyeem::profiles::backend::deploy {
  eyeem::deploy_codebase { "backend":
    directory => '/var/www/backend',
    bucket    => 'eyeem-web-backend',
    filename  => "backend-${::branch}.tar.gz",
    restart   => ['nginx', 'php5-fpm']
  }
}

define eyeem::deploy_codebase (
  $prefix = '',
  $directory,
  $bucket,
  $filename,
  $restart ) {

  if (member($::mountpoints, "${directory}/current") and $::environment == 'local') {
    notice("Looks like we are on Vagrant and you mounted the code in, skipping deploy.")
  } else {
    ( . . . )
  }
}

• ~$0.70 for a single test run.

• ~$3.50 per workday.

• ~$17.64 per day for an always-on staging environment.

• Tests disaster recovery on a sample dataset.

• Scalable setup.

• < 10 minutes

• Stagings just a click away.
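The cost figures above can be put side by side with a little arithmetic. The sketch assumes the three numbers are in US dollars and that the workday figure is simply a number of test runs; neither is stated on the slide.

```python
# Figures from the slide, in cents to avoid float rounding (currency assumed USD).
TEST_RUN_CENTS = 70
WORKDAY_CENTS = 350
ALWAYS_ON_DAY_CENTS = 1764

runs_per_workday = WORKDAY_CENTS / TEST_RUN_CENTS
break_even_runs = ALWAYS_ON_DAY_CENTS / TEST_RUN_CENTS
saving = 1 - WORKDAY_CENTS / ALWAYS_ON_DAY_CENTS

print(f"test runs covered by the workday figure: {runs_per_workday:.0f}")
print(f"runs per day before always-on is cheaper: {break_even_runs:.1f}")
print(f"saving versus always-on staging: {saving:.0%}")
```

Under these assumptions the workday figure covers five runs, an always-on staging environment only pays off beyond roughly 25 runs a day, and on-demand staging costs about 80% less.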

Shared services across stacks (diagram):

• production (branch=master): Backend Security Group (role=backend), Base Security Group

• feature_x staging (branch=feature_x): Backend Security Group (role=backend), Base Security Group

• Metrics Security Group, role=metrics: Allow Inbound Base 8125

• Inventory Service Security Group, role=inventory, listing instances: branch=feature_x, public_dns=api.feature_x.eyeem.com and branch=master, public_dns=api.eyeem.com

• Jobrunner Service Security Group, role=jobrunner
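The inventory service in the diagram keeps a `branch` and a `public_dns` per instance. A minimal in-memory stand-in could look like the sketch below. The class names, the `register` and `find` methods, and the `role` field on each record are assumptions for illustration, since the slides do not show the service's actual interface.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class InstanceRecord:
    role: str
    branch: str
    public_dns: str


class Inventory:
    """In-memory stand-in for an inventory service."""

    def __init__(self) -> None:
        self._records: List[InstanceRecord] = []

    def register(self, record: InstanceRecord) -> None:
        self._records.append(record)

    def find(self, role: str, branch: str) -> List[InstanceRecord]:
        """Return all instances with the given role that belong to a branch."""
        return [r for r in self._records
                if r.role == role and r.branch == branch]


inventory = Inventory()
inventory.register(InstanceRecord("backend", "master", "api.eyeem.com"))
inventory.register(InstanceRecord("backend", "feature_x", "api.feature_x.eyeem.com"))

for record in inventory.find("backend", "feature_x"):
    print(record.public_dns)  # api.feature_x.eyeem.com
```

A job runner could use such a lookup to target the staging stack of one branch without hard-coding host names.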

Culture

Practices

Tools

• ~350 Job Executions last month

• 350 self-service operations

• Stagings everywhere

• Definition of Done: Can you boot it up using EyeEmStack and Vagrant?

• Lots of 99.999%s

• “Everything fails all the time.”

• Test your repairs, automate everything.

• Distribute your data.

• Applications should be able to handle state transitions of service-parts and diagnose failure.

• Design your infrastructure towards acting as a service provider to your developers.

Questions?
