© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Mark Mansour, Senior Manager, Continuous Delivery
November 30, 2016
DEV403
DevOps on AWS: Advanced Continuous Delivery Techniques
What to expect from the session
Make your pipeline safer by
1. Identifying production issues quickly
2. Deploying changes safely
3. Automatically deciding when to release changes
Techniques
1. Continuous production testing
2. Manage deployment health
3. Segment production
4. Halt promotions
5. Gates
Prerequisites
• Versioned source
• Automated build
• Automated deployments
• Deploy to > 1 instance
• Unit tests
• Integration tests
• Continuous Delivery
• Operations dashboard
Source → Build → Deploy to Integration Stack → Integration Tests → Deploy to Production
Best practices with your tools
• Focus on best practices
• Keep using your current tools where possible
• Deployment tools
• Continuous Integration and Continuous Delivery Tools
• Extend your current tools when needed
• This talk uses AWS tools
Tools used in this talk
Monitoring
Amazon CloudWatch
Software Development
Amazon SNS
AWS Lambda
Deployment
AWS CodeDeploy
AWS CodePipeline
Model the release process in CodePipeline
[Diagram: the MyApp pipeline. Source (CodeCommit) → Build → Integration stage: DeployToInteg (CodeDeploy), IntegTest (End2EndTester) → Production stage: DeployToProd (CodeDeploy). It maps to Source → Build → Deploy to Integration Stack → Integration Tests → Deploy to Production.]
[Diagram labels: Pipeline, Stage, Action, Pipeline Run, Change 1]
A source change:
• starts a run; and
• creates an artifact to be used by other actions.
Release and deploy process: Starting point
[Diagram: the MyApp pipeline. Source (CodeCommit) → Build → Integration stage: DeployToInteg (CodeDeploy), IntegTest (End2EndTester) → Production stage: DeployToProd (CodeDeploy).]
Be aware when a service is unavailable
Problem:
A service can stop working at any time for reasons inside
or outside of its control.
Consequence:
Your service may be unavailable without your team
knowing about it.
1 of 5 – Continuous production testing
Use synthetic traffic to simulate real users
• Test all business critical functionality (UI and APIs)
• Tests must run quickly
• Measure client latencies
• Check for reachability
How synthetic traffic flows
[Diagram: Synthetic Traffic, CloudWatch, Alarm]
Synthetic traffic flow – why two metric streams?
[Diagram: two CloudWatch Events (1m) triggers, Synthetic Traffic, CloudWatch, Alarm]
Building a synthetic traffic test
• Keep it simple
• Build logic in Lambda (invoke with CloudWatch Events)
• Capture data in CloudWatch metrics
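A minimal sketch of such a test, assuming a Lambda-style handler invoked once a minute. The endpoint `TARGET_URL`, the function names `probe` and `http_fetch`, and the metric names `Latency` and `Reachable` are hypothetical. The fetch and publish steps are injected so the logic runs without AWS access; in a real function, `publish` could forward the data points to CloudWatch metrics.

```python
import time
import urllib.request
from typing import Callable, Dict, List

TARGET_URL = "https://example.com/health"  # hypothetical endpoint under test


def http_fetch(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request to url."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def probe(url: str,
          fetch: Callable[[str], int] = http_fetch,
          clock: Callable[[], float] = time.monotonic) -> List[Dict]:
    """Send one synthetic request; return latency and reachability data points."""
    start = clock()
    try:
        reachable = 200 <= fetch(url) < 400
    except Exception:
        # Timeouts, DNS failures, and HTTP errors all count as unreachable.
        reachable = False
    latency_ms = (clock() - start) * 1000.0
    return [
        {"MetricName": "Latency", "Value": latency_ms, "Unit": "Milliseconds"},
        {"MetricName": "Reachable", "Value": 1.0 if reachable else 0.0, "Unit": "Count"},
    ]


def handler(event, context, fetch=http_fetch, publish=print):
    """Lambda-style entry point, meant to be invoked once a minute."""
    data_points = probe(TARGET_URL, fetch=fetch)
    publish(data_points)  # a real function would send these to CloudWatch metrics
    return data_points
```

Keeping the probe this small helps the test run quickly, and an alarm on the `Reachable` data points can flag an outage even when no real users are active.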
Release and deploy process: Synthetic traffic
[Diagram: Production stage with DeployToProd (CodeDeploy); Synthetic Traffic runs against production.]
Rolling deployments – success
[Diagram: production fleet behind an ELB; hosts shown at V1 and at V2.]
2 of 5 – Manage deployment health
Rolling deployments – fail
[Diagram: production fleet behind an ELB; hosts shown at V1 and at V2.]
Check for deployment failures in production
Problem:
There are no automated tests to verify a service is working
after a new deployment.
Consequence:
Each production deployment needs to be checked
manually.
Add safety to rolling deployments
1. Validate each host’s health
2. Ensure a minimum percentage of the fleet is healthy
3. Roll back if the deployment fails
Step 1: Working tests raise more issues
[Diagram: production fleet behind an ELB, partway through a rolling deployment from V1 to V2.]
Step 2: Use minimum healthy hosts
Minimum healthy hosts (MHH) 70%, 10 hosts:
• 4 failures – 60% healthy: failed deployment
• 1 failure – 90% healthy
[Diagram: production fleet behind an ELB; hosts shown at V1 and at V2.]
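The arithmetic behind the two cases on this slide (MHH 70%, 10 hosts) can be sketched as follows. This is a simplified check, not CodeDeploy's own implementation; the function names are hypothetical.

```python
def healthy_percent(total_hosts: int, failed_hosts: int) -> float:
    """Percentage of the fleet that is still healthy."""
    if total_hosts <= 0:
        raise ValueError("total_hosts must be positive")
    return 100.0 * (total_hosts - failed_hosts) / total_hosts


def deployment_ok(total_hosts: int, failed_hosts: int,
                  min_healthy_percent: float) -> bool:
    """True while the healthy share stays at or above the minimum."""
    return healthy_percent(total_hosts, failed_hosts) >= min_healthy_percent


print(deployment_ok(10, 4, 70))  # 60% healthy, below 70%: prints False
print(deployment_ok(10, 1, 70))  # 90% healthy, above 70%: prints True
```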
Step 3: Roll back when a deployment fails
• CodeDeploy: configured in the deployment group
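A sketch of what that deployment group setting might look like when applied through the CodeDeploy `update_deployment_group` API. The application name `MyApp` and group name `Production` are assumptions, and the client is passed in so the example runs without AWS credentials.

```python
def rollback_settings(application: str, deployment_group: str) -> dict:
    """Request parameters that turn on automatic rollback for one group."""
    return {
        "applicationName": application,
        "currentDeploymentGroupName": deployment_group,
        "autoRollbackConfiguration": {
            "enabled": True,
            # Roll back to the last good revision when a deployment fails.
            "events": ["DEPLOYMENT_FAILURE"],
        },
    }


def enable_rollback(codedeploy_client, application: str, deployment_group: str):
    """Apply the settings; codedeploy_client is e.g. boto3.client("codedeploy")."""
    return codedeploy_client.update_deployment_group(
        **rollback_settings(application, deployment_group)
    )


# Hypothetical names for illustration only.
print(rollback_settings("MyApp", "Production"))
```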
Release and deploy: Deployment health
[Diagram: Production stage with DeployToProd (CodeDeploy); Synthetic Traffic runs against production.]
3 of 5 - Segment production
Bad changes must not affect all customers
Pipeline Problem:
When a critical issue reaches production all hosts are
affected.
Consequence:
Bad changes impact all customers.
Lower deployment risk by segmenting
1. Break production into multiple segments
2. Deploy to a segment
3. Test a segment after a deployment
4. Repeat 2 & 3 until done
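The four steps above can be sketched as a loop that stops at the first failing segment. The `deploy` and `test` callables and the segment names are hypothetical stand-ins for a real deployment action and post-deployment test.

```python
from typing import Callable, Iterable, List


def deploy_in_segments(segments: Iterable[str],
                       deploy: Callable[[str], None],
                       test: Callable[[str], bool]) -> List[str]:
    """Deploy segment by segment; halt as soon as one segment fails its test."""
    completed: List[str] = []
    for segment in segments:
        deploy(segment)
        if not test(segment):
            raise RuntimeError(
                f"post-deployment test failed in {segment}; "
                f"verified segments so far: {completed}"
            )
        completed.append(segment)
    return completed


# Smallest segment first, so a bad change reaches as few customers as possible.
order = ["canary", "az-1", "az-2", "az-3"]
print(deploy_in_segments(order, deploy=lambda s: None, test=lambda s: True))
```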
Step 1: Break production into multiple segments
Typical segment types:
• Region
• Availability Zone
• Sub-Zonal
• Single Host (Canary)
Step 1: Typical deployment segmentation
[Diagram: production fleet in US-EAST-1 across Availability Zones US-EAST-1A and US-EAST-1B, with hosts at V1 and V2. It shows a canary deployment and Availability Zone based deployments, each followed by a post-deployment test, within a Region based deployment.]
Step 1: Use deployment groups as segments
Create deployment groups per segment using:
• Tags
• Auto Scaling groups
Step 2: Deploy to each segment
1. Deploy to smallest segment
2. Post-deployment tests
3. Deploy to one Availability Zone
4. Post-deployment tests
5. Deploy to remaining Availability Zones
[Diagram: Integration stage: DeployToInteg (CodeDeploy), IntegTest (End2EndTester). Production stage: CanaryDeploy (CodeDeploy) → PostDeployTest (Approval) → Deploy-AZ-1 (CodeDeploy) → PostDeployTest (Approval) → Deploy-AZ-2 and Deploy-AZ-3 (CodeDeploy).]
Step 3: Test each segment
A deployment is valid if:
• The test has gathered enough data to gain confidence
  • CloudWatch metrics
• No service alarms have fired
  • CloudWatch alarms
• The test has not timed out
  • Code
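The three validity conditions can be sketched as a single decision function. The thresholds (`min_requests`, `timeout_seconds`) and the `SegmentTestState` fields are assumptions chosen for illustration; in practice the request count and alarm state would come from CloudWatch.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    WAIT = "wait"


@dataclass
class SegmentTestState:
    started_at: float    # epoch seconds when the test was registered
    request_count: int   # traffic observed since the deployment
    alarm_firing: bool   # True if any service alarm is in ALARM


def evaluate(state: SegmentTestState, now: float,
             min_requests: int = 1000,
             timeout_seconds: float = 3600.0) -> Verdict:
    """Decide whether a segment deployment is valid, invalid, or undecided."""
    if state.alarm_firing:
        return Verdict.REJECT      # a service alarm has fired
    if state.request_count >= min_requests:
        return Verdict.APPROVE     # enough data, and no alarms
    if now - state.started_at > timeout_seconds:
        return Verdict.REJECT      # timed out before gaining confidence
    return Verdict.WAIT            # check again on the next tick
```

Returning `WAIT` lets a scheduled function re-evaluate the same test every minute until it reaches a verdict.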
Add segment tests to your pipeline
Extend CodePipeline with:
• Test Actions
• Lambda Invoke Actions
• Custom Actions
• Approval Actions
Use CodePipeline approvals to trigger tests
[Diagram: Source stage: MyAppSource (CodeCommit). Deploy stage: DeployToSegment (CodeDeploy) → ValidateSegment (Approval). The approval action sends an approval message to an SNS topic and is completed with putApprovalResult. Timeouts shown: 1 hour and 7 days.]
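A sketch of the approval round trip, assuming the SNS notification carries a JSON message with an `approval` object holding `pipelineName`, `stageName`, `actionName`, and `token`. The helper names are hypothetical. The request built here matches the parameters of the CodePipeline `PutApprovalResult` API.

```python
import json


def parse_approval(sns_event: dict) -> dict:
    """Extract the approval details from an SNS-triggered Lambda event."""
    message = json.loads(sns_event["Records"][0]["Sns"]["Message"])
    approval = message["approval"]
    return {key: approval[key]
            for key in ("pipelineName", "stageName", "actionName", "token")}


def approval_result_request(approval: dict, approved: bool, summary: str) -> dict:
    """Build the keyword arguments for a put_approval_result call."""
    return {
        "pipelineName": approval["pipelineName"],
        "stageName": approval["stageName"],
        "actionName": approval["actionName"],
        "token": approval["token"],
        "result": {
            "summary": summary,
            "status": "Approved" if approved else "Rejected",
        },
    }


def complete_approval(codepipeline_client, approval: dict,
                      approved: bool, summary: str):
    """codepipeline_client is e.g. boto3.client("codepipeline")."""
    return codepipeline_client.put_approval_result(
        **approval_result_request(approval, approved, summary)
    )
```

The token ties the result to one specific pending approval, so it has to be stored when the message arrives and reused when the test finishes.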
Creating a post-deployment test
[Diagram: Source: MyAppSource (CodeCommit) → Build: MyAppBuild → Deploy: CanaryDeploy (CodeDeploy), ValidateCanary (Approval), Prod-us-east-1a (CodeDeploy). Supporting components: SNS topic, Lambda function registerDeployTest(), Lambda function evaluateDeploy(), DynamoDB, CloudWatch Events (1m). Checks shown: alarm, time, usage. Change 1 is moving through the pipeline.]
Post-deployment test – registerDeployTest
[Diagram: the same pipeline and components as the previous slide.]
Post-deployment test – evaluateDeployTest
[Diagram: the same pipeline and components as the previous slide.]
Canary Deployments – they’re different
All production hosts:
• Participate in serving production traffic
• Are configured as production instances
• Participate in the production metrics stream
Canary hosts:
• Have their own metrics stream
• Canary validations use the canary metrics stream
Summary: Segment production
• Segment production to reduce impact of a bad change
• Minimum segmentation:
  • Region
  • Canary deployment per region
• Larger service segmentation:
  • Zonal
  • Sub-zonal
• Test each segment before moving on
Release and deploy: Segment production
[Diagram: Synthetic Traffic runs against production. Shown are the original Production stage with DeployToProd (CodeDeploy) and the segmented Production stage: CanaryDeploy (CodeDeploy) → PostDeployTest (Approval) → Deploy-AZ-1 (CodeDeploy) → PostDeployTest (Approval) → Deploy-AZ-2 and Deploy-AZ-3 (CodeDeploy).]
4 of 5 – Halt promotions
Don’t change the system under test
[Diagram: Source: MyAppSource (CodeCommit) → Build: MyAppBuild → DeployToProd: MyApp (CodeDeploy), which deploys to an EC2 instance. Change 1, Change 2, and Change 3 are moving through the pipeline.]
Don’t compound problems during an outage
Pipeline Problem:
The pipeline is unaware of the health of the infrastructure
that it is deploying to.
Consequence:
Production changes, usually deployments, can make it
difficult for an operator to resolve a production event.
Automatically stop deploying to production during an event
[Diagram: Source: MyAppSource (CodeCommit) → Build: MyAppBuild → DeployToProd: MyApp (CodeDeploy), which deploys to an EC2 instance. Change 1 and Change 2 are in the pipeline. Components and labelled arrows: CloudWatch Events (1m) triggers Synthetic Traffic, which checks the EC2 instance and emits to CloudWatch; an Alarm, SNS, and disableTransition(), which disables the transition into DeployToProd.]
Summary: Halt promotions
• Halt promotions to production when your production
environment has “issues”
• Automate by disabling stage transitions
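A sketch of the automation, assuming a function that receives the alarm state and a CodePipeline client. The pipeline and stage names are hypothetical. `disable_stage_transition` is the CodePipeline API call for blocking changes from entering a stage.

```python
def halt_on_alarm(alarm_state: str, codepipeline_client,
                  pipeline: str, stage: str) -> bool:
    """Disable the inbound transition of a stage when an alarm fires.

    Returns True if the transition was disabled, False otherwise.
    codepipeline_client is e.g. boto3.client("codepipeline").
    """
    if alarm_state != "ALARM":
        return False
    codepipeline_client.disable_stage_transition(
        pipelineName=pipeline,
        stageName=stage,
        transitionType="Inbound",  # stop new changes from entering the stage
        reason="Production alarm is firing; promotions halted automatically.",
    )
    return True
```

Changes already built keep waiting before the disabled transition, so nothing is lost; an operator can re-enable the transition once the event is resolved.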
Release and deploy: Halt promotions
[Diagram: Synthetic Traffic runs against production. Production stage: CanaryDeploy (CodeDeploy) → PostDeployTest (Approval) → Deploy-AZ-1 (CodeDeploy) → PostDeployTest (Approval) → Deploy-AZ-2 and Deploy-AZ-3 (CodeDeploy).]
Do not deploy at sensitive times
Problem:
A bad change during sensitive times has a disproportionate
effect on the business.
Consequence:
Issues during sensitive days risk reputation and financial
loss.
5 of 5 - Gates
Adding safety with deployment black-days
• Deploy to production during normal conditions
• Halt deployments during sensitive times
Building a black-day calendar with CodePipeline:
• Use Approvals to pause production deployments
• Lambda to automatically approve when the time is right
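A sketch of the time-window check such a Lambda function might run. The window dates and function names are hypothetical; a real calendar would likely be stored centrally rather than hard-coded.

```python
from datetime import datetime, timezone
from typing import List, Tuple

Window = Tuple[datetime, datetime]

# Hypothetical black-day window: start inclusive, end exclusive, in UTC.
BLACK_DAY_WINDOWS: List[Window] = [
    (datetime(2016, 11, 25, tzinfo=timezone.utc),
     datetime(2016, 11, 29, tzinfo=timezone.utc)),
]


def in_black_day(now: datetime, windows: List[Window]) -> bool:
    """True if now falls inside any blocked window."""
    return any(start <= now < end for start, end in windows)


def gate_decision(now: datetime, windows: List[Window]) -> str:
    """'hold' keeps the approval pending; 'approve' lets the deployment proceed."""
    return "hold" if in_black_day(now, windows) else "approve"


print(gate_decision(datetime(2016, 11, 26, 12, tzinfo=timezone.utc),
                    BLACK_DAY_WINDOWS))  # inside the window: prints hold
```

Holding rather than rejecting means the change deploys on its own once the window closes.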
Black-day test
[Diagram: Source: MyAppSource (CodeCommit) → Build: MyAppBuild → Deploy: BlackDayCheck (Approval), ProductionDeploy (CodeDeploy). Supporting components: SNS topic, Lambda function registerDeployment, Lambda function processTimeWindows, DynamoDB, CloudWatch Events (1m). Change 1 is moving through the pipeline.]
This looks familiar…
[Diagram: the black-day pipeline. Source: MyAppSource (CodeCommit) → Build: MyAppBuild → Deploy: BlackDayCheck (Approval), ProductionDeploy (CodeDeploy), with SNS topic, Lambda functions registerDeployment and processTimeWindows, DynamoDB, and CloudWatch Events (1m).]
This looks familiar – post-deployment test
[Diagram: the post-deployment test pipeline from technique 3. Source: MyAppSource (CodeCommit) → Build: MyAppBuild → Deploy: CanaryDeploy (CodeDeploy), ValidateCanary (Approval), Prod-us-east-1a (CodeDeploy), with SNS topic, Lambda functions registerDeployTest() and evaluateDeploy(), DynamoDB, and CloudWatch Events (1m). Checks shown: alarm, time, usage.]
What’s the difference?
[Diagram: the black-day pipeline again. Source: MyAppSource (CodeCommit) → Build: MyAppBuild → Deploy: BlackDayCheck (Approval), ProductionDeploy (CodeDeploy), with SNS topic, Lambda functions registerDeployment and processTimeWindows, DynamoDB, and CloudWatch Events (1m).]
Summary: Gates
• Black-days provide centralized control
• Add common action to all pipelines
• Black-days are a type of gate
• Implement with Approval actions in CodePipeline
Release and deploy: Gates
[Diagram: Synthetic Traffic runs against production. Production stage: CanaryDeploy (CodeDeploy) → PostDeployTest (Approval) → Deploy-AZ-1 (CodeDeploy) → PostDeployTest (Approval) → Deploy-AZ-2 and Deploy-AZ-3 (CodeDeploy), with a CheckBlackDays (Approval) gate added.]
What we’ve learned
Goal: Make your pipeline safer…
1. Identify production issues quickly
   • Continuous Production Testing
2. Safely deploy changes
   • Manage deployment health
   • Segment production
3. Automatically decide when to release changes
   • Halt promotions
   • Black-days and Gates
Release and deploy process: Ending point
[Diagram: the starting Production stage with a single DeployToProd (CodeDeploy) action, and the ending Production stage: CheckBlackDays (Approval), CanaryDeploy (CodeDeploy), PostDeployTest (Approval), Deploy-AZ-1 (CodeDeploy), PostDeployTest (Approval), Deploy-AZ-2 and Deploy-AZ-3 (CodeDeploy), with Synthetic Traffic running against production.]
Code is available online
• github.com/awslabs/aws-codepipeline-time-windows
• github.com/awslabs/aws-codepipeline-synthetic-tests
• github.com/awslabs/aws-codepipeline-block-production
Related Sessions
• DEV303 – Deploying and Managing .NET Pipelines and Microsoft Workloads
• DEV310 – DevOps on AWS: Choosing the Right Software Deployment Technique
• DEV313 – Infrastructure Continuous Deployment Using AWS CloudFormation
• SVR307 – Application Lifecycle Management in a Serverless World