© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Building a DevOps Culture in Public Sector
June 13, 2017
Emil Lerch, Sr. Consultant, Amazon Web Services
Scott Day, CTO, SoundExchange
Reid Badgett, Sr. Director, Engineering, SoundExchange
David Joseph, Senior Director, DevOps Adoption and Implementation, Ellucian
Scott Moomaw, Senior Manager, DevOps Adoption and Implementation, Ellucian
DevOps
https://puppet.com/resources/whitepaper/2016-state-of-devops-report
Integration of Development and Operations, including Security, into
a high-functioning team. This team engages in:
• Systems thinking
• Amplification of feedback loops
• Culture of continual experimentation and learning
Benefits include:
• Higher quality
• Faster delivery
• Lower implementation cost
DevOps: Enabling Business Goals
SoundExchange
http://www.soundexchange.com/careers
Scott Day, CTO
Reid Badgett, Sr. Director, Engineering
Background on SoundExchange
History
• Formed in 2000, a result of U.S. copyright legislation in the 90’s
• Became an independent organization in 2003
• Created by the industry for the industry; we are at the center of today’s digital music industry
• 170 full-time employees headquartered in Washington, DC
Perform critical role in digital music world
• Sole U.S. entity to collect and distribute sound recording performance royalties for 3,000+ non-interactive internet radio, satellite radio, and cable television services
• In 2016, distributed approximately $884 million to recording artists and record labels
• To date, distributed more than $4.5 billion in royalties
At the forefront of music industry transformation to digital streaming
• We create and deploy innovative solutions to power the modern global music community in order to pay creators transparently, accurately and efficiently
Our Technology Platform Transformation
Circa 2011
• Monolithic core system
• Disjointed, siloed apps
• Traditional IT delivery
• Highly manual processes
• J2EE/Oracle DB stack
• On-premises infrastructure
Circa 2016
• Federated architecture and systems
• Service-oriented app integration
• Agile and DevOps-based delivery
• Highly automated processes
• Open source stack
• AWS cloud infrastructure
SoundExchange Engineering in 2011
• Separate Tech Ops Team from Engineers
• Minimal build automation, manual deploy procedures
• Hand-rolled environments w/ environment “drift”
• Increasingly slow performance
• Difficult to triage stability issues
• Frequent fire-fighting
We were limiting business progress
In the past 6 years…
• Grown Technology group from 5 to 32 persons
• Adopted Agile, Open-Source, DevOps
• Adopted use of AWS Public Cloud
• Hired an incredible group of Engineers
• Rebuilt our Royalty Processing Platform
• Built several new systems on top of the Platform
We are a strong enabler of business progress
Principles and Practices
• Small Teams (“1-2 Pizzas”)
• Agile (Scrum & Kanban)
• Loose-coupling via APIs
• Lightweight architectures
• Continuously build and release
• Leverage existing services when possible
• Automation of tests (functional, performance, load)
• Resilient to outages with graceful degradation
How we define our “DevOps” Culture
• Engineers develop software and support it in Production
• DevOps Team develops capabilities to enable DevOps
Results
• High system stability and quality
• Created culture that removes barriers and facilitates quality
• Enabled end-to-end problem thinking
• No opportunity to “throw it over the fence”
• Enabled experimentation, leading to better architectures
• Very efficient teams with low headcount needs
8 Dev Teams running 400+ servers with no O&M team
DevOps Team: Enabler of DevOps
DevOps Team
Licensee Team
Matching Team
Repertoire Team
Rights Team
Distribution Team
SXDirect Team
DevOps Team: Creating DevOps Capabilities
Most capabilities should be extendable by Dev Teams
Capability: Tools and Approaches
Change Management: Git, Jenkins, Ansible, CloudFormation
Cloud Standards: AWS Docs, AWS Training/Support, “Experience”
Platform Reliability: Auto Scaling, Multi-AZ, SQS, Zone Evacuation
Monitoring of Components: CloudWatch, New Relic, Pingdom
Security of the Platform: Custom Scripts, CloudTrail, Trusted Advisor
Cloud Management: CloudCheckr, Trusted Advisor, Custom Reports
DevOps Themes by Year
2012
• Stabilized Legacy system
• Launched first system in AWS Public Cloud
2013
• Developed initial Cloud Standards
• Selected Tools and created first Build Pipelines
2014
• Adapting capabilities to fit with new dev projects
• Refactoring and paying down “tech debt”
2015
• Training of Dev Teams on DevOps capabilities
• Enhanced Resiliency and Monitoring capabilities
2016
• Most Dev Teams owning their “DevOps” capabilities
• Increased efficiencies (costs, environment build times)
Present
• More granular security controls and protections
• Leveraging serverless and more AWS managed services
Moving at the Speed of DevOps
In the last year, we’ve performed:
27,471 Continuous Integration Build/Deploys
5,495 Internal Testing Deployments
2,747 User Acceptance Test Deployments
686 Production Deployments
Compared to just 50 builds/deploys in 2011
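As a rough worked calculation from the figures on this slide, the growth since 2011 and the share of builds reaching each later stage can be computed as below. This is only a sketch: the variable names are illustrative, and it assumes that each later-stage deployment originates from a CI build, which the slide does not state.

```python
# Annual counts taken from the slide; the dictionary keys are illustrative names.
deployments = {
    "continuous_integration": 27_471,
    "internal_testing": 5_495,
    "user_acceptance_test": 2_747,
    "production": 686,
}
builds_2011 = 50

ci_builds = deployments["continuous_integration"]

# Growth in build/deploy volume relative to 2011.
growth_factor = ci_builds / builds_2011

# Share of CI builds that reached each stage (assumption: every later
# deployment started life as a CI build).
stage_share = {stage: count / ci_builds for stage, count in deployments.items()}

print(f"Growth since 2011: about {growth_factor:.0f}x")
for stage, share in stage_share.items():
    print(f"{stage}: {share:.1%} of CI builds")
```

Under that assumption, roughly one CI build in five reaches internal testing, one in ten reaches user acceptance testing, and about one in forty reaches production.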
Moving at the Speed of DevOps
Before DevOps
• Uptime: varied, sub-90%
• Unplanned issues and outages
• Provisioning: Weeks/months
• Releases: Monthly
• Scalability: Low
• Focus: On “Tech Ops”
After DevOps
• Uptime (Avg): 99.97%
• Dependable deployments
• Provisioning: Hours
• Releases: Daily/Weekly
• Scalability: High
• Focus: On “The Business”
DevOps enables us to deliver what the business needs quickly,
efficiently, and with high quality and dependability
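To put the uptime figures in perspective, the short calculation below converts an average uptime percentage into expected hours of downtime per year. It is a sketch under stated assumptions: a 365-day year, and 90% used as a stand-in for the "sub-90%" figure, so the "before" number is a lower bound on downtime.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Expected hours of downtime per year for a given average uptime."""
    return (100.0 - uptime_percent) / 100.0 * HOURS_PER_YEAR


# "Sub-90%" means downtime was at least this much before DevOps.
before_hours = downtime_hours_per_year(90.0)
after_hours = downtime_hours_per_year(99.97)

print(f"Before (at 90% uptime): at least {before_hours:.0f} hours/year")
print(f"After (at 99.97% uptime): about {after_hours:.2f} hours/year")
```

At 99.97% average uptime, expected downtime is under three hours per year, compared with more than 36 days per year at 90%.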
Decisions We Had To Make for DevOps
Decision: Considerations
Fostering Collaboration: What interest level and skills do we have on Dev teams to enhance DevOps capabilities? How do we enable collaboration?
Effective Coordination: How do we integrate our capabilities into Dev team roadmaps? How do we roll out changes w/o breaking things?
Picking Tools: Which tools for Version Control, CI/CD, Server CM, “Scripting”, Monitoring, Security? How well do they fit with our tech stack?
Delivering Reliability: What level of reliability does the platform need? Do we / when do we need multi-region? How do we get there?
Service Monitoring: What can we get “for free” with AWS? What else do we need? How do we make it easy for Dev teams to add monitors?
Appropriate Security: How much autonomy do we need right now vs controls? How can we automate our security (pro-active vs reactive)?
Ellucian DevOps Transformation
David Joseph, Senior Director, DevOps Adoption and Implementation
Scott Moomaw, Senior Manager, DevOps Adoption and Implementation
Ellucian
SaaS Cloud
AHS Cloud
On Prem
Ellucian’s Customer Base
Good Better Better
Best Best Best
DevOps - Scope and Goals
Repeatable, reliable, deployments and testing
Lower labor costs by eliminating manual touch points
Feedback from Operations
Increase Collaboration
Uptime > 99.9%
R&D Organizational & Culture Transformation
Strategy
Legacy Enterprise Software
Old technology skills
Agile’ish
Content driven releases
Geared to software delivery
Manual centric QA
Heavy manual deployments
Low feature velocity
Long Tenure
Cloud
Cloud and current web skills
DevOps function
Agile & metrics
Stand-alone sprint teams
Time boxed releases
Smaller accelerated feature delivery
CI/automated testing
Load-performance testing
Automated deployment/CD/DevOps
Q1/15 to Present
Skills Review
Hire
Re-Train / Study Groups
Consulting
Evangelize
Process/Org Change
Transform to skills and culture of a Cloud company in 2015
DevOps Organizational Relationships
Pre DevOps Culture
R&D
Operations
Customer Success
DevOps Culture
DevOps Tools SMEs
DevOps Adoption SMEs
R&D
Customer Success
Operations
Building DevOps Culture
Hire DevOps Engineers (Parallel)
Single DevOps team in R&D builds master templates & process
Breakout & embed DevOps with product groups to implement
Engage DevOps Consultants
Utilize In-House Experience
Hand-off to in-house DevOps in each product group
Cross team DevOps SCRUM to keep standard/unblock issues
Automated deployment – Infrastructure-As-Code all products
Exit DevOps Consultants
DevOps On-boarding Process
Conduct high level overview of gap analysis for products
Define the scope for the project
Conduct demo of the pipeline to the product teams
Develop a scope statement for each product.
Develop testing plan and acceptable testing standards
Define criteria for “Done”
Full Pipeline Automation into production
[Figure: pipeline stages from Developers to Production]
Developers: Continuous Build & Integration, Automated Unit Test (Fail Fast / Pass)
Auto Deployment / Continuous Deployment
QA System: Automated Unit & Functional Test (Fail Fast / Pass)
Auto Deployment / Continuous Deployment
Staging: Automated Security & Performance Test (Fail Fast / Pass)
Auto Deployment / Continuous Deployment
Production: Continuous Monitoring, 7x24 NOC
Across all stages: Application Code and Infrastructure as Code, Full Stack under test, Automated Delivery
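The fail-fast gating between pipeline stages can be sketched as follows. This is a minimal illustration, not Ellucian's pipeline: the `run_pipeline` function and the stage checks are hypothetical stand-ins for real test suites and deployment steps.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a check that returns True on pass, False on fail.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failure (fail fast).

    Returns whether every stage passed, plus the names of the stages
    that completed successfully before any failure.
    """
    completed: List[str] = []
    for name, check in stages:
        if not check():
            return False, completed
        completed.append(name)
    return True, completed


# Hypothetical checks; here the staging tests fail on purpose.
stages: List[Stage] = [
    ("build and unit tests", lambda: True),
    ("QA: unit and functional tests", lambda: True),
    ("staging: security and performance tests", lambda: False),
    ("production deployment", lambda: True),
]

passed, completed = run_pipeline(stages)
print(passed, completed)
```

Because the staging check fails, the production deployment never runs, which is the point of placing security and performance tests ahead of production.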
Legacy Systems:
Infrastructure manually deployed and maintained.
High labor, open to human error
Security checks in production
Downtime during upgrades
Ellucian Systems:
Infrastructure code automatically deployed and maintained.
Fully tested with App code
Repeatable and low labor
Security scans BEFORE production
Limited to no downtime during upgrades (blue/green deployments)
Phase 1: Build, Deploy, Test
Phase 2: Operationalize
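The idea behind blue/green deployments can be sketched in a few lines: the new version goes to the idle environment, and traffic switches only once that environment is healthy. The `BlueGreenRouter` class and the health-check callable below are hypothetical names used for illustration, not part of any AWS or Ellucian tooling.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class BlueGreenRouter:
    """Tracks which of two environments currently receives live traffic."""

    versions: Dict[str, str] = field(
        default_factory=lambda: {"blue": "v1", "green": ""}
    )
    live: str = "blue"

    @property
    def idle(self) -> str:
        """The environment that is not serving live traffic."""
        return "green" if self.live == "blue" else "blue"

    def release(self, version: str, healthy: Callable[[str], bool]) -> bool:
        """Deploy to the idle environment; switch traffic only if healthy."""
        target = self.idle
        self.versions[target] = version
        if not healthy(target):
            return False  # live environment is untouched
        self.live = target
        return True


router = BlueGreenRouter()
switched = router.release("v2", healthy=lambda env: True)
print(switched, router.live, router.versions[router.live])
```

If the health check fails, the previous environment keeps serving traffic, so a bad release does not cause downtime.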
DevOps Maturity Assessment
An Aggregate Assessment
LEVEL 0 (BASE): less than 20%
LEVEL 1 (BEGINNER): between 20% and 39%
LEVEL 2 (INTERMEDIATE): between 40% and 59%
LEVEL 3 (ADVANCED): between 60% and 79%
LEVEL 4 (LEADER): 80% or greater
Update / Evaluate / Measure
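The mapping from an aggregate score to a maturity level can be expressed as a small lookup. This is a sketch: the function name is hypothetical, and it assumes that fractional scores (for example 39.5%) stay in the lower band until the next threshold is reached, which the slide does not specify.

```python
from bisect import bisect_right
from typing import Tuple

LEVEL_NAMES = ["BASE", "BEGINNER", "INTERMEDIATE", "ADVANCED", "LEADER"]
THRESHOLDS = [20, 40, 60, 80]  # lower bounds of levels 1 through 4


def maturity_level(score_percent: float) -> Tuple[int, str]:
    """Return (level number, level name) for an aggregate score in percent."""
    if not 0 <= score_percent <= 100:
        raise ValueError("score_percent must be between 0 and 100")
    # Count how many thresholds the score has reached or passed.
    level = bisect_right(THRESHOLDS, score_percent)
    return level, LEVEL_NAMES[level]


for score in (19, 20, 59, 80):
    print(score, maturity_level(score))
```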
DevOps Maturity Framework
Category: CM, Build, Test, Deploy, Operations
LEVEL 0
• CM: Source control used but some items may not be properly versioned. No traceability from source to binaries.
• Build: Manual builds. Manual dependency mgmt. Some items not even fully source controlled.
• Test: Manual testing after development.
• Deploy: Manual processes for deploying hardware and software.
• Operations: Disparate logging and reporting. Issues discovered by customers.
LEVEL 1
• CM: All items, including build/deploy scripts, in source control ensuring repeatable builds.
• Build: Automated, repeatable builds. All items are under source control.
• Test: Able to support automated testing during the build/deploy sequence. Clearly tracked metrics showing incremental improvements in testing maturity.
• Deploy: Some automation for provisioning/deployment but varies by environment. Deployed assets are tagged for tracking.
• Operations: Centralized logging permitting operational analytics.
LEVEL 2
• CM: Separate repositories in use for infrastructure, application, etc. artifacts. Artifacts, in binary repository, tagged and fully traceable to source.
• Build: Automated builds include integrated unit tests and code coverage.
• Test: Clear acceptance criteria for each story with automated tests validating acceptance. Increased level of functional, non-functional, and unit tests.
• Deploy: Deployment/provisioning uses "Infrastructure as code" and uses the approved VPC architecture.
• Operations: Adequate training, feedback, monitoring and preparation has been completed to enable Cloud Ops to appropriately support the application and meet SLAs.
LEVEL 3
• CM: Formal branching strategies, using best practices, in use to support release life-cycles.
• Build: Continuous builds (CI) with managed dependencies. Metrics tracked.
• Test: High level of functional, non-functional, and unit test coverage including integration testing for related applications.
• Deploy: Consistent, automated tools for deploying/provisioning all environments. Supports smooth upgrades across application versions. Migration path to production planned.
• Operations: Reporting and billing mgmt centralized. Routine activities by CloudOps engineers are automated. Disaster Recovery plans in place.
LEVEL 4
• Change management procedures are actively followed in production, ensuring that DevOps infrastructure definitions are updated.
• Automated fail-over, disaster recovery for production environments in place.
DevOps Maturity Framework
Category: Security, Performance, Database, Environments
LEVEL 0
• Security: Decentralized security with weak security policies and procedures in place.
• Performance: No formal performance monitoring.
• Database: Manual database schema and data management. Manual database server deployment.
• Environments: No deployment via DevOps pipeline.
LEVEL 1
• Security: Centralized security monitoring and escalations.
• Performance: Performance monitoring generates notification of issues.
• Database: Automated db schema management from source control. Manual db server deployments.
• Environments: Pipeline deploys to staging / Testing.
LEVEL 2
• Security: Data privacy issues are tracked and mitigated.
• Performance: APM tools are used to monitor and adjust application performance.
• Database: Automated db server deployments (e.g. AMI or RDS instances).
• Environments: Pipeline facilitates automated creation of development environments.
LEVEL 3
• Security: Penetration tests are utilized across production environments.
• Performance: Application scales across multi-AZ/regions for performance characteristics. Performance monitoring triggers automatic scaling and issue remediation.
• Database: Automated DB schema and basic data updates performed during deployment using source control artifacts.
LEVEL 4
• Environments: Production/preview environments created from assets that are promoted from staging.
DevOps – metric driven
JIRA Backlogs
• DevOps Maturity
Testing Frameworks
• Unit Test Coverage
• Functional Testing
• Performance Testing
Security Tools
• Vulnerability Scanning
• Remediation
The Well-Architected Framework
Security:
The ability to protect information, systems, and assets while delivering business value
through risk assessments and mitigation strategies.
Reliability:
The ability of a system to recover from infrastructure or service failures, dynamically
acquire computing resources to meet demand, and mitigate disruptions such as
misconfigurations or transient network issues.
Performance Efficiency:
The ability to use computing resources efficiently to meet system requirements, and to
maintain that efficiency as demand changes and technologies evolve.
Cost Optimization:
The ability to avoid or eliminate unneeded cost or suboptimal resources.
Operational Excellence:
The ability to run and monitor systems to deliver business value and to continually
improve supporting processes and procedures.
Ellucian’s Culture Maturity
Before
Mostly Lift-and-Shift into AWS
Very Little Test Coverage
Security Scans Ad-hoc
Sparse CI, No Real CD Processes
New Node Deployments Man-weeks: Manual
After
Refactoring Into Cloud-Native Apps
Improved Automated Test Coverage
Security Scans in DevOps Pipeline
7000+ Jenkins Jobs Running Daily
New Node Deployments ~4 Hours: Automated
Thank You!