Host a hit site in the cloud without downtime or going broke
David Veksler
Nuts and bolts of running a popular site in the AWS cloud
• I will share how we develop and host a popular publishing platform in the cloud with a limited budget and technology team.
• We'll cover architecture, including a variety of services at Amazon Web Services such as Elastic Load Balancing, S3, Elastic Beanstalk, and RDS in the context of a real site.
• We'll cover how we control costs with Spot and burstable instances and scale up with distributed caching.
• Finally, we'll discuss continuous deployment strategies for Windows and Linux-based cloud applications in the context of a distributed team using an agile process.
Contents
1. Cloud Architecture
2. Key AWS Services
3. Keeping costs under control
4. Configuration management
5. Key tools for distributed agile development
Architecture Overview
[Architecture diagram: Amazon Web Services Cloud, Northern Virginia AZ]
• Users reach the site through Cloudflare (DNS, CDN, and firewall services)
• Elastic Load Balancing: lb.fee.org
• Spot Instance Fleet: Web1.fee.org and Web2.fee.org (EC2 VMs, C4.2xlarge); web#.FEE.org instances use spot pricing to bid for the best price
• FEE.org Admin Node: Admin.fee.org / Fee-dev.org (EC2 VM, C4.2xlarge); admin.fee.org hosts both live and dev, and acts as staging for deployments
• TeamCity CI: Fee-dev.org:8080
• RDS (FEE-DB security group): LIVE DB: feedb2, DEV DB: fee-dev2
• Redis Cache cluster: fee-cache-001, fee-cache-002
• S3 (US-Standard Region): fee-media (Media Storage), fee-misc (Backups)
• SES Internal Email
• External services shown: Analytics & Content Recommendations, Marketing Email
Other Services:
• Domain: Google Domains
• Performance: New Relic Pro
• Analytics: Parse.ly, Clicky, Google Analytics
• Uptime: Pingdom
• Email: MailChimp
• Code: BitBucket
High-level objectives (by priority)
1. Front end uptime should be 99.8%
2. Back Office (admin) uptime should be 95%
3. Keep personal information (payments, admin access) secure
4. Stay up during traffic surges up to 6X weekly peak
5. Keep budget under $1,600/month
6. Ongoing development should not impact uptime
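To make the uptime targets concrete, the sketch below converts an uptime percentage into a downtime allowance. The 30-day period and the function name are assumptions chosen for illustration; the talk does not say over what window uptime is measured.

```python
def allowed_downtime_minutes(uptime_percent, period_days=30.0):
    """Minutes of downtime permitted in the period at the given uptime target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0


# Targets taken from the objectives above; the 30-day window is an assumption.
front_end = allowed_downtime_minutes(99.8)
back_office = allowed_downtime_minutes(95.0)

print(f"Front end:   {front_end:.1f} minutes per 30 days")
print(f"Back office: {back_office / 60:.1f} hours per 30 days")
```

Under these assumptions the front end may be down for roughly an hour and a half per month, while the back office has about a day and a half of slack, which is why the two tiers can be built to different standards.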
Design strategy
1. All components should be redundant and self-healing
2. Pay for normal load while supporting surges
3. Outsource infrastructure: let AWS cloud be responsible for as much infrastructure as feasible
4. Automate all backup processes
5. Semi-automated disaster recovery: site should recover from most outages automatically, when cost of doing so is reasonable
6. Change management integrated into architecture via imaging and cache keys
Architecture Summary
• Front-end is load balanced, scalable, and self-healing
• Backend is isolated from front-end
• Automatic snapshots for servers, transaction logging for DB
• Rely on AWS services for all infrastructure services
• Combine functionality within servers to save costs
• Massively over-allocate capacity using market-based pricing
• Development process integrated with production architecture
Amazon Cloud Services Used
• Load balancing: Elastic Load Balancer
• Virtual machines: EC2 Spot Instances
• Databases: RDS (SQL Server)
• Media Storage & Backups: S3
• Distributed Cache: ElastiCache (Redis)
• CDN: CloudFlare (in place of CloudFront)
• Email: Amazon SES
Other Cloud Services
• Analytics: Parse.ly, Clicky, Google Analytics
• Performance: New Relic Pro
• Email: MailChimp (Campaigns & Automations)
Selected Services in Detail
Why CloudFlare is awesome
• Flat-rate CDN service (supports CDN daisy-chaining)
• Free, powerful SSL
• Active, crowd-sourced firewall
• Powerful DNS (CNAME flattening, much more)
• HTML and image minification
• Much more!
• Saves FEE.org thousands of dollars per year in bandwidth costs
• Starts at $20/month
Elastic Load Balancer
• Point DNS at CNAME of load balancer
• Point destination to specific VMs or use auto-scaling rules
• Set destination by path pattern with Application Load Balancer
• Use TCP, HTTP, or SSL for health checks
• We use a custom health check endpoint which verifies application uptime & DB connectivity
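A custom health check of the kind described above can be sketched as follows. This is an illustration only, not FEE.org's implementation (their site is .NET-based): the `/health` path, the function names, and the in-memory SQLite connection standing in for the real database are all assumptions.

```python
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple


def database_is_reachable(connect=lambda: sqlite3.connect(":memory:")) -> bool:
    """Return True if a trivial query succeeds.

    `connect` is a hypothetical stand-in for the real DB connection factory.
    """
    try:
        conn = connect()
        try:
            conn.execute("SELECT 1")
        finally:
            conn.close()
        return True
    except Exception:
        return False


def health_status(db_ok: bool) -> Tuple[int, bytes]:
    """Map the DB check to an HTTP status the load balancer can act on."""
    return (200, b"OK") if db_ok else (503, b"DB unavailable")


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":  # assumed path for the health check
            self.send_error(404)
            return
        status, body = health_status(database_is_reachable())
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the health endpoint until interrupted."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling `serve()` starts the endpoint. A load balancer configured with an HTTP health check against `/health` would then mark the instance unhealthy whenever the database query fails, not only when the web server itself is down.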
RDS: Relational Database Service
• FEE.org uses SQL Server Web
• Other sites use Aurora, which can be up to 10X faster than MySQL (with proper tuning, in specific scenarios)
• Use snapshots to create dev instances of DB
• Schedule configuration changes for off-hours
• Be aware that RDS SQL Server restricts most admin actions. There are special sprocs for some actions such as renaming DB or bringing DB online (but not taking offline!)
• Backup restore not allowed: use SQL Database Migration Wizard to restore DB
• Use burstable SQL Server instances, especially for dev DB
S3: Media storage + backup
• FEE.org uses S3 as a media (Image/PDF/EPUB/MP4/MP3) store
• Only originals are stored in S3, thumbnails are stored on server
• Amazon Web Services S3 IFileSystem provider for Umbraco + a custom caching layer
• XSLT transforms to specify production/dev buckets
Spot Instances
• Instances only run when the market price is below the bid price
• In practical terms, Spot = 80% saving on hourly instances
• Supports auto scaling. Use it!
• Set bid price equal to hourly instance price and get 100% availability (so far)
• Specify a range of qualified instance types (including previous generations) to maximize chance of availability.
• FEE.org runs master server as xlarge hourly instance and read-only nodes as 2xlarge Spot instances. This guarantees at least 1 cheap(er) instance even if prices spike or instances refresh at the same time.
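The bidding behaviour above can be illustrated with a small simulation. The price series and rates below are invented for the example, and the model assumes you pay the market price (not your bid) for each hour the instance runs.

```python
def simulate_spot(hourly_prices, bid, on_demand_price):
    """Return (availability, savings) for a bid against hourly market prices.

    availability: fraction of hours the instance would have run (price <= bid)
    savings: fraction saved versus paying on_demand_price for those same hours
    """
    running = [price for price in hourly_prices if price <= bid]
    availability = len(running) / len(hourly_prices)
    spot_cost = sum(running)  # assumed: pay the market price, not the bid
    on_demand_cost = on_demand_price * len(running)
    savings = 1 - spot_cost / on_demand_cost if on_demand_cost else 0.0
    return availability, savings


# Hypothetical hourly market prices in $/hour, with one spike.
prices = [0.08, 0.09, 0.10, 0.60, 0.08]
ON_DEMAND = 0.40  # hypothetical hourly instance price

high_bid = simulate_spot(prices, bid=ON_DEMAND, on_demand_price=ON_DEMAND)
low_bid = simulate_spot(prices, bid=0.085, on_demand_price=ON_DEMAND)
print("bid at hourly price:", high_bid)
print("low bid:            ", low_bid)
```

In this made-up series, bidding the full hourly price keeps the instance running through everything except the spike while still paying far less than the hourly rate, whereas a low bid trades availability for little extra saving.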
Spot Pricing History
Elastic Load Balancer
Auto Scaling
Example: Netflix
• http://techblog.netflix.com/2012/01/auto-scaling-in-amazon-cloud.html
Red = # of servers
Green = CPU utilization
Auto Scaling
Build cloud systems that scale automatically to meet current demand
When to auto-scale?
• Instances don’t take very long to spin up
• Individual instances don’t use too many resources
• Version release process is automated (such as with Elastic Beanstalk)
• Releases are infrequent, or the cost of snapshot management is minimal
• Large difference between minimum and peak traffic
• Unpredictable traffic trends
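As a rough illustration of what an auto-scaling rule computes, the sketch below uses a simple proportional rule: scale the fleet so that average CPU moves toward a target. This is an assumed simplification for illustration, not the exact algorithm AWS Auto Scaling applies, and all names and thresholds are hypothetical.

```python
import math


def desired_instances(current, cpu_percent, target_cpu_percent, minimum, maximum):
    """Proportional scaling rule, clamped to the allowed fleet size."""
    raw = math.ceil(current * cpu_percent / target_cpu_percent)
    return max(minimum, min(maximum, raw))


# Two instances at 90% CPU with a 60% target: scale out to three.
print(desired_instances(current=2, cpu_percent=90, target_cpu_percent=60,
                        minimum=2, maximum=10))
```

The `minimum` clamp matters for the criteria above: when the high-availability floor is close to what peak traffic needs, the rule rarely changes the fleet size and auto-scaling buys little.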
Alternatives to auto-scaling
• Burstable instances
• Spot Instances
• Schedule on/off instance times with AutomatiCloud
Why doesn’t FEE.org auto-scale?
• Minimum instance count for high availability is 3
• Peak traffic (> 600 concurrent users) can be handled by 2 instances
• Each instance requires 16 GB RAM and 8 CPUs for optimal performance
• Release process not fully automated & no full-time developers (do not use Elastic Beanstalk & have to make manual snapshots post-release)
• Can spin up new instances within minutes with Spot + New Relic Alerts
• Will probably consider auto-scaling when we have more process maturity (fully automated release process)
More: http://www.slideshare.net/DavidVeksler/auto-scaling-websites-in-the-cloud
Elastic Beanstalk
Elastic Beanstalk
• Upload DLLs to AWS git repository, AWS does the rest
• AWS will deploy the code and handle load balancing, auto-scaling, health monitoring, etc.
• Environment configuration with web.config XSLT transforms and ACL permissions (wpp.targets) file.
• FREE service – only pay for resources used
• If using .NET, works with most 100% managed code projects
• GUI integrated with Visual Studio
Cloud hosting on a budget
Thinking about IaaS/SaaS Pricing Strategy
• Cloud services almost always cost much more per compute resource than colocation or dedicated hardware
• Cost savings come from matching demand to infrastructure and outsourcing management services
• Amazon & Azure are some of the most costly cloud services per resource, but are recommended for most scenarios because of productivity benefits from the breadth and depth of their managed services.
Cloud Services Pricing Summary
• Each cloud service provider has a unique bundle of services and pricing model. Different providers have unique price advantages for different products. Provider selection should be based on a typical application mix for our business.
• Azure may have a price advantage over Amazon when using cloud-optimized architecture based on Microsoft products.
• Softlayer, Digital Ocean, and Google Compute all have better prices than both for various scenarios, especially Windows VMs, but offer fewer services.
• Cost is just one of many criteria for choosing a provider! No provider has a decisive advantage for all scenarios.
Pricing Recommendations
1. Use the pricing calculator offered by each provider to estimate total application cost for specific applications. Keep in mind cloud-optimized architectures may have a much lower cost. (For example, compute functions instantiated on-demand, auto-scaling, etc.)
2. Do not make pricing the primary consideration in provider selection unless the cost difference is critical to business requirements. In general, major service and quality differences between providers are more important than pricing considerations.
3. Developing deep expertise and service integration with a cloud provider is usually more important than cost differences for individual projects.
Saving Money with AWS
• Reserved Instances
• Spot Instances
• Burstable Instances
• Scheduled Instances (using AWS or third party tool)
• This can be used with any AWS VM service – EC2, RDS, ElastiCache, etc!
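A back-of-the-envelope comparison of these options might look like the sketch below. The hourly rate and the schedule are hypothetical; only the 80% Spot saving and the $1,600/month budget come from earlier slides. Real estimates should come from the provider's pricing calculator.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)


def monthly_cost(hourly_rate, instances=1, hours=HOURS_PER_MONTH):
    """Monthly cost of running `instances` VMs for `hours` each."""
    return hourly_rate * instances * hours


ON_DEMAND_RATE = 0.50    # hypothetical $/hour for one front-end VM
SPOT_DISCOUNT = 0.80     # "Spot = 80% saving" from the Spot Instances slide
MONTHLY_BUDGET = 1600.0  # budget objective from the talk

on_demand = monthly_cost(ON_DEMAND_RATE, instances=2)
spot = on_demand * (1 - SPOT_DISCOUNT)
# Scheduled: one VM running 12 hours on 22 working days (assumed schedule).
scheduled = monthly_cost(ON_DEMAND_RATE, hours=12 * 22)

for label, cost in [("2x on-demand", on_demand), ("2x spot", spot),
                    ("1x scheduled", scheduled)]:
    print(f"{label:13s} ${cost:8.2f}  ({cost / MONTHLY_BUDGET:.0%} of budget)")
```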
AWS Instance type selection criteria
• Use the latest generation of instance types (x4/t2)
• Use burstable instances for applications with high daily variability
• Evaluate whether applications are CPU, memory, or IO intensive and select the appropriate type – scale up your particular bottleneck
• For applications with consistent and predictable load, prefer larger instances; for applications with unpredictable load, auto-scale horizontally with more burstable instances
Buying a reserved instance
• Unsure about your needs? Get a convertible instance! Can move up or across.
• You can sell them! (I haven’t tried this)
• Best savings/risk is usually with the partial upfront payment option.
S3 Reduced Redundancy Store & Glacier
• “Only” duplicated across 2 facilities
• .01% storage failure rate (“400 times the durability of typical HDD”)
• About 25% cheaper
JPEGmini
• Background service via event handler on the media-upload-completed method
• 412 GB * $0.0314 per GB per month * 12 months = $155/year saved on storage alone
• Runs as AWS Marketplace service ($39/month) or desktop app
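The storage-savings figure on the JPEGmini slide only works out if the per-GB price is a monthly rate applied over twelve months. The sketch below checks that arithmetic under this assumption; the function and parameter names are illustrative.

```python
def yearly_storage_savings(gb_saved, price_per_gb_month):
    """Yearly saving from storing `gb_saved` fewer GB at a monthly per-GB price."""
    return gb_saved * price_per_gb_month * 12


# Figures from the slide; treating $0.0314 as a per-GB monthly price is an assumption.
savings = yearly_storage_savings(gb_saved=412, price_per_gb_month=0.0314)
print(f"${savings:.2f} per year")
```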
Summary: FEE.org $ saving strategy:
• 2 reserved burstable RDS databases
• 1 reserved admin EC2 VM
• 2 Spot EC2 front-end server instances
• AutomatiCloud EC2 scheduling for off-hours (and backup automation)
• S3 Reduced Redundancy Store for non-critical backups and dev data
• CloudFlare CDN
• JPEGmini image optimization background service
Continuous Deployment Strategies
FEE Development Process
1. Post job on UpWork.com
2. Hire freelancer
3. Developer commits work to git
4. Deploy to dev environment
5. Test work
6. Create pull request for release
7. Release build
8. Staged deployment to production servers
Development Process in Detail
UpWork.com
Orientation
• Google Doc with:
• Architectural overview
• FEE.org development process
• Instructions to set up localhost environment
• Review of tools used
• Relevant people involved & their contact info
• Address of FEE-Dev Skype group
• Code Quality Expectations
Development Environment Setup
1. Check out git repository
2. “Just hit F5”
• NuGet for all dependencies
• XSLT for non-local environments
• Dev DB hosted in cloud
• Optional: Install Redis on localhost for better performance
Continuous Integration
http://fee-dev.org:8080/
Log in as guest now!
Release Build
Staged, Staggered Deployment
• xcopy to each production server
• ELB takes server out of production within 30 seconds
• Stagger release by ~5 minutes to let each application pool warm up
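The staggered rollout described above can be sketched as a loop that updates one server at a time. This is a minimal illustration, not the actual release script: the `deploy` and `is_healthy` callables are hypothetical hooks (for example, a file copy and a request to a health check URL), and the post-warm-up health check is an added safeguard that the slides do not mention.

```python
import time


def staggered_deploy(servers, deploy, is_healthy, warmup_seconds=300,
                     sleep=time.sleep):
    """Deploy to servers one at a time, pausing between them.

    deploy(server): copies the release build to the server (hypothetical hook)
    is_healthy(server): returns True once the server responds correctly
    warmup_seconds: pause between servers; ~5 minutes per the slide
    """
    deployed = []
    for server in servers:
        deploy(server)          # the load balancer drops the server while it restarts
        sleep(warmup_seconds)   # let the application pool warm up
        if not is_healthy(server):
            # Stop here so the remaining servers keep serving the old build.
            raise RuntimeError(f"{server} unhealthy after deploy; stopping rollout")
        deployed.append(server)
    return deployed
```

Because only one server is out of rotation at any moment, the load balancer always has at least one warm instance to send traffic to, which is what keeps releases from affecting uptime.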
Environment Monitoring
Collaboration & Internal Messaging
• SlackBot
Project Management
Aside: LAMP deployment strategy (highly available WordPress)
• Commit hooks on master branch in Bitbucket git repository
• Hooks call deploy.php script which runs a git pull in dev environment
• Release PHP code with git pull on production
• Image staging server (AMI), and deploy Spot fleet with AMI
• Use S3 Media storage provider, Redis cache – no persistent data on Spot instances
• EasyEngine for easy nginx configuration, etckeeper to backup/sync configuration files
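The hook-driven `git pull` can be sketched as below. The talk uses a PHP script (deploy.php); this Python version is a stand-in for illustration, and the repository path, remote, and branch names are assumptions.

```python
import subprocess


def git_pull_command(repo_dir, remote="origin", branch="master"):
    """Build the git command that fast-forwards the checkout in repo_dir."""
    return ["git", "-C", str(repo_dir), "pull", "--ff-only", remote, branch]


def deploy(repo_dir, run=subprocess.run):
    """Run the pull and report success; `run` is injectable for testing."""
    result = run(git_pull_command(repo_dir), capture_output=True, text=True)
    return result.returncode == 0
```

A commit hook would call `deploy("/var/www/site")` (hypothetical path) after a push to the master branch. Using `--ff-only` makes the pull fail rather than create a merge commit if the server's checkout has diverged.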
@AtlCodeCamp
https://AtlantaCodeCamp.com/2016