Nuts and bolts of running a popular site in the AWS cloud


Host a hit site in the cloud without downtime or going broke

David Veksler

Nuts and bolts of running a popular site in the AWS cloud
• I will share how we develop and host a popular publishing platform in the cloud with a limited budget and technology team.
• We'll cover architecture, including a variety of services at Amazon Web Services such as Elastic Load Balancing, S3, Elastic Beanstalk, and RDS, in the context of a real site.
• We'll cover how we control costs with Spot and burstable instances and scale up with distributed caching.
• Finally, we'll discuss continuous deployment strategies for Windows and Linux-based cloud applications in the context of a distributed team using an agile process.

Contents
1. Cloud Architecture
2. Key AWS Services
3. Keeping costs under control
4. Configuration management
5. Key tools for distributed agile development

Architecture Overview

[Architecture diagram: Amazon Web Services Cloud, Northern Virginia AZ]
• Users reach the site through Cloudflare: DNS, CDN, and firewall services
• Elastic Load Balancing: lb.fee.org
• Spot Instance Fleet: Web1.fee.org and Web2.fee.org. The web#.FEE.org instances use spot pricing to bid for the best price
• FEE.org Admin Node: Admin.fee.org and Fee-dev.org, with TeamCity CI at Fee-dev.org:8080. admin.fee.org hosts both live and dev, and acts as staging for deployments
• EC2 VMs are labeled C4.2xlarge in the diagram
• RDS (FEE-DB security group): LIVE DB feedb2 and DEV DB fee-dev2
• Redis Cache cluster: fee-cache-001, fee-cache-002
• S3 (US-Standard Region): fee-media for media storage, fee-misc for backups
• SES: internal email

Other Services:
• Domain: Google Domains
• Performance: New Relic Pro
• Analytics & Content Recommendations: Parse.ly, Clicky, Google Analytics
• Uptime: Pingdom
• Marketing Email: MailChimp
• Code: BitBucket

High-level objectives (by priority)
1. Front end uptime should be 99.8%
2. Back Office (admin) uptime should be 95%
3. Keep personal information (payments, admin access) secure
4. Stay up during traffic surges up to 6X weekly peak
5. Keep budget under $1,600/month
6. Ongoing development should not impact uptime
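To make the uptime targets concrete, the arithmetic below converts each percentage into an allowed-downtime budget. It is only a sketch: the 365-day year and 30-day month are simplifying assumptions, and the function name is illustrative rather than anything from the talk.

```python
HOURS_PER_YEAR = 365 * 24    # assumption: non-leap year
HOURS_PER_MONTH = 30 * 24    # assumption: 30-day month


def allowed_downtime_hours(uptime_percent: float, period_hours: float) -> float:
    """Return how many hours of downtime an uptime target permits in a period."""
    return (1 - uptime_percent / 100) * period_hours


if __name__ == "__main__":
    for name, target in [("Front end", 99.8), ("Back office", 95.0)]:
        per_year = allowed_downtime_hours(target, HOURS_PER_YEAR)
        per_month = allowed_downtime_hours(target, HOURS_PER_MONTH)
        print(f"{name}: {target}% uptime allows about "
              f"{per_year:.1f} h/year ({per_month:.2f} h/month)")
```

Under these assumptions, 99.8% leaves roughly 17.5 hours of front-end downtime per year, while 95% leaves roughly 438 hours for the back office.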

Design strategy
1. All components should be redundant and self-healing
2. Pay for normal load while supporting surges
3. Outsource infrastructure: let AWS cloud be responsible for as much infrastructure as feasible
4. Automate all backup processes
5. Semi-automated disaster recovery: site should recover from most outages automatically, when cost of doing so is reasonable
6. Change management integrated into architecture via imaging and cache keys

Architecture Summary
• Front-end is load balanced, scalable, and self-healing
• Backend is isolated from front-end
• Automatic snapshots for servers, transaction logging for DB
• Rely on AWS services for all infrastructure services
• Combine functionality within servers to save costs
• Massively over-allocate capacity using market-based pricing
• Development process integrated with production architecture


Amazon Cloud Services Used
• Load balancing: Elastic Load Balancer
• Virtual machines: EC2 Spot Instances
• Databases: RDS (SQL Server)
• Media Storage & Backups: S3
• Distributed Cache: ElastiCache (Redis)
• CDN: CloudFlare (used in place of CloudFront)
• Email: Amazon SES

Other Cloud Services
• Analytics: Parse.ly, Clicky, Google Analytics
• Performance: New Relic Pro
• Email: MailChimp (Campaigns & Automations)

Selected Services in Detail

Why CloudFlare is awesome
• Flat-rate CDN service (supports CDN daisy-chaining)
• Free, powerful SSL
• Active, crowd-sourced firewall
• Powerful DNS (CNAME flattening, much more)
• HTML and image minification
• Much more!
• Saves FEE.org thousands of dollars per year in bandwidth costs
• Starts at $20/month


Elastic Load Balancer
• Point DNS at CNAME of load balancer
• Point destination to specific VMs or use auto-scaling rules
• Set destination by path pattern with Application Load Balancer
• Use TCP, HTTP, SSL for health check
• We use a custom health check endpoint which verifies application uptime & DB connectivity
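A custom health check endpoint of this kind can be sketched as follows. This is an illustration in Python, not the FEE.org implementation (which runs on .NET and SQL Server): the `/health` path, the `SELECT 1` probe, and the use of `sqlite3` as a stand-in database driver are all assumptions. The idea is that the endpoint returns 200 only when the application can reach its database, so the load balancer removes a node whose database connection is broken even if the web server itself still answers.

```python
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Tuple


def database_is_reachable() -> bool:
    """Run a trivial query; any database error counts as 'database down'.

    sqlite3 is only a stand-in for the real database driver (assumption).
    """
    try:
        conn = sqlite3.connect(":memory:")
        try:
            conn.execute("SELECT 1").fetchone()
        finally:
            conn.close()
        return True
    except sqlite3.Error:
        return False


def health_status(db_check: Callable[[], bool] = database_is_reachable) -> Tuple[int, str]:
    """Map the dependency check to an HTTP status code and a short body."""
    if db_check():
        return 200, "OK"
    return 503, "database unreachable"


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":  # hypothetical path
            self.send_error(404)
            return
        status, body = health_status()
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8000) -> None:
    """Start the health endpoint; blocks until interrupted."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling `serve()` starts the endpoint; the load balancer's HTTP health check would then be pointed at the chosen path and port.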

RDS: Relational Database Service
• FEE.org uses SQL Server Web
• Other sites use Aurora, which is 10X faster than MySQL (with proper tuning, in specific scenarios)
• Use snapshots to create dev instances of DB
• Schedule configuration changes for off-hours
• Be aware that RDS SQL Server restricts most admin actions. There are special sprocs for some actions such as renaming a DB or bringing a DB online (but not taking it offline!)
• Backup restore not allowed: use SQL Database Migration Wizard to restore DB
• Use burstable SQL Server instances, especially for dev DB

S3: Media storage + backup
• FEE.org uses S3 as a media (Image/PDF/EPUB/MP4/MP3) store
• Only originals are stored in S3, thumbnails are stored on server
• Amazon Web Services S3 IFileSystem provider for Umbraco + a custom caching layer
• XSLT transforms to specify production/dev buckets
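The caching idea can be sketched as a read-through cache: serve a file from local disk when it is already there, and fetch the original from the remote store only on a miss. The sketch below is in Python rather than the Umbraco/.NET stack the site uses, and every name in it is hypothetical; `fetch_original` stands in for whatever call downloads an object from S3.

```python
from pathlib import Path
from typing import Callable


class ReadThroughMediaCache:
    """Serve media from local disk, fetching from the remote store on a miss."""

    def __init__(self, cache_dir: Path, fetch_original: Callable[[str], bytes]):
        self.cache_dir = Path(cache_dir)
        self.fetch_original = fetch_original  # hypothetical stand-in for an S3 download

    def _local_path(self, key: str) -> Path:
        # Reject keys that would resolve outside the cache directory.
        path = (self.cache_dir / key).resolve()
        if self.cache_dir.resolve() not in path.parents:
            raise ValueError(f"invalid media key: {key!r}")
        return path

    def get(self, key: str) -> bytes:
        path = self._local_path(key)
        if path.exists():                    # cache hit: no remote call
            return path.read_bytes()
        data = self.fetch_original(key)      # cache miss: go to the remote store
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
        return data
```

Because the local copies can always be rebuilt from the originals, a design like this keeps the web servers disposable, which matters when they run as Spot instances.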

Spot Instances
• Instances only run while the market price is below the bid price
• In practical terms, Spot = 80% savings compared with hourly instances
• Supports auto scaling. Use it!
• Set bid price equal to hourly instance price and get 100% availability (so far)
• Specify a range of qualified instance types (including previous generations) to maximize chance of availability.
• FEE.org runs master server as xlarge hourly instance and read-only nodes as 2xlarge Spot instances. This guarantees at least 1 cheap(er) instance even if prices spike or instances refresh at the same time.
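The bidding rule above can be illustrated with a small calculation: given a series of hourly market prices and a bid, how often would the instance run, and what would it cost on average? This is a sketch under stated assumptions. The prices are made up for illustration and are not real AWS prices, an hour whose price equals the bid is treated as still running, and the instance is assumed to be billed the market price rather than the bid.

```python
from typing import Sequence, Tuple


def spot_summary(hourly_prices: Sequence[float], bid: float) -> Tuple[float, float]:
    """Return (fraction of hours running, average price paid per running hour).

    Assumes the instance runs in every hour whose market price is at or below
    the bid, and that it is billed the market price rather than the bid.
    """
    running = [p for p in hourly_prices if p <= bid]
    if not running:
        return 0.0, 0.0
    availability = len(running) / len(hourly_prices)
    average_paid = sum(running) / len(running)
    return availability, average_paid


# Hypothetical numbers for illustration only; not real AWS prices.
ON_DEMAND_PRICE = 0.40
SAMPLE_PRICES = [0.08, 0.07, 0.09, 0.50, 0.08, 0.10, 0.08, 0.08]

availability, paid = spot_summary(SAMPLE_PRICES, bid=ON_DEMAND_PRICE)
savings = 1 - paid / ON_DEMAND_PRICE
print(f"running {availability:.0%} of hours, paying {paid:.3f}/h, saving {savings:.0%}")
```

Bidding the hourly (on-demand) price caps the worst case at what an hourly instance would cost anyway, while the instance is only interrupted during the rare hours when the market price spikes above it.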

Spot Pricing History

Elastic Load Balancer

Auto Scaling

Example: Netflix
• http://techblog.netflix.com/2012/01/auto-scaling-in-amazon-cloud.html

[Chart legend: Red = # of servers, Green = CPU utilization]

Auto Scaling

Build cloud systems that scale automatically to meet current demand
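As a rough illustration of the decision an auto-scaling rule makes, the sketch below applies a simple proportional rule to average CPU utilization. It is a generic example, not the algorithm AWS Auto Scaling uses; the target utilization and the minimum and maximum fleet sizes are assumed values.

```python
import math


def desired_instance_count(current: int, cpu_percent: float,
                           target_percent: float = 50.0,
                           minimum: int = 2, maximum: int = 10) -> int:
    """Scale the fleet so average CPU moves toward the target utilization."""
    if current < 1:
        raise ValueError("current must be at least 1")
    wanted = math.ceil(current * cpu_percent / target_percent)
    # Clamp to the allowed fleet size.
    return max(minimum, min(maximum, wanted))
```

For instance, two instances averaging 90% CPU against a 50% target would grow to four, while a mostly idle fleet shrinks only as far as the configured minimum.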

When to auto-scale?
• Instances that don’t take very long to spin up
• Individual instances don’t use too many resources
• Version release process is automated (such as with Elastic Beanstalk)
• Don’t release very often, or cost of snapshot management is minimal
• Large difference between minimum and peak traffic
• Unpredictable traffic trends

Alternatives to auto-scaling
• Burstable instances
• Spot Instances
• Schedule on/off instance times with AutomatiCloud

Why doesn’t FEE.org auto-scale?
• Minimum instance count for high availability is 3
• Peak traffic (> 600 concurrent users) can be handled by 2 instances
• Each instance requires 16GB RAM and 8 CPUs for optimal performance
• Release process not fully automated & no full-time developers (do not use Elastic Beanstalk & have to make manual snapshots post-release)
• Can spin up new instances within minutes with Spot + New Relic Alerts
• Will probably consider auto-scaling when we have more process maturity (fully automated release process)

Elastic Beanstalk

Elastic Beanstalk
• Upload DLLs to AWS git repository, AWS does the rest
• AWS will deploy the code and handle load balancing, auto-scaling, health monitoring, etc.
• Environment configuration with web.config XSLT transforms and an ACL permissions (wpp.targets) file
• FREE service: only pay for resources used
• If using .NET, works with most 100% managed code projects
• GUI integrated with Visual Studio

Cloud hosting on a budget

Thinking about IaaS/SaaS Pricing Strategy
• Cloud services almost always cost much more per compute resource than colocation or dedicated hardware
• Cost savings come from matching demand to infrastructure and outsourcing management services
• Amazon & Azure are some of the most costly cloud services per resource, but are recommended for most scenarios because of the productivity benefits from the breadth and depth of their managed services.


Cloud Services Pricing Summary
• Each cloud service provider has a unique bundle of services and pricing model. Different providers have unique price advantages for different products. Provider selection should be based on a typical application mix for our business.
• Azure may have a price advantage over Amazon when using cloud-optimized architecture based on Microsoft products.
• Softlayer, Digital Ocean, and Google Compute all have better prices than both (Amazon and Azure) for various scenarios, especially Windows VMs, but offer fewer services.
• Cost is just one of many criteria for choosing a provider! No provider has a decisive advantage for all scenarios.


Pricing Recommendations
1. Use the pricing calculator offered by each provider to estimate total application cost for specific applications. Keep in mind cloud-optimized architectures may have a much lower cost. (For example, compute functions instantiated on-demand, auto-scaling, etc.)
2. Do not make pricing the primary consideration in provider selection unless the cost difference is critical to business requirements. In general, major service and quality differences between providers are more important than pricing considerations.
3. Developing deep expertise and service integration with a cloud provider is usually more important than cost differences for individual projects.

Saving Money with AWS
• Reserved Instances
• Spot Instances
• Burstable Instances
• Scheduled Instances (using AWS or third party tool)
• This can be used with any AWS VM service: EC2, RDS, ElastiCache, etc!
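The effect of scheduling instances on and off can be estimated with simple arithmetic. The sketch below computes the fraction of instance-hours avoided compared with running around the clock. The schedule in the example is an assumed one, and the calculation ignores charges that continue while an instance is stopped, such as attached storage.

```python
HOURS_PER_WEEK = 7 * 24


def scheduled_savings(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of instance-hours avoided compared with running 24x7."""
    if not (0 <= hours_per_day <= 24 and 0 <= days_per_week <= 7):
        raise ValueError("schedule is outside a normal week")
    running_hours = hours_per_day * days_per_week
    return 1 - running_hours / HOURS_PER_WEEK


if __name__ == "__main__":
    # Assumed schedule: a dev server that runs 12 hours a day on weekdays.
    print(f"instance-hours avoided: {scheduled_savings(12, 5):.0%}")
```

A server that only needs to be up 60 of the 168 hours in a week avoids a little under two thirds of its instance-hours.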

AWS Instance type selection criteria
• Use the latest generation of instance types (x4/t2)
• Use burstable instances for applications with high daily variability
• Evaluate whether applications are CPU, memory, or IO intensive and select the appropriate type: scale up your particular bottleneck
• For applications with consistent and predictable load, prefer larger instances; for applications with unpredictable load, auto-scale horizontally with more burstable instances

Buying a reserved instance
• Unsure about your needs? Get a convertible instance! Can move up or across.
• You can sell them! (I haven’t tried this)
• Best savings/risk is usually with partial payment option.

S3 Reduced Redundancy Store & Glacier
• “Only” duplicated across 2 facilities
• 0.01% storage failure rate (“400 times the durability of typical HDD”)
• About 25% cheaper

JPEGmini
• Background service via an event handler attached to the media upload completed method
• 412 GB × $0.0314 per GB per month × 12 months ≈ $155/year saved on storage alone
• Runs as AWS Marketplace service ($39/month) or desktop app

Summary: FEE.org $ saving strategy:
• 2 reserved burstable RDS databases
• 1 reserved admin EC2 VM
• 2 Spot EC2 front-end server instances
• AutomatiCloud EC2 scheduling for off-hours (and backup automation)
• S3 Reduced Redundancy Store for non-critical backups and dev data
• CloudFlare CDN
• JPEGmini image optimization background service

Continuous Deployment Strategies

FEE Development Process
1. Post job on UpWork.com
2. Hire freelancer
3. Developer commits work to git
4. Deploy to dev environment
5. Test work
6. Create pull request for release
7. Release build
8. Staged deployment to production servers

Development Process in Detail

UpWork.com

Orientation
• Google Doc with:
  • Architectural overview
  • FEE.org development process
  • Instructions to set up localhost environment
  • Review of tools used
  • Relevant people involved & their contact info
  • Address of FEE-Dev Skype group
  • Code Quality Expectations

Development Environment Setup
1. Check out git repository
2. “Just hit F5”
• NuGet for all dependencies
• XSLT for non-local environments
• Dev DB hosted in cloud
• Optional: Install Redis on localhost for better performance

Continuous Integration
http://fee-dev.org:8080/
Log in as guest now!

Release Build

Staged, Staggered Deployment
• xcopy to each production server
• ELB takes server out of production within 30 seconds
• Stagger release by ~5 minutes to let each application pool warm up
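The staggered release described above can be sketched as a simple loop: update one server, wait for it to warm up, then move to the next. This is an illustrative Python outline, not the actual FEE.org release script. `deploy_to` and `is_healthy` are hypothetical callables supplied by the caller (for example, a file copy and a call to the health check URL), and stopping at the first unhealthy server is an added assumption rather than something stated in the talk.

```python
import time
from typing import Callable, Iterable, List


def staggered_deploy(servers: Iterable[str],
                     deploy_to: Callable[[str], None],
                     is_healthy: Callable[[str], bool],
                     warmup_seconds: float = 300.0,
                     sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Deploy to one server at a time, pausing so each can warm up.

    Stops at the first server that fails its health check after warm-up, so
    the remaining servers keep serving the previous release.
    Returns the servers that were updated successfully.
    """
    updated: List[str] = []
    for server in servers:
        deploy_to(server)        # e.g. copy the release build to the server
        sleep(warmup_seconds)    # let the application pool warm up
        if not is_healthy(server):
            break
        updated.append(server)
    return updated
```

Because only one server is updated at a time, the load balancer always has at least one warmed-up node to route traffic to during a release.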

Environment Monitoring

Collaboration & Internal Messaging
• SlackBot

Project Management

Aside: LAMP deployment strategy (highly available WordPress)
• Commit hooks on master branch in Bitbucket git repository
• Hooks call deploy.php script which runs a git pull in dev environment
• Release PHP code with git pull on production
• Image staging server (AMI), and deploy Spot fleet with AMI
• Use S3 Media storage provider, Redis cache: no persistent data on Spot instances
• Easy Engine for easy nginx configuration, etckeeper to backup/sync configuration files

The End
dveksler@fee.org

@AtlCodeCamp
https://AtlantaCodeCamp.com/2016
