The Columbus Dispatch on Amazon


The Presenters

David Landreman, Web Services IT Manager

Email: [email protected]

Twitter: @GraphIt2000

LinkedIn: www.linkedin.com/in/davidlandreman

Andrew Roth, Senior Internet Development Engineer

Email: [email protected]

LinkedIn: www.linkedin.com/in/rothandrew

• Newspapers, weekly periodicals, TV stations, and radio stations

• 23 unique websites hosted on Amazon in a unified content management system (OpenCMS)

• Millions of pageviews daily

Cloud Migration

• Project to upgrade content management system to new version

• Original plan was to migrate hardware from co-lo data center physical to VMWare virtual at the same data center

• 2 months prior to completion the decision was made to migrate to Amazon Web Services (AWS)

Amazon Selection Factors

• Team familiarity with AWS

• Ease of beginning an engagement
o No contract

• Cost
o Limited payment options

• Large client base (Netflix, Instagram, …)

• Large selection of services beyond virtual computing resources

Cloud Computing Paradigms

• Infrastructure as a Service (IaaS)
o Virtualized computing hardware

• Platform as a Service (PaaS)
o Prepackaged / managed runtime application platform. Reduced complexity compared to IaaS.

• Software as a Service (SaaS)
o Full-service software solution running in the cloud.

Amazon has offerings in all of these areas.

The Cloud Model

Amazon Offerings

• Infrastructure as a Service
o EC2, Elastic Load Balancers

• Platform as a Service
o Elastic Beanstalk, Elastic MapReduce, CloudFormation, Relational Database Service, SimpleDB, DynamoDB

• Software as a Service
o Flexible Payments Service, Mechanical Turk

Scalability on Amazon

• RDS and EC2 allow easy scaling up / down of server sizes

• Elastic Beanstalk compatible applications can make use of autoscaling of EC2 instances (EC2 as a PaaS)

• Spot Instances for non-time-sensitive tasks

Reliability on Amazon

• Regions
o Each Region consists of one or more Availability Zones. Regions are geographically dispersed, in separate geographic areas or countries. Currently there are 8 Regions.

• Availability Zones
o Distinct locations engineered to be insulated from failures in other Availability Zones, with inexpensive, low-latency network connectivity to other Availability Zones in the same Region.

• Protect applications against failures in a single location by launching instances in different Regions and Availability Zones.
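
The advice to spread instances across Availability Zones can be sketched as a simple round-robin placement. This is a minimal illustration, not the presenters' tooling: the `assign_zones` helper, the zone names, and the instance names are all hypothetical.

```python
from collections import Counter
from itertools import cycle


def assign_zones(instance_names, zones):
    """Round-robin each instance onto the next zone in the list."""
    if not zones:
        raise ValueError("at least one availability zone is required")
    zone_cycle = cycle(zones)
    return {name: next(zone_cycle) for name in instance_names}


# Hypothetical zone and instance names, for illustration only.
zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
placement = assign_zones(["web-1", "web-2", "web-3", "web-4"], zones)

# How many instances would still be running if one whole zone went offline?
per_zone = Counter(placement.values())
survivors_if_zone_lost = {z: len(placement) - per_zone[z] for z in zones}

print(placement)
print(survivors_if_zone_lost)
```

With four instances over three zones, losing any single zone still leaves at least two instances serving traffic.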

Security on Amazon

• Can it be secure?
o Security Groups
o IAM - User / Role Management

• Can it be PCI compliant?
o PCI DSS Level 1 for most services (RDS, S3, …)

• Can it be HIPAA compliant?
o Security Groups
o In-flight / at-rest encryption
o Case Study: MedCommons patient records

• Government Compliance
o GovCloud - Requires pre-approval from AWS to start infrastructure in this cloud.
o United States only

• Virtual Private Cloud (VPC) vs Classic Cloud

How Dispatch Uses Amazon

• All 23 public facing sites are hosted entirely in Amazon

• Limited communication back to internal infrastructure via Web Services

• Multiple environments (Prod, QA, Development)
o Ease of keeping QA as a mirror of production

CloudWatch

• Default metrics for all EC2 instances
o e.g. CPU, memory, disk IO
o 1 minute measurement interval

• Tracking custom metrics
o OpenCMS Publish Times, Java Heap Usage, Database Connections

• Alert Thresholds
o Text message, email, JSON posts
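
Publishing custom metrics like the ones listed above might look roughly like the sketch below. It assumes the boto3 CloudWatch client and its `put_metric_data` call. The metric names, namespace, host dimension, and values are hypothetical and are not taken from the Dispatch setup.

```python
def build_metric_datum(name, value, unit, host):
    """Build one entry in the shape boto3's put_metric_data expects."""
    return {
        "MetricName": name,
        "Value": float(value),
        "Unit": unit,
        "Dimensions": [{"Name": "Host", "Value": host}],
    }


def publish_metrics(cloudwatch, namespace, data):
    """Send custom metrics; `cloudwatch` is a boto3 CloudWatch client."""
    cloudwatch.put_metric_data(Namespace=namespace, MetricData=data)


# Hypothetical metric names and values, for illustration only.
data = [
    build_metric_datum("PublishTimeSeconds", 12.5, "Seconds", "cms-1"),
    build_metric_datum("JavaHeapUsedBytes", 512 * 1024 * 1024, "Bytes", "cms-1"),
    build_metric_datum("DatabaseConnections", 42, "Count", "cms-1"),
]

# With AWS credentials configured, publishing would look like this:
#   import boto3
#   publish_metrics(boto3.client("cloudwatch"), "Dispatch/OpenCMS", data)
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub before pointing it at a real account.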

SimpleDB - NoSQL

• Used for tracking user content access for metered site access
o 60 million+ records and growing

• Managed NoSQL Solution
o NoSQL = Non-relational data store

• Extremely simple and flexible data model (e.g. key-value store)

Example key: ClcGrU8Eapig2y5eD7r2Ag==201202www.dispatch.com/content/index.html

o Unique User ID: ClcGrU8Eapig2y5eD7r2Ag==
o Year/Month of Access: 201202
o Site Content Accessed: www.dispatch.com/content/index.html
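
The key layout on this slide (user ID, then year/month, then the content URL) can be sketched as follows. This is a guess at how such a key could support metering, not the presenters' code: a Python set stands in for the SimpleDB domain, and the helper names, the second URL, and the monthly limit are hypothetical.

```python
from datetime import date


def make_item_name(user_id, accessed_on, url):
    """Concatenate user ID, year/month, and content URL into one key."""
    return f"{user_id}{accessed_on:%Y%m}{url}"


def views_this_month(item_names, user_id, accessed_on):
    """Count distinct content items a user opened in the given month."""
    prefix = f"{user_id}{accessed_on:%Y%m}"
    return sum(1 for name in item_names if name.startswith(prefix))


# A set stands in for the SimpleDB domain: repeat visits to the same
# page in the same month collapse into a single item.
store = set()
user = "ClcGrU8Eapig2y5eD7r2Ag=="
feb = date(2012, 2, 14)
store.add(make_item_name(user, feb, "www.dispatch.com/content/index.html"))
store.add(make_item_name(user, feb, "www.dispatch.com/content/index.html"))
store.add(make_item_name(user, feb, "www.dispatch.com/content/sports.html"))

FREE_VIEWS_PER_MONTH = 10  # hypothetical meter limit
under_limit = views_this_month(store, user, feb) < FREE_VIEWS_PER_MONTH
print(views_this_month(store, user, feb), under_limit)
```

Because the month is part of the key, a new month starts the count from zero without any cleanup job.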

SimpleDB – NoSQL (cont.)

• High availability and scalability
o Amazon manages multiple geographically distributed replicas of your data

• SimpleDB is eventually consistent

• Weaknesses
o Complex pricing structure / hard to estimate
o No good mechanism for backups

• Decision to use SimpleDB came before launch of DynamoDB

Relational Database Service (RDS)

• Hosts all web application data
o Exception is user access data in SimpleDB

• Managed Relational Database
o MySQL, Oracle, MSSQL
o We use MySQL

• Change to traditional database administration
o No access to server console
o No access to a true SA account

Relational Database Service (cont.)

• Multi Availability Zone deployment
o Failover is not instantaneous (1 - 2 minutes)

• Easy server upgrades with maintenance windows

• Support for read replica databases

• Automatically generated restore snapshots
o Covering up to the past 10 days
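
Since a Multi-AZ failover leaves the database unreachable for a minute or two, application code usually has to retry rather than fail on the first error. The sketch below shows one common approach, exponential backoff. It is an assumption about how a client might cope, not something described in the slides, and `FlakyDatabase` is a hypothetical stand-in for a real connection.

```python
import time


def call_with_retry(operation, attempts=8, base_delay=1.0, sleep=time.sleep):
    """Retry `operation` with exponential backoff on ConnectionError."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure
            sleep(base_delay * 2 ** attempt)


class FlakyDatabase:
    """Stand-in for a database that is unreachable during a failover."""

    def __init__(self, failures):
        self.failures = failures

    def query(self):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("database unavailable during failover")
        return "ok"


db = FlakyDatabase(failures=3)
delays = []  # record the waits instead of actually sleeping
result = call_with_retry(db.query, sleep=delays.append)
print(result, delays)
```

With the defaults, the waits between the eight attempts add up to 127 seconds, which roughly covers the 1 - 2 minute failover window.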

Simple Queue Service (SQS)

• Message passing service to facilitate indirect communication
o Similar to Java Message Service (JMS)

• Why use SQS?
o We run in Amazon's classic cloud configuration
o IP addresses randomly assigned
o Ideal architecture on AWS reduces direct communication between boxes

• Used for application servers communicating with video transcoding server
o Starting new transcoding jobs
o Getting transcoding progress updates

• Scalability
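
The exchange between application servers and the transcoding server could be sketched as JSON messages passed through SQS. The message fields and helper names below are hypothetical. Only `send_message`, `receive_message`, and `delete_message` are real boto3 SQS client calls, and the client is passed in so the sketch does not need an AWS account to read or test.

```python
import json


def encode_job(video_id, source_key):
    """Serialize a 'start transcoding' request as a message body."""
    return json.dumps({"type": "start", "video_id": video_id, "source": source_key})


def encode_progress(video_id, percent):
    """Serialize a progress update sent back by the transcoding server."""
    return json.dumps({"type": "progress", "video_id": video_id, "percent": percent})


def send(sqs, queue_url, body):
    """Put one message on the queue; `sqs` is a boto3 SQS client."""
    sqs.send_message(QueueUrl=queue_url, MessageBody=body)


def drain(sqs, queue_url):
    """Receive one batch, decode each body, then delete handled messages."""
    response = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10)
    decoded = []
    for message in response.get("Messages", []):
        decoded.append(json.loads(message["Body"]))
        # Deleting acknowledges the message so it is not redelivered.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
    return decoded
```

Neither side needs to know the other's IP address, only the queue URL, which is what makes this fit a setup where addresses are assigned randomly.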

ElastiCache

• Managed Memcached Service

• Web service that makes it easy to deploy, operate, and scale an in-memory cache in the cloud.

• Integration of ElastiCache metrics with CloudWatch
o Hit rates, eviction rates, etc.

• Reduces hits to our database servers, speeds up page loads, and increases our ability to handle high traffic volumes
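
The way a cache reduces database hits can be illustrated with the cache-aside pattern. In this minimal sketch a plain dict stands in for the Memcached cluster, and `CacheAside` and `load_article` are hypothetical names. The slides do not say how the Dispatch applications actually talk to ElastiCache.

```python
class CacheAside:
    """Check the cache first and fall back to the database on a miss."""

    def __init__(self, cache, load_from_db):
        self.cache = cache                # a dict stands in for Memcached
        self.load_from_db = load_from_db  # called only on a cache miss
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.load_from_db(key)
        self.cache[key] = value  # populate so the next read is a hit
        return value


db_calls = []


def load_article(key):
    """Hypothetical database read; records each call for the demo."""
    db_calls.append(key)
    return f"<html>article {key}</html>"


articles = CacheAside(cache={}, load_from_db=load_article)
for key in ["a", "b", "a", "a"]:
    articles.get(key)

print(articles.hits, articles.misses, len(db_calls))
```

Four page requests cost only two database reads here. The hit and miss counters play the role of the hit-rate metrics that ElastiCache reports to CloudWatch.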

Simple Email Service (SES)

• Managed service for bulk email sending

• CloudWatch Integration
o Monitor deliveries, bounces, etc. from within the AWS console

• Multiple endpoints
o SMTP
o Web Service Calls

Simple Storage Service (S3)

• First Amazon Web Service Offering

• File storage in the cloud

• Can serve static websites directly from S3

• We use it for various processes:
o Backing up a SQL dump of our production database
o Temporary storage of video content during the transcoding process
o Data storage for workflows that import content from other areas of the business into their websites
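
The database backup use case might be sketched as below, assuming the boto3 S3 client and its `upload_file(Filename, Bucket, Key)` call. The key layout, bucket name, database name, and file path are hypothetical. The slides do not describe how the dumps are named or uploaded.

```python
from datetime import date


def backup_key(database, taken_on, prefix="backups"):
    """Build a date-partitioned object key for one SQL dump."""
    return f"{prefix}/{database}/{taken_on:%Y/%m/%d}/{database}.sql.gz"


def upload_backup(s3, bucket, local_path, key):
    """Upload the dump file; `s3` is a boto3 S3 client."""
    s3.upload_file(local_path, bucket, key)


key = backup_key("opencms", date(2012, 7, 1))
print(key)

# With AWS credentials configured, the upload would look like this:
#   import boto3
#   upload_backup(boto3.client("s3"), "example-backup-bucket",
#                 "/tmp/opencms.sql.gz", key)
```

Putting the date in the key keeps each night's dump as a separate object, so older backups are not overwritten.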

June 29th Weekend

• Wind Storm
o US East Region (1 of 4 zones offline)
o Both primary and backup power lost
o Network connectivity issues between availability zones
o RDS fail-over to a bad zone

• Leap Second Bug

Wind Storm - Lessons Learned

• Single point of failure on ElastiCache
o All nodes in a cluster occupy a single AWS Zone

• Database failover
o Up to a minute of no connectivity

• Limited communication from Amazon
o No time estimates for service restoration

Leap Second – Lessons Learned

• Not an Amazon specific issue

• Amazon support options
o Open a ticket
o Phone Calls
o Online Chat
o Forums (Only free option)

• Amazon able to diagnose issue quickly

23:59:60 UTC, Saturday, June 30, 2012
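
The timestamp 23:59:60 is awkward for software in general, which is one way to see why the bug was not specific to Amazon. As a small illustration, separate from the operating-system bug behind the outage itself, Python's standard `datetime` type cannot represent the leap second at all:

```python
from datetime import datetime

# The leap second is a valid UTC time, but datetime only accepts
# seconds in the range 0..59 and rejects it.
try:
    datetime(2012, 6, 30, 23, 59, 60)
    leap_second_representable = True
except ValueError:
    leap_second_representable = False

# Arithmetic across the boundary ignores the extra second as well.
before = datetime(2012, 6, 30, 23, 59, 59)
after = datetime(2012, 7, 1, 0, 0, 0)
elapsed = (after - before).total_seconds()

print(leap_second_representable)
print(elapsed)
```

The computed difference is one second, even though two seconds passed on the UTC clock that night.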

What is Next?

• Moving to CloudSearch

• Reduce Total Cost of Ownership

• Moving from Limelight to CloudFront CDN
o Significant cost savings

Questions?

David Landreman

Web Services IT Manager

Email: [email protected]

Twitter: @GraphIt2000

LinkedIn: www.linkedin.com/in/davidlandreman

Andrew Roth, Senior Internet Development Engineer

Email: [email protected]

LinkedIn: www.linkedin.com/in/rothandrew