
(DAT312) Using Amazon Aurora for Enterprise Workloads


Page 1: (DAT312) Using Amazon Aurora for Enterprise Workloads

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

• 6-way replication across 3 AZs
• Custom, scale-out SSD storage
• Less than 30s failovers or crash recovery
• Shared storage across replicas
• Up to 15 read replicas that act as failover targets
• Pay for the storage you use
• Automatic hotspot management
• Automatic IOPS provisioning
• 100K writes/second & 500K reads/second
• Buffer caches that survive a database restart
• SQL fault injection
• MySQL compatible
• Automatic volume growth
• Up to 64TB databases
• Proactive data block corruption detection
• Automated continuous backups to S3
• Automated repair of bad disks
• Peer-to-peer gossip replication
• Quorum writes tolerate drive or AZ failures
• 1/10th the cost of commercial databases
• Less than 10ms replica lag

October 2015

DAT312

Using Amazon Aurora for Enterprise Workloads

Debanjan Saha - GM, Amazon Aurora, Amazon Web Services

Abdul Sait - Principal Product Marketing Manager, Amazon Web Services

Robin Mathews - Sr. Director, Technology, Expedia, Inc.

Page 2: (DAT312) Using Amazon Aurora for Enterprise Workloads

Enterprise customer wish list

A database that:
• Stays up, even when components fail
• Performs consistently at enterprise scale
• Doesn’t need an army of experts to manage
• Doesn’t cost a fortune; no licenses to handle

Page 3: (DAT312) Using Amazon Aurora for Enterprise Workloads

Amazon Aurora: enterprise-class database for the cloud

We started with enterprise requirements and walked backward to reimagine relational databases for the cloud:

Enterprise-class availability, performance

Delivered as a fully managed service

No licenses; 1/10 the cost of commercial databases

Page 4: (DAT312) Using Amazon Aurora for Enterprise Workloads

Perfect fit for enterprise

Enterprise-class availability
• 6-way replication across 3 data centers
• Failover in less than 30 secs
• Near-instant crash recovery

Performance and scale
• Up to 500 K/sec read and 100 K/sec write
• 15 low-latency (10 ms) Read Replicas
• Up to 64 TB DB-optimized storage volume

Fully managed service
• Instant provisioning and deployment
• Automated patching and software upgrade
• Backup and point-in-time recovery
• Compute and storage scaling

Page 5: (DAT312) Using Amazon Aurora for Enterprise Workloads

Fastest growing service in AWS history

Amazon Aurora customer adoption

Page 6: (DAT312) Using Amazon Aurora for Enterprise Workloads

A service-oriented architecture applied to the database

• Moved the logging and storage layer into a multitenant, scale-out, database-optimized storage service
• Integrated with other AWS services like Amazon EC2, Amazon VPC, Amazon DynamoDB, Amazon SWF, and Amazon Route 53 for control plane operations
• Integrated with Amazon S3 for continuous backup with 99.999999999% durability

[Diagram: data plane (SQL, Transactions, Caching, Logging + Storage) and control plane (Amazon DynamoDB, Amazon SWF, Amazon Route 53), with Amazon S3 for backup]

Page 7: (DAT312) Using Amazon Aurora for Enterprise Workloads

Designed for high availability

Page 8: (DAT312) Using Amazon Aurora for Enterprise Workloads

Storage node availability

6-way replication across 3 Availability Zones

Quorum system for read/write; latency tolerant

Peer-to-peer gossip replication to fill in holes

Continuous scrubbing of data blocks

Continuous monitoring of nodes and disks for repair

Quorum membership changes do not stall writes

[Diagram: storage nodes in AZ 1, AZ 2, and AZ 3, with Amazon S3]

Page 9: (DAT312) Using Amazon Aurora for Enterprise Workloads

Self-healing, fault-tolerant

• Losing two copies, or an entire Availability Zone, does not affect read or write availability
• Losing three copies does not affect read availability
• Automatic detection, replication, and repair

[Diagram: two panels showing the SQL, transaction, and caching layers above storage in AZ 1, AZ 2, and AZ 3. First panel: read and write availability. Second panel: read availability.]
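
The availability claims on this slide follow from quorum arithmetic. A minimal sketch, assuming six copies (two per Availability Zone), a write quorum of 4, and a read quorum of 3; the slides state which failures are tolerated but not the quorum sizes, so those values and all names below are illustrative assumptions:

```python
# Illustrative quorum arithmetic. The quorum sizes are assumptions, not from the slides.
COPIES = 6          # 6-way replication (2 copies in each of 3 AZs)
WRITE_QUORUM = 4    # assumed: a write needs 4 of 6 acknowledgements
READ_QUORUM = 3     # assumed: a read needs 3 of 6 copies


def availability(copies_lost: int) -> dict:
    """Report whether reads and writes can still reach quorum."""
    alive = COPIES - copies_lost
    return {"read": alive >= READ_QUORUM, "write": alive >= WRITE_QUORUM}


if __name__ == "__main__":
    # Losing a whole AZ is the same as losing 2 copies in this model.
    for lost in range(5):
        print(lost, availability(lost))
```

Under these assumptions, two lost copies leave both quorums reachable, three lost copies leave only the read quorum, and four lost copies leave neither.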

Page 10: (DAT312) Using Amazon Aurora for Enterprise Workloads

Instant crash recovery

Traditional databases
• Have to replay logs since the last checkpoint
• Typically 5 minutes between checkpoints
• Single-threaded in MySQL; requires a large number of disk accesses
• Crash at T0 requires a re-application of the SQL in the redo log since last checkpoint

Amazon Aurora
• Underlying storage replays redo records on demand as part of a disk read
• Parallel, distributed, asynchronous
• No replay for startup
• Crash at T0 will result in redo logs being applied to each segment on demand, in parallel, asynchronously

[Diagram: checkpointed data and redo log, with the crash point T0 marked for each case]
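
The difference between replay at startup and replay on demand can be illustrated with a toy model. This is a hypothetical sketch, not Aurora's storage code: the `Segment` class, its methods, and the integer "delta" redo records are all invented for illustration.

```python
# Hypothetical model of on-demand redo application; not Aurora's actual storage code.
from collections import defaultdict


class Segment:
    """A storage segment holding checkpointed pages plus pending redo records."""

    def __init__(self, pages: dict):
        self.pages = dict(pages)          # page_id -> value at last checkpoint
        self.pending = defaultdict(list)  # page_id -> redo records not yet applied

    def append_redo(self, page_id: int, delta: int) -> None:
        # A write only appends a log record; the page itself is not touched.
        self.pending[page_id].append(delta)

    def read(self, page_id: int) -> int:
        # Redo records for this page are applied lazily, as part of the read.
        for delta in self.pending.pop(page_id, []):
            self.pages[page_id] = self.pages.get(page_id, 0) + delta
        return self.pages.get(page_id, 0)


if __name__ == "__main__":
    seg = Segment({1: 10})
    seg.append_redo(1, 5)
    seg.append_redo(2, 7)
    # After a "crash" nothing is replayed up front; page 1 is brought
    # up to date only when it is read, and page 2 stays pending.
    print(seg.read(1), dict(seg.pending))
```

Because each segment applies only the records for the pages actually read, many segments can do this work independently, which is the "parallel, distributed, asynchronous" property the slide describes.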

Page 11: (DAT312) Using Amazon Aurora for Enterprise Workloads

Faster, more predictable failover

[Timeline diagram: DB failure, failure detection, DNS propagation, recovery, app running]

• MySQL: 15-30 sec
• Aurora with MariaDB driver: 5-20 sec

Page 12: (DAT312) Using Amazon Aurora for Enterprise Workloads

Continuous backup

[Diagram: timeline for Segment 1, Segment 2, and Segment 3, showing segment snapshots, log records, and the recovery point]

Take periodic snapshot of each segment in parallel; stream the redo logs to Amazon S3

Backup happens continuously without performance or availability impact

At restore, retrieve the appropriate segment snapshots and log streams to storage nodes

Apply log streams to segment snapshots in parallel and asynchronously
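
The restore steps above can be sketched for a single segment. This is a minimal illustration under assumed data shapes (timestamped snapshots and key/value log records); the function name and record layout are hypothetical, not an AWS API.

```python
# Hypothetical restore sketch: pick the latest snapshot at or before the
# recovery point, then apply the log records that follow it.
from bisect import bisect_right


def restore_segment(snapshots, log_records, recovery_point):
    """snapshots: list of (time, state dict), sorted by time.
    log_records: list of (time, key, value), sorted by time.
    Returns the segment state as of recovery_point."""
    times = [t for t, _ in snapshots]
    idx = bisect_right(times, recovery_point) - 1
    if idx < 0:
        raise ValueError("no snapshot at or before the recovery point")
    snap_time, state = snapshots[idx]
    state = dict(state)  # copy so the stored snapshot is not modified
    for t, key, value in log_records:
        if snap_time < t <= recovery_point:
            state[key] = value
    return state


if __name__ == "__main__":
    snaps = [(0, {"a": 1}), (10, {"a": 2, "b": 1})]
    logs = [(5, "a", 2), (8, "b", 1), (12, "a", 3), (15, "b", 9)]
    print(restore_segment(snaps, logs, recovery_point=13))
```

Each segment is restored independently of the others, so in practice the calls could run concurrently, one per segment.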

Page 13: (DAT312) Using Amazon Aurora for Enterprise Workloads

Enterprise-class performance

Page 14: (DAT312) Using Amazon Aurora for Enterprise Workloads

SQL benchmark results

• MySQL SysBench
• R3.8XL with 32 cores and 244 GB RAM

WRITE PERFORMANCE
• 4 client machines with 1,000 threads each

READ PERFORMANCE
• Single client with 1,000 threads

Page 15: (DAT312) Using Amazon Aurora for Enterprise Workloads

Scales with table count

Up to 11x faster

Tables   Amazon Aurora   MySQL I2.8XL local SSD   MySQL I2.8XL RAM disk   RDS MySQL 30K IOPS (single AZ)
10       60,000          18,000                   22,000                  25,000
100      66,000          19,000                   24,000                  23,000
1,000    64,000          7,000                    18,000                  8,000
10,000   54,000          4,000                    8,000                   5,000

• Write-only workload
• 1,000 connections
• Query cache (default on for Amazon Aurora, off for MySQL)

Page 16: (DAT312) Using Amazon Aurora for Enterprise Workloads

Scales with DB Size

Up to 67x faster

DB size   Amazon Aurora   RDS MySQL 30K IOPS (single AZ)
1GB       107,000         8,400
10GB      107,000         2,400
100GB     101,000         1,500

Page 17: (DAT312) Using Amazon Aurora for Enterprise Workloads

Scales with user connections

Up to 8x faster

Connections   Amazon Aurora   RDS MySQL 30K IOPS (single AZ)
50            40,000          10,000
500           71,000          21,000
5,000         110,000         13,000

• OLTP workload
• Variable connection count
• 250 tables
• Query cache (default on for Amazon Aurora, off for MySQL)
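
The headline multipliers on the three benchmark slides can be checked against the tabulated numbers. A quick sketch, assuming each "up to Nx faster" figure compares Amazon Aurora with the RDS MySQL 30K IOPS column (the slides do not say which baseline is used); the variable and function names are illustrative.

```python
# Throughput pairs copied from the three benchmark tables: (Amazon Aurora, RDS MySQL 30K IOPS).
TABLE_COUNT = {10: (60_000, 25_000), 100: (66_000, 23_000),
               1_000: (64_000, 8_000), 10_000: (54_000, 5_000)}
DB_SIZE = {"1GB": (107_000, 8_400), "10GB": (107_000, 2_400),
           "100GB": (101_000, 1_500)}
CONNECTIONS = {50: (40_000, 10_000), 500: (71_000, 21_000),
               5_000: (110_000, 13_000)}


def max_speedup(results: dict) -> float:
    """Largest Aurora / RDS MySQL throughput ratio across the rows of one table."""
    return max(aurora / mysql for aurora, mysql in results.values())


if __name__ == "__main__":
    for name, table in [("table count", TABLE_COUNT),
                        ("DB size", DB_SIZE),
                        ("connections", CONNECTIONS)]:
        print(f"{name}: up to {max_speedup(table):.1f}x")
```

Under that assumption the largest ratios round to the 11x, 67x, and 8x figures shown on the slides.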

Page 18: (DAT312) Using Amazon Aurora for Enterprise Workloads

How do we achieve these results?

DATABASES ARE ALL ABOUT I/O
NETWORK-ATTACHED STORAGE IS ALL ABOUT PACKETS/SECOND
HIGH-THROUGHPUT PROCESSING DOES NOT ALLOW CONTEXT SWITCHES

DO LESS WORK
• Do fewer IOs
• Minimize network packets
• Cache prior results
• Offload the database engine

BE MORE EFFICIENT
• Process asynchronously
• Reduce latency path
• Use lock-free data structures
• Batch operations together

Page 19: (DAT312) Using Amazon Aurora for Enterprise Workloads

Delivered as a managed service

Page 20: (DAT312) Using Amazon Aurora for Enterprise Workloads

Databases are hard to manage

[Pie chart with slices of 5%, 25%, 20%, 40%, 5%, and 5%. Categories:]
• Backup and recovery, data load and unload
• Performance tuning
• Scripting and coding
• Security planning
• Installing, upgrading, patching, and migrating
• Documentation, licensing, and training

Page 21: (DAT312) Using Amazon Aurora for Enterprise Workloads

Hosting your databases on premises

You:
• Power, HVAC, net
• Rack & stack
• Server maintenance
• OS patches
• DB s/w patches
• Database backups
• Scaling
• High availability
• DB s/w installs
• OS installation
• App optimization

Page 22: (DAT312) Using Amazon Aurora for Enterprise Workloads

Hosting your databases in Amazon EC2

AWS:
• Power, HVAC, net
• Rack & stack
• Server maintenance
• OS installation

You:
• OS patches
• DB s/w patches
• Database backups
• Scaling
• High availability
• DB s/w installs
• App optimization

Page 23: (DAT312) Using Amazon Aurora for Enterprise Workloads

If you choose a managed DB service

You:
• App optimization

Managed service:
• Power, HVAC, net
• Rack & stack
• Server maintenance
• OS patches
• DB s/w patches
• Database backups
• High availability
• DB s/w installs
• OS installation
• Scaling

Page 24: (DAT312) Using Amazon Aurora for Enterprise Workloads

Advanced monitoring

coming soon...

Page 25: (DAT312) Using Amazon Aurora for Enterprise Workloads

Applications becoming more complex

[Diagram: application architectures from Web 2.0 (browser logic, AJAX, web frameworks, .NET, middleware, on-prem DB) to cloud (Amazon EC2, Amazon RDS, Amazon ElastiCache) and big data (Hadoop, Cassandra)]

Monitoring across the stack is key to minimizing downtime
• Access to information from every potential point of failure
• Alarm and notification system for pre-emptive action
• Rich visualization of aggregated data at user’s convenience

Page 26: (DAT312) Using Amazon Aurora for Enterprise Workloads

Advanced monitoring

Coming soon:
• 50+ system/OS metrics
• Sorted process list view
• 1-60 sec granularity
• Alarms on specific metrics
• Egress to CloudWatch Logs
• Integration with 3rd-party tools

Page 27: (DAT312) Using Amazon Aurora for Enterprise Workloads

Important systems and OS metrics

CPU Utilization
• User, System, Wait, IRQ, Idle

Network
• Rx per declared ethn
• Tx per declared ethn

Processes
• Num processes, Num interruptible, Num non-interruptible, Num zombie

Process List
• Process ID, Process name, VSS, Res, Mem % consumed, CPU % used, CPU time, Parent ID

Memory
• MemTotal, MemFree, Buffers, Cached, SwapCached, Active, Inactive, SwapTotal, SwapFree, Dirty, Writeback, Mapped, Slab

Device IO
• TPS, Blk_read, Blk_wrtn, read_kb, read_IOs, read_size, write_kb, write_IOs, write_size, avg_rw_size, avg_queue_len

File System
• Free capacity, Used, % Used

Page 28: (DAT312) Using Amazon Aurora for Enterprise Workloads

Integrations with 3rd party tools

Page 29: (DAT312) Using Amazon Aurora for Enterprise Workloads

Don’t be constrained by licenses, cost, or capacity

Page 30: (DAT312) Using Amazon Aurora for Enterprise Workloads

Simple pricing

No licenses

No lock-in

Pay only for what you use

Discounts

44% with a 1-year RI

63% with a 3-year RI

Instance        vCPU   Mem (GiB)   Hourly price
db.r3.large     2      15.25       $0.29
db.r3.xlarge    4      30.5        $0.58
db.r3.2xlarge   8      61          $1.16
db.r3.4xlarge   16     122         $2.32
db.r3.8xlarge   32     244         $4.64

• Storage consumed, up to 64 TB, is $0.10/GB-month

• IOs consumed are billed at $0.20 per million I/O

• Prices are for Virginia

Enterprise-grade, open-source pricing
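
The three pricing components combine into a simple monthly estimate. A rough sketch using only the prices quoted on this slide (Virginia, October 2015; current prices may differ). The function name, the 730-hour month, and the choice to apply the RI discount to the instance charge only are assumptions for illustration.

```python
# Prices copied from the slide (Virginia, October 2015); current prices may differ.
HOURLY_PRICE = {
    "db.r3.large": 0.29, "db.r3.xlarge": 0.58, "db.r3.2xlarge": 1.16,
    "db.r3.4xlarge": 2.32, "db.r3.8xlarge": 4.64,
}
STORAGE_PER_GB_MONTH = 0.10   # $ per GB-month of storage consumed
PRICE_PER_MILLION_IO = 0.20   # $ per million I/Os


def monthly_cost(instance: str, hours: float, storage_gb: float, ios: float,
                 ri_discount: float = 0.0) -> float:
    """Estimate one month's bill in dollars.

    ri_discount is a fraction (0.44 for the 1-year RI figure) and is
    assumed to apply to the instance charge only.
    """
    compute = HOURLY_PRICE[instance] * hours * (1.0 - ri_discount)
    storage = storage_gb * STORAGE_PER_GB_MONTH
    io = ios / 1_000_000 * PRICE_PER_MILLION_IO
    return round(compute + storage + io, 2)


if __name__ == "__main__":
    # Assumed workload: one db.r3.large for 730 hours, 100 GB, 50 million I/Os.
    print(monthly_cost("db.r3.large", 730, 100, 50_000_000))
    print(monthly_cost("db.r3.large", 730, 100, 50_000_000, ri_discount=0.44))
```

For the assumed workload the instance charge dominates, so the reserved-instance discount has far more effect on the total than storage or I/O volume.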

Page 31: (DAT312) Using Amazon Aurora for Enterprise Workloads

Many features are unique to Amazon Aurora

Compared with traditional commercial databases like Oracle:

• Available only in the most expensive database edition (Enterprise Edition)
• Failover and replicas: Oracle Active Data Guard, extra $$$ per core
• Backup to S3: Oracle Secure Backup Cloud Module, extra $$$ per channel
• Encryption: Oracle Advanced Security, extra $$$ per core

All-inclusive pricing

Page 32: (DAT312) Using Amazon Aurora for Enterprise Workloads

AWS Database Migration Service announced at re:Invent

Page 33: (DAT312) Using Amazon Aurora for Enterprise Workloads

AWS Database Migration Service

• Move data to the same or a different database engine
• Keep your apps running during the migration
• Start your first migration in 10 minutes or less
• Replicate within, to, or from Amazon EC2 or Amazon RDS

Page 34: (DAT312) Using Amazon Aurora for Enterprise Workloads

Keep your apps running during the migration

[Diagram: application users and the customer premises connected to AWS over the Internet or a VPN]

• Start a replication instance
• Connect to source and target databases
• Select tables, schemas, or databases
• Let the AWS Database Migration Service create tables, load data, and keep them in sync
• Switch applications over to the target at your convenience
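
The "load data, and keep them in sync" step can be pictured as a full copy followed by replay of changes captured while the application keeps writing to the source. This is a toy illustration with in-memory dictionaries; it is not the AWS Database Migration Service API, and the function names and change-record format are hypothetical.

```python
# Hypothetical illustration of full load plus ongoing change replay; not the AWS DMS API.
def full_load(source: dict) -> dict:
    """Copy every row from the source table into a new target table."""
    return dict(source)


def apply_changes(target: dict, changes: list) -> None:
    """Replay changes captured on the source while the application keeps running."""
    for op, key, value in changes:
        if op == "delete":
            target.pop(key, None)
        else:  # "insert" or "update"
            target[key] = value


if __name__ == "__main__":
    source = {1: "a", 2: "b"}
    target = full_load(source)
    # Changes made on the source after the initial copy started.
    changes = [("update", 1, "A"), ("insert", 3, "c"), ("delete", 2, None)]
    apply_changes(source, changes)
    apply_changes(target, changes)
    print(target == source)  # once the two match, applications can switch over
```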

Page 35: (DAT312) Using Amazon Aurora for Enterprise Workloads

Migrate from Oracle and SQL Server

AWS Schema Conversion Tool

• Move your tables, views, stored procedures, and data manipulation language (DML) to MySQL, MariaDB, and Amazon Aurora
• Highlight where manual edits are needed

Page 36: (DAT312) Using Amazon Aurora for Enterprise Workloads


Robin Mathews

Expedia Worldwide Engineering (EWE)

October 2015

Using Amazon Aurora for Expedia Travel Data

Page 37: (DAT312) Using Amazon Aurora for Enterprise Workloads

One of the largest travel companies in the world

Page 38: (DAT312) Using Amazon Aurora for Enterprise Workloads

Lodging Inventory Services (LIS)

[Diagram: LIS at the center, surrounded by its data domains]
• Rates
• Change History
• Restrictions
• Bookings
• Promotions
• Availability

Page 39: (DAT312) Using Amazon Aurora for Enterprise Workloads

Change history sample use cases

• Finding the needle in the haystack
• Value-add features

Page 40: (DAT312) Using Amazon Aurora for Enterprise Workloads

Challenge

Capacity
• 20,000 writes/second
• 300,000,000 rows/day
• 24 months of data

Cost
• Minimizing storage, development, and maintenance cost

Performance
• 500 ms read response time
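
The capacity figures imply a rough table size and average write rate. A back-of-the-envelope sketch, assuming 24 months is taken as 730 days and that the 20,000 writes/second figure is a peak requirement (the slide does not say whether it is peak or sustained):

```python
# Back-of-the-envelope sizing from the figures on the slide.
ROWS_PER_DAY = 300_000_000
PEAK_WRITES_PER_SEC = 20_000   # assumption: treated as a peak requirement
RETENTION_DAYS = 730           # assumption: 24 months taken as 2 x 365 days

avg_rows_per_sec = ROWS_PER_DAY / 86_400        # 86,400 seconds in a day
total_rows = ROWS_PER_DAY * RETENTION_DAYS      # rows retained at steady state
peak_to_average = PEAK_WRITES_PER_SEC / avg_rows_per_sec

print(f"average rows/sec: {avg_rows_per_sec:,.0f}")
print(f"rows retained:    {total_rows:,}")
print(f"peak/average:     {peak_to_average:.2f}x")
```

Under these assumptions the store has to hold on the order of 200 billion rows while absorbing bursts several times the average write rate.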

Page 41: (DAT312) Using Amazon Aurora for Enterprise Workloads

Existing solution

MS SQL
• Tier 1 SAN for fast I/O
• Scale out using horizontal data partitioning
• Cross-database queries by master view database

Challenges
• Capacity increase not elastic or automated
• Cost increasing due to licensing, storage, hardware, and maintenance

Page 42: (DAT312) Using Amazon Aurora for Enterprise Workloads

NoSQL Solution

Cassandra + Solr
• Set up SQL-like schema and tables
• Solr indexes to support queries beyond key-value lookup

Challenge
• Solr indexes require large memory footprint and hundreds of nodes, adding cost

Page 43: (DAT312) Using Amazon Aurora for Enterprise Workloads

Amazon RDS MySQL

Beginning
• Spring Boot application
• JPA
• No provisioned write IOPS
• No table partitioning
• Single insert
• Primary key plus secondary indexes
• 400 inserts/second

Tuning
• Used JDBC
• Removed unnecessary secondary indexes
• Changed to insert in batches
• Provisioned write IOPS to max of 20,000
• Optimized size of JDBC connection pool

Results and challenges
• Write performance bottlenecked at 5,000 inserts/sec, after 300 million table rows
• Capacity limitation of 6 TB

Page 44: (DAT312) Using Amazon Aurora for Enterprise Workloads

Amazon Aurora

Tuning
• Partitioned database table
• Ordered composite primary key based on query
• Co-locate web application and database in same region to reduce latency
• Batch write with batchRewrite flag

Initial performance
• 25,000 average inserts/sec with peak up to 70,000 inserts/sec
• 30 ms average response time for write and 17 ms for read, with 1 month of data
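
The move from single-row inserts to batched writes, together with a composite primary key ordered to match the query, can be sketched as follows. The example uses Python's built-in sqlite3 module purely as a stand-in so it runs anywhere; Expedia's implementation used JDBC, and the table name, columns, and batch size here are hypothetical.

```python
# Sketch of single-row vs batched inserts. sqlite3 is a stand-in (assumption);
# the schema and batch size are hypothetical.
import sqlite3


def make_table(conn):
    # Composite primary key ordered by the columns that queries filter on.
    conn.execute("CREATE TABLE change_history ("
                 "hotel_id INTEGER, changed_at INTEGER, payload TEXT, "
                 "PRIMARY KEY (hotel_id, changed_at))")


def insert_single(conn, rows):
    """One statement and one commit per row: the slow starting point."""
    for row in rows:
        conn.execute("INSERT INTO change_history VALUES (?, ?, ?)", row)
        conn.commit()


def insert_batched(conn, rows, batch_size=1000):
    """Send rows in batches and commit once per batch."""
    for start in range(0, len(rows), batch_size):
        conn.executemany("INSERT INTO change_history VALUES (?, ?, ?)",
                         rows[start:start + batch_size])
        conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    make_table(conn)
    insert_batched(conn, [(i % 10, i, "x") for i in range(2500)])
    print(conn.execute("SELECT COUNT(*) FROM change_history").fetchone()[0])
```

Batching cuts the number of round trips and commits per row, which is the same effect the batch-rewrite setting mentioned on the slide is meant to achieve on the JDBC side.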

Page 45: (DAT312) Using Amazon Aurora for Enterprise Workloads

Amazon Aurora

Summary
• Promising performance with initial test results
• Provisioned capacity up to 64 TB meeting storage needs

Next steps
• Migrate change history data from MS SQL to Aurora
• Monitor performance for write and read

Page 46: (DAT312) Using Amazon Aurora for Enterprise Workloads

http://bit.ly/awsevalsDAT312

Page 47: (DAT312) Using Amazon Aurora for Enterprise Workloads

Remember to complete your evaluations!

Page 48: (DAT312) Using Amazon Aurora for Enterprise Workloads

Thank you!

https://aws.amazon.com/rds/aurora