© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon Aurora Deep Dive
• 6-way replication across 3 AZs
• Custom, scale-out SSD storage
• Less than 30s failovers or crash recovery
• Shared storage across replicas
• Up to 15 read replicas that act as failover targets
• Pay for the storage you use
• Automatic hotspot management
• Automatic IOPS provisioning
• 100K writes/second & 500K reads/second
• Buffer caches that survive a database restart
• SQL fault injection
• MySQL compatible
• Automatic volume growth
• Up to 64TB databases
• Proactive data block corruption detection
• Automated continuous backups to S3
• Automated repair of bad disks
• Peer-to-peer gossip replication
• Quorum writes tolerate drive or AZ failures
• 1/10th the cost of commercial databases
• Less than 10ms replica lag
Anurag Gupta – VP, Big Data
Amazon Web Services
April, 2016
What is Amazon Aurora?
• MySQL-compatible relational database
• Performance and availability of commercial databases
• Simplicity and cost-effectiveness of open source databases
• Delivered as a managed service

Re-imagined for the cloud
• Architected for the cloud – e.g., moved the logging and storage layer into a multi-tenant, scale-out, database-optimized storage service
• Leverages existing AWS services: Amazon EC2, Amazon VPC, Amazon DynamoDB, Amazon SWF, and Amazon S3
• Maintains compatibility with MySQL – customers can migrate their MySQL applications as-is and use all MySQL tools
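Because Aurora speaks the MySQL wire protocol, a stock MySQL client library works unchanged. A minimal sketch using PyMySQL; the endpoint, credentials, and database name are placeholders, not values from this deck:

```python
# Minimal sketch: connect to an Aurora cluster with an ordinary MySQL client.
# Endpoint, credentials, and schema are placeholders.
import pymysql

conn = pymysql.connect(
    host="my-aurora-cluster.cluster-xxxxxxxx.us-east-1.rds.amazonaws.com",
    user="admin",
    password="example-password",
    database="appdb",
)
try:
    with conn.cursor() as cur:
        cur.execute("SELECT VERSION()")  # the same SQL you would send to MySQL
        print(cur.fetchone())
finally:
    conn.close()
```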
[Architecture diagram: Control plane – Amazon DynamoDB, Amazon SWF, Amazon Route 53; Data plane – SQL, Transactions, Caching, Logging + Storage, Amazon S3]
You’ve probably heard about our benchmark numbers…
Reproducing benchmark results
https://d0.awsstatic.com/product-marketing/Aurora/RDS_Aurora_Performance_Assessment_Benchmarking_v1-2.pdf
[Diagram: four R3.8XLARGE EC2 client instances running SysBench against one R3.8XLARGE Amazon Aurora instance]
• Create an Amazon VPC (or use an existing one).
• Create four EC2 R3.8XL client instances to run the SysBench client. All four should be in the same AZ.
• Enable enhanced networking on your clients.
• Tune your Linux settings (see whitepaper).
• Install SysBench version 0.5.
• Launch an r3.8xlarge Amazon Aurora DB instance in the same VPC and AZ as your clients.
• Start your benchmark!
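The Aurora launch step can also be scripted. A hedged boto3 sketch of creating the cluster and the r3.8xlarge instance; all identifiers, the subnet group, and the AZ are placeholders:

```python
# Sketch: create an Aurora cluster and a db.r3.8xlarge instance in the same
# VPC/AZ as the SysBench clients. Identifiers, subnet group, and AZ are placeholders.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.create_db_cluster(
    DBClusterIdentifier="sysbench-aurora",
    Engine="aurora",                        # MySQL-compatible Aurora
    MasterUsername="admin",
    MasterUserPassword="example-password",
    DBSubnetGroupName="benchmark-subnets",  # subnet group inside the benchmark VPC
    AvailabilityZones=["us-east-1a"],       # same AZ as the client instances
)

rds.create_db_instance(
    DBInstanceIdentifier="sysbench-aurora-1",
    DBInstanceClass="db.r3.8xlarge",
    Engine="aurora",
    DBClusterIdentifier="sysbench-aurora",
    AvailabilityZone="us-east-1a",
)
```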
MySQL SysBench results
R3.8XL: 32 cores / 244 GB RAM
5X faster than RDS MySQL 5.6 & 5.7 – five times higher throughput than stock MySQL based on industry-standard benchmarks.
[Charts: WRITE PERFORMANCE and READ PERFORMANCE – Aurora vs. MySQL 5.6 vs. MySQL 5.7]
Scaling with instance sizes
Aurora scales with instance size for both read and write.
[Charts: WRITE PERFORMANCE and READ PERFORMANCE by instance size – Aurora vs. MySQL 5.6 vs. MySQL 5.7]
Beyond benchmarks
If only real world applications saw benchmark performance
POSSIBLE DISTORTIONS
Real world requests contend with each other
Real world metadata rarely fits in data dictionary cache
Real world data rarely fits in buffer cache
Real world production databases need to run with HA enabled
Scaling User Connections
SysBench OLTP workload, 250 tables

Connections | Amazon Aurora | RDS MySQL w/ 30K IOPS
50          | 40,000        | 10,000
500         | 71,000        | 21,000
5,000       | 110,000       | 13,000

UP TO 8x FASTER
Scaling Table Count
SysBench write-only workload, measuring writes per second, 1,000 connections

Tables | Amazon Aurora | MySQL I2.8XL local SSD | MySQL I2.8XL RAM disk | RDS MySQL w/ 30K IOPS (single AZ)
10     | 60,000        | 18,000                 | 22,000                | 25,000
100    | 66,000        | 19,000                 | 24,000                | 23,000
1,000  | 64,000        | 7,000                  | 18,000                | 8,000
10,000 | 54,000        | 4,000                  | 8,000                 | 5,000

UP TO 11x FASTER
Scaling Data Size

SYSBENCH WRITE-ONLY
DB Size | Amazon Aurora | RDS MySQL w/ 30K IOPS
1 GB    | 107,000       | 8,400
10 GB   | 107,000       | 2,400
100 GB  | 101,000       | 1,500
1 TB    | 26,000        | 1,200
UP TO 21x FASTER

CLOUDHARMONY TPC-C
DB Size | Amazon Aurora | RDS MySQL w/ 30K IOPS
80 GB   | 12,582        | 585
800 GB  | 9,406         | 69
UP TO 136x FASTER
Real-life data – gaming workload
Aurora vs. RDS MySQL – r3.4xlarge, Multi-AZ
Aurora 3X faster on r3.4xlarge
How did we achieve this?
DATABASES ARE ALL ABOUT I/O
NETWORK-ATTACHED STORAGE IS ALL ABOUT PACKETS/SECOND
HIGH-THROUGHPUT PROCESSING DOES NOT ALLOW CONTEXT SWITCHES

DO LESS WORK
• Do fewer I/Os
• Minimize network packets
• Cache prior results
• Offload the database engine

BE MORE EFFICIENT
• Process asynchronously
• Reduce latency path
• Use lock-free data structures
• Batch operations together
IO traffic in MySQL
MYSQL WITH REPLICA
TYPE OF WRITE: BINLOG, DATA, DOUBLE-WRITE, LOG, FRM FILES
[Diagram: primary instance in AZ 1 and replica instance in AZ 2, each writing to Amazon Elastic Block Store (EBS) volumes with EBS mirrors, plus backups to Amazon S3]

IO FLOW
• Issue write to EBS – EBS issues to mirror, ack when both done
• Stage write to standby instance through DRBD
• Issue write to EBS on standby instance

OBSERVATIONS
• Steps 1, 3, 5 are sequential and synchronous
• This amplifies both latency and jitter
• Many types of writes for each user operation
• Have to write data blocks twice to avoid torn writes

PERFORMANCE
• 780K transactions
• 7,388K I/Os per million txns (excludes mirroring, standby)
• Average 7.4 I/Os per transaction

30-minute SysBench write-only workload, 100 GB dataset, RDS Multi-AZ, 30K PIOPS
IO traffic in Aurora
AMAZON AURORA
TYPE OF WRITE: BINLOG, DATA, DOUBLE-WRITE, LOG, FRM FILES
[Diagram: primary instance and replica instances across AZ 1, AZ 2, and AZ 3 performing ASYNC 4/6 QUORUM DISTRIBUTED WRITES to shared storage, with backups to Amazon S3]

IO FLOW
• Only write redo log records; all steps asynchronous
• No data block writes (checkpoint, cache replacement)
• Boxcar redo log records – fully ordered by LSN
• Shuffle to appropriate segments – partially ordered
• Boxcar to storage nodes and issue writes

OBSERVATIONS
• 6X more log writes, but 9X less network traffic
• Tolerant of network and storage outlier latency

PERFORMANCE
• 27,378K transactions (35X MORE)
• 950K I/Os per 1M txns, 6X amplification (7.7X LESS)
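A toy sketch of the 4/6 quorum write pattern described above: a boxcar of redo records is sent to all six segment copies asynchronously, and the commit is acknowledged as soon as four copies have persisted. The storage-node call is a stand-in invented for illustration, not Aurora's real interface:

```python
# Toy 4/6 quorum write: send log records to all six copies asynchronously,
# acknowledge the commit once four acks arrive. `send_to_storage_node` is a
# placeholder for whatever RPC a real storage client would use.
from concurrent.futures import ThreadPoolExecutor, as_completed

WRITE_QUORUM = 4
COPIES = 6
pool = ThreadPoolExecutor(max_workers=COPIES)

def send_to_storage_node(node_id, log_records):
    ...  # ship the boxcar of redo records to one storage node
    return True  # True = durably persisted on that node

def quorum_write(log_records):
    futures = [pool.submit(send_to_storage_node, n, log_records) for n in range(COPIES)]
    acks = 0
    for fut in as_completed(futures):
        if fut.result():
            acks += 1
        if acks >= WRITE_QUORUM:
            return True   # commit acknowledged; remaining sends complete in the background
    return False          # fewer than 4 of 6 copies persisted: cannot acknowledge the commit
```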
IO traffic in Aurora (Storage Node)
[Diagram: storage node receiving log records from the primary instance into an incoming queue; update queue, hot log, and data blocks on the node; sort/group, coalesce, GC, and scrub operations; point-in-time snapshots and backups staged to S3; peer-to-peer gossip with peer storage nodes]

IO FLOW
① Receive record and add to in-memory queue
② Persist record and ACK
③ Organize records and identify gaps in log
④ Gossip with peers to fill in holes
⑤ Coalesce log records into new data block versions
⑥ Periodically stage log and new block versions to S3
⑦ Periodically garbage collect old versions
⑧ Periodically validate CRC codes on blocks

OBSERVATIONS
• All steps are asynchronous
• Only steps 1 and 2 are in foreground latency path
• Input queue is 46X less than MySQL (unamplified, per node)
• Favor latency-sensitive operations
• Use disk space to buffer against spikes in activity
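A condensed sketch of that flow: only receive and persist (steps ①-②) sit on the foreground latency path; everything else runs as background work. All data structures and helpers below are illustrative stubs, not Aurora's implementation:

```python
# Condensed sketch of the storage-node I/O flow. Steps 1-2 are foreground;
# gossip, coalesce, S3 staging, GC, and scrubbing are background. Stubs only.
import queue

class StorageNodeSketch:
    def __init__(self):
        self.incoming = queue.Queue()   # in-memory update queue
        self.hot_log = []               # persisted log records, ordered by LSN

    # Steps 1-2: foreground latency path
    def receive(self, record):
        self.incoming.put(record)       # 1. add record to in-memory queue
        self.persist(record)            # 2. append record to the hot log on disk
        return "ACK"                    #    acknowledge back to the database instance

    # Steps 3-8: asynchronous background work
    def background_pass(self):
        gaps = self.sort_and_find_gaps()        # 3. organize records, find LSN holes
        self.gossip_with_peers(gaps)            # 4. fill holes from peer storage nodes
        self.coalesce_into_block_versions()     # 5. materialize new data block versions
        self.stage_to_s3()                      # 6. periodically ship log + blocks to S3
        self.garbage_collect_old_versions()     # 7. drop versions no longer needed
        self.validate_block_crcs()              # 8. scrub blocks against their CRCs

    # Stubs standing in for the real work
    def persist(self, record): self.hot_log.append(record)
    def sort_and_find_gaps(self): return []
    def gossip_with_peers(self, gaps): pass
    def coalesce_into_block_versions(self): pass
    def stage_to_s3(self): pass
    def garbage_collect_old_versions(self): pass
    def validate_block_crcs(self): pass
```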
Scaling updates with replicas
SysBench write-only workload, 250 tables
Replica lag at a given update rate:

Updates per second | Amazon Aurora | RDS MySQL 30K IOPS (single AZ)
1,000              | 2.62 ms       | 0 s
2,000              | 3.42 ms       | 1 s
5,000              | 3.94 ms       | 60 s
10,000             | 5.38 ms       | 300 s

UP TO 500x LOWER LAG
Real-life data – read replica latency
“In RDS MySQL, we saw replica lag spike to almost 12 minutes which is almost absurd from an application’s perspective. The maximum read replica lag across 4 replicas never exceeded beyond 20 ms.”
IO traffic in Aurora Replicas
[Diagram: MySQL master (30% read / 70% write) feeding a MySQL replica (30% new reads / 70% write) through single-threaded binlog apply, each with its own data volume, vs. Aurora master (30% read / 70% write) and Aurora replica (100% new reads) sharing multi-AZ storage, with page cache updates on the replica]

MYSQL READ SCALING
• Logical: ship SQL statements to replica
• Write workload similar on both instances
• Independent storage
• Can result in data drift between master and replica

AMAZON AURORA READ SCALING
• Physical: ship redo from master to replica
• Replica shares storage; no writes performed
• Cached pages have redo applied
• Advance read view when all commits seen
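A sketch of the replica side of that physical replication model: redo shipped from the master is applied only to pages already in the replica's cache (uncached pages will simply be read from shared storage later), and the read view advances as commits are observed. Names and structures are illustrative:

```python
# Sketch of Aurora-style replica apply: redo is applied only to cached pages;
# the read view LSN advances when commits are seen. Structures are illustrative.

page_cache = {}        # page_id -> page object currently cached on the replica
read_view_lsn = 0      # highest LSN whose commits are visible to readers

def apply_redo_stream(redo_records):
    global read_view_lsn
    for rec in redo_records:                  # records arrive ordered by LSN
        page = page_cache.get(rec["page_id"])
        if page is not None:
            page.apply(rec)                   # keep already-cached pages current
        # pages not in cache are skipped: shared storage already holds the redo
        if rec.get("is_commit"):
            read_view_lsn = rec["lsn"]        # advance the read view on commit
```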
“Performance only matters if your database is up”
Storage durability
• Storage volume automatically grows up to 64 TB
• Quorum system for read/write; latency tolerant
• Peer-to-peer gossip replication to fill in holes
• Continuous backup to S3 (built for 11 9s durability)
• Continuous monitoring of nodes and disks for repair
• 10 GB segments as unit of repair or hotspot rebalance
• Quorum membership changes do not stall writes

[Diagram: six copies of each segment spread across three Availability Zones, with continuous backup to Amazon S3]
• Six copies across three Availability Zones
• 4 out of 6 write quorum; 3 out of 6 read quorum
• Peer-to-peer replication for repairs
• Volume striped across hundreds of storage nodes
Fault-tolerant storage
[Diagrams: an Aurora instance (SQL, Transactions, Caching) over storage replicated across AZ 1, AZ 2, and AZ 3 – one panel labeled “Read and write availability”, one labeled “Read availability”]
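A tiny check of the quorum arithmetic behind those two panels, assuming two copies per Availability Zone as described above:

```python
# Quorum arithmetic for six copies spread two per AZ: losing one AZ leaves
# 4 healthy copies (write quorum still met); losing an AZ plus one more copy
# leaves 3 (reads still possible, writes not).
WRITE_QUORUM, READ_QUORUM, TOTAL_COPIES = 4, 3, 6

def availability(copies_lost):
    healthy = TOTAL_COPIES - copies_lost
    return {
        "writes_available": healthy >= WRITE_QUORUM,
        "reads_available": healthy >= READ_QUORUM,
    }

print(availability(2))  # one AZ down          -> reads and writes available
print(availability(3))  # AZ down + one disk   -> reads available, writes not
```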
Continuous backup
[Diagram: per-segment timeline showing segment snapshots and log records for Segment 1, Segment 2, and Segment 3, with a recovery point along the time axis]
• Take periodic snapshot of each segment in parallel; stream the redo logs to Amazon S3
• Backup happens continuously without performance or availability impact
• At restore, retrieve the appropriate segment snapshots and log streams to storage nodes
• Apply log streams to segment snapshots in parallel and asynchronously
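A sketch of the restore path in the last two bullets: for each segment, pull its latest snapshot and its log stream from S3, then roll the segment forward, with all segments restored in parallel. The S3 layout and helpers below are placeholders:

```python
# Sketch of per-segment restore: fetch snapshot + log stream, replay logs,
# do every segment in parallel. Helpers and layout are placeholders.
from concurrent.futures import ThreadPoolExecutor

def fetch_snapshot_from_s3(segment_id, recovery_point):
    ...  # placeholder: latest segment snapshot taken before the recovery point
    return {"segment": segment_id, "applied": []}

def fetch_log_stream_from_s3(segment_id, recovery_point):
    ...  # placeholder: redo records between the snapshot and the recovery point
    return []

def restore_segment(segment_id, recovery_point):
    snapshot = fetch_snapshot_from_s3(segment_id, recovery_point)
    for record in fetch_log_stream_from_s3(segment_id, recovery_point):
        snapshot["applied"].append(record)     # roll the segment forward through the redo log
    return snapshot

def restore_volume(segment_ids, recovery_point):
    # segments are independent, so the restore parallelizes cleanly
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda seg: restore_segment(seg, recovery_point), segment_ids))
```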
Survivable caches
• We moved the cache out of the database process
• Cache remains warm in the event of database restart
• Lets you resume fully loaded operations much faster
• Instant crash recovery + survivable cache = quick and easy recovery from DB failures

[Diagram: database instances showing SQL, Transactions, and Caching layers]
Caching process is outside the DB process and remains warm across a database restart
Instant crash recovery

Traditional Databases
• Have to replay logs since the last checkpoint
• Typically 5 minutes between checkpoints
• Single-threaded in MySQL; requires a large number of disk accesses
• Crash at T0 requires a re-application of the SQL in the redo log since the last checkpoint

Amazon Aurora
• Underlying storage replays redo records on demand as part of a disk read
• Parallel, distributed, asynchronous
• No replay for startup
• Crash at T0 will result in redo logs being applied to each segment on demand, in parallel, asynchronously

[Diagram: checkpointed data plus redo log replay at T0 vs. Aurora's per-segment on-demand redo application at T0]
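A sketch of the "replay on read" idea: when a block is read after a crash, any redo records newer than the block's LSN are applied before the block is returned, so there is no separate startup replay phase. The structures are illustrative, not Aurora internals:

```python
# On-demand redo apply at read time: a block read after a crash first applies
# any redo records newer than the block's LSN. Structures are illustrative.

blocks = {}        # page_id -> {"lsn": int, "data": ...}
pending_redo = {}  # page_id -> list of redo records sorted by LSN

def read_block(page_id):
    block = blocks.setdefault(page_id, {"lsn": 0, "data": None})
    for rec in pending_redo.get(page_id, []):
        if rec["lsn"] > block["lsn"]:        # apply only redo the block has not seen yet
            block["data"] = rec["after_image"]
            block["lsn"] = rec["lsn"]
    pending_redo.pop(page_id, None)          # this block is now fully caught up
    return block
```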
Read replicas are failover targets
• Aurora cluster contains primary node and up to fifteen secondary nodes
• Failing database nodes are automatically detected and replaced
• Failing database processes are automatically detected and recycled
• Secondary nodes automatically promoted on persistent outage, no single point of failure
• Customer application may scale out read traffic across secondary nodes
• Customer-specifiable failover order
• Read balancing across read replicas

[Diagram: one primary node and two secondary nodes distributed across AZ 1, AZ 2, and AZ 3]
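One way to do that read balancing from the application side: send writes to the primary endpoint and round-robin reads over the replica endpoints. Endpoints and credentials below are placeholders:

```python
# Sketch of application-side read scaling across Aurora replicas: writes go to
# the primary, reads round-robin over replica endpoints. Placeholders throughout.
import itertools
import pymysql

PRIMARY = "my-cluster.cluster-xxxxxxxx.us-east-1.rds.amazonaws.com"
REPLICAS = itertools.cycle([
    "my-cluster-replica-1.xxxxxxxx.us-east-1.rds.amazonaws.com",
    "my-cluster-replica-2.xxxxxxxx.us-east-1.rds.amazonaws.com",
])

def connect(host):
    return pymysql.connect(host=host, user="admin",
                           password="example-password", database="appdb")

def run_write(sql, args=None):
    conn = connect(PRIMARY)                 # writes only ever go to the primary
    try:
        with conn.cursor() as cur:
            cur.execute(sql, args)
        conn.commit()
    finally:
        conn.close()

def run_read(sql, args=None):
    conn = connect(next(REPLICAS))          # spread reads round-robin over the replicas
    try:
        with conn.cursor() as cur:
            cur.execute(sql, args)
            return cur.fetchall()
    finally:
        conn.close()
```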
Faster failover
[Timelines: app running → DB failure → failure detection → DNS propagation → recovery → app running]
MYSQL: 15 - 20 sec
AURORA WITH MARIADB DRIVER: 3 - 20 sec
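The MariaDB driver shortens failover by retrying known hosts instead of waiting on DNS. A rough Python analogue of the same idea is a reconnect-and-retry loop around the cluster endpoint; the endpoint, credentials, and timings are placeholders:

```python
# Rough analogue of driver-side failover handling: if the primary disappears
# mid-query, back off briefly, reconnect (DNS now points at the promoted
# replica), and retry. Endpoint, credentials, and timings are placeholders.
import time
import pymysql

CLUSTER_ENDPOINT = "my-cluster.cluster-xxxxxxxx.us-east-1.rds.amazonaws.com"

def execute_with_failover_retry(sql, attempts=5, backoff_seconds=2):
    for attempt in range(attempts):
        try:
            conn = pymysql.connect(host=CLUSTER_ENDPOINT, user="admin",
                                   password="example-password", database="appdb",
                                   connect_timeout=5)
            try:
                with conn.cursor() as cur:
                    cur.execute(sql)
                    return cur.fetchall()
            finally:
                conn.close()
        except pymysql.err.OperationalError:
            # primary unreachable or connection dropped mid-failover: wait, retry
            time.sleep(backoff_seconds)
    raise RuntimeError("gave up after %d attempts" % attempts)
```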
Real-life data – fail-over time
“In RDS MySQL, it took minutes or sometimes tens of minutes to failover. It’s pretty awesome that you can failover/restart within less than a minute.”
Simulate failures using SQL

To cause the failure of a component at the database node:
ALTER SYSTEM CRASH [{INSTANCE | DISPATCHER | NODE}]

To simulate the failure of disks:
ALTER SYSTEM SIMULATE percent_failure DISK failure_type IN [DISK index | NODE index] FOR INTERVAL interval

To simulate the failure of networking:
ALTER SYSTEM SIMULATE percent_failure NETWORK failure_type [TO {ALL | read_replica | availability_zone}] FOR INTERVAL interval
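These statements can be issued from any MySQL client. A hedged example driving an instance-level crash simulation through PyMySQL; the endpoint and credentials are placeholders, and the statement text follows the syntax shown on this slide:

```python
# Driving the fault-injection SQL above from a plain MySQL client.
# Endpoint and credentials are placeholders.
import pymysql

conn = pymysql.connect(host="my-cluster.cluster-xxxxxxxx.us-east-1.rds.amazonaws.com",
                       user="admin", password="example-password")
try:
    with conn.cursor() as cur:
        # simulate a crash of the database instance to exercise application retry logic
        cur.execute("ALTER SYSTEM CRASH INSTANCE")
except pymysql.err.OperationalError:
    # the connection is expected to drop when the simulated crash takes effect
    pass
finally:
    conn.close()
```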
Compatible with the MySQL ecosystem
Well-established MySQL ecosystem: Business Intelligence, Data Integration, Query and Monitoring, SI and Consulting
Source: Amazon
“We ran our compatibility test suites against Amazon Aurora and everything
just worked." - Dan Jewett, Vice President of Product Management at Tableau
Monitoring Aurora with Datadog
Many 3rd-party monitoring tools: just add read-only AWS credentials and select the services you wish to monitor (e.g. RDS)

Aurora enhanced metrics
Monitor the new RDS enhanced metrics for high-resolution system-level metrics

Monitoring the whole stack
Correlate Aurora metrics with metrics and events from the rest of your infrastructure
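Third-party tools like Datadog typically pull the standard RDS metrics from CloudWatch using those read-only credentials; one way to fetch such a metric yourself with boto3 (the DB instance identifier is a placeholder):

```python
# Pulling an Aurora metric straight from CloudWatch with boto3, the same source
# third-party monitoring tools read from. The instance identifier is a placeholder.
from datetime import datetime, timedelta
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/RDS",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "my-aurora-instance"}],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,                 # 5-minute datapoints
    Statistics=["Average"],
)

for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Average"], 1), "%")
```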
Q&A