© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon Aurora Deep Dive
• 6-way replication across 3 AZs
• Custom, scale-out SSD storage
• Less than 30s failovers or crash recovery
• Shared storage across replicas
• Up to 15 read replicas that act as failover targets
• Pay for the storage you use
• Automatic hotspot management
• Automatic IOPS provisioning
• 100K writes/second & 500K reads/second
• Buffer caches that survive a database restart
• SQL fault injection
• MySQL compatible
• Automatic volume growth
• Up to 64TB databases
• Proactive data block corruption detection
• Automated continuous backups to S3
• Automated repair of bad disks
• Peer-to-peer gossip replication
• Quorum writes tolerate drive or AZ failures
• 1/10th the cost of commercial databases
• Less than 10ms replica lag
Anurag Gupta – VP, Big Data
Amazon Web Services
April, 2016
What is Amazon Aurora?
• MySQL-compatible relational database
• Performance and availability of commercial databases
• Simplicity and cost-effectiveness of open source databases
• Delivered as a managed service

Re-imagined for the cloud
• Architected for the cloud – e.g., moved the logging and storage layer into a multi-tenant, scale-out, database-optimized storage service
• Leverages existing AWS services: Amazon EC2, Amazon VPC, Amazon DynamoDB, Amazon SWF, and Amazon S3
• Maintains compatibility with MySQL – customers can migrate their MySQL applications as-is and use all MySQL tools
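Because Aurora speaks the MySQL wire protocol, a stock MySQL client library works unchanged. A minimal sketch using PyMySQL; the endpoint, credentials, and database name are placeholders, not values from this deck:

```python
# Minimal sketch: connect to an Aurora cluster with an ordinary MySQL client.
# Endpoint, credentials, and schema are placeholders.
import pymysql

conn = pymysql.connect(
    host="my-aurora-cluster.cluster-xxxxxxxx.us-east-1.rds.amazonaws.com",
    user="admin",
    password="example-password",
    database="appdb",
)
try:
    with conn.cursor() as cur:
        cur.execute("SELECT VERSION()")  # the same SQL you would send to MySQL
        print(cur.fetchone())
finally:
    conn.close()
```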
[Architecture diagram: Control plane – Amazon DynamoDB, Amazon SWF, Amazon Route 53; Data plane – SQL, Transactions, Caching, Logging + Storage, Amazon S3]
You’ve probably heard about our benchmark numbers…
Reproducing benchmark results
https://d0.awsstatic.com/product-marketing/Aurora/RDS_Aurora_Performance_Assessment_Benchmarking_v1-2.pdf
[Diagram: four R3.8XLARGE EC2 client instances running SysBench against one R3.8XLARGE Amazon Aurora instance]
• Create an Amazon VPC (or use an existing one).
• Create four EC2 R3.8XL client instances to run the SysBench client. All four should be in the same AZ.
• Enable enhanced networking on your clients.
• Tune your Linux settings (see whitepaper).
• Install SysBench version 0.5.
• Launch an r3.8xlarge Amazon Aurora DB instance in the same VPC and AZ as your clients.
• Start your benchmark!
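The Aurora launch step can also be scripted. A hedged boto3 sketch of creating the cluster and the r3.8xlarge instance; all identifiers, the subnet group, and the AZ are placeholders:

```python
# Sketch: create an Aurora cluster and a db.r3.8xlarge instance in the same
# VPC/AZ as the SysBench clients. Identifiers, subnet group, and AZ are placeholders.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.create_db_cluster(
    DBClusterIdentifier="sysbench-aurora",
    Engine="aurora",                        # MySQL-compatible Aurora
    MasterUsername="admin",
    MasterUserPassword="example-password",
    DBSubnetGroupName="benchmark-subnets",  # subnet group inside the benchmark VPC
    AvailabilityZones=["us-east-1a"],       # same AZ as the client instances
)

rds.create_db_instance(
    DBInstanceIdentifier="sysbench-aurora-1",
    DBInstanceClass="db.r3.8xlarge",
    Engine="aurora",
    DBClusterIdentifier="sysbench-aurora",
    AvailabilityZone="us-east-1a",
)
```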
MySQL SysBench results
R3.8XL: 32 cores / 244 GB RAM
5X faster than RDS MySQL 5.6 & 5.7 – five times higher throughput than stock MySQL based on industry-standard benchmarks.
[Charts: WRITE PERFORMANCE and READ PERFORMANCE – Aurora vs. MySQL 5.6 vs. MySQL 5.7]
Scaling with instance sizes
Aurora scales with instance size for both read and write.
[Charts: WRITE PERFORMANCE and READ PERFORMANCE by instance size – Aurora vs. MySQL 5.6 vs. MySQL 5.7]
Beyond benchmarks
If only real world applications saw benchmark performance
POSSIBLE DISTORTIONS
Real world requests contend with each other
Real world metadata rarely fits in data dictionary cache
Real world data rarely fits in buffer cache
Real world production databases need to run with HA enabled
Scaling User Connections
SysBench OLTP workload, 250 tables

Connections | Amazon Aurora | RDS MySQL w/ 30K IOPS
50          | 40,000        | 10,000
500         | 71,000        | 21,000
5,000       | 110,000       | 13,000

UP TO 8x FASTER
Scaling Table Count
SysBench write-only workload, measuring writes per second, 1,000 connections

Tables | Amazon Aurora | MySQL I2.8XL local SSD | MySQL I2.8XL RAM disk | RDS MySQL w/ 30K IOPS (single AZ)
10     | 60,000        | 18,000                 | 22,000                | 25,000
100    | 66,000        | 19,000                 | 24,000                | 23,000
1,000  | 64,000        | 7,000                  | 18,000                | 8,000
10,000 | 54,000        | 4,000                  | 8,000                 | 5,000

UP TO 11x FASTER
Scaling Data Size

SYSBENCH WRITE-ONLY
DB Size | Amazon Aurora | RDS MySQL w/ 30K IOPS
1 GB    | 107,000       | 8,400
10 GB   | 107,000       | 2,400
100 GB  | 101,000       | 1,500
1 TB    | 26,000        | 1,200
UP TO 21x FASTER

CLOUDHARMONY TPC-C
DB Size | Amazon Aurora | RDS MySQL w/ 30K IOPS
80 GB   | 12,582        | 585
800 GB  | 9,406         | 69
UP TO 136x FASTER
Real-life data – gaming workload
Aurora vs. RDS MySQL – r3.4xlarge, Multi-AZ
Aurora 3X faster on r3.4xlarge
How did we achieve this?
DATABASES ARE ALL ABOUT I/O
NETWORK-ATTACHED STORAGE IS ALL ABOUT PACKETS/SECOND
HIGH-THROUGHPUT PROCESSING DOES NOT ALLOW CONTEXT SWITCHES

DO LESS WORK
• Do fewer I/Os
• Minimize network packets
• Cache prior results
• Offload the database engine

BE MORE EFFICIENT
• Process asynchronously
• Reduce latency path
• Use lock-free data structures
• Batch operations together
IO traffic in MySQL
MYSQL WITH REPLICA
TYPE OF WRITE: BINLOG, DATA, DOUBLE-WRITE, LOG, FRM FILES
[Diagram: primary instance in AZ 1 and replica instance in AZ 2, each writing to Amazon Elastic Block Store (EBS) volumes with EBS mirrors, plus backups to Amazon S3]

IO FLOW
• Issue write to EBS – EBS issues to mirror, ack when both done
• Stage write to standby instance through DRBD
• Issue write to EBS on standby instance

OBSERVATIONS
• Steps 1, 3, 5 are sequential and synchronous
• This amplifies both latency and jitter
• Many types of writes for each user operation
• Have to write data blocks twice to avoid torn writes

PERFORMANCE
• 780K transactions
• 7,388K I/Os per million txns (excludes mirroring, standby)
• Average 7.4 I/Os per transaction

30-minute SysBench write-only workload, 100 GB dataset, RDS Multi-AZ, 30K PIOPS
IO traffic in Aurora
AMAZON AURORA
TYPE OF WRITE: BINLOG, DATA, DOUBLE-WRITE, LOG, FRM FILES
[Diagram: primary instance and replica instances across AZ 1, AZ 2, and AZ 3 performing ASYNC 4/6 QUORUM DISTRIBUTED WRITES to shared storage, with backups to Amazon S3]

IO FLOW
• Only write redo log records; all steps asynchronous
• No data block writes (checkpoint, cache replacement)
• Boxcar redo log records – fully ordered by LSN
• Shuffle to appropriate segments – partially ordered
• Boxcar to storage nodes and issue writes

OBSERVATIONS
• 6X more log writes, but 9X less network traffic
• Tolerant of network and storage outlier latency

PERFORMANCE
• 27,378K transactions (35X MORE)
• 950K I/Os per 1M txns, 6X amplification (7.7X LESS)
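A toy sketch of the 4/6 quorum write pattern described above: a boxcar of redo records is sent to all six segment copies asynchronously, and the commit is acknowledged as soon as four copies have persisted. The storage-node call is a stand-in invented for illustration, not Aurora's real interface:

```python
# Toy 4/6 quorum write: send log records to all six copies asynchronously,
# acknowledge the commit once four acks arrive. `send_to_storage_node` is a
# placeholder for whatever RPC a real storage client would use.
from concurrent.futures import ThreadPoolExecutor, as_completed

WRITE_QUORUM = 4
COPIES = 6
pool = ThreadPoolExecutor(max_workers=COPIES)

def send_to_storage_node(node_id, log_records):
    ...  # ship the boxcar of redo records to one storage node
    return True  # True = durably persisted on that node

def quorum_write(log_records):
    futures = [pool.submit(send_to_storage_node, n, log_records) for n in range(COPIES)]
    acks = 0
    for fut in as_completed(futures):
        if fut.result():
            acks += 1
        if acks >= WRITE_QUORUM:
            return True   # commit acknowledged; remaining sends complete in the background
    return False          # fewer than 4 of 6 copies persisted: cannot acknowledge the commit
```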
IO traffic in Aurora (Storage Node)
[Diagram: storage node receiving log records from the primary instance into an incoming queue; update queue, hot log, and data blocks on the node; sort/group, coalesce, GC, and scrub operations; point-in-time snapshots and backups staged to S3; peer-to-peer gossip with peer storage nodes]

IO FLOW
① Receive record and add to in-memory queue
② Persist record and ACK
③ Organize records and identify gaps in log
④ Gossip with peers to fill in holes
⑤ Coalesce log records into new data block versions
⑥ Periodically stage log and new block versions to S3
⑦ Periodically garbage collect old versions
⑧ Periodically validate CRC codes on blocks

OBSERVATIONS
• All steps are asynchronous
• Only steps 1 and 2 are in foreground latency path
• Input queue is 46X less than MySQL (unamplified, per node)
• Favor latency-sensitive operations
• Use disk space to buffer against spikes in activity
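A condensed sketch of that flow: only receive and persist (steps ①-②) sit on the foreground latency path; everything else runs as background work. All data structures and helpers below are illustrative stubs, not Aurora's implementation:

```python
# Condensed sketch of the storage-node I/O flow. Steps 1-2 are foreground;
# gossip, coalesce, S3 staging, GC, and scrubbing are background. Stubs only.
import queue

class StorageNodeSketch:
    def __init__(self):
        self.incoming = queue.Queue()   # in-memory update queue
        self.hot_log = []               # persisted log records, ordered by LSN

    # Steps 1-2: foreground latency path
    def receive(self, record):
        self.incoming.put(record)       # 1. add record to in-memory queue
        self.persist(record)            # 2. append record to the hot log on disk
        return "ACK"                    #    acknowledge back to the database instance

    # Steps 3-8: asynchronous background work
    def background_pass(self):
        gaps = self.sort_and_find_gaps()        # 3. organize records, find LSN holes
        self.gossip_with_peers(gaps)            # 4. fill holes from peer storage nodes
        self.coalesce_into_block_versions()     # 5. materialize new data block versions
        self.stage_to_s3()                      # 6. periodically ship log + blocks to S3
        self.garbage_collect_old_versions()     # 7. drop versions no longer needed
        self.validate_block_crcs()              # 8. scrub blocks against their CRCs

    # Stubs standing in for the real work
    def persist(self, record): self.hot_log.append(record)
    def sort_and_find_gaps(self): return []
    def gossip_with_peers(self, gaps): pass
    def coalesce_into_block_versions(self): pass
    def stage_to_s3(self): pass
    def garbage_collect_old_versions(self): pass
    def validate_block_crcs(self): pass
```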
Scaling updates with replicas
SysBench write-only workload, 250 tables
Replica lag at a given update rate:

Updates per second | Amazon Aurora | RDS MySQL 30K IOPS (single AZ)
1,000              | 2.62 ms       | 0 s
2,000              | 3.42 ms       | 1 s
5,000              | 3.94 ms       | 60 s
10,000             | 5.38 ms       | 300 s

UP TO 500x LOWER LAG
Real-life data – read replica latency
“In RDS MySQL, we saw replica lag spike to almost 12 minutes which is almost absurd from an application’s perspective. The maximum read replica lag across 4 replicas never exceeded beyond 20 ms.”
IO traffic in Aurora Replicas
[Diagram: MySQL master (30% read / 70% write) feeding a MySQL replica (30% new reads / 70% write) through single-threaded binlog apply, each with its own data volume, vs. Aurora master (30% read / 70% write) and Aurora replica (100% new reads) sharing multi-AZ storage, with page cache updates on the replica]

MYSQL READ SCALING
• Logical: ship SQL statements to replica
• Write workload similar on both instances
• Independent storage
• Can result in data drift between master and replica

AMAZON AURORA READ SCALING
• Physical: ship redo from master to replica
• Replica shares storage; no writes performed
• Cached pages have redo applied
• Advance read view when all commits seen
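A sketch of the replica side of that physical replication model: redo shipped from the master is applied only to pages already in the replica's cache (uncached pages will simply be read from shared storage later), and the read view advances as commits are observed. Names and structures are illustrative:

```python
# Sketch of Aurora-style replica apply: redo is applied only to cached pages;
# the read view LSN advances when commits are seen. Structures are illustrative.

page_cache = {}        # page_id -> page object currently cached on the replica
read_view_lsn = 0      # highest LSN whose commits are visible to readers

def apply_redo_stream(redo_records):
    global read_view_lsn
    for rec in redo_records:                  # records arrive ordered by LSN
        page = page_cache.get(rec["page_id"])
        if page is not None:
            page.apply(rec)                   # keep already-cached pages current
        # pages not in cache are skipped: shared storage already holds the redo
        if rec.get("is_commit"):
            read_view_lsn = rec["lsn"]        # advance the read view on commit
```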
“Performance only matters if your database is up”
Storage durability
• Storage volume automatically grows up to 64 TB
• Quorum system for read/write; latency tolerant
• Peer-to-peer gossip replication to fill in holes
• Continuous backup to S3 (built for 11 9s durability)
• Continuous monitoring of nodes and disks for repair
• 10 GB segments as unit of repair or hotspot rebalance
• Quorum membership changes do not stall writes

[Diagram: six copies of each segment spread across three Availability Zones, with continuous backup to Amazon S3]
• Six copies across three Availability Zones
• 4 out of 6 write quorum; 3 out of 6 read quorum
• Peer-to-peer replication for repairs
• Volume striped across hundreds of storage nodes
Fault-tolerant storage
[Diagrams: an Aurora instance (SQL, Transactions, Caching) over storage replicated across AZ 1, AZ 2, and AZ 3 – one panel labeled “Read and write availability”, one labeled “Read availability”]
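A tiny check of the quorum arithmetic behind those two panels, assuming two copies per Availability Zone as described above:

```python
# Quorum arithmetic for six copies spread two per AZ: losing one AZ leaves
# 4 healthy copies (write quorum still met); losing an AZ plus one more copy
# leaves 3 (reads still possible, writes not).
WRITE_QUORUM, READ_QUORUM, TOTAL_COPIES = 4, 3, 6

def availability(copies_lost):
    healthy = TOTAL_COPIES - copies_lost
    return {
        "writes_available": healthy >= WRITE_QUORUM,
        "reads_available": healthy >= READ_QUORUM,
    }

print(availability(2))  # one AZ down          -> reads and writes available
print(availability(3))  # AZ down + one disk   -> reads available, writes not
```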
Continuous backup
[Diagram: per-segment timeline showing segment snapshots and log records for Segment 1, Segment 2, and Segment 3, with a recovery point along the time axis]
• Take periodic snapshot of each segment in parallel; stream the redo logs to Amazon S3
• Backup happens continuously without performance or availability impact
• At restore, retrieve the appropriate segment snapshots and log streams to storage nodes
• Apply log streams to segment snapshots in parallel and asynchronously
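A sketch of the restore path in the last two bullets: for each segment, pull its latest snapshot and its log stream from S3, then roll the segment forward, with all segments restored in parallel. The S3 layout and helpers below are placeholders:

```python
# Sketch of per-segment restore: fetch snapshot + log stream, replay logs,
# do every segment in parallel. Helpers and layout are placeholders.
from concurrent.futures import ThreadPoolExecutor

def fetch_snapshot_from_s3(segment_id, recovery_point):
    ...  # placeholder: latest segment snapshot taken before the recovery point
    return {"segment": segment_id, "applied": []}

def fetch_log_stream_from_s3(segment_id, recovery_point):
    ...  # placeholder: redo records between the snapshot and the recovery point
    return []

def restore_segment(segment_id, recovery_point):
    snapshot = fetch_snapshot_from_s3(segment_id, recovery_point)
    for record in fetch_log_stream_from_s3(segment_id, recovery_point):
        snapshot["applied"].append(record)     # roll the segment forward through the redo log
    return snapshot

def restore_volume(segment_ids, recovery_point):
    # segments are independent, so the restore parallelizes cleanly
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda seg: restore_segment(seg, recovery_point), segment_ids))
```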
Survivable caches
• We moved the cache out of the database process
• Cache remains warm in the event of database restart
• Lets you resume fully loaded operations much faster
• Instant crash recovery + survivable cache = quick and easy recovery from DB failures

[Diagram: database instances showing SQL, Transactions, and Caching layers]
Caching process is outside the DB process and remains warm across a database restart
Instant crash recovery

Traditional Databases
• Have to replay logs since the last checkpoint
• Typically 5 minutes between checkpoints
• Single-threaded in MySQL; requires a large number of disk accesses
• Crash at T0 requires a re-application of the SQL in the redo log since the last checkpoint

Amazon Aurora
• Underlying storage replays redo records on demand as part of a disk read
• Parallel, distributed, asynchronous
• No replay for startup
• Crash at T0 will result in redo logs being applied to each segment on demand, in parallel, asynchronously

[Diagram: checkpointed data plus redo log replay at T0 vs. Aurora's per-segment on-demand redo application at T0]
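A sketch of the "replay on read" idea: when a block is read after a crash, any redo records newer than the block's LSN are applied before the block is returned, so there is no separate startup replay phase. The structures are illustrative, not Aurora internals:

```python
# On-demand redo apply at read time: a block read after a crash first applies
# any redo records newer than the block's LSN. Structures are illustrative.

blocks = {}        # page_id -> {"lsn": int, "data": ...}
pending_redo = {}  # page_id -> list of redo records sorted by LSN

def read_block(page_id):
    block = blocks.setdefault(page_id, {"lsn": 0, "data": None})
    for rec in pending_redo.get(page_id, []):
        if rec["lsn"] > block["lsn"]:        # apply only redo the block has not seen yet
            block["data"] = rec["after_image"]
            block["lsn"] = rec["lsn"]
    pending_redo.pop(page_id, None)          # this block is now fully caught up
    return block
```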
Read replicas are failover targets
• Aurora cluster contains primary node and up to fifteen secondary nodes
• Failing database nodes are automatically detected and replaced
• Failing database processes are automatically detected and recycled
• Secondary nodes automatically promoted on persistent outage, no single point of failure
• Customer application may scale out read traffic across secondary nodes
• Customer-specifiable failover order
• Read balancing across read replicas

[Diagram: one primary node and two secondary nodes distributed across AZ 1, AZ 2, and AZ 3]
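One way to do that read balancing from the application side: send writes to the primary endpoint and round-robin reads over the replica endpoints. Endpoints and credentials below are placeholders:

```python
# Sketch of application-side read scaling across Aurora replicas: writes go to
# the primary, reads round-robin over replica endpoints. Placeholders throughout.
import itertools
import pymysql

PRIMARY = "my-cluster.cluster-xxxxxxxx.us-east-1.rds.amazonaws.com"
REPLICAS = itertools.cycle([
    "my-cluster-replica-1.xxxxxxxx.us-east-1.rds.amazonaws.com",
    "my-cluster-replica-2.xxxxxxxx.us-east-1.rds.amazonaws.com",
])

def connect(host):
    return pymysql.connect(host=host, user="admin",
                           password="example-password", database="appdb")

def run_write(sql, args=None):
    conn = connect(PRIMARY)                 # writes only ever go to the primary
    try:
        with conn.cursor() as cur:
            cur.execute(sql, args)
        conn.commit()
    finally:
        conn.close()

def run_read(sql, args=None):
    conn = connect(next(REPLICAS))          # spread reads round-robin over the replicas
    try:
        with conn.cursor() as cur:
            cur.execute(sql, args)
            return cur.fetchall()
    finally:
        conn.close()
```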
Faster failover
[Timelines: app running → DB failure → failure detection → DNS propagation → recovery → app running]
MYSQL: 15 - 20 sec
AURORA WITH MARIADB DRIVER: 3 - 20 sec
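The MariaDB driver shortens failover by retrying known hosts instead of waiting on DNS. A rough Python analogue of the same idea is a reconnect-and-retry loop around the cluster endpoint; the endpoint, credentials, and timings are placeholders:

```python
# Rough analogue of driver-side failover handling: if the primary disappears
# mid-query, back off briefly, reconnect (DNS now points at the promoted
# replica), and retry. Endpoint, credentials, and timings are placeholders.
import time
import pymysql

CLUSTER_ENDPOINT = "my-cluster.cluster-xxxxxxxx.us-east-1.rds.amazonaws.com"

def execute_with_failover_retry(sql, attempts=5, backoff_seconds=2):
    for attempt in range(attempts):
        try:
            conn = pymysql.connect(host=CLUSTER_ENDPOINT, user="admin",
                                   password="example-password", database="appdb",
                                   connect_timeout=5)
            try:
                with conn.cursor() as cur:
                    cur.execute(sql)
                    return cur.fetchall()
            finally:
                conn.close()
        except pymysql.err.OperationalError:
            # primary unreachable or connection dropped mid-failover: wait, retry
            time.sleep(backoff_seconds)
    raise RuntimeError("gave up after %d attempts" % attempts)
```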
Real-life data – fail-over time
“In RDS MySQL, it took minutes or sometimes tens of minutes to failover. It’s pretty awesome that you can failover/restart within less than a minute.”
Simulate failures using SQL

To cause the failure of a component at the database node:
ALTER SYSTEM CRASH [{INSTANCE | DISPATCHER | NODE}]

To simulate the failure of disks:
ALTER SYSTEM SIMULATE percent_failure DISK failure_type IN [DISK index | NODE index] FOR INTERVAL interval

To simulate the failure of networking:
ALTER SYSTEM SIMULATE percent_failure NETWORK failure_type [TO {ALL | read_replica | availability_zone}] FOR INTERVAL interval
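These statements can be issued from any MySQL client. A hedged example driving an instance-level crash simulation through PyMySQL; the endpoint and credentials are placeholders, and the statement text follows the syntax shown on this slide:

```python
# Driving the fault-injection SQL above from a plain MySQL client.
# Endpoint and credentials are placeholders.
import pymysql

conn = pymysql.connect(host="my-cluster.cluster-xxxxxxxx.us-east-1.rds.amazonaws.com",
                       user="admin", password="example-password")
try:
    with conn.cursor() as cur:
        # simulate a crash of the database instance to exercise application retry logic
        cur.execute("ALTER SYSTEM CRASH INSTANCE")
except pymysql.err.OperationalError:
    # the connection is expected to drop when the simulated crash takes effect
    pass
finally:
    conn.close()
```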
Compatible with the MySQL ecosystem
Well-established MySQL ecosystem: Business Intelligence, Data Integration, Query and Monitoring, SI and Consulting
Source: Amazon
“We ran our compatibility test suites against Amazon Aurora and everything
just worked." - Dan Jewett, Vice President of Product Management at Tableau
Monitoring Aurora with Datadog
Many 3rd-party monitoring tools: just add read-only AWS credentials and select the services you wish to monitor (e.g. RDS)

Aurora enhanced metrics
Monitor the new RDS enhanced metrics for high-resolution system-level metrics

Monitoring the whole stack
Correlate Aurora metrics with metrics and events from the rest of your infrastructure
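Third-party tools like Datadog typically pull the standard RDS metrics from CloudWatch using those read-only credentials; one way to fetch such a metric yourself with boto3 (the DB instance identifier is a placeholder):

```python
# Pulling an Aurora metric straight from CloudWatch with boto3, the same source
# third-party monitoring tools read from. The instance identifier is a placeholder.
from datetime import datetime, timedelta
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/RDS",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "my-aurora-instance"}],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,                 # 5-minute datapoints
    Statistics=["Average"],
)

for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Average"], 1), "%")
```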
Q&A