© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon Aurora Deep Dive
Abdul Sathar Sait, Principal Product Manager, RDS
Lynn Ferrante, Business Development Manager, RDS
What is Amazon Aurora?
MySQL-compatible relational database
Performance and availability of commercial databases
Simplicity and cost-effectiveness of open source databases
Fastest-growing service in AWS history
Common customer use cases
Business applications
Web and mobile
Content management
E-commerce, retail
Internet of Things
Search, advertising
BI and analytics
Games, media
Expedia: Online travel marketplace
Real-time business intelligence and analytics on a growing corpus of online travel marketplace data.
The current SQL Server-based architecture is too expensive, and performance degrades as data volume grows.
Cassandra with a Solr index requires a large memory footprint and hundreds of nodes, adding cost.
Aurora benefits: Aurora meets the scale and performance requirements at much lower cost: 25,000 inserts/sec with peaks up to 70,000; 30 ms average response time for writes and 17 ms for reads, with 1 month of data.
World's leading online travel company, with a portfolio that includes 150+ travel sites in 70 countries.
PG&E: Large public utility
Serving the high traffic surge during power events had always been a problem.
Availability is critical: when databases are down, service to gas and electric customers is adversely affected.
Aurora benefits: Being able to create multiple database replicas with millisecond latency allows them to handle large surges in traffic and still give customers timely, up-to-date information during a power event.
Amazon Aurora, with 6-way replication, self-healing storage, and automatic instance repair, provides the availability and reliability needed for mission-critical applications.
One of the largest combination natural gas and electric utilities in the United States, with approximately 16 million customers in a 70,000-square-mile service area in northern and central California.
ISCS: Insurance claims processing
Has been using Oracle and SQL Server for operational and warehouse data.
The cost and maintenance of traditional commercial databases had become the biggest expenditure and maintenance headache.
Aurora benefits: The cost of a "more capable" deployment on Aurora has proven to be about 70% less than ISCS's SQL Server deployments.
Eliminated the backup window with Aurora's continuous backup; exploiting linear scaling with the number of connections; continuous upload to Amazon Redshift using Aurora read replicas.
Provides policy management, claims, and billing solutions for property and casualty insurance organizations.
Alfresco: Enterprise content management
Scaling Alfresco document repositories to billions of documents
Support user applications that require sub-second response times
Aurora benefits:
Scaled to 1 billion documents with a throughput of 3 million per hour, which is 10 times faster than their current environment.
Moving from large data centers to cost-effective management with AWS and Aurora.
Leading the convergence of Enterprise Content Management and Business Process Management. More than 1,800 organizations in 195 countries rely on Alfresco, including leaders in financial services, healthcare, and the public sector.
SQL Benchmark Results
WRITE PERFORMANCE: 4 client machines with 1,000 connections each
READ PERFORMANCE: Single client machine with 1,600 connections
Using MySQL SysBench with Amazon Aurora R3.8XL (32 cores and 244 GB RAM)
Reproducing these results
https://d0.awsstatic.com/product-marketing/Aurora/RDS_Aurora_Performance_Assessment_Benchmarking_v1-2.pdf
• Create an Amazon VPC (or use an existing one).
• Create four EC2 R3.8XL client instances to run the SysBench client. All four should be in the same AZ.
• Enable enhanced networking on your clients.
• Tune your Linux settings (see whitepaper).
• Install SysBench version 0.5.
• Launch an r3.8xlarge Amazon Aurora DB instance in the same VPC and AZ as your clients.
• Start your benchmark!
Beyond benchmarks
If only real-world applications saw benchmark performance

POSSIBLE DISTORTIONS
• Real-world requests contend with each other
• Real-world metadata rarely fits in the data dictionary cache
• Real-world data rarely fits in the buffer cache
• Real-world production databases need to run HA
Scaling User Connections
SysBench OLTP workload, 250 tables

Connections | Amazon Aurora | RDS MySQL (30K IOPS, single AZ)
50          | 40,000        | 10,000
500         | 71,000        | 21,000
5,000       | 110,000       | 13,000

UP TO 8x FASTER
Scaling Table Count
Tables | Amazon Aurora | MySQL I2.8XL local SSD | MySQL I2.8XL RAM disk | RDS MySQL (30K IOPS, single AZ)
10     | 60,000 | 18,000 | 22,000 | 25,000
100    | 66,000 | 19,000 | 24,000 | 23,000
1,000  | 64,000 | 7,000  | 18,000 | 8,000
10,000 | 54,000 | 4,000  | 8,000  | 5,000

SysBench write-only workload, 1,000 connections, default settings
Number of writes per second

UP TO 11x FASTER
Scaling Data Set Size
SYSBENCH WRITE-ONLY

DB Size | Amazon Aurora | RDS MySQL (30K IOPS, single AZ)
1 GB    | 107,000 | 8,400
10 GB   | 107,000 | 2,400
100 GB  | 101,000 | 1,500
1 TB    | 26,000  | 1,200

UP TO 67x FASTER
CLOUDHARMONY TPC-C

DB Size | Amazon Aurora | RDS MySQL (30K IOPS, single AZ)
80 GB   | 12,582 | 585
800 GB  | 9,406  | 69

UP TO 136x FASTER
Running with Read Replicas
Updates per second | Amazon Aurora replica lag | RDS MySQL (30K IOPS, single AZ) replica lag
1,000  | 2.62 ms | 0 s
2,000  | 3.42 ms | 1 s
5,000  | 3.94 ms | 60 s
10,000 | 5.38 ms | 300 s

SysBench write-only workload, 250 tables

UP TO 500x LOWER LAG
How Do We Achieve These Results?

DO LESS WORK
• Do fewer I/Os
• Minimize network packets
• Cache prior results
• Offload the database engine

BE MORE EFFICIENT
• Process asynchronously
• Reduce latency path
• Use lock-free data structures
• Batch operations together
DATABASES ARE ALL ABOUT I/O
NETWORK-ATTACHED STORAGE IS ALL ABOUT PACKETS/SECOND
HIGH-THROUGHPUT PROCESSING DOES NOT ALLOW CONTEXT SWITCHES
A service-oriented architecture applied to the database
Moved the logging and storage layer into a multi-tenant, scale-out database-optimized storage service
Integrated with other AWS services like Amazon EC2, Amazon VPC, Amazon DynamoDB, Amazon SWF, and Amazon Route 53 for control plane operations
Integrated with Amazon S3 for continuous backup with 99.999999999% durability
Control Plane / Data Plane

[Diagram: data plane - SQL, transactions, and caching in the database instance, with logging + storage in the scale-out storage service backed by Amazon S3; control plane - Amazon DynamoDB, Amazon SWF, and Amazon Route 53]
Aurora Cluster
[Diagram: Aurora primary instance; cluster volume spans 3 AZs (AZ 1, AZ 2, AZ 3), with continuous backup to Amazon S3]

Aurora Cluster with Replicas
[Diagram: Aurora primary instance plus Aurora Replicas in other AZs; cluster volume spans 3 AZs, with continuous backup to Amazon S3]
IO Traffic in RDS MySQL
TYPE OF WRITE: binlog, data, double-write, log, FRM files

[Diagram: MySQL with standby - primary instance in AZ 1 and standby instance in AZ 2, each writing to mirrored Amazon Elastic Block Store (EBS) volumes, with backup to Amazon S3]

IO FLOW
1. Issue write to EBS
2. EBS issues write to its mirror; ACK when both done
3. Stage write to standby instance
4. Issue write to EBS on the standby instance
5. Standby EBS issues write to its mirror

OBSERVATIONS
• Steps 1, 3, 5 are sequential and synchronous
• This amplifies both latency and jitter
• Many types of writes for each user operation
• Have to write data blocks twice to avoid torn writes
PERFORMANCE
• 780K transactions
• 7,388K I/Os per million txns (excludes mirroring, standby)
• Average 7.4 I/Os per transaction
30-minute SysBench write-only workload, 100 GB dataset, RDS single AZ, 30K PIOPS
IO Traffic in Aurora (Database)
AMAZON AURORA
[Diagram: primary instance in AZ 1 and replica instances in AZ 2 and AZ 3, with asynchronous 4/6 quorum distributed writes to storage nodes across all three AZs and backup to Amazon S3]

TYPE OF WRITES: redo log records only (no binlog, data, double-write, or FRM writes)
30-minute SysBench write-only workload, 100 GB dataset
IO FLOW
1. Boxcar redo log records - fully ordered by LSN
2. Shuffle to appropriate segments - partially ordered
3. Boxcar to storage nodes and issue writes

OBSERVATIONS
• Only write redo log records; all steps asynchronous
• No data block writes (checkpoint, cache replacement)
• 6X more log writes, but 9X less network traffic
• Tolerant of network and storage outlier latency

PERFORMANCE
• 27,378K transactions (35X MORE)
• 950K I/Os per 1M txns, 6X amplification (7.7X LESS)
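The boxcar/shuffle flow can be sketched in a few lines of Python. This is an illustrative toy, not Aurora's implementation: the function name, the record tuples, and the segment-size constant (assuming 16 KB pages in 10 GB segments) are all hypothetical.

```python
def shard_redo_records(records, segment_size_pages=655_360):
    """Toy sketch: group (lsn, page_no, payload) redo records by the
    10 GB segment owning the modified page (655,360 pages of 16 KB
    each is an assumed layout). The sort makes the full stream
    LSN-ordered; each per-segment boxcar is then partially ordered."""
    boxcars = {}
    for lsn, page_no, payload in sorted(records):
        seg = page_no // segment_size_pages
        boxcars.setdefault(seg, []).append((lsn, payload))
    return boxcars

# Three records destined for two segments:
batch = [(12, 5, b"a"), (10, 700_000, b"b"), (11, 7, b"c")]
boxcars = shard_redo_records(batch)
# segment 0 receives LSNs 11 then 12; segment 1 receives LSN 10
```

Each per-segment boxcar can then be sent to the storage nodes holding that segment as a single batched write.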
IO Traffic in Aurora (Storage Node)
[Diagram: storage node - log records from the primary instance enter an incoming queue and are persisted to a hot log with an ACK; an update queue sorts and groups records and coalesces them into data blocks; the log and point-in-time snapshots are staged to S3 backup; garbage collection and scrub/coalesce run continuously; peer-to-peer gossip with peer storage nodes fills holes]
IO FLOW
1. Receive record and add to in-memory queue
2. Persist record and ACK
3. Organize records and identify gaps in log
4. Gossip with peers to fill in holes
5. Coalesce log records into new data block versions
6. Periodically stage log and new block versions to S3
7. Periodically garbage-collect old versions
8. Periodically validate CRC codes on blocks

OBSERVATIONS
• All steps are asynchronous
• Only steps 1 and 2 are in the foreground latency path
• Input queue is 46X less than MySQL (unamplified, per node)
• Favor latency-sensitive operations
• Use disk space to buffer against spikes in activity
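The "identify gaps, gossip to fill holes" steps amount to detecting missing LSNs in the node's local log. A minimal sketch, assuming for illustration that LSNs are dense consecutive integers (real LSNs are not, so real gap detection tracks record ranges instead):

```python
def find_log_gaps(received_lsns):
    """Return the LSNs this storage node is missing, i.e. what it would
    request from peer storage nodes via gossip. Assumes dense integer
    LSNs purely for illustration."""
    if not received_lsns:
        return []
    have = set(received_lsns)
    return [lsn for lsn in range(min(have), max(have) + 1) if lsn not in have]

# A node holding LSNs 10, 11, 13, 16 would ask its peers for 12, 14, 15.
```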
Asynchronous group commits
[Diagram: transactions T1 through Tn each perform reads, writes, and a commit; commits (T1 through T8) land at LSNs 10-50, the durable LSN grows at the head node, pending commits wait in a commit queue in LSN order, and groups of commits are acknowledged together over time]
TRADITIONAL APPROACH
• Maintain a buffer of log records to write out to disk
• Issue write when buffer full or time out waiting for writes
• First writer has latency penalty when write rate is low

AMAZON AURORA
• Request I/O with first write, fill buffer till write picked up
• Individual write durable when 4 of 6 storage nodes ACK
• Advance DB durable point up to earliest pending ACK
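The durable-point rule can be sketched as follows. The function names and ACK bookkeeping are hypothetical, but the rule matches the slide: a write is durable once 4 of 6 storage nodes ACK, and the database durable point advances only up to the earliest still-pending write.

```python
WRITE_QUORUM = 4  # of 6 storage-node copies, per the slide

def durable_lsn(acks_by_lsn):
    """Highest LSN such that every write at or below it has reached the
    4/6 write quorum; the first under-quorum write halts the advance."""
    durable = 0
    for lsn in sorted(acks_by_lsn):
        if acks_by_lsn[lsn] >= WRITE_QUORUM:
            durable = lsn
        else:
            break  # earliest pending ACK blocks further advance
    return durable

def releasable_commits(commit_queue, acks_by_lsn):
    """Pending commits at or below the durable point can be ACKed to
    clients together (group commit), in LSN order."""
    point = durable_lsn(acks_by_lsn)
    return [lsn for lsn in sorted(commit_queue) if lsn <= point]
```

For example, with ACK counts {10: 6, 12: 5, 22: 3, 30: 4}, the durable point is LSN 12: the write at LSN 30 has a quorum, but LSN 22 ahead of it does not yet.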
AURORA THREAD MODEL
• Re-entrant connections multiplexed to active threads
• Kernel-space epoll() inserts into latch-free event queue
• Dynamically sized thread pool
• Gracefully handles 5,000+ concurrent client sessions on r3.8xl

MYSQL THREAD MODEL
• Standard MySQL: one thread per connection; doesn't scale with connection count
• MySQL EE: connections assigned to thread groups; requires careful stall-threshold tuning
[Diagram: MySQL thread model vs. Aurora thread model - client connections feed a latch-free task queue via epoll()]
Adaptive thread pool
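The contrast can be illustrated with a toy event-queue model: many logical connections put work on one shared queue drained by a small fixed pool, instead of one dedicated thread per connection. Everything here is a simplification for illustration; in particular, queue.Queue stands in for the latch-free queue, and no epoll() is involved.

```python
import queue
import threading

def run_pooled(tasks, num_workers=4):
    """Toy sketch of the pooled model: (conn_id, payload) events from
    many connections share one queue; num_workers threads drain it,
    so thread count stays fixed regardless of connection count."""
    q = queue.Queue()
    results, lock = [], threading.Lock()

    def worker():
        while True:
            item = q.get()
            if item is None:        # shutdown sentinel
                return
            conn_id, payload = item
            with lock:
                results.append((conn_id, payload.upper()))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for task in tasks:              # events from "client connections"
        q.put(task)
    for _ in threads:               # one sentinel per worker
        q.put(None)
    for t in threads:
        t.join()
    return sorted(results)
```

A per-connection-thread design would instead create one thread per conn_id, which is what stops scaling past a few thousand connections.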
IO Traffic in Aurora (Read Replica)
[Diagram: MySQL master (30% read, 70% write) ships writes to a MySQL replica (30% new reads, 70% write) via single-threaded binlog apply, each instance on its own data volume; Aurora master (30% read, 70% write) and Aurora replica (100% new reads, page-cache updates only) share multi-AZ storage]
MYSQL READ SCALING
• Logical: ship SQL statements to replica
• Write workload similar on both instances
• Independent storage
• Can result in data drift between master and replica

AMAZON AURORA READ SCALING
• Physical: ship redo from master to replica
• Replica shares storage; no writes performed
• Cached pages have redo applied
• Advance read view when all commits seen
Continuing the Improvements
BATCH OPERATIONS
• Write batch size tuning
• Asynchronous send for read/write I/Os
• Purge thread performance
• Bulk insert performance

LOCK CONTENTION
• Hot row contention
• Dictionary statistics
• Mini-transaction commit code path
• Query cache read/write conflicts
• Dictionary system mutex

CUSTOMER FEEDBACK
• Binlog and distributed transactions
• Lock compression
• Read-ahead

OTHER
• Failover time reductions
• Malloc reduction
• System call reductions
• Undo slot caching patterns
• Cooperative log apply
Availability
“Performance only matters if your database is up”
Storage node availability
Quorum system for read/write; latency tolerant
Peer to peer gossip replication to fill in holes
Continuous backup to S3 (designed for 11 9s durability)
Continuous scrubbing of data blocks
Continuous monitoring of nodes and disks for repair
10 GB segments as the unit of repair or hotspot rebalancing, to quickly rebalance load
Quorum membership changes do not stall writes
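The quorum arithmetic behind these bullets is worth making explicit: with 6 copies across 3 AZs, a write quorum of 4, and a read quorum of 3, losing a whole AZ still leaves a write quorum, and losing an AZ plus one more node still leaves a read quorum for repair. A small sketch:

```python
COPIES_PER_AZ, AZS = 2, 3
TOTAL = COPIES_PER_AZ * AZS           # 6 copies across 3 AZs
WRITE_QUORUM, READ_QUORUM = 4, 3      # W + R > TOTAL, so reads see every write

def can_write(available_copies):
    return available_copies >= WRITE_QUORUM

def can_read(available_copies):
    return available_copies >= READ_QUORUM

# Lose one full AZ (2 copies): 4 remain, writes and reads continue.
# Lose an AZ plus one more node: 3 remain, reads (and repair) continue.
```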
Instant crash recovery

TRADITIONAL DATABASES
• Have to replay logs since the last checkpoint
• Typically 5 minutes between checkpoints
• Single-threaded in MySQL; requires a large number of disk accesses
• A crash at T0 requires re-application of the SQL in the redo log since the last checkpoint

AMAZON AURORA
• Underlying storage replays redo records on demand as part of a disk read
• Parallel, distributed, asynchronous
• No replay for startup
• A crash at T0 results in redo logs being applied to each segment on demand, in parallel, asynchronously
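The on-demand idea can be sketched with a toy page store: rather than replaying the entire redo log at startup, apply only the records for the page being read, at read time. The names and the page/record layout here are hypothetical.

```python
def read_page(base_pages, redo_log, page_no):
    """Return the current version of one page by applying, in LSN order,
    only the redo records that touch it - replay happens per page at
    read time, so there is no startup-time full-log replay."""
    page = list(base_pages[page_no])            # copy the checkpointed page
    for _lsn, pno, (offset, value) in sorted(redo_log):
        if pno == page_no:
            page[offset] = value                # apply this redo record
    return page

# Checkpointed pages plus a redo tail of (lsn, page_no, (offset, value)):
pages = {0: [0, 0], 1: [0, 0]}
redo = [(2, 1, (0, 7)), (1, 0, (1, 5))]
```

Reading page 0 applies only its one record; page 1's record waits until page 1 is actually read.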
Survivable caches
We moved the cache out of the database process
Cache remains warm in the event of a database restart
Lets you resume fully loaded operations much faster
Instant crash recovery + survivable cache = quick and easy recovery from DB failures
Caching process is outside the DB process and remains warm across a database restart
Faster, more predictable failover
MYSQL: app running → DB failure → failure detection → DNS propagation → recovery → recovery → app running: 15-20 sec
AURORA WITH MARIADB DRIVER: app running → DB failure → failure detection → DNS propagation → recovery → app running: 3-20 sec
Simulate failures using SQL

To cause the failure of a component at the database node:
ALTER SYSTEM CRASH [{INSTANCE | DISPATCHER | NODE}]

To simulate the failure of disks:
ALTER SYSTEM SIMULATE percent_failure DISK failure_type IN [DISK index | NODE index] FOR INTERVAL interval

To simulate the failure of networking:
ALTER SYSTEM SIMULATE percent_failure NETWORK failure_type [TO {ALL | read_replica | availability_zone}] FOR INTERVAL interval
http://aws.amazon.com/rds/aurora