© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
• 6-way replication across 3 AZs
• Custom, scale-out SSD storage
• Less than 30s failovers or crash recovery
• Shared storage across replicas
• Up to 15 read replicas that act as failover targets
• Pay for the storage you use
• Automatic hotspot management
• Automatic IOPS provisioning
• 100K writes/second & 500K reads/second
• Buffer caches that survive a database restart
• SQL fault injection
• MySQL compatible
• Automatic volume growth
• Up to 64TB databases
• Proactive data block corruption detection
• Automated continuous backups to S3
• Automated repair of bad disks
• Peer to peer gossip replication
• Quorum writes tolerate drive or AZ failures
• 1/10th the cost of commercial databases
• Less than 10ms replica lag
October 2015
DAT312
Using Amazon Aurora for Enterprise Workloads
Debanjan Saha - GM, Amazon Aurora, Amazon Web Services
Abdul Sait - Principal Product Marketing Manager, Amazon Web Services
Robin Mathews - Sr. Director, Technology, Expedia, Inc.
Enterprise customer wish list
A database that ….
Stays up, even when components fail ….
Performs consistently at enterprise scale …
Doesn’t need an army of experts to manage …
Doesn’t cost a fortune; no licenses to handle …
Amazon Aurora: enterprise-class database for the cloud
We started with enterprise requirements and walked backward to reimagine relational databases for the cloud …
Enterprise-class availability, performance
Delivered as a fully managed service
No licenses; 1/10 the cost of commercial databases
Perfect fit for enterprise
Enterprise-class availability
• 6-way replication across 3 data centers
• Failover in less than 30 seconds
• Near-instant crash recovery
Performance and scale
• Up to 500K reads/sec and 100K writes/sec
• 15 low-latency (10 ms) Read Replicas
• Up to 64 TB database-optimized storage volume
Fully managed service
• Instant provisioning and deployment
• Automated patching and software upgrades
• Backup and point-in-time recovery
• Compute and storage scaling
Fastest-growing service in AWS history
Amazon Aurora customer adoption
A service-oriented architecture applied to the database
• Moved the logging and storage layer into a multitenant, scale-out, database-optimized storage service
• Integrated with other AWS services like Amazon EC2, Amazon VPC, Amazon DynamoDB, Amazon SWF, and Amazon Route 53 for control plane operations
• Integrated with Amazon S3 for continuous backup with 99.999999999% durability
[Diagram: data plane (SQL, transactions, caching, logging + storage) and control plane (Amazon DynamoDB, Amazon SWF, Amazon Route 53), with Amazon S3]
Designed for high availability
Storage node availability
6-way replication across 3 Availability Zones
Quorum system for read/write; latency tolerant
Peer-to-peer gossip replication to fill in holes
Continuous scrubbing of data blocks
Continuous monitoring of nodes and disks for repair
Quorum membership changes do not stall writes
[Diagram: storage nodes across AZ 1, AZ 2, and AZ 3, with Amazon S3]
Lose two copies, or an entire Availability Zone, without read or write availability impact
Lose three copies without read availability impact
Automatic detection, replication, and repair
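The availability statements above are consistent with a quorum scheme over six copies in which writes need four healthy copies and reads need three. Those two thresholds are inferred from the slide rather than stated on it, and the even layout of two copies per Availability Zone is also an assumption. A minimal sketch of that model:

```python
from dataclasses import dataclass

COPIES = 6          # from the slide: 6-way replication
WRITE_QUORUM = 4    # assumption: inferred from "lose two copies" keeping writes available
READ_QUORUM = 3     # assumption: inferred from "lose three copies" keeping reads available
COPIES_PER_AZ = 2   # assumption: six copies spread evenly over three AZs


@dataclass(frozen=True)
class Availability:
    can_read: bool
    can_write: bool


def availability(failed_copies: int) -> Availability:
    """Report which operations still have a quorum after some copies are lost."""
    if not 0 <= failed_copies <= COPIES:
        raise ValueError("failed_copies must be between 0 and 6")
    healthy = COPIES - failed_copies
    return Availability(can_read=healthy >= READ_QUORUM,
                        can_write=healthy >= WRITE_QUORUM)


# Losing one whole AZ removes two copies under the even-layout assumption.
print("AZ failure:", availability(COPIES_PER_AZ))
print("three copies lost:", availability(3))
```

Under these assumptions an AZ failure leaves both reads and writes available, while losing a third copy leaves reads available but not writes, which matches the two bullets on the slide.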
[Diagram: two panels, "Read and write availability" and "Read availability", each showing the SQL, transaction, and caching layers over storage in AZ 1, AZ 2, and AZ 3]
Self-healing, fault-tolerant
Traditional databases
• Have to replay logs since the last checkpoint
• Typically 5 minutes between checkpoints
• Single-threaded in MySQL; requires a large number of disk accesses
• Crash at T0 requires a re-application of the SQL in the redo log since the last checkpoint
Amazon Aurora
• Underlying storage replays redo records on demand as part of a disk read
• Parallel, distributed, asynchronous
• No replay for startup
• Crash at T0 will result in redo logs being applied to each segment on demand, in parallel, asynchronously
[Diagram: checkpointed data and redo log for each case, with a crash at T0]
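The contrast between replaying the whole log at startup and applying redo on demand can be illustrated with a toy model. The `Segment` class below is hypothetical and greatly simplified: each redo record simply carries the new page value, whereas a real redo record describes a change to a page. It is a sketch of the idea, not Aurora's storage implementation.

```python
from collections import defaultdict


class Segment:
    """Toy storage segment: page images plus redo records not yet applied."""

    def __init__(self):
        self.pages = {}                     # page_id -> materialized value
        self.pending = defaultdict(list)    # page_id -> redo records awaiting application
        self.applied = 0                    # number of redo records applied so far

    def append_redo(self, page_id, new_value):
        # A write only appends a redo record; the page image is not touched.
        self.pending[page_id].append(new_value)

    def read(self, page_id):
        # Redo is applied lazily, and only for the page being read.
        for new_value in self.pending.pop(page_id, []):
            self.pages[page_id] = new_value
            self.applied += 1
        return self.pages.get(page_id)


seg = Segment()
seg.append_redo("p1", "a")
seg.append_redo("p1", "b")
seg.append_redo("p2", "x")

# After a simulated crash and restart, nothing has been replayed yet.
print(seg.applied)       # 0
print(seg.read("p1"))    # b
print(seg.applied)       # 2, the record for p2 is still pending
```

Because each segment holds its own pending records, many segments could do this work independently, which is the "parallel, distributed" property the slide describes.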
Instant crash recovery
Faster, more predictable failover
[Timeline diagram: from DB failure to app running]
MySQL: failure detection, DNS propagation, recovery (15-30 sec)
Aurora with MariaDB driver: failure detection, DNS propagation, recovery (5-20 sec)
Continuous backup
[Diagram: segment snapshots and log records for Segment 1, Segment 2, and Segment 3 along a time axis, with a recovery point marked]
Take periodic snapshot of each segment in parallel; stream the redo logs to Amazon S3
Backup happens continuously without performance or availability impact
At restore, retrieve the appropriate segment snapshots and log streams to storage nodes
Apply log streams to segment snapshots in parallel and asynchronously
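The restore steps above can be sketched as follows. The data layout (timestamped snapshots as dictionaries, log records as `(timestamp, key, value)` tuples) and the function names are illustrative assumptions, not the format Aurora uses; the thread pool stands in for the per-segment parallelism the slide mentions.

```python
from concurrent.futures import ThreadPoolExecutor


def restore_segment(snapshots, log_records, recovery_point):
    """Rebuild one segment's state as of recovery_point.

    snapshots:   list of (timestamp, state_dict) periodic snapshots
    log_records: list of (timestamp, key, value) redo records
    """
    # Pick the latest snapshot taken at or before the recovery point.
    candidates = [s for s in snapshots if s[0] <= recovery_point]
    if not candidates:
        raise ValueError("no snapshot at or before the recovery point")
    snap_time, snap_state = max(candidates, key=lambda s: s[0])

    state = dict(snap_state)
    # Replay only the log records between that snapshot and the recovery point.
    for ts, key, value in sorted(log_records, key=lambda r: r[0]):
        if snap_time < ts <= recovery_point:
            state[key] = value
    return state


def restore_all(segments, recovery_point):
    """Restore every segment independently, in parallel."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(restore_segment, snaps, logs, recovery_point)
                   for name, (snaps, logs) in segments.items()}
        return {name: f.result() for name, f in futures.items()}


segments = {
    "segment-1": ([(0, {}), (10, {"a": 2})],
                  [(5, "a", 2), (12, "b", 7), (20, "a", 9)]),
    "segment-2": ([(0, {"x": 1})], [(8, "x", 3)]),
}
print(restore_all(segments, recovery_point=15))
```

Only the log records after the chosen snapshot are replayed, so the cost of a restore depends on the snapshot interval rather than on the full history.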
Enterprise-class performance
SQL benchmark results
WRITE PERFORMANCE / READ PERFORMANCE
• MySQL SysBench
• R3.8XL with 32 cores and 244 GB RAM
• 4 client machines with 1,000 threads each
• Single client with 1,000 threads
Scales with table count
Tables | Amazon Aurora | MySQL I2.8XL local SSD | MySQL I2.8XL RAM disk | RDS MySQL 30K IOPS (single AZ)
10 | 60,000 | 18,000 | 22,000 | 25,000
100 | 66,000 | 19,000 | 24,000 | 23,000
1,000 | 64,000 | 7,000 | 18,000 | 8,000
10,000 | 54,000 | 4,000 | 8,000 | 5,000
• Write-only workload
• 1,000 connections
• Query cache (default on for Amazon Aurora, off for MySQL)
Up to 11x faster
Scales with DB Size
DB size | Amazon Aurora | RDS MySQL 30K IOPS (single AZ)
1 GB | 107,000 | 8,400
10 GB | 107,000 | 2,400
100 GB | 101,000 | 1,500
Up to 67x faster
Scales with user connections
• OLTP Workload
• Variable connection count
• 250 tables
• Query cache (default on for Amazon Aurora, off for MySQL)
Connections | Amazon Aurora | RDS MySQL 30K IOPS (single AZ)
50 | 40,000 | 10,000
500 | 71,000 | 21,000
5,000 | 110,000 | 13,000
Up to 8x faster
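The "up to" multipliers on the three benchmark slides appear to correspond to the largest Amazon Aurora to RDS MySQL ratio in each table. That reading is an inference from the numbers, not something the slides state. A short check using the figures copied from the tables:

```python
# (Amazon Aurora, RDS MySQL 30K IOPS) throughput pairs taken from the slide tables.
benchmarks = {
    "table count (10,000 tables)": (54_000, 5_000),
    "DB size (100 GB)": (101_000, 1_500),
    "connections (5,000)": (110_000, 13_000),
}


def speedup(aurora, rds_mysql):
    """Ratio of Aurora throughput to RDS MySQL throughput."""
    return aurora / rds_mysql


for name, (aurora, rds_mysql) in benchmarks.items():
    print(f"{name}: {speedup(aurora, rds_mysql):.1f}x")
```

The ratios come out at roughly 10.8, 67.3, and 8.5, in line with the 11x, 67x, and 8x headlines.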
How do we achieve these results?
DATABASES ARE ALL ABOUT I/O
NETWORK-ATTACHED STORAGE IS ALL ABOUT PACKETS/SECOND
HIGH-THROUGHPUT PROCESSING DOES NOT ALLOW CONTEXT SWITCHES
DO LESS WORK
• Do fewer IOs
• Minimize network packets
• Cache prior results
• Offload the database engine
BE MORE EFFICIENT
• Process asynchronously
• Reduce latency path
• Use lock-free data structures
• Batch operations together
Delivered as a managed service
Databases are hard to manage
[Pie chart with segments of 5%, 25%, 20%, 40%, 5%, and 5% across these categories:]
• Backup and recovery, data load and unload
• Performance tuning
• Scripting and coding
• Security planning
• Installing, upgrading, patching, and migrating
• Documentation, licensing, and training
Hosting your databases on premises
you
Power, HVAC, net
Rack & stack
Server maintenance
OS patches
DB s/w patches
Database backups
Scaling
High availability
DB s/w installs
OS installation
App optimization
Hosting your databases in Amazon EC2
Power, HVAC, net
Rack & stack
Server maintenance
OS installation
OS patches
DB s/w patches
Database backups
Scaling
High availability
DB s/w installs
App optimization
you
If you choose a managed DB service
App optimization
Power, HVAC, net
Rack & stack
Server maintenance
OS patches
DB s/w patches
Database backups
High availability
DB s/w installs
OS installation
Scaling
you
Advanced monitoring
coming soon...
Applications becoming more complex
[Diagram: increasingly layered application stacks, labeled .NET, WEB 2.0 (browser logic, AJAX, web frameworks), CLOUD, and BIG DATA (Hadoop, Cassandra), built from middleware, on-prem DB, Amazon EC2, Amazon RDS, and Amazon ElastiCache]
Monitoring across the stack is key to minimizing downtime
• Access to information from every potential point of failure
• Alarm and notification system for pre-emptive action
• Rich visualization of aggregated data at user’s convenience
Advanced monitoring
50+ system/OS metrics | sorted process list view | 1-60 sec granularity
alarms on specific metrics | egress to CloudWatch Logs | integration with 3rd-party tools
coming soon
Important systems and OS metrics
CPU Utilization: User, System, Wait, IRQ, Idle
Network: Rx per declared ethn, Tx per declared ethn
Processes: Num processes, Num interruptible, Num non-interruptible, Num zombie
Process List: Process ID, Process name, VSS, Res, Mem % consumed, CPU % used, CPU time, Parent ID
Memory: MemTotal, MemFree, Buffers, Cached, SwapCached, Active, Inactive, SwapTotal, SwapFree, Dirty, Writeback, Mapped, Slab
Device IO: TPS, Blk_read, Blk_wrtn, read_kb, read_IOs, read_size, write_kb, write_IOs, write_size, avg_rw_size, avg_queue_len
File System: Free capacity, Used, % Used
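The Memory metric names above match fields of the Linux `/proc/meminfo` file. Assuming the values arrive in that text format (an assumption, since the slides do not describe the delivery format), an alarm on a derived metric could look like the sketch below. The sample values, the 80% threshold, and the function names are all hypothetical.

```python
# Hypothetical sample in /proc/meminfo style; values are made up for illustration.
SAMPLE = """\
MemTotal:       16384000 kB
MemFree:         2048000 kB
Buffers:          512000 kB
Cached:          4096000 kB
"""


def parse_meminfo(text):
    """Parse 'Name:   value kB' lines into a dict of integer kB values."""
    metrics = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if rest.strip():
            metrics[name.strip()] = int(rest.split()[0])
    return metrics


def memory_used_percent(metrics):
    # Treat buffers and page cache as reclaimable, so they count as available.
    available = metrics["MemFree"] + metrics["Buffers"] + metrics["Cached"]
    return 100.0 * (metrics["MemTotal"] - available) / metrics["MemTotal"]


def in_alarm(metrics, threshold_percent=80.0):
    """True when used memory exceeds the alarm threshold."""
    return memory_used_percent(metrics) > threshold_percent


metrics = parse_meminfo(SAMPLE)
print(f"{memory_used_percent(metrics):.1f}% used, alarm={in_alarm(metrics)}")
```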
Integrations with 3rd party tools
Don’t be constrained by licenses, cost, or capacity
Simple pricing
No licenses
No lock-in
Pay only for what you use
Discounts
44% with a 1-year RI
63% with a 3-year RI
Instance | vCPU | Mem (GiB) | Hourly price
db.r3.large | 2 | 15.25 | $0.29
db.r3.xlarge | 4 | 30.5 | $0.58
db.r3.2xlarge | 8 | 61 | $1.16
db.r3.4xlarge | 16 | 122 | $2.32
db.r3.8xlarge | 32 | 244 | $4.64
• Storage consumed, up to 64 TB, is $0.10/GB-month
• IOs consumed are billed at $0.20 per million I/O
• Prices are for Virginia
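As a rough illustration of how these three price components combine, the sketch below estimates a monthly bill from the 2015 figures on this slide. The 730-hour month, the function name, and the choice to apply the RI discount to the instance charge only are assumptions made for the example.

```python
# USD per hour, from the pricing table on the slide (Virginia, 2015).
HOURLY_PRICE = {
    "db.r3.large": 0.29,
    "db.r3.xlarge": 0.58,
    "db.r3.2xlarge": 1.16,
    "db.r3.4xlarge": 2.32,
    "db.r3.8xlarge": 4.64,
}
STORAGE_PER_GB_MONTH = 0.10             # from the slide
IO_PER_MILLION = 0.20                   # from the slide
RI_DISCOUNT = {None: 0.0, 1: 0.44, 3: 0.63}   # from the slide: 1-year and 3-year RI
HOURS_PER_MONTH = 730                   # assumption: average month


def monthly_cost(instance, storage_gb, io_millions, ri_years=None):
    """Estimate a monthly bill in USD for one instance plus storage and I/O."""
    # Assumption: the RI discount applies to the instance charge only.
    compute = HOURLY_PRICE[instance] * HOURS_PER_MONTH * (1 - RI_DISCOUNT[ri_years])
    storage = storage_gb * STORAGE_PER_GB_MONTH
    io = io_millions * IO_PER_MILLION
    return round(compute + storage + io, 2)


print(monthly_cost("db.r3.large", storage_gb=100, io_millions=50))
print(monthly_cost("db.r3.large", storage_gb=100, io_millions=50, ri_years=1))
```

For a db.r3.large with 100 GB stored and 50 million I/Os, the on-demand estimate is about $231.70 per month, most of it the instance charge.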
Enterprise-grade, open-source pricing
Many features are unique to Amazon Aurora
Comparing to traditional commercial databases like Oracle
• Available only in most expensive database edition (Enterprise Edition)
• Failover and replica: Oracle Active Data Guard, extra $$$ per core
• Backup to S3: Oracle Secure Backup Cloud Module, extra $$$ per channel
• Encryption: Oracle Advanced Security, extra $$$ per core
All-inclusive pricing
AWS Database Migration Service
announced at re:Invent
Move data to the same or different database engine
Keep your apps running during the migration
Start your first migration in 10 minutes or less
Replicate within, to, or from Amazon EC2 or Amazon RDS
[Diagram: application users and customer premises connected over the Internet or a VPN to AWS, with AWS Database Migration Service]
• Start a replication instance
• Connect to source and target databases
• Select tables, schemas, or databases
• Let the AWS Database Migration Service create tables, load data, and keep them in sync
• Switch applications over to the target at your convenience
Keep your apps running during the migration
Migrate from Oracle and SQL Server
• Move your tables, views, stored procedures, and data manipulation language (DML) to MySQL, MariaDB, and Amazon Aurora
• Highlight where manual edits are needed
AWS Schema Conversion Tool
Robin Mathews
Expedia Worldwide Engineering (EWE)
October 2015
Using Amazon Aurora for Expedia Travel Data
One of the largest travel companies in the world
Lodging Inventory Services (LIS)
LIS: Rates, Change History, Restrictions, Bookings, Promotions, Availability
Change history sample use cases
• Finding the needle in the haystack
• Value add features
Challenge
Capacity, Cost, Performance
• 20,000 writes/second
• 300,000,000 rows/day
• 24 months of data
• 500 ms read response time
• Minimizing storage, development, and maintenance cost
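To put these requirements in perspective, a little arithmetic relates the daily row volume to the write rate and the retention window. The sketch assumes one write per row and a 365-day year; the slides do not say whether 20,000 writes/second is a peak or a sustained rate.

```python
ROWS_PER_DAY = 300_000_000          # from the slide
TARGET_WRITES_PER_SEC = 20_000      # from the slide
RETENTION_MONTHS = 24               # from the slide
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365                 # assumption: ignores leap years

# Average ingest rate if rows arrived evenly through the day.
avg_rows_per_sec = ROWS_PER_DAY / SECONDS_PER_DAY

# Rows kept online for the full retention window.
retained_rows = ROWS_PER_DAY * DAYS_PER_YEAR * RETENTION_MONTHS // 12

print(f"average ingest: {avg_rows_per_sec:,.0f} rows/sec")
print(f"target rate vs. average: {TARGET_WRITES_PER_SEC / avg_rows_per_sec:.1f}x")
print(f"rows retained over {RETENTION_MONTHS} months: {retained_rows:,}")
```

The daily volume averages out to about 3,472 rows per second, so the 20,000 writes/second target leaves several times that in headroom, and 24 months of retention means on the order of 219 billion rows.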
Existing solution
MS SQL
• Tier 1 SAN for fast I/O
• Scale out using horizontal data partitioning
• Cross-database queries by master view database
Challenges
• Capacity increase not elastic or automated
• Cost increasing due to licensing, storage, hardware, and maintenance
NoSQL Solution
Cassandra + Solr
• Set up SQL-like schema and tables
• Solr indexes to support queries beyond key-value lookup
Challenge
• Solr indexes require large memory footprint and hundreds of nodes, adding cost
Beginning
• Spring Boot application
• JPA
• No provisioned write IOPS
• No table partitioning
• Single insert
• Primary key plus secondary indexes
• 400 inserts/second
Amazon RDS MySQL
Tuning
• Used JDBC
• Removed unnecessary secondary indexes
• Changed to insert in batches
• Provisioned write IOPS to max of 20,000
• Optimized size of JDBC connection pool
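The move from single inserts to batched inserts is the kind of change sketched below. This is a stand-in using Python's built-in `sqlite3` module with an in-memory database, not the JDBC code used against Amazon RDS; the table name, column names, and batch size are hypothetical.

```python
import sqlite3

INSERT_SQL = "INSERT INTO change_history (item_id, detail) VALUES (?, ?)"


def insert_single(conn, rows):
    """Baseline: one statement and one commit per row."""
    for row in rows:
        conn.execute(INSERT_SQL, row)
        conn.commit()


def insert_batched(conn, rows, batch_size=1000):
    """Batched: many rows per statement call and one commit per batch."""
    for start in range(0, len(rows), batch_size):
        conn.executemany(INSERT_SQL, rows[start:start + batch_size])
        conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE change_history (item_id INTEGER, detail TEXT)")

rows = [(i, f"change-{i}") for i in range(5000)]
insert_batched(conn, rows)
print(conn.execute("SELECT COUNT(*) FROM change_history").fetchone()[0])
```

Batching reduces the number of round trips and commits per row, which is typically where most of the per-insert overhead sits on a networked database.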
Results and challenges
• Write performance bottlenecked at 5,000 inserts/sec, after 300 million table rows
• Capacity limitation of 6 TB
Amazon Aurora
Tuning
• Partitioned database table
• Ordered composite primary key based on query
• Co-locate web application and database in same region to reduce latency
• Batch write with batchRewrite flag
Initial performance
• 25,000 average inserts/sec with peak up to 70,000 inserts/sec
• 30 ms average response time for write and 17 ms for read, with 1 month of data
Amazon Aurora
Summary
• Promising performance with initial test results
• Provisioned capacity up to 64 TB meeting storage needs
Next steps
• Migrating change history data from SQL to Aurora
• Monitoring performance for write and read
http://bit.ly/awsevalsDAT312
Remember to complete your evaluations!
Thank you!
https://aws.amazon.com/rds/aurora