SRV407 Deep Dive on Amazon Aurora


Page 1: SRV407 Deep Dive on Amazon Aurora

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Anurag Gupta, Vice President Amazon Athena, Amazon CloudSearch, AWS Data Pipeline, Amazon Elasticsearch

Service, Amazon EMR, AWS Glue, Amazon Redshift, Amazon Aurora, Amazon RDS for MariaDB, RDS for MySQL, RDS for PostgreSQL

[email protected]

April 19, 2017

Deep Dive on Amazon Aurora

Page 2: SRV407 Deep Dive on Amazon Aurora

Agenda

Catching up – 18 months since GA: What is Aurora? Who is using it? Recent announcements

Aurora performance deep-dive: designing for performance; Aurora performance enhancements

Aurora availability architecture: designing for availability; recent availability enhancements

Page 3: SRV407 Deep Dive on Amazon Aurora

Amazon Aurora – databases reimagined for the cloud

Speed and availability of high-end commercial databases

Simplicity and cost-effectiveness of open source databases

Drop-in compatibility with MySQL and PostgreSQL

Simple pay-as-you-go pricing

Delivered as a managed service

Page 4: SRV407 Deep Dive on Amazon Aurora

Re-imagining the relational database

1. Fully managed service – automate administrative tasks

2. Scale-out, distributed, multi-tenant design

3. Service-oriented architecture leveraging AWS services

Page 5: SRV407 Deep Dive on Amazon Aurora

Scale-out, distributed, multi-tenant architecture

Purpose-built log-structured distributed storage system designed for databases

Storage volume is striped across hundreds of storage nodes distributed over 3 different availability zones

Six copies of data, two copies in each availability zone to protect against AZ+1 failures

Plan to apply same principles to other layers of the stack

(Diagram: master and three replicas – each with SQL, transactions, and caching layers – across Availability Zones 1-3, sharing a storage volume striped over storage nodes with SSDs.)

Page 6: SRV407 Deep Dive on Amazon Aurora

Leveraging cloud eco-system

AWS Lambda

Amazon S3

AWS IAM

Amazon CloudWatch

Invoke AWS Lambda events from stored procedures/triggers.

Load data from Amazon S3, store snapshots and backups in S3.

Use AWS Identity & Access Management (IAM) roles to manage database access control.

Upload system metrics and audit logs to Amazon CloudWatch.
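To make two of these integrations concrete, here is a minimal client-side sketch, assuming a hypothetical cluster endpoint, an orders table, and IAM roles already attached to the cluster for S3 and Lambda access; LOAD DATA FROM S3 and mysql.lambda_async are the Aurora MySQL extensions referenced above.

```python
import pymysql

# Hypothetical cluster endpoint, credentials, and schema.
conn = pymysql.connect(host="my-cluster.cluster-xyz.us-east-1.rds.amazonaws.com",
                       user="admin", password="***", database="mydb",
                       autocommit=True)
with conn.cursor() as cur:
    # Load a CSV straight from S3 (requires an S3 access role on the cluster).
    cur.execute("""
        LOAD DATA FROM S3 's3://my-bucket/orders.csv'
        INTO TABLE orders
        FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'
    """)
    # Invoke a Lambda function asynchronously from SQL; the same call can be
    # embedded in stored procedures and triggers.
    cur.execute("""
        CALL mysql.lambda_async(
            'arn:aws:lambda:us-east-1:123456789012:function:notify_on_order',
            '{"order_id": 42}')
    """)
```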

Page 7: SRV407 Deep Dive on Amazon Aurora

Automate administrative tasks

You: schema design, query construction, query optimization

AWS: automatic fail-over, backup & recovery, isolation & security, industry compliance, push-button scaling, automated patching, advanced monitoring, routine maintenance

Aurora takes care of your time-consuming database management tasks, freeing you to focus on your applications and business.

Page 8: SRV407 Deep Dive on Amazon Aurora

Aurora customer adoption

Aurora is used by 2/3 of the top 100 AWS customers and 8 of the top 10 gaming customers.

Fastest-growing service in AWS history.

Page 9: SRV407 Deep Dive on Amazon Aurora

Who is moving to Aurora and why?

Customers using open source databases:
• Higher performance – up to 5x
• Better availability and durability
• Reduced cost – up to 60%
• Easy migration; no application change

Customers using commercial databases:
• One tenth of the cost; no licenses
• Integration with the cloud eco-system
• Comparable performance and availability
• Migration tooling and services

Page 10: SRV407 Deep Dive on Amazon Aurora

Performance enhancements: Fast DDL, fast index build, spatial indexing, hot row contention

Availability features: Zero-downtime patching, database cloning (Q2), database backtrack (Q2)

Eco-system integration: Load from S3, IAM integration (Q2), select into S3 (Q2), log upload to CloudWatch Logs & S3 (Q2)

Cost reduction: t2.small – cuts cost of entry by half – you can run Aurora for $1 / day

Growing footprint: launched in LHR, YUL, CMH, and SFO – now available in all 3AZ regions

Recent announcements

Page 11: SRV407 Deep Dive on Amazon Aurora

MySQL compatibility

Business Intelligence Data Integration Query and Monitoring

“We ran our compatibility test suites against Amazon Aurora and everything just worked." - Dan Jewett, VP, Product Management at Tableau

MySQL 5.6 / InnoDB compatible

No application compatibility issues reported in last 18 months

MySQL ISV applications run pretty much as is

Working on 5.7 compatibility – progressing slower than expected; we hope to make it available within Q2.

Back-ported 81 fixes from different MySQL releases.

Page 12: SRV407 Deep Dive on Amazon Aurora

Aurora support for PostgreSQL

Amazon Aurora PostgreSQL compatibility is now in open preview.

• Same underlying scale-out, 3-AZ, 6-copy, fault-tolerant, self-healing, expanding database-optimized storage tier
• Integrated with a PostgreSQL 9.6-compatible database engine

(Diagram: SQL, transactions, and caching layers atop the logging + storage tier, backed by Amazon S3.)

Page 13: SRV407 Deep Dive on Amazon Aurora

Designing for performance

Page 14: SRV407 Deep Dive on Amazon Aurora

Aurora performance tenets

DO LESS WORK
• Do fewer IOs
• Minimize network packets
• Cache prior results
• Offload the database engine

BE MORE EFFICIENT
• Process asynchronously
• Reduce latency path
• Use lock-free data structures
• Batch operations together

DATABASES ARE ALL ABOUT I/O

NETWORK-ATTACHED STORAGE IS ALL ABOUT PACKETS/SECOND

HIGH-THROUGHPUT PROCESSING IS ALL ABOUT CONTEXT SWITCHES

Page 15: SRV407 Deep Dive on Amazon Aurora

IO traffic in MySQL

TYPE OF WRITE: binlog, data, double-write, log, FRM files

MYSQL WITH REPLICA

(Diagram: primary instance in AZ 1 and replica instance in AZ 2, each writing to Amazon Elastic Block Store (EBS) and its EBS mirror; backups to Amazon S3.)

IO FLOW
1. Issue write to Amazon EBS – EBS issues to mirror, ack when both done
2. Stage write to standby instance using synchronous replication
3. Issue write to EBS on standby instance

OBSERVATIONS
• Steps 1, 3, 4 are sequential and synchronous
• This amplifies both latency and jitter
• Many types of writes for each user operation
• Have to write data blocks twice to avoid torn writes

PERFORMANCE
• 780K transactions
• 7,388K I/Os per million txns (excludes mirroring, standby)
• Average 7.4 I/Os per transaction

30 minute SysBench write-only workload, 100GB dataset, RDS Multi-AZ, 30K PIOPS
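These amplification ratios – and the comparison against Aurora on the next slide – are easy to sanity-check from the reported figures:

```python
# Figures exactly as reported on this slide and the next
# (SysBench write-only, 30 minutes).
mysql_txns, mysql_ios_per_million = 780_000, 7_388_000
aurora_txns, aurora_ios_per_million = 27_378_000, 950_000

print(mysql_ios_per_million / 1_000_000)   # ~7.4 I/Os per transaction
print(aurora_txns / mysql_txns)            # ~35x more transactions
print(mysql_ios_per_million / aurora_ios_per_million)  # ~7.8x fewer I/Os
                                           # (the slide rounds to 7.7x)
```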

Page 16: SRV407 Deep Dive on Amazon Aurora

IO traffic in Aurora

AMAZON AURORA IO FLOW
1. Boxcar redo log records – fully ordered by LSN
2. Shuffle to appropriate segments – partially ordered
3. Boxcar to storage nodes and issue writes

OBSERVATIONS
• Only write redo log records; all steps asynchronous
• No data block writes (checkpoint, cache replacement)
• 6X more log writes, but 9X less network traffic
• Tolerant of network and storage outlier latency

PERFORMANCE
• 27,378K transactions – 35X MORE
• 950K I/Os per 1M txns (6X amplification) – 7.7X LESS

(Diagram: primary instance in AZ 1 with replica instances in AZ 2 and AZ 3, performing asynchronous 4/6 quorum distributed writes to shared storage; backups to Amazon S3. Of the write types – binlog, data, double-write, log, FRM files – Aurora issues only redo log records.)
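A toy sketch of that write path: a boxcar of redo records goes out to all six storage nodes, and the write completes as soon as any four acknowledge, so one or two slow or failed nodes never sit on the latency path. Node names and latencies are illustrative, not Aurora internals.

```python
import concurrent.futures as cf
import random, time

# Six copies: two per AZ across three AZs.
NODES = [f"az{az}-node{n}" for az in (1, 2, 3) for n in (1, 2)]

def send_log_records(node, batch):
    time.sleep(random.uniform(0.001, 0.050))  # variable network/storage latency
    return node

def quorum_write(batch, quorum=4):
    pool = cf.ThreadPoolExecutor(len(NODES))
    futures = [pool.submit(send_log_records, n, batch) for n in NODES]
    acked = []
    for f in cf.as_completed(futures):        # collect ACKs as they arrive
        acked.append(f.result())
        if len(acked) >= quorum:              # 4/6 reached: durable
            pool.shutdown(wait=False)         # don't wait for outlier nodes
            return acked

print(quorum_write(["lsn=10", "lsn=11"]))     # the four fastest nodes
```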

Page 17: SRV407 Deep Dive on Amazon Aurora

IO traffic in Aurora (Storage Node)

(Diagram: storage node receiving log records from the primary instance into an incoming queue; update queue, hot log, and data blocks inside the node; point-in-time snapshots and backups to S3; sort/group, coalesce, GC, and scrub stages; peer-to-peer gossip with peer storage nodes.)

IO FLOW
1. Receive record and add to in-memory queue
2. Persist record and ACK
3. Organize records and identify gaps in log
4. Gossip with peers to fill in holes
5. Coalesce log records into new data block versions
6. Periodically stage log and new block versions to S3
7. Periodically garbage collect old versions
8. Periodically validate CRC codes on blocks

OBSERVATIONS
• All steps are asynchronous
• Only steps 1 and 2 are in the foreground latency path
• Input queue is 46X less than MySQL (unamplified, per node)
• Favor latency-sensitive operations
• Use disk space to buffer against spikes in activity
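A condensed sketch of that flow: only persisting the record and ACKing (steps 1-2) sit on the foreground path; everything else runs later from background queues. The data structures here are illustrative stand-ins.

```python
from collections import deque

incoming, update_queue, hot_log = deque(), deque(), []

def receive(record):                  # step 1: add to in-memory queue
    incoming.append(record)
    return persist_and_ack(incoming.popleft())

def persist_and_ack(record):          # step 2: durable in hot log, then ACK
    hot_log.append(record)
    update_queue.append(record)       # hand off; steps 3-8 happen off-path
    return "ACK"

def background_pass(pages):           # steps 3-5, vastly simplified
    for rec in sorted(update_queue, key=lambda r: r["lsn"]):  # sort/group
        pages.setdefault(rec["page"], []).append(rec)  # coalesce per page
    update_queue.clear()

pages = {}
receive({"lsn": 10, "page": 7})
receive({"lsn": 11, "page": 7})
background_pass(pages)
print(pages)                          # {7: [record@10, record@11]}
```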

Page 18: SRV407 Deep Dive on Amazon Aurora

IO traffic in Aurora Replicas

(Diagram: MySQL master with a 30% read / 70% write workload and a MySQL replica repeating the 70% write workload via single-threaded binlog apply, each on its own data volume – vs. an Aurora master with the same workload and an Aurora replica serving 100% new reads, applying page cache updates from redo, on shared Multi-AZ storage.)

MYSQL READ SCALING – logical: ship SQL statements to the replica
• Write workload similar on both instances
• Independent storage
• Can result in data drift between master and replica

AMAZON AURORA READ SCALING – physical: ship redo from master to replica
• Replica shares storage; no writes performed
• Cached pages have redo applied
• Advance the read view when all commits are seen
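A small sketch of the replica-side rule, with illustrative page and record structures: redo shipped from the master is applied only to pages the replica already caches; everything else is skipped, since shared storage already holds the new version.

```python
page_cache = {7: "v1", 9: "v1"}      # pages currently cached on the replica

def apply_redo(records):
    for rec in records:
        if rec["page"] in page_cache:          # cached: apply the redo
            page_cache[rec["page"]] = rec["after"]
        # not cached: no work – a later read fetches the current version
        # from the shared storage volume

apply_redo([{"page": 7, "after": "v2"}, {"page": 42, "after": "v2"}])
print(page_cache)    # {7: 'v2', 9: 'v1'} – page 42 was never touched
```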

Page 19: SRV407 Deep Dive on Amazon Aurora

“In MySQL, we saw replica lag spike to almost 12 minutes which is almost absurd from an application’s perspective. With Aurora, the maximum read replica lag across 4 replicas never exceeded 20 ms.”

Real-life data - read replica latency

Page 20: SRV407 Deep Dive on Amazon Aurora

Asynchronous group commits

(Diagram: transactions T1-Tn each performing reads, writes, and a commit; the head node's durable LSN advances through LSNs 10-50 while the COMMIT QUEUE holds pending commits in LSN order and GROUP COMMIT acknowledges them as the durable LSN passes their commit LSNs.)

TRADITIONAL APPROACH
• Maintain a buffer of log records to write out to disk
• Issue write when buffer full or time out waiting for writes
• First writer has latency penalty when write rate is low

AMAZON AURORA
• Request I/O with first write, fill buffer till write picked up
• Individual write durable when 4 of 6 storage nodes ACK
• Advance DB durable point up to earliest pending ACK
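A minimal sketch of that durable-point bookkeeping, using illustrative LSNs from the diagram: each write becomes individually durable at 4 of 6 ACKs, but the database durable point (and the commits at or below it) only advances to the earliest still-pending LSN.

```python
import heapq

acks = {}            # lsn -> number of storage-node ACKs received
pending = []         # min-heap of LSNs not yet individually durable
durable_lsn = 0

def write_issued(lsn):
    acks[lsn] = 0
    heapq.heappush(pending, lsn)

def on_ack(lsn):
    global durable_lsn
    acks[lsn] += 1
    # pop every leading LSN that has reached quorum; the durable point
    # stops at the earliest write still waiting for its 4th ACK
    while pending and acks[pending[0]] >= 4:
        durable_lsn = heapq.heappop(pending)

for lsn in (10, 12, 22):
    write_issued(lsn)
for _ in range(4):
    on_ack(12)          # LSN 12 is individually durable...
print(durable_lsn)      # ...but prints 0: LSN 10 is still pending
for _ in range(4):
    on_ack(10)
print(durable_lsn)      # now 12 – commits with commit-LSN <= 12 can be ACKed
```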

Page 21: SRV407 Deep Dive on Amazon Aurora

Adaptive thread pool

MYSQL THREAD MODEL
• Standard MySQL – one thread per connection
• Doesn't scale with connection count
• MySQL EE – connections assigned to thread group
• Requires careful stall threshold tuning

AURORA THREAD MODEL
• Re-entrant connections multiplexed to active threads
• Kernel-space epoll() inserts into latch-free event queue
• Dynamically sized thread pool
• Gracefully handles 5,000+ concurrent client sessions on r3.8xl

(Diagram: client connections feeding a latch-free task queue via epoll().)
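A minimal, Linux-only sketch of the event-driven model (not Aurora's code): epoll() detects readable connections, readiness events go onto a shared queue, and a small fixed pool of workers drains it – so thread count is decoupled from connection count, unlike thread-per-connection MySQL. A plain thread-safe queue stands in for the latch-free queue.

```python
import queue, select, socket, threading

srv = socket.socket()
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("127.0.0.1", 3307)); srv.listen(); srv.setblocking(False)

ep = select.epoll()
ep.register(srv.fileno(), select.EPOLLIN)
conns, ready = {}, queue.Queue()          # fd -> socket; readiness queue

def worker():
    while True:
        fd = ready.get()                  # take a readable connection
        data = conns[fd].recv(4096)       # stand-in for reading a query
        if data:
            conns[fd].sendall(data)       # stand-in for executing/replying
            ep.modify(fd, select.EPOLLIN | select.EPOLLONESHOT)  # re-arm
        else:
            ep.unregister(fd); conns.pop(fd).close()

for _ in range(4):                        # small fixed pool, many sessions
    threading.Thread(target=worker, daemon=True).start()

while True:                               # event loop: multiplex, never block
    for fd, _ in ep.poll():
        if fd == srv.fileno():
            c, _ = srv.accept()
            conns[c.fileno()] = c
            ep.register(c.fileno(), select.EPOLLIN | select.EPOLLONESHOT)
        else:
            ready.put(fd)                 # hand off to the worker pool
```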

Page 22: SRV407 Deep Dive on Amazon Aurora

Aurora lock management

(Diagram: MySQL lock manager vs. Aurora lock manager handling concurrent scan, delete, and insert operations on lock chains.)

• Same locking semantics as MySQL
• Concurrent access to lock chains
• Multiple scanners allowed in an individual lock chain
• Lock-free deadlock detection

Needed to support many concurrent sessions and high update throughput.

Page 23: SRV407 Deep Dive on Amazon Aurora

MySQL SysBench results

R3.8XL: 32 cores / 244 GB RAM

5X faster than MySQL 5.6 & 5.7: five times higher throughput than stock MySQL, based on industry-standard benchmarks.

(Charts: SysBench write performance, 0-150,000 writes/sec, and read performance, 0-700,000 reads/sec – Aurora vs. MySQL 5.6 and MySQL 5.7.)

Page 24: SRV407 Deep Dive on Amazon Aurora

Scaling with instance sizes

Aurora scales with instance size for both reads and writes.

(Charts: write and read performance for Aurora, MySQL 5.6, and MySQL 5.7 across instance sizes.)

Page 25: SRV407 Deep Dive on Amazon Aurora

Real-life data – gaming workload: Aurora vs. RDS MySQL (r3.4xlarge, Multi-AZ)

Aurora 3X faster on r3.4xlarge

Page 26: SRV407 Deep Dive on Amazon Aurora

Recent performance enhancements

Page 27: SRV407 Deep Dive on Amazon Aurora

Faster index build – Aug 2016

MySQL 5.6 leverages Linux read-ahead – but this requires consecutive block addresses in the btree. It inserts entries top-down into the new btree, causing splits and lots of logging.

Aurora's scan pre-fetches blocks based on position in the tree, not block address. Aurora builds the leaf blocks first and then the branches of the tree.

• No splits during the build.

• Each page touched only once.

• One log record per page.

2-4X better than MySQL 5.6 or MySQL 5.7.

(Chart: index build time in hours for RDS MySQL 5.6, RDS MySQL 5.7, and Aurora 2016 – r3.large on a 10GB dataset, r3.8xlarge on a 10GB dataset, r3.8xlarge on a 100GB dataset.)
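A toy illustration of the bottom-up build: leaves are packed from the sorted keys first, then each branch level is built over the level below – so there are no page splits, and each page is written exactly once. Fanout and structures are illustrative.

```python
def build_bottom_up(sorted_keys, fanout=4):
    # Pack the leaf pages directly from the sorted key stream.
    level = [sorted_keys[i:i + fanout]
             for i in range(0, len(sorted_keys), fanout)]
    tree = [level]
    while len(level) > 1:                # then build branch levels, upward
        level = [level[i:i + fanout] for i in range(0, len(level), fanout)]
        tree.append(level)
    return tree                          # tree[-1][0] is the root

tree = build_bottom_up(list(range(32)))
print(len(tree), "levels,",
      sum(len(lvl) for lvl in tree), "pages, each touched once")
```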

Page 28: SRV407 Deep Dive on Amazon Aurora

Hot row contention – Nov 2016

(Diagram: MySQL lock manager vs. Aurora lock manager handling concurrent scan, delete, and insert operations on lock chains.)

• Highly contended workloads had high memory and CPU usage
• Lock compression (bitmap for hot locks)
• Replaced spinlocks with blocking futexes – up to 12x reduction in CPU, 3x improvement in throughput
• December – use dynamic programming to release locks: from O(totalLocks * waitLocks) to O(totalLocks)

Throughput on Percona TPC-C 100 improved 29x (from 1,452 txns/min to 42,181 txns/min)

Page 29: SRV407 Deep Dive on Amazon Aurora

Hot row contention

Percona TPC-C – 10GB
                   MySQL 5.6   MySQL 5.7   Aurora    Improvement
500 connections    6,093       25,289      73,955    2.92x
5000 connections   1,671       2,592       42,181    16.3x

Percona TPC-C – 100GB
                   MySQL 5.6   MySQL 5.7   Aurora    Improvement
500 connections    3,231       11,868      70,663    5.95x
5000 connections   5,575       13,005      30,221    2.32x

* Numbers are in tpmC, measured using release 1.10 on an r3.8xlarge; MySQL numbers using RDS and EBS with 30K PIOPS

Page 30: SRV407 Deep Dive on Amazon Aurora

Insert performance – Dec 2015

Accelerates batch inserts sorted by primary key – works by caching the cursor position in an index traversal.

• Dynamically turns itself on or off based on the data pattern
• Avoids contention in acquiring latches while navigating down the tree
• Bi-directional; works across all insert statements: LOAD INFILE, INSERT INTO … SELECT, REPLACE, and multi-value inserts

MySQL: traverses the B-tree starting from the root for every insert.

Aurora: inserts avoid the index traversal.

(Diagram: index root and leaf records R0-R8 – MySQL descends from the root for each insert; Aurora resumes from the cached leaf position.)
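A toy sketch of the cursor-caching idea (not Aurora's actual code): for inserts arriving in primary-key order, remember the leaf used by the previous insert and append directly when the key still falls in that leaf's range, skipping the root-to-leaf walk and its latches.

```python
class InsertCursorCache:
    def __init__(self, leaves):
        self.leaves = leaves      # list of sorted key-lists (the "leaves")
        self.cached = None        # index of the last leaf we inserted into

    def insert(self, key):
        i = self.cached
        if i is not None and self._fits(i, key):
            self.leaves[i].append(key)          # fast path: cached cursor
        else:
            i = self._traverse_from_root(key)   # slow path: full descent
            self.leaves[i].append(key)
            self.cached = i                     # re-arm the cache

    def _fits(self, i, key):      # does key belong at the end of leaf i?
        upper = self.leaves[i + 1][0] if i + 1 < len(self.leaves) else None
        return self.leaves[i][-1] <= key and (upper is None or key < upper)

    def _traverse_from_root(self, key):         # stand-in for a B-tree walk
        for i in range(len(self.leaves)):
            nxt = self.leaves[i + 1][0] if i + 1 < len(self.leaves) else None
            if nxt is None or key < nxt:
                return i

cache = InsertCursorCache([[1, 2], [10, 11]])
for k in (12, 13, 14):            # PK-sorted batch: one traversal, then appends
    cache.insert(k)
print(cache.leaves)               # [[1, 2], [10, 11, 12, 13, 14]]
```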

Page 31: SRV407 Deep Dive on Amazon Aurora

Spatial indexes in Aurora – Dec 2016

Challenges with R-Trees (used in MySQL 5.7):
• Hard to keep efficient while balanced
• Rectangles should not overlap or cover empty space
• Degenerates over time; re-indexing is expensive

Aurora instead uses a Z-index (dimensionally ordered space-filling curve):
• Uses a regular B-tree for storing and indexing
• Removes sensitivity to the resolution parameter
• Adapts to the granularity of the actual data without user declaration
• E.g., GeoWave (National Geospatial-Intelligence Agency)

Spatial index benchmarks: Sysbench – points and polygons
* r3.8xlarge using Sysbench on <1GB dataset
* Write only: 4,000 clients; select only: 2,000 clients; ST_EQUALS
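A sketch of the Z-index idea: interleave the bits of the (quantized) x/y coordinates into a single Morton code, which an ordinary B-tree can index – nearby points in 2-D space tend to get nearby codes. The quantization and fanout here are illustrative.

```python
def morton(x, y, bits=16):
    # Interleave bits: x takes even positions, y takes odd positions.
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code

points = [(3, 5), (3, 6), (100, 7), (4, 5)]
# Sorting by Morton code stands in for storing the codes in a B-tree.
btree = sorted(points, key=lambda p: morton(*p))
print([(p, morton(*p)) for p in btree])
```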

Page 32: SRV407 Deep Dive on Amazon Aurora

Aurora read performance – Dec 2016

Cached read performance – 25% throughput gain:
• Catalog concurrency: improved data dictionary synchronization and cache eviction
• NUMA-aware scheduler: the Aurora scheduler is now NUMA-aware, which helps scale on multi-socket instances
• Read views: Aurora now uses a latch-free concurrent read-view algorithm to construct read views

Non-cached read performance – 10% throughput gain:
• Smart scheduler: the Aurora scheduler now dynamically assigns threads between IO-heavy and CPU-heavy workloads
• Smart selector: Aurora reduces read latency by selecting the copy of data on the storage node with the best performance
• Logical read-ahead (LRA): we avoid read IO waits by prefetching pages based on their order in the btree

Page 33: SRV407 Deep Dive on Amazon Aurora

High-performance auditing – Jan 2017

MariaDB server_audit plugin: DDL, DML, query, DCL, and connect events each produce an event string, and all sessions funnel through a single write-to-file step.

Aurora native audit support: each session creates its event string and pushes it onto a latch-free queue, which is drained by multiple write-to-file threads.

We can sustain over 500K events/sec.

Sysbench select-only workload on an 8xlarge instance:

            MySQL 5.7   Aurora
Audit Off   95K         615K      6.47x
Audit On    33K         525K      15.9x
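A sketch of that fan-out design: many session threads format event strings and push them onto a shared queue, and several writer threads drain it. Python's lock-based queue.Queue stands in here for Aurora's latch-free queue.

```python
import queue, threading

events = queue.Queue()

def session(n):                       # a session thread formatting events
    for i in range(3):
        events.put(f"session {n}: query {i}")   # create event string, enqueue

def writer(out):                      # one of several audit writers
    while True:
        out.append(events.get())      # drain the queue, "write to file"
        events.task_done()

log = []
for _ in range(2):
    threading.Thread(target=writer, args=(log,), daemon=True).start()
sessions = [threading.Thread(target=session, args=(n,)) for n in range(4)]
for t in sessions: t.start()
for t in sessions: t.join()
events.join()                         # wait until all events are written
print(len(log), "audit events written")   # 12
```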

Page 34: SRV407 Deep Dive on Amazon Aurora

Fast DDL: Aurora vs. MySQL – Mar 2017

MySQL DDL: full table copy; rebuilds all indexes – can take hours to complete. Needs temporary space for DML operations; table lock for DML changes. The DDL operation impacts DML throughput.

Aurora DDL: add an entry to a metadata table and use schema versioning to decode blocks; modify-on-write upgrades a block to the latest schema when it is modified. Supports adding a nullable column at the end; other add-column, drop/reorder, and data-type modifications to follow.

Example metadata table:

table name   operation   column-name   time-stamp
Table 1      add-col     column-abc    t1
Table 2      add-col     column-qpr    t2
Table 3      add-col     column-xyz    t3

(Diagram: index root and leaf blocks decoded against the versioned schema.)
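A sketch of versioned-schema decoding, with illustrative structures: ADD COLUMN only appends a row to the metadata table; each block remembers the schema version it was written under, and is upgraded to the latest schema the next time it is modified.

```python
schema_versions = {1: ["id", "name"]}          # version -> column list

def add_column(col):                           # the "fast DDL": metadata only
    v = max(schema_versions) + 1
    schema_versions[v] = schema_versions[v - 1] + [col]

def read_row(block):                           # decode using block's version
    cols = schema_versions[block["version"]]
    row = dict(zip(cols, block["row"]))
    for col in schema_versions[max(schema_versions)]:
        row.setdefault(col, None)              # missing new columns read NULL
    return row

def modify_row(block, values):                 # modify-on-write upgrade
    block["version"] = max(schema_versions)
    block["row"] = [values.get(c) for c in schema_versions[block["version"]]]

add_column("email")                            # instant, regardless of table size
old_block = {"version": 1, "row": [7, "ada"]}
print(read_row(old_block))   # {'id': 7, 'name': 'ada', 'email': None}
```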

Page 35: SRV407 Deep Dive on Amazon Aurora

Fast DDL performance

DDL performance on r3.large:
              Aurora      MySQL 5.6    MySQL 5.7
10GB table    0.27 sec    3,960 sec    1,600 sec
50GB table    0.25 sec    23,400 sec   5,040 sec
100GB table   0.26 sec    53,460 sec   9,720 sec

DDL performance on r3.8xlarge:
              Aurora      MySQL 5.6    MySQL 5.7
10GB table    0.06 sec    900 sec      1,080 sec
50GB table    0.08 sec    4,680 sec    5,040 sec
100GB table   0.15 sec    14,400 sec   9,720 sec

Page 36: SRV407 Deep Dive on Amazon Aurora

Fast DDL roadmap

Q1 2017:
• Add nullable column at the end of the table, without default values

Q2 2017:
• Other ADD COLUMN operations – nullable and not-null columns, with and without default values, at the end or in any position

2H 2017+:
• Mark columns nullable
• Reorder columns
• Increase the size of a column within the same type family, such as int to bigint
• Drop column
• Mark columns not-nullable
• …

Page 37: SRV407 Deep Dive on Amazon Aurora

Designing for availability

Page 38: SRV407 Deep Dive on Amazon Aurora

Six copies across three Availability Zones

4 out of 6 write quorum; 3 out of 6 read quorum

Peer-to-peer replication for repairs

Volume striped across hundreds of storage nodes

(Diagram: two failure scenarios across AZ 1, AZ 2, and AZ 3 – losing one AZ leaves four copies, preserving read and write availability; losing an AZ plus one more node leaves three copies, preserving read availability.)

6-way replicated storage survives catastrophic failures.
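A quick check of the fault-tolerance arithmetic: with two copies in each of three AZs, losing an entire AZ plus one more node ("AZ+1") still leaves three copies – enough for the 3/6 read quorum (and hence repair), though not the 4/6 write quorum.

```python
copies_per_az, azs = 2, 3
total = copies_per_az * azs                          # 6 copies

after_az_loss = total - copies_per_az                # 4 copies remain
after_az_plus_one = after_az_loss - 1                # 3 copies remain

print(after_az_loss >= 4)       # True – write quorum intact after AZ loss
print(after_az_plus_one >= 3)   # True – read quorum intact after AZ+1
```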

Page 39: SRV407 Deep Dive on Amazon Aurora

Storage Durability

Storage volume automatically grows up to 64 TB

Quorum system for read/write; latency tolerant

Peer to peer gossip replication to fill in holes

Continuous backup to S3 (built for 11 9s durability)

Continuous monitoring of nodes and disks for repair

10GB segments as unit of repair or hotspot rebalance

Quorum membership changes do not stall writes


Page 40: SRV407 Deep Dive on Amazon Aurora

Continuous backup and point-in-time recovery

(Diagram: per-segment snapshots and log records over time for Segment 1, Segment 2, and Segment 3, with a recovery point.)

• Take periodic snapshot of each segment in parallel; stream the redo logs to Amazon S3

• Backup happens continuously without performance or availability impact

• At restore, retrieve the appropriate segment snapshots and log streams to storage nodes

• Apply log streams to segment snapshots in parallel and asynchronously
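A sketch of the per-segment restore, with illustrative structures: pick the latest snapshot at or before the recovery point, then replay only the log records between the snapshot's LSN and the recovery point – independently and in parallel for each segment.

```python
def restore_segment(snapshots, log_stream, recovery_lsn):
    # Latest snapshot not newer than the recovery point.
    snap = max((s for s in snapshots if s["lsn"] <= recovery_lsn),
               key=lambda s: s["lsn"])
    state = dict(snap["data"])
    for rec in log_stream:                      # assumed sorted by LSN
        if snap["lsn"] < rec["lsn"] <= recovery_lsn:
            state[rec["page"]] = rec["value"]   # replay redo onto snapshot
    return state

snapshots = [{"lsn": 0, "data": {}}, {"lsn": 20, "data": {"p1": "a"}}]
log = [{"lsn": 25, "page": "p1", "value": "b"},
       {"lsn": 30, "page": "p2", "value": "c"}]
print(restore_segment(snapshots, log, recovery_lsn=27))   # {'p1': 'b'}
```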

Page 41: SRV407 Deep Dive on Amazon Aurora

Instant crash recovery

Traditional databases:
• Have to replay logs since the last checkpoint
• Typically 5 minutes between checkpoints
• Single-threaded in MySQL; requires a large number of disk accesses
• A crash at T0 requires re-application of the redo log since the last checkpoint

Amazon Aurora:
• Underlying storage replays redo records on demand as part of a disk read
• Parallel, distributed, asynchronous
• No replay at startup
• A crash at T0 results in redo logs being applied to each segment on demand, in parallel, asynchronously
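A sketch of recovery-free startup, with illustrative structures: a read returns the newest materialized block version and applies any newer redo records for that block inline, so there is no separate replay phase after a crash.

```python
def read_block(block_versions, redo, block_id, read_lsn):
    lsn, data = block_versions[block_id]        # latest coalesced version
    for rec in redo:                            # apply pending redo on demand
        if rec["block"] == block_id and lsn < rec["lsn"] <= read_lsn:
            data, lsn = rec["after"], rec["lsn"]
    return data

versions = {"b1": (10, "v@10")}
redo = [{"block": "b1", "lsn": 15, "after": "v@15"}]
print(read_block(versions, redo, "b1", read_lsn=20))  # 'v@15' – no restart replay
```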

Page 42: SRV407 Deep Dive on Amazon Aurora

Survivable caches

We moved the cache out of the database process. The cache remains warm in the event of a database restart, letting you resume fully loaded operations much faster. Instant crash recovery + survivable cache = quick and easy recovery from DB failures.

(Diagram: SQL, transactions, and caching layers – the caching process is outside the DB process and remains warm across a database restart.)

Page 43: SRV407 Deep Dive on Amazon Aurora

Up to 15 promotable read replicas

(Diagram: master and read replicas on a shared distributed storage volume, with a writer end-point and a reader end-point.)

► Up to 15 promotable read replicas across multiple availability zones

► Redo-log-based replication leads to low replica lag – typically < 10 ms

► Reader end-point with load balancing; customer-specifiable failover order
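A usage sketch, assuming a hypothetical cluster named "my-aurora": writes go to the cluster (writer) endpoint, reads go to the reader endpoint, which balances across the replicas.

```python
import pymysql

WRITER = "my-aurora.cluster-xyz.us-east-1.rds.amazonaws.com"
READER = "my-aurora.cluster-ro-xyz.us-east-1.rds.amazonaws.com"

def connect(host):
    return pymysql.connect(host=host, user="admin", password="***",
                           database="app", autocommit=True)

with connect(WRITER).cursor() as cur:      # DML goes to the master
    cur.execute("INSERT INTO events (msg) VALUES ('hello')")

with connect(READER).cursor() as cur:      # reads spread over the replicas
    cur.execute("SELECT COUNT(*) FROM events")
    print(cur.fetchone())
```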

Page 44: SRV407 Deep Dive on Amazon Aurora

Faster failover

MYSQL: app running → DB failure → failure detection → DNS propagation → recovery

AURORA WITH MARIADB DRIVER: app running → DB failure → failure detection → recovery

(Timeline markers from the slide: 5-6 sec, 3-15 sec, 5-6 sec.)
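For manual testing, a failover can also be triggered through the RDS API; this is a sketch with placeholder identifiers. A cluster-aware client such as the MariaDB driver mentioned above reconnects to the promoted replica without waiting on DNS propagation.

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")
rds.failover_db_cluster(
    DBClusterIdentifier="my-aurora",
    # Optionally pick which replica becomes the new writer.
    TargetDBInstanceIdentifier="my-aurora-replica-1",
)
```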

Page 45: SRV407 Deep Dive on Amazon Aurora

Database failover time

(Histograms: distribution of observed failover times.)

0-5s – 30% of fail-overs

5-10s – 40% of fail-overs

10-20s – 25% of fail-overs

20-30s – 5% of fail-overs

Page 46: SRV407 Deep Dive on Amazon Aurora

Recent and upcoming availability enhancements

Page 47: SRV407 Deep Dive on Amazon Aurora

Availability is about more than HW failures

You also incur availability disruptions when you

1. Patch your database software

2. Modify your database schema

3. Perform large scale database reorganizations

4. Recover from DBA errors that require database restores

Page 48: SRV407 Deep Dive on Amazon Aurora

Zero downtime patching – Dec 2016

Before ZDP: user sessions terminate during patching.

With ZDP: networking state and application state are held while the old DB engine is swapped for the new DB engine, and user sessions remain active through patching.

(Diagram: application state and networking state preserved alongside the storage service across the engine swap.)

Page 49: SRV407 Deep Dive on Amazon Aurora

Zero downtime patching – current constraints

We have to fall back to the current patching model when we can't park connections:

• Long-running queries
• Open transactions
• Parameter changes pending
• Temporary tables open
• Locked tables
• Bin-log enabled
• SSL connections open
• Read replica instances

We are working on addressing the above.

Page 50: SRV407 Deep Dive on Amazon Aurora

Database cloning – planned for May 2017

Create a copy of a database without duplicate storage costs:
• Creation of a clone is nearly instantaneous – we don't copy data
• Data copy happens only on write – when original and cloned volume data differ

Typical use cases:
• Clone a production DB to run tests
• Reorganize a database
• Save a point-in-time snapshot for analysis without impacting the production system

(Diagram: a production database serving production applications, with clones serving dev/test applications and benchmarks.)
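A sketch of how cloning surfaces through the RDS API, with placeholder identifiers; the call shape is an assumption based on the point-in-time-restore API, where a clone is a restore with a copy-on-write type.

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")
rds.restore_db_cluster_to_point_in_time(
    DBClusterIdentifier="my-aurora-clone",        # the new clone
    SourceDBClusterIdentifier="my-aurora",        # the production cluster
    RestoreType="copy-on-write",                  # share pages, copy on write
    UseLatestRestorableTime=True,
)
```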

Page 51: SRV407 Deep Dive on Amazon Aurora

How does it work?

Both databases reference the same pages on the shared distributed storage system.

(Diagram: source database and cloned database each pointing at the same physical pages, Page 1 through Page 4, in the shared distributed storage system.)

Page 52: SRV407 Deep Dive on Amazon Aurora

How does it work? (contd.)

As the databases diverge, new pages are added appropriately to each database while still referencing the pages common to both.

(Diagram: source and cloned databases still sharing their unchanged pages while each adds its own new page versions, Page 5 and Page 6, in the shared distributed storage system.)

Page 53: SRV407 Deep Dive on Amazon Aurora

What is database backtrack? Planned for June 2017

Backtrack quickly brings the database to a point in time without requiring a restore from backups.

• Backtrack from an unintentional DML or DDL operation
• Backtrack is not destructive: you can backtrack multiple times to find the right point in time

(Diagram: timeline t0-t4 – rewinding to t1, then to t3, marks the intervening branches of the LSN timeline invisible.)
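A sketch of how this surfaces through the RDS API, with placeholder identifiers; the call shape is an assumption (the feature is announced here as planned), modeled on a per-cluster backtrack operation with a target timestamp.

```python
import datetime
import boto3

rds = boto3.client("rds", region_name="us-east-1")
rds.backtrack_db_cluster(
    DBClusterIdentifier="my-aurora",
    BacktrackTo=datetime.datetime(2017, 6, 1, 12, 0, 0),  # rewind target
    # Non-destructive: a later call with a different timestamp moves again.
)
```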

Page 54: SRV407 Deep Dive on Amazon Aurora

How does it work?

(Diagram: per-segment snapshots and log records over time for Segment 1, Segment 2, and Segment 3, with a rewind point.)

Take periodic snapshots within each segment in parallel and store them locally

At backtrack time, each segment picks the previous local snapshot and applies the log streams to the snapshot to produce the desired state of the DB


Page 55: SRV407 Deep Dive on Amazon Aurora

How does it work?

Logs within the log stream are made visible or invisible based on the branch within the LSN tree, providing a consistent view of the DB.

• The first backtrack, performed at t2 to rewind the DB to t1, makes the logs written between t1 and t2 invisible
• The second backtrack, performed at t4 to rewind the DB to t3, makes the logs written between t3 and t4 invisible as well

(Diagram: LSN timeline t0-t4 with the rewound branches marked invisible.)

Page 56: SRV407 Deep Dive on Amazon Aurora

Removing Blockers

Page 57: SRV407 Deep Dive on Amazon Aurora

My databases need to meet certifications

Amazon Aurora gives each database instance IP firewall protection

Aurora offers transparent encryption at rest and SSL protection for data in transit

Amazon VPC lets you isolate and control network configuration and connect securely to your IT infrastructure

AWS Identity and Access Management provides resource-level permission controls


Page 58: SRV407 Deep Dive on Amazon Aurora

Run Aurora for $1/day? We just introduced t2.small. *NEW*

Instance        vCPU   Mem (GiB)   Hourly Price
db.t2.small     1      2           $0.041
db.t2.medium    2      4           $0.082
db.r3.large     2      15.25       $0.29
db.r3.xlarge    4      30.5        $0.58
db.r3.2xlarge   8      61          $1.16
db.r3.4xlarge   16     122         $2.32
db.r3.8xlarge   32     244         $4.64

t2 RI discounts: up to 34% with a 1-year RI, up to 57% with a 3-year RI

* Prices are for Virginia

Page 59: SRV407 Deep Dive on Amazon Aurora

Amazon Aurora migration options

Source database / from where → recommended option:

• MySQL on RDS → console-based automated snapshot ingestion and catch-up via binlog replication
• MySQL on EC2 or on-premises → binary snapshot ingestion through S3 and catch-up via binlog replication
• Other engines on EC2, on-premises, or RDS → schema conversion using AWS SCT and data migration via AWS DMS

Page 60: SRV407 Deep Dive on Amazon Aurora

Amazon Aurora roadmap

Page 61: SRV407 Deep Dive on Amazon Aurora

Near-term Aurora roadmap – 3-6 month plan: accelerating migration of enterprise workloads

Category Q4/2016 Q1/2017 Q2 and Q3/2017

Performance • Spatial indexing

• R4 instance support • Parallel data load from S3 • Out-of-cache read improvements • Asynchronous key-based pre-fetch

Availability • Reader cluster end point and load balancing • Zero downtime patching

• Cross-region snapshot copy • Cross region replication for encrypted databases • Fast DDL v1

• Copy-on-write database cloning • Backtrack • Fast DDL v2 • Zero downtime restart

Security • HIPAA/BAA Compliance • Enhanced auditing • Cross account and cross region encrypted snapshot sharing

• IAM Integration and Active Directory integration • Encrypted RDS MySQL-Aurora replication

Ecosystem Integration • Lambda Integration with Aurora • Load tables from S3 • Load from S3 (manifest option)

• Load compressed files from S3 • Log upload to S3 and CloudWatch • Select into S3 • Integration with AWS Performance Insights

Cost-effective • T2.medium instance type • T2.small instance type

MySQL-compatible • RDS MySQL to Aurora replication (in region) • MySQL 5.7 compatibility

Regions • US West (N. California) • EU (Frankfurt)

Page 62: SRV407 Deep Dive on Amazon Aurora

Thank you! Questions?