8/12/2019 AWS DB Best Practices
1/61
2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of A
DAT203 - AWS Storage and Database Architecture Best Practices
Siva Raghupathy, Amazon Web Services
The Third Platform
Built on:
Mobile devices
Cloud services
Social technologies
Big data
Billions of users
Millions of apps
Data Volume, Velocity, Variety
2.7 zettabytes (ZB) of data exists in the digital universe today
1 ZB = 1 billion terabytes
450 billion transactions per day by 2020
More unstructured data than structured data
Common Questions from Database Devel
Cloud Migration
How do I move (my data) to the cloud?
Data/Storage Technologies
What data store should I use?
SQL or NoSQL?
Hadoop or DW? What about search?
Management Concerns
Is my data (in the clou
Relational features w/nightmares?
My data volume, velocare exploding!
How can I reduce cos
Performance and Delive
Need low latency (ms
Need high throughput
Need to ship in days
Cloud Data Tier Anti-Pattern
Data Tier
Cloud Data Tier Architecture Use the Right Tool
App/Web Tier
Client Tier
Data Tier
Search
Ha
Cache
Blob Store
SQL
NoSQL
Data Warehouse
Compute
Storage
AWS Global Infrastructure
Database
App Services
Deployment & Administration
Networking
AWS
AWS Managed Database & Storage Services
Structured, Complex Query
SQL: Amazon RDS (MySQL, Oracle, SQL Server)
Data Warehouse: Amazon Redshift
Search: Amazon CloudSearch
Unstructured, Custom Query
Hadoop: Amazon Elastic MapReduce (EMR)
Structured, Simple Query
NoSQL: Amazon DynamoDB
Cache: Amazon ElastiCache (Memcached, Redis)
Unstructured, No Query
Cloud Storage: Amazon S3, Amazon Glacier
AWS Primitive Compute and Storage
Compute Capabilities
Many different EC2 instance types
General purpose
Compute optimized
Storage optimized
Memory optimized
Host any major data storage technology
RDBMS
NoSQL
Cache
Raw Storage Options
EC2 Instance store (e
Amazon Elastic Block Store
Standard volume: 1 TB, ~100 IOPS pe
Provisioned IOPS volume: 1 TB, up to 4000 IOPS
Stripe multiple volumes for higher IOPS or storage
Primitives add flexibility, but also come with operatio
AWS Data Tier Architecture - Use the right tool fo
D
Amazon RDS
Amazon CloudSearch
Amazon DynamoDB
Amazon ElastiCache
Amazon Elastic MapReduce
Amazon S3
Amazon Redshift
AWS Data Pipeline
Reference Architecture
Amazon RDS
Amazon CloudSearch
Amazon DynamoDB
Amazon ElastiCache
Amazon EMR
Amazon S3
AWS
AR
Use Case: A Video Streaming Application
Use Case: A Video Streaming App U
Amazon DynamoDB
Amazon RDS
Amazon CloudSearch
Amazon S3
A Video Streaming App Discove
Amazon ElastiCache
CloudFront
Amazon DynamoDB
Amazon RDS
Amazon CloudSearch
Amazon S3
Use Case: A Video Streaming App
Amazon S3
Amazon DynamoDB
Amazon EMR
Use Case: A Video Streaming App Anal
Amazon EMR
Amazon S3
AmRe
What is the temperature of your data?
Data Characteristics: Hot, Warm, Co
              Hot        Warm     Cold
Volume        MB-GB      GB-TB    PB
Item size     B-KB       KB-MB    KB-T
Latency       ms         ms, sec  min,
Durability    Low-High   High     Very
Request rate  Very High  High     Low
Cost/GB       $$-$       $-
Low
What data store should I use?
[Figure: Amazon ElastiCache, Amazon DynamoDB, Amazon RDS, Amazon Redshift, Amazon EMR, and Amazon S3 positioned along axes for request rate, cost/GB, latency, data volume, and structure, each running from low to high.]
What data store should I use?
Store              Average latency  Data volume         Item size       Request rate  Durability
ElastiCache        ms               GB                  B-KB            Very High     Low-Moderate
Amazon DynamoDB    ms               GB-TBs (no limit)   KB (64 KB max)  Very High     Very High
Amazon RDS         ms, sec          GB-TB (3 TB max)    KB (~row size)  High          High
CloudSearch        ms, sec          GB-TB               KB (1 MB max)   High          High
Amazon Redshift    sec, min         TB-PB (1.6 PB max)  KB (64 K max)   Low           High
Amazon EMR (Hive)  sec, min, hrs    GB-PB (~nodes)      KB-MB           Low           High
A                  m(               G(                  K(              LV(           V
Storage cost $/GB/month: $$ $
Hot Data Warm Data
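One way to make the comparison above usable in code is to hold it as a small lookup table and filter on the attributes a workload needs. The sketch below is illustrative only: it transcribes three attributes for the six stores whose columns are fully legible, and the helper name `candidates` is hypothetical rather than part of any AWS SDK.

```python
# Values transcribed from the comparison table; the truncated final
# column of the slide is left out because its cells are not legible.
STORES = {
    "ElastiCache":       {"latency": "ms",            "request_rate": "Very High", "durability": "Low-Moderate"},
    "Amazon DynamoDB":   {"latency": "ms",            "request_rate": "Very High", "durability": "Very High"},
    "Amazon RDS":        {"latency": "ms, sec",       "request_rate": "High",      "durability": "High"},
    "CloudSearch":       {"latency": "ms, sec",       "request_rate": "High",      "durability": "High"},
    "Amazon Redshift":   {"latency": "sec, min",      "request_rate": "Low",       "durability": "High"},
    "Amazon EMR (Hive)": {"latency": "sec, min, hrs", "request_rate": "Low",       "durability": "High"},
}


def candidates(**required):
    """Return the stores whose table cells equal every required value."""
    return [
        name
        for name, row in STORES.items()
        if all(row.get(attr) == value for attr, value in required.items())
    ]


# Millisecond latency combined with very high durability.
print(candidates(latency="ms", durability="Very High"))
# Stores the table rates for a low request rate.
print(candidates(request_rate="Low"))
```

Exact string matching keeps the sketch faithful to the slide; a real selection would also weigh data volume, item size, and cost.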
AWS Data Tier Architecture - Use the right tool f
Da
Amazon RDS
Amazon CloudSearch
Amazon DynamoDB
Amazon ElastiCache
Amazon Elastic MapReduce
Amazon S3
Amazon Redshift
AWS Data Pipeline
Cost Conscious Design
Cost Conscious Design
Example: Should I use Amazon S3 or Amazon Dy
I'm currently scoping out a project that will greatly [...] my team's use of Amazon S3. Hoping you could answer some questions. The current iteration of the design [...] many small files, perhaps up to a billion during pea [...] total size would be on the order of 1.5 TB per month
Request rate (Writes/sec)  Object size (Bytes)  Total size (GB/month)  Objects per month
300                        2048                 1483                   777,600,000
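The figures in this table can be reproduced from the request rate and object size alone. The sketch below assumes a 30-day month and binary gigabytes (2^30 bytes); neither is stated on the slide, but together they reproduce the 1,483 GB total. The function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30        # assumption: the slide's numbers match a 30-day month
BYTES_PER_GB = 2 ** 30     # assumption: binary gigabytes reproduce the 1,483 GB total


def objects_per_month(writes_per_sec):
    """Objects written in one month at a constant write rate."""
    return writes_per_sec * SECONDS_PER_DAY * DAYS_PER_MONTH


def total_gb_per_month(writes_per_sec, object_bytes):
    """Data written in one month, in GB, at a constant write rate."""
    return objects_per_month(writes_per_sec) * object_bytes / BYTES_PER_GB


print(objects_per_month(300))               # 777600000
print(int(total_gb_per_month(300, 2048)))   # 1483
```

The same two functions accept any other write rate or object size, so alternative scenarios can be sized before they are priced.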
Cost Conscious Design
Example: Should I use Amazon S3 or Amazon Dyn
Request rate Object size Total sizeA S3
http://calculator.s3.amazonaws.com/calc5.html#r=IAD&key=calc-736174F7-ECD3-4636-BB5A-0AF2DF8F4D4E
Request rate (Writes/sec)  Object size (Bytes)  Total size (GB/month)
300                        2,048                1,483
Amazon S3 or Amazon DynamoDB?
http://calculator.s3.amazonaws.com/calc5.html#r=IAD&key=calc-736174F7-ECD3-4636-BB5A-0AF2DF8F4D4E
            Request rate (Writes/sec)  Object size (Bytes)  Total size (GB/month)  Objects per month
Scenario 1  300                        2,048                1,483                  777,600,000
Scenario 2  300                        32,768               23,730                 777,600,000
Amazon S3
Amazon DynamoDB
use
use
http://calculator.s3.amazonaws.com/calc5.html#r=IAD&key=calc-736174F7-ECD3-4636-BB5A-0AF2DF8F4D4E
http://calculator.s3.amazonaws.com/calc5.html#r=IAD&key=calc-24CBA60C-49D4-4D42-84B6-B33E2C980C94
Best Practices
Amazon RDS
When to use
Transactions
Complex queries
Medium to high query/write rate
Up to 30 K IOPS (15 K reads + 15 K writes)
100s of GB to low TBs
Workload can fit in a single node
High durability
When not to use
Massive read/write rates
Example: 150 K writes per second
Data size or throughp sharding
Example: 10 s or 10
Simple Get/Put and queries that NoSQL can handle
Complex analytics
Push-Button Scaling
Region
Multi-AZ
AZ 1 AZ 2
Amazon RDS
Amazon RDS Best Practices
Use the right DB instance class
Use EBS-optimized instances
db.m1.large, db.m1.xlarge, db.m2.2xlarge, db.m2.4xlarge, db.cr1.8xlarge
Use provisioned IOPS
Use multi-AZ for high availability
Use read replicas for
Scaling reads
Schema changes
Additional failure recovery
Amazon DynamoDB
When to use
Fast and predictable performance
Seamless/massive scale
Autosharding
Consistent/low latency
No size or throughput limits
Very high durability
Key-value or simple queries
When not to use
Need multi-item/row otransactions
Need complex queries
Need real-time analytihistoric data
Storing cold data
Amazon DynamoDB
Amazon DynamoDB Best Practices
Keep item size small
Store metadata in Amazon DynamoDB and large blobs in Amazon S3
Use a table with a hash key for extremely high scale
Use table per day, week, month etc. for storing time series data
Use conditional/OCC updates
Use hash-range key to model 1:N relationships
Multi-tenancy
Avoid hot keys and hot partitions
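The "conditional/OCC updates" practice can be illustrated with an in-memory stand-in. This is a sketch of optimistic concurrency control in general, not the DynamoDB API: `VersionedTable`, `put_if_version`, and `increment` are hypothetical names, and a real table would enforce the condition on the server side.

```python
class ConditionalCheckFailed(Exception):
    """Raised when the stored version no longer matches the expected one."""


class VersionedTable:
    """In-memory stand-in for a table that supports conditional writes."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        # Missing items read as version 0 with no value.
        return dict(self._items.get(key, {"version": 0, "value": None}))

    def put_if_version(self, key, value, expected_version):
        current = self._items.get(key, {"version": 0})["version"]
        if current != expected_version:
            raise ConditionalCheckFailed(key)
        self._items[key] = {"version": expected_version + 1, "value": value}


def increment(table, key, retries=5):
    """Read, modify, and conditionally write; retry if another writer won."""
    for _ in range(retries):
        item = table.get(key)
        new_value = (item["value"] or 0) + 1
        try:
            table.put_if_version(key, new_value, item["version"])
            return new_value
        except ConditionalCheckFailed:
            continue  # someone else updated the item first; re-read and retry
    raise RuntimeError("too much contention on " + key)


table = VersionedTable()
print(increment(table, "video-42-views"))  # 1
print(increment(table, "video-42-views"))  # 2
```

No lock is held between the read and the write; a stale writer simply fails the condition and retries, which is what keeps the approach cheap when contention is low.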
Events_table_2012
Event_id (Hash key)
Timestamp (range key)
A
Events_table_2012_05_week1
Event_id (Hash key)
Timestamp (range key)
A
Events_table_2012_05_wee
Event_id (Hash key)
Timestamp (range key)
Events_table_2012_05_wee
Event_id (Hash key)
Timestamp (range key)
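The table-per-period naming shown above can be generated from an event's date. The sketch below is a hypothetical helper; in particular, the week-of-month rule (days 1 to 7 are week1, days 8 to 14 are week2, and so on) is an assumption, since the slide does not say how weeks are counted.

```python
from datetime import date


def events_table_name(day, granularity="week"):
    """Return the time-series table that should hold events for `day`."""
    if granularity == "year":
        return f"Events_table_{day.year}"
    if granularity == "month":
        return f"Events_table_{day.year}_{day.month:02d}"
    if granularity == "week":
        # Assumption: weeks are counted within the month, starting at day 1.
        week_of_month = (day.day - 1) // 7 + 1
        return f"Events_table_{day.year}_{day.month:02d}_week{week_of_month}"
    raise ValueError("unknown granularity: " + granularity)


print(events_table_name(date(2012, 5, 3)))           # Events_table_2012_05_week1
print(events_table_name(date(2012, 5, 3), "year"))   # Events_table_2012
```

Routing writes by name this way means older tables stop receiving traffic and can be handled separately from the current one.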
Amazon ElastiCache (Memcached)
When to use
Transient key-value store
Need to speed up reads/writes
Caching frequent SQL, NoSQL or DW query results
Saving transient and frequently updated data
Increment/decrement game scores/counters
Web application session storage
Best effort deduplication
When not to use
Store infrequently use
Need persistence
Amazon ElastiCache (Memcached)
Amazon ElastiCache (Memcached) Best Practices
Use autodiscovery
Share memcached client objects in application
Use TTLs
Consider memory for connections overhead
Use Amazon CloudWatch alarms / SNS alerts
Number of connections
Swap memory usage
Freeable memory
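The "Use TTLs" advice pairs naturally with caching query results. The sketch below shows the cache-aside pattern against an in-memory stand-in rather than a real memcached client; `TTLCache` and `cached_query` are hypothetical names, and the injectable clock exists only so expiry can be demonstrated without waiting.

```python
import time


class TTLCache:
    """In-memory stand-in for a cache whose entries expire after a TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: drop it and report a miss
            return None
        return value


def cached_query(cache, key, run_query, ttl_seconds=60):
    """Cache-aside: return the cached result, or run the query and store it."""
    value = cache.get(key)
    if value is None:
        value = run_query()
        cache.set(key, value, ttl_seconds)
    return value


cache = TTLCache()
print(cached_query(cache, "top-videos", lambda: ["a", "b", "c"]))
```

Because every entry carries a TTL, stale results age out on their own and the backing store only sees one query per key per TTL window.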
Amazon ElastiCache (Redis)
When to use
Key-value store with advanced data structures
Strings, lists, sets, sorted sets, hashes
Caching
Leader boards
High-speed sorting
Atomic counters
Queuing systems
Activity streams
When not to use
Need native sharding
Need hard persisten
Data won't fit in memo
Need transaction rollbunder exceptions
Amazon ElastiCache (Redis)
Amazon ElastiCache (Redis) Best Practices
Use TTL
Use the right instance types
Instances with high ECU/vCPU and network performance yield the highest throughput. Example: m2.4xlarge, m2.2xlarge
Use read replicas
Increase read throughput
AOF cannot protect against all failure modes
Promote read replicas to primary
Use RDB file snapshot for on-premises to Amazon ElastiCache
Key parameter group settings
Avoid AOF with fsync always: huge impact on performance
AOF (+ RDB) with fsync everysec: best durability + performance
Pub-sub: set client-output-buffer-limit-pubsub-hard-limit and client-output-buffer-limit-based on the workloads
Amazon CloudSearch
When to use
No search expertise
Full-text search
Ranking
Relevance
Structured and unstructured data
Faceting
$0 to $10 (4 items)
$10 and above (3 items)
When not to use
Not as replacement fo
Not as a system of reco
Transient data
Nonatomic updates
Amazon CloudSearch
Amazon CloudSearch Best Practice
Batch documents for uploading
Use Amazon CloudSearch for searching and another store for retrieving full records for the UI (i.e. don't return fields)
Include other data like popularity scores in documents
Use stop words to remove common terms
Use fielded queries to reduce match sets
Query latency is proportional to query specificity
Amazon Redshift
When to use
Information analysis and reporting
Complex DW queries that summarize historical data
Batched large updates e.g. daily sales totals
10s of concurrent queries
100s GB to PB
Compression
Column based
Very high durability
When not to use
OLTP workloads
1000s of concurrent
Large number of sinupdates
Amazon Redshift
Amazon Redshift Best Practices
Use COPY command to load large data sets from S3, Amazon DynamoDB, Amazon EMR/EC2/Unix
Split your data into multiple files
Use GZIP or LZOP compression
Use manifest file
Choose proper sort key
Range or equality on WHERE clause
Choose proper distribution key
Join column, foreign key or largest dimension, group by column
Avoid distribution key for denormalized data
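The three COPY preparation steps above (split the data, compress it, list it in a manifest) can be sketched locally before anything is uploaded. This is an illustration under assumptions: the bucket `example-bucket`, the `load` prefix, and both function names are hypothetical, and uploading the parts to Amazon S3 and issuing COPY are left out.

```python
import gzip
import json
import os
import tempfile


def split_and_compress(lines, parts, out_dir, prefix="part"):
    """Write lines round-robin into `parts` gzip files; return their paths."""
    paths = [os.path.join(out_dir, f"{prefix}-{i:04d}.gz") for i in range(parts)]
    files = [gzip.open(p, "wt", encoding="utf-8") for p in paths]
    try:
        for n, line in enumerate(lines):
            files[n % parts].write(line + "\n")
    finally:
        for f in files:
            f.close()
    return paths


def build_manifest(paths, s3_prefix):
    """Build a COPY manifest listing every part under the given S3 prefix."""
    entries = [
        {"url": f"{s3_prefix}/{os.path.basename(p)}", "mandatory": True}
        for p in paths
    ]
    return json.dumps({"entries": entries}, indent=2)


with tempfile.TemporaryDirectory() as tmp:
    rows = (f"{i}|row{i}" for i in range(10))
    part_paths = split_and_compress(rows, parts=4, out_dir=tmp)
    # Hypothetical bucket and prefix, for illustration only.
    print(build_manifest(part_paths, "s3://example-bucket/load"))
```

Several similarly sized parts let the load run in parallel, and marking each entry as mandatory makes a missing part fail the load instead of being skipped silently.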
Amazon Elastic MapReduce
When to use
Batch analytics/processing
Answers in minutes or hours
Structured and unstructured data
Parallel scans of the entire dataset with uniform query performance
Supports Hive QL + other languages
GB, TB, or PB of data
Replicated data store (HDFS) for ad-hoc and real-time queries (HBase)
When not to use
Real-time analytics (D
Need answers in sec
1000s of concurrent u
Amazon Elastic MapReduce
Amazon Elastic MapReduce Best Practices
Choose between transient and persistent clusters for best TCO
Leverage Amazon S3 integration for highly durable and interim storage
Right-size cluster instances based on each job, not one size fits all
Leverage resizing and spot to add and remove capacity cost-effectively
Tuning cluster instances can be easier than tuning Hadoop code
AWS Data Pipeline
AWS Data Pipeline
When to use
Automate movement and transformation of data (ETL in the cloud)
Dependency management
Data
Control
Schedule management
Transient Amazon EMR clusters
Regular data move pattern
Every hour, day
Every 30 minutes
Amazon DynamoDB backups
Cross region
When not to use
Less than 15 minutes schinterval
Execution latency less th
Event-based scheduling
AWS Data Pipeline Best Practice
Use dependency rather than time based
Make your activities idempotent
Add in your tools using shell activity
Use Amazon S3 for staging
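"Make your activities idempotent" can be sketched as an activity whose output location depends only on its scheduled time, so a retry or rerun finds the staged result and does no further work. In this illustration a local directory stands in for Amazon S3 staging, and `run_activity` is a hypothetical helper, not part of AWS Data Pipeline.

```python
import os
import tempfile


def run_activity(scheduled_time, staging_dir, transform):
    """Run `transform` once per scheduled period; reruns reuse staged output.

    Returns (output_path, ran), where `ran` is False if the output existed.
    """
    final_path = os.path.join(staging_dir, f"output-{scheduled_time}.txt")
    if os.path.exists(final_path):
        return final_path, False
    # Write to a temporary name first so a crash never leaves a partial
    # file at the final path, then rename into place.
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.write(transform(scheduled_time))
    os.replace(tmp_path, final_path)
    return final_path, True


with tempfile.TemporaryDirectory() as staging:
    print(run_activity("2013-11-14T0100", staging, lambda t: "rows for " + t)[1])  # True
    print(run_activity("2013-11-14T0100", staging, lambda t: "rows for " + t)[1])  # False
```

Keying the output on the scheduled time, rather than the wall-clock time of the run, is what makes a late retry land on the same result.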
Amazon S3
When to use
Store large objects
Key-value store - Get/Put/List
Unlimited storage
Versioning
Very high durability
99.999999999%
Very high throughput (via parallel clients)
Use for storing persistent data
Backups
Source/target for EMR
Blob store with metadata in SQL or NoSQL
When not to use
Complex queries
Very low latency (ms)
Search
Read-after-write consi overwrites
Need transactions
Amazon S3 Best Practices
Use random hash prefix for keys
Ensure a random access pattern
Use Amazon CloudFront for high throughput GETs and PUTs
Leverage the high durability, high throughput design of Amazon S3 for backup and as a common storage sink
Durable sink between data services
Supports de-coupling and asynchronous delivery
Consider RRS for lower cost, lower durability storage of derivatives or copies
Consider parallel threads and multipart upload for faster writes
Consider parallel threads and range get for faster reads
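The "random hash prefix" practice can be sketched by deriving a short prefix from a hash of the natural key, so that sequential names (dates, counters) spread across the key space. The helper name, the prefix length of four hex characters, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib


def prefixed_key(key, prefix_len=4):
    """Prepend a short, deterministic hash prefix to an object key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{key}"


for name in ("logs/2013-11-14/0001.gz", "logs/2013-11-14/0002.gz"):
    print(prefixed_key(name))
```

The prefix is deterministic, so a reader can recompute it from the natural key and never needs a separate lookup to find the object.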
Amazon Glacier
When to use
Infrequently accessed data sets
Very low cost storage
Data retrieval times of several hours is acceptable
Encryption at rest
Very high durability
99.999999999%
Unlimited amount of storage
When not to use
Frequent access
Low latency access
Amazon Glacier Best Practices
Reduce request and storage costs with aggregation
Aggregating your files into bigger files before sending them to Amazon Glacier
Store checksums along with your files
Use a format that allows you to access files within your aggregate
Improve speed and reliability with multipart upload
Reduce costs with ranged retrievals
Maintaining your own index in a highly durable store
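Aggregation, checksums, an index, and ranged access fit together as in the sketch below. It is an in-memory illustration only: the archive is a plain byte string, the index is a dictionary that would in practice live in a separate durable store, SHA-256 is an assumed checksum choice, and `aggregate` and `read_member` are hypothetical names rather than Glacier API calls.

```python
import hashlib
import io


def aggregate(files):
    """Concatenate files into one archive; return (archive_bytes, index).

    The index records each member's byte offset, length, and checksum.
    """
    buf = io.BytesIO()
    index = {}
    for name, data in files.items():
        index[name] = {
            "offset": buf.tell(),
            "length": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        }
        buf.write(data)
    return buf.getvalue(), index


def read_member(archive, index, name):
    """Fetch one member by byte range and verify its checksum."""
    entry = index[name]
    start = entry["offset"]
    data = archive[start:start + entry["length"]]
    if hashlib.sha256(data).hexdigest() != entry["sha256"]:
        raise ValueError("checksum mismatch for " + name)
    return data


archive, index = aggregate({"a.txt": b"hello", "b.txt": b"world!"})
print(index["b.txt"]["offset"], index["b.txt"]["length"])  # 5 6
print(read_member(archive, index, "b.txt"))                # b'world!'
```

With offsets and lengths in the index, a single member can be requested by byte range instead of retrieving the whole aggregate.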
Amazon EC2 + Amazon EBS/Instanc
When to use
Alternate data store technologies
Hand-tuned performance needs
Direct/admin access required
When not to use
When a managed serv the job
When operational explow
Storage
Amazon EBS Best Practices
Pick the right EC2 instance type
Higher network performance instances for driving more Amazon EBS IOPS
EBS-Optimized EC2 instances for dedicated throughput between EC2 & Amazon EBS
Use provisioned IOPS volumes for database workloads requiring consistent IOPS
Use standard volumes for workloads requiring low to moderate IOPS & occasional bursts
Stripe multiple Amazon EBS volumes for higher IOPS or storage
RAID0 for higher I/O
RAID10 for highest local durability
Amazon EBS snapshots
Quiesce the file system and take a snapshot
Amazon EC2 Best Practices
HI-Best IOPS/$
HS-Best GB/$
Summary
Cloud Data Tier Architecture Anti-Pa
Data Tier
AWS Data Tier Architecture - Use the right tool f
Da
Amazon RDS
Amazon CloudSearch
Amazon DynamoDB
Amazon ElastiCache
Amazon Elastic MapReduce
Amazon S3
Amazon Redshift
AWS Data Pipeline
Reference Architecture
Amazon RDS
Amazon CloudSearch
Amazon DynamoDB
Amazon ElastiCache
Amazon EMR
Amazon S3
AWS
AR
Cost Conscious Design
Please give us your feedback on this presentation
As a thank you, we will select prizewinners daily for completed surveys!
DAT203
Remember