© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Dean Bryen - Solutions Architect - AWS - @deanbryen
Mashooq Badar - Co-Founder - Codurance - @codurance
July 7, 2016
Getting Started with Amazon DynamoDB
Agenda
Brief history of data processing
SQL vs. NoSQL
DynamoDB tables, API, data types, indexes
Scaling
Streams and Triggers
Customer Case Study - Codurance
History of Data Processing
Timeline of database technology
[Figure: data pressure over time, with milestones Ledgers, Unit Records, Data Drums, File Systems, RDBMS, NoSQL]
Data volume since 2010
[Figure: data volume, historical vs. current]
90% of stored data generated in last 2 years
1 terabyte of data in 2010 equals 6.5 petabytes today
Linear correlation between data pressure and technical innovation
No reason these trends will not continue over time
SQL vs. NoSQL
Amazon’s path to DynamoDB
[Figure: RDBMS, DynamoDB]
Relational vs. non-relational databases
Traditional SQL: scale up (primary and secondary DB)
NoSQL: scale out (many DB nodes)
Why NoSQL?
SQL                      NoSQL
Optimized for storage    Optimized for compute
Normalized/relational    Denormalized/hierarchical
Ad hoc queries           Instantiated views
Scale vertically         Scale horizontally
Good for OLAP            Built for OLTP at scale
SQL vs. NoSQL schema design
NoSQL design optimises for compute instead of storage
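As a rough illustration of this trade-off, the sketch below contrasts a normalized layout, where a read assembles the view at query time, with a denormalized item that stores the same view pre-joined. The data and function names are hypothetical and chosen only for illustration.

```python
# Normalized (relational style): each fact is stored once; reads join at query time.
customers = {1: {"name": "Jim"}}
orders = [
    {"order_id": 10, "customer_id": 1, "item": "Toy"},
    {"order_id": 11, "customer_id": 1, "item": "Boots"},
]


def customer_with_orders_sql_style(customer_id):
    """Assemble the view at read time (costs compute on every read)."""
    return {
        "name": customers[customer_id]["name"],
        "orders": [
            {"order_id": o["order_id"], "item": o["item"]}
            for o in orders
            if o["customer_id"] == customer_id
        ],
    }


# Denormalized (NoSQL style): the view is stored as one hierarchical item,
# so a read is a single key lookup, at the price of extra storage.
customer_items = {
    1: {
        "name": "Jim",
        "orders": [
            {"order_id": 10, "item": "Toy"},
            {"order_id": 11, "item": "Boots"},
        ],
    }
}


def customer_with_orders_nosql_style(customer_id):
    """Return the pre-built view with a single lookup."""
    return customer_items[customer_id]
```

Both functions return the same structure; the difference is whether the work happens on every read or once at write time.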
Intro to DynamoDB
Amazon DynamoDB
Fully managed
Low cost
Predictable performance
Massively scalable
Highly available
Over 200 million users
Over 4 billion items stored
Millions of ads per month
Cross-device ad solutions
130+ million new users in 1 year
150+ million messages per month
Process requests in milliseconds
High-performance ads
Statcast uses burst scalability for many games on a single day
Flexibility for fast growth
Web clickstream insights
Specialty online and retail stores
Over 5 billion items processed daily
About 200 million messages processed daily
Cognitive training
Job-matching platform
5+ million registered users
Mobile game analytics
10M global users
Home security
Wearable and IoT solutions
170,000 concurrent players
Consistently low latency at scale
PREDICTABLE PERFORMANCE!
High availability and durability
WRITES
• Replicated continuously to 3 AZs
• Persisted to disk (custom SSD)
READS
• Strongly or eventually consistent
• No latency trade-off
Designed to support 99.99% availability
Built for high durability
How DynamoDB scales
[Figure: a table split into partitions 1 .. N]
DynamoDB automatically partitions data
• Partition key spreads data (and workload) across partitions
• Automatically partitions as data grows and throughput needs increase
Large number of unique hash keys +
Uniform distribution of workload across hash keys
High-scale apps
Flexibility and low cost
[Figure: a table with separate settings for reads per second and writes per second]
Customers can configure a table for just a few RPS or for hundreds of thousands of RPS
Customers only pay for how much they provision
Provides maximum flexibility to adjust expenditure based on the workload
Fully managed service = automated operations
[Figure: the same stack of operational tasks shown for three hosting options: DB hosted on-premises, DB hosted on Amazon EC2, and Amazon DynamoDB]
• App Optimisation
• Scaling
• High Availability
• Database Backups
• DB s/w patches
• DB s/w installs
• OS patches
• OS installation
• Server Maintenance
• Rack & Stack
• Power, HVAC, net
DynamoDB Tables and Indexes
DynamoDB table structure
Table, Items, Attributes
Partition key
• Mandatory
• Key-value access pattern
• Determines data distribution
Sort key
• Optional
• Model 1:N relationships
• Enables rich query capabilities
Sort key queries: all items for a key; ==, <, >, >=, <=; “begins with”; “between”; “contains”; “in”; sorted results; counts; top/bottom N values
Partition keys
• Partition key uniquely identifies an item
• Partition key is used for building an unordered hash index
• Allows table to be partitioned for scale
Key space: 00 to FF, split at 54/55 and A9/AA
Id = 1, Name = Jim: Hash (1) = 7B
Id = 2, Name = Andy, Dept = Eng: Hash (2) = 48
Id = 3, Name = Kim, Dept = Ops: Hash (3) = CD
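A minimal sketch of how a hashed partition key could map onto the key space above. The range boundaries (00-54, 55-A9, AA-FF) and the hash values 7B, 48, and CD come from the slide; the `toy_hash` function is a stand-in assumption, since the hash DynamoDB uses internally is not described here.

```python
import hashlib

# Upper bounds (inclusive) of each partition's slice of the 00..FF key space,
# as drawn on the slide: 00-54, 55-A9, AA-FF.
PARTITION_UPPER_BOUNDS = [0x54, 0xA9, 0xFF]


def partition_for(hash_byte):
    """Return the 1-based partition whose key range contains hash_byte."""
    if not 0x00 <= hash_byte <= 0xFF:
        raise ValueError("hash_byte must be in the range 0x00..0xFF")
    for number, upper in enumerate(PARTITION_UPPER_BOUNDS, start=1):
        if hash_byte <= upper:
            return number


def toy_hash(partition_key):
    """Stand-in hash (assumption): first byte of SHA-256 of the key's string form."""
    return hashlib.sha256(str(partition_key).encode("utf-8")).digest()[0]


# Hash values quoted on the slide for Id = 1, 2, 3.
slide_hashes = {1: 0x7B, 2: 0x48, 3: 0xCD}
placement = {item_id: partition_for(h) for item_id, h in slide_hashes.items()}
```

With the slide's hash values, Id = 2 lands in the first range, Id = 1 in the second, and Id = 3 in the third. Because placement depends only on the hash, many distinct keys tend to spread evenly across partitions.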
Partition:Sort key
• Partition:Sort key uses two attributes together to uniquely identify an item
• Within unordered hash index, data is arranged by the sort key
• No limit on the number of items (∞) per partition key
  • Except if you have local secondary indexes
Key space: 00:0 to FF:∞, split at 54:∞/55 and A9:∞/AA
Partition 1, Hash (2) = 48
• Customer# = 2, Order# = 10, Item = Pen
• Customer# = 2, Order# = 11, Item = Shoes
Partition 2, Hash (1) = 7B
• Customer# = 1, Order# = 10, Item = Toy
• Customer# = 1, Order# = 11, Item = Boots
Partition 3, Hash (3) = CD
• Customer# = 3, Order# = 10, Item = Book
• Customer# = 3, Order# = 11, Item = Paper
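The idea that items sharing a partition key are kept in sort-key order can be modelled in a few lines. This is an in-memory sketch only, not how DynamoDB stores data; `put_item` and `query_between` are hypothetical helper names, and the rows are the customer orders from the slide.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Partition key -> list of (sort key, item), kept ordered by sort key.
table = defaultdict(list)


def put_item(customer, order, item):
    """Insert or overwrite the item identified by (customer, order)."""
    rows = table[customer]
    keys = [k for k, _ in rows]
    pos = bisect_left(keys, order)
    if pos < len(rows) and rows[pos][0] == order:
        rows[pos] = (order, item)  # same primary key: overwrite
    else:
        rows.insert(pos, (order, item))


def query_between(customer, low, high):
    """Return items for one partition key with low <= sort key <= high, in order."""
    rows = table.get(customer, [])
    keys = [k for k, _ in rows]
    return rows[bisect_left(keys, low):bisect_right(keys, high)]


# Inserted out of order on purpose; each partition key's rows end up sorted.
for customer, order, item in [
    (2, 11, "Shoes"), (2, 10, "Pen"),
    (1, 10, "Toy"), (1, 11, "Boots"),
    (3, 11, "Paper"), (3, 10, "Book"),
]:
    put_item(customer, order, item)
```

A range condition such as "between" then becomes a slice of an already sorted list, which is why sort keys suit 1:N relationships like one customer with many orders.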
Partitions are three-way replicated
[Figure: Replica 1, Replica 2, and Replica 3 each hold the same items (Id = 1, Name = Jim; Id = 2, Name = Andy, Dept = Eng; Id = 3, Name = Kim, Dept = Ops) across Partition 1, Partition 2, ... Partition N]
Local secondary index (LSI)
• Alternate sort key attribute
• Index is local to a partition key
Table: A1 (partition), A2 (sort), A3, A4, A5
LSIs
• KEYS_ONLY: A1 (partition), A3 (sort), A2 (item key)
• INCLUDE A3: A1 (partition), A4 (sort), A2 (item key), A3 (projected)
• ALL: A1 (partition), A5 (sort), A2 (item key), A3 (projected), A4 (projected)
10 GB maximum per partition key; LSIs limit the number of range keys!
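The three projection types can be sketched as a function that decides which attributes of an item are copied into an index entry. This models the behaviour shown on the slide; the `project` helper and the sample values are hypothetical.

```python
TABLE_KEYS = ("A1", "A2")  # table partition key and sort key from the slide


def project(item, index_sort_key, projection_type, include=()):
    """Return the attributes an LSI entry would carry for this item."""
    if projection_type == "ALL":
        return dict(item)
    # Table keys and the index sort key are always present in the index.
    names = set(TABLE_KEYS) | {index_sort_key}
    if projection_type == "INCLUDE":
        names |= set(include)
    elif projection_type != "KEYS_ONLY":
        raise ValueError(f"unknown projection type: {projection_type}")
    return {name: value for name, value in item.items() if name in names}


item = {"A1": "p", "A2": "s", "A3": 3, "A4": 4, "A5": 5}
keys_only = project(item, "A3", "KEYS_ONLY")
include_a3 = project(item, "A4", "INCLUDE", include=["A3"])
everything = project(item, "A5", "ALL")
```

Smaller projections keep the index compact, which matters here because every index entry counts toward the 10 GB limit per partition key.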
Global secondary index (GSI)
• Alternate partition and/or sort key
• Index is across all partition keys
Table: A1 (partition), A2, A3, A4, A5
GSIs
• KEYS_ONLY: A2 (partition), A1 (item key)
• INCLUDE A3: A5 (partition), A4 (sort), A1 (item key), A3 (projected)
• ALL: A4 (partition), A5 (sort), A1 (item key), A2 (projected), A3 (projected)
Online indexing
Read capacity units (RCUs) and write capacity units (WCUs) are provisioned separately for GSIs
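A sketch of what a table definition with one GSI might look like, written as the keyword arguments that boto3's DynamoDB `create_table` call accepts. The table name, index name, string attribute types, and capacity numbers are assumptions made for illustration; the key layout follows the INCLUDE A3 index on the slide.

```python
# Keyword arguments in the shape accepted by boto3's DynamoDB create_table call.
# Table name, index name, attribute types, and capacity numbers are assumptions.
create_table_kwargs = {
    "TableName": "ExampleTable",
    # Only attributes used as table or index keys are declared up front.
    "AttributeDefinitions": [
        {"AttributeName": "A1", "AttributeType": "S"},
        {"AttributeName": "A4", "AttributeType": "S"},
        {"AttributeName": "A5", "AttributeType": "S"},
    ],
    "KeySchema": [{"AttributeName": "A1", "KeyType": "HASH"}],
    # Throughput for the table itself.
    "ProvisionedThroughput": {"ReadCapacityUnits": 10, "WriteCapacityUnits": 5},
    "GlobalSecondaryIndexes": [
        {
            "IndexName": "A5-A4-index",
            "KeySchema": [
                {"AttributeName": "A5", "KeyType": "HASH"},
                {"AttributeName": "A4", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "INCLUDE", "NonKeyAttributes": ["A3"]},
            # The GSI carries its own RCUs and WCUs, separate from the table's.
            "ProvisionedThroughput": {"ReadCapacityUnits": 10, "WriteCapacityUnits": 5},
        }
    ],
}

# With boto3 installed and credentials configured, this could be passed as:
#   boto3.client("dynamodb").create_table(**create_table_kwargs)
```

The index has its own `ProvisionedThroughput` entry, so its capacity has to be sized for the write traffic the table sends to it.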
How do GSI updates work?
[Figure: client, primary table partitions, global secondary index]
1. Update request (client to primary table)
2. Update response (primary table to client)
2. Asynchronous update (in progress) (primary table to global secondary index)
If GSIs don’t have enough write capacity, table writes will be throttled!
LSI or GSI?
• LSI can be modelled as a GSI
• If data size in an item collection > 10 GB, use GSI
• If eventual consistency is okay for your scenario, use GSI!
Scaling
Scaling
Throughput
• Provision any amount of throughput to a table
Size
• Add any number of items to a table
• Maximum item size is 400 KB
• LSIs limit the number of range keys due to 10 GB limit
Scaling is achieved through partitioning
Throughput
Provisioned at the table level
• Write capacity units (WCUs) are measured in 1 KB per second
• Read capacity units (RCUs) are measured in 4 KB per second
  • RCUs measure strongly consistent reads
  • Eventually consistent reads cost 1/2 of consistent reads
Read and write throughput limits are independent
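The unit sizes above translate into a simple capacity estimate. The sketch below assumes item sizes are rounded up to the next whole unit (1 KB for writes, 4 KB for reads), which is how the DynamoDB documentation describes capacity consumption; the function names are hypothetical.

```python
import math


def write_capacity_units(item_size_kb, writes_per_second):
    """One WCU covers one write per second of an item up to 1 KB."""
    return math.ceil(item_size_kb / 1) * writes_per_second


def read_capacity_units(item_size_kb, reads_per_second, strongly_consistent=True):
    """One RCU covers one strongly consistent read per second of up to 4 KB."""
    units = math.ceil(item_size_kb / 4) * reads_per_second
    # Eventually consistent reads cost half as much.
    return units if strongly_consistent else math.ceil(units / 2)
```

For example, 80 strongly consistent reads per second of 3 KB items need 80 RCUs, the same load with eventually consistent reads needs 40, and 10 writes per second of 1.5 KB items need 20 WCUs.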
Partitioning math
In the future, these details might change…
Number of partitions
• By capacity: (Total RCU / 3000) + (Total WCU / 1000)
• By size: Total Size / 10 GB
• Total partitions: CEILING(MAX(Capacity, Size))
Partitioning example
Table size = 8 GB, RCUs = 5000, WCUs = 500
Number of partitions
• By capacity: (5000 / 3000) + (500 / 1000) = 2.17
• By size: 8 / 10 = 0.8
• Total partitions: CEILING(MAX(2.17, 0.8)) = 3
RCUs and WCUs are uniformly spread across partitions
• RCUs per partition = 5000/3 = 1666.67
• WCUs per partition = 500/3 = 166.67
• Data/partition = 8/3 = 2.67 GB
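The partitioning arithmetic can be written as a small calculator. The constants (3000 RCUs, 1000 WCUs, 10 GB per partition) are the ones quoted on the slide, which itself notes that these details might change; the function names are hypothetical.

```python
import math


def partitions_by_capacity(total_rcu, total_wcu):
    """Partitions needed for throughput: 3000 RCUs or 1000 WCUs per partition."""
    return total_rcu / 3000 + total_wcu / 1000


def partitions_by_size(total_size_gb):
    """Partitions needed for storage: 10 GB per partition."""
    return total_size_gb / 10


def total_partitions(total_size_gb, total_rcu, total_wcu):
    """Take the larger of the two requirements and round up."""
    return math.ceil(max(partitions_by_capacity(total_rcu, total_wcu),
                         partitions_by_size(total_size_gb)))


# The example from the slide: 8 GB, 5000 RCUs, 500 WCUs.
n = total_partitions(8, 5000, 500)
per_partition = {"rcu": 5000 / n, "wcu": 500 / n}
```

For the slide's example this gives 3 partitions, each receiving about 1666.67 RCUs and 166.67 WCUs. A single hot key can therefore use only a fraction of the table's provisioned throughput.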
To learn more, please attend: Deep Dive on DynamoDB, Floor 0, Room 3, 14:00–14:45, Andreas Chatzakis, Solutions Architect
DynamoDB Streams and Triggers
Integration capabilities
DynamoDB Triggers
❑ Implemented as AWS Lambda functions
❑ Your code scales automatically
❑ Java, Node.js, and Python
DynamoDB Streams
❑ Stream of table updates
❑ Asynchronous
❑ Exactly once
❑ Strictly ordered
❑ 24-hr lifetime per item
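A trigger is an AWS Lambda function that receives batches of stream records. The sketch below shows a minimal Python handler that summarises each change; the record fields used (`Records`, `eventName`, `dynamodb`, `Keys`, `NewImage`) follow the DynamoDB Streams event format, while the sample event and the summary structure are made up for illustration.

```python
def handler(event, context):
    """Summarise the item-level changes delivered in one stream batch."""
    changes = []
    for record in event.get("Records", []):
        change = record["dynamodb"]
        changes.append({
            "event": record["eventName"],         # INSERT, MODIFY, or REMOVE
            "keys": change["Keys"],
            "new_image": change.get("NewImage"),  # absent for REMOVE records
        })
    return changes


# Hypothetical batch: one insert and one delete, in DynamoDB's typed JSON form.
sample_event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {
                "Keys": {"Id": {"N": "1"}},
                "NewImage": {"Id": {"N": "1"}, "Name": {"S": "Jim"}},
            },
        },
        {"eventName": "REMOVE", "dynamodb": {"Keys": {"Id": {"N": "2"}}}},
    ]
}
result = handler(sample_event, None)
```

A real trigger would typically act on each record, for example by sending a notification or updating another store, instead of returning a summary.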
Customer Case Study - Codurance
Building AWS Loft Registration Site
@mashooq
Ireland (eu-west-1)
[Architecture diagram. Components: users, admin, QR Reader, awsloft.london, AWS WAF, Amazon CloudFront, AWS Lambda, Amazon DynamoDB, Amazon SES, AWS KMS, AWS IAM, Amazon CloudWatch, AWS CloudTrail. One label in the diagram reads "Secure:".]
S3: Static HTML/CSS and JavaScript content for the site.
API for all dynamic content (proxy to AWS Lambda)
Serverless Architecture
DynamoDB is a piece of the puzzle.
Fast feedback …
• Single Server Local Environment
  • Local DynamoDB
  • Simulated API Gateway
  • Abstracted Lambda API
  • Mocked Encryption
  • Mocked External Services
    • SES, KMS etc.
• Microservices based Cloud Environment
  • Continuously deploy to QA
  • One click deployment to production
    • Hot deployment
Persistence Options
• RDS (Postgres)
  • Out of box backup and recovery
  • Out of box encryption
  • Mature development tooling and libraries
  • Possible downtime during scaling
  • Relatively complicated migrations
  • More complicated to model hierarchical structure
• DynamoDB
  • Elastic Scaling
  • Evolutionary Schema design
  • Easy to get started
  • Custom encryption
  • Complicated joins
  • Backup and recovery using pipelines
Lessons
• DynamoDB is easy to get started
• Runs locally for Dev environments
• Tooling and libraries are surprisingly mature
• API (at least in Clojure/Java) is simple
• Custom encryption is inconvenient but easy to overcome
• Backups using pipelines are straightforward
• Schema migrations are rare
• Possibly more cost efficient if you plan well or use auto scaling
• Easy to monitor
• … it’s painless
Thank You!