Getting Started with Amazon DynamoDB

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Dean Bryen - Solutions Architect - AWS - @deanbryen
Mashooq Badar - Co-Founder - Codurance - @codurance

July 7, 2016

Agenda

Brief history of data processing

SQL vs. NoSQL

DynamoDB tables, API, data types, indexes

Scaling

Streams and Triggers

Customer Case Study - Codurance

History of Data Processing

Timeline of database technology

[Chart: data pressure over time, with successive technologies: Ledgers, Unit Records, Data Drums, File Systems, RDBMS, NoSQL]

Data volume since 2010

[Chart: data volume, historical vs. current]

90% of stored data generated in last 2 years

1 terabyte of data in 2010 equals 6.5 petabytes today

Linear correlation between data pressure and technical innovation

No reason these trends will not continue over time

SQL vs. NoSQL

Amazon’s path to DynamoDB

[Diagram: RDBMS and DynamoDB]

Relational vs. NonRelational databases

[Diagram: Traditional SQL scales up (Primary and Secondary DB); NoSQL scales out (multiple DBs)]

Why NoSQL?

SQL                      NoSQL
Optimized for storage    Optimized for compute
Normalized/relational    Denormalized/hierarchical
Ad hoc queries           Instantiated views
Scale vertically         Scale horizontally
Good for OLAP            Built for OLTP at scale

SQL vs. NoSQL schema design

NoSQL design optimises for compute instead of storage

Intro to DynamoDB

Amazon DynamoDB

Fully managed

Low cost

Predictable performance

Massively scalable

Highly available

Over 200 million users

Over 4 billion items stored

Millions of ads per month

Cross-device ad solutions

130+ million new users in 1 year

150+ million messages per month

Process requests in milliseconds

High-performance ads

Statcast uses burst scalability for many games on a single day

Flexibility for fast growth

Web clickstream insights

Specialty online and retail stores

Over 5 billion items processed daily

About 200 million messages processed daily

Cognitive training

Job-matching platform

5+ million registered users

Mobile game analytics

10M global users

Home security

Wearable and IoT solutions

170,000 concurrent players

Consistently low latency at scale

PREDICTABLE PERFORMANCE!

High availability and durability

WRITES
• Replicated continuously to 3 AZs
• Persisted to disk (custom SSD)

READS
• Strongly or eventually consistent
• No latency trade-off

Designed to support 99.99% availability

Built for high durability

How DynamoDB scales

[Diagram: a table split into partitions 1 .. N]

DynamoDB automatically partitions data
• Partition key spreads data (and workload) across partitions
• Automatically partitions as data grows and throughput needs increase

Large number of unique hash keys +

Uniform distribution of workload across hash keys

High-scale apps
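A minimal sketch of why key cardinality and uniform access matter. It assumes a stand-in hash (MD5) and a hypothetical count of four partitions; DynamoDB's real hash function and partition layout are internal and not described in these slides.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 4  # hypothetical partition count, for illustration only


def partition_for(partition_key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a stand-in hash (MD5), not DynamoDB's own."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def requests_per_partition(keys, num_partitions: int = NUM_PARTITIONS) -> Counter:
    """Count how many requests land on each partition for a list of keys."""
    counts = Counter({p: 0 for p in range(num_partitions)})
    for key in keys:
        counts[partition_for(key, num_partitions)] += 1
    return counts


# 10,000 requests spread over 10,000 distinct keys.
uniform_keys = [f"user-{i}" for i in range(10_000)]
# 10,000 requests where one "hot" key receives 90% of the traffic.
hot_keys = ["user-0"] * 9_000 + [f"user-{i}" for i in range(1_000)]

uniform_load = requests_per_partition(uniform_keys)
hot_load = requests_per_partition(hot_keys)

print("uniform:", dict(uniform_load))
print("hot key:", dict(hot_load))
```

With many distinct keys the requests spread roughly evenly; with a hot key, one partition absorbs most of the traffic regardless of how many partitions exist.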

Flexibility and low cost

Reads per second

Writes per second

table

Customers can configure a table for just a few RPS or for hundreds of thousands of RPS

Customers only pay for how much they provision

Provides maximum flexibility to adjust expenditure based on the workload

Fully managed service = automated operations

Operational tasks compared across three options: DB hosted on-premises, DB hosted on Amazon EC2, and Amazon DynamoDB. Each column of the slide lists the same tasks:

App Optimisation

Scaling

High Availability

Database Backups

DB s/w patches

DB s/w installs

OS patches

OS installation

Server Maintenance

Rack & Stack

Power, HVAC, net

DynamoDB Tables and Indexes

DynamoDB table structure

Table
Items
Attributes

Partition key
• Mandatory
• Key-value access pattern
• Determines data distribution

Sort key
• Optional
• Model 1:N relationships
• Enables rich query capabilities

Query capabilities: all items for key; ==, <, >, >=, <=; “begins with”; “between”; “contains”; “in”; sorted results; counts; top/bottom N values
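As a sketch of how a partition key and an optional sort key are declared, the helper below builds a dictionary using the parameter names of the `CreateTable` operation (`TableName`, `AttributeDefinitions`, `KeySchema`, `ProvisionedThroughput`). The helper name, the table name `Orders`, and the attribute names `CustomerId` and `OrderId` are hypothetical; nothing here calls AWS.

```python
def build_table_definition(table_name, partition_key, sort_key=None,
                           read_units=5, write_units=5):
    """Build CreateTable-style keyword arguments.

    partition_key and sort_key are (attribute_name, attribute_type) tuples,
    where attribute_type is "S" (string), "N" (number) or "B" (binary).
    """
    key_attributes = [(partition_key, "HASH")]       # partition key is mandatory
    if sort_key is not None:
        key_attributes.append((sort_key, "RANGE"))   # sort key is optional

    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": name, "AttributeType": attr_type}
            for (name, attr_type), _ in key_attributes
        ],
        "KeySchema": [
            {"AttributeName": name, "KeyType": key_type}
            for (name, _), key_type in key_attributes
        ],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }


# Hypothetical table keyed by customer (partition) and order (sort).
orders_definition = build_table_definition(
    "Orders", ("CustomerId", "N"), ("OrderId", "N")
)
print(orders_definition["KeySchema"])
```

A dictionary of this shape could be passed to an SDK call such as boto3's `client.create_table(**orders_definition)`; that step is omitted so the sketch runs without credentials.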

Partition keys
• Partition key uniquely identifies an item
• Partition key is used for building an unordered hash index
• Allows table to be partitioned for scale

Key Space: 00 to FF, split into the ranges 00–54, 55–A9, AA–FF

Id = 1 Name = Jim: Hash (1) = 7B

Id = 2 Name = Andy Dept = Eng: Hash (2) = 48

Id = 3 Name = Kim Dept = Ops: Hash (3) = CD
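The mapping from a hash value to a range of the key space can be sketched as follows. The three ranges (00–54, 55–A9, AA–FF) and the hash values 7B, 48, and CD are taken from the slide; the one-byte MD5 hash is only a stand-in, since DynamoDB's actual hash function is internal.

```python
import bisect
import hashlib

# Inclusive upper bounds of the three hash ranges: 00-54, 55-A9, AA-FF.
RANGE_UPPER_BOUNDS = [0x54, 0xA9, 0xFF]


def one_byte_hash(partition_key) -> int:
    """Stand-in hash (first byte of MD5); not the function DynamoDB uses."""
    return hashlib.md5(str(partition_key).encode("utf-8")).digest()[0]


def partition_for_hash(hash_value: int) -> int:
    """Return the 1-based partition whose range contains hash_value."""
    return bisect.bisect_left(RANGE_UPPER_BOUNDS, hash_value) + 1


# Hash values quoted on the slide for Id = 1, 2, 3.
for item_id, hash_value in [(1, 0x7B), (2, 0x48), (3, 0xCD)]:
    print(f"Id = {item_id}: hash {hash_value:02X} -> partition {partition_for_hash(hash_value)}")
```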

Partition:Sort key
• Partition:Sort key uses two attributes together to uniquely identify an item
• Within unordered hash index, data is arranged by the sort key
• No limit on the number of items (∞) per partition key
  • Except if you have local secondary indexes

Key space: 00:0 to FF:∞

Partition 1 (00:0 to 54:∞), Hash (2) = 48
Customer# = 2 Order# = 10 Item = Pen
Customer# = 2 Order# = 11 Item = Shoes

Partition 2 (55 to A9:∞), Hash (1) = 7B
Customer# = 1 Order# = 10 Item = Toy
Customer# = 1 Order# = 11 Item = Boots

Partition 3 (AA to FF:∞), Hash (3) = CD
Customer# = 3 Order# = 10 Item = Book
Customer# = 3 Order# = 11 Item = Paper
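A toy in-memory model of the idea that items sharing a partition key are kept ordered by sort key, which is what makes range queries within one partition key cheap. The class and method names are hypothetical and this is not how DynamoDB is implemented; the data is the Customer#/Order# example from the slide.

```python
import bisect
from collections import defaultdict


class ItemCollections:
    """Items grouped by partition key, each group kept sorted by sort key."""

    def __init__(self):
        self._sort_keys = defaultdict(list)  # partition key -> sorted sort keys
        self._items = {}                     # (partition key, sort key) -> item

    def put(self, partition_key, sort_key, item):
        if (partition_key, sort_key) not in self._items:
            bisect.insort(self._sort_keys[partition_key], sort_key)
        self._items[(partition_key, sort_key)] = item

    def query_between(self, partition_key, low, high):
        """Return items for one partition key with low <= sort key <= high."""
        keys = self._sort_keys.get(partition_key, [])
        start = bisect.bisect_left(keys, low)
        end = bisect.bisect_right(keys, high)
        return [self._items[(partition_key, k)] for k in keys[start:end]]


orders = ItemCollections()
for customer, order, item in [(2, 11, "Shoes"), (2, 10, "Pen"), (1, 10, "Toy"),
                              (1, 11, "Boots"), (3, 10, "Book"), (3, 11, "Paper")]:
    orders.put(customer, order, item)

print(orders.query_between(2, 10, 11))  # all orders for customer 2, in sort-key order
```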

Partitions are three-way replicated

[Diagram: Partition 1, Partition 2, ... Partition N, each stored as Replica 1, Replica 2, and Replica 3; an item such as Id = 1 Name = Jim appears in all three replicas of its partition]

Local secondary index (LSI)
• Alternate sort key attribute
• Index is local to a partition key

Table: A1 (partition), A2 (sort), A3, A4, A5

LSIs:
• KEYS_ONLY: A1 (partition), A3 (sort), A2 (item key)
• INCLUDE A3: A1 (partition), A4 (sort), A2 (item key), A3 (projected)
• ALL: A1 (partition), A5 (sort), A2 (item key), A3 (projected), A4 (projected)

10 GB maximum per partition key; LSIs limit the number of range keys!
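The three projection types can be sketched as `LocalSecondaryIndexes` entries in the shape used by the `CreateTable` operation (`IndexName`, `KeySchema`, `Projection`). The helper and the index names are hypothetical; the attribute names A1 to A5 follow the slide, and no AWS call is made.

```python
def local_secondary_index(index_name, partition_key, sort_key,
                          projection_type="KEYS_ONLY", non_key_attributes=None):
    """Build one LocalSecondaryIndexes entry (CreateTable-style dictionary)."""
    projection = {"ProjectionType": projection_type}
    if projection_type == "INCLUDE":
        # Only INCLUDE projections list the extra attributes to copy.
        projection["NonKeyAttributes"] = list(non_key_attributes or [])
    return {
        "IndexName": index_name,
        "KeySchema": [
            # An LSI always reuses the table's partition key (A1 here).
            {"AttributeName": partition_key, "KeyType": "HASH"},
            {"AttributeName": sort_key, "KeyType": "RANGE"},
        ],
        "Projection": projection,
    }


lsis = [
    local_secondary_index("A3-index", "A1", "A3"),                    # KEYS_ONLY
    local_secondary_index("A4-index", "A1", "A4", "INCLUDE", ["A3"]),  # INCLUDE A3
    local_secondary_index("A5-index", "A1", "A5", "ALL"),             # ALL
]
for lsi in lsis:
    print(lsi["IndexName"], lsi["Projection"])
```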

Global secondary index (GSI)
• Alternate partition and/or sort key
• Index is across all partition keys

Table: A1 (partition), A2, A3, A4, A5

GSIs:
• KEYS_ONLY: A2 (partition), A1 (item key)
• INCLUDE A3: A5 (partition), A4 (sort), A1 (item key), A3 (projected)
• ALL: A4 (partition), A5 (sort), A1 (item key), A2 (projected), A3 (projected)

Online indexing

Read capacity units (RCUs) and write capacity units (WCUs) are provisioned separately for GSIs
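A sketch of a `GlobalSecondaryIndexes` entry in the shape used by the `CreateTable` operation, showing that each GSI carries its own `ProvisionedThroughput`. The helper name, index names, and capacity figures are hypothetical; attribute names follow the slide.

```python
def global_secondary_index(index_name, partition_key, sort_key=None,
                           projection_type="KEYS_ONLY", non_key_attributes=None,
                           read_units=5, write_units=5):
    """Build one GlobalSecondaryIndexes entry (CreateTable-style dictionary)."""
    key_schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    if sort_key is not None:
        key_schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})

    projection = {"ProjectionType": projection_type}
    if projection_type == "INCLUDE":
        projection["NonKeyAttributes"] = list(non_key_attributes or [])

    return {
        "IndexName": index_name,
        "KeySchema": key_schema,
        "Projection": projection,
        # Capacity is provisioned per index, separately from the table.
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }


gsis = [
    global_secondary_index("A2-index", "A2"),
    global_secondary_index("A5-A4-index", "A5", "A4", "INCLUDE", ["A3"],
                           read_units=10, write_units=20),
    global_secondary_index("A4-A5-index", "A4", "A5", "ALL"),
]
for gsi in gsis:
    print(gsi["IndexName"], gsi["ProvisionedThroughput"])
```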

How do GSI updates work?

[Diagram: a client, the primary table's partitions, and a global secondary index]

1. Update request (client to table)
2. Update response (table to client)
2. Asynchronous update (table to global secondary index, in progress)

If GSIs don’t have enough write capacity, table writes will be throttled!
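A toy model of the asynchronous flow: the write to the table is acknowledged immediately while the index change waits in a queue, so a query against the index can briefly miss a recent write. All names are hypothetical, and the model deliberately ignores write-capacity throttling and the removal of stale index entries.

```python
from collections import deque


class TableWithIndex:
    """Table plus one index that is updated asynchronously from a queue."""

    def __init__(self, index_attribute):
        self.index_attribute = index_attribute
        self.table = {}            # primary key -> item
        self.index = {}            # index attribute value -> set of primary keys
        self._pending = deque()    # changes not yet applied to the index

    def put(self, key, item):
        """Write to the table and acknowledge; the index update is only queued."""
        self.table[key] = item
        self._pending.append((key, item))
        return "OK"

    def propagate(self, max_updates=None):
        """Apply queued changes to the index; return how many were applied."""
        applied = 0
        while self._pending and (max_updates is None or applied < max_updates):
            key, item = self._pending.popleft()
            self.index.setdefault(item[self.index_attribute], set()).add(key)
            applied += 1
        return applied

    def query_index(self, value):
        return sorted(self.index.get(value, set()))


employees = TableWithIndex(index_attribute="Dept")
employees.put(2, {"Name": "Andy", "Dept": "Eng"})
before = employees.query_index("Eng")   # index has not caught up yet
employees.propagate()
after = employees.query_index("Eng")    # index now reflects the write
print(before, after)
```

This is why reads from a GSI are eventually consistent with respect to the table.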

LSI or GSI?

• LSI can be modelled as a GSI
• If data size in an item collection > 10 GB, use GSI
• If eventual consistency is okay for your scenario, use GSI!

Scaling

Scaling

Throughput
• Provision any amount of throughput to a table

Size
• Add any number of items to a table
  • Maximum item size is 400 KB
  • LSIs limit the number of range keys due to 10 GB limit

Scaling is achieved through partitioning

Throughput

Provisioned at the table level
• Write capacity units (WCUs) are measured in 1 KB per second
• Read capacity units (RCUs) are measured in 4 KB per second
  • RCUs measure strongly consistent reads
  • Eventually consistent reads cost 1/2 of consistent reads

Read and write throughput limits are independent
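The unit sizes above translate into a simple calculation, sketched here. The function names are hypothetical, and the sketch assumes item sizes round up to the next 1 KB (writes) or 4 KB (reads) boundary, with eventually consistent reads costing half.

```python
import math


def write_capacity_units(item_size_bytes, writes_per_second):
    """WCUs needed: one unit per 1 KB (rounded up) per write per second."""
    units_per_write = math.ceil(item_size_bytes / 1024)
    return units_per_write * writes_per_second


def read_capacity_units(item_size_bytes, reads_per_second, strongly_consistent=True):
    """RCUs needed: one unit per 4 KB (rounded up) per strongly consistent read.

    Eventually consistent reads cost half as much.
    """
    units_per_read = math.ceil(item_size_bytes / 4096)
    total = units_per_read * reads_per_second
    return total if strongly_consistent else math.ceil(total / 2)


# Hypothetical workload: 1,500-byte writes and 6,000-byte reads, 10 per second each.
print(write_capacity_units(1500, 10))
print(read_capacity_units(6000, 10))
print(read_capacity_units(6000, 10, strongly_consistent=False))
```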

Partitioning math

In the future, these details might change…

Number of partitions
By capacity: (Total RCU / 3000) + (Total WCU / 1000)
By size: Total Size / 10 GB
Total partitions: CEILING(MAX (Capacity, Size))

Partitioning example
Table size = 8 GB, RCUs = 5000, WCUs = 500

Number of partitions
By capacity: (5000 / 3000) + (500 / 1000) = 2.17
By size: 8 / 10 = 0.8
Total partitions: CEILING(MAX (2.17, 0.8)) = 3

RCUs per partition = 5000/3 = 1666.67
WCUs per partition = 500/3 = 166.67
Data/partition = 8/3 = 2.67 GB

RCUs and WCUs are uniformly spread across partitions
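The partitioning formula can be checked with a few lines of Python. The constants (3000 RCUs, 1000 WCUs, 10 GB per partition) come from the slide, which also notes that these details might change; the function name and the floor of one partition are assumptions.

```python
import math

RCUS_PER_PARTITION = 3000
WCUS_PER_PARTITION = 1000
GB_PER_PARTITION = 10


def estimate_partitions(table_size_gb, rcus, wcus):
    """Estimate partition count as CEILING(MAX(by capacity, by size))."""
    by_capacity = rcus / RCUS_PER_PARTITION + wcus / WCUS_PER_PARTITION
    by_size = table_size_gb / GB_PER_PARTITION
    # Assumption: a table always has at least one partition.
    return max(1, math.ceil(max(by_capacity, by_size)))


# The example from the slide: 8 GB, 5000 RCUs, 500 WCUs.
partitions = estimate_partitions(8, 5000, 500)
rcus_per_partition = round(5000 / partitions, 2)
wcus_per_partition = round(500 / partitions, 2)
print(partitions, rcus_per_partition, wcus_per_partition)
```

Because provisioned throughput is divided evenly, each partition in this example can serve only about a third of the table's RCUs and WCUs, which is why a hot key can be throttled well below the table-level figure.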

To learn more, please attend: Deep Dive on DynamoDB
Floor 0, Room 3, 14:00–14:45
Andreas Chatzakis, Solutions Architect

DynamoDB Streams and Triggers

Integration capabilities

DynamoDB Triggers
❑ Implemented as AWS Lambda functions
❑ Your code scales automatically
❑ Java, Node.js, and Python

DynamoDB Streams
❑ Stream of table updates
❑ Asynchronous
❑ Exactly once
❑ Strictly ordered
❑ 24-hr lifetime per item
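A minimal sketch of a trigger written as a Python Lambda handler. It relies on the stream event shape in which each entry of `Records` carries an `eventName` (`INSERT`, `MODIFY`, or `REMOVE`) and a `dynamodb` section with `Keys` and, where applicable, `NewImage`. The sample event is hand-written for illustration, and the summary the handler returns is a hypothetical design choice.

```python
def handler(event, context):
    """Count stream records by event type and log the key of each change."""
    summary = {"INSERT": 0, "MODIFY": 0, "REMOVE": 0}
    for record in event.get("Records", []):
        event_name = record.get("eventName")
        if event_name in summary:
            summary[event_name] += 1
        # Keys use DynamoDB's typed format, e.g. {"Id": {"N": "1"}}.
        keys = record.get("dynamodb", {}).get("Keys", {})
        print(f"{event_name}: {keys}")
    return summary


# Hand-written sample event, for local testing only.
sample_event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {
                "Keys": {"Id": {"N": "1"}},
                "NewImage": {"Id": {"N": "1"}, "Name": {"S": "Jim"}},
            },
        },
        {
            "eventName": "REMOVE",
            "dynamodb": {"Keys": {"Id": {"N": "3"}}},
        },
    ]
}

result = handler(sample_event, None)
print(result)
```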

Customer Case Study - Codurance

Building AWS Loft Registration Site

@mashooq

[Architecture diagram, Ireland (eu-west-1). Labels shown: users, admin, QR Reader, awsloft.london, AWS WAF, Amazon CloudFront, AWS Lambda, Amazon DynamoDB, Amazon SES, AWS KMS, AWS IAM, Amazon CloudWatch, AWS CloudTrail, and a "Secure" label]

S3: Static HTML/CSS and JavaScript content for the site.

API for all dynamic content (proxy to AWS Lambda)

Serverless Architecture

DynamoDB is a piece of the puzzle.

Fast feedback …

• Single Server Local Environment
  • Local DynamoDB
  • Simulated API Gateway
  • Abstracted Lambda API
  • Mocked Encryption
  • Mocked External Services
    • SES, KMS etc.

• Microservices based Cloud Environment
  • Continuously deploy to QA
  • One click deployment to production
    • Hot deployment

Persistence Options

• RDS (Postgres)
  • Out of box backup and recovery
  • Out of box encryption
  • Mature development tooling and libraries
  • Possible downtime during scaling
  • Relatively complicated migrations
  • More complicated to model hierarchical structure

• DynamoDB
  • Elastic Scaling
  • Evolutionary Schema design
  • Easy to get started
  • Custom encryption
  • Complicated joins
  • Backup and recovery using pipelines

Lessons

• DynamoDB is easy to get started with
• Runs locally for Dev environments
• Tooling and libraries are surprisingly mature
• API (at least in Clojure/Java) is simple
• Custom encryption is inconvenient but easy to overcome
• Backups using pipelines are straightforward
• Schema migrations are rare
• Possibly more cost efficient if you plan well or use autoscaling
• Easy to monitor

• … it’s painless

Thank You!
