AWS Webcast - Build high-scale applications with Amazon DynamoDB

Chris Munns Solutions Architect

Amazon Web Services

Build High-Scale Applications with Amazon DynamoDB

Traditional Database Architecture

App/Web Tier

Client Tier

Database Tier

• key-value access • complex queries • transactions • analytics

One Database for All Workloads

App/Web Tier

Client Tier

RDBMS

Cloud Data Tier Architecture

App/Web Tier

Client Tier

Data Tier

Search Cache Blob Store

RDBMS NoSQL Data Warehouse

Workload Driven Data Store Selection

Data Tier

Search Cache Blob Store

RDBMS NoSQL Data Warehouse

logging analytics

key/value simple query

rich search hot reads complex queries and transactions

AWS Services for the Data Tier

Data Tier

Amazon DynamoDB

Amazon RDS

Amazon ElastiCache

Amazon S3

Amazon Redshift

Amazon CloudSearch

logging analytics

key/value simple query

rich search hot reads complex queries and transactions

RDBMS = Default Choice

• An Amazon.com page is composed of responses from 1000s of independent services
• Query patterns differ from service to service
  • The catalog service is usually heavy on key-value access
  • The ordering service is very write-intensive (key-value)
  • Catalog search has a different query pattern

Relational Era @ Amazon.com

RDBMS

Poor Availability Limited Scalability High Cost

Dynamo = NoSQL Technology • Replicated DHT with consistency management • Consistent hashing • Optimistic replication • “Sloppy quorum” • Anti-entropy mechanisms • Object versioning

Distributed Era @ Amazon.com

lack of strong consistency

every engineer needs to learn distributed systems

operational complexity

DynamoDB = NoSQL Cloud Service

Cloud Era @ Amazon.com

Non-Relational

Fast & Predictable Performance

Seamless Scalability

Easy Administration

DynamoDB Fundamentals

database service

automated operations predictable performance

fast development

always durable

low latency cost effective

=

partitions 1 .. N

table

• DynamoDB automatically partitions data by the hash key
  • The hash key spreads data (and workload) across partitions
• Auto-partitioning occurs with:
  • Data set size growth
  • Provisioned capacity increases
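The effect of the hash key on data placement can be sketched with a small simulation. This is an illustration only: DynamoDB's internal hash function and partition count are not exposed, so the MD5-based `partition_for` helper and the figure of 8 partitions are assumptions made for this example.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 8  # assumed value, chosen only for illustration


def partition_for(hash_key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a hash key to a partition index (hypothetical stand-in for DynamoDB's internal hashing)."""
    digest = hashlib.md5(hash_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def distribution(keys, num_partitions: int = NUM_PARTITIONS) -> Counter:
    """Count how many requests land on each partition."""
    return Counter(partition_for(k, num_partitions) for k in keys)


# Many unique hash keys: requests spread across all partitions.
uniform = distribution(f"user-{i}" for i in range(10_000))

# One hot hash key: every request lands on the same partition.
skewed = distribution("status-OK" for _ in range(10_000))
```

With many distinct keys the per-partition counts come out close to each other, while a single hot key concentrates the whole workload on one partition.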

Massive and Seamless Scale

large number of unique hash keys

+ uniform distribution of workload across hash keys

ready to scale

app’s

Making life easier for developers…

• Developers are freed from:
  • Performance tuning (latency)
  • Automatic 3-way multi-AZ replication
  • Scalability (and scaling operations)
  • Security inspections, patches, upgrades
  • Software upgrades, patches
  • Automatic hardware failover
  • Improving the underlying hardware
  • …and lots of other stuff

Automated Operations

Provisioned Throughput

• Request-based capacity provisioning model
• Throughput is declared and updated via the API or the console
  • CreateTable (foo, reads/sec = 100, writes/sec = 150)
  • UpdateTable (foo, reads/sec = 10000, writes/sec = 4500)
• DynamoDB handles the rest
  • Capacity is reserved and available when needed
  • Scaling up triggers repartitioning and reallocation
  • No impact to performance or availability
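The CreateTable and UpdateTable calls above can be sketched as request parameter sets. The parameter names (`TableName`, `AttributeDefinitions`, `KeySchema`, `ProvisionedThroughput`, `ReadCapacityUnits`, `WriteCapacityUnits`) follow the DynamoDB API; the table name `foo` and the throughput numbers come from the slide, while the hash key attribute `Id` of type Number and the two helper functions are assumptions for illustration. No request is sent here.

```python
def create_table_params(table, hash_key, reads_per_sec, writes_per_sec):
    """Build CreateTable parameters for a table with a single numeric hash key."""
    return {
        "TableName": table,
        "AttributeDefinitions": [{"AttributeName": hash_key, "AttributeType": "N"}],
        "KeySchema": [{"AttributeName": hash_key, "KeyType": "HASH"}],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": reads_per_sec,
            "WriteCapacityUnits": writes_per_sec,
        },
    }


def update_table_params(table, reads_per_sec, writes_per_sec):
    """Build UpdateTable parameters that only change the provisioned throughput."""
    return {
        "TableName": table,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": reads_per_sec,
            "WriteCapacityUnits": writes_per_sec,
        },
    }


# Values from the slide; "Id" is an assumed hash key name.
create_req = create_table_params("foo", "Id", 100, 150)
update_req = update_table_params("foo", 10000, 4500)
```

Dictionaries of this shape could be passed as keyword arguments to an SDK client, for example `client.create_table(**create_req)` with boto3.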

Predictable Performance

WRITES
• Continuously replicated to 3 AZs
• Quorum acknowledgment
• Persisted to disk (custom SSD)

READS
• Strongly or eventually consistent

No trade-off in latency

Durable At Scale


Low Latency At Scale

DynamoDB Customers

“DynamoDB has scaled effortlessly to match our company's explosive growth, doesn't burden our operations staff, and integrates beautifully with our other AWS assets.”

“I love how DynamoDB enables us to provision our desired throughput, and achieve low latency and seamless scale, even with our constantly growing workloads.”

Weatherbug mobile app

Lightning detection & alerting for 40M users/month

Developed and tested in weeks, at “1/20th of the cost of the traditional DB approach”

Super Bowl promotion

Millions of interactions over a relatively short period of time

Built the app in 3 days, from design to production-ready

Fast Development

Cost Effective

“Our previous NoSQL database required almost a full time administrator to run. Now AWS takes care of it.”

“Being optimized at AdRoll means we spend more every month on snacks than we do on DynamoDB – and almost nothing on an ops team”

Save Money Reduce Effort

DynamoDB Primitives

DynamoDB Concepts

table

items

attributes

schema-less: schema is defined per attribute

scalar data types
• number, string, and binary

multi-valued types
• string set, number set, and binary set

hash

hash keys
• mandatory for all items in a table
• key-value access pattern
  • writes: PutItem, UpdateItem, DeleteItem, BatchWriteItem
  • reads: GetItem, BatchGetItem

Hash = Distribution Key

partition 1 .. N

hash keys
• mandatory for all items in a table
• key-value access pattern
• determines data distribution

Hash = Distribution Key

large number of unique hash keys

+ uniform distribution of workload across hash keys

optimal schema design

Range = Query

range

hash

range keys
• model 1:N relationships
• enable rich query capabilities
• composite primary key

• all items for a hash key
• ==, <, >, >=, <=
• “begins with”
• “between”
• sorted results
• counts
• top / bottom N values
• paged responses
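The query capabilities listed above follow from keeping the items of each hash key sorted by range key. The toy in-memory table below is a sketch of that idea, not DynamoDB's implementation; the `RangeTable` class, its methods, and the `device-1` sample data are all hypothetical.

```python
import bisect
from collections import defaultdict


class RangeTable:
    """Toy in-memory table with a composite (hash, range) primary key."""

    def __init__(self):
        # hash key -> list of (range_key, item) pairs kept sorted by range key
        self._buckets = defaultdict(list)

    def put(self, hash_key, range_key, item):
        bucket = self._buckets[hash_key]
        keys = [rk for rk, _ in bucket]
        i = bisect.bisect_left(keys, range_key)
        if i < len(bucket) and bucket[i][0] == range_key:
            bucket[i] = (range_key, item)  # same primary key: overwrite
        else:
            bucket.insert(i, (range_key, item))

    def query(self, hash_key, condition=lambda rk: True, limit=None, forward=True):
        """Return items for one hash key, in range-key order, filtered by a range-key condition."""
        bucket = self._buckets.get(hash_key, [])
        rows = bucket if forward else reversed(bucket)
        matches = [item for rk, item in rows if condition(rk)]
        return matches if limit is None else matches[:limit]


events = RangeTable()
for ts in ["2014-01-03", "2014-01-01", "2014-02-10", "2014-01-02"]:
    events.put("device-1", ts, {"ts": ts})

# "begins with", "between", and "top N" style queries on the range key.
january = events.query("device-1", lambda rk: rk.startswith("2014-01"))
between = events.query("device-1", lambda rk: "2014-01-02" <= rk <= "2014-02-10")
latest = events.query("device-1", limit=1, forward=False)
```

Because each bucket stays sorted, results come back in range-key order, and reversing the iteration gives the most recent item first.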

Index Options

local secondary indexes (LSI)
• alternate range key + same hash key
• index and table data are co-located (same partition)

Projected Attributes
• KEYS_ONLY
• INCLUDE
• ALL





Index Options

global secondary indexes (GSI)

any attribute indexed as new hash or range key

Same projected attribute options
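The three projection options can be illustrated with a small helper that decides which attributes of an item an index entry would carry. The projection type names (`KEYS_ONLY`, `INCLUDE`, `ALL`) are from the slide; the `project` function, the sample item, and the choice of `Category` as the index key are assumptions for this sketch.

```python
def project(item, key_attributes, projection_type, non_key_attributes=()):
    """Return the attributes an index entry would carry for the given projection type."""
    if projection_type == "ALL":
        return dict(item)
    kept = set(key_attributes)  # table key and index key attributes are always kept
    if projection_type == "INCLUDE":
        kept |= set(non_key_attributes)
    elif projection_type != "KEYS_ONLY":
        raise ValueError(f"unknown projection type: {projection_type}")
    return {name: value for name, value in item.items() if name in kept}


book = {"Id": 120, "Title": "Book 120 Title", "Price": 20, "Category": "Book"}
keys = ("Id", "Category")  # table hash key plus an assumed index hash key

keys_only = project(book, keys, "KEYS_ONLY")
include = project(book, keys, "INCLUDE", non_key_attributes=["Title"])
everything = project(book, keys, "ALL")
```

Narrower projections keep index entries small; a query that needs attributes outside the projection has to go back to the table for them.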

• Currently 13 operations in total

Simple API

Manage Tables

• CreateTable

• UpdateTable

• DeleteTable

• DescribeTable

• ListTables

Read and Write Items

• PutItem

• GetItem

• UpdateItem

• DeleteItem

Read and Write Multiple Items

• BatchGetItem

• BatchWriteItem

• Query

• Scan

• Scalar data types
  • String (S): Unicode with UTF-8 binary encoding
  • Number (N): up to 38 digits of precision; can be between 10^-128 and 10^+126
    • Variable-width encoding; can occupy up to 21 bytes
• Multi-valued types
  • String Set (SS)
  • Number Set (NS)
  • Not ordered
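The type codes above appear directly in DynamoDB's low-level request format, where each attribute value is wrapped in a one-entry map keyed by its type and numbers are transmitted as strings. The helper below is a minimal sketch covering only the four types on this slide; the function name `to_attribute_value` is hypothetical, and sets are sorted purely to make the output deterministic, since sets are not ordered.

```python
from decimal import Decimal


def to_attribute_value(value):
    """Encode a Python value as a typed attribute value: S, N, SS, or NS."""
    if isinstance(value, bool):
        raise TypeError("booleans are not among the types covered here")
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (int, Decimal)):
        return {"N": str(value)}  # numbers travel as strings to preserve precision
    if isinstance(value, (set, frozenset)) and value:
        if all(isinstance(v, str) for v in value):
            return {"SS": sorted(value)}
        if all(isinstance(v, (int, Decimal)) and not isinstance(v, bool) for v in value):
            return {"NS": sorted(str(v) for v in value)}
    raise TypeError(f"unsupported value: {value!r}")


item = {
    "Id": to_attribute_value(120),
    "Title": to_attribute_value("Book 120 Title"),
    "Authors": to_attribute_value({"Author12", "Author22"}),
}
```

SDK helpers and object mappers typically perform this wrapping for the caller.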

Data types

• Data is indexed by the primary key
  • Single hash key
    • Targeted towards object persistence
  • Hash-range composite key
    • Sorted collection within a hash bucket
    • Can store a series of events for a given entity
• Automatic partitioning
  • Leading hash key spreads data and workload across partitions
  • Traffic is scaled out and parallelized

Indexing & Partitioning

• Consistent Reads
  • Inventory, shopping cart applications
• Atomic Counters
  • Increment and return the new value in the same operation
• Conditional Writes
  • Expected value before write; fails on mismatch
  • “State machine” use cases
• Sparse Indexes
  • Ideal for sorted lists; fast access to a subset of items
  • Popular: identify recently updated items; top lists; leaderboards
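The semantics of atomic counters and conditional writes can be sketched with an in-memory stand-in. This models the behavior described above and is not the DynamoDB API: the `ToyTable` class, its method names, the local `ConditionalCheckFailed` exception, and the order and page-view data are all hypothetical.

```python
import threading


class ConditionalCheckFailed(Exception):
    """Raised when the expected value does not match the stored value."""


class ToyTable:
    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()

    def put(self, key, item):
        with self._lock:
            self._items[key] = dict(item)

    def increment(self, key, attribute, delta=1):
        """Atomic counter: add delta and return the new value in one operation."""
        with self._lock:
            item = self._items.setdefault(key, {})
            item[attribute] = item.get(attribute, 0) + delta
            return item[attribute]

    def conditional_update(self, key, attribute, expected, new_value):
        """Write new_value only if the attribute currently equals expected."""
        with self._lock:
            current = self._items.get(key, {}).get(attribute)
            if current != expected:
                raise ConditionalCheckFailed(f"expected {expected!r}, found {current!r}")
            self._items.setdefault(key, {})[attribute] = new_value
            return new_value


orders = ToyTable()
orders.put("order-1", {"state": "PLACED"})

# State machine: PLACED -> SHIPPED succeeds once; a stale writer is rejected.
orders.conditional_update("order-1", "state", "PLACED", "SHIPPED")

views = [orders.increment("page-1", "views") for _ in range(3)]
```

The second writer that still expects `PLACED` fails instead of silently overwriting the newer state, which is what makes the pattern usable for state machines.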

Other Features

• Use the API, SDK, CLI, or Management Console to create tables
• Use the AWS SDK to interact with DynamoDB
  • PutItem, UpdateItem, DeleteItem, Query, Scan, etc.

How to use DynamoDB?

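// Assumes $aws is an already configured AWS SDK for PHP service builder.
$client = $aws->get("dynamodb");
$tableName = "ProductCatalog";

// Write one item; formatAttributes() converts PHP values to typed DynamoDB attributes.
$response = $client->putItem(array(
    "TableName" => $tableName,
    "Item" => $client->formatAttributes(array(
        "Id" => 120,
        "Title" => "Book 120 Title",
        "ISBN" => "120-1111111111",
        "Authors" => array("Author12", "Author22"),
        "Price" => 20,
        "Category" => "Book",
        "Dimensions" => "8.5x11.0x.75",
        "InPublication" => 0,
    )),
    // Ask DynamoDB to report the capacity units consumed by this write.
    "ReturnConsumedCapacity" => 'TOTAL'
));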

Interaction: libraries and SDKs, web console, command line

Figure: Writing an item to a table via the PHP SDK

• Higher-Level Programming Interfaces
  • Object Persistence Model for .NET & Java
  • Helper Classes for .NET
  • Transaction Library for Java
• Local DynamoDB available for development and testing
• Dynamic DynamoDB for auto-scaling
• Many community-contributed tools/frameworks

How to use DynamoDB?

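using System.Collections.Generic;
using Amazon.DynamoDBv2.DataModel;

// Maps the Book class to the ProductCatalog table.
[DynamoDBTable("ProductCatalog")]
public class Book
{
    [DynamoDBHashKey]                 // hash key of the table
    public int Id { get; set; }
    public string Title { get; set; }
    // string rather than int: ISBN values such as "120-1111111111" contain hyphens
    public string ISBN { get; set; }
    [DynamoDBProperty("Authors")]     // stored under the attribute name "Authors"
    public List<string> BookAuthors { get; set; }
    [DynamoDBIgnore]                  // not persisted to the table
    public string CoverPage { get; set; }
}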

Figure: .NET class using object persistence model

Use Libraries and Tools

Transactions
• Atomic transactions across multiple items & tables
• Tracks status of ongoing transactions via two tables
  1. Transactions
  2. Pre-transaction snapshots of modified items

Geolocation
• Add location awareness to mobile applications
• Find Yourself: sample app

https://github.com/awslabs




• Third-party library for automating scaling decisions
• Scale up for service levels, scale down for cost
• CloudFormation template for fast deployment
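A scaling decision of the kind described above can be sketched as a simple utilization rule. This is not Dynamic DynamoDB's actual logic or configuration: the `next_capacity` function, the 80% and 30% thresholds, and the 50% step are all assumed values chosen for illustration.

```python
import math


def next_capacity(provisioned, consumed, scale_up_at=0.8, scale_down_at=0.3,
                  step=0.5, minimum=1):
    """Suggest a new provisioned capacity from the current utilization."""
    utilization = consumed / provisioned
    if utilization >= scale_up_at:
        # Scale up for service levels.
        return math.ceil(provisioned * (1 + step))
    if utilization <= scale_down_at:
        # Scale down for cost, but never below the minimum.
        return max(minimum, math.floor(provisioned * (1 - step)))
    return provisioned


busy = next_capacity(provisioned=100, consumed=90)
idle = next_capacity(provisioned=100, consumed=20)
steady = next_capacity(provisioned=100, consumed=50)
```

In practice the consumed figure would come from monitoring data, and the suggested value would be applied through an UpdateTable call.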

Autoscaling with Dynamic DynamoDB

• Disconnected development with full API support
  • No network
  • No usage costs

Develop and Test Locally – DynamoDB Local

Note! DynamoDB Local does not have a durability or availability SLA

m2.4xlarge

DynamoDB Local

do this instead!

Some minor differences from Amazon DynamoDB
• DynamoDB Local ignores your provisioned throughput settings
  • The values that you specify when you call CreateTable and UpdateTable have no effect
• DynamoDB Local does not throttle read or write activity
• The values that you supply for the AWS access key and the Region are only used to name the database file
• Your AWS secret key is ignored but must be specified
  • Using a dummy string of characters is recommended

Develop and Test Locally – DynamoDB Local

• Reports CloudWatch metrics
  • Latency
  • Consumed throughput
  • Errors
  • Throttling
• Alarms can be used to dynamically size throughput

Monitoring

CloudWatch

• DynamoDB can be used for large data ingest
• Redshift can directly load data from DynamoDB (COPY)
• EMR can directly read from DynamoDB by using Hive

Analytics

CREATE EXTERNAL TABLE pc_dynamodb (
    [attributes]
)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ([properties]);

Amazon S3

Redshift

EMR

External Hive table

External Hive table

Hive DynamoDB

CREATE EXTERNAL TABLE pc_s3 (
    [attributes]
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://myawsbucket1/catalog/';

• Provisioned Throughput
  • $0.0065 per hour for every 10 units of Write Capacity
    • 1 unit = 1 write per second for 1 KB items
  • $0.0065 per hour for every 50 units of Read Capacity
    • 1 unit = 1 consistent read per second for 4 KB items
• Storage
  • $0.25 per GB-month of storage
• Free tier!
  • 100 MB storage + 50 writes/sec + 10 reads/sec each month
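The pricing rules above reduce to a short calculation. The sketch below uses the prices and unit sizes quoted on the slide, which may not match current pricing; the 730-hour month, the function names, and the sample workload are assumptions, and the free tier is ignored.

```python
import math

HOURS_PER_MONTH = 730                    # assumed average month length
PRICE_PER_10_WRITE_UNITS_HOUR = 0.0065   # from the slide
PRICE_PER_50_READ_UNITS_HOUR = 0.0065    # from the slide
PRICE_PER_GB_MONTH = 0.25                # from the slide


def write_units(writes_per_sec, item_kb):
    """One write unit covers 1 write/sec of an item up to 1 KB; larger items round up."""
    return writes_per_sec * math.ceil(item_kb / 1)


def read_units(reads_per_sec, item_kb):
    """One read unit covers 1 consistent read/sec of an item up to 4 KB; larger items round up."""
    return reads_per_sec * math.ceil(item_kb / 4)


def monthly_cost(read_capacity, write_capacity, storage_gb, hours=HOURS_PER_MONTH):
    """Estimate the monthly bill in USD for provisioned throughput plus storage."""
    throughput_per_hour = (write_capacity / 10 * PRICE_PER_10_WRITE_UNITS_HOUR
                           + read_capacity / 50 * PRICE_PER_50_READ_UNITS_HOUR)
    return throughput_per_hour * hours + storage_gb * PRICE_PER_GB_MONTH


# Assumed workload: 150 writes/sec of 1 KB items, 100 consistent reads/sec of 4 KB items, 10 GB stored.
w = write_units(150, 1)
r = read_units(100, 4)
cost = monthly_cost(read_capacity=r, write_capacity=w, storage_gb=10)
```

For this assumed workload the estimate comes to roughly 83 USD per month, almost all of it for throughput rather than storage.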

Pricing

Best Practices

• Method
  1. Describe the overall use case; maintain context
  2. Identify the individual access patterns of the use case
  3. Model each access pattern to its own discrete data set
  4. Consolidate data sets into tables and indexes
• Benefits
  • Single table fetch for each query
  • Payloads are minimal for each access

Access Pattern Modeling

• Design for uniform data access across items
  • Partition distribution is based on the hash key
  • The hash key should be well distributed
  • Access frequency should be distributed across different hash keys
• Time Series Pattern
  • Logging
  • Focus only on recent data

Table Best Practices

Hash Key value | Efficiency
User ID, where the application has many users. | Good
Status code, where there are only a few possible status codes. | Bad
Device ID, where even if there are a lot of devices being tracked, one is by far more popular than all the others. | Bad

• Use one-to-many tables instead of large set attributes
  • Break items up into multiple tables
• Use multiple tables to support varied access patterns
  • If you frequently access large items but do not use all attributes, store the smaller, frequently accessed attributes in separate tables
• Compress large attributes
  • Reduces the cost of storage and throughput
• Store large attributes in S3
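Compressing a large attribute before writing it can be sketched with the standard library. The choice of zlib and JSON, the helper names, and the sample description text are assumptions; the compressed bytes would be stored as a binary attribute and decompressed by the application after reading.

```python
import json
import zlib


def compress_attribute(value) -> bytes:
    """Serialize a JSON-compatible value and compress it for storage as a binary attribute."""
    return zlib.compress(json.dumps(value).encode("utf-8"))


def decompress_attribute(blob: bytes):
    """Reverse compress_attribute after reading the binary attribute back."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Repetitive sample text, which compresses well.
description = "A long product description with many repeated phrases. " * 200
raw_size = len(description.encode("utf-8"))

blob = compress_attribute(description)
restored = decompress_attribute(blob)
```

Since capacity units are consumed by item size, a smaller stored item lowers both the storage cost and the throughput needed per request. How much is saved depends on how compressible the data is.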

Item Best Practices

• Avoid sudden bursts of read activity
  • Reduce the page size of scans
  • Isolate scan operations; create separate tables and write to both:
    • Mission-critical table
    • Shadow table
• Take advantage of parallel scans
  • Sequential scans take longer
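A parallel scan splits the table into segments that workers read concurrently; in the Scan API this is expressed with the `Segment` and `TotalSegments` parameters. The sketch below imitates that with an in-memory dictionary. How DynamoDB assigns items to segments is internal, so the CRC32-based `segment_of` helper, the toy table, and the choice of 4 segments are assumptions.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def segment_of(key: str, total_segments: int) -> int:
    """Assign a key to a segment (hypothetical stand-in for DynamoDB's internal split)."""
    return zlib.crc32(key.encode("utf-8")) % total_segments


def scan_segment(table: dict, segment: int, total_segments: int) -> list:
    """Read only the items that belong to one segment."""
    return [item for key, item in table.items()
            if segment_of(key, total_segments) == segment]


def parallel_scan(table: dict, total_segments: int = 4) -> list:
    """Scan all segments concurrently, one worker per segment, and merge the results."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        parts = pool.map(lambda s: scan_segment(table, s, total_segments),
                         range(total_segments))
        return [item for part in parts for item in part]


table = {f"item-{i}": {"id": f"item-{i}"} for i in range(100)}
results = parallel_scan(table, total_segments=4)
```

Every item belongs to exactly one segment, so the merged result contains each item once, in no particular order.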

Query and Scan Best Practices

Quick Poll + Questions?

Thanks for joining!