Learn how Digital Advertising customers are leveraging the integration between Amazon DynamoDB and Amazon Redshift to manage their high scale data, from creation to analysis. In this session, we will describe the three essential ingredients of efficient data flow in the cloud, and introduce a reference architecture that enables customers to meet the demands for low latency and high volume encountered in the Digital Advertising industry. Using existing SQL-based tools and business intelligence systems, you will learn how to gain deeper insight from your data at lower cost. The design principles presented here will be useful to every environment where managing data at scale is a challenge.
Designing for Scale: Three steps to optimal data performance using DynamoDB and Redshift
David Pearson, Business Development
AWS Database Services
[Diagram: the AWS platform — Compute, Storage, Networking, Database, Application Services, Deployment & Administration on the AWS Global Infrastructure — with the database tier highlighted: Amazon RDS, Amazon DynamoDB, Amazon Redshift, Amazon ElastiCache.]
Scalable, high-performance application storage in the cloud
Provision, manage, scale: how much of this EFFORT is differentiated?
Introduction to AWS Big Data Services
• Amazon S3: object storage
• Elastic MapReduce: batch processing
• DynamoDB: real-time transactions
• Redshift: online analysis and reporting
Amazon DynamoDB
NoSQL Database
Predictable performance
Seamless & massive scalability
Fully managed; zero admin
Amazon DynamoDB
Amazon’s Path to DynamoDB
RDBMS → DynamoDB
Amazon DynamoDB
DEVS
OPS
USERS
Fast Application Development
Time to Build New Applications
• Flexible data models • Simple API • High-scale queries • Laptop development
Amazon DynamoDB
DEVS
OPS
USERS
Admin-Free (at any scale)
request-based capacity provisioning model
Provisioned Throughput
Throughput is declared and updated via the API or the console
CreateTable(foo, reads/sec=100, writes/sec=150)
UpdateTable(foo, reads/sec=10000, writes/sec=4500)
DynamoDB handles the rest
Capacity is reserved and available when needed
Scaling-up triggers repartitioning and reallocation
No impact to performance or availability
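The slide's two pseudo-calls map onto the `ProvisionedThroughput` structure in the DynamoDB API. A minimal sketch of the request payloads (plain dicts, no AWS client; the table name `foo` comes from the slide):

```python
# Sketch of the DynamoDB API payloads behind the slide's pseudo-calls.
# These are plain dicts shaped like CreateTable/UpdateTable requests;
# a real client (e.g. boto3) would send them to the service.

def create_table_request(name, reads_per_sec, writes_per_sec):
    """Build a CreateTable-shaped request declaring throughput up front."""
    return {
        "TableName": name,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": reads_per_sec,
            "WriteCapacityUnits": writes_per_sec,
        },
    }

def update_table_request(name, reads_per_sec, writes_per_sec):
    """Build an UpdateTable-shaped request to scale throughput up or down."""
    return {
        "TableName": name,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": reads_per_sec,
            "WriteCapacityUnits": writes_per_sec,
        },
    }

# The slide's two calls:
create = create_table_request("foo", 100, 150)
update = update_table_request("foo", 10000, 4500)
```

DynamoDB repartitions and reallocates behind these calls; the application only ever declares the capacity it needs.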
Amazon DynamoDB
DEVS
OPS
USERS
Durable, Low Latency
WRITES: replicated continuously to 3 AZs; persisted to disk (custom SSD)
READS: strongly or eventually consistent; no latency trade-off
Latest News… DynamoDB Local
• Disconnected development
• Full API support
• Download from http://aws.amazon.com/dynamodb/resources/#testing
“Compared to similar products, DynamoDB provides an amazing feature set, including super low latencies, (literally) push-button scaling, automatic data persistence, and seamless integration with Redshift and other AWS services.” – Peter Bogunovich, RightAction Inc
AD SERVING
[Diagram: a visitor sends an ad request to Ad Servers on EC2, which query the profiles database in DynamoDB and return an ad URL.]
1. Visitor loads a web page
2. Web page issues a request to ad servers on EC2
3. Query to DynamoDB returns the ad to display
4. Link is returned to visitor
cookie table: hash key = userid, range key = timestamp
user-profile table: hash key = userid
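The two schemas above can be modeled in a few lines to show why the hash/range split matters: the hash key (userid) selects a partition, and the range key (timestamp) keeps one user's items sorted, so "latest cookie for this user" is a cheap query. A sketch (in-memory stand-in, not a DynamoDB client; the user IDs and items are made up):

```python
import bisect

# In-memory model of the two tables from the slide:
#   cookie:       hash key = userid, range key = timestamp
#   user-profile: hash key = userid
cookie = {}        # userid -> list of (timestamp, item), kept sorted
user_profile = {}  # userid -> profile item

def put_cookie(userid, timestamp, item):
    """Insert keeping items for one hash key sorted by range key."""
    bisect.insort(cookie.setdefault(userid, []), (timestamp, item))

def latest_cookie(userid):
    """Range-query-style lookup: newest item for one hash key."""
    items = cookie.get(userid, [])
    return items[-1][1] if items else None

# Hypothetical example data:
user_profile["u42"] = {"segment": "sports"}
put_cookie("u42", 1001, {"page": "/a"})
put_cookie("u42", 1007, {"page": "/b"})
```

In DynamoDB the same access pattern is a single-partition query, which is what keeps the look-ups on the ad-serving path fast at any scale.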
Real-time bidding platform
[Diagram: a Bidder on EC2 receives bid requests and returns bid responses, backed by DynamoDB tables for ads, profiles, and queues/buffers.]
Real-time bidding latency budget:
• Bid request network transit: 20 ms
• Decision on best ad and bid price, based on optimization that needs multiple data look-ups: 40 ms
• Bid response network transit: 20 ms
• Contingency time buffer: 20 ms
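The slide's figures add up to a fixed bid window, which is what drives the data-store requirements. A quick check (the assignment of the 40 ms figure to the decision step is one plausible reading of the slide's layout):

```python
# Latency budget for one real-time bid, using the slide's figures.
# Assumption: the largest share (40 ms) goes to the decision step,
# since it needs multiple data look-ups.
budget_ms = {
    "bid request network transit": 20,
    "decision on best ad and bid price": 40,
    "bid response network transit": 20,
    "contingency time buffer": 20,
}

total_ms = sum(budget_ms.values())
# With a 100 ms window and multiple look-ups inside the 40 ms decision
# step, each individual read must complete in a few milliseconds,
# which is the niche DynamoDB fills in this architecture.
```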
1. Ad files are downloaded from CloudFront
2. Impressions captured in logs to S3
[Diagram: the serving flow extended with Amazon CloudFront delivering advertisements from a static repository of files in Amazon S3, with impression logs written to S3.]
[Diagram: full serving architecture: Elastic Load Balancing in front of Ad Servers on EC2 (multi-AZ) backed by the DynamoDB profiles database, plus a second Elastic Load Balancing tier routing click-through requests to Click-through Servers, which write click-through log files.]
Amazon Redshift
Relational data warehouse
Massively parallel
Petabyte scale
Fully managed; zero admin
Amazon Redshift
• Direct-attached storage
• Large data block sizes
• Columnar storage
• Data compression
• Zone maps
Redshift dramatically reduces I/O
Id    Age   State
123   20    CA
345   25    WA
678   40    FL

Row storage lays each record out contiguously (123, 20, CA, ...); column storage lays each column out contiguously (123, 345, 678, ...), so a query that touches one column reads far fewer blocks.
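A toy sketch of the slide's three-row table makes the I/O difference concrete: a query like `SELECT AVG(Age)` touches every field under row storage but only the Age column under column storage.

```python
# Toy illustration of why columnar storage cuts I/O, using the
# slide's table. Counts of values scanned stand in for disk blocks.
rows = [
    {"Id": 123, "Age": 20, "State": "CA"},
    {"Id": 345, "Age": 25, "State": "WA"},
    {"Id": 678, "Age": 40, "State": "FL"},
]

# Row storage: a scan reads every field of every record.
values_read_row = sum(len(r) for r in rows)

# Column storage: read only the single column the query needs.
age_column = [r["Age"] for r in rows]
values_read_col = len(age_column)

avg_age = sum(age_column) / len(age_column)
```

Here column storage reads a third of the values; with wide fact tables the ratio is far larger, and compression and zone maps shrink the I/O further.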
• Load
• Query
• Resize
• Backup
• Restore
Redshift parallelizes and distributes everything
[Diagram: SQL clients / BI tools in the client VPC connect to the Leader Node, which coordinates Compute Nodes (16 TB each) over 10 GigE (HPC); ingestion, backup, and restore run against Amazon S3.]
Start small and grow big
• Extra Large Node (HS1.XL): 3 spindles, 2 TB, 16 GB RAM, 2 cores, 10 GigE
• Eight Extra Large Node (HS1.8XL): 24 spindles, 16 TB, 128 GB RAM, 16 cores, 10 GigE
• Cluster: 2-100 nodes (32 TB – 1.6 PB)
note: nodes not to scale
Monitor query performance
View explain plans
Redshift works with existing BI tools
JDBC/ODBC
Amazon Redshift
More coming soon…
Redshift is Priced to Analyze All Your Data
$0.85 per hour for on-demand (2 TB node)
$999 per TB per year (3-yr reservation)
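A back-of-the-envelope comparison of the two prices on the slide, assuming the $0.85/hour on-demand figure covers a 2 TB node running around the clock:

```python
# Effective per-TB-year cost of on-demand vs. the 3-year reservation,
# using the slide's figures. Assumption: the $0.85/hour node stores 2 TB.
HOURS_PER_YEAR = 24 * 365

on_demand_per_tb_year = 0.85 * HOURS_PER_YEAR / 2   # roughly $3,723
reserved_per_tb_year = 999.0                         # 3-yr reservation

savings_ratio = on_demand_per_tb_year / reserved_per_tb_year
```

Under these assumptions the reservation works out to under a third of the on-demand rate per TB-year, which is what makes "analyze all your data" economical for steady workloads.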
“Amazon Redshift introduces a major opportunity to improve the performance of our real-time reporting, allowing us to run queries up to 50 times faster than our current OLAP solution.” – Niek Sanders, VP Engineering
Realized a 20x – 40x reduction in query times
“Redshift is the real deal”
Analysis
CloudFront
advertisement
impression logs
Static Repository Files
Amazon S3
Profiles Database
EC2 (MAZ)
ad request
ad url
Ad Servers
DynamoDB Elastic Load Balancing
visitor
Amazon Redshift
bid history user history
ETL Click-through Servers
click through log files
click through requests
Elastic Load Balancing
Amazon EMR
updated profiles
impressions
new requests user history
Amazon Redshift
Optimizing with Redshift

Bid Optimization: drive qualified users to advertisers’ sites
• Ad server logs
• Bid history
• 3rd party data
• User history

Cost Optimization: optimize return on advertising expenditure
• Impressions
• User history
• 3rd party data
• Enrichment
1. Describe the full lifecycle of data
Identify data consumption patterns, expected data volumes, and SLAs (latency, availability, durability) at each point on the timeline.
2. Leverage specialized options
• DynamoDB: real-time transaction processing
• Redshift: online reporting and analysis
• EMR: enrichment
• S3: data staging
Three steps to optimal data performance
3. Optimize access patterns
Design database schemas for maximum efficiency.
DynamoDB
» minimize payloads
» separate hot data from cold
Redshift
» good distribution and sort key selection – test as needed
» efficient ingestion (from DynamoDB and S3)
Three steps to optimal data performance
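Step 3 for Redshift can be sketched with a hypothetical table: a distribution key that co-locates rows joined together, a sort key that matches range filters, and bulk ingestion via COPY rather than row-at-a-time inserts. All table, column, and bucket names below are made up for illustration, and the statements are built as strings rather than run:

```python
# Hypothetical Redshift DDL illustrating distribution and sort key
# selection. DISTKEY co-locates a user's rows on one compute node for
# joins; SORTKEY lets zone maps skip blocks on time-range filters.
ddl = """
CREATE TABLE impressions (
    user_id    BIGINT    NOT NULL DISTKEY,
    event_time TIMESTAMP NOT NULL SORTKEY,
    ad_id      BIGINT,
    cost_ppm   INTEGER
);
"""

# COPY loads files from S3 in parallel across the compute nodes,
# which is the efficient ingestion path the slide refers to.
# Credentials elided; the S3 path is a placeholder.
copy_stmt = (
    "COPY impressions FROM 's3://my-bucket/impression-logs/' "
    "CREDENTIALS 'aws_access_key_id=...;aws_secret_access_key=...' "
    "GZIP DELIMITER '\\t';"
)
```

Key choices are workload-dependent, which is why the slide says to test: a skewed distribution key concentrates both storage and query work on one node.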
Resources

DynamoDB
• Best Practices, How-Tos, and Tools: http://aws.amazon.com/dynamodb/resources/
• Download DynamoDB Local: http://aws.amazon.com/dynamodb/resources/#testing

Redshift
• Best practices for loading data: http://docs.aws.amazon.com/redshift/latest/dg/c_loading-data-best-practices.html
• Best practices for designing tables: http://docs.aws.amazon.com/redshift/latest/dg/c_designing-tables-best-practices.html
Questions