Learn how Digital Advertising customers are leveraging the integration between Amazon DynamoDB and Amazon Redshift to manage their high scale data, from creation to analysis. In this session, we will describe the three essential ingredients of efficient data flow in the cloud, and introduce a reference architecture that enables customers to meet the demands for low latency and high volume encountered in the Digital Advertising industry. Using existing SQL-based tools and business intelligence systems, you will learn how to gain deeper insight from your data at lower cost. The design principles presented here will be useful to every environment where managing data at scale is a challenge.
Designing for Scale: Three steps to optimal data performance using DynamoDB and Redshift
David Pearson, Business Development
AWS Database Services
[Diagram: the AWS platform — Compute, Storage, Networking, Database, Application Services, Deployment & Administration on the AWS Global Infrastructure — with the database tier highlighted: Amazon RDS, Amazon DynamoDB, Amazon Redshift, Amazon ElastiCache.]
Scalable, high-performance application storage in the cloud
Provision, manage, scale: how much of this EFFORT is differentiated?
Introduction to AWS Big Data Services
• Amazon S3: object storage
• Elastic MapReduce: batch processing
• DynamoDB: real-time transactions
• Redshift: online analysis and reporting
Amazon DynamoDB
NoSQL Database
Predictable performance
Seamless & massive scalability
Fully managed; zero admin
Amazon DynamoDB
Amazon’s Path to DynamoDB
RDBMS → DynamoDB
Amazon DynamoDB
DEVS
OPS
USERS
Fast Application Development
Time to Build New Applications
• Flexible data models • Simple API • High-scale queries • Laptop development
Amazon DynamoDB
DEVS
OPS
USERS
Admin-Free (at any scale)
request-based capacity provisioning model
Provisioned Throughput
Throughput is declared and updated via the API or the console
CreateTable(foo, reads/sec=100, writes/sec=150)
UpdateTable(foo, reads/sec=10000, writes/sec=4500)
DynamoDB handles the rest
Capacity is reserved and available when needed
Scaling-up triggers repartitioning and reallocation
No impact to performance or availability
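The slide's two pseudo-calls map onto the `ProvisionedThroughput` structure in the DynamoDB API. A minimal sketch of the request payloads (plain dicts, no AWS client; the table name `foo` comes from the slide):

```python
# Sketch of the DynamoDB API payloads behind the slide's pseudo-calls.
# These are plain dicts shaped like CreateTable/UpdateTable requests;
# a real client (e.g. boto3) would send them to the service.

def create_table_request(name, reads_per_sec, writes_per_sec):
    """Build a CreateTable-shaped request declaring throughput up front."""
    return {
        "TableName": name,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": reads_per_sec,
            "WriteCapacityUnits": writes_per_sec,
        },
    }

def update_table_request(name, reads_per_sec, writes_per_sec):
    """Build an UpdateTable-shaped request to scale throughput up or down."""
    return {
        "TableName": name,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": reads_per_sec,
            "WriteCapacityUnits": writes_per_sec,
        },
    }

# The slide's two calls:
create = create_table_request("foo", 100, 150)
update = update_table_request("foo", 10000, 4500)
```

DynamoDB repartitions and reallocates behind these calls; the application only ever declares the capacity it needs.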
Amazon DynamoDB
DEVS
OPS
USERS
Durable, Low Latency
WRITES: replicated continuously to 3 AZs; persisted to disk (custom SSD)
READS: strongly or eventually consistent; no latency trade-off
Latest News… DynamoDB Local
• Disconnected development
• Full API support
• Download from http://aws.amazon.com/dynamodb/resources/#testing
“Compared to similar products, DynamoDB provides an amazing feature set, including super low latencies, (literally) push-button scaling, automatic data persistence, and seamless integration with Redshift and other AWS services.” – Peter Bogunovich, RightAction Inc
AD SERVING
[Diagram: a visitor sends an ad request to Ad Servers on EC2, which query the profiles database in DynamoDB and return an ad URL.]
1. Visitor loads a web page
2. Web page issues a request to ad servers on EC2
3. Query to DynamoDB returns the ad to display
4. Link is returned to visitor
cookie table: hash key = userid, range key = timestamp
user-profile table: hash key = userid
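The two schemas above can be modeled in a few lines to show why the hash/range split matters: the hash key (userid) selects a partition, and the range key (timestamp) keeps one user's items sorted, so "latest cookie for this user" is a cheap query. A sketch (in-memory stand-in, not a DynamoDB client; the user IDs and items are made up):

```python
import bisect

# In-memory model of the two tables from the slide:
#   cookie:       hash key = userid, range key = timestamp
#   user-profile: hash key = userid
cookie = {}        # userid -> list of (timestamp, item), kept sorted
user_profile = {}  # userid -> profile item

def put_cookie(userid, timestamp, item):
    """Insert keeping items for one hash key sorted by range key."""
    bisect.insort(cookie.setdefault(userid, []), (timestamp, item))

def latest_cookie(userid):
    """Range-query-style lookup: newest item for one hash key."""
    items = cookie.get(userid, [])
    return items[-1][1] if items else None

# Hypothetical example data:
user_profile["u42"] = {"segment": "sports"}
put_cookie("u42", 1001, {"page": "/a"})
put_cookie("u42", 1007, {"page": "/b"})
```

In DynamoDB the same access pattern is a single-partition query, which is what keeps the look-ups on the ad-serving path fast at any scale.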
Real-time bidding platform
[Diagram: a Bidder on EC2 receives bid requests and returns bid responses, backed by DynamoDB tables for ads, profiles, and queues/buffers.]
Real-time bidding latency budget:
• Bid request network transit: 20 ms
• Decision on best ad and bid price, based on optimization that needs multiple data look-ups: 40 ms
• Bid response network transit: 20 ms
• Contingency time buffer: 20 ms
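The slide's figures add up to a fixed bid window, which is what drives the data-store requirements. A quick check (the assignment of the 40 ms figure to the decision step is one plausible reading of the slide's layout):

```python
# Latency budget for one real-time bid, using the slide's figures.
# Assumption: the largest share (40 ms) goes to the decision step,
# since it needs multiple data look-ups.
budget_ms = {
    "bid request network transit": 20,
    "decision on best ad and bid price": 40,
    "bid response network transit": 20,
    "contingency time buffer": 20,
}

total_ms = sum(budget_ms.values())
# With a 100 ms window and multiple look-ups inside the 40 ms decision
# step, each individual read must complete in a few milliseconds,
# which is the niche DynamoDB fills in this architecture.
```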
1. Ad files are downloaded from CloudFront
2. Impressions captured in logs to S3
[Diagram: the serving flow extended with Amazon CloudFront delivering advertisements from a static repository of files in Amazon S3, with impression logs written to S3.]
[Diagram: full serving architecture: Elastic Load Balancing in front of Ad Servers on EC2 (multi-AZ) backed by the DynamoDB profiles database, plus a second Elastic Load Balancing tier routing click-through requests to Click-through Servers, which write click-through log files.]
Amazon Redshift
Relational data warehouse
Massively parallel
Petabyte scale
Fully managed; zero admin
Amazon Redshift
• Direct-attached storage
• Large data block sizes
• Columnar storage
• Data compression
• Zone maps
Redshift dramatically reduces I/O
Id    Age   State
123   20    CA
345   25    WA
678   40    FL

Row storage lays each record out contiguously (123, 20, CA, ...); column storage lays each column out contiguously (123, 345, 678, ...), so a query that touches one column reads far fewer blocks.
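A toy sketch of the slide's three-row table makes the I/O difference concrete: a query like `SELECT AVG(Age)` touches every field under row storage but only the Age column under column storage.

```python
# Toy illustration of why columnar storage cuts I/O, using the
# slide's table. Counts of values scanned stand in for disk blocks.
rows = [
    {"Id": 123, "Age": 20, "State": "CA"},
    {"Id": 345, "Age": 25, "State": "WA"},
    {"Id": 678, "Age": 40, "State": "FL"},
]

# Row storage: a scan reads every field of every record.
values_read_row = sum(len(r) for r in rows)

# Column storage: read only the single column the query needs.
age_column = [r["Age"] for r in rows]
values_read_col = len(age_column)

avg_age = sum(age_column) / len(age_column)
```

Here column storage reads a third of the values; with wide fact tables the ratio is far larger, and compression and zone maps shrink the I/O further.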
• Load
• Query
• Resize
• Backup
• Restore
Redshift parallelizes and distributes everything
[Diagram: SQL clients / BI tools in the client VPC connect to the Leader Node, which coordinates Compute Nodes (16 TB each) over 10 GigE (HPC); ingestion, backup, and restore run against Amazon S3.]
Start small and grow big
• Extra Large Node (HS1.XL): 3 spindles, 2 TB, 16 GB RAM, 2 cores, 10 GigE
• Eight Extra Large Node (HS1.8XL): 24 spindles, 16 TB, 128 GB RAM, 16 cores, 10 GigE
• Cluster: 2-100 nodes (32 TB – 1.6 PB)
note: nodes not to scale
Monitor query performance
View explain plans
Redshift works with existing BI tools
JDBC/ODBC
Amazon Redshift
More coming soon…
Redshift is Priced to Analyze All Your Data
$0.85 per hour for on-demand (2 TB node)
$999 per TB per year (3-yr reservation)
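A back-of-the-envelope comparison of the two prices on the slide, assuming the $0.85/hour on-demand figure covers a 2 TB node running around the clock:

```python
# Effective per-TB-year cost of on-demand vs. the 3-year reservation,
# using the slide's figures. Assumption: the $0.85/hour node stores 2 TB.
HOURS_PER_YEAR = 24 * 365

on_demand_per_tb_year = 0.85 * HOURS_PER_YEAR / 2   # roughly $3,723
reserved_per_tb_year = 999.0                         # 3-yr reservation

savings_ratio = on_demand_per_tb_year / reserved_per_tb_year
```

Under these assumptions the reservation works out to under a third of the on-demand rate per TB-year, which is what makes "analyze all your data" economical for steady workloads.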
“Amazon Redshift introduces a major opportunity to improve the performance of our real-time reporting, allowing us to run queries up to 50 times faster than our current OLAP solution.” – Niek Sanders, VP Engineering
Realized a 20x – 40x reduction in query times
“Redshift is the real deal”
Analysis
CloudFront
advertisement
impression logs
Static Repository Files
Amazon S3
Profiles Database
EC2 (MAZ)
ad request
ad url
Ad Servers
DynamoDB Elastic Load Balancing
visitor
Amazon Redshift
bid history user history
ETL Click-through Servers
click through log files
click through requests
Elastic Load Balancing
Amazon EMR
updated profiles
impressions
new requests user history
Amazon Redshift
Optimizing with Redshift

Bid Optimization: drive qualified users to advertisers’ sites
• Ad server logs
• Bid history
• 3rd party data
• User history

Cost Optimization: optimize return on advertising expenditure
• Impressions
• User history
• 3rd party data
• Enrichment
1. Describe the full lifecycle of data
Identify data consumption patterns, expected data volumes, and SLAs (latency, availability, durability) at each point on the timeline.
2. Leverage specialized options
• DynamoDB: real-time transaction processing
• Redshift: online reporting and analysis
• EMR: enrichment
• S3: data staging
Three steps to optimal data performance
3. Optimize access patterns
Design database schemas for maximum efficiency.
DynamoDB
» minimize payloads
» separate hot data from cold
Redshift
» good distribution and sort key selection – test as needed
» efficient ingestion (from DynamoDB and S3)
Three steps to optimal data performance
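Step 3 for Redshift can be sketched with a hypothetical table: a distribution key that co-locates rows joined together, a sort key that matches range filters, and bulk ingestion via COPY rather than row-at-a-time inserts. All table, column, and bucket names below are made up for illustration, and the statements are built as strings rather than run:

```python
# Hypothetical Redshift DDL illustrating distribution and sort key
# selection. DISTKEY co-locates a user's rows on one compute node for
# joins; SORTKEY lets zone maps skip blocks on time-range filters.
ddl = """
CREATE TABLE impressions (
    user_id    BIGINT    NOT NULL DISTKEY,
    event_time TIMESTAMP NOT NULL SORTKEY,
    ad_id      BIGINT,
    cost_ppm   INTEGER
);
"""

# COPY loads files from S3 in parallel across the compute nodes,
# which is the efficient ingestion path the slide refers to.
# Credentials elided; the S3 path is a placeholder.
copy_stmt = (
    "COPY impressions FROM 's3://my-bucket/impression-logs/' "
    "CREDENTIALS 'aws_access_key_id=...;aws_secret_access_key=...' "
    "GZIP DELIMITER '\\t';"
)
```

Key choices are workload-dependent, which is why the slide says to test: a skewed distribution key concentrates both storage and query work on one node.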
Resources

DynamoDB
• Best Practices, How-Tos, and Tools: http://aws.amazon.com/dynamodb/resources/
• Download DynamoDB Local: http://aws.amazon.com/dynamodb/resources/#testing

Redshift
• Best practices for loading data: http://docs.aws.amazon.com/redshift/latest/dg/c_loading-data-best-practices.html
• Best practices for designing tables: http://docs.aws.amazon.com/redshift/latest/dg/c_designing-tables-best-practices.html
Questions