Scaling your Database in the Cloud, Spring 2011. Presented by: Cory Isaacson, CEO, CodeFutures Corporation. http://www.dbshards.com

Scaling SQL and NoSQL Databases in the Cloud


Page 1: Scaling SQL and NoSQL Databases in the Cloud

Scaling your Database in the Cloud

Spring 2011

Presented by:
Cory Isaacson, CEO
CodeFutures Corporation
http://www.dbshards.com


Page 2: Scaling SQL and NoSQL Databases in the Cloud

Introduction

Who I am
  Cory Isaacson, CEO of CodeFutures
  Providers of dbShards
  Author of Software Pipelines

Partnerships: Rightscale
  The leading Cloud Management Platform

Leaders in database scalability, performance, and high-availability for the cloud…
  …based on real-world experience with dozens of cloud-based applications
  …social networking, gaming, data collection, mobile, analytics

Objective is to provide useful experience you can apply to scaling (and managing) your database tier…
  …especially for high volume applications

Page 3: Scaling SQL and NoSQL Databases in the Cloud

Challenges of cloud computing

Cloud provides highly attractive service environment
  Flexible, scales with need (up or down)
  No need for dedicated IT staff, fixed facility costs
  Pay-as-you-go model

Cloud services occasionally fail
  Partial network outages
  Server failures…
    …by their nature cloud servers are “transient”
  Disk volume issues

Cloud-based resources are constrained
  CPU
  I/O Rates…
    …the “Cloud I/O Barrier”

Page 4: Scaling SQL and NoSQL Databases in the Cloud

Typical Application Architecture

Page 5: Scaling SQL and NoSQL Databases in the Cloud

Scaling in the Cloud

Scaling Load Balancers is easy
  Stateless routing to app server
  Can add redundant Load Balancers if needed
  DNS round-robin or intelligent routing for larger sites
  If one goes down…
    …failover to another

Scaling Application Servers is easy
  Stateless
  Sessions can easily transition to another server
  Add or remove servers as need dictates
  If one goes down…
    …failover to another

Page 6: Scaling SQL and NoSQL Databases in the Cloud

Scaling in the Cloud

Scaling the Database tier is hard
  “Stateful” by definition (and necessity…)
  Large, integrated data sets…
    10s of GBs to TBs (or more…)
    Difficult to move, reload
  I/O dependent…
    …adversely affected by cloud service failures
    …and slow cloud I/O
  If one goes down…
    …ouch!

Databases form the “last mile” of true application scalability
  Initially simple optimization produces the best result
  Implement a follow-on scalability strategy for long-term performance goals…
    …plus a high-availability strategy is a must

Page 7: Scaling SQL and NoSQL Databases in the Cloud

All CPUs wait at the same speed…

The Cloud I/O Barrier

Page 8: Scaling SQL and NoSQL Databases in the Cloud

More Database scalability challenges

Databases have many other challenges that limit scalability
  ACID compliance…
    …especially Consistency
    …user contention

Operational challenges
  Failover
    Planned, unplanned
  Maintenance
    Index rebuild
    Restore
    Space reclamation
  Lifecycle
    Reliable Backup/Restore
    Monitoring
    Application Updates
    Management

Page 9: Scaling SQL and NoSQL Databases in the Cloud

Database slowdown is not linear…

[Figure: Database Load Curve. X axis: Data File (GB). Y axis: Load Time. Series: Time, with an exponential trend line.]

GB      Load Time (Min)
0.9     1
1.3     2.5
3.5     11.7
39.0    10 days…
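The load-time figures can be turned into a per-gigabyte rate to make the non-linearity explicit. The following is a minimal sketch, assuming the "10 days…" entry is read as exactly 10 × 24 × 60 minutes (the slide gives only the approximate figure); the function and variable names are illustrative.

```python
# Load-time figures from the slide: (data file size in GB, load time in minutes).
# Assumption: "10 days…" is taken as exactly 10 * 24 * 60 minutes.
LOAD_TIMES = [
    (0.9, 1.0),
    (1.3, 2.5),
    (3.5, 11.7),
    (39.0, 10 * 24 * 60),
]


def minutes_per_gb(samples):
    """Return load time per GB for each sample; a linear system would keep this flat."""
    return [minutes / gb for gb, minutes in samples]


if __name__ == "__main__":
    for (gb, minutes), rate in zip(LOAD_TIMES, minutes_per_gb(LOAD_TIMES)):
        print(f"{gb:5.1f} GB  {minutes:8.1f} min  {rate:7.2f} min/GB")
```

The cost per GB rises at every step instead of staying constant, which is the behaviour the slide title describes.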

Page 10: Scaling SQL and NoSQL Databases in the Cloud

Challenges apply to all types of databases

Traditional RDBMS (MySQL, Postgres, Oracle…)
  I/O bound
  Multi-user, lock contention
  High-availability
  Lifecycle management

NoSQL Databases (In-memory, Caching, Document…)
  Reliability, High-availability
  Limits of a single server…
    …and a single thread
  Data dumps to disk
  Lifecycle Management

No matter what the technology, big databases are hard to manage…
  …elastic scaling is a real challenge
  …degradation from growth in size and volume is a certainty

Application-specific database requirements add to the challenge
  …database design is key…
  …balance performance vs. convenience vs. data size

Page 11: Scaling SQL and NoSQL Databases in the Cloud

The Laws of Databases

Law #1: Small Databases are fast…
Law #2: Big Databases are slow…
Law #3: Keep databases small

Page 12: Scaling SQL and NoSQL Databases in the Cloud

What is the answer?

Database sharding is the only effective method for achieving scale, elasticity, reliability and easy management… …regardless of your database technology

Page 13: Scaling SQL and NoSQL Databases in the Cloud

What is Database Sharding?

“Horizontal partitioning is a database design principle whereby rows of a database table are held separately... Each partition forms part of a shard, which may in turn be located on a separate database server or physical location.” (Wikipedia)

Page 14: Scaling SQL and NoSQL Databases in the Cloud

The key to Database Sharding…

Page 15: Scaling SQL and NoSQL Databases in the Cloud

Database Sharding Architecture

Page 16: Scaling SQL and NoSQL Databases in the Cloud

Database Sharding Architecture

Page 17: Scaling SQL and NoSQL Databases in the Cloud

Database Sharding… the results

Page 18: Scaling SQL and NoSQL Databases in the Cloud

Why does Database Sharding work?

Maximize CPU/Memory per database instance…
  …as compared to database size

Reduce the size of index trees
  Speeds writes dramatically

No contention between servers
  Locking, disk, memory, CPU

Allows for intelligent parallel processing…
  …Go Fish queries across shards

Keep CPUs busy and productive

Page 19: Scaling SQL and NoSQL Databases in the Cloud

Breaking the Cloud I/O Barrier

Page 20: Scaling SQL and NoSQL Databases in the Cloud

Database types

Traditional RDBMS
  Proven SQL language…
    …although ORM can change that
  Durable, transactional, dependable…
    …perceived as inflexible

NoSQL
  Typically in-memory…
    …the real speed advantage
  Various flavors…
    Key/Value Store
    Document database
    Hybrid
  Store complex documents using a key
  In-memory and disk based

Page 21: Scaling SQL and NoSQL Databases in the Cloud

Black box vs. Relational Sharding

Both utilize sharding on a data key…
  …typically modulus on a value or consistent hash

Black box sharding is automatic…
  …attempts to evenly shard across all available servers
  …no developer visibility or control
  …can work acceptably for simple, non-indexed NoSQL data stores
  …easily supports single-row/object results

Relational sharding is defined by the developer…
  …selective sharding of large data
  …explicit developer control and visibility
  …tunable as the database grows and matures
  …more efficient for result sets and searchable queries
  …preserves data relationships
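The two key-to-shard mappings named on this slide, modulus on a value and consistent hash, can be sketched as follows. This is an illustrative sketch only, not the dbShards implementation: the shard names, the choice of MD5 as a stable hash, and the 64 ring points per shard are all assumptions made for the example.

```python
import bisect
import hashlib


def stable_hash(value):
    """Hash a key to an integer that is the same on every run and every machine."""
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16)


def modulus_shard(shard_key, shard_count):
    """Modulus sharding: an integer shard key maps to shard_key % shard_count."""
    return shard_key % shard_count


class ConsistentHashRing:
    """Consistent hashing: each shard owns several points on a hash ring."""

    def __init__(self, shard_names, points_per_shard=64):
        # Sorted list of (ring position, shard name) pairs.
        self._ring = sorted(
            (stable_hash(f"{name}#{i}"), name)
            for name in shard_names
            for i in range(points_per_shard)
        )
        self._positions = [position for position, _ in self._ring]

    def shard_for(self, shard_key):
        """Return the shard owning the first ring point at or after the key's hash."""
        index = bisect.bisect_left(self._positions, stable_hash(shard_key))
        # Wrap around to the first point when the hash is past the last one.
        return self._ring[index % len(self._ring)][1]
```

With modulus sharding, changing the shard count remaps most keys. With the ring, adding a shard only moves keys onto the new shard, which is why consistent hashing is often preferred when shards are added over time.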

Page 22: Scaling SQL and NoSQL Databases in the Cloud

How Database Sharding works for NoSQL

Page 23: Scaling SQL and NoSQL Databases in the Cloud

Relational Sharding

Global Tables

Shard-Tree Root Table

Shard-Tree Child Tables

Page 24: Scaling SQL and NoSQL Databases in the Cloud

How Relational Sharding works

Page 25: Scaling SQL and NoSQL Databases in the Cloud

How Relational Sharding works
Shard key recognition in SQL

SELECT * FROM customer
WHERE customer_id = 1234

INSERT INTO customer
(customer_id, first_name, last_name, addr_line1,…)
VALUES
(2345, 'John', 'Jones', '123 B Street',…)

UPDATE customer
SET addr_line1 = '456 C Avenue'
WHERE customer_id = 4567
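Shard key recognition can be illustrated with a small routing function that finds the customer_id value in a statement and maps it to a shard. This is a rough sketch under stated assumptions: a regular expression stands in for a real SQL parser, the shard count of 4 and modulus routing are hypothetical, and the naive comma split would break on string values that contain commas.

```python
import re

SHARD_KEY = "customer_id"   # shard key column used in the slide's examples
SHARD_COUNT = 4             # hypothetical number of shards

# Matches "customer_id = <integer>" anywhere in the statement.
_WHERE_KEY = re.compile(rf"\b{SHARD_KEY}\s*=\s*(\d+)", re.IGNORECASE)
# Matches "INSERT INTO table (cols) VALUES (vals)".
_INSERT = re.compile(
    r"INSERT\s+INTO\s+\w+\s*\((?P<cols>[^)]*)\)\s*VALUES\s*\((?P<vals>[^)]*)\)",
    re.IGNORECASE,
)


def find_shard_key(sql):
    """Return the integer shard key found in the statement, or None."""
    insert = _INSERT.search(sql)
    if insert:
        columns = [c.strip().lower() for c in insert.group("cols").split(",")]
        values = [v.strip() for v in insert.group("vals").split(",")]
        if SHARD_KEY in columns:
            index = columns.index(SHARD_KEY)
            if index < len(values) and values[index].isdigit():
                return int(values[index])
        return None
    match = _WHERE_KEY.search(sql)
    return int(match.group(1)) if match else None


def route(sql):
    """Return the shard number for the statement, or None if no shard key was found."""
    key = find_shard_key(sql)
    return None if key is None else key % SHARD_COUNT
```

A statement for which `route` returns None carries no shard key and would have to be sent to every shard.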

Page 26: Scaling SQL and NoSQL Databases in the Cloud

What about Cross-Shard result sets?

Page 27: Scaling SQL and NoSQL Databases in the Cloud

Cross-shard result set example
Go Fish (no shard key)

SELECT country_id, count(*) FROM customer
GROUP BY country_id
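A Go Fish query of this kind can be sketched by running the same GROUP BY on every shard and adding up the partial counts. In this sketch, in-memory SQLite databases stand in for the shards and the sample rows are invented for illustration; a real system would query the shards in parallel rather than in a loop.

```python
import sqlite3
from collections import Counter


def make_shard(rows):
    """Create an in-memory SQLite database standing in for one shard."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, country_id INTEGER)"
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?)", rows)
    return conn


def go_fish_count_by_country(shards):
    """Run the GROUP BY on every shard, then add up the partial counts."""
    query = "SELECT country_id, count(*) FROM customer GROUP BY country_id"
    totals = Counter()
    for conn in shards:
        for country_id, partial_count in conn.execute(query):
            totals[country_id] += partial_count
    return dict(totals)


# Two hypothetical shards holding (customer_id, country_id) rows.
shards = [
    make_shard([(1, 10), (3, 20), (5, 10)]),
    make_shard([(2, 10), (4, 30)]),
]
result = go_fish_count_by_country(shards)
```

Counts merge by simple addition. Other aggregates need more care: an average, for example, has to be rebuilt from per-shard sums and counts.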

Page 28: Scaling SQL and NoSQL Databases in the Cloud

More on Cross-Shard result sets…

Black Box approach requires “scatter gather” for multi-row/object result set…
  …common with NoSQL engines
  …forces use of denormalized lists
  …must be maintained by developer/application code
  …Map Reduce processing helps with this (non-realtime)

Relational sharding provides access to meaningful result sets of related data…
  …aggregation, sort easier to perform
  …logical search operations more natural
  …related rows are stored together

Page 29: Scaling SQL and NoSQL Databases in the Cloud

Make your Application Shard-Safe

1 For sharding to be efficient, every sharded table should include a shard key.

2 SELECT statements should include a shard key in the WHERE clause where possible, so that the query goes directly to a single shard rather than being executed against multiple shards in parallel.

3 INSERT, UPDATE and DELETE statements must include a shard key (in the inserted values or the WHERE clause) so that the data is written to the correct shard.

4 Avoid cross-shard inner joins. For example, joining one customer to another is impractical because the two customers may be in different shards. This can be accomplished, but requires complex logic and can adversely affect performance.
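Rules 2 and 3 can be expressed as a simple check applied to each statement before it is sent to the database. The sketch below is a crude, regex-based illustration rather than a real SQL parser: the function names and the three result labels are hypothetical, and rule 4 (cross-shard joins) is not covered.

```python
import re


def has_shard_key(sql, shard_key):
    """Crude check: key in the INSERT column list, or compared with '=' after WHERE."""
    key = re.escape(shard_key)
    if re.match(r"\s*INSERT\b", sql, re.IGNORECASE):
        # Only look at the part before VALUES, i.e. the column list.
        head = re.split(r"\bVALUES\b", sql, maxsplit=1, flags=re.IGNORECASE)[0]
        return re.search(rf"\b{key}\b", head, re.IGNORECASE) is not None
    parts = re.split(r"\bWHERE\b", sql, maxsplit=1, flags=re.IGNORECASE)
    if len(parts) < 2:
        return False
    return re.search(rf"\b{key}\s*=", parts[1], re.IGNORECASE) is not None


def classify(sql, shard_key="customer_id"):
    """Return 'single-shard', 'all-shards' (rule 2) or 'rejected' (rule 3)."""
    verb = sql.split(None, 1)[0].upper()
    if has_shard_key(sql, shard_key):
        return "single-shard"
    # A SELECT without a shard key can still run, but on every shard.
    # A write without a shard key has no correct destination.
    return "all-shards" if verb == "SELECT" else "rejected"
```

Running a check like this in a test suite is one way to find statements that are not shard-safe before the database is actually sharded.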

Page 30: Scaling SQL and NoSQL Databases in the Cloud

What about High-Availability?

Can you afford to take your databases offline:
  …for scheduled maintenance?
  …for unplanned failure?
  …can you accept some lost transactions?

By definition Database Sharding adds failure points to the data tier

A proven High-Availability strategy is a must…
  …system outages…especially in the cloud
  …planned maintenance…a necessity for all database engines

Page 31: Scaling SQL and NoSQL Databases in the Cloud

Traditional asynchronous replication is inadequate…

Replication “Lag” can result in lost transactions

No rapid failover for continuous operation

Maintenance difficult in production environments

Used by many NoSQL engines

Page 32: Scaling SQL and NoSQL Databases in the Cloud

Database Sharding and High-Availability

Need a minimum of 2 active-active instances per shard…
  …3 active-active instances for in-memory databases

Support for…
  …fail-down
  …failover
  …maintenance
  …automated backup/restore
  …monitoring
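The idea of several active-active instances per shard, with fail-down and failover, can be illustrated with a toy in-memory model. This is a deliberately simplified sketch, not how dbShards or any particular engine implements replication: both classes are hypothetical, writes are applied to each healthy instance in turn, and re-synchronising an instance that comes back online is not handled.

```python
class ShardInstance:
    """Stand-in for one database instance of a shard (hypothetical, in-memory)."""

    def __init__(self, name):
        self.name = name
        self.rows = {}
        self.healthy = True


class ReplicatedShard:
    """One shard backed by several active-active instances."""

    def __init__(self, instances):
        if len(instances) < 2:
            raise ValueError("need at least 2 instances per shard")
        self.instances = list(instances)

    def _healthy(self):
        live = [i for i in self.instances if i.healthy]
        if not live:
            raise RuntimeError("no healthy instance left for this shard")
        return live

    def write(self, key, row):
        """Apply the write to every healthy instance before reporting success."""
        for instance in self._healthy():
            instance.rows[key] = row

    def read(self, key):
        """Read from the first healthy instance; failover means using the next one."""
        return self._healthy()[0].rows.get(key)
```

When one instance is marked unhealthy the shard keeps serving reads and writes from the remaining one (fail-down). Because every acknowledged write already reached each healthy instance, no transaction is lost at the moment of failover, in contrast to the replication lag described on the previous slide.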

Page 33: Scaling SQL and NoSQL Databases in the Cloud

Database Sharding…elastic shards

Page 34: Scaling SQL and NoSQL Databases in the Cloud

Database Sharding…elastic shards

Expand the number of shards…
  …divide a single shard into N new shards
  …or rebalance existing shards across N new shards

Contract the number of shards…
  …consolidate N shards into a single shard

Due to large data sizes, this takes time…
  …regardless of data architecture

Requires High-Availability to ensure no downtime…
  …perform scaling on live replica in background
  …fail back to active instances
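Dividing a single shard into N new shards can be sketched for the modulus case. The sketch assumes integer shard keys and modulus routing, and relies on the fact that if key % old_count equals the old shard number, then key % (old_count × factor) can only be the old shard number plus a multiple of old_count. The function name and the dictionary-based shard are illustrative; how unsplit shards are routed afterwards is outside the sketch.

```python
def split_shard(rows, old_shard, old_count, factor=2):
    """Divide one modulus shard into `factor` new shards.

    rows maps integer shard key -> row for the shard being split.
    Returns a dict mapping new shard number -> rows for that shard.
    """
    new_count = old_count * factor
    # The only shard numbers that keys of the old shard can land on.
    new_shards = {old_shard + n * old_count: {} for n in range(factor)}
    for key, row in rows.items():
        new_shards[key % new_count][key] = row
    return new_shards
```

As the slide notes, copying the rows takes time for large shards, so in practice the split would run against a replica in the background while the active instances keep serving traffic.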

Page 35: Scaling SQL and NoSQL Databases in the Cloud

Database Sharding…the future

Ability to leverage proven database engines…
  …SQL
  …NoSQL
  …Caching

Hybrid database infrastructures using the best of all technologies
  In-memory and disk-based solutions co-existing

Allow developers to select the best database engine for a given set of application requirements…
  …seamless context-switching within the application
  …use the API of choice

Improved management…
  …monitoring
  …configuration “on-the-fly”
  …dynamic elastic shards based on demand

Page 36: Scaling SQL and NoSQL Databases in the Cloud

Database Sharding summary

Database Sharding is the only effective tool for scaling your database tier in the cloud…
  …or anywhere

Use Database Sharding for any type of engine…
  …SQL
  …NoSQL
  …Cache

Use the best engine for a given application requirement…
  …avoid the “one-size-fits-all” trap
  …each engine has its strengths to capitalize on

Ensure your High-Availability is proven and bulletproof…
  …especially critical on the cloud
  …must support failure and maintenance

Page 37: Scaling SQL and NoSQL Databases in the Cloud

Questions/Answers

Cory Isaacson
CodeFutures Corporation
[email protected]
http://www.dbshards.com