What Every Developer Should Know About Database Scalability

Database scalability

Jonathan Ellis

Classic RDBMS persistence

Disk is the new tape*

• ~8ms to seek– ~4ms on expensive 15k rpm disks

What scaling means

Performance

• Latency

• Throughput

Two kinds of operations

• Reads

• Writes

Caching

• Memcached

• Ehcache

• etc

Cache invalidation

• Implicit

• Explicit

Cache set invalidation

get_cached_cart(cart=13, offset=10, limit=10)

get('cart:13:10:10')

Set invalidation 2

prefix = get('cart_prefix:13')

get(prefix + ':10:10')

del('cart_prefix:13')

http://www.aminus.org/blogs/index.php/2007/12/30/memcached_set_invalidation?blog=2

Replication

Types of replication

• Master → slave– Master → slave → other slaves

• Master ↔ master– multi-master

Types of replication 2

• Synchronous

• Asynchronous

Synchronous

• Synchronous = slow(er)

• Complexity (e.g. 2pc)

• PGCluster

• Oracle

Asynchronous master/slave

• Easiest

• Failover

• MySQL replication

• Slony, Londiste, WAL shipping

• Tungsten

Asynchronous multi-master

• Conflict resolution

– O(N3) or O(N2) as you add nodes

– http://research.microsoft.com/~gray/replicas.ps

• Bucardo

• MySQL Cluster

Achtung!

• Asynchronous replication can lose data if the master fails

“Architecture”

• Primarily about how you cope with failure scenarios

Replication does not scale writes

Scaling writes

• Partitioning aka sharding– Key / horizontal

– Vertical

– Directed

Partitioning

Key based partitioning

• PK of “root” table controls destination– e.g. user id

• Retains referential integrity

Example: blogger.com

Comments

Example: blogger.com

Comments

Comments'

Vertical partitioning

• Tables on separate nodes

• Often a table that is too big to keep with the other tables, gets too big for a single node

Growing is hard

Directed partitioning

• Central db that knows what server owns a key

• Makes adding machines easier

• Single point of failure

Partitioning

Partitioning with replication

What these have in common

• Ad hoc

• Error-prone

• Manpower-intensive

To summarize

• Scaling reads sucks

• Scaling writes sucks more

Distributed* databases

• Data is automatically partitioned

• Transparent to application

• Add capacity without downtime

• Failure tolerant

*Like Bigtable, not Lotus Notes

Two famous papers

• Bigtable: A distributed storage system for structured data, 2006

• Dynamo: amazon's highly available key-value store, 2007

The world doesn't need another half-assed key/value store

(See also Olin Shivers' 100% and 80% solutions)

Two approaches

• Bigtable: “How can we build a distributed database on top of GFS?”

• Dynamo: “How can we build a distributed hash table appropriate for the data center?”

Bigtable architecture

Lookup in Bigtable

Dynamo

Eventually consistent

• Amazon: http://www.allthingsdistributed.com/2008/12/eventually_consistent.html

• eBay: http://queue.acm.org/detail.cfm?id=1394128

Consistency in a BASE world

• If W + R > N, you are 100% consistent

• W=1, R=N

• W=N, R=1

• W=Q, R=Q where Q = N / 2 + 1

Cassandra

Memtable / SSTable

Commit log

ColumnFamilies

keyA column1 column2 column3

keyC column1 column7 column11

Column

Byte[] Name

Byte[] Value

I64 timestamp

LSM write properties

• No reads

• No seeks

• Fast

• Atomic within ColumnFamily

vs MySQL with 50GB of data

• MySQL– ~300ms write

– ~350ms read

• Cassandra– ~0.12ms write

– ~15ms read

• Achtung!

Classic RDBMS persistence

Questions

What Every Developer Should Know About Database Scalability

Technology

Database Scalability {Patterns} - O'Reilly Media

Developer Database...- 1 - Developer Database ˘ˇ ˆ ˙˝ ˛ ˚ Database Developer – %ˇ &’ˇ() * ˘ ˆ +˙ ( Developer ˜˛ ! ˛ " ˚ # $ ’ˇ* ˛ - .’$ / 0 – ˝˘ ˆ

Android Developer Database

Performance/Scalability with JDBC, UCP & Oracle Database 12c

MySQL Developer Day conference: MySQL Replication and Scalability

Predicting Replicated Database Scalability from Standalone ......Predicting Replicated Database Scalability from Standalone Database Profiling Sameh Elnikety Microsoft Research Cambridge,

DataDude for Dev Guys Visual Studio Database Developer Edition · 4 Our Toybox •SQL Server Management Studio –Developer Edition –Express Edition •Visual Studio for Database

5 scalability Cloudstack Developer Day

Oracle Database 10g GeoRaster: Scalability and Performance

Accordion: Elastic Scalability for Database Systems Supporting … · 2020-04-30 · Accordion: Elastic Scalability for Database Systems Supporting Distributed Transactions Marco

Java Performance, Scalability, Availability & Security …...Java Performance, Scalability, Availability & Security with Oracle Database 12c 1 1 - Introduction No matter your programming

Amazon Redshift Database Developer Guide

Position: Database Developer Ref: 74376W

Enhance Database Performance and Scalabilitymedia.progress.com/...openedge...database-performance-scalability.pdf · Enhance Database Performance and Scalability Progress OpenEdge:

Oracle Database 12c Feature Support in Oracle SQL Developer

Website: ...faculty.orangecoastcollege.edu/mmalaty/WhyTakeOracleAtOCC.pdf · •Database Developer? ... Oracle Forms & Reports Developer ... • Oracle Database 10g 22. Oracle Database

Video Game Developer Database

Масштабирование баз данных. (Database Scalability)

การติดตั้ง Oracle Database 11g /Oracle SQL Developer และ ... · การติดตั้ง Oracle Database 1. Download โปรแกรม Database

Easy HTML DB. Michael Cunningham Developer/Database Administrator