A collection of problems and solutions the Wanelo team worked through as they scaled the site to keep up with rapidly growing user demand. By Konstantin Gredeskoul and Eric Saxby.
Proprietary and Confidential
Scaling
100x in six months
by Eric Saxby & Konstantin Gredeskoul
April 2013
Thursday, April 18, 13
What is Wanelo?
■ Wanelo (“Wah-nee-lo” from Want, Need, Love) is a global platform for shopping.
■ It’s marketing-free shopping across hundreds of thousands of unique stores
Personal Activity Feed...
iOS + Android
Early Decisions
■ Optimize for iteration speed, not performance
■ Keep scalability in mind, track metrics, and fix as needed
■ Introduce many levels of caching early
Technology Timeline
■ 2010 - 2011: Wanelo v1 stack is Java, JSP, MySQL, Hibernate. 90K lines of code, 53+ DB tables, no tests
■ May 2012 - June 2012: Rewrite from scratch to RoR on PostgreSQL (v2)
■ Ruby app is 10K LOC, full test coverage, 8 database tables, fewer features
The “Big” Rewrite
More info here....
http://building.wanelo.com/
Growth Timeline
■ 06/2012 - RoR App Relaunches: 2-3K requests per minute (RPM) peak
■ 08/2012 - iOS App is launched: 10-40K RPM peak
■ 12/2012 - Android app launched: 40-120K RPM peak
■ 03/2013 - #24 top free apps iTunes: 80-200K RPM peak
Requests Per Minute (RPM)
Current Numbers...
■ 4M active monthly users
■ 5M products saved 700M times
■ 8M products saved per day
■ 200k stores
Backend Stack & Key Vendors
■ MRI Ruby 1.9.3 & Rails 3.2
■ PostgreSQL 9.2.4, Solr 3.6
■ Joyent Cloud, SmartOS: ZFS, ARC, raw IO performance, SmartOS, CPU bursting, DTrace
■ Circonus, Chef + Opscode: monitoring, graphing, alerting, automation
■ Amazon S3 + Fastly CDN
■ NewRelic, statsd, Graphite, nagios
Wanelo Web Architecture
[Diagram: nginx, haproxy, unicorn x 14 and sidekiq hosts (each shown with haproxy, pgbouncer, twemproxy), backed by PostgreSQL, Solr, Redis, MemCached. Host sizes shown: 20 x 8GB, 4 x 8GB, 6 x 2GB]
This talk is about:
1. How much traffic can your database handle?
2. Special report on counters
3. Scaling database reads
4. Scaling database writes
1. How much traffic can your database handle?
PostgreSQL is Awesome!
■ Does a fantastic job of not corrupting your data
■ Streaming replication in 9.2 is extremely reliable
■ Won’t write to a read-only replica
■ But... No master/master replication (good!)
Is the database healthy?
What’s healthy?
■ Able to respond quickly to queries from application (< 4ms disk seek time)
■ Has enough room to grow
■ How do we know when we’re approaching a dangerous threshold?
Oops!
NewRelic Latency (yellow = database)
pg_stat_statements
■ Maybe your app is to blame for performance...
select query, calls, total_time from pg_stat_statements order by total_time desc limit 12;
Similar to Percona Toolkit, but runs all the time collecting stats.
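The raw totals from that query are easier to act on with a per-call average and a share of overall database time next to them. A minimal sketch, assuming the rows have already been fetched into Ruby hashes; the helper name `top_statements` and the sample rows are made up for illustration, and `total_time` is treated as milliseconds (as reported by PostgreSQL 9.2):

```ruby
# Rank statements the way the query above does, then add mean time per call
# and each statement's share of the total time across all rows.
# Each row is a Hash with the pg_stat_statements columns used on the slide.
def top_statements(rows, limit: 12)
  grand_total = rows.sum { |r| r[:total_time] }
  rows.sort_by { |r| -r[:total_time] }.first(limit).map do |r|
    {
      query: r[:query],
      calls: r[:calls],
      total_ms: r[:total_time],
      mean_ms: r[:calls].zero? ? 0.0 : r[:total_time].to_f / r[:calls],
      share: grand_total.zero? ? 0.0 : r[:total_time].to_f / grand_total
    }
  end
end

# Made-up sample rows, for illustration only.
SAMPLE_STATEMENTS = [
  { query: 'SELECT COUNT(*) FROM "stores" WHERE (state = ?)', calls: 1_000, total_time: 90_000.0 },
  { query: 'SELECT "stores".* FROM "stores" WHERE (state = ?) LIMIT ? OFFSET ?', calls: 1_000, total_time: 10_000.0 }
].freeze

top_statements(SAMPLE_STATEMENTS).each do |s|
  puts format('%8.1f ms/call  %5.1f%%  %s', s[:mean_ms], s[:share] * 100, s[:query])
end
```

A statement with a modest per-call time can still dominate the share column if it is called often enough, which is the situation the count queries later in this talk turned out to be in.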
pg_stat_user_indexes
■ Using indexes as much as you think you are?
■ Using indexes at all?
pg_stat_user_tables
■ Full table scans? (seq_scan)
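Both questions can be answered mechanically once the stats rows are in hand. A minimal sketch, assuming rows shaped like `pg_stat_user_tables` (`relname`, `seq_scan`, `idx_scan`) and `pg_stat_user_indexes` (`indexrelname`, `idx_scan`); the helper names and the 50% threshold are assumptions for illustration:

```ruby
# Tables where at least min_ratio of all scans were sequential scans.
# idx_scan can be NULL (nil) for a table with no indexes, hence the to_i.
def seq_scan_heavy_tables(table_rows, min_ratio: 0.5)
  heavy = table_rows.select do |t|
    total = t[:seq_scan] + t[:idx_scan].to_i
    total > 0 && t[:seq_scan].to_f / total >= min_ratio
  end
  heavy.map { |t| t[:relname] }
end

# Indexes that have never been used for a scan since stats were last reset.
def unused_indexes(index_rows)
  index_rows.select { |i| i[:idx_scan].zero? }.map { |i| i[:indexrelname] }
end
```

Unused indexes still cost write time and RAM, so they are worth finding even when reads look healthy.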
Throw that in a graph
Reads/second for one large table, daily
Non-linear changes
Suspicious spike!
Correlate different data
Deployments! Aha!
Utilization vs Saturation
# of Active PostgreSQL connections
Red line: % of max connections established
Purple: % of connections in query
Disk reads/writes
green: reads, red: writes
Usage increases, but are the disks saturated?
Utilization vs Saturation
How much are you waiting on disk?
File system cache (ARC)
Watch the right things
Hit ratio of the file system cache (ARC)
Room to grow...
Size (including indexes) of a key table
Working set in RAM?
Adding index increases the size
Collect all the data you can
Once we knew where to look, graphs added later could explain behavior we could only guess at earlier
2. Special report on Counters and Pagination
Problem #1: DB Latency Up...
■ iostat shows 100% disk busy
device    r/s     w/s   Mr/s   Mw/s  wait  actv  svc_t  %w   %b
sd1     384.0  1157.5   48.0  116.8   0.0   8.8    5.7   2  100
sd1     368.0  1117.9   45.7  106.3   0.0   8.0    5.4   2  100
sd1     330.3  1357.5   41.3  139.1   0.0   9.5    5.6   2  100
Problem #1: Diagnostics
■ Database is running very, very hot. Initial investigation shows a large number of counts.
■ Turns out any time you page with Kaminari, it always does a count(*)!
SELECT "stores".* FROM "stores" WHERE (state = 'approved') LIMIT 20 OFFSET 0
SELECT COUNT(*) FROM "stores" WHERE (state = 'approved')
Problem #1: Pagination
■ Doing count(*) is pretty expensive, as DB must scan many rows (either the actual table or an index)
■ We are paginating everything! Even infinite scroll is a paged view behind the scenes.
■ But we really DON’T want to run count(*) for every paged view.
■ We are showing most popular stores
■ Maybe it’s OK to hard-code the total number to, say, 1000?
■ How do we tell Kaminari NOT to issue a count query in this case?
Solution #1: Monkey Patch!!
Solution #1: Pass in the counter
SELECT "stores".* FROM "stores" WHERE (state = 'approved') LIMIT 20 OFFSET 0
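The patch itself did not survive in this transcript. The idea of passing the total in, so that only the page query runs, can be sketched without Kaminari. Everything below (`FakeRelation`, `Page`, the `total_count:` keyword) is a hypothetical stand-in, not Kaminari's API:

```ruby
# FakeRelation stands in for an ActiveRecord relation and records the SQL it
# would have run, so the effect on the query log is visible.
class FakeRelation
  attr_reader :queries

  def initialize(rows)
    @rows = rows
    @queries = []
  end

  def page_rows(limit, offset)
    @queries << "SELECT ... LIMIT #{limit} OFFSET #{offset}"
    @rows[offset, limit] || []
  end

  def count
    @queries << 'SELECT COUNT(*) ...'
    @rows.size
  end
end

# Hypothetical paginator: when total_count is passed in, COUNT(*) is never issued.
class Page
  def initialize(relation, page:, per_page: 20, total_count: nil)
    @relation = relation
    @page = page
    @per_page = per_page
    @total_count = total_count
  end

  def rows
    @rows ||= @relation.page_rows(@per_page, (@page - 1) * @per_page)
  end

  # Falls back to a real count only when the caller supplied no total.
  def total_count
    @total_count ||= @relation.count
  end

  def total_pages
    (total_count.to_f / @per_page).ceil
  end
end

stores = FakeRelation.new((1..45).to_a)
page = Page.new(stores, page: 1, total_count: 1000) # hard-coded total, as on the slide
page.rows
page.total_pages
puts stores.queries
```

With the total supplied, the query log contains only the LIMIT/OFFSET select, matching the single statement shown above.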
Problem #2: Count Draculas
■ AKA: We are still doing too many counts!
■ Rails makes it so easy to do it the lazy way.
Problem #2: Too Many Counts!
■ But it just doesn’t scale well
■ Fortunately, Rails has just the feature for this...
Counter Caches
■ Unfortunately, it has one massive issue:
■ It causes database deadlocks at high volume
■ Because many ruby processes are creating child records concurrently
■ Each is executing a callback, trying to update counter_cache column on the parent, requiring row-level lock
■ Deadlocks ensue
Possible Solution: Use Background Jobs
■ It works like this:
■ As the record is created, we enqueue a request to recalculate counter_cache on the parent
■ The job performs a complete recalculation of the counter cache and is idempotent
Solution #2: Explained
■ Sidekiq with UniqueJob extension
■ Short wait for “buffering”
■ Serialize updates via small number of workers
■ Can temporarily stop workers (in an emergency) to alleviate DB load
Solution #2: Code
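The code from this slide is not in the transcript. The mechanism it describes (unique jobs plus an idempotent full recount) can be sketched in plain Ruby. `UniqueQueue` and `recalculate!` are hypothetical stand-ins for the Sidekiq uniqueness extension and the worker, and the arrays and hashes stand in for database tables:

```ruby
require 'set'

# In-memory stand-in for a job queue with a uniqueness constraint: enqueueing
# a parent id that already has a pending job is a no-op.
class UniqueQueue
  def initialize
    @pending = Set.new
    @jobs = []
  end

  # Returns true if a job was added, false if one was already pending.
  def enqueue(parent_id)
    return false unless @pending.add?(parent_id)
    @jobs << parent_id
    true
  end

  # Runs every pending job once, in order, yielding the parent id.
  def drain
    until @jobs.empty?
      parent_id = @jobs.shift
      @pending.delete(parent_id)
      yield parent_id
    end
  end

  def size
    @jobs.size
  end
end

# children: one parent id per child record (stand-in for the child table).
# counters: parent_id => cached count (stand-in for the counter_cache column).
# A full recount, so running it twice gives the same result.
def recalculate!(counters, children, parent_id)
  counters[parent_id] = children.count(parent_id)
end

children = []
counters = {}
queue = UniqueQueue.new
[1, 1, 2, 1].each do |parent_id|
  children << parent_id   # the child record is created
  queue.enqueue(parent_id) # the recount request is buffered and deduplicated
end
puts "#{queue.size} jobs for #{children.size} inserts"
queue.drain { |parent_id| recalculate!(counters, children, parent_id) }
```

Because duplicate requests collapse while they wait, a burst of inserts against one parent costs a single recount, and a small worker pool serializes the parent-row updates.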
Things are better. BUT...
Still too many fucking counts!
■ Even doing count(*) from workers is too much on the databases
■ We need to stop doing count(*) in DB. But keep counter_caches. How?
■ We could use Redis for this.
Solution #3: Counts Deltas
[Diagram: unicorn (save product), Redis (counters), Redis (Sidekiq queue), sidekiq, PostgreSQL (counter_cache column)]
1. INCR product_id
2. ProductCountWorker.enqueue product_id
3. Dequeue
4. GET
5. RESET
6. SQL Update INCR by N
■ Web request increments counter value in Redis
■ Enqueues request to update counter_cache
■ Background job picks it up a few minutes later, reads the Redis delta value, and removes it.
■ Updates counter_cache column by incrementing it by delta.
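The flow above can be sketched end to end with in-memory stand-ins. `FakeRedis`, `record_save`, `flush_counts` and the `saves:` key prefix are all assumptions for illustration; with a real Redis the read-and-clear step would need to be atomic too (for example GETSET, or GET plus DEL inside a transaction):

```ruby
# Tiny in-memory stand-in for the Redis operations the flow needs.
class FakeRedis
  def initialize
    @data = Hash.new(0)
    @lock = Mutex.new
  end

  # Like Redis INCR: atomically add one and return the new value.
  def incr(key)
    @lock.synchronize { @data[key] += 1 }
  end

  # Atomically read the accumulated delta and clear it.
  def get_and_reset(key)
    @lock.synchronize do
      value = @data[key]
      @data.delete(key)
      value
    end
  end
end

# Web request side (steps 1 and 2): bump the delta, enqueue one job per product.
def record_save(redis, queue, product_id)
  redis.incr("saves:#{product_id}")
  queue << product_id unless queue.include?(product_id)
end

# Worker side (steps 3 to 6). db is a Hash with default 0 standing in for the
# counter_cache column; the increment mirrors
# UPDATE ... SET counter = counter + delta.
def flush_counts(redis, queue, db)
  while (product_id = queue.shift)
    delta = redis.get_and_reset("saves:#{product_id}")
    next if delta.zero?
    db[product_id] += delta
  end
end

redis = FakeRedis.new
queue = []
db = Hash.new(0)
3.times { record_save(redis, queue, 7) }
flush_counts(redis, queue, db)
puts db[7]
```

The database sees one cheap increment per product per flush, and no count(*) at all.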
Define counter_cache_on...
■ Internal gem, will open source soon!
Can now use counter caches in pagination!
3. Scaling reads
Multiple optimization cycles
■ Caching: action caching, fragment, CDN
■ Personalization via AJAX: cache the entire page, then add personalized details
■ 25ms/req memcached time is cheaper than 12ms/req of database time
Cache optimization
40% hit ratio! Woo! Wait... is that even good?
Increasing your hit ratio means fewer queries against your database
Caveat: even low hit ratio caches can save your ass. You’re removing load from the DB, remember?
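The arithmetic behind that caveat is short. A minimal sketch, assuming every cache miss turns into exactly one database query; the helper names and the traffic figure are made up for illustration:

```ruby
# Fraction of cache lookups that were served from the cache.
def hit_ratio(hits, misses)
  total = hits + misses
  total.zero? ? 0.0 : hits.to_f / total
end

# Lookups that miss the cache fall through to the database.
def db_queries_per_minute(lookups_per_minute, ratio)
  (lookups_per_minute * (1.0 - ratio)).round
end

ratio = hit_ratio(40, 60)
puts db_queries_per_minute(100_000, ratio)
```

At a 40% hit ratio, 100,000 lookups per minute become 60,000 database queries instead of 100,000, which can be the difference between a saturated database and one with headroom.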
Cache saturation
How long before your caches start evicting data?
Blue: cache writes
Red: automatic evictions
Ajax personalization
Nice!
■ Rails Action Caching: runs before_filters, so A/B experiments can still run
■ Extremely fast pages: 4ms application time for some of our computationally heaviest pages
■ Could be served via CDN in the future
Sad trombone...
■ Are you actually logged in? Pages don’t know until Ajax successfully runs
■ Selenium AND Jasmine tests!
Read/write splitting
■ Sometime in December 2012...
■ Database reaching 100% saturation
■ Latency starting to increase non-linearly
■ We need to distribute database load
■ We need to use read replicas!
DB adapters for read/write
■ Looked at several, including DbCharmer
■ Features / Configurability / Stability
■ Thread safety? This may be Ruby, but some people do actually use threads.
■ If I tell you it’s a read-only replica, DON’T ISSUE WRITES
■ Failover on errors?
Chose Makara, by TaskRabbit
■ Used in production
■ We extended it to work with PostgreSQL
■ Works with Sidekiqs (thread-safe!)
■ Failover code is very simple. Simple is sometimes better.
https://github.com/taskrabbit/makara
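What such an adapter does can be shown in a few lines. This is a hypothetical sketch, not Makara's API or implementation: connections are plain lambdas, write detection is a naive regex, and thread safety is ignored entirely:

```ruby
# Hypothetical router: writes go to the master, reads rotate across replicas.
class ReadWriteRouter
  WRITE_PATTERN = /\A\s*(insert|update|delete)\b/i

  def initialize(master:, replicas:)
    @master = master
    @replicas = replicas.dup
    @next = 0
  end

  def execute(sql)
    return @master.call(sql) if sql =~ WRITE_PATTERN
    read(sql)
  end

  private

  # Round-robin over replicas. A replica that raises is dropped and the next
  # one is tried; with no replicas left, the read falls back to the master.
  def read(sql)
    until @replicas.empty?
      replica = @replicas[@next % @replicas.size]
      @next += 1
      begin
        return replica.call(sql)
      rescue StandardError
        @replicas.delete(replica)
      end
    end
    @master.call(sql)
  end
end

master  = ->(sql) { "master: #{sql}" }
replica = ->(sql) { "replica: #{sql}" }
router = ReadWriteRouter.new(master: master, replicas: [replica])
puts router.execute('SELECT 1')
puts router.execute("UPDATE stores SET state = 'approved'")
```

A production adapter also has to handle transactions, reads that must follow a write, and re-admitting a replica once it recovers; none of that is attempted here.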
We rolled out Makara and...
■ 1 master, 3 read-only async replicas
Wait, what?
A note about graphs
■ NewRelic is great!
■ Not easy to predict when your systems are about to fall over
■ Use something else to visualize database and disk saturation
3 days later, in production
■ 3 read replicas distributing load from master
■ app servers and sidekiqs create lots of connections to DB backends
■ Mysterious spikes in errors at high traffic
Replication lag! Doh!
Replication lag (yellow) correlates with application errors (red)
■ Track latency sending xlog to slaves
select client_addr, pg_xlog_location_diff(sent_location, write_location) from pg_stat_replication;
■ Track latency applying xlogs on slaves
select pg_xlog_location_diff(pg_last_xlog_receive_location(), pg_last_xlog_replay_location()), extract(epoch from now()) - extract(epoch from pg_last_xact_replay_timestamp());
Eventual Consistency
■ Some code paths should always go to master for reads (e.g., after signup)
■ Application should be resilient to getting RecordNotFound to tolerate replication delays
■ Not enough to scale reads. Writes become the bottleneck.
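One way to tolerate a lagging replica is to retry a failed lookup against the master before giving up. A minimal sketch, with Hashes standing in for the two databases; `find_with_fallback` and this local `RecordNotFound` class are illustrative names, not the Rails or Makara API:

```ruby
class RecordNotFound < StandardError; end

# Try the replica first. If the row has not replicated yet, ask the master.
# Only raise when the master does not have it either.
def find_with_fallback(id, replica:, master:)
  replica.fetch(id) do
    master.fetch(id) { raise RecordNotFound, "id=#{id}" }
  end
end

master  = { 1 => 'old user', 2 => 'user who just signed up' }
replica = { 1 => 'old user' } # row 2 has not arrived yet
puts find_with_fallback(2, replica: replica, master: master)
```

The fallback costs the master one extra read per miss, so it suits rare races such as the request right after signup, not general read traffic.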
Write load delays replication
Replicas are busy trying to apply XLOGs and serve heavy read traffic
4. Scaling database writes
First, No-Brainers:
■ Move stuff out of the DB. Easiest first.
■ Tracking user activity is very easy to do with a database table. But slow.
■ 2000 inserts/sec while also handling site critical data? Not a good idea.
■ Solution: UDP packets to rsyslog, ASCII delimited files, log-rotate, analyze them later
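The sending side of that pipeline is fire-and-forget. A minimal sketch using Ruby's standard `socket` library; the tab delimiter, the field order and the helper names are assumptions, and a real rsyslog input would normally also expect a syslog priority header, which is left out here:

```ruby
require 'socket'

FIELD_SEPARATOR = "\t" # assumed delimiter; the talk only says "ASCII delimited"

# One activity event as a single delimited line.
def format_event(user_id, action, target_id, at: Time.now)
  [at.to_i, user_id, action, target_id].join(FIELD_SEPARATOR)
end

# Send one datagram and forget about it. UDP gives no delivery guarantee,
# which is the trade-off accepted for non-critical tracking data.
def send_event(line, host:, port:)
  socket = UDPSocket.new
  socket.send(line, 0, host, port)
ensure
  socket&.close
end

puts format_event(42, 'save', 7, at: Time.at(0))
```

A call such as `send_event(format_event(42, 'save', 7), host: '127.0.0.1', port: 514)` returns as soon as the packet is handed to the kernel, so the web request never waits on a database insert.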
Next: Async Commits
■ PostgreSQL supports delayed (batched) commits
■ Delays fsync for some # of microseconds
■ At high volume helps disk IO
PostgreSQL Async Commits
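Why a short delay helps can be seen with a toy model. The sketch below is a simplification and not PostgreSQL's actual algorithm: it assumes a flush is issued a fixed delay after the first waiting commit and covers every commit that arrived in that window. `flush_count` and the sample timings are made up:

```ruby
# commit_times_us: sorted commit arrival times, in microseconds.
# Returns how many flushes (fsyncs) the toy model needs to cover them all.
def flush_count(commit_times_us, delay_us)
  flushes = 0
  window_end = nil
  commit_times_us.each do |t|
    # A commit after the current window closes starts a new batch.
    if window_end.nil? || t > window_end
      flushes += 1
      window_end = t + delay_us
    end
  end
  flushes
end

commits = [0, 10, 20, 500, 510]
puts flush_count(commits, 0)   # one flush per commit
puts flush_count(commits, 100) # nearby commits share a flush
```

The busier the system, the more commits land inside each window, which is why the benefit shows up at high volume.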
ZFS Block Size
■ Default ZFS block size is 128KB
■ PostgreSQL block size is 8KB
■ Small writes require lots of bandwidth
device    r/s     w/s   Mr/s   Mw/s  wait  actv  svc_t  %w   %b
sd1     384.0  1157.5   48.0  116.8   0.0   8.8    5.7   2  100
sd1     368.0  1117.9   45.7  106.3   0.0   8.0    5.4   2  100
sd1     330.3  1357.5   41.3  139.1   0.0   9.5    5.6   2  100
ZFS Block Size (ctd.)
■ Solution: change ZFS block size to 8K:
Before:
device    r/s     w/s   Mr/s   Mw/s  wait  actv  svc_t  %w   %b
sd1     384.0  1157.5   48.0  116.8   0.0   8.8    5.7   2  100
sd1     368.0  1117.9   45.7  106.3   0.0   8.0    5.4   2  100
sd1     330.3  1357.5   41.3  139.1   0.0   9.5    5.6   2  100
After:
device    r/s     w/s   Mr/s   Mw/s  wait  actv  svc_t  %w   %b
sd1     130.3   219.9    1.0    4.4   0.0   0.7    2.1   0   37
sd1     329.3   384.1    2.6   11.3   0.0   1.9    2.6   1   78
sd1     335.3   357.0    2.6    8.7   0.0   1.8    2.6   1   80
sd1     354.0   200.3    2.8    4.9   0.0   1.6    3.0   0   84
sd1     465.3   100.7    3.6    1.7   0.0   2.1    3.7   0   91
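The two iostat samples are taken under different load, so a load-independent number helps: the average size of each write. A minimal sketch using the first row of each sample, assuming the Mw/s column is megabytes written per second; `kb_per_write` is an illustrative helper:

```ruby
# Average kilobytes moved per write operation.
def kb_per_write(mw_per_sec, writes_per_sec)
  mw_per_sec * 1024.0 / writes_per_sec
end

before = kb_per_write(116.8, 1157.5) # first row, 128K ZFS blocks
after  = kb_per_write(4.4, 219.9)    # first row, 8K ZFS blocks

puts format('before: %.1f KB per write', before)
puts format('after:  %.1f KB per write', after)
```

That works out to roughly 103 KB per write before the change and roughly 20 KB after it, consistent with small PostgreSQL page writes no longer dragging a full 128K block along with them.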
Next: Vertical Sharding
■ Move out largest table into its own master database (150 inserts/sec)
■ Remove any SQL joins, do them in application, drop foreign keys
■ Switch model to establish_connection to another DB. Fix many broken tests.
Vertical Sharding
[Diagram: unicorns (with haproxy, pgbouncer, twemproxy) connect to the PostgreSQL main master and to a separate PostgreSQL saves master; two PostgreSQL main replicas follow the main master via streaming replication]
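Once the saves table lives in its own database, a SQL join becomes two queries stitched together in the application. A minimal sketch with arrays of Hashes standing in for the two databases; the table shapes and the name `saves_with_products` are assumptions for illustration:

```ruby
# saves_db stands in for the saves master, main_db for the products table in
# the main database. Each select below represents one query to one database.
def saves_with_products(saves_db, main_db, user_id)
  # Query 1 (saves master): the user's saves.
  saves = saves_db.select { |s| s[:user_id] == user_id }
  product_ids = saves.map { |s| s[:product_id] }.uniq

  # Query 2 (main database): products WHERE id IN (product_ids).
  rows = main_db.select { |row| product_ids.include?(row[:id]) }
  products = rows.each_with_object({}) { |row, by_id| by_id[row[:id]] = row }

  # Stitch in memory. With foreign keys dropped, a product may be missing,
  # in which case :product is nil and the caller has to cope.
  saves.map { |s| s.merge(product: products[s[:product_id]]) }
end

saves_db = [{ user_id: 1, product_id: 10 }, { user_id: 1, product_id: 99 }]
main_db  = [{ id: 10, name: 'lamp' }]
saves_with_products(saves_db, main_db, 1).each { |s| puts s.inspect }
```

Batching the second lookup by id list keeps this at two round trips regardless of how many saves the page shows.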
Vertical Sharding: Results
■ Deploy All Things!
Future: Services Approach
[Diagram: unicorns (with haproxy, pgbouncer, twemproxy) connect to the PostgreSQL main master, which feeds two PostgreSQL main replicas via streaming replication, and talk over http / json to a sinatra services app backed by Shard1, Shard2 and Shard3]
In Conclusion. Tasty gems :)
■ https://github.com/wanelo/pause (distributed rate limiting using redis)
■ https://github.com/wanelo/spanx (rate-limit-based IP blocker for nginx)
■ https://github.com/wanelo/redis_with_failover (attempt another redis server if available)
■ https://github.com/kigster/ventable (observable pattern with a twist)
Thanks.
Comments? Questions?
https://github.com/wanelo
https://github.com/wanelo-chef
@kig & @sax
@kig & @ecdysone
@kigster & @sax