Scaling Wanelo.com 100x in Six Months

DESCRIPTION

A collection of problems and solutions worked through by the Wanelo team as they scaled the site to meet rapidly growing user demand. By Konstantin Gredeskoul and Eric Saxby.

Proprietary and Confidential

Scaling

100x in six months

by Eric Saxby & Konstantin Gredeskoul, April 2013

1Thursday, April 18, 13

What is Wanelo?

■ Wanelo (“Wah-nee-lo”, from Want, Need, Love) is a global platform for shopping.

■ It’s marketing-free shopping across hundreds of thousands of unique stores

Personal Activity Feed...

iOS + Android

Early Decisions

■ Optimize for iteration speed, not performance

■ Keep scalability in mind, track metrics, and fix as needed

■ Introduce many levels of caching early

Technology Timeline

■ 2010 - 2011: Wanelo v1 stack is Java, JSP, MySQL, Hibernate; 90K lines of code, 53+ DB tables, no tests

■ May 2012 - June 2012: Rewrite from scratch to RoR on PostgreSQL (v2)

■ Ruby app is 10K LOC, full test coverage, 8 database tables, fewer features

The “Big” Rewrite

More info here....

http://building.wanelo.com/

Growth Timeline

■ 06/2012 - RoR app relaunches: 2-3K requests per minute (RPM) peak

■ 08/2012 - iOS app is launched: 10-40K RPM peak

■ 12/2012 - Android app is launched: 40-120K RPM peak

■ 03/2013 - #24 in top free apps on iTunes: 80-200K RPM peak

Requests Per Minute (RPM)

Current Numbers...

■ 4M active monthly users

■ 5M products saved 700M times

■ 8M products saved per day

■ 200k stores

Backend Stack & Key Vendors

■ MRI Ruby 1.9.3 & Rails 3.2

■ PostgreSQL 9.2.4, Solr 3.6

■ Joyent Cloud, SmartOS: ZFS, ARC, raw IO performance, CPU bursting, DTrace

■ Circonus, Chef + Opscode: monitoring, graphing, alerting, automation

■ Amazon S3 + Fastly CDN

■ NewRelic, statsd, Graphite, nagios

Wanelo Web Architecture

[Diagram: nginx and haproxy in front of unicorn x 14; unicorns and sidekiq each reach PostgreSQL, Solr, Redis and MemCached through haproxy, pgbouncer and twemproxy; host labels 20 x 8GB, 4 x 8GB, 6 x 2GB]

This talk is about:

1. How much traffic can your database handle?

2. Special report on counters

3. Scaling database reads

4. Scaling database writes

1. How much traffic can your database handle?

PostgreSQL is Awesome!

■ Does a fantastic job of not corrupting your data

■ Streaming replication in 9.2 is extremely reliable

■ Won’t write to a read-only replica

■ But... No master/master replication (good!)

Is the database healthy?

What’s healthy?

■ Able to respond quickly to queries from the application (< 4ms disk seek time)

■ Has enough room to grow

■ How do we know when we’re approaching a dangerous threshold?

Oops!

NewRelic Latency (yellow = database)

pg_stat_statements

■ Maybe your app is to blame for performance...

    select query, calls, total_time
    from pg_stat_statements
    order by total_time desc
    limit 12;

Similar to Percona Toolkit, but runs all the time collecting stats.

pg_stat_statements

pg_stat_user_indexes

■ Using indexes as much as you think you are?

■ Using indexes at all?

pg_stat_user_tables

■ Full table scans? (seq_scan)

Throw that in a graph

Reads/second for one large table, daily
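
The pg_stat_user_tables and pg_stat_user_indexes columns (seq_scan, idx_scan and friends) are cumulative counters. A "reads/second" graph therefore means polling them and differencing consecutive samples. Below is a minimal Ruby sketch of that step. It assumes your poller hands over hypothetical [timestamp, value] pairs, and it treats a drop in the counter (for example after pg_stat_reset()) as a reset rather than a negative rate:

```ruby
# Turn cumulative counters (e.g. seq_scan or idx_scan from pg_stat_user_tables)
# into per-second rates for graphing. Each sample is a hypothetical
# [unix_time_seconds, counter_value] pair collected by your own poller.
def counter_rates(samples)
  rates = samples.each_cons(2).map do |(t0, v0), (t1, v1)|
    elapsed = t1 - t0
    next nil if elapsed <= 0 # duplicate or out-of-order timestamps

    # A smaller value means the statistics were reset (e.g. pg_stat_reset()),
    # so count from zero instead of reporting a negative rate.
    delta = v1 >= v0 ? v1 - v0 : v1
    [t1, delta.to_f / elapsed]
  end
  rates.compact
end

samples = [[0, 1_000], [60, 4_000], [120, 10_000], [180, 600]]
counter_rates(samples).each { |t, rate| puts format("t=%d  %.1f reads/s", t, rate) }
```

Feeding these rates into Graphite or Circonus gives graphs like the one above, where non-linear jumps stand out.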

Non-linear changes

Suspicious spike!

Correlate different data

Deployments! Aha!

Utilization vs Saturation

# of Active PostgreSQL connections

Utilization vs Saturation

Red line: % of max connections established; Purple: % of connections in query
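
Both lines can be computed from a single snapshot of pg_stat_activity (its state column exists from 9.2 on) plus the server's max_connections setting. The sketch below takes the state values as a plain array, with made-up numbers. A high "established" percentage with a low "in query" percentage means many connections (utilization) without much contention (saturation):

```ruby
# Utilization vs. saturation for PostgreSQL connections, computed from a
# snapshot of pg_stat_activity "state" values and max_connections.
ConnectionStats = Struct.new(:established_pct, :in_query_pct)

def connection_stats(states, max_connections)
  established = states.size
  active = states.count { |s| s == "active" } # backends currently running a query
  ConnectionStats.new(
    100.0 * established / max_connections,                  # how full the server is
    established.zero? ? 0.0 : 100.0 * active / established  # how busy those connections are
  )
end

# Illustrative snapshot: 300 connections open, 45 of them executing a query.
states = ["active"] * 45 + ["idle"] * 240 + ["idle in transaction"] * 15
stats = connection_stats(states, 500)
puts format("established: %.1f%%, in query: %.1f%%", stats.established_pct, stats.in_query_pct)
```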

Disk reads/writes

Usage increases, but are the disks saturated?

green: reads, red: writes

Utilization vs Saturation

How much are you waiting on disk?

File system cache (ARC)

Watch the right things

Hit ratio of the file system cache (ARC)

Room to grow...

Size (including indexes) of a key table

Working set in RAM?

Adding an index increases the size

Collect all the data you can

Once we knew where to look, graphs added later could explain behavior we could only guess at earlier

2. Special report on Counters and Pagination

Problem #1: DB Latency Up...

■ iostat shows 100% disk busy

    device     r/s     w/s   Mr/s   Mw/s  wait  actv  svc_t  %w   %b
    sd1      384.0  1157.5   48.0  116.8   0.0   8.8    5.7   2  100
    sd1      368.0  1117.9   45.7  106.3   0.0   8.0    5.4   2  100
    sd1      330.3  1357.5   41.3  139.1   0.0   9.5    5.6   2  100

Problem #1: Diagnostics

■ Database is running very, very hot. Initial investigation shows a large number of counts.

■ Turns out that any time you page with Kaminari, it always does a count(*)!

    SELECT "stores".* FROM "stores" WHERE (state = 'approved') LIMIT 20 OFFSET 0
    SELECT COUNT(*) FROM "stores" WHERE (state = 'approved')

Problem #1: Pagination

■ Doing count(*) is pretty expensive, as DB must scan many rows (either the actual table or an index)

Problem #1: Pagination

■ We are paginating everything! Even infinite scroll is a paged view behind the scenes.

■ But we really DON’T want to run count(*) for every paged view.

Problem #1: Pagination

■ We are showing the most popular stores

■ Maybe it’s OK to hard-code the total number to, say, 1000?

■ How do we tell Kaminari NOT to issue a count query in this case?

Problem #1: Pagination (ctd)

Solution #1: Monkey Patch!!

Solution #1: Pass in the counter

    SELECT "stores".* FROM "stores" WHERE (state = 'approved') LIMIT 20 OFFSET 0
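
Here is the idea behind passing in the counter, as a library-independent sketch. Pagination math only needs a total, so the caller can supply a known or capped number (such as the hard-coded 1000 for popular stores), and the database only runs the LIMIT/OFFSET query. CappedPage is a hypothetical class. It is not Kaminari's API and not the monkey patch from the previous slide:

```ruby
# Hypothetical paginator that never runs COUNT(*): the caller supplies the
# total, e.g. a counter_cache value or a hard-coded cap for "most popular" lists.
class CappedPage
  attr_reader :page, :per_page, :total_count

  def initialize(page:, per_page:, total_count:)
    @page = [page.to_i, 1].max # missing or invalid page params become page 1
    @per_page = per_page
    @total_count = total_count
  end

  # OFFSET for the SQL query, e.g. page 3 with 20 per page -> OFFSET 40
  def offset
    (page - 1) * per_page
  end

  def total_pages
    (total_count + per_page - 1) / per_page # integer ceiling division
  end

  def last_page?
    page >= total_pages
  end

  # Only the row query is built; no COUNT(*) is issued.
  # `where` is interpolated as-is, so it must be trusted SQL in this sketch.
  def sql(table, where)
    %(SELECT "#{table}".* FROM "#{table}" WHERE (#{where}) LIMIT #{per_page} OFFSET #{offset})
  end
end

page = CappedPage.new(page: 1, per_page: 20, total_count: 1000)
puts page.sql("stores", "state = 'approved'")
puts page.total_pages
```

The trade-off is that the page count becomes approximate, which is fine for a "most popular" list nobody pages through to the end.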

Problem #2: Count Draculas

■ AKA: We are still doing too many counts!

■ Rails makes it so easy to do it the lazy way.

Problem #2: Too Many Counts!

■ But it just doesn’t scale well

■ Fortunately, Rails has just the feature for this...

Counter Caches

■ Unfortunately, it has one massive issue:

■ It causes database deadlocks at high volume

■ Because many Ruby processes are creating child records concurrently

■ Each is executing a callback, trying to update the counter_cache column on the parent, which requires a row-level lock

■ Deadlocks ensue

Possible Solution: Use Background Jobs

■ It works like this:

■ As the record is created, we enqueue a request to recalculate counter_cache on the parent

■ The job performs a complete recalculation of the counter cache and is idempotent

Solution #2: Explained

■ Sidekiq with UniqueJob extension

■ Short wait for “buffering”

■ Serialize updates via a small number of workers

■ Can temporarily stop workers (in an emergency) to alleviate DB load

Solution #2: Code
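
The code on this slide did not survive extraction. What follows is a stand-in, not Wanelo's code. It uses an in-memory queue in place of Sidekiq and its unique-job extension, and UniqueQueue and RECALCULATE_COUNTER are hypothetical names. The sketch shows the two properties Solution #2 relies on: repeated enqueues for the same parent collapse into one job, and the job recomputes the full count, so running it again is harmless.

```ruby
require "set"

# Hypothetical stand-in for a Sidekiq queue with a unique-jobs extension:
# enqueueing a job whose arguments are already pending is a no-op.
class UniqueQueue
  def initialize
    @pending = []
    @keys = Set.new
  end

  def enqueue(worker, *args)
    key = [worker, args]
    return false if @keys.include?(key) # already queued: coalesce
    @keys << key
    @pending << key
    true
  end

  # Run after the short "buffering" wait; one worker drains the queue,
  # so updates to a given parent row are serialized.
  def drain
    until @pending.empty?
      worker, args = @pending.shift
      @keys.delete([worker, args])
      worker.call(*args)
    end
  end
end

# In-memory "tables": child rows (saves) and the parent's counter_cache column.
SAVES = Hash.new { |h, k| h[k] = [] }
SAVES_COUNT = Hash.new(0)

# Idempotent: recomputes the whole count (a COUNT(*) in real life) instead of
# incrementing, so duplicate or repeated runs give the same result.
RECALCULATE_COUNTER = lambda do |product_id|
  SAVES_COUNT[product_id] = SAVES[product_id].size
end

queue = UniqueQueue.new
5.times do |i|
  SAVES[42] << "save-#{i}"               # the child record is created...
  queue.enqueue(RECALCULATE_COUNTER, 42) # ...and a recount is requested
end
queue.drain
puts SAVES_COUNT[42]
```

Five saves produce one queued job and one recount, which is what takes the row-lock contention off the parent.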

Things are better. BUT... Still too many fucking counts!

■ Even doing count(*) from workers is too much load on the databases

■ We need to stop doing count(*) in the DB, but keep counter_caches. How?

■ We could use Redis for this.

Solution #3: Count Deltas

[Diagram: unicorn saves a product, then 1. INCR product_id on Redis (counters) and 2. ProductCountWorker.enqueue product_id on Redis (Sidekiq); sidekiq does 3. Dequeue, 4. GET, 5. RESET on the Redis counter, and an SQL update that increments the PostgreSQL counter_cache column by N]

■ Web request increments the counter value in Redis

■ Enqueues a request to update counter_cache

■ Background job picks it up a few minutes later, reads the Redis delta value, and removes it

■ Updates the counter_cache column by incrementing it by the delta
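
The delta flow above, as a runnable sketch. FakeRedis is a hypothetical in-memory stand-in offering INCR and an atomic read-and-reset. With real Redis, GETSET key 0, or GET and DEL inside MULTI/EXEC, would play that role. The UPDATE is returned as a string instead of executed, and the products table and saves_count column are assumptions. Reading and resetting in one atomic step matters, because a separate GET followed by a reset could lose increments that land in between.

```ruby
# Hypothetical in-memory stand-in for the Redis counters instance.
class FakeRedis
  def initialize
    @data = Hash.new(0)
    @lock = Mutex.new
  end

  # Like Redis INCR: atomically add 1 and return the new value.
  def incr(key)
    @lock.synchronize { @data[key] += 1 }
  end

  # Like Redis GETSET key 0: read the delta and reset it in one atomic step,
  # so increments arriving afterwards are kept for the next job run.
  def getset_zero(key)
    @lock.synchronize do
      value = @data[key]
      @data[key] = 0
      value
    end
  end
end

# Hypothetical key scheme; table and column names below are assumptions too.
def counter_key(product_id)
  "counters:products:#{product_id}:saves_count"
end

# 1. Web request: INCR in Redis instead of locking the products row.
# 2. Enqueue a job for this product (a unique-jobs queue would dedupe).
def record_save(redis, queue, product_id)
  redis.incr(counter_key(product_id))
  queue << product_id
end

# 3-5. Background job: take the delta and apply it with a single UPDATE.
def flush_count(redis, product_id)
  delta = redis.getset_zero(counter_key(product_id))
  return nil if delta.zero? # nothing new since the last flush
  "UPDATE products SET saves_count = saves_count + #{delta} WHERE id = #{product_id}"
end

redis = FakeRedis.new
queue = Queue.new
4.times.map { Thread.new { 25.times { record_save(redis, queue, 7) } } }.each(&:join)

pending = Array.new(queue.size) { queue.pop }
statements = pending.uniq.map { |id| flush_count(redis, id) }.compact
puts statements
```

One hundred concurrent saves become one row update, and the database never runs count(*).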

Define counter_cache_on...

■ Internal GEM, will open source soon!

Can now use counter caches in pagination!

3. Scaling reads

Multiple optimization cycles

■ Caching: action caching, fragment, CDN

■ Personalization via AJAX: cache the entire page, then add personalized details

■ 25ms/req memcached time is cheaper than 12ms/req of database time

Cache optimization

40% hit ratio! Woo! Wait... is that even good?

Cache optimization

Increasing your hit ratio means fewer queries against your database

Cache optimization

Caveat: even low-hit-ratio caches can save your ass. You’re removing load from the DB, remember?
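
Why even a 40% hit ratio helps, in numbers: with hit ratio h, only (1 - h) of cacheable reads still reach the database. The request rate below is made up for illustration. The same arithmetic applies to the ARC hit ratio graphed earlier.

```ruby
# Database queries that still happen behind a cache with a given hit ratio.
def db_queries_per_sec(requests_per_sec, hit_ratio)
  raise ArgumentError, "hit_ratio must be in 0..1" unless (0.0..1.0).cover?(hit_ratio)
  requests_per_sec * (1.0 - hit_ratio)
end

# Hit ratio from raw hit/miss counters (e.g. memcached "get_hits"/"get_misses").
def hit_ratio(hits, misses)
  total = hits + misses
  total.zero? ? 0.0 : hits.to_f / total
end

rps = 1_000 # illustrative cacheable reads per second, not a Wanelo figure
[0.0, 0.4, 0.8].each do |h|
  puts format("hit ratio %3d%% -> %6.1f DB queries/sec", (h * 100).round, db_queries_per_sec(rps, h))
end
```

Going from 40% to 80% cuts database reads from 60% to 20% of traffic, a 3x reduction, which is why tuning the hit ratio pays off.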

Cache saturation

How long before your caches start evicting data?

Blue: cache writes; Red: automatic evictions

Ajax personalization

Nice!

■ Rails Action Caching: runs before_filters, so A/B experiments can still run

■ Extremely fast pages: 4ms application time for some of our computationally heaviest pages

■ Could be served via CDN in the future

Sad trombone...

■ Are you actually logged in? Pages don’t know until Ajax successfully runs

■ Selenium AND Jasmine tests!

Read/write splitting

■ Sometime in December 2012...

■ Database reaching 100% saturation

■ Latency starting to increase non-linearly

■ We need to distribute database load

■ We need to use read replicas!

DB adapters for read/write

■ Looked at several, including DbCharmer

■ Features / Configurability / Stability

■ Thread safety? This may be Ruby, but some people do actually use threads.

■ If I tell you it’s a read-only replica, DON’T ISSUE WRITES

■ Failover on errors?
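
To make the criteria above concrete, here is a toy router. It is hypothetical and is not Makara's or DbCharmer's API. Reads rotate across replicas, and anything that isn't a plain SELECT goes to master. Once a thread has written, it sticks to master until reset, so it can read its own writes. The shared round-robin index is guarded by a Mutex for thread safety, and failover is left out.

```ruby
# Hypothetical read/write splitter; "connections" are plain strings here.
class ReadWriteRouter
  def initialize(master:, replicas:)
    @master = master
    @replicas = replicas
    @next = 0
    @lock = Mutex.new # the round-robin index is shared by all threads
    @sticky_key = :"rw_router_sticky_#{object_id}" # per-thread, per-router flag
  end

  def connection_for(sql)
    if write?(sql)
      Thread.current[@sticky_key] = true # read-your-writes for this request
      return @master
    end
    return @master if Thread.current[@sticky_key] || @replicas.empty?

    @lock.synchronize do
      conn = @replicas[@next % @replicas.size]
      @next += 1
      conn
    end
  end

  # Call at the start of each web request or background job.
  def reset!
    Thread.current[@sticky_key] = false
  end

  private

  # Anything that is not a plain SELECT (including SELECT ... FOR UPDATE and,
  # conservatively, CTEs) counts as a write, so replicas never receive one.
  def write?(sql)
    !sql.match?(/\A\s*select\b/i) || sql.match?(/\bfor\s+update\b/i)
  end
end

router = ReadWriteRouter.new(master: "master", replicas: %w[replica1 replica2 replica3])
router.reset!
p router.connection_for("SELECT * FROM stores LIMIT 20")                    # a replica
p router.connection_for("UPDATE stores SET state = 'approved' WHERE id = 1") # master
p router.connection_for("SELECT * FROM stores WHERE id = 1")                # master (sticky)
```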

Chose Makara, by TaskRabbit

■ Used in production

■ We extended it to work with PostgreSQL

■ Works with Sidekiq (thread-safe!)

■ Failover code is very simple. Simple is sometimes better.

https://github.com/taskrabbit/makara

We rolled out Makara and...

■ 1 master, 3 read-only async replicas

Wait, what?

A note about graphs

■ NewRelic is great!

■ But it doesn’t make it easy to predict when your systems are about to fall over

■ Use something else to visualize database and disk saturation

3 days later, in production

■ 3 read replicas distributing load from master

■ app servers and sidekiqs create lots of connections to DB backends

■ Mysterious spikes in errors at high traffic

Replication! Doh!

Replication lag (yellow) correlates with application errors (red)

Replication lag! Doh!

■ Track latency sending xlog to slaves (run on the master):

    select client_addr, pg_xlog_location_diff(sent_location, write_location)
    from pg_stat_replication;

■ Track latency applying xlogs on slaves (run on each slave):

    select pg_xlog_location_diff(pg_last_xlog_receive_location(), pg_last_xlog_replay_location()),
           extract(epoch from now()) - extract(epoch from pg_last_xact_replay_timestamp());
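
pg_xlog_location_diff just subtracts two WAL positions and returns bytes. A location like 16/B374D848 is two hex numbers. In 9.3 and later they are the high and low 32 bits of one 64-bit byte position. 9.2 (the version used here) skipped the last 16MB segment of every 4GB, so with the default segment size each step of the first number was 0xFF000000 bytes. Below is a sketch that does the same arithmetic client-side, for example on text locations already fetched from pg_stat_replication. The sample locations are made up.

```ruby
# Bytes per step of the first number in a WAL location "XXXXXXXX/YYYYYYYY".
# 9.3+: WAL is one continuous 64-bit stream, so the step is 2**32.
# 9.2 and earlier skipped the last 16MB segment of every 4GB logical file,
# so with default 16MB segments the step was 0xFF000000.
WAL_STEP_93_PLUS = 2**32
WAL_STEP_92 = 0xFF000000

def wal_position(location, step: WAL_STEP_93_PLUS)
  hi, lo = location.split("/").map { |part| Integer(part, 16) }
  hi * step + lo
end

# Same idea as pg_xlog_location_diff(a, b): bytes from b to a.
def wal_diff(a, b, step: WAL_STEP_93_PLUS)
  wal_position(a, step: step) - wal_position(b, step: step)
end

sent = "16/B374D848"    # illustrative sent_location
written = "16/B3700000" # illustrative write_location
puts wal_diff(sent, written) # bytes sent but not yet written by the replica
```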

Eventual Consistency

■ Some code paths should always go to master for reads (e.g., after signup)

■ Application should be resilient to getting RecordNotFound to tolerate replication delays

■ Not enough to scale reads. Writes become the bottleneck.
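
One way to tolerate replica lag is to try the replica first and, on a miss, retry once on master before giving up. The sketch below uses hypothetical lambdas standing in for replica and master lookups. It also defines its own RecordNotFound; in Rails the real one is ActiveRecord::RecordNotFound.

```ruby
class RecordNotFound < StandardError; end

# Look a record up on a replica first; if it is missing (possibly because the
# replica has not replayed the INSERT yet), retry on master before failing.
def find_with_master_fallback(id, replica:, master:)
  replica.call(id)
rescue RecordNotFound
  master.call(id) # still raises if the record truly does not exist
end

# Illustrative data: user 2 was just created and has not replicated yet.
master_rows = { 1 => "alice", 2 => "bob" }
replica_rows = { 1 => "alice" }

lookup = ->(rows) { ->(id) { rows.fetch(id) { raise RecordNotFound, "id=#{id}" } } }
replica = lookup.call(replica_rows)
master = lookup.call(master_rows)

puts find_with_master_fallback(2, replica: replica, master: master) # served by master
```

The fallback only costs a master query on misses, so most reads still land on replicas.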

Write load delays replication

Replicas are busy trying to apply XLOGs and serve heavy read traffic

4. Scaling database writes

First, No-Brainers:

■ Move stuff out of the DB. Easiest first.

■ Tracking user activity is very easy to do with a database table. But slow.

■ 2000 inserts/sec while also handling site-critical data? Not a good idea.

■ Solution: UDP packets to rsyslog, ASCII delimited files, log-rotate, analyze them later
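
Below is a minimal sketch of fire-and-forget activity logging. It assumes rsyslog listens for UDP on 127.0.0.1:514 and that "ASCII delimited" means fields joined with the ASCII unit separator (0x1F). The tag, the local0 facility and the field list are illustrative, not Wanelo's actual format. UDP sends don't wait for an acknowledgment, so a slow log host can't slow down web requests, at the cost of occasionally losing an event.

```ruby
require "socket"

UNIT_SEPARATOR = "\x1F" # ASCII "unit separator" between fields
LOCAL0 = 16             # syslog facility code for local0
INFO = 6                # syslog severity: informational

# Build a BSD-syslog style line: "<PRI>tag: field1<US>field2..."
def activity_line(tag, fields)
  pri = LOCAL0 * 8 + INFO
  "<#{pri}>#{tag}: #{fields.map(&:to_s).join(UNIT_SEPARATOR)}"
end

# Fire-and-forget: no connection, no acknowledgment, no database insert.
def log_activity(socket, host, port, tag, fields)
  socket.send(activity_line(tag, fields), 0, host, port)
rescue SystemCallError
  nil # dropping an analytics event beats failing the web request
end

# Illustrative event: [timestamp, user_id, action, product_id]
socket = UDPSocket.new
log_activity(socket, "127.0.0.1", 514, "activity", [Time.now.to_i, 42, "save", 1001])
```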

Next: Async Commits

■ PostgreSQL supports delayed (batched) commits

■ Delays fsync for some # of microseconds

■ At high volume, this helps disk IO

PostgreSQL Async Commits

ZFS Block Size

■ Default ZFS block size is 128KB

■ PostgreSQL block size is 8KB

■ Small writes require lots of bandwidth

    device     r/s     w/s   Mr/s   Mw/s  wait  actv  svc_t  %w   %b
    sd1      384.0  1157.5   48.0  116.8   0.0   8.8    5.7   2  100
    sd1      368.0  1117.9   45.7  106.3   0.0   8.0    5.4   2  100
    sd1      330.3  1357.5   41.3  139.1   0.0   9.5    5.6   2  100
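
Here is back-of-the-envelope arithmetic for the mismatch. ZFS is copy-on-write, so rewriting one 8KB PostgreSQL page inside a 128KB record means writing a whole new 128KB record, up to 16x the bytes. The helper below makes that worst-case ratio explicit, ignoring compression and write aggregation. It also derives the average write size from the iostat samples on these slides, treating M as 2^20 bytes. The "after" figure uses the first row of the second iostat table on the next slide, which is presumably the one taken after the change.

```ruby
# Worst-case bytes written per logical write when a small database page is
# rewritten inside a larger copy-on-write filesystem record.
def write_amplification(record_size_kb, db_page_kb)
  [record_size_kb.fdiv(db_page_kb), 1.0].max
end

# Average size of each disk write from iostat columns (Mw/s and w/s).
def avg_write_kb(mb_written_per_sec, writes_per_sec)
  mb_written_per_sec * 1024 / writes_per_sec
end

puts write_amplification(128, 8) # default recordsize vs. PostgreSQL page
puts write_amplification(8, 8)   # recordsize matched to the page size
puts format("%.0f KB/write", avg_write_kb(116.8, 1157.5)) # sample at 128K recordsize
puts format("%.0f KB/write", avg_write_kb(4.4, 219.9))    # sample after the change
```

Roughly 100KB per write for what are mostly 8KB page updates is the amplification showing up in the disk stats.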

ZFS Block Size (ctd.)

■ Solution: change ZFS block size (the recordsize property) to 8K:

Before:

    device     r/s     w/s   Mr/s   Mw/s  wait  actv  svc_t  %w   %b
    sd1      384.0  1157.5   48.0  116.8   0.0   8.8    5.7   2  100
    sd1      368.0  1117.9   45.7  106.3   0.0   8.0    5.4   2  100
    sd1      330.3  1357.5   41.3  139.1   0.0   9.5    5.6   2  100

After:

    device     r/s     w/s   Mr/s   Mw/s  wait  actv  svc_t  %w   %b
    sd1      130.3   219.9    1.0    4.4   0.0   0.7    2.1   0   37
    sd1      329.3   384.1    2.6   11.3   0.0   1.9    2.6   1   78
    sd1      335.3   357.0    2.6    8.7   0.0   1.8    2.6   1   80
    sd1      354.0   200.3    2.8    4.9   0.0   1.6    3.0   0   84
    sd1      465.3   100.7    3.6    1.7   0.0   2.1    3.7   0   91

Next: Vertical Sharding

■ Move the largest table out into its own master database (150 inserts/sec)

■ Remove any SQL joins, do them in the application, drop foreign keys

■ Switch the model to establish_connection to another DB. Fix many broken tests.
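
With the largest table on its own master, a query that used to JOIN it to products becomes two queries on two connections plus a merge in Ruby. The sketch below uses plain hashes standing in for rows. The table and column names are assumptions based on the next slide's "saves master". Without foreign keys, the application also has to cope with dangling references.

```ruby
# Application-side join after moving "saves" to its own database.
# Plain hashes stand in for rows; table and column names are illustrative.
SAVES_ROWS = [ # from the saves master
  { id: 1, user_id: 10, product_id: 100 },
  { id: 2, user_id: 10, product_id: 101 },
  { id: 3, user_id: 10, product_id: 999 } # product gone; no foreign key stops this now
].freeze

PRODUCTS_ROWS = [ # from the main master
  { id: 100, name: "Boots" },
  { id: 101, name: "Scarf" },
  { id: 102, name: "Lamp" }
].freeze

# Stand-in for SELECT * FROM products WHERE id IN (...) on the main database.
def fetch_products(ids)
  PRODUCTS_ROWS.select { |row| ids.include?(row[:id]) }
end

# Index one side by key, then look each save's product up in memory.
def join_saves_with_products(saves, products)
  by_id = products.each_with_object({}) { |product, index| index[product[:id]] = product }
  rows = saves.map do |save|
    product = by_id[save[:product_id]]
    save.merge(product_name: product[:name]) if product # drop dangling references
  end
  rows.compact
end

product_ids = SAVES_ROWS.map { |save| save[:product_id] }.uniq
joined = join_saves_with_products(SAVES_ROWS, fetch_products(product_ids))
joined.each { |row| puts "save #{row[:id]}: #{row[:product_name]}" }
```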

Vertical Sharding

[Diagram: unicorns connect through haproxy, pgbouncer and twemproxy to a PostgreSQL main master and a separate PostgreSQL saves master; the main master feeds two PostgreSQL main replicas via streaming replication]

Vertical Sharding: Results

■ Deploy All Things!

Future: Services Approach

[Diagram: unicorns with haproxy, pgbouncer and twemproxy; PostgreSQL main master with two main replicas (streaming replication); a sinatra services app reached over http / json; Shard1, Shard2, Shard3]

In Conclusion. Tasty gems :)

■ https://github.com/wanelo/pause - distributed rate limiting using redis

■ https://github.com/wanelo/spanx - rate-limit-based IP blocker for nginx

■ https://github.com/wanelo/redis_with_failover - attempt another redis server if available

■ https://github.com/kigster/ventable - observable pattern with a twist

Thanks.

Comments? Questions?

https://github.com/wanelo

https://github.com/wanelo-chef

@kig & @sax

@kig & @ecdysone

@kigster & @sax
