MongoDB: Scaling write performance
Junegunn Choi
First impression: Easy
• Easy installation
• Easy data model
• No prior schema design
• Native support for secondary indexes
Second thought: Not so easy
• No SQL
• Coping with massive data growth
• Setting up and operating sharded cluster
• Scaling write performance
Today we’ll talk about insert performance
Insert throughput on a replica set
* 1kB record, ObjectId as PK
* WriteConcern: journal sync on majority
Steady 5k inserts/sec
Insert throughput with a secondary index
Culprit: B+Tree index
• Good at sequential insert
• e.g. ObjectId, Sequence #, Timestamp
• Poor at random insert
• Indexes on randomly-distributed data
Sequential vs. Random insert
[Figure: B+Tree after sequential inserts (1, 2, 3, ..., 12) vs. B+Tree after random inserts, where keys land all over the tree]
Sequential insert ➔ small working set ➔ fits in RAM ➔ sequential I/O
(bandwidth-bound)
Random insert ➔ large working set ➔ cannot fit in RAM ➔ random I/O
(IOPS-bound)
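The working-set effect above can be sketched with a toy model. This is not from the slides: the page size and key counts are made-up numbers, chosen only to show how key order changes how many index pages stay hot.

```python
import random

PAGE_SIZE = 100  # keys per leaf page (hypothetical)

def pages_touched(keys):
    """Distinct leaf pages written, assuming keys map to pages by range."""
    return len({k // PAGE_SIZE for k in keys})

sequential = list(range(10_000))                       # ObjectId-like keys
scattered = [random.randrange(10_000_000) for _ in range(10_000)]

print(pages_touched(sequential))  # 100 pages: the working set stays tiny
print(pages_touched(scattered))   # thousands of pages: the whole tree is hot
```

With sequential keys, only the rightmost pages are ever written; with random keys, nearly every insert dirties a different page, which is why the tree must fit in RAM to avoid random I/O.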
So, what do we do now?
1. Partitioning
[Figure: one big B+Tree does not fit in memory; partitioned by month (Aug 2012, Sep 2012, Oct 2012), each partition's B+Tree fits in memory]
1. Partitioning
• MongoDB doesn’t support partitioning
• Partitioning at the application level
  • e.g. daily log collection: logs_20121012
  • Switch collection every hour
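A minimal sketch of this application-level scheme, assuming a helper that derives the collection name from a record's timestamp (the function name is illustrative, not part of any API):

```python
from datetime import datetime

def log_collection_name(ts: datetime) -> str:
    """Daily log collection name, e.g. logs_20121012."""
    return "logs_" + ts.strftime("%Y%m%d")

print(log_collection_name(datetime(2012, 10, 12)))  # logs_20121012
```

With a driver such as PyMongo, inserts would then target `db[log_collection_name(now)]`, and expired data can be dropped a whole collection at a time instead of deleting individual documents.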
2. Better H/W
• More RAM
• More IOPS
• RAID striping
• SSD
• AWS Provisioned IOPS (1k ~ 10k)
3. More H/W: Sharding
• Automatic partitioning across nodes
[Figure: mongos router in front of SHARD1, SHARD2, SHARD3]
[Graph: insert throughput, 3 shards (3x3) vs. 3 shards (3x3) on RAID 1+0]
There’s no free lunch
• Manual partitioning
• Incidental complexity
• Better H/W
• $
• Sharding
• $$
• Operational complexity
“Do you really need that index?”
Scaling insert performance with sharding
= Choosing the right shard key
Shard key example: year_of_birth
[Figure: USERS collection split into 64MB chunks by year_of_birth (~1950, 1951~1970, 1971~1990, 1991~2005, 2006~2010, 2010~∞), distributed across SHARD1, SHARD2, and SHARD3 behind a mongos router]
5k inserts/sec w/o sharding
Sequential key
• ObjectId as shard key
• Sequence #
• Timestamp
Worse throughput with 3x H/W.
Sequential key
• All inserts into one chunk
• Chunk migration overhead
[Figure: SHARD-x holds USERS chunks 1000~2000, 5000~7500, and 9000~∞; every new key (9001, 9002, 9003, 9004, ...) lands in the last chunk, 9000~∞]
Hash key
• e.g. SHA1(_id) = 9f2feb0f1ef425b292f2f94 ...
• Distributes evenly across all ranges
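A sketch of how such a key could be derived. The application computes the digest itself and stores it in a field that the collection is sharded on; the function and field names are assumptions, not MongoDB API:

```python
import hashlib

def hash_shard_key(doc_id: str) -> str:
    """SHA-1 of the document id, stored in a field used as the shard key."""
    return hashlib.sha1(doc_id.encode()).hexdigest()

# Hex digests spread uniformly over the shard key space.
print(hash_shard_key("507f191e810c19729de860ea")[:12] + "...")
```

Because the digest is effectively random, consecutive inserts scatter across all chunk ranges instead of piling into the last one.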
Hash key
• Performance drops as collection grows
• Why? Mandatory shard key index
• B+Tree problem again!
[Graph: insert throughput, sequential key vs. hash key]
Sequential + hash key
• Coarse-grained sequential prefix
• e.g. Year-month + hash value
• 201210_24c3a5b9
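One way to compose such a key, as a sketch under the slide's year-month + hash scheme (function name and the 8-hex-char suffix width are assumptions):

```python
import hashlib
from datetime import datetime

def month_hash_key(doc_id: str, ts: datetime) -> str:
    """Coarse sequential prefix + hash suffix, e.g. 201210_24c3a5b9."""
    prefix = ts.strftime("%Y%m")                            # "201210"
    suffix = hashlib.sha1(doc_id.encode()).hexdigest()[:8]  # 8 hex chars
    return f"{prefix}_{suffix}"

print(month_hash_key("user-42", datetime(2012, 10, 24)))
</strike>```

The prefix keeps the hot part of the B+Tree confined to the current month, while the hash suffix spreads writes across chunks within that month.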
[Figure: B+Tree over ranges 201208_*, 201209_*, 201210_*; inserts touch only the newest range, 201210_*]
But what if...
[Figure: writes spread across 201208_*, 201209_*, and 201210_* at once, creating a large working set]
Sequential + hash key
• Can you predict data growth rate?
• Balancer not clever enough
• Only considers # of chunks
• Migration is slow during heavy writes
[Graph: insert throughput, sequential key vs. hash key vs. sequential + hash key]
Low-cardinality hash key
• e.g. A~Z, 00~FF
• Alleviates B+Tree problem
• Sequential access on fixed # of parts
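A sketch of deriving the low-cardinality key, assuming 256 buckets taken from the first byte of a SHA-1 digest (the function name is illustrative):

```python
import hashlib

def bucket(doc_id: str) -> str:
    """Map a document to one of 256 shard key values, "00" ~ "ff"."""
    return hashlib.sha1(doc_id.encode()).hexdigest()[:2]

print(bucket("user-42"))  # a value in "00" ~ "ff"
```

With only 256 possible key values, the index has a fixed number of hot insertion points, and inserts within each bucket are appended sequentially.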
[Figure: local B+Tree on a shard covering shard key range A ~ D; inserts hit a fixed number of points (A, B, C), sequential within each]
Low-cardinality hash key
• Limits the # of possible chunks
• e.g. 00 ~ FF ➔ 256 chunks
• Chunk grows past 64MB
• Balancing becomes difficult
[Graph: insert throughput, sequential key vs. hash key vs. sequential + hash key vs. low-cardinality hash key]
Low-cardinality hash prefix + sequential part
• e.g. Short hash prefix + timestamp
• FA1350005981
• Nice index access pattern
• Unlimited # of chunks
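A sketch of composing this final key, assuming a two-hex-char SHA-1 prefix and a Unix timestamp as the sequential part, as in the FA1350005981 example (function name is illustrative):

```python
import hashlib

def prefixed_seq_key(doc_id: str, unix_ts: int) -> str:
    """Fixed set of insertion points (00 ~ FF), sequential within each."""
    prefix = hashlib.sha1(doc_id.encode()).hexdigest()[:2].upper()
    return f"{prefix}{unix_ts}"

print(prefixed_seq_key("user-42", 1350005981))  # a key like FA1350005981
```

The prefix bounds the number of hot points in the index, while the timestamp keeps each point growing sequentially, so chunks can keep splitting without the 256-chunk ceiling of the pure low-cardinality key.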
[Figure: local B+Tree on a shard covering shard key range A000 ~ C999; inserts advance sequentially within each prefix: A000→A123, B000→B123, C000→C123]
Finally, 2x throughput
Lessons learned
• Know the performance impact of secondary indexes
• Choose the right shard key
• Test with large data sets
• Linear scalability is hard
• If you really need it, consider HBase or Cassandra
• SSD