Growing Up MongoDB

Kiril Savino - CTO, GameChanger

@kirilnyc

About Me.

Lead Engineer, Higher One

Lead Engineer, DoubleClick

Lead Engineer -> CTO, ShopWiki

Director Engineering, Conductor

Founder & CTO, GameChanger Media

10 years Oracle and MySQL, 4 MongoDB

pre-$not

About GameChanger.

Growing up.

865,443,426+

3 terabytes

16 nodes

240GB RAM, 8TB SSD storage

120,000 ops/s sustained

Learnings.

Schemas

Concurrency

Availability

Firefighting

1. Schemas!

Schema-less.

I do not think that word means what you think it means.

Be abnormal.

Schema-less does mean you don't have to separate data for modeling reasons

Focus on data usage patterns along with semantics

You're going to have to do this anyway: start now and scale up easy

Go monolithic.

Learn to query, then forget and pretend MongoDB is a really fancy, full-featured KV store.

Querying secondary data is a waste

Scans & indexed queries are slow

Memory fragmentation can kill you

Use MongoDB's strengths

Garbage in...

Validation is your problem

Don't let inconsistency linger

Know what parts of schema are flexible
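If validation is the application's problem, one way to keep garbage out is to check every document against a small, explicit description of the rigid parts of the schema before writing it. The sketch below is a minimal illustration under assumed names: `REQUIRED_FIELDS`, `validate`, and the team fields are hypothetical and do not come from the talk.

```python
# Hypothetical description of the rigid part of a schema:
# field name -> required Python type. Anything else is treated as flexible.
REQUIRED_FIELDS = {"team_id": str, "name": str, "players": list}


def validate(doc):
    """Return a list of problems; an empty list means the document is acceptable."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in doc:
            errors.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected):
            errors.append(f"wrong type for {field}: expected {expected.__name__}")
    return errors


good = {"team_id": "t1", "name": "Hawks", "players": [], "mascot": "hawk"}
bad = {"team_id": 1, "name": "Hawks"}
print(validate(good))
print(validate(bad))
```

Extra fields such as `mascot` pass untouched, which is one way to encode which parts of the schema are flexible.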

Index wisely.

Data size

Insert/Update speed

Bad schema smell

2. Concurrency!

(A)CID.

Good schema design provides for basic atomicity at the document level

Obviates the need for transactions in many trivial cases

Your friends.

$set/$unset

$push/$pull

$addToSet

findAndModify
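To make the semantics of these operators concrete, here is a toy, in-memory emulation of `$set`, `$unset`, `$push`, `$pull`, and `$addToSet` on top-level fields. It is only a sketch: `apply_update` is a hypothetical helper, not part of any MongoDB driver. The real operators run on the server, atomically per document, and support nested paths and query conditions that this emulation ignores.

```python
import copy


def apply_update(doc, update):
    """Toy emulation of a few update operators (top-level fields, equality only)."""
    out = copy.deepcopy(doc)  # leave the caller's document untouched
    for field, value in update.get("$set", {}).items():
        out[field] = value
    for field in update.get("$unset", {}):
        out.pop(field, None)
    for field, value in update.get("$push", {}).items():
        out.setdefault(field, []).append(value)
    for field, value in update.get("$pull", {}).items():
        if field in out:
            out[field] = [v for v in out[field] if v != value]
    for field, value in update.get("$addToSet", {}).items():
        items = out.setdefault(field, [])
        if value not in items:  # set semantics: no duplicates
            items.append(value)
    return out


team = {"name": "Hawks", "players": ["ann", "bob"], "tmp": 1}
team2 = apply_update(team, {
    "$set": {"name": "Eagles"},
    "$unset": {"tmp": ""},
    "$push": {"players": "cy"},
    "$addToSet": {"tags": "u12"},
})
print(team2)
```

Each update describes a change to apply rather than a whole new document, so two writers touching different fields do not overwrite each other.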

Two-phase commits.

Optimistic locking.
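Optimistic locking is commonly implemented with a version field: read the document, compute the change, and write only if the version is still the one that was read. The sketch below shows the pattern against an in-memory stand-in. `VersionedStore`, `update_if_version`, and `add_run` are hypothetical names; with a real database the version check and the increment would be part of a single conditional update.

```python
import threading


class VersionedStore:
    """In-memory stand-in for a collection; every document carries a 'version'."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()  # stands in for document-level atomicity

    def insert(self, doc_id, fields):
        self._docs[doc_id] = {**fields, "version": 0}

    def read(self, doc_id):
        return dict(self._docs[doc_id])  # return a copy, like a query result

    def update_if_version(self, doc_id, expected_version, changes):
        """Apply changes only if nobody has written since expected_version."""
        with self._lock:
            doc = self._docs[doc_id]
            if doc["version"] != expected_version:
                return False  # lost the race; caller should re-read and retry
            doc.update(changes)
            doc["version"] = expected_version + 1
            return True


def add_run(store, game_id, max_retries=5):
    """Read-modify-write with retries instead of holding a lock."""
    for _ in range(max_retries):
        game = store.read(game_id)
        new_score = game["score"] + 1
        if store.update_if_version(game_id, game["version"], {"score": new_score}):
            return True
    return False


store = VersionedStore()
store.insert("g1", {"score": 0})
stale = store.read("g1")          # another writer reads the same version...
add_run(store, "g1")              # ...but this write lands first
print(store.update_if_version("g1", stale["version"], {"score": 99}))
```

The stale writer is rejected rather than silently overwriting the newer score, and nothing is locked while the application is computing.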

External transactions.

Eventual Consistency.

Write canonical data first

Ensure queuing of propagation

Guarantee queue entry completeness
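The three steps above can be sketched with in-memory stand-ins. Everything here is an assumption made for illustration: `canonical`, `derived`, and `queue` are plain Python structures rather than collections or a durable queue, and "completeness" is read as "the queue entry carries everything the worker needs". The derived view stores game ids in a set, so replaying an entry does no harm.

```python
from collections import deque

canonical = {}   # source of truth, e.g. one record per game
derived = {}     # denormalized view: team -> set of game ids won
queue = deque()  # stand-in for a durable propagation queue


def record_result(game_id, winner):
    # 1. Write canonical data first.
    canonical[game_id] = {"winner": winner}
    # 2. Queue the propagation. The entry is self-contained, so the worker
    #    never has to guess what changed.
    queue.append({"game_id": game_id, "winner": winner})


def propagate_one():
    """Apply one queued entry to the derived view (safe to replay)."""
    entry = queue.popleft()
    derived.setdefault(entry["winner"], set()).add(entry["game_id"])


record_result("g1", "hawks")
print(derived)   # the derived view lags until the worker runs
propagate_one()
print(derived)
```

Between the two prints the system is inconsistent but recoverable: the canonical record exists and the queue entry records exactly what still has to be applied.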

3. Availability!

“/dev/null is web scale”

Durability.

Journaling.

OK?

Replication.

Moar = better?

4. Firefighting!

Test your capacity.

Naïve throughput testing with real hardware

Clone prod configuration

Consider copying data or subset

Start with crude approximations

Get as close to “real load” as makes sense

Model your growth.

db.stats()

db.<collection>.stats()

avg doc size, avg index size / doc

growth rates / collection

approx active portion / collection
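The numbers listed above can feed a back-of-the-envelope growth model. The sketch below assumes, as a simplification not stated in the talk, that all index data stays hot while only the active portion of documents does. The function names and the sample figures are hypothetical.

```python
def working_set_bytes(doc_count, avg_doc_bytes, avg_index_bytes_per_doc,
                      active_fraction):
    """Estimate hot data: all index bytes plus the active share of documents."""
    per_doc = avg_index_bytes_per_doc + active_fraction * avg_doc_bytes
    return doc_count * per_doc


def days_until_ram_full(ram_bytes, doc_count, docs_per_day, avg_doc_bytes,
                        avg_index_bytes_per_doc, active_fraction):
    """Days of growth left before the estimated working set exceeds RAM."""
    per_doc = avg_index_bytes_per_doc + active_fraction * avg_doc_bytes
    headroom = ram_bytes - doc_count * per_doc
    if headroom <= 0:
        return 0.0
    return headroom / (docs_per_day * per_doc)


# Made-up collection: 1M docs of 200 bytes, 100 bytes of index per doc,
# half of the documents active, growing by 100k docs per day.
print(working_set_bytes(1_000_000, 200, 100, 0.5))
print(days_until_ram_full(400_000_000, 1_000_000, 100_000, 200, 100, 0.5))
```

Running this per collection, with values taken from the stats output, gives a rough idea of which collection will outgrow memory first.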

Read your logs.

{...} ntoreturn:1 keyUpdates:0 numYields: 136 locks(micros) r:368727 reslen:78 199ms
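Reading logs at scale usually means pulling the counters out of lines like the one above. The following sketch parses a sample line that has the same fields, written with spaces between them. `parse_slow_op` and its output keys are hypothetical, and the patterns cover only the fields visible in that sample, not the full log format.

```python
import re


def parse_slow_op(line):
    """Extract the numeric counters from one slow-operation log line."""
    fields = {}
    for key in ("ntoreturn", "keyUpdates", "numYields", "reslen"):
        match = re.search(rf"{key}:\s*(\d+)", line)
        if match:
            fields[key] = int(match.group(1))
    match = re.search(r"locks\(micros\)\s+r:(\d+)", line)
    if match:
        fields["read_lock_micros"] = int(match.group(1))
    match = re.search(r"(\d+)ms\s*$", line)  # total duration ends the line
    if match:
        fields["millis"] = int(match.group(1))
    return fields


sample = ("{...} ntoreturn:1 keyUpdates:0 numYields: 136 "
          "locks(micros) r:368727 reslen:78 199ms")
print(parse_slow_op(sample))
```

In this sample the operation returned a 78-byte result but yielded 136 times, a combination that is often worth investigating as a possible scan.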

Don’t scan.

So...

Schema-less != no schema

ACID overrated; concurrency not

High availability is up to you

Understand the mechanics

Thanks!

Kiril Savino

CTO, GameChanger Media

www.GameChanger.io

@kirilnyc

kirilsavino.com/blog

Next Sessions at 3:40

5th Floor:

West Side Ballroom 3&4: Advanced Replication Internals

West Side Ballroom 1&2: Building a High-Performance Distributed Task Queue on MongoDB

Juilliard Complex: WhiteBoard Q&A

Lyceum Complex: Ask the Experts

7th Floor:

Empire Complex: Managing a Maturing MongoDB Ecosystem

SoHo Complex: MongoDB Indexing Constraints and Creative Schemas