Using MongoDB in Anger: Techniques and Considerations


Page 1: Mongodb in-anger-boston-rb-2011

Using MongoDB in Anger

Techniques and Considerations

Page 2: Mongodb in-anger-boston-rb-2011

Kyle [email protected] and @hwaet

Page 3: Mongodb in-anger-boston-rb-2011

Four topics:

Schema design

Indexing

Concurrency

Durability

Page 4: Mongodb in-anger-boston-rb-2011

I. Schema design

Page 5: Mongodb in-anger-boston-rb-2011

Document size

Keys are stored in the documents themselves.

For large data sets, you should use small key names.

Page 6: Mongodb in-anger-boston-rb-2011

> doc = {
    _id: ObjectId("4e94886ebd15f15834ff63c4"),
    username: 'Kyle',
    date_of_birth: new Date(1970, 1, 1),
    site_visits: 1027
  }

> Object.bsonsize( doc );
85

Page 7: Mongodb in-anger-boston-rb-2011

> doc = {
    _id: ObjectId("4e94886ebd15f15834ff63c4"),
    name: 'Kyle',
    dob: new Date(1970, 1, 1),
    v: 1027
  }

> Object.bsonsize( doc );
61 // 28% smaller!
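The 24-byte difference comes entirely from the field names. A minimal sketch in plain JavaScript (Node.js, no MongoDB required), relying on the fact that BSON stores each field name as a null-terminated string. The helper name `keyNameBytes` and the 100-million-document collection are assumptions for illustration only.

```javascript
// BSON stores every field name inside every document as a
// null-terminated string: its UTF-8 length plus one byte.
function keyNameBytes(doc) {
  return Object.keys(doc).reduce(
    (total, key) => total + Buffer.byteLength(key, 'utf8') + 1,
    0
  );
}

// Same field names as the two documents above; the values do not matter here.
const longKeys = { _id: 1, username: 1, date_of_birth: 1, site_visits: 1 };
const shortKeys = { _id: 1, name: 1, dob: 1, v: 1 };

const savedPerDoc = keyNameBytes(longKeys) - keyNameBytes(shortKeys);

// Hypothetical collection size, chosen only for illustration.
const docCount = 100 * 1000 * 1000;
const savedTotalBytes = savedPerDoc * docCount;

console.log(savedPerDoc + ' bytes saved per document');
console.log((savedTotalBytes / 1e9).toFixed(1) + ' GB saved across the collection');
```

The saving is per document, so it scales linearly with the number of documents.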

Page 8: Mongodb in-anger-boston-rb-2011

Document growth

Certain schema designs require documents to grow significantly.

This can be expensive.

Page 9: Mongodb in-anger-boston-rb-2011

// Sample: user with followers
{ _id: ObjectId("4e94886ebd15f15834ff63c4"),
  name: 'Kyle',
  followers: [
    { user_id: ObjectId("4e94875fbd15f15834ff63c3"),
      name: 'arussell' },
    { user_id: ObjectId("4e94875fbd15f15834ff63c4"),
      name: 'bsmith' }
  ]
}

Page 10: Mongodb in-anger-boston-rb-2011

An initial design:

// Update using $push will grow the document
new_follower = { user_id: ObjectId("4e94875fbd15f15834ff63c5"),
                 name: 'jcampbell' }

db.users.update({name: 'Kyle'},
  { $push: { followers: new_follower } })

Page 11: Mongodb in-anger-boston-rb-2011

Let's break this down...

At first, documents are inserted with no extra space.

But updates that change the size of the documents will alter the padding factor.

Even with a large padding factor, documents that grow unbounded will still eventually have to be moved.

Page 12: Mongodb in-anger-boston-rb-2011

Relocation is expensive:

All index entry pointers must be updated.

Entire document must be rewritten in a new place on disk (possibly not in RAM).

May cause fragmentation. Increases the number of entries in the free list.
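A toy model of why padding only delays relocation. This is a sketch, not MongoDB's actual allocator: it assumes a fixed padding factor (the real one adapts over time) and that a document moves whenever it outgrows its allocated space. The function name and all sizes are made up for illustration.

```javascript
// Count how often a steadily growing document outgrows its allocation.
// Sizes are in bytes; the document is inserted with no extra space.
function countRelocations(initialSize, growthPerUpdate, updates, paddingFactor) {
  let size = initialSize;
  let allocated = initialSize;
  let relocations = 0;
  for (let i = 0; i < updates; i++) {
    size += growthPerUpdate;
    if (size > allocated) {
      // The document no longer fits: move it and pad the new allocation.
      relocations += 1;
      allocated = Math.ceil(size * paddingFactor);
    }
  }
  return relocations;
}

// A 100-byte document that gains a 50-byte follower entry 1,000 times.
for (const factor of [1.0, 1.5, 2.0]) {
  console.log('padding ' + factor + ': ' +
    countRelocations(100, 50, 1000, factor) + ' relocations');
}
```

More padding means fewer moves, but as long as the document keeps growing the count never reaches zero.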

Page 13: Mongodb in-anger-boston-rb-2011

A better design:

// User collection
{ _id: ObjectId("4e94886ebd15f15834ff63c4"),
  name: 'Kyle'
}

// Followers collection
{ friend_id: ObjectId("4e94875fbd15f15834ff63c3"),
  name: 'arussell' },
{ friend_id: ObjectId("4e94875fbd15f15834ff63c4"),
  name: 'bsmith' }

Page 14: Mongodb in-anger-boston-rb-2011

The upshot?

Rich documents are still useful. They simplify the representation of objects and can increase query performance because of their pre-joined structure.

However, if your documents are going to grow unbounded, it's best to separate them into multiple collections.

Page 15: Mongodb in-anger-boston-rb-2011

Pre-aggregation

Page 16: Mongodb in-anger-boston-rb-2011

Aggregation

Map-reduce and group are adequate, but may not be fast enough for large data sets.

MongoDB 2.2 has a new, fast aggregation framework!

Still, pre-aggregation will be faster than post-aggregation in a lot of cases. For real-time apps, it's almost a necessity.

Page 17: Mongodb in-anger-boston-rb-2011

Example: a counter cache.

// User collection
{ _id: ObjectId("4e94886ebd15f15834ff63c4"),
  name: 'Kyle',
  follower_ct: 4
}

Page 18: Mongodb in-anger-boston-rb-2011

Using the $inc operator:

// This increment is in-place
// (i.e., no rewriting of the document).
db.users.update({name: 'Kyle'},
  {$inc: {follower_ct: 1}})

Page 19: Mongodb in-anger-boston-rb-2011

Need a real-world example?

Page 20: Mongodb in-anger-boston-rb-2011

A sophisticated example of pre-aggregation.

{ _id: { uri: BinData("0beec7b5ea3f0fdbc95d0dd47f35"),
         day: '2011-5-1' },
  total: 2820,
  hrs: { 0: 500,
         1: 700,
         2: 450,
         3: 343,
         // ... 4-23 go here
       },
  // Minutes are rolling. This gives real-time
  // numbers for the last hour. So when you increment
  // minute n, you need to $set minute n-1 to 0.
  mins: { 1: 12,
          2: 10,
          3: 5,
          4: 34
          // ... 5-60 go here
        }
}
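One way to build the update for a single page hit against a document of this shape, as a sketch in plain JavaScript. The field names (`total`, `hrs`, `mins`) follow the sample document; the function name, the use of UTC, and the plain-string URI hash are assumptions, and the rolling reset of the minute counters is deliberately left out.

```javascript
// Build the query and $inc update for one hit on a per-URI, per-day document.
function buildHitUpdate(uriHash, when) {
  // Day key in the same unpadded form as the sample: '2011-5-1'.
  const day = [
    when.getUTCFullYear(),
    when.getUTCMonth() + 1,
    when.getUTCDate()
  ].join('-');

  // One $inc bumps the daily total, the hour bucket, and the minute bucket.
  const inc = { total: 1 };
  inc['hrs.' + when.getUTCHours()] = 1;
  inc['mins.' + when.getUTCMinutes()] = 1;

  return {
    query: { _id: { uri: uriHash, day: day } },
    update: { $inc: inc },
    options: { upsert: true } // create the day's document on its first hit
  };
}

const op = buildHitUpdate('0beec7b5', new Date(Date.UTC(2011, 4, 1, 3, 12)));
console.log(JSON.stringify(op, null, 2));
// In the mongo shell this could be sent as, for example:
//   db.stats.update(op.query, op.update, true)  // "stats" is a hypothetical collection
```

Because every counter is touched with $inc, reading the totals later is a single document fetch with no aggregation step.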

Page 21: Mongodb in-anger-boston-rb-2011

Schema design summary

Think hard about the size of your documents. Optimize keys and data types (not discussed).

If your documents are growing unbounded, you may have the wrong schema design.

Consider operations that rewrite documents (and individual values) in-place. $inc and (sometimes) $set are great examples of this.

Page 22: Mongodb in-anger-boston-rb-2011

II. Indexing

Page 23: Mongodb in-anger-boston-rb-2011

It's all about efficiency:

Fundamental, but widely misunderstood.

The right indexes give you the most efficient use of your hardware (RAM, disk, and CPU).

The wrong indexes, or no indexes altogether, make trivial workloads impossible to run, even on high-end hardware.

Page 24: Mongodb in-anger-boston-rb-2011

The Basics

Every query should use an index. Use the MongoDB log or the query profiler to identify queries not using an index. The value of nscanned should be low.

Know about compound-key indexes. Know which indexes can be utilized for sorts, ranges, etc. Learn to use explain().

Good resources on indexing: MongoDB in Action and High Performance MySQL.
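Why nscanned matters, as a toy model in plain JavaScript. An array stands in for a collection and a Map stands in for an index on {name: 1}; real MongoDB indexes are B-trees, so this only illustrates the difference in documents examined. All names here are made up for the sketch.

```javascript
// Toy collection: 10,000 user documents held in a plain array.
const users = [];
for (let i = 0; i < 10000; i++) {
  users.push({ _id: i, name: 'user' + i });
}

// Without an index, every document has to be examined.
function findByScan(docs, name) {
  let scanned = 0;
  const results = [];
  for (const doc of docs) {
    scanned += 1;
    if (doc.name === name) results.push(doc);
  }
  return { results, scanned };
}

// A Map from name to documents stands in for an index on {name: 1}.
function buildNameIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    if (!index.has(doc.name)) index.set(doc.name, []);
    index.get(doc.name).push(doc);
  }
  return index;
}

// With the index, only the matching documents are touched.
function findByIndex(index, name) {
  const results = index.get(name) || [];
  return { results, scanned: results.length };
}

const scan = findByScan(users, 'user42');
const indexed = findByIndex(buildNameIndex(users), 'user42');
console.log('collection scan examined', scan.scanned, 'documents');
console.log('index lookup examined', indexed.scanned, 'documents');
```

Both lookups return the same document; only the amount of work differs, which is what a high nscanned in explain() is pointing at.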

Page 25: Mongodb in-anger-boston-rb-2011

Working set

For the best performance, you should have enough RAM to contain indexes and working set.

Working set is the portion of your total data size that's regularly used by the application. For some applications, working set might be 50% of data size. For others, it's close to 100%.

For example, think about Foursquare's checkins database. Because checkins are constantly queried to calculate badges, checkins must live in RAM. So working set on this database is 100%.

Page 26: Mongodb in-anger-boston-rb-2011

Working set (cont.)

On the other end of the spectrum, Craigslist uses MongoDB as a listing archive. This archive is rarely queried. Therefore, it doesn't matter if data size is much larger than RAM, since the working set is small.
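The sizing rule can be written down as simple arithmetic. A sketch with made-up numbers; none of these figures are measurements from Foursquare or Craigslist, and the function name is hypothetical.

```javascript
// All sizes in gigabytes. workingSetFraction is the share of the data
// that the application touches regularly (0 to 1).
function fitsInRam(dataGb, workingSetFraction, indexGb, ramGb) {
  const neededGb = dataGb * workingSetFraction + indexGb;
  return { neededGb, fits: neededGb <= ramGb };
}

// Hypothetical deployments on a 64 GB machine.
const hotDatabase = fitsInRam(60, 1.0, 8, 64);  // nearly all data is hot
const archive = fitsInRam(2000, 0.01, 8, 64);   // large, rarely queried archive

console.log('hot database fits in RAM:', hotDatabase.fits);
console.log('archive fits in RAM:', archive.fits);
```

The archive is more than thirty times larger than the hot database, yet it is the one that fits, because only a small part of it is in regular use.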

Page 27: Mongodb in-anger-boston-rb-2011

Special indexing features...

Page 28: Mongodb in-anger-boston-rb-2011

Sparse indexes

Use a sparse index to reduce index size. A sparse index will include only those documents having the indexed key.

For example, suppose you have 10 million users, of which only 100K are paying subscribers. You can index only those fields relevant to paid subscriptions with a sparse index.

Page 29: Mongodb in-anger-boston-rb-2011

A sparse index:

db.users.ensureIndex({expiration: 1}, {sparse: true})

// All users whose accounts expire next month
db.users.find({expiration:
  {$lte: new Date(2011, 11, 30), $gte: new Date(2011, 11, 1)}})
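The size difference can be seen in a toy model in plain JavaScript. Arrays of entries stand in for the index; the data set is scaled down to 1,000 users with the same 1% subscriber ratio, and the function names are made up for the sketch.

```javascript
// Toy data: 1,000 users, of which every 100th is a paying subscriber
// with an expiration date.
const accounts = [];
for (let i = 0; i < 1000; i++) {
  const doc = { _id: i, name: 'user' + i };
  if (i % 100 === 0) doc.expiration = new Date(2011, 11, 15);
  accounts.push(doc);
}

// A regular index holds one entry per document; documents without the
// key are indexed under null.
function regularIndexEntries(docs, key) {
  return docs.map(doc => ({ key: key in doc ? doc[key] : null, id: doc._id }));
}

// A sparse index skips documents that do not have the key at all.
function sparseIndexEntries(docs, key) {
  return docs
    .filter(doc => key in doc)
    .map(doc => ({ key: doc[key], id: doc._id }));
}

const regular = regularIndexEntries(accounts, 'expiration');
const sparse = sparseIndexEntries(accounts, 'expiration');
console.log('regular index entries:', regular.length);
console.log('sparse index entries:', sparse.length);
```

The trade-off is that a sparse index cannot answer questions about documents that lack the key, since they are simply not in it.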

Page 30: Mongodb in-anger-boston-rb-2011

Index-only queries

If you only need a few values, you can return those values directly from the index. This eliminates the indirection from index to data files on the server.

Specify the fields you want, and exclude the _id field.

The explain() method will display {indexOnly: true}.

Page 31: Mongodb in-anger-boston-rb-2011

An index-only query:

db.users.ensureIndex({follower_ct: 1, name: 1})

// This will be index-only.
db.users.find({},
  {follower_ct: 1, name: 1, _id: 0}).sort({follower_ct: -1})
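A rough way to reason about whether a query can be index-only, as a sketch in plain JavaScript. This is a simplified rule of thumb, not the server's logic (it ignores multikey indexes and other special cases), and the function name is hypothetical; explain() remains the authority.

```javascript
// Simplified check: a query can be answered from the index alone only if
// every field it filters on, sorts on, or returns is part of the index,
// and _id is explicitly excluded (unless _id is itself in the index).
function looksIndexOnly(indexFields, queryFields, sortFields, projection) {
  const inIndex = field => indexFields.includes(field);
  const returned = Object.keys(projection).filter(f => projection[f] === 1);
  const idHandled = projection._id === 0 || inIndex('_id');
  return returned.length > 0 &&
    idHandled &&
    queryFields.every(inIndex) &&
    sortFields.every(inIndex) &&
    returned.every(inIndex);
}

const index = ['follower_ct', 'name'];

// The query from the slide: empty filter, sort on follower_ct.
const slideQuery = looksIndexOnly(
  index, [], ['follower_ct'], { follower_ct: 1, name: 1, _id: 0 });

// Forgetting to exclude _id forces a trip to the data files.
const withId = looksIndexOnly(
  index, [], ['follower_ct'], { follower_ct: 1, name: 1 });

console.log('slide query index-only:', slideQuery);
console.log('same query without _id: 0:', withId);
```

The second case is the common mistake: _id is returned by default, and it is not part of this index.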

Page 32: Mongodb in-anger-boston-rb-2011

Indexing summary

Learn about indexing.

Ensure that your queries are using the most efficient index.

Investigate sparse indexes and index-only queries for performance-intensive apps.

Page 33: Mongodb in-anger-boston-rb-2011

III. Concurrency

Page 34: Mongodb in-anger-boston-rb-2011

Current implementation:

Concurrency is still somewhat coarse-grained. For any given mongod, there's a server-wide reader-writer lock, with a variety of yielding optimizations.

For example, in MongoDB 2.0, the server won't hold a write lock around a page fault.

On the roadmap are database-level locking, collection-level locking, and extent-based locking.

Page 35: Mongodb in-anger-boston-rb-2011

To avoid concurrency-related bottlenecks:

Separate orthogonal concerns into multiple smaller deployments. For example, one for analytics and another for the rest of the app.

Ensure that your indexes and working set fit in RAM.

Do not attempt to scale reads with secondary nodes unless your application is mostly read-heavy.

Page 36: Mongodb in-anger-boston-rb-2011


IV. Durability

Page 37: Mongodb in-anger-boston-rb-2011

Four topics:

Storage

Journaling

Write concern

Replication

Page 38: Mongodb in-anger-boston-rb-2011

Storage

Each file is mapped to virtual memory.

All writes to data files are to a virtual memory address.

Sync to disk is handled by the OS, with a forced flush every 60 seconds.

Page 39: Mongodb in-anger-boston-rb-2011

[Diagram: disk, RAM, per-process virtual memory, and physical memory]

Page 40: Mongodb in-anger-boston-rb-2011

Journaling

Data is written to an append-only log and synced every 100 ms.

This imposes a write penalty, especially on slow drives.

If you use journaling, you may want to mount a separate drive for the journal directory.

Enabled by default in MongoDB 2.0.
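The two sync intervals (60 seconds for the data files, 100 ms for the journal) translate into very different exposure on a single node. A back-of-the-envelope sketch in plain JavaScript; the write rate is hypothetical, the function name is made up, and the result is an upper bound under a steady load, not a measurement.

```javascript
// Upper bound on writes that have not yet reached disk when the process
// dies, given a steady write rate and the interval between syncs.
function writesAtRisk(writesPerSecond, syncIntervalMs) {
  return writesPerSecond * syncIntervalMs / 1000;
}

const rate = 1000; // hypothetical load, in writes per second

const withoutJournal = writesAtRisk(rate, 60 * 1000); // data files flushed every 60 seconds
const withJournal = writesAtRisk(rate, 100);          // journal synced every 100 ms

console.log('at risk without journaling:', withoutJournal);
console.log('at risk with journaling:', withJournal);
```

This ignores replication, which gives another copy of each write on a different machine.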

Page 41: Mongodb in-anger-boston-rb-2011

Replication

Fast, automatic failover.

Simplifies backups.

If you don't want to use journaling, you can use replication instead. Recovery can be trickier, but writes will be faster.

Page 42: Mongodb in-anger-boston-rb-2011

Write concern

Page 43: Mongodb in-anger-boston-rb-2011

A default, fire-and-forget write:

@users.insert( {'name' => 'Kyle'} )

Page 44: Mongodb in-anger-boston-rb-2011

Write with a round trip:

@users.insert( {'name' => 'Kyle'}, :safe => true )

Page 45: Mongodb in-anger-boston-rb-2011

Write to two nodes with a 1000 ms timeout:

@users.insert( {'name' => 'Kyle'},
  :safe => {:w => 2, :wtimeout => 1000})

Page 46: Mongodb in-anger-boston-rb-2011

Write concern advice:

Use a level of write concern appropriate to the data you're writing.

By default, use {:safe => true}. That is, ensure a single round trip.

For especially sensitive data, use replication acknowledgment.

For analytics, clicks, logging, etc., use fire-and-forget.
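The advice above can be captured in one small lookup so the choice is made in a single place. A sketch in plain JavaScript: the category names are hypothetical, and the option objects simply mirror the Ruby driver hashes from the earlier slides rather than any particular JavaScript driver's API.

```javascript
// Map a kind of data to write options, following the advice above.
function writeOptionsFor(kind) {
  switch (kind) {
    case 'analytics':
    case 'clicks':
    case 'logging':
      return {};                                  // fire-and-forget
    case 'sensitive':
      return { safe: { w: 2, wtimeout: 1000 } };  // replication acknowledgment
    default:
      return { safe: true };                      // single round trip
  }
}

for (const kind of ['clicks', 'profile', 'sensitive']) {
  console.log(kind, JSON.stringify(writeOptionsFor(kind)));
}
```

Defaulting to the round trip means that a new, unclassified kind of data is never silently written fire-and-forget.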

Page 47: Mongodb in-anger-boston-rb-2011

Durability in anger

Use replication for durability. You can, optionally, keep a single, passive replica with durability enabled.

Use write concern judiciously.

Page 48: Mongodb in-anger-boston-rb-2011

Topics we didn't cover:

Hardware and deployment practices.

Sharding and schema design at scale.

(Lots of videos on these at 10gen.com!)

Page 49: Mongodb in-anger-boston-rb-2011

Announcements, Questions, and Credits

http://www.flickr.com/photos/foamcow/34055184/

http://www.flickr.com/photos/reedinglessons/2239767394

http://www.flickr.com/photos/edelman/6031599707

http://www.flickr.com/photos/curtisperry/5386879526/

http://www.flickr.com/photos/ryanspalding/4756905846

Page 50: Mongodb in-anger-boston-rb-2011

Thank you