Indexing with Aaron Staple [email protected]

Indexing with MongoDB


DESCRIPTION

Video available here: http://vivu.tv/portal/archive.jsp?flow=783-586-4282&id=1270584002677

We all know that MongoDB is one of the most flexible and feature-rich databases available. In this webinar we'll discuss how you can leverage this feature set and maintain high performance with your project's massive data sets and high loads. We'll cover how indexes can be designed to optimize the performance of MongoDB. We'll also discuss tips for diagnosing and fixing performance issues should they arise.


Page 1: Indexing with MongoDB

Indexing with MongoDB

Aaron Staple [email protected]

Page 2: Indexing with MongoDB

What are indexes?

• References to your documents, efficiently ordered by key

• Maintained in a tree structure, allowing fast lookup

[Diagram: the documents {x:0.5,y:0.5}, {x:2,y:0.5}, {x:5,y:2}, {x:-4,y:10} and {x:3,y:'f'}, referenced by two indexes, {x:1} and {y:1}]
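To make "references, ordered by key" concrete, here is a minimal sketch in plain JavaScript. It is a toy model, not MongoDB's implementation: a sorted array stands in for the tree, and buildIndex, lowerBound and findEq are hypothetical helper names. The _id values are made up, and only the numeric documents from the diagram are used so that keys can be compared with subtraction.

```javascript
// Toy model only: a sorted array of {key, ref} entries stands in for the B-tree.
const docs = [
  { _id: 0, x: 0.5, y: 0.5 },
  { _id: 1, x: 2, y: 0.5 },
  { _id: 2, x: 5, y: 2 },
  { _id: 3, x: -4, y: 10 },
];

// Build an "index" on one field: entries sorted by key, each pointing at a document.
function buildIndex(collection, field) {
  return collection
    .map((doc, position) => ({ key: doc[field], ref: position }))
    .sort((a, b) => a.key - b.key);
}

// Binary search for the first entry whose key is >= target.
function lowerBound(index, target) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].key < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Equality lookup: jump to the first matching entry, then walk while keys match.
function findEq(collection, index, target) {
  const out = [];
  for (let i = lowerBound(index, target); i < index.length && index[i].key === target; i++) {
    out.push(collection[index[i].ref]);
  }
  return out;
}

const xIndex = buildIndex(docs, "x");
console.log(xIndex.map((e) => e.key)); // [ -4, 0.5, 2, 5 ]
console.log(findEq(docs, xIndex, 2));  // [ { _id: 1, x: 2, y: 0.5 } ]
```

The lookup touches a handful of entries instead of every document, which is the point of keeping the references ordered.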

Page 3: Indexing with MongoDB

Fast document lookup

• db.c.findOne( {_id:2} ), using index {_id:1}
• db.c.find( {x:2} ), using index {x:1}
• db.c.find( {x:{$in:[2,3]}} ), using index {x:1}
• db.c.find( {'x.a':1} ), using index {'x.a':1}
  – Matches {_id:1,x:{a:1}}
• db.c.find( {x:{a:1}} ), using index {x:1}
  – Matches {_id:1,x:{a:1}}, but not {_id:2,x:{a:1,b:2}}

QUESTION: What about db.c.find( {$where:"this.x == this.y"} ), using index {x:1}?
Indexes cannot be used for $where type queries, but if there are non-where elements in the query then indexes can be used for the non-where elements.

Page 4: Indexing with MongoDB

Fast document range scan

• db.c.find( {x:{$gt:2}} ), using index {x:1}
• db.c.find( {x:{$gt:2,$lt:5}} ), using index {x:1}
• db.c.find( {x:/^a/} ), using index {x:1}

QUESTION: What about db.c.find( {x:/a/} ), using index {x:1}?
The letter 'a' can appear anywhere in a matching string, so lexicographic ordering on strings won’t help. However, we can use the index to find the range of documents where x is a string (e.g. not a number) or x is the regular expression /a/.
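The difference between /^a/ and /a/ can be sketched as follows. This is an illustration under stated assumptions rather than MongoDB's code: prefixRange, scanRange and scanAll are hypothetical helpers, the keys are made-up sample data, and the prefix is assumed to be non-empty and not to end in the highest possible character.

```javascript
// An anchored prefix such as /^a/ is equivalent to the key range "a" <= k < "b".
function prefixRange(prefix) {
  // Upper bound: the prefix with its last character bumped by one code unit.
  const last = prefix.charCodeAt(prefix.length - 1);
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return { lower: prefix, upper };
}

// String keys as they would be ordered in the index.
const sortedKeys = ["abacus", "apple", "bob", "cab", "dune"];

// Range scan: only keys inside [lower, upper) are examined.
function scanRange(keys, { lower, upper }) {
  let scanned = 0;
  const matched = [];
  for (const k of keys) {
    if (k < lower) continue; // a real index would seek here instead of stepping
    if (k >= upper) break;   // past the range: stop early
    scanned++;
    matched.push(k);
  }
  return { matched, scanned };
}

// Unanchored pattern: ordering gives no bounds, so every string key is tested.
function scanAll(keys, regex) {
  const matched = keys.filter((k) => regex.test(k));
  return { matched, scanned: keys.length };
}

console.log(scanRange(sortedKeys, prefixRange("a"))); // matched abacus, apple; scanned 2
console.log(scanAll(sortedKeys, /a/));                // matched abacus, apple, cab; scanned 5
```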

Page 5: Indexing with MongoDB

Other operations

• db.c.count( {x:2} ) using index {x:1}
• db.c.distinct( {x:2} ) using index {x:1}
• db.c.update( {x:2}, {x:3} ) using index {x:1}
• db.c.remove( {x:2} ) using index {x:1}

QUESTION: What about db.c.update( {x:2}, {$inc:{x:3}} ), using index {x:1}?
Older versions of MongoDB didn’t support modifiers on indexed fields, but we now support this.

Page 6: Indexing with MongoDB

Fast document ordering

• db.c.find( {} ).sort( {x:1} ), using index {x:1}
• db.c.find( {} ).sort( {x:-1} ), using index {x:1}
• db.c.find( {x:{$gt:4}} ).sort( {x:-1} ), using index {x:1}
• db.c.find( {} ).sort( {'x.a':1} ), using index {'x.a':1}

QUESTION: What about db.c.find( {y:1} ).sort( {x:1} ), using index {x:1}?
The index will be used to ensure ordering, provided there is no better index.

Page 7: Indexing with MongoDB

Missing fields

• db.c.find( {x:null} ), using index {x:1}
  – Matches {_id:5}
• db.c.find( {x:{$exists:false}} ), using index {x:1}
  – Matches {_id:5}, but not {_id:6,x:null}

QUESTION: What about db.c.find( {x:{$exists:true}} ), using index {x:1}?
The index is not currently used, though we may use the index in a future version of MongoDB.

Page 8: Indexing with MongoDB

Array matching

• All the following match {_id:6,x:[2,10]} and use index {x:1}
  – db.c.find( {x:2} )
  – db.c.find( {x:10} )
  – db.c.find( {x:{$gt:5}} )
  – db.c.find( {x:[2,10]} )
  – db.c.find( {x:{$in:[2,5]}} )

QUESTION: What about db.c.find( {x:{$all:[2,10]}} )?
The index will be used to look up all documents matching {x:2}.
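One way to picture why every query above can use {x:1} is an index that holds one entry per array element. The sketch below is a toy model under that assumption, not MongoDB's internals; buildMultikeyIndex and findWhere are hypothetical helpers and the {_id:7} document is made up for contrast.

```javascript
// Toy multikey index: one entry per array element, all pointing at the same document.
function buildMultikeyIndex(collection, field) {
  const entries = [];
  collection.forEach((doc, ref) => {
    const value = doc[field];
    const keys = Array.isArray(value) ? value : [value];
    for (const key of keys) entries.push({ key, ref });
  });
  return entries.sort((a, b) => a.key - b.key);
}

// Collect documents whose entries satisfy a key predicate.
// The Set removes documents that are reached through more than one key.
// (A real index would seek to the matching range instead of testing every entry.)
function findWhere(collection, index, keyMatches) {
  const refs = new Set();
  for (const entry of index) {
    if (keyMatches(entry.key)) refs.add(entry.ref);
  }
  return [...refs].map((ref) => collection[ref]);
}

const coll = [{ _id: 6, x: [2, 10] }, { _id: 7, x: 3 }];
const idx = buildMultikeyIndex(coll, "x");

console.log(idx.map((e) => e.key));                            // [ 2, 3, 10 ]
console.log(findWhere(coll, idx, (k) => k === 2).map((d) => d._id)); // [ 6 ]
console.log(findWhere(coll, idx, (k) => k > 5).map((d) => d._id));   // [ 6 ]
console.log(findWhere(coll, idx, (k) => k >= 2).map((d) => d._id));  // [ 6, 7 ], no duplicate 6
```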

Page 9: Indexing with MongoDB

Compound Indexes

• db.c.find( {x:10,y:20} ), using index {x:1,y:1}
• db.c.find( {x:10,y:20} ), using index {x:1,y:-1}
• db.c.find( {x:{$in:[10,20]},y:20} ), using index {x:1,y:1}
• db.c.find().sort( {x:1,y:1} ), using index {x:1,y:1}
• db.c.find().sort( {x:-1,y:1} ), using index {x:1,y:-1}
• db.c.find( {x:10} ).sort( {y:1} ), using index {x:1,y:1}

QUESTION: What about db.c.find( {y:10} ).sort( {x:1} ), using index {x:1,y:1}?
The index will be used to ensure ordering, provided no better index is available.
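The direction rules for compound indexes can be checked with a small sketch. It assumes an index can be walked forwards or backwards, and models the index order with an ordinary sort; compareBySpec is a hypothetical helper and the rows are made-up sample data.

```javascript
// Compare two documents under an index spec such as [["x", 1], ["y", -1]].
function compareBySpec(spec) {
  return (a, b) => {
    for (const [field, direction] of spec) {
      if (a[field] < b[field]) return -direction;
      if (a[field] > b[field]) return direction;
    }
    return 0;
  };
}

const rows = [
  { x: 1, y: 1 },
  { x: 1, y: 2 },
  { x: 2, y: 1 },
  { x: 2, y: 2 },
];

// Entry order of a toy {x:1,y:-1} index.
const indexOrder = [...rows].sort(compareBySpec([["x", 1], ["y", -1]]));

// Walking that index backwards gives the order wanted by sort( {x:-1,y:1} ).
const backwards = [...indexOrder].reverse();
const wanted = [...rows].sort(compareBySpec([["x", -1], ["y", 1]]));
console.log(JSON.stringify(backwards) === JSON.stringify(wanted)); // true

// Neither direction of the same index yields sort( {x:1,y:1} ).
const bothAscending = [...rows].sort(compareBySpec([["x", 1], ["y", 1]]));
console.log(JSON.stringify(indexOrder) === JSON.stringify(bothAscending)); // false
console.log(JSON.stringify(backwards) === JSON.stringify(bothAscending));  // false
```

This is why {x:1,y:-1} serves sort( {x:-1,y:1} ) while {x:1,y:1} serves sort( {x:1,y:1} ): only the relative directions of the keys matter.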

Page 10: Indexing with MongoDB

When indexes are less helpful

• db.c.find( {x:{$ne:1}} )
• db.c.find( {x:{$mod:[10,1]}} )
  – Uses index {x:1} to scan numbers only
• db.c.find( {x:{$not:/a/}} )
• db.c.find( {x:{$gte:0,$lte:10},y:5} ) using index {x:1,y:1}
  – Currently must scan all elements from {x:0,y:5} to {x:10,y:5}, but some improvements may be possible
• db.c.find( {$where:'this.x == 5'} )

QUESTION: What about db.c.find( {x:{$not:/^a/}} ), using index {x:1}?
The index is not used currently, but will be used in MongoDB 1.6
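The cost of the {x:{$gte:0,$lte:10},y:5} case can be estimated with a toy scan. The sketch assumes the behaviour described on the slide (every entry between {x:0,y:5} and {x:10,y:5} is examined); the data set and the scan helper are made up for illustration.

```javascript
// Entries of a toy {x:1,y:1} index: every (x, y) pair for x in 0..19 and y in 0..9.
// The nested loops emit them already in index order.
const entries = [];
for (let x = 0; x < 20; x++) {
  for (let y = 0; y < 10; y++) entries.push({ x, y });
}

// Examine every entry from {x:xMin,y:y} to {x:xMax,y:y} and count the real matches.
function scan(indexEntries, xMin, xMax, y) {
  let nscanned = 0;
  let n = 0;
  for (const e of indexEntries) {
    if (e.x < xMin || (e.x === xMin && e.y < y)) continue; // before the lower bound
    if (e.x > xMax || (e.x === xMax && e.y > y)) break;    // past the upper bound
    nscanned++;
    if (e.y === y) n++;
  }
  return { nscanned, n };
}

console.log(scan(entries, 0, 10, 5)); // { nscanned: 101, n: 11 }
```

Only 11 of the 101 examined entries match, because the range on the leading key x forces a walk through every y value in between.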

Page 11: Indexing with MongoDB

Geospatial indexes

• db.c.find( {a:[50,50]} ) using index {a:'2d'}
• db.c.find( {a:{$near:[50,50]}} ) using index {a:'2d'}
  – Results are sorted closest - farthest
• db.c.find( {a:{$within:{$box:[[40,40],[60,60]]}}} ) using index {a:'2d'}
• db.c.find( {a:{$within:{$center:[[50,50],10]}}} ) using index {a:'2d'}
• db.c.find( {a:{$near:[50,50]},b:2} ) using index {a:'2d',b:1}

QUESTION: Most queries can be performed with or without an index. Is this true of geospatial queries?
No. A geospatial query requires an index.

Page 12: Indexing with MongoDB

Creating indexes

• {_id:1} index created automatically
  – For non-capped collections
• db.c.ensureIndex( {x:1} )
  – Can create an index at any time, even when you already have plenty of data in your collection
  – Creating an index will block MongoDB unless you specify background index creation
• db.c.ensureIndex( {x:1}, {background:true} )
• Background index creation still impacts performance; run at non-peak times if you’re concerned

QUESTION: Can an index be removed during background creation?
Not at this time.

Page 13: Indexing with MongoDB

Unique key constraints

• db.c.ensureIndex( {x:1}, {unique:true} )
  – Don’t allow {_id:10,x:2} and {_id:11,x:2}
  – Don’t allow {_id:12} and {_id:13} (both match {x:null})
• What if duplicates exist before index is created?
  – Normally index creation fails and the index is removed
  – db.c.ensureIndex( {x:1}, {unique:true,dropDups:true} )

QUESTION: In dropDups mode, which duplicates will be removed?
The first document according to the collection’s “natural order” will be preserved.
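The unique and dropDups rules above can be sketched in a few lines. This is a toy model, not MongoDB's index builder: buildUniqueIndex is a hypothetical helper, array order stands in for natural order, and a missing field is keyed as null as the slide describes.

```javascript
// Toy unique-index build over an array that is in "natural order".
function buildUniqueIndex(collection, field, { dropDups = false } = {}) {
  const seen = new Map(); // key -> first document that used it
  const kept = [];
  for (const doc of collection) {
    const key = field in doc ? doc[field] : null; // missing field is indexed as null
    if (seen.has(key)) {
      if (!dropDups) throw new Error("duplicate key: " + JSON.stringify(key));
      continue; // dropDups: the first document wins, later duplicates are discarded
    }
    seen.set(key, doc);
    kept.push(doc);
  }
  return kept;
}

const c = [{ _id: 10, x: 2 }, { _id: 11, x: 2 }, { _id: 12 }, { _id: 13 }];

try {
  buildUniqueIndex(c, "x");
} catch (err) {
  console.log(err.message); // duplicate key: 2
}

console.log(buildUniqueIndex(c, "x", { dropDups: true }).map((d) => d._id)); // [ 10, 12 ]
```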

Page 14: Indexing with MongoDB

Cleaning up indexes

• db.system.indexes.find( {ns:'db.c'} )
• db.c.dropIndex( {x:1} )
• db.c.dropIndexes()
• db.c.reIndex()
  – Rebuilds all indexes, removing index cruft that has built up over large numbers of updates and deletes. Index cruft will not exist in MongoDB 1.6, so this command will be deprecated.

QUESTION: Why would you want to drop an index?
See next slide…

Page 15: Indexing with MongoDB

Limits and Tradeoffs

• Max 40 indexes per collection
• Logically equivalent indexes are not prevented (e.g. {x:1} and {x:-1})
• Indexes can improve speed of queries, but make inserts slower
• More specific indexes {a:1,b:1,c:1} can be more helpful than less specific indexes {a:1}, but sorting compound keys may not be as fast as sorting simple keys

QUESTION: Do indexes make updates slower? How about deletes?
It depends: finding your document might be faster, but if any indexed fields are changed the indexes must be updated.

Page 16: Indexing with MongoDB

Query Optimizer

• In charge of picking which index to use for a query/count/update/delete/etc
  – Implementation is part of the magic of mongo (you can read about it online; not covering today)
• Usually it does a good job, but if you know what you’re doing you can override it
  – db.c.find( {x:2,y:3} ).hint( {y:1} )
  – Use index {y:1} and avoid trying out {x:1}
• As your data changes, different indexes may be chosen. Ordering requirements should be made explicit using sort().

QUESTION: How can you force a full collection scan instead of using indexes?
db.c.find( {x:2,y:3} ).hint( {$natural:1} )

Page 17: Indexing with MongoDB

Mongod log output

• query test.c ntoreturn:1 reslen:69 nscanned:100000 { i: 99999.0 } nreturned:1 157ms
• query test.$cmd ntoreturn:1 command: { count: "c", query: { type: 0.0, i: { $gt: 99000.0 } }, fields: {} } reslen:64 256ms
• query:{ query: {}, orderby: { i: 1.0 } } ... query test.c ntoreturn:0 exception 1378ms ... User Exception 10128:too much key data for sort() with no index. add an index or specify a smaller limit
• query test.c ntoreturn:0 reslen:4783 nscanned:100501 { query: { type: 500.0 }, orderby: { i: 1.0 } } nreturned:101 390ms
• Occasionally may see a slow operation as a result of disk activity or mongo cleaning things up; some messages about slow ops are spurious
  – Keep this in mind when running the same op a massive number of times, and it appears slow very rarely
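When reading many such lines it helps to pull out nscanned, nreturned and the duration, since a large nscanned-to-nreturned ratio suggests a missing index. The sketch below handles only the line shape shown on this slide; parseSlowQuery is a hypothetical helper, and other server versions may log in a different format.

```javascript
// Extract the counters from a slow-query log line of the shape shown on the slide.
function parseSlowQuery(line) {
  const num = (re) => {
    const m = line.match(re);
    return m ? Number(m[1]) : null; // null when the field is absent from the line
  };
  return {
    nscanned: num(/nscanned:(\d+)/),
    nreturned: num(/nreturned:(\d+)/),
    millis: num(/(\d+)ms\s*$/),
  };
}

const line =
  "query test.c ntoreturn:1 reslen:69 nscanned:100000 { i: 99999.0 } nreturned:1 157ms";
const stats = parseSlowQuery(line);

console.log(stats); // { nscanned: 100000, nreturned: 1, millis: 157 }
console.log(stats.nscanned / stats.nreturned); // 100000 entries examined per document returned
```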

Page 18: Indexing with MongoDB

Profiling

• Record same info as with log messages, but in a database collection
> db.system.profile.find()
{"ts" : "Thu Jan 29 2009 15:19:32 GMT-0500 (EST)" , "info" : "query test.$cmd ntoreturn:1 reslen:66 nscanned:0 <br>query: { profile: 2 } nreturned:1 bytes:50" , "millis" : 0}
...
> db.system.profile.find( { info: /test.foo/ } )
> db.system.profile.find( { millis : { $gt : 5 } } )
> db.system.profile.find().sort({$natural:-1})

• Enable explicitly using levels (0:off, 1:slow ops (>100ms), 2:all ops)
> db.setProfilingLevel(2);
{"was" : 0 , "ok" : 1}
> db.getProfilingLevel()
2
> db.setProfilingLevel( 1 , 10 ); // slow means > 10ms

• Profiling impacts performance, but not severely

Page 19: Indexing with MongoDB

Query explain

> db.c.find( {x:1000,y:0} ).explain()
{
    "cursor" : "BtreeCursor x_1",
    "indexBounds" : [
        [
            { "x" : 1000 },
            { "x" : 1000 }
        ]
    ],
    "nscanned" : 10,
    "nscannedObjects" : 10,
    "n" : 10,
    "millis" : 0,
    "oldPlan" : {
        "cursor" : "BtreeCursor x_1",
        "indexBounds" : [
            [
                { "x" : 1000 },
                { "x" : 1000 }
            ]
        ]
    },
    "allPlans" : [
        {
            "cursor" : "BtreeCursor x_1",
            "indexBounds" : [
                [
                    { "x" : 1000 },
                    { "x" : 1000 }
                ]
            ]
        },
        {
            "cursor" : "BtreeCursor y_1",
            "indexBounds" : [
                [
                    { "y" : 0 },
                    { "y" : 0 }
                ]
            ]
        },
        {
            "cursor" : "BasicCursor",
            "indexBounds" : [ ]
        }
    ]
}

Page 20: Indexing with MongoDB

Example 1

Setup:
> for( i = 0; i < 100000; ++i ) { db.c.save( {i:i} ); }

> db.c.findOne( {i:99999} )
{ "_id" : ObjectId("4bb962dddfdcf5761c1ec6a3"), "i" : 99999 }

query test.c ntoreturn:1 reslen:69 nscanned:100000 { i: 99999.0 } nreturned:1 157ms

> db.c.find( {i:99999} ).limit(1).explain()
{
    "cursor" : "BasicCursor",
    "indexBounds" : [ ],
    "nscanned" : 100000,
    "nscannedObjects" : 100000,
    "n" : 1,
    "millis" : 161,
    "allPlans" : [
        {
            "cursor" : "BasicCursor",
            "indexBounds" : [ ]
        }
    ]
}

Fix:
> db.c.ensureIndex( {i:1} );

Page 21: Indexing with MongoDB

Example 2

Setup:
> for( i = 0; i < 100000; ++i ) { db.c.save( {type:i%2,i:i} ); }

> db.c.count( {type:0,i:{$gt:99000}} )
499

query test.$cmd ntoreturn:1 command: { count: "c", query: { type: 0.0, i: { $gt: 99000.0 } }, fields: {} } reslen:64 256ms

> db.c.find( {type:0,i:{$gt:99000}} ).limit(1).explain()
{
    "cursor" : "BtreeCursor type_1",
    "indexBounds" : [
        [
            { "type" : 0 },
            { "type" : 0 }
        ]
    ],
    "nscanned" : 49502,
    "nscannedObjects" : 49502,
    "n" : 1,
    "millis" : 349,
    ...

Fix:
> db.c.ensureIndex( {type:1,i:1} );
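The numbers in this example can be reproduced with a toy model in plain JavaScript. It assumes that entries with equal keys in the {type:1} index stay in insertion order and that the compound index can seek directly to the first qualifying entry; typeIndex, compoundIndex and both scan functions are hypothetical names, not MongoDB internals.

```javascript
// Same data as the slide: 100000 documents with type = i % 2.
const data = [];
for (let i = 0; i < 100000; ++i) data.push({ type: i % 2, i });

// Toy {type:1} index: only the type is stored; ref is the document's array position.
const typeIndex = [0, 1].flatMap((t) =>
  data.filter((d) => d.type === t).map((d) => ({ type: t, ref: d.i }))
);

// Walk the entries for one type, fetching each document to test i.
function scanTypeIndex(type, minI, limit = Infinity) {
  let nscanned = 0;
  let n = 0;
  for (const entry of typeIndex) {
    if (entry.type !== type) continue; // a real index would seek straight to this key
    nscanned++;
    if (data[entry.ref].i > minI && ++n >= limit) break;
  }
  return { nscanned, n };
}

// Toy {type:1,i:1} index: entries ordered by type, then i.
const compoundIndex = data
  .map((d) => ({ type: d.type, i: d.i }))
  .sort((a, b) => a.type - b.type || a.i - b.i);

// Binary search to the first entry past {type, i:minI}, then walk while the type matches.
function scanCompoundIndex(type, minI) {
  let lo = 0;
  let hi = compoundIndex.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    const e = compoundIndex[mid];
    if (e.type < type || (e.type === type && e.i <= minI)) lo = mid + 1;
    else hi = mid;
  }
  let nscanned = 0;
  for (let k = lo; k < compoundIndex.length && compoundIndex[k].type === type; k++) nscanned++;
  return { nscanned, n: nscanned };
}

console.log(scanTypeIndex(0, 99000, 1)); // { nscanned: 49502, n: 1 }
console.log(scanTypeIndex(0, 99000));    // { nscanned: 50000, n: 499 }
console.log(scanCompoundIndex(0, 99000)); // { nscanned: 499, n: 499 }
```

Under these assumptions the limit(1) scan examines 49502 entries, the same nscanned reported by explain(), while the compound index examines only the 499 entries that actually match.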

Page 22: Indexing with MongoDB

Example 3

Setup:
> for( i = 0; i < 1000000; ++i ) { db.c.save( {i:i} ); }

> db.c.find().sort( {i:1} )
error: {
    "$err" : "too much key data for sort() with no index. add an index or specify a smaller limit"
}

> db.c.find().sort( {i:1} ).explain()
JS Error: uncaught exception: error: {
    "$err" : "too much key data for sort() with no index. add an index or specify a smaller limit"
}

Fix:
> db.c.ensureIndex( {i:1} );

Page 23: Indexing with MongoDB

Example 4

Setup:
> for( i = 0; i < 1000000; ++i ) { db.c.save( {i:i,type:i%1000} ); }

> db.c.find( {type:500} ).sort( {i:1} )
{ "_id" : ObjectId("4bba4904dfdcf5761c2f917e"), "i" : 500, "type" : 500 }
{ "_id" : ObjectId("4bba4904dfdcf5761c2f9566"), "i" : 1500, "type" : 500 }
...

query test.c ntoreturn:0 reslen:4783 nscanned:100501 { query: { type: 500.0 }, orderby: { i: 1.0 } } nreturned:101 390ms

> db.c.find( {type:500} ).sort( {i:1} ).explain()
{
    "cursor" : "BtreeCursor i_1",
    "indexBounds" : [
        [
            { "i" : { "$minElement" : 1 } },
            { "i" : { "$maxElement" : 1 } }
        ]
    ],
    "nscanned" : 1000000,
    "nscannedObjects" : 1000000,
    "n" : 1000,
    "millis" : 5388,
    ...

Fix:
> db.c.ensureIndex( {type:1,i:1} );

Page 24: Indexing with MongoDB

Questions?

• Follow @mongodb
• Get involved www.mongodb.org
• Upcoming events www.mongodb.org/display/DOCS/Events
  – MongoSF April 30
  – SF office hours every Mon 4-6pm Epicenter Cafe
• Commercial support www.10gen.com
• [email protected]