Learn all about Indexing Strategies for MongoDB.
Indexing Strategies to Help you Scale
Senior Solutions Architect, MongoDB
Dmitry Baev
Agenda
• What are indexes?
• Indexing Basics
• Evaluation / Tuning
• Geospatial
• Text Search
• Scaling
What Are Indexes?
Imagine you're looking for a recipe in a cookbook ordered by recipe name. Looking up a recipe by name is quick and easy.
Consult the Index
Linked List
Finding 7 in a Linked List
Finding 7 In a Tree
Indexes in MongoDB are B-Trees
Queries, inserts and deletes: O(log(n)) time
Indexes are the single biggest tunable performance factor in MongoDB
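To make the linked-list versus tree comparison concrete, here is a minimal sketch in plain JavaScript. It is not MongoDB's B-tree: a binary search over a sorted array stands in for a tree descent, and the function names and sample data are made up for illustration.

```javascript
// Count comparisons needed to find `target` by walking a list front to back,
// the way a linked list must be searched.
function linearSearchSteps(sortedValues, target) {
  let steps = 0;
  for (const value of sortedValues) {
    steps += 1;
    if (value === target) return steps;
  }
  return steps; // not found: every element was examined
}

// Count comparisons when each step halves the remaining range, which is
// roughly how a balanced search tree narrows down a key.
function binarySearchSteps(sortedValues, target) {
  let low = 0;
  let high = sortedValues.length - 1;
  let steps = 0;
  while (low <= high) {
    const mid = Math.floor((low + high) / 2);
    steps += 1;
    if (sortedValues[mid] === target) return steps;
    if (sortedValues[mid] < target) low = mid + 1;
    else high = mid - 1;
  }
  return steps; // not found
}

// Hypothetical sample data: the integers 1..1024 in order.
const sampleValues = Array.from({ length: 1024 }, (_, i) => i + 1);
console.log("linear steps:", linearSearchSteps(sampleValues, 1000));
console.log("halving steps:", binarySearchSteps(sampleValues, 1000));
```

The linear walk needs one comparison per element up to the target, while the halving search needs at most about log2(n) + 1 comparisons.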
Indexing Basics
• Single biggest tunable performance factor in the DB
  – Index efficiency should be reviewed early
  – Avoid duplicates

// index on author (ascending)
> db.articles.ensureIndex( { author : 1 } )
// index on author (descending)
> db.articles.ensureIndex( { author : -1 } )
// index on arrays of values – multi key index
> db.articles.ensureIndex( { tags : 1 } )
Sub-document indexes
• Index on sub-documents
  – Using dot notation

{ '_id' : ObjectId(..),
  'article_id' : ObjectId(..),
  'section' : 'schema',
  'date' : ISODate(..),
  'daily' : { 'views' : 45, 'comments' : 150 },
  'hours' : { 0 : { 'views' : 10 },
              1 : { 'views' : 2 },
              …
              23 : { 'views' : 14, 'comments' : 10 } } }

> db.interactions.ensureIndex( { "daily.comments" : 1 } )
> db.interactions.find(
    { "daily.comments" : { $gte : 150 } },
    { _id : 0, "daily.comments" : 1 } )
Compound indexes
• Indexes that use multiple values

// compound index on author and tags
> db.articles.ensureIndex( { author : 1, tags : 1 } )
// supports both of these queries
> db.articles.find( { author : 'Joe D', tags : 'MongoDB' } )
// and
> db.articles.find( { author : 'Joe D' } )
// so you don't need this
> db.articles.ensureIndex( { author : 1 } )
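The reason the separate author index is redundant is that a compound index can serve queries on any leading prefix of its keys. The sketch below expresses that rule of thumb in plain JavaScript. It is a simplification for equality queries only, not the actual query planner, and usesIndexPrefix is a hypothetical helper name.

```javascript
// Returns true when the queried fields are exactly the first k keys of the
// index (in any order), i.e. they form a leading prefix of the index.
// indexKeys:   index fields in index order, e.g. ["author", "tags"]
// queryFields: fields the query filters on with equality
function usesIndexPrefix(indexKeys, queryFields) {
  if (queryFields.length === 0 || queryFields.length > indexKeys.length) {
    return false;
  }
  const prefix = indexKeys.slice(0, queryFields.length);
  return queryFields.every((field) => prefix.includes(field));
}

const authorTagsIndex = ["author", "tags"];
console.log(usesIndexPrefix(authorTagsIndex, ["author", "tags"])); // true
console.log(usesIndexPrefix(authorTagsIndex, ["author"]));         // true
console.log(usesIndexPrefix(authorTagsIndex, ["tags"]));           // false
```

A query on tags alone is not a prefix of { author : 1, tags : 1 }, so it would still need its own index.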
Sort order
• Sort order doesn't matter on single-field indexes
  – We can read from either side of the btree
  – { attribute : 1 } or { attribute : -1 }
• Sort order matters on compound indexes
  – We'll want to query on author and sort by date in the application

// index on author ascending but date descending
> db.articles.ensureIndex( { 'author' : 1, 'date' : -1 } )
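A compound index can be walked forwards or backwards, so it supports a sort whose directions either all match the index or are all inverted. The following plain JavaScript sketch checks that condition. It is an illustrative simplification (it ignores the case where leading fields are fixed by equality predicates), and indexSupportsSort is a hypothetical helper, not a MongoDB function.

```javascript
// indexSpec and sortSpec are key pattern objects such as { author: 1, date: -1 }.
function indexSupportsSort(indexSpec, sortSpec) {
  const indexEntries = Object.entries(indexSpec);
  const sortEntries = Object.entries(sortSpec);
  if (sortEntries.length === 0 || sortEntries.length > indexEntries.length) {
    return false;
  }
  // The sort fields must be a leading prefix of the index, in the same order.
  const sameFields = sortEntries.every(
    ([field], i) => indexEntries[i][0] === field
  );
  if (!sameFields) return false;
  // Forward scan: every direction matches the index.
  const forward = sortEntries.every(([, dir], i) => indexEntries[i][1] === dir);
  // Backward scan: every direction is the inverse of the index.
  const backward = sortEntries.every(([, dir], i) => indexEntries[i][1] === -dir);
  return forward || backward;
}

const authorDateIndex = { author: 1, date: -1 };
console.log(indexSupportsSort(authorDateIndex, { author: 1, date: -1 })); // true
console.log(indexSupportsSort(authorDateIndex, { author: -1, date: 1 })); // true
console.log(indexSupportsSort(authorDateIndex, { author: 1, date: 1 }));  // false
```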
Covered or Index only Queries
• Returns data from the index
  – Rather than the database files
  – Performance optimization
  – Works with compound indexes
• Invoke with a projection

> db.users.ensureIndex( { user : 1, password : 1 } )
> db.users.find( { user : "joe" }, { _id : 0, password : 1 } )

Tip: use projections anyway to reduce data sent back to the client
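Whether a query can be answered from the index alone comes down to a simple check: every field in the filter and every returned field must be part of the index, and _id has to be excluded unless it is indexed too. The plain JavaScript sketch below encodes that check for flat, single-valued fields. It is an approximation for illustration, and isCoveredQuery is a hypothetical helper.

```javascript
// indexSpec:  key pattern of the index, e.g. { user: 1, password: 1 }
// query:      filter document, e.g. { user: "joe" }
// projection: inclusion projection, e.g. { _id: 0, password: 1 }
function isCoveredQuery(indexSpec, query, projection) {
  const indexed = new Set(Object.keys(indexSpec));
  const queryCovered = Object.keys(query).every((field) => indexed.has(field));
  // _id is returned by default, so it must be switched off unless indexed.
  const idExcluded = projection._id === 0 || indexed.has("_id");
  const returned = Object.keys(projection).filter(
    (field) => projection[field] === 1
  );
  const projectionCovered =
    returned.length > 0 && returned.every((field) => indexed.has(field));
  return queryCovered && idExcluded && projectionCovered;
}

const userPasswordIndex = { user: 1, password: 1 };
console.log(isCoveredQuery(userPasswordIndex, { user: "joe" }, { _id: 0, password: 1 })); // true
console.log(isCoveredQuery(userPasswordIndex, { user: "joe" }, { password: 1 }));         // false
```

The second call is not covered because _id would have to be fetched from the document itself.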
Options
• Uniqueness constraints (unique, dropDups)
• Sparse Indexes

// index on author must be unique
> db.articles.ensureIndex( { 'author' : 1 }, { unique : true } )
// allow multiple documents to not have likes field
> db.articles.ensureIndex( { 'author' : 1, 'likes' : 1 }, { sparse : true } )

* Without the sparse option, missing fields are stored as null(s) in the index
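The difference between a regular and a sparse index can be pictured as which documents get an index entry. The plain JavaScript sketch below models a single-field index as an array of entries. It is a toy model under that assumption, not how the server stores indexes, and buildIndexEntries and the sample documents are hypothetical.

```javascript
// Build the list of index entries for `field`.
// Regular index: every document gets an entry; a missing field is keyed as null.
// Sparse index:  documents without the field get no entry at all.
function buildIndexEntries(docs, field, options = {}) {
  const sparse = options.sparse === true;
  return docs
    .filter((doc) => !sparse || field in doc)
    .map((doc) => ({ key: field in doc ? doc[field] : null, id: doc._id }));
}

// Hypothetical sample documents; the second one has no likes field.
const sampleArticles = [
  { _id: 1, author: "Joe D", likes: 5 },
  { _id: 2, author: "Ann K" },
  { _id: 3, author: "Sam T", likes: 0 },
];

console.log(buildIndexEntries(sampleArticles, "likes").length);                   // 3
console.log(buildIndexEntries(sampleArticles, "likes", { sparse: true }).length); // 2
```

With a unique constraint on top, the regular index would allow only one document without the field (one null key), which is why unique is often combined with sparse.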
Background Index Builds
• Index creation is a blocking operation that can take a long time
• Background creation yields to other operations
• Build more than one index in background concurrently
• Restart secondaries in standalone mode to build indexes

// To build in the background
> db.articles.ensureIndex(
    { 'author' : 1, 'date' : -1 },
    { background : true }
  )
Explain plan
• Use to evaluate operations and indexes
  – Which indexes have been used, if any
  – How many documents / objects have been scanned
  – View via the console or via code

// To view via the console
> db.articles.find( { author : 'Joe D' } ).explain()
Explain plan output (no index)
{
  "cursor" : "BasicCursor",
  "isMultiKey" : false,
  "n" : 12,
  "nscannedObjects" : 25820,
  "nscanned" : 25820,
  …
  "indexOnly" : false,
  …
  "millis" : 27,
  …
}
Other Types:
• BasicCursor (full collection scan)
• BtreeCursor
• GeoSearchCursor
• Complex Plan
• TextCursor
Explain plan output
{
  "cursor" : "BtreeCursor author_1_date_-1",
  "isMultiKey" : false,
  "n" : 12,
  "nscannedObjects" : 12,
  "nscanned" : 12,
  …
  "indexOnly" : false,
  …
  "millis" : 0,
  …
}
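A quick way to read these two outputs is to compare how many index entries or documents were scanned (nscanned) with how many were returned (n). The plain JavaScript sketch below does that for explain documents in the format shown on the slides. summarizeExplain is a hypothetical helper, and it only understands the cursor, n and nscanned fields.

```javascript
// Summarize an explain() document of the shape shown on the slides.
function summarizeExplain(explainOutput) {
  const { cursor, n, nscanned } = explainOutput;
  return {
    // BasicCursor means a full collection scan; anything else used an index.
    usesIndex: !cursor.startsWith("BasicCursor"),
    // Entries scanned per document returned; 1 is ideal. null if nothing matched.
    scannedPerReturned: n === 0 ? null : nscanned / n,
  };
}

// Values copied from the two explain outputs on the slides.
const withoutIndex = { cursor: "BasicCursor", n: 12, nscanned: 25820 };
const withIndex = { cursor: "BtreeCursor author_1_date_-1", n: 12, nscanned: 12 };

console.log(summarizeExplain(withoutIndex));
console.log(summarizeExplain(withIndex));
```

Without the index the server scans more than two thousand documents for each one it returns. With the index the ratio drops to 1.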
Database profiler
• Enable to see slow queries
  – (or all queries)
  – Default 100ms

// Enable database profiler on the console, 0=off 1=slow 2=all
> db.setProfilingLevel(1, 100)
{ "was" : 0, "slowms" : 100, "ok" : 1 }
// View profile with
> show profile
// or
> db.system.profile.find().pretty()
The Query Optimizer
• For each "type" of query, MongoDB periodically tries all useful indexes
• Aborts the rest as soon as one plan wins
• The winning plan is temporarily cached for each “type” of query (used for next 1,000 times)
• MongoDB 2.6 can use the intersection of multiple indexes to fulfill queries
Other Index Types
• Geospatial Indexes (2dsphere)
• Text Indexes
• TTL Collections (expireAfterSeconds)
• Hashed Indexes for sharding
Geo Spatial Indexes
2dSphere
• Indexes on geospatial fields
  – Using GeoJSON objects
  – Geometries on spheres

// GeoJSON object structure for indexing
// (coordinates are [ longitude, latitude ])
{ name : 'MongoDB Palo Alto',
  location : { type : "Point",
               coordinates : [ -122.158574, 37.449157 ] } }

// Index on GeoJSON objects
> db.articles.ensureIndex( { location : "2dsphere" } )

Supported GeoJSON objects:
• Point
• LineString
• Polygon
• MultiPoint
• MultiLineString
• MultiPolygon
• GeometryCollection
Extended Articles document
• Store the location the article was posted from
• Geo location from browser

Articles collection
> db.articles.insert({
    'text' : 'Article content…',
    'date' : ISODate(...),
    'title' : 'Intro to MongoDB',
    'author' : 'Joe D',
    'tags' : ['mongodb', 'database', 'nosql'],
    'location' : { 'type' : 'Point',
                   'coordinates' : [ -122.158, 37.449 ] }
  });

// JavaScript function to get geolocation in the browser
// (onPosition is a placeholder for your own success callback)
navigator.geolocation.getCurrentPosition(onPosition);
// You will need to translate the result into GeoJSON
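The translation step is small but easy to get wrong, because the browser reports latitude and longitude as separate properties while GeoJSON expects the order [ longitude, latitude ]. A minimal sketch follows, with toGeoJSONPoint as a hypothetical helper and a hand-written object standing in for the position the browser would pass to the callback.

```javascript
// Convert a browser Geolocation position into a GeoJSON Point.
// `position` has the shape passed to the getCurrentPosition success callback:
// { coords: { latitude, longitude, ... } }
function toGeoJSONPoint(position) {
  const { longitude, latitude } = position.coords;
  // GeoJSON order is longitude first, then latitude.
  return { type: "Point", coordinates: [longitude, latitude] };
}

// Stand-in for a real browser position (values taken from the slides).
const samplePosition = { coords: { latitude: 37.449, longitude: -122.158 } };
console.log(JSON.stringify(toGeoJSONPoint(samplePosition)));
// prints {"type":"Point","coordinates":[-122.158,37.449]}
```

In a browser this would typically be wired up as navigator.geolocation.getCurrentPosition((position) => save(toGeoJSONPoint(position))), where save is whatever function sends the article to the server.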
Example
– Query for locations 'near' a particular coordinate

> db.articles.find( {
    location : { $near : { $geometry : { type : "Point",
                                         coordinates : [ -122.158, 37.449 ] } },
                 $maxDistance : 5000 }
  } )
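With GeoJSON points, $maxDistance is expressed in meters and results come back nearest first. The plain JavaScript sketch below mimics that behaviour in memory using the haversine formula, to show what the query computes conceptually. It is not how the server evaluates the query (which uses the 2dsphere index), the Earth radius is an approximation, and the helper names and sample documents are hypothetical. Coordinates are in GeoJSON order, [ longitude, latitude ].

```javascript
const EARTH_RADIUS_METERS = 6371000; // mean Earth radius, an approximation

// Great-circle distance in meters between two [longitude, latitude] pairs.
function haversineMeters([lon1, lat1], [lon2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt(a));
}

// Keep documents within maxDistanceMeters of center, nearest first.
function withinMaxDistance(docs, center, maxDistanceMeters) {
  return docs
    .map((doc) => ({
      doc,
      distance: haversineMeters(center, doc.location.coordinates),
    }))
    .filter(({ distance }) => distance <= maxDistanceMeters)
    .sort((a, b) => a.distance - b.distance)
    .map(({ doc }) => doc);
}

// Hypothetical sample documents.
const sampleLocations = [
  { title: "San Francisco", location: { type: "Point", coordinates: [-122.4194, 37.7749] } },
  { title: "Nearby", location: { type: "Point", coordinates: [-122.15, 37.452] } },
  { title: "Palo Alto", location: { type: "Point", coordinates: [-122.158, 37.449] } },
];

const nearby = withinMaxDistance(sampleLocations, [-122.158, 37.449], 5000);
console.log(nearby.map((doc) => doc.title));
```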
Text Search
Text Indexes
• Use text indexes to support text search of string content in documents of a collection.
• Text indexes can include any field whose value is a string or an array of string elements.
• To perform queries that access the text index, use the $text query operator.
Text Search
• Only one text index per collection
• $** operator to index all text fields in the collection
• Use weights to change importance of fields

> db.articles.ensureIndex( { title : "text", content : "text" } )
> db.articles.ensureIndex( { "$**" : "text" }, { name : "MyTextIndex" } )
> db.articles.ensureIndex( { "$**" : "text" },
    { weights : { "title" : 10, "content" : 5 }, name : "MyTextIndex" } )

Operators: $text, $search, $language, $meta
Search
• Use the $text and $search operators to query
• Now returns a cursor
• $meta for scoring results

// Search articles collection
> db.articles.find( { $text : { $search : "MongoDB" } } )
> db.articles.find( { $text : { $search : "MongoDB" } },
    { score : { $meta : "textScore" }, _id : 0, title : 1 } )
{ "title" : "Intro to MongoDB", "score" : 0.75 }
Scaling
Working Set Exceeds Physical Memory
When to consider Scaling?
• When a specific resource becomes a bottleneck on a machine or replica set
  – RAM
  – Disk IO
  – Storage
  – Concurrency
Vertical Scalability (Scale Up)
Horizontal Scalability (Scale Out)
Sharding
• User defines shard key
• Shard key defines range of data
• Data is partitioned into shards according to shard key
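The three bullets above can be pictured as a lookup from a shard key value to the range that contains it. The plain JavaScript sketch below shows range-based routing only. The shard names and range boundaries are invented for illustration, and real routing, chunk splitting and balancing are handled by the cluster rather than by application code like this.

```javascript
// Hypothetical ranges over a string shard key (for example, author name).
// Each range is [min, max): min inclusive, max exclusive.
const sampleRanges = [
  { shard: "shard0", min: "", max: "h" },
  { shard: "shard1", min: "h", max: "p" },
  { shard: "shard2", min: "p", max: "\uffff" },
];

// Return the name of the shard whose range contains the shard key value,
// or null if no range matches.
function routeToShard(ranges, shardKeyValue) {
  const match = ranges.find(
    (range) => shardKeyValue >= range.min && shardKeyValue < range.max
  );
  return match ? match.shard : null;
}

console.log(routeToShard(sampleRanges, "alice")); // shard0
console.log(routeToShard(sampleRanges, "joe"));   // shard1
console.log(routeToShard(sampleRanges, "zed"));   // shard2
```

A query that includes the shard key can be sent to one shard this way, while a query without it has to be sent to every shard.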
Scalability
Auto-Sharding
• Increase capacity as you go
• Commodity and cloud architectures
• Improved operational simplicity and cost visibility
Thank You