2.4 Sharding Features, by James Kerr, Senior Solutions Architect, 10gen

Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

DESCRIPTION

In version 2.4, MongoDB introduces hash-based sharding, which lets you shard on a hashed shard key so that documents are spread evenly across a cluster. Hash-based sharding is an alternative to range-based sharding and can make a growing cluster easier to manage. In this talk, we'll provide an overview of this new feature and discuss the pros and cons of hash-based sharding compared with the range-based approach.

Page 1: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Senior Solutions Architect, 10gen

James Kerr

2.4 Sharding Features

Page 2: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Agenda

• Mechanics of sharding
  – Key space
  – Chunks
  – Balancing

• Types of requests

• Hashed shard keys
  – Why use hashed shard keys
  – How to enable hashed shard keys
  – Limitations

Page 3: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Sharded Cluster

Page 4: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Sharding your data

Page 5: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

What is a Shard Key

• Shard key is used to partition your collection

• Shard key must exist in every document

• Shard key is immutable

• Shard key values are immutable

• Shard key must be indexed

• Shard key is used to route requests to shards

Page 6: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

The key space

{x: 10} {x: -5} {x: -9} {x: 7} {x: 6} {x: 0}

Page 7: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Inserting data

{x: 0}{x: 6}{x: 7}{x: -5}{x: 10} {x: -9}

Page 8: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Inserting data

{x: 0} {x: 6}{x: 7}{x: -5} {x: 10}{x: -9}

Page 9: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Chunk range and size

{x: 0} {x: 6}{x: 7}{x: -5} {x: 10}{x: -9}

Page 10: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Inserting further data

{x: 0} {x: 6}{x: 7}{x: -5} {x: 10}{x: -9}

{x: 9}{x: -7} {x: 3}

Page 11: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Chunk splitting

{x: 0} {x: 6}{x: 7}{x: -5} {x: 10}{x: -9}

0 0

• A chunk is split once it exceeds the maximum size

• There is no split point if all documents have the same shard key

• Chunk split is a logical operation (no data is moved)

• If a split creates too large a discrepancy in chunk count across the cluster, a balancing round starts
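
The splitting rules above can be sketched in a few lines of JavaScript. This is a simplified model and not the server's algorithm: it measures chunk size in documents rather than bytes, and the names `findSplitPoint` and `maxDocs` are made up for the example.

```javascript
// Simplified model: a chunk is a sorted array of shard key values.
// Assumption: size is counted in documents, not bytes as on a real cluster.
function findSplitPoint(sortedKeys, maxDocs) {
  if (sortedKeys.length <= maxDocs) return null; // chunk is not over the limit
  const first = sortedKeys[0];
  const last = sortedKeys[sortedKeys.length - 1];
  if (first === last) return null; // every document has the same key: no split point
  // Start at the middle and move right until the key differs from the first one,
  // so that both halves are non-empty.
  let i = Math.floor(sortedKeys.length / 2);
  while (sortedKeys[i] === first) i++;
  return sortedKeys[i]; // new boundary: [min, split) and [split, max)
}

console.log(findSplitPoint([-9, -5, 0, 6, 7, 10], 4)); // 6
console.log(findSplitPoint([5, 5, 5, 5, 5], 2));       // null
```

Only the boundary is returned because the split itself is a metadata change; no documents move until the balancer decides to migrate a chunk.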

Page 12: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Data distribution

• MinKey to 0 lives on Shard1

• 0 to MaxKey lives on Shard2

• Mongos routes queries appropriately
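
A minimal sketch of that routing decision, assuming a two-chunk table like the one described above. `-Infinity` and `Infinity` stand in for MinKey and MaxKey, and `routeToShard` is a hypothetical helper, not a mongos API.

```javascript
// Chunk table as a router might cache it. Each range includes its lower
// bound and excludes its upper bound. Shard names follow the slide.
const chunks = [
  { min: -Infinity, max: 0, shard: "Shard1" }, // MinKey to 0
  { min: 0, max: Infinity, shard: "Shard2" },  // 0 to MaxKey
];

// Hypothetical helper: find the shard that owns a given shard key value.
function routeToShard(key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  return chunk.shard;
}

console.log(routeToShard(-1000)); // Shard1
console.log(routeToShard(0));     // Shard2
```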

Page 13: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Mongos routes data

minKey 0 0 maxKey

db.test.insert({ x: -1000 })

Page 14: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Mongos routes data

minKey 0 0 maxKey

db.test.insert({ x: -1000 })

Page 15: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Unbalanced shards

minKey 0 0 maxKey

Page 16: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Balancing

• Migration threshold
  – Number of chunks less than 20: migration threshold of 2
  – 21-80: migration threshold of 4
  – >80: migration threshold of 8
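
The threshold table can be read as a small function. The sketch below assumes the balancer compares the shard with the most chunks against the shard with the fewest; the slide does not say which band exactly 20 chunks belongs to, so the code places it in the middle band as an assumption. Both function names are invented for the example.

```javascript
// Thresholds as listed on the slide. Treating exactly 20 chunks as part
// of the middle band is an assumption of this sketch.
function migrationThreshold(totalChunks) {
  if (totalChunks < 20) return 2;
  if (totalChunks <= 80) return 4;
  return 8;
}

// chunkCounts: number of chunks per shard, e.g. { shard0000: 3, shard0001: 0 }
function needsBalancing(chunkCounts) {
  const counts = Object.values(chunkCounts);
  const total = counts.reduce((a, b) => a + b, 0);
  const spread = Math.max(...counts) - Math.min(...counts);
  return spread >= migrationThreshold(total);
}

console.log(needsBalancing({ shard0000: 3, shard0001: 0 })); // true
console.log(needsBalancing({ shard0000: 2, shard0001: 1 })); // false
```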

Page 17: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Moving the chunk

• One chunk of data is copied from Shard 1 to Shard 2

Page 18: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Committing Migration

• Once everyone agrees the data has moved, that chunk gets deleted from Shard 1.

Page 19: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Cleanup

• Other mongos instances have to find out about the new configuration

Page 20: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Migrations' effect

• Expensive

• Can take a long time

• Competes for limited resources

Page 21: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Picking a shard key

• Cardinality

• Optimize routing

• Minimize (unnecessary) traffic

• Allow best scaling

Page 22: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

What about Object Id?

ObjectId("51597ca8e28587b86528edfd")

• Used for _id

• 12 byte value

• Generated by the driver if not specified

• Theoretically globally unique

Page 23: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

What about Object Id?

ObjectId("51597ca8e28587b86528edfd")

12 bytes: Timestamp, MAC, PID, Counter
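
To see why this layout matters for sharding, the ObjectId from the slide can be taken apart by hand. The sketch assumes the field widths of this ObjectId format (4-byte timestamp, 3-byte machine identifier, 2-byte process id, 3-byte counter); `parseObjectId` is a hypothetical helper, not a driver function.

```javascript
// Split a 24-character ObjectId hex string into its four fields.
// Assumed widths: 4-byte timestamp, 3-byte machine id, 2-byte pid, 3-byte counter.
function parseObjectId(hex) {
  return {
    timestamp: parseInt(hex.slice(0, 8), 16), // seconds since the Unix epoch
    machine: hex.slice(8, 14),
    pid: parseInt(hex.slice(14, 18), 16),
    counter: parseInt(hex.slice(18, 24), 16),
  };
}

const parts = parseObjectId("51597ca8e28587b86528edfd");
console.log(parts.timestamp); // 1364819112
console.log(new Date(parts.timestamp * 1000).toISOString());
```

Because the timestamp occupies the leading bytes, newly generated ObjectIds sort after older ones, so a range-based shard key on `_id` sends every new insert to the chunk that ends at MaxKey.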

Page 24: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

// enabling sharding on test database
mongos> sh.enableSharding("test")
{ "ok" : 1 }

// sharding the test collection
mongos> sh.shardCollection("test.test", {_id:1})
{ "collectionsharded" : "test.test", "ok" : 1 }

// create a loop inserting data
mongos> for (x=0; x<10000; x++) {
... db.test.insert({value:x})
... }

Sharding on ObjectId

Page 25: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

shards:
{ "_id" : "shard0000", "host" : "localhost:30000" }
{ "_id" : "shard0001", "host" : "localhost:30001" }

databases:
{ "_id" : "test", "partitioned" : true, "primary" : "shard0001" }

test.test
shard key: { "_id" : 1 }
chunks:
shard0001 3
{ "_id" : { "$minKey" : 1 } } -->> { "_id" : ObjectId("...") } on : shard0001 { "t" : 1000, "i" : 1 }
{ "_id" : ObjectId("...") } -->> { "_id" : { "$maxKey" : 1 } } on : shard0001 { "t" : 1000, "i" : 2 }

ObjectId chunk distribution

Page 26: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

ObjectId gives a hot shard

minKey 0 0 maxKey

Page 27: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Sharding on monotonically increasing values, such as timestamps, does not give an even distribution

Page 28: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Hashed Shard Keys

Page 29: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Hashed Shard Keys

{x:2} md5 c81e728d9d4c2f636f067f89cc14862c

{x:3} md5 eccbc87e4b5ce2fe28308fd9f2a7baf3

{x:1} md5 c4ca4238a0b923820dcc509a6f75849b

Page 30: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Hashed shard key eliminates hot shards

minKey 0 0 maxKey

Page 31: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Under the hood

• Create a hashed index used for sharding

• Uses the first 64-bits of md5 hash of field

• Uses existing hash index, or creates a new one on a collection

• Hash both data and BSON type

• Represented as a NumberLong in the JS shell
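
The idea of keeping only the first 64 bits of an md5 digest can be tried outside the server with Node's `crypto` module. This is an illustration only: the server hashes the BSON element (value and type) together with a seed, so the numbers printed here will not match `_hashBSONElement` output, and the little-endian byte order used below is an assumption of the sketch.

```javascript
const crypto = require("crypto");

// md5 of a string, as a 16-byte Buffer.
function md5(text) {
  return crypto.createHash("md5").update(text).digest();
}

// Illustration only: read the first 8 bytes of the digest as a signed
// 64-bit integer. The byte order is an assumption, not the server's rule.
function first64Bits(text) {
  return md5(text).readBigInt64LE(0);
}

console.log(md5("1").toString("hex")); // c4ca4238a0b923820dcc509a6f75849b
console.log(first64Bits("1"));         // a signed 64-bit value as a BigInt
```

The hex digests for "1", "2" and "3" are the same strings shown on the md5 slide, which suggests that slide hashed the plain string form of each value.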

Page 32: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

// hash on 1 as an integer
> db.runCommand({_hashBSONElement:1})
{
  "key" : 1,
  "seed" : 0,
  "out" : NumberLong("5902408780260971510"),
  "ok" : 1
}

// hash on "1" as a string
> db.runCommand({_hashBSONElement:"1"})
{
  "key" : "1",
  "seed" : 0,
  "out" : NumberLong("-2448670538483119681"),
  "ok" : 1
}

Hash on simple or embedded BSON values

Page 33: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Enabling hashed indexes

• Create index
  – db.collection.ensureIndex( {field : "hashed"} )

• Options
  – seed: specify a different seed to use
  – hashVersion: at the moment only version 0 (md5)

Page 34: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Using hash shard keys

• Enable sharding on collection
  – sh.shardCollection("test.collection", {field: "hashed"})

• Options
  – numInitialChunks: specifies the number of initial chunks per shard. Default is two chunks per shard (use "sh._adminCommand" to specify options)

Page 35: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

// enabling sharding on test database
mongos> sh.enableSharding("test")
{ "ok" : 1 }

// shard by hashed _id field
mongos> sh.shardCollection("test.hash", {_id:"hashed"})
{ "collectionsharded" : "test.hash", "ok" : 1 }

Sharding on hashed ObjectId

Page 36: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

databases:
{ "_id" : "test", "partitioned" : true, "primary" : "shard0001" }

test.hash

shard key: { "_id" : "hashed" }

chunks:

shard0000 2

shard0001 2

{ "_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong("-4611686018427387902") } on : shard0000 { "t" : 2000, "i" : 2 }

{ "_id" : NumberLong("-4611686018427387902") } -->> { "_id" : NumberLong(0) } on : shard0000 { "t" : 2000, "i" : 3 }

{ "_id" : NumberLong(0) } -->> { "_id" : NumberLong("4611686018427387902") } on : shard0001 { "t" : 2000, "i" : 4 }

{ "_id" : NumberLong("4611686018427387902") } -->> { "_id" : { "$maxKey" : 1 } } on : shard0001 { "t" : 2000, "i" : 5 }

Pre-splitting the data
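
The boundaries in this listing are consistent with split points spaced evenly over the signed 64-bit range. The sketch below reproduces them for four chunks. The function name and the exact arithmetic are assumptions made for illustration and are not taken from the server source.

```javascript
const MAX_LONG = 9223372036854775807n; // largest signed 64-bit integer

// Hypothetical helper: evenly spaced split points, symmetric around 0.
function initialSplitPoints(numChunks) {
  const interval = (MAX_LONG / BigInt(numChunks)) * 2n;
  const points = [];
  let current = 0n;
  if (numChunks % 2 === 0) {
    points.push(0n);          // even chunk count: 0 is itself a boundary
    current += interval;
  } else {
    current += interval / 2n; // odd chunk count: 0 sits inside the middle chunk
  }
  for (let i = 0; i < Math.floor((numChunks - 1) / 2); i++) {
    points.push(current, -current);
    current += interval;
  }
  return points.sort((a, b) => (a < b ? -1 : a > b ? 1 : 0));
}

console.log(initialSplitPoints(4));
// [ -4611686018427387902n, 0n, 4611686018427387902n ]
```

Three split points give four chunks, two per shard on a two-shard cluster, which matches the chunk counts reported for shard0000 and shard0001.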

Page 37: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

// create a loop inserting data

mongos> for (x=0; x<10000; x++) {
... db.hash.insert({value:x})
... }

Inserting into hashed shard key collection

Page 38: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

test.hash
shard key: { "_id" : "hashed" }
chunks:
shard0000 4
shard0001 4

{ "_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong("-7374407069602479355") } on : shard0000 { "t" : 2000, "i" : 8 }

{ "_id" : NumberLong("-7374407069602479355") } -->> { "_id" : NumberLong("-4611686018427387902") } on : shard0000 { "t" : 2000, "i" : 9 }

{ "_id" : NumberLong("-4611686018427387902") } -->> { "_id" : NumberLong("-2456929743513174890") } on : shard0000 { "t" : 2000, "i" : 6 }

{ "_id" : NumberLong("-2456929743513174890") } -->> { "_id" : NumberLong(0) } on : shard0000 { "t" : 2000, "i" : 7 }

{ "_id" : NumberLong(0) } -->> { "_id" : NumberLong("1483539935376971743") } on : shard0001 { "t" : 2000, "i" : 12 }

Even distribution of chunks

Page 39: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Routing Requests

Page 40: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Cluster Request Routing

• Targeted Queries

• Scatter Gather Queries

• Scatter Gather Queries with Sort

Page 41: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Cluster Request Routing: Targeted Query

Page 42: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Routable request received

Page 43: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Request routed to appropriate shard

Page 44: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Shard returns results

Page 45: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Mongos returns results to client

Page 46: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Cluster Request Routing: Non-Targeted Query

Page 47: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Non-Targeted Request Received

Page 48: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Request sent to all shards

Page 49: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Shards return results to mongos

Page 50: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Mongos returns results to client

Page 51: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Cluster Request Routing: Non-Targeted Query with Sort

Page 52: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Non-Targeted request with sort received

Page 53: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Request sent to all shards

Page 54: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Query and sort performed locally

Page 55: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Shards return results to mongos

Page 56: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Mongos merges sorted results

Page 57: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Mongos returns results to client
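
The merge step in this sequence can be sketched as follows, assuming each shard has already sorted its own results. `mergeSorted` is a hypothetical helper written for this example, and the documents reuse the x values from the key space slides.

```javascript
// Merge already-sorted result lists, one per shard, into one sorted list.
function mergeSorted(shardResults, compare) {
  const cursors = shardResults.map(() => 0); // next unread index per shard
  const merged = [];
  for (;;) {
    let best = -1;
    for (let s = 0; s < shardResults.length; s++) {
      if (cursors[s] >= shardResults[s].length) continue; // shard exhausted
      if (best === -1 ||
          compare(shardResults[s][cursors[s]], shardResults[best][cursors[best]]) < 0) {
        best = s;
      }
    }
    if (best === -1) return merged; // every shard is exhausted
    merged.push(shardResults[best][cursors[best]++]);
  }
}

const fromShard1 = [{ x: -9 }, { x: -5 }, { x: 7 }];
const fromShard2 = [{ x: 0 }, { x: 6 }, { x: 10 }];
const result = mergeSorted([fromShard1, fromShard2], (a, b) => a.x - b.x);
console.log(result.map((d) => d.x)); // [ -9, -5, 0, 6, 7, 10 ]
```

The router only compares the head of each list, so it never has to re-sort the combined result set.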

Page 58: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Hash keys are great for equality queries

• Equality queries directed to a specific shard

• Will use the index

• Most efficient query possible

Page 59: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

mongos> db.hash.find({x:1}).explain()
{
  "cursor" : "BtreeCursor x_hashed",
  "n" : 1,
  "nscanned" : 1,
  "nscannedObjects" : 1,
  "millisShardTotal" : 0,
  "numQueries" : 1,
  "numShards" : 1,
  "indexBounds" : {
    "x" : [
      [
        NumberLong("5902408780260971510"),
        NumberLong("5902408780260971510")
      ]
    ]
  },
  "millis" : 0
}

Explain plan of an equality query

Page 60: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

But not so good for a range query

• Range queries scatter gather

• Won’t use index
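
The difference between the two query shapes can be modelled with a toy router. `toyHash` is a stand-in for the server's md5-based hash (it is not the real function), and `shardsForQuery` and the chunk table are invented for the example.

```javascript
// Toy stand-in for the server's hash (NOT md5): maps an integer to a
// signed 32-bit value.
function toyHash(n) {
  return Math.imul(n, 2654435761);
}

// Two chunks over the toy hash space, one per shard.
const hashedChunks = [
  { min: -(2 ** 31), max: 0, shard: "shard0000" },
  { min: 0, max: 2 ** 31, shard: "shard0001" },
];

// Hypothetical helper: which shards must a query on { x: ... } visit?
function shardsForQuery(query) {
  if (typeof query.x === "number") {
    // Equality: hash the value and target the one chunk that owns it.
    const h = toyHash(query.x);
    return hashedChunks.filter((c) => h >= c.min && h < c.max).map((c) => c.shard);
  }
  // Range operators such as $gt/$lt: hashed order is unrelated to key
  // order, so every shard has to be asked.
  return hashedChunks.map((c) => c.shard);
}

console.log(shardsForQuery({ x: 1 }).length);                    // 1
console.log(shardsForQuery({ x: { $gt: 1, $lt: 99 } }).length);  // 2
```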

Page 61: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

mongos> db.hash.find({x:{$gt:1, $lt:99}}).explain()
{
  "cursor" : "BasicCursor",
  "n" : 97,
  "nChunkSkips" : 0,
  "nYields" : 0,
  "nscanned" : 1000,
  "nscannedAllPlans" : 1000,
  "nscannedObjects" : 1000,
  "nscannedObjectsAllPlans" : 1000,
  "millisShardTotal" : 0,
  "millisShardAvg" : 0,
  "numQueries" : 2,
  "numShards" : 2,
  "millis" : 3
}

Explain plan of a range query

Page 62: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Other limitations

• Cannot use a compound key

• Key cannot have an array value

• Tag-aware sharding
  – Only makes sense to assign the full hashed shard key collection to particular shards
  – By design, there’s no real way to know or control what data is in what range

• Key with poor cardinality is going to give a hash with poor cardinality
  – Floating point numbers are squashed. E.g. 100.4 will be hashed as 100
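
The floating point limitation comes down to a truncation step before hashing. A minimal model of that step alone, with `hashInput` as a made-up name and no actual hashing performed:

```javascript
// Model of the truncation step only: the fractional part is dropped
// before hashing, so distinct floats can end up with the same hash input.
function hashInput(value) {
  return Math.trunc(value);
}

console.log(hashInput(100.4) === hashInput(100));   // true
console.log(hashInput(100.4) === hashInput(100.9)); // true
```

A field holding values such as 100.1, 100.4 and 100.9 therefore behaves like a single key value for distribution purposes.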

Page 63: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Summary

• Range-based Sharding
  – Most efficient for applications that operate on ranges
  – Requires careful shard key selection

• Hash-based Sharding
  – Uniform writes
  – No routed range queries

• Tag Aware Sharding
  – That’s another talk!

Page 65: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

Questions?

Page 66: Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding

James Kerr

Thank You

Senior Solutions Architect, 10gen