MongoDB Queuing & Monitoring

MongoDB - Monitoring and queueing

DESCRIPTION

Building a queueing system in MongoDB and monitoring your cluster. Presentation by David Mytton at MongoSF May 2011 and MongoDB London User Group July 2011.

Page 1: MongoDB - Monitoring and queueing

MongoDB Queuing & Monitoring

Page 6: MongoDB - Monitoring and queueing

www.flickr.com/photos/triplexpresso/496995086/

Queuing

☺ Redundancy

☺ Atomicity

☃ Speed

☹ GC

Page 8: MongoDB - Monitoring and queueing

www.flickr.com/photos/triplexpresso/496995086/

Queuing

☺ Redundancy

☺ Known

Page 9: MongoDB - Monitoring and queueing

It’s a little different, but not entirely new.

Page 10: MongoDB - Monitoring and queueing

www.flickr.com/photos/comedynose/4388430444/

Keep it in RAM. Obviously.

Page 11: MongoDB - Monitoring and queueing

http://www.flickr.com/photos/comedynose/4388430444/

How do you know?

> db.stats()
{
    "collections" : 3,
    "objects" : 379970142,
    "avgObjSize" : 146.4554114991488,
    "dataSize" : 55648683504,
    "storageSize" : 61795435008,
    "numExtents" : 64,
    "indexes" : 1,
    "indexSize" : 21354514128,
    "fileSize" : 100816388096,
    "ok" : 1
}

dataSize: 51GB

indexSize: 19GB
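The arithmetic behind the two callouts can be sketched in a few lines. This is a minimal illustration that runs under Node.js rather than the mongo shell: the `stats` values are copied from the output above, while `ramBytes` (32 GiB) and the `memoryFit` helper are assumptions made up for the example, and the check ignores memory used by the operating system and other processes.

```javascript
// Sizes in bytes, copied from the db.stats() output on the slide.
const stats = {
  dataSize: 55648683504,
  indexSize: 21354514128,
};

const GIB = 1024 ** 3;

// Hypothetical: the talk does not say how much RAM the server has.
const ramBytes = 32 * GIB;

// Convert a byte count to GiB.
function toGiB(bytes) {
  return bytes / GIB;
}

// Rough check of what fits in memory: indexes first, then indexes plus data.
function memoryFit(stats, ramBytes) {
  return {
    dataGiB: toGiB(stats.dataSize),
    indexGiB: toGiB(stats.indexSize),
    indexesFit: stats.indexSize <= ramBytes,
    everythingFits: stats.indexSize + stats.dataSize <= ramBytes,
  };
}

const fit = memoryFit(stats, ramBytes);
console.log(fit.dataGiB.toFixed(1), fit.indexGiB.toFixed(1));
console.log(fit.indexesFit, fit.everythingFits);
```

With the assumed 32 GiB of RAM the indexes fit but the full data set does not.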

Page 12: MongoDB - Monitoring and queueing

http://www.flickr.com/photos/comedynose/4388430444/

Where should it go?

What?      Should it be in memory?
Indexes    Always
Data       If you can

Page 13: MongoDB - Monitoring and queueing

How you’ll know

1) Slow queries

Thu Oct 14 17:01:11 [conn7410] update sd.apiLog query: { c: "android/setDeviceToken", a: 1466, u: "blah", ua: "Server Density Android" } 51926ms

www.flickr.com/photos/tonivc/2283676770/
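Slow operations like the one logged above can be picked out of a log file automatically. The sketch below assumes every slow-operation line has the same shape as the single example on the slide (connection in brackets, operation, namespace, trailing duration in ms); other MongoDB versions format these lines differently. `parseSlowOp` and the 100 ms threshold are hypothetical names and values chosen for illustration.

```javascript
// The log line from the slide, verbatim.
const line =
  'Thu Oct 14 17:01:11 [conn7410] update sd.apiLog query: { c: "android/setDeviceToken", ' +
  'a: 1466, u: "blah", ua: "Server Density Android" } 51926ms';

// Extract connection, operation, namespace and duration from one log line.
// Returns null when the line does not end in "<digits>ms".
function parseSlowOp(line) {
  const m = /\[(\w+)\]\s+(\w+)\s+(\S+)\s.*\s(\d+)ms$/.exec(line);
  if (!m) return null;
  return { conn: m[1], op: m[2], ns: m[3], millis: Number(m[4]) };
}

// Hypothetical threshold for what counts as "slow".
const SLOW_MS = 100;

const parsed = parseSlowOp(line);
const isSlow = parsed !== null && parsed.millis >= SLOW_MS;
console.log(parsed, isSlow);
```

Here the update took 51926 ms, almost 52 seconds.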

Page 14: MongoDB - Monitoring and queueing

How you’ll know

2) Timeouts

cursor timed out (20000 ms)

Page 15: MongoDB - Monitoring and queueing

How you’ll know

3) Disk i/o spikes

www.flickr.com/photos/daddo83/3406962115/

Page 16: MongoDB - Monitoring and queueing

Watch your storage

1) Pre-alloc

Page 17: MongoDB - Monitoring and queueing

Watch your storage

2) Sharding maxSize

Page 18: MongoDB - Monitoring and queueing

Watch your storage

3) Logging

--quiet

db.runCommand("logRotate");

killall -SIGUSR1 mongod

Page 19: MongoDB - Monitoring and queueing

Watch your storage

4) Journaling

david@rs2b ~: ls -alh /mongodbdata/journal/
total 538M
drwxrwxr-x 2 david david   29 Mar 20 16:50 .
drwx------ 4 david david 4.0K Mar 13 09:50 ..
-rw------- 1 david david 538M Mar 20 17:00 j._862
-rw------- 1 david david   88 Mar 20 17:00 lsn

Page 20: MongoDB - Monitoring and queueing

db.serverStatus()

Page 21: MongoDB - Monitoring and queueing

1) Used connections

db.serverStatus()

www.flickr.com/photos/armchaircaver/2061231069/

Page 22: MongoDB - Monitoring and queueing

2) Available connections

db.serverStatus()

Page 23: MongoDB - Monitoring and queueing

Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs1b") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs1d") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs1c") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2b") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2d") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2c") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2a") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2268] checkmaster: rs2b:27018 { setName: "set2", ismaster: false, secondary: true, hosts: [ "rs2b:27018", "rs2d:27018", "rs2c:27018", "rs2a:27018" ], arbiters: [ "rs2arbiter:27018" ], primary: "rs2a:27018", maxBsonObjectSize: 8388608, ok: 1.0 }
MessagingPort say send() errno:9 Bad file descriptor (NONE)
Fri Nov 19 17:24:32 [conn2268] checkmaster: caught exception rs2d:27018 socket exception
Fri Nov 19 17:24:32 [conn2268] MessagingPort say send() errno:9 Bad file descriptor (NONE)
Fri Nov 19 17:24:32 [conn2268] checkmaster: caught exception rs2c:27018 socket exception
Fri Nov 19 17:24:32 [conn2268] MessagingPort say send() errno:9 Bad file descriptor (NONE)
Fri Nov 19 17:24:32 [conn2268] checkmaster: caught exception rs2a:27018 socket exception
Fri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1a") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1b") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1d") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1c") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2b") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2d") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2c") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2a") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2b") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2d") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2c") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2a") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2343] getaddrinfo("rs1b") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2343] getaddrinfo("rs1d") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2343] getaddrinfo("rs1c") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs1b") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs1d") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs1c") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2b") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2d") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2c") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2a") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2d") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2c") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2a") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2343] trying reconnect to rs2d:27018
Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2d") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2343] reconnect rs2d:27018 failed
Fri Nov 19 17:24:34 [conn2343] MessagingPort say send() errno:9 Bad file descriptor (NONE)
Fri Nov 19 17:24:34 [conn2343] trying reconnect to rs2c:27018
Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2c") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2343] reconnect rs2c:27018 failed
Fri Nov 19 17:24:34 [conn2343] MessagingPort say send() errno:9 Bad file descriptor (NONE)
Fri Nov 19 17:24:34 [conn2343] trying reconnect to rs2a:27018
Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2a") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2343] reconnect rs2a:27018 failed
Fri Nov 19 17:24:34 [conn2343] MessagingPort say send() errno:9 Bad file descriptor (NONE)
Fri Nov 19 17:24:35 [conn2343] checkmaster: rs2b:27018 { setName: "set2", ismaster: false, secondary: true, hosts: [ "rs2b:27018", "rs2d:27018", "rs2c:27018", "rs2a:27018" ], arbiters: [ "rs2arbiter:27018" ], primary: "rs2a:27018", maxBsonObjectSize: 8388608, ok: 1.0 }
MessagingPort say send() errno:9 Bad file descriptor (NONE)
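The log above shows a mongos that has run out of file descriptors (errno:24). Tracking used against available connections gives a warning before that point. The sketch below is one possible check, assuming the `connections.current` and `connections.available` fields of `db.serverStatus()`. The sample numbers, the 90% threshold and the helper names are hypothetical, and the code runs on a plain object under Node.js rather than against a live server.

```javascript
// Hypothetical sample shaped like the "connections" section of db.serverStatus().
const status = { connections: { current: 780, available: 39 } };

// Fraction of the connection limit currently in use.
function connectionUsage(status) {
  const { current, available } = status.connections;
  const total = current + available;
  return {
    current,
    available,
    usedFraction: total === 0 ? 0 : current / total,
  };
}

// True when usage is at or above the given fraction (0.9 means 90%).
function needsAttention(status, maxUsedFraction) {
  return connectionUsage(status).usedFraction >= maxUsedFraction;
}

const usage = connectionUsage(status);
console.log(usage.usedFraction.toFixed(3), needsAttention(status, 0.9));
```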

Page 24: MongoDB - Monitoring and queueing

connPoolStats

> db.runCommand("connPoolStats")
{
    "hosts" : {
        "config1:27019" : {
            "available" : 2,
            "created" : 6
        },
        "set1/rs1a:27018,rs1b:27018" : {
            "available" : 1,
            "created" : 249
        },
        ...
    },
    "totalAvailable" : 5,
    "totalCreated" : 1002,
    "numDBClientConnection" : 3490,
    "numAScopedConnection" : 3,
}

Page 25: MongoDB - Monitoring and queueing

3) Index counters

db.serverStatus()

"indexCounters" : {
    "btree" : {
        "accesses" : 15180175,
        "hits" : 15178725,
        "misses" : 1450,
        "resets" : 0,
        "missRatio" : 0.00009551932
    }
},

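The `missRatio` in the index counters output is simply misses divided by accesses, which is easy to recompute when collecting these figures yourself. A small sketch, using the numbers from the slide on a plain object under Node.js; the `missRatio` helper is a name chosen for this example.

```javascript
// Values copied from the indexCounters output on the slide.
const indexCounters = {
  btree: {
    accesses: 15180175,
    hits: 15178725,
    misses: 1450,
    resets: 0,
    missRatio: 0.00009551932,
  },
};

// Share of index accesses that were not served from memory.
function missRatio(btree) {
  return btree.accesses === 0 ? 0 : btree.misses / btree.accesses;
}

const ratio = missRatio(indexCounters.btree);
console.log(ratio.toFixed(11));
```

A ratio that climbs over time suggests the indexes no longer fit in RAM.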
Page 26: MongoDB - Monitoring and queueing

4) Op counters

db.serverStatus()

www.flickr.com/photos/cosmic_bandita/2395369614/
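The op counters in `db.serverStatus()` are cumulative, so a rate needs two samples and a subtraction. The sketch below assumes the usual `opcounters` field names (`insert`, `query`, `update`, `delete`, `getmore`, `command`). The sample values and the `takenAt` timestamp wrapper are hypothetical, added only so the example runs on its own under Node.js.

```javascript
// Two hypothetical samples taken 10 seconds apart. "takenAt" is in
// milliseconds and is not a serverStatus field; it is added by the collector.
const before = {
  takenAt: 0,
  opcounters: { insert: 1000, query: 5000, update: 2000, delete: 100, getmore: 50, command: 700 },
};
const after = {
  takenAt: 10000,
  opcounters: { insert: 1250, query: 5600, update: 2000, delete: 100, getmore: 80, command: 900 },
};

// Per-second rate for each counter. A negative or missing delta
// (for example after a server restart) is reported as null.
function opsPerSecond(before, after) {
  const seconds = (after.takenAt - before.takenAt) / 1000;
  if (seconds <= 0) {
    throw new RangeError("samples must be taken in chronological order");
  }
  const rates = {};
  for (const op of Object.keys(after.opcounters)) {
    const delta = after.opcounters[op] - before.opcounters[op];
    rates[op] = delta >= 0 ? delta / seconds : null;
  }
  return rates;
}

const rates = opsPerSecond(before, after);
console.log(rates);
```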

Page 27: MongoDB - Monitoring and queueing

5) Background flushing

db.serverStatus()

Picture is unrelated! Mmm, ice cream.

Page 28: MongoDB - Monitoring and queueing

6) Dur

db.serverStatus()

Page 29: MongoDB - Monitoring and queueing

rs.status()

www.ex-astris-scientia.org/inconsistencies/ent_vs_tng.htm (yes it’s a replicator from Star Trek)

{
    "_id" : 1,
    "name" : "rs3b:27018",
    "health" : 1,
    "state" : 2,
    "stateStr" : "SECONDARY",
    "uptime" : 1886098,
    "optime" : {
        "t" : 1291252178000,
        "i" : 13
    },
    "optimeDate" : ISODate("2010-12-02T01:09:38Z"),
    "lastHeartbeat" : ISODate("2010-12-02T01:09:38Z")
},

Page 30: MongoDB - Monitoring and queueing

1) myState

rs.status()

Value  Meaning
0      Starting up (phase 1)
1      Primary
2      Secondary
3      Recovering
4      Fatal error
5      Starting up (phase 2)
6      Unknown state
7      Arbiter
8      Down

en.wikipedia.org/wiki/State_of_matter
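For alerting, the state table above can be turned into a lookup. The mapping below is copied from the slide. Treating primary, secondary and arbiter as the only healthy states is an assumption made for this sketch, and `describeState` and `isHealthy` are hypothetical helper names.

```javascript
// State codes and meanings as listed on the slide.
const REPLICA_STATES = {
  0: "Starting up (phase 1)",
  1: "Primary",
  2: "Secondary",
  3: "Recovering",
  4: "Fatal error",
  5: "Starting up (phase 2)",
  6: "Unknown state",
  7: "Arbiter",
  8: "Down",
};

// Human-readable label for a myState value.
function describeState(myState) {
  return Object.prototype.hasOwnProperty.call(REPLICA_STATES, myState)
    ? REPLICA_STATES[myState]
    : "Unrecognised state " + myState;
}

// Assumption: only primary, secondary and arbiter count as healthy.
function isHealthy(myState) {
  return myState === 1 || myState === 2 || myState === 7;
}

console.log(describeState(2), isHealthy(2));
console.log(describeState(8), isHealthy(8));
```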

Page 31: MongoDB - Monitoring and queueing

2) Optime

rs.status()

www.flickr.com/photos/robbie73/4244846566/

"optimeDate" : ISODate("2010-12-02T01:09:38Z")

Page 32: MongoDB - Monitoring and queueing

3) Heartbeat

rs.status()

www.flickr.com/photos/drawblindfaith/3400981091/

"lastHeartbeat" : ISODate("2010-12-02T01:09:38Z")

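Both `optimeDate` and `lastHeartbeat` become useful once they are turned into ages. A minimal sketch, run under Node.js with plain `Date` objects in place of the shell's `ISODate`: the member values come from the `rs.status()` output shown earlier, while the primary's optime and the current time are hypothetical values chosen for illustration.

```javascript
// Member values copied from the rs.status() output in the slides.
const member = {
  name: "rs3b:27018",
  optimeDate: new Date("2010-12-02T01:09:38Z"),
  lastHeartbeat: new Date("2010-12-02T01:09:38Z"),
};

// Hypothetical: the slides do not show the primary's optime or the clock time.
const primaryOptimeDate = new Date("2010-12-02T01:09:44Z");
const now = new Date("2010-12-02T01:09:45Z");

// Difference between two Date objects in seconds.
function secondsBetween(later, earlier) {
  return (later.getTime() - earlier.getTime()) / 1000;
}

// How far this member's last applied operation trails the primary's.
const replicationLagSeconds = secondsBetween(primaryOptimeDate, member.optimeDate);

// How long ago this member was last heard from.
const heartbeatAgeSeconds = secondsBetween(now, member.lastHeartbeat);

console.log(replicationLagSeconds, heartbeatAgeSeconds);
```

What counts as too much lag or too old a heartbeat depends on the deployment, so the thresholds are left to the reader.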
Page 33: MongoDB - Monitoring and queueing

mongostat

Page 34: MongoDB - Monitoring and queueing

1) faults

mongostat

Picture is unrelated! Snowmobile in Norway.

Page 35: MongoDB - Monitoring and queueing

2) locked

mongostat

www.flickr.com/photos/bbusschots/4541573665/

Page 36: MongoDB - Monitoring and queueing

3) index miss

mongostat

www.flickr.com/photos/gareandkitty/276471187/

Page 37: MongoDB - Monitoring and queueing

4) queues

mongostat

Page 38: MongoDB - Monitoring and queueing

5) Diagnostics

mongostat

Page 39: MongoDB - Monitoring and queueing

Current operations

www.flickr.com/photos/jeffhester/2784666811/

db.currentOp();
{
    "opid" : "shard1:299939199",
    "active" : true,
    "lockType" : "write",
    "waitingForLock" : false,
    "secs_running" : 15419,
    "op" : "remove",
    "ns" : "sd.metrics",
    "query" : {
        "accId" : 1391,
        "tA" : {
            "$lte" : ISODate("2010-11-24T19:53:00Z")
        }
    },
    "client" : "10.121.12.228:44426",
    "desc" : "conn"
},
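The operation above has been running for 15419 seconds, a little over four hours. Finding such operations can be sketched as a filter on `secs_running`. The example runs on a plain object under Node.js: the first entry is taken from the slide, the second entry and the 60-second threshold are hypothetical, and the `inprog` wrapper reflects how the shell returns the list of operations.

```javascript
// Shaped like the result of db.currentOp(): operations live in "inprog".
const currentOp = {
  inprog: [
    // From the slide.
    { opid: "shard1:299939199", active: true, secs_running: 15419, op: "remove", ns: "sd.metrics" },
    // Hypothetical short-lived query, added for contrast.
    { opid: "shard1:299939200", active: true, secs_running: 2, op: "query", ns: "sd.metrics" },
  ],
};

// Active operations that have been running for at least minSeconds.
function longRunning(currentOp, minSeconds) {
  return currentOp.inprog.filter(
    (op) => op.active && op.secs_running >= minSeconds
  );
}

const slowOps = longRunning(currentOp, 60);
for (const op of slowOps) {
  const hours = (op.secs_running / 3600).toFixed(1);
  console.log(op.opid, op.op, op.ns, hours + "h");
}
```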

Page 40: MongoDB - Monitoring and queueing

Monitoring tools

Server Density

Page 44: MongoDB - Monitoring and queueing

Monitoring tools

www.mongomonitor.com

Page 46: MongoDB - Monitoring and queueing

plugins.serverdensity.com

App store for sysadmins

Page 50: MongoDB - Monitoring and queueing

Keep it in RAM

Watch your storage

db.serverStatus()

rs.status()

Recap

Page 51: MongoDB - Monitoring and queueing

David Mytton

@davidmytton

Woop Japan!

www.mongomonitor.com