MongoDB Health Tips

Monitoring MongoDB (MongoUK)

DESCRIPTION

Presentation by David Mytton about monitoring MongoDB at the MongoUK conference, 21st Mar 2011. A full blog series covering everything in this presentation is at http://blog.boxedice.com/mongodb-monitoring/

Page 1: Monitoring MongoDB (MongoUK)

MongoDB Health Tips

Page 2: Monitoring MongoDB (MongoUK)

It’s a little different, but not entirely new.

Page 3: Monitoring MongoDB (MongoUK)

www.flickr.com/photos/comedynose/4388430444/

Keep it in RAM. Obviously.

Page 4: Monitoring MongoDB (MongoUK)

http://www.flickr.com/photos/comedynose/4388430444/

How do you know?

> db.stats()
{
    "collections" : 3,
    "objects" : 379970142,
    "avgObjSize" : 146.4554114991488,
    "dataSize" : 55648683504,
    "storageSize" : 61795435008,
    "numExtents" : 64,
    "indexes" : 1,
    "indexSize" : 21354514128,
    "fileSize" : 100816388096,
    "ok" : 1
}

dataSize: 51GB

indexSize: 19GB
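
The two sizes called out on this slide can be derived from the db.stats() output. A minimal sketch, assuming the sizes are reported in bytes and that 1 GB here means 2^30 bytes (which matches 51GB and 19GB when rounded down). The helper names and the 64 GB machine size are hypothetical, for illustration only.

```javascript
// Values copied from the db.stats() output on this slide.
const stats = { dataSize: 55648683504, indexSize: 21354514128 };

const GIB = 1024 * 1024 * 1024;

// Convert a byte count to whole gigabytes (2^30 bytes), rounding down.
function toGiB(bytes) {
  return Math.floor(bytes / GIB);
}

// Hypothetical helper: would the indexes, or indexes plus data,
// fit in the given amount of RAM (in bytes)?
function fitsInRam(s, ramBytes) {
  return {
    indexes: s.indexSize <= ramBytes,
    indexesAndData: s.indexSize + s.dataSize <= ramBytes,
  };
}

console.log(toGiB(stats.dataSize), toGiB(stats.indexSize));
console.log(fitsInRam(stats, 64 * GIB)); // 64 GB is an assumed machine size
```

In the mongo shell the same functions could be fed the live result of db.stats() instead of the copied sample.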

Page 5: Monitoring MongoDB (MongoUK)

http://www.flickr.com/photos/comedynose/4388430444/

Where should it go?

What?       Should it be in memory?
Indexes     Always
Data        If you can

Page 6: Monitoring MongoDB (MongoUK)

How you’ll know

1) Slow queries

Thu Oct 14 17:01:11 [conn7410] update sd.apiLog query: { c: "android/setDeviceToken", a: 1466, u: "blah", ua: "Server Density Android" } 51926ms
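
Slow operations like the one above end with their duration in milliseconds, so they can be picked out of the log mechanically. A rough sketch, assuming every line of interest ends in "<number>ms" as this one does. The function name and the one-second threshold are hypothetical.

```javascript
// Log line copied from this slide.
const line = 'Thu Oct 14 17:01:11 [conn7410] update sd.apiLog query: { c: "android/setDeviceToken", a: 1466, u: "blah", ua: "Server Density Android" } 51926ms';

// Return the trailing duration in milliseconds, or null if the line has none.
function durationMs(logLine) {
  const match = /(\d+)ms\s*$/.exec(logLine);
  return match ? Number(match[1]) : null;
}

// Hypothetical threshold: treat anything slower than one second as slow.
const SLOW_MS = 1000;
const ms = durationMs(line);
console.log(ms, ms !== null && ms > SLOW_MS);
```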

www.flickr.com/photos/tonivc/2283676770/

Page 7: Monitoring MongoDB (MongoUK)

How you’ll know

2) Timeouts

cursor timed out (20000 ms)

Page 8: Monitoring MongoDB (MongoUK)

How you’ll know

3) Disk i/o spikes

www.flickr.com/photos/daddo83/3406962115/

Page 9: Monitoring MongoDB (MongoUK)

Watch your storage

1) Pre-alloc

Page 10: Monitoring MongoDB (MongoUK)

Watch your storage

2) Sharding maxSize

Page 11: Monitoring MongoDB (MongoUK)

Watch your storage

3) Logging

--quiet

db.runCommand("logRotate");

killall -SIGUSR1 mongod

Page 12: Monitoring MongoDB (MongoUK)

Watch your storage

4) Journaling

david@rs2b ~: ls -alh /mongodbdata/journal/
total 538M
drwxrwxr-x 2 david david   29 Mar 20 16:50 .
drwx------ 4 david david 4.0K Mar 13 09:50 ..
-rw------- 1 david david 538M Mar 20 17:00 j._862
-rw------- 1 david david   88 Mar 20 17:00 lsn

Page 13: Monitoring MongoDB (MongoUK)

db.serverStatus()

Page 14: Monitoring MongoDB (MongoUK)

1) Used connections

db.serverStatus()

www.flickr.com/photos/armchaircaver/2061231069/

Page 15: Monitoring MongoDB (MongoUK)

2) Available connections

db.serverStatus()
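
Used and available connections both come from the connections section of db.serverStatus(), which reports current and available counts. A small sketch of turning them into a usage fraction to alert on. The sample numbers and the 80% threshold are made up for illustration.

```javascript
// Hypothetical sample in the shape of db.serverStatus().connections.
const connections = { current: 790, available: 29 };

// Fraction of the connection limit in use (limit = current + available).
function connectionUsage(conn) {
  const limit = conn.current + conn.available;
  return limit === 0 ? 0 : conn.current / limit;
}

// Hypothetical alert threshold.
const ALERT_AT = 0.8;
const usage = connectionUsage(connections);
console.log(usage.toFixed(3), usage >= ALERT_AT);
```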

Page 16: Monitoring MongoDB (MongoUK)

Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs1b") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs1d") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs1c") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2b") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2d") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2c") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2a") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2268] checkmaster: rs2b:27018 { setName: "set2", ismaster: false, secondary: true, hosts: [ "rs2b:27018", "rs2d:27018", "rs2c:27018", "rs2a:27018" ], arbiters: [ "rs2arbiter:27018" ], primary: "rs2a:27018", maxBsonObjectSize: 8388608, ok: 1.0 }
MessagingPort say send() errno:9 Bad file descriptor (NONE)
Fri Nov 19 17:24:32 [conn2268] checkmaster: caught exception rs2d:27018 socket exception
Fri Nov 19 17:24:32 [conn2268] MessagingPort say send() errno:9 Bad file descriptor (NONE)
Fri Nov 19 17:24:32 [conn2268] checkmaster: caught exception rs2c:27018 socket exception
Fri Nov 19 17:24:32 [conn2268] MessagingPort say send() errno:9 Bad file descriptor (NONE)
Fri Nov 19 17:24:32 [conn2268] checkmaster: caught exception rs2a:27018 socket exception
Fri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1a") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1b") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1d") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1c") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2b") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2d") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2c") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2a") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2b") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2d") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2c") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2a") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2343] getaddrinfo("rs1b") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2343] getaddrinfo("rs1d") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2343] getaddrinfo("rs1c") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs1b") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs1d") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs1c") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2b") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2d") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2c") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2a") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2d") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2c") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2a") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2343] trying reconnect to rs2d:27018
Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2d") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2343] reconnect rs2d:27018 failed
Fri Nov 19 17:24:34 [conn2343] MessagingPort say send() errno:9 Bad file descriptor (NONE)
Fri Nov 19 17:24:34 [conn2343] trying reconnect to rs2c:27018
Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2c") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2343] reconnect rs2c:27018 failed
Fri Nov 19 17:24:34 [conn2343] MessagingPort say send() errno:9 Bad file descriptor (NONE)
Fri Nov 19 17:24:34 [conn2343] trying reconnect to rs2a:27018
Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2a") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2343] reconnect rs2a:27018 failed
Fri Nov 19 17:24:34 [conn2343] MessagingPort say send() errno:9 Bad file descriptor (NONE)
Fri Nov 19 17:24:35 [conn2343] checkmaster: rs2b:27018 { setName: "set2", ismaster: false, secondary: true, hosts: [ "rs2b:27018", "rs2d:27018", "rs2c:27018", "rs2a:27018" ], arbiters: [ "rs2arbiter:27018" ], primary: "rs2a:27018", maxBsonObjectSize: 8388608, ok: 1.0 }
MessagingPort say send() errno:9 Bad file descriptor (NONE)

Page 17: Monitoring MongoDB (MongoUK)

connPoolStats

> db.runCommand("connPoolStats")
{
    "hosts" : {
        "config1:27019" : {
            "available" : 2,
            "created" : 6
        },
        "set1/rs1a:27018,rs1b:27018" : {
            "available" : 1,
            "created" : 249
        },
        ...
    },
    "totalAvailable" : 5,
    "totalCreated" : 1002,
    "numDBClientConnection" : 3490,
    "numAScopedConnection" : 3,
}

Page 18: Monitoring MongoDB (MongoUK)

3) Index counters

db.serverStatus()

"indexCounters" : {
    "btree" : {
        "accesses" : 15180175,
        "hits" : 15178725,
        "misses" : 1450,
        "resets" : 0,
        "missRatio" : 0.00009551932
    }
},

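The missRatio on this slide is simply misses divided by accesses, which is worth recomputing if you graph the raw counters yourself. A short sketch using the numbers above. The function name is hypothetical.

```javascript
// Counters copied from the indexCounters output on this slide.
const btree = { accesses: 15180175, hits: 15178725, misses: 1450, resets: 0 };

// Share of index accesses that were not served from memory.
function missRatio(counters) {
  return counters.accesses === 0 ? 0 : counters.misses / counters.accesses;
}

console.log(missRatio(btree));
```

A ratio that climbs over time suggests the indexes no longer fit in RAM.
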
Page 19: Monitoring MongoDB (MongoUK)

4) Op counters

db.serverStatus()

www.flickr.com/photos/cosmic_bandita/2395369614/
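
The opcounters section of db.serverStatus() holds running totals since the server started, so a rate needs two samples. A minimal sketch, assuming two readings taken a known number of seconds apart. The sample values and the function name are invented for illustration.

```javascript
// Hypothetical opcounters samples taken 10 seconds apart.
const before = { insert: 1000, query: 5000, update: 200, delete: 10, getmore: 50, command: 300 };
const after = { insert: 1100, query: 5600, update: 230, delete: 10, getmore: 60, command: 350 };

// Per-second rate for each operation type between two samples.
function opsPerSecond(prev, curr, seconds) {
  const rates = {};
  for (const op of Object.keys(curr)) {
    rates[op] = (curr[op] - (prev[op] || 0)) / seconds;
  }
  return rates;
}

console.log(opsPerSecond(before, after, 10));
```

The totals restart from zero when mongod restarts, so a negative rate means the earlier sample should be discarded.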

Page 20: Monitoring MongoDB (MongoUK)

5) Background flushing

db.serverStatus()

Picture is unrelated! Mmm, ice cream.

Page 21: Monitoring MongoDB (MongoUK)

6) Dur

db.serverStatus()

Page 22: Monitoring MongoDB (MongoUK)

rs.status()

www.ex-astris-scientia.org/inconsistencies/ent_vs_tng.htm (yes it’s a replicator from Star Trek)

{
    "_id" : 1,
    "name" : "rs3b:27018",
    "health" : 1,
    "state" : 2,
    "stateStr" : "SECONDARY",
    "uptime" : 1886098,
    "optime" : {
        "t" : 1291252178000,
        "i" : 13
    },
    "optimeDate" : ISODate("2010-12-02T01:09:38Z"),
    "lastHeartbeat" : ISODate("2010-12-02T01:09:38Z")
},

Page 23: Monitoring MongoDB (MongoUK)

1) myState

rs.status()

Value   Meaning
0       Starting up (phase 1)
1       Primary
2       Secondary
3       Recovering
4       Fatal error
5       Starting up (phase 2)
6       Unknown state
7       Arbiter
8       Down
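
The state table above maps directly onto a lookup for alerting. A small sketch. Treating Primary, Secondary and Arbiter as the only "expected" steady states is an assumption made here, not something the slide states, and the function names are hypothetical.

```javascript
// State codes and meanings copied from the table on this slide.
const REPLICA_STATES = {
  0: "Starting up (phase 1)",
  1: "Primary",
  2: "Secondary",
  3: "Recovering",
  4: "Fatal error",
  5: "Starting up (phase 2)",
  6: "Unknown state",
  7: "Arbiter",
  8: "Down",
};

// Human-readable meaning for a myState or member state value.
function describeState(code) {
  return code in REPLICA_STATES ? REPLICA_STATES[code] : "Unrecognised state";
}

// Assumption: only these states are normal for a settled replica set.
function isExpectedState(code) {
  return code === 1 || code === 2 || code === 7;
}

console.log(describeState(2), isExpectedState(2));
console.log(describeState(8), isExpectedState(8));
```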

en.wikipedia.org/wiki/State_of_matter

Page 24: Monitoring MongoDB (MongoUK)

2) Optime

rs.status()

www.flickr.com/photos/robbie73/4244846566/

"optimeDate" : ISODate("2010-12-02T01:09:38Z")

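Comparing each secondary's optimeDate with the primary's gives a rough replication lag. A sketch under the assumption that the member list looks like the members array of rs.status(). The primary entry (rs3a) and its timestamp are invented, and only rs3b's optimeDate comes from the slides.

```javascript
// Hypothetical member list in the shape of rs.status().members.
const members = [
  { name: "rs3a:27018", stateStr: "PRIMARY", optimeDate: new Date("2010-12-02T01:09:40Z") },
  { name: "rs3b:27018", stateStr: "SECONDARY", optimeDate: new Date("2010-12-02T01:09:38Z") },
];

// Seconds each secondary is behind the primary, or null with no primary.
function replicationLagSeconds(memberList) {
  const primary = memberList.find((m) => m.stateStr === "PRIMARY");
  if (!primary) return null;
  const lag = {};
  for (const m of memberList) {
    if (m.stateStr === "SECONDARY") {
      lag[m.name] = (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000;
    }
  }
  return lag;
}

console.log(replicationLagSeconds(members));
```
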
Page 25: Monitoring MongoDB (MongoUK)

3) Heartbeat

rs.status()

www.flickr.com/photos/drawblindfaith/3400981091/

"lastHeartbeat" : ISODate("2010-12-02T01:09:38Z")

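A heartbeat that stops advancing is an early sign a member is unreachable. A brief sketch that measures how old a member's lastHeartbeat is. The "now" value and the 10-second threshold are made up for the example, and the function name is hypothetical.

```javascript
// lastHeartbeat copied from this slide; "now" is an invented clock reading.
const member = { name: "rs3b:27018", lastHeartbeat: new Date("2010-12-02T01:09:38Z") };
const now = new Date("2010-12-02T01:09:50Z");

// Age of the last heartbeat in seconds, or null if the member has none.
function heartbeatAgeSeconds(m, at) {
  if (!m.lastHeartbeat) return null;
  return (at.getTime() - m.lastHeartbeat.getTime()) / 1000;
}

// Hypothetical threshold for raising an alert.
const STALE_AFTER_SECONDS = 10;
const age = heartbeatAgeSeconds(member, now);
console.log(age, age !== null && age > STALE_AFTER_SECONDS);
```
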
Page 26: Monitoring MongoDB (MongoUK)

mongostat

Page 27: Monitoring MongoDB (MongoUK)

1) faults

mongostat

Picture is unrelated! Snowmobile in Norway.

Page 28: Monitoring MongoDB (MongoUK)

2) locked

mongostat

www.flickr.com/photos/bbusschots/4541573665/

Page 29: Monitoring MongoDB (MongoUK)

3) index miss

mongostat

www.flickr.com/photos/gareandkitty/276471187/

Page 30: Monitoring MongoDB (MongoUK)

4) queues

mongostat

Page 31: Monitoring MongoDB (MongoUK)

5) Diagnostics

mongostat

Page 32: Monitoring MongoDB (MongoUK)

Current operations

www.flickr.com/photos/jeffhester/2784666811/

db.currentOp();
{
    "opid" : "shard1:299939199",
    "active" : true,
    "lockType" : "write",
    "waitingForLock" : false,
    "secs_running" : 15419,
    "op" : "remove",
    "ns" : "sd.metrics",
    "query" : {
        "accId" : 1391,
        "tA" : {
            "$lte" : ISODate("2010-11-24T19:53:00Z")
        }
    },
    "client" : "10.121.12.228:44426",
    "desc" : "conn"
},
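
Operations like the 15419-second remove above can be found by filtering the inprog array that db.currentOp() returns. A minimal sketch. The first entry is trimmed from this slide, while the second entry, the function name and the 60-second cut-off are hypothetical.

```javascript
// Sample in the shape of a db.currentOp() result.
const currentOp = {
  inprog: [
    { opid: "shard1:299939199", active: true, secs_running: 15419, op: "remove", ns: "sd.metrics" },
    { opid: "shard1:299939200", active: true, secs_running: 2, op: "query", ns: "sd.metrics" }, // invented entry
  ],
};

// Active operations that have been running for at least minSeconds.
function longRunning(result, minSeconds) {
  return result.inprog.filter((op) => op.active && op.secs_running >= minSeconds);
}

for (const op of longRunning(currentOp, 60)) {
  console.log(op.opid, op.op, op.ns, op.secs_running + "s");
}
```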

Page 33: Monitoring MongoDB (MongoUK)

Monitoring tools

Run yourself

Ganglia

Page 34: Monitoring MongoDB (MongoUK)

Monitoring tools

Server Density

Page 35: Monitoring MongoDB (MongoUK)
Page 36: Monitoring MongoDB (MongoUK)
Page 37: Monitoring MongoDB (MongoUK)
Page 38: Monitoring MongoDB (MongoUK)

Monitoring tools

www.mongomonitor.com

Page 39: Monitoring MongoDB (MongoUK)

Recap

Page 40: Monitoring MongoDB (MongoUK)

Keep it in RAM

Recap

Page 41: Monitoring MongoDB (MongoUK)

Keep it in RAM

Watch your storage

Recap

Page 42: Monitoring MongoDB (MongoUK)

Keep it in RAM

Watch your storage

db.serverStatus()

rs.status()

Recap

Page 43: Monitoring MongoDB (MongoUK)

David Mytton

@davidmytton

Woop Japan!

www.mongomonitor.com