MongoDB - Monitoring & queueing

Building a queueing system in MongoDB and monitoring your cluster. Presentation by David Mytton at MongoSF May 2011 and MongoDB London User Group July 2011.

MongoDB Queuing & Monitoring

www.flickr.com/photos/triplexpresso/496995086/

Queuing

☺ Redundancy

☺ Atomicity

☃ Speed

☹ GC

Queuing

☺ Redundancy

☺ Known

It’s a little different, but not entirely new.

www.flickr.com/photos/comedynose/4388430444/

Keep it in RAM. Obviously.

How do you know?

> db.stats()
{
    "collections" : 3,
    "objects" : 379970142,
    "avgObjSize" : 146.4554114991488,
    "dataSize" : 55648683504,
    "storageSize" : 61795435008,
    "numExtents" : 64,
    "indexes" : 1,
    "indexSize" : 21354514128,
    "fileSize" : 100816388096,
    "ok" : 1
}

dataSize: 51GB

indexSize: 19GB
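
The two sizes called out above come straight from the dataSize and indexSize fields, converted from bytes. A minimal sketch of that arithmetic, assuming the numbers from the slide; the function name memoryFit and the 32 GiB RAM figure are hypothetical and only there to illustrate the comparison:

```javascript
// Sketch: convert db.stats() byte counts to GiB and compare them with RAM.
// The stats values are copied from the slide; ramBytes is an assumed example.
const GIB = 1024 ** 3;

function memoryFit(stats, ramBytes) {
  const toGiB = (bytes) => bytes / GIB;
  return {
    dataGiB: toGiB(stats.dataSize),
    indexGiB: toGiB(stats.indexSize),
    // Indexes should always fit in memory; data should if possible.
    indexesFit: stats.indexSize <= ramBytes,
    dataAndIndexesFit: stats.dataSize + stats.indexSize <= ramBytes,
  };
}

const stats = { dataSize: 55648683504, indexSize: 21354514128 };
const report = memoryFit(stats, 32 * GIB); // 32 GiB of RAM is hypothetical
console.log(report);
```

With these numbers the indexes would fit in the assumed 32 GiB but data plus indexes would not.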

Where should it go?

What?      Should it be in memory?
Indexes    Always
Data       If you can

How you’ll know

1) Slow queries

Thu Oct 14 17:01:11 [conn7410] update sd.apiLog query: { c: "android/setDeviceToken", a: 1466, u: "blah", ua: "Server Density Android" } 51926ms
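
The log line above ends with the time the operation took (51926ms). A rough sketch of pulling that number out so it can be alerted on. The line format is inferred from this one example only, the function name opDurationMs is hypothetical, and the 100 ms threshold is an assumption (it matches mongod's default slowms):

```javascript
// Sketch: extract the trailing "<n>ms" duration from a mongod log line.
// Returns null when the line does not end in a duration.
function opDurationMs(line) {
  const match = /(\d+)ms\s*$/.exec(line);
  return match ? Number(match[1]) : null;
}

const line =
  'Thu Oct 14 17:01:11 [conn7410] update sd.apiLog query: { c: "android/setDeviceToken", a: 1466, u: "blah", ua: "Server Density Android" } 51926ms';
const SLOW_MS = 100; // assumed alert threshold
const ms = opDurationMs(line);
console.log(ms, ms !== null && ms > SLOW_MS);
```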

www.flickr.com/photos/tonivc/2283676770/

How you’ll know

2) Timeouts

cursor timed out (20000 ms)

How you’ll know

3) Disk i/o spikes

www.flickr.com/photos/daddo83/3406962115/

Watch your storage

1) Pre-alloc

Watch your storage

2) Sharding maxSize

Watch your storage

3) Logging

--quiet

db.runCommand("logRotate");

killall -SIGUSR1 mongod

Watch your storage

4) Journaling

david@rs2b ~: ls -alh /mongodbdata/journal/
total 538M
drwxrwxr-x 2 david david   29 Mar 20 16:50 .
drwx------ 4 david david 4.0K Mar 13 09:50 ..
-rw------- 1 david david 538M Mar 20 17:00 j._862
-rw------- 1 david david   88 Mar 20 17:00 lsn

db.serverStatus()

1) Used connections

db.serverStatus()

www.flickr.com/photos/armchaircaver/2061231069/

2) Available connections

db.serverStatus()

Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs1b") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs1d") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs1c") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2b") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2d") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2c") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2a") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2268] checkmaster: rs2b:27018 { setName: "set2", ismaster: false, secondary: true, hosts: [ "rs2b:27018", "rs2d:27018", "rs2c:27018", "rs2a:27018" ], arbiters: [ "rs2arbiter:27018" ], primary: "rs2a:27018", maxBsonObjectSize: 8388608, ok: 1.0 }
MessagingPort say send() errno:9 Bad file descriptor (NONE)
Fri Nov 19 17:24:32 [conn2268] checkmaster: caught exception rs2d:27018 socket exception
Fri Nov 19 17:24:32 [conn2268] MessagingPort say send() errno:9 Bad file descriptor (NONE)
Fri Nov 19 17:24:32 [conn2268] checkmaster: caught exception rs2c:27018 socket exception
Fri Nov 19 17:24:32 [conn2268] MessagingPort say send() errno:9 Bad file descriptor (NONE)
Fri Nov 19 17:24:32 [conn2268] checkmaster: caught exception rs2a:27018 socket exception
Fri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1a") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1b") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1d") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1c") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2b") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2d") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2c") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2a") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2b") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2d") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2c") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2a") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2343] getaddrinfo("rs1b") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2343] getaddrinfo("rs1d") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2343] getaddrinfo("rs1c") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs1b") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs1d") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs1c") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2b") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2d") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2c") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2a") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2d") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2c") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2a") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2343] trying reconnect to rs2d:27018
Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2d") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2343] reconnect rs2d:27018 failed
Fri Nov 19 17:24:34 [conn2343] MessagingPort say send() errno:9 Bad file descriptor (NONE)
Fri Nov 19 17:24:34 [conn2343] trying reconnect to rs2c:27018
Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2c") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2343] reconnect rs2c:27018 failed
Fri Nov 19 17:24:34 [conn2343] MessagingPort say send() errno:9 Bad file descriptor (NONE)
Fri Nov 19 17:24:34 [conn2343] trying reconnect to rs2a:27018
Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2a") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2343] reconnect rs2a:27018 failed
Fri Nov 19 17:24:34 [conn2343] MessagingPort say send() errno:9 Bad file descriptor (NONE)
Fri Nov 19 17:24:35 [conn2343] checkmaster: rs2b:27018 { setName: "set2", ismaster: false, secondary: true, hosts: [ "rs2b:27018", "rs2d:27018", "rs2c:27018", "rs2a:27018" ], arbiters: [ "rs2arbiter:27018" ], primary: "rs2a:27018", maxBsonObjectSize: 8388608, ok: 1.0 }
MessagingPort say send() errno:9 Bad file descriptor (NONE)
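
The errno:24 lines above show the process running out of file descriptors, and each open connection uses one. One way to catch this earlier is to watch the connections section of db.serverStatus(), which reports current and available counts. A minimal sketch, assuming sample numbers and an 80% warning threshold that are not from the talk; the function name connectionUsage is hypothetical:

```javascript
// Sketch: fraction of the connection limit currently in use.
// Expects the "connections" sub-document of db.serverStatus().
function connectionUsage(connections) {
  const limit = connections.current + connections.available;
  // Treat a zero limit as fully used so it always raises a warning.
  return limit === 0 ? 1 : connections.current / limit;
}

const sample = { current: 760, available: 59 }; // hypothetical values
const WARN_AT = 0.8; // assumed threshold
const usage = connectionUsage(sample);
console.log(usage.toFixed(2), usage >= WARN_AT ? "WARN" : "ok");
```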

connPoolStats

> db.runCommand("connPoolStats")
{
    "hosts" : {
        "config1:27019" : {
            "available" : 2,
            "created" : 6
        },
        "set1/rs1a:27018,rs1b:27018" : {
            "available" : 1,
            "created" : 249
        },
...
    },
    "totalAvailable" : 5,
    "totalCreated" : 1002,
    "numDBClientConnection" : 3490,
    "numAScopedConnection" : 3,
}

3) Index counters

db.serverStatus()

"indexCounters" : {
    "btree" : {
        "accesses" : 15180175,
        "hits" : 15178725,
        "misses" : 1450,
        "resets" : 0,
        "missRatio" : 0.00009551932
    }
},

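The missRatio reported in indexCounters is simply misses divided by accesses. A small sketch that recomputes it from the numbers on the slide; the function name btreeMissRatio is hypothetical:

```javascript
// Sketch: recompute the btree miss ratio from the raw index counters.
function btreeMissRatio(btree) {
  return btree.accesses === 0 ? 0 : btree.misses / btree.accesses;
}

// Values copied from the indexCounters output on the slide.
const btree = { accesses: 15180175, hits: 15178725, misses: 1450, resets: 0 };
const ratio = btreeMissRatio(btree);
console.log(ratio);
```

A ratio that climbs over time suggests the indexes no longer fit in memory.
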
4) Op counters

db.serverStatus()

www.flickr.com/photos/cosmic_bandita/2395369614/

5) Background flushing

db.serverStatus()

Picture is unrelated! Mmm, ice cream.

6) Dur

db.serverStatus()

rs.status()

www.ex-astris-scientia.org/inconsistencies/ent_vs_tng.htm (yes it’s a replicator from Star Trek)

{
    "_id" : 1,
    "name" : "rs3b:27018",
    "health" : 1,
    "state" : 2,
    "stateStr" : "SECONDARY",
    "uptime" : 1886098,
    "optime" : {
        "t" : 1291252178000,
        "i" : 13
    },
    "optimeDate" : ISODate("2010-12-02T01:09:38Z"),
    "lastHeartbeat" : ISODate("2010-12-02T01:09:38Z")
},

1) myState

rs.status()

Value  Meaning
0      Starting up (phase 1)
1      Primary
2      Secondary
3      Recovering
4      Fatal error
5      Starting up (phase 2)
6      Unknown state
7      Arbiter
8      Down
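
The state table above translates directly into a lookup for a monitoring check. A minimal sketch; treating Primary, Secondary and Arbiter as the only healthy states is an assumption, not something the talk states, and both function names are hypothetical:

```javascript
// Sketch: map rs.status() state codes to the meanings listed in the table.
const REPLICA_STATES = {
  0: "Starting up (phase 1)",
  1: "Primary",
  2: "Secondary",
  3: "Recovering",
  4: "Fatal error",
  5: "Starting up (phase 2)",
  6: "Unknown state",
  7: "Arbiter",
  8: "Down",
};

function describeState(code) {
  return Object.prototype.hasOwnProperty.call(REPLICA_STATES, code)
    ? REPLICA_STATES[code]
    : "Unrecognised state " + code;
}

// Assumption: only Primary (1), Secondary (2) and Arbiter (7) count as healthy.
function isHealthyState(code) {
  return code === 1 || code === 2 || code === 7;
}

console.log(describeState(2), isHealthyState(2));
console.log(describeState(8), isHealthyState(8));
```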

en.wikipedia.org/wiki/State_of_matter

2) Optime

rs.status()

www.flickr.com/photos/robbie73/4244846566/

"optimeDate" : ISODate("2010-12-02T01:09:38Z")

3) Heartbeat

rs.status()

www.flickr.com/photos/drawblindfaith/3400981091/

"lastHeartbeat" : ISODate("2010-12-02T01:09:38Z")

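Both optimeDate and lastHeartbeat are timestamps, so the useful number is how far behind they are. A rough sketch of the two differences: the secondary's values are from the slide, while the primary's optime and the "current time" are hypothetical values chosen for illustration, as is the function name secondsBetween:

```javascript
// Sketch: replication lag and heartbeat age, both in seconds.
function secondsBetween(earlier, later) {
  return (later.getTime() - earlier.getTime()) / 1000;
}

// From the rs.status() member shown on the slide.
const secondary = {
  optimeDate: new Date("2010-12-02T01:09:38Z"),
  lastHeartbeat: new Date("2010-12-02T01:09:38Z"),
};
const primaryOptime = new Date("2010-12-02T01:09:41Z"); // hypothetical
const now = new Date("2010-12-02T01:09:45Z"); // hypothetical current time

const lagSeconds = secondsBetween(secondary.optimeDate, primaryOptime);
const heartbeatAgeSeconds = secondsBetween(secondary.lastHeartbeat, now);
console.log(lagSeconds, heartbeatAgeSeconds);
```
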
mongostat

1) faults

mongostat

Picture is unrelated! Snowmobile in Norway.

2) locked

mongostat

www.flickr.com/photos/bbusschots/4541573665/

3) index miss

mongostat

www.flickr.com/photos/gareandkitty/276471187/

4) queues

mongostat

5) Diagnostics

mongostat

Current operations

www.flickr.com/photos/jeffhester/2784666811/

db.currentOp();
        {
            "opid" : "shard1:299939199",
            "active" : true,
            "lockType" : "write",
            "waitingForLock" : false,
            "secs_running" : 15419,
            "op" : "remove",
            "ns" : "sd.metrics",
            "query" : {
                "accId" : 1391,
                "tA" : {
                    "$lte" : ISODate("2010-11-24T19:53:00Z")
                }
            },
            "client" : "10.121.12.228:44426",
            "desc" : "conn"
        },
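
The remove above has been running for 15419 seconds, which is the kind of entry worth flagging. A minimal sketch that filters the inprog array returned by db.currentOp() for long-running active operations. The first entry uses values from the slide, the second is a hypothetical short operation added for contrast, and the 60-second cutoff and the function name longRunningOps are assumptions:

```javascript
// Sketch: pick out active operations that have run for at least minSeconds.
function longRunningOps(inprog, minSeconds) {
  return inprog.filter(
    (op) => op.active && (op.secs_running || 0) >= minSeconds
  );
}

const inprog = [
  // From the slide.
  { opid: "shard1:299939199", active: true, secs_running: 15419, op: "remove", ns: "sd.metrics" },
  // Hypothetical entry for contrast.
  { opid: "shard1:299939200", active: true, secs_running: 2, op: "query", ns: "sd.metrics" },
];

const slowOps = longRunningOps(inprog, 60); // assumed cutoff of 60 seconds
slowOps.forEach((op) => console.log(op.opid, op.op, op.ns, op.secs_running));
```

An opid found this way can be passed to db.killOp() if the operation needs to be stopped.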

Monitoring tools

Server Density

Monitoring tools

www.mongomonitor.com

plugins.serverdensity.com

App store for sysadmins

Recap

Keep it in RAM

Watch your storage

db.serverStatus()

rs.status()

David Mytton

david@boxedice.com

@davidmytton

Woop Japan!

www.mongomonitor.com
