Page 1: Webinar: Deployment Best Practices

Deployment Best Practices

Sandeep Parikh

Solutions Architect, 10gen

The Cycle of Deployment Prep

[Figure: a repeating cycle of five stages: Prototype → Test → Monitor → Scale → Script]

Prototype Your Deployment

•  You have to start somewhere

•  Development is complete, deployment is next

•  Sketch out some initial deployment parameters
   –  Hardware sizing
   –  Operating system
   –  Disk setup
   –  Storage layout: data vs. journal vs. log

Prototyping Considerations

•  Additional considerations
   –  Horizontal vs. vertical scale options
   –  Multiple datacenters

•  Start thinking about data growth
   –  Do you know how your data will evolve?
   –  Does your data live in multiple collections/databases?
   –  Read-centric, write-centric or both?

•  The more you start thinking about it, the better
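A back-of-the-envelope projection helps put numbers on data growth while prototyping. This is an illustration only. The document rate, the average document size and the index-overhead fraction are hypothetical inputs. Replace them with measurements from your own prototype; they are not MongoDB constants.

```python
GIB = 1024 ** 3

def projected_storage_gib(docs_per_day: int, avg_doc_bytes: int, days: int,
                          index_overhead: float = 0.25) -> float:
    """Rough storage projection: raw document bytes plus a fractional
    allowance for indexes. index_overhead is a hypothetical placeholder."""
    raw_bytes = docs_per_day * days * avg_doc_bytes
    return raw_bytes * (1 + index_overhead) / GIB


if __name__ == "__main__":
    # Hypothetical workload: 1M documents/day averaging 1 KiB each.
    for months in (1, 6, 12):
        gib = projected_storage_gib(1_000_000, 1024, days=30 * months)
        print(f"{months:>2} months: ~{gib:,.0f} GiB")
```

Run it with read-heavy and write-heavy variants of your workload. The same arithmetic feeds directly into hardware sizing and disk layout.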

Test, Test, Test

•  Generate a lot of data
   –  Write tests to measure bulk loading throughput
   –  Scaffolding can be used for staging, validation

•  Build your indexes
   –  All in the beginning
   –  On the fly

•  Script your app
   –  Can you simulate “expected” usage?
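One way to write a test that measures bulk loading throughput is a small harness. It generates synthetic documents and times batched inserts. This is a sketch under stated assumptions. The document shape is a placeholder, and `insert_batch` is any callable that writes a list of documents. With PyMongo you could pass a collection's `insert_many` method. The test uses an in-memory stand-in instead of a server.

```python
import time
from typing import Callable, Iterator, List

def generate_docs(n: int) -> Iterator[dict]:
    """Synthetic documents; the fields are placeholders for your own schema."""
    for i in range(n):
        yield {"seq": i, "user": f"user{i % 1000}", "payload": "x" * 256}

def measure_bulk_load(insert_batch: Callable[[List[dict]], object],
                      total: int, batch_size: int = 1000) -> float:
    """Insert `total` synthetic documents in batches; return documents per second.
    The timing includes document generation as well as the inserts."""
    batch: List[dict] = []
    start = time.perf_counter()
    for doc in generate_docs(total):
        batch.append(doc)
        if len(batch) == batch_size:
            insert_batch(batch)
            batch = []  # new list, in case insert_batch keeps a reference
    if batch:
        insert_batch(batch)  # flush the final partial batch
    elapsed = time.perf_counter() - start
    return total / elapsed if elapsed > 0 else float("inf")
```

With PyMongo, `measure_bulk_load(client.test.events.insert_many, 100_000)` would exercise a real server; the database and collection names are hypothetical. Running it at several `batch_size` values is a cheap way to see how batching affects throughput.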

Monitor Your Resources

•  Watch everything

•  The goal is to understand the numbers before deploying

•  Monitor using
   –  SNMP, Munin, Nagios
   –  mongostat, mongotop, iostat, cpustat
   –  MongoDB Monitoring Service (MMS)

•  Other stats
   –  Database and collection level

Monitoring Key Metrics

•  Op counters
   –  Inserts, updates, deletes, reads (more is generally better)
   –  Some differences in primary vs. secondary ops

•  Resident memory
   –  Want this lower than available physical memory
   –  Correlated with page faults and index misses

•  Queues
   –  Readers and writers
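Op counters in `serverStatus` are cumulative since startup. Rates therefore come from the difference between two samples, which is roughly what mongostat shows. This is a minimal sketch. It assumes `before` and `after` are `serverStatus` results taken `interval_s` seconds apart, for example via PyMongo's `db.command("serverStatus")`. The sample values in the test are made up.

```python
from typing import Dict, Tuple

OPS = ("insert", "query", "update", "delete", "getmore", "command")

def opcounter_rates(before: Dict, after: Dict, interval_s: float) -> Dict[str, float]:
    """Per-second rates from two cumulative serverStatus 'opcounters' samples."""
    b, a = before["opcounters"], after["opcounters"]
    return {op: (a[op] - b[op]) / interval_s for op in OPS}

def resident_fraction(status: Dict, physical_mb: float) -> float:
    """serverStatus reports mem.resident in megabytes; compare it to physical RAM."""
    return status["mem"]["resident"] / physical_mb

def queue_depth(status: Dict) -> Tuple[int, int]:
    """Operations currently waiting on the lock: (readers, writers)."""
    queue = status["globalLock"]["currentQueue"]
    return queue["readers"], queue["writers"]
```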

Monitoring Key Metrics

•  Page faults and B-tree misses
   –  How often are you having to hit the disk?
   –  Persistently non-zero? Your working set might not fit in memory.

•  Lock percentage
   –  If high and queues are filling, you are hitting write capacity

•  IO and CPU stats
   –  IO sustained or fluctuating => IO bound
   –  CPU hitting IOWAITs
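Lock percentage and page faults work the same way as op counters: sample the cumulative counters and look at the deltas. This sketch takes plain numbers so that it does not depend on one server version's field names. On 2.x-era servers the inputs would come from `serverStatus` fields such as `globalLock.lockTime`/`globalLock.totalTime` and `extra_info.page_faults`, but check those against your version. The threshold in `findings` is hypothetical.

```python
from typing import List, Sequence

def lock_percentage(lock_before: float, lock_after: float,
                    total_before: float, total_after: float) -> float:
    """Share of the sampling interval spent holding the lock (cumulative timers)."""
    elapsed = total_after - total_before
    if elapsed <= 0:
        raise ValueError("total time must increase between samples")
    return 100.0 * (lock_after - lock_before) / elapsed

def fault_rate(faults_before: int, faults_after: int, interval_s: float) -> float:
    """Page faults per second between two cumulative samples."""
    return (faults_after - faults_before) / interval_s

def findings(lock_pct: float, queued_writers: int, fault_rates: Sequence[float],
             lock_threshold: float = 50.0) -> List[str]:
    """Heuristic reading of the slide; lock_threshold is a hypothetical cut-off."""
    notes = []
    # "Persistently non-zero": every sample shows faults, not just one spike.
    if fault_rates and all(rate > 0 for rate in fault_rates):
        notes.append("page faults persistently non-zero: working set might not fit")
    if lock_pct >= lock_threshold and queued_writers > 0:
        notes.append("high lock % with queued writers: hitting write capacity")
    return notes
```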

Scale Your Setup

•  Monitor those metrics while testing

•  Should tell you where to add capacity
   –  CPU, RAM, disks

•  Storage configuration
   –  RAID levels
   –  Filesystem selection
   –  Block sizing
   –  Readahead setting
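Readahead is a per-device setting. On Linux it is exposed in KB at `/sys/block/<device>/queue/read_ahead_kb`, while `blockdev --getra`/`--setra` count 512-byte sectors. This sketch checks the setting and assumes a Linux host. The device names and the limit you compare against are your choice. The test points `sys_root` at a temporary directory rather than the real `/sys`.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

def read_ahead_kb(device: str, sys_root: str = "/sys/block") -> int:
    """Current readahead for a block device, in KB (Linux sysfs)."""
    path = Path(sys_root) / device / "queue" / "read_ahead_kb"
    return int(path.read_text().strip())

def sectors_to_kb(sectors: int) -> float:
    """blockdev --getra/--setra values are in 512-byte sectors."""
    return sectors * 512 / 1024

def flag_large_readahead(devices: Iterable[str], limit_kb: int,
                         sys_root: str = "/sys/block") -> List[Tuple[str, int]]:
    """Return (device, kb) pairs whose readahead exceeds a caller-chosen limit."""
    flagged = []
    for device in devices:
        kb = read_ahead_kb(device, sys_root)
        if kb > limit_kb:
            flagged.append((device, kb))
    return flagged
```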

Script Your Plays

•  Backups

•  Restores (backups are not enough)

•  Maintenance and Upgrades

•  Replica set operations
   –  Stepping primaries down, adding new secondaries

•  Sharding operations
   –  Consistent backups, balancer operations
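Scripting backups and restores can start with something small: build the tool invocations in one place, so a restore drill runs the same way every time. This sketch assumes the standard `mongodump`/`mongorestore` tools are on the PATH. The hosts and paths are hypothetical. `--oplog` on the dump and `--oplogReplay` on the restore give a point-in-time dump of a replica set member. Sharded clusters need more coordination than this (for example, stopping the balancer), and the sketch does not cover that.

```python
import subprocess
from typing import List

def mongodump_cmd(host: str, out_dir: str, oplog: bool = True) -> List[str]:
    """Build a mongodump invocation; --oplog requires a replica set member."""
    cmd = ["mongodump", "--host", host, "--out", out_dir]
    if oplog:
        cmd.append("--oplog")  # capture writes made while the dump runs
    return cmd

def mongorestore_cmd(host: str, dump_dir: str, oplog_replay: bool = True,
                     drop: bool = False) -> List[str]:
    """Build a mongorestore invocation for a dump directory."""
    cmd = ["mongorestore", "--host", host]
    if oplog_replay:
        cmd.append("--oplogReplay")  # replay the oplog captured by --oplog
    if drop:
        cmd.append("--drop")  # drop each collection before restoring it
    cmd.append(dump_dir)
    return cmd

def run(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command by default; execute it only when dry_run is False."""
    if dry_run:
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)
```

A drill might dump from a secondary and restore into a scratch host. Only the restore proves the backup is usable.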

Lather, Rinse, Repeat

Perfect. I know what to do. How Do I Do It?

Balancing Priorities

[Figure: a scale weighing Product Development against Infrastructure Development; labels: Integration, QA, Code, Operations, Monitoring]

The Scale Tips To One Side

•  Product development is the priority
   –  As it should be, but…

•  Infrastructure development can’t be overlooked

•  Know the downsides of not being prepared
   –  Downtime
   –  Data safety

•  Disaster will strike in one way or another

Integrate With The Dev Cycle

•  Why are ops typically skipped over until it’s too late?
   –  Planning can alleviate this issue

•  Make operations development a part of the dev cycle
   –  Put it into the schedule
   –  Make it a development milestone

•  Use it to your advantage
   –  Script deployment of dev and test systems
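Scripting deployment of dev and test systems can begin with a plan for a local three-member replica set. This is a sketch. The directory layout and ports are hypothetical, and each `--dbpath` directory is assumed to exist already. It only builds the `mongod` command lines and the replica set config document. The config could then go to the `replSetInitiate` command, which in PyMongo is `client.admin.command("replSetInitiate", config)`, or to `rs.initiate()` in the shell.

```python
from typing import Dict, List

def replica_set_plan(name: str, members: int = 3, base_port: int = 27017,
                     root: str = "/data/rs") -> Dict:
    """Build mongod command lines and a replica set config for a local dev set.
    Assumes each --dbpath directory already exists."""
    commands: List[List[str]] = []
    config_members = []
    for i in range(members):
        port = base_port + i
        dbpath = f"{root}/{name}-{i}"
        commands.append([
            "mongod", "--replSet", name, "--port", str(port),
            "--dbpath", dbpath,
            "--logpath", f"{dbpath}.log", "--fork",  # --fork needs a log destination
        ])
        config_members.append({"_id": i, "host": f"localhost:{port}"})
    return {"commands": commands,
            "config": {"_id": name, "members": config_members}}
```

The same function, pointed at real hostnames, doubles as documentation of how test environments are built.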

That’s all well and good but we are already deployed

Let’s Avoid This Situation

Start The Cycle Again

Start With Monitoring

•  Monitor your deployment
   –  Munin, Nagios
   –  MMS

•  Instrument your app
   –  Know your queries
   –  Read/write/update/delete behaviors
   –  Index utilization

•  Database and collection stats
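Instrumenting the app can be as simple as counting and timing each class of database call in your own data-access layer. This is a minimal sketch. `OpStats` and the `"read"`/`"write"` labels are hypothetical names. The decorated functions stand in for whatever code issues your queries.

```python
import time
from collections import defaultdict
from functools import wraps
from typing import Callable, Dict

class OpStats:
    """Counts and total latency per operation kind, collected in the app."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = defaultdict(int)
        self.total_ms: Dict[str, float] = defaultdict(float)

    def track(self, kind: str) -> Callable:
        """Decorator recording one call of `kind` and its wall-clock time."""
        def decorator(fn: Callable) -> Callable:
            @wraps(fn)
            def wrapper(*args, **kwargs):
                start = time.perf_counter()
                try:
                    return fn(*args, **kwargs)
                finally:
                    self.counts[kind] += 1
                    self.total_ms[kind] += (time.perf_counter() - start) * 1000
            return wrapper
        return decorator

    def mix(self) -> Dict[str, float]:
        """Fraction of calls per kind, e.g. to see if you are read- or write-centric."""
        total = sum(self.counts.values())
        return {k: v / total for k, v in self.counts.items()} if total else {}
```

Comparing this app-side mix with the server's op counters shows whether the database sees the workload you think you are sending.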

Scaling Deployment

•  The numbers don’t lie
   –  But individual measurements don’t always tell the whole story

•  Are you hardware bound?
   –  Memory, disks, CPU

•  Is your app the problem?

•  What about system settings?
   –  Low resident memory plus page faults can point to an oversized readahead setting

Basic Solutions

•  Low opcounters + high page faults
   –  More memory

•  High paddingFactor and fragmentation
   –  Data model changes

•  Balancer running a lot, chunks always migrating
   –  Better shard key

•  Persistent b-tree misses, high page faults
   –  Queries aren’t using the indexes, or aren’t using them well

Continue Through the Cycle

•  Script your setup
   –  This will save time as you iterate

•  Prototype the fixes
   –  Evaluate queries, how documents change, expected usage

•  Test the new setup
   –  Scripts to build the deployment and model usage

Deployment Is About Not Being Surprised

Problem > Diagnosis > Solution

Problem 1: Streaming Events

•  Suboptimal write throughput

•  Where is the bottleneck?
   –  Check the metrics

Diagnosis 1

•  Are opcounters as high as you’d expect?

•  Check the queues

•  Examine lock percentages

•  How does resident memory look?

•  How large are your indexes?

Solution 1

•  Opcounters aren’t as high as you’d expect but memory is saturated

•  Correlated with high page faults

•  You might need more memory

•  MongoDB wants to fit your working set into memory
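Checking whether the working set fits in memory is arithmetic on estimates: the documents you touch regularly, plus the index pages they need. The inputs below are hypothetical assumptions, not MongoDB rules. So is the 80% headroom, which leaves RAM for the OS and connections.

```python
from typing import Tuple

GIB = 1024 ** 3

def working_set_fits(hot_docs: int, avg_doc_bytes: int, hot_index_bytes: int,
                     ram_bytes: int, headroom: float = 0.8) -> Tuple[bool, int]:
    """Estimate the working set (frequently touched documents plus the index
    pages they need) and compare it to a fraction of physical RAM."""
    working_set = hot_docs * avg_doc_bytes + hot_index_bytes
    return working_set <= ram_bytes * headroom, working_set
```

For example, 10 million hot 1 KiB documents plus 2 GiB of hot index fit within 16 GiB of RAM at 80% headroom, but not within 8 GiB.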

Problem 2: Tracking FB Friends

•  Update-heavy workload is slow

•  Document paddingFactor is increasing

Diagnosis 2

•  High paddingFactor
   –  Fragmentation!

•  More memory/disk is taken up by new documents
   –  Inefficient space usage

•  Documents are having to be relocated regularly

Solution 2

•  Check your queries
   –  Are your documents growing because of arrays or added fields?

•  Pre-create required document structure or…

•  Move growing elements into individual objects in a separate collection
   –  Data model changes, app changes
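For the second option, one way to stop a document from growing is to move each array element into its own small document in a separate collection. The field names in this sketch (`friends`, `user_id`, `friend_id`) are hypothetical. In practice you would also index the new collection on `user_id` so that per-user lookups stay cheap.

```python
from typing import Dict, List, Tuple

def split_growing_array(user_doc: Dict) -> Tuple[Dict, List[Dict]]:
    """Return (user document without its 'friends' array, one small document
    per friend). The second list is meant for a separate collection, so the
    user document no longer grows when friends are added."""
    base = {k: v for k, v in user_doc.items() if k != "friends"}
    edges = [{"user_id": user_doc["_id"], "friend_id": friend}
             for friend in user_doc.get("friends", [])]
    return base, edges
```

Each new friend then becomes an insert of a fixed-size document, not an update that can force the user document to be relocated.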

Problem 3: Status Updates

•  Write-heavy sharded deployment
   –  Is one shard getting burned?
   –  Balancer locked all the time

•  Balancer is constantly migrating chunks

Diagnosis 3

•  Check the mongos logs
   –  How often is migration occurring?
   –  Are chunks constantly moving from one shard to the next?

•  Shard key distribution
   –  Sequential keys?
   –  One shard always getting new writes?

Solution 3

•  Consider using a hash, byte swapping, etc. if there is no “natural” key that distributes well
   –  Avoids the “hot” shard problem

•  High writes and high balancer lock
   –  Manage the balancer window
   –  Run it during low utilization
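Both ideas on this slide turn a sequential value into one that spreads inserts across the key range. This sketch assumes non-negative integer ids. `hashed_key` prefixes the id with part of its MD5 digest. `byte_swapped` reverses the byte order of a 64-bit id so that the fastest-changing byte comes first. Both function names are illustrative. MongoDB 2.4 and later can also do this for you with a hashed shard key.

```python
import hashlib
import struct

def hashed_key(seq_id: int) -> str:
    """Prefix a sequential id with 8 hex chars of its MD5 digest so that
    consecutive ids sort far apart; the id is kept for uniqueness."""
    digest = hashlib.md5(str(seq_id).encode("ascii")).hexdigest()
    return f"{digest[:8]}-{seq_id}"

def byte_swapped(seq_id: int) -> int:
    """Reverse the byte order of a 64-bit id so the low-order (fastest-changing)
    byte becomes the most significant one. Unpacking as signed ("q") keeps the
    result inside BSON's signed 64-bit integer range."""
    return struct.unpack("<q", struct.pack(">Q", seq_id))[0]
```

The trade-off: range queries on the original id no longer map to a contiguous range of shard key values, so they fan out to every shard.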

Problem 4: File Sharing

•  Storing files in GridFS

•  Uploads are taking too long

Diagnosis 4

•  Check CPU and IO stats

•  Is the CPU stuck in IOWAITs?

•  High sustained IO operations

•  Lots of queued operations

•  IO bound workload
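To confirm on Linux that the CPU is stuck in IOWAIT, compare two samples of the aggregate `cpu` line from `/proc/stat`. This is the same data tools like iostat and vmstat report. In this sketch the fields are cumulative ticks in the order user, nice, system, idle, iowait, irq, softirq, steal. Guest time is already counted in user/nice, so it is left out of the total. The sample lines in the test are made up.

```python
from typing import List

def parse_cpu_line(line: str) -> List[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into integer tick counts."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(x) for x in parts[1:]]

def iowait_percent(before: str, after: str) -> float:
    """Percent of CPU time spent in iowait between two samples."""
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    # user, nice, system, idle, iowait, irq, softirq, steal; guest/guest_nice
    # are already included in user/nice, so they are left out of the total.
    deltas = [new - old for old, new in zip(b[:8], a[:8])]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0

def read_cpu_line(path: str = "/proc/stat") -> str:
    """The first line of /proc/stat is the aggregate across all CPUs."""
    with open(path) as f:
        return f.readline()
```

Take one `read_cpu_line()` sample, wait a few seconds, take another, and pass both to `iowait_percent`. A consistently high value while uploads are slow supports the IO-bound diagnosis.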

Solution 4

•  Ensure storage is in good health
   –  RAID status
   –  SAN or NAS devices functioning properly
   –  Virtualized disks

•  Consider separating data and journal
   –  --directoryperdb
   –  Symlink journal to another location

•  Ensure other processes aren’t hitting storage
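Moving the journal to its own device via a symlink looks roughly like this. This sketch uses hypothetical paths. It assumes the journal lives in `<dbpath>/journal`, and it must only be run while mongod is stopped.

```python
import os
import shutil
from pathlib import Path

def relocate_journal(dbpath: str, target_dir: str) -> Path:
    """Move <dbpath>/journal to target_dir and leave a symlink in its place.
    Only run this while mongod is stopped."""
    journal = Path(dbpath) / "journal"
    target = Path(os.path.abspath(target_dir))
    if journal.is_symlink():
        raise RuntimeError(f"{journal} is already a symlink")
    if not journal.is_dir():
        raise FileNotFoundError(f"no journal directory at {journal}")
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(journal), str(target))
    os.symlink(target, journal, target_is_directory=True)  # absolute link target
    return target
```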

Problem 5: Reading Logs

•  Indexes are underperforming

•  Queries are using indexes but yielding quite a bit

Diagnosis 5

•  Use .explain() and .hint() with your queries

•  Check out the b-tree metrics
   –  Persistent non-zero misses?
   –  Correlated with memory, page faults, IO stats

•  B-trees are best for range queries over a single dimension
   –  Range queries on {A} if index is {A,B} could be suboptimal
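To see why a range on A with index {A,B} can be suboptimal, model the index as a sorted list of (a, b) keys. A range on A is one contiguous slice. An extra equality on B cannot narrow that slice, so every key in it is examined. This is a toy model, not MongoDB's B-tree. In explain() output the same effect shows up as many more index entries scanned (`nscanned`) than documents returned (`n`).

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

def range_scan(index: List[Tuple[int, int]], a_lo: int, a_hi: int,
               b_eq: Optional[int] = None) -> Tuple[int, List[Tuple[int, int]]]:
    """Simulate a range on A over an index {A: 1, B: 1}, modelled as a sorted
    list of (a, b) keys. Returns (keys scanned, matching keys)."""
    # (a_lo,) sorts before every (a_lo, b), so bisect finds the slice boundaries.
    start = bisect_left(index, (a_lo,))
    stop = bisect_left(index, (a_hi,))
    scanned = index[start:stop]
    # The equality on B has to be checked key by key inside the slice.
    matches = [key for key in scanned if b_eq is None or key[1] == b_eq]
    return len(scanned), matches
```

With an index on {B, A} instead, the equality on B and the range on A together select a single contiguous slice. Keys scanned and documents returned would then match.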

Solution 5

•  Revisit your indexing strategy

•  Consider data model changes to optimize queries and indexes

•  Some functionality doesn’t hit the index
   –  $where JavaScript clauses
   –  $mod, $not, $ne
   –  Complex regular expressions

Miscellaneous Deployment Notes

•  Warm the cache
   –  Use touch via db.runCommand()

•  Dynamically change log levels

•  Synchronize all clocks to the same NTP server
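Both commands on this slide are ordinary command documents. This sketch builds them, using a hypothetical collection name. `touch` loads a collection's data and/or indexes into memory on MMAPv1-era servers and is not available in recent MongoDB releases. `setParameter` with `logLevel` changes verbosity (0 to 5) without a restart. With PyMongo you would send them via `db.command(...)` and `client.admin.command(...)` respectively.

```python
from typing import Dict

def touch_command(collection: str, data: bool = True, index: bool = True) -> Dict:
    """Command document that warms a collection's data and/or indexes.
    The command name must be the first key; Python dicts keep insertion order."""
    return {"touch": collection, "data": data, "index": index}

def log_level_command(level: int) -> Dict:
    """Command document that changes server log verbosity at runtime."""
    if not 0 <= level <= 5:
        raise ValueError("logLevel ranges from 0 to 5")
    return {"setParameter": 1, "logLevel": level}
```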

Questions?

How To Get Help

•  Refer to our docs: docs.mongodb.org
   –  (hint: they’re very helpful!)

•  Other places we monitor
   –  mongodb-user Google group
   –  Stack Overflow

•  Found a bug? Submit a ticket