Capacity Planning For Your Growing MongoDB Cluster

Solution Architect, MongoDB

Sam Weaver

Capacity Planning:

Deploying MongoDB

#mongodb

Capacity Planning

• Why is it important?

• What is it?

• When is it important?

• How is it actually done?

Prepping for launch

• You’ve written your application

• The code is good

• You’re looking to launch soon

• How do I deploy?

Questions to ask yourself

• Instance types

– Standalone?

– Replica set?

– Sharded?

• Architecture

• Size of machines

– Machines cost money

– Size of machines may affect instance types required

• What are the consequences of not planning?

Why does it matter?

• Once we launch, we don't want avoidable downtime caused by poorly chosen hardware

• As our success grows we want to stay in front of the demand curve

• We want to meet business and user expectations

• We want to keep our jobs

What is Capacity Planning?

Requirements

Resources

Requirements

• Availability

– Planning for a crash

– Planning for binary upgrades

– Planning for hardware maintenance

• Throughput

– X many users at any one time

– Bulk loads vs. random access

• Responsiveness

– SLA of x ms per page load

– Amazon, Google study
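The availability requirement above (crashes, binary upgrades, hardware maintenance) can be turned into simple arithmetic on replica set size. The following is a minimal sketch, assuming every member is a voting member and that a primary can only be elected while a strict majority is reachable; the function names are illustrative and not part of any MongoDB API.

```python
def majority(members: int) -> int:
    """Votes needed to elect a primary, assuming all members vote."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """Members that can be down while a majority is still reachable."""
    return members - majority(members)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} members: majority={majority(n)}, "
              f"can lose {fault_tolerance(n)}")
```

Under these assumptions a two-member set tolerates no failures, which is one reason to plan maintenance windows around sets of three or more members.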

• Non-indexed Data

• Sorting

• Aggregation

– Map/Reduce

– Aggregation framework

• Data

– Fields

– Nesting

– Arrays/Embedded-Docs

Network

• Latency

– WriteConcern

– ReadPreference

– Batching

• Throughput

– Update/Write Patterns

– Reads/Queries
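The effect of batching on network latency can be estimated with back-of-the-envelope arithmetic. This is a rough sketch, assuming each batch costs one network round trip and ignoring server processing time; the function name and the sample figures are hypothetical.

```python
import math


def total_round_trip_ms(num_docs: int, batch_size: int, rtt_ms: float) -> float:
    """Network time spent on round trips when sending num_docs in batches."""
    round_trips = math.ceil(num_docs / batch_size)
    return round_trips * rtt_ms


if __name__ == "__main__":
    # Hypothetical workload: 10,000 documents over a link with a 2 ms round trip.
    print(total_round_trip_ms(10_000, batch_size=1, rtt_ms=2.0))
    print(total_round_trip_ms(10_000, batch_size=1_000, rtt_ms=2.0))
```

The same reasoning applies to write concern: waiting for acknowledgement from more members adds at least the round trip to those members for each acknowledged operation.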

Understand memory usage for MongoDB

• Data & indexes memory mapped into virtual address space

• Data accessed is paged into RAM

• OS evicts least recently used page

• More frequently used pages stay in RAM

Identify your working set

Number of active users on the system at any one time

Number of distinct pages accessed per second

Working Set

4 distinct pages per second

Working Set

Worst case 4 disk accesses

Working Set

Worst case disk access on every op
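The two worst cases above can be reproduced with a small simulation. This is a minimal sketch, assuming a pure least-recently-used page cache (real operating systems use approximations of LRU) and a hypothetical workload that cycles through 4 distinct pages; `count_page_faults` is an illustrative name.

```python
from collections import OrderedDict


def count_page_faults(accesses, ram_pages):
    """Count accesses that miss an LRU cache holding ram_pages pages."""
    cache = OrderedDict()
    faults = 0
    for page in accesses:
        if page in cache:
            # Hit: mark the page as most recently used.
            cache.move_to_end(page)
        else:
            # Miss: the page has to be read from disk.
            faults += 1
            cache[page] = True
            if len(cache) > ram_pages:
                # Evict the least recently used page.
                cache.popitem(last=False)
    return faults


if __name__ == "__main__":
    workload = [1, 2, 3, 4] * 5  # 4 distinct pages, accessed repeatedly
    print(count_page_faults(workload, ram_pages=4))  # working set fits in RAM
    print(count_page_faults(workload, ram_pages=3))  # working set does not fit
```

When RAM holds all 4 pages, only the first touch of each page goes to disk. With room for 3 pages, this cyclic pattern misses on every access, which is the "disk access on every op" case.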

Memory & Storage

Memory

• Working set affected by

– Sorting

– Aggregation

– Connections

Connections

Aggregations

Working Set Estimator

"workingSet" : {
    "note" : "thisIsAnEstimate",
    "pagesInMemory" : <num>,
    "computationTimeMicros" : <num>,
    "overSeconds" : <num>
}

Number of unique pages the server needed in the last 15 minutes. Use this to see if you are growing out
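To turn the estimator output into something comparable with installed RAM, the page count can be converted to bytes. This is a minimal sketch, assuming 4 KiB pages and a document shaped like the `workingSet` output above; the sample values and the function name are hypothetical.

```python
PAGE_SIZE_BYTES = 4096  # assumption: 4 KiB memory pages


def working_set_report(working_set: dict, ram_bytes: int) -> dict:
    """Summarise a workingSet estimate relative to available RAM."""
    estimated_bytes = working_set["pagesInMemory"] * PAGE_SIZE_BYTES
    return {
        "estimated_bytes": estimated_bytes,
        "fraction_of_ram": estimated_bytes / ram_bytes,
        "window_seconds": working_set["overSeconds"],
    }


if __name__ == "__main__":
    # Hypothetical estimator output, not taken from a real server.
    sample = {
        "note": "thisIsAnEstimate",
        "pagesInMemory": 1_000_000,
        "computationTimeMicros": 12_000,
        "overSeconds": 900,
    }
    print(working_set_report(sample, ram_bytes=8 * 1024**3))
```

A fraction that creeps toward 1.0 over successive samples suggests the working set is approaching the limit of RAM.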

Storage

• Different storage types deliver different IOPS

– Spinning disk

• 7,200 rpm SATA: 75-100 IOPS

– SSD

• 9,000-120,000 IOPS

– EBS

• 100 IOPS

– Provisioned EBS

• 2,000 IOPS

• Work out how much data you need to write per time frame.

• MongoDB writes to a journal, and data files are flushed to disk periodically.

• Replication adds oplog considerations
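The "how much data per time frame" question can be sketched as arithmetic. This is a rough sketch under stated assumptions: the write amplification factor of 3 (data files, journal, oplog) is a placeholder to be replaced by measurement, and the workload figures are hypothetical. The per-device IOPS figures in the usage lines are the ones listed above.

```python
import math


def write_bytes_per_second(docs_per_second: float, avg_doc_bytes: float,
                           write_amplification: float = 3.0) -> float:
    """Estimated bytes hitting disk per second.

    write_amplification is an assumed multiplier covering data files,
    journal and oplog; measure it on a real workload before relying on it.
    """
    return docs_per_second * avg_doc_bytes * write_amplification


def devices_needed(required_iops: float, iops_per_device: float) -> int:
    """Devices required to sustain required_iops at a given per-device rate."""
    return math.ceil(required_iops / iops_per_device)


if __name__ == "__main__":
    print(write_bytes_per_second(1_000, 2_048))         # hypothetical workload
    print(devices_needed(1_500, iops_per_device=100))   # standard EBS figure
    print(devices_needed(1_500, iops_per_device=2_000)) # provisioned EBS figure
```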

Using this information

• Plan hardware to hold the working set + indexes

• Allow room to grow

• If working set is larger than RAM and you can’t reasonably add more resources, then shard

– Don’t shard too early

– Lots of little instances vs. a few big instances

• Think about architecture

– Local disk or central storage

– Don’t be surprised with x copies of data with x number of
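The sizing rule above (hold the working set plus indexes in RAM, leave room to grow, shard only when one machine cannot reasonably hold it) can be expressed as a small calculation. This is a minimal sketch: the growth factor and the usable fraction of RAM are assumed placeholders, and `shards_needed` is an illustrative helper, not a MongoDB tool.

```python
import math


def shards_needed(working_set_gb: float, index_gb: float,
                  ram_per_node_gb: float, growth_factor: float = 1.5,
                  usable_ram_fraction: float = 0.8) -> int:
    """Shards required to keep working set plus indexes in RAM.

    growth_factor and usable_ram_fraction are assumptions: headroom for
    growth, and the share of RAM left after the OS and connections.
    """
    required_gb = (working_set_gb + index_gb) * growth_factor
    usable_gb = ram_per_node_gb * usable_ram_fraction
    return max(1, math.ceil(required_gb / usable_gb))


if __name__ == "__main__":
    # Hypothetical deployments on 64 GB machines.
    print(shards_needed(10, 5, ram_per_node_gb=64))
    print(shards_needed(40, 10, ram_per_node_gb=64))
```

A result of 1 means a single replica set is enough for now, which fits the advice not to shard too early.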

Development to production

• Don’t be surprised by:

– More data = more/larger indexes

– Indexes make your working set bigger

• Replication adds a network overhead

• Journal has different access patterns

What tools are there to help me?

IOStat

MongoStat

MongoPerf

• Measure amount of data written to device per second
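As a stand-in for what iostat reports, the write rate of a device can be derived from two samples of a Linux `/proc/diskstats` line. This is a minimal sketch, assuming Linux, where the tenth whitespace-separated field is the count of sectors written and a sector is 512 bytes; the sample lines are made up for illustration.

```python
SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def sectors_written(diskstats_line: str) -> int:
    """Extract the 'sectors written' counter from one /proc/diskstats line."""
    fields = diskstats_line.split()
    # fields: major, minor, name, reads, reads merged, sectors read,
    # ms reading, writes, writes merged, sectors written, ...
    return int(fields[9])


def write_rate(before: str, after: str, interval_seconds: float) -> float:
    """Bytes written per second between two samples of the same device."""
    delta = sectors_written(after) - sectors_written(before)
    return delta * SECTOR_BYTES / interval_seconds


if __name__ == "__main__":
    # Hypothetical samples taken 10 seconds apart.
    t0 = "   8       0 sda 1000 10 80000 500 2000 20 160000 900 0 700 1400"
    t1 = "   8       0 sda 1100 10 88000 550 2600 25 180480 1000 0 760 1550"
    print(write_rate(t0, t1, interval_seconds=10))
```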

MongoDB Management Service

• Free cloud-based or on-premises management tool

– Monitoring

– Automation

– Backup

Scaling for capacity – MMS automation

Capacity Planning: When

• When?

– Before it's too late!

– Iterative process

Timeline: Start, Launch, Version 2

Repeat (continuously)

• Repeat Testing

• Repeat Evaluations

• Repeat Deployment

What is failure?

• We have failed at Capacity Planning when our resources don’t meet our requirements

• Because our requirements can have many dimensions, we may exceed our requirements in one characteristic but not meet them in another

• This means that we can spend many $$$ and still fail

Models

• Load/Users

– Response Time/TTFB

• System Performance

– Peak Usage

– Min Usage

Starter Questions

• What is the working set?

– How does that equate to memory

– How much disk access will that require

• How efficient are the queries?

• What is the rate of data change?

• How big are the highs and lows?

Questions?

Thank You
