Reduce MongoDB Data Size
Steven Wang, Tangome Inc., [email protected]

Reduce MongoDB Data Size - Percona





Outline

• MongoDB Cluster Architecture

• Advantages to Reduce Data Size

• Several Cases To Reduce MongoDB Data Size

• Case 1: Migrate To WiredTiger (High Compression Ratio)

• Case 2: Set TTL Index To Expire Data In Collections (With a Timestamp Field)

• Case 3: Purge Data Based On the Hidden Timestamp In the "_id" Field (No Timestamp Field)

• Case 4: Use a Replica Set To Purge Data And Rebuild Mongo (For Large Quantities Of Data)

• Reclaim Disk Space

• Summary


MongoDB Cluster Architecture

Each Shard has one Primary and two Secondaries


Advantages To Reduce Data Size

• More data can be stored in memory (speeds up queries)

• Smaller index size (speeds up queries)

• Less hard drive (SSD) usage (reduces cost)


Case 1: Migrate To WiredTiger

• WiredTiger uses document-level concurrency control for write operations (better write performance than MMAPv1).

• WiredTiger supports compression for all collections and indexes (compression minimizes storage use at the expense of additional CPU).


WiredTiger VS MMAPv1


Things to Consider Before Migration

• Always test the migration procedure in a test environment before the production migration
• Check replica member priority: rs.conf()
• Check the MongoDB read preference mode (from application configuration):
  • primary
  • primaryPreferred (preferred during the migration)
  • secondary
  • secondaryPreferred (for read load performance)
  • nearest
• Check chunk balancer status (set it to off)
• Check DB size and disk free space
• Check memory size (use it to configure the wiredTigerCacheSizeGB setting)
• Check CPU usage (WiredTiger will use more CPU for compression/decompression)
• Use monitoring tools: MMS & New Relic (monitor MongoDB response time and error rate)
• Tail mongod logs during the migration
• Collaborations: managers/network engineers/system engineers/software engineers
• Monitor the whole cluster after migration
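For the memory-sizing item in the checklist, later MongoDB releases (3.2+) document the default WiredTiger internal cache as the larger of 50% of (RAM − 1 GB) and 256 MB; the deck targets 3.0.3, where the default differed, so treat this as a planning sketch rather than the exact rule for that version:

```python
def default_wiredtiger_cache_gb(total_ram_gb: float) -> float:
    """WiredTiger internal cache default documented for MongoDB 3.2+:
    the larger of 50% of (RAM - 1 GB) and 256 MB (0.25 GB)."""
    return max(0.5 * (total_ram_gb - 1.0), 0.25)

# Quick capacity-planning table for a few host sizes
for ram_gb in (4, 16, 64):
    print(f"{ram_gb} GB RAM -> {default_wiredtiger_cache_gb(ram_gb):.2f} GB cache")
```

If you set wiredTigerCacheSizeGB explicitly, leave headroom for the filesystem cache, which WiredTiger also relies on for compressed blocks.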


WiredTiger Migration Procedure

1) Change the Mongo configuration files on the Puppet master

2) Stop the Mongo balancer from one mongos server

3) Upgrade all mongos servers using Puppet agent

4) Upgrade all Mongo config servers using Puppet agent

5) Upgrade all mongod secondaries using Puppet agent

6) Upgrade all mongod primaries using Puppet agent

7) Re-enable the balancer


Compression Ratio After Migration (WiredTiger snappy compressor)

Cluster Name | Version Before | Version After    | Size Before (GB) | Size After (GB) | Compression Ratio
Cluster_1    | 2.8 MMAPv1     | 3.0.3 WiredTiger | 1350             | 119             | 11.34
Cluster_2    | 2.8 MMAPv1     | 3.0.3 WiredTiger | 1680             | 270             | 6.22
Cluster_3    | 2.8 MMAPv1     | 3.0.3 WiredTiger | 309              | 22.6            | 13.67
Cluster_4    | 2.8 MMAPv1     | 3.0.3 WiredTiger | 132              | 13.2            | 10.00
Cluster_5    | 2.8 MMAPv1     | 3.0.3 WiredTiger | 234              | 32              | 7.31
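The ratio column is simply size-before divided by size-after; a small Python check, with the figures copied from the table above:

```python
# (cluster, size in GB before migration, size in GB after), from the table above
clusters = [
    ("Cluster_1", 1350, 119),
    ("Cluster_2", 1680, 270),
    ("Cluster_3", 309, 22.6),
    ("Cluster_4", 132, 13.2),
    ("Cluster_5", 234, 32),
]

# Recompute the compression ratio column, rounded to two decimals
ratios = {name: round(before / after, 2) for name, before, after in clusters}
for name, ratio in ratios.items():
    print(f"{name}: compression ratio = {ratio:.2f}")
```

Ratios vary widely (6x to 13x here) because snappy's effectiveness depends on how repetitive each cluster's documents are.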


Case 2: Set TTL Index On a Timestamp Field

Scenario: What if your company just wants to keep 90 days' data?

If your collection has a field that holds values of BSON date type, or an array of BSON date-typed objects, then you can set a TTL index to expire the data.

Examples:

1. Expire documents after a specified number of seconds
• Create TTL index (expire in 90 days):
db.log_events.createIndex( { "createdAt": 1 }, { expireAfterSeconds: 7776000 } )
• Insert data:
db.log_events.insert( { "createdAt": new Date(), "logEvent": 2, "logMessage": "Success!" } )

2. Expire documents at a specific clock time
• Create TTL index:
db.log_events.createIndex( { "expireAt": 1 }, { expireAfterSeconds: 0 } )
• Insert data to be expired on July 22, 2017 14:00:00:
db.log_events.insert( { "expireAt": new Date('July 22, 2017 14:00:00'), "logEvent": 2, "logMessage": "Success!" } )
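The 7776000 used for the 90-day index is simply 90 days expressed in seconds; a one-line sanity check in Python (standard library only):

```python
from datetime import timedelta

# expireAfterSeconds for a 90-day retention window
ttl_seconds = int(timedelta(days=90).total_seconds())
print(ttl_seconds)  # 7776000
```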


TTL Index Notes

• The background task that removes expired documents runs every 60 seconds.

• On replica set members, the TTL background thread only deletes documents when a member is in state primary; it is idle when a member is in state secondary. Secondary members replicate deletion operations from the primary.

• If a collection is large, it will take a long time to create an index. It is better to purge data first and then create the index on the smaller collection, or to create the TTL index when you create the collection.

• Restrictions:
  • TTL indexes are single-field indexes. Compound indexes do not support TTL and ignore the expireAfterSeconds option.
  • The _id field does not support TTL indexes.
  • You cannot use createIndex() to change the value of expireAfterSeconds of an existing index. Instead, use the collMod database command in conjunction with the index collection flag. Otherwise, to change the value of the option, you must drop the index first and recreate it.
  • If a non-TTL single-field index already exists for a field, you cannot create a TTL index on the same field, since you cannot create indexes that have the same key specification and differ only by their options. To change a non-TTL single-field index to a TTL index, you must drop the index first and recreate it with the expireAfterSeconds option.


Case 3: Purge Data Using the _id Field

Scenario: What if your company just wants to keep 90 days' data?

If your collection doesn't have a timestamp field, you can still purge data using the hidden timestamp in the _id field.

An ObjectId _id consists of:
• a 4-byte value representing the seconds since the Unix epoch
• a 3-byte machine identifier
• a 2-byte process id
• a 3-byte counter, starting with a random value
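Because the leading 4 bytes are a big-endian Unix timestamp, the boundary ObjectId for any cutoff date can be derived with a few lines of standard-library Python (no MongoDB driver required); the helper name here is illustrative:

```python
from datetime import datetime, timezone

def boundary_object_id(cutoff: datetime) -> str:
    """Build the smallest ObjectId hex string for a given cutoff time.

    The first 4 bytes of an ObjectId are the big-endian Unix timestamp;
    padding the remaining 8 bytes with zeros yields the lowest possible
    ObjectId that could have been generated at or after that instant.
    """
    seconds = int(cutoff.timestamp())
    return format(seconds, "08x") + "00" * 8

cutoff = datetime(2017, 1, 25, 8, 0, 0, tzinfo=timezone.utc)
print(boundary_object_id(cutoff))  # 58885b000000000000000000
```

Any document whose _id is less than ObjectId("58885b000000000000000000") was therefore created before 2017-01-25T08:00:00Z, which is exactly the boundary used in the sample purge script.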


Sample Script for Purge

[root@server001 ~]$ cat purge_sample_log.js
// purge data older than "2017-01-25T08:00:00.000Z"
use SampleLog
var removeIdsArray = db.sample_log.find(
    { _id: { $lt: ObjectId("58885b000000000000000000") } },
    { _id: 1 }
).limit(3000).toArray().map(function(doc) { return doc._id; });
db.sample_log.remove({ _id: { $in: removeIdsArray } });

[root@server001 ~]$ cat loop_purge.sh
#!/bin/bash
i=1
while [ $i -le 50000 ]; do
    /usr/bin/mongo localhost:27517 < /root/purge_sample_log.js
    i=$(( $i + 1 ))
    echo $i
    # sleep 1
done


Case 4: Purge Large-Volume Collections

Question: If a Mongo cluster stores 5 years' data and you want to keep only the latest 3 months' data, it may take a very long time (potentially years) to purge it using the batch method. How can it be purged as quickly as possible?

Solution (taking SampleLog as an example):
Step 1: Take the 1st secondary (B) out of its replica set as a standalone server
Step 2: Disable the replication properties, then start up Mongo on B
Step 3: Create a new collection, SampleLog_New
Step 4: Select the latest 3 months' data from SampleLog and insert it into SampleLog_New
Step 5: Rename SampleLog to SampleLog_Old
Step 6: Rename SampleLog_New to SampleLog
Step 7: Stop Mongo on B, then re-enable replication
Step 8: Add B back into the replica set. After the system is stable, remove SampleLog_Old from B
Step 9: On the 2nd secondary (C), repeat steps 1 to 8
Step 10: On primary A, fail over the primary role to one of the secondaries. Shut down mongod, wipe out the data in the data directory, and start mongod. A will replicate its data from secondary B or C
Step 11: Run steps 1 to 10 on the other shards



Reclaim Deleted Space After Purging

Note: MongoDB won't release unused disk space on its own. If there are lots of deletes, you need to periodically compact data to reclaim disk space. You can run db.stats() to check it.

Three ways to reclaim disk space:

• Compact Collections
  • Works at the collection level: run it on each collection
  • Places a block on all other operations at the database level, so you have to plan for some downtime
  • Command: db.runCommand({ compact: 'collectionName' })

• Repair Databases
  • Works at the database level
  • Checks and repairs errors and inconsistencies
  • Blocks all other operations on your database, so you need to schedule a downtime
  • Needs free space equivalent to the data in your database, plus an additional 2 GB or more
  • Command: mongod --repair --repairpath /mnt/vol1, or db.repairDatabase(), or db.runCommand({ repairDatabase: 1 })

• Re-sync Instance
  • Works at the instance level
  • In a replica set, unused disk space can be released by running an initial sync
  • Steps:
    • On each secondary: stop the mongod instance; delete the data in the data directory; start the mongod instance; wait for replication to rebuild all data
    • On the primary: fail over; stop the mongod instance; delete the data in the data directory; start the mongod instance; wait for replication to rebuild all data


Summary

• Use the WiredTiger storage engine if possible

• Purge data by TTL index if there is a timestamp field

• Purge data by the "_id" field if there is not

• Use a replica set for purging large volumes of data

• Always remember to reclaim deleted space after purging


Thank You!