MongoDB Replication Fundamentals
Desert Code Camp – Oct 2014
By Avinash Ramineni
Agenda
• Introduction to MongoDB
• MongoDB Replication
• Understanding Oplog
• Stream data from Oplog
• Demo
• Gotchas
• Questions
Why use a NoSQL Database?
• NoSQL describes a horizontally scalable, non-relational database with built-in replication support
• One Size does not Fit All
  – RDBMS
• Horizontal or Vertical Scalability?
  – Key-Value stores
  – Column
  – Document and Graph
• High Availability and Scalability
• CAP Theorem
  – Choose any two from (Consistency, Availability, Partition Tolerance)
    • Availability and Partition Tolerance
Why use a NoSQL Database? -2
• NoSQL’s primary goal is to achieve horizontal scalability. It attains this by reducing transactional semantics and referential integrity.
MongoDB -1
• Document Oriented Database
  – Bridges the gap between RDBMS and Key-Value Stores
  – Atomicity
  – Indexing
  – Sharding - horizontal Scalability
• BSON format
  – Binary encoded JSON representation
• No Joins
• Complex Queries / Indices
• Row Level Locking
MongoDB -2
• MongoDB Cluster
  – Master - Slave
    • Slave can become Master in case of fail-over
    • Only Master is allowed to commit changes to Store
  – Master – Master in limited capacity
    • Inserts/Queries/Deletions are done by Id
    • Does not work if the use case expects that the same object can be updated concurrently
  – ReplicaSets
Replication
• Why Replication?
  – Failover Scenarios
    • Hot Backups
  – Disaster Recovery
• Provides Redundancy and Increases Data Availability
• Increases Read Capacity
• Different uses of data
    • Normal processing
    • DR / Backup
    • Reporting
MongoDB Terminology
• Database
  – Collection (RDBMS – table)
  – Document (RDBMS – row)
• Cluster Node Types
  – Primary
  – Secondary
  – Arbiter
  – Hidden
MongoDB Replication
Replicasets
• Primary
  – Primary accepts all write operations
  – Only one Primary
  – Strict Consistency for reads
  – Logs all the changes in data to the “oplog”
• Secondary
  – Replicates by reading the Primary’s “oplog”
  – Reads might return stale data
  – Can become primary
Cluster
Primary Election
Read Preference
• Routes Read operations to Replica set Members
• Increases Read throughput
• Reduces Latency
• Secondary reads might be stale
• Modes
  – Primary
  – Primary Preferred (secondary if primary unavailable)
  – Secondary
  – Secondary Preferred
  – Nearest (read from member with least network latency)
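The five modes above can be sketched as a small member-selection function. This is only an illustration in plain JavaScript, not driver code: the member list, host names and ping values are made up, `pickMember` and `nearest` are hypothetical helpers, and real drivers apply extra rules (for example choosing among several members within a latency window) that are left out here.

```javascript
// Hypothetical replica set snapshot; hosts and latencies are invented.
const members = [
  { host: "db1", state: "PRIMARY", pingMs: 12 },
  { host: "db2", state: "SECONDARY", pingMs: 3 },
  { host: "db3", state: "SECONDARY", pingMs: 25 },
];

// Return the candidate with the lowest ping, or null if there is none.
function nearest(candidates) {
  return candidates.reduce(
    (best, m) => (best === null || m.pingMs < best.pingMs ? m : best),
    null
  );
}

// Simplified routing for the five read preference modes.
function pickMember(members, mode) {
  const primary = members.filter((m) => m.state === "PRIMARY");
  const secondaries = members.filter((m) => m.state === "SECONDARY");
  switch (mode) {
    case "primary":
      return nearest(primary);
    case "primaryPreferred":
      return nearest(primary) || nearest(secondaries);
    case "secondary":
      return nearest(secondaries);
    case "secondaryPreferred":
      return nearest(secondaries) || nearest(primary);
    case "nearest":
      return nearest(members);
    default:
      throw new Error("unknown read preference: " + mode);
  }
}
```

Calling `pickMember(members, "secondaryPreferred")` on this snapshot returns a secondary, which is where a stale read could come from.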
Write Preference
• Write only on Primary (Default)
• Write to N replica set members
db.products.insert(
  { item: "envelopes", qty: 100, type: "Clasp" },
  { writeConcern: { w: 2, wtimeout: 5000 } }
)
WriteConcern: Unacknowledged
WriteConcern: Acknowledged
WriteConcern: Journaled
WriteConcern w:2
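The write concern levels named on these slides can be summarised with a rough decision function. This is a sketch under assumptions, not the server's logic: `writeOutcome` and the `status` fields (`appliedOn`, `journaledOnPrimary`, `elapsedMs`) are hypothetical names introduced for illustration, and `w` is assumed to be a number.

```javascript
// Decide what the client would observe for a write, given the requested
// concern and a (hypothetical) snapshot of how far the write has progressed.
function writeOutcome(concern, status) {
  const w = concern.w === undefined ? 1 : concern.w;

  // w: 0 -> the client does not wait for any acknowledgement.
  if (w === 0) return "unacknowledged";

  // appliedOn counts members holding the write, including the primary.
  const enoughMembers = status.appliedOn >= w;
  // j: true additionally waits for the primary's journal.
  const journalOk = !concern.j || status.journaledOnPrimary;
  if (enoughMembers && journalOk) return "acknowledged";

  // wtimeout bounds how long the client waits; it does not undo the write.
  if (concern.wtimeout > 0 && status.elapsedMs >= concern.wtimeout) {
    return "timeout";
  }
  return "waiting";
}
```

With `{ w: 2, wtimeout: 5000 }` as in the insert above, the call stays in "waiting" until one secondary has the write, or reports a timeout once 5000 ms have elapsed.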
Stream data from MongoDB
Oplog (Operation Log)
• Similar to Oracle Redo log
  – Rolling record of all operations that modify the data
  – All writes (insert/update/delete) get an entry in the Oplog
• Replicaset members have an oplog collection
  – local.oplog.rs
  – Oplog is yet another collection in the database
Oplog in Action - Demo
Dissecting Oplog
Dissecting Oplog ..
• Oplog Contents
  – ts: the time this operation occurred.
  – h: a unique ID for this operation. Each operation will have a different value in this field.
  – op: the write operation that should be applied to the slave
  – ns: the database and collection affected by this operation.
  – o: the actual document representing the operation
  – v: Version of the oplog.
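To make the field list concrete, here is what an insert entry might look like, written as a plain JavaScript object. All values are invented for illustration; in a real oplog `ts` is a BSON Timestamp and `h` a 64-bit number, which are modelled here as a `{ t, i }` pair and a string. `splitNamespace` is a hypothetical helper.

```javascript
// Illustrative oplog entry for an insert; every value here is made up.
const entry = {
  ts: { t: 1412985600, i: 1 }, // seconds since epoch + counter within that second
  h: "-5241037584963118450",   // unique id of this operation
  v: 2,                        // oplog version
  op: "i",                     // operation type (insert)
  ns: "test.products",         // database.collection
  o: { _id: 1, item: "envelopes", qty: 100, type: "Clasp" },
};

// Split "db.collection" at the first dot only, because collection
// names may themselves contain dots (e.g. "local.oplog.rs").
function splitNamespace(ns) {
  const dot = ns.indexOf(".");
  return { db: ns.slice(0, dot), collection: ns.slice(dot + 1) };
}
```

`splitNamespace(entry.ns)` yields the database `test` and the collection `products`.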
Oplog - op
• Op – Operation
  – i inserts
  – u updates
  – d deletes
  – n no-op
• Updates have an extra field
  – o2
    • o has the update information
    • o2 has the id that was updated
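A secondary (or any consumer of the oplog) replays entries by switching on `op`. The sketch below applies entries to an in-memory `Map` keyed by `_id`. It is a simplification under assumptions: `applyEntry` is a hypothetical helper, and for updates only `$set` and whole-document replacement are handled.

```javascript
// Apply a single oplog-style entry to an in-memory store (Map of _id -> doc).
function applyEntry(store, entry) {
  switch (entry.op) {
    case "i": // insert: o is the full document
      store.set(entry.o._id, { ...entry.o });
      break;
    case "u": { // update: o2 identifies the document, o describes the change
      const id = entry.o2._id;
      const current = store.get(id) || { _id: id };
      if (entry.o.$set) {
        store.set(id, { ...current, ...entry.o.$set });
      } else {
        store.set(id, { ...entry.o, _id: id }); // whole-document replacement
      }
      break;
    }
    case "d": // delete: o carries the _id of the removed document
      store.delete(entry.o._id);
      break;
    case "n": // no-op: nothing to apply
      break;
    default:
      throw new Error("unknown op: " + entry.op);
  }
}
```

Replaying an `i` entry followed by a `u` entry with `o: { $set: { qty: 90 } }` and `o2: { _id: 1 }` leaves the stored document with the new quantity and its other fields intact.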
Triggers?
• Does MongoDB have triggers?
  – Tailable cursors
    • tail -f oplog
• Notice any issues with oplog
  – Aren't we doubling the size of the database?
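The "tail -f" idea behind tailable cursors can be imitated without a database: remember the timestamp of the last entry seen and, on each poll, return only entries newer than it. The sketch below runs against a plain array standing in for `local.oplog.rs`; `createTailer` and `compareTs` are hypothetical names, timestamps are modelled as `{ t, i }` pairs, and the array is assumed to be ordered by `ts`.

```javascript
// Order two { t, i } timestamps: first by seconds, then by counter.
function compareTs(a, b) {
  return a.t - b.t || a.i - b.i;
}

// Create a poller that returns entries newer than the last one it has seen.
function createTailer(oplog, startAfter) {
  let lastTs = startAfter;
  return {
    poll() {
      const fresh = oplog.filter((e) => compareTs(e.ts, lastTs) > 0);
      if (fresh.length > 0) {
        lastTs = fresh[fresh.length - 1].ts; // advance the resume position
      }
      return fresh;
    },
  };
}
```

A consumer that calls `poll()` in a loop and reacts to each returned entry gets trigger-like behaviour; a real tailable cursor avoids the repeated polling by blocking on the server until new entries arrive.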
Oplog ..
• Capped Collection (fixed Size collection)
  – Circular Queue
  – Default Oplog size depends on the OS
  – Oldest entries get overwritten
• What if the slave node falls so far behind that the oplog entries it needs have been overwritten?
  – Full Resync
• copyDatabase starts streaming from oplog
  – What if the oplog rolls over while the slaves are completing the copy?
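The circular-queue behaviour and the full-resync case can be sketched in a few lines. This is a toy model under assumptions: `CappedLog` and `needsFullResync` are hypothetical names, the cap is a number of entries (a real capped collection is limited by size in bytes), and timestamps are plain integers.

```javascript
// Fixed-capacity log: once full, each append drops the oldest entry.
class CappedLog {
  constructor(capacity) {
    this.capacity = capacity;
    this.entries = [];
  }

  append(entry) {
    this.entries.push(entry);
    if (this.entries.length > this.capacity) {
      this.entries.shift(); // oldest entry is overwritten
    }
  }

  oldestTs() {
    return this.entries.length > 0 ? this.entries[0].ts : null;
  }
}

// A secondary can catch up only if the last entry it applied is still in
// the log; otherwise the operations in between are gone and it must resync.
function needsFullResync(log, secondaryLastTs) {
  const oldest = log.oldestTs();
  return oldest !== null && secondaryLastTs < oldest;
}
```

With a capacity of 3 and five appended entries, only the last three remain, so a secondary that stopped at the second entry can no longer catch up from the log alone.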
Non-Replicated Collections
• local database
  – Collections in local don’t get replicated
  – Changes to the collections in local database don’t show up in the oplog
Questions