Shift: Real World Migration from MongoDB to Cassandra

DESCRIPTION

Presentation on SHIFT's migration from MongoDB to Cassandra. Topics will include reasons behind choosing to move to Cassandra, zero downtime migration strategy, data modeling patterns, and the benefits of using CQL3.

Page 1: Shift: Real World Migration from MongoDB to Cassandra

SHIFT.com Migrating from MongoDB to Cassandra by: Blake Eggleston & Jon Haddad

Page 2: Shift: Real World Migration from MongoDB to Cassandra

What is SHIFT.com?

Shift is a platform that enables marketers to communicate across organizations and departments in one single place.

It’s also an open application platform with a set of applications built on top of it that can communicate with one another.

Page 3: Shift: Real World Migration from MongoDB to Cassandra

Initial Stack

●  Python
    ○  Flask
    ○  Celery
●  MongoDB
    ○  mongoengine
●  Neo4j / Titan
    ○  Bulbs
    ○  thunderdome
●  Redis
●  AWS
    ○  m1.xlarge for mongo

Page 4: Shift: Real World Migration from MongoDB to Cassandra

Current Stack

●  Python
    ○  still flask
    ○  still celery
    ○  gevent (it rocks)
●  Cassandra
    ○  1.2.6
    ○  cqlengine
●  ElasticSearch
●  Redis
    ○  jondis
●  AWS
    ○  m1.xlarge

Page 5: Shift: Real World Migration from MongoDB to Cassandra

Why did we move to Cassandra?

●  Operational Benefits
    ○  Adding and removing nodes is much easier than with Mongo’s shards
●  Control over our Data on Disk (LSMT, log-structured merge tree)
●  Love CQL3
●  Long term scalability
    ○  Scales Linearly
    ○  Multi DC Support Baked in

Page 6: Shift: Real World Migration from MongoDB to Cassandra

Migration Goals

●  Zero downtime
    ○  We wanted to roll out Cassandra without any service interruptions
●  No loss of performance
    ○  By carefully structuring our schema we were able to match MongoDB’s performance.
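One common way to meet a zero-downtime goal is to write to both stores for a while, backfill the history, and then switch reads over. These slides do not state that SHIFT used exactly this approach, so the sketch below is an assumption. `DualWriteRepository` and `backfill` are hypothetical names, and plain dicts stand in for MongoDB and Cassandra.

```python
class DualWriteRepository:
    """Hypothetical wrapper that writes to both stores during a migration."""

    def __init__(self, old_store, new_store, read_from_new=False):
        self.old_store = old_store
        self.new_store = new_store
        self.read_from_new = read_from_new  # flipped once the backfill is verified

    def save(self, key, document):
        # Every write goes to both stores so neither falls behind.
        self.old_store[key] = document
        self.new_store[key] = document

    def get(self, key):
        # Reads stay on the old store until the cutover flag is flipped.
        store = self.new_store if self.read_from_new else self.old_store
        return store.get(key)


def backfill(old_store, new_store):
    """Copy records written before dual writes began; never clobber newer data."""
    copied = 0
    for key, document in old_store.items():
        if key not in new_store:
            new_store[key] = document
            copied += 1
    return copied


# Plain dicts stand in for MongoDB and Cassandra in this sketch.
mongo, cassandra = {"u1": {"name": "old user"}}, {}
repo = DualWriteRepository(mongo, cassandra)
repo.save("u2", {"name": "new user"})  # phase 1: dual writes
copied = backfill(mongo, cassandra)    # phase 2: backfill history
repo.read_from_new = True              # phase 3: cut reads over
```

Because reads only move once both stores hold the same data, the cutover is a flag flip rather than a maintenance window, and it can be reversed the same way.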

Page 7: Shift: Real World Migration from MongoDB to Cassandra

Migration Strategy

Page 8: Shift: Real World Migration from MongoDB to Cassandra

Benefits of CQL3

●  Easy to understand if you’re coming from RDBMS
●  Collections
    ○  sets, lists, maps
●  Batch Queries
●  Clustering Keys
    ○  Handles ordering of logical rows
    ○  Saved us from a column name management scheme and allowed us to focus on our data

Page 9: Shift: Real World Migration from MongoDB to Cassandra

Physical vs Logical Row

Page 10: Shift: Real World Migration from MongoDB to Cassandra

Single Row

Page 11: Shift: Real World Migration from MongoDB to Cassandra

Clustered Row
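As a rough mental model of the physical vs. logical row distinction: in CQL3 the first primary key component (the partition key) selects a physical row, and the remaining clustering columns order the logical rows stored inside it. The toy `Table` class below is a hypothetical in-memory illustration of that layout, not how Cassandra is implemented.

```python
import bisect
from collections import defaultdict


class Table:
    """Toy model of a CQL3 table with PRIMARY KEY (partition_key, clustering_key)."""

    def __init__(self):
        # partition key -> sorted list of (clustering key, column values)
        self._partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, values):
        rows = self._partitions[partition_key]
        keys = [ck for ck, _ in rows]
        i = bisect.bisect_left(keys, clustering_key)
        if i < len(rows) and rows[i][0] == clustering_key:
            rows[i] = (clustering_key, values)        # same primary key: overwrite
        else:
            rows.insert(i, (clustering_key, values))  # keep logical rows sorted

    def select(self, partition_key):
        """Return the logical rows of one physical row, in clustering order."""
        return list(self._partitions.get(partition_key, []))


table = Table()
table.insert("user-1", 3, {"body": "third"})
table.insert("user-1", 1, {"body": "first"})
table.insert("user-2", 2, {"body": "other user"})
table.insert("user-1", 1, {"body": "first (edited)"})
rows = table.select("user-1")
```

Here `rows` holds two logical rows for `user-1`, already sorted by clustering key, while `user-2` lives in a separate physical row.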

Page 12: Shift: Real World Migration from MongoDB to Cassandra

Data Modelling Patterns

●  considerations: working with Mongo’s dbrefs and optimizing layout on disk

●  structured tables as materialized views of the queries we planned on using

●  moving multiple documents into a single physical row

●  creating supporting index tables for looking up logical rows
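The last three patterns can be sketched together: one table laid out for the main query, plus a supporting index table for finding a logical row when its partition key is not known. The table and function names are hypothetical, and dicts stand in for Cassandra tables.

```python
from collections import defaultdict

# Hypothetical "tables": plain dicts standing in for Cassandra tables.
messages_by_user = defaultdict(dict)  # user_id -> {message_id: row}, one physical row per user
message_owner_index = {}              # message_id -> user_id, supporting index table


def save_message(user_id, message_id, body):
    """Write the row into the query table and its lookup index together.

    In Cassandra these two writes would be natural candidates for a batch.
    """
    messages_by_user[user_id][message_id] = {"message_id": message_id, "body": body}
    message_owner_index[message_id] = user_id


def messages_for_user(user_id):
    """Query 1: all messages for a user, served from a single physical row."""
    partition = messages_by_user.get(user_id, {})
    return [partition[mid] for mid in sorted(partition)]


def message_by_id(message_id):
    """Query 2: find one logical row without knowing its partition key."""
    user_id = message_owner_index.get(message_id)
    if user_id is None:
        return None
    return messages_by_user[user_id][message_id]


save_message("u1", 2, "second")
save_message("u1", 1, "first")
bodies = [row["body"] for row in messages_for_user("u1")]
found = message_by_id(2)
```

The cost of this layout is that every write touches more than one table; the benefit is that each read is a single key lookup.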

Page 13: Shift: Real World Migration from MongoDB to Cassandra

Time Series: Message Stream

●  Users have tens of thousands of messages
●  Each user’s message stream is specific to them, like a Twitter feed
●  This is Cassandra’s strength: time series
●  Considered Redis, but it is a poor fit for multi-DC

create table news_feed (
    user_id uuid,
    message_id timeuuid,
    message text,  -- column type is missing on the slide; text is assumed
    primary key (user_id, message_id)
);
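The `timeuuid` clustering key is what makes this a time series: a timeuuid is a version 1 UUID with an embedded timestamp, so rows in a user's partition sort by creation time. The sketch below uses only the Python standard library to show that ordering. `timeuuid_from_datetime` and `latest_messages` are hypothetical helpers, the clock sequence and node are fixed at zero for repeatability, and a dict stands in for one `user_id` partition.

```python
import uuid
from datetime import datetime, timedelta

# Version 1 UUIDs count 100-nanosecond intervals since the Gregorian reform date.
GREGORIAN_EPOCH = datetime(1582, 10, 15)


def timeuuid_from_datetime(dt, clock_seq=0, node=0):
    """Hypothetical helper: build a version 1 UUID for a naive UTC datetime."""
    delta = dt - GREGORIAN_EPOCH
    ticks = (delta.days * 86400 + delta.seconds) * 10**7 + delta.microseconds * 10
    time_low = ticks & 0xFFFFFFFF
    time_mid = (ticks >> 32) & 0xFFFF
    time_hi_version = ((ticks >> 48) & 0x0FFF) | 0x1000      # version bits = 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80  # RFC 4122 variant
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


def latest_messages(feed, limit):
    """Return the newest `limit` (message_id, message) pairs from one user's feed."""
    ordered = sorted(feed.items(), key=lambda item: item[0].time, reverse=True)
    return ordered[:limit]


start = datetime(2013, 7, 1, 12, 0, 0)
feed = {}  # stands in for one user_id partition of news_feed
for minute, text in enumerate(["first", "second", "third"]):
    feed[timeuuid_from_datetime(start + timedelta(minutes=minute))] = text

newest = [text for _, text in latest_messages(feed, 2)]
```

In Cassandra the sort happens on disk rather than at read time, so fetching the newest messages for a user is a slice of one physical row.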

Page 14: Shift: Real World Migration from MongoDB to Cassandra

cqlengine

●  cqlengine.org
●  the Python CQL3 object-row mapper
●  exposes CQL3 tables as Python classes
●  maps columns to properties
●  builds CQL queries

from cqlengine import columns
from cqlengine.models import Model

# model definition
class ExampleModel(Model):
    example_id = columns.UUID(primary_key=True)
    example_type = columns.Integer(index=True)
    created_at = columns.DateTime()
    description = columns.Text(required=False)

# example query
ExampleModel.objects(example_type=1)
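To make "maps columns to properties" and "builds CQL queries" concrete, here is a toy object-row mapper in plain Python. It is not cqlengine's implementation or API; `Column`, `ToyModel`, `create_table_cql`, and `select_cql` are hypothetical names chosen for illustration.

```python
class Column:
    """Hypothetical column descriptor; only records schema metadata."""

    def __init__(self, cql_type, primary_key=False):
        self.cql_type = cql_type
        self.primary_key = primary_key


class ToyModel:
    """Hypothetical base class: one subclass per table."""

    @classmethod
    def columns(cls):
        # Class attributes that are Column instances, in definition order.
        return {name: col for name, col in vars(cls).items() if isinstance(col, Column)}

    @classmethod
    def create_table_cql(cls):
        cols = cls.columns()
        defs = ", ".join(f"{name} {col.cql_type}" for name, col in cols.items())
        keys = ", ".join(name for name, col in cols.items() if col.primary_key)
        return f"CREATE TABLE {cls.__name__.lower()} ({defs}, PRIMARY KEY ({keys}))"

    @classmethod
    def select_cql(cls, **filters):
        unknown = set(filters) - set(cls.columns())
        if unknown:
            raise ValueError(f"unknown columns: {sorted(unknown)}")
        query = f"SELECT * FROM {cls.__name__.lower()}"
        if filters:
            query += " WHERE " + " AND ".join(f"{name} = %s" for name in filters)
        # Values are returned separately so a driver can bind them safely.
        return query, list(filters.values())


class Example(ToyModel):
    example_id = Column("uuid", primary_key=True)
    example_type = Column("int")
    description = Column("text")


create_stmt = Example.create_table_cql()
query, params = Example.select_cql(example_type=1)
```

The class body doubles as the schema definition, which is the main convenience a mapper of this kind offers: the table layout and the query code cannot drift apart.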

Page 15: Shift: Real World Migration from MongoDB to Cassandra

Improvements from moving to C*

●  Operationally we’ve had zero problems
●  Outstanding Performance
●  Easy to build new features
●  Community has been amazing (mailing list and #cassandra)

Page 16: Shift: Real World Migration from MongoDB to Cassandra

misc tips

●  leveled compaction - good for read heavy workloads

●  use secondary indexes sparingly, understand how they work and when to use them

●  to reiterate, think about how you’re going to query your data

●  use ElasticSearch / Solr for ad hoc queries

Page 17: Shift: Real World Migration from MongoDB to Cassandra

Contact Info

Jon Haddad @rustyrazorblade

Blake Eggleston @blakeeggleston

….we’re hiring!