Creating social features at BranchOut using MongoDB

DESCRIPTION

Slides from the MongoDB MeetUp "IRC Bots and Activity Feeds with MongoDB - At BranchOut", presented by the San Francisco MongoDB User Group and 10gen: http://www.meetup.com/San-Francisco-MongoDB-User-Group/events/95713262/ Over the past year, we've used MongoDB to power more and more of BranchOut's functionality, including some cool social features such as a Facebook-like activity feed. In this talk, I discuss the design decisions that went into developing these features and outline how Mongo is used under the hood. I cover not only what makes Mongo a good technology choice but also a few things about Mongo that need to be worked around. If you have any questions regarding these slides, feel free to reach out to me on Twitter: @nate510. Thanks!

Building Social Features with MongoDB

Nathan Smith

BranchOut.com

Jan. 22, 2013

Tuesday, January 22, 13

BranchOut

A more social professional network

• Connect with your colleagues (follow)

• Activity feed of their professional activity

• Timeline of an individual’s posts

BranchOut

A more social professional network

• 30MM installed users

• 750MM total user records

• Average 300 connections per installed user

MongoDB @ BranchOut

• 100% MySQL until ~July 2012

• Much of our data fits well into a document model

• Our data design avoids RDBMS features

Follow System

Follow System

Business logic

• Limit of 2000 followees (people you follow)

• Unlimited followers

• Both lists reflect updates in near-real time

Follow System

Traditional RDBMS (e.g. MySQL)

follower_uid  followee_uid  follow_time
123           456           2013-01-22 15:43:00
456           123           2013-01-22 15:52:00

Advantage: Easy inserts, deletes

Disadvantage: Data locality, index size

Follow System

MongoDB (first pass)

followee: { _id: 123, uids: [456, 567, 678] }

Advantage: Compact data, read locality

Disadvantage: Can’t display a user’s followers

Follow System

Can’t display a user’s followers (easily)

followee: { _id: 123, uids: [456, 567, 678] }

db.follow.find({uids: 456}, {_id: 1});

...with multi-key index on uids

Expensive! Also, no guarantee of order.

Follow System

MongoDB (second pass)

followee: { _id: 1, uids: [2, 3] }
followee: { _id: 2, uids: [1, 3] }

follower: { _id: 1, uids: [2] }
follower: { _id: 2, uids: [1] }
follower: { _id: 3, uids: [1, 2] }

Advantages: Local data, fast selects

Disadvantages: Follower doc size
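
A minimal sketch of the double bookkeeping that the second pass implies: every follow is written once to the follower's followee document and once to the followee's follower document. This is not code from the talk. The two collections are simulated with in-memory Maps instead of a MongoDB connection, and the names followeeColl, followerColl, getOrCreate, follow and unfollow are hypothetical. The 2000-followee cap is the one stated in the business logic.

```javascript
// Stand-ins for the two MongoDB collections (assumption: plain Maps, no driver).
const followeeColl = new Map(); // _id -> { _id, uids }: the people this user follows
const followerColl = new Map(); // _id -> { _id, uids }: the people who follow this user

const MAX_FOLLOWEES = 2000; // limit from the business-logic slide

// Hypothetical helper: fetch a document, creating an empty one if it is missing.
function getOrCreate(coll, id) {
  if (!coll.has(id)) coll.set(id, { _id: id, uids: [] });
  return coll.get(id);
}

// Record "followerUid follows followeeUid" in both collections.
function follow(followerUid, followeeUid) {
  const mine = getOrCreate(followeeColl, followerUid);
  if (mine.uids.includes(followeeUid)) return false; // already following
  if (mine.uids.length >= MAX_FOLLOWEES) {
    throw new Error("followee limit reached");
  }
  mine.uids.push(followeeUid);
  getOrCreate(followerColl, followeeUid).uids.push(followerUid);
  return true;
}

// Undo a follow in both collections.
function unfollow(followerUid, followeeUid) {
  const mine = getOrCreate(followeeColl, followerUid);
  const i = mine.uids.indexOf(followeeUid);
  if (i === -1) return false; // was not following
  mine.uids.splice(i, 1);
  const theirs = getOrCreate(followerColl, followeeUid);
  const j = theirs.uids.indexOf(followerUid);
  if (j !== -1) theirs.uids.splice(j, 1);
  return true;
}
```

Replaying the slide's example (1 follows 2 and 3; 2 follows 1 and 3) leaves follower document 3 with uids [1, 2]. Against a real database each push would presumably become an update with an operator such as $addToSet, and since the two writes touch two different documents, an implementation would need to tolerate one succeeding without the other.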

Follow System

Follower document size

• Max Mongo doc size: 16MB

• Number of people who follow our community manager: 30MM

• 30MM uids × 8 bytes/uid = 240MB

• Max followers per doc: ~2MM
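
The arithmetic above can be checked in a few lines. This is only a back-of-the-envelope sketch using the slide's own figures (16MB limit, 8 bytes per uid, 30MM followers); the constant names are made up for illustration. It ignores the per-element overhead that BSON adds to array entries, so the real ceiling per document would be lower than the number printed here.

```javascript
const MAX_DOC_BYTES = 16 * 1024 * 1024; // 16MB document limit
const BYTES_PER_UID = 8;                // 8 bytes per uid, as on the slide
const FOLLOWERS = 30e6;                 // 30MM followers

// Raw space needed to hold every follower uid in a single array.
const bytesNeeded = FOLLOWERS * BYTES_PER_UID;

// Upper bound on uids per document if the array were the only content.
const maxUidsPerDoc = Math.floor(MAX_DOC_BYTES / BYTES_PER_UID);

// Minimum number of documents needed to hold the whole follower list.
const docsNeeded = Math.ceil(FOLLOWERS / maxUidsPerDoc);

console.log(bytesNeeded / 1e6); // 240 (MB)
console.log(maxUidsPerDoc);     // 2097152, i.e. ~2MM
console.log(docsNeeded);        // 15
```

So a single follower document cannot hold the community manager's list; it has to be spread over at least 15 documents under these assumptions.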

Follow System

MongoDB (final pass)

followee: { _id: 1, uids: [2, 3] }
followee: { _id: 2, uids: [1, 3] }

follower: { _id: "1", uids: [2,3,4,...], count: 20001, next_page: 2 }
follower: { _id: "1_p2", uids: [23,24,25,...], count: 10000 }

follower: { _id: "1", uids: [2,3,4,...], count: 10001, next_page: 3 }
follower: { _id: "1_p2", uids: [23,24,25,...], count: 10000 }

Asynchronous thread manages follower documents
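
The slides do not spell out how the asynchronous thread pages follower documents, so the following is only one plausible reading of the two follower snapshots above. It assumes that new followers are appended to the head document, that next_page points from the head to the most recently created overflow page (forming a linked list), and that the thread moves a block of uids out once the head grows past a threshold. PAGE_SIZE and HEAD_LIMIT are inferred from the count values shown, all function names are hypothetical, and an in-memory Map stands in for the collection.

```javascript
// In-memory stand-in for the follower collection, keyed by _id (assumption).
const followerDocs = new Map();

const PAGE_SIZE = 10000;           // assumed from the count values on the slide
const HEAD_LIMIT = 2 * PAGE_SIZE;  // assumed threshold that triggers a split

// Synchronous path: append a new follower to the head document "<uid>".
function addFollower(uid, followerUid) {
  const id = String(uid);
  if (!followerDocs.has(id)) followerDocs.set(id, { _id: id, uids: [], count: 0 });
  const head = followerDocs.get(id);
  head.uids.push(followerUid);
  head.count = head.uids.length;
}

// What the asynchronous worker might do: move the oldest PAGE_SIZE uids
// from an oversized head into a new page "<uid>_p<n>" and relink the list.
function splitHeadIfNeeded(uid) {
  const head = followerDocs.get(String(uid));
  if (!head || head.count <= HEAD_LIMIT) return false;
  const pageNo = (head.next_page ?? 1) + 1; // first overflow page is p2
  const page = {
    _id: `${head._id}_p${pageNo}`,
    uids: head.uids.splice(0, PAGE_SIZE),
    count: PAGE_SIZE,
  };
  if (head.next_page !== undefined) page.next_page = head.next_page;
  followerDocs.set(page._id, page);
  head.next_page = pageNo;
  head.count = head.uids.length;
  return true;
}

// Walk the linked list of pages, yielding every follower uid.
function* allFollowers(uid) {
  let doc = followerDocs.get(String(uid));
  while (doc) {
    yield* doc.uids;
    doc = doc.next_page === undefined
      ? undefined
      : followerDocs.get(`${uid}_p${doc.next_page}`);
  }
}
```

Under these assumptions, a head with count 20001 and next_page 2 becomes a head with count 10001 and next_page 3 after one split, which matches the numbers in the two snapshots. Which uids get moved and how the thread is triggered are guesses.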

Activity Feed

Activity Feed

Push vs Pull architecture

Activity Feed

Business logic

• All connections and followees appear in your feed

• Reverse-chronological sort order (but should support other rankings)

• Support for an evolving set of feed event types

• Tagging creates multiple feed events for the same underlying object

• Feed events are not ephemeral: they persist for the Timeline

Activity Feed

Traditional RDBMS (e.g. MySQL)

activity_id  uid  event_time           type    oid1    oid2
1            123  2013-01-22 15:43:00  photo   123abc  789ghi
2            345  2013-01-22 15:52:00  status  456def  foobar

Advantage: Easy inserts

Disadvantages: Rigid schema adapts poorly to new activity types, doesn’t scale

Activity Feed

MongoDB

user_feed_card

ufc: {
  _id: 123, // UID
  total_events: 18,
  2013_01_total: 4,
  2012_12_total: 8,
  2012_11_total: 6,
  ...other counts...
}

user_feed_month

ufm: {
  _id: "123_2013_01",
  events: [
    {
      uid: 123,
      type: "photo_upload",
      content_id: "abcd9876",
      timestamp: 1358824502,
      ...more metadata...
    },
    ...more events...
  ]
}
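
To make the relationship between the two document types concrete, here is a small sketch of what recording one feed event might involve: appending the event to the user's month bucket and bumping the matching counters on the user's card. It is an illustration under assumptions, not BranchOut's code. Maps stand in for the two collections, the names recordEvent and monthKey are hypothetical, and the month is derived from the timestamp in UTC.

```javascript
const cards = new Map();  // user_feed_card stand-in, keyed by uid
const months = new Map(); // user_feed_month stand-in, keyed by "<uid>_<YYYY>_<MM>"

// Turn a Unix timestamp (seconds) into the "YYYY_MM" fragment used in ids.
function monthKey(tsSeconds) {
  const d = new Date(tsSeconds * 1000);
  const mm = String(d.getUTCMonth() + 1).padStart(2, "0");
  return `${d.getUTCFullYear()}_${mm}`;
}

// Append an event to its month bucket and update the counters on the card.
function recordEvent(event) {
  const key = monthKey(event.timestamp);
  const monthId = `${event.uid}_${key}`;

  if (!months.has(monthId)) months.set(monthId, { _id: monthId, events: [] });
  months.get(monthId).events.push(event);

  if (!cards.has(event.uid)) {
    cards.set(event.uid, { _id: event.uid, total_events: 0 });
  }
  const card = cards.get(event.uid);
  card.total_events += 1;
  card[`${key}_total`] = (card[`${key}_total`] || 0) + 1;

  return monthId;
}

// The event from the slide lands in the January 2013 bucket for uid 123.
console.log(recordEvent({
  uid: 123,
  type: "photo_upload",
  content_id: "abcd9876",
  timestamp: 1358824502,
})); // 123_2013_01
```

With a real database the two steps would presumably be two upserts, one using $push on the month document and one using $inc on the card; whether BranchOut does it that way is not stated in the slides.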

Activity Feed

Algorithm

1. Load user_feed_cards for all connections

2. Calculate which user_feed_months to load

3. Load user_feed_months

4. Aggregate events that refer to the same story

5. Sort (reverse chron)

6. Load content, comments, etc. and build stories
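
Steps 1 to 5 of the algorithm can be sketched roughly as follows. The slides do not say how step 2 picks months or what identifies a "story", so this sketch assumes the simplest choices: walk months from newest to oldest until the per-month counters on the cards add up to the number of events wanted, and group events by content_id. The function names, the wanted parameter and the Map-based stand-ins for the collections are all hypothetical. Step 6 (loading content and comments) is left out.

```javascript
// Step 2: pick the newest months whose counters cover `wanted` events.
function monthsToLoad(cards, wanted) {
  const months = new Set();
  for (const card of cards) {
    for (const key of Object.keys(card)) {
      const m = /^(\d{4}_\d{2})_total$/.exec(key);
      if (m) months.add(m[1]);
    }
  }
  const chosen = [];
  let seen = 0;
  // "YYYY_MM" strings sort chronologically, so reverse order is newest first.
  for (const month of [...months].sort().reverse()) {
    chosen.push(month);
    for (const card of cards) seen += card[`${month}_total`] || 0;
    if (seen >= wanted) break;
  }
  return chosen;
}

function buildFeed(connectionUids, cardsById, monthsById, wanted) {
  // Step 1: load the user_feed_card of every connection.
  const cards = connectionUids.map((uid) => cardsById.get(uid)).filter(Boolean);

  // Step 2: decide which user_feed_month documents are worth loading.
  const months = monthsToLoad(cards, wanted);

  // Step 3: load them, skipping months in which a user has no events.
  const events = [];
  for (const card of cards) {
    for (const month of months) {
      if (!card[`${month}_total`]) continue;
      const doc = monthsById.get(`${card._id}_${month}`);
      if (doc) events.push(...doc.events);
    }
  }

  // Step 4: aggregate events that refer to the same underlying object.
  const stories = new Map();
  for (const ev of events) {
    const story = stories.get(ev.content_id) ||
      { content_id: ev.content_id, events: [], latest: 0 };
    story.events.push(ev);
    story.latest = Math.max(story.latest, ev.timestamp);
    stories.set(ev.content_id, story);
  }

  // Step 5: sort stories newest first and keep the requested number.
  return [...stories.values()]
    .sort((a, b) => b.latest - a.latest)
    .slice(0, wanted);
}
```

The point of the card counters in this reading is that step 2 can be done without touching any month document: only buckets that are both recent enough and non-empty get loaded in step 3.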

Activity Feed

Performance

• Response times average under 500 ms (98th percentile under 1 sec)

• Design expected to scale well horizontally

• Need to continue to optimize

Building Social Features with MongoDB

Nathan Smith

BrO: http://branchout.com/nate

FB: http://facebook.com/neocortica

Twitter: @nate510

Email: nate@branchout.com

Aditya Agarwal on Facebook’s architecture: http://www.infoq.com/presentations/Facebook-Software-Stack

Dan McKinley on Etsy’s activity feed: http://www.slideshare.net/danmckinley/etsy-activity-feeds-architecture

Good Quora questions on activity feeds: http://www.quora.com/What-are-the-scaling-issues-to-keep-in-mind-while-developing-a-social-network-feed

http://www.quora.com/What-are-best-practices-for-building-something-like-a-News-Feed
