Couchbase Seoul Data Engineering Conference (SDEC) 2011

Page 1: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Page 2: Couchbase Seoul Data Engineering Conference (SDEC) 2011

USING COUCHBASE FOR SOCIAL GAME SCALING AND SPEED

Chiyoung Seo, Couchbase Inc.
Matt Ingenthron, Couchbase Inc.

Page 3: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Agenda

• Introduction
• What is Couchbase Server?
  – Simple, Fast, Elastic
  – Technology Overview (Architecture, data flow, rebalancing)
• Tribal Crossing Inc: Animal Party
  – Challenges before Couchbase
    • Original Architecture
  – Why Couchbase?
    • Simplicity
    • Performance
    • Flexibility
  – Deploying Couchbase
    • New Architecture
    • EC2
    • Data Model
    • Accessing data in Couchbase
• Product Roadmap
• Q&A

Page 4: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Couchbase Inc.

• Membase and CouchOne have merged to form Couchbase Inc. (headquartered in Silicon Valley)
• Team
  – Brings together the creators and core contributors of Memcached, Membase and CouchDB technologies
  – Doubles technical team size, accelerates roadmaps by over a year
• Products
  – Couchbase Server (formerly Membase)
  – Couchbase Single Server
  – Mobile Couchbase (iPhone and Android)
• Technology
  – Most mature, reliable and widely deployed NoSQL technologies
  – Fully featured, open source document datastore
  – First complete, end-to-end NoSQL database product

Page 5: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Modern Interactive Web Application Architecture

[Diagram labels: www.facebook.com/animalparty, Load Balancer, Web Servers, Relational Database]

• Application scales out: just add more commodity web servers
• Database scales up: get a bigger, more complex server
  – Expensive and disruptive sharding
  – Doesn’t perform at Web Scale

Page 6: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Couchbase Server is a distributed database

[Diagram labels: Application user, Web application server, Couchbase Servers, Couchbase Web Console]

Page 7: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Couchbase data layer scales like application logic tier

Data layer now scales with linear cost and constant performance.

[Diagram labels: www.facebook.com/animalparty, Load Balancer, Web Servers, Couchbase Servers]

• Application scales out: just add more commodity web servers
• Database scales out: just add more commodity data servers
• Scaling out flattens the cost and performance curves.
• Horizontally scalable, schema-less, auto-sharding, high-performance at Web Scale

Page 8: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Couchbase Server is Simple, Fast, Elastic

• Five minutes or less to a working cluster
  – Downloads for Windows, Linux and OSX
  – Start with a single node
  – One button press joins nodes to a cluster
• Easy to develop against
  – Just SET and GET – no schema required
  – Drop it in. 10,000+ existing applications already “speak Couchbase” (via memcached)
  – Practically every language and application framework is supported, out of the box
• Easy to manage
  – One-click failover and cluster rebalancing
  – Graphical and programmatic interfaces
  – Configurable alerting

Page 9: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Couchbase Server is Simple, Fast, Elastic

• Predictable
  – “Never keep an application waiting”
  – Quasi-deterministic latency and throughput
• Low latency
  – Built-in Memcached technology
  – Auto-migration of hot data to lowest latency storage technology (RAM, SSD, Disk)
  – Selectable write behavior – asynchronous, synchronous (on replication, persistence)
• High throughput
  – Multi-threaded
  – Low lock contention
  – Asynchronous wherever possible
  – Automatic write de-duplication

Page 10: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Couchbase Server is Simple, Fast, Elastic

• Zero-downtime elasticity
  – Spread I/O and data across commodity servers (or VMs)
  – Consistent performance with linear cost
  – Dynamic rebalancing of a live cluster
• All nodes are created equal
  – No special case nodes
  – Clone to grow
• Extensible
  – Change feeds
  – Real-time map-reduce
  – RESTful interface for management

[Screenshot: Couchbase Web Console]

Page 11: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Proven at Small, and Extra Large Scale

Heroku
• Leading cloud service (PaaS) provider
• Over 150,000 hosted applications
• Couchbase Server serving over 6,200 Heroku customers

Zynga
• Social game leader – FarmVille, Mafia Wars, Empires and Allies, Café World, FishVille
• Over 230 million monthly users
• Couchbase Server is the primary database behind key Zynga properties

Page 12: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Customers and Partners

[Logo grids: Customers (partial listing); Partners]

Page 13: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Couchbase Server Architecture

[Diagram: components of a Couchbase Server node]

Data Manager
• moxi and the memcached protocol listener/sender (ports 11211 and 11210; labeled memcapable 1.0 and memcapable 2.0)
• Engine interface
• Couchbase Storage Engine

Cluster Manager (Erlang/OTP)
• Ports: HTTP (8091), erlang port mapper (4369), distributed erlang (21100 – 21199)
• http: REST management API / Web UI
• Heartbeat
• Process monitor
• Global singleton supervisor
• Configuration manager
• Rebalance orchestrator
• Node health monitor
• vBucket state and replication manager
• Grouping labels on the diagram: “on each node” and “one per cluster”

Page 14: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Couchbase Server Architecture

[Same architecture diagram as on page 13]

Page 15: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Couchbase “write” Data Flow – application view

1. User action results in the need to change the VALUE of KEY
2. Application updates key’s VALUE, performs SET operation
3. Couchbase client hashes KEY, identifies KEY’s master server
4. SET request sent over network to master server
5. Couchbase replicates KEY-VALUE pair, caches it in memory and stores it to disk
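The hashing step in this flow (the client hashes KEY and identifies its master server) can be sketched roughly as below. This is only an illustration under stated assumptions: the function names, the four-entry map and the server names are hypothetical, and plain crc32 modulo stands in for whatever hash the real client library uses.

```php
<?php
// Hypothetical sketch: map a key to a vBucket, then to its master server.
// The hash (crc32 modulo) and the map layout are assumptions for illustration.

function vbucketForKey(string $key, int $numVbuckets): int
{
    // crc32() returns a non-negative integer on 64-bit PHP builds.
    return crc32($key) % $numVbuckets;
}

function masterServerForKey(string $key, array $vbucketMap): string
{
    $vbucket = vbucketForKey($key, count($vbucketMap));
    return $vbucketMap[$vbucket];
}

// A toy map: vBucket index => server that is master for that vBucket.
$vbucketMap = array(
    0 => 'server-a', 1 => 'server-a',
    2 => 'server-b', 3 => 'server-b',
);

$server = masterServerForKey('Player1', $vbucketMap);
echo "SET Player1 goes to $server\n";
```

Because the lookup depends only on the key and the map, every client holding the same map sends a given key to the same server.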

Page 16: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Couchbase Data Flow – under the hood

[Diagram: Master server for KEY, Replica Server 1 for KEY and Replica Server 2 for KEY, each with a Listener-Sender, RAM*, a storage engine, and Disk/SSD storage. Steps 1 to 4 are marked on the diagram; only the two captions below survive as text.]

1. SET request arrives at KEY’s master server
3. SET acknowledgement returned to application

Page 17: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Elasticity - Rebalancing

Before
• Adding Node 3
• Node 3 is in pending state
• Clients talk to Node 1,2 only

During
• Rebalancing orchestrator recalculates the vBucket map (including replicas)
• Migrate vBuckets to the new server
• Finalize migration

After
• Node 3 is balanced
• Clients are reconfigured to talk to Node 3

[Diagram: vBuckets 1 to 12 on Node 1, Node 2 and Node 3, shown in the pending state and during rebalancing, with vBucket migrators and a client]
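The "recalculate the vBucket map" step can be illustrated with a small sketch. It is not the orchestrator's actual algorithm: the `rebalance` function is hypothetical, it ignores replicas and the data migration itself, and it assumes every current owner is still in the node list.

```php
<?php
// Hypothetical sketch of recalculating a vBucket map when a node joins.
// It moves only as many vBuckets as needed to even out the load.

function rebalance(array $vbucketMap, array $nodes): array
{
    $target = intdiv(count($vbucketMap), count($nodes)); // even share per node
    $counts = array_fill_keys($nodes, 0);
    foreach ($vbucketMap as $node) {
        $counts[$node]++;
    }

    foreach ($vbucketMap as $vbucket => $owner) {
        if ($counts[$owner] <= $target) {
            continue; // owner is not overloaded, leave this vBucket in place
        }
        foreach ($nodes as $candidate) {
            if ($counts[$candidate] < $target) {
                $vbucketMap[$vbucket] = $candidate; // "migrate" the vBucket
                $counts[$owner]--;
                $counts[$candidate]++;
                break;
            }
        }
    }
    return $vbucketMap;
}

// Before: 12 vBuckets split across two nodes.
$before = array();
for ($vb = 1; $vb <= 12; $vb++) {
    $before[$vb] = $vb <= 6 ? 'Node 1' : 'Node 2';
}

// After: Node 3 has joined and takes over part of the vBuckets.
$after = rebalance($before, array('Node 1', 'Node 2', 'Node 3'));
print_r(array_count_values($after));
```

In this toy run each node ends up owning four vBuckets, and only the four that move to Node 3 change owner.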

Page 18: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Data buckets are secure Couchbase “slices”

[Diagram labels: Application user; Web application server; Couchbase data servers (in the data center); Bucket 1 and Bucket 2 (on the administrator console); Aggregate Cluster Memory and Disk Capacity]

Page 19: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Couchbase and Hadoop Integration

• Support large-scale analytics on application data by streaming data from Couchbase to Hadoop
  – Real-time integration using Flume
  – Batch integration using Sqoop
• Examples
  – Various game statistics (e.g., monthly / daily / hourly rankings)
  – Analyze game patterns from users to enhance various game metrics

[Diagram labels: memcached protocol listener/sender, engine interface, Couchbase Storage Engine, TAP, Flume, Sqoop]

Page 20: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Agenda

[Agenda slide repeated from page 3]

Page 21: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Tribal Crossing: Challenges

Common steps for scaling up a database:
● Tune queries (indexing, explain query)
● Denormalization
● Cache data (APC / Memcache)
● Tune MySQL configuration
● Replication (read slaves)

Where do we go from here to prepare for the scale of a successful social game?

Page 22: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Tribal Crossing: Challenges

● Write-heavy requests
  – Caching does not help
  – MySQL / InnoDB limitation (Percona)
● Need to scale drastically overnight
  – My Polls – 100 to 1m users over a weekend
● Small team, no dedicated sysadmin
  – Focus on what we do best – making games
● Keeping cost down

Page 23: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Tribal Crossing: “Old” Architecture and Options

● MySQL with master-to-master replication and sharding
  – Complex to set up, high administration cost
  – Requires application-level changes
● Cassandra
  – High write, but low read throughput
  – Live cluster reconfiguration and rebalance is quite complicated
  – Eventual consistency puts too much burden on application developers
● MongoDB
  – High read/write, but unpredictable latency
  – Live cluster rebalance for existing nodes only
  – Eventual consistency with slave nodes

Page 24: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Tribal Crossing: Why Couchbase Server?

● SPEED, SPEED, SPEED
● Immediate consistency
● Interface is dead simple to use
  – We are already using Memcache
● Low sysadmin overhead
● Schema-less data store
● Used and proven by big guys like Zynga
● … and lastly, because Tribal CAN
  – Bigger firms with legacy code base = hard to adapt
  – Small team = ability to get on the cutting edge

Page 25: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Tribal Crossing: New Challenges With Couchbase

● But there are some different challenges in using Couchbase (currently 1.7) to handle the game data:
  – No easy way to query data
  – No transaction / rollback
  ➔ Couchbase Server 2.0 resolves them by using CouchDB as the underlying database engine
● Can this work for an online game?
  – Break out of the old ORM / relational paradigm!
  – We are not handling bank transactions

Page 26: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Tribal Crossing: Deploying Couchbase in EC2

● Basic production environment setup
● Dev/Stage environment – feel free to install Couchbase on your web server

[Diagram labels: Web Server (Apache, Client-side Moxi), DNS Entry, Couchbase Cluster (Couchbase, Couchbase), Cluster Mgmt. Requests]

Page 27: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Tribal Crossing: Deploying Couchbase in EC2

● Amazon Linux AMI, 64-bit, EBS backed instance
● Set up swap space
● Install Couchbase’s Membase Server 1.7
● Access web console at http://<hostname>:8091
● Start the new cluster with a single node
● Add the other nodes to the cluster and rebalance

[Diagram labels: Web Server (Apache, Client-side Moxi), DNS Entry, Couchbase Cluster (Couchbase … Couchbase), Cluster Mgmt. Requests]

Page 28: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Tribal Crossing: Deploying Couchbase in EC2

Moxi figures out which node in the cluster holds data for a given key.
● On each web server, install Moxi proxy
● Start Moxi by pointing it to the DNS entry you created
● Web apps connect to Moxi that is running locally

    $memcache->addServer('localhost', 11211);

[Diagram labels: Web Server (Apache, Client-side Moxi), DNS Entry, Couchbase Cluster (Couchbase, Couchbase), Cluster Mgmt. Requests]

Page 29: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Tribal Crossing: Representing Game Data in Couchbase

Use case - simple farming game:
● A player can have a variety of plants on their farm.
● A player can add or remove plants from their farm.
● A player can see what plants are on another player's farm.

Page 30: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Tribal Crossing: Representing Game Data in Couchbase

Representing Objects
● Simply treat an object as an associative array
● Determine the key for an object using the class name (or type) of the object and a unique ID

Representing Object Lists
● Denormalization
● Save a comma-separated list or an array of object IDs

Page 31: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Tribal Crossing: Representing Game Data in Couchbase

Player Object
Key: 'Player1'

    Array
    (
        [Id] => 1
        [Name] => Shawn
    )

Plant Object
Key: 'Plant201'

    Array
    (
        [Id] => 201
        [Player_Id] => 1
        [Name] => Starflower
    )

PlayerPlant List
Key: 'Player1_PlantList'

    Array
    (
        [0] => 201
        [1] => 202
        [2] => 204
    )
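The key naming convention (type name plus unique ID, and an owner-scoped key for lists) could be wrapped in small helpers such as these. The function names `objectKey` and `listKey` are hypothetical and not part of any Couchbase client; they only reproduce the key shapes shown on the slide.

```php
<?php
// Hypothetical helpers for the key naming convention shown on the slide.

function objectKey(string $type, int $id): string
{
    return $type . $id;                 // type name followed by the unique ID
}

function listKey(string $ownerType, int $ownerId, string $itemType): string
{
    // Owner key, underscore, then the item type with a "List" suffix.
    return objectKey($ownerType, $ownerId) . '_' . $itemType . 'List';
}

echo objectKey('Player', 1), "\n";          // Player1
echo objectKey('Plant', 201), "\n";         // Plant201
echo listKey('Player', 1, 'Plant'), "\n";   // Player1_PlantList
```

Building keys in one place keeps the naming consistent across every GET and SET in the application.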

Page 32: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Tribal Crossing: Schema-less Game Data

● No need to “ALTER TABLE”
● Add new “fields” to all objects at any time
  – Specify default value for missing fields
  – Increased development speed
● Using JSON for data objects, though, because Couchbase 2.0 can query on arbitrary fields
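One way to "specify default value for missing fields" is to merge a stored object with a defaults array when it is loaded. The sketch below is an assumption about how this might be done, not the presenters' code; `withDefaults` and the `Growth_Stage` field are made up for illustration.

```php
<?php
// Hypothetical sketch: fill in defaults for fields that older stored
// objects do not have yet. Field names and defaults are made up.

function withDefaults(array $object, array $defaults): array
{
    // The array + operator keeps values from the left operand and only
    // adds keys that are missing from it.
    return $object + $defaults;
}

$plantDefaults = array('Name' => 'Unknown', 'Growth_Stage' => 0);

// An object saved before the 'Growth_Stage' field existed.
$stored = array('Id' => 201, 'Player_Id' => 1, 'Name' => 'Starflower');

$plant = withDefaults($stored, $plantDefaults);
echo json_encode($plant), "\n";
```

Old objects never need a bulk migration with this approach: they pick up the new field the next time they are read and can be written back in the new shape.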

Page 33: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Tribal Crossing: Accessing Game Data in Couchbase

Get all plants belonging to a given player
Request: GET /player/1/farm

    // Load the list of plant IDs owned by player 1
    $plant_ids = $couchbase->get('Player1_PlantList');

    $response = array();

    // Fetch each plant object by its key
    foreach ($plant_ids as $plant_id) {
        $plant = $couchbase->get('Plant' . $plant_id);
        $response[] = $plant;
    }

    echo json_encode($response);

Page 34: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Tribal Crossing: Modifying Game Data in Couchbase

Give a player a new plant

    // Create the new plant
    $new_plant = array(
        'id' => 100,
        'name' => 'Mushroom'
    );

    $couchbase->set('Plant100', $new_plant);

    // Update the player plant list
    $plant_ids = $couchbase->get('Player1_PlantList');
    $plant_ids[] = $new_plant['id'];

    $couchbase->set('Player1_PlantList', $plant_ids);

Page 35: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Tribal Crossing: Concurrency

Concurrency issues can occur when multiple requests are working with the same piece of data.

Solution:
● CAS (check-and-set)
  – The client can tell whether someone else has modified the data while it is trying to update
  – Implement optimistic concurrency control
● Locking (try/wait cycle)
  – GETL (get with lock + timeout) operations
  – Pessimistic concurrency control
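The CAS approach can be sketched as a read, modify, write-back loop that retries when the check fails. To keep the example runnable without a server, it uses a hypothetical `InMemoryStore` class that only mimics the CAS idea. A real client, such as the PHP Memcached extension with its `cas()` method, has its own signatures, so treat every name here as an assumption.

```php
<?php
// Hypothetical in-memory stand-in for a store with check-and-set (CAS).
// It only mimics the idea: every write changes the item's CAS token.
class InMemoryStore
{
    private $items = array();   // key => array('value' => ..., 'cas' => int)

    public function set(string $key, $value): void
    {
        $cas = isset($this->items[$key]) ? $this->items[$key]['cas'] + 1 : 1;
        $this->items[$key] = array('value' => $value, 'cas' => $cas);
    }

    // Returns array(value, casToken), or null when the key is missing.
    public function getWithCas(string $key): ?array
    {
        if (!isset($this->items[$key])) {
            return null;
        }
        return array($this->items[$key]['value'], $this->items[$key]['cas']);
    }

    // Writes only if nobody changed the item since $casToken was read.
    public function cas(int $casToken, string $key, $value): bool
    {
        if (!isset($this->items[$key]) || $this->items[$key]['cas'] !== $casToken) {
            return false;
        }
        $this->set($key, $value);
        return true;
    }
}

// Optimistic update: read, modify, write back; retry if the CAS check fails.
function addPlantToList(InMemoryStore $store, string $listKey, int $plantId, int $maxRetries = 5): bool
{
    for ($attempt = 0; $attempt < $maxRetries; $attempt++) {
        $found = $store->getWithCas($listKey);
        if ($found === null) {
            return false;               // nothing to update
        }
        list($plantIds, $casToken) = $found;
        $plantIds[] = $plantId;
        if ($store->cas($casToken, $listKey, $plantIds)) {
            return true;                // no concurrent writer interfered
        }
    }
    return false;                       // gave up after repeated conflicts
}

$store = new InMemoryStore();
$store->set('Player1_PlantList', array(201, 202, 204));
addPlantToList($store, 'Player1_PlantList', 100);
print_r($store->getWithCas('Player1_PlantList')[0]);
```

Compared with the plain get-then-set update, a request that loses the race re-reads the list and tries again instead of silently overwriting the other request's change.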

Page 36: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Tribal Crossing: Data Relationship

● Record object relationships both ways
  – Example: Plots and Plants
    ● Plot object stores id of the plant that it hosts
    ● Plant object stores id of the plot that it grows on
  – Resolution in case of mismatch
● Don't sweat the extra calls to load data in a one-to-many relationship
  – Use multiGet
  – We can still cache aggregated results in a Memcache bucket if needed
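Recording a relationship both ways means the two sides can drift apart, so the application needs a check and a repair policy. The sketch below shows one possible shape for that. The slides do not say how mismatches were resolved, so the field names and the "trust the plot" policy are assumptions.

```php
<?php
// Hypothetical sketch: a plot and a plant each store the other's ID, and a
// check detects when the two sides disagree. Field names are illustrative.

function linksMatch(array $plot, array $plant): bool
{
    return $plot['Plant_Id'] === $plant['Id']
        && $plant['Plot_Id'] === $plot['Id'];
}

function resolveMismatch(array $plot, array $plant): array
{
    // Assumed policy: trust the plot and rewrite the plant's back-reference.
    if ($plot['Plant_Id'] === $plant['Id']) {
        $plant['Plot_Id'] = $plot['Id'];
    }
    return $plant;
}

$plot  = array('Id' => 7, 'Plant_Id' => 201);
$plant = array('Id' => 201, 'Plot_Id' => 9);   // stale back-reference

if (!linksMatch($plot, $plant)) {
    $plant = resolveMismatch($plot, $plant);
}
var_dump(linksMatch($plot, $plant));
```

Since there are no transactions to update both objects atomically, a repair step like this would typically run whenever both objects are loaded together.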

Page 37: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Tribal Crossing: Migrating to Couchbase Servers

First migrated large or slow-performing tables and frequently updated fields from MySQL to Couchbase

[Diagram labels: Web Server (Apache + PHP, Client-side Moxi); memcached protocol listener/sender, engine interface, Couchbase Storage Engine, TAP; TAP Client; MySQL; Reporting Applications]

Page 38: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Tribal Crossing: Deployment

Page 39: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Tribal Crossing: Deployment

Page 40: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Tribal Crossing: Conclusion

• Significantly reduced the cost incurred by scaling up database servers and managing them.
• Achieved significant improvements in various performance metrics (e.g., read, write, latency)
• Allowed them to focus more on game development and optimizing key metrics
• Plan to use real-time MapReduce, querying, and indexing abilities provided by the upcoming Elastic Couchbase 2.0

Page 41: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Agenda

[Agenda slide repeated from page 3]

Page 42: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Product Roadmap: Couchbase Server 2.0

• Mobile to cloud data synchronization
• Cross data center replication

[Diagram labels: US West Coast Data Center, US East Coast Data Center, Couchbase Server, Couchbase Single Server, CouchSync]

Page 43: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Product Roadmap: Couchbase Server 2.0

• Replace SQLite-based storage engine with CouchDB
• Support indexing and querying on values
• Integrate real-time MapReduce into Couchbase Server
• SDK for Couchbase Server

Membase Server 1.7: The world’s leading caching and clustering technology
CouchDB 1.1: The most reliable and full-featured document database
Couchbase Server 2.0: The fastest, most complete and most reliable database on the planet

Page 44: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Couchbase Product Download

• Community Edition
  – Open source build
  – Free forum support
• Enterprise Edition
  – Free for non-production use
  – Certified, QA tested version of open source
  – Case tracking and guaranteed SLA for production environments
• Partner in Korea
  – N2M Inc. (http://www.n2m.co.kr)

Page 45: Couchbase Seoul Data Engineering Conference (SDEC) 2011

Q&A

Matt Ingenthron, Couchbase Inc. ([email protected], @ingenthr)
Chiyoung Seo, Couchbase Inc. ([email protected], @chiyoungseo)