Running Cassandra in the Cloud: An Introduction to Priam Jason Brown @jasobrown [email protected] www.linkedin.com/in/jasedbrown


DESCRIPTION

In-depth exploration of Priam, a sidekick application that helps Cassandra run inside Amazon's cloud.


Page 1: An Introduction to Priam

Running Cassandra in the Cloud:

An Introduction to Priam
Jason Brown

@jasobrown [email protected] www.linkedin.com/in/jasedbrown

Page 2: An Introduction to Priam

About me

● Senior Software Engineer, Netflix
● Apache Cassandra committer
● E-Commerce Architect, Major League Baseball Advanced Media
● Wireless developer (J2ME and BREW)

Page 3: An Introduction to Priam

Netflix Databases

● Oracle in the datacenter
● Migrate to EC2
○ SimpleDB at first
○ Cassandra

Page 4: An Introduction to Priam

Cassandra meet EC2

● shell script(s)
● python scripts
● backup / restore
● centralized model
● installing Python 2.7 broke CentOS yum
● first time we ran it in prod, my cluster was destroyed

Page 5: An Introduction to Priam

Hello, Priam!

Priam, the father of Cassandra (http://en.wikipedia.org/wiki/Priam)

Java web app
● Token Assignment
● Backup / Restore
● Multi-region support
● Configuration management

Page 6: An Introduction to Priam

Branches

each priam branch corresponds to a c* version
● priam 1.1 -> c* 1.1
● priam master -> c* 1.2
● ??? -> c* trunk

Page 7: An Introduction to Priam

Token Assignment

● Cassandra needs an assigned token
● Priam tries to
○ replace a dead instance
○ join as a new node
● External storage for known cluster members
○ host name/IP addr/instance id
○ token
○ region/availability zone

Page 8: An Introduction to Priam
Page 9: An Introduction to Priam

Replacing a dead node

● Get known nodes in region/AZ from storage
○ {A, B, C}
● Get live nodes in region/AZ from ASG api
○ {A, B}
● Take over a dead node's token
○ C
● uses c*'s replace_token
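
The lookup on this slide boils down to "known minus live". A minimal sketch of that step follows; the class and method names (`DeadNodeSketch`, `tokenToTakeOver`) and the use of plain strings for node ids and tokens are assumptions made for illustration, not Priam's actual code.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeMap;

final class DeadNodeSketch {
    /**
     * Hypothetical helper: given the tokens registered in external storage
     * (keyed by node id) and the nodes the ASG API reports as live, return
     * the token of the first registered node that is no longer live.
     */
    static Optional<String> tokenToTakeOver(Map<String, String> knownTokensByNode,
                                            Set<String> liveNodes) {
        // TreeMap only makes the iteration order deterministic for the example.
        return new TreeMap<>(knownTokensByNode).entrySet().stream()
                .filter(entry -> !liveNodes.contains(entry.getKey()))
                .map(Map.Entry::getValue)
                .findFirst();
    }

    public static void main(String[] args) {
        // The slide's example: known {A, B, C}, live {A, B}, so C is dead.
        Map<String, String> known = Map.of("A", "token-a", "B", "token-b", "C", "token-c");
        Set<String> live = Set.of("A", "B");
        System.out.println(tokenToTakeOver(known, live)); // Optional[token-c]
    }
}
```

An empty result means no dead node was found, in which case the node would fall through to the "join as a new node" path.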

Page 10: An Introduction to Priam

Joining as a new node

● Calculate token
○ per-region offset
○ determine 'slot' in region/AZ
○ derive token

Page 11: An Introduction to Priam

Region hash offset

● Each region needs a different base offset
○ avoids token collisions

int hash = "us-east-1".hashCode();

Page 12: An Introduction to Priam

Determining slot

A new node takes the next numbered slot in its AZ
- looks for other registered nodes in SimpleDB (sdb)

Page 13: An Introduction to Priam

Node Slotting Layout

+--------+--------+--------+
| zone A | zone B | zone C |
+--------+--------+--------+
|   0    |   1    |   2    |
+--------+--------+--------+
|   3    |   4    |   5    |
+--------+--------+--------+
|   6    |   7    |   8    |
+--------+--------+--------+
|   9    |   10   |   11   |
+--------------------------+

(ascii art rocks)
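
One way to read the layout: slots are striped across zones, so a zone's slots are its column index plus multiples of the zone count. The sketch below is inferred from the table only; the names (`SlotSketch`, `nextSlot`) and the zone indexing (A=0, B=1, C=2) are assumptions, and Priam's real slotting logic may differ.

```java
import java.util.Set;

final class SlotSketch {
    /**
     * Hypothetical helper: walk down the zone's column in the layout table
     * (zoneIndex, zoneIndex + zoneCount, ...) and return the first slot
     * that no registered node has taken yet.
     */
    static int nextSlot(int zoneIndex, int zoneCount, Set<Integer> takenSlots) {
        int slot = zoneIndex;
        while (takenSlots.contains(slot)) {
            slot += zoneCount;
        }
        return slot;
    }

    public static void main(String[] args) {
        // Two full rows are taken; a new node in zone B (index 1) lands in slot 7.
        System.out.println(nextSlot(1, 3, Set.of(0, 1, 2, 3, 4, 5)));
    }
}
```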

Page 14: An Introduction to Priam

Here's your token

MAXIMUM_TOKEN
    .divide(regionNodeCount)
    .multiply(mySlot)
    .add(regionHashOffset);

example:
  100 / 10 (ten nodes in region)
  * 3 (in slot three)
  + 12 (region hash offset)
  = 42
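
The formula on this slide can be tried out with `java.math.BigInteger`. This is a sketch under stated assumptions: the class and method names are hypothetical, and the 2^127 maximum assumes a cluster running RandomPartitioner. The slides do not show how the region hash is normalized, so the raw `hashCode()` is used as-is.

```java
import java.math.BigInteger;

final class TokenSketch {
    /** Mirrors the slide: MAXIMUM_TOKEN / regionNodeCount * mySlot + regionHashOffset. */
    static BigInteger token(BigInteger maximumToken, int regionNodeCount,
                            int mySlot, int regionHashOffset) {
        return maximumToken
                .divide(BigInteger.valueOf(regionNodeCount))
                .multiply(BigInteger.valueOf(mySlot))
                .add(BigInteger.valueOf(regionHashOffset));
    }

    public static void main(String[] args) {
        // The slide's toy numbers: 100 / 10 * 3 + 12.
        System.out.println(token(BigInteger.valueOf(100), 10, 3, 12)); // 42

        // Assumption: RandomPartitioner, whose token space tops out at 2^127.
        BigInteger maximumToken = BigInteger.valueOf(2).pow(127);
        // String.hashCode() can be negative in Java; normalization is not shown on the slides.
        int regionHashOffset = "us-east-1".hashCode();
        System.out.println(token(maximumToken, 12, 4, regionHashOffset));
    }
}
```

Because each region adds a different offset to the same evenly spaced slots, nodes in the same slot of different regions end up with distinct tokens.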

Page 15: An Introduction to Priam

Seeds

● first node in each AZ, in every region
● except if current node is in the first slot
○ seeds cannot auto bootstrap
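
A minimal sketch of the seed rule, reading the exception as "a node that is itself first in its AZ leaves itself out of its own seed list". That reading, along with the names `SeedSketch` and `seedsFor` and the map-of-lists input, is an assumption for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SeedSketch {
    /**
     * Hypothetical helper: pick the first registered node of every zone as a
     * seed, skipping the current node so it does not list itself.
     */
    static List<String> seedsFor(String self, Map<String, List<String>> nodesByZone) {
        List<String> seeds = new ArrayList<>();
        // TreeMap gives a stable zone order for the example output.
        for (List<String> zoneNodes : new TreeMap<>(nodesByZone).values()) {
            if (zoneNodes.isEmpty()) {
                continue;
            }
            String first = zoneNodes.get(0);
            if (!first.equals(self)) {
                seeds.add(first);
            }
        }
        return seeds;
    }

    public static void main(String[] args) {
        Map<String, List<String>> nodesByZone = Map.of(
                "zone-a", List.of("n0", "n3"),
                "zone-b", List.of("n1", "n4"),
                "zone-c", List.of("n2"));
        System.out.println(seedsFor("n3", nodesByZone)); // [n0, n1, n2]
        System.out.println(seedsFor("n0", nodesByZone)); // [n1, n2]
    }
}
```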

Page 16: An Introduction to Priam

Multi-region communication

AWS security groups block ingress requests

Intra-region: whitelist by other in-region SG

Inter-region: whitelist by IP address
○ must use public IP address!

Page 17: An Introduction to Priam

Whitelisting IP address

● Seed nodes compare
○ current region's SG IP addresses
○ entries in SimpleDB database
● Add new nodes to SG
● Remove dead nodes from SG
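
The comparison on this slide is a two-way set difference. The sketch below shows only that reconciliation step; the class and method names are hypothetical, the IPs are documentation addresses, and the actual AWS calls that would apply the changes are deliberately left out.

```java
import java.util.Set;
import java.util.TreeSet;

final class WhitelistSketch {
    /** IPs registered in SimpleDB but not yet whitelisted in the security group. */
    static Set<String> toAdd(Set<String> securityGroupIps, Set<String> registeredIps) {
        Set<String> missing = new TreeSet<>(registeredIps);
        missing.removeAll(securityGroupIps);
        return missing;
    }

    /** IPs still whitelisted in the security group but no longer registered. */
    static Set<String> toRemove(Set<String> securityGroupIps, Set<String> registeredIps) {
        Set<String> stale = new TreeSet<>(securityGroupIps);
        stale.removeAll(registeredIps);
        return stale;
    }

    public static void main(String[] args) {
        Set<String> inSecurityGroup = Set.of("203.0.113.1", "203.0.113.2");
        Set<String> inSimpleDb = Set.of("203.0.113.2", "203.0.113.3");
        System.out.println(toAdd(inSecurityGroup, inSimpleDb));    // [203.0.113.3]
        System.out.println(toRemove(inSecurityGroup, inSimpleDb)); // [203.0.113.1]
    }
}
```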

Page 18: An Introduction to Priam

                         ++
        us-east-1        ||        eu-west-1
     +-------------+     ||
     |  simpleDB   |     ||
     +-------------+     ||
                         ||
                 +--+    ||    +--+
                 |S |    ||    |S |
                 |e |    ||    |e |
                 |c |    ||    |c |
  +----------+   |G |    ||    |G |   +----------+
  |   c* 1   |   |r |    ||    |r |   |   c* 2   |
  +----------+   |p |    ||    |p |   +----------+
                 |  |    ||    |  |
                 |1 |    ||    |2 |
                 +--+    ||    +--+
                         ||
                         ++

Page 19: An Introduction to Priam

Backup

Two types:
● Snapshot
○ invokes nodetool snapshot
○ once a day, cron-like
● Incremental
○ copy all newly flushed sstables

Page 20: An Introduction to Priam

Backup location

Upload to S3 bucket in same region

Bucket lifecycle rules
● configure TTL for data

Page 21: An Introduction to Priam

Backup path

Bucket: netflix-cassandra-data

Path: base dir / region / cluster name / token / snapshot time / [SNP | SST | META] / keyspace / column family / data file

example: test_backup/us-east-1/cass_jasobrown/42/1234567/SNP/jasobrown/dog/jasobrown-dog-ja-1-Data.db
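
The path layout above can be assembled with a simple join. This sketch only reproduces the string shown on the slide; the class, method, and enum names are hypothetical and not taken from Priam's source.

```java
final class BackupPathSketch {
    /** The three file-type markers named on the slide. */
    enum FileType { SNP, SST, META }

    /** Joins the path components in the order the slide lists them. */
    static String remotePath(String baseDir, String region, String clusterName,
                             String token, String snapshotTime, FileType type,
                             String keyspace, String columnFamily, String dataFile) {
        return String.join("/", baseDir, region, clusterName, token, snapshotTime,
                type.name(), keyspace, columnFamily, dataFile);
    }

    public static void main(String[] args) {
        // Rebuilds the slide's example path.
        System.out.println(remotePath("test_backup", "us-east-1", "cass_jasobrown",
                "42", "1234567", FileType.SNP, "jasobrown", "dog",
                "jasobrown-dog-ja-1-Data.db"));
    }
}
```

Putting the token ahead of the snapshot time groups every backup of one token range under a common prefix, which lines up with the restore slide's advice that tokens should match the source cluster.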

Page 22: An Introduction to Priam

Restore

● best with same size cluster as source
● best if tokens match with source

Uses (besides the obvious)
● prod to test refresh
● reproduce prod data problems
● incremental restore - WIP

Page 23: An Introduction to Priam

Configuration Management

Control aspects of priam and c*
● yaml
● startup script(s) env values

Netflix needs this as we have ~55 production clusters, with slightly different configs

Page 24: An Introduction to Priam

So, does Netflix actually use Priam?

55 production clusters, > 750 nodes

Internal extensions
● Hook into internal DNS, properties systems
● Alternative storage to SimpleDB
● BI messaging integration - WIP
● C* JMX monitoring

Page 25: An Introduction to Priam

Monitoring

● Poll C* every 60 seconds
● selected JMX metrics
● publish to internal metrics aggregator
○ currently uses Netflix's OSS Servo library (github.com/Netflix/servo)
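
A rough sketch of a 60-second JMX poll using only the JDK's `javax.management` API. Everything here is illustrative: the class and method names are hypothetical, the `Consumer` stands in for whatever publishes to the metrics aggregator (no Servo calls are shown), and the demo reads the local JVM's `Runtime` MBean rather than a real Cassandra metric.

```java
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

final class JmxPollSketch {
    /** Reads the selected attributes of one MBean into an ordered map. */
    static Map<String, Object> pollOnce(MBeanServerConnection connection,
                                        String objectName, List<String> attributes) {
        try {
            ObjectName name = new ObjectName(objectName);
            Map<String, Object> values = new LinkedHashMap<>();
            for (String attribute : attributes) {
                values.put(attribute, connection.getAttribute(name, attribute));
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("JMX poll failed for " + objectName, e);
        }
    }

    /** Polls every 60 seconds and hands each sample to the (hypothetical) publisher. */
    static ScheduledExecutorService start(MBeanServerConnection connection,
                                          String objectName, List<String> attributes,
                                          Consumer<Map<String, Object>> publisher) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Note: scheduleAtFixedRate stops rescheduling if the task throws,
        // so a production poller would catch and log failures inside the task.
        scheduler.scheduleAtFixedRate(
                () -> publisher.accept(pollOnce(connection, objectName, attributes)),
                0, 60, TimeUnit.SECONDS);
        return scheduler;
    }

    public static void main(String[] args) {
        // Stand-in target: this JVM's own Runtime MBean, polled once.
        System.out.println(pollOnce(ManagementFactory.getPlatformMBeanServer(),
                "java.lang:type=Runtime", List.of("Uptime")));
    }
}
```

Against a real node, the connection would come from a remote JMX connector pointed at Cassandra's JMX port instead of the platform MBean server.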

Page 26: An Introduction to Priam

Next directions

Commit log backups

Datastax Enterprise support
● security
● solr
● configuration

c* 1.2 virtual nodes (a/k/a vnodes)

auto scaling

Page 27: An Introduction to Priam

Thank you!

Q & A time

@jasobrown