Massively scalable NoSQL with Apache Cassandra. Jonathan Ellis (Project Chair, Apache Cassandra; CTO, DataStax), @spyced

Page 1: Massively Scalable NoSQL with Apache Cassandra

Massively scalable NoSQL with Apache Cassandra

Jonathan Ellis
Project Chair, Apache Cassandra
CTO, DataStax
@spyced

Page 2: Massively Scalable NoSQL with Apache Cassandra

©2012 DataStax

Big data

Analytics (Hadoop)

Realtime (“NoSQL”)

?

Page 3: Massively Scalable NoSQL with Apache Cassandra

Some Cassandra users

Page 4: Massively Scalable NoSQL with Apache Cassandra

eBay

Application/Use Case
• Social Signals: like/want/own features for eBay product and item pages
• Hunch taste graph for eBay users and items
• Many time series use cases

Why Cassandra?
• Multi-datacenter
• Scalable
• Write performance
• Distributed counters
• Hadoop support

ACE

Page 5: Massively Scalable NoSQL with Apache Cassandra

Time series data

Page 6: Massively Scalable NoSQL with Apache Cassandra

Multi-datacenter support

Page 7: Massively Scalable NoSQL with Apache Cassandra

Distributed counters

Page 8: Massively Scalable NoSQL with Apache Cassandra

Hadoop support

Page 9: Massively Scalable NoSQL with Apache Cassandra

Disney

Application/Use Case
• Meet the data management needs of user-facing applications across The Walt Disney Company with a single platform

Why Cassandra?
• DataStax Enterprise can tackle real-time and search functions in the same cluster
• Scalability
• 24x7 uptime

NDI

Page 10: Massively Scalable NoSQL with Apache Cassandra

Multitenancy

Page 11: Massively Scalable NoSQL with Apache Cassandra

Multitenancy

Page 12: Massively Scalable NoSQL with Apache Cassandra

Enterprise search

Page 13: Massively Scalable NoSQL with Apache Cassandra

SimpleReach

Application/Use Case
• SimpleReach tracks social actions for content creators, from Twitter and Facebook to Pinterest and Reddit, to deliver detailed insights and clear metrics around social behavior.

Why Cassandra?
• Very high velocity data ingest rate and large data volumes
• Workload separation between realtime and batch applications

NDE

Page 14: Massively Scalable NoSQL with Apache Cassandra

SourceNinja

Application/Use Case
• SourceNinja notifies you of performance, security, and bug fixes for the software you depend on

Why Cassandra?
• Previous database system could not handle load; HBase had too many points of failure and was too slow
• Fast real time capabilities, batch analytics on that data, and enterprise search

RDE

Page 15: Massively Scalable NoSQL with Apache Cassandra

Netflix

Application/Use Case
• General purpose backend for large-scale, highly available, cloud-based web services supporting Netflix Streaming

Why Cassandra?
• Highly available, highly robust and no schema change downtime
• Highly scalable, optimized for SSD
• Much lower cost than previous Oracle and SimpleDB implementations
• Flexible data model
• Ability to directly influence/implement OSS feature set
• Supports local and wide area distributed operations, spanning US and Europe

RCE

Page 16: Massively Scalable NoSQL with Apache Cassandra

Optimized for SSD

Page 17: Massively Scalable NoSQL with Apache Cassandra

Open source

Page 18: Massively Scalable NoSQL with Apache Cassandra

Use case patterns

• Massively scalable
• High performance
• Reliable/Available

Page 19: Massively Scalable NoSQL with Apache Cassandra

Page 20: Massively Scalable NoSQL with Apache Cassandra

[Chart: reads/s and writes/s for Cassandra 0.6 and Cassandra 1.0; axis ticks from 0 to 35000]

Page 21: Massively Scalable NoSQL with Apache Cassandra

Page 22: Massively Scalable NoSQL with Apache Cassandra

Classic partitioning with SPOF

[Diagram labels: client, router, partition 1, partition 2, partition 3, partition 4]

Page 23: Massively Scalable NoSQL with Apache Cassandra

Availability

• “High availability implies that a single fault will not bring down your system. Not ‘we’ll recover quickly.’” -- Ben Coverston, DataStax
• “The biggest problem with failover is that you're almost never using it until it really hurts. It's like backups that you never test.” -- Rick Branson, Instagram

Page 24: Massively Scalable NoSQL with Apache Cassandra

Fully distributed, no SPOF

[Diagram labels: client; p1, p1, p1, p3, p6]

Page 25: Massively Scalable NoSQL with Apache Cassandra

Partitioning

jim     age: 36  car: camaro  gender: M
carol   age: 37  car: subaru  gender: F
johnny  age: 12  gender: M
suzy    age: 10  gender: F

Page 26: Massively Scalable NoSQL with Apache Cassandra

Partitioning

Primary key determines placement*

jim     age: 36  car: camaro  gender: M
carol   age: 37  car: subaru  gender: F
johnny  age: 12  gender: M
suzy    age: 10  gender: F

Page 27: Massively Scalable NoSQL with Apache Cassandra

PK      MD5 Hash
jim     5e02739678...
carol   a9a0198010...
johnny  f4eb27cea7...
suzy    78b421309e...

The MD5* hash operation yields a 128-bit number for keys of any size.
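
As a rough illustration of the hashing step described on this slide, the sketch below turns a key of any length into a 128-bit integer with Python's standard `hashlib`. The helper name `md5_token` is hypothetical, and the UTF-8 encoding of the key is an assumption; Cassandra's own partitioner may post-process the digest, so this shows only the hash itself.

```python
import hashlib


def md5_token(key: str) -> int:
    """Return the MD5 digest of a key as an unsigned 128-bit integer."""
    # Assumption: keys are encoded as UTF-8 before hashing.
    digest = hashlib.md5(key.encode("utf-8")).digest()  # always 16 bytes
    return int.from_bytes(digest, byteorder="big")


for name in ["jim", "carol", "johnny", "suzy"]:
    # Print each token as 32 hex digits (128 bits).
    print(f"{name:<7} {md5_token(name):032x}")
```

Whatever the key length, the result fits in 128 bits, which is what lets every key be placed on a fixed-size ring.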

Page 28: Massively Scalable NoSQL with Apache Cassandra

The “token ring”

[Diagram: Node A, Node B, Node C, and Node D on a ring]

Page 29: Massively Scalable NoSQL with Apache Cassandra

Key     Token
jim     5e02739678...
carol   a9a0198010...
johnny  f4eb27cea7...
suzy    78b421309e...

Node  Start            End
A     0xc000000000..1  0x0000000000..0
B     0x0000000000..1  0x4000000000..0
C     0x4000000000..1  0x8000000000..0
D     0x8000000000..1  0xc000000000..0
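
The lookup implied by the Start/End ranges can be sketched as a binary search over each node's end token. This is a minimal sketch, not Cassandra's implementation: the names `token_from_prefix`, `RING`, and `owner` are hypothetical, and because the slide truncates the hashes, the visible hex prefix is padded with zeros purely for illustration (the owning node does not depend on the hidden digits for these four keys).

```python
from bisect import bisect_left

HEX_DIGITS = 32  # a 128-bit token written in hex


def token_from_prefix(prefix: str) -> int:
    """Pad a truncated hex hash with zeros (illustration only)."""
    return int(prefix.ljust(HEX_DIGITS, "0"), 16)


# (end token, node), sorted by end token.
# Each node owns the range (previous node's end, its own end].
RING = [
    (token_from_prefix("0"), "A"),
    (token_from_prefix("4"), "B"),
    (token_from_prefix("8"), "C"),
    (token_from_prefix("c"), "D"),
]
END_TOKENS = [end for end, _ in RING]


def owner(token: int) -> str:
    """Return the node whose range contains the token."""
    index = bisect_left(END_TOKENS, token)  # first end token >= token
    # Tokens past the last end token wrap around to the first node (A).
    return RING[index % len(RING)][1]


KEYS = {
    "jim": "5e02739678",
    "carol": "a9a0198010",
    "johnny": "f4eb27cea7",
    "suzy": "78b421309e",
}
for name, prefix in KEYS.items():
    print(f"{name:<7} -> node {owner(token_from_prefix(prefix))}")
# jim     -> node C
# carol   -> node D
# johnny  -> node A
# suzy    -> node C
```

Any node can run this lookup locally, which is why no central router is needed.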

Page 34: Massively Scalable NoSQL with Apache Cassandra

Replication

[Diagram: Node A, Node B, Node C, and Node D on a ring; key carol, token a9a0198010...]
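
The slide does not state a replication factor or a placement rule, so the following is only a sketch under two assumptions: a replication factor of 3, and the simplest placement scheme, in which the extra copies go to the next nodes clockwise on the ring. The names `replicas`, `NODES`, and `END_TOKENS` are hypothetical, and rack- or datacenter-aware placement is not modelled.

```python
from bisect import bisect_left

NODES = ["A", "B", "C", "D"]
# End token of each node's range, in ring order (128-bit tokens).
END_TOKENS = [0x0 << 124, 0x4 << 124, 0x8 << 124, 0xC << 124]


def replicas(token: int, replication_factor: int = 3) -> list:
    """Return the nodes holding a token: the owner, then its clockwise successors."""
    first = bisect_left(END_TOKENS, token) % len(NODES)  # node owning the token
    count = min(replication_factor, len(NODES))          # cannot exceed cluster size
    return [NODES[(first + i) % len(NODES)] for i in range(count)]


# Assumption: the truncated hash for carol is padded with zeros for illustration.
carol = int("a9a0198010".ljust(32, "0"), 16)
print(replicas(carol))  # ['D', 'A', 'B']
```

Because every replica is chosen by the same rule and none is a designated primary, losing one of the three nodes still leaves two copies that can serve the key.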

Page 37: Massively Scalable NoSQL with Apache Cassandra

Highlights

• Adding capacity is application-transparent and requires no downtime
• No SPOF, not even temporarily
• No “primary” replica
• Configurable synchronous/asynchronous
• Tolerates node failure; never have to restart replication “from scratch”
• “Smart” replication avoids correlated failures

Page 38: Massively Scalable NoSQL with Apache Cassandra

CQL: You got SQL in my NoSQL!

    CREATE TABLE users (
        id uuid PRIMARY KEY,
        name text,
        state text,
        birth_date int
    );

    CREATE INDEX ON users(state);

    SELECT * FROM users
    WHERE state='Texas' AND birth_date > 1950;

Page 39: Massively Scalable NoSQL with Apache Cassandra

Strictly “realtime” focused

• No joins
• No subqueries
• No aggregation functions* or GROUP BY
• ORDER BY?

Page 40: Massively Scalable NoSQL with Apache Cassandra

Clustered data in CFS

Page 41: Massively Scalable NoSQL with Apache Cassandra

Clustered data in CFS

Page 42: Massively Scalable NoSQL with Apache Cassandra

Clustering in CQL3

    CREATE TABLE sblocks (
        block_id uuid,
        subblock_id uuid,
        data blob,
        PRIMARY KEY (block_id, subblock_id)
    );

block_id  subblock_id  data
Block1    subblock A   data A
Block1    subblock B   data B
...       ...          ...
Block2    subblock C   data C
Block2    subblock D   data D
...       ...          ...
Block3    subblock E   data E
Block3    subblock F   data F
...       ...          ...
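
To make the compound primary key concrete: in CQL3 the first component (`block_id`) selects the partition, and the remaining component (`subblock_id`) orders the rows within it. The sketch below models that with plain Python dictionaries. It is an in-memory analogy, not how Cassandra stores data; the function names are hypothetical, strings stand in for the `uuid` columns as in the table above, and rows are sorted at read time rather than kept sorted on disk.

```python
from collections import defaultdict

# partition key (block_id) -> {clustering key (subblock_id): data}
table = defaultdict(dict)


def insert(block_id: str, subblock_id: str, data: bytes) -> None:
    """Write one row; a row with the same primary key is overwritten."""
    table[block_id][subblock_id] = data


def read_block(block_id: str) -> list:
    """Return all rows of one partition, ordered by clustering key."""
    return sorted(table.get(block_id, {}).items())


# Rows arrive in arbitrary order...
insert("Block1", "subblock B", b"data B")
insert("Block2", "subblock C", b"data C")
insert("Block1", "subblock A", b"data A")

# ...but a whole block is read back together, in subblock order.
print(read_block("Block1"))
# [('subblock A', b'data A'), ('subblock B', b'data B')]
```

All subblocks of a block live in one partition, so reading a block touches one place instead of scattering across the cluster.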

Page 43: Massively Scalable NoSQL with Apache Cassandra

Collections

    CREATE TABLE users (
        id uuid PRIMARY KEY,
        name text,
        state text,
        birth_date int
    );

    CREATE TABLE users_addresses (
        user_id uuid REFERENCES users,
        email text
    );

    SELECT *
    FROM users NATURAL JOIN users_addresses;

Page 44: Massively Scalable NoSQL with Apache Cassandra

Collections

    CREATE TABLE users (
        id uuid PRIMARY KEY,
        name text,
        state text,
        birth_date int
    );

    CREATE TABLE users_addresses (
        user_id uuid REFERENCES users,
        email text
    );

    SELECT *
    FROM users NATURAL JOIN users_addresses;

[X: this relational approach is crossed out]

Page 45: Massively Scalable NoSQL with Apache Cassandra

Collections

    CREATE TABLE users (
        id uuid PRIMARY KEY,
        name text,
        state text,
        birth_date int,
        email_addresses set<text>
    );

    UPDATE users
    SET email_addresses = email_addresses + {'[email protected]', '[email protected]'};

Page 46: Massively Scalable NoSQL with Apache Cassandra

Big data

Analytics (Hadoop)

Realtime (“NoSQL”)

?

Page 47: Massively Scalable NoSQL with Apache Cassandra

The evolution of Analytics

Analytics + Realtime

Page 48: Massively Scalable NoSQL with Apache Cassandra

The evolution of Analytics

Analytics Realtime

replication

Page 49: Massively Scalable NoSQL with Apache Cassandra

The evolution of Analytics

ETL

Page 50: Massively Scalable NoSQL with Apache Cassandra

Big data

Analytics (Hadoop)

Realtime (Cassandra)

DataStax Enterprise

Page 51: Massively Scalable NoSQL with Apache Cassandra

Page 52: Massively Scalable NoSQL with Apache Cassandra

Better Hadoop than Hadoop

• “Vanilla” Hadoop
  • 8+ services to set up, monitor, back up, and recover (NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker, Zookeeper, Region Server, ...)
  • Single points of failure
  • Can't separate online and offline processing

• DataStax Enterprise
  • Single, simplified component
  • Self-organizes based on workload
  • Peer to peer
  • JobTracker failover

Page 53: Massively Scalable NoSQL with Apache Cassandra

Enterprise search with Solr

    SELECT title FROM solr WHERE solr_query='title:natio*';

    title
    --------------------------------------------------------------------------
    Bolivia national football team 2002
    List of French born footballers who have played for other national teams
    Lithuania national basketball team at Eurobasket 2009
    Bolivia national football team 2000
    Kenya national under-20 football team
    Bolivia national football team 1999
    Israel men's national inline hockey team
    Bolivia national football team 2001

Page 54: Massively Scalable NoSQL with Apache Cassandra

Managing & Monitoring Big Data