Transcript
Page 1: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

©2013 DataStax Confidential. Do not distribute without consent.

@PatrickMcFadin

Patrick McFadin Chief Evangelist/Solution Architect - DataStax

Cassandra : Introduction

Page 2: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Who I am


• Patrick McFadin

• Solution Architect at DataStax

• Cassandra MVP

• User for years

• Follow me for more:

I talk about Cassandra and building scalable, resilient apps ALL THE TIME!

@PatrickMcFadin

Dude. Uptime == $$

Page 3: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Five Years of Cassandra

[Timeline figure: years 0 to 5, starting Jul-08. Releases: 0.1, 0.3, 0.6, 0.7, 1.0, 1.2..., 2.0, and DSE]

Page 4: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Why Cassandra?

Page 5: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

The Best Persistence Tier For Your Application

Page 6: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Cassandra - An introduction

Page 7: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Cassandra - Roots

• Based on the Amazon Dynamo and Google BigTable papers

• Shared nothing

• Data safe as possible

• Predictable scaling


Dynamo

BigTable

Page 8: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Cassandra - More than one server

• All nodes participate in a cluster

• Shared nothing

• Add or remove as needed

• More capacity? Add a server


[Figure: four-node ring; each node owns 25% of the data]
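The 25% figure follows from splitting the token range evenly across four nodes. A minimal sketch, assuming a perfectly balanced ring and a toy token space of 0 to 99; the class and method names are hypothetical and not part of Cassandra:

```java
// Hypothetical sketch of even ownership on a balanced ring (not Cassandra's API).
final class RingShare {
    // Percentage of the data each node owns when the ring is split evenly.
    static double ownershipPercent(int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        return 100.0 / nodeCount;
    }

    // Maps a token in the toy range [0, 100) to the index of the node that owns it.
    static int ownerOf(int token, int nodeCount) {
        if (token < 0 || token >= 100) {
            throw new IllegalArgumentException("token must be in [0, 100)");
        }
        return token * nodeCount / 100;
    }

    public static void main(String[] args) {
        System.out.println(ownershipPercent(4)); // 25.0, as on the slide
        System.out.println(ownershipPercent(5)); // 20.0 after adding a server
        System.out.println(ownerOf(25, 4));      // 1: token 25 falls in the second node's range
    }
}
```

Adding a server shrinks every node's share, which is the "More capacity? Add a server" point.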

Page 9: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Core Concepts Write path

Compacted later

<row,column>

Page 10: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Core Concepts Read Path

Real user story

• New app

• SSDs

• 2.5 m requests

• Client P99: 3.17 ms!

Page 11: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Cassandra - Locally Distributed

• Client writes to any node

• Node coordinates with others

• Data replicated in parallel

• Replication factor: How many copies of your data?

• RF = 3 here
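One way to picture RF = 3 is the node that owns the data plus the next two nodes around the ring. A simplified sketch of that ring walk, assuming a single data center and no rack awareness; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: pick replicas by walking the ring clockwise from the owner.
final class ReplicaPlacement {
    // Returns node indexes holding a copy: the owner plus the next (rf - 1) nodes.
    static List<Integer> replicasFor(int ownerIndex, int nodeCount, int rf) {
        if (rf < 1 || rf > nodeCount) {
            throw new IllegalArgumentException("rf must be between 1 and nodeCount");
        }
        List<Integer> replicas = new ArrayList<>();
        for (int i = 0; i < rf; i++) {
            // Wrap around the end of the ring with modulo.
            replicas.add((ownerIndex + i) % nodeCount);
        }
        return replicas;
    }

    public static void main(String[] args) {
        // Four nodes, RF = 3, data owned by the last node: copies wrap to nodes 0 and 1.
        System.out.println(replicasFor(3, 4, 3)); // [3, 0, 1]
    }
}
```

Real placement strategies also take racks and data centers into account, so treat this as an illustration only.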


Page 12: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Cassandra - Consistency

• Consistency Level (CL)

• Client specifies per read or write


• ALL = All replicas ack

• QUORUM = majority (more than 50%) of replicas ack

• LOCAL_QUORUM = majority (more than 50%) of replicas in the local DC ack

• ONE = Only one replica acks
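A quorum is a majority of replicas, which for a replication factor RF works out to floor(RF / 2) + 1. A small sketch of that arithmetic; the class and method names are made up for illustration:

```java
// Hypothetical helper showing the quorum arithmetic (not a driver API).
final class QuorumMath {
    // Number of replica acks needed for QUORUM: a strict majority.
    static int quorum(int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replicationFactor must be at least 1");
        }
        return replicationFactor / 2 + 1; // integer division floors the half
    }

    // Replicas that can be down while QUORUM operations still succeed.
    static int tolerableFailures(int replicationFactor) {
        return replicationFactor - quorum(replicationFactor);
    }

    public static void main(String[] args) {
        System.out.println(quorum(3));            // 2
        System.out.println(tolerableFailures(3)); // 1
        System.out.println(quorum(5));            // 3
    }
}
```

With RF = 3, two acks are enough, so one replica can be down without the application noticing.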

Page 13: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Cassandra - Transparent to the application

• A single node failure shouldn’t bring failure

• Replication Factor + Consistency Level = Success

• This example:

• RF = 3

• CL = QUORUM


More than 50% ack (2 of 3), so we are good!

Page 14: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

My favorite feature. Ever!

Page 15: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Cassandra - Geographically Distributed

• Client writes local

• Data syncs across WAN

• Replication Factor per DC
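Replication factor per DC is set on the keyspace. A sketch that builds the CQL statement as a string, assuming NetworkTopologyStrategy and two data centers; the names dc_east and dc_west and the helper class are hypothetical, and the keyspace name is borrowed from the connection example in this deck:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that assembles a CREATE KEYSPACE statement with per-DC replication.
final class KeyspaceCql {
    static String createKeyspace(String keyspace, Map<String, Integer> rfByDataCenter) {
        StringJoiner options = new StringJoiner(", ", "{", "}");
        options.add("'class': 'NetworkTopologyStrategy'");
        for (Map.Entry<String, Integer> entry : rfByDataCenter.entrySet()) {
            // One replication factor per data center name.
            options.add("'" + entry.getKey() + "': " + entry.getValue());
        }
        return "CREATE KEYSPACE " + keyspace + " WITH replication = " + options + ";";
    }

    public static void main(String[] args) {
        Map<String, Integer> rf = new LinkedHashMap<>(); // keeps insertion order
        rf.put("dc_east", 3); // assumed data center names
        rf.put("dc_west", 3);
        System.out.println(createKeyspace("videodb", rf));
    }
}
```

The data center names must match what the cluster itself reports, so check them before running the generated statement.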


Page 16: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Cassandra Applications - Drivers

• DataStax Drivers for Cassandra

• Java

• C#

• Python

• More on the way


Page 17: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Cassandra Applications - Connecting

• Create a pool of local servers

• Client just uses session to interact with Cassandra


contactPoints = {"10.0.0.1","10.0.0.2"}
keyspace = "videodb"

public VideoDbBasicImpl(List<String> contactPoints, String keyspace) {

    cluster = Cluster
        .builder()
        .addContactPoints(
            contactPoints.toArray(new String[contactPoints.size()]))
        .withLoadBalancingPolicy(Policies.defaultLoadBalancingPolicy())
        .withRetryPolicy(Policies.defaultRetryPolicy())
        .build();

    session = cluster.connect(keyspace);
}

Page 18: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

CQL Intro

• Cassandra Query Language

• SQL–like language to query Cassandra

• Limited predicates. Attempts to prevent bad queries

• But still offers enough leeway to get into trouble


Page 19: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Data Model Logical containers

Cluster - Contains all nodes. Even across WAN

Keyspace - Contains all tables. Specifies replication

Table (Column Family) - Contains rows

Page 20: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

CQL Intro

• CREATE / DROP / ALTER TABLE

• SELECT


• BUT

• INSERT and UPDATE are similar to each other

• If a row doesn’t exist, UPDATE will insert it, and if it exists, INSERT will replace it.

• Think of it as an UPSERT

• Therefore we never get a key violation

• For updates, Cassandra never reads (no col = col + 1)
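The upsert behavior can be pictured as a write that sets columns without first checking whether the row exists. A minimal in-memory sketch of that idea; this is a stand-in for illustration, not how Cassandra stores data, and the class name is hypothetical:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of upsert semantics: writes never read first.
final class UpsertTable {
    private final Map<String, Map<String, String>> rows = new HashMap<>();

    // Both INSERT and UPDATE behave like this: set the given columns, creating the row if needed.
    void write(String key, Map<String, String> columns) {
        rows.computeIfAbsent(key, k -> new HashMap<>()).putAll(columns);
    }

    // Returns the row's columns, or an empty map when the key was never written.
    Map<String, String> read(String key) {
        return rows.getOrDefault(key, Collections.emptyMap());
    }

    public static void main(String[] args) {
        UpsertTable users = new UpsertTable();
        // "UPDATE" on a missing row creates it.
        users.write("pmcfadin", Collections.singletonMap("firstname", "Pat"));
        // "INSERT" on an existing row overwrites, with no key violation.
        users.write("pmcfadin", Collections.singletonMap("firstname", "Patrick"));
        System.out.println(users.read("pmcfadin").get("firstname")); // Patrick
    }
}
```

Because a write never reads the old value, an expression such as col = col + 1 has nothing to build on, which is the point of the last bullet.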


Page 21: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Data Modeling Creating Tables

CREATE TABLE shopping_cart (
    username varchar,
    cart_name text,
    item_id int,
    item_name varchar,
    description varchar,
    price float,
    item_detail map<varchar,varchar>,
    PRIMARY KEY ((username,cart_name),item_id)
);

Creates compound partition row key

CREATE TABLE user (
    username varchar,
    firstname varchar,
    lastname varchar,
    shopping_carts set<varchar>,
    PRIMARY KEY (username)
);

Collection!

Page 22: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

CQL Inserts

• Insert will always overwrite


INSERT INTO users (username, firstname, lastname,
    email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
    ['[email protected]'],'ba27e03fd95e507daf2937c937d499ab',
    '2011-06-20 13:50:00');

Page 23: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

CQL Selects

• No joins

• Data is returned in row/column format


SELECT username, firstname, lastname,
    email, password, created_date
FROM users
WHERE username = 'pmcfadin';

 username | firstname | lastname | email                 | password                         | created_date
----------+-----------+----------+-----------------------+----------------------------------+--------------------------
 pmcfadin | Patrick   | McFadin  | ['[email protected]'] | ba27e03fd95e507daf2937c937d499ab | 2011-06-20 13:50:00-0700

Page 24: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Cassandra and Time Series

Page 25: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Time Series Taming the beast

• Peter Higgs and Francois Englert. Nobel prize for Physics

• Theorized the existence of the Higgs boson

• Found using ATLAS

• Data stored in P-BEAST

• Time series running on Cassandra

Page 26: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Use Cassandra for time series

Get a Nobel prize

Page 27: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Time Series Why

• Storage model from BigTable is perfect

• One row key and tons of (variable) columns

• Single layout on disk

 Row Key | Column Name  | Column Name
         | Column Value | Column Value

Page 28: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Time Series Example

• Storing weather data

• One weather station

• Temperature measurements every minute

 WeatherStation ID | 2013-10-09 10:00 AM | 2013-10-09 10:00 AM | 2013-10-10 11:00 AM
                   | 72 Degrees          | 72 Degrees          | 65 Degrees

Page 29: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Time Series Example

• Query data

• Weather Station ID = Locality of single node

 WeatherStation ID 100 | 2013-10-09 10:00 AM | 2013-10-09 10:00 AM | 2013-10-10 11:00 AM
                       | 72 Degrees          | 72 Degrees          | 65 Degrees

Date query:
weatherStationID = 100 AND date = 2013-10-09 10:00 AM

OR

Date range:
weatherStationID = 100 AND date > 2013-10-09 10:00 AM AND date < 2013-10-10 11:01 AM

Page 30: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Time Series How

• CQL expresses this well

• Data partitioned by weather station ID, ordered by time

CREATE TABLE temperature (
    weatherstation_id text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY (weatherstation_id,event_time)
);

• Easy to insert data

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:01:00','72F');

• Easy to query

SELECT temperature
FROM temperature
WHERE weatherstation_id='1234ABCD'
AND event_time > '2013-04-03 07:01:00'
AND event_time < '2013-04-03 07:04:00';
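The range query is cheap because readings for one station sit together, sorted by time. A sketch that models this with a sorted map per station; it is an in-memory analogy rather than Cassandra's storage engine, the class name is hypothetical, and the readings other than '72F' are made-up sample values:

```java
import java.time.LocalDateTime;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory analogy: one sorted map of readings per weather station.
final class TemperatureStore {
    private final Map<String, NavigableMap<LocalDateTime, String>> partitions = new HashMap<>();

    void insert(String stationId, LocalDateTime eventTime, String temperature) {
        partitions.computeIfAbsent(stationId, k -> new TreeMap<>()).put(eventTime, temperature);
    }

    // Mirrors "event_time > from AND event_time < to": both bounds are exclusive.
    SortedMap<LocalDateTime, String> range(String stationId, LocalDateTime from, LocalDateTime to) {
        NavigableMap<LocalDateTime, String> rows = partitions.get(stationId);
        if (rows == null) {
            return new TreeMap<>();
        }
        return rows.subMap(from, false, to, false);
    }

    public static void main(String[] args) {
        TemperatureStore store = new TemperatureStore();
        store.insert("1234ABCD", LocalDateTime.of(2013, 4, 3, 7, 1), "72F");
        store.insert("1234ABCD", LocalDateTime.of(2013, 4, 3, 7, 2), "73F"); // sample value
        store.insert("1234ABCD", LocalDateTime.of(2013, 4, 3, 7, 3), "73F"); // sample value
        store.insert("1234ABCD", LocalDateTime.of(2013, 4, 3, 7, 4), "74F"); // sample value
        // Same bounds as the SELECT above: 07:01 and 07:04 are excluded.
        System.out.println(store.range("1234ABCD",
                LocalDateTime.of(2013, 4, 3, 7, 1),
                LocalDateTime.of(2013, 4, 3, 7, 4)).size()); // 2
    }
}
```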

Page 31: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Time Series Further partitioning

• At one reading every minute, a single row will eventually run out of columns

• 2 billion columns per storage row

• Data partitioned by weather station ID and time

• Use the partition key to split things up

CREATE TABLE temperature_by_day (
    weatherstation_id text,
    date text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY ((weatherstation_id,date),event_time)
);
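The date column is just the day part of event_time, so the application can derive it on every write. A sketch of that derivation plus the arithmetic behind the row limit, taking the 2 billion figure from the slide as given; the class and method names are hypothetical:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for the day bucket used in temperature_by_day.
final class DayBucket {
    // Limit quoted on the slide: 2 billion columns per storage row.
    static final long MAX_COLUMNS_PER_ROW = 2_000_000_000L;

    // Value for the "date" column, for example 2013-04-03.
    static String dateBucket(LocalDateTime eventTime) {
        return eventTime.toLocalDate().format(DateTimeFormatter.ISO_LOCAL_DATE);
    }

    // Whole days a single unbucketed row lasts at a given number of readings per day.
    static long daysUntilFull(long readingsPerDay) {
        if (readingsPerDay <= 0) {
            throw new IllegalArgumentException("readingsPerDay must be positive");
        }
        return MAX_COLUMNS_PER_ROW / readingsPerDay;
    }

    public static void main(String[] args) {
        System.out.println(dateBucket(LocalDateTime.of(2013, 4, 3, 7, 1))); // 2013-04-03
        System.out.println(daysUntilFull(1_440L));      // 1388888 days at one reading per minute
        System.out.println(daysUntilFull(86_400_000L)); // 23 days at one reading per millisecond
    }
}
```

At one reading per minute the limit is roughly 3,800 years away, but faster sensors reach it quickly, and a day bucket keeps each partition small either way.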

Page 32: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Time Series Further Partitioning

• Still easy to insert

INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature)
VALUES ('1234ABCD','2013-04-03','2013-04-03 07:01:00','72F');

• Still easy to query

SELECT temperature
FROM temperature_by_day
WHERE weatherstation_id='1234ABCD'
AND date='2013-04-03'
AND event_time > '2013-04-03 07:01:00'
AND event_time < '2013-04-03 07:04:00';

Page 33: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Time Series Use cases

• Logging

• Thing Tracking (IoT)

• Sensor Data

• User Tracking

• Fraud Detection

• Nobel prizes!

Page 34: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Application Example - Layout

• Active-Active

• Service based DNS routing


Cassandra Replication

Page 35: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Application Example - Uptime


• Normal server maintenance

• Application is unaware

Cassandra Replication

Page 36: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Application Example - Failure


• Data center failure

• Data is safe. Route traffic.


Another happy user!

Page 37: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Cassandra Users and Use Cases

Page 38: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Netflix

• If you haven’t heard their story… where have you been?

• $18B market cap. Runs on Cassandra

• User accounts

• Playlists

• Payments

• Statistics

Page 39: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Spotify

• Millions of songs. Millions of users.

• Playlists

• 1 billion playlists

• 30+ Cassandra clusters

• 50+ TB of data

• 40k req/sec peak


http://www.slideshare.net/noaresare/cassandra-nyc

Page 40: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Instagram (Facebook)

• Loads and loads of photos. (Probably yours)

• All in AWS

• Security audits

• News feed

• 20k writes/sec. 15k reads/sec.


Page 41: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

DataStax Ac*demy for Apache Cassandra

Goals

• 100,000 Registrations by the end of 2014

• 25,000 Certifications by the end of 2014

Content

• First four sessions available, with weekly roll-out of 7 sessions total

• Based on DataStax Community Edition

• CQL, Schema Design and Data Modeling

• Introduction to Cassandra Objects

• First Java, then Python, C# and .NET

https://datastaxacademy.elogiclearning.com/
