©2013 DataStax Confidential. Do not distribute without consent. @PatrickMcFadin Patrick McFadin Chief Evangelist/Solution Architect - DataStax Cassandra : Introduction

Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin


DESCRIPTION

Video: http://youtu.be/B-bTPSwhsDY

Abstract

Patrick McFadin (@PatrickMcFadin), Chief Evangelist for Apache Cassandra at DataStax, will present an introduction to Cassandra as a key player in database technologies. Companies large and small choose Apache Cassandra as their database solution, and Patrick will explain why they made that choice. He will also discuss Cassandra's architecture, including data modeling, time-series storage, and replication strategies, providing a holistic overview of how Cassandra works and the best way to get started.

About Patrick McFadin

Prior to working for DataStax, Patrick was the Chief Architect at Hobsons, an education services company. His responsibilities included ensuring product availability and scaling for all higher education products. Prior to this position, he was the Director of Engineering at Hobsons, which he joined after it acquired his company, Link-11 Systems, a software services company. While at Link-11 Systems, he built the first widely popular CRM system for universities, Connect. He obtained a BS in Computer Engineering from Cal Poly, San Luis Obispo and holds the distinction of being the only recipient of a medal (as far as anyone can find out) for hacking while serving in the US Navy.


Page 1: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

©2013 DataStax Confidential. Do not distribute without consent.

@PatrickMcFadin

Patrick McFadin Chief Evangelist/Solution Architect - DataStax

Cassandra : Introduction

Page 2: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Who I am

• Patrick McFadin

• Solution Architect at DataStax

• Cassandra MVP

• User for years

• Follow me for more:

I talk about Cassandra and building scalable, resilient apps ALL THE TIME!

@PatrickMcFadin

Dude. Uptime == $$

Page 3: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Five Years of Cassandra

[Figure: release timeline starting Jul-08, years 0 to 5: versions 0.1, 0.3, 0.6, 0.7, 1.0, 1.2..., 2.0, and DSE]

Page 4: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Why Cassandra?

Page 5: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

The Best Persistence Tier For Your Application

Page 6: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Cassandra - An introduction

Page 7: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Cassandra - Roots

• Based on the Amazon Dynamo and Google BigTable papers

• Shared nothing

• Data as safe as possible

• Predictable scaling


Dynamo

BigTable

Page 8: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Cassandra - More than one server

• All nodes participate in a cluster

• Shared nothing

• Add or remove as needed

• More capacity? Add a server

[Figure: four-node ring; each node owns 25% of the data]
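The even split above can be sketched with a toy token ring. This is a minimal illustration, not Cassandra's partitioner: the class `ToyRing`, the node names, and the 0..99 token space are assumptions made up for the example, and `Math.floorMod(key.hashCode(), tokenSpace)` stands in for the real hash function.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of a ring of nodes (hypothetical; not Cassandra's partitioner).
// Assumes tokenSpace divides evenly by the number of nodes.
final class ToyRing {
    // Each entry maps the last token a node owns to the node's name.
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int tokenSpace;

    ToyRing(int tokenSpace, String... nodes) {
        this.tokenSpace = tokenSpace;
        int step = tokenSpace / nodes.length;
        for (int i = 0; i < nodes.length; i++) {
            // Node i owns tokens (i * step) .. ((i + 1) * step - 1).
            ring.put((i + 1) * step - 1, nodes[i]);
        }
    }

    // Stand-in hash: maps a key to a token in [0, tokenSpace).
    int tokenFor(String key) {
        return Math.floorMod(key.hashCode(), tokenSpace);
    }

    // The owner is the first node whose upper token is >= the key's token.
    String ownerOf(String key) {
        Map.Entry<Integer, String> e = ring.ceilingEntry(tokenFor(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // Fraction of the token space owned by the given node.
    double ownership(String node) {
        int previous = -1;
        for (Map.Entry<Integer, String> e : ring.entrySet()) {
            if (e.getValue().equals(node)) {
                return (e.getKey() - previous) / (double) tokenSpace;
            }
            previous = e.getKey();
        }
        return 0.0;
    }

    public static void main(String[] args) {
        ToyRing r = new ToyRing(100, "node1", "node2", "node3", "node4");
        System.out.println("pmcfadin -> " + r.ownerOf("pmcfadin"));
        System.out.println("node1 owns " + r.ownership("node1"));
    }
}
```

Adding a fifth node to the constructor call shrinks every node's share to 20%, which is the "add a server" idea in miniature.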

Page 9: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Core Concepts Write path

[Figure: write path diagram; labels: <row,column>, "Compacted later"]

Page 10: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Core Concepts Read Path

Real user story

• New app

• SSDs

• 2.5 m requests

• Client P99: 3.17ms!

Page 11: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Cassandra - Locally Distributed

• Client writes to any node

• Node coordinates with others

• Data replicated in parallel

• Replication factor: How many copies of your data?

• RF = 3 here


Page 12: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Cassandra - Consistency

• Consistency Level (CL)

• Client specifies per read or write

• ALL = All replicas ack

• QUORUM = A majority (more than 50%) of replicas ack

• LOCAL_QUORUM = A majority (more than 50%) of replicas in the local DC ack

• ONE = Only one replica acks

Page 13: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Cassandra - Transparent to the application

• A single node failure shouldn’t bring down the application

• Replication Factor + Consistency Level = Success

• This example:

• RF = 3

• CL = QUORUM

A majority (2 of 3) ack, so we are good!
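The arithmetic behind this example can be sketched as follows. A quorum is a majority of the replicas, that is floor(RF / 2) + 1. The class and method names below are made up for illustration and are not part of any driver API; LOCAL_QUORUM is left out, but it would apply the same formula to the replication factor of the local data center.

```java
// Sketch of replication-factor and consistency-level arithmetic (names are hypothetical).
final class QuorumMath {
    // A quorum is a strict majority of the replicas.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Number of replicas that must acknowledge for the given consistency level.
    static int requiredAcks(String consistencyLevel, int replicationFactor) {
        switch (consistencyLevel) {
            case "ONE":    return 1;
            case "QUORUM": return quorum(replicationFactor);
            case "ALL":    return replicationFactor;
            default: throw new IllegalArgumentException("unsupported: " + consistencyLevel);
        }
    }

    // True if the request can still be satisfied with some replicas down.
    static boolean succeeds(String consistencyLevel, int replicationFactor, int replicasDown) {
        int alive = replicationFactor - replicasDown;
        return alive >= requiredAcks(consistencyLevel, replicationFactor);
    }

    public static void main(String[] args) {
        // RF = 3, CL = QUORUM, one replica down: 2 of 3 can still ack.
        System.out.println(succeeds("QUORUM", 3, 1));
        // Same cluster at CL = ALL: the single failure blocks the request.
        System.out.println(succeeds("ALL", 3, 1));
    }
}
```

With RF = 3, QUORUM tolerates one failed replica and ALL tolerates none, which is why the pairing of replication factor and consistency level decides whether a node failure is visible to the application.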

Page 14: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

My favorite feature. Ever!

Page 15: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Cassandra - Geographically Distributed

• Client writes local

• Data syncs across WAN

• Replication Factor per DC


Page 16: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Cassandra Applications - Drivers

• DataStax Drivers for Cassandra

• Java

• C#

• Python

• More on the way

Page 17: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Cassandra Applications - Connecting

• Create a pool of local servers

• Client just uses session to interact with Cassandra

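// Example arguments: contactPoints = ["10.0.0.1", "10.0.0.2"], keyspace = "videodb"
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.Policies;
import java.util.List;

public class VideoDbBasicImpl {
    private final Cluster cluster;
    private final Session session;

    public VideoDbBasicImpl(List<String> contactPoints, String keyspace) {
        // The driver uses the contact points to discover the rest of the cluster.
        cluster = Cluster
            .builder()
            .addContactPoints(contactPoints.toArray(new String[contactPoints.size()]))
            .withLoadBalancingPolicy(Policies.defaultLoadBalancingPolicy())
            .withRetryPolicy(Policies.defaultRetryPolicy())
            .build();

        // The client uses this session for all reads and writes.
        session = cluster.connect(keyspace);
    }
}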

Page 18: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

CQL Intro

• Cassandra Query Language

• SQL-like language to query Cassandra

• Limited predicates. Attempts to prevent bad queries

• But still offers enough leeway to get into trouble


Page 19: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Data Model Logical containers

Cluster - Contains all nodes. Even across WAN

Keyspace - Contains all tables. Specifies replication

Table (Column Family) - Contains rows

Page 20: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

CQL Intro

• CREATE / DROP / ALTER TABLE

• SELECT


• BUT

• INSERT and UPDATE are similar to each other

• If a row doesn’t exist, UPDATE will insert it, and if it exists, INSERT will replace it.

• Think of it as an UPSERT

• Therefore we never get a key violation

• For updates, Cassandra never reads (no col = col + 1)
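The upsert behavior above can be modeled with an in-memory map keyed by primary key. This is only a toy model under stated assumptions: `ToyTable`, `insert`, and `update` are hypothetical names, and real Cassandra resolves conflicting writes per column using write timestamps, which the sketch leaves out.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of upsert semantics (hypothetical; not how Cassandra stores data).
final class ToyTable {
    // primary key -> (column name -> value)
    private final Map<String, Map<String, String>> rows = new HashMap<>();

    // Both operations write the given columns the same way,
    // whether or not the row already exists.
    private void write(String key, Map<String, String> columns) {
        rows.computeIfAbsent(key, k -> new HashMap<>()).putAll(columns);
    }

    void insert(String key, Map<String, String> columns) { write(key, columns); }

    void update(String key, Map<String, String> columns) { write(key, columns); }

    String get(String key, String column) {
        Map<String, String> row = rows.get(key);
        return row == null ? null : row.get(column);
    }

    public static void main(String[] args) {
        ToyTable users = new ToyTable();
        // The row does not exist yet, so the update creates it.
        users.update("pmcfadin", Map.of("firstname", "Pat"));
        // The row exists, so the insert overwrites the column; no key violation.
        users.insert("pmcfadin", Map.of("firstname", "Patrick"));
        System.out.println(users.get("pmcfadin", "firstname"));
    }
}
```

Neither method reads the old value before writing, which mirrors why an expression such as col = col + 1 is not available on regular columns.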


Page 21: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Data Modeling Creating Tables

CREATE TABLE shopping_cart (
    username varchar,
    cart_name text,
    item_id int,
    item_name varchar,
    description varchar,
    price float,
    item_detail map<varchar,varchar>,
    PRIMARY KEY ((username,cart_name),item_id)
);

Creates a compound partition (row) key

CREATE TABLE user (
    username varchar,
    firstname varchar,
    lastname varchar,
    shopping_carts set<varchar>,
    PRIMARY KEY (username)
);

Collection!

Page 22: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

CQL Inserts

• Insert will always overwrite

INSERT INTO users (username, firstname, lastname,
    email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
    ['[email protected]'],'ba27e03fd95e507daf2937c937d499ab',
    '2011-06-20 13:50:00');

Page 23: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

CQL Selects

• No joins

• Data is returned in row/column format

SELECT username, firstname, lastname,
    email, password, created_date
FROM users
WHERE username = 'pmcfadin';

 username | firstname | lastname | email                 | password                         | created_date
----------+-----------+----------+-----------------------+----------------------------------+--------------------------
 pmcfadin | Patrick   | McFadin  | ['[email protected]'] | ba27e03fd95e507daf2937c937d499ab | 2011-06-20 13:50:00-0700

Page 24: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Cassandra and Time Series

Page 25: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Time Series Taming the beast

• Peter Higgs and François Englert. Nobel Prize in Physics

• Theorized the existence of the Higgs boson

• Found using ATLAS

• Data stored in P-BEAST

• Time series running on Cassandra

Page 26: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Use Cassandra for time series

Get a Nobel Prize

Page 27: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Time Series Why

• Storage model from BigTable is perfect

• One row key and tons of (variable) columns

• Single layout on disk

Row Key | Column Name  | Column Name
        | Column Value | Column Value

Page 28: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Time Series Example

• Storing weather data

• One weather station

• Temperature measurements every minute

WeatherStation ID | 2013-10-09 10:00 AM | 2013-10-09 10:00 AM | 2013-10-10 11:00 AM
                  | 72 Degrees          | 72 Degrees          | 65 Degrees

Page 29: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Time Series Example

• Query data

• Weather Station ID = Locality of single node

WeatherStation ID 100 | 2013-10-09 10:00 AM | 2013-10-09 10:00 AM | 2013-10-10 11:00 AM
                      | 72 Degrees          | 72 Degrees          | 65 Degrees

Date query:

weatherStationID = 100 AND
date = 2013-10-09 10:00 AM

OR

Date range:

weatherStationID = 100 AND
date > 2013-10-09 10:00 AM AND
date < 2013-10-10 11:01 AM
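The two query shapes above map naturally onto a sorted structure: one row key pointing at columns kept in time order. The following is a minimal sketch, assuming a `TreeMap` as a stand-in for the on-disk layout. The class name `ToyTimeSeriesRow` is made up, and the 10:01 reading is an assumed sample value added so the row has three distinct columns.

```java
import java.time.LocalDateTime;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of one storage row: columns sorted by timestamp (hypothetical names).
final class ToyTimeSeriesRow {
    private final NavigableMap<LocalDateTime, String> columns = new TreeMap<>();

    void put(LocalDateTime eventTime, String temperature) {
        columns.put(eventTime, temperature);
    }

    // Date query: exact match on the column name.
    String at(LocalDateTime eventTime) {
        return columns.get(eventTime);
    }

    // Date range: every column strictly between the two bounds.
    NavigableMap<LocalDateTime, String> between(LocalDateTime after, LocalDateTime before) {
        return columns.subMap(after, false, before, false);
    }

    public static void main(String[] args) {
        ToyTimeSeriesRow station100 = new ToyTimeSeriesRow();
        station100.put(LocalDateTime.of(2013, 10, 9, 10, 0), "72 Degrees");
        station100.put(LocalDateTime.of(2013, 10, 9, 10, 1), "72 Degrees"); // assumed sample
        station100.put(LocalDateTime.of(2013, 10, 10, 11, 0), "65 Degrees");

        System.out.println(station100.at(LocalDateTime.of(2013, 10, 9, 10, 0)));
        System.out.println(station100.between(
                LocalDateTime.of(2013, 10, 9, 10, 0),
                LocalDateTime.of(2013, 10, 10, 11, 1)));
    }
}
```

Because the columns are already sorted, a range query is a contiguous slice of one row rather than a scan, which is the property that makes this layout a good fit for time series.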

Page 30: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Time Series How

• CQL expresses this well

• Data partitioned by weather station ID and ordered by time within each partition

CREATE TABLE temperature (
    weatherstation_id text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY (weatherstation_id,event_time)
);

• Easy to insert data

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:01:00','72F');

• Easy to query

SELECT temperature
FROM temperature
WHERE weatherstation_id='1234ABCD'
AND event_time > '2013-04-03 07:01:00'
AND event_time < '2013-04-03 07:04:00';

Page 31: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Time Series Further partitioning

• At one reading every minute, a single storage row keeps growing and will eventually run out of columns

• 2 billion columns per storage row

• Data partitioned by weather station ID and date

• Use the partition key to split things up

CREATE TABLE temperature_by_day (
    weatherstation_id text,
    date text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY ((weatherstation_id,date),event_time)
);
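Deriving the date part of the partition key in the application can be sketched as below. This is an illustration under assumptions: the class `DayBucket` and its method names are hypothetical, and the yyyy-MM-dd text format is one reasonable choice for the `date text` column, not something Cassandra mandates.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Helper for building the (weatherstation_id, date) partition key (hypothetical names).
final class DayBucket {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ISO_LOCAL_DATE;

    // Truncates an event time to its calendar day, for use in the "date" text column.
    static String dayBucket(LocalDateTime eventTime) {
        return eventTime.toLocalDate().format(DAY);
    }

    // One reading per minute gives at most this many cells per daily partition.
    static int readingsPerDay() {
        return 24 * 60;
    }

    public static void main(String[] args) {
        LocalDateTime eventTime = LocalDateTime.of(2013, 4, 3, 7, 1, 0);
        System.out.println(dayBucket(eventTime));
        System.out.println(readingsPerDay());
    }
}
```

Writers and readers need to compute the bucket the same way, including the same time zone, or a query will look in the wrong partition.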

Page 32: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Time Series Further Partitioning

• Still easy to insert

INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature)
VALUES ('1234ABCD','2013-04-03','2013-04-03 07:01:00','72F');

• Still easy to query

SELECT temperature
FROM temperature_by_day
WHERE weatherstation_id='1234ABCD'
AND date='2013-04-03'
AND event_time > '2013-04-03 07:01:00'
AND event_time < '2013-04-03 07:04:00';

Page 33: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Time Series Use cases

• Logging

• Thing Tracking (IoT)

• Sensor Data

• User Tracking

• Fraud Detection

• Nobel Prizes!

Page 34: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Application Example - Layout

• Active-Active

• Service based DNS routing


Cassandra Replication

Page 35: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Application Example - Uptime


• Normal server maintenance

• Application is unaware

Cassandra Replication

Page 36: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Application Example - Failure


• Data center failure

• Data is safe. Route traffic.

33

Another happy user!

Page 37: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Cassandra Users and Use Cases

Page 38: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Netflix!

• If you haven’t heard their story… where have you been?

• 18B market cap. Runs on Cassandra

• User accounts

• Play lists

• Payments

• Statistics

Page 39: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Spotify

• Millions of songs. Millions of users.

• Playlists

• 1 billion playlists

• 30+ Cassandra clusters

• 50+ TB of data

• 40k req/sec peak


http://www.slideshare.net/noaresare/cassandra-nyc

Page 40: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Instagram (Facebook)

• Loads and loads of photos. (Probably yours)

• All in AWS

• Security audits

• News feed

• 20k writes/sec. 15k reads/sec.


Page 41: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

DataStax Ac*demy for Apache Cassandra

Goals

• 100,000 Registrations by the end of 2014

• 25,000 Certifications by the end of 2014

Content

• First four sessions available, with weekly roll-out of 7 sessions total

• Based on DataStax Community Edition

• CQL, Schema Design and Data Modeling

• Introduction to Cassandra Objects

• First Java, then Python, C# and .NET

https://datastaxacademy.elogiclearning.com/

Page 42: Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

©2013 DataStax Confidential. Do not distribute without consent.