Hey Relational Developer, Let's Go Crazy (Patrick McFadin, DataStax) | Cassandra Summit 2016

@PatrickMcFadin

Patrick McFadin
Chief Evangelist for Apache Cassandra, DataStax

Hey relational developer, let's go crazy

Why do you develop?

value = Business.add(you)

KillrVideo

https://killrvideo.github.io/

Major areas to cover

• Connecting to the database
• Inserting Data
• Selecting Data
• Indexing
• Data Locality

WARNING

Connecting to the database

Cluster cluster;
Session session;

// Connect to the cluster and keyspace "killrvideo".
// Contact points can be IP addresses or host names (NODE1, NODE2).
cluster = Cluster.builder()
    .addContactPoints("192.168.0.1", "192.168.0.2")
    .build();
session = cluster.connect("killrvideo");

WARNING

// Name the local data center explicitly
Cluster cluster = Cluster.builder()
    .addContactPoints("192.168.0.1", "192.168.0.2")
    .withLoadBalancingPolicy(
        DCAwareRoundRobinPolicy.builder()
            .withLocalDc("myLocalDC")
            .build())
    .build();

Multi-DC

East: < 1ms    West: > 70ms

I wonder why I have random slow queries?

Inserting Data

Inserting data

CREATE TABLE video_ratings_by_user (
    videoid uuid,
    userid uuid,
    rating int,
    PRIMARY KEY (videoid, userid)
);

INSERT INTO video_ratings_by_user (videoid, userid)
VALUES (?, ?);

Inserting data

• Batch in the same partition is great
• Pay attention to the partition key

BEGIN BATCH
  INSERT INTO comments_by_video (videoid, userid, commentid, comment)
  VALUES (99051fe9-6a9c-46c2-b949-38ef78858dd0, d0f60aa8-54a9-4840-b70c-fe562b68842b, now(), 'Worst. Video. Ever.');

  …100 Inserts later…

  INSERT INTO comments_by_video (videoid, userid, commentid, comment)
  VALUES (99051fe9-6a9c-46c2-b949-38ef78858dd0, d0f60aa8-54a9-4840-b70c-fe562b68842b, now(), 'Worst. Video. Ever.');
APPLY BATCH;

Batches: The bad

BEGIN BATCH
  1000 inserts
APPLY BATCH;

[Diagram: the Client sends the batch to a four-node ring: 10.0.0.1 (00-25), 10.0.0.2 (26-50), 10.0.0.3 (51-75), 10.0.0.4 (76-100)]

WARNING
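
To make the fan-out concrete, here is a minimal, self-contained Java sketch. It uses no driver. It assumes the simplified 0-100 token ring from the diagram; `tokenFor` is a hypothetical stand-in for Cassandra's real partitioner, and the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

class BatchFanOut {
    // Node addresses from the slide; each owns one slice of a toy 0-100 token ring.
    static final String[] NODES = {"10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"};

    // Hypothetical stand-in for the partitioner: maps a partition key to a token in 0..100.
    static int tokenFor(UUID partitionKey) {
        return Math.floorMod(partitionKey.hashCode(), 101);
    }

    // Owner by token range: 0-25, 26-50, 51-75, 76-100.
    static String ownerOf(int token) {
        if (token <= 25) return NODES[0];
        if (token <= 50) return NODES[1];
        if (token <= 75) return NODES[2];
        return NODES[3];
    }

    // Groups the partition keys of a batch by the node that owns them.
    static Map<String, List<UUID>> nodesTouched(List<UUID> partitionKeys) {
        Map<String, List<UUID>> byNode = new TreeMap<>();
        for (UUID key : partitionKeys) {
            byNode.computeIfAbsent(ownerOf(tokenFor(key)), n -> new ArrayList<>()).add(key);
        }
        return byNode;
    }

    public static void main(String[] args) {
        // 1000 inserts against one partition: a single node owns every row.
        UUID video = UUID.fromString("99051fe9-6a9c-46c2-b949-38ef78858dd0");
        List<UUID> samePartition = new ArrayList<>();
        for (int i = 0; i < 1000; i++) samePartition.add(video);
        System.out.println("same partition, nodes touched: " + nodesTouched(samePartition).size());

        // 1000 inserts with distinct partition keys: the work spreads across the ring.
        List<UUID> manyPartitions = new ArrayList<>();
        for (int i = 0; i < 1000; i++) manyPartitions.add(new UUID(i, i * 31L));
        System.out.println("many partitions, nodes touched: " + nodesTouched(manyPartitions).size());
    }
}
```

A batch whose rows share one partition key lands on one owner. A batch with many partition keys forces the coordinator to reach across the ring before it can answer the client.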

Prepared Statements

• Built for speed and efficiency

How they work: Prepare

SELECT * FROM user WHERE id = ?

[Diagram: the Client sends Prepare to a four-node ring: 10.0.0.1 (00-25), 10.0.0.2 (26-50), 10.0.0.3 (51-75), 10.0.0.4 (76-100). Steps shown: Prepare → Parsed → Hashed → Cached → Prepared Statement]

How they work: Bind

id = 1 + PreparedStatement Hash

[Diagram: the Client sends Bind & Execute to a four-node ring: 10.0.0.1 (00-25), 10.0.0.2 (26-50), 10.0.0.3 (51-75), 10.0.0.4 (76-100). Steps shown: Combine Pre-parsed Query and Variable → Execute]
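
The prepare and bind steps can be sketched as a toy in-memory model. This is an illustration, not driver or server code: `PreparedStatementSketch` and its methods are hypothetical names, "parsing" is reduced to trimming the query text, and MD5 is used only as an example hash.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;

class PreparedStatementSketch {
    // Server-side cache: statement id (hash of the query text) -> pre-parsed query.
    private final Map<String, String> cache = new HashMap<>();
    private int parseCount = 0;

    // Prepare: parse once, hash the query text, cache the result, hand back the id.
    String prepare(String cql) {
        String id = hash(cql);
        cache.computeIfAbsent(id, k -> {
            parseCount++;          // parsing happens only on a cache miss
            return cql.trim();     // stand-in for real parsing
        });
        return id;
    }

    // Bind & execute: the client sends only the id and the values.
    String execute(String id, Object... values) {
        String parsed = cache.get(id);
        if (parsed == null) {
            throw new IllegalStateException("unknown statement id, prepare it first");
        }
        String bound = parsed;
        for (Object value : values) {
            bound = bound.replaceFirst("\\?", Matcher.quoteReplacement(String.valueOf(value)));
        }
        return bound;
    }

    int parseCount() {
        return parseCount;
    }

    // Hex-encoded MD5 of the query text.
    static String hash(String text) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(text.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PreparedStatementSketch server = new PreparedStatementSketch();
        String id = server.prepare("SELECT * FROM user WHERE id = ?");
        System.out.println(server.execute(id, 1));
        System.out.println(server.execute(id, 2));
        System.out.println("times parsed: " + server.parseCount());
    }
}
```

The point of the model: the query text crosses the wire and gets parsed once, and every later execution carries only the hash and the bound values.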

Selecting Data

Getting data

• Always use a partition key
• Need JSON? Just ask
• Order of clustering columns matters

CREATE TABLE IF NOT EXISTS user_videos (
    userid uuid,
    added_date timestamp,
    videoid uuid,
    name text,
    preview_image_location text,
    PRIMARY KEY (userid, added_date, videoid)
) WITH CLUSTERING ORDER BY (added_date DESC, videoid ASC);

SELECT * FROM user_videos
WHERE userid = ?;

SELECT * FROM user_videos
WHERE userid = ? AND added_date = ?;

-- Rejected as written: videoid is restricted, but added_date, the clustering column before it, is not
SELECT * FROM user_videos
WHERE userid = ? AND videoid = ?;

SELECT JSON * FROM user_videos
WHERE userid = ?;
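
The clustering-order rule behind these queries can be sketched as a small check. This is a deliberately simplified model, assuming equality restrictions only, no ALLOW FILTERING, and no secondary indexes; `ClusteringOrderCheck` and `isServable` are hypothetical names, not part of any Cassandra API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ClusteringOrderCheck {
    // Returns true when the restricted columns cover the whole partition key
    // and form an unbroken prefix of the clustering columns.
    static boolean isServable(List<String> partitionKey,
                              List<String> clustering,
                              Set<String> restricted) {
        // Rule 1: every partition key column must be restricted.
        if (!restricted.containsAll(partitionKey)) return false;

        // Rule 2: only key columns may be restricted in this simplified model.
        Set<String> keyColumns = new HashSet<>(partitionKey);
        keyColumns.addAll(clustering);
        if (!keyColumns.containsAll(restricted)) return false;

        // Rule 3: no clustering column may be restricted after an unrestricted one.
        boolean gapSeen = false;
        for (String column : clustering) {
            if (restricted.contains(column)) {
                if (gapSeen) return false;
            } else {
                gapSeen = true;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> pk = Arrays.asList("userid");
        List<String> ck = Arrays.asList("added_date", "videoid");

        System.out.println(isServable(pk, ck, new HashSet<>(Arrays.asList("userid"))));
        System.out.println(isServable(pk, ck, new HashSet<>(Arrays.asList("userid", "added_date"))));
        System.out.println(isServable(pk, ck, new HashSet<>(Arrays.asList("userid", "videoid"))));
    }
}
```

Applied to `user_videos`, the first two restriction sets pass and the third fails, because `videoid` is restricted while `added_date` is skipped.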

Getting data

• CQLSH trace facility is your friend
• Watch the logs. Filter for warnings

SELECT * FROM videos;

SELECT * FROM videos ALLOW FILTERING;

WARNING

SELECT * FROM videos
WHERE key IN <10s, 100s or 1000s of keys>;

Indexing

Check out what I built

This query is really slow

Duh. Add an index to this field.

Oh yeah. That is faster.

Indexing data

• Secondary Indexes are not for speed
• Index clustering columns
• Index collections

CREATE TABLE IF NOT EXISTS user_videos (
    userid uuid,
    added_date timestamp,
    videoid uuid,
    name text,
    preview_image_location text,
    PRIMARY KEY (userid, added_date, videoid)
) WITH CLUSTERING ORDER BY (added_date DESC, videoid ASC);

CREATE INDEX videoid_idx
ON user_videos(videoid);

CREATE INDEX tags_idx
ON videos(tags);

USERS

name (PK)   location
Jonathan    TX
Aleksey     UK
Patrick     CA
Stefania    HK

CREATE INDEX location_idx ON users(location);

[Diagram: the USERS table spread over four nodes, each node holding its own Index:user(location)]
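
One way to read this diagram: each node indexes only the rows it stores, so a lookup by the indexed column may have to ask every node. The Java sketch below models that worst case. It is an assumption-heavy toy: `LocalIndexSketch`, its placement rule (`ownerOf`), and the `nodesContacted` counter are all hypothetical, and a real cluster can stop early once enough rows are found.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LocalIndexSketch {
    // One node: its share of the users table plus an index over only those rows.
    static class Node {
        final Map<String, String> users = new HashMap<>();               // name -> location
        final Map<String, List<String>> locationIndex = new HashMap<>(); // location -> names

        void put(String name, String location) {
            users.put(name, location);
            locationIndex.computeIfAbsent(location, k -> new ArrayList<>()).add(name);
        }
    }

    final List<Node> nodes = new ArrayList<>();
    int nodesContacted; // nodes asked by the most recent query

    LocalIndexSketch(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) nodes.add(new Node());
    }

    // Hypothetical placement rule: a hash of the primary key picks the owning node.
    Node ownerOf(String name) {
        return nodes.get(Math.floorMod(name.hashCode(), nodes.size()));
    }

    void insert(String name, String location) {
        ownerOf(name).put(name, location);
    }

    // Lookup by primary key: exactly one node is asked.
    String locationOf(String name) {
        nodesContacted = 1;
        return ownerOf(name).users.get(name);
    }

    // Lookup by indexed column: every node's local index is consulted.
    List<String> namesIn(String location) {
        nodesContacted = 0;
        List<String> names = new ArrayList<>();
        for (Node node : nodes) {
            nodesContacted++;
            names.addAll(node.locationIndex.getOrDefault(location, new ArrayList<>()));
        }
        return names;
    }

    public static void main(String[] args) {
        LocalIndexSketch cluster = new LocalIndexSketch(4);
        cluster.insert("Jonathan", "TX");
        cluster.insert("Aleksey", "UK");
        cluster.insert("Patrick", "CA");
        cluster.insert("Stefania", "HK");

        System.out.println(cluster.locationOf("Patrick") + ", nodes asked: " + cluster.nodesContacted);
        System.out.println(cluster.namesIn("CA") + ", nodes asked: " + cluster.nodesContacted);
    }
}
```

In this model the primary-key read touches one node while the indexed read touches all of them, which is the sense in which secondary indexes are "not for speed".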

CREATE CUSTOM INDEX location_idx ON users(location)
USING 'org.apache.cassandra.index.sasi.SASIIndex';

[Diagram: in both the Memtable and the SSTable, a SASI Index sits alongside the Users data]

SASI Queries

SELECT * FROM users WHERE firstname LIKE 'pat%';

SELECT * FROM users WHERE lastname LIKE '%Fad%';

SELECT * FROM users WHERE email LIKE '%data%';

SELECT * FROM users
WHERE created_date > '2011-06-15' AND created_date < '2011-06-30';

 userid                               | created_date                    | email             | firstname | lastname
--------------------------------------+---------------------------------+-------------------+-----------+----------
 9761d3d7-7fbd-4269-9988-6cfd4e188678 | 2011-06-20 20:50:00.000000+0000 | [email protected] | Patrick   | McFadin

Data Locality

8 Fallacies of Distributed Computing

1. The network is reliable
2. Latency is zero
3. Bandwidth is infinite
4. The network is secure
5. Topology doesn't change
6. There is one administrator
7. Transport cost is zero
8. The network is homogeneous

Insert Alternative

Instead of:

BEGIN BATCH
  1000 inserts
APPLY BATCH;

Do this:

// "statements" stands for whatever collection of inserts you have (not named on the slide)
List<ResultSetFuture> futures = new ArrayList<>();
for (Statement statement : statements) {
    futures.add(session.executeAsync(statement));
}

WARNING

Collect and deal with your futures!
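
What "collect and deal with your futures" can look like, as a self-contained Java sketch. It uses `CompletableFuture` from the JDK in place of the driver's future type, so it runs without a cluster. The `executeAsync` method here is a hypothetical stand-in for `session.executeAsync(statement)`, the failure rule is invented for the demo, and the in-flight cap is an added assumption rather than something the slide prescribes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

class AsyncInsertSketch {
    // Hypothetical stand-in for session.executeAsync(statement).
    // For the demo, a write fails when its id is a multiple of failEvery (0 disables failures).
    static CompletableFuture<Integer> executeAsync(int id, int failEvery, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> {
            if (failEvery > 0 && id % failEvery == 0) {
                throw new IllegalStateException("write " + id + " failed");
            }
            return id;
        }, pool);
    }

    // Fires `count` writes with at most `maxInFlight` outstanding at once,
    // then inspects every future. Returns the number of failed writes.
    static int insertAll(int count, int failEvery, int maxInFlight) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        Semaphore inFlight = new Semaphore(maxInFlight);
        List<CompletableFuture<Integer>> futures = new ArrayList<>();
        try {
            for (int id = 1; id <= count; id++) {
                inFlight.acquireUninterruptibly();  // back-pressure: wait for a free slot
                futures.add(executeAsync(id, failEvery, pool)
                        .whenComplete((result, error) -> inFlight.release()));
            }
            int failures = 0;
            for (CompletableFuture<Integer> future : futures) {
                try {
                    future.join();                  // surfaces the write's outcome
                } catch (CompletionException e) {
                    failures++;                     // a real client would retry or log here
                }
            }
            return failures;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("failed writes: " + insertAll(1000, 100, 32));
    }
}
```

Dropping the futures on the floor hides failed writes and removes any limit on outstanding requests. Keeping them gives you both error handling and a place to apply back-pressure.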

Thank you!

Questions?

Follow me @PatrickMcFadin