User-Defined Functions & Materialized Views DuyHai DOAN

Cassandra UDF and Materialized Views

Page 1: Cassandra UDF and Materialized Views

User-Defined Functions & Materialized Views
DuyHai DOAN

Page 2: Cassandra UDF and Materialized Views

User-Defined Functions
•  Why?
•  Detailed implementation
•  UDAs
•  Gotchas

© 2015 DataStax, All Rights Reserved. 2

Page 3: Cassandra UDF and Materialized Views

Rationale
•  Push computation server-side
•  Save network bandwidth (1000 nodes!)
•  Simplify client-side code
•  Provide standard, useful functions (sum, avg, …)
•  Accelerate analytics use cases (pre-aggregation for Spark)

Page 4: Cassandra UDF and Materialized Views

How to create a UDF?

CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS]
[keyspace.]functionName (param1 type1, param2 type2, …)
CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT
RETURNS returnType
LANGUAGE language
AS '
  // source code here
';

Page 5: Cassandra UDF and Materialized Views

How to create a UDF?

CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS]
•  OR REPLACE and IF NOT EXISTS are both optional

Page 6: Cassandra UDF and Materialized Views

How to create a UDF?

[keyspace.]functionName
•  A UDF is keyspace-wide

Page 7: Cassandra UDF and Materialized Views

How to create a UDF?

(param1 type1, param2 type2, …)
•  Param name: the name used to refer to the argument in the code
•  Type: a CQL3 type

Page 8: Cassandra UDF and Materialized Views

How to create a UDF?

CALLED ON NULL INPUT
•  The function is always called
•  Null checks are mandatory in the code

Page 9: Cassandra UDF and Materialized Views

How to create a UDF?

RETURNS NULL ON NULL INPUT
•  If any input is null, the code block is skipped and null is returned
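
To make the two null-handling modes concrete, here is a minimal sketch of the same function declared both ways. The function names safeLength and strictLength are hypothetical, and the bodies assume the Java UDF language.

```cql
-- CALLED ON NULL INPUT: the body runs even when the argument is null,
-- so the code has to check for null itself.
CREATE OR REPLACE FUNCTION safeLength (input text)
CALLED ON NULL INPUT
RETURNS int
LANGUAGE java
AS $$
  if (input == null) return 0;
  return input.length();
$$;

-- RETURNS NULL ON NULL INPUT: the body is skipped when the argument is null
-- and the function result is null, so no null check is needed.
CREATE OR REPLACE FUNCTION strictLength (input text)
RETURNS NULL ON NULL INPUT
RETURNS int
LANGUAGE java
AS $$
  return input.length();
$$;
```

The first form suits functions that must map null to a meaningful value; the second keeps the body shorter when null simply propagates.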

Page 10: Cassandra UDF and Materialized Views

How to create a UDF?

RETURNS returnType
•  Any CQL type
    •  primitives (boolean, int, …)
    •  collections (list, set, map)
    •  tuples
    •  UDT

Page 11: Cassandra UDF and Materialized Views

How to create a UDF?

LANGUAGE language
•  JVM supported languages
    •  Java, Scala
    •  JavaScript (slow)
    •  Groovy, Jython, JRuby
    •  Clojure (JSR 223 impl issue)

Page 12: Cassandra UDF and Materialized Views

How to create a UDF?

AS ' // source code here ';
•  Avoid single quotes ( ' ) in the source code, or escape them
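
As an illustration of the quoting advice, the sketch below declares the same hypothetical function (quoteName) twice: once with a single-quoted body where each inner single quote is doubled, and once with a dollar-quoted body ($$ … $$), which CQL also accepts and which needs no escaping.

```cql
-- Single-quoted body: every single quote inside the Java code is doubled.
CREATE OR REPLACE FUNCTION quoteName (name text)
RETURNS NULL ON NULL INPUT
RETURNS text
LANGUAGE java
AS 'return "''" + name + "''";';

-- Dollar-quoted body: the same Java code, written without escaping.
CREATE OR REPLACE FUNCTION quoteName (name text)
RETURNS NULL ON NULL INPUT
RETURNS text
LANGUAGE java
AS $$ return "'" + name + "'"; $$;
```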

Page 13: Cassandra UDF and Materialized Views

Simple UDF Demo max()
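
The demo itself is not captured in the slides. A minimal sketch of what a max() UDF could look like follows; the function name maxOf, the table score and its columns are assumptions made for illustration.

```cql
-- Returns the larger of two ints; skipped (result null) if either input is null.
CREATE OR REPLACE FUNCTION maxOf (val1 int, val2 int)
RETURNS NULL ON NULL INPUT
RETURNS int
LANGUAGE java
AS $$ return Math.max(val1, val2); $$;

-- Hypothetical table used only for this demo.
CREATE TABLE IF NOT EXISTS score (
    player text PRIMARY KEY,
    round1 int,
    round2 int
);

INSERT INTO score (player, round1, round2) VALUES ('alice', 12, 17);

-- The comparison is computed server-side, once per returned row.
SELECT player, maxOf(round1, round2) AS best FROM score WHERE player = 'alice';
```

This assumes a keyspace has been selected with USE and that UDFs are enabled on the nodes.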

Page 14: Cassandra UDF and Materialized Views

UDF creation for C* 2.2 and 3.0
•  UDF definition is sent to the coordinator
•  Check for user permissions (ALTER, DROP, EXECUTE)
•  Coordinator compiles it using the ECJ compiler (not javac, because of a licence issue)
•  If language != Java, other compilers are used (you should ship the jars on the C* servers)
•  Announces SchemaChange = CREATED, Target = FUNCTION to all nodes
•  Code is loaded into a child classloader of the current classloader
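
The permission check mentioned above relies on function-level permissions. A hedged sketch of how they might be granted is shown below; it assumes authorization is enabled on the cluster, and the keyspace demo, the function maxOf(int, int) and the role analyst are hypothetical names.

```cql
-- Allow the role to call the function in queries.
GRANT EXECUTE ON FUNCTION demo.maxOf(int, int) TO analyst;

-- Allow the role to replace the existing function.
GRANT ALTER ON FUNCTION demo.maxOf(int, int) TO analyst;

-- Allow the role to drop the function.
GRANT DROP ON FUNCTION demo.maxOf(int, int) TO analyst;
```

The argument types are part of the function identity, since functions can be overloaded.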

Page 15: Cassandra UDF and Materialized Views

UDF creation in C* 3.0
•  Compiled byte code is verified; the following are rejected:
    •  static and instance code blocks
    •  declaring methods/constructors → no anonymous class creation
    •  declaring inner classes
    •  use of synchronized
    •  accessing forbidden classes, e.g. java.net.NetworkInterface
    •  accessing forbidden methods, e.g. Class.forName()
    •  …
•  Code is loaded into a separate classloader
    •  loaded classes are verified against a whitelist/blacklist

Page 16: Cassandra UDF and Materialized Views

UDF execution in C* 2.2
•  No safeguard: arbitrary code execution server-side …
•  Flag "enable_user_defined_functions" in cassandra.yaml is turned OFF by default
•  Keep your UDFs pure (stateless)
•  Avoid expensive computation (for loops …)

Page 17: Cassandra UDF and Materialized Views

UDF execution in C* 3.0
•  If "enable_user_defined_functions_threads" is turned ON (default)
    •  execute the UDF in a separate thread pool
    •  if the duration is longer than "user_defined_function_warn_timeout", retry with new timeout = fail_timeout - warn_timeout
    •  if the duration is longer than "user_defined_function_fail_timeout", an exception is raised
•  Else
    •  execute in the same thread

Page 18: Cassandra UDF and Materialized Views

UDF creation best practices

•  Keep your UDFs pure (stateless)

•  No expensive computation (huge for loop)

•  Prefer Java language for perf

Page 19: Cassandra UDF and Materialized Views

UDA (User-Defined Aggregates)

•  Real use case for UDFs
•  Server-side aggregation → huge network bandwidth savings
•  Provide behavior similar to Group By, Sum, Avg, etc.

Page 20: Cassandra UDF and Materialized Views

How to create a UDA?

CREATE [OR REPLACE] AGGREGATE [IF NOT EXISTS]
[keyspace.]aggregateName(type1, type2, …)
SFUNC accumulatorFunction
STYPE stateType
[FINALFUNC finalFunction]
INITCOND initCond;

•  (type1, type2, …): only types, no param names
•  STYPE: state type
•  INITCOND: initial state (of type stateType)

Page 21: Cassandra UDF and Materialized Views

How to create a UDA?

SFUNC accumulatorFunction
•  Accumulator function
•  Signature: accumulatorFunction(stateType, type1, type2, …) RETURNS stateType

Page 22: Cassandra UDF and Materialized Views

How to create a UDA?

[FINALFUNC finalFunction]
•  Optional final function
•  Signature: finalFunction(stateType)

Page 23: Cassandra UDF and Materialized Views

How to create a UDA?

UDA return type?
•  If finalFunction is defined: the return type of finalFunction
•  Else: stateType

Page 24: Cassandra UDF and Materialized Views

How to create a UDA?

INITCOND initCond
•  If accumulatorFunction is declared RETURNS NULL ON NULL INPUT: initCond should be defined
•  Else: initCond can be null (default value)

Page 25: Cassandra UDF and Materialized Views

UDA execution

•  Start with state = initCond
•  For each row
    •  call the accumulator function with the previous state + params
    •  update the state with the accumulator function output
•  If no finalFunction
    •  return the last state
•  Else
    •  apply finalFunction and return its output
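
The execution steps above can be illustrated with an average aggregate, sketched here along the lines of the example in the Cassandra documentation. The names avgState, avgFinal and average are illustrative, and game_score is a hypothetical table with an int column points.

```cql
-- Accumulator: state is (count, sum); called once per row.
CREATE OR REPLACE FUNCTION avgState (state tuple<int, bigint>, val int)
CALLED ON NULL INPUT
RETURNS tuple<int, bigint>
LANGUAGE java
AS $$
  if (val != null) {
    state.setInt(0, state.getInt(0) + 1);
    state.setLong(1, state.getLong(1) + val.intValue());
  }
  return state;
$$;

-- Final function: turns the last state into the result.
CREATE OR REPLACE FUNCTION avgFinal (state tuple<int, bigint>)
CALLED ON NULL INPUT
RETURNS double
LANGUAGE java
AS $$
  if (state.getInt(0) == 0) return null;
  double sum = state.getLong(1);
  return Double.valueOf(sum / state.getInt(0));
$$;

-- The aggregate starts from INITCOND (0, 0) and returns a double,
-- the return type of the final function.
CREATE OR REPLACE AGGREGATE average (int)
SFUNC avgState
STYPE tuple<int, bigint>
FINALFUNC avgFinal
INITCOND (0, 0);

-- Hypothetical usage on a single partition.
SELECT average(points) FROM game_score WHERE player = 'alice';
```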

Page 26: Cassandra UDF and Materialized Views

UDA Demo Group By
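
The demo is not reproduced in the slides. One way a Group By-like aggregate could be written is sketched below: the state is a map from a label to its count. The names countByLabel and groupAndCount are hypothetical, and the user table with a country column is borrowed from later slides.

```cql
-- Accumulator: increments the counter of the label seen on the current row.
CREATE OR REPLACE FUNCTION countByLabel (state map<text, int>, label text)
CALLED ON NULL INPUT
RETURNS map<text, int>
LANGUAGE java
AS $$
  if (label != null) {
    Integer count = (Integer) state.get(label);
    state.put(label, count == null ? 1 : count + 1);
  }
  return state;
$$;

-- No final function: the aggregate returns the last state (the map).
CREATE OR REPLACE AGGREGATE groupAndCount (text)
SFUNC countByLabel
STYPE map<text, int>
INITCOND {};

-- Returns a single map such as {'FR': 2, 'UK': 1}, depending on the data.
SELECT groupAndCount(country) FROM user;
```

The last query scans the whole table, which is acceptable for a small demo data set only.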

Page 27: Cassandra UDF and Materialized Views

Gotchas

[Diagram: cluster of four C* nodes. ① UDF, ④ UDF execution, ⑤]

Page 28: Cassandra UDF and Materialized Views

Gotchas

[Diagram: cluster of four C* nodes. ① UDF, ④ UDF execution, ⑤]

Why not apply the UDF on the replica nodes?

Page 29: Cassandra UDF and Materialized Views

Gotchas

[Diagram: cluster of four C* nodes. ① UDA, ② & ③ (shown for each of three nodes), ④ apply accumulatorFunction, then apply finalFunction, ⑤]

1.  Cannot apply accumulatorFunction on replicas
2.  UDA applied AFTER read-repair

Page 30: Cassandra UDF and Materialized Views

Gotchas

[Diagram: cluster of four C* nodes. ① UDA, ② & ③ (shown for each of three nodes), ④ apply accumulatorFunction, then apply finalFunction, ⑤]

Avoid UDA on a large number of rows
•  WHERE partitionKey IN (…)
•  WHERE clustering >= … AND clustering <= …
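
One reading of the two WHERE clauses above is that they bound the number of rows fed to the aggregate. Under that assumption, the queries could look like the sketch below; the table reading and its columns are hypothetical, and avg is the built-in aggregate.

```cql
-- Hypothetical table: one partition per sensor, one row per day.
CREATE TABLE IF NOT EXISTS reading (
    sensor_id int,
    day int,
    value int,
    PRIMARY KEY (sensor_id, day)
);

-- Aggregate over a handful of partitions.
SELECT avg(value) FROM reading WHERE sensor_id IN (1, 2, 3);

-- Aggregate over a slice of a single partition.
SELECT avg(value) FROM reading
WHERE sensor_id = 1 AND day >= 20150101 AND day <= 20150131;
```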

Page 31: Cassandra UDF and Materialized Views

Future applications of UDFs & UDAs
•  Functional index (CASSANDRA-7458)
•  Partial index (CASSANDRA-7391)
•  WHERE clause filtering with UDF
•  …

Page 32: Cassandra UDF and Materialized Views

Q & A

Page 33: Cassandra UDF and Materialized Views

Materialized Views
•  Why?
•  Detailed implementation
•  Gotchas

Page 34: Cassandra UDF and Materialized Views

Why Materialized Views?
•  Relieve the pain of manual denormalization

CREATE TABLE user(
    id int PRIMARY KEY,
    country text,
    …
);

CREATE TABLE user_by_country(
    country text,
    id int,
    …,
    PRIMARY KEY(country, id)
);

Page 35: Cassandra UDF and Materialized Views

Materialized View in Action

CREATE MATERIALIZED VIEW user_by_country AS
SELECT country, id, firstname, lastname
FROM user
WHERE country IS NOT NULL AND id IS NOT NULL
PRIMARY KEY(country, id);

The view replaces the manually maintained table:

CREATE TABLE user_by_country (
    country text,
    id int,
    firstname text,
    lastname text,
    PRIMARY KEY(country, id)
);

Page 36: Cassandra UDF and Materialized Views

Materialized View Syntax

CREATE MATERIALIZED VIEW [IF NOT EXISTS] keyspace_name.view_name
AS SELECT column1, column2, ...
FROM keyspace_name.table_name
WHERE column1 IS NOT NULL AND column2 IS NOT NULL ...
PRIMARY KEY(column1, column2, ...)

SELECT
•  must select all primary key columns of the base table

WHERE
•  only IS NOT NULL conditions for now
•  more complex conditions in the future

PRIMARY KEY
•  at least all primary key columns of the base table (ordering can be different)
•  at most 1 non-primary-key column from the base table
•  1:1 relationship between base row & MV row
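
The primary key rules above can be sketched with two views on the same base table. The table game_score and both view names are hypothetical and only serve the illustration.

```cql
-- Hypothetical base table with a compound primary key.
CREATE TABLE IF NOT EXISTS game_score (
    player text,
    game text,
    points int,
    PRIMARY KEY (player, game)
);

-- Same key columns as the base table, in a different order.
CREATE MATERIALIZED VIEW score_by_game AS
SELECT game, player, points FROM game_score
WHERE game IS NOT NULL AND player IS NOT NULL
PRIMARY KEY (game, player);

-- All base key columns plus exactly one non-key column (points).
CREATE MATERIALIZED VIEW score_by_points AS
SELECT points, player, game FROM game_score
WHERE points IS NOT NULL AND player IS NOT NULL AND game IS NOT NULL
PRIMARY KEY (points, player, game);
```

Because every base key column is part of each view key, one base row maps to at most one row in each view.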

Page 37: Cassandra UDF and Materialized Views

Materialized View Demo user_by_country
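
The demo is not captured in the slides. A minimal sketch of such a session is given below, assuming the user table has the id, country, firstname and lastname columns selected by the user_by_country view; the sample values are made up.

```cql
-- Write to the base table only; the view is maintained by Cassandra.
INSERT INTO user (id, country, firstname, lastname)
VALUES (1, 'UK', 'John', 'Doe');

-- Query by country through the view.
SELECT * FROM user_by_country WHERE country = 'UK';

-- Changing the country moves the row to another view partition.
UPDATE user SET country = 'FR' WHERE id = 1;

-- Once the view update has been applied, the row is found under 'FR'
-- and no longer under 'UK'.
SELECT * FROM user_by_country WHERE country = 'FR';
SELECT * FROM user_by_country WHERE country = 'UK';
```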

Page 38: Cassandra UDF and Materialized Views

Materialized View Impl

[Diagram: cluster of eight C* nodes]

UPDATE user SET country='FR' WHERE id=1

① Create BatchLog for the update (coordinator)
    OPTIONAL STEP, activated by -Dcassandra.mv_enable_coordinator_batchlog=true

Page 39: Cassandra UDF and Materialized Views

Materialized View Impl

② Send the mutation to all replicas, then wait for ack(s) according to the CL

Page 40: Cassandra UDF and Materialized Views

Materialized View Impl

③ Acquire local lock on base table partition

Page 41: Cassandra UDF and Materialized Views

Materialized View Impl

④ Local read to fetch current values
    SELECT * FROM user WHERE id=1

Page 42: Cassandra UDF and Materialized Views

Materialized View Impl

⑤ Create BatchLog with
•  DELETE FROM user_by_country WHERE country = 'old_value'
•  INSERT INTO user_by_country(country, id, …) VALUES('FR', 1, ...)
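
For comparison, this is roughly what a client would have to issue by hand if user_by_country were a plain table rather than a view (a view cannot be written to directly). The old country 'UK' and the name values are assumptions for the sake of the example.

```cql
-- Manual denormalization: update the base table and move the row
-- in the lookup table within one logged batch.
BEGIN BATCH
  UPDATE user SET country = 'FR' WHERE id = 1;
  DELETE FROM user_by_country WHERE country = 'UK' AND id = 1;
  INSERT INTO user_by_country (country, id, firstname, lastname)
  VALUES ('FR', 1, 'John', 'Doe');
APPLY BATCH;
```

The client must already know the old country to delete the stale row, which is the read-before-write that the view performs locally on each base replica.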

Page 43: Cassandra UDF and Materialized Views

Materialized View Impl

⑥ Execute async BatchLog to paired view replica with CL = ONE

Page 44: Cassandra UDF and Materialized Views

Materialized View Impl

⑦ Apply base table update locally: SET country='FR'

Page 45: Cassandra UDF and Materialized Views

Materialized View Impl

⑧ Release local lock

Page 46: Cassandra UDF and Materialized Views

Materialized View Impl

⑨ Return ack to coordinator

Page 47: Cassandra UDF and Materialized Views

Materialized View Impl

⑩ If CL ack(s) received, ack client

Page 48: Cassandra UDF and Materialized Views

Materialized View Impl

⑪ If QUORUM ack(s) received, remove BatchLog
    OPTIONAL STEP, activated by -Dcassandra.mv_enable_coordinator_batchlog=true

Page 49: Cassandra UDF and Materialized Views

Materialized Views impl explained

What is a paired replica?
•  Base primary replica for id=1 is paired with MV primary replica for country='FR'
•  Base secondary replica for id=1 is paired with MV secondary replica for country='FR'
•  etc …

Page 50: Cassandra UDF and Materialized Views

Materialized Views impl explained

Why local lock on base replica?
•  to provide serializability on concurrent updates

Why BatchLog on base replica?
•  necessary for eventual durability
•  replay the MV delete + insert until successful

Why does the BatchLog on the base replica use CL = ONE?
•  each base replica is responsible for updating its paired MV replica
•  CL > ONE would create unnecessary duplicate mutations

Page 51: Cassandra UDF and Materialized Views

Materialized Views impl explained

Why BatchLog on coordinator with QUORUM?
•  necessary to cover some edge cases

Page 52: Cassandra UDF and Materialized Views

MV Failure Cases: concurrent updates

Without local lock, two updates run concurrently on the same base replica (timeline t0, t1, t2):

1) UPDATE … SET country='US'
•  Read base row (country='UK')
•  DELETE FROM mv WHERE country='UK'
•  INSERT INTO mv …(country) VALUES('US')
•  Send async BatchLog
•  Apply update country='US'

2) UPDATE … SET country='FR'
•  Read base row (country='UK')
•  DELETE FROM mv WHERE country='UK'
•  INSERT INTO mv …(country) VALUES('FR')
•  Send async BatchLog
•  Apply update country='FR'

Both updates read country='UK', so neither deletes the view row inserted by the other: the MV ends up with a row for 'US' and a row for 'FR'.

Page 53: Cassandra UDF and Materialized Views

MV Failure Cases: concurrent updates

With local lock, the two updates are serialized on the base replica:

1) UPDATE … SET country='US'
•  Acquire local lock
•  Read base row (country='UK')
•  DELETE FROM mv WHERE country='UK'
•  INSERT INTO mv …(country) VALUES('US')
•  Send async BatchLog
•  Apply update country='US'
•  Release local lock

2) UPDATE … SET country='FR'
•  Acquire local lock
•  Read base row (country='US')
•  DELETE FROM mv WHERE country='US'
•  INSERT INTO mv …(country) VALUES('FR')
•  Send async BatchLog
•  Apply update country='FR'
•  Release local lock

Page 54: Cassandra UDF and Materialized Views

MV Failure Cases: failed updates to MV

[Diagram: cluster of eight C* nodes, MV replica down]

UPDATE user SET country='FR' WHERE id=1

⑥ Execute async BatchLog to paired view replica with CL = ONE

Page 55: Cassandra UDF and Materialized Views

MV Failure Cases: failed updates to MV

[Diagram: cluster of eight C* nodes, MV replica up]

UPDATE user SET country='FR' WHERE id=1

BatchLog replay

Page 56: Cassandra UDF and Materialized Views

MV Failure Cases: base replica(s) down

[Diagram: cluster of eight C* nodes, base replicas down: one replica marked ✔, two marked ⁇]

UPDATE user SET country='FR' WHERE id=1

② Send the mutation to all replicas, then wait for ack(s) according to the CL

Page 57: Cassandra UDF and Materialized Views

MV Failure Cases: base replica(s) down

[Diagram: cluster of eight C* nodes, base replicas up: all three replicas marked ✔]

UPDATE user SET country='FR' WHERE id=1

BatchLog replay

Page 58: Cassandra UDF and Materialized Views

Materialized Views Gotchas
•  Write performance
    •  local lock
    •  local read-before-write for MV (most of the perf hit is here)
    •  local batchlog for MV
    •  coordinator batchlog (OPTIONAL)
    •  ☞ you only pay this price once, whatever the number of MVs
•  Consistency level
    •  CL honoured for base table, ONE for MV + local batchlog
    •  QUORUM for coordinator batchlog (OPTIONAL)

Page 59: Cassandra UDF and Materialized Views

Materialized Views Gotchas
•  Beware of hot spots!!!
    •  MV user_by_gender 😱
•  Repair, read-repair, index rebuild, decommission …
    •  repair on base replica → update on MV paired replica
    •  repair on MV replica possible
    •  read-repair on MV behaves as normal read-repair
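
The user_by_gender hot spot can be sketched as follows, assuming the user table has a gender column (it is not shown on the slides).

```cql
-- Anti-pattern: the view partition key has very few distinct values,
-- so all users end up in a handful of very large partitions, and the
-- replicas owning those partitions receive most of the view writes.
CREATE MATERIALIZED VIEW user_by_gender AS
SELECT gender, id FROM user
WHERE gender IS NOT NULL AND id IS NOT NULL
PRIMARY KEY (gender, id);
```

The statement is valid CQL; the problem is the data distribution it produces, not the syntax.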

Page 60: Cassandra UDF and Materialized Views

Materialized Views Gotchas
•  Schema migration
    •  cannot drop a column from the base table if it is used by an MV
    •  can add a column to the base table; its initial value is null in the MV
    •  cannot drop the base table: drop all MVs first
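
The schema migration rules above can be sketched as the following sequence, assuming the user table and the user_by_country view from the earlier slides; the email column is a made-up addition.

```cql
-- Expected to be rejected: lastname is selected by user_by_country.
ALTER TABLE user DROP lastname;

-- Allowed: adding a column to the base table.
ALTER TABLE user ADD email text;

-- Expected to be rejected while a view still depends on the table.
DROP TABLE user;

-- Drop the views first, then the base table.
DROP MATERIALIZED VIEW user_by_country;
DROP TABLE user;
```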

Page 61: Cassandra UDF and Materialized Views

Q & A

Page 62: Cassandra UDF and Materialized Views

@doanduyhai

[email protected]

https://academy.datastax.com/

Thank You