Are you Kudu-ing me?!

Page 10: Are you Kudu-ing me?!

These folks must all be wrong, mustn’t they?

Page 11: Are you Kudu-ing me?!

uuid         first_name  last_name  dob
ee-c6-47-2c  John        Connor     Feb 28th, 1985
84-ee-ff-d5  Sarah       Connor     May 11th, 1965
57-4f-d9-d8  Kyle        Reese      Mar 1st, 2002

SELECT MIN(dob) FROM characters WHERE last_name = 'Connor'

Page 12: Are you Kudu-ing me?!

uuid:        ee-c6-47-2c | 84-ee-ff-d5 | 57-4f-d9-d8
last_name:   Connor | Connor | Reese
first_name:  John | Sarah | Kyle
dob:         Feb 28th, 1985 | May 11th, 1965 | Mar 1st, 2002

SELECT MIN(dob) FROM characters WHERE last_name = 'Connor'
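A minimal Python sketch of why the column layout suits this query. It assumes dates are held as `datetime.date` values; the `columns` dict and the `min_dob` helper are illustrative names, not anything from Kudu or Parquet. The point is that only `last_name` and `dob` are read, while `uuid` and `first_name` are never touched.

```python
from datetime import date

# Column-oriented layout: one list per column, aligned by row position.
columns = {
    "uuid": ["ee-c6-47-2c", "84-ee-ff-d5", "57-4f-d9-d8"],
    "last_name": ["Connor", "Connor", "Reese"],
    "first_name": ["John", "Sarah", "Kyle"],
    "dob": [date(1985, 2, 28), date(1965, 5, 11), date(2002, 3, 1)],
}


def min_dob(cols, last_name):
    """Return the earliest dob among rows whose last_name matches.

    Only the two columns the query needs are read.
    """
    matches = [
        dob
        for name, dob in zip(cols["last_name"], cols["dob"])
        if name == last_name
    ]
    return min(matches) if matches else None


print(min_dob(columns, "Connor"))  # 1965-05-11
```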

Page 13: Are you Kudu-ing me?!

What’s the problem with Apache Parquet then?

Page 14: Are you Kudu-ing me?!

Ever implemented Lambda Architecture?

Page 19: Are you Kudu-ing me?!

last_name  first_name  movie         actor                  actor_age
Connor     John        Terminator 2  Edward Furlong         14
Connor     John        Terminator 2  Michael Edwards        47
Connor     Sarah       Terminator    Linda Hamilton         28
Connor     Sarah       Terminator 2  Linda Hamilton         35
Reese      Kyle        Terminator 2  Michael Biehn          35
T-800                  Terminator    Arnold Schwarzenegger  37

CREATE TABLE characters (
  last_name STRING, first_name STRING, movie STRING,
  actor STRING, actor_age INT
)
DISTRIBUTE BY HASH (last_name, first_name) INTO 4 BUCKETS
TBLPROPERTIES (
  'kudu.key_columns' = 'last_name, first_name, movie, actor'
)
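As a rough illustration of what `DISTRIBUTE BY HASH (last_name, first_name) INTO 4 BUCKETS` implies, the Python sketch below assigns each row to one of four buckets from a hash of those two columns. The `bucket_for` helper and the use of SHA-256 are assumptions made for the example; the slides do not say which hash function Kudu uses, so the bucket numbers produced here will not match a real cluster.

```python
import hashlib

NUM_BUCKETS = 4  # mirrors INTO 4 BUCKETS


def bucket_for(last_name, first_name, num_buckets=NUM_BUCKETS):
    """Map the two hashed columns to a bucket index in [0, num_buckets)."""
    # A separator byte keeps ("ab", "c") distinct from ("a", "bc").
    key = f"{last_name}\x00{first_name}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


rows = [
    ("Connor", "John", "Terminator 2", "Edward Furlong", 14),
    ("Connor", "John", "Terminator 2", "Michael Edwards", 47),
    ("Connor", "Sarah", "Terminator", "Linda Hamilton", 28),
    ("Connor", "Sarah", "Terminator 2", "Linda Hamilton", 35),
    ("Reese", "Kyle", "Terminator 2", "Michael Biehn", 35),
    ("T-800", "", "Terminator", "Arnold Schwarzenegger", 37),
]

# Group rows by bucket; rows sharing (last_name, first_name) always
# land in the same bucket, whatever the other columns contain.
buckets = {}
for row in rows:
    buckets.setdefault(bucket_for(row[0], row[1]), []).append(row)

for index in sorted(buckets):
    print(index, [(r[0], r[1], r[2]) for r in buckets[index]])
```

Because only `last_name` and `first_name` feed the hash, a query that filters on `last_name` alone can still touch several buckets, while a query that fixes both columns touches exactly one.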

Page 24: Are you Kudu-ing me?!

Kudu’s partitioning sits somewhere between BigTable/HBase-style range partitioning and Cassandra-style hash partitioning.

Page 25: Are you Kudu-ing me?!

last_name:   Connor | Connor | Reese
first_name:  John | John | Kyle
movie:       Terminator 2 | Terminator 2 | Terminator 2
actor:       Edward Furlong | Michael Edwards | Michael Biehn
actor_age:   14 | 47 | 35

last_name:   Connor | Connor
first_name:  Sarah | Sarah
movie:       Terminator | Terminator 2
actor:       Linda Hamilton | Linda Hamilton
actor_age:   28 | 35

last_name:   T-800
first_name:  (empty)
movie:       Terminator
actor:       Arnold Schwarzenegger
actor_age:   37

Page 26: Are you Kudu-ing me?!

INSERT INTO characters (last_name, first_name, movie, actor, actor_age)
VALUES ('Connor', 'John', 'Terminator Genisys', 'Jason Clarke', 36)

Page 27: Are you Kudu-ing me?!

last_name:   Connor | Connor | Connor | Reese
first_name:  John | John | John | Kyle
movie:       Terminator 2 | Terminator 2 | Terminator Genisys | Terminator 2
actor:       Edward Furlong | Michael Edwards | Jason Clarke | Michael Biehn
actor_age:   14 | 47 | 36 | 35

last_name:   Connor | Connor
first_name:  Sarah | Sarah
movie:       Terminator | Terminator 2
actor:       Linda Hamilton | Linda Hamilton
actor_age:   28 | 35

last_name:   T-800
first_name:  (empty)
movie:       Terminator
actor:       Arnold Schwarzenegger
actor_age:   37

INSERT INTO characters (last_name, first_name, movie, actor, actor_age)
VALUES ('Connor', 'John', 'Terminator Genisys', 'Jason Clarke', 36)

Delta

Page 28: Are you Kudu-ing me?!

SELECT MAX(actor_age) FROM characters WHERE last_name = 'Connor'

Page 29: Are you Kudu-ing me?!

MPP FTW
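The MPP idea can be sketched in Python as a scatter-gather: each partition computes a partial `MAX(actor_age)` for `last_name = 'Connor'` in parallel, and the partial results are then combined. This assumes the three-group layout from the preceding slides; the `tablets` list and both helper functions are illustrative names, and a thread pool merely stands in for separate servers.

```python
from concurrent.futures import ThreadPoolExecutor

# Rows are (last_name, first_name, movie, actor, actor_age).
tablets = [
    [
        ("Connor", "John", "Terminator 2", "Edward Furlong", 14),
        ("Connor", "John", "Terminator 2", "Michael Edwards", 47),
        ("Connor", "John", "Terminator Genisys", "Jason Clarke", 36),
        ("Reese", "Kyle", "Terminator 2", "Michael Biehn", 35),
    ],
    [
        ("Connor", "Sarah", "Terminator", "Linda Hamilton", 28),
        ("Connor", "Sarah", "Terminator 2", "Linda Hamilton", 35),
    ],
    [
        ("T-800", "", "Terminator", "Arnold Schwarzenegger", 37),
    ],
]


def partial_max_age(tablet, last_name):
    """Scan one partition and return its local maximum, or None."""
    ages = [row[4] for row in tablet if row[0] == last_name]
    return max(ages) if ages else None


def max_actor_age(all_tablets, last_name):
    """Run the partial scans in parallel, then combine the results."""
    with ThreadPoolExecutor() as pool:
        partials = list(
            pool.map(lambda t: partial_max_age(t, last_name), all_tablets)
        )
    found = [p for p in partials if p is not None]
    return max(found) if found else None


print(max_actor_age(tablets, "Connor"))  # 47
```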

Page 30: Are you Kudu-ing me?!

SELECT MAX(actor_age) FROM characters WHERE movie = 'Terminator 2'

Page 31: Are you Kudu-ing me?!

Bloom filters FTW
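The slides do not spell out how Kudu applies Bloom filters to this query, so the Python sketch below only illustrates the general technique under stated assumptions: one small filter per partition, built over the `movie` column, lets a scan skip any partition that definitely does not contain the value. The `BloomFilter` class, its sizes, and the helper names are all hypothetical.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, value):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def build_filter(rows):
    bloom = BloomFilter()
    for row in rows:
        bloom.add(row[2])  # index 2 is the movie column
    return bloom


def max_age_for_movie(tablets, filters, movie):
    best = None
    for rows, bloom in zip(tablets, filters):
        if not bloom.might_contain(movie):
            continue  # definitely absent: skip this partition's scan
        for row in rows:
            if row[2] == movie and (best is None or row[4] > best):
                best = row[4]
    return best


# Rows are (last_name, first_name, movie, actor, actor_age).
tablets = [
    [
        ("Connor", "John", "Terminator 2", "Edward Furlong", 14),
        ("Connor", "John", "Terminator 2", "Michael Edwards", 47),
        ("Connor", "John", "Terminator Genisys", "Jason Clarke", 36),
        ("Reese", "Kyle", "Terminator 2", "Michael Biehn", 35),
    ],
    [
        ("Connor", "Sarah", "Terminator", "Linda Hamilton", 28),
        ("Connor", "Sarah", "Terminator 2", "Linda Hamilton", 35),
    ],
    [("T-800", "", "Terminator", "Arnold Schwarzenegger", 37)],
]
filters = [build_filter(rows) for rows in tablets]

print(max_age_for_movie(tablets, filters, "Terminator 2"))  # 47
```

A false positive only costs an unnecessary scan; the answer stays correct because the row-level comparison still runs on every partition that is not skipped.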

Page 34: Are you Kudu-ing me?!

[Diagram: a Master and two tablet servers, Tablet Server 1 and Tablet Server 2]

Page 35: Are you Kudu-ing me?!

[Diagram: a Master and a Master replica alongside Tablet Servers 1, 2, and 3; four tablet replicas are marked Leader]

Page 36: Are you Kudu-ing me?!

[Diagram: a Master and a Master replica alongside Tablet Servers 1, 2, and 3; four tablet replicas are marked Leader]

Typically 10-100 tablets per machine.

Page 39: Are you Kudu-ing me?!

MemRowSet
• Col A
• Col B
• …
In-memory concurrent B-tree; keeps all recently inserted rows.

DiskRowSet (two shown)
• Col A
• Col B
• …
• [Delta store]
Base data: each column is written separately in a single contiguous block of data.
Delta store: deltas are organized by row (until compaction happens).
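The structure above can be approximated with a toy model in Python. This is a loose sketch, not Kudu's implementation: a plain dict stands in for the in-memory B-tree, lists stand in for on-disk column blocks, and the `flush`, `update`, and `compact` method names as well as the updated age of 37 are made up for the example.

```python
class DiskRowSet:
    """Immutable columnar base data plus a per-row delta store."""

    def __init__(self, rows, column_names):
        keys = sorted(rows)
        self.index = {key: i for i, key in enumerate(keys)}
        # Base data: one contiguous list per column.
        self.base = {
            name: [rows[key][name] for key in keys] for name in column_names
        }
        # Delta store: row position -> list of {column: new_value} changes.
        self.deltas = {}

    def update(self, key, changes):
        self.deltas.setdefault(self.index[key], []).append(dict(changes))

    def read(self, key):
        pos = self.index[key]
        row = {name: col[pos] for name, col in self.base.items()}
        for delta in self.deltas.get(pos, []):
            row.update(delta)  # apply deltas on top of the base values
        return row

    def compact(self):
        """Fold every delta into the base columns, then clear the deltas."""
        for pos, changes in self.deltas.items():
            for delta in changes:
                for name, value in delta.items():
                    self.base[name][pos] = value
        self.deltas = {}


class Tablet:
    def __init__(self, column_names):
        self.column_names = column_names
        self.mem_row_set = {}  # stand-in for the in-memory B-tree
        self.disk_row_sets = []

    def insert(self, key, row):
        self.mem_row_set[key] = dict(row)

    def flush(self):
        """Move recently inserted rows into a new DiskRowSet."""
        if self.mem_row_set:
            self.disk_row_sets.append(
                DiskRowSet(self.mem_row_set, self.column_names)
            )
            self.mem_row_set = {}

    def update(self, key, changes):
        if key in self.mem_row_set:
            self.mem_row_set[key].update(changes)
            return
        for drs in self.disk_row_sets:
            if key in drs.index:
                drs.update(key, changes)
                return
        raise KeyError(key)

    def read(self, key):
        if key in self.mem_row_set:
            return dict(self.mem_row_set[key])
        for drs in self.disk_row_sets:
            if key in drs.index:
                return drs.read(key)
        raise KeyError(key)


tablet = Tablet(["actor_age"])
key = ("Connor", "John", "Terminator Genisys", "Jason Clarke")
tablet.insert(key, {"actor_age": 36})
tablet.flush()
tablet.update(key, {"actor_age": 37})  # hypothetical update, stored as a delta
print(tablet.read(key))  # {'actor_age': 37}
```

Until `compact()` runs, the base column still holds 36 and the read path has to merge the delta; afterwards the base holds 37 and the delta store is empty.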

Page 43: Are you Kudu-ing me?!

Long story short:
- 30% faster than Parquet 1.0 (TPC-H)
- 16-187 times faster than Phoenix or HBase (TPC-H again)
- hundreds of thousands of rows inserted per second on a single tablet server

Page 44: Are you Kudu-ing me?!

TPC-H test, scale factor 100, RF 3
- 75 nodes, each: 64 GB RAM, 12 spinning disks, 2x 6-core Xeon
- Expansion of 62 GB of data (post-replication, compactions done):
  - 570 GB in HBase (9.2x)
  - 227 GB in Kudu (3.7x)
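The expansion factors quoted above follow directly from dividing the stored size by the 62 GB of input data, as this short Python check shows; the `expansion_factor` helper is just an illustrative name.

```python
RAW_GB = 62  # input data size quoted on the slide


def expansion_factor(stored_gb, raw_gb=RAW_GB):
    """Stored size divided by raw size, rounded to one decimal place."""
    return round(stored_gb / raw_gb, 1)


print(expansion_factor(570))  # 9.2 (HBase)
print(expansion_factor(227))  # 3.7 (Kudu)
```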

http://getkudu.io/kudu.pdf

Page 45: Are you Kudu-ing me?!

http://getkudu.io/

http://getkudu.io/faq.html