Breadth or Depth: What's in a column-store? Jeff Smith, February 23, 2013

Breadth or Depth: What's in a column-store?

DESCRIPTION

My talk from Barcamp2013 in HK at PolyU

Page 1: Breadth or Depth: What's in a column-store?

Breadth or Depth

What's in a column-store?

Jeff Smith
February 23, 2013

Page 2: Breadth or Depth: What's in a column-store?

This presentation

Is not: marketing, technical, arbitrary, polite, training

Is: persuasive, for the technical, precise, opinionated, educational

Page 3: Breadth or Depth: What's in a column-store?

Srsouly

Page 4: Breadth or Depth: What's in a column-store?

Bio

{ past: [startups, biotech, data_management],
  school: [research, HKU, uncertain_data],
  work: [AI, finance, prediction] }

Page 5: Breadth or Depth: What's in a column-store?

This guy

Daniel Abadi

Page 6: Breadth or Depth: What's in a column-store?

Back to the future
● 1 database to rule them all
● A scrappy band of rebels
● A brave new idea

Page 7: Breadth or Depth: What's in a column-store?

The big question

Why grab this?

When all you want is this?

id thing attr1 attr2 attr3 attr4 attr5 attr6 attr7 attr8

123 doodad abc def ghi jkl mno pqr stu vwx

id thing

123 doodad

Page 8: Breadth or Depth: What's in a column-store?

You're chopping it wrong.

Page 9: Breadth or Depth: What's in a column-store?

Relations in pieces

id pet weight poops_per_day

1 dog 40 3

2 cat 15 2

3 bird 5 4

4 snake 78 0.25

Page 10: Breadth or Depth: What's in a column-store?

Horizontal Partitions

id pet weight poops_per_day

1 dog 40 3

2 cat 15 2

3 bird 5 4

4 snake 78 0.25

Page 11: Breadth or Depth: What's in a column-store?

You gotta get yourself some marble columns.

Page 12: Breadth or Depth: What's in a column-store?

Vertical Partitions

id

1

2

3

4

pet

dog

cat

bird

snake

weight

40

15

5

78

poops_per_day

3

2

4

0.25
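The two ways of chopping the pets relation can be sketched in Python. This is only a minimal illustration, not how any particular engine lays out data; the names `rows`, `to_columns`, and `project` are made up for the example.

```python
# The pets relation from the slides, stored row by row (horizontal layout).
rows = [
    {"id": 1, "pet": "dog", "weight": 40, "poops_per_day": 3},
    {"id": 2, "pet": "cat", "weight": 15, "poops_per_day": 2},
    {"id": 3, "pet": "bird", "weight": 5, "poops_per_day": 4},
    {"id": 4, "pet": "snake", "weight": 78, "poops_per_day": 0.25},
]


def to_columns(rows):
    """Chop the same relation vertically: one list per attribute."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def project(columns, names):
    """Read only the requested columns; the others are never touched."""
    return list(zip(*(columns[name] for name in names)))


columns = to_columns(rows)
print(columns["weight"])                # [40, 15, 5, 78]
print(project(columns, ["id", "pet"]))  # [(1, 'dog'), (2, 'cat'), (3, 'bird'), (4, 'snake')]
```

With the vertical layout, the "id thing" query from the big-question slide reads two lists and skips the rest.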

Page 13: Breadth or Depth: What's in a column-store?

We're gonna need a bigger table.

Page 14: Breadth or Depth: What's in a column-store?

NoSQL starts
Empire crumbles
Nomenclature obfuscates

BigTable

Page 15: Breadth or Depth: What's in a column-store?

I know that song!

Page 16: Breadth or Depth: What's in a column-store?

Column...families?!

row_id best_pet worst_pet illegal_pet

123 bulldog turtle rhino

row_id make model

123 Smart Fortwo

Pets Cars
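One way to picture a column family is as a nested map: row key, then family, then column. The sketch below is a hypothetical in-memory model for illustration, not the storage format of BigTable or any of its clones; `table` and `get` are invented names.

```python
# Hypothetical in-memory model of a column-family store:
# row key -> column family -> column -> value.
table = {
    "123": {
        "Pets": {"best_pet": "bulldog", "worst_pet": "turtle", "illegal_pet": "rhino"},
        "Cars": {"make": "Smart", "model": "Fortwo"},
    },
}


def get(table, row_id, family, column=None):
    """Return one cell, or the whole family when no column is given."""
    cells = table.get(row_id, {}).get(family, {})
    if column is None:
        return dict(cells)
    return cells.get(column)


print(get(table, "123", "Cars"))              # {'make': 'Smart', 'model': 'Fortwo'}
print(get(table, "123", "Pets", "best_pet"))  # bulldog
```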

Page 17: Breadth or Depth: What's in a column-store?

Modest Map

Year of the snake => Year of Python
4G => LTE
NoSQL => Non-relational
Beard => Face-mane
Column-stores => {column-store | column-family-store}

Page 18: Breadth or Depth: What's in a column-store?

Does it smell as sweet?

Page 19: Breadth or Depth: What's in a column-store?

C-Store rocks*

...at column-oriented tasks.

* Contrary to popular belief, after years of effort, Cleveland still does not rock.

Page 20: Breadth or Depth: What's in a column-store?

Move, b*tch.
Get out the vote.

age

23

32

45

67

56

49

43

50

63

34
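As I read this slide, the point is that an aggregate over one attribute needs only that attribute's column. A small Python sketch, with the `age > 40` predicate chosen purely for illustration:

```python
# The age column from the slide, stored on its own as one contiguous list.
age = [23, 32, 45, 67, 56, 49, 43, 50, 63, 34]

# Both aggregates scan this single list; no other attribute is read.
mean_age = sum(age) / len(age)
over_40 = sum(1 for a in age if a > 40)  # hypothetical predicate

print(mean_age)  # 46.2
print(over_40)   # 7
```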

Page 21: Breadth or Depth: What's in a column-store?

The catch

Page 22: Breadth or Depth: What's in a column-store?

Attack of the clones

Page 23: Breadth or Depth: What's in a column-store?

The contenders
HBase*
Cassandra*
Hypertable
Accumulo

* The ones that matter

Page 24: Breadth or Depth: What's in a column-store?

HBase
Hadoop stack
Java everywhere
Components, extensions, variables, headaches...

Page 25: Breadth or Depth: What's in a column-store?

Tastes like SQL

SELECT sensorid, (20-down)/(up-down) AS probability
FROM hive_sensors WHERE down>=10 AND up>=20 AND down<=20
UNION ALL
SELECT sensorid, (up-10)/(up-down) AS probability
FROM hive_sensors WHERE up>=10 AND up<=20 AND down<=10
UNION ALL
SELECT sensorid, 1 AS probability
FROM hive_sensors WHERE up<=20 AND down>=10
UNION ALL
SELECT sensorid, (20-10)/(up-down) AS probability
FROM hive_sensors WHERE down<=10 AND up>=20;
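The four branches of the query appear to compute how much of each sensor's interval [down, up] falls inside [10, 20]. Assuming the reading is uniformly distributed over its interval (my interpretation, not something the slide states), the same arithmetic collapses into one Python function. The name `overlap_probability` and the `lo`/`hi` parameters are invented for this sketch.

```python
def overlap_probability(down, up, lo=10.0, hi=20.0):
    """Fraction of the interval [down, up] that falls inside [lo, hi].

    Assumes a uniform distribution on [down, up]; that assumption is
    mine and is not stated on the slide.
    """
    if up <= down:
        raise ValueError("expected down < up")
    overlap = min(up, hi) - max(down, lo)
    return max(overlap, 0.0) / (up - down)


print(overlap_probability(15, 25))  # 0.5, same as (20-down)/(up-down)
print(overlap_probability(5, 15))   # 0.5, same as (up-10)/(up-down)
print(overlap_probability(12, 18))  # 1.0, interval lies inside [10, 20]
print(overlap_probability(5, 25))   # 0.5, same as (20-10)/(up-down)
```

One difference from the query: an interval that misses [10, 20] entirely yields 0.0 here, while the query returns no row for it.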

Page 26: Breadth or Depth: What's in a column-store?

Cassandra
CQL interface
Peer to peer
Better, but...

Page 27: Breadth or Depth: What's in a column-store?

Anything you can do, I can do better.

Page 28: Breadth or Depth: What's in a column-store?

Sparseness

id attr1 attr2 attr3 attr4

1 1

2 1

3 1

4 1

5

6 1

7

8 1

9 1

10

11
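A rough Python sketch of why sparseness favors storing only the cells that exist. Which attribute each 1 belongs to did not survive in this transcript, so the cell positions below are invented; only the shape of the comparison matters.

```python
# Hypothetical sparse table: the attribute holding each 1 is invented here.
columns = ["attr1", "attr2", "attr3", "attr4"]
sparse = {
    1: {"attr1": 1},
    2: {"attr3": 1},
    3: {"attr2": 1},
    5: {},  # a row with no values stores nothing at all
}

# Dense, row-store-style layout: every row reserves a slot per column.
dense = {rid: [cells.get(c) for c in columns] for rid, cells in sparse.items()}

dense_slots = sum(len(slots) for slots in dense.values())
stored_cells = sum(len(cells) for cells in sparse.values())
print(dense_slots, stored_cells)  # 16 3
```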

Page 29: Breadth or Depth: What's in a column-store?

Dynamic Schemas

row_id best_pet worst_pet illegal_pet robot_pet

123 bulldog turtle rhino aibo

456 shih tzu gecko koala

row_id make model

123 Smart Fortwo

456 VW Golf

Pets Cars
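A dynamic schema means a row can gain a column without anyone altering the table. A minimal sketch with a hypothetical `put` helper over nested dictionaries; real stores differ in API and in how families are declared.

```python
from collections import defaultdict

# Hypothetical store: row key -> column family -> {column: value}.
store = defaultdict(lambda: defaultdict(dict))


def put(row_id, family, column, value):
    """Write one cell; a new column needs no schema change."""
    store[row_id][family][column] = value


for col, val in [("best_pet", "bulldog"), ("worst_pet", "turtle"), ("illegal_pet", "rhino")]:
    put("123", "Pets", col, val)
for col, val in [("best_pet", "shih tzu"), ("worst_pet", "gecko"), ("illegal_pet", "koala")]:
    put("456", "Pets", col, val)

# Later, one row gains a column the other row never sees.
put("123", "Pets", "robot_pet", "aibo")

print(sorted(store["123"]["Pets"]))         # ['best_pet', 'illegal_pet', 'robot_pet', 'worst_pet']
print("robot_pet" in store["456"]["Pets"])  # False
```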

Page 30: Breadth or Depth: What's in a column-store?

Stronger in the broken places

Page 31: Breadth or Depth: What's in a column-store?

Innovation
Truly distributed systems
Columns as metadata
Arbitrarily deep column hierarchies*
Community database development

* Someday soon, I hope

Page 32: Breadth or Depth: What's in a column-store?

Pig & friends

-- Load every column in family cf1 as a map, plus the row key
data = load 'hbase://table_name' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf1:*', '-loadKey true') AS (id:chararray, stats: map[int]);

# udfs.py: turn the map into a bag of (key, value) tuples
@outputSchema("values:bag{t:tuple(key, value)}")
def bag_of_tuples(map_dict):
    return map_dict.items()

-- Register the Jython UDF, then flatten each map into one row per column
register 'udfs.py' using jython as py;
data = load 'hbase://table_name' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf1:*', '-loadKey true') AS (id:chararray, stats: map[int]);
databag = foreach data generate id, FLATTEN(py.bag_of_tuples(stats));

from Chase Seibert
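For readers without a Pig install, here is what the FLATTEN step does to the data, sketched in plain Python. The sample rows and the name `flatten_stats` are hypothetical; only the reshaping mirrors the script above.

```python
def flatten_stats(rows):
    """Turn (id, {key: value}) rows into one (id, key, value) row per map entry."""
    for row_id, stats in rows:
        for key, value in stats.items():
            yield (row_id, key, value)


# Hypothetical rows shaped like the loaded relation: a row key plus a map
# of every column found in family cf1.
data = [("row1", {"clicks": 3, "views": 10}), ("row2", {"views": 7})]
print(list(flatten_stats(data)))
# [('row1', 'clicks', 3), ('row1', 'views', 10), ('row2', 'views', 7)]
```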

Page 33: Breadth or Depth: What's in a column-store?

No dog in this fight

Page 34: Breadth or Depth: What's in a column-store?

[email protected]

Hey I just met you
And this is crazy
But here's my email
Mail me maybe

[email protected]

Work Play

Page 35: Breadth or Depth: What's in a column-store?

Disclaimer

All images used in this presentation were stolen from the internet in a daring midnight raid that left 3 dead and 8 wounded. No license was obtained for their use and no license is implied by their misappropriation.

Yarrr. BarrrCamp.

Please don't sue me. I have nothing. Just a dog. Don't take my dog.