Apache HBase: Overview and Use Cases Apekshit Sharma


© Cloudera, Inc. All rights reserved.


About myself

Apekshit Sharma
• Software Engineer, Cloudera
• Ex-Software Engineer, Google
• Apache HBase contributor
• Performance improvements, replication, build infra, etc.

Contents

• Motivation
• Apache HBase data model
• Overview of architecture
• A few usage patterns


Motivation

Motivation

• What if you’re not trying to index the internet?

• Open source
• Horizontally scalable
• Consistent
• Random access, low latency
• Built on top of HDFS

Data model

Row key  info:name  info:age  comp:base  comp:stocks
121      ‘tom’      ‘28’      ‘125k’
145      ‘bob’      ‘32’      ‘110k’     ‘50’ (ts=2012), ‘100’ (ts=2014)

[Diagram labels: row keys, columns, cells]

Data model

• Sorted rows: supports billions of rows
• Columns: supports millions of columns
• Cell: intersection of a row and a column
  • Can have multiple values (which are time-stamped)
  • Can be empty, with no storage or processing overhead
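One way to picture this data model is as a nested map: row key, then column, then timestamp, then value. The sketch below is a toy in-memory illustration in Python, not HBase client code; `ToyTable` and its method names are hypothetical and exist only for this example.

```python
from collections import defaultdict


class ToyTable:
    """Toy model of the data model: row key -> column -> {timestamp: value}."""

    def __init__(self):
        # Only cells that were actually written take up space.
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, column, value, ts):
        """Store one time-stamped version of a cell."""
        self.rows[row_key][column][ts] = value

    def get(self, row_key, column):
        """Return the newest value of a cell, or None if the cell is empty."""
        versions = self.rows.get(row_key, {}).get(column, {})
        if not versions:
            return None
        return versions[max(versions)]

    def sorted_row_keys(self):
        """Rows are always presented in sorted key order."""
        return sorted(self.rows)


table = ToyTable()
table.put("145", "comp:stocks", "50", ts=2012)
table.put("145", "comp:stocks", "100", ts=2014)
table.put("121", "info:name", "tom", ts=2014)

latest_stocks = table.get("145", "comp:stocks")   # newest version wins
missing_cell = table.get("121", "comp:stocks")    # empty cell, nothing stored
```

The empty cell for row 121 is simply absent from the map, which mirrors the "no storage overhead" point above.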

HBase Architecture

Unique id  Name      price   weight  store1  store2  store3
“1000000”  snickers  $9.99   4 Oz    Yes     Yes     Yes
“3000000”  almonds   $9.99   8 Oz    Yes     No      Yes
“8000000”  coke      $9.99   16 Oz   Yes     Yes     Yes

HBase Architecture

Unique id  Name      price   weight  store1  store2  store3
“1000000”  snickers  $9.99   4 Oz    Yes     Yes     Yes
“3000000”  almonds   $9.99   8 Oz    Yes     No      Yes
“8000000”  coke      $9.99   16 Oz   Yes     Yes     Yes
“4000000”  new       $34.63  16 Oz   No      Yes     Yes

HBase Architecture

Unique id  Name      price   weight  store1  store2  store3
“1000000”  snickers  $9.99   4 Oz    Yes     Yes     Yes
“3000000”  almonds   $9.99   8 Oz    Yes     No      Yes
“8000000”  coke      $9.99   16 Oz   Yes     Yes     Yes
“4000000”  foo       $34.63  16 Oz   No      Yes     Yes
“5000000”  bar       $22.54  16 Oz   Yes     Yes     Yes

HBase Architecture

Unique id  Name      price   weight  store1  store2  store3
“1000000”  snickers  $9.99   4 Oz    Yes     Yes     Yes
“3000000”  almonds   $9.99   8 Oz    Yes     No      Yes
“8000000”  coke      $9.99   16 Oz   Yes     Yes     Yes
“4000000”  foo       $34.63  16 Oz   No      Yes     Yes
“5000000”  bar       $22.54  16 Oz   Yes     Yes     Yes
“9000000”  new1      $2.5    16 Oz   Yes     Yes     Yes
“7000000”  new2      $6.4    16 Oz   Yes     Yes     Yes
“2000000”  new3      $6.4    16 Oz   Yes     Yes     Yes

HBase Architecture | Regions

Region [ “”, “5000000” )

Row key    Name      brand  price   weight  store1  store2  store3
“1000000”  snickers  xxx    $9.99   4 Oz    Yes     Yes     Yes
“2000000”  new3      xxx    $6.4    16 Oz   Yes     Yes     Yes
“3000000”  almonds   xxx    $9.99   8 Oz    Yes     No      Yes
“4000000”  foo       xxx    $34.63  16 Oz   No      Yes     Yes

Region [ “5000000”, “” )

Row key    Name      brand  price   weight  store1  store2  store3
“5000000”  bar       xxx    $22.54  16 Oz   Yes     Yes     Yes
“7000000”  new2      xxx    $6.4    16 Oz   Yes     Yes     Yes
“8000000”  coke      xxx    $9.99   16 Oz   Yes     Yes     Yes
“9000000”  new1      xxx    $2.5    16 Oz   Yes     Yes     Yes

HBase Architecture | RegionServer

Row key    Name      price   weight  ...
“1000000”  snickers  $9.99   4 Oz    ...
“2000000”  new3      $6.4    16 Oz   ...
“3000000”  almonds   $9.99   8 Oz    ...
“4000000”  foo       $34.63  16 Oz   ...

Row key    Name      price   weight  ...
“5000000”  bar       $22.54  16 Oz   ...
“7000000”  new2      $6.4    16 Oz   ...
“8000000”  coke      $9.99   16 Oz   ...
“9000000”  new1      $2.5    16 Oz   ...

[Diagram labels: Server 12, Server 7]

HBase Architecture | Regions

• Horizontal split of tables
• Regions are served by RegionServers
• Splitting happens automatically (based on configuration); it can also be done manually
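The region idea can be sketched as a lookup over sorted start keys. The snippet below is a minimal Python illustration, assuming the two key ranges shown earlier ([“”, “5000000”) and [“5000000”, “”)); `find_region` and `split_region` are hypothetical helper names, not HBase functions.

```python
import bisect

# Start keys of the two example regions: ["", "5000000") and ["5000000", "").
REGION_START_KEYS = ["", "5000000"]


def find_region(row_key, start_keys=REGION_START_KEYS):
    """Return the index of the region whose key range contains row_key."""
    # start_keys is sorted; pick the last start key that is <= row_key.
    return bisect.bisect_right(start_keys, row_key) - 1


def split_region(start_keys, split_key):
    """Return a new start-key list with one more region beginning at split_key."""
    new_keys = list(start_keys)
    bisect.insort(new_keys, split_key)
    return new_keys


first = find_region("3000000")      # falls in ["", "5000000")
second = find_region("5000000")     # start key is inclusive
after_split = split_region(REGION_START_KEYS, "7000000")
third = find_region("8000000", after_split)
```

Because rows are sorted by key, finding the right region only needs a binary search over the start keys.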

HBase Architecture

Row key    Name      price   weight  store1  store2  store3
“1000000”  snickers  $9.99   4 Oz    Yes     Yes     Yes
“2000000”  new3      $6.4    16 Oz   Yes     Yes     Yes
“3000000”  almonds   $9.99   8 Oz    Yes     No      Yes
“4000000”  foo       $34.63  16 Oz   No      Yes     Yes
“5000000”  bar       $22.54  16 Oz   Yes     Yes     Yes
“7000000”  new2      $6.4    16 Oz   Yes     Yes     Yes
“8000000”  coke      $9.99   16 Oz   Yes     Yes     Yes
“9000000”  new1      $2.5    16 Oz   Yes     Yes     Yes

HBase Architecture | Column family

Row key    info:name  info:price  info:weight  availability:store1  availability:store2  availability:store3
“1000000”  snickers   $9.99       4 Oz         Yes                  Yes                  Yes
“2000000”  new3       $6.4        16 Oz        Yes                  Yes                  Yes
“3000000”  almonds    $9.99       8 Oz         Yes                  No                   Yes
“4000000”  foo        $34.63      16 Oz        No                   Yes                  Yes
“5000000”  bar        $22.54      16 Oz        Yes                  Yes                  Yes
“7000000”  new2       $6.4        16 Oz        Yes                  Yes                  Yes
“8000000”  coke       $9.99       16 Oz        Yes                  Yes                  Yes
“9000000”  new1       $2.5        16 Oz        Yes                  Yes                  Yes

HBase Architecture | Column family

• You can also see it as a vertical split
• Data stored in separate files
• Tune performance
  • In-memory
  • Compression
  • Version retention policies
  • Cache priority
• Needs to be specified by the user

HBase Architecture

Region

ColumnFamily
Row key    info:name  info:price  info:weight
“1000000”  snickers   $9.99       4 Oz
“2000000”  new3       $6.4        16 Oz
“3000000”  almonds    $9.99       8 Oz

ColumnFamily
Row key    availability:store1  availability:store2  availability:store3
“1000000”  Yes                  Yes                  Yes
“2000000”  Yes                  Yes                  Yes
“3000000”  Yes                  No                   Yes
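The vertical split can be illustrated by grouping one logical row's cells by the family prefix of each column name. This is a toy Python sketch under the assumption that columns are written as `family:qualifier`; `split_by_family` is a hypothetical helper, not part of HBase.

```python
def split_by_family(row):
    """Group a row's cells by column family (the part before the first ':')."""
    stores = {}
    for column, value in row.items():
        family, qualifier = column.split(":", 1)
        stores.setdefault(family, {})[qualifier] = value
    return stores


# One logical row from the example table.
row = {
    "info:name": "snickers",
    "info:price": "$9.99",
    "info:weight": "4 Oz",
    "availability:store1": "Yes",
    "availability:store2": "Yes",
    "availability:store3": "Yes",
}

stores = split_by_family(row)
info_cells = stores["info"]                  # kept together in one store
availability_cells = stores["availability"]  # kept in a separate store
```

Reading only the `info` columns then never has to touch the `availability` data, which is one reason per-family tuning is possible.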

Enough of architecture

Let’s move on to usage patterns

Apache HBase “Nascar” Slide

[Four consecutive slides with this title; their images are not reproduced here.]

What have we learned from so many users?

There are some patterns which repeat often.

Just like a Lego block, maybe you can fit one directly in your system!

Know your ...

Data
• Entity data
• Time-centric event data

How it goes in and out
• Real-time vs batch
• Random vs sequential

Know your data ...

There are primarily two kinds of big data workloads. They have different storage requirements.

• Entity-centric data
• Time-centric event data

Entity-centric data

• Scales up with the number of entities
• Billions of distinct entities

Examples: users, accounts, location, clicks and metrics

Time-centric event data

• Time-series data points over a period
• Scales up due to finer-grained intervals, retention policies, and the passage of time

Examples: periodic sensor data, stock ticker data, monitoring applications

[Diagram: axes Time (up to “Now”) and Entities (e1 to e5)]

Entities data
[Diagram: same axes, entities e1 to e5]
Millions of entities = Big Data

Time-centric events data
[Diagram: Time axis up to “Now”]
Millions of events = Big Data

Time-centric events about entities
[Diagram: axes Time (up to “Now”) and Entities (e1 to e5)]
|Entities| * |Events| = Big Big Data

What questions do you ask?

• Do you focus on the entity first?

OR

• Do you focus on time ranges first?

• Your answer will help you determine where and how to store your data.

Let’s say you start your own e-mail service...

[Diagram, repeated on each of the following slides: axes Time (up to “Now”) and Entities (user1 to user5)]

Entity-first questions...
• For a given user, show the last N messages.
• For a given user, show all the messages.
• For a given user, show all messages received between times [t1, t2].

Time-centric event-first questions...
• Find all messages between times [t1, t2].
• Find all messages between times [t1, t2] for all users.
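To see why the choice matters, consider messages keyed entity-first, as (user, timestamp), and kept sorted. The Python sketch below is only an illustration with made-up sample data; `messages_for_user` and `messages_in_range` are hypothetical helpers. An entity-first question becomes one contiguous slice of the sorted keys, while a time-first question has to examine every key.

```python
import bisect

# Each message is keyed by (user, timestamp); sorting mimics sorted row keys.
messages = sorted([
    ("user1", 10), ("user1", 25), ("user1", 40),
    ("user2", 5), ("user2", 30),
    ("user3", 20),
])


def messages_for_user(keys, user, t1, t2):
    """Entity-first question: one contiguous slice of the sorted keys."""
    lo = bisect.bisect_left(keys, (user, t1))
    hi = bisect.bisect_right(keys, (user, t2))
    return keys[lo:hi]


def messages_in_range(keys, t1, t2):
    """Time-first question: with entity-first keys, every key is examined."""
    return [key for key in keys if t1 <= key[1] <= t2]


user1_recent = messages_for_user(messages, "user1", 20, 40)
everyone_20_to_30 = messages_in_range(messages, 20, 30)
```

With time-first keys the situation would be reversed, which is why the question you ask first should drive the key layout.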

How does the data get in and out of HBase?

Getting data in...

[Diagram: paths into Apache HBase]
• Put, Incr, Append
• Bulk import

Getting data out...

[Diagram: paths out of Apache HBase]
• Get, short scans
• Full scan
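The difference between these access paths can be sketched with a toy key-value store. This is a Python illustration only: `ToyStore` is a hypothetical class that imitates the meaning of the operations named above, and it is not the HBase client API.

```python
class ToyStore:
    """Toy store with sorted-key scans (illustration only, not the HBase API)."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

    def incr(self, key, amount=1):
        """Add to a counter cell and return the new value."""
        self.data[key] = self.data.get(key, 0) + amount
        return self.data[key]

    def append(self, key, suffix):
        """Append to the existing value of a cell."""
        self.data[key] = self.data.get(key, "") + suffix

    def get(self, key):
        return self.data.get(key)

    def scan(self, start=None, stop=None):
        """Yield (key, value) in key order for start <= key < stop."""
        for key in sorted(self.data):
            if start is not None and key < start:
                continue
            if stop is not None and key >= stop:
                break
            yield key, self.data[key]


store = ToyStore()
for name, value in [("row1", "a"), ("row2", "b"), ("row3", "c")]:
    store.put(name, value)
store.append("row1", "x")
store.incr("hits")
hits = store.incr("hits")

single = store.get("row1")                   # point read
short_scan = list(store.scan("row2", "row3"))  # small key range
full_scan = list(store.scan())               # every row
```

A get or short scan touches a handful of rows, whereas a full scan reads everything, which is why the two suit different workloads.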

So, what’s the best way?

Depends on your use case

Bottom line: disk I/O takes time.
• A cluster has a limited number of disk read/write heads
• Use the I/O bandwidth of your cluster efficiently

Apache HBase

            Data in             Data out
Real-time   Put, Incr, Append   Get, short scans
Batch       Bulk import         Full scan

Let’s dive into use cases ...

Simple Entities

• Purely entity data, no relation between entities
• Often from many different sources
• Could be a well-done de-normalized RDBMS

[Diagram: axes Time (up to “Now”) and Entities (e1 to e5)]

Simple Entities: Schema

• One row per entity
• Row key => entity ID, or hash of entity ID
• Column => property / field, possibly timestamp
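This schema can be sketched in a few lines of Python. The sketch is a hedged illustration: the `info` column family, the helper names, the sample entity, and the choice of a truncated SHA-256 digest are all assumptions made for the example, not something the slides prescribe.

```python
import hashlib


def hashed_row_key(entity_id):
    """Row key derived from a hash of the entity ID (SHA-256 is an arbitrary choice)."""
    return hashlib.sha256(entity_id.encode("utf-8")).hexdigest()[:16]


def entity_row(entity_id, properties, use_hash=True):
    """Return (row_key, columns): one row per entity, one column per property."""
    row_key = hashed_row_key(entity_id) if use_hash else entity_id
    columns = {f"info:{name}": value for name, value in properties.items()}
    return row_key, columns


# Hypothetical entity, used only to exercise the helpers.
key, columns = entity_row("book-000042", {"title": "Example Title", "copies": "3"})
plain_key, _ = entity_row("book-000042", {}, use_hash=False)
```

Hashing gives up the natural ordering of entity IDs in exchange for keys that spread evenly over the key space, so sequentially assigned IDs do not all land next to each other.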

Simple Entities: Example

OCLC: Online Computer Library Center

Workloads:
• Look up books → real-time read
• Add new books one at a time, update information about existing books, issue books → real-time write
• New library joins the group, import its data → batch write

Apache HBase operations used:
• Real-time: Put, Incr, Append; Get, short scans
• Batch: bulk import

Linked Entities (Graph Data)

• Entities are linked to form a graph

[Diagram: axes Time (up to “Now”) and Entities (e1 to e5)]

Linked Entities (Graph Data): Schema

• One row per node (entity)
• Row key => node ID (entity ID)
• One column family to store edges => “Relationship:OtherNodeID”
  • Value => metadata about the relationship
• Second column family to store other information about the node
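A minimal Python sketch of this layout follows. It models each row as two dictionaries standing in for the two column families; the family names `edges` and `info`, the helper names, and the sample users are hypothetical.

```python
def add_edge(table, node_id, relationship, other_node_id, metadata):
    """Store an edge as a column named 'Relationship:OtherNodeID' in the node's row."""
    row = table.setdefault(node_id, {"edges": {}, "info": {}})
    row["edges"][f"{relationship}:{other_node_id}"] = metadata


def neighbors(table, node_id, relationship):
    """List the nodes linked to node_id by the given relationship."""
    prefix = relationship + ":"
    row = table.get(node_id, {"edges": {}})
    return sorted(q[len(prefix):] for q in row["edges"] if q.startswith(prefix))


graph = {}
add_edge(graph, "alice", "friend", "carol", {"since": "2014"})
add_edge(graph, "alice", "friend", "bob", {"since": "2012"})
add_edge(graph, "alice", "follows", "dave", {})
graph["alice"]["info"]["name"] = "Alice"   # second family: data about the node

alice_friends = neighbors(graph, "alice", "friend")
```

All of a node's immediate edges live in its own row, so reading one row is enough to answer "who are this user's friends?".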

Linked Entities (Graph Data): Example

Social Network (Facebook)

Workloads:
• Get any info about a user → real-time read
• Update any info about a user → real-time write
• Limited graph analysis (based on immediate friends) → batch read

Apache HBase operations used:
• Real-time: Put, Incr, Append; Get, short scans
• Batch: full scan

Time-coupled entities

• Events about entities
• Focus on entities first

[Diagram: axes Time (up to “Now”) and Entities (e1 to e5)]

Time-coupled entities: Schema

• Row = an entity’s events in a time slice
• Row key = entity ID + (time / k)
• Column qualifier = timestamp
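The key construction can be sketched as follows. This is a Python illustration under stated assumptions: k is taken to be a slice width of 3600 seconds, the `#` separator and zero-padding are choices made for the example, and the function names are hypothetical.

```python
SLICE_SECONDS = 3600  # k: assumed width of one time slice, in seconds


def row_key(entity_id, timestamp, k=SLICE_SECONDS):
    """Row key = entity ID + (time / k); zero-padding keeps keys in time order."""
    bucket = timestamp // k
    return f"{entity_id}#{bucket:010d}"


def event_cell(entity_id, timestamp, value, k=SLICE_SECONDS):
    """Return (row key, column qualifier, value) for one event."""
    return row_key(entity_id, timestamp, k), str(timestamp), value


early = row_key("user1", 7199)    # last second of slice 1
later = row_key("user1", 7300)    # falls into slice 2
cell = event_cell("user1", 7300, "message body")
```

All events of one entity within the same slice share a row, and consecutive slices of that entity sort next to each other, so "last N messages for a user" stays a short scan.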

Time-coupled entities: Example

Messaging service

Primary workload
• Sending a message, updating metadata (read, star, move, delete) → real-time write
• Reading a message, getting the last N messages → real-time read

HBase is great!

But there are some use cases which are better off without it...

Current HBase weak spots

• HBase architecture can handle a lot
• Engineering tradeoffs optimize for some use cases
• HBase can still do things it is not optimal for
• Other systems are fundamentally more efficient for some workloads

A not-so-good use case: Large Blob Store

• Saving large objects, >50 MB per cell
• Examples
  • Raw video storage in HBase
• Problems:
  • Write amplification when re-optimizing data for reads (compactions on large unchanging data)
• New: Medium Object (MOB) support (lots of 100 KB to 10 MB cells)

Another not-so-good use case: Analytic archive

• Store data chronologically, time as primary index
• Schema
  • Row key: timestamp
    • Monotonically increasing row key
  • Column qualifiers: properties with data or counters
• Example
  • Machine logs organized by timestamp (causes write hot-spotting)
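The hot-spotting effect can be shown with a small Python sketch. It assumes a table split into four regions by zero-padded timestamp keys; the start keys, the helper name, and the sample timestamps are invented for the illustration.

```python
import bisect
from collections import Counter

# Hypothetical region start keys for a table keyed by zero-padded timestamps.
START_KEYS = ["", "0000001000", "0000002000", "0000003000"]


def region_for(row_key):
    """Index of the region whose key range contains row_key."""
    return bisect.bisect_right(START_KEYS, row_key) - 1


# New log lines always carry timestamps later than anything stored so far.
incoming = [f"{ts:010d}" for ts in range(3000, 3100)]
writes_per_region = Counter(region_for(key) for key in incoming)
```

Every incoming write lands in the last region, so a single RegionServer absorbs the whole write load while the others sit idle.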

That’s all folks!

Questions?

Sources

• A Survey of HBase Application Archetypes
  • Lars George, Jon Hsieh
  • http://www.slideshare.net/HBaseCon/case-studies-session-7
• OpenTSDB 2.0
  • Benoit Sigoure, Chris Larsen
  • http://www.slideshare.net/HBaseCon/ecosystem-session-6
• Hadoop and HBase: Motivations, Use cases and Trade-offs
  • Jon Hsieh
• Phoenix
  • https://phoenix.apache.org