55
Headline Goes Here Speaker Name or Subhead Goes Here DO NOT USE PUBLICLY PRIOR TO 10/23/12 Apache HBase – Where we’ve been and what’s upcoming Jonathan Hsieh | @jmhsieh SoMware Engineer at Cloudera | HBase PMC Member BigData.be April 4, 2014 4/4/14 BigData.be (Brussels Belgium)

Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Headline  Goes  Here  Speaker  Name  or  Subhead  Goes  Here  

DO  NOT  USE  PUBLICLY  PRIOR  TO  10/23/12  

Apache  HBase  –  Where  we’ve  been  and  what’s  upcoming  Jonathan  Hsieh  |  @jmhsieh    SoMware  Engineer  at  Cloudera  |  HBase  PMC  Member    BigData.be  April  4,  2014  

4/4/14 BigData.be (Brussels Belgium)

Page 2: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Who  Am  I?  

•  Cloudera:  •  Tech  Lead  HBase  Team  •  So<ware  Engineer  •  Apache  HBase  commiVer  /  PMC    •  Apache  Flume  founder  /  PMC  

• U  of  Washington:  •  Research  in  Distributed  Systems  

4/4/14 BigData.be (Brussels Belgium)

Page 3: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

What  is  Apache  HBase?  

Apache  HBase  is  a  reliable,  column-­‐

oriented  data  store  that  provides  consistent,  low-­‐latency,  random  read/

write  access.  

ZK   HDFS  

App   MR  

4/4/14 BigData.be (Brussels Belgium)

Page 4: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

An  HBase  History  

Where  We’ve  Been  

4/4/14 BigData.be (Brussels Belgium)

Page 5: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Jan  ‘12:  0.92.0  

Apache  HBase  Timeline  

4/4/14 BigData.be (Brussels Belgium)

2014  2006   2007   2008   2009   2010   2011   2013  2012  

Nov  ’06:  Google    BigTable  OSDI  ‘06  

Apr  ‘07:  First  Apache  HBase  commit  as  Hadoop  contrib  project  

Apr  ‘10:  Apache  HBase  becomes  top  level  project   Oct  ‘13:  0.96.0  

Apr’11:  CDH3  GA  with  HBase  0.90.1      

May  ‘12:  HBaseCon  2012  

Jun  ‘13:  HBaseCon  2013  

Jan‘08:  Promoted  to  Hadoop  subproject  

Feb  ‘13:  0.98.0  May  ‘12:  0.94.0  

Summer‘11:    Messages    on  HBase    Summer  ‘09  

StumbleUpon    goes  producdon  on  HBase  ~0.20  

Nov  ‘11:    Cassini  on  HBase  

Jan  ‘13  Phoenix  on  HBase  

Summer‘11:    Web  Crawl    Cache  

Page 6: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Developer  Community  

•  Acdve  community!  •  Diverse  commiVers  from  many  organizadons  

4/4/14 BigData.be (Brussels Belgium)

Page 7: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Apache  HBase  “Nascar”  Slide  

4/4/14 BigData.be (Brussels Belgium)

Page 8: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Apache  HBase  Core  Development  

4/4/14 BigData.be (Brussels Belgium)

•  Vendors  •  Self  Service  

Page 9: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Apache  HBase  Sample  Users  

4/4/14 BigData.be (Brussels Belgium)

•  Inbox  •  Storage  • Web  •  Search  • Analydcs  • Monitoring      

Page 10: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Apache  HBase  Ecosystem  Projects  

4/4/14 BigData.be (Brussels Belgium)

Page 11: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Disaster  recovery,  Condnuity,  and  MTTR  

Today:  Apache  0.96.0  

4/4/14 BigData.be (Brussels Belgium)

Page 12: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

HBase  provides  Low-­‐latency  Random  Access  

•  Writes:    •  1-­‐3ms,  1k-­‐20k  writes/sec  per  node  

•  Reads:    •  0-­‐3ms  cached,  10-­‐30ms  disk  •  10k-­‐40k  reads  /  second  /  node  from  cache  

•  Cell  size:    •  0B-­‐3MB    

•  Read,  write,  and  insert  data  anywhere  in  the  table  

4/4/14 BigData.be (Brussels Belgium)

0000000000  

1111111111  

2222222222  

3333333333  

4444444444  

5555555555  

6666666666  

7777777777  

1  

2  3  

4  

5  

Page 13: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Core  Properdes  

•  ACID  guarantees  on  a  row    •  Writes  are  durable  •  Strong  consistency  first,  then  availability  •  AMer  failure,  recover  and  return  current  value  instead  of  returning  stale  value  •  CAS  and  atomic  increments  can  be  efficient.  

•  Sorted  By  Primary  Key    •  Short  scans  are  efficient  •  Parddoned  by  Primary  Key  

•  Log  Structured  Merged  Tree  •  Writes  are  extremely  efficient  •  Reads  are  efficient  

•  Periodic  layout  opdmizadons  for  read  opdmizadon  (“compacdons”)  required.  

4/4/14 BigData.be (Brussels Belgium)

Page 14: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Cridcal  Features  

Disaster  Recovery  

•  Cluster  Replicadon  •  Table  Snapshots  •  Copy  Table  •  Import  /  Export  Tables  • Metadata  Corrupdon  repair  tool  (hbck)  

AdministraMve  and  ConMnuity  

•  Kerberos  based  Authendcadon  •  ACL  based  Authorizadon  •  Config  change  via  rolling  restart.  • Within  version  rolling  upgrade.  •  Protobuf  based  wire  protocol  for  RPC  future  proofing  

4/4/14 BigData.be (Brussels Belgium)

Page 15: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Hardened  for  0.96.0  

Table  AdministraMon  

• Online  Schema  change  • Online  Region  Merging  •  Condnuous  fault  injecdon  tesdng  with  “Chaos  Monkey”  

Performance  Tuning  

•  Alternate  key  encodings  for  efficient  memory  usage  

•  Exploring  Compactor  policy  minimizes  compacdon  storms  

•  Smart  and  Adapdve  Stochasdc  region  load  balancer  

•  Fast  split  policy  for  new  tables  

4/4/14 BigData.be (Brussels Belgium)

Page 16: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Mean  Time  to  Recovery  (MTTR)  

• Machine  failures  happen  in  distributed  systems  • Average  unavailability  when  automadcally  recovering  from  a  failure.  

•  Recovery  dme  for  a  unclean  data  center  power  cycle  

4/4/14 BigData.be (Brussels Belgium)

recovered  nodfy  repair  detect  

Region  unavailable  

Region  available  client  aware  

Region  available  client  unaware  

Page 17: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Fast  nodficadon  and  detecdon  (0.96)  

•  Proacdve  nodficadon  of  HMaster  failure  (0.96)  •  Proacdve  nodficadon  of  RS  failure  (0.96)  • Nodfy  client  on  recovery  (0.96)  

•  Fast  server  failover  (Hardware)  

4/4/14 BigData.be (Brussels Belgium)

recovered  replay  assign  split  

Region  unavailable  

Region  available    for  RW  

hdfs   hdfs  

detect  

hdfs  

Page 18: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

•  Previously  had  two  IO  intensive  passes:  •  Log  splisng  to  intermediate  files  •  Assign  and  log  replay  

•  Now  just  one  IO  heavy  pass:    Assign  first,  then  split+replay.  •  Improves  read  and  write  recovery  dmes.  •  Off  by  default  currently*.  

Distributed  log  replay  (0.96)  

4/4/14 BigData.be (Brussels Belgium)

recovered  split  +  replay  assign  

Region  unavailable  

Region  available    for  RW  

Region  available    for  replay  writes  

hdfs  

detect  

*Caveat:  If  you  override  dme  stamps  you  could  have    READ  REPEATED  isoladon  violadons  (use  tags  to  fix  this)  

Page 19: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

A  Future  HBase  

What’s  Upcoming  

4/4/14 BigData.be (Brussels Belgium)

Page 20: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Outline  

•  Improved  Mean  dme  to  recovery  (MTTR)  

•  Improved  Predictability  

•  Improved  Usability  

•  Improved  Muldtenancy  

4/4/14 BigData.be (Brussels Belgium)

Page 21: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Faster  read  recovery  

Improving  MTTR  Further    

4/4/14 BigData.be (Brussels Belgium)

Page 22: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

•  Previously  had  two  IO  intensive  passes:  •  Log  splisng  to  intermediate  files  •  Assign  and  log  replay  

•  Now  just  one  IO  heavy  pass:    Assign  first,  then  split+replay.  •  Improves  read  and  write  recovery  dmes.  •  Off  by  default  currently*.  

Distributed  log  replay  (0.96*)  

4/4/14 BigData.be (Brussels Belgium)

recovered  split  +  replay  assign  

Region  unavailable  

Region  available    for  RW  

Region  available    for  replay  writes  

hdfs  

detect  

*Caveat:  If  you  override  dme  stamps  you  could  have    READ  REPEATED  isoladon  violadons  (use  tags  to  fix  this)  

Page 23: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

recovered  split  +  replay  

Distributed  log  replay  with  fast  write  recovery  

4/4/14 BigData.be (Brussels Belgium)

assign  

Region  unavailable  

Region  available    for  RW  

Region  available    for  all  writes  

hdfs  

detect  

• Writes  in  HBase  do  not  incur  reads.  • With  distributed  log  replay,  we’ve  already  have  regions  open  for  write.  

• Allow  fresh  writes  while  replaying  old  logs*.  *Caveat:  If  you  override  dme  stamps  you  could  have    READ  REPEATED  isoladon  violadons  (use  tags  to  fix  this)  

Page 24: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Fast  Read  Recovery  (proposed)  

•  Idea:  Prisdne  Region  fast  read  recovery  •  If  region  not  edited  it  is  consistent  and  can  recover  RW  immediately    

•  Idea:  Shadow  Regions  for  fast  read  recovery  •  Shadow  region  tails  the  WAL  of  the  primary  region  •  Shadow  memstore  is  one  HDFS  block  behind,  catch  up  recover  RW  

•  Currently  some  progress  for  trunk  

4/4/14 BigData.be (Brussels Belgium)

recovered  assign  

Region  unavailable  

Can  guarantee  no  new  edits?  Region  available    for  all  RW  

detect  

Can  guarantee  we  have  all  edits?  Region  available  for  all  RW  

Page 25: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Improving  the  99%dle  

Improving  Predictability  

4/4/14 BigData.be (Brussels Belgium)

Page 26: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Common  causes  of  performance  variability  

•  Locality  Loss  •  Favored  Nodes,  HDFS  block  affinity  

•  Compacdon  •  Exploring  compactor  

• GC*    •  Off-­‐heap  Cache  

• Hardware  hiccups  • MulM  WAL,  HDFS  speculaMve  read  

4/4/14 BigData.be (Brussels Belgium)

Page 27: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Performance  degraded  aMer  recovery  

•  AMer  recovery,  reads  suffer  a  performance  hit.  •  Regions  have  lost  locality  •  To  maintain  performance  aMer  failover,  we  need  to  regain  locality.  •  Compact  Region  to  regain  locality  

• We  can  do  beVer  by  using  HDFS  features  

4/4/14 BigData.be (Brussels Belgium)

performance  recovered  recovered  

Service  recovered;    degraded  performance  L  

recovery  

Performance  recovered  because    compacdon  restores  locality  J  

Page 28: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

•  Control  and  track  where  block  replicas  are  •  All  files  for  a  region  created  such  that  blocks  go  to  the  same  set  of  favored  nodes  •  When  failing  over,  assign  the  region  to  one  of  those  favored  nodes.  

•  Currently  a  preview  feature  in  0.96  •  Disabled  by  default  because  it  doesn’t  work  well  with  the  latest  balancer  or  splits.  •  Will  likely  use  upcoming  HDFS  block  affinity  for  beVer  operability  

•  Originally  on  Facebook’s  0.89,  ported  to  0.96  

performance  recovered  

Read  Throughput:  Favored  Nodes  (0.96*)  

4/4/14 BigData.be (Brussels Belgium)

Service  recovered;  performance  sustained  because    region  assigned  to  favored  node.  J  

recovery  

Page 29: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Read  latency:  HDFS  hedged  read  (CDH5.0)  

• HBase’s  Region  servers  use  HDFS  client  to  reads  1  of  3  HDFS  block  replicas  

•  If  you  chose  the  slow  node,  your  reads  are  slow.  

•  If  a  read  is  taking  too  long,  speculadvely  go  to  another  that  may  be  faster.  

4/4/14 BigData.be (Brussels Belgium) RS  

1  2  3  

Slow  read!  

Hdfs  re

plicas  

RS  

1  2  3  

Hdfs  re

plicas  

Too  slow,  read  other  replica  

Page 30: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Read  latency:  Read  Replicas  (in  progress)  

•  HBase  client  reads  from  primary  region  servers.  

•  If  you  chose  the  slow  node,  your  reads  are  slow.  

•  Idea:  Read  replica  assigned  to  other  region  servers.    Replicas  periodically  catch  up  (via  snapshots  or  shadow  region  memstores)      

•  Client  specifies  if  stale  read  OK.    If  a  read  is  taking  too  long,  speculadvely  go  to  another  that  may  be  faster.  

4/4/14 BigData.be (Brussels Belgium)

Hbase    

Client  

1  

Slow  read!  

1  2  3  Re

gion

   replicas  

Too  slow,  read  stale  replica  

Hbase    

Client  

Page 31: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Write  latency:  Muldple  WALs  (in  progress)  

• HBase’s  HDFS  client  writes  3  replicas    

• Min  write  latency  is  bounded  by  the  slowest  of  the  3  replicas  

•  Idea:  If  a  write  is  taking  too  long  let’s  duplicate  it  on  another  set  that  may  be  faster.  

4/4/14 BigData.be (Brussels Belgium) RS  

1  2  3  

Slow  Write  

Hdfs    

replicas  

RS  

1  2  3  Hd

fs    

replicas  

1  2  3  Hd

fs  

replicas  

Too  slow,  write    to  other  replica  

Page 32: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

MR  over  Table  Snapshots  (0.98,  CDH5.0)  

•  Previously  MapReduce  jobs  over  HBase  required  online  full  table  scan  

•  Idea:  Take    a  snapshot  and  run  MR  job  over  snapshot  files  

•  Doesn’t  use  HBase  client  •  Avoid  affecdng  HBase  caches    •  3-­‐5x  perf  boost.  

4/4/14 BigData.be (Brussels Belgium)

map  map  map  map  map  map  map  map  

reduce  reduce  reduce  

map  map  map  map  map  map  map  map  

reduce  reduce  reduce  

snapshot  

Page 33: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Improving  Usability  Autotuning,  Tracing,  and  SQL  

4/4/14 BigData.be (Brussels Belgium)

Page 34: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Making  HBase  easier  to  use  and  tune.  

• Difficult  to  see  what  is  happening  in  HBase  •  Easy  to  make  poor  design  decisions  early  without  realizing    • New  Developments  

• Memory  auto  tuning  •  HTrace  +  Zipkin  •  Frameworks  for  Schema  design  

4/4/14 BigData.be (Brussels Belgium)

Page 35: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Memory  Use  Auto-­‐tuning  

• Memory  is  divided  between    •  the  memstore  (used  for  serving  recent  writes)    •  the  block  cache  (used  for  read  hot  spots)  

• Need  to  choose  balance  for  work  load  

4/4/14 BigData.be (Brussels Belgium)

memstore  

Block  cache  memstore  

Block  cache  

memstore  Block  cache  

Read  Heavy    Balanced   Write  heavy  

Page 36: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

HTrace  

•  Problem:  •  Where  is  dme  being  spent  inside  HBase?      

•  Soludon:  HTrace  Framework  •  Inspired  by  Google  Dapper  •  Threaded  through  HBase  and  HDFS    

•  Tracks  dme  spent  in  calls  in  a  distributed  system  by  tracking  spans*  on  different  machines.  

*Some  assembly  sdll  required.  

4/4/14 BigData.be (Brussels Belgium)

Page 37: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

HTrace:  Distributed  Tracing  in  HBase  and  HDFS  

•  Framework  Inspired  by  Google  Dapper  

•  Tracks  dme  spent  in  calls  in  RPCs  across  different  machines.  

•  Threaded  through  HBase  (0.96)  and  future  HDFS.  

4/4/14 BigData.be (Brussels Belgium)

HBase    

RS  

1  HDFS  

DN  

ZK  

HBase  

Client  

HBase  

meta  

HDFS      

NN  

A  span  

RPC  calls  

Page 38: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Zipkin  –  Visualizing  Spans  

•  UI  +  Visualizadon  System  •  WriVen  by  TwiVer  

•  Zipkin  HBase  Storage  •  Zipkin  HTrace  integradon  

•  View  where  dme  from  a  specific  call  is  spent  in  HBase,  HDFS,  and  ZK.  

4/4/14 BigData.be (Brussels Belgium)

Page 39: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

HBase  Schemas  

•  HBase  Applicadon  developers  must  iterate  to  find  a  suitable  HBase  schema  

•  Schema  criMcal  for  Performance  at  Scale  •  How  can  we  make  this  easier?  •  How  can  we  reduce  the  experdse  required  to  do  this?  

•  Today:  •  Lots  of  tuning  knobs  •  Developers  need  to  understand  Column  Families,  Rowkey  design,  Data  encoding,  …  

•  Some  are  expensive  to  change  aMer  the  fact  

4/4/14 BigData.be (Brussels Belgium)

Page 40: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

How  should  I  arrange  my  data?  

•  Isomorphic  data  representadons!  

4/4/14 BigData.be (Brussels Belgium)

rowkey   d:  

bob-­‐col1   aaaa  

bob-­‐col2   bbbb  

bob-­‐col3   cccc  

bob-­‐col4   dddd  

jon-­‐col1   eeee  

jon-­‐col2   ffff  

jon-­‐col3   gggg  

jon-­‐col4   hhhh  

Rowkey   d:col1   d:col2   d:col3   d:col4  

bob   aaaa   bbbb   cccc   dddd  

jon   eeee   ffff   gggg   hhhhh  

Rowkey   col1:   col2:   col3:   col4:  

bob   aaaa   bbbb   cccc   dddd  

jon   eeee   ffff   gggg   hhhhh  

Short  Fat  Table  using  column  qualifiers  

Short  Fat  Table  using  column  families  

Tall  skinny  with    compound  rowkey  

Page 41: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

How  should  I  arrange  my  data?  

•  Isomorphic  data  representadons!  

4/4/14 BigData.be (Brussels Belgium)

rowkey   d:  

bob-­‐col1   aaaa  

bob-­‐col2   bbbb  

bob-­‐col3   cccc  

bob-­‐col4   dddd  

jon-­‐col1   eeee  

jon-­‐col2   ffff  

jon-­‐col3   gggg  

jon-­‐col4   hhhh  

Rowkey   d:col1   d:col2   d:col3   d:col4  

bob   aaaa   bbbb   cccc   dddd  

jon   eeee   ffff   gggg   hhhhh  

Rowkey   col1:   col2:   col3:   col4:  

bob   aaaa   bbbb   cccc   dddd  

jon   eeee   ffff   gggg   hhhhh  

Short  Fat  Table  using  column  qualifiers  

Short  Fat  Table  using  column  families  

Tall  skinny  with    compound  rowkey  

With  great  pow

er  comes  grea

t  

responsibility

!  

 

How  can  we  

make  this  easie

r  for  users?  

Page 42: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Impala  

•  Scalable  Low-­‐latency  SQL  querying  for  HDFS  (and  HBase!)  •  ODBC/JDBC  driver  interface  •  Highlights    

•  Use’s  Hive  metastore  and  its  hbase-­‐hbase  connector  configuradon  convendons.  

•  Nadve  code  implementadon,  uses  JIT  for  query  execudon  opdmizadon.  

•  Authorizadon  via  Kerberos  support  

•  Open  sourced  by  Cloudera  •  hVps://github.com/cloudera/impala  

4/4/14 BigData.be (Brussels Belgium)

Page 43: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Phoenix  

•  A  SQL  skin  over  HBase  targedng  low-­‐latency  queries.  •  JDBC  SQL  interface  •  Highlights  

•  Adds  Types  •  Handles  Compound  Row  key  encoding    •  Secondary  indices  in  development  •  Provides  some  pushdown  aggregadons  (coprocessor).  

•  Open  sourced  by  Salesforce.com  •  Work  from  James  Taylor,  Jesse  Yates,  et  al  •  hVps://github.com/forcedotcom/phoenix  

4/4/14 BigData.be (Brussels Belgium)

Page 44: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Kite  (nee  Cloudera  Development  Kit/CDK)  

•  APIs  that  provides  a  Dataset  abstracMon    •  Provides  get/put/delete  API  in  avro  objects  •  HBase  Support  in  progress  

•  Highlights  •  Supports  muldple  components  of  the  hadoop  distros  (flume,  morphlines,  hive,  crunch,  hcat)  

•  Provides  types  using    Avro  and  parquet  formats  for  encoding  enddes  

•  Manages  schema  evoludon  

•  Open  source  by  Cloudera  •  hVps://github.com/kite-­‐sdk/kite  

4/4/14 BigData.be (Brussels Belgium)

Page 45: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Many  apps  and  users  in  a  single  cluster  

Muld-­‐tenancy  

4/4/14 BigData.be (Brussels Belgium)

Page 46: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Growing  HBase  

•  Pre  0.96.0:  scaling  up  HBase  for  single  HBase  applicadons  

•  Essendally  a  single  user  for  single  app.  

•  Ex:  Facebook  messages,  one  applicadon,  many  hbase  clusters  

•  Shard  users  to  different  pods  •  Focused  on  condnuity  and  disaster  recovery  features  

•  Cross-­‐cluster  Replicadon  •  Table  Snapshots  •  Rolling  Upgrades  

4/4/14 BigData.be (Brussels Belgium)

#  of  isolated  applicadons  

 #  of  clusters  

Scalability  

One  giant  applicadon,    Muldple  clusters  

Page 47: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Growing  HBase  

•  In  0.96  we  introduce  primidves  for  suppordng  MulMtenancy  

• Many  users,  many  applicadons,  one  HBase  cluster  

•  Need  to  have  some  control  of  the  interacdons  different  users  cause.  

•  Ex:  Manage  for  MR  analydcs  and  low-­‐latency  serving  in  one  cluster.  

4/4/14 BigData.be (Brussels Belgium)

#  of  isolated  applicadons  

 #  of  clusters  

muldtenancy  

Scalability  

One  giant  applicadon,    Muldple  clusters  

Many  applicadons    In  one  shared  cluster  

Page 48: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Namespaces  (0.96)  

• Namespaces  provide  an  abstracdon  for  muldple  tenants  to  create  and  manage  their  own  tables  within  a  large  HBase  instance.  

4/4/14 BigData.be (Brussels Belgium)

Namespace  blue   Namespace  green   Namespace  orange  

Page 49: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Muldtenancy  goals  

•  Security  (0.96)  •  A  separate  admin  ACLs  for  different  sets  of  tables  

•  Quotas  (in  progress)  •  Max  tables,  max  regions.  

•  Performance  Isoladon  (in  progress)  •  Limit  performance  impact  load  on  one  table  has  on  others.  

•  Priority  (future)  •  Handle  some  tables  before  others  

4/4/14 BigData.be (Brussels Belgium)

Page 50: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Isoladon  with  Region  Server  Groups  

4/4/14 BigData.be (Brussels Belgium)

Region  assignment  distribudon  (no  region  server  groups)  

Namespace  blue   Namespace  green   Namespace  orange  

Page 51: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Isoladon  with  Region  Server  Groups  

4/4/14 BigData.be (Brussels Belgium)

RSG  blue   RSG  green  orange  

Namespace  blue   Namespace  green   Namespace  orange  

Region  assignment  distribudon  with  Region  Server  Groups  (RSG)  

Page 52: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Cell  Tags  

• Mechanism  for  aVaching  arbitrary  metadata  to  Cells.    

• Modvadon:  Finer-­‐grained  isoladon  •  Use  for  Accumulo-­‐style  cell-­‐level  visibility    

• Main  feature  for  0.98  • Other  uses:  

•  Add  sequence  numbers  to  enable  correct  fast  read/write  recovery  

•  Potendal  for  schema  tags  

4/4/14 BigData.be (Brussels Belgium)

Page 53: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Conclusions  

4/4/14 BigData.be (Brussels Belgium)

Page 54: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Summary  by  Version  0.90  (CDH3)   0.92  /0.94  (CDH4)   0.96  (CDH5)   Next  (0.98  /  1.0.0)  

New  Features  

Stability    Reliability    

Condnuity   Muldtenancy  

MTTR   Recovery  in  Hours  

Recovery  in  Minutes   Recovery  of  writes  in  seconds,  reads  in  10’s  of  seconds    

Recovery  in  Seconds  (reads+writes)  

Perf   Baseline   BeVer  Throughput   Opdmizing  Performance    

Predictable  Performance  

Usability   HBase  Developer  Experdse  

HBase  Operadonal  Experience  

Distributed  Systems  Admin  Experience  

Applicadon  Developers  Experience  

4/4/14 BigData.be (Brussels Belgium)

Page 55: Apache(HBase(–Where(we’ve(been( DONOTUSEPUBLICLY# …bigdata.be/wp-content/uploads/2014/04/140404-brussels-hbase-now-and... · Jan(‘12:(0.92.0(Apache(HBase(Timeline(4/4/14 BigData.be

Quesdons?  @jmhsieh  

4/4/14 BigData.be (Brussels Belgium)