Apache HBase: Where we've been and what's upcoming. Jonathan Hsieh | @jmhsieh, Tech lead / Software Engineer at Cloudera | HBase PMC Member. Hadoop Users Group UK, April 10, 2014

Apache HBase: Where We've Been and What's Upcoming

DESCRIPTION

Jon Hsieh, Software Engineer @ Cloudera and HBase Committer. Apache HBase is a distributed non-relational database that provides low-latency random read/write access to massive quantities of data. This talk is in two parts. First, I'll talk about how, over the past few years, HBase has been deployed in production at companies like Facebook, Pinterest, Groupon, and eBay, and about the vibrant community of contributors from around the world, including folks at Cloudera, Salesforce.com, Intel, Hortonworks, Yahoo!, and Xiaomi. Second, I'll talk about the features in the newest release, 0.96.x, and in the upcoming 0.98.x release.

Page 1: Apache HBase: Where We've Been and What's Upcoming

Apache HBase: Where we've been and what's upcoming

Jonathan Hsieh | @jmhsieh
Tech lead / Software Engineer at Cloudera | HBase PMC Member
Hadoop Users Group UK, April 10, 2014

4/10/14 Hadoop Users Group UK

Page 2: Apache HBase: Where We've Been and What's Upcoming

Who Am I?

•  Cloudera:
    •  Tech Lead, HBase Team
    •  Software Engineer
    •  Apache HBase committer / PMC
    •  Apache Flume founder / PMC

•  U of Washington:
    •  Research in Distributed Systems

Page 3: Apache HBase: Where We've Been and What's Upcoming

What is Apache HBase?

Apache HBase is a reliable, column-oriented data store that provides consistent, low-latency, random read/write access.

[Diagram labels: ZK, HDFS, App, MR]

Page 4: Apache HBase: Where We've Been and What's Upcoming

HBase provides Low-latency Random Access

•  Writes:
    •  1-3ms, 1k-20k writes/sec per node
•  Reads:
    •  0-3ms cached, 10-30ms disk
    •  10k-40k reads / second / node from cache
•  Cell size:
    •  0B-3MB
•  Read, write, and insert data anywhere in the table

[Figure: table rows keyed 0000000000 to 7777777777 with numbered access markers 1 to 5]

Page 5: Apache HBase: Where We've Been and What's Upcoming

Core Properties

•  ACID guarantees on a row
    •  Writes are durable
    •  Strong consistency first, then availability
    •  After failure, recover and return current value instead of returning stale value
    •  CAS and atomic increments can be efficient.

•  Sorted By Primary Key
    •  Short scans are efficient
    •  Partitioned by Primary Key

•  Log-Structured Merge Tree
    •  Writes are extremely efficient
    •  Reads are efficient
    •  Periodic layout optimizations for read optimization ("compactions") required.
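The log-structured merge idea behind these properties can be illustrated with a toy store. This is a minimal sketch, not HBase code: the class name `TinyLSM`, the `flush_threshold` parameter, and the in-memory "segments" are all assumptions made for illustration. It shows why writes are cheap (they only touch an in-memory buffer), why reads may probe several sorted segments, and how a compaction reduces that number.

```python
import bisect


class TinyLSM:
    """Toy log-structured merge store: an in-memory buffer plus sorted, immutable segments."""

    def __init__(self, flush_threshold=2):
        self.memstore = {}      # recent writes, mutable
        self.segments = []      # sorted [(key, value)] lists, oldest first
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        # A write only touches memory, which is why writes are cheap.
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Freeze the buffer as one sorted, immutable segment.
        if self.memstore:
            self.segments.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, key):
        # Newest data wins: check the buffer, then segments from newest to oldest.
        if key in self.memstore:
            return self.memstore[key]
        for segment in reversed(self.segments):
            i = bisect.bisect_left(segment, (key,))
            if i < len(segment) and segment[i][0] == key:
                return segment[i][1]
        return None

    def compact(self):
        # Merge all segments into one so a read probes fewer segments.
        merged = {}
        for segment in self.segments:   # oldest first, so newer values overwrite
            merged.update(segment)
        self.segments = [sorted(merged.items())] if merged else []

    def scan(self, start, stop):
        # Short range scan over the sorted key space [start, stop).
        view = {}
        for segment in self.segments:
            view.update(segment)
        view.update(self.memstore)
        return [(k, v) for k, v in sorted(view.items()) if start <= k < stop]
```

Before `compact()` a read may have to look in every segment; afterwards it looks in one, which is the read-side benefit the slide attributes to compactions.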

Page 6: Apache HBase: Where We've Been and What's Upcoming

An HBase History

Where We've Been

Page 7: Apache HBase: Where We've Been and What's Upcoming

Apache HBase Timeline

[Timeline axis: 2006 to 2014]

•  Nov '06: Google BigTable paper at OSDI '06
•  Apr '07: First Apache HBase commit as Hadoop contrib project
•  Jan '08: Promoted to Hadoop subproject
•  Summer '09: StumbleUpon goes production on HBase ~0.20
•  Apr '10: Apache HBase becomes top level project
•  Apr '11: CDH3 GA with HBase 0.90.1
•  Summer '11: Messages on HBase
•  Summer '11: Web Crawl Cache
•  Nov '11: Cassini on HBase
•  Jan '12: 0.92.0
•  May '12: 0.94.0
•  May '12: HBaseCon 2012
•  Jan '13: Phoenix on HBase
•  Jun '13: HBaseCon 2013
•  Oct '13: 0.96.0
•  Feb '14: 0.98.0

Page 8: Apache HBase: Where We've Been and What's Upcoming

Developer Community

•  Active community!
•  Diverse committers from many organizations

Page 9: Apache HBase: Where We've Been and What's Upcoming

Apache HBase "Nascar" Slide

Page 10: Apache HBase: Where We've Been and What's Upcoming

Apache HBase Core Development

•  Vendors
•  Self Service

Page 11: Apache HBase: Where We've Been and What's Upcoming

Apache HBase Sample Users

•  Inbox
•  Storage
•  Web
•  Search
•  Analytics
•  Monitoring

Page 12: Apache HBase: Where We've Been and What's Upcoming

Apache HBase Ecosystem Projects

Page 13: Apache HBase: Where We've Been and What's Upcoming

What's here and new today?

Today: Apache HBase 0.96.2 / 0.98.1

Page 14: Apache HBase: Where We've Been and What's Upcoming

Critical Features

Disaster Recovery

•  Cluster Replication
•  Table Snapshots
•  Copy Table
•  Import / Export Tables
•  Metadata Corruption repair tool (hbck)

Administrative and Continuity

•  Kerberos based Authentication
•  ACL based Authorization
•  Config change via rolling restart.
•  Within version rolling upgrade.
•  Protobuf based wire protocol for RPC future proofing

Page 15: Apache HBase: Where We've Been and What's Upcoming

Hardened for 0.96

Table Administration

•  Online Schema change
•  Online Region Merging
•  Continuous fault injection testing with "Chaos Monkey"

Performance Tuning

•  Alternate key encodings for efficient memory usage
•  Exploring Compactor policy minimizes compaction storms
•  Smart and Adaptive Stochastic region load balancer
•  Fast split policy for new tables

Page 16: Apache HBase: Where We've Been and What's Upcoming

MR over Table Snapshots (0.98, CDH5.0)

•  Previously, MapReduce jobs over HBase required an online full table scan
•  Idea: Take a snapshot and run the MR job over the snapshot files
    •  Doesn't use the HBase client
    •  Avoids affecting HBase caches
    •  3-5x perf boost.

[Diagram: two MapReduce jobs, each with map and reduce tasks; one is labeled "snapshot"]

Page 17: Apache HBase: Where We've Been and What's Upcoming

Mean Time to Recovery (MTTR)

•  Machine failures happen in distributed systems
•  Average unavailability when automatically recovering from a failure.
•  Recovery time for an unclean data center power cycle

[Timeline: detect, repair, notify, recovered. Region states: Region unavailable; Region available, client unaware; Region available, client aware]

Page 18: Apache HBase: Where We've Been and What's Upcoming

Fast notification and detection (0.96)

•  Proactive notification of HMaster failure (0.96)
•  Proactive notification of RS failure (0.96)
•  Notify client on recovery (0.96)
•  Fast server failover (Hardware)

[Timeline: detect, split, assign, replay, recovered. Labels: hdfs (three times), Region unavailable, Region available for RW]

Page 19: Apache HBase: Where We've Been and What's Upcoming

Distributed log replay (experimental 0.96)

•  Previously had two IO intensive passes:
    •  Log splitting to intermediate files
    •  Assign and log replay
•  Now just one IO heavy pass: Assign first, then split+replay.
    •  Improves read and write recovery times.
    •  Off by default currently*.

[Timeline: detect, assign, split + replay, recovered. Labels: hdfs, Region unavailable, Region available for replay writes, Region available for RW]

*Caveat: If you override timestamps you could have REPEATABLE READ isolation violations (use tags to fix this)
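The difference between the two recovery flows can be sketched with plain data structures. This is an illustrative model only, assuming a WAL edit is a `(region, sequence_id, key, value)` tuple; the function names `split_then_replay` and `assign_then_replay` are hypothetical and do not correspond to HBase classes.

```python
from collections import defaultdict

# Each WAL edit is (region, sequence_id, key, value); this layout is an assumption.
WAL = [
    ("r1", 1, "a", "1"),
    ("r2", 2, "x", "9"),
    ("r1", 3, "a", "2"),
    ("r2", 4, "y", "7"),
]


def split_then_replay(wal):
    """Two passes: group edits into per-region intermediate logs, then replay each one."""
    per_region = defaultdict(list)
    for region, seq, key, value in wal:            # pass 1: split
        per_region[region].append((seq, key, value))
    memstores = defaultdict(dict)
    for region, edits in per_region.items():       # pass 2: replay after assignment
        for seq, key, value in sorted(edits):
            memstores[region][key] = value
    return dict(memstores)


def assign_then_replay(wal, assigned_regions):
    """One pass: regions are already open, so each edit is routed straight to its region."""
    memstores = {region: {} for region in assigned_regions}
    for region, seq, key, value in sorted(wal, key=lambda edit: edit[1]):
        memstores[region][key] = value
    return memstores
```

Both functions end with the same region contents; the second avoids materializing the intermediate per-region logs, which is the IO pass the slide says was removed.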

Page 20: Apache HBase: Where We've Been and What's Upcoming

Cell Tags (0.98 experimental)

•  Mechanism for attaching arbitrary metadata to Cells.
•  Motivation: Finer-grained isolation
    •  Use for Accumulo-style cell-level visibility
•  Main feature for 0.98
•  Other uses:
    •  Add sequence numbers to enable correct fast read/write recovery
    •  Potential for schema tags

Page 21: Apache HBase: Where We've Been and What's Upcoming

HTrace (0.96 experimental)

•  Problem:
    •  Where is time being spent inside HBase?
•  Solution: HTrace Framework
    •  Inspired by Google Dapper
    •  Threaded through HBase and HDFS
    •  Tracks time spent in calls in a distributed system by tracking spans* on different machines.

*Some assembly still required.

Page 22: Apache HBase: Where We've Been and What's Upcoming

HTrace: Distributed Tracing in HBase and HDFS

•  Framework inspired by Google Dapper
•  Tracks time spent in RPC calls across different machines.
•  Threaded through HBase (0.96) and future HDFS.

[Diagram: RPC calls among HBase Client, ZK, HBase meta, HBase RS, HDFS NN, and HDFS DN; each tracked call is a span]

Page 23: Apache HBase: Where We've Been and What's Upcoming

Zipkin: Visualizing Spans

•  UI + Visualization System
    •  Written by Twitter
•  Zipkin HBase Storage
•  Zipkin HTrace integration
•  View where time from a specific call is spent in HBase, HDFS, and ZK.

Page 24: Apache HBase: Where We've Been and What's Upcoming

A Future HBase

What's Upcoming

Page 25: Apache HBase: Where We've Been and What's Upcoming

Outline

•  Improved Mean time to recovery (MTTR)
•  Improved Predictability
•  Improved Usability
•  Improved Multitenancy

Page 26: Apache HBase: Where We've Been and What's Upcoming

Improving MTTR Further

Faster read recovery

Page 27: Apache HBase: Where We've Been and What's Upcoming

Distributed log replay (experimental 0.96)

•  Previously had two IO intensive passes:
    •  Log splitting to intermediate files
    •  Assign and log replay
•  Now just one IO heavy pass: Assign first, then split+replay.
    •  Improves read and write recovery times.
    •  Off by default currently*.

[Timeline: detect, assign, split + replay, recovered. Labels: hdfs, Region unavailable, Region available for replay writes, Region available for RW]

*Caveat: If you override timestamps you could have REPEATABLE READ isolation violations (use tags to fix this)

Page 28: Apache HBase: Where We've Been and What's Upcoming

Distributed log replay with fast write recovery

•  Writes in HBase do not incur reads.
•  With distributed log replay, we already have regions open for write.
•  Allow fresh writes while replaying old logs*.

[Timeline: detect, assign, split + replay, recovered. Labels: hdfs, Region unavailable, Region available for all writes, Region available for RW]

*Caveat: If you override timestamps you could have REPEATABLE READ isolation violations (use tags to fix this)

Page 29: Apache HBase: Where We've Been and What's Upcoming

Fast Read Recovery (proposed)

•  Idea: Pristine Region fast read recovery
    •  If a region has not been edited, it is consistent and can recover RW immediately
•  Idea: Shadow Regions for fast read recovery
    •  Shadow region tails the WAL of the primary region
    •  Shadow memstore is one HDFS block behind; catch up, then recover RW
•  Currently some progress for trunk

[Timeline: detect, assign, recovered. Labels: Region unavailable; Can guarantee no new edits? Region available for all RW; Can guarantee we have all edits? Region available for all RW]

Page 30: Apache HBase: Where We've Been and What's Upcoming

Improving Predictability

Improving the 99%ile

Page 31: Apache HBase: Where We've Been and What's Upcoming

Common causes of performance variability

•  Locality Loss
    •  Favored Nodes, HDFS block affinity
•  Compaction
    •  Exploring compactor
•  GC*
    •  Off-heap Cache
•  Hardware hiccups
    •  Multi WAL, HDFS speculative read

Page 32: Apache HBase: Where We've Been and What's Upcoming

Performance degraded after recovery

•  After recovery, reads suffer a performance hit.
    •  Regions have lost locality
•  To maintain performance after failover, we need to regain locality.
    •  Compact Region to regain locality
•  We can do better by using HDFS features

[Timeline: recovery, recovered, performance recovered. Service recovered with degraded performance; performance recovered because compaction restores locality]

Page 33: Apache HBase: Where We've Been and What's Upcoming

Read Throughput: Favored Nodes (experimental 0.96)

•  Control and track where block replicas are
    •  All files for a region are created such that blocks go to the same set of favored nodes
    •  When failing over, assign the region to one of those favored nodes.
•  Currently a preview feature in 0.96
    •  Disabled by default because it doesn't work well with the latest balancer or splits.
    •  Will likely use upcoming HDFS block affinity for better operability
•  Originally on Facebook's 0.89, ported to 0.96

[Timeline: recovery, performance recovered. Service recovered; performance sustained because region assigned to favored node]

Page 34: Apache HBase: Where We've Been and What's Upcoming

Read latency: HDFS hedged read (CDH5.0)

•  HBase's Region servers use the HDFS client to read 1 of 3 HDFS block replicas
•  If you choose the slow node, your reads are slow.
•  If a read is taking too long, speculatively go to another replica that may be faster.

[Diagram: a region server (RS) reading from HDFS replicas 1, 2, 3. Slow read; too slow, read other replica]
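The hedging pattern itself is small enough to sketch. The following is a generic illustration using Python threads, not the HDFS client implementation: `hedged_read`, `make_replica`, and the `hedge_after_s` threshold are hypothetical names, and each replica is modeled as a zero-argument callable.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def hedged_read(replicas, hedge_after_s):
    """Read from replicas[0]; if it has not answered within hedge_after_s seconds,
    also ask replicas[1] and return whichever answers first."""
    pool = ThreadPoolExecutor(max_workers=2)
    try:
        futures = [pool.submit(replicas[0])]
        done, _ = wait(futures, timeout=hedge_after_s)
        if not done and len(replicas) > 1:
            futures.append(pool.submit(replicas[1]))   # speculative second read
            done, _ = wait(futures, return_when=FIRST_COMPLETED)
        elif not done:
            done, _ = wait(futures)                    # nothing to hedge with
        return next(iter(done)).result()
    finally:
        pool.shutdown(wait=False)   # do not block the caller on the slow replica


def make_replica(name, delay_s):
    """Build a fake replica that answers with its name after delay_s seconds."""
    def read():
        time.sleep(delay_s)
        return name
    return read
```

For example, `hedged_read([make_replica("slow", 0.5), make_replica("fast", 0.01)], hedge_after_s=0.05)` returns the fast replica's answer without waiting for the slow one. The trade-off is extra read load whenever the threshold is set too low.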

Page 35: Apache HBase: Where We've Been and What's Upcoming

Read latency: Read Replicas (in progress)

•  HBase client reads from primary region servers.
•  If you choose the slow node, your reads are slow.
•  Idea: Read replicas assigned to other region servers. Replicas periodically catch up (via snapshots or shadow region memstores)
•  Client specifies if a stale read is OK. If a read is taking too long, speculatively go to another replica that may be faster.

[Diagram: an HBase client reading from region replicas 1, 2, 3. Slow read; too slow, read stale replica]

Page 36: Apache HBase: Where We've Been and What's Upcoming

Write latency: Multiple WALs (in progress)

•  HBase's HDFS client writes 3 replicas
•  Min write latency is bounded by the slowest of the 3 replicas
•  Idea: If a write is taking too long, let's duplicate it on another set that may be faster.

[Diagram: a region server (RS) writing to sets of HDFS replicas 1, 2, 3. Slow write; too slow, write to other replica]

Page 37: Apache HBase: Where We've Been and What's Upcoming

Improving Usability

Autotuning, Tracing, and SQL

Page 38: Apache HBase: Where We've Been and What's Upcoming

Making HBase easier to use and tune.

•  Difficult to see what is happening in HBase
•  Easy to make poor design decisions early without realizing
•  New Developments
    •  Memory auto tuning
    •  HTrace + Zipkin
    •  Frameworks for Schema design

Page 39: Apache HBase: Where We've Been and What's Upcoming

Memory Use Auto-tuning (trunk)

•  Memory is divided between
    •  the memstore (used for serving recent writes)
    •  the block cache (used for read hot spots)
•  Need to choose balance for work load

[Diagram: heap split between memstore and block cache for three workloads: Read heavy, Balanced, Write heavy]

Page 40: Apache HBase: Where We've Been and What's Upcoming

HBase Schemas

•  HBase Application developers must iterate to find a suitable HBase schema
•  Schema critical for Performance at Scale
    •  How can we make this easier?
    •  How can we reduce the expertise required to do this?
•  Today:
    •  Lots of tuning knobs
    •  Developers need to understand Column Families, Rowkey design, Data encoding, …
    •  Some are expensive to change after the fact

Page 41: Apache HBase: Where We've Been and What's Upcoming

How should I arrange my data?

•  Isomorphic data representations!

Tall skinny with compound rowkey

    rowkey      d:
    bob-col1    aaaa
    bob-col2    bbbb
    bob-col3    cccc
    bob-col4    dddd
    jon-col1    eeee
    jon-col2    ffff
    jon-col3    gggg
    jon-col4    hhhh

Short Fat Table using column qualifiers

    Rowkey    d:col1    d:col2    d:col3    d:col4
    bob       aaaa      bbbb      cccc      dddd
    jon       eeee      ffff      gggg      hhhh

Short Fat Table using column families

    Rowkey    col1:     col2:     col3:     col4:
    bob       aaaa      bbbb      cccc      dddd
    jon       eeee      ffff      gggg      hhhh
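To make "isomorphic" concrete, the three layouts can be generated from one logical dataset and converted back. This is a sketch over plain dictionaries rather than the HBase client API; the function names and the `"family:qualifier"` string keys are illustrative assumptions.

```python
# The same logical data: two entities with four attributes each.
logical = {
    "bob": {"col1": "aaaa", "col2": "bbbb", "col3": "cccc", "col4": "dddd"},
    "jon": {"col1": "eeee", "col2": "ffff", "col3": "gggg", "col4": "hhhh"},
}


def tall_skinny(data):
    """Compound rowkey '<entity>-<attr>'; one cell per row in family 'd', empty qualifier."""
    return {f"{row}-{attr}": {"d:": value}
            for row, attrs in data.items() for attr, value in attrs.items()}


def fat_qualifiers(data):
    """One row per entity; attributes become qualifiers in a single family 'd'."""
    return {row: {f"d:{attr}": value for attr, value in attrs.items()}
            for row, attrs in data.items()}


def fat_families(data):
    """One row per entity; each attribute is its own column family, empty qualifier."""
    return {row: {f"{attr}:": value for attr, value in attrs.items()}
            for row, attrs in data.items()}


def from_tall_skinny(table):
    """Invert tall_skinny, showing that no information was lost."""
    out = {}
    for rowkey, cells in table.items():
        row, attr = rowkey.split("-", 1)
        out.setdefault(row, {})[attr] = cells["d:"]
    return out
```

All three functions carry the same information, so the choice between them is about access patterns and physical layout rather than expressiveness.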

Page 42: Apache HBase: Where We've Been and What's Upcoming

How should I arrange my data?

•  Isomorphic data representations!

[Figure: the same three table layouts as on the previous slide: tall skinny with compound rowkey, short fat table using column qualifiers, short fat table using column families]

With great power comes great responsibility!

How can we make this easier for users?

Page 43: Apache HBase: Where We've Been and What's Upcoming

Impala

•  Scalable Low-latency SQL querying for HDFS (and HBase!)
•  ODBC/JDBC driver interface
•  Highlights
    •  Uses Hive metastore and its hive-hbase connector configuration conventions.
    •  Native code implementation, uses JIT for query execution optimization.
    •  Authentication via Kerberos support
•  Open sourced by Cloudera
    •  https://github.com/cloudera/impala

Page 44: Apache HBase: Where We've Been and What's Upcoming

Phoenix

•  A SQL skin over HBase targeting low-latency queries.
•  JDBC SQL interface
•  Highlights
    •  Adds Types
    •  Handles Compound Row key encoding
    •  Secondary indices in development
    •  Provides some pushdown aggregations (coprocessor).
•  Open sourced by Salesforce.com
    •  Work from James Taylor, Jesse Yates, et al
    •  https://github.com/forcedotcom/phoenix

Page 45: Apache HBase: Where We've Been and What's Upcoming

Kite (née Cloudera Development Kit/CDK)

•  APIs that provide a Dataset abstraction
    •  Provides get/put/delete API in Avro objects
    •  HBase Support in progress
•  Highlights
    •  Supports multiple components of the Hadoop distros (flume, morphlines, hive, crunch, hcat)
    •  Provides types using Avro and Parquet formats for encoding entities
    •  Manages schema evolution
•  Open sourced by Cloudera
    •  https://github.com/kite-sdk/kite

Page 46: Apache HBase: Where We've Been and What's Upcoming

Multi-tenancy

Many apps and users in a single cluster

Page 47: Apache HBase: Where We've Been and What's Upcoming

Growing HBase

•  Pre 0.96.0: scaling up HBase for single HBase applications
    •  Essentially a single user for a single app.
•  Ex: Facebook messages, one application, many HBase clusters
    •  Shard users to different pods
•  Focused on continuity and disaster recovery features
    •  Cross-cluster Replication
    •  Table Snapshots
    •  Rolling Upgrades

[Chart: axes are # of isolated applications and # of clusters. Scalability: one giant application, multiple clusters]

Page 48: Apache HBase: Where We've Been and What's Upcoming

Growing HBase

•  In 0.96 we introduce primitives for supporting Multitenancy
    •  Many users, many applications, one HBase cluster
•  Need to have some control of the interactions different users cause.
    •  Ex: Manage for MR analytics and low-latency serving in one cluster.

[Chart: axes are # of isolated applications and # of clusters. Scalability: one giant application, multiple clusters. Multitenancy: many applications in one shared cluster]

Page 49: Apache HBase: Where We've Been and What's Upcoming

Namespaces (0.96)

•  Namespaces provide an abstraction for multiple tenants to create and manage their own tables within a large HBase instance.

[Diagram: Namespace blue, Namespace green, Namespace orange]

Page 50: Apache HBase: Where We've Been and What's Upcoming

Multitenancy goals

•  Security (0.96)
    •  Separate admin ACLs for different sets of tables
•  Quotas (in progress)
    •  Max tables, max regions.
•  Performance Isolation (in progress)
    •  Limit the performance impact that load on one table has on others.
•  Priority (future)
    •  Prioritize some workloads/tables/users before others
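A "max tables" quota per namespace could look roughly like the sketch below. This is purely hypothetical: `NamespaceQuota` is not an HBase API, and the only HBase convention assumed is that a qualified table name has the form `namespace:table`, with unqualified names falling into a `default` namespace.

```python
class NamespaceQuota:
    """Hypothetical per-namespace limit on table count; not an HBase API."""

    def __init__(self, max_tables):
        self.max_tables = max_tables
        self.tables = {}    # namespace -> set of table names

    def create_table(self, qualified_name):
        # Assumes names of the form "namespace:table"; bare names go to "default".
        namespace, _, table = qualified_name.rpartition(":")
        namespace = namespace or "default"
        existing = self.tables.setdefault(namespace, set())
        if table not in existing and len(existing) >= self.max_tables:
            raise RuntimeError(
                f"namespace {namespace!r} is at its quota of {self.max_tables} tables"
            )
        existing.add(table)
        return namespace, table
```

Because the limit is tracked per namespace, one tenant reaching its quota does not stop another tenant from creating tables.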

Page 51: Apache HBase: Where We've Been and What's Upcoming

Isolation with Region Server Groups (in progress)

[Diagram: Region assignment distribution (no region server groups) for Namespace blue, Namespace green, Namespace orange]

Page 52: Apache HBase: Where We've Been and What's Upcoming

Isolation with Region Server Groups (in progress)

[Diagram: Region assignment distribution with Region Server Groups (RSG): RSG blue and RSG green orange, for Namespace blue, Namespace green, Namespace orange]
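The effect of region server groups on assignment can be sketched with a simple round-robin placer. This is an illustration only, not the HBase balancer: `assign_regions`, the server names, and the group layout (one group for blue, one shared by green and orange, as the diagram suggests) are assumptions.

```python
from itertools import cycle


def assign_regions(regions, servers, groups=None):
    """Round-robin regions onto servers.

    regions: maps region name -> namespace.
    groups:  optionally maps namespace -> list of servers allowed to host it.
    """
    cursors = {}
    assignment = {}
    for region, namespace in sorted(regions.items()):
        allowed = tuple(groups[namespace]) if groups else tuple(servers)
        if allowed not in cursors:
            cursors[allowed] = cycle(allowed)   # one round-robin cursor per server set
        assignment[region] = next(cursors[allowed])
    return assignment


regions = {"blue-1": "blue", "blue-2": "blue", "blue-3": "blue",
           "green-1": "green", "orange-1": "orange"}
servers = ["rs1", "rs2", "rs3", "rs4"]
groups = {"blue": ["rs1", "rs2"], "green": ["rs3", "rs4"], "orange": ["rs3", "rs4"]}
```

Without `groups`, regions from every namespace spread across all servers; with `groups`, blue regions stay on `rs1` and `rs2`, so load on blue tables cannot land on the servers that host green and orange.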

Page 53: Apache HBase: Where We've Been and What's Upcoming

Conclusions

Page 54: Apache HBase: Where We've Been and What's Upcoming

Summary by Version

                 0.90 (CDH3) | 0.92 / 0.94 (CDH4) | 0.96 (CDH5) | Next (0.98 / 1.0.0)
    New Features: Stability | Reliability | Continuity | Multitenancy
    MTTR:         Recovery in Hours | Recovery in Minutes | Recovery of writes in seconds, reads in 10's of seconds | Recovery in Seconds (reads+writes)
    Perf:         Baseline | Better Throughput | Optimizing Performance | Predictable Performance
    Usability:    HBase Developer Expertise | HBase Operational Experience | Distributed Systems Admin Experience | Application Developers Experience

Page 55: Apache HBase: Where We've Been and What's Upcoming

Questions? @jmhsieh