
The Evolution and Future of Hadoop Storage (Hadoop Conference Japan 2016 keynote slides)


Page 1

© Cloudera, Inc. All rights reserved.

The Evolution and Future of Hadoop Storage
Todd Lipcon | Engineer at Cloudera
Twitter @tlipcon | [email protected]

Pages 2-6


Introduction (the evolution and future of me)

[Chart: Mailing list messages sent by Todd Lipcon]

- Early user of Hadoop
- Joined Cloudera as Software Engineer
- Work on HDFS, HBase, MR (HA, performance, stability, etc.)
- Became a committer, PMC member, and ASF Member
- Founded the Kudu project within Cloudera
- Secretly developing with a small team for 3 years
- Kudu announced and contributed to the ASF as Apache Kudu (incubating)

Spoke at HCJ 2011!

Page 7


Happy birthday!

Hadoop: the last 10 years


Pages 9-12


Evolution of the Hadoop Platform

The stack is continually evolving and growing! Projects added to the chart in each year (every year's column also includes all earlier projects):

2006: Core Hadoop (HDFS, MapReduce)
2007: Solr, Pig
2008: HBase, ZooKeeper
2009: Hive, Mahout
2010: Sqoop, Whirr, Avro
2011: Flume, Bigtop, Oozie, MRUnit, HCatalog, Hue, YARN
2012: Spark, Tez, Impala, Kafka, Drill
2013: Parquet, Sentry
2014-15: Ibis, Flink

Basics (2006-2007)
- Very basic Hadoop
- Batch processes only
- Not stable, fast, or featureful

Production (2008-2011)
- Expanding feature set
- Basic security, HA, stability
- Commercial distributions

Enterprise (2012-2015)
- Security
- Performance
- Fast full-featured SQL

Page 13


Evolution of Storage (Basics / 2006-2007)

• HDFS only
  • Support basic batch workloads. No HA.
• Performance not important
  • MapReduce is too slow, anyway!
• Batch only
• Early adopters (Facebook, Yahoo, etc.)

Page 14


Evolution of Storage (Production / 2008-2011)

• HDFS evolves to add high availability and security
  • Focused on batch workloads
  • Inefficient file formats commonly used (text)
  • Query engines are slow! No need for better performance
• Apache HBase becomes an Apache Top-Level Project (TLP)
  • Introduces fast random access
  • Early adopters experiment with new use cases
  • Deployed at Facebook and other large companies

Page 15


Evolution of Storage (Enterprise / 2012-2015)

• Reliable core brings new users
  • Enterprise features: access control, disaster recovery, encryption
• Introduction of fast query engines
  • 10-100x faster SQL-on-Hadoop (Impala, Spark, etc.)
  • Pushes HDFS performance improvements: caching, CPU efficiency, columnar file formats (Apache Parquet, ORCFile)
• HBase evolves to 1.0
  • Improved stability, scalability, security
  • Good random access, but not fast for SQL analytics
• Initial support for cloud storage
  • Rising adoption of AWS, Azure, Google Compute, etc.
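Why columnar file formats help analytic queries can be shown with a toy sketch. It is not Parquet's or ORCFile's real encoding. The sketch stores the same records row-wise and column-wise, then counts how many values an aggregate over a single column has to touch. The record fields are hypothetical.

```python
# Toy records; the field names are hypothetical, chosen only for illustration.
ROWS = [
    {"user_id": 1, "country": "JP", "bytes": 120},
    {"user_id": 2, "country": "US", "bytes": 300},
    {"user_id": 3, "country": "JP", "bytes": 80},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per column (columnar layout)."""
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def sum_row_layout(rows, field):
    """Row layout: each row is read whole, so every field is touched."""
    total, values_read = 0, 0
    for row in rows:
        values_read += len(row)
        total += row[field]
    return total, values_read


def sum_column_layout(columns, field):
    """Column layout: only the requested column is read."""
    column = columns[field]
    return sum(column), len(column)


columns = to_columns(ROWS)
print(sum_row_layout(ROWS, "bytes"))        # (500, 9)
print(sum_column_layout(columns, "bytes"))  # (500, 3)
```

With three columns, the row layout reads three times as many values for the same answer. Real columnar formats also encode and compress each column separately, which this toy omits.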

Page 16


So what’s the next generation? 2016 and beyond

Page 17


2016-2020 (Next-gen): storage hardware

• Spinning disk -> solid-state storage
  • NAND flash: up to 450k read and 250k write IOPS, about 2 GB/sec read and 1.5 GB/sec write throughput, at less than $3/GB and dropping fast
  • 3D XPoint memory (up to 1000x faster than NAND, cheaper than RAM)
• RAM is cheaper and more abundant
  • 64 -> 128 -> 256 GB over the last few years
• HDFS and HBase were not designed for next-generation hardware
  • They do not use the full speed of flash or the full size of RAM
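A back-of-the-envelope check shows how the flash figures above relate to each other. The sketch assumes 4 KiB random I/Os; the slide does not give an I/O size. Under that assumption, the read IOPS figure alone implies about 1.8 GB/s, close to the quoted sequential read rate. A storage engine therefore has to issue hundreds of thousands of small requests per second per device to approach the hardware's limit.

```python
# Figures quoted on the slide for NAND flash.
READ_IOPS = 450_000
WRITE_IOPS = 250_000
SEQ_READ_GB_PER_S = 2.0

# Assumption: each random I/O transfers 4 KiB (the slide does not say).
IO_SIZE_BYTES = 4 * 1024


def iops_to_gb_per_s(iops, io_size_bytes=IO_SIZE_BYTES):
    """Bandwidth implied by an IOPS rate, in GB/s (1 GB = 10**9 bytes)."""
    return iops * io_size_bytes / 1e9


def seconds_to_scan(terabytes, gb_per_s):
    """Time to read a data set sequentially at a given throughput."""
    return terabytes * 1000 / gb_per_s


print(f"{iops_to_gb_per_s(READ_IOPS):.2f} GB/s")   # 1.84 GB/s of random reads
print(f"{iops_to_gb_per_s(WRITE_IOPS):.2f} GB/s")  # 1.02 GB/s of random writes
print(seconds_to_scan(1, SEQ_READ_GB_PER_S))        # 500.0 seconds per TB
```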

Page 18


2016-2020 (Next-gen): gaps in capabilities

HDFS good at:
• Batch ingest only (e.g. hourly)
• Efficiently scanning large amounts of data (analytics)

HBase good at:
• Efficiently finding and writing individual rows
• Making data mutable

Gaps exist when these properties are needed simultaneously.
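A toy model makes the gap concrete. In an append-only store of immutable batch files, changing one row means rewriting the file that holds it. A keyed store changes the row directly, but in this sketch it offers no efficient layout for large scans. Both classes are hypothetical simplifications, not models of the real HDFS or HBase internals.

```python
# Toy model of the capability gap; neither class reflects real HDFS or HBase internals.


class BatchFileStore:
    """Append-only store of immutable batch files (HDFS-style ingest)."""

    def __init__(self):
        self.files = []  # each "file" is an immutable tuple of (key, value) rows

    def ingest_batch(self, rows):
        self.files.append(tuple(rows))

    def scan(self):
        for f in self.files:
            yield from f

    def update(self, key, value):
        """Files cannot be edited in place, so the whole file holding `key`
        is rewritten. Returns the number of rows rewritten."""
        for i, f in enumerate(self.files):
            if any(k == key for k, _ in f):
                self.files[i] = tuple((k, value if k == key else v) for k, v in f)
                return len(f)
        raise KeyError(key)


class KeyedStore:
    """Mutable store indexed by key (HBase-style random access)."""

    def __init__(self):
        self.rows = {}

    def put(self, key, value):
        self.rows[key] = value
        return 1  # exactly one row is touched

    def get(self, key):
        return self.rows[key]


batch = BatchFileStore()
batch.ingest_batch([(k, 0) for k in range(1000)])
print(batch.update(42, 7))  # 1000: the whole file is rewritten to change one row

keyed = KeyedStore()
print(keyed.put(42, 7))     # 1
```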

Page 19


2016-2020 (Next-gen): Apache Kudu (incubating)

• High throughput for big scans
  Goal: within 2x of Parquet
• Low latency for short accesses
  Goal: 1ms read/write on SSD
• Relational data model
  • SQL queries are easy
  • "NoSQL"-style scan/insert/update (Java/C++ client)
• Expands Hadoop use cases
  • Real-time analytics and time series
  • Internet of Things
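A relational data model with NoSQL-style operations can be illustrated from the client side with a hypothetical in-memory table. Rows must match a schema, and every row has a primary key. Clients insert, update, and scan with a column projection and a filter. The class and method names are invented for this sketch and are not Kudu's Java/C++ client API.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple

Row = Dict[str, Any]


class ToyTable:
    """Hypothetical in-memory table with a fixed schema and a primary key.
    It mimics the access pattern (insert/update/scan) only; it is not Kudu's API."""

    def __init__(self, columns: List[str], key: str):
        if key not in columns:
            raise ValueError("primary key must be one of the columns")
        self.columns = list(columns)
        self.key = key
        self._rows: Dict[Any, Row] = {}

    def insert(self, row: Row) -> None:
        if set(row) != set(self.columns):
            raise ValueError("row does not match the schema")
        k = row[self.key]
        if k in self._rows:
            raise KeyError(f"duplicate primary key {k!r}")
        self._rows[k] = dict(row)

    def update(self, key_value: Any, changes: Row) -> None:
        if self.key in changes:
            raise ValueError("this toy does not allow primary-key updates")
        unknown = set(changes) - set(self.columns)
        if unknown:
            raise ValueError(f"unknown columns: {sorted(unknown)}")
        self._rows[key_value].update(changes)  # KeyError if the row is missing

    def scan(self, projection: Optional[List[str]] = None,
             predicate: Optional[Callable[[Row], bool]] = None) -> Iterator[Tuple]:
        """Yield projected rows in primary-key order, filtered by `predicate`."""
        cols = projection or self.columns
        for k in sorted(self._rows):
            row = self._rows[k]
            if predicate is None or predicate(row):
                yield tuple(row[c] for c in cols)


sensors = ToyTable(["sensor_id", "temp_c", "site"], key="sensor_id")
sensors.insert({"sensor_id": 1, "temp_c": 18.5, "site": "tokyo"})
sensors.insert({"sensor_id": 2, "temp_c": 22.0, "site": "osaka"})
sensors.insert({"sensor_id": 3, "temp_c": 25.1, "site": "tokyo"})
sensors.update(1, {"temp_c": 21.0})

hot = list(sensors.scan(projection=["sensor_id", "temp_c"],
                        predicate=lambda r: r["temp_c"] > 20))
print(hot)  # [(1, 21.0), (2, 22.0), (3, 25.1)]
```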

Page 20


Kudu: Open source, scalable and fast tabular storage

• Scalable
  • Designed to scale to 1000s of nodes, tens of PBs
• Fast
  • Designed for modern hardware
  • Millions of read/write operations per second across the cluster
  • Multiple GB/second of read throughput per node
• Tabular
  • Store tables like a normal database (supports SQL, Spark, etc.)
  • NoSQL-style access to 100+ billion row tables (Java/C++/Python APIs)
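A common, generic way to spread one table across many nodes is to hash each row's primary key into a fixed number of partitions and place those partitions on servers. The sketch below shows that idea. The partition count, server names, and round-robin placement are assumptions for illustration, not a description of how Kudu assigns data.

```python
import hashlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    """Map a primary key to a partition with a stable hash. MD5 is used only
    because Python's built-in hash() of str is randomized per process."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def assign_partitions(num_partitions, servers):
    """Round-robin placement of partitions onto servers (hypothetical policy)."""
    return {p: servers[p % len(servers)] for p in range(num_partitions)}


NUM_PARTITIONS = 16
servers = ["node-a", "node-b", "node-c", "node-d"]
placement = assign_partitions(NUM_PARTITIONS, servers)

# Count how many of 100,000 keys land on each server.
load = Counter(placement[partition_for(f"row-{i}", NUM_PARTITIONS)]
               for i in range(100_000))
print(sorted(load.items()))  # roughly 25,000 rows per node
```

Because every server owns the same number of partitions and the hash spreads keys evenly, adding servers (and partitions) spreads both data and request load further.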

Page 21


2016-2020 (Next-gen): Predictions

• Kudu will develop an enterprise feature set and enable simple, high-performance real-time architectures
  • Increasing ability to migrate traditional applications
• HDFS and HBase will continue to innovate and adapt to next-generation hardware
  • Steady improvements in performance, efficiency, and scalability (e.g. erasure coding)
• Cloud storage will become increasingly important
  • The Hadoop ecosystem will evolve to coexist with it
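Erasure coding improves storage efficiency by storing parity data instead of full extra replicas. HDFS's erasure coding uses Reed-Solomon codes. For example, RS-6-3 stores 6 data blocks plus 3 parity blocks, which is 1.5x raw storage versus 3x for triple replication. The sketch below swaps in the simplest erasure code, a single XOR parity block, to show how a lost block is rebuilt from the surviving ones. Unlike Reed-Solomon, it tolerates only one lost block per stripe.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Return the data blocks plus one XOR parity block (k data + 1 parity)."""
    if len({len(b) for b in data_blocks}) != 1:
        raise ValueError("blocks must all be the same length")
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stripe, lost_index):
    """Rebuild the block at lost_index by XOR-ing all surviving blocks."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


def storage_overhead(data_units, extra_units):
    """Raw bytes stored per byte of user data."""
    return (data_units + extra_units) / data_units


stripe = encode([b"HDFS", b"Kudu", b"Hive"])
print(recover(stripe, 1))      # b'Kudu': a lost data block rebuilt from the others
print(storage_overhead(6, 3))  # 1.5 (RS-6-3)
print(storage_overhead(1, 2))  # 3.0 (triple replication: 1 copy + 2 extra)
```

Reed-Solomon generalizes this idea: with 3 parity blocks, any 3 of the 9 blocks in an RS-6-3 stripe can be lost and still rebuilt.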

Page 22


Thank you! @tlipcon @ApacheKudu

To learn more about Kudu, please attend my session at 13:45, Conference Room B (7F).