
Page 1 © Hortonworks Inc. 2014

Discover HDP 2.2: Apache HBase with YARN & Slider for Fast NoSQL Access

Hortonworks. We do Hadoop.


Speakers

Justin Sears

Hortonworks Product Marketing Manager

Carter Shanklin

Hortonworks Director of Product Management & PM for Apache HBase in Hortonworks Data Platform

Enis Soztutar

Hortonworks Engineer, Apache HBase Committer & PMC Member


Agenda

•  Introduction to Apache HBase

•  New HBase Innovation in HDP 2.2

–  HBase HA

–  Support for rolling upgrades

–  HBase on YARN using Apache Slider

•  Q & A

We’ll move quickly:

•  Attendee phone lines are muted

•  Text any questions to Enis Soztutar using Webex chat

•  Questions answered at the end

•  Unanswered questions and answers in upcoming blog post


Big Data, Hadoop & Data Center Re-platforming

Business Drivers

•  From reactive analytics to proactive interactions

•  Insights that drive competitive advantage & optimal returns

Financial Drivers

•  Cost of data systems, as % of IT spend, continues to grow

•  Cost advantages of commodity hardware & open source software

Technical Drivers

•  Data is growing exponentially & existing systems are overwhelmed

•  Predominantly driven by NEW types of data that can inform analytics

There is an inequitable balance between vendor and customer in the market


Hadoop Value: New Types of Data

•  Sentiment: Understand how your customers feel about your brand and products, right now

•  Clickstream: Capture and analyze website visitors’ data trails and optimize your website

•  Sensors: Discover patterns in data streaming automatically from remote sensors and machines

•  Geographic: Analyze location-based data to manage operations where they occur

•  Server Logs: Research logs to diagnose process failures and prevent security breaches

•  Unstructured: Understand patterns in files across millions of web pages, emails, and documents


A Shift from Reactive to Proactive Interactions

HDP and Hadoop allow organizations to use data to shift interactions from reactive (post transaction) to proactive (pre decision):

•  A shift in Advertising: from mass branding to 1x1 targeting

•  A shift in Financial Services: from educated investing to automated algorithms

•  A shift in Healthcare: from mass treatment to designer medicine

•  A shift in Retail: from static branding to real-time personalization

•  A shift in Telco: from break then fix to repair before break


Enterprise Goals for the Modern Data Architecture

•  Consolidate siloed data sets, structured and unstructured

•  Central data set on a single cluster

•  Multiple workloads across batch, interactive and real time

•  Central services for security, governance and operation

•  Preserve existing investment in current tools and platforms

•  Single view of the customer, product, supply chain

[Diagram: Modern Data Architecture. Applications: Business Analytics, Custom Applications, Packaged Applications. Data systems: RDBMS, EDW, MPP, and Hadoop (YARN: Data Operating System over HDFS, nodes 1 to N, serving interactive, real-time and batch workloads). Sources: existing systems (CRM, ERP, other), clickstream, web & social, geolocation, sensor & machine, server logs, unstructured.]


YARN Transformed Hadoop & Opened a New Era

YARN The Architectural Center of Hadoop

•  Common data platform, many applications

•  Support multi-tenant access & processing

•  Batch, interactive & real-time use cases

[Diagram: YARN: Data Operating System (Cluster Resource Management) running over HDFS (Hadoop Distributed File System), nodes 1 to N, and providing batch, interactive & real-time data access. Engines shown: Pig (Script), Hive (SQL), Cascading (Java, Scala), Storm (Stream), Solr (Search), HBase and Accumulo (NoSQL), Spark (In-Memory), and other ISV engines, with Tez and Slider as intermediate layers.]


YARN Extends Hadoop to Other Data Center Leaders

YARN The Architectural Center of Hadoop

•  Common data platform, many applications

•  Support multi-tenant access & processing

•  Batch, interactive & real-time use cases

•  Supports 3rd-party ISV tools (e.g. SAS, Syncsort, Actian)

YARN Ready Applications

•  Facilitate ongoing innovation and enterprise adoption via an ecosystem of new and existing “YARN Ready” solutions


Enterprise Hadoop: Central Set of Services


Enables Apache Hadoop to be an Enterprise Data Platform with centralized services for:

•  Governance

•  Operations

•  Security

Everything that plugs into Hadoop inherits these services

•  Governance: Load data and manage according to policy

•  Operations: Deploy and effectively manage the platform (provision, manage & monitor: Ambari, Zookeeper; scheduling: Oozie)

•  Security: Provide a layered approach to security through Authentication, Authorization, Accounting, and Data Protection


Hortonworks Data Platform 2.2

HDP Delivers Enterprise Hadoop

[Diagram: HDP 2.2 stack. Data access (batch, interactive & real-time) on YARN and HDFS: Pig, Hive, Cascading, Storm, Solr, HBase, Accumulo, Spark and other ISV engines, with Tez and Slider. Governance (data workflow, lifecycle & governance): Falcon, Sqoop, Flume, Kafka, NFS, WebHDFS. Security: authentication, authorization, audit, data protection (Storage: HDFS; Resources: YARN; Access: Hive; Pipeline: Falcon; Cluster: Ranger; Cluster: Knox). Operations: Ambari, Zookeeper (provision, manage & monitor), Oozie (scheduling). Deployment choice: Linux, Windows, cloud, on-premises.]

YARN is the architectural center of HDP

•  Common data set across all applications

•  Batch, interactive & real-time workloads

•  Multi-tenant access & processing

Provides comprehensive enterprise capabilities

•  Governance

•  Security

•  Operations

Enables broad ecosystem adoption

•  ISVs can plug directly into Hadoop

The widest range of deployment options

•  Linux & Windows

•  On premises & cloud


Introduction to Apache HBase


What Is Apache HBase?

•  Flexible schema

•  Extreme low latency

•  SQL and NoSQL interfaces

•  Store and process petabytes of data

•  Scale out on commodity servers

•  Integrated with YARN

•  Directly integrated with Hadoop

•  100% open source

[Diagram: several HBase RegionServers running on YARN (Data Operating System) across nodes 1 to N, with HDFS as permanent data storage.]


New in HDP 2.2: HBase HA


HBase HA: 3 Levels of Protection

Server                 Primary Keys (Read Write)   Standby Keys (Read Only)
HBase RegionServer 1   1-100                       101-200, 201-300
HBase RegionServer 2   101-200                     201-300, 301-400
HBase RegionServer 3   201-300                     301-400, 1-100
HBase RegionServer 4   301-400                     1-100, 101-200

HDFS (3 copies of all data, available to all RegionServers)

1. HBase keys are range partitioned across servers. A node failure affects 1 key range; the rest remain available.

2. HBase HA stores read-only copies in separate RegionServers. Data can still be read if a node fails.

3. 3 copies of all data are stored in HDFS. Data from failed nodes is automatically recovered on other nodes.
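The key layout on this slide can be modeled in a few lines. The following is a minimal Python sketch, not HBase code: the server names `rs1` to `rs4`, the `LAYOUT` table and the functions `route_read` and `route_write` are hypothetical names introduced for illustration. The rule that writes to a range fail while its primary is down is a simplifying assumption; the slide only states that data from failed nodes is recovered on other nodes.

```python
# Hypothetical model of the slide's layout: each key range has one
# read-write primary and two read-only standby copies.
LAYOUT = {
    (1, 100):   {"primary": "rs1", "standby": ["rs3", "rs4"]},
    (101, 200): {"primary": "rs2", "standby": ["rs1", "rs4"]},
    (201, 300): {"primary": "rs3", "standby": ["rs1", "rs2"]},
    (301, 400): {"primary": "rs4", "standby": ["rs2", "rs3"]},
}


def find_range(key):
    """Return the (low, high) key range that contains key."""
    for low, high in LAYOUT:
        if low <= key <= high:
            return (low, high)
    raise KeyError(f"key {key} is outside every range")


def route_read(key, down=frozenset()):
    """Pick a server for a read: the primary if it is up, else a live standby."""
    replicas = LAYOUT[find_range(key)]
    if replicas["primary"] not in down:
        return replicas["primary"], "primary"
    for server in replicas["standby"]:
        if server not in down:
            return server, "standby"
    raise RuntimeError(f"no live replica for key {key}")


def route_write(key, down=frozenset()):
    """Writes go to the primary only (assumption: no write failover in this model)."""
    primary = LAYOUT[find_range(key)]["primary"]
    if primary in down:
        raise RuntimeError(f"primary {primary} is down; writes must wait for recovery")
    return primary


if __name__ == "__main__":
    print(route_read(50))                # served by the primary
    print(route_read(50, down={"rs1"}))  # falls back to a read-only standby
    print(route_read(150, down={"rs1"})) # other key ranges are unaffected
```

In this model a single failed server leaves every key readable, which is the behaviour points 1 and 2 describe; a standby read may return slightly older data than the primary would.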


Comparing HBase HA Phase 1 Versus 2

Item                                 HA Phase 1 / HDP 2.1   HA Phase 2 / HDP 2.2
Data staleness                       > 30s                  Near zero
HA in scans                          Unsupported            Supported
Region split/merge                   Disabled               Supported
META table highly available          Unsupported            Supported
HBCK check for common HA problems    Unsupported            Supported


New in HDP 2.2: Rolling Upgrades


Rolling Upgrade Goals

•  Zero downtime upgrades

•  Roll forward and roll backward

•  Update clients and servers independently


HBase Rolling Upgrade: Component Overview

•  New package format: Install multiple versions of Hadoop software on a single node or cluster.

•  hdp-select utility: Choose the component version you want, roll forward or backward.

•  Decoupled clients and servers: Upgrade servers independently of clients.


HBase Rolling Upgrade: Directory Layout

Directory layout: /usr/hdp holds multiple versions of the HDP stack.

    [root@cluster1 hdp]# pwd
    /usr/hdp
    [root@cluster1 hdp]# ls -l
    drwxr-xr-x. 19 root root 4096 Nov 15 07:26 2.2.0.0-1995
    drwxr-xr-x.  2 root root 4096 Dec  7 01:22 2.2.0.1-2217
    drwxr-xr-x.  2 root root 4096 Dec  6 22:57 current

Within /usr/hdp/current:

    [root@cluster1 current]# pwd
    /usr/hdp/current
    [root@cluster1 current]# ls -l | grep hbase
    lrwxrwxrwx. 1 root root 27 Dec  6 22:57 hbase-client -> /usr/hdp/2.2.0.0-1995/hbase
    lrwxrwxrwx. 1 root root 27 Dec  6 22:57 hbase-master -> /usr/hdp/2.2.0.0-1995/hbase
    lrwxrwxrwx. 1 root root 27 Dec  6 22:57 hbase-regionserver -> /usr/hdp/2.2.0.0-1995/hbase


HBase Rolling Upgrade: Upgrade One Component

Check the current versions with hdp-select:

    [root@cluster1 hdp]# hdp-select status | grep hbase
    hbase-client - 2.2.0.0-1995
    hbase-master - 2.2.0.0-1995
    hbase-regionserver - 2.2.0.0-1995

Upgrade servers before clients:

    [root@cluster1 hdp]# hdp-select set hbase-master 2.2.0.1-2217

Only the hbase-master link now points at the new version:

    [root@cluster1 current]# pwd
    /usr/hdp/current
    [root@cluster1 current]# ls -l | grep hbase
    lrwxrwxrwx. 1 root root 27 Dec  6 22:57 hbase-client -> /usr/hdp/2.2.0.0-1995/hbase
    lrwxrwxrwx. 1 root root 27 Dec  7 02:23 hbase-master -> /usr/hdp/2.2.0.1-2217/hbase
    lrwxrwxrwx. 1 root root 27 Dec  6 22:57 hbase-regionserver -> /usr/hdp/2.2.0.0-1995/hbase
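The effect of switching one component can be reproduced in a scratch directory. This is a minimal Python sketch that assumes only what the listings show: versioned directories plus one symlink per component under `current`. The helper names `make_stack`, `select` and `status` are hypothetical, the code targets POSIX systems, and it does not claim to reflect how hdp-select is implemented.

```python
import os
import tempfile


def make_stack(root, versions, components, active):
    """Create <root>/<version>/hbase for each version and link every component to `active`."""
    for version in versions:
        os.makedirs(os.path.join(root, version, "hbase"))
    os.makedirs(os.path.join(root, "current"))
    for component in components:
        os.symlink(os.path.join(root, active, "hbase"),
                   os.path.join(root, "current", component))


def select(root, component, version):
    """Repoint one component's symlink at another installed version."""
    target = os.path.join(root, version, "hbase")
    if not os.path.isdir(target):
        raise ValueError(f"version {version} is not installed")
    link = os.path.join(root, "current", component)
    temp_link = link + ".tmp"
    os.symlink(target, temp_link)
    os.replace(temp_link, link)  # rename over the old link in one step


def status(root):
    """Map each component under current/ to the version its symlink points at."""
    current = os.path.join(root, "current")
    result = {}
    for name in sorted(os.listdir(current)):
        target = os.readlink(os.path.join(current, name))
        result[name] = os.path.basename(os.path.dirname(target))
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:  # stands in for /usr/hdp
        make_stack(root,
                   versions=["2.2.0.0-1995", "2.2.0.1-2217"],
                   components=["hbase-client", "hbase-master", "hbase-regionserver"],
                   active="2.2.0.0-1995")
        select(root, "hbase-master", "2.2.0.1-2217")
        for component, version in status(root).items():
            print(component, "-", version)
```

Because each component has its own link, rolling backward in this model is the same call with the older version string, and the client link stays untouched while servers move.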


Rolling Upgrade Contracts

•  Rolling Upgrade works for minor upgrades. Example: HDP 2.2.0 to HDP 2.2.1.

•  Wire compatibility guaranteed between clients and servers.

•  Binary compatibility guaranteed, e.g. for coprocessors.

•  Data format compatibility guaranteed.


Rolling Upgrade Benefits

•  Upgrade with zero downtime.

•  Roll forward and roll backward.

•  Instant switchover / restart preserves data locality when upgrading HBase.

•  Update servers and clients independently.


New in HDP 2.2: HBase on YARN via Slider


Deploying HBase with Slider

What is it?

•  Deploy HBase into the Hadoop cluster using YARN.

Benefits:

•  Simplified Deployment: No need to deploy HBase or its configuration to individual cluster nodes.

•  Lifecycle Management: Start / stop / process management handled automatically.

•  Multitenancy: Different users can run HBase clusters within one Hadoop cluster.

•  Multiple Versions: Run different versions of HBase (e.g. 0.98 and 1.0) on the same cluster.

•  Elasticity: Cluster size is a parameter and easily changed.

•  Co-located Analytics: HBase resource usage is known to YARN, so nodes running HBase will not be used as heavily to satisfy MapReduce or Tez jobs.


HBase / Slider Sample

Configure HBase settings in appConfig.json and resources.json.

Sample Slider command:

    slider create mycluster \
        --template appConfig.json \
        --resources resources.json

Sample configuration (excerpt):

    {
      "schema": "http://example.org/specification/v2.0.0",
      "metadata": {
      },
      "global": {
        "site.hbase-site.hbase.hstore.flush.retries.number": "120",
        "site.hbase-site.hbase.client.keyvalue.maxsize": "10485760",
        "site.hbase-site.hbase.hstore.compactionThreshold": "3",
        "site.hbase-site.hbase.rootdir": "${DEFAULT_DATA_DIR}/data",
        "site.hbase-site.hbase.stagingdir": "${DEFAULT_DATA_DIR}/staging",
        "site.hbase-site.hbase.regionserver.handler.count": "60",
        ...
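The excerpt suggests a naming pattern: each HBase property appears under `global` with a `site.hbase-site.` prefix and a string value. The following Python sketch generates that portion from a plain dictionary of HBase settings. It covers only the keys visible on the slide and is not a complete Slider configuration; the function names `to_slider_global` and `build_app_config` are hypothetical, and the truncated remainder of the file is deliberately left out.

```python
import json

# Prefix seen on every HBase property in the slide's excerpt.
SITE_PREFIX = "site.hbase-site."


def to_slider_global(hbase_site):
    """Prefix each hbase-site property and store its value as a string."""
    return {SITE_PREFIX + name: str(value) for name, value in hbase_site.items()}


def build_app_config(hbase_site):
    """Assemble only the top-level keys shown on the slide (assumption: the rest is omitted)."""
    return {
        "schema": "http://example.org/specification/v2.0.0",
        "metadata": {},
        "global": to_slider_global(hbase_site),
    }


if __name__ == "__main__":
    settings = {
        "hbase.hstore.flush.retries.number": 120,
        "hbase.client.keyvalue.maxsize": 10485760,
        "hbase.regionserver.handler.count": 60,
    }
    print(json.dumps(build_app_config(settings), indent=2))
```

Keeping the HBase settings in their usual unprefixed form and adding the prefix at generation time makes it easier to reuse one set of values across several Slider-managed HBase clusters.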


Q & A


Thank you! Learn more at: hortonworks.com/hadoop/hbase/

Register for the last Discover HDP 2.2 webinar: hortonworks.com/webinars