Page 1 © Hortonworks Inc. 2014
SQL on HBase with Phoenix
Page 2
Agenda
What Is Apache HBase
• High Level Overview.
• Technical Detail.
What Is Apache Phoenix
• Overview.
• What's New.
• Secondary Index Demo.
Page 3
New Data Requires a New Data Architecture
Source: IDC. 2.8 ZB of data in 2012, growing to 40 ZB by 2020; 85% of it from new data types; 15x growth in machine data by 2020.
Traditional sources: OLTP, ERP, CRM systems. New data types: unstructured documents, emails, clickstream, server logs, sentiment and web data, sensor and machine data, geolocation.
A modern database needs to be more scalable, handle new data types, and be intelligent and predictive.
Page 4
What Is Apache HBase?
• 100% open source.
• Store and process petabytes of data.
• Flexible schema; scale out on commodity servers.
• High performance, high availability.
• Integrated with YARN.
• SQL and NoSQL interfaces.
[Figure: HBase RegionServers 1..N running on YARN, the data operating system, with HDFS as permanent data storage.]
Dynamic schema. Scales horizontally to petabytes of data. Directly integrated with Hadoop.
Page 5
Kinds of Apps Built with HBase
Interested? See HBase Case Studies later in this document.
• Write-heavy, low-latency workloads.
• Search / indexing.
• Messaging.
• Audit / log archive.
• Advertising data cubes.
• Time series, sensor / device data.
Page 6
HBase is Deeply Integrated with Hadoop
• Data is stored in HDFS. You can store more data and re-use existing HDFS expertise.
• HBase is integrated with YARN.
• Analytics in-place using Hive, Pig, Spark and more.
Page 7
Who’s Using HBase?
Page 8
HBase Technical Details
Spring 2014 Version 1.0
Page 9
HBase Technical Details
Based on Google BigTable:
• Dynamic schema.
• Good for very sparse datasets.
• All data is range-partitioned for trivial horizontal scaling across commodity hardware.
Directly integrated with HDFS and Hadoop:
• Analyze data in HBase with any Hadoop ecosystem tool (Hive, Pig, MapReduce, Tez, etc.).
• Re-use existing Hadoop skills to run HBase.
Page 10
Page 11
Logical Architecture: Distributed, persistent partitions of a BigTable
[Figure: Table A is split by rowkey range into Region 1 (a-d), Region 2 (e-h), Region 3 (i-l), and Region 4 (m-p). Region Server 7 hosts Table A Regions 1 and 2, Table G Region 1070, and Table L Region 25; Region Server 86 hosts Table A Region 3, Table C Region 30, and Table F Regions 160 and 776; Region Server 367 hosts Table A Region 4, Table C Region 17, Table E Region 52, and Table P Region 1116.]
Legend:
• A single table is partitioned into Regions of roughly equal size.
• Regions are assigned to Region Servers across the cluster.
• Region Servers host roughly the same number of Regions.
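The range partitioning described in the legend can be sketched with a sorted map: the region owning a rowkey is the one with the greatest start key less than or equal to it. This is a simplified model under illustrative names; a real HBase client locates regions through the hbase:meta catalog table.

```java
import java.util.TreeMap;

public class RegionLookup {
    // Maps each region's start rowkey to a region name.
    // Simplified sketch of HBase range partitioning.
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    public void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    // The region owning a rowkey is the one with the greatest
    // start key <= rowkey (assumes the key is covered by some region).
    public String regionFor(String rowkey) {
        return regionsByStartKey.floorEntry(rowkey).getValue();
    }

    public static void main(String[] args) {
        RegionLookup tableA = new RegionLookup();
        tableA.addRegion("a", "Region 1"); // rowkeys a..d
        tableA.addRegion("e", "Region 2"); // rowkeys e..h
        tableA.addRegion("i", "Region 3"); // rowkeys i..l
        tableA.addRegion("m", "Region 4"); // rowkeys m..p
        System.out.println(tableA.regionFor("cat"));   // Region 1
        System.out.println(tableA.regionFor("mouse")); // Region 4
    }
}
```

Because the split keys live in one sorted structure, adding a region (a split) is just another map entry; no data outside the affected range is touched.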
Page 12
Logical Data Model: A sparse, multi-dimensional, sorted map
Legend:
• Rows are sorted by rowkey.
• Within a row, values are located by column family and qualifier.
• Values also carry a timestamp; there can be multiple versions of a value.
• Within a column family, data is schemaless. Qualifiers and values are treated as arbitrary bytes.
[Figure: Table A shown as nested maps, with each cell addressed by rowkey, column family, column qualifier, and timestamp; e.g. row "a" holds timestamped versions under cf1 qualifiers "foo" and "bar", while row "b" holds a 3.6 kb PNG thumbnail under cf2 qualifier "thumb".]
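The sparse, sorted, multi-dimensional map above can be mimicked with nested sorted maps. This is a toy sketch: the class and method names are invented, values are strings for readability, and real HBase stores all coordinates and values as raw bytes in HFiles.

```java
import java.util.Comparator;
import java.util.TreeMap;

public class SortedCellMap {
    // rowkey -> column family -> qualifier -> timestamp -> value.
    private final TreeMap<String, TreeMap<String, TreeMap<String, TreeMap<Long, String>>>> rows =
        new TreeMap<>();

    public void put(String row, String family, String qualifier, long ts, String value) {
        rows.computeIfAbsent(row, k -> new TreeMap<>())
            .computeIfAbsent(family, k -> new TreeMap<>())
            // newest timestamp first, so firstEntry() is the latest version
            .computeIfAbsent(qualifier, k -> new TreeMap<>(Comparator.reverseOrder()))
            .put(ts, value);
    }

    // Latest version of a cell, or null if the cell is absent
    // (sparse: missing cells take no space at all).
    public String getLatest(String row, String family, String qualifier) {
        var families = rows.get(row);
        if (families == null) return null;
        var qualifiers = families.get(family);
        if (qualifiers == null) return null;
        var versions = qualifiers.get(qualifier);
        if (versions == null) return null;
        return versions.firstEntry().getValue();
    }

    public static void main(String[] args) {
        SortedCellMap tableA = new SortedCellMap();
        tableA.put("a", "cf1", "foo", 1368393847L, "world");
        tableA.put("a", "cf1", "foo", 1368394925L, "13.6");
        System.out.println(tableA.getLatest("a", "cf1", "foo")); // 13.6
    }
}
```

Note how every lookup dimension is a sorted map: that is what makes rowkey range scans and versioned reads cheap.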
Page 13
HBase HA Overview (Introduced in HDP 2.1)
[Figure: Clients consult ZooKeeper and the HMaster, then read and write to HBase RegionServers. Each RegionServer holds an in-memory cache and serves some regions as Primary (e.g. regions 0-99) while holding Standby replicas of others (e.g. regions 100-199, 200-299). All RegionServers persist HFiles to HDFS.]
• HBase HA: real-time replication.
• Low-latency reads and writes.
• Data stored to HDFS; read or write directly from Hadoop tools (Hive, Pig, MapReduce).
• Cluster topology and data placement.
Page 14
Apache Phoenix
Spring 2014 Version 1.0
The SQL Skin for HBase
Page 15 © Hortonworks Inc. 2014
Apache Phoenix: A SQL Skin for HBase
• Provides a SQL interface for managing data in HBase.
• Supports a large subset of the SQL:1999 mandatory feature set.
• Create tables, insert and update data, and perform low-latency point lookups through JDBC.
• The Phoenix JDBC driver is easily embedded in any app that supports JDBC.
Phoenix Makes HBase Better:
• Oriented toward online / semi-transactional apps.
• If HBase is a good fit for your app, Phoenix makes it even better.
• Phoenix gets you out of the "one table per query" model many other NoSQL stores force you into.
Page 16
Apache Phoenix: Current Capabilities
Feature | Supported?
Common SQL Datatypes | Yes
Inserts and Updates | Yes
SELECT, DISTINCT, GROUP BY, HAVING | Yes
NOT NULL and Primary Key constraints | Yes
Inner and Outer JOINs | Yes
Views | Yes
Subqueries | HDP 2.2
Robust Secondary Indexes | HDP 2.2
Page 17
Apache Phoenix: Future Capabilities
Feature | Supported?
Multi-Table Transactions | Future
Scalable Joins (Fact-to-Fact) | Future
Analytics, Windowing Functions | Future
Page 18
Phoenix Provides Familiar SQL Constructs
Compare: Phoenix versus the native API
Code:

// HBase Native API.
HBaseAdmin hbase = new HBaseAdmin(conf);
HTableDescriptor desc = new HTableDescriptor("us_population");
HColumnDescriptor state = new HColumnDescriptor("state".getBytes());
HColumnDescriptor city = new HColumnDescriptor("city".getBytes());
HColumnDescriptor population = new HColumnDescriptor("population".getBytes());
desc.addFamily(state);
desc.addFamily(city);
desc.addFamily(population);
hbase.createTable(desc);

-- Phoenix DDL.
CREATE TABLE us_population (
    state CHAR(2) NOT NULL,
    city VARCHAR NOT NULL,
    population BIGINT
    CONSTRAINT my_pk PRIMARY KEY (state, city));

Notes:
• Familiar SQL syntax.
• Provides additional constraint checking.
Page 19
Phoenix: Architecture
[Figure: A user's Java application embeds the Phoenix JDBC Driver, which talks to the HBase cluster, where a Phoenix coprocessor runs alongside each RegionServer.]
Page 20
Phoenix Performance
Performance characterization:
• Suitable for tens of thousands of point lookups per second.
• Suitable for thousands of aggregations / filtered searches per second.
• Supports extremely high concurrency.
Performance optimizations:
• Column skipping.
• Table salting.
• Skip scans.
Typical latencies:
• Indexed point lookups in milliseconds.
• Aggregation and Top-N queries in a few seconds over large datasets.
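Table salting, one of the optimizations listed above, prepends a deterministic bucket prefix to each rowkey so that monotonically increasing keys (timestamps, sequence numbers) spread across several regions instead of hot-spotting one RegionServer. A minimal sketch of the idea; the helper name and string prefix are illustrative, not Phoenix's API (Phoenix uses a single leading salt byte):

```java
public class Salter {
    // Number of salt buckets; comparable to Phoenix's SALT_BUCKETS option.
    static final int SALT_BUCKETS = 4;

    // Derive the bucket deterministically from the rowkey itself, so the
    // same key always lands in the same bucket and point lookups still work.
    static String salt(String rowkey) {
        int bucket = Math.abs(rowkey.hashCode() % SALT_BUCKETS);
        return bucket + "-" + rowkey;
    }

    public static void main(String[] args) {
        // Sequential keys scatter across bucket prefixes 0..3.
        System.out.println(salt("2014-01-01T00:00:01"));
        System.out.println(salt("2014-01-01T00:00:02"));
        System.out.println(salt("2014-01-01T00:00:03"));
    }
}
```

The trade-off: a full range scan must now fan out to every bucket, which is why salting helps write-heavy sequential workloads more than scan-heavy ones.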
Page 21
Phoenix Use Cases
Phoenix is for:
• Rapidly and easily building an application backed by HBase.
• Making use of your existing SQL skills and investment.
• High-performance aggregations of moderately sized datasets inside HBase.
Phoenix is not for:
• Sophisticated SQL queries involving large joins or advanced SQL features.
• Queries requiring large scans that do not use indexes.
• ETL.
Page 22
Phoenix: Futures
Short-term focus:
• Transactions.
• Scalable joins.
• Analytical capabilities.
Long-term focus: make Phoenix the primary interface for HBase.
• Build HBase applications using Phoenix.
• Configure cluster security and replication using Phoenix.
• Integration with BI tools like MicroStrategy.
Page 23
What’s New in Apache Phoenix
Page 24
What's New in Apache Phoenix
Phoenix in HDP 2.2:
• Based on Apache Phoenix 4.2.
• 8 new features; 143 total improvements and fixes.
Notable new features:
• Robust secondary indexes.
• Sub-joins.
• Basic window functions.
• Bulk loader improvements.
Page 25
Robust Secondary Indexes
Background / refresher:
• Phoenix supports local and global secondary indexes.
• Updating a global index may require coordination with another RegionServer.
• See the Phoenix docs for guidance on which to use when.
Before Phoenix 4.1 (HDP 2.1):
• With global indexes, if the RegionServer serving the index key was down, RegionServers would abort.
• Note: this does not affect local indexes.
Phoenix 4.1+: if the global index cannot be updated:
• The index is temporarily disabled.
• A background job is launched to rebuild the index.
• Reads go directly to the base table rather than accessing the index.
• Writes continue to update the index.
• Controlled by: phoenix.index.failure.handling.rebuild
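The failure-handling flow above amounts to a small state machine. The sketch below uses invented names purely to illustrate the lifecycle; it is not Phoenix's internal API:

```java
public class IndexState {
    // Illustrative lifecycle of a Phoenix global index after a write failure.
    enum State { ACTIVE, DISABLED, REBUILDING }

    private State state = State.ACTIVE;

    // A global index write failed: temporarily disable the index.
    void onWriteFailure() { state = State.DISABLED; }

    // Background rebuild job launched (phoenix.index.failure.handling.rebuild).
    void startRebuild()  { state = State.REBUILDING; }

    // Rebuild caught up: the index serves reads again.
    void onRebuildComplete() { state = State.ACTIVE; }

    // While the index is not ACTIVE, reads bypass it and scan the base table;
    // writes continue to update the index throughout.
    boolean readsUseIndex() { return state == State.ACTIVE; }

    public static void main(String[] args) {
        IndexState ix = new IndexState();
        ix.onWriteFailure();
        System.out.println(ix.readsUseIndex()); // false
        ix.startRebuild();
        ix.onRebuildComplete();
        System.out.println(ix.readsUseIndex()); // true
    }
}
```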
Page 26
Improved SQL: Sub-Joins
Example:
select * from A
  left join (B join C on B.bc_id = C.bc_id)
  on A.ab_id = B.ab_id and A.ac_id = C.ac_id;
Caveats related to joins still apply:
• Still broadcast joins only.
Page 27
Phoenix: Basic Window Functions
FIRST_VALUE, LAST_VALUE, NTH_VALUE
• No OVER or PARTITION BY.
• The function is applied to each group produced by GROUP BY.
Example:
SELECT FIRST_VALUE("column1")
  WITHIN GROUP (ORDER BY column2 ASC)
FROM table
GROUP BY column3;
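The grouped FIRST_VALUE semantics can be mimicked in plain Java: within each GROUP BY group, order the rows and take the leading value. This is only a sketch of the behavior (Phoenix evaluates this server-side); the class and record names are invented:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FirstValueDemo {
    record Row(String column1, int column2, String column3) {}

    // Equivalent of:
    //   SELECT FIRST_VALUE("column1") WITHIN GROUP (ORDER BY column2 ASC)
    //   FROM t GROUP BY column3;
    static Map<String, String> firstValueByGroup(List<Row> rows) {
        return rows.stream().collect(Collectors.groupingBy(
            Row::column3,                                  // GROUP BY column3
            Collectors.collectingAndThen(
                Collectors.minBy(Comparator.comparingInt(Row::column2)), // ORDER BY column2 ASC, keep first
                r -> r.get().column1())));                 // FIRST_VALUE("column1")
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("a", 2, "g1"),
            new Row("b", 1, "g1"),
            new Row("c", 5, "g2"));
        System.out.println(firstValueByGroup(rows).get("g1")); // b
    }
}
```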
Page 28
ENCODE, DECODE
DECODE:
• Supports hexadecimal format.
• DECODE('000000008512af277ffffff8', 'hex')
ENCODE:
• Supports hexadecimal and Base62.
• ENCODE(1, 'base62')
What is Base62?
• Encodes data using only letters and digits.
• Commonly used for things like URL shorteners.
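A minimal Base62 encoder makes the idea concrete. This sketch assumes the conventional 0-9, A-Z, a-z alphabet; Phoenix's exact alphabet and ENCODE output format may differ:

```java
public class Base62 {
    // Assumed alphabet: digits, then uppercase, then lowercase (62 symbols).
    static final String ALPHABET =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Repeatedly take the remainder mod 62, exactly like converting to any base;
    // the result uses only letters and digits, which is why URL shorteners use it.
    static String encode(long n) {
        if (n == 0) return "0";
        StringBuilder sb = new StringBuilder();
        while (n > 0) {
            sb.insert(0, ALPHABET.charAt((int) (n % 62)));
            n /= 62;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode(1));   // 1
        System.out.println(encode(62));  // 10
        System.out.println(encode(125)); // 125 = 2*62 + 1 -> 21
    }
}
```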
Page 29
Demo: Phoenix Secondary Indexes
Page 30
Secondary Index Recap
Index management via JDBC:
• CREATE INDEX my_index ON my_table (v1);
• DROP INDEX my_index ON my_table;
• ALTER INDEX my_index ON my_table DISABLE / REBUILD;
Index population during bulk import (recommended):
• Use the CsvBulkLoadTool utility (not psql.py).
• Add the --index-table argument to specify your target index.

HADOOP_CLASSPATH=/path/to/hbase-protocol.jar:/path/to/hbase/conf \
hadoop jar phoenix-4.0.0.jar \
  org.apache.phoenix.mapreduce.CsvBulkLoadTool \
  --table EXAMPLE --input /data/example.csv