Apache HBase: State of the Union

Enis Söztutar

About Me

Enis Söztutar

• Committer and PMC member in Apache HBase, Phoenix, and Hadoop

• HBase/Phoenix dev @Hortonworks

Outline

Versions, compatibility

Releases, what is in HBase-{1.1, 1.2, 1.3}

New Developments

HBase-2.0

Versions, Compatibility

Semantic Versioning

Starting with the 1.0 release, HBase works toward Semantic Versioning

MAJOR.MINOR.PATCH[-identifiers]

PATCH: only backward-compatible (BC) bug fixes

MINOR: BC new features

MAJOR: Incompatible changes
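The MAJOR.MINOR.PATCH scheme above can be turned into a quick check of what kind of upgrade a version change represents. The following is a minimal sketch, not part of HBase: the class name `UpgradeKind` and its methods are hypothetical, and it assumes plain numeric components with an optional `-identifiers` suffix.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: classifies an upgrade by the first version component that changed. */
final class UpgradeKind {
    // MAJOR.MINOR.PATCH with an optional "-identifiers" suffix, e.g. 2.0.0-SNAPSHOT.
    private static final Pattern VERSION =
        Pattern.compile("(\\d+)\\.(\\d+)\\.(\\d+)(?:-[0-9A-Za-z.-]+)?");

    /** Parses a version string into {major, minor, patch}; the suffix is ignored. */
    static int[] parse(String version) {
        Matcher m = VERSION.matcher(version);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not MAJOR.MINOR.PATCH: " + version);
        }
        return new int[] {
            Integer.parseInt(m.group(1)),
            Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3))
        };
    }

    /** Returns "MAJOR", "MINOR", "PATCH", or "NONE" if all three components match. */
    static String classify(String from, String to) {
        int[] a = parse(from);
        int[] b = parse(to);
        if (a[0] != b[0]) return "MAJOR"; // incompatible changes allowed
        if (a[1] != b[1]) return "MINOR"; // backward-compatible new features
        if (a[2] != b[2]) return "PATCH"; // backward-compatible bug fixes only
        return "NONE";
    }

    public static void main(String[] args) {
        System.out.println(classify("1.1.5", "1.1.6"));
        System.out.println(classify("1.1.5", "1.2.0"));
        System.out.println(classify("1.3.0", "2.0.0-SNAPSHOT"));
    }
}
```

This only makes sense for 1.0 and later; versions such as 0.98.x predate the move to semantic versioning, so the labels do not carry the same guarantees there.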

SemVer in Action

1.0 Released last year. Started following semantic versioning

10 releases with 1.x.y versions. More coming!

Release notes contain “compatibility” report for source / binary

Patch upgrades do not have new features. Drop in replacement.

Minor versions are “compatible”

To be, or not to be (Compatible)

Compatibility is NOT a simple yes or no

Many dimensions

• source, binary, wire, command line, dependencies etc

What is client interface?

• InterfaceAudience.{Public,Private,LimitedPrivate}

Read https://hbase.apache.org/book.html#upgrading

Dimension                               Major   Minor   Patch
Client-Server Wire Compatibility        ✗       ✓       ✓
Server-Server Compatibility             ✗       ✓       ✓
File Format Compatibility               ✗*      ✓       ✓
Client API Compatibility                ✗       ✓       ✓
Client Binary Compatibility             ✗       ✗       ✓
Server Side Limited API Compatibility   ✗       ✗*/✓*   ✓
Dependency Compatibility                ✗       ✓       ✓
Operation Compatibility                 ✗       ✗       ✓

Releases

2015 H2 – 2016 H1 (repo and releases)

(master) 2.0.0-SNAPSHOT

(branch-1) 1.4.0-SNAPSHOT

(branch-1.3) 1.3.0 RC

(branch-1.2) 1.2.0, 1.2.1, 1.2.2 RC

(branch-1.1) 1.1.0, 1.1.5

(branch-1.0) 1.0.0, 1.0.3

(0.98) 0.98.19, 0.98.20

RTFM – HBase-1.1 Release Notes

• Async RPC client

• Simple RPC throttling

• Improved compaction controls

• Scan improvements

• Procedure V2 for improved reliability of cluster operations (HBASE-12439)

• New extension interfaces for coprocessor users

• Per-column family flush

• WAL on SSD

• BlockCache in Memcached

• Region replica enhancements around META, WAL, and bulk loading

RTFM – HBase-1.2 Release Notes

• JDK8 is now supported

• Hadoop 2.6.1+ and Hadoop 2.7.1+ are now supported

• Per column-family time ranges for scan

• Daemons respond to SIGHUP to reload configs

• Region location methods added to thrift2 proxy

• Table-level sync that sends deltas

• Client side metrics via JMX

RTFM – HBase-1.3 Release Notes

• Date-based tiered compactions

• Maven archetypes for HBase client applications

• Throughput controller for flushes

• Controlled delay (CoDel) based RPC scheduler (HBASE-15136)

• Bulk loaded HFile replication

• More improvements to Procedure V2

• Improvements to Multi WAL

• Many improvements and optimizations in metrics subsystem

• Reduced memory allocation in RPC layer

• Region location lookup optimizations in HBase client

Releases – How to choose

0.98 is still released frequently, likely will continue till end of 2016

1.0 is EOL’ed. Move to 1.1 at least

Both 1.1 and 1.2 are pretty stable

Starting from scratch, use 1.2 or 1.3

1.3 is coming shortly

Moving between minor versions is easy for 1.x

New Developments

New Compaction Policies for Time series

FIFO: First In, First Out

• No Compaction!

• Only data with very short TTL
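The FIFO idea can be sketched as "never rewrite files, just drop a file once everything in it has expired". The code below is a simplified illustration under that assumption, not the HBase implementation: `FifoSketch`, `StoreFile`, and `expired` are hypothetical names, and a file is modelled only by the timestamp of its newest cell.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: selects whole files whose contents have all passed the TTL. */
final class FifoSketch {
    /** Hypothetical stand-in for a store file; only the newest cell timestamp matters here. */
    static final class StoreFile {
        final String name;
        final long maxTimestampMs;

        StoreFile(String name, long maxTimestampMs) {
            this.name = name;
            this.maxTimestampMs = maxTimestampMs;
        }
    }

    /**
     * Returns the files that can be deleted outright: if even the newest cell
     * is older than the TTL, every cell in the file has expired.
     */
    static List<StoreFile> expired(List<StoreFile> files, long ttlMs, long nowMs) {
        List<StoreFile> result = new ArrayList<>();
        for (StoreFile f : files) {
            if (f.maxTimestampMs + ttlMs < nowMs) {
                result.add(f);
            }
        }
        return result;
    }
}
```

Because nothing is ever merged, the number of files grows until the TTL catches up, which is why this only suits data with a very short TTL.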

Date Tiered Compaction

• Dramatic reduction in IO!

• Partition hfiles and compaction by time windows

• Scans with time ranges can skip whole files
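A rough sketch of why time-window partitioning lets a time-range scan skip files: each file covers a known timestamp range, so any file that does not overlap the scan range is never opened. This is a simplification with hypothetical names (`DateTieredSketch`, `StoreFile`, `windowStart`, `filesToRead`), and it assumes fixed-size windows, which is not necessarily how the actual policy sizes its tiers.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: fixed-size time windows and file skipping for time-range scans. */
final class DateTieredSketch {
    /** Hypothetical stand-in for a store file, described by its inclusive timestamp range. */
    static final class StoreFile {
        final String name;
        final long minTs;
        final long maxTs;

        StoreFile(String name, long minTs, long maxTs) {
            this.name = name;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    /** Start of the window that a timestamp falls into. */
    static long windowStart(long timestampMs, long windowMs) {
        return Math.floorDiv(timestampMs, windowMs) * windowMs;
    }

    /** Keeps only files overlapping the scan range [scanMin, scanMax). */
    static List<StoreFile> filesToRead(List<StoreFile> files, long scanMin, long scanMax) {
        List<StoreFile> result = new ArrayList<>();
        for (StoreFile f : files) {
            if (f.maxTs >= scanMin && f.minTs < scanMax) {
                result.add(f);
            }
        }
        return result;
    }
}
```

With files covering [0, 999], [1000, 1999], and [2000, 2999], a scan over [1500, 2000) touches only the middle file.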

Date Tiered Compaction

From https://labs.spotify.com/2014/12/18/date-tiered-compaction/

Spark Integration

• RDD

• DataFrame / DataSet / SparkSQL

• Partition pruning

• Column pruning

• Data locality

• Predicate pushdown

Perf

Async

• Async RPC client already in

• Async Client

• Async WAL Writer

Row locks, Read / Write

Write path re-ordered

New Development – In Progress

RPC Scheduling improvements

Replication 2.0

Reduce Garbage

C++ Client

Backup / Restore

Offheaping

Read path (done)

Write path in development

In-memory flushes/compactions

Compact in-memory representations

Fatter flushes

Assignment Manager/Master

HBase-2.0

Target is 2016 EOY

Learnt from singularity (0.94 -> 0.96+)

2.0 will be rolling upgradable!

• Disclaimer: to the extent that we can make it

JDK-8 only

Will work with Hadoop-3?

Assignment and data layout changes are the big drivers

How to prepare for HBase-2.0

2.0 contains more API clean up

Clean up PB and Guava “leaks” into the API

Some deprecated APIs (HConnection, HTable, HBaseAdmin, etc) going away

Start using JDK-8 (and G1). You will like it.

1.x client should be able to do read / write / scan against 2.0 clusters

Some DDL / Admin operations may not work

Other HBase talks

Today

(3:00pm) Omid: A Transactional Framework for HBase

(4:10pm) Hive Hbase Metastore - Improving Hive with a Big Data Metadata Storage

(5:00pm) Operating and Supporting Apache HBase - Best Practices and Improvements

Thursday

(2:10pm) Managing Hadoop, HBase, and Storm Clusters at Yahoo Scale

(3:00pm) Phoenix + HBase: An Enterprise Grade Data-Warehouse Appliance for Interactive Analytics?

(4:10pm) The DAP: Where Yarn, HBase, Kafka and Spark go to Production

(5:00pm) HBase BoF

Questions

Thanks for listening *.

*Here is a picture of a cat for your suffering!