Page 1: Apachecon Europe 2012: Operating HBase - Things you need to know

Operating HBase – Things You Need to Know

Christian Gügi

Outline

● HBase internals

● Overview of HBase utilities

● HBase split visualisation with Hannibal

● Challenges & lessons learned

● Resources to get started

About me

● Software Architect @ Sentric

● Founder and organizer of the Swiss Big Data User Group, http://www.bigdata-usergroup.ch

● Contact: [email protected], http://www.sentric.ch, @chrisgugi

HBase Internals

Data Model

● A sparse, multi-dimensional, sorted map

● A table consists of rows, each identified by a row key

● Each row may have any number of columns

● Rows are sorted lexicographically by row key

● Column = Column Family : Column Qualifier

– A cell is addressed by {row key, column, timestamp}

● Region: a contiguous range of sorted rows

● Region: the unit of distribution and availability

[Bigtable: A Distributed Storage System for Structured Data]
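To make the data model concrete, here is a minimal Python sketch of a sparse, sorted map from {row key, column, timestamp} to value. It is a toy in-memory model, not the HBase client API; the class name `ToyTable` and its methods are hypothetical. Note how byte-wise lexicographic sorting puts `row10` before `row2`.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class ToyTable:
    """Toy model of the HBase data model (not the HBase client API).

    Layout: row key -> "family:qualifier" -> timestamp -> value. Only cells that
    were written are stored, which is what makes the map sparse.
    """

    def __init__(self) -> None:
        self._rows: Dict[bytes, Dict[str, Dict[int, bytes]]] = defaultdict(
            lambda: defaultdict(dict)
        )

    def put(self, row: bytes, family: str, qualifier: str, ts: int, value: bytes) -> None:
        column = f"{family}:{qualifier}"  # Column = Column Family : Column Qualifier
        self._rows[row][column][ts] = value

    def get(self, row: bytes, column: str) -> Optional[bytes]:
        """Return the newest version of a cell, or None if it was never written."""
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def row_keys(self) -> List[bytes]:
        """Row keys in lexicographic byte order, the order in which a scan returns rows."""
        return sorted(self._rows)


table = ToyTable()
table.put(b"row2", "anchor", "cnnsi.com", 1, b"CNN")
table.put(b"row10", "contents", "html", 1, b"<html>v1")
table.put(b"row10", "contents", "html", 2, b"<html>v2")
print(table.row_keys())                      # [b'row10', b'row2']
print(table.get(b"row10", "contents:html"))  # b'<html>v2'
```

Sequential-looking keys such as `row2` and `row10` do not sort numerically; zero-padding (e.g. `row02`) is a common way to make lexicographic and numeric order agree.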

Physical Data Organization

[Diagram: a Region with one Store per column family ("content" and "anchor"). Each Store has a Memstore and HFiles on HDFS (two HFiles for "content", one for "anchor"). Next to the Region sits the HLog (WAL on HDFS).]

● Column families are stored separately on disk

– Each family is a unit of access control and can serve a different access pattern

● Writes are held (sorted) in memory, in the Memstore, until they are flushed

● Data is sorted on disk in a predictable order

– By row key, then column key, then descending timestamp
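The on-disk order can be sketched in a few lines of Python. This is a simplification under stated assumptions: the real Memstore keeps edits sorted as they arrive and each column family is flushed to its own Store, whereas here a single list of hypothetical tuples is sorted at flush time.

```python
from typing import List, Tuple

# Toy key/value: (row key, column, timestamp, value).
KeyValue = Tuple[bytes, bytes, int, bytes]


def on_disk_order(kv: KeyValue) -> Tuple[bytes, bytes, int]:
    row, column, ts, _value = kv
    # Ascending row key, then ascending column, then *descending* timestamp,
    # so the newest version of each cell is read first.
    return (row, column, -ts)


def flush(memstore: List[KeyValue]) -> List[KeyValue]:
    """Toy flush: emit the buffered edits in the order they are written to an HFile."""
    return sorted(memstore, key=on_disk_order)


edits = [
    (b"com.cnn.www", b"contents:html", 3, b"<html>v3"),
    (b"com.cnn.www", b"anchor:cnnsi.com", 9, b"CNN"),
    (b"com.cnn.www", b"contents:html", 5, b"<html>v5"),
]
for kv in flush(edits):
    print(kv)
```

The anchor cell comes first (its column sorts before `contents:html`), followed by version 5 of `contents:html` and then version 3.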

Flushes and Compaction

● Flushing/compaction happens per region

– One thread (CompactSplitThread) per region server

● Minor compaction

– Merges two or more HFiles into one

● Major compaction

– Picks up all HFiles in the region, merges them and removes deleted key/values

● Regions are split when they grow too large
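The difference between the two compaction types can be sketched as a k-way merge of already sorted files. This is a toy model, not HBase code: cells are hypothetical tuples, and there is a single delete-marker kind that masks all older versions of a column (roughly like a column delete). Real compactions also apply version limits and TTLs, which this ignores.

```python
import heapq
from typing import Iterable, List, Tuple

# (row, column, timestamp, kind, value); kind is "Put" or "Delete".
Cell = Tuple[bytes, bytes, int, str, bytes]


def order(cell: Cell):
    row, column, ts, kind, _ = cell
    # File order: row, column, newest first; a Delete sorts before a Put
    # with the same timestamp so it is seen first while scanning.
    return (row, column, -ts, 0 if kind == "Delete" else 1)


def merge(hfiles: Iterable[List[Cell]]) -> List[Cell]:
    """Minor-compaction style: k-way merge of sorted files, keeping delete markers."""
    return list(heapq.merge(*hfiles, key=order))


def major_compact(hfiles: Iterable[List[Cell]]) -> List[Cell]:
    """Major-compaction style: merge everything, drop delete markers and the cells they mask."""
    result: List[Cell] = []
    deleted_up_to = {}  # (row, column) -> timestamp of the newest delete marker seen
    for cell in merge(hfiles):
        row, column, ts, kind, _ = cell
        if kind == "Delete":
            # Newest-first order means the first marker seen is the newest one.
            deleted_up_to.setdefault((row, column), ts)
            continue
        marker = deleted_up_to.get((row, column))
        if marker is not None and ts <= marker:
            continue  # masked by a delete marker
        result.append(cell)
    return result


f1 = [(b"r1", b"cf:a", 5, "Put", b"new")]
f2 = [(b"r1", b"cf:a", 3, "Delete", b""), (b"r1", b"cf:a", 2, "Put", b"old")]
f3 = [(b"r0", b"cf:a", 1, "Put", b"keep")]
print(len(merge([f1, f2, f3])))                     # 4: the delete marker survives
print([c[4] for c in major_compact([f1, f2, f3])])  # [b'keep', b'new']
```

The minor merge keeps the delete marker because other, older files may still hold cells it must mask; only a major compaction sees all files and can safely drop both.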

System Architecture

[Diagram: HBase system architecture with the API, the Master, RegionServers (each with a Write-Ahead Log, Memstores and HFiles), ZooKeeper and HDFS.]

[HBase: The Definitive Guide]
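Because regions are contiguous, sorted key ranges, finding the RegionServer that serves a row is a search over region start keys. In HBase the client reads this mapping from the catalog tables and caches it; the sketch below replaces that with a hard-coded, hypothetical region list and host names.

```python
import bisect
from typing import List, Tuple

# Hypothetical region layout: (start key, region server), sorted by start key.
# The first region starts at the empty key, so every row key falls into some region.
REGIONS: List[Tuple[bytes, str]] = [
    (b"", "rs1.example.com"),
    (b"g", "rs2.example.com"),
    (b"p", "rs3.example.com"),
]


def locate(row: bytes, regions: List[Tuple[bytes, str]] = REGIONS) -> str:
    """Return the server of the region whose [start, next start) range contains row."""
    starts = [start for start, _ in regions]
    # bisect_right - 1 is the last region whose start key is <= row.
    index = bisect.bisect_right(starts, row) - 1
    return regions[index][1]


print(locate(b"apple"))      # rs1.example.com
print(locate(b"hbase"))      # rs2.example.com
print(locate(b"zookeeper"))  # rs3.example.com
```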

Key Design & Distribution

● Bad idea: sequential row keys, such as a continuously increasing number or a timestamp

– Causes RegionServer hot-spotting: all writes go to the one region holding the end of the key range

● Better: use a hash function and/or a composite key

– Distributes keys across regions

– Uniform reads/writes across the key space

● Proper key design is essential

– E.g. reversed URLs as row keys (Bigtable paper)
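Both key-design ideas fit in a short Python sketch. The bucket count, the `NN-` prefix format and the function names are hypothetical choices for illustration, not an HBase convention; MD5 is used only as a stable hash.

```python
import hashlib
from urllib.parse import urlsplit

NUM_BUCKETS = 8  # hypothetical bucket count, e.g. one per pre-split region


def salted_key(natural_key: str, buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix the key with a stable hash bucket so consecutive keys land in different regions."""
    digest = hashlib.md5(natural_key.encode("utf-8")).digest()
    bucket = digest[0] % buckets
    return f"{bucket:02d}-{natural_key}".encode("utf-8")


def reversed_url_key(url: str) -> bytes:
    """Bigtable-style row key: http://www.cnn.com/index.html -> com.cnn.www/index.html."""
    parts = urlsplit(url)
    host = ".".join(reversed(parts.hostname.split(".")))
    return (host + parts.path).encode("utf-8")


# Sequential timestamps as keys would all share one prefix (and one region);
# salted, they start with one of NUM_BUCKETS different prefixes.
timestamps = [str(1349000000 + i) for i in range(5)]
print([salted_key(ts) for ts in timestamps])
print(reversed_url_key("http://www.cnn.com/index.html"))  # b'com.cnn.www/index.html'
```

The two techniques pull in opposite directions. Reversing the host keeps all pages of one domain in adjacent rows so they can be scanned together. Salting spreads adjacent keys apart, at the cost that a range query over natural keys must be issued once per bucket and the results merged.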

Overview of HBase Utilities

Useful Tools

● hbck – checks and fixes table integrity and region consistency

● HFile – examines the contents of an HFile

● HLog – examines the contents of an HLog (WAL) file

● OfflineMetaRepair – rebuilds the meta table from the file system

● HBase web interfaces

– Master

– RegionServer

Monitoring Tools

● Ganglia

● Nagios

● OpenTSDB

● …

All of these tools use the metrics HBase provides through JMX
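As a sketch of what a Nagios-style check on top of these metrics might look like, the Python below parses a JMX JSON document and applies warning/critical thresholds using the Nagios plugin exit codes. It assumes your HBase version serves a `/jmx` JSON endpoint on its web UI (check before relying on this; 60010 was the 0.9x master web UI default port). The bean name and attribute in the usage are hypothetical placeholders, since metric names differ between versions.

```python
import json
import urllib.request
from typing import Optional

# Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def fetch_jmx(base_url: str) -> str:
    """Fetch the JSON dump, e.g. from http://master-host:60010/jmx (assumed endpoint)."""
    with urllib.request.urlopen(base_url + "/jmx", timeout=10) as response:
        return response.read().decode("utf-8")


def jmx_attribute(payload: str, bean_name: str, attribute: str) -> Optional[float]:
    """Pick one numeric attribute out of a JMX JSON document of the form {"beans": [...]}."""
    for bean in json.loads(payload).get("beans", []):
        if bean.get("name") == bean_name and attribute in bean:
            return float(bean[attribute])
    return None


def check(value: Optional[float], warn: float, crit: float) -> int:
    """Nagios-style threshold check where higher values are worse."""
    if value is None:
        return UNKNOWN
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


# Offline usage with a hand-written payload instead of a live cluster.
sample = json.dumps({"beans": [{"name": "example:service=RegionServer", "compactionQueueSize": 12}]})
value = jmx_attribute(sample, "example:service=RegionServer", "compactionQueueSize")
print(check(value, warn=10, crit=50))  # 1, i.e. WARNING
```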

Manual Splitting

● Via the master web interface

– Split

● HBase shell split command

● RegionSplitter

– Create a table with pre-split regions

– Rolling split of all regions of an existing table

– ./bin/hbase org.apache.hadoop.hbase.util.RegionSplitter
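Pre-splitting only helps if the split keys match how row keys are distributed, for example row keys that start with a hex-encoded hash. RegionSplitter's built-in split algorithms follow a similar idea; the Python function below is an independent sketch (its name and the 8-character width are assumptions), not RegionSplitter's code.

```python
from typing import List


def hex_split_points(num_regions: int, width: int = 8) -> List[str]:
    """Evenly spaced split keys over the fixed-width hex key space 00..0 to ff..f.

    Returns num_regions - 1 split keys; region i covers [split[i-1], split[i]).
    """
    if num_regions < 2:
        return []
    space = 16 ** width
    step = space // num_regions
    return [format(i * step, f"0{width}x") for i in range(1, num_regions)]


print(hex_split_points(4))  # ['40000000', '80000000', 'c0000000']
```

Because the keys have a fixed width, their lexicographic order (the order HBase uses) matches their numeric order.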

Disable Automatic Splitting

● Determined by hbase.hregion.max.filesize

● Set it to a very large value, e.g. 100 GB

● OK, but:

– How do I monitor region growth?

– Where do I split when data grows irregularly?
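hbase.hregion.max.filesize is set in hbase-site.xml and is given in bytes, so "100 GB" has to be converted first. The hypothetical helper below only generates the `<property>` element (taking 1 GB as 1024³ bytes) to show the worked value.

```python
import xml.etree.ElementTree as ET

GIB = 1024 ** 3  # bytes per GB, assuming binary gigabytes


def max_filesize_property(gigabytes: int) -> str:
    """Render an hbase-site.xml <property> element for hbase.hregion.max.filesize (in bytes)."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "hbase.hregion.max.filesize"
    ET.SubElement(prop, "value").text = str(gigabytes * GIB)
    return ET.tostring(prop, encoding="unicode")


print(max_filesize_property(100))
# <property><name>hbase.hregion.max.filesize</name><value>107374182400</value></property>
```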

HBase Split Visualisation with Hannibal

Hannibal

● Open source project on GitHub

– https://github.com/sentric/hannibal

● Web based

● Implemented in Scala

● Compatible with HBase 0.90

● Support for HBase 0.92 and later to be added soon

● Check it out!

How well are regions balanced over the cluster?

How well are the regions split for the table?

How did the region evolve over time?

Future Plans

● HBase 0.92 client API changes allow querying the compaction state of regions through HBaseAdmin, to differentiate major from minor compactions

● Add a tool to find the best split key for regions with irregular data growth

● Expose metrics through JMX
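The planned split-key tool is not described further. By default HBase splits at the middle key of a region's largest store file; one simple alternative for irregular growth is to pick the row key that balances the bytes on either side. The Python sketch below assumes you already have per-row sizes sorted by row key (how to gather them, e.g. from a scan or file statistics, is left open); the function is hypothetical.

```python
from itertools import accumulate
from typing import List, Optional, Tuple


def best_split_key(rows: List[Tuple[bytes, int]]) -> Optional[bytes]:
    """Pick the row key that splits a region into two halves of most similar size.

    rows: (row key, size in bytes) pairs sorted by row key. Rows before the chosen
    key go to the first daughter; the key itself and later rows go to the second.
    """
    if len(rows) < 2:
        return None
    sizes = [size for _, size in rows]
    total = sum(sizes)
    prefix = list(accumulate(sizes))  # prefix[i] = total size of rows[0..i]
    # Splitting at rows[i] leaves prefix[i - 1] bytes in the first daughter.
    best = min(range(1, len(rows)), key=lambda i: abs(2 * prefix[i - 1] - total))
    return rows[best][0]


rows = [(b"a", 1), (b"b", 1), (b"c", 1), (b"d", 10), (b"e", 1)]
print(best_split_key(rows))  # b'd'
```

Here row `d` alone holds most of the data, so splitting at `d` (3 bytes vs. 11) is more balanced than splitting at `e` (13 vs. 1).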

Challenges & Lessons Learned

Challenges

● Everyone is still learning

● Some issues only appear at scale

– At scale, nothing works as advertised

● Production cluster configuration

– Hardware issues

– Tuning the cluster configuration to our workloads

● HBase stability

● Monitoring the health of HBase

Lessons Learned

● Schema & key design

– What’s queried together should be stored together

● Monitoring and operational tooling matter most

● Forget “emergency actions”: everything takes time

● You need DevOps in production

● Steep learning curve: you need to know the whole ecosystem

– Hadoop, HDFS, MapReduce, ZooKeeper

Resources to get started

● https://github.com/sentric/hannibal

● http://hbase.apache.org/book.html

● https://github.com/jmhsieh/hbase-repair-scripts

● http://www.sentric.ch/blog/best-practice-why-monitoring-hbase-is-important

● HBase: The Definitive Guide

Questions?

@chrisgugi

Thank you!