Apache HBase Internals you Hoped you Never Needed to Understand
Josh Elser
Future of Data, NYC
2016/10/11
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Engineer at Hortonworks, Member of the Apache Software Foundation
Top-Level Projects
• Apache Accumulo®
• Apache Calcite™
• Apache Commons™
• Apache HBase®
• Apache Phoenix™
ASF Incubator
• Apache Fluo™
• Apache Gossip™
• Apache Pirk™
• Apache Rya™
• Apache Slider™
These Apache project names are trademarks or registered trademarks of the Apache Software Foundation.
Apache HBase for storing your data!
CC BY 3.0 US: http://hbase.apache.org/
What happens when things go wrong?
CC BY-ND 2.0: https://www.flickr.com/photos/widnr/6588151679
The BigTable Architecture
BigTable’s architecture is simple
Debugging a distributed system is not simple
How can we break down a complex system?
How do we write resilient software?
• Log-Structured Merge Tree
• Write-Ahead Logs
• Distributed Coordination
• Row-based, Auto-Sharding
• Strong Consistency
• Read Isolation
• Coprocessors
• Security (AuthN/AuthZ)
• Backups
Naming Conventions
• Servers
– Hostname, Port, and Timestamp
– RegionServer: r01n01.domain.com,16201,1475691463147
– Master: r02n01.domain.com,16000,1475691462616
• Regions
– Table, Start RowKey, Region ID (timestamp), Replica ID, Encoded name
– T1,\x04\x00\x00,1470324608597.c04d94cd4ee9797da2fb906b4dcd2e3c.
– Or simply c04d94cd4ee9797da2fb906b4dcd2e3c
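When reading logs it helps to split these names back into their parts. The following is a minimal sketch in Python, not HBase's own parsing code. The helper names `parse_server_name` and `parse_region_name` are hypothetical, and the region parser only handles the form shown on this slide (no Replica ID component).

```python
from typing import NamedTuple


class ServerName(NamedTuple):
    hostname: str
    port: int
    start_timestamp: int  # the timestamp component of the server name


class RegionName(NamedTuple):
    table: str
    start_key: str
    region_id: int  # a timestamp, per the slide
    encoded_name: str


def parse_server_name(name: str) -> ServerName:
    """Split 'hostname,port,timestamp' into its three parts."""
    hostname, port, timestamp = name.rsplit(",", 2)
    return ServerName(hostname, int(port), int(timestamp))


def parse_region_name(name: str) -> RegionName:
    """Split 'table,startkey,regionid.encodedname.' into its parts.

    Assumption: no Replica ID is present, as in the slide's example.
    """
    table, rest = name.split(",", 1)
    # The start key may itself contain commas, so split from the right.
    start_key, tail = rest.rsplit(",", 1)
    region_id, encoded_name, _ = tail.split(".", 2)
    return RegionName(table, start_key, int(region_id), encoded_name)


if __name__ == "__main__":
    print(parse_server_name("r01n01.domain.com,16201,1475691463147"))
    print(parse_region_name(
        r"T1,\x04\x00\x00,1470324608597.c04d94cd4ee9797da2fb906b4dcd2e3c."))
```

The encoded name on its own is usually the most convenient string to search for across Master and RegionServer logs.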
Regions
• A sorted “shard” of a table
• At least one “column family”
– Physical partitions
• Each family can have zero to many files
• Hosted by at most one RegionServer
– Can have many hosting RegionServers for reads
• In-memory locks for certain intra-row operations
Region Assignment
• Coordinated by the HBase Master
• A Region must only be hosted by one RegionServer
• State tracked in hbase:meta
– hbck to fix issues
• Region splits/merges make a hard problem even harder
• Moving towards ProcedureV2
[Diagram: Normal Region Assignment States (Closed, Offline, Pending Open, Opening, Open)]
The File System
• HDFS “Compatible”
– Distributed, durable, “write leases”
• Physical storage of HBase Tables (HFiles)
• Write-ahead logs
• A parent directory in that FileSystem (hbase.rootdir)
The File System
Physical Separation by HBase Namespace
/hbase/data/
/hbase/data/default/<table1>
/hbase/data/default/.tabledesc/.tableinfo…
/hbase/data/default/<table2>/<region_id1>
/hbase/data/default/<table2>/<region_id2>
/hbase/data/my_custom_ns/<table3>/…
/hbase/data/hbase/meta/…
/hbase/archive/…
/hbase/WALs/<regionserver_name>/…
/hbase/oldWALs/…
/hbase/corrupt/…
The File System for one Region
/hbase/data/default/<table2>/<region_id1>
…/.regioninfo
…/.tmp
…/<family1>/<hfile>
…/<family1>/<hfile>
…/<family2>/<hfile>
…/<family3>/<hfile>
…/recovered.edits/<number>.seqid
Writes into HBase
• Mutations inserted into sorted in-memory structure and WAL
– Fast lookups of recent data
– Append-only log for durability and speed
• Mutations are collected by destination Region
• Beware of hot-spotting
• Data in memory eventually flushed into sorted (H)files
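The write path above can be sketched as a toy model: append to a log first, update a sorted in-memory structure, and flush to an immutable sorted file once enough data accumulates. This is an illustration only, not HBase code. `ToyRegion` and `flush_threshold` are hypothetical names, and real HBase flushes on memory size rather than key count.

```python
import bisect


class ToyRegion:
    """Toy model of one Region's write path (hypothetical, single-threaded)."""

    def __init__(self, flush_threshold=3):
        self.wal = []            # append-only log, written before memory
        self.mem_keys = []       # sorted keys currently held in memory
        self.mem_values = {}     # key -> latest value held in memory
        self.hfiles = []         # flushed, immutable sorted (key, value) lists
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        # 1. Record the mutation in the log for durability.
        self.wal.append((key, value))
        # 2. Insert into the sorted in-memory structure.
        if key not in self.mem_values:
            bisect.insort(self.mem_keys, key)
        self.mem_values[key] = value
        # 3. Flush once the in-memory structure grows large enough.
        if len(self.mem_keys) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the in-memory data out as one sorted file and clear memory."""
        if not self.mem_keys:
            return
        self.hfiles.append([(k, self.mem_values[k]) for k in self.mem_keys])
        self.mem_keys = []
        self.mem_values = {}


if __name__ == "__main__":
    region = ToyRegion(flush_threshold=2)
    region.put("row-b", "2")
    region.put("row-a", "1")
    print(region.hfiles)
```

Because every flush produces a new file, the number of files per family grows with write volume, which is what compactions later address.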
Compactions and Flushes
• Flush: Taking Key-Values from the In-Memory map and creating an HFile
• Minor Compaction: Rewriting a subset of HFiles for a Region into one HFile
• Major Compaction: Rewriting all HFiles for a Region into one HFile
• Compactions balance improved query performance with cost of rewriting data
– Compactions are good!
– Must understand SLAs to properly tune compactions
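The difference between minor and major compactions can be illustrated with sorted lists standing in for HFiles. This is a simplified sketch under stated assumptions: files are ordered oldest first, only the newest value per key is kept, and versions, delete markers, and file selection policy are all ignored. The function names are hypothetical.

```python
def compact(hfiles):
    """Merge sorted (key, value) lists, oldest first, into one sorted list.

    When a key appears in several files, the value from the newest file wins.
    """
    merged = {}
    for hfile in hfiles:          # oldest first, so later files overwrite
        for key, value in hfile:
            merged[key] = value
    return sorted(merged.items())


def minor_compact(hfiles, start, end):
    """Rewrite the contiguous subset hfiles[start:end] into one file."""
    return hfiles[:start] + [compact(hfiles[start:end])] + hfiles[end:]


def major_compact(hfiles):
    """Rewrite every file for the Region into a single file."""
    return [compact(hfiles)]


if __name__ == "__main__":
    files = [[("a", 1), ("c", 1)], [("b", 2)], [("a", 3)]]
    print(minor_compact(files, 0, 2))
    print(major_compact(files))
```

Fewer files means fewer streams to merge on every read, which is the query-performance side of the trade-off; the cost is rewriting data that was already on disk.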
Reads into HBase
• Merge-Sort over multiple streams of data
– Memory
– Disk (many files)
hbase:meta is the definitive source of where to find Regions
[Diagram: locating the Region for a RowKey; components shown: ZooKeeper, hbase:meta, RegionServer]
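The merge-sort over memory and disk can be sketched with Python's `heapq.merge`, which lazily merges already-sorted streams. This is an illustrative model, not HBase's scanner: the function name `scan` is hypothetical, each source is assumed to hold at most one entry per key, and the in-memory data is assumed to be newer than any file.

```python
import heapq


def scan(memstore, hfiles):
    """Yield (key, value) in key order, taking the newest value per key.

    memstore: sorted list of (key, value), the newest data.
    hfiles:   list of sorted (key, value) lists, oldest file first.
    """
    # Order sources newest first so a lower "age" means newer data.
    sources = [memstore] + list(reversed(hfiles))
    streams = [
        [(key, age, value) for key, value in source]
        for age, source in enumerate(sources)
    ]
    last_key = object()  # sentinel that never equals a real key
    # Entries sort by key, then by age, so the newest entry for a key
    # is always seen first; older duplicates are skipped.
    for key, _age, value in heapq.merge(*streams):
        if key != last_key:
            yield key, value
            last_key = key


if __name__ == "__main__":
    memstore = [("b", "new")]
    hfiles = [[("a", 1), ("b", "old")], [("c", 3)]]
    print(list(scan(memstore, hfiles)))
```

Every additional file adds another stream to the merge, which is why reads tend to slow down as un-compacted files accumulate.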
Apache ZooKeeper™
• Distributed coordination is really hard
• Obvious use cases
– Service Discovery
– Cluster Membership
– “Root Table”
• Non-obvious use cases
– Assignment (sometimes)
– Region Recovery
– WAL Splitting
– Cluster Replication
– Distributed Procedures
– HBase Snapshots
Apache ZooKeeper is a trademark of the Apache Software Foundation
Apache ZooKeeper™
• Discovery/Leader ZNodes
– /hbase/rs/…
– /hbase/master/…
– /hbase/backup-masters/…
• Consensus
– /hbase/splitWAL/…
– /hbase/flush-table-proc/...
– /hbase/table-lock/...
– /hbase/region-in-transition/...
– /hbase/recovering-regions/...
Distributed Procedures
• Resiliency in an unreliable system
– How do we create a table?
• “Procedure V2”
– Resilient, finite state machine
• HBase operations represented as “procedures”
• Clients are agnostic of Master state
– Clients track procedure state
https://issues.apache.org/jira/secure/attachment/12679960/ProcedureV2.pdf
Distributed Procedures
• Procedures are durable via Write-Ahead Log
– /hbase/MasterProcWALs/…
• Procedures only executed by the active HBase Master
• Reusable framework for the future
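The idea of a resilient finite state machine backed by a write-ahead log can be sketched as follows. This is a toy model and not the Procedure V2 implementation: the class names, the in-memory list standing in for the log, and the create-table state names are all hypothetical and chosen only for illustration.

```python
from enum import Enum


class CreateTableState(Enum):
    """Hypothetical steps for creating a table."""
    PRE_CHECK = 1
    WRITE_FS_LAYOUT = 2
    ADD_TO_META = 3
    ASSIGN_REGIONS = 4
    DONE = 5


class ToyProcedure:
    def __init__(self, proc_id, wal, state=CreateTableState.PRE_CHECK):
        self.proc_id = proc_id
        self.wal = wal        # a list standing in for a durable log
        self.state = state

    def step(self):
        """Do the work for the current state, then persist the next state."""
        # (The real work for self.state would happen here.)
        next_state = CreateTableState(self.state.value + 1)
        self.wal.append((self.proc_id, next_state.name))
        self.state = next_state


def run(proc, crash_after=None):
    """Drive the procedure to DONE; optionally simulate a Master crash."""
    steps = 0
    while proc.state is not CreateTableState.DONE:
        if crash_after is not None and steps == crash_after:
            return False      # simulated crash: in-memory state is lost
        proc.step()
        steps += 1
    return True


def recover(proc_id, wal):
    """Rebuild a procedure from the log, as a newly active Master might."""
    state = CreateTableState.PRE_CHECK
    for logged_id, state_name in wal:
        if logged_id == proc_id:
            state = CreateTableState[state_name]
    return ToyProcedure(proc_id, wal, state)


if __name__ == "__main__":
    log = []
    run(ToyProcedure(1, log), crash_after=2)
    resumed = recover(1, log)
    print(resumed.state, run(resumed), resumed.state)
```

Since the client only holds a procedure ID, it can keep polling for completion without caring which Master instance finishes the work.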
HBase RPCs
Overview
• Internal and External HBase Communication
• Half-Sync/Half-Async Model
• Many knobs to tweak
Components
• Listener
• Readers
• Scheduler
• Call Queues
• Call Runners/Handlers
HBase RPCs
[Diagram: Request to Execution. Listener, Readers, Scheduler, Call Queues (Priority, Read, Write, Replication), Handlers]
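The scheduler, call queue, and handler stages can be sketched with standard-library queues and threads. This is a toy model rather than HBase's RPC server: the Listener and Readers are omitted (calls arrive already parsed as dictionaries), and `ToyRpcServer`, `pick_queue`, and the routing rules are hypothetical.

```python
import queue
import threading

QUEUE_NAMES = ("priority", "read", "write", "replication")


def pick_queue(call):
    """Hypothetical routing rule from a call's kind to a call queue."""
    kind = call["kind"]
    if kind in ("get", "scan"):
        return "read"
    if kind in ("put", "delete"):
        return "write"
    if kind == "replication":
        return "replication"
    return "priority"  # everything else, e.g. administrative calls


class ToyRpcServer:
    def __init__(self, handlers_per_queue=2):
        self.call_queues = {name: queue.Queue() for name in QUEUE_NAMES}
        self.results = []
        self._lock = threading.Lock()
        # Each call queue gets its own pool of handler threads.
        for name in QUEUE_NAMES:
            for _ in range(handlers_per_queue):
                threading.Thread(
                    target=self._handle, args=(name,), daemon=True
                ).start()

    def schedule(self, call):
        """Scheduler: place a parsed call on the matching call queue."""
        self.call_queues[pick_queue(call)].put(call)

    def _handle(self, name):
        """Handler loop: take calls off one queue and execute them."""
        calls = self.call_queues[name]
        while True:
            call = calls.get()
            with self._lock:
                self.results.append((name, call["payload"]))
            calls.task_done()

    def drain(self):
        """Block until every queued call has been handled."""
        for calls in self.call_queues.values():
            calls.join()


if __name__ == "__main__":
    server = ToyRpcServer()
    server.schedule({"kind": "get", "payload": "row-1"})
    server.schedule({"kind": "put", "payload": "row-2"})
    server.drain()
    print(sorted(server.results))
```

Separate queues keep one class of traffic from starving another: a flood of writes fills only the write queue, and reads still have handlers available.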
Disaster Recovery
• Multiple tools to ensure copies of data in the face of catastrophic failure
• CopyTable
– MapReduce job which reads all data from a source, writing to destination
• Snapshots
– A collection of Regions, their HFiles, and metadata
• Backup & Restore
– HBASE-7912, currently targeted for HBase-2.0.0
– Incremental and full backup/restore
Kerberos
• Strong authentication for untrusted networks
• “Standard” across Apache Hadoop and friends
• Requirements:
– Forward/Reverse DNS
– Unlimited Strength Java Cryptography Extension
• SASL used to build RPC systems
• “Practical Kerberos with Apache HBase” https://goo.gl/y0d9ZO
Finding a Hypothesis
• Logs logs logs
• Application and System
• Metrics exposed by JMX
• Graphing solutions
– Ambari Metrics Server + Grafana