34
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. Apache HBase For Architects Nick Dimiduk, Hortonworks Strata/Hadoop World Barcelona, 2014-11-21 Page 1

Apache HBase for Architects

Embed Size (px)

DESCRIPTION

An overview of HBase's architecture, internal design, and its place in your application.

Citation preview

Page 1: Apache HBase for Architects

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Apache HBase For Architects Nick Dimiduk, Hortonworks Strata/Hadoop World Barcelona, 2014-11-21

Page 1

Page 2: Apache HBase for Architects

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. Page 2

Page 3: Apache HBase for Architects

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Agenda •  Background

–  (how did we get here?) •  TL;DR

–  (don’t waste my time!) •  High-level Architecture

–  (where are we?) •  Anatomy of a RegionServer

–  (how does this thing work?) •  By Example

–  (how do I use it?) •  Resources

–  (where do we go from here?)

Page 3

Page 4: Apache HBase for Architects

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Background

Page 4

Page 5: Apache HBase for Architects

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

So what is HBase anyway? •  BigTable paper from Google, 2006, Dean et al.

–  “Bigtable is a sparse, distributed, persistent multi-dimensional sorted map.” –  http://research.google.com/archive/bigtable.html

•  Key Features: – Distributed storage across cluster of machines – Random, online read and write data access – Schemaless data model (“NoSQL”) – Self-managed data partitions

Page 5

Page 6: Apache HBase for Architects

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Apache Hadoop Dependencies •  Apache Hadoop Distributed Filesystem (HDFS)

– Distributed, fault-tolerant, throughput-optimized data storage –  The Google File System, 2003, Ghemawat et al. –  http://research.google.com/archive/gfs.html

•  Apache Zookeeper (ZK) – Distributed, available, reliable coordination system –  The Chubby Lock Service …, 2006, Burrows –  http://research.google.com/archive/chubby.html

•  Apache Hadoop MapReduce (MR) – Distributed, fault-tolerant, batch-oriented data processing – MapReduce: …, 2004, Dean and Ghemawat –  http://research.google.com/archive/mapreduce.html

Page 6

Page 7: Apache HBase for Architects

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

TL;DR

Page 7

Page 8: Apache HBase for Architects

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

So what is HBase anyway?

Page 8

C1 tree C0 tree

Disk Memory

Figure 2.1. Schematic picture of an LSM-tree of two components

Figure 2.1 reproduced from O’Neil, Patrick, et al. "The log-structured merge-tree (LSM-tree)." Acta Informatica 33.4 (1996): 351-385.

Page 9: Apache HBase for Architects

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

So what is HBase anyway?

Page 9

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

Page 10: Apache HBase for Architects

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

So what is HBase anyway?

Page 10

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

Page 11: Apache HBase for Architects

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

So what is HBase anyway?

Page 11

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

Page 12: Apache HBase for Architects

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

So what is HBase anyway?

Page 12

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

a

cf1

1368394583 71368394261 "hello"

"bar"

1368394583 221368394925 13.61368393847 "world"

"foo"

cf21368387684 "almost the loneliest number"1.0001

1368396302 "fourth of July""2011-07-04"

Page 13: Apache HBase for Architects

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

High-level Architecture

Page 13

Page 14: Apache HBase for Architects

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Logical Data Model

Page 14

1368387247 [3.6 kb png data]"thumb"cf2b

a

cf1

1368394583 71368394261 "hello"

"bar"

1368394583 221368394925 13.61368393847 "world"

"foo"

cf21368387684 "almost the loneliest number"1.0001

1368396302 "fourth of July""2011-07-04"

Table A

rowkey columnfamily

columnqualifier timestamp value

Rows

Column Families

Page 15: Apache HBase for Architects

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Logical Architecture

Page 15

ab

dc

ef

hg

ij

lk

mn

po

Table A

Region 1

Region 2

Region 3

Region 4

Region Server 7Table A, Region 1Table A, Region 2

Table G, Region 1070Table L, Region 25

Region Server 86Table A, Region 3Table C, Region 30Table F, Region 160Table F, Region 776

Region Server 367Table A, Region 4Table C, Region 17Table E, Region 52

Table P, Region 1116

Page 16: Apache HBase for Architects

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Physical Architecture

Page 16

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

RegionServer

DataNode

RegionServer

DataNode

RegionServer

DataNode

RegionServer

DataNode

...

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

MasterMaster

ZooKeeper

ZooKeeper

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

NameNode

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

HBaseClient

HDFS

HBase

Page 17: Apache HBase for Architects

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

User API •  {rowkey => {family => {qualifier => {version => value}}}}

–  Think: nested TreeMap (Java), OrderedDictionary (C#), OrderedDict (Python) •  Basic data operations: GET, PUT, DELETE •  SCAN over range of key-values

–  benefit of the sorted rowkey business –  this is how you implement any kind of "complex query” *

•  GET, SCAN support Filters – Push application logic to RegionServers

•  INCREMENT, APPEND, CheckAnd{Put,Delete} – Server-side, atomic data operations, can be contentious!

Page 17

* This is also a foundational component in what we refer to as “schema design” in this “schemaless” database.

Page 18: Apache HBase for Architects

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Anatomy of a RegionServer

Page 18

Page 19: Apache HBase for Architects

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

So what is HBase anyway?

Page 19

63HBase in distributed mode

store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.

Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Table T1 split into 3 regions - R1, R2, and R3

T1R1

T1R2

T1R3

Figure 3.6 A table consists of multiple smaller chunks called regions.

DataNode RegionServer DataNode RegionServer DataNode RegionServer

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Licensed to Nick Dimiduk <[email protected]>

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory

C1 tree C0 tree

Disk MemoryC1 tree C0 tree

Disk Memory Cache

Page 20: Apache HBase for Architects

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Storage Machinery

Page 20

RegionServer (HBase)

DataNode (Hadoop DFS)

HLog(WAL)

HRegion

HStore

StoreFile

HFile

StoreFile

HFile

MemStore

... ...

HStore

BlockCache

HRegion

...

HStoreHStore

...

Page 21: Apache HBase for Architects

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Storage Machinery

Page 21

RegionServer (HBase)

DataNode (Hadoop DFS)

HLog(WAL)

HRegion

HStore

StoreFile

HFile

StoreFile

HFile

MemStore

... ...

HStore

BlockCache

HRegion

...

HStoreHStore

...C1

C0

C1

C0

C1

C0

C1

C0

Cache

Page 22: Apache HBase for Architects

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Storage Machinery: Write Path

Page 22

RegionServer (HBase)

DataNode (Hadoop DFS)

HLog(WAL)

HRegion

HStore

StoreFile

HFile

StoreFile

HFile

MemStore

... ...

HStore

BlockCache

HRegion

...

HStoreHStore

...

1

2

3

4

5

Page 23: Apache HBase for Architects

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Storage Machinery: Read Path

Page 23

RegionServer (HBase)

DataNode (Hadoop DFS)

HLog(WAL)

HRegion

HStore

StoreFile

HFile

StoreFile

HFile

MemStore

... ...

HStore

BlockCache

HRegion

...

HStoreHStore

...

1 5

23

3

2

4

Page 24: Apache HBase for Architects

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

By Example

Page 24

Page 25: Apache HBase for Architects

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Database Dichotomy

Page 25

ReadWrite

Latency

Throughput

PredicatePushdown

Row+ColumnBloomfilter

LargerBlock SizeBulkload

HFiles

Compression

Compression

Compression

Compression

SmallerBlock Size

LargerBlockCache

IncreaseBlocking Storefiles

LossyWAL Durability

IncreasedScanner Caching

LargerFlush Size

Smart RowkeyDesign

Smart RowkeyDesign

Smart RowkeyDesign

Smart RowkeyDesign

Page 26: Apache HBase for Architects

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Web-scale Database

Page 26

Appserver

Appserver

Appserver

Appserver

Appserver

Counters

Sessions

User profiles

Social Media

Application Data

ReadWrite

Latency

Throughput

Page 27: Apache HBase for Architects

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

“BigIndex”

Page 27

"BigIndex"

Document store

Search Search

SearchSearch

Search

Appserver

Appserver

Appserver

Appserver

Appserver

ReadWrite

Latency

Throughput

Page 28: Apache HBase for Architects

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Materialized View

Page 28

Appserver

Appserver

Appserver

Appserver

Appserver

ReadWrite

Latency

Throughput

Page 29: Apache HBase for Architects

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

ETL Assist

Page 29

ReadWrite

Latency

Throughput

Page 30: Apache HBase for Architects

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Lambda Architecture

Page 30

Appserver

Appserver

Appserver

Appserver

Appserver

Page 31: Apache HBase for Architects

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Resources

Page 31

Page 32: Apache HBase for Architects

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Join the Community! •  hbase.apache.org

–  hbase.apache.org/book.html –  blogs.apache.org/hbase/ –  hbase.apache.org/mail-lists.html

•  IRC: irc.freenode.net #hbase •  JIRA: issues.apache.org/jira/browse/HBASE •  Source: git clone git://git.apache.org/hbase.git •  In person

– HBaseCon, hbasecon.com – Hadoop Summit, hadoopsummit.org – Strata / Hadoop World, strataconf.com –  Local meetup near you!

Page 32

Page 33: Apache HBase for Architects

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

HBase 0.98/1.0 •  Hardening

– Stability, Reliability, Availability, Performance •  Horizontal Scalability

–  1000’s of machines •  Availability

– Speed of Recovery (MTTR), Region Replicas •  Improved Multi-tenancy

– RPC priorities/QoS, namespace management •  Cell-level security •  Semantic Versioning •  Client API cleanup

Page 33

Page 34: Apache HBase for Architects

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Thanks!

Page 34

M A N N I N G

Nick Dimiduk Amandeep Khurana

FOREWORD BY Michael Stack

hbaseinaction.com

Nick Dimiduk github.com/ndimiduk

@xefyr

n10k.com

strataeucftw

slideshare.net/xefyr/hbase-for-architects