Upload
nick-dimiduk
View
118
Download
3
Embed Size (px)
DESCRIPTION
An overview of HBase's architecture, internal design, and its place in your application.
Citation preview
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Apache HBase For Architects Nick Dimiduk, Hortonworks Strata/Hadoop World Barcelona, 2014-11-21
Page 1
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. Page 2
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Agenda • Background
– (how did we get here?) • TL;DR
– (don’t waste my time!) • High-level Architecture
– (where are we?) • Anatomy of a RegionServer
– (how does this thing work?) • By Example
– (how do I use it?) • Resources
– (where do we go from here?)
Page 3
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Background
Page 4
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
So what is HBase anyway? • BigTable paper from Google, 2006, Dean et al.
– “Bigtable is a sparse, distributed, persistent multi-dimensional sorted map.” – http://research.google.com/archive/bigtable.html
• Key Features: – Distributed storage across cluster of machines – Random, online read and write data access – Schemaless data model (“NoSQL”) – Self-managed data partitions
Page 5
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Apache Hadoop Dependencies • Apache Hadoop Distributed Filesystem (HDFS)
– Distributed, fault-tolerant, throughput-optimized data storage – The Google File System, 2003, Ghemawat et al. – http://research.google.com/archive/gfs.html
• Apache Zookeeper (ZK) – Distributed, available, reliable coordination system – The Chubby Lock Service …, 2006, Burrows – http://research.google.com/archive/chubby.html
• Apache Hadoop MapReduce (MR) – Distributed, fault-tolerant, batch-oriented data processing – MapReduce: …, 2004, Dean and Ghemawat – http://research.google.com/archive/mapreduce.html
Page 6
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
TL;DR
Page 7
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
So what is HBase anyway?
Page 8
C1 tree C0 tree
Disk Memory
Figure 2.1. Schematic picture of an LSM-tree of two components
Figure 2.1 reproduced from O’Neil, Patrick, et al. "The log-structured merge-tree (LSM-tree)." Acta Informatica 33.4 (1996): 351-385.
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
So what is HBase anyway?
Page 9
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
So what is HBase anyway?
Page 10
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
So what is HBase anyway?
Page 11
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
So what is HBase anyway?
Page 12
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
a
cf1
1368394583 71368394261 "hello"
"bar"
1368394583 221368394925 13.61368393847 "world"
"foo"
cf21368387684 "almost the loneliest number"1.0001
1368396302 "fourth of July""2011-07-04"
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
High-level Architecture
Page 13
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Logical Data Model
Page 14
1368387247 [3.6 kb png data]"thumb"cf2b
a
cf1
1368394583 71368394261 "hello"
"bar"
1368394583 221368394925 13.61368393847 "world"
"foo"
cf21368387684 "almost the loneliest number"1.0001
1368396302 "fourth of July""2011-07-04"
Table A
rowkey columnfamily
columnqualifier timestamp value
Rows
Column Families
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Logical Architecture
Page 15
ab
dc
ef
hg
ij
lk
mn
po
Table A
Region 1
Region 2
Region 3
Region 4
Region Server 7Table A, Region 1Table A, Region 2
Table G, Region 1070Table L, Region 25
Region Server 86Table A, Region 3Table C, Region 30Table F, Region 160Table F, Region 776
Region Server 367Table A, Region 4Table C, Region 17Table E, Region 52
Table P, Region 1116
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Physical Architecture
Page 16
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
RegionServer
DataNode
RegionServer
DataNode
RegionServer
DataNode
RegionServer
DataNode
...
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
MasterMaster
ZooKeeper
ZooKeeper
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
NameNode
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
HBaseClient
HDFS
HBase
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
User API • {rowkey => {family => {qualifier => {version => value}}}}
– Think: nested TreeMap (Java), OrderedDictionary (C#), OrderedDict (Python) • Basic data operations: GET, PUT, DELETE • SCAN over range of key-values
– benefit of the sorted rowkey business – this is how you implement any kind of "complex query” *
• GET, SCAN support Filters – Push application logic to RegionServers
• INCREMENT, APPEND, CheckAnd{Put,Delete} – Server-side, atomic data operations, can be contentious!
Page 17
* This is also a foundational component in what we refer to as “schema design” in this “schemaless” database.
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Anatomy of a RegionServer
Page 18
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
So what is HBase anyway?
Page 19
63HBase in distributed mode
store/access data on HDFS. The master process does the distribution of regions amongRegionServers, and each RegionServer typically hosts multiple regions.
Given that the underlying data is stored in HDFS, which is available to all clients asa single namespace, all RegionServers have access to the same persisted files in the filesystem and can therefore host any region (figure 3.8). By physically collocating Data-Nodes and RegionServers, you can use the data locality property; that is, RegionServ-ers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In someHBase deployments, the MapReduce framework isn’t deployed at all if the workload isprimarily random reads and writes. In other deployments, where the MapReduce pro-cessing is also a part of the workloads, TaskTrackers, DataNodes, and HBase Region-Servers can run together.
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Full table T1
00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988
00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432
00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298
Table T1 split into 3 regions - R1, R2, and R3
T1R1
T1R2
T1R3
Figure 3.6 A table consists of multiple smaller chunks called regions.
DataNode RegionServer DataNode RegionServer DataNode RegionServer
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
Licensed to Nick Dimiduk <[email protected]>
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory
C1 tree C0 tree
Disk MemoryC1 tree C0 tree
Disk Memory Cache
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Storage Machinery
Page 20
RegionServer (HBase)
DataNode (Hadoop DFS)
HLog(WAL)
HRegion
HStore
StoreFile
HFile
StoreFile
HFile
MemStore
... ...
HStore
BlockCache
HRegion
...
HStoreHStore
...
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Storage Machinery
Page 21
RegionServer (HBase)
DataNode (Hadoop DFS)
HLog(WAL)
HRegion
HStore
StoreFile
HFile
StoreFile
HFile
MemStore
... ...
HStore
BlockCache
HRegion
...
HStoreHStore
...C1
C0
C1
C0
C1
C0
C1
C0
Cache
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Storage Machinery: Write Path
Page 22
RegionServer (HBase)
DataNode (Hadoop DFS)
HLog(WAL)
HRegion
HStore
StoreFile
HFile
StoreFile
HFile
MemStore
... ...
HStore
BlockCache
HRegion
...
HStoreHStore
...
1
2
3
4
5
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Storage Machinery: Read Path
Page 23
RegionServer (HBase)
DataNode (Hadoop DFS)
HLog(WAL)
HRegion
HStore
StoreFile
HFile
StoreFile
HFile
MemStore
... ...
HStore
BlockCache
HRegion
...
HStoreHStore
...
1 5
23
3
2
4
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
By Example
Page 24
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Database Dichotomy
Page 25
ReadWrite
Latency
Throughput
PredicatePushdown
Row+ColumnBloomfilter
LargerBlock SizeBulkload
HFiles
Compression
Compression
Compression
Compression
SmallerBlock Size
LargerBlockCache
IncreaseBlocking Storefiles
LossyWAL Durability
IncreasedScanner Caching
LargerFlush Size
Smart RowkeyDesign
Smart RowkeyDesign
Smart RowkeyDesign
Smart RowkeyDesign
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Web-scale Database
Page 26
Appserver
Appserver
Appserver
Appserver
Appserver
Counters
Sessions
User profiles
Social Media
Application Data
ReadWrite
Latency
Throughput
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
“BigIndex”
Page 27
"BigIndex"
Document store
Search Search
SearchSearch
Search
Appserver
Appserver
Appserver
Appserver
Appserver
ReadWrite
Latency
Throughput
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Materialized View
Page 28
Appserver
Appserver
Appserver
Appserver
Appserver
ReadWrite
Latency
Throughput
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
ETL Assist
Page 29
ReadWrite
Latency
Throughput
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Lambda Architecture
Page 30
Appserver
Appserver
Appserver
Appserver
Appserver
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Resources
Page 31
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Join the Community! • hbase.apache.org
– hbase.apache.org/book.html – blogs.apache.org/hbase/ – hbase.apache.org/mail-lists.html
• IRC: irc.freenode.net #hbase • JIRA: issues.apache.org/jira/browse/HBASE • Source: git clone git://git.apache.org/hbase.git • In person
– HBaseCon, hbasecon.com – Hadoop Summit, hadoopsummit.org – Strata / Hadoop World, strataconf.com – Local meetup near you!
Page 32
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
HBase 0.98/1.0 • Hardening
– Stability, Reliability, Availability, Performance • Horizontal Scalability
– 1000’s of machines • Availability
– Speed of Recovery (MTTR), Region Replicas • Improved Multi-tenancy
– RPC priorities/QoS, namespace management • Cell-level security • Semantic Versioning • Client API cleanup
Page 33
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Thanks!
Page 34
M A N N I N G
Nick Dimiduk Amandeep Khurana
FOREWORD BY Michael Stack
hbaseinaction.com
Nick Dimiduk github.com/ndimiduk
@xefyr
n10k.com
strataeucftw
slideshare.net/xefyr/hbase-for-architects