Image Source: https://help.marklogic.com/news/list/Index/10
Agenda
What are Blocks?
A physical storage disk has a block size: the minimum amount of data it can read or write. Normally 512 bytes.
File systems for a single disk also deal with data in blocks, normally a few kilobytes (e.g. 4 KB).
Hadoop has a much larger block size. By default it is 64 MB in Hadoop 1.x (128 MB from Hadoop 2.x onward).
Files in HDFS are broken down into block-sized chunks, which are stored as independent units.
However, a file smaller than the block size does not occupy a full block of underlying storage.
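As a rough illustration of the chunking described above, here is a minimal Python sketch. The function name `split_into_blocks` and the 64 MB default are assumptions made for this example; it is not HDFS code, only the arithmetic behind it.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=64 * MB):
    """Return the size in bytes of each block that a file would be split into.

    Hypothetical helper for illustration only; sizes are in bytes.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # The last block holds only the leftover bytes, not a full block.
        sizes.append(remainder)
    return sizes


if __name__ == "__main__":
    # A 150 MB file becomes two full 64 MB blocks plus one 22 MB block.
    print([size // MB for size in split_into_blocks(150 * MB)])
```

Note that a 1 MB file yields a single 1 MB block in this sketch, matching the point that small files do not consume a whole block of storage.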
Should I care?
Why such large blocks?
Minimize disk seek times
Assuming a seek time of 10 ms and a disk transfer rate of 100 MB/s, a 100 MB block takes about 1 s to transfer, so seek time is 1% of transfer time, which is small enough to ignore.
Hence the default is 64 MB, while many production environments also use 128 MB.
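The seek-time argument can be checked with a few lines of arithmetic. This is a sketch using the slide's figures (10 ms seek, 100 MB/s transfer) as defaults; the function name `seek_overhead` is made up for this example.

```python
def seek_overhead(block_size_mb, seek_time_ms=10.0, transfer_rate_mb_s=100.0):
    """Return seek time as a fraction of the time needed to transfer one block.

    Hypothetical helper; defaults are the figures assumed on the slide.
    """
    if block_size_mb <= 0 or transfer_rate_mb_s <= 0:
        raise ValueError("block size and transfer rate must be positive")
    transfer_time_ms = block_size_mb / transfer_rate_mb_s * 1000.0
    return seek_time_ms / transfer_time_ms


if __name__ == "__main__":
    for size_mb in (100, 64, 128):
        print(f"{size_mb} MB block: seek is {seek_overhead(size_mb):.2%} of transfer time")
```

Under these assumptions the overhead stays in the low single-digit percent range for 64 MB and 128 MB blocks, whereas a block of a few kilobytes would spend far longer seeking than transferring.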
HDFS Architecture
Image Source: https://hadoop.apache.org
Namenode and Datanode
Master - Namenode
Manages file system namespace
File system tree and metadata for all files and directories
Stores this info in -
Namespace image
Edit log
Knows which datanodes hold the blocks of a given file. This information is not persisted; it is reconstructed from the datanodes' reports at startup
Worker - Datanode
Store and retrieve blocks as requested by clients
Periodically report back to the namenode on the list of blocks they are storing
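To make the division of labour concrete, here is a toy Python model of the bookkeeping described above. The class `NameNodeSketch`, its method names, and the `blk_`/`dn` identifiers are all hypothetical; the real namenode is far more involved. The sketch only shows that the file-to-block mapping is namespace metadata, while block locations are rebuilt from datanode reports.

```python
from collections import defaultdict


class NameNodeSketch:
    """Toy model of namenode bookkeeping (illustrative, not the real implementation)."""

    def __init__(self):
        # Namespace metadata: file path -> ordered list of block IDs.
        self.file_blocks = {}
        # Block locations: block ID -> set of datanode names.
        # Kept in memory only and rebuilt from block reports.
        self.block_locations = defaultdict(set)

    def add_file(self, path, block_ids):
        self.file_blocks[path] = list(block_ids)

    def receive_block_report(self, datanode, block_ids):
        # Forget what this datanode reported earlier, then record the fresh list.
        for holders in self.block_locations.values():
            holders.discard(datanode)
        for block_id in block_ids:
            self.block_locations[block_id].add(datanode)

    def locate(self, path):
        # For each block of the file, list the datanodes currently holding it.
        return [
            (block_id, sorted(self.block_locations.get(block_id, set())))
            for block_id in self.file_blocks[path]
        ]


if __name__ == "__main__":
    nn = NameNodeSketch()
    nn.add_file("/logs/a.txt", ["blk_1", "blk_2"])
    nn.receive_block_report("dn1", ["blk_1"])
    nn.receive_block_report("dn2", ["blk_1", "blk_2"])
    print(nn.locate("/logs/a.txt"))
```

A client asking where a file lives would, in this model, call `locate` and then contact the listed datanodes directly for the block data.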
HDFS Storage
Image Source: https://developer.yahoo.com/hadoop/tutorial/module2.html
Secondary Namenode
Image Source: http://www.quickmeme.com/meme/35ke38
Secondary Namenode
Not a backup namenode
Periodically merges the namespace image with the edit log, to keep the edit log from becoming too large
Usually runs on a different machine than the namenode
The secondary, however, always lags behind the primary, so its merged copy is stale and cannot be relied on as-is if the primary fails
In the event of primary failure, copy the primary's namespace image to the secondary and run it as the new primary.
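The merge step performed by the secondary namenode can be pictured as replaying logged operations on top of a snapshot. The sketch below is a simplification under stated assumptions: the image is modelled as a dict of path to file size, and the edit log as a list of tuples with made-up operation names (`create`, `delete`, `rename`). The real on-disk formats are different.

```python
def checkpoint(image, edit_log):
    """Replay edit-log entries on a copy of the namespace image and return the result.

    Hypothetical model: image maps path -> size in bytes, and each edit is
    ("create", path, size), ("delete", path) or ("rename", old_path, new_path).
    """
    merged = dict(image)  # leave the original image untouched
    for op, path, *args in edit_log:
        if op == "create":
            merged[path] = args[0]
        elif op == "delete":
            merged.pop(path, None)
        elif op == "rename":
            merged[args[0]] = merged.pop(path)
        else:
            raise ValueError(f"unknown edit log operation: {op}")
    return merged


if __name__ == "__main__":
    image = {"/a": 10}
    edits = [("create", "/b", 20), ("rename", "/a", "/c"), ("delete", "/b")]
    # After the merge, the edits are folded into the image and the log can start afresh.
    print(checkpoint(image, edits))
```

Any edits logged after the checkpoint are missing from the merged image, which is why the secondary's copy always lags the primary.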
Writing a File in HDFS
Reading a file in HDFS
HDFS Block Placement
Small File Problem?
Each file occupies namespace irrespective of file size!!
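A small counting exercise shows why many small files are costly. The sketch assumes, as a simplification, that the namenode tracks one namespace object per file plus one per block; the function name `namespace_objects` and the 64 MB default are chosen for this example.

```python
MB = 1024 * 1024


def namespace_objects(file_sizes, block_size=64 * MB):
    """Count namespace objects for the given files: one per file plus one per block.

    Hypothetical helper; a simplified model of what the namenode has to track.
    """
    total = 0
    for size in file_sizes:
        blocks = -(-size // block_size)  # ceiling division on integers
        total += 1 + blocks
    return total


if __name__ == "__main__":
    one_big_file = [1024 * MB]       # 1 GB stored as a single file
    many_small_files = [1 * MB] * 1024  # the same 1 GB as 1024 files of 1 MB
    print(namespace_objects(one_big_file))      # 17
    print(namespace_objects(many_small_files))  # 2048
```

The same amount of data costs the namenode 17 objects in one case and 2048 in the other, regardless of how little disk space each small file uses.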
Further Reading
HDFS Comics :-) https://docs.google.com/open?id=0B-zw6KHOtbT4MmRkZWJjYzEtYjI3Ni00NTFjLWE0OGItYTU5OGMxYjc0N2M1
Thank You!!
Please send your questions to: [email protected] / [email protected]