Hadoop

- What is HDFS
- Core components
- Architecture
- Name Node
- Metadata
- Secondary Name Node
- HDFS blocks
- Limitations
- File system commands
Hadoop is a framework that allows for distributed processing of large data sets across clusters of commodity computers using a simple programming model.
Hadoop was designed to enable applications to make the most of the cluster architecture by addressing two key points:
1. Layout of data across the cluster, ensuring data is evenly distributed
2. Design of applications to benefit from data locality
This brings us to the two main mechanisms of Hadoop: HDFS and Hadoop MapReduce.
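To make the "simple programming model" concrete, here is a minimal sketch of the canonical word-count job written against the Hadoop MapReduce Java API. The input and output paths passed on the command line (e.g. /user/demo/input) are placeholders for illustration, not paths from this deck.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: runs on the node holding each HDFS block (data locality),
  // emitting (word, 1) for every token in its input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts for each word after the shuffle phase.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation cuts shuffle traffic
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. /user/demo/input
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not already exist
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

It would be run with something like: hadoop jar wordcount.jar WordCount /user/demo/input /user/demo/output (jar name and paths assumed). The framework ships the map tasks to the DataNodes holding the input blocks, which is exactly the data-locality design point above.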
HDFS is a file system designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware.
Features

- Highly fault tolerant
- Suitable for applications with large data sets
- Streaming access to file system data
- Can be built out of commodity hardware
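As a sketch of what "streaming access to file system data" looks like from an application, the following uses Hadoop's Java FileSystem API to write a file into HDFS and read it back sequentially. The file path /user/demo/hello.txt is hypothetical, and the NameNode address comes from whatever core-site.xml is on the classpath.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWrite {
  public static void main(String[] args) throws Exception {
    // Picks up fs.defaultFS from core-site.xml on the classpath,
    // e.g. hdfs://namenode:8020 (hostname and port are deployment-specific).
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path file = new Path("/user/demo/hello.txt"); // hypothetical path

    // Write: the data is split into blocks and replicated across DataNodes.
    try (FSDataOutputStream out = fs.create(file, true)) {
      out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
    }

    // Read: a plain sequential scan, the access pattern HDFS is optimized for.
    try (FSDataInputStream in = fs.open(file);
         BufferedReader reader =
             new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        System.out.println(line);
      }
    }

    fs.close();
  }
}

Note that both operations are simple streams: HDFS favors whole-file sequential reads and writes over low-latency random access.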
Areas where Hadoop is not a good fit today

Hadoop can handle small datasets, but you can't unleash its power with them. There is overhead associated with distributing each dataset across the cluster, so if the dataset is small you won't get much advantage from Hadoop. If the dataset is small and unstructured, you will typically try to collate the data first.