
Introduction to HDFS (Hadoop Distributed File System)

Agenda
Hadoop
What is HDFS
Core components
Architecture
Name Node
Metadata
Secondary Name Node
HDFS Blocks
Limitations
File System Commands

Hadoop is a framework that allows for distributed processing of large data sets across clusters of commodity computers using a simple programming model.

Hadoop was designed to enable applications to make the most of the cluster architecture by addressing two key points:
1. Layout of data across the cluster, ensuring data is evenly distributed
2. Design of applications to benefit from data locality

This brings us to the two main mechanisms of Hadoop: HDFS and Hadoop MapReduce.
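The MapReduce side can be sketched in miniature with a word count, the canonical example. This is a purely in-memory sketch of the model, not how Hadoop itself runs jobs (real jobs implement Mapper and Reducer classes, typically in Java); all function names here are illustrative.

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every input record."""
    for record in records:
        for word in record.split():
            yield (word, 1)

def shuffle_phase(pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["hadoop stores data", "hadoop processes data"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts["hadoop"], counts["data"])  # 2 2
```

The point of the three-phase shape is that map and reduce are independent per key, so the framework can run them in parallel across the cluster, moving the computation to the nodes that hold the data.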

Hadoop Core Components

HDFS splits, scatters, replicates, and manages data across the nodes.

HDFS is a file system designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware.

Features

Highly fault tolerant
Suitable for applications with large data sets
Streaming access to file system data
Can be built out of commodity hardware
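The splitting and replication described above can be sketched as follows. The 128 MB block size and replication factor of 3 are common HDFS defaults; the round-robin placement and node names are simplifications of my own, since the real NameNode uses rack-aware replica placement.

```python
# Sketch of how HDFS splits a file into fixed-size blocks and replicates
# each block across DataNodes. Defaults below are typical, not universal.
BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB
REPLICATION = 3

def split_into_blocks(file_size):
    """Return the sizes of the blocks a file of `file_size` bytes occupies."""
    full, rest = divmod(file_size, BLOCK_SIZE)
    return [BLOCK_SIZE] * full + ([rest] if rest else [])

def place_replicas(num_blocks, datanodes):
    """Naive round-robin placement; the real NameNode is rack-aware."""
    return {
        b: [datanodes[(b + r) % len(datanodes)] for r in range(REPLICATION)]
        for b in range(num_blocks)
    }

blocks = split_into_blocks(300 * 1024 * 1024)   # a 300 MB file
print(len(blocks))                              # 3 blocks: 128 + 128 + 44 MB
print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])[0])
```

Note that the last block only occupies its actual size on disk, which is why HDFS favors files much larger than one block.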


HDFS Architecture

Main Components of HDFS

Hadoop Cluster

Metadata

Secondary Name Node

HDFS Block

Areas where Hadoop is not a good fit today

Hadoop can handle small datasets, but they do not unleash its power: there is overhead associated with every data distribution, so a small dataset gains little advantage from running on Hadoop.

If the dataset is small and unstructured, the usual approach is to collate the data into larger files first.
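The overhead behind this small-files problem can be made concrete with back-of-the-envelope arithmetic. Every file, directory, and block is an object in NameNode memory; the figure of roughly 150 bytes per namespace object is a widely quoted rule of thumb, not an exact number, and the function below is my own illustration.

```python
import math

BYTES_PER_OBJECT = 150           # rule-of-thumb NameNode memory per object
BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB default block size

def namenode_bytes(num_files, avg_file_size):
    """Approximate NameNode memory: one object per file plus one per block."""
    blocks_per_file = max(1, math.ceil(avg_file_size / BLOCK_SIZE))
    return num_files * (1 + blocks_per_file) * BYTES_PER_OBJECT

# Ten million 1 KB files vs. roughly the same data in a single 10 GB file.
small = namenode_bytes(10_000_000, 1024)   # ~3 GB of NameNode memory
large = namenode_bytes(1, 10 * 1024**3)    # ~12 KB of NameNode memory
print(small // large)
```

The same data stored as many tiny files consumes hundreds of thousands of times more NameNode memory than a few large files, which is why small datasets (and many-small-file layouts) are a poor fit.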

File System Commands
