manish-chopra
Differences Between Hadoop 2 and Hadoop 3
For each feature below, the Hadoop 2.x behavior is listed first, then the Hadoop 3.x behavior.

License
Hadoop 2.x: Apache License 2.0, open source.
Hadoop 3.x: Apache License 2.0, open source.

Minimum supported Java version
Hadoop 2.x: Java 7 (Hadoop 2.7 and later; earlier 2.x releases also ran on Java 6).
Hadoop 3.x: Java 8.

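Fault tolerance
Hadoop 2.x: handled by replication, which stores full copies of every block and therefore wastes a lot of space.
Hadoop 3.x: can also be handled by erasure coding, which stores parity blocks instead of full copies; replication is still available and remains the default.
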
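To make the idea of erasure coding concrete, here is a minimal Python sketch. HDFS's built-in erasure coding policies use Reed-Solomon codes (for example RS-6-3-1024k); this sketch swaps in the simplest erasure code, a single XOR parity block, which can rebuild any one lost data block in a stripe. It illustrates the principle only and is not the HDFS codec; the block contents are made up.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks: list[bytes]) -> bytes:
    """Compute one parity block for a stripe of equal-length data blocks."""
    return xor_blocks(data_blocks)


def recover(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing data block from the surviving blocks and the parity."""
    return xor_blocks(surviving + [parity])


# A toy stripe of three data blocks (contents are illustrative).
stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = encode(stripe)

# Pretend the block b"BBBB" was lost together with its DataNode.
rebuilt = recover([stripe[0], stripe[2]], parity)
print(rebuilt)  # b'BBBB'
```

A single XOR parity block tolerates only one lost block per stripe. Reed-Solomon with 6 data and 3 parity blocks tolerates the loss of any 3 of the 9 blocks in a stripe, which is why HDFS uses it instead.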
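Data balancing
Hadoop 2.x: the HDFS balancer spreads blocks evenly across DataNodes.
Hadoop 3.x: in addition to the HDFS balancer, there is an intra-DataNode disk balancer, invoked via the hdfs diskbalancer CLI, which spreads data evenly across the disks within a single DataNode.
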
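The disk balancer starts by measuring how unevenly data is spread across one DataNode's disks. The Python sketch below is modeled on the "volume data density" metric described in the HDFS Disk Balancer documentation, assumed here to be the node's overall used ratio minus each disk's own used ratio. The Volume class, disk names and sizes are hypothetical, and the sketch does not generate or execute a move plan.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    """One disk on a DataNode (hypothetical model; sizes in GB)."""
    name: str
    capacity: float
    used: float


def volume_densities(volumes: list[Volume]) -> dict[str, float]:
    """Ideal used ratio for the whole node minus each disk's own used ratio.

    Positive values mean the disk is under-utilized, negative values over-utilized.
    """
    ideal = sum(v.used for v in volumes) / sum(v.capacity for v in volumes)
    return {v.name: ideal - v.used / v.capacity for v in volumes}


def node_density(volumes: list[Volume]) -> float:
    """Sum of absolute volume densities; 0 means the disks are perfectly balanced."""
    return sum(abs(d) for d in volume_densities(volumes).values())


# A DataNode where one disk was just replaced and is still empty.
disks = [
    Volume("disk1", 1000, 800),
    Volume("disk2", 1000, 800),
    Volume("disk3", 1000, 0),
]
for name, density in volume_densities(disks).items():
    print(f"{name}: {density:+.3f}")
print(f"node data density: {node_density(disks):.3f}")
```

The empty, freshly replaced disk shows up with a large positive density. This is the case the intra-DataNode balancer targets; the HDFS balancer, which moves blocks between DataNodes, does not even out disks inside one node.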
Storage scheme
Hadoop 2.x: uses a 3x replication scheme (every block is stored three times).
Hadoop 3.x: adds support for erasure coding in HDFS.

Storage overhead
Hadoop 2.x: 3x replication has a 200% storage overhead.
Hadoop 3.x: with erasure coding using 6 data and 3 parity blocks (the RS-6-3 policy), the overhead is only 50%.

Storage overhead example
Hadoop 2.x: 6 blocks of data occupy the space of 18 blocks because of the replication scheme.
Hadoop 3.x: 6 blocks of data occupy the space of 9 blocks: 6 data blocks and 3 parity blocks.

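The arithmetic in the example can be checked with a short Python sketch. It assumes 3x replication and a Reed-Solomon layout with 6 data and 3 parity blocks per stripe (the RS-6-3 scheme implied by the 6 + 3 example), and it counts whole blocks and whole stripes only; real HDFS striping works on smaller cells, which this ignores.

```python
from fractions import Fraction


def replication_footprint(data_blocks: int, replicas: int = 3) -> int:
    """Blocks stored on disk when every block is copied `replicas` times."""
    return data_blocks * replicas


def erasure_footprint(data_blocks: int, data_units: int = 6, parity_units: int = 3) -> int:
    """Blocks stored when every full group of `data_units` blocks gets `parity_units` parity blocks."""
    if data_blocks % data_units:
        raise ValueError("this sketch only handles whole stripes")
    return data_blocks + (data_blocks // data_units) * parity_units


def overhead(stored: int, data_blocks: int) -> Fraction:
    """Extra space relative to the raw data: 2 means 200%, 1/2 means 50%."""
    return Fraction(stored - data_blocks, data_blocks)


print(replication_footprint(6), overhead(replication_footprint(6), 6))  # 18 2
print(erasure_footprint(6), overhead(erasure_footprint(6), 6))  # 9 1/2
```

Overhead is (stored - data) / data, so 3x replication gives 12 / 6 = 200% and RS-6-3 gives 3 / 6 = 50%.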
YARN Timeline Service
Hadoop 2.x: uses the older Timeline Service (v1.x), which has scalability issues.
Hadoop 3.x: introduces Timeline Service v2, which improves the scalability and reliability of the timeline service.

Default port range
Hadoop 2.x: several default ports fall inside the Linux ephemeral port range, so at startup a service can fail to bind if another process has already been given that port.
Hadoop 3.x: these ports have been moved out of the ephemeral range.

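The port conflict can be checked with a small Python sketch. It reads the local range from /proc/sys/net/ipv4/ip_local_port_range and falls back to 32768 to 60999, the default on recent Linux kernels, when that file is not available. The old and new numbers are well-known HDFS web UI, IPC and data transfer defaults from Hadoop 2.x and 3.x; the table is illustrative, not a complete list of the ports that moved.

```python
from pathlib import Path

# Default net.ipv4.ip_local_port_range on recent Linux kernels.
DEFAULT_EPHEMERAL = (32768, 60999)

# Selected HDFS default ports: (Hadoop 2.x, Hadoop 3.x).
PORT_CHANGES = {
    "NameNode web UI": (50070, 9870),
    "Secondary NameNode web UI": (50090, 9868),
    "DataNode data transfer": (50010, 9866),
    "DataNode web UI": (50075, 9864),
    "DataNode IPC": (50020, 9867),
}


def ephemeral_range(path: str = "/proc/sys/net/ipv4/ip_local_port_range") -> tuple[int, int]:
    """Read the local ephemeral port range, falling back to the Linux default."""
    try:
        low, high = Path(path).read_text().split()
        return int(low), int(high)
    except (OSError, ValueError):
        return DEFAULT_EPHEMERAL


def in_range(port: int, rng: tuple[int, int]) -> bool:
    """True if `port` lies inside the inclusive range `rng`."""
    return rng[0] <= port <= rng[1]


rng = ephemeral_range()
for service, (old, new) in PORT_CHANGES.items():
    print(f"{service}: {old} ephemeral={in_range(old, rng)} -> {new} ephemeral={in_range(new, rng)}")
```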
Tools
Hadoop 2.x: uses Hive, Pig, Giraph and other Hadoop tools.
Hadoop 3.x: Hive, Pig, Tez, Hama, Giraph and other Hadoop tools are available.

Compatible file systems
Hadoop 2.x: HDFS (the default file system); the FTP file system, which stores all its data on remotely accessible FTP servers; the Amazon S3 (Simple Storage Service) file system; and the Windows Azure Storage Blobs (WASB) file system.
Hadoop 3.x: supports all of the above, plus the Microsoft Azure Data Lake file system.

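Hadoop chooses the file system implementation from the scheme of a path's URI, so the same client code can work against HDFS, FTP, S3 or Azure storage. The Python sketch below imitates that lookup with a plain dictionary. The scheme names and Java class names are those of the Hadoop connectors (s3a for S3, wasb for Azure Blob storage, adl for Azure Data Lake), but the lookup itself is a simplification of what Hadoop's FileSystem class does through its configuration, and the bucket name in the usage line is hypothetical.

```python
from urllib.parse import urlparse

# URI scheme -> Hadoop FileSystem class that serves it (toy lookup table).
FILESYSTEMS = {
    "hdfs": "org.apache.hadoop.hdfs.DistributedFileSystem",
    "ftp": "org.apache.hadoop.fs.ftp.FTPFileSystem",
    "s3a": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    "wasb": "org.apache.hadoop.fs.azure.NativeAzureFileSystem",
    "adl": "org.apache.hadoop.fs.adl.AdlFileSystem",
}


def filesystem_for(uri: str, default_scheme: str = "hdfs") -> str:
    """Pick the implementation from the URI scheme; scheme-less paths use the default FS."""
    scheme = urlparse(uri).scheme or default_scheme
    try:
        return FILESYSTEMS[scheme]
    except KeyError:
        raise ValueError(f"no FileSystem registered for scheme {scheme!r}") from None


print(filesystem_for("s3a://example-bucket/logs"))  # org.apache.hadoop.fs.s3a.S3AFileSystem
```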
DataNode resources
Hadoop 2.x: DataNode resources are not dedicated to MapReduce; they can be used by other applications.
Hadoop 3.x: DataNode resources can likewise be used by other applications.

MapReduce API compatibility
Hadoop 2.x: the MapReduce API is compatible with Hadoop 1.x, so Hadoop 1.x programs can run on Hadoop 2.x.
Hadoop 3.x: the MapReduce API likewise remains compatible, so Hadoop 1.x programs can run on Hadoop 3.x.

Support for Microsoft Windows
Hadoop 2.x: can be deployed on Windows.
Hadoop 3.x: also supports Windows.

Slots vs. containers
Hadoop 2.x: Hadoop 1 worked on the concept of slots, but Hadoop 2.x works on the concept of containers, and a container can run any kind of task, not only a map or reduce task.
Hadoop 3.x: also works on the concept of containers.

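The practical difference between slots and containers shows up when a workload does not match the fixed split of slot types. The Python sketch below is a toy model, not YARN's scheduler. The slot side assumes a node pre-configured with 4 map slots and 2 reduce slots. The container side assumes 6 GB of memory for tasks, with each task asking for 1 GB and placed first-fit. Real YARN also accounts for vcores and its minimum and maximum allocation settings.

```python
def slot_schedule(map_tasks: int, reduce_tasks: int, map_slots: int, reduce_slots: int) -> int:
    """Tasks that can run at once when slots are pre-split by task type (Hadoop 1 style)."""
    return min(map_tasks, map_slots) + min(reduce_tasks, reduce_slots)


def container_schedule(task_mem_mb: list[int], node_mem_mb: int) -> int:
    """Tasks that can run at once when each task gets a container sized to its request (first-fit)."""
    running, free = 0, node_mem_mb
    for mem in task_mem_mb:
        if mem <= free:
            free -= mem
            running += 1
    return running


# A map-only workload of 6 tasks on one node.
print(slot_schedule(map_tasks=6, reduce_tasks=0, map_slots=4, reduce_slots=2))  # 4
print(container_schedule([1024] * 6, node_mem_mb=6144))  # 6
```

With slots, a map-only job leaves the 2 reduce slots idle; with containers, the same memory goes to whichever tasks are waiting.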
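Single point of failure
Hadoop 2.x: NameNode high availability (an active and a standby NameNode) overcomes the single point of failure; with automatic failover configured, the standby takes over when the active NameNode fails.
Hadoop 3.x: has the same features, so a NameNode failure is recovered automatically with no manual intervention, and it additionally supports more than one standby NameNode.
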
HDFS federation
Hadoop 2.x: Hadoop 1.0 had a single NameNode managing the whole namespace, whereas Hadoop 2.x supports multiple NameNodes for multiple namespaces.
Hadoop 3.x: also supports multiple NameNodes for multiple namespaces.

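With federation, a client has to know which NameNode owns a given path. One approach is a mount table that maps path prefixes to namespaces; HDFS offers this through ViewFs and, in later releases, Router-based Federation. The Python sketch below shows a generic longest-prefix lookup over such a table. It is a simplified illustration rather than the ViewFs or router implementation, and the mount points and NameNode host names are hypothetical.

```python
# Hypothetical mount table: path prefix -> NameNode (namespace) that owns it.
MOUNT_TABLE = {
    "/user": "hdfs://nn1.example.com:8020",
    "/data": "hdfs://nn2.example.com:8020",
    "/data/archive": "hdfs://nn3.example.com:8020",
}


def resolve(path: str, mounts: dict[str, str] = MOUNT_TABLE) -> str:
    """Return the full URI for `path` using the longest matching mount point."""
    for prefix in sorted(mounts, key=len, reverse=True):
        # Match whole path components only, so "/database" does not match "/data".
        if path == prefix or path.startswith(prefix + "/"):
            return mounts[prefix] + path
    raise ValueError(f"no namespace mounted for {path}")


print(resolve("/data/archive/2019/part-0"))  # hdfs://nn3.example.com:8020/data/archive/2019/part-0
print(resolve("/data/raw"))  # hdfs://nn2.example.com:8020/data/raw
```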
Scalability
Hadoop 2.x: scales up to about 10,000 nodes per cluster.
Hadoop 3.x: better scalability; a cluster can grow beyond 10,000 nodes.

Faster data access
Hadoop 2.x: DataNode caching (keeping frequently used blocks in DataNode memory) speeds up access to data.
Hadoop 3.x: DataNode caching likewise speeds up access to data.

HDFS snapshots
Hadoop 2.x: adds support for snapshots, which provide disaster recovery and protection against user errors.
Hadoop 3.x: also supports snapshots.

Platform
Hadoop 2.x: can serve as a platform for a wide variety of data analytics; event processing, streaming and real-time operations are possible.
Hadoop 3.x: event processing, streaming and real-time operations can likewise run on top of YARN.

Cluster resource management
Hadoop 2.x: uses YARN, which improves scalability, high availability and multitenancy.
Hadoop 3.x: also uses YARN, with all of these features.