HADOOP COMPONENTS & ARCHITECTURE: BIG DATA & HADOOP TRAINING

Big Data and Hadoop Components


Understand how the Hadoop ecosystem works to master Apache Hadoop skills and gain in-depth knowledge of the big data ecosystem and Hadoop architecture. Before you enroll in any Big Data Hadoop training course, however, it is necessary to get a basic idea of how the Hadoop ecosystem works. This presentation covers the various Hadoop components that constitute the Apache Hadoop architecture.


DEFINING ARCHITECTURE COMPONENTS OF THE BIG DATA ECOSYSTEM


CORE HADOOP COMPONENTS

• 1) Hadoop Common
• 2) Hadoop Distributed File System (HDFS)
• 3) MapReduce - Distributed Data Processing Framework of Apache Hadoop (see the sketch below)
• 4) YARN
• Read More in Detail about Hadoop Components - https://www.dezyre.com/article/hadoop-components-and-architecture-big-data-and-hadoop-training/114
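To see how these pieces fit together, here is a minimal sketch of the classic WordCount job using Hadoop's Java MapReduce API: HDFS holds the input and output directories (passed on the command line), MapReduce expresses the computation, and YARN schedules the tasks across the cluster. This is the standard textbook example, not code taken from the deck.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
  // Mapper: emits (word, 1) for every token in its input split.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts emitted for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input dir
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output dir
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```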


APACHE PIG

• Apache Pig is a convenient tool developed by Yahoo for analyzing huge data sets efficiently and easily. It provides a high-level data-flow language, Pig Latin, that is optimized, extensible and easy to use.
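As a hedged illustration (not from the deck), the same WordCount fits in a few lines of Pig Latin. The sketch below drives it from Java through Pig's PigServer API; the file names input.txt and wordcount-out are hypothetical.

```java
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigWordCount {
  public static void main(String[] args) throws Exception {
    PigServer pig = new PigServer(ExecType.LOCAL); // or MAPREDUCE on a cluster

    // Pig Latin data flow: load lines, split into words, group and count.
    pig.registerQuery("lines = LOAD 'input.txt' AS (line:chararray);");
    pig.registerQuery("words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;");
    pig.registerQuery("grouped = GROUP words BY word;");
    pig.registerQuery("counts = FOREACH grouped GENERATE group, COUNT(words);");

    pig.store("counts", "wordcount-out"); // materialize the result
  }
}
```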


APACHE HIVE

• Hive, developed by Facebook, is a data warehouse built on top of Hadoop. It provides a simple SQL-like language known as HiveQL for querying, data summarization and analysis, and makes querying faster through indexing.
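For illustration, here is a minimal sketch of running HiveQL from Java over JDBC against HiveServer2; the connection URL, credentials and the page_views table are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQuery {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hive.jdbc.HiveDriver"); // register the driver

    // HiveServer2 conventionally listens on port 10000.
    Connection conn = DriverManager.getConnection(
        "jdbc:hive2://localhost:10000/default", "hive", "");
    Statement stmt = conn.createStatement();

    // HiveQL looks like SQL: summarize page views per day (hypothetical table).
    ResultSet rs = stmt.executeQuery(
        "SELECT view_date, COUNT(*) FROM page_views GROUP BY view_date");
    while (rs.next()) {
      System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
    }
    conn.close();
  }
}
```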


DATA INTEGRATION COMPONENTS OF HADOOP ECOSYSTEM - SQOOP AND FLUME


APACHE SQOOP

• The Sqoop component is used for importing data from external sources into related Hadoop components like HDFS, HBase or Hive. It can also be used for exporting data from Hadoop to other external structured data stores.
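As a sketch of what an import looks like, the following assumes Sqoop 1's programmatic entry point (org.apache.sqoop.Sqoop.runTool); the JDBC URL, credentials and table are hypothetical, and in practice the same arguments are usually passed to the sqoop command line instead.

```java
import org.apache.sqoop.Sqoop;

public class SqoopImportExample {
  public static void main(String[] args) {
    // Equivalent to: sqoop import --connect ... --table orders --target-dir ...
    String[] importArgs = {
        "import",
        "--connect", "jdbc:mysql://db.example.com/shop", // hypothetical source RDBMS
        "--username", "etl",
        "--password", "secret",
        "--table", "orders",                 // table to import
        "--target-dir", "/user/etl/orders",  // HDFS destination
        "--num-mappers", "4"                 // parallel map tasks
    };
    System.exit(Sqoop.runTool(importArgs));
  }
}
```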


FLUME

• The Flume component is used to gather and aggregate large amounts of data. Apache Flume collects data from its origin and delivers it to its resting location (typically HDFS).
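A minimal sketch of feeding events into a Flume pipeline, assuming the EmbeddedAgent API from the Flume developer guide; in production an agent is normally configured through a properties file, and the collector host here is hypothetical (the embedded agent forwards over an Avro sink to a collector agent whose own sink would write to HDFS).

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

import org.apache.flume.agent.embedded.EmbeddedAgent;
import org.apache.flume.event.EventBuilder;

public class FlumeSend {
  public static void main(String[] args) throws Exception {
    // In-process channel and an Avro sink pointing at a collector agent.
    Map<String, String> props = new HashMap<>();
    props.put("channel.type", "memory");
    props.put("channel.capacity", "200");
    props.put("sinks", "sink1");
    props.put("sink1.type", "avro");
    props.put("sink1.hostname", "collector.example.com"); // hypothetical collector
    props.put("sink1.port", "4545");
    props.put("processor.type", "default");

    EmbeddedAgent agent = new EmbeddedAgent("demo-agent");
    agent.configure(props);
    agent.start();
    agent.put(EventBuilder.withBody("one log line", StandardCharsets.UTF_8));
    agent.stop();
  }
}
```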


DATA STORAGE COMPONENT OF HADOOP ECOSYSTEM - HBASE


HBASE

• HBase is a column-oriented database that uses HDFS for underlying storage of data. HBase supports both random reads and batch computations using MapReduce. With the HBase NoSQL database, an enterprise can create large tables with millions of rows and columns on commodity hardware.
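The sketch below shows one random write and one random read with the HBase Java client API; the users table and info column family are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("users"))) {

      // Write one cell: row 'user42', column info:email.
      Put put = new Put(Bytes.toBytes("user42"));
      put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("email"),
                    Bytes.toBytes("user42@example.com"));
      table.put(put);

      // Random read of the same row.
      Result result = table.get(new Get(Bytes.toBytes("user42")));
      System.out.println(Bytes.toString(
          result.getValue(Bytes.toBytes("info"), Bytes.toBytes("email"))));
    }
  }
}
```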


MONITORING, MANAGEMENT AND ORCHESTRATION COMPONENTS OF HADOOP ECOSYSTEM - OOZIE AND ZOOKEEPER


OOZIE

• Oozie is a workflow scheduler in which workflows are expressed as Directed Acyclic Graphs (DAGs). Oozie runs in a Java servlet container (Tomcat) and makes use of a database to store all the running workflow instances, their states and variables, along with the workflow definitions, to manage Hadoop jobs (MapReduce, Sqoop, Pig and Hive).
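A minimal sketch of submitting a workflow through the Oozie Java client; the server URL and HDFS application path are hypothetical, and the workflow definition itself (the DAG of actions) would live in a workflow.xml at that path.

```java
import java.util.Properties;

import org.apache.oozie.client.OozieClient;

public class OozieSubmit {
  public static void main(String[] args) throws Exception {
    // Oozie's servlet container conventionally listens on port 11000.
    OozieClient oozie = new OozieClient("http://localhost:11000/oozie");

    Properties conf = oozie.createConfiguration();
    conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode/user/etl/my-wf");
    conf.setProperty("queueName", "default");

    String jobId = oozie.run(conf); // submit and start the workflow
    System.out.println("Workflow job " + jobId + ": "
        + oozie.getJobInfo(jobId).getStatus());
  }
}
```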


ZOOKEEPER

• ZooKeeper is the king of coordination and provides simple, fast, reliable and ordered operational services for a Hadoop cluster. ZooKeeper is responsible for the synchronization service, the distributed configuration service and for providing a naming registry for distributed systems.
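To make "coordination" concrete, the sketch below uses the ZooKeeper Java client to publish a piece of shared configuration as a znode and read it back; the ensemble address and znode path are hypothetical.

```java
import java.nio.charset.StandardCharsets;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZkExample {
  public static void main(String[] args) throws Exception {
    // Connect to the ensemble; the watcher ignores session events here.
    ZooKeeper zk = new ZooKeeper("localhost:2181", 3000, event -> {});

    // Publish shared configuration as a persistent znode at the root.
    String path = "/demo-config";
    zk.create(path, "v1".getBytes(StandardCharsets.UTF_8),
              ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

    // Any process in the cluster can now read (and watch) the same znode.
    byte[] data = zk.getData(path, false, null);
    System.out.println(new String(data, StandardCharsets.UTF_8));
    zk.close();
  }
}
```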

• To know about other Hadoop components, see https://www.dezyre.com/article/hadoop-components-and-architecture-big-data-and-hadoop-training/114