Hadoop Developer Course Content


IND PH: +91-9000380723 USA PH: +1-(999)-666-5174

Email: hodoopbykoti@gmail.com

HADOOP DEVELOPER CONTENT

Introduction to Hadoop

What is Distributed File System?

Problems with Traditional Large-Scale Systems

Introduction to Hadoop

Brief history of Hadoop

RDBMS/SQL vs. Hadoop

DWH vs. Hadoop

Scaling with Hadoop

Introduction to the Hadoop Ecosystem

Business Use Cases in the Health Care/Banking Industry

Assignment -1

HADOOP Cluster Setup

Hadoop Installation & Configuration

Setting up Standalone system

Setting up pseudo distributed cluster

Installing Hadoop in Pseudo-Distributed Mode; Understanding Important Configuration Files, Their Properties and Daemon Threads

Hadoop Daemon Addresses and Ports, Other Hadoop Properties

SSH Configuration

Basic Unix/Linux Commands Hands-On

Assignment -2

HDFS Deep Dive

Significance of HDFS in Hadoop

Features of HDFS

HDFS Architecture

Daemons of Hadoop

Name Node and its functionality

Data Node and its functionality

Secondary Name Node and its functionality

Job Tracker and its functionality

Task Tracker and its functionality

Data Flow (Anatomy of a File Read, Anatomy of a File Write, Coherency Model)


Heartbeats, Data Node commissioning/decommissioning

Rack Awareness, Block Scanner, Balancer, Trash, Health Check

Exploring the HDFS Web UI

Parallel Copying with DISTCP

Hadoop Archives

Hadoop Commands: Hands-On in a Live Environment

Assignment -3

Map Reduce

The Map Reduce Flow

Hadoop Data Types

Functional - Concept of Mappers, Functional - Concept of Reducers

Basic Map Reduce API Concepts

Writing Map Reduce Drivers, Mappers and Reducers in Java

The Execution Framework

Combiner

Partitioner

Shuffle and Sort

Speculative Execution

Speeding Up Hadoop Development by Using Eclipse

Hands-On Exercise: Writing a Map Reduce Program

Differences Between the Old and New Map Reduce APIs

Exploring the Map Reduce Web UI

Creating Input and Output Formats in Map Reduce Jobs

Text Input Format

Key Value Input Format

Sequence File Input Format

How to Debug Map Reduce Jobs in Local and Pseudo-Cluster Mode

Output Formats (Text Output, Binary Output, Multiple Outputs)

Joining Data sets in Map Reduce

Delving Deeper Into The Hadoop API

More Advanced Map Reduce Programming

Graph Manipulation in Hadoop

Algorithms – Traversing Graph etc.

Business Use Case: Facial Recognition against CCTV Video Files using Map Reduce

Unit Testing Map Reduce Jobs

Assignment -4


Pigs Eat Anything

What Is Pig?

Pig Use Cases

How Pig Works

Installing and Configuring Pig

Pig Latin and the Grunt shell

Modes Of Execution in Pig

Local Mode

Map Reduce OR Distributed Mode

Loading data

Data types and schemas

Pig Latin details: structure, functions, expressions, relational operators

Intro to User Defined Functions and Scripts

How to Write a Pig Script

Advanced Pig Latin, Evaluation and Filter Functions, Pig and the Ecosystem

Real time use cases – Health Care Industry

Hands on Exercise: Using Pig for ETL Processing

Assignment -5

Hive for Structured Data

Hive Introduction

Hive Architecture

Hive Meta Store

Comparison with Traditional Databases (Schema on Read versus Schema on Write, Updates, Transactions and Indexes)

Hive Schema and Data Storage

Hive Setup and Configuration

Hive vs Pig

HiveQL and Hive Shell

Creating Hive Tables

Loading Data into Hive

Retrieving Data with the SELECT Command

Joining Tables

Storing Query Results in HDFS

Partitioning Data

Bucketing Data

Hive Variables


The Hive CLI

Hive and Thrift

Hive Transform

Hands on Exercises – Playing with huge data and Querying extensively

Debugging and Troubleshooting Hive User Defined Functions

Appending Data into existing Hive Table

Custom Map/Reduce in Hive

Overview of Text Processing

Important String Functions

Using Regular Expressions in Hive

Sentiment Analysis and N-Grams

Hands on Exercise

Assignment -6

Real-time I/O with HBase

HBase Introduction

HBase Architecture

HBase versions and origins

HBase vs. RDBMS

HBase Master and Region Servers

Data Modeling

Column Families and Regions

Bloom Filters and Block Indexes

Write Pipeline/ Read Pipeline

Catalog Tables

Compactions

The HBase Shell

Running the Shell

Creating the Tables

Accessing Data in Tables

Administration

Scripting

HBase Administration

Monitoring

Backup

Tools

Compression


Managed Operations

Capacity Planning

Map Reduce Integration

Assignment -7

Sqoop

Introduction to ETL Concepts

Introduction to Sqoop

Setup and Configuration of Sqoop

MySQL client and Server Installation

How to Connect to a Relational Database using Sqoop

Sqoop Import

Connecting to a Database Server

Selecting the Data to Import

Free-form Query Imports

Controlling Parallelism

Controlling the Import Process

Controlling type mapping

Incremental Imports

File Formats

Importing Data into Hive

Importing Data into HBase

Hands on Exercise

Working with Imported Data

Importing Large Objects

Sqoop Export

Introduction

Inserts vs Updates

Exports and Transactions

Hands on Exercise

Assignment -8

Flume

What is Flume?

Setup and Configuration of Flume

Flume Architecture

How it works?


Reliability

Scalability

Manageability

Extensibility

Assignment -9

ZooKeeper

The ZooKeeper Service (Data Model, Operations, Implementation, Consistency, Sessions, States)

Building Applications with ZooKeeper (ZooKeeper in Production)

Assignment -10

REAL TIME PROJECT

Health Care Dataset: It contains the details of a health care system over a period of time, from which you can find Member Policy Logins, Provider Services, Treatment Methadone Abstract, Early Dropout Abstract, Payment Processing to Providers and Agents, etc.

Additional Features

Cloudera HADOOP Developer/Admin Certification Guidance

HADOOP Installation process and Configuration

Comprehensive Materials Covering the Hadoop Ecosystem, UNIX and Java

Separate JAVA and Unix Training for Beginners

24x7 Support