vishal-gupta
IND PH: +91-9000380723 USA PH: +1-(999)-666-5174
Email: hodoopbykoti@gmail.com
HADOOP DEVELOPER CONTENT
Introduction to Hadoop
What is Distributed File System?
Problems with Traditional Large-Scale Systems
Introduction to Hadoop
Brief history of Hadoop
RDBMS/SQL vs. Hadoop
DWH vs. Hadoop
Scaling with Hadoop
Introduction to the Hadoop Ecosystem
Business Use Cases in the Health Care/Banking Industry
Assignment -1
HADOOP Cluster Setup
Hadoop Installation & Configuration
Setting up Standalone system
Setting up pseudo distributed cluster
Installing Hadoop in Pseudo-Distributed Mode, Understanding Important
Configuration Files, their Properties and Daemon Threads
Hadoop Daemon Addresses and Ports, Other Hadoop Properties
SSH Configuration
Basic Unix/Linux Commands Hands-On
Assignment -2
HDFS Deep Dive
Significance of HDFS in Hadoop
Features of HDFS
HDFS Architecture
Daemons of Hadoop
Name Node and its functionality
Data Node and its functionality
Secondary Name Node and its functionality
Job Tracker and its functionality
Task Tracker and its functionality
Data Flow (Anatomy of a File Read, Anatomy of a File Write, Coherency Model)
Heartbeats, Data Node commissioning/decommissioning
Rack Awareness, Block Scanner, Balancer, Trash, Health Check
Exploring the HDFS Web UI
Parallel Copying with DISTCP
Hadoop Archives
Hadoop Commands: Hands-On in a Live Environment
Assignment -3
Map Reduce
The Map Reduce Flow
Hadoop Data Types
Functional - Concept of Mappers, Functional - Concept of Reducers
Basic Map Reduce API Concepts
Writing Map Reduce Drivers, Mappers and Reducers in Java
The Execution Framework
Combiner
Partitioner
Shuffle and Sort
Speculative Execution
Speeding Up Hadoop Development by Using Eclipse
Hands-On Exercise: Writing a Map Reduce Program
Differences Between the Old and New Map Reduce APIs
Exploring the Map Reduce Web UI
Creating Input and Output Formats in Map Reduce Jobs
Text Input Format
Key Value Input Format
Sequence File Input Format
How to Debug Map Reduce Jobs in Local and Pseudo-Distributed Cluster Mode
Output Formats (Text Output, Binary Output, Multiple Outputs)
Joining Data sets in Map Reduce
Delving Deeper Into The Hadoop API
More Advanced Map Reduce Programming
Graph Manipulation in Hadoop
Algorithms – Traversing Graph etc.
Business Use Case: Facial Recognition against CCTV video files using Map Reduce
Unit Testing Map Reduce Jobs
Assignment -4
Pigs Eat Anything
What Is Pig?
Pig Use Cases
How Pig Works
Installing and Configuring Pig
Pig Latin and the Grunt shell
Modes Of Execution in Pig
Local Mode
Map Reduce or Distributed Mode
Loading data
Data types and schemas
Pig Latin details: structure, functions, expressions, relational operators
Intro to User Defined Functions and Scripts
How to write a Pig script
Advanced Pig Latin, Evaluation and Filter functions, Pig and Ecosystem
Real time use cases – Health Care Industry
Hands on Exercise: Using Pig for ETL Processing
Assignment -5
Hive for Structured Data
Hive Introduction
Hive Architecture
Hive Meta Store
Comparison with Traditional Database (Schema on Read Versus Schema on Write,
Updates, Transactions and Indexes)
Hive Schema and Data Storage
Hive Setup and Configuration
Hive vs Pig
HiveQL and Hive Shell
Creating Hive Tables
Loading Data into Hive
Retrieving Data with the SELECT Command
Joining Tables
Storing Query Results in HDFS
Partitioning Data
Bucketing Data
Hive Variables
The Hive CLI
Hive and Thrift
Hive Transform
Hands on Exercises – Playing with huge data and Querying extensively
Debugging and Troubleshooting Hive User Defined Functions
Appending Data into existing Hive Table
Custom Map/Reduce in Hive
Overview of Text Processing
Important String Functions
Using Regular Expressions in Hive
Sentiment Analysis and N-Grams
Hands on Exercise
Assignment -6
Real-time I/O with HBase
HBase Introduction
HBase Architecture
HBase versions and origins
HBase vs. RDBMS
HBase Master and Region Servers
Data Modeling
Column Families and Regions
Bloom Filters and Block Indexes
Write Pipeline/ Read Pipeline
Catalog Tables
Compactions
The HBase Shell
Running the Shell
Creating the Tables
Accessing Data in Tables
Administration
Scripting
HBase Administration
Monitoring
Backup
Tools
Compression
Managed Operations
Capacity Planning
Map Reduce Integration
Assignment -7
Sqoop
Introduction to ETL Concepts
Introduction to Sqoop
Setup and Configuration of Sqoop
MySQL client and Server Installation
How to connect to Relational Database using Sqoop
Sqoop Import
Connecting to a Database Server
Selecting the Data to Import
Free-form Query Imports
Controlling Parallelism
Controlling the Import Process
Controlling type mapping
Incremental Imports
File Formats
Importing Data into Hive
Importing Data into HBase
Hands on Exercise
Working with Imported Data
Importing Large Objects
Sqoop Export
Introduction
Inserts vs Updates
Exports and Transactions
Hands on Exercise
Assignment -8
Flume
What is Flume?
Setup and Configuration of Flume
Flume Architecture
How it works?
Reliability
Scalability
Manageability
Extensibility
Assignment -9
Zookeeper
The Zookeeper Service (Data Model, Operations, Implementation, Consistency,
Sessions, States)
Building Applications with Zookeeper (Zookeeper in Production)
Assignment -10
REAL TIME PROJECT
Health Care Dataset: Contains the details of a Health Care System over a period of time,
from which you can find out Member policy logins, Provide Services, Treatment
Methadone Abstract, Early Dropout Abstract, Payment Processing to Providers and agents,
etc.
Additional Features
Cloudera HADOOP Developer/Admin Certification Guidance
HADOOP Installation process and Configuration
Comprehensive Materials Covering the Hadoop Ecosystem, UNIX and JAVA
Separate JAVA and Unix Training for Beginners
24x7 Support