
Dell Next Generation Compute Solutions

Dell | Cloudera Solution for Apache Hadoop Deployment Guide
A Dell Deployment Guide

Dell | Cloudera Hadoop Solution Deployment Guide v1.0


Table of Contents

Overview
    Summary
    Abbreviations
Dell | Cloudera Hadoop Solution
    Solution Overview
Dell | Cloudera Hadoop Solution Hardware Architecture
    High-level Architecture
    High-level Network Architecture
Dell | Cloudera Hadoop Solution Deployment Process Overview
Dell | Cloudera Hadoop Solution Hardware Configuration
    Edge Node Hardware Configuration
    Master Node Hardware Configuration
    Slave Node Hardware Configuration
    Network Switch Configuration
Dell | Cloudera Hadoop Solution Network Configuration
Dell | Cloudera Hadoop Solution Automated Software Installation
    Admin Node Installation
    Slave Node Installation
    Installing components
    General installation process
Dell | Cloudera Hadoop Solution Manual Software Installation
    Solution Deployment Prerequisites
    Configuration Files and Scripts
    Prepare the Deployment Server
    Installing Hadoop on the Primary Master Node
    Configuring Memory Utilization for HDFS and MapReduce
    Configuring the Hadoop environment
    Installing Hadoop on the Secondary Master Node (aka Checkpoint Node)
    Installing Hadoop on the JobTracker Node
    Installing Hadoop on the Slave Node
    Installing Hadoop on the Edge Node
    Configuring the Secondary Master Node Internal Storage
    Configuring the Cluster for the Secondary Master Node
Verify Cluster Functionality
    Operating System Configuration Checklist
    Configuring Rack Awareness
    Starting Your Hadoop Cluster
Dell | Cloudera Hadoop Solution Software Configuration
    Dell | Cloudera Hadoop Solution Configuration Parameters Recommended Values
Dell | Cloudera Hadoop Solution Monitoring and Alerting
Hadoop Ecosystem Components
    Pig
    Hive
    Sqoop
    ZooKeeper
References
To Learn More

Tables

Table 1: Hadoop Use Cases
Table 2: Dell | Cloudera Hardware Configurations
Table 3: Dell | Cloudera Hadoop Solution Software Locations
Table 4: Dell | Cloudera Hadoop Solution Support Matrix
Table 5: Dell | Cloudera Hadoop Solution Network Cabling
Table 6: IP Scheme
Table 7: Accessing Services
Table 8: Local storage directories configuration
Table 9: hdfs-site.xml
Table 10: mapred-site.xml
Table 11: default.xml
Table 12: hadoop-env.sh
Table 13: /etc/fstab
Table 14: core-site.xml
Table 15: /etc/security/limits.conf

Figures

Figure 1: Dell | Cloudera Hadoop Solution Taxonomy
Figure 2: Dell | Cloudera Hadoop Solution Hardware Architecture
Figure 3: Dell | Cloudera Hadoop Solution Network Interconnects
Figure 4: VMware Player Configuration for DVD
Figure 5: VMware Player Configuration for Network Adapter

THIS PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS AND TECHNICAL INACCURACIES. THE CONTENT IS PROVIDED AS IS, WITHOUT EXPRESS OR IMPLIED WARRANTIES OF ANY KIND. © 2011 Dell Inc. All rights reserved. Reproduction of this material in any manner whatsoever without the express written permission of Dell Inc. is strictly forbidden. For more information, contact Dell. Dell, the DELL logo, the DELL badge, PowerConnect, and PowerEdge are trademarks of Dell Inc. Cloudera, CDH, and Cloudera Enterprise are trademarks of Cloudera and its affiliates in the US and other countries. Intel and Xeon are registered trademarks of Intel Corporation in the U.S. and other countries. Other trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products. Dell Inc. disclaims any proprietary interest in trademarks and trade names other than its own. August 2011 Revision A00


Overview

Summary
This document accompanies the reference architecture of the Dell™ | Cloudera™ Hadoop™ Solution and describes the steps to install the Dell | Cloudera Hadoop Solution on the predefined hardware and network configuration specified in the “Dell | Cloudera Hadoop Solution Reference Architecture v1.0” document.

Abbreviations

Abbreviation   Definition
BMC            Baseboard management controller
CDH            Cloudera Distribution for Hadoop
DBMS           Database management system
EDW            Enterprise data warehouse
EoR            End-of-row switch/router
HDFS           Hadoop Distributed File System
IPMI           Intelligent Platform Management Interface
NIC            Network interface card
OS             Operating system
ToR            Top-of-rack switch/router


Dell | Cloudera Hadoop Solution

Solution Overview
Hadoop is an Apache project being built and used by a global community of contributors, using the Java programming language. Yahoo! has been the largest contributor to the project, and uses Hadoop extensively across its businesses. Other contributors and users include Facebook, LinkedIn, eHarmony, and eBay. Cloudera has created a quality-controlled distribution of Hadoop and offers commercial management software, support, and consulting services.

Dell developed a solution for Hadoop that includes optimized hardware, software, and services to streamline deployment and improve the customer experience.

The Dell | Cloudera Hadoop Solution is based on the Cloudera CDH Enterprise distribution of Hadoop. Dell's solution includes:

- Reference architecture and best practices
- Optimized hardware and network infrastructure
- Cloudera CDH Enterprise software (community-provided CDH for customer-deployed solutions)
- Hadoop infrastructure management tools
- Dell Crowbar software

This solution provides Dell a foundation to offer additional solutions as the Hadoop environment evolves and expands.

The solution is designed to address the following use cases:

Table 1: Hadoop Use Cases

Data storage: The user would like to collect and store unstructured and semi-structured data in a fault-resilient, scalable data store that can be organized and sorted for indexing and analysis.

Batch processing of unstructured data: The user would like to batch-process (index, analyze, etc.) large quantities of unstructured and semi-structured data.

Data archive: The user would like medium-term (12–36 months) archival of data from the EDW/DBMS to extend the length of time data is retained or to meet data retention policies/compliance.

Integration with data warehouse: The user would like to transfer data stored in Hadoop into a separate DBMS for advanced analytics. The user may also want to transfer the data from the DBMS back into Hadoop.

Aside from the Hadoop core technology (HDFS, MapReduce, etc.), Dell has designed additional capabilities meant to address specific customer needs:

- Monitoring, reporting, and alerting of the hardware and software components
- Infrastructure configuration automation

The Dell | Cloudera Hadoop Solution lowers the barrier to adoption for organizations looking to use Hadoop in production. Dell's customer-centered approach is to create rapidly deployable and highly optimized end-to-end Hadoop solutions running on commodity hardware. Dell provides all the hardware and software components and resources to meet your requirements, and no other supplier need be involved.

The hardware platform for the Dell | Cloudera Hadoop Solution (Figure 1) is the Dell™ PowerEdge™ C-series. Dell PowerEdge C-series servers are focused on hyperscale and cloud capabilities. Rather than emphasizing gigahertz and gigabytes, these servers deliver maximum density, memory, and serviceability while minimizing total cost of ownership. It's all about getting the processing customers need in the least amount of space and in an energy-efficient package that slashes operational costs.


Dell recommends Red Hat Enterprise Linux 5.6 for use in Cloudera Hadoop deployments. You can choose to install CentOS 5.6 for user-deployed solutions. The operating system of choice for the Dell | Cloudera Hadoop Solution is Linux (i.e. RHEL or CentOS). The recommended Java Virtual Machine (JVM) is the Oracle Sun JVM.

The hardware platforms, the operating system, and the Java Virtual Machine make up the foundation on which the Hadoop software stack runs.

The bottom layer of the Hadoop stack (Figure 1) comprises two frameworks:

1. The Data Storage Framework (HDFS) is the file system that Hadoop uses to store data on the cluster nodes. The Hadoop Distributed File System (HDFS) is a distributed, scalable, and portable file system.
2. The Data Processing Framework (MapReduce) is a massively parallel compute framework inspired by Google's MapReduce papers.

The next layer of the stack in the Dell | Cloudera Hadoop Solution design is the network layer. Dell recommends implementing the Hadoop cluster on a dedicated network for two reasons:

1. Dell provides network design blueprints that have been tested and qualified.
2. Network performance predictability—sharing the network with other applications may have a detrimental impact on the performance of Hadoop jobs.

The next two frameworks—the Data Access Framework and the Data Orchestration Framework—comprise utilities that are part of the Hadoop ecosystem.

Dell listened to its customers and designed a Hadoop solution that is unique in the marketplace. Dell's end-to-end solution approach means that you can be in production with Hadoop in a shorter time than is traditionally possible with homegrown solutions. The Dell | Cloudera Hadoop Solution embodies all the software functions and services needed to run Hadoop in a production environment. One of Dell's chief contributions to Hadoop is a method to rapidly deploy and integrate Hadoop in production. These complementary functions are designed and implemented side-by-side with the core Hadoop technology.

Figure 1: Dell | Cloudera Hadoop Solution Taxonomy

Installing and configuring Hadoop is non-trivial. There are different roles and configurations that need to be deployed on various nodes. Designing, deploying, and optimizing the network layer to match Hadoop's scalability requires consideration of the type of workloads that will be running on the Hadoop cluster. The deployment mechanism that Dell designed for Hadoop automates the deployment of the cluster from “bare-metal” (no operating system installed) all the way to installing and configuring the Hadoop software components to your specific requirements. Intermediary steps include system BIOS update and configuration, RAID/SAS configuration, operating system deployment, Hadoop software deployment, Hadoop software configuration, and integration with your data center applications (i.e. monitoring and alerting).

Data backup and recovery is another topic that was brought up during customer roundtables. As Hadoop becomes the de facto platform for business-critical applications, the data that is stored in Hadoop is crucial for ensuring business continuity. Dell's approach is to offer several enterprise-grade backup solutions and let the customer choose, while providing reference architectures and deployment guides for streamlined, consistent, low-risk implementations. Contact your Dell sales representative for additional information.

Lastly, Dell's open, integrated approach to enterprise-wide systems management enables you to build comprehensive systems management solutions based on open standards and integrated with industry-leading partners. Instead of building a patchwork of solutions leading to systems management sprawl, Dell integrates the management of the Dell hardware running the Hadoop cluster with the “traditional” Hadoop management consoles (Ganglia, Nagios).

To summarize, Dell has added Hadoop to its data analytics solutions portfolio. Dell's end-to-end solution approach means that Dell will provide readily available software interfaces for integration between the solutions in the portfolio.

In the current design, the Dell | Cloudera Hadoop Solution contains the core components of a typical Hadoop deployment (HDFS, MapReduce, etc.) and auxiliary services (monitoring, reporting, security, etc.) that span the entire solution stack.


Dell | Cloudera Hadoop Solution Hardware Architecture

The Dell | Cloudera Hadoop Solution hardware consists of:

- Master Node (aka Name Node)—runs all the services needed to manage the HDFS data storage and MapReduce task distribution and tracking.
- Slave Node—runs all the services required to store blocks of data on the local hard drives and execute processing tasks against that data.
- Edge Node—provides the interface between the data and processing capacity available in the Hadoop cluster and a user of that capacity.
- Admin Node—provides cluster deployment and management capabilities.

Figure 2: Dell | Cloudera Hadoop Solution Hardware Architecture


High-level Architecture
The hardware configurations for the Dell | Cloudera Hadoop Solution are:

Table 2: Dell | Cloudera Hardware Configurations

Machine Function     Master Node (Admin Node)    Slave Node                                       Edge Node
Platform             PowerEdge C2100             PowerEdge C2100                                  PowerEdge C2100
CPU                  2x E5645 (6-core)           2x E5606 (4-core); optional 2x E5645 (6-core)    2x E5645 (6-core)
RAM (Recommended)    96GB                        24GB                                             48GB
Add-in NIC           One dual-port Intel 1GigE   None                                             One dual-port Intel 10GigE
DISK                 6x 600GB SAS NL 3.5"        12x 1TB SATA 7.2K 3.5"                           6x 600GB SAS NL 3.5"
Storage Controller   PERC H700                   LSI2008                                          PERC H700
RAID                 RAID 10                     JBOD                                             RAID 10
Min per Rack         -                           1                                                -
Max per Rack         -                           20                                               -
Min per Pod          2                           3                                                1
Max per Pod          2                           60                                               -
Min per Cluster      2                           36                                               1
Max per Cluster      2                           720                                              -

Table 3: Dell | Cloudera Hadoop Solution Software Locations

Daemon                          Primary Location   Secondary Location
JobTracker                      MasterNode02       MasterNode01
TaskTracker                     SlaveNode(x)       -
Slave Node                      SlaveNode(x)       -
Master Node                     MasterNode01       MasterNode02
Operating System Provisioning   MasterNode02       MasterNode01
Chef                            MasterNode02       MasterNode01
Yum Repositories                MasterNode02       MasterNode01


Table 4: Dell | Cloudera Hadoop Solution Support Matrix

RA Version   OS Version                     Hadoop Version             Available Support
1.0          Red Hat Enterprise Linux 5.6   Cloudera CDH3 Enterprise   Dell Hardware Support; Cloudera Hadoop Support; Red Hat Linux Support
1.0          CentOS 5.6                     Cloudera CDH3 Community    Dell Hardware Support

High-level Network Architecture
The network interconnects between the various hardware components of the solution are depicted in the following diagram.

Figure 3: Dell | Cloudera Hadoop Solution Network Interconnects

The network cabling within the Dell | Cloudera Hadoop Solution is described in the following table.

Table 5: Dell | Cloudera Hadoop Solution Network Cabling

[Table 5 maps each node type's network ports (LOM1, LOM2, PCI-NIC1, PCI-NIC2, and BMC) to switch ports for the Master Node, Data Node, and Edge Node; the Data Node does not use PCI-NIC1 or PCI-NIC2. Legend: Cluster Production LAN, Cluster Management LAN, Cluster Edge LAN.]


Top of Rack Switch Port Connectivity

Node      LOM1 → rNN-sw01 port   BMC → rNN-sw01 port   LOM2 → rNN-sw02 port
rNN-n01   1                      25                    1
rNN-n02   2                      26                    2
rNN-n03   3                      27                    3
rNN-n04   4                      28                    4
rNN-n05   5                      29                    5
rNN-n06   6                      30                    6
rNN-n07   7                      31                    7
rNN-n08   8                      32                    8
rNN-n09   9                      33                    9
rNN-n10   10                     34                    10
rNN-n11   11                     35                    11
rNN-n12   12                     36                    12
rNN-n13   13                     37                    13
rNN-n14   14                     38                    14
rNN-n15   15                     39                    15
rNN-n16   16                     40                    16
rNN-n17   17                     41                    17
rNN-n18   18                     42                    18
rNN-n19   19                     43                    19
rNN-n20   20                     44                    20

End of Row Switch Port Connectivity

POD Number   ToR Switch   10GbE1 → Eor-row01-sw01 port   10GbE2 → Eor-row01-sw02 port
1            r01-s01      1                              1
1            r01-s02      2                              2
1            r02-s01      3                              3
1            r02-s02      4                              4
1            r03-s01      5                              5
1            r03-s02      6                              6
2            r01-s01      7                              7
2            r01-s02      8                              8
2            r02-s01      9                              9
2            r02-s02      10                             10
2            r03-s01      11                             11
2            r03-s02      12                             12
3            r01-s01      13                             13
3            r01-s02      14                             14
3            r02-s01      15                             15
3            r02-s02      16                             16
3            r03-s01      17                             17
3            r03-s02      18                             18


Dell | Cloudera Hadoop Solution Deployment Process Overview

Each node moves through the following deployment states: Boxed Node → (unbox and rack) → Racked Node → (cable into switches and power) → Primed Node → (power node on, network boot) → Discovering Node → Base OS Install → reboot/network boot → Hardware Install → reboot/network boot → Ready for Role → (Crowbar UI assigns new roles) → Applying Role. A node marked for update in the UI is rebooted to a network boot, receives a Hardware Update, and reboots again; once the Chef client completes, the Crowbar UI assigns its new roles.

Dell | Cloudera Hadoop Solution Hardware Configuration

Edge Node Hardware Configuration

Component   Setting          Parameter
BIOS        Boot Order       1) LOM 1 PXE  2) Internal Boot Device (PERC H700 LUN 0)
            PXE Boot LOM 1   Enable
            PXE Boot LOM 2   Disable
PERC H700   BIOS RAID        Enabled
            LUN 0            Disk 0-5, RAID 10
            Boot Order       1) LUN 0


Master Node Hardware Configuration

Component   Setting          Parameter
BIOS        Boot Order       1) LOM 1 PXE  2) Internal Boot Device (PERC H700 LUN 0)
            PXE Boot LOM 1   Enable
            PXE Boot LOM 2   Disable
PERC H700   BIOS RAID        Enabled
            LUN 0            Disk 0-5, RAID 10
            Boot Order       1) LUN 0

Slave Node Hardware Configuration

Component             Setting          Parameter
BIOS                  Boot Order       1) LOM 1 PXE  2) Internal Boot Device
                      PXE Boot LOM 1   Enable
                      PXE Boot LOM 2   Disable
LSI 2008 Controller   BIOS RAID        Disabled
                      Boot Order       1) Disk 0  2) Disk 1

Network Switch Configuration

Setting         Parameter   Ports
Spanning-Tree   Disable     ALL
Port-Fast       Disable     ALL
Flow-Control    Enable      ALL


Dell | Cloudera Hadoop Solution Network Configuration

Table 6: IP Scheme

A     B    C                          D         Use
First POD
172   16   0/22 (Rack Number)         1-42      SlaveNode[XX] bond0, by Rack Unit
172   16   4/22 (Rack Number, 1xx)    200-242   SlaveNode[XX] BMC, by Rack Unit
172   16   3                          1-19      MgmtNode[XX]
172   16   3                          20-30     SlaveNode[XX]
172   16   3                          31-40     JobTrackNode[XX]
172   16   3                          41-50     EdgeNode[XX]
172   16   7                          1-19      MgmtNode[XX]
172   16   7                          20-30     MgmtNode[XX]
172   16   7                          31-40     JobTrackNode[XX]
172   16   7                          41-50     EdgeNode[XX]
Second POD
172   16   8/22 (Rack Number)         1-42      SlaveNode[XX] bond0, by Rack Unit
172   16   12/22 (Rack Number, 1xx)   200-242   SlaveNode[XX] BMC, by Rack Unit
172   16   11                         1-19      MgmtNode[XX]
172   16   11                         20-30     Master Node[XX]
172   16   11                         31-40     JobTrackNode[XX]
172   16   11                         41-50     EdgeNode[XX]
172   16   15                         1-19      MgmtNode[XX]
172   16   15                         31-40     JobTrackNode[XX]
172   16   15                         41-50     EdgeNode[XX]


Dell | Cloudera Hadoop Solution Automated Software Installation

Admin Node Installation
To use Crowbar, you must first install an Admin Node. Installing the Admin Node requires installing the base operating system, optionally customizing the Crowbar configuration, and installing Crowbar itself.

The following is required to bootstrap the Admin Node by PXE booting:

1. The user is expected to make the physical arrangements to connect the VM to the network such that the (soon to be) Admin Node can PXE boot from it. A network crossover cable might be required.
2. A VM image provides an initial TFTP/DHCP/Boot server. VMware Player (a free download from VMware) is required to execute it.

Procedure:

1. Make sure you have VMware Player installed.
2. Open the VMware machine configuration distributed with Crowbar.
3. Edit the machine settings and ensure that (see the figures below):
   o The CD/DVD drive is mounting the Crowbar ISO distribution
   o The Network adapter is configured to use Bridged Networking
4. Obtain the ISO of Crowbar, and configure VMware Player to mount it as a DVD in the VM.
5. Plug the crossover cable into eth0 of the server and your network port on the laptop.
6. Start the VMware Player and configure it to use the network port.
7. Power on the Admin Node, and ensure that:
   o The first boot is a network boot
   o It is set up to boot from the hard disk for subsequent boots

The machine will obtain its image from the VMware Player VM and start the installation process.


Figure 4: VMware Player Configuration for DVD


Figure 5: VMware Player Configuration for Network Adapter


1. Installing Crowbar

The image installed in the previous steps includes all the required Crowbar components. Before actually installing Crowbar, there is the opportunity to customize the installation to fit the deployment environment. The steps below assume the default configuration.

To install Crowbar, log onto the Admin Node (the default username is openstack, password openstack) and run:

cd /tftpboot/redhat_dvd/extra
sudo ./install admin.your.cluster.fqdn

This will install Crowbar.

Note: Because there are many dependencies, some transient errors might be visible on the console. This is expected.

2. Verifying admin node state

At this point all Crowbar services have started. The table below provides the information needed to access the services:

Table 7: Accessing Services

Service      URL                             Credentials
SSH          openstack@192.168.124.10        openstack
Crowbar UI   http://192.168.124.10:3000/     crowbar / crowbar
Nagios       http://192.168.124.10/nagios3   nagiosadmin / password
Ganglia      http://192.168.124.10/ganglia   nagiosadmin / password
Chef UI      http://192.168.124.10:4040/     admin / password

Set the CROWBAR parameters:

export CROWBAR_KEY=$(cat /etc/crowbar.install.key)
export CROWBAR_KEY=crowbar:crowbar

Slave Node Installation
Nodes other than the Admin Node are installed when they are first powered up. A sequence of boot phases is executed (rebooting multiple times), which culminates in a minimal OS image being installed on the local drive. Part of the basic installation includes “hooking” the nodes into the infrastructure services—NTP, DNS, Nagios, and Ganglia.

Once known to Crowbar, the node can be managed; it can be powered on and off, rebooted, and components can be installed on it.

Functional components are installed on nodes by including them in one or more barclamps' proposals. For example, when a node is mentioned in a proposal for swift as a storage node, the relevant packages, services, and configuration are deployed to that node when the proposal is committed.

The next section describes details for installing the different components.


Installing components

The general workflow to install any component is the same:

A. Obtain a default proposal, which includes the parameters for the component and a mapping of nodes to the roles they are assigned.
B. Edit the proposal to match the desired configuration.
C. Upload the proposal to Crowbar.
D. Commit the proposal.

All these activities are achieved by using the Crowbar command line tool or the Web-based UI. The sections that follow use the command line tool: /opt/dell/bin/crowbar. In the sections that follow, this tool is referred to as “Crowbar.”

General installation process

Obtain a proposal

Crowbar can inspect the currently known nodes and provide a proposal that best utilizes the available systems for the component being installed. To obtain and inspect this proposed configuration:

/opt/dell/bin/crowbar <component> proposal create <name>
/opt/dell/bin/crowbar <component> proposal show <name> > <local_file_name>

Where:
<component> – the component for which the proposal is made; e.g. swift, nova, glance.
<name> – the name assigned to this proposal. This name should be unique for the component; i.e. if two swift clusters are being installed, the proposals for each should have unique names.
<local_file_name> – any file name into which the proposal will be written.

Update a proposal

The local file created above can be inspected and modified. The most common changes are:

- Change default passwords and other barclamp parameters (e.g. swift replica count).
- Change assignment of machines to roles.

Once edits are completed, Crowbar must be updated. To update Crowbar with a modified proposal, execute:

/opt/dell/bin/crowbar <component> proposal --file=<local_file_name> edit <name>

where the parameters are exactly as described above. Crowbar will validate the proposal for syntax and basic sanity rules as part of this process.

Committing a proposal

Once the proposal content is satisfactory, the barclamp instance can be activated. To achieve that, execute:

/opt/dell/bin/crowbar <component> proposal commit <name>

This might take a few moments, as Crowbar is deploying the required software to the machines mentioned in the proposal.

Modifying an active configuration

When committing a proposal that was previously committed, Crowbar compares the new configuration to the currently active state and applies the deltas. To force Crowbar to reapply a proposal, the active state needs to be deleted:

/opt/dell/bin/crowbar <component> delete <name>
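Putting the four steps together, the following is a minimal sketch of the full proposal lifecycle. The component name hadoop and the proposal name cluster1 are illustrative placeholders, not values mandated by this guide.

#!/bin/bash
# Sketch of the Crowbar proposal lifecycle for a hypothetical "hadoop" barclamp.
# Assumes the Crowbar CLI is installed at /opt/dell/bin/crowbar and that
# CROWBAR_KEY has been exported as described in the Admin Node section.
CROWBAR=/opt/dell/bin/crowbar
COMPONENT=hadoop      # illustrative component/barclamp name
NAME=cluster1         # illustrative proposal name

# A. Obtain a default proposal and save it locally for inspection
$CROWBAR $COMPONENT proposal create $NAME
$CROWBAR $COMPONENT proposal show $NAME > ${NAME}.json

# B. Edit ${NAME}.json by hand (node-to-role mapping, parameters), then
#    upload the modified proposal back to Crowbar
$CROWBAR $COMPONENT proposal --file=${NAME}.json edit $NAME

# C/D. Commit the proposal; Crowbar deploys the software to the listed nodes
$CROWBAR $COMPONENT proposal commit $NAME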


Dell | Cloudera Hadoop Solution Manual Software Installation

Solution Deployment Prerequisites
1. Access to the global Internet from the Admin Nodes for the Hadoop environment
2. Hardware compliant with the Dell | Cloudera Hadoop Solution, Release 1 Reference Architecture

Configuration Files and Scripts
Dell has created several ancillary configuration files and scripts to make deploying and configuring Hadoop a little easier. They are attached to this deployment guide (hadoop.zip) and are referred to below as “the zip file.”

Prepare the Deployment Server
3. Perform a default install of RHEL 5.6 x86_64 on what will become the primary Admin Node. Ensure that SELinux is disabled.
4. Configure SSH to allow root to log in.
5. Create a mirror of the RHEL install image in /srv/www/install/rhel5.6
6. Create a mirror of the Cloudera CDH repository in /srv/www/mirrors/cloudera/redhat
7. Configure YUM to use the mirror of the RHEL install image for package installs if you do not already have an internal RHEL satellite server that you can use. This YUM repository will be overwritten once Apache has been installed and configured to serve the RHEL 5.6 repository as a local mirror. To use the local mirror directly, create /etc/yum.repos.d/RHEL5.6.repo:

[base]
baseurl=file:///srv/www/install/rhel5.6/Server
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release

8. Configure one of the Ethernet ports to have the following static configuration:

IP address = 172.16.3.1
Netmask = 255.255.0.0

For the rest of this install guide, we will assume that this is eth1, and refer to it as such. If you change this to a different IP, you will need to update all the Dell-provided files to match.
9. Install Apache and configure it to serve /srv/www:

yum install httpd

10. Edit /etc/httpd/conf/httpd.conf to serve /srv/www as the primary DocumentRoot. You can use deploy/etc/httpd/conf/httpd.conf as an example.
11. chkconfig httpd on
12. service httpd start
13. Verify that Apache is serving files:
14. links http://localhost/install/

You should get a listing containing the local RHEL and CentOS mirrors you created.
15. Create YUM repository files for the RHEL 5.6 install and Cloudera mirrors and save them in /etc/yum.repos.d and /opt/dell/hadoop/repos. You can find both of these repository files in /repos/RHEL5.6.repo and /repos/cloudera-cdh.repo in the zip file.
16. Ensure that xinetd is installed and configured to start at boot:

yum install xinetd
chkconfig xinetd on
service xinetd start


17. Configure rsync to serve out the basic files that the rest of the cluster will need:
- Copy /deploy/etc/rsyncd.conf from the zip file to /etc/rsyncd.conf on the admin node.
- Copy /deploy/etc/xinetd.d/rsync from the zip file to /etc/xinetd.d/rsync on the admin node.
18. Create a directory for TFTP to serve from:

mkdir -p /srv/tftpboot

19. Copy over the PXE boot system files we will need:

cp /usr/lib/syslinux/pxelinux.0 /srv/tftpboot
cp /usr/lib/syslinux/menu.c32 /srv/tftpboot

20. Copy /deploy/etc/dnsmasq.conf from the zip file to /etc/dnsmasq.conf on the deploy server, and edit it to match the Hadoop cluster config you are deploying. Dell strongly recommends taking the time to thoroughly read and understand this configuration file—dnsmasq will act as the primary DNS, DHCP, and TFTP server for the cluster, and the cluster will not function properly if it is misconfigured. In particular, you will need to ensure that the MAC addresses referenced in this file are accurate. (An illustrative sketch of such a configuration appears after this list.)
21. Copy /deploy/usr/local/sbin/update_hosts from the zip file to /usr/local/sbin/update_hosts on the deploy server. You will need to run this file whenever you change the dnsmasq configuration to ensure that new nodes are functioning properly.
22. Copy /deploy/srv/tftpboot/pxelinux.cfg/default from the zip file to /srv/tftpboot/pxelinux.cfg/default
23. Copy the RHEL 5.6 install kernel and initrd into /srv/tftpboot/rhel5.6:

cd /srv/www/install/rhel5.6/images/pxeboot
cp initrd.img /srv/tftpboot/rhel5.6/initrd.img
cp vmlinuz /srv/tftpboot/rhel5.6/vmlinuz

24. Enable and start dnsmasq:
25. chkconfig dnsmasq on
26. service dnsmasq start
27. Copy /deploy/srv/www/install/kickstarts/hadoop-node.ks from the zip file to /srv/www/install/kickstarts/hadoop-node.ks on the deploy server.
28. Copy /deploy/srv/www/install/scripts/compute-node.sh from the zip file to /srv/www/install/scripts/compute-node.sh on the deploy server.
29. Copy /deploy/srv/www/install/scripts/edge-node.sh from the zip file to /srv/www/install/scripts/edge-node.sh on the deploy server.
30. Copy the repos, namenode, and slave directories in their entirety from the zip file to /opt/dell/hadoop on the deploy server. You should end up with /opt/dell/hadoop/namenode, /opt/dell/hadoop/slave, and /opt/dell/hadoop/repos directories, and their contents should exactly match what is in the zip file.
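The Dell-provided dnsmasq.conf in the zip file is the authoritative template; the following is only a minimal hand-written sketch, with made-up names, addresses, and MAC values, of the kind of DNS/DHCP/TFTP settings step 20 asks you to review. It is not a copy of the Dell file.

# Illustrative only: write a minimal dnsmasq configuration for PXE booting
# cluster nodes. All names, MAC addresses, and ranges below are placeholders.
cat > /etc/dnsmasq.conf <<'EOF'
# Serve DHCP/DNS/TFTP on the cluster NIC only
interface=eth1
# Placeholder cluster DNS domain
domain=hadoop.local
# Dynamic pool used only while nodes are being discovered
dhcp-range=172.16.3.100,172.16.3.199,12h
# PXE boot file and TFTP root
dhcp-boot=pxelinux.0
enable-tftp
tftp-root=/srv/tftpboot
# Pin each node to a fixed address by its MAC (one line per node)
dhcp-host=00:11:22:33:44:55,slavenode01,172.16.3.20
EOF
service dnsmasq restart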

Installing Hadoop on the Primary Master Node
1. Ensure that an IP address for the Primary Master Node has been assigned in /etc/dnsmasq.conf on the Deployment server (Admin Node).
2. PXE boot the Primary Master Node to the Hadoop Unconfigured Node boot entry.
3. Let the install finish.
4. Install the Primary Master Node software:

yum -y install hadoop-0.20-namenode hadoop-0.20-jobtracker xinetd

Note: Don't install the JobTracker software on the Primary Master Node if the design specifies one or more servers dedicated to running the JobTracker software (because of cluster size, cluster topology, workload type, etc.). See the instructions below for installing a dedicated JobTracker server.

5. Copy over and activate the proper rsync configuration from the deploy node with the following commands:

rsync 172.16.3.1::dell-hadoop/namenode/conf/rsyncd.conf /etc/rsyncd.conf
rsync 172.16.3.1::dell-hadoop/namenode/conf/xinetd-rsync /etc/xinetd.d/rsync
rsync 172.16.3.1::dell-hadoop/namenode/conf/xinetd-hadoop-conf /etc/xinetd.d/hadoop-conf
chkconfig xinetd on
service xinetd start

6. Create a new configuration repository for your cluster:

cp -r /etc/hadoop-0.20/conf.empty/ /etc/hadoop-0.20/conf.my_cluster

7. On the Primary Master Node, update /etc/hadoop-0.20/conf.my_cluster/masters with the DNS hostname (or IP address) of the Primary Master Node:

echo namenode >/etc/hadoop-0.20/conf.my_cluster/masters


8. Configure the Primary Master Node internal storage.

For volumes greater than 2TB, use Parted. If, for example, /dev/sdb is to be used for internal storage, follow these instructions:

# parted /dev/sdb
mkpart primary ext3 1 -1
# mkfs.ext3 /dev/sdb1
# e2label /dev/sdb1 meta1
# cat /etc/fstab | grep meta1
LABEL=meta1 /mnt/hdfs/hdfs01/meta1 ext3 defaults,noatime,nodiratime 0 0
# mkdir /mnt/hdfs/hdfs01/meta1
# mount -a

9. Configure local storage directories for use with HDFS and MapReduce.

You will need to specify, create, and assign the correct permissions to the local directories where you want the HDFS and MapReduce daemons to store data. You specify the directories by configuring the following three properties; two properties are in the hdfs-site.xml file, and one property is in the mapred-site.xml file.

Table 8: Local storage directories configuration

dfs.name.dir (hdfs-site.xml on the MasterNode): This property specifies the directories where the Master Node stores its metadata and edit logs. Cloudera recommends that you specify at least two directories, one of which is located on an NFS mount point.

dfs.data.dir (hdfs-site.xml on each SlaveNode): This property specifies the directories where the SlaveNode stores blocks. Cloudera recommends that you configure the disks on the SlaveNode in a JBOD configuration, mounted at /mnt/hdfs/hdfs01/data1 through /mnt/hdfs/hdfs01/dataN, and configure dfs.data.dir to specify /mnt/hdfs/hdfs01/data1/hdfs through /mnt/hdfs/hdfs01/dataN/hdfs.

mapred.local.dir (mapred-site.xml on each TaskTracker, which runs on the SlaveNode): This property specifies the directories where the TaskTracker will store temporary data and intermediate map output files while running MapReduce jobs. Cloudera recommends that this property specify a directory on each of the JBOD mount points; for example, /mnt/hdfs/hdfs01/data1/mapred through /mnt/hdfs/hdfs01/dataN/mapred.
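The following is a minimal sketch, not part of the Dell-provided scripts, of how the Slave Node directories described in Table 8 could be created and given the ownership the Hadoop daemons expect. The two-disk loop count and the mount points are assumptions based on the table, and the hdfs/mapred users are assumed to exist after the Hadoop packages are installed.

# Illustrative sketch: prepare JBOD-backed HDFS and MapReduce directories on a
# Slave Node. Assumes the data disks are already formatted and mounted at
# /mnt/hdfs/hdfs01/data1 ... /mnt/hdfs/hdfs01/dataN (N=2 here for brevity).
for i in 1 2; do
  mkdir -p /mnt/hdfs/hdfs01/data${i}/hdfs /mnt/hdfs/hdfs01/data${i}/mapred
  chown -R hdfs:hadoop   /mnt/hdfs/hdfs01/data${i}/hdfs    # dfs.data.dir owner
  chown -R mapred:hadoop /mnt/hdfs/hdfs01/data${i}/mapred  # mapred.local.dir owner
done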

10. Start by configuring the storage directories on the Primary Master Node. Log on to the Primary Master Node and update /etc/hadoop-0.20/conf.my_cluster/hdfs-site.xml:

# vi /etc/hadoop-0.20/conf.my_cluster/hdfs-site.xml
<property>
  <name>dfs.name.dir</name>
  <value>/mnt/hdfs/hdfs01/meta1</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value> </value>
</property>

11. Update the same /etc/hadoop-0.20/conf.my_cluster/hdfs-site.xml file with:
12. The HDFS block size (the unit of data storage on each SlaveNode; property name dfs.block.size)
13. The amount of space on each storage volume (on the SlaveNode) which HDFS should not use (property name dfs.datanode.du.reserved)
14. The Master Node server thread count (which should be increased as you add more Slave Nodes to your cluster; property name dfs.namenode.handler.count)
15. The SlaveNode server thread count (a higher count may help for speedier replication of data or when running storage-intensive workloads; property name dfs.datanode.handler.count)
16. The data replication factor (how many copies of the same block of data exist; property name dfs.replication)
17. The permission checking on HDFS data (property name dfs.permissions)
18. The HDFS space reclaiming time interval (within which each SlaveNode evaluates its own storage space and decides to get rid of data blocks that no longer belong to a file; property name fs.trash.interval)

# vi /etc/hadoop-0.20/conf.my_cluster/hdfs-site.xml
<property>
  <name>dfs.block.size</name>
  <value>134217728</value>
  <final>true</final>
</property>
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>10737418240</value>
  <final>true</final>
</property>
<property>
  <name>dfs.namenode.handler.count</name>
  <value>32</value>
  <final>true</final>
</property>
<property>
  <name>dfs.datanode.handler.count</name>
  <value>16</value>
  <final>true</final>
</property>
<property>
  <name>dfs.permissions</name>
  <value>true</value>
  <final>true</final>
</property>
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
  <name>fs.trash.interval</name>
  <value>1440</value>
  <final>true</final>
</property>

Note: It is recommended that system administrators configure these properties as read-only by using the XML tag <final>true</final>.

19. On the Master Node, configure the owner of the dfs.name.dir directories to be the hdfs user:

chown -R hdfs:hadoop /mnt/hdfs/hdfs01/meta1 /nfs_mount/dfs_name

20. Configure the storage that the MapReduce tasks (more precisely, the TaskTracker) running on the SlaveNodes use for temporary data. Log on to the Primary Master Node and edit /etc/hadoop-0.20/conf.my_cluster/mapred-site.xml:

# vi /etc/hadoop-0.20/conf.my_cluster/mapred-site.xml
<property>
  <name>mapred.local.dir</name>
  <value>/var/lib/hadoop-0.20/cache/mapred/mapred/local</value>
</property>
<property>
  <name>mapred.system.dir</name>
  <value>/mapred/system</value>
</property>

# mkdir -p /var/lib/hadoop-0.20/cache/mapred/mapred/local
# chown -R mapred:hadoop /var/lib/hadoop-0.20/cache/mapred/mapred/local

21. Configure additional MapReduce configuration parameters:

# vi /etc/hadoop-0.20/conf.my_cluster/mapred-site.xml
<property>
  <name>mapred.job.tracker</name>
  <value>namenode:8021</value>
</property>
<property>
  <name>mapred.job.tracker.handler.count</name>
  <value>32</value>
</property>
<property>
  <name>mapred.reduce.tasks</name>
  <value>6</value>
</property>
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>10</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>6</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
<property>
  <name>mapred.child.ulimit</name>
  <value>2097152</value>
</property>
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>1</value>
</property>

where namenode in the mapred.job.tracker value is the DNS hostname (or IP address) of the Primary Master Node.


Configuring Memory Utilization for HDFS and MapReduce
Setting the optimal memory configurations for the Master Node, JobTracker, and SlaveNode helps the cluster process more jobs with no job queuing or resource contention. The steps are:

22. Edit /etc/hadoop-0.20/conf.my_cluster/core-site.xml and add the following entries:

<property>
  <name>io.file.buffer.size</name>
  <value>65536</value>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode:8020</value>
</property>
<property>
  <name>io.sort.factor</name>
  <value>80</value>
</property>
<property>
  <name>io.sort.mb</name>
  <value>512</value>
</property>

Configuring the Hadoop environment

23. Assign a heap size for the HADOOP_*_OPTS variables in /etc/hadoop-0.20/conf.my_cluster/hadoop-env.sh:

export HADOOP_NAMENODE_OPTS="-Xmx2048m"
export HADOOP_SECONDARYNAMENODE_OPTS="-Xmx2048m"
export HADOOP_DATANODE_OPTS="-Xmx2048m"
export HADOOP_BALANCER_OPTS="-Xmx2048m"
export HADOOP_JOBTRACKER_OPTS="-Xmx2048m"

24. Increase the maximum number of open files that the mapred and hdfs users can open:

# vi /etc/security/limits.conf

Add:

mapred - nofile 32768
hdfs - nofile 32768
hbase - nofile 32768

(the hbase entry is optional)

Activate the new configuration:

sudo alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.my_cluster 50

25. Create a slaves.d directory and make it world-writable:

mkdir -p /etc/hadoop/conf/slaves.d
chmod 777 /etc/hadoop/conf/slaves.d

26. Install add-slaves from the deploy node:

rsync 172.16.3.1::dell-hadoop/namenode/bin/add-slaves /usr/local/sbin/add-slaves

27. Create a cron job that will run add-slaves every 5 minutes:
28. # crontab -e
29. */5 * * * * /usr/local/sbin/add-slaves
30. Verify that the new configuration has been activated:

alternatives --display hadoop-0.20-conf

31. The command should list the highest priority (e.g. 50) for your new configuration.
32. Format the cluster metadata space. On the Master Node run the following command:

sudo -u hdfs hadoop namenode -format


Note: This command will delete all the data in an existing cluster. Do not run this command until all your data is backed up.

33. Enable the JobTracker and the NameNode:

service hadoop-0.20-namenode start
service hadoop-0.20-jobtracker start
chkconfig hadoop-0.20-namenode on
chkconfig hadoop-0.20-jobtracker on

34. Create the HDFS MapReduce storage directory:

sudo -u hdfs hadoop fs -mkdir /mapred/system
sudo -u hdfs hadoop fs -chown mapred:hadoop /mapred/system
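At this point it can be useful to confirm that the daemons actually came up before moving on; the following checks are a suggested sketch rather than part of the Dell procedure.

# Optional sanity checks after starting the NameNode and JobTracker
sudo -u hdfs hadoop dfsadmin -report      # NameNode is up and reports capacity
sudo -u hdfs hadoop fs -ls /              # HDFS root is browsable
service hadoop-0.20-namenode status
service hadoop-0.20-jobtracker status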


Installing Hadoop on the Secondary Master Node (aka Checkpoint Node)
Log on to the Secondary Master Node and follow these steps:

35. Install the Secondary Master Node software:

yum install hadoop-0.20-secondarynamenode

36. Copy the configuration files from the Primary Master Node. On the Secondary Master Node:

# scp -r <namenode IP>:/etc/hadoop-0.20/conf /etc/hadoop-0.20/conf

Installing Hadoop on the JobTracker Node
Note: Follow these instructions only when the Master Node and JobTracker software need to run on dedicated machines.

Log on to the machine designated to run the JobTracker and follow these steps:

37. yum install hadoop-0.20-jobtracker

Installing Hadoop on the Slave Node
Ensure that the Primary Master Node is up, running, and has a valid configuration, and then PXE boot the node to the Hadoop Compute Node boot entry. The Slave Node will automatically deploy and register itself with the Primary Master Node.

Installing Hadoop on the Edge Node
Ensure that the Primary Master Node is up, running, and has a valid configuration, and then PXE boot the node to the Hadoop Edge Node boot entry. The Edge Node will automatically deploy and register itself with the Primary Master Node.

Configuring the Secondary Master Node Internal Storage
For volumes greater than 2TB, use Parted. If, for example, /dev/sdb is to be used for internal storage, follow these instructions:

# parted /dev/sdb
mkpart primary ext3 1 -1
# mkfs.ext3 /dev/sdb1
# e2label /dev/sdb1 meta1
# cat /etc/fstab | grep meta1
LABEL=meta1 /mnt/hdfs/hdfs01/meta1 ext3 defaults,noatime,nodiratime 0 0
# mkdir /mnt/hdfs/hdfs01/meta1
# mount -a

Configuring the Cluster for the Secondary Master Node
The Secondary Master Node stores the latest copy of the HDFS metadata in a directory that is structured the same way as the Primary Master Node's directory. The configuration of the Secondary Master Node is controlled by the following parameters, defined in core-site.xml on the Secondary Master Node:

- fs.checkpoint.period, set to 1 hour by default (the value is 3600 seconds), specifies the maximum delay between two consecutive checkpoints.
- fs.checkpoint.size, set to 64MB by default (the value is 67108864), defines the size of the edits log file that forces an urgent checkpoint even if the maximum checkpoint delay is not reached.
- fs.checkpoint.dir defines where on the local file system the Secondary Master Node should store the temporary edits to merge. If this is a comma-delimited list of directories, then the edits log is replicated in all directories for redundancy.

<property>
  <name>fs.checkpoint.period</name>
  <value>3600</value>
</property>
<property>
  <name>fs.checkpoint.size</name>
  <value>67108864</value>
</property>
<property>
  <name>fs.checkpoint.dir</name>
  <value>/tmp/hadoop-metadata</value>
</property>

To check that the Secondary Master Node functions properly, force it to get a copy of the edits log file from the Primary Master Node. On the Secondary Master Node, run the following commands:

# sudo -u hdfs hadoop secondarynamenode -checkpoint force
# sudo -u hdfs hadoop secondarynamenode -geteditsize

A non-zero value indicates that the Secondary Master Node runs properly.

Verify Cluster Functionality

Once you have the Master Node and your Edge Nodes deployed, verify that the cluster is functional with a quick terasort:

sudo -u hdfs hadoop jar /usr/lib/hadoop-0.20/hadoop-0.20.2-cdh3u1-examples.jar teragen 10000 teragen
sudo -u hdfs hadoop jar /usr/lib/hadoop-0.20/hadoop-0.20.2-cdh3u1-examples.jar terasort teragen terasort
sudo -u hdfs hadoop fs -ls /user/hdfs/terasort
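The same examples jar also carries a teravalidate job; as an optional extra check that is not part of the original procedure, it can confirm that the sorted output is globally ordered. The terasort-report output path is an illustrative name.

sudo -u hdfs hadoop jar /usr/lib/hadoop-0.20/hadoop-0.20.2-cdh3u1-examples.jar teravalidate terasort terasort-report
sudo -u hdfs hadoop fs -cat /user/hdfs/terasort-report/part-*   # the report lists any out-of-order records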

Operating System Configuration Checklist
(Source: Cloudera – https://ccp.cloudera.com/display/KB/Cluster+Checklist)

37. /etc/fstab: mount points should be noatime because HDFS doesn't do updates.
38. Run tune2fs on Hadoop devices. Superblock backups = 1% using sparse_super. File system mounted with journaling enabled.
39. Run mii-tool -w to see if things are negotiating at a proper rate.
40. iptables -nvL to see if there are any firewalls in place. All ports should be open between all nodes.
41. DNS and hosts configuration should be reasonable: all host names should consistently forward/reverse lookup.
42. Use dnscache for DHCP and DNS on the primary management node.
43. sysstat installed. sa* collectors running for retroactive sar and iostat information. Helps you determine things like “swappiness.”
44. tmpwatch pruning for $var/userlogs/* and $var/logs/*
45. Check the number of open files available:
    cat /proc/sys/fs/file-max: should be very large (/etc/).
    ulimit -Hn: should be approximately 32768 (/etc/security/limits.conf).
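The checklist above can be spot-checked from a shell; the commands below are a suggested sketch and only cover the items that map directly to a single command.

# Quick spot checks for several checklist items
grep noatime /etc/fstab                 # item 37: data mounts carry noatime
mii-tool -w                             # item 39: watch link negotiation (Ctrl-C to stop)
iptables -nvL                           # item 40: firewall rules that could block cluster ports
rpm -q sysstat && sar -u 1 3            # item 43: sysstat present and collecting
cat /proc/sys/fs/file-max               # item 45: system-wide open-file limit
ulimit -Hn                              # item 45: per-process hard limit (expect ~32768)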

Configuring Rack Awareness
(Source: Yahoo – http://developer.yahoo.com/hadoop/tutorial/module2.html#perms)

For small clusters in which all servers are connected by a single switch, there are only two levels of locality: “on-machine” and “off-machine.” When loading data from a SlaveNode's local drive into HDFS, the Master Node will schedule one copy to go into the local SlaveNode and will pick two other machines at random from the cluster.

For larger Hadoop installations that span multiple racks, it is important to ensure that replicas of data exist on multiple racks. This way, the loss of a switch does not render portions of the data unavailable due to all replicas being underneath it.

HDFS can be made rack-aware by the use of a script that allows the master node to map the network topology of the cluster. While alternate configuration strategies can be used, the default implementation allows you to provide an executable script that returns the “rack address” of each of a list of IP addresses.

The network topology script receives as arguments one or more IP addresses of nodes in the cluster. It returns on stdout a list of rack names, one for each input. The input and output order must be consistent.

To set the rack mapping script, specify the key topology.script.file.name in conf/hadoop-site.xml. This provides a command to run to return a rack ID; it must be an executable script or program. By default, Hadoop will attempt to send a set of IP addresses to the script as several separate command line arguments. You can control the maximum acceptable number of arguments with the topology.script.number.args key.

Rack IDs in Hadoop are hierarchical and look like path names. By default, every node has a rack ID of /default-rack. You can set rack IDs for nodes to any arbitrary path, e.g., /foo/bar-rack. Path elements further to the left are higher up the tree. Thus a reasonable structure for a large installation may be /top-switch-name/rack-name.

Hadoop rack IDs are not currently expressive enough to handle an unusual routing topology such as a 3-D torus; they assume that each node is connected to a single switch which in turn has a single upstream switch. This is not usually a problem, however. Actual packet routing will be directed using the topology discovered by or set in switches and routers. The Hadoop rack IDs will be used to find “near” and “far” nodes for replica placement (and in 0.17, MapReduce task placement).

The following example script performs rack identification based on IP addresses, given a hierarchical IP addressing scheme enforced by the network administrator. This may work directly for simple installations; more complex network configurations may require a file- or table-based lookup process. Care should be taken in that case to keep the table up to date as nodes are physically relocated, etc. This script requires that the maximum number of arguments be set to 1.

#!/bin/bash
# Set rack id based on IP address.
# Assumes network administrator has complete control
# over IP addresses assigned to nodes and they are
# in the 10.x.y.z address space. Assumes that
# IP addresses are distributed hierarchically. e.g.,
# 10.1.y.z is one data center segment and 10.2.y.z is another;
# 10.1.1.z is one rack, 10.1.2.z is another rack in
# the same segment, etc.)
#
# This is invoked with an IP address as its only argument

# get IP address from the input
ipaddr=$1

# select "x.y" and convert it to "x/y"
segments=`echo $ipaddr | cut --delimiter=. --fields=2-3 --output-delimiter=/`
echo /${segments}
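To show how the pieces fit together, here is a hedged usage sketch: the script path /etc/hadoop-0.20/conf.my_cluster/rack-topology.sh is an assumed location, not one prescribed by this guide, and the sample IP simply illustrates the cut-based mapping.

# Save the script, make it executable, and test the mapping by hand
chmod +x /etc/hadoop-0.20/conf.my_cluster/rack-topology.sh
/etc/hadoop-0.20/conf.my_cluster/rack-topology.sh 10.1.2.15
# prints: /1/2  (data center segment 1, rack 2)

# Wire it into Hadoop by setting, in the configuration file described above:
#   topology.script.file.name = /etc/hadoop-0.20/conf.my_cluster/rack-topology.sh
#   topology.script.number.args = 1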

Starting Your Hadoop Cluster
46. Run the following commands on the Master Node:

# service hadoop-0.20-namenode start
# service hadoop-0.20-jobtracker start


Dell | Cloudera Hadoop Solution Software Configuration

Dell | Cloudera Hadoop Solution Configuration Parameters Recommended Values

Table 9: hdfs-site.xml

dfs.block.size: A lower value offers more parallelism. Value: 134217728 (128MB)
dfs.name.dir: Comma-separated list of folders (no spaces) where the Master Node stores its metadata and edit logs. Value: cluster specific
dfs.datanode.handler.count: Number of handlers dedicated to serving data block requests on Hadoop Slave Nodes. Value: 16 (start with 2 x CORE_COUNT in each SlaveNode)
dfs.namenode.handler.count: More Master Node server threads to handle RPCs from a large number of Slave Nodes. Value: start with 10, increase for large clusters (a higher count will drive higher CPU, RAM, and network utilization)
dfs.datanode.du.reserved: The amount of space on each storage volume that HDFS should not use, in bytes. Value: 10M
dfs.replication: Data replication factor. Default is 3. Value: 3 (default)
fs.trash.interval: Time interval between HDFS space reclaiming. Value: 1440 (minutes)
dfs.permissions: Value: true (default)
dfs.datanode.handler.count: Value: 8


Table 10: mapred-site.xml

mapred.child.java.opts: Larger heap size for the child JVMs of maps/reduces. Value: -Xmx1024M
mapred.job.tracker: Hostname or IP address and port of the JobTracker. Value: TBD
mapred.job.tracker.handler.count: More JobTracker server threads to handle RPCs from a large number of TaskTrackers. Value: start with 32, increase for large clusters (a higher count will drive higher CPU, RAM, and network utilization)
mapred.reduce.tasks: The number of reduce tasks per job. Value: set to a prime close to the number of available hosts
mapred.local.dir: Comma-separated list of folders (no spaces) where a TaskTracker stores runtime information. Value: cluster specific
mapred.tasktracker.map.tasks.maximum: Maximum number of map tasks to run on the node. Value: 2 + (2/3) * number of cores per node
mapred.tasktracker.reduce.tasks.maximum: Maximum number of reduce tasks to run per node. Value: 2 + (1/3) * number of cores per node
mapred.child.ulimit: Value: 2097152
mapred.map.tasks.speculative.execution: Value: FALSE
mapred.reduce.tasks.speculative.execution: Value: FALSE
mapred.job.reuse.jvm.num.tasks: Value: -1

Table 11: default.xml

SCAN_IPC_CACHE_LIMIT: Number of rows cached in the search engine for each scanner next call over the wire. It reduces the network round trip by 300 times, caching 300 rows in each trip. Value: 100
LOCAL_JOB_HANDLER_COUNT: Number of parallel queries executed at one go. Query requests above this limit get queued up. Value: 30
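As a worked example of the task-slot guidance in Table 10: a Slave Node fitted with the optional 2x E5645 processors from Table 2 has 12 cores, so mapred.tasktracker.map.tasks.maximum = 2 + (2/3) x 12 = 10 and mapred.tasktracker.reduce.tasks.maximum = 2 + (1/3) x 12 = 6, which are the values used in the mapred-site.xml example earlier in this guide.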


Table 12: hadoop-env.sh

java.net.preferIPv4Stack: true
JAVA_HOME
HADOOP_*_OPTS: -Xmx2048m

Table 13: /etc/fstab

File system mount options: data=writeback,nodiratime,noatime

Table 14: core-site.xml

io.file.buffer.size: The size of the buffer for use in sequence files. The size of this buffer should probably be a multiple of the hardware page size (4096 on Intel x86), and it determines how much data is buffered during read and write operations. Value: 65536 (64KB)
fs.default.name: The name of the default file system. A URI whose scheme and authority determine the file system implementation. Example: hdfs://someserver.example.com:8020/
fs.checkpoint.dir: Comma-separated list of directories on the local file system of the Secondary Master Node where its checkpoint images are stored. Value: TBD
io.sort.factor: Value: 80
io.sort.mb: Value: 512

Table 15: /etc/security/limits.conf

mapred - nofile 32768
hdfs - nofile 32768
hbase - nofile 32768


Dell | Cloudera Hadoop Solution Monitoring and Alerting

The following components will be monitored by the Hadoop monitoring console:

Service Type | Resource | Warning | Critical | Nodes to Monitor | Tool
Disk | HDFS_DISK_[00-10] | 60 | 90 | SlaveNode[] | Nagios
SWAP | SWAP | 60 | 90 | SlaveNode[], Master Node[], EdgeNode[] | Nagios
Ping_Node_From_Admin | | DELAY | NO RESPONSE | SlaveNode[], Master Node[], EdgeNode[] | Nagios
NIC Bonding | | DELAY | 1 NIC in Bond | SlaveNode[], Master Node[], EdgeNode[] | Nagios
DNS_From_Node | | DELAY | NO RESPONSE | SlaveNode[], Master Node[], EdgeNode[] | Nagios
DNS_About_Node | | DELAY | NO RESPONSE | SlaveNode[], Master Node[], EdgeNode[] | Nagios
JobTracker_Daemon | | DELAY | DAEMON NOT RUNNING | Master Node[] | Nagios
TaskTracker_Daemon | | DELAY | DAEMON NOT RUNNING | SlaveNode[] | Nagios
SlaveNode_Daemon | | DELAY | DAEMON NOT RUNNING | SlaveNode[] | Nagios
Master Node_Daemon | | DELAY | DAEMON NOT RUNNING | Master Node[] | Nagios
SecondaryMaster Node | | DELAY | DAEMON NOT RUNNING | Master Node[] | Nagios
SSH | | DELAY | NO RESPONSE | SlaveNode[], Master Node[] | Nagios
Zombie_Processes | | 5 | 10 | SlaveNode[], Master Node[], EdgeNode[] | Nagios
CPU_Load | | 80 | 90 | SlaveNode[], Master Node[], EdgeNode[] | Nagios
Zookeeper_Client | | DELAY | DAEMON NOT RUNNING | SlaveNode[] | Nagios
Zookeeper_Server | | DELAY | DAEMON NOT RUNNING | Master Node[] | Nagios
JobTracker_Submit_Job | | DELAY | NO RESPONSE | Master Node[] | Nagios
Chef_Daemon | | DELAY | NO RESPONSE | SlaveNode[], Master Node[], EdgeNode[] | Nagios
Disk | MAPRED_DIR | 60 | 90 | SlaveNode[], Master Node[], EdgeNode[] | Nagios
Memory_Capacity_Used | System Memory | 80 | 90 | SlaveNode[], Master Node[], EdgeNode[] | Nagios
Disk | HDFS01_Capacity | 60 | 90 | Master Node[] | Nagios
CPU_Utilization | | | | SlaveNode[], Master Node[], EdgeNode[] | Ganglia
Memory_Utilization | | | | SlaveNode[], Master Node[], EdgeNode[] | Ganglia
NIC_LAG_Utilization | | | | SlaveNode[], Master Node[], EdgeNode[] | Ganglia
CPU Temp | | As defined by SDR (Sensor Data Record) | As defined by SDR | SlaveNode[], EdgeNode[] | Nagios
CPU Temp | | As defined by SDR | PENDING | Master Node[] | Nagios
Power Supplies | | As defined by SDR | As defined by SDR | Master Node[], Edge Node[] | Nagios
Master Node_NFS_Mount | | DELAY | MOUNT MISSING | Master Node[] | Nagios
Hbase | | DELAY | SELECT FAILED | EdgeNode[] | Nagios
Hbase | | DELAY | INSERT FAILED | EdgeNode[] | Nagios
Hive | | DELAY | SELECT FAILED | EdgeNode[] | Nagios
Hive | | DELAY | INSERT FAILED | EdgeNode[] | Nagios
Ping_From_Admin | IPMI Interface | DELAY | NO RESPONSE | SlaveNode[], Master Node[], EdgeNode[] | Nagios
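Checks of this kind are typically expressed as Nagios service definitions on the monitoring host. The sketch below illustrates the SSH and ping checks only; the hostgroup name and the ping thresholds are illustrative assumptions, not the definitions shipped with the solution:

define service {
    use                  generic-service
    hostgroup_name       hadoop-slave-nodes           ; assumed hostgroup covering SlaveNode[]
    service_description  SSH
    check_command        check_ssh
}

define service {
    use                  generic-service
    hostgroup_name       hadoop-slave-nodes
    service_description  Ping_Node_From_Admin
    check_command        check_ping!200.0,20%!600.0,60%   ; warning/critical round-trip time and packet loss
}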

Hadoop Ecosystem Components

Component Master Node Slave Node Edge Node Utilize From Administer From

Pig X X X Edge Node Edge Node

Hive X X Edge Node Edge Node

Sqoop X Edge Node Edge Node

Zookeeper-Server X (5) Edge Node Edge Node

X—Designates server location for the appropriate package binaries to be installed.

Pig

Install Binaries

# yum -y install hadoop-pig.noarch


Configuration

[root@admin2 conf]# pwd

/etc/pig/conf

[root@admin2 conf]# cat pig.properties

# Pig configuration file. All values can be overwritten by command line arguments.

# see bin/pig -help

# log4jconf log4j configuration file

# log4jconf=./conf/log4j.properties

# brief logging (no timestamps)

brief=false

# clustername, name of the hadoop jobtracker. If no port is defined port 50020 will be used.

#cluster

#debug level, INFO is default

debug=INFO

# a file that contains pig script

#file=

# load jarfile, colon separated

#jar=

# Remote Map Reduce Connectivity

fs.default.name=hdfs://namenode:8020/

mapred.job.tracker=namenode:8021

#verbose print all log messages to screen (default to print only INFO and above to screen)

verbose=false

#exectype local|mapreduce, mapreduce is default

#exectype=mapreduce

# hod realted properties

#ssh.gateway

#hod.expect.root

#hod.expect.uselatest

#hod.command

#hod.config.dir

#hod.param

#Do not spill temp files smaller than this size (bytes)

pig.spill.size.threshold=5000000

#EXPERIMENT: Activate garbage collection when spilling a file bigger than this size (bytes)

#This should help reduce the number of files being spilled.

pig.spill.gc.activation.size=40000000

######################

# Everything below this line is Yahoo specific. Note that I've made

# (almost) no changes to the lines above to make merging in from Apache

# easier. Any values I don't want from above I override below.

#

# This file is configured for use with HOD on the production clusters. If you


# want to run pig with a static cluster you will need to remove everything

# below this line and set the cluster value (above) to the

# hostname and port of your job tracker.

exectype=mapreduce

log.file=

Path Configuration

[root@admin2 jre]# export JAVA_HOME=/usr/java/jdk1.6.0_24/jre

Interactive Modes

# pig -x local
# pig -x mapreduce

Script Execution Modes

# pig -x local SCRIPT
# pig -x mapreduce SCRIPT
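A minimal smoke test in mapreduce mode might look like the word-count script below; the input and output HDFS paths are illustrative, and the input file must already exist in HDFS while the output directory must not:

# cat > /tmp/wordcount.pig <<'EOF'
lines  = LOAD '/user/test/input.txt' AS (line:chararray);
words  = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grpd   = GROUP words BY word;
counts = FOREACH grpd GENERATE group, COUNT(words);
STORE counts INTO '/user/test/wordcount_out';
EOF
# pig -x mapreduce /tmp/wordcount.pig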

Hive

Install Binaries

# yum -y install hadoop-hive.noarch

Configuration

[root@admin2 conf]# pwd

/etc/hive/conf
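As a quick functional check from the Edge Node, you can create and query a throwaway table; the table name and layout below are purely illustrative:

# hive -e "CREATE TABLE IF NOT EXISTS smoke_test (id INT, msg STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';"
# hive -e "SHOW TABLES;"
# hive -e "SELECT COUNT(*) FROM smoke_test;"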

Sqoop

Install Binaries

# yum -y install sqoop

Configure

[root@admin2 conf]# pwd

/etc/sqoop/conf

If you need to use Sqoop to read data from or write data to a MySQL server, note that the MySQL JDBC driver was removed from the Sqoop distribution in order to keep the default distribution fully Apache compliant. You can download the MySQL JDBC driver from http://www.mysql.com/downloads/connector/j/, unzip it, and move mysql-connector-java-*-bin.jar into the /usr/lib/sqoop/lib directory on the machine(s) where you installed the Sqoop service.

Also note that the data import/export process initiated and managed by Sqoop takes place between the Hadoop file system (HDFS) and the database server (e.g., MySQL). This data marshalling process relies on MapReduce, which provides fault tolerance and wide write bandwidth. The Hadoop worker nodes establish individual (concurrent) connections with the database server, so database servers such as MySQL must be preconfigured to grant data access to machines on the network: the hostnames of the Hadoop worker nodes, the usernames/passwords they should use, the databases they can access, and so on have to be set up in the database server configuration. An example of how to configure the MySQL server is posted at http://www.debianadmin.com/mysql-database-server-installation-and-configuration-in-debian.html.
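For illustration, a typical table import from MySQL into HDFS looks like the command below; the connection string, credentials, table name, and target directory are placeholder values for your environment:

# sqoop import \
    --connect jdbc:mysql://mysqlhost.example.com/salesdb \
    --username sqoop_user -P \
    --table orders \
    --target-dir /user/etl/orders \
    --num-mappers 4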

ZooKeeper

Introduction: What is Apache ZooKeeper?

Apache ZooKeeper is a coordination service with high availability built in. It allows distributed applications to coordinate with each other. For example, a group of nodes (e.g., web servers) can use ZooKeeper to deliver a highly available service: they can use it to refer all clients to the current master node, and to elect a new master if the original master node fails.

Distributed processes using ZooKeeper coordinate with each other via a shared hierarchical name space of data registers called ZNodes. This name space is much like that of a standard file system. A name is a sequence of path elements separated by a slash (/). Every ZNode in ZooKeeper's name space is identified by a path, and every ZNode has a parent whose path is a prefix of the ZNode with one less element; the exception to this rule is the root (/), which has no parent. Also, exactly as in standard file systems, a ZNode cannot be deleted if it has any children.

The main differences from a standard file system are that every ZNode can have data associated with it (every "file" can also be a "directory" and vice versa) and that ZNodes are limited in the amount of data they can hold. ZooKeeper was designed to store coordination data: status information, configuration, location information, and so on. This kind of meta-information is usually measured in kilobytes, if not bytes. ZooKeeper has a built-in sanity limit of 1 MB per ZNode to prevent it from being used as a large data store, but in general it holds much smaller pieces of data.

ZooKeeper Service Topology

The ZooKeeper service is replicated over a set of machines that comprise the service. These machines maintain an in-memory image of the data tree (hence the high throughput and low latency), along with transaction logs and snapshots in a persistent store.

The machines that make up the ZooKeeper service must all know about each other. As long as a majority of the servers are available, the ZooKeeper service will be available. Clients must also know the list of servers; they create a handle to the ZooKeeper service using this list.

Because ZooKeeper requires a majority, it is best to use an odd number of machines. For example, with four machines ZooKeeper can handle only the failure of a single machine; if two machines fail, the remaining two do not constitute a majority. With five machines, however, ZooKeeper can handle the failure of two machines. Therefore, to sustain the failure of F machines, the ZooKeeper service should be deployed on 2F + 1 machines.

Guidelines for ZooKeeper Deployment

The reliability of ZooKeeper rests on two basic assumptions:

- Only a minority of servers in a deployment will fail. Size the number of ZooKeeper machines according to the 2F + 1 rule and, if possible, try to make machine failures independent. For example, if most machines share the same switch or are installed in the same rack, failure of that switch or a rack power failure could cause a correlated failure and bring the service down.

- Deployed machines operate correctly: they execute code correctly, have clocks that operate properly, and have storage and network components that perform consistently. ZooKeeper has strong durability requirements, which means it uses storage media to log changes before the operation responsible for the change is allowed to complete. If ZooKeeper has to contend with other applications for access to resources such as storage media, its performance will suffer. Ideally, ZooKeeper's transaction log should be on a dedicated device; a dedicated partition is not enough.


For additional information, see http://zookeeper.apache.org/doc/r3.3.2/zookeeperAdmin.html.

Installing ZooKeeper Server on Dell | Cloudera Hadoop Solution Cluster

First, determine the machines that will run the ZooKeeper service. You should start with five HDFS Data Nodes installed in three different racks. For example, these machines can be:

Rack ID | DataNode Hostname/IP | ZooKeeper Machine ID
RACK1 | R01-N04 | 1
RACK1 | R01-N15 | 2
RACK2 | R02-N07 | 3
RACK2 | R02-N11 | 4
RACK5 | R05-N10 | 5

The following steps should be performed on each ZooKeeper machine:

1. Install the ZooKeeper Server packages:

# yum -y install hadoop-zookeeper
# yum -y install hadoop-zookeeper-server

2. Create a ZooKeeper data directory on the system drive for ZooKeeper logs. The installer creates a

/var/zookeeper directory. You can use that directory or create a new one:

# mkdir /var/my_zookeeper

3. Create unique IDs, between 1 and 255, for each ZooKeeper machine and store them in the file myid in the

ZooKeeper data directory. For example, on ZooKeeper machine 1 run:

# echo 1 > /var/my_zookeeper/myid

4. On ZooKeeper machine 2, run:

# echo 2 > /var/my_zookeeper/myid

5. Edit the file /etc/zookeeper/zoo.cfg: append a server.X entry for each ZooKeeper machine ID with its IP address or DNS name, and make sure dataDir points to the data directory that holds the myid file (for example, /var/my_zookeeper if you created one in step 2).

# The number of milliseconds of each tick

tickTime=2000

# The number of ticks that the initial

# synchronization phase can take

initLimit=10

# The number of ticks that can pass between

# sending a request and getting an acknowledgement

syncLimit=5

# the directory where the snapshot is stored.

dataDir=/var/my_zookeeper

# the port at which the clients will connect

clientPort=2181

server.1=172.16.0.1:2888:3888

server.2=172.16.0.2:2888:3888

server.3=172.16.0.3:2888:3888

server.4=172.16.0.4:2888:3888

server.5=172.16.0.5:2888:3888

Note that the entries of the form server.X list the servers that make up the ZooKeeper service. When a ZooKeeper machine starts up, it determines which ZooKeeper machine it is by looking for the file myid in the data directory. That file contains the server number in ASCII, and it should match the X in server.X on the left-hand side of this setting.

6. Increase the heap size of the ZooKeeper-Server instance to 4 GB by editing the JVM settings in the file /usr/bin/zookeeper-server:

# vi /usr/bin/zookeeper-server

export JVMFLAGS="-Dzookeeper.log.threshold=INFO -Xmx4G"

Note: Don’t forget the double-quotes (“).

7. Save the file and start the ZooKeeper server:

# /etc/init.d/hadoop-zookeeper start

8. Verify that the ZooKeeper service started by reading the state of the service. On one of the machines, not necessarily a ZooKeeper machine, run the following command:

# echo stat | nc ZKNode_IP 2181

Where ZKNode_IP is the IP address or the hostname of one of the ZooKeeper machines and 2181 is the client connect port specified in the configuration file zoo.cfg. The output should look something like this:

Zookeeper version: 3.3.3-cdh3u0--1, built on 03/26/2011 00:21 GMT

Clients:

/172.16.3.20:49499[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/0

Received: 1

Sent: 0

Outstanding: 0

Zxid: 0x100000004

Mode: leader

Node count: 4
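A quicker liveness probe, which can also be handy in monitoring scripts, is ZooKeeper's four-letter ruok command; a healthy server answers imok:

# echo ruok | nc ZKNode_IP 2181
imok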

For additional ZooKeeper commands, see http://zookeeper.apache.org/doc/r3.3.2/zookeeperAdmin.html#sc_zkCommands.

Maintenance

http://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_maintenance.

Troubleshooting and Common Problems

http://archive.cloudera.com/cdh/3/zookeeper/zookeeperAdmin.html#sc_commonProblems


References

Ghemawat, S., Gobioff, H., and Leung, S.-T. The Google File System. Proceedings of the 19th ACM Symposium on Operating Systems Principles, pp. 29-43. Bolton Landing, NY, USA. 2003. © 2003, ACM.

Borthakur, Dhruba. The Hadoop Distributed File System: Architecture and Design. © 2007, The Apache Software Foundation.

Hadoop DFS User Guide. © 2007, The Apache Software Foundation.

HDFS: Permissions User and Administrator Guide. © 2007, The Apache Software Foundation.

HDFS API Javadoc © 2008, The Apache Software Foundation.

HDFS source code

Pig – http://developer.yahoo.com/hadoop/tutorial/pigtutorial.html

Pig – http://pig.apache.org/docs/r0.6.0/setup.html

Zookeeper – http://zookeeper.apache.org/doc/r3.2.2/zookeeperOver.html

Zookeeper – https://ccp.cloudera.com/display/CDHDOC/ZooKeeper+Installation

Zookeeper – http://archive.cloudera.com/cdh/3/zookeeper/zookeeperAdmin.html#sc_zkMulitServerSetup

Nagios – http://www.nagios.org/

Ganglia – http://ganglia.sourceforge.net/

Additional information can be obtained at www.dell.com/hadoop or by e-mailing [email protected].

To Learn More

For more information on the Dell | Cloudera Hadoop Solution, visit:

www.dell.com/hadoop

©2011 Dell Inc. All rights reserved. Trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products. Specifications are correct at date of publication but are subject to availability or change without notice at any time. Dell and its affiliates cannot be responsible for errors or omissions in typography or photography. Dell’s Terms and Conditions of Sales and Service apply and are available on request. Dell service offerings do not affect consumer’s statutory rights. Dell, the DELL logo, and the DELL badge, PowerConnect, and PowerVault are trademarks of Dell Inc.