EDUREKA Apache Hadoop Installation and Cluster setup on AWS EC2 (Ubuntu) – Part 1 A guide to install and setup Multi-Node Apache Hadoop Cluster on AWS EC2 edureka! 9/20/2013 A guide to setup a Multi-Node Apache Hadoop Cluster on AWS EC2 (using free tier eligible server)

Apache Hadoop Installation and Cluster setup on AWS EC2



© 2013 Brain4ce Education Solutions Pvt. Ltd


Table of Contents

Introduction
1. Setting up the Cluster Infrastructure on AWS EC2
  1.1 Creating an AWS Free Account
    1.1.1 Signup and register on AWS
    1.1.2 Use your correct contact number
    1.1.3 Choose a Plan for your usage
  1.2 Login to AWS
  1.3 Creating Cluster member servers
    1.3.1 Choose a free tier eligible instance
    1.3.2 Create a key pair
    1.3.3 Configure Security Group and Firewall settings
    1.3.4 Review the pre-launch
    1.3.5 Launch the servers
  1.4 Setup client access to AWS servers
    1.4.1 Generate the Public/Private KeyPair
    1.4.2 Import keypair and save public/private keys
    1.4.3 Access the AWS EC2 servers
    1.4.4 Setup WINSCP access to AWS EC2 servers

Introduction

This document is a guide to setting up a multi-node Apache Hadoop cluster on Amazon Web Services (AWS) Elastic Compute Cloud (EC2) using ‘free tier usage eligible’ Ubuntu (t1.micro) servers. If you are new to both AWS and Hadoop, this guide comes in handy for quickly setting up a multi-node Apache Hadoop cluster on AWS EC2.

Note

AWS also provides a hosted Hadoop solution, Amazon Elastic MapReduce (EMR), but at the time of writing only Pig and Hive are available on it, and it comes at a cost.

The guide describes the whole process in two parts:

Part 1: Setting up the Cluster Infrastructure on AWS EC2

This part is a step-by-step guide to setting up an AWS account and launching the AWS EC2 free tier eligible Ubuntu servers. These servers will be used to set up a four-node Apache Hadoop cluster on the AWS EC2 cloud infrastructure.

Part 2: Installing Apache Hadoop and Setting up the Cluster

This part is a step-by-step guide to installing the prerequisites for Hadoop and configuring the cluster on the EC2 servers. It explains the primary Hadoop configuration files, password-less SSH access, configuring the master and slaves, and starting and stopping services in detail.

Note

The configuration described here is intended for learning purposes only.

1. Setting up the Cluster Infrastructure on AWS EC2

This section describes the steps to create a free account and launch Ubuntu servers on AWS EC2 for Apache Hadoop installation and cluster setup.

1.1 Creating an AWS Free Account

The first step is to create a free trial account in AWS. You can review the limits on free services at http://aws.amazon.com/free/

1.1.1 Signup and register on AWS

You can sign up on AWS using your email ID and credit card.

Even though the AWS EC2 free tier eligible instances are available at no additional cost, you need to provide credit card details during account creation.

As explained in the following image, your credit card will be billed if your monthly usage goes beyond the free tier, for example by using additional AWS resources or services such as Elastic Block Store (EBS).

FIGURE 1-1 SPECIFY YOUR CREDIT CARD DETAILS

1.1.2 Use your correct contact number

Please ensure that you provide a correct contact number, as AWS verifies your identity through a phone call to that number.

FIGURE 1-2 VERIFY THE DETAILS

1.1.3 Choose a Plan for your usage

Choose the basic plan for trial usage. This plan is good enough to create the cluster and to play around with it.

FIGURE 1-3 CHOOSE A PLAN

1.2 Login to AWS

Log in to your AWS account and open the ‘AWS Management Console’.

FIGURE 1-4 AWS MANAGEMENT CONSOLE

Choose EC2 and open the EC2 Dashboard to create the cluster member servers.

FIGURE 1-5 EC2 DASHBOARD

1.3 Creating Cluster member servers

Click on ‘Launch Instance’ and choose ‘Classic Wizard’ to create, configure and launch your cluster servers.

1.3.1 Choose a free tier eligible instance

Choose an instance configuration. All the options marked with an orange star are free tier eligible (if used with a micro instance).

FIGURE 1-6 QUICK LAUNCH

Choose Ubuntu 12.04.2 LTS. Remember to change the number of instances to 4. This creates four Ubuntu instances simultaneously.

FIGURE 1-7 CHOOSE INSTANCE DETAILS

Ensure that you choose the free tier for the setup. Keep the defaults, but change the root volume to 5 or 6 GiB so that the total disk usage (4 × 5 = 20 GiB) stays below the free tier limit of 30 GiB per month.
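The storage arithmetic above can be checked with a short shell sketch. The 30 GiB limit and the four instances come from this guide; the function name `total_storage_gib` is made up for illustration.

```shell
#!/bin/sh
# Total root-volume storage for the cluster versus the free tier limit.
# The limit (30 GiB) and node count (4) are the figures quoted in this guide.
FREE_TIER_LIMIT_GIB=30

total_storage_gib() {
    # $1 = number of instances, $2 = root volume size in GiB
    echo $(( $1 * $2 ))
}

for size in 5 6; do
    total=$(total_storage_gib 4 "$size")
    if [ "$total" -le "$FREE_TIER_LIMIT_GIB" ]; then
        echo "4 x ${size} GiB = ${total} GiB (within the ${FREE_TIER_LIMIT_GIB} GiB limit)"
    else
        echo "4 x ${size} GiB = ${total} GiB (exceeds the ${FREE_TIER_LIMIT_GIB} GiB limit)"
    fi
done
```

Both suggested sizes leave headroom: 5 GiB volumes use 20 GiB in total and 6 GiB volumes use 24 GiB.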

FIGURE 1-8 INSTANCE DETAILS

Choose a name and add any other tags for billing or operations purposes.

FIGURE 1-9 CHOOSE NAME AND TAGS

1.3.2 Create a key pair

This is the most important part of creating and launching the AWS instances. AWS provides private/public key based access to the servers. You can choose a previously created key pair or create a new one. We will create and download a fresh key pair. Keep the key pair file (.pem) safe on your PC, as it will be needed to access the servers.

FIGURE 1-10 CREATE AND DOWNLOAD A KEY PAIR

1.3.3 Configure Security Group and Firewall settings

You need to choose a security group to control access to the services on the servers. You can create a new group or use an existing one.

Create a group with the default options and add ‘All TCP’, ‘All ICMP’ and ‘SSH (22)’ under the inbound rules. This allows ping, SSH, and similar commands among the servers and from any other machine on the internet.

These protocols and ports are also required for communication among the cluster servers. As this is a test setup, we allow access to all for TCP, ICMP and SSH and do not configure individual server ports or tighter security.
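For readers who prefer the command line, the same three inbound rules could be expressed with the AWS CLI. The sketch below only prints the commands and runs nothing against AWS. It assumes the AWS CLI is installed and configured, and the group name `hadoop-cluster` is a hypothetical placeholder; check the flags against your CLI version before running anything.

```shell
#!/bin/sh
# Print (do not execute) the AWS CLI commands for the inbound rules above.
# GROUP is a hypothetical security group name, not one from this guide.
GROUP="hadoop-cluster"
CIDR="0.0.0.0/0"   # open to any machine on the internet, as in this test setup

ingress_cmd() {
    # $1 = protocol, $2 = port or port range (-1 means all ICMP types)
    echo "aws ec2 authorize-security-group-ingress --group-name $GROUP --protocol $1 --port $2 --cidr $CIDR"
}

ingress_cmd tcp 0-65535   # 'All TCP'
ingress_cmd icmp -1       # 'All ICMP'
ingress_cmd tcp 22        # 'SSH (22)'; already covered by 'All TCP'
```

Printing the commands first makes it easy to review them, since rules this open are suitable only for a learning setup.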

FIGURE 1-11 CONFIGURE FIREWALL

1.3.4 Review the pre-launch

Review all the settings before you proceed with the server creation.

FIGURE 1-12 REVIEW THE SERVER CREATION

1.3.5 Launch the servers

Launch the servers and review the Instance page for the newly launched servers.

FIGURE 1-13 INSTANCE REVIEW AT EC2 DASHBOARD

Rename the servers according to their roles in the cluster.

FIGURE 1-14 RENAME THE SERVERS AS PER THEIR ROLES

Here is the final list of instances:

FIGURE 1-15 SERVER DETAILS

Make a note of the public URL (public DNS name) of each server, such as ‘ec2-54-212-38-184.us-west-2.compute.amazonaws.com’. These URLs will be used to access the servers from your PC and to monitor HDFS health from your browser.

FIGURE 1-16 SERVER DETAILS

1.4 Setup client access to AWS servers

You need to set up password-less SSH access among the servers to set up the cluster, especially from the Master server to the Slave servers, so that the Master server can remotely start the Data Node and Task Tracker services on the Slave servers.

1.4.1 Generate the Public/Private KeyPair

Download ‘putty’ to access the AWS EC2 servers. Also download ‘puttygen’ to generate the public/private keypair from the ‘.pem’ file created in Step 1.3.2 (Create a key pair).

1.4.2 Import keypair and save public/private keys

Open ‘puttygen’ and import the ‘.pem’ file downloaded to your PC in Step 1.3.2 (Create a key pair).

FIGURE 1-17 IMPORT THE KEY PAIR

You can set a passphrase to protect your private key, or leave the passphrase fields blank to use the private key without one. The passphrase protects against unauthorized access to the servers by anyone using your machine and your private key. With a passphrase-protected private key, you must enter the passphrase every time you use the key to access an AWS EC2 server.

FIGURE 1-18 CREATE PUBLIC/PRIVATE KEYS

1.4.3 Access the AWS EC2 servers

Access the servers using the private key created in Step 1.4.2 (Import keypair and save public/private keys), and note down their hostnames and IP addresses using the ifconfig command.
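Once logged in to a server, the hostname and IP address can be collected with a few lines of shell. This is a minimal sketch, assuming the `ifconfig` output format of Ubuntu 12.04 (lines of the form `inet addr:10.x.x.x ...`); the helper name `extract_ipv4` is made up for illustration, and newer releases print a different layout.

```shell
#!/bin/sh
# Print the server's hostname and its first non-loopback IPv4 address.
extract_ipv4() {
    # Reads ifconfig output on stdin, keeps the value after "inet addr:",
    # drops loopback addresses, and prints the first remaining address.
    sed -n 's/.*inet addr:\([0-9.]*\).*/\1/p' | grep -v '^127\.' | head -n 1
}

echo "hostname: $(hostname)"
# Only query ifconfig where it is installed.
if command -v ifconfig >/dev/null 2>&1; then
    echo "ip: $(ifconfig | extract_ipv4)"
fi
```

Run it on each of the four servers and keep the results; they are needed for the /etc/hosts entries later in this section.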

FIGURE 1-19 ADD THE PRIVATE KEY TO PUTTY

You may receive the following error if you have not appropriately configured your security group in Step 1.3.3.

FIGURE 1-20 ADD THE PRIVATE KEY TO PUTTY

Note the IP address and update the /etc/hosts file with the hostname and IP address.

FIGURE 1-21 HOST IP ADDRESS

Change the hostname to the public URL of the AWS EC2 server using the following command:

$ sudo hostname ec2-54-214-206-65.us-west-2.compute.amazonaws.com

FIGURE 1-22 CHANGE HOSTNAME

Edit /etc/hosts with the Public ID of your AWS EC2 server:

$ sudo vi /etc/hosts

FIGURE 1-23 HOSTNAME CHANGE
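As an alternative to editing the file by hand, the entry can be appended from the shell. The sketch below is a hedged illustration: the function name `add_host_entry` and the IP address 10.0.0.1 are placeholders, and by default it writes to a temporary scratch file rather than /etc/hosts so that it can be tried safely.

```shell
#!/bin/sh
# Append "IP hostname" to a hosts file unless the hostname is already listed.
# HOSTS_FILE defaults to a temporary scratch file; set HOSTS_FILE=/etc/hosts
# and run with sudo to change a real server.
HOSTS_FILE="${HOSTS_FILE:-$(mktemp)}"

add_host_entry() {
    # $1 = IP address, $2 = hostname
    touch "$HOSTS_FILE"
    if ! grep -qF -- "$2" "$HOSTS_FILE"; then
        printf '%s %s\n' "$1" "$2" >> "$HOSTS_FILE"
    fi
}

# 10.0.0.1 is a placeholder; use the address noted from ifconfig instead.
add_host_entry 10.0.0.1 ec2-54-214-206-65.us-west-2.compute.amazonaws.com
cat "$HOSTS_FILE"
```

Because the function skips hostnames that are already present, it can be re-run on the same server without creating duplicate lines.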

Also, repeat all the steps in this section (1.4.3) on the other three cluster servers to enable public access to these AWS EC2 servers.

1.4.4 Setup WINSCP access to AWS EC2 servers

Use the private key created in Step 1.4.2 (Import keypair and save public/private keys) to access the servers from your desktop with WinSCP, so that you can upload and download files between your PC and the servers.

FIGURE 1-24 SETUP WINSCP

Copy the .pem file and other keys to the Master server using WinSCP.

You are now ready with the infrastructure to create your first Apache Hadoop cluster.

Please review Part 2 of this guide to create the Apache Hadoop cluster.