TCS Hadoop Components Setup Ajay Vaidya 8/4/2012

Hadoop Components Setup



Contents

    Purpose
    Interdependent Hadoop components
    Before Starting Installation
    Hadoop components
        Hadoop, HDFS and MapReduce
            Download and Unpack
            Setting Parameters
            Format filesystem and start Hadoop
            Test Hadoop installation
        HBase
            Download and Unpack
            Setting Parameters
            Start HBase
            Test HBase
        Hive
            Download and Unpack
            Setting Parameters
            Start Hive
            Test Hive
        Pig
            Download and Unpack
            Start Pig

Purpose

This document describes how to install the following Hadoop components in a single-machine environment. The installation procedure below was tested on openSUSE Linux running on VMware.

    1) Hadoop, HDFS and MapReduce

    2) HBase

    3) Hive

    4) Pig

This document also describes how these components relate to each other in terms of interdependency and parameter configuration.

Please contact Ajay Vaidya ([email protected]) for any queries about this document.

    Interdependent Hadoop components

    Hadoop provides storage mechanism using HDFS supported by Mapreduce framework. Other Hadoop

    family components are based on this storage mechanism.


Once you have installed all the components described in this document, your Linux filesystem will look like the following.

Before Starting Installation

Make sure you have a 64-bit Linux environment with root access. Java must also be installed in the environment.

    Hadoop components

    Hadoop, hdfs and Mapreduce

    Download and Unpack

Hadoop (which includes HDFS and MapReduce) can be downloaded from the Apache mirror site:

http://apache.techartifact.com/mirror/hadoop/common/stable/


(Note: always download the version from the stable reference.)

At the time of writing this document, the stable version was 1.0.3-1 and the file to be downloaded is hadoop-1.0.3-1.x86_64.rpm.

(Note: x86_64 indicates the 64-bit build.)

Hadoop can be installed in either a Single Node Setup or a clustered environment. Here we install Hadoop on a single machine in Single Node Setup:

>rpm -ivh hadoop-1.0.3-1.x86_64.rpm

After executing this installation command, files are created in three different places:

A) JAR files, stored under /usr/share/hadoop

B) Environment variable scripts and parameter XML files

C) The log file location, where logs are created when the Hadoop services are started. After a fresh installation this location contains no log files; they appear once the services start.

Setting Parameters

The parameter XML files are stored at /etc/hadoop. Make changes to three files: 1) core-site.xml, 2) hdfs-site.xml, 3) mapred-site.xml.

Examples of these three files are shown below. You can edit them with the vi editor, for example:

>vi core-site.xml

HDFS uses port 9000 to listen for incoming requests at the URL hdfs://localhost:9000.

You need to set only the dfs.replication property for a base Hadoop installation. The other properties are required for the HBase installation and its interworking with Hadoop.
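The original screenshots of these files are not reproduced here, but for a single-node setup they might look like the following minimal sketch. Only the hdfs://localhost:9000 address and the dfs.replication property come from the text above; the replication factor of 1 and the mapred.job.tracker value localhost:9001 are conventional single-node assumptions, not values confirmed by this document.

```xml
<!-- core-site.xml: HDFS listens on hdfs://localhost:9000 as described above -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: dfs.replication is the only property required for a base
     install; a replication factor of 1 is an assumed single-node value -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

<!-- mapred-site.xml: the job tracker address; localhost:9001 is the
     conventional single-node value and is an assumption here -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
```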

Format filesystem and start hadoop

Format the filesystem using the following command. Note: the hadoop command is located in /usr/bin, which is typically in the system path. If it is not, add the location of the hadoop command to the system path.

>hadoop namenode -format

By default, the namespace IDs of the name node and data node should match. If they do not match, you need to overwrite the data node ID with the name node ID.

As you can see, the namespace IDs for both are the same, i.e. namespaceID=233327041.

Start Hadoop by executing the start-all.sh command. By default it is located in /usr/sbin, which is typically in the system path.

After issuing the start-all.sh command, it prompts for a password; enter the password for root. Log files are created under the HADOOP_LOG_DIR location. Check the log files for any errors.

Test hadoop installation

Test whether you can use the hadoop command. For simplicity, try to create and list one folder using the hadoop fs command. If you are able to list the test folder, your Hadoop installation is successful.

Note that you will not see input and output folders unless they have been created, but you should see the test folder, since you created it using the hadoop fs command.
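For example, the test described above could be run as follows (these commands require the running Hadoop installation from the previous steps; the folder name /test is illustrative):

```shell
# create a test folder in HDFS, then list the root to confirm it exists
hadoop fs -mkdir /test
hadoop fs -ls /
```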

Hbase

Download and Unpack

HBase can be downloaded from the following Apache site. Always download the stable version.

http://apache.techartifact.com/mirror/hbase/stable/

At the time of writing this document, the HBase stable version was 0.92.1.

Download the file hbase-0.92.1.tar.gz and copy it to the folder on Linux where you want to install HBase.

Run the command >tar xfz hbase-0.92.1.tar.gz

This unpacks the tar file and creates a folder named hbase-0.92.1 in the same directory.

Setting Parameters

The HBase parameter file hbase-site.xml is located in the conf directory. Update its parameters as shown.

You can edit hbase-site.xml using the vi editor: >vi hbase-site.xml


The hbase.rootdir parameter indicates the location where HBase stores its data on Hadoop HDFS.

Note that the port number is 9000, the same as we set up for the name node while installing Hadoop.

Make sure that the /etc/hosts file has an entry mapping 127.0.0.1 to localhost.
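Since the original screenshot of hbase-site.xml is not reproduced here, the relevant part might look like this minimal sketch. The port 9000 matches the name node configured earlier; the /hbase path segment is an assumed convention, not a value taken from this document.

```xml
<!-- hbase-site.xml: hbase.rootdir points HBase at HDFS on port 9000,
     matching the name node set up during the Hadoop installation;
     the /hbase directory name is an assumption -->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
</configuration>
```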

Start Hbase

The hbase command is located in the bin folder under the directory where the HBase files were unpacked. If this bin directory is not in the system path, you need to either specify the full path or change to the bin folder to execute hbase.

Start HBase using the start-hbase.sh command.

Check for any errors in the log files located under the hbase-0.92.1/logs directory.

Start the HBase shell using the hbase shell command.

Test hbase

Create a sample table mytable and put/get sample data to test the HBase installation.
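A minimal test session in the HBase shell might look like the following sketch. The table name mytable comes from the text above; the column family cf and the row/column/value names are illustrative assumptions.

```
hbase> create 'mytable', 'cf'
hbase> put 'mytable', 'row1', 'cf:col1', 'value1'
hbase> get 'mytable', 'row1'
hbase> scan 'mytable'
```

If the put and get succeed and the scan shows the stored row, the HBase installation is working.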

Hive

Download and Unpack

Hive can be downloaded from the following Apache mirror location. Always use the stable version.

At the time of writing this document, the stable version was 0.8.1.

http://apache.techartifact.com/mirror/hive/stable/

Download hive-0.8.1.tar.gz and copy it onto the Linux system where you want to install Hive.

Run the command >tar xfz hive-0.8.1.tar.gz

This creates the folder hive-0.8.1.

Setting Parameters

Define the HIVE_HOME environment variable and include Hive's bin folder in the system path.

One way to do this is to define it in a Linux login script, for instance in the .bash_profile file.

This file can be edited using the vi editor: >vi .bash_profile
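For instance, .bash_profile could contain lines like the following sketch. The install path /opt/hive-0.8.1 is an assumption; use the directory where you actually unpacked hive-0.8.1.

```shell
# point HIVE_HOME at the unpacked hive-0.8.1 directory (assumed location)
export HIVE_HOME=/opt/hive-0.8.1
# include Hive's bin folder in the system path
export PATH=$PATH:$HIVE_HOME/bin
```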

Create the /tmp and /user/hive/warehouse directories in HDFS using the following commands:

>hadoop fs -mkdir /tmp

>hadoop fs -mkdir /user/hive/warehouse


>hadoop fs -chmod g+w /tmp

>hadoop fs -chmod g+w /user/hive/warehouse

The folders look like this once created.

Start Hive

Start the Hive shell by executing the hive command, which is located in the hive-0.8.1/bin folder.

Test hive

Test Hive by creating a sample table through the shell.
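For example, from the Hive shell (the table and column names below are illustrative, not taken from this document):

```sql
-- create a sample table, then list and describe it to verify the install
CREATE TABLE sample (id INT, name STRING);
SHOW TABLES;
DESCRIBE sample;
```

If SHOW TABLES lists the new table, the Hive installation is working against the warehouse directory created above.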

Pig

Download and Unpack

Pig can be downloaded from the following Apache location. Always use the stable release.

http://apache.techartifact.com/mirror/pig/stable/

At the time of writing this document, the Pig stable release was 0.10.0.

Download the file pig-0.10.0.tar.gz and copy it to the location on the Linux filesystem where you want to install Pig.

Run the command >tar xfz pig-0.10.0.tar.gz

This creates the folder pig-0.10.0.


Start Pig

The Pig command line can be started by executing the pig command, which is located in the pig-0.10.0/bin folder.
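Once started, the Pig grunt shell accepts Pig Latin statements. A minimal sketch of a test session follows; the input file /test/input.txt and its single-field layout are assumptions (any text file previously copied into HDFS would do).

```
grunt> lines = LOAD '/test/input.txt' AS (line:chararray);
grunt> grouped = GROUP lines ALL;
grunt> counts = FOREACH grouped GENERATE COUNT(lines);
grunt> DUMP counts;
```

If DUMP prints the line count of the input file, Pig is reading from HDFS correctly.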