
Installing Hadoop and Running a Word Count Application
Cahya Perdana
201583373

1. Create Instances on Amazon Web Service

Choose an Ubuntu image.

Add 4 nodes, so we will have 4 nodes (1 master node, 1 secondary master, and 2 slaves).

Set up storage for each node. We recommend that each node have 29 GB.

When we reach the Configure Security Group step, just click Review and Launch.

Finally, click Launch to launch all the instances.

When the dialog box to download the key pair appears, click Download Key Pair. This key pair will be used to connect to all the instances we have created.

All the instances we have created are now running.

To make it easy to tell which instance is the master node and which are the slave nodes, we will rename the instances to something easy to remember.

Now we add the protocols we want to allow in the security group: SSH on port 22, All TCP with source Anywhere, and All ICMP with source Anywhere. With these settings we can connect to our instances remotely and even check them with ping.

2. Remote Instances

To access our instances remotely, we need some tools:
PuTTY Key Generator: a tool that generates the access key used to connect to our instances on Amazon Web Services.
PuTTY: a tool that connects to our instances over SSH.

First we are going to generate the key. Import the key pair that we downloaded earlier into PuTTY Key Generator.

Save the private key from PuTTY Key Generator.

Now open the PuTTY client, then set the SSH key in the SSH menu.

Choose Session, and fill in the public DNS of our instance.

Log in as the user ubuntu.

Now open all the instances in PuTTY.

Now set the hostname on each node according to this table:

AMI Name    Public DNS                                             IP
Master      ec2-54-191-198-212.us-west-2.compute.amazonaws.com     172.31.45.31
slave2      ec2-52-11-196-185.us-west-2.compute.amazonaws.com      172.31.45.30
slave1      ec2-54-191-199-0.us-west-2.compute.amazonaws.com       172.31.45.29
secondary   ec2-54-191-198-251.us-west-2.compute.amazonaws.com     172.31.45.28

Edit the hosts file on each node using the command $sudo vi /etc/hosts (root access is needed to write to /etc/hosts).
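As a rough sketch, using the private IPs from the table above (the short hostnames here are an assumption; use whatever names you gave your nodes), the entries added to /etc/hosts on every node would look like this:

172.31.45.31    master
172.31.45.28    secondary
172.31.45.29    slave1
172.31.45.30    slave2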

Then we will upload our key to the master node, so that the master node can connect to the other nodes via SSH. To upload the file to the master node, we will use WinSCP. Set up a connection in WinSCP to the master node.

After the connection to the master node is set up, we upload the key file (hadoop.pem).

3. Download Hadoop

These instructions apply to all nodes. First we add the Java PPA and update our package lists so we can get the latest Java SDK.
$sudo add-apt-repository ppa:webupd8team/java
$sudo apt-get update

Run this command to install Java Development Kit version 7: $sudo apt-get install oracle-java7-installer. Check the installation with the command $java -version

Download Hadoop with this command:
$wget http://apache.mirror.gtcomm.net/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz

Extract the file using the command:
$tar xzvf hadoop-1.2.1.tar.gz

Rename the folder using this command:
$mv hadoop-1.2.1 hadoop

Do all of these steps on the secondary master and all the slave nodes as well.

4. Set Up Hadoop.

Before we set up Hadoop, we need to check whether our master node can reach all the nodes we created earlier. To check, we will connect to the other nodes using SSH. First we have to add the key that we uploaded to the master node from our computer. Run this command:
$ssh-add hadoop.pem

Now we will check whether our SSH setup works by running:
$ssh ubuntu@<public DNS of another node>
using the Public DNS values from the table above.

Now check the other nodes in the same way.
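A minimal sketch of the whole check from the master node, assuming the key was uploaded as /home/ubuntu/hadoop.pem and the hostnames from the table above are in /etc/hosts (the agent command and the loop are not in the original; they are just one way to run the check):

$ eval $(ssh-agent)                  # start an SSH agent so ssh-add has somewhere to store the key
$ ssh-add /home/ubuntu/hadoop.pem
$ for h in secondary slave1 slave2; do ssh ubuntu@$h hostname; done
                                     # each node should answer with its own hostname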

Next we will set up the Hadoop configuration. We do this on the master node and then copy all the configuration files to the other nodes.
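The commands below use a $HADOOP_CONF variable that is not defined earlier in this guide; we assume it points to the Hadoop configuration directory, for example:

$export HADOOP_CONF=/home/ubuntu/hadoop/conf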

hadoop-env.sh
The first step is to set up hadoop-env.sh with this command:
$vi $HADOOP_CONF/hadoop-env.sh
Add the Java parameter as in the example below.
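The line to add is roughly the following (assuming the Oracle JDK 7 installer placed Java under /usr/lib/jvm/java-7-oracle; adjust the path to your own installation):

export JAVA_HOME=/usr/lib/jvm/java-7-oracle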

core-site.xml
$vi $HADOOP_CONF/core-site.xml

Now add these properties:

<property>
  <name>fs.default.name</name>
  <value>hdfs://ec2-54-209-221-112.compute-1.amazonaws.com:8020</value>
</property>

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/ubuntu/hdfstmp</value>
</property>
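Note that in each of these files the <property> elements must sit inside the top-level <configuration> element; a complete core-site.xml would look roughly like this (the surrounding layout is assumed from the standard Hadoop template):

<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://ec2-54-209-221-112.compute-1.amazonaws.com:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/ubuntu/hdfstmp</value>
  </property>
</configuration>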

hdfs-site.xml
This file contains the configuration for the HDFS daemons: the master node, the secondary master, and the slave nodes.

<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>

<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>

mapred-site.xml
This file contains the configuration settings for the MapReduce daemons:

<property>
  <name>mapred.job.tracker</name>
  <value>hdfs://ec2-54-191-198-212.us-west-2.compute.amazonaws.com:8020</value>
</property>

Copy all of these configuration files from the master node to the other nodes:
$scp hadoop-env.sh core-site.xml hdfs-site.xml mapred-site.xml ubuntu@<public DNS of the node>:/home/ubuntu/hadoop/conf

Repeat this step for each of the other nodes.

5. Setup Master & Slave

Modify the masters and slaves files on the master node.

Modify the masters file with this command: $vi $HADOOP_CONF/masters. Then add the master node to this file.

Modify the slaves file with this command: $vi $HADOOP_CONF/slaves. Then add all the slave nodes.
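For the hostnames in the table above, the two files on the master node would contain roughly the following, one hostname per line (this illustration is an assumption; the original shows the contents only in screenshots):

masters:
master

slaves:
slave1
slave2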

Next, copy the masters and slaves files to the secondary master:
$scp masters slaves ubuntu@<public DNS of the secondary master>:/home/ubuntu/hadoop/conf

Modify the masters and slaves files on the slave nodes.

This step is almost the same as the previous one, but the masters file on a slave node should not list the master node, so we will leave that file empty, and in the slaves file we will add the slave node itself.

On each slave node:

Setup hadoop-env.sh: use the same Java setting as on the master.

Masters: delete all the values in this file, leaving it empty.

Slaves: add the slave node.

6. Start Up the Daemons

The first step to start our Hadoop cluster is formatting the Hadoop filesystem:
$hadoop namenode -format

Start all the Hadoop daemons from the master node with the commands:
$cd $HADOOP_CONF
$start-all.sh
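As a quick sanity check (not part of the original steps), you can list the running Java daemons with the jps tool that ships with the JDK; on the master you would expect to see something like a NameNode and a JobTracker process, and on the slaves a DataNode and a TaskTracker:

$jps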

Now, let's check our Hadoop cluster from the browser.

Link to check our Hadoop (HDFS) status:
ec2-54-191-198-212.us-west-2.compute.amazonaws.com:50070/dfshealth.jsp

Link to check a task tracker:
ec2-54-191-199-0.us-west-2.compute.amazonaws.com:50060/tasktracker.jsp

To quickly verify whether our Hadoop cluster is working, submit this command:
$hadoop jar hadoop-examples-1.2.1.jar pi 10 1000000

Running the WordCount Example
1. Create WordCount.java
$vi WordCount.java

package org.myorg;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

    // Mapper: emits (word, 1) for every whitespace-separated token in a line.
    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sums the counts for each word.
    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}

2. Compile WordCount.java with these commands:
$ mkdir wordcount_classes
$ javac -classpath /home/ubuntu/hadoop-core-1.2.1.jar -d wordcount_classes WordCount.java
$ jar -cvf /home/ubuntu/wordcount.jar -C wordcount_classes/ .

3. Prepare the input file.
Create an input file in .txt format:
$vi /home/ubuntu/input.txt
and add these sentences:
my name is cahya perdana. My name means in Indonesia is I am the first light of hope

4. Create a DFS folder for the input file.
Move to /home/ubuntu/hadoop/bin and run:
$./hadoop dfs -mkdir /home/ubuntu/wordcount/input

To copy input.txt into the DFS folder, run this command:
$./hadoop dfs -put /home/ubuntu/input.txt /home/ubuntu/wordcount/input
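Optionally, you can confirm that the file landed in HDFS (this check is not in the original steps):
$./hadoop dfs -ls /home/ubuntu/wordcount/input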

5. Run the Java application with this command:
$./hadoop jar /home/ubuntu/wordcount.jar org.myorg.WordCount /home/ubuntu/wordcount/input /home/ubuntu/wordcount/output

If the program runs correctly, the word counts are written to the output directory in HDFS.
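The result can be read back with $./hadoop dfs -cat /home/ubuntu/wordcount/output/part-00000 (the part file name is the usual default and is assumed here). For the sample sentence above, the counts would come out roughly like this, since the mapper splits only on whitespace and is case-sensitive:

I	1
Indonesia	1
My	1
am	1
cahya	1
first	1
hope	1
in	1
is	2
light	1
means	1
my	1
name	2
of	1
perdana.	1
the	1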