Tutorial on Hadoop Environment for ECE5610


Login to the Hadoop Server

Host name: 141.217.24.182, Port: 8001


If you are using Linux, you can simply use the following command:
ssh -p 8001 [email protected]


• Use putty.exe to log in.
Username: ab1234 (your AccessID)
Password: your 9-digit student ID


Copy WordCount.java

• There are two files (“WordCount.java” and “input.txt”) in the “/opt” directory. Copy both files to your home directory.
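The copy step above can be sketched as a small shell function. The function name `copy_tutorial_files` and its two optional arguments are illustrative assumptions, not part of the course setup; only the `/opt` location and the two file names come from the tutorial.

```shell
# Copy the two tutorial files into a destination directory.
# Arguments (both hypothetical conveniences):
#   $1  source directory, defaults to /opt as named in the tutorial
#   $2  destination directory, defaults to your home directory
copy_tutorial_files() {
    src_dir="${1:-/opt}"
    dest_dir="${2:-$HOME}"
    cp "$src_dir/WordCount.java" "$src_dir/input.txt" "$dest_dir/"
}
```

On the Hadoop server, calling `copy_tutorial_files` with no arguments should be equivalent to `cp /opt/WordCount.java /opt/input.txt ~/`.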


Compile WordCount.java

• Compile the program with the following command:
compilemr WordCount
• Run the resulting executable jar file on Hadoop.


Copy Files to HDFS (Hadoop Distributed File System)

• Before you run the WordCount program, you first need to copy the local input file to your home directory on HDFS with the following command:
hdfs dfs -copyFromLocal input.txt /home/accessID/input.txt
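One way to confirm that the upload worked is to list your HDFS home directory right after copying. The sketch below wraps both steps in a function; the function name `upload_input` and the `HDFS_DFS` override variable are hypothetical helpers introduced here (the override only exists so the commands can be dry-run with `echo`). It assumes a bash or sh shell, where the unquoted `$HDFS_DFS` is split into words.

```shell
# HDFS_DFS is a hypothetical override; by default it is the real "hdfs dfs" client.
HDFS_DFS="${HDFS_DFS:-hdfs dfs}"

# Upload input.txt from the current local directory to /home/<AccessID>/ on HDFS,
# then list that directory so the new file can be seen.
upload_input() {
    access_id="$1"
    $HDFS_DFS -copyFromLocal input.txt "/home/$access_id/input.txt"
    $HDFS_DFS -ls "/home/$access_id"
}
```

For example, `upload_input ab1234` would copy the file and then list `/home/ab1234`, where `ab1234` stands for your own AccessID.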


Run the Program on Hadoop

• The command to execute the jar file:
hadoop jar WordCount.jar WordCount [input_path] [output_path]

• Both the input path and the output path have to be located in your HDFS home directory (/home/yourAccessID/). Make sure output_path does not exist before you run the program.
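Because the job expects the output path not to exist yet, it can help to check for it first. The following is a minimal sketch, assuming a bash or sh shell; the function name `run_wordcount` and the `HDFS_DFS` and `HADOOP` override variables are hypothetical and exist only so the logic can be tried without a cluster. It relies on `hdfs dfs -test -e`, which exits with status 0 when the path exists.

```shell
# Hypothetical overrides; by default these are the real Hadoop clients.
HDFS_DFS="${HDFS_DFS:-hdfs dfs}"
HADOOP="${HADOOP:-hadoop}"

# Run the WordCount job only if the output path does not exist yet.
run_wordcount() {
    input_path="$1"
    output_path="$2"
    # "-test -e" succeeds (exit status 0) when the path already exists on HDFS.
    if $HDFS_DFS -test -e "$output_path"; then
        echo "Output path $output_path already exists; remove it or choose another." >&2
        return 1
    fi
    $HADOOP jar WordCount.jar WordCount "$input_path" "$output_path"
}
```

A call might look like `run_wordcount /home/ab1234/input.txt /home/ab1234/result.txt`. A leftover output directory from an earlier run can be deleted with `hdfs dfs -rm -r` followed by its path.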


Browse files on HDFS

• The command to view the input file (jim is an example AccessID):
hdfs dfs -cat /home/jim/input.txt | less
• The command to view the output file:
hdfs dfs -cat /home/jim/result.txt/* | less
The output shows the results of your run.

Note: The output result.txt is actually a directory in HDFS, not a file, so we need to use * to display all of its contents.
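To see what the output directory actually contains, it can be listed before its contents are printed. The sketch below assumes a bash or sh shell and the usual MapReduce naming, where result files begin with `part-`; the function names and the `HDFS_DFS` override are hypothetical. The glob is quoted so that the HDFS client, not the local shell, expands it.

```shell
# Hypothetical override; by default this is the real "hdfs dfs" client.
HDFS_DFS="${HDFS_DFS:-hdfs dfs}"

# List the job's output directory, then print the result files inside it.
show_job_output() {
    output_dir="$1"
    $HDFS_DFS -ls "$output_dir"
    $HDFS_DFS -cat "$output_dir/part-*"
}

# Merge all files in the output directory into one local file.
fetch_job_output() {
    output_dir="$1"
    local_file="$2"
    $HDFS_DFS -getmerge "$output_dir" "$local_file"
}
```

For example, `show_job_output /home/jim/result.txt | less` pages through the results, and `fetch_job_output /home/jim/result.txt result_local.txt` saves a merged copy in the current local directory.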


Other Commands of HDFS

• Type “hdfs dfs” with no arguments to list all the available HDFS commands.
