Tutorial on Hadoop Environment for ECE5610
Login to the Hadoop Server
Host name: 141.217.24.182, Port: 8001
• If you are using Linux, you can log in with the following command (substitute your own AccessID):
ssh -p 8001 yourAccessID@141.217.24.182
• Alternatively, use putty.exe to log in:
Username: ab1234 (your AccessID)
Password: your 9-digit student ID
Copy WordCount.java
• There are two files (“WordCount.java” and “input.txt”) in the “/opt” directory. Copy both files to your home directory.
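The copy step can be sketched as a small shell function. The function name `copy_tutorial_files` and its two optional arguments are hypothetical, introduced only for illustration; the defaults follow the locations named above.

```shell
# copy_tutorial_files [SRC_DIR] [DEST_DIR]
# Hypothetical helper: copies the two tutorial files.
# Defaults follow the tutorial: the files live in /opt and go to your home directory.
copy_tutorial_files() {
    src="${1:-/opt}"
    dest="${2:-$HOME}"
    cp "$src/WordCount.java" "$src/input.txt" "$dest/"
}
```

On the Hadoop server, calling `copy_tutorial_files` with no arguments should be equivalent to running `cp /opt/WordCount.java /opt/input.txt ~/` directly.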
Compile WordCount.java
• Compile the program with the following command:
compilemr WordCount
• Run the resulting executable jar file on Hadoop.
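The `compilemr` helper is specific to this server and its contents are not shown in the tutorial. As a rough sketch only, a manual build that might approximate it is given below; the function name `build_wordcount` and the `classes` output directory are assumptions, while `hadoop classpath`, `javac`, and `jar` are standard tools.

```shell
# build_wordcount: compile WordCount.java and package it as WordCount.jar.
# Assumption: this approximates what compilemr does; the real helper may differ.
build_wordcount() {
    mkdir -p classes
    # "hadoop classpath" prints the classpath needed to compile against Hadoop
    javac -classpath "$(hadoop classpath)" -d classes WordCount.java &&
        jar cf WordCount.jar -C classes .
}
```

If `compilemr` is available, prefer it; the sketch is only meant to show what a compile-and-package step usually involves.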
Copy Files to HDFS (Hadoop Distributed File System)
• Before you run the WordCount program, you first need to copy the local input file to your HDFS home directory with the following command (replace accessID with your own AccessID):
hdfs dfs -copyFromLocal input.txt /home/accessID/input.txt
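To confirm that the upload worked, the copy can be followed by a directory listing. The sketch below wraps both steps in a hypothetical function, `upload_to_hdfs`, which takes a local file name and an AccessID; only the `hdfs dfs -copyFromLocal` and `hdfs dfs -ls` subcommands come from the HDFS shell itself.

```shell
# upload_to_hdfs FILE ACCESS_ID
# Hypothetical helper: copies FILE into /home/ACCESS_ID on HDFS, then lists that directory.
upload_to_hdfs() {
    hdfs dfs -copyFromLocal "$1" "/home/$2/$(basename "$1")" &&
        hdfs dfs -ls "/home/$2"
}
```

For example, `upload_to_hdfs input.txt jim` would copy the file to /home/jim/input.txt and then list /home/jim.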
Run the Program on Hadoop
• The command to execute the jar file:
hadoop jar WordCount.jar WordCount [input_path] [output_path]
• Both the input path and the output path have to be located in your HDFS home directory (/home/yourAccessID/). Make sure the output path does not already exist before you run the program.
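Because the job needs an output path that does not yet exist, it can help to check for a leftover output path first. The sketch below assumes the input is /home/ACCESS_ID/input.txt and the output is /home/ACCESS_ID/result.txt; the function name `run_wordcount` is hypothetical. Note that it deletes any existing output, so use it only when the old results are no longer needed.

```shell
# run_wordcount ACCESS_ID
# Hypothetical helper: removes a stale output path, then runs the WordCount job.
run_wordcount() {
    input="/home/$1/input.txt"
    output="/home/$1/result.txt"
    # "hdfs dfs -test -e" exits with status 0 when the path exists
    if hdfs dfs -test -e "$output"; then
        hdfs dfs -rm -r "$output"   # WARNING: deletes the previous results
    fi
    hadoop jar WordCount.jar WordCount "$input" "$output"
}
```

For example, `run_wordcount jim` would run the job on /home/jim/input.txt and write to /home/jim/result.txt.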
Browse files on HDFS
• The command to view the input file (here for the example user jim):
hdfs dfs -cat /home/jim/input.txt | less
• The command to view the output file:
hdfs dfs -cat /home/jim/result.txt/* | less
This shows the output produced by your run.
Note: The output result.txt is in fact not a file but a directory in HDFS, so we need to use * to browse all of its contents.
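Since the output is a directory, it can be useful to list it before printing its contents. The sketch below uses two hypothetical functions, `show_output` and `fetch_output`, and assumes the output directory is /home/ACCESS_ID/result.txt. MapReduce jobs typically name their result files with a `part-` prefix, but the exact names depend on the job, so treat the `part-*` pattern as an assumption.

```shell
# show_output ACCESS_ID
# Hypothetical helper: lists the output directory, then prints the part files.
show_output() {
    hdfs dfs -ls "/home/$1/result.txt" &&
        hdfs dfs -cat "/home/$1/result.txt/part-*"   # quoted so HDFS expands the pattern
}

# fetch_output ACCESS_ID LOCAL_FILE
# Hypothetical helper: merges all files in the output directory into one local file.
fetch_output() {
    hdfs dfs -getmerge "/home/$1/result.txt" "$2"
}
```

For example, `show_output jim | less` pages through the results, and `fetch_output jim result_local.txt` saves a merged copy in the current local directory.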
Other Commands of HDFS
• Type “hdfs dfs” with no arguments, and you will see a list of all the HDFS commands.
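A few commonly used subcommands are sketched below as a starting point. The function name `hdfs_examples` and the local file name `input_copy.txt` are hypothetical; the subcommands themselves (`-help`, `-ls`, `-du`, `-copyToLocal`) are part of the standard HDFS shell.

```shell
# hdfs_examples ACCESS_ID
# Hypothetical helper: runs a few common HDFS shell commands on your home directory.
hdfs_examples() {
    hdfs dfs -help ls                                            # help for one subcommand
    hdfs dfs -ls "/home/$1"                                      # list your HDFS home directory
    hdfs dfs -du -h "/home/$1"                                   # disk usage, human-readable sizes
    hdfs dfs -copyToLocal "/home/$1/input.txt" input_copy.txt    # copy a file from HDFS to local disk
}
```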