Working with Hadoop

Requirement

• Virtual machine software
– VMware
– VirtualBox

• Virtual machine images
– Download from Cloudera (founded by leaders in the field, including the father of Hadoop)

Start the Virtual Machine

Inside the Virtual machine

• CentOS 6.4
• JDK
• Hadoop 2.5.0
• Eclipse 4.2.6 (Juno)

Basics of HDFS (routine)

• With Terminal
– hadoop
– hadoop version
– hadoop jar
– hadoop fs …
– hadoop fs -ls: list the files in an HDFS directory
– hadoop fs -put / -get / -mkdir / -rmdir...
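A first session with these commands might look like the sketch below. The `run` helper is an addition for illustration only: it executes the command when the `hadoop` CLI is on the PATH and merely prints it otherwise, so the listing can also be tried outside the VM. The HDFS directory `/user/cloudera/demo` and the files `notes.txt` and `copy.txt` are hypothetical names.

```shell
#!/bin/sh
# Execute a command inside the VM; only print it where hadoop is missing.
run() {
  if command -v hadoop >/dev/null 2>&1; then
    "$@"
  else
    echo "(dry run) $*"
  fi
}

run hadoop version                                  # show the installed Hadoop version
run hadoop fs -ls /user/cloudera                    # list the user's HDFS home directory
run hadoop fs -mkdir /user/cloudera/demo            # create a directory in HDFS (hypothetical name)
echo "hello hdfs" > notes.txt                       # a small local file to experiment with
run hadoop fs -put notes.txt /user/cloudera/demo    # local file system -> HDFS
run hadoop fs -get /user/cloudera/demo/notes.txt copy.txt   # HDFS -> local file system
run hadoop fs -rm /user/cloudera/demo/notes.txt     # delete the file from HDFS
run hadoop fs -rmdir /user/cloudera/demo            # remove the now-empty directory
```

Note that `-rmdir` only removes empty directories, which is why the file is deleted first.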

Copy Files from Windows to VM

• WinSCP (see demo at bin\scp_ssh\winscp575)
– Protocol: scp
– Hostname: get from ifconfig in Terminal
– Username/Password = cloudera/cloudera

Copy Files from VM (CentOS) to HDFS

• hadoop fs -put localfiles /user/cloudera

Copy Files from Windows to HDFS

• Via HUE services

Using web server – port 8888 (File manager)

Hadoop Administration

• http://hostname:50070/dfshealth.html#tab-overview

WordCount Example in Hadoop

• #1: Via guidelines in Cloudera website
• #2: Directly in Eclipse (Preferred)

WordCount in Cloudera Website

• http://www.cloudera.com/content/cloudera/en/documentation/hadoop-tutorial/CDH5/Hadoop-Tutorial/ht_wordcount1.html

• Source code downloaded from http://tiny.cloudera.com/hadoopTutorialSample

• Source code details and explanations: http://www.cloudera.com/content/cloudera/en/documentation/hadoop-tutorial/CDH5/Hadoop-Tutorial/ht_wordcount1_source.html

• Create directory in HDFS
– $ hadoop fs -mkdir /user/cloudera
– $ hadoop fs -chown cloudera /user/cloudera
– $ hadoop fs -mkdir /user/cloudera/wordcount /user/cloudera/wordcount/input

• Create sample text
– 1: Directly in CentOS
$ echo "Hadoop is an elephant" > file0
$ echo "Hadoop is as yellow as can be" > file1
$ echo "Oh what a yellow fellow is Hadoop" > file2
And then move to HDFS
$ hadoop fs -put file* /user/cloudera/wordcount/input
– 2: Create in Windows and copy to HDFS via HUE
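Before running the job it can help to know which counts to expect. The sketch below recreates the three sample files in a temporary directory and counts the words locally with standard Unix tools. It does not use Hadoop at all: the `tr`, `sort`, and `uniq` pipeline is only a stand-in for the map, shuffle, and reduce phases, and the scratch directory and the `counts.txt` file name are assumptions made for illustration.

```shell
#!/bin/sh
# Local sanity check: count the words in the sample files without Hadoop.
workdir=$(mktemp -d)      # scratch directory (assumption, not part of the slides)
cd "$workdir" || exit 1

echo "Hadoop is an elephant" > file0
echo "Hadoop is as yellow as can be" > file1
echo "Oh what a yellow fellow is Hadoop" > file2

# "map": put one word per line; "shuffle": sort; "reduce": count equal neighbours.
cat file0 file1 file2 | tr -s ' ' '\n' | LC_ALL=C sort | uniq -c |
  awk '{ print $2 "\t" $1 }' > counts.txt

cat counts.txt            # one "word<TAB>count" pair per line
```

For these three files the result has 12 distinct words: `Hadoop` and `is` occur 3 times, `as` and `yellow` twice, and every other word once. The WordCount job should report the same counts, although the order of its output lines may differ.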

• Compilation error

WordCount in Eclipse environment

• http://kishorer.in/2014/10/22/running-a-wordcount-mapreduce-example-in-hadoop-2-4-1-single-node-cluster-in-ubuntu-14-04-64-bit/

• https://www.youtube.com/watch?v=hJsaChh2Yhk (Some parts are different for ClouderaVM)

Update source code (from website)

Adding JAR files to Project

• /usr/lib/hadoop
• /usr/lib/hadoop/lib
• /usr/lib/hadoop-mapreduce
• /usr/lib/hadoop-mapreduce/lib
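Before adding the libraries in Eclipse, a quick check in the terminal shows whether each directory exists and how many JAR files it holds. This is only a sketch: the `count_jars` helper is hypothetical, and the paths assume the directories sit under the root of the file system (with a leading `/`), as in a standard Cloudera VM layout.

```shell
#!/bin/sh
# Print how many .jar files sit directly inside the given directory
# (prints 0 if the directory does not exist).
count_jars() {
  find "$1" -maxdepth 1 -name '*.jar' 2>/dev/null | wc -l | tr -d ' '
}

for dir in /usr/lib/hadoop /usr/lib/hadoop/lib \
           /usr/lib/hadoop-mapreduce /usr/lib/hadoop-mapreduce/lib; do
  if [ -d "$dir" ]; then
    echo "$dir: $(count_jars "$dir") jar files"
  else
    echo "$dir: not found"
  fi
done
```

A directory reported as "not found" or with zero JAR files suggests that the Hadoop installation lives elsewhere on that machine.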

Run Config

• Run → Run Configurations

File → Export

Update Properties in jar file

Prepare for run

• Make HDFS directory

Copy sample input to HDFS (via HUE)

Run the example (in the .jar folder). Make sure to remove the output folder before use.
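From the terminal, this step might look like the sketch below. The JAR name `wordcount.jar`, the main class `WordCount`, and the input and output paths are assumptions; substitute whatever was chosen during the export and in the earlier steps. The `run` helper is also an addition for illustration: it executes the command when the `hadoop` CLI is available and only prints it otherwise.

```shell
#!/bin/sh
# Execute a command inside the VM; only print it where hadoop is missing.
run() {
  if command -v hadoop >/dev/null 2>&1; then "$@"; else echo "(dry run) $*"; fi
}

INPUT=/user/cloudera/wordcount/input      # assumed HDFS input directory
OUTPUT=/user/cloudera/wordcount/output    # assumed HDFS output directory

# The job refuses to start if the output directory already exists,
# so remove it first (-f keeps this quiet when there is nothing to remove).
run hadoop fs -rm -r -f "$OUTPUT"

# General form: hadoop jar <jar file> [main class] <arguments...>
# If the exported JAR's manifest already names a main class, leave out
# "WordCount" here; otherwise it would be passed to the program as an argument.
run hadoop jar wordcount.jar WordCount "$INPUT" "$OUTPUT"
```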

View the result
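Besides the HUE file manager, the result can also be inspected from the terminal. The sketch below assumes the job wrote to `/user/cloudera/wordcount/output`; that path is an assumption, as is the `run` helper, which executes the command when the `hadoop` CLI is available and only prints it otherwise.

```shell
#!/bin/sh
# Execute a command inside the VM; only print it where hadoop is missing.
run() {
  if command -v hadoop >/dev/null 2>&1; then "$@"; else echo "(dry run) $*"; fi
}

OUTPUT=/user/cloudera/wordcount/output    # assumed HDFS output directory

# List the output directory; each reducer writes one part-* file.
run hadoop fs -ls "$OUTPUT"

# Print the word counts. The pattern is quoted so that the local shell
# leaves it alone and hadoop expands it against HDFS.
run hadoop fs -cat "$OUTPUT/part-*"
```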

Other sources

• Very nice example @ https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html
