Working with Hadoop

Requirements

• Virtual machine software
  – VMware
  – VirtualBox

• Virtual machine images
  – Download from Cloudera (founded by leaders in the field; Hadoop co-creator Doug Cutting joined the company in 2009)

Start the Virtual Machine

Inside the Virtual Machine

• CentOS 6.4
• JDK
• Hadoop 2.5.0
• Eclipse 4.2.6 (Juno)

Basics of HDFS (routine)

• With Terminal
  – hadoop
  – hadoop version
  – hadoop jar
  – hadoop fs …
  – hadoop fs -ls: list files in HDFS (your HDFS home directory when no path is given)
  – hadoop fs -put / -get / -mkdir / -rmdir ...
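The commands above can be combined into a short round trip: create a directory in HDFS, upload a file, list it, and download it again. This is a minimal sketch, assuming the Cloudera VM, where the cloudera user works under /user/cloudera. The directory name hdfs_demo and both file names are hypothetical. The HDFS steps are skipped when the hadoop command is not installed.

```shell
#!/bin/sh
# Local scratch file to upload (hypothetical name and content).
echo "hello hdfs" > hdfs_demo_local.txt
rm -f hdfs_demo_copy.txt    # -get refuses to overwrite an existing local file

# Hypothetical HDFS target directory under the cloudera user's home.
HDFS_DIR=/user/cloudera/hdfs_demo

if command -v hadoop >/dev/null 2>&1; then
  hadoop version                                  # show the installed Hadoop version
  hadoop fs -mkdir -p "$HDFS_DIR"                 # create the directory (and any parents)
  hadoop fs -put hdfs_demo_local.txt "$HDFS_DIR"  # local file system -> HDFS
  hadoop fs -ls "$HDFS_DIR"                       # list the directory contents
  hadoop fs -get "$HDFS_DIR/hdfs_demo_local.txt" hdfs_demo_copy.txt  # HDFS -> local
  hadoop fs -rm -r "$HDFS_DIR"                    # clean up the demo directory
else
  echo "hadoop not found; run this inside the VM"
fi
```

Note that -rmdir only removes empty directories, which is why the cleanup step uses -rm -r instead.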

Copy Files from Windows to VM

• WinSCP (see demo at bin\scp_ssh\winscp575)
  – Protocol: SCP
  – Hostname: the VM's IP address (get it from ifconfig in Terminal)
  – Username/Password = cloudera/cloudera

Copy Files from VM (CentOS) to HDFS

• hadoop fs -put localfiles /user/cloudera

Copy Files from Windows to HDFS

• Via HUE services

Using web server – port 8888 (File manager)

Hadoop Administration

• http://hostname:50070/dfshealth.html#tab-overview

WordCount Example in Hadoop

• #1: Via guidelines in Cloudera website
• #2: Directly in Eclipse (Preferred)

WordCount in Cloudera Website

• http://www.cloudera.com/content/cloudera/en/documentation/hadoop-tutorial/CDH5/Hadoop-Tutorial/ht_wordcount1.html

• Source code downloaded from http://tiny.cloudera.com/hadoopTutorialSample

• Source code details and explanations: http://www.cloudera.com/content/cloudera/en/documentation/hadoop-tutorial/CDH5/Hadoop-Tutorial/ht_wordcount1_source.html

WordCount in Cloudera Website

• Create directory in HDFS
  – $ hadoop fs -mkdir /user/cloudera
  – $ hadoop fs -chown cloudera /user/cloudera
  – $ hadoop fs -mkdir /user/cloudera/wordcount /user/cloudera/wordcount/input
• Create sample text
  – 1: Directly in CentOS
    $ echo "Hadoop is an elephant" > file0
    $ echo "Hadoop is as yellow as can be" > file1
    $ echo "Oh what a yellow fellow is Hadoop" > file2
    And then move to HDFS
    $ hadoop fs -put file* /user/cloudera/wordcount/input
  – 2: Create in Windows and copy to HDFS via HUE
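Because the sample input is tiny, the word counts the job should report can be checked locally before anything runs on Hadoop. The sketch below recreates the three files in a scratch directory (wordcount_sample, a hypothetical name) and counts words with standard Unix tools. It is a stand-in for the MapReduce job, not part of the Cloudera tutorial, and it splits on spaces only.

```shell
#!/bin/sh
# Recreate the three sample files in a scratch directory.
mkdir -p wordcount_sample
echo "Hadoop is an elephant" > wordcount_sample/file0
echo "Hadoop is as yellow as can be" > wordcount_sample/file1
echo "Oh what a yellow fellow is Hadoop" > wordcount_sample/file2

# One word per line, sort, count duplicates, then print "word<TAB>count".
cat wordcount_sample/file0 wordcount_sample/file1 wordcount_sample/file2 \
  | tr -s ' ' '\n' \
  | LC_ALL=C sort \
  | uniq -c \
  | awk '{ print $2 "\t" $1 }' > wordcount_sample/expected_counts.txt

cat wordcount_sample/expected_counts.txt
```

In this input, Hadoop and is each appear three times, as and yellow twice, and every other word once, so a correct WordCount run should report the same totals.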

WordCount in Cloudera Website

• Compilation error

WordCount Example in Hadoop

• #1: Via guidelines in Cloudera website
• #2: Directly in Eclipse (Preferred)

WordCount in Eclipse environment

• http://kishorer.in/2014/10/22/running-a-wordcount-mapreduce-example-in-hadoop-2-4-1-single-node-cluster-in-ubuntu-14-04-64-bit/

• https://www.youtube.com/watch?v=hJsaChh2Yhk (some parts are different for the Cloudera VM)

Update the source code (from the website)

Adding JAR files to Project

• /usr/lib/hadoop
• /usr/lib/hadoop/lib
• /usr/lib/hadoop-mapreduce
• /usr/lib/hadoop-mapreduce/lib
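Before adding the libraries in Eclipse, it can help to confirm that the four directories listed above exist and contain JAR files. This is a minimal sketch, assuming the directories are absolute paths as found on the Cloudera VM. On any other machine the loop simply reports them as missing.

```shell
#!/bin/sh
checked=0
for dir in /usr/lib/hadoop /usr/lib/hadoop/lib \
           /usr/lib/hadoop-mapreduce /usr/lib/hadoop-mapreduce/lib; do
  if [ -d "$dir" ]; then
    # Count only the JAR files directly inside this directory.
    count=$(find "$dir" -maxdepth 1 -name '*.jar' | wc -l)
    echo "$dir: $count JAR files"
  else
    echo "$dir: not found"
  fi
  checked=$((checked + 1))
done
echo "checked $checked directories"
```

Outside Eclipse, the hadoop classpath command prints the classpath that the installation itself uses, which can be passed to javac with the -cp option.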

Run Config

• Run > Run Configurations

File > Export

Update Properties in jar file

Prepare for run

• Make HDFS directory

Copy sample input to HDFS (via HUE)

Run the example (from the folder containing the .jar file)

• Make sure to remove the output folder before each run; the job fails if the output directory already exists.
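The run step might look like the sketch below. The JAR name wordcount.jar, the main class WordCount, and the output path are hypothetical; substitute whatever was exported from Eclipse. It also assumes the program takes the input and output directories as its two arguments, as WordCount examples commonly do. If the main class was already set in the JAR's manifest during export, omit the class name from the hadoop jar command.

```shell
#!/bin/sh
# Hypothetical names: adjust to the JAR exported from Eclipse and its main class.
JAR=wordcount.jar
MAIN_CLASS=WordCount
INPUT=/user/cloudera/wordcount/input
OUTPUT=/user/cloudera/wordcount/output   # assumed output location

if command -v hadoop >/dev/null 2>&1 && [ -f "$JAR" ]; then
  # Remove any previous output directory; -f suppresses the error if it is absent.
  hadoop fs -rm -r -f "$OUTPUT"
  # Submit the job: hadoop jar <jar> <main class> <program arguments...>
  hadoop jar "$JAR" "$MAIN_CLASS" "$INPUT" "$OUTPUT"
else
  echo "skipping: need the hadoop command and $JAR in the current folder"
fi
```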

View the result
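Besides the HUE file manager, the result can be read from Terminal. This is a minimal sketch, assuming the job wrote to /user/cloudera/wordcount/output (a hypothetical path chosen to sit next to the input directory used earlier in these slides).

```shell
#!/bin/sh
# Assumed output directory; change it to the one passed to the job.
OUTPUT=/user/cloudera/wordcount/output

if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -ls "$OUTPUT"           # list the files the job produced
  hadoop fs -cat "$OUTPUT/part-*"   # print the counts; the glob is expanded by HDFS
else
  echo "hadoop not found; run this inside the VM"
fi
```

The pattern is quoted so that the local shell passes it through unchanged and Hadoop matches it against the files in HDFS.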

Other sources

• Very nice example @ https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html
