Hands-On Hadoop Tutorial
Chris Sosa, Wolfgang Richter
May 23, 2008


General Information

Hadoop uses HDFS, a distributed file system based on GFS, as its shared filesystem
HDFS architecture divides files into large chunks (~64 MB) distributed across data servers
HDFS has a global namespace

General Information (cont’d)

A script is provided for your convenience
– Run source /localtmp/hadoop/setupVars from centurion064
– Changes all uses of {somePath}/command to just command

Go to http://www.cs.virginia.edu/~cbs6n/hadoop for web access. These slides and more information are also available there.

Once you use the DFS (put something in it), relative paths are resolved from /usr/{your user id}. E.g., if your id is tb28, your “home dir” is /usr/tb28.
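The relative-path rule above can be sketched in a few lines of Python. This only illustrates the rule as the slide states it; the function name `resolve_dfs_path` is hypothetical and is not part of Hadoop.

```python
import posixpath


def resolve_dfs_path(path, user_id):
    """Resolve a DFS path the way the slide describes (illustrative only).

    Absolute paths are only normalized; relative paths are taken to be
    relative to /usr/<user_id>, the "home dir" named on the slide.
    """
    if posixpath.isabs(path):
        return posixpath.normpath(path)
    return posixpath.normpath(posixpath.join("/usr", user_id, path))


print(resolve_dfs_path("input/data.txt", "tb28"))  # /usr/tb28/input/data.txt
print(resolve_dfs_path("/tmp/data.txt", "tb28"))   # /tmp/data.txt
```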

Master Node

Hadoop is currently configured with centurion064 as the master node

Master node
– Keeps track of the namespace and metadata about items
– Keeps track of MapReduce jobs in the system

Slave Nodes

centurion064 also acts as a slave node

Slave nodes
– Manage blocks of data sent from the master node
– In terms of GFS, these are the chunkservers

Currently centurion060 is also a slave node

Hadoop Paths

Hadoop is locally “installed” on each machine
– Installed location is /localtmp/hadoop/hadoop-0.15.3
– Slave nodes store their data in /localtmp/hadoop/hadoop-dfs (this is automatically created by the DFS)
– /localtmp/hadoop is owned by group gbg (someone in this group, or a CS admin, must administer it)

Files are divided into 64 MB chunks (this is configurable)
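As a rough illustration of what the 64 MB chunk size means for a given file, the sketch below counts how many chunks a file would occupy. The helper name `chunks_needed` is hypothetical, and the calculation assumes a file is simply split into fixed-size chunks with a shorter final chunk.

```python
CHUNK_SIZE_BYTES = 64 * 1024 * 1024  # 64 MB, the chunk size quoted on the slide


def chunks_needed(file_size_bytes, chunk_size_bytes=CHUNK_SIZE_BYTES):
    """Return how many fixed-size chunks a file of the given size occupies."""
    if file_size_bytes < 0 or chunk_size_bytes <= 0:
        raise ValueError("file size must be >= 0 and chunk size must be > 0")
    # Integer ceiling division: a partial final chunk still counts as a chunk.
    return (file_size_bytes + chunk_size_bytes - 1) // chunk_size_bytes


# A 200 MB file: three full 64 MB chunks plus one 8 MB remainder.
print(chunks_needed(200 * 1024 * 1024))  # 4
```

Passing a different `chunk_size_bytes` shows the effect of changing the configured chunk size.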

Starting / Stopping Hadoop

For the purposes of this tutorial, we assume you have run the setupVars script from earlier

start-all.sh: starts all slave nodes and the master node
stop-all.sh: stops all slave nodes and the master node

Using HDFS (1/2)

hadoop dfs
– [-ls <path>]
– [-du <path>]
– [-cp <src> <dst>]
– [-rm <path>]
– [-put <localsrc> <dst>]
– [-copyFromLocal <localsrc> <dst>]
– [-moveFromLocal <localsrc> <dst>]
– [-get [-crc] <src> <localdst>]
– [-cat <src>]
– [-copyToLocal [-crc] <src> <localdst>]
– [-moveToLocal [-crc] <src> <localdst>]
– [-mkdir <path>]
– [-touchz <path>]
– [-test -[ezd] <path>]
– [-stat [format] <path>]
– [-help [cmd]]
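To show how a few of these options combine into a typical round trip, here is a small sketch that only builds and prints the command lines; it does not run Hadoop. The helper `dfs_command` and the file and directory names are hypothetical, and only options from the list above are used.

```python
import shlex


def dfs_command(option, *args):
    """Build the argv list for `hadoop dfs -<option> <args...>` without running it."""
    return ["hadoop", "dfs", "-" + option, *args]


# A typical round trip; the file and directory names are hypothetical.
workflow = [
    dfs_command("mkdir", "input"),
    dfs_command("put", "local.txt", "input/local.txt"),
    dfs_command("ls", "input"),
    dfs_command("cat", "input/local.txt"),
    dfs_command("get", "input/local.txt", "copy.txt"),
]

for argv in workflow:
    # shlex.quote keeps arguments containing spaces safe to paste into a shell.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

To actually execute a step, each list could be passed to `subprocess.run(argv, check=True)` on a machine where `hadoop` is on the PATH, for example after sourcing setupVars.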

Using HDFS (2/2)

Want to reformat? Easy
– hadoop namenode -format

Basically, most commands look similar
– hadoop “some command” options
– If you just type hadoop you get all possible commands (including undocumented ones, hooray)

To Add Another Slave

This adds another data node / job execution site to the pool
– Hadoop dynamically uses the filesystem underneath it
– If more space is available on the HDD, HDFS will try to use it when it needs to

Modify the slaves file
– In centurion064:/localtmp/hadoop/hadoop-0.15.3/conf
– Copy the code installation dir to newMachine:/localtmp/hadoop/hadoop-0.15.3 (very small)
– Restart Hadoop
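The "modify the slaves file" step can be sketched as below, assuming the slaves file lists one hostname per line. The helper `add_slave` is hypothetical and works on the file's text rather than touching the real file; the hostnames are the ones named in these slides.

```python
def add_slave(slaves_text, hostname):
    """Return slaves-file text with hostname appended, unless already listed."""
    # Assumption: one hostname per line; blank lines are ignored.
    hosts = [line.strip() for line in slaves_text.splitlines() if line.strip()]
    if hostname not in hosts:
        hosts.append(hostname)
    return "\n".join(hosts) + "\n"


current = "centurion064\ncenturion060\n"
print(add_slave(current, "newMachine"), end="")
```

Because the helper skips hosts that are already listed, running it twice with the same hostname leaves the file text unchanged.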

Configure Hadoop

Can configure in {$installation dir}/conf
– hadoop-default.xml for global settings
– hadoop-site.xml for site-specific settings (overrides global)
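The override rule (site-specific values win over global ones) can be illustrated with a short sketch. It assumes both files use the `<configuration>`/`<property>`/`<name>`/`<value>` layout; the helper names and the sample file contents, including the property names and values, are made up for illustration and are not taken from the tutorial's cluster.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Parse <property><name>/<value> pairs from a Hadoop-style config document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def effective_config(default_xml, site_xml):
    """Merge the two files; site-specific values override global defaults."""
    merged = read_properties(default_xml)
    merged.update(read_properties(site_xml))
    return merged


# Hypothetical file contents, for illustration only.
DEFAULT_XML = """<configuration>
  <property><name>dfs.block.size</name><value>67108864</value></property>
  <property><name>dfs.replication</name><value>3</value></property>
</configuration>"""

SITE_XML = """<configuration>
  <property><name>dfs.replication</name><value>2</value></property>
</configuration>"""

print(effective_config(DEFAULT_XML, SITE_XML))
```

In this sample, the replication value from the site file replaces the global one, while the block size falls through from the global file.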

That’s it for Configuration!

Real-time Access