CSCI4180 Tutorial 2: Hadoop Setup on OpenStack
Windows Azure Guide
ZHANG, Mi
mzhang@cse.cuhk.edu.hk
Sep. 24, 2015

Outline

• Hadoop setup on OpenStack
  ➢ Set up Hadoop cluster
  ➢ Manage Hadoop cluster
  ➢ WordCount example

• Windows Azure guide
  ➢ Access Azure
  ➢ Create VMs
  ➢ Install Hadoop

ZHANG, Mi (CUHK), CSCI4180 Tutorial-2

Set up Hadoop Cluster

• We've created three VM instances of our own.
  ➢ Architecture: [figure]

Set up Hadoop Cluster

• We'll set up a small-scale Hadoop cluster using these VM instances.

• What you've done in Tutorial 1:
  ➢ Setting up the HTTP proxy.
  ➢ Installing Java.
  ➢ Configuring /etc/hosts.

Set up Hadoop Cluster

• Switch to the normal user "hadoop"
  ▪ su - hadoop

• If you do not have the user "hadoop"
  ▪ adduser hadoop
  ➢ Enter your password when necessary…
  ▪ su - hadoop

Set up Hadoop Cluster

• Download Hadoop on EACH node
  ▪ wget http://archive.apache.org/dist/hadoop/core/hadoop-0.20.203.0/hadoop-0.20.203.0rc1.tar.gz

• Place Hadoop in the home directory on EACH node
  ▪ tar xzf hadoop-0.20.203.0rc1.tar.gz
  ▪ mv hadoop-0.20.203.0 hadoop
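
The glob hadoop-0.*.* matches both the tarball and the extracted directory, so a rename based on it can fail. A minimal sketch of an alternative, assuming GNU or BSD tar and a tarball whose contents sit under one top-level directory: unpack straight into the target with --strip-components=1, so the extracted directory's exact name does not matter. The function name install_hadoop_tarball is hypothetical and not part of Hadoop.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): unpack a Hadoop release tarball
# directly into a target directory such as ~/hadoop.
install_hadoop_tarball() {
  local tarball=$1
  local dest=${2:-$HOME/hadoop}
  # Refuse to mix a new release into an existing installation.
  if [ -e "$dest" ]; then
    echo "refusing to overwrite existing $dest" >&2
    return 1
  fi
  mkdir -p "$dest"
  # --strip-components=1 drops the tarball's top-level directory
  # (e.g. hadoop-0.20.203.0/), so bin/, conf/, ... land directly in $dest.
  tar xzf "$tarball" -C "$dest" --strip-components=1
}
```

After the wget step, running install_hadoop_tarball hadoop-0.20.203.0rc1.tar.gz ~/hadoop on each node would replace both the tar and mv commands.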

Set up Hadoop Cluster

• Set environment variables on EACH node.
  ➢ I recommend putting them in ~/.bashrc.
    ▪ export HADOOP_HOME=~/hadoop
    ▪ export PATH=$PATH:$HADOOP_HOME/bin

• Set the Hadoop environment on EACH node.
  ➢ Append the following lines to ~/hadoop/conf/hadoop-env.sh:
    ▪ export JAVA_HOME=/usr/lib/jvm/java-7-oracle   # depends on where you put the JVM
    ▪ export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true

• Set the path for HDFS storage on EACH node.
  ▪ mkdir hadoop/tmp   # run under the HOME directory
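
Because these lines must be added on EACH node, a small script can help avoid duplicated or missing lines. This is a sketch under assumptions: bash, the JAVA_HOME default shown on this slide, and hypothetical helper names (append_once, setup_hadoop_env). Each line is appended only if an identical line is not already present, so re-running the script is harmless.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative): apply the per-node environment
# settings from this slide, appending each line only if it is not already there.

# append_once FILE LINE: add LINE to FILE unless an identical line exists.
append_once() {
  grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# setup_hadoop_env HOME_DIR [JAVA_HOME]
setup_hadoop_env() {
  local home=$1
  local java_home=${2:-/usr/lib/jvm/java-7-oracle}  # depends on where the JVM is installed
  local bashrc="$home/.bashrc"
  local envsh="$home/hadoop/conf/hadoop-env.sh"
  mkdir -p "$home/hadoop/conf" "$home/hadoop/tmp"   # tmp/ is the HDFS storage path
  # Single quotes keep ~ and $PATH literal so they expand when .bashrc runs.
  append_once "$bashrc" 'export HADOOP_HOME=~/hadoop'
  append_once "$bashrc" 'export PATH=$PATH:$HADOOP_HOME/bin'
  append_once "$envsh" "export JAVA_HOME=$java_home"
  append_once "$envsh" 'export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true'
}
```

As user "hadoop" on each node, you might run setup_hadoop_env "$HOME" and then source ~/.bashrc so the new PATH takes effect in the current shell.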

Set up Hadoop Cluster

• Configure SSH on EACH node
  ▪ ssh-keygen -t rsa -P ""
  ▪ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

• Configure SSH on the namenode only
  ▪ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@vm1
  ▪ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@vm2
  ▪ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@vm3

• Check the SSH configuration
  ➢ Verify that the namenode can SSH to all the datanodes without typing a password, e.g.,
    ▪ ssh vm2
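
To check all nodes at once rather than one by one, here is a sketch using OpenSSH's BatchMode option. With BatchMode, ssh fails instead of prompting for a password, so a missing key shows up as a failure instead of a prompt. The helper name check_passwordless_ssh is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run from the namenode to confirm that every listed node
# accepts a key-based login. BatchMode=yes makes ssh fail instead of prompting
# for a password, so a missing key shows up as a failure rather than a prompt.
check_passwordless_ssh() {
  local host failed=0
  for host in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
      echo "ok: $host"
    else
      echo "FAILED: $host"
      failed=1
    fi
  done
  return "$failed"
}
```

For example, check_passwordless_ssh vm1 vm2 vm3. BatchMode also refuses to confirm unknown host keys, so you should log in to each node interactively once (as in the ssh vm2 example) before relying on this check.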

Set up Hadoop Cluster

• Set the Hadoop core configuration on EACH node
  ➢ Add the following properties inside the <configuration> element of ~/hadoop/conf/core-site.xml:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop/hadoop/tmp</value>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://vm1:54310</value>
</property>

Set up Hadoop Cluster

• Set the Hadoop MapReduce configuration on EACH node
  ➢ Add the following property inside the <configuration> element of ~/hadoop/conf/mapred-site.xml:

<property>
  <name>mapred.job.tracker</name>
  <value>vm1:54311</value>
</property>

Set up Hadoop Cluster

• Set the Hadoop HDFS configuration on EACH node
  ➢ Add the following property inside the <configuration> element of ~/hadoop/conf/hdfs-site.xml:

<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
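
The three files above must be identical on EACH node, so you may prefer to generate them rather than edit them by hand. This is a sketch under assumptions: bash, and that you have not added any other properties to these files, because the hypothetical helper write_site_configs overwrites them. The defaults match the values above: port 54310 for HDFS, port 54311 for the JobTracker, /home/hadoop/hadoop/tmp, and replication 3.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write core-site.xml, mapred-site.xml and hdfs-site.xml
# with the values described above. Overwrites any existing copies of these files.
# Usage: write_site_configs CONF_DIR MASTER [TMP_DIR] [REPLICATION]
write_site_configs() {
  local conf=$1 master=$2
  local tmp_dir=${3:-/home/hadoop/hadoop/tmp}
  local replication=${4:-3}
  mkdir -p "$conf"

  cat > "$conf/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>${tmp_dir}</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://${master}:54310</value>
  </property>
</configuration>
EOF

  cat > "$conf/mapred-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>${master}:54311</value>
  </property>
</configuration>
EOF

  cat > "$conf/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>${replication}</value>
  </property>
</configuration>
EOF
}
```

On each node, write_site_configs ~/hadoop/conf vm1 would produce the configuration shown on these slides.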

Set up Hadoop Cluster

• Set the Hadoop masters file on the namenode
  ➢ In Hadoop 0.20, ~/hadoop/conf/masters lists the host that runs the SecondaryNameNode; the NameNode and JobTracker run on the node where you execute start-dfs.sh and start-mapred.sh.
    ▪ vm1

• Set the Hadoop slaves file on the namenode
  ➢ Add the hostnames that should run a DataNode and TaskTracker to ~/hadoop/conf/slaves:
    ▪ vm1
    ▪ vm2
    ▪ vm3
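
Both files are plain lists with one hostname per line. As a sketch, here is a hypothetical helper, write_host_files, that writes them from the command line. It assumes bash and the conf directory layout used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write conf/masters and conf/slaves on the namenode.
# Usage: write_host_files CONF_DIR MASTER_HOST SLAVE_HOST...
write_host_files() {
  local conf=$1 master=$2
  shift 2
  if [ "$#" -eq 0 ]; then
    echo "usage: write_host_files CONF_DIR MASTER_HOST SLAVE_HOST..." >&2
    return 1
  fi
  printf '%s\n' "$master" > "$conf/masters"
  # One hostname per line, as the start/stop scripts read them.
  printf '%s\n' "$@" > "$conf/slaves"
}
```

For this cluster: write_host_files ~/hadoop/conf vm1 vm1 vm2 vm3.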

Set up Hadoop Cluster

• Format the namenode (on the namenode)
  ▪ hadoop namenode -format

• Start Hadoop on the namenode
  ▪ start-dfs.sh
  ▪ start-mapred.sh
  ▪ # You can type "jps" to see whether the startup was successful.

• Stop Hadoop on the namenode
  ▪ stop-mapred.sh
  ▪ stop-dfs.sh
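
jps prints one "<pid> <ClassName>" line per running JVM, so checking the startup amounts to looking for the expected class names. Below is a sketch with a hypothetical helper, check_daemons, that reads jps output on stdin and reports any missing daemons. It matches whole names, so that NameNode is not confused with SecondaryNameNode.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `jps` output on stdin and report which of the
# daemon class names given as arguments are running or missing.
check_daemons() {
  local listing d missing=0
  listing=$(cat)
  for d in "$@"; do
    # jps prints "<pid> <ClassName>"; compare the second field exactly.
    if printf '%s\n' "$listing" | awk -v d="$d" '$2 == d { found = 1 } END { exit !found }'; then
      echo "running: $d"
    else
      echo "missing: $d"
      missing=1
    fi
  done
  return "$missing"
}
```

With the masters and slaves files above, vm1 would be expected to pass jps | check_daemons NameNode SecondaryNameNode JobTracker DataNode TaskTracker, and vm2 and vm3 to pass jps | check_daemons DataNode TaskTracker.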

Set up Hadoop Cluster

• Some operations related to HDFS
  ➢ From local to HDFS
    ▪ hadoop dfs -copyFromLocal <local dir/file> <HDFS URI>
    (relative HDFS paths are resolved against the user's HDFS home directory, /user/hadoop)
  ➢ From HDFS to local
    ▪ hadoop dfs -copyToLocal <HDFS URI> <local dir/file>
  ➢ List files in HDFS
    ▪ hadoop dfs -ls <HDFS URI>
  ➢ Cat files in HDFS
    ▪ hadoop dfs -cat <HDFS URI>

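Manage Hadoop Cluster

• Add one more instance into the cluster
  ➢ Stop the Hadoop services on the namenode.
  ➢ For the new instance, repeat the per-node setup steps above, from the Tutorial 1 preparation (HTTP proxy, Java, /etc/hosts) through configuring hdfs-site.xml.
  ➢ Add the IP of the new instance to ~/hadoop/conf/slaves on the namenode.
  ➢ Format the namenode and start Hadoop. Formatting erases all data stored in HDFS.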

Manage Hadoop Cluster

• Remove one instance from the cluster
  ➢ Stop the Hadoop services on the namenode.
  ➢ Remove the IP of the instance from ~/hadoop/conf/slaves.
  ➢ Format the namenode and start Hadoop.
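
Editing the slaves file by hand is easy to get wrong as hosts come and go. Here is a sketch with two hypothetical helpers, add_slave and remove_slave, that add or remove one exact line. They only edit the file. Stopping Hadoop, formatting the namenode, and restarting are still done as listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: add or remove one host (hostname or IP) in conf/slaves.

add_slave() {
  local slaves=$1 host=$2
  touch "$slaves"
  # If the last line lacks a newline, add one so the new host gets its own line.
  if [ -s "$slaves" ] && [ -n "$(tail -c 1 "$slaves")" ]; then
    printf '\n' >> "$slaves"
  fi
  # -x: whole-line match; -F: fixed string, so dots in IPs are literal.
  grep -qxF -- "$host" "$slaves" || printf '%s\n' "$host" >> "$slaves"
}

remove_slave() {
  local slaves=$1 host=$2 tmp
  tmp=$(mktemp)
  # Keep every line except an exact match; grep exits 1 if nothing is left.
  grep -vxF -- "$host" "$slaves" > "$tmp" || true
  # Copy back with cat so the original file's permissions are kept.
  cat "$tmp" > "$slaves"
  rm -f "$tmp"
}
```

For example, add_slave ~/hadoop/conf/slaves *new instance IP* before restarting, or remove_slave ~/hadoop/conf/slaves *old instance IP*.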

WordCount Example

• Download the Java source code from the course website, e.g., WordCount.java, to the home directory of your namenode.
• Compile and run the program
  ▪ mkdir wordcount
  ▪ javac -classpath $HADOOP_HOME/hadoop-core-0.20.203.0.jar -d wordcount WordCount.java
  ▪ jar -cvf wordcount.jar -C wordcount/ .
  ▪ hadoop jar wordcount.jar org.myorg.WordCount *HDFS URI of the input file* *HDFS URI of the output directory*
  ➢ The output directory must not exist before the job runs.
  ➢ Note that part-r-00000 in the output directory holds the actual output.
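
To sanity-check part-r-00000 on a small input, you can compute the same counts locally. This sketch assumes that the course's WordCount.java splits lines on whitespace (as the classic org.myorg.WordCount does with StringTokenizer) and writes "word<TAB>count" lines sorted by word. If the course version tokenizes differently, adjust the helper accordingly. The name local_wordcount is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count words in a local file the way a whitespace-splitting
# WordCount would, printing "word<TAB>count" lines in byte order.
local_wordcount() {
  # Map spaces, tabs, CR, FF and newlines to newlines (one token per line),
  # drop empty lines, then sort bytewise and count duplicates.
  tr -s ' \t\r\f\n' '\n' < "$1" \
    | grep -v '^$' \
    | LC_ALL=C sort \
    | uniq -c \
    | awk '{ print $2 "\t" $1 }'
}
```

With a single reducer, a comparison such as diff <(local_wordcount input.txt) <(hadoop dfs -cat output/part-r-00000) should print nothing. Here input.txt and output are placeholder names for your own input file and HDFS output directory.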

Windows Azure platform

• Windows Azure guide
  ➢ Access Azure
  ➢ Create VMs
  ➢ Install Hadoop

Overview

• Get the 14-character code before you start.
• Redeem your Windows Azure Pass at
  https://www.microsoftazurepass.com/azureu

Redeem the pass

[Screenshot annotation: "Click here"]

Redeem the pass

[Screenshot annotations: "Hong Kong", "Enter code here"]

• You will then be asked for some registration information.
• To redeem the pass, you also need a Windows Live account.

Redeem the pass

[Screenshot annotation: "Log in to your Windows account"]

Redeem the pass

[Screenshot annotation: "phone number"]

Create VMs

[Screenshot annotation: click "Products"]

Create VMs

[Screenshot: the new VM and its IP address]
You can SSH to your VM using this IP.

Create VMs

• In your terminal, run:
  ▪ ssh azureuser@*your VM IP*

Create VMs

• If you SSH to the VM via the CSE wired network, you may need to configure your SSH settings.
  ➢ Append the following lines to ~/.ssh/config:

Host *your hostname*
    User azureuser
    HostName *your hostname*
    ProxyCommand nc -x socks.cse.cuhk.edu.hk:1080 %h %p

• Then you can log in to the VM using ssh:
  ➢ ssh *your hostname*
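
If you create several VMs, you can append the same stanza once per host with a script. This is a sketch assuming bash and the SOCKS proxy settings shown above. The helper name add_socks_ssh_host is hypothetical, and it skips a host that already has a matching "Host" line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append an ssh_config stanza that reaches HOST through the
# CSE SOCKS proxy. Usage: add_socks_ssh_host CONFIG_FILE HOST [USER]
add_socks_ssh_host() {
  local config=$1 host=$2 user=${3:-azureuser}
  mkdir -p "$(dirname "$config")"
  touch "$config"
  # Skip hosts that already have an identical "Host" line.
  if grep -qxF -- "Host $host" "$config"; then
    echo "entry for $host already present" >&2
    return 0
  fi
  cat >> "$config" <<EOF
Host $host
    User $user
    HostName $host
    ProxyCommand nc -x socks.cse.cuhk.edu.hk:1080 %h %p
EOF
  # ssh rejects config files that other users can write to.
  chmod 600 "$config"
}
```

For example, add_socks_ssh_host ~/.ssh/config *your hostname*, followed by ssh *your hostname*.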

Install Hadoop

• Install Java on EACH VM:
  ➢ sudo apt-get update
  ➢ sudo apt-get upgrade
  ➢ sudo add-apt-repository ppa:webupd8team/java
  ➢ sudo apt-get update
  ➢ sudo apt-get install oracle-java7-installer

• You can follow the instructions in Tutorial 1.
  ➢ http://mtyiu.github.io/csci4180-fall15/

Install Hadoop

• Repeat the Hadoop installation steps used on OpenStack, from switching to user "hadoop" through configuring hdfs-site.xml.
  ➢ In the SSH step, replace vm1, vm2 with their respective public IPs.
  ➢ When editing core-site.xml and mapred-site.xml, change "vm1" to "127.0.0.1".

• Set the Hadoop masters file on the namenode
  ➢ Edit ~/hadoop/conf/masters:
    ▪ 127.0.0.1

• Set the Hadoop slaves file on the namenode
  ➢ Edit ~/hadoop/conf/slaves:
    ▪ 127.0.0.1
    ▪ *Another VM's public IP*

• After starting HDFS and MapReduce, you can run the WordCount example.
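
The Azure variant changes "vm1" to "127.0.0.1" in core-site.xml and mapred-site.xml. If those files already contain the vm1 values used on OpenStack, a sketch like the following makes that substitution in place and keeps .bak copies. It assumes the two values look exactly as shown earlier (hdfs://vm1:54310 and vm1:54311). The helper name use_loopback_master is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the vm1 host in core-site.xml and mapred-site.xml
# with 127.0.0.1. Usage: use_loopback_master CONF_DIR [OLD_HOST]
# OLD_HOST is used inside a sed pattern, so keep it free of regex characters.
use_loopback_master() {
  local conf=$1 old=${2:-vm1} f
  for f in "$conf/core-site.xml" "$conf/mapred-site.xml"; do
    # -i.bak edits in place and keeps the original as FILE.bak (GNU and BSD sed).
    sed -i.bak \
      -e "s#hdfs://${old}:#hdfs://127.0.0.1:#" \
      -e "s#<value>${old}:#<value>127.0.0.1:#" \
      "$f"
  done
}
```

For example, use_loopback_master ~/hadoop/conf, then check the two files before starting HDFS and MapReduce.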