www.epikh.eu
The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How)
CE+WN+siteBDII Installation and configuration
Bouchra RAHIM ([email protected])
Africa 6 2010 - Joint EUMEDGRID-Support/EPIKH School for Grid Site Administrators
Rabat, 01.06.2011
Outline
• Computing Element overview
• Worker Node overview
• CE CREAM overview
• gLite stack overview
• gLite CE siteBDII
• gLite CE cream and WN
gLite stack overview
gLite overview
[Figure: diagram not recoverable; surviving label: worker node]
gLite overview
• User Interface (UI): the point of access for users to the gLite grid services
• WMS: the component that optimizes resource usage
• CE: the machine that manages the worker nodes
• WN: the machines that actually execute applications
• SE: the machines where files are stored
• LFC: used to "find" files on the grid
• BDII: the service responsible for publishing all the information about your site
• Logging and Bookkeeping: as its name says, it is a logger, and it alerts the user when a job is finished
Computing Element Overview
• The Computing Element provides some of the main services of a site.
• Main functionalities:
– job management (job submission, job control)
– job status updates for the WMS
– communication with the site BDII, which publishes all information regarding the Computing Element
• It can run several kinds of batch system:
– Torque + MAUI
– LSF
– SGE
– Condor
Torque + MAUI
• Torque server service:
– pbs_server provides basic batch services such as receiving/creating a batch job.
• Torque client service:
– pbs_mom places jobs into execution. It is also responsible for returning the job's output to the user.
• MAUI system service:
– job_scheduler contains the site's policy to decide which job is going to be executed and when.
Site BDII*
• By default it used to be installed on the CE, but it is now better to install it on a dedicated server, physical or virtual.
• It collects the information from all the site GRISes** (for example SE, RB, LFC, etc.)
• The service is named bdii
• Log file: /opt/bdii/var/bdii.log
• *BDII = Berkeley Database Information Index
• **GRIS = Grid Resource Information Service
Worker Node Element Overview
• Worker nodes are the machines that actually execute your jobs.
• Users can access their services only through a Computing Element.
• Their characteristics are collected by the Computing Element, which publishes all the information through the BDII services.
CE Cream overview
• CREAM = Computing Resource Execution And Management
• It accepts job submission requests coming from a WMS, as well as other job management requests.
• It exposes a web services interface.
Requirements
• Three or more machines:
– one will be used for the CE installation;
– one will be used for the site BDII installation;
– the others will be used for the WN installation.
• Architecture: 64 bit
• Operating System: Scientific Linux 5
• Two machines (CE and BDII) with a public IP address and direct and reverse address resolution in the DNS
• The CE machine must be equipped with an X.509 certificate
BDII Installation
Preparing the Linux machine
• Network Time Protocol settings
# yum install ntp
• Copy the ntp.conf file and the ntp directory from ftp://repo.magrid.ma/pub/CE_WN_BDII/ to /etc/ (WinSCP)
• Synchronize the date
# /etc/init.d/ntpd stop
# ntpdate ntp.marwan.ma
• Start the ntpd service and configure it to start on boot
# /etc/init.d/ntpd start
# chkconfig ntpd on
Preparing the Linux machine
• Disable SELinux: make sure /etc/selinux/config contains the line:
SELINUX=disabled
• Stop iptables
# /etc/init.d/iptables stop
# chkconfig iptables off
• Check that you have a valid hostname
# hostname -f
# cat /etc/hosts
• Reboot
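The SELinux check above can be scripted before rebooting. The following is only a sketch: the function name `selinux_is_disabled` and the `SELINUX_CONFIG` variable are hypothetical and not part of the course material. It reports whether a file contains an active (uncommented) SELINUX=disabled line, and takes the path as an argument so that it can be tried on a copy first.

```shell
# Hypothetical helper: succeed if the given file contains an active
# (uncommented) SELINUX=disabled line.
selinux_is_disabled() {
    grep -Eq '^[[:space:]]*SELINUX=disabled[[:space:]]*$' "$1" 2>/dev/null
}

# Default path taken from the slide; override SELINUX_CONFIG to test a copy.
if selinux_is_disabled "${SELINUX_CONFIG:-/etc/selinux/config}"; then
    echo "SELinux is disabled in the configuration file"
else
    echo "SELinux is NOT disabled: edit the file, then reboot"
fi
```

A commented-out line such as `#SELINUX=disabled` is deliberately not accepted, since it has no effect.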
Repository set up-BDII
• Add the repositories specific to the middleware to be installed to the system repositories
# cd /etc/yum.repos.d/
# mv dag.repo dag.repo.stop
# export MREPO=http://repo.magrid.ma/yumrepo/glite32
# REPOS="dag lcg-CA glite-BDII_site"
# for name in $REPOS; do wget $MREPO/$name.repo -O /etc/yum.repos.d/$name.repo; done
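Before downloading anything, it can help to print the commands that the loop would run. The sketch below is a dry run under that assumption: the function name `print_repo_commands` is hypothetical, while the repository URL and names are the ones given on the slide.

```shell
# Dry run: print each wget command instead of executing it.
MREPO=http://repo.magrid.ma/yumrepo/glite32   # repository URL from the slide
REPOS="dag lcg-CA glite-BDII_site"            # repository names from the slide

print_repo_commands() {
    for name in $REPOS; do
        echo "wget $MREPO/$name.repo -O /etc/yum.repos.d/$name.repo"
    done
}

print_repo_commands
```

Changing `REPOS` to the list used for the CE or the WN gives the corresponding commands for those machines.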
Package installation-BDII
• Use yum to install the needed packages
# yum install lcg-CA ca-policy-egi-core ca-policy-lcg
# yum install glite-BDII_site
Yaim Configuration
• All the sample configuration files are located in the /opt/glite/yaim/examples/siteinfo directory
• It is better to make a copy of the original files
Yaim Configuration
• You can find some template files in: ftp://repo.magrid.ma/pub/CE_WN_BDII/
• Edit the site-info.def file and change the following variables:
– SITE_NAME=MA-ZZ-School (name of the site)
– CE_HOST=pcXX.magrid.ma (XX is the machine that will be the CE)
– SITE_BDII_HOST=pcYY.magrid.ma (the current machine)
• Edit the services/glite-bdii_site file and change the following variables:
– SITE_NAME=MA-ZZ-School
– SITE_DESC="MA-ZZ-School"
Yaim Configuration-BDII
• Run the configuration command:
# /opt/glite/yaim/bin/yaim -c -s /opt/glite/yaim/etc/siteinfo/site-info.def -n glite-BDII_site
• If everything is OK, run a basic test:
# ldapsearch -x -h pcYY.magrid.ma -p 2170 -b "mds-vo-name=local,o=grid"
CE Cream Installation (on Torque/PBS)
Preparing the Linux machine
• Network Time Protocol settings
# yum install ntp
• Copy the ntp.conf file and the ntp directory from ftp://repo.magrid.ma/pub/CE_WN_BDII/ to /etc/ (WinSCP)
• Synchronize the date with an NTP server
# /etc/init.d/ntpd stop
# ntpdate ntp.marwan.ma
• Start the ntpd service and configure it to start on boot
# /etc/init.d/ntpd start
# chkconfig ntpd on
Preparing the Linux machine
• Disable SELinux: make sure /etc/selinux/config contains the line:
SELINUX=disabled
• Stop iptables
# /etc/init.d/iptables stop
# chkconfig iptables off
• Check that you have a valid hostname
# hostname -f
# cat /etc/hosts
• Reboot
Repository set up-CE
• Add the repositories specific to the middleware to be installed to the system repositories
# cd /etc/yum.repos.d/
# mv dag.repo dag.repo.stop
# export MREPO=http://repo.magrid.ma/yumrepo/glite32
# REPOS="dag lcg-CA glite-CREAM glite-TORQUE_server glite-TORQUE_utils"
# for name in $REPOS; do wget $MREPO/$name.repo -O /etc/yum.repos.d/$name.repo; done
Package installation-CE
• Use yum to install the needed packages
# yum clean all
# yum install lcg-CA ca-policy-egi-core ca-policy-lcg
• Due to a dependency problem within the Tomcat distribution in SL5, install xml-commons-apis first:
# yum install xml-commons-apis
# yum install glite-CREAM
# yum install glite-TORQUE_server glite-TORQUE_utils
Before configuration-Host Certificates
• Some preliminary steps before configuration:
- copy the host certificate and key to the default path:
# cd
# mv /root/pcXXcert.pem /etc/grid-security/hostcert.pem
# mv /root/pcXXkey.pem /etc/grid-security/hostkey.pem
# chmod 400 /etc/grid-security/hostkey.pem
# chmod 600 /etc/grid-security/hostcert.pem
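Once the certificate and key are in place, their permissions can be verified rather than assumed. The sketch below uses a hypothetical helper, `check_mode`, which compares a file's octal mode with an expected value. It relies on GNU `stat -c`, which is assumed to be available on Scientific Linux, and the expected modes are simply the ones from the chmod commands above.

```shell
# Hypothetical helper: compare a file's octal permission bits with an
# expected value. Relies on GNU stat (-c option).
check_mode() {
    file=$1
    expected=$2
    actual=$(stat -c '%a' "$file" 2>/dev/null) || { echo "MISSING $file"; return 1; }
    if [ "$actual" = "$expected" ]; then
        echo "OK $file ($actual)"
    else
        echo "WRONG $file: $actual, expected $expected"
        return 1
    fi
}

# Modes taken from the chmod commands on the slide.
check_mode /etc/grid-security/hostkey.pem 400 || true
check_mode /etc/grid-security/hostcert.pem 600 || true
```

The `|| true` keeps the script going after a failed check, so that both files are always reported.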
YAIM configuration-CE
• The main file to edit is site-info.def, where you specify some general settings and the parameters of other components (CE Cream)
• Other files to be edited are: wn-list.conf, users.conf, groups.conf, services/glite-creamce
• Set the variables to the correct values, replacing the example ones.
# vi services/glite-creamce
CEMON_HOST=pcXX.$MY_DOMAIN
CREAM_DB_USER=eumed
CREAM_DB_PASSWORD=grid2011
BLPARSER_HOST=pcXX.$MY_DOMAIN
YAIM configuration-CE
• Declare the worker nodes in wn-list.conf
# vi wn-list.conf
pcAA.magrid.ma
pcBB.magrid.ma
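A malformed entry in wn-list.conf is easy to miss. As a rough sanity check, the sketch below lists the lines that do not look like a single fully qualified host name. The function name `bad_wn_entries` and the `WN_LIST` variable are hypothetical, and the check is purely syntactic: it does not query the DNS.

```shell
# Hypothetical helper: print the non-empty lines of a wn-list.conf-style
# file that are not of the form name.domain (one host name per line).
bad_wn_entries() {
    grep -v '^[[:space:]]*$' "$1" | grep -Ev '^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$'
}

# Assumed location, based on the siteinfo directory used in these slides.
WN_LIST=${WN_LIST:-/opt/glite/yaim/etc/siteinfo/wn-list.conf}
if [ -r "$WN_LIST" ]; then
    bad_wn_entries "$WN_LIST" || echo "all entries look like fully qualified host names"
fi
```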
YAIM configuration-CE
CE_HOST=pcYY.magrid.ma
CE_CPU_MODEL=XEON #cat /proc/cpuinfo
CE_CPU_VENDOR=Intel
CE_CPU_SPEED=2230
CE_OS=ScientificSL
CE_OS_RELEASE=5.5 #cat /etc/redhat-release
CE_OS_VERSION="Boron"
CE_OS_ARCH=x86_64
CE_MINPHYSMEM=512 #cat /proc/meminfo on WN
CE_MINVIRTMEM=512
CE_PHYSCPU=1 #total cpu in site
CE_LOGCPU=4
CE_SMPSIZE=4
CE_OUTBOUNDIP=TRUE
CE_INBOUNDIP=FALSE
CE_OTHERDESCR="Cores=4,Benchmark=6.5-HEP-SPEC06"
http://gkswiki.fzk.de/index.php5/Configuration_of_the_CREAM_CE
YAIM configuration-CE
• How to set CE_SI00, CE_SF00, CE_CAPABILITY and CE_OTHERDESCR?
• Try to look up your values at these links:
• http://www.italiangrid.org/grid_operations/site_manager/HEP-SPEC06
• https://hepix.caspur.it/benchmarks/doku.php?id=bench:results_sl5_x86_64_gcc_412
• https://hepix.caspur.it/processors/dokuwiki/doku.php?id=benchmarks:results
• For example, if you have an Intel XEON 5520 2.23 GHz with no Hyper-Threading, you will find a value of 95 in the table at the previous link; with a conversion factor of 1 HS06 = 40 this gives:
• CE_SI00 = 3800
• CE_SF00 = 3800
• CE_CAPABILITY="CPUScalingReferenceSI00=3800"
• CE_OTHERDESCR="Cores=4,Benchmark=23.75-HEP-SPEC06"
• Where (3800/40)/4 = 23.75
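The arithmetic of the example above can be reproduced in a few lines of shell. This is a sketch, not an official tool: the variable names are hypothetical, and the table value (95), the conversion factor (40) and the core count (4) are simply the figures quoted on the slide, to be replaced with your own lookup.

```shell
# Values from the slide's example; replace them with your own table lookup.
HS06_TOTAL=95      # HEP-SPEC06 value read from the benchmark table
SI00_PER_HS06=40   # conversion factor as stated on the slide
CORES=4            # number of cores

CE_SI00=$((HS06_TOTAL * SI00_PER_HS06))
# Shell arithmetic is integer-only, so awk computes the per-core value.
BENCH=$(awk -v h="$HS06_TOTAL" -v c="$CORES" 'BEGIN { printf "%.2f", h / c }')

echo "CE_SI00=$CE_SI00"
echo "CE_SF00=$CE_SI00"
echo "CE_CAPABILITY=\"CPUScalingReferenceSI00=$CE_SI00\""
echo "CE_OTHERDESCR=\"Cores=$CORES,Benchmark=$BENCH-HEP-SPEC06\""
```

With the slide's figures this prints CE_SI00=3800 and a benchmark of 23.75, matching the values listed above.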
YAIM configuration-CE
BATCH_SERVER=$CE_HOST
JOB_MANAGER=lcgpbs
CE_BATCH_SYS=pbs
BATCH_LOG_DIR=/var/spool/pbs
APEL_DB_PASSWORD=grid2011
DGAS_ACCT_DIR=/var/spool/pbs/server_priv/accounting
VOS="eumed"
QUEUES="eumed"
EUMED_GROUP_ENABLE="eumed"
YAIM configuration-CE
• After editing, you can launch the commands:
# /opt/glite/yaim/bin/yaim -c -s /opt/glite/yaim/etc/siteinfo/site-info.def -n creamCE -n TORQUE_server -n TORQUE_utils
# /opt/glite/yaim/bin/yaim -r -s /opt/glite/yaim/etc/siteinfo/site-info.def -n creamCE -f config_cream_blparser
http://igrelease.forge.cnaf.infn.it/doku.php?id=doc:guides:devel:install-cream32
Check the CE
• http://grid.pd.infn.it/cream/field.php?n=Main.CheckYourCREAMCEConfiguration
• Download the script:
# wget http://grid.pd.infn.it/cream/CheckCreamConf/current/CheckCreamConf.pl
# chmod +x CheckCreamConf.pl
• Run it:
# ./CheckCreamConf.pl
• Check the output:
• CheckCreamConf.log
WN Cream Installation (on Torque/PBS)
Preparing the Linux machine
• Network Time Protocol settings
# yum install ntp
• Copy the ntp.conf file and the ntp directory from ftp://repo.magrid.ma/pub/CE_WN_BDII/ to /etc/ (WinSCP)
• Synchronize the date
# /etc/init.d/ntpd stop
# ntpdate ntp.marwan.ma
• Start the ntpd service and configure it to start on boot
# /etc/init.d/ntpd start
# chkconfig ntpd on
Preparing the Linux machine
• Disable SELinux: make sure /etc/selinux/config contains the line:
SELINUX=disabled
• Stop iptables
# /etc/init.d/iptables stop
# chkconfig iptables off
• Check that you have a valid hostname
# hostname -f
# cat /etc/hosts
• Reboot
Repository set up-WN
• Add the repositories specific to the middleware to be installed to the system repositories
# cd /etc/yum.repos.d/
# mv dag.repo dag.repo.stop
# export MREPO=http://repo.magrid.ma/yumrepo/glite32
# REPOS="dag lcg-CA glite-WN glite-TORQUE_client"
# for name in $REPOS; do wget $MREPO/$name.repo -O /etc/yum.repos.d/$name.repo; done
Package installation-WN
• Use yum to install the needed packages
# yum clean all
# yum install -y lcg-CA ca-policy-egi-core ca-policy-lcg
# yum groupinstall glite-WN
# yum install glite-TORQUE_client
WN - YAIM Configuration
• You can use the same configuration files edited on the CE:
- this can be done on all the worker nodes of a site;
- so you don't need to re-edit anything!
• Copy the configuration files from the CE machine using the scp command:
# mkdir /opt/glite/yaim/etc/siteinfo/
# mkdir /opt/glite/yaim/etc/siteinfo/services
# Copy the files site-info.def, users.conf, groups.conf and wn-list.conf from the CE (root@pcYY:/opt/glite/yaim/etc/siteinfo/)
# Copy the glite-wn file from examples/services
• Ready to configure now
# /opt/glite/yaim/bin/yaim -c -s /opt/glite/yaim/etc/siteinfo/site-info.def -n glite-WN -n TORQUE_client
WN - YAIM Configuration
• A basic test:
• Check the status of pbs_mom
# pbsnodes -a
Testing installation
Tests on CE
• SSH to the CE to test whether the CE can see the WNs and whether all the main services are up and running
# pbsnodes
# /etc/init.d/gLite status
Tests on CE
• SSH to the CE and then become an eumed user:
# su - eumed001
• Create a file and add the following:
$ vi test.sh
#!/bin/sh
sleep 20 #(it's useful to see the job status)
hostname
• Set the right permissions to make it executable:
$ chmod 700 test.sh
Tests on CE
• Launch the job locally on the CE
$ qsub -q eumed test.sh
• Then check the list of jobs in execution on the CE
$ qstat -a
ce.localdomain:
                                                             Req'd  Req'd   Elap
Job ID           Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
---------------- -------- -------- ---------- ------ --- --- ------ ----- - ----
0.pc22.magrid.ma eumed001 short    test.sh      5839  --  --     -- 00:15 R   --
• In case you want to abort a job execution:
$ qdel 3 #that is the job ID
• In case you want more info:
$ qstat -f 3
Tests on CE
• If the "qstat -a" command produces no output, no jobs are being executed on the CE; this means your previous job has terminated, so you can now list its output.
$ ls
test.sh.e3 test.sh.o3
$ cat test.sh.e3 #error file
$
$ cat test.sh.o3 #output file
wn.localdomain
JDL example
$ vim hostname-cream.jdl
Type = "Job";
JobType = "Normal";
Executable = "/bin/hostname";
StdOutput = "hostname.out";
StdError = "hostname.err";
OutputSandbox = {"hostname.err","hostname.out"};
Arguments = "-f";
OutputSandboxBaseDestUri = "gsiftp://localhost/tmp";
Working test
• SSH to the UI to test whether the CE can receive and execute a simple job
$ ssh [email protected] #password: gridXX
# set up the certificate
[root@ui01 ~]# mkdir /home/grid01/.globus
[root@ui01 ~]# cp /root/user_cert/usercert.pem /home/grid01/.globus/usercert.pem
[root@ui01 ~]# cp /root/user_cert/userkey.pem /home/grid01/.globus/userkey.pem
[root@ui01 ~]# chown grid01 /home/grid01/.globus/usercert.pem
[root@ui01 ~]# chown grid01 /home/grid01/.globus/userkey.pem
[root@ui01 ~]# chmod 400 /home/grid01/.globus/userkey.pem
[root@ui01 ~]# su - grid01
[grid01@ui01 ~]$ voms-proxy-init --voms eumed
Enter GRID pass phrase: [grid2011]
[grid01@ui01 ~]$ glite-ce-job-submit -r pc22.magrid.ma:8443/cream-pbs-eumed -o ID hostname-cream.jdl
[grid01@ui01 ~]$ glite-ce-job-status -i ID
Troubleshooting
• Which logs should you look at if something goes wrong?
– /var/log/messages, for general errors
– /opt/glite/var/log (especially glite-ce-cream.log)
– /var/spool/pbs/server_priv/accounting/<date>, if even local submission to the batch system doesn't work.
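When looking through the logs listed above, a small filter saves scrolling. The sketch below defines a hypothetical helper, `last_errors`, which prints the most recent lines of a log that mention "error". The search word is only an assumption about what is worth looking for, and the second path assumes that glite-ce-cream.log sits directly in /opt/glite/var/log.

```shell
# Hypothetical helper: print the last N lines (default 20) of a log that
# mention "error", case-insensitively. Prints nothing if the file is unreadable.
last_errors() {
    log=$1
    count=${2:-20}
    [ -r "$log" ] || return 0
    { grep -i 'error' "$log" || true; } | tail -n "$count"
}

# Log locations from the slide.
last_errors /var/log/messages 10
last_errors /opt/glite/var/log/glite-ce-cream.log 10
```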
References
• INFNGRID generic installation guide:
– http://igrelease.forge.cnaf.infn.it/doku.php?id=doc:guides:install-3_2
• YAIM configuration variables
– https://twiki.cern.ch/twiki/bin/view/LCG/Site-info_configuration_variables
• CE Cream installation guide:
– GLITE Cream CE 3.2 SL5 Installation Guide [INFNGRID Release Wiki]
• YAIM system administrator guide:
– https://twiki.cern.ch/twiki/bin/view/LCG/YaimGuide400
• EUMEDGRID wiki:
– http://wiki.eumedgrid.eu/bin/view
• EuMedGRID sites installation and setup tips
– http://wiki.eumedgrid.eu/twiki/bin/view/InfrastructureStatus/EumedSiteInstallation
• How To Check And Test Your CREAMCE
– http://grid.pd.infn.it/cream/field.php?n=Main.HowToCheckAndTestYourCREAMCE
Thank you for your kind attention!
Any questions?