How to get started: London Tier2
O. van der Aa
16/04/2007 Running the LT2
UK HEP Grid: GridPP, one T1, four T2
• ScotGrid: Durham, Edinburgh, Glasgow
• NorthGrid: Daresbury, Lancaster, Liverpool, Manchester, Sheffield
• SouthGrid: Birmingham, Bristol, Cambridge, Oxford, RAL PPD
• London Tier2: Brunel, Imperial, QMUL, RHUL, UCL
Imperial College
Spread across two sites
• Physics Department
  – 465 KSI2K (dual-core Intel Woodcrest) running SGE 6
  – 60 TB running dCache
• Computing Department
  – 177 KSI2K (Opterons) running SGE 6
  – Storage: uses the Physics Department storage
• All running RHEL4 and RHEL3 and using the LCG tarball
• Local physicists: CMS/LHCb/DZero
• 324 KSI2K across two clusters. Two CEs running pbs/maui
• 6.5 TB of storage running DPM
• Complex networking situation: the Grid is in a demilitarized zone limited to 200 Mb/s.
• Local physicists are mainly from CMS.
Biggest cluster in London
• Mixture of Athlons, Xeons and Opterons
• Total of 1200 KSI2K running separate pbs/maui
• Cluster shared by Astronomy/HEP/Material Sciences
• Storage: 18 TB running poolfs and DPM
• Expect to use worker-node local disk with Lustre: 400 TB
• Local community is ATLAS oriented
• Separate pbs/maui from ce
• 160 KSI2K
• 8TB running DPM
• ATLAS/ILC community
• Running SLC3
• Will soon buy 265 KSI2K and 136 TB, expected around April
Situation similar to Imperial:
• Physics department
  – 24 KSI2K and ~1 TB
• Computing department
  – Shared cluster with 50 KSI2K
  – 1.5 TB running DPM
• Running CentOS 3, SGE
Resource Summary
[Chart: London KSI2K per quarter, 1Q05 to 4Q06, stacked by site (UCL, RHUL, QMUL, Imperial, Brunel); vertical axis 0 to 3000]
CPU: 2.5 MSI2K
Storage: 94 TB
[Pie chart: share per site (Brunel, IC, QMUL, RHUL, UCL); slice labels 64%, 1%, 9%, 19%, 7%]
How are the resources used?
Currently around 70%
[Chart: London Grid delivered MSI2K*hours per month, Sep-05 to Feb-07, stacked by VO (other grid, biomed, dzero, babar, cdf, lhcb, cms, atlas, alice); vertical axis 0 to 1.4]
[Chart: CPU usage in London Grid per month, Sep-05 to Feb-07; vertical axis 0% to 80%]
How to contact us
• Our mailing list: [email protected]
  – The coordinator: [email protected]
  – The T2 manager: [email protected]
• Via GGUS: http://www.ggus.org
  – Specify UKI-LT2 and the university in the subject field
  – Use it for any specific problem once you are set up
• Our wiki: http://wiki.gridpp.ac.uk/wiki/London_Tier2
  – Describes the infrastructure
  – Gives links to monitoring pages
How to start?
• "The Three Steps" …
1. Register for a certificate (as explained in the NGS talk).
  • https://ca.grid-support.ac.uk/
2. With your certificate, register with the ltwo virtual organisation.
  • https://voms.gridpp.ac.uk:8443/voms/ltwo/
3. Get access to a user interface.
  • Ask via the lt2-technical mailing list. Each university in the LT2 has a user interface.
Summary, main Grid Components
• The User Interface (UI) is where the user sits to submit jobs.
• The Virtual Organisation Membership Service (VOMS) is involved in authorizing and authenticating users.
• The Information System (IS) publishes the individual site information (CE queue names, SE contact points, #waiting jobs, #running jobs, etc.).
• The Workload Management System (WMS) takes the user job, finds a compatible site and submits the job to the site CE.
• The Computing Element (CE) is the entry point for jobs into the computing cluster.
• The Storage Element (SE) is the equivalent of the CE, but for data.
The Main Grid Components
[Diagram: the main Grid components; surviving labels: WMS, VOMS]
Information System
• Tree structure showing all available resources in the Grid.
  – Implemented as an LDAP server
  – Top-level view at lcg-bdii.gridpp.ac.uk, port 2170
  – Interesting to have a look
    • Use the JXplorer LDAP browser
    • http://www.jxplorer.org/
Submitting your first job
• Get a login on a user interface
  – In this case gfe03.hep.ph.ic.ac.uk
• Initialize your proxy
  – voms-proxy-init --voms ltwo
• Prepare your JDL (Job Description Language) file
  – The name of the executable
  – The files you want to transfer before the job starts
  – Your constraints, for example:
    • How much CPU time you need
    • Which subset of resources you want to use
The files
• Hello.jdl
Executable = "/bin/sh";
Arguments = "Hello.sh";
StdOutput = "std.out";
StdError = "std.err";
InputSandbox = {"Hello.sh"};
OutputSandbox = {"std.out", "std.err"};
• Hello.sh
#!/bin/sh
echo 'Hello LT2 Workshop'
whoami
hostname
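When many similar jobs are needed, the JDL text can be generated instead of typed by hand. The following is a minimal Python sketch and not part of the workshop material: the helper name `render_jdl` is hypothetical, and it only handles the quoted-string and list attributes that appear in Hello.jdl.

```python
def render_jdl(attributes):
    """Render a dict of JDL attributes as 'Name = value;' lines.

    Strings become quoted values and lists become {"a", "b"} sets,
    which is all that the Hello.jdl example needs.
    """
    lines = []
    for name, value in attributes.items():
        if isinstance(value, (list, tuple)):
            rendered = "{" + ", ".join('"%s"' % item for item in value) + "}"
        else:
            rendered = '"%s"' % value
        lines.append("%s = %s;" % (name, rendered))
    return "\n".join(lines)


# Same attributes as the Hello.jdl file on the slide.
hello = {
    "Executable": "/bin/sh",
    "Arguments": "Hello.sh",
    "StdOutput": "std.out",
    "StdError": "std.err",
    "InputSandbox": ["Hello.sh"],
    "OutputSandbox": ["std.out", "std.err"],
}

jdl_text = render_jdl(hello)
print(jdl_text)
```

Writing `jdl_text` to a file gives something that can be passed to the submission commands in the same way as a hand-written JDL.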
Submitting
• Finding matching resources
  – edg-job-list-match Hello.jdl
***************************************************************************
COMPUTING ELEMENT IDs LIST
The following CE(s) matching your job requirements have been found:
*CEId*
ce00.hep.ph.ic.ac.uk:2119/jobmanager-sge-30min
ce00.hep.ph.ic.ac.uk:2119/jobmanager-sge-72hr
ce1.pp.rhul.ac.uk:2119/jobmanager-pbs-ltwogrid
dgc-grid-35.brunel.ac.uk:2119/jobmanager-lcgpbs-short
gw39.hep.ph.ic.ac.uk:2119/jobmanager-lcgpbs-ltwo
mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-10min
mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-12hr
mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-1hr
mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-24hr
mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-30min
mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-3hr
mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-6hr
mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-72hr
gw-2.ccc.ucl.ac.uk:2119/jobmanager-sge-default
***************************************************************************
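The list of matching CEs is easier to read once it is grouped by host. The sketch below is an illustration only: it assumes the output has one CE ID per line in the `host:port/jobmanager-...` form shown in the edg-job-list-match listing, and the function name `queues_by_host` is hypothetical.

```python
from collections import defaultdict


def queues_by_host(listing):
    """Group CE IDs of the form host:port/jobmanager-... by host name."""
    grouped = defaultdict(list)
    for line in listing.splitlines():
        line = line.strip()
        if "/jobmanager-" not in line:
            continue  # skip banners, headers and blank lines
        endpoint, _, manager = line.partition("/")
        host = endpoint.split(":")[0]
        grouped[host].append(manager)
    return dict(grouped)


# A short excerpt of the listing, one CE ID per line.
sample = """\
*CEId*
ce00.hep.ph.ic.ac.uk:2119/jobmanager-sge-30min
ce00.hep.ph.ic.ac.uk:2119/jobmanager-sge-72hr
ce1.pp.rhul.ac.uk:2119/jobmanager-pbs-ltwogrid
"""

grouped = queues_by_host(sample)
for host, managers in grouped.items():
    print(host, managers)
```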
Submitting
• The actual submission
  – edg-job-submit Hello.jdl
*********************************************************************************************
JOB SUBMIT OUTCOME
The job has been successfully submitted to the Network Server.
Use edg-job-status command to check job current status. Your job identifier (edg_jobId) is:
- https://gfe01.hep.ph.ic.ac.uk:9000/izz75vlTThizfJVP-7VGdQ
*********************************************************************************************
This is your job identifier; you need to keep track of it.
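Since every later command needs the job identifier, it helps to capture it from the submission output. This is a rough Python sketch, assuming identifiers always look like the `https://host:port/token` example in the submission output; the pattern and the name `extract_job_ids` are illustrative, not part of the EDG tools.

```python
import re

# Assumed shape of a job identifier: https://<host>:<port>/<token>
JOB_ID = re.compile(r"https://[\w.-]+:\d+/[\w-]+")


def extract_job_ids(text):
    """Return every job identifier found in the submission output."""
    return JOB_ID.findall(text)


output = """\
Your job identifier (edg_jobId) is:
- https://gfe01.hep.ph.ic.ac.uk:9000/izz75vlTThizfJVP-7VGdQ
"""

job_ids = extract_job_ids(output)
print(job_ids)
```

Appending the identifiers to a file gives a simple record of all submitted jobs.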
Checking the state of your job
• edg-job-status [your job id]
Getting the result
• edg-job-get-output [jobid]
  – Will store your OutputSandbox files in /tmp/
    • std.out
    • std.err
  – Content of std.out
---------
Hello LT2 Workshop
lt2-ltwo007
mars092.mars.lesc.doc.ic.ac.uk
---------
JDL: more complex requirements
• Specify a CE in a domain
  – Requirements = RegExp(".*mars.lesc.doc.ic.ac.uk.*$",other.GlueCEUniqueID);
• Require some CPU time (min)
  – Requirements = RegExp(".*mars.lesc.doc.ic.ac.uk.*$",other.GlueCEUniqueID) && (other.GlueCEPolicyMaxCPUTime > 600);
• Require some CPU*KSI2K time
  – Requirements = (other.GlueCEPolicyMaxCPUTime > (30 * 500/other.GlueHostBenchmarkSI00));
More on how to master JDL at http://tinyurl.com/28oje9
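The last requirement scales a CPU-time request by the speed of the target host. Read as "30 minutes on a 500 SI2K reference machine", it can be worked through in a small Python sketch; the function names are hypothetical, and the sketch uses floating-point division, which may differ from how the JDL evaluator rounds.

```python
def required_minutes(reference_minutes, reference_si00, host_si00):
    """CPU minutes needed on a host, scaled from a reference machine."""
    return reference_minutes * reference_si00 / host_si00


def ce_matches(max_cpu_time, host_si00, reference_minutes=30, reference_si00=500):
    """Mirror of: MaxCPUTime > 30 * 500 / GlueHostBenchmarkSI00."""
    return max_cpu_time > required_minutes(reference_minutes, reference_si00, host_si00)


# A host twice as fast as the reference needs half the time.
print(required_minutes(30, 500, 1000))

# A queue with a 20 minute limit is enough on the fast host only.
print(ce_matches(20, 1000))
print(ce_matches(20, 500))
```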
Data Management
• In the previous example
  – All files are transferred via the sandbox
  – The sandbox is limited to 100Mb
• Clearly something additional is required to transfer bigger datasets
Data Management tools: lcg utils, gfal
Catalogue services
• A "file" is identified by a GUID
• Several aliases (LFNs) can be attached to the GUID
• One "file" can be located at several places (PFNs)
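The relationship between GUID, LFN and PFN can be pictured with a small in-memory model. This Python sketch only illustrates the bookkeeping and is not how the real catalogue service is implemented; the class name, the LFN path and the `example.org` storage hosts are made up for the example.

```python
class FileCatalogue:
    """Toy catalogue: one GUID, many aliases (LFNs), many replicas (PFNs)."""

    def __init__(self):
        self._lfn_to_guid = {}
        self._replicas = {}

    def register(self, guid, pfn):
        """Record that a physical copy of the file exists at pfn."""
        self._replicas.setdefault(guid, []).append(pfn)

    def add_alias(self, guid, lfn):
        """Attach a human-friendly name to an already registered GUID."""
        if guid not in self._replicas:
            raise KeyError("unknown GUID: %s" % guid)
        self._lfn_to_guid[lfn] = guid

    def guid_of(self, lfn):
        return self._lfn_to_guid[lfn]

    def replicas(self, lfn):
        return list(self._replicas[self._lfn_to_guid[lfn]])


catalogue = FileCatalogue()
guid = "guid:ec362b1a-6f88-4860-a72b-68d4ad55eb59"
catalogue.register(guid, "srm://se-a.example.org/data/file1")  # hypothetical PFN
catalogue.register(guid, "srm://se-b.example.org/data/file1")  # hypothetical PFN
catalogue.add_alias(guid, "lfn:/grid/example/file1")           # hypothetical LFN
print(catalogue.replicas("lfn:/grid/example/file1"))
```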
Uploading a file to a storage element (SE)
• Finding the list of SEs
  – lcg-infosites --vo dteam se
  – If you don't specify an SE, the one closest to the cluster will be used
• Uploading
  – lcg-cr --vo dteam -d gfe02.hep.ph.ic.ac.uk file:myfile.dta
  – Returns: guid:ec362b1a-6f88-4860-a72b-68d4ad55eb59
GUID?
• Remembering a GUID is not human friendly
• You can give an alias (LFN) to a GUID.
  – lcg-aa --vo dteam guid:ec362b1a-6f88-4860-a72b-68d4ad55eb59 lfn:/grid/home/lt2wk.dta
• You can give an alias when registering the file
  – lcg-cr --vo dteam -d gfe02.hep.ph.ic.ac.uk file:myfile.dta -l lfn:/grid/dteam/lt2wk.dta
More on moving files
• Copying files back to your UI
  – lcg-cp --vo dteam lfn:/grid/dteam/lt2wk.dta file:`pwd`/myfile.dta
• Replicating files somewhere else
  – lcg-rep -d se1.pp.rhul.ac.uk --vo dteam lfn:/grid/dteam/lt2wk.dta
Listing files
• Listing replicas:
  – lcg-lr --vo [yourvo] lfn:<name>
• Listing the GUID:
  – lcg-lg --vo [yourvo] lfn:<name>
• Example
  – lcg-lr --vo dteam lfn:/grid/dteam/lt2wk.dta
    • srm://gfe02.hep.ph.ic.ac.uk/pnfs/hep.ph.ic.ac.uk/data/dteam/generated/2007-04-16/filec6b6fba2-c854-4ee6-a0db-68bd6cd6e0dd
    • srm://se1.pp.rhul.ac.uk/dpm/pp.rhul.ac.uk/home/dteam/generated/2007-04-16/file5642a5ea-b63f-411a-b56c-84a75137d716
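Each replica URL names the storage element that holds the copy. As a small illustration, the hosts can be pulled out with Python's standard `urllib.parse`; the function name `storage_hosts` is hypothetical and the input is the two replica URLs from the example.

```python
from urllib.parse import urlparse


def storage_hosts(replicas):
    """Return the storage element host name of each replica URL."""
    return [urlparse(surl).hostname for surl in replicas]


replicas = [
    "srm://gfe02.hep.ph.ic.ac.uk/pnfs/hep.ph.ic.ac.uk/data/dteam/generated/"
    "2007-04-16/filec6b6fba2-c854-4ee6-a0db-68bd6cd6e0dd",
    "srm://se1.pp.rhul.ac.uk/dpm/pp.rhul.ac.uk/home/dteam/generated/"
    "2007-04-16/file5642a5ea-b63f-411a-b56c-84a75137d716",
]

hosts = storage_hosts(replicas)
print(hosts)
```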
Sending your job where your files are
• In your JDL
  – InputData = {"lfn:/grid/dteam/lt2wk.dta"};
  – DataAccessProtocol = {"file", "srm", "gridftp"};
• Then you have to use the lcg- commands to copy the files
• Alternatively you can link to the gfal library and stream the data (man gfal).
Conclusions
• In London you have
  – Around 2500 CPUs
  – 94 TB
  – All available through the ltwo VO
• To learn more about how to use them
  – http://www.gridpp.ac.uk/deployment/users/
  – Get registered with the ltwo VO.
• See the GANGA talk for higher-level tools to submit jobs without having to write JDL.
Thanks to all of the Team
M. Aggarwal, D. Colling, A. Chamberlin, S. George, K. Georgiou, M. Green, W. Hay, P. Hobson, P. Kyberd, A. Martin, G. Mazza, D. Rand, G. Rybkine, G. Sciacca, K. Septhon, B. Waugh
LT2
BACKUP
Listing the SE. Removing files
• lcg-infosites --vo ltwo se
• Don't forget to remove your files
  – lcg-del
RLS remembers file locations
VOMS: Virtual Organization Membership Service.
• Provides information on the user's relationship with her Virtual Organization: her groups, roles and capabilities.
• Provides the list of users for a given VO
GridLoad: tool to monitor the sites
– Updates every 5 minutes
– Uses the RTM data and stores it in RRD files
• Shows the number of jobs in any state
• VO view: stacks the jobs by VO
• CE view: stacks the jobs by CE
https://gfe03.hep.ph.ic.ac.uk:4175/cgi-bin/load
Still a prototype. Will add:
• View by GOC and ROC
• Error checking
• Usage (running CPU/total CPU)
• Improved look and feel
Could interface with Nagios for raising alarms (high abort rate)
GridLoad: what can it be used for?
[Plot annotations: #Aborted jobs; home dir full; problem solved]
• Can be used to have a single measure of the health of the system
• We can then use Nagios to find out more
• Avoids the "too many alarms" syndrome!
• You can query the CGI to get graphs for your site
Extracting the private and public keys.
• You have to create a .globus directory and extract the keys into it.
  – Extract your public key:
    • openssl pkcs12 -in cert.p12 -clcerts -nokeys -out usercert.pem
    • chmod 644 usercert.pem
  – Extract your private key:
    • openssl pkcs12 -in cert.p12 -nocerts -out userkey.pem
    • Protect it: chmod 400 userkey.pem
Initialize your Proxy
• The proxy is a temporary key pair that is signed by your private key. It lets you delegate your credentials to another machine where your job will run.
• To create a proxy (which will be a file in the /tmp directory) you need to
  – voms-proxy-init --voms ltwo
  – Type the pass phrase to decrypt your private key
• You should see this:
Your identity: /C=UK/O=eScience/OU=Imperial/L=Physics/CN=olivier van der aa (vo1)
Enter GRID pass phrase:
Creating temporary proxy ............................................... Done
Contacting gm01.hep.ph.ic.ac.uk:15002 [/C=UK/O=eScience/OU=Imperial/L=Physics/CN=host/gm01.hep.ph.ic.ac.uk/[email protected]] "ltwo" Done
Creating proxy ............................................ Done
Your proxy is valid until Tue Dec 6 23:45:14 2005
Preparing for submitting jobs
• A simple job program is available in /tmp/Lecture.tar.gz
• Copy it to your home directory and untar it.
• To submit a job you need to create a file that contains your requirements: the so-called JDL (Job Description Language) file.
• We will submit jobs as members of the London Tier 2 VO (LTWO), so we need to specify that they run on sites that support it.
• For the moment the site that supports it is the Imperial College HEP site.
Submit the job
• edg-job-submit --config-vo gridpp_wl_vo_ltwo.conf --config gridpp_wl_cmd_var.conf hello.jdl
• Or runjob.sh hello.jdl
• The configuration files (gridpp_...) select the Imperial Resource Broker, since it is the only one that knows about the ltwo VO.
*********************************************************************************************
JOB SUBMIT OUTCOME
The job has been successfully submitted to the Network Server.
Use edg-job-status command to check job current status. Your job identifier (edg_jobId) is:
- https://gfe01.hep.ph.ic.ac.uk:9000/kvexAiToJyvcBvxdMBTdoA
*********************************************************************************************
This is your job ID
Check the status of your job
• edg-job-status [your job ID] will get the status of your job
Managing large files
• To transfer large files you should not use the input and output sandboxes. They are limited to 9MB.
• File replication should be used.
• The LTWO VO does not have a catalogue to register the files, so I will describe what can be done instead.
globus-url-copy
• You can copy a file to our SE using the globus-url-copy command
• globus-url-copy file:////myfile gsiftp://gw38.hep.ph.ic.ac.uk/stage2/lcg2-data/ltwo/myfilename
• But this does not use the catalogue, so you have to keep track of where your file really is.
Hello.jdl and finding matching resources
• In the Lecture directory
  – See file Hello.jdl
Executable = "/bin/hostname";
#Arguments = "none";
StdOutput = "std.out";
StdError = "std.err";
OutputSandbox = {"std.out", "std.err"};
  – Executable: name of the executable
  – OutputSandbox: files you want to retrieve
• Check which resources match your requirements
  – edg-job-list-match --config-vo gridpp_wl_vo_ltwo.conf --config gridpp_wl_cmd_var.conf hello.jdl
Exercise
• Find out what the GridCR program does
• Submit 5 jobs. The output of the GridCR program should be stored on the classic SE
• Using your job's standard output, retrieve the files that have been generated.
Check the validity of your proxy
• voms-proxy-info will tell you how many hours your delegation is valid.
subject : /C=UK/O=eScience/OU=Imperial/L=Physics/CN=olivier van der aa (vo1)/CN=proxy
issuer : /C=UK/O=eScience/OU=Imperial/L=Physics/CN=olivier van der aa (vo1)
identity : /C=UK/O=eScience/OU=Imperial/L=Physics/CN=olivier van der aa (vo1)
type : proxy
strength : 512 bits
path : /tmp/x509up_u37227
timeleft : 11:58:43
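Before submitting a long job it is worth checking that the proxy will outlive it. The Python sketch below assumes the `key : value` layout and the `HH:MM:SS` timeleft format of the voms-proxy-info output shown here; the function name `timeleft_seconds` is hypothetical.

```python
def timeleft_seconds(info_text):
    """Return the remaining proxy lifetime in seconds, or None if absent."""
    for line in info_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "timeleft":
            hours, minutes, seconds = (int(part) for part in value.strip().split(":"))
            return hours * 3600 + minutes * 60 + seconds
    return None


# Excerpt of the voms-proxy-info output, one field per line.
info = """\
type : proxy
strength : 512 bits
path : /tmp/x509up_u37227
timeleft : 11:58:43
"""

remaining = timeleft_seconds(info)
print(remaining)
```

Comparing `remaining` with the expected run time of the job tells you whether to renew the proxy first.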
Finding which CEs support the ltwo VO
• To get a list of CEs that support the ltwo VO, use the lcg-infosites command
  – lcg-infosites --vo ltwo ce
gw39.hep.ph.ic.ac.uk:2119/jobmanager-lcgpbs-ltwo
This is the CE of the HEP group.
  – If you run lcg-infosites --vo dteam ce you will get a list of the CEs in LCG.
lcg-cr, lcg-rep
• To register a file in a catalogue and copy it to your beloved SE
lcg-cr --vo [yourvo] file://`pwd`/<name> \
  -l lfn:<name> -d yourse
If you do not give an SE, the local one will be used.
• To replicate the same file on a different SE
  – lcg-rep --vo [yourvo]