Upgrade D0 farm

Reasons for upgrade

• RedHat 7 needed for D0 software

• New versions of:
  – ups/upd v4_6
  – fbsng v1_3f+p2_1
  – sam

• Use of farm for MC and analysis

• Integration in farm network

MC production on farm

• Input: requests

• Request translated into an mc_runjob macro

• Stages:
  1. mc_runjob on batch server (hoeve)
  2. MC job on node
  3. SAM store on file server (schuur)

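[Diagram: MC production data flow. Components: farm server, file server, node, SAM DB, datastore (FNAL, SARA); 100 CPUs; storage figures 1.2 TB and 40 GB. Flows: mcc request, mcc input, mcc output; arrows marked control, data or metadata. The fbs job has three sections, 1. mcc, 2. rcp, 3. sam, shown as fbs(mcc) and fbs(rcp,sam).]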

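[Diagram: MC production data flow with sam run from cron. Components: farm server, file server, node, SAM DB, datastore (FNAL, SARA); 100 CPUs; storage figures 1.2 TB and 40 GB. Flows: mcc request, mcc input, mcc output; arrows marked control, data or metadata. The fbs job has two sections, 1. mcc and 2. rcp, shown as fbs(mcc) and fbs(rcp[,sam]); sam is started by cron.]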

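[Diagram: MC production processes on hoeve, node and schuur. Processes (account: program): fbsuser: mc_runjob, fbsuser: cp, fbsuser: mcc, fbsuser: rcp, willem: sam. Started by fbs submit (twice) and cron; arrows marked data or control.]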

SECTION mcc
EXEC=/d0gstar/curr/minbias-02073214824/batch
NUMPROC=1
QUEUE=FastQ
STDOUT=/d0gstar/curr/minbias-02073214824/stdout
STDERR=/d0gstar/curr/minbias-02073214824/stdout

SECTION rcp
EXEC=/d0gstar/curr/minbias-02073214824/batch_rcp
NUMPROC=1
QUEUE=IOQ
DEPEND=done(mcc)
STDOUT=/d0gstar/curr/minbias-02073214824/stdout_rcp
STDERR=/d0gstar/curr/minbias-02073214824/stdout_rcp
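Each MC request gets the same two-section job description, differing only in the job directory. A minimal Python 3 sketch that renders it for any request name, assuming the /d0gstar/curr/<job> layout and the queue names shown above; make_mcc_jdf is a hypothetical helper, not part of mc_runjob or fbsng:

```python
def make_mcc_jdf(job: str, base: str = "/d0gstar/curr") -> str:
    """Render the mcc + rcp FBS sections for one MC request (hypothetical helper)."""
    d = f"{base}/{job}"
    sections = [
        # Section 1: the MC job itself, run on a worker node.
        ("mcc", [f"EXEC={d}/batch", "NUMPROC=1", "QUEUE=FastQ",
                 f"STDOUT={d}/stdout", f"STDERR={d}/stdout"]),
        # Section 2: copy the output back; fbs starts it only after mcc is done.
        ("rcp", [f"EXEC={d}/batch_rcp", "NUMPROC=1", "QUEUE=IOQ",
                 "DEPEND=done(mcc)",
                 f"STDOUT={d}/stdout_rcp", f"STDERR={d}/stdout_rcp"]),
    ]
    lines = []
    for name, keys in sections:
        lines.append(f"SECTION {name}")
        lines.extend(keys)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_mcc_jdf("minbias-02073214824"), end="")
```

Generating the file this way keeps the mcc and rcp sections consistent across requests.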

#!/bin/sh

. /usr/products/etc/setups.sh
cd /d0gstar/mcc/mcc-dist
. mcc_dist_setup.sh

mkdir -p /data/curr/minbias-02073214824
cd /data/curr/minbias-02073214824
cp -r /d0gstar/curr/minbias-02073214824/* .
touch /d0gstar/curr/minbias-02073214824/.`uname -n`
sh minbias-02073214824.sh `pwd` > log
touch /d0gstar/curr/minbias-02073214824/`uname -n`
/d0gstar/bin/check minbias-02073214824

#!/bin/sh
i=minbias-02073214824
if [ -f /d0gstar/curr/$i/OK ];then
  mkdir -p /data/disk2/sam_cache/$i
  cd /data/disk2/sam_cache/$i
  node=`ls /d0gstar/curr/$i/node*`
  node=`basename $node`
  job=`echo $i | awk '{print substr($0,length-8,9)}'`
  rcp -pr $node:/data/dest/d0reco/reco*${job}* .
  rcp -pr $node:/data/dest/reco_analyze/rAtpl*${job}* .
  rcp -pr $node:/data/curr/$i/Metadata/*.params .
  rcp -pr $node:/data/curr/$i/Metadata/*.py .
  rsh -n $node rm -rf /data/curr/$i
  rsh -n $node rm -rf /data/dest/*/*${job}*
  touch /d0gstar/curr/$i/RCP
fi

batch: runs on node

batch_rcp: runs on schuur
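batch_rcp relies on the awk expression substr($0,length-8,9), which keeps the last nine characters of the request name, to match the reco and rAtpl output files on the node. A Python 3 sketch of that step and of the remote sources the script copies; job_suffix and rcp_sources are hypothetical names, the paths are those of the script above:

```python
def job_suffix(request: str) -> str:
    """Equivalent of awk '{print substr($0,length-8,9)}': the last 9 characters."""
    return request[-9:]


def rcp_sources(request: str, node: str) -> list[str]:
    """Remote paths batch_rcp passes to 'rcp -pr' when fetching from the node."""
    job = job_suffix(request)
    return [
        f"{node}:/data/dest/d0reco/reco*{job}*",
        f"{node}:/data/dest/reco_analyze/rAtpl*{job}*",
        f"{node}:/data/curr/{request}/Metadata/*.params",
        f"{node}:/data/curr/{request}/Metadata/*.py",
    ]


if __name__ == "__main__":
    print(job_suffix("minbias-02073214824"))  # prints 073214824
    for src in rcp_sources("minbias-02073214824", "node-2"):
        print(src)
```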

#!/bin/sh
locate(){
  file=`grep "import =" import_${1}_${job}.py | awk -F \" '{print $2}'`
  sam locate $file | fgrep -q [
  return $?
}
. /usr/products/etc/setups.sh
setup sam
SAM_STATION=hoeve
export SAM_STATION

tosam=$1
LIST=`cat $tosam`

for job in $LIST
do
  cd /data/disk2/sam_cache/${job}
  list='gen d0g sim'
  for i in $list
  do
    until locate $i || (sam declare import_${i}_${job}.py && locate ${i})
    do
      sleep 60
    done
  done

  list='reco recoanalyze'
  for i in $list
  do
    sam store --descrip=import_${i}_${job}.py --source=`pwd`
    return=$?
    echo Return code sam store $return
  done
done
echo Job finished ...

declare gen, d0g, sim

store reco, recoanalyze

runs on schuur, called by fbs or cron
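The declare loop above checks with sam locate first, declares the import file only if the file is not yet known, and retries every 60 seconds until locate succeeds. A Python 3 sketch of the same control flow with the SAM calls passed in as functions, so the logic can be exercised without a SAM station; locate and declare are stand-ins for the shell function and sam declare, and max_tries is an addition of this sketch (the shell loop has no upper bound):

```python
import time
from typing import Callable, Optional


def declare_until_located(
    tiers: list[str],
    locate: Callable[[str], bool],
    declare: Callable[[str], bool],
    delay: float = 60.0,
    max_tries: Optional[int] = None,
) -> None:
    """Python version of the shell loop
    until locate $i || (sam declare ... && locate $i); do sleep 60; done
    """
    for tier in tiers:
        tries = 0
        # Try locate first; declare only if the file is unknown,
        # and count the step as done only once locate succeeds.
        while not (locate(tier) or (declare(tier) and locate(tier))):
            tries += 1
            if max_tries is not None and tries >= max_tries:
                raise TimeoutError(f"{tier}: not located after {tries} tries")
            time.sleep(delay)


if __name__ == "__main__":
    known = set()  # stand-in for the SAM catalogue

    def fake_locate(tier: str) -> bool:
        return tier in known

    def fake_declare(tier: str) -> bool:
        known.add(tier)
        return True

    declare_until_located(["gen", "d0g", "sim"], fake_locate, fake_declare, delay=0)
    print(sorted(known))  # ['d0g', 'gen', 'sim']
```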

Filestream

• Fetch input from sam

• Read input file from schuur

• Process data on node

• Copy output to schuur

[Diagram: MC production with a filestream attached. Hosts: hoeve, node, schuur. Programs: mc_runjob, rcp, d0exe, rcp, sam; started by fbs submit (twice) and cron; arrows marked data or control.]

Analysis on farm

• Stages:
  – Read files from sam
  – Copy files to node(s)
  – Perform analysis on node
  – Copy files to file server
  – Store files in sam

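[Diagram: analysis data flow. Components: farm server, file server, node, SAM DB, datastore (FNAL, SARA); 100 CPUs; storage figures 1.2 TB and 40 GB; arrows marked control (fbs), data or metadata. Stages: 1. sam + rcp, 2. analyze, 3. rcp + sam, submitted as fbs(1), fbs(3) and fbs(2).]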

[Diagram: analysis test on triviaal and node-2. Processes: willem: sam (twice, for input and output), fbsuser: rcp (twice), fbsuser: analysis program.]

batch.jdf:

SECTION sam
EXEC=/home/willem/batch_sam
NUMPROC=1
QUEUE=IOQ
STDOUT=/home/willem/stdout
STDERR=/home/willem/stdout

batch_sam:

#!/bin/sh
. /usr/products/etc/setups.sh
setup sam
SAM_STATION=triviaal
export SAM_STATION

sam run project get_file.py --interactive > log

/usr/bin/rsh -n -l fbsuser triviaal rcp -r /stage/triviaal/sam_cache/boo node-2:/data/test >> log

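[Diagram: analysis data flow, second variant. Components: farm server, file server, node, SAM DB, datastore (FNAL, SARA); 100 CPUs; storage figures 1.2 TB and 40 GB; arrows marked control (fbs), data or metadata. Stages: 1. sam, 2. rcp + analyze + rcp, 3. rcp + sam, submitted as fbs(1), fbs(3) and fbs(2).]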

[Diagram: second analysis test on triviaal and node-2. Processes: willem: sam (twice, for input and output), fbsuser: rcp and analysis program, rcp, fbsuser: fbs submit.]

SECTION sam
EXEC=/d0gstar/batch_node
NUMPROC=1
QUEUE=FastQ
STDOUT=/d0gstar/stdout
STDERR=/d0gstar/stdout

#!/bin/sh
uname -a
date

rsh -l fbsuser triviaal fbs submit ~willem/batch_node.jdf

#!/bin/sh
. /usr/products/etc/setups.sh
setup fbsng
setup sam
SAM_STATION=triviaal
export SAM_STATION
sam run project get_file.py --interactive > log
/usr/bin/rsh -n -l fbsuser triviaal fbs submit /home/willem/batch_node.jdf

SECTION sam
EXEC=/home/willem/batch
NUMPROC=1
QUEUE=IOQ
STDOUT=/home/willem/stdout
STDERR=/home/willem/stdout

SECTION ana
EXEC=/d0gstar/batch_node
NUMPROC=1
QUEUE=FastQ
STDOUT=/d0gstar/stdout
STDERR=/d0gstar/stdout

#!/bin/sh
rcp -pr server:/stage/triviaal/sam_cache/boo /data/test
. /d0/fnal/ups/etc/setups.sh
setup root -q KCC_4_0:exception:opt:thread
setup kailib
root -b -q /d0gstar/test.C

{
  gSystem->cd("/data/test/boo");
  gSystem->Exec("pwd");
  gSystem->Exec("ls -l");
}

get_file.py:

#
# This file sets up and runs a SAM project.
#
import os, sys, string, time, signal
from re import *
from globals import *
import run_project
from commands import *
##########################################
# Set the following variables to appropriate values

# Consult database for valid choices
sam_station = "triviaal"

# Consult database for valid choices
project_definition = "op_moriond_p1014"

# A particular snapshot version, last or new
snapshot_version = 'new'

# Consult database for valid choices
appname = "test"
version = "1"
group = "test"

# The maximum number of files to get from sam
max_file_amt = 5

# for additional debug info use "--verbose"
#verbosity = "--verbose"
verbosity = ""

# Give up on all exceptions
give_up = 1

def file_ready(filename):
    # Replace this python subroutine with whatever
    # you want to do
    # to process the file that was retrieved.
    # This function will only be called in the event of
    # a successful delivery.
    print "File ",filename," has been delivered!"
    # os.system('cp '+filename+' /stage/triviaal/sam')
    return
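The file_ready hook above only prints a message; the copy to the staging area is commented out and would go through the shell. A Python 3 sketch of the hook with the copy enabled, using shutil.copy2 so no shell is involved and file names with spaces are not reinterpreted. The stage_dir default comes from the commented-out line; creating the directory and returning the destination path are assumptions of this sketch (the template itself is Python 2):

```python
import os
import shutil

# Destination taken from the commented-out line in get_file.py.
STAGE_DIR = "/stage/triviaal/sam"


def file_ready(filename: str, stage_dir: str = STAGE_DIR) -> str:
    """Called after a successful SAM delivery; copy the file to stage_dir."""
    print("File", filename, "has been delivered!")
    # Assumption: create the staging directory if it does not exist yet.
    os.makedirs(stage_dir, exist_ok=True)
    # copy2 keeps timestamps (like cp -p) and returns the destination path.
    return shutil.copy2(filename, stage_dir)
```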

Disk partitioning hoeve

/d0
  /fnal
    /d0dist
    /d0usr
    /ups
      /db
      /etc
      /prd
  /mcc
    /mcc-dist
    /mc_runjob
    /curr
/fbsng

Symbolic links:
/fnal -> /d0/fnal
/d0usr -> /fnal/d0usr
/d0dist -> /fnal/d0dist
/usr/products -> /fnal/ups
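With these links, scripts can refer to /usr/products/etc/setups.sh (as the batch scripts do) while the files physically live under /d0/fnal/ups. A Python 3 sketch that recreates the four links below a scratch directory and resolves them, assuming a Linux filesystem with symbolic links; make_links and its root argument are hypothetical, on hoeve the links live directly under /:

```python
import os
import tempfile

# (link, target) pairs from the hoeve layout; targets are absolute paths.
LINKS = [
    ("/fnal", "/d0/fnal"),
    ("/d0usr", "/fnal/d0usr"),
    ("/d0dist", "/fnal/d0dist"),
    ("/usr/products", "/fnal/ups"),
]


def make_links(root: str) -> None:
    """Create the links below `root`, rewriting targets to stay inside it."""
    for link, target in LINKS:
        path = os.path.join(root, link.lstrip("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        os.symlink(os.path.join(root, target.lstrip("/")), path)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "d0/fnal/ups/etc"))
    make_links(root)
    # /usr/products/etc resolves through /fnal to /d0/fnal/ups/etc.
    print(os.path.realpath(os.path.join(root, "usr/products/etc")))
```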

ana_runjob

• Is analogous to mc_runjob

• Creates and submits analysis jobs

• Input:
  – get_file.py with SAM project name
    • Project defines files to be processed
  – analysis script
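ana_runjob is only outlined on this slide. A Python 3 sketch of the files it might produce for one job, modeled on the test jobs on triviaal: the settings block of get_file.py with the project name, and a job description with a sam section in IOQ and an ana section in FastQ. The function name, the file keys, and the DEPEND=done(sam) line (borrowed from the MC job description; the test jobs chain the stages with an explicit fbs submit instead) are all assumptions:

```python
def make_analysis_job(workdir: str, project: str, script: str,
                      station: str = "triviaal",
                      max_files: int = 5) -> dict[str, str]:
    """Return {file name: contents} for one analysis job (hypothetical)."""
    # Only the settings block of get_file.py changes from job to job.
    settings = (
        f"sam_station = {station!r}\n"
        f"project_definition = {project!r}\n"
        f"max_file_amt = {max_files}\n"
    )
    jdf_lines = [
        # Stage 1: fetch the project's files through SAM (I/O queue).
        "SECTION sam",
        f"EXEC={workdir}/batch_sam",
        "NUMPROC=1",
        "QUEUE=IOQ",
        f"STDOUT={workdir}/stdout",
        f"STDERR={workdir}/stdout",
        # Stage 2: run the user's analysis script once stage 1 is done.
        "SECTION ana",
        f"EXEC={script}",
        "NUMPROC=1",
        "QUEUE=FastQ",
        "DEPEND=done(sam)",
        f"STDOUT={workdir}/stdout_ana",
        f"STDERR={workdir}/stdout_ana",
    ]
    return {"get_file_settings.py": settings,
            "batch.jdf": "\n".join(jdf_lines) + "\n"}
```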

Integration with grid (1)

• At present, separate clusters:
  – D0, LHCb, Alice, DAS cluster

• hoeve and schuur in farm network

Present network layout

[Diagram: hoeve, schuur, a switch, nodes, a router, hefnet, surfnet, and ajax (NFS).]

New network layout

[Diagram: farm router, three switches, D0, LHCb and alice, hefnet, lambda, hoeve, schuur, ajax (NFS) and booder.]

New network layout

[Diagram: farm router, three switches, D0, LHCb and alice, hefnet, lambda, hoeve, schuur, ajax (NFS) and booder, with das-2 added.]

Server tasks

• hoeve
  – software server
  – farm server

• schuur
  – fileserver
  – sam node

• booder
  – home directory server
  – in backup scheme

Integration with grid (2)

• Replace fbs with pbs or condor
  – pbs on Alice and LHCb nodes
  – condor on das cluster

• Use EDG installation tool LCFG
  – Install d0 software with rpm

• Problem with sam (uses ups/upd)
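Replacing fbs with pbs means re-expressing each SECTION of a job description as a PBS job script. A rough Python 3 sketch of that mapping for the keys used in this talk, assuming one PBS job per section: QUEUE becomes #PBS -q (PBS would need queues named FastQ and IOQ), NUMPROC becomes -l nodes=N (assumption: one process per node, as in nodes=1), and STDOUT/STDERR become -o/-e, or -j oe when both name the same file. DEPEND has no in-script equivalent (PBS expresses it at submission, e.g. qsub -W depend=afterok:<jobid>), so it is only kept as a comment. The functions are hypothetical, not an existing converter:

```python
def parse_jdf(text: str) -> dict[str, dict[str, str]]:
    """Split 'SECTION name' blocks of KEY=VALUE lines into nested dicts."""
    sections: dict[str, dict[str, str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("SECTION "):
            current = sections.setdefault(line.split(None, 1)[1], {})
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return sections


def section_to_pbs(name: str, sec: dict[str, str]) -> str:
    """Render one FBS section as a PBS job script (hypothetical mapping)."""
    out = ["#!/bin/sh", f"#PBS -N {name}"]
    if "QUEUE" in sec:
        out.append(f"#PBS -q {sec['QUEUE']}")
    out.append(f"#PBS -l nodes={sec.get('NUMPROC', '1')}")
    if "STDOUT" in sec:
        out.append(f"#PBS -o {sec['STDOUT']}")
    if "STDERR" in sec:
        if sec["STDERR"] == sec.get("STDOUT"):
            out.append("#PBS -j oe")  # same file: merge stderr into stdout
        else:
            out.append(f"#PBS -e {sec['STDERR']}")
    if "DEPEND" in sec:
        # Not expressible inside the script; pass -W depend=... to qsub.
        out.append(f"# FBS DEPEND={sec['DEPEND']}")
    out.append(sec["EXEC"])
    return "\n".join(out) + "\n"
```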

Integration with grid (3)

• Package mcc in rpm

• Separate programs from working space

• Use cfg commands to steer mc_runjob

• Find better place for card files

• Input structure now created on node

Grid job

#!/bin/sh

macro=$1

pwd=`pwd`

cd /opt/fnal/d0/mcc/mcc-dist
. mcc_dist_setup.sh

cd $pwd
dir=/opt/fnal/d0/mcc/mc_runjob/py_script
python $dir/Linker.py script=$macro

[willem@tbn09 willem]$ cat test.pbs
# PBS batch job script

#PBS -o /home/willem/out
#PBS -e /home/willem/err
#PBS -l nodes=1

# Changing to directory as requested by user

cd /home/willem

# Executing job as requested by user

./submit minbias.macro

PBS job submit

RunJob class for grid

class RunJob_farm(RunJob_batch) :
    def __init__(self,name=None) :
        RunJob_batch.__init__(self,name)
        self.myType="runjob_farm"

    def Run(self) :
        self.jobname = self.linker.CurrentJob()
        self.jobnaam = string.splitfields(self.jobname,'/')[-1]
        comm = 'chmod +x ' + self.jobname
        commands.getoutput(comm)
        if self.tdconf['RunOption'] == 'RunInBackground' :
            RunJob_batch.Run(self)
        else :
            bq = self.tdconf['BatchQueue']
            dirn = os.path.dirname(self.jobname)
            print dirn
            comm = 'cd ' + dirn + '; sh ' + self.jobnaam + ' `pwd` >& stdout'
            print comm
            runcommand(comm)
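In the non-background branch, Run builds one shell string: cd into the job directory, run the script with that directory as its argument, and send stdout and stderr to a file named stdout. The >& redirection is a csh/bash form; POSIX sh spells it > stdout 2>&1. A Python 3 sketch of the same step with subprocess, which avoids the shell string altogether; run_in_place is a hypothetical name, not part of mc_runjob:

```python
import os
import subprocess


def run_in_place(jobname: str) -> int:
    """Run 'sh <script> <dir>' inside the script's directory, logging to ./stdout."""
    dirn = os.path.dirname(os.path.abspath(jobname))
    script = os.path.basename(jobname)
    # Roughly 'chmod +x': add execute permission for user, group and others.
    os.chmod(jobname, os.stat(jobname).st_mode | 0o111)
    with open(os.path.join(dirn, "stdout"), "wb") as log:
        # stderr=STDOUT plays the role of '>&' in the original command string;
        # dirn is what `pwd` expands to after the cd.
        result = subprocess.run(["sh", script, dirn], cwd=dirn,
                                stdout=log, stderr=subprocess.STDOUT)
    return result.returncode
```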

To be decided

• Location of minimum bias files

• Location of MC output

Job status

• Job status is recorded in:
  – fbs
  – /d0/mcc/curr/<job_name>
  – /data/mcc/curr/<job_name>
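The batch scripts also leave marker files in the job's directory (/d0gstar/curr/<job_name> in the scripts above): a hidden .<node> file when the MC job starts, a <node> file when it ends, OK before the copy (batch_rcp only proceeds if it exists; presumably /d0gstar/bin/check writes it) and RCP after the copy to schuur. A Python 3 sketch that turns those markers into a coarse status; the status names are hypothetical, and matching on the prefix node assumes worker hosts are named node-N, as the ls node* pattern in batch_rcp does:

```python
import os


def job_status(jobdir: str) -> str:
    """Coarse MC job status from the marker files the batch scripts touch."""
    names = set(os.listdir(jobdir))
    if "RCP" in names:
        return "copied"    # batch_rcp finished; output is on schuur
    if "OK" in names:
        return "checked"   # check passed; waiting for batch_rcp
    # batch touches .<node> at the start and <node> at the end.
    if any(n.startswith("node") for n in names):
        return "mc_done"
    if any(n.startswith(".node") for n in names):
        return "running"
    return "queued"        # no markers yet
```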

SAM servers

• On master node:
  – station
  – fss

• On master and worker nodes:
  – stager
  – bbftp