SAMGrid. GridPP11, Liverpool, Sept 2004. Gavin Davies, Imperial College London


GridPP11 Liverpool Sept04

SAMGrid

GridPP11 Liverpool, Sept 2004

Gavin Davies, Imperial College London


Introduction

• Tevatron
  – Less data than LHC, but still PBs/experiment and growing
  – Running experiments

• SAM (Sequential Access to Metadata)
  – Well developed metadata and distributed data replication system
  – Developed by DØ & FNAL-CD

• JIM (Job Information and Monitoring)
  – Handles job submission and monitoring (all but data handling)
  – SAM + JIM → SAMGrid – computational grid

• Runjob
  – Handles job workflow management

See http://cdinternal.fnal.gov/RUNIIRev2004/runIIMP.asp


SAMGrid Architecture


SAM plots

Plots (DØ usage):
• Up to 200 TB/month
• Over 2 PB in last yr

CDF usage now similar - have just topped the PB

Active SAM sites: 40 DØ, 26 CDF
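As a rough sense of scale for the 200 TB/month peak quoted above, the monthly volume can be converted into an average sustained rate. This is only a back-of-envelope sketch: the decimal units (1 TB = 10^12 bytes), the 30-day month and the function names are assumptions, not figures from the talk.

```python
# Back-of-envelope conversion of a monthly data volume into a sustained rate.
# Assumptions (not from the slides): decimal units (1 TB = 1e12 bytes)
# and a 30-day month.

SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000 s


def sustained_rate_mb_per_s(tb_per_month: float) -> float:
    """Average rate in MB/s needed to move `tb_per_month` in a 30-day month."""
    bytes_per_month = tb_per_month * 1e12
    return bytes_per_month / SECONDS_PER_MONTH / 1e6


def sustained_rate_gbit_per_s(tb_per_month: float) -> float:
    """The same average rate expressed in Gbit/s."""
    return sustained_rate_mb_per_s(tb_per_month) * 8 / 1000


if __name__ == "__main__":
    peak = 200.0  # TB/month, the peak figure quoted on the slide
    print(f"{sustained_rate_mb_per_s(peak):.1f} MB/s")
    print(f"{sustained_rate_gbit_per_s(peak):.2f} Gbit/s")
```

Under these assumptions the peak corresponds to an average of roughly 77 MB/s, or about 0.6 Gbit/s, sustained over the whole month.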


SAMGrid plots

http://samgrid.fnal.gov:8080/ (09/09/04)

JIM: Active execution sites: 11 DØ, 1 CDF in testing


SAMGrid plots


DØ – Production - MC

• All DØ MC always produced off-site

• SAMGrid now default (went into production in Mar 04)
  – Based on request system and jobmanager-mc_runjob
  – MC software package retrieved via SAM
  – Currently running at (multiple) sites in Cz, Fr, UK, USA (10 in total + FNAL)
    • more on way, inc central farm
  – Average production efficiency ~90%
  – Average inefficiency due to grid infrastructure ~1-5%

• For more details, see
  – GridPP10 DØ talk by Peter Love
  – http://www-d0.fnal.gov/computing/grid/deployment-issues.html


DØ – Production - Reprocessing

• P14 Autumn 2003
  – 25M events in UK
  – Based around mc_runjob
  – Distributed computing rather than Grid
  – UK effort key to project success

• P17 Autumn 2004
  – x 10 larger, use of db proxy servers
  – SAMGrid as default
  – Use LCG resources


DØ – Production - LCG

• Increasing effort to ensure SAMGrid / LCG interoperability
  – MC generated on EDG/LCG and other shared resources (inc Imperial, RAL) “by hand”
  – Demo of sam_client functionality on LCG at London workshop in Apr
  – Will use LCG resources for p17 data reprocessing

All Nikhef MC produced this way


(DØ –) Runjob

• mc_runjob currently used by SAMGrid for MC and reprocessing

• DØrunjob - the rewrite

• Joint CDF, CMS, DØ, FNAL-CD project

• Base classes from common Runjob package

• DØrunjob available this autumn
  – Will incorporate Sandbox as a separate module

• For details see: http://projects.fnal.gov/runjob/

Diagram: Runjob → CDFRunjob, CMSRunjob, DØRunjob
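The "base classes from a common package" arrangement can be illustrated with a minimal sketch. Everything here is hypothetical: only the package names Runjob, CDFRunjob, CMSRunjob and DØRunjob come from the slide, while the class layout, method names and stage names are invented for illustration and are not the real Runjob API. `D0Runjob` is simply an ASCII spelling of DØRunjob.

```python
# Hypothetical sketch only. Apart from the package names on the slide
# (Runjob, CDFRunjob, CMSRunjob, DØRunjob), every class, method and stage
# name below is invented for illustration; this is not the real Runjob API.
from abc import ABC, abstractmethod
from typing import List


class RunjobBase(ABC):
    """Stand-in for a base class supplied by the common Runjob package."""

    def __init__(self, request_id: str) -> None:
        self.request_id = request_id

    @abstractmethod
    def stages(self) -> List[str]:
        """Return the ordered workflow stages for one experiment."""

    def describe(self) -> str:
        # Shared logic lives in the base class; experiments only supply stages.
        return f"{self.request_id}: " + " -> ".join(self.stages())


class D0Runjob(RunjobBase):
    def stages(self) -> List[str]:
        return ["generate", "simulate", "reconstruct"]  # illustrative names


class CDFRunjob(RunjobBase):
    def stages(self) -> List[str]:
        return ["generate", "simulate"]  # illustrative names


class CMSRunjob(RunjobBase):
    def stages(self) -> List[str]:
        return ["generate", "digitise"]  # illustrative names


if __name__ == "__main__":
    for job in (D0Runjob("req-1"), CDFRunjob("req-2"), CMSRunjob("req-3")):
        print(job.describe())
```

The point of the layout is that workflow logic common to all experiments is written once, and each experiment-specific package overrides only what differs.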


CDF – production - I

• See Mòrag Burgon-Lyon’s GridPP10 talk for details

• Goal 1: 25% of computing offsite by June 2004
  – Done, using DCAF and SAM
    • DCAF = de-centralised CDF analysis farm, core of 7 sites, more on way

• Goal 2: 50% by June 2005, using Grid
  – Resources being identified / pledged

• JIM deployment
  – Originally planned for Oct 15th
  – Problematic, look at Grid3 as possible alternative


CDF – production - II

• Migration of DCAF sites to Condor

• Migration to SAM V6
  – Switch to new internal dbserve code under test
  – Roll out to global sites expected soon

• FroNTier - new way to serve database contents to remote institutes
  – Should lower load on central CDF Oracle servers

• Studying methods to lower load and avoid fragmentation on remote file servers due to simultaneous network writes


(CDF -) SAMTV

• SAM TV used by CDF & DØ to monitor SAM and SAM stations
  – Currently created from log files
  – Version in dev created from MIS database, filled by new MIS server


Summary / plans

DØ

• SAM & SAMGrid critical
  – GridPP key part of effort

• SAMGrid, default for
  – MC production
  – Data reprocessing from autumn
  – Analysis to follow

• dØ tools, dØrte, sandboxing

• Interoperability
  – Good progress

CDF

• 25% of computing off-site
  – Most with DCAF/SAM
  – GridPP effort key part of effort

• Increase to 50% for June 2005
  – More DCAF installations

• Encourage user migration

UKLight - 10 Gbit/s - “data – reprocessing”


Backup - I

From Peter Love’s GridPP10 talk