
CMS GRID COMPUTING AT THE SPANISH TIER-1 AND TIER-2 SITES

P. Garcia-Abia (a), J.M. Hernández (a), F. Martínez (b), G. Merino (b), M. Rodríguez (b), F.J. Rodríguez-Calonge (a)
(a) CIEMAT, Madrid, Spain
(b) PIC, Barcelona, Spain

Abstract

CMS has adopted a distributed model for all of its computing in order to cope with the computing and storage resources required to process and analyze the huge amount of data the experiment will deliver from LHC startup. The architecture is a tier-organized structure of computing resources: a Tier-0 center at CERN, a small number of Tier-1 centers for mass data processing, and a relatively large number of Tier-2 centers where physics analysis will be performed. The distributed resources are interconnected by high-speed networks and operated by means of Grid toolkits and services. Using the Spanish Tier-1 (PIC) and Tier-2 (federated CIEMAT-IFCA) centers as examples, this poster presents the organization of the computing resources in CMS, together with the CMS Grid services, built on top of generic Grid services, that are required to operate the resources and carry out the CMS workflows. The Spanish sites contribute 5% of the CMS computing resources. We also present the current Grid-related computing activities performed at the CMS computing sites, such as high-throughput, reliable data distribution, distributed Monte Carlo production and distributed data analysis, in which the Spanish sites have traditionally played a leading role in development, integration and operation.

CMS Computing Model

Tier-0: Accepts data from the DAQ; prompt reconstruction; archives and distributes data to the Tier-1s.

Tier-1: Real data archiving; re-processing; calibration; skimming and other data-intensive analysis tasks; MC data archiving.

Tier-2: Data analysis; MC simulation; import of data from the Tier-1 and export of MC data.

CMS Computing Services/Workflows

[Diagram: CMS-specific Grid services built on top of generic Grid services. CMS services: data transfer DB (TMDB), data bookkeeping DB (RefDB), data location DB (PubDB), site local catalogue, MC production tool (McRunjob), analysis tool (CRAB), job monitoring and bookkeeping (BOSS), data transfer system (PhEDEx). Generic Grid services: User Interface, Resource Broker, Grid Information System, Computing Element, Worker Node, Storage Element, LCG replica catalogue. Arrows indicate interactions, job flow and data flow between the Workload Management System and the Data Management System; the legend distinguishes applications, databases (local, global, or local-or-remote) and computing resources.]
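As a reading aid, the sketch below walks a single analysis job through the services named in the diagram. It is illustrative Python only, not CMS software; the step descriptions merely paraphrase the labels above.

# Illustrative only: traces the path of one analysis job through the
# services shown in the diagram above (this is not CMS software).

ANALYSIS_JOB_PATH = [
    ("User Interface",    "the user prepares the analysis task with CRAB"),
    ("PubDB",             "CRAB looks up where the requested data are published"),
    ("Resource Broker",   "the job is submitted; the broker consults the Grid Information System"),
    ("Computing Element", "the job is queued at a site hosting the data"),
    ("Worker Node",       "the job runs; BOSS records monitoring and bookkeeping information"),
    ("Storage Element",   "input data are read via the site local catalogue"),
]

def describe_path(path):
    """Print the job flow step by step."""
    for step, (service, action) in enumerate(path, start=1):
        print(f"{step}. {service}: {action}")

if __name__ == "__main__":
    describe_path(ANALYSIS_JOB_PATH)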

Hardware Resources

PIC Tier-1
              End 2005   End 2006   End 2007
CPU (kSI2k)        100        300        800
Disk (TB)           50        200        400
Tape (TB)          100        400        800
WAN (Gbps)           1         10         10

Tier-2 Spain (CIEMAT-IFCA)
              End 2005   End 2006   End 2007
CPU (kSI2k)        150        400        800
Disk (TB)           20        100        200
Tape (TB)            -          -          -
WAN (Gbps)           1          1         10

Computing Grid Activities

Production activities

Data Distribution

• Done with PhEDEx, the CMS large-scale dataset replica management system: managed, reliable, routed multi-hop transfers, including pre-staging of data from tape (the idea is sketched below).
• Demonstrated sustained transfer rates of ~3 TB/day to the Spanish sites.
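The sketch below illustrates, in plain Python, the notion of a routed multi-hop transfer with tape pre-staging. It is a conceptual illustration only; it does not use PhEDEx code or interfaces, and the site and block names are hypothetical.

# Conceptual sketch of a routed, multi-hop transfer with tape pre-staging.
# Illustrative Python only; not the PhEDEx implementation or its interfaces.

def prestage(site, block):
    """Bring a data block from tape to the disk buffer at 'site'."""
    print(f"pre-staging {block} from tape to disk at {site}")

def transfer(src, dst, block):
    """Copy a data block between the disk buffers of two sites."""
    print(f"transferring {block}: {src} -> {dst}")

def routed_transfer(route, block, tape_backed):
    """Move 'block' along 'route' (an ordered list of sites), hop by hop."""
    for src, dst in zip(route, route[1:]):
        if src in tape_backed:
            prestage(src, block)   # data must be on disk before it can be served
        transfer(src, dst, block)

if __name__ == "__main__":
    # Hypothetical route: CERN Tier-0 -> PIC Tier-1 -> CIEMAT Tier-2
    routed_transfer(route=["CERN_T0", "PIC_T1", "CIEMAT_T2"],
                    block="example-dataset-block",
                    tape_backed={"CERN_T0", "PIC_T1"})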

Distributed Data Analysis
• The Spanish sites host ~15% of CMS data (~15 Mio. events, ~15 TB).
• ~10K analysis jobs have run at the Spanish sites so far.

Distributed Monte Carlo Production
• Leading role of the Spanish sites in MC production on the LCG Grid in the areas of development, integration and operation.
• ~15 Mio. MC events produced in LCG.

Integration/test activities

LCG Service Challenge 3 (SC3)
CMS goal: exercise, under realistic conditions, the bulk data transfer and data serving infrastructure.

SC3 Throughput phase (July 2005)
Goal: high-throughput transfer and storage system test.
Sustained disk-to-disk rate of ~4 TB/day (~50 MB/s average) for 2 weeks to the PIC Tier-1.
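As a quick sanity check, the quoted daily volume and average rate are consistent; the short Python calculation below shows the conversion.

# Back-of-the-envelope check: ~4 TB/day expressed as an average rate in MB/s.
TB_PER_DAY = 4
MB_PER_TB = 1_000_000            # decimal units: 1 TB = 10^6 MB
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 s

rate_mb_per_s = TB_PER_DAY * MB_PER_TB / SECONDS_PER_DAY
print(f"{rate_mb_per_s:.0f} MB/s")   # ~46 MB/s, i.e. roughly the quoted ~50 MB/s average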

SC3 Service phase (September-November 2005)
Goal: concurrent data transfer and data access from storage.
Tested the Data Management components (data transfer system, data publication, data catalogues, storage systems) as well as the Workload Management components (job submission, monitoring, etc.) in large-scale, realistic use.
Automatic data publication for analysis as data were transferred; automatic analysis job submission as data were published (sketched below).
15 TB of data transferred to the Spanish sites.
10K jobs run at the Spanish sites (~20% of the CMS SC3 jobs run in LCG).
175 MB/s aggregate analysis job read throughput at the PIC Tier-1 and 75 MB/s at the Spanish Tier-2.
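The transfer-publish-submit chain can be pictured as a simple polling loop: publish each dataset once its transfer completes, then submit analysis jobs over the newly published data. The Python below is a hypothetical sketch of that logic only, not the actual CMS or SC3 tooling; all function names are made up for illustration.

# Hypothetical sketch of the SC3 automation chain: publish data as transfers
# complete, then submit analysis jobs as data get published.
# Illustrative Python only; not the actual CMS/SC3 tooling.
import time

def completed_transfers():
    """Return names of datasets whose transfer to the site has finished (stub)."""
    return []

def publish(dataset):
    """Register the dataset in the local publication catalogue (stub)."""
    print(f"published {dataset}")

def submit_analysis(dataset):
    """Submit analysis jobs over the newly published dataset (stub)."""
    print(f"submitted analysis jobs for {dataset}")

def automation_loop(poll_seconds=300):
    """Poll forever: publish finished transfers, then launch jobs on them."""
    published = set()
    while True:
        for dataset in completed_transfers():
            if dataset not in published:
                publish(dataset)           # step 1: make the data visible for analysis
                submit_analysis(dataset)   # step 2: run jobs on the newly published data
                published.add(dataset)
        time.sleep(poll_seconds)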


For more information, contact:
José M. Hernández
CIEMAT, Particle Physics Division
E-28040, Madrid, Spain
[email protected]