XXII-th International Symposium on Nuclear Electronics and Computing, Varna, 6-13 September 2009. ATLAS Distributed Computing: Computing Model, Data Management, Production System, Distributed Analysis, Information System, Monitoring. Alexei Klimentov, Brookhaven National Laboratory.
Alexei Klimentov
Brookhaven National Laboratory
ATLAS Distributed Computing: Computing Model, Data Management, Production System, Distributed Analysis, Information System, Monitoring
Introduction
The title that Vladimir gave me cannot be done in 20 minutes.
I will talk about the Distributed Computing components, but I am certainly biased, as any operations person is.
ATLAS Collaboration
6 continents, 37 countries, 169 institutions, 2800 physicists, 700 students, >1000 technical and support staff
Albany, Alberta, NIKHEF Amsterdam, Ankara, LAPP Annecy, Argonne NL, Arizona, UT Arlington, Athens, NTU Athens, Baku,
IFAE Barcelona, Belgrade, Bergen, Berkeley LBL and UC, HU Berlin, Bern, Birmingham, Bogotá, Bologna, Bonn, Boston, Brandeis,
Bratislava/SAS Kosice, Brookhaven NL, Buenos Aires, Bucharest, Cambridge, Carleton, Casablanca/Rabat, CERN, Chinese Cluster, Chicago,
Chilean Cluster (Santiago+Valparaiso), Clermont-Ferrand, Columbia, NBI Copenhagen, Cosenza, AGH UST Cracow, IFJ PAN Cracow, DESY,
Dortmund, TU Dresden, JINR Dubna, Duke, Frascati, Freiburg, Geneva, Genoa, Giessen, Glasgow, Göttingen, LPSC Grenoble, Technion Haifa,
Hampton, Harvard, Heidelberg, Hiroshima, Hiroshima IT, Indiana, Innsbruck, Iowa SU, Irvine UC, Istanbul Bogazici, KEK, Kobe, Kyoto, Kyoto UE,
Lancaster, UN La Plata, Lecce, Lisbon LIP, Liverpool, Ljubljana, QMW London, RHBNC London, UC London, Lund, UA Madrid, Mainz, Manchester,
Mannheim, CPPM Marseille, Massachusetts, MIT, Melbourne, Michigan, Michigan SU, Milano, Minsk NAS, Minsk NCPHEP, Montreal,
McGill Montreal, FIAN Moscow, ITEP Moscow, MEPhI Moscow, MSU Moscow, Munich LMU, MPI Munich, Nagasaki IAS, Nagoya, Naples,
New Mexico, New York, Nijmegen, BINP Novosibirsk, Ohio SU, Okayama, Oklahoma, Oklahoma SU, Oregon, LAL Orsay, Osaka, Oslo, Oxford,
Paris VI and VII, Pavia, Pennsylvania, Pisa, Pittsburgh, CAS Prague, CU Prague, TU Prague, IHEP Protvino, Regina, Ritsumeikan,
UFRJ Rio de Janeiro, Rome I, Rome II, Rome III, Rutherford Appleton Laboratory, DAPNIA Saclay, Santa Cruz UC, Sheffield, Shinshu, Siegen, Simon
Fraser Burnaby, SLAC, Southern Methodist Dallas, PNPI St.Petersburg, Stockholm, KTH Stockholm, Stony Brook, Sydney, AS Taipei, Tbilisi,
Tel Aviv, Thessaloniki, Tokyo ICEPP, Tokyo MU, Toronto, TRIUMF, Tsukuba, Tufts, Udine/ICTP, Uppsala, Urbana UI, Valencia,
UBC Vancouver, Victoria, Washington, Weizmann Rehovot, FH Wiener Neustadt, Wisconsin, Wuppertal, Yale, Yerevan
Necessity of Distributed Computing?
- ATLAS will collect RAW data at 320 MB/s for 50k seconds/day and ~100 days/year
  - RAW data: 1.6 PB/year
- Processing (and re-processing) these events will require ~10k CPUs full time in the first year of data-taking, and a lot more in the future as data accumulate
- Reconstructed events will also be large, as people want to study detector performance as well as do physics analysis using the output data
  - ESD data: 1.0 PB/year, AOD data: 0.1 PB/year
- At least 10k CPUs are also needed for continuous simulation production of at least 30% of the real data rate, and for analysis
- There is no way to concentrate all the needed computing power and storage capacity at CERN
  - The LEP model will not scale to this level
- The idea of distributed computing, and later of the computing grid, became fashionable at the turn of the century and looked promising when applied to HEP experiments' computing needs
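The quoted RAW volume follows directly from the trigger output rate and the running time above; a minimal back-of-the-envelope check (plain arithmetic, using only the numbers from these bullets):

```python
# Back-of-the-envelope check of the yearly RAW data volume quoted above.
RAW_RATE_MB_S = 320            # RAW data rate, MB/s
LIVE_SECONDS_PER_DAY = 50_000  # ~50k seconds of data taking per day
DAYS_PER_YEAR = 100            # ~100 days of running per year

raw_per_year_mb = RAW_RATE_MB_S * LIVE_SECONDS_PER_DAY * DAYS_PER_YEAR
raw_per_year_pb = raw_per_year_mb / 1e9   # 1 PB = 1e9 MB (decimal units)

print(f"RAW volume per year: {raw_per_year_pb:.1f} PB")  # -> 1.6 PB
```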
Computing Model : Main Operations

Tier-0 (CERN):
- Copy RAW data to the CERN CASTOR Mass Storage System (tape) for archival
- Copy RAW data to Tier-1s for storage and reprocessing
- Run first-pass calibration/alignment (within 24 hrs)
- Run first-pass reconstruction (within 48 hrs)
- Distribute reconstruction output (ESDs, AODs & TAGs) to Tier-1s

Tier-1s (receive RAW):
- Archive a fraction of the RAW data
- (Re)run calibration and alignment
- Re-process data with better calibration/alignment and/or algorithms
- Distribute derived data to Tier-2s
- Run HITS reconstruction and large-scale event selection and analysis jobs

Calibration Tier-2s: 5 sites in Europe and the US

Tier-2s (36 Tier-2s, ~80 sites; receive AOD, TAG):
- Run MC simulation
- Keep AOD and TAG for the analysis
- Run analysis jobs

Tier-3s (O(100) sites worldwide):
- Contribute to MC simulation
- User analysis

Incomplete list of data formats:
- ESD : Event Summary Data
- AOD : Analysis Object Data
- DPD : Derived Physics Data
- TAG : event meta-information
ATLAS Grid Sites and Data Distribution

3 Grids, 10 Tier-1s, ~80 Tier-2(3)s. A Tier-1 and its associated Tier-ns form a cloud; ATLAS clouds have from 2 to 15 sites. We also have T1-T1 associations.

[World map of ATLAS grid sites: the Tier-0 at CERN; Tier-1s such as BNL, IN2P3, FZK and ASGC; the US Tier-2s MWT2, AGLT2, NET2, SWT2 and SLAC]

ATLAS Tier-1s data shares (MoU & Computing Model):
- BNL: RAW 24%; ESD, AOD, DPD, TAG 100%
- IN2P3: RAW, ESD 15%; AOD, DPD, TAG 100%
- FZK: RAW, ESD 10%; AOD, DPD, TAG 100%

Input rates estimation (Tier-1s): data export from CERN, reprocessed and MC data distribution

Tier-1        BNL  CNAF  FZK  IN2P3  NDGF  PIC  RAL  SARA  TAIWAN  TRIUMF  Summary
Tape (MB/s)    80    16   32     48    16   16   32    48      16      16      320
Disk (MB/s)   240    60   80    100    60   60   80   100      60      60      800
Total (MB/s)  320    76  112    148    76   76  112   148      76      76     1220
Ubiquitous Wide Area Network Bandwidth

- The first Computing TDRs assumed there would not be enough network bandwidth
- The MONARC project proposed the multi-Tier model with this in mind
- Today network bandwidth is our least problem
- But we still have the Tier model in the LHC experiments
- The network is not yet ideal in all parts of the world (the "last mile")
- LHCOPN provides an excellent backbone for the Tier-0 and the Tier-1s
- Each LHC experiment has adopted it differently

K.Bos, "Status and Prospects of the LHC Experiments Computing", CHEP'09
Distributed Computing Components
The ATLAS Grid architecture is based on:
- Distributed Data Management (DDM)
- Distributed Production System (ProdSys, PanDA)
- Distributed Analysis (DA): GANGA, PanDA
- Monitoring
- Grid Information System
- Accounting
- Networking
- Databases
ATLAS Distributed Data Management. 1/2
The second generation of the ATLAS DDM system (DQ2):
- DQ2 is built on top of Grid data transfer tools
- Moved to a dataset-based approach. Dataset: an aggregation of files plus associated DDM metadata
- The dataset is the unit of storage and replication
- Automatic data transfer mechanisms using distributed site services
  - Subscription system
  - Notification system

Technicalities:
- Global services: dataset repository and dataset location catalog; logical file names only, no global physical file catalog
- Local site services (Local File Catalog): provide the logical-to-physical file name mapping
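To make the dataset/subscription model above concrete, here is a minimal sketch in Python; the class and method names (CentralCatalog, subscribe, fulfil_subscriptions, etc.) are invented for illustration and are not the actual DQ2 API.

```python
# Hypothetical sketch of the DQ2 dataset/subscription model (not the real DQ2 code).
from dataclasses import dataclass, field

@dataclass
class Dataset:
    name: str                       # dataset name
    files: list[str]                # logical file names only
    metadata: dict = field(default_factory=dict)

class CentralCatalog:
    """Global services: dataset repository + dataset location catalog."""
    def __init__(self):
        self.repository: dict[str, Dataset] = {}       # dataset name -> definition
        self.locations: dict[str, set[str]] = {}       # dataset name -> sites with a replica
        self.subscriptions: list[tuple[str, str]] = [] # (dataset name, destination site)

    def register(self, ds: Dataset, site: str) -> None:
        self.repository[ds.name] = ds
        self.locations.setdefault(ds.name, set()).add(site)

    def subscribe(self, ds_name: str, dest_site: str) -> None:
        # A subscription asks the destination's site services to replicate the dataset.
        self.subscriptions.append((ds_name, dest_site))

class SiteService:
    """Local site services: pull subscribed data and keep the logical-to-physical mapping."""
    def __init__(self, site: str, catalog: CentralCatalog):
        self.site, self.catalog = site, catalog
        self.local_file_catalog: dict[str, str] = {}   # logical name -> physical path

    def fulfil_subscriptions(self) -> None:
        for ds_name, dest in self.catalog.subscriptions:
            if dest != self.site:
                continue                                # not our subscription
            if self.site in self.catalog.locations.get(ds_name, set()):
                continue                                # replica already complete here
            for lfn in self.catalog.repository[ds_name].files:
                # In reality this is a Grid transfer from a source replica.
                self.local_file_catalog[lfn] = f"/storage/{self.site}/{lfn}"
            self.catalog.locations.setdefault(ds_name, set()).add(self.site)
```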
ATLAS Distributed Data Management. 2/2
[STEP09 plot: data export from CERN to the Tiers, daily average, MB/s, vs. days of running]

[STEP09 plot: reprocessed dataset replication between Tier-1s, ΔT [hours] = T_last_file_transfer - T_subscription]
- 99% of the data were transferred within 4 hours
- Latency in reprocessing or site issue
- One dataset was not replicated after 3 days
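The replication latency plotted above is simply the arrival time of the last file of a dataset minus the subscription time; a minimal sketch of how such numbers could be computed (the input layout and helper names are assumptions, not the actual monitoring code):

```python
# Per-dataset replication latency and the fraction replicated within 4 hours
# (toy input format; the real numbers come from the DDM dashboards).
from datetime import datetime, timedelta

def replication_latency(t_subscription: datetime, file_done_times: list[datetime]) -> timedelta:
    # Latency = time of the last file transfer minus the subscription time.
    return max(file_done_times) - t_subscription

def fraction_within(latencies: list[timedelta], limit: timedelta) -> float:
    return sum(dt <= limit for dt in latencies) / len(latencies)

# Example with two toy datasets:
t0 = datetime(2009, 6, 2, 12, 0)
latencies = [
    replication_latency(t0, [t0 + timedelta(hours=1), t0 + timedelta(hours=3)]),
    replication_latency(t0, [t0 + timedelta(hours=2), t0 + timedelta(hours=6)]),
]
print(fraction_within(latencies, timedelta(hours=4)))  # -> 0.5
```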
ATLAS Production System 1/2
Manages ATLAS simulation (full chain) and reprocessing jobs on the wLCG:
- Task request interface to define a related group of jobs
- Input: DQ2 dataset(s) (with the exception of some event generation)
- Output: DQ2 dataset(s) (the jobs are done only when the output is at the Tier-1)
- Due to temporary site problems, jobs are allowed several attempts
- Job definitions and attempt states are stored in the Production Database (Oracle DB)
- Jobs are supervised by the ATLAS Production System

Consists of many components:
- DDM/DQ2 for data management
- PanDA task request interface and job definitions
- PanDA for job supervision
- ATLAS Dashboard and PanDA monitor for monitoring
- Grid middlewares
- ATLAS software
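Because jobs may fail for transient site reasons, each job's definition and attempt state is recorded and the job is resubmitted up to a limit; a minimal sketch of that retry bookkeeping (field and state names are illustrative, not the actual Production Database schema):

```python
# Illustrative job-attempt bookkeeping, loosely modelled on the bullets above
# (not the actual ProdSys/PanDA schema).
from dataclasses import dataclass

MAX_ATTEMPTS = 3   # jobs are allowed several attempts before being declared failed

@dataclass
class Job:
    job_id: int
    input_dataset: str
    output_dataset: str
    attempt: int = 0
    state: str = "defined"   # defined -> running -> finished | failed

def run_with_retries(job: Job, execute) -> Job:
    """Resubmit the job on transient failures, as the Production System does."""
    while job.attempt < MAX_ATTEMPTS:
        job.attempt += 1
        job.state = "running"
        if execute(job):             # e.g. the attempt at a Grid site succeeded
            job.state = "finished"   # output dataset is now at the Tier-1
            return job
        job.state = "holding"        # transient site problem; try again
    job.state = "failed"
    return job
```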
ATLAS Production System 2/2
- Task request interface
- Production Database: job definitions, job states, metadata
- Task input: DQ2 datasets; task states
- Task output: DQ2 datasets
- 3 Grids / 10 clouds / 90+ production sites
- Monitoring of sites, tasks and jobs
- Job brokering is done by the PanDA service (bamboo) according to input data and site availability

A.Read, Mar09
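A minimal sketch of brokering by input-data location and site availability, as in the last bullet above (the data structures and names are hypothetical; the real PanDA/bamboo brokerage is far richer):

```python
# Toy broker: prefer an available site that already holds the input dataset
# (hypothetical structures; illustration only).
from __future__ import annotations

def broker(input_dataset: str,
           replicas: dict[str, set[str]],    # dataset -> sites holding a replica
           site_online: dict[str, bool],     # site -> available for production?
           free_slots: dict[str, int]) -> str | None:
    candidates = [s for s in replicas.get(input_dataset, set())
                  if site_online.get(s) and free_slots.get(s, 0) > 0]
    if not candidates:
        return None                          # job waits in the queue
    # Send the job where the most CPU slots are free.
    return max(candidates, key=lambda s: free_slots[s])

site = broker("mc09.000123.example_minbias.EVNT",   # toy dataset name
              replicas={"mc09.000123.example_minbias.EVNT": {"BNL", "FZK"}},
              site_online={"BNL": True, "FZK": False},
              free_slots={"BNL": 500, "FZK": 1000})
print(site)   # -> "BNL"
```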
Data Processing Cycle
Data processing at CERN (Tier-0 processing):
- First-pass processing of the primary event stream
- The derived datasets (ESD, AOD, DPD, TAG) are distributed from the Tier-0 to the Tier-1s
- RAW data (received from the Event Filter farm) are exported within 24h, so first-pass processing can also be done by the Tier-1s (though this facility was not used during the LHC beam and cosmic-ray runs)

Data reprocessing at Tier-1s:
- 10 Tier-1 centres worldwide, each taking a subset of the RAW data (Tier-1 shares range from 5% to 25%); the ATLAS production facilities at CERN can be used in case of emergency
- Each Tier-1 reprocesses its share of the RAW data; the derived datasets are distributed ATLAS-wide
See P.Nevski's talk at NEC2009, "LHC Computing"
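As a toy illustration of splitting RAW data across Tier-1s by their shares (the BNL/IN2P3/FZK percentages are the ones from the earlier slide, "OTHERS" lumps the remaining Tier-1s together; the real assignment is agreed per run by ADC operations, not computed like this):

```python
# Toy split of a list of RAW datasets across Tier-1s in proportion to their shares.
tier1_shares = {"BNL": 0.24, "IN2P3": 0.15, "FZK": 0.10, "OTHERS": 0.51}

def split_by_share(datasets: list, shares: dict) -> dict:
    """Hand out datasets in proportion to each Tier-1's share (largest share first)."""
    out, start = {}, 0
    n = len(datasets)
    for site, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        count = round(share * n)
        out[site] = datasets[start:start + count]
        start += count
    return out

raw = [f"data09.run{100000 + i}.RAW" for i in range(20)]   # placeholder names
for site, dss in split_by_share(raw, tier1_shares).items():
    print(site, len(dss))   # OTHERS 10, BNL 5, IN2P3 3, FZK 2
```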
ATLAS Data Simulation and Reprocessing
[Plot: running jobs; reprocessing campaigns Sep08-Sep09]

- The Production System is in continuous operation
- 10 clouds use LFC as the file catalog and PanDA as the job executor
- CPUs are under-utilized on average; peak rate 33k jobs/day
- ProdSys can produce 100 TB/week of MC
- Average walltime efficiency is over 90%
- The system does data simulation and data reprocessing
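Walltime efficiency here is taken to mean the fraction of consumed walltime spent in successful jobs; a minimal sketch of that bookkeeping (an assumed definition, which may differ from the official ATLAS accounting):

```python
# Walltime efficiency = successful walltime / total walltime (assumed definition).
def walltime_efficiency(jobs: list[dict]) -> float:
    total = sum(j["walltime_s"] for j in jobs)
    good = sum(j["walltime_s"] for j in jobs if j["status"] == "finished")
    return good / total if total else 0.0

jobs = [
    {"status": "finished", "walltime_s": 36_000},
    {"status": "finished", "walltime_s": 30_000},
    {"status": "failed",   "walltime_s": 6_000},
]
print(f"{walltime_efficiency(jobs):.0%}")  # -> 92%
```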
ATLAS Distributed Analysis
- Probably the most important area at this point
- It depends on a functional data management and job management system
- Two widely used distributed analysis tools (Ganga and pathena); they capture the great majority of users
- We expect the usage to grow substantially in the preparation for, and especially during, the 2009/10 run
- Present/traditional use cases: AOD/DPD analysis is clearly very important, but users also run over selected RAW data (for detector debugging, studies, etc.)
- ATLAS jobs go to the data

J.Elmsheuser, Sep09
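"Jobs go to the data" means an analysis task is split into sub-jobs that run at the sites already hosting the input dataset, rather than copying the data to the user; a toy sketch of that splitting (hypothetical structures, not the Ganga/pathena internals):

```python
# Toy illustration of "jobs go to the data": split an analysis task into sub-jobs
# at the site(s) hosting the input files (illustration only).

def split_by_site(dataset_files: dict[str, str], files_per_job: int = 2) -> list[dict]:
    """dataset_files maps a logical file name to the site holding the file."""
    by_site: dict[str, list[str]] = {}
    for lfn, site in dataset_files.items():
        by_site.setdefault(site, []).append(lfn)
    subjobs = []
    for site, files in by_site.items():
        for i in range(0, len(files), files_per_job):
            subjobs.append({"site": site, "files": files[i:i + files_per_job]})
    return subjobs

files = {"AOD.01._0001.pool.root": "BNL", "AOD.01._0002.pool.root": "BNL",
         "AOD.01._0003.pool.root": "FZK"}
for sj in split_by_site(files):
    print(sj)   # sub-jobs go to BNL and FZK, where the files already are
```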
ATLAS Grid Information System (AGIS)
The overall purpose of the ATLAS Grid Information System is to store and to expose the static, dynamic and configuration parameters needed by ATLAS Distributed Computing (ADC) applications. AGIS is a database-oriented system.

The first AGIS proposal came from G.Poulard. The pioneering work of R.Pezoa and R.Rocha in summer 2008 defined the basic design principles, implemented in the 'dashboards'. Development is now led by the ATLAS BINP group.

Today's situation: the various configuration parameters and the information about available resources and services, their status and properties, are extracted from different sources or defined in different configuration files (sometimes Grid information is even hard-coded in application programs).
AGIS Architecture Overview
The system architecture should allow new classes of information or site configuration parameters to be added, the ATLAS cloud topology and production queues to be reconfigured, and user information to be added and modified.

AGIS is an Oracle-based information system. AGIS stores, as read-only, data extracted from external databases (e.g. OIM, GOCDB, BDII), plus ADC configuration information, which can be modified.

The synchronization of the AGIS content with the external sources will be done by agents (data providers); the agents will access the databases via standard interfaces.
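A minimal sketch of such a data-provider agent, periodically copying a read-only snapshot from an external source into the local store (all interfaces and names below are invented; the real providers talk to OIM/GOCDB/BDII through their own interfaces):

```python
# Illustrative AGIS-style data provider (invented interfaces, illustration only).
import time
from typing import Callable

def sync_provider(fetch_external: Callable[[], dict],
                  store: dict,
                  interval_s: int = 600,
                  cycles: int = 1) -> None:
    """Replace the read-only part of the store with the latest external snapshot."""
    for i in range(cycles):
        store["readonly"] = fetch_external()   # e.g. a query against GOCDB/OIM/BDII
        if i + 1 < cycles:
            time.sleep(interval_s)

# ADC configuration stays editable; the synchronized part is only ever replaced.
store = {"readonly": {}, "adc_config": {"cloud": "US", "prod_queue": "EXAMPLE_PROD"}}
sync_provider(lambda: {"EXAMPLE-SITE": {"status": "production"}}, store)
print(store["readonly"])
```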
AGIS Components
[AGIS components diagram, including ATP and the Logging Service] A.Anisenkov, D.Krivashin, Sep09
AGIS Information

- ATLAS clouds, tiers and sites
  - Topology: clouds, tiers, site specifics (e.g. geography, names, etc.)
- Site resources and services information
  - List of resources and services (FTS servers, SRM, LFC)
  - Site service properties (name, status, type, endpoints)
- Site information and configurations
  - Available CE and SE information (CPU and disk information, status, available resources)
  - Availability and various status information, such as site status in ATLAS data distribution, Monte Carlo production and Functional Tests; site downtime periods
  - Relations to currently running/planned tests, tasks or runs
- Data replication: site shares and pairing; list of activities (e.g. reprocessing), activity start and end time
- Global configuration parameters needed by ADC applications
- User-related information (privileges, roles, account info)
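To make the stored information concrete, here is a toy data model covering the categories listed above (class, field and endpoint names are invented for illustration; the actual AGIS schema is an Oracle design and far richer):

```python
# Toy data model for the AGIS information categories listed above (invented names).
from dataclasses import dataclass, field

@dataclass
class Service:
    kind: str                 # "FTS", "SRM", "LFC", "CE", "SE", ...
    name: str
    endpoint: str
    status: str = "ok"

@dataclass
class Site:
    name: str
    tier: int
    cloud: str
    geography: str = ""
    services: list[Service] = field(default_factory=list)
    downtimes: list[tuple[str, str]] = field(default_factory=list)  # (start, end)
    shares: dict[str, float] = field(default_factory=dict)          # activity -> share

@dataclass
class Cloud:
    name: str
    tier1: str
    sites: list[Site] = field(default_factory=list)

# Example fragment (placeholder endpoint, not a real service address):
bnl = Site("BNL-ATLAS", tier=1, cloud="US",
           services=[Service("SRM", "bnl-srm", "srm://srm.example.org")])
us_cloud = Cloud("US", tier1="BNL-ATLAS", sites=[bnl])
```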
ATLAS Distributed Computing Monitoring (Today)
R.Rocha Sep09
ATLAS Distributed Computing Monitoring (Next)
R.Rocha Sep09
• Simplify into one monitoring application (where it is possible)
• Standardize monitoring messages
  o https://svnweb.cern.ch/trac/dashboard/wiki/WorkInProgress
  o HTTP for transport
  o JSON for data serialization
• Attempt to have a common (single) dashboard client application
  o Built using the Google Web Toolkit (GWT)
• Source data exposed directly from its source (like the PanDA database)
  o Avoid aggregation databases like we have today
  o Server-side technology left open
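A minimal sketch of the "HTTP for transport, JSON for serialization" idea, exposing monitoring data straight from its source as a small HTTP endpoint (the URL path, port and payload are made up for illustration, not the real dashboard service):

```python
# Minimal "expose source data as JSON over HTTP" sketch (made-up path and payload).
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def query_job_counts():
    # In reality this would query the source directly (e.g. the PanDA database).
    return {"running": 12842, "queued": 20311, "failed_last_hour": 143}

class MonitoringHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/jobs/summary":
            body = json.dumps(query_job_counts()).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), MonitoringHandler).serve_forever()
```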
Summary & Conclusions
The ATLAS Collaboration has developed a set of software and middleware tools that enable access to data for physics analysis by all members of the collaboration, independently of their geographical location.

The main building blocks of this infrastructure are:
- the Distributed Data Management system;
- Ganga and pathena for distributed analysis on the Grid;
- the Production System to (re)process and to simulate ATLAS data.

Almost all required functionalities are already provided and extensively used, for simulated as well as real data from beam and cosmic-ray events.

The Grid Information System technical proposal is finalized and the system must be in production by the end of the year.

Monitoring system standardization is in progress.
Many thanks!
Acknowledgements

Thanks to A.Anisenkov, D.Barberis, K.Bos, M.Branco, S.Campana, A.Farbin, J.Elmsheuser, D.Krivashin, A.Read, R.Rocha, A.Vaniachine, T.Wenaus, … for the pictures and slides used in this presentation.