Federico Carminati, Peter Hristov
NEC'2011 Varna, September 12-19, 2011

An Update about ALICE Computing
ALICE@NEC
• NEC’2001 AliRoot for simulation
• NEC’2003 Reconstruction with AliRoot
• NEC’2005 AliRoot for analysis
• NEC’2007 Still no LHC data => Status and plans of the ALICE offline software. Calibration & Alignment
• NEC’2009 In preparation for the first LHC data, no presentation
• NEC'2011 Almost 2 years of stable data taking, a lot of published physics results => An update about ALICE computing
Why HI collisions?
[Figure: QCD phase diagram and lattice results (STAR, NA49, 3 flavors): indication of a transition from hadron gas to QGP at Tc ≅ 170 MeV, εc ≅ 1 GeV/fm³. Open questions: phase transition or crossover? Intermediate phase of strongly interacting QGP? Chiral symmetry restoration (constituent mass → current mass)?]
• Study QCD at its natural energy scale T = ΛQCD ≅ 200 MeV by creating a state of matter at high density and temperature in high-energy heavy-ion collisions.
History of High-Energy A+B Beams
BNL-AGS: mid 80’s, early 90’s
• O+A, Si+A 15 AGeV/c √sNN ~ 6 GeV
• Au+Au 11 AGeV/c √sNN ~ 5 GeV
CERN-SPS: mid 80’s, 90’s
• O+A, S+A 200 AGeV/c √sNN ~ 20 GeV
• Pb+A 160 AGeV/c √sNN ~ 17 GeV
BNL-RHIC: early 00’s
• Au+Au √sNN ~ 130 GeV
• p+p, d+Au √sNN ~ 200 GeV
CERN-LHC: 2010 (!)
• Pb+Pb √sNN ~ 5,500 (2,760 in ’10-’12) GeV
• p+p √sNN ~ 14,000 (7000 in ’10-’12) GeV
Trigger and data-acquisition chain:
• level 0 – special hardware: 8 kHz (160 GB/sec)
• level 1 – embedded processors: 200 Hz (4 GB/sec)
• level 2 – PCs: 30 Hz (2.5 GB/sec)
• data recording & offline processing: 30 Hz (1.25 GB/sec)
ALICE detector: total weight 10,000 t, overall diameter 16.00 m, overall length 25 m, magnetic field 0.5 T
ALICE Collaboration: ~1/2 of ATLAS or CMS, ~2x LHCb; ~1000 people, 30 countries, ~80 institutes
• A full pp programme
• Data rate for pp is 100 Hz @ 1 MB
Organization
• Core Offline is a CERN responsibility
  • Framework development
  • Coordination activities
  • Documentation
  • Integration
  • Testing & release
  • Resource planning
• Each sub-detector is responsible for its own offline system
  • It must comply with the general ALICE Computing Policy as defined by the Computing Board
  • It must integrate into the AliRoot framework
http://aliweb.cern.ch/Offline/
PLANNING
"IN PREPARING FOR BATTLE I ALWAYS FOUND PLANS USELESS BUT PLANNING ESSENTIAL" – GEN. D. EISENHOWER
(155 open items, 3266 total)
Computing model – pp
[Diagram: RAW data flow into a disk buffer at the CERN T0, where calibration parameters are generated and the first-pass reconstruction runs. The data are written to T0 tape, registered in the AliEn FC and shipped to the Grid, replicated to the T1s, and analysed promptly on the CAF. T1s and T2s run pass 1 & 2 reconstruction, ordered analysis and end-user analysis; T2s also produce MC data.]
Computing model – AA
[Diagram: as for pp, RAW data flow into a disk buffer at the CERN T0, calibration parameters are generated, and the data are written to T0 tape, registered in the AliEn FC, shipped to the Grid, replicated to the T1s, and analysed promptly on the CAF. During HI data taking only a pilot reconstruction runs at the CERN T0; the first-pass reconstruction is done from tape during the LHC shutdown. T1s and T2s run pass 1 & 2 reconstruction, ordered analysis and end-user analysis; T2s also produce MC data.]
Prompt reconstruction
• Based on PROOF (TSelector) – see the sketch below
• Very useful for high-level QA and debugging
• Integrated in the AliEVE event display
• Full Offline code, sampling events directly from DAQ memory
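To make the TSelector mechanism concrete, here is a minimal sketch of a PROOF QA selector; the class and histogram names are hypothetical placeholders, the real prompt-reconstruction selectors in AliRoot are far richer:

```cpp
// Minimal sketch of a PROOF QA selector (hypothetical names).
#include <TSelector.h>
#include <TTree.h>
#include <TH1F.h>

class QASelector : public TSelector {
public:
   TTree *fChain;  // chain being processed
   TH1F  *fHist;   // example QA histogram

   QASelector() : fChain(0), fHist(0) {}

   virtual void SlaveBegin(TTree *) {
      // Runs once on every PROOF worker: book the outputs and add
      // them to fOutput so that PROOF merges them on the master.
      fHist = new TH1F("hQA", "example QA variable", 100, 0., 10.);
      fOutput->Add(fHist);
   }
   virtual void Init(TTree *tree) { fChain = tree; }
   virtual Bool_t Process(Long64_t entry) {
      // Called for every event; fill the QA histograms here.
      fChain->GetTree()->GetEntry(entry);
      return kTRUE;
   }
   ClassDef(QASelector, 0);
};
```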
[Event display: Ξ → πΛ, Λ → pπ decay chain]
ALICE Analysis Basic Concepts
• Analysis models
  • Prompt data processing (calib, align, reco, analysis) @CERN with PROOF
  • Batch analysis using GRID infrastructure
  • Local analysis
  • Interactive analysis PROOF+GRID
• User interface
  • Access to GRID via AliEn or ROOT UIs
  • PROOF/ROOT: enabling technology for (C)AF
  • GRID API class TAliEn (see the sketch below)
• Analysis Object Data (AOD) contain only the data needed for a particular analysis
  • Extensible with ∆-AODs
  • Same user code runs locally, on CAF and on the Grid
• Work on the distributed infrastructure has been done by the ARDA project
[Event display: Ω → Λ → pπ decay chain]
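As an illustration of the ROOT UI route, a Grid session is opened and the catalogue queried as below (a minimal sketch; the catalogue path and file pattern are hypothetical):

```cpp
// Sketch: using the Grid from a ROOT session. TAliEn is the AliEn
// implementation of the abstract TGrid interface; TGrid::Connect
// instantiates it and sets the global gGrid pointer.
#include <TGrid.h>
#include <TGridResult.h>

void gridUI()
{
   TGrid::Connect("alien://");   // authenticate against AliEn
   if (!gGrid) return;

   // Query the file catalogue (hypothetical path and pattern):
   TGridResult *res = gGrid->Query("/alice/data/2010", "*AliESDs.root");
   if (res) res->Print();        // list the matching lfns
}
```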
Analysis train
[Diagram: analysis train reading ESD (plus kinematics for efficiency corrections) through TASK 1 → TASK 2 → TASK 3 → TASK 4, producing AOD.]
• AOD production is organized in a 'train' of tasks (see the sketch below)
  • To maximize the efficiency of full-dataset processing
  • To optimize CPU/IO
  • Using the analysis framework
• Needs monitoring of memory consumption and of the individual tasks
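Schematically, a train is one AliAnalysisManager with several task "wagons" attached; a sketch follows (ROOT macro to be run with the AliRoot libraries loaded; the AddTask macro names are hypothetical placeholders):

```cpp
// Sketch of assembling an analysis train (hypothetical wagons).
void runTrain()
{
   AliAnalysisManager *mgr = new AliAnalysisManager("train");
   mgr->SetInputEventHandler(new AliESDInputHandler()); // common ESD input

   AliAODHandler *aodH = new AliAODHandler();           // common AOD output
   aodH->SetOutputFileName("AliAOD.root");
   mgr->SetOutputEventHandler(aodH);

   // Each wagon is created by its AddTask macro and connected to the
   // common input/output containers; all wagons share one I/O pass.
   gROOT->LoadMacro("AddTaskExample1.C"); AddTaskExample1();
   gROOT->LoadMacro("AddTaskExample2.C"); AddTaskExample2();

   if (mgr->InitAnalysis()) {
      mgr->PrintStatus();
      mgr->StartAnalysis("grid");   // "local", "proof" or "grid"
   }
}
```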
Production of RAW
Successful despite rapidly changing conditions in the code and detector operation
74 major cycles
7.2•109 events (RAW) passed through the reconstruction
Processed 3.6PB of data
Produced 0.37 PB of ESDs and other data
ALICE central services: sending jobs to data
[Diagram: the user submits a job; the Optimizer splits it according to the location of its input files (lfns) registered in the ALICE Job Catalogue, e.g. Job 1 (lfn1, lfn2, lfn3, lfn4) becomes Job 1.1 (lfn1), Job 1.2 (lfn2), Job 1.3 (lfn3, lfn4). The Computing Agents at the sites fetch the matching sub-jobs, run them close to the data, send the results back and register the outputs in the ALICE File Catalogue as lfn → guid → {SEs} entries. A JDL sketch follows below.]
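In practice the user describes the job in an AliEn JDL and the optimizer does the splitting; a hedged sketch (all values below are hypothetical, the Split = "se" strategy groups sub-jobs by the storage elements holding the input data):

```
Executable = "myAnalysis.sh";
Packages   = { "VO_ALICE@AliRoot::v4-21-14-AN" };
InputData  = { "LF:/alice/data/2010/LHC10h/000137161/ESDs/pass1/*/AliESDs.root" };
Split      = "se";
OutputFile = { "histograms.root" };
TTL        = "28800";
```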
Storage strategy
[Diagram: a worker node (WN) talks to the site VOBOX::SA and to an xrootd manager, which redirects to xrootd servers in front of the site storage: plain disk, DPM, CASTOR, or dCache (the latter via an xrootd emulation), each MSS-backed system also exposing an SRM interface.]
The access to the data
[Diagram: the application presents a file GUID, lfn or metadata to the ALICE FC and gets back the SE, pfn and an access envelope; the lfn → pfn resolution and the "who has this pfn?" lookup are handled by xrootd. A tag catalogue (ev#guid: tag1, tag2, tag3, ...) allows direct access to the data via the TAliEn/TGrid interface – see the sketch below.]
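From the user side all of this is hidden behind TFile: opening an alien:// URL triggers the catalogue lookup and xrootd redirection described above. A minimal sketch (the lfn is a hypothetical example):

```cpp
// Sketch: transparent data access through the TAliEn/TGrid interface.
// The lfn is resolved by the ALICE FC into SE + pfn + access envelope,
// and the data are then read via xrootd.
#include <TGrid.h>
#include <TFile.h>

void readFromGrid()
{
   TGrid::Connect("alien://");            // open a catalogue session
   TFile *f = TFile::Open(                // hypothetical lfn
      "alien:///alice/data/2010/LHC10h/000137161/ESDs/pass1/AliESDs.root");
   if (f && !f->IsZombie()) {
      // ... read trees/objects as from any local TFile ...
      f->Close();
   }
}
```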
The ALICE way with XROOTD
• Pure Xrootd + ALICE strong authorization plugin; no difference between T1 and T2 (only size and QoS)
• WAN-wide globalized deployment, very efficient direct data access
• Tier-0: CASTOR+Xrd serving data normally
• Tier-1: pure Xrootd cluster serving conditions data to ALL the GRID jobs via WAN
• "Old" DPM+Xrootd in some Tier-2s
[Diagram: a globalized cluster. Local clients work normally at each site (an xrootd site at GSI, any other xrootd site, an xrootd site at CERN, each running xrootd + cmsd). Missing a file? Ask the ALICE global redirector, get redirected to the right collaborating cluster, and fetch it. Immediately. A smart client could point directly at the global redirector: a virtual mass storage system built on data globalization. A client-side sketch follows below.]
More details and complete info in “Scalla/Xrootd WAN globalization tools: where we are.” @ CHEP09
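Because of the global redirector, a client that misses a file locally can simply be pointed at it; a minimal sketch (the redirector hostname and file path are hypothetical):

```cpp
// Sketch: reading through the ALICE global xrootd redirector. If the
// file is not in the local cluster, the redirector sends the client
// to a collaborating cluster that has it (hostname is hypothetical).
#include <TFile.h>

void readGlobal()
{
   TFile *f = TFile::Open(
      "root://alice-global-redirector.cern.ch//alice/sim/dataset/file.root");
   if (f && !f->IsZombie()) {
      // ... the fetch from the right site happens transparently ...
      f->Close();
   }
}
```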
CAF
[Diagram: data are staged from CASTOR via xrootd into the CAF; the file catalogue entries (lfn, guid, {SEs}) point to it. Each worker node runs PROOF and XROOTD under a PROOF master.]
The whole CAF becomes an xrootd cluster
Powerful and fast machinery – very popular with users
Allows for any use pattern, however quite often leading to contention for resources (see the sketch below)
[Plot: expected vs observed speedup as a function of the number of workers; ~70% utilization.]
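From the user's point of view the CAF is reached with a single TProof::Open; a minimal sketch (the master URL and dataset name are hypothetical, and QASelector is the hypothetical selector sketched earlier):

```cpp
// Sketch: running a selector on the CAF (a PROOF cluster).
#include <TProof.h>

void runOnCAF()
{
   TProof *p = TProof::Open("alice-caf.cern.ch");  // hypothetical master
   if (!p) return;
   // Process a registered dataset with a selector compiled on the
   // workers ("+" triggers ACLiC); PROOF merges the outputs.
   p->Process("/alice/data/my_dataset", "QASelector.cxx+");
}
```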
Analysis facilities - profile
• 1.8 PB of data through CAF, 550 TB through SKAF
• For comparison – on the Grid, we have written 15 PB and read 37 PB
The ALICE Grid
AliEn working prototype in 2002
Single interface to distributed computing for all ALICE physicists
File catalogue, job submission and control, software management, user analysis
~80 participating sites now
1 T0 (CERN/Switzerland)
6 T1s (France, Germany, Italy, The Netherlands, Nordic DataGrid Facility, UK)
KISTI and UNAM coming (!)
~73 T2s spread over 4 continents
~30,000 (out of ~150,000 WLCG) cores and 8.5 PB of disk
Resources are “pooled” together
No localization of roles / functions
National resources must integrate seamlessly into the global grid to be accounted for
FAs contribute proportionally to the number of PhDs (M&O-A share)
T3s have the same role as T2s, even if they do not sign the MoU
http://alien.cern.ch
GRID operation principle
[Diagram: central AliEn services communicate with a VO-box at each site; the VO-box interfaces to the site WMS (gLite/ARC/OSG/local) and storage (dCache/DPM/CASTOR/xrootd), and handles monitoring and package management.]
The VO-box system (very controversial in the beginning)
• Has been extensively tested
• Allows for site services scaling
• Is a simple isolation layer for the VO in case of troubles
Operation – central/site support
Central services support (2 FTEs equivalent)
There are no experts who do support exclusively – there are 6 highly qualified experts doing development and support
Site services support - handled by ‘regional experts’ (one per country) in collaboration with local cluster administrators
Extremely important part of the system
In normal operation ~0.2FTEs/site
Regular weekly discussions and active all-activities mailing lists
Summary
• The ALICE offline framework (AliRoot) is a mature project that covers simulation, reconstruction, calibration, alignment, visualization and analysis
• Successful operation with "real data" since 2009
• The results for several major physics conferences were obtained in time
• The Grid and AF resources are adequate to serve the RAW/MC and user analysis tasks
  • More resources would be better, of course
• The sites' operation is very stable
• The gLite (now EMI) software is mature and few changes are necessary
The code
• Move to C++ was probably inevitable
  • But it caused a lot of "collateral damage"
  • The learning process was long, and it is still going on
  • Very difficult to judge what would have happened "had ROOT not been there"
• The most difficult question is now "what next?"
  • A new language? There is none on the horizon
  • Different languages for different scopes (Python, Java, C, CUDA...)? Just think about debugging
  • A better discipline in using C++ (in ALICE: no STL / templates)?
• Code management tools, build systems, (c)make, autotools
  • Still a lot of "glue" has to be provided; no comprehensive system "out of the box"
The Grid
• The half-empty glass
  • We are still far from the "Vision"
  • A lot of tinkering and hand-holding to keep it alive
  • 4+1 solutions for each problem
  • We are only now seeing some light at the end of the tunnel of data management
• The half-full glass
  • We are using the Grid as a "distributed heterogeneous collection of high-end resources", which was the idea after all
  • LHC physics is being produced by the Grid
Grid need-to-have
• Far more automation and resilience
  • Make the Grid less manpower-intensive
• More integration between workload management and data placement
• Better control of upgrades (OS, MW)
  • Or better, transparent integration of different OS/MW versions
• Integration of the network as an active, provisionable resource
  • "Close" storage element, file replication / caching vs remote access
• Better monitoring
  • Or perhaps simply more coherent monitoring...