Summary
Distributed Data Analysis Track
F. Rademakers, S. Dasu, V. Innocente
CHEP06 TIFR, Mumbai
CHEP06, 17 Feb 2006 2 Fons Rademakers
Outline
• Introduction
• Distributed Analysis Systems
• Submission Systems
• Bookkeeping Systems
• Monitoring Systems
• Data Access Systems
• Miscellaneous
• Conveners’ impressions
We have only 20 min for the summary and therefore cannot do justice to all talks
Track Statistics
Lies, damn lies and statistics:
• number of talks: 23
• number of cancellations: 2
• number of no-shows: 1
• average attendance: 25
• minimum attendance: 12
• maximum attendance: 55
• average duration of talks: 23 min
• equipment failures: 1 (laser pointer)
• average outside temperature: 31 °C
• average room temperature: 21 °C
What Was This Track All About?
Analysis Systems: DIAL, Ganga, PROOF, CRAB
Submission Systems: ProdSys, BOSS, DIRAC, PANDA
Bookkeeping Systems: JobMon, BOSS, BbK
Monitoring Systems: DashBoard, JobMon, BOSS, MonaLisa
Data Access Systems: xrootd, SAM
Miscellaneous: Go4, ARDA, Grid Simulations, AJAX Analysis
Data Analysis Systems
The different analysis systems presented, categorized by experiment:
ALICE: PROOF
ATLAS: DIAL, GANGA
CMS: CRAB, PROOF
LHCb: GANGA
All systems support, or plan to support, parallelism
Except for PROOF all systems achieve parallelism via job splitting and serial batch submission (job level parallelism)
Classical Parallel Data Analysis
[Diagram labels: query, catalog, files, data file splitting, jobs, myAna.C, submit, batch farm (queues, manager), storage, outputs, merging, final analysis]
• “Static” use of resources
• Jobs frozen, 1 job / CPU
• “Manual” splitting, merging
• Limited monitoring (end of single job)
• Possible large tail effects
From PROOF System by Ganis [98]
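The classical workflow on this slide (manual splitting of the file list, one frozen job per chunk, merging of the outputs) can be sketched in a few lines of Python. This is an illustration only: split_files, run_job and merge_outputs are hypothetical names, the "analysis" merely counts files per run prefix as a stand-in for myAna.C, and no batch system is involved.

```python
from collections import Counter

def split_files(files, n_jobs):
    """Static splitting: deal the file list into n_jobs fixed chunks."""
    return [files[i::n_jobs] for i in range(n_jobs)]

def run_job(chunk):
    """Stand-in for one batch job running the analysis over its chunk.
    The 'histogram' is a count of files per run prefix (hypothetical)."""
    hist = Counter()
    for name in chunk:
        hist[name.split("_")[0]] += 1
    return hist

def merge_outputs(outputs):
    """Merging step: add up the partial histograms of all jobs."""
    total = Counter()
    for hist in outputs:
        total.update(hist)
    return total

# Invented file names, for illustration only.
files = [f"run{r}_{i}.root" for r in (1, 2) for i in range(5)]
chunks = split_files(files, 3)
merged = merge_outputs(run_job(c) for c in chunks)
print(dict(merged))
```

Because the chunks are fixed before submission, the total turnaround is set by whichever job finishes last, which is the origin of the tail effects listed on the slide.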
Interactive Parallel Data Analysis
[Diagram labels: query (data file list, myAna.C), MASTER, scheduler, catalog, interactive farm, storage, files, feedbacks (merged), final outputs (merged)]
• Farm perceived as extension of local PC
• More dynamic use of resources
• Automated splitting and merging
• Real time feedback
• Much better control of tail effects
From PROOF System by Ganis [98]
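The claim that dynamic use of resources gives much better control of tail effects can be illustrated with a toy simulation. The sketch below is not PROOF code: static_makespan and dynamic_makespan are hypothetical helpers, the worker speeds are invented, and the "packets" are simply equal-sized groups of events handed to whichever worker becomes free first.

```python
import heapq

def static_makespan(n_events, speeds):
    """Equal share per worker, fixed up front; the slowest worker sets the end time."""
    share = n_events / len(speeds)
    return max(share / s for s in speeds)

def dynamic_makespan(n_events, speeds, packet_size):
    """Workers pull fixed-size packets of events whenever they become free."""
    # Heap of (time at which the worker is free, worker index).
    free_at = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(free_at)
    remaining = n_events
    finish = 0.0
    while remaining > 0:
        t, i = heapq.heappop(free_at)          # next worker to become free
        size = min(packet_size, remaining)
        remaining -= size
        done = t + size / speeds[i]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, i))
    return finish

speeds = [100.0, 100.0, 100.0, 25.0]  # events/s, one slow worker (assumed values)
t_static = static_makespan(800_000, speeds)
t_dynamic = dynamic_makespan(800_000, speeds, packet_size=1_000)
print(f"static: {t_static:.0f} s, dynamic: {t_dynamic:.0f} s")
```

In the dynamic case the slow worker simply processes fewer packets, so the end of the query is delayed by at most about one packet on the slowest worker.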
CMSProof – Time & Speedup Measurements
From Prototype of a Parallel Analysis System for CMS using PROOF - I. González
Real analysis used to select top quark pair production events with a tau (needs to be reconstructed), a lepton and two b quarks in the final state
Processing ~800K events…
• In 1 CPU ~4 hours (only event loop)
• In 80 CPUs ~4 minutes (only event loop)
Initialisation time (~3 minutes):
• Authentication is done on all slaves, even if unused
  • Therefore not dependent on the number of slaves used
• Remote environment setting
• Code uploading and compilation
  • Only done for newer code
  • First time this takes quite some time
• TChain initialisation
  • Very long for very distributed chains
Run time scales close to the ideal 1/Ncpu
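As a worked check of the numbers quoted on this slide, the speedup and parallel efficiency of the event loop follow directly from the two timings. The inputs are the approximate values from the slide (~4 hours, ~4 minutes, ~3 minutes initialisation), so the results are indicative only. The variable names are illustrative, and charging the initialisation time to the parallel run alone is an assumption.

```python
t_serial_min = 4 * 60      # ~4 hours on 1 CPU, event loop only
t_parallel_min = 4         # ~4 minutes on 80 CPUs, event loop only
t_init_min = 3             # ~3 minutes initialisation (assumed to apply to the parallel run only)
n_cpu = 80

speedup = t_serial_min / t_parallel_min            # event loop only
efficiency = speedup / n_cpu                       # fraction of the ideal 1/Ncpu scaling
speedup_with_init = t_serial_min / (t_parallel_min + t_init_min)

print(f"speedup = {speedup:.0f}, efficiency = {efficiency:.2f}")
print(f"speedup including initialisation = {speedup_with_init:.1f}")
```

With timings rounded this coarsely the efficiency figure should not be over-interpreted, but the calculation shows why a fixed initialisation time of a few minutes matters once the event loop itself takes only minutes.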
DIAL
Distributed Interactive Analysis of Large Datasets
A useful DIAL system has been deployed for ATLAS
• Common analysis transformations
• Access to current data
• For AOD to histograms and large samples, 15 times faster than a single process
• Easy to use ROOT interface
• Web-based monitoring
• Packaged datasets, applications and example tasks
Demonstrated viability of remote processing
• Via Condor-G or PANDA
• Need interactive queues at remote sites
  • With corresponding gatekeeper or DIAL service
  • Or improve PANDA responsiveness
From DIAL by Adams [39]
[Figure, from D. Adams, DIAL, CHEP06, February 13, 2006 (ATLAS): DIAL 1.30 AOD processing time, 2/10/06. Time (sec), 0 to 3600, versus thousands of events, 0 to 1200. Labels: single job, (single job)/10, 100 MB/s, 50 MB/s, 10k events. Data series: 8feb-lfast-nfs-100, 9feb-lshort-nfs-100, 9feb-cgfast-nfs-100, 9feb-panda-nfs-100, 10feb-lfast-nfs-100, 10feb-lfast-nfs-50, 10feb-lfast-nfs-20.]
Ganga
Designed for data analysis on the Grid
• LHCb will do all its analysis on T1’s
• T2’s mostly for simulation
System should not be general – we know all main use cases
• Use prior knowledge
• Identified use pattern
Aid user in bookkeeping aspects
• Keeping track of many individual jobs
Developed in cooperation between LHCb and ATLAS
From LHCb Experiences by Egede [317]
CRAB
Makes it easy to create a large number of user analysis jobs
• Assumes all jobs are the same except for some parameters (event number to be accessed, output file name…)
Allows efficient access to distributed data
• Hides WLCG middleware complications. All interactions are transparent for the end user
Manages job submission, tracking, monitoring and output harvesting
• User doesn’t have to worry about how to interact with sometimes complicated grid commands
• Leaves time to get a coffee …
Uses BOSS as Grid independent submission engine
From CRAB by Corvo [273]
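The assumption that all jobs are identical except for a few parameters can be made concrete with a small sketch. This is not CRAB's configuration format or API: make_jobs, the field names and the output file pattern are hypothetical, and the code only illustrates generating per-job parameters (first event, number of events, output file name) from one task description.

```python
def make_jobs(total_events, events_per_job, output_pattern="out_{:04d}.root"):
    """Return one parameter dict per job; only these fields differ between jobs."""
    jobs = []
    first = 0
    index = 0
    while first < total_events:
        n = min(events_per_job, total_events - first)  # last job may be shorter
        jobs.append({
            "job_id": index,
            "first_event": first,
            "n_events": n,
            "output_file": output_pattern.format(index),
        })
        first += n
        index += 1
    return jobs

jobs = make_jobs(total_events=2500, events_per_job=1000)
for job in jobs:
    print(job)
```

Everything else (executable, dataset, software version) would be shared by all jobs of the task, which is what makes creating and tracking them in bulk straightforward.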
Some statistics (slide from Marco Corvo, CERN/CNAF, CHEP ’06 Mumbai)
[Figures: most accessed sites since July 05; CRAB jobs so far]
D. Spiga: CRAB Usage and jobs-flow Monitoring (DDA-252)
Submission Systems
The different submission systems, categorized by experiment:
ALICE: AliEn (not presented)
ATLAS: ProdSys, PanDA
CMS: BOSS
LHCb: DIRAC
These systems are the DDA launch vehicles for the Grid based batch analysis solutions
ATLAS Strategy
ATLAS will use all three main Grids:
• LCG/EGEE
• OSG
• NorduGrid
ProdSys was developed to provide seamless access to all ATLAS grid resources
At this point emphasis on batch model to implement the ATLAS Computing model
• Interactive solutions are difficult to realize on top of the current middleware layer
We expect our users to send large batches of short jobs to optimize their turnaround
• Scalability
• Data Access
From ATLAS Strategy by Liko [263]
ATLAS Prodsys
[Diagram labels: ProdDB, Dulcinea, Lexor, CondorG (CG), PANDA, RB, CE]
BOSS
Batch Object Submission System
• A tool for batch job submission, real time monitoring and bookkeeping
• Interfaced to many schedulers, both local and grid
• Utilizes relational database for persistency
• Full logging and bookkeeping information stored
• Job commands: submit, kill, query and output retrieval
• Can define custom job types, which allows specifying monitoring unique to the submitted application
• Significant new functionality identified and being actively integrated into BOSS
From Evolution of BOSS by Wakefield [240]
BOSS Workflow
[Diagram labels: boss submit, boss query, boss kill; BOSS; BOSS DB; Scheduler; farm nodes; Wrapper]
User specifies job parameters, including:
• Executable name
• Executable type (turns on customized monitoring)
• Output files to retrieve (for sites without shared file system and grid)
User tells BOSS to submit jobs, specifying the scheduler, e.g. PBS, LSF, SGE, Condor, LCG, gLite, etc.
Job consists of job wrapper, real time monitoring service and user’s executable
From Evolution of BOSS by Wakefield [240]
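The combination of submit/query/kill commands with a relational database for bookkeeping can be sketched with Python's built-in sqlite3 module. This is not BOSS code: the table layout, the function names and the status strings are assumptions made for illustration, and no real scheduler is contacted.

```python
import sqlite3

def open_db():
    """Create an in-memory bookkeeping database (table layout is an assumption)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY, executable TEXT, "
        "scheduler TEXT, status TEXT)"
    )
    return db

def submit(db, executable, scheduler):
    """Record a new job; a real system would also hand it to the scheduler."""
    cur = db.execute(
        "INSERT INTO jobs (executable, scheduler, status) VALUES (?, ?, 'submitted')",
        (executable, scheduler),
    )
    db.commit()
    return cur.lastrowid

def update_status(db, job_id, status):
    """What a (hypothetical) job wrapper would call to report progress."""
    db.execute("UPDATE jobs SET status = ? WHERE id = ?", (status, job_id))
    db.commit()

def query(db, job_id):
    """Return the recorded status of a job, or None if the job is unknown."""
    row = db.execute("SELECT status FROM jobs WHERE id = ?", (job_id,)).fetchone()
    return row[0] if row else None

def kill(db, job_id):
    """Mark a job as killed in the bookkeeping."""
    update_status(db, job_id, "killed")

db = open_db()
job_a = submit(db, "myAna", "LSF")
job_b = submit(db, "myAna", "PBS")
update_status(db, job_a, "running")
kill(db, job_b)
print(query(db, job_a), query(db, job_b))
```

Keeping the state in a database rather than in the scheduler means the same query works whichever scheduler the job was sent to.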
DIRAC
Introduction to DIRAC (slides from Stuart K. Paterson, CHEP 2006, 13th–17th February 2006, Mumbai, India)
The DIRAC Workload & Data Management System (WMS) is made up of Central Services and Distributed Agents
Realizes PULL scheduling paradigm
• Agents are requesting jobs whenever the corresponding resource is available
• Execution environment is checked before job is delivered to WN
Service Oriented Architecture masks underlying complexity
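The pull paradigm described here, in which an agent asks for work when its resource is free and a job is only handed over if the execution environment is suitable, can be sketched as follows. This is not the DIRAC API: TaskQueue, request_job and the requirement keys are hypothetical, and matching is reduced to exact comparison of a few attributes.

```python
from collections import deque

class TaskQueue:
    """Central service holding waiting jobs (hypothetical stand-in)."""

    def __init__(self):
        self._waiting = deque()

    def add(self, name, requirements):
        self._waiting.append((name, requirements))

    def request_job(self, environment):
        """Called by an agent when its resource is free.
        Returns the first waiting job whose requirements the environment meets."""
        for job in list(self._waiting):
            name, requirements = job
            if all(environment.get(k) == v for k, v in requirements.items()):
                self._waiting.remove(job)
                return name
        return None  # nothing suitable; the agent would ask again later

queue = TaskQueue()
queue.add("analysis-1", {"platform": "linux", "software": "v1"})
queue.add("analysis-2", {"platform": "linux"})

# An agent whose node lacks software 'v1' pulls first, then a fully equipped one.
first = queue.request_job({"platform": "linux", "software": "v0"})
second = queue.request_job({"platform": "linux", "software": "v1"})
third = queue.request_job({"platform": "linux", "software": "v1"})
print(first, second, third)
```

Because the check happens at the moment a resource asks for work, a job is never sent to a node that cannot run it, at the cost of jobs waiting until a matching agent appears.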
Comparison of 1 and 10 Users for Multi-Threaded Mode (slide from Stuart K. Paterson, CHEP 2006)
[Figure: number of jobs (0 to 400) versus start time in minutes (0 to 33). Series: Multi-Threaded 1 User, Multi-Threaded 10 Users.]
Same number of jobs with fewer users
Two cases:
• 1 user submitting 1000 jobs
• 10 users submitting 100 jobs
Conclusions (DIRAC, slide from Stuart K. Paterson, CHEP 2006)
The DIRAC API provides a simple yet powerful tool for users
• Access to LCG resources is provided in a simple and transparent way
DIRAC Multi-Threaded and Filling modes show significant reductions in job start times
• Also reduce the load on LCG
Workload management on the level of the user is effective
• Can be more powerful on the level of the VO
DIRAC infrastructure for distributed analysis is in place
• Now have real users
Data Access Systems
The different data access systems that were presented:
SAM
• Used by CDF in its CAF environment
xrootd server
• Used by BaBar, ALICE, STAR
• All BaBar sites run xrootd, extensive deployment experience
• Winner of the SC05 throughput test
• Performs better than even the developers ever expected and had hoped for
xrootd client
• Many improvements in the xrootd client side code
• Reduce latencies using asynchronous read ahead, client side caching and asynchronous opens
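The latency-hiding ideas named for the client (asynchronous read-ahead and client-side caching) can be illustrated with a generic sketch. This is not the xrootd client: ReadAheadReader and fake_remote_fetch are hypothetical, the "remote read" is a local function, and the cache is unbounded for simplicity.

```python
from concurrent.futures import ThreadPoolExecutor

class ReadAheadReader:
    """Reads fixed-size blocks, caches them and prefetches the next block."""

    def __init__(self, fetch_block, n_blocks):
        self._fetch = fetch_block           # callable: block index -> bytes
        self._n_blocks = n_blocks
        self._cache = {}                    # block index -> Future
        self._pool = ThreadPoolExecutor(max_workers=2)

    def _schedule(self, index):
        # Start a background fetch unless the block is out of range or already known.
        if 0 <= index < self._n_blocks and index not in self._cache:
            self._cache[index] = self._pool.submit(self._fetch, index)

    def read(self, index):
        self._schedule(index)               # no-op on a cache hit
        self._schedule(index + 1)           # asynchronous read-ahead
        return self._cache[index].result()  # waits only if the block is not there yet

    def close(self):
        self._pool.shutdown()

fetched = []                                # records which blocks reached the "server"

def fake_remote_fetch(index):
    """Stand-in for a remote read (assumption: no real network I/O here)."""
    fetched.append(index)
    return bytes([index]) * 4

reader = ReadAheadReader(fake_remote_fetch, n_blocks=3)
data = [reader.read(i) for i in range(3)]
again = reader.read(1)                      # served from the client-side cache
reader.close()
print(data, sorted(fetched))
```

For sequential access the next block is usually already in flight when it is requested, so the round-trip latency overlaps with processing of the current block. A production client would also bound the cache size.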
I/O Bandwidth (wide area network)
From the xrootd talk, CHEP 13-17 February 2006: http://xrootd.slac.stanford.edu
• SC2005 BW Challenge
• Latency Bandwidth
• 8 xrootd Servers
  • 4@SLAC & 4@Seattle
  • Sun V20z w/ 10Gb NIC
  • Dual 1.8/2.6GHz Opterons
  • Linux 2.6.12
• 1,024 Parallel Clients
  • 128 per server
• 35Gb/sec peak
  • Higher speeds killed router
  • 2 full duplex 10Gb/s links
  • Provided 26.7% overall BW
• BW averaged 106Gb/sec
  • 17 Monitored links total
[Figure labels: BW Challenge; ESnet routed; ESnet SDN layer 2 via USN; SLAC to Seattle; Seattle to SLAC]
http://www-iepm.slac.stanford.edu/monitoring/bulk/sc2005/hiperf.html
xrootd Server Scaling
From the xrootd talk, CHEP 13-17 February 2006: http://xrootd.slac.stanford.edu
Linear scaling relative to load
• Allows deterministic sizing of server
  • Disk
  • NIC
  • CPU
  • Memory
Performance tied directly to hardware cost
• Underlying hardware & software are critical
Acknowledgments
A big thank you to the organizers
And to the speakers for the high quality talks
• Especially those whose talks were not properly summarized
Hope to see you all at CHEP07 to see how the Distributed Data Analysis Systems have evolved
Conveners’ Observations
Distributed Data analysis tools are of strategic importance
• GANGA, DIAL, CRAB, PROOF, …
• They can be a real differentiator
• There is a large development activity going on in this area
• However, none of these tools have yet been exposed to the expected large number of final analysis users
Development of a plethora of grid independent access layers
• DIRAC, BOSS, AliEn, PanDA, …
• Gap between the grid middleware capabilities and user needs, especially data location, placement and bookkeeping services, left room for this activity
• Although appropriate now, convergence to one or two tools is desired
CPU and data intensive portion of analysis is most suited for the grid
• Skimming and organized “rootTree making” is enabled by these DDA tools
Advantage of adapting production style tools to analysis
• Can one adapt other stuff from production toolbox? Bookkeeping?
  • Avoid arcane work-group level bookkeeping that is common currently
Interactive analysis on grid with its large latencies
• PROOF is taking advantage of co-located CPUs for interactive analysis
  • In the era of multi-core CPUs this is only natural
  • Provides incremental data merging for prompt feedback to users
• Most DDA tools coupled to high-latency batch systems aren’t quite capable
• Block reservation of co-located nodes, a la Condor MPI Universe, may enable PROOF capabilities over the grid
High throughput AND low latency storage access critical for analysis
• Attention to performance boosting by deferred opens, caching and read-ahead by xrootd team is encouraging