Prototype of a Parallel Analysis System for CMS using PROOF
CHEP 2006 - Mumbai, India - February 2006
I. González, D. Cano, R. Marco
Instituto de Física de Cantabria (CSIC – U.C.), Santander, Spain
J. Cuevas
Dpto. de Física, Universidad de Oviedo, Oviedo, Spain
Prototype of a Parallel Analysis System for CMS using PROOF - I. González
Outline
• Ideal and usual HEP analysis
• A parallel system: PROOF
• Prototype: objectives, implementation, user point of view, running it
• Performance studies: hardware setup, results
• Experience with CMS & PROOF: user point of view, developer point of view
• Conclusions
Ideal HEP Analysis - Kind of HPC
A typical analysis needs a continuous cycle of algorithm refinement:
1. Implement the algorithm (cuts, particle identification, high-level reconstruction)
2. Run on data and build histograms, tables, …
3. Look at the results, think about improvements and go back to step 1
To achieve interactivity, huge amounts of CPU are needed for short periods to process large amounts of data: an HPC model.
Time is spent coding and thinking… not waiting!
Usual HEP Analysis - Data Processing
• A typical sample (final ROOT files) in CMS:
  • signal may be of the order of 1M events (150 GB)
  • backgrounds are much, much bigger
  • together they might be of the order of 1 TB
• Algorithms may include:
  • loops inside loops inside loops…
  • construction of new objects, collections, etc.
  • kinematical cuts
  • histograms, summaries and various other mathematical objects
• Impractical on a single CPU:
  • too long (hours for the signal, days for everything)
  • access to data
• Uncomfortable on a distributed batch system:
  • your code must take care of splitting the samples
  • interactivity is completely lost
  • debugging becomes complicated
  • you must take care of merging the final results
• Mixture of both: develop and debug on a single CPU, run the production selection on a batch system
… parallelisation may be better
A parallel system - PROOF
Important aspects of a parallel system:
• Support for various authentication mechanisms
• Possibility to upload code
• Local and remote data access
• Efficient master/slave communication
• Clever load balancing
• Easy splitting of data and merging of final objects
• PROOF is the parallel facility in ROOT. It provides a simple model to process TTrees in parallel: distribute your data among your slaves, build your libraries following the Selector model, and PROOF takes care of the CPU load of each slave.
• PROOF supports several authentication methods (SSH, GSI, Kerberos, …), so it can profit from GRID technology.
• Easy mechanism to upload code: package the code into PAR (tar.gz) files and tell ROOT how to load it; these files are also easy to share.
• Remote data may be accessed via rfio, (x)rootd, …
• Dynamic load balancing is implemented, based on the local availability of data and on individual CPU performance.
• Master/slave communication goes through special lightweight daemons (proofd).
• Automatic sample splitting and support for object merging are provided:
  • handled automatically by PROOF for ROOT objects
  • other objects need to inherit from TObject and implement merging code
  • histograms are recovered automatically into your normal ROOT session
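To make "implement merging code" concrete, here is a plain C++ sketch of the split/process/merge cycle that PROOF automates. It is not ROOT code and does not use the TSelector or TObject API: `Counter`, `ProcessRange` and `RunSplitAndMerge` are hypothetical names, and a `std::vector` stands in for the `TCollection` that ROOT passes to `Merge` methods such as `TH1::Merge(TCollection*)`. The sample is cut into contiguous ranges, one per worker; each worker fills its own partial counter, and the master adds the partial results together.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mergeable counter, similar in spirit to the prototype's
// "Counters" class: it counts events passing one selection step.
class Counter {
public:
    explicit Counter(std::string name = "") : name_(std::move(name)) {}

    void Increment(long n = 1) { count_ += n; }
    long Value() const { return count_; }
    const std::string& Name() const { return name_; }

    // Merging code: add the partial counts produced by other workers and
    // return the merged total (std::vector stands in for ROOT's TCollection).
    long Merge(const std::vector<Counter>& others) {
        for (const Counter& c : others) count_ += c.count_;
        return count_;
    }

private:
    std::string name_;
    long count_ = 0;
};

// Stand-in for one worker's event loop over events [begin, end):
// counts events whose toy transverse momentum is above the cut.
Counter ProcessRange(const std::vector<double>& pt, std::size_t begin,
                     std::size_t end, double ptCut) {
    Counter passed("pt > cut");
    for (std::size_t i = begin; i < end; ++i)
        if (pt[i] > ptCut) passed.Increment();
    return passed;
}

// Split the sample into nWorkers contiguous ranges, process each range
// (sequentially here; PROOF would run them on different slaves), then
// merge the partial counters into the final result on the "master".
Counter RunSplitAndMerge(const std::vector<double>& pt, std::size_t nWorkers,
                         double ptCut) {
    assert(nWorkers > 0);
    const std::size_t n = pt.size();
    std::vector<Counter> partial;
    for (std::size_t w = 0; w < nWorkers; ++w) {
        const std::size_t begin = n * w / nWorkers;
        const std::size_t end = n * (w + 1) / nWorkers;
        partial.push_back(ProcessRange(pt, begin, end, ptCut));
    }
    Counter total("pt > cut");
    total.Merge(partial);
    return total;
}
```

Because each worker only adds to its own partial object, the merged value does not depend on how the sample was split; that is what makes an object of this kind safe to merge.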
More information in the talk from G. Ganis and in http://root.cern.ch/root/PROOF.html
CMS & PROOF Prototype - Objectives
• Hide PROOF details as much as possible:
  • the physicist should concentrate on developing the analysis and forget about the internals of PROOF
  • all operations related to PROOF (compilation, packaging, uploading, etc.) should be invisible to the user
• Easy code migration from current analysis applications based on CMS tools (ExRootAnalysisReader):
  • integrate and reuse code from those tools (do not reinvent the wheel)
  • provide the same level of functionality
• Load balancing handled and optimised automatically by PROOF: base the design on the Selector model in ROOT
• Favour modular analysis development:
  • facilitate code sharing between different physicists
  • provide a mechanism to (de)activate parts of the analysis at will
• Profit from local GRID infrastructures:
  • clusters ready to use
  • authentication and certification mechanisms already in place
CMS & PROOF Prototype - Implementation
• One class to encapsulate the interaction with PROOF (compilation, packaging, uploading, …)
• Modularity achieved by inheriting from the AnalysisModule base class:
  • related algorithms to process data (IsoMuonFinder, TTbarSelection, …)
• Analysis Modules Manager: one class that specialises TSelector for CMS data
  • it also integrates already existing CMS analysis tools
• One class to encapsulate counters (very simple, but not existing in ROOT)
• One class to handle an input file
• Main macro to run with PROOF
• Main macro to run sequentially (useful for debugging)
• Some scripts:
  • to generate the skeleton of a new Analysis Module
  • used internally by the tool itself
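[Class diagram: Analyser (member: vector<AnalysisModule*> theModules; methods: void AddModule(AnalysisModule*), void Init(), void Loop(), void Summary(ostream& os)) manages AnalysisModule objects (methods: void Init(), void Loop(), void Summary()), from which the user modules MyAnalysisMod1, MyAnalysisMod2, …, MyAnalysisModN inherit.]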
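The class structure in the diagram can be sketched in standalone C++ as follows. This is a hypothetical reconstruction, not the prototype's source: the `Event` struct and the `MuonCounter` module are invented for illustration, `Loop` receives the event as an argument instead of reading it through the CMS tools, modules are owned through `std::unique_ptr` rather than the raw pointers shown in the diagram, and `Init`/`Loop` are assumed to be the two mandatory methods.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <ostream>
#include <sstream>
#include <utility>
#include <vector>

// Hypothetical, minimal event record; the real prototype reads CMS ROOT
// files through ExRootAnalysisReader instead.
struct Event {
    int nMuons = 0;
    double missingEt = 0.0;
};

// Base class for user modules. Init() and Loop() are pure virtual
// (assumed to be the two mandatory methods); Summary() is optional.
class AnalysisModule {
public:
    virtual ~AnalysisModule() = default;
    virtual void Init() = 0;
    virtual void Loop(const Event& ev) = 0;
    virtual void Summary(std::ostream& /*os*/) const {}
};

// Modules manager: owns the registered modules and forwards each call,
// in registration order, to all of them.
class Analyser {
public:
    void AddModule(std::unique_ptr<AnalysisModule> m) {
        theModules.push_back(std::move(m));
    }
    void Init() { for (auto& m : theModules) m->Init(); }
    void Loop(const Event& ev) { for (auto& m : theModules) m->Loop(ev); }
    void Summary(std::ostream& os) const {
        for (const auto& m : theModules) m->Summary(os);
    }
    std::size_t NModules() const { return theModules.size(); }

private:
    std::vector<std::unique_ptr<AnalysisModule>> theModules;
};

// Hypothetical user module (in the spirit of IsoMuonFinder):
// counts events that contain at least one muon.
class MuonCounter : public AnalysisModule {
public:
    void Init() override { nSelected = 0; }
    void Loop(const Event& ev) override {
        if (ev.nMuons > 0) ++nSelected;
    }
    void Summary(std::ostream& os) const override {
        os << "MuonCounter: " << nSelected << " events\n";
    }
    long nSelected = 0;
};
```

A user module only overrides `Init` and `Loop` (plus `Summary` if wanted); since the manager forwards every call to all registered modules, deactivating part of the analysis amounts to not registering the corresponding module.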
CMS & PROOF Prototype - User point of view
• Several things need to be set in the input file:
  • data files: location and name
  • PROOF master IP name
  • analysis modules to use
• Analysis Modules: where the actual code goes
  • the place to concentrate on
  • only two mandatory methods
  • the skeleton may be created with a script; the code produced is well commented with hints
  • modules can easily be shared between developers
• Extra features:
  • PROOF running statistics may be activated
  • a mechanism to pass parameters to the modules has been developed: parameters are defined in the input file, which avoids recompilation when a cut is changed
  • utility packages (code that need not be executed on each event) are supported
  • each module may implement a summary method, printed at the end of each job
  • the number of events to run is configurable…
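A minimal sketch of how such an input file might be read, keeping the module list and cut values outside the compiled code. The `key = value` syntax, the key names (`Master`, `DataFile`, `Module`, `Param.<name>`) and the `RunConfig` struct are all assumptions made for illustration; the prototype's actual input-file format is not described here.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical parsed configuration; field names are illustrative only.
struct RunConfig {
    std::string master;                    // PROOF master host name
    std::vector<std::string> dataFiles;    // data file locations
    std::vector<std::string> modules;      // analysis modules to activate
    std::map<std::string, double> params;  // numeric parameters (e.g. cuts)
};

// Remove leading and trailing blanks.
static std::string Trim(const std::string& s) {
    const auto b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return "";
    const auto e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

// Parse "key = value" lines; '#' starts a comment. "DataFile" and "Module"
// may repeat; keys of the form "Param.<name>" become module parameters.
RunConfig ParseConfig(std::istream& in) {
    RunConfig cfg;
    std::string line;
    while (std::getline(in, line)) {
        line = Trim(line.substr(0, line.find('#')));
        if (line.empty()) continue;
        const auto eq = line.find('=');
        if (eq == std::string::npos)
            throw std::runtime_error("missing '=': " + line);
        const std::string key = Trim(line.substr(0, eq));
        const std::string value = Trim(line.substr(eq + 1));
        if (key == "Master") cfg.master = value;
        else if (key == "DataFile") cfg.dataFiles.push_back(value);
        else if (key == "Module") cfg.modules.push_back(value);
        else if (key.rfind("Param.", 0) == 0) cfg.params[key.substr(6)] = std::stod(value);
        else throw std::runtime_error("unknown key: " + key);
    }
    return cfg;
}
```

With this design, changing a line such as `Param.MuonPtCut = 25` changes the cut at the next run without recompiling any module, which is the purpose of the parameter mechanism.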
CMS & PROOF Prototype - Running it…
Set environment (once):
  [#] export ORCA_SRCPATH=/path/to/ORCA/src
  [#] export ORCA_LIBPATH=/path/to/ORCA/libs
Authentication (once):
  [#] grid-proxy-init
  Your identity: /C=ES…
Start ROOT:
  [#] root -l
Execute RunProof.C:
  root[0] .x RunProof.C
  >> Creating CMSProofLoader...
  Info in <TUnixSystem::ACLiC>: creating shared library ~/CMSProof/./CMSProofLoader_C.so
  …
  >> Checking if PAR files need to be redone...
  …
  >> Initialising PROOF...
  …
  SUMARY
  ======
  Number of events processed: 794920
  >> PROOF Done!!!
Draw histograms:
  root[1] fMyHistogram->Draw();
  root[2] .q
Performance Studies - Hardware setup
• Hardware description: 90 nodes, IBM xSeries 336, each with 2 Xeon 3.2 GHz processors, 2 GB memory and 2 SATA hard disks (80 + 400 GB)
• Network: Gigabit Ethernet
  • stack of 4 units: 3COM SuperStack3 3870 switch, 48 ports
  • each node has 1 Gigabit Ethernet connection to a Gigabit port
• 1 slave per node: 1 master, 80 slaves
• Data distributed in blocks of ~10K events (~1.5 GB):
  • 1 block in each node
  • 800K events (~120 GB) in total
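As a quick consistency check (assuming decimal units), these numbers agree with the ~150 GB per 1M signal events quoted earlier:

```latex
\begin{aligned}
\frac{150\ \text{GB}}{10^{6}\ \text{events}} &\approx 150\ \text{kB/event},\\
10^{4}\ \text{events} \times 150\ \text{kB/event} &\approx 1.5\ \text{GB per block},\\
80\ \text{blocks} \times 10^{4}\ \text{events} &= 8 \times 10^{5}\ \text{events},\\
80\ \text{blocks} \times 1.5\ \text{GB} &= 120\ \text{GB}.
\end{aligned}
```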
Performance Studies - Results
• We used a real analysis: selection of top quark pair production events with a tau, a lepton and two b quarks in the final state, with reconstruction from tracks, jets and clusters
• Results: total time = processing time + initialisation time
  • Run (only the loop on events): ~4 hours on 1 CPU, ~4 minutes on 80 CPUs
  • Initialisation takes ~3 minutes, including:
    • authentication: done on all slaves, even unused ones, and therefore independent of the number of slaves used
    • remote environment setting
    • code uploading and compilation: smart, only done for newer code; the first time it takes longer (not shown in the plots)
    • TChain initialisation: very long for highly distributed chains (the normal case)
• Run time scales close to the ideal 1/Ncpu
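The approximate figures above translate into a speedup and a parallel efficiency as follows, using the usual definitions S(N) = T(1)/T(N) and E(N) = S(N)/N; since the inputs are rounded, the results are only indicative:

```latex
\begin{aligned}
T_{\text{run}}(1) &\approx 4\ \text{h} = 240\ \text{min}, &
T_{\text{run}}^{\text{ideal}}(80) &= \frac{240\ \text{min}}{80} = 3\ \text{min},\\
S(80) &= \frac{T_{\text{run}}(1)}{T_{\text{run}}(80)} \approx \frac{240\ \text{min}}{4\ \text{min}} = 60, &
E(80) &= \frac{S(80)}{80} \approx 0.75,\\
T_{\text{total}}(80) &= T_{\text{init}} + T_{\text{run}}(80) \approx 3\ \text{min} + 4\ \text{min} = 7\ \text{min}.
\end{aligned}
```

At 80 CPUs the ~3 minute initialisation is comparable to the run time itself, so reducing it (authentication, TChain initialisation) matters as much as adding CPUs.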
Experience with CMS & PROOF Prototype
Good things
• Code was easily and quickly migrated from the previous “framework”:
  • one morning of basically copy/paste
  • it is done only once and for all
  • the old sequential mode is still available, which is very useful for debugging
• Common code is being shared between developers located in different places
• At our sites, quick analysis development has been possible thanks to the interactivity provided by parallelising the analysis:
  • ~20 new modules have been developed so far
  • more than 200 histograms are produced in a few minutes
  • time is spent thinking about and programming new cuts and algorithms, not waiting for results
• Physicists concentrate on physics, and computer managers concentrate on computers
Problems and Improvements
• Debugging the code is still a difficult task:
  • if an error happens on a remote node, the PROOF master hangs
  • we need to put more effort into this issue
• Histograms are only recovered at the end:
  • PROOF allows plots to be drawn at given intervals while data is being processed
• GUI: to set the master name, number of events, data files, analysis modules, packages, parameters…
  • based on the existing (and evolving) one in PROOF?
• Data is specified by data file name, and its location on the slaves needs to be known:
  • use “named” datasets to, for example, specify data by physics process
  • might be supported by PROOF
More will certainly come as we use it…
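A “named” dataset could be as simple as a catalogue from a physics-process name to the files (and the slaves holding them) that make up the sample. The sketch below is hypothetical: `DatasetCatalogue`, `FileLocation` and the host and file names are invented, and it makes no claim about how PROOF itself might support datasets.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical catalogue entry: where one block of a dataset lives.
struct FileLocation {
    std::string node;  // slave host holding the file locally
    std::string path;  // path of the ROOT file on that host
};

// Hypothetical "named dataset" catalogue: the user asks for a physics
// process by name instead of listing files and their locations by hand.
class DatasetCatalogue {
public:
    void Register(const std::string& name, FileLocation loc) {
        datasets_[name].push_back(std::move(loc));
    }
    const std::vector<FileLocation>& Lookup(const std::string& name) const {
        const auto it = datasets_.find(name);
        if (it == datasets_.end())
            throw std::out_of_range("unknown dataset: " + name);
        return it->second;
    }

private:
    std::map<std::string, std::vector<FileLocation>> datasets_;
};
```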
Experience with PROOF - Developer point of view
Some problems were found with PROOF, but we had a lot of good support from the PROOF development team.
• Installing and setting up PROOF is not a straightforward task:
  • need to deal with several different interconnected applications: ROOT, Globus, xinetd, …
  • documentation is still incomplete… but improving!
• PROOF is very sensitive to time synchronisation (authentication), node and network status, ???
  • unable to recover from or skip faulty nodes
• Finding what is wrong is not easy:
  • errors are not always issued… sometimes it just hangs
  • events are sometimes skipped with no warning message
• Could not properly test the configuration with two slaves per node:
  • sometimes it would hang (even though there were two CPUs)
  • no gain observed when it works
• Performance is strongly dependent on the “locality” of data:
  • best performance is achieved if data is equally distributed among slaves, i.e. no network data transfer is needed
  • currently done manually…
  • PROOF should handle this automatically; this is currently being developed (we learnt it just yesterday!)
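The effect of data locality on load balancing can be illustrated with a toy, pull-based scheduler. This is not PROOF's packetizer: `Schedule` and `Assignment` are hypothetical, packets are reduced to event counts, and one worker per node mirrors the hardware setup described earlier. An idle worker first takes a packet stored on its own node and only reads from another node (a network transfer) once its local data is exhausted; because faster workers come back for work sooner, they end up processing more events.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical record of one scheduling decision.
struct Assignment {
    std::size_t worker;      // slave that processes the packet
    std::size_t sourceNode;  // node whose disk holds the packet's events
    long nEvents;            // packet size in events
};

// Pull-based, locality-aware scheduling sketch (not PROOF's packetizer).
// packetsPerNode[n] lists packet sizes stored on node n; worker n runs on
// node n. The worker that becomes idle first gets a local packet if one is
// left, otherwise one from the node with the most packets remaining.
std::vector<Assignment> Schedule(std::vector<std::vector<long>> packetsPerNode,
                                 const std::vector<double>& eventsPerSec) {
    const std::size_t nWorkers = eventsPerSec.size();
    assert(nWorkers > 0 && packetsPerNode.size() == nWorkers);

    using Slot = std::pair<double, std::size_t>;  // (time worker is free, worker)
    std::priority_queue<Slot, std::vector<Slot>, std::greater<Slot>> idle;
    for (std::size_t w = 0; w < nWorkers; ++w) idle.push({0.0, w});

    std::vector<Assignment> out;
    while (true) {
        const Slot s = idle.top();
        idle.pop();
        const double t = s.first;
        const std::size_t w = s.second;

        std::size_t src = w;
        if (packetsPerNode[w].empty()) {
            // No local data left: read remotely from the busiest node.
            std::size_t best = 0;
            for (std::size_t n = 1; n < nWorkers; ++n)
                if (packetsPerNode[n].size() > packetsPerNode[best].size()) best = n;
            if (packetsPerNode[best].empty()) break;  // everything assigned
            src = best;
        }
        const long nEv = packetsPerNode[src].back();
        packetsPerNode[src].pop_back();
        out.push_back({w, src, nEv});
        // The worker is busy for nEv / speed seconds, then asks again.
        idle.push({t + static_cast<double>(nEv) / eventsPerSec[w], w});
    }
    return out;
}
```

With data spread evenly and equal CPU speeds every packet is read locally; uneven data or uneven speeds force network reads, which is the situation described above.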
Conclusions
We wanted to implement a tool to quickly, easily and interactively develop a HEP analysis…
  … that was modular, usable now and easy to use
  … that did not force us to rewrite our existing code
  … that allowed us to concentrate on physics
  … that fully exploited local CPU farms
We built a light tool that profits from existing CMS libraries and PROOF…
  … that fulfils all the requirements
  … that has brought “interactivity” back into the analysis cycle
  … that took us one morning to migrate to
  … that allows code sharing
We gained a lot of experience in using PROOF and developing tools for PROOF…
  … that will allow new functionalities to be added to the tool
  … that will ease the integration of PROOF with the new CMS event data model and framework
More information in: http://grid.ifca.unican.es/cms/proof