
Prototype of a Parallel Analysis System for CMS using PROOF

CHEP 2006 - Mumbai, India - February 2006

I. González, D. Cano, R. Marco
Instituto de Física de Cantabria (CSIC – U.C.), Santander, Spain

J. Cuevas
Dpto. de Física, Universidad de Oviedo, Oviedo, Spain


Outline

Ideal and usual HEP analysis

A parallel system: PROOF

Prototype: objectives, implementation, user point of view, running it

Performance studies: hardware setup, results

Experience with CMS & PROOF: user point of view, developer point of view

Conclusions


Ideal HEP Analysis - Kind of HPC

A typical analysis needs a continuous algorithm refinement cycle:

1. Implement algorithm (cuts, particle identification, high level reconstruction)

2. Run on data and build histograms, tables,…

3. Look at the results, think about improvements, and go back to step 1

To achieve interactivity, huge amounts of CPU are needed during short periods to process large amounts of data: an HPC model

Time is spent in coding and thinking… not waiting!


Usual HEP Analysis – Data Processing

A typical sample (final ROOT files) in CMS:

Signal may be of the order of 1M events (150 GB)
Backgrounds are much, much bigger
Together they might be of the order of 1 TB

Algorithms may include:
Loops inside loops inside loops…
Construction of new objects, collections, etc.
Kinematical cuts
Histograms, summaries and various other mathematical objects

Unaffordable on a single CPU:
Too long (hours for the signal, days for everything)
Access to the data

Uncomfortable on a distributed batch system:

Your code must take care of splitting the samples
Interactivity is completely lost
Debugging becomes complicated
You must take care of merging the final results

A mixture of both:
Develop and debug on a single CPU
Run the production selection on a batch system

… parallelisation may be better


A parallel system - PROOF

Important aspects of a parallel system:
Support for various authentication mechanisms
Possibility to upload code
Local and remote data access
Efficient master/slave communication
Clever load balancing
Easy splitting of data and merging of final objects

PROOF is the parallel facility in ROOT; it provides a simple model to process TTrees in parallel

Disperse your data among your slaves, build your libraries following the Selector model, and PROOF takes care of the CPU load for each slave
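As a rough illustration of this PROOF/Selector model (a minimal sketch, not code from the prototype; the master host, file and selector names are placeholders), a ROOT session might look like:

// Open a session on a (hypothetical) PROOF master
TProof *proof = TProof::Open("proofmaster.example.org");

// Chain the distributed ROOT files (tree and file names are placeholders)
TChain chain("Events");
chain.Add("root://node01.example.org//data/signal_0001.root");
chain.Add("root://node02.example.org//data/signal_0002.root");

// Let PROOF split the entries among the slaves and merge the output objects
chain.SetProof();
chain.Process("MySelector.C+");   // MySelector follows the TSelector model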

PROOF supports several authentication mechanisms: SSH, GSI, Kerberos…

Profit from GRID technology

Easy mechanism to upload code: package the code into PAR (tar.gz) files and tell ROOT how to load them; these files are easy to share
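For illustration, uploading and enabling a PAR package in a PROOF session uses calls along these lines (the package name MyAnalysis.par is a placeholder, not the prototype's actual package; "proof" is the session object from the sketch above):

// Upload the package to master and slaves, then build and enable it
proof->UploadPackage("MyAnalysis.par");
proof->EnablePackage("MyAnalysis");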

Remote data may be accessed via rfio, (x)rootd, …

Implements dynamic load balancing, based on local availability of data and on individual CPU performance

Master/slave communication is done through special lightweight daemons (proofd)

Automatic sample splitting and support for object merging are provided:

Merging is handled automatically by PROOF for ROOT objects
Other objects need to inherit from TObject and implement merging code
Your histograms are recovered automatically in your normal ROOT session
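As a hedged sketch of what "implement merging code" means (the class and member names are invented for illustration, not taken from the prototype), a custom output object can provide a Merge method that is looked up when the per-slave copies are combined:

#include "TObject.h"
#include "TCollection.h"

// Hypothetical counter class
class MyCounter : public TObject {
public:
   Long64_t fEvents = 0;

   // Looked up by name and called to combine the per-slave copies
   Long64_t Merge(TCollection *others) {
      TIter next(others);
      while (TObject *obj = next()) {
         if (auto *c = dynamic_cast<MyCounter*>(obj)) fEvents += c->fEvents;
      }
      return fEvents;
   }

   ClassDef(MyCounter, 1); // dictionary needed so the object can be streamed
};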

More information in the talk by G. Ganis and at http://root.cern.ch/root/PROOF.html


CMS & PROOF Prototype - Objectives

Hide PROOF details as much as possible:
The physicist should concentrate on the analysis development
Forget about the internals of PROOF
Make all the operations related to PROOF (compilation, packaging, uploading, etc.) invisible to the user

Easy code migration from current analysis applications based on CMS tools (ExRootAnalysisReader)

Integrate and reuse code from those tools (do not reinvent the wheel)
Provide the same level of functionality

Load balancing handled and optimised automatically by PROOF
Base the design on the Selector model in ROOT

Favour modular analysis development:
Facilitate code sharing between different physicists
Provide a mechanism to (de-)activate parts of the analysis at will

Profit from local GRID infrastructures:
Clusters ready to use
Authentication and certification mechanisms in place


CMS & PROOF Prototype – Implementation

One class to encapsulate the interaction with PROOF (compilation, packaging, uploading, …)

Modularity achieved by inheriting from the AnalysisModule base class:

Related algorithms to process data (IsoMuonFinder, TTbarSelection,…)

An Analysis Modules Manager

One class to specialise TSelector for CMS data:

It also integrates already existing CMS analysis tools

One class to encapsulate counters (very simple, but not existing in ROOT)

One class to handle an Input File

Main macro to run PROOF

Main macro to run sequentially (useful for debugging)

Some scripts:
To generate the skeleton of a new Analysis Module
Internally used by the tool itself

[Class diagram] The Analyser class holds a vector<AnalysisModule*> theModules and provides void AddModule(AnalysisModule*), void Init(), void Loop() and void Summary(ostream& os); concrete modules (MyAnalysisMod1 … MyAnalysisModN) inherit from the AnalysisModule base class, which declares void Init(), void Loop() and void Summary().
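For illustration only (the method bodies, the histogram, the header and the registration call are assumptions based on the class diagram above, not the prototype's actual code), a user module might look roughly like:

#include <iostream>
#include "TH1F.h"
// AnalysisModule is the prototype's base class (header name assumed)

class MyMuonModule : public AnalysisModule {
public:
   void Init()    { fPt = new TH1F("hMuonPt", "Muon pT;pT [GeV]", 100, 0., 200.); } // book once per job
   void Loop()    { /* apply cuts on the current event and fill fPt */ }
   void Summary() { std::cout << "Entries in hMuonPt: " << fPt->GetEntries() << std::endl; }
private:
   TH1F *fPt = nullptr;
};

// Registration with the manager, as in the class diagram (hypothetical call site):
// analyser->AddModule(new MyMuonModule());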


CMS & PROOF Prototype – User point of view

In the input file several things need to be set:

Data files: location and name
PROOF master host name
Analysis Modules to use
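A hypothetical sketch of such an input file (the keys, values and syntax are invented for illustration; the prototype's real format is not shown in these slides; the parameter line anticipates the parameter-passing mechanism described below):

# Hypothetical input file, invented syntax
Master:       proofmaster.example.org
DataFiles:    /data/ttbar/ntuple_*.root
Modules:      IsoMuonFinder, TTbarSelection
MaxEvents:    -1                          # -1 = all events
TTbarSelection.MuonPtCut: 20.0            # module parameter, no recompilation needed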

Analysis Modules:

Where the actual code goes; the place to concentrate on
Only two mandatory methods
The skeleton may be created with a script; the produced code is well commented, with hints

They can be easily shared between developers

Extra features:

PROOF running statistics may be activated

A mechanism to pass parameters to the modules has been developed:
Avoid recompilation if a cut is changed
Define them in the input file

Utility packages are supported: code that need not be executed on each event

Each module may implement a summary method to be printed at the end of each job
The number of events to run is configurable…


CMS & PROOF Prototype – Running it…

[#] export ORCA_SRCPATH=/path/to/ORCA/src
[#] export ORCA_LIBPATH=/path/to/ORCA/libs
[#] grid-proxy-init
Your identity: /C=ES…
[#] root -l

root[0] .x RunProof.C
>> Creating CMSProofLoader...
Info in <TUnixSystem::ACLiC>: creating shared library ~/CMSProof/./CMSProofLoader_C.so
…
>> Checking if PAR files need to be redone...
…
>> Initialising PROOF...
…
SUMARY
======
Number of events processed: 794920
>> PROOF Done!!!

root[1] fMyHistogram->Draw();
root[2] .q

Set environment (once)

Authentication (once)

Start ROOT

Execute RunProof.C

Draw histograms


Performance Studies – Hardware setup

Hardware description:

90 nodes: IBM xSeries 336, 2 Xeon 3.2 GHz processors, 2 GB memory, 2 SATA hard disks (80 + 400 GB)

Network: Gigabit Ethernet
Stack of 4 units of a 48-port 3COM SuperStack3 3870 switch
Each node has one Gigabit Ethernet connection to a Gigabit port

1 slave per node: 1 master, 80 slaves

Data distributed in blocks of ~10K events (~1.5 GB)

1 block in each node; 800K events (~120 GB) in total


Performance Studies – Results

We used a real analysis:

Selection of top quark pair production events with a tau, a lepton and two b quarks in the final state, reconstructed from tracks, jets and clusters

Results:
Total time = processing + initialisation times
Run time: only the loop on events

On 1 CPU: ~4 hours
On 80 CPUs: ~4 minutes

Initialisation takes ~3 minutes, including:

Authentication: done on all slaves, even if unused, and therefore not dependent on the number of slaves used
Remote environment setting
Code uploading and compilation: smart, only done for newer code; the first time it takes a while (not in the plots)
TChain initialisation: very long for very distributed chains (the normal case)

Run time scales close to the ideal 1/Ncpu
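As a rough consistency check using only the numbers quoted above (an estimate, not an additional measurement): T_total(N) ≈ T_init + T_1 / N, so with T_1 ≈ 4 h on one CPU and N = 80 slaves the processing part alone is 4 h / 80 ≈ 3 min, which together with the ~3 min initialisation explains the few-minute turnaround reported on 80 CPUs.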


Experience with CMS & PROOF Prototype

Good things

Code was easily and quickly migrated from the previous “framework”

One morning of basically copy/paste
It is done only once and forever
The old sequential mode is still available, which is very useful for debugging

Common code is being shared between developers located in different places
At our sites, quick analysis development has been possible thanks to the interactivity provided by the analysis parallelisation

We have now developed ~20 new modules
More than 200 histograms are produced in a few minutes
Time is spent thinking and programming the new cuts and algorithms, not waiting for results

Physicists concentrate on the physics and computing managers concentrate on the computers

Problems and Improvements

Debugging the code is still a difficult task

If an error happens on a remote node, the PROOF master hangs
We need to put more effort into this issue

Histograms are only recovered at the end

PROOF allows plots to be drawn at given intervals while the data is being processed (see the feedback sketch at the end of this slide)

A GUI: to set the master name, number of events, data files, analysis modules, packages, parameters… Based on the existing and evolving one in PROOF?

Data is specified by data file name, and its location on the slaves needs to be known

Use "named" datasets to, for example, specify data by physics process; this might be supported by PROOF

More will certainly come as we use it…
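As a hedged sketch of the feedback mechanism mentioned above (ROOT's AddFeedback and TDrawFeedback; the histogram name is a placeholder and gProof is the global set after TProof::Open), intermediate plots can be requested like this:

// Ask PROOF to send back partial results for a named output object
gProof->AddFeedback("hMuonPt");
TDrawFeedback fb(gProof);        // draws the feedback objects periodically
// ... then process the chain as usual ...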


Experience with PROOF – Developer point of view

Some problems were found with PROOF, but there was a lot of good support from the PROOF development team

Installing and setting up PROOF is not a straightforward task:
Need to deal with several different interconnected applications: ROOT, Globus, xinetd, …
Documentation is still incomplete… but improving!

PROOF is very sensitive to time synchronisation (authentication), node and network status, …
Unable to recover from or skip incorrect nodes

Finding what is wrong is not easy:
Errors are not always issued… sometimes it just hangs
Events are sometimes skipped with no warning message

Could not properly test the configuration with two slaves per node:

Sometimes it would hang (even though there were two CPUs)
No gain was observed when it worked

Performance is strongly dependent on the "locality" of the data:
Best performance is achieved if the data is equally distributed among the slaves, i.e. no network data transfer is needed
Currently done manually…
PROOF should handle this automatically; this is currently being developed (we learnt it just yesterday!)


Conclusions

We wanted to implement a tool to quickly, easily and interactively develop a HEP analysis…

… that was modular, usable now and easy
… that did not force us to rewrite our already existing code
… that allowed us to concentrate on physics
… that fully exploited local CPU farms

We built a light tool which profits from existing CMS libraries and PROOF…

… that fulfils all the requirements
… that has brought "interactivity" back into the analysis cycle
… that took us one morning to migrate to
… that allows code sharing

We gained a lot of experience using PROOF and developing tools for PROOF…

… which will allow new functionality to be added to the tool
… and will ease the integration of PROOF with the new CMS event data model and framework

More information in: http://grid.ifca.unican.es/cms/proof