Distributed Simulation with Geant4: Preliminary results of the LowE / DIANE joint project. Jakub T. Mościcki, CERN/IT; credits also to: Alfonso Mantero, INFN Genova. Geant4 Workshop, CERN, Oct 2002.

Distributed Simulation with Geant4

Page 1: Distributed Simulation with Geant4

[email protected], DIANE Project

Geant4 Workshop, CERN Oct 2002

Distributed Simulation with Geant4

Preliminary results of the LowE / DIANE joint project

Jakub T. Mościcki, CERN/IT

credits also to: Alfonso Mantero, INFN Genova

Page 2: Distributed Simulation with Geant4

History

Parallelization of Geant4 simulation is a joint project between Geant4 – DIANE – Anaphe

DIANE is an R&D project in IT/API to study distributed analysis and simulation and create a prototype

initiated early 2001 with very limited resources

Anaphe is an analysis project supported by IT

provides the analysis framework for HEP

The pilot programme includes G4 simulation which produces AIDA/Anaphe histograms

Collaboration started late spring 2002

Page 3: Distributed Simulation with Geant4

Sequential Geant4 Simulation

the goal of simulation:

optimize the detectors used to measure X-ray fluorescence emission from Mercury's crust in the context of Hermes, BepiColombo ESA mission.

requires high statistics, i.e. many events

20 Mio events ~ 3 hours

up to 100 Mio events might be useful

estimated time ~16 hours

Page 4: Distributed Simulation with Geant4

Parallel Geant4 Simulation

increase performance

shift from batch to semi-interactive simulation

speed up the analysis cycle

generate more events – debug simulation faster

from sequential to parallel simulation

preserve reproducibility of the results

minimize deployment overhead

when moving from sequential to parallel simulation

both in terms of time and amount of code/expertise one must invest

Page 5: Distributed Simulation with Geant4

Performance Increase

Page 6: Distributed Simulation with Geant4

Benchmarking environment

parallel cluster configuration

lxplus: 70 Red Hat 6.1 nodes

7 Intel STL2 (2 x PIII 1GHz, 512MB)

31 ASUS P2B-D (2 x PIII 600MHz, 512MB)

15 Celsius 620 (2 x PIII 550MHz, 512MB)

the rest – Kayak (2 x PIII 450MHz, 128MB)

reference sequential machine

pcgeant2 (2 x Xeon 1700MHz, 1GB)

Page 7: Distributed Simulation with Geant4

Benchmarking caveat

non-exclusive access to interactive machines

'load-noise' background, unpredictable load peaks

different CPU and RAM on nodes

AFS used to fetch physics config data

try to remove the noise:

repeat simulations many times to get a reliable mean

work at night and off-peak hours (what about US people using CERN computing facilities?)

etc...

conclusion:

results should be taken with caution and are approximate

Page 8: Distributed Simulation with Geant4

Structure of the simulation

initialization phase (constant)

load ~10-15 MB of physics tables, config data etc.

reference sequential machine: ~ 4 minutes (user time)

cluster nodes: ~ 5-6 minutes

beamOn time ~ f(number of events)

small job: 1-5 Mio events

medium job: 20-40 Mio events

big job: > 50 Mio events
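
The two phases above suggest a simple cost model: a constant initialization term plus a beamOn term proportional to the number of events. The sketch below is only an illustration under stated assumptions: ideal linear scaling over identical workers, a placeholder initialization time of 5.5 minutes (middle of the 5-6 minute range quoted for cluster nodes) and a placeholder rate of 1e5 events per minute (roughly the order of magnitude implied by "20 Mio events ~ 3 hours"). The function names are hypothetical and not part of Geant4 or DIANE.

```python
def job_time_minutes(n_events, n_workers, init_min=5.5, events_per_min=1.0e5):
    """Estimated wall-clock time of a job.

    Every worker pays the constant initialization once, then processes
    its equal share of the events (assumed ideal linear scaling).
    """
    events_per_worker = n_events / n_workers
    return init_min + events_per_worker / events_per_min


def speedup(n_events, n_workers, **model):
    """Ratio of the one-worker time to the n-worker time in this model."""
    sequential = job_time_minutes(n_events, 1, **model)
    parallel = job_time_minutes(n_events, n_workers, **model)
    return sequential / parallel


if __name__ == "__main__":
    for label, n_events in [("small", 1e6), ("medium", 20e6), ("big", 50e6)]:
        print(label, round(speedup(n_events, 10), 2))
```

In this model the speedup of a small job stays well below the number of workers because the constant term dominates, while a big job comes much closer to it.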

Page 9: Distributed Simulation with Geant4

Scalability test (job time)

Page 10: Distributed Simulation with Geant4

Normalized efficiency

Page 11: Distributed Simulation with Geant4

Benchmarking (comments)

results are approximate

scaling factors for different CPU speeds

but seem to be in agreement with expectations

moving from batch to semi-interactive simulation is feasible

small jobs do not gain much because of the large constant initialization time

Page 12: Distributed Simulation with Geant4

Problems & solutions

time of job execution = slowest machine...

...or most loaded one at the moment

often had to wait a long time for the last worker to finish

possible solution:

use larger number of smaller workers

fast machines run workers sequentially many times, but...

constant initialization time is rather important

initialize once, beamOn many times... to be checked

if this problem is solved we may move towards more interactive simulation
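
The trade-off described above can be explored with a toy scheduler. This is a sketch under assumptions, not the DIANE scheduler: machines pull equal-sized tasks from a shared queue, every task normally pays the constant initialization, and the `init_once` flag models the "initialize once, beamOn many times" idea. All names and numbers are hypothetical.

```python
import heapq


def makespan(n_events, n_tasks, speeds, init_min, init_once=False):
    """Wall-clock time until the last task finishes.

    speeds: events per minute for each machine (a loaded machine is
    simply modelled as a slow one).
    """
    events_per_task = n_events / n_tasks
    # Heap of (time at which the machine becomes free, machine index).
    free_at = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(free_at)
    initialized = set()
    finish = 0.0
    for _ in range(n_tasks):
        t, i = heapq.heappop(free_at)  # next machine to become idle
        pay_init = not (init_once and i in initialized)
        initialized.add(i)
        t += (init_min if pay_init else 0.0) + events_per_task / speeds[i]
        finish = max(finish, t)
        heapq.heappush(free_at, (t, i))
    return finish


if __name__ == "__main__":
    speeds = [1e5, 1e5, 1e5, 2.5e4]  # three fast machines, one slow
    print(makespan(4e6, 4, speeds, 5.0))
    print(makespan(4e6, 16, speeds, 5.0))
    print(makespan(4e6, 16, speeds, 5.0, init_once=True))
```

With these made-up numbers, one task per machine takes 45 minutes because everybody waits for the slow machine. Sixteen smaller tasks bring this down only to 37.5 minutes, since each task pays the initialization again. Paying the initialization once per machine gives 17.5 minutes.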

Page 13: Distributed Simulation with Geant4

From sequential to parallel simulation

Page 14: Distributed Simulation with Geant4

Reproducibility

initial seed of the random engine

make sure that every parallel simulation starts with a seed uniquely determined by the job's initial seed

the number of times the engine is used depends on the initial seed

make sure that correlations between the workers' seeds are avoided

our solution:

use two uncorrelated random engines

one to generate a table of initial seeds (one seed for each worker)

another for the simulation inside the worker
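
The two-engine scheme can be sketched as follows. This is a minimal illustration and not the project's code: Python's `random.Random` stands in for both engines (the slides do not name the engines used), so the two instances here differ only by their seeds, whereas a real setup would pick two genuinely uncorrelated engine types. The function names are hypothetical.

```python
import random


def make_seed_table(job_seed, n_workers):
    """Engine 1: derive one distinct seed per worker from the job's initial seed."""
    seed_engine = random.Random(job_seed)
    seeds, seen = [], set()
    while len(seeds) < n_workers:
        candidate = seed_engine.getrandbits(31)
        if candidate not in seen:  # no two workers may share a seed
            seen.add(candidate)
            seeds.append(candidate)
    return seeds


def run_worker(worker_seed, n_events):
    """Engine 2: private engine used for the simulation inside one worker."""
    sim_engine = random.Random(worker_seed)
    # Stand-in for the real event loop: accumulate one random number per event.
    return sum(sim_engine.random() for _ in range(n_events))


if __name__ == "__main__":
    table = make_seed_table(job_seed=12345, n_workers=4)
    print([run_worker(seed, 1000) for seed in table])
```

Because the table depends only on the job's initial seed and the number of workers, rerunning the job with the same inputs gives every worker the same seed again.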

Page 15: Distributed Simulation with Geant4

Reproducibility

parameters which need to be fixed to reproduce the simulation:

total number of events

initial seed

... but also:

number of workers

number of events per worker
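
One way to keep these parameters together is a small run description that is stored with the results. The sketch below is an assumption-laden illustration: `RunSpec` and `run` are hypothetical names, the events per worker are derived from the total by a fixed splitting rule instead of being stored separately, and Python's `random.Random` stands in for the real engines.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSpec:
    """Everything that has to be fixed to reproduce a parallel run."""
    total_events: int
    initial_seed: int
    n_workers: int

    def events_per_worker(self):
        # Fixed rule: spread the remainder over the first workers.
        base, extra = divmod(self.total_events, self.n_workers)
        return [base + (1 if i < extra else 0) for i in range(self.n_workers)]


def run(spec):
    """Return one partial result per worker for the given specification."""
    seed_engine = random.Random(spec.initial_seed)
    worker_seeds = [seed_engine.getrandbits(31) for _ in range(spec.n_workers)]
    results = []
    for seed, n_events in zip(worker_seeds, spec.events_per_worker()):
        engine = random.Random(seed)  # private engine of this worker
        results.append(sum(engine.random() for _ in range(n_events)))
    return results


if __name__ == "__main__":
    spec = RunSpec(total_events=1000, initial_seed=42, n_workers=4)
    print(run(spec) == run(spec))
```

Changing the number of workers changes both the seed table and the split of events, which is why it belongs in the specification alongside the total number of events and the initial seed.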

Page 16: Distributed Simulation with Geant4

Minimizing deployment overhead

Page 17: Distributed Simulation with Geant4

Ease of use

user-friendliness

G4 simulation developer should not need to fight with irrelevant technical problems when moving from sequential to parallel G4 simulation

as non-intrusive as possible

minimize necessary code changes in original simulation

good separation of the subsystems

G4 simulation does not need to know that it runs in parallel...

the distributed framework (DIANE) does not need to care about what actually is being simulated (see slide 20)

Page 18: Distributed Simulation with Geant4

What is DIANE?

R&D project in IT/API

semi-interactive parallel analysis for LHC

middleware technology evaluation & choice

CORBA, MPI, Condor, LSF...

also see how to integrate API products with GRID

prototyping (focus on ntuple analysis)

time scale and resources:

Jan 2001: start (< 1 FTE)

June 2002: running prototype exists

sample Ntuple analysis with Anaphe

event-level parallel Geant4 simulation

Page 19: Distributed Simulation with Geant4

What is DIANE?

framework for parallel cluster computation

application-oriented

master-worker model common in HEP applications

application-independent

apps dynamically loaded in a plugin style

callbacks to applications via abstract interfaces

component-based

subsystems and services packaged into component libraries

core architecture uses CORBA and CCM (CORBA Component Model)

integration layer between applications and the GRID

environment and deployment tools
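
The plugin-style loading and the callbacks via abstract interfaces can be pictured with a small sketch. It is written in Python for brevity and is not the DIANE API: the real framework is built on CORBA components, and the interface name, its three methods and the helper functions below are all hypothetical.

```python
import importlib
from abc import ABC, abstractmethod


class Application(ABC):
    """Hypothetical callback interface through which the framework sees an application."""

    @abstractmethod
    def split(self, n_tasks):
        """Return a list of independent task descriptions."""

    @abstractmethod
    def execute(self, task):
        """Process one task and return its partial result."""

    @abstractmethod
    def merge(self, results):
        """Combine the partial results into the final one."""


def load_application(module_name, class_name):
    """Plugin-style loading: the framework only knows names given at run time."""
    cls = getattr(importlib.import_module(module_name), class_name)
    if not issubclass(cls, Application):
        raise TypeError(f"{class_name} does not implement Application")
    return cls()


def run_job(app, n_tasks):
    """Framework side: it never looks at what is being computed."""
    return app.merge([app.execute(task) for task in app.split(n_tasks)])


class SumOfSquares(Application):
    """Toy application standing in for a simulation or an analysis."""

    def split(self, n_tasks):
        return [range(i, 100, n_tasks) for i in range(n_tasks)]

    def execute(self, task):
        return sum(x * x for x in task)

    def merge(self, results):
        return sum(results)


if __name__ == "__main__":
    print(run_job(SumOfSquares(), n_tasks=4))
```

The framework code (`load_application`, `run_job`) depends only on the abstract interface, so a different application can be plugged in without touching it.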

Page 20: Distributed Simulation with Geant4

Master/Worker model

applications share the same computation model

so also share a big part of the framework code

but have different non-functional requirements

CPU vs IO intensive

semi-interactive vs batch etc....
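
A minimal sketch of the master/worker computation model, assuming a thread pool in place of cluster nodes and a toy histogram in place of AIDA/Anaphe histograms. All names are hypothetical, and Python threads are used only to show the control flow; they do not give real CPU parallelism for this kind of work.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def worker(task):
    """Process one task: fill a 10-bin histogram from n_events random numbers."""
    seed, n_events = task
    rng = random.Random(seed)
    hist = Counter()
    for _ in range(n_events):
        bin_index = min(int(rng.random() * 10), 9)  # guard the upper edge
        hist[bin_index] += 1
    return hist


def master(tasks, n_workers):
    """Hand the tasks to the workers and merge their partial histograms."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(worker, tasks):
            total.update(partial)  # Counter.update adds the bin contents
    return total


if __name__ == "__main__":
    tasks = [(seed, 1000) for seed in (1, 2, 3, 4)]
    print(sorted(master(tasks, n_workers=2).items()))
```

Because each task carries its own seed and the merge is a plain sum, the final histogram does not depend on how many workers happened to process the tasks.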

Page 21: Distributed Simulation with Geant4

What DIANE is not

DIANE is not

a replacement for a GRID and its services

a hardwired analysis toolkit

Page 22: Distributed Simulation with Geant4

DIANE and GRID

DIANE as a GRID computing element...

via a gateway that understands Grid/JDL

... Grid/JDL must be able to describe parallel jobs/tasks

DIANE as a user of (low level) Grid services...

authentication, security, load balancing...

and profit from existing 3rd party implementations

the Python environment is a rapid prototyping platform and may provide a convenient connection between DIANE and Globus Toolkit via the pyGlobus API

Page 23: Distributed Simulation with Geant4

Architecture Overview

layering: abstract middleware interfaces and components

plugin-style application loading

Page 24: Distributed Simulation with Geant4

Conclusions

prototype deployment of G4-DIANE

significant performance improvement possible

scalability tests:

140 Mio Events

70 nodes in the cluster

1 hour total parallel execution

putting together DIANE and G4 is fairly easy

done in several days...

DIANE may bridge G4 to the GRID world

without necessarily waiting for fully-fledged GRID infrastructure to become available