Jakub.Moscicki@cern.ch, DIANE Project, Seminar on Innovative Detectors, Siena, Oct 2002

Distributed Computing in Physics

Parallel Geant4 Simulation in Medical and Space Science Applications

Jakub T. Moscicki, CERN/IT

Maria G. Pia, INFN Genova

Alfonso Mantero, INFN Genova

Susanna Guatelli, INFN Genova

Applications of Distributed Technology and GRID

Examples of interdisciplinary applications

Geant4 simulation and analysis

speed-up factor ~ 30 times

DIANE R&D Project: application-oriented gateway to GRID

developed for LHC

CERN IT/API – INFN Geant4/LowEnergy collaboration

cern.ch/diane

LHC: ntuple analysis and simulation

radiotherapy: brachytherapy, IMRT

space missions: ESA Bepi Colombo, LISA

Why Distributed Computing?

share limited hardware resources

lend when not needed, borrow when needed

optimize load of CPUs

avoid redundancy: save common disk space

distributed collaborations e.g. LHC community

share and manage access to distributed data

replication, security, consistency

move processing close to available resources

e.g. data

process in parallel

What is GRID ?

global, unified resource access system

a la WWW: easy and universal access

virtual organisations over administrative boundaries

black-box: submit here, run anywhere

world of virtual happiness but...

in practice, to work efficiently and correctly, every generic system must be customized to match each experiment's specific needs and configuration

technology in constant evolution

mature and universally accessible GRID still to come

DIstributed ANalysis Environment

parallel cluster processing

make fine tuning and customization easy

transparently using GRID technology

accessible via a Wide Area Network

application independent

DIstributed ANalysis Environment

hide complex details of underlying technology

easy to use

dedicated to master-worker model

covers most typical jobs: ntuple analysis, event-level distributed simulation
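
The master-worker model can be illustrated with a small toy. This is a sketch under stated assumptions, not DIANE's interface: `simulate_events`, `run_master`, and the histogram layout are hypothetical names, threads stand in for cluster nodes, and the "simulation" only histograms uniform random numbers.

```python
import random
from concurrent.futures import ThreadPoolExecutor

N_BINS = 10  # hypothetical histogram binning


def simulate_events(seed, n_events):
    """Worker: toy 'simulation' that histograms uniform deviates."""
    rng = random.Random(seed)  # private engine for this worker
    hist = [0] * N_BINS
    for _ in range(n_events):
        hist[int(rng.random() * N_BINS)] += 1
    return hist


def run_master(total_events, n_workers, seeds):
    """Master: split the events, dispatch the tasks, merge partial histograms."""
    share, extra = divmod(total_events, n_workers)
    sizes = [share + (1 if i < extra else 0) for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(simulate_events, seeds, sizes))
    # Bin-by-bin sum of the workers' histograms
    return [sum(bins) for bins in zip(*partials)]


merged = run_master(total_events=10_000, n_workers=4, seeds=[11, 22, 33, 44])
print(sum(merged))
```

The application code lives entirely in the worker function; the master only knows how to split the work and how to merge the results, which is the separation the slides aim for.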

Preliminary Benchmark Results

Standard Geant4 Simulation

the goal of simulation:

study the experimental configuration and the physics reach for Bepi Colombo ESA mission to Mercury

requires high statistics: many events

20 Mio events ~ 3 hours

up to 100 Mio events might be useful

estimated time ~16 hours

analysis implemented with AIDA/Anaphe

Distributed Geant4 Simulation

main goals:

increase performance

shift from batch to semi-interactive simulation

user can study the results of the simulation faster and more often

generate more events, debug simulation faster

correctness and ease of use

preserve reproducibility of the results

parallel execution should look like local execution to users

Benchmarking Environment

parallel cluster configuration

70 Red Hat 6.1 nodes

7 Intel STL2 (2 x PIII 1 GHz, 512 MB)

31 ASUS P2B-D (2 x PIII 600 MHz, 512 MB)

15 Celsius 620 (2 x PIII 550 MHz, 512 MB)

the rest: Kayak (2 x PIII 450 MHz, 128 MB)

reference sequential machine

pcgeant2 (2 x Xeon 1700 MHz, 1 GB)

notice different CPU speeds and memory size

Scalability Test – Job Time

not normalized execution time: average gain 15 times

Normalized Efficiency

normalized efficiency: average real gain ~30 times

Benchmarking Commentary

non-exclusive access to interactive machines

'load-noise' background, unpredictable load peaks

different CPU and RAM on nodes

AFS used to fetch physics config data

try to remove the noise:

repeat simulations many times to get the correct mean

work at night and off-peak hours (what about US people using CERN computing facilities ?)

etc...

interpretation of results

scaling factors for different CPU speeds

results agree with expectations

Summary

Scalability Tests

prototype deployment of Geant4-DIANE

proved significant performance improvement

scalability tests:

140 Mio Events

70 nodes in the cluster

1 hour total parallel execution

putting together DIANE and Geant4 is fairly easy

done in a few days...

Easy to use

user-friendliness

application developer (e.g. Geant4 simulation) is shielded from complexity of underlying technology

not affecting the original code of application

standalone and distributed cases use the same code

good separation of the subsystems

application does not need to know that it runs in distributed environment...

the distributed framework (DIANE) does not need to care about what actions application performs internally

Universally Applicable

DIANE is application independent

easy to customize and use in applications other than Geant4

e.g. it was originally developed for ntuple analysis

DIANE may bridge applications to the GRID world

without necessarily waiting for fully-fledged GRID infrastructure to become available

with smooth transition to GRID technologies as they become available

DIANE and distributed computing technology may be applied in a variety of other scientific/research domains

In progress: Optimization

time of job execution = slowest machine...

...or most loaded one at the moment

often had to wait a long time for last worker to finish

example of customization

exploit dual-processor mode

use larger number of smaller workers

fast machines run workers sequentially many times

benchmark in dedicated cluster
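
The effect of using a larger number of smaller workers can be sketched with a toy scheduling model. Everything here is an assumption made for illustration: the function names are hypothetical, the relative node speeds only loosely follow the clock rates on the cluster slide, and external load is ignored.

```python
import heapq


def static_makespan(total_events, speeds):
    """One equal-sized worker per node: the slowest node sets the job time."""
    share = total_events / len(speeds)
    return max(share / s for s in speeds)


def pull_makespan(total_events, speeds, n_tasks):
    """Many small tasks; whichever node becomes free first takes the next one."""
    task = total_events / n_tasks
    free_at = [(0.0, i) for i in range(len(speeds))]  # (time node is free, node)
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(n_tasks):
        t, i = heapq.heappop(free_at)
        t += task / speeds[i]  # fast nodes come back sooner and run more tasks
        finish = max(finish, t)
        heapq.heappush(free_at, (t, i))
    return finish


# Hypothetical relative speeds (events per unit time) for five nodes
speeds = [1000, 600, 600, 550, 450]
static = static_makespan(1_000_000, speeds)
pulled = pull_makespan(1_000_000, speeds, n_tasks=100)
print(static, pulled)
```

With small tasks the wait for the last worker is bounded by the duration of one small task on the slowest node, instead of by that node's whole equal share.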

In progress: Medical Applications

plan to run Geant4 simulation for radiotherapy in a couple of days

new possibilities:

precise MC-based treatment planning FAST

small hospitals may access distributed resources worldwide

References

more information:

cern.ch/diane

www.ge.infn.it/geant4/techtransf

aida.freehep.org

cern.ch/anaphe

The end

From sequential to parallel simulation

Structure of the simulation

initialization phase (constant)

load ~10-15 MB of physics tables, config data etc.

reference sequential machine: ~ 4 minutes (user time)

cluster nodes: ~ 5-6 minutes

beamOn ~ f( event number )

small job: 1-5 Mio events

medium job: 20-40 Mio events

big job: > 50 Mio events
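
A constant initialization phase plus an event loop that grows with the event count suggests a simple linear cost model. The sketch below is illustrative only. It assumes that the ~3 hours quoted for 20 Mio events on the reference machine include the ~4 minutes of initialization, which gives about 8.8 minutes per Mio events. It also assumes, unlike the real cluster, that all workers are as fast as the reference machine. The function names are hypothetical.

```python
def job_time_minutes(n_mio_events, n_workers=1, init_min=4.0, min_per_mio=8.8):
    """Linear model: constant initialization plus an evenly split event loop."""
    return init_min + min_per_mio * n_mio_events / n_workers


def speedup(n_mio_events, n_workers):
    """Ratio of sequential to parallel time under the same model."""
    return job_time_minutes(n_mio_events) / job_time_minutes(n_mio_events, n_workers)


for label, n_mio in [("small", 1), ("medium", 20), ("big", 100)]:
    hours = job_time_minutes(n_mio) / 60
    print(label, round(hours, 1), "h sequential,",
          round(speedup(n_mio, 70), 1), "x on 70 workers")
```

Under these assumptions a 100 Mio event job takes roughly 15 hours sequentially, in the same range as the ~16 hours estimated earlier in the talk. Because every worker repeats the initialization, small jobs gain far less from 70 workers than big ones.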

Reproducibility

initial seed of the random engine

make sure that every parallel simulation starts with a seed uniquely determined by the job's initial seed

number of times engine is used depends on the initial seed

make sure that correlations between the workers' seeds are avoided

our solution:

use two uncorrelated random engines

one to generate a table of initial seeds (one seed for each worker)

another for the simulation inside the worker
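
The seed-table idea can be sketched as follows. This is a minimal illustration, not the Geant4-DIANE code: `make_seed_table` and `worker_sample` are hypothetical names, and both engines are instances of Python's Mersenne Twister, whereas the slide calls for two distinct, uncorrelated engines.

```python
import random


def make_seed_table(job_seed, n_workers):
    """Engine 1: used only to draw one initial seed per worker."""
    seed_engine = random.Random(job_seed)
    seeds = []
    while len(seeds) < n_workers:
        s = seed_engine.getrandbits(32)
        if s not in seeds:  # keep the workers' seeds distinct
            seeds.append(s)
    return seeds


def worker_sample(worker_seed, n_events):
    """Engine 2: private to the worker, used for the simulation itself."""
    sim_engine = random.Random(worker_seed)
    return [sim_engine.random() for _ in range(n_events)]


table = make_seed_table(job_seed=12345, n_workers=4)
run_a = [worker_sample(s, 3) for s in table]
run_b = [worker_sample(s, 3) for s in make_seed_table(12345, 4)]
print(run_a == run_b)
```

The whole table is fixed by the job's initial seed, so a rerun with the same seed gives every worker the same starting point, no matter which node it lands on or when it starts.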

DIANE – G4 prototype

Parallelization of Geant4 simulation is a joint project between Geant4 – DIANE – Anaphe

DIANE is an R&D project in IT/API to study distributed analysis and simulation and create a prototype

initiated early 2001 with very limited resources

Anaphe is an analysis project supported by IT

provides the analysis framework for HEP

The pilot programme includes G4 simulation which produces AIDA/Anaphe histograms

Collaboration started late spring 2002

Reproducibility

parameters which need to be fixed to reproduce the simulation:

total number of events

initial seed

... but also:

number of workers

number of events per worker
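
One way to keep these four parameters together is a small job descriptor. The sketch below is hypothetical (`JobSpec` and `run` are not DIANE names) and uses a toy "simulation" that sums random numbers. It shows that changing the worker layout changes the result even when the total number of events and the initial seed stay the same.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class JobSpec:
    total_events: int
    initial_seed: int
    n_workers: int
    events_per_worker: int


def run(spec):
    """Toy job: each worker sums uniform deviates from its own seeded engine."""
    seeder = random.Random(spec.initial_seed)  # draws one seed per worker
    total, remaining = 0.0, spec.total_events
    for _ in range(spec.n_workers):
        rng = random.Random(seeder.getrandbits(32))
        n = min(spec.events_per_worker, remaining)
        total += sum(rng.random() for _ in range(n))
        remaining -= n
    return total


a = JobSpec(total_events=1000, initial_seed=7, n_workers=4, events_per_worker=250)
b = JobSpec(total_events=1000, initial_seed=7, n_workers=5, events_per_worker=200)
print(run(a) == run(a), run(a) == run(b))
```

Storing the full descriptor with the output histograms is what makes a later rerun comparable with the original run.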