Next Generation of Event Data Processing Frameworks

ATLAS Meeting, CERN, 17 October 2011

P. Mato, CERN

Page 1: ATLAS Meeting CERN, 17 October 2011 P. Mato, CERN

Next Generation of Event Data Processing Frameworks

ATLAS Meeting, CERN, 17 October 2011

P. Mato, CERN

Page 2

For the last 40 years HEP event processing frameworks have had the same structure
◦ initialize; loop events { loop modules {…} }; finalize
◦ O-O has not added anything substantial
◦ It is simple, intuitive, easy to manage and scalable

Current frameworks were designed in the late 1990s
◦ We now know better what is really needed
◦ Unnecessary complexity impacts performance

Clear consensus that we need to adapt HEP applications to the new generation of CPUs
◦ Multi-process, multi-threading, GPUs, vectorization, etc.
◦ The one-job-per-core model scales well but requires too much memory and sequential file merging
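The classic structure named above can be sketched in a few lines; the `Module` protocol and the toy `Tracker` module are purely illustrative, not taken from any real framework (sketch in Python for brevity):

```python
# Minimal sketch of the classic HEP event-loop framework:
# initialize; loop events { loop modules {...} }; finalize.
# Module names and the toy "reconstruction" are illustrative only.

class Module:
    def initialize(self): pass
    def execute(self, event): pass
    def finalize(self): pass

class Tracker(Module):
    def execute(self, event):
        event["tracks"] = len(event["hits"]) // 2  # toy reconstruction

def run(modules, events):
    for m in modules:
        m.initialize()
    for event in events:          # outer loop over events
        for m in modules:         # inner loop over modules
            m.execute(event)
    for m in modules:
        m.finalize()
    return events

events = [{"hits": [1, 2, 3, 4]}, {"hits": [5, 6]}]
run([Tracker()], events)
print(events[0]["tracks"])  # → 2
```

The point of the slide is precisely that this structure is strictly sequential: nothing in the two nested loops expresses concurrency.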


Time for a new framework (i)

Page 3

Covering a broad scope of applications and environments
◦ Single-user desktop/laptop case, to minimize latency
◦ Production (throughput computing), to minimize memory and other resources

Investigations are starting and prototypes are being developed
◦ Examples:
  Language investigations (Go, OpenCL, etc.)
  Parallelization and vectorization (e.g. the Geant prototype)
  Porting algorithms to GPUs
◦ Opportunity to collaborate and better organize the work


Time for a new framework (ii)

Page 4

Universal framework for simulation, reconstruction, analysis and high-level trigger applications
Common framework for use by any experiment
Decomposition of the processing of each event into ‘chunks’ that can be executed concurrently
Ability to process several events concurrently
Optimal scheduling and the associated data structures
Minimize any processing requiring exclusive access to resources, because it breaks concurrency
Support for various hardware/software technologies
Facilitate the integration of existing LHC application code (the algorithmic part)
Quick delivery of running prototypes: the opportunity of the 18-month LHC shutdown

Vision

Page 5

Current frameworks used by the LHC experiments support all data processing applications
◦ High-level trigger, reconstruction, analysis, etc.
◦ Nothing really new here

But simulation applications are designed with one big ‘chunk’ in which all the Geant4 processing happens
◦ We aim to improve the full and fast simulation using the same set of common services and infrastructure

Running on the major platforms
◦ Linux, MacOSX, Windows


Universal Framework

Page 6

Frameworks can be shared between experiments
◦ E.g. Gaudi, used by LHCb, ATLAS, HARP, MINERVA, GLAST, BES3, etc.

We can do better this time :-)
◦ Expect to work closely with the LHC experiments
◦ Aim to support at least ATLAS and CMS

Special emphasis on requirements from:
◦ New experiments
  E.g. Linear Collider, SuperB, etc.
◦ Different processing paradigms
  E.g. fixed-target experiments, astroparticle physics


Common Framework

Page 7

Framework with the ability to schedule modules/algorithms concurrently
◦ Full data-dependency analysis would be required (no global data or hidden dependencies)
◦ Need to resolve the DAGs (Directed Acyclic Graphs) both statically and dynamically

Not much gain is expected with ‘Tasks’ as designed today
◦ Algorithm decomposition can be influenced by the framework capabilities

‘Tasks’ could be processed by different hardware/software
◦ CPUs, GPUs, threads, processes, etc.
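As an illustration of dependency-driven scheduling, the following sketch (in Python, with hypothetical algorithm names and data labels) launches each algorithm as soon as its declared inputs exist in the event store, so independent branches of the DAG run concurrently:

```python
# Sketch of data-dependency-driven scheduling: run an algorithm once its
# inputs are available; algorithms with no mutual dependency run in
# parallel in a thread pool. Names and data labels are hypothetical.
from concurrent.futures import ThreadPoolExecutor

# algorithm name -> (input labels, output label)
ALGS = {
    "Decode":   ((),                      "hits"),
    "Track":    (("hits",),               "tracks"),
    "Calo":     (("hits",),               "clusters"),  # independent of Track
    "Vertex":   (("tracks",),             "vertices"),
    "Particle": (("tracks", "clusters"),  "particles"),
}

def run_event():
    store, done = {}, set()
    with ThreadPoolExecutor(max_workers=4) as pool:
        while len(done) < len(ALGS):
            # launch every not-yet-run algorithm whose inputs are all present
            ready = [n for n, (ins, _) in ALGS.items()
                     if n not in done and all(i in store for i in ins)]
            futures = {n: pool.submit(lambda n=n: f"{n}-out") for n in ready}
            for n, f in futures.items():
                store[ALGS[n][1]] = f.result()  # publish the output
                done.add(n)
    return store

print(sorted(run_event()))  # → ['clusters', 'hits', 'particles', 'tracks', 'vertices']
```

Here `Track` and `Calo` run concurrently after `Decode`; a hidden dependency (an algorithm reading data it never declared) would silently break this scheme, which is why the slide insists on full dependency analysis.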


Concurrent ‘Task’ processing

[Diagram: Input, Processing and Output tasks laid out along a time axis]

Page 8

DAG of Brunel
◦ Obtained from the existing code instrumented with ‘Auditors’
◦ Probably still missing ‘hidden or indirect’ dependencies (e.g. Tools)

Can serve to give an idea of the potential ‘concurrency’
◦ Assuming no changes in the current reconstruction algorithms


Example: LHCb Reconstruction

Page 9

Need to deal with the tails of sequential processing
◦ See Rene’s presentation (*)

Introducing pipeline processing
◦ Never tried before!
◦ Exclusive access to resources can be pipelined, e.g. file writing

Need to design a very powerful scheduler
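A minimal sketch of the pipelining idea, assuming the exclusive resource is the output file: worker threads process events concurrently and hand their results to a single writer thread through a queue, so access to the resource is serialized without locks and without stalling the event processing (names and the toy workload are illustrative):

```python
# Pipelining an exclusive resource: many events in flight, one dedicated
# writer thread fed by a queue stands in for the (exclusive) file writing.
import queue
import threading

out_q = queue.Queue()
written = []

def writer():  # the only thread touching the exclusive resource
    while True:
        item = out_q.get()
        if item is None:        # sentinel: no more events
            break
        written.append(item)    # stands in for a file write

def process(event_id):
    # toy per-event processing, then hand the result to the pipeline
    out_q.put(f"event-{event_id}-reco")

w = threading.Thread(target=writer)
w.start()
workers = [threading.Thread(target=process, args=(i,)) for i in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()
out_q.put(None)
w.join()
print(len(written))  # → 4
```

The scheduler the slide calls for would generalize this: every exclusive stage becomes one pipeline segment, and events flow through the segments in order while all other work stays concurrent.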


Many Concurrent Events

[Diagram: many events processed concurrently along a time axis]

Page 10

Concrete algorithms can be parallelized with some effort
◦ Making use of threads, OpenMP, MPI, GPUs, etc.
◦ But it is difficult to integrate them into a complete application
  E.g. MT-G4 with Parallel Gaudi
◦ Performance-wise, it only makes sense to parallelize the complete application, not only parts of it

Developing and validating parallel code is difficult
◦ ‘Physicists’ should be spared from this
◦ In any case, concurrency will limit what can and cannot be done in the algorithmic code

At the framework level you have the overall view and control of the application


Why the Framework managing the concurrency?

Page 11

It is not simple, but we are not alone
◦ Technologies like Apple’s Grand Central Dispatch (GCD) are designed to help write applications without having to fiddle directly with threads and locking (and getting it terribly wrong)

New paradigms for concurrency programming
◦ The developer factors the processing into ‘chunks’ with their dependencies, and lets the framework (system) deal with creating and managing a ‘pool’ of threads that takes care of executing the ‘chunks’
◦ This tries to eliminate lock-based code and makes it more efficient
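The GCD-style division of labour can be illustrated with a generic thread pool (here Python's `concurrent.futures`, standing in for GCD; the chunk names and values are hypothetical): the developer only declares chunks and their dependencies, and never touches threads or locks directly.

```python
# GCD-style model: user code declares 'chunks'; a system-managed pool
# executes them. Waiting on a future expresses a dependency, so no
# explicit threads or locks appear in user code.
from concurrent.futures import ThreadPoolExecutor

def chunk_a():
    return 10

def chunk_b():
    return 32

with ThreadPoolExecutor() as pool:       # the system-managed thread pool
    fa = pool.submit(chunk_a)            # independent chunks may run
    fb = pool.submit(chunk_b)            # concurrently
    total = fa.result() + fb.result()    # dependency: wait for both

print(total)  # → 42
```

The framework-level analogue is that the scheduler, not the physicist, decides where and when each chunk runs, which is the argument of the previous slide.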


Concurrency Programming

Page 12

With an approach like GCD we could exercise different factorizations
◦ Processing each event (a set of primary particles) could be the ‘chunk’ (same as GeantMT)

We could also go to the sub-event level
◦ Developing Rene’s ideas of ‘baskets’ of particles organized by particle type, volume shape, etc.
◦ Would need an efficient summing (‘reduce’) of the results
◦ Would require studying the reproducibility of results (random-number sequence)
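A toy sketch of the basket factorization (the grouping key, the stand-in workload and the numbers are all illustrative): particles are grouped into baskets, the baskets are processed concurrently, and the per-basket results are combined in a ‘reduce’ step.

```python
# Sub-event factorization: group particles into 'baskets' (here by particle
# type), simulate each basket as an independent chunk, then reduce the
# per-basket results. All names and numbers are illustrative.
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

particles = [("e-", 1.0), ("gamma", 2.0), ("e-", 3.0), ("gamma", 4.0)]

# 1. organize particles into baskets by type
baskets = defaultdict(list)
for ptype, energy in particles:
    baskets[ptype].append(energy)

def simulate(basket):
    # stand-in for transporting one basket; returns its total energy deposit
    return sum(basket)

# 2. process the baskets concurrently, 3. reduce the per-basket sums
with ThreadPoolExecutor() as pool:
    partial = list(pool.map(simulate, baskets.values()))
total = reduce(lambda a, b: a + b, partial, 0.0)
print(total)  # → 10.0
```

The reproducibility concern on the slide shows up here too: if each basket draws from a shared random-number generator, the result depends on execution order unless each basket gets its own seeded stream.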


Concurrency in Geant4

Page 13

Collaboration of framework developers from CERN, FNAL, LBL, DESY and possibly other labs
◦ Start with a small number of people (at the beginning)
◦ Open to people willing to collaborate
◦ Strong collaboration with ATLAS and CMS (and others)
  E.g. instrumentation of existing applications to provide requirements
◦ Strong collaboration with the Geant4 team

Quick delivery of running prototypes (I and II)
◦ First prototype in 12 months :-)

Agile project management with ‘short’ cycles
◦ Weekly meetings to review progress and update plans


Project

Page 14

We need to evaluate some of the existing technologies and design partial prototypes of critical parts
◦ See next slide

The idea would be to organize these R&D activities in short cycles
◦ Coordinating the interested people to cover all aspects
◦ Coming to conclusions (yes/no) within a few months

Possibility to organize a kick-off workshop
◦ Define the initial R&D activities and the people involved
◦ Refine the overall vision
◦ Where and when?


R&D Activities

Page 15


Investigate current LHC applications to gather requirements
◦ Dependencies, data-access patterns, opportunities for concurrency, etc.

Investigate the design and implementations of state-of-the-art concurrency frameworks
◦ Scheduling (static, dynamic, adaptive), memory model, I/O

Prototype framework elements
◦ Identify ‘exemplar’ algorithms to be parallelized
◦ Data structures and memory-allocation strategies
◦ New languages (C++11, Go, pypy, …) and libraries (OpenCL, CnC, STM, …)

Understanding I/O

Proposed R&D activities

Page 16


Straw-man Project Timeline (2011-2014, spanning the LHC shutdown):
◦ Today: project definition
◦ R&D, technology evaluation and design of critical parts
◦ Complete Prototype I
◦ Initial adaptation of LHC and Geant4 applications
◦ Complete Prototype II, with the experience of porting LHC applications
◦ First production-quality release

Page 17

Presented a proposal for the development of a new generic data-processing framework with concurrency, to exploit the new CPU/GPU architectures

The LHC experiments should be the main players: providing specific requirements, participating in the project development and taking advantage of the new framework
◦ Would imply some re-engineering of parts of the experiment applications

Need an R&D program to evaluate existing technologies and develop partial prototypes of critical parts

Some initial ideas for the project definition have been outlined


Summary