My Journey through Scientific Computing


My Journey through Scientific Computing

August 28 2013, FNAL

René Brun/CERN*


Disclaimer

In this talk I present the views of somebody involved in some aspects of scientific computing, as seen from a major lab in HEP.

Having been involved in the design and implementation of many systems, my views are necessarily biased.


CV with experiments

Jun 1973: Thesis Nuclear Physics

Jul 1973: ISR: R602 with C.Rubbia (reconstruction)

Oct 1974: SPS: NA4 with C.Rubbia (simulation/recons)

Feb 1980: LEP: OPAL at LEP (simulation and framework)

1991-1993: LHC: ATLAS/CMS simulation

Sep 1994: SPS: NA49 at SPS (exile)

Sep 1996: LHC: ALICE (simulation and framework)


CV with general software

1974: HBOOK

1975: GEANT1, Lintra, Mudifi

1977: GEANT2, HPLOT, ZBOOK

1980: GEANT3

1983: ZEBRA

1984: PAW

1995: ROOT

2012: GEANT4+5 vector prototype concepts


Machines

From mainframes to clusters, walls of cores, GRIDs & clouds

Machine Units (bits)

Word sizes of 16, 32, 36, 48, 56, 60 and 64 bits: PDP-11 (16), Nord-50 (32), Univac (36), BESM-6 (48), CDC (60), and many machines at 32 and 64.

With even more combinations of exponent/mantissa size or byte ordering.

A strong push to develop portable, machine-independent I/O systems.

User machine interface


Systems in 1980

OS & Fortran (CDC, IBM, Vax 780): 1000 KLOC

Libraries (HBOOK, NAGLIB, CERNLIB): 500 KLOC

Experiment software: 100 KLOC

End-user analysis software: 10 KLOC

Hardware: tapes; RAM 1 MB

Systems today

OS & compilers: 20 MLOC

Frameworks like ROOT, Geant4: 5 MLOC

Experiment software: 4 MLOC

End-user analysis software: 0.1 MLOC

Hardware: clusters of multi-core machines (10000 x 8), GRIDs, clouds

Networks 10 Gbit/s; disks 10 PB; RAM 16 GB

Tools & Libs

(timeline figure) zbook, hbook, hydra, bos, zebra, paw, minuit, geant1, geant2, geant3, geant4, root, Geant4+5

General Software in 1973

Software for bubble chambers: Thresh, Grind, Hydra

Histogram tool: SUMX from Berkeley

Simulation with EGS3 (SLAC), MCNP (Oak Ridge)

Small Fortran IV programs (1000 LOC, 50 kbytes)

Punched cards, line printers, pen plotters (GD3)

Small archive libraries (cernlib), lib.a

Software in 1974: First “Large Electronic Experiments”

Data Handling Division == Track Chambers

Well organized software in TC with HYDRA, Thresh, Grind; anarchy elsewhere

HBOOK: from 3 routines to 100, from 3 users to many

First software group in DD

GEANT1 in 1975

Very basic framework to drive a simulation program: reading data cards with FFREAD, step actions with GUSTEP and GUNEXT, applying the magnetic field (GUFLD).

Output (Hist/Digits) was user defined

Histograms with HBOOK

About 2,000 LOC


ZBOOK in 1975

Extraction of the HBOOK memory manager into an independent package.

Creation of banks and data structures anywhere in common blocks

Machine independent I/O, sequential and random

About 5,000 LOC


GEANT2 in 1976

Extension of GEANT1 with more physics (e-showers based on a subset of EGS, multiple scattering, decays, energy loss).

Kinematics, hits/digits data structures in ZBOOK

Used by several SPS experiments (NA3, NA4, NA10, Omega)

About 10,000 LOC


Problems with GEANT2

Very successful small framework.

However, the detector description was user written and defined via “if” statements at tracking time.

This was becoming a hard task for large and always-evolving detectors (the case with NA4 and C.Rubbia).

Many attempts to describe a detector geometry via data cards (a bit like XML), but the main problem was the poor and inefficient detector description in memory.


GEANT3 in 1980

A data structure (a ZBOOK tree) describing complex geometries was introduced, then gradually the geometry routines computing distances, etc.

This was a huge step forward implemented first in OPAL, then L3 and ALEPH.

Full electromagnetic showers (first based on EGS, then own developments)


(HYDRA, ZBOOK, GEM) -> ZEBRA

HYDRA and ZBOOK continuous developments, both having nice and complementary features.

In 1981 the GEM project was launched; developed with no contact with experiments, it failed to deliver a working system in 1982.

In 1983 the director for computing decided to stop GEM and HYDRA and support the ZBOOK line, mainly because of the success of GEANT3 based on ZBOOK.

I decided to collaborate with J.Zoll (HYDRA) to develop ZEBRA, combining ZBOOK and HYDRA.

This decision made OPAL and L3 happy, but ALEPH decided to use BOS from DESY.


GEANT3 with ZEBRA

ZEBRA was very rapidly implemented in 1983.

We introduced ZEBRA in GEANT3 in 1984.

From 1984 to 1993 we introduced plenty of new features in GEANT3: extensions of the geometry, hadronic models with Tatina, Gheisha and Fluka, Graphics tools.

In 1998, GEANT3 was interfaced with ROOT via the VMC (Virtual Monte Carlo).

GEANT3 has been used, and is still in use, by many experiments.


Graphics/UI in the 80s

CORE system (Andy Van Dam) in the US

GKS in Europe

Xwindow wins with Xterms and workstations

We designed HIGZ (interface to graphics and ZEBRA) with many interfaces (CORE, GKS, X, PHIGS, etc.)

KUIP for the user interface (VT100, workstations, xterms)


PAW

First minimal version in 1984.

Attempt to merge with GEP (DESY) in 1985, but we took the idea of ntuples for storage and analysis. GEP was written in PL/1.

The package grew until 1994 with more and more functions. Column-wise ntuples in 1990.

Users liked it, mainly once the system was frozen in 1994.


Vectorization attempts

During the years 1985-1990 a big effort was invested in vectorizing GEANT3 (work in collaboration with Florida State University) on the CRAY Y-MP, CYBER 205 and ETA10.

The minor gains obtained did not justify the big manpower investment. GEANT3 transport was still essentially sequential and we had a big overhead with vector creation and gather/scatter.

However, this experience and failure was very important for us, and carried many messages useful for the design of GEANT5 many years later.


So far so good

The years 1980-1989 were pretty quiet (i.e. no fights). Fortran 77 was the main language and LEP our main target. The SSC simulations (GEM, SDC) were along the same lines.

The ADAMO system had been proposed as an implementation of the Entity Relationship Model, but its use remained limited (ALEPH/ZEUS); the same for the Jazelle system from SLD.

In 1989 Tim Berners-Lee joined my group in DD to implement a system allowing access to central documentation on the IBM via RPCs (Remote Procedure Calls), but he developed something else. This coincided with a lot of controversy about future languages, data-structure management, databases, user interfaces, documentation systems, etc.


1991: Erice Workshop

This workshop was supposed to produce an agreement on the directions and languages for the next generation of experiments; instead a very confusing picture emerged.

The MOOSE project was created at CERN to investigate languages such as Eiffel, C++, Objective-C and F90, but it failed to produce anything concrete.

At the same time my group was very busy with the LEP analysis with PAW and the SSC and LHC simulations with GEANT3 (ATLAS and CMS).


Erice workshop 1991

(figure) Where to go with data-structure tools & languages? Powerful tools, but programming with them was odd.


1992: CHEP Annecy

Web, web, web, web…………

Attempts to replace/upgrade ZEBRA to support/use F90 modules and structures, but parsing and analysing the modules was thought to be too difficult.

With ZEBRA the bank description was within the bank itself (just a few bits). A bank was typically a few integers followed by a dynamic array of floats/doubles.

We did not realize at the time that parsing user data structures was going to be a big challenge!!


Parallelism in the 80s & early 90s

Many attempts (all failing) with parallel architectures

Transputers and OCCAM

MPP (CM2, CM5, ELXI, ...) with OpenMP-like software

Too many GLOBAL variables/structures with Fortran common blocks.

RISC architectures or emulators perceived as a cheaper solution in the early 90s.

Then MPPs died with the advent of the Pentium Pro (1994) and farms of PCs or workstations.


Consequences

In 1993/1994 performance was no longer the main problem.

Our field was invaded by computer scientists.

Program design, object-oriented programming and a move to sexier languages were becoming a priority.

The “goal” was thought less important than the “how”

This situation deteriorated even more with the death of the SSC.


1993: Warning Danger

3 “clans” in my group: 1/3 pro F90, 1/3 pro C++, 1/3 pro commercial products (any language) for graphics, user interfaces, I/O and databases.

My proposal to continue with PAW and to develop ZOO (ZEBRA Object-Oriented) and the GEANT3 geometry in C++ was not accepted.

Evolution vs Revolution


1994: What next?

SSC down, LHC up. SSC refugees joining LHC development groups.

DRDC projects: Objectivity, GEANT4

Time to think, think, think and learn new things (OO, UML, C++, Eiffel, O2, ObjectStore, Objectivity, ...)

Discarded several proposals and chose exile to NA49.

Fall 1994: first version of the histogram package in C++, including some I/O attempts. Now in a better position to estimate development time, C++ pros & cons, Objectivity cons.

Christmas 1994: YES, let's go with ROOT.


1995: roads for ROOT

The official line was with GEANT4 and Objectivity, not much room left for success with an alternative product when you are alone.

The best tactic had to be a mixture of sociology, technicalities and very hard work.

Strong support from PAW and GEANT3 users.

Strong support from HP (workstations + manpower).

In November we were ready for a first ROOT show

Java is announced (problem?)


1996: Technicalities

Histogram classes (3 versions) + Minuit

Hand written Streamers

From PAW/KUIP style to CINT

Collection classes (STL, templates, hesitations...)

ATLFAST++ (fast simulation of ATLAS based on ROOT)

Letter from the director of computing against the use of ROOT by experiments (except NA49). Problem for ALICE.

LHC++: the official alternative to ROOT


1997: Technicalities++

ROOTCINT-generated streamers with parsing of C++ header files, including user classes.

Many improvements and new packages

Automatic class documentation system (THTML), first manual and tutorials.

gAlice converted to AliRoot

Interest from RHIC (Phobos and Star)

CHEP97 in Berlin with focus on GEANT4, Objectivity, LHC++, JAS


1998: work & smile

Run II projects at FNAL: Data Analysis and Visualization; Data Formats and Storage.

ROOT competing with HistoScope, JAS, LHC++

CHEP98 (September) Intercontinental Chicago

ROOT selected by FNAL, followed by RHIC. A vital decision for ROOT; thanks, FermiLab!!!

But official support at CERN came only in 2002.


ROOT Trees vs Objectivity

Compared to the best OODBMS candidate in 1995 (Objectivity) ROOT supports a persistent class that may be a subset of the transient class.

ROOT supports compression (typical factors 3 to 6), file portability and access in heterogeneous networks

ROOT supports branch splitting, which drastically increases performance when reading.

It is based on classical system files and does not require the nightmare of a central database.

ROOT TTreeCache allows efficient access to LAN and WAN files.


Input/Output: Major Steps

parallel merge

TreeCache

member-wise streaming for STL collections<T*>

member-wise streaming for TClonesArray

automatic streamers from dictionary with StreamerInfos in self-describing files

streamers generated by rootcint

user-written streamers filling TBuffer


ROOT evolution

No time to discuss the creation/evolution of the 110 ROOT shared libs/packages.

ROOT has gradually evolved from a data storage, analysis and visualization system into a more general software environment, totally replacing what was known before as CERNLIB.

This has been possible thanks to MANY contributors from experiments, labs, or people working in other fields.

In the following few slides, I show the current big systems assigned to at least one developer.


Code repository and Management

From CMZ to CVS, SVN, GIT

From Make to CMake

Build and test infrastructure

Distribution

Forum, mails, Savannah -> JIRA

Documentation, User Guide


CINT -> CLING

CINT, originally developed by Masa Goto (HP/Taligent/Japan) in 1991, has been gradually upgraded over the years.

Work in progress to replace CINT by CLING based on the CLANG C++ compiler from Apple.

With CLING many C++ limitations of CINT will be eliminated, and full support for C++11 provided at the command line and in scripts via a JIT compiler.


2-D Graphics

Extremely important area that requires a lot of effort to support a large number of styles and options.

Support for graphics on screen with different back-ends for Linux, Mac/iPad, Windows, Android, etc.

Support for many output formats (.ps, .eps, .pdf, .tex, .gif, .png, .jpg, .C, .root)

Support for interaction, picking, reflection, etc.


3D Graphics

Again many back-ends: X, OpenGL, GDK/Windows, Cocoa/Mac.

Event viewing

Output drivers

User Interfaces


Geometry package

Originally from GEANT3, then considerable developments.

Many interfaces: to/from GEANT3 and GEANT4

New developments (parallelism, thread safety and vectorization) in the context of the GEANT4+5 project


User Interface

UI library for X, GL, GDK, Cocoa, iPad.

Pre-defined menu, pop-up, pull-down

Interface builder

C++ automatic code generator

Qt interface


Maths & Stats

Mathematical functions

Matrix packages

Random number generators

MINUIT, Minuit2,..

RooStats, RooFit

TMVA,…


PyROOT/Python

Full-time work to support an easy Python interface.

Old interface based on CINT

New interface based on CLING


PROOF

Use multi-process techniques to speed up interactive analysis.

Evolved, and evolving again, to be used via automatic interfaces to the GRID systems.

PROOF-Lite is interesting for multi-core laptops.


Systems in 2030 ?

OS & compilers: 100 MLOC

Frameworks like ROOT, Geant5: 20 MLOC

Experiment software: 50 MLOC

End-user analysis software: 1 MLOC

Hardware: multi-level parallel machines (10000 x 1000 x 1000), GRIDs, clouds on demand

Networks 100 Gbit/s to 10 Tbit/s; disks 1000 PB; RAM 10 TB

Parallelism: key points


Minimize the sequential/synchronization parts (Amdahl's law): very difficult.

Run the same code (processes) on all cores to optimize memory use (code and read-only data sharing).

Job-level is better than event-level parallelism for offline systems.

Use the good old principle of data locality to minimize cache misses.

Exploit the vector capabilities, but be careful with the new/delete/gather/scatter problem.

Reorganize your code to reduce tails

Data Structures & parallelism

(figure: an event data structure with vertices and tracks linked by C++ pointers specific to a process)

Copying the structure implies a relocation of all pointers.

I/O is a nightmare.

Updating the structure from a different thread implies a lock/mutex.


Data Structures & Locality

Sparse data structures defeat the system memory caches.

Group object elements/collections such that the storage matches the traversal processes. For example: group the cross-sections for all processes per material, instead of all materials per process.

Tails

A killer if one has to wait for the end of col(i) before processing col(i+1). (figure: average number of objects in memory)

A better solution


Pipeline of objects with checkpoint synchronization: only one “gap” every N events.

This type of solution is required anyhow for pile-up studies.


Other requirements

Eliminate the sequential part (like merging files) when running jobs/threads/processes in parallel. Use parallel buffer merges instead.

Use efficient tools to monitor bottlenecks like memory allocation, cache misses, too many locks, etc

Compare the results with the most efficient sequential version, and not just with the version using one single thread.

Prove that you use an 8-core node with one job more efficiently than running 8 independent jobs (memory, CPU, I/O).

This still requires more effort. Urgent!!


Towards Parallel Software

A long way to go!!

There is no point in just making your code thread-safe. Use of parallel architectures requires a deep rethinking of the algorithms and dataflow.

Avoid committees!! Small teams instead, with well-focused projects, well-defined milestones, reference benchmarks and test suites.

One such project is GEANT4+5, launched 2 years ago. We are starting to have very nice results. It is now led by Federico Carminati.

But still a long way to go to adapt (or write radically new software) for the emerging parallel systems.


A long journey

I had a fantastic opportunity to work in a great environment and in contact with many, many colleagues and users.

I had the chance to follow the evolution of physics via many experiments and to understand a bit better the experiments' expectations for simulation and analysis.

I had the freedom to propose, implement and support a list of widely used tools


A long journey++

I continue to work at CERN as an honorary member.

I am following the developments of GEANT4+5 and, of course, am very interested in the current developments with ROOT.

I had the opportunity to start my own physics project (top left logo), my main occupation, where I feel free to think & work in crazy directions.


Thanks

Many thanks to the many people met in this lab across the past decades.

Many thanks to FermiLab for the strong support in 1998, and then for a strong contribution to the development of the ROOT system.

Thanks to Pushpa Bhat for inviting me to give this talk and suggesting the theme and the title.
