My Journey through Scientific Computing
August 28 2013, FNAL
René Brun/CERN*
Disclaimer

In this talk I present the views of somebody involved in some aspects of scientific computing, as seen from a major lab in HEP.
Having been involved in the design and implementation of many systems, my views are necessarily biased.
CV with experiments

Jun 1973: Thesis, Nuclear Physics
Jul 1973: ISR: R602 with C.Rubbia (reconstruction)
Oct 1974: SPS: NA4 with C.Rubbia (simulation/reconstruction)
Feb 1980: LEP: OPAL at LEP (simulation and framework)
1991-1993: LHC: ATLAS/CMS simulation
Sep 1994: SPS: NA49 at SPS (exile)
Sep 1996: LHC: ALICE (simulation and framework)
CV with general software
1974 : HBOOK
1975: GEANT1, Lintra, Mudifi
1977: GEANT2, HPLOT, ZBOOK
1980: GEANT3
1983: ZEBRA
1984: PAW
1995: ROOT
2012: GEANT4+5 vector prototype concepts
Machines

From mainframes to clusters, to walls of cores, to GRIDs & clouds.
Machine word sizes (bits)

16, 32, 36, 48, 56, 60, 64: PDP-11, Nord-50, BESM-6, CDC, Univac and many others.
With even more combinations of exponent/mantissa size or byte ordering.
A strong push to develop portable, machine-independent I/O systems.
User machine interface
Systems in 1980

End-user analysis software: 10 KLOC
Experiment software: 100 KLOC
Libraries (HBOOK, Naglib, cernlib): 500 KLOC
OS & Fortran: 1000 KLOC
Hardware: CDC, IBM, Vax 780; tapes; RAM 1 MB
Systems today

End-user analysis software: 0.1 MLOC
Experiment software: 4 MLOC
Frameworks like ROOT, Geant4: 5 MLOC
OS & compilers: 20 MLOC
Hardware: clusters of multi-core machines (10000 x 8 cores), GRIDs, clouds; networks 10 Gbit/s; disks 10 PB; RAM 16 GB
Tools & Libs

A timeline of packages: HBOOK, ZBOOK, HYDRA, BOS, GEANT1, GEANT2, GEANT3, ZEBRA, PAW, MINUIT, GEANT4, ROOT, GEANT4+5.
General Software in 1973

Software for bubble chambers: Thresh, Grind, Hydra
Histogram tool: SUMX from Berkeley
Simulation with EGS3 (SLAC), MCNP (Oak Ridge)
Small Fortran IV programs (1000 LOC, 50 kbytes)
Punched cards, line printers, pen plotters (GD3)
Small archive libraries (cernlib), lib.a
Software in 1974

First "Large Electronic Experiments"
Data Handling Division == Track Chambers
Well-organized software in TC with HYDRA, Thresh, Grind; anarchy elsewhere
HBOOK: from 3 routines to 100, from 3 users to many
First software group in DD
GEANT1 in 1975

Very basic framework to drive a simulation program, reading data cards with FFREAD, step actions with GUSTEP, GUNEXT, applying the magnetic field (GUFLD).
Output (Hist/Digits) was user defined
Histograms with HBOOK
About 2,000 LOC
ZBOOK in 1975

Extraction of the HBOOK memory manager into an independent package.
Creation of banks and data structures anywhere in common blocks
Machine-independent I/O, sequential and random
About 5,000 LOC
GEANT2 in 1976

Extension of GEANT1 with more physics (e-showers based on a subset of EGS, multiple scattering, decays, energy loss)
Kinematics, hits/digits data structures in ZBOOK
Used by several SPS experiments (NA3, NA4, NA10, Omega)
About 10,000 LOC
Problems with GEANT2
Very successful small framework.
However, the detector description was user-written and defined via "if" statements at tracking time.
This was becoming a hard task for large and constantly evolving detectors (the case with NA4 and C.Rubbia).
There were many attempts to describe a detector geometry via data cards (a bit like XML), but the main problem was the poor and inefficient detector description in memory.
GEANT3 in 1980

A data structure (a ZBOOK tree) describing complex geometries was introduced, then gradually the geometry routines computing distances, etc.
This was a huge step forward, implemented first in OPAL, then L3 and ALEPH.
Full electromagnetic showers (first based on EGS, then our own developments)
(HYDRA, ZBOOK), GEM -> ZEBRA

HYDRA and ZBOOK saw continuous development, both having nice and complementary features.
In 1981 the GEM project was launched; developed with no contact with experiments, it failed to deliver a working system in 1982.
In 1983 the director for computing decided to stop GEM and HYDRA and support the ZBOOK line, mainly because of the success of GEANT3, based on ZBOOK.
I decided to collaborate with J.Zoll (HYDRA) to develop ZEBRA, combining ZBOOK and HYDRA.
This decision made OPAL and L3 happy, but ALEPH decided to use BOS from DESY.
GEANT3 with ZEBRA
ZEBRA was very rapidly implemented in 1983.
We introduced ZEBRA in GEANT3 in 1984.
From 1984 to 1993 we introduced plenty of new features in GEANT3: extensions of the geometry, hadronic models with Tatina, Gheisha and Fluka, graphics tools.
In 1998, GEANT3 was interfaced with ROOT via the VMC (Virtual Monte Carlo).
GEANT3 has been used, and is still in use, by many experiments.
Graphics/UI in the 80s
CORE system (Andy Van Dam) in the US
GKS in Europe
X Window wins with X terminals and workstations
We designed HIGZ (interface to graphics and ZEBRA) with many interfaces (CORE, GKS, X, PHIGS, etc.)
KUIP for the user interface (VT100, workstations, xterms)
PAW

First minimal version in 1984
Attempted a merge with GEP (DESY) in 1985; the merge did not happen, but we took the idea of ntuples for storage and analysis. GEP was written in PL/1.
The package kept growing until 1994, with more and more functions. Column-wise ntuples in 1990.
Users liked it, mainly once the system was frozen in 1994.
Vectorization attempts
During the years 1985-1990 a big effort was invested in vectorizing GEANT3 (in collaboration with Florida State University) on the CRAY Y-MP, CYBER 205 and ETA10.
The minor gains obtained did not justify the big manpower investment. GEANT3 transport was still essentially sequential, and we had a big overhead from vector creation and gather/scatter.
However, this experience and failure was very important for us, and many of its lessons were useful for the design of GEANT5 many years later.
So far so good

The years 1980-1989 were pretty quiet (i.e. no fights). Fortran 77 was the main language and LEP our main target. The SSC simulations (GEM, SDC) were along the same lines.
The ADAMO system had been proposed as an implementation of the Entity-Relationship model, but its use remained confidential (ALEPH/ZEUS); the same for the Jazelle system from SLD.
In 1989 Tim Berners-Lee joined my group in DD to implement a system allowing access to central documentation on the IBM via RPCs (Remote Procedure Calls), but he developed something else. This coincided with a lot of controversy about future languages, data-structure management, databases, user interfaces, documentation systems, etc.
1991: Erice Workshop
This workshop was supposed to produce an agreement on the directions and languages for the next generation of experiments; instead a very confusing picture emerged.
The MOOSE project was created at CERN to investigate languages such as Eiffel, C++, Objective-C and F90, but it failed to produce anything concrete.
At the same time my group was very busy with the LEP analysis with PAW and with the SSC and LHC simulations with GEANT3 (ATLAS and CMS).
Erice workshop 1991

Where to go with data-structure tools & languages? Powerful tools, but programming with them was odd.
1992: CHEP Annecy

Web, web, web, web...
Attempts to replace/upgrade ZEBRA to support/use F90 modules and structures, but module parsing and analysis was thought to be too difficult.
With ZEBRA the bank description was within the bank itself (just a few bits). A bank was typically a few integers followed by a dynamic array of floats/doubles.
We did not realize at the time that parsing user data structures was going to be a big challenge!!
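The self-describing bank just described can be sketched in modern C++. This is only an illustration of the idea (a small integer header followed by a dynamic payload); the field names are hypothetical, not the real ZEBRA layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative sketch of a self-describing "bank": a few integers followed
// by dynamic arrays, in the spirit of the ZEBRA description above.
struct Bank {
    std::int32_t id;       // bank identifier
    std::int32_t nInts;    // number of integer payload words
    std::int32_t nFloats;  // number of float payload words
    std::vector<std::int32_t> ints;
    std::vector<float> floats;
};

// Because the bank carries its own word counts, a reader can skip or copy it
// without any external schema -- the property that made the I/O portable.
std::size_t bankWords(const Bank& b) {
    return 3 + b.ints.size() + b.floats.size();
}
```

Parsing such banks is trivial precisely because the description is "just a few bits" inside the data; parsing arbitrary user-defined C++ classes, as the slide notes, turned out to be a much harder problem.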
Parallelism in the 80s & early 90s
Many attempts (all failing) with parallel architectures
Transputers and OCCAM
MPP (CM2, CM5, ELXI,..) with OpenMP-like software
Too many GLOBAL variables/structures with Fortran common blocks.
RISC architectures or emulators were perceived as a cheaper solution in the early 90s.
Then MPPs died with the advent of the Pentium Pro (1994) and farms of PCs or workstations.
Consequences

In 1993/1994 performance was no longer the main problem.
Our field was invaded by computer scientists.
Program design, object-oriented programming, and the move to sexier languages were becoming a priority.
The "goal" was thought less important than the "how".
This situation deteriorated even more with the death of the SSC.
1993: Warning, Danger

Three "clans" in my group: 1/3 pro F90, 1/3 pro C++, 1/3 pro commercial products (any language) for graphics, user interfaces, I/O and databases.
My proposal to continue with PAW, develop ZOO (ZEBRA Object-Oriented) and rewrite the GEANT3 geometry in C++ was not accepted.
Evolution vs Revolution
1994: What next?

SSC down, LHC up. SSC refugees joining LHC development groups.
DRDC projects: Objectivity, GEANT4
Time to think, think, think and learn new things (OO, UML, C++, Eiffel, O2, ObjectStore, Objectivity, ...)
I discarded several proposals and chose exile in NA49.
Fall 94: first version of a histogram package in C++, including some I/O attempts. Now in a better position to estimate development time, C++ pros & cons, Objectivity cons.
Christmas 94: YES, let's go with ROOT
1995: Roads for ROOT

The official line was GEANT4 and Objectivity; not much room is left for success with an alternative product when you are alone.
The best tactic had to be a mixture of sociology, technicalities and very hard work.
Strong support from PAW and GEANT3 users. Strong support from HP (workstations + manpower).
In November we were ready for a first ROOT show.
Java was announced (a problem?)
1996: Technicalities
Histogram classes (3 versions) + Minuit
Hand-written Streamers
From the PAW/KUIP style to CINT
Collection classes (STL, templates, hesitations...)
ATLFAST++ (fast simulation of ATLAS based on ROOT)
A letter from the director of computing against the use of ROOT by experiments (except NA49). A problem for ALICE.
LHC++ as the official alternative to ROOT
1997: Technicalities++

ROOTCINT-generated Streamers with parsing of C++ header files, including user classes.
Many improvements and new packages
Automatic class documentation system (THTML), first manual and tutorials.
gAlice converted to AliRoot
Interest from RHIC (Phobos and Star)
CHEP97 in Berlin with focus on GEANT4, Objectivity, LHC++, JAS
1998: Work & smile

RUN II projects at FNAL: Data Analysis and Visualization; Data Formats and Storage
ROOT competing with HistoScope, JAS, LHC++
CHEP98 (September) at the Intercontinental, Chicago
ROOT selected by FNAL, followed by RHIC. A vital decision for ROOT; thanks, FermiLab!!!
But official support at CERN only came in 2002.
ROOT Trees vs Objectivity
Compared to the best OODBMS candidate in 1995 (Objectivity), ROOT supports a persistent class that may be a subset of the transient class.
ROOT supports compression (typical factors 3 to 6), file portability and access in heterogeneous networks.
ROOT supports branch splitting, which drastically increases the performance when reading.
It is based on classical system files and does not require the nightmare of a central database.
The ROOT TTreeCache allows efficient access to LAN and WAN files.
Input/Output: Major Steps
User-written streamers filling TBuffer
Streamers generated by rootcint
Automatic streamers from the dictionary, with StreamerInfos in self-describing files
Member-wise streaming for TClonesArray
Member-wise streaming for STL collections<T*>
TreeCache
Parallel merge
ROOT evolution

No time to discuss the creation/evolution of the 110 ROOT shared libs/packages.
ROOT has gradually evolved from a data storage, analysis and visualization system into a more general software environment, totally replacing what was known before as CERNLIB.
This has been possible thanks to MANY contributors from experiments, labs, and people working in other fields.
In the following few slides, I show the current big systems, each assigned to at least one developer.
Code repository and Management

From CMZ to CVS -> SVN -> GIT
Make -> cmake
Build and test infrastructure
Distribution
Forum, mails, Savannah -> JIRA
Documentation, User Guide
CINT -> CLING

CINT, originally developed by Masa Goto (HP/Taligent/Japan) in 1991, has been gradually upgraded over the years.
Work is in progress to replace CINT by CLING, based on the CLANG C++ compiler from Apple.
With CLING many C++ limitations of CINT will be eliminated, and full support for C++11 provided at the command line and in scripts via a JIT compiler.
2-D Graphics

An extremely important area that requires a lot of effort to support a large number of styles and options.
Support for graphics on screen, with different back-ends for Linux, Mac/iPad, Windows, Android, etc.
Support for many output formats (.ps, .eps, .pdf, .tex, .gif, .png, .jpg, .C, .root)
Support for interaction, picking, reflection, etc.
3D Graphics

Again many back-ends: X, OpenGL, GDK/Windows, Cocoa/Mac
Event viewing
Output drivers
User interfaces
Geometry package
Originally from GEANT3, then considerably extended.
Many interfaces: to/from GEANT3 and GEANT4
New developments (parallelism, thread safety and vectorization) in the context of the GEANT4+5 project
User Interface

UI library for X, GL, GDK, Cocoa, iPad
Pre-defined menus, pop-ups, pull-downs
Interface builder
C++ automatic code generator
Qt interface
Maths & Stats

Mathematical functions
Matrix packages
Random number generators
MINUIT, Minuit2, ...
RooStats, RooFit
TMVA, ...
PyRoot/Python

Full-time work to support an easy Python interface
Old interface based on CINT
New interface based on CLING
PROOF

Uses multi-process techniques to speed up interactive analysis.
Evolved, and again evolving, to be used via automatic interfaces to the GRID systems
PROOF-Lite is interesting for multi-core laptops.
Systems in 2030?

End-user analysis software: 1 MLOC
Experiment software: 50 MLOC
Frameworks like ROOT, Geant5: 20 MLOC
OS & compilers: 100 MLOC
Hardware: multi-level parallel machines (10000 x 1000 x 1000), GRIDs, clouds on demand; networks 10 Tbit/s; disks 1000 PB; RAM 10 TB
Parallelism: key points

Minimize the sequential/synchronization parts (Amdahl's law): very difficult.
Run the same code (processes) on all cores to optimize memory use (code and read-only data sharing).
Job-level is better than event-level parallelism for offline systems.
Use the good old principle of data locality to minimize cache misses.
Exploit the vector capabilities, but be careful with the new/delete/gather/scatter problem.
Reorganize your code to reduce tails.
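The first point above can be made concrete with Amdahl's law: if a fraction s of the work is strictly sequential, n cores can never deliver a speedup above 1/(s + (1-s)/n). A minimal sketch (the function name is mine, not from any library):

```cpp
// Amdahl's law: upper bound on the speedup with nCores cores when a
// fraction `sequentialFraction` of the work cannot be parallelized.
double amdahlSpeedup(double sequentialFraction, int nCores) {
    return 1.0 / (sequentialFraction + (1.0 - sequentialFraction) / nCores);
}
```

Even a 5% sequential part caps a 64-core machine at a speedup of roughly 15, which is why minimizing synchronization comes first in the list.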
Data Structures & parallelism

event -> vertices -> tracks
C++ pointers are specific to a process.
Copying the structure implies a relocation of all pointers; I/O is a nightmare.
Updating the structure from a different thread implies a lock/mutex.
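One common remedy for the pointer problem just described is to replace raw pointers with indices into flat containers; the sketch below is illustrative only (the type names are mine, not from any framework):

```cpp
#include <cstddef>
#include <vector>

// Cross-references are indices into flat arrays, not raw Vertex* pointers
// that are only valid inside one process's address space.
struct Vertex { double z; };
struct Track  { int charge; std::size_t vertexIndex; };  // index, not Vertex*

struct Event {
    std::vector<Vertex> vertices;
    std::vector<Track>  tracks;
};

// Because the references are indices, copying (or writing/reading) the
// containers needs no pointer relocation and no schema of addresses.
Event copyEvent(const Event& e) { return e; }
```

The same property that makes the copy trivial also makes the structure writable to a file or shareable between processes without relocation.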
Data Structures & Locality

Sparse data structures defeat the system memory caches.
Group object elements/collections such that the storage matches the traversal processes.
For example: group the cross-sections for all processes per material, instead of all materials per process.
Tails

A killer if one has to wait for the end of col(i) before processing col(i+1). (Figure: average number of objects in memory over time.)
A better solution

A pipeline of objects, with a checkpoint/synchronization only once per N events: only one "gap" every N events.
This type of solution is required anyhow for pile-up studies.
Other requirements
Eliminate the sequential parts (like merging files) when running jobs/threads/processes in parallel. Use parallel buffer merges instead.
Use efficient tools to monitor bottlenecks: memory allocation, cache misses, too many locks, etc.
Compare the results with the most efficient sequential version, and not just with the version using one single thread.
Prove that you use an 8-core node with one job more efficiently than running 8 independent jobs (memory, CPU, I/O).
This still requires more effort. Urgent!!
Towards Parallel Software
A long way to go!!
There is no point in just making your code thread-safe. Use of parallel architectures requires a deep rethinking of the algorithms and dataflow.
Avoid committees!! Small teams instead, with well-focused projects, well-defined milestones, reference benchmarks and test suites.
One such project is GEANT4+5, launched 2 years ago. We are starting to have very nice results. It is now led by Federico Carminati.
But there is still a long way to go to adapt (or write radically new) software for the emerging parallel systems.
A long journey

I had a fantastic opportunity to work in a great environment and in contact with many, many colleagues and users.
I had the chance to follow the evolution of physics via many experiments, and to understand a bit better the experiments' expectations for simulation and analysis.
I had the freedom to propose, implement and support a list of widely used tools.
A long journey++

I continue to work at CERN as an honorary member,
following the developments of GEANT4+5 and, of course, very interested in the current developments with ROOT.
I had the opportunity to start my own physics project (top-left logo), now my main occupation, where I feel free to think & work in crazy directions.
Thanks

Many thanks to the many people met in this lab across the past decades.
Many thanks to FermiLab for the strong support in 1998, and then for a strong contribution to the development of the ROOT system.
Thanks to Pushpa Bhat for inviting me to give this talk and for suggesting the theme and the title.