CERN
1999 Summer Student Lectures
Computing at CERN
Lecture 2 — Looking at Data
Tony Cass — [email protected]
Data and Computation for Physics Analysis

[Diagram: raw data flows from the detector through the event filter (selection & reconstruction) into event reconstruction, which produces event summary data (processed data); event simulation feeds the same reconstruction chain; batch physics analysis turns event summary data into analysis objects (extracted by physics topic), which are then used by interactive physics analysis.]
Central Data Recording

CDR marks the boundary between the experiment and the central computing facilities. It is a loose boundary which depends on an experiment's approach to data collection and analysis. CDR developments are also affected by
– network developments, and
– event complexity.

[Diagram: raw data flows from the detector through the event filter (selection & reconstruction) to central recording.]
Monte Carlo Simulation

From a physics standpoint, simulation is needed to study
– detector response,
– signal vs. background, and
– sensitivity to physics parameter variations.
From a computing standpoint, simulation
– is CPU intensive, but
– has low I/O requirements.
Simulation farms are therefore good testbeds for new technology:
– CSF for Unix and now PCSF for PCs and Windows/NT.
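The CPU-heavy, low-I/O character of simulation can be illustrated with a toy Monte Carlo. This is a hypothetical sketch, not any actual CERN simulation code: many random tracks are generated (all the work is CPU), but the only output is a single number.

```python
import random

def simulate_acceptance(n_events, cos_theta_cut=0.9, seed=42):
    """Toy Monte Carlo: generate isotropically distributed tracks and
    count how many fall inside a hypothetical barrel acceptance
    |cos(theta)| < cos_theta_cut.  Lots of CPU per event, but the
    only output is one number (low I/O)."""
    rng = random.Random(seed)
    accepted = 0
    for _ in range(n_events):
        cos_theta = rng.uniform(-1.0, 1.0)  # isotropic in cos(theta)
        if abs(cos_theta) < cos_theta_cut:
            accepted += 1
    return accepted / n_events
```

For a |cos θ| < 0.9 cut the true acceptance is 0.9, so the estimate converges there as n_events grows; the fixed seed makes a given run reproducible.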
Data Reconstruction

The event reconstruction stage turns detector information into physics information about events. This involves
– complex processing
» i.e. lots of CPU capacity,
– reading all raw data
» i.e. lots of input, possibly read from tape, and
– writing processed events
» i.e. lots of output which must be written to permanent storage.
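A minimal sketch of the reconstruction pass (the event fields here are illustrative, not taken from any real experiment's data model): every raw event is read, processed, and written out as a compact event summary.

```python
def reconstruct_event(raw_event):
    """Hypothetical reconstruction: turn raw detector hits into a
    compact event summary (here just a hit count and total energy)."""
    hits = raw_event["hits"]
    return {
        "n_hits": len(hits),
        "energy": sum(h["e"] for h in hits),
    }

def reconstruction_pass(raw_events):
    """CPU-heavy loop over all raw data, producing event summary
    data that is written to permanent storage."""
    return [reconstruct_event(ev) for ev in raw_events]
```

In production this loop is where both the CPU cost (per-event processing) and the I/O cost (reading every raw event, writing every summary) are paid.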
Batch Physics Analysis

Physics analysis teams scan over all events to find those that are interesting to them.
– Potentially enormous input
» at least the data from the current year.
– CPU requirements are high.
– Output is "small"
» O(10²) MB
– but there are many different teams and the output must be stored for future studies
» large disk pools needed.
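The scan itself is conceptually simple; the cost is in the volume. A sketch, with a hypothetical selection function standing in for a team's physics cuts:

```python
def scan_events(event_summaries, select):
    """Batch analysis pass: iterate over every event summary in the
    input (potentially a full year of data) and keep only the events
    passing this team's selection; the output is far smaller."""
    return [ev for ev in event_summaries if select(ev)]

# Example: a team keeps only high-energy events.
events = [{"energy": e} for e in (1.0, 5.0, 12.0, 0.3)]
interesting = scan_events(events, lambda ev: ev["energy"] > 4.0)
```

Each team runs its own selection over the same enormous input, which is why many such passes together dominate the I/O budget.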
Symmetric MultiProcessor Model

[Diagram: a single large SMP machine connecting the experiment, tape storage, and TeraBytes of disks.]
Scalable Model — SP2/CS2

[Diagram: a scalable parallel machine (SP2/CS2) connecting the experiment, tape storage, and TeraBytes of disks.]
Distributed Computing Model

[Diagram: the experiment, tape storage, disk servers, and CPU servers interconnected by a network switch.]
Today's CORE Computing Systems (1998!)

[Diagram of the CORE Physics Services attached to the CERN Network:]
– Home directories & registry: 32 IBM, DEC, SUN servers
– Central Data Services:
» Shared Disk Servers: 2 TeraByte disk, 10 SGI, DEC, IBM servers
» Shared Tape Servers: 3 tape robots, 100 tape drives (Redwood, DLT, Sony D1, IBM 3590, 3490, 3480, EXABYTE, DAT)
– SHIFT data intensive services: 70 computers, 250 processors (DEC, H-P, IBM, SGI, SUN), 8 TeraBytes embedded disk
– CS-2 service (komei), data recording & event filter farm: QSW CS-2, 64 nodes (128 processors), 2 TeraBytes disk
– Simulation Facility, CSF (RISC servers): 46 H-P PA-RISC
– RSBATCH + PaRC public batch service: 15-node IBM SP2, 36 PowerPC 604
– Interactive services (DECPLUS, HPPLUS, RSPLUS, WGS): 66 systems (HP, SUN, IBM, DEC)
– NAP accelerator simulation service: 10-CPU DEC 8400, 10 DEC workstations
– PCSF (PCs & NT): 20 Pentium Pro, 50 Pentium II
– SUN & DEC servers; consoles & monitors
Today's CORE Computing Systems

[Diagram of the CORE Physics Services attached to the CERN Network:]
– Home directories & registry: 32 IBM, DEC, SUN servers
– Central Data Services:
» Shared Disk Servers: 2 TeraByte disk, 10 SGI, DEC, IBM servers
» Shared Tape Servers: 4 tape robots, 90 tape drives (Redwood, 9840, DLT, IBM 3590, 3490, 3480, EXABYTE, DAT, Sony D1)
– SHIFT data intensive services: 200 computers, 550 processors (DEC, H-P, IBM, SGI, SUN, PC), 25 TeraBytes embedded disk
– Data recording, event filter and CPU farms for NA45, NA48, COMPASS
– PC Farms: 60 dual-processor PCs
– Simulation Facility, CSF (RISC servers): 25 H-P PA-RISC
– RSBATCH public batch service: 32 PowerPC 604
– PaRC engineering cluster: 13 DEC workstations, 3 IBM workstations
– Interactive services (DXPLUS, HPPLUS, RSPLUS, LXPLUS, WGS): 70 systems (HP, SUN, IBM, DEC, Linux)
– NAP accelerator simulation service: 10-CPU DEC 8400, 10 DEC workstations
– PCSF (PCs & NT): 10 Pentium Pro, 25 Pentium II
– consoles & monitors
Interactive Physics Analysis

Interactive systems are needed to enable physicists to develop and test programs before running lengthy batch jobs.
– Physicists also
» visualise event data and histograms,
» prepare papers, and
» send Email.
Most physicists use workstations, either private systems or central systems accessed via an X terminal or PC.
We need an environment that provides access to specialist physics facilities as well as to general interactive services.
Unix-based Interactive Architecture

[Diagram: X terminals, PCs and private workstations connect over the CERN internal network to work group server clusters (PLUS clusters), which have optimized access to CORE services and the general staged data pool. Supporting services include backup & archive, reference environments, central services (mail, news, ccdb, etc.), ASIS (replicated AFS binary servers), AFS home directory services, and X-terminal support.]
PC-based Interactive Architecture

[Diagram only; no text content survives in the transcript.]
Event Displays

Event displays, such as this ALEPH display, help physicists to understand what is happening in a detector. A Web-based event display, WIRED, was developed for DELPHI and is now used elsewhere.
Clever processing of events can also highlight certain features, such as in the V-plot views of ALEPH TPC data.

[Figures: standard X-Y view; V-plot view.]
Data Analysis Work

By selecting a dE/dx vs. p region on this scatter plot, a physicist can choose tracks created by a particular type of particle.
Most of the time, though, physicists will study event distributions rather than individual events.
RICH detectors provide better particle identification, however. This plot shows that the LHCb RICH detectors can distinguish pions from kaons efficiently over a wide momentum range.
Using RICH information greatly improves the signal/noise ratio in invariant mass plots.
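Such a graphical region selection amounts to a cut in the dE/dx vs. p plane. A sketch using a simple rectangular region (the field names are illustrative, not from any real analysis framework):

```python
def select_tracks(tracks, p_range, dedx_range):
    """Keep only tracks inside a rectangular cut in the dE/dx vs.
    momentum plane, mimicking a graphical region selection drawn
    on a scatter plot."""
    p_lo, p_hi = p_range
    d_lo, d_hi = dedx_range
    return [t for t in tracks
            if p_lo <= t["p"] <= p_hi and d_lo <= t["dedx"] <= d_hi]
```

In a real analysis the selected region would typically follow the curved band a particle species traces out in dE/dx vs. p, rather than a rectangle.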
CERN's Network Connections

[Diagram: CERN's external links to RENATER, C-IXP, IN2P3, TEN-155, C&W (US), ATM test beds, SWITCH and WHO, with bandwidths ranging from 2 Mb/s to 155 Mb/s (e.g. 39/155 Mb/s, 100 Mb/s, 12/20 Mb/s, 6 Mb/s). Links are categorised as national research networks, mission-oriented links, public, test, and commercial.]
TEN-155: Trans-European Network at 155 Mb/s.
CERN's Network Traffic, May–June 1999

[Diagram: incoming and outgoing data rates on each external link (C&W (US), RENATER, TEN-155, IN2P3, SWITCH). CERN totals: 4.5 Mb/s out, 3.7 Mb/s in; link bandwidths range from 2 to 100 Mb/s, with measured per-link rates between 0.1 and 2.5 Mb/s.]
~1 TB/month in each direction.
1 TB/month = 3.86 Mb/s; 1 Mb/s = 10 GB/day.
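The rate/volume conversions quoted above are straightforward arithmetic. A minimal sketch, assuming decimal units (1 Mb = 10^6 bits, 1 GB = 10^9 bytes) and a 30-day month; the exact rounded figures depend on which month length and unit conventions are assumed:

```python
def mbps_to_gb_per_day(mbps):
    """Convert a sustained rate in Mb/s to GB transferred per day
    (decimal units: 1 Mb = 10**6 bits, 1 GB = 10**9 bytes)."""
    bits_per_day = mbps * 1e6 * 86400  # seconds per day
    return bits_per_day / 8 / 1e9

def tb_per_month_to_mbps(tb, days=30):
    """Convert a monthly volume in TB to the average rate in Mb/s."""
    bits = tb * 1e12 * 8
    return bits / (days * 86400) / 1e6
```

Under these assumptions 1 Mb/s is about 10.8 GB/day and 1 TB/month averages about 3.1 Mb/s.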
Outgoing Traffic by Protocol, May 31st–June 6th 1999

[Bar chart: GigaBytes transferred per protocol (ftp, www, X, afs, int, rfio, mail, news, other, Total) on a 0–350 GB scale, broken down by destination: Europe, USA, Elsewhere.]
Incoming Traffic by Protocol, May 31st–June 6th 1999

[Bar chart: GigaBytes transferred per protocol (ftp, www, X, afs, int, rfio, mail, news, other, Total) on a 0–350 GB scale, broken down by origin: Europe, USA, Elsewhere.]
European & US Traffic Growth, Feb '97 – Jun '98 (1998!)

[Chart: traffic growth for USA and EU; the start of the TEN-34 connection is marked.]
European & US Traffic Growth, Feb '98 – Jun '99

[Chart: traffic growth for USA and EU.]
Traffic Growth, Jun '98 – May/Jun '99

[Bar charts of traffic growth by protocol (ftp, www, X, afs, int, rfio, mail, news, other, Total) for total, outgoing and incoming traffic, split into Total EU and Other US panels; the vertical scales run from 0 to roughly 4 and 8.]
Round Trip Times and Packet Loss Rates

[Plots, 1998 figures: round trip times for packets to SLAC, peaking at 5 seconds(!), and packet loss rates to/from the US on the CERN link.]
[But traffic to, e.g., SLAC passes over other links in the US and these may also lose packets.]
[This is measured with ping; a packet must arrive and be echoed back. If it is lost, it does not give a Round Trip Time value.]
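Ping-based measurement works exactly as the note says: a lost probe simply produces no RTT sample. A sketch of how loss rate and average RTT are derived from a run (the numbers below are made up):

```python
def ping_statistics(rtts_ms):
    """Summarise a ping run.  rtts_ms holds one entry per probe:
    the round trip time in ms, or None for a lost packet (a lost
    packet yields no RTT value)."""
    received = [r for r in rtts_ms if r is not None]
    loss_rate = 1 - len(received) / len(rtts_ms)
    avg_rtt = sum(received) / len(received) if received else None
    return {"loss_rate": loss_rate, "avg_rtt_ms": avg_rtt}
```

Note the bias this implies: the average RTT is computed only over packets that survived, so heavy loss can make the measured RTT look better than the path actually is.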
Looking at Data – Summary

Physics experiments generate data!
– and physicists need to simulate real data to model physics processes and to understand their detectors.
Physics data must be processed, stored and manipulated. [Central] computing facilities for physicists must be designed to take into account the needs of the data processing stages
– from generation through reconstruction to analysis.
Physicists also need to
– communicate with outside laboratories and institutes, and to
– have access to general interactive services.