LHC experimental data: from today’s Data Challenges to the promise of tomorrow

B. Panzer – CERN/IT,
F. Rademakers – CERN/EP,
P. Vande Vyvre – CERN/EP

Computing Infrastructure and Technology
Day 2
Academic Training CERN 12-16 May 2003
Bernd Panzer-Steindel CERN-IT
Outline

- tasks, requirements, boundary conditions
- component technologies
- building farms and the fabric
- into the future
Questions

Before building a computing infrastructure, some questions need to be answered:
- what are the tasks?
- what is the dataflow?
- what are the requirements?
- what are the boundary conditions?
Experiment dataflow

[Diagram: Data Acquisition and the High Level Trigger (selection, reconstruction) deliver Raw Data; Event reconstruction and Event Simulation produce Processed Data / Event Summary Data; Physics analysis and Interactive physics analysis turn these into the Physics result.]
Tasks

- Detector channel digitization
- Level 1 and Level 2 Trigger
- Event building
- High Level Trigger
- Online data processing
- Data calibration
- Data storage
- Offline data reprocessing
- Offline data analysis
- Interactive data analysis and visualization
- Simulated data production (Monte Carlo)
Dataflow Examples

[Diagram: CPU-intensive scenario for 2008. DAQ and Central Data Recording, online processing/filtering, re-processing, MC production + pileup and analysis move data between tape, disk and CPU servers, with aggregate rates of 1, 2, 5, 50 and up to 100 GB/s on the individual paths.]
Requirements and Boundaries (I)

- The HEP applications require integer processor performance and less floating-point performance: choice of processor type, benchmark reference
- A large amount of processing and storage is needed, but the optimization is for aggregate performance, not for the single tasks + the events are independent units: many components, moderate demands on the single components, coarse-grained parallelism
Requirements and Boundaries (II)

- the major boundary condition is cost: staying within the budget envelope + maximum amount of resources: commodity equipment, best price/performance values
- best price/performance ≠ cheapest! Take reliability, functionality and performance into account together == total cost of ownership
- basic infrastructure, environment: availability of space, cooling and electricity
Component technologies

- processor
- disk
- tape
- network

and packaging issues
Coupling of building blocks

Increasing level of complexity (hardware, its physical and logical coupling, and the software on top):

- CPU and disk, coupled by motherboard, backplane, bus and integrating devices (memory, power supply, controller, ...), running an operating system and drivers: a PC, storage tray, NAS server or SAN element
- PCs and storage elements, coupled by a network (Ethernet, fibre channel, Myrinet, ...) with hubs, switches and routers, managed by a batch system, load balancing, control software and Hierarchical Storage Systems: a cluster
- clusters, coupled by the wide area network and Grid middleware: a world-wide cluster
Processors

- focus on integer price/performance (SI2000)
- PC mass market: INTEL and AMD
- the price/performance optimum changes frequently between the two; weak point of AMD: heat protection, heat production
- the current CERN strategy is to use INTEL processors
Price/performance evolution

[Chart: SI2000 cost per processor (SFr/SI2000, log scale) over the months since January 2000, for dual-CPU servers from the 500 MHz PIII to the 3000 MHz PIV: a factor 5 improvement in 3 years, and a factor 4 difference because the processor is only ~25% of the box cost.]
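The chart’s headline annotation, a factor 5 improvement in 3 years, can be turned into an annual rate and a doubling time; a quick sanity check using only those two numbers:

```python
import math

# Implied growth from the chart annotation "factor 5 in 3 years".
factor, years = 5.0, 3.0
annual = factor ** (1.0 / years)                       # improvement per year
doubling_months = 12.0 * math.log(2.0) / math.log(annual)

print(f"annual improvement: x{annual:.2f}")            # ~x1.71 per year
print(f"doubling time: {doubling_months:.1f} months")  # ~15.5 months
```

That doubling time of roughly 15 months is somewhat faster than the classic 18-month Moore’s-Law figure, since price/performance combines speed gains with falling prices.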
Industry now tries to fulfill Moore’s Law
Processor packaging

- the best price/performance per node comes today with dual processors and desk-side cases; processors are only 25-30% of the box costs (mainboard, memory, power supply, case, disk)
- a typical configuration today: 2 x 2.4 GHz PIV processors, 1 GB memory, 80 GB disk, fast ethernet; this is about two ‘versions’ behind == 2.8 GHz and 3 GHz are available but don’t give a good price/performance value
- one has to add 10% of the box costs for infrastructure (racks, cabling, network, control system)
- 1U rack-mounted cases versus desk-side cases: thin units can be up to 30% more expensive (cooling and space)
SPACE

[Slide: space constraints in the computer center and the experiment control room]
Problems

- we are seeing effects of market saturation for desktops + a move toward laptops; we currently use “desktop+” machines, and it is more expensive to use server CPUs
- Moore’s Second Law: the cost of a fabrication facility increases at an even greater rate than the transistor density (which doubles every 18 months); current fabrication plants cost ~2.5 billion $ (INTEL profit in 2002: 3.2 billion $)
- heat dissipation: currently heat production increases linearly with performance; terahertz transistors (2005 onwards) should reduce leakage currents, and power-saving processors exist, BUT be careful when comparing effective performance: measures for mobile computing do not help under 100% CPU utilization in 24*7 operation
Heat production

[Chart: processor performance per Watt (SpecInt2000/Watt) versus clock frequency, for the PIII, PIV, PIV Xeon and Itanium 2 at 0.25, 0.18 and 0.13 µm processes: processor power consumption grows with frequency.]
Basic infrastructure

- Electricity and cooling: large investments are necessary, with a long planning and implementation period
- we use about 700 kW in the center today; the upgrade to 2.5 MW has started, i.e. 2.5 MW for electricity + 2.5 MW for cooling; it needs extra buildings, will take several years and will cost up to 8 million SFr
- this infrastructure evolves not linearly but in larger step functions
- it is much more complicated for the experimental areas with their space and access limitations
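To see what the 2.5 MW electricity envelope means in box counts, here is a back-of-envelope sketch; the per-box draw is an assumption for a dual-CPU server of the era, not a figure from the slide:

```python
# Rough capacity check for the planned 2.5 MW electricity envelope.
# watts_per_box is an ASSUMED average draw per server; the slide gives
# no per-box figure (cooling gets its own 2.5 MW, handled separately).
site_watts = 2.5e6
watts_per_box = 250.0
max_boxes = site_watts / watts_per_box
print(int(max_boxes))  # 10000 boxes, before any safety margin
```

This is why the slide stresses step-function upgrades: thousands of additional boxes cannot be absorbed by incremental changes to power and cooling.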
Disk storage

- density is improving every year (doubling every ~14 months)
- single-stream speed (sequential I/O) is increasing considerably (up to 100 MB/s)
- transactions per second (random I/O, access time) improve very little (a factor 2 in 4 years, from 8 ms to 4 ms)
- data rates drop considerably when moving from sequential to random I/O
- online/offline processing works with sequential streams; analysis uses random access patterns, and multiple parallel sequential streams =~ random access
- disks come in different ‘flavours’: the connection type to the host is the same hardware with different electronics (SCSI, IDE, fibre channel); quality selection criteria differ, e.g. MTBF (Mean Time Between Failures), where mass market == lower values
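The drop from sequential to random I/O follows directly from the access time: every random chunk pays a seek before it transfers. A small model (the 64 KB chunk size and the exact seek/stream values are illustrative, not measurements):

```python
def effective_mb_per_s(seek_ms: float, stream_mb_s: float, chunk_kb: float) -> float:
    """Throughput of a disk that pays one seek per chunk read."""
    seek_s = seek_ms / 1000.0
    chunk_mb = chunk_kb / 1024.0
    transfer_s = chunk_mb / stream_mb_s
    return chunk_mb / (seek_s + transfer_s)

# 8 ms seek, 50 MB/s sequential streaming, 64 KB random chunks:
print(f"{effective_mb_per_s(8.0, 50.0, 64.0):.1f} MB/s")   # ~6.8 MB/s

# Halving the seek time (the "factor 2 in 4 years") roughly doubles it:
print(f"{effective_mb_per_s(4.0, 50.0, 64.0):.1f} MB/s")   # ~11.9 MB/s
```

The same model explains why many parallel sequential streams behave like random access: the head must seek between streams for every chunk.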
Disk performance
Price/performance evolution

[Chart: disk price in SFr per GByte (log scale) over the months since January 2000, for 40 GB to 200 GB disks and for a non-mirrored disk server: a factor 6 improvement in 3 years, and a factor 2.5 difference between raw disk and disk server.]
Storage density evolution
Storage packaging

- 10-12 IDE disks attached to a RAID controller inside a modified PC with a larger housing, connected to the network with gigabit ethernet: NAS (Network Attached Storage); good experience with this approach, current practice
- alternatives:
  - SAN (Storage Area Network), based on disks directly attached to a fibre channel network
  - iSCSI: SCSI commands via IP, disk trays with an iSCSI controller attached to ethernet
- R&D and evaluations: which advantages of SAN versus NAS would justify the higher costs (factor 2-4)?
- it is not only the ‘pure’ cost per GB of storage that counts: throughput, reliability, manageability, redundancy
- for disk servers the coupling of disks, processor, memory and network defines the performance + LINUX
- PCI 120-500 MB/s, PCI-X 1-8 GB/s
Tape storage

- not a mass market, aimed at backup (write once, read never), whereas we need high throughput and reliability under constant read/write stress
- we need automated, reliable access to a large amount of data: large robotic installations; the major players are IBM and StorageTek (STK)
- improvements are slow, not comparable with processor or disk trends; current generation: 30 MB/s tape drives with 200 GB cartridges
- disk and tape storage prices are getting closer (factor 2-3 difference)
- two types of read/write technology: helical scan (“video recorder”, complicated mechanics) and linear scan (“audio recorder”, simpler, lower density); linear is preferred, after some bad experience with helical scan
Network

- commodity Ethernet 10 / 100 / 1000 / 10000 Mbit/s is sufficient in the offline world and even partly in the online world (HLT); level-1 triggers need lower latencies
- special networks, cluster interconnects: Myrinet 1, 2, 10 Gbit/s; GSN 6.4 Gbit/s; Infiniband 2.5 Gbit/s * 4 (12); storage network: fibre channel 1 Gbit/s, 2 Gbit/s
- these offer very high performance with low latency and a small processor ‘footprint’, but a small market and high prices
“Exotic” technology trends

- nanotechnology (carbon nanotubes)
- molecular computing (kilohertz plastic processors, single-molecule switches)
- biological computing (DNA computing)
- quantum computing (quantum dots, ion traps, a few qubits only)
- very interesting and fast progress in the last years, but far away from any commodity production
- less fancy: game machines (X-Box, GameCube, Playstation 2); advantage: large market (>10 billion $), cheap high-power nodes; disadvantage: little memory and networking capability; graphics cards have several times the raw power of normal CPUs but are not easy to use in our environment
Technology evolution

exponential growth rates everywhere
Building farms and the fabric
Building the Farm

- Processors: a “desktop+” node == CPU server
- CPU server + larger case + 6*2 disks == Disk server
- CPU server + Fibre Channel interface + tape drive == Tape server
Software ‘glue’

- management of the basic hardware and software: installation, configuration and monitoring system (from the European DataGrid project)
- management of the processor computing resources: batch system (LSF from Platform Computing)
- management of the storage (disk and tape): CASTOR (the CERN-developed Hierarchical Storage Management system)
Generic model of a Fabric

[Diagram: tape servers, disk servers and application servers connected by a network, with a link to the external network]
Today’s schematic network topology

[Diagram: CPU servers attach via Fast Ethernet (100 Mbit/s) and disk/tape servers via Gigabit Ethernet (1000 Mbit/s) to a backbone of multiple Gigabit Ethernet links (20 * 1000 Mbit/s), which connects to the WAN over Gigabit Ethernet.]
LCG Testbed Structure

100 CPU servers on GE, 300 on FE, 100 disk servers on GE (~50 TB), 20 tape servers on GE

[Diagram: backbone routers interconnected by 8 GB lines; 64 + 36 disk servers, 20 tape servers, 100 GE CPU servers and the 200 + 100 FE CPU servers attached via 1 GB and 3 GB lines; Gigabit Ethernet and Fast Ethernet throughout.]
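With the server and link counts above, one can estimate the worst-case oversubscription of a branch. The mapping of servers to uplinks below is an assumption read off the sketch (and “3 GB lines” is read as 3 x 1 Gbit/s Ethernet links), not documented wiring:

```python
# Worst-case oversubscription if every Fast Ethernet CPU server in one
# ASSUMED branch (100 servers at 100 Mbit/s) sends flat out over
# 3 x 1 Gbit/s uplinks to the backbone routers.
fe_servers, fe_mbit = 100, 100
uplinks, uplink_mbit = 3, 1000
ratio = (fe_servers * fe_mbit) / (uplinks * uplink_mbit)
print(f"oversubscription ~ {ratio:.1f}:1")  # ~3.3:1 if all send at once
```

Some oversubscription is deliberate: batch jobs rarely saturate their NICs simultaneously, so trading peak bandwidth for cost fits the commodity strategy.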
Computer center today

- Benchmark, performance and testbed clusters (LCG prototype resources): computing data challenges, technology challenges, online tests, EDG testbeds, preparations for the LCG-1 production system, complexity tests; 500 CPU servers, 100 disk servers, ~390000 SI2000, ~50 TB
- Main fabric cluster (Lxbatch/Lxplus resources): physics production for all experiments; requests are made in units of SI2000; 1000 CPU servers, 160 disk servers, ~950000 SI2000, ~100 TB
- 50 tape drives (30 MB/s, 200 GB cartridges); 10 silos with 6000 slots each == 12 PB capacity
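The tape figures on this slide combine into a total capacity and a drain time; a quick check using only the slide’s own numbers:

```python
# 10 silos x 6000 slots x 200 GB cartridges, read by 50 drives at 30 MB/s.
silos, slots, cartridge_gb = 10, 6000, 200
capacity_pb = silos * slots * cartridge_gb / 1e6      # GB -> PB
print(f"capacity: {capacity_pb:.0f} PB")              # 12 PB, as on the slide

drives, drive_mb_s = 50, 30
seconds = capacity_pb * 1e9 / (drives * drive_mb_s)   # PB -> MB, then / MB/s
print(f"full read-back: ~{seconds / 86400:.0f} days") # ~93 days
```

A read-back time of months, even with every drive streaming, is why the slow improvement of tape drives matters so much for reprocessing campaigns.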
General Fabric Layout

[Diagram: the main fabric cluster (2-3 hardware generations, 2-3 OS/software versions, 4 experiment environments) with its service control and management (e.g. stager, HSM, LSF master, repositories, GRID services, CA, etc.); a certification cluster (the main cluster ‘en miniature’); an R&D cluster (new architecture and hardware); a benchmark and performance cluster (current architecture and hardware); a development cluster and GRID testbeds; new software and new hardware (purchases) move through old, current and new stages.]
View of different Fabric areas

[Diagram: the fabric areas, coupled through hardware and software: infrastructure (electricity, cooling, space); network; batch system (LSF, CPU servers); storage system (AFS, CASTOR, disk servers); purchase, hardware selection and resource planning; installation, configuration + monitoring, fault tolerance; prototypes and testbeds; benchmarks, R&D and architecture; automation, operation and control; GRID services!?]
Into the future
Considerations

- the current state of performance, functionality and reliability is good, and technology developments still look promising
- more of the same for the future!?
- how can we be sure that we are following the right path?
- how do we adapt to changes?
Strategy

- continue and expand the current system, BUT do in parallel:
  - R&D activities: SAN versus NAS, iSCSI, IA64 processors, ...
  - technology evaluations: Infiniband clusters, new filesystem technologies, ...
  - Data Challenges to test scalability on larger scales: “bring the system to its limit and beyond”; we are already very successful with this approach, especially with the “beyond” part (Friday’s talk)
  - watch the market trends carefully
CERN computer center 2008

- hierarchical Ethernet network, tree topology (280 GB/s)
- ~8000 mirrored disks (4 PB)
- ~3000 dual-CPU nodes (20 million SI2000)
- ~170 tape drives (4 GB/s)
- ~25 PB tape storage
- the CMS High Level Trigger alone will consist of about 1000 nodes with 10 million SI2000!!
- all numbers hold only IF the exponential growth rates continue!
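The 2008 CPU figure is consistent with the growth rates quoted earlier; a check using the deck’s own numbers (today’s ~1.34 million SI2000 from the “Computer center today” slide, 20 million SI2000 in 2008):

```python
import math

# Growth implied by the deck's numbers: ~1.34 MSI2000 installed in 2003
# (950k main fabric + 390k prototype) -> 20 MSI2000 in 2008.
si2000_2003 = 0.95e6 + 0.39e6
si2000_2008 = 20e6
years = 5
annual = (si2000_2008 / si2000_2003) ** (1.0 / years)
doubling_months = 12.0 * math.log(2.0) / math.log(annual)
print(f"x{annual:.2f} per year, doubling every {doubling_months:.0f} months")
```

The implied doubling time of about 15 months matches the processor price/performance trend (factor 5 in 3 years) seen earlier, which is exactly the point of the “IF exponential growth continues” caveat.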
Tomorrow’s schematic network topology

[Diagram: CPU, disk and tape servers attach via 10 Gigabit Ethernet (10000 Mbit/s) and Gigabit Ethernet (1000 Mbit/s) to a backbone of multiple 10 Gigabit Ethernet links (200 * 10000 Mbit/s), which connects to the WAN over 10 Gigabit Ethernet.]
Summary

- quite confident in the technological evolution
- quite confident in the current architecture
- LHC computing is not a question of pure technology
- efficient coupling of components, hardware + software
- commodity is a must for cost efficiency
- boundary conditions are important
- market developments can have large effects
Tomorrow

• Day 1 (Pierre VANDE VYVRE): Outline, main concepts; Requirements of LHC experiments; Data Challenges
• Day 2 (Bernd PANZER): Computing infrastructure; Technology trends
• Day 3 (Pierre VANDE VYVRE): Data acquisition
• Day 4 (Fons RADEMAKERS): Simulation, Reconstruction and analysis
• Day 5 (Bernd PANZER): Computing Data Challenges; Physics Data Challenges; Evolution