Page 1: DAQ System Realization

DAQ Data Flow Review, Sep. 11-12th, 2001
Niko Neufeld, CERN EP

Page 2: Overview

• FEM/RU complex: how many?

• Readout network: how big? Which components?

• Level 2/3 farm & sub-farm controllers: how fast? How many?

• Conclusions

Page 3: Acronyms

• Readout Unit – RU

• Readout Network – RN

• Level 1 – L1

• Front-end Multiplexer – FEM

• Gigabit Ethernet – GigE

• More nonsense acronyms – MNSA

• Sub-farm Controller – SFC

• Network Processor – NP

• SpecInt95 (SI95) – benchmark obtained from a standard test suite of applications, normalised to the performance of a SPARCstation 10/40 (40 MHz SuperSPARC); this reference machine takes 48 h to run the suite

Page 4: DAQ Architecture

[Diagram: LHCb DAQ architecture. The LHCb detector (VELO, TRACK, ECAL, HCAL, MUON, RICH) feeds the Level-0 front-end electronics at 40 MHz (40 TB/s). The Level-0 trigger reduces the rate to 1 MHz (1 TB/s) at a fixed latency of 4.0 µs; the Level-1 trigger reduces it further to 40-100 kHz at a variable latency of <2 ms. Front-end links carry the data through the Front-End Multiplexers (FEM) and Readout Units (RU) into the Readout Network (RN) at 6-15 GB/s, then to the Sub-Farm Controllers (SFC) and the Level 2 & 3 event-filter CPU farm (L2 ~10 ms, L3 ~200 ms, variable latency), and finally to storage at 50 MB/s. Timing & Fast Control provides the central throttle; Control & Monitoring attaches via the LAN.]

Page 5: Basic Parameters & Features: Recap

• L1 trigger rate: 40 kHz, upgradeable to 100 kHz

• Total raw data rate from all L1 boards: approximately 4 GB/s

• Asynchronous

• Push-through

• Strict separation between control and data paths

• Overflow avoidance via central throttling
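The last two bullets describe a flow-control philosophy rather than an implementation. As a minimal sketch of the idea (the thresholds and the polling model are invented for illustration; the real Timing & Fast Control throttle is implemented in hardware), a central controller stops L1 accepts while any downstream buffer is close to overflowing:

```python
# Minimal sketch of overflow avoidance via central throttling.
# Thresholds and the polling model are illustrative assumptions only.

HIGH_WATER = 0.80  # assert throttle above 80% buffer occupancy (assumed)
LOW_WATER = 0.60   # release below 60% occupancy (assumed hysteresis)

class ThrottleController:
    def __init__(self):
        self.throttled = False

    def update(self, occupancies):
        """occupancies: buffer fill fractions reported by FEMs/RUs/SFCs."""
        worst = max(occupancies)
        if not self.throttled and worst > HIGH_WATER:
            self.throttled = True   # stop issuing L1 accepts
        elif self.throttled and worst < LOW_WATER:
            self.throttled = False  # resume the push-through data flow
        return self.throttled
```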

Page 6: FEM/RU Complex

[Diagram: the DAQ architecture of page 4, highlighting the Front-End Multiplexers (FEM) and Readout Units (RU).]

Page 7: The FEM/RU Complex

• Transports data from L1 links to the Readout Network

• Performs some multiplexing

• Assigns destinations (SFC)

• Is segmented according to the partitioning scheme of LHCb

All numbers shown in this section are for a system that puts 40 MB/s onto the output link of each RU.
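Since every RU must send the fragments of a given event to the same SFC, destination assignment can be a pure function of the event number that all RUs compute independently. A hedged sketch of this idea (round-robin is my assumption; a real scheme might also weight SFCs by load):

```python
# Sketch of static destination assignment: all RUs agree on the target
# SFC for an event without communicating, because the SFC index is a
# pure function of the event number. Round-robin is an assumption.

N_SFC = 100  # roughly 100 sub-farm controllers (see the RN slides)

def sfc_for_event(event_number):
    """Every RU computes the same SFC index for the same event."""
    return event_number % N_SFC
```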

Page 8: “Generic” Sub-detector: From L1 Links to the FEM/RU

[Diagram: 1…108 L1 board output links per sub-detector feed 0…25 FEMs, with FEM input links numbered 0…7. In total there are 373 links from the L1 front-end.]

Page 9: “Generic” Sub-detector: From FEM/L1 to the Readout Unit

[Diagram: 1…60 L1/FEM output links per sub-detector feed 1…30 Readout Units, each taking 1…7 inputs and driving one GigE link to the RN. In total: ~95 links to the RN.]

Page 10: Sub-Detector FEM/RU System

• A “generic sub-detector”, from the dataflow system’s point of view, is one of: VELO, IT, OT, RICH1, RICH2, SPD/PS, ECAL, HCAL, MUON, L0-Trigger, L1-Trigger, Readout Supervisor

• A “generic sub-detector” has:
– 1…108 output links from Level 1
– 0…25 Front-end Multiplexers, with a multiplexing factor between 2 and 4
– 1…30 Readout Units, with a multiplexing factor between 1 and 7

Page 11: Building the FEM/RU System

• Baseline building block is a Network Processor (NP) based module with 4 or 8 Gigabit Ethernet ports (each module consists of 1 or 2 mezzanine cards, each with 1 NP and 4 GigE ports, on 1 carrier board)

• The multiplexing factor is thus between 1 and 7

• System design proceeds by (see the sketch below):
1. fixing the output bandwidth from a RU
2. optimising the number of 4-port carrier cards
3. taking into account partitioning (2 partitions must not share a RU)
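A back-of-envelope version of steps 1-3, assuming idealised rounding and ignoring the partitioning and carrier-board packing that the tables on the next two pages optimise by hand:

```python
# Sketch of the FEM/RU sizing procedure for one sub-detector: fix the
# target RU output bandwidth, derive the number of RU output links, then
# split the required multiplexing into a FEM stage and a RU stage (each
# limited to 7 inputs, since one port of the 8-port module is the output).
import math

def size_subdetector(n_l1_boards, rate_per_board_mbs, target_ru_bw_mbs):
    total_rate = n_l1_boards * rate_per_board_mbs
    ru_outputs = math.ceil(total_rate / target_ru_bw_mbs)
    total_mux = math.ceil(n_l1_boards / ru_outputs)
    fem_mux = math.ceil(total_mux / 7) if total_mux > 7 else 1
    ru_mux = math.ceil(total_mux / fem_mux)
    return ru_outputs, fem_mux, ru_mux

# Example: OT, 60 L1 boards at 17 MB/s with a 40 MB/s target ->
# (26, 1, 3); the table on the next page quotes 30 outputs at RU mux 2,
# because it also packs whole mezzanines and respects partitioning.
print(size_subdetector(60, 17, 40))
```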

Page 12: FEM/RU System for 40 MB/s Output Bandwidth

| Quantity                    | Velo  | IT     | OT    | RICH1 | RICH2 | SPD/PS | ECAL  | HCAL  | Muon | Level-0 | Level-1 | Readout Supervisor | Total |
|-----------------------------|-------|--------|-------|-------|-------|--------|-------|-------|------|---------|---------|--------------------|-------|
| L1 Boards                   | 100   | 108    | 60    | 21    | 34    | 8      | 14    | 4     | 10   | 3       | 1       | 10                 | 373   |
| Data Rate/L1 Board [MB/s]   | 2.4   | 3.9    | 17    | 11    | 11    | 28     | 9.6   | 17    | 7.2  | 10      | 10      | 10                 |       |
| Fragment Size/L1 Board [kB] | 0.06  | 0.0975 | 0.425 | 0.275 | 0.275 | 0.7    | 0.24  | 0.425 | 0.18 | 0.25    | 0.25    | 0.25               |       |
| Total Rate [MB/s]           | 240   | 421.2  | 1020  | 231   | 374   | 224    | 134.4 | 68    | 72   | 30      | 10      | 10                 | 2835  |
| Target RU Output BW [MB/s]  | 40    | 40     | 40    | 40    | 40    | 40     | 40    | 40    | 40   | 40      | 40      | 40                 |       |
| Target # RU Output Ports    | 6     | 10.53  | 25.5  | 5.775 | 9.35  | 5.6    | 3.36  | 1.7   | 1.8  | 0.75    | 0.25    | 0.25               |       |
| Target Mux Factor           | 16.67 | 10.26  | 2.35  | 3.64  | 3.64  | 1.43   | 4.17  | 2.35  | 5.56 | 4.00    | 4.00    | 40.00              |       |
| Mux Factor (FEM)            | 4     | 2      | 1     | 1     | 1     | 1      | 2     | 1     | 1    | 1       | 1       | 1                  |       |
| #FEMs                       | 25    | 54     | 60    | 21    | 34    | 8      | 7     | 4     | 10   | 3       | 1       | 10                 |       |
| #Mezzanines (FEM)           | 50    | 54     | 0     | 0     | 0     | 0      | 7     | 0     | 0    | 0       | 0       | 0                  |       |
| #Carrier Boards (FEM)       | 25    | 27     | 0     | 0     | 0     | 0      | 4     | 0     | 0    | 0       | 0       | 0                  |       |
| Output BW/FEM [MB/s]        | 9.6   | 7.8    | 17    | 11    | 11    | 28     | 19.2  | 17    | 7.2  | 10      | 10      | 10                 |       |
| Mux Factor (RU)             | 4     | 5      | 2     | 3     | 3     | 1      | 2     | 2     | 5    | 4       | 4       | 1                  |       |
| #RU Outputs                 | 7     | 11     | 30    | 7     | 12    | 8      | 4     | 2     | 2    | 1       | 1       | 10                 | 95    |
| #Mezzanines (RU)            | 14    | 22     | 30    | 7     | 12    | 8      | 4     | 2     | 4    | 2       | 2       | 10                 |       |
| #Carrier Boards (RU)        | 7     | 11     | 15    | 4     | 6     | 4      | 2     | 1     | 2    | 1       | 1       | 5                  |       |
| Output BW/RU [MB/s]         | 38.4  | 39     | 34    | 33    | 33    | 28     | 38.4  | 34    | 36   | 40      | 40      | 10                 |       |
| Total Mux Factor            | 16    | 10     | 2     | 3     | 3     | 1      | 4     | 2     | 5    | 4       | 4       | 1                  |       |
| #Mezzanines (total)         | 64    | 76     | 30    | 7     | 12    | 8      | 11    | 2     | 4    | 2       | 2       | 10                 | 228   |

Page 13: FEM/RU System for 60 MB/s Output Bandwidth

| Quantity                    | Velo  | IT     | OT    | RICH1 | RICH2 | SPD/PS | ECAL  | HCAL  | Muon | Level-0 | Level-1 | Readout Supervisor | Total |
|-----------------------------|-------|--------|-------|-------|-------|--------|-------|-------|------|---------|---------|--------------------|-------|
| L1 Boards                   | 100   | 108    | 60    | 21    | 34    | 8      | 14    | 4     | 10   | 3       | 1       | 10                 | 373   |
| Data Rate/L1 Board [MB/s]   | 2.4   | 3.9    | 17    | 11    | 11    | 28     | 9.6   | 17    | 7.2  | 10      | 10      | 10                 |       |
| Fragment Size/L1 Board [kB] | 0.06  | 0.0975 | 0.425 | 0.275 | 0.275 | 0.7    | 0.24  | 0.425 | 0.18 | 0.25    | 0.25    | 0.25               |       |
| Total Rate [MB/s]           | 240   | 421.2  | 1020  | 231   | 374   | 224    | 134.4 | 68    | 72   | 30      | 10      | 10                 | 2835  |
| Target RU Output BW [MB/s]  | 60    | 60     | 60    | 60    | 60    | 60     | 60    | 60    | 60   | 60      | 60      | 60                 |       |
| Target # RU Output Ports    | 4.00  | 7.02   | 17.00 | 3.85  | 6.23  | 3.73   | 2.24  | 1.13  | 1.20 | 0.50    | 0.17    | 0.17               |       |
| Target Mux Factor           | 25.00 | 15.38  | 3.53  | 5.45  | 5.45  | 2.14   | 6.25  | 3.53  | 8.33 | 6.00    | 6.00    | 60.00              |       |
| Mux Factor (FEM)            | 5     | 5      | 1     | 1     | 1     | 1      | 2     | 1     | 2    | 2       | 1       | 1                  |       |
| #FEMs                       | 20    | 22     | 60    | 21    | 34    | 8      | 7     | 4     | 5    | 2       | 1       | 10                 |       |
| #Mezzanines (FEM)           | 40    | 44     | 0     | 0     | 0     | 0      | 7     | 0     | 5    | 2       | 0       | 0                  |       |
| #Carrier Boards (FEM)       | 20    | 22     | 0     | 0     | 0     | 0      | 4     | 0     | 3    | 1       | 0       | 0                  |       |
| Output BW/FEM [MB/s]        | 12    | 19.5   | 17    | 11    | 11    | 28     | 19.2  | 17    | 14.4 | 20      | 10      | 10                 |       |
| Mux Factor (RU)             | 5     | 3      | 3     | 5     | 5     | 2      | 3     | 3     | 4    | 3       | 1       | 1                  |       |
| #RU Outputs                 | 4     | 8      | 20    | 5     | 7     | 4      | 3     | 2     | 2    | 1       | 1       | 10                 | 67    |
| #Mezzanines (RU)            | 8     | 8      | 20    | 10    | 14    | 4      | 3     | 2     | 4    | 1       | 1       | 10                 |       |
| #Carrier Boards (RU)        | 4     | 4      | 10    | 5     | 7     | 2      | 2     | 1     | 2    | 1       | 1       | 5                  |       |
| Output BW/RU [MB/s]         | 60    | 58.5   | 51    | 55    | 55    | 56     | 57.6  | 51    | 57.6 | 60      | 10      | 10                 |       |
| Total Mux Factor            | 25    | 15     | 3     | 5     | 5     | 2      | 6     | 3     | 8    | 6       | 1       | 1                  |       |
| #Mezzanines (total)         | 48    | 52     | 20    | 10    | 14    | 4      | 10    | 2     | 9    | 3       | 1       | 10                 | 183   |

Page 14: RU/FEM System Summary

• The 373 L1 boards give a total average data rate of 2835 MB/s at 40 kHz
– average event size: 71 kB

• Fixing the average output bandwidth on the RU link to 40 (60) MB/s results in 95 (67) output links to the Readout Network

• These numbers take into account:
– partitioning at the level of sub-detectors
– multiplexing factors up to 7
– minimisation of the number of NP-carrying mezzanine cards, i.e. cost (228/183 in total)

Page 15: Readout Network

[Diagram: the DAQ architecture of page 4, highlighting the Readout Network (RN) at 6-15 GB/s between the RUs and the SFCs.]

Page 16: The Readout Network

• Must connect ~95 RUs to ~100 SFCs

• Consists of point-to-point GigE links

• Uses a custom light-weight connection-less protocol on top of raw Ethernet frames

• Is asynchronous and relies on back-pressure (via flow-control) to avoid buffer-overflows

• Must be able to perform non-blocking switching at least up to O(10) GB/s

From the RU/SFC point of view, it is just a ~100 x 100 port Gigabit Ethernet switch.
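The slides do not specify the protocol beyond "light-weight, connection-less, on top of raw Ethernet frames". As an illustration of what such a fragment frame could look like, here is a sketch in which the EtherType value and the (event number, source id, length) header layout are hypothetical:

```python
# Hypothetical fragment frame for a connection-less protocol on raw
# Ethernet: 14-byte Ethernet header, then an 8-byte fragment header.
import struct

ETHERTYPE_DAQ = 0x88B5  # IEEE "local experimental" EtherType (assumed choice)

def build_fragment_frame(dst_mac, src_mac, event_number, source_id, payload):
    frag_header = struct.pack("!IHH", event_number, source_id, len(payload))
    frame = dst_mac + src_mac + struct.pack("!H", ETHERTYPE_DAQ)
    frame += frag_header + payload
    # pad to the 60-byte Ethernet minimum (excluding the 4-byte FCS)
    return frame.ljust(60, b"\x00")
```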

Page 17: Building a Large GigE Switch

• Monolithic switches of this size are still not very common and are very expensive (but they do exist, e.g. from ALCATEL and CISCO)

• Medium-size commercial switches (e.g. Foundry FastIron) with 120 GigE ports

• Small switches like our standard NP-based module (8 GigE ports)

• Possibly future custom modules based on next-generation NPs (up to 20 ports)

• Any building block has to fulfil the basic requirements – non-blocking, flow control, full line speed:
– for our NP-based module we know that it complies
– for commercial switches this must be / has been tested

• If the requirements are met, the only criterion is the cost per usable port

Page 18: Topology of the Switching Network

• All numbers in the following are based on a Banyan (i.e. fully connected, equal-size layers) network topology, assuming a maximum load of 40 MB/s on each output from a RU; a back-of-envelope element count is sketched below

• We have seen (J.-P. Dufey’s presentation) that one can do better by taking the uni-directional data flow into account
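For orientation, here is the textbook element count for such a topology, a sketch under the "equal-size layers" assumption above; the exact module counts on the following pages also account for usable-port granularity and load:

```python
# Number of k x k switching elements in an N-port Banyan network:
# ceil(log_k N) stages of ceil(N / k) elements each.
import math

def banyan_elements(n_ports, k):
    stages = math.ceil(math.log(n_ports, k))
    return stages * math.ceil(n_ports / k)

# e.g. ~96 ports from 4x4 modules: 4 stages x 24 modules = 96 modules,
# the same ballpark as the ~92 modules quoted on page 21
print(banyan_elements(96, 4))
```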

Page 19: Evolution of Relative Costs

[Plot: relative cost (0-4.5) vs. required bandwidth w.r.t. 4 GB/s (1-3), for four building blocks: 4x4 (2 NPs, 1st generation), 5x5 (1 NP, 2nd generation), 10x10 (2 NPs, 2nd generation), 60x60 (Foundry BigIron).]

Page 20: Number of Elementary Switching Elements Needed to Go from 4 to 12 GB/s

[Plot: number of switching elements (0-250) vs. required bandwidth w.r.t. 4 GB/s (1-3), for 4x4 boards, 5x5 boards, 10x10 boards and 60x60 switches; at the high end roughly 200 modules vs. 4 switches.]

Page 21: Cost of Usable Port

For a Banyan 96 x 96 port system at 40 MB/s load:

– using Foundry BigIron: 4 switches needed, 120 x 120 usable ports: 2200 $ (copper) or 2500 $ (SX fibre) per port (list price)

– using the 4 x 4 NP-based module: 92 modules needed, 92 x 92 usable ports: ~2400 $ (SX fibre) per port (estimate)

Note however:

• BigIron does not seem to fulfil our requirements (flow control!)

• An NP-based system can do the final event building (see later)
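The criterion of the previous page, written out: cost per usable port = (number of building blocks x price per block) / usable ports. The block prices below are back-calculated from the per-port figures quoted above, so this only re-derives the slide's numbers:

```python
# Cost per usable port for the two options. Block prices are
# back-calculated from the quoted per-port prices (illustrative only).

def cost_per_usable_port(n_blocks, usd_per_block, usable_ports):
    return n_blocks * usd_per_block / usable_ports

print(cost_per_usable_port(4, 75_000, 120))  # BigIron -> 2500 $/port (SX)
print(cost_per_usable_port(92, 2_400, 92))   # NP modules -> 2400 $/port
```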

Page 22: Readout Network Summary

• Fairly large ~ 100 x 100 Gigabit Ethernet Switch – most likely not monolithic

• Need non-blocking, line-speed switching, flow-control and reasonably large buffers

• Optical connectors very much preferred (but price!)

• NP-based modules fulfil all requirements

• An optimised topology saves switch ports

• Ultimate decision will evidently be based on price per usable port (provided other requirements are met)

Page 23: L2/L3 Farm

[Diagram: the DAQ architecture of page 4, highlighting the Sub-Farm Controllers (SFC) and the Level 2 & 3 event-filter CPU farm.]

Page 24: Event Building and Level 2 & 3 Farm

• Data from the RN are delivered to a specific part of the Level 2 & 3 farm

• The entry point from the RN into the farm is the Sub-farm Controller (SFC); RUs know only about SFCs

• The SFC is also the gateway to the Storage Controller(s) (SC)

• Immediately before or after an event enters the SFC, the final event building must be performed

• Load on the farm must be balanced

• The farm nodes must be controlled

Page 25: Bird’s Eye View of a Sub-farm

[Diagram: one sub-farm, comprising the Sub-farm Controller attached to the Readout Network main switch, a sub-farm aggregation switch serving 10-20 sub-farm nodes, a Controls PC on the controls network aggregation switch, the Storage Controller, and up-links to CERN and to the Controls Network.]

Page 26: Anatomy of a SFC

[Diagram: a “server-like” PC with CPU and memory on a local 66 MHz/64-bit PCI (or Infinibus) bus: one GigE NIC to the Readout Network (~60 MB/s in) – this NIC could do the final event building; one GigE NIC to the sub-farm network (~60 MB/s out); a 100BaseT NIC on a 33 MHz/32-bit bus to the Controls Network (~0.5 MB/s). The large memory serves as a buffer for load balancing; the CPU is not critical if event building is done elsewhere. A server like this can be bought today for ~5 kCHF.]

Page 27: Subfarm Node

• is disk-less and network-booted

• needs 2 network interfaces for controls and data

• needs remote reset facility

• needs lots of memory and CPU power

• must be “cheap” in terms of:
– price per MIPS
– floor space
– cooling, power, maintenance

• possible physical realizations include:
– rack-mounted (1U) servers
– standard boxes (“pizza boxes”)
– “naked motherboards” on a carrier board (crate-based)
– micro-server blades
– etc.

Page 28: Moore’s Law

608 SpecInt2000 (roughly ~ 60 SpecInt95)

Complete system (standard box) ~ 2400 CHF today!

Page 29: Further Components of the L2/L3 Farm

• Storage Controller: one or more multi-terabyte disk servers connected to the high-bandwidth link to the permanent storage facility

• Controls and sub-farm aggregation switches: edge switches with typically 2 x 1000BaseT up-links and ~20 100BaseT links (these are already almost commodity items)

• Controls PC: server PC to control an entire sub-farm; it will run the standard ECS/SCADA system (if needed for performance reasons, several Controls PCs can share the control of a sub-farm)

Page 30: Final Event Building

Concatenation of the fragments from the RUs into one event can be done:

• using the SFC CPU (sorting & memory copy)

• using “smart” (programmable) NICs (event building done during DMA)

• using a final stage of NP-based modules as 4-to-4 event builders

[Diagram: the three options – an NP-based 8-port module in front of the SFC, the SFC’s NIC, or the SFC CPU itself.]
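Whichever of the three options is chosen, the logic is the same: collect one fragment per RU output link, keyed by event number, and concatenate. A minimal sketch, in which the fragment ordering by source id and the completeness criterion are my assumptions:

```python
# Sketch of final event building: fragments from the ~95 RU output links
# are collected per event number and concatenated once all have arrived.
from collections import defaultdict

N_SOURCES = 95  # expected fragments per event (one per RU output link)

class EventBuilder:
    def __init__(self):
        self.pending = defaultdict(dict)  # event number -> {source id: bytes}

    def add_fragment(self, event_number, source_id, payload):
        fragments = self.pending[event_number]
        fragments[source_id] = payload
        if len(fragments) < N_SOURCES:
            return None  # event still incomplete
        del self.pending[event_number]
        # concatenate in fixed source order into one contiguous event
        return b"".join(fragments[s] for s in sorted(fragments))
```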

Page 31: Size of L2/L3 Farm

• Assuming 10000 SI95 for L2, 25000 SI95 for L3 and 50000 SI95 for reconstruction results in ~850 SI95 per sub-farm

• Assuming 55 SI95 per farm node, this means ~20 nodes per sub-farm (including a comfortable safety margin)

• 100 SFCs (500 kCHF) + 100 edge switches (300 kCHF) + 2000 farm nodes (4000 kCHF) = 4.8 MCHF total cost of the farm (TODAY!), spelled out in the sketch below

• These numbers are approximate, and the demand for CPU will perhaps be higher, but a high-performing farm could be built today at reasonable cost
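The arithmetic behind these bullets, written out (the per-unit prices for SFCs, switches and nodes are backed out of the totals quoted above):

```python
# Farm sizing and cost, using only numbers quoted on this slide.
si95_total = 10_000 + 25_000 + 50_000  # L2 + L3 + reconstruction
n_subfarms = 100
print(si95_total / n_subfarms)          # -> 850 SI95 per sub-farm

nodes_per_subfarm = 20                  # 55 SI95/node plus safety margin
cost_chf = (100 * 5_000                 # SFCs          -> 500 kCHF
            + 100 * 3_000               # edge switches -> 300 kCHF
            + n_subfarms * nodes_per_subfarm * 2_000)  # nodes -> 4000 kCHF
print(cost_chf / 1e6)                   # -> 4.8 MCHF total, today
```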

Page 32: L2/L3 Farm Summary

• The L2/L3 farm is composed of sub-farms

• It maintains the separation between the control and data networks

• It consists of ~100 SFCs (strong in I/O), O(2000) nodes (strong in CPU and memory), and an aggregation switch per sub-farm

• It is scalable, hierarchically organised and uniform, hence easy to configure, control and monitor

Page 33: Conclusions (1)

• The data flow system is based on Gigabit Ethernet, most likely over cheap multi-mode fibres (1000BaseSX)

• The data flow system consists of 3 main parts:
1. the FEM/RU complex, which consists of NP-based modules and multiplexes several L1 links onto 1 output link from a Readout Unit
2. a Gigabit Ethernet switch fabric, most likely composed of several smaller sub-units
3. a large compute farm, decomposed into sub-farms and load-balanced by the Sub-farm Controllers

Page 34: Conclusions (2)

• The FEM/RU complex will be built of NP-based modules, whose performance has been established to be largely sufficient

• The main switching network will be built in an optimised topology, using either commercial switches or NP-based modules, depending on cost and performance

• The sub-farm will be implemented with server PCs as Sub-farm Controllers and PC-like farm nodes, connected by moderate-sized edge switches. All of these components already exist at reasonable prices today