LHCb readout infrastructure - NA62 TDAQ WG Meeting, April 1st, 2009 - Niko Neufeld, PH/LBC




Page 1:

LHCb readout infrastructure

NA62 TDAQ WG Meeting

April 1st, 2009

Niko Neufeld, PH/LBC

Page 2:

General features

• Two trigger levels: L0 (hardware) and HLT (software)
• L0 has a fixed latency of 4 µs and no zero suppression (ZS); buffer occupancy is deterministic, so the central trigger distribution system (TFC) protects the buffers by emulating their state (see the sketch after this list)
• Front-end links: G-link/GOL at ~160 MB/s
• Uniform readout board, the TELL1 (up to 24 input links), with 4 output links (Gigabit Ethernet, 1000BASE-T)
• Gigabit Ethernet readout network
• Farm of up to ~1500 servers
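
The buffer protection by emulation works because the front-end derandomizers drain at a known, fixed rate, so a central emulator can mirror their occupancy exactly and veto any L0 accept that would overflow them, without any back-pressure signal. A minimal sketch of that idea (not LHCb's actual TFC firmware; the depth and drain rate below are assumed, illustrative values):

#include <cstdint>

// Minimal sketch of central buffer protection by emulation: because the
// front-end buffers fill and drain deterministically, the TFC can track
// their occupancy centrally and throttle triggers that would overflow them.
struct DerandomizerEmulator {
    int occupancy = 0;                    // emulated buffer fill (events)
    uint64_t cycle = 0;                   // bunch-crossing clock counter
    static constexpr int kDepth = 16;     // assumed derandomizer depth
    static constexpr int kDrainEvery = 4; // assumed: 1 event read out per 4 cycles

    // Called once per clock cycle with the raw L0 decision; returns the
    // decision actually distributed to the front-ends.
    bool clock(bool l0Accept) {
        if (++cycle % kDrainEvery == 0 && occupancy > 0)
            --occupancy;                  // front-end reads one event out
        if (l0Accept && occupancy == kDepth)
            return false;                 // veto: buffer would overflow
        if (l0Accept)
            ++occupancy;                  // accepted event enters the buffer
        return l0Accept;
    }
};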

Page 3:

LHCb Frontend Architecture

Page 4:

LHCb DAQ

[Architecture diagram: the front-end electronics of the subdetectors (VELO, ST, OT, RICH, ECal, HCal, Muon) feed the Readout Boards (TELL1), which send event data through the READOUT NETWORK (Gigabit Ethernet switches) to the CPUs of the HLT farm, where event building takes place; a separate switch serves the MON farm. The TFC system receives the LHC clock and the L0 trigger, distributes the timing and fast control signals, and receives MEP requests from the farm; the Experiment Control System (ECS) carries the control and monitoring data. Rates: 40 GB/s into the readout network, 80 MB/s to storage. Average event size 40 kB, average rate into the farm 1 MHz, average rate to tape 2 kHz; the numbers are consistent: 40 kB × 1 MHz = 40 GB/s and 40 kB × 2 kHz = 80 MB/s.]

Page 5:

DAQ features

• Push protocol
• Message coalescing: the data for a single trigger in LHCb are small (ca. 100 bytes), so several events are packed together into one message on the network (see the sketch after this list)
• Farm nodes announce their availability to the TFC
• The TFC decides where to send events and how many to put into one packet
• After the TELL1, all hardware is commercial off-the-shelf (COTS), albeit high-end
• Switch (main router): Force10 E1200 (up to 1260 Gigabit Ethernet ports)
• Separate networks for DAQ and controls: important for robustness and monitoring
• Farm nodes: currently 2.5 GHz Intel Harpertown, 8 cores, 1 to 2 GB RAM per core
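
As a rough illustration of the coalescing idea (LHCb's real multi-event packet, MEP, format is not reproduced here; the header layout and names below are invented), packing many ~100-byte fragments into one datagram amortises the per-packet network overhead:

#include <cstdint>
#include <cstring>
#include <vector>

struct MepHeader {           // hypothetical header layout, for illustration
    uint32_t firstEventId;   // id of the first trigger in this packet
    uint16_t nFragments;     // how many event fragments follow
};

// Pack several small event fragments into one network message: a 2-byte
// length prefix per fragment, payloads back to back after the header.
std::vector<uint8_t> buildMep(uint32_t firstEventId,
                              const std::vector<std::vector<uint8_t>>& frags) {
    std::vector<uint8_t> packet(sizeof(MepHeader));
    MepHeader h{firstEventId, static_cast<uint16_t>(frags.size())};
    std::memcpy(packet.data(), &h, sizeof h);
    for (const auto& f : frags) {
        uint16_t len = static_cast<uint16_t>(f.size());
        packet.insert(packet.end(),
                      reinterpret_cast<uint8_t*>(&len),
                      reinterpret_cast<uint8_t*>(&len) + sizeof len);
        packet.insert(packet.end(), f.begin(), f.end());  // fragment payload
    }
    return packet;  // one message on the network carrying many small events
}

At 1 MHz of triggers, sending each ~100-byte fragment in its own Ethernet frame would spend most of the bandwidth on framing; packing, say, 16 events per MEP cuts the packet rate per source from 1 MHz to ~60 kHz.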

Page 6:

What’s nice?

• Almost only COTS components: we always profit from industry progress (Moore's law) and price decline
• Very few different components: few experts needed
• Scalable: the farm is limited only by cooling and power; the network is limited by the size of the F10 switch, but a second chassis could be bought

Page 7:

Hardware / technology choices

• Ethernet over copper: the longest distance in the LHCb Online system is 36 m (the standard allows 90 m) - a tremendous cost advantage!
• Force10: at the time of our evaluation (~2006), this switch was the only one that could withstand our traffic (because of its superior buffering)
• PCs: we normally go for the best price/performance; new servers can easily handle the traffic (up to 6 Gbit/s), though some tuning of kernel parameters is necessary (see the sketch below)
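
The kernel tuning is mostly about absorbing bursts. One typical receiver-side measure, sketched here with assumed values (the 8 MB figure is illustrative, not LHCb's actual setting), is enlarging the socket receive buffer; on Linux the sysctl limit net.core.rmem_max must be raised for the request to take full effect:

#include <arpa/inet.h>
#include <cstdint>
#include <cstdio>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Open a UDP socket with an enlarged receive buffer so that bursts of
// event fragments sit in the kernel until the event-builder drains them.
int openTunedSocket(uint16_t port) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); return -1; }

    int rcvbuf = 8 * 1024 * 1024;  // assumed 8 MB; capped by net.core.rmem_max
    setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof rcvbuf);

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) < 0) {
        perror("bind");
        close(fd);
        return -1;
    }
    return fd;  // caller owns the descriptor
}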

Page 8:

TELL1 from the DAQ point of view

• We had very few real problems with the hardware, except:
  • an Ethernet problem of the CCPC (wrong connector), fixed by bringing active switches close to the crates
  • spurious VME resets (now fixed)
• Software:
  • The CCPC software is complete and stable (most files have not been touched in years): I2C, JTAG, local bus; it allows interrupt handling from the FPGAs (not used, but working), and the libraries are protected against concurrent access to hardware resources (see the sketch after this list)
  • No software upgrades are planned, except following OS trends (we will move to SLC5 eventually) and some performance improvements in the main DIM server
• TELL1 problems in the readout:
  • misconfiguration (now rare)
  • desynchronisation: dirty fibres, glitches, front-end issues, etc.
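
As an illustration of the concurrent-access protection mentioned above (a sketch only; the mechanism the actual CCPC libraries use may differ), an advisory file lock serialises bus transactions even across separate control processes:

#include <cstdint>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

// Sketch: serialise access to a shared hardware bus (I2C, JTAG, local bus)
// across processes with an advisory file lock. The lock file path and the
// transaction body are placeholders.
uint8_t i2cReadLocked(uint8_t device, uint8_t reg) {
    int lockFd = open("/tmp/i2c_bus.lock", O_CREAT | O_RDWR, 0666);
    flock(lockFd, LOCK_EX);   // block until we exclusively own the bus

    uint8_t value = 0;
    // ... perform the actual I2C transaction here (hardware-specific) ...
    (void)device; (void)reg;

    flock(lockFd, LOCK_UN);   // release so another process can transact
    close(lockFd);
    return value;
}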

Page 9:

Controls

• In LHCb both slow control (DCS) and run control are based on PVSS and the JCOP framework
• The TELL1 is fully integrated into PVSS (as a framework component); the software layering is quite modular
• Custom hardware and software are usually integrated into PVSS using DIM (see the sketch below)
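
As an example of the DIM pattern (a minimal sketch against the DIM C++ server API; the service and server names are invented), a board publishes a value as a DIM service that a PVSS/JCOP manager can then subscribe to:

#include <dis.hxx>   // DIM server-side C++ API
#include <unistd.h>

int main() {
    int temperature = 0;  // value that would be read from the hardware

    // Publish the value as a DIM service; subscribers (e.g. a PVSS/JCOP
    // manager) receive an update every time updateService() is called.
    DimService service("EXAMPLE_BOARD/TEMPERATURE", temperature);
    DimServer::start("EXAMPLE_BOARD");  // register with the DIM name server

    for (;;) {
        temperature = 42;          // placeholder for a real hardware read
        service.updateService();   // push the new value to subscribers
        sleep(5);
    }
}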

Page 10:

Problems

• Commercial hardware: hard disks, power supplies, Ethernet switch ports, memory modules
• Software: odd problems with BMCs / booting, BIOS oddities
• All of these stem from the fact that we have a lot of hardware and that it is cheap: maintenance is the price
• A large system with many components requires quite a few people to run

Page 11:

Hardware inventory

• 290 TELL1s in production
• 1 x Force10 E1200 switch
• 50 x HP 3500 aggregation switches (farm/DAQ)
• 70 x HP 2650 aggregation switches (controls)
• 100 x SuperMicro Twin servers (E4)
• 350 x DELL blade servers (M605)
• ~120 x DELL SC1425 servers (controls)