Why it might be interesting to look at ARM

Ben Couturier, Vijay Kartik, Niko Neufeld, PH-LBC

SFT Technical Group Meeting 08/10/2012



The challenge for LHCb

• Major upgrade during LS2

• Read out the detector at the bunch-crossing rate of 40 MHz

• No more hardware-based trigger: need to filter 40 million events/s (32 Tbit/s) in software

Why look at ARM? N. Neufeld
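The two rates on this slide fix the average event size. A minimal arithmetic check, assuming the 32 Tbit/s is the total readout bandwidth shared evenly by all 40 million events per second, with protocol overheads ignored:

```python
# Average event size implied by the slide's figures.
# Assumption: 32 Tbit/s is the full readout bandwidth, shared evenly by all
# events, with no protocol or encoding overhead.
TOTAL_BANDWIDTH_BIT_S = 32 * 10**12   # 32 Tbit/s
EVENT_RATE_HZ = 40 * 10**6            # 40 million events per second


def average_event_size_bytes(bandwidth_bit_s: int, rate_hz: int) -> int:
    """Return the mean event size in bytes (8 bits per byte)."""
    return bandwidth_bit_s // rate_hz // 8


size = average_event_size_bytes(TOTAL_BANDWIDTH_BIT_S, EVENT_RATE_HZ)
print(f"average event size: {size} bytes ({size / 1000:.0f} kB)")
```

In other words, every compute unit has to handle events of roughly 100 kB each, arriving at 40 MHz in aggregate.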


Dataflow

• GBT: custom radiation-hard link over multimode fibre (MMF), 3.2 Gbit/s (about 10,000 links)

• Input into the DAQ network: 10/40 Gigabit Ethernet or FDR InfiniBand (1000 to 4000 links)

• Output from the DAQ network into the compute-unit clusters: 100 Gigabit Ethernet / EDR InfiniBand (200 to 400 links)

[Figure: dataflow from the Detector (behind 100 m of rock) to the Readout Units, then through the DAQ network to the Compute Units]
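A rough consistency check of the link counts above, using nominal line rates only. This is a sketch under stated assumptions: protocol and encoding overheads are ignored, only the Ethernet options are modelled, and pairing 4000 links with 10 GbE and 1000 links with 40 GbE is our reading of the "1000 to 4000" range.

```python
# Nominal aggregate capacity of each dataflow stage, from the slide's link counts.
# Assumptions: nominal line rates only (no protocol/encoding overhead); only the
# Ethernet options are modelled; 4000 links pair with 10 GbE, 1000 with 40 GbE.


def aggregate_tbit_s(n_links: int, gbit_per_link: float) -> float:
    """Total capacity in Tbit/s of n_links links at gbit_per_link Gbit/s each."""
    return n_links * gbit_per_link / 1000.0


gbt = aggregate_tbit_s(10_000, 3.2)           # detector -> readout units
daq_in_10g = aggregate_tbit_s(4_000, 10.0)    # DAQ input, 10 GbE variant
daq_in_40g = aggregate_tbit_s(1_000, 40.0)    # DAQ input, 40 GbE variant
daq_out_min = aggregate_tbit_s(200, 100.0)    # DAQ output, 200 x 100 GbE
daq_out_max = aggregate_tbit_s(400, 100.0)    # DAQ output, 400 x 100 GbE

print(f"GBT:        {gbt:.0f} Tbit/s")
print(f"DAQ input:  {daq_in_10g:.0f} Tbit/s (10 GbE) / {daq_in_40g:.0f} Tbit/s (40 GbE)")
print(f"DAQ output: {daq_out_min:.0f} to {daq_out_max:.0f} Tbit/s")
```

The GBT total reproduces the 32 Tbit/s quoted on the challenge slide.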


What will be the Compute Unit?

• A compute unit is a destination for the event-data fragments from the readout units

• It assembles the fragments into a complete “event” and runs various selection algorithms on this event

• About 0.1% of events are retained

• Baseline option: a high-density server platform (mainboards with standard CPUs). Extrapolating with Moore’s law and current estimates of the algorithms’ needs, this would require 4000 to 5000 servers of the 2018 generation!

• The baseline could be augmented with a co-processor card (such as Intel MIC or a GPU); there is a lot of interest in this from various groups

• Alternative 1: use lower-power, cheaper x86 processors such as Intel Atom or low-power AMD parts
  – Optimize HEPSpec/CHF/W

• Alternative 2: use non-x86 processors, profiting from the highly competitive and innovative market for processors for portable devices: ARM
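The event-building step described in the first bullets can be sketched as follows. This is a minimal illustration, not the LHCb implementation: the class and method names are hypothetical, fragments are assumed to carry an integer event ID and readout-unit ID, and an event counts as complete once every readout unit has delivered its fragment.

```python
from collections import defaultdict
from typing import Dict, Optional


class EventBuilder:
    """Hypothetical sketch: assemble fragments from n readout units into events."""

    def __init__(self, n_readout_units: int) -> None:
        self.n_readout_units = n_readout_units
        # event_id -> {readout_unit_id: fragment payload}
        self._pending: Dict[int, Dict[int, bytes]] = defaultdict(dict)

    def add_fragment(self, event_id: int, unit_id: int, payload: bytes) -> Optional[bytes]:
        """Store one fragment; return the full event once all units have sent theirs."""
        if not 0 <= unit_id < self.n_readout_units:
            raise ValueError(f"unknown readout unit {unit_id}")
        fragments = self._pending[event_id]
        if unit_id in fragments:
            raise ValueError(f"duplicate fragment from unit {unit_id} for event {event_id}")
        fragments[unit_id] = payload
        if len(fragments) < self.n_readout_units:
            return None  # still waiting for other readout units
        del self._pending[event_id]
        # Concatenate in readout-unit order so the event layout is deterministic.
        return b"".join(fragments[u] for u in range(self.n_readout_units))


builder = EventBuilder(n_readout_units=3)
# Fragments may arrive out of order; the event completes with the last one.
assert builder.add_fragment(7, 2, b"C") is None
assert builder.add_fragment(7, 0, b"A") is None
event = builder.add_fragment(7, 1, b"B")
print(event)  # b'ABC'
```

A completed event would then be passed to the selection algorithms, which keep only about 0.1% of events.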


ARM

• A “pure” RISC architecture (with some enhancements)

• A long tradition in the embedded market

• Billions of cores sold
  – in many variants (number of cores, power vs. performance)

• Produced by various licensees

• Has a reputation for the best power efficiency in the market


[Figure: ARM architecture roadmap. “We are here”: 32-bit, IEEE floats, SIMD, native Java, offload. Announced: 64-bit, SIMD with double-precision floats]


So what would a compute unit look like?


Operational constraints

• The Online farms are very big
  – O(2000) servers of different generations and vendors

• Like a traditional data-centre, with all its problems and very few administrators, but with some simplifications:
  – a single client
  – in Online operation, at least, mostly a single workload

• But we want rack-mountable, remotely manageable hardware with good mechanics, decent powering, vendor support, etc., and of course low cost!
  – We don’t want to build this ourselves: it needs to fit into a traditional data-centre structure


Embedded in the data-centre


• Boston Viridis (similar projects also from Dell and HP)

• Consists of 48 SoCs, each with 4 cores and 4 GB RAM (ARM Cortex-A9, 1.4 GHz)

• 80 Gb Ethernet switch

• Total: 192 cores / 192 GB RAM / 300 W
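The chassis totals follow directly from the per-SoC figures. A small check, assuming the 300 W is the power of the whole chassis (SoCs plus switch):

```python
# Chassis totals from the per-SoC figures on this slide.
# Assumption: the 300 W covers the whole chassis, including the switch.
N_SOCS = 48
CORES_PER_SOC = 4
RAM_GB_PER_SOC = 4
CHASSIS_POWER_W = 300

total_cores = N_SOCS * CORES_PER_SOC          # 192
total_ram_gb = N_SOCS * RAM_GB_PER_SOC        # 192
watts_per_core = CHASSIS_POWER_W / total_cores

print(f"{total_cores} cores, {total_ram_gb} GB RAM, {watts_per_core:.2f} W per core")
```

At roughly 1.5 W per core, the power budget per core is an order of magnitude below that of a typical server CPU core.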


How fast is a core?


So we’ll need many


Is it worth it?

• ARMv7: 192 cores need 300 W and 2U for about 520 HEPSpec

• Intel Xeon X5650: 96 hyperthreads need about 1400 W and 2U for 900 HEPSpec

• If this performance-per-watt ratio continues to hold until 2018, LHCb could do the upgrade with a 600 kW data-centre instead of a new (!) 2 MW one

• And maybe at some point we will need to pay for the power
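The comparison above can be made explicit as performance per watt and per thread. A minimal sketch using only the slide's figures, assuming the quoted powers are for the complete 2U units; the `System` class is a hypothetical helper for illustration.

```python
# Performance-per-watt comparison from the figures on this slide.
# Assumption: quoted powers are for the complete 2U units.
from dataclasses import dataclass


@dataclass
class System:
    """Hypothetical helper holding one system's benchmark figures."""
    name: str
    threads: int        # cores (ARM) or hyperthreads (X5650)
    power_w: float
    hepspec: float

    def hepspec_per_watt(self) -> float:
        return self.hepspec / self.power_w

    def hepspec_per_thread(self) -> float:
        return self.hepspec / self.threads


arm = System("ARMv7, 192 cores", threads=192, power_w=300, hepspec=520)
xeon = System("X5650, 96 hyperthreads", threads=96, power_w=1400, hepspec=900)

for s in (arm, xeon):
    print(f"{s.name}: {s.hepspec_per_watt():.2f} HEPSpec/W, "
          f"{s.hepspec_per_thread():.2f} HEPSpec per thread")

ratio = arm.hepspec_per_watt() / xeon.hepspec_per_watt()
print(f"ARM advantage in HEPSpec/W: {ratio:.1f}x")
```

The ARM system delivers about 2.7 times more HEPSpec per watt, but each ARM core delivers only about a third of what an X5650 hyperthread does, and per 2U the Xeon system still provides more HEPSpec (900 vs 520). The gain is in power, not in density.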


The acid test

• HEPSpec is not necessarily a good test for Online usage
  – Online, we (currently) run n instances of the same application in parallel, where n is the number of cores/hyperthreads
  – No “mixed” workload: hyperthreading typically adds more in the Online “mono-culture”

• Need to benchmark using the High Level Trigger code
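The Online “mono-culture” measurement can be sketched as a harness that launches n identical copies of one application in parallel and reports the aggregate event rate. This is a sketch under stated assumptions: the dummy CPU-bound loop is a hypothetical stand-in for the High Level Trigger application, and `run_mono_culture` is an illustrative name. Sweeping n up to the number of logical CPUs shows how much hyperthreading adds for a single workload.

```python
import os
import subprocess
import sys
import time

# Hypothetical stand-in for one trigger instance: a CPU-bound loop per
# "event". A real benchmark would launch the High Level Trigger code instead.
DUMMY_TRIGGER = (
    "import sys\n"
    "acc = 0\n"
    "for event in range(int(sys.argv[1])):\n"
    "    for i in range(20000):\n"
    "        acc += (event * i) % 7\n"
)


def run_mono_culture(n_instances: int, events_per_instance: int) -> float:
    """Run n identical instances in parallel; return aggregate events per second."""
    start = time.perf_counter()
    procs = [
        subprocess.Popen([sys.executable, "-c", DUMMY_TRIGGER, str(events_per_instance)])
        for _ in range(n_instances)
    ]
    for p in procs:
        if p.wait() != 0:
            raise RuntimeError(f"instance exited with code {p.returncode}")
    elapsed = time.perf_counter() - start
    return n_instances * events_per_instance / elapsed


if __name__ == "__main__":
    # os.cpu_count() reports logical CPUs, i.e. it includes hyperthreads.
    n = os.cpu_count() or 1
    for k in sorted({1, max(1, n // 2), n}):
        rate = run_mono_culture(k, events_per_instance=20)
        print(f"{k:4d} instances: {rate:8.1f} events/s")
```

Launching separate processes, rather than threads, mirrors the Online setup of n independent application instances.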


Project: “Moore on ARM”

• Need to compile the LHCb software stack (starting with ROOT)

• Can compare with natively compiled code: everything works fine on the FC17 (Fedora 17) test node, but compilation is slow
  – ROOT 5.34.02: ./configure linuxarm --enable-c++11; make -j 4 takes 30m43s

• Team (part-time only): Ben Couturier, Vijay Kartik, Niko Neufeld


Future plans

• Cross-compiler chain ready

• Will now go on to compile the stack

• Verification and benchmarking

• Then: full-scale test on a fully loaded 192-core system (with a faster ARM: we currently use a Cortex-A8 and will have an A9 or A15), possibly including real network input (for fun)
