JET Algorithm Attila Hidvégi. Overview FIO scan in crate environment JET Algorithm –Hardware tests (on JEM 0.2) –Results and problems –Ongoing work on

JET Algorithm

Attila Hidvégi

Overview

• FIO scan in crate environment

• JET Algorithm
  – Hardware tests (on JEM 0.2)
  – Results and problems
  – Ongoing work on jet code

• Other work in Stockholm

• Summary

FIO scan in crate environment (one of two JEMs)

[Plot "Jem[0]": number of errors (axis 0 to 14000) versus deskew setting (axis 0 to 234), one curve per InputFPGA 1 to 11 plus the total.]

Delay scan for individual FPGAs

[Plot "Jem[0]": number of errors (axis 0 to 1800) versus deskew setting (axis 0 to 234), one curve per InputFPGA 1 to 11.]

Delay scan for two JEMs (same deskew clock settings)

[Plot "All Jems": number of errors (axis 0 to 16000) versus deskew setting (axis 0 to 232), curves for Jem[0] and Jem[1].]

It is important to equalize timing between JEMs

[Plot "All Jems": number of errors (axis 0 to 25000) versus deskew setting (axis 0 to 232), curves for Jem[0], Jem[1] and All.]
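One way to read the point about equalizing timing: if both JEMs share one deskew setting, the usable range is only the part where every JEM is error-free at once. The sketch below is a hypothetical helper, not the scan software used for these plots. The function name and the error counts are made up for illustration, and list indices stand for scan points rather than real deskew values.

```python
def common_error_free_window(errors_by_jem):
    """Return (first, last) scan-point indices of the longest run in which
    every JEM reports zero errors, or None if there is no such run."""
    n = min(len(counts) for counts in errors_by_jem.values())
    best = None
    start = None
    for i in range(n + 1):
        # The extra iteration (i == n) closes a run that reaches the end.
        clean = i < n and all(counts[i] == 0 for counts in errors_by_jem.values())
        if clean and start is None:
            start = i
        elif not clean and start is not None:
            if best is None or (i - start) > (best[1] - best[0] + 1):
                best = (start, i - 1)
            start = None
    return best


# Invented error counts per scan point, for illustration only.
jem0 = [5, 3, 0, 0, 0, 0, 0, 4, 9, 9]
jem1 = [9, 9, 7, 2, 0, 0, 0, 0, 0, 3]

print(common_error_free_window({"Jem[0]": jem0}))
print(common_error_free_window({"Jem[0]": jem0, "Jem[1]": jem1}))
```

With these invented numbers each JEM alone has a five-point clean window, but the shared window shrinks to three points because the two windows are offset from each other.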

Hardware tests of Jet algorithm

• Originally only jet multiplicities recorded; insufficient for diagnostics. New firmware was needed.

• All inputs and outputs needed to be recorded.

• Same spy memory and software as for FIO scan.

[Block diagram; recoverable labels: JET, Spy memory, Data in, Configurations.]

JET Algorithm – Results

• All input data is received properly (↕).

• Synthesis tool reports maximum delays of 11.2 ns and 26.4 ns for the 80 and 40 MHz clocks, respectively.

• This is the best result achievable with the current VHDL code.

• Results from hardware tests show random errors!
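The reported delays can be compared with the clock periods. The sketch below assumes each figure is a worst-case path delay that must fit within one full period of its clock. The slide does not say how the synthesis tool was constrained, so this is only a plausibility check, and `slack_ns` is a name introduced here.

```python
def slack_ns(clock_mhz, path_delay_ns):
    """Timing slack in ns: clock period minus worst-case path delay.
    Assumes a single-cycle path and ignores clock skew and setup time."""
    period_ns = 1000.0 / clock_mhz
    return period_ns - path_delay_ns


for clock_mhz, delay_ns in ((80, 11.2), (40, 26.4)):
    print(f"{clock_mhz} MHz: period {1000.0 / clock_mhz:.1f} ns, "
          f"slack {slack_ns(clock_mhz, delay_ns):+.1f} ns")
```

Under that assumption the 80 MHz domain has about 1.3 ns of margin, while the 26.4 ns path exceeds the 25 ns period of the 40 MHz clock by about 1.4 ns, which would be consistent with intermittent errors.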

Most likely cause of the problem

• Only the adders still use 5-bit serial arithmetic
  – Virtex architecture is better suited to parallel arithmetic with fast-carry chains.

• Adders are the slowest component of the algorithm.

• The errors are intermittent, suggesting timing problems
  – Sums are occasionally wrong.
  – Sometimes cause incorrect threshold passes.
  – More often give wrong ROI positions for random data.

• The timing problem of the adders cannot be resolved without a major rewrite.

• JET algorithm is being rewritten from scratch.

New design for the Jet code

• It will be flexible. Generic variables will decide the configuration.

• Try to minimize size and latency.

• The code has to be short, so that one can modify it easily.

• Use only parallel arithmetic, and take full advantage of the fast carry architecture available in Virtex FPGAs.

• Let the synthesis tool do the hard work.

• All the parts have to be tested VERY carefully so we really get what we expect.

• It needs to work…

Making it flexible means:

    entity JET is
      generic (
        ROIx               : integer := 2;    -- Number of ROIs in x direction.
        ROIy               : integer := 4;    -- Number of ROIs in y direction.
        input_width        : integer := 10;   -- Width of input data.
        threshold_nr       : integer := 8;    -- Number of thresholds.
        threshold_width    : integer := 10;   -- Width of threshold.
        multiplicity_width : integer := 3);   -- Width of a single jet multiplicity.
      port (
        data_in           : in  std_logic_vector(((ROIx*2+3)*(ROIy*2+3)*input_width-1) downto 0);
        thresholds        : in  std_logic_vector((threshold_nr*threshold_width-1) downto 0);
        sizes             : in  std_logic_vector((threshold_nr*2-1) downto 0);
        ROIs              : out std_logic_vector(((threshold_nr+4)*ROIx*ROIy-1) downto 0);
        jetmultiplicities : out std_logic_vector((threshold_nr*multiplicity_width) downto 0);
        clk40, clk80, reset : in std_logic);
    end JET;
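To see what the generics imply, the port widths can be evaluated for the default values. The sketch below only mirrors the range expressions in the entity declaration. `jet_port_widths` is a name introduced here and is not part of the design.

```python
def jet_port_widths(ROIx=2, ROIy=4, input_width=10, threshold_nr=8,
                    threshold_width=10, multiplicity_width=3):
    """Bit widths implied by the port ranges of entity JET.
    A range (N-1 downto 0) holds N bits; (N downto 0) holds N+1 bits."""
    elements = (ROIx * 2 + 3) * (ROIy * 2 + 3)
    return {
        "jet_elements": elements,
        "data_in": elements * input_width,
        "thresholds": threshold_nr * threshold_width,
        "sizes": threshold_nr * 2,
        "ROIs": (threshold_nr + 4) * ROIx * ROIy,
        # Declared as (threshold_nr*multiplicity_width) downto 0, hence the +1.
        "jetmultiplicities": threshold_nr * multiplicity_width + 1,
    }


for name, width in jet_port_widths().items():
    print(f"{name}: {width}")
```

With the defaults this gives a 7 by 11 grid of 77 jet elements (770 input bits), 80 threshold bits, 16 size bits and 96 ROI bits. The `jetmultiplicities` range as declared holds 25 bits, one more than 8 × 3. The entity declaration does not say whether that extra bit is intended.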

What could be done better ?

• Putting all summations and the local maximum finder in the first clock cycle allows us to remove the pipelines.

[Block diagram, left part. Recoverable labels: Input 77 jet elements; Add(1), Add(2), Add(3); Latch stages; 55 1*3, 60 2*2, 45 3*3, 32 2*2 and 32 4*4 sums; Comparator1; AL<BL; Mux(A); local maxima position select. Output per sub-region: 1 2*2, 4 3*3, 1 4*4 and 1 2-bit position.]


• It is important to remember that multiplexers (2:1), adders and comparators have equal sizes, since they use the same logic resources.

[Block diagram, right part. Recoverable labels:
 1 of 8 sub-region components: 1 2*2, 4 3*3, 1 4*4 and 1 2-bit position; Mux(B); Comparator2; max position select (2-bit); Latch stages.
 1 of 8 threshold definitions: Mux(C) selecting 2*2, 3*3 or 4*4; programmable size for threshold 1; programmable threshold 1; Demult (5 to 10 bit); Comparator3; "Local max 1 passed definition 1"; Add(4); overflow bit.
 Outputs: jet count (3-bit vector); ROI 1 (11-bit vector).
 Connection notes: "Connected to Threshold definition 1 in all of the sub-region components"; "Connected to Sub-region 1 component to all threshold definitions (8-bit)"; "Connected to Sub-region 1 component position (2-bit)"; "Connected to Sub-region 1 component".]

Generating the adder trees for the Jet algorithm (new version)

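Declaration of cluster types and necessary indices:

    type c_array is array ((ROIy*2+2) downto 0) of std_logic_vector((input_width+3) downto 0);
    type c_matrix is array ((ROIx*2+2) downto 0) of c_array;
    -- Index 0 holds the inputs and 1 to 7 the sum stages; the ADD process
    -- indexes cl(1) to cl(7), so the range must be 7 downto 0 (the slide had 5 downto 0).
    type cluster_type is array (7 downto 0) of c_matrix;
    type cluster_size is array (0 to 5) of integer;
    type cluster_index is array (1 to 7) of cluster_size;
    -- Each ci row, as used by the ADD process:
    -- (size in x, size in y, first operand stage, second operand stage, x offset, y offset).
    constant ci : cluster_index := ((2,1,0,0,1,0), (3,1,1,0,2,0), (2,2,1,1,0,1), (3,2,2,2,0,1),
                                    (4,2,3,3,2,0), (3,3,4,2,0,2), (4,4,5,5,0,2));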

Generating the adder trees for the Jet algorithm (new version)

Adders:

    ADD: process (clk40, reset)
      variable cl : cluster_type;
    begin  -- process ADD
      if reset = '0' then                     -- asynchronous reset (active low)
      elsif clk40'event and clk40 = '1' then  -- rising clock edge
        cl := (others => (others => (others => (others => '0'))));
        -- Stage 0: unpack the flat input vector, zero-extending each element by 4 bits.
        for x in (ROIx*2+2) downto 0 loop
          for y in (ROIy*2+2) downto 0 loop
            cl(0)(x)(y) := "0000" & data_in((((ROIy*2+3)*x+y+1)*input_width-1) downto (((ROIy*2+3)*x+y)*input_width));
          end loop;  -- y
        end loop;  -- x
        -- Stages 1 to 7: each sum adds two earlier stages, as listed in ci.
        for c in 1 to 7 loop
          for x in 0 to (ROIx*2+3-ci(c)(0)) loop
            for y in 0 to (ROIy*2+3-ci(c)(1)) loop
              cl(c)(x)(y) := cl(ci(c)(2))(x)(y) + cl(ci(c)(3))(x+ci(c)(4))(y+ci(c)(5));
            end loop;  -- y
          end loop;  -- x
        end loop;  -- c
      end if;
    end process ADD;
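As a cross-check of the index table, the loop structure can be modelled in software and compared with brute-force window sums. This is a sketch of a reference model, not the test bench used for the firmware. It assumes each `ci` row means (size in x, size in y, first operand stage, second operand stage, x offset, y offset), and the function names are introduced here.

```python
import random

# Rows of ci as given on the slide, read here as
# (size_x, size_y, stage_a, stage_b, dx, dy); this reading is an assumption.
CI = {
    1: (2, 1, 0, 0, 1, 0),
    2: (3, 1, 1, 0, 2, 0),
    3: (2, 2, 1, 1, 0, 1),
    4: (3, 2, 2, 2, 0, 1),
    5: (4, 2, 3, 3, 2, 0),
    6: (3, 3, 4, 2, 0, 2),
    7: (4, 4, 5, 5, 0, 2),
}


def adder_tree(grid):
    """Return {stage: 2-D list of sums}; stage 0 is a copy of the input grid."""
    nx, ny = len(grid), len(grid[0])
    cl = {0: [col[:] for col in grid]}
    for c in range(1, 8):
        size_x, size_y, a, b, dx, dy = CI[c]
        cl[c] = [[cl[a][x][y] + cl[b][x + dx][y + dy]
                  for y in range(ny - size_y + 1)]
                 for x in range(nx - size_x + 1)]
    return cl


def window_sum(grid, x, y, size_x, size_y):
    """Brute-force sum of a size_x by size_y window anchored at (x, y)."""
    return sum(grid[x + i][y + j] for i in range(size_x) for j in range(size_y))


def check(grid):
    """True if every stage equals the brute-force sum of its nominal window."""
    cl = adder_tree(grid)
    return all(cl[c][x][y] == window_sum(grid, x, y, CI[c][0], CI[c][1])
               for c in CI
               for x in range(len(cl[c]))
               for y in range(len(cl[c][0])))


random.seed(0)
# 7 x 11 grid of 10-bit values, matching the default generics.
grid = [[random.randrange(1 << 10) for _ in range(11)] for _ in range(7)]
cl = adder_tree(grid)
print(check(grid))
print({c: len(cl[c]) * len(cl[c][0]) for c in (2, 3, 6, 7)})
```

For the 7 by 11 default grid the model yields 55 three-element strips, 60 2*2 sums, 45 3*3 sums and 32 4*4 sums, the same counts that appear in the block diagram. The largest possible 4*4 sum of 10-bit inputs is 16 × 1023 = 16368, which fits in the 14 bits (`input_width+4`) declared for each cluster element.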

Other work in Stockholm

• Work on the CMM code for Jet merging (Sam Silverstein).
  – Based on the CP merger design, with simple modifications to accommodate 16 JEMs
  – Simple version: no special FCAL treatment
  – Have completed crate merger FPGA source for both crate and system mergers
  – Next steps:
    • Compile and simulate crate merger designs
    • System merging FPGA (not much to do)

Summary

• FIO scans look promising.

• The Jet code did not work and is being rewritten from scratch.

• The new Jet code seems promising.

• Work on the CMM goes well.
