Overview
• FIO scan in crate environment
• JET Algorithm
– Hardware tests (on JEM 0.2)
– Results and problems
– Ongoing work on jet code
• Other work in Stockholm
• Summary
FIO scan in crate environment (one of two JEMs)
[Figure: Jem[0]. Errors (0 to 14000) versus deskew setting (0 to 234) for InputFPGA 1 to 11 and the total.]
Delay scan for individual FPGAs
[Figure: Jem[0]. Errors (0 to 1800) versus deskew setting (0 to 234) for InputFPGA 1 to 11.]
Delay scan for two JEMs (same deskew clock settings)
[Figure: All Jems. Errors (0 to 16000) versus deskew setting (0 to 232) for Jem[0] and Jem[1].]
It is important to equalize timing between JEMs
[Figure: All Jems. Errors (0 to 25000) versus deskew setting (0 to 232) for Jem[0], Jem[1] and all.]
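One way to act on such a scan is to look for the deskew settings at which every JEM is error-free and take the middle of the longest such window. The following is only a sketch under stated assumptions: the function name `common_error_free_window` is hypothetical, the error counts are synthetic placeholders (not the measured data), and the window is assumed not to wrap around the end of the scan range.

```python
from typing import Dict, List, Optional, Tuple


def common_error_free_window(
    errors_by_jem: Dict[str, List[int]], settings: List[int]
) -> Optional[Tuple[int, int, int]]:
    """Return (first, last, centre) deskew settings of the longest run in
    which all JEMs report zero errors, or None if no such setting exists."""
    best = None        # (start index, end index) of the longest clean run so far
    run_start = None   # start index of the clean run currently being scanned
    last_index = len(settings) - 1
    for i in range(len(settings)):
        clean = all(errs[i] == 0 for errs in errors_by_jem.values())
        if clean and run_start is None:
            run_start = i
        if run_start is not None and (not clean or i == last_index):
            run_end = i if clean else i - 1
            if best is None or run_end - run_start > best[1] - best[0]:
                best = (run_start, run_end)
            run_start = None
    if best is None:
        return None
    first, last = best
    return settings[first], settings[last], settings[(first + last) // 2]


# Synthetic example: 30 deskew settings in steps of 8 (an assumption).
settings = list(range(0, 240, 8))
scan = {
    "Jem[0]": [0 if 10 <= i <= 20 else 100 for i in range(30)],
    "Jem[1]": [0 if 14 <= i <= 24 else 100 for i in range(30)],
}
window = common_error_free_window(scan, settings)
print(window)
```

With these made-up inputs each JEM has its own error-free range, and only the overlap of the two is usable when both share the same deskew clock setting.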
Hardware tests of Jet algorithm
• Originally only jet multiplicities recorded; insufficient for diagnostics. New firmware was needed.
• All inputs and outputs needed to be recorded.
• Same spy memory and software as for FIO scan.
[Diagram: JET, Spy memory, Data in, Configurations]
JET Algorithm – Results
• All input data is received properly.
• Synthesis tool reports maximum delays of 11.2 ns and 26.4 ns for the 80 and 40 MHz clocks, respectively.
• This is the best achievable result for the current VHDL code.
• Results from hardware tests show random errors!
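For orientation, the reported delays can be compared with the corresponding clock periods. This is a minimal sketch, assuming each reported delay must fit within a single period of its clock; the slides do not say which timing constraints or exceptions were applied, and the function names are illustrative only.

```python
def clock_period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def slack_ns(freq_mhz: float, max_delay_ns: float) -> float:
    """Timing margin: positive means the path fits within one clock period."""
    return clock_period_ns(freq_mhz) - max_delay_ns


# Delays quoted on the slide, keyed by clock frequency in MHz.
reported_delays_ns = {80.0: 11.2, 40.0: 26.4}

for freq, delay in reported_delays_ns.items():
    print(f"{freq:.0f} MHz: period {clock_period_ns(freq):.1f} ns, "
          f"delay {delay:.1f} ns, slack {slack_ns(freq, delay):+.1f} ns")
```

Under that single-cycle assumption the 80 MHz path has roughly 1.3 ns of margin, while 26.4 ns is about 1.4 ns longer than the 25 ns period of the 40 MHz clock.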
Most likely cause of the problem
• Only the adders still use 5-bit serial arithmetic
– Virtex architecture is better suited to parallel arithmetic with fast-carry chains.
• Adders are the slowest component of the algorithm.
• The errors are intermittent, suggesting timing problems
– Sums are occasionally wrong.
– Sometimes cause incorrect threshold passes
– More often give wrong ROI positions for random data
• The timing problem of the adders cannot be resolved without a major rewrite.
• JET algorithm is being rewritten from scratch.
New design for the Jet code
• It will be flexible. Generic variables will decide the configuration.
• Try to minimize size and latency.
• The code has to be short, so that one can modify it easily.
• Use only parallel arithmetic, and take full advantage of the fast carry architecture available in Virtex FPGAs.
• Let the synthesis tool do the hard work.
• All the parts have to be tested VERY carefully so we really get what we expect.
• It needs to work…
Making it flexible means:

    library ieee;
    use ieee.std_logic_1164.all;

    entity JET is
      generic (
        ROIx               : integer := 2;    -- Number of ROIs in x direction.
        ROIy               : integer := 4;    -- Number of ROIs in y direction.
        input_width        : integer := 10;   -- Width of input data.
        threshold_nr       : integer := 8;    -- Number of thresholds.
        threshold_width    : integer := 10;   -- Width of threshold.
        multiplicity_width : integer := 3);   -- Width of a single jet multiplicity.
      port (
        data_in           : in  std_logic_vector(((ROIx*2+3)*(ROIy*2+3)*input_width-1) downto 0);
        thresholds        : in  std_logic_vector((threshold_nr*threshold_width-1) downto 0);
        sizes             : in  std_logic_vector((threshold_nr*2-1) downto 0);
        ROIs              : out std_logic_vector(((threshold_nr+4)*ROIx*ROIy-1) downto 0);
        jetmultiplicities : out std_logic_vector((threshold_nr*multiplicity_width) downto 0);
        clk40, clk80, reset : in std_logic);
    end JET;
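To see what the generics imply for the default configuration, the port widths can be evaluated directly from the range expressions in the entity. This is a small sketch; the helper name `jet_port_widths` is hypothetical, and the widths simply follow the `N downto 0` ranges as written (which are N+1 bits wide).

```python
def jet_port_widths(ROIx=2, ROIy=4, input_width=10, threshold_nr=8,
                    threshold_width=10, multiplicity_width=3):
    """Evaluate the std_logic_vector widths of the JET entity ports."""
    elements = (ROIx * 2 + 3) * (ROIy * 2 + 3)   # jet elements on the input grid
    return {
        "jet_elements": elements,
        "data_in": elements * input_width,
        "thresholds": threshold_nr * threshold_width,
        "sizes": threshold_nr * 2,
        "ROIs": (threshold_nr + 4) * ROIx * ROIy,
        # Range is (threshold_nr*multiplicity_width) downto 0, hence the +1.
        "jetmultiplicities": threshold_nr * multiplicity_width + 1,
    }


widths = jet_port_widths()
for name, bits in widths.items():
    print(f"{name}: {bits}")
```

With the defaults the input grid is 7 by 11, that is 77 jet elements of 10 bits each, so `data_in` is 770 bits wide.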
What could be done better?
• Putting all summations and the local maximum finder in the first clock cycle allows us to remove the pipelines.
[Block diagram, labels: Input 77 jet elements; Add(1), Add(2), Add(3); 55 1*3; 60 2*2; 45 3*3; 32 2*2; 32 4*4; Comparator1; AL<BL; Mux(A); select; Local maxima positions; Latches; 1 2*2, 4 3*3, 1 4*4 and 1 2-bit position]
What could be done better?
• It is important to remember that multiplexers (2:1), adders and comparators have equal sizes, since they use the same logic resources.
[Block diagram, labels: 1 of 8 Sub-region Components (1 2*2, 4 3*3, 1 4*4 and 1 2-bit position); 1 of 8 threshold definitions; Mux(B); 4 3*3; Max position; 1 3*3; 1 2*2 and 1 4*4; Mux(C); 2*2, 3*3 or 4*4; Comparator2; Comparator3; Programmable threshold 1; Programmable size for threshold 1; Local max 1 passed definition 1; Demult (5 to 10 bit); 1 2*2, 1 3*3 and 1 4*4; select; position 2-bit; Latches]
What could be done better?
[Block diagram, labels: Add(4); Output: Jet count 3-bit vector; Output: ROI 1 11-bit vector; Connected to Threshold definition 1 in all of the sub-region components; Connected to Sub-region 1 component to all threshold definitions (8-bit); Connected to Sub-region 1 component position (2-bit); Connected to Sub-region 1 component Overflow-bit; Latches]
Generating the adder trees for the Jet algorithm (new version)
Declaration of cluster types and necessary indices:

    type c_array is array ((ROIy*2+2) downto 0) of std_logic_vector((input_width+3) downto 0);
    type c_matrix is array ((ROIx*2+2) downto 0) of c_array;
    type cluster_type is array (7 downto 0) of c_matrix;  -- stage 0 (inputs) plus the 7 stages listed in ci
    type cluster_size is array (0 to 5) of integer;
    type cluster_index is array (1 to 7) of cluster_size;
    -- Each entry: (x size, y size, source A, source B, x offset of B, y offset of B)
    constant ci : cluster_index := ((2,1,0,0,1,0), (3,1,1,0,2,0), (2,2,1,1,0,1), (3,2,2,2,0,1),
                                    (4,2,3,3,2,0), (3,3,4,2,0,2), (4,4,5,5,0,2));
Generating the adder trees for the Jet algorithm (new version)
Adders:

    -- '+' on std_logic_vector assumes an unsigned arithmetic package
    -- (e.g. ieee.std_logic_unsigned) is in scope.
    ADD: process (clk40, reset)
      variable cl : cluster_type;
    begin  -- process ADD
      if reset = '0' then                     -- asynchronous reset (active low)
        null;                                 -- cl is recomputed on every clock edge
      elsif clk40'event and clk40 = '1' then  -- rising clock edge
        cl := (others => (others => (others => (others => '0'))));
        -- Stage 0: zero-extend each input element by 4 bits.
        for x in (ROIx*2+2) downto 0 loop
          for y in (ROIy*2+2) downto 0 loop
            cl(0)(x)(y) := "0000" & data_in((((ROIy*2+3)*x+y+1)*input_width-1)
                                            downto (((ROIy*2+3)*x+y)*input_width));
          end loop;  -- y
        end loop;  -- x
        -- Stages 1 to 7: each cluster is the sum of two smaller clusters, as listed in ci.
        for c in 1 to 7 loop
          for x in 0 to (ROIx*2+3-ci(c)(0)) loop
            for y in 0 to (ROIy*2+3-ci(c)(1)) loop
              cl(c)(x)(y) := cl(ci(c)(2))(x)(y) + cl(ci(c)(3))(x+ci(c)(4))(y+ci(c)(5));
            end loop;  -- y
          end loop;  -- x
        end loop;  -- c
      end if;
    end process ADD;
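The index table `ci` can be checked in software before synthesis. The sketch below is a plain Python reference model of the adder loops, not the firmware itself: it rebuilds the seven cluster stages from the table and compares every sum with a brute-force window sum. The names `CI`, `build_clusters` and `window_sum` are hypothetical, and the default 7 by 11 grid of 10-bit inputs is assumed.

```python
import random

# (x size, y size, source A, source B, x offset of B, y offset of B), copied from ci.
CI = [
    (2, 1, 0, 0, 1, 0), (3, 1, 1, 0, 2, 0), (2, 2, 1, 1, 0, 1), (3, 2, 2, 2, 0, 1),
    (4, 2, 3, 3, 2, 0), (3, 3, 4, 2, 0, 2), (4, 4, 5, 5, 0, 2),
]


def build_clusters(grid):
    """Return stages 0..7; stage c holds the sums for the c-th entry of CI."""
    nx, ny = len(grid), len(grid[0])
    stages = [[row[:] for row in grid]]            # stage 0 is the raw input grid
    for w, h, a, b, dx, dy in CI:
        stage = [[0] * ny for _ in range(nx)]
        for x in range(nx - w + 1):                # same bounds as the VHDL loops
            for y in range(ny - h + 1):
                stage[x][y] = stages[a][x][y] + stages[b][x + dx][y + dy]
        stages.append(stage)
    return stages


def window_sum(grid, x, y, w, h):
    """Brute-force sum of the w-by-h window whose corner is at (x, y)."""
    return sum(grid[i][j] for i in range(x, x + w) for j in range(y, y + h))


random.seed(1)
NX, NY = 7, 11                                     # ROIx*2+3 and ROIy*2+3 for the defaults
grid = [[random.randrange(1024) for _ in range(NY)] for _ in range(NX)]
stages = build_clusters(grid)

mismatches = 0
counts = {}
for c, (w, h, *_rest) in enumerate(CI, start=1):
    counts[(w, h)] = (NX - w + 1) * (NY - h + 1)   # number of sums in this stage
    for x in range(NX - w + 1):
        for y in range(NY - h + 1):
            if stages[c][x][y] != window_sum(grid, x, y, w, h):
                mismatches += 1
print(mismatches, counts)
```

For the default grid the model yields 55 sums of size 3 by 1, 60 of size 2 by 2, 45 of size 3 by 3 and 32 of size 4 by 4, which agrees with the counts labelled on the block diagram, and the largest possible 4 by 4 sum (16 times 1023) still fits in the 14-bit cluster words.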
Other work in Stockholm
• Work on the CMM code for Jet merging (Sam Silverstein).
– Based on the CP merger design – simple modifications to accommodate 16 JEMs
– Simple version – no special FCAL treatment
– Have completed crate merger FPGA source for both crate and system mergers
– Next steps:
• Compile and simulate crate merger designs
• System merging FPGA (not much to do)