A new concept to use 3D vertical integration technology for fast pattern recognition
Ted Liu, Jim Hoff, Grzegorz Deptuch, Ray Yarema
Fermilab
Questions or Comments: [email protected]
Introduction and Outline
The development of 3D technology for the solution of the fast pattern recognition problem is part of a broader, ongoing R&D effort that includes both 2D and 3D solutions.
This talk will cover:
◦ An introduction to the problem
◦ A description of the Associative Memory solution
◦ A new concept – VIPRAM – that uses emerging 3D technology
[Figure: panels labeled 10^32 cm^-2 s^-1, 10^33, 10^34 and 10^35 (simulation)]
The Obvious Problem…
There are enormous challenges in implementing pattern recognition for a tracking trigger at LHC (L1&L2), due to:
1. The much higher occupancy and event rates at the LHC
2. The much more massive detectors
3. The larger number of channels in their tracking volumes
There is a clear need to develop/improve the hardware-based pattern recognition technology to advance the state-of-the-art for the future
The Challenges
To increase the pattern density by 3 orders of magnitude (from the original AMchips) and increase the speed by more than a factor of 3 while reducing power consumption (or at least dramatically reducing the rate of increase of power consumption) [1].
[1] Based on the extensive simulation studies by the ATLAS FTK Collaboration
Some Obvious Questions…
Can’t we just use what we currently have and just make bigger PC boards or more of them?
◦ No. This results in severe speed bottlenecks and power issues.
Can’t we just use commercial CAMs?
◦ No. CAMs are part of the fast pattern recognition process, but not all of it. Alone, CAMs lack certain necessary features, making them unsuitable for fast track triggering.
It’s not a CAM; it’s a PRAM
A CAM (Content Addressable Memory) is a classical digital system building block.
• One pattern at a time
• Each CAM cell responds or does not respond to the current pattern
• There is no memory of previous matches
[Figure: Pattern 1, Pattern 3 and Pattern 7 presented in turn, with the matching CAM cells flagging “Match”]
It’s not a CAM; it’s a PRAM
A PRAM, on the other hand, is a Pattern Recognition Associative Memory.
[Figure: animation in which hits arrive in sequence (Layer 1 Address 4, Layer 3 Address 7, Layer 3 Address 9, Layer 2 Address 1, Layer 4 Address 4, Layer 2 Address 4). Matches accumulate from hit to hit until one pattern flags “Road!”]
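The difference between the two memories can be sketched in a few lines of Python. This is an illustrative model only: the class names `Cam` and `Pram`, the stored patterns, and the rule that a road needs a match on every layer are assumptions made for the example, not details from the talk. The hit sequence follows the one in the animation.

```python
class Cam:
    """Plain CAM: answers only for the word presented right now."""

    def __init__(self, words):
        self.words = list(words)

    def search(self, word):
        # Indices of the cells whose stored word equals the current input.
        return [i for i, w in enumerate(self.words) if w == word]


class Pram:
    """PRAM sketch: one stored address per detector layer, plus match memory."""

    def __init__(self, patterns):
        # patterns[i][layer] is the address pattern i expects on that layer.
        self.patterns = [tuple(p) for p in patterns]
        self.matched = [[False] * len(p) for p in self.patterns]

    def hit(self, layer, address):
        # A hit sets the stored match bit of every pattern that expects it.
        for pattern, bits in zip(self.patterns, self.matched):
            if pattern[layer] == address:
                bits[layer] = True

    def roads(self):
        # Assumption: a road is a pattern whose layers have all matched.
        return [i for i, bits in enumerate(self.matched) if all(bits)]


# Hypothetical stored data, chosen only to exercise the model.
cam = Cam([4, 7, 9])
pram = Pram([(4, 4, 9, 4), (4, 1, 7, 2)])

# (layer, address) pairs in the order shown in the animation (layers 1-based).
hits = [(1, 4), (3, 7), (3, 9), (2, 1), (4, 4), (2, 4)]
for layer, address in hits:
    pram.hit(layer - 1, address)

print(cam.search(7))   # cells matching the current word only
print(pram.roads())    # patterns completed across all layers
```

The CAM forgets each word as soon as the next one arrives, while the PRAM keeps one match bit per layer, so hits from different layers can arrive in any order and still complete a road.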
History and the “traditional” effort
The AMchips were invented and developed in Italy, resulting in the AMchip03, which is currently being used by CDF.
There is an ongoing effort, led by Italians, to improve on the AMchip03 design. We are now a part of this collaboration.
The idea, of course, is to increase pattern density and speed and to optimize the performance.
The design is in deep sub-micron processes. The current target is 65 nm.
Limitations in 2D…
A Single PRAM Cell (in 2 dimensions)
[Figure: CAM Cells, Match Storage, Match lines, Glue Logic]
In the older version of the AMchip, the match lines were a source of speed limitation because of their length and capacitance. The Glue Logic was large and slow.
Length -> Capacitance -> Reduced Speed
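As a first-order illustration of that chain, the usual circuit approximations can be written down. They are not taken from the talk, and every symbol is an assumption introduced here: $\ell$ is the match line length, $c$ its capacitance per unit length, $R_{\text{drive}}$ the resistance of the cell driving the line, $\alpha$ the switching activity, $V_{DD}$ the supply voltage and $f$ the clock frequency.

```latex
C_{\text{line}} \approx c\,\ell,
\qquad
t_{\text{match}} \propto R_{\text{drive}}\,C_{\text{line}},
\qquad
P_{\text{dyn}} \approx \alpha\,C_{\text{line}}\,V_{DD}^{2}\,f
```

Under these assumptions a shorter line lowers both the delay and the dynamic power in proportion to its length.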
THE CONCEPT – VIPRAM
Vertically Integrated Pattern Recognition Associative Memory
A Reduced Footprint and therefore greater pattern density.
Shorter Match lines and therefore greater speed.
Less Capacitance and therefore reduced power consumption
Each detector layer corresponds to a single tier
All communication from “CAM Tiers” to the single “Control Tier”
The PRAM concept is tailor-made for 3D design.
[Figure annotations: “Much Shorter” (shown twice)]
Another Single CAM Cell (this time in 3 dimensions)
Viewing this structure as a pseudo-layout, some of the aforementioned benefits become even more obvious.
The 3-dimensional design of the VIPRAM makes the PRAM appear like a 2-dimensional array of “tubes”, each dedicated to a single pattern.
Communication with the outside world during normal operation is done solely through the Control Tier (the blue tier on top).
Pattern recognition for tracking is naturally a task in 3D
[Figure labels: track, road]
Majority Logic – Old Version
[Figure: Match Lines feed an Adder; a Digital Comparator compares the sum with a User-defined Threshold and produces the Road Flag]
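Read as an adder followed by a digital comparator, the old majority logic can be sketched as below. The function name `road_flag` and the use of "greater than or equal" for the comparison are assumptions; the slide only names the blocks.

```python
def road_flag(match_lines, threshold):
    """Model of the adder + comparator majority logic.

    match_lines: one truthy/falsy value per detector layer.
    threshold:   user-defined minimum number of matching layers (assumed >=).
    """
    total = sum(1 for m in match_lines if m)  # adder: count the matched layers
    return total >= threshold                 # comparator against the threshold


print(road_flag([1, 1, 0, 1], threshold=3))
print(road_flag([1, 0, 0, 1], threshold=3))
```

The cost of this form is a multi-bit adder and comparator per pattern, which is what the newer approach tries to avoid.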
Majority Logic – New Version
[Figure: array of two-input multiplexers (inputs 0 and 1, select Sel) in Pass Transistor Logic; inputs Match1, Match2, Match3, Match4; output Match Pattern]
Majority Logic – New Approach
For each stage…
Stage Input    Stage Output: Match    Stage Output: Mismatch
111            111                    011
011            011                    001
001            001                    000
000            000                    000
In the end…
Majority Pattern    Meaning
111                 Perfect Match
011                 1 Missing Layer
001                 2 Missing Layers
000                 3 or More Missing Layers
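The stage table amounts to "pass the pattern through on a match, shift a zero in from the left on a mismatch". A minimal Python sketch of that reading follows; the function names and the assumption that the first stage is fed 111 are mine, not stated on the slide.

```python
PERFECT = 0b111  # assumed input to the first stage

MEANING = {
    0b111: "Perfect Match",
    0b011: "1 Missing Layer",
    0b001: "2 Missing Layers",
    0b000: "3 or More Missing Layers",
}


def stage(pattern, matched):
    """One majority stage: unchanged on a match, shifted right on a mismatch."""
    return pattern if matched else pattern >> 1


def majority_pattern(match_bits):
    """Chain one stage per detector layer and return the final 3-bit pattern."""
    pattern = PERFECT
    for matched in match_bits:
        pattern = stage(pattern, matched)
    return pattern


for bits in ([1, 1, 1, 1], [1, 0, 1, 1], [0, 1, 0, 1], [0, 0, 1, 0]):
    result = majority_pattern(bits)
    print(bits, format(result, "03b"), MEANING[result])
```

Because each stage depends only on its own layer's match bit and the pattern handed to it, the chain splits cleanly along detector layers.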
Can 3D exploit even more advantages from the new Majority Logic?
Yes. We have divided the 3D design by detector layer (i.e. each CAM Tier is dedicated to one detector layer). Therefore, any logical division by detector layer results in functions that can be sub-divided by tier.
[Figure: the multiplexer array divided by tier, with one group of multiplexers each for Match 1, Match 2, Match 3 and Match 4, producing the Match Pattern]
Readout
The top tier (a.k.a. the Control Tier) is a two-dimensional array of elements whose position is indicative of its address and that contains an indication of whether or not a road was found. Compare this with a pixel array, which is a two-dimensional array of elements whose position is indicative of its address and that contains an indication of whether or not a hit was found.
In other words, high-speed readout architectures for pixel arrays can and should be used for VIPRAM readout.
Design for Simplicity
The VIPRAM has two types of tiers, CAM and Control. In the final design, there will be several CAM tiers and only one Control tier.
Each CAM tier is functionally identical to the others, but must maintain a unique relationship to the Control tier in order to work. In other words, patterns that come into the Control Tier from Detector “1” must be sent to the CAM tier dedicated to Detector “1”. Similarly, when data is sent from CAM tier #3, the Control Tier must know it came from CAM tier #3 and not some other CAM tier.
How can this be done without requiring unique mask sets for each CAM tier?
Great minds think alike?
Having gone part-way through this design procedure, the collaboration had the opportunity to meet with Bob Patti of Tezzaron, who has been involved in 3D memory design from the beginning.
Tezzaron’s 3D Memories follow exactly this arrangement of Control Tier and (in Tezzaron’s case) Memory Tier.
In other words, we are following a beaten path, not blazing a new trail.
The Diagonal Via
The Diagonal Via was patented by Bob Patti and Tezzaron in 2000. It converts vertical position to horizontal position and allows a common mask set to provide unique access to each layer.
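One way to picture "vertical position converted to horizontal position" is a bus whose wires step sideways by one slot at every tier, while every tier taps the same slot. The sketch below is a toy model of that idea under my own assumptions (the function name, the single tap at slot 0 and the wrap-around shift); it is not a description of Tezzaron's actual via geometry.

```python
def route_through_tiers(bus, n_tiers):
    """Return the signal each identical tier ends up tapping.

    bus:     signals driven by the Control tier, one per horizontal slot.
    n_tiers: number of stacked tiers built from the same mask set.
    """
    wires = list(bus)
    tapped = []
    for _ in range(n_tiers):
        tapped.append(wires[0])          # every tier taps the same slot (slot 0)
        wires = wires[1:] + wires[:1]    # diagonal step: each wire moves one slot over
    return tapped


signals = ["detector 1", "detector 2", "detector 3", "detector 4"]
for tier, signal in enumerate(route_through_tiers(signals, 4), start=1):
    print(f"CAM tier {tier} receives the lines for {signal}")
```

In this model the tiers are identical, yet tier k sees only the k-th signal, because the sideways step accumulates with height in the stack.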
Conclusions and Future Work
The VIPRAM is a new concept, and we are now developing a collaboration with Fermilab, University of Chicago, INFN and Argonne.
◦ The immediate goal is a proof of principle
◦ The ultimate goal is a 3 order-of-magnitude increase in performance (density+speed).
At present, we are seeking funding for the VIPRAM development.
You will hear from us again at the next TIPP (please pick a nice place for my wife…)
Background
Figure 13 - Pass Transistor Multiplexors in the Majority Logic
VIPRAM – A Vertically Integrated PRAM
Modern technology provides us with another approach…and another dimension.
At first, the idea was extremely simple – increase the pattern density by stacking otherwise normal AMchips. The outputs of existing AMchips are already in a daisy chain. The stacked AMchips would not need to “know” that they were part of a stack.
VIPRAM – A Vertically Integrated PRAM
This was necessarily modified to include “wrapping” an AMchip in circuitry that dealt with the 3D stacking, leaving an AMchip core that was identical to the 2D AMchips that are under development.
[Figure: 3D Wrapper | AMchip03 Clone | 3D Wrapper]
Not the first to consider 3D Content Addressable Memory
Oh and Franzon [1] first suggested the advantages of 3D design on CAMs in 2007.
Their idea involved vertically integrating the CAM cell itself so that the Matchline was vertical. This minimized its length and therefore its capacitance.
The method is highly impractical since it requires f(N) 3D layers where N is the number of bits in the CAM cell.
[Figure: CAM Bit Cells stacked on 3D Layer 1, 3D Layer 2, 3D Layer 3, … 3D Layer N, joined by a vertical Matchline]
[1] E.C. Oh and P.D. Franzon, “Design Considerations and Benefits of Three-Dimensional Ternary Content Addressable Memory”, IEEE Custom Integrated Circuits Conference, 2007, p. 591
Again, this is a PRAM, not a CAM
There is a perfectly natural, 3D functional division in a PRAM. Each detector layer gets its own 3D layer.
The vertical interconnect is not the CAM match line, but the Road line.
Moreover, each detector layer has independent data lines for both pattern matching and pattern loading, and this is a natural consequence of this architecture.
[Figure: Match cells on 3D Layer 1, 3D Layer 2, 3D Layer 3, … 3D Layer N, joined by a vertical line that flags “Road!”]
How can we improve on this design?
[Figure: chip layout with 4 blocks of 1280 6-layer patterns]
~80% AM bank
~20% control & interface
Move to another tier in 3D
How can we fundamentally improve on this design?
Within each pattern block, the majority block (still in standard cells, ~30%) can also be moved to the control/interface tier in 3D.
Fischer Tree (Mephisto Logic)
P. Fischer introduced the Mephisto readout architecture [1].
We found “Fischer Tree” easier to say.
It is a self-selecting, self-addressing priority encoding architecture that performs the task in log[N] time.
[1] “First implementation of the MEPHISTO binary readout architecture for strip detectors”, Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, Volume 461, Issues 1-3, 1 April 2001, Pages 499-504 (8th Pisa Meeting on Advanced Detectors)
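The logarithmic behaviour can be illustrated with a software model of a generic binary-tree priority encoder. This is a sketch of the general technique, not the MEPHISTO circuit itself; the function names, the left-first priority and the power-of-two input length are assumptions made for the example.

```python
def build_or_tree(hits):
    """OR-reduce the hit flags level by level; levels[0] is the leaf level.

    Assumes len(hits) is a power of two.
    """
    levels = [[bool(h) for h in hits]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] or prev[i + 1] for i in range(0, len(prev), 2)])
    return levels


def select_first(hits):
    """Address of the highest-priority (lowest-index) hit, or None if no hit."""
    levels = build_or_tree(hits)
    if not levels[-1][0]:
        return None
    address = 0
    # Walk from the root to the leaves: one decision, and one address bit, per level.
    for level in reversed(levels[:-1]):
        left = 2 * address
        address = left if level[left] else left + 1
    return address


def read_all(hits):
    """Read out every hit in priority order, clearing each one after selection."""
    remaining = list(hits)
    addresses = []
    address = select_first(remaining)
    while address is not None:
        addresses.append(address)
        remaining[address] = 0
        address = select_first(remaining)
    return addresses


print(read_all([0, 0, 1, 0, 0, 1, 0, 0]))
```

The address is never stored anywhere: it falls out of the left/right choices made on the way down, so only cells with a hit take up readout time and each selection needs about log2(N) decisions.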
Fischer Tree (Mephisto Logic)
Fischer Trees can be stacked if need be, so the two-dimensional array in the Control Tier can be handled this way.
An alternate approach could take each output and push it into a stack.
[Figure: one Fischer Tree per column (Col 1, Col 2, Col 3, … Col N), all feeding a final Fischer Tree]