
A new concept to use 3D vertical integration technology for fast pattern recognition

Ted Liu, Jim Hoff, Grzegorz Deptuch, Ray Yarema

Fermilab

Questions or Comments: [email protected]

Introduction and Outline

The development of 3D technology for the solution of the fast pattern recognition problem is part of a broader, ongoing R&D effort that includes both 2D and 3D solutions.

This talk will cover:
◦ An introduction to the problem
◦ A description of the Associative Memory solution
◦ A new concept – VIPRAM – that uses emerging 3D technology

[Figure: simulation at luminosities of 10^32, 10^33, 10^34 and 10^35 cm^-2 s^-1]

The Obvious Problem…

There are enormous challenges in implementing pattern recognition for a tracking trigger at LHC (L1&L2), due to

1. The much higher occupancy and event rates at the LHC
2. The much more massive detectors
3. The larger number of channels in their tracking volumes

There is a clear need to develop/improve the hardware-based pattern recognition technology to advance the state-of-the-art for the future

The Challenges

To increase the pattern density by 3 orders of magnitude (from the original AMchips) and increase the speed by more than a factor of 3 while reducing power consumption (or at least dramatically reducing the rate of increase of power consumption) [1].

[1] Based on the extensive simulation studies by the ATLAS FTK Collaboration

Some Obvious Questions…

Can’t we just use what we currently have and just make bigger PC boards or more of them?
◦ No. This results in severe speed bottlenecks and power issues.

Can’t we just use commercial CAMs?
◦ No. CAMs are part of the fast pattern recognition process, but not all of it. Alone, CAMs lack certain necessary features, making them unsuitable for fast track triggering.

It’s not a CAM; it’s a PRAM

A CAM (Content Addressable Memory) is a classical digital system building block.

• One pattern at a time
• Each CAM cell responds or does not respond to the current pattern
• There is no memory of previous matches

[Figure: CAM animation. Pattern 1, Pattern 3 and Pattern 7 are presented in turn; the cells holding the current pattern are flagged "Match".]

It’s not a CAM; it’s a PRAM

A PRAM, on the other hand, is a Pattern Recognition Associative Memory.

[Figure: PRAM animation. Hits arrive in the order (Layer 1, Address 4), (Layer 3, Address 7), (Layer 3, Address 9), (Layer 2, Address 1), (Layer 4, Address 4), (Layer 2, Address 4). "Match" flags accumulate from step to step until one pattern is flagged "Road!".]
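The difference between a CAM and a PRAM can be illustrated with a minimal Python sketch. It assumes that each stored pattern holds one address per detector layer, that a match on a layer is latched until the event ends, and that a road requires all layers to match. The class and function names and the two stored patterns are hypothetical; only the hit sequence is taken from the figure.

```python
from dataclasses import dataclass, field


@dataclass
class PramPattern:
    """One stored pattern: one expected address per detector layer."""
    addresses: dict                              # layer -> stored address (hypothetical)
    matched: set = field(default_factory=set)    # layers that have matched so far

    def present(self, layer, address):
        # CAM-like comparison for a single layer. Unlike a plain CAM,
        # the result is remembered instead of being lost on the next hit.
        if self.addresses.get(layer) == address:
            self.matched.add(layer)

    def is_road(self):
        # Assumption: a road needs a match on every layer.
        return len(self.matched) == len(self.addresses)


def find_roads(patterns, hits):
    """Broadcast every hit to every pattern, then report complete patterns."""
    for layer, address in hits:
        for pattern in patterns:
            pattern.present(layer, address)
    return [i for i, pattern in enumerate(patterns) if pattern.is_road()]


# Two hypothetical stored patterns and the hit sequence from the figure.
patterns = [
    PramPattern({1: 4, 2: 4, 3: 9, 4: 4}),
    PramPattern({1: 4, 2: 1, 3: 7, 4: 2}),
]
hits = [(1, 4), (3, 7), (3, 9), (2, 1), (4, 4), (2, 4)]
roads = find_roads(patterns, hits)
```

The hits may arrive in any layer order; because matches are stored, the first pattern still completes when its last layer arrives.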

History and the “traditional” effort

The AMchips were invented and developed in Italy, resulting in the AMchip03, which is currently being used by CDF.

There is an ongoing effort, led by Italians, to improve on the AMchip03 design. We are now a part of this collaboration.

The idea, of course, is to increase pattern density and speed and to optimize the performance.

Design in deep sub-micron processes. The current target is 65nm.

Limitations in 2D…

A Single PRAM Cell (in 2 dimensions)

[Figure labels: CAM Cells, Match Storage, Match lines, Glue Logic]

In the older version of the AMchip, the match lines were a source of speed limitation because of their length and capacitance. The Glue Logic was large and slow.

Length -> Capacitance -> Reduced Speed
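As a rough worked example of this chain, the sketch below uses the standard first-order estimate for a distributed RC line (50% delay of about 0.38·R·C) and the supply energy C·V² needed to charge the line once. The per-micron resistance and capacitance and the supply voltage are placeholder assumptions, not AMchip values, and the loading from the cells attached to the line is ignored.

```python
def wire_delay_s(length_um, r_ohm_per_um, c_f_per_um):
    """Approximate 50% delay of a distributed RC line: 0.38 * R_total * C_total."""
    r_total = r_ohm_per_um * length_um
    c_total = c_f_per_um * length_um
    return 0.38 * r_total * c_total


def switching_energy_j(length_um, c_f_per_um, vdd):
    """Energy drawn from the supply to charge the line once: C_total * Vdd^2."""
    return c_f_per_um * length_um * vdd ** 2


# Placeholder wire parameters (assumptions for illustration only).
R_PER_UM = 0.1        # ohm per micron
C_PER_UM = 0.2e-15    # farad per micron
VDD = 1.2             # volts

for length in (1000, 500, 250):
    delay = wire_delay_s(length, R_PER_UM, C_PER_UM)
    energy = switching_energy_j(length, C_PER_UM, VDD)
    print(f"{length:5d} um: delay {delay:.3e} s, energy {energy:.3e} J")
```

In this model, halving the line length quarters the wire delay and halves the switching energy, which is the motivation for shortening the match lines.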

THE CONCEPT – VIPRAM

Vertically Integrated Pattern Recognition Associative Memory

A Reduced Footprint and therefore greater pattern density.

Shorter Match lines and therefore greater speed.

Less Capacitance and therefore reduced power consumption

Each detector layer corresponds to a single tier

All communication from “CAM Tiers” to the single “Control Tier”

The PRAM concept is tailor-made for 3D design.

[Figure annotations: "Much Shorter" (shown twice)]

Another Single CAM Cell (this time in 3 dimensions)

Viewing this structure as a pseudo-layout, some of the aforementioned benefits become even more obvious.

The 3-dimensional design of the VIPRAM makes the PRAM appear like a 2-dimensional array of “tubes”, each dedicated to a single pattern.

Communication with the outside world during normal operation is done solely through the Control Tier (the blue tier on top).

Pattern recognition for tracking is naturally a task in 3D

[Figure labels: track, road]

Majority Logic – Old Version

[Figure: Match Lines → Adder → Digital Comparator (with User-defined Threshold) → Road Flag]
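The adder-plus-comparator scheme in this figure can be sketched in a few lines of Python. The function name is hypothetical, and the use of "greater than or equal" for the comparison is an assumption; the slide only names the blocks.

```python
def road_flag(match_lines, threshold):
    """Count the matched layers (adder) and compare with the threshold (comparator)."""
    matched_layers = sum(1 for m in match_lines if m)
    return matched_layers >= threshold


# Four layers, road required on at least three of them.
print(road_flag([1, 1, 1, 0], threshold=3))
print(road_flag([1, 0, 1, 0], threshold=3))
```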

Majority Logic – New Version

[Figure: Pass Transistor Logic. Twelve 2-to-1 multiplexers (inputs 0 and 1, select Sel) driven by Match1, Match2, Match3 and Match4, producing the Match Pattern.]

Majority Logic – New Approach

For each stage…

Stage Input    Stage Output: Match    Stage Output: Mismatch
111            111                    011
011            011                    001
001            001                    000
000            000                    000

In the end…

Majority Pattern    Meaning
111                 Perfect Match
011                 1 Missing Layer
001                 2 Missing Layers
000                 3 or More Missing Layers
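The stage table amounts to a shift: a matching layer passes the majority pattern through unchanged, and a missing layer shifts a 0 in from the left. A minimal Python sketch of that behaviour follows; the function name and the `MEANING` dictionary are illustrative, and the sketch models the logic only, not the pass-transistor circuit.

```python
MEANING = {
    "111": "Perfect Match",
    "011": "1 Missing Layer",
    "001": "2 Missing Layers",
    "000": "3 or More Missing Layers",
}


def majority_pattern(matches, width=3):
    """Run the majority pattern through one stage per detector layer."""
    pattern = (1 << width) - 1          # start from all ones, e.g. 111
    for matched in matches:
        if not matched:
            pattern >>= 1               # mismatch: 111 -> 011 -> 001 -> 000
    return format(pattern, f"0{width}b")


for matches in ([1, 1, 1, 1], [1, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]):
    result = majority_pattern(matches)
    print(matches, result, MEANING[result])
```

Because each stage depends only on its own match bit and the pattern handed to it, the stages can be evaluated one layer at a time.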

Can 3D exploit even more advantages from the new Majority Logic?

Yes. We have divided the 3D design by detector layer (i.e. each CAM Tier is dedicated to one detector layer). Therefore, any logical division by detector layer results in functions that can be sub-divided by tier.

Can 3D exploit even more advantages from the new Majority Logic?

[Figure: the twelve multiplexers regrouped into four groups of three, one group each for Match 1, Match 2, Match 3 and Match 4, producing the Match Pattern.]

Readout

The top tier (a.k.a. the Control Tier) is a two dimensional array of elements whose position is indicative of its address and that contains an indication of whether or not a road was found. Compare this with a pixel array, which is a two dimensional array of elements whose position is indicative of its address and that contains an indication of whether or not a hit was found.

In other words, high-speed readout architectures for pixel arrays can and should be used for VIPRAM readout.
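The analogy can be made concrete with a small Python sketch that treats the Control Tier as a 2D array of road flags and reads out only the flagged positions. The row-major address convention and the function name are assumptions made for illustration.

```python
def road_addresses(flags):
    """Return the addresses of all set road flags in a 2D array (row-major)."""
    n_cols = len(flags[0])
    return [
        row * n_cols + col
        for row, line in enumerate(flags)
        for col, flag in enumerate(line)
        if flag
    ]


# A 3 x 3 Control Tier with three roads found.
control_tier = [
    [0, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
]
print(road_addresses(control_tier))
```

The same function would read out a pixel array if the flags were hits instead of roads, which is the point of the comparison.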

Design for Simplicity

The VIPRAM has two types of tiers, CAM and Control. In the final design, there will be several CAM tiers and only one Control tier.

Each CAM tier is functionally identical to the others, but must maintain a unique relationship to the Control tier in order to work. In other words, patterns that come into the Control Tier from Detector “1” must be sent to the CAM tier dedicated to Detector “1”. Similarly, when data is sent from CAM tier #3, the Control Tier must know it came from CAM tier #3 and not some other CAM tier.

How can this be done without requiring unique mask sets for each CAM tier?

Great minds think alike?

Having gone part-way through this design procedure, the collaboration had the opportunity to meet with Bob Patti of Tezzaron, who has been involved in 3D memory design from the beginning.

Tezzaron’s 3D Memories follow exactly this arrangement of Control Tier and (in Tezzaron’s case) Memory Tier.

In other words, we are following a beaten path, not blazing a new trail.

The Diagonal Via

The Diagonal Via was patented by Bob Patti and Tezzaron in 2000. It converts vertical position to horizontal position and allows a common mask set to provide unique access to each layer.
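One way to picture how a vertical position becomes a horizontal one is the toy Python model below. It assumes every tier is physically identical: each taps the same local lane (lane 0) and hands the remaining lanes up to the next tier shifted by one position. This is an illustrative model of the idea as stated on the slide, not a description of Tezzaron's actual implementation, and the function name is hypothetical.

```python
def route_through_stack(lanes, n_tiers):
    """Return what each identical tier sees on its local tap (lane 0)."""
    lanes = list(lanes)
    seen = []
    for _ in range(n_tiers):
        seen.append(lanes[0])          # every tier taps the same physical position
        lanes = lanes[1:] + [None]     # diagonal step: lane i+1 lands on lane i above
    return seen


# The Control Tier drives one lane per CAM tier; tier k receives lane k.
print(route_through_stack(["det0", "det1", "det2", "det3"], n_tiers=4))
```

In this model the horizontal lane chosen at the Control Tier selects the tier, so no tier needs a unique mask set.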

Conclusions and Future Work

The VIPRAM is a new concept, and we are now developing a collaboration with Fermilab, University of Chicago, INFN and Argonne.
◦ The immediate goal is a proof of principle
◦ The ultimate goal is a 3 order-of-magnitude increase in performance (density+speed).

At present, we are seeking funding for the VIPRAM development.

You will hear from us again at the next TIPP (please pick a nice place for my wife…)

Background

Figure 13 - Pass Transistor Multiplexors in the Majority Logic

VIPRAM – A Vertically Integrated PRAM

Modern technology provides us with another approach…and another dimension.

At first, the idea was extremely simple – increase the pattern density by stacking otherwise normal AMchips. The outputs of existing AMchips are already in a daisy chain. The stacked AMchips would not need to “know” that they were part of a stack.

VIPRAM – A Vertically Integrated PRAM

This was necessarily modified to include “wrapping” an AMchip in circuitry that dealt with the 3D stacking, leaving an AMchip core that was identical to the 2D AMchips that are under development.

[Figure: 3D Wrapper | AMchip03 Clone | 3D Wrapper]

Not the first to consider 3D Content Addressable Memory

Oh and Franzon [1] first suggested the advantages of 3D design on CAMs in 2007.

Their idea involved vertically integrating the CAM cell itself so that the Matchline was vertical. This minimized its length and therefore its capacitance.

The method is highly impractical since it requires f(N) 3D layers where N is the number of bits in the CAM cell.

[Figure: CAM Bit Cells stacked on 3D Layer 1, 3D Layer 2, 3D Layer 3, … 3D Layer N, joined by a vertical Matchline]

[1] E.C. Oh and P.D. Franzon, “Design Considerations and Benefits of Three-Dimensional Ternary Content Addressable Memory”, IEEE Custom Integrated Circuits Conference, 2007, p. 591

Again, this is a PRAM not a CAM

There is a perfectly natural, 3D functional division in a PRAM. Each detector layer gets its own 3D layer.

The vertical interconnect is not the CAM match line, but the Road line.

Moreover, each detector layer has independent data lines for both pattern matching and pattern loading, and this is a natural consequence of this architecture.

[Figure: Match cells on 3D Layer 1, 3D Layer 2, 3D Layer 3, … 3D Layer N, joined by a vertical Road line ("Road!")]

How can we improve on this design?

[Figure: chip layout with 4 blocks of 1280 6-layer patterns]

~80% AM bank

~20% control & interface: move to another tier in 3D

How can we fundamentally improve on this design?

Within each pattern block, the majority block (still in standard cells, ~30%) can also be moved to the control/interface tier in 3D.

Fischer Tree (Mephisto Logic)

P. Fischer introduced the Mephisto readout architecture [1].

We found “Fischer Tree” easier to say.

It is a self-selecting, self-addressing priority encoding architecture that performs the task in log[N] time.

[1] “First implementation of the MEPHISTO binary readout architecture for strip detectors”, Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, Volume 461, Issues 1-3, 1 April 2001, Pages 499-504 (8th Pisa Meeting on Advanced Detectors)
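To show why a tree-shaped priority encoder finds a hit in log[N] steps, here is a small software analogy in Python. It builds a binary OR tree over the flags and walks down from the root, always preferring the left branch. It is a sketch of the general idea only, not of the MEPHISTO circuit; the function names are hypothetical, and the number of flags is assumed to be a power of two.

```python
def build_or_tree(flags):
    """levels[0] holds the flags; each higher level ORs adjacent pairs."""
    levels = [[bool(f) for f in flags]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] or prev[i + 1] for i in range(0, len(prev), 2)])
    return levels


def first_hit_address(flags):
    """Address of the highest-priority (lowest-index) set flag, or None."""
    levels = build_or_tree(flags)
    if not levels[-1][0]:
        return None                     # root is empty: nothing to read out
    index = 0
    for level in reversed(levels[:-1]): # one decision per tree level: log2(N) steps
        index *= 2
        if not level[index]:            # left child empty, so the hit is on the right
            index += 1
    return index


def read_out(flags):
    """Read every hit in priority order, clearing each one after it is read."""
    flags = list(flags)
    addresses = []
    while (address := first_hit_address(flags)) is not None:
        addresses.append(address)
        flags[address] = 0
    return addresses


print(read_out([0, 0, 1, 0, 0, 1, 0, 0]))
```

Each hit identifies itself and yields its own address on the way down the tree, which is what "self-selecting, self-addressing" refers to in this analogy.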

Fischer Tree (Mephisto Logic)

Fischer Trees can be stacked if need be, so the two dimensional array in the Control Tier can be handled this way.

An alternate approach could take each output and push it into a stack.

[Figure: one Fischer Tree per column (Col 1, Col 2, Col 3, … Col N), with a further Fischer Tree across the columns]
