
Designing a 3-D FPGA: Switch Box Architecture and Thermal Issues


  • 882 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 16, NO. 7, JULY 2008

    Designing a 3-D FPGA: Switch Box Architecture and Thermal Issues

    Aman Gayasen, Vijaykrishnan Narayanan, Mahmut Kandemir, Member, IEEE, and Arifur Rahman

    Abstract: Three-dimensional (3-D) integration is an attractive technology to reduce wirelengths in a field-programmable gate array (FPGA). However, it suffers from two problems: first, the inter-layer vias are limited in number, and second, the increased power density leads to high junction temperatures. In this paper, we tackle the first problem by designing switch boxes that maximize the use of the vias. Compared to the previously used subset switch box, our best switch box reduces the number of vias by about 49% and area-delay product by about 9%. For the second problem, we utilize the difference in power densities between CLBs and some of the hard blocks in modern FPGAs to distribute the power more uniformly across the FPGA. The peak temperature in a two-layer FPGA reduces by about 16 °C after our change.

    Index Terms: Field-programmable gate arrays (FPGAs), switch-box, thermal issues, three-dimensional (3-D) integration.

    I. INTRODUCTION

    Field-programmable gate arrays (FPGAs) are consistently improving in capacity and performance, and are now among the most popular devices in the market. With their regular structure, they also scale easily to future technologies. However, the large overheads of their programmable interconnect are severely limiting their growth. In an SRAM-based FPGA, the programmable interconnect resources take almost 70% of the die area and consume the major part of FPGA power. Furthermore, for most designs, they also constitute more than 50% of the critical path delay. Therefore, a reduction in the interconnect resources will greatly benefit FPGAs.

    Three-dimensional integration is a promising technique for reducing wire-lengths. It involves the stacking of multiple silicon wafers interconnected with vias. If every layer in a 3-D chip implements a normal (2-D) FPGA, stacking reduces the average Manhattan distance between logic blocks, which leads to fewer interconnect resources. Consequently, 3-D integration of FPGAs (which we refer to as 3-D FPGA) is an attractive technique to improve the performance of FPGAs. Other gains, such as reduced design footprint and the ability to integrate different technologies, further favor 3-D FPGAs.

    Manuscript received November 10, 2006; revised March 26, 2007. This work was supported in part by the National Science Foundation under NSF CAREER 0093085, NSF CCF 0702617, and by a grant from MARCO/GSRC.

    A. Gayasen is with R&D Department, Synopsys, Sunnyvale, CA 94043 USA (e-mail: [email protected]).

    N. Vijaykrishnan is with the Departments of Computer Science and Engineering and Electrical Engineering, Pennsylvania State University, University Park, PA 16802 USA.

    M. Kandemir is with the Computer Science and Engineering Department, Pennsylvania State University, University Park, PA 16802 USA.

    A. Rahman is with Xilinx Research Laboratories, San Jose, CA 95124 USA.

    Digital Object Identifier 10.1109/TVLSI.2008.2000456

    Three-dimensional technology, however, suffers from two problems: 1) the inter-layer vias are limited in number and 2) stacking increases the power density inside the package, which leads to high junction temperatures. These two issues are the focus of this paper. We present architecture-level solutions for increasing via utilization and for reducing the junction temperatures.

    The inter-layer vias are limited because they are large compared to the minimum feature size on the die. While the finest vias currently available are about 1 µm × 1 µm with a pitch of about 3 µm [1], the global wiring pitch within a die is about 290 nm for 65-nm technology [2]. Although fabrication engineers are trying to reduce the via dimensions, the minimum feature size on the die is also shrinking. Therefore, the inter-layer vias are expected to remain larger than the wire dimensions in metal layers within a die. In this paper, we tackle this problem by designing switch boxes that maximize the use of the vias. We design six types of switch boxes, each varying in the flexibility provided for inter-layer connectivity. The architectures are modeled in VPR, which we extended for 3-D. Empirical evaluation using MCNC benchmarks shows that, compared to the subset switch box used in previous studies [3], our best switch box reduces the number of vias by about 49% and area-delay product by about 9%.

    Junction temperature is a growing concern in FPGAs. Recent articles on thermal management from leading FPGA manufacturers [4], [5] clearly indicate the growing importance of thermal issues in FPGA designs. Improvements in fabrication technology, circuit design, architecture, and tools have all contributed toward an increase in FPGA logic density as well as clock frequency. Increased logic density and performance have in turn led to an increase in power densities, which manifests itself in the form of high temperatures. Since 3-D integration stacks multiple silicon layers, it increases the effective power density, which makes 3-D integrated circuits (ICs) suffer from severe thermal issues. In this paper, we utilize the difference in power densities between CLBs and some of the hard blocks in modern FPGAs to distribute the power more uniformly across the FPGA. Experimentation with a fabric resembling the Virtex-4 FPGA [6] shows a reduction in the peak temperature of a 2-layer FPGA by about 16 °C after our change.

    The remainder of this paper is organized as follows. Section II discusses related work. Section III gives a brief overview of 2-D switch boxes and 3-D technology. In Section IV, we explore six 3-D switch box (SB) topologies for the case when the vias are fewer than the horizontal wires.¹ The switch box topologies explored in this study are described in Section IV-A. Section IV-B explains the experimentation methodology and Section IV-C analyzes the exploration results. Section V analyzes the thermal profiles in FPGAs (see Section V-A), and then proposes an alternate organization of a two-layer FPGA to reduce the peak die temperature (see Section V-B). Finally, Section VI summarizes the contributions of this paper.

    ¹ A preliminary version of this work was presented at FCCM-06 [7].

    1063-8210/$25.00 © 2008 IEEE

  • GAYASEN et al.: DESIGNING A 3-D FPGA: SWITCH BOX ARCHITECTURE AND THERMAL ISSUES 883

    II. RELATED WORK

    A. 3-D FPGAs

    The advantages of 3-D FPGAs have evoked significant interest, and several studies have looked at them in the past. More than a decade ago, Alexander et al. [8] presented a 3-D FPGA that used package-level integration to stack multiple 2-D FPGAs interconnected using solder bumps. The minimum pitch of these vertical interconnects was 100 µm. Campenhout et al. [9] proposed opto-electronic FPGAs, in which the inter-chip communication used optical links. The optical links provide a large vertical channel density. The Rothko 3-D FPGA [10] was a 3-D extension of the Triptych sea-of-gates architecture [11], consisting of routing and logic blocks. The 3-D integration was done at the wafer level and inter-layer communication used metal vias. A dynamically reconfigurable 3-D FPGA was presented in [12], which consisted of three physical layers: routing and logic block layer, routing layer, and memory layer. Recently, Lin et al. [13] analyzed the performance benefits of a monolithically stacked 3-D FPGA. Their 3-D integration technology provided very fine vias, which allowed them to stack the configuration memory on top of the rest of the FPGA (logic blocks and interconnects).

    Researchers have also looked at theoretical models for 3-D FPGAs. Rahman et al. [14] presented an analytical model for predicting interconnect requirements in 3-D FPGAs, and estimated over 50% reduction in channel width, interconnect delay, and power dissipation, when compared to 2-D FPGAs. Kwon et al. [15] recently extended this model to incorporate clustered logic blocks (similar to Virtex-2 [6]).

    On the computer-aided design (CAD) front, Ababei et al. [3], [16] recently presented a partitioning-based placement algorithm for 3-D FPGAs, which primarily focused on reducing the inter-layer vias. However, their router was not timing-driven.

    Although several researchers have proposed 3-D FPGAs, the detailed routing architecture of a 3-D FPGA remains unexplored. Ababei et al. [3] assumed a subset switch block (see the definition in Section III-B). Although Wu et al. [17] designed universal 3-D switch blocks, they used track count as the sole metric of quality. Furthermore, they assumed that the number of inter-layer vias is the same as the horizontal channel width. In today's technology, especially if we stack more than two layers, the vias are much thicker than the horizontal wires (1 µm versus 0.1 µm), which makes this assumption impractical.

    B. Thermal Issues

    Package designers have been considering thermal issues for a long time. Heat sinks, spreaders, and fans are the most common examples of package-level techniques. Instead of considering variations in the temperatures on the die, they design the package to support the worst-case specifications of the design. They typically provide the user with the thermal resistance of the package, $\theta_{JA}$, which is used to estimate the junction temperature $T_J$ using

    $$T_J = T_A + \theta_{JA} \cdot P \qquad (1)$$

    where $T_A$ is the ambient temperature, and $P$ refers to the total power consumed by the chip.
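As a small worked example of this estimate, the sketch below assumes the usual linear relation, junction temperature = ambient temperature + package thermal resistance × total power. The function name and the numbers are hypothetical and chosen only for illustration.

```python
def junction_temperature(t_ambient_c, theta_ja_c_per_w, power_w):
    """Steady-state estimate: ambient temperature plus thermal resistance times power."""
    return t_ambient_c + theta_ja_c_per_w * power_w


# Hypothetical package: 10 degC/W thermal resistance, 5 W chip, 25 degC ambient.
t_j = junction_temperature(t_ambient_c=25.0, theta_ja_c_per_w=10.0, power_w=5.0)
print(t_j)  # 75.0
```

Because the estimate uses only the total power, it gives a single number for the whole die and says nothing about hotspots, which is why die-level thermal simulation is needed later.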

    As designing the package for the worst-case junction temperature started becoming too expensive, researchers started looking at design-level solutions to reduce the temperature. A common example is dynamic thermal management (DTM), where the design is run at a reduced power (and performance) if the chip temperature increases beyond a previously set threshold. Thermal sensors measure the temperature, and power is reduced by lowering the clock frequency or the supply voltage, or by clock gating [18].

    Design-level techniques can also aid in removing the heat generated by the design. For example, thermal-aware floorplanning tries to reduce the hotspots on the die by distributing the temperature uniformly [19], [20]. Researchers have mostly focused on microprocessors in these works. Thermal placement is a similar technique applied at the placement stage. Chen and Sapatnekar [21] proposed a partition-driven algorithm for standard cell thermal placement. Thermal floorplanning and placement are particularly attractive because they impact the performance less than DTM.

    On the modeling front, several researchers have developed tools for estimating the die temperature. Among them, HotSpot [22] is an architecture-level thermal simulator, which can perform transient as well as steady-state temperature estimation. HS3d [23] is another architecture-level tool that performs only steady-state temperature estimation, but is orders of magnitude faster than HotSpot. Both HS3d and HotSpot provide the flexibility to set several package and die parameters, such as the spreader thickness, package-to-air thermal resistance, and substrate thickness. Since, in this work, we look at only steady-state temperatures, we use HS3d.

    Recently, some researchers have proposed solutions for thermal issues in 3-D ICs too. Cong et al. [24] suggested a thermal-driven floorplanning for 3-D. Goplen and Sapatnekar [25] also proposed a temperature-driven placement algorithm for 3-D standard cell application-specific integrated circuits (ASICs). Studies have also indicated that careful insertion of thermal vias can reduce the peak temperature [26], [27].

    Thermal issues in FPGAs are relatively unexplored. Some researchers have proposed the use of distributed sensors for monitoring temperatures in FPGAs [28], [29]. They, however, considered only configurable logic blocks (CLBs) in the fabric, and consequently observed very little temperature variation across the die. In contrast, we focus on platform FPGAs, containing embedded circuit blocks including high-speed transceivers, multipliers, delay-locked loops (DLLs), and memories [6], [30]. Here, we first characterize the temperature distribution in a modern 2-D FPGA, and then observe how it changes when we stack multiple such layers. Next, we propose changes in the placement of hard blocks in the 3-D FPGA to reduce the die temperature.


    Fig. 1. Two kinds of stacking. (a) F2f. (b) F2b.

    III. BACKGROUND

    A. 3-D Technology Overview

    3-D chip design is a promising methodology to alleviate many interconnect problems. Current state-of-the-art chips are 2-D, which means that they have only one active layer that contains all the devices. Note that although no transistor (device) is stacked on top of another transistor (device), the metal wires interconnecting these devices typically span multiple layers, with the higher layers occupied by global wires. 3-D ICs extend this concept to the devices by stacking multiple device layers in the vertical dimension. In this paper, we use "face" to refer to the side of the wafer with the top-most metal layer and "back" to refer to the opposite side.

    Several technologies, such as beam recrystallization, silicon epitaxial growth, processed wafer bonding, and solid phase crystallization, enable the vertical integration of multiple device layers [31]. Among these technologies, wafer bonding is particularly promising. It involves the bonding of two fully processed wafers (on which the devices and interconnects have already been fabricated). Since the individual wafers are fabricated separately, it is possible to integrate completely different technologies, and have a very large number of layers. The inter-layer vias in this technology can be as fine as 1 µm × 1 µm at a 3 µm pitch [1]. The wafers can be bonded in two ways: face-to-face (f2f) or face-to-back (f2b). In the former, a wafer is inverted to bond with another wafer [see Fig. 1(a)]. This reduces the area overhead of the inter-layer vias because they do not need to pass through the silicon substrate. However, this limits the number of layers to only two. The second way, face-to-back, does not invert the wafer [see Fig. 1(b)]. Consequently, it can integrate more than two layers of silicon. However, since the inter-layer vias now need to pass through the Si layer, they take up die space. F2f alone is not compatible with flip-chip because we need to keep the face side exposed for bumping. F2f with through-die vias (TDVs) may be compatible with flip-chip. In this case, we use TDVs to bring signals outside the stacked die for bonding. Compared to f2f, f2b is more scalable (we can add additional layers and use the same process). Note that these two techniques can also be combined to use various combinations of f2f and f2b layers, mixed with back-to-back (b2b) stacking as well. In this study, we evaluate the f2f and f2b wafer-bonding techniques for 3-D FPGA integration.

    TABLE I. VIA PROPERTIES

    Fig. 2. 2-D switch boxes. $X_0$, $Y_0$, $X_1$, $Y_1$ mark their sides. (a) Subset. (b) Universal.

    Since the wafer-bonding 3-D technology is still being perfected, several methods are being explored. These methods result in different via dimensions and wafer thicknesses. For this study, we explore three different methods, which result in the via dimensions shown in Table I. Via 1 reflects the process from Tezzaron [1], which uses a wafer thickness of 10 µm. Depending on the process steps, we may need handle wafers to support the thin wafers. For a two-layer f2f stack, we may be able to avoid the handle wafers if we bond first and then thin the wafers. At the other extreme is via 3, which uses 50 µm wafers and reflects the process in [32]. A larger wafer thickness imparts mechanical strength to the wafers, and eliminates the need for handle wafers. Via 2 reflects an intermediate process that we use to illustrate the trends due to via dimensions. Note that via length is important only for f2b integration technology. An integration technology from MIT uses silicon-on-insulator (SOI) wafers to reduce the device layer thickness to less than a micrometer [33]. We do not model this technology in this work.

    B. 2-D Switch Boxes

    Our study will focus on island-style SRAM-based FPGAs. FPGAs from Xilinx and Altera belong to this category. The CLB consists of lookup tables (LUTs) and flip-flops (FFs). Routing wires (tracks) and programmable switches constitute the routing channel. Channel width refers to the number of tracks in a channel. The CLBs connect to the channel through connection boxes. The routing wires connect among themselves through switch boxes.

    Switch box topology refers to the connectivity provided by the switch box. Researchers have explored several topologies [34]-[38] (see Fig. 2). The subset (also called disjoint) topology, used in Xilinx XC4000 FPGAs, connects tracks of the same number in all four directions. This divides the channel into disjoint sets of tracks, and a net uses the same track number for its route. Universal topology provides more flexibility than disjoint. It facilitates connectivity for all possible global routes of two-terminal nets.
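The subset rule can be made concrete with a short sketch: track t on any side connects to track t on each of the other three sides, and to nothing else. The side labels and function names below are illustrative and not taken from the paper.

```python
SIDES = ("west", "south", "east", "north")  # illustrative labels for the four sides


def subset_connections(channel_width):
    """Enumerate the switches of a 2-D subset (disjoint) switch box.

    Each switch is a frozenset holding two (side, track) terminals.
    """
    connections = set()
    for track in range(channel_width):
        for a in range(len(SIDES)):
            for b in range(a + 1, len(SIDES)):
                connections.add(frozenset({(SIDES[a], track), (SIDES[b], track)}))
    return connections


def terminal_flexibility(connections, terminal):
    """Number of other terminals that a given terminal can reach through one switch."""
    return sum(1 for switch in connections if terminal in switch)


conns = subset_connections(4)
print(len(conns))                                 # 24: 6 side pairs for each of 4 tracks
print(terminal_flexibility(conns, ("west", 0)))   # 3
```

No switch ever joins two different track numbers, which is exactly the disjoint property: a net that enters the fabric on track t stays on track t.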

    Research has shown that the universal switch box results in fewer tracks in the channel [39]. Hyper-universal switch boxes provide even greater flexibility, and facilitate the connectivity for all possible global routes of k-terminal nets [40]. However, they use more switches than universal switch boxes.

    Fig. 3. 3-D FPGA.

    IV. 3-D DETAILED ROUTING ARCHITECTURE

    We extend the island-style architecture of 2-D FPGAs to 3-D (see Fig. 3). The CLB consists of LUTs and FFs. The switch box is modified to connect the inter-layer vias (ILVs) to the horizontal wires (CHANX and CHANY), and also with other ILVs. The ILVs form channels in the vertical direction (CHANZ). The architecture is symmetric in the x and y directions, i.e., CHANX and CHANY contain the same number of tracks. CHANZ, however, differs from CHANX and CHANY in its width, which is influenced by the via density provided by the 3-D technology. We use V to refer to the number of vias (i.e., vertical channel width) and H for the horizontal channel width. Fig. 3 shows one such case.

    CHANZ differs from CHANX and CHANY in another respect too. The length of these vias depends on the wafer thickness, which is typically much smaller than the average 2-D wire-length (e.g., the wafer thickness is about 10 µm for Tezzaron's process [1], whereas a wire spanning 4 CLBs is about 150 µm long in a 65-nm process). These differences between vertical and horizontal channels must be accounted for to design a good 3-D FPGA. Next, we describe the various 3-D architectures we explored. Where appropriate, we also discuss how technology parameters influence our design.

    A. Switch Box Topology

    The flexibility $F_s$ of a switch box (SB) refers to the number of wires to which each incoming wire can connect. Previous studies have shown that for a 2-D FPGA, an $F_s$ of 3 provides good routability [34]. In such SBs, a track connects to one track on each of the other three sides of the SB. Subset and universal topologies are examples of such SBs (see Fig. 2).

    These 2-D SBs are extended to 3-D by adding two more faces, which contain terminals for vertical wires: one for going up and another for going down. Since the vias will be fewer than the horizontal wires, the two vertical faces will contain fewer terminals than the other four.

    Fig. 4 shows the SBs we created for this study for H = 4 and V = 2. Normally, the 3-D SB is visualized as a cube, where each face of the cube represents one of the directions. However, for ease of illustration, we have flattened the SB and shown it as a hexagon, where each side represents a direction: North, South, East, West, top, or bottom. Furthermore, we show only the connections to the vertical faces (top and bottom). For all SBs, the horizontal wires (CHANX and CHANY) use either the subset or universal connections among themselves. These connections were described in Section III-B and illustrated in Fig. 2. For clarity, we do not show the horizontal connections in Fig. 4. The first four SBs use subset connections among the horizontal wires, and the last two use universal. Fig. 4 also tabulates the connections from the vertical faces, identifying each terminal by its face and its index on that face. The Appendix formally describes the six SBs.

    The first SB [subset, see Fig. 4(a)] is an extension of the 2-D subset SB. This SB connects the same track number on all sides. Consequently, the entire routing fabric gets divided into disjoint subsets, and a net uses the same track number for its entire route. Note that only the first V of the H horizontal wires connect to the vias. While these wires have a flexibility of 5 (three connections to the other horizontal directions and two to the vertical ones), the other wires connect to only horizontal tracks (flexibility of 3). Apart from decreasing the routing flexibility, this results in a difference in the capacitive loads of the horizontal wires: large for the first V wires and small for the rest.

    The second SB [subset-split, see Fig. 4(b)] modifies the subset SB by allowing the first V horizontal tracks to connect to the vias going above, and the last V to those going below. This implies that now there are twice as many horizontal wires that connect to the vertical wires. Therefore, if nets do not fan out at the SB, then this SB provides greater flexibility to the vertical directions. A limitation, however, is that the first V tracks can only go above, and the last V only below. Consequently, if a net needs to fan out to both top and bottom, then it needs to use two horizontal tracks (compared to one for subset). This SB distributes the capacitive loads on the horizontal tracks more evenly than the subset SB.
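The difference between the two SBs can be illustrated with a short sketch that lists which horizontal track numbers reach the vias going up and down, and the resulting flexibility of each horizontal track. It assumes the horizontal channel has H tracks and V vias, models only which tracks are connected (not which via index pairs with which track), and uses hypothetical function names.

```python
def vertical_tracks(topology, h, v):
    """Return (up, down): horizontal track numbers wired to vias going up and going down."""
    if topology == "subset":
        first = list(range(v))
        return first, first                      # the same first V tracks go both ways
    if topology == "subset-split":
        return list(range(v)), list(range(h - v, h))  # first V go up, last V go down
    raise ValueError(f"unknown topology: {topology}")


def track_flexibility(topology, h, v):
    """Flexibility per horizontal track: 3 horizontal connections plus any vertical ones."""
    up, down = vertical_tracks(topology, h, v)
    return [3 + int(t in up) + int(t in down) for t in range(h)]


print(track_flexibility("subset", 4, 2))        # [5, 5, 3, 3]
print(track_flexibility("subset-split", 4, 2))  # [4, 4, 4, 4]
```

For H = 4 and V = 2, subset concentrates all vertical switches on two tracks, while subset-split spreads them over all four, which matches the more even capacitive loading described above.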

    The subset-split SB, although more flexible than subset, suffers from the disjoint property of subset SBs: the entire routing fabric is divided into disjoint subsets and a net can use only one of those subsets. Each disjoint subset consists of one vertical track and the two horizontal tracks to which it connects. In order to improve upon this, we modified the connections to the vertical faces as shown in Fig. 4(d). Now, a vertical terminal connects to track 1 on one side, but to track 0 on another side. This allows the net to switch tracks at the SBs. We call this SB subset-twist.

    The main objective of the subset-twist SB is to improve the flexibility in the vertical direction. Another way to achieve this is by adding more switches to the vertical faces, the approach used by the next SB, subset-more [see Fig. 4(c)]. Here, each vertical terminal connects to two terminals, rather than one, on the horizontal faces. The extra switches have a twofold effect. On the one hand, they improve the flexibility in the vertical direction, and on the other, they increase the area of the SB and the capacitive loads on the wires.


    Fig. 4. 3-D switch boxes for H = 4, V = 2. (a) Subset. (b) Subset-split. (c) Subset-more. (d) Subset-twist. (e) Universal-twist. (f) Universal-more.

    The next two switch boxes use universal connections among the horizontal wires. The vertical connections in the universal-twist SB are identical to the subset-twist SB [see Fig. 4(e)]. However, due to universal connections among the horizontal wires, it provides greater flexibility. The last SB, universal-more, further increases the flexibility by adding more switches to the vertical faces. For example, in Fig. 4(f), track 0 on a vertical side connects to both tracks 1 and 3 on a horizontal side. These extra switches improve the flexibility in the vertical direction, but also increase the area of the SB and the capacitive loads on the wires.

    B. Experimentation

    We modified VPR [41], an FPGA place and route tool available from University of Toronto, to model our 3-D FPGA architectures. We refer to this tool as 3-D VPR. It uses simulated annealing to place the logic blocks and then routes the nets using a modified path-finder algorithm. Both placement and routing are timing-driven, i.e., they try to reduce the delays of critical paths.
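For readers unfamiliar with simulated annealing, the following is a minimal generic sketch of annealing-based placement, not VPR's actual implementation. It assumes a plain half-perimeter wirelength cost, pairwise swap moves, and a fixed geometric cooling schedule; all names and parameter values are hypothetical.

```python
import math
import random


def hpwl(placement, nets):
    """Half-perimeter wirelength: sum of bounding-box x and y spans over all nets."""
    total = 0
    for net in nets:
        xs = [placement[block][0] for block in net]
        ys = [placement[block][1] for block in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def anneal(placement, nets, t_start=5.0, t_end=0.01, alpha=0.9, moves_per_t=50, seed=0):
    """Swap-based simulated annealing; returns the best placement seen and its cost."""
    rng = random.Random(seed)
    current = dict(placement)
    cost = hpwl(current, nets)
    best, best_cost = dict(current), cost
    blocks = sorted(current)
    t = t_start
    while t > t_end:
        for _ in range(moves_per_t):
            a, b = rng.sample(blocks, 2)
            current[a], current[b] = current[b], current[a]      # propose a swap
            new_cost = hpwl(current, nets)
            delta = new_cost - cost
            # Always accept improvements; accept worse moves with probability exp(-delta/t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = dict(current), cost
            else:
                current[a], current[b] = current[b], current[a]  # undo the swap
        t *= alpha                                               # cool down
    return best, best_cost


start = {"a": (0, 0), "b": (3, 3), "c": (0, 3), "d": (3, 0)}
nets = [["a", "b"], ["c", "d"]]
best, best_cost = anneal(start, nets)
print(hpwl(start, nets), best_cost)
```

Accepting some cost-increasing moves at high temperature is what lets the placer escape local minima; as the temperature falls, the search becomes increasingly greedy.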

    The 2-D placement algorithm of VPR optimized the fol-lowing cost function:

    The cost function sums over all the nets in the design. For each net, it accounts for the estimated delay from the source of the net to each of its sink pins, and for the x and y spans of the net's bounding box, $bb_x$ and $bb_y$. The factor $q$ compensates for the fact that the bounding box wire-length model underestimates the wiring necessary to connect nets with more than three terminals. Its value depends on the number of terminals of the net. $C_{av,x}$ and $C_{av,y}$ are the average channel capacities in the x- and y-directions, respectively, over the bounding box of the net. The value of $\beta$ adjusts the weight given to congestion in the cost function. The larger the value of $\beta$, the more wiring in narrow channels is penalized relative to wiring in wider channels. A value of 1 has been previously found to work best, and is used in this work.
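The congestion-weighted bounding-box part of such a cost can be sketched as follows. The sketch assumes the form in which VPR's wiring cost is commonly written, a sum over nets of q · (bb_x / C_av,x^β + bb_y / C_av,y^β). The delay terms are left out because their exact form is not reproduced here, and the dictionary keys are hypothetical.

```python
def wiring_cost(nets, beta=1.0):
    """Sum over nets of q * (bb_x / cav_x**beta + bb_y / cav_y**beta)."""
    total = 0.0
    for net in nets:
        total += net["q"] * (
            net["bb_x"] / net["cav_x"] ** beta
            + net["bb_y"] / net["cav_y"] ** beta
        )
    return total


# Two hypothetical nets: bounding-box spans, average channel capacities, and q factors.
example_nets = [
    {"q": 1.0, "bb_x": 4, "bb_y": 2, "cav_x": 2.0, "cav_y": 2.0},
    {"q": 1.5, "bb_x": 6, "bb_y": 3, "cav_x": 3.0, "cav_y": 1.0},
]
print(wiring_cost(example_nets))  # 1.0 * (2 + 1) + 1.5 * (2 + 3) = 10.5
```

Dividing each span by the channel capacity raised to β is what penalizes wiring through narrow channels: the second net pays three times as much per unit of y span as per unit of x span.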

    To the 2-D cost function, we add a term to reduce the vertical span of the nets. This is similar to what was proposed in [3], except that, similar to the congestion cost terms for the x- and y-directions, we incorporate congestion in the vertical term as well.


    Fig. 5. Experimentation flow.

    By varying the weights of the cost-function terms for two of the benchmark designs, we selected the combination that gave the smallest critical path delay. We use these values in all our experiments.
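The wiring part of such a 3-D placement cost can be sketched as follows. This is a minimal illustration rather than the 3-D VPR implementation: the field names, the `z_weight` parameter, and the choice to scale the vertical term by the same fan-out factor are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class NetBox:
    """Bounding-box summary of one net (all field names are illustrative)."""
    q: float      # fan-out compensation factor of the net
    bb_x: float   # bounding-box span in x
    bb_y: float   # bounding-box span in y
    bb_z: float   # bounding-box span in z (number of layer crossings)
    cap_x: float  # average channel capacity in x over the bounding box
    cap_y: float  # average channel capacity in y over the bounding box
    cap_z: float  # average via capacity in z over the bounding box


def wiring_cost_3d(nets, beta=1.0, z_weight=1.0):
    """Sum a congestion-weighted bounding-box cost over all nets.

    The x and y terms follow the VPR bounding-box model; the z term is
    a hypothetical analogue weighted by z_weight.
    """
    total = 0.0
    for n in nets:
        planar = n.bb_x / n.cap_x ** beta + n.bb_y / n.cap_y ** beta
        vertical = n.bb_z / n.cap_z ** beta
        total += n.q * (planar + z_weight * vertical)
    return total
```

Setting `bb_z` to zero for every net reduces the sketch to the 2-D wiring cost, which is a convenient sanity check when experimenting with the weight.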

    1) Architecture and Technology Parameters: The logic blocks in our experiments consist of four four-input LUTs and four FFs, with ten inputs and four outputs. All the inputs are equivalent, and so are the outputs; that is, every input pin can internally drive any LUT input. The pins are uniformly distributed around the sides of the CLB. Each output pin connects to 25% of the tracks in the adjacent channel, and every input pin connects to 60% of the adjacent tracks. All horizontal segments (CHANX and CHANY) in the routing fabric span 1 CLB, and are driven by tri-state buffers.

    The vertical channel (CHANZ) has vias that span only a single layer. When these vias are very short (10 µm), we use minimum-size pass-transistor switches to drive them. However, when they are 50 µm high, we use a 5X tri-state buffer switch to drive them. In contrast, the buffers driving the CHANX and CHANY segments are always 5X the minimum size, and consist of two stages.

    We calculated the resistance and capacitance values for the vias and horizontal wires by using the Predictive Technology Model (PTM) [42]. The vias and wires are made of copper in our target technology. Timing parameters for switches were derived from SPICE simulations using the 65-nm BPTM.

    We explored a spectrum of 3-D technologies, with the via properties shown in Table I, the number of layers varying from 2 to 5, and either f2f or f2b bonding technology. The finest vias, of 1-µm thickness, are in line with Tezzaron's process [1], while the coarsest ones (of 5-µm thickness) reflect the process from [32]. A perfect alignment between layers is assumed.

    2) Experimentation Flow: Fig. 5 shows the experimentation flow. A design in blif format is packed into clusters (CLBs) of 4-LUTs using T-VPack. On the basis of the number of CLBs in the design, 3-D VPR creates the smallest FPGA fabric that would contain the design. It takes the number of layers as an input, and finds the minimum square size of one layer, assuming that all layers contain the same number of CLBs. The packed netlist is then placed and routed using 3-D VPR to find the minimum number of vias for a large horizontal channel width. The router performs a binary search over the number of vias to find the minimum value. Fixing the number of vias to 130% of the minimum value, we reroute the design to find the minimum possible channel width. Thus, this flow gives priority to reducing the number of vias over the channel width, which makes sense because the vias take more area than the horizontal wires. However, most FPGAs provide more than the minimum number of channels to ensure good performance in the worst case too. Along similar lines, we add 30% to the minimum via and channel-width numbers while evaluating the FPGA. Using these values (which may be different for every design), we reroute the design to obtain the critical path delay of the routed design. This flow is repeated for every switch-block type for all the 20 MCNC benchmark designs.
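The search-then-margin steps of this flow can be sketched as below. The `routes` callback is a hypothetical stand-in for a full routing attempt, the search assumes that a design which routes with some number of vias also routes with more, and rounding the 30% margin up to an integer is an assumption of the example.

```python
from typing import Callable


def min_routable(lo: int, hi: int, routes: Callable[[int], bool]) -> int:
    """Return the smallest value in [lo, hi] for which routes(value) is True.

    Assumes monotonicity: if the design routes with v resources, it also
    routes with any larger number.
    """
    if not routes(hi):
        raise ValueError("design does not route even at the upper bound")
    while lo < hi:
        mid = (lo + hi) // 2
        if routes(mid):
            hi = mid        # mid works, so the minimum is mid or below
        else:
            lo = mid + 1    # mid fails, so the minimum is above mid
    return lo


def with_margin(minimum: int, margin_percent: int = 30) -> int:
    """Add a percentage margin to a minimum resource count, rounding up."""
    return (minimum * (100 + margin_percent) + 99) // 100
```

In the flow described above, the same search would be run first over the number of vias at a large channel width, and then over the channel width with the via count fixed at the padded value.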

    3) Area Model: VPR estimates area by counting the number of transistors in the fabric. This works because the 2-D FPGA area is transistor-dominated. In the case of 3-D, however, we must add the via areas to the transistor areas. The two types of 3-D integration technologies discussed in Section II-A need different area models. In the case of f2f bonding, the ILVs do not pass through the silicon (see Fig. 1). Consequently, they do not take any die area. In contrast, f2b bonding requires vias to pass through the silicon (through-vias). In this case, every via consumes some Si area. We incorporate the area overhead of these through-vias in our area estimates.

    The transistor-area terms are estimated by VPR modified for 3-D. The via area is calculated by counting the number of vias and multiplying it by the area taken by a via. When comparing the area of two architectures, we estimate the total FPGA area and divide it by the number of CLBs in the fabric to obtain the area per CLB. Thus, the area numbers in Section IV include the area for one logic block (CLB) and the routing resources (horizontal wires, switches, and vias) associated with it.
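A minimal sketch of this area model is given below. The function and parameter names are illustrative, and the caller is assumed to supply the transistor area and the area of a single via in the same units.

```python
def area_per_clb(transistor_area, num_vias, area_per_via, num_clbs, bonding):
    """Estimate the fabric area per CLB for a 3-D FPGA.

    For f2b bonding, every through-via consumes silicon area, so the via
    area is added to the transistor area. For f2f bonding, the vias do not
    pass through the silicon and contribute no die area.
    """
    if bonding not in ("f2f", "f2b"):
        raise ValueError("bonding must be 'f2f' or 'f2b'")
    if num_clbs <= 0:
        raise ValueError("num_clbs must be positive")
    via_area = num_vias * area_per_via if bonding == "f2b" else 0.0
    return (transistor_area + via_area) / num_clbs
```

For example, with a transistor area of 1000 units, 50 vias of 2 units each, and 100 CLBs, the sketch gives 11 units per CLB for f2b and 10 units per CLB for f2f.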

    C. Results and Analysis

    Here, we show the results for two extremes of 3-D integration: first, a simple stack of two layers and, second, a more aggressive stack of five layers. Together they capture the trends seen by varying the number of layers in a 3-D FPGA. While the two-layer FPGA can be fabricated using f2f or f2b wafer bonding, the five-layer FPGA must be fabricated using f2b. For all these technology points, we evaluate the effects of the different via dimensions shown in Table I. The metric we primarily look at to evaluate an architecture is the area-delay product (ADP), because it is inversely proportional to the throughput of the device [43]. In all the figures in this section, we plot the geometric means over 20 MCNC benchmarks.

    Fig. 6. Comparing 2-D and 3-D FPGAs.
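The averaging of the area-delay product over benchmarks can be sketched as follows. The helper names are illustrative; the only property relied on is that the geometric mean of per-benchmark products equals the product of the geometric means of area and delay.

```python
import math


def geometric_mean(values):
    """Geometric mean of a sequence of positive numbers."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def mean_adp(areas, delays):
    """Geometric mean of the per-benchmark area-delay products."""
    return geometric_mean(a * d for a, d in zip(areas, delays))
```

Because the geometric mean is multiplicative, averaging area and delay separately and then multiplying gives the same ADP as averaging the per-benchmark products.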

    The first step towards evaluating 3-D FPGAs is comparing them with 2-D FPGAs. Fig. 6 shows the average area (per CLB), delay, and ADP for 1, 2, and 5 layers in 65-nm technology. For both 2 and 5 layers, it shows the results for the three via technologies of Table I. The key "2-layers-f2f-3 µm" in Fig. 6 refers to the use of two device layers, stacked using f2f bonding with vias at 3-µm pitch (via 1 in Table I). Fig. 6 uses the same switch box (universal-twist) for all cases.

    The area is estimated as explained in Section III-B3. Note that the area reduces as we increase the number of layers or reduce the pitch of the vias. The smallest area is obtained when five layers are used with 3-µm-pitch vias, in which case the area per CLB is only 84% of that in the single-layer case. Furthermore, we observe that the area of the two-layer FPGA using f2f bonding remains constant with increasing via pitch. This happens because the vias in this case are accommodated within the transistors' footprint, and the CLB area is determined by the transistors.

    The critical path delay also reduces with an increasing number of layers (second set of bars in Fig. 6). The five-layer FPGA with 5-µm-pitch vias (best case) reduces the delay by 24.7% compared with the single-layer case and by 14% compared with the two-layer case. This happens because interconnect lengths (and hence delays) reduce as we increase the number of layers. The f2f and f2b technologies do not have any significant impact on the delay.

    The reductions in area and delay in 3-D combine to significantly reduce the area-delay product of the FPGA (third set of bars in Fig. 6). The five-layer FPGA reduces the area-delay product by 36% (for 3-µm-pitch vias), while a two-layer FPGA does so by about 20%, when compared to a single-layer FPGA. These results justify the interest in 3-D FPGAs and demonstrate that we can obtain significant improvements even by the relatively simple integration of two FPGA layers. The results also indicate that even by using the moderately aggressive 5-µm-pitch vias, we can significantly improve upon 2-D FPGAs. Table II tabulates some of the results for the five-layer FPGA using the universal-twist switch box.

    Now, we explore the different switch boxes to find which one gives the best values for area, delay, and area-delay product. Fig. 7 shows the results for five layers, using a 65-nm process and 3-µm-pitch vias (via 1 in Table I). The results for two layers follow a similar trend. The first set of bars in Fig. 7 compares the flexibility in the vertical direction of the various SBs by looking at the minimum number of vias they need for the designs to route. Observe that the universal-more type of SB provides the greatest flexibility (minimum number of vias). In fact, it uses only 49% of the vias needed by the subset SB. It also results in the minimum channel width among all the SBs. However, the total area is determined by both the vias and the number of transistors in the fabric. Since universal-more uses extra switches to increase flexibility, we observe that the total area taken by the FPGA using the universal-more SB is larger than that of the one with the universal-twist SB. This indicates that the universal-twist SB provides greater flexibility per switch than the universal-more SB.

    TABLE II. EXPERIMENTAL RESULTS FOR UNIVERSAL-TWIST (65 nm, 3-µm PITCH VIAS)

    Fig. 7. Comparing the switch boxes for five-layer FPGA.

    While the area is reduced to 88% by using the universal-twist SB instead of the subset SB, the critical path delay does not show such a strong variation. This happens because the timing-driven router of 3-D VPR gives less weight to congestion for timing-critical nets, which implies that they almost always take the shortest possible route. The smallest delay is obtained for the subset-split case. Note that adding more switches to the SB increases the delay, which is explained by the larger parasitic capacitances due to these switches. Because the variation in delay is small, the trend for the area-delay product is similar to that for area. The universal-twist SB offers the lowest area-delay product, 91% of that for the subset SB.

    Fig. 8. Comparing the switch boxes for different via technologies for five-layer FPGA.

    Fig. 9. Comparing the switch boxes for different process nodes for five-layer FPGA.

    Next, we explore how the via properties affect the choice of SB for the five-layer FPGA. Fig. 8 compares the area-delay product for different SBs for the three via technologies of Table I. The x-axis labels identify the via technology. Intuitively, as the vias become larger, we will prefer the SB that requires the minimum number of vias. Fig. 8 demonstrates this trend. As vias become larger, the difference between the area-delay products for universal-twist and universal-more (which requires the minimum number of vias) reduces. This happens because, as vias become larger, the area taken by the vias starts dominating the total area. However, even for 10-µm-pitch vias (the largest case), the universal-twist SB continues to provide the smallest area-delay product.

    We also look at the effect of technology scaling on the performance of our SBs in a five-layer FPGA (see Fig. 9). The vias are assumed to remain at 3-µm pitch while the CMOS technology scales from 65 to 45 and 32 nm. Again, universal-twist remains the best SB for all process nodes. Since the via dimensions remain constant among the different process nodes, the area penalty due to through-vias increases as transistor dimensions shrink. Consequently, the universal-more SB (which requires the minimum number of vias) improves as the process scales. However, even for the 32-nm node, the universal-twist SB remains the best from an area-delay product perspective.

    V. THERMAL ISSUES IN 3-D FPGAS

    Die temperature must be controlled because it impacts the timing, leakage power, package design, and lifetime of the device. Circuits run slower when they are hot, and their lifetime reduces exponentially with increasing temperature. Besides, plastic packages can only withstand relatively low temperatures. Furthermore, leakage power increases exponentially with temperature, which can cause a thermal runaway.

    Fig. 10. Thermal profile of 4VFX100.

    All these factors have forced chip manufacturers to employ techniques to control the die temperature. Section I described some of these techniques.

    Thermal issues in FPGAs are relatively unexplored. Some researchers have proposed the use of distributed sensors for monitoring temperatures in FPGAs [28], [29]. They, however, considered only CLBs in the fabric, and consequently observed very little temperature variation across the die. In contrast, we focus on platform FPGAs, containing embedded circuit blocks including high-speed transceivers, multipliers, DLLs, and memories [6], [30]. Here, we first characterize the temperature distribution in a modern 2-D FPGA, and then observe how it changes when we stack multiple such layers. Next, we propose changes in the placement of hard blocks in the 3-D FPGA to reduce the die temperature.

    A. Thermal-Characterization of FPGAs: 2-D to 3-D

    Most modern FPGAs incorporate hard blocks in the fabric. These blocks exhibit different power characteristics, leading to variations in power densities within the chip. We calculated the power numbers for blocks in a Virtex-4 FX100 device by using Xilinx power spreadsheets and observed that the power densities vary from 0.78 for the DSP blocks to 11.46 for the DCMs (see Table III). Such a vast range results in large temperature variations within the FPGA die (see Fig. 10). The hotspots occur near the MGTs and DCMs, which are about 14 °C above the coolest portions.

    Table IV shows the temperatures for 3-D FPGAs consisting of identical FPGA layers of 4VFX100. The temperatures were estimated using HS3d [23] with the parameters listed in Table V. The value of 0.5 °C/W reflects the thermal resistance of a high-end package with a moderate heat sink. Note that, for both 2-D and 3-D FPGAs, we used the same power numbers for individual blocks (listed in Table III). This doubles the total FPGA power for a two-layer FPGA compared to a single-layer FPGA. This is a pessimistic estimate, because power consumption in the routing fabric is expected to reduce when we stack multiple layers (because of the reduction in the number of switches and their load capacitances). We estimated temperatures for two extremes of 3-D technologies: one with very thin layers and fine vias (Tezzaron's process, Via 1 of Table I), and another with 5-µm vias and 50-µm layers (Via 3 of Table I). For both these technology points, we also varied the number of inter-layer thermal vias between the two extremes of no thermal vias and the maximum possible number of thermal vias. Table IV shows the temperatures for these two corners along with a more realistic number based on the via pitches in Table I.

    TABLE III. POWER DENSITIES IN 4VFX100 (FREQ: 500 MHz)

    TABLE IV. EFFECT OF STACKING ON TEMPERATURE

    TABLE V. PARAMETERS FOR TEMPERATURE ESTIMATION IN HS3D

    As expected, the peak temperature increases with the number of layers, from 89.4 °C for a 2-D FPGA to 220.7 °C for a four-layer FPGA using Tezzaron's process. The intra-package temperature variation also increases with the number of layers, from 14.4 °C for a 2-D FPGA to 55.0 °C for a four-layer FPGA. This large variation in temperature indicates that the peak temperature could be reduced by distributing the hot blocks more evenly across the fabric. Interestingly, 3-D technology parameters change the temperatures only minutely. For a four-layer FPGA, layer thickness changes the peak temperature by up to 4.4 °C, while thermal vias could decrease the peak temperature by up to 3.4 °C. Fig. 11 shows the effect of stacking on temperature, as well as the possible variations because of 3-D technology parameters.

    Fig. 11. Effect of stacking on peak temperature.
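A very coarse way to see why temperature climbs with the number of layers is a single lumped thermal resistance between the die stack and the ambient. The sketch below is not the grid-based HS3d simulation used for the reported numbers: it ignores lateral spreading and inter-layer resistance, and the ambient temperature and per-layer power used in the usage note are hypothetical values, not figures from this work.

```python
def lumped_die_temperature(power_per_layer_w, layers,
                           r_package_c_per_w=0.5, ambient_c=25.0):
    """Average die temperature from a one-resistor thermal model.

    Every layer is assumed to dissipate the same power, and all of the
    heat is assumed to leave through a single package thermal resistance.
    """
    if layers < 1:
        raise ValueError("layers must be at least 1")
    total_power_w = power_per_layer_w * layers
    return ambient_c + total_power_w * r_package_c_per_w
```

With a hypothetical 40 W per layer, the model gives 45 °C for one layer and 65 °C for two: the rise above ambient doubles because the same package must remove twice the power. Hotspots, which drive the peak temperatures discussed above, are not captured by this model.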

    B. Thermal-Aware 3-D FPGA Organization

    Recently, a study proposed alternate organizations for a 2-D FPGA to reduce the intra-die temperature variations [44]. Using a fully utilized Virtex-4 FX100 FPGA as an example, it demonstrated a reduction in peak die temperature of about 6 °C. Since the temperature variation is larger in a 3-D FPGA, we would expect thermal organization to have a greater impact. To demonstrate this, we design a thermal-aware two-layer FPGA. For ease of experimentation, we consider only four types of blocks in the FPGA, namely, CLB, BRAM, DSP, and MGT. These blocks consume the majority of the area in 4VFX100. The peak temperature for a 2-D FPGA containing these blocks is 86.9 °C. In the first case, we stack two identical such layers to form a two-layer stacked FPGA [see Fig. 12(a)]. The peak temperature for this FPGA is 128.5 °C. Note that stacking the hot blocks significantly increases the power density, and therefore, the temperature. Hence, next, we keep all the MGTs, DSPs, and BRAMs on a single layer. The second layer now consists only of CLBs [see Fig. 12(b)]. This change in floorplan can be implemented easily with the column-based modular architecture of Virtex-4 (ASMBL) [6]. This reduces the peak temperature to 112.9 °C (two-layer thermal in Table VI). The temperature variation also drops from 25.7 °C for the stacked design to only 2.6 °C for the thermal-aware design.

    Fig. 12. 3-D FPGA organizations. (a) Two-layer stacked. (b) Two-layer thermal.

    TABLE VI. THERMAL-AWARE 3-D FPGA DESIGN
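The effect of the two organizations on footprint power density can be illustrated with a toy one-dimensional floorplan. The density values below are hypothetical and unitless, chosen only so that both organizations contain the same set of blocks; they are not the Table III figures.

```python
def stacked_density(layer_maps):
    """Sum per-layer power densities column by column over a shared footprint."""
    widths = {len(m) for m in layer_maps}
    if len(widths) != 1:
        raise ValueError("all layers must have the same number of columns")
    return [sum(column) for column in zip(*layer_maps)]


HOT, CLB = 11.0, 1.0  # hypothetical densities of a hard-block and a CLB column

# Two identical layers: the hot columns sit directly on top of each other.
identical_stack = stacked_density([[HOT, CLB, CLB, HOT],
                                   [HOT, CLB, CLB, HOT]])

# Thermal-aware: all hard blocks on one layer, only CLBs on the other.
thermal_aware = stacked_density([[HOT, HOT, HOT, HOT],
                                 [CLB, CLB, CLB, CLB]])
```

Both organizations dissipate the same total power in this toy model, but the largest footprint density drops from 22 to 12 when hot columns are never stacked on each other, which matches the direction of the peak-temperature and variation results reported above.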

    In the previous experiments, the heat sink is attached closest to the layer consuming the maximum power. Previous studies have suggested that this should be preferred. In fact, researchers have proposed thermal-aware 3-D floorplanning that tries to place the hot blocks closer to the sink [24]. In order to see the effect of sink placement, we attached it to the layer containing only CLBs in the two-layer thermal organization. Table VI also shows the temperature for this case (two-layer thermal inverted). We observe that the temperature increases only very slightly because of this change. This happens because the vertical distances are small compared to the horizontal dimensions of the FPGA.

    VI. CONCLUSION

    We demonstrated that 3-D FPGAs can provide significant advantages over 2-D by reducing the interconnect area and the total area-delay product. The 3-D FPGA with five layers and 3-µm-pitch vias reduces the area-delay product of a 2-D FPGA by 36%. This number may increase even further with improvements in 3-D technology.

    We designed and evaluated several switch boxes for 3-D FPGAs and showed that the area-delay product depends heavily on the switch box topology. In 65-nm technology, the area-delay product for our universal-twist switch box is 15% lower than that of the subset switch box for 5-µm-pitch vias. We further showed that the universal switch boxes become even better with scaling process technology, as well as with larger vias. However, adding more switches to the universal SB does not provide any benefit.

    Three-dimensional integration, however, increases the die temperature. Our experiments indicate that the peak temperature for a four-layer FPGA could be 2.4 times that of a single-layer FPGA. However, the large variation in temperature within the 3-D package allows us to reorganize the 3-D FPGA to reduce the peak temperature. For a two-layer FPGA, the peak temperature reduced by 16 °C when the design was altered to create a more uniform temperature profile.

    In this work, we used single-length segments. Most modern FPGAs use a mixture of different length segments. Incorporating this into a 3-D FPGA forms part of future work. The performance of a 3-D FPGA could be further improved by using direct vertical connections among neighboring CLBs. Furthermore, asymmetric 3-D interfaces could be used to mix f2f, f2b, and b2b stacking.

    APPENDIX

    FORMAL DESCRIPTION OF 3-D SWITCH BOXES

    For brevity, we show only the connections to the inter-layer vias. The connections among horizontal wires are either subset (disjoint) or universal (see Fig. 2). Two of the sides of the 3-D SB are its vertical faces, and the other four are its horizontal faces. The horizontal channel width is referred to as nodes_per_chan. The number of inter-layer vias is vias_per_chan.

    SUBSET:
        ;

    SUBSET_SPLIT:
        if then
            if then
                ;
            else
                ;
            end if
        else if then
            ;
        end if

    SUBSET_TWIST:
    UNIVERSAL_TWIST:
        if then
            if then
                ;
            else if then
                ;
            else if then
                ;
            end if
        else if then
            if then
                ;
            else if then
                ;
            else if then
                ;
            end if
        end if

    SUBSET_MORE:
        //Works because it's assumed that
        if then
            if then
                ;
            else
                ;
                ;
            end if
        end if

    UNIVERSAL_MORE:
        //Works because it's assumed that
        if then
            if then
                ;
                ;
            else if then
                ;
                ;
            else
                ;
            end if
        else if then
            if then
                ;
                ;
            else if then
                ;
                ;
            else
                ;
            end if
        end if
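As a rough illustration of the subset (disjoint) pattern among horizontal wires referred to above, the sketch below enumerates the connections of a textbook subset switch box, in which track i on one side connects only to track i on each of the other sides. It is not a reconstruction of the listing above: the side names are hypothetical, and the via connections on the vertical faces are not modeled.

```python
from itertools import combinations

# Hypothetical names for the four horizontal faces of the switch box.
HORIZONTAL_SIDES = ("LEFT", "RIGHT", "FRONT", "BACK")


def subset_horizontal_connections(nodes_per_chan):
    """List the switches of a subset (disjoint) switch box.

    Each switch is a pair of (side, track) endpoints. Track i on one side
    connects to track i on every other horizontal side and to nothing else.
    """
    connections = []
    for track in range(nodes_per_chan):
        for side_a, side_b in combinations(HORIZONTAL_SIDES, 2):
            connections.append(((side_a, track), (side_b, track)))
    return connections
```

With four sides there are six side pairs, so a channel width of `nodes_per_chan` yields `6 * nodes_per_chan` switches, and every wire end can reach exactly three other wire ends.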

    REFERENCES

    [1] S. Gupta, M. Hilbert, S. Hong, and R. Patti, "Techniques for producing 3D ICs with high-density interconnect," Tezzaron Semiconductor, Naperville, IL, 2005.

    [2] ITRS, "International technology roadmap for semiconductors," 2003 [Online]. Available: http://public.itrs.net

    [3] C. Ababei, H. Mogal, and K. Bazargan, "Three-dimensional place and route for FPGAs," in Proc. Asia South-Pacific Des. Autom. Conf., Shanghai, China, 2005.

    [4] A. Telikepalli, "Designing for power budgets and effective thermal management," Xcell J., vol. 56, no. 56, pp. 24–27, 2006.

    [5] Altera Corporation, San Jose, CA, "Thermal management for 90-nm FPGAs," Appl. Note 358, 2004.

    [6] Xilinx, San Jose, CA, "Xilinx products documentation," 2006 [Online]. Available: http://www.xilinx.com/literature

    [7] A. Gayasen, N. Vijaykrishnan, M. Kandemir, and A. Rahman, "Switch box architectures for three-dimensional FPGAs," presented at the Field-Program. Custom Comput. Mach. (FCCM), Napa Valley, CA, Apr. 2006.

    [8] A. J. Alexander, J. P. Cohoon, J. L. Colflesh, J. Karro, and G. Robins, "Three-dimensional field-programmable gate arrays," in Proc. Int. ASIC Conf., 1995, pp. 253–256.

    [9] J. V. Campenhout, H. V. Marck, J. Depreitere, and J. Dambre, "Optoelectronic FPGAs," IEEE J. Sel. Topics Quantum Electron., vol. 5, no. 2, pp. 306–315, Mar./Apr. 1999.

    [10] M. Leeser, W. M. Meleis, M. M. Vai, S. Chiricescu, W. Xu, and P. M. Zavracky, "Rothko: A three-dimensional FPGA," IEEE Des. Test Comput., vol. 15, no. 1, pp. 16–23, Jan./Mar. 1998.

    [11] G. Borriello, C. Ebeling, S. A. Hauck, and S. Burns, "The triptych FPGA architecture," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 3, no. 4, pp. 491–501, Dec. 1995.

    [12] S. Chiricescu, M. Leeser, and M. M. Vai, "Design and analysis of a dynamically reconfigurable three-dimensional FPGA," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 9, no. 1, pp. 186–196, Feb. 2001.

    [13] M. Lin, A. E. Gamal, Y. Lu, and S. Wong, "Performance benefits of monolithically stacked 3D-FPGA," presented at the Int. Symp. Field Program. Gate Arrays, Monterey, CA, 2006.

    [14] A. Rahman, S. Das, A. Chandrakasan, and R. Reif, "Wiring requirement and three-dimensional integration of field-programmable gate arrays," in Proc. Int. Workshop Syst.-Level Interconnect Prediction, 2001.

    [15] Y.-S. Kwon, P. Lajevardi, A. Chandrakasan, F. Honore, and D. E. Troxel, "A 3-D FPGA wire resource prediction model validated using a 3-D placement and routing tool," presented at the Int. Workshop Syst.-Level Interconnect Prediction, San Francisco, CA, 2005.

    [16] C. Ababei, Y. Feng, B. Goplen, H. Mogal, T. Zhang, K. Bazargan, and S. Sapatnekar, "Placement and routing in 3D integrated circuits," IEEE Design Test, vol. 22, no. 6, pp. 520–531, Nov./Dec. 2005.

    [17] G.-M. Wu, M. Shyu, and Y.-W. Chang, "Universal switch blocks for three-dimensional FPGA design," in Proc. ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays, 1999.

    [18] D. Brooks and M. Martonosi, "Dynamic thermal management for high-performance microprocessors," presented at the 7th Int. Symp. High-Perf. Comput. Arch., Nuevo Leone, Mexico, 2001.

    [19] K. Sankaranarayanan, S. Velusamy, M. Stan, and K. Skadron, "A case for thermal-aware floorplanning at the microarchitectural level," J. Instruction-Level Parallelism, vol. 7, Oct. 2005 [Online]. Available: http://www.jilp.org/vol7

    [20] Y. Han, I. Koren, and C. A. Moritz, "Temperature aware floorplanning," presented at the 2nd Workshop Temperature-Aware Comput. Syst. (TACS-2), Madison, WI, Jun. 2005.

    [21] G. Chen and S. Sapatnekar, "Partition-driven standard cell thermal placement," presented at the Int. Symp. Phys. Des., Monterey, CA, 2003.

    [22] K. Skadron et al., "Temperature-aware microarchitecture," presented at the 30th Int. Symp. Comput. Arch. (ISCA), San Diego, CA, 2003.

    [23] G. M. Link and N. Vijaykrishnan, "Thermal trends in emerging technologies," presented at the Int. Symp. Quality Electron. Des. (ISQED), San Jose, CA, 2006.

    [24] J. Cong, J. Wei, and Y. Zhang, "A thermal-driven floorplanning algorithm for 3D ICs," presented at the Int. Conf. Comput.-Aided Des., San Jose, CA, Nov. 2004.

    [25] B. Goplen and S. S. Sapatnekar, "Efficient thermal placement of standard cells in 3D ICs using a force directed approach," presented at the Int. Conf. Comput.-Aided Des., San Jose, CA, 2003.

    [26] J. Cong and Y. Zhang, "Thermal via planning for 3-D ICs," presented at the Int. Conf. Comput.-Aided Des., San Jose, CA, Nov. 2005.

    [27] B. Goplen and S. S. Sapatnekar, "Thermal via placement in 3D ICs," presented at the ACM Int. Symp. Phys. Des., San Francisco, CA, 2005.

    [28] S. Lopez-Buedo, J. Garrido, and E. Boemo, "Dynamically inserting, operating, and eliminating thermal sensors of FPGA-based systems," IEEE Trans. Components Packag. Technol. (CPM), vol. 25, no. 4, pp. 561–566, Dec. 2002.

    [29] S. Velusamy et al., "Monitoring temperature in FPGA based SoCs," presented at the Int. Conf. Comput. Des. (ICCD), San Jose, CA, 2005.

    [30] Altera Corporation, San Jose, CA, "Altera product datasheets," 2006 [Online]. Available: http://www.altera.com/literature

    [31] K. Banerjee, S. J. Souri, P. Kapur, and K. C. Saraswat, "3-D ICs: A novel chip design for improving deep submicron interconnect performance and systems-on-chip integration," Proc. IEEE, vol. 89, no. 5, pp. 602–633, May 2001.

    [32] Y. Yamaji et al., "Thermal characterization of bare-die stacked modules with Cu through-vias," presented at the Electron. Components Technol. Conf., Orlando, FL, 2001.

    [33] C. S. Tan and R. Reif, "Multi-layer silicon layer stacking based on copper wafer bonding," Electrochem. Solid-State Lett., vol. 8, no. 6, pp. G147–G149, 2005.

    [34] J. Rose and S. Brown, "Flexibility of interconnection structures for field-programmable gate arrays," IEEE J. Solid-State Circuits, vol. 26, no. 3, pp. 277–282, Mar. 1991.

    [35] G. Lemieux, S. Brown, and D. Vranesic, "On two-step routing for FPGAs," in Proc. Int. Symp. Phys. Des., Napa Valley, CA, 1997.

    [36] S. Wilton, "Architecture and algorithms for field-programmable gate arrays with embedded memory," Ph.D. dissertation, Dept. Elect. Comput. Eng., Univ. Toronto, Toronto, ON, Canada, 1997.

    [37] M. I. Masud and S. Wilton, "A new switch block for segmented FPGAs," presented at the Int. Workshop Field Program. Logic Appl., Glasgow, U.K., 1999.

    [38] P. Hallschmid and S. Wilton, "Detailed routing architectures for embedded programmable logic IP cores," presented at the ACM/SIGDA Int. Symp. Field-Program. Gate Arrays, Monterey, CA, 2001.

    [39] Y.-W. Chang, D. F. Wong, and C. K. Wong, "Universal switch blocks for FPGA design," ACM Trans. Des. Autom. Electron. Syst., vol. 1, no. 1, pp. 80–101, Jan. 1996.

    [40] H. Fan, J. Liu, Y.-L. Wu, and C.-C. Cheung, "On optimum switch box designs for 2-D FPGAs," presented at the 38th ACM/SIGDA Des. Autom. Conf. (DAC), Las Vegas, NV, 2001.

    [41] V. Betz and J. Rose, "VPR: A new packing, placement and routing tool for FPGA research," presented at the Int. Workshop Field-Program. Logic Appl., London, U.K., 1997.

    [42] Arizona State University, Tempe, "Predictive technology model," [Online]. Available: http://www.eas.asu.edu/~ptm

    [43] V. Betz, J. Rose, and A. Marquardt, Architecture and CAD for Deep-Submicron FPGAs. Norwell, MA: Kluwer, 1999.

    [44] P. Sundararajan, A. Gayasen, N. Vijaykrishnan, and T. Tuan, "Thermal characterization and optimization in platform FPGAs," presented at the Int. Conf. Comput.-Aided Des., San Jose, CA, Nov. 2006.

    Aman Gayasen received the B.Tech. degree in electrical engineering from Indian Institute of Technology, Delhi, India, in 2001, and the Ph.D. degree in computer engineering from Pennsylvania State University, University Park, in 2006. His Ph.D. dissertation was on the implications of future technologies on the design of FPGAs.

    He is a Senior R&D Engineer with Synopsys, Sunnyvale, CA. In the past, he has worked with Ikos Systems on behavioral synthesis for emulators. His research interests include reconfigurable devices and systems, nanotechnology, and all aspects of EDA.

    Vijaykrishnan Narayanan is a Professor with the Departments of Computer Science and Engineering and Electrical Engineering, Pennsylvania State University, University Park. His research interests include the areas of energy-aware reliable systems, embedded systems, on-chip networks, system design using emerging technologies (such as 3-D and nanotechnology), and computer architecture. For more information, visit http://www.cse.psu.edu/~vijay.

    Mahmut Kandemir (M'03) received the Ph.D. degree in electrical engineering and computer science from Syracuse University, Syracuse, NY, in 1999.

    He is an Associate Professor with the Computer Science and Engineering Department, Pennsylvania State University, University Park. His main research interests include optimizing compilers, I/O intensive applications, and power-aware computing.

    Dr. Kandemir is a member of the ACM. His research is supported by NSF, DARPA, SRC, and Microsoft.

    Arifur Rahman received the B.S. degree from Polytechnic University, Brooklyn, NY, and the M.S. and Ph.D. degrees from Massachusetts Institute of Technology (MIT), Cambridge, all in electrical engineering. His Ph.D. dissertation was on performance modeling of 3-D integrated circuits.

    He is a Senior Member of technical staff at Xilinx Research Laboratories, San Jose, CA. He has published more than 25 papers on this subject as well as field programmable gate arrays and sensors. He is an inventor with 8 patents granted and more than 25 patents pending.