Upload
ngotu
View
218
Download
0
Embed Size (px)
Citation preview
International Journal of Numerical Methods for Heat & Fluid FlowThermal simulations in support of multi-scale co-design of energy efficientinformation technology systemsYogendra Joshi Banafsheh Barabadi Rajat Ghosh Zhimin Wan He Xiao Sudhakar Yalamanchili SatishKumar
Article information:To cite this document:Yogendra Joshi Banafsheh Barabadi Rajat Ghosh Zhimin Wan He Xiao Sudhakar YalamanchiliSatish Kumar , (2015),"Thermal simulations in support of multi-scale co-design of energy efficientinformation technology systems", International Journal of Numerical Methods for Heat & Fluid Flow,Vol. 25 Iss 6 pp. 1385 - 1403Permanent link to this document:http://dx.doi.org/10.1108/HFF-08-2014-0242
Downloaded on: 07 August 2015, At: 12:03 (PT)References: this document contains references to 25 other documents.To copy this document: [email protected] fulltext of this document has been downloaded 1 times since 2015*
Access to this document was granted through an Emerald subscription provided by emerald-srm:168551 []
For AuthorsIf you would like to write for this, or any other Emerald publication, then please use our Emeraldfor Authors service information about how to choose which publication to write for and submissionguidelines are available for all. Please visit www.emeraldinsight.com/authors for more information.
About Emerald www.emeraldinsight.comEmerald is a global publisher linking research and practice to the benefit of society. The companymanages a portfolio of more than 290 journals and over 2,350 books and book series volumes, aswell as providing an extensive range of online products and additional customer resources andservices.
Emerald is both COUNTER 4 and TRANSFER compliant. The organization is a partner of theCommittee on Publication Ethics (COPE) and also works with Portico and the LOCKSS initiative fordigital archive preservation.
*Related content and download information correct at time ofdownload.
Dow
nloa
ded
by G
eorg
ia I
nstit
ute
of T
echn
olog
y A
t 12:
03 0
7 A
ugus
t 201
5 (P
T)
Thermal simulations in supportof multi-scale co-design ofenergy efficient information
technology systemsYogendra Joshi
Woodruff School of Mechanical Engineering, Georgia Institute of Technology,Atlanta, Georgia, USABanafsheh Barabadi
MIT, Cambridge, Massachusetts, USARajat Ghosh
Mechanical Engineering, Georgia Tech, Atlanta, Georgia, USAZhimin Wan and He Xiao
G.W. Woodruff School of Mechanical Engineering,Georgia Institute of Technology, Atlanta, Georgia, USA
Sudhakar YalamanchiliElectrical and Computer Engineering, Georgia Tech, Atlanta,
Georgia, USA, andSatish Kumar
Mechanical Engineering, Georgia Tech, Atlanta, Georgia, USA
AbstractPurpose – Information technology (IT) systems are already ubiquitous, and their future growth isexpected to drive the global economy for the next several decades. However, energy consumptionby these systems is growing rapidly, and their sustained growth requires curbing the energyconsumption, and the associated heat removal requirements. Currently, 20-50 percent of the incomingelectrical power is used to meet the cooling demands of IT facilities. Careful co-optimization of electricalpower and thermal management is essential for reducing energy consumption requirements of ITequipment. Such modeling based co-optimization is complicated by the presence of several decadesof spatial and temporal scales. The purpose of this paper is to review recent approaches for handlingthese challenges.Design/methodology/approach – In this paper, the authors illustrate the challenges and possiblemodeling approaches by considering three examples. The multi-scale modeling of chip level transientheating using a combination of Progressive Zoom-in, and proper orthogonal decomposition (POD) is aneffective approach for chip level electrical/thermal co-design for mitigation of reliability concerns, suchas Joule heating driven electromigration. In the second example, the authors will illustrate the optimalmicrofluidic thermal management of hot spots, and large background heat fluxes associated withfuture high-performance microprocessors. In the third example, data center facility level energy usagereduction through a transient measurements based POD modeling framework will be illustrated.Findings – Through modeling based electrical/thermal co-design, dramatic savings in energy usagefor cooling are possible.Originality/value – The multi-scale nature of the thermal modeling of IT systems is an importantchallenge. This paper reviews some of the approaches employed to meet this challenge.Keywords POD, Co-design, Multi-scale, Thermal modeling, Microfluidic cooling, Transient heatingPaper type General review
International Journal of NumericalMethods for Heat & Fluid Flow
Vol. 25 No. 6, 2015pp. 1385-1403
©Emerald Group Publishing Limited0961-5539
DOI 10.1108/HFF-08-2014-0242
Received 3 August 2014Revised 2 January 2015
Accepted 6 January 2015
The current issue and full text archive of this journal is available on Emerald Insight at:www.emeraldinsight.com/0961-5539.htm
1385
Energyefficient IT
systems
Dow
nloa
ded
by G
eorg
ia I
nstit
ute
of T
echn
olog
y A
t 12:
03 0
7 A
ugus
t 201
5 (P
T)
Nomenclature(c1, c2, c3) empirical constantsCp specific heatEm cumulative correlation energyg normalized temporal resolutionh heat transfer coefficient, W/m2Kh normalized spatial resolutionK thermal conductivity, W/mKl size of input space_m mass flow rate, kg/s_q heat transfer rate, WQ total chip power, Wt time, sT temperature, KT0 ambient temperature, KTbottom temperature at the bottom
surface of package, KTs temperature at the bottom
surface of package, Kz height, mm
λi eigenvalue corresponding toPOD modes
θ normalized time
Subscriptsamb ambientf fluidcond conductionconv convectiongen generation
AcronymsBT bismaleimide triazinePCB printed circuit boardFE pipeline frontend and L1
instruction cacheSC out-of-order schedulerInt integer unitFPU floating-point unitDL1 L1 data cache
1. IntroductionThermal management of information technology (IT) systems involves over tendecades of length scales, from on-chip interconnects (currently ~22 nm) to large datacenter facilities (~50-100 m in length), and a multitude of time scales. Thermal modelingat various scales is essential during design phase to insure performance and reliability.An emerging focus area is co-design, involving the consideration of two or more aspectsof the overall design, with the objective of reducing overall energy consumption. Energyconsumption in IT facilities and equipment is a growing concern, which is driving thisinterest. Figure 1 shows two dissimilar length scales, where the thermal analysis forco-design is illustrated in this paper.
On the left is a chip package. The current baseline is a single-tier package with aircooling. Temperature computations within such packages with multi-scale resolution
~2 m
Tier 3
Inlet
Inlet
Active layer
Microbump
PCB
BT substrate
Interposer
SiO2 & Metal layer
Tier 1
Tier 2
Outlet
Outlet
SEM of Micropins
Note: Both are considered in this paper to illustrate co-design approaches
Figure 1.A cm scale 3Dstacked chippackage withmicrofluidic cooling(left), and data center(right) illustrate themulti-scale spatialnature of thermalmodeling needs inIT systems
1386
HFF25,6
Dow
nloa
ded
by G
eorg
ia I
nstit
ute
of T
echn
olog
y A
t 12:
03 0
7 A
ugus
t 201
5 (P
T)
are needed under both steady state and transient operation, for assessing reliability.These are explored in Sections 2-4. The packaged two-tier chip stack incorporatingmicrofluidic cooling, shown in Figure 1 illustrates future packaging direction, toachieve the continuing goals of faster and more functional systems. In Section 5, thisconfiguration is analyzed in an electrical/thermal co-design framework to optimizecooling. The image on the right in Figure 1 illustrates a data center aisle, with perforatedtiles for cooling air delivery. Efficient computations of inlet air temperatures at serversallow for controlling cooling hardware to optimize energy consumption. A data-drivenmodeling framework for achieving this is presented in Sections 6-8.
2. Chip/package level thermal modeling for integrated circuit (IC) designand reliability analysisKey factors impacting IC chip performance and reliability are the knowledge of thermaland electrical performance. As the size of on-chip interconnects reduces below theelectron mean free path (~40 nm in Cu at room temperature), electron transportbecomes dominated by scattering at the metal-dielectric interface, and at grainboundaries. This scattering can decrease the electrical and thermal conductivity to lessthan half that of the bulk value (Fuchs, 1938; Sondheimer, 1952; Ziman and Levy, 1961;Namba, 1970; Soffer, 1967; Gurrum et al., 2004) making Joule heating a major concern inIC industry, requiring a comprehensive predictive framework.
It is important to note that on-chip interconnects are not isolated entities. They arearranged as a stack of metal lines, vias, and dielectric layers, as a part of a chip that isultimately enclosed in a package. The majority of heat is generated in the transistorsand metallic interconnects, with characteristic length scale of tens of nm (~10−8 m).This heat is then conducted through the passive/active components, substrate, andheat sink, prior to being convectively removed by the ambient environment. At thislevel, the characteristic length scale is on the order of tens of cm (10−1 m). Therefore,the effect of the entire structure should be accounted for, to acquire an accurate andrealistic thermal simulation of interconnects. In addition, transient thermal events inthe interconnect structure can vary from tens of s (102 s) at the package level to μs(10−6 s) at the interconnect level. Thus, the heat diffusion problem in interconnects is amulti-scale problem, both in length and time scales.
The traditional finite difference (FD) and finite element (FE) methods require a largenumber of computational nodes for such problems. As a result, computational timesare long, even for a unit cell (micro-model), and hence unfeasible for parametric studies.Analytical methods, less complex in geometry, are not able to capture the importantfeatures of the thermal problem. Additionally, previous models have been primarilyconcerned with the steady-state Joule heating in interconnects, while the pulsed natureof the electric currents necessitates studying the effects of transient heat conduction inthe interconnects as well. This confirms the need for the development of high-fidelitytransient multi-scale thermal models that can address the aforementioned issues.
Barabadi et al. (2012) developed a computationally efficient hybrid multi-scaletransient thermal modeling methodology. It incorporates two different multi-scalemodeling approaches: “Progressive Zoom-in” (e.g. Tang and Joshi (2005)), and “ProperOrthogonal Decomposition (POD)” (Lumley, 1967). This integration resulted in a modelthat has the capability of handling several decades of length scales, from tens of mm at“package” level to several nm at “interconnects”, corresponding to various thermalevents. This ability also applies for time scales from s to μs, at significantly lower
1387
Energyefficient IT
systems
Dow
nloa
ded
by G
eorg
ia I
nstit
ute
of T
echn
olog
y A
t 12:
03 0
7 A
ugus
t 201
5 (P
T)
computational cost, without compromising the desired accuracy. Hybrid scheme alsoprovides the ability to rapidly predict thermal responses under different power inputpatterns, based on only a few representative detailed simulations (Barabadi et al., 2015).It is demonstrated that utilizing the proposed model, the computational time is reducedby at least two orders of magnitude at every step of modeling.
To further describe the hybrid scheme, a chip and its function blocks, in a Flip ChipBall Grid Array (FCBGA) package are considered. The hybrid scheme utilizes acombination of Progressive Zoom-in, and POD as an effective approach for chip levelelectrical/thermal co-design for mitigation of reliability concerns, such as Joule heatingdriven electromigration. Random transient power distributions were assumed for thechip and its individual function blocks to demonstrate the capability of POD methodin predicting different thermal scenarios. To validate this methodology, the resultswere compared with a FE model developed in COMSOL®. The results of the modeldeveloped through hybrid scheme were in good agreement with the correspondingFE models.
3. Hybrid scheme for multi-scale thermal modelingAs mentioned earlier, hybrid scheme combines the implementation of POD techniqueand Progressive Zoom-in approach.
3.1 POD methodPOD is a powerful method of data analysis that offers low-dimensional but accuratedescriptions of a high-dimensional system. A comprehensive summary for applicationsof POD is given by Holmes et al. (1998). A detailed procedure to generate a twodimentional (2D) POD-based reduced order model is provided in Barabadi et al. (2011).The primary steps to generate a POD-based reduced order model are: generating theobservation matrix; calculating basis functions (PODmodes); calculating POD coefficients;and generating the POD temperature field (Barabadi et al., 2015). Combining PODwith theProgressive Zoom-in approach greatly improves the computational efficiency.
3.2 Progressive Zoom-in approachProgressive Zoom-in method is a top-down approach to model transient thermalproblems. It integrates package, chip, sub-chip level analyzes, acquiring the advantagesof each (e.g. Tang and Joshi, 2005). It has the capability of covering several orders ofmagnitude in length and time scale (Barabadi et al., 2011).
The overall hybrid approach is demonstrated in the flowchart provided in Figure 2.The section marked via dashed lines represents the area used in this paper to illustratethe proposed multi-scale transient thermal modeling approach. The scheme is outlinedin Figure 2, and is further explained through a case study in the following section.
At first, the entire package, including the surrounding components such asthe mold compound, solder bumps, substrate, etc. are numerically modeled and thetransient thermal solution is obtained. At this level, the chip is modeled as a solid blockwith uniform and effective material and thermal properties and no internal details. Thetransient thermal solution is then transferred into the chip in the form of dynamicallychanging boundary conditions. The next step is to develop the thermal model of thechip using the transferred boundary conditions. At this step, the internal functionblocks of the chip with their nonhomogeneous material and thermal propertiesare included in the modeling. The transient thermal solution at the chip level is then
1388
HFF25,6
Dow
nloa
ded
by G
eorg
ia I
nstit
ute
of T
echn
olog
y A
t 12:
03 0
7 A
ugus
t 201
5 (P
T)
determined by assigning the block specific power profiles. Afterwards, by following theprocedure explained in Section 3.1, the POD model is developed at the chip level.The POD model has the ability to predict the transient temperature profile in the chipfor different types of power sources and power maps at a much lower computationalcost. If further spatial resolution is desired, this approach can be utilized by repeatingthe previous steps until the desired resolution is achieved.
4. Case study and resultsThe focus of this study is to develop a transient thermal model at the chip and itsfunction block level, with a spatial resolution of 1 μm. The modeling steps are describedas follows.
4.1 Thermal simulation at the package levelIn order to have a realistic thermal solution at the chip level, we start from modelingthe entire FCBGA package (shown in Figure 3(a)), including the surrounding mold,underfill, solder bumps, substrate, etc., using COMSOL software. At this level, the chipis treated as a solid block with effective material and thermal properties, withoutconsidering internal details. The details of the package level modeling including thethermal and material property calculations are provided in Barabadi et al. (2012, 2015).This package is used for low power portable systems where compact form factorprohibits the use of heat sinks and forced cooling. Natural convection boundarycondition with a heat transfer coefficient of h¼ 15W/m2K (typical range for air; Tangand Joshi, 2005) was applied to the top surface and a constant temperature boundarycondition is assigned to the bottom surface.
Continue to desired resolution on chip
Transferring solution to function blocklevel as dynamic Boundary Conditions
Applying POD technique to chip level
Chip level thermal simulation
Transferring solution to chip levelas dynamic Boundary Conditions
Thermal simulation atpackage level
Source: Reprinted with permission Barabadiet al. (2012)
Figure 2.Flowchart of the
hybrid scheme formulti-scale thermal
modeling
1389
Energyefficient IT
systems
Dow
nloa
ded
by G
eorg
ia I
nstit
ute
of T
echn
olog
y A
t 12:
03 0
7 A
ugus
t 201
5 (P
T)
A full FE model is developed for the entire package consisting of 75,919 elements, ofwhich 343 are for the chip. The grid size is chosen after performing mesh independenceanalysis. Total power to the chip is Q¼ 3 sin 2πt+ 3 (W), which is applied for 1 s.The FE results for the spatial distribution of temperature in the package and chipafter 1 s is displayed in Figure 4(a) and (b), respectively.
4.2 Transferring the solution from package level to the chip levelOnce the transient thermal solution is obtained at the package level for 1 s, the solutionis then dynamically transferred into the chip every 0.1 s as boundary conditions. On thetop surface temperature, and on the bottom surface heat flux are applied to the chip.Due to the high-aspect ratio of the chip, the side walls are considered adiabatic.Afterwards, the thermal solution is then linearly interpolated on each surface withmuch higher spatial resolution (268,033 elements to model the chip at this level vs 343elements to model the chip at the package level).
12
3
89
10
yz
x
45
67
Top layer:Si substrate
Bottom layer:Interconnect/Dielectric multilayer
Middle layer:Device
z
Mold
Die
Substrate
Underfill
Tbottom=Ts
h, T0
Solder Bumps
x
(0,0,0)
(a)
(b)
Source: Reprinted with permission Barabadi et al. (2012)
(c)
Figure 3.(a) Schematic of asimplified FCBGAfor package levelmodeling,(b) zoomed-in outlineof the chip withartificial block, and(c) zoomed-inschematic of thelayers of the dieused in chip levelmodeling
(a) (b)
20×10–3
20×10–35×10–3
5×10–3
0
340347
346
345
344
343
342
341
340
330
320
310
300
T (K) T (K)
0
yz
x y zx
0
Source: Reprinted with permission Barabadi et al. (2012)
Figure 4.Spatial distributionof temperature (K)extracted from FEmethod after 1 µsfor (a) FCBGApackage and (b) chip
1390
HFF25,6
Dow
nloa
ded
by G
eorg
ia I
nstit
ute
of T
echn
olog
y A
t 12:
03 0
7 A
ugus
t 201
5 (P
T)
4.3 Chip level thermal simulationAt this level, chip is no longer assumed to be a solid block. It is now divided into tensubdomains called function blocks which symbolize various components on thechip (shown in Figure 3(b)). Each block has three layers, displayed in Figure 3(c). Theselayers are: top Si layer with the thickness of 0.249 mm; middle layer which is a5 µm-thick device layer, and Interconnect/Dielectric multilayer at the bottom with thethickness of 16.72 µm. The third layer consists of 21 sub-layers including ten metallayers. Details on the calculation of thermal and material properties of the functionblocks can be found in Barabadi et al. (2012, 2015). In order to have a realistic andaccurate thermal simulation, a detailed dynamic power map of the function blocksare required which is discussed in the following section.
4.4 Applying POD technique to chip levelAfter determining the temperature distribution at package level and transferring it tothe chip level as boundary conditions, a POD model is developed for chip level analysisusing the procedure discussed in Section 3.1. Total of six observations based on thetransient temperature solution were taken in the first 0.5 s using the package levelFE thermal solution that is transferred on to the chip level model setup. Theseobservations correspond to the temperature solutions obtained at different timeinstants. As previously demonstrated by Barabadi et al. (2011) for any linear system,POD method has the ability to predict transient thermal solution regardless of thetemporal or spatial dependence of the applied heat source. In other words, the PODsolutions of these transient thermal scenarios are independent of the initial observations.Essentially, for any linear system, once the solution to a sample case of chip total power isobtained, there is no need to generate new observations, or full field FE simulations. Thisfeature provides the ability to predict temperature distributions for arbitrary heat inputs,by using a smaller sample set of applied heat sources and power maps, which results inconsiderably decreased simulation time.
To further verify this feature of POD, a randomly generated transient powerdistribution, shown in Figure 5(d), is used in the POD thermal model development.As can be implied from Figure 5(d) the new power profile differs from the chip powerin duration, magnitude, and temporal dependency. Furthermore, since at this level ofsimulation the chip is no longer treated as a solid block and is divided into subdomains,the dynamic power grid also needs to be assigned to individual function blocks.Therefore, ten random power profiles with minimum and maximum values of 0 and3Wwere generated between zero and 1 s and assigned to each block in such a way thattheir sum will equal the total chip power as shown in Figure 5(d). The power sources forblocks 1, 2, and 10 are presented in Figure 5(a)-(c) as representatives.
After assigning the function blocks power profiles, the POD modes are calculated.In order to have a reliable but fast model, only five POD modes are utilized. This isselected such that the cumulative correlation energy, Em (Bizon et al., 2008) is largerthan 99.0 percent. Because the POD modes are 3D, for better visualization, 2D contourplots of the first six POD modes at the interface of device and interconnect/dielectriclayers are depicted in Figure 6. In this figure, POD modes are normalized with respectto the total sum of the generated modes. The POD coefficients, bi, were then calculatedas functions of time using the method of Galerkin Projection (Barabadi et al., 2015).
After calculating the POD modes and coefficients, the transient thermal solution canbe obtained. Figure 7 shows the spatial distribution of temperature after 0.95 s based on
1391
Energyefficient IT
systems
Dow
nloa
ded
by G
eorg
ia I
nstit
ute
of T
echn
olog
y A
t 12:
03 0
7 A
ugus
t 201
5 (P
T)
POD method (Figure 7(a)) and full field FE model (Figure 7(c)) inside the chip and itsfunction blocks (Figure 7(b)). The temporal distribution of temperature based on PODmodel and FE analysis at four different points inside the function blocks are alsoprovided in Figure 8. The mean absolute error between the POD and FE model is8 percent over the entire space and time domain. It must be noted that thecomputational time required for POD model is approximately 20 s, which is two ordersof magnitude less than that of FE results (~27 min).
4.5 Transferring the solution from chip level to the function block levelIf higher spatial resolution on the function blocks are required, the procedure inSection 3.2 can be repeated and transient thermal solution can be transferred to any of
POD Mode 1
POD Mode 4 POD Mode 5 POD Mode 6
POD Mode 2× 10–3 × 10–3 × 10–3
× 10–3 × 10–3 × 10–3
3.3
3.2
3
3.1
2.8
4
2
0
–2
–4–6
–8
x
y
4
6
2
0
–2
–4
–6
4
6
2
0
–2
–4
–6
2.9
64
–2
0
2
–6
–4
4
2
–4
–2
0
–6
–8
POD Mode 3
Figure 6.2D contour plots ofthe first six PODmodes at theinterface of deviceand interconnect/dielectric layers
0
1
2
3
0 0.5 1
Pow
er (
W)
Time (s)
00.5
11.5
22.5
0 0.5 1
Pow
er (
W)
Time (s)
0
0.5
1
1.5
2
0 0.5 1
Pow
er (
W)
Time (s)
05
101520
0 0.5 1Tota
l Pow
er (
W)
Time (s)
(a) (b)
(d)(c)
Block 2
Block 10
Block 1
Source: Adapted from Barabadi et al. (2015)
Figure 5.Dynamic randomlygenerated powerprofile for functionblocks 1, 2, 3 (a-c)and randomlygenerated total chippower (d)
1392
HFF25,6
Dow
nloa
ded
by G
eorg
ia I
nstit
ute
of T
echn
olog
y A
t 12:
03 0
7 A
ugus
t 201
5 (P
T)
the function blocks in terms of boundary conditions. In general, procedures 3.1-3.4 canbe repeatedly applied, until the desired resolution on the chip is achieved.
5. Thermal modeling for co-design of microfluidically cooled 3D ICsWire interconnections become longer in 2D topology, with increasing number oftransistors, which can increase the latency in the wire and reduce the expected benefitfrom integration (Meindl, 2003); 3D ICs as the next generation IC technology achievehigher integration with the technology of through-silicon vias. The vertical integrationcan reduce the global wire length by as much as 50 percent (Khan et al., 2009). The wirelimited clock frequency could be increased nearly fourfold (Bamal et al., 2006). However,the increased power density due to the stack results in great thermal challenges. If theheat cannot be removed effectively, the temperature rise will degrade the electricalperformance, and adversely impact reliability. One important drawback of high
(b)
T (K)
338
339
340
341
342
343
342
341
343
344
345
346
347× 10
–4
0
–0.5
–1
–2
–1.5
–2.5
0.01
0.005
0 0
0.005
0.0
–3
z (m
)y (m) x (m)
(c)T (K)
(a)
z (m
)
0
–0.5
–1
–1.5
–2
–2.5
–30.01
0.005
0 0
0.005
0.0
y (m) x (m)
× 10–4
Notes: Extracted from the POD model (a) and FE simulation (c). The domain is slicedhorizontally along the XY plane. The outline of chip with the function blocks are shownin (b)
Figure 7.Spatial distributionof temperature (K)
at 0.95 s
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7
0
10
20
30
40
50
60
70
Time (s)
ΔT (
K)
o o o COMSOLPOD
Figure 8.Comparison of
temporal dependenceof temperature rise
between FE(markers) and POD(solid lines) models
at four differentpoints inside the die
1393
Energyefficient IT
systems
Dow
nloa
ded
by G
eorg
ia I
nstit
ute
of T
echn
olog
y A
t 12:
03 0
7 A
ugus
t 201
5 (P
T)
temperature in ICs is the non-linear increase in leakage power (Roy et al., 2003).The leakage power cannot be used by the transistors, and further exacerbates thethermal management challenge. Inter-tier microfluidic cooling has been proposed as apossible solution to the thermal problem of 3D ICs due to its high performance andscalability (Bakir and Meindl, 2008).
In order to design single phase microfluidic cooling, accurate thermal modeling isneeded. However, computational fluid dynamics/heat transfer (CFD/HT) based detailednumerical analysis methods, such as FE or finite volume, need very fine mesh, and aretime-consuming, and thus not suitable for optimization of complex 3D architectures.Compact or reduced thermal modeling is typically used to improve computationalefficiency. A fast and accurate thermal circuit model for 3D ICs with an integratedmicrochannel heat sink was proposed (Mizunuma et al., 2009). This model achievedmore than 3,300× speedup with less than 5 percent error, in comparison with acommercial numerical finite volume simulation tool. However, it was only applicableto steady state conditions and numerical pre-simulations were needed for everysimulation executed, which further increased the calculation time.
A 3D-ICE is a transient compact thermal model, which takes into account the effectsof the inter-tier cooling through a microchannel heat sink (Sridhar et al., 2010). Only onenode was used for each channel block, which reduced the problem size greatly.Empirical correlations for heat transfer coefficient were used, so no detailed convectiveheat transfer analysis was needed. However, the empirical correlations utilized were forfully developed flow and did not capture the developing flow regime.
Another technique to model the integrated microchannel cooling of 3D ICs wasbased on modeling the channel layer as a porous medium (Sridhar et al., 2010b).A similar porous medium model as in (Sridhar et al., 2010b) was developed to analyzethe thermal characteristics of 3D ICs with integrated microgaps. One limitation ofthe previous thermal resistance method (Sridhar et al., 2010a) was that the grid sizemust be the same as the channel pitch. Using the porous medium model removed thislimitation by homogenizing the channel layers. As such, it provided freedom to increasethe grid size, resulting in faster simulations.
The main problems of the existing thermal modeling of 3D ICs incorporatingmicrofluidic cooling are: the power map is either assumed uniform or not realistic.When a specific application runs on a chip, the power should be non-uniform; thecoupling effect between power and thermal is not included. Increased temperaturecould lead to higher consumption of power, due to increased leakage power.The leakage power raises the temperature further. So it is important to include thetemperature dependent leakage power in the thermal model.
Figure 9 shows the schematic of physical model of 3D ICs with inter-tier microfluidiccooling (Wan et al., 2013). The two-tiers stack with processor and L2 cache memory isplaced on the substrate through interposer. Each tier consists of a SiO2 and metal layer,an active layer, and a silicon base layer. SiO2 is used for insulation and bonding. Activelayer is a thin layer in which most of the heat is generated. Staggered micro pin finarray used for cooling is fabricated between the two tiers by Deep Reactive Ion Etching.Signal vias are embedded in the pin fins to provide communication between the twotiers. The stack is electrically connected to the outside through the Si interposer. Thechip size is 8.4 mm× 8.4 mm. Thickness of the silicon base and is 100 μm. Thicknessof SiO2 and metal is 10 μm. The active layer is assumed zero thickness for simplicity.
The pin fin diameter is 100 μm, height 300 μm, and pitches in both directions are200 μm. De-ionized (DI) water with inlet temperature 20°C is used for cooling.
1394
HFF25,6
Dow
nloa
ded
by G
eorg
ia I
nstit
ute
of T
echn
olog
y A
t 12:
03 0
7 A
ugus
t 201
5 (P
T)
A coupled power-thermal mode is used to model the 3D ICs in Figure 9. Figure 10shows the floorplans for processor and L2 cache. The processor is a ×86 16-coresprocessor resemble the Intel Nehalem structure, a typical Out-of-Order microarchitectureincluding: FE (pipeline frontend and L1 instruction cache), SC (Out-of-Order scheduler),INT (integer unit), FPU (float-point unit) and DL1 (L1 data cache). Each core has a privateL1 data cache of 128 KB and a shared L2 cache. The L2 cache layer consists of 16S RAMbanks each with size of 2 MB.
A cycle-level timing model is used to execute compiled 32-bit× 86 binaries.Simulations are run for 500 M clock cycles to warm up the processor state and reacha “region of interest” where the computational characteristics are representative in thebenchmark program. The power at each block in the processor floor plan is sampled
Silicon base
Active layer
Staggered micropin fin array
Silicon base
Active layer
SiO2 andMetal layer
Interposer
Substrate
PCB
Flow
direc
tion
Figure 9.Schematic of
physical model of3D ICs with inter-tiermicrofluidic cooling
FPU
INT
DL1
DL1
INT
FPU
FPU
INT
DL1
DL1
INT
FPU
SC SCINT
DL1
DL1
INT
FPU
FPU
INT
DL1
INT
FPUSCSC
FE FE
FE FE
SC SC
SC SC
FE FE
FE FE
DL1
FPU FPU SC SC FPU
L2
L2
L2
L2 L2 L2 L2
L2 L2 L2
L2 L2 L2
L2 L2 L2
8.4m
m
8.4m
m
INT
DL1
DL1
INT
FPU
FPU
INT
DL1
DL1
INT
FPU
FE FE
FE FE
SC SC
SC SC
FE FE
FE FE
SC SC
INT
DL1
DL1
INT
FPU
INT
DL1
DL1
INT
FPU
8.4 mm 8.4 mm
FPU
Notes: (a) Processor; (b) L2 cacheSource: Wan et al. (2013)
Figure 10.Floor plans
1395
Energyefficient IT
systems
Dow
nloa
ded
by G
eorg
ia I
nstit
ute
of T
echn
olog
y A
t 12:
03 0
7 A
ugus
t 201
5 (P
T)
every 10 μs to produce a power trace. Table I shows the power distribution for eachmodule. The total power of processor is 165.5W with maximum heat flux 301.3 W/cm2.The total power of L2 cache is 49.6 W. The power distribution will in input to thethermal model.
First, the 3D stack is discretized into multiple control volumes. Each of the controlvolumes is around one pin and consists of the fluid and solid domain. The temperaturesof the two active layers and the fluid within each control volume are assumed uniformand stored in three 2D matrix. Figure 11 shows the energy flow within each controlvolume. Each tier has in-plane conduction from four directions. Also there areconduction between the two tiers, conduction between the stack and PCB, convectionbetween the stack and coolant.
The thermal model is based on finite volume approximation of the energy equationsfor both the solid and fluid within the control volumes:
Solid: _qgenþ _qcondþ _qconv ¼ 0 (1)
Fluid: _mCpðTf ;in�Tf ;outÞþ _qconv ¼ 0 (2)
where _qgen is power dissipation. It consists of two parts: dynamic power from the powersimulation, and static power, which depends on temperature, _qcond , _qconv are the heatconduction and heat convection rate terms, _m is the mass flow rate into the controlvolume. Cp is the heat capacity of the fluid. A system of energy equations could be builtfor the 3D ICs. The conduction and convection heat rate can be determined by thetemperatures of two tiers and fluid. So the system of energy equations could betransferred to a system of equations with only temperatures as variables. Afteriterative calculations, the temperature within each control volume can be obtained.
Figure 12 shows the temperature distribution of processor and L2 cache whenthe pumping power is 0.02W and processor is located at the bottom of the stack.
FPU (W) INT (W) DL1 (W) SC (W) FE (W) L2 (W)
Core 9 1.16 1.52 2.68 1.73 0.96 3.1Other cores 1.69 2.06 2.81 2.31 1.63 3.1
Table I.Power distributionfor each module
Heat into the fluid
Heat into the
control volume
Figure 11.Energy flow withinone control volume
1396
HFF25,6
Dow
nloa
ded
by G
eorg
ia I
nstit
ute
of T
echn
olog
y A
t 12:
03 0
7 A
ugus
t 201
5 (P
T)
The temperature of processor is non-uniform due to the non-uniform powerdistribution. The maximum temperature is near the exit of the fluid due to the bulkfluid temperature. Although the power distribution of L2 cache is uniform, itstemperature distribution is non-uniform due to the cross-tier heat conduction. Table IIshows the maximum temperature and leakage power under various pumping power.When the pumping power is 0.005W, the maximum temperature is 123.1°C and theleakage power is 26.7 W. If the air cooled heat sink instead of inter-tier microfluidiccooling is used for cooling, the temperature and leakage power would be much higher.When the pumping power is increased to 0.12W, the maximum temperature is reducedto 65.7°C and leakage power is reduced to 7.3W. This proves the inter-tier microfluidiccooling could achieve significant performance enhancement.
6. Data-driven modeling of air temperatures near data center racksPOD, introduced in Section 3.1, can pave the way for an efficient decision supportsystem to optimize the energy consumption of a data center. The most attractivefeature of POD is its computational efficiency and flexibility. Currently, data centercooling hardware is controlled based on air temperature measurements. Sampling rateis a critical cost item for such transient temperature data acquisition. High samplingrate means larger sampling cost and vice versa. To solve this problem, we conducted acase study with the hypothesis that a POD-based data-driven algorithm can improvethe temporal granularity of the temperature data, acquired using selected sensors.Data center air temperatures are a function of time and rack heat load. Depending uponthe sampling intervals of time and rack heat load, air temperature interrogation pointcan lie on one of the nine possible interrogation spaces, as shown in Figure 13. Four of
Processor
8
7
6
5
4
2
1
00 2 4 6 8
(mm)
40
50
60
70
80
89.4 °C 89.4 °CL2 cache
7
6
5
4
3
1
00 2 4
(mm)
6 840
50
60
70
80
2
8
Flo
w d
irect
ion
(mm
)
Flo
w d
irect
ion
(mm
)
3 Figure 12.Temperature
distribution ofprocessor and
L2 cache, pumpingpower 0.02W
Pumping power (W) Tmax(°C) Leakage power (W)
0.005 123.1 26.70.01 110.5 17.20.015 101.6 12.10.02 89.4 8.90.04 80.1 7.60.08 70.9 7.40.12 65.7 7.3
Table II.Maximum
temperature andleakage power under
various pumpingpower
1397
Energyefficient IT
systems
Dow
nloa
ded
by G
eorg
ia I
nstit
ute
of T
echn
olog
y A
t 12:
03 0
7 A
ugus
t 201
5 (P
T)
these nine zones, namely Zones A-D convey most physical significance. Zone Arepresents normal mode of operation, Zone B represents failure mode, Zone Crepresents critical mode, and Zone D represents retrofit mode. The modeling fidelity isdocumented in Figure 14. It shows the maximum errors for Zones A-D are equal to 1, 9,12.5, and 12.5 percent, respectively, similar to Ghosh and Joshi (2014b) for slightlydifferent conditions.
7. Prognosis-based reliability modeling of data center air temperaturesTemperature prognosis, or future prediction, at critical locations, especially at severinlets, is potentially beneficial for the reliability monitoring of data centers. PODfacilitates an efficient temperature prediction tool; however, the modeling accuracyis a critical concern for the prognosis. Therefore, error conditioning is an importantrequirement. The error – defined as the differences between data and correspondingpredictions – can be computed using a data-driven approach or by a semi-analyticalformulation. The semi-analytical formulation is more cost-effective because it requiresminimal hardware deployment. The analytical error is given by:
E ¼ c1 gþhð Þþ lexpðyÞ1�y
c2þc3gð ÞXni¼kþ 1
li
" #12
(3)
where g is the normalized temporal resolution, h is the normalized spatialresolution, l is the size of input space, θ is the normalized time, λi is the eigenvaluecorresponding to POD modes, k is the number of retained POD modes, c1, c2, c3 arearbitrary constants.
Figure 15 shows the numerical matching procedure to determine the empiricalconstants (c1, c2, c3). The redline with open triangular markers shows how the data-driven prediction errors–deviations between data and predictions – vary with time.The predictions are computed using input data generated by measuring transientresponse of air temperature ensuring sudden switch-on of the test computer room airconditioning unit. The temperature data are collected in [s interval at 10 s samplingperiod. Figure 15 shows the extrapolation horizon staring from 201 s. The extrapolationwindow lasts till 224 s, because the error becomes higher than an arbitrarily-specifiedabsolute error bound of 2°C. The empirical constants (c1, c2, c3) are determined by theconjugate gradient method. This matching shown in Figure 15 is conducted for
t_limit
t_bound
O Q_bound Q_limit
Failure ModeZone-B
Retrofit ModeZone-D
Normal ModeZone-A
Critical ModeZone-C
Figure 13.Two dimensionalparametric space,comprising heat loadand time, for datacenter airtemperatureprediction
1398
HFF25,6
Dow
nloa
ded
by G
eorg
ia I
nstit
ute
of T
echn
olog
y A
t 12:
03 0
7 A
ugus
t 201
5 (P
T)
DAT
A
300
932
2,00
0
567
PR
ED
300
932
2,00
0
567
DE
V
300
932
2,00
0
00.1
0.2
0.3
0.4
0.5
DAT
A
300
932
2,00
0
101214
PR
ED
300
932
2,00
0
101214
DE
V
300
932
2,00
0
00.5
11.5
2
DAT
A
300
932
2,00
0(°
C)
(°C
)(°
C)
(°C
)(°
C)
(°C
)
(°C
)(°
C)
(°C
)(°
C)
(°C
)(°
C)
7891011P
RE
D
300
932
2,00
0
7891011D
EV
300
932
2,00
0
00.2
0.4
0.6
0.8
1D
ATA
300
932
2,00
0
10121416
PR
ED
300
932
2,00
0
10121416
DE
V
300
932
2,00
0
00.5
11.5
2
Zon
e A
: Max
. Err
or~
1%Z
one
C: M
ax.
Zon
e B
: Max
. Err
or~
9%Z
one
D: M
ax.
Figure 14.Fidelity of POD-based prediction
framework
1399
Energyefficient IT
systems
Dow
nloa
ded
by G
eorg
ia I
nstit
ute
of T
echn
olog
y A
t 12:
03 0
7 A
ugus
t 201
5 (P
T)
the average air temperature. Similar procedure can be performed for all sensor points.Unless the data center condition is drastically changed, the calibration constants oncedetermined can be used repeatedly over time.
The importance of the error conditioning is evident from Figure 16, comprising thecontour plots generated from the air temperature data, predictions, and deviations
–2
–1.5
–1
–0.5
0
0.5
Time (s)
Ext
rapo
latio
n E
rror
(°C
)
NumericalAnalytical
201 205 210 215 220 224
Notes: For this particular data set, c1=3.1, c2=–0.02,c3=–0.02
Figure 15.Specification of apriori error by theconjugate gradientmethod
(a) (b) (c)Data
0 300 600
260
600
940
1,280
1,620
1,960
12
13
14
15
16
17
18
19
20
21
22POD
Time=206s0 300 600
260
600
940
1,280
1,620
1,960
12
14
16
18
20
22Deviations
0 300 600
260
600
940
1,280
1,620
1,960
–2
–1.5
–1
–0.5
0
0.5
Max Abs Error=2.2 °C
(°C) (°C) (°C)
Figure 16.Fidelity of POD-based temperatureprognosis at 206 s atthe test rack inlet
1400
HFF25,6
Dow
nloa
ded
by G
eorg
ia I
nstit
ute
of T
echn
olog
y A
t 12:
03 0
7 A
ugus
t 201
5 (P
T)
at the test server inlet at 206 s. Figure 16(c) shows the deviations are considerablylarge ranging from −2.0°C to 0.5°C, i.e. more than 10 percent of the maximum inletair temperature. These are consistent with Ghosh and Joshi (2013).
8. Sensor fusion for reducing temperature monitoring costMeasurement-based thermal control is of paramount importance to optimize energyconsumption in data centers. Therefore, cost-effective data center management demandsoptimizing sensor number for monitoring air temperatures at critical locations. Theoptimization problem pertains to minimizing sensor numbers, while the error remainsbelow a certain pre-assigned limit.
min ðsensor numberÞsuch that; deviation o error limit:
The effectiveness of POD for reducing the number of sensors is assessed using thetemperature data collected at the test rack exhaust located near the top of the test rackas shown in Figure 17. The input air temperature data are collected by 13 sensors at thelocations marked by black filled circles. These are used to predict air temperatures atthe locations indicated by open circles. The prediction accuracy of the POD-basedframework shows a maximum relative deviation of 2.2 percent. The high predictiveaccuracy at new locations indicates the usefulness of POD in reducing the number ofmonitoring sensors. In this particular case, the required sensor number has beenreduced to 13 from 21, a 38 percent savings in number of sensors. The detailed casestudy can be obtained in Ghosh and Joshi (2014a).
9. ConclusionThree examples are presented to illustrate the emerging thermal modeling needs toenable co-design of multi-scale IT systems, with the objective of energy efficientoperation, while meeting the performance and reliability targets. The computationally
Maximum temperature variation; usefulfor thermal prognostics
Z=2,000 mm Plane
600
450
300
00
8 new data generation from 13input data: 38% sensor saving
150 300 450 600
150
z
xy
Figure 17.Test plane for thesensor reduction
1401
Energyefficient IT
systems
Dow
nloa
ded
by G
eorg
ia I
nstit
ute
of T
echn
olog
y A
t 12:
03 0
7 A
ugus
t 201
5 (P
T)
intensive nature of full scale simulations, makes these infeasible over the severaldecades of length and time scales encountered. Instead, reduced order thermalmodeling techniques are employed, in order to couple the thermal analyses with theoverall co-design framework. Two such techniques illustrated are based on controlvolume energy balances, and POD. For the control volume based approach, combinedconduction and convection modeling of enhanced microfluidic cooling surfaces in 3Dchip stacking allows for a three-orders of magnitude speedup in computation, comparedto a fully resolved finite volume simulation, while keeping prediction errors with 5percent, thus making electrical/thermal co-design possible. For the POD technique, bothcomputational observations, as well as a data-driven modeling framework are presented.The POD technique is found to provide a two order of magnitude faster computation,compared to a full-field simulation, with prediction errors within 10 percent. This enablestheir effective use in multi-scale simulations using a Progressive Zoom-in approach.For the data-driven framework, a nearly 40 percent reduction in sensors needed forcreating air temperature maps for sensing based control of cooling hardware is achievedfor server racks in data centers. These techniques are potentially applicable to a broadrange of multi-scale microsystems.
References
Bakir, M.S. and Meindl, J.D. (Eds) (2008), Integrated Interconnect Technologies for 3DNanoelectronic Systems, Artech House, Norwood.
Bamal, M., Stucchi, M., Verhulst, A.S., Van Hove, M., Cartuyvels, R., Beyer, G. and Maex, K.(2006), “Performance comparison of interconnect technology and architecture options fordeep submicron technology nodes”, 2006 International Interconnect TechnologyConference, San Francisco, CA, pp. 202-204.
Barabadi, B., Joshi, Y. and Kumar, S. (2011), “Prediction of transient thermal behavior of planarinterconnect architecture using proper orthogonal decomposition method”, ASME 2011Pacific Rim Technical Conference and Exhibition on Packaging and Integration of Electronicand Photonic Systems, American Society of Mechanical Engineers, pp. 213-224.
Barabadi, B., Kumar, S., Sukharev, V. and Joshi, Y.K. (2015), “Multiscale transient thermal analysis ofmicroelectronics”, Journal of Electronic Packaging, Vol. 137 No. 3, pp. 031002-031002-8.
Barabadi, B., Kumar, S., Sukharev, V. and Joshi, Y. (2012), “Multi-scale transient thermal analysisof microelectronics”, ASME 2012 International Mechanical Engineering Congress andExposition, American Society of Mechanical Engineers, pp. 767-774.
Bizon, K., Continillo, G., Russo, L. and Smula, J. (2008), “On POD reduced models of tubular reactorwith periodic regimes”, Computers & Chemical Engineering, Vol. 32 No. 6, pp. 1305-1315.
Fuchs, K. (1938), “The conductivity of thin metallic films according to the electron theory ofmetals”, Mathematical Proceedings of the Cambridge Philosophical Society, Vol. 34 No. 1,pp. 100-108.
Ghosh, R. and Joshi, Y. (2013), “Error estimation in POD-based dynamic reduced-order thermalmodeling of data centers”, International Journal of Heat and Mass Transfer, Vol. 57 No. 2,pp. 698-707.
Ghosh, R. and Joshi, Y. (2014a), “Proper orthogonal decomposition-based modeling frameworkfor improving spatial resolution of measured temperature data”, IEEE Transactions onComponents, Packaging and Manufacturing Technology, Vol. 4 No. 5, pp. 848-858.
Ghosh, R. and Joshi, Y. (2014b), “Rapid temperature predictions in data centers usingmulti-parameter proper orthogonal decomposition”, Numerical Heat Transfer, Part A:Applications, Vol. 66 No. 1, pp. 41-63.
1402
HFF25,6
Dow
nloa
ded
by G
eorg
ia I
nstit
ute
of T
echn
olog
y A
t 12:
03 0
7 A
ugus
t 201
5 (P
T)
Gurrum, S.P., Joshi, Y., King, W.P. and Ramakrishna, K. (2004), “Numerical simulation of electrontransport through constriction in a metallic thin film”, Electron Device Letters, IEEE,Vol. 25 No. 10, pp. 696-698.
Holmes, P., Lumley, J.L. and Berkooz, G. (1998), Turbulence, Coherent Structures, DynamicalSystems and Symmetry, Cambridge University Press, Cambridge, CA.
Khan, N.H., Alam, S.M. and Hassoun, S. (2009), “System-level comparison of power deliverydesign for 2D and 3D ICS”, Proceedings 3DIC 2009, San Francisco, CA, pp. 1-7.
Lumley, J.L. (1967), “The structure of inhomogeneous turbulent flows”, Atmospheric Turbulenceand Radio Wave Propagation, pp. 166-178.
Meindl, J.D. (2003), “Beyond Moore’s law: the interconnect era”, Computing in Science &Engineering, Vol. 5 No. 1, pp. 20-24.
Mizunuma, H., Yang, C.L. and Lu, Y.C. (2009), “Thermal management of 3D-ICs with integratedmicrochannel cooling”, Proceedings of ICCAD, Vol. 2009, pp. 256-263.
Namba, Y. (1970), “Resistivity and temperature coefficient of thin metal films with roughsurface”, Japanese Journal of Applied Physics, Vol. 9 No. 11, pp. 1326-1329.
Roy, K., Mukhopadhyay, S., Mahmoodi-Meimand, H. (2003), “Leakage current mechanisms andleakage reduction techniques in deep-submicrometer CMOS circuits”, Proceedings of IEEE,Vol. 91, pp. 305-327.
Soffer, S.B. (1967), “Statistical model for the size effect in electrical conduction”, Journal of AppliedPhysics, Vol. 38 No. 4, pp. 1710-1715.
Sondheimer, E. (1952), “The mean free path of electrons in metals”, Advances in Physics, Vol. 1No. 1, pp. 1-42.
Sridhar, A., Vincenzi, A., Ruggiero, M., Brunschwiler, T. and Atienza, D. (2010a), “3D-ICE:fastcompact transient thermal modeling for 3D ICs with inter-tier liquid cooling”, Proceedingsof 2010 IEEE/ACM International Conf. Computer Aided Design, pp. 463-470.
Sridhar, A., Vincenzi, A., Ruggiero, M., Brunschwiler, T. and Atienza, D. (2010b) ,“Compacttransient thermal model for 3D ICs with liquid cooling via enhanced heat transfer cavity”,Proceedings of 2010 16th International Workshop on Thermal Investigations of ICs andSystems, pp. 1-6.
Tang, L. and Joshi, Y. (2005), “A multi-grid based multi-scale thermal analysis approach forcombined mixed convection, conduction, and radiation due to discrete heating”, Journal ofHeat Transfer, Vol. 127 No. 1, pp. 18-26.
Wan, Z., Xiao, H., Joshi, Y. and Yalamanchili, S. (2013), “Electrical/thermal co-design of multi-corearchitectures and microfluidic cooling for 3D stacked Ics”, Proceedings of 2013 19thInternational Workshop on Thermal Investigations of ICs and Systems, pp. 237-242.
Ziman, J.M. and Levy, P.W. (1961), “Electrons and phonons”, Physics Today, Vol. 14 No. 11,pp. 64-66.
Corresponding authorProfessor Yogendra Joshi can be contacted at: [email protected]
For instructions on how to order reprints of this article, please visit our website:www.emeraldgrouppublishing.com/licensing/reprints.htmOr contact us for further details: [email protected]
1403
Energyefficient IT
systems
Dow
nloa
ded
by G
eorg
ia I
nstit
ute
of T
echn
olog
y A
t 12:
03 0
7 A
ugus
t 201
5 (P
T)