

Bringing 3D COTS DRAM Memory Cubes to Space

Anthony Agnesina
Department of ECE
Georgia Institute of Technology
Atlanta, GA 30332
404-894-0373
[email protected]

James Yamaguchi
3D Stacking Systems
Irvine Sensors Corporation
Costa Mesa, CA 92626
714-444-8785
[email protected]

Christian Krutzik
3D Stacking Systems
Irvine Sensors Corporation
Costa Mesa, CA 92626
714-444-8700
[email protected]

John Carson
3D Stacking Systems
Irvine Sensors Corporation
Costa Mesa, CA 92626
714-444-8725
[email protected]

Jean Yang-Scharlotta
NASA Jet Propulsion Laboratory
California Institute of Technology
Pasadena, CA 91109
818-354-0412
[email protected]

Sung Kyu Lim
Department of ECE
Georgia Institute of Technology
Atlanta, GA 30332
404-894-0373
[email protected]

Abstract—This paper details the architectural choices and implementation challenges faced in the building and validation of a space-qualified 3D DRAM memory system, in an effort to offer the high memory capacity, increased bandwidth, fault tolerance, and improved size-weight-and-power characteristics needed for harsh space mission environments. Our novel horizontal 3D stacking technology called "Loaf-Of-Bread" (LOB) is used to integrate multiple Commercial-Off-The-Shelf (COTS) DRAM memory dies into a cube structure (3D-M3). A custom Radiation-Hardened-By-Design (RHBD) controller sitting underneath the cube supplements the 3D-M3 in addressing COTS radiation weaknesses by including advanced SEU and SEFI mitigation features such as error detection and correction, scrubbing, device data rebuilding, and die management. We developed a custom DDR physical layer (PHY) for 14 independent dies to connect the 3D-M3 to its controller.

Validation and functional evaluation of the ASIC controller will be conducted prior to tape-out on a custom FPGA-based emulator platform integrating the 3D stack. The selected test methodology ensures high-quality RTL and allows the cube structure to be subjected to radiation testing. The proposed design concept allows for flexibility in the choice of the DRAM die in case of technology roadmap changes or unsatisfactory radiation results.

TABLE OF CONTENTS

1. INTRODUCTION
2. CUBE DESIGN CONSIDERATIONS
3. CONTROLLER ARCHITECTURE
4. DESIGN FOR FLEXIBILITY
5. FPGA TEST BOARD
6. CONCLUSIONS
ACKNOWLEDGMENTS
REFERENCES
BIOGRAPHY

1. INTRODUCTION

The on-board capabilities of space computing have not kept up with the needs of current and future important NASA mission scenarios, which today cannot be accomplished in a robust and cost-effective fashion. In particular, the Deep Space exploration program has unfulfilled and unique requirements such as high-performance data processing and autonomous operations, along with extreme needs for energy management, efficiency, fault tolerance, and resilience. Although technology development efforts are underway to develop a High Performance Space Computing/Next Generation Space Processor (HPSC/NGSP), a radiation-hardened multi-core processor expected to fuel missions until 2030, they do not address the limitations of current on-board memory systems. Our cube would prove to be a complementary memory system for HPSC and other space computing systems targeting high performance. Our complete memory system can address the computational performance, energy management, and fault tolerance needs of space missions. The 3D memory cube can potentially be utilized in many aspects of the spacecraft beyond the main compute/processor. Other spacecraft systems such as instruments, detectors, sensors, and communications may utilize our high-density, high-bandwidth memory cube to enable high-performance operations.

978-1-5386-6854-2/19/$31.00 ©2019 IEEE

Progressive testing of Commercial-Off-The-Shelf (COTS) devices for space applications has opened up new windows of opportunity for increased functionality of space electronics. Not only does this increase the component selection choices, but it also drives down procurement time and system cost. Our approach is to leverage COTS memory devices to maximize memory density, bandwidth, and speed by integrating them into a 3D memory cube supplemented with a controller chip. Features integrated in the controller chip to address COTS deficiencies in terms of radiation tolerance include error correction and detection (EDAC), scrubbing, device data rebuilding, and die-level reboot and swap.

Our complete memory module can be directly connected to a host processor and act as a hot-pluggable DDR3 module of large capacity (8 GB). The controller has been tested on an FPGA from the Xilinx Virtex-6 family, and a complete test board structure is in development to subject the cube to radiation sources. This paper is intended as a natural continuation of, and complement to, the work by the same authors presented in the introductory article [1].

2. CUBE DESIGN CONSIDERATIONS

After a long wait, three-dimensional integrated circuits (3D ICs) are now becoming a fairly mainstream technology, most notably in consumer 3D memory cubes such as the High Bandwidth Memory (HBM) [2] and the Hybrid Memory Cube (HMC) [3], where multiple memory dies are vertically integrated and connected through thousands of through-silicon vias (TSVs) to offer "more than Moore" improvements in terms of size, capacity, speed, and power consumption. This type of memory part could truly enable the aforementioned space missions, but unfortunately none of them is currently ready for space. In particular, it is unclear how TSVs behave when deployed in cryogenic temperature environments [4].

Leveraging TSV-Based 3D ICs?

Very little work has been found on 3D IC reliability analysis for space applications. Gouker et al. [5] studied radiation effects on 3D ICs for the first time and characterized Single Event Transients (SETs) for a small-scale logic circuit implemented in a 3-tier 3D IC using a 180nm SOI process. Each tier contains simple circuitry that measures SET pulse width and is connected with 1.25µm TSVs. Their heavy-ion testing based on krypton shows that the pulse width distribution across the tiers exhibits non-trivial tier-to-tier variations. They observed that the various materials used in the 3D IC lead to different levels of energy deposited during the radiation testing. The authors also conducted a similar study using proton and neutron beams on an SRAM module implemented in the same 3D IC technology. Unlike the logic circuit, their SRAM module suffers less from tier-to-tier effects when the un-thinned bottom tier is rad-hardened with a special SOI wafer. In both studies, however, the focus is not on TSVs, and no rad-hard design method is presented. In addition, the designs used are nowhere close to commercial memory cubes. Moreover, existing work on actual commercial memory cubes is limited and mainly focused on processing-in-memory architecture research. While some efforts have been made to reduce power and bandwidth overhead, none of these solutions addresses radiation hardening of memory cubes or their operation under extreme temperature.

One solution to alleviate the lack of radiation hardening of these stacks would be to incorporate an HBM- or HMC-type cube with a custom Radiation-Hardened-By-Design (RHBD) controller. That would give the advantage of utilizing state-of-the-art high-bandwidth, low-power memory stacks, as well as not needing to fabricate a full stack. Unfortunately, the availability of the memory stacks alone, without the logic tier, is unclear, and the proprietary nature of these tiers is a showstopper for agencies like NASA. Moreover, while building a module with both the memory and the controller RHBD would theoretically provide the highest robustness and performance, designing both a radiation-hardened memory chip and a radiation-hardened controller chip is fairly complex and costly. Also, since the overall design would rely on a specific memory, it would limit the ability of the module to interchange memory.

The COTS Solution

These issues rationalize the path taken here: a 3D-M3 of COTS memory with a custom RHBD controller. COTS devices have received a lot of scrutiny and interest in the space community. Extensive testing of COTS SDRAMs has been undertaken over recent years to understand their performance in a space environment. NASA has successfully used COTS SDRAMs in spacecraft for critical applications, such as the Compute Element within the Mars Science Laboratory Curiosity rover.

As presented in [1], with proper component screening and radiation testing, the use of COTS devices within our memory structure provides a low-cost and effective solution to capture the benefits offered by state-of-the-art commercial devices.

Figure 1. Our 3D-Memory package (DDR interface, PCB, package, TSVs, controller, and spare/data/ECC dies)

Leveraging COTS memory devices stacked in a "Loaf-of-Bread" (LOB) fashion theoretically achieves performance comparable to the HMC, with in addition increased density, the possible intermixing of different memory types (MRAM, NAND Flash, etc.) in a single cube, increased fault tolerance through features such as individual die access, and the avoidance of TSVs for interconnection between dies.

Our Stacking Method

In order to stack the COTS memory devices, factors such as die orientation, number of dies within the cube, die density, and IO pad-out to the controller chip were analyzed. Our final stack configuration contains fourteen DDR dies stacked in a LOB fashion, as shown in the system-level description in Figure 1. Eight DDR x16 devices make up the data word, five extra devices store the ECC, and a spare die is included for die failure replacement. A DDR interface connects to the external host DDR interface. A single-cavity package surrounds the cube structure (3D-M3), which is flip-chip bumped to the logic controller tier. Interconnection from the active circuitry on top of the controller to the bottom IO pads is made through TSVs. These TSVs will be filled with copper to eliminate the issues associated with tungsten-filled vias: tungsten is a high-Z material producing radiation effects that can be detrimental to the operation of the device, as proton generation can affect the logic sections of the device [6]. A lid will be welded to the top of the package to completely enclose the module hermetically and for radiation shielding purposes. Despite subjecting the 3D-M3 to additional reflow processes during assembly as well as requiring TSVs, a single-cavity package has been chosen over a dual-cavity package because it allows the completed 3D memory module to be tested prior to assembly into the package. Moreover, the dual-cavity option requires an extremely high IO count to pass through the substrate, which poses technical risks.

In the LOB cube configuration detailed in Figure 2, it is more readily achievable for the controller to interact with each individual die within the cube, compared to a vertical stacking method like in traditional TSV-based stacks. A Re-Distribution Layer (RDL) brings the necessary interconnections from each die's IOs to the bottom, where an Under Bump Metal (UBM) is used to form the IOs of the cube. The LOB structure can be a tall cube depending upon the die X-Y footprint. The benefits of the LOB structure listed in [1], such as point-to-point logic-to-memory interconnects, low thermal impedance, and IO connectivity able to support individual die access for the recovery and rebuild process, are important and very desirable for mitigating radiation effects, which makes the LOB structure a clear choice for the 3D-M3 design.


Figure 2. Our proposed "Loaf-of-Bread" structure (DRAM dies, TSVs for the PDN, filler, RDL, UBM, and logic tier)

DDR3 Die Selection

To meet the present and future needs of space missions for high-speed and high-density memory, it was determined that DDR would be the best choice. A DDR4 device compared against a DDR3 device from the same manufacturer showed approximately a 45% increase in single-bit-upset cross-section and a 17% decrease in logic upsets [7]. This suggests that DDR4 devices are more susceptible to bit errors than their DDR3 counterparts but have a higher resilience to logic upsets, which could prove beneficial given the very high level of ECC of the module. However, we selected DDR3 technology, as it is still the most mature of the DDR series and has had a fair amount of review for radiation tolerance, whereas DDR4 has not received much attention in this regard to date.

The reviews from [8], [9], [10] showed that, in general, DDR3 devices, typically from Micron, Samsung, or Hynix, are strong candidates for implementation in space memory applications. No occurrences of destructive failures or single-event latchup were noted for any of the devices tested. A general observation is that DDR devices are more sensitive to single-event functional interrupts (SEFIs) than single-event upsets (SEUs). All devices tested showed relatively good resistance to total ionizing dose (TID) effects, with devices usable from ~100 krad (Si) up to ~400 krad. In Micron devices, idle current increases dramatically as the TID approaches the 100 krad range, whereas the Samsung and Hynix devices tend to show a more gradual increase in idle current with increased TID exposure. A noticeable trend in terms of tolerance to SEUs seemed to play out: during SEE proton testing, SEUs and column/row SEFIs showed an increase towards low linear energy transfer. Typically, Hynix devices showed better resistance to SEUs than Samsung and Micron devices. In some instances, the Samsung and Hynix devices showed orders-of-magnitude better cross-sections than the Micron devices.

3. CONTROLLER ARCHITECTURE

The top-level block diagram of our controller is shown in Figure 3. The architectural choices are based on the main requirements that the controller integrate smoothly with a host DDR interface so that the memory can be actively used by the processor — namely, serve its requests with very low latency, as well as control and perform the necessary maintenance (error-mitigation) operations on the memory stack of 14 independent dies.

The necessary features required of the controller and addressed in the RTL code are in particular:

• read/write capability to the stack,
• a wide physical interface (PHY) of 14 DDRs to the cube,
• a DDR interface to the host,
• EDAC,
• access to each individual die within the cube for individual power management, ON/OFF control, or rebuilding,
• housekeeping, and
• SEFI error handling.

The specifications of the DDR interface between the host and the controller have important repercussions on the architecture and latency of the memory controller.

Host Interface Considerations

Latency Concerns—The overall latency of the core processing is critical for the RTL design. Indeed, typical DDR3 devices have CAS latencies ranging from 7 to 16 clock cycles depending on the data rate. When the host memory controller issues a read to our memory module, the command is first issued to the cube, and the data is then brought back from the cube to the host. Therefore, the latency of the path — host DDR PHY, to MUX, to stack PHY, to the CAS latency of the cube, to EDAC, and back to the host — has to be reasonable (not much more than twice the CAS latency) to be manageable by the host's integrated memory controller, and also for pure performance concerns. Minimizing the latency of each traversed block is critical, as the overall latency increases quickly with each block met on the path. Fortunately, the memory bandwidth is not affected by the additional processing.
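To make this budget concrete, the back-of-the-envelope sketch below (Python) totals an illustrative read round trip; the per-block cycle counts are assumptions for the sake of the example, not measured values from the design.

```python
# Illustrative read round-trip budget; per-block cycle counts are assumed.
CLK_NS = 1.875             # DDR3-1066 memory clock period (533 MHz)
CAS_LATENCY = 7            # CL in clock cycles at this data rate

stages = {
    "host DDR PHY":  2,    # capture/launch at the host interface
    "MUX + control": 1,
    "stack PHY":     2,
    "cube CAS":      CAS_LATENCY,
    "EDAC decode":   2,    # XOR-tree syndrome computation and correction
    "return path":   2,
}
total = sum(stages.values())            # 16 cycles
print(f"round trip: {total * CLK_NS:.1f} ns, "
      f"target ~2x CL = {2 * CAS_LATENCY * CLK_NS:.1f} ns")
```

Even with these generous per-block allowances, the round trip stays close to the twice-CAS-latency target, which is why each additional block on the path must be kept shallow.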

Flexibility Concerns—A programmable bus-width interface on the host side allows various data widths (baseline of x128) to be achieved depending on the host memory controller specifications. Note that due to the buffering of the internal DDR3 dies, the 3D-M3 presents a reduced load to the host processor, which in turn allows higher densities to be achieved than might otherwise be possible due to loading and PCB layout issues.

Integration Concerns and Idle Detector—As the controller is expected to be "transparent" to the host, namely to function without interacting with it and without provision for extensive sideband control, an idle detector is included to determine when internal operations can be interleaved with regular host operations, and to interject scrubbing and other management functions. A discrete approach using an "Idle" GPIO input signal is provided to allow the host system to notify the cube that the DDR3 bus will be idle; this is needed if a rebuild is required and the host desires to have it completed immediately. It does present additional requirements to the host processor memory controller, but it is designed as an optional feature. Likewise, an SPI port allows for additional housekeeping readout (query device operation, full error statistics, temperature, etc.), configuration of the cube, and the initiation of additional tests.

The idle detector controls the multiplexer MUX to switch the operation of our controller from Normal mode (i.e., a slave that serves the requests of the host in a passthrough-like fashion) to Maintenance mode (the controller takes over control to perform power-up, initialization, calibration, scrubbing, rebuild, etc.). Due to the fixed timing of a DDR3 interface, the idle detector must ensure that no host operations occur within the duration of the operation, i.e., in the future.


Figure 3. Controller architecture (host-side DDR PHY supporting x8/x16/x32/x64/x128, idle detector, maintenance MUX, memory controller with EDAC, ECC log and bank spiraling, 14 DDR PHYs to the stack, and SPI port; the host can initiate idle via pin or port, or through CE#, NOP, ZQ, and refresh commands)

This is a difficult task, as host operations occur asynchronously from the standpoint of the command execution. For this reason, the memory cube requires some timing adjustments on the part of the host memory controller to allocate a few hundred nanoseconds extra on operations such as refresh (by adjusting tRFC to a higher value), ZQ, CE enable, or NOP commands, to allow the memory cube to perform its internal management functions. For example, if the CE active timing is extended by 200ns, then our controller is guaranteed 200ns to clean up its execution on detecting CE going active — this allows the module to perform internal functions whenever CE is inactive.
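As a simple model of this guarantee, the sketch below (Python, with hypothetical names; the real logic is cycle-accurate RTL) shows the check the idle detector must perform before interjecting a maintenance operation into a host-granted window.

```python
def can_interject(op_cycles, window_ns, clk_mhz=300.0):
    """A maintenance operation may start only if it is guaranteed to finish
    inside the idle window (e.g. the extra 200 ns of CE-active padding)."""
    window_cycles = int(window_ns * clk_mhz / 1e3)
    return op_cycles <= window_cycles

# 200 ns of padding at a 300 MHz controller clock buys about 60 cycles
# to wind down internal operations once CE goes active.
print(can_interject(op_cycles=40, window_ns=200.0))   # True
print(can_interject(op_cycles=80, window_ns=200.0))   # False
```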

Cube Interface Considerations — PHY Layer

As the main interface between the logic tier and the memory cube, the PHY provides a high-speed electrical interface to the DRAM dies, integrating signal integrity features such as equalization, a PLL, and voltage- and temperature-compensated delays. The PHY also takes care of necessary features such as power-up/on-the-fly calibration, lane training, write leveling, etc.

Because we want each die to be independent (for example, for individual ON/OFF control, calibration, or rebuild), one PHY layer is implemented for each die. In order to account for the fly-by routing of traditional dual in-line memory modules (DIMMs), traditional DDR3 PHYs implement very advanced timing procedures, which add several clock cycles of latency on top of the CAS latency. As latency is a concern for our application, using them in their current form is not acceptable. Moreover, in the final application, the memory stack will be integrated on top of the ASIC controller, which eliminates any tight timing requirements thanks to the close proximity of connections.

Because our code will first be tested and validated on an FPGA, where complex training procedures are used to optimize the sampling delays and compensate for timing errors introduced by the FPGA and physical interconnections, we relaxed the baseline performance requirements to allow working at a reduced frequency and therefore use a simplified, lower-latency PHY. As a back-end for coding the custom PHY, some parts of the Xilinx MIG PHY for Virtex-5 are utilized [11], in order to leverage an existing but simple FPGA PHY infrastructure that can be easily customized to our needs.

Error Mitigation Features

A goal of the 3D memory module is to provide robust protection against all forms of radiation-induced failures of COTS DDR devices. Our RHBD controller provides means of mitigation for each of the error modes, as shown in Table 1. In the following subsections, we describe the details of some of the mitigation and detection techniques.

EDAC Implementation—We implement a byte-wide Single Error Correction - Double Error Detection (SEC-DED) code across 8 data devices (or, similarly, a DEC-TED across 2 rows, i.e., 16 bits of data/10 bits of ECC). This requires five additional DDR devices, as shown in Figure 4. This method incurs a 20% power penalty compared to a more powerful shortened Reed-Solomon RS(22,16) code (which can correct up to 3 symbols with 2 additional devices of parity), but provides a much lower latency, as the SEC-DED calculation is a simple XOR tree. In the case of SEC-DED, 16 encoders/16 decoders are needed to fully encode/decode the 128-bit data word. The data and ECC are interleaved within the 128-bit data word as shown in Figure 4, so that retrieving each 8-bit mask allows detection of the faulty locations in each die. By re-encoding the decoded word, we provide flags indicating whether a correctable error was detected or whether an error was unable to be corrected. This encoding scheme allows the controller to correct one error across each row, so that it can potentially correct a full die — including nibble errors on one die — if all the other dies are without errors.

HSIAO codes are chosen to define the systematic linear block codes for SEC-DED. Their fixed code word parity enables the construction of low-density parity-check matrices and fast hardware implementations [12]. They present a high die cost but provide very high performance (encoding and decoding at clock speed is possible), which is important to reduce the overall latency of the controller.
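The syndrome logic is compact enough to model in a few lines. The sketch below uses an illustrative set of distinct odd-weight (Hsiao-style) columns for a (13,8) SEC-DED code; the actual parity-check matrix of the ASIC is a design choice not reproduced here.

```python
# (13,8) Hsiao-style SEC-DED sketch. Columns are 5-bit masks over the
# parity rows: odd-weight and pairwise distinct, as Hsiao codes require.
DATA_COLS  = [0b00111, 0b01011, 0b01101, 0b01110,
              0b10011, 0b10101, 0b10110, 0b11001]   # weight-3 data columns
CHECK_COLS = [1 << i for i in range(5)]             # weight-1 check columns

def encode(data):
    """data: list of 8 bits -> list of 5 check bits (XOR tree per row)."""
    syn = 0
    for bit, col in zip(data, DATA_COLS):
        if bit:
            syn ^= col
    return [(syn >> i) & 1 for i in range(5)]

def decode(data, check):
    """Return (data, status), status in {'ok', 'corrected', 'detected'}."""
    syn = 0
    for bit, col in zip(data + check, DATA_COLS + CHECK_COLS):
        if bit:
            syn ^= col
    if syn == 0:
        return data, "ok"
    if syn in DATA_COLS:                        # single data-bit error
        fixed = list(data)
        fixed[DATA_COLS.index(syn)] ^= 1
        return fixed, "corrected"
    if syn in CHECK_COLS:                       # single check-bit error
        return data, "corrected"
    return data, "detected"                     # even-weight syndrome

word = [1, 0, 1, 1, 0, 0, 1, 0]
chk = encode(word)
hit = list(word); hit[3] ^= 1                   # inject a single-bit upset
print(decode(hit, chk))                         # recovers word, 'corrected'
```

Double errors XOR two odd-weight columns into an even-weight, nonzero syndrome, which matches no column and is therefore flagged as detected but uncorrectable.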

SEUs can be easily mitigated by our SEC-DED scheme. On the other hand, detection of SEFIs is challenging. Indeed, SEFI failures can affect the entire device and prevent proper readout of data, ranging from temporary burst errors, to faulty rows or columns, to the corruption of an entire die, which is equivalent to 16 bit errors in our case. While the SEC-DED provides good protection against SEFIs that cause device failures, in the sense that it can provide full recovery of a failed device, this approach is vulnerable to SEUs while a device is failed and during device rebuild. To detect SEFIs, the controller contains status and control registers. The EDAC stores a copy of the original data so that the corrected data can be XOR'd with the original data to determine failing bits and increment the appropriate bit error count register.


Table 1. Failure mode mitigation

Failure Mode        | Mitigation              | Detection
SEU in memory array | EDAC                    | EDAC algorithm
SEU buildup         | Scrubbing               | Scrubbing over entire memory array
Leakage             | Reduce refresh rate     | Ground-based testing of estimated leakage, plus programmable refresh rate combined with temperature sensor
SEFI — data failure | Power cycle and rebuild | EDAC errors exceeding programmable threshold level in same die
SEFI — high current | Power cycle and rebuild | Current draw anomaly at die
Device failure      | Activate cold spare     | Device fails BIST after power cycle

Figure 4. DDR stack device layout with SEC-DED code (8b data/5b ECC) or DEC-TED (16b data/10b ECC): fourteen x16 DDR ICs — data dies d1–d8, ECC dies ecc1–ecc5, and one spare — with the data word and parity striped across the stack

By tracking the level and frequency of high error symbols, we can determine whether it is likely that a device has been affected by a SEFI. For the case of SEFI failure, the assumption is made that only one device fails at a time; multiple simultaneous device failures will cause stack malfunction. For this reason, it is critical that the rebuild time be much shorter than the mean time between SEFIs.

Scrubbing—As typical error rates are on the order of 10^-10 per bit-day [10], or roughly 1 per device-day, SEU buildup is not hard to prevent by regular scrubbing. Since the probability of multiple bit errors within the same ECC code word grows rapidly as SEUs accumulate, a non-intrusive, low-rate background scrub (it is typical to perform scrubbing after each refresh) keeps the multiple-bit error probability orders of magnitude below the raw error rate. To simplify the scrubbing process, rather than maintaining a clean/dirty bit for each ECC code word, the entire DDR device is zeroized on power-up by a Built-In Self-Test (BIST). This includes writing the proper ECC data such that the scrub algorithm can proceed and operate properly over uninitialized and unused memory.
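The arithmetic behind this sizing can be sketched as follows; the 8 Gb die density and the per-bit rate come from the text above, while the independence assumption and the word count are simplifications.

```python
from math import comb

BITS_PER_DIE = 8 * 2**30      # 8 Gb DDR3 die
SEU_RATE = 1e-10              # upsets per bit-day, order of magnitude [10]

print(BITS_PER_DIE * SEU_RATE)        # ~0.86 upsets per device-day

def p_double(scrub_days):
    """Probability one 13-bit SEC-DED word (one bit per die) collects
    2+ upsets between scrubs, keeping the dominant 2-error term."""
    p = SEU_RATE * scrub_days          # per-bit upset probability
    return comb(13, 2) * p * p

WORDS = 8 * 2**30                      # one code word per data byte (8 GB)
print(WORDS * p_double(1.0))           # ~7e-9 double-error words per day
```

Even a once-per-day scrub would leave the expected number of uncorrectable words around 10^-8 per day under these assumptions; scrubbing after each refresh gives a far larger margin.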

Another failure mode due to radiation exposure seen in DDR devices is stuck bits. A stuck bit alone does not pose a problem, as the ECC will correct it and remain tolerant to another SEU. Radiation testing has shown that stuck bits are not always persistent and can be removed with a device reset. To perform validation during a scrub, if a bit error is encountered, the scrubber will store the bit error location, repair, write back, and re-read. If the data is still in error on re-read, the scrubber will repeat the process. The cycle repeats until a maximum repeat count is hit. The system can trigger a device rebuild if it determines that excessive stuck bits are present.
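A minimal model of this repair-verify loop is sketched below; mem and ecc are hypothetical software stand-ins for what is, in the module, a hardware state machine.

```python
def scrub_word(mem, ecc, addr, max_retries=3):
    """Repair-verify loop for one code word; returns how it resolved."""
    for _ in range(max_retries):
        data, status = ecc.read_and_correct(mem, addr)
        if status == "ok":
            return "clean"
        mem.write(addr, ecc.encode(data))       # write back corrected word
        _, status = ecc.read_and_correct(mem, addr)
        if status == "ok":
            return "repaired"                   # transient or resettable bit
    return "stuck"   # persists after retries: count toward rebuild trigger
```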

Current Monitoring—As part of the power conditioning circuitry, we implement a current monitor that tracks the relative current draw of each device within the stack. As all the stacked devices are identical, only a relative current profile needs to be monitored so that current draws can be compared; the absolute value is not critical, which relaxes the design constraints of the current monitor. The current monitor uses a sufficient RC time constant to average the current draw over each measurement interval.
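Because only relative draw matters, a smoothed comparison against the rest of the stack suffices; the sketch below uses an exponential moving average in place of the analog RC filter, with an illustrative flagging threshold.

```python
ALPHA = 0.05    # smoothing factor, the digital analogue of the RC constant

def update_avg(avg, sample):
    """Exponential moving average of one die's current samples."""
    return (1.0 - ALPHA) * avg + ALPHA * sample

def anomalous_dies(avgs, factor=1.5):
    """Flag dies whose smoothed draw exceeds the stack median by `factor`;
    no absolute calibration is needed since all dies are identical."""
    median = sorted(avgs)[len(avgs) // 2]
    return [i for i, a in enumerate(avgs) if a > factor * median]

print(anomalous_dies([10.1, 9.8, 10.0, 25.3] + [10.0] * 10))   # -> [3]
```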

Built-In Self-Test—The BIST module is required to perform device initialization and zeroization with varying patterns to simplify the scrub process. To enhance its capabilities, the module also contains logic to read out and compare the data to provide full self-test capabilities. This module can also be triggered to run a full device memory scan as needed. Our BIST implements pattern options for fast in-system testing, such as address, checkerboard, or all-1's/all-0's. Each of these patterns has its strengths and weaknesses, but they are typically sufficient for power-up type tests to validate the operation of the module.


For extended test capability, we also implemented a powerful March X algorithm [13] that can test for address decoding faults, stuck-at faults, transition faults, and some coupling faults; it can be triggered when necessary and when the cube does not have to be readily available, as this procedure has a high time complexity. The BIST is also designed to support pattern offsets for each die, so that parallel testing can be performed where each die gets an offset.
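For reference, March X is the element sequence up(w0); up(r0,w1); down(r1,w0); up(r0), which is easy to state in software; the sketch below runs it over a simple RAM model (the hardware BIST applies the same sequence per die, with per-die address offsets).

```python
def march_x(mem):
    """March X over a RAM model (list of 0/1 cells); returns failing
    addresses. Elements: up(w0); up(r0,w1); down(r1,w0); up(r0)."""
    n, fails = len(mem), []
    for a in range(n):                       # any order: write 0
        mem[a] = 0
    for a in range(n):                       # ascending: read 0, write 1
        if mem[a] != 0: fails.append(a)
        mem[a] = 1
    for a in reversed(range(n)):             # descending: read 1, write 0
        if mem[a] != 1: fails.append(a)
        mem[a] = 0
    for a in range(n):                       # any order: read 0
        if mem[a] != 0: fails.append(a)
    return sorted(set(fails))
```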

Rebuild—The rebuild process consists of a single state machine controller that can be muxed into any of the DDR device channels. The rebuild process requires a read/write cycle over the entire array and is estimated to require less than 5 seconds at maximum priority. The rebuild logic also contains various debug registers that can be used for ground-level test and characterization as well as in-flight diagnostics. To prevent SEU buildup, the logic tier performs a continuous scrub of the entire memory space at a programmable rate. Through proper EDAC implementation, active rebuilding of a DDR device can be performed without incurring any down time. The rebuild time will be much shorter than the mean time between SEFIs. The logic tier supports power cycling and/or resetting of individual dies that become affected by a SEFI failure. With a SEC-DED, the array loses correction capability during rebuild. In either case, it is critical that the rebuild occur as quickly as possible.
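A quick sanity check of the sub-5-second figure, assuming the breadboard's 300 MHz (600 MT/s) baseline data rate and ignoring refresh and command overhead (which makes this a lower bound):

```python
DIE_BITS = 8 * 2**30     # one 8 Gb DDR3 die
BUS_BITS = 16            # x16 device interface
MT_PER_S = 600e6         # 300 MHz DDR data rate (breadboard baseline)

beats = DIE_BITS / BUS_BITS          # data beats per full-array pass
seconds = 2 * beats / MT_PER_S       # one read pass plus one write pass
print(f"{seconds:.1f} s")            # ~1.8 s, consistent with < 5 s
```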

Bank Spiraling—As the stacked devices are in close proximity, it is possible for radiation to strike the stack such that all layers are affected within the same bank. To minimize potential failures in similar memory cell areas of the dies, the controller performs an address mixing such that data is stored in alternate banks among the dies. An additional "spiraling" at the system level, done by alternating data dies with ECC dies in the cube as shown in Figure 1, can be executed to further expand this concept.
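The per-die mixing can be as simple as rotating the bank bits by the die's position in the stack; a sketch (DDR3 has 8 banks; the exact permutation used by the controller is a design detail):

```python
NUM_BANKS = 8   # DDR3 bank count

def spiral_bank(logical_bank, die_index):
    """Rotate the bank address per die so a single particle track cannot
    hit the same physical bank in every layer of the stack."""
    return (logical_bank + die_index) % NUM_BANKS

# Logical bank 2 maps to a different physical bank on each of the 14 dies.
print([spiral_bank(2, d) for d in range(14)])
```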

Software Conditioning—Software conditioning, proposed in [9], can help address SEFIs without data loss. The degree to which software conditioning helps appears to be manufacturer-dependent, as Samsung devices did not appear to improve dramatically with software conditioning. However, [9] showed that most device SEFIs could be removed by applying the nondestructive C1 procedure. Note, however, that a full device power cycle may be required to clear certain SEFI failures [14]. The C1 procedure consists of three operations normally performed only during initialization:

• rewriting the load mode registers of the DRAM die,
• resetting the internal DLL of the DRAM die, and
• re-performing the ZQ calibration.

Our controller can perform this dynamic software conditioning procedure periodically when required, and independently between dies. To implement the procedure, the block in charge of initialization inside each PHY was modified so as to re-trigger the initialization steps while skipping the unwanted calibration after initialization has ended.
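In pseudocode form, the re-triggered sequence reduces to the three C1 operations (die is a hypothetical handle; the real implementation replays part of the PHY initialization state machine):

```python
def c1_conditioning(die):
    """Nondestructive C1 procedure from [9]: rerun three initialization
    steps without disturbing the array contents."""
    die.load_mode_registers()    # rewrite the load mode registers
    die.reset_dll()              # reset the internal DLL, wait for relock
    die.zq_calibration()         # re-perform the (long) ZQ calibration
```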

Scheduling-based Mitigation Features—An interesting observation from [9] is the recommendation to adopt a closed-page policy, because idle banks are more sensitive to SEEs. For this reason, whenever timing is not critical, our controller issues a Precharge All after every maintenance operation in order to close unused rows as soon as possible.

Verification

We carried out extensive Verilog simulations of the RTL code using the ISE Simulator from Xilinx to test and verify the functionality of the controller in handling different memory operations and its ability to perform the required maintenance operations. The test benches include mechanisms to properly test SEE- and SEFI-type failures. In our test benches, the logic controller connects to the 14 PHYs, themselves connected to their corresponding DRAM dies (the 8Gb x16 DDR3 Verilog model from Micron) through PCB traces simulated by wire delay models. Simulations (see Figure 5) show correct operation of the controller, including DRAM initialization and calibration, read/write operation with refreshes, bank spiraling, variable-rate refresh, self-refresh low-power mode, ZQ calibration, BIST, and software conditioning, as well as effective operation of the idle detector in switching from Normal to Maintenance mode. In order to specifically test the interaction with the host, we developed a host emulator including an integrated memory controller and PRBS-based address/data pattern generators.
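As one concrete piece of that emulator, a PRBS pattern source is a one-line linear-feedback shift register; the sketch below uses the common PRBS-31 polynomial x^31 + x^28 + 1 (the emulator's exact polynomial choice is an implementation detail):

```python
def prbs31(state):
    """One LFSR step of PRBS-31 (x^31 + x^28 + 1): (next_state, out_bit)."""
    bit = ((state >> 30) ^ (state >> 27)) & 1
    return ((state << 1) | bit) & 0x7FFFFFFF, bit

state, word = 0x1, 0
for i in range(32):                  # assemble one 32-bit pattern word
    state, b = prbs31(state)
    word |= b << i
print(hex(word))
```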

4. DESIGN FOR FLEXIBILITY

Our design methodology focused on allowing interchangeability of the memory device, to migrate to different densities or a different technology (e.g., DRAM to MRAM), as well as allowing interchangeability of the host processor. Throughout the development, efforts have been made to allow upgradability of the memory devices used in the 3D-M3 by maintaining a common footprint, such that the controller chip can be reused without major changes to its architecture. Physical limitations on die size exist, but proper re-routing of signals on the RDL can provide compatibility for a certain range of die sizes.

Controller Flexibility

The controller has been designed in an open, modular architecture so that it can support multiple operating modes. The host data width (originally x128) can be reduced to target more classical widths (x32 or x64), at the expense of under-utilizing the full capability of the cube. The controller can also be parameterized to target different commercial DRAM dies by simply changing various timing parameters (CAS latency, tREFI, etc.) and load mode register values. As an alternative to DRAM, DDR3 MRAM chips have shown extremely high TID tolerance [15]. Thus, an STT-MRAM die could also be switched to, with slight modifications to the controller such as changing timings, triggering anti-scribble during calibration, scramming (precharge all) before power-down, and disabling refreshes.

Die Flexibility and RDL Redesign

With the possibility of needing to redesign the RDL layer to accommodate a different DDR3 die, reviews indicated that changes to the RDL design will cascade to the cube bus IO design and finally to the cube interface board design. Readily modifying the RDL design from one die type to another can be relatively difficult due to the different die sizes, IO pad spacings and pitch, and IO pad sizes. This is especially true when only a two-metal RDL is used. Due to the need to fan out the RDL circuitry, dies adjacent to the modified die are sacrificed in order to achieve the necessary fanout. It is nevertheless advantageous to attempt to utilize different memory dies matched to a set IO pattern on the final RH controller, as this allows the use of a single RH controller design that can accommodate different memories for the 3D memory cube.


Figure 5. Simulation of the controller (initialization and calibration, switching of the MUX select, BIST after power-up for zeroization, and interleaved host operations)

Cost-wise, this makes sense, as the majority of the cost of the final 3D memory module is in the RH ASIC controller. In order to make this concept feasible, it is necessary to move to a four-metal-layer RDL to effectively route out the necessary interconnects from the die to match the IO pattern on the RH controller. This would allow for two circuit layers, a power layer, and a ground layer. It would also allow relaxed circuit spacing (i.e., wider traces and spaces), which could relax the necessary circuitry metal thicknesses and in turn reduce the necessary dielectric thicknesses between layers. A possible drawback of increasing the number of metal layers is that, because the RDL process is sequential, the chance of defects increases with increasing layer count. This can be minimized by relaxing some of the design parameters, which would help reduce the possibility of defects occurring during processing. Newer dielectric materials (e.g., photo-definable dielectrics) may also produce more repeatable results, which may help reduce defect density during processing.

5. FPGA TEST BOARD

To provide a breadboard validation of the design prior to fabrication of the final custom ASIC controller tier, we are developing an FPGA-based breadboard. The FPGA, emulating the function of the controller, will connect to an actual 3D-M3.

FPGA Implementation of the RTL Controller

Though the memory controller architecture easily fits within a commonly available FPGA, the initial design did exhibit some performance issues owing to the high IO requirement of nearly 1,000 pins. The high IO requirement is driven by the need for the controller to communicate with 14 memory dies as well as with the host memory controller. A Xilinx XC6VLX550T-1FF1760C FPGA with 1,200 IOBs has been chosen as the target device. The high IO utilization constrains placement and results in potential routing delays that reduce operating speed. Typical FPGA implementation requirements for pin placement, such as allocating only center banks for address and control pins, cannot be respected due to the IO limitations. This is, however, an "FPGA only" type issue, as the final ASIC design overcomes these limitations through custom IO placement and area-array TSVs. Therefore, the initial FPGA design focuses on supporting the minimum operating speeds of the DRAM dies for the first pass (300 MHz as the baseline goal, with the DLL on).

Manual placement of the IOB primitives (I/O-DDRs, IODELAYs, read data capture flip-flops) has been completed for the 14 PHYs to ensure proper timing closure. The locations of these flip-flops and the routes between the IDDR and fabric flip-flops must be carefully matched. The Xilinx ISE tool was used to map the controller architecture shown in Figure 3 onto the Virtex-6, with the exception of the DDR host PHY, which is currently in the development stage. We achieved timing closure at 300 MHz. Fabric utilization of the FPGA resources is shown in Table 2. Figure 6 shows the mapping of the controller onto the designated FPGA. The utilization is expected to change substantially for the ASIC implementation with the addition of logical radiation-hardening techniques to detect and protect against soft errors, such as distributed Triple Modular Redundancy, protection of redundant logic, and fault-tolerant Finite State Machine implementations.

FPGA Test Platform

The block diagram of the test board is shown in Figure 7. In addition to integrating the FPGA-emulated controller and 3D-M3, the test board includes the following features:

• a power supply current monitor for applicable rails,
• an external DDR3 interface for connection to a host processor,
• connector-less logic probe pads for at least one DDR of the 3D memory cube to enhance debug-ability,
• FPGA side-channel IO for real-time performance monitoring and logging (e.g., a USB 3.0 interface is used to perform memory dumps and to allow uploading particular memory patterns to simulate error conditions and faults),
• micro jumpers for at least one DDR device to allow "hard" error testing, and
• voltage switching for DDR layers.


Table 2. Design metrics of the FPGA logic controller

Resource        | Utilization | Utilization %
Slices          | 5,686       | 6%
Slice LUTs      | 10,674      | 3%
Slice Registers | 13,063      | 1%
BRAM36E1s       | 11          | 1%
Bonded IOBs     | 809         | 67%
BUFIODQS        | 28          | 19%
BUFGs           | 4           | 12%
IDELAYCTRLs     | 18          | 50%
IODELAYE1s      | 280         | 19%

Figure 6. Floorplan of the FPGA-mapped controller design: 14 PHYs, Control Logic, and EDAC


As the FPGA fabric itself does not support device power switching, external FETs will be utilized under the control of the FPGA. The FPGA will support analog-to-digital converters to allow power monitoring.

The test board requires schematic design, layout, and careful signal integrity analysis to ensure adequate operating margins. This task is actually more critical for the breadboard, as the memory cube and the FPGA emulator are physically much further apart. This requires careful impedance matching, length matching, and crosstalk evaluation. When the final design is migrated to an ASIC, the cube interconnect electrical concerns are virtually eliminated due to the extremely short interconnect (≤ 5 mm from the die pad).

Custom Substrate for Cube Mounting

The preliminary concept for integrating the stack onto the FPGA-based logic tier board is shown in Figure 8. Low-profile spring arrays (interposers) in the 4 quadrants allow a ceramic high-density build-up substrate to connect to the load board on the peripheries of the stack substrate. This provides thermal expansion stress relief and fan-out for the tight BGA pitch of the memory cube. Secure contact over the connector areas is made by pressure-distribution compression rings.

The most critical area of the substrate is the cube BGA interface. The BGA pad size is estimated at 150µm with 150µm clearance. The pads will be non-solder-mask defined to provide a well-controlled surface area for each ball. The stack uses a staggered BGA pattern, which dictates that the IOs route out from one side (see Figure 9) for all the layers, since the vertical direction is blocked by the BGA pattern itself. To maintain signal integrity (matched impedance and electrical length), all DDR signal lines from one device are routed on the same layer in a stripline configuration. Due to the staggered pattern, the BGA escape pattern requires the use of blind vias and 7 signal layers to allow each layer to route out unobstructed by adjacent layers.

Host Emulator

Given that no space processor is currently available to test and demonstrate the memory module, a host emulator has to be used in the final test setup in order to exercise the memory controller, perform pattern testing and error reporting, and handle user interface tasks. A few approaches to host emulation were considered. One concept is to use an off-the-shelf motherboard with DIMM slots, with a custom adapter allowing the FPGA emulator to connect to the DIMM slots via a cable. This has the benefit of providing a very flexible interface and a proof of concept of operability with a standard memory controller (as the motherboard accommodates an actual CPU). Limits, however, are the length of the additional cabling required as well as the minimum speed settings, which may interfere with timing control if the required settings are out of range (most motherboard designs are tailored for the high end, not the low end). A dual-channel configuration is required to support the full 128-bit width, and the open-source MemTest86 program provides the necessary software for performing memory testing on a motherboard platform. The other solution would be to use an FPGA-based host emulator. This option is the most flexible in terms of adapting to the required pinout and cabling requirements, but it has more limited test features.

Radiation Test Environment

In an effort to ultimately subject the 3D memory cube to radiation testing (both TID and SEE), we propose a dedicated test setup. For performance reasons, the interconnection between the 3D memory cube/interface substrate and the FPGA emulator requires the close proximity of the FPGA to the cube. However, this configuration does not lend itself to radiation testing, as the components on the FPGA board would not be shielded from the radiation source, and it would be too expensive to procure radiation-hardened components for the FPGA board. A low-speed approach for radiation testing using two boards is therefore proposed, as shown in Figure 10. The setup interconnects the two boards through a series of cables. It is speculated that approximately 1,500 IOs will need to be interconnected between the two boards.


Figure 7. Test board architecture (3D stack and FPGA with stack and DDR connectors to the host; clocks, JTAG, USB 3.0, current monitors, voltage profiling and switching, debug channel, and system power conditioning)

Figure 8. Cube substrate to FPGA board connection concept

For TID testing, the DUT has to be in front of the radiation source, so cable lengths in the 15' to 20' range are needed to adequately isolate the test electronics from the radiation source. For SEE testing, shorter cables can be used, as the radiation source is more pinpointed.

6. CONCLUSIONS

In this research, we developed and demonstrated a radiation-tolerant stacked memory array based on state-of-the-art chip stacking and radiation mitigation technologies. Our module can be directly connected to a host processor and act as a highly reliable DDR3 module thanks to its integrated RHBD controller. Using our LOB technology to stack COTS memory dies, we avoid the use of proprietary TSV dies while achieving high memory capacity and good radiation tolerance. The controller has been tested in hardware, and a complete test board structure is in development to subject the cube to radiation sources and assess the performance of the controller.

Figure 9. Cube substrate composite of signal layers

Our modular architecture allows flexibility in the choice of the memory die. Our cube development design methodology is intended to be a building block that provides a path to additional opportunities, such as the integration of non-volatile memory or computing resources.

We believe the technology and architecture developed under this program can find utility in purely commercial applications such as high-performance computing environments. The 3D-M3 technology may be applicable for direct attachment to graphics processing unit substrates to improve performance (i.e., with die-to-die interconnection, DDR interfaces would not be required to go off-package). These stacks use bare silicon dies that are rarely, if ever, dynamically burned in, resulting in infant mortality that can take out an entire memory stack. The proposed 3D architecture and controller design will provide fault tolerance to overcome this problem and enable much taller, and hence more capable, stacks.


Figure 10. FPGA radiation testing environment (the substrate/cube interface board with its connector compression ring sits inside the radiative chamber; 15' to 20' cables run through a cable interface board to the interposer and FPGA load board)

ACKNOWLEDGMENTS

This research is funded by the NASA SBIR Grant under contract number NNX17CP02C. A portion of the research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration.

REFERENCES

[1] Anthony Agnesina, Amanvir Sidana, James Yamaguchi, Christian Krutzik, John Carson, Jean Yang-Scharlotta, and Sung Kyu Lim. A Novel 3D DRAM Memory Cube Architecture for Space Applications. In Proceedings of the 55th Annual Design Automation Conference, DAC '18, pages 24:1–24:6, New York, NY, USA, 2018. ACM.

[2] D. U. Lee, K. W. Kim, K. W. Kim, H. Kim, J. Y. Kim, Y. J. Park, J. H. Kim, D. S. Kim, H. B. Park, J. W. Shin, J. H. Cho, K. H. Kwon, M. J. Kim, J. Lee, K. W. Park, B. Chung, and S. Hong. 25.2 A 1.2V 8Gb 8-channel 128GB/s high-bandwidth memory (HBM) stacked DRAM with effective microbump I/O test methods using 29nm process and TSV. In 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), pages 432–433, Feb 2014.

[3] J. T. Pawlowski. Hybrid Memory Cube (HMC). In 2011 IEEE Hot Chips 23 Symposium (HCS), pages 1–24, Aug 2011.

[4] Sung Kyu Lim. Bringing 3D ICs to Aerospace. JICCE, 12(2):117–122, June 2017.

[5] P. M. Gouker, B. Tyrrell, M. Renzi, C. Chen, P. Wyatt, J. R. Ahlbin, S. Weeden-Wright, N. M. Atkinson, N. J. Gaspard, B. L. Bhuva, L. W. Massengill, E. Zhang, R. Schrimpf, R. A. Weller, M. P. King, and M. J. Gadlage. SET Characterization in Logic Circuits Fabricated in a 3DIC Technology. IEEE Transactions on Nuclear Science, 58(6):2555–2562, Dec 2011.

[6] Yinghong Lin and David Joy. A new examination of secondary electron yield data. Surface and Interface Analysis, 37(11):895–900, 2005.

[7] M. Park et al. Soft error study on DDR4 SDRAMs using a 480 MeV proton beam. In 2017 IEEE International Reliability Physics Symposium (IRPS), pages SE–3.1–SE–3.6, April 2017.

[8] D. M. Hiemstra. Guide to the 2016 IEEE Radiation Effects Data Workshop Record. In 2017 IEEE Radiation Effects Data Workshop (REDW), pages 1–7, July 2017.

[9] Martin Herrmann, Kai Grurmann, Fritz Gliem, Hagen Schmidt, Gilbert Leibeling, Heikki Kettunen, and Veronique Ferlet-Cavrois. New SEE Test Results for 4 Gbit DDR3 SDRAM. 2012.

[10] M. Herrmann, K. Grurmann, F. Gliem, and H. Schmidt. Radiation Tests of Candidate Memory Devices DDR3 SDRAM and NAND Flash, 2013.

[11] Xilinx. UG086 Memory Interface Generator, User Guide v3.6, 2010.

[12] V. Gherman, S. Evain, N. Seymour, and Y. Bonhomme. Generalized parity-check matrices for SEC-DED codes with fixed parity. In 2011 IEEE 17th International On-Line Testing Symposium, pages 198–201, July 2011.

[13] T. Koshy and C. S. Arun. Diagnostic data detection of faults in RAM using different march algorithms with BIST scheme. In 2016 International Conference on Emerging Technological Trends (ICETT), pages 1–6, Oct 2016.

[14] K. Grurmann, F. Gliem, D. Walter, and M. Herrmann. Memory Technology Trends and Qualification Aspects, 2012.

[15] J. Heidecker. MRAM Technology Status, 2013.

BIOGRAPHY

Anthony Agnesina received the Diplome d'Ingenieur from CentraleSupelec, Gif-sur-Yvette, France, in 2016 and the M.S. degree in electrical and computer engineering from the Georgia Institute of Technology, Atlanta, GA, USA, in 2017, where he is currently pursuing the Ph.D. degree with the School of Electrical and Computer Engineering. His current research interests include 3D memory architectures, computer-aided design of VLSI circuits, and applied machine learning for electronics.


James Yamaguchi is VP of Irvine Sensors Corporation's 3D Electronics and Mass Storage group. He received his B.A. in Chemistry in 1977 and his B.S. in Chemical Engineering from California State University, Long Beach. He joined Irvine Sensors Corporation in 1993. Mr. Yamaguchi has over 39 years of hands-on processing experience in the areas of 3D packaging, thin film deposition, electroplating, high-density MCM fabrication, and PWB fabrication. He has extensive R&D experience in fabrication technologies, process design and integration, facility operations, and technology transfer. He is co-inventor on 14 patents and has co-authored 6 journal articles.

Christian Krutzik is a Senior Electrical Engineer at Irvine Sensors Corporation. He received his Master's Degree in Electrical and Computer Engineering from the University of California, Irvine. Mr. Krutzik has over 18 years of experience involving electrical design and evaluation of 3D stacked modules and system miniaturization. Projects have included SSD design and development, compact 1TB USB SSD drives, NAND Flash characterization for secure erase products, FPGA development for LiDAR applications, miniaturized and wearable biotelemetry sensors, and acoustic processing/telemetry over RF using hearing-aid technology. Mr. Krutzik is currently involved with the design and development of a secure SSD product line and has experience with developing and debugging SATA, PCIe, and other high-speed interfaces. Mr. Krutzik was also involved with a project to evaluate NAND flash devices for data remanence, which included low-level analysis of Flash devices and intricate knowledge of standard SSD processor capabilities. Further experience at Irvine Sensors includes stacked computer systems running Linux, various RF interfaces, high-density stacked memories, storage devices such as SSDs, FPGAs, embedded firmware, software, and DSP-based systems. Mr. Krutzik also has experience with multiple electrical simulation tools such as HyperLynx, HSPICE, and other FEA tools. Software experience includes C, Python, Matlab, Javascript, PHP, HTML, assembly, and LabVIEW on both Linux and Windows platforms.

John Carson is the President & CEO and a founder of Irvine Sensors Corporation. Upon graduation from MIT in 1961, Mr. Carson joined Baird Atomics, Inc. in Cambridge and Waltham, Massachusetts, where he became Project Engineer and Assistant Program Manager on a space-based infrared surveillance program. He left Baird Atomics in 1967 to form a consulting company for sensor systems design and development, with clients that included North American Aviation, Honeywell, Lockheed, Ford Aeronutronics, and Grumman. Over the years, this company evolved into Irvine Sensors Corporation. Mr. Carson has been responsible for the design and development of space-based military and NASA sensor systems, commercial spectrometers, commercial laser printers, 3-D electronics, and various neuromorphic sensor and electronics systems for the military. Mr. Carson is the holder of 23 patents. In addition to his Irvine Sensors duties, he has chaired the Industry Advisory Board for the Caltech Center for Neuromorphic Systems Engineering and has served on the Scientific Advisory Board for the Egg Factory, an incubation firm headquartered in Roanoke, Virginia.

Jean Yang-Scharlotta (M'07) received her B.S. degree in chemical engineering from The University of Texas at Austin in 1990, followed by M.S. and Ph.D. degrees in chemical engineering from Stanford University in 1996. She conducted engineering and research at the Los Alamos National Laboratory from 1987 to 1991 on radioisotope chemistry prior to graduate research on self-assembled monolayers at Stanford University. From 1996 to 2008, she was a device technology engineer and then department manager at Advanced Micro Devices, focusing on the introduction and scaling of new flash memory device technologies. She ushered in the first three generations of MirrorBit™ devices and technologies. She has been a senior microelectronics specialist in the components assurance office at JPL. Dr. Yang-Scharlotta holds more than 70 U.S. patents and is the author of many publications on memory devices and technologies. Her research interests include memory technologies, electronics and materials in radiation and extreme environments, physics of failure, interfacial phenomena, and technology development.

Sung Kyu Lim received the B.S., M.S., and Ph.D. degrees from UCLA in 1994, 1997, and 2000, respectively. He joined the School of Electrical and Computer Engineering, Georgia Institute of Technology, in 2001, where he is currently Dan Fielder Endowed Chair Professor. His current research interests include modeling, architecture, and electronic design automation (EDA) for 3D ICs. His research on 3D IC reliability was featured as a Research Highlight in the Communications of the ACM in 2014. Dr. Lim is a recipient of the National Science Foundation Faculty Early Career Development (CAREER) Award in 2006. He received Best Paper Awards from the IEEE Asian Test Symposium (2012) and the IEEE International Interconnect Technology Conference (2014). He has been an Associate Editor of the IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems since 2013.
