Csaba Andras-Moritz ECE 668 3D IC Technology and Emerging 3D Processors

Slide 2: Outline
- Motivation
- TSV-based 3D IC and Monolithic 3D IC
- Skybridge Fabric
- NP-dynamic Skybridge Fabric
- Skybridge-3D-CMOS Fabric

Slide 3: Motivation
- Device scaling challenges
  - VDD and VTH not scaling linearly
  - Secondary effects
- System-level power / performance challenges
  - Interconnection bottleneck
  - Increasing RC
- Fabrication challenges
  - Lithography limitation
  - Doping control challenges
Figures: Performance trend [1]; Lithography challenge with scaling [1]
[1] J. Warnock, "Circuit Design Challenges at the 14nm Technology Node", DAC 2011

Slide 4: Monolithic 3D IC vs. TSV-based 3D IC
- TSV 3D IC: uses the normal die process but needs special packaging
- Monolithic 3D IC: uses smaller vias and applies a sequential process for each die
- TSV 3D IC has an easier process and higher throughput
- Monolithic 3D IC has better RC reduction
Figures: Implementation of TSV-based 3D ICs [2] (via dimension: 5-10 µm); Block-level Monolithic 3D [3] (via dimension in the nm range)
[2] J. H. Lau, et al., "TSV manufacturing yield and hidden costs for 3D IC integration", ECTC, 2010
[3] S. Panth, et al., "High-density integration of functional modules using monolithic 3D-IC technology", ASP-DAC, 2013

Slide 5: Gate-level vs. Transistor-level Monolithic 3D IC
- Transistor-level Monolithic 3D: fine-grained 3D IC with intra-cell benefits
- Simplified process for each die due to a single type of MOS
- Lower cost per layer due to fewer mask layers
- Uses existing commercial CAD tools for placement and routing
Figures: Gate-level Monolithic 3D [4]; Transistor-level Monolithic 3D
[4] S. Panth, et al., "Design challenges and solutions for ultra-high-density monolithic 3D ICs", pp. 1-2, S3S 2014

Slide 6: Overview of Transistor-Level Monolithic 3D IC
- Inter-layer dielectric to avoid coupling between tiers
- Monolithic inter-layer via to connect the pull-up and pull-down networks
- Cell-to-cell routing uses metal layers in the top tier

Slide 7: Monolithic 3D IC Bottom-up Sequential Process

Slide 8: Skybridge: 3D Integrated Framework
- True vertical integration
- Addresses 3D device, circuit, connectivity, heat management and manufacturing requirements
- Follows a template-based approach with uniform vertical nanowires
- Achieves tremendous benefits across all aspects
Figure: Abstract View of Envisioned Skybridge Fabric [5]
[5] M. Rahman, et al., "Fine-grained 3-D integrated circuit fabric using vertical nanowires", 3DIC 2015

Slide 9: Skybridge Fabric Components
- Fabric assembly by integration of core components
- Specially architected for 3D
Figure: Core Fabric Components

Slide 10: Vertical Gate-all-Around Junctionless Transistor
- Single type of uniform V-GAA junctionless transistor as the active device
- Simple device structure
- Uniform doping
- Device formation by material deposition
Figure: Junctionless Device Structure and TCAD Simulation Results

Slide 11: Skybridge 3D Circuit Style
- Dynamic circuit style amenable to physical implementation
- Uses only a single type of uniform n-type V-GAA junctionless transistor
- Supports compound and cascaded dynamic circuits with both dual-rail and single-rail inputs
Figure: Skybridge 3D Circuit Style. A) XOR Schematic; B) HSPICE Simulation; C) XOR Layout [6]
[6] M. Rahman, et al., "Skybridge: 3-D Integrated Circuit Technology Alternative to CMOS", 2014

Slide 12: Skybridge Circuit Styles, and High Fan-In Options
- Various circuit options for optimizations
- High fan-in support due to the dynamic circuit style
Figures: Compound vs Cascaded Circuits with Dual Rail and Single Rail; Fan-in Sensitivity Analysis

Slide 13: Full Adder Implementation in Skybridge
- Follows the Skybridge circuit style
- Utilizes fabric components
- 32 transistors for the full adder accommodated in just 4 logic nanowires
Figures: Full-Adder Layout; HSPICE Simulated Waveforms; Full Adder Schematic

Slide 14: Volatile Memory in Skybridge Fabric
- Volatile memory with a single type of transistor
- No sizing/doping requirements as in SRAM
- Two cross-coupled dynamic NAND gates for storage
- Addresses noise and leakage power concerns through circuit-level design
Figures: Skybridge RAM Schematic; Simulated HSPICE Waveforms

Slide 15: Noise Mitigation in Skybridge
- Intrinsic fabric features for noise mitigation
- Engineered GND shielding approach
- All signal routing through coaxial structures

Slide 16: Signal Pull-Up and Delay Mitigation
- Higher gate voltage in precharge transistors to boost current
- Long interconnect delay mitigation: logic replication, dynamic buffer insertion, CMOS-like inverters in long interconnect paths
Figure: Inverters in Long Interconnect Path [7]
[7] S. Khasanvis, et al., "Architecting Connectivity for Fine-grained 3-D Vertically Integrated Circuits", NANOARCH, 2015
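The Skybridge circuit style described above (dynamic, n-type only, dual-rail inputs, with XOR as the example circuit) can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the fabric's actual circuit: it assumes the output node is precharged high and that any fully conducting series stack of n-type transistors discharges it during evaluation, so stacks sharing a node behave as an AND of NANDs. The names `dynamic_compound_gate`, `dual_rail`, and `xor_gate` are hypothetical.

```python
from itertools import product


def dynamic_compound_gate(stacks, inputs):
    """Model one precharge/evaluate cycle of a dynamic gate (assumed behavior).

    stacks: list of tuples of input names; each tuple is one series n-type stack.
    inputs: dict mapping input name to bool.
    """
    node = True  # precharge phase: output node is pulled high
    for stack in stacks:  # evaluate phase
        if all(inputs[name] for name in stack):
            node = False  # a fully conducting stack discharges the node
    return node


def dual_rail(a, b):
    """Provide each input together with its complement (dual-rail inputs)."""
    return {"a": a, "a_n": not a, "b": b, "b_n": not b}


def xor_gate(a, b):
    """XOR as AND-of-NANDs: NAND(a, b) AND NAND(not a, not b)."""
    return dynamic_compound_gate([("a", "b"), ("a_n", "b_n")], dual_rail(a, b))


if __name__ == "__main__":
    for a, b in product([False, True], repeat=2):
        print(int(a), int(b), int(xor_gate(a, b)))
```

Because only n-type pull-down stacks are modeled, the complement of each input has to be supplied explicitly, which is why the dual-rail helper is needed for XOR.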
Slide 17: Arithmetic Circuit Design Examples in Skybridge
- Arithmetic circuit design examples with adders and a multiplier
- High fan-in circuit designs to evaluate scalability potential
Figures: Array Multiplier Design (Block Diagram); 8- and 16-bit CLA designs (Block Diagram)

Slide 18: Benchmarking Results (Skybridge vs. 2D-CMOS)
- Benchmarking with respect to equivalent CMOS designs at 16nm
- WISP-4: 30x density and 3.5x performance/watt benefits
- High bit-width arithmetic circuits: the 16-bit CLA design achieves 60.5x density and 16.5x performance/watt benefits
- Analytical interconnect modeling results: 10x less interconnect length and 100x lower repeater count
Table: Benchmarking of Arithmetic Circuits. Columns: Throughput (s^-1), Power (W), Area (µm²), each for CMOS and SB. Rows: 4-bit multiplier and three CLA designs. Surviving throughput entries (CMOS / SB): 4-bit multiplier 5.0e9 / 5.1e[...]; CLA designs 9.9e9 / 10.4e[...], 4.5e9 / 5.7e[...], 2.4e9 / 3.7e[...]. The remaining entries are not recoverable.

Slide 19: NP-Dynamic Skybridge and Skybridge-3D-CMOS Fabric
- Integrated fabric approaches extending Skybridge 3-D concepts to incorporate both n-type and p-type transistors
- NP-Dynamic-Skybridge (NP-D-SB): an integrated framework to achieve NP-dynamic circuits in vertical nanowires
- SkyBridge-3D-CMOS (S3DC): an integrated framework to achieve static circuits in vertical nanowires

Slide 20: Fabric Components
- Specifically designed fabric components for incorporating both p- and n-type transistors
- Vertical Si nanowire array with p- and n-doped regions as building blocks
- Device engineering for designing both p- and n-type transistors
- SB-ILC provides an Ohmic connection between doping regions
Figure: Fabric Components [8]
[8] J. Shi, et al., "Architecting NP-Dynamic Skybridge", NANOARCH, 2015

Slide 21: NP-D-SB NOR and NAND Gate
- Support for elementary logic gates: NAND and NOR with vertically stacked transistors in a single nanowire
- Compact implementation: a 5-input NOR needs 5 nanowires in SB but only one nanowire in NP-D-SB
Figures: NOR Gate; NAND Gate

Slide 22: Compound Gates in NP-D-SB
- Improved logic diversity and flexibility
- Skybridge is limited to AND-of-NANDs logic for compound gates
- NP-dynamic SB has both OR-of-NORs and AND-of-NANDs gate logic
- Diversity in logic expression helps to build compact circuits
Figures: OR-of-NORs Gate Logic; AND-of-NANDs Gate Logic

Slide 23: Cascaded Gates in NP-D-SB
- Uses a uniform set of {PRE, EVA} clocks to control circuits
- No monotonicity problem when cascading n-type and p-type gates
Figures: Cascaded Gates Schematic; Cascaded Gates Layout; HSPICE Simulation

Slide 24: Benchmarking Results (NP-D-SB vs. SB)
- Significant benefits for latency, power-latency product and density
- 3x latency benefit over the Skybridge single-rail implementation
- Over 2x density improvement over the Skybridge dual-rail implementation
- At least 17% throughput/power benefit
- Throughput is worse due to fewer pipeline stages
Figure: Benchmarking Evaluation Results

Slide 25: Cascaded Inverters in S3DC
- SB-CMOS follows the static CMOS circuit style
- Signal In: routed between stages with a routing nanowire
- Signals Int0, Int1 and Int2: routed between stages without routing nanowires
Figure: SB-CMOS Circuit Style

Slide 26: S3DC NAND Gate
- 3-input SB-3D-CMOS NAND: 3 nanowires for 3 parallel p-type transistors
- Multiple nanowires shorted together by SB-ILC and bridges
Figures: 3-in NAND physical layout; Layout legend; 3-in NAND schematic

Slide 27: S3DC Full Adder
- SB-CMOS full adder implemented with 11 nanowires: 28 transistors in 0.06 µm², 28x denser than 16nm CMOS technology
Figures: 1-bit SB-CMOS full adder design; 1-bit full adder transistor-level schematic; 1-bit SB-CMOS full adder physical layout; Layout legend

Slide 28: S3DC 6T SRAM
- Cross-coupled inverters for holding the value
- Pass transistors for write / read control
- Independent read / write access
- Transistor strength customized with various voltage levels
Figures: SRAM schematic and physical layout; SRAM operations (Write operation, Read operation); Layout legend

Slide 29: Evaluation Results (S3DC vs. SB and 2D-CMOS)
- Compared with 16nm CMOS:
  - Much better power and area efficiency
  - Worse performance
- Compared with SB:
  - Better latency but lower throughput
  - Better power efficiency and less power consumption
  - Good density
Table: Benchmarking Evaluation Results. Columns: Latency (ps), Throughput (Ops./sec.), Power (W), Performance/Watt (Ops./J), Area (µm²). Rows: SB-CMOS, 16nm CMOS, SB (dual-rail). The numeric entries are not recoverable.

Slide 30: Thermal Evaluation Methodology
- Modeling and simulation of the thermal profile in 3-D
- Fine-grained transistor-level modeling accounting for thermal conductivity at the nanoscale
- Thermal profiling of 3-D circuits with and without Skybridge heat extraction features for the worst-case static heat scenario
Figure: Thermal Evaluation Methodology [9]
[9] M. Rahman, et al., "Architecting 3-D Integrated Circuit Fabric with Intrinsic Thermal Management Features", NANOARCH, 2015

Slide 31: Thermal Modeling of V-GAA Junctionless Transistor
- Analytical calculation of thermal resistance for different FET regions
- Electrical equivalent representation for HSPICE simulations
Figures: Heat Flow Paths; Thermal Resistance Network for the Device; Simulation Results

Slide 32: Thermal Simulations for 3-D Circuits
- Up to 85% average temperature reduction with heat extraction

Slide 33: Circuit Evaluation Methodology, and Design Rules
- Evaluation methodology accounting for material structures, device physics, circuit style, and 3-D parasitics
- Design rules derived from circuit requirements and manufacturing assumptions at 16nm
Figures: Evaluation Methodology; Design Rules
Table: Design Rules. Columns, in order: Width X (nm), Length Z (nm), Thickness Y (nm), Spacing (nm). Values are listed per row in source order; "-" marks an empty cell.
  Bridge (X,Y,Z): 16n-58n, 16n, 16n-58n, 16n-37n
  Transistor Channel (X,Y,X): 16n, 58n
  Transistor Spacing (Z): -, -, -, 16n
  Gate Electrode (Z): 29n, 16n, 11.5n, -
  Contact (X,Y,Z): 26n, 16n, 39
  Heat Junction (X,Y,Z): 22n, 16n, 6n, -
  Coaxial (Si-M1) (X,Y): 37n, -, 4n (Si-M1)
  Coaxial (M1-M2) (X,Y): 58n, -, 4n (M1-M2)

Slide 34: 3D WISP-4 Microprocessor
- 4-bit fully functional microprocessor (WISP-4) design
- RISC architecture, 5 pipeline stages
- Implemented in Skybridge, NP-dynamic-Skybridge and S3DC
Figure: WISP-4 architecture

Slide 35: WISP-4: Instruction Fetch and Decode
- Instruction fetch stage
  - 4-bit CLA for the program counter
  - 4:16 decoder to decode the ROM address
  - 16x9 ROM to store instructions
- Instruction decode stage
  - 3:8 decoder to decode the opcode
  - 2-bit buffers for buffering address and data
Figures: Instruction Fetch; Instruction Decode

Slide 36: WISP-4: Register File
- Register access stage
  - Four 4-bit registers for operands
  - Two 4:1 multiplexers and one 2:1 multiplexer for operand selection
Figure: Register File

Slide 37: WISP-4: Arithmetic Logic Unit
- Execution stage
  - 4-bit CLA and multiplier for addition and multiplication
  - A buffer for data buffering
  - Two 2:1 multiplexers for result selection
Figure: Arithmetic Logic Unit

Slide 38: WISP-4 Benchmarking Results
- Compared with 2D CMOS:
  - 30x ~ 60x density benefits
  - Up to 8x power efficiency benefits
  - Up to 2x benefits in throughput
Table: WISP-4 Benchmarking Results ("[...]" marks digits lost in extraction)
  WISP-4      Throughput (ops/sec)   Power (µW)     Power Efficiency (ops/Joule)   Density (mm^-2)
  2D CMOS     4.31E[...]             [...]          [...]                          [...]E+3
  Skybridge   5.1E+9 (1.19x)         301 (0.34x)    1.69E+13 (3.46x)               1.05E+5 (30x)
  NP-D-SB     9.1E+9 (2.11x)         230 (0.26x)    3.96E+13 (8.15x)               1.96E+5 (56.6x)
  S3DC        4.55E+9 (1.06x)        186 (0.21x)    2.45E+13 (5.04x)               9.43E+4 (27.3x)
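
The power efficiency column of the WISP-4 benchmarking results can be cross-checked from the throughput and power columns, since ops/Joule equals ops/sec divided by watts. The sketch below uses only the Skybridge, NP-D-SB and S3DC values listed in the table; the 2D CMOS row is left out because its entries did not survive. The dictionary layout and the function names `power_efficiency` and `relative_error` are hypothetical helpers introduced for illustration.

```python
# Throughput (ops/sec), power (microwatts) and listed power efficiency (ops/Joule)
# copied from the WISP-4 benchmarking table.
WISP4_RESULTS = {
    "Skybridge": {"throughput": 5.1e9, "power_uw": 301.0, "listed_eff": 1.69e13},
    "NP-D-SB": {"throughput": 9.1e9, "power_uw": 230.0, "listed_eff": 3.96e13},
    "S3DC": {"throughput": 4.55e9, "power_uw": 186.0, "listed_eff": 2.45e13},
}


def power_efficiency(throughput_ops_per_s, power_uw):
    """Return operations per Joule: throughput divided by power in watts."""
    return throughput_ops_per_s / (power_uw * 1e-6)


def relative_error(computed, listed):
    """Relative deviation of a recomputed value from the listed one."""
    return abs(computed - listed) / listed


if __name__ == "__main__":
    for fabric, row in WISP4_RESULTS.items():
        eff = power_efficiency(row["throughput"], row["power_uw"])
        err = relative_error(eff, row["listed_eff"])
        print(f"{fabric}: {eff:.3e} ops/J (deviation from table: {err:.2%})")
```

For all three fabrics the recomputed values agree with the listed ones to within rounding, which suggests the table's power efficiency figures are derived directly from the other two columns.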