Reducing Power and Area by Interconnecting Memory Controllers to Memory Ranks with RF Coplanar Waveguides on the Same Package WEED 2011, ISCA Mario D. Marino, Kevin Skadron Dept. of Computer Science – UVA {mdm9u,skadron}@cs.virginia.edu


Reducing Power and Area by Interconnecting Memory Controllers to Memory Ranks with RF Coplanar Waveguides on the Same Package

WEED 2011, ISCA

Mario D. Marino, Kevin Skadron

Dept. of Computer Science – UVA

{mdm9u,skadron}@cs.virginia.edu


2

What is the problem?

Excessive power usage by the physical memory channel

– 2 mW/(Gbit/s) (Palmer et al., ISSCC’07)

– 160 W for 10 TB/s (Vantrease et al., ISCA’08)

– Poor scaling of the physical channel: RC loading in the package
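Read together, the first two bullets are consistent: 10 TB/s is 80 Tbit/s, and at 2 mW per Gbit/s (numerically 2 pJ/bit) that costs 160 W. The Python sketch below only restates this arithmetic. The helper name is ours, and it assumes decimal units (1 TB = 10^12 bytes) and that the full bandwidth is paid at the Palmer et al. rate.

```python
def channel_power_watts(bandwidth_bytes_per_s: float, mw_per_gbps: float) -> float:
    """Power of a memory channel: P = (energy per bit) x (bit rate).

    mw_per_gbps is the interface efficiency in mW per Gbit/s,
    which is numerically equal to pJ/bit.
    """
    bits_per_s = bandwidth_bytes_per_s * 8      # bytes/s -> bits/s
    joules_per_bit = mw_per_gbps * 1e-12        # 1 mW/(Gbit/s) = 1e-3 W / 1e9 bit/s = 1e-12 J/bit
    return bits_per_s * joules_per_bit


# 10 TB/s at 2 mW/(Gbit/s), the Palmer et al. figure
print(round(channel_power_watts(10e12, 2.0), 1))  # prints 160.0
```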


3

Outline

• Hypothesis: wired RF (i.e., coplanar waveguides, CPWs) solves all these problems with a technology that is easier to adopt than optical interconnect.

• Architecture for CPW memory interface

• Evaluation: area, power, and performance

• Conclusion

• PS: note that this is over wires (CPWs), not wireless!


4

Hypothesis: why wired RF as a bandwidth solution?

RF

Low-latency medium and modulation (Chang et al., “Near Speed-of-Light Signaling Over On-Chip Electrical”, 2003)

All-electrical (impedance matching), with development costs closer to those of CMOS

Distances from 1 mm to 30 cm (delay, energy, data rate; “RF for Future Chips”, Tam et al., 2011)

Beckmann et al., “Transmission Line Caches”, MICRO’03

Frank Chang et al. (caches, modulation, high bandwidth, latency and power reduction; MICRO’08, HPCA’08)

Quilt packaging (RF coplanar waveguide connecting two dies; > 200 GHz; low insertion loss; built): Liu, Buckhanan et al., Notre Dame

Intel-Tera (Polka, ITJ’07): on-package

Modulation and high speed from optical


5

Why can't we use RF in a traditional fashion?

• Each segment has a different impedance: I/O pad, inner and outer wire bonds, PCB pads, PCB [Liu, 2006]
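A rough way to see why these impedance changes matter: at each junction from impedance Z_from into Z_to, a fraction Γ² of the power is reflected, with Γ = (Z_to − Z_from)/(Z_to + Z_from). The Python sketch below multiplies the per-junction transmission (1 − Γ²) along a chain of segments. The impedance values are hypothetical placeholders, not measurements from Liu (2006), and the model ignores line losses and multiple reflections.

```python
import math
from typing import Sequence


def reflection_coefficient(z_from: float, z_to: float) -> float:
    """Voltage reflection coefficient at a junction between two real impedances."""
    return (z_to - z_from) / (z_to + z_from)


def transmitted_power_fraction(impedances: Sequence[float]) -> float:
    """Fraction of incident power that passes every junction in a chain.

    Simplified model: lossless segments, no multiple (re-)reflections.
    """
    fraction = 1.0
    for z_from, z_to in zip(impedances, impedances[1:]):
        gamma = reflection_coefficient(z_from, z_to)
        fraction *= 1.0 - gamma ** 2  # power not reflected at this junction
    return fraction


# Hypothetical impedances (ohms): on-die line, I/O pad, inner wire bond,
# outer wire bond, PCB pad, PCB trace.
path = [50.0, 35.0, 90.0, 110.0, 40.0, 50.0]
mismatch_loss_db = -10 * math.log10(transmitted_power_fraction(path))
print(f"mismatch loss along the path: {mismatch_loss_db:.2f} dB")
```

A matched chain (all segments at the same impedance) transmits everything, which is the situation the on-package CPW path aims for.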


6

Contributions

• Evaluate the power and area gains from replacing power-hungry MC circuitry with on-die RF transceivers + CPW + Quilt packaging

• Evaluate the architectural performance gains enabled by these power and area savings


7

Diagram of the proposed organization

• Example with 1 core and 1 RFMC

• RF path from a specific core to its rank

[Diagram: RF path from core to rank, > 1 mm]


8

Detailed Organization

• RFMC: MCs coupled to on-die RF transceivers and on- and inter-die coplanar waveguides (CPW)


9

Quilt

• The use of Quilt packaging (inter-die distance ~40 µm) allows:

– Extending on-die CPWs

– Built for RF with low insertion loss: 0.1 dB

– Use of a processor die and DRAM dies with RF transceivers (UCLA RF models)

– Versus traditional power-hungry transceivers (Palmer et al., ISSCC 2007)

– Co-planar, not flip-chip

– See Liu’s PhD dissertation and Buckhanan et al., UGIM’10


10

Interfacing on-die CPWs and Quilt


11

Quilt Packaging is a CPW

• Extends the interconnection between two dies facing each other

• Designed for frequencies above 200 GHz

• Prototype from Notre Dame tested up to 60 GHz

• Insertion loss (*): 0.1 dB

• So far, no transceivers are needed at the Quilt interface, thanks to its low insertion loss
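To put 0.1 dB in perspective, an insertion loss IL in dB corresponds to a delivered-power fraction of 10^(−IL/10). A minimal Python sketch of that conversion (the function name is illustrative):

```python
def db_to_power_fraction(loss_db: float) -> float:
    """Convert an insertion loss in dB to the fraction of power delivered."""
    return 10 ** (-loss_db / 10)


quilt_fraction = db_to_power_fraction(0.1)
print(f"{quilt_fraction:.3f} of the power delivered, {1 - quilt_fraction:.1%} lost")
# prints: 0.977 of the power delivered, 2.3% lost
```

For comparison, a 3 dB loss would deliver only about half the power, which is why a junction losing a few percent can be crossed without regenerating the signal.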


12

Transceivers: Power and Area

• Extracted from Chang and Tam, with a 10% reduction in amplifier power to account for the savings of Quilt-type packaging

Technology (nm) | Data rate per band (Gbit/s) | Carriers to match DRAM | Power, TX + RX (mW) | Energy per bit (pJ) | Area, TX + RX (mm²)

45 | 7 | 6 | 28.1 | 0.67 | 0.00690

32 | 8 | 5 | 24 | 0.6 | 0.00495

22 | 9 | 5 | 23.4 | 0.53 | 0.00439
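Assuming the energy column is total TX + RX power divided by the aggregate rate (data rate per band × number of carriers), the table is self-consistent: 28.1 mW over 42 Gbit/s is about 0.67 pJ/bit. The sketch below recomputes that column from the other three and agrees to within about 0.01 pJ/bit (it gives 0.52 rather than 0.53 at 22 nm, possibly rounding in the source figures). The class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransceiverPoint:
    tech_nm: int
    gbps_per_band: float  # data rate per carrier band (Gbit/s)
    carriers: int         # carriers needed to match DRAM bandwidth
    power_mw: float       # TX + RX power (mW)

    @property
    def aggregate_gbps(self) -> float:
        # Total link rate across all carriers
        return self.gbps_per_band * self.carriers

    @property
    def energy_pj_per_bit(self) -> float:
        # mW / (Gbit/s) = 1e-3 J/s / 1e9 bit/s = 1e-12 J/bit = pJ/bit
        return self.power_mw / self.aggregate_gbps


TABLE = [
    TransceiverPoint(45, 7, 6, 28.1),
    TransceiverPoint(32, 8, 5, 24.0),
    TransceiverPoint(22, 9, 5, 23.4),
]

for p in TABLE:
    print(f"{p.tech_nm} nm: {p.aggregate_gbps:.0f} Gbit/s, {p.energy_pj_per_bit:.2f} pJ/bit")
```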


13

Area Comparison

• MC area decreases for all components, but RF essentially eliminates the PHY

• 2.4X area savings

[Figure: area breakdown, MC vs. RFMC]


14

Energy Comparison: PHY

• Even with technology improvements, RF is more energy-efficient for distances ≥ 1 mm and < 10 mm

• Net power savings of 4.6X at 5 mm, including the front-end engine (FE) and transaction engine (TE)

15

Performance Evaluation

• M5 and DRAMsim

• 32 KB L1 caches, 1 MB L2 per core

• 8 cores

• 1 DRAM rank per MC, DDR2, at 2 GHz

• Same FE and TE for both MC and RFMC

• RF latency benefits are not modeled in the performance evaluation
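One way to keep the MC and RFMC runs comparable is to capture the listed parameters in a single record and vary only the memory interface. The Python sketch below is hypothetical: the field names are ours, the slide does not say which component runs at 2 GHz, and the RFMC count (one per core, following the per-core RF path in the organization diagram) is an assumption rather than a stated parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimConfig:
    """Hypothetical summary of the simulated system (illustrative field names)."""
    memory_interface: str            # "MC" (conventional PHY) or "RFMC" (RF transceivers + CPW + Quilt)
    num_memory_controllers: int
    cores: int = 8
    l1_kb: int = 32                  # per L1 cache
    l2_mb_per_core: int = 1
    dram: str = "DDR2"
    ranks_per_mc: int = 1
    clock_ghz: float = 2.0           # slide says "at 2 GHz" without naming the clock domain
    model_rf_latency: bool = False   # RF latency benefits are not credited
    # FE and TE are assumed identical for both interfaces, so they are not parameters here.

    @property
    def total_ranks(self) -> int:
        return self.num_memory_controllers * self.ranks_per_mc


baseline = SimConfig("MC", num_memory_controllers=4)  # current CPUs: 3 or 4 MCs
rf = SimConfig("RFMC", num_memory_controllers=8)      # assumption: one RFMC per core
print(baseline.total_ranks, rf.total_ranks)
```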


16

Performance: Stream

• Baseline (current CPUs): 3 or 4 MCs

• RFMC is up to 2.4x faster than MC


17

Conclusions

• RF architecture for on-package CPU-DRAM interconnection

• Evolutionary changes to CPU and DRAM design, keeping manufacturing straightforward

• Area and power benefits (preliminary; expected to improve with dedicated Quilt circuits)

• Performance benefits for more cores (limited by the number of ranks if the same core-to-rank ratio is to be kept)


18

Thanks!


19

Power Comparison

• FE and TE show power reductions

• PHY/RF part is evaluated in the next slide (McPAT does not model RF)