29
Photonic On-Chip Networks for Performance-Energy Optimized Off-Chip Memory Access GILBERT HENDRY Off Chip Memory Access GILBERT HENDRY JOHNNIE CHAN, DANIEL BRUNINA, LUCA CARLONI, KEREN BERGMAN Lightwave Research Laboratory Columbia University New York, NY

Photonic On-Chip Networks for Performance-Energy ......{2 DIMMs per DRAM AP yRouters: {1 kb input buffers (per VC) {4 virtual channels {256 b packet size {128bh l128 b channels y32

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Photonic On-Chip Networks for Performance-Energy ......{2 DIMMs per DRAM AP yRouters: {1 kb input buffers (per VC) {4 virtual channels {256 b packet size {128bh l128 b channels y32

Photonic On-Chip Networks for Performance-Energy Optimized

Off-Chip Memory Access

G I L B E R T H E N D R Y

Off Chip Memory Access

G I L B E R T H E N D R Y

J O H N N I E C H A N , D A N I E L B R U N I N A , L U C A C A R L O N I , K E R E N B E R G M A N

Lightwave Research LaboratoryColumbia University

New York, NY

Page 2: Photonic On-Chip Networks for Performance-Energy ......{2 DIMMs per DRAM AP yRouters: {1 kb input buffers (per VC) {4 virtual channels {256 b packet size {128bh l128 b channels y32

Motivation

The memory gap warrants a paradigm shift in how y g p p gwe move information to and from storage and computing elements

10/1/2009Lightwave Research Lab, Columbia University

[www.OpenSparc.net][Exascale Report, 2008]

Page 3: Photonic On-Chip Networks for Performance-Energy ......{2 DIMMs per DRAM AP yRouters: {1 kb input buffers (per VC) {4 virtual channels {256 b packet size {128bh l128 b channels y32

Main Premise

Current memory subsystem technology and y y gypackaging are not well-suited to future trends

Networks on chip

i h iGrowing cache sizes

Growing bandwidth requirements

Growing pin countsGrowing pin counts

10/1/2009Lightwave Research Lab, Columbia University

Page 4: Photonic On-Chip Networks for Performance-Energy ......{2 DIMMs per DRAM AP yRouters: {1 kb input buffers (per VC) {4 virtual channels {256 b packet size {128bh l128 b channels y32

SDRAM context

• DIMMs controlled fully in parallel, sharing access on data and address busses sharing access on data and address busses • Many wires/pins• Matched signal paths (for delay)• DIMMs made for short, random accessesDIMMs made for short, random accesses

DIMM

Chip Lately, this is on chip

[Intel]

Memory Controller

DIMM

DIMM

10/1/2009Lightwave Research Lab, Columbia University

Page 5: Photonic On-Chip Networks for Performance-Energy ......{2 DIMMs per DRAM AP yRouters: {1 kb input buffers (per VC) {4 virtual channels {256 b packet size {128bh l128 b channels y32

Future SDRAM context

Example: Tilera TILE 64p 4

10/1/2009Lightwave Research Lab, Columbia University

Page 6: Photonic On-Chip Networks for Performance-Energy ......{2 DIMMs per DRAM AP yRouters: {1 kb input buffers (per VC) {4 virtual channels {256 b packet size {128bh l128 b channels y32

SDRAM DIMM Anatomy

dataDRAM_Bank

IO w

oder

Col Decoder

DRAM cell arraysdata

Col addr/en

Row

Sense AmpsDRAM_Chip

IO

Ro

Dec

o

Banks (usually 8)

Row addr/en

CntrlAddr/cntrl

Ranks

DRAM_DIMM

Ranks

SDRAM device

10/1/2009Lightwave Research Lab, Columbia University

Page 7: Photonic On-Chip Networks for Performance-Energy ......{2 DIMMs per DRAM AP yRouters: {1 kb input buffers (per VC) {4 virtual channels {256 b packet size {128bh l128 b channels y32

Memory Access in an Electronic NoC

Chip BoundarymessagePacketized, size of

k t d t i d

NoC router

Chip Boundarymessage packet determined by router buffers

Memory Controller

Burst length dictated by packet size

10/1/2009Lightwave Research Lab, Columbia University

Page 8: Photonic On-Chip Networks for Performance-Energy ......{2 DIMMs per DRAM AP yRouters: {1 kb input buffers (per VC) {4 virtual channels {256 b packet size {128bh l128 b channels y32

Memory Control

Complex DRAM controlpScheduling accesses around:

Open/closed rows

PrechargingPrecharging

Refreshing

Data/Control bus usage

10/1/2009Lightwave Research Lab, Columbia University

[DRAMsim, UMD]

Page 9: Photonic On-Chip Networks for Performance-Energy ......{2 DIMMs per DRAM AP yRouters: {1 kb input buffers (per VC) {4 virtual channels {256 b packet size {128bh l128 b channels y32

Experimental Setup – Electronic NoC

2cm×2cm chipSystem: 5-port Electronic Router

2cm×2cm chip8×8 Electronic Mesh

28 DRAM Access points (MCs)2 DIMMs per DRAM AP

RRouters:1 kb input buffers (per VC)4 virtual channels256 b packet size128 b h l128 b channels

32 nm tech. point (ORION)Normal VtVdd = 1.0 VF 2 5 GHFreq = 2.5 GHz

Random core-DRAM access point pairsRandom read/write

Traffic:Modeled cycle-accurately with DRAMsim [Univ. MD]DDR3 (10 10 10) @ 1333 MT/

DRAM:

Uniform message sizesPoisson arrival at 1µs

10/1/2009Lightwave Research Lab, Columbia University

DDR3 (10-10-10) @ 1333 MT/s8 chips per DIMM, 8 banks per Chip, 2 ranks

Page 10: Photonic On-Chip Networks for Performance-Energy ......{2 DIMMs per DRAM AP yRouters: {1 kb input buffers (per VC) {4 virtual channels {256 b packet size {128bh l128 b channels y32

Experiment Results

300

269 Gb/s

200

250

300

h (G

b/s)

10

100

µs)

Avg Read Latency Zero Load Latency

100

150

M B

andw

idth

1

10

Late

ncy

1000 100000

50

DR

AM

Msg Size (b)1000 10000

0.1

Msg Size (b)g ( ) Msg Size (b)

10/1/2009Lightwave Research Lab, Columbia University

Page 11: Photonic On-Chip Networks for Performance-Energy ......{2 DIMMs per DRAM AP yRouters: {1 kb input buffers (per VC) {4 virtual channels {256 b packet size {128bh l128 b channels y32

Current

10/1/2009Lightwave Research Lab, Columbia University

Page 12: Photonic On-Chip Networks for Performance-Energy ......{2 DIMMs per DRAM AP yRouters: {1 kb input buffers (per VC) {4 virtual channels {256 b packet size {128bh l128 b channels y32

Goal: Optically Integrated Memory

Optical Fiber

Optical Transceiver

Vdd, Gnd

10/1/2009Lightwave Research Lab, Columbia University

Page 13: Photonic On-Chip Networks for Performance-Energy ......{2 DIMMs per DRAM AP yRouters: {1 kb input buffers (per VC) {4 virtual channels {256 b packet size {128bh l128 b channels y32

Advantages of Photonics

Decoupled energy-distance relationship

No long traces to drive and synch with clockDRAM chips can run faster

L Less power

Less pins on DIMM module and going into chipEventually required by packaging constraintsy q y p g g

Waveguides can achieve dramatically higher density due to WDM

DRAM can be arbitrarily distant – fiber is low loss

10/1/2009Lightwave Research Lab, Columbia University

Page 14: Photonic On-Chip Networks for Performance-Energy ......{2 DIMMs per DRAM AP yRouters: {1 kb input buffers (per VC) {4 virtual channels {256 b packet size {128bh l128 b channels y32

Hybrid Circuit-Switched Photonic Network

Broadband 1×2 SwitchBroadband 1×2 Switch [Cornell, 2008]

Broadband 2×2 SwitchBroadband 2×2 Switch

n

[Shacham, NOCS ’07]

nsm

issi

on

10/1/2009Lightwave Research Lab, Columbia Universityλ

Tra

n

Page 15: Photonic On-Chip Networks for Performance-Energy ......{2 DIMMs per DRAM AP yRouters: {1 kb input buffers (per VC) {4 virtual channels {256 b packet size {128bh l128 b channels y32

Hybrid Circuit-Switched Photonic Network

10/1/2009Lightwave Research Lab, Columbia University

Page 16: Photonic On-Chip Networks for Performance-Energy ......{2 DIMMs per DRAM AP yRouters: {1 kb input buffers (per VC) {4 virtual channels {256 b packet size {128bh l128 b channels y32

Hybrid Circuit-Switched Photonic Network16

10/1/2009International Symposium on Networks-on-Chip

Page 17: Photonic On-Chip Networks for Performance-Energy ......{2 DIMMs per DRAM AP yRouters: {1 kb input buffers (per VC) {4 virtual channels {256 b packet size {128bh l128 b channels y32

Hybrid Circuit-Switched Photonic Network17

10/1/2009International Symposium on Networks-on-Chip

[Bergman, HPEC ’07]

Page 18: Photonic On-Chip Networks for Performance-Energy ......{2 DIMMs per DRAM AP yRouters: {1 kb input buffers (per VC) {4 virtual channels {256 b packet size {128bh l128 b channels y32

Photonic DRAM Access

DIMM

Fiber / PCB waveguide

Memory

Photonic + electronic

DIMM

DIMM

DIMM

Memory gateway

To network

Processor l i

Procesorgateway Modulators

needed to send commands to

DRAM

Chi p boundaryProcessor

/ cacheelectronic

Photonic switch

Modulators

generates memory

cntrl

Network Interface

control commandsMemory Control

l

10/1/2009Lightwave Research Lab, Columbia University

To/From network

Mem cntrl

Page 19: Photonic On-Chip Networks for Performance-Energy ......{2 DIMMs per DRAM AP yRouters: {1 kb input buffers (per VC) {4 virtual channels {256 b packet size {128bh l128 b channels y32

Memory Transaction

DIMMMemory

To et o k

DIMM

DIMM

DIMM

Memory gateway

3

To network Procesorgateway

1

21

1) Read or write request is initiated from local or remote

Chi p boundary

Processor / cache

initiated from local or remote processor, travels on electronic network

2) Processor Gateway forwards it to Memory gateway

3) Memory gateway receives request

10/1/2009Lightwave Research Lab, Columbia University

Page 20: Photonic On-Chip Networks for Performance-Energy ......{2 DIMMs per DRAM AP yRouters: {1 kb input buffers (per VC) {4 virtual channels {256 b packet size {128bh l128 b channels y32

Memory READ Transaction

4) MC receives READ command5) Switch is setup from modulators to

DIMM, and from DIMM to network6) Path setup travels back to receiving

Processor. Path ACK returns when Processor. Path ACK returns when path is set up

7) Row/Col addresses sent to DIMM optically

8) R d d d i ll

8

8) Read data returned optically9) Path torn down, MC knows how long

it will take Photonic

switchModulators7

5

Control4

8

5

68

10/1/2009Lightwave Research Lab, Columbia University

Page 21: Photonic On-Chip Networks for Performance-Energy ......{2 DIMMs per DRAM AP yRouters: {1 kb input buffers (per VC) {4 virtual channels {256 b packet size {128bh l128 b channels y32

Memory WRITE Transaction

4) MC receives WRITE command, which is also a path setup from the processor to memory gateway

5) Switch is setup from modulators to DIMMDIMM

6) Row/Col addresses sent to DIMM7) Switch is setup from network to DIMM8) Path ACK sent back to Processor

) D i d i ll DIMM

9

9) Data transmitted optically to DIMM10) Path torn down from Processor after

data transmittedPhotonic

switchModulators

5

6

Control4

7

8

10/1/2009Lightwave Research Lab, Columbia University

Page 22: Photonic On-Chip Networks for Performance-Energy ......{2 DIMMs per DRAM AP yRouters: {1 kb input buffers (per VC) {4 virtual channels {256 b packet size {128bh l128 b channels y32

Optical Circuit Memory (OCM) Anatomy

DRAM_OpticalTransceiverDetector Bank

Packe t Formatλ

Modulator Bank

Addr/cntrl(25)

Data (64)

Row

a

Col a

Data Nλ

Ban

k ID

Mux

Cntrl

DLL Latches

Bu

rst len

gth

add

ress

add

ress

t

tRCD

λ

driversclk

tCL

Fiber Coupling

OR

Waveguide CouplingVDD, Gnd

10/1/2009Lightwave Research Lab, Columbia University

Page 23: Photonic On-Chip Networks for Performance-Energy ......{2 DIMMs per DRAM AP yRouters: {1 kb input buffers (per VC) {4 virtual channels {256 b packet size {128bh l128 b channels y32

Advantages of Photonics

Decoupled energy-distance relationship

No long traces to drive and synch with clockDRAM chips can run faster

L Less power

Less pins on DIMM module and going into chipEventually required by packaging constraintsy q y p g g

Waveguides can achieve dramatically higher density due to WDM

DRAM can be arbitrarily distant – fiber is low loss

Simplified memory control logic – no contending accesses, contention handled by path setup

Accesses are optimized for large streams of dataAccesses are optimized for large streams of data

10/1/2009Lightwave Research Lab, Columbia University

Page 24: Photonic On-Chip Networks for Performance-Energy ......{2 DIMMs per DRAM AP yRouters: {1 kb input buffers (per VC) {4 virtual channels {256 b packet size {128bh l128 b channels y32

Experimental Setup - Photonic

2cm×2cm chipSystem: Photonic Torus Tile

2cm×2cm chip8×8 Photonic Torus

28 DRAM Access points (MCs)2 DIMMs per DRAM AP

RRouters:256 b buffers32 b packet size32 b channels

32 h i (ORION)32 nm tech. point (ORION)High VtVdd = 0.8 VFreq = 1 GHz

Ph t i 13λPhotonics - 13λ

Random core-DRAM access point pairsRandom read/write

Traffic:Modeled with our event-driven DRAM model

DRAM:

Uniform message sizesPoisson arrival at 1µs

10/1/2009Lightwave Research Lab, Columbia University

DDR3 (10-10-10) @ 1600 MT/s8 chips per DIMM, 8 banks per Chip

Page 25: Photonic On-Chip Networks for Performance-Energy ......{2 DIMMs per DRAM AP yRouters: {1 kb input buffers (per VC) {4 virtual channels {256 b packet size {128bh l128 b channels y32

Performance Comparison

100s) Electronic MeshPh t i T

1

10

ad L

aten

cy (µ

Photonic Torus

600

700

800

b/s)

10 1000 10000

0.1

1

Avg

Rea

300

400

500

600

andw

idth

(Gb

Electronic Mesh Photonic Torus

1

aten

cy (µ

s)

Electronic Mesh Photonic Torus

1000 10000

100

200

DR

AM

Ba

0.1

Zero

Loa

d LaMsg Size (b)

10/1/2009Lightwave Research Lab, Columbia University

1000 10000

Msg Size (b)

Page 26: Photonic On-Chip Networks for Performance-Energy ......{2 DIMMs per DRAM AP yRouters: {1 kb input buffers (per VC) {4 virtual channels {256 b packet size {128bh l128 b channels y32

Experiment #2

Random Statically Mapped Address Space

10/1/2009Lightwave Research Lab, Columbia University

Page 27: Photonic On-Chip Networks for Performance-Energy ......{2 DIMMs per DRAM AP yRouters: {1 kb input buffers (per VC) {4 virtual channels {256 b packet size {128bh l128 b channels y32

Results

EM d

100

1000

cy (µ

s)

EM - random EM- mapped PT - random PT - mapped

800

1000

1200

h (G

b/s) EM - random

EM- mapped PT - random PT - mapped

1

10

Rea

d La

tenc

200

400

600

M B

andw

idth

1000 100000.01

0.1

Avg

RMsg Size (b)

1000 10000

0

200

DR

AM

Msg Size (b) Msg Size (b)Msg Size (b)

10/1/2009Lightwave Research Lab, Columbia University

Page 28: Photonic On-Chip Networks for Performance-Energy ......{2 DIMMs per DRAM AP yRouters: {1 kb input buffers (per VC) {4 virtual channels {256 b packet size {128bh l128 b channels y32

Network Energy Comparison

Electronic Mesh Photonic Torus

1%

3%

16%

Electronic Arbiter

Electronic Clock Tree

Electronic Crossbar

1%

7%

Electronic Arbiter

57%

9%

4%

6%

4%

3Electronic Inport

Electronic Wire

Detector

Modulator

PSE1x2

Electronic Clock Tree

Electronic Crossbar

Electronic IO Wire

Electronic Inport

Electronic Wire 9% PSE1x2

PSE2x2

Thermal Tuning

90%

Electronic Wire

Power = 0.42 W

Power = 13.3 W Total Power = 2.53 W(Including laser power)

10/1/2009Lightwave Research Lab, Columbia University

Page 29: Photonic On-Chip Networks for Performance-Energy ......{2 DIMMs per DRAM AP yRouters: {1 kb input buffers (per VC) {4 virtual channels {256 b packet size {128bh l128 b channels y32

Summary

Extending a photonic network to include access to g pDRAM looks good for many reasons:

Circuit-switching allows large burst lengths and simplified memory control for increased bandwidthmemory control, for increased bandwidth.

Energy efficient end-to-end transmission

Alleviates pin count constraints with high-density waveguidesp g y g

10/1/2009Lightwave Research Lab, Columbia University

PhotoMAN