Photonic On-Chip Networks for Performance-Energy Optimized
Off-Chip Memory Access
Gilbert Hendry
Johnnie Chan, Daniel Brunina, Luca Carloni, Keren Bergman
Lightwave Research Laboratory, Columbia University
New York, NY
Motivation
The memory gap warrants a paradigm shift in how we move information to and from storage and computing elements
10/1/2009 Lightwave Research Lab, Columbia University
[www.OpenSparc.net] [Exascale Report, 2008]
Main Premise
Current memory subsystem technology and packaging are not well-suited to future trends
Networks on chip
Growing cache sizes
Growing bandwidth requirements
Growing pin counts
SDRAM context
• DIMMs controlled fully in parallel, sharing access on data and address busses
• Many wires/pins
• Matched signal paths (for delay)
• DIMMs made for short, random accesses
[Figure: Chip and Memory Controller ("Lately, this is on chip") connected to three DIMMs] [Intel]
Future SDRAM context
Example: Tilera TILE64
SDRAM DIMM Anatomy
[Figure: SDRAM device hierarchy]
DRAM_Bank: DRAM cell arrays, row decoder, col decoder, sense amps; inputs are row addr/en and col addr/en; output is data
DRAM_Chip: banks (usually 8), IO, cntrl; input is addr/cntrl
DRAM_DIMM: ranks of SDRAM devices
Memory Access in an Electronic NoC
[Figure: message passing from a NoC router across the chip boundary to the Memory Controller]
Packetized; size of packet determined by router buffers
Burst length dictated by packet size
Memory Control
Complex DRAM control
Scheduling accesses around:
Open/closed rows
Precharging
Refreshing
Data/Control bus usage
[DRAMsim, UMD]
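As a rough illustration of why the controller schedules around open and closed rows, the sketch below estimates the cycles before read data appears for each row-buffer state. It assumes the DDR3 (10-10-10) figures used elsewhere in this deck denote CL-tRCD-tRP in clock cycles. The function name and the three-state model are illustrative simplifications, not DRAMsim's interface.

```python
# Illustrative row-buffer model (not DRAMsim code).
# Assumption: DDR3 10-10-10 means CL-tRCD-tRP, each in clock cycles.
CL, T_RCD, T_RP = 10, 10, 10

def read_latency_cycles(open_row, target_row):
    """Cycles from command issue to first read data for one bank.

    open_row is the row currently held in the sense amps, or None if the
    bank is precharged (closed).
    """
    if open_row is None:
        return T_RCD + CL             # closed bank: activate, then column access
    if open_row == target_row:
        return CL                     # row hit: column access only
    return T_RP + T_RCD + CL          # row conflict: precharge, activate, column access

print(read_latency_cycles(5, 5))      # row hit
print(read_latency_cycles(None, 5))   # closed bank
print(read_latency_cycles(3, 5))      # row conflict
```

Refresh and bus contention are left out here; a real controller has to account for them as well.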
Experimental Setup – Electronic NoC
[Figure: 5-port Electronic Router]
System: 2cm×2cm chip; 8×8 Electronic Mesh; 28 DRAM Access points (MCs); 2 DIMMs per DRAM AP
Routers: 1 kb input buffers (per VC); 4 virtual channels; 256 b packet size; 128 b channels; 32 nm tech. point (ORION); Normal Vt; Vdd = 1.0 V; Freq = 2.5 GHz
Traffic: Random core-DRAM access point pairs; Random read/write; Uniform message sizes; Poisson arrival at 1µs
DRAM: Modeled cycle-accurately with DRAMsim [Univ. MD]; DDR3 (10-10-10) @ 1333 MT/s; 8 chips per DIMM, 8 banks per chip, 2 ranks
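For context on the DRAM parameters, the peak data rate of one DIMM can be estimated from its transfer rate and data-bus width. The sketch below is a back-of-the-envelope calculation, assuming a 64-bit data bus per DIMM (the width shown as Data (64) on the OCM anatomy slide). The function name is hypothetical.

```python
def peak_dimm_bandwidth_gbps(mega_transfers_per_s, bus_width_bits=64):
    """Peak data rate of one DIMM in Gb/s, ignoring all protocol overhead.

    bus_width_bits=64 is an assumption about the DIMM data bus.
    """
    return mega_transfers_per_s * 1e6 * bus_width_bits / 1e9

print(round(peak_dimm_bandwidth_gbps(1333), 1))  # DDR3 @ 1333 MT/s
print(round(peak_dimm_bandwidth_gbps(1600), 1))  # DDR3 @ 1600 MT/s
```

This is an upper bound per DIMM; row activations, precharges, and refreshes reduce what a controller can sustain.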
Experiment Results
[Figure: DRAM bandwidth (Gb/s) vs. message size (b), annotated 269 Gb/s; average read latency and zero-load latency (µs) vs. message size (b)]
Current
Goal: Optically Integrated Memory
[Figure labels: Optical Fiber; Optical Transceiver; Vdd, Gnd]
Advantages of Photonics
Decoupled energy-distance relationship
No long traces to drive and synch with clock
DRAM chips can run faster
Less power
Fewer pins on DIMM module and going into chip
Eventually required by packaging constraints
Waveguides can achieve dramatically higher density due to WDM
DRAM can be arbitrarily distant: fiber is low loss
Hybrid Circuit-Switched Photonic Network
Broadband 1×2 Switch [Cornell, 2008]
Broadband 2×2 Switch
[Shacham, NOCS ’07]
[Figure: switch transmission vs. wavelength (λ)]
10/1/2009 International Symposium on Networks-on-Chip
[Bergman, HPEC ’07]
Photonic DRAM Access
[Figure: processor / cache and processor gateway on the photonic + electronic network; memory gateway at the chip boundary, connected to four DIMMs by fiber / PCB waveguide]
Memory gateway components:
Photonic switch
Modulators: needed to send commands to DRAM
Memory Control: generates memory control commands
Network Interface: to/from network
Memory Transaction
[Figure: processor / cache, processor gateway, memory gateway, and DIMMs across the chip boundary, annotated with steps 1 to 3]
1) Read or write request is initiated from local or remote processor, travels on electronic network
2) Processor gateway forwards it to memory gateway
3) Memory gateway receives request
Memory READ Transaction
4) MC receives READ command
5) Switch is set up from modulators to DIMM, and from DIMM to network
6) Path setup travels back to receiving processor. Path ACK returns when path is set up
7) Row/Col addresses sent to DIMM optically
8) Read data returned optically
9) Path torn down, MC knows how long it will take
[Figure: memory gateway (Control, Modulators, Photonic switch) annotated with steps 4 to 8]
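Step 9 relies on DRAM timing being deterministic, so the memory controller can tell in advance when a read will finish. A minimal sketch of such an estimate follows, assuming DDR3 (10-10-10) @ 1600 MT/s as in the photonic experimental setup, a 64-bit data path, and a message that fits in a single open row. The function name and this simplified timing model are illustrative, not the deck's actual controller logic.

```python
import math

def read_hold_time_ns(msg_bits, mt_per_s=1600, t_rcd=10, t_cl=10, bus_bits=64):
    """Estimate how long one read occupies the DIMM, in nanoseconds.

    Hypothetical model: one row activation (tRCD), one column access (tCL),
    then the data streamed in a single uninterrupted burst. Row crossings,
    refresh, and path setup time are ignored.
    """
    t_ck_ns = 2000.0 / mt_per_s                  # DDR: two transfers per clock cycle
    transfers = math.ceil(msg_bits / bus_bits)   # bus-wide beats needed for the data
    return (t_rcd + t_cl) * t_ck_ns + transfers * (t_ck_ns / 2)

print(read_hold_time_ns(1024))    # small message
print(read_hold_time_ns(10000))   # larger message
```

Under this model the fixed activation and column-access cost is amortized as the message grows, which is the motivation for large burst lengths on a circuit-switched path.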
Memory WRITE Transaction
4) MC receives WRITE command, which is also a path setup from the processor to memory gateway
5) Switch is set up from modulators to DIMM
6) Row/Col addresses sent to DIMM
7) Switch is set up from network to DIMM
8) Path ACK sent back to processor
9) Data transmitted optically to DIMM
10) Path torn down from processor after data transmitted
[Figure: memory gateway (Control, Modulators, Photonic switch) annotated with steps 4 to 9]
Optical Circuit Memory (OCM) Anatomy
[Figure: DRAM_OpticalTransceiver with Detector Bank (Nλ), Modulator Bank (Nλ), Mux, Cntrl, DLL, Latches, drivers, clk; Addr/cntrl (25) and Data (64) to the DRAM; Fiber Coupling or Waveguide Coupling; VDD, Gnd]
Packet format (over λ): Bank ID, Burst length, Row address, Col address, Data; tRCD and tCL separate the address fields and the data
Advantages of Photonics (continued)
Simplified memory control logic: no contending accesses, contention handled by path setup
Accesses are optimized for large streams of data
Experimental Setup - Photonic
[Figure: Photonic Torus Tile]
System: 2cm×2cm chip; 8×8 Photonic Torus; 28 DRAM Access points (MCs); 2 DIMMs per DRAM AP
Routers: 256 b buffers; 32 b packet size; 32 b channels; 32 nm tech. point (ORION); High Vt; Vdd = 0.8 V; Freq = 1 GHz
Photonics: 13λ
Traffic: Random core-DRAM access point pairs; Random read/write; Uniform message sizes; Poisson arrival at 1µs
DRAM: Modeled with our event-driven DRAM model; DDR3 (10-10-10) @ 1600 MT/s; 8 chips per DIMM, 8 banks per chip
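The traffic description above can be sketched as a small generator. This is an illustrative reading only: it assumes "Poisson arrival at 1µs" means exponentially distributed inter-arrival times with a 1 µs mean, that the 8×8 network has 64 cores, and that each request picks its core and access point uniformly at random. The function name and record fields are hypothetical, not taken from the authors' simulator.

```python
import random

def generate_requests(n, mean_interarrival_us=1.0, cores=64, access_points=28, seed=0):
    """Generate n random memory requests with Poisson arrivals.

    All parameter defaults are assumptions drawn from the setup slide:
    64 cores (8x8), 28 DRAM access points, 1 us mean inter-arrival time.
    """
    rng = random.Random(seed)   # seeded so the sketch is repeatable
    t = 0.0
    requests = []
    for _ in range(n):
        # Exponential gaps between events give a Poisson arrival process.
        t += rng.expovariate(1.0 / mean_interarrival_us)
        requests.append({
            "time_us": t,
            "core": rng.randrange(cores),
            "access_point": rng.randrange(access_points),
            "op": rng.choice(["read", "write"]),
        })
    return requests

for req in generate_requests(3):
    print(req)
```

Message size is left out because the experiments sweep it as the independent variable, with every message in a run the same size.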
Performance Comparison
[Figure: Electronic Mesh vs. Photonic Torus as a function of message size (b): average read latency (µs), zero-load latency (µs), and DRAM bandwidth (Gb/s)]
Experiment #2
Random Statically Mapped Address Space
Results
[Figure: EM - random, EM - mapped, PT - random, and PT - mapped as a function of message size (b): average read latency (µs) and DRAM bandwidth (Gb/s)]
Network Energy Comparison
[Figure: pie charts of network energy by component for the Electronic Mesh and the Photonic Torus. Components shown: Electronic Arbiter, Electronic Clock Tree, Electronic Crossbar, Electronic IO Wire, Electronic Inport, Electronic Wire, Detector, Modulator, PSE1x2, PSE2x2, Thermal Tuning]
Power = 0.42 W
Power = 13.3 W
Total Power = 2.53 W (Including laser power)
Summary
Extending a photonic network to include access to DRAM looks good for many reasons:
Circuit-switching allows large burst lengths and simplified memory control, for increased bandwidth.
Energy efficient end-to-end transmission
Alleviates pin count constraints with high-density waveguides
PhotoMAN