Exploring Design Space for 3D Clustered Architectures

Manu Awasthi, Rajeev Balasubramonian
University of Utah


Page 2

3D Technologies

• Multiple layers of active devices
• Vertical interconnects between layers

[Figure: a 2D chip (one device layer on silicon) beside a 3D chip (Layer 1 and Layer 2, each a device layer on silicon, joined by a vertical interconnect). Figure labels: "Very small", "~10 µm". Courtesy: K. Bernstein, IBM]

Page 3

Benefits of 3D

• Reduction of global interconnect
• Delay/Power reduction
• Bandwidth
• Mixed-technology integration

Page 4

Previous Proposals

• Previously in 3D…
  – Break and stack (Folding) [Puttaswamy et al.]
    • Vertical stacking of active devices

[Figure: a RegFile block, broken and stacked. Labels: "Break and Stack", "All are active", "HEAT!!!", "Reduced intra-block latency"]

Page 5

An alternative approach?

[Figure: a 2D chip beside a 3D chip made of Die 1 stacked on Die 0]

Prudent stacking can:

• Improve performance
• Result in a better thermal profile

Page 6

Wire Delays and Performance

[Figure: Impact of wire delays. x-axis: extra delay (in clock cycles), 0 to 8; y-axis: percent slowdown, 0 to 50. One series per pipeline link: DCACHE-INTALU, IQ-INTALU, RENAME-IQ, L1D-L2, BPRED-ICACHE, ICACHE-DECODE, DECODE-RENAME, DCACHE-FPALU, FPALU-INTALU]
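The chart's y-axis metric can be sketched as follows. This is a minimal illustration that assumes "percent slowdown" means the relative IPC loss against a run with no extra wire delay; the slides do not define the metric, and the function name and IPC values below are hypothetical, not read from the chart.

```python
def percent_slowdown(baseline_ipc: float, delayed_ipc: float) -> float:
    """Relative IPC loss, in percent, of a run with extra wire delay.

    Assumption: slowdown is measured on IPC; the slides do not say so.
    """
    if baseline_ipc <= 0:
        raise ValueError("baseline_ipc must be positive")
    return (baseline_ipc - delayed_ipc) / baseline_ipc * 100.0


# Hypothetical IPC values for one pipeline link, keyed by extra delay in cycles.
# They are placeholders for illustration only.
baseline_ipc = 2.0
ipc_by_extra_delay = {0: 2.0, 2: 1.8, 4: 1.6}

slowdowns = {
    extra_cycles: percent_slowdown(baseline_ipc, ipc)
    for extra_cycles, ipc in ipc_by_extra_delay.items()
}

for extra_cycles, slowdown in slowdowns.items():
    print(f"{extra_cycles} extra cycles: {slowdown:.1f}% slowdown")
```

Repeating this sweep once per link (DCACHE-INTALU, IQ-INTALU, and so on) would give one curve per link, which is the shape of the chart above.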

Page 7

Clustered Architectures

• Centralized front-end
  – I-Cache & D-Cache
  – LSQ, Rename, Decode
  – Branch Predictor
• Clustered back-end
  – Issue Queue
  – Regfile, FUs

[Figure labels: L1 DCache, Cluster, Crossbar/Router, Front-End]

Higher clock frequency, high ILP!

Page 8

Decentralized Cache Banks

[Figure: three separate L1 DCache banks]

Possibly better performance

Page 9

Decentralized Cache Banks

Replicated Cache Banks

[Figure: three L1 DCache banks]
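A toy model may help make replication concrete. The sketch below assumes that each bank keeps a full copy of the data, so a load is served by the requesting cluster's own bank while a store has to update every copy. The slides do not describe the actual update mechanism, and the class and method names are hypothetical.

```python
class ReplicatedBanks:
    """Toy model of replicated L1 data cache banks (assumed behaviour)."""

    def __init__(self, num_banks: int) -> None:
        # One address -> value mapping per bank; every bank holds a full copy.
        self.banks = [dict() for _ in range(num_banks)]

    def load(self, local_bank: int, addr: int):
        # A load only touches the bank next to the requesting cluster.
        return self.banks[local_bank].get(addr)

    def store(self, addr: int, value: int) -> int:
        # A store is written to every copy; return how many banks were written.
        for bank in self.banks:
            bank[addr] = value
        return len(self.banks)


cache = ReplicatedBanks(num_banks=3)
banks_written = cache.store(0x40, 7)
print(banks_written, cache.load(2, 0x40))
```

The trade-off this exposes is that loads stay local at the cost of extra store traffic and reduced effective capacity.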

Page 10

Decentralized Cache Banks

Word Interleaved Cache Banks

[Figure: two L1 DCache banks, labeled "Odd Words" and "Even Words"]
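Word interleaving can be sketched as a simple address-to-bank mapping. The example assumes two banks, as in the odd/even figure, and an 8-byte word; the word size is not stated in the slides, and the function and constant names are hypothetical.

```python
WORD_BYTES = 8  # assumed word size; not given in the slides
NUM_BANKS = 2   # matches the figure: one bank for even words, one for odd words


def bank_for_address(addr: int, word_bytes: int = WORD_BYTES,
                     num_banks: int = NUM_BANKS) -> int:
    """Return the bank that holds the word containing byte address `addr`."""
    word_index = addr // word_bytes
    # Consecutive words rotate across banks: even words -> 0, odd words -> 1.
    return word_index % num_banks


for addr in (0, 8, 16, 24):
    print(hex(addr), "-> bank", bank_for_address(addr))
```

Unlike replication, each word lives in exactly one bank, so a store touches one bank but a load may have to travel to a bank that is not next to the requesting cluster.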

Page 11

Outline

• Introduction
  – Motivation
  – 3D Architectures
  – Clustered Architectures
• Proposals
• Results
• Conclusions

Page 12

Architecture 1: Cache-on-cluster

[Figure: Die 1 stacked on Die 0. Legend: cache bank, cluster, inter-die interconnect, intra-die interconnect]

Page 13

Architecture 2: Cluster-on-cluster

[Figure: Die 1 stacked on Die 0. Legend: cache bank, cluster, inter-die interconnect, intra-die interconnect]

Page 14

Architecture 3: Staggered

[Figure: Die 1 stacked on Die 0. Legend: cache bank, cluster, inter-die interconnect, intra-die interconnect]

Page 15

Outline

• Introduction
  – Motivation
  – 3D Architectures
  – Clustered Architectures
• Proposals
• Results
• Conclusions

Page 16

Experimental Setup

• Framework
  – SimpleScalar, Wattch and HotSpot 3.0
  – Wire model: 8X global metal plane
• Benchmarks
  – SPEC 2K, single-threaded
• Processor configuration
  – 8 clusters
  – 64 kB L1 I/D caches, 2-way set-associative
• L1 data cache: word-interleaved or replicated
• 2D centralized cache: base case

Page 17

Base Case Performances

[Figure: Performance improvement wrt 2D centralized cache. x-axis: cache bank type (Replicated, WI); y-axis: performance improvement, 0.0 to 9.0. Annotation: "Best Case 2D Config"]

Page 18

The 3D Effect

[Figure: Average performance improvement, 3D Replicated vs 2D Centralized. x-axis: Arch 1, Arch 2, Arch 3; y-axis: percentage improvement over 2D centralized, 0 to 16]

Page 19

The 3D Effect

[Figure: Average performance improvement, 3D WI vs 2D Centralized. x-axis: Arch 1, Arch 2, Arch 3; y-axis: percentage improvement over centralized, 0 to 25]

Page 20

Comparisons

[Figure, three panels, each with y-axis IPC improvement, 0 to 25:
 (1) 3D Replicated: average performance improvement wrt 2D centralized for Arch 1, Arch 2, Arch 3. Annotation: "Best Case 3D - Rep".
 (2) 3D WI: average performance improvement wrt 2D centralized for Arch 1, Arch 2, Arch 3. Annotation: "Best Case 3D - WI".
 (3) 2D case: base case performance comparisons for Replicated and WI. Annotation: "Best Case 2D".]

12% improvement for best case 3D vs best case 2D
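Because every bar on this slide is measured against the 2D centralized baseline, comparing a 3D design with the best 2D design means re-basing one improvement on the other. The sketch below shows one way to do that, assuming the improvements are IPC speedups over a common baseline. The slides do not say how the 12% figure was derived, and the input values are placeholders, not numbers read from the charts.

```python
def relative_improvement(gain_a_pct: float, gain_b_pct: float) -> float:
    """Improvement of design A over design B, in percent.

    Both inputs are percentage improvements over the same baseline
    (here, hypothetically, the 2D centralized cache).
    """
    speedup_a = 1.0 + gain_a_pct / 100.0
    speedup_b = 1.0 + gain_b_pct / 100.0
    return (speedup_a / speedup_b - 1.0) * 100.0


# Placeholder inputs for illustration only.
best_3d_gain_pct = 20.0
best_2d_gain_pct = 8.0

result = relative_improvement(best_3d_gain_pct, best_2d_gain_pct)
print(f"3D over 2D: {result:.2f}%")
```

Note that the re-based figure is smaller than the simple difference of the two percentages, which matters when both gains are large.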

Page 21

Thermal Analysis

• Wattch for power numbers
• HotSpot 3.0 for thermal model (grid)
  – 500x500 grid resolution
• Interconnect power modeling
  – Attributed to functional units
  – 8X plane wires
  – Router + crossbar modeled as a separate entity

Page 22

Thermal Profiles

[Figure: Peak temperature of the hottest on-chip unit (Celsius). x-axis: Base, Arch 1, Arch 2, Arch 3; y-axis: 0 to 120]

Page 23

Outline

• Introduction
  – Motivation
  – 3D Architectures
  – Clustered Architectures
• Proposals
• Results
• Conclusions

Page 24

Conclusions

• Wire delays are critical to performance
  – Some are more important than others
• Prudent block stacking
  – Performance improvement of up to 12% over 2D
    • WI banks + Arch 3 (3D)
  – Better thermal profiles compared to folding

Page 25

Backup Slides

Page 26

4 Cluster Arrangements

[Figure: three stacked arrangements of Die 1 over Die 0: (a) Arch-1 (cache-on-cluster), (b) Arch-2 (cluster-on-cluster), (c) Arch-3 (staggered). Legend: cluster, cache bank, intra-die horizontal wire, inter-die vertical wire]