55
ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

Embed Size (px)

Citation preview

Page 1: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon1

ESE534:Computer Organization

Day 19: April 7, 2014

Interconnect 5: Meshes

Page 2: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon2

Previously

• Saw – need to exploit locality/structure in

interconnect

– a mesh might be useful

– Rent’s Rule as a way to characterize structure

Page 3: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon3

Today

• Mesh:– Channel width bounds– Linear population– Switch requirements– Routability– Segmentation– Clusters

Page 4: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon4

Mesh

Street Analogy

Page 5: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon5

Manhattan

Page 6: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon6

Mesh

switchbox

Page 7: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

Mesh

• Strengths?

ESE534 -- Spring 2014 -- DeHon7

Page 8: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon8

Mesh Channels

• Lower Bound on w?

• Bisection Bandwidth– BW Np

– channels in bisection =N0.5

N

N

p

NN

Wp

5.0Channel width grows with N.

Page 9: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon9

Straight-forward Switching Requirements

• Total Switches?

• Switching Delay?

Page 10: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon10

Switch Delay

• Switching Delay:– Manhattan distance

• |Xi-Xj|+|Yi-Yj|

– 2 (Nsubarray)• worst case:

• Nsubarray = N

Page 11: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon11

Total Switches

• Switches per switchbox:– 4(3ww)/2 = 6w2

– Bidirectional switches • (NW same as WN) • double count

Page 12: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon12

Total Switches

• Switches per switchbox:– 6w2

• Switches into network: – (K+1) w

• Switches per PE: – 6w2 +(K+1) w– w = cNp-0.5

– Total w2N2p-1

• Total Switches: N×(Sw/PE) N2p

Page 13: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon13

Routability?

• Asking if you can route in a given channel width is:– NP-complete

• Contrast with Beneŝ, Beneš-crossover tree….

Page 14: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

Linear Population Switchbox

ESE534 -- Spring 2014 -- DeHon14

Page 15: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon15

Traditional Mesh Population:Linear

• Switchbox contains only a linear number of switches in channel width

Page 16: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon16

Linear Mesh Switchbox

• Each entering channel connect to:– One channel on each remaining side (3)– 4 sides– W wires– Bidirectional switches

• (NW same as WN) • double count

– 34W/2=6W switches• vs. 6w2 for full population

Page 17: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon17

Total Switches• Switches per switchbox:

– 6w

• Switches into network: – (K+1) w

• Switches per PE: – 6w +(K+1) w– w = cNp-0.5

– Total Np-0.5

• Total Switches: N×(Sw/PE) Np+0.5 > N

Page 18: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon18

Total Switches (linear population)

• Total Switches

Np+0.5

N < Np+0.5 < N2p

• Switches grow faster than nodes

• Wires grow faster than switches

Page 19: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon19

Checking Constants (Preclass 3)

When do linear population designs become wire dominated?

• Wire pitch = 4 F• switch area = 625 F2

• wire area: (4w)2

• switch area: 6625 w• Crossover?

Page 20: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon20

Checking Constants: Full Population

Does full population really use all the wire physical tracks?

• Wire pitch = 4F• switch area = 625 F2

• wire area: (4w)2

• switch area: 6625 w2

• effective wire pitch:60F

15 times pitch

Page 21: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon21

Practical

• Full population is always switch dominated– doesn’t really use all the potential physical

tracks– …even with only two metal layers

• Just showed: – would take 15 Mapping Ratio for linear

population to take same area as full population (once crossover to wire dominated)

• Can afford to not use some wires perfectly– to reduce switches (area)

Page 22: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon22

Diamond Switch

• Typical linear switchbox pattern:– Used by Xilinx

Page 23: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon23

Mapping Ratio?

• How bad is it?

• How much wider do channels have to be?

• Mapping Ratio:– detail channel width required / global ch width

Page 24: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon24

Mapping Ratio

• Empirical: – Seems plausibly, constant in practice

• Theory/provable:– There is no Constant Mapping Ratio

• At least detail/global

– can be arbitrarily large!

Page 25: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon25

Domain Structure

• Once enter network (choose color) can only switch within domain

Page 26: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon26

Detail Routing as Coloring

• Global Route channel width = 2• Detail Route channel width = N

– Can make arbitrarily large difference

Page 27: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon27

Routability

• Domain Routing is NP-Complete– can reduce coloring problem to domain

selection• i.e. map adjacent nodes to same channel• Previous example shows basic shape

– (another reason routers are slow)

Page 28: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon28

Routing

• Lack of detail/global mapping ratio– Says detail can be arbitrarily worse than global– Doesn’t necessarily say domain routing is bad

• Maybe can avoid this effect by changing global route path?

– Says global not necessarily predict detail– Argument against decomposing mesh routing

into global phase and detail phase• Modern FPGA routers do not• VLSI routers and earliest FPGA routers did

Page 29: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

Buffering and Segmentation

ESE534 -- Spring 2014 -- DeHon29

Page 30: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

Buffered Bidirectional Wires

Logic

C Block

S Block

30ESE534 -- Spring 2014 -- DeHon

Page 31: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon31

Segmentation• To improve speed

(decrease delay)

• Allow wires to bypass switchboxes

• Maybe save switches?

• Certainly cost more wire tracks

Page 32: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon32

Segmentation

• Segment of Length Lseg

– 6 switches per switchbox visited

– Only enters a switchbox every Lseg

– SW/sbox/track of length Lseg = 6/Lseg

Page 33: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon33

Segmentation• Reduces switches on path N/Lseg

• May get fragmentation• Another cause of unusable wires

Page 34: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon34

Segmentation: Corner Turn Option

• Can you corner turn in the middle of a segment?

• If can, need one more switch

• SW/sbox/track = 5/Lseg + 1

Page 35: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

Buffered Switch Composition

ESE534 -- Spring 2014 -- DeHon35

Page 36: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

Buffered Switch Composition

ESE534 -- Spring 2014 -- DeHon36

Page 37: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

Delay of Segment

ESE534 -- Spring 2014 -- DeHon37

Tseg = Tsw + Lseg( )2

× Rseg ×Cseg

Page 38: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

Segment R and C

Tseg = Tsw + Lseg( )2

× Rseg ×Cseg

• What contributes to?

• Rseg?

• Cseg?

ESE534 -- Spring 2014 -- DeHon38

Page 39: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

Preclass 4

• Fillin Tseg table together.

ESE534 -- Spring 2014 -- DeHon39

Page 40: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

Preclass 4

• What Lseg minimizes delay for:

• Distance=1?

• Distance=2?

• Distance=6?

• Distance=10?

• Distance=20?

ESE534 -- Spring 2014 -- DeHon40

Page 41: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon41

VPR Lseg=4 Pix

Page 42: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon42

VPR Lseg=4 Route

Page 43: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

Effect of Segment Length?

• Experiment with on HW9

ESE534 -- Spring 2014 -- DeHon43

Page 44: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

Connection Boxes

ESE534 -- Spring 2014 -- DeHon44

Page 45: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon45

C-Box Depopulation

• Not necessary for every input to connect to every channel

• Saw last time:– K(N-K+1) switches

• Maybe use fewer?

Page 46: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon46

IO Population• Toronto Model

– Fc fraction of tracks which an input connects to

• IOs spread over 4 sides

• Maybe show up on multiple– Shown here: 2

Page 47: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon47

IO Population

Page 48: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

Clustering

ESE534 -- Spring 2014 -- DeHon48

Page 49: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon49

Leaves Not LUTs

• Recall cascaded LUTs

• Often group collection of LUTs into a Logic Block

Page 50: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon50

Logic Block

[Betz+Rose/IEEE D&T 1998]

What does clusteringdo for delay?

Page 51: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

Delay versus Cluster Size

ESE534 -- Spring 2014 -- DeHon51

[Lu et al., FPGA 2009]

Page 52: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon52

Area versus Cluster Size

[Lu et al., FPGA 2009]

Page 53: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon53

Review: Mesh Design Parameters

• Cluster Size– Internal organization

• LB IO (Fc, sides)

• Switchbox Population and Topology

• Segment length distribution– and staggering

• Switch rebuffering

Page 54: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon54

Big Ideas[MSB Ideas]

• Mesh natural 2D topology– Channels grow as (Np-0.5)– Wiring grows as (N2p )– Linear Population:

• Switches grow as (Np+0.5)–Worse than shown for hierarchical

• Unbounded globaldetail mapping ratio• Detail routing NP-complete• But, seems to work well in practice…

Page 55: ESE534 -- Spring 2014 -- DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes

ESE534 -- Spring 2014 -- DeHon55

Admin

• HW8 due Wednesday

• HW9 out

• Reading for Wednesday on web