1

Architecture of Datapath-oriented Coarse-grain Logic and Routing for FPGAs

Andy Ye, Jonathan Rose, David Lewis

Department of Electrical and Computer Engineering, University of Toronto

{yeandy, jayar, lewis}@eecg.utoronto.ca

2

Outline

• Motivation
– Datapath regularity
• A datapath-oriented FPGA
– Architecture
– CAD flow
• Experimental results
– Area efficiency

• Conclusion

3

Modern FPGAs

• Very large logic capacities
– Over 10 million equivalent logic gates
• Increasingly used to implement large and complex applications
– Central processing units
– Graphics accelerators
– Digital signal processors
– Packet switching networks

4

Datapath Circuits

• Large applications
– Contain a greater amount of datapath circuitry
• Datapath circuits
– Consist of multiple identical logic structures called bit-slices
• Regularity
• Predictability

5

An Example

[Figure: a 4-bit adder built from four chained full adders, with inputs A0 to A3 and B0 to B3, outputs C0 to C3, a carry in, and a carry out]
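The adder example can be sketched in software as one bit-slice function repeated four times. This is only an illustrative model: the function names (`full_adder`, `ripple_carry_add`, `to_bits`, `from_bits`) are hypothetical, and the outputs are called C to match the figure labels.

```python
def full_adder(a, b, carry_in):
    """One bit-slice: return (sum_bit, carry_out) for single-bit inputs."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out


def ripple_carry_add(a_bits, b_bits, carry_in=0):
    """Chain identical full-adder slices, least significant bit first."""
    c_bits = []
    carry = carry_in
    for a, b in zip(a_bits, b_bits):
        c, carry = full_adder(a, b, carry)
        c_bits.append(c)
    return c_bits, carry


def to_bits(value, width):
    """Split an integer into `width` bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Rebuild an integer from bits given least significant bit first."""
    return sum(bit << i for i, bit in enumerate(bits))
```

Every slice runs the same logic and only the carry links one slice to the next, which is the regularity the architecture aims to exploit.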

6

An Example

7

Research Goal

• Design a new FPGA architecture
– Utilize datapath regularity
• Reduce the implementation area of datapath circuits on FPGAs
• Implement a full set of CAD tools for the new architecture
– Synthesis
– Packing
– Placement
– Routing

8

Key Architectural Features

• A bus-oriented logic block architecture

• A mixture of coarse-grain and fine-grain routing tracks

9

Datapath FPGA Overview

[Figure: array of logic blocks (L) and a switch block (S) connected by routing channels that contain both coarse-grain and fine-grain routing tracks]

10

Logic Block: Super-cluster

[Figure: a super-cluster made of four clusters (Cluster 1 to Cluster 4); each cluster is shown with four BLEs and a local routing network; each basic logic element (BLE) contains a LUT, a DFF, and a MUX]
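A BLE of this kind can be modeled behaviorally as a LUT, a flip-flop, and an output multiplexer. The sketch below assumes a 4-input LUT (as in the later synthesis slides) and assumes the multiplexer selects between the LUT output and the registered output; the class and method names are hypothetical.

```python
class BLE:
    """Behavioral model of a basic logic element: 4-LUT + DFF + output mux."""

    def __init__(self, truth_table, registered=False):
        if len(truth_table) != 16:
            raise ValueError("a 4-input LUT needs 16 truth-table entries")
        self.truth_table = list(truth_table)
        self.registered = registered  # assumed mux setting: True selects the DFF
        self.dff = 0                  # flip-flop state, reset to 0

    def lut(self, inputs):
        """Look up the output for four input bits (inputs[0] is the LSB)."""
        index = sum(bit << i for i, bit in enumerate(inputs))
        return self.truth_table[index]

    def clock(self, inputs):
        """Rising clock edge: the DFF captures the current LUT output."""
        self.dff = self.lut(inputs)

    def output(self, inputs):
        """Output mux: registered or combinational value."""
        return self.dff if self.registered else self.lut(inputs)
```

For example, a truth table holding the parity of the four inputs turns the BLE into a 4-input XOR, with or without a register on its output.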

11

Datapath FPGA Overview

[Figure: array of super-clusters (L) and a switch block (S) connected by routing channels that contain both coarse-grain and fine-grain routing tracks]

12

Coarse-grain Routing Tracks

[Figure: a super-cluster of four clusters beside a switch block, connected by coarse-grain routing and fine-grain routing, with elements labeled M in both]

13

CAD Flow

• CAD flow for the datapath-oriented FPGA consists of
– Synthesis
– Packing
– Placement
– Routing
• Conventional CAD flow
– Minimizes area and delay metrics
– Destroys datapath regularity

14

Datapath-oriented CAD Flow

• Preserve datapath regularity (bit-sliced structures)

• Map the preserved regularity onto the datapath-oriented FPGA architecture

• Maximize the utilization of coarse-grain routing tracks
– Minimize the implementation area of datapath structures

15

Datapath Representation

• Datapath circuits are represented by netlists of datapath components (VHDL or Verilog)
• Datapath component library
– Multiplexers
– Adders/subtracters
– Shifters
– Comparators
– Registers

• Each component consists of identical bit-slices

16

Synthesis

• Enhanced module compaction algorithm

• Based on the Synopsys FPGA compiler

• Augmented with several datapath-oriented features
– Preserve datapath regularity by preserving bit-slice boundaries
– Achieve area results as good as those of conventional synthesis tools

17

An Example Datapath Circuit

[Figure: four bit-slices (0 to 3), each containing a multiplexer (mux) and an adder (+), with per-slice signals a, b, c, d, and s, a shared sel signal, and a carry chain from cin to cout]

18

Synthesis

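[Figure: bit-slice 0 of the example circuit (mux and adder, signals a0, b0, c0, d0, s0, sel, cin) mapped onto three 4-LUTs, one of which takes a0, b0, c0, and sel as inputs]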

19

Synthesis

[Figure: all four bit-slices after synthesis; each slice i is mapped onto three 4-LUTs with signals ai, bi, ci, sel, di, and si, with cin entering at slice 0 and cout leaving at slice 3]

20

Packing

• Based on the T-VPack packing algorithm

• Pack adjacent bit-slices into super-clusters

• Utilize carry connections in super-clusters to minimize the delay of carry chains
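The idea of keeping adjacent bit-slices together can be sketched as a simple grouping step. This is not the T-VPack algorithm: it assumes each bit-slice already fits in one cluster, and the function name and parameters are hypothetical.

```python
def pack_bit_slices(num_slices, clusters_per_super_cluster=4):
    """Group consecutive bit-slice indices into super-clusters.

    Assumes one bit-slice per cluster, so slice i and slice i + 1 land in
    neighboring clusters and the carry can use the local carry connection.
    """
    super_clusters = []
    for start in range(0, num_slices, clusters_per_super_cluster):
        stop = min(start + clusters_per_super_cluster, num_slices)
        super_clusters.append(list(range(start, stop)))
    return super_clusters
```

With four clusters per super-cluster, a 10-bit datapath would occupy two full super-clusters and half of a third.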

21

An Example

• Four clusters per super-cluster

• Two BLEs per cluster

• Six inputs per cluster

[Figure: a super-cluster of four clusters with two BLEs each]

22

Packing Into Clusters

[Figure: the three 4-LUTs of bit-slice 0 (signals a0, b0, c0, sel, d0, s0, cin) packed into BLEs and then into clusters]

23

Packing Into Super-clusters

[Figure: the four bit-slices packed into super-clusters of two-BLE clusters, with inputs ai, bi, ci, sel, and di, outputs si, and the carry chain from cin to cout]

24

Placement

• Based on the VPR placer

• Uses the simulated annealing algorithm
• For super-clusters containing datapath circuits
– Move super-clusters only
• For super-clusters containing non-datapath circuits
– Move individual clusters
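The two move granularities can be sketched together with a standard simulated annealing acceptance test. This is a minimal illustration, not the VPR placer: the dictionary keys (`name`, `is_datapath`, `clusters`) and both function names are hypothetical.

```python
import math
import random


def movable_units(super_clusters):
    """List what the placer may move.

    Each entry of `super_clusters` is a dict with hypothetical keys:
    "name", "is_datapath", and "clusters" (a list of cluster names).
    """
    units = []
    for sc in super_clusters:
        if sc["is_datapath"]:
            # Datapath super-clusters move as one unit to keep bit-slices together.
            units.append((sc["name"], None))
        else:
            # Non-datapath logic may be moved one cluster at a time.
            units.extend((sc["name"], cluster) for cluster in sc["clusters"])
    return units


def accept_move(delta_cost, temperature, rng=random):
    """Annealing acceptance: always take improvements, and take a worsening
    move with probability exp(-delta_cost / temperature)."""
    if delta_cost <= 0:
        return True
    return rng.random() < math.exp(-delta_cost / temperature)
```

Moving datapath super-clusters as a whole keeps the packed bit-slices aligned, while cluster-level moves leave the placer more freedom for irregular logic.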

25

Routing

• Based on the VPR router

• Uses the PathFinder algorithm
• As much as possible
– Route buses through coarse-grain routing tracks
– Route individual signals through fine-grain routing tracks
• When necessary
– Use coarse-grain routing tracks for individual signals
– Use fine-grain routing tracks for buses
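The preference order above can be written as a small selection rule. This sketch only captures the "preferred, then fallback" policy and does not model PathFinder's congestion negotiation; the function name and arguments are hypothetical.

```python
def choose_track_type(is_bus, free_coarse, free_fine):
    """Pick a track type for a net, given the number of free tracks of each kind.

    Buses prefer coarse-grain tracks and individual signals prefer
    fine-grain tracks; either falls back to the other kind when needed.
    Returns "coarse", "fine", or None if no track is free.
    """
    preferred, fallback = ("coarse", "fine") if is_bus else ("fine", "coarse")
    free = {"coarse": free_coarse, "fine": free_fine}
    if free[preferred] > 0:
        return preferred
    if free[fallback] > 0:
        return fallback
    return None
```

A real router would express the same preference as a cost penalty rather than a hard rule, so that congestion can override it gradually.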

26

Area Efficiency

• Benchmarks
– 15 datapath circuits from the picoJava processor
• Architectural assumptions
– Four BLEs per cluster
– Four clusters per super-cluster
– Four coarse-grain tracks sharing configuration memory
– Logic track length of two
– Disjoint switch block topology
• Architectural variables
– Number of coarse-grain tracks
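To see why sharing configuration memory can save area as the number of coarse-grain tracks grows, a rough count of configuration bits helps. The model below is a simplifying assumption, not the area model behind the reported results: it charges a hypothetical `bits_per_track` to every fine-grain track and to every group of `share` coarse-grain tracks, and it ignores switches and buffers.

```python
def routing_config_bits(total_tracks, coarse_tracks, bits_per_track, share=4):
    """Count routing configuration bits under a simplified sharing model.

    Fine-grain tracks each need their own bits; every group of `share`
    coarse-grain tracks is controlled by one shared set of bits.
    """
    if coarse_tracks > total_tracks:
        raise ValueError("coarse_tracks cannot exceed total_tracks")
    if coarse_tracks % share != 0:
        raise ValueError("coarse_tracks must be a multiple of share")
    fine_tracks = total_tracks - coarse_tracks
    coarse_groups = coarse_tracks // share
    return (fine_tracks + coarse_groups) * bits_per_track
```

Under these assumptions, a channel of 16 tracks with 6 bits per track needs 96 bits with no coarse-grain tracks and 60 bits when 8 of the tracks are coarse-grain.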

27

Area Efficiency

[Figure: circuit area in minimum transistor areas (×10^6, axis from 1.40 to 1.60) and normalized circuit area (axis from 90.0% to 100.0%) versus % of coarse-grain tracks (0%, 0%-10%, 10%-20%, 20%-30%, 30%-40%, 40%-50%, 50%-60%, 60%-70%)]

28

Logic Track Length Vs. Area

• Architectural assumptions
– Four clusters per super-cluster
– Four coarse-grain tracks share configuration memory
– 50% of tracks are coarse-grain tracks
– Disjoint switch block topology
• Architectural variables
– Number of BLEs per cluster
– Logic track length

29

Logic Track Length Vs. Area

[Figure: circuit area in minimum transistor areas (×10^6, axis from 1.60 to 2.20) versus track length (1, 2, 4, 8, 16), with one curve for each of N = 2, N = 4, N = 8, and N = 10]

30

Conclusion

• Proposed a datapath-oriented FPGA architecture and its CAD tools

• Best area is achieved when
– 40% to 50% of tracks are coarse-grain routing tracks
– Four BLEs per cluster
– Logic track length of two
• Best area is 9.6% smaller than that of conventional FPGAs