Architecture of Datapath-oriented Coarse-grain Logic and Routing for FPGAs

Andy Ye, Jonathan Rose, David Lewis. Department of Electrical and Computer Engineering, University of Toronto. {yeandy, jayar, lewis}

  • Architecture of Datapath-oriented Coarse-grain Logic and Routing for FPGAs: Andy Ye, Jonathan Rose, David Lewis; Department of Electrical and Computer Engineering, University of Toronto; {yeandy, jayar, lewis}

  • Outline: Motivation (datapath regularity); A datapath-oriented FPGA (architecture, CAD flow); Experimental results (area efficiency); Conclusion

  • Modern FPGAs: very large logic capacities (over 10 million equivalent logic gates); increasingly used to implement large and complex applications such as central processing units, graphics accelerators, digital signal processors, and packet switching networks.

  • Datapath Circuits: large applications contain a greater amount of datapath circuits. Datapath circuits consist of multiple identical logic structures called bit-slices, which gives them regularity and predictability.

  • An Example [Figure: a ripple-carry adder built from four full adders chained from Carry In to Carry Out]

  • An Example

  • Research Goal: design a new FPGA architecture that utilizes datapath regularity to reduce the implementation area of datapath circuits on FPGAs; implement a full set of CAD tools for the new architecture: synthesis, packing, placement, and routing.

  • Key Architectural Features: a bus-oriented logic block architecture; a mixture of coarse-grain and fine-grain routing tracks.

  • Datapath FPGA Overview [Figure: FPGA overview showing routing channels]

  • Logic Block: Super-cluster [Figure: a super-cluster containing Cluster 1 through Cluster 4]

  • Datapath FPGA Overview [Figure: FPGA overview showing routing channels]

  • Coarse-grain Routing Tracks

  • CAD Flow: the CAD flow for the datapath-oriented FPGA consists of synthesis, packing, placement, and routing. Conventional CAD flows minimize area and delay metrics but destroy datapath regularity.

  • Datapath-oriented CAD Flow: preserve datapath regularity (bit-sliced structures); map the preserved regularity onto the datapath-oriented FPGA architecture; maximize the utilization of coarse-grain routing tracks; minimize the implementation area of datapath structures.

  • Datapath Representation: datapath circuits are represented by netlists of datapath components (VHDL or Verilog). Datapath component library: multiplexers, adders/subtracters, shifters, comparators, and registers. Each component consists of identical bit-slices.

  • Synthesis: an enhanced module compaction algorithm, based on the Synopsys FPGA compiler and augmented with several datapath-oriented features. It preserves datapath regularity by preserving bit-slice boundaries and achieves area results as good as those of conventional synthesis tools.

  • An Example Datapath Circuit [Figure: datapath circuit with signals sel, cin, and cout]

  • Synthesis [Figure: synthesized bit-slice with a mux and signals c0, a0, b0, d0, s0, sel, and cin]

  • Synthesis

  • Packing: based on the T-VPACK packing algorithm; packs adjacent bit-slices into super-clusters; uses the carry connections in super-clusters to minimize the delay of carry chains.

  • An Example: four clusters per super-cluster; two BLEs per cluster; six inputs per cluster [Figure: the eight BLEs of the super-cluster]

  • Packing Into Clusters [Figure: BLEs of a cluster with signals d0, s0, and cin]

  • Packing Into Super-clusters [Figure: BLEs of a super-cluster with signals d0-d3, s0-s3, cin, and cout]

  • Placement: based on the VPR placer, using a simulated annealing algorithm. Super-clusters containing datapath circuits are moved only as whole super-clusters; for super-clusters containing non-datapath circuits, individual clusters are moved.

  • Routing: based on the VPR router, using the PathFinder algorithm. As much as possible, buses are routed through coarse-grain routing tracks and individual signals through fine-grain routing tracks. When necessary, coarse-grain routing tracks carry individual signals and fine-grain routing tracks carry buses.

  • Area Efficiency: benchmarks are 15 datapath circuits from the Pico-java processor. Architectural assumptions: four BLEs per cluster; four clusters per super-cluster; four coarse-grain tracks sharing configuration memory; logic track length of two; disjoint switch block topology. Architectural variable: number of coarse-grain tracks.

  • Area Efficiency [Figure: circuit area (in minimum transistor areas, x10^6) and normalized circuit area versus the percentage of coarse-grain tracks, in bins from 0% to 60%-70%]

  • Logic Track Length Vs. Area: architectural assumptions: four clusters per super-cluster; four coarse-grain tracks share configuration memory; 50% of tracks are coarse-grain tracks; disjoint switch block topology. Architectural variables: number of BLEs per cluster; logic track length.

  • Logic Track Length Vs. Area [Figure: circuit area (in minimum transistor areas, x10^6) versus logic track length (1, 2, 4, 8, 16), one curve each for N = 2, 4, 8, and 10 BLEs per cluster]

  • Conclusion: proposed a datapath-oriented FPGA architecture and its CAD tools. The best area is achieved when 40%-50% of tracks are coarse-grain routing tracks, with four BLEs per cluster and a logic track length of two. This best area is 9.6% smaller than that of conventional FPGAs.
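The four-bit ripple-carry adder on the "An Example" slide is the simplest instance of the bit-slice regularity the architecture targets. The same full-adder slice is repeated once per bit, and only the carry passes between neighboring slices. The following is a minimal Python sketch of that structure. The function names are illustrative and do not come from the authors' tools.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One bit-slice: add single bits a and b plus the carry-in; return (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def ripple_carry_add(a_bits: list[int], b_bits: list[int], cin: int = 0) -> tuple[list[int], int]:
    """Chain identical full-adder slices, least significant bit first (index 0 is A0/B0)."""
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same width")
    carry = cin
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        # Every slice is the same circuit; only the carry links slice i to slice i + 1.
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(value: int, width: int) -> list[int]:
    """Split an integer into `width` bits, least significant first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits: list[int]) -> int:
    """Reassemble an integer from bits, least significant first."""
    return sum(bit << i for i, bit in enumerate(bits))


if __name__ == "__main__":
    c_bits, cout = ripple_carry_add(to_bits(9, 4), to_bits(5, 4))
    print(c_bits, cout)  # 9 + 5 = 14 -> [0, 1, 1, 1] 0
```

Every slice is identical. A tool that keeps the slice boundaries can therefore treat the A, B, and C signals as four-bit buses rather than as unrelated nets.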

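The packing example slides appear to place one bit-slice per cluster and four adjacent bit-slices per super-cluster, so that carries travel between neighboring clusters. The sketch below illustrates only that slice-to-cluster assignment, using the example slide's parameters (four clusters per super-cluster, two BLEs and six inputs per cluster). It is a simplified, hypothetical stand-in, not the T-VPACK-based packer the talk describes. The `BitSlice` type and the `pack_bit_slices` function are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class BitSlice:
    """Hypothetical description of one bit-slice after synthesis."""
    name: str        # e.g. "s0"
    num_bles: int    # BLEs needed to implement the slice
    num_inputs: int  # distinct cluster inputs the slice uses


def pack_bit_slices(slices: list[BitSlice], clusters_per_super: int = 4,
                    bles_per_cluster: int = 2,
                    inputs_per_cluster: int = 6) -> list[tuple[int, int, str]]:
    """Assign slice i to cluster i % clusters_per_super of super-cluster i // clusters_per_super.

    Keeping the slices in order means that, inside a super-cluster, the carry out
    of slice i feeds slice i + 1 in the adjacent cluster.
    Returns (super_cluster_index, cluster_index, slice_name) tuples.
    """
    placement = []
    for i, s in enumerate(slices):
        # This sketch assumes one slice per cluster, so each slice must fit alone.
        if s.num_bles > bles_per_cluster or s.num_inputs > inputs_per_cluster:
            raise ValueError(f"slice {s.name} does not fit in one cluster")
        placement.append((i // clusters_per_super, i % clusters_per_super, s.name))
    return placement


if __name__ == "__main__":
    eight_slices = [BitSlice(f"s{i}", num_bles=2, num_inputs=5) for i in range(8)]
    for entry in pack_bit_slices(eight_slices):
        print(entry)
```

Keeping slice order is the design choice that matters here: it lets the bits of a bus land in the same relative position of every super-cluster. The real packer also has to handle slices that are larger or smaller than one cluster.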
    Good afternoon. In this presentation, I am going to present a new FPGA architecture that utilizes datapath regularity to improve its overall area efficiency.

    Here is an outline of the talk. First we will present the motivation of the research, namely datapath regularity. Then we will discuss the architecture and the CAD flow of our proposed datapath-oriented FPGA. Experimental results on the area efficiency of the FPGA are presented next. Finally, we conclude.

    Modern FPGAs can have very large logic capacities. Current state-of-the-art FPGAs can have logic capacities of over 10 million equivalent logic gates [Xilinx Website]. Because of this large logic capacity, FPGAs have been increasingly used to implement very large and very complex applications, including CPUs, graphics accelerators, digital signal processors, and packet switching networks.

    Large applications usually contain a greater amount of datapath circuits, which typically consist of multiple identical logic structures called bit-slices. These bit-slices make the structure of datapath circuits very regular and very predictable.

    Here is a very simple example of a datapath circuit. This is a ripple-carry adder that processes four-bit-wide data. It adds two four-bit numbers, A0-A3 and B0-B3, and produces a four-bit number, C0-C3, as its output. The ripple-carry adder consists of four bit-slices, and each bit-slice is called a full adder. Each full adder adds one bit of each input, together with the carry from the previous bit-slice, to produce one bit of the output, and passes its carry out to the next bit-slice.

    So the goal of our research is to design a new FPGA architecture that utilizes datapath regularity in order to reduce the implementation area of datapath circuits on FPGAs. We also want to design and implement a full suite of CAD tools to support this new datapath-oriented FPGA architecture. These tools include synthesis, packing, placement, and routing tools. All these tools should preserve datapath regularity and efficiently map it onto the new architecture.

    Our datapath-oriented FPGA contains two key architectural features: a bus-oriented logic block architecture, and a mixture of coarse-grain and fine-grain routing tracks. Each is explained in more detail in the next few slides.

    The overall structure of our datapath-oriented architecture is shown on this slide. It consists of a two-dimensional array of logic blocks. Each logic block contains several lookup tables (LUTs) and D-type flip-flops (DFFs), and has its own local routing resources.

    The detailed structure of a logic block is shown here. The logic block is called a super-cluster. A super-cluster is divided into several clusters; in this case, we have four clusters inside our super-cluster. Clusters are connected together by carry chains. Here we have two groups of carry chains running horizontally, connecting the four clusters together.

    Now we take a look at the structure of a cluster. Our cluster is very much like the clusters used in many previous studies. It consists of several basic logic elements (BLEs) and a fully connected local routing network. Each BLE consists of a lookup table and a D-type flip-flop. A multiplexer, controlled by configuration memory, selects either the LUT output or the DFF output as the output of the BLE.

    Now we return to our top view of the FPGA and take a look at the global routing resources. The global routing network consists of horizontal and vertical routing channels connected by switch boxes. Each routing channel in our FPGA contains not only routing tracks of various lengths but also routing tracks of different granularities. In this example, our routing channel contains routing tracks of two different granularities: coarse-grain routing tracks and fine-grain routing tracks.

    Now we will take a look at these two types of routing tracks in detail. Shown here is a super-cluster containing four clusters, along with two kinds of routing resources: fine-grain routing tracks and coarse-grain routing tracks. The fine-grain routing tracks are very much like the routing tracks in a conventional FPGA: each fine-grain routing track is controlled by its own set of configuration memory. The coarse-grain routing tracks are organized into groups, and each group contains several routing tracks. The number of tracks in a group is called the granularity of the coarse-grain routing; in this example, the granularity is four. Each group of coarse-grain routing tracks is controlled by a single, shared set of configuration memory.
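The basic logic element described in these notes consists of a LUT, a D-type flip-flop, and an output multiplexer set by configuration memory. It can be modeled behaviorally as shown below. This is a hypothetical Python sketch. The LUT input count K is an assumption, because the talk does not state it. The truth table is stored as 2^K configuration bits.

```python
from dataclasses import dataclass


@dataclass
class BLE:
    """Behavioral model of a basic logic element (hypothetical sketch).

    k        -- number of LUT inputs (an assumption; the talk does not give K)
    lut_bits -- LUT truth table, 2**k configuration bits
    use_ff   -- configuration bit driving the output multiplexer:
                False selects the LUT output, True selects the flip-flop output
    """
    k: int
    lut_bits: list[int]
    use_ff: bool = False
    ff_state: int = 0  # value currently held by the D-type flip-flop

    def __post_init__(self) -> None:
        if len(self.lut_bits) != 2 ** self.k:
            raise ValueError("a k-input LUT needs exactly 2**k configuration bits")

    def lut(self, inputs: list[int]) -> int:
        # The input bits index the truth table; inputs[0] is the least significant bit.
        if len(inputs) != self.k:
            raise ValueError("expected exactly k input bits")
        index = sum(bit << i for i, bit in enumerate(inputs))
        return self.lut_bits[index]

    def output(self, inputs: list[int]) -> int:
        # Output multiplexer: registered value or combinational LUT value.
        return self.ff_state if self.use_ff else self.lut(inputs)

    def clock(self, inputs: list[int]) -> None:
        # On a clock edge the flip-flop captures the current LUT output.
        self.ff_state = self.lut(inputs)
```

For example, `BLE(k=2, lut_bits=[0, 1, 1, 0])` implements a two-input XOR. Setting `use_ff=True` makes the same BLE present the registered result instead.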

    Consider, for example, the switch box. For fine-grain routing, each switch is controlled by its own bit of SRAM. For coarse-grain routing, on the other hand, a group of four switches is controlled by a single bit of SRAM.
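The configuration-memory saving in the switch box is straightforward to count. The sketch below counts SRAM bits for one set of track switches. It assumes, as described here, one bit per fine-grain switch and one shared bit per group of coarse-grain switches (granularity four). The function name and the 40-track channel in the usage note are arbitrary illustrations, not values from the talk.

```python
def switch_sram_bits(num_tracks: int, coarse_fraction: float, granularity: int = 4) -> int:
    """Count configuration SRAM bits for one set of track switches.

    Assumes each fine-grain track switch has its own SRAM bit, while each group
    of `granularity` coarse-grain track switches shares a single SRAM bit.
    """
    num_coarse = round(num_tracks * coarse_fraction)
    if num_coarse % granularity != 0:
        raise ValueError("coarse-grain tracks must form whole groups")
    num_fine = num_tracks - num_coarse
    return num_fine + num_coarse // granularity
```

With 40 tracks, `switch_sram_bits(40, 0.0)` gives 40 bits and `switch_sram_bits(40, 0.5)` gives 25. Only the configuration memory shrinks, because every track still needs its own switch. The saving in total area should therefore be expected to be much smaller than this bit ratio.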

    The output connection blocks are similar. For fine-grain routing tracks, each logic cluster output is independently controlled by a single bit of SRAM. For coarse-grain tracks, a group of four outputs is controlled by a single bit of SRAM. Here we have individually controlled output switches for fine-grain routing; and c