Basic FPGA Architecture 2 - 2 © 2005 Xilinx, Inc. All Rights Reserved For Academic Use Only
Slices and CLBs
• Each Virtex-II CLB contains four slices
– Local routing provides feedback between slices in the same CLB, and it provides routing to neighboring CLBs
– A switch matrix provides access to general routing resources
[Figure: Virtex-II CLB with slices S0 to S3, local routing, a switch matrix, BUFTs, and carry (CIN, COUT) and SHIFT connections]
Simplified Slice Structure
[Figure: Slice 0 with two LUTs, each feeding carry logic and a D flip-flop with CE, PRE, and CLR]
• Each slice has four outputs
– Two registered outputs, two non-registered outputs
– Two BUFTs associated with each CLB, accessible by all 16 CLB outputs
• Carry logic runs vertically, up only
– Two independent carry chains per CLB
Detailed Slice Structure
• The next few slides discuss the slice features
– LUTs
– MUXF5, MUXF6, MUXF7, MUXF8 (only the F5 and F6 MUX are shown in this diagram)
– Carry Logic
– MULT_ANDs
– Sequential Elements
Look-Up Tables
• Combinatorial logic is stored in Look-Up Tables (LUTs)
– Also called Function Generators (FGs)
– Capacity is limited by the number of inputs, not by the complexity
• Delay through the LUT is constant
[Figure: combinatorial logic with inputs A, B, C, D and output Z, implemented in a LUT]
A B C D | Z
0 0 0 0 | 0
0 0 0 1 | 0
0 0 1 0 | 0
0 0 1 1 | 1
0 1 0 0 | 1
0 1 0 1 | 1
. . .
1 1 0 0 | 0
1 1 0 1 | 0
1 1 1 0 | 0
1 1 1 1 | 1
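Behaviorally, a 4-input LUT is a 16-entry truth table addressed by its inputs, which is why capacity depends on the input count and delay stays constant. The Python sketch below models that idea; `make_lut4` and `lut4_eval` are hypothetical names, and the bit order (A as the most significant address bit) is an assumption chosen to match the table's row order.

```python
from typing import Callable, List


def make_lut4(func: Callable[[int, int, int, int], int]) -> List[int]:
    """Precompute the 16 truth-table entries for any 4-input Boolean function.

    Entry i holds func(A, B, C, D) with i = 8*A + 4*B + 2*C + D, so A is the
    most significant address bit, matching the row order of the table above.
    """
    table = []
    for i in range(16):
        a, b, c, d = (i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1
        table.append(func(a, b, c, d) & 1)
    return table


def lut4_eval(table: List[int], a: int, b: int, c: int, d: int) -> int:
    """Reading the LUT is one indexed lookup, however complex the function is."""
    return table[(a << 3) | (b << 2) | (c << 1) | d]


# Two functions of very different "complexity" each cost exactly one LUT.
and4 = make_lut4(lambda a, b, c, d: a & b & c & d)
parity4 = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
print(lut4_eval(and4, 1, 1, 1, 1), lut4_eval(parity4, 1, 0, 1, 1))  # -> 1 1
```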
Connecting Look-Up Tables
[Figure: MUXF5 to MUXF8 connections between the four slices of a CLB]
• MUXF5 combines the LUTs in each slice
• MUXF6 combines slices S0 and S1
• MUXF6 combines slices S2 and S3
• MUXF7 combines the two MUXF6 outputs
• MUXF8 combines the two MUXF7 outputs (from the CLB above or below)
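The mux tree works by Shannon expansion. Two LUTs hold a function with one extra input forced to 0 and to 1, and a 2:1 mux driven by that input picks the right result. Each MUXFx level repeats this with one more input. An 8-input function therefore needs 2^4 = 16 LUTs: the eight LUTs of one CLB plus eight from a neighbor, which is why MUXF8 reaches into the CLB above or below. The sketch below shows the principle behaviorally, not the physical routing. The helper names are hypothetical.

```python
from itertools import product
from typing import Callable, Sequence

BoolFn = Callable[[Sequence[int]], int]


def mux2(sel: int, in0: int, in1: int) -> int:
    """A 2:1 multiplexer: the behavior of each MUXF5 to MUXF8 stage."""
    return in1 if sel else in0


def build_wide_function(func: BoolFn, n_inputs: int) -> BoolFn:
    """Realize an n-input function from 4-input LUTs and a tree of 2:1 muxes.

    The last input selects between two copies of the function with that input
    forced to 0 and to 1 (Shannon expansion). Each extra input adds one level:
    5 inputs -> MUXF5, 6 -> MUXF6, 7 -> MUXF7, 8 -> MUXF8.
    """
    if n_inputs <= 4:
        # A single LUT stores the whole truth table of up to 4 inputs.
        table = {bits: func(bits) & 1 for bits in product((0, 1), repeat=n_inputs)}
        return lambda bits: table[tuple(bits)]
    low = build_wide_function(lambda x: func(tuple(x) + (0,)), n_inputs - 1)
    high = build_wide_function(lambda x: func(tuple(x) + (1,)), n_inputs - 1)
    return lambda bits: mux2(bits[-1], low(bits[:-1]), high(bits[:-1]))


# Example: 8-input parity, i.e. 16 LUTs plus MUXF5 through MUXF8.
parity8 = build_wide_function(lambda bits: sum(bits) % 2, 8)
print(parity8([1, 0, 1, 1, 0, 0, 0, 1]))  # -> 0
```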
Fast Carry Logic
• Simple, fast, and complete arithmetic logic
– Dedicated XOR gate for single-level sum completion
– Uses dedicated routing resources
– Synthesis tools can infer carry logic from arithmetic in the HDL
[Figure: two independent carry chains per CLB; one runs from slice S0 through S1 to S0 of the next CLB, the other from S2 through S3 to CIN of S2 of the next CLB]
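Per bit, the carry logic splits an add into three parts. The LUT computes a propagate term, the carry multiplexer (CY_MUX) either passes the incoming carry or generates one, and the dedicated XOR (CY_XOR) completes the sum. The Python model below sketches a ripple-carry adder built this way. It assumes the LUT computes p = a XOR b and that an operand bit drives the CY_MUX data input. It models behavior only, not timing.

```python
from typing import Tuple


def carry_chain_bit(a: int, b: int, cin: int) -> Tuple[int, int]:
    """One bit of an adder built from a LUT, CY_MUX, and CY_XOR."""
    p = a ^ b               # LUT output: propagate
    s = p ^ cin             # CY_XOR: single-level sum completion
    cout = cin if p else a  # CY_MUX: pass the carry, or generate it (a == b here)
    return s, cout


def ripple_add(x: int, y: int, width: int, cin: int = 0) -> Tuple[int, int]:
    """Add two unsigned width-bit values, rippling the carry up the chain."""
    total, carry = 0, cin
    for i in range(width):
        s, carry = carry_chain_bit((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry  # (sum modulo 2**width, carry out)


print(ripple_add(200, 100, 8))  # -> (44, 1)
```

Because each bit needs only one LUT and the carry travels on dedicated resources, the chain fits the "up only" vertical layout described for the slice.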
MULT_AND Gate
[Figure: LUT driving CY_MUX and CY_XOR, with a MULT_AND gate computing A x B; carry signals CO, DI, CI, and S]
• Highly efficient multiply and add implementation
– Earlier FPGA architectures required two LUTs per bit to perform the multiplication and addition
– The MULT_AND gate enables an area reduction by performing the multiply and the add in one LUT per bit
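In a shift-and-add multiplier, each row adds a partial-product bit (a AND b) to a running sum. The LUT can compute (a AND b) XOR sum, but when a carry is generated the CY_MUX also needs the partial-product bit itself as its data input. The MULT_AND gate supplies that second copy, so each bit still fits in one LUT. The Python sketch below models this behaviorally. The function names are hypothetical, and routing the MULT_AND output to the CY_MUX data input reflects our reading of the diagram.

```python
from typing import Tuple


def mult_add_bit(a: int, b: int, acc: int, cin: int) -> Tuple[int, int]:
    """One bit of (a AND b) + acc using one LUT plus the MULT_AND gate."""
    pp = a & b               # MULT_AND: partial-product bit for the CY_MUX data input
    p = pp ^ acc             # LUT: (a AND b) XOR acc
    s = p ^ cin              # CY_XOR
    cout = cin if p else pp  # CY_MUX: if p == 0 then pp == acc, so pp is the carry
    return s, cout


def multiply(x: int, y: int, width: int) -> int:
    """Unsigned shift-and-add multiply: one carry-chain row per bit of y."""
    acc = 0
    for j in range(width):
        b_j = (y >> j) & 1
        new_acc = acc & ((1 << j) - 1)  # bits below position j are unchanged
        carry = 0
        for i in range(width):
            pos = i + j
            s, carry = mult_add_bit((x >> i) & 1, b_j, (acc >> pos) & 1, carry)
            new_acc |= s << pos
        # acc < 2**(width + j) here, so this position is still 0 in acc.
        new_acc |= carry << (j + width)
        acc = new_acc
    return acc


print(multiply(13, 11, 4))  # -> 143
```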
Flexible Sequential Elements
[Figure: sequential element primitives FDCPE (D, CE, PRE, CLR, Q), FDRSE (D, CE, S, R, Q), and the LDCPE latch (D, CE, PRE, CLR, G, Q)]
• Either flip-flops or latches
• Two in each slice; eight in each CLB
• Inputs come from LUTs or from an independent CLB input
• Separate set and reset controls
– Can be synchronous or asynchronous
• All controls are shared within a slice
– Control signals can be inverted locally within a slice
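Small behavioral models make the difference between synchronous and asynchronous set/reset concrete. The priorities below (reset over set over clock enable) follow our reading of the Xilinx library descriptions of FDRSE and FDCPE and should be checked against the libraries guide. The class and method names are hypothetical.

```python
class FDRSE:
    """Flip-flop with synchronous reset (R), set (S), and clock enable (CE).

    R and S are only sampled on a rising clock edge.
    """

    def __init__(self, init: int = 0) -> None:
        self.q = init

    def rising_edge(self, d: int, ce: int, s: int, r: int) -> int:
        if r:            # assumed highest priority
            self.q = 0
        elif s:
            self.q = 1
        elif ce:
            self.q = d
        return self.q


class FDCPE:
    """Flip-flop with asynchronous clear (CLR), preset (PRE), and clock enable (CE)."""

    def __init__(self, init: int = 0) -> None:
        self.q = init

    def update_async(self, pre: int, clr: int) -> int:
        """Call whenever PRE or CLR changes; no clock edge is needed."""
        if clr:          # assumed to override PRE
            self.q = 0
        elif pre:
            self.q = 1
        return self.q

    def rising_edge(self, d: int, ce: int, pre: int, clr: int) -> int:
        if clr or pre:   # asynchronous controls dominate the clock
            return self.update_async(pre, clr)
        if ce:
            self.q = d
        return self.q


sync_ff, async_ff = FDRSE(init=1), FDCPE(init=1)
async_ff.update_async(pre=0, clr=1)  # clears at once, without a clock edge
print(sync_ff.q, async_ff.q)         # -> 1 0
sync_ff.rising_edge(d=1, ce=1, s=0, r=1)
print(sync_ff.q)                     # -> 0 (the synchronous reset waited for the edge)
```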
Shift Register LUT (SRL16CE)
• Dynamically addressable serial shift registers
– Maximum delay of 16 clock cycles per LUT (128 per CLB)
– Cascadable to other LUTs or CLBs for longer shift registers
  • Dedicated connection from Q15 to the D input of the next SRL16CE
– Shift register length can be changed asynchronously by changing address A
[Figure: SRL16CE implemented in a LUT, with inputs D, CE, CLK, and A[3:0]; output Q from the addressed stage and cascade output Q15]
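An SRL16CE behaves like a 16-bit shift register whose output tap is chosen by A[3:0]. With A = n, Q returns the bit shifted in n + 1 clocks earlier, which gives delays of 1 to 16 cycles. Q15 always exposes the last stage for cascading. The Python class below is a sketch under that reading. The class and method names are hypothetical.

```python
from collections import deque


class SRL16CE:
    """Behavioral model of a 16-deep, dynamically addressable shift register."""

    DEPTH = 16

    def __init__(self) -> None:
        # stages[0] is the newest bit; stages[15] is the oldest (Q15).
        self.stages = deque([0] * self.DEPTH, maxlen=self.DEPTH)

    def clock(self, d: int, ce: int = 1) -> None:
        """Shift D in on a rising clock edge when CE is high."""
        if ce:
            self.stages.appendleft(d & 1)  # maxlen drops the oldest bit

    def q(self, addr: int) -> int:
        """Asynchronous read: stage A, i.e. a delay of A + 1 clocks."""
        return self.stages[addr & 0xF]

    @property
    def q15(self) -> int:
        """Cascade output, to drive the D input of the next SRL16CE."""
        return self.stages[-1]


srl = SRL16CE()
srl.clock(1)
for _ in range(5):
    srl.clock(0)
print(srl.q(5), srl.q(0))  # -> 1 0 (the 1 was shifted in 6 clocks ago)
```

Changing the address argument between reads models the "length changed by changing address A" behavior. No clock is involved in selecting the tap.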
IOB Element
• Input path
– Two DDR registers
• Output path
– Two DDR registers
– Two 3-state enable DDR registers
• Separate clocks and clock enables for I and O
• Set and reset signals are shared
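[Figure: IOB with an input path (two registers clocked by ICK1 and ICK2), an output path (two registers and a DDR MUX clocked by OCK1 and OCK2), and a 3-state path (two registers and a DDR MUX clocked by OCK1 and OCK2), all connected to the PAD]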
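On the output path, one DDR register is clocked by OCK1 and the other by OCK2 (typically opposite phases of the same clock). The DDR MUX alternates between them, so two data streams leave the pad at twice the clock rate. The input path reverses this with ICK1 and ICK2. The sketch below models only this interleaving, one value per half period, and ignores clock-to-out timing. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def ddr_output(rising_data: Sequence[int], falling_data: Sequence[int]) -> List[int]:
    """Pad value for each half period: the OCK1 register, then the OCK2 register."""
    if len(rising_data) != len(falling_data):
        raise ValueError("each clock cycle needs one bit for each edge")
    pad: List[int] = []
    for r, f in zip(rising_data, falling_data):
        pad.append(r)  # first half cycle: register clocked by OCK1
        pad.append(f)  # second half cycle: register clocked by OCK2
    return pad


def ddr_input(pad_samples: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Split pad samples (two per cycle) back into the ICK1 and ICK2 streams."""
    return list(pad_samples[0::2]), list(pad_samples[1::2])


print(ddr_output([1, 0, 1], [0, 0, 1]))  # -> [1, 0, 0, 0, 1, 1]
```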
Distributed SelectRAM Resources
• Uses a LUT in a slice as memory
• Synchronous write
• Asynchronous read
– Accompanying flip-flops can be used to create synchronous read
• RAM and ROM are initialized during configuration
– Data can be written to RAM after configuration
• Emulated dual-port RAM
– One read/write port
– One read-only port
[Figure: distributed RAM primitives RAM16X1S (D, WE, WCLK, A0 to A3, O), RAM32X1S (D, WE, WCLK, A0 to A4, O), and RAM16X1D (D, WE, WCLK, A0 to A3, DPRA0 to DPRA3, SPO, DPO), implemented in the LUTs of a slice]
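The RAM16X1D combines synchronous write and asynchronous read with an emulated dual port. Our understanding is that both LUTs in the slice are written with the same data at address A. One LUT then answers the read/write address (SPO) and the other answers DPRA (DPO). The Python class below sketches that behavior. Apart from the port names, every identifier is hypothetical.

```python
from typing import List, Optional


class RAM16X1D:
    """Sketch of distributed dual-port RAM built from two LUTs in one slice."""

    def __init__(self, init: Optional[List[int]] = None) -> None:
        # Contents are set at configuration time (all zeros if not given).
        contents = list(init) if init is not None else [0] * 16
        if len(contents) != 16:
            raise ValueError("RAM16X1D holds exactly 16 bits")
        self.lut_a = contents[:]  # read/write port: A -> SPO
        self.lut_b = contents[:]  # read-only port: DPRA -> DPO

    def write(self, a: int, d: int, we: int) -> None:
        """Synchronous write: call once per rising WCLK edge."""
        if we:
            self.lut_a[a & 0xF] = d & 1
            self.lut_b[a & 0xF] = d & 1  # both copies are written at address A

    def spo(self, a: int) -> int:
        """Asynchronous read on the read/write port."""
        return self.lut_a[a & 0xF]

    def dpo(self, dpra: int) -> int:
        """Asynchronous read on the read-only port."""
        return self.lut_b[dpra & 0xF]


ram = RAM16X1D()
ram.write(a=3, d=1, we=1)                  # one WCLK edge
print(ram.spo(3), ram.dpo(3), ram.dpo(4))  # -> 1 1 0
```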
Block SelectRAM Resources
• Up to 3.5 Mb of RAM in 18-kb blocks
– Synchronous read and write
• True dual-port memory
– Each port has synchronous read and write capability
– Different clocks for each port
• Supports initial values
• Synchronous reset on output latches
• Supports parity bits
– One parity bit per eight data bits
[Figure: 18-kb block SelectRAM memory with port A (DIA, DIPA, ADDRA, WEA, ENA, SSRA, CLKA, DOA, DOPA) and port B (DIB, DIPB, ADDRB, WEB, ENB, SSRB, CLKB, DOB, DOPB)]
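With one parity bit per eight data bits, an 18-bit-wide port carries 16 data bits on DI/DO and 2 parity bits on DIP/DOP. As we understand the Virtex-II block RAM, the parity bits are stored but not computed by the block, so user logic generates and checks them. The sketch below assumes even parity per byte and a hypothetical packing, with bit 0 of DIP covering the low byte.

```python
from typing import Tuple


def byte_parity(byte: int) -> int:
    """Even parity bit: 1 if the byte has an odd number of 1 bits."""
    return bin(byte & 0xFF).count("1") & 1


def encode_word(data16: int) -> Tuple[int, int]:
    """Split a 16-bit word into DI (16 data bits) and DIP (2 parity bits)."""
    data16 &= 0xFFFF
    dip = byte_parity(data16) | (byte_parity(data16 >> 8) << 1)
    return data16, dip


def check_word(do: int, dop: int) -> bool:
    """Recompute parity on read data (DO) and compare it with the stored DOP."""
    _, expected = encode_word(do)
    return expected == (dop & 0b11)


print(encode_word(0x0103))  # -> (259, 2)
```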
Dedicated Multiplier Blocks
• 18-bit two's complement signed operation
• Optimized to implement Multiply and Accumulate functions
• Multipliers are physically located next to block SelectRAM™ memory
[Figure: 18 x 18 multiplier with inputs Data_A (18 bits) and Data_B (18 bits) and a 36-bit output; supported signed configurations include 4 x 4, 8 x 8, 12 x 12, and 18 x 18]
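The multiplier treats both 18-bit inputs as two's complement numbers and produces a 36-bit two's complement product, which holds any pair of 18-bit operands (the largest magnitude, (-2^17) x (-2^17) = 2^34, fits). The Python sketch below emulates this on raw bit patterns and adds an accumulate step in plain integers. The function names are hypothetical, and in a real design the accumulator would live in general logic, not in the multiplier block.

```python
from typing import Iterable, Tuple


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mult18x18(data_a: int, data_b: int) -> int:
    """Raw 18-bit inputs in, raw 36-bit two's complement product out."""
    product = to_signed(data_a, 18) * to_signed(data_b, 18)
    return product & ((1 << 36) - 1)


def mac(pairs: Iterable[Tuple[int, int]]) -> int:
    """Multiply-accumulate: sum the signed products of raw 18-bit input pairs."""
    return sum(to_signed(mult18x18(a, b), 36) for a, b in pairs)


minus_three = (1 << 18) - 3  # 18-bit pattern for -3
print(to_signed(mult18x18(minus_three, 5), 36))  # -> -15
```

Smaller signed configurations such as 8 x 8 fit this model by sign-extending each operand to 18 bits before it reaches Data_A or Data_B.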
Xilinx Design Flow
[Figure: design flow steps: Plan & Budget; Create Code/Schematic; HDL RTL Simulation; Synthesize to create netlist; Functional Simulation; Implement (Translate, Map, Place & Route); Attain Timing Closure; Timing Simulation; Create BIT File]
Xilinx Implementation
• Once you generate a netlist, you can implement the design
• There are several outputs of implementation
– Reports
– Timing simulation netlists
– Floorplan files
– FPGA Editor files
– and more
[Figure: the Implement step, consisting of Translate, Map, and Place & Route]
What is Implementation?
• More than just Place & Route
• Implementation includes many phases
– Translate: Merge multiple design files into a single netlist
– Map: Group logical symbols from the netlist (gates) into physical components (slices and IOBs)
– Place & Route: Place components onto the chip, connect the components, and extract timing data into reports
• Each phase generates files that allow you to use other Xilinx tools
– Floorplanner, FPGA Editor, XPower
Project Summary
• Design Overview
• Device Utilization
• Performance and Constraints
• Reports
Map Reports
• Map Report contents
– Command line options for the map program
– Design summary
  • List of how many device resources are used
– Errors and warnings
– Removed logic summary
  • List of logic that was removed due to sourceless or loadless nets
– IOB properties
  • Indicates whether an I/O flip-flop is used
  • List of attributes on each I/O pin
• Post-Map Static Timing Report not covered here
Map Report Example
Release 4.1i - Map E.30
Xilinx Mapping Report File for Design 'top'
Design Information
------------------
Command Line   : map -p xc2v40-fg256-4 -cm area -k 4 -c 100 -tx off top.ngd
Target Device  : x2v40
Target Package : fg256
Target Speed   : -4
Mapper Version : virtex2 -- $Revision: 1.58 $
Mapped Date    : Tue Aug 21 09:42:20 2001
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Number of Slices: 182 out of 256 71%
Number of Slices containing unrelated logic: 0 out of 182 0%
Number of Slice Flip Flops: 170 out of 512 33%
Total Number 4 input LUTs: 248 out of 512 48%
  Number used as LUTs: 167
  Number used as a route-thru: 81
Number of bonded IOBs: 26 out of 88 29%
Number of GCLKs: 1 out of 16 6%
Total equivalent gate count for design: 3,475
Additional JTAG gate count for IOBs: 1,248
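The Design Summary lines follow a regular "name: used out of total pct%" layout, so utilization figures can be extracted for scripting. The regular expression below is a sketch tuned only to the lines in this example. Other ISE releases may format the report differently, and `parse_utilization` is a hypothetical name.

```python
import re
from typing import Dict, Tuple

# Matches lines such as "Number of Slices: 182 out of 256 71%".
UTIL_RE = re.compile(
    r"(?P<name>(?:Number of|Total Number)[^:\n]*):\s*(?P<used>[\d,]+)"
    r"\s+out of\s+(?P<total>[\d,]+)\s+(?P<pct>\d+)%"
)


def parse_utilization(report: str) -> Dict[str, Tuple[int, int, int]]:
    """Return {resource: (used, available, percent)} from a map Design Summary."""
    result: Dict[str, Tuple[int, int, int]] = {}
    for m in UTIL_RE.finditer(report):
        used = int(m.group("used").replace(",", ""))
        total = int(m.group("total").replace(",", ""))
        result[m.group("name").strip()] = (used, total, int(m.group("pct")))
    return result


sample = "Number of Slices: 182 out of 256 71%\nNumber of GCLKs: 1 out of 16 6%"
print(parse_utilization(sample)["Number of Slices"])  # -> (182, 256, 71)
```

Lines without an "out of" figure, such as the error count or the route-thru breakdown, are skipped.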
Place & Route Reports
• Place & Route Report contents
– Command line options for the par program
– Errors and warnings
– Device utilization summary
  • Similar to the Design Summary from the Map Report
– Unrouted nets
– Timing summary
  • Statistics on average routing delays
  • Performance versus constraints if the design contains timing constraints
Timing Reports
• Timing Report contents (for designs with constraints)
– Command line options for the trce program
– Timing Constraints section
  • Summary of each timing constraint
  • Details on paths that fail to meet constraints
– Data Sheet section
  • Setup/hold, clock to pad, timing between clock domains, and pad-to-pad delay information
  • Organized in easy-to-read table format
– Timing Summary section
  • Number of errors and Timing Score
  • Constraint coverage
Timing Report Example
Release 4.1i - Trace E.30
Copyright (c) 1995-2001 Xilinx, Inc. All rights reserved.
trce -e 3 -l 3 -xml top top.ncd -o top.twr top.pcf
Design file: top.ncd
Physical constraint file: top.pcf
Device,speed: xc2v40,-4 (ADVANCED 1.85 2001-07-24)
Report level: error report
--------------------------------------------------------------------------------
WARNING:Timing - No timing constraints found, doing default enumeration.
================================================================================
Timing constraint: Default period analysis
8292 items analyzed, 0 timing errors detected.
Minimum period is 8.852ns.
Maximum delay is 11.830ns.
--------------------------------------------------------------------------------
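The minimum period reported by trce converts directly to a maximum clock frequency, f_max = 1 / T_min. A one-line helper (hypothetical name) makes the ns-to-MHz conversion explicit:

```python
def fmax_mhz(min_period_ns: float) -> float:
    """Maximum clock frequency in MHz for a minimum period given in ns."""
    if min_period_ns <= 0:
        raise ValueError("period must be positive")
    return 1000.0 / min_period_ns  # 1 / (1 ns) = 1 GHz = 1000 MHz


print(f"{fmax_mhz(8.852):.1f} MHz")  # -> 113.0 MHz
```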
All constraints were met.
Data Sheet report:
-----------------
All values displayed in nanoseconds (ns)
Clock FiftyM_clk to Pad
---------------+------------+
| clk (edge) |
Destination Pad| to PAD |
---------------+------------+
EN | 10.035(R)|
half1 | 9.465(R)|
half2 | 9.166(F)|
half3 | 9.740(R)|
half4 | 9.174(F)|
---------------+------------+
Without Timing Constraints
• This design had no timing constraints or pin assignments entered when it was implemented
• Note the logical structure of the placement and pins
• Xilinx recommends that you compile your design at least once without timing constraints or pin assignments
• This design has a maximum system clock frequency of 50 MHz
With Timing Constraints
• This is the same design with three global timing constraints entered with the Constraints Editor
• It has a maximum system clock frequency of 60 MHz
• Note how most of the logic is placed closer to the edge of the device where the pins have been placed
Period Constraint
• In this example, the Period constraint optimizes all delay paths between flip-flops
• The Period constraint does NOT optimize delay paths from input pads to output pads (purely combinatorial), paths from input pads to flip-flops, or paths from flip-flops to output pads
[Figure: example circuit with input pads ADATA, CDATA, and BUS [7..0], output pads OUT1 and OUT2, clock CLK distributed through a BUFG, flip-flops FLOP1 to FLOP5, and combinatorial logic between them]
The Period Constraint
• A synchronous element is a flip-flop, latch, or synchronous RAM
• The Period constraint covers paths:
– Between synchronous elements that are clocked by the reference net
• Synchronous elements are grouped by the clock signal driving them. This is called forward propagation, and it enables constraining large pieces of logic with a single constraint
Offset Constraint
• In this example, the Offset constraint optimizes delay paths from input pads to flip-flops and paths from flip-flops to output pads
[Figure: the same example circuit, with the input-pad-to-flip-flop paths marked Offset In and the flip-flop-to-output-pad paths marked Offset Out]
The Offset Constraint
• The Offset constraint covers paths:
– From input pads to synchronous elements clocked by the reference net (Offset In)
– From synchronous elements clocked by the reference net to output pads (Offset Out)
• Note that this constraint does not cover paths:
– Between synchronous elements
– From pads to pads (purely combinatorial paths)
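The coverage rules for the Period and Offset constraints reduce to a lookup on a path's start and end points. The Python sketch below encodes only the rules stated on these slides and assumes every synchronous endpoint is clocked by the reference net. The enum and function names are hypothetical, and real timing tools apply many more rules (multiple clock domains, for example).

```python
from enum import Enum


class Endpoint(Enum):
    INPUT_PAD = "input pad"
    OUTPUT_PAD = "output pad"
    SYNC = "synchronous element"  # flip-flop, latch, or synchronous RAM


def covering_constraint(start: Endpoint, end: Endpoint) -> str:
    """Name the constraint that covers a path, per the rules on these slides."""
    if start is Endpoint.SYNC and end is Endpoint.SYNC:
        return "PERIOD"
    if start is Endpoint.INPUT_PAD and end is Endpoint.SYNC:
        return "OFFSET IN"
    if start is Endpoint.SYNC and end is Endpoint.OUTPUT_PAD:
        return "OFFSET OUT"
    if start is Endpoint.INPUT_PAD and end is Endpoint.OUTPUT_PAD:
        return "neither (pad-to-pad, purely combinatorial)"
    raise ValueError(f"not a valid path: {start.value} -> {end.value}")


print(covering_constraint(Endpoint.INPUT_PAD, Endpoint.SYNC))  # -> OFFSET IN
```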