Upload
105034412
View
246
Download
36
Embed Size (px)
Citation preview
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
1/40
This material exempt per Department of Commerce license exception TSU
Basic FPGAArchitecture
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
2/40
Objectives
After completing this module, you will be able to:
Identify the basic architectural resources of the Virtex-II FPGA
List the differences between the Virtex-II, Virtex-II Pro, Spartan-3, and Spartan-3E devices
List the new and enhanced features of the new Virtex-4 devicefamily
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
3/40
Basic Architecture 3
Outline
Overview
Slice Resources I/O Resources
Memory and Clocking
Spartan-3, Spartan-3E, andVirtex-II Pro Features
Virtex-4 Features
Summary Appendix
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
4/40
Basic Architecture 4
Overview
All Xilinx FPGAs contain the same basic resources
Slices (grouped into CLBs)
Contain combinatorial logic and register resources
IOBs
Interface between the FPGA and the outside world
Programmable interconnect
Other resources
Memory
Multipliers
Global clock buffers
Boundary scan logic
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
5/40
Basic Architecture 5
Outline
Overview
Slice Resources I/O Resources
Memory and Clocking
Spartan-3, Spartan-3E, andVirtex-II Pro Features
Virtex-4 Features
Summary Appendix
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
6/40
Basic Architecture 6
Slices and CLBs
Each Virtex-II CLBcontains four slices
Local routing providesfeedback between slices in
the same CLB, and it
provides routing toneighboring CLBs
A switch matrix provides
accessto general routing resources
CIN
SwitchMatrix
BUFTBUF T
COUTCOUT
Slice S0
Slice S1
Local Routing
Slice S2
Slice S3
CIN
SHIFT
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
7/40
Basic Architecture 7
Slice 0
LUTLUT CarryCarry
LUTLUT CarryCarryD Q
CE
PRE
CLR
DQCE
PRE
CLR
Simplified Slice Structure
Each slice has fouroutputs Two registered outputs,
two non-registered outputs
Two BUFTs associatedwith each CLB, accessible
by all 16 CLB outputs
Carry logic runs vertically,
up only Two independentcarry chains per CLB
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
8/40
Basic Architecture 8
Detailed Slice Structure
The next few slidesdiscuss the slice features
LUTs MUXF5, MUXF6,
MUXF7, MUXF8
(only the F5 and
F6 MUX are shownin this diagram)
Carry Logic
MULT_ANDs Sequential Elements
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
9/40
Basic Architecture 9
Combinatorial Logic
AB
CD
Z
Look-Up Tables
Combinatorial logic is stored in Look-Up Tables(LUTs)
Also called Function Generators (FGs) Capacity is limited by the number of inputs, not by the
complexity
Delay through the LUT is constant
A B C D Z
0 0 0 0 0
0 0 0 1 0
0 0 1 0 0
0 0 1 1 1
0 1 0 0 10 1 0 1 1
. . .
1 1 0 0 0
1 1 0 1 0
1 1 1 0 0
1 1 1 1 1
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
10/40
Basic Architecture 10
Connecting Look-Up Tables
F5
F8
F5
F6
CLB
Slice S3
Slice S2
Slice S0
Slice S1F5
F7
F5
F6
MUXF8 combines the two
MUXF7 outputs (from the
CLB above or below)
MUXF6 combines slices S2
and S3
MUXF7 combines the two
MUXF6 outputs
MUXF6 combines slices S0 and S1
MUXF5 combines LUTs in each slice
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
11/40
Basic Architecture 11
Fast Carry Logic
Simple, fast, and completearithmetic Logic
Dedicated XOR gate forsingle-level sum
completion
Uses dedicated routing
resources
All synthesis tools caninfer carry logic
COUT COUT
SLICE
S0
SLICE
S1
SecondCarry
Chain
To S0 of the
next CLBTo CIN of S2 of the next
CLB
First Carry
Chain
SLICE
S3
SLICES2
COUT
COUT
CIN
CIN
CIN CIN CLB
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
12/40
Basic Architecture 12
CO
DI CIS
LUT
CY_MUX
CY_XOR
MULT_AND
A
B
A x B
LUT
LUT
MULT_AND Gate
Highly efficient multiply and add implementation
Earlier FPGA architectures require two LUTs per bit to perform the
multiplication and addition The MULT_AND gate enables an area reduction by performing the
multiply and the add in one LUT per bit
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
13/40
Basic Architecture 13
DCE
PRE
CLR
Q
FDCPE
D
CE
S
R
Q
FDRSE
D
CE
PRE
CLR
Q
LDCPE
G
_1
Flexible Sequential Elements
Either flip-flops or latches
Two in each slice; eight in each
CLB Inputs come from LUTs or from an
independent CLB input
Separate set and reset controls Can be synchronous or asynchronous
All controls are shared within a
slice Control signals can be inverted locally
within a slice
Shift R i t LUT
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
14/40
Basic Architecture 14
Shift Register LUT
(SRL16CE) Dynamically addressable serialshift registers
Maximum delay of 16 clock cyclesper LUT (128 per CLB)
Cascadable to other LUTs or CLBsfor longer shift registers
Dedicated connection from Q15 toD input of the next SRL16CE
Shift register length canbe changed
asynchronously
by toggling address ALUT
D QCE
D QCE
D QCE
D QCE
LUT
DCE
CLK
A[3:0]
Q
Q15 (cascade out)
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
15/40
Basic Architecture 15
Shift Register LUT Example
The SRL can be used to create a No Operation (NOP)
This example uses 64 LUTs (8 CLBs) to replace 576 flip-flops (72 CLBs)
and associated routing and delays
12 Cycles
64
Operation A
4 Cycles4 Cycles 8 Cycles8 Cycles
Operation B
3 Cycles3 Cycles
Operation C
64
12 Cycles
Paths are Statically
Balanced
9 Cycles9 Cycles
Operation D -NOP
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
16/40
Basic Architecture 16
Outline
Overview
Slice Resources I/O Resources
Memory and Clocking
Spartan-3, Spartan-3E, andVirtex-II Pro Features
Virtex-4 Features
Summary Appendix
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
17/40
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
18/40
Basic Architecture 18
SelectIO Standard
Allows direct connections to external signals of varied voltagesand thresholds Optimizes the speed/noise tradeoff
Saves having to place interface components onto your board
Differential signaling standards LVDS, BLVDS, ULVDS
LDT LVPECL
Single-ended I/O standards
LVTTL, LVCMOS (3.3V, 2.5V, 1.8V, and 1.5V) PCI-X at 133 MHz, PCI (3.3V at 33 MHz and 66 MHz)
GTL, GTLP
and more!
Digital Controlled
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
19/40
Basic Architecture 19
Digital Controlled
Impedance (DCI) DCI provides Output drivers that match the impedance of the traces
On-chip termination for receivers and transmitters DCI advantages
Improves signal integrity by eliminating stub reflections
Reduces board routing complexity and component count by eliminatingexternal resistors
Eliminates the effects of temperature, voltage, and process variations byusing an internal feedback circuit
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
20/40
Basic Architecture 20
Outline
Overview
Slice Resources I/O Resources
Memory and Clocking
Spartan-3, Spartan-3E, andVirtex-II Pro Features
Virtex-4 Features
Summary Appendix
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
21/40
Basic Architecture 21
Other Virtex-II Features
Distributed RAM and block RAM
Distributed RAM uses the CLB resources (1 LUT = 16 RAM bits)
Block RAM is a dedicated resources on the device (18-kb blocks) Dedicated 18 x 18 multipliers next to block RAMs
Clock management resources
Sixteen dedicated global clock multiplexers Digital Clock Managers (DCMs)
Distributed SelectRAM
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
22/40
Basic Architecture 22
Distributed SelectRAM
Resources Uses a LUT in a slice as memory
Synchronous write
Asynchronous read Accompanying flip-flops
can be used to createsynchronous read
RAM and ROM are initializedduringconfiguration
Data can be written to RAMafter configuration
Emulated dual-port RAM One read/write port
One read-only port
RAM16X1S
O
D
WE
WCLK
A0
A1
A2
A3
LUTLUT
RAM32X1S
O
D
WE
WCLK
A0
A1
A2
A3
A4
RAM16X1D
SPO
D
WE
WCLK
A0
A1
A2
A3
DPRA0 DPO
DPRA1
DPRA2
DPRA3
Slice
LUT
LUT
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
23/40
Basic Architecture 23
Block SelectRAM Resources
Up to 3.5 Mb of RAM in 18-kbblocks
Synchronous read and write True dual-port memory
Each port has synchronous readand write capability
Different clocks for each port
Supports initial values
Synchronous reset on output
latches
Supports parity bits
One parity bit per eight data bits
DIA
DIPAADDRA
WEA
ENA
SSRA
CLKA
DIB
DIPB
WEB
ADDRB
ENB
SSRB
DOA
CLKB
DOPA
DOPB
DOB
18-kb block SelectRAM memory
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
24/40
Basic Architecture 24
Dedicated Multiplier Blocks
18-bit twos complement signed operation
Optimized to implement Multiply and Accumulate functions
Multipliers are physically located next to block SelectRAMmemory
18 x 18
Multiplier
18 x 18Multiplier
Output(36 bits)
Data_A(18 bits)
Data_B(18 bits)
4 x 4 signed
8 x 8 signed
12 x 12 signed
18 x 18 signed
Global Clock Routing
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
25/40
Basic Architecture 25
Global Clock Routing
Resources Sixteen dedicated global clock multiplexers Eight on the top-center of the die, eight on the bottom-center
Driven by a clock input pad, a DCM, or local routing Global clock multiplexers provide the following:
Traditional clock buffer (BUFG) function
Global clock enable capability (BUFGCE) Glitch-free switching between clock signals (BUFGMUX)
Up to eight clock nets can be used in each clock region of thedevice
Each device contains four or more clock regions
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
26/40
Basic Architecture 26
Digital Clock Manager (DCM)
Up to twelve DCMs per device
Located on the top and bottom edges of the die
Driven by clock input pads DCMs provide the following:
Delay-Locked Loop (DLL)
Digital Frequency Synthesizer (DFS) Digital Phase Shifter (DPS)
Up to four outputs of each DCM can drive onto global clock buffers
All DCM outputs can drive general routing
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
27/40
Basic Architecture 27
Outline
Overview
Slice Resources I/O Resources
Memory and Clocking
Spartan-3, Spartan-3E,and Virtex-II Pro Features
Virtex-4 Features
Summary Appendix
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
28/40
Basic Architecture 28
Spartan-3 versus Virtex-II
Lower cost
Smaller process = lower corevoltage
.09 micron versus .15 micron
Vccint = 1.2V versus 1.5V
Different I/O standard support
New standards: 1.2V LVCMOS,1.8V HSTL, and SSTL
Default is LVCMOS, versusLVTTL
More I/O pins per package
Only one-half of the slices
support RAM or SRL16s(SLICEM)
Fewer block RAMs and
multiplier blocks Same size and functionality
Eight global clock multiplexers
Two or four DCM blocks No internal 3-state buffers
3-state buffers are in the I/O
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
29/40
Basic Architecture 29
SLICEM and SLICEL
Each Spartan-3 CLBcontains four slices
Similar to the Virtex-II Slices are grouped in pairs
Left-hand SLICEM (Memory)
LUTs can be configured asmemory or SRL16
Right-hand SLICEL (Logic)
LUT can be used as logic
only
CIN
Switch
Matrix
COUTCOUT
Slice X0Y0
Slice X0Y1
Fast Connects
Slice X1Y0
Slice X1Y1
CIN
SHIFTIN
Left-Hand SLICEM Right-Hand SLICEL
SHIFTOUT
S 3
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
30/40
Basic Architecture 30
Spartan-3E Features
More gates per I/O thanSpartan-3
Removed some I/O standards
Higher-drive LVCMOS
GTL, GTLP
SSTL2_II
HSTL_II_18, HSTL_I, HSTL_III
LVDS_EXT, ULVDS
DDR Cascade Internal data is presented on a
single clock edge
16 BUFGMUXes on left andright sides
Drive half the chip only In addition to eight global clocks
Pipelined multipliers
Additional configurationmodes
SPI, BPI
Multi-Boot mode
Vi t II P F t
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
31/40
Basic Architecture 31
Virtex-II Pro Features
0.13 micron process
Up to 24 RocketIOMulti-Gigabit Transceiver (MGT) blocks
Serializer and deserializer (SERDES) Fibre Channel, Gigabit Ethernet, XAUI, Infiniband compliant transceivers,
and others
8-, 16-, and 32-bit selectable FPGA interface
8B/10B encoder and decoder
PowerPCRISC processor blocks
Thirty-two 32-bit General Purpose Registers (GPRs)
Low power consumption: 0.9mW/MHz
IBM CoreConnect bus architecture support
O tli
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
32/40
Basic Architecture 32
Outline
Overview
Slice Resources I/O Resources
Memory and Clocking
Spartan-3, Spartan-3E, andVirtex-II Pro Features
Virtex-4 Features
Summary Appendix
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
33/40
Choose the Platform that Best
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
34/40
Basic Architecture 34
Fits the Application
ResourceResource
14K14K200K LCs200K LCsLogic
Memory
DCMs
DSP Slices
SelectIO
RocketIO
PowerPC
Ethernet MAC
LXLX FXFX SXSX
0.90.96 Mb6 Mb
441212
32329696
240240960960
23K23K55K LCs55K LCs
2.32.35.7 Mb5.7 Mb
4488
128128512512
320320640640
12K12K140K LCs140K LCs
0.60.610 Mb10 Mb
442020
3232192192
240240896896
0024 Channels24 Channels
1 or 2 Cores1 or 2 Cores
2 or 4 Cores2 or 4 Cores
N/A
N/A
N/A
N/A
N/A
N/A
O tli
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
35/40
Basic Architecture 35
Outline
Overview
Slice Resources I/O Resources
Memory and Clocking
Spartan-3, Spartan-3E, andVirtex-II Pro Features
Virtex-4 Features
Summary Appendix
R i Q ti
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
36/40
Basic Architecture 36
Review Questions
List the primary slice features
List the three ways a LUT can be configured
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
37/40
Summary
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
38/40
Basic Architecture 38
Summary
Slices contain LUTs, registers, and carry logic
LUTs are connected with dedicated multiplexers and carry logic
LUTs can be configured as shift registers or memory IOBs contain DDR registers
SelectIO standards and DCI enable direct connection to multiple
I/O standards while reducing component count Virtex-II memory resources include the following:
Distributed SelectRAM resources and distributed SelectROM (usesCLBLUTs)
18-kb block SelectRAM resources
Summary
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
39/40
Basic Architecture 39
Summary
The Virtex-II devices contain dedicated 18x18multipliers next to each block SelectRAM
resource
Digital clock managers provide the following:
Delay-Locked Loop (DLL)
Digital Frequency Synthesizer (DFS)
Digital Phase Shifter (DPS)
Where Can I Learn More?
8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8
40/40
Basic Architecture 40
Where Can I Learn More?
User Guides
www.xilinx.com DocumentationUser Guides
Application Notes
www.xilinx.com DocumentationApplication Notes
Education resources
Designing with the Virtex-4 Family course
Spartan-3E Architecture free Recorded e-Learning