Upload
samuel-challenger
View
216
Download
0
Tags:
Embed Size (px)
Citation preview
© 2011 Xilinx, Inc. All Rights ReservedThis material exempt per Department of Commerce license exception TSU
Basic FPGA Architectures
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Objectives
After completing this module, you will be able to:• Describe the basic slice resources available in Spartan-6 FPGAs• Identify the basic I/O resources available in Spartan-6 FPGAs• List some of the dedicated hardware features of Spartan-6 FPGAs• Differentiate the Virtex-6 family of devices from the Spartan-6
family• Identify latest members of Virtex-7 device family
Basic Architecture 2
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Basic Architecture 3
Outline
• Overview• Logic Resources• I/O Resources• Memory and DSP48• Clocking Resources• Latest Families
– Virtex-6 Family– Virtex-7 Family
• Summary
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Basic Architecture 4
Overview• All Xilinx FPGAs contain the same basic resources
– Logic Resources• Slices (grouped into CLBs)
– Contain combinatorial logic and register resources• Memory• Multipliers
– Interconnect Resources• Programmable interconnect • IOBs
– Interface between the FPGA and the outside world– Other resources
• Global clock buffers• Boundary scan logic
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Spartan-6 FPGA
Basic Architecture 5
Block RAM
DSP48
CMT
BUFG
BUFIO
I/OMemory Controller
MGT
PCIe Endpoint
CLB
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Spartan-6 Lowest Total Power
• 45 nm technology• Static power reductions
– Process & architectural innovations• Dynamic power reduction
– Lower node capacitance & architectural innovations• More hard IP functionality
– Integrated transceivers & other logic reduces power – Hard IP uses less current & power than soft IP
• Lower IO power• Low power option -1L reduces power even further• Fewer supply rails reduces power • Two families: LX and LXT
Basic Architecture 6
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Spartan-6 LX / LXT FPGAs
** All memory controller support x16 interface, except in CS225 package where x8 only is supported
Basic Architecture 7
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Basic Architecture 8
Outline
• Overview• Logic Resources• I/O Resources• Memory and DSP48• Clocking Resources• Latest Families
– Virtex-6 Family– Virtex-7 Family
• Summary
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Spartan-6 FPGA CLB
• CLB contains two slices• Connected to switch matrix
for routing to otherFPGA resources
• Carry chain runs vertically through Slice0 only
SwitchM
atrix
Slice0
Slice1
CIN
COUT
Basic Architecture 9
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Three Types of Slices in Spartan-6 FPGAs
• SLICEM: Full slice– LUT can be used for logic and
memory/SRL– Has wide multiplexers and carry chain
• SLICEL: Logic and arithmetic only– LUT can only be used for logic (not
memory)– Has wide multiplexers and carry chain
• SLICEX: Logic only– LUT can only be used for logic (not
memory)– No wide multiplexers or carry chain
SLICEXSLICEX
SLICEMSLICEM
SLICEXSLICEX
SLICELSLICEL
oror
Basic Architecture 10
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Spartan-6 CLB Logic Slices
LUT6 8 Registers Carry Logic Wide Function Muxes Distributed RAM / SRL logic
SliceM (25%) SliceL (25%) SliceX (50%)
LUT6 8 Registers Carry Logic Wide Function Muxes
LUT6 Optimized for Logic 8 Registers
Slice mix chosen for the optimal balance of Cost, Power & Performance
Basic Architecture 11
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Spartan-6 FPGA SLICE
• Four LUTs• Eight storage elements
– Four flip-flop/latches– Four flip-flops
• F7MUX and F8MUX– Connects LUT outputs to create wide
functions– Output can drive the flip-flop/latches
• Carry chain (Slice0 only)– Connected to the LUTs and the four
flip-flop/latches
LUT/RAM/SRL
LUT/RAM/SRL
LUT/RAM/SRL
LUT/RAM/SRL
0 1
Basic Architecture 12
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
6-Input LUT with Dual Output
• 6-input LUT can be two 5-input LUTs with common inputs– Minimal speed impact to
a 6-input LUT– One or two outputs– Any function of six variables or
two independent functions of five variables
5-LUTD
A5A4A3A2A1
5-LUTD
A5A4A3A2A1
A6
A5A4A3A2A1
O6
O5
6-LUT
Basic Architecture 13
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Slice Flip-Flop and Flip-Flop/Latch Control
• All flip-flops and flip-flop/latches share the same CLK, SR, and CE signals– This is referred to as the “control set” of the flip-
flops– CE and SR are active high– CLK can be inverted at the slice boundary
• Set/Reset (SR) signal can be configured as synchronous or asynchronous– All four flip-flop/latches are configured the same– All four flip-flops are configured the same
• SR will cause the flip-flop to be set to the state specified by the SRINIT attribute
D
CE
SR
Q
CK
D
CE
SR
Q
CK
D
CE
SR
Q
CK
D
CE
SR
Q
CK
DFF/LATCH
AFF/LATCHAFF
DFF
D
● ●
●
● ●
●
Basic Architecture 14
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Configuring LUTs as a Shift Register (SRL)
D QCE
D QCE
D QCE
D QCE
LUTD
CECLK
A[4:0]
Q
Q31 (cascade out)
LUT
Basic Architecture 15
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Shift Register LUT Example
• Operation D - NOP must add 17 pipeline stages of 64 bits each– 1,088 flip-flops (hence 136 slices) or– 64 SRLs (hence 16 slices)
20 Cycles
64Operation A
8 Cycles8 Cycles 12 Cycles12 Cycles
Operation B
3 Cycles3 Cycles
Operation C
64
20 Cycles
Paths are StaticallyBalanced
17 Cycles17 Cycles
Operation D - NOP
Basic Architecture 16
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Basic Architecture 17
Outline
• Overview• Logic Resources• I/O Resources• Memory and DSP48• Clocking Resources• Latest Families
– Virtex-6 Family– Virtex-7 Family
• Summary
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
I/O Block DiagramElectrical Resources
N
P
LVDS
Termination
Logical ResourcesIn
terc
onne
ct to
FPG
A fa
bric Master
IOSERDES
IOD
ELAY
IOLOGIC
Slave
IOSERDES
IOD
ELAY
IOLOGIC
Basic Architecture 18
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Spartan-6 FPGA Supports 40+ Standards
• Each input can be 3.3 V compatible• LVCMOS (3.3 V, 2.5 V, 1.8 V, 1.5 V, and 1.2 V)• LVCMOS_JEDEC• LVPECL (3.3 V, 2.5 V)• PCI• I2C*• HSTL (1.8 V, 1.5 V; Classes I, II, III, IV)
– DIFF_HSTL_I, DIFF_HSTL_I_18– DIFF_HSTL_II*
• SSTL (2.5 V, 1.8 V; Classes I, II)– DIFF_SSTL_I, DIFF_SSTL18_I– DIFF_SSTL_II*
• LVDS, Bus LVDS• RSDS_25 (point-to-point)
Easier and More
Flexible I/O Design!
Easier and More
Flexible I/O Design!
* Newly added standards
Basic Architecture 19
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Spartan-6 FPGA I/O Bank Structure
• All I/Os are on the edges of the chip• I/Os are grouped into banks
– 30 ~ 83 I/O per banks– Eight clock pins per edge– Common VCCO, VREF
• Restricts mixture of standards in one bank
• The differential driver is only available in Bank0 and Bank2– Differential receiver is available in all banks– On-chip termination is available in all banks
BANK 0
BANK 2
BANK 4 BANK 5
BANK 3 BANK 1
BANK 0
BANK 2
BANK 3 BANK 1
Chip View(LX45/T and Smaller)
Chip View(LX100/T and Larger)
Basic Architecture 20
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
I/O Logical Resources
• Two IOLOGIC block per I/O pair– Master and slave– Can operate independently or
concatenated
• Each IOLOGIC contains– IOSERDES
• Parallel to serial converter (serializer)• Serial to parallel converter
(De-serializer)– IODELAY
• Selectable fine-grained delay– SDR and DDR resources
Inte
rcon
nect
to F
PGA
Fabr
ic
Master
IOSERDES
IOD
ELAY
IOLOGIC
Slave
IOSERDES
IOD
ELAY
IOLOGIC
Basic Architecture 21
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Basic Architecture 22
Outline
• Overview• Logic Resources• I/O Resources• Memory and DSP48• Clocking Resources• Latest Families
– Virtex-6 Family– Virtex-7 Family
• Summary
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
SLICEM Used as Distributed SelectRAM Memory
• Uses the same storage that is used for the look-up table function
• Synchronous write, asynchronous read– Can be converted to synchronous read using
the flip-flops available in the slice• Various configurations
– Single port • One LUT6 = 64x1 or 32x2 RAM• Cascadable up to 256x1 RAM
– Dual port (D)• 1 read / write port + 1 read-only port
– Simple dual port (SDP)• 1 write-only port + 1 read-only port
– Quad-port (Q)• 1 read / write port + 3 read-only ports
SinglePort
DualPort
SimpleDual Port
QuadPort
32x2 32x4 32x6 32x8 64x1 64x2 64x3 64x4
128x1128x2 256x1
32x2D 32x4D 64x1D 64x2D
128x1D
32x6SDP 64x3SDP
32x2Q 64x1Q
Each port has independent address inputs
Basic Architecture 23
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Spartan-6 FPGA Block RAM Features
• 18 kb size– Can be split into two independent 9-kb
memories
• Performance up to 300 MHz• Multiple configuration options
– True dual-port, simple dual-port, single-port
• Two independent ports access common data– Individual address, clock, write enable, clock enable– Independent widths for each port
• Byte-write enable
Dual-Port BRAM
Dual-Port BRAM
18k Memory
Basic Architecture 24
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Better, More BRAM
• More Block RAMs– 2x higher BRAM to Logic Cell ratio than Spartan-3A platform
• More port flexibility – 18K can be split into two 9K BRAM blocks and can be
independently addressed
• Improves buffering, caching & data storage– Excellent for embedded processing, communication protocols– Enables DSP blocks to provide more efficient video and
surveillance algorithms
• Lower Static Power
OR 9K BRAM
9K BRAM
18K BRAM
Basic Architecture 25
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Memory Controller
• Only low cost FPGA with a “hard” memory controller
• Guaranteed memory interface performance providing– Reduced engineering & board design time– DDR, DDR2, DDR3 & LP DDR support– Up to 12.8Mbps bandwidth for each memory controller
• Automatic calibration features
• Multiport structure for user interface– Six 32-bit programmable ports from fabric– Controller interface to 4, 8 or 16 bit memories devices
DRAM
SRAM
FLASH
EEPROM
DRAM
SRAM
FLASH
EEPROM
Spartan-6Spartan-6
DRAMDDRDDR2DDR3LP DDR
DRAMDRAMDDRDDR2DDR3LP DDR
Basic Architecture 26
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Spartan-6 Hard Memory Controller
• New Hard Block Memory Controller– Up to 4 controllers per device
• Why a Hard Memory Block?– Very common design component– Multiple customer benefits
Customer Requests Spartan-6 Hard Block Memory Controller Benefits
Higher performance • Up to 800 Mbps
Lower cost • Saves soft logic, smaller die
Lower power • Dedicated logic
Easier designs• Timing closure no longer an issue• Configurable MultiPort user interface • CoreGen/MIG wizard & EDK support
Basic Architecture 27
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Spartan-6 FPGA DSP48A1 Slice
PCIN
D:A:B
X
M
C
18 X 18
P
Z
+/-0
0
C
48
OPM
OD
E[3:
0]
OPM
OD
E[5]
OPM
OD
E[7]
PCO
UT
OPM
OD
E[6,
4]
P
BC
IN
D
A 48
18
B
CC
OU
T
CFOUT
BC
OU
T
CIN
12
18
18
18
Dual B, DRegister
WithPre-adder
18
18
48
18
36
18
MFOUT
48
36
18x18 signed multiplier 48-bit add/subtract/accumulatePipeline registers for high speedCascade paths for wide functionsPre-adder
18x18 signed multiplier 48-bit add/subtract/accumulatePipeline registers for high speedCascade paths for wide functionsPre-adder
A0 A1
Basic Architecture 28
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Basic Architecture 29
Outline
• Overview• Logic Resources• I/O Resources• Memory and DSP48• Clock Resources• Latest Families
– Virtex-6 Family– Virtex-7 Family
• Summary
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Spartan-6 FPGA Global Clock Network
• 16 global clock buffers in the Spartan-6 FPGA allow clocks to be distributed to potentially every clocked element on the die
• 16 HCLK lines connect clock signals to logic resources in each row
• HCLK lines can be driven by– Global clock buffers– DCM outputs– PLL outputs
Basic Architecture 30
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Spartan-6 FPGA I/O Clock Network
• Special clock network dedicated to I/O logical resources– Independent of global clock resources– Speeds up to 1 GHz
• Multiple sources for clocking I/O logic– BUFIO2: for high-speed dedicated I/O clock signals– BUFPLL: for clocks driven by the PLL in the CMT
P N
CMT PLL
IO bank
IOLOGIC
P NBUFIO2BUFIO2
BUFPLLBUFPLLP N P N
IOLOGIC IOLOGIC IOLOGIC
Basic Architecture 31
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Spartan-6 FPGA Clock Management Tile (CMT)
dcm1_clkout<9:0>
dcm2_clkout<9:0>
10
PLL
pll_clkout<5:0>6
CLKIN
CLKFB
CLKOUT<5:0>
DCM
10CLKIN
CLKFB
CLKIN
CLKFB DCM
Clocks from BUFG
CLKOUT<9:0>
CLKOUT<9:0>
GCLK Inputs
Feedback clocks from BUFIO2FB
Basic Architecture 32
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Basic Architecture 33
Outline
• Overview• Slice Resources• I/O Resources• Memory and DSP48 • Clocking Resources• Latest Families
– Virtex-6 Family– Virtex-7 Family
• Summary
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Designers Eccentrics
• Higher System Performance – More design margin to simplify designs– Higher integrated functionality
• Lower System Cost– Reduce BOM– Implement design in a smaller device & lower speed-grade
• Lower Power– Help meet power budgets– Eliminate heat sinks & fans – Prevent thermal runaway
Basic Architecture 34
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Architecture AlignmentVirtex-6 FPGAs Spartan-6 FPGAs
150K Logic Cell
Device
760K Logic Cell
Device
Common Resources
*Optimized for target application in each family
3.3 Volt compatible I/O
Hardened Memory Controllers
LUT-6 CLB
DSP Slices
BlockRAM
HSS Transceivers*
Parallel I/O FIFO Logic
System Monitor
Tri-mode EMAC
PCIe® Interface
Enables IP Portability, Protects Design Investments
High-performance Clocking
Basic Architecture 35
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Virtex-6 and Spartan-6 FPGA Sub-Families
Spartan-6LX FPGA
Spartan-6LXT FPGA
Virtex-6 LXT FPGA
Virtex-6SXT FPGA
Virtex-6HXT FPGA
• Lowest Cost Logic • Lowest Cost Logic • Low-Cost Serial Connectivity
• High Logic Density • High-Speed Serial
Connectivity
• High Logic Density• High-Speed Serial
Connectivity• Enhanced DSP
• High Logic Density • Ultra High-Speed Serial
Connectivity
LogicBlock RAMDSPParallel I/OSerial I/O
Virtex-6 CXT FPGA
• Upto 3.75Gbps serial connectivity and corresponding logic performance
Basic Architecture 36
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Basic Architecture 37
Outline
• Overview• Slice Resources• I/O Resources• Memory and DSP48 • Clocking Resources• Latest Families
– Virtex-6 Family– Virtex-7 Family
• Summary
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use OnlyVirtex-6 Base Platform 38
Virtex® Product & Process Evolution
Delivering Balanced Performance, Power, and Cost
Virtex
Virtex-E
Virtex-II
Virtex-II Pro
Virtex-4
Virtex-5
1st Generation1st Generation 2nd Generation2nd Generation 3rd Generation3rd Generation 4th Generation4th Generation 5th Generation5th Generation 6th Generation6th Generation
220-nm
180-nm
150-nm
40-nm
65-nm
90-nm
130-nm
Virtex-6
Basic Architecture 38
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
• Static Power Reduction– Higher distribution of low leakage transistors
• Dynamic Power Reduction– Reduced capacitance through device shrink
• Reduced Core Voltage Devices Lower Overall Power– VCCINT = 0.9V option allows power / performance tradeoff
• I/O Power Improvements– Dynamic termination
• System Monitor– Allows sophisticated monitoring of temperature and voltage
Strong Focus on Power Reduction
Up to 50% Power Reduction vs. Previous GenerationBasic Architecture 39
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Virtex-6 Logic Fabric
• Virtex-6 Configurable Logic Block (CLB)– Each CLB contains two slices– Each slice contains four 6-input Lookup Tables (6LUT)
• Slices implement logic functions (slice_l) • Slices for memories and shift registers (slice_m)• LUT6 implements
– All functions of up to 6 variables– Two functions of up to 5 or less variables each– Shift registers up to 32 stages long– Memories of 64 bits
• Multiple configurations within a slice
Power Consumption Benefits Performance Benefits Cost Benefits• Shift register mode greatly reduces power consumption over FF implementation
• Increased ratio of slice_m – memories available closer to the source or target logic
• Can pack logic and memory functions more efficiently
CLBCLB
Slice
LUT
LUT
LUT
LUT
Slice
LUT
LUT
LUT
LUT
LUT
LUT
LUT
LUT
Slice
LUT
LUT
LUT
LUT
Slice
LUT
LUT
LUT
LUT
LUT
LUT
LUT
LUT
Basic Architecture 40
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Higher DSP Performance
• Most advanced DSP architecture– New optional pre-adder for symmetric filters– 25x18 multiplier
• High resolution filters• Efficient floating point support
– ALU-like second stage enables mapping of advanced operations
• Programmable op-code• SIMD support• Addition / Subtraction / Logic functions
– Pattern detector
• Lowest power consumption
• Highest DSP slice capacity– Up to 2K DSP Slices
Basic Architecture 41
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Basic Architecture 42
Outline
• Overview• Slice Resources• I/O Resources• Memory and DSP48• Clocking Resources• Latest Families
– Virtex-6 Family– Virtex-7 Family
• Summary
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Power, Performance and ProductivityDrive Market Trends
Lower PowerLegislation and Regulations
Higher PerformanceSystem Capacity and Performance
Improved ProductivityReduce Capital and Operating Expenses (OPEX, CAPEX)
Flat panel/TV, Central Office, Server Farms, Portable Medical, Portable Consumer
Wired Infrastructure, Wireless, Broadcast, 300G+ Networks, Aerospace and Defense, High Performance Computing
All Market Segments
#1 Customer Problem: Lower Power enables better Cost, Performance, and Capability
Basic Architecture 43
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
The Unified Architecture Advantage
• Common elements enable easy IP reuse for quickdesign portability across all 7 series families– Design scalability from low-cost to high-performance– Expanded eco-system support– Quickest TTM
Precise, Low Jitter ClockingMMCMs
Logic FabricLUT-6 CLB
DSP Engines DSP48E1 Slices
On-Chip Memory36Kbit/18Kbit Block RAM
Enhanced ConnectivityPCIe® Interface Blocks
Hi-perf. Parallel I/O ConnectivitySelectIO™ Technology
Artix™-7 FPGA
Kintex™-7 FPGA
Virtex®-7 FPGA
Hi-performance Serial I//O ConnectivityTransceiver Technology
Basic Architecture 44
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
The Xilinx 7 Series FPGAsIndustry’s First Unified Architecture
• Industry’s Lowest Power and First Unified Architecture – Spanning Low-Cost to Ultra High-End applications
• Three new device families with breakthrough innovations in power efficiency, performance-capacity and price-performance
Page 45
Basic Architecture 45
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Virtex-7 Sub-Families
• The Virtex-7 family has several sub-families– Virtex-7: General logic– Virtex-7XT: Rich DSP and block RAM– Virtex-7HT: Highest serial bandwidth
Virtex-7 XT FPGAVirtex-7 FPGA Virtex-7 HT FPGA
• High Logic Density • High-Speed Serial
Connectivity
• High Logic Density • Ultra High-Speed Serial
Connectivity
• High Logic Density• High-Speed Serial
Connectivity• Enhanced DSP
LogicBlock RAMDSPParallel I/OSerial I/O
Basic Architecture 46
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Basic Architecture 47
Outline
• Overview• Logic Resources• I/O Resources• Memory and DSP48• Clocking Resources• Latest Families
– Virtex-6 Family– Virtex-7 Family
• Summary
© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only
Summary• The Spartan-6 FPGA slices contain four 6-input LUTs, eight registers, and carry
logic– LUTs can perform any combinatorial function of up to six inputs– LUTs are connected with dedicated multiplexers and carry logic– Some LUTs can be configured as shift registers or memories
• The Spartan-6 FPGA IOBs contain DDR registers as well as SERDES resources
• The SelectIO™ interfaces enable direct connection to multiple I/O standards• The Spartan-6 FPGA includes dedicated block RAM and DSP slice resources• The Spartan-6 FPGA includes dedicated DCMs, PLLs, and routing resources to
improve your system clock performance and generation capability• Latest introduced families are architected for power efficiencies
– Consists of Artix, Kintex, and Virtex devices
Basic Architecture 48