48
© 2011 Xilinx, Inc. All Rights Reserved This material exempt per Department of Commerce license exception TSU Basic FPGA Architectures

© 2011 Xilinx, Inc. All Rights Reserved This material exempt per Department of Commerce license exception TSU Basic FPGA Architectures

Embed Size (px)

Citation preview

© 2011 Xilinx, Inc. All Rights ReservedThis material exempt per Department of Commerce license exception TSU

Basic FPGA Architectures

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Objectives

After completing this module, you will be able to:• Describe the basic slice resources available in Spartan-6 FPGAs• Identify the basic I/O resources available in Spartan-6 FPGAs• List some of the dedicated hardware features of Spartan-6 FPGAs• Differentiate the Virtex-6 family of devices from the Spartan-6

family• Identify latest members of Virtex-7 device family

Basic Architecture 2

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Basic Architecture 3

Outline

• Overview• Logic Resources• I/O Resources• Memory and DSP48• Clocking Resources• Latest Families

– Virtex-6 Family– Virtex-7 Family

• Summary

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Basic Architecture 4

Overview• All Xilinx FPGAs contain the same basic resources

– Logic Resources• Slices (grouped into CLBs)

– Contain combinatorial logic and register resources• Memory• Multipliers

– Interconnect Resources• Programmable interconnect • IOBs

– Interface between the FPGA and the outside world– Other resources

• Global clock buffers• Boundary scan logic

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Spartan-6 FPGA

Basic Architecture 5

Block RAM

DSP48

CMT

BUFG

BUFIO

I/OMemory Controller

MGT

PCIe Endpoint

CLB

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Spartan-6 Lowest Total Power

• 45 nm technology• Static power reductions

– Process & architectural innovations• Dynamic power reduction

– Lower node capacitance & architectural innovations• More hard IP functionality

– Integrated transceivers & other logic reduces power – Hard IP uses less current & power than soft IP

• Lower IO power• Low power option -1L reduces power even further• Fewer supply rails reduces power • Two families: LX and LXT

Basic Architecture 6

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Spartan-6 LX / LXT FPGAs

** All memory controller support x16 interface, except in CS225 package where x8 only is supported

Basic Architecture 7

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Basic Architecture 8

Outline

• Overview• Logic Resources• I/O Resources• Memory and DSP48• Clocking Resources• Latest Families

– Virtex-6 Family– Virtex-7 Family

• Summary

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Spartan-6 FPGA CLB

• CLB contains two slices• Connected to switch matrix

for routing to otherFPGA resources

• Carry chain runs vertically through Slice0 only

SwitchM

atrix

Slice0

Slice1

CIN

COUT

Basic Architecture 9

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Three Types of Slices in Spartan-6 FPGAs

• SLICEM: Full slice– LUT can be used for logic and

memory/SRL– Has wide multiplexers and carry chain

• SLICEL: Logic and arithmetic only– LUT can only be used for logic (not

memory)– Has wide multiplexers and carry chain

• SLICEX: Logic only– LUT can only be used for logic (not

memory)– No wide multiplexers or carry chain

SLICEXSLICEX

SLICEMSLICEM

SLICEXSLICEX

SLICELSLICEL

oror

Basic Architecture 10

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Spartan-6 CLB Logic Slices

LUT6 8 Registers Carry Logic Wide Function Muxes Distributed RAM / SRL logic

SliceM (25%) SliceL (25%) SliceX (50%)

LUT6 8 Registers Carry Logic Wide Function Muxes

LUT6 Optimized for Logic 8 Registers

Slice mix chosen for the optimal balance of Cost, Power & Performance

Basic Architecture 11

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Spartan-6 FPGA SLICE

• Four LUTs• Eight storage elements

– Four flip-flop/latches– Four flip-flops

• F7MUX and F8MUX– Connects LUT outputs to create wide

functions– Output can drive the flip-flop/latches

• Carry chain (Slice0 only)– Connected to the LUTs and the four

flip-flop/latches

LUT/RAM/SRL

LUT/RAM/SRL

LUT/RAM/SRL

LUT/RAM/SRL

0 1

Basic Architecture 12

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

6-Input LUT with Dual Output

• 6-input LUT can be two 5-input LUTs with common inputs– Minimal speed impact to

a 6-input LUT– One or two outputs– Any function of six variables or

two independent functions of five variables

5-LUTD

A5A4A3A2A1

5-LUTD

A5A4A3A2A1

A6

A5A4A3A2A1

O6

O5

6-LUT

Basic Architecture 13

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Slice Flip-Flop and Flip-Flop/Latch Control

• All flip-flops and flip-flop/latches share the same CLK, SR, and CE signals– This is referred to as the “control set” of the flip-

flops– CE and SR are active high– CLK can be inverted at the slice boundary

• Set/Reset (SR) signal can be configured as synchronous or asynchronous– All four flip-flop/latches are configured the same– All four flip-flops are configured the same

• SR will cause the flip-flop to be set to the state specified by the SRINIT attribute

D

CE

SR

Q

CK

D

CE

SR

Q

CK

D

CE

SR

Q

CK

D

CE

SR

Q

CK

DFF/LATCH

AFF/LATCHAFF

DFF

D

● ●

● ●

Basic Architecture 14

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Configuring LUTs as a Shift Register (SRL)

D QCE

D QCE

D QCE

D QCE

LUTD

CECLK

A[4:0]

Q

Q31 (cascade out)

LUT

Basic Architecture 15

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Shift Register LUT Example

• Operation D - NOP must add 17 pipeline stages of 64 bits each– 1,088 flip-flops (hence 136 slices) or– 64 SRLs (hence 16 slices)

20 Cycles

64Operation A

8 Cycles8 Cycles 12 Cycles12 Cycles

Operation B

3 Cycles3 Cycles

Operation C

64

20 Cycles

Paths are StaticallyBalanced

17 Cycles17 Cycles

Operation D - NOP

Basic Architecture 16

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Basic Architecture 17

Outline

• Overview• Logic Resources• I/O Resources• Memory and DSP48• Clocking Resources• Latest Families

– Virtex-6 Family– Virtex-7 Family

• Summary

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

I/O Block DiagramElectrical Resources

N

P

LVDS

Termination

Logical ResourcesIn

terc

onne

ct to

FPG

A fa

bric Master

IOSERDES

IOD

ELAY

IOLOGIC

Slave

IOSERDES

IOD

ELAY

IOLOGIC

Basic Architecture 18

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Spartan-6 FPGA Supports 40+ Standards

• Each input can be 3.3 V compatible• LVCMOS (3.3 V, 2.5 V, 1.8 V, 1.5 V, and 1.2 V)• LVCMOS_JEDEC• LVPECL (3.3 V, 2.5 V)• PCI• I2C*• HSTL (1.8 V, 1.5 V; Classes I, II, III, IV)

– DIFF_HSTL_I, DIFF_HSTL_I_18– DIFF_HSTL_II*

• SSTL (2.5 V, 1.8 V; Classes I, II)– DIFF_SSTL_I, DIFF_SSTL18_I– DIFF_SSTL_II*

• LVDS, Bus LVDS• RSDS_25 (point-to-point)

Easier and More

Flexible I/O Design!

Easier and More

Flexible I/O Design!

* Newly added standards

Basic Architecture 19

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Spartan-6 FPGA I/O Bank Structure

• All I/Os are on the edges of the chip• I/Os are grouped into banks

– 30 ~ 83 I/O per banks– Eight clock pins per edge– Common VCCO, VREF

• Restricts mixture of standards in one bank

• The differential driver is only available in Bank0 and Bank2– Differential receiver is available in all banks– On-chip termination is available in all banks

BANK 0

BANK 2

BANK 4 BANK 5

BANK 3 BANK 1

BANK 0

BANK 2

BANK 3 BANK 1

Chip View(LX45/T and Smaller)

Chip View(LX100/T and Larger)

Basic Architecture 20

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

I/O Logical Resources

• Two IOLOGIC block per I/O pair– Master and slave– Can operate independently or

concatenated

• Each IOLOGIC contains– IOSERDES

• Parallel to serial converter (serializer)• Serial to parallel converter

(De-serializer)– IODELAY

• Selectable fine-grained delay– SDR and DDR resources

Inte

rcon

nect

to F

PGA

Fabr

ic

Master

IOSERDES

IOD

ELAY

IOLOGIC

Slave

IOSERDES

IOD

ELAY

IOLOGIC

Basic Architecture 21

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Basic Architecture 22

Outline

• Overview• Logic Resources• I/O Resources• Memory and DSP48• Clocking Resources• Latest Families

– Virtex-6 Family– Virtex-7 Family

• Summary

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

SLICEM Used as Distributed SelectRAM Memory

• Uses the same storage that is used for the look-up table function

• Synchronous write, asynchronous read– Can be converted to synchronous read using

the flip-flops available in the slice• Various configurations

– Single port • One LUT6 = 64x1 or 32x2 RAM• Cascadable up to 256x1 RAM

– Dual port (D)• 1 read / write port + 1 read-only port

– Simple dual port (SDP)• 1 write-only port + 1 read-only port

– Quad-port (Q)• 1 read / write port + 3 read-only ports

SinglePort

DualPort

SimpleDual Port

QuadPort

32x2 32x4 32x6 32x8 64x1 64x2 64x3 64x4

128x1128x2 256x1

32x2D 32x4D 64x1D 64x2D

128x1D

32x6SDP 64x3SDP

32x2Q 64x1Q

Each port has independent address inputs

Basic Architecture 23

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Spartan-6 FPGA Block RAM Features

• 18 kb size– Can be split into two independent 9-kb

memories

• Performance up to 300 MHz• Multiple configuration options

– True dual-port, simple dual-port, single-port

• Two independent ports access common data– Individual address, clock, write enable, clock enable– Independent widths for each port

• Byte-write enable

Dual-Port BRAM

Dual-Port BRAM

18k Memory

Basic Architecture 24

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Better, More BRAM

• More Block RAMs– 2x higher BRAM to Logic Cell ratio than Spartan-3A platform

• More port flexibility – 18K can be split into two 9K BRAM blocks and can be

independently addressed

• Improves buffering, caching & data storage– Excellent for embedded processing, communication protocols– Enables DSP blocks to provide more efficient video and

surveillance algorithms

• Lower Static Power

OR 9K BRAM

9K BRAM

18K BRAM

Basic Architecture 25

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Memory Controller

• Only low cost FPGA with a “hard” memory controller

• Guaranteed memory interface performance providing– Reduced engineering & board design time– DDR, DDR2, DDR3 & LP DDR support– Up to 12.8Mbps bandwidth for each memory controller

• Automatic calibration features

• Multiport structure for user interface– Six 32-bit programmable ports from fabric– Controller interface to 4, 8 or 16 bit memories devices

DRAM

SRAM

FLASH

EEPROM

DRAM

SRAM

FLASH

EEPROM

Spartan-6Spartan-6

DRAMDDRDDR2DDR3LP DDR

DRAMDRAMDDRDDR2DDR3LP DDR

Basic Architecture 26

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Spartan-6 Hard Memory Controller

• New Hard Block Memory Controller– Up to 4 controllers per device

• Why a Hard Memory Block?– Very common design component– Multiple customer benefits

Customer Requests Spartan-6 Hard Block Memory Controller Benefits

Higher performance • Up to 800 Mbps

Lower cost • Saves soft logic, smaller die

Lower power • Dedicated logic

Easier designs• Timing closure no longer an issue• Configurable MultiPort user interface • CoreGen/MIG wizard & EDK support

Basic Architecture 27

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Spartan-6 FPGA DSP48A1 Slice

PCIN

D:A:B

X

M

C

18 X 18

P

Z

+/-0

0

C

48

OPM

OD

E[3:

0]

OPM

OD

E[5]

OPM

OD

E[7]

PCO

UT

OPM

OD

E[6,

4]

P

BC

IN

D

A 48

18

B

CC

OU

T

CFOUT

BC

OU

T

CIN

12

18

18

18

Dual B, DRegister

WithPre-adder

18

18

48

18

36

18

MFOUT

48

36

18x18 signed multiplier 48-bit add/subtract/accumulatePipeline registers for high speedCascade paths for wide functionsPre-adder

18x18 signed multiplier 48-bit add/subtract/accumulatePipeline registers for high speedCascade paths for wide functionsPre-adder

A0 A1

Basic Architecture 28

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Basic Architecture 29

Outline

• Overview• Logic Resources• I/O Resources• Memory and DSP48• Clock Resources• Latest Families

– Virtex-6 Family– Virtex-7 Family

• Summary

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Spartan-6 FPGA Global Clock Network

• 16 global clock buffers in the Spartan-6 FPGA allow clocks to be distributed to potentially every clocked element on the die

• 16 HCLK lines connect clock signals to logic resources in each row

• HCLK lines can be driven by– Global clock buffers– DCM outputs– PLL outputs

Basic Architecture 30

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Spartan-6 FPGA I/O Clock Network

• Special clock network dedicated to I/O logical resources– Independent of global clock resources– Speeds up to 1 GHz

• Multiple sources for clocking I/O logic– BUFIO2: for high-speed dedicated I/O clock signals– BUFPLL: for clocks driven by the PLL in the CMT

P N

CMT PLL

IO bank

IOLOGIC

P NBUFIO2BUFIO2

BUFPLLBUFPLLP N P N

IOLOGIC IOLOGIC IOLOGIC

Basic Architecture 31

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Spartan-6 FPGA Clock Management Tile (CMT)

dcm1_clkout<9:0>

dcm2_clkout<9:0>

10

PLL

pll_clkout<5:0>6

CLKIN

CLKFB

CLKOUT<5:0>

DCM

10CLKIN

CLKFB

CLKIN

CLKFB DCM

Clocks from BUFG

CLKOUT<9:0>

CLKOUT<9:0>

GCLK Inputs

Feedback clocks from BUFIO2FB

Basic Architecture 32

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Basic Architecture 33

Outline

• Overview• Slice Resources• I/O Resources• Memory and DSP48 • Clocking Resources• Latest Families

– Virtex-6 Family– Virtex-7 Family

• Summary

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Designers Eccentrics

• Higher System Performance – More design margin to simplify designs– Higher integrated functionality

• Lower System Cost– Reduce BOM– Implement design in a smaller device & lower speed-grade

• Lower Power– Help meet power budgets– Eliminate heat sinks & fans – Prevent thermal runaway

Basic Architecture 34

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Architecture AlignmentVirtex-6 FPGAs Spartan-6 FPGAs

150K Logic Cell

Device

760K Logic Cell

Device

Common Resources

*Optimized for target application in each family

3.3 Volt compatible I/O

Hardened Memory Controllers

LUT-6 CLB

DSP Slices

BlockRAM

HSS Transceivers*

Parallel I/O FIFO Logic

System Monitor

Tri-mode EMAC

PCIe® Interface

Enables IP Portability, Protects Design Investments

High-performance Clocking

Basic Architecture 35

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Virtex-6 and Spartan-6 FPGA Sub-Families

Spartan-6LX FPGA

Spartan-6LXT FPGA

Virtex-6 LXT FPGA

Virtex-6SXT FPGA

Virtex-6HXT FPGA

• Lowest Cost Logic • Lowest Cost Logic • Low-Cost Serial Connectivity

• High Logic Density • High-Speed Serial

Connectivity

• High Logic Density• High-Speed Serial

Connectivity• Enhanced DSP

• High Logic Density • Ultra High-Speed Serial

Connectivity

LogicBlock RAMDSPParallel I/OSerial I/O

Virtex-6 CXT FPGA

• Upto 3.75Gbps serial connectivity and corresponding logic performance

Basic Architecture 36

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Basic Architecture 37

Outline

• Overview• Slice Resources• I/O Resources• Memory and DSP48 • Clocking Resources• Latest Families

– Virtex-6 Family– Virtex-7 Family

• Summary

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use OnlyVirtex-6 Base Platform 38

Virtex® Product & Process Evolution

Delivering Balanced Performance, Power, and Cost

Virtex

Virtex-E

Virtex-II

Virtex-II Pro

Virtex-4

Virtex-5

1st Generation1st Generation 2nd Generation2nd Generation 3rd Generation3rd Generation 4th Generation4th Generation 5th Generation5th Generation 6th Generation6th Generation

220-nm

180-nm

150-nm

40-nm

65-nm

90-nm

130-nm

Virtex-6

Basic Architecture 38

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

• Static Power Reduction– Higher distribution of low leakage transistors

• Dynamic Power Reduction– Reduced capacitance through device shrink

• Reduced Core Voltage Devices Lower Overall Power– VCCINT = 0.9V option allows power / performance tradeoff

• I/O Power Improvements– Dynamic termination

• System Monitor– Allows sophisticated monitoring of temperature and voltage

Strong Focus on Power Reduction

Up to 50% Power Reduction vs. Previous GenerationBasic Architecture 39

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Virtex-6 Logic Fabric

• Virtex-6 Configurable Logic Block (CLB)– Each CLB contains two slices– Each slice contains four 6-input Lookup Tables (6LUT)

• Slices implement logic functions (slice_l) • Slices for memories and shift registers (slice_m)• LUT6 implements

– All functions of up to 6 variables– Two functions of up to 5 or less variables each– Shift registers up to 32 stages long– Memories of 64 bits

• Multiple configurations within a slice

Power Consumption Benefits Performance Benefits Cost Benefits• Shift register mode greatly reduces power consumption over FF implementation

• Increased ratio of slice_m – memories available closer to the source or target logic

• Can pack logic and memory functions more efficiently

CLBCLB

Slice

LUT

LUT

LUT

LUT

Slice

LUT

LUT

LUT

LUT

LUT

LUT

LUT

LUT

Slice

LUT

LUT

LUT

LUT

Slice

LUT

LUT

LUT

LUT

LUT

LUT

LUT

LUT

Basic Architecture 40

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Higher DSP Performance

• Most advanced DSP architecture– New optional pre-adder for symmetric filters– 25x18 multiplier

• High resolution filters• Efficient floating point support

– ALU-like second stage enables mapping of advanced operations

• Programmable op-code• SIMD support• Addition / Subtraction / Logic functions

– Pattern detector

• Lowest power consumption

• Highest DSP slice capacity– Up to 2K DSP Slices

Basic Architecture 41

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Basic Architecture 42

Outline

• Overview• Slice Resources• I/O Resources• Memory and DSP48• Clocking Resources• Latest Families

– Virtex-6 Family– Virtex-7 Family

• Summary

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Power, Performance and ProductivityDrive Market Trends

Lower PowerLegislation and Regulations

Higher PerformanceSystem Capacity and Performance

Improved ProductivityReduce Capital and Operating Expenses (OPEX, CAPEX)

Flat panel/TV, Central Office, Server Farms, Portable Medical, Portable Consumer

Wired Infrastructure, Wireless, Broadcast, 300G+ Networks, Aerospace and Defense, High Performance Computing

All Market Segments

#1 Customer Problem: Lower Power enables better Cost, Performance, and Capability

Basic Architecture 43

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

The Unified Architecture Advantage

• Common elements enable easy IP reuse for quickdesign portability across all 7 series families– Design scalability from low-cost to high-performance– Expanded eco-system support– Quickest TTM

Precise, Low Jitter ClockingMMCMs

Logic FabricLUT-6 CLB

DSP Engines DSP48E1 Slices

On-Chip Memory36Kbit/18Kbit Block RAM

Enhanced ConnectivityPCIe® Interface Blocks

Hi-perf. Parallel I/O ConnectivitySelectIO™ Technology

Artix™-7 FPGA

Kintex™-7 FPGA

Virtex®-7 FPGA

Hi-performance Serial I//O ConnectivityTransceiver Technology

Basic Architecture 44

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

The Xilinx 7 Series FPGAsIndustry’s First Unified Architecture

• Industry’s Lowest Power and First Unified Architecture – Spanning Low-Cost to Ultra High-End applications

• Three new device families with breakthrough innovations in power efficiency, performance-capacity and price-performance

Page 45

Basic Architecture 45

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Virtex-7 Sub-Families

• The Virtex-7 family has several sub-families– Virtex-7: General logic– Virtex-7XT: Rich DSP and block RAM– Virtex-7HT: Highest serial bandwidth

Virtex-7 XT FPGAVirtex-7 FPGA Virtex-7 HT FPGA

• High Logic Density • High-Speed Serial

Connectivity

• High Logic Density • Ultra High-Speed Serial

Connectivity

• High Logic Density• High-Speed Serial

Connectivity• Enhanced DSP

LogicBlock RAMDSPParallel I/OSerial I/O

Basic Architecture 46

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Basic Architecture 47

Outline

• Overview• Logic Resources• I/O Resources• Memory and DSP48• Clocking Resources• Latest Families

– Virtex-6 Family– Virtex-7 Family

• Summary

© 2011 Xilinx, Inc. All Rights Reserved For Academic Use Only

Summary• The Spartan-6 FPGA slices contain four 6-input LUTs, eight registers, and carry

logic– LUTs can perform any combinatorial function of up to six inputs– LUTs are connected with dedicated multiplexers and carry logic– Some LUTs can be configured as shift registers or memories

• The Spartan-6 FPGA IOBs contain DDR registers as well as SERDES resources

• The SelectIO™ interfaces enable direct connection to multiple I/O standards• The Spartan-6 FPGA includes dedicated block RAM and DSP slice resources• The Spartan-6 FPGA includes dedicated DCMs, PLLs, and routing resources to

improve your system clock performance and generation capability• Latest introduced families are architected for power efficiencies

– Consists of Artix, Kintex, and Virtex devices

Basic Architecture 48