CprE / ComS 583 Reconfigurable Computing

  • Published on
    03-Jan-2016

  • View
    24

  • Download
    0

Embed Size (px)

DESCRIPTION

CprE / ComS 583 Reconfigurable Computing. Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #6 Modern FPGA Devices. Quick Points. HW #2 is out Due Thursday, September 20 (12:00pm) LUT mapping Comparing FPGA devices - PowerPoint PPT Presentation

Transcript

  • CprE / ComS 583Reconfigurable ComputingProf. Joseph ZambrenoDepartment of Electrical and Computer EngineeringIowa State University

    Lecture #6 Modern FPGA Devices

    Lect-06.*

    Quick PointsHW #2 is outDue Thursday, September 20 (12:00pm)LUT mappingComparing FPGA devicesSynthesizing arithmetic operatorsAssignedDueEffort LevelStandard Preferred CprE 583

    Lect-06.*

    RecapHard-wired carry logic supportAltera FLEX 8000Xilinx XCV4000

    Lect-06.*

    Recap (cont.)Square-root carry select adders++01A31-30B31-30S31-30++01A29-22B29-22S29-22++01A21-15B21-15S21-15++01A14-9B14-9S14-9++01A8-4B8-4S8-4++01A3-0B3-0S3-0t4t4t5t5t5t6t6t7t7t8t8t6t7t8t9t10

    Lect-06.*

    Recap (cont.)If one operand is constant:More speed?Less hardware?HAA00S0FAA1S1FAA2S2FAA3S3101C3

    Lect-06.*

    Recap (cont.)Carry save multiplication+X0Y0X1X2X3Z0Y1X0+X1+X2+X3Z1+Y2++++Y3+++Z2

    Lect-06.*

    Recap (cont.)If one operand is constant:Can greatly reduce the number of addersRemoves all and gatesY0=0Z0X0X1X2X3Z1+Y3=1X0+X1+X2+X3Z2Y1=1Y2=0

    Lect-06.*

    LUT-Based Constant MultipliersConstants can be changed in the LUTs to program new multipliers4-LUTN0N3 10101011 x NNNNNNNN AAAAAAAAAAAA (N * 1011 (LSN))+ BBBBBBBBBBBB (N * 1010 (MSN)) SSSSSSSSSSSSSSSS Product4-LUT4-LUT4-LUT4-LUT4-LUT4-LUT4-LUT4-LUT4-LUT4-LUT4-LUT4-LUTN4N74-LUT4-LUT4-LUT4-LUT4-LUT4-LUT4-LUT4-LUT4-LUT4-LUT4-LUT+S0S15A0A11B4B15

    Lect-06.*

    OutlineRecapMore MultiplicationHandling Fractional ValuesFixed PointFloating PointSome Modern FPGA DevicesXilinx XC5200, Virtex (-II / -II Pro / -4 / -5), Spartan (-II / -3)*Altera FLEX 10K, APEX (20K / II), ACEX 1K, Cyclone (II), Stratix (GX / II / II GX)

    Lect-06.*

    Partial Product GenerationAND gates in multiplication are wastefulOption 1 use cascade logicOption 2 break into smaller (2x2) multipliers 42 = 101010 Multiplicandx 11 = x 1011 Multiplier 0110 (10x11) 0110 (10x11) 0110 (10x11) 0100 (10x10) 0100 (10x10) + 0100 (10x10) 462 = 0111001110 Product

    Lect-06.*

    Representation CompressionMultiplication can be simplified if the representation is compressedStandard binary representation {0,1}x2nCanonical Signed Digit (CSD) representation {-1,0,1}x2nTo encode CSD:Set C = (B + (B1)Di = Bi + Ci 2Ci+1, where Ci+1 is the carryout of Bi + Ci Example: B = 61d = 0111101b C = 0111101b + 01111010b = 010110111b -2Ci+1 = 2222101 D = 1000201 = 1000(-1)01For any n bit number, there can only be n/2 nonzero digits in a CSD representation (every other bit)

    Lect-06.*

    Booth EncodingVariation on CSD encoding:Ej = -2Bi + Bi-1 + Bi-2Select a group of 3 digits, add the two least significant digits, and then subtract 2x the most significant bitEj is {-2,-1,0,1}x22nExample:B = 61d = 0111101b = 0001111010b (with padding)E = 010(-1)1Reduces the number of partial products for multiplication by Can automatically handle negative numbers

    Lect-06.*

    Fractional ArithmeticMany important computations require fractional componentsFractional arithmetic often ignored in FPGA literatureComplex standards (ex. IEEE special cases)Resource intensive and slow

    Why not just extend the binary representation past the decimal point?

    Lect-06.*

    Fixed-Point RepresentationSeparate value into Integer (I) and Fractional remainder (F)

    F bits represent {0,1}x2-nHow large to make I and F depends on applicationEx: Q16.16 is 16 bits of integer [-215, 216) with 16 bits of fraction increments of 2-16 or 0.0000152587890625Ex: Q1.127 is a normalized integer [-1,1) with 127 bits of fraction increments of 2-127 or 5.8774717541114375398436826861112e-39IF

    Lect-06.*

    Fixed-Point ArithmeticAddition, subtraction the same (Q4.4 example):

    Multiplication requires realignment: 3.6250 0011.1010+ 2.8125 0010.1101 6.4375 0110.0111 3.6250 0011.1010 x 2.8125 0010.1101 00111010 00111010 00111010 00111010 10.1953125 1010.00110010

    Lect-06.*

    Fixed-Point IssuesOverflow/underflowQuantization ErrorsAfter rounding down previous example 3.625 x 2.8125 = 10.1875 (0.08% error)In Q4.4, 2 divided by 3 = 0.625 (6.25% error)ScalingDynamic range needed for some applications

    Lect-06.*

    IEEE 754 Floating PointSingle precision: V = (-1)S x 2(E-127) x (1.F)

    Double precision: V = (-1)S x 2(E-1023) x (1.F)

    Special conditions not a number (NaN), +-0, +-infinityGradual underflowSE18F23SE111F52

    Lect-06.*

    Floating Point FPGA HardwareXilinx XCV4085AdditionSingle-precision 587 4-LUTsDouble-precision 1334 4-LUTsMultiplicationSingle-precision 1661 4-LUTsDouble-precision 4381 4-LUTsDivisionSingle-precision 1583 4-LUTsDouble-precision 4910 4-LUTsFor double-precision, can only fit any two of three units on a single device!See [Und04] for details

    Lect-06.*

    Capacity TrendsYear1985Xilinx Device ComplexityXC200050 MHz1K gatesXC4000100 MHz250K gatesVirtex200 MHz1M gatesVirtex-II 450 MHz8M gatesSpartan80 MHz40K gatesSpartan-II200 MHz200K gatesSpartan-3326 MHz5M gates19911987XC300085 MHz7.5K gatesVirtex-E240 MHz4M gatesXC520050 MHz23K gates199519981999200020022003Virtex-II Pro450 MHz8M gates*20042006Virtex-4500 MHz16M gates*Virtex-5550 MHz24M gates*

    Lect-06.*

    Xilinx XC5200 FPGASuccessor to the XC4000Relatively small amount of CLBs with faster interconnectDeviceLogic CellsXC5202XC5204XC5206XC5210XC5215Max Logic GatesVersaBlock ArrayCLBsFlip-Flops2564807841,2961,9363,0006,00010,00016,00023,0008 x 810 x 1214 x 1418 x 1822 x 22641201963244842564807841,2961,936I/Os84124148196244

    Lect-06.*

    Xilinx XC5200 (cont.)Each CLB consists of four Logic Cells (LCs)Logic Cell = LUT + DFF20 inputs12 outputs

    Lect-06.*

    Xilinx XC5200 (cont.)

    Lect-06.*

    Xilinx Spartan FPGAsMeant to be low-power / low-cost version of XC4000 series (on newer process technology)

    DeviceLogic CellsXCS05XCS10XCS20XCS30XCS40Max Logic GatesCLB MatrixTotal CLBsFlip-Flops2384669501,3681,8625,00010,00020,00030,00040,00010 x 1014 x 1420 x 2024 x 2428 x 281001964005767843606161,1201,5362,016I/Os77112160192224Dist. RAM Bits3,2006,27212,80018,43225,088

    Lect-06.*

    Xilinx Spartan (cont.)Identical CLB to XC4000 series

    Lect-06.*

    Xilinx Spartan (cont.)Individual LUTs can be programmed as 16x1 RAMs and combined to form larger memory structures

    Lect-06.*

    Xilinx Virtex FPGAsDeviceXCV50Logic CellsMax Logic GatesCLB ArrayI/O BitsBlock RAM BitsXCV100XCV150XCV2001,72857,90616 x 2418032,7682,700108,90420 x 3018040,9603,888164,67424 x 3826049,1525,292238,66628 x 4228457,844XCV3006,912322,97032 x 4831665,536XCV40010,800468,25240 x 6040481,920XCV60015,552661,11148 x 7251298,304XCV80021,168888,43956 x 84512114,688XCV100027,6481,124,02264 x 96512131,072Select RAM+ Bits24,57638,40055,29675,26498,304153,600221,184301,058393,216

    Lect-06.*

    Xilinx Virtex (cont.)4 4-LUTs / FFs per CLBOrganized into 2 slices

    Lect-06.*

    Xilinx Virtex (cont.)Block Select+RAM dedicated blocks of on-chip, true dual port read/write synchronous RAM 4Kbit of RAM with different aspect ratiosFaster, less flexible than distributed RAM using LUTsVirtex-E updated, larger version of Virtex devices

    Lect-06.*

    Xilinx Spartan-IICLB structure similar to VirtexDeviceXC2S15Logic CellsSystem GatesCLB ArrayI/O BitsDistributed RAM BitsXC2S30XC2S50XC2S10043215,0008 x 12866,14497230,00012 x 189213,8241,72850,00016 x 2417624,5762,700100,00020 x 3017638,400XC2S1503,888150,00024 x 3626055,296XC2S2005,292200,00028 x 4228475,264Select RAM+ Bits16,38424,57632,76840,96049,15257,344

    Lect-06.*

    Xilinx Virtex-II Platform FPGAsPlatform FPGA == Multiplier??DeviceXC2V40Max LogicGatesCLB ArrayMultiplierBlocksMax I/OPadsBlock RAM BitsXC2V80XC2V250XC2V50040K8 x 84888K80K16 x 8812016K250K24 x 162420048K500K32 x 243226496KXC2V10001M40 x 3240432160KXC2V15001.5M48 x 4048528240KXC2V20002M56 x 4856624336KXC2V30003M64 x 5696720448KXC2V40004M80 x 72120912720KSelect RAM+ Bits72K144K432K576K720K864K1,008K1,728K2,160KXC2V60006M96 x 881401,1041,056KXC2V80008M112 x 1041681,1081,456K2,592K3,024K

    Lect-06.*

    Xilinx Virtex-II (cont.)4 Slices per CLB, 2 4-LUTs per slice8 LUTs per CLBBlock Select+RAMs now 18Kbit each

    Lect-06.*

    Xilinx Virtex-II (cont.)Block multipliers (18b x 18b) arranged in columns near RAM

    Lect-06.*

    Block MultipliersSynthesis tools can take larger multipliers and break them down into 18x18 multipliers

    Lect-06.*

    Xilinx Virtex-II Pro FPGAsDeviceXC2VP2PowerPCCPU BlocksLogic CellsMultiplierBlocksMax I/OPadsBlock RAM BitsXC2VP4XC2VP7XC2VP2003,1681220444K16,7682834894K111,08844396154K220,88088564290KXC2VP30230,816136644428KXC2VP40243,632192804606KXC2VP50253,136232852738KXC2VP70274,4483289961,034KXC2VP100299,2164441,1641,378KSelect RAM+ Bits216K504K792K1,584K2,448K3,456K4,176K5,904K7,992K

    Lect-06.*

    Xilinx Virtex-II Pro (cont.)PowerPC processor block features300+ MHz Harvard architecture (RISC)Five-stage pipelineHardware multiply/divideThirty-two 32-bit GPRs16 KB two-way instruction cache16 KB two-way data cacheOn-Chip Memory (OCM) interfaceIBM CoreConnect (OPB, PLB) interfaces

    Lect-06.*

    Xilinx Virtex-II Pro (cont.)PPC 405 details

    Lect-06.*

    Xilinx Spartan-3 FPGAsCLB structure similar to Virtex-IIDeviceXC3S50System GatesCLB ArrayMultiplierBlocksMax I/OPadsDistr. RAM BitsXC3S200XC3S400XC3S100050K16 x 12412412K200K24 x 201217330K400K32 x 281626456K1M48 x 4024391120KSelect RAM+ Bits72K216K288K432KXC3S1500XC3S2000XC3S4000XC3S50001.5M64 x 5232487208K2M80 x 6440565320K4M96 x 7296712432K5M104 x 80104784520K576K720K1,728K1,872K

Recommended

View more >