CS250, UC Berkeley Fall ‘20 Lecture 04, Reconfigurable Architectures 2 CS250 VLSI Systems Design Fall 2020 John Wawrzynek with Arya Reais-Parsi

  • CS250, UC Berkeley, Fall ‘20: Lecture 04, Reconfigurable Architectures 2

    CS250 VLSI Systems Design, Fall 2020

    John Wawrzynek with Arya Reais-Parsi

  • FPGA Overview

    Simplified version of FPGA internal architecture (not scalable)

    ‣ Basic idea: a two-dimensional array of logic blocks and flip-flops, with a means for the user to configure (program):

    1. the interconnection between the logic blocks,

    2. the function of each block.

  • Reconfigurable Fabric Architecture: Degrees of Freedom

    1. Logic Blocks: capacity and internal structure of combinational logic circuits and state element(s); clustering and internal interconnect

    2. Interconnection Network Architecture: circuit-switched, not packet-switched; topology of network

    3. Configuration Architecture: how programming information is loaded and distributed; configuration “depth”

    4. Hard Blocks (RAM, ALUs, processor cores, …): function(s), count, and how integrated into the fabric

  • Xilinx Virtex-5

    Spring 2013 EECS150 - Lec02-SDS-FPGAs

    Colors represent different types of resources: Logic, Block RAM, DSP (ALUs), Clocking, I/O, Serial I/O + PCI

    A routing fabric runs throughout the chip to wire everything together.

  • Configurable Logic Blocks (CLBs)

    Slices define regular connections to the switching fabric, and to slices in CLBs above and below it on the die.

  • Primitive: 5-input Look-Up Tables (LUTs)

    [Figure: 5-input LUT. Inputs A[6:2] select one of 32 configuration latches, each holding 0 or 1, to drive output D.]

    Computes any 5-input logic function.

    Timing is independent of function.

    Latches set during configuration.
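    The three claims on this slide can be sketched in a few lines of Python. This is a toy model, not the circuit: the class name `LUT5`, the helper `from_function`, and the input ordering (the first argument is the least significant index bit) are illustrative assumptions. It shows that configuring a LUT amounts to filling a 32-entry truth table, and that evaluation is the same single lookup whatever function is stored.

```python
class LUT5:
    """Toy model of a 5-input LUT: 32 configuration bits and one lookup."""

    def __init__(self, config_bits):
        # config_bits[i] is the latch value returned when the inputs encode i.
        if len(config_bits) != 32 or any(b not in (0, 1) for b in config_bits):
            raise ValueError("a 5-LUT needs exactly 32 bits, each 0 or 1")
        self.bits = list(config_bits)

    @classmethod
    def from_function(cls, fn):
        """'Configure' the LUT by tabulating fn over all 32 input patterns."""
        table = []
        for index in range(32):
            inputs = [(index >> k) & 1 for k in range(5)]  # inputs[0] is the LSB
            table.append(fn(*inputs) & 1)
        return cls(table)

    def read(self, a, b, c, d, e):
        # Evaluation is always one table lookup, whatever function is stored,
        # which is why timing does not depend on the function.
        index = a | (b << 1) | (c << 2) | (d << 3) | (e << 4)
        return self.bits[index]


# Two unrelated functions fit in the same structure.
parity = LUT5.from_function(lambda a, b, c, d, e: a ^ b ^ c ^ d ^ e)
majority = LUT5.from_function(lambda a, b, c, d, e: int(a + b + c + d + e >= 3))
```

    Both `parity` and `majority` use the same storage and the same read path; only the 32 stored bits differ.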

  • Virtex 6-LUTs: Composition of 5-LUTs

    May be used as one 6-input LUT (D6 out) ...

    ... or as two 5-input LUTs (D6 and D5)
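    A minimal sketch of this composition follows. It assumes the usual arrangement in which two 5-LUTs share five inputs and the sixth input selects between them; the function names and the choice that D5 comes from the "low" table are assumptions for illustration, not details taken from the slide.

```python
def lut5_read(bits32, inputs5):
    """Look up one of 32 stored bits; inputs5[0] is the least significant input."""
    index = sum(bit << k for k, bit in enumerate(inputs5))
    return bits32[index]


def fracturable_lut6(low_bits, high_bits, inputs6):
    """Return (d6, d5) for a 6-LUT built from two 5-LUTs sharing five inputs.

    d5 is the output of the low 5-LUT on its own.
    d6 is chosen between the two 5-LUT outputs by the sixth input.
    """
    shared, a6 = inputs6[:5], inputs6[5]
    o5_low = lut5_read(low_bits, shared)
    o5_high = lut5_read(high_bits, shared)
    d6 = o5_high if a6 else o5_low
    return d6, o5_low


def split_table(fn6):
    """Tabulate a 6-input function into the two 32-bit halves."""
    low, high = [], []
    for index in range(64):
        inputs = [(index >> k) & 1 for k in range(6)]
        (high if inputs[5] else low).append(fn6(*inputs) & 1)
    return low, high


def tabulate5(fn5):
    """Tabulate a 5-input function into 32 bits."""
    return [fn5(*[(i >> k) & 1 for k in range(5)]) & 1 for i in range(32)]


# Mode 1: one 6-input function (here a 6-input AND) read on d6.
and6_low, and6_high = split_table(lambda *x: int(all(x)))

# Mode 2: two independent 5-input functions; the sixth input is held at 1
# so d6 reports the high table while d5 reports the low table.
xor5 = tabulate5(lambda *x: sum(x) & 1)
and5 = tabulate5(lambda *x: int(all(x)))
```

    In the second mode the two functions must share the same five inputs, which is the price of reusing one block for both.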

  • The simplest view of a slice

    Four 6-LUTs

    Four Flip-Flops

    Switching fabric may see combinational and registered outputs.

    An actual Virtex slice adds many small features to this simplified diagram. We show them one by one...

  • Two 7-LUTs per slice ...

    Extra multiplexers (F7AMUX, F7BMUX)

    Extra inputs (AX and CX)

  • Or one 8-LUT per slice ...

    Third multiplexer (F8MUX)

    Third input (BX)

  • Extra muxes to choose LUT option ...

    From eight 5-LUTs ... to one 8-LUT.

    Combinational or registered outs.

    Flip-flops unused by LUTs can be used standalone.
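    The mux tree described on the last three slides can be sketched as follows. The mux and input names (F7AMUX, F7BMUX, F8MUX, AX, CX, BX) come from the slides; the pairing of LUTs A/B and C/D, the select polarity, and the helper names are assumptions made for illustration.

```python
def lut6(bits64, inputs6):
    """Look up one of 64 stored bits; inputs6[0] is the least significant input."""
    index = sum(bit << k for k, bit in enumerate(inputs6))
    return bits64[index]


def mux2(sel, when0, when1):
    return when1 if sel else when0


def lut8(tables, inputs6, ax, cx, bx):
    """Combine four 6-LUTs (tables = [A, B, C, D]) into one 8-input function."""
    a, b, c, d = (lut6(t, inputs6) for t in tables)
    f7a = mux2(ax, a, b)       # F7AMUX: a 7-LUT from LUTs A and B
    f7b = mux2(cx, c, d)       # F7BMUX: a 7-LUT from LUTs C and D
    return mux2(bx, f7a, f7b)  # F8MUX: an 8-LUT from the two 7-LUTs


def configure_lut8(fn8):
    """Tabulate an 8-input function into four 64-bit tables, in order A, B, C, D."""
    tables = []
    for x7 in (0, 1):
        for x6 in (0, 1):
            table = []
            for index in range(64):
                low6 = [(index >> k) & 1 for k in range(6)]
                table.append(fn8(*low6, x6, x7) & 1)
            tables.append(table)
    return tables


def eval_lut8(tables, inputs8):
    # Inputs 0..5 go to all four LUTs; input 6 drives both F7 selects and
    # input 7 drives the F8 select.
    low6, x6, x7 = inputs8[:6], inputs8[6], inputs8[7]
    return lut8(tables, low6, ax=x6, cx=x6, bx=x7)


def parity8(*bits):
    return sum(bits) & 1


parity_tables = configure_lut8(parity8)
```

    Driving AX and CX independently instead, and taking `f7a` and `f7b` as separate outputs, gives the two 7-LUT configuration.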

  • Virtex “Vertical” Logic

    We can map ripple-carry addition onto the carry-chain block.

    The carry-chain block is also useful for speeding up other adder structures and counters.
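    A minimal sketch of ripple-carry addition mapped onto a carry chain, assuming the common arrangement in which each bit's LUT computes a propagate signal and dedicated logic next to the LUT forms the sum and the carry. The function name and the default width are illustrative.

```python
def carry_chain_add(a, b, width=8, carry_in=0):
    """Add two unsigned integers bit by bit; return (sum mod 2**width, carry_out)."""
    carry = carry_in
    total = 0
    for i in range(width):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        propagate = a_i ^ b_i                # computed by this bit's LUT
        sum_i = propagate ^ carry            # dedicated XOR in the carry block
        carry = carry if propagate else a_i  # dedicated carry mux
        total |= sum_i << i
    return total, carry
```

    When `propagate` is 0 the two operand bits are equal, so either one serves as the carry out. The carry never passes through a LUT or the general routing fabric, which is where the speed advantage comes from.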

  • Putting it all together ... a SLICEL.

    The previous slides explain all SLICEL features.

    About 50% of the slices are SLICELs.

    The other slices are SLICEMs, and have extra features.

  • Recall: 5-LUT architecture ...

    [Figure: 5-input LUT. Inputs A[6:2] select one of the latches to drive output D.]

    32 latches. Configured to 1 or 0.

    Some parts of a logic design need many state elements.

    SLICEMs replace normal 5-LUTs with circuits that can act like 5-LUTs, but can alternatively use the 32 latches as RAM, ROM, or shift registers.
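    The idea of reusing the LUT's storage can be sketched as one object with three access styles. This is a behavioral toy, assuming a 32-bit storage array; the class and method names are hypothetical, and clocking is modeled simply as one method call per clock edge.

```python
class SliceMLut:
    """Toy model of a 32-bit LUT whose storage can also be written at run time."""

    def __init__(self, init_bits=None):
        self.bits = list(init_bits) if init_bits is not None else [0] * 32
        if len(self.bits) != 32:
            raise ValueError("expected 32 storage bits")

    def read(self, address):
        # LUT mode, ROM mode, and RAM reads are all the same lookup.
        return self.bits[address & 31]

    def write(self, address, value):
        # RAM mode: models the effect of one clocked write.
        self.bits[address & 31] = value & 1

    def shift_in(self, value):
        # Shift-register mode: every bit moves one position per clock.
        shifted_out = self.bits[-1]
        self.bits = [value & 1] + self.bits[:-1]
        return shifted_out


# RAM use: write one bit, read it back.
ram = SliceMLut()
ram.write(7, 1)

# Shift-register use: the read address picks the delay tap.
delay = SliceMLut()
for bit in (1, 0, 0):
    delay.shift_in(bit)
```

    After the three shifts, `delay.read(2)` returns the bit that entered three clocks earlier, so a single LUT acts as a variable-length delay line.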

  • EE141

    A SLICEM 6-LUT ...

    Normal 6-LUT inputs.

    Normal 5/6-LUT outputs.

    Memory write address.

    Memory data input.

    Memory data input.

    Control output for chaining LUTs to make larger memories.

    Synchronous write / asynchronous read.

  • SLICEL vs SLICEM ...

    [Figure: SLICEM and SLICEL schematics side by side.]

    SLICEM adds memory features to LUTs, + muxes.

  • Distributed RAM Primitives

    All are built from a single slice or less.

    Remember, though, that the SLICEM LUT is naturally only 1 read and 1 write port.
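    One common way to get a second read port out of single-read-port memories is to keep one copy of the data per read port and write every copy at once. The sketch below illustrates that technique under that assumption; it is not taken from the slide, and the class name and parameters are hypothetical.

```python
class ReplicatedRam:
    """Toy multi-read-port RAM built from single-read-port memories."""

    def __init__(self, depth=32, read_ports=2):
        # One copy of the storage per read port.
        self.copies = [[0] * depth for _ in range(read_ports)]

    def write(self, address, value):
        # The single write port updates every copy so they stay identical.
        for copy in self.copies:
            copy[address] = value & 1

    def read(self, port, address):
        # Each read port looks only at its own copy.
        return self.copies[port][address]

    @property
    def memories_used(self):
        return len(self.copies)


ram = ReplicatedRam(depth=32, read_ports=2)
ram.write(5, 1)
```

    The cost is visible in `memories_used`: each extra read port consumes another LUT's worth of storage for the same data.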

  • Configurable Interconnect

    ‣ Design Challenges (topology):

      ‣ traversing long wires incurs delay and energy

      ‣ switches (transistors) add significant delay

      ‣ mapping time

    “connection block”

    switch matrix could be more richly populated
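    To make "more richly populated" concrete, the sketch below counts programmable switches in two textbook switch-matrix patterns for a four-sided box with W tracks per side. The "disjoint" pattern (track i connects only to track i on the other sides) is used here as an example of a sparse matrix; it is an assumption and may not match the figure on the slide.

```python
from itertools import combinations

SIDES = ("north", "east", "south", "west")


def disjoint_switch_box(tracks):
    """Track i on one side can connect only to track i on the other sides."""
    return {
        frozenset({(s1, i), (s2, i)})
        for s1, s2 in combinations(SIDES, 2)
        for i in range(tracks)
    }


def full_switch_box(tracks):
    """Every track on a side can connect to every track on every other side."""
    return {
        frozenset({(s1, i), (s2, j)})
        for s1, s2 in combinations(SIDES, 2)
        for i in range(tracks)
        for j in range(tracks)
    }


sparse = disjoint_switch_box(4)   # 6 * W switches
dense = full_switch_box(4)        # 6 * W**2 switches
```

    With W = 4 the sparse box needs 24 switches and the dense one 96. Each added switch costs area and loads the wires it touches, which is the tension behind the design challenges listed above.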

  • Xilinx FPGAs (tile interconnect detail)

  • Other Topologies

    ‣ Traditional: [Figure]

    ‣ From Flex Logix, Inc.: Clos network

    “uses about half the area of the traditional interconnect and uses only 5-7 metal routing layers”

  • Fat-Tree Based Interconnect

    ‣ Use “Rent’s rule” for proper thickness
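    Rent's rule estimates the number of external connections T of a group of N blocks as T = t · N^p, where t is the number of terminals per block and p is the Rent exponent. The sketch below applies it to size the channel above each subtree of a binary fat tree. The values t = 4 and p = 0.5 are illustrative assumptions, as are the function names.

```python
def rent_terminals(blocks, terminals_per_block, rent_exponent):
    """Rent's rule: T = t * N**p."""
    return terminals_per_block * blocks ** rent_exponent


def fat_tree_widths(leaves, terminals_per_block, rent_exponent):
    """Channel width above each subtree of a binary fat tree, from leaf to root.

    Returns a list of (blocks_in_subtree, wires_in_channel) pairs.
    """
    widths = []
    size = 1
    while size <= leaves:
        wires = round(rent_terminals(size, terminals_per_block, rent_exponent))
        widths.append((size, wires))
        size *= 2
    return widths


widths = fat_tree_widths(leaves=16, terminals_per_block=4, rent_exponent=0.5)
```

    For these values the channels grow as 4, 6, 8, 11, 16 wires while the subtree doubles in size at each level. With p < 1 the channels thicken toward the root, but more slowly than the number of blocks beneath them.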

  • Embedded Hard Blocks

    ‣ Many important functions are not efficient when implemented in the reconfigurable fabric:

    ‣ multiplication, large memory, processor cores, …

    ‣ Dedicated blocks take relatively little area, so the cost is small if they go unused.

  • Xilinx Virtex-5

    Colors represent different types of resources: Logic, Block RAM, DSP (ALUs), Clocking, I/O, Serial I/O + PCI

    A routing fabric runs throughout the chip to wire everything together.

  • Virtex DSP48E Slice

    Efficient implementation of multiply, add, bit-wise logical.
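    A behavioral sketch of the multiply-add operation such a block provides. The operand widths used as defaults (25-bit by 18-bit signed multiply, 48-bit result) are assumptions not stated on the slide, and the function names are hypothetical; pipelining and the bit-wise logic modes are not modeled.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def dsp_multiply_add(a, b, c, a_bits=25, b_bits=18, p_bits=48):
    """Compute a * b + c with fixed operand widths and wraparound on overflow."""
    product = to_signed(a, a_bits) * to_signed(b, b_bits)
    return to_signed(product + to_signed(c, p_bits), p_bits)
```

    Feeding the result back in as `c` on the next call models a multiply-accumulate loop, the core operation of FIR filters and dot products.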

  • Block RAM Overview

    ❑ 36K bits of data total, can be configured as:

      ▪ 2 independent 18Kb RAMs, or one 36Kb RAM.

    ❑ Each 36Kb block RAM can be configured as:

      ▪ 64Kx1 (when cascaded with an adjacent 36Kb block RAM), 32Kx1, 16Kx2, 8Kx4, 4Kx9, 2Kx18, or 1Kx36 memory.

    ❑ Each 18Kb block RAM can be configured as:

      ▪ 16Kx1, 8Kx2, 4Kx4, 2Kx9, or 1Kx18 memory.

    ❑ Write and Read are synchronous operations.

    ❑ The two ports are symmetrical and totally independent (can have different clocks), sharing only the stored data.

    ❑ Each port can be configured in one of the available widths, independent of the other port. The read port width can be different from the write port width for each port.

    ❑ The memory content can be initialized or cleared by the configuration bitstream.
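    A quick check of the arithmetic behind the listed shapes. The shape strings are copied from the slide; the helper names are illustrative. The cascaded 64Kx1 case is left out because it spans two blocks.

```python
SHAPES_36KB = ["32Kx1", "16Kx2", "8Kx4", "4Kx9", "2Kx18", "1Kx36"]
SHAPES_18KB = ["16Kx1", "8Kx2", "4Kx4", "2Kx9", "1Kx18"]


def parse_shape(shape):
    """Turn a string such as '4Kx9' into (depth_in_words, width_in_bits)."""
    depth, width = shape.lower().split("x")
    words = int(depth[:-1]) * 1024 if depth.endswith("k") else int(depth)
    return words, int(width)


def capacity_kbits(shape):
    """Usable capacity of one configuration, in units of 1024 bits."""
    words, width = parse_shape(shape)
    return words * width // 1024


capacities_36 = [capacity_kbits(s) for s in SHAPES_36KB]
capacities_18 = [capacity_kbits(s) for s in SHAPES_18KB]
```

    The 36Kb shapes come out as 32, 32, 32, 36, 36, 36 Kb and the 18Kb shapes as 16, 16, 16, 18, 18 Kb. Only the widths that are multiples of 9 reach the full capacity; a likely explanation is that the ninth bit of each byte is a parity bit that the x1, x2, and x4 shapes do not expose.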

  • Ultra-RAM Blocks

  • State-of-the-Art Xilinx FPGAs

    Virtex UltraScale

  • Configuration Architecture

    ‣ How are the programming bits loaded and distributed?

    ‣ Configuration depth (number of stored on-chip configurations)

    ‣ Same interface often can provide read-back to save state/debug

    ‣ Design Challenge:

      ‣ Configurations are very large (100’s of Mbits)

      ‣ Moving many bits over chip interface requires time and energy

    Many commercial FPGAs also have an internal reconfiguration controller that allows dynamic self reconfiguration.
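    A back-of-the-envelope estimate shows why the interface matters. The bitstream size, bus widths, and clock rates below are hypothetical round numbers chosen only to match the "100's of Mbits" scale on the slide, and the function ignores protocol overhead.

```python
def config_time_seconds(config_bits, bus_width_bits, clock_hz):
    """Lower bound on load time: one bus word transferred per clock cycle."""
    return config_bits / (bus_width_bits * clock_hz)


BITSTREAM_BITS = 200e6  # hypothetical 200 Mbit configuration

serial_time = config_time_seconds(BITSTREAM_BITS, bus_width_bits=1, clock_hz=50e6)
parallel_time = config_time_seconds(BITSTREAM_BITS, bus_width_bits=32, clock_hz=100e6)
```

    Under these assumptions a 1-bit serial interface needs about 4 seconds while a 32-bit interface at twice the clock needs about 62.5 ms, a 64x difference that comes entirely from interface width and speed.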

  • Internal Reconfiguration

    ‣ Traditionally, long shift chains:

      ‣ slow, relatively energy efficient

    ‣ “Random access” structures have been tried.

      ‣ permits fine-grain partial reconfiguration

    Connections to logic blocks, programmable interconnection points, …
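    The trade-off between the two styles can be sketched with two toy classes. Both are illustrative assumptions: the class names, the 16-cell size, and the 4-bit word are hypothetical, and cost is counted simply as clocks or writes.

```python
class ShiftChain:
    """Configuration cells connected head to tail; one bit enters per clock."""

    def __init__(self, length):
        self.cells = [0] * length
        self.clocks = 0

    def load(self, target):
        # The first bit shifted in travels furthest, so send the last cell first.
        for bit in reversed(target):
            self.cells = [bit] + self.cells[:-1]
            self.clocks += 1


class RandomAccess:
    """Configuration cells grouped into addressable words."""

    def __init__(self, length, word_bits):
        self.cells = [0] * length
        self.word_bits = word_bits
        self.writes = 0

    def write_word(self, word_index, bits):
        start = word_index * self.word_bits
        self.cells[start:start + self.word_bits] = bits
        self.writes += 1


# Goal: starting from all zeros, set only cell 9 in a 16-cell configuration.
target = [0] * 16
target[9] = 1

chain = ShiftChain(16)
chain.load(target)                 # must stream the whole configuration

ram = RandomAccess(16, word_bits=4)
ram.write_word(2, target[8:12])    # touches only the word holding cell 9
```

    The shift chain spends 16 clocks to change one bit, while the addressable structure needs a single word write. That is what makes fine-grain partial reconfiguration practical, at the cost of address decoding hardware.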

  • Xilinx Configuration Layout

    ‣ “frame” is unit of reconfiguration

    ‣ serially loaded into chip

  • Multi-context FPGAs

    ‣ Rapid dynamic reconfiguration possible.

    ‣ What’s the execution and programming model?

    Garp: a MIPS processor with a reconfigurable coprocessor. Proceedings of the 5th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, 1997.

    “3-D FPGA”
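    A minimal sketch of the multi-context idea at the level of one LUT: several truth tables are stored side by side and a context-select value picks which one is read. The class and method names are hypothetical, and a real device would hold interconnect configuration per context as well.

```python
class MultiContextLUT:
    """Toy LUT that stores several truth tables and a context-select register."""

    def __init__(self, num_inputs, num_contexts):
        self.num_inputs = num_inputs
        self.tables = [[0] * (1 << num_inputs) for _ in range(num_contexts)]
        self.active = 0

    def configure(self, context, fn):
        # Loading a context is the slow step: one stored bit per table entry.
        for index in range(1 << self.num_inputs):
            inputs = [(index >> k) & 1 for k in range(self.num_inputs)]
            self.tables[context][index] = fn(*inputs) & 1

    def switch(self, context):
        # Switching only changes which stored table is read.
        self.active = context

    def read(self, *inputs):
        index = sum(bit << k for k, bit in enumerate(inputs))
        return self.tables[self.active][index]


lut = MultiContextLUT(num_inputs=2, num_contexts=2)
lut.configure(0, lambda a, b: a & b)   # context 0: AND
lut.configure(1, lambda a, b: a ^ b)   # context 1: XOR
```

    Calling `lut.switch(1)` changes the function from AND to XOR without reloading any bits, which is why reconfiguration can be rapid once the contexts are resident on chip. The sketch leaves open the slide's question of who decides when to switch.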

  • End of Lecture 4