CS250, UC Berkeley, Fall ‘20, Lecture 04: Reconfigurable Architectures 2
CS250 VLSI Systems Design
Fall 2020
John Wawrzynek
with Arya Reais-Parsi
FPGA Overview
Simplified version of FPGA internal architecture (not scalable)
‣ Basic idea: a two-dimensional array of logic blocks and flip-flops, with a means for the user to configure (program):
1. the interconnection between the logic blocks,
2. the function of each block.
Reconfigurable Fabric Architecture: Degrees of Freedom
1. Logic Blocks: capacity and internal structure of combinational logic circuits and state element(s); clustering and internal interconnect
2. Interconnection Network Architecture: circuit-switched, not packet-switched; topology of network
3. Configuration Architecture: how programming information is loaded and distributed; configuration “depth”
4. Hard blocks (RAM, ALUs, processor cores, …): function(s), count, and how they are integrated into the fabric
Xilinx Virtex-5
Colors represent different types of resources: Logic, Block RAM, DSP (ALUs), Clocking, I/O, Serial I/O + PCI.
A routing fabric runs throughout the chip to wire everything together.
Spring 2013 EECS150 - Lec02-SDS-FPGAs
Configurable Logic Blocks (CLBs)
Slices define regular connections to the switching fabric, and to slices in CLBs above and below it on the die.
Primitive: 5-input Look-Up Tables (LUTs)
[Figure: 5-LUT. Inputs A[6:2] address a table of configuration latches (Q), each holding 1 or 0; the selected value drives output D.]
Computes any 5-input logic function.
Timing is independent of function.
Latches set during configuration.
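The 5-LUT described above can be modeled in a few lines. This is a minimal sketch, not a description of the actual circuit: the names make_lut5 and truth_table_for are hypothetical, and the bit ordering of the inputs is an assumption made for illustration.

```python
def make_lut5(truth_table):
    """Return a function modeling a 5-input LUT.

    truth_table is a 32-bit integer; bit i holds the value of the
    configuration latch selected when the inputs encode the number i.
    (This input-to-address ordering is an assumption of the sketch.)
    """
    if not 0 <= truth_table < (1 << 32):
        raise ValueError("a 5-LUT stores exactly 32 configuration bits")

    def lut(a4, a3, a2, a1, a0):
        # The five inputs form an address into the 32 latches.
        index = (a4 << 4) | (a3 << 3) | (a2 << 2) | (a1 << 1) | a0
        return (truth_table >> index) & 1

    return lut


def truth_table_for(func):
    """Build the 32-bit configuration for any 5-input Boolean function."""
    table = 0
    for index in range(32):
        bits = [(index >> k) & 1 for k in (4, 3, 2, 1, 0)]
        table |= (func(*bits) & 1) << index
    return table


# Configure one LUT as 5-input XOR (parity) and another as 5-input AND.
xor5 = make_lut5(truth_table_for(lambda a, b, c, d, e: a ^ b ^ c ^ d ^ e))
and5 = make_lut5(truth_table_for(lambda a, b, c, d, e: a & b & c & d & e))
```

Both functions use the same lookup path, which is why the timing does not depend on the function being computed: only the stored bits differ.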
Virtex 6-LUTs: Composition of 5-LUTs
May be used as one 6-input LUT (D6 out) ...
... or as two 5-input LUTs (D6 and D5)
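A rough model of this composition is two 5-LUT tables that share five inputs, plus a 2:1 mux driven by the sixth input. The function names, the choice of the top address bit as mux select, and which table feeds which output are assumptions for illustration, not details taken from the slide.

```python
def lut5(table, address):
    """Read a 32-entry table at a 5-bit address."""
    return (table >> (address & 0x1F)) & 1


def lut6_fractured(table_lo, table_hi, address):
    """Model one 6-LUT composed of two 5-LUTs and a mux.

    address is a 6-bit input. Bits 0-4 feed both 5-LUTs; bit 5
    selects between them (assumed select input for this sketch).
    Returns (d6, d5): the 6-input result and the lower 5-LUT output.
    """
    low5 = address & 0x1F
    d5 = lut5(table_lo, low5)
    upper = lut5(table_hi, low5)
    d6 = upper if (address >> 5) & 1 else d5
    return d6, d5


# 6-input AND: only address 0b111111 returns 1, so the lower
# table is all zeros and the upper table has only its last bit set.
AND6_LO = 0
AND6_HI = 1 << 31
```

Holding the sixth input at 1 makes d6 follow the upper table while d5 follows the lower one, which gives two independent 5-input functions of the same five inputs.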
The simplest view of a slice
Four 6-LUTs
Four Flip-Flops
Switching fabric may see combinational and registered outputs.
An actual Virtex slice adds many small features to this simplified diagram. We show them one by one...
Two 7-LUTs per slice...
Extra multiplexers (F7AMUX, F7BMUX)
Extra inputs (AX and CX)
Or one 8-LUT per slice...
Third multiplexer (F8MUX)
Third input (BX)
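The wide-function muxes can be sketched as a small tree on top of the four 6-LUT outputs. The mux and input names (F7AMUX, F7BMUX, F8MUX, AX, CX, BX) come from the slides; which LUT pair feeds which mux and the select polarity are assumptions of this sketch.

```python
def lut6(table, address):
    """Read a 64-entry table at a 6-bit address."""
    return (table >> (address & 0x3F)) & 1


def mux2(select, in0, in1):
    """2:1 multiplexer: returns in1 when select is 1, else in0."""
    return in1 if select else in0


def slice_wide_functions(tables, address, ax, bx, cx):
    """Return (f7a, f7b, f8) for four 6-LUTs sharing one 6-bit address.

    tables maps the LUT names "A".."D" to 64-bit truth tables.
    Assumed wiring: F7AMUX joins LUTs A and B under AX, F7BMUX joins
    C and D under CX, and F8MUX joins the two F7 outputs under BX.
    """
    out = {name: lut6(tables[name], address) for name in "ABCD"}
    f7a = mux2(ax, out["A"], out["B"])   # first 7-input function
    f7b = mux2(cx, out["C"], out["D"])   # second 7-input function
    f8 = mux2(bx, f7a, f7b)              # single 8-input function
    return f7a, f7b, f8


# 8-input AND: only the last entry of the last table is 1.
TABLES_AND8 = {"A": 0, "B": 0, "C": 0, "D": 1 << 63}
```

With TABLES_AND8, the f8 output is 1 only when all six shared inputs and the three extra inputs are 1.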
Extra muxes to choose LUT option...
From eight 5-LUTs ... to one 8-LUT.
Combinational or registered outs.
Flip-flops unused by LUTs can be used standalone.
Virtex “Vertical” Logic
We can map ripple-carry addition onto the carry-chain block.
The carry-chain block is also useful for speeding up other adder structures and counters.
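One common way to map ripple-carry addition onto a carry chain is to let each LUT compute a per-bit propagate signal and let dedicated logic handle the carry. The sketch below follows that scheme; the split between LUT and dedicated logic shown in the comments is an assumption about the mapping, and carry_chain_add is a hypothetical name.

```python
def carry_chain_add(a, b, width, carry_in=0):
    """Add two width-bit numbers one bit position at a time.

    Returns (sum, carry_out), where sum is truncated to width bits.
    """
    total = 0
    carry = carry_in
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        propagate = ai ^ bi            # computed by the LUT at bit i
        sum_bit = propagate ^ carry    # dedicated XOR on the chain
        # Dedicated carry mux: pass the incoming carry when the bits
        # differ; otherwise the carry out equals ai (both 1 or both 0).
        carry = carry if propagate else ai
        total |= sum_bit << i
    return total, carry
```

For example, carry_chain_add(5, 3, 4) returns (8, 0), and carry_chain_add(15, 1, 4) returns (0, 1) because the 4-bit sum overflows.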
Putting it all together... a SLICEL.
The previous slides explain all SLICEL features.
About 50% of the slices are SLICELs.
The other slices are SLICEMs, and have extra features.
Recall: 5-LUT architecture...
[Figure: 5-LUT. Inputs A[6:2] address a table of configuration latches (Q), each holding 1 or 0; the selected value drives output D.]
32 latches. Configured to 1 or 0.
Some parts of a logic design need many state elements.
SLICEMs replace normal 5-LUTs with circuits that can act like 5-LUTs, but can alternatively use the 32 latches as RAM, ROM, shift registers.
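The idea of reusing the 32 latches as memory can be sketched as one storage array with three access styles. The class and method names are hypothetical, and clocking is reduced to "one method call per clock edge"; this is an illustration of the concept, not the SLICEM circuit.

```python
class MemoryLut:
    """32 storage bits usable as a LUT/ROM, a small RAM, or a shift register."""

    def __init__(self, init=0):
        self.bits = init & 0xFFFFFFFF

    def read(self, address):
        # LUT, ROM, and RAM reads all use the same lookup path.
        return (self.bits >> (address & 0x1F)) & 1

    def write(self, address, value):
        # RAM mode: one call models one clocked write.
        address &= 0x1F
        self.bits = (self.bits & ~(1 << address)) | ((value & 1) << address)

    def shift_in(self, value):
        # Shift-register mode: every stored bit moves up one position
        # and the new bit enters at position 0.
        self.bits = ((self.bits << 1) | (value & 1)) & 0xFFFFFFFF


ram = MemoryLut()
ram.write(3, 1)

delay_line = MemoryLut()
delay_line.shift_in(1)
for _ in range(4):
    delay_line.shift_in(0)
```

After these calls, ram.read(3) returns 1, and delay_line.read(4) returns 1 because the bit shifted in first has moved to position 4. Reading a chosen position of the shift register gives a programmable delay.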
EE141
A SLICEM 6-LUT...
Normal 6-LUT inputs.
Normal 5/6-LUT outputs.
Memory write address.
Memory data inputs (two labeled in the figure).
Control output for chaining LUTs to make larger memories.
Synchronous write / asynchronous read
SLICEL vs SLICEM...
[Figure: SLICEM and SLICEL shown side by side.]
SLICEM adds memory features to LUTs, + muxes.
Distributed RAM Primitives
All are built from a single slice or less.
Remember, though, that the SLICEM LUT is naturally only 1 read and 1 write port.
Configurable Interconnect
‣ Design Challenges (topology):
  ‣ traversing long wires incurs delay and energy
  ‣ switches (transistors) add significant delay
  ‣ Mapping time
“connection block”
switch matrix could be more richly populated
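To put a rough number on how richly a switch matrix is populated, one can count programmable switches for two illustrative extremes. Both models are assumptions introduced here, not taken from the slide: a sparse matrix where each track connects only to the same-numbered track on the other three sides, and a fully populated matrix where each track can reach every track on the other sides.

```python
SIDE_PAIRS = 6  # a four-sided switch matrix has 4-choose-2 pairs of sides


def switch_count_sparse(tracks_per_side):
    """Switches when track i connects only to track i on each other side."""
    return SIDE_PAIRS * tracks_per_side


def switch_count_full(tracks_per_side):
    """Switches when every track connects to every track on each other side."""
    return SIDE_PAIRS * tracks_per_side * tracks_per_side


for width in (4, 8, 16):
    print(width, switch_count_sparse(width), switch_count_full(width))
```

The sparse count grows linearly with channel width and the full count grows quadratically, so a richer matrix buys routing flexibility at the cost of more switch area and more loading on each wire.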
Xilinx FPGAs (tile interconnect detail)
Other Topologies
‣ Traditional:
‣ From Flex Logix, Inc.:
Clos Network
“uses about half the area of the traditional interconnect and uses only 5-7 metal routing layers”
Fat-Tree Based Interconnect
‣ Use “Rent’s rule” for proper thickness
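Rent's rule is usually written T = t * g^p, where g is the number of blocks in a region, t is the average number of terminals per block, p is the Rent exponent, and T is the number of connections leaving the region. The sketch below applies it to size the links of a fat tree. The binary tree shape and the parameter values in the example are assumptions for illustration.

```python
import math


def rent_terminals(blocks, terminals_per_block, rent_exponent):
    """Rent's rule: external connections needed by a group of blocks."""
    return terminals_per_block * blocks ** rent_exponent


def fat_tree_link_widths(levels, terminals_per_block, rent_exponent):
    """Wires on the upward link at each level of a binary fat tree.

    Level 0 is a single leaf block; the subtree below a level-k link
    contains 2**k leaf blocks (binary branching is an assumption).
    """
    return [
        math.ceil(rent_terminals(2 ** level, terminals_per_block, rent_exponent))
        for level in range(levels + 1)
    ]


# Hypothetical parameters: 4 terminals per block, Rent exponent 0.5.
widths = fat_tree_link_widths(4, terminals_per_block=4, rent_exponent=0.5)
```

With an exponent below 1 the links get thicker toward the root, but more slowly than the number of blocks beneath them, which is what keeps the wiring area manageable.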
Embedded Hard Blocks
‣ Many important functions are not efficient when implemented in the reconfigurable fabric:
  ‣ multiplication, large memory, processor cores, …
‣ Dedicated blocks take relatively little area and can therefore afford to go unused.
Xilinx Virtex-5
Colors represent different types of resources: Logic, Block RAM, DSP (ALUs), Clocking, I/O, Serial I/O + PCI.
A routing fabric runs throughout the chip to wire everything together.
Virtex DSP48E Slice
Efficient implementation of multiply, add, bit-wise logical.
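A behavioral sketch of the multiply-add path follows. The operand widths used as defaults (25-bit and 18-bit signed inputs, 48-bit result) are not stated on the slide; they are assumed here from the commonly documented DSP48E datapath, and the function names are hypothetical.

```python
def wrap_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def dsp_multiply_accumulate(a, b, acc, a_bits=25, b_bits=18, p_bits=48):
    """Return acc + a * b, wrapped to the accumulator width."""
    product = wrap_signed(a, a_bits) * wrap_signed(b, b_bits)
    return wrap_signed(acc + product, p_bits)


def dot_product(xs, ys):
    """Reuse one multiply-accumulate unit for one term per step."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = dsp_multiply_accumulate(x, y, acc)
    return acc
```

For example, dot_product([1, 2, 3], [4, 5, 6]) returns 32. Building the same multiplier from LUTs and carry chains would consume many slices, which is the motivation for a dedicated block.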
Block RAM Overview
❑ 36K bits of data total, can be configured as:
  ▪ 2 independent 18Kb RAMs, or one 36Kb RAM.
❑ Each 36Kb block RAM can be configured as:
  ▪ 64Kx1 (when cascaded with an adjacent 36Kb block RAM), 32Kx1, 16Kx2, 8Kx4, 4Kx9, 2Kx18, or 1Kx36 memory.
❑ Each 18Kb block RAM can be configured as:
  ▪ 16Kx1, 8Kx2, 4Kx4, 2Kx9, or 1Kx18 memory.
❑ Write and Read are synchronous operations.
❑ The two ports are symmetrical and totally independent (can have different clocks), sharing only the stored data.
❑ Each port can be configured in one of the available widths, independent of the other port. The read port width can be different from the write port width for each port.
❑ The memory content can be initialized or cleared by the configuration bitstream.
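The listed aspect ratios can be checked with a little arithmetic. The sketch below tabulates depth times width for each single-block configuration; the constant and function names are hypothetical, and the 64Kx1 case is left out because it spans two cascaded blocks.

```python
K = 1024

# (depth, width) pairs copied from the configuration lists above.
CONFIGS_36KB = [(32 * K, 1), (16 * K, 2), (8 * K, 4),
                (4 * K, 9), (2 * K, 18), (1 * K, 36)]
CONFIGS_18KB = [(16 * K, 1), (8 * K, 2), (4 * K, 4),
                (2 * K, 9), (1 * K, 18)]


def total_bits(depth, width):
    """Number of storage bits addressed by one configuration."""
    return depth * width


def address_bits(depth):
    """Address lines needed for a power-of-two depth."""
    return depth.bit_length() - 1


for depth, width in CONFIGS_36KB:
    print(f"{depth // K}Kx{width}: {total_bits(depth, width) // K}K bits, "
          f"{address_bits(depth)} address bits")
```

The x1, x2, and x4 shapes address 32K bits of a 36Kb block, while the x9, x18, and x36 shapes address all 36K. The extra ninth bit per byte in the wider modes is usually described as a parity bit, though the slide does not say so.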
Ultra-RAM Blocks
State-of-the-Art Xilinx FPGAs
Virtex UltraScale
Configuration Architecture
‣ How are the programming bits loaded and distributed?
‣ Configuration depth (number of stored on-chip configurations)
‣ Same interface often can provide read-back to save state/debug
‣ Design Challenge:
  ‣ Configurations are very large (100’s of Mbits)
  ‣ Moving many bits over chip interface requires time and energy
Many commercial FPGAs also have an internal reconfiguration controller that allows dynamic self-reconfiguration.
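A back-of-the-envelope estimate shows why moving the configuration takes noticeable time. The bitstream size, interface widths, and clock rates below are hypothetical values chosen for illustration, not figures for any particular device.

```python
def configuration_time_seconds(config_bits, interface_width_bits, clock_hz):
    """Time to load a configuration at one interface word per clock cycle."""
    cycles = -(-config_bits // interface_width_bits)  # ceiling division
    return cycles / clock_hz


# Hypothetical 200 Mbit configuration over two hypothetical interfaces.
narrow = configuration_time_seconds(200_000_000, 8, 50_000_000)
wide = configuration_time_seconds(200_000_000, 32, 100_000_000)
```

Here narrow is 0.5 seconds and wide is 0.0625 seconds. Either one is an enormous number of clock cycles of the user design, which is part of the motivation for partial reconfiguration and for keeping more than one configuration on chip.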
Internal Reconfiguration
‣ Traditionally, long shift chains:
  ‣ slow, relatively energy efficient
‣ “Random access” structures have been tried.
  ‣ permits fine-grain partial reconfiguration
Connections to logic blocks, programmable interconnection points, …
Xilinx Configuration Layout
‣ “frame” is unit of reconfiguration
‣ serially loaded into chip
Multi-context FPGAs
‣ Rapid dynamic reconfiguration possible.
‣ What’s the execution and programming model?
Garp: a MIPS processor with a reconfigurable coprocessor. Published 1997, Proceedings. The 5th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.
“3-D FPGA”
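The multi-context idea can be sketched as a LUT that stores several truth tables at once and selects one with a context index. The class is hypothetical and shows only the storage and selection; it does not address the execution and programming model question raised above.

```python
class MultiContextLut:
    """A LUT holding several stored configurations (contexts)."""

    def __init__(self, tables):
        self.tables = list(tables)
        self.active = 0

    def switch_context(self, context):
        # Changing context is a select operation, not a bitstream load.
        if not 0 <= context < len(self.tables):
            raise IndexError("no such stored configuration")
        self.active = context

    def evaluate(self, address):
        return (self.tables[self.active] >> address) & 1


# Two contexts for a 2-input LUT: AND (0b1000) and XOR (0b0110).
lut = MultiContextLut([0b1000, 0b0110])
and_result = lut.evaluate(0b11)
lut.switch_context(1)
xor_result = lut.evaluate(0b11)
```

Here and_result is 1 and xor_result is 0: the same inputs give different outputs after the context switch, with no configuration data moved.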
End of Lecture 4