FPGA Architectures and Operation for Tolerating SEUs
Chuck Stroud
Electrical and Computer Engineering
Auburn University
1/31/07 VLSI Design & Test Seminar 2
Outline of Presentation
- Field Programmable Gate Arrays (FPGAs)
  - How Programmable Logic Works
  - Configuration Memory
- Single Event Upset (SEU) Problem in FPGAs
  - Configuration Memory
  - System Function Memory Elements
- Architectural Solutions
  - Hamming Code for Memory
  - SEU Controller for Configuration Memory
  - Triple Modular Redundancy and Guard Bands
- Operational Solutions
  - Plan for AubieSat-2
- Summary & Conclusions

Basic FPGA Operation
- Writing configuration memory ⇒ defines system function
  - Input/Output (I/O) cells
  - Logic in logic blocks
  - Connections between logic blocks & I/O cells
- Changing configuration memory data ⇒ changes system function
  - Can change at any time
  - Partial reconfiguration
- SEUs can change configuration memory data to another function
[Figure: FPGA overlaid with a configuration memory bitstream]

FPGA Resources

FPGA Resource                       Small FPGA    Large FPGA
Logic
  PLBs per FPGA                            256        25,920
  LUTs and flip-flops per PLB                1             8
Routing
  Wire segments per PLB                     45           406
  PIPs per PLB                             139         3,462
Specialized Cores
  Bits per memory core                     128        36,864
  Memory cores per FPGA                     16           576
  DSP cores                                  0           512
Other
  Input/output cells                        62         1,200
  Configuration memory bits             42,104    79,704,832

- Almost everything in FPGA either
  - Uses memory elements, or
  - Is controlled by configuration memory

PLB Architecture
- Look-up Table (LUT) implements truth table for combinational logic functions
- Carry & control logic implements fast adders/subtractors
- Memory elements susceptible to SEUs:
  - Flip-flop/latch
  - LUTs are memory elements storing truth table
    - In some FPGAs LUTs can function as small RAMs
[Figure: PLB block diagram. Input[1:4] feeds the LUT/RAM; carry & control logic sits between carry in and carry out; the flip-flop/latch takes clock, enable, and set/reset controls; the block drives Output and Q output.]

Combinational Logic Functions
- Any digital logic function can be represented by a truth table
- Multiplexer example
  - If S = 0, Z = A
  - If S = 1, Z = B
  - Heavily used in FPGAs
    - S input controlled by configuration memory bit to allow selection of signal flow
[Figure: multiplexer logic symbol with data inputs A (selected by 0) and B (selected by 1), select input S, and output Z]

Truth table
S A B | Z
0 0 0 | 0
0 0 1 | 0
0 1 0 | 1
0 1 1 | 1
1 0 0 | 0
1 0 1 | 1
1 1 0 | 0
1 1 1 | 1
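
The multiplexer rule on this slide (Z = A when S = 0, Z = B when S = 1) can be checked against its truth table with a few lines of Python. This is only an illustrative sketch; the function names `mux` and `truth_table` are not from the presentation.

```python
from itertools import product


def mux(s: int, a: int, b: int) -> int:
    """2-to-1 multiplexer: Z = A when S = 0, Z = B when S = 1."""
    return b if s else a


def truth_table():
    """Enumerate every (S, A, B, Z) row in binary counting order."""
    return [(s, a, b, mux(s, a, b)) for s, a, b in product((0, 1), repeat=3)]


if __name__ == "__main__":
    print("S A B Z")
    for row in truth_table():
        print(*row)
```

The Z column produced this way is the 8-bit pattern that a look-up table would store for the multiplexer function.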

Look-up Tables
- Configuration memory holds outputs for truth table
- Internal signals connect to control signals of multiplexers to select value of truth table for any given input value

Multiplexer truth table
S A B | Z
0 0 0 | 0
0 0 1 | 0
0 1 0 | 1
0 1 1 | 1
1 0 0 | 0
1 0 1 | 1
1 1 0 | 0
1 1 1 | 1

[Figure: LUT drawn as a tree of 2-to-1 multiplexers. Inputs B, A, and S drive the select lines, and the configuration memory bits at the leaves hold the Z column of the truth table.]
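
A LUT of this kind can be modeled as a multiplexer tree reading from a small list of configuration bits. The sketch below assumes a 3-input LUT whose stored bits are indexed by (S, A, B) in binary order; the names `lut3`, `flip_bit`, and `MUX_CONFIG` are hypothetical. It also shows how a single upset in a stored bit changes the implemented function for exactly one input combination.

```python
def mux(s, a, b):
    """2-to-1 multiplexer: returns a when s = 0, b when s = 1."""
    return b if s else a


def lut3(config_bits, in2, in1, in0):
    """3-input LUT: a tree of 2-to-1 muxes over 8 configuration bits.

    The bit selected is config_bits[4*in2 + 2*in1 + in0].
    """
    level1 = [mux(in0, config_bits[i], config_bits[i + 1]) for i in range(0, 8, 2)]
    level2 = [mux(in1, level1[i], level1[i + 1]) for i in range(0, 4, 2)]
    return mux(in2, level2[0], level2[1])


def flip_bit(config_bits, index):
    """Model an SEU: return a copy of the configuration with one bit inverted."""
    upset = list(config_bits)
    upset[index] ^= 1
    return upset


# Z column of the multiplexer truth table, indexed by (S, A, B).
MUX_CONFIG = [0, 0, 1, 1, 0, 1, 0, 1]

if __name__ == "__main__":
    upset = flip_bit(MUX_CONFIG, 2)  # corrupt the entry for S=0, A=1, B=0
    for index in range(8):
        s, a, b = (index >> 2) & 1, (index >> 1) & 1, index & 1
        print(s, a, b, lut3(MUX_CONFIG, s, a, b), lut3(upset, s, a, b))
```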

Look-up Table Based RAMs
- Normal LUT mode performs read operations
- Address decoder with write enable generates load signals to latches for write operations
- Small RAMs but can be combined for larger RAMs
[Figure: LUT-based RAM. An address decoder with Write Enable and inputs In0, In1, In2 generates load signals ld0 to ld7 for the latches fed by Data In; a multiplexer tree selected by In0, In1, In2 drives output Z.]
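
The read and write paths of a LUT-based RAM can be sketched behaviorally. The model below assumes an 8x1-bit LUT-RAM with one-hot load signals ld0 to ld7, as in the figure; the class `LutRam` and the helper `read16` are hypothetical names, and `read16` is just one possible way of combining two small RAMs into a larger one.

```python
class LutRam:
    """8x1-bit LUT-based RAM: reads are LUT look-ups, writes go through a decoder."""

    def __init__(self, contents=None):
        self.latches = list(contents) if contents is not None else [0] * 8

    @staticmethod
    def decode(address, write_enable):
        """Address decoder with write enable: one-hot load signals ld0..ld7."""
        return [int(bool(write_enable) and address == i) for i in range(8)]

    def write(self, address, data_in, write_enable=1):
        """Load data_in into the latch whose load signal is asserted."""
        for i, load in enumerate(self.decode(address, write_enable)):
            if load:
                self.latches[i] = data_in

    def read(self, address):
        """Normal LUT mode: the address selects one stored bit."""
        return self.latches[address]


def read16(low, high, address):
    """Combine two 8x1 LUT-RAMs into a 16x1 RAM using the top address bit."""
    bank = high if address & 0b1000 else low
    return bank.read(address & 0b0111)
```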

Xilinx Virtex-4 FPGAs
- Configuration memory: 4.7M to 50.8M bits of RAM
- Logic Blocks: 1,536 to 22,272
  - 4 LUTs (4-input)
  - 4 LUTs/RAMs (4-input)
  - 8 FF/latches
- Block RAMs: 48 to 552 18K-bit dual-port RAMs
  - Also operate as FIFOs
- DSP cores: 32 to 512, each includes:
  - 18x18-bit multiplier
  - 48-bit adder & accumulator
- PowerPC processors: 0 to 2
[Figure: Virtex-4 layout showing PowerPC (PC) blocks]

It's Getting Worse All The Time
- Smaller design rules & lower supply voltages
- M. Ohlsson, P. Dyreklev, K. Johansson, & P. Alfke, "Neutron Single Event Upsets in SRAM-Based FPGAs," Proc. 1998 IEEE Nuclear & Space Radiation Effects Conf.
  - Used radiation chamber to calculate SEU frequency at altitude of 10 km at 60°N (Sweden)

FPGA           XC4010E         XC4010XL
Process        0.60 µm         0.35 µm
Vcc            5 V             3.3 V
1 SEU every    1.3x10^6 hrs    2.8x10^5 hrs

- Increase by a factor of 21.5
- Projecting this for 3 design rule shrinks & 2 voltage reductions we get ≈ 1 SEU every 28.2 hrs
- 400 slices in 4010 vs. 89,088 in Virtex-4

Hardware Solutions
- FPGA manufacturers are including some mechanisms for
  - Detecting/correcting SEUs
    - Hamming code
      - Configuration memory
        - SEU controller soft core
      - RAM cores
  - Tolerating SEUs
    - Tools for Triple Modular Redundancy (TMR)
      - TMR would be used for FPGA memory elements not covered by Hamming code
      - Allows limited number of SEUs to be tolerated
- Need more & better techniques

Calculating Hamming Code (Hamming, BSTJ '50)
- H = # Hamming bits, D = # data bits
  - D + H + 1 ≤ 2^H
- D = 8 example
  - H1 = D1 ⊕ D2 ⊕ D4 ⊕ D5 ⊕ D7
  - H2 = D1 ⊕ D3 ⊕ D4 ⊕ D6 ⊕ D7
  - H3 = D2 ⊕ D3 ⊕ D4 ⊕ D8
  - H4 = D5 ⊕ D6 ⊕ D7 ⊕ D8
- Hamming distance, d = 3 = E + C + 1
  - Single bit error detection & correction (SEC)
    - E = 1, C = 1
- Additional parity bit, d = 4 = E + C + 1
  - Parity over data & Hamming bits
  - Double error detection (DED) & single error correction (SEC)
    - E = 2, C = 1
- E = # bit errors to detect, C = # bit errors to correct

Position     12  11  10   9   8   7   6   5   4   3   2   1
Bit          D8  D7  D6  D5  H4  D4  D3  D2  H3  D1  H2  H1
Parity H4     1   1   1   1   1   0   0   0   0   0   0   0
Parity H3     1   0   0   0   0   1   1   1   1   0   0   0
Parity H2     0   1   1   0   0   1   1   0   0   1   1   0
Parity H1     0   1   0   1   0   1   0   1   0   1   0   1

Condition                             Error Type
Hamming match, no parity error        No bit error
Hamming mismatch, parity error        1-bit correctable error
Hamming mismatch, no parity error     2-bit error detection
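
The bound D + H + 1 ≤ 2^H and the D = 8 parity equations can be written out directly. The following is a minimal sketch; the names `hamming_bits`, `COVERAGE`, and `encode` are illustrative, and even parity for the overall parity bit is an assumption (the slide says only that it covers the data and Hamming bits).

```python
def hamming_bits(d):
    """Smallest H that satisfies D + H + 1 <= 2**H."""
    h = 0
    while d + h + 1 > 2 ** h:
        h += 1
    return h


# Data bits (1-based D indices) covered by each Hamming bit in the D = 8 example.
COVERAGE = {
    "H1": (1, 2, 4, 5, 7),
    "H2": (1, 3, 4, 6, 7),
    "H3": (2, 3, 4, 8),
    "H4": (5, 6, 7, 8),
}


def encode(data):
    """data = [D1, ..., D8]; returns ([H1, H2, H3, H4], overall_parity).

    Assumption: the overall parity bit is even parity over data and Hamming bits.
    """
    hamming = []
    for name in ("H1", "H2", "H3", "H4"):
        bit = 0
        for i in COVERAGE[name]:
            bit ^= data[i - 1]
        hamming.append(bit)
    parity = 0
    for bit in list(data) + hamming:
        parity ^= bit
    return hamming, parity


if __name__ == "__main__":
    for d in (8, 64, 1300):
        print(d, "data bits need", hamming_bits(d), "Hamming bits")
```

The same bound gives 7 Hamming bits for 64 data bits and 11 Hamming bits for 1,300 data bits.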

Hamming Code Operation
- Example: RAM or configuration memory
- Input (Generate Circuit):
  - Generate Hamming code for data
  - Store data and Hamming bits
- Output (Detect/Correct Circuit):
  - Regenerate Hamming code for data
  - Bit-wise XOR with stored Hamming bits
    - Non-zero syndrome indicates error detection and bit position of error bit
    - Flip that bit to correct
  - Extra parity bit determines non-correctable double bit error
    - Indication can disable correction circuit to avoid further corruption
[Figure: H(stored) XOR H(regenerated) = Syndrome; the Syndrome Decoder output is XORed with each Di to produce the corrected Di]

Error Detection and Correction
- Single bit error examples
  - D3 is erroneous
    - Changes H3 and H2
    - Syndrome = 0110 = bit 6
  - D6 is erroneous
    - Changes H4 and H2
    - Syndrome = 1010 = bit 10
  - Odd number of bits change
    - Overall parity bit error ⇒ SEC
- Double bit error example
  - D3 and D6 are erroneous
    - Changes H3 and H4 (but not H2)
    - Syndrome = 1100 = bit 12
      - Indicates error in D8
  - Even number of bits change
    - No overall parity error ⇒ DED

Position     12  11  10   9   8   7   6   5   4   3   2   1
Bit          D8  D7  D6  D5  H4  D4  D3  D2  H3  D1  H2  H1
Parity H4     1   1   1   1   1   0   0   0   0   0   0   0
Parity H3     1   0   0   0   0   1   1   1   1   0   0   0
Parity H2     0   1   1   0   0   1   1   0   0   1   1   0
Parity H1     0   1   0   1   0   1   0   1   0   1   0   1

Syndromes (H4 H3 H2 H1): D3 error = 0110, D6 error = 1010, D3 and D6 errors = 1100
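
The three cases on this slide can be reproduced with a small detect/correct model for the D = 8 code. This is a sketch under stated assumptions: the function names are hypothetical, the overall parity bit is taken to be even parity over data and Hamming bits, and the status strings simply reuse the slide's SEC/DED labels.

```python
# Data bits (1-based D indices) covered by H1, H2, H3, H4 in the D = 8 example.
COVERAGE = (
    (1, 2, 4, 5, 7),  # H1
    (1, 3, 4, 6, 7),  # H2
    (2, 3, 4, 8),     # H3
    (5, 6, 7, 8),     # H4
)
# Codeword position (1..12) of each data bit D1..D8.
DATA_POSITION = (3, 5, 6, 7, 9, 10, 11, 12)


def hamming(data):
    """Generate [H1, H2, H3, H4] for data = [D1, ..., D8]."""
    return [sum(data[i - 1] for i in cover) % 2 for cover in COVERAGE]


def overall_parity(data, h):
    """Assumed even parity over data and Hamming bits."""
    return (sum(data) + sum(h)) % 2


def check(data, stored_h, stored_p):
    """Return (status, syndrome, corrected_data) for a word read from memory."""
    diff = [a ^ b for a, b in zip(hamming(data), stored_h)]
    syndrome = diff[0] | (diff[1] << 1) | (diff[2] << 2) | (diff[3] << 3)
    parity_error = overall_parity(data, stored_h) != stored_p
    if syndrome == 0 and not parity_error:
        return "no error", 0, list(data)
    if parity_error:
        # Odd number of bits changed: treat as a single, correctable error.
        corrected = list(data)
        if syndrome in DATA_POSITION:
            corrected[DATA_POSITION.index(syndrome)] ^= 1
        return "SEC", syndrome, corrected
    # Hamming mismatch without a parity error: double error, do not correct.
    return "DED", syndrome, list(data)
```

In the double-error case the syndrome value (12) points at D8, which is why the correction step is skipped whenever the overall parity shows no error.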

Virtex-4 Hamming Codes
- Hamming bits stored in each frame of configuration memory
  - Frame ECC circuit checks Hamming code as each frame is read & indicates
    - Single correctable errors
      - Need additional circuit to fix erroneous bit
    - Multiple non-correctable errors
      - Need to reload configuration memory
- Block RAMs
  - Contents not covered by configuration memory Hamming bits
  - RAMs have ECC mode with Hamming bits
  - Detection and correction circuitry
    - Correction only on output data
    - Need to write corrected data back in RAM
[Figure: Virtex-4 layout showing PowerPC (PC) blocks]

Xilinx Virtex-4 Frame ECC Circuit
- Hamming code stored in configuration memory
  - 1,312-bit frame includes
    - Up to 1,300 bits of configuration data
    - 11 Hamming bits + 1 overall parity bit
  - Hamming code generated by configuration bit generation program and downloaded with configuration data
- Hamming code check performed on each read operation
- No bit error correction
  - Must be performed by user logic and written back to configuration memory
- Status indications:
  - No error
  - DED
  - SEC w/ syndrome
  - Syndrome valid
[Figure: FRAME ECC block diagram. Config memory (1,312-bit words, addressed by the Frame Address Register) supplies data D and Hamming bits H; a Hamming Code Generator and Parity Bit Generator feed the Hamming Check and Parity Check, which drive the error indicators DED, SEC, and Syndrome alongside the Output Data.]

Xilinx's SEU Controller
- Soft core synthesized with user's design
  - Sequences through frames one at a time
  - Uses Frame ECC circuit and Internal Configuration Access Port (ICAP) to detect
    - Single bit detectable errors
      - PicoBlaze microcontroller corrects bit and writes frame back into configuration memory
    - Double bit non-correctable errors
- Requires ≈ 140 PLBs & 2 Block RAMs
  - ≈ 30 PLBs for PicoBlaze and 1 RAM for program memory
  - ≈ 110 PLBs for SEC circuit and ICAP interface
    - Plus 1 RAM for storing and correcting frame data
- SEU controller operation (full chip @ 100 MHz)
  - Error detection time ≈ 1.2 to 14.6 msec
    - Smallest to largest Virtex-4
  - Error correction time ≈ 24 to 278 msec

Complicating the Problem
- Block RAM contents not covered by configuration memory Hamming bits
  - Current program memory for PicoBlaze not SEU tolerant
- Changing data in memory elements
  - FFs & LUT-RAMs
  - Do not change Hamming bits
- Restore operation
  - Loads config memory data into FFs, LUT-RAMs, and BRAMs
- Capture operation
  - Loads FF, LUT-RAM, and BRAM contents to config mem for read
  - Destroys Hamming information
  - Cannot use Capture with SEU controller
    - Operational restrictions on FPGA for SEU tolerance
- SEU controller not SEU-tolerant
  - Need TMR SEU controller design
  - Need TMR PicoBlaze design w/ ECC RAM for program mem
    - Need to write corrected single bit errors back into program memory

Virtex-4 Block RAMs
- Contain 48 to 552 18K-bit dual-port RAMs
  - Program from 16Kx1-bit RAM to 512x36-bit RAM
  - No SEU protection in these modes of operation
- Can operate as 24 to 276 36K-bit RAMs with ECC
  - 512x72-bit RAMs
    - 64-bit data
    - 7-bit Hamming
      - Single error correction
    - 1-bit overall parity
      - Double error detection
- Can also operate as FIFOs
  - With or without ECC mode
[Figure: Virtex-4 layout with legend: DSPs, PLBs, Block RAMs/FIFOs, I/O Buffers, PowerPC (PPC)]

Xilinx Virtex-4 ECC RAM
- Separate Hamming code generators
  - Separate write & read ports
- Only RAM output data corrected by ECC
  - Contents of RAM still erroneous
  - Extra circuitry to write corrected data back into RAM
    - Virtex-5 has internal correct mode

Syndrome decoding (rows: upper 4 syndrome bits; columns: lower 3 syndrome bits)

Syndrome   111   110   101   100   011   010   001   000
1000       D64   D63   D62   D61   D60   D59   D58   H7
0111       D57   D56   D55   D54   D53   D52   D51   D50
0110       D49   D48   D47   D46   D45   D44   D43   D42
0101       D41   D40   D39   D38   D37   D36   D35   D34
0100       D33   D32   D31   D30   D29   D28   D27   H6
0011       D26   D25   D24   D23   D22   D21   D20   D19
0010       D18   D17   D16   D15   D14   D13   D12   H5
0001       D11   D10   D9    D8    D7    D6    D5    H4
0000       D4    D3    D2    H3    D1    H2    H1    no err

[Figure: ECC RAM block diagram. RAM core: 512 words of 64+7+1 bits/word (D = 64, H = 7), with separate read addr and write addr. Generate side: Input Data feeds a Hamming Code Generator and Parity Bit Generator. Detect/Correct side: a second Hamming Code Generator and Parity Bit Generator feed the Hamming Check and Parity Check, which drive the Bit Error Correction Circuit (Output Data) and the error indicators DED and SEC.]
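
The syndrome table for the 64-bit data word follows the usual Hamming layout: power-of-two positions hold Hamming bits and the remaining positions hold data bits in order. The sketch below regenerates that mapping; `syndrome_map` and `print_table` are hypothetical names, and the grid orientation simply mirrors the table on this slide.

```python
def syndrome_map(data_bits=64):
    """Map each syndrome value (= codeword position) to the bit it identifies."""
    names = {0: "no err"}
    position, d, h = 1, 1, 1
    while d <= data_bits:
        if (position & (position - 1)) == 0:  # power of two: Hamming bit
            names[position] = f"H{h}"
            h += 1
        else:                                  # otherwise: next data bit
            names[position] = f"D{d}"
            d += 1
        position += 1
    return names


def print_table(names):
    """Print the map as a grid: upper 4 syndrome bits by lower 3 syndrome bits."""
    header = " ".join(f"{col:03b}".rjust(6) for col in range(7, -1, -1))
    print("Syndrome " + header)
    for row in range(8, -1, -1):
        cells = [names.get(row * 8 + col, "") for col in range(7, -1, -1)]
        print(f"{row:04b}     " + " ".join(cell.rjust(6) for cell in cells))


if __name__ == "__main__":
    print_table(syndrome_map(64))
```

Calling `syndrome_map(8)` gives the 12-position layout used in the D = 8 example.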

Triple Modular Redundancy (TMR)
- Replicate modules and add majority voter(s)
  - Protects against single faults in replicated modules
- TMR SEU susceptibility problem in FPGAs
  - Single faults can cause multiple modules to fail
    - Primarily bi-directional PIPs
- TMR fault isolation with guard band regions
  - Guard bands isolate module components and routing
  - An SEU can cause errors in only one module
[Figure: Modules 1, 2, and 3 feeding a majority voter; a deactivated switch between isolated wire segments; the same three modules separated by guard bands]

Programmable Interconnect Points
- Break-point PIP
  - Connect or isolate 2 wire segments
- Cross-point PIP
  - 2 nets straight through
  - 1 net turns corner and/or fans out
- Compound cross-point PIP
  - Collection of 6 break-point PIPs
  - Can route to two isolated signal nets
- These bi-directional PIPs were a significant portion of routing resources in early FPGAs
  - Now less than 0.4% of routing resources
- Multiplexer PIP
  - Directional and buffered
  - Main routing resource in recent FPGAs
  - Select 1-of-N inputs for output
  - Buffer prevents some SEU effects
    - But not all; currently studying effects

Guard Bands
- Guard Bands reduce interaction of signals between modules
- 6 CLB wide GBs
  - Good isolation but big area overhead
- 1 CLB wide GBs
  - Some isolation
  - Turn off stub trimming to see used wire segment interaction
- Still have problems
  - Long lines
    - Long lines use bi-directional PIPs
  - PACE controls logic but not routing
- CLB isolation for fault monitoring circuits

Fault Monitoring Circuit
- Located in guard band regions
- Compares outputs of adjacent working regions
  - Can be used to compare internal nodes
    - Earlier SEU detection than output alone
- Any mismatch implies SEU occurred
- Count errors and/or take action
  - Scrub configuration memory
  - Activate SEU controller to locate/correct single bit errors
    - Failure indications point to frames to scan for errors
[Figure: guard band with fault monitor circuit between Modules 1, 2, and 3. The output from region #1 and the output from region #2 are compared through PLBs used for fault isolation; a mismatch sets an SR latch that raises an interrupt to the SEU controller.]
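
The monitor's behavior (compare two region outputs, latch any mismatch, signal the SEU controller) can be sketched as a small state machine. This is a behavioral illustration only: the class `FaultMonitor`, its method names, and the policy of counting one error per latched event are assumptions, not details from the slide.

```python
class FaultMonitor:
    """Compares the outputs of two working regions; an SR latch holds any mismatch."""

    def __init__(self):
        self.latched = 0      # SR latch output, used as the interrupt line
        self.error_count = 0  # number of mismatch events seen since start

    def clock(self, out_region1, out_region2):
        """Evaluate one comparison and return the interrupt to the SEU controller."""
        mismatch = out_region1 ^ out_region2
        if mismatch and not self.latched:
            self.error_count += 1   # count each newly latched event once
        self.latched |= mismatch    # set input of the SR latch
        return self.latched

    def reset(self):
        """Clear the latch, for example after the SEU controller has scrubbed."""
        self.latched = 0
```

Because the latch holds its value, a transient mismatch still reaches the SEU controller even if the two outputs agree again on the next cycle.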

Majority Voter for SEU Controller
- Adding XORs to majority voting circuit gives circular comparison of module outputs
  - Better diagnostic resolution for faulty modules to scan for SEU controller
  - Lower latency for locating/correcting SEUs
[Figure: majority voter with XOR comparators on Out1, Out2, Out3, shown for the three single-disagreement cases (Out1, Out2, Out3) = (1, 1, 0), (0, 1, 1), and (1, 0, 1), with the disagreeing module marked X]
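
A majority voter with circular XOR comparison can be sketched for single-bit outputs as follows. The function name `vote` and the convention of returning the faulty module number (or None) are illustrative choices; the slide specifies only that XORs are added to the voter to compare module outputs.

```python
def vote(out1, out2, out3):
    """Majority vote plus circular XOR comparison of three 1-bit module outputs.

    Returns (majority, faulty), where faulty is 1, 2, or 3 for the module
    that disagrees with the other two, or None when all three agree.
    """
    majority = (out1 & out2) | (out2 & out3) | (out1 & out3)
    d12 = out1 ^ out2  # circular comparison: 1 vs 2, 2 vs 3, 3 vs 1
    d23 = out2 ^ out3
    d31 = out3 ^ out1
    if d12 and d31:
        faulty = 1
    elif d12 and d23:
        faulty = 2
    elif d23 and d31:
        faulty = 3
    else:
        faulty = None
    return majority, faulty


if __name__ == "__main__":
    for outputs in ((1, 1, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)):
        print(outputs, vote(*outputs))
```

With one disagreeing module, exactly two of the three XORs are asserted, and the module common to both comparisons is the one to scan first.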

Our Plan for Virtex-4 FPGAs
- Configuration memory: 4.7M to 50.8M bits of RAM
- PLBs: 1,536 to 22,272
  - 4 LUTs (4-input)
  - 4 LUTs/RAMs (4-input)
  - 8 FF/latches
- Block RAMs: 24 to 276 32K-bit ECC RAMs (ECC only)
  - Also operate as FIFOs
- DSP cores: 32 to 512, each includes:
  - 18x18-bit multiplier
  - 48-bit adder & accumulator
- PowerPC processors: 0 to 2
  - Can't TMR PowerPCs!!
  - Use TMR Micro- or Pico-Blaze
[Figure: Virtex-4 layout showing PowerPC (PC) blocks, with resources annotated TMR, SEU, and ECC]

AUBIeSaT Plan
- Count, correct, and classify SEUs in an actual FPGA in space
  - Compare with sensor measurements
  - Determine if SEUs impact system function or not
    - Single bit correctable
      - With and without impact on system function
    - Double bit non-correctable
      - With and without impact on system function
  - Record and transmit SEU counts and types
- Tolerate/correct SEUs using various mechanisms
  - Use ECC functionality to count & correct SEUs
    - Configuration Memory (w/ SEU controller circuit)
    - Block RAMs in ECC mode
      - Monitor and count failure indications

AUBIeSaT Plan
- Use TMR with guard bands for all other logic
  - Design SEU
  - Include fault monitoring circuits to detect/count SEUs
    - SEUs can occur in configuration memory & be counted twice
    - But only configuration memory ECC can correct SEUs
      - SEUs in system TMR flip-flops may be "flushed out" in time
    - Fault monitor failures indicate area for SEU controller scan
      - Reduces latency for detection & correction of SEU
- Include ability to download original configuration
  - To scrub memories in case of multiple non-correctable errors in configuration memory
  - Use "rad-hard" ROM to store configuration
  - May also periodically re-download to scrub memory

Summary
- Single Event Upsets (SEUs) in FPGAs
  - Serious problem
  - Everything controlled by configuration memory bits
- New architectural features provide indication of SEUs with ability to correct
  - SEU controller scan to detect and correct single bit errors
  - ECC Block RAM mode
- TMR with guard band regions in FPGAs
  - Isolate multiple working regions that contain functionally equivalent system functions
  - Fault monitoring circuits within guard bands
    - Compare working regions
    - Detects SEUs that could impact system operation
    - Take action when mismatch occurs