SEE Mitigation Strategies for Digital Circuit Design Applicable to ASIC and FPGAs
Prof. Fernanda Lima Kastensmidt, Ph.D.
Instituto de Informática, Universidade Federal do Rio Grande do Sul, Porto Alegre – RS – Brazil
Motivation
A large set of electronic devices used in avionics, space and ground-level applications can be upset by ionizing particles.
[Figure: a general system (memory, processors, analog electronics, FPGA, ASIC) can be built from hardened components ($$$$$$$$$$$$$$, high reliability) or from COTS components ($$$, low reliability).]
Motivation
Solution I: hardened components ($$$$$$$$$$$$$$, high reliability)
If hardened components are too expensive, the solution may be to design your own hardened device!
– Which fault tolerance techniques should be used?
– How much fault tolerance is enough?
It is necessary to qualify your hardened design.
Motivation
Solution II: COTS components ($$$, low reliability)
It is necessary to qualify the device to analyze its robustness for the application!
– Is it possible to apply some fault tolerance technique, at the software level or at the component replication level?
Types of SEE
Single event phenomena can be classified into three effects (in order of permanency):
– Single event upset and single event transient (soft error)
– Single event latchup (soft or hard error)
– Single event burnout (hard failure)
Hard errors such as Single Event Latchup (SEL) are due to shorts between ground and power, and cause permanent functional damage.
Charge Collection Mechanism
Depending on the circuit, the transistor size and the charge energy, current pulses of different amplitudes, durations and shapes will appear.
Collected charge:
$I_C(t) = I_{CRITICAL}(t) = I_P(t) - I_{ON}(t)$
[Figure: struck node with the currents $I_P$, $I_{ON}$ and $I_C$.]
A soft error occurs when $Q_{collected} > Q_{critical}$.
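The soft-error condition (collected charge larger than critical charge) can be sketched numerically. The example below assumes a double-exponential current pulse, which is a common modeling choice and is not taken from these slides; the function names, time constants and charge values are illustrative assumptions.

```python
import math

def double_exp_current(t, q_total, tau_fall, tau_rise):
    """Assumed double-exponential current pulse (A) whose integral is q_total (C)."""
    if t < 0:
        return 0.0
    scale = q_total / (tau_fall - tau_rise)
    return scale * (math.exp(-t / tau_fall) - math.exp(-t / tau_rise))

def collected_charge(q_total, tau_fall, tau_rise, t_end, steps=20000):
    """Integrate the current pulse from 0 to t_end with the trapezoidal rule."""
    dt = t_end / steps
    total = 0.0
    for k in range(steps):
        i0 = double_exp_current(k * dt, q_total, tau_fall, tau_rise)
        i1 = double_exp_current((k + 1) * dt, q_total, tau_fall, tau_rise)
        total += 0.5 * (i0 + i1) * dt
    return total

def is_soft_error(q_collected, q_critical):
    """A soft error occurs when the collected charge exceeds the critical charge."""
    return q_collected > q_critical

if __name__ == "__main__":
    # Hypothetical values: 100 fC deposited, 200 ps fall and 50 ps rise constants.
    q = collected_charge(100e-15, 200e-12, 50e-12, 5e-9)
    print("collected charge (fC):", q * 1e15)
    print("upset with Qcrit = 50 fC:", is_soft_error(q, 50e-15))
```

The same pulse upsets a node with a low critical charge and is harmless for a node with a high one, which is why the critical charge of each node matters in the later slides.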
Fault Tolerance
[Figure: an ionization strikes a circuit node.]
Fault masking: any technique that prevents faults from introducing errors at the output (failure).
Fault Tolerance
[Figure: from ionization to failure. The ionization produces a transient current (injected into or extracted from the junction) and a transient voltage pulse at the node capacitance (FAULT); a BIT-FLIP captured at a clk edge is an ERROR, whose FAULT EFFECT can lead to a FAILURE. Fault latency and error latency separate these stages.]
Techniques along this chain:
– Shielding
– Sensors (detection)
– Fault masking (hardening by design): hardware and time redundancy, hardened memory cells, error-correction codes, self-checking mechanisms with recovery
– Redundant spare components, for when the number of faults overcomes the mitigation technique
Outline
Radiation Effects on Digital ICs
Radiation Hardening by Design: Strategies for ASICs
Radiation Effects on FPGAs
Radiation Hardening by Design: Strategies for FPGAs
Final Remarks
Single Event Effects (SEEs)
Single Event Upset (SEU): bit-flip in a sequential logic element
Digital Single Event Transient (DSET): transient voltage pulse in the combinational logic
[Figure: combinational logic placed between two stages of sequential logic; a transient effect in the combinational logic changes the values that reach the next sequential stage.]
SEU in Sequential Logic
[Figure: memory cell accessed through word lines (WL); an ionization at the drain of an OFF transistor flips the stored values from 1 0 to 0 1 (BIT-FLIP).]
Hardened Memories
Approach 1: use decoupling resistors to slow down the regenerative feedback response of the cell, avoiding the bit-flip.
[Rocket, R., IEEE TNS, 1992]
Approach 2: add transistors to create a feedback path devoted to restoring the corrupted data.
[Figures: IBM Memory Cell [Rockett cell, 88] and HIT Memory Cell [Velazco, 92].]
The principle is to store the data in two different locations within the cell, in such a way that the corrupted part can be restored.
[Figures: Whitaker/Liu Memory Cell [Liu, 92] and DICE Memory Cell [Calin, 96].]
Dual Interlocked storage Cell (DICE)
[Figure sequence: a DICE latch with outputs Qa and Qb stores 0 1 0 1 on its four nodes. An ionization at an OFF transistor flips one node; the unaffected nodes drive it back and the original value is restored.]
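The restoring behaviour of a DICE-style cell can be illustrated with a logic-level abstraction. The sketch below is not a transistor-level model: it assumes four nodes in a ring, each pulled high by a PMOS gated by the previous node and pulled low by an NMOS gated by the next node, and it assumes that a node keeps its value when neither or both devices conduct. All names are illustrative.

```python
def dice_step(nodes):
    """One relaxation step of a 4-node DICE abstraction.

    Node i is pulled high when node i-1 is 0 (PMOS on) and pulled low when
    node i+1 is 1 (NMOS on). If neither or both devices conduct, the node
    is assumed to hold its previous value.
    """
    n = len(nodes)
    nxt = []
    for i in range(n):
        pull_up = nodes[(i - 1) % n] == 0
        pull_down = nodes[(i + 1) % n] == 1
        if pull_up and not pull_down:
            nxt.append(1)
        elif pull_down and not pull_up:
            nxt.append(0)
        else:
            nxt.append(nodes[i])
    return nxt

def settle(nodes, max_steps=8):
    """Iterate until the node values stop changing (or max_steps is reached)."""
    for _ in range(max_steps):
        nxt = dice_step(nodes)
        if nxt == nodes:
            return nodes
        nodes = nxt
    return nodes

def upset(nodes, index):
    """Return a copy of the state with one node flipped."""
    flipped = list(nodes)
    flipped[index] ^= 1
    return flipped

if __name__ == "__main__":
    stored = [0, 1, 0, 1]
    for i in range(4):
        print("upset on node", i, "->", settle(upset(stored, i)))
    # Two adjacent nodes upset at the same time are not restored in this model.
    print("double upset ->", settle(upset(upset(stored, 0), 1)))
```

In this abstraction every single-node upset is restored, while an upset on two adjacent nodes leaves the cell in a corrupted state, which matches the limitation caused by charge collected at multiple nodes.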
Challenges in Sequential Logic
– Particle incidence angle
– Transistor dimensions
– Voltage supply
– Memory array density
[Figure: MULTIPLE BIT UPSETS: charge collected by a single memory cell vs. by multiple memory cells.]
Charge Sharing (NMOS transistor)
[Figure: simulation snapshots at T = 0, 50 ps, 100 ps, 250 ps, 800 ps and 2 ns.]
[Reed, et al., New Electronic Technologies Insertion into Flight Programs Workshop, 2007]
Limitations of Hardened Memory
Multiple nodes collecting charge are able to upset hardened memory cells.
Solutions:
– Shallow Trench Isolation (STI) structures
– Suitable transistor placement and routing
– Hardened memory cells combined with hardware redundancy
[Figure: an ionization with charge collected at multiple nodes.]
Triple Modular Redundancy
[Figure: three copies of the sequential logic feed a majority voter (MAJ) ahead of the combinational logic; an upset (X) in one copy is voted out (OK).]
inputs  MAJ
000     0
001     0
010     0
011     1
100     0
101     1
110     1
111     1
Each master-slave flip-flop can be composed of:
– standard latches: robust to charge collected at multiple nodes in the same latch
– hardened latches: also robust to charge collected at multiple nodes in latches of crossing domains
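The majority function in the MAJ truth table can be written as a two-out-of-three vote. The sketch below is a minimal illustration; the function name is an assumption, and the bitwise form lets one call vote a whole register at once.

```python
def majority(a, b, c):
    """Bitwise 2-out-of-3 majority vote of three redundant values."""
    return (a & b) | (a & c) | (b & c)

if __name__ == "__main__":
    # Reproduce the MAJ truth table, inputs listed from 000 to 111.
    for value in range(8):
        a, b, c = (value >> 2) & 1, (value >> 1) & 1, value & 1
        print(a, b, c, "->", majority(a, b, c))
```

A single corrupted copy is outvoted, but two corrupted copies that agree with each other win the vote.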
Triple Modular Redundancy
[Figure: TMR with a single majority voter (MAJ) between the sequential logic and the combinational logic.]
The voter's output can show a transient wrong value that may be captured by the next memory cell.
Triple Modular Redundancy
[Figure: TMR with a triple MAJ voter between the sequential logic and the combinational logic; all three voter outputs are OK.]
Triple MAJ voter:
• Increases the current strength (drive), helping to keep the node at its original value.
Triple Modular Redundancy
[Figure: TMR with a triple MAJ voter in which all three redundant paths are upset (X).]
Catastrophic effect: the system votes on three wrong values out of three and the result is assumed to be correct.
SET in Combinational Logic
Each node has an associated:
– Capacitance
– Resistance
[Figure: current vs. time of the collected charge Qi, with its drift (QDrift) and diffusion (Qdiffusion) components.]
Critical charge: QCRIT
SET pulse: amplitude x width
SET in Combinational Logic
Not all SETs are captured by a memory cell. They can be:
– Logically masked: the SET occurs on a path that is not sensitized by the current input values, so it does not reach the output.
– Electrically masked: the SET is attenuated along the logic path and reaches the memory cell as a negligible pulse.
– Latch-window masked: the SET reaches the memory cell outside the latching window around the clk edge.
[Figures: a small circuit with output Q, showing one example of each masking effect.]
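Latch-window masking can be quantified with a simple model. The sketch below assumes an ideal sampling edge (zero setup and hold time) and a SET that starts at a uniformly distributed time within the clock period; under those assumptions the capture probability is the pulse width divided by the clock period. The function names and the 1 ps grid are illustrative.

```python
def capture_probability(tw, t_clk):
    """Probability that a SET of width tw overlaps an ideal sampling edge."""
    if tw <= 0:
        return 0.0
    return min(1.0, tw / t_clk)

def capture_fraction_on_grid(tw_ps, t_clk_ps):
    """Exhaustive check on a 1 ps grid of SET start times within one period.

    The sampling edge sits at t_clk_ps and the SET is considered captured
    when start < edge <= start + tw_ps.
    """
    captured = 0
    for start in range(t_clk_ps):
        if start < t_clk_ps <= start + tw_ps:
            captured += 1
    return captured / t_clk_ps

if __name__ == "__main__":
    # Hypothetical 500 ps SET at 500 MHz (2000 ps) and at 1 GHz (1000 ps).
    for period_ps in (2000, 1000):
        print(period_ps, "ps period:", capture_fraction_on_grid(500, period_ps))
```

Halving the clock period doubles the capture probability in this model, which is consistent with error rates that grow with frequency.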
Electrical Masking
[Bruguier, G., et al., IEEE TNS, 1996]
Heavy Ion Radiation Results: 180nm CMOS
Pulse too narrow!!!
SET vs. Frequency
Radiation results: DSET for 180 nm vs. frequency
[Benedetto et al, IEEE TNS, 2004]
Challenges in Combinational Logic
The SET transient width (TW) may vary from a few hundred picoseconds to a few nanoseconds, depending on the LET.
[Figure: critical transient width (ps) vs. process technology (nm), with curves labeled 100 GHz, 5 GHz, 2.5 GHz, 1 GHz and 500 MHz; timing diagrams compare TW with the clk period.]
[Dodd, P., IEEE TNS 2004]
SET vs. SEU Error Rate
Challenges in Combinational Logic
Depending on the fan-out of the logic topology, a single SET may originate multiple SETs.
[Figure: circuit with inputs a0 to a5 and outputs y0 and y1 feeding flip-flops Q0 and Q1; one SET reaches both outputs (X).]
Identifying the most sensitive nodes
Fault injection performed by electrical (SPICE) and logic simulations can identify the most sensitive nodes, which are those with:
– Lower critical charge (QCRIT)
– Lower SET logical masking probability
[Figure: circuit with nodes A, B, C, D, E, F and Z, with the most sensitive nodes highlighted.]
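Logic-level fault injection can be sketched in a few lines. The example below is a minimal sketch on a hypothetical circuit, Z = (A AND B) AND (C OR D) with internal nodes E and F; the gate types are an assumption and are not taken from the figure. It flips one internal node for every input vector and measures how often the flip is logically masked at Z.

```python
from itertools import product

def circuit(a, b, c, d, flip=None):
    """Hypothetical circuit: E = A AND B, F = C OR D, Z = E AND F.

    When flip names an internal node, that node is inverted to emulate a SET.
    """
    e = a & b
    if flip == "E":
        e ^= 1
    f = c | d
    if flip == "F":
        f ^= 1
    return e & f

def logical_masking_probability(node):
    """Fraction of input vectors for which a flip on node does not reach Z."""
    vectors = list(product((0, 1), repeat=4))
    masked = sum(circuit(*v) == circuit(*v, flip=node) for v in vectors)
    return masked / len(vectors)

if __name__ == "__main__":
    for node in ("E", "F"):
        print(node, logical_masking_probability(node))
```

In this toy circuit node E has the lower masking probability, so it would be the first candidate for hardening.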
Transistor Resizing
[Figure: the transistors at the most sensitive nodes are resized to increase QCRITICAL.]
[Zhou et al., IRPS 2004] [Cazeaux et al., IOLTS 2005] [Dhillon et al., IEEE Transaction on ISVLSI 2006]
Gate Replication
[Figure: the gates driving the most sensitive nodes are replicated to increase the current strength.]
• Increases the current drive, helping to keep the node at its original value.
[Lisboa, C., et al., SBCCI 2005] [Nieuwland et al., IOLTS 2006]
Temporal Filtering
Votes the SET out by time redundancy. The time redundancy is implemented by delays (ΔT) on the clock lines or at the latch/flip-flop inputs.
[Figure, first scheme: three sequential elements clocked at clk, clk+ΔT and clk+2.ΔT sample the same combinational output and feed a triple or single MAJ voter; a SET (X) is voted out (OK).]
[Figure, second scheme: a single clk with delays of ΔT and 2.ΔT at the inputs of the sequential elements, feeding a triple or single MAJ voter.]
[Nicolaidis, VTS 1999], [Anghel et al., DATE 2000]
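Temporal filtering can be sketched as three samples of the same node, spaced by a fixed delay, followed by a majority vote. The example below is a minimal model with times in integer picoseconds; the function names and the numeric values are illustrative assumptions. In this model a SET is always voted out when its width does not exceed the sampling delay.

```python
def node_value(t_ps, correct, set_start_ps, set_width_ps):
    """Value seen at time t_ps: inverted while the SET pulse is present."""
    in_pulse = set_start_ps <= t_ps < set_start_ps + set_width_ps
    return correct ^ 1 if in_pulse else correct

def majority(a, b, c):
    """2-out-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)

def temporal_filter(correct, set_start_ps, set_width_ps, t0_ps, delta_ps):
    """Sample the node at t0, t0 + delta and t0 + 2*delta, then vote."""
    samples = [
        node_value(t0_ps + k * delta_ps, correct, set_start_ps, set_width_ps)
        for k in range(3)
    ]
    return majority(*samples)

if __name__ == "__main__":
    # 300 ps SET with a 500 ps sampling delay: at most one sample is corrupted.
    print(temporal_filter(1, 900, 300, 1000, 500))
    # 800 ps SET with the same delay: two samples can be corrupted.
    print(temporal_filter(1, 900, 800, 1000, 500))
```

The second call shows why the delay has to be sized according to the expected transient width.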
Full time redundancy
[Figure: the combinational output (comb) is sampled by three flip-flops (ffp0, ffp1, ffp2) clocked at clk, clk+ΔT and clk+2.ΔT, followed by a triple or single MAJ voter; a SET (X) captured by one flip-flop is voted out (OK). The timing diagram shows the comb delay, the delays ΔT and the MAJ + comb delays.]
ΔT is directly proportional to the SET transient width (TW).
[Nicolaidis, VTS 1999] [Anghel et al., DATE 2000]
Full time redundancy
[Figure: the same scheme with the flip-flops (ffp0, ffp1, ffp2) clocked at clk, clk+2.ΔT and clk+4.ΔT for a SET of width 2.TW; the timing diagram relates TW to the clk period (T).]
Temporal Latching to Trigger SETs
[Benedetto et al., IEEE TNS 2004]
The error cross-section decreases as ΔT increases.
Triple Sample Memory Robust to Multiple Bit Upsets and SET
[MAVIS, IRPS 2002]
[Figure sequence: memory cells (transistors MN0 to MN7 and MP0 to MP3, nodes A to D) sample the combinational logic output with shifted clocks. A SET (X) captured by one sample is voted out (OK), and charge collected at multiple nodes is also tolerated (OK).]
Full Triple Modular Redundancy (TMR) with self-recovery
[Figure sequence: three redundant domains of combinational logic (inputs D0, D1, D2; clocks clk0, clk1, clk2) feed the registers TR0, TR1 and TR2. Three voters each receive TR0, TR1 and TR2 and produce TRV0, TRV1 and TRV2, which are fed back for self-recovery. An upset (X) in one domain, either in the logic or in a register, is voted out (OK) at all three voter outputs. At the output pads, the three voted signals are joined by a wired voter into a single output pad.]
How much mitigation is enough?
Circuits are becoming more and more complex.
Hardware and time redundancy techniques can provide a certain level of protection against:
– Single Event Upsets (SEU)
– Single Event Transients (SET)
– Multiple bit or node upsets
Problem: in some cases multiple faults can overcome the mitigation techniques, provoking a system failure.
Multiple Faults in the Full TMR
[Figure: the full TMR scheme with upsets (X) in two redundant domains; the voters produce a WRONG VALUE.]
How much mitigation is enough?
How is it possible to know that the mitigation technique is working properly for a certain Soft Error Rate (SER)?
It is necessary to have a mechanism that informs the system when the number of multiple faults has passed a certain level.
Built-in Self Test (BIST) mechanism:
– sensors working as watchdogs
– each time an ionization occurs, the system is informed
How about sensors working as watch dogs?
[Figure: full TMR with sensors attached to the redundant modules (registers TR0 to TR2 and their combinational logic).]
If the sensors detect:
• One upset at a time: the technique is working!
• Two or more upsets in distinct redundant modules at a time: the technique is not working!
Bulk Built-in Current Sensors
During normal operation, the current in the bulk is approximately zero. When an energetic particle generates an ionization, it creates a current that flows between the struck node and Vdd or gnd. The bulk-BICS senses the current generated by the ionization at the bulk terminal.
[Henes Neto et al. IEEE MICRO, 2006]
Bulk Built-in Current Sensors
Circuit Design
[Figure: bulk-BICS schematic with the BICS-N and BICS-P sensors (transistors n1 to n6 and p1 to p6, reset inputs RST and nRST, rails Vdd, Vdd' and Gnd'). An ionization in the monitored circuit flips the BICS latch.]
Trade-offs
There is always some penalty to be paid when protecting circuits against upsets.
Each technique may present a combination of:
– area overhead,
– performance penalty,
– power dissipation increase.
The challenge is to select the most cost-effective techniques for the target circuit application.
CASE-STUDY: Adder
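Detection of SET and SEU in the adder:
– Duplication with Comparison (DWC): two adders whose outputs are compared (=)
– Bulk-BICS: a single adder monitored by bulk-BICS sensors
– Recomputing with Shifted Operands: S = A + B is recomputed as 2.S = 2.A + 2.B with the operands shifted left (<<); the second result is shifted right (>>) and compared (=) with the first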
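Recomputing with shifted operands can be sketched with an adder that has one faulty output bit. The example below is a minimal model: the stuck-at-0 fault, the function names and the operand values are illustrative assumptions. The sum is computed once with the original operands and once with both operands shifted left by one bit; after shifting the second result back, a mismatch flags an error.

```python
def faulty_add(a, b, stuck_at0_bit=None):
    """Adder model whose output bit stuck_at0_bit (if given) is stuck at 0."""
    s = a + b
    if stuck_at0_bit is not None:
        s &= ~(1 << stuck_at0_bit)
    return s

def reso_check(a, b, stuck_at0_bit=None):
    """Return (sum, agree): the first sum and whether the shifted recomputation matches."""
    s = faulty_add(a, b, stuck_at0_bit)
    s_shifted = faulty_add(a << 1, b << 1, stuck_at0_bit)  # computes 2.S = 2.A + 2.B
    return s, (s_shifted >> 1) == s

if __name__ == "__main__":
    print(reso_check(3, 1))                   # fault-free adder
    print(reso_check(3, 1, stuck_at0_bit=2))  # faulty output bit 2
```

The shift moves each operand bit to a different adder position, so the same faulty bit corrupts the two computations differently and the comparison exposes it.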
SEU correction in the adder:
– Hardened flip-flops in the adder registers
– Error-Correction Code (Hamming): encoder (enc) and decoder (dec) blocks at the adder registers
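A Hamming code corrects a single bit-flip in a stored word. The sketch below uses the textbook Hamming(7,4) code as an illustration; the word size and the function names are assumptions, since the slides do not state which Hamming code protects the adder registers.

```python
def hamming74_encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(code):
    """Correct up to one flipped bit; return (data bits, corrected position or 0)."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # 1-based position of the flipped bit
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], position

if __name__ == "__main__":
    word = [1, 0, 1, 1]
    stored = hamming74_encode(word)
    stored[4] ^= 1  # emulate an SEU in the stored codeword
    print(hamming74_decode(stored))
```

Unlike TMR, the code adds only three check bits per four data bits, but it protects the stored value and not the logic that computes it.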
SEU and SET correction in the adder:
– TMR with single voter: three adders and one voter
– TMR with triple voter: three adders and three voters
SEU and SET correction in the adder:
– Time redundancy with TMR in the registers: a single adder whose output is sampled with delays of ΔT and 2.ΔT and voted
AREA vs. PERFORMANCE
[Figure: bar chart of area and performance (axis from 0 to 3000) for each adder version: not protected, DWC, bulk-BICS, Recomputation with Shifted Operands, hardened memory, ECC Hamming, TMR single voter, TMR triple voter, and Time Redundancy + TMR registers. The versions are grouped as SEU and SET detection, SEU correction, and SEU and SET correction; the annotations read "Less than 50%", "More than 200%" and "Less than 50%".]
How about Qualifying for SEE?
Testing by fault injection:
– Model the SEU and SET effects at SPICE level, logic level or RTL level
Testing in a laser facility
Testing at ground-level facilities:
– in front of a beam of protons, heavy ions or neutrons
Testing in space (actual environment)
[Figure: arrows labeled accuracy and cost alongside the list.]
When testing in a ground-level facility for SEE:
Static testing:
– No application is running during the test.
– The register files are read during or after the test to check for SEU and/or SET, and compared to a golden file.
– Applies to memories, microprocessors and ASICs in general.
Dynamic testing:
– Applications are running during the test.
– Outputs are analyzed and compared to a golden design.
– SEU and SET can be checked during the test.
– Applies to memories, microprocessors, ASICs in general, analog circuits, etc.
General System
[Figure: general system composed of memory, processors, analog logic, FPGA and ASIC.]
Field-Programmable Gate Arrays
– An array of logic blocks and interconnections customizable by programmable switches
– High logic density
– Customizable by the end user to implement different designs
[Figure: configurable logic blocks (CLBs), interconnections and switches for customization.]
Programmable Technologies
Programmable switches can be based on:
Antifuse (antifuse-based FPGAs):
– an electrically programmable switch forms a low-resistance path between two metal layers
– one-time configurable
SRAM (SRAM-based FPGAs):
– the state of a static latch controls pass transistors or multiplexers connected to pre-defined metal layers
– re-configurable
Flash (flash-based FPGAs):
– a floating gate controls the switches
– re-configurable
Antifuse-based FPGAs
Non-volatile: they hold the customized content even when not connected to the power supply.
They can be programmed just once.
FPGA products for space:
– ACTEL
– AEROFLEX (based on Quicklogic)
ACTEL: RTAX-S device
[Figure: RTAX-S device architecture: an array of Super Clusters (SC) with RAM blocks, RAM controllers (RAMC) and the blocks labeled CT, RD and HD. Each Super Cluster contains modules labeled C, R, B, RX and TX.]
[Actel, RTAX-S RadTolerant FPGAs 2007]
ACTEL: RTAX-S device
[Figure: schematics of the C-CELL (combinational cell) and the R-CELL (register cell).]
R-CELL: robust to SEU
C-CELL: susceptible to SET; a transient (X) in a C-CELL can propagate and produce an ERROR
[Actel, RTAX-S RadTolerant FPGAs 2007]
Effects of Frequency Response
Circuit: Shift Register with 8 levels of C-cell between R-cells
Error cross-section increases when frequency increases.
[Figure: number of errors (# ERROR) captured at the clk edge.]
[Berg, M. et al., IEEE TNS 2006]
RadHard Eclipse FPGA from Aeroflex
– ViaLink connections
– Hardened flip-flops: robust to SEU
[Figure: a transient (X) can still produce an ERROR.]
Antifuse FPGAs: summary
Customized routing is not sensitive to SEU.
Flip-flops are not sensitive to SEU:
– Actel and Aeroflex each provide a solution in which all flip-flops are hardened.
Logic is susceptible to DSETs:
– The user may protect the logic by using high-level mitigation techniques in the VHDL/Verilog description of the design (TMR, duplication and others).
SRAM-based FPGAs
Volatile: they lose their contents when the memories are not connected to the power supply.
They can be reprogrammed as many times as necessary at the work site.
They are programmed by loading a bitstream.
FPGA products for space:
– XILINX
– ATMEL
– HONEYWELL
SRAM-based FPGAs
A basic board must be composed of:
– FPGA
– Oscillator (Osc.)
– IO interface
– Power supply (core and IO)
– EEPROM or FPGA loader and memory, with a programming interface
The original design bitstream must be stored in a memory outside the FPGA.
Memory size needed: the bitstream may range from Kbytes to several Mbytes.
Reconfigurability
It can offer benefits for space and remote applications by:
– saving space in the system: the same circuitry can be used with different configurations at different stages of a mission, reducing weight and power requirements;
– allowing in-orbit design changes;
– reducing the mission cost by correcting errors.
If part of an FPGA fails, the circuitry can be reprogrammed to make use of the remaining functional portions of the chip.
FPGA Design Flow
Hardware Description Language → Synthesis optimizations → Logic mapping → Placement → Routing → configuration bitstream (… 101001110100000111 …)
Technology Scaling in Xilinx FPGAs
Nanometer technologies
Embedded Hard microprocessor
Embedded memories (BRAM)
SRAM-based FPGA Architecture
[Figure: Xilinx FPGA built from configurable logic blocks (CLBs) divided into slices, a General Routing Matrix (GRM) and BRAM. Each Lookup Table (LUT) stores the configuration bits that implement a Boolean function F(A,B,C,D).]
SEU in SRAM-based FPGAs: CLB slice
[Figure: CLB slice with LUT (inputs I1 to I4), routing and flip-flop, all controlled by configuration memory bits.]
– Upset in a configuration memory bit (LUT or routing): persistent effect (corrected by scrubbing)
– Upset in a flip-flop: transient effect (corrected at the next flip-flop load)
SET in SRAM-based FPGAs: CLB slice
[Figure: CLB slice with configuration memory bits, LUT (inputs I1 to I4), routing and flip-flop; a SET (X) occurs in the logic.]
The SET may be captured by the flip-flop (ffp).
General Routing Matrix (GRM)
[Figure: CLBs connected through the General Routing Matrix by direct connections (direct lines, fast connect), double lines, hex connections (hex lines) and long lines.]
SEU in SRAM-based FPGAs: Routing
[Figure: in direct connections and hex connections, a bit-flip in a routing configuration bit can open an existing connection or short two connections.]
Other sensitive structures
Digital Clock Manager (DCM)
Input and Output Blocks (IOB)
Power-PC Hard IP
Multi-Gigabit Transceivers (MGT)
Single-Event Functional Interrupts (SEFI):
Power-on Reset (POR)
• Low probability of occurrence
• Signature: DONE pin transitions low, I/O becomes tri-stated, no user functionality available
• Solution: reconfigure the device
SelectMAP and JTAG controllers
• Low probability of occurrence
• Signature: loss of communication; read access to configuration memory returns a constant value
• Solution: reconfigure the device
SEE Characterization – Heavy Ion: Static Testing in Virtex4
BRAMs present a higher error cross-section than CLBs.
The error cross-section of the POR in Virtex-4 has improved compared to Virtex-II.
[George, et al. IEEE Radiation Effects Data Workshop, 2006]
Scrubbing
[Figure: mitigated FPGA design flow: Hardware Description Language → TMR by hand → ISE tool (synthesis optimizations, logic mapping, placement, routing) → configuration bitstream (… 101001110100000111 …). Scrubbing (full or partial reconfiguration) and fault injection (fault tolerance verification) act on the bitstream, and the output is observed.]
Scrubbing: continuous configuration
[Figure: an SRAM-based FPGA configured from XQR18V04 PROMs (BOOT and SCRUB) over DATA[7:0], with the signals OSC, CCLK, INIT, DONE, CS, WR, CE and OE/RESET driven through a SCRUB controller; the configuration bits are continuously rewritten from the original bitstream, which repairs a flipped bit.]
• No application interruption
It does not correct upsets in:
- Embedded memory (BRAM)
- CLB flip-flops
Configuration Scrubbing Example: to correct persistent effect faults
[Figure: the scrub column sweeps across the configuration memory; a configuration upset (x) is repaired when the scrub column reaches it.]
The scrubbing rate is important to reduce the probability of multiple upsets.
Scrubbing can be performed:
– from outside the FPGA, by another FPGA controller
– from inside the FPGA: Hardware Internal Configuration Access Port (HWICAP)
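The scrubbing loop can be sketched as a comparison of each configuration frame against a golden copy of the bitstream. The example below is a minimal model: frames are plain integers, and the function names and frame values are illustrative assumptions, not a vendor API. A real scrubber may also rewrite frames blindly without comparing them.

```python
def inject_upset(config_frames, frame, bit):
    """Emulate an SEU by flipping one bit of one configuration frame."""
    config_frames[frame] ^= 1 << bit

def scrub(config_frames, golden_frames):
    """Rewrite every frame that differs from the golden copy.

    Returns the indices of the repaired frames.
    """
    repaired = []
    for index, golden in enumerate(golden_frames):
        if config_frames[index] != golden:
            config_frames[index] = golden
            repaired.append(index)
    return repaired

if __name__ == "__main__":
    golden = [0b1010, 0b1111, 0b0000]  # hypothetical original bitstream
    config = list(golden)              # configuration memory inside the FPGA
    inject_upset(config, frame=1, bit=2)
    print("repaired frames:", scrub(config, golden))
    print("configuration restored:", config == golden)
```

The shorter the interval between two scrub passes, the lower the chance that a second upset accumulates before the first one is repaired.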
Mitigation Techniques
[Figure: the mitigated FPGA design flow (Hardware Description Language, TMR by hand, ISE tool synthesis, logic mapping, placement and routing, configuration bitstream, scrubbing by full or partial reconfiguration, fault injection for fault tolerance verification).]
X-TMR
Full TMR in:
– Combinational logic
– Sequential logic
– Input/output pads
[Figure: inside the FPGA, the INPUT package pins feed three copies of the REDUNDANT LOGIC (tr0, tr1, tr2), separated by TMR flip-flops; a TMR output voter drives the OUTPUT package pin.]
Why do we need full TMR?
To guarantee the correct output in the presence of persistent-effect errors, which are corrected only by loading the correct bitstream.
TMR flip-flop
[Figure: three flip-flops clocked by clk0, clk1 and clk2, each followed by a majority voter (MAJ) implemented in a LUT; the voted values (tr0, tr1, tr2) are fed back as the recovery path.]
R0 R1 R2  MAJ
0  0  0   0
0  0  1   0
0  1  0   0
0  1  1   1
1  0  0   0
1  0  1   1
1  1  0   1
1  1  1   1
LUT: 00010111_00010111
The recovery path is mandatory to correct the state of the flip-flops, especially in FSMs.
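The LUT contents for the majority voter follow directly from its truth table. The sketch below derives the 8-bit pattern 00010111 and shows how a single flipped configuration bit changes the voter function; the bit ordering (R0 as the most significant address bit, contents listed from address 000 to 111) and the function names are assumptions made for illustration.

```python
def majority(r0, r1, r2):
    """2-out-of-3 majority vote."""
    return (r0 & r1) | (r0 & r2) | (r1 & r2)

def lut_contents(function, n_inputs):
    """List the function output for every input address, from 0...0 to 1...1."""
    bits = []
    for address in range(2 ** n_inputs):
        inputs = [(address >> (n_inputs - 1 - k)) & 1 for k in range(n_inputs)]
        bits.append(str(function(*inputs)))
    return "".join(bits)

def lut_read(contents, *inputs):
    """Read the LUT output for the given input bits (first input is the MSB)."""
    address = int("".join(str(bit) for bit in inputs), 2)
    return int(contents[address])

def flip_bit(contents, index):
    """Emulate an SEU in one LUT configuration bit."""
    flipped = "1" if contents[index] == "0" else "0"
    return contents[:index] + flipped + contents[index + 1:]

if __name__ == "__main__":
    voter = lut_contents(majority, 3)
    print("voter LUT:", voter)
    upset = flip_bit(voter, 3)
    print("after SEU:", upset, "-> output for 0 1 1 is", lut_read(upset, 0, 1, 1))
```

The corrupted LUT keeps giving the wrong answer for that input combination until the configuration bit is rewritten, which is the persistent effect that scrubbing repairs.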
TMR output voter
[Figure: at the OUTPUT package pin, three output voters (O_voter) control three 3-state buffers (3-state_0, 3-state_1, 3-state_2) tied to the same pad. Each voter compares its own domain, R0 (REF), with the other two domains, R1 and R2.]
R0 R1 R2  3-state control
0  0  0   0
0  0  1   0
0  1  0   0
0  1  1   1
1  0  0   1
1  0  1   0
1  1  0   0
1  1  1   0
LUT: 00011000_00011000
The control is 1 only when R0 disagrees with both R1 and R2.
0: it allows the data to pass to the output pad.
1: it blocks the data.
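The output voter table (LUT 00011000) behaves as a minority detector: a domain is disconnected from the pad only when it disagrees with both other domains. The sketch below is a minimal model of three such voters sharing one pad; the function names are illustrative assumptions.

```python
def minority(ref, r1, r2):
    """Return 1 (block) when ref disagrees with both other domains, else 0 (pass)."""
    return int(ref != r1 and ref != r2)

def pad_value(domain_values):
    """Value on the shared pad: only domains that are not blocked drive it."""
    drivers = []
    for k in range(3):
        ref = domain_values[k]
        others = [domain_values[j] for j in range(3) if j != k]
        if minority(ref, *others) == 0:
            drivers.append(ref)
    # With binary values, at most one domain is blocked and the remaining
    # drivers always agree with each other.
    return drivers[0]

if __name__ == "__main__":
    table = "".join(
        str(minority((v >> 2) & 1, (v >> 1) & 1, v & 1)) for v in range(8)
    )
    print("LUT contents:", table)
    print("pad with domain 1 upset:", pad_value((1, 0, 1)))
```

Because the upset domain tri-states itself, the two healthy domains drive the pad with the same value and no conflict appears on the pin.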
Evaluating TMR I/O pads
Heavy ion test, inputs at 66 MHz
[Swift et al., IEEE TNS 2004]
Evaluating Multiple Bit Upsets
Heavy ion radiation static test:
– Virtex family: 220 nm CMOS
– Virtex-II family: 130 nm CMOS
[Quinn, et al., IEEE TNS, 2005]
Domain Crossing Events
Bit-flips in the routing can create short-circuit connections among different blocks of the TMR (tr0, tr1 and tr2).
[Figure: inside the FPGA, the INPUT package pin feeds the REDUNDANT LOGIC (tr0, tr1, tr2), followed by the TMR register with voters and refresh and by the TMR output majority voter at the OUTPUT package pin; the bit-flips a and b are marked in the routing.]
Bit-flip a: affects only the redundant logic tr0; consequently, the majority voter chooses the correct result (two out of three outputs).
Bit-flip b: affects two redundant logic parts; consequently, the majority voter does not choose the correct result, because two out of three outputs are wrong.
Solution to Reduce Domain Crossing Events
Voter insertion: a barrier of voters can reduce the probability that a bit-flip in the routing causes a short-circuit connection between two or more redundant blocks.
[Figure: the redundant logic (tr0, tr1, tr2) is split into logic partitions by TMR majority voters placed between the TMR register with voters and refresh and the TMR output majority voter; the effect (X) of bit-flip b is voted out and the outputs are OK.]
[Kastensmidt, et al., DATE 2005]
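The effect of a voter barrier can be sketched with a simple partition model. The example below assumes that the redundant logic is a chain of numbered stages, that a barrier of voters after a given stage closes a partition, and that the output is correct as long as no partition contains faults in more than one redundant domain. The model and all names are illustrative assumptions.

```python
def partition_of(stage, barriers):
    """Index of the logic partition that contains the given stage.

    A barrier b means that a set of voters is placed right after stage b.
    """
    return sum(1 for b in barriers if b < stage)

def tmr_output_correct(faults, barriers=()):
    """faults is an iterable of (domain, stage) pairs, with domain in {0, 1, 2}.

    The voters at the end of each partition mask the faults when at most one
    domain is corrupted inside that partition.
    """
    faulty_domains = {}
    for domain, stage in faults:
        partition = partition_of(stage, barriers)
        faulty_domains.setdefault(partition, set()).add(domain)
    return all(len(domains) <= 1 for domains in faulty_domains.values())

if __name__ == "__main__":
    # A domain-crossing event corrupting tr0 at stage 1 and tr1 at stage 3.
    crossing = [(0, 1), (1, 3)]
    print("no barrier:", tmr_output_correct(crossing))
    print("barrier after stage 2:", tmr_output_correct(crossing, barriers=(2,)))
```

More barriers make each partition smaller, which lowers the chance that one routing upset touches two domains of the same partition, at the cost of extra voters.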
TMR BRAM (Embedded memory)
Upsets in BRAMs are not corrected by scrubbing.
TMR with refreshing must be used to mitigate upsets.
Dual-port BRAMs are needed.
Mechanism to refresh the memory contents:
– Counter
– Voters
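The refresh mechanism (a counter plus voters) can be sketched as a sweep over the addresses of three redundant memories. The example below is a minimal model in which each BRAM is a Python list; the function names and the memory contents are illustrative assumptions. In hardware, the second port of a dual-port BRAM would let this sweep run alongside the application accesses.

```python
def majority(a, b, c):
    """Bitwise 2-out-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)

def refresh(brams, address):
    """Vote one address across the three memories and write the result back."""
    voted = majority(*(bram[address] for bram in brams))
    for bram in brams:
        bram[address] = voted
    return voted

def refresh_all(brams):
    """Sweep every address, as a free-running counter would do."""
    for address in range(len(brams[0])):
        refresh(brams, address)

if __name__ == "__main__":
    brams = [[5, 9], [5, 9], [5, 9]]  # three redundant copies
    brams[1][0] ^= 0b100              # emulate an SEU in one copy
    refresh_all(brams)
    print(brams)
```

Without the write-back, upsets would accumulate in the three copies over time until two copies of the same word are wrong and the vote fails.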
Verifying the Mitigated Design
[Figure: the mitigated FPGA design flow, ending with fault injection (fault tolerance verification) on the configuration bitstream and output checking.]
Flash-based: Actel ProASIC3
Flash-based FPGA: CLB tile
Summary
Antifuse FPGAs:
- Fault tolerance techniques applied in VHDL/Verilog
- Protect against SET (SEU is protected by the vendor)
SRAM FPGAs:
- Fault tolerance techniques applied in VHDL/Verilog
- Scrubbing to clean persistent faults
- Protect against SET and SEU
- New FPGAs protected by the vendor are coming out!
Flash FPGAs:
- Fault tolerance techniques applied in VHDL/Verilog
- Protect against SEU and SET
- Flash transistor sensitivity to SEE is low, but still under investigation
Final Remarks
Mitigation techniques for ASICs and FPGAs must take into account SEUs and SETs, considering single and multiple effects.
ASICs: integrated systems fabricated in nanometer technologies should have mitigation techniques at different levels to ensure robustness:
– charge dissipation (transistor resizing, capacitors, resistors)
– sensors (bulk-BICS)
– hardware and time redundancy
– error-correction codes (ECCs)
– self-checking and recomputation
FPGAs: new FPGA generations bring more flexibility and design capabilities, but also more design reliability challenges.
The design can always be protected by high-level techniques (VHDL, Verilog) such as TMR.
In order to reduce the cost of TMR, solutions at the FPGA architectural level must be developed for:
– CLB logic: combinational blocks, sequential blocks and programmable switches
– Routing programmable switches
… to mitigate SEU and SET!
Conferences
NSREC – IEEE Nuclear and Space Radiation Effects Conference, www.nsrec.com
RADECS – European Conference on Radiation Effects on Components and Systems, www.radecs.org
2011 – RADECS in Sevilla, Spain
Schools
SERESSA
First: 2006 - Manaus - Brazil
Second: 2007 - Sevilla - Spain
Third: 2008 - Buenos Aires - Argentina
Fourth: 2009 - Florida, USA
2010 - France
2011 - Brazil
Takasaki, JapanDecember 2-4th, 2009
TECHNICAL PROGRAM
[Table: three-day schedule, 9 AM to 5 PM. Sessions listed: Registration; Welcome; Basics; Radiation testing; System hardening; Environments & Anomalies; Laser testing; Software hardening; SEU & SET in FPGA; Single event effects; Experiments & Rate prediction; Remote Heavy Ion testing; Total dose effects; Remote laser testing; MEMS in space applications; Rad effects; Power systems; Radiation effects in solar cells; Tour of JAEA Radiation testing facilities; Conclusions. Speakers listed: Daniel Loveless (Vanderbilt Univ.), Michel Pignol (CNES), Robert Ecoffet (CNES), Pascal Fouillat (IMS), Dale McMorrow (NRL), Vincent Pouget (IMS), Massimo Violante (Polito), Sarah Armstrong (NSWC), Fernanda Lima-Kastensmidt (UFRGS), Ron Schrimpf (Vanderbilt Univ & ISDE), Raoul Velazco (TIMA), Guy Berger (UCL), Paul Peronnard (TIMA), Hugh Barnaby (ASU), Philippe Adell (JPL), Bob Walters (NRL), and TBD speakers from ONERA, JAXA, JAEA and LIMMS.]