CMOS INTEGRATED CIRCUIT DESIGN TECHNIQUES
University of Ioannina
On-Line Testing
Dept. of Computer Science and Engineering
Y. Tsiatouhas
Overview
1. Reliability issues - Transient faults
2. Soft errors
3. Timing errors
4. Error tolerance design techniques
VLSI Systems and Computer Architecture Lab
Reliability in Nanometer Technologies
The evolution (scaling) of CMOS technology results in:
• the reduction of the transistor size
• the increase of the operating frequency
• the reduction of the power supply voltage
• the increase of the number of transistors on a die
which in turn affect the circuit noise margins and reduce circuit reliability.
Soft Error Rate (SER) increases with technology scaling.
[Figure: SER vs VDD, with curves for a 1MB SRAM and 64K Dynamic Logic. Source: IEDM, 1999.]
Transient Faults and On-Line Testing
• Permanent Faults: faults that permanently affect the circuit operation.
• Temporary Faults: faults that do not permanently affect the circuit operation. They are divided into:
  - Transient Faults: due to random fault generation mechanisms such as power supply disturbance, electromagnetic interference, radiation, etc.
  - Intermittent Faults: due to the degradation of the circuit characteristics.
• On-Line Testing: testing procedures are performed during the circuit operation.
  - Concurrent Testing: testing is performed concurrently with the normal circuit operation.
  - Periodic Testing: testing is performed periodically when the circuit is in the idle mode of operation.
• Off-Line Testing: testing is performed when the circuit is not in use (e.g. manufacturing testing).
Requirements
We need design techniques and architectures that guarantee correct operation under any circumstances, that is, techniques that provide error resilience or error detection / correction capabilities.
We need self-checking and concurrent testing mechanisms. We need self-healing architectures that react dynamically to overcome technology-related and environmental variabilities, that are able to recover after an error occurs, and that operate correctly even in the presence of failures.
Towards this direction, error tolerance techniques have been proposed:
• error correction codes, self-checking circuits and checkers
• error mitigation techniques aiming to reduce error rates
• error detection and correction methodologies
Radiation and Soft Errors (I)
[Figure: cosmic rays reaching the Earth. Labels: Cosmic Rays; Primaries Disappear (~25 Km); Low Energy Deflected; Secondary Cosmic Rays (neutrons & pions), ~1/cm²-s at sea level; Si; Burst of Electronic Charge, 1M electrons/μm Si.]
Soft errors are a problem in VLSI circuits. They are due to Single Event Transients (SETs) and Single Event Upsets (SEUs) generated by:
• α-particles emitted by package impurities
• cosmic ray particles (neutrons, protons and pions)
Radiation and Soft Errors (II)
[Figure: a Single Event Transient (SET) in the logic that feeds a flip-flop (D, Q, CLK), shown in two panels with annotated logic values. Panel 1: "Transient pulse attenuation/masking!". Panel 2: "Soft Error generation!", with waveforms for CLK, D ("Transient Pulse of Duration") and Q ("Soft Error").]
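The two cases in the figure (masking versus soft error generation) can be illustrated with a minimal Python sketch. It assumes a simplified model that is not taken from the slides: a transient pulse becomes a soft error only if it is still wide enough after attenuation and it overlaps the flip-flop's setup/hold window around the clock edge. The names `Pulse`, `is_latched`, `min_width` and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pulse:
    start: float  # time at which the transient reaches the D input (ns)
    width: float  # duration of the transient at the D input (ns)


def is_latched(pulse: Pulse, clk_edge: float, setup: float, hold: float,
               min_width: float) -> bool:
    """Return True if the transient is captured as a soft error.

    Assumed model: the pulse must survive attenuation (width >= min_width)
    and overlap the window [clk_edge - setup, clk_edge + hold].
    """
    if pulse.width < min_width:
        return False  # attenuated before reaching the flip-flop (masked)
    window_start = clk_edge - setup
    window_end = clk_edge + hold
    pulse_end = pulse.start + pulse.width
    # Two intervals overlap when each one starts before the other ends.
    return pulse.start < window_end and pulse_end > window_start


# Pulse far from the clock edge: masked.
masked = is_latched(Pulse(start=2.0, width=0.3), clk_edge=10.0,
                    setup=0.1, hold=0.1, min_width=0.05)
# Pulse that covers the clock edge: latched as a soft error.
soft_error = is_latched(Pulse(start=9.8, width=0.3), clk_edge=10.0,
                        setup=0.1, hold=0.1, min_width=0.05)
print(masked, soft_error)
```

Logical masking (the pulse not being propagated by the downstream gates) is deliberately left out of this sketch.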
Timing Errors (I)
• Transient faults due to crosstalk, power supply disturbance or ground bounce, as well as environmental variations (e.g. temperature gradients), contribute to timing error generation. Device aging (BTI, HCI effects) is also an important source of timing errors.
• Timing verification has become a hard task. Moreover, the increased path delay deviations, due to process variations and the statistical behavior of nano-devices, as well as the manufacturing defects that affect circuit speed, may result in timing errors that are not easily detectable (in terms of test cost) in high-frequency, high-device-count ICs. Considering also the huge number of paths in modern circuit designs along with the complexity of testing, it is easy to realize that a significant number of defective ICs may pass the fabrication tests.
• Modern systems running at multiple frequency and voltage levels may suffer from timing errors due to environmental and process-related (and also data-dependent) variabilities that can affect circuit performance.
Timing Errors (II)
[Figure: a logic stage feeding a flip-flop (D, Q, CLK), shown in two panels with annotated logic values. Panel 1: "Error free case". Panel 2: "Timing Error generation!", with waveforms for CLK, D ("Delayed Pulse by d") and Q ("Timing Error").]
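The condition behind the figure can be written as a one-line check. The sketch below is a simplified model under stated assumptions, not something given in the slides: data launched at one clock edge must arrive at D at least a setup time before the next edge, and an extra delay d on the path can violate this. The function name and all values are hypothetical.

```python
def has_timing_error(path_delay: float, extra_delay: float,
                     clock_period: float, setup: float = 0.0) -> bool:
    """Return True if the data arrives too late to be captured correctly."""
    arrival = path_delay + extra_delay  # arrival time at the D input
    return arrival > clock_period - setup


CLOCK_PERIOD = 1.0  # ns, hypothetical

# Error free case: no extra delay, data arrives in time.
error_free = has_timing_error(0.8, 0.0, CLOCK_PERIOD, setup=0.05)
# Timing error case: the pulse is delayed by d = 0.3 ns and misses the edge.
delayed = has_timing_error(0.8, 0.3, CLOCK_PERIOD, setup=0.05)
print(error_free, delayed)
```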
The Scan Design Topology of Intel
[Figure: a System Flip-Flop (Latch PH2 and Latch PH1, clocked by CLK) between logic stages Sj and Sj+1, and a separate Scan Flip-Flop (Latch LA and Latch LB). Labeled signals: SCA, SCB, CAPTURE, UPDATE, SI, SO; latch ports C1, C2, D1, D2.]
Separated System and Scan Flip-Flops
Extra cost of one Flip-Flop
Intel's Error Resilience Approach
[Figure: a Main Flip-Flop (Latch PH2 and Latch PH1, clocked by CLK) between logic stages Sj and Sj+1, and a Scan Flip-Flop (Latch LA and Latch LB). Labeled gates: XOR, XOR, AND, OR. Labeled signals: Error_L, SCA, SCB, CAPTURE, UPDATE, Scan_IN, Scan_OUT; latch ports C1, C2, D1, D2.]
• Additional cost of: 1 Flip-Flop, 2 XOR, 1 AND and 1 OR per Flip-Flop!
• The error detection latency is high!
Univ. of Stanford & Intel, IEEE Computer, 38 (2), 2005
The BISER Architecture
[Figure: a BISER Flip-Flop, clocked by CLK, between logic stages Sj and Sj+1. Labeled blocks: Main Flip-Flop, Secondary Flip-Flop, Shadow Latch, C-Element (inputs A and B), Weak Keeper. Inset: transistor-level Muller C-Element with inputs A and B and output Q, between VDD and Gnd.]
Univ. of Stanford & Intel, ICICDT, 2007
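The role of the Muller C-Element in the figure can be illustrated with a small behavioral sketch in Python. It is a simplified model under stated assumptions: the output polarity is taken as non-inverting (a transistor-level C-element may be inverting), the keeper is modeled as stored state, and the class name `CElement` is hypothetical.

```python
class CElement:
    """Behavioral Muller C-element: the output changes only when A == B."""

    def __init__(self, initial: int = 0) -> None:
        self.q = initial  # value held by the keeper

    def update(self, a: int, b: int) -> int:
        if a == b:
            self.q = a  # inputs agree: the output takes their value
        # Inputs disagree: the previous output is kept.
        return self.q


c = CElement(initial=0)
main_ff, secondary_ff = 1, 1
before_upset = c.update(main_ff, secondary_ff)  # both flip-flops hold 1

main_ff = 0  # assumed single event upset flips one flip-flop only
after_upset = c.update(main_ff, secondary_ff)   # the output is not affected
print(before_upset, after_upset)
```

Because a single upset makes the two inputs disagree, the output keeps its earlier value, so the erroneous flip-flop content does not propagate to the next logic stage.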
The GRAAL Architecture
[Figure: logic stages Sj and Sj+1 separated by latches clocked by the phases Φ1 and Φ2. Labeled blocks: Main Latch, Shadow Flip-Flop, MUX (inputs 0 and 1), XOR, Register. Labeled signals: Error_j, Error_j+1.]
TIMA & iRoc, ITC, 2007
The Razor Topology
[Figure: a Razor Flip-Flop between logic stages Sj and Sj+1. Labeled blocks: MUX (inputs 0 and 1), Main Flip-Flop, Shadow Latch, XOR, OR, Error Flip-Flop. Labeled signals: CLK, D_CLK, Error_L, Error_Rj, Error.]
• Additional cost of: 1 Latch, 1 XOR, and 1 MUX per Flip-Flop!
Univ. of Michigan & ARM, IEEE Computer, 37 (3), 2004
The Razor Operation (I)
Error Free Case
[Figure: the Razor Flip-Flop (MUX, Main Flip-Flop, Shadow Latch, XOR, OR, Error Flip-Flop; signals CLK, D_CLK, Error_L, Error_Rj, Error) annotated with logic values for the error free case; the annotated error values are 0.]
The Razor Operation (II)
Erroneous Case
[Figure: the Razor Flip-Flop (MUX, Main Flip-Flop, Shadow Latch, XOR, OR, Error Flip-Flop; signals CLK, D_CLK, Error_L, Error_Rj, Error) annotated with logic value transitions for the erroneous case, together with CLK and D_CLK waveforms.]
• No error detection latency!
• One clock cycle cost for error correction!
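The detect-and-correct behavior summarized in the two bullets above can be sketched in Python. This is a behavioral approximation under stated assumptions, not the authors' circuit: the main flip-flop samples D on CLK, the shadow latch samples D on the delayed clock D_CLK and is assumed to always hold the correct value, the XOR compares the two, and on a mismatch the MUX reloads the main flip-flop from the shadow latch. The function name `razor_cycle` is hypothetical.

```python
from typing import Tuple


def razor_cycle(d_at_clk: int, d_at_d_clk: int) -> Tuple[int, int, int]:
    """Model one capture of a Razor flip-flop.

    Returns (main_ff, error_l, restored), where `restored` is the value held
    by the main flip-flop after the optional one-cycle correction.
    """
    main_ff = d_at_clk          # main flip-flop samples D on CLK
    shadow = d_at_d_clk         # shadow latch samples D on D_CLK (assumed correct)
    error_l = main_ff ^ shadow  # XOR comparator flags a mismatch
    # On error the MUX selects the shadow value; this costs one extra cycle.
    restored = shadow if error_l else main_ff
    return main_ff, error_l, restored


error_free = razor_cycle(d_at_clk=1, d_at_d_clk=1)
# Late data: D still shows the old value 0 at CLK and settles to 1 by D_CLK.
erroneous = razor_cycle(d_at_clk=0, d_at_d_clk=1)
print(error_free, erroneous)
```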
The Time Dilation Topology
[Figure: a TD Flip-Flop inside a TD Register between logic stages Sj and Sj+1. Labeled blocks: MUX (inputs 0 and 1), Main Flip-Flop (D, QM), XOR, OR, OR, Error Flip-Flop, Delay. Labeled signals: CLK, Cap_CLK, Capture, Error_L, Error_Rj, Error.]
• Additional cost of: 1 XOR and 1 MUX per Flip-Flop.
Univ. of Ioannina & Athens, IEEE ICECS, 2006
The Time Dilation Operation (I)
Error Free Case
[Figure: the TD Flip-Flop (MUX, Main Flip-Flop with D and QM, XOR, OR, Error Flip-Flop, Delay; signals CLK, Cap_CLK, Capture, Error_L, Error_Rj, Error) annotated with logic values for the error free case; the annotated error values are 0.]
The Time Dilation Operation (II)
Erroneous Case
[Figure: the TD Flip-Flop (MUX, Main Flip-Flop with D and QM, XOR, OR, Error Flip-Flop, Delay; signals CLK, Cap_CLK, Capture, Error_L, Error_Rj, Error) annotated with logic value transitions for the erroneous case, together with CLK and Cap_CLK waveforms.]
• No error detection latency!
• One clock cycle cost for error correction!
Timing Diagrams
[Figure: waveforms of CLK, Cap_CLK, D, Error, Capture and Q over Cycle i to Cycle i+3. Labels on D: Data i (Valid Data, Correct Arrival), Data i+1 (Timing Fault, Delayed Arrival). Labels on Q: Data i (Correct Data), Erroneous Data (Timing Error), Data i+1 (Correct Data) after Error Correction. Other labels: MUX Latch Memory State, MUX Latch Extended Memory State.]
The Time Dilation Architecture
Pipeline Organization
[Figure: a pipeline of logic stages separated by TD FF registers, clocked by CLK. Stage labels: IF, ID, EX, MEM (the label of the last register is cut off). Register inputs D0 to D4, outputs Q0 to Q4, error outputs Error_R0 to Error_R4. Other labels: OR, Error FF, Error, Capture, Delay, Cap_CLK.]
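Pipeline Recovery
[Figure: pipeline diagram with instructions on the vertical axis and time in cycles on the horizontal axis, showing the stages IF, ID, EX and MEM of successive instructions. Legend: Failing stage, Erroneous stage. One EX stage is marked as erroneous (~EX) and is executed again in the following cycle.]
Re-execution with correct values at stage inputs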
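The performance impact of this recovery scheme can be estimated with a back-of-the-envelope model in Python. The sketch assumes an ideal pipeline with no stalls other than error recovery and takes the one clock cycle correction cost stated earlier in the slides as the penalty per detected error; the function name and the numbers in the usage example are hypothetical.

```python
def total_cycles(n_instructions: int, n_stages: int, n_errors: int,
                 penalty_per_error: int = 1) -> int:
    """Cycles needed to run n_instructions through an n_stages pipeline.

    Assumed model: one instruction completes per cycle once the pipeline is
    full, and every detected timing error adds `penalty_per_error` cycles.
    """
    if n_instructions == 0:
        return 0
    fill_and_drain = n_stages + n_instructions - 1
    return fill_and_drain + n_errors * penalty_per_error


baseline = total_cycles(n_instructions=100, n_stages=5, n_errors=0)
with_errors = total_cycles(n_instructions=100, n_stages=5, n_errors=3)
overhead = (with_errors - baseline) / baseline
print(baseline, with_errors, overhead)
```

In this model the overhead grows linearly with the number of detected errors, so recovery stays cheap as long as errors are rare.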
References
• "Making Typical Silicon Matter with Razor," T. Austin, D. Blaauw, T. Mudge and K. Flautner, IEEE Computer, vol. 37, no. 3, pp. 57-65, 2004.
• "Robust System Design with Built-In Soft-Error Resilience," S. Mitra, N. Seifert, M. Zhang, Q. Shi and K-S. Kim, IEEE Computer, vol. 38, no. 2, pp. 43-52, 2005.
• "Built-In Soft Error Resilience for Robust System Design," S. Mitra et al., International Conference on IC Design and Technology, pp. 263-268, 2007.
• "GRAAL: A New Fault Tolerant Design Paradigm for Mitigating the Flaws of Deep Nanometric Technologies," M. Nicolaidis, International Test Conference, p. 4.3, 2007.
• "Testing and Reliable Design of CMOS Circuits," N. Jha and S. Kundu, Kluwer Academic Publishers, 1990.