View
8
Download
0
Category
Preview:
Citation preview
ECE 534 1
Fault-Tolerant Design of Digital Systems
EGE 534
Information Redundancy
Dr. Baback IzadiDepartment of Electrical and Computer Engineering and
State University of New York – New Paltzbai@engr.newpaltz.edu
ECE 534 2
Outline
Error detecting and error correcting codesCodes for storage and communicationCodes for arithmetic operationsCodes for control units Example: Applying the detection techniques used so far
A failure resilient node/network controller
ECE 534 3
Basic ConceptsCode
A mean of representing information using a well defined set of rulesCodeword
A collection of symbols used to represent a data based on specific codeExample: BCD code using 0’s and 1’s
Valid codeword Vs. invalid codewordCodeword is valid if it adheres to all the rules that defines the code.
Encoding ProcessDetermining the pertaining codeword from a data itemExample: Data item = 7 0111
Decoding ProcessRecovering the original data from the codeword.
Separable Code: Original information is appended with the check bits to form the code
Non separable Code
ECE 534 4
Basic Concept of Error Detecting/ Correcting
Error detection code Specific representation allowing error introduced into the code to be detected.
n bits 2n combinations
Is BCD error detecting code?Error correcting code
Error detecting initiates reconfigurationError correction provides fault masking
Valid InvalidError
ECE 534 5
Fault Detection through Encoding
At logic level, codes provide means of masking or detection of errorsFormally, code is a subset S of universe U of possible vectorsA noncode word is a vector in set U-S
Example:X1 is a codeword <10010011>Due to multiple bit error, becomesX3 = <10011100> not detectable
X2 is a codeword, becomes X4 noncode detectable
S = even parity
X1
X3X2
X4
U = 28 vectors
ECE 534 6
Hamming Distance
The Hamming weight of a vector x (e.g., codeword), w(x), is number of nonzero elements of x.Hamming distance between two vectors x and y, d(x,y) is number of bits in which they differ.Distance of a code is a minimum of Hamming distances between all pairs of code words.
Example: x = (1011), y = (0110)w(x) = 3, w(y) = 2, d(x, y) = 3
ECE 534 7
Distance Properties
A code with a distance of 2 can detect a single fault. A code with a distance of 3 can correct a single fault or detect a double fault. To detect all error patterns of Hamming distance ≤ d, code distance must be ≥ d+1To correct all error patterns of Hamming distance ≤ c, code distance must be ≥ 2c + 1To correct c bits, and detect d bits, the code distance must be ≥ 2c + d + 1
ECE 534 8
Basic Code Operations
Consider n-bit vectors, space of 2n vectorsA subset of 2n vectors are code wordsSubset called (n, k) code, where fraction k/n is called rate of codeAddition operation on vectors is bit-wise exclusive-ORX + Y = < x1 ⊕ y1, x2 ⊕ y2, …, xn ⊕ yn >
Multiplication operation is bitwise ANDcX = < cx1, cx2, …, cxn >
ECE 534 9
Example
0123456789
Odd and even parity codes for BCD data
DecimalDigit BCD
BCDodd parity
BCDeven parity
0000000100100011010001010110011110001001
00001000100010000111010000101101101011101000010011
ParityBit
00000000110010100110010010101001100011111000110010
ParityBit
Parity
Data
Data Out
ParityChecking
ErrorSignalParity
Generator
Memory
Parity Bit
Data
Data In~~ ~~Organization of a memory that uses single-bit parity. The parity bit is generated when data is written to memory and checked when data is ready.
Parity Codes
Separable code
Data Bits
Generated ParityBit
Error Signal
d0d1
d2
d3
P
XOR Tree for Parity Generation4-bit Parity Generation and Checking
ECE 534 10
Variation of Parity Codes for RAM
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 P Bit-per-word parity
7 6 5 4 3 2 1 0 P1 P215 14 13 12 11 10 9 8 Bit-per-byte parity
3 2 1 0 7 6 5 411 10 9 815 14 13 12 P4 P3 P2 P1
Chip 5 Chip 2Chip 4 Chip 3 Chip 1
Bit-per-multiple-chips parity
3 2 1 0 7 6 5 411 10 9 815 14 13 12 P4 P3 P2 P1
Chip 5 Chip 2Chip 4 Chip 3 Chip 1
Bit-per-chip parity
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 P4 P3 P2 P1 Interlaced parity
Odd or Even
Odd Even
ECE 534 11
Overlapping Parity
Parity groups are formed with each bit appearing in more than one parity groupErrors can be detected and locatedErroneous bit can be corrected by a simple complementation
0123 P0P1P2
Four information bits Three parity bits
Bit in error Parity in error3 P2 P1 P02 P2 P11 P2 P00 P1 P0P2 P2P1 P1P0 P0
ECE 534 12
Parity Codes for Memory - Comparison
Parity Code Advantages Disadvantages Bit-per-word: one parity bit per data word Detects all single-bit
errors Certain errors undetected, e.g., a word, including parity bit becomes all 1s, due to a failure of a bus or a set of data buffers.
Bit-per-byte: each data portion (e.g., a byte) is protected by a separate parity bit; the parity of one group should be even and the parity of the other group should be odd
Detects all-1s and all-0s conditions
Ineffective for multiple errors, e.g., the whole-chip failure
Bit-per-multiple-chips: one bit from each chip is associated with a single parity bit
Detects failure of entire chip
Cannot locate failure of complete chip
Bit-per-chip: each parity bit is associated with one chip of the memory
Detects single-bit errors and identifies chip with erroneous bit
Susceptible to whole-chip failure, i.e., a single chip error can result in multiple bits to be corrupted and this may go undetected.
Interlaced: similar to the bit-per-multiple-chips; must ensure that no two adjacent bits are from the same parity group
Detects errors in adjacent bits
Parity groups are not based on physical organization of the memory
Overlapping: Error can be detected and corrected
Multiple bits of parity is needed
ECE 534 13
Parity Prediction in Arithmetic Circuits
Binary adderTwo inputs: A = (an-1 … a0 ac) and B = (bn-1 … b0 bc)Two operands to be added: (an-1 … a0) and (bn-1 … b0)ac and bc are check bits of A and B, respectivelyEncoded output will be s = (sn-1 … s0 sc) where (sn-1 … s0) are determined by the ordinary binary addition of (an-1 … a0) to (bn-1 … b0) and sc is the check bit for (sn-1 … s0)Then
Reduces to
sc = sii= 0
n−1
∑
= aii=0
n−1
∑ ⊕ bii=0
n−1
∑ ⊕ cii=0
n−1
∑
sc = ac ⊕ bc ⊕ cii =0
n−1
∑
ECE 534 14
A four bit example
A = a3a2 a1a0 ac = a3 ⊕ a2 ⊕ a1 ⊕ a0
B = b3b2 b1b0 bc = b3 ⊕ b2 ⊕ b1 ⊕ b0
S = s3 s2 s1 s0 sc = s3 ⊕ s2 ⊕ s1 ⊕ s0
s0 = a0⊕ b0 ⊕ c0
s1 = a1⊕ b1 ⊕ c1
s2 = a2⊕ b2 ⊕ c2
s3 = a3⊕ b3 ⊕ c3
sc= (a0⊕ b0⊕ c0)⊕( a1⊕ b1⊕ c1) ⊕ (a2⊕ b2⊕ c2) ⊕ (a3⊕ b3⊕ c3 )sc= (a3 ⊕ a2 ⊕ a1⊕ a0)⊕(b3 ⊕ b2 ⊕ b1 ⊕ b0) ⊕ (c3 ⊕ c2 ⊕ c1 ⊕ c0)sc = ac ⊕ bc ⊕ (c3 ⊕ c2 ⊕ c1 ⊕ c0)
sc = ac ⊕ bc ⊕ cii =0
n−1
∑
ECE 534 15
Parity Prediction in Binary Adder
an-1 a0 ac bcbn-1 b0
Binary adder
+
cn-1
c0
sn-1 s0 Error
... ...
...sc = ac ⊕ bc ⊕ ci
i =0
n−1
∑
∑−
=
=1
0'
n
iic ss
ECE 534 16
Parity-Checked Binary Adder
Sum
...
...
...
sn-1
cn-1
′ c n −1
an-1 bn-1
Sum
Carry
Carry Carry
Sum
Carry
′ c 2
c2
′ c 1
c1
s1 s0
c0
a1 a0 b0b1
+
...sn-1
s0s1
Error
(a) ac bc
ECE 534 17
Parity-Checked Binary Adder (cont.)
+
...sn-1
s0s1
Error
Carry look-ahead circuit
′ c n −1 Carry
Sum
Carry
SumSum
(b)
′ c 2 ′ c 1
sn-1 s1 s0cn-1
a n 1
b n 1
c2
ac bc
c1
a0
b0
...
c0a1
b1
ECE 534 18
Binary Multiplierp0 = a0b0
p1 = a0b1 ⊕ a1b0
p2 = a0b2 ⊕ a1b1 ⊕ a2b0 ⊕ c1,1
p3 = a0b3 ⊕ a1b2 ⊕ a2b1 ⊕ a3b0 ⊕ c2 ,1 ⊕ c1,2
p4 = a1b3 ⊕ a2b2 ⊕ a3b1 ⊕ c3,1 ⊕ c 2, 2 ⊕ c1 ,3
p5 = a2b3 ⊕ a3b2 ⊕ c3,2 ⊕ c2 ,3 ⊕ c1 ,4
p6 = a3b3 ⊕ c3 ,3 ⊕ c2 ,4
p7 = c3 ,4Therefore, denoting the check bit for (p7 …p0 ) by pc
pc = pii =0
7
∑
= ( aii=0
3
∑ )( bji = 0
3
∑ )⊕ ci, jj =1
4
∑i=1
3
∑
= acbc ⊕ ci , jj =1
4
∑i=1
3
∑
a3a2 a1a0× b3b2 b1b0
a3b0 a2b0 a1b0 a0b0a3b1 a2b1 a1b1 a0b1
a3b2 a2b2 a1b2 a0b2a3b3 a2b3 a1b3 a0b3
p7 p6 p5 p4 p3 p2 p1 p0
Example:1 0 1 1
× 0 1 1 00 0 0 0
1 0 1 11 0 1 1
0 0 0 0p7p6 p5p4p3p2p1p0
Cases for A’s and B’s
Even #aibj =1
Even #aibj =1
Oddaibj =1
B Even 1B Odd 1B Odd
A Even 1A Even 1A Oddai= 1∑ ai= 0∑
bj= 0∑bj= 1∑
ai= 0∑
bj= 0∑
pj= 1∑ pj= 0∑ pj= 0∑
ECE 534 19
Multiplier Using Array of Full-Half Adders
a3 a2 a1 a0
a3 a1 a0a2
a3 a2 a1 a0
a3 a2 a1 a0
c sc sc s
c sc sc s
c sc sc s
c sc sc s
HA: Half AdderFA: Full Adder
S: SumC: Carry
HA
HA HA HA
FA
FA FA FA
FAFAFA
FA
C 3 , 1 C 2 , 1 C1 , 1
C 3, 2C 2, 2 C 1, 2
C 3, 3 C 2 , 3 C 1 , 3
C 2, 4C 3, 4 C 1 , 4
p7 p6 p5 p4 p5 p2 p1 p0
b0
b1
b2
b3
ECE 534 20
Error Correction with Overlapped Parity
Corrected Bits
B it 0C 0
B it 0
C 2B it 2
CP 2
B it P2
C 1B it 1
B it 1
B it 2
B it P2
3 2 1 0 P2 P1 P0
Parity GeneratorParity Generator
Pr0V
Parity Generator
1 3 0
Pr1V
2 0 3
Pr2V
1 2 3
3-8Decode
Correct Bit 0C 0
Pr0
P0
Pr1
P1
Pr2
P2
C 1
C 2C 3
CP 0CP 1
CP2Correct Bit 1Correct Bit 2
Correct Bit 3
Correct Bit P0
Correct Bit P1
Correct Bit P2
No Error
Bit in error Parity in error3 P2 P1 P02 P2 P11 P2 P00 P1 P0P2 P2P1 P1P0 P0
0123 P0P1P2
Four information bits Three parity bits
ECE 534 21
Hamming Error-Correcting Code
Require from 10% to 40% redundancyOverlapping parityThe Hamming single-error correcting code uses c parity check bits to protect n bits of information:
2c ≥ c + n + 1Example:
For four information bits (d3, d2, d1, d0), need three parity bits (p2, p1, p0)the bits are partitioned into groups as (d3, d1, d0, p0), (d3, d2, d0, p1) and (d3, d2, d1, p2)the grouping of bits can be determined from a list of binary numbers from 0 to 2n - 1.Each check bit is specified to set the parity, either even or odd, of its respective group 7 6 5 4 3 2 1
d3 d2 d1 p2 d0 p1 p0
ECE 534 22
Hamming Error-Correcting Code
p2p1p0 Bit in Error0 0 0 No Error0 0 1 P0
0 1 0 P1
0 1 1 d0
1 0 0 p3
1 0 1 d1
1 1 0 d2
1 1 1 d3
Parity bits calculationp2 = XOR of bits (5, 6, 7) = d1⊕d2⊕ d3p1 = XOR of bits (3, 6, 7) = d0⊕d2⊕ d3p0 = XOR of bits (3, 5, 7) = d0⊕d1⊕ d3
Parity checkingc2 = XOR of bits (4,5, 6, 7) = p2 ⊕ d1 ⊕d2 ⊕ d3c1 = XOR of bits (2,3, 6, 7) = p1 ⊕ d0 ⊕d2 ⊕ d3c0 = XOR of bits (1,3, 5, 7) = p0 ⊕ d0 ⊕d1 ⊕ d3
Example: d3d2d1d0= 0101 p2p1p0= 101Hence, code = d3d2d1p2d0p1p0 = 0101101Assume a bit error in bit d0 0101001
c3 = p2 ⊕ d1⊕d2⊕ d3 = 0c2 = p1⊕ d0⊕d2⊕ d3 = 1c1= p0⊕d0⊕d1⊕ d3 = 1
Hence, bit 3 is in error! And needs to be inverted.
7 6 5 4 3 2 1
d3 d2 d1 p2 d0 p1 p0
Observe that the check bits identifies the position of the bit in error
ECE 534 23
Check Bits and Syndromes for Single-Bit Errors
The original data is encoded by generating a set Cg, of parity bits.To check correctness, the encoding process is repeated and a set Cc, of parity bits is generated.If Cg and Cc agree, the information is correct.If Cg and Cc disagree, the information is incorrect and must be corrected.To aid the correction, a syndrome is defined:The syndrome is a binary word that has 1 in each bit position in which Cg and Cc disagree; the syndrome points directly to the erroneous bit.
Erroneous bits Check bits affected Syndromesd0 p0, p1 011d1 p0, p2 101d2 p1, p2 110d3 p0, p1, p2 111p1 p0 001p2 p1 010p3 p2 100
7 6 5 4 3 2 1
d3 d2 d1 p2 d0 p1 p0
ECE 534 24
Hamming Single-Error Correction Unit
p3 *
ParityGenerator
d3 d2 d1
p3
C3
p2
ParityGenerator
d3 d2 d0
p2
C2
p1
ParityGenerator
d3 d1 d0
p1
C1
**
Syndrome
Controlled Complementation Unit
d3d0 p3p2 d1 d2p1
d3d0 d1p1 p3p2 d2
C1C2C3
}
SyndromeSyndrome determineswhich bit (if any) Iscomplemented}
Corrected Bits
Hamming single-error correction unit for fourinformation bits and three check bits
7 6 5 4 3 2 1
d3 d2 d1 p3 d0 p2 p1
ECE 534 25
Double-Bit Errors
Say d1 and d0 become faulty, p2 p1 p0 = 111 d3 is incorrectly inverted.
Double errors can’t be corrected, can it be detected?Adding an extra parity bit to check data and overlapping parity bits
dn-1 … d1, d0 pk-1 … p0 p
Erroneous bits Check bits affected Syndromesd0 p0, p1 011d1 p0, p2 101d2 p1, p2 110d3 p0, p1, p2 111p0 p0 001p1 p1 010p3 p2 100
7 6 5 4 3 2 1
d3 d2 d1 p2 d0 p1 p0
Does not detect errorDetects errorDouble bits error
Detects errorDetects errorSingle bit error
ECE 534 26
Modified Hamming Code (SEC-DED)
c2 c1 c0 c3 Syndromes computationC2 = XOR (4, 5, 6, 7) C1 = XOR (2, 3, 6, 7)C0 = XOR (1, 3, 5, 7)
C3 = parity over all 8 bits of the code word
Check bits computationP2 = XOR (5, 6, 7)P1 = XOR (3, 6, 7) P0 = XOR (3, 5, 7)
P3 = parity over the first 7 bits of the code word i.e. d3 d2 d1 p2 d0 p1 p0
0 0 001
y2 y1 0y0
0 0 10
Single error in a position (x2x1x0)x2 x1 x0
H matrix for Single Error Correction and Double Error Detection
No errors
Double errorError in bit p4
11111111c3
10101010c0
01100110c1
00011110c2
p0p1d0p2d1d2d3p3
12345678
ECE 534 27
Single Error Correction and Double Error Detection Hamming Code (SEC-DED) Example
110011015
010001004
110010003
111011002
110011001
p0p1d0p2d1d2d3p4
12345678
Initial datad3 d2 d1 d00 1 1 0
CorrespondingSyndromes
No errors
Failure scenarios
Single error in position 3
Error in bit p3
Check bits computationP2 = XOR (5, 6, 7)P1 = XOR (3, 6, 7) P0 = XOR (3, 5, 7)No errors
Single error in position 3
Double error
Single error in position 6
Error in bit p3
Single error in position 6
Double error
Syndromes computationC2 = XOR (4, 5, 6, 7) C1 = XOR (2, 3, 6, 7)C0 = XOR (1, 3, 5, 7)
C3 = parity over all 8 bits of the code word00015
00104011131101200001c0c1c2c3
ECE 534 28
Checksum Codes - Basic Concepts
The checksum is appended to block data when such blocks are transferred
rn
r5r4r3r2r1
Transfer
Checksum onReceived Data
Received Versionof Checksum
Compare
dn
d5d4d3d2d1
Checksum onOriginal Data
di = original word of datari = received word of data
ECE 534 29
Single Precision Checksums
The single-precision checksum is unable to detect certain types of errors. The received checksum and the checksum of the received data are equal, so no error is detected.
0 1 1 1
0 0 0 1
0 1 1 0
0 0 0 0
d3 d2 d1 d0
1 1 1 0
Checksum
Original Data
1 1 1 1
1 0 0 1
1 1 1 0
1 0 0 0
d3 d2 d1 d0
1 1 1 0
Checksum of Received Data
Original Data
1 1 1 0
Received Checksum
Transmit Receive
d0
d1
d2
d3
Faulty LineAlways “1”
A single-precision checksum is formed by adding the data words and ignoring any overflow
Carry is Ignored
0 1 1 1 0 0 0 10 1 1 01 0 0 0
0 1 1 0( )
(Addition) +
1 Checksum
Original Data}
ECE 534 30
Double Precision Checksums
Compute 2n-bit checksum for a block of n-bit wordsOverflow is still a concern, but it is now overflow from a 2n-bits
The received checksum and the checksum of the received data are not equal, so the error is detected
0 1 1 1
0 0 0 1
0 1 1 0
0 0 0 0
d3 d2 d1 d0
0 0 0 0Checksum
Original Data
1 1 1 1
1 0 0 1
1 1 1 0
1 0 0 0
d3 d2 d1 d0
Checksum of Received Data
Received Data
Received Checksum
Transmit Receive
d0d1
d2
d3
Faulty LineAlways “1”
1 1 1 0 0 0 1 0 1 1 1 0
1 0 0 0 1 1 1 0
ECE 534 31
Honeywell Checksums
Concatenate consecutive words to form double words to create k/2words of 2n bits; checksum formed over newly structured data
Transmit Receive
d0d1
d2
d3
Faulty LineAlways “1”
0 1 1 1
0 0 0 1
0 1 1 0
0 0 0 0
d3 d2 d1 d0
Original Data
0 0 0 1 0 1 1 1
0 0 0 0 0 1 1 0
0 0 0 1 1 1 0 1
Checksum of Original Data
1 1 1 1
1 0 0 1
1 1 1 0
1 0 0 0
d3 d2 d1 d0
Received Data
1 0 0 1 1 1 1 1
1 0 0 0 1 1 1 0
0 0 1 0 1 1 0 1
Checksum of Received Data
Received Checksum
10 0 1 1 1 0 1
Word n
Word 2Word 3
Word 1
Checksum
Word n - 1 Word n
Word 9 Word 10
Word 7 Word 8
Word 5 Word 6
Word 3 Word 4
Word 1 Word 2
ECE 534 32
Residue Checksums
0 1 1 1
0 0 0 1
0 1 1 0
0 0 0 0
d3 d2 d1 d0
Original Data
1 1 1 0
Checksum of Original Data
Word n
Word 3Word 4
Word 2
Checksum
Sum of Data
Word 1
c
c
Carry from Addition
End-Around Carry Addition
1 1 1 11 0 0 1
1 1 1 0
1 0 0 0
d3 d2 d1 d0
Received Data
0 0 0 1
Checksum of Received Data
1 1 1 0
Received Checksum
Three Carries
Generated During
End-Around
Carry Addition
1 1 1 0
111
The same concept as the single-precision checksum except that the carry bit is not ignored and is added to checksum in an end-around carry fashion
Transmit Receive
d0d1
d2
d3
Faulty LineAlways “1”
ECE 534 33
Codes for Storage and Communication Cyclic Codes
Cyclic codes are parity check codes with additional property that cyclic shift of codeword is also a codeword
if (Cn-1, Cn-1 ... C1, C0) is a codeword, (Cn-2, Cn-3, ... C0, Cn-1) is also a codeword
Cyclic codes are used in
sequential storage devices, e.g. tapes, disks, and data links
communication applications
An (n,k) cyclic code can detect single bit errors, multiple adjacent bit errors affecting fewer than (n-k) bits, and burst transient errors
Cyclic codes require less hardware, in form of linear feedback shift registers
in comparison, parity check codes require complex encoding, decoding circuit using arrays of EX-OR gates, AND gates, etc.
ECE 534 34
Cyclic Code and Polynomials
Cyclic codes employ on the representation of data by a polynomial If (Cn-1, Cn-2 ... C1, C0) is a codeword, its polynomial representation is C(x)= Cn-1 xn-1 + Cn-2xn-2 + ... C1x + C0
The (n,k) cyclic code is characterized by
G(x): the generator polynomial of degree (n-k)
D(x): the data polynomial of degree (k-1)
dk-1, dk-2 ... d1, d0 D(x)= dk-1 xk-1 + Ck-2xk-2 + ... C1x + C0
Similar idea to number representation in a base
(d3d2d1d0)r = d3 × r3 + d2 × r2 + d1 × r + d0
V(x): Code polynomial Code word V
Code word V = (vn-1, vn-2 ... v1, v0) of degree (n-1)
ECE 534 35
Cyclic Code and Polynomials
Encoding process: V(x) = D(x) * G (x) using modulo 2
Code word V(x) is a non-separable code
Example: for (7,4) code, G(x) = x3 + x + 1
Given data word (1011) D(x) =x3 + x + 1
V(x) = G(x) * D(x) = (x3 + x + 1) (x3 + x + 1) = x6 + x4 + x3 +x4 + x2 + x + x3 + x + 1 =x6 + x2 + 1
Hence code word is (1000101)
Given data word (1111) D(x) =x3 + x2 + x + 1
V(x) = G(x) * D(x) = (x3 + x2 + x + 1) (x3 + x + 1) = = x6 + x5 + x3 + 1
Hence code word is (1101001)
ECE 534 36
Basic Operations on Polynomials
Can multiply or divide one polynomial by another, follow modulo 2 arithmetic, coefficients are 1 or 0, and addition and subtraction are same
G(x) is a polynomial of degree (n-k) for an (n,k) code, with a unity coefficient in (n-k) termG(x) is a factor of xn-1, i.e., it divides it with zero remainder
if a polynomial with degree n-k divides xn-1, then G(x) generates a cyclic codefor (7,4) code, G(x) = x3 + x + 1, can verify g(x) divides x7 -1
MultiplicationMultiplication DivisionDivision
(x4 + x3 + x2 + 1)(x3 + x)x7+ x6 + x5 + x3
+ x5 + x4 +x3 +x= x7 + x6 + x4 + x
x4 + x3 + x2 + 1Quotient
Remainder
x
x5 + x4
x5 + x4 + x3 + x
x3 + x
ECE 534 37
Cyclic Code - Example
Generator polynomial G(x) =x3 + x + 1 for (7,4) code
Data polynomial D(x) = d3x3 + d2x2 + d1x + d0
Code polynomial V(x)= ν6x6 + ν5x5 + ν4x4 + ν3x3 +
+ ν2x2 + ν1x + ν0
Code distance is 3
SEC/DED code
0000000100100011010001010110011110001001101010111100110111101111
0000000000110100110100010111011010001110010101110010001111010001100101111001011111111011100101000110001101001011
Cyclic codes for 4-bit information words.Information
(d0, d1, d2, d3,)Code
(ν0,ν1,ν2,ν3,ν4,ν5 ,ν6)
ECE 534 38
Consider blocks labeled X as multipliers, and addition elements as modulo 2
Another representation is to replace multipliers by storage elements, adders by EX-OR gates
Circuit to Generate Cyclic Code
x x x+ + v(x)
D(x)
g(x)=x3 + x + 1V(x) = [(x2 +1)x +1] D(x)
v(x)
D(x)
clock
Reg 1 Reg 2 Reg 3
ECE 534 39
Generation of Code Words
0000000100100011010001010110011110001001101010111100110111101111
0000000000110100110100010111011010001110010101110010001111010001100101111001011111111011100101000110001101001011
Cyclic codes for 4-bit information words.Information
(d0, d1, d2, d3,)Code
(ν0,ν1,ν2,ν3,ν4,ν5 ,ν6)
Data polynomial = d0 + d1x + d2x2 + d3x3
Generator polynomial = 1 + x + x3
Code polynomial = ν0 + ν1x + ν2x2 + ν3x3 + ν4x4 + ν5x5 + ν6x6
0
1
2
3
4
5
6
7
1
0
1
1
0
1
0
0
0
2
0
0
1
1
0
1
0
0
3
0
1
1
1
0
0
1
0
1
1
0
1
0
0
0
1
0
1
0
0
0
1
The encoding process
Register valuesClock period D(x)V(x)
v(x)
D(x)
clock
Reg 1 Reg 2 Reg 3
ECE 534 40
Decoding of Cyclic Codes
Determine if code word (rn-1, rn-2, ....., r1, r0) is validCode polynomial r(x) = rn-1 xn-1 + rn-2 xn-2 + ... r1 x + r0
If r(x) is a valid code polynomial, it should be a multiple generator polynomial g(x)
r(x) = d(x) g(x) + s(x), where s(x) the syndrome polynomial should be zero
Hence, divide r(x) by g(x) and check the remainder whether equal to 0
D(x) = V(x) / G(x)+
V(x) B(x)
D(x)
ECE 534 41
Decoding of Cyclic Codes
Example: for (7,4) code, G(x) = x3 + x + 1
D(x) = V(x) + B(x)
V(x) = D(x) – B(x)
Since in Modulo 2 addition and subtraction are the same
V(x) = D(x) + B(x)
V(x) = D(x) (x3 + x + 1)
B(x) = D(x) ( x3 + x)
+V(x) B(x)
D(x)
x x x
+B(x)
D(x)
ECE 534 42
Circuits for Decoding
Another representation is to replace multipliers by storage elements and adders by EX-OR gates
x x x
+ +V(x) B(x)
D(x)
Note: Once the division is completed, the registers contain the valueof the syndrome (remainder)
V(x)
D(x)Reg 1 Reg 2 Reg 3
clock
ECE 534 43
Example Decoding
0
1
2
3
4
5
6
7
1
0
0
0
1
1
0
1
0
2
0
0
1
1
0
1
0
0
3
0
1
1
0
1
0
0
0
1
0
1
0
0
0
1
0
1
1
1
0
0
1
The decoding process with correct information
Register valuesClock period V(x) B(x)
OriginalInformationSyndrome {
1
1
0
1
0
0
0
D(x)
Codeword
0
1
2
3
4
5
6
7
1
0
0
0
1
1
0
0
1
2
0
0
1
1
0
0
1
1
3
0
1
1
0
0
1
1
0
1
0
1
1
0
0
1
0
1
1
1
1
1
1
The decoding process with erroneous information
Register valuesClock period V(x) B(x)
1
1
0
0
1
1
0
D(x)
{Nonzero Syndrome
Receivedword
Generator polynomial, g(x) = x3 + x +1
v(x)
d(x)Reg 1 Reg
2Reg 3
clock
ECE 534 44
Arithmetic CodesUseful to check arithmetic operations
Parity codes are not preserved under addition, subtraction
Arithmetic codes can be separate (check bits disjoint from data bits) or nonseparate (combined check and data)
Several Arithmetic codes
AN codes, Residue codes, Inverse residue codes, Residue Number Systems (RNS)
Arithmetic codes must be invariant to a set of arithmetic operations
A(b*c) = A(b) * A(c)
ECE 534 45
AN CodesMultiple each data word N with some constant A
Code is invariant to addition and subtraction, but not to multiplication and division
Constant A determines the number of extra bits required and error detection capability provided
A ≠ 2a
If A = 2a
N = nj-1 nj-2 … n2 n1 n0
A N = nj-1 nj-2 … n2 n1 n0 00 … 0
AN still divisible to A if and bits ni is in error
a
ECE 534 46
Properties of AN CodesA valid AN code is 3N code
N 3 N0000 0000000001 0000110010 0001100011 001001
Operation performed under an AN code can be checked by determining if the result is evenly divisible by A
6 + 1 = 7 010010 + 000011 = 010101 (21 is divisible by 3)
If you have an output line s-a-1 i.e. 010111
23 not divisible by 3 and error is detected
ECE 534 47
Example of 3N Code
0000000100100011010001010110011110001001101010111100110111101111
000000000011000110001001001100001111010010010101011000011011011110100001100100100111101010101101
Resulting 3N code words for a 4-bit information words
OriginalInformation 3N code word
b5 b4 b3 b2 b1 b0 a5 a4 a3 a2 a1 a0
S5 S4 S3 S2 S1 S0
ADDER
3N CODEB
3N CODEA
3N CODEof Sum
(3N Code of 6)(3N Code of 1)(3N Code of 7)
A = 0 1 0 0 1 0+ B = 0 0 0 0 1 1
S = 0 1 0 1 0 1
Normal Operation If S1 is always “1”
(3N Code of 6)(3N Code of 1)(Not a valid 3N Code)
A = 0 1 0 0 1 0B = 0 0 0 0 1 1S = 0 1 0 1 1 1
Illustration of the error detection capabilities of the 3N arithmeticcode. The presence of the fault results in the sum being an invalid 3N code.
ECE 534 48
Generating AN Codes
(n+1) bit adder
N = nj-1 nj-2 … n2 n1 n02N = nj-1 nj-2 … n2 n1 n0 0
3N
Generating 3N code
00
01
00 01 11 10
0 1 3 2
4 5 7 6
12 13 15 14
8 9 11 10
11
10 1
1
1 1
1
1
How to check if a code is a 3N
ECE 534 49
Residue codes
Separable codes D | R D is data and R is the residue of the dataN = Q m + r ; r remainder or residue, Q quotient, and m modulus
N = 14 m = 3, r =2 14 = 2 mod (3)Example: residue of a 4 bit mod 3
Data residue Codeword0000 0 0000 000001 1 0001 010010 2 0010 100011 0 0011 000100 1 0100 01
ECE 534 50
Residue CodesResidue of an integer n with respect to A is: n mod (A)Residue code are invariant to additions((x mod A) + (y mod A)) mod A = (x+y) mod AResidues can be handled separately from the data during the addition process
If A is any integer not a power of 2, then Residue Code detects any single-bit error in an arithmetic operation of ADD or SUB
+A
+X
Y
residue(X mod A)
residue(Y mod A)
Equality checker
X + Y
Adder
Modulo-AAdder
ResidueGenerator(mod-A)
Error if not equal
ECE 534 51
Low Cost Residue Code
m = 2b -1 ( b ≥ 2) then code word has n+b bitsEncoding in low cost residue is done by dividing the information into b bits and add them in mod (2b -1).Example: b = 2 m = 22 – 1= 3
Data = (167)10 = (10100111)2 10 10 01 11
01 0110
Low cost residue code provides easy encoding Inverse residue code
Instead of appending r, append m-rBetter detection of repeated-used fault
167 mod 3 = 2
ECE 534 52
Residue Number System (RNS)
Represent a number by a set of relatively prime moduliP = [3, 4, 5]
3210 = (2 0 2)RNS
1410 = (2 2 4)RNS
3210 (2 0 2)RNS
+ 1410 + (2 2 4)RNS
4610 (1 2 1)RNS
Speed advantage – carry-free number systemError detection capability
ECE 534 53
Residue Number System (RNS)
Example: P = [3,4]
Number RNS RNS (Binay)0 0 0 00 001 1 1 01 012 2 2 10 103 0 3 00 114 1 0 01 005 2 1 10 016 0 2 00 107 1 3 01 11
Code distance is one
Example: P = [3, 4, 5]
Number RNS RNS (Binay)0 0 0 0 00 00 0001 1 1 1 01 01 0012 2 2 2 10 10 0103 0 3 3 00 11 0114 1 0 4 01 00 1005 2 1 0 10 01 0006 0 2 1 00 10 0017 1 3 2 01 11 010
Code distance is twoTolerates one error
Redundant moduli provide higher code distance and thereforeprovides error detection
ECE 534 54
Self-Checking Concept
Have systems with capabilities of self-checking
Duplication and coding scheme, need to compare outputs of two modules
What if the checker fails – single point of failure sceneraio
Self Checking is the ability to automatically detect the existence of a fault without the need for any externally applied stimulus
Circuit(if faulty)
CodewordValid codeword
Invalid codeword
ECE 534 55
Self-Checking Circuits
Self-testing circuit for every fault from a prescribed set there exists at least one valid input code word that will produce an invalid output code word when a single fault is present in the circuit
Fault secure circuit any single fault from a prescribed set results in the circuit either producing the correct code word or producing a non-code word, for any valid input code word
Totally self-checking circuit (TSC)
the circuit is both fault secure and self-testing
all single faults are detectable by at least one valid code wordinput, and when a given input combination does not detect the fault, the output is the correct code word output
ECE 534 56
Self Checking Checker
Two outputs needed to avoid stuck at fault situationIf checker is faulty free (valid output)
Checker output = 01 or 10If checker is faulty
Checker output = 00 or 11
Checker
Coded OutputCoded InputCircuit
f Error g indicator
ECE 534 57
Two Rail Checker
f = A ⊕ B = AB + AB A = x0 = y0g = A ⊕ B = AB + AB B = x1 = y1
f = x0y1 + x1y0g = x0x1 + y1y0
x0
y1
x1
y0
f
g
ECE 534 58
Two Rail Checkers
fg
x0
y1
x1
y0 x2
y3
x3
y2
x4
y5
x5
y4
f1g1
f2g2
f3g3 f4g4
x0
y1
x1
y0
fg
ECE 534 59
“Vanilla” Node/Network Controller
Node/network controller
• executes high level protocols (e.g., reliable multicast) and interfaces directly with the network• host/ processing element can directly access memory of the controller• the controller can access memory of the host/processing element
Compute Node
Host/Processing Element
Node/Network Controller
Network Interface
RAMROMProcessorBuffer
PCI BUS
Syst
em B
us
ECE 534 60
Fault Resilient Node/Network Controller
High level protocol engine(executes high level protocols, e.g., reliable multicast)
• duplicated, tightly coupled CPUs• data buses compared onmemory accesses and otherCPU operations, e.g., write tomemory of the the network adapter)
• memory write-protection (MASK)on all write operations from the processing element (usually DMA operations)
• checksum computation on the data from ROM
Network adapter• parity checking on all data transfers over the bus
• watchdog to monitor the node communications with the network
• control flow checking and dataaudit to monitor SW errors
Compute Node
Host/Processing Element
RAMROMProcessor 1
Node/Network Controller
RAMROMProcessor 2
Reset
Network Interface
MASK
MASK
RAMROMLow levelprotocol engine
Buffer
Bridge
Network Adapter
High levelprotocol engineComparator
PCI BUS
Watch-dog
Syst
em B
us
Recommended