Chapter 3 Sections 3.1 – 3.5 & 3.8 Appendix C.1 – C.3, C.5 – C.6 Dr. Iyad F. Jafar Arithmetic for Computers

Slide 2 Outline
Addition and Subtraction
Overflow Detection
Faster Addition
The 1-Bit ALU
The 32-bit MIPS ALU
Shift Operations
Multiplication
Division
Floating Point Numbers
Fallacies and Pitfalls

Slide 3 Addition and Subtraction
Add corresponding bits, including the sign bit, and ignore the carry out of the MSB.
For subtraction, add the negative.
Examples (4-bit two's complement):
   4 + 3      0100 + 0011 = 0111 (7)
  -4 + 3      1100 + 0011 = 1111 (-1)
   4 - 3      0100 + 1101 = 1 0001, i.e. 0001 (1) once the carry out is ignored
  -4 - (-3)   1100 - 1101 becomes -4 + 3: 1100 + 0011 = 1111 (-1)

Slide 4 Detecting Overflow
When do we get overflow?
Adding two positive numbers and getting a negative number.
Adding two negative numbers and getting a positive number.
Investigate the sign bit!
  Sign bits   Cin   Cout   Signs (A, B, result)   Outcome
  0 + 0       0     0      +  +  +                No overflow
  0 + 0       1     0      +  +  -                Overflow
  1 + 1       0     1      -  -  +                Overflow
  1 + 1       1     1      -  -  -                No overflow
  1 + 0       0     0      -  +  -                No overflow
  1 + 0       1     1      -  +  +                No overflow
Overflow occurs when the carry into the sign bit does not equal the carry out: Overflow = Cin ⊕ Cout.

Slide 5 Addition and Subtraction
How to perform addition in hardware?
Design a 32-bit adder (two 32-bit inputs!).
Cell design: the 1-bit full adder, with inputs A, B and CarryIn and outputs Sum and CarryOut.
  A  B  Cin   Cout  Sum
  0  0  0     0     0
  0  0  1     0     1
  0  1  0     0     1
  0  1  1     1     0
  1  0  0     0     1
  1  0  1     1     0
  1  1  0     1     0
  1  1  1     1     1
[Figure: Karnaugh maps for Cout and Sum]
Cout = B.Cin + A.Cin + A.B
Sum = A ⊕ B ⊕ Cin

Slide 6 Addition and Subtraction
32-bit ripple-carry adder
Cascade 32 copies of the full adder and wire them up through Cin and Cout.
How long does it take to get the result?
[Figure: chain of full adders (FA) with carry-in 0, inputs A0..A31 and B0..B31, outputs S0..S31 and final carry C32]

Slide 7 Addition and Subtraction
32-bit ripple-carry subtractor
Subtraction is addition of the negative!
Compute the 2's complement = 1's complement + 1.
[Figure: chain of full adders (FA) with carry-in 1, inputs A0..A31 and B0..B31, outputs D0..D31 and final carry B32]

Slide 8 Addition and Subtraction
32-bit ripple-carry adder/subtractor
Redundancy in hardware!
Subtraction is addition of the negative!
Use one adder and configure the second input.
Remember X ⊕ 0 = X and X ⊕ 1 = NOT X.
[Figure: chain of full adders (FA) whose carry-in is the Add/Sub control (0 = add, 1 = subtract); inputs A0..A31 and B0..B31, outputs S0..S31 and final carry C32]

Slide 9 Faster Addition
The ripple-carry adder is slow!
We have to wait until the carry has propagated to the final position before we can read out the addition or subtraction result.
Carry generation takes two levels of gates at each bit position:
  Cout_i = A_i.B_i + A_i.Cin_i + B_i.Cin_i
Total delay = gate delay x 2 x number of bits.
Example: the delay of a 16-bit adder is 32 gate delays.
Can we go faster? What if we generate the carries in parallel?

Slide 10 Faster Addition
The carries can be expressed in terms of the adder's inputs and c0 exclusively!
Add separate hardware to compute the carries in parallel!
Carry-lookahead adder
[Figure: carry-lookahead hardware taking A31..A0, B31..B0 and c0 and producing the carries c1, c2, c3, c4, ...]

Slide 11 Faster Addition
In a 4-bit adder, the equations of the carries are
  c1 = (b0.c0) + (a0.c0) + (a0.b0)
  c2 = (b1.c1) + (a1.c1) + (a1.b1)
  c3 = (b2.c2) + (a2.c2) + (a2.b2)
  c4 = (b3.c3) + (a3.c3) + (a3.b3)
By substitution
  c2 = (a1.a0.b0) + (a1.a0.c0) + (a1.b0.c0) + (b1.a0.b0) + (b1.a0.c0) + (b1.b0.c0) + (a1.b1)
  c3 = (b2.a1.a0.b0) + (b2.a1.a0.c0) + (b2.a1.b0.c0) + (b2.b1.a0.b0) + (b2.b1.a0.c0) + (b2.b1.b0.c0) + (b2.a1.b1)
     + (a2.a1.a0.b0) + (a2.a1.a0.c0) + (a2.a1.b0.c0) + (a2.b1.a0.b0) + (a2.b1.a0.c0) + (a2.b1.b0.c0) + (a2.a1.b1)
     + (a2.b2)
  c4 = ...
All carries require two gate delays!
However, imagine the size and cost of the equations if the adder is 32 bits wide.
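
The full adder, the shared adder/subtractor and the overflow rule from the Addition and Subtraction slides can be sketched in a few lines of Python. This is only an illustrative model, not the deck's hardware: the names full_adder and add_sub and the width parameter are assumptions made for this example, and the bit loop stands in for the chain of full adders.

```python
def full_adder(a, b, cin):
    """1-bit full adder: Sum = A xor B xor Cin, Cout = A.B + A.Cin + B.Cin."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def add_sub(a, b, sub, width=4):
    """Ripple-carry adder/subtractor on `width`-bit two's complement patterns.

    sub = 0 adds. sub = 1 XORs every bit of b with 1 and sets the first
    carry-in to 1, which adds the two's complement of b.
    Returns (result bit pattern, overflow flag).
    """
    carry = sub                      # carry-in of the least significant bit
    carry_into_msb = 0
    result = 0
    for i in range(width):           # one full adder per bit position
        if i == width - 1:
            carry_into_msb = carry
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ sub    # X xor 0 = X, X xor 1 = NOT X
        s, carry = full_adder(ai, bi, carry)
        result |= s << i
    overflow = carry_into_msb ^ carry  # carry into sign bit != carry out
    return result, overflow


if __name__ == "__main__":
    print(add_sub(0b0100, 0b0011, 0))  # 4 + 3
    print(add_sub(0b0100, 0b0011, 1))  # 4 - 3
    print(add_sub(0b0101, 0b0100, 0))  # 5 + 4 does not fit in 4 bits
```

The final carry out is simply dropped from the result, as the slides prescribe; it is used only to form the overflow flag.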

Slide 12 Faster Addition
We can reduce the logic cost by a simple simplification:
  c(i+1) = (a_i.b_i) + (b_i.c_i) + (a_i.c_i)
         = (a_i.b_i) + (a_i + b_i).c_i
         = g_i + p_i.c_i
  g_i = a_i.b_i : carry generate
  p_i = a_i + b_i : carry propagate
Carry equations for a 4-bit adder
  c1 = g0 + p0.c0
  c2 = g1 + p1.c1 = g1 + (p1.g0) + (p1.p0.c0)
  c3 = g2 + p2.c2 = g2 + (p2.g1) + (p2.p1.g0) + (p2.p1.p0.c0)
  c4 = g3 + p3.c3 = g3 + (p3.g2) + (p3.p2.g1) + (p3.p2.p1.g0) + (p3.p2.p1.p0.c0)
The delay to generate c4 is 3 gate delays.
The cost is still high for large adders!

Slide 13 Faster Addition
2nd Level of Abstraction
Example: a 16-bit adder. Assume that we have four 4-bit carry-lookahead adders.
These 4-bit adders are designed to produce super generate (G) and super propagate (P) signals.
  P: the four bits propagate a carry to the next four bits
  G: the four bits generate a carry to the next four bits
The super carry signals are fed to a separate carry generation unit.
[Figure: a 4-bit CLA block with inputs A3-A0, B3-B0 and c0, and outputs S3-S0, P0 and G0]

Slide 14 Faster Addition
We need to generate the carry propagate and generate signals at a higher level.
Think of each 4-bit adder block as a single unit that can either generate or propagate a carry.
[Figure: four 4-bit CLA blocks for bits 3-0, 7-4, 11-8 and 15-12, producing S3-S0, S7-S4, S11-S8 and S15-S12 together with P0/G0 ... P3/G3; a carry generation unit takes these signals and C0 and produces C1, C2, C3 and C4]

Slide 15 Faster Addition
Super propagate signals
P0 = p3 p2 p1 p0 (how can the first 4-bit adder propagate c0?)
P1 = p7 p6 p5 p4
P2 = p11 p10 p9 p8
P3 = p15 p14 p13 p12
Super generate signals
G0 = g3 + (p3 g2) + (p3 p2 g1) + (p3 p2 p1 g0)
G1 = g7 + (p7 g6) + (p7 p6 g5) + (p7 p6 p5 g4)
G2 = g11 + (p11 g10) + (p11 p10 g9) + (p11 p10 p9 g8)
G3 = g15 + (p15 g14) + (p15 p14 g13) + (p15 p14 p13 g12)
Carry signals at the higher level are
C1 = G0 + (P0 c0)
C2 = G1 + (P1 G0) + (P1 P0 c0)
C3 = G2 + (P2 G1) + (P2 P1 G0) + (P2 P1 P0 c0)
C4 = G3 + (P3 G2) + (P3 P2 G1) + (P3 P2 P1 G0) + (P3 P2 P1 P0 c0)

Slide 16 Faster Addition
Each super carry signal is a two-level implementation in terms of Pi and Gi.
Pi is one level of gates, while Gi is two levels; both are expressed in terms of pi and gi.
pi and gi are one level of gates.
Total delay is 2 + 2 + 1 = 5 gate delays.
The 16-bit CLA is ~6 times faster than the 16-bit ripple-carry adder (5 gate delays versus 32).

Slide 17 Designing the ALU
We want to design an ALU that
  supports logic operations
  supports arithmetic operations
  supports the set-on-less-than instruction
  supports test for equality
with special handling for
  sign extension
  zero extension
  overflow detection
[Figure: ALU block with 32-bit inputs A and B, an operation input m, a 32-bit result, and 1-bit zero and ovf outputs]

Slide 18 Designing the ALU
We start with a 1-bit ALU.
Starting with the logical operations is easier since they map directly to hardware.
Two operands, two results (A.B and A+B). We need only one result, so use a 2-to-1 MUX.
The Operation input comes from logic that looks at the opcode.
  Function   Operation
  A and B    0
  A or B     1
[Figure: AND gate and OR gate on A and B feeding a 2-to-1 MUX selected by Operation]

Slide 19 Designing the ALU
How about addition?
Add an adder.
Connect Cin (from the previous bit) and Cout (to the next bit).
Expand the MUX to 3-to-1 (Operation is now 2 bits).
  Function   Operation
  A and B    00
  A or B     01
  A + B      10
[Figure: 1-bit ALU with AND, OR and adder outputs feeding a 3-to-1 MUX; the adder has Cin and Cout]

Slide 20 Designing the ALU
How about subtraction?
[Figure: 1-bit ALU with a BInvert MUX on the B input, an adder with Cin and Cout, and the Operation MUX producing Result]
Use the same adder for subtraction.
Depending on the operation, choose whether or not to compute the 2's complement of B (MUX or XOR).
For the 2's complement, define the BInvert signal and set Cin of the LSB to 1.
  Function   Operation   BInvert   Cin
  A and B    00          0         x
  A or B     01          0         x
  A + B      10          0         0
  A - B      10          1         1

Slide 21 Designing the ALU
Can we add the NOR instruction?
No need to add a NOR gate!
Use De Morgan's theorem, an inverter and a 2-to-1 MUX.
Define the AInvert signal.
  Function   Operation   BInvert   Cin   AInvert
  A and B    00          0         x     0
  A or B     01          0         x     0
  A + B      10          0         0     0
  A - B      10          1         1     0
  A nor B    00          1         x     1
[Figure: the 1-bit ALU of the previous slide with an additional AInvert MUX on the A input]

Slide 22 Designing the ALU
Building the 32-bit ALU
We simply need to wire up 32 copies of the 1-bit ALU designed earlier, with special care for the LSB ALU.
The Cin and BInvert signals of the LSB are the same, so tie them together into one signal, BNegate.
[Figure: the LSB ALU, with AInvert, BNegate (driving both BInvert and Cin), Operation, Cout and Result]

Slide 23 Designing the ALU
Building the 32-bit ALU
[Figure: ALU0, ALU1, ALU2, ..., ALU31 chained from Cout to Cin, with inputs A0..A31 and B0..B31, outputs Result0..Result31, and shared Operation and BNegate signals]
Note that Cin and BNegate are the same for the LSB in order to compute the 2's complement in case of subtraction.

Slide 24 Designing the ALU
Supporting the SLT instruction
Expand the multiplexer with one more input (Less).
Subtract the two registers and feed the sign bit (the result of bit 31) back to the Less input of the LSB ALU.
The Less inputs of the remaining ALUs are 0.

Slide 25 Designing the ALU
The second version of the 32-bit ALU
For the SLT instruction, the MSB is fed back to the LSB while the other bits are set to zero!
The operation is basically subtraction.
[Figure: ALU0..ALU31 with Operation and BNegate; every 1-bit ALU has a Less input, tied to 0 for all but ALU0; ALU31 produces the Set and OverFlow outputs, and Set drives the Less input of ALU0]

Slide 26 Designing the ALU
Supporting branch instructions
Basically, subtract the two registers!
However, we need to generate a signal that indicates whether the result is zero or not.
Simply OR the result bits and take the complement.
This signal is used to select between the branch address and the PC.
[Figure: example of using the Zero signal to select the address for the BEQ instruction]

Slide 27 Designing the ALU
The 32-bit ALU
[Figure: the complete 32-bit ALU built from ALU0..ALU31, with Operation, BNegate, Less, Set and OverFlow signals]

Slide 28 Designing the ALU
The 32-bit ALU: list of supported operations
  Function   Operation   BNegate   AInvert
  A and B    00          0         0
  A or B     01          0         0
  A + B      10          0         0
  A - B      10          1         0
  A nor B    00          1         1
  SLT        11          1         0
  BEQ        10          1         0
  BNE        10          1         0

Slide 29 Shift Operations
Shift operations are commonly needed!
The MIPS ISA specifies three shift instructions.
Two logical shift instructions
  SLL $rt, $rs, shift_amount   # R[rt] = R[rs] << shift_amount
  SRL $rt, $rs, shift_amount   # R[rt] = R[rs] >> shift_amount
One arithmetic shift instruction
  SRA $rt, $rs, shift_amount   # R[rt] = R[rs] >> shift_amount
What is the difference? Unlike SRL, the SRA instruction preserves the sign of the number!
Encoding (R-type):
  op (6) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)

Slide 30 Shift Operations
Example 1.
1. You need to extract the 2nd byte of a 4-byte word in $t1.
  $t1 = 0010 0011 0111 0110 1010 1111 0000 1101
  srl $t1, $t1, 8
  $t1 = 0000 0000 0010 0011 0111 0110 1010 1111
  andi $t1, $t1, 0x00FF        # mask = 0000 0000 0000 0000 0000 0000 1111 1111
  $t1 = 0000 0000 0000 0000 0000 0000 1010 1111
2. You want to multiply $t3 by 8 (note: 8 equals 2^3).
  $t3 = 0000 0000 0000 0000 0000 0000 0000 0101   (equals 5)
  sll $t3, $t3, 3              # move 3 places to the left
  $t3 = 0000 0000 0000 0000 0000 0000 0010 1000   (equals 40)

Slide 31 Shift Operations
How are these instructions implemented? Outside the ALU.
Shift registers: slow; shifting by one bit requires one cycle!
Barrel shifters:
  A digital circuit that can shift a data word by a specified number of bits in one clock cycle, if the cycle is long enough!
  Simply a set of multiplexors!

Slide 32 Shift Operations
Example 2. 4-bit barrel shifter (rotate left by 0, 1, 2, or 3 bits)
  Shift value    Output
  S1 S0          Y3 Y2 Y1 Y0
  0  0           D3 D2 D1 D0
  0  1           D2 D1 D0 D3
  1  0           D1 D0 D3 D2
  1  1           D0 D3 D2 D1
[Figure: 4-bit barrel shifter block with 4-bit input D, select inputs S1 S0 and 4-bit output Y, built from four 4-to-1 multiplexors, one per output Y0..Y3]

Slide 33 Multiplication
Multiplying two 3-digit numbers A and B takes
  n partial products, where B is n digits long
  n - 1 additions
Decimal example:
        4 2 1
      x 1 2 3
      -------
      1 2 6 3
      8 4 2
  + 4 2 1
  -----------
    5 1 7 8 3
In binary (6 x 5 equals 30):
      1 1 0     multiplicand
    x 1 0 1     multiplier
    -------
      1 1 0
    0 0 0
+ 1 1 0
-----------
  1 1 1 1 0
Each partial product is either 110 (A*1) or 000 (A*0).
Note: the product may take as many as two times the number of bits!

Slide 34 Multiplication
Multiplication steps (for 110 x 101)
  Step 1: LSB of the multiplier is 1. Add a copy of the multiplicand.
  Step 2: Shift the multiplier right to reveal the new LSB. Shift the multiplicand left to multiply it by 2.
  Step 3: LSB of the multiplier is 0. Add zero.
  Step 4: Shift the multiplier right and the multiplicand left.
  Step 5: LSB of the multiplier is 1. Add a copy of the multiplicand.
  Step 6: Add the partial products. Done!
Thus, we need hardware to:
  1. Hold the multiplier (32 bits) and shift it right
  2. Hold the multiplicand (32 bits) and shift it left (requires 64 bits)
  3. Hold the product (result) (64 bits)
  4. Add the multiplicand to the current result

Slide 35 Multiplication
Multiplication hardware
  1. Hold the multiplier (32 bits) and shift it right
  2. Hold the multiplicand (32 bits) and shift it left (requires 64 bits)
  3. Hold the product (result) (64 bits)
  4. Add the multiplicand to the current result
  5. Control the whole process
[Figure: 64-bit Multiplicand register (shift left), 32-bit Multiplier register (shift right) whose LSB feeds Control, a 64-bit adder, and a 64-bit Product register with a Write input]

Slide 36 Multiplication
Example 3. (4-bit multiplication)
  Action                                        Multiplicand   Multiplier   Product
  Initial values                                xxxx1101       0101         00000000
  1 -> add multiplicand to product, then
       shift Mcand left, Mplier right           xxx11010       0010         00001101
  0 -> do nothing, then
       shift Mcand left, Mplier right           xx110100       0001         00001101
  1 -> add multiplicand to product, then
       shift Mcand left, Mplier right           x1101000       0000         01000001
  0 -> do nothing, then
       shift Mcand left, Mplier right           11010000       0000         01000001
[Figure: the multiplication hardware scaled down to an 8-bit multiplicand, a 4-bit multiplier, an 8-bit adder and an 8-bit product]

Slide 37 Multiplication
A cheaper implementation
Even though we're only adding 32 bits at a time, we need a 64-bit adder.
Instead, hold the multiplicand still and shift the product register right!
Now we're only adding 32 bits each time.
[Figure: 32-bit Multiplicand register, 32-bit adder, 64-bit Product register split into LH Product and RH Product (shift right, Write, extra bit for the carry out), Multiplier register (shift right), and Control]

Slide 38 Multiplication
A cheaper than the cheaper implementation
Note that we're shifting bits out of the multiplier and into the product.
Why not put these together into the same register?
As space opens up in the multiplier, overwrite it with the product bits.
[Figure: 32-bit Multiplicand register, 32-bit adder, and one 64-bit register holding LH Product and Multiplier (shift right, Write); its LSB feeds Control]

Slide 39 Multiplication
Fast multiplication
Use 31 32-bit adders to compute the partial products.
One input of each adder is the multiplicand ANDed with a multiplier bit, and the other is the partial product from the previous step.
Question: Show the multiplication tree to compute 5 x 3. Assume unsigned numbers represented using 3 bits and that we have a 4-bit ALU.
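
The "cheaper than the cheaper" scheme above, in which the multiplier sits in the right half of the product register and the whole register shifts right, can be sketched in Python as follows. This is a software model under stated assumptions, not the slide's hardware: the function name shift_add_multiply and the parameter n are invented for this example, operands are unsigned, and a Python integer stands in for the product register together with its extra carry bit.

```python
def shift_add_multiply(multiplicand, multiplier, n=32):
    """Unsigned n-bit by n-bit multiply using a single product register.

    The right half of the register starts out holding the multiplier.
    On every step the LSB decides whether the multiplicand is added to
    the left half, and then the whole register shifts right by one bit.
    """
    mask = (1 << n) - 1
    assert 0 <= multiplicand <= mask and 0 <= multiplier <= mask
    product = multiplier                   # left half = 0, right half = multiplier
    for _ in range(n):
        if product & 1:                    # current multiplier bit
            product += multiplicand << n   # n-bit add into the left half;
                                           # the Python int keeps the carry out
        product >>= 1                      # shift product and multiplier together
    return product


if __name__ == "__main__":
    p = shift_add_multiply(0b1101, 0b0101, n=4)   # the 4-bit example: 13 x 5
    print(format(p, "08b"))
    hi, lo = p >> 4, p & 0b1111                   # upper and lower halves
    print(hi, lo)
```

Splitting the result into its upper and lower halves, as in the usage lines, mirrors how a 64-bit product is divided between two 32-bit registers.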

Slide 40 Multiplication
MIPS multiplication
Two multiplication instructions
  mult  $s0, $s1   # hi||lo = $s0 * $s1
  multu $s0, $s1   # hi||lo = $s0 * $s1
The result is 64 bits and is stored in two special registers:
  Lo holds the lower 32 bits of the result
  Hi holds the upper 32 bits of the result
The contents of these registers can be read using two special instructions:
  mfhi $t5   # move Hi to register $t5
  mflo $t6   # move Lo to register $t6
Encoding (R-type):
  op (6) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)

Slide 41 Multiplication
MIPS multiplication (notes)
Both multiplication instructions ignore overflow!
It is the responsibility of the software to check whether the result fits into 32 bits!
  For MULTU, there is no overflow if Hi is 0.
  For MULT, there is no overflow if Hi is the replicated sign of Lo.
Question: Modify the designed multiplier to support signed multiplication.

Slide 42 Division
Dividend = Divisor * Quotient + Remainder
Idea: repeatedly subtract the divisor, shifting as appropriate.
Decimal example: dividend 48323, divisor 15, quotient 3221, remainder 8.
Binary example (73 / 5 = 14, remainder 3): dividend 1001001, divisor 101, quotient 01110, remainder 11.
  100  - 000 = 100     quotient bit 0
  1001 - 101 = 100     quotient bit 1
  1000 - 101 = 11      quotient bit 1
  110  - 101 = 1       quotient bit 1
  11   - 000 = 11      quotient bit 0

Slide 43 Division
Looking at the alignment a little differently
Make the dividend 8 bits and the divisor 4 bits by filling in with 0s.
In each iteration, re-express the entire remainder as 8 bits.
Try subtracting the divisor from the current remainder each time; if it doesn't fit, restore the remainder.
Note: at any step, the dividend = divisor * quotient + current remainder.
  Dividend 01001001, divisor 0101, quotient 01110
  01001001 - 01010000   does not fit, remainder stays 01001001   quotient bit 0
  01001001 - 00101000 = 00100001                                  quotient bit 1
  00100001 - 00010100 = 00001101                                  quotient bit 1
  00001101 - 00001010 = 00000011                                  quotient bit 1
  00000011 - 00000101   does not fit, remainder stays 00000011   quotient bit 0

Slide 44 Division
Division hardware
  1. Hold the divisor (32 bits) and shift it right (requires 64 bits)
  2. Hold the remainder (64 bits)
  3. Hold the quotient (result) (32 bits) and shift it left
  4. Subtract the divisor from the current result
  5. Control the whole process
[Figure: 64-bit Divisor register (shift right), 64-bit adder/subtractor, 64-bit Remainder register with a Write input, 32-bit Quotient register (shift left), and Control]
Algorithm
initialize registers (divisor in LHS); for (i=0; i
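
The algorithm text above breaks off in this transcript, so the sketch below does not attempt to complete it. It is a minimal Python model of the restoring scheme described on the Division slides (divisor starting in the left half of a double-width register and shifting right, a trial subtraction on every step, the quotient shifting left), under stated assumptions: the name restoring_divide, the parameter n and the choice of n + 1 iterations are assumptions of this example, and operands are unsigned.

```python
def restoring_divide(dividend, divisor, n=32):
    """Unsigned restoring division; returns (quotient, remainder).

    The divisor starts in the left half of a 2n-bit register and moves
    one position to the right after every step. Each step tries to
    subtract it from the remainder; if it does not fit, the remainder
    is left (restored) unchanged and the quotient bit is 0.
    """
    assert divisor > 0 and dividend >= 0
    remainder = dividend
    shifted_divisor = divisor << n      # divisor placed in the left half
    quotient = 0
    for _ in range(n + 1):              # n + 1 alignments of the divisor
        trial = remainder - shifted_divisor
        if trial >= 0:                  # it fits: keep the difference
            remainder = trial
            quotient = (quotient << 1) | 1
        else:                           # it does not fit: restore
            quotient = quotient << 1
        shifted_divisor >>= 1
    return quotient, remainder


if __name__ == "__main__":
    q, r = restoring_divide(0b01001001, 0b0101, n=4)   # 73 / 5
    print(format(q, "05b"), format(r, "08b"))
```

With n = 4 the loop makes five trial subtractions, which matches the five quotient bits of the 8-bit worked example on the slides.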
