62
Advanced ALU Lecture 5 CS365

Advanced ALU Lecture 5 CS365. D. Barbara Advanced ALU CS465 F08 2 RoadMap Implementation of MIPS ALU Signed and unsigned numbers (Section 3.2) Addition

Embed Size (px)

Citation preview

Advanced ALU

Lecture 5

CS365

Advanced ALU CS465 F082

D. Barbara

RoadMap Implementation of MIPS ALU

Signed and unsigned numbers (Section 3.2) Addition and subtraction (Section 3.3) Constructing an arithmetic logic unit (Appendix

B.5, B.6) Multiplication (Section 3.4) Division (Section 3.5) Floating point (Section 3.6)

This lecture

Advanced ALU CS465 F083

D. Barbara

Paper and pencil example (unsigned): Multiplicand 1000

Multiplier 1001 1000 0000 0000 1000

Product 01001000 m bits x n bits = m+n bit product (at most) Binary makes it easy:

0 => place 0 ( 0 x multiplicand) 1 => place a copy ( 1 x multiplicand)

4 versions of multiply hardware & algorithm: Successive refinement

Unsigned Multiply

Advanced ALU CS465 F084

D. Barbara

B0

A0A1A2A3

A0A1A2A3

A0A1A2A3

A0A1A2A3

B1

B2

B3

P0P1P2P3P4P5P6P7

0 0 0 0

Unsigned Combinational Multiplier

Stage i accumulates A * 2i if Bi == 1

Advanced ALU CS465 F085

D. Barbara

B0

A0A1A2A3

A0A1A2A3

A0A1A2A3

A0A1A2A3

B1

B2

B3

P0P1P2P3P4P5P6P7

0 0 0 00 0 0

How Does It Work?

At each stage shift A left ( x 2) Use next bit of B to determine whether to add in shifted

multiplicand Accumulate 2n bit partial product at each stage

Advanced ALU CS465 F086

D. Barbara

Product

Multiplier

Multiplicand

64-bit ALU

Shift Left

Shift Right

WriteControl

32 bits

64 bits

64 bits

Unsigned Multiplier (Version 1) Shift-add multiplier

64-bit Multiplicand reg, 64-bit ALU, 64-bit Product reg, 32-bit Multiplier reg

Figure 3.5

Advanced ALU CS465 F087

D. Barbara

3. Shift the Multiplier register right 1 bit.

Done

Yes: 32 repetitions

2. Shift the Multiplicand register left 1 bit.

No: < 32 repetitions

1. TestMultiplier0

Multiplier0 = 0Multiplier0 = 1

1a. Add multiplicand to product & place the result in Product register

32nd repetition?

Start

Multiply Algorithm Version 1

0010 x 0011

Product Multiplier Multiplicand0000 0000 0011 0000 00100000 0010 0001 0000 01000000 0110 0000 0000 10000000 0110 0000 0001 00000000 0110

Figure 3.6

Advanced ALU CS465 F088

D. Barbara

Observations on Multiply Version 1 One clock cycle per step => ~100 clocks per

multiply

Half bits in multiplicand always 0 => 64-bit adder is wasted

0 is inserted when shift the multiplicand left=> least significant bits of product never changed once formed

Instead of shifting multiplicand to left, shift product to right?

Advanced ALU CS465 F089

D. Barbara

B0

B1

B2

B3

P0P1P2P3P4P5P6P7

0 0 0 0

A0A1A2A3

A0A1A2A3

A0A1A2A3

A0A1A2A3

How Does It Work?

Multiplicand stays still and product moves right

Advanced ALU CS465 F0810

D. Barbara

Product

Multiplier

Multiplicand

32-bit ALU

Shift Right

WriteControl

32 bits

32 bits

64 bits

Shift Right

Unsigned Multiplier Version 2 Refined from the first version: 32-bit

Multiplicand reg, 32 -bit ALU, 64-bit Product reg, 32-bit Multiplier reg

Advanced ALU CS465 F0811

D. Barbara

3. Shift the Multiplier register right 1 bit.

Done

Yes: 32 repetitions

2. Shift the Product register right 1 bit.

No: < 32 repetitions

1. TestMultiplier0

Multiplier0 = 0Multiplier0 = 1

1a. Add multiplicand to the left half of product & place the result in the left half of Product register

32nd repetition?

Start

0000 0000 0011 00101: 0010 0000 0011 00102: 0001 0000 0011 00103: 0001 0000 0001 00101: 0011 0000 0001 00102: 0001 1000 0001 00103: 0001 1000 0000 00101: 0001 1000 0000 00102: 0000 1100 0000 00103: 0000 1100 0000 00101: 0000 1100 0000 00102: 0000 0110 0000 00103: 0000 0110 0000 0010

0010 x 0011

Product Multiplier Multiplicand

Multiply Algorithm Version 2

Advanced ALU CS465 F0812

D. Barbara

Observations on Multiply Version 2 Useful portion of multiplier shrinks while

valid portion of product grows in the process of multiplication

Product register wastes space that exactly matches size of multiplier

=>Combine Multiplier register and Product register?

Advanced ALU CS465 F0813

D. Barbara

Product (Multiplier)

Multiplicand

32-bit ALU

WriteControl

32 bits

64 bits

Shift Right

MULTIPLY Hardware Version 3 32-bit Multiplicand reg, 32 -bit ALU, 64-bit

Product reg, (0-bit Multiplier reg)

Advanced ALU CS465 F0814

D. BarbaraDone

Yes: 32 repetitions

2. Shift the Product register right 1 bit.

No: < 32 repetitions

1. TestProduct0

Product0 = 0Product0 = 1

1a. Add multiplicand to the left half of product & place the result in the left half of Product register

32nd repetition?

Start

Multiply Algorithm Version 3

Multiplicand Product0010 0000 0011

0010 00110010 0001 0001

0011 00010010 0001 10000010 0000 11000010 0000 0110

0010 x 0011

Advanced ALU CS465 F0815

D. Barbara

Multiply Version 3 Two steps per bit because Multiplier &

Product combined MIPS multiplication (unsigned)multu $t1, $t2 # t1 * t2 No destination register: product could be ~264;

need two special registers to hold it Registers Hi and Lo are left and right half of

Product To use the result:

mfhi $t3mflo $t4

Advanced ALU CS465 F0816

D. Barbara

Signed Multiplication What about signed multiplication?

Easiest solution is to make both positive & remember whether to complement product when done

Apply definition of 2’s complement Need to sign-extend partial products

Booth’s algorithm is an elegant way to multiply signed and unsigned numbers using the same hardware and save cycles

Can handle multiple bits at a time

Advanced ALU CS465 F0818

D. Barbara

Example 2 x 6 = 0010 x 0110: 0010

x 0110 + 0000 shift (0 in multiplier)

+ 0010 add (1 in multiplier)+ 0010 add (1 in multiplier)+ 0000 shift (0 in multiplier) 00001100

Basic idea: replace a string of 1s with an initial subtract on seeing a one and add after last one

0010 x 0110 0000 shift (0 in multiplier)

– 0010 sub (first 1 in multiplier) 0000 shift (mid string of 1s) + 0010 add (prior step had last 1) 00001100

Motivation for Booth’s Algorithm

Can get the same result in more than one way:6= – 2 + 8 0110 = – 0010 + 1000

0010 x (-2)

0010 x 8

Advanced ALU CS465 F0819

D. Barbara

0 1 1 1 1 0beginning of runend of run

middle of run

–1+ 1000001111

Booth’s Algorithm

Current Bit to Explanation Example Op bit right

1 0 Begins run of 1s 0001111000 sub 1 1 Middle of run of 1s 0001111000 none 0 1 End of run of 1s 0001111000 add 0 0 Middle of run of 0s 0001111000 none

Originally for speed when shift was faster than add Big advantage: it handles signed number

Why it works?

Advanced ALU CS465 F0820

D. Barbara

Booth’s Algorithm1. Depending on the current and previous

bits, do one of the following:00: Middle of a string of 0s, no arithmetic op01: End of a string of 1s, so add multiplicand to the left half of the product10: Beginning of a string of 1s, so subtract multiplicand from the left half of the product11: Middle of a string of 1s, so no arithmetic op

2. As in the previous algorithm, shift the Product register right (arithmetically) 1 bit

Advanced ALU CS465 F0821

D. Barbara

step Multiplicand Product operation

0. initial value 0010 0000 0111 0 10 -> sub

Booths Example (2 x 7)

1. P = P - m 1110 +1110

1110 0111 0 shift P(sign ext)

2. 0010 1111 0011 1 11 -> nop, shift

3. 0010 1111 1001 1 11 -> nop, shift

4. 0010 1111 1100 1 01 -> add

0010 +0010

0001 1100 1 shift

0010 0000 1110 0 done

Advanced ALU CS465 F0822

D. Barbara

step Multiplicand Product operation

0. initial value 0010 0000 1101 0 10 -> sub

Booths Example (2 x -3)

1. P = P - m 1110 +1110 1110 1101 0 shift P(sign

ext)2. 0010 1111 0110 1 01 -> add

0010 + 0010 0001 0110 1 shift P

3. 0010 0000 1011 0 10 -> sub 1110 +1110

0010 1110 1011 0 shift4. 0010 1111 0101 1 11 -> nop

1111 0101 1 shift 0010 1111 1010 1 done

Advanced ALU CS465 F0823

D. Barbara

Outline Implementation of MIPS ALU

Signed and unsigned numbers (Section 3.2) Addition and subtraction (Section 3.3) Constructing an arithmetic logic unit (Appendix

B.5, B.6) Multiplication (Section 3.4)

Booth’s algorithm: CD: 3.23 In More Depth Division (Section 3.5) Floating point (Section 3.6)

Advanced ALU CS465 F0824

D. Barbara

1001 QuotientDivisor 1000 1001010 Dividend

–1000 10 101 1010 –1000 10 Remainder (or Modulo result)

See how big a number can be subtracted, creating quotient bit on each step Binary => 1 * divisor or 0 * divisor

Dividend = Quotient x Divisor + Remainder 3 versions of divide, successive refinement

DIVIDE: Paper & Pencil

Advanced ALU CS465 F0825

D. Barbara

Remainder

Quotient

Divisor

64-bit ALU

Shift Right

Shift Left

WriteControl

32 bits

64 bits

64 bits

DIVIDE Hardware Version 1 64-bit Divisor reg, 64-bit ALU, 64-bit

Remainder reg, 32-bit Quotient reg

Advanced ALU CS465 F0826

D. Barbara

2b. Restore the original value by adding the Divisor register to the Remainder register, &place the sum in the Remainder register. Alsoshift the Quotient register to the left, setting the new least significant bit to 0.

Test Remainder

Remainder < 0Remainder 0

1. Subtract the Divisor register from the Remainder register, and place the result in the Remainder register.

2a. Shift the Quotient register to the left, setting the new rightmost bit to 1.

3. Shift the Divisor register right 1 bit.

Done

Yes: n+1 repetitions (n = 4 here)

Start: Place Dividend in Remainder

n+1repetition?

No: < n+1 repetitions

Divide Algorithm Version 1 Takes n+1 iterations for n-

bit Quotient & RemainderQuot. Divisor Rem.

0000 00100000 00000111 11100111 000001110000 00010000 00000111 11110111 000001110000 00001000 00000111 11111111 000001110000 00000100 00000111 000000110001 000000110001 00000010 00000011 000000010011 000000010011 00000001 00000001

Songqing
the first time is always a negative and restore

Advanced ALU CS465 F0827

D. Barbara

Observations on Divide Version 1 1st step cannot produce a 1 in quotient bit

(otherwise quotient is too big for the register) => switch order to shift first and then subtract => save 1 iteration

1/2 bits in divisor always 0 => 1/2 of 64-bit adder is wasted => 1/2 of divisor is wasted

Instead of shifting divisor to right, shift remainder to left?

Advanced ALU CS465 F0828

D. Barbara

Remainder

Quotient

Divisor

32-bit ALU

Shift Left

WriteControl

32 bits

32 bits

64 bits

Shift Left

DIVIDE Hardware Version 2 32-bit Divisor reg, 32-bit ALU, 64-bit

Remainder reg, 32-bit Quotient reg

Advanced ALU CS465 F0829

D. Barbara

3b. Restore the original value by adding the Divisor register to the left half of the Remainder register, &place the sum in the left half of the Remainder register. Also shift the Quotient register to the left, setting the new least significant bit to 0.

Test Remainder

Remainder < 0Remainder 0

2. Subtract the Divisor register from the left half of the Remainder register, & place the result in the left half of the Remainder register.

3a. Shift the Quotient register to the left setting the new rightmost bit to 1.

1. Shift the Remainder register left 1 bit.

Done

Yes: n repetitions

nthrepetition?

No: < n repetitions

Start: Place Dividend in Remainder

Divide Algorithm Version 2

Advanced ALU CS465 F0830

D. Barbara

Observations on Divide Version 2 Remainder shrinks while Quotient grows Eliminate Quotient register by combining

with Remainder register Start by shifting the Remainder left as before Only shifting once for both Remainder(left half)

and Quotient(right half): 2 steps each iteration Another consequence is that the remainder

will be shifted left one more time Thus the final correction step must shift back only

the remainder in the left half of the register

Advanced ALU CS465 F0831

D. Barbara

Remainder (Quotient)

Divisor

32-bit ALU

WriteControl

32 bits

64 bits

Shift Left“HI” “LO”

DIVIDE HARDWARE Version 3 32-bit Divisor reg, 32 -bit ALU, 64-bit

Remainder reg, (0-bit Quotient reg)

Advanced ALU CS465 F0832

D. Barbara

3b. Restore the original value by adding the Divisor register to the left half of the Remainder register, &place the sum in the left half of the Remainder register. Also shift the Remainder register to the left, setting the new least significant bit to 0.

Test Remainder

Remainder < 0Remainder ≥ 0

2. Subtract the Divisor register from the left half of the Remainder register, & place the result in the left half of the Remainder register.

3a. Shift the Remainder register to the left setting the new rightmost bit to 1.

1. Shift the Remainder register left 1 bit.

Done. Shift left half of Remainder right 1 bit

Yes: n repetitions (n = 4 here)

nthrepetition?

No: < n repetitions

Start: Place Dividend in Remainder

Divide Algorithm Version 3Step Remainder Div.0 0000 0111 00101.1 0000 11101.2 1110 11101.3b 0001 11002.2 1111 11002.3b 0011 10003.2 0001 10003.3a 0011 0001

4.2 0001 00014.3a 0010 0011 0001 0011

Advanced ALU CS465 F0833

D. Barbara

More Division Issues Signed divides: simplest is to remember signs,

make positive, and complement quotient and remainder if necessary Satisfy Dividend =Quotient x Divisor + Remainder Note: Dividend and Remainder must have same sign Note: Quotient negated if Divisor sign & Dividend sign

disagree, e.g., –7 ÷ 2 = –3, remainder = –1 Quotient = 4 and remainder =1 is not correct

Possible for quotient to be too large If divide 64-bit integer by 1, quotient is 64 bits (“called

saturation”) MIPS ignores overflow, SW detection and processing

Advanced ALU CS465 F0834

D. Barbara

Recap Same hardware for Divide as Multiply: just need

ALU to add or subtract, and 64-bit register to shift left or shift right Hi and Lo registers in MIPS combine to act as 64-bit

register for multiply and divide div $s2, $s3 #Lo=$S2/$s3, Hi=$s2 mod $s3

Successive refinement to see final design Booth’s algorithm to handle signed multiplies

Overflow Both MIPS multiply and divide instructions ignore

overflow Software must check for overflow scenarios

Advanced ALU CS465 F0836

D. Barbara

Outline Implementation of MIPS ALU

Signed and unsigned numbers (Section 3.2) Addition and subtraction (Section 3.3) Constructing an arithmetic logic unit (Appendix

B.5, B.6) Multiplication (Section 3.4, CD: 3.23 In More

Depth) Division (Section 3.5) Floating point (Section 3.6)

Advanced ALU CS465 F0837

D. Barbara

Floating-Point: Motivation What can be represented in N bits?

Unsigned 0 to 2N-1 2’s Complement - 2N-1 to 2N-1 - 1 1’s Complement -2N-1+1to 2N-1-1 Binary Coded Decimal 0 to 10N/4 –1

But, what about? Very large numbers?

9,349,398,989,787,762,244,859,087,678 Very small number?

0.0000000000000000000000045691 Rational numbers 2/3 Irrational numbers pi

Advanced ALU CS465 F0838

D. Barbara

1.01 x 223

exponent

radix (base)Significand(Mantissa)

binary point

Recall Scientific Notation: Binary

Normalized form: no leading 0s, exactly one digit to left of decimal/binary point

Alternatives to represent 1/1,000,000,000 Normalized: 1.0 x 10-9

Not normalized: 0.1 x 10-8, 10.0 x 10-10

Computer arithmetic that supports scientific notation numbers is called floating point, because the binary point is not fixed, as it is for integers

Advanced ALU CS465 F0839

D. Barbara

S represents signExponent represents y’sFraction represents x’s

Represent numbers as small as 2.0 x 10-38 to as large as 2.0 x 1038

Normalized format: 1.xxxxxxxxxxtwo 2yyyytwo

Want to put it into multiple words: 32 bits for single-precision and 64 bits for double-precision

A simple single-precision representation:031

S Exponent30 2322

Fraction

1 bit 8 bits 23 bits

FP Representation

Advanced ALU CS465 F0840

D. Barbara

Next multiple of word size (64 bits)

Double precision (vs. single precision) Represent numbers almost as small as

2.0 x 10-308 to almost as large as 2.0 x 10308 But primary advantage is greater accuracy

due to larger fraction

031S Exponent

30 20 19Fraction

1 bit 11 bits 20 bits

Fraction (cont’d)

32 bits

Double Precision Representation

Advanced ALU CS465 F0841

D. Barbara

Regarding single precision, DP similar Sign bit:

1 means negative0 means positive

Significand: To pack more bits, leading 1 implicit for normalized

numbers Significand = 1 + fraction: 1 + 23 bits single, 1 + 52

bits double Always true: 0 < fraction < 1

(for normalized numbers) Note: 0 has no leading 1, so reserve exponent

value 0 just for number 0

IEEE 754 Standard (1/4)

Advanced ALU CS465 F0842

D. Barbara

0 1111 1111000 0000 0000 0000 0000 00001/20 0000 0001000 0000 0000 0000 0000 00002

Exponent: Need to represent positive and negative exponents Also want to compare FP numbers as if they were integers,

to help in value comparisons If use 2’s complement to represent?

e.g., 1.0 x 2-1 versus 1.0 x2+1 (1/2 versus 2)

IEEE 754 Standard (2/4)

If we use integer comparison for these two words, we will conclude that 1/2 > 2!!!

Advanced ALU CS465 F0843

D. Barbara

Biased (Excess) Notation Remapping: shift all values by subtracting a bias from the

unsigned representation Biased 7

•All-zero pattern is the smallest; all-one pattern is the largest •Now integer comparison can be used directly to compare exponents

Bit-pattern biased-7 value

0000 -70001 -60010 -50011 -40100 -30101 -20110 -10111 0

Bit-pattern biased-7 value

1000 11001 21010 31011 41100 51101 61110 71111 8

Advanced ALU CS465 F0844

D. Barbara

1/2 0 0111 1110000 0000 0000 0000 0000 00000 1000 0000000 0000 0000 0000 0000 00002

Use biased notation for exponents, where bias is the number subtracted to get the real number IEEE 754 uses bias of 127 for single precision:

subtract 127 from Exponent field to get actual value for exponent

1023 is the bias for double precision

All-zero and all-one exponent patterns are reserved for special numbers (Fig 3.15)

IEEE 754 Standard (3/4)

Advanced ALU CS465 F0845

D. Barbara

Summary (single precision):

(-1)S x (1.Fraction) x 2(Exponent-127)

Double precision identical, except longer fraction and exponent segments with exponent bias of 1023

031S Exponent30 2322

Fraction

1 bit 8 bits 23 bits

IEEE 754 Standard (4/4)

Advanced ALU CS465 F0846

D. Barbara

0 0110 1000101 0101 0100 0011 0100 0010

Sign: 0 => positive Exponent:

0110 1000two = 104ten

Bias adjustment: 104 - 127 = -23 Fraction:

1+2-1+2-3 +2-5 +2-7 +2-9 +2-14 +2-15 +2-17 +2-22

= 1.0 + 0.666115

Represents: 1.666115ten2-23 1.986 10-7

Example: FP to Decimal

Advanced ALU CS465 F0847

D. Barbara

1 0111 1110100 0000 0000 0000 0000 0000

Example 1: Decimal to FP Number = - 0.75

= - 0.11two 20 (scientific notation)

= - 1.1two 2-1 (normalized scientific notation)

Sign: negative => 1 Exponent:

Bias adjustment: -1 +127 = 126 126ten = 0111 1110two

Advanced ALU CS465 F0848

D. Barbara

A more difficult case: representing 1/3?= 0.33333…10 = 0.0101010101… 2 20

= 1.0101010101… 2 2-2

Sign: 0 Exponent = -2 + 127 = 12510=011111012

Fraction = 0101010101…

Example 2: Decimal to FP

0 0111 11010101 0101 0101 0101 0101 010

Advanced ALU CS465 F0849

D. Barbara

Special Numbers What have we defined so far? (single precision)

Exponent Fraction Object0 0 00 nonzero ???1-254 anything +/- floating-point255 0 ???255 nonzero ???

Range:1.0 2-126 1.8 10-38

What if result too small? (>0, < 1.8x10-38 => Underflow!)

(2 – 2-23) 2127 3.4 1038

What if result too large? (> 3.4x1038 => Overflow!)

Advanced ALU CS465 F0850

D. Barbara

Denormalized Numbers For single-precision f. p. (page 217)

0 is 00000000000000000000000000000000 Smallest positive normalized number is

1.0000 0000 0000 0000 0000 000 x 2-126

This is actually a pretty big gap... Gradual underflow: allow a number to degrade in

significance until it becomes 0 By getting rid of implicit leading 1 in front of the fraction...

we can represent a smaller number We denote a denormalized number by 0 exponent and a

non-zero fraction The smallest denormalized number is

0.0000 0000 0000 0000 0000 001 x 2-126 or 1.0 x 2-149

For double-precision f. p. Smallest denormalized number: 1.0 x 2-1074

Advanced ALU CS465 F0851

D. Barbara

Special Numbers What have we defined so far? (single

precision)

Exponent Fraction Object

0 0 0

0 nonzero denorm

1-254 anything +/- floating-point

255 0 ???

255 nonzero ???

Advanced ALU CS465 F0852

D. Barbara

Representation for +/- Infinity In FP, divide by zero should produce +/-

infinity, not overflow Why?

OK to do further computations with infinity, e.g., X/0 > Y may be a valid comparison

IEEE 754 represents +/- infinity by all-one exponent and all-zero fraction

S 1111 11110000 0000 0000 0000 0000 000

Advanced ALU CS465 F0853

D. Barbara

Representation for Not a Number What do I get if I calculate sqrt(-4.0) or 0/0?

If infinity is not an error, these should not be either They are called Not a Number (NaN) IEEE 754: all-one Exponent (255), nonzero Fraction

Why is this useful? A symbol for invalid operations: allow programmers to

postpone some tests and decisions to later time They contaminate: op(NaN,X) = NaN

Advanced ALU CS465 F0854

D. Barbara

Special Numbers What have we defined so far? (single-precision)

Exponent Fraction Object

0 0 0

0 nonzero denom

1-254 anything +/- fl. pt. #

255 0 +/- infinity

255 nonzero NaN

Fig. 3.15

Advanced ALU CS465 F0855

D. Barbara

FP Operations Complexities Operations are somewhat more complicated In addition to overflow we can have “underflow”

Absolute value too small Accuracy can be a big problem

IEEE 754 keeps two extra bits, guard and round Four rounding modes Special cases:

Positive divided by zero yields “infinity” Zero divide by zero yields “not a number”

Implementing the standard can be tricky Not using the standard can be even worse

See text for description of 80x86 and Pentium bug!

Advanced ALU CS465 F0856

D. Barbara

(1) We'll need to convert from fraction to significand(2) We'll need to shift the smaller number so that we have the same exponent:num of shifts= Larger Number's exponent - Smaller Number's exponent

-- right shift smaller number that many positions(3) Add or subtract the two numbers(4) Normalize: a. left shift result, decrement result exponent (e.g. 0.0001) b. right shift result, increment result exponent (e.g. 101.1)

continue until MSB of data is 1 (will not be stored)(5) Check for overflow or underflow(6) If the result is a 0 significand, set the exponent to zero by special step

Basic Addition Algorithm

Advanced ALU CS465 F0857

D. Barbara

Example: Add 9.999 101 and 1.610 10-1 assuming 4 decimal digits

Floating Point Addition Example

Align decimal point of number with smaller exponent 1.610 10-1 = 0.161 100 = 0.0161 101

Shift smaller number to right Add significands 9.999

+ 0.016 10.015 SUM = 10.015 101

NOTE: One digit of precision lost during shifting Shift sum to put it in normalized form 1.0015 102

Since significand only has 4 digits, we need to round the sum SUM = 1.002 102

NOTE: normalization maybe needed again after rounding, e.g, rounding 9.9999 you get 10.000

Advanced ALU CS465 F0858

D. BarbaraDone

2. Add the significands

4. Round the significand to the appropriatenumber of bits

Still normalized?

Start

Yes

No

No

YesOverflow orunderflow?

Exception

3. Normalize the sum, either shifting right andincrementing the exponent or shifting left

and decrementing the exponent

1. Compare the exponents of the two numbers.Shift the smaller number to the right until itsexponent would match the larger exponent

FP Addition Algorithm: Fig 3.16

Advanced ALU CS465 F0859

D. Barbara

0 10 1 0 1

Control

Small ALU

Big ALU

Sign Exponent Significand Sign Exponent Significand

Exponentdifference

Shift right

Shift left or right

Rounding hardware

Sign Exponent Significand

Increment ordecrement

0 10 1

Shift smallernumber right

Compareexponents

Add

Normalize

Round

Fig 3.17

Advanced ALU CS465 F0860

D. Barbara

Accurate Arithmetic Round accurately requires HW to include extra

bits in the calculation IEEE 754 standard specifies the use of 2 extra

bits on the right during intermediate calculations – Guard bit and Round bit

Example: Add 2.56 100 and 2.34 102, assuming 3 significant digits Shift the smaller number

2.56 100 = 0.0256 102(guard holds 5 and round holds 6) Without guard and round bits

2.34+ 0.02 = 2.36 102

With guard and round bits 2.34+ 0.0256 = 2.3656 102

ROUND 2.37102

Advanced ALU CS465 F0861

D. Barbara

Floating-Point Multiplication Algorithm: Z = X * Y

Product exponent = X's exponent + Y's exponent

Doubly-biased exponent must be corrected Product exponent = sum of stored exponents – bias

Xe = 7Ye = -3Bias 8

Multiply the significands: Zm = Xm x Ym Normalization, rounding and exception

checking (overflow and underflow) Determine the sign bit

Xe = 1111Ye = 0101 10100

= 15= 5 20

= 7 + 8= -3 + 8 4 + 8 + 8

Advanced ALU CS465 F0862

D. Barbara

MIPS Floating-Point Instructions Floating-point instructions

Computation: add.s, add.d, sub.s, sub.d, mul.s, mul.d, div.s, div.d

Comparison: c.x.s, c.x.d (x=eq, neq, lt, le, gt, ge) FP conditional branch: bc1t, bc1f Floating-point data transfer: lwc1, swc1, ldc1, sdc1

32 floating-point registers: $f0, …, $f31 Double precision: by convention, even/odd pair

contain one DP FP number: $f0/$f1, $f2/$f3 See page 206 for details

Advanced ALU CS465 F0863

D. Barbara

Summary Multiplication

Refined hardware and algorithms Booth’s algorithm

Division Similar hardware as multiplication

Floating point Representing numbers Addition and multiplication Precision is a big issue

Advanced ALU CS465 F0864

D. Barbara

Next Lecture Topic:

Assessing and understanding performance Reading

Ch4 Patterson and Hennessy