Floating Point Arithmetic 2

Page 1: Floating Point Arithmetic 2

8/2/2019 Floating Point Arithmetic 2

http://slidepdf.com/reader/full/floating-point-arithmetic-2 1/23

 

A method of representing real numbers that can support a wide range of values. A typical number that can be represented exactly is of the form:

significant digits × base^exponent

The term floating point refers to the fact that the radix point can "float", i.e., it can be placed anywhere relative to the significant digits of the number.

Floating point numbers approximate real numbers

Floating point numbers have a large dynamic range
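
A minimal Python sketch of both points. It assumes that Python's float is an IEEE 754 double, which holds on mainstream platforms; the variable names are illustrative only.

```python
import sys

# 0.1, 0.2 and 0.3 have no exact binary representation,
# so the computed sum is only close to 0.3, not equal to it.
approx_sum = 0.1 + 0.2
is_exact = (approx_sum == 0.3)

# Dynamic range: largest finite double and smallest positive normalized double.
largest = sys.float_info.max
smallest_normal = sys.float_info.min

print(approx_sum, is_exact)
print(largest, smallest_normal)
```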

The IEEE has produced a standard, IEEE 754, for floating point arithmetic. This standard specifies how single precision (32 bit) and double precision (64 bit) floating point numbers are to be represented, as well as how arithmetic should be carried out on them.

The IEEE 754 standard specifies a binary32 as having:

• Sign bit: 1 bit
• Exponent width: 8 bits
• Significand precision: 24 bits (23 explicitly stored)

The base is 2.

The sign bit determines the sign of the number, which is the sign of the significand as well.

Sign bit = 0 if the number is positive
Sign bit = 1 if the number is negative

The exponent field needs to represent both positive and negative exponents. To do this, a bias of 127 is added to the actual exponent in order to get the stored exponent.

Thus, an exponent of zero means that 127 is stored in the exponent field. A stored value of 200 indicates an exponent of (200 - 127), or 73.

Stored values of 0 (all 0s) and 255 (all 1s), which would otherwise correspond to exponents of -127 and +128, are reserved for special numbers.
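
The bias rule can be sketched in Python as follows. The helper names `stored_exponent` and `actual_exponent` are hypothetical, chosen for this illustration, and only normal (non-reserved) exponents are handled.

```python
BIAS = 127  # exponent bias of the single precision format

def stored_exponent(actual):
    """Return the 8-bit field value for an actual (unbiased) exponent."""
    if not -126 <= actual <= 127:
        raise ValueError("outside the normal single precision exponent range")
    return actual + BIAS

def actual_exponent(stored):
    """Return the power of 2 encoded by a stored exponent field."""
    if not 1 <= stored <= 254:
        raise ValueError("stored values 0 and 255 are reserved for special numbers")
    return stored - BIAS

print(stored_exponent(0))    # an exponent of zero is stored as 127
print(actual_exponent(200))  # a stored 200 means 200 - 127 = 73
```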

The significand is also known as the 'mantissa'.

The true significand includes 23 fraction bits to the right of the binary point and an implicit leading bit with value 1, unless the exponent is stored with all zeros. Thus only 23 fraction bits of the significand appear in the memory format, but the total precision is 24 bits.

The bits are laid out as follows:

bit:    31     30 ... 23    22 ... 0
field:  sign   exponent     significand
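
As an illustration of this layout, the three fields can be pulled out of a single precision value with shifts and masks. This is a sketch using Python's standard `struct` module; the function name `float32_fields` is an assumption made for the example.

```python
import struct

def float32_fields(x):
    """Split x, encoded as single precision, into (sign, exponent, fraction)."""
    # Pack as a big-endian 32-bit float, then reread the same bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # bit 31
    exponent = (bits >> 23) & 0xFF   # bits 30..23, still biased
    fraction = bits & 0x7FFFFF       # bits 22..0
    return sign, exponent, fraction

print(float32_fields(1.0))
print(float32_fields(-2.0))
```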

The value v of the number represented in single precision format, with sign bit s, stored exponent e and fraction f, is as follows:

(a) If e = 255 and f ≠ 0, then v = NaN.
(b) If e = 255 and f = 0, then v = (-1)^s ∞.
(c) If 0 < e < 255, then v = (-1)^s 2^(e-127) (1.f).
(d) If e = 0 and f ≠ 0, then v = (-1)^s 2^(-126) (0.f).
(e) If e = 0 and f = 0, then v = (-1)^s 0 (zero).
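
The five cases can be sketched as a small decoder. This assumes the standard IEEE 754 reading of the cases (NaN when e = 255 with a nonzero fraction, infinity when e = 255 with a zero fraction, and a subnormal or zero when e = 0); `decode_binary32` is a hypothetical name, and f is passed as the 23-bit integer field.

```python
import math

def decode_binary32(s, e, f):
    """Return the value encoded by sign s, stored exponent e, fraction field f."""
    sign = -1.0 if s else 1.0
    if e == 255:
        # Cases (a) and (b): NaN for a nonzero fraction, otherwise infinity.
        return math.nan if f != 0 else sign * math.inf
    if e == 0:
        # Cases (d) and (e): no implicit leading 1, fixed exponent of -126.
        return sign * (f / 2**23) * 2.0**-126
    # Case (c): normalized number with an implicit leading 1.
    return sign * (1 + f / 2**23) * 2.0**(e - 127)

print(decode_binary32(0, 127, 0))
print(decode_binary32(1, 255, 0))
```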

In order to maximize the quantity of representable numbers, floating-point numbers are typically stored in normalized form. This basically puts the radix point after the first non-zero digit. In normalized form, five is represented as 5.0 × 10^0.

A nice little optimization is available to us in base two, since the only possible non-zero digit is 1. Thus, we can just assume a leading digit of 1, and don't need to represent it explicitly. As a result, the mantissa has effectively 24 bits of resolution, by way of 23 fraction bits.

The storage format of double precision is as shown:

• Sign bit: 1 bit
• Exponent width: 11 bits
• Significand precision: 53 bits (52 explicitly stored)

The bias for the exponent is 1023.

bit:    63     62 ... 52    51 ... 0
field:  sign   exponent     significand
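
The same field extraction works for double precision with wider masks. A sketch using Python's standard `struct` module; `float64_fields` is an illustrative name, not part of any library.

```python
import struct

def float64_fields(x):
    """Split a double precision value into (sign, exponent, fraction)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                  # bit 63
    exponent = (bits >> 52) & 0x7FF    # bits 62..52, biased by 1023
    fraction = bits & ((1 << 52) - 1)  # bits 51..0
    return sign, exponent, fraction

print(float64_fields(1.0))
print(float64_fields(-0.5))
```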

Convert the following single-precision IEEE 754 number into a floating-point decimal value.

1 10000001 10110011001100110011010

First, put the bits in three groups.

Bit 31 (the leftmost bit) shows the sign of the number.
Bits 23-30 (the next 8 bits) are the exponent.
Bits 0-22 (on the right) give the fraction.

Now, look at the sign bit.

If this bit is a 1, the number is negative, otherwise positive. Here this bit is 1, so the number is negative.

Get the exponent and the correct bias. The exponent is simply a positive binary number.

10000001_2 = 129_10

Remember that we will have to subtract a bias from this exponent to find the power of 2. Since this is a single-precision number, the bias is 127, so the power of 2 is 129 - 127 = 2.

Convert the fraction string into base ten. This is the trickiest step. The binary string represents a fraction, so conversion is a little different. Binary fractions look like this:

0.1_2 = 1/2 = 2^-1
0.01_2 = 1/4 = 2^-2
0.001_2 = 1/8 = 2^-3

So, for this example, we multiply each digit by the corresponding power of 2:

0.10110011001100110011010_2 = 1*2^-1 + 0*2^-2 + 1*2^-3 + 1*2^-4 + 0*2^-5 + 0*2^-6 + ...
0.10110011001100110011010_2 = 1/2 + 1/8 + 1/16 + ...

Note that this number is just an approximation of some decimal number. There will most likely be some error. In this case, the fraction is about 0.7000000476837158.

This is all the information we need. We can put these numbers in the expression:

(-1)^(sign bit) * (1 + fraction) * 2^(exponent - bias)
= (-1)^1 * (1.7000000476837158) * 2^(129 - 127)
≈ -6.8

The answer is approximately -6.8.
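
The result can be checked by handing the same 32 bits to Python's standard `struct` module. This is a quick sketch for verification; the variable names are illustrative.

```python
import struct

# Sign, exponent and fraction groups of the example, concatenated.
bit_string = "1" + "10000001" + "10110011001100110011010"

# Turn the 32 bits into 4 bytes and reinterpret them as a single precision float.
raw = int(bit_string, 2).to_bytes(4, "big")
value = struct.unpack(">f", raw)[0]

print(value)  # close to, but not exactly, -6.8
```

Packing -6.8 with `struct.pack(">f", -6.8)` gives the same four bytes, so this bit pattern is the single precision value nearest to -6.8.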

Convert 0.1015625 to IEEE 32-bit floating point format.

Converting the fraction to binary by repeated doubling:

0.1015625 × 2 = 0.203125   0   Generate 0 and continue.
0.203125 × 2 = 0.40625     0   Generate 0 and continue.
0.40625 × 2 = 0.8125       0   Generate 0 and continue.
0.8125 × 2 = 1.625         1   Generate 1 and continue with the rest.
0.625 × 2 = 1.25           1   Generate 1 and continue with the rest.
0.25 × 2 = 0.5             0   Generate 0 and continue.
0.5 × 2 = 1.0              1   Generate 1 and nothing remains.

So 0.1015625_10 = 0.0001101_2.
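
The repeated doubling can be sketched as a short loop. It uses exact rational arithmetic from Python's `fractions` module so that no rounding interferes; the name `fraction_to_binary` and the `max_bits` cutoff are assumptions made for this illustration.

```python
from fractions import Fraction

def fraction_to_binary(x, max_bits=32):
    """Return the binary digits after the point of 0 <= x < 1, as a string."""
    x = Fraction(x)
    digits = []
    while x and len(digits) < max_bits:
        x *= 2
        bit = int(x >= 1)      # the integer part is the next binary digit
        digits.append(str(bit))
        x -= bit               # continue with the rest
    return "".join(digits)

print(fraction_to_binary("0.1015625"))  # 0001101
```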

Normalize: 0.0001101_2 = 1.101_2 × 2^-4.

Mantissa is 10100000000000000000000, exponent is -4 + 127 = 123 = 01111011_2, sign bit is 0.

So 0.1015625 is

00111101110100000000000000000000
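
The normalize-and-assemble steps can be sketched in Python. This is a simplified illustration, not a full encoder: `encode_binary32` is a hypothetical helper that only accepts positive values that fit exactly in 23 fraction bits and fall in the normal exponent range.

```python
from fractions import Fraction

def encode_binary32(x):
    """Return the 32-bit pattern of a positive, exactly representable value."""
    x = Fraction(x)
    if x <= 0:
        raise ValueError("this sketch only handles positive values")
    exponent = 0
    while x >= 2:              # normalize so that 1 <= x < 2
        x /= 2
        exponent += 1
    while x < 1:
        x *= 2
        exponent -= 1
    if not -126 <= exponent <= 127:
        raise ValueError("exponent outside the normal range")
    fraction = (x - 1) * 2**23  # drop the implicit leading 1
    if fraction.denominator != 1:
        raise ValueError("value needs rounding, which this sketch omits")
    return "0" + format(exponent + 127, "08b") + format(int(fraction), "023b")

print(encode_binary32("0.1015625"))
```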

Binary Fractional Numbers
• "Even" when least significant bit is 0
• Half way when bits to right of rounding position = 100..._2

Examples
• Round to nearest 1/4 (2 bits right of binary point)

Value    Binary        Rounded    Action          Rounded Value
2 3/32   10.00011_2    10.00_2    (< 1/2: down)   2
2 3/16   10.00110_2    10.01_2    (> 1/2: up)     2 1/4
2 7/8    10.11100_2    11.00_2    (1/2: up)       3
2 5/8    10.10100_2    10.10_2    (1/2: down)     2 1/2
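
The rounding rule in these examples (nearest value, with halfway cases going to the result whose last kept bit is 0) can be sketched with exact fractions. The function name `round_half_even` and its `step` parameter are assumptions for this illustration.

```python
import math
from fractions import Fraction

def round_half_even(value, step):
    """Round value to the nearest multiple of step; ties go to the even multiple."""
    q = Fraction(value) / Fraction(step)
    lower = math.floor(q)
    remainder = q - lower
    half = Fraction(1, 2)
    # Round up when above half way, or exactly half way with an odd lower multiple.
    if remainder > half or (remainder == half and lower % 2 == 1):
        lower += 1
    return lower * Fraction(step)

quarter = Fraction(1, 4)
for v in (Fraction(67, 32), Fraction(35, 16), Fraction(23, 8), Fraction(21, 8)):
    print(v, "->", round_half_even(v, quarter))
```

An even multiple of the step corresponds to a 0 in the last kept bit, which is why 2 7/8 rounds up to 3 while 2 5/8 rounds down to 2 1/2.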

Operands
(-1)^s1 M1 2^E1
(-1)^s2 M2 2^E2
Assume E1 > E2

Exact Result
(-1)^s M 2^E
• Sign s, significand M: result of signed align & add
• Exponent E: E1

Fixing
• If M ≥ 2, shift M right, increment E
• If M < 1, shift M left k positions, decrement E by k
• Overflow if E out of range
• Round M to fit frac precision

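[Figure: alignment diagram. (-1)^s2 M2 is offset by E1-E2 bit positions relative to (-1)^s1 M1, and the aligned values are added to give (-1)^s M.]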
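
The align, add and fix steps can be sketched in Python with exact fractions. This is an illustrative model rather than how hardware does it: operands are hypothetical (s, M, E) tuples with 1 ≤ M < 2, an exact zero result is returned as (0, 0, 0) by convention, and underflow is not handled.

```python
from fractions import Fraction

def fp_add(a, b, frac_bits=23, e_max=127):
    """Add two (s, M, E) operands, each meaning (-1)^s * M * 2^E."""
    (s1, m1, e1), (s2, m2, e2) = a, b
    if e1 < e2:  # make sure the first operand has the larger exponent
        (s1, m1, e1), (s2, m2, e2) = (s2, m2, e2), (s1, m1, e1)
    # Signed align and add: shift M2 right by E1 - E2, then add with signs.
    total = (-1) ** s1 * Fraction(m1) + (-1) ** s2 * Fraction(m2) / 2 ** (e1 - e2)
    if total == 0:
        return (0, Fraction(0), 0)
    s, m, e = int(total < 0), abs(total), e1
    while m >= 2:  # fixing: shift right, increment E
        m /= 2
        e += 1
    while m < 1:   # fixing: shift left, decrement E
        m *= 2
        e -= 1
    # Round M to frac_bits fraction bits (round() on a Fraction rounds ties to even).
    m = Fraction(round(m * 2 ** frac_bits), 2 ** frac_bits)
    if m >= 2:     # rounding may carry out to 2.0
        m /= 2
        e += 1
    if e > e_max:
        raise OverflowError("exponent out of range")
    return (s, m, e)

print(fp_add((0, Fraction(3, 2), 0), (0, Fraction(3, 2), 0)))  # 1.5 + 1.5
print(fp_add((0, Fraction(1), 3), (1, Fraction(15, 8), 2)))    # 8 - 7.5
```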

  3.25 x 10 ** 3
+ 2.63 x 10 ** -1
-----------------

First step: align decimal points
Second step: add

  3.25     x 10 ** 3
+ 0.000263 x 10 ** 3
--------------------
= 3.250263 x 10 ** 3
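
The same two steps can be checked with Python's standard `decimal` module. A small sketch; `align_and_add` is a hypothetical helper that takes each operand as a (coefficient, exponent) pair and assumes the first exponent is the larger one.

```python
from decimal import Decimal

def align_and_add(c1, e1, c2, e2):
    """Add c1 x 10**e1 and c2 x 10**e2 (with e1 >= e2); return (coefficient, exponent)."""
    # First step: rescale the second coefficient to the larger exponent.
    shifted = c2 * Decimal(10) ** (e2 - e1)
    # Second step: add the aligned coefficients.
    return c1 + shifted, e1

coefficient, exponent = align_and_add(Decimal("3.25"), 3, Decimal("2.63"), -1)
print(coefficient, "x 10 **", exponent)
```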