Faculty of Computer Science
CMPUT 229 © 2006
Floating Point Representation
Operating with Real Numbers
© 2006
Department of Computing Science
CMPUT 229
Reading Material
This set of slides is based on the texts by Patt and
Patel and by Patterson and Hennessy.
The topics covered in these slides are presented in
Section 4.9 of Clements’ textbook.
Representing Large and Small Numbers
How would you represent a number such as $6.023 \times 10^{23}$ in binary?
The range ($10^{23}$) of this number is greater than the range of the 32-bit representation that we have used for integers ($2^{31} \approx 2.15 \times 10^{9}$).
However, the precision (6023) of this number is quite small, and can be expressed in a small number of bits.
The solution is to use a floating point representation.
A floating point representation allocates some bits for the range of the value, some bits for the precision, and one bit for the sign.
Patt/Patel, pp. 32
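The claim above can be checked with a small sketch. It assumes that Python's `struct` format code `f` stores a value as a 32-bit IEEE 754 single precision number; the variable names are illustrative only.

```python
import struct

INT32_MAX = 2**31 - 1      # largest 32-bit two's-complement integer
value = 6.023e23

# The value is far outside the range of a 32-bit integer.
exceeds_int32 = value > INT32_MAX

# Round-trip through a 32-bit float: pack into 4 bytes, then unpack again.
packed = struct.pack(">f", value)
stored = struct.unpack(">f", packed)[0]

# The stored value is not exact, but the relative error is tiny because
# only a few significant digits are needed.
relative_error = abs(stored - value) / value

print(exceeds_int32)
print(len(packed))
print(relative_error)
```

The same 32 bits that cannot hold this value as an integer can hold it as a floating point number, at the cost of a small rounding error.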
Floating Point Representation
Most standard floating point representations use:
1 bit for the sign (positive or negative)
8 bits for the range (exponent field)
23 bits for the precision (fraction field)
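Field layout: S (1 bit) | exponent (8 bits) | fraction (23 bits)

$$
N = \begin{cases}
(-1)^{S} \times 1.\mathit{fraction} \times 2^{\mathit{exponent} - 127}, & 1 \le \mathit{exponent} \le 254 \\
(-1)^{S} \times 0.\mathit{fraction} \times 2^{-126}, & \mathit{exponent} = 0
\end{cases}
$$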
Patt/Patel, pp. 33
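The two cases of the formula can be written out as a minimal sketch. The function name `decode_single` is hypothetical, and the bit pattern is assumed to be passed as a Python integer. Patterns with exponent 255 are outside the formula and are rejected here.

```python
def decode_single(bits):
    """Decode a 32-bit pattern (an int) with the two-case formula above."""
    sign = (bits >> 31) & 0x1                # 1 bit
    exponent = (bits >> 23) & 0xFF           # 8 bits
    fraction = (bits & 0x7FFFFF) / 2**23     # 23 bits, read as 0.fraction

    if exponent == 255:
        raise ValueError("exponent 255 is not covered by the formula")
    if exponent == 0:
        # Second case: no implicit leading 1, fixed exponent of -126.
        return (-1) ** sign * fraction * 2.0 ** -126
    # First case: implicit leading 1, exponent biased by 127.
    return (-1) ** sign * (1 + fraction) * 2.0 ** (exponent - 127)


# Underscores only separate the sign, exponent and fraction fields.
print(decode_single(0b0_01111011_00000000000000000000000))  # prints 0.0625
```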
Floating Point Representation (example)
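Example: How is the number $-6\frac{5}{8}$ represented in floating point?
In binary, $-6\frac{5}{8} = -110.101_2 = -1.10101_2 \times 2^{2}$, so the sign bit is 1 and the fraction field starts with 10101.
Thus the exponent is given by:
$\mathit{exponent} - 127 = 2 \Rightarrow \mathit{exponent} = 129 = 10000001_2$
1 10000001 10101000000000000000000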
Patt and Patel, pp. 34
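The encoding steps of this example can be sketched as follows. The helper `encode_single` is hypothetical and only handles nonzero values that are exactly representable as normalized single precision numbers; it relies on the standard `math.frexp` to find the power of two.

```python
import math


def encode_single(value):
    """Return the (sign, exponent, fraction) fields for a nonzero value."""
    sign = 1 if value < 0 else 0
    # frexp gives abs(value) = mantissa * 2**exp2 with 0.5 <= mantissa < 1.
    mantissa, exp2 = math.frexp(abs(value))
    significand = mantissa * 2           # rescale to the 1.fraction form
    actual_exponent = exp2 - 1
    exponent = actual_exponent + 127     # biased exponent field
    fraction = int((significand - 1) * 2**23)   # 23 fraction bits
    return sign, exponent, fraction


sign, exponent, fraction = encode_single(-6.625)
print(f"{sign} {exponent:08b} {fraction:023b}")
# prints 1 10000001 10101000000000000000000
```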
Floating Point Representation (example)
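What is the decimal value of the following floating point number?
00111101100000000000000000000000
The fields are S = 0, exponent = 01111011, fraction = 00000000000000000000000.
exponent = 64+32+16+8+2+1 = (128-8)+3 = 120+3 = 123
$N = (-1)^{0} \times 1.0 \times 2^{123-127} = 1.0 \times 2^{-4} = \frac{1}{16}$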
Patt and Patel, pp. 34
Floating Point Representation (example)
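What is the decimal value of the following floating point number?
01000001100101000000000000000000
The fields are S = 0, exponent = 10000011, fraction = 00101000000000000000000.
exponent = 128+2+1 = 131
$N = (-1)^{0} \times 1.00101_2 \times 2^{131-127} = 1.00101_2 \times 2^{4} = 10010.1_2$
$N = 2^{4} + 2^{1} + 2^{-1} = 16 + 2 + \frac{1}{2} = 18.5$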
Patt and Patel, pp. 35
Floating Point Representation (example)
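What is the decimal value of the following floating point number?
11000001000101000000000000000000
The fields are S = 1, exponent = 10000010, fraction = 00101000000000000000000.
exponent = 128+2 = 130
$N = (-1)^{1} \times 1.00101_2 \times 2^{130-127} = -1.00101_2 \times 2^{3} = -1001.01_2$
$N = -\left(2^{3} + 2^{0} + 2^{-2}\right) = -\left(8 + 1 + \frac{1}{4}\right) = -9.25$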
Patt and Patel, pp. 35
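The three decoding examples above can be cross-checked against the machine's own interpretation of the bits. This is a sketch that assumes Python's `struct` format code `f` reads 4 bytes as an IEEE 754 single precision number; `bits_to_float` is a hypothetical helper name.

```python
import struct


def bits_to_float(bit_string):
    """Interpret a string of 32 '0'/'1' characters as a single precision float."""
    raw = int(bit_string, 2).to_bytes(4, byteorder="big")
    return struct.unpack(">f", raw)[0]


patterns = (
    "00111101100000000000000000000000",
    "01000001100101000000000000000000",
    "11000001000101000000000000000000",
)
for pattern in patterns:
    print(pattern, bits_to_float(pattern))
```

The printed values should agree with the hand calculations: 1/16 = 0.0625, 18.5 and -9.25.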
Floating Point
What is the largest number that can be represented using a 32-bit floating point number in the IEEE 754 format above?
01111111011111111111111111111111
exponent = 254
$\mathit{fraction} = 1 \times 2^{-1} + 1 \times 2^{-2} + \cdots + 1 \times 2^{-22} + 1 \times 2^{-23}$
$\mathit{fraction} = 1 \times 2^{0} - 1 \times 2^{-23} = 1 - \frac{1}{2^{23}} = 1 - \frac{1}{8 \times 1024 \times 1024} = 0.99999988079$
Patt and Patel, pp. 35
actual exponent = 254 - 127 = 127, fraction = 0.99999988079
$N = (-1)^{0} \times 1.99999988079 \times 2^{127} \approx 2^{128} \approx 3.4 \times 10^{38}$
Patt and Patel, pp. 35
Floating Point
What is the smallest positive number (closest to zero) that can be represented in 32-bit floating point using the IEEE 754 format above?
00000000000000000000000000000001
exponent = 0, so the actual exponent is -126; $\mathit{fraction} = 1 \times 2^{-23}$
$N = (-1)^{0} \times 2^{-23} \times 2^{-126} = 2^{-149} \approx 1.4 \times 10^{-45}$
Patt and Patel, pp. 35
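Both extremes can be computed exactly and compared with the corresponding bit patterns. This is a sketch using Python's `fractions.Fraction` for exact arithmetic and assuming `struct` format code `f` is IEEE 754 single precision; the variable names are illustrative.

```python
import struct
from fractions import Fraction

# Largest: exponent field 254 (actual exponent 127), all 23 fraction bits set.
largest = (2 - Fraction(1, 2**23)) * 2**127

# Smallest positive: exponent field 0 (actual exponent -126), fraction 2**-23.
smallest = Fraction(1, 2**23) * Fraction(1, 2**126)

# The same two values read directly from their bit patterns.
largest_from_bits = struct.unpack(">f", (0x7F7FFFFF).to_bytes(4, "big"))[0]
smallest_from_bits = struct.unpack(">f", (0x00000001).to_bytes(4, "big"))[0]

print(float(largest), largest_from_bits)
print(float(smallest), smallest_from_bits)
```

Each line should print the same value twice, showing that the hand-derived values and the bit patterns agree.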
Special Floating Point Representations
In the 8-bit exponent field we can represent numbers from 0 to 255. We studied how to read numbers with exponents from 0 to 254. What is the value represented when the exponent is 255 (i.e. $11111111_2$)?
An exponent equal to 255 = $11111111_2$ in a floating point representation indicates a special value.
When the exponent is equal to 255 = $11111111_2$ and the fraction is 0, the value represented is infinity.
When the exponent is equal to 255 = $11111111_2$ and the fraction is non-zero, the value represented is Not a Number (NaN).
Hen/Patt, pp. 301
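The rule for exponent 255 can be sketched as a small classifier. The function name `classify` and the returned labels are hypothetical. The cross-check assumes `struct` format code `f` is IEEE 754 single precision.

```python
import math
import struct


def classify(bits):
    """Classify a 32-bit pattern (an int) by its exponent and fraction fields."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent != 255:
        return "ordinary"
    return "infinity" if fraction == 0 else "NaN"


def as_float(bits):
    """Let the machine interpret the same 32 bits."""
    return struct.unpack(">f", bits.to_bytes(4, "big"))[0]


# 0x7F800000: exponent 255, fraction 0.  0x7FC00000: exponent 255, fraction != 0.
print(classify(0x7F800000), math.isinf(as_float(0x7F800000)))
print(classify(0x7FC00000), math.isnan(as_float(0x7FC00000)))
```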
Double Precision
The 32-bit floating point representation is usually called single precision representation.
A double precision floating point representation requires 64 bits. In double precision the following numbers of bits are used:
1 sign bit
11 bits for the exponent
52 bits for the fraction (also called significand)
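The 1/11/52 split can be observed by pulling the fields out of a 64-bit pattern. This is a sketch assuming `struct` format code `d` is IEEE 754 double precision; `double_fields` is a hypothetical helper. The exponent bias of 1023 is not stated on the slide; it is the value defined by IEEE 754 for double precision.

```python
import struct


def double_fields(value):
    """Split a double into its (sign, exponent, fraction) bit fields."""
    bits = int.from_bytes(struct.pack(">d", value), "big")
    sign = bits >> 63                      # 1 bit
    exponent = (bits >> 52) & 0x7FF        # 11 bits
    fraction = bits & ((1 << 52) - 1)      # 52 bits
    return sign, exponent, fraction


sign, exponent, fraction = double_fields(-6.625)
# -6.625 = -1.10101 (binary) x 2**2, so the stored exponent is 2 + 1023.
print(sign, exponent, f"{fraction:052b}")
```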
Floating Point Addition (Decimal)
How do we perform the following addition?
$9.999_{10} \times 10^{1} + 1.610_{10} \times 10^{-1}$
Step 1: Align the decimal point of the number with the smaller exponent (notice the loss of precision):
$9.999_{10} \times 10^{1} + 0.016_{10} \times 10^{1}$
Step 2: Add the significands: $9.999_{10} \times 10^{1} + 0.016_{10} \times 10^{1} = 10.015_{10} \times 10^{1}$
Step 3: Renormalize the result: $10.015 \times 10^{1} = 1.0015 \times 10^{2}$
Step 4: Round off the result to the representation available: $1.0015 \times 10^{2} = 1.002 \times 10^{2}$
Hen/Patt, pp. 281
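The align, add, renormalize and round steps can be sketched with integer significands. The function `add_decimal_fp` is hypothetical, not a library routine. It assumes positive operands, four-digit significands stored as integers (9999 stands for 9.999), truncation during alignment, and round-half-up at the end.

```python
def add_decimal_fp(sig_a, exp_a, sig_b, exp_b, digits=4):
    """Add two positive decimal floating point numbers step by step."""
    # Keep the operand with the larger exponent in (sig_a, exp_a).
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a

    # Step 1: align by shifting the smaller number right (digits are lost).
    sig_b //= 10 ** (exp_a - exp_b)

    # Step 2: add the significands.
    total = sig_a + sig_b
    exp = exp_a

    # Steps 3 and 4: renormalize, rounding half up when a digit is dropped.
    limit = 10 ** digits
    while total >= limit:
        total = (total + 5) // 10
        exp += 1
    return total, exp


# 9.999 x 10**1 + 1.610 x 10**-1
print(add_decimal_fp(9999, 1, 1610, -1))  # prints (1002, 2), i.e. 1.002 x 10**2
```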
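Floating Point Addition (Example)
Convert the numbers $0.5_{10}$ and $-0.4375_{10}$ to floating point binary representation, and then perform the binary floating-point addition of these numbers.
$0.5_{10} = \frac{1}{2^{1}} = 0.1_2 = 0.1_2 \times 2^{0} = 1.000_2 \times 2^{-1}$
$-0.4375_{10} = -(0.25 + 0.125 + 0.0625)_{10} = -\left(2^{-2} + 2^{-3} + 2^{-4}\right) = -0.0111_2 \times 2^{0} = -1.110_2 \times 2^{-2}$
Which number should have its significand adjusted? The one with the smaller exponent, $-1.110_2 \times 2^{-2}$:
$1.000_2 \times 2^{-1} - 1.110_2 \times 2^{-2} = 1.000_2 \times 2^{-1} - 0.111_2 \times 2^{-1}$
Add the significands:
$1.000_2 \times 2^{-1} - 0.111_2 \times 2^{-1} = 0.001_2 \times 2^{-1}$
Normalize the result:
$0.001_2 \times 2^{-1} = 1.000_2 \times 2^{-4}$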
Hen/Patt, pp. 283
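The binary addition in this example can be sketched in the same step-by-step way. The function `add_binary_fp` is hypothetical. It assumes signed integer significands with one integer bit and three fraction bits (0b1000 stands for 1.000), and it truncates any bits shifted out during alignment.

```python
def add_binary_fp(sig_a, exp_a, sig_b, exp_b, frac_bits=3):
    """Add two binary floating point numbers.

    The represented value is sig / 2**frac_bits * 2**exp.
    """
    # Keep the operand with the larger exponent in (sig_a, exp_a).
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a

    # Align: shift the significand with the smaller exponent to the right.
    magnitude = abs(sig_b) >> (exp_a - exp_b)
    sig_b = -magnitude if sig_b < 0 else magnitude

    # Add the aligned significands.
    total = sig_a + sig_b
    exp = exp_a
    if total == 0:
        return 0, 0

    # Renormalize so that 1.0 <= |significand| < 2.0.
    one = 1 << frac_bits
    sign = -1 if total < 0 else 1
    magnitude = abs(total)
    while magnitude >= 2 * one:
        magnitude >>= 1
        exp += 1
    while magnitude < one:
        magnitude <<= 1
        exp -= 1
    return sign * magnitude, exp


# 1.000 x 2**-1 plus -1.110 x 2**-2
sig, exp = add_binary_fp(0b1000, -1, -0b1110, -2)
print(bin(sig), exp)  # prints 0b1000 -4, i.e. 1.000 x 2**-4
```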
Floating Point Multiplication (Decimal)
Assume that we can store only four digits of the significand and two digits of the exponent in a decimal floating point representation.
How would you multiply $1.110_{10} \times 10^{10}$ by $9.200_{10} \times 10^{-5}$ in this representation?
Step 1: Add the exponents: new exponent = 10 + (-5) = 5
Step 2: Multiply the significands: $1.110 \times 9.200 = 10.212000$ (the nonzero partial products are $1.110 \times 0.2 = 0.2220$ and $1.110 \times 9 = 9.990$)
Step 3: Normalize the product: $10.212_{10} \times 10^{5} = 1.0212_{10} \times 10^{6}$
Step 4: Round off the product: $1.0212_{10} \times 10^{6} = 1.021_{10} \times 10^{6}$
Hen/Patt, pp. 286
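The four multiplication steps can be sketched with integer significands. The function `multiply_decimal_fp` is hypothetical. It assumes positive, normalized operands whose four-digit significands are stored as integers (1110 stands for 1.110), and it rounds half up. The two-digit exponent limit is not checked.

```python
def multiply_decimal_fp(sig_a, exp_a, sig_b, exp_b, digits=4):
    """Multiply two positive decimal floating point numbers step by step."""
    # Step 1: add the exponents.
    exp = exp_a + exp_b

    # Step 2: multiply the significands (2 * (digits - 1) fraction digits).
    product = sig_a * sig_b

    # Steps 3 and 4: drop the extra digits in one rounding step and adjust
    # the exponent so the result again has exactly `digits` digits.
    shift = len(str(product)) - digits
    rounded = (product + 10**shift // 2) // 10**shift
    exp += shift - (digits - 1)
    if rounded == 10**digits:      # rounding carried into an extra digit
        rounded //= 10
        exp += 1
    return rounded, exp


# (1.110 x 10**10) * (9.200 x 10**-5)
print(multiply_decimal_fp(1110, 10, 9200, -5))  # prints (1021, 6)
```

The result (1021, 6) stands for 1.021 x 10^6, matching Step 4.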