Faculty of Computer Science © 2006 CMPUT 229 Floating Point Representation Operating with Real...

Faculty of Computer Science

Floating Point Representation

Operating with Real Numbers

Department of Computing Science

CMPUT 229

Reading Material

This set of slides is based on the texts by Patt and Patel and by Patterson and Hennessy.

The topics covered in these slides are presented in Section 4.9 of Clements’ textbook.


Representing Large and Small Numbers

How would you represent a number such as $6.023 \times 10^{23}$ in binary?

The range ($10^{23}$) of this number is greater than the range of the 32-bit representation that we have used for integers ($2^{31} \approx 2.14 \times 10^{9}$).

However, the precision (6023) of this number is quite small, and can be expressed in a small number of bits.

The solution is to use a floating point representation.

A floating point representation allocates some bits for the range of the value, some bits for precision, and one bit for the sign.

Patt/Patel, pp. 32


Floating Point Representation

Most standard floating point representations use:

1 bit for the sign (positive or negative)

8 bits for the range (exponent field)

23 bits for the precision (fraction field)

| S (1 bit) | exponent (8 bits) | fraction (23 bits) |

$$
N = \begin{cases}
(-1)^{S} \times 1.\text{fraction} \times 2^{\text{exponent} - 127}, & 1 \le \text{exponent} \le 254 \\
(-1)^{S} \times 0.\text{fraction} \times 2^{-126}, & \text{exponent} = 0
\end{cases}
$$

Patt/Patel, pp. 33
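The two-case formula can be sketched in a few lines of Python. This is only an illustration: the function name `decode_single` is hypothetical, the bit pattern is assumed to arrive as an unsigned integer, and exponent 255 is rejected because the formula does not cover it.

```python
def decode_single(bits):
    """Decode a 32-bit pattern (an int) with the two-case formula above."""
    sign = (bits >> 31) & 0x1        # 1 sign bit
    exponent = (bits >> 23) & 0xFF   # 8-bit exponent field
    fraction = bits & 0x7FFFFF       # 23-bit fraction field
    frac_value = fraction / 2**23    # numeric value of 0.fraction

    if 1 <= exponent <= 254:
        # Normalized case: implicit leading 1 and a bias of 127.
        magnitude = (1 + frac_value) * 2.0 ** (exponent - 127)
    elif exponent == 0:
        # Second case of the formula: leading 0, fixed exponent -126.
        magnitude = frac_value * 2.0 ** -126
    else:
        raise ValueError("exponent 255 is not covered by this formula")
    return (-1) ** sign * magnitude


print(decode_single(0x3F800000))  # sign 0, exponent 127, fraction 0 -> 1.0
```

Python floats are double precision, so every single precision value computed here is represented exactly.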

Floating Point Representation (example)

Example: How is the number $-6\frac{5}{8}$ represented in floating point?

$-6\frac{5}{8} = -110.101_2 = -1.10101_2 \times 2^{2}$

Thus the exponent is given by:

$\text{exponent} - 127 = 2 \Rightarrow \text{exponent} = 129$

1 10000001 10101000000000000000000

Patt and Patel, pp. 34
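The encoding of this example can be cross-checked with Python's standard `struct` module, which packs a value into the 4-byte single precision format. The helper name `single_precision_fields` is an assumption made for this sketch.

```python
import struct


def single_precision_fields(value):
    """Return the (sign, exponent, fraction) bit strings of value as a 32-bit float."""
    # Pack as a big-endian float, then reinterpret the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    pattern = f"{bits:032b}"
    return pattern[0], pattern[1:9], pattern[9:]


print(single_precision_fields(-6.625))
# ('1', '10000001', '10101000000000000000000')
```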

What is the decimal value of the following floating point number?

0 01111011 00000000000000000000000

exponent = 64+32+16+8+2+1 = (128-8)+3 = 120+3 = 123

$N = (-1)^{0} \times 1.0 \times 2^{123-127} = 1.0 \times 2^{-4} = \frac{1}{16} = 0.0625$

0 10000011 00101000000000000000000

exponent = 128+2+1 = 131

$N = (-1)^{0} \times 1.00101_2 \times 2^{131-127} = 1.00101_2 \times 2^{4} = 10010.1_2$

$N = 2^{4} + 2^{1} + 2^{-1} = 16 + 2 + \frac{1}{2} = 18.5$

1 10000010 00101000000000000000000

exponent = 128+2 = 130

$N = (-1)^{1} \times 1.00101_2 \times 2^{130-127} = -1.00101_2 \times 2^{3} = -1001.01_2$

$N = -\left(2^{3} + 2^{0} + 2^{-2}\right) = -\left(8 + 1 + \frac{1}{4}\right) = -9.25$

Floating Point

What is the largest number that can be represented as a 32-bit floating point number in the IEEE 754 format above?

0 11111110 11111111111111111111111

exponent = 254

$\text{fraction} = 1 \times 2^{-1} + 1 \times 2^{-2} + \dots + 1 \times 2^{-22} + 1 \times 2^{-23}$

$\text{fraction} = 1 \times 2^{0} - 1 \times 2^{-23} = 1 - \frac{1}{2^{23}} = 1 - \frac{1}{8 \times 1024 \times 1024} = 0.99999988079$

actual exponent = 254 - 127 = 127

$N = (-1)^{0} \times 1.99999988079 \times 2^{127} \approx 2^{128}$

Floating Point

What is the smallest positive number (closest to zero) that can be represented in 32-bit floating point using the IEEE 754 format above?

0 00000000 00000000000000000000001

exponent = 0, so the actual exponent is -126

$\text{fraction} = 1 \times 2^{-23}$

$N = (-1)^{0} \times 2^{-23} \times 2^{-126} = 2^{-149}$
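The largest value and the smallest positive value can both be checked by reinterpreting their bit patterns as single precision values. The sketch below assumes the patterns are given as 32-character bit strings; `bits_to_float` is a hypothetical helper built on the standard `struct` module.

```python
import struct


def bits_to_float(pattern):
    """Reinterpret a 32-character bit string as a single precision value."""
    raw = int(pattern, 2).to_bytes(4, "big")
    return struct.unpack(">f", raw)[0]


largest = bits_to_float("0" + "11111110" + "1" * 23)
smallest = bits_to_float("0" * 31 + "1")

# Compare against the closed forms: (1 + fraction) x 2^127 and 2^-149.
print(largest == (2 - 2**-23) * 2.0**127)  # True
print(smallest == 2.0**-149)               # True
```

Both comparisons are exact because every single precision value fits in a Python (double precision) float without rounding.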


Special Floating Point Representations

In the 8-bit field of the exponent we can represent numbers from 0 to 255. We studied how to read numbers with exponents from 0 to 254. What is the value represented when the exponent is 255 (i.e. $11111111_2$)?

An exponent equal to 255 = $11111111_2$ in a floating point representation indicates a special value.

When the exponent is equal to 255 = $11111111_2$ and the fraction is 0, the value represented is infinity (positive or negative, according to the sign bit).

When the exponent is equal to 255 = $11111111_2$ and the fraction is non-zero, the value represented is Not a Number (NaN).

Hen/Patt, pp. 301
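A short sketch of these special values, assuming the bit patterns are written as hexadecimal integers. The three patterns are examples chosen for illustration: exponent 255 with a zero fraction (either sign), and exponent 255 with one non-zero fraction bit.

```python
import math
import struct


def bits_to_float(bits):
    """Reinterpret a 32-bit unsigned integer as a single precision value."""
    return struct.unpack(">f", bits.to_bytes(4, "big"))[0]


pos_inf = bits_to_float(0x7F800000)  # 0 11111111 000...0
neg_inf = bits_to_float(0xFF800000)  # 1 11111111 000...0
nan = bits_to_float(0x7FC00000)      # 0 11111111 100...0 (non-zero fraction)

print(pos_inf, neg_inf, math.isnan(nan))  # inf -inf True
```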


Double Precision

32-bit floating point representation is usually called single precision representation.

A double precision floating point representation requires 64 bits. In double precision the following numbers of bits are used:

1 sign bit

11 bits for exponent

52 bits for fraction (also called significand)
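The 1/11/52 split can be observed directly by packing a value as a 64-bit double. The helper name `double_precision_fields` is hypothetical. The exponent bias of 1023 used in the comment comes from the IEEE 754 standard and is not stated on the slide.

```python
import struct


def double_precision_fields(value):
    """Return the (sign, exponent, fraction) bit strings of value as a 64-bit double."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    pattern = f"{bits:064b}"
    return pattern[0], pattern[1:12], pattern[12:]


sign, exponent, fraction = double_precision_fields(-6.625)
# -6.625 = -1.10101 (binary) x 2^2, so the exponent field holds 2 + 1023 = 1025.
print(sign, exponent, fraction[:8], len(fraction))
# 1 10000000001 10101000 52
```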


Floating Point Addition (Decimal)

How do we perform the following addition?

$9.999_{10} \times 10^{1} + 1.610_{10} \times 10^{-1}$

Step 1: Align the decimal point of the number with the smaller exponent (notice the loss of precision):

$9.999_{10} \times 10^{1} + 0.016_{10} \times 10^{1}$

Step 2: Add the significands: $9.999_{10} \times 10^{1} + 0.016_{10} \times 10^{1} = 10.015_{10} \times 10^{1}$

Step 3: Renormalize the result: $10.015 \times 10^{1} = 1.0015 \times 10^{2}$

Step 4: Round off the result to the representation available: $1.0015 \times 10^{2} = 1.002 \times 10^{2}$

Hen/Patt, pp. 281
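The four steps can be traced with Python's `decimal` module. This is a sketch under stated assumptions: significands hold four digits (d.ddd), digits shifted out during alignment are truncated as on the slide, and the final rounding is round-half-up. The function name `add_decimal_fp` is hypothetical.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

FOUR_DIGITS = Decimal("0.001")  # significands of the form d.ddd


def add_decimal_fp(sig_a, exp_a, sig_b, exp_b):
    """Add sig_a x 10^exp_a and sig_b x 10^exp_b with four-digit significands."""
    a, b = Decimal(sig_a), Decimal(sig_b)
    # Step 1: align the operand with the smaller exponent; shifted-out digits are lost.
    if exp_a < exp_b:
        a, exp_a, b, exp_b = b, exp_b, a, exp_a
    b = b.scaleb(exp_b - exp_a).quantize(FOUR_DIGITS, rounding=ROUND_DOWN)
    # Step 2: add the significands.
    total, exp = a + b, exp_a
    # Step 3: renormalize so that 1 <= |total| < 10.
    while abs(total) >= 10:
        total, exp = total.scaleb(-1), exp + 1
    while total != 0 and abs(total) < 1:
        total, exp = total.scaleb(1), exp - 1
    # Step 4: round to four digits (a carry to 10.000 is not handled in this sketch).
    return total.quantize(FOUR_DIGITS, rounding=ROUND_HALF_UP), exp


print(add_decimal_fp("9.999", 1, "1.610", -1))
# (Decimal('1.002'), 2)
```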

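Floating Point Addition (Example)

Convert the numbers $0.5_{10}$ and $-0.4375_{10}$ to floating point binary representation, and then perform the binary floating-point addition of these numbers.

$0.5_{10} = 2^{-1} = 0.1_2 = 1.000_2 \times 2^{-1}$

$0.4375_{10} = 0.25 + 0.125 + 0.0625 = 2^{-2} + 2^{-3} + 2^{-4} = 0.0111_2 = 1.110_2 \times 2^{-2}$

Which number should have its significand adjusted? The one with the smaller exponent, $1.110_2 \times 2^{-2}$:

$1.000_2 \times 2^{-1} - 1.110_2 \times 2^{-2} = 1.000_2 \times 2^{-1} - 0.111_2 \times 2^{-1}$

$1.000_2 \times 2^{-1} - 0.111_2 \times 2^{-1} = 0.001_2 \times 2^{-1}$

$0.001_2 \times 2^{-1} = 1.000_2 \times 2^{-4}$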

Hen/Patt, pp. 283
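The conversions and the final sum of this example can be cross-checked with Python's `float.hex()`, which prints a value as a normalized significand (in hexadecimal digits) times a power of two. Python floats are double precision rather than the short significands used in the example; that does not change the result here because all three values are exactly representable.

```python
a = 0.5
b = -0.4375
total = a + b

# float.hex() shows the normalized form 1.xxx x 2^exponent (fraction digits in hex).
print(a.hex())      # 0x1.0000000000000p-1   ->  1.000 (binary) x 2^-1
print(b.hex())      # -0x1.c000000000000p-2  -> -1.110 (binary) x 2^-2
print(total.hex())  # 0x1.0000000000000p-4   ->  1.000 (binary) x 2^-4
print(total)        # 0.0625
```

The hexadecimal digit `c` is `1100` in binary, which matches the fraction bits `110` of $1.110_2$.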


Floating Point Multiplication (Decimal)

Assume that we can store only four digits of the significand and two digits of the exponent in a decimal floating point representation.

How would you multiply $1.110_{10} \times 10^{10}$ by $9.200_{10} \times 10^{-5}$ in this representation?

Step 1: Add the exponents: new exponent = 10 + (-5) = 5

Step 2: Multiply the significands:

        1.110
      × 9.200
      -------
         0000
        0000
       2220
      9990
    ---------
    10.212000

Step 3: Normalize the product: $10.212_{10} \times 10^{5} = 1.0212_{10} \times 10^{6}$

Step 4: Round off the product: $1.0212_{10} \times 10^{6} = 1.021_{10} \times 10^{6}$

Hen/Patt, pp. 286
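The multiplication steps can be sketched the same way with Python's `decimal` module. Assumptions: both inputs are already normalized four-digit significands, and the final rounding is round-half-up. The function name `multiply_decimal_fp` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

FOUR_DIGITS = Decimal("0.001")  # significands of the form d.ddd


def multiply_decimal_fp(sig_a, exp_a, sig_b, exp_b):
    """Multiply sig_a x 10^exp_a by sig_b x 10^exp_b with four-digit significands."""
    # Step 1: add the exponents.
    exp = exp_a + exp_b
    # Step 2: multiply the significands (exact in decimal arithmetic).
    product = Decimal(sig_a) * Decimal(sig_b)
    # Step 3: normalize so that the product has one digit before the point.
    while abs(product) >= 10:
        product, exp = product.scaleb(-1), exp + 1
    # Step 4: round to four digits (a carry to 10.000 is not handled in this sketch).
    return product.quantize(FOUR_DIGITS, rounding=ROUND_HALF_UP), exp


print(multiply_decimal_fp("1.110", 10, "9.200", -5))
# (Decimal('1.021'), 6)
```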
