Representation of real number


Presented by: Pawan Yadav, Puneet Vinayak

CONTENTS:
Floating point numbers
Decimal to binary conversion
Floating point representation
Mantissa
Exponent
Normalization
IEEE floating point representation
Floating point arithmetic
Errors in floating point arithmetic

FLOATING POINT NUMBERS

In computer science, a real number is usually stored as a floating point number. In the decimal system, a decimal point (radix point) separates the whole-number part from the fractional part.

Examples:

37.25 (whole = 37, fraction = .25)

123.567

10.12345678


For example, 37.25 can be analyzed as:

10^1    10^0     10^-1     10^-2
Tens    Units    Tenths    Hundredths
3       7        2         5

37.25 = 3 x 10 + 7 x 1 + 2 x 1/10 + 5 x 1/100

BINARY EQUIVALENT

In the binary representation of a floating point number, the column values are as follows:

…  2^6  2^5  2^4  2^3  2^2  2^1  2^0  .  2^-1  2^-2  2^-3   2^-4    …
…  64   32   16   8    4    2    1    .  1/2   1/4   1/8    1/16    …
…  64   32   16   8    4    2    1    .  .5    .25   .125   .0625   …

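DECIMAL TO BINARY CONVERSION

To convert a decimal fraction, repeatedly multiply the fraction by two until it becomes zero. The integer part of each product is the next binary digit; only the fractional part is carried to the next step.

0.8125 × 2 = 1.625    digit 1
0.625 × 2 = 1.25      digit 1
0.25 × 2 = 0.5        digit 0
0.5 × 2 = 1.0         digit 1

So 0.8125_10 = 0.1101_2.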
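The repeated-multiplication procedure can be sketched in Python as follows. The function name `fraction_to_binary` and the `max_bits` limit are assumptions made for this illustration; the limit is needed because many decimal fractions (0.1, for example) never reach zero in binary.

```python
def fraction_to_binary(fraction, max_bits=32):
    """Convert a fraction in [0, 1) to a string of binary digits.

    max_bits is a hypothetical safety limit for fractions whose
    binary expansion does not terminate.
    """
    if not 0 <= fraction < 1:
        raise ValueError("fraction must be in [0, 1)")
    bits = []
    while fraction != 0 and len(bits) < max_bits:
        fraction *= 2
        digit = int(fraction)   # integer part is the next binary digit
        bits.append(str(digit))
        fraction -= digit       # carry only the fractional part forward
    return "0." + "".join(bits) if bits else "0.0"


print(fraction_to_binary(0.8125))           # 0.1101
print(fraction_to_binary(0.1, max_bits=8))  # 0.00011001 (cut off, not exact)
```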

SCIENTIFIC NOTATION OF FLOATING POINT NUMBERS

Decimal:
-123,000,000,000,000 = -1.23 × 10^14
0.000 000 000 000 000 123 = +1.23 × 10^-16

Binary:
110 1100 0000 0000 = 1.1011 × 2^14
-0.0000 0000 0000 0001 1011 = -1.1011 × 2^-16

FLOATING POINT NUMBER REPRESENTATION

If x is a real number, then its normal form representation is:

x = f × Base^E

where f is the mantissa and E is the exponent.

Examples:
125.32_10 = 0.12532 × 10^3 (mantissa 0.12532, exponent 3)
-125.32_10 = -0.12532 × 10^3
0.0546_10 = 0.546 × 10^-1
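As a rough illustration of the normal form in base 10, the sketch below splits a decimal string into a mantissa f with 0.1 <= |f| < 1 and an exponent E. The helper name `normal_form` is an assumption for this example, and it only handles finite inputs.

```python
from decimal import Decimal


def normal_form(text):
    """Return (f, E) with x = f * 10**E and 0.1 <= |f| < 1 for nonzero x."""
    sign, digits, exponent = Decimal(text).as_tuple()
    if not any(digits):
        return Decimal(0), 0
    # Place all coefficient digits directly after the decimal point.
    mantissa = Decimal((sign, digits, -len(digits)))
    return mantissa, len(digits) + exponent


for x in ("125.32", "-125.32", "0.0546"):
    f, e = normal_form(x)
    print(x, "=", f, "x 10^", e)
```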

NORMALIZED AND UNNORMALIZED

NORMALIZATION PROCESS

FLOATING POINT FORMAT FOR BINARY NUMBERS

IEEE FLOATING POINT REPRESENTATION

– more exponent bits: greater range
– more mantissa bits: greater precision

The first, or leftmost, field of our floating point representation is the sign bit: 0 for a positive number, 1 for a negative number.

The second field of the floating point number is the exponent. Since we must be able to represent both positive and negative exponents, the exponent is stored with a bias of 127: the stored value is the true exponent plus 127. An exponent of 5 is therefore stored as 127 + 5 = 132; an exponent of -5 is stored as 127 + (-5) = 122.

The biased exponent, the value actually stored, ranges from 0 through 255, the range of values that can be represented by an 8-bit unsigned binary number. In IEEE 754 single precision the stored values 0 and 255 are reserved for special cases (zero and denormalized numbers; infinity and NaN), so normalized numbers use 1 through 254.
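The bias arithmetic is small enough to check directly. The following is a minimal sketch; the name `biased_exponent` is hypothetical, and the range check only enforces the 8-bit field width described above.

```python
BIAS = 127  # exponent bias for the 8-bit exponent field


def biased_exponent(exponent):
    """Return the stored exponent as (integer, 8-bit binary string)."""
    stored = exponent + BIAS
    if not 0 <= stored <= 255:
        raise ValueError("exponent does not fit in an 8-bit field")
    return stored, format(stored, "08b")


print(biased_exponent(5))   # (132, '10000100')
print(biased_exponent(-5))  # (122, '01111010')
```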

The mantissa is the set of 0’s and 1’s to the right of the radix point of the normalized binary number (normalized means that the single digit to the left of the radix point is 1). Example: in 1.00101 × 2^3 the mantissa is 00101.

The mantissa is stored in a 23-bit field; the leading 1 to the left of the radix point is implied (the hidden 1) and is not stored.
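To see the three fields of a stored value, one option is Python's standard `struct` module, which can pack a number as a 32-bit float and expose the raw bits. The helper name `float32_fields` is an assumption for this sketch.

```python
import struct


def float32_fields(x):
    """Return (sign, exponent, mantissa) bit strings of x as a 32-bit float."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    bits = format(raw, "032b")
    return bits[0], bits[1:9], bits[9:]  # 1 + 8 + 23 bits


# 1.00101 x 2^3 in binary is 1001.01, which is 9.25 in decimal.
sign, exponent, mantissa = float32_fields(9.25)
print(sign, exponent, mantissa)
# 0 10000010 00101000000000000000000
```

The exponent field 10000010 is 130, that is 127 + 3, and the mantissa field starts with 00101; the leading 1 does not appear.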

NORMALIZING NUMBERS

Examples:

134.15_10 = 0.13415 × 10^3
0.0021_10 = 0.21 × 10^-2
101.11_2 = .10111 × 2^3 or 1.0111 × 2^2 (hidden 1)
0.011_2 = .11 × 2^-1 or 1.1 × 2^-2 (hidden 1)
AB.CD_16 = .ABCD × 16^2
0.00AC_16 = .AC × 16^-2

Note that the concept of a hidden 1 applies only to binary.
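Both binary normalization conventions can be checked with `math.frexp` from the Python standard library, which returns a fraction f with 0.5 <= |f| < 1 (the ".1xxx" form). The helper `hidden_one_form` is a hypothetical name for the "1.xxx" form and assumes a nonzero input.

```python
import math


def hidden_one_form(x):
    """Return (m, e) with x = m * 2**e and 1 <= |m| < 2, for nonzero x."""
    f, e = math.frexp(x)  # x = f * 2**e with 0.5 <= |f| < 1
    return f * 2, e - 1


# 5.75 is 101.11 in binary; 0.375 is 0.011 in binary.
print(math.frexp(5.75))        # (0.71875, 3): 0.71875 is .10111 in binary
print(hidden_one_form(5.75))   # (1.4375, 2):  1.4375 is 1.0111 in binary
print(hidden_one_form(0.375))  # (1.5, -2):    1.5 is 1.1 in binary
```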

CONVERTING DECIMAL FLOATING POINT VALUES TO STORED IEEE STANDARD VALUES

Example: Find the IEEE FP representation of 40.15625.

Step 1. Compute the binary equivalent of the whole part and the fractional part (convert 40 and .15625 to their binary equivalents).

40.15625_10 = 101000.00101_2

Step 2. Normalize the number by moving the radix point to just right of the leftmost one.

101000.00101 = 1.0100000101 × 2^5

Step 3. Convert the exponent to a biased exponent.

127 + 5 = 132

132_10 = 10000100_2

Step 4. Store the results from above. The mantissa is padded on the right with zeros to fill 23 bits.

Sign    Exponent (from step 3)    Mantissa (from step 2)
0       10000100                  01000001010000000000000
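Steps 1 to 4 can be followed by hand in code and compared against the bits the machine actually stores. This is only a sketch: `encode_float32_by_hand` is a hypothetical name, it assumes a nonzero value whose exponent fits the normalized range, and it truncates (rather than rounds) any mantissa bits beyond 23.

```python
import struct


def encode_float32_by_hand(x):
    """Return 'sign exponent mantissa' bit strings, following steps 1-4."""
    if x == 0:
        raise ValueError("zero has no normalized form")
    sign = "0" if x > 0 else "1"
    x = abs(x)
    # Steps 1-2: shift the radix point until the value is 1.xxx
    exponent = 0
    while x >= 2:
        x /= 2
        exponent += 1
    while x < 1:
        x *= 2
        exponent -= 1
    # Step 3: biased exponent in 8 bits
    exponent_bits = format(exponent + 127, "08b")
    # Step 4: 23 bits to the right of the hidden 1
    fraction = x - 1
    mantissa_bits = ""
    for _ in range(23):
        fraction *= 2
        bit = int(fraction)
        mantissa_bits += str(bit)
        fraction -= bit
    return f"{sign} {exponent_bits} {mantissa_bits}"


by_hand = encode_float32_by_hand(40.15625)
(raw,) = struct.unpack(">I", struct.pack(">f", 40.15625))
print(by_hand)                                       # 0 10000100 01000001010000000000000
print(by_hand.replace(" ", "") == format(raw, "032b"))  # True
```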

CONVERT 10.37 TO SINGLE PRECISION FLOATING POINT

Floating point arithmetic

FLOATING-POINT ADDITION


Assume a 4-digit decimal mantissa.

FLOATING POINT SUBTRACTION (USING 4 DIGIT MANTISSA)

Addition and subtraction require terms of the same scale, so the smaller exponent is first raised to match the larger:

0.2361 × 10^6 - 0.1455 × 10^4
= 0.2361 × 10^6 - 0.001455 × 10^6 {both 10^6}
= (0.2361 - 0.001455) × 10^6
= 0.234645 × 10^6
≈ 0.2346 × 10^6 {4 digit mantissa}
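The alignment step can be sketched with Python's `decimal` module, holding each number as a (mantissa, exponent) pair. The function name `subtract_4digit` is an assumption for this example, and the sketch does not renormalize a result whose mantissa falls below 0.1 after cancellation.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def subtract_4digit(m1, e1, m2, e2):
    """(m1 x 10**e1) - (m2 x 10**e2) with the mantissa kept to 4 digits."""
    e = max(e1, e2)
    # Bring both mantissas to the larger exponent.
    a = Decimal(m1).scaleb(e1 - e)
    b = Decimal(m2).scaleb(e2 - e)
    # Subtract, then cut the mantissa back to 4 decimal places.
    m = (a - b).quantize(Decimal("0.0001"), rounding=ROUND_HALF_EVEN)
    return m, e


print(subtract_4digit("0.2361", 6, "0.1455", 4))  # (Decimal('0.2346'), 6)
```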

REAL NUMBER MULTIPLICATION (USING 4 DIGIT MANTISSA)

In multiplication the work is in the mantissa: the mantissas are multiplied and the exponents are added.

(0.2361 × 10^2) × (0.1455 × 10^4)
= (0.2361 × 0.1455) × 10^(2+4) {add exponents}
= 0.03435255 × 10^6
= 0.3435255 × 10^5 {normalize}
≈ 0.3435 × 10^5 {4 digit mantissa}

Notice that multiplication must work from the most significant digit downwards, since at some point the result has to be truncated.

REAL NUMBER DIVISION (USING 4 DIGIT MANTISSA)

(0.2361 × 10^2) / (0.1455 × 10^4)
= (0.2361 / 0.1455) × 10^(2-4) {subtract exponents}
= 1.6226804 × 10^-2
= 0.16226804 × 10^-1 {normalize}
≈ 0.1623 × 10^-1 {4 digit mantissa}
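The multiplication and division examples can be reproduced with a `decimal` context limited to 4 significant digits, which plays the role of the 4-digit mantissa. The rounding mode chosen here (round half to even) is an assumption; the slides do not say which mode is intended.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

# A context with 4 significant digits stands in for the 4-digit mantissa.
ctx = Context(prec=4, rounding=ROUND_HALF_EVEN)

a = Decimal("0.2361E2")
b = Decimal("0.1455E4")

product = ctx.multiply(a, b)   # exact value 34352.55, kept to 4 digits
quotient = ctx.divide(a, b)    # exact value 0.016226804..., kept to 4 digits

print(product == Decimal("0.3435E5"))    # True
print(quotient == Decimal("0.1623E-1"))  # True
```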

ERRORS IN FLOATING POINT ARITHMETIC

Round-off error, for example:
5.6999 ≈ 5.7
7.238 ≈ 7.24

Truncation error, for example:
4.67444444 ≈ 4.674
5.45676767 ≈ 5.4567
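The difference between rounding and truncation can be shown on the same values with the `decimal` module. The helper names `round_to` and `truncate_to` are hypothetical, and round-half-up is assumed as the rounding rule.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def round_to(text, places):
    """Round a decimal string to the given number of decimal places."""
    step = Decimal(1).scaleb(-places)  # e.g. 0.01 for places=2
    return Decimal(text).quantize(step, rounding=ROUND_HALF_UP)


def truncate_to(text, places):
    """Drop all digits beyond the given number of decimal places."""
    step = Decimal(1).scaleb(-places)
    return Decimal(text).quantize(step, rounding=ROUND_DOWN)


print(round_to("5.6999", 1))         # 5.7
print(round_to("7.238", 2))          # 7.24
print(truncate_to("4.67444444", 3))  # 4.674
print(truncate_to("5.45676767", 4))  # 5.4567
print(round_to("5.45676767", 4))     # 5.4568, rounding differs from truncation
```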

thanks
