Floating point representation · (Unsigned) Fixed-point representation Range: difference between the largest and smallest numbers possible. More bits for the integer part increase

Floating point representation

-¥f¥,- III:*n→ t t = and

i: ¥ l&:

(Unsigned) Fixed-point representationThe numbers are stored with a fixed number of bits for the integer part and a fixed number of bits for the fractional part.

Suppose we have 8 bits to store a real number, where 5 bits store the integer part and 3 bits store the fractional part:

2"2#2$2%2& 2'%2'$2'#1 0 1 1 1.0 1 1 $

Smallest number:

Largest number:

( 00000 . 00 I 72 =@. 1251,0( l l l l l . l l l L = (31-875),

(Unsigned) Fixed-point representationSuppose we have 64 bits to store a real number, where 32 bits store the integer part and 32 bits store the fractional part:

!"# …!%!#!&. (#(%(" …("% % = *+,&

"#!+ 2+ +*

+,#

"%(+ 2/+

= !"#× 2"#+!"&× 2"&+⋯+ !&× 2&+(#× 2/#+(%× 2%+⋯+ ("%× 2/"%

0 ∞

- I0

smallest i 00 - - .00 . ¥,Oy -

OLz-322 a 10

largest .. l l l - - . . l l . l l l . . . l l E 109ka wg-10 109

10

(Unsigned) Fixed-point representationRange: difference between the largest and smallest numbers possible.

More bits for the integer part ⟶ increase range

Precision: smallest possible difference between any two numbersMore bits for the fractional part ⟶ increase precision

Wherever we put the binary point, there is a trade-off between the amount of range and precision. It can be hard to decide how much you need of each!

"#"$"%. '$'#'( # "$"%. '$'#'(') #OR

Scientific Notation

In scientific notation, a number can be expressed in the form

! = ± $ × 10(

where $ is a coefficient in the range 1 ≤ $ < 10 and + is the exponent.

1165.7 =

0.0004728 =

Floating-point numbersA floating-point number can represent numbers of different order of magnitude (very large and very small) with the same number of fixed bits.

In general, in the binary system, a floating number can be expressed as

! = ± $ × 2'$ is the significand, normally a fractional value in the range [1.0,2.0)

. is the exponent

Floating-point numbers

Numerical Form:

! = ±$ × 2' = ±(). (+(,(- …(/× 2'

Fractional part of significand(0 bits)

significand

÷bit bi E lo , I }

ME [ L , U ]Precision : pent I

Normalized floating-point numbersNormalized floating point numbers are expressed as

! = ± 1. &'&(&) …&+× 2. = ± 1. / × 2.

where / is the fractional part of the significand, 0 is the exponent and &1 ∈ 0,1 .

store 5 bits/ bo . blbzbsb4 → p=5 bits

\ I . b.bebaby be → p-6

hidden bit representation →"

gain"

I bit of

precision

Converting floating points

Convert (39.6875)*+ = 100111.1011 / into floating point representation

Iclicker questionDetermine the normalized floating point representation 1. # × 2& of the decimal number ' = 47.125 (# in binary representation and & in decimal)

A) 1.01110001 / × 20B) 1.01110001 / × 22C) 1.01111001 / × 20D) 1.01111001 / × 22

meet. ps Lcs357

(47. 1251,0=401411.011/25

l . O l l l l O l l x2

• Exponent range:

• Precision:

• Smallest positive normalized FP number:

• Largest positive normalized FP number:

Normalized floating-point numbers

! = ± $ × 2'= ± 1. *+*,*- …*/× 2' = ± 1. 0 × 2'• r

DME [ L , U ]

P - htt h:# bits in f

-

L1. 000 . . . . OO x 2h = 2-n

1.lll__x I = It'ft -2-P)

Floating-point numbers: Simple exampleA ”toy” number system can be represented as ! = ±1. &'&(×2+for , ∈ [−4,4] and &3 ∈ {0,1}.

i÷÷÷÷÷÷ Effm-- 9

µm-12

/m=-3

)m -- -4

Floating-point numbers: Simple exampleA ”toy” number system can be represented as ! = ±1. &'&(×2+for , ∈ [−4,4] and &3 ∈ {0,1}.1.00 ( ×27 = 11.01 ( ×27 = 1.251.10 ( ×27 = 1.51.11 ( ×27 = 1.75

1.00 ( ×2:' = 0.51.01 ( ×2:' = 0.6251.10 ( ×2:' = 0.751.11 ( ×2:' = 0.875

1.00 ( ×2' = 21.01 ( ×2' = 2.51.10 ( ×2' = 3.01.11 ( ×2' = 3.5

1.00 ( ×2( = 4.01.01 ( ×2( = 5.01.10 ( ×2( = 6.01.11 ( ×2( = 7.0

1.00 ( ×2> = 8.01.01 ( ×2> = 10.01.10 ( ×2> = 12.01.11 ( ×2> = 14.0

1.00 ( ×2? = 16.01.01 ( ×2? = 20.01.10 ( ×2? = 24.01.11 ( ×2? = 28.0

1.00 ( ×2:( = 0.251.01 ( ×2:( = 0.31251.10 ( ×2:( = 0.3751.11 ( ×2:( = 0.4375

1.00 ( ×2:> = 0.1251.01 ( ×2:> = 0.156251.10 ( ×2:> = 0.18751.11 ( ×2:> = 0.21875

1.00 ( ×2:? = 0.06251.01 ( ×2:? = 0.0781251.10 ( ×2:? = 0.093751.11 ( ×2:? = 0.109375

Same steps are performed to obtain the negative numbers. For simplicity, we will show only the positive numbers in this example.

D (It'll -2-17=254-2-3)

⇒ ri-

! = ±1. &'&(×2+ for , ∈ [−4,4] and &3 ∈ {0,1}

• Smallest normalized positive number:

• Largest normalized positive number:

2h = 2-4=0.0625

It' ( I - 2-t ) = 28

! = ±1. &'&(×2+ for , ∈ [−4,4] and &3 ∈ {0,1}

Machine epsilon• Machine epsilon (7+): is defined as the distance (gap) between 1 and the

next largest floating point number.

Em .-I" 1/2-2

X⇒ 1.0000 . . . 00°

-x2

1. 000 -..

01×20-

0.000 - - - - 0*200--5"

Machine numbers: how floating point numbers are stored?

Floating-point number representationWhat do we need to store when representing floating point numbers in a computer?

! = ± 1. & × 2)

! = ± * +sign exponent significand

Initially, different floating-point representations were used in computers, generating inconsistent program behavior across different machines.

Around 1980s, computer manufacturers started adopting a standard representation for floating-point number: IEEE (Institute of Electrical and Electronics Engineers) 754 Standard.

Floating-point number representationNumerical form:

! = ± 1. & × 2)

Representation in memory:

! =

-signedME[LiU]

S C --Mtshift f

unsigned (signed

Precisions:

IEEE-754 Single precision (32 bits):

IEEE-754 Double precision (64 bits):

! =

! =

S a fI 8bits -

Ibit 23 bits

S C/ Ilbits -

Ibit 52 bits

IEEE-754 Single Precision (32-bit)! = (−1)' 1. ) × 2,

sign (1-bit)

exponent(8-bit)

significand(23-bit)

- . = / + -ℎ234 3↳-1260=127

p - 24

C = 000000072=0110 Of C f 255C -41111111112=255 set aside c-- O

C --255

I f e f 2540¥fl%oicea,offfhift

/ gmtshifts 254f-lzgfmf.be#

I f C f 254M→

I f Mt shift E 254| fshift=l27J choice !I - 127 f M S 254- 127

-126 S ME 127 23

praise-8

C = 101110010)C -- Mt shift

M - c - shift

IEEE-754 Single Precision (32-bit)

! = (−1)' 1. ) × 2,

67.125 = 1000011.001 1 = 1.000011001 1×22

Example: Represent the number ! = −67.125 using IEEE Single-Precision Standard

• Machine epsilon (!"): is defined as the distance (gap) between 1 and the next largest floating point number.

# = (−1)) 1. + × 2. = / = 0 + 127

IEEE-754 Single Precision (32-bit)) 3 +

• Smallest positive normalized FP number:

• Largest positive normalized FP number:

Em = In = 2-"

w 1.2×10-7

2"

→ 2-126 I 10-38

It' fl - 2-P) ⇒ 212811 - 2-24) 1038

Ah

=A -OFL - UFL UFL OFL A

-

10£ -

1£38 to-32 11038

Documents

Floating point representation · (Unsigned) Fixed-point representation Range: difference between the largest and smallest numbers possible. More bits for the integer part increase