Upload
others
View
13
Download
0
Embed Size (px)
Citation preview
Floating point representation
-¥f¥,- III:*n→ t t = and
i: ¥ l&:
(Unsigned) Fixed-point representationThe numbers are stored with a fixed number of bits for the integer part and a fixed number of bits for the fractional part.
Suppose we have 8 bits to store a real number, where 5 bits store the integer part and 3 bits store the fractional part:
2"2#2$2%2& 2'%2'$2'#1 0 1 1 1.0 1 1 $
Smallest number:
Largest number:
( 00000 . 00 I 72 =@. 1251,0( l l l l l . l l l L = (31-875),
(Unsigned) Fixed-point representationSuppose we have 64 bits to store a real number, where 32 bits store the integer part and 32 bits store the fractional part:
!"# …!%!#!&. (#(%(" …("% % = *+,&
"#!+ 2+ +*
+,#
"%(+ 2/+
= !"#× 2"#+!"&× 2"&+⋯+ !&× 2&+(#× 2/#+(%× 2%+⋯+ ("%× 2/"%
0 ∞
- I0
smallest i 00 - - .00 . ¥,Oy -
OLz-322 a 10
largest .. l l l - - . . l l . l l l . . . l l E 109ka wg-10 109
10
(Unsigned) Fixed-point representationRange: difference between the largest and smallest numbers possible.
More bits for the integer part ⟶ increase range
Precision: smallest possible difference between any two numbersMore bits for the fractional part ⟶ increase precision
Wherever we put the binary point, there is a trade-off between the amount of range and precision. It can be hard to decide how much you need of each!
"#"$"%. '$'#'( # "$"%. '$'#'(') #OR
Scientific Notation
In scientific notation, a number can be expressed in the form
! = ± $ × 10(
where $ is a coefficient in the range 1 ≤ $ < 10 and + is the exponent.
1165.7 =
0.0004728 =
Floating-point numbersA floating-point number can represent numbers of different order of magnitude (very large and very small) with the same number of fixed bits.
In general, in the binary system, a floating number can be expressed as
! = ± $ × 2'$ is the significand, normally a fractional value in the range [1.0,2.0)
. is the exponent
Floating-point numbers
Numerical Form:
! = ±$ × 2' = ±(). (+(,(- …(/× 2'
Fractional part of significand(0 bits)
significand
÷bit bi E lo , I }
ME [ L , U ]Precision : pent I
Normalized floating-point numbersNormalized floating point numbers are expressed as
! = ± 1. &'&(&) …&+× 2. = ± 1. / × 2.
where / is the fractional part of the significand, 0 is the exponent and &1 ∈ 0,1 .
store 5 bits/ bo . blbzbsb4 → p=5 bits
\ I . b.bebaby be → p-6
hidden bit representation →"
gain"
I bit of
precision
Converting floating points
Convert (39.6875)*+ = 100111.1011 / into floating point representation
Iclicker questionDetermine the normalized floating point representation 1. # × 2& of the decimal number ' = 47.125 (# in binary representation and & in decimal)
A) 1.01110001 / × 20B) 1.01110001 / × 22C) 1.01111001 / × 20D) 1.01111001 / × 22
meet. ps Lcs357
(47. 1251,0=401411.011/25
l . O l l l l O l l x2
• Exponent range:
• Precision:
• Smallest positive normalized FP number:
• Largest positive normalized FP number:
Normalized floating-point numbers
! = ± $ × 2'= ± 1. *+*,*- …*/× 2' = ± 1. 0 × 2'• r
DME [ L , U ]
P - htt h:# bits in f
-
L1. 000 . . . . OO x 2h = 2-n
1.lll__x I = It'ft -2-P)
Floating-point numbers: Simple exampleA ”toy” number system can be represented as ! = ±1. &'&(×2+for , ∈ [−4,4] and &3 ∈ {0,1}.
i÷÷÷÷÷÷ Effm-- 9
µm-12
/m=-3
)m -- -4
Floating-point numbers: Simple exampleA ”toy” number system can be represented as ! = ±1. &'&(×2+for , ∈ [−4,4] and &3 ∈ {0,1}.1.00 ( ×27 = 11.01 ( ×27 = 1.251.10 ( ×27 = 1.51.11 ( ×27 = 1.75
1.00 ( ×2:' = 0.51.01 ( ×2:' = 0.6251.10 ( ×2:' = 0.751.11 ( ×2:' = 0.875
1.00 ( ×2' = 21.01 ( ×2' = 2.51.10 ( ×2' = 3.01.11 ( ×2' = 3.5
1.00 ( ×2( = 4.01.01 ( ×2( = 5.01.10 ( ×2( = 6.01.11 ( ×2( = 7.0
1.00 ( ×2> = 8.01.01 ( ×2> = 10.01.10 ( ×2> = 12.01.11 ( ×2> = 14.0
1.00 ( ×2? = 16.01.01 ( ×2? = 20.01.10 ( ×2? = 24.01.11 ( ×2? = 28.0
1.00 ( ×2:( = 0.251.01 ( ×2:( = 0.31251.10 ( ×2:( = 0.3751.11 ( ×2:( = 0.4375
1.00 ( ×2:> = 0.1251.01 ( ×2:> = 0.156251.10 ( ×2:> = 0.18751.11 ( ×2:> = 0.21875
1.00 ( ×2:? = 0.06251.01 ( ×2:? = 0.0781251.10 ( ×2:? = 0.093751.11 ( ×2:? = 0.109375
Same steps are performed to obtain the negative numbers. For simplicity, we will show only the positive numbers in this example.
D (It'll -2-17=254-2-3)
⇒ ri-
! = ±1. &'&(×2+ for , ∈ [−4,4] and &3 ∈ {0,1}
• Smallest normalized positive number:
• Largest normalized positive number:
2h = 2-4=0.0625
It' ( I - 2-t ) = 28
! = ±1. &'&(×2+ for , ∈ [−4,4] and &3 ∈ {0,1}
Machine epsilon• Machine epsilon (7+): is defined as the distance (gap) between 1 and the
next largest floating point number.
Em .-I" 1/2-2
X⇒ 1.0000 . . . 00°
-x2
1. 000 -..
01×20-
0.000 - - - - 0*200--5"
Machine numbers: how floating point numbers are stored?
Floating-point number representationWhat do we need to store when representing floating point numbers in a computer?
! = ± 1. & × 2)
! = ± * +sign exponent significand
Initially, different floating-point representations were used in computers, generating inconsistent program behavior across different machines.
Around 1980s, computer manufacturers started adopting a standard representation for floating-point number: IEEE (Institute of Electrical and Electronics Engineers) 754 Standard.
Floating-point number representationNumerical form:
! = ± 1. & × 2)
Representation in memory:
! =
-signedME[LiU]
S C --Mtshift f
unsigned (signed
Precisions:
IEEE-754 Single precision (32 bits):
IEEE-754 Double precision (64 bits):
! =
! =
S a fI 8bits -
Ibit 23 bits
S C/ Ilbits -
Ibit 52 bits
IEEE-754 Single Precision (32-bit)! = (−1)' 1. ) × 2,
sign (1-bit)
exponent(8-bit)
significand(23-bit)
- . = / + -ℎ234 3↳-1260=127
p - 24
C = 000000072=0110 Of C f 255C -41111111112=255 set aside c-- O
C --255
I f e f 2540¥fl%oicea,offfhift
/ gmtshifts 254f-lzgfmf.be#
I f C f 254M→
I f Mt shift E 254| fshift=l27J choice !I - 127 f M S 254- 127
-126 S ME 127 23
praise-8
C = 101110010)C -- Mt shift
M - c - shift
IEEE-754 Single Precision (32-bit)
! = (−1)' 1. ) × 2,
67.125 = 1000011.001 1 = 1.000011001 1×22
Example: Represent the number ! = −67.125 using IEEE Single-Precision Standard
• Machine epsilon (!"): is defined as the distance (gap) between 1 and the next largest floating point number.
# = (−1)) 1. + × 2. = / = 0 + 127
IEEE-754 Single Precision (32-bit)) 3 +
• Smallest positive normalized FP number:
• Largest positive normalized FP number:
Em = In = 2-"
w 1.2×10-7
2"
→ 2-126 I 10-38
It' fl - 2-P) ⇒ 212811 - 2-24) 1038
Ah
=A -OFL - UFL UFL OFL A
-
10£ -
1£38 to-32 11038