Chapter 9 Floating Point Arithmetic


9.1 Floating Point Formats

Common Format Components

• Each format encodes a normalized number whose “binary scientific notation” would be ±1.dd…d × 2^exp

• Sign bit
– 0 for positive and 1 for negative

• Exponent field
– Actual exponent exp plus a bias
– Bias gives an alternative to 2’s complement for representing negative exponents

• Fraction (“mantissa”) field

IEEE Single Precision Format

• 32-bit format
– Sign bit
– 8-bit biased exponent (the actual exponent in the normalized binary “scientific” format plus 127)
– 23-bit fraction (the fraction in the scientific format without the leading 1 bit)

• Generated by REAL4 directive
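The three fields of the single precision format can be pulled apart with a few shifts and masks. The following is a minimal Python sketch, not part of the original slides; the function names `decode_single` and `describe_single` are made up for illustration, and `struct` is used only to obtain the 32-bit pattern.

```python
import struct

def decode_single(x):
    """Return (sign, biased exponent, fraction) of x stored as an IEEE single."""
    # Pack as a 32-bit float, then reinterpret the same four bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                      # 1 bit
    biased_exponent = (bits >> 23) & 0xFF  # 8 bits, actual exponent plus 127
    fraction = bits & 0x7FFFFF             # 23 bits, leading 1 not stored
    return sign, biased_exponent, fraction

def describe_single(x):
    """Human-readable view of the fields, with the bias removed."""
    sign, biased, fraction = decode_single(x)
    return f"sign={sign} exponent={biased - 127} fraction={fraction:023b}"

# 78.375 is 1001110.011 in binary, that is 1.001110011 x 2^6
print(describe_single(78.375))
```

The same approach would work for the double format with an 11-bit exponent mask, a 52-bit fraction mask and a bias of 1023.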

IEEE Double Precision Format

• 64-bit format
– Sign bit
– 11-bit biased exponent (the actual exponent in the normalized binary scientific format plus 1023)
– 52-bit fraction (the fraction in the scientific format without the leading 1 bit)


Double Extended Precision Format

• 80-bit format
– Sign bit
– 15-bit biased exponent (the actual exponent in a normalized binary scientific format plus 16,383)
– 64-bit fraction (the fraction in the scientific format including the leading 1 bit)


Floating Point Formats

format           total bits  exponent bits  fraction bits  approximate maximum  approximate minimum  approximate decimal precision
single           32          8              23             3.40 × 10^38         1.18 × 10^-38        7 digits
double           64          11             52             1.79 × 10^308        2.23 × 10^-308       15 digits
extended double  80          15             64             1.19 × 10^4932       3.37 × 10^-4932      19 digits
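The approximate maximum and minimum columns follow from the field widths. The sketch below is an illustration rather than anything from the slides, and all helper names are invented. It assumes that the largest and smallest biased exponent patterns are reserved (for ∞/NaN and for zero), so normalized exponents run from 1 - bias up to bias.

```python
import math

def bias(exponent_bits):
    """Bias for a biased exponent field of the given width (127 for 8 bits)."""
    return 2 ** (exponent_bits - 1) - 1

def max_normalized(exponent_bits, fraction_bits):
    """Largest normalized value, 1.11...1 (binary) x 2^bias, as an exact integer.

    fraction_bits counts only the bits after the binary point.
    """
    b = bias(exponent_bits)
    return (2 ** (fraction_bits + 1) - 1) * 2 ** (b - fraction_bits)

def min_normalized_exponent(exponent_bits):
    """Power of two of the smallest normalized value, 1.0 x 2^(1 - bias)."""
    return 1 - bias(exponent_bits)

def decimal_form(log10_value):
    """Turn log10(v) into (mantissa, exponent) with v = mantissa x 10^exponent."""
    exponent = math.floor(log10_value)
    return 10 ** (log10_value - exponent), exponent

for name, e_bits, f_bits in [("single", 8, 23), ("double", 11, 52)]:
    hi_m, hi_e = decimal_form(math.log10(max_normalized(e_bits, f_bits)))
    lo_m, lo_e = decimal_form(min_normalized_exponent(e_bits) * math.log10(2))
    print(f"{name}: max about {hi_m:.2f}e{hi_e:+d}, min about {lo_m:.2f}e{lo_e:+d}")
```

For the extended double format the call would be `max_normalized(15, 63)`, since one of its 64 fraction bits is the explicit leading 1.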

• These are for normalized numbers
– Binary scientific notation mantissa written starting with 1 and binary point

• Zero cannot be normalized
– +0 represented by a pattern of all 0 bits

• Also formats for ±∞ and NaN (“not a number”)
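The special patterns can be recognized from the exponent field alone. Here is a small Python sketch for the single precision format. The names `single_bits` and `classify_single` are hypothetical, and the "denormalized" case is included for completeness even though the slides do not discuss it.

```python
import math
import struct

def single_bits(x):
    """32-bit pattern of x stored as an IEEE single."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits

def classify_single(bits):
    """Classify a 32-bit pattern by its exponent and fraction fields."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0xFF:                 # all exponent bits set
        return "NaN" if fraction else "infinity"
    if exponent == 0:                    # all exponent bits clear
        return "zero" if fraction == 0 else "denormalized"
    return "normalized"

print(f"{single_bits(0.0):08X}", classify_single(single_bits(0.0)))
print(f"{single_bits(math.inf):08X}", classify_single(single_bits(math.inf)))
```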

9.2 80x86 Floating Point Architecture

Floating Point Unit

• FPU is independent of integer unit

• Eight 80-bit registers, organized as a stack
– ST, the stack top, also called ST(0)
– ST(1), the register just below the stack top
– ST(2), the register just below ST(1)
– ST(3), ST(4), ST(5), ST(6)
– ST(7), the register at the bottom of the stack

• Several 16-bit control registers, including status word

Load Instructions

• fld realMemoryOperand
– Loads stack top ST with floating point data value
– Values already on the stack are pushed down

• fild integerMemoryOperand
– Converts integer value to corresponding fp value that is pushed onto the stack

• fld st(nbr)
– Pushes a copy of st(nbr) onto the fp stack

More Loads and finit

• fld1
– Pushes 1.0 onto floating point stack

• fldz
– Pushes 0.0 onto fp stack

• fldpi
– Pushes π onto fp stack
…and others

• finit initializes the floating point processor, clearing the stack

Store Instructions

• fst realMemoryOperand
– Copies stack top ST value to memory

• fstp realMemoryOperand
– Copies stack top ST value to memory and pops the floating point stack

• fist integerMemoryOperand
– Copies stack top ST value to memory, converting to integer

• fistp integerMemoryOperand
– Same as fist, but also pops the floating point stack

Exchange Instructions

• fxch
– Exchange values in ST and ST(1)

• fxch st(nbr)
– Exchange ST and ST(nbr)
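The effect of the load, store and exchange instructions on the register stack can be traced with a toy model. The Python class below is only an illustration under stated assumptions: the class and method names are invented, values are ordinary Python floats rather than 80-bit registers, a full stack raises an exception instead of setting FPU status bits, and `fistp` assumes the default round-to-nearest rounding mode.

```python
import math

class FpuStack:
    """Toy model of the FPU register stack; regs[0] plays the role of ST."""

    DEPTH = 8  # eight registers, ST(0) through ST(7)

    def __init__(self):
        self.regs = []

    def finit(self):
        """Model finit by emptying the stack."""
        self.regs.clear()

    def fld(self, value):
        """Push a value; values already on the stack move down one position."""
        if len(self.regs) == self.DEPTH:
            # Assumption: the real FPU reports a stack fault instead of raising.
            raise OverflowError("floating point stack is full")
        self.regs.insert(0, float(value))

    def fld_st(self, nbr):
        """Model fld st(nbr): push a copy of ST(nbr)."""
        self.fld(self.regs[nbr])

    def fst(self):
        """Model fst: return ST without popping."""
        return self.regs[0]

    def fstp(self):
        """Model fstp: return ST and pop it."""
        return self.regs.pop(0)

    def fistp(self):
        """Model fistp: pop ST converted to an integer (round to nearest even)."""
        return round(self.fstp())

    def fxch(self, nbr=1):
        """Model fxch / fxch st(nbr): exchange ST with ST(nbr)."""
        self.regs[0], self.regs[nbr] = self.regs[nbr], self.regs[0]

fpu = FpuStack()
fpu.fld(2.5)          # like fld of a real memory operand
fpu.fld(1.0)          # like fld1
fpu.fld(math.pi)      # like fldpi
fpu.fxch(2)           # ST and ST(2) trade places
print(fpu.regs)
```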

Addition Instructions

• fadd
– pops ST and ST(1); adds these values; pushes sum onto the stack

• fadd st, st(nbr)
– adds ST(nbr) and ST; sum replaces ST

• fadd st(nbr), st
– adds ST(nbr) and ST; sum replaces ST(nbr)

• faddp st(nbr), st
– adds ST(nbr) and ST; sum replaces ST(nbr); old ST popped from stack

More Addition Instructions

• fadd realMemoryOperand
– Adds ST and real memory operand; sum replaces ST

• fiadd integerMemoryOperand
– Adds ST and integer memory operand; sum replaces ST

Subtraction Instructions

• fsub
– pops ST and ST(1); calculates ST(1) - ST; pushes difference onto the stack

• fsub st(nbr), st
– calculates ST(nbr) - ST; replaces ST(nbr) by the difference

• fsub st, st(nbr)
– calculates ST - ST(nbr); replaces ST by the difference

More Subtraction Instructions

• fsub realMemoryOperand
– calculates ST - real number from memory; replaces ST by the difference

• fisub integerMemoryOperand
– calculates ST - integer from memory; replaces ST by the difference

• fsubp st(nbr), st
– calculates ST(nbr) - ST; replaces ST(nbr) by the difference; pops ST from the stack

Reversed Subtraction Instructions

• fsubr
– pops ST and ST(1); calculates ST - ST(1); pushes difference onto the stack

• fsubr st(nbr), st
– calculates ST - ST(nbr); replaces ST(nbr) by the difference

• fsubr st, st(nbr)
– calculates ST(nbr) - ST; replaces ST by the difference


More Reversed Subtraction Instructions

• fsubr realMemoryOperand
– calculates real number from memory - ST; replaces ST by the difference

• fisubr integerMemoryOperand
– calculates integer from memory - ST; replaces ST by the difference

• fsubrp st(nbr), st
– calculates ST - ST(nbr); replaces ST(nbr) by the difference; pops ST from the stack
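Operand order is easy to get wrong with the subtraction forms, so it may help to see them side by side. The Python sketch below models the stack as a plain list whose element 0 stands for ST. The function names simply mirror the mnemonics and are not a real API.

```python
def fsub(stack):
    """No-operand fsub: pop ST and ST(1), push ST(1) - ST."""
    st, st1 = stack.pop(0), stack.pop(0)
    stack.insert(0, st1 - st)

def fsubr(stack):
    """No-operand fsubr: pop ST and ST(1), push ST - ST(1)."""
    st, st1 = stack.pop(0), stack.pop(0)
    stack.insert(0, st - st1)

def fsub_to(stack, nbr):
    """fsub st(nbr), st: ST(nbr) - ST replaces ST(nbr)."""
    stack[nbr] = stack[nbr] - stack[0]

def fsubr_to(stack, nbr):
    """fsubr st(nbr), st: ST - ST(nbr) replaces ST(nbr)."""
    stack[nbr] = stack[0] - stack[nbr]

def fsubp(stack, nbr):
    """fsubp st(nbr), st: like fsub st(nbr), st, then pop ST."""
    fsub_to(stack, nbr)
    stack.pop(0)

normal = [2.0, 10.0]      # ST = 2.0, ST(1) = 10.0
fsub(normal)              # leaves 10.0 - 2.0
reverse = [2.0, 10.0]
fsubr(reverse)            # leaves 2.0 - 10.0
print(normal, reverse)
```

The division instructions follow the same pattern, with dividend and divisor taking the place of the two subtraction operands.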

Multiplication Instructions

• fmul
– pops ST and ST(1); multiplies these values; pushes product onto the stack

• fmul st(nbr), st
– multiplies ST(nbr) and ST; replaces ST(nbr) by the product

• fmul st, st(nbr)
– multiplies ST and ST(nbr); replaces ST by the product

More Multiplication Instructions

• fmul realMemoryOperand
– multiplies ST and real number from memory; replaces ST by the product

• fimul integerMemoryOperand
– multiplies ST and integer from memory; replaces ST by the product

• fmulp st(nbr), st
– multiplies ST(nbr) and ST; replaces ST(nbr) by the product; pops ST from stack

Division Instructions

• fdiv
– pops ST and ST(1); calculates ST(1) / ST; pushes quotient onto the stack

• fdiv st(nbr), st
– calculates ST(nbr) / ST; replaces ST(nbr) by the quotient

• fdiv st, st(nbr)
– calculates ST / ST(nbr); replaces ST by the quotient

More Division Instructions

• fdiv realMemoryOperand
– calculates ST / real number from memory; replaces ST by the quotient

• fidiv integerMemoryOperand
– calculates ST / integer from memory; replaces ST by the quotient

• fdivp st(nbr), st
– calculates ST(nbr) / ST; replaces ST(nbr) by the quotient; pops ST from the stack

Reversed Division Instructions

• Similar to reversed subtraction: each division instruction has a version that reverses the operands used as dividend and divisor

• fdivr
• fdivr st(nbr), st
• fdivr st, st(nbr)
• fdivr realMemoryOperand
• fidivr integerMemoryOperand
• fdivrp st(nbr), st

Miscellaneous Instructions

• fabs
– Absolute value: ST := | ST |

• fchs
– Change sign: ST := - ST

• frndint
– Rounds ST to an integer value

• fsqrt
– Replace ST by its square root

• There are also trigonometric, exponential and logarithmic functions

Comparisons

• Each instruction compares ST with some other operand

• Sets “condition code” bits 14, 10 and 8 in the status word register
– These bits are named C3, C2 and C0

result of comparison  C3  C2  C0
ST > operand          0   0   0
ST < operand          0   0   1
ST = operand          1   0   0
not comparable        1   1   1
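The table of condition codes can be written as a small function. This Python sketch is illustrative only: the function name is invented, and a NaN operand is assumed to be the "not comparable" case. It returns the status word bits that a comparison would set.

```python
import math

C0 = 1 << 8    # status word bit 8
C2 = 1 << 10   # status word bit 10
C3 = 1 << 14   # status word bit 14

def compare_codes(st, operand):
    """Condition code bits for comparing ST with an operand."""
    if math.isnan(st) or math.isnan(operand):
        return C3 | C2 | C0      # not comparable: 1 1 1
    if st > operand:
        return 0                 # 0 0 0
    if st < operand:
        return C0                # 0 0 1
    return C3                    # equal: 1 0 0

print(f"{compare_codes(1.0, 2.0):04X}")
```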

Comparison Instructions

• fcom
– compares ST and ST(1)

• fcom st(nbr)
– compares ST and ST(nbr)

• fcom realMemoryOperand
– compares ST and real number in memory

• ficom integerMemoryOperand
– compares ST and integer in memory

More Comparison Instructions

• ftst
– compares ST and 0.0

• fcomp
– compares ST and ST(1); then pops stack

• fcompp
– compares ST and ST(1); then pops stack twice

Yet More Comparison Instructions

• fcomp st(nbr)
– compares ST and ST(nbr); then pops stack

• fcomp realMemoryOperand
– compares ST and real number in memory; then pops stack

• ficomp integerMemoryOperand
– compares ST and integer in memory; then pops stack

Status Word Access

• Conditional jump instructions look at bits in flags register, not in status word. The fstsw instructions provide access to the status word bits.

• fstsw memoryWord
– copies status register to memory word

• fstsw AX
– copies status register to AX

• Similar instructions available for control word

Comparison in 32-bit Mode

    fcom            ; ST > ST(1)?
    fstsw ax        ; copy condition code bits to AX
    sahf            ; copy condition bits from AH to flags
    jna endGT       ; skip if not

• sahf copies AH into the low order eight bits of the EFLAGS register
– Puts C3 in the ZF position (bit 6) and C0 in the CF position (bit 0)
– Makes it possible to use conditional jump instructions (unsigned mnemonics)
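The bit movement performed by the fstsw ax / sahf pair can be checked by hand. The Python sketch below uses invented names and models only the three flags that receive condition code bits. That C2 (status word bit 10) lands in PF is a detail not stated on the slide. After `fstsw ax`, AH holds status word bits 8 through 15, so C0, C2 and C3 sit at AH bits 0, 2 and 6.

```python
def sahf_flags(status_word):
    """Flags that sahf would load after fstsw ax, as a dict of 0/1 values."""
    ah = (status_word >> 8) & 0xFF   # high byte of AX
    return {
        "CF": ah & 1,          # flags bit 0 receives C0
        "PF": (ah >> 2) & 1,   # flags bit 2 receives C2
        "ZF": (ah >> 6) & 1,   # flags bit 6 receives C3
    }

def jna_taken(flags):
    """jna (jump if not above) is taken when CF = 1 or ZF = 1."""
    return bool(flags["CF"] or flags["ZF"])

# ST > ST(1) leaves C3 = C2 = C0 = 0, so jna falls through to the "greater" code.
print(jna_taken(sahf_flags(0x0000)))
# ST < ST(1) sets C0 (status word bit 8), so jna skips it.
print(jna_taken(sahf_flags(0x0100)))
```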

Comparison in 64-bit Mode

• sahf not available in 64-bit mode

• Two instructions directly set flags in the flags register

• fcomi st, st(nbr)
– compares ST and ST(nbr)

• fcomip st, st(nbr)
– compares ST and ST(nbr); pops stack

9.3 Converting Floating Point To and From ASCII

ASCII to Floating Point

• Algorithm similar to ASCII to integer:

    value := 0.0;
    point at first character of source string;
    while (source character is a digit) loop
        convert ASCII digit to 2's complement digit;
        value := 10*value + float(digit);
        point at next character of source string;
    end while;

• Main difference is that you must divide the final value by 10^dig, where dig is the number of digits after a decimal point
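The same idea can be sketched in Python. This is an illustration under stated assumptions rather than the chapter's assembly code: the function name is made up, a leading minus sign and a single decimal point are accepted, and scanning stops at the first character that is neither.

```python
def ascii_to_float(source):
    """Convert a string such as '-78.375' by accumulating digits, then scaling."""
    negative = source.startswith("-")   # sign handling is an assumption
    digits = source[1:] if negative else source

    value = 0.0
    dig = 0              # number of digits seen after the decimal point
    seen_point = False
    for ch in digits:
        if ch == "." and not seen_point:
            seen_point = True
        elif "0" <= ch <= "9":
            digit = ord(ch) - ord("0")          # ASCII digit to integer
            value = 10.0 * value + float(digit)
            if seen_point:
                dig += 1
        else:
            break                               # stop at the first other character

    value = value / 10.0 ** dig                 # divide the final value by 10^dig
    return -value if negative else value

print(ascii_to_float("78.375"))
```

Dividing once at the end, instead of scaling each fractional digit, means only one rounding error from the division is introduced.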

Floating Point to ASCII (1)

• Algorithm generates E-notation:
– a leading minus sign or a blank
– a digit
– a decimal point
– five digits
– the letter E
– a plus sign or a minus sign
– two digits

• These pieces generated one at a time

Floating Point to ASCII (2)

    point at first destination byte;
    if value ≥ 0
    then
        put blank in destination string;
    else
        put minus in destination string;
        value := -value;
    end if;
    point at next destination byte;

Make leading character a minus sign or a blank

Floating Point to ASCII (3)

    exponent := 0;
    if value ≥ 10
    then
        repeat
            divide value by 10;
            add 1 to exponent;
        until value < 10;
    else
        while value < 1 loop
            multiply value by 10;
            subtract 1 from exponent;
        end while;
    end if;

“Normalize” fp value to have single digit before decimal point

Floating Point to ASCII (4)

    add 0.000005 to value; { for rounding }
    if value ≥ 10
    then
        divide value by 10;
        add 1 to exponent;
    end if;
    digit := int(value); { truncate to integer }
    convert digit to ASCII and store in destination string;
    point at next destination byte;
    store "." in destination string;
    point at next destination byte;

Continue to normalize floating point value; get first digit and decimal point

Floating Point to ASCII (5)

    for i := 1 to 5 loop
        value := 10 * (value - float(digit));
        digit := int(value);
        convert digit to ASCII and store in destination string;
        point at next destination byte;
    end for;

Generate five digits after the decimal point

Floating Point to ASCII (6)

    store E in destination string;
    point at next destination byte;
    if exponent ≥ 0
    then
        put + in destination string;
    else
        put - in destination string;
        exponent := -exponent;
    end if;
    point at next destination byte;
    convert exponent to two decimal digits;
    convert two decimal digits of exponent to ASCII;
    store characters of exponent in destination string;

Generate exponent
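Putting the six steps together, the whole conversion might look like the following Python sketch. It follows the pseudocode step by step, with two labeled departures: the function name is invented, and a guard for zero is added because the "while value < 1" loop would otherwise never end. The two-digit exponent suits single precision values, whose decimal exponents stay below 100.

```python
def float_to_ascii(value):
    """Format value as sd.dddddE+dd following the slide algorithm."""
    dest = []

    # (2) leading minus sign or blank
    if value >= 0:
        dest.append(" ")
    else:
        dest.append("-")
        value = -value

    # (3) normalize to a single digit before the decimal point
    exponent = 0
    if value >= 10:
        while True:                      # repeat ... until value < 10
            value /= 10
            exponent += 1
            if value < 10:
                break
    elif value != 0:                     # guard for zero is an addition
        while value < 1:
            value *= 10
            exponent -= 1

    # (4) round, renormalize if needed, emit first digit and the point
    value += 0.000005
    if value >= 10:
        value /= 10
        exponent += 1
    digit = int(value)                   # truncate to integer
    dest.append(chr(ord("0") + digit))
    dest.append(".")

    # (5) five digits after the decimal point
    for _ in range(5):
        value = 10 * (value - float(digit))
        digit = int(value)
        dest.append(chr(ord("0") + digit))

    # (6) exponent: E, sign, two digits
    dest.append("E")
    if exponent >= 0:
        dest.append("+")
    else:
        dest.append("-")
        exponent = -exponent
    dest.append(f"{exponent:02d}")
    return "".join(dest)

print(float_to_ascii(78.375))
```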

9.4 Single-Instruction Multiple-Data Instructions

SIMD Instructions

• Single-instruction multiple-data (SIMD) instructions operate on several operands at once with a single instruction

• The Intel family has had some form of SIMD instructions since the Pentium II
– MMX technology in Pentium II
– Several generations of streaming SIMD extensions (SSE)
– All current 80x86 CPUs include these features

SSE

• First appeared in the Pentium III processor

• Eight new 128-bit registers, XMM0 through XMM7
– 64-bit architecture added eight more XMM registers, XMM8 through XMM15

• A single 128-bit register can hold four 32-bit floating point numbers

SSE Instructions

• Packed SSE instructions operate on four pairs of floating point numbers simultaneously

• Scalar SSE instructions operate only on the low-order operands, ignoring the other three

Selected Scalar SSE Instructions

mnemonic  operand 1 (dest)  operand 2 (source)  action
movss     xmmreg or mem32   xmmreg or mem32     destination := source (at least one operand must be a register)
addss     xmmreg            xmmreg or mem32     destination := destination + source
subss     xmmreg            xmmreg or mem32     destination := destination - source
mulss     xmmreg            xmmreg or mem32     destination := destination * source
divss     xmmreg            xmmreg or mem32     destination := destination / source
sqrtss    xmmreg            xmmreg or mem32     destination := sqrt(source)
rcpss     xmmreg            xmmreg or mem32     destination := 1/source
comiss    xmmreg            xmmreg or mem32     compare operand1 and operand2; set flags

Selected Packed SSE Instructions

mnemonic  operand 1 (dest)   operand 2 (source)  action
movups    xmmreg or mem128   xmmreg or mem128    destination := source (at least one operand must be a register)
addps     xmmreg             xmmreg or mem128    destination := destination + source (four additions)
subps     xmmreg             xmmreg or mem128    destination := destination - source (four subtractions)
mulps     xmmreg             xmmreg or mem128    destination := destination * source (four multiplications)
divps     xmmreg             xmmreg or mem128    destination := destination / source (four divisions)
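The difference between the packed and scalar forms can be modeled with four-element lists, where element 0 stands for the low-order 32 bits of an XMM register. This Python sketch is only an illustration: the function names mirror the mnemonics but are ordinary Python functions, and `to_single` rounds each result to single precision with `struct`.

```python
import struct

def to_single(x):
    """Round a Python float to the nearest IEEE single precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def addps(dest, src):
    """Packed add: four independent single precision additions."""
    return [to_single(d + s) for d, s in zip(dest, src)]

def addss(dest, src):
    """Scalar add: only the low-order elements are added.

    The other three elements of the destination are left unchanged.
    """
    return [to_single(dest[0] + src[0])] + dest[1:]

xmm0 = [1.0, 2.0, 3.0, 4.0]
xmm1 = [10.0, 20.0, 30.0, 40.0]
print(addps(xmm0, xmm1))
print(addss(xmm0, xmm1))
```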

Using Scalar SSE Instructions

• Similar to programming integer operations with general registers in the 32-bit or 64-bit mode

• comiss comparison instruction sets flags exactly as fcomi does for the floating point unit
– “Unsigned” conditional jump instructions are appropriate following comiss or fcomi

9.5 Floating Point Assembly Language Procedures With C/C++

Why Use Assembly Language Procedures?

• It may be possible, easier, or more efficient to code parts of a program in assembly language than in a high-level language
– Parts that need critical optimization
– Implementation of low-level algorithms

• The bulk of programming is usually better done in a high-level language

32-bit Linkages

• Decorate assembly language procedure name with an underscore
– If C program calls roots, name the procedure _roots

• To use cdecl protocol in a C++ program, use the “C” declaration, for example,

    extern "C" void roots(…);

• Push parameters on stack

• Return single float value in ST

64-bit Linkages

• No text decoration

• Pass floating point parameters in XMM0, XMM1, XMM2 and XMM3
– Integer parameters in RCX, RDX, R8 and R9

• Return single float value in XMM0
