80X87 Arch and Register Set

80X87 Architecture and instruction sets

Intel Arithmetic co processors

• 8087, 80287, 80387SX, 80387DX

• 80487SX for the 80486SX

• The 80486DX contains an internal FPU

• The Pentium and Pentium 4 contain built-in coprocessors

• They multiply, add, divide, subtract, and compute square roots, transcendental functions and logarithms

– On 16, 32 and 64 bit integers

– On 32, 64 and 80 bit floating point numbers

– On 18 digit BCD data

FPU data types

• The 80x87 FPU supports seven different data types:

– three integer types,

– a packed decimal type, and

– three floating point types.

• Since the 80x86 CPUs already support integer data types, there are few reasons why you would want to use the 80x87 integer types.

• The packed decimal type provides an 18 digit signed decimal (BCD) integer.

• The three floating point types are the 32 bit, 64 bit, and 80 bit floating point data types we've looked at so far. The 80x87 data types appear in the following figures:

Integer data types / Assembler directives DW, DD, DQ

Integer data types

• In the context of FPU operations, integers are whole numbers, i.e. numbers which do not contain any fractional part.

• All integers used in FPU instructions are also considered as signed integers, the most significant bit being 0 for positive values or 1 for negative values.

• Negative integer values are represented by the 2's complement of the positive value, which is obtained by taking the 1's complement and adding 1 (the 1's complement is obtained simply by inverting each bit of the number).

• As a refresher, the following example would be for a decimal value of 6235 in a 16-bit WORD.

• 0001 1000 0101 1011 185Bh +6235d

• 1110 0111 1010 0100 E7A4h 1's complement (every bit inverted)

• + 1

• -------------------

• 1110 0111 1010 0101 E7A5h -6235d (2's complement)
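The same invert-then-add-one steps can be checked in a few lines of Python. This is only an illustrative sketch; the helper name `to_word` is made up for this example and is not part of any FPU toolchain.

```python
def to_word(value: int) -> int:
    """Return the 16-bit two's complement bit pattern of a signed integer."""
    if not -(1 << 15) <= value < (1 << 15):
        raise ValueError("value does not fit in a 16-bit WORD")
    return value & 0xFFFF

positive = to_word(6235)             # bit pattern of +6235
inverted = positive ^ 0xFFFF         # invert every bit (1's complement)
negative = (inverted + 1) & 0xFFFF   # add 1 to obtain the 2's complement

print(f"{positive:04X}h {inverted:04X}h {negative:04X}h")
```

Masking with `0xFFFF` gives the same pattern directly, so `to_word(-6235)` agrees with the manual steps.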


Integer data types

• Within the integer data types, three sizes of integers may be used:

– the 16-bit WORD,

– the 32-bit DWORD,

– and the 64-bit QWORD (the 8-bit byte cannot be used with FPU instructions).

• The available range of values for each of those sizes is as follows:

– WORD range ±(2^15 - 1) or ±32767

– DWORD range ±(2^31 - 1) or ±2147483647

– QWORD range ±(2^63 - 1) or ±9223372036854775807

Floating point numbers

• The floating point data types are simply binary numbers represented in a manner similar to the scientific notation used for decimal values.

• For example:

• 211 = 2.11 x 10^2 (2.11E+0002)

• (The latter is the conventional syntax for decimal values in scientific notation when superscripts are not allowed in a text. For instance, most assemblers/compilers would not recognize superscripts.)

• If the above is divided by a multiple of 10 such as 100000, the only thing which would change in the scientific notation would be the exponent:

• 211 ÷ 100000 = 0.00211 = 2.11 x 10^-3 (2.11E-0003)

• In binary, the 211 value could be expressed as:

• 11010011 = 1.1010011 x 2^7

• In this case, if the above is divided by a multiple of 2 (such as 8), again the only thing which would change in the "binary scientific notation" would be the exponent:

• 11010011 ÷ 2^3 = 1.1010011 x 2^4
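As a rough check of the "binary scientific notation" above, Python's standard `math.frexp` splits a value into a fraction and a power of two. The wrapper `binary_scientific` is a hypothetical helper written for this note; it only rescales the `frexp` result so that the leading digit is 1.

```python
import math

def binary_scientific(value: float):
    """Return (m, e) such that value == m * 2**e with 1 <= m < 2."""
    m, e = math.frexp(value)   # value == m * 2**e with 0.5 <= m < 1
    return m * 2, e - 1

print(binary_scientific(211))      # significand 1.1010011b, exponent 7
print(binary_scientific(211 / 8))  # same significand, exponent 4
```

Dividing by 8 leaves the significand untouched and lowers the exponent by 3.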

Floating point numbers

• As can be deduced, this allows for the representation of binary fractions, and of very large or very small values.

• The formatting of this "binary scientific notation" was standardized for the original CPUs and is usually called the IEEE (Institute of Electrical and Electronics Engineers) real number format.

• This real number format basically consists of dividing binary numerical data into three fields:

– a sign field,

– an exponent field,

– and a number description (significand) field.

• The exponent field is biased to the middle of the available range such that negative exponents are effectively smaller than positive exponents.

• And, as opposed to the negative integer system of 2's complements, the significand field is always that of the positive number, negative numbers being distinguished strictly by the sign field.

• Within the floating point data types, three sizes of real numbers are available:

– the 32-bit REAL4 (also called short real or single precision),

– the 64-bit REAL8 (also called long real or double precision),

– the 80-bit REAL10 (also called temporary real or extended precision).

FP data types / Real 4, Real 8, Real 10

80x87 data types appear in the following figures:

Real 4 numbers

• For REAL4 numbers, the bias of the 8 exponent bits is 7Fh (the low 7 bits set).

– This means that if the real exponent is 0, the value of the exponent field would be 7Fh.

– When the exponent is negative (i.e. for absolute values lower than 1), the value in the exponent field would be lower than 7Fh, and it would be higher than 7Fh for values of 2 and higher.

• The maximum value of FFh in the exponent field is reserved for a special category of numbers designated as NAN (Not-A-Number).

• This category includes the special value of INFINITY and will be described later in more details.

• The value of 0 in the exponent field is also reserved for a special category of numbers.

• When all bits in the significand field are also 0, the value of the REAL number would be equal to 0.

• If any of the bits in the significand field are set, the value is then called a "denormalized" REAL number.

• This will also be described later in more detail.

• Because a valid number in real format must always start with a 1, that first bit is implied in the REAL4 format and the significand field only contains the fraction bits f1, f2, etc.

• A value of +1.0 would thus be represented in REAL4 format as:

• 0 01111111 00000000000000000000000b (or 3F800000h in hex notation)

Real 4 numbers

• The value of +2.0 (1.0 x 2^1) would be:

• 0 10000000 00000000000000000000000b (or 40000000h in hex notation), i.e. sign bit S, exponent field 7Fh+1, fraction bits

• And the value of -2.0 would be:

• 1 10000000 00000000000000000000000b (or C0000000h in hex notation)

• The result of dividing -211 by 8 would give -1.1010011 x 2^4 in binary scientific format and its REAL4 representation would be:

• 1 10000011 10100110000000000000000b (or C1D30000h in hex notation), i.e. sign bit S, exponent field 7Fh+4, fraction bits

• As with all other numerical data, all REAL numbers are stored in memory with the least significant bytes first.

• The value of +1.0 in REAL4 format would thus appear in consecutive bytes of memory as:

• 00 00 80 3F

• The largest number which can be represented properly within the REAL4 format is when the exponent field contains FEh and the significand is almost equal to 2 (or almost 2^80h = 2^128d, or approx. 3.40 x 10^38).

• The smallest one would be when the exponent field contains 1 and the significand contains all 0s (or 2^-7Eh = 2^-126d, or approx. 1.17 x 10^-38).

• The 24 bits describing the number (23 bits in the significand field + 1 implied bit) are approximately equivalent to 7 decimal digits.
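The REAL4 encodings quoted above can be reproduced with Python's standard `struct` module, whose `f` format is the same 32-bit IEEE layout. This is a sketch for checking the slide values; the helper names (`real4_bits`, `real4_fields`, `real4_from_bits`) are assumptions made for this example.

```python
import struct

def real4_bits(value: float) -> int:
    """Return the 32-bit REAL4 pattern of a value as an integer."""
    return struct.unpack(">I", struct.pack(">f", value))[0]

def real4_from_bits(bits: int) -> float:
    """Return the value encoded by a 32-bit REAL4 pattern."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def real4_fields(bits: int):
    """Split a REAL4 pattern into (sign, exponent field, fraction bits)."""
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

for value in (1.0, 2.0, -2.0, -211 / 8):
    print(f"{value:>8}: {real4_bits(value):08X}h")

# Least significant byte first, as stored in memory.
print(struct.pack("<f", 1.0).hex(" "))

largest = real4_from_bits(0x7F7FFFFF)    # exponent FEh, fraction all 1s
smallest = real4_from_bits(0x00800000)   # exponent 01h, fraction all 0s
print(largest, smallest)
```

The exponent field of C1D30000h comes out as 83h, which is 7Fh + 4, matching the -1.1010011 x 2^4 example.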

Real 8 numbers

• For REAL8 numbers, the bias of the 11 exponent bits is 3FFh (the low 10 bits set).

• The maximum value of 7FFh in the exponent field is reserved for NANs, and the value of 0 in that field has the same purpose as described for the REAL4 format.

• As with the REAL4 format, the first bit of the number is implied and the significand field only contains the fraction bits f1, f2, etc.

• A value of +1.0 would thus be represented in REAL8 format as:

• 0 01111111111 0000000000000000000000000000000000000000000000000000b (or 3FF0000000000000h in hex notation).

• The largest number which can be represented properly within the REAL8 format is when the exponent field contains 7FEh and the significand is almost equal to 2 (or almost 2^400h = 2^1024d, or approx. 1.79 x 10^308).

• The smallest one would be when the exponent field contains 1 and the significand contains all 0s (or 2^-3FEh = 2^-1022d, or approx. 2.22 x 10^-308).

• The 53 bits describing the number (52 bits in the significand field + 1 implied bit) are approximately equivalent to 15 decimal digits.
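A Python `float` is stored in this same 64-bit layout on practically every platform (an assumption of this sketch), so the REAL8 figures above can be checked directly. The helper names are again made up for illustration.

```python
import struct
import sys

def real8_bits(value: float) -> int:
    """Return the 64-bit REAL8 pattern of a value as an integer."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]

def real8_from_bits(bits: int) -> float:
    """Return the value encoded by a 64-bit REAL8 pattern."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

print(f"{real8_bits(1.0):016X}h")

largest = real8_from_bits(0x7FEFFFFFFFFFFFFF)    # exponent 7FEh, fraction all 1s
smallest = real8_from_bits(0x0010000000000000)   # exponent 001h, fraction all 0s
print(largest, smallest)
print(sys.float_info.dig)   # decimal digits that always survive a round trip
```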


Real 10 numbers

• For REAL10 numbers, the bias of the 15 exponent bits is 3FFFh (the low 14 bits set).

• The maximum value of 7FFFh in the exponent field is reserved for NANs, and the value of 0 in that field has the same purpose as described for the REAL4 format.

• As opposed to the REAL4 and REAL8 formats, the first bit of the number is explicitly included in the significand field and followed by the fraction bits f1, f2, etc.

• A value of +1.0 would thus be represented in REAL10 format as:

• 0 011111111111111 10000...........0b (or 3FFF8000000000000000h in hex notation).

• The largest number which can be represented properly within the REAL10 format is when the exponent field contains 7FFEh and the significand is almost equal to 2 (or almost 2^4000h = 2^16384d, or approx. 1.19 x 10^4932).

• The smallest one would be when the exponent field contains 1 and the significand's fraction bits contain all 0s (or 2^-3FFEh = 2^-16382d, or approx. 3.36 x 10^-4932).

• The 64 bits of the significand describing the number are approximately equivalent to 19 decimal digits.
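Python has no native 80-bit type, so the sketch below builds the REAL10 bit pattern by hand from the three fields described above. `encode_real10` is a hypothetical helper written for this note; it only handles finite non-zero values and simply widens a 64-bit double, so it never fills more than 53 significand bits.

```python
import math

def encode_real10(value: float) -> int:
    """Return the 80-bit REAL10 pattern of a finite, non-zero value."""
    if value == 0 or math.isinf(value) or math.isnan(value):
        raise ValueError("only finite non-zero values are handled here")
    sign = 1 if value < 0 else 0
    m, e = math.frexp(abs(value))        # abs(value) == m * 2**e, 0.5 <= m < 1
    significand = int(m * (1 << 64))     # explicit leading 1 lands in bit 63
    exponent = (e - 1) + 0x3FFF          # apply the 3FFFh bias
    return (sign << 79) | (exponent << 64) | significand

for value in (1.0, 2.0, -211 / 8):
    print(f"{value:>8}: {encode_real10(value):020X}h")
```

Note how +1.0 comes out as 3FFF8000000000000000h: the 8 is the explicit leading 1 of the significand, which REAL4 and REAL8 leave implied.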


NAN and Infinity

• NANs (Not-A-Number)

• Whenever all the bits are set to 1 in the exponent field of a real number format, the value is designated as a NAN.

• Two values in that category are generated by the FPU:

– INFINITY

– INDEFINITE

• INFINITY

• In addition to the exponent field bits being all set to 1, the value of INFINITY has the following special coding to differentiate it from other NANs:

• All fraction bits of the significand field are 0 (the explicit 1 in bit 63 remains set for the REAL10 format). In addition,

– when the sign bit is 0, that NAN is treated as +INFINITY

– when the sign bit is 1, that NAN is treated as -INFINITY

• Such values of INFINITY are generated by the FPU when

– attempting to divide a valid number by 0 (Zero divide exception detected)

– the result of a computation exceeds the maximum value allowable (Overflow exception detected)

– instructed to store a value larger than the upper limit of the destination format (Overflow exception detected).

• This INFINITY value can be used as an operand in FPU instructions. Depending on the instruction, the result can vary and exceptions may or may not be detected.


Indefinite

• INDEFINITE

• In addition to the exponent field bits being all set to 1, the value of INDEFINITE has the following special coding to differentiate it from other NANs:

• The 1st fraction bit of the significand field (f1) is set to 1, all other fraction bits being 0 (the explicit 1 in bit 63 remains set for the REAL10 format), and the sign bit is set.

• Such a value of INDEFINITE is generated by the FPU whenever a reasonable result is impossible for the given instruction. An Invalid exception is detected in some cases. Examples are:

– using the value of INDEFINITE as an operand

– using an empty register as an operand

– subtracting two values of INFINITY

– extracting the square root of a negative number.

Other NAN

• Apart from the INFINITY and INDEFINITE values which can be generated by the FPU, there is a very large number of other NANs with all the possible permutations of fraction bits and sign bit being set to 1 when all the bits in the exponent field are set to 1.

– For example, the short REAL4 format could have over 16 million of them (2^24 - 3 to be more exact).

• There are two general categories of other NANs, the QNANs (Quiet NAN) and the SNANs (Signaling NAN).

– The difference between the two is that the first fraction bit is 1 for the QNAN (such as for the special INDEFINITE NAN) and 0 for the SNAN (but with at least one other fraction bit set to 1).

• Although NANs could be used as valid operands with some of the FPU instructions, they are of no practical use for the average programmer.
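The coding rules for INFINITY, INDEFINITE, QNANs and SNANs can be summarized as a small classifier for REAL4 bit patterns. This is a sketch following the terminology of these slides (where INFINITY is counted among the NANs); `classify_real4` is a name invented for the example.

```python
def classify_real4(bits: int) -> str:
    """Classify a 32-bit REAL4 pattern according to the NAN rules above."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent != 0xFF:
        return "not a NAN"
    if fraction == 0:
        return "-INFINITY" if sign else "+INFINITY"
    if sign and fraction == 0x400000:        # only f1 set, sign bit set
        return "INDEFINITE"
    return "QNAN" if fraction & 0x400000 else "SNAN"

for pattern in (0x7F800000, 0xFF800000, 0xFFC00000, 0x7FC00000, 0x7F800001):
    print(f"{pattern:08X}h -> {classify_real4(pattern)}")

# One sign bit and 23 fraction bits give 2**24 patterns with exponent FFh;
# removing the two infinities and INDEFINITE leaves the "other" NANs.
other_nans = 2 ** 24 - 3
print(other_nans)
```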


BCD data type/ Assembler directive DT


• The Packed BCD (Binary Coded Decimal) data type is considered by the FPU as a signed integer and has the following 80-bit special packed decimal format.

• where:

– S = sign bit (0=positive, 1=negative)

– dn = 4-bit decimal values, d0 being the least significant

– bits 72-78 are not used and are ignored

• For example, the decimal value 211 in this data type format would be:

• 00000000000000000211h in hex notation

• The decimal value of -65536 (-2^16) in this data type format would be:

• 80000000000000065536h in hex notation

• As with all other numerical data, the packed BCD format is stored in memory with the least significant bytes first.

• The consecutive memory bytes (in hex notation) of the above number would thus be:

• 36 55 06 00 00 00 00 00 00 80

• As depicted, 18 decimal digits is the maximum which can be inserted in this format.

• The largest integer which could be represented in this format would thus be 18 consecutive 9s (or 10^18 - 1).
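The packed BCD layout is easy to reproduce by hand: 18 decimal digits, two per byte, plus a sign byte, stored least significant byte first. The function below is an illustrative sketch (its name `encode_packed_bcd` is an assumption of this note) that returns the 10 bytes as they would sit in memory.

```python
def encode_packed_bcd(value: int) -> bytes:
    """Return the 10 memory bytes of a packed BCD integer, low byte first."""
    if abs(value) >= 10 ** 18:
        raise ValueError("packed BCD holds at most 18 decimal digits")
    sign_byte = 0x80 if value < 0 else 0x00
    digits = f"{abs(value):018d}"            # each decimal digit is one hex nibble
    big_endian = bytes([sign_byte]) + bytes.fromhex(digits)
    return big_endian[::-1]                  # least significant byte first

print(encode_packed_bcd(211).hex(" "))
print(encode_packed_bcd(-65536).hex(" "))
```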

Internal structure of 80X87

[Figure: block diagram of the 80x87, divided into a Control Unit (CU) and a Numeric Execution Unit (NEU). Labelled blocks: data buffer, instruction decoder, operand queue, bus tracking, control register, status register, exceptions, exponent module, arithmetic module, shifter, temporary registers, tag register, and an 80 bit wide stack with entries (0) to (7). External connections: data, status, address.]

80X87 Registers

• The 80x87 FPUs add 13 registers to the 80386 and later processors:

– eight floating point data registers,

– a control register,

– a status register,

– a tag register,

– an instruction pointer, and

– a data pointer.

• The data registers are similar to the 80x86's general purpose register set insofar as all floating point calculations take place in these registers.

• The control register contains bits that let you decide how the 80x87 handles certain degenerate cases like rounding of inaccurate computations, control precision, and so on.

• The status register is similar to the 80x86's flags register; it contains the condition code bits and several other floating point flags that describe the state of the 80x87 chip.

• The tag register contains several groups of bits that determine the state of the value in each of the eight general purpose registers.

• The instruction and data pointer registers contain certain state information about the last floating point instruction executed.

80X87 data Registers

• The 80x87 provides eight 80 bit data registers organized as a stack.

• This is a significant departure from the organization of the general purpose registers on the 80x86 CPU that comprise a standard general-purpose register set.

• Intel refers to these registers as ST(0), ST(1), ..., ST(7). Most assemblers will accept ST as an abbreviation for ST(0).

The biggest difference between the FPU register set and the 80x86 register set is the stack organization.

80X87 data Registers

• On the 80x86 CPU, the ax register is always the ax register, no matter what happens.

• On the 80x87, however, the register set is an eight element stack of 80 bit floating point values (see the figure ).

• ST(0) refers to the item on the top of the stack, ST(1) refers to the next item on the stack, and so on.

• Many floating point instructions push and pop items on the stack; therefore, ST(1) will refer to the previous contents of ST(0) after you push something onto the stack.

80X87 Control Register requirement

• When Intel designed the 80x87 (and, essentially, the IEEE floating point standard), there were no standards in floating point hardware.

• Different (mainframe and mini) computer manufacturers all had different and incompatible floating point formats.

• Unfortunately, much application software had been written taking into account the idiosyncrasies of these different floating point formats. Intel wanted to design an FPU that could work with the majority of the software out there (keep in mind, the IBM PC was three to four years away when Intel began designing the 8087, so they couldn't rely on that "mountain" of software available for the PC to make their chip popular).

• Unfortunately, many of the features found in these older floating point formats were mutually exclusive. For example, in some floating point systems rounding would occur when there was insufficient precision; in others, truncation would occur. Some applications would work with one floating point system but not with the other.

• Intel wanted as many applications as possible to work with as few changes as possible on their 80x87 FPUs, so they added a special register, the FPU control register, that lets the user choose one of several possible operating modes for the 80x87

80X87 Control Register

• Bit 12 of the control register is only present on the 8087 and 80287 chips.

• It controls how the 80x87 responds to infinity.

• The 80387 and later chips always use a form of infinity known as affine closure because this is the only form supported by the IEEE 754/854 standards.

• As such, we will ignore any further use of this bit and assume that it is always programmed with a one.

• Bits 10 and 11 provide rounding control according to the following values:

– Bits 10 & 11 Function

– 00 To nearest or even

– 01 Round down

– 10 Round up

– 11 Truncate

• The "00" setting is the default.

• The 80x87 rounds values above one-half of the least significant bit up. It rounds values below one-half of the least significant bit down.

• If the value below the least significant bit is exactly one-half the least significant bit, the 80x87 rounds the value towards the value whose least significant bit is zero. For long strings of computations, this provides a reasonable, automatic way to maintain maximum precision.

80X87 Control Register

• The round up and round down options are present for those computations where it is important to keep track of the accuracy during a computation.

• By setting the rounding control to round down and performing the operation, then repeating the operation with the rounding control set to round up, you can determine the minimum and maximum ranges between which the true result will fall.

• The truncate option forces all computations to truncate any excess bits during the computation.

• You will rarely use this option if accuracy is important to you.

• However, if you are porting older software to the 80x87, you might use this option to help when porting the software.
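The four rounding modes can be illustrated on plain integers by dropping low-order bits, which is what the FPU does to a significand that has too many bits. This is a simplified model, not the FPU's actual circuitry; `round_shift` and the mode names are assumptions of this sketch, with the RC codes taken from the table above.

```python
RC_MODES = {0b00: "nearest", 0b01: "down", 0b10: "up", 0b11: "truncate"}

def round_shift(n: int, k: int, mode: str) -> int:
    """Drop the low k (>= 1) bits of the signed integer n using a rounding mode."""
    floor = n >> k                 # Python's >> rounds toward -infinity
    rem = n - (floor << k)         # discarded part, 0 <= rem < 2**k
    if rem == 0 or mode == "down":
        return floor
    if mode == "up":
        return floor + 1
    if mode == "truncate":         # toward zero
        return floor + 1 if n < 0 else floor
    if mode == "nearest":
        half = 1 << (k - 1)
        if rem > half or (rem == half and floor & 1):
            return floor + 1       # above half, or a tie resolved to even
        return floor
    raise ValueError(f"unknown rounding mode: {mode}")

for code, mode in RC_MODES.items():
    print(f"{code:02b} {mode:>8}: {round_shift(0b1011, 1, mode)} {round_shift(-0b1011, 1, mode)}")
```

Running the same operation with "down" and "up" brackets the exact result (5.5 here lies between 5 and 6), which is the interval technique described above.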

80X87 Control Register (contd 1)

• Bits eight and nine of the control register control the precision during computation.

• This capability is provided mainly to allow compatibility with older software as required by the IEEE 754 standard.

• The precision control bits use the following values:

Mantissa Precision Control Bits

– Bits 8 & 9 Precision Control

– 00 24 bits

– 01 Reserved

– 10 53 bits

– 11 64 bits

• For modern applications, the precision control bits should always be set to "11" to obtain 64 bits of precision.

• This will produce the most accurate results during numerical computation.

80X87 Control Register (contd 2)

• Bits zero through five are the exception masks.

• These are similar to the interrupt enable bit in the 80x86's flags register.

• If these bits contain a one, the corresponding condition is ignored by the 80x87 FPU.

• However, if any bit contains zero, and the corresponding condition occurs, then the FPU immediately generates an interrupt so the program can handle the degenerate condition.

• Bit zero corresponds to an invalid operation error.

• This generally occurs as the result of a programming error.

• Problems which raise the invalid operation exception include pushing more than eight items onto the stack, attempting to pop an item off an empty stack, taking the square root of a negative number, or loading a non-empty register.

80X87 Control Register (contd 3)

• Bit one masks the denormalized interrupt which occurs whenever you try to manipulate denormalized values.

• Denormalized values generally occur when you load arbitrary extended precision values into the FPU or work with very small numbers just beyond the range of the FPU's capabilities.

• Normally, you would probably not enable this exception.

• Bit two masks the zero divide exception.

• If this bit contains zero, the FPU will generate an interrupt if you attempt to divide a nonzero value by zero.

• If you do not enable the zero division exception, the FPU will produce an INFINITY (one of the special values in the NaN category described earlier) whenever you divide a nonzero value by zero.

80X87 Control Register (contd 4)

• Bit three masks the overflow exception.

• The FPU will raise the overflow exception

– if a calculation overflows or

– if you attempt to store a value which is too large to fit into a destination operand (e.g., storing a large extended precision value into a single precision variable).

• Bit four, if set, masks the underflow exception.

• Underflow occurs when the result is too small to fit in the destination operand.

• Like overflow, this exception can occur

– whenever you store a small extended precision value into a smaller variable (single or double precision) or

– when the result of a computation is too small for extended precision.

80X87 Control Register (contd 5)

• Bit five controls whether the precision exception can occur.

• A precision exception occurs whenever the FPU produces an imprecise result, generally the result of an internal rounding operation.

• Although many operations will produce an exact result, many more will not.

• For example, dividing one by ten will produce an inexact result. Therefore, this bit is usually one since inexact results are very common.
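The claim that one divided by ten is inexact can be checked with exact rational arithmetic. The sketch below uses Python's 64-bit doubles rather than the FPU's 80-bit format (an assumption of this example), but the conclusion is the same because 1/10 has no finite binary expansion at any precision. `is_exact` is a hypothetical helper.

```python
from fractions import Fraction

def is_exact(numerator: int, denominator: int) -> bool:
    """True if the floating point quotient equals the exact rational quotient."""
    return Fraction(numerator / denominator) == Fraction(numerator, denominator)

print(is_exact(1, 10))   # denominator is not a power of two
print(is_exact(1, 8))    # denominator is a power of two
print(Fraction(1 / 10) - Fraction(1, 10))   # the rounding error itself
```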

80X87 Control Register (contd 6)

• Bits six and thirteen through fifteen in the control register are currently undefined and reserved for future use.

• Bit seven is the interrupt enable mask, but it is only active on the 8087 FPU; a zero in this bit enables 8087 interrupts and a one disables FPU interrupts.

The 80x87 provides two instructions, FLDCW (load control word) and FSTCW (store control word), that let you load and store the contents of the control register.

• The single operand to these instructions must be a 16 bit memory location.

• The FLDCW instruction loads the control register from the specified memory location, FSTCW stores the control register into the specified memory location.

80X87 Status register

• The FPU status register provides the status of the coprocessor at the instant you read it.

• The FSTSW instruction stores the 16 bit floating point status register into the mod/reg/rm operand.

• The status register is a 16 bit register, laid out as follows:

• Bits zero through five are the exception flags.

• These bits appear in the same order as the exception masks in the control register.

• If the corresponding condition exists, then the bit is set.

• These bits are independent of the exception masks in the control register.

• The 80x87 sets and clears these bits regardless of the corresponding mask setting.

80X87 Status register

• Bit six (active only on 80386 and later processors) indicates a stack fault.

• A stack fault occurs whenever there is a stack overflow or underflow.

• When this bit is set, the C1 condition code bit determines whether there was a stack overflow (C1=1) or stack underflow (C1=0) condition.

Bit seven of the status register is set if any error condition bit is set.

• It is the logical OR of bits zero through five.

• A program can test this bit to quickly determine if an error condition exists.

Bits eight, nine, ten, and fourteen are the coprocessor condition code bits.

• Various instructions set the condition code bits as shown in the table that follows

80X87 Status register (contd)

Bits 11-13 of the FPU status register provide the register number of the top of stack. During computations, the 80x87 adds (modulo eight) the logical register numbers supplied by the programmer to these three bits to determine the physical register number at run time.

Bit 15 of the status register is the busy bit. It is set whenever the FPU is busy. Most programs will have little reason to ac

FPU Condition Code Bits

Instructions: fcom, fcomp, fcompp, ficom, ficomp

C3 C2 C1 C0   Condition
0  0  X  0    ST > source
0  0  X  1    ST < source
1  0  X  0    ST = source
1  1  X  1    ST or source undefined

Instruction: ftst

C3 C2 C1 C0   Condition
0  0  X  0    ST is positive
0  0  X  1    ST is negative
1  0  X  0    ST is zero (+ or -)
1  1  X  1    ST is uncomparable

Instruction: fxam

C3 C2 C1 C0   Condition
0  0  0  0    +Unnormalized
0  0  1  0    -Unnormalized
0  1  0  0    +Normalized
0  1  1  0    -Normalized
1  0  0  0    +0
1  0  1  0    -0
1  1  0  0    +Denormalized
1  1  1  0    -Denormalized
0  0  0  1    +NaN
0  0  1  1    -NaN
0  1  0  1    +Infinity
0  1  1  1    -Infinity
1  X  X  1    Empty register

Instructions: fucom, fucomp, fucompp

C3 C2 C1 C0   Condition
0  0  X  0    ST > source
0  0  X  1    ST < source
1  0  X  0    ST = source
1  1  X  1    Unordered

X = Don't care
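Once the status word has been stored (for example with FSTSW), the comparison rows of the table can be decoded in software. The sketch below works on a plain 16-bit integer and assumes the bit positions given in these slides (C3 = bit 14, C2 to C0 = bits 10 to 8); the function names are invented for the example.

```python
def condition_codes(status_word: int):
    """Return (C3, C2, C1, C0) extracted from a 16-bit FPU status word."""
    c3 = (status_word >> 14) & 1
    c2 = (status_word >> 10) & 1
    c1 = (status_word >> 9) & 1
    c0 = (status_word >> 8) & 1
    return c3, c2, c1, c0

COMPARE_TABLE = {
    (0, 0, 0): "ST > source",
    (0, 0, 1): "ST < source",
    (1, 0, 0): "ST = source",
    (1, 1, 1): "unordered",
}

def compare_result(status_word: int) -> str:
    """Interpret the condition codes left by fcom/fucom; C1 is a don't-care."""
    c3, c2, _c1, c0 = condition_codes(status_word)
    return COMPARE_TABLE.get((c3, c2, c0), "not a comparison result")

for sw in (0x0000, 0x0100, 0x4000, 0x4500):
    print(f"{sw:04X}h -> {compare_result(sw)}")
```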

Condition Code Interpretation

• fcom, fcomp, fcompp, ftst, fucom, fucomp, fucompp, ficom, ficomp

– C0: Result of comparison. See table above.

– C3: Result of comparison. See table above.

– C2: Operand is not comparable.

– C1: Result of comparison (see table above) or stack overflow/underflow (if stack exception bit is set).

• fxam

– C0, C3, C2: See previous table.

– C1: Sign of result, or stack overflow/underflow (if stack exception bit is set).

• fprem, fprem1

– C0: Bit 2 of quotient.

– C3: Bit 1 of quotient.

– C2: 0 = reduction done, 1 = reduction incomplete.

– C1: Bit 0 of quotient, or stack overflow/underflow (if stack exception bit is set).

• fist, fbstp, frndint, fst, fstp, fadd, fmul, fdiv, fdivr, fsub, fsubr, fscale, fsqrt, fpatan, f2xm1, fyl2x, fyl2xp1

– C0, C3, C2: Undefined.

– C1: Round up occurred or stack overflow/underflow (if stack exception bit is set).

• fptan, fsin, fcos, fsincos

– C0, C3: Undefined.

– C2: 0 = reduction done, 1 = reduction incomplete.

– C1: Round up occurred or stack overflow/underflow (if stack exception bit is set).

• fchs, fabs, fxch, fincstp, fdecstp, constant loads, fxtract, fld, fild, fbld, fstp (80 bit)

– C0, C3, C2: Undefined.

– C1: Zero result or stack overflow/underflow (if stack exception bit is set).

• fldenv, frstor

– C0, C3, C2, C1: Restored from memory operand.

• fldcw, fstenv, fstcw, fstsw, fclex

– C0, C3, C2, C1: Undefined.

• finit, fsave

– C0, C3, C2, C1: Cleared to zero.

Programming the FPU

• The 80-bit registers are generally designated in most literature as a stack of eight registers. To better understand how these 80-bit registers function, instead of imagining them as a stack, it will be easier to imagine them as a revolver barrel with 8 compartments numbered clockwise from 0 to 7. When the FPU is initialized, all the compartments are empty and Barrel Compartment #0 (BC0) would be at the 12 o'clock position (at the TOP), as depicted in Fig

• When the FPU would be instructed to LOAD a value, it would turn the barrel clockwise by one notch and load the specified value in the top compartment.

• The first value loaded immediately after the FPU is initialized would thus go into BC7 according to the FPU's internal numbering system.

• If the FPU would be instructed to load another value while the first one is still in BC7, it would again turn the barrel clockwise by one notch and load the specified value again in the top compartment, which would now be BC6.

• Values can be loaded only into the TOP compartment of the FPU

• This could continue until all the compartments contain a value.

• If, however, an attempt is made to load a value when all the compartments have a value in them, the barrel would still turn by one notch but the attempted loading would fail (just like trying to insert a bullet into a compartment which already contains one).

• And, in addition, whatever valid value would have been in that compartment now at the TOP is also destroyed, leaving unusable trash in that register at the TOP.

• Rule #1: An FPU 80-bit register compartment MUST be free (empty) in order to load a value into it.

• Quite fortunately, these registers can be emptied with various FPU instructions. The most common way is generally referred to as "popping a register".

• The "pop" mnemonic used for the CPU is not available for the FPU. Instead, it can be included as a part of numerous FPU instructions; such instruction would be carried out normally and then immediately followed by popping the register at the TOP.

• When the FPU is instructed to POP a value, it would first remove it from whichever compartment would currently be at the TOP and then turn the barrel counter-clockwise by one notch.

• For example, if BC6 would be at the TOP and popped, BC7 would then become the register compartment at the TOP.

• Values can be popped only from the TOP compartment of the FPU

• Those BC numbers are never used directly by the programmer.

• The FPU takes care of remembering where all the 80-bit values are located in its internals and which of its compartments is at the TOP.

• However, the programmer must remain aware of this internal numbering system.

• For the programmer, while still using the revolving barrel image, the 80-bit registers are ALWAYS numbered clockwise from 0 to 7 starting from the TOP.

• The numbers shown in Fig.1 above would therefore never change for referring to register numbers in FPU instructions.

• The register at the TOP would always have the number 0.

• The designation ST with its number in parenthesis (such as ST(0), ST(1), etc.) is used when reference to a given 80-bit register is required in an FPU instruction.

• (MASM also interprets ST without any explicit number as if ST(0) had been specified.)

• Any value loaded to the FPU must initially be referred to as ST(0) because it can only be loaded to the TOP compartment.

• If the FPU would be instructed to load another value while the first one is still there, that second value would now be referred to as ST(0) because it has now become the one at the TOP.

• As a consequence, the first value would now have to be referred to as ST(1). If another value is loaded, the first value would then have to be referred to as ST(2).

• After popping the last loaded value, that same first value would revert back to being referred to as ST(1).

• That is probably the most complex concept to understand by someone starting to learn how to use the FPU. When compared to the CPU where a value in EAX would always be referred to as EAX regardless of operations on the other registers, a value in an FPU register must be referred to according to its position relative to the register at the TOP.

• Rule #2: The programmer must constantly keep track of the relative location of the existing register values while other values may be loaded to or popped from the TOP register.

• A good programming practice is to insert a comment after each FPU instruction which can affect the location of register values, indicating the new ST number of each value.

• When a register is popped from the FPU, its current value can no longer be used in any operation.

• If that value would need to be used later, it should be stored in memory before popping it and reloaded when required. (Some debuggers may still show the old value in the popped register but that should only be considered as residual "gun powder".)
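The revolver barrel picture can be modelled in a few lines. The class below is a toy simulation written for this note, not a description of the real hardware: `FpuStack`, `load`, `pop` and `st` are invented names, and raising a Python exception on a load into a full barrel is a modelling choice standing in for the "unusable trash" case described above.

```python
class FpuStack:
    """Toy model of the eight 80-bit registers as a revolving barrel."""

    def __init__(self):
        self.regs = [None] * 8   # None marks an empty compartment
        self.top = 0             # BC0 is at the TOP after initialization

    def load(self, value):
        self.top = (self.top - 1) % 8            # turn the barrel one notch
        if self.regs[self.top] is not None:
            self.regs[self.top] = float("nan")   # old value is destroyed
            raise OverflowError("compartment at the TOP was not empty")
        self.regs[self.top] = value

    def pop(self):
        value = self.regs[self.top]
        if value is None:
            raise IndexError("compartment at the TOP is empty")
        self.regs[self.top] = None               # the value is no longer usable
        self.top = (self.top + 1) % 8            # turn the barrel back
        return value

    def st(self, i=0):
        """Return the content of ST(i), counted from the TOP."""
        return self.regs[(self.top + i) % 8]

fpu = FpuStack()
fpu.load(1.0)                 # goes into BC7; ST(0) = 1.0
fpu.load(2.0)                 # goes into BC6; ST(0) = 2.0, ST(1) = 1.0
fpu.load(3.0)                 # ST(0) = 3.0, ST(1) = 2.0, ST(2) = 1.0
print(fpu.top, fpu.st(0), fpu.st(1), fpu.st(2))
fpu.pop()                     # ST(0) = 2.0, ST(1) = 1.0
print(fpu.top, fpu.st(0), fpu.st(1))
```

The comments after each call follow the practice recommended above of noting the new ST number of every value.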

Programming the control word

• The Control Word 16-bit register is used by the programmer to select between the various modes of computation available from the FPU, and to define which exceptions should be handled by the FPU or by an exception handler written by the programmer.

• The Control Word is divided into several bit fields as depicted in the following Fig.1.2.

• The IC field (bit 12) or Infinity Control allows for two types of infinity arithmetic:

– 0 = Both -infinity and +infinity are treated as unsigned infinity (initialized state)

– 1 = Respects both -infinity and +infinity

• This field has been retained for compatibility with the 287 and earlier co-processors. In the more modern FPUs, this bit is disregarded and both -infinity and +infinity are respected.

Programming the control word (1)

• The RC field (bits 11 and 10) or Rounding Control determines how the FPU will round results in one of four ways:

– 00 = Round to nearest, or to even if equidistant (this is the initialized state)

– 01 = Round down (toward -infinity)

– 10 = Round up (toward +infinity)

– 11 = Truncate (toward 0)

• The PC field (bits 9 and 8) or Precision Control determines to what precision the FPU rounds results after each arithmetic instruction in one of three ways:

– 00 = 24 bits (REAL4)
– 01 = Not used
– 10 = 53 bits (REAL8)
– 11 = 64 bits (REAL10) (this is the initialized state)

• The IEM field (bit 7) or Interrupt Enable Mask determines whether any of the interrupt masks will be enabled (bit = 0) or all those masks will be disabled (bit = 1). This bit field is set to 1 in the initialized state. (This field is also for compatibility with early co-processors and not used anymore.)
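The four rounding modes of the RC field can be illustrated with a minimal Python sketch. It rounds to whole numbers only, whereas the FPU rounds to the precision selected by the PC field; the function name `round_rc` is hypothetical and chosen for illustration.

```python
import math

def round_rc(value, rc):
    """Round value to an integer the way RC mode rc (0b00..0b11) would."""
    if rc == 0b00:
        return round(value)       # Python's round() rounds halves to even
    if rc == 0b01:
        return math.floor(value)  # toward -infinity
    if rc == 0b10:
        return math.ceil(value)   # toward +infinity
    if rc == 0b11:
        return math.trunc(value)  # toward 0
    raise ValueError("RC is a 2-bit field")

for v in (2.5, -2.5, 3.5):
    print(v, [round_rc(v, rc) for rc in range(4)])
```

The equidistant inputs 2.5 and 3.5 show the "to even" rule: the first rounds down to 2 and the second rounds up to 4 under mode 00.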

Programming the control word (2)

• Bits 5-0 are the interrupt masks. In the initialized state, they are all set to 1, which lets the FPU handle all exceptions. When any one of them is set to 0, it instructs the FPU to generate an interrupt whenever that particular exception is detected, so that the program can take whatever action is deemed necessary before returning control to the FPU.

• The various interrupt masks available are:
– PM (bit 5) or Precision Mask
– UM (bit 4) or Underflow Mask
– OM (bit 3) or Overflow Mask
– ZM (bit 2) or Zero divide Mask
– DM (bit 1) or Denormalized operand Mask
– IM (bit 0) or Invalid operation Mask

• (A more detailed description of the various exceptions and how the FPU would normally handle them is given in the following section. This document will not describe how interrupts are generated and transmitted nor how to respond to such interrupts.)

• Bits 15-13 and 6 are reserved or unused.
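The Control Word layout described above can be unpacked with a few shifts and masks. The following Python sketch is illustrative only: the helper names `decode_control_word` and `set_rounding` are hypothetical, and the sample value 03BFh is simply assembled from the initialized-state field values given in the text with the reserved bits left at 0. It is not a claim about what any particular FPU stores.

```python
RC_MODES = {0: "nearest/even", 1: "down", 2: "up", 3: "truncate"}
PC_MODES = {0: "24 bits (REAL4)", 1: "not used",
            2: "53 bits (REAL8)", 3: "64 bits (REAL10)"}
MASK_NAMES = ["IM", "DM", "ZM", "OM", "UM", "PM"]  # bits 0 to 5

def decode_control_word(cw):
    """Split a 16-bit Control Word value into its bit fields."""
    return {
        "IC": (cw >> 12) & 1,
        "RC": RC_MODES[(cw >> 10) & 3],
        "PC": PC_MODES[(cw >> 8) & 3],
        "IEM": (cw >> 7) & 1,
        "masks": {name: (cw >> bit) & 1
                  for bit, name in enumerate(MASK_NAMES)},
    }

def set_rounding(cw, rc):
    """Return cw with the RC field (bits 11 and 10) replaced by rc."""
    return (cw & ~(3 << 10)) | ((rc & 3) << 10)

sample = 0x03BF  # assumed example: RC=00, PC=11, IEM=1, all masks set
print(decode_control_word(sample))
print(hex(set_rounding(sample, 0b11)))  # switch to truncation
```

In an actual program the Control Word would be read and written through memory with the FPU's own instructions; the sketch only shows the bit arithmetic.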

Status word programming

• The Status Word 16-bit register indicates the general condition of the FPU. Its content may change after each instruction is completed.

• Part of it cannot be changed directly by the programmer. It can, however, be accessed indirectly at any time to inspect its content.

• The Status Word is divided into several bit fields as depicted in the following Fig.

• When the FPU is initialized, all the bits are reset to 0.

• The B field (bit 15) indicates if the FPU is busy (B=1) while executing an instruction, or is idle (B=0).

Status word programming

• The C3 (bit 14) and C2 - C0 (bits 10-8) fields contain the condition codes following the execution of some instructions such as comparisons. These codes will be explained in detail for each instruction affecting those fields.

• The TOP field (bits 13-11) is where the FPU keeps track of which of its 80-bit registers is at the TOP. The BC numbers described previously for the FPU's internal numbering system of the 80-bit registers would be displayed in that field. When the programmer specifies one of the FPU 80-bit registers ST(x) in an instruction, the FPU adds (modulo 8) the ST number supplied to the value in this TOP field to determine in which of its registers the required data is located.
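The modulo 8 addition described above can be sketched in Python. The function names are hypothetical, and the sample Status Word 3000h is an assumed value in which only the TOP field is set (TOP = 6).

```python
def top_field(status_word):
    """Extract the TOP field (bits 13-11) from a Status Word value."""
    return (status_word >> 11) & 0b111

def physical_register(status_word, st_index):
    """Map ST(st_index) to the FPU's internal register number."""
    return (top_field(status_word) + st_index) % 8

sw = 0x3000  # assumed example: TOP = 6
for i in range(3):
    print(f"ST({i}) -> internal register {physical_register(sw, i)}")
```

With TOP = 6, ST(0) is internal register 6, ST(1) is register 7, and ST(2) wraps around to register 0.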

Status word programming (1)

• The IR field (bit 7) or Interrupt Request gets set to 1 by the FPU while an exception is being handled and gets reset to 0 when the exception handling is completed.

• When the interrupt is masked in the Control Word for the FPU to handle the exception, this bit may never be seen to be set while stepping through the instructions with a debugger.

• However, if the programmer handles the interrupt, that bit should remain set until the interrupt handling routine is completed.

• Bits 6-0 are flags raised by the FPU whenever it detects an exception.

• Those exception flags are cumulative in the sense that, once set (bit=1), they are not reset (bit=0) by the result of a subsequent instruction which, by itself, would not have raised that flag.

• Those flags can only be reset by either initializing the FPU (FINIT instruction) or by explicitly clearing those flags (FCLEX instruction).
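The cumulative behaviour of the exception flags can be modelled with a small Python class. This is a simplified sketch, not a description of the hardware: the class and method names are hypothetical, and only bits 6-0 are modelled.

```python
FLAG_BITS = {"I": 0, "D": 1, "Z": 2, "O": 3, "U": 4, "P": 5, "SF": 6}

class StatusFlags:
    """Toy model of the sticky exception flags (bits 6-0)."""

    def __init__(self):
        self.word = 0

    def after_instruction(self, raised):
        # Flags raised by this instruction are OR'ed in;
        # an instruction raising nothing leaves earlier flags set.
        for name in raised:
            self.word |= 1 << FLAG_BITS[name]

    def fclex(self):
        # Model of explicitly clearing the exception flags.
        self.word &= ~0x7F

flags = StatusFlags()
flags.after_instruction({"P"})   # inexact result
flags.after_instruction(set())   # clean instruction, P stays set
flags.after_instruction({"Z"})   # divide by zero
print(f"{flags.word:07b}")       # P and Z both still set
flags.fclex()
print(f"{flags.word:07b}")       # all cleared
```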

Status word programming (2)

• The SF field (bit6) or Stack Fault exception is set whenever an attempt is made to either load a value into a register which is not free (the C1 bit would also get set to 1) or pop a value from a register which is free (and the C1 bit would get reset to 0). (Such stack fault is also treated as an invalid operation and the I field flag bit0 would thus also be set by this exception; see below.)

• The P field (bit5) or Precision exception is set whenever some precision is lost by instructions which do exact arithmetic.

• For example, dividing 1 by 10 does not yield an exact value in binary arithmetic and would set the P exception flag. Another example which sets the P exception flag would be the conversion of a REAL10 to a REAL4 when some of the least significant bits would be lost. If the FPU handles this exception (when the PM bit is set in the Control Word), it rounds the result according to the rounding mode specified in the RC field of the Control Word.
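Both examples of precision loss can be reproduced in Python, with one caveat: Python floats are 64-bit (REAL8) values, not REAL10, so the narrowing shown here is REAL8 to REAL4. The helper `to_real4` is a hypothetical name.

```python
import struct
from fractions import Fraction

def to_real4(x):
    """Round a Python float (REAL8) to REAL4 and widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

tenth = 1 / 10
# Fraction(tenth) is the exact value actually stored in binary.
inexact_in_binary = Fraction(tenth) != Fraction(1, 10)
print(inexact_in_binary)         # True: 1/10 has no exact binary form

print(to_real4(tenth) == tenth)  # False: low-order bits were lost
print(to_real4(0.5) == 0.5)      # True: 0.5 fits exactly in REAL4
```

On the FPU, the first two cases are the kind of operation that would set the P flag, while the last would not.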


• The U field (bit4) or Underflow exception flag gets set whenever a value is too small (without being equal to 0) to be represented properly.

• Each of the floating point formats has a different limit on the smallest number which can be represented. The U flag gets set if the magnitude of the result of an operation falls below that limit. For example, dividing a valid very small number by a large number could produce such a result. A valid REAL10 small number may be much smaller than acceptable for the REAL4 or REAL8 formats; in such cases, conversion from the former to the latter would also set the U flag. If the FPU handles this exception (when the UM bit is set in the Control Word), it would denormalize the value until the exponent is in range or ultimately return a 0.


• The O field (bit3) or Overflow exception flag gets set whenever a value is too large in magnitude to be represented properly.

• Again, each of the floating point formats has a different limit on the largest number which can be represented. The O flag gets set if the result of an operation exceeds that limit.

• For example, multiplying a valid very large number by another large number could exceed the limit. A valid REAL10 large number may be much larger than acceptable for the REAL4 or REAL8 formats; conversion from the former to the latter would also set the O flag. If the FPU handles this exception (when the OM bit is set in the Control Word), it would generate a properly signed INFINITY according to the IC flag of the Control Word.
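The effect of narrowing a value beyond the range of a smaller format can be observed from Python, again using REAL8 to REAL4 because Python has no REAL10 type. In this sketch the hypothetical helper `to_real4` imitates the masked responses described above: Python's `struct` module raises `OverflowError` for an out-of-range value, and the helper substitutes a signed infinity, which is an assumption made to mirror the FPU's behaviour.

```python
import math
import struct

def to_real4(x):
    """Narrow a Python float (REAL8) to REAL4, imitating masked handling."""
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        # Too large for REAL4: return a properly signed infinity.
        return math.copysign(math.inf, x)

print(to_real4(1e300))    # inf   (overflow)
print(to_real4(-1e300))   # -inf  (overflow, sign kept)
print(to_real4(1e-300))   # 0.0   (underflow all the way to zero)
print(to_real4(1e-40))    # small non-zero value held as a REAL4 denormal
```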

• The Z field (bit2) or Zero divide exception flag gets set whenever the division of a finite non-zero value by 0 is attempted.

• If the FPU handles this exception (when the ZM bit is set in the Control Word), it would generate a properly signed INFINITY according to the XOR of the operand signs and then according to the IC flag of the Control Word.
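The sign rule for the masked zero-divide response can be written out as a short Python sketch. It assumes a modern FPU on which both infinities are respected, so the IC field is ignored, and the function name is hypothetical.

```python
import math

def masked_zero_divide(dividend, divisor):
    """Result of dividing a finite non-zero value by zero with ZM set."""
    if divisor != 0 or dividend == 0 or not math.isfinite(dividend):
        raise ValueError("not a zero-divide case")
    # copysign reveals the sign bit even for -0.0.
    dividend_negative = math.copysign(1.0, dividend) < 0
    divisor_negative = math.copysign(1.0, divisor) < 0
    negative = dividend_negative ^ divisor_negative  # XOR of the signs
    return -math.inf if negative else math.inf

print(masked_zero_divide(1.0, 0.0))    # inf
print(masked_zero_divide(-1.0, 0.0))   # -inf
print(masked_zero_divide(1.0, -0.0))   # -inf
print(masked_zero_divide(-1.0, -0.0))  # inf
```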

Status word programming (5)

• The D field (bit1) or Denormalized exception flag gets set whenever an instruction attempts to operate on a denormalized number or the result of the operation is a denormalized number.

• If the FPU handles this exception (when the DM bit is set in the Control Word), it would simply continue with normal processing and then check for other possible exceptions.

• The I field (bit0) or Invalid operation exception flag gets set whenever an operation is considered invalid by the FPU. Examples of such operations are:
– Stack overflow or underflow
– Indeterminate arithmetic such as 0 divided by 0, or subtracting infinity from infinity
– Using a Not-A-Number (NAN) as an operand with some instructions
– Trying to extract the square root of a negative number
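Taken together, the Status Word fields described in this section can be separated with shifts and masks. The Python sketch below is illustrative: `decode_status_word` is a hypothetical helper, and 3824h is an assumed sample value rather than one captured from a real FPU.

```python
FLAG_NAMES = ["I", "D", "Z", "O", "U", "P", "SF"]  # bits 0 to 6

def decode_status_word(sw):
    """Split a 16-bit Status Word value into its bit fields."""
    return {
        "B": (sw >> 15) & 1,
        "C3": (sw >> 14) & 1,
        "TOP": (sw >> 11) & 0b111,
        "C2": (sw >> 10) & 1,
        "C1": (sw >> 9) & 1,
        "C0": (sw >> 8) & 1,
        "IR": (sw >> 7) & 1,
        "flags": [name for bit, name in enumerate(FLAG_NAMES)
                  if (sw >> bit) & 1],
    }

print(decode_status_word(0x3824))  # assumed sample: TOP = 7, Z and P set
```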

TAG Word

• The Tag Word 16-bit register is managed by the FPU to maintain some information on the content of each of its 80-bit registers.

• The Tag Word is divided into 8 fields of 2 bits each as depicted in the following Fig.1.4.

• The above Tag numbers correspond to the FPU's internal numbering system for the 80-bit registers (the BC numbers). The meaning of each pair of bits is as follows:
– 00 = The register contains a valid non-zero value
– 01 = The register contains a value equal to 0
– 10 = The register contains a special value (NAN, infinity, or denormal)
– 11 = The register is empty

TAG Word (1)

• When the FPU is initialized, all the 80-bit registers are empty and the Tag Word would thus have an overall value of 1111111111111111b (FFFFh).

• If a valid non-zero value is then loaded, the Tag Word would change to 0011111111111111b (3FFFh). (Remember that the very first value loaded goes into BC7.)

• If a second value equal to 0 was then loaded, the Tag Word would become 0001111111111111b (1FFFh). (And the second value loaded goes into BC6.)

• Although this Tag Word may contain information which could also be useful to the programmer, it cannot be accessed directly nor by itself. The only way to gain access to it is to store the FPU's environment data in memory (see the FSTENV instruction) and examine it there. However, the information available in the Tag Word could also be obtained otherwise (such as with the FXAM instruction for individual registers).
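Once the Tag Word has been stored in memory, its eight 2-bit fields can be decoded as in the following Python sketch. The helper name is hypothetical. The layout it assumes, with BC7 in the two most significant bits and BC0 in the two least significant, is inferred from the FFFFh, 3FFFh and 1FFFh examples above.

```python
TAG_MEANING = {0b00: "valid", 0b01: "zero", 0b10: "special", 0b11: "empty"}

def decode_tag_word(tw):
    """Return the tag of each register, indexed by BC number 0 to 7."""
    return [TAG_MEANING[(tw >> (2 * bc)) & 0b11] for bc in range(8)]

for tw in (0xFFFF, 0x3FFF, 0x1FFF):
    tags = decode_tag_word(tw)
    print(f"{tw:04X}h: BC7={tags[7]}, BC6={tags[6]}, BC5={tags[5]}")
```

The three sample values reproduce the sequence described above: all registers empty, then a valid value in BC7, then a zero in BC6.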


Internal Flag Register

• The FPU also has an internal exception flag register which is not accessible to the programmer. All these flags are cleared before each instruction and are set as each exception is encountered. Those are the flags that trigger a response from the FPU or an interrupt for the programmer's exception handler. They are also OR'ed with the exception flags of the Status Word to provide a cumulative record for the programmer.

• It is possible that several of the flags could be set with a single instruction. For example, using a denormal number as an operand would set the Denormal flag. The result of the operation with it could then set the Underflow flag and the Precision flag.
