Math coprocessor
• 8087, 80287, 80387 – separate chips.
• i486DX and above – built in.
• Calculates many times faster than an 8086-based processor on:
  – real numbers,
  – packed BCD numbers,
  – long integers.
• Has its own set of registers: eight 80-bit data registers organized as a stack.
Classical stack format
• Instructions treat registers like a stack.
• Only the top items can be accessed.
• The first and sometimes the second register are implied operands.
• ST is the source operand.
• ST(1) is the destination operand.
• The source is popped off the stack.
• The result is at the top of the stack.
Memory format
• Instructions treat registers like a stack.
• Items are pushed from memory or popped to memory.
• The memory operand is always the source operand.
• ST is the destination operand.
• The result replaces ST at the top of the stack; nothing is popped.
Register format
• Instructions treat registers as ordinary registers.
• One operand is always the stack top ST.
• The first operand is the destination.
• The second operand is the source.
• The stack position does not change.
Register pop format
• Instructions treat registers as a modified stack.
• The source must be ST (the stack top).
• The destination is the other register.
• The result is placed into the destination register.
• The source (top) is popped off the stack.
Coprocessor instructions
• Loading and storing data.
• Doing arithmetic calculations.
• Controlling program flow.
Loading and Storing Data
• Copy data between memory and registers.
• Copy data between registers.
• Data in memory can be:
  – integer,
  – BCD number,
  – real number.
• Data transferred to the coprocessor is always converted to a 10-byte real number.
Loading and Storing Data
• Load commands push data onto the stack.
• Store commands pop data off the stack, or copy it without popping to memory or to another register.
• Immediate constants cannot be operands.
• You can load constants such as 0, 1 and pi with special instructions.
• You can save the coprocessor status to memory and later load the status back into the registers.
Loading and storing data
• FLD, FST, FSTP - Load and store real numbers.
• FILD, FIST, FISTP - Load and store binary integers.
• FBLD - Loads BCD.
• FBSTP - Stores BCD.
• FXCH - Exchanges register values.
• In all instructions, a trailing P means the instruction pops ST.
Loading constants
• FLDZ - Pushes 0 into ST.
• FLD1 - Pushes 1 into ST.
• FLDPI - Pushes the value of pi into ST.
• FLDL2E - Pushes the value of log2(e) into ST.
• FLDL2T - Pushes log2(10) into ST.
• FLDLG2 - Pushes log10(2) into ST.
• FLDLN2 - Pushes ln(2) = loge(2) into ST.
Loading and storing status
• FLDCW mem2byte - Loads the control word into the coprocessor.
• F[[N]]STCW mem2byte - Stores the control word in memory.
• FLDENV mem14byte - Loads the environment from memory.
• F[[N]]STENV mem14byte - Stores the environment in memory.
• FRSTOR mem94byte - Restores the state from memory.
• F[[N]]SAVE mem94byte - Saves the state in memory.
• [[N]] marks an optional N: the FN form omits the WAIT that the F form issues first.
Loading and storing example
.DATA
m1 REAL4 1.0
m2 REAL4 2.0

.CODE
; Assumes the stack already holds at least two values
fld m1 ; Push m1 onto the stack (now ST)
fld st(2) ; Push a copy of the third item
fst m2 ; Copy ST to m2 (no pop)
fxch st(2) ; Exchange ST and ST(2)
fstp m1 ; Pop ST into m1
Arithmetic calculations
• FADD - Adds the source and destination.
• FSUB - Subtracts the source from the destination.
• FSUBR - Subtracts the destination from the source.
• FMUL - Multiplies the source and the destination.
• FDIV - Divides the destination by the source.
• FDIVR - Divides the source by the destination.
• FABS - Sets the sign of ST to positive.
• FCHS - Reverses the sign of ST.
Arithmetic calculations
• FRNDINT - Rounds ST to an integer, using the rounding mode in the control word.
• FSQRT - Replaces the contents of ST with its square root.
• FSCALE - Multiplies the stack-top value by 2 to the power contained in ST(1), truncated to an integer.
• FPREM - Calculates the remainder of ST divided by ST(1).
Arithmetic calculations (387)
• FSIN - Calculates the sine of the value in ST (in radians).
• FCOS - Calculates the cosine of the value in ST.
• FSINCOS - Calculates the sine and cosine of the value in ST.
• FPTAN - Calculates the tangent of the value in ST.
• FPATAN - Calculates the arctangent of the ratio Y/X (Y in ST(1), X in ST).
Arithmetic calculations (387)
• FPREM1 - Calculates the partial remainder by performing modulo division on the top two stack registers; unlike FPREM, it rounds the quotient to the nearest integer (IEEE remainder).
• FXTRACT - Breaks a number down into its exponent and mantissa and pushes the mantissa onto the register stack.
• F2XM1 - Calculates 2^x - 1 (x in ST).
• FYL2X - Calculates Y * log2(X) (Y in ST(1), X in ST).
• FYL2XP1 - Calculates Y * log2(X + 1).
Arithmetic calculations (387)
• F[[N]]INIT - Resets the coprocessor and restores all the default conditions in the control and status words.
• F[[N]]CLEX - Clears all exception flags and the busy flag of the status word.
• FINCSTP - Adds 1 to the stack pointer in the status word.
• FDECSTP - Subtracts 1 from the stack pointer in the status word.
• FFREE - Marks the specified register as empty.
Arithmetic calculations - Example
.DATA
a REAL4 3.0
b REAL4 7.0
cc REAL4 2.0
posx REAL4 0.0
negx REAL4 0.0

.CODE
; Solve quadratic equation - no error checking
; The formula is: (-b +/- squareroot(b^2 - 4ac)) / (2a)
fld1 ; Get constants 2 and 4
fadd st,st ; 2 at bottom
fld st ; Copy it
fmul a ; = 2a
fmul st(1),st ; = 4a
fxch ; Exchange st and st(1)
fmul cc ; = 4ac
fld b ; Load b
fmul st,st ; = b^2
fsubr ; = b^2 - 4ac
; Negative value here produces error
fsqrt ; = square root(b^2 - 4ac)
fld b ; Load b
fchs ; Make it negative
fxch ; Exchange
fld st ; Copy square root
fadd st,st(2) ; Plus version = -b + root(b^2 - 4ac)
fxch ; Exchange
fsubp st(2),st ; Minus version = -b - root(b^2 - 4ac)
fdiv st,st(2) ; Divide plus version
fstp posx ; Store it
fdivr ; Divide minus version
fstp negx ; Store it
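Status Word register (SW)
• Bit 0: IE - Invalid operation
• Bit 1: DE - Denormalized operand
• Bit 2: ZE - Zero divide
• Bit 3: OE - Overflow
• Bit 4: UE - Underflow
• Bit 5: PE - Precision
• Bit 6: SF - Stack fault (80387 and later; reserved on earlier coprocessors)
• Bit 7: ES - Exception flag (error summary)
• Bits 8, 9, 10: C0, C1, C2 - Condition codes
• Bits 11-13: TOP - Top of stack
• Bit 14: C3 - Condition code
• Bit 15: B - Busy
• Bits 0-5 are the exception flags.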
Controlling program flow
• The status word can be stored:
  – into memory,
  – into the AX register (80287 and above).
• The coprocessor has instructions for:
  – comparing operands,
  – testing control flags.
• These instructions compare ST to:
  – a specified source operand,
  – ST(1) if no operand is specified.
Controlling program flow
• FCOM - Compares the stack top to the source. The source and destination are unaffected by the comparison.
• FTST - Compares ST to 0.
• FCOMP, FCOMPP - FCOMP compares the stack top to the source and then pops the stack; FCOMPP compares ST to ST(1) and pops both.
• FUCOM, FUCOMP, FUCOMPP - Compare the source to ST and set the condition codes of the status word according to the result, without signaling an invalid operation for quiet NaN operands (80387 and later).
• F[[N]]STSW mem2byte - Stores the status word in memory.
• FXAM - Sets the condition codes C3, C2, C1 and C0 according to the type of the number in ST (for example zero, normal, NaN or empty).
Controlling program flow
• FPREM - Finds a correct remainder for large operands. It uses the C2 flag to indicate whether the remainder returned is partial (C2 is set) or complete (C2 is clear).
• FNOP - Copies the stack top onto itself without having any effect on registers or memory.
• FDISI, FNDISI, FENI, FNENI - Disable or enable interrupts (8087 only).
• FSETPM - Sets protected mode. Requires a .286P or .386P directive (80287, 80387, and 80486 only).
Controlling the flow - Example
.DATA
down REAL4 10.35 ; Sides of a rectangle
across REAL4 13.07
diamtr REAL4 12.93 ; Diameter of a circle
status WORD ?
P287 EQU (@Cpu AND 00111y)
.CODE

; Get area of rectangle
fld across ; Load one side
fmul down ; Multiply by the other

; Get area of circle: Area = PI * (D/2)^2
fld1 ; Load one and
fadd st, st ; double it to get constant 2
fdivr diamtr ; Divide diameter to get radius
fmul st, st ; Square radius
fldpi ; Load pi
fmul ; Multiply it
; Compare area of circle and rectangle
fcompp ; Compare and throw both away
IF p287
fstsw ax ; (For 287+, skip memory)
ELSE
fstsw status ; Store status word from coprocessor to memory
fwait ; Wait until the store is complete
mov ax, status ; Transfer memory to register
ENDIF
sahf ; Transfer AH to flags register
jp nocomp ; If parity set, can't compare
jz same ; If zero set, they're the same
jc rectangle ; If carry set, rect. is bigger
jmp circle ; else circle is bigger

nocomp: ... ; Error handler
...
same: ... ; Both equal
...
rectangle: ... ; Rectangle bigger
...
circle: ... ; Circle bigger
Program flow – new mechanism
• Available beginning with the P6 family processors.
• New instructions:
  – FCOMI, FCOMIP, FUCOMI, FUCOMIP,
  – compare and set the ZF, PF and CF flags in the EFLAGS register directly.
• New conditional transfer instructions:
  – FCMOVcc,
  – conditionally move floating-point values,
  – eliminate branches.
Memory access
• When using the coprocessor, follow these three steps:
  – Load data from memory to coprocessor registers.
  – Process the data.
  – Store the data from coprocessor registers back to memory.
• Processing the data can occur while the main processor is handling other tasks.
• Loading and storing data must be coordinated between the two processors.
Memory access
• A coprocessor instruction follows a processor instruction:
  – the assembler coordinates this conflict automatically for the 8086,
  – the processor coordinates it automatically on 80186 and above processors.
; Processor instruction first - No wait needed
mov WORD PTR mem32[0], ax ; Load memory
mov WORD PTR mem32[2], dx
fild mem32 ; Load to register
Memory access
• A processor instruction follows a coprocessor instruction:
  – synchronization is not automatic,
  – you must include a WAIT or FWAIT instruction.
; Coprocessor instruction first - Wait needed
fist mem32 ; Store to memory
fwait ; Wait until coprocessor is done
mov ax, WORD PTR mem32[0] ; Move to register
mov dx, WORD PTR mem32[2]
Coprocessor example
; counting average of table elements
.DATA
table REAL8 100 DUP (0.0) ; hypothetical table of 100 double-precision values
count DW 100
average REAL4 0.0

.CODE
mov cx,count
mov si,OFFSET table ; address of the first element
fld qword ptr [si] ; load first element to the ST
dec cx
sum: add si,8 ; index of next element
fld qword ptr [si] ; load next element to the ST
fadd ; ST(1) = ST(1) + ST, then pop: running sum in ST
loop sum
fidiv count ; divide sum in ST / count
fstp average ; store the result
MMX
• Introduced in the Pentium MMX.
• SIMD – Single Instruction Multiple Data.
• Handles 64-bit packed integer data.
• Works on eight 64-bit registers, MM0–MM7, which are aliased to the math coprocessor registers.
• Three new packed data types:
  – 64-bit packed byte integers (signed and unsigned).
  – 64-bit packed word integers (signed and unsigned).
  – 64-bit packed doubleword integers (signed and unsigned).
• 47 new instructions.
MMX registers
• The eight 64-bit MMX registers MM0–MM7 occupy bits 63–0 of the 80-bit math coprocessor registers R0–R7: each MMn uses bits 63–0 of Rn.
MMX data types (64 bits, bit 63 to bit 0)
• Packed byte – eight bytes.
• Packed word – four words.
• Packed doubleword – two doublewords.
SIMD instructions
One instruction processes all packed elements at once. For a packed-word instruction such as PADDSW, each of the four words of the source operand is added to the corresponding word of the destination, and the four results replace the destination.
Wraparound (PADDW)
Destination: 0000 FFFF 8000 FFFF
Source:      0001 0001 8000 FFFF
Result:      0001 0000 0000 FFFE
Signed saturation (PADDSW)
Destination: 0000 FFFF 8000 FFFF
Source:      0001 0001 8000 FFFF
Result:      0001 0000 8000 FFFE
Unsigned saturation (PADDUSW)
Destination: 0000 FFFF 8000 FFFF
Source:      0001 0001 8000 FFFF
Result:      0001 FFFF FFFF FFFF
MMX instructions
• Data transfer
• Arithmetic
• Comparison
• Conversion
• Unpacking
• Logical
• Shift
• Empty MMX state instruction (EMMS)
Data transfer
• MOVD – moves 32 bits between an MMX register and memory or a general-purpose register.
• MOVQ – moves 64 bits between an MMX register and memory, or between MMX registers.
Arithmetic instructions
• PADDB, PADDW, PADDD
  – add packed integers with wraparound.
• PSUBB, PSUBW, PSUBD
  – subtract packed integers with wraparound.
• PADDSB, PADDSW
  – add packed signed integers with signed saturation.
• PSUBSB, PSUBSW
  – subtract packed signed integers with signed saturation.
• PADDUSB, PADDUSW
  – add packed unsigned integers with unsigned saturation.
• PSUBUSB, PSUBUSW
  – subtract packed unsigned integers with unsigned saturation.
Arithmetic instructions
• PMULHW
  – multiply packed signed words and store the high 16 bits of each 32-bit result.
• PMULLW
  – multiply packed signed words and store the low 16 bits of each result.
• PMADDWD
  – multiply packed signed words and add adjacent 32-bit products, giving two doublewords.
Comparison instructions
• PCMPEQB, PCMPEQW, PCMPEQD
  – compare packed data for equal.
• PCMPGTB, PCMPGTW, PCMPGTD
  – compare packed signed data for greater than.
Conversion instructions
• PACKSSWB
  – pack words into bytes with signed saturation.
• PACKSSDW
  – pack doublewords into words with signed saturation.
• PACKUSWB
  – pack words into bytes with unsigned saturation.
Unpack instructions
• PUNPCKHBW, PUNPCKHWD, PUNPCKHDQ
  – unpack high-order data elements.
• PUNPCKLBW, PUNPCKLWD, PUNPCKLDQ
  – unpack low-order data elements.
Logical instructions
• PAND
  – bitwise logical AND.
• PANDN
  – bitwise logical AND NOT: (NOT destination) AND source.
• POR
  – bitwise logical OR.
• PXOR
  – bitwise logical exclusive OR.
Shift instructions
• PSLLW, PSLLD, PSLLQ
  – shift packed data left logical.
• PSRLW, PSRLD, PSRLQ
  – shift packed data right logical.
• PSRAW, PSRAD
  – shift packed data right arithmetic.