Page 1: SSE2

with a focus on floating point

Page 2: SSE2

For floating point (i.e., real numbers), MASM supports:

  real4   single precision; IEEE standard; analogous to float
  real8   double precision; IEEE standard; analogous to double
  real10  double extended precision; not IEEE standard

NaN = Not a Number (see p. 4-14 of v1)

Page 3: SSE2

SSE2 supports 32 and 64 bit f.p. data.
x87 supports 32, 64, and 80 bit f.p. data.

Page 4: SSE2
Page 5: SSE2

Note: These are 24-bit binary numbers.

Here they are in base 10:
  2.00000000000000
  1.99999988079071
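The figure these two values refer to is not part of this transcript. The sketch below assumes they are the single-precision values 1.000...0b x 2^1 and 1.111...1b x 2^0 (a 24-bit significand of all ones), and rebuilds them from their bit patterns. The function names are illustrative only.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as an IEEE single-precision value. */
float float_from_bits(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);   /* well-defined way to reinterpret bits in C */
    return f;
}

/* Print both values to 14 decimal places, in the same style as the slide. */
void print_24bit_examples(void) {
    /* 0x40000000: exponent field 128, fraction all zeros -> 1.000...0b x 2^1 */
    printf("%.14f\n", float_from_bits(0x40000000u));
    /* 0x3FFFFFFF: exponent field 127, fraction all ones  -> 1.111...1b x 2^0 */
    printf("%.14f\n", float_from_bits(0x3FFFFFFFu));
}
```

Calling print_24bit_examples() prints the two base 10 values quoted above. The second one is 2 - 2^-23, the largest single-precision number below 2.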

Page 6: SSE2
Page 7: SSE2

SSE2 = Streaming SIMD Extensions 2
SIMD = Single Instruction Multiple Data instructions

SSE2 was introduced in 2000 on Pentium 4 and Intel Xeon processors.

Page 8: SSE2

1996  Intel MMX
1998  AMD 3DNow!
1999  Intel SSE on P3
2000  Intel SSE2 on P4
2004  Intel SSE3 (since Prescott P4)
2006  Intel Supplemental SSE3 (since Woodcrest Xeons)
2006  Intel SSE4 (4.1 and 4.2)
2007  AMD SSE5 (proposed 2007, implemented 2011)
2008  Intel AVX (proposed 2008, implemented 2011 in Intel Sandy Bridge and AMD Bulldozer)
      XMM registers go from 128 bit to 256 bit, called YMM.

Page 9: SSE2

1. You must use MASM v6.15 or newer for SIMD support. (MASM v6.15 is available from the course software web page.)

2. You must enable MASM support for these instructions with the following:

    .686                    ;instructions for Pentium Pro (or better)
    .xmm                    ;allow simd instructions
    .model flat, stdcall    ;no crazy segments!

Page 10: SSE2

Each one of the 8 128-bit registers (xmm0...xmm7) can hold:

  16 packed 1 byte integers
  8 packed word (2 byte) integers
  4 packed doubleword (4 byte) integers
  2 packed quadword (8 byte) integers
  1 double quadword (16 byte)
  4 packed single precision (4 bytes each) floating point values
  2 packed double precision (8 bytes each) floating point values
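One way to picture these layouts is a C union over the 128-bit SSE2 integer type, as in the sketch below. The names xmm_view and make_dwords are made up for this illustration. It assumes an x86 compiler with SSE2 enabled and a little-endian view of the lanes.

```c
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper type: every member overlays the same 16 bytes. */
typedef union {
    __m128i i128;    /* 1 double quadword                 */
    int8_t  b[16];   /* 16 packed bytes                   */
    int16_t w[8];    /* 8 packed words                    */
    int32_t d[4];    /* 4 packed doublewords              */
    int64_t q[2];    /* 2 packed quadwords                */
    float   ps[4];   /* 4 packed single precision values  */
    double  pd[2];   /* 2 packed double precision values  */
} xmm_view;

/* Fill the register with four doublewords; d0 lands in bits 31-0. */
xmm_view make_dwords(int32_t d0, int32_t d1, int32_t d2, int32_t d3) {
    xmm_view v;
    v.i128 = _mm_set_epi32(d3, d2, d1, d0);  /* arguments go from high lane to low lane */
    return v;
}
```

After make_dwords(0, 1, 2, 3), reading v.w or v.q shows the same 16 bytes regrouped as words or quadwords. The hardware keeps no record of which format a register currently holds; the instruction chooses the interpretation.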

Page 11: SSE2
Page 12: SSE2
Page 13: SSE2
Page 14: SSE2
Page 18: SSE2

IA32 Registers:

  8 32-bit GPRs       Integer only
  8 80-bit fp regs    Floating point only
  8 64-bit mmx regs   Integer only; re-uses fp regs
  8 128-bit xmm regs  Integer and fp

The xmm regs will be the focus of our discussion.

Page 19: SSE2
Page 20: SSE2

XMM register formats

Page 21: SSE2

The utilities.asm MASM code (on the course’s software web page) contains a function that you can call to display (dump) the contents of the 8 xmm registers as pairs of 64-bit double precision fp values.

call dumpXmm64

Page 22: SSE2
Page 23: SSE2

1. Data movement

2. Arithmetic

3. Comparison

4. Conversion

Page 25: SSE2

movhpd  Move High Packed Double-Precision Floating-Point Value

movlpd  Move Low Packed Double-Precision Floating-Point Value

movsd   Move Scalar Double-Precision Floating-Point Value

Page 26: SSE2

movhpd - Move High Packed Double-Precision Floating-Point Value

for memory to XMM move:
  DEST[127-64] ← SRC; DEST[63-0] unchanged
  Ex. movhpd xmm0, m64

for XMM to memory move:
  DEST ← SRC[127-64]
  Ex. movhpd m64, xmm2

Page 27: SSE2

movlpd - Move Low Packed Double-Precision Floating-Point Value

for memory to XMM move:
  DEST[127-64] unchanged; DEST[63-0] ← SRC
  Ex. movlpd xmm1, m64

for XMM to memory move:
  DEST ← SRC[63-0]
  Ex. movlpd m64, xmm2
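For readers working from C rather than MASM, the movlpd and movhpd moves correspond to the SSE2 intrinsics _mm_loadl_pd, _mm_loadh_pd, _mm_storel_pd and _mm_storeh_pd. The sketch below is an illustration, not course code: the wrapper function names are hypothetical, and a compiler is free to pick a different but equivalent instruction.

```c
#include <emmintrin.h>   /* SSE2 intrinsics */

/* Build a register from two separate 64-bit memory operands. */
__m128d load_low_then_high(const double *lo, const double *hi) {
    __m128d x = _mm_setzero_pd();
    x = _mm_loadl_pd(x, lo);   /* movlpd xmm, m64: DEST[63-0]   <- *lo, high half unchanged */
    x = _mm_loadh_pd(x, hi);   /* movhpd xmm, m64: DEST[127-64] <- *hi, low half unchanged  */
    return x;
}

/* Write each half of a register to its own 64-bit memory location. */
void store_halves(__m128d x, double *lo, double *hi) {
    _mm_storel_pd(lo, x);      /* movlpd m64, xmm: *lo <- SRC[63-0]   */
    _mm_storeh_pd(hi, x);      /* movhpd m64, xmm: *hi <- SRC[127-64] */
}
```

Loading 1.5 into the low half and -2.25 into the high half and then calling store_halves gives back the same two values, which is a quick check that each move touches only its own half.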

Page 28: SSE2

movsd - Move Scalar Double-Precision Floating-Point Value

1. when source and destination operands are both XMM registers:
   DEST[127-64] remains unchanged; DEST[63-0] ← SRC[63-0]
   Ex. movsd xmm1, xmm3

2. when source operand is XMM register and destination operand is memory location:
   DEST ← SRC[63-0]
   Ex. movsd m64, xmm2

3. when source operand is memory location and destination operand is XMM register:
   DEST[127-64] ← 0000000000000000H; DEST[63-0] ← SRC
   Ex. movsd xmm1, m64
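The three movsd cases can be tried from C with the SSE2 intrinsics _mm_move_sd, _mm_store_sd and _mm_load_sd. This is a sketch with hypothetical wrapper names; it assumes an x86 compiler with SSE2 enabled.

```c
#include <emmintrin.h>   /* SSE2 intrinsics */

/* Case 1, register to register: low half from src, high half kept from dest. */
__m128d movsd_reg_reg(__m128d dest, __m128d src) {
    return _mm_move_sd(dest, src);
}

/* Case 2, register to memory: only SRC[63-0] is written. */
void movsd_to_mem(double *m64, __m128d src) {
    _mm_store_sd(m64, src);
}

/* Case 3, memory to register: low half loaded, high half zeroed. */
__m128d movsd_from_mem(const double *m64) {
    return _mm_load_sd(m64);
}
```

The point to notice is the asymmetry between cases 1 and 3: a register source leaves DEST[127-64] alone, while a memory source clears it.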

Page 29: SSE2

1. Data movement

2. Arithmetic (scalar)

3. Comparison

4. Conversion

Page 30: SSE2

addsd - Add Scalar Double-Precision Floating-Point Values

subsd - Subtract Scalar Double-Precision Floating-Point Values

mulsd - Multiply Scalar Double-Precision Floating-Point Values

divsd - Divide Scalar Double-Precision Floating-Point Values

Also sqrtsd, but there are no sin or cos SSE2 instructions! We have to use the x87 instructions for those.

Page 31: SSE2

addsd
  DEST[63-0] ← DEST[63-0] + SRC[63-0]
  DEST[127-64] remains unchanged

Page 32: SSE2

subsd
  DEST[63-0] ← DEST[63-0] − SRC[63-0]
  DEST[127-64] remains unchanged

Page 33: SSE2

mulsd
  DEST[63-0] ← DEST[63-0] * SRC[63-0]
  DEST[127-64] remains unchanged

Page 34: SSE2

divsd
  DEST[63-0] ← DEST[63-0] / SRC[63-0]
  DEST[127-64] remains unchanged
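The scalar operations above map onto the SSE2 intrinsics _mm_add_sd, _mm_sub_sd, _mm_mul_sd, _mm_div_sd and _mm_sqrt_sd. The following sketch uses hypothetical function names (add_low, hypot_sd, sub_then_div) and assumes an x86 compiler with SSE2 enabled.

```c
#include <emmintrin.h>   /* SSE2 intrinsics */

/* addsd: low lane = dest.low + src.low; the high lane is carried over from dest. */
__m128d add_low(__m128d dest, __m128d src) {
    return _mm_add_sd(dest, src);
}

/* sqrt(a*a + b*b) computed entirely in the low lane. */
double hypot_sd(double a, double b) {
    __m128d x = _mm_set_sd(a);                        /* low = a, high = 0 */
    __m128d y = _mm_set_sd(b);                        /* low = b, high = 0 */
    __m128d sum = _mm_add_sd(_mm_mul_sd(x, x),        /* mulsd             */
                             _mm_mul_sd(y, y));       /* mulsd, then addsd */
    return _mm_cvtsd_f64(_mm_sqrt_sd(sum, sum));      /* sqrtsd on the low lane */
}

/* (a - b) / c using subsd followed by divsd. */
double sub_then_div(double a, double b, double c) {
    __m128d diff = _mm_sub_sd(_mm_set_sd(a), _mm_set_sd(b));
    return _mm_cvtsd_f64(_mm_div_sd(diff, _mm_set_sd(c)));
}
```

For example, hypot_sd(3.0, 4.0) evaluates to 5.0, and add_low leaves whatever was in DEST[127-64] untouched.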

Page 35: SSE2

1. Data movement

2. Arithmetic (packed)

3. Comparison

4. Conversion

Page 36: SSE2

addpd - Add Packed Double-Precision Floating-Point Values

subpd - Subtract Packed Double-Precision Floating-Point Values

mulpd - Multiply Packed Double-Precision Floating-Point Values

divpd - Divide Packed Double-Precision Floating-Point Values

Page 37: SSE2

addpd - Add Packed Double-Precision Floating-Point Values
  DEST[63-0] ← DEST[63-0] + SRC[63-0]
  DEST[127-64] ← DEST[127-64] + SRC[127-64]

Page 38: SSE2

subpd - Subtract Packed Double-Precision Floating-Point Values
  DEST[63-0] ← DEST[63-0] − SRC[63-0]
  DEST[127-64] ← DEST[127-64] − SRC[127-64]

Page 39: SSE2

mulpd - Multiply Packed Double-Precision Floating-Point Values
  DEST[63-0] ← DEST[63-0] * SRC[63-0]
  DEST[127-64] ← DEST[127-64] * SRC[127-64]

Page 40: SSE2

divpd - Divide Packed Double-Precision Floating-Point Values
  DEST[63-0] ← DEST[63-0] / SRC[63-0]
  DEST[127-64] ← DEST[127-64] / SRC[127-64]
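The packed forms apply the same operation to both 64-bit lanes at once. A minimal C sketch using the SSE2 intrinsics _mm_add_pd, _mm_sub_pd, _mm_mul_pd and _mm_div_pd follows; the function name packed_ops and its parameter layout are assumptions made for this example.

```c
#include <emmintrin.h>   /* SSE2 intrinsics */

/* Apply all four packed operations to the two-element arrays a and b. */
void packed_ops(const double a[2], const double b[2],
                double sum[2], double diff[2],
                double prod[2], double quot[2]) {
    __m128d x = _mm_loadu_pd(a);            /* x = { a[0] (low), a[1] (high) } */
    __m128d y = _mm_loadu_pd(b);            /* y = { b[0] (low), b[1] (high) } */
    _mm_storeu_pd(sum,  _mm_add_pd(x, y));  /* addpd: both lanes added      */
    _mm_storeu_pd(diff, _mm_sub_pd(x, y));  /* subpd: both lanes subtracted */
    _mm_storeu_pd(prod, _mm_mul_pd(x, y));  /* mulpd: both lanes multiplied */
    _mm_storeu_pd(quot, _mm_div_pd(x, y));  /* divpd: both lanes divided    */
}
```

With a = {6, 8} and b = {3, 2}, the results are sum = {9, 10}, diff = {3, 6}, prod = {18, 16} and quot = {2, 4}. The unaligned loads and stores are used so the arrays need no special alignment.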

Page 41: SSE2

1. Data movement

2. Arithmetic

3. Comparison

4. Conversion

Page 42: SSE2

comisd  Compare Scalar Ordered Double-Precision Floating-Point Values and Set EFLAGS
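comisd compares the low doubles of its two operands and reports the outcome in ZF, PF and CF: greater than gives 000, less than gives 001, equal gives 100, and unordered (either operand is a NaN) gives 111. C code cannot read EFLAGS directly, so the sketch below uses the SSE2 comparison intrinsics _mm_comilt_sd and _mm_comigt_sd, which return the decoded result as an int. The function name compare_sd and its return codes are hypothetical, and the explicit NaN check is an addition for this example.

```c
#include <emmintrin.h>   /* SSE2 intrinsics */

/* Returns -1 if a < b, 0 if a == b, 1 if a > b, 2 if unordered (NaN involved). */
int compare_sd(double a, double b) {
    __m128d x = _mm_set_sd(a);              /* low = a, high = 0 */
    __m128d y = _mm_set_sd(b);              /* low = b, high = 0 */
    if (a != a || b != b) return 2;         /* a NaN never equals itself */
    if (_mm_comilt_sd(x, y)) return -1;     /* low(x) <  low(y) */
    if (_mm_comigt_sd(x, y)) return 1;      /* low(x) >  low(y) */
    return 0;                               /* ordered and neither less nor greater */
}
```

In MASM the same decision would typically be a comisd followed by conditional jumps such as jb, ja, je and jp, since the flags are set the way an unsigned integer compare would set them.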

Page 43: SSE2

1. Data movement

2. Arithmetic

3. Comparison

4. Conversion

Page 44: SSE2

cvtsd2si  Convert Scalar Double-Precision Floating-Point Value to Doubleword Integer

cvtsi2sd  Convert Doubleword Integer to Scalar Double-Precision Floating-Point Value

Page 45: SSE2

cvtsd2si  Convert Scalar Double-Precision Floating-Point Value to Doubleword Integer
  DEST[31-0] ← Convert_Double_Precision_Floating_Point_To_Integer(SRC[63-0])

Page 46: SSE2

cvtsi2sd  Convert Doubleword Integer to Scalar Double-Precision Floating-Point Value
  DEST[63-0] ← Convert_Integer_To_Double_Precision_Floating_Point(SRC[31-0])
  DEST[127-64] remains unchanged
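Both conversions are available in C as the SSE2 intrinsics _mm_cvtsd_si32 and _mm_cvtsi32_sd. In the sketch below the wrapper names are hypothetical. It assumes the MXCSR rounding mode is still at its default (round to nearest, ties to even), because cvtsd2si rounds according to that setting rather than truncating.

```c
#include <emmintrin.h>   /* SSE2 intrinsics */

/* cvtsd2si: convert the low double to a 32-bit integer using the current rounding mode. */
int double_to_int(double x) {
    return _mm_cvtsd_si32(_mm_set_sd(x));
}

/* cvtsi2sd: low lane = (double)n; the high lane is carried over from dest. */
__m128d int_to_double(__m128d dest, int n) {
    return _mm_cvtsi32_sd(dest, n);
}
```

Under the default rounding mode, double_to_int(2.5) gives 2 and double_to_int(3.5) gives 4, which differs from a C cast such as (int)2.5 or (int)3.5 because a cast always truncates toward zero.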

Page 47: SSE2