
Ministry of Education of the Republic of Moldova

Technical University of Moldova

Faculty of Computers, Informatics and Microelectronics

Anglophone Department

Course Work

Computer Architecture

Topic: Floating point multiplication (algorithm nr.1)

Performed by: st. gr. FAF-141 (l. eng.) Trofim Mihai

Verified by: dr. conf. univ. Sudacevschi Viorica

Chişinău 2016

Contents

Introduction
1. Central Processing Unit
1.1 CPU basics
1.2 The register set
1.3 Instruction cycle
2. I8086 microprocessor architecture
2.1 Execution Unit
2.2 Bus Interface Unit
2.3 Registers set of I8086
3. Instruction set architecture
3.1 Instruction Format
3.2 Instruction Types
4. Floating Point Arithmetic
4.1 Floating point numbers
4.2 Floating point numbers range
4.3 IEEE: floating point in modern computer
4.4 Floating point multiplication algorithm nr. 1
5. Code explanation
6. Code explanation
8. Conclusion
Appendix


Introduction

A computer consists of a set of physical components (hardware) and system programs (system software) that are responsible for data processing according to an algorithm, specified by the user through an application program (application software).

Computer systems have conventionally been defined through their interfaces at a number of abstraction levels, each providing functional support to its predecessor. Included among the levels are the application programs, the high-level languages, and the set of machine instructions.

In the past, the term computer architecture often referred only to instruction set design that represents an interface between hardware and the lowest level software - machine instructions (binary coded programs).

A different definition of computer architecture is built on four basic viewpoints:

structure (defines the interconnection of various hardware components),

organization (defines the dynamic interplay and management of the various components),

implementation (defines the detailed design of hardware components),

performance (specifies the behavior of the computer system).

1. Central Processing Unit

1.1 CPU basics

A typical CPU has three major components: 1. register set, 2. arithmetic logic unit (ALU), 3. control unit (CU).


The register set differs from one computer architecture to another. It is usually a combination of general-purpose and special purpose registers.

The ALU provides the circuitry needed to perform the arithmetic, logical and shift operations demanded by the instruction set. It also generates information about carry, overflow and other special cases. It consists of combinational logic circuits (adders, decoders, encoders, multiplexers) and a set of registers (e.g. the accumulator) used as fast storage in arithmetic and logic operations.

The control unit is the entity responsible for fetching the instruction to be executed from the main memory and decoding and then executing it.

The main components of the CPU and its interactions with the memory system and the input/output devices:

1.2 The register set

The register set is usually a combination of general-purpose and special purpose registers.

General-purpose registers can be used for multiple purposes and assigned to a variety of functions by the programmer. Special-purpose registers are restricted to only specific functions.

Two main registers are involved in fetching an instruction for execution:

program counter (PC): the register that contains the address of the next instruction to be fetched. After a successful instruction fetch, the PC is updated to point to the next instruction to be executed.

instruction register (IR): the register into which the fetched instruction is loaded.

Two registers are essential in memory write and read operations:

memory data register (MDR)

memory address register (MAR)

The MDR and MAR are used exclusively by the CPU and are not directly accessible to programmers.

In order to perform a write operation into a specified memory location, the MDR and MAR are used as follows:

1. The word to be stored into the memory location is first loaded by the CPU into the MDR.

2. The address of the location into which the word is to be stored is loaded by the CPU into the MAR.

3. A write signal is issued by the CPU.

Similarly, to perform a memory read operation, the MDR and MAR are used as follows:

1. The address of the location from which the word is to be read is loaded into the MAR.

2. A read signal is issued by the CPU.

3. The required word will be loaded by the memory into the MDR, ready for use by the CPU.

Some architectures contain a special program status word (PSW) register or a Flag register. The PSW contains bits that are set by the CPU to indicate the current status of an executing program. These indicators are typically for arithmetic operations, interrupts, memory protection information, or processor status.
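The memory write and read sequences that use the MDR and MAR can be illustrated with a small simulation. This is only a sketch: the class name `ToyMemoryInterface` and its methods are hypothetical, and a real CPU performs these steps with hardware signals rather than method calls.

```python
class ToyMemoryInterface:
    """Hypothetical model of a CPU talking to memory through MAR and MDR."""

    def __init__(self, size):
        self.cells = [0] * size  # main memory, one word per cell
        self.mar = 0             # memory address register
        self.mdr = 0             # memory data register

    def write(self, address, word):
        self.mdr = word                    # 1. word to store goes into the MDR
        self.mar = address                 # 2. target address goes into the MAR
        self.cells[self.mar] = self.mdr    # 3. write signal: memory latches the MDR

    def read(self, address):
        self.mar = address                 # 1. source address goes into the MAR
        self.mdr = self.cells[self.mar]    # 2-3. read signal: memory fills the MDR
        return self.mdr                    # the word is now ready for the CPU


mem = ToyMemoryInterface(16)
mem.write(3, 0x1234)
print(hex(mem.read(3)))
```

In this model the programmer-visible operations are only `write` and `read`; the two registers are internal state, which mirrors the statement that they are not directly accessible to programmers.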

1.3 Instruction cycle

The basic function performed by a computer is execution of a program, which consists of a set of instructions stored in memory. The CPU reads (fetches) instructions from memory one at a time and executes each instruction. Program execution consists of repeating the process of instruction fetch and execution.

The processing required for a single instruction is called an instruction cycle. It consists of two steps: fetch cycle and execute cycle. An instruction cycle lasts a whole number of clock cycles.

The fetched instruction is loaded into the IR. The processor interprets a binary code of the instruction and executes the required action: reads and writes data from and to memory, and transfers data from and to input/output devices. A typical and simple instruction cycle can be summarized as follows:

1. Instruction address calculation: determine the address of the next instruction to be executed by adding a fixed number to the address of the previous instruction in PC.

2. Instruction fetch: Read the instruction from its memory location and store it into IR.

3. Instruction decoding: analyze instruction to determine type of operation to be performed and operands to be used.

4. Operands address calculation, if needed.

5. Operand fetch: fetch the operand from memory and store it in CPU registers, if needed.

6. Instruction execution.

7. Results store: results are transferred from CPU registers to memory, if needed.

The instruction cycle is repeated as long as there are more instructions to execute.

A check for pending interrupts is usually included in the cycle. Examples of interrupts include I/O device request, arithmetic overflow, division by zero, etc. Interrupts are provided primarily as a way to improve processing efficiency. For example, most external devices are much slower than a processor. With interrupts, the processor can be engaged in executing other instructions while an I/O operation is in progress.

To accommodate interrupts, an interrupt cycle is added to the instruction cycle. In the interrupt cycle, the processor checks to see if any interrupts have occurred. If no interrupts are pending, the processor proceeds to the fetch cycle for the next instruction. If an interrupt is pending, the processor suspends execution of the current program, saves the address of the next instruction and relevant data. Then it sets the PC to the starting address of an interrupt handler routine.
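The fetch, execute and interrupt cycles described above can be sketched as a loop. The tiny instruction set used here (`LOAD`, `ADD`, `IRET`, `HALT`), the function name `run` and the `interrupt_at_step` parameter are assumptions made for illustration only; they are not 8086 instructions or part of any real simulator.

```python
def run(program, handler_addr, interrupt_at_step=None):
    """Run a toy program and return (accumulator, list of fetched addresses).

    Hypothetical model: one interrupt may arrive after the instruction
    numbered `interrupt_at_step` (counting from 1) has been executed.
    """
    pc, acc, saved_pc = 0, 0, None
    trace, step = [], 0
    while True:
        ir = program[pc]            # fetch cycle: instruction goes into the IR
        trace.append(pc)
        pc += 1                     # PC now points to the next instruction
        op, *args = ir              # decode
        if op == "LOAD":            # execute cycle
            acc = args[0]
        elif op == "ADD":
            acc += args[0]
        elif op == "IRET":
            pc = saved_pc           # resume the interrupted program
        elif op == "HALT":
            return acc, trace
        step += 1
        if step == interrupt_at_step:   # interrupt cycle: is one pending?
            saved_pc = pc               # save the address of the next instruction
            pc = handler_addr           # jump to the interrupt handler routine


program = [
    ("LOAD", 1), ("ADD", 2), ("ADD", 3), ("HALT",),   # main program, addresses 0-3
    ("ADD", 100), ("IRET",),                          # handler, addresses 4-5
]
print(run(program, handler_addr=4))
print(run(program, handler_addr=4, interrupt_at_step=1))
```

The second call shows the handler at addresses 4 and 5 being executed between the first and second instruction of the main program, after which execution continues from the saved address.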

The actions of the CPU during an instruction cycle are defined by micro-orders issued by the control unit. These micro-orders are individual control signals sent over dedicated control lines.

2. I8086 microprocessor architecture

The I8086 microprocessor architecture consists of two sections:

execution unit (EU)

bus interface unit (BIU)

These two sections work simultaneously: the BIU accesses memory and peripherals while the EU executes the instructions previously fetched. In this way Intel implemented a simple form of pipelining, which allows the CPU to fetch and execute at the same time.

Pipelining only works if the BIU keeps ahead of the EU, so the BIU has a 6-byte instruction queue as a buffer. If the execution of an instruction takes too long, the queue fills to its maximum capacity and the buses stay idle. The BIU starts to fetch again whenever there is room for 2 bytes in the queue.

When there is a jump instruction, the microprocessor must flush the queue. When a jump instruction is executed, the BIU starts to fetch from the new location in memory, and the EU must wait until the BIU has fetched the new instruction. This is known as the branch penalty.

2.1 Execution Unit

The Execution Unit executes all instructions, provides data and addresses to the Bus Interface Unit and manipulates the general registers and the Processor Status Word (Flags register).

The 16-bit ALU performs arithmetic and logic operations, maintains the status and control flags, and manipulates the general registers and instruction operands.

The Execution Unit does not connect directly to the system bus. It obtains instructions from a queue maintained by the Bus Interface Unit. When an instruction requires access to memory or a peripheral device, the Execution Unit requests the Bus Interface Unit to read and write data.

2.2 Bus Interface Unit

The Bus Interface Unit facilitates communication between the EU and memory or I/O circuits. It is responsible for transmitting address, data, and control signals on the buses. This unit consists of the segment registers, the Instruction Pointer, internal communication registers, a logic circuit that generates a 20-bit address, bus control logic that multiplexes the data and address lines, and the instruction code queue (6 bytes of RAM).

2.3 Registers set of I8086

1. General Purpose Registers

The CPU has eight 16-bit general registers. The general registers are subdivided into two sets of four registers. These sets are the data registers (also called the H & L group for high and low) and the pointer and index registers (also called the P & I group).

The data registers can be addressed by their upper or lower halves. Each data register can be used interchangeably as a 16-bit register or two 8-bit registers. The pointer and index registers are always accessed as 16-bit values. The microprocessor can use data registers without constraint in most arithmetic and logic operations. Arithmetic and logic operations can also use the pointer and index registers. Some instructions use certain registers implicitly, allowing compact encoding.

SP - Stack Pointer: always points to the top item of the stack.

BP - Base Pointer: used to access any item in the stack.

SI - Source Index: contains the address of the current element in the source string.

DI - Destination Index: contains the address of the current element in the destination string.

2. Segment registers

The 8086 microprocessor has a 20-bit address bus for 1 Mbyte of external memory, but the registers inside the CPU have 16 bits, which can address only 64 Kbytes. The 8086 family memory space is therefore divided into logical segments of up to 64 Kbytes each. The segment registers contain the base addresses (starting locations) of these memory segments.

CS (code segment) – points at the segment containing the current program.

DS (data segment) – generally points at the segment where variables are defined.

ES (extra segment) – extra segment register, it's up to a coder to define its usage.

SS (stack segment) – points at the segment containing the stack.
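The way a 16-bit segment register and a 16-bit offset combine into a 20-bit address can be sketched as follows. The function name `physical_address` is hypothetical; the rule itself (segment shifted left by 4 bits plus offset, truncated to 20 bits) is the standard 8086 real-mode address calculation.

```python
def physical_address(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address."""
    segment &= 0xFFFF
    offset &= 0xFFFF
    # The segment value is multiplied by 16 (shifted left by 4 bits), the
    # offset is added, and only 20 address lines exist, so the sum wraps
    # around at 1 Mbyte.
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(physical_address(0x1234, 0x0010)))
print(hex(physical_address(0xFFFF, 0x0010)))
```

Because the segment base is always a multiple of 16, different segment:offset pairs can name the same physical byte, for example 0x1234:0x0010 and 0x1235:0x0000.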

3. Special purpose registers

IP - the instruction pointer or program counter: always points to the next instruction to be executed. It contains the offset (displacement) of the next instruction from the start address of the code segment.

Flags Register - determines the current state of the processor. It is also called the PSW (processor status word). Only 9 of its 16 bits are used. The Flags Register is modified automatically by the CPU after mathematical operations; this makes it possible to determine the type of the result and the conditions for transferring control to other parts of the program. Generally you cannot access these registers directly.

All flags can be divided into condition (status) flags and control (system) flags.

Condition flags:

Bit 0 - Carry Flag (CF) - this flag is set to 1 when an addition produces a carry out of, or a subtraction needs a borrow into, the most significant bit (bit 7 for bytes, bit 15 for words). For example, when you add bytes 255 + 1 (the result is not in the range 0...255). When there is no carry or borrow, this flag is set to 0. It also receives the last bit shifted out in shift operations (the MSB for a left shift).

Bit 2 - Parity Flag (PF) - this flag is set to 1 when there is an even number of one bits in the result, and to 0 when there is an odd number of one bits. Even if the result is a word, only the 8 low bits are analyzed!

Bit 4 - Auxiliary Flag (AF) - set to 1 when there is a carry out of, or a borrow into, the low nibble (bits 0 to 3).

Bit 6 - Zero Flag (ZF) - set to 1 when the result is zero. For a nonzero result this flag is set to 0.

Bit 7 - Sign Flag (SF) - set to 1 when the result is negative. When the result is positive it is set to 0. Actually this flag takes the value of the most significant bit of the result.

Bit 11 - Overflow Flag (OF) - set to 1 when there is a signed overflow. For example, when you add bytes 100 + 50 (the result is not in the range -128...127).
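The condition flags can be reproduced for an 8-bit addition with a few bit operations. This is a minimal sketch under the flag definitions given above; the function name `add8_flags` is hypothetical and only the byte form of ADD is modelled.

```python
def add8_flags(a, b):
    """Add two bytes and return (result, flags) as an 8-bit ADD would set them."""
    a &= 0xFF
    b &= 0xFF
    raw = a + b
    result = raw & 0xFF
    flags = {
        "CF": int(raw > 0xFF),                           # carry out of bit 7
        "PF": int(bin(result).count("1") % 2 == 0),      # even number of 1 bits
        "AF": int((a & 0x0F) + (b & 0x0F) > 0x0F),       # carry out of the low nibble
        "ZF": int(result == 0),                          # result is zero
        "SF": result >> 7,                               # copy of the most significant bit
        # Signed overflow: both operands have the same sign and the
        # result has the opposite sign.
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags


print(add8_flags(255, 1))
print(add8_flags(100, 50))
```

The two calls correspond to the examples in the text: 255 + 1 sets CF because the unsigned result does not fit in a byte, and 100 + 50 sets OF because the signed result does not fit in the range -128...127.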

Control flags:

Bit 8 - Trap Flag (TF), system flag - used for on-chip step-by-step debugging. When TF=1, an interrupt (int 1) is generated, which calls a special routine to show the state of the internal registers. There are no instructions that change this flag directly. To change it, the content of the PSW is copied through the stack into a general register, modified there, and written back through the stack.

Bit 9 - Interrupt enable Flag (IF), system flag - when this flag is set to 1, the CPU reacts to interrupts from external devices on the INTR input of the microprocessor. When IF=0, these interrupts are not allowed (masked). IF does not affect NMI (non-maskable) interrupts or internal interrupts generated by the INT instruction. The instructions CLI (clear interrupt) and STI (set interrupt) are used to control this flag.

Bit 10 - Direction Flag (DF) - this flag is used by some instructions to process data chains (strings). When this flag is set to 0, the processing is done forward (SI and DI are incremented); when this flag is set to 1, the processing is done backward (SI and DI are decremented). The instructions CLD and STD control this flag.

3. Instruction set architecture

The instruction set architecture (ISA) includes:

instruction set in a binary code (machine language) that is recognized by a processor;

data types with which instructions can operate;

environment in which instructions operate.

Technically, CPUs come in two main architectures:

CISC (Complex Instruction-Set Computing)

RISC (Reduced Instruction-Set Computing)

CISC chips (Motorola 68k and Intel x86 architectures) sacrifice speed in favour of having a rich set of complex built-in instructions on the chip. RISC chips (PowerPC, ARM, SPARC) contain fewer, simpler instructions but can execute each of them much faster.

A computer program can be represented at different levels of abstraction. A program could be written in a machine-independent, high-level language such as Java or C++.

A computer can execute programs only when they are represented in machine language specific to its architecture.

A machine language program for a given architecture is a collection of machine instructions represented in binary form that are recognised by the Control Unit (CU). According to this binary code, the CU selects a certain state-transition algorithm and generates control signals to the ALU and registers. The algorithm can be microprogrammed or hardwired.

Programs written at any level higher than the machine language must be translated to the binary representation before a computer can execute them.

An assembly language program is a symbolic representation of the machine language program.

Converting the symbolic representation into machine language is performed by a special program called the assembler.

Although high-level languages and compiler technology have witnessed great advances over the years, assembly language remains necessary in some cases.

Programming in assembly can result in machine code that is much smaller and much faster than that generated by a compiler of a high-level language. Small and fast code could be critical in some embedded and portable applications, where resources may be very limited. In such cases, small portions of the program that may be heavily used can be written in assembly language.

Assembly programmers have access to all the hardware features of the target machine that might not be accessible to high-level language programmers.

Learning assembly languages can be of great help in understanding the low level details of computer organization and architecture.

Machine language is the native language of a given processor. Since assembly language is the symbolic form of machine language, each different type of processor has its own unique assembly language. Before we study the assembly language of a given processor, we need first to understand the details of that processor. We need to know the memory size and organization, the processor registers, the instruction format, and the entire instruction set.

3.1 Instruction Format

Assembly language is the symbolic form of machine language. Assembly programs are written with short abbreviations, called mnemonics, that represent the actual machine instructions.

Using mnemonics is more meaningful than using hex or binary values, which makes programming at this low level easier and more manageable. Examples: mov - move, add - addition, sub - subtraction, mul - multiplication.

An assembly program consists of a sequence of assembly statements, where statements are written one per line. Each line of an assembly program is split into the following four fields: label, operation code (opcode), operand, and comments.

Labels are used to provide symbolic names for memory addresses. A label is an identifier that can be used on a program line in order to branch to the labeled line. It can also be used to access data using symbolic names. The operation code (opcode) field contains the symbolic abbreviation of a given operation. The operand field consists of additional information or data that the opcode requires. The operand field may be used to specify a constant, label, immediate data, register, or memory address. The comments field provides a space for documentation that explains what has been done, for the purpose of debugging and maintenance. In the I8086, an instruction consists of one to six bytes.

According to the length of the instructions, there are two types of ISA:

1. With fixed length instructions (commonly used in RISC architectures)

2. With variable length instructions (commonly used in CISC architectures)

The advantage of using variable length instructions is that they reduce the amount of memory space required for a program. In the I8086, instructions are from one byte to a maximum of 6 bytes in length.

The advantage of fixed length instructions is that they make the job of fetching and decoding instructions easier and more efficient, which means that they can be executed in less time than the corresponding variable length instructions.

Instructions can be classified based on the number of operands as: three-address, two-address, one-address, and zero-address.

Three-address instruction formats are not common, because they require a relatively long space to hold all addresses.

In a two-address instruction, one address serves as both an operand and the result.

In a one-address instruction, the second address is implicit. Usually it is the accumulator AX, which is used for one operand and the result.

Zero-address instructions are applicable to stack memory and use as address the content of SP (top of the stack).

The number of addresses per instruction is a basic design decision. Fewer addresses per instruction result in more primitive instructions, which require a less complex CPU. It also results in instructions of shorter length. On the other hand, programs contain more total instructions and have a longer execution time. Another problem: with one-address instructions, the programmer has only one general-purpose register available, the accumulator, whereas with multiple-address instructions it is common to have multiple general-purpose registers. Because register references are faster than memory references, this speeds up execution. Most contemporary machines employ a mixture of two- and three-address instructions.

3.2 Instruction Types

The x86 family of processors defines a number of instruction types.

I. Data transfer instructions

1. General-purpose data transfer

MOV dst, src: (dst) ← (src); copies the second operand to the first operand.

XCHG dst, src: (dst) ↔ (src); exchanges bytes or words.

2. Data transfer with stack

PUSH src: copy specified word to top of stack.

POP dst: copy word from top of stack to specified location.

3. Flag transfer

PUSHF: copy flag register to top of stack.

POPF: copy word at top of stack to flag register.

LAHF: load AH with the low byte of the flag register. No operands.

SAHF: store AH register into low 8 bits of Flags register. No operands.

4. Address transfer

LEA reg, src: load effective address of operand in specified register.

LDS reg, src: load DS register and other specified register from memory.

LES reg, src: load ES register and other specified register from memory.

5. I/O port transfer

IN ac, port: copy a byte or word from specified port to accumulator.

IN ac, DX

OUT port, ac: copy a byte or word from accumulator to specified port.

OUT DX, ac

II. Arithmetic instructions

Arithmetic operations are executed on integer numbers in 4 formats:

unsigned binary (byte or word): 5h = 0000 0101

signed binary (byte or word): -5h = FBh = 1111 1011

packed decimal (the decimal digits are stored in consecutive 4-bit groups): 3251 = 0011 0010 0101 0001

unpacked decimal (each digit is stored in the low 4 bits of a byte): 3251 = ****0011 ****0010 ****0101 ****0001

All arithmetic instructions influence flags that can be checked with conditional transfer instructions.

Arithmetic operations can use all addressing modes but one operand should be a register.

ADD dst, src: dst ← (dst) + (src). src can also be an immediate value of 8 or 16 bits.

ADC dst, src: dst ← (dst) + (src) + CF. It is used in multiple precision operations.

SUB dst, src: dst ← (dst) - (src). Subtract byte from byte or word from word.

SBB dst, src: dst ← (dst) - (src) - CF.

INC opr: opr ← (opr) + 1. Does not change CF.

DEC opr: opr ← (opr) - 1.

NEG opr: opr ← - (opr). Negate: invert each bit of a specified byte or word and add 1 (form 2's complement).

CMP opr1, opr2: compare two specified bytes or two specified words without keeping the result, just setting the flags (OF, SF, ZF, AF, PF, CF according to the result). It is used with conditional jump instructions.

CBW: (no operands, for signed binary) converts byte to word. If the high bit in AL is 0 then all AH bits are 0; if the high bit in AL is 1 then all AH bits are 1.

CWD: convert word to double word. Works with AX and DX (high word).

MUL src: for bytes, (AX) ← (AL) * (src); for words, (DX : AX) ← (AX) * (src). CF and OF = 1 if the high half of the result (AH or DX) is not 0.

IMUL src: multiply signed byte by byte or signed word by word. CF and OF = 1 if the high half of the result is not the sign extension of the low half.

DIV src:

divisor is a byte: (AL) ← quotient (AX) / (src), (AH) ← remainder (AX) / (src)

divisor is a word: (AX) ← quotient (DX : AX) / (src), (DX) ← remainder (DX : AX) / (src)

IDIV src: divide signed word by byte or signed double word by word.
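The register conventions of the byte forms of MUL and DIV can be checked with a short model. This is a sketch only: the function names `mul8` and `div8` are hypothetical, and the Python exceptions stand in for the divide-error interrupt that the processor raises when the divisor is zero or the quotient does not fit in AL.

```python
def mul8(al, src):
    """Unsigned byte multiply: returns (AX, CF/OF value)."""
    ax = (al & 0xFF) * (src & 0xFF)      # 16-bit product goes to AX
    carry = int((ax >> 8) != 0)          # CF = OF = 1 if the high byte (AH) is not 0
    return ax, carry


def div8(ax, src):
    """Unsigned word-by-byte divide: returns (AL, AH) = (quotient, remainder)."""
    ax &= 0xFFFF
    src &= 0xFF
    if src == 0:
        raise ZeroDivisionError("divide error (division by zero)")
    quotient, remainder = divmod(ax, src)
    if quotient > 0xFF:
        raise OverflowError("divide error (quotient does not fit in AL)")
    return quotient, remainder


print(mul8(200, 3))
print(div8(600, 7))
```

The product of two bytes always fits in 16 bits, which is why MUL never overflows, whereas DIV can fail because a 16-bit dividend divided by a small byte may produce a quotient larger than 255.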

4. Floating Point Arithmetic

4.1 Floating point numbers

In computing, floating point is the formulaic representation which approximates a real number so as to support a trade-off between range and precision. A number is, in general, represented approximately to a fixed number of significant digits (the significand) and scaled using an exponent; the base for the scaling is normally two, ten, or sixteen. A number that can be represented exactly is of the following form:

$$\text{significand} \times \text{base}^{\text{exponent}}$$

where the significand is an integer, the base is an integer greater than or equal to two, and the exponent is also an integer.

The term floating point refers to the fact that a number's radix point (decimal point, or, more commonly in computers, binary point) can "float"; that is, it can be placed anywhere relative to the significant digits of the number. This position is indicated as the exponent component, and thus the floating-point representation can be thought of as a kind of scientific notation.

A floating-point system can be used to represent, with a fixed number of digits, numbers of different orders of magnitude: e.g. the distance between galaxies or the diameter of an atomic nucleus can be expressed with the same unit of length. The result of this dynamic range is that the numbers that can be represented are not uniformly spaced; the difference between two consecutive representable numbers grows with the chosen scale.
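Both ideas, the significand and exponent decomposition and the non-uniform spacing of representable numbers, can be observed on binary double precision values. The helper names `decompose` and `spacing` are hypothetical, and `spacing` assumes a positive normal double; `math.frexp` returns a fractional significand in [0.5, 1) rather than an integer one.

```python
import math
import sys


def decompose(x):
    """Return (m, e) such that x == m * 2**e with 0.5 <= |m| < 1."""
    return math.frexp(x)


def spacing(x):
    """Gap between x and the next larger double (x positive and normal)."""
    _, e = math.frexp(x)
    # A double has sys.float_info.mant_dig (53) significand bits, so the
    # last bit of a number with frexp exponent e is worth 2**(e - 53).
    return 2.0 ** (e - sys.float_info.mant_dig)


print(decompose(6.5))
print(spacing(1.0), spacing(1e10), spacing(2.0 ** 52))
```

The spacing grows with the magnitude of the number: near 1.0 neighbouring doubles differ by the machine epsilon, while at 2^52 they are a whole unit apart.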

Over the years, a variety of floating-point representations have been used in computers. However, since the 1990s, the most commonly encountered representation is that defined by the IEEE 754 Standard.


The speed of floating-point operations, commonly measured in terms of FLOPS, is an important characteristic of a computer system, especially for applications that involve intensive mathematical calculations.

4.2 Floating point numbers range

A floating-point number consists of two fixed-point components, whose range depends exclusively on the number of bits or digits in their representation. Whereas the components depend linearly on their range, the floating-point range depends linearly on the significand range and exponentially on the range of the exponent component, which gives the number a far wider range.

On a typical computer system, a 'double precision' (64-bit) binary floating-point number has a coefficient of 53 bits (one of which is implied), an exponent of 11 bits, and one sign bit. Positive floating-point numbers in this format have an approximate range of $10^{-308}$ to $10^{308}$, because the range of the exponent is [−1022, 1023] and 308 is approximately $\log_{10}(2^{1023})$. The complete range of the format is from about $-10^{308}$ through $+10^{308}$.

The number of normalized floating-point numbers in a system F(B, P, L, U) (where B is the base of the system, P is the precision of the system in digits, L is the smallest exponent representable in the system, and U is the largest exponent used in the system) is:

$$2\,(B-1)\,B^{P-1}\,(U-L+1)$$

There is a smallest positive normalized floating-point number, Underflow level = UFL = $B^{L}$, which has a 1 as the leading digit and 0 for the remaining digits of the significand, and the smallest possible value for the exponent.

There is a largest floating-point number, Overflow level = OFL = $(1 - B^{-P})\,B^{U+1}$, which has B − 1 as the value for each digit of the significand and the largest possible value for the exponent.

In addition there are representable values strictly between −UFL and UFL, namely positive and negative zeros, as well as denormalized numbers.
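The quantities of a system F(B, P, L, U) can be evaluated exactly with rational arithmetic. The sketch below assumes the usual closed forms, namely a count of 2(B − 1)B^(P−1)(U − L + 1), UFL = B^L and OFL = (1 − B^(−P))B^(U+1); the function names are hypothetical. It then compares the double precision case (B = 2, P = 53, L = −1022, U = 1023) with the limits reported by Python.

```python
from fractions import Fraction
import sys


def normalized_count(B, P, L, U):
    """Number of normalized numbers: 2 signs, B-1 leading digits,
    B**(P-1) trailing digit patterns, U-L+1 exponents."""
    return 2 * (B - 1) * B ** (P - 1) * (U - L + 1)


def ufl(B, L):
    """Smallest positive normalized number, B**L, as an exact fraction."""
    return Fraction(B) ** L


def ofl(B, P, U):
    """Largest finite number, (1 - B**-P) * B**(U+1), as an exact fraction."""
    return (1 - Fraction(B) ** -P) * Fraction(B) ** (U + 1)


# A toy system small enough to enumerate by hand.
print(normalized_count(2, 3, -1, 1), ufl(2, -1), ofl(2, 3, 1))

# IEEE double precision.
print(ufl(2, -1022) == Fraction(sys.float_info.min))
print(ofl(2, 53, 1023) == Fraction(sys.float_info.max))
```

Using `Fraction` avoids the overflow that would occur if 2^1024 were computed as a float before being scaled down.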

4.3 IEEE: floating point in modern computer

The IEEE has standardized the computer representation for binary floating-point numbers in IEEE 754 (a.k.a. IEC 60559). This standard is followed by almost all modern machines. IBM mainframes support IBM's own hexadecimal floating point format and IEEE 754-2008 decimal floating point in addition to the IEEE 754 binary format. The Cray T90 series had an IEEE version, but the SV1 still uses Cray floating-point format.

The standard provides for many closely related formats, differing in only a few details. Five of these formats are called basic formats and others are termed extended formats; three of these are especially widely used in computer hardware and languages:

Single precision, usually used to represent the "float" type in the C language family (though this is not guaranteed). This is a binary format that occupies 32 bits (4 bytes) and its significand has a precision of 24 bits (about 7 decimal digits).

Double precision, usually used to represent the "double" type in the C language family (though this is not guaranteed). This is a binary format that occupies 64 bits (8 bytes) and its significand has a precision of 53 bits (about 16 decimal digits).

Double extended, also called "extended precision" format. This is a binary format that occupies at least 79 bits (80 if the hidden/implicit bit rule is not used) and its significand has a precision of at least 64 bits (about 19 decimal digits). A format satisfying the minimal requirements (64-bit precision, 15-bit exponent, thus fitting on 80 bits) is provided by the x86 architecture. In general on such processors, this format can be used with "long double" in the C language family (the C99 and C11 standards "IEC 60559 floating-point arithmetic extension- Annex F" recommend the 80-bit extended format to be provided as "long double" when available). On other processors, "long double" may be a synonym for "double" if any form of extended precision is not available, or may stand for a larger format, such as quadruple precision.
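The single precision layout can be inspected by reinterpreting the 32 bits of a value as an integer. The field widths used here (1 sign bit, 8 exponent bits with a bias of 127, 23 stored fraction bits giving 24 bits of precision with the implicit leading 1) come from the IEEE 754 binary32 definition rather than from the text above, and the function name `decode_single` is hypothetical.

```python
import struct


def decode_single(x):
    """Return (sign, biased_exponent, fraction) of x stored as binary32."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # same 4 bytes read as an integer
    sign = bits >> 31                  # 1 bit
    exponent = (bits >> 23) & 0xFF     # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF         # 23 stored bits; the leading 1 is implicit
    return sign, exponent, fraction


print(decode_single(1.0))
print(decode_single(-2.5))
print(decode_single(0.15625))
```

For example, -2.5 is -1.25 × 2^1, so its sign bit is 1, its biased exponent is 1 + 127 = 128, and its fraction field holds the binary digits .01 of 1.01.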

Increasing the precision of the floating point representation generally reduces the amount of accumulated round-off error caused by intermediate calculations.[8]

Less common IEEE formats include:

Quadruple precision (binary128). This is a binary format that occupies 128 bits (16 bytes) and its significand has a precision of 113 bits (about 34 decimal digits).

Double precision (decimal64) and quadruple precision (decimal128) decimal floating-point formats. These formats, along with the single precision (decimal32) format, are intended for performing decimal rounding correctly.

Half, also called binary16, a 16-bit floating-point value.


Any integer with absolute value less than 2^24 can be exactly represented in the single precision format, and any integer with absolute value less than 2^53 can be exactly represented in the double precision format. Furthermore, a wide range of powers of 2 times such a number can be represented. These properties are sometimes used for purely integer data, to get 53-bit integers on platforms that have double precision floats but only 32-bit integers.
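The limits of exact integer representation can be checked with a short sketch. It assumes that Python's float is an IEEE 754 double (true on common platforms) and uses the struct module to round a value to single precision; the helper name to_single is an assumption of this example.

```python
import struct

def to_single(value):
    """Round a Python float (a double) to the nearest single precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

# Double precision: 2**53 is stored exactly, 2**53 + 1 is not.
double_exact = float(2**53) == 2**53
double_lost = float(2**53 + 1) == 2**53 + 1

# Single precision: the same effect appears at 2**24.
single_exact = to_single(float(2**24)) == 2**24
single_lost = to_single(float(2**24 + 1)) == 2**24 + 1

print(double_exact, double_lost, single_exact, single_lost)
```

Just past the limit the spacing between neighbouring representable values becomes 2, so odd integers can no longer be stored.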

The standard specifies some special values, and their representation: positive infinity (+∞), negative infinity (−∞), a negative zero (−0) distinct from ordinary ("positive") zero, and "not a number" values (NaNs).

Comparison of floating-point numbers, as defined by the IEEE standard, is a bit different from usual integer comparison. Negative and positive zero compare equal, and every NaN compares unequal to every value, including itself. All values except NaN are strictly smaller than +∞ and strictly greater than −∞. Finite floating-point numbers are ordered in the same way as their values (in the set of real numbers).
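These comparison rules can be tried directly. The sketch below assumes Python floats follow IEEE 754 comparison semantics (they do on common platforms); the variable names are chosen only for this illustration.

```python
import math

pos_zero, neg_zero = 0.0, -0.0
nan = float("nan")
inf = float("inf")

# The two zeros compare equal, although the sign bit is still stored.
zeros_equal = pos_zero == neg_zero
neg_zero_has_sign = math.copysign(1.0, neg_zero) == -1.0

# A NaN compares unequal to everything, including itself.
nan_equals_itself = nan == nan

# Every finite value lies strictly between the two infinities.
finite_between_infinities = -inf < -1e308 < 1e308 < inf

print(zeros_equal, neg_zero_has_sign, nan_equals_itself, finite_between_infinities)
```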

A project for revising the IEEE 754 standard was started in 2000 (see IEEE 754 revision); it was completed and approved in June 2008. It includes decimal floating-point formats and a 16-bit floating-point format ("binary16"). binary16 has the same structure and rules as the older formats, with 1 sign bit, 5 exponent bits and 10 trailing significand bits. It is used in the NVIDIA Cg graphics language and in the OpenEXR standard.
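The binary16 layout (1 sign bit, 5 exponent bits, 10 trailing significand bits) can be made visible by splitting the 16 stored bits. This is a minimal sketch that assumes the struct module's "e" format code for binary16; the function name half_fields is hypothetical.

```python
import struct

def half_fields(value):
    """Return (sign, biased exponent, trailing significand) of a binary16 value."""
    (bits,) = struct.unpack("<H", struct.pack("<e", value))
    sign = bits >> 15               # 1 bit
    exponent = (bits >> 10) & 0x1F  # 5 bits, stored with a bias of 15
    fraction = bits & 0x3FF         # 10 bits
    return sign, exponent, fraction

print(half_fields(1.0))
print(half_fields(-2.0))
print(half_fields(1.5))
```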

4.4 Floating point multiplication algorithm nr. 1

First we must express our floating point numbers in terms of mantissa and exponent.

x = mx ∙ 2^ex

y = my ∙ 2^ey

The result is stored in z = mz ∙ 2^ez

Here are the steps of the algorithm:

1. Determination of the sign of the result

sign(mz) = sign(mx) ⊕ sign(my)

Since the 8086 processor treats the numbers as two's complement code, the sign of a number is its MSB (Most Significant Bit).
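The sign rule can be sketched on 8-bit mantissas as follows. This is only an illustrative Python model, not the assembly implementation; the function name result_sign is an assumption of this example.

```python
def result_sign(mx, my):
    """Sign bit of the product of two 8-bit two's complement mantissas."""
    # XOR the operands and keep only the MSB (bit 7).
    return ((mx ^ my) >> 7) & 1

print(result_sign(0b10001101, 0b10010010))  # both operands negative
print(result_sign(0x3A, 0xD1))              # one positive, one negative
```

Two negative operands give sign 0 (positive product), while operands of different signs give sign 1.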

2. Find the moduli (absolute values) of the mantissas


If a number is positive, its modulus is the number itself. If it is negative (MSB = 1), the modulus is obtained by converting it to complementary code (CC, two's complement): all bits are inverted and 1 is added to the number.

if( sign(mx) = 0 ) | mx | = mx

if( sign(mx) = 1 ) | mx | = neg( mx )

The same rule is applied to my. neg is an assembly instruction which performs the conversion to CC.
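The effect of the conversion on an 8-bit value can be modelled in a few lines. The sketch below is a Python illustration under the assumption of 8-bit operands; the names neg8 and modulus8 are hypothetical and only mimic what the neg instruction does.

```python
def neg8(value):
    """Two's complement negation of an 8-bit value: invert all bits, then add 1."""
    return ((value ^ 0xFF) + 1) & 0xFF

def modulus8(value):
    """Modulus of an 8-bit two's complement number."""
    # A set MSB means the number is negative and must be negated.
    return neg8(value) if value & 0x80 else value

print(format(modulus8(0b10001101), "08b"))
print(format(modulus8(0b10010010), "08b"))
```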

3. Find the result's exponent

ez = ex + ey, but this may not be the final exponent, because the resulting mantissa might be denormalized; in that case the exponent is corrected during normalization.
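The exponent addition can be sketched for 8-bit two's complement exponents, where the carry out of the top bit is simply dropped. This is an illustrative Python model; the names add_exponents and to_signed are assumptions of this example.

```python
def add_exponents(ex, ey):
    """Add two 8-bit two's complement exponents, discarding the carry out of bit 7."""
    return (ex + ey) & 0xFF

def to_signed(value):
    """Interpret an 8-bit pattern as a signed two's complement number."""
    return value - 256 if value & 0x80 else value

ez = add_exponents(0b01000010, 0b11010001)
print(format(ez, "08b"), to_signed(ez))
```

For the bit patterns 01000010 (66) and 11010001 (-47) the model gives 00010011, that is 19.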

4. Multiplication of moduli (1st algorithm)

o Allocate double-width storage for an adder, which will hold the result's mantissa, and for the multiplicand mx.

o Check the LSB of my:
   if LSB = 0 => shift my to the right, shift mx to the left;
   if LSB = 1 => add mx to the adder, then shift my to the right, shift mx to the left.

o Check my:
   if it is 0 (all the initial bits have been shifted out) => quit the algorithm;
   if it is not 0 => return to the 2nd step.
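The shift-and-add loop described in this step can be sketched as a small Python model. It assumes 8-bit moduli and a 16-bit adder; the function name shift_add_multiply is hypothetical and the code is not the assembly implementation itself.

```python
def shift_add_multiply(multiplicand, multiplier):
    """Multiply two 8-bit moduli with the shift-and-add method (16-bit result)."""
    adder = 0
    while multiplier != 0:
        if multiplier & 1:          # LSB = 1: add the shifted multiplicand
            adder = (adder + multiplicand) & 0xFFFF
        multiplicand <<= 1          # shift the multiplicand to the left
        multiplier >>= 1            # shift the multiplier to the right
    return adder

product = shift_add_multiply(0b01110011, 0b01101110)
print(format(product, "016b"))
```

The loop ends as soon as the multiplier becomes zero, so leading zero bits of the multiplier cost no iterations.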


5. Result normalization

Shift mz to the left as many times as its MSB is repeated, decrementing the exponent on every shift:

while ( mz = 1.1… or mz = 0.0… ) mz = mz shifted left by one bit and ez = ez - 1
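Normalization can be sketched as a loop that compares the two leading bits of a 16-bit mantissa. This Python model is an assumption-laden illustration: the function name normalize is hypothetical, and the early return for a zero mantissa is a safeguard added here, because a zero value would otherwise be shifted forever.

```python
def normalize(mz, ez):
    """Shift a 16-bit mantissa left while its two leading bits are equal."""
    if mz == 0:
        return mz, ez               # safeguard: zero cannot be normalized
    while (mz & 0xC000) in (0x0000, 0xC000):   # leading bits are 00 or 11
        mz = (mz << 1) & 0xFFFF
        ez = (ez - 1) & 0xFF        # each shift decrements the exponent
    return mz, ez

mz, ez = normalize(0b0011000101101010, 0b00010011)
print(format(mz, "016b"), format(ez, "08b"))
```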

6. Assign to result the sign calculated at the beginning of the algorithm

Example:

mx = 1.0001101
my = 1.0010010

ex = 0.1000010
ey = 1.1010001

1. Find the sign of the result:
sign(mz) = sign(mx) ⊕ sign(my) = 1 ⊕ 1 = 0

2. Find the moduli of the mantissas:
|mx| = 0.1110011
|my| = 0.1101110

3. Calculate the exponent of the result:
ez = 0.1000010 + 1.1010001 = 0.0010011


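4. Multiplication algorithm nr.1:

In this trace the operand shifted to the right (MULTIPLIER) is 01110011 and the operand shifted to the left and added (MULTIPLICAND) is 01101110, so the roles of the two moduli are exchanged with respect to the description of the algorithm; the product is the same because multiplication is commutative. Each row shows the state after the step named in the comment.

ADDER              MULTIPLIER  MULTIPLICAND       COMMENTS
00000000 00000000  01110011    00000000 01101110  initial state
00000000 01101110  00111001    00000000 11011100  LSB = 1: add, shift both
00000001 01001010  00011100    00000001 10111000  LSB = 1: add, shift both
00000001 01001010  00001110    00000011 01110000  LSB = 0: shift both
00000001 01001010  00000111    00000110 11100000  LSB = 0: shift both
00001000 00101010  00000011    00001101 11000000  LSB = 1: add, shift both
00010101 11101010  00000001    00011011 10000000  LSB = 1: add, shift both
00110001 01101010  00000000    00110111 00000000  LSB = 1: add, shift both; multiplier is 0, stop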

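5. Normalization

mz = 0.0110001 01101010

The two leading bits are equal (0.0…), so the mantissa is shifted left once and the exponent is decremented:

mz = 0.110001011010100
ez = ez - 1 = 0.0010011 - 1 = 0.0010010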

6. The sign calculated in step 1 is 0 (positive), so the result remains unchanged.

5. Code explanation

The assembly program performs the floating point multiplication (algorithm nr.1). It uses the following macros:

FirstAlgMult macro x,y
It performs the fixed-point multiplication (algorithm nr.1) of the passed parameters x and y, and stores the result in the mz variable.

Normalize macro z,exp
It checks whether the number is denormalized. It uses a mask (C000h) to isolate the first 2 bits of the number and checks whether they are 00 or 11. If so, it shifts z to the left and decrements exp, repeating until the two bits differ.

print macro message
Prints a message to the screen.

print8Bits macro x
Prints an 8-bit number in binary.

print16Bits macro x
Prints a 16-bit number in binary.
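The overall flow of the program (sign, moduli, exponent sum, multiplication, normalization, sign restoration) can be mirrored in a short Python model and checked against the worked example from section 4.4. The sketch assumes 8-bit mantissas and exponents and a 16-bit result; the function name fp_multiply is hypothetical and the guard against a zero mantissa is an addition made for this illustration.

```python
def fp_multiply(mx, ex, my, ey):
    """Model of the program flow; returns (mz, ez) as 16-bit and 8-bit patterns."""
    # 1. Sign of the result: XOR of the two sign bits (MSBs).
    negative = ((mx ^ my) & 0x80) != 0

    # 2. Moduli: negate (two's complement) every operand whose MSB is set.
    if mx & 0x80:
        mx = (-mx) & 0xFF
    if my & 0x80:
        my = (-my) & 0xFF

    # 3. Preliminary exponent, truncated to 8 bits.
    ez = (ex + ey) & 0xFF

    # 4. Shift-and-add multiplication (the role of FirstAlgMult).
    mz = 0
    while my != 0:
        if my & 1:
            mz = (mz + mx) & 0xFFFF
        mx <<= 1
        my >>= 1

    # 5. Normalization (the role of Normalize); zero is left untouched.
    while mz != 0 and (mz & 0xC000) in (0x0000, 0xC000):
        mz = (mz << 1) & 0xFFFF
        ez = (ez - 1) & 0xFF

    # 6. Restore the sign by converting the mantissa to two's complement.
    if negative:
        mz = (-mz) & 0xFFFF
    return mz, ez

mz, ez = fp_multiply(0b10001101, 0b01000010, 0b10010010, 0b11010001)
print(format(mz, "016b"), format(ez, "08b"))
```

For the operands of the worked example the model gives mz = 0110001011010100 and ez = 00010010, which agrees with the hand calculation.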

6. Results

7. Conclusion

Assembler is a low level programming language. It allows the programmer to interact directly with the processor and to manage memory.

While performing this course work I learned a lot about assembly language. In my code I used macros, which are very useful because they accept parameters and make the program more modular.

The implementation of the floating point multiplication algorithm was an interesting exercise. Arithmetic algorithms are very important because they are the basis of data processing in a computer. Their implementation and complexity influence the performance of the computer.


Appendix

Source code in Assembler:

.model small
.stack 100h

.data
    mx       db 3Ah         ; 0011 1010 b
    my       db 0D1h        ; 1101 0001 b
    ex       db 42h
    ey       db 0D1h
    sign     db ?
    mz       dw ?
    ez       db ?
    normMask dw 0C000h      ; 1100 0000 0000 0000 b - for checking 1st two bits at normalization

    ; variables for outputting
    mxInput  db 10,13,'Input',10,13,'mx = $'
    myInput  db 10,10,13,'my = $'
    exInput  db 10,13,'ex = $'
    eyInput  db 10,13,'ey = $'
    mzResult db 10,13,10,13,'Result',10,13,'mz = $'
    ezResult db 10,13,'ez = $'

.code

; First Multiplication Fixed-Point Algorithm
;------------------------------------------
FirstAlgMult macro x, y
    LOCAL CheckLSB, Shift
    ; save data from registers
    push dx
    push ax
    push bx
    xor dx, dx              ; adder
    xor ax, ax
    xor bx, bx
    mov al, x               ; AX <-- mx
    mov bl, y               ; BX <-- my
CheckLSB:
    test bl, 1
    jz Shift
    ; if LSB = 1
    add dx, ax              ; adder = adder + mx
Shift:
    shl ax, 1               ; shift mx left
    shr bl, 1               ; shift my right
    test bl, 0FFh           ; check if my is 0
    jnz CheckLSB
    mov mz, dx
    ; restore data from stack
    pop bx
    pop ax
    pop dx
endm

; Check for denormalization
;-------------------------------------
Normalize macro z, exp
    LOCAL CheckTwoBits, DoNormalization, NumberIsNormalized
    push ax
    push bx
    mov bx, z
    test bx, bx
    jz NumberIsNormalized   ; a zero mantissa cannot be normalized
CheckTwoBits:
    mov ax, bx
    and ax, normMask
    jz DoNormalization      ; two consecutive 0s (00... & 11... = 00...)
    cmp ax, normMask
    je DoNormalization      ; two consecutive 1s (11... & 11... = 11... = mask)
    jmp NumberIsNormalized
DoNormalization:
    shl bx, 1
    dec exp
    jmp CheckTwoBits
NumberIsNormalized:
    mov z, bx
    pop bx
    pop ax
endm

; print a '$'-terminated message
print macro message
    push dx
    push ax
    mov dx, offset message
    mov ah, 9
    int 21h
    pop ax
    pop dx
endm

; print an 8 bits number-------------
print8Bits macro x
    LOCAL show8, Zero8, Jump8, Quit8
    push ax
    push bx
    push cx
    push dx
    mov dl, 8               ; number of bits left to print
    xor bx, bx
show8:
    mov bl, x
    dec dl
    mov cl, dl
    shr bx, cl              ; move the current bit to position 0
    and bx, 1               ; isolate it; BH = 0 is also the video page for int 10h
    jz Zero8
    mov al, '1'
    mov ah, 0Eh
    int 10h
    jmp Jump8
Zero8:
    mov al, '0'
    mov ah, 0Eh
    int 10h
Jump8:
    test dl, dl             ; all bits printed?
    jz Quit8
    jmp show8
Quit8:
    pop dx
    pop cx
    pop bx
    pop ax
endm

; print a 16 bits number-------------
print16Bits macro x
    LOCAL show16, Zero16, Jump16, Quit16
    push ax
    push bx
    push cx
    push dx
    mov dl, 16              ; number of bits left to print
show16:
    mov bx, x
    dec dl
    mov cl, dl
    shr bx, cl              ; move the current bit to position 0
    and bx, 1               ; isolate it; BH = 0 is also the video page for int 10h
    jz Zero16
    mov al, '1'
    mov ah, 0Eh
    int 10h
    jmp Jump16
Zero16:
    mov al, '0'
    mov ah, 0Eh
    int 10h
Jump16:
    test dl, dl             ; all bits printed?
    jz Quit16
    jmp show16
Quit16:
    pop dx
    pop cx
    pop bx
    pop ax
endm

;------------------- START ---------------------
start:
    mov ax, @data
    mov ds, ax

    ; Determine sign of the result
    ; sign(mx) xor sign(my) = sign(mz)
    xor ax, ax
    mov al, mx
    xor al, my
    mov sign, al

    ; find moduli of mx and my
    test mx, 80h
    jz mxIsPositive
    ; mx negative case
    neg mx
mxIsPositive:
    test my, 80h
    jz myIsPositive
    ; my negative case
    neg my
myIsPositive:

    ; find exponent of result ez
    mov al, ex
    add al, ey
    mov ez, al

    ; perform fixed point multiplication of mantissas
    FirstAlgMult mx, my
    Normalize mz, ez

    ; convert the resulting mz in CC
    ; by checking the sign stored at the beginning
    test sign, 80h
    jz DoNotConvert
    neg mz
DoNotConvert:
    ; mx and my were negated in place above, so their moduli are printed
    print mxInput
    print8Bits mx
    print exInput
    print8Bits ex
    print myInput
    print8Bits my
    print eyInput
    print8Bits ey
    print mzResult
    print16Bits mz
    print ezResult
    print8Bits ez

    ; return to DOS
    mov ax, 4C00h
    int 21h

end start
