
introduction to IT and Com systems.doc



INDEX

1. Digital Logic Fundamentals
1.1 Overview
1.2 Boolean Algebra
1.3 Combinational Logic Circuit
1.4 Multiplexers
1.5 Decoders & Encoders
1.6 Memory
1.7 Flip-Flop
1.8 Sequential Circuit
1.9 Register
1.10 Counters
1.11 Digital Circuit
1.12 Multiple Choice Quiz

2. Introduction to Computer Organisation
2.1 Functional Units
2.2 System Buses
2.3 Memory Subsystem Organisation
2.4 I/O Subsystem Organisation & Interfacing
2.5 Multiple Choice Quiz

3. Programming the Basic Computer
3.1 Introduction
3.2 The Levels of Programming Languages
3.3 Translators
3.4 Multiple Choice Quiz

4. CPU Organisation
4.1 General Register Organisation
4.2 Stack Organisation
4.3 Register Organisation
4.4 Status & Control Registers
4.5 Register Transfer Language
4.6 Register Transfer
4.7 Bus and Memory Transfer
4.8 Arithmetic Micro-Operations
4.9 Logic Micro-Operations
4.10 Shift Micro-Operations
4.11 Control Unit
4.12 Case Study: 8085 Microprocessor
4.13 Multiple Choice Quiz

5. Computer Arithmetic
5.1 Introduction
5.2 Number Systems
5.3 Addition and Subtraction
5.4 Multiplication
5.5 Division
5.6 Floating Point
5.7 Multiple Choice Quiz

6. Input/Output Organisation
6.1 Peripheral Devices
6.2 Input/Output Interface
6.3 Asynchronous Data Transfer
6.4 Modes of Transfer
6.5 Priority Interrupt
6.6 Direct Memory Access
6.7 Input-Output Processor
6.8 Serial Communication
6.9 Multiple Choice Quiz

7. Memory Organisation
7.1 Memory Hierarchy
7.2 Main Memory
7.3 Auxiliary Memory
7.4 Associative Memory
7.5 Cache Memory
7.6 Virtual Memory Concept

8. Pipeline & Vector Processing
8.1 Parallel Processing
8.2 Pipelining
8.3 Arithmetic Pipeline
8.4 Instruction Pipeline
8.5 RISC Pipeline
8.6 Vector Processing
8.7 Array Processor
8.8 RISC vs. CISC
8.9 Multiple Choice Quiz


Chapter 1

Digital Logic Fundamentals

1.1 Overview

• Gates, latches, memories and other logic components are used to design computer systems and their subsystems

• Good understanding of digital logic is necessary in order to learn the fundamentals of computing systems organization and architecture

• Two types of digital logic:

– Combinational logic: output is a function of the current inputs only

– Sequential logic: output is a function of the current inputs, previous inputs and previous outputs

• Neither combinational logic nor sequential logic is better than the other. In practice, both are used as appropriate in circuit design.

1.2 Boolean Algebra

• A Boolean algebra value can be either true or false.

• Digital logic uses 1 to represent true and 0 to represent false.

• This section introduces the basic Boolean algebra functions and examines the fundamental methods used to combine, manipulate and transform these functions.

AND

x y | out = xy
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1

• Output is one if every input has value of 1

• More than two values can be “and-ed” together

• For example xyz = 1 only if x=1, y=1 and z=1

OR

x y | out = x+y
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 1

• The output is 1 if at least one input has the value 1.

• More than two values can be “or-ed” together.

• For example, x+y+z = 0 only if x=0, y=0 and z=0.

XOR (Exclusive OR)

x y | out = x xor y
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 0

• The number of inputs that are 1 matters.

• More than two values can be “xor-ed” together.

• General rule: the output is equal to 1 if an odd number of input values are 1 and 0 if an even number of input values are 1.

NOT


x | x'
0 | 1
1 | 0

• This function operates on a single Boolean value.

• Its output is the complement of its input.

• An input of 1 produces an output of 0, and an input of 0 produces an output of 1.

XNOR

x y | out = x xnor y
0 0 | 1
0 1 | 0
1 0 | 0
1 1 | 1


• Output value is the complemented output from an “XOR” function.
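The truth tables above can be expressed directly in code. The sketch below is an illustrative Python rendering (not part of the original text) that models each gate as a function on 0/1 values:

```python
# Basic Boolean functions on 0/1 values, mirroring the truth tables above.

def AND(*inputs):
    # Output is 1 only if every input is 1; works for more than two inputs.
    return int(all(inputs))

def OR(*inputs):
    # Output is 1 if at least one input is 1.
    return int(any(inputs))

def XOR(*inputs):
    # Output is 1 if an odd number of inputs are 1.
    return sum(inputs) % 2

def NOT(x):
    # Complement of a single input.
    return 1 - x

def XNOR(x, y):
    # Complemented output of the XOR function.
    return NOT(XOR(x, y))

# Reproduce the two-input truth tables:
for x in (0, 1):
    for y in (0, 1):
        print(x, y, AND(x, y), OR(x, y), XOR(x, y), XNOR(x, y))
```

Note that XOR written this way also captures the general rule stated above: xyz-style chains output 1 exactly when an odd number of inputs are 1.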

1.3 Combinational Logic Circuit

• A combinational logic circuit implements a Boolean function directly with gates; the circuit referenced here implements the function xy’+yz.

DeMorgan’s Law

(ab)’=a’+b’

(a+b)’=a’b’

• Property for generating equivalent functions

• Allows conversion of AND function to an equivalent OR function and vice-versa

• It may allow the simplification of complex functions, which leads to a simpler design


• It is useful in generating the complement of a function

Generating the complement of a function using DeMorgan’s law

(xy’ + yz)’ = (xy’)’(yz)’
            = (x’ + y)(y’ + z’)
            = x’y’ + x’z’ + yy’ + yz’
            = x’y’ + x’z’ + yz’        (because yy’ = 0)

x y z x'y' x'z' yz' x'y' + x'z' + yz'

0 0 0 1 1 0 1

0 0 1 1 0 0 1

0 1 0 0 1 1 1

0 1 1 0 0 0 0

1 0 0 0 0 0 0

1 0 1 0 0 0 0

1 1 0 0 0 1 1

1 1 1 0 0 0 0
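Identities like this can be checked exhaustively over all input combinations. The following sketch (illustrative Python, not from the original text) confirms the complement derived above:

```python
from itertools import product

def f(x, y, z):
    # Original function: xy' + yz
    return (x and not y) or (y and z)

def f_complement(x, y, z):
    # Complement derived via DeMorgan's law: x'y' + x'z' + yz'
    return (not x and not y) or (not x and not z) or (y and not z)

# The complement must disagree with f on every one of the 8 input rows.
assert all(bool(f(x, y, z)) != bool(f_complement(x, y, z))
           for x, y, z in product([False, True], repeat=3))
print("DeMorgan complement verified")
```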

Karnaugh Map (K map)

• Method for minimizing logic

• Is used to represent the values of a function for different input values

• The rows and columns of the K-map correspond to the possible values of the function's input


• Each cell in the K-map represents a minterm (i.e. a three variables function has: x’y’z’, x’y’z, x’yz’, x’yz, xy’z’, xy’z, xyz’ and xyz)

Gray Code

• The Gray code sequence depends on the number of bits in its values

• The 1-bit Gray code serves as basis for the 2-bit Gray code, the 2-bit Gray code is the basis for 3-bit Gray code, etc…

• Gray code sequences are cycles: 000 -> 001 -> 011 -> 010 -> 110 -> 111 -> 101 -> 100 -> 000

• Adjacent values differ by only one bit

K-map Example

• Let’s consider (xy’+yz)’ = x’y’ + x’z’ + yz’

• Group together the 1s in the map:

– g1: x’y’z’+x’y’z=x’y’(z’+z)=x’y’

– g2: x’yz’+xyz’ = yz’(x’+x)=yz’

– g3: x’yz’+x’y’z’=x’z’(y+y’)=x’z’

• To derive a minimal expression we must select the fewest groups that cover all active minterms (1s).

• (xy’ + yz)’= x’y’ + yz’


x y z x'y' + x'z' + yz'

0 0 0 1

0 0 1 1

0 1 0 1

0 1 1 0

1 0 0 0

1 0 1 0

1 1 0 1

1 1 1 0

K-map for more complex function

w’x’y’z’ + w’x’yz’ + wx’y’z’ + wx’y’z + wx’yz + wx’yz’

The final minimized function is:

x’z’ + wx’
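A K-map result can be double-checked by exhaustive enumeration. This sketch (illustrative Python, not from the original text) compares the sum of the six minterms listed above with the two-term cover x'z' + wx':

```python
from itertools import product

def minterm_sum(w, x, y, z):
    # Sum of the six listed minterms:
    # w'x'y'z' + w'x'yz' + wx'y'z' + wx'y'z + wx'yz + wx'yz'
    return ((not w and not x and not y and not z) or
            (not w and not x and y and not z) or
            (w and not x and not y and not z) or
            (w and not x and not y and z) or
            (w and not x and y and z) or
            (w and not x and y and not z))

def minimized(w, x, y, z):
    # Minimized cover read off the K-map: x'z' + wx'
    return (not x and not z) or (w and not x)

# Both expressions must agree on all 16 input combinations.
assert all(bool(minterm_sum(*v)) == bool(minimized(*v))
           for v in product([False, True], repeat=4))
print("minimization verified")
```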


1.4 Multiplexers

• It is a selector: it chooses one of its data inputs and passes it to the output according to some other selection inputs

• Consider four binary data inputs as inputs of a multiplexer. Two select signals will determine which of the four inputs will be passed to the output.

• Figure (a) presents the internal structure of a four-input multiplexer; (b) and (c) present the multiplexer schematic representation with an active high enable signal (b) and an active low enable signal (c)


Multiplexer schematic representation with active high enable signal

Multiplexer schematic representation with active low enable signal

• Multiplexers can be cascaded to select from a larger number of inputs

A 4-to-1 multiplexer made of 2-to-1 multiplexers
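A behavioral sketch of the multiplexer (illustrative Python, not from the original text): the select inputs form an index that picks one data input, and a disabled enable signal tri-states the output.

```python
def mux(data, select, enable=True):
    # Generic 2^n-to-1 multiplexer: 'select' is a tuple of select bits,
    # most significant first; None models a tri-stated output.
    if not enable:
        return None
    index = 0
    for bit in select:
        index = index * 2 + bit
    return data[index]

# 4-to-1 multiplexer: select bits (s1, s0) = (1, 0) pick data input 2.
inputs = [0, 1, 1, 0]
print(mux(inputs, (1, 0)))  # -> 1 (the value of inputs[2])
```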

1.5 Decoders and Encoders

• A decoder accepts a binary value as input and decodes it.

• It has n inputs and 2^n outputs, numbered from 0 to 2^n − 1.

• Each output represents one minterm of the inputs

• The output corresponding to the value of the n inputs is activated

• For example, a decoder with three inputs and eight outputs will activate output 6 whenever the input values are 110.

• Figure (a) shows a two-to-four decoder's internal structure; (b) and (c) show its schematic representation with an active high enable signal and an active low enable signal


• For inputs S1S0 = 00, 01, 10 and 11, the outputs 0, 1, 2 and 3, respectively, are active

• As with the multiplexer, a de-asserted enable signal can tri-state all outputs

• Decoders can have active high or active low enable signals.

• Variants:

• have active low outputs (the selected output has the value 0 and all the other outputs have the value 1)

• output all 0s when not enabled, instead of the high-impedance state Z (as the ones in the figure do)


Encoders

• The encoder is the exact opposite of the decoder.

• It receives 2^n inputs and outputs the n-bit value corresponding to the one input that has a value of 1.

• A 4-to-2 encoder and its schematic representations are presented in (a), (b) and (c).
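The decoder/encoder pair can be sketched behaviorally (illustrative Python, not from the original text):

```python
def decoder(inputs, enable=True):
    # n-to-2^n decoder: activates exactly the output line whose number
    # equals the binary value on the inputs; None models tri-stated outputs.
    if not enable:
        return None
    index = 0
    for bit in inputs:
        index = index * 2 + bit
    outputs = [0] * (2 ** len(inputs))
    outputs[index] = 1
    return outputs

def encoder(inputs):
    # 2^n-to-n encoder: returns the binary value (MSB first) of the
    # single input line that carries a 1.
    index = inputs.index(1)
    n = (len(inputs) - 1).bit_length()
    return [(index >> i) & 1 for i in reversed(range(n))]

# A 3-to-8 decoder activates output 6 for input 110:
print(decoder([1, 1, 0]))     # -> [0, 0, 0, 0, 0, 0, 1, 0]
# A 4-to-2 encoder with input line 2 active:
print(encoder([0, 0, 1, 0]))  # -> [1, 0]
```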

1.6 Memory

The memory unit is an essential component of any digital computer, since it is needed for storing programs and data. Most general-purpose computers would run more efficiently if they were equipped with additional storage devices beyond the capacity of the main memory. The memory unit that communicates directly with the CPU is called the main memory. Devices that provide backup storage are called auxiliary memory. The most common auxiliary devices are magnetic disks and tapes; they are used for storing system programs, large data files and other backup information. Only the programs and data currently needed by the processor reside in main memory. All other information is stored in auxiliary memory and transferred to main memory when needed.

The memory hierarchy of a computer system consists of all its storage devices, from the slow but high-capacity auxiliary memory, to the relatively faster main memory, to an even smaller and faster cache memory accessible to the high-speed processing logic. The goal of the memory hierarchy is to obtain the highest possible access speed while minimizing the total cost of the memory system.

1.7 Flip-Flop

Every digital system is likely to have a combinational circuit, but most systems also include storage elements, which are described in terms of sequential circuits. The most common type of sequential circuit is synchronous. The storage elements employed in clocked sequential circuits are called flip-flops. A flip-flop is a binary cell capable of storing one bit of information.

Basic RS Flip-Flop (NAND)


Clocked D Flip-Flop

Clock Pulse Definition

Master-Slave Flip-Flop

T Flip-Flop


JK Flip-Flop
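The figure captions above name the standard flip-flop types. Their next-state behavior on a clock edge can be summarized in code (illustrative Python, not from the original text):

```python
def d_next(d, q):
    # D flip-flop: the output follows the D input on the clock edge.
    return d

def t_next(t, q):
    # T flip-flop: toggles when T=1, holds when T=0.
    return q ^ t

def jk_next(j, k, q):
    # JK flip-flop: 00 hold, 01 reset, 10 set, 11 toggle.
    if (j, k) == (0, 0):
        return q
    if (j, k) == (0, 1):
        return 0
    if (j, k) == (1, 0):
        return 1
    return 1 - q  # (1, 1): toggle

# A JK flip-flop with J=K=1 toggles on every clock pulse:
q = 0
for _ in range(3):
    q = jk_next(1, 1, q)
print(q)  # -> 1
```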

1.8 Sequential circuit

A sequential circuit is an interconnection of flip-flops and gates. The gates by themselves constitute a combinational circuit, but when included with the flip-flops, the overall circuit is classified as a sequential circuit.

State Tables and State Diagrams


Sequential Circuit Example

1.9 Register

A register is a group of flip-flops, with each flip-flop capable of storing one bit of information. An n-bit register has a group of n flip-flops and is capable of storing any binary information of n bits. In addition to flip-flops, a register may have combinational gates that perform certain data-processing tasks. The flip-flops hold the binary information and the gates control when and how new information is transferred into the register.

4-Bit Parallel Register


D and Q are both sets of lines, with the number of lines equal to the width of each register. There are often multiple address ports, as well as additional data ports.

1.10 Counters

A register that goes through a predetermined sequence of states upon the application of input pulses is called a counter. The input pulses may be clock pulses or may originate from external sources. They may occur at uniform intervals of time or at random. Counters are found in almost all equipment containing digital logic.

Binary counter

• Count-down counter: a binary counter with reverse count; it starts from 15 and counts down.

• In a count-down counter the least significant bit is complemented with every count pulse. Any other bit is complemented if the previous bit goes from 0 to 1.

• We can use the same counter design with negative-edge-triggered flip-flops to make a count-down counter.

A BCD counter starts at 0, ends at 9, and then wraps around to 0.

State diagram of Binary counter.
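The counting behavior is easy to simulate (illustrative Python, not from the original text): a 4-bit binary counter wraps from 15 back to 0, while a BCD (decade) counter wraps from 9 back to 0.

```python
def next_count(value, modulus=16, down=False):
    # One clock pulse of a modulo-N counter
    # (N=16: 4-bit binary counter, N=10: BCD counter).
    step = -1 if down else 1
    return (value + step) % modulus

# 4-bit binary up-counter: the count wraps after 15.
seq = []
v = 14
for _ in range(4):
    seq.append(v)
    v = next_count(v)
print(seq)                          # -> [14, 15, 0, 1]

print(next_count(9, modulus=10))    # BCD counter wraps: -> 0
print(next_count(0, down=True))     # count-down wraps: -> 15
```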


1.11 Digital Circuit


MULTIPLE CHOICE QUESTIONS

1. Which of the following is not a Boolean Algebra function?

a) AND

b) OR

c) XAND

d) XNOR

2. Which of the following is a DeMorgan law?

a) a+b = ab

b) (a-b)’ = a’b’

c) (ab)’=a’+b’

d) a’+b’=a + b

3. Transistors are made up of?

a) Lithium

b) Neon

c) Silicon

d) Nickel

4. How many types of memory are there?

a) 1

b) 2

c) 3

d) 4

5. Which of the following is not a counter?

a) Century counter

b) Ring Counter

c) Johnson Counter


d) Decade Counter

6. What is the main advantage of synchronous logic?

a) Complexity

b) Simplicity

c) Serial Port

d) Regularity

7. How many types of counters are there?

a) 6 type

b) 7 type

c) 8 type

d) 9 type

8. Asynchronous and synchronous counters are created from?

a) Decoder

b) Encoder

c) JK flip Flop

d) D Flip flop

9. Which of the following is not a type of register?

a) Active Register

b) Data Register

c) Address Register

d) Vector Register

10 . Which of the following is a type of register?

a) Speed Register

b) Shift Register

c) User- accessible Register


d) Constant Register

ANSWERS

1. C, 2. C, 3. C, 4. B, 5. A, 6. B, 7. C, 8. C, 9. A, 10. B


Chapter 2

Introduction to Computer Organisation

2.1 Functional Units

CPU – Central Processing Unit, which includes:

o ALU – Arithmetic and Logic Unit

o Set of registers (including the program counter, which keeps the address of the next instruction to be performed)

o Control Unit – decodes instructions and controls internal data movement among different CPU parts

o Interface Unit – moves data between the CPU and other hardware components

Memory

o The Main Memory – Primary Storage or RAM (Random Access Memory)

Holds data and pieces of program code (instructions)

Consists of a large number of cells, each individually addressable

A cell is called a “byte” and consists of 8 bits (binary elements holding 0 or 1)

Usually a word consists of 4 bytes

In most cases – volatile memory

o The Multilevel Cache – an intermediate stage between the ultra-fast registers (inside the CPU) and the much slower main memory. The most actively used information in the main memory is duplicated in the cache, which is faster but of much lesser capacity.

2.2 System Buses

2.2.1 Memory Bus


The memory bus is the set of wires that is used to carry memory addresses and data to and from the system RAM. The memory bus in most PCs is also shared with the processor bus, connecting the system memory to the processor and the system chipset. The memory bus is part of the PC's hierarchy of buses, which are high-speed communications channels used within the computer to transfer information between its components.

The memory bus is made up of two parts:

Data bus: carries actual memory data within the PC

Address bus: selects the memory address that the data will come from or go to on a read or write

2.2.2 I/O Bus

The I/O bus facilitates communication between the CPU and all the interface units.

The I/O bus is of two types:

Isolated I/O

o Separate I/O read/write control lines in addition to memory read/write control lines

o Separate (isolated) memory and I/O address spaces

o Distinct input and output instructions

Memory-mapped I/O

o A single set of read/write control lines (no distinction between memory and I/O transfer)

o Memory and I/O addresses share the common address space, which reduces the memory address range available

o No specific input or output instructions; as a result the same memory reference instructions can be used for I/O transfers

o Considerable flexibility in handling I/O operations
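The difference between the two schemes can be sketched as an address-dispatch model (illustrative Python; the device register at address 0x8000 and the address split are assumptions for the example, not from the original text):

```python
MEMORY = bytearray(0x8000)          # system RAM below 0x8000
DEVICE_REGISTERS = {0x8000: 0}      # a device register mapped into the address space

def read(address):
    # Memory-mapped I/O: one read operation covers both memory and devices;
    # the address alone decides which one responds.
    if address < 0x8000:
        return MEMORY[address]
    return DEVICE_REGISTERS[address]

def io_read(port):
    # Isolated I/O: a distinct input instruction with its own port space.
    return DEVICE_REGISTERS[0x8000 + port]

MEMORY[0x10] = 42
DEVICE_REGISTERS[0x8000] = 7
print(read(0x10), read(0x8000), io_read(0))  # -> 42 7 7
```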

2.3 Memory Subsystem Organization


2.3.1. Memory Hierarchy

The goal of the memory hierarchy is to obtain the highest possible access speed while minimizing the total cost of the memory system.


2.3.2. Memory Location: Address Map

Address space is assigned to each memory chip. Example: 512 bytes of RAM and 512 bytes of ROM.

2.3.3 Types of Memory

Main Memory

o Consists of RAM and ROM chips

o Holds instructions and data for the programs in execution

o Volatile

o Fast access

o Small storage capacity

Typical RAM chip: a 128 x 8 RAM with two chip selects (CS1, CS2), read and write controls (RD, WR), a 7-bit address input (AD 7) and an 8-bit data bus.

Typical ROM chip: a 512 x 8 ROM with two chip selects (CS1, CS2), a 9-bit address input (AD 9) and an 8-bit data bus.

Address map (address bus lines 10 9 8 7 6 5 4 3 2 1; x = don't care):

Component | Hex address | Address bus
RAM 1 | 0000 - 007F | 0 0 0 x x x x x x x
RAM 2 | 0080 - 00FF | 0 0 1 x x x x x x x
RAM 3 | 0100 - 017F | 0 1 0 x x x x x x x
RAM 4 | 0180 - 01FF | 0 1 1 x x x x x x x
ROM   | 0200 - 03FF | 1 x x x x x x x x x
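The address map above amounts to a decoding rule on the high-order address lines; a sketch of the chip-select logic (illustrative Python, not from the original text):

```python
def select_chip(address):
    # Decode a 10-bit address (0x000-0x3FF) per the address map above:
    # line 10 selects the ROM; lines 9-8 select one of the four RAM chips.
    assert 0 <= address <= 0x3FF
    if address & 0x200:                       # line 10 set -> 0200-03FF
        return "ROM"
    return "RAM " + str((address >> 7) + 1)   # lines 9-8 pick the 128-byte chip

print(select_chip(0x0005))  # -> RAM 1
print(select_chip(0x00A0))  # -> RAM 2
print(select_chip(0x01FF))  # -> RAM 4
print(select_chip(0x0200))  # -> ROM
```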


Auxiliary Memory

o Information organized on magnetic tapes

o Auxiliary memory holds programs and data for future use

o Non-volatile – data remains even when the memory device is taken off power

o Slower access rates

o Greater storage capacity

Associative Memory

o Accessed by the content of the data rather than by an address

o Also called Content Addressable Memory (CAM)

o Used in very high-speed searching applications

o Much faster than RAM in virtually all search applications

o Higher cost

CAM Hardware Organization

[Figure: an argument register (A) and a key register (K) drive an associative memory array and logic of m words, n bits per word, with input, read and write lines; word matches are recorded in the match register (M).]

Cache Memory

o Cache is a fast, small-capacity memory that should hold the information most likely to be accessed

o The property of Locality of Reference makes cache memory systems work

o Locality of Reference – the references to memory at any given time interval tend to be confined within localized areas. Such an area contains a set of information, and the membership changes gradually as time goes by.

Temporal Locality – the information which will be used in the near future is likely to be in use already

Spatial Locality – if a word is accessed, adjacent (nearby) words are likely to be accessed soon
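Locality is what makes a small cache effective. A minimal direct-mapped cache model (illustrative Python; the 8-line, 4-byte-block geometry is an assumption for the example) shows that a sequential scan, thanks to spatial locality, misses only once per block:

```python
class DirectMappedCache:
    # Minimal model: each line holds one block's tag; data is omitted.
    def __init__(self, lines=8, block=4):
        self.lines, self.block = lines, block
        self.tags = [None] * lines

    def access(self, address):
        # Returns True on a hit, False on a miss (and fills the line).
        block_number = address // self.block
        index = block_number % self.lines
        tag = block_number // self.lines
        if self.tags[index] == tag:
            return True
        self.tags[index] = tag
        return False

cache = DirectMappedCache()
# Spatial locality: scanning 32 consecutive bytes misses once per 4-byte block.
hits = sum(cache.access(a) for a in range(32))
print(hits, "hits out of 32")  # -> 24 hits out of 32
```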

2.4 I/O Subsystem Organization and Interfacing

2.4.1 A Model of I/O Subsystem Organization

[Figure: user processes reach local peripheral devices, communications ports and the file system through a layered architecture – logical I/O, communication architecture, and directory management with the file system and physical organization – then scheduling & control and device I/O, down to the hardware.]

2.4.2 I/O Interface

Provides a method for transferring information between internal storage (such as memory and CPU registers) and external I/O devices. It resolves the differences between the computer and the peripheral devices:

Nature of the device: peripherals are electromechanical devices; the CPU and memory are electronic devices.

Data transfer rate: peripherals are usually slower; the CPU and memory are usually faster than peripherals, so some synchronization mechanism may be needed.

Unit of information: peripherals – byte, block, etc.; CPU or memory – word.

[Figure: a typical I/O interface unit – bus buffers connect the CPU's bidirectional data bus to an internal bus feeding the Port A register, Port B register, control register and status register; a timing and control block decodes the chip select (CS), register selects (RS1, RS0), I/O read (RD) and I/O write (WR) lines, while I/O data, control and status lines connect to the I/O device.]

End chapter quizzes

Q1. The control unit basically

a) Decodes instructions

b) Moves the data

c) Is a logical unit

d) Is used to save data in registers

Q2. CPU does not include

a) ALU

b) Registers

c) Control Unit

d) System Bus

Q3. Memory bus is made up of two parts

a) Data Bus and Address bus

b) Data bus and I/O bus

c) Main memory and address bus

d) Address bus and I/O bus

Q4. RAM and ROM chip

a) Volatile and Non volatile

b) Non volatile and Volatile

c) Both are volatile

d) Both are non volatile

Q5. I/O interface work for

a) Synchronization

b) Transferring information

c) Managing memory

d) Process management


Q6. Special high speed memory

a) Auxiliary memory

b) Main memory

c) Associative memory

d) Cache memory

Q7.Memory is assigned with help of

a) Memory location

b) Memory management

c) Memory address map

d) By assignment

Q8. CPU- Central Processing Unit has how many parts

a) 5 parts

b) 4 parts

c) 3 parts

d) 2 parts

Q9. Associative Memory is also known as

a) RAM

b) ROM

c) CAD

d) CAM

Q10. What is the basic function of I/O.

a) Connect to CPU

b) Connect to Memory

c) Connect to Output Devices

d) Connect to ROM

ANSWERS

1. a, 2. d, 3. a, 4. a, 5. b, 6. d, 7. c, 8. b, 9. d, 10. c


Chapter 3

Programming the Basic Computer

3.1 Introduction:

A total computer system includes both hardware and software. Hardware consists of the physical components and all associated equipment. Software refers to the programs that are written for the computer. It is possible to be familiar with various aspects of computer software without being concerned with the details of how the computer hardware operates. It is also possible to design parts of the hardware without a knowledge of its software capabilities. A program written by a user may be either dependent on or independent of the physical computer that runs it. A programming language is the interface between the computer and the programmer: the set of instructions written in the language is translated into a machine-compatible language.

3.2 The level of programming languages

3.2.1 Low level Programming Language:

3.2.1.1. First generation language.

3.2.1.2. Second generation language.

Low-level programming languages include machine languages and assembly languages. Both are machine dependent: for example, programs written in low-level languages for the IBM PC cannot be run on a Macintosh computer. Low-level languages are used to control the hardware; for example, driver programs are written in low-level languages. Programs written in low-level languages are also usually small in size and efficient.

3.2.2. High level Programming Language

1. Third Generation Language (3GL)

2. Fourth Generation Language (4GL)

3.2.3. Comparison : High & Low Level languages

3.21.1 First generation languages

The first generation programming languages are machine languages:

Defined by the hardware design of the computer

Machine dependent

Instructions formed by binary digits (“0” or “1”), corresponding to the ON or OFF state of an electric circuit

A machine instruction has two parts: op code and operand

1. Op code: the operation to be performed, acting like a verb – increment, add, copy, etc.

2. Operand: acts like a noun – the data on which the operation is performed; for example, when incrementing the value of A, A is the operand. It can be a value or the address of a memory location. (Remark: a variable in machine language is in the form of an address, i.e. address = variable.)

3.2.1.2. Second generation languages

Assembly languages: the user employs symbols (letters, numerals, or special characters) for the operation part, the address part and the other parts of the instruction code. Each symbolic instruction can be translated into one binary-coded instruction. This translation is done by a special program called an assembler. Because an assembler translates the symbols, this type of symbolic program is referred to as an assembly language program, and the language is called assembly language.

Instructions are 1:1 with machine instructions. Symbolic instruction code (or mnemonics).

1. Using symbolic instruction codes to replace binary digits.

2. Meaningful abbreviations that substitute for the op code.

3. e.g. “JMP A” means jump to the address represented by A.

3.2.2 High-Level Programming Languages

High-level languages are machine-independent, third generation or above, and more like human languages; they let programmers concentrate on the logic rather than the machine architecture. These are special languages developed to reflect the procedures used in the solution of a problem rather than be concerned with computer hardware behavior, e.g. FORTRAN.

3.2.2.1. Third generation language (3GL)

More human-like than assembly language; meets demands of efficiency and effectiveness. Examples: C, FORTRAN, COBOL, BASIC, PASCAL, etc. Each high-level language statement is translated into many machine instructions.

3.2.2.2. Fourth generation language (4GL)

Each 4GL instruction represents many 3GL statements. Typical example: SQL (Structured Query Language). e.g.

SELECT * FROM student ORDER BY class, class_no;

will display all the records in table “Student” in ascending order of class and class_no.

If the same task is written in a 3GL, instructions like these have to be written:

Sort the records in ascending order of class and class_no;

Get each record, test for end of file, put each item on screen;

Go back and repeat the operation until no more records to process.
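The contrast can be made concrete: the one-line SQL query corresponds to an explicit sort-and-loop program in a 3GL. A sketch of the same task (illustrative Python; the record data is invented for the example):

```python
# Records from the hypothetical "student" table.
students = [
    {"name": "Lee",  "class": "2B", "class_no": 7},
    {"name": "Chan", "class": "1A", "class_no": 3},
    {"name": "Wong", "class": "1A", "class_no": 1},
]

# Equivalent of: SELECT * FROM student ORDER BY class, class_no;
# Step 1: sort the records in ascending order of class and class_no.
ordered = sorted(students, key=lambda r: (r["class"], r["class_no"]))

# Steps 2-3: get each record and put each item on screen, until no more records.
for record in ordered:
    print(record["class"], record["class_no"], record["name"])
```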

3.2.3 Comparing Low- and High-Level Languages

1. Advantages of high-level languages

Advantage | Reason
More effective | Designed to solve problems
Easier to learn, write, read and modify | Instructions are in an English-like format
Programs are shorter in length | One high-level language statement takes the place of many machine instructions
Portable (can be used on different computers with little modification) | Machine-independent

2. Advantages of low-level languages

Advantage | Reason
More efficient | Hardware features, like bus width, are fully made use of
Require smaller memory space to run | Fewer low-level instructions are needed to do the same task (no redundant instructions produced by a translator)
Precise control of hardware | Some low-level instructions are designed specifically for controlling the hardware at low level

Examples of High Level Languages

High-level languages are designed to solve different problems, ranging from business to games. Some common programming languages:

1. PASCAL

Designed for teaching structured programming.

2. C

Designed to operate the computer at low level using a high-level language. More readable and better structured than assembly language. The similar language C++ is popular in game programming.


3. Java

Includes some syntax of C. Java programs can be called from an HTML document or run directly. A computer needs the Java Virtual Machine to execute Java programs, e.g. Yahoo Snooker.

4. COBOL

Designed for business applications. A wordy language, readable for beginners, e.g.

multiply hourly-rate by hours-worked giving gross-pay

i.e. gross-pay ← hourly-rate * hours-worked. But it can be clumsy for long programs.

5. BASIC

Designed to provide students with an easy-to-learn language.

Visual Basic

A version of BASIC specialized for developing Windows applications. It belongs to Rapid Application Development (RAD): programming can be done by drag-and-drop, so programmers can quickly build an application.

Visual Basic.NET

The newest version of Visual Basic; supports all Web-based features.

6. Script Languages

Interpreted and processed by software (e.g. a browser). For example, VBScript and JavaScript have similar syntax to Visual Basic and Java respectively, and are interpreted and processed by the Web browser. Instructions are embedded in HTML documents, increasing the function of HTML and making Web pages more interactive.

3.3 Translators


A translation setup consists of three parts: the source program, the translator and the object program. The source program is a text file used for storing source code, which must be translated into machine instructions before execution. The translator is software that converts source code into machine instructions; it is also used to discover syntax errors in a program. The object program is a binary file (e.g. an EXE) used for storing the machine instructions run during execution.

Types of translators:

3.3.1. Assembler

3.3.2. Compiler

3.3.3. Interpreter

3.3.1. Assemblers

An assembler is software used to translate an assembly language program into machine instructions, producing one machine instruction for each assembly language statement. It translates the whole source program before execution and stores the result in an object program (during execution, only the object program is needed).
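A toy assembler illustrates the one-to-one translation from symbolic statements to machine words (illustrative Python; the one-instruction-per-line format and the 4-bit opcode table are invented for the example):

```python
# Hypothetical opcode table: 4-bit opcode plus 12-bit hex address, one word
# of machine code per assembly statement.
OPCODES = {"LDA": 0x1, "ADD": 0x2, "STA": 0x3, "JMP": 0x4}

def assemble(source):
    # Translate each 'MNEMONIC address' line into one 16-bit machine word.
    words = []
    for line in source.strip().splitlines():
        mnemonic, operand = line.split()
        words.append((OPCODES[mnemonic] << 12) | int(operand, 16))
    return words

program = "LDA 270\nADD 271\nSTA 270"
print([hex(w) for w in assemble(program)])  # -> ['0x1270', '0x2271', '0x3270']
```

One assembly statement yields exactly one machine word, which is the defining property of an assembler as opposed to a compiler.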

3.3.2 Compilers

Compiler (e.g. Visual Basic – Make EXE)

A compiler is software that translates a high-level language program into machine instructions. It translates the whole source program before execution, storing the result in an object program.

3.3.3. Interpreters (e.g. Visual Basic – Run)

An interpreter is software that translates ONE high-level language program statement at a time, sends it to the computer for execution, and then proceeds to the NEXT statement.

No object program is produced; when the program is run, both the interpreter and the source code must be present.

An interpreted program runs slower than a compiled one because time is spent on translation.

Interpreters are liked by most programmers because they are more convenient for testing.

3.3.4. Comparison of Translators


Assembler / Compiler | Interpreter
Translates low-level language (assembler) or high-level language (compiler) into machine language | Translates high-level language into machine language
A new file, called the object program, is created | No new file is produced
Programs run faster | Programs run slower
Translates the whole source program | Translates and executes a statement at a time
Less convenient for the programmer: every change in the source program needs recompilation | Convenient for the programmer
The programmer can hide the source program | The source program must be present
The translator is not necessary at run-time | The translator is necessary at run-time

3.3.5. Instruction sets

An instruction set is the complete collection of instructions that are understood (executed) by a CPU. It is machine code, i.e. binary, and is usually represented by assembly codes.

3.3.5.1. Elements of a Machine Instruction

Each instruction must contain the information required by the CPU for execution. The elements of a machine instruction are:

• Operation Code: Specifies the operation to be performed. The operation is specified by a binary code, known as the operation code, or opcode.

• Source operand reference: The operation may involve one or more source operands; that is, operands that are the inputs for the operation.

• Result operand reference: The operation may produce a result.

• Next instruction reference: This tells the CPU where to fetch the next instruction after the execution of this instruction is complete.

3.3.5.2. Source and Result Operands (the locations where operand values may reside)

• Main (or virtual) memory: The memory address must be supplied.

• CPU register: The CPU contains one or more registers that may be referenced by machine instructions. If only one register exists, reference to it may be implicit. If more than one register exists, then each register is assigned a unique number, and the instruction must contain the number of the desired register.

• I/O device: The instruction must specify the I/O module or device for the operation. If memory-mapped I/O is used, this is just another memory address.

Each instruction is represented by a sequence of bits. The instruction is divided into fields, corresponding to the constituent elements of the instruction.

3.3.5.3. Instruction Types

Example: High level language statement: X = X + Y


• If we assume a simple set of machine instructions, this operation could be accomplished with three instructions (assume X is stored in memory location 624, and Y in memory location 625):

1. Load a register with the contents of memory location 624.
2. Add the contents of memory location 625 to the register.
3. Store the contents of the register in memory location 624.

As seen, a simple "C" (or BASIC) statement may require 3 machine instructions.
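A minimal sketch of the three-step sequence, using a Python dictionary as the memory and a plain variable as the register; the addresses 624 and 625 follow the example, but the machine itself is imaginary:

```python
# Toy simulation of X = X + Y compiled into three machine instructions.
# Memory addresses 624/625 follow the example; the initial values are invented.
memory = {624: 7, 625: 5}    # X = 7, Y = 5
register = 0

register = memory[624]       # 1. Load the register from memory location 624
register += memory[625]      # 2. Add the contents of location 625 to the register
memory[624] = register       # 3. Store the register back into location 624

print(memory[624])           # X now holds X + Y = 12
```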

The instructions fall into one of the following four categories:

– Data processing: Arithmetic and logic instructions.
– Data storage: Memory instructions.
– Data movement: I/O instructions.
– Control: Test and branch instructions.

What is the maximum number of addresses one might need in an instruction?

• Virtually all arithmetic and logic operations are either unary (one operand) or binary (two operands). The result of an operation must be stored, suggesting a third address. Finally, after the completion of an instruction, the next instruction must be fetched, and its address is needed.

• This line of reasoning suggests that an instruction could be required to contain 4 address references: two operands, one result, and the address of the next instruction. In practice, the address of the next instruction is handled by the program counter; therefore most instructions have one, two or three operand addresses.

• Three-address instruction formats are not common, because they require a relatively long instruction format to hold three address references.

Number of Addresses

Instruction Set Design

The fundamental design issues in designing an instruction set are:


• Operation Repertoire: How many and which operations to provide, and how complex operations should be.

• Data Types: The various types of data upon which operations are performed.

• Instruction Format: Instruction length (in bits), number of addresses, size of various fields, and so on.

• Registers: Number of CPU registers that can be referenced by instructions, and their use.

• Addressing: The mode or modes by which the address of an operand is specified.

These issues are highly interrelated and must be considered together in designing an instruction set.

3.3.6. Types of Operands

3.3.6.1. Most important general categories of data are:

o Addresses
o Numbers
o Characters
o Logical data

Numbers:

o Integer or fixed point
o Floating point (real numbers)
o Decimal

Characters:

o International Reference Alphabet (IRA), referred to as ASCII in the USA
o EBCDIC character set, used in IBM 370 machines

Logical Data:

Boolean data. 1 = true, 0 = false.

Types of Operations

A set of general types of operations is as follows:

• Data transfer: Move, Store, Load (Fetch), Exchange, Clear (Reset), Set, Push, Pop.
• Arithmetic: Add, Subtract, Multiply, Divide, Absolute, Negate, Increment, Decrement.
• Logical: AND, OR, NOT, XOR, Test, Compare, Shift, Rotate, Set control variables.
• Conversion: Translate, Convert.
• I/O: Input (Read), Output (Write), Start I/O, Test I/O.
• Transfer of Control: Jump (Branch), Jump Conditional, Jump to Subroutine, Return, Execute, Skip, Skip Conditional, Halt, Wait (Hold), No Operation.
• System Control: Instructions that can only be executed while the processor is in a privileged state, or is executing a program in a special privileged area of memory. These instructions are reserved for the use of the operating system.


3.3.7. Instruction Formats

• Includes opcode
• Includes (implicit or explicit) operand(s)
• Usually more than one instruction format in an instruction set

Instruction Length:

– Affected by and affects:
– Memory size
– Memory organization
– Bus structure
– CPU complexity
– CPU speed
– Trade-off between a powerful instruction repertoire and saving space

Allocation of Bits:

– Number of addressing modes
– Number of operands
– Register versus memory
– Number of register sets
– Address range
– Address granularity

Variable-Length Instructions:

– 1. Due to varying number of operands
– 2. Due to varying length of opcode in some CPUs


MULTIPLE CHOICE QUESTIONS

1. The speed at which the CPU processes data is measured in:
a) Megahertz
b) Gigahertz
c) Nanoseconds
d) A and B

2. Machine cycles are measured in nanoseconds or picoseconds.
a) True
b) False

3. Which register example holds the address of the current instruction being processed?

a) Program counter
b) Instruction register
c) Control Unit
d) Arithmetic Logic Unit

4. What is a machine cycle?

a) Data measured in megahertz
b) Data measured in gigahertz
c) A sequence of steps performed to execute one program instruction
d) All of the above

5. Most of the computers available today are known as

a). 3rd generation computers
b). 4th generation computers
c). 5th generation computers
d). 6th generation computers


6. What difference do the 5th generation computers have from other generation computers?

a). Technological advancement
b). Scientific code
c). Object Oriented Programming
d). All of the above

7. Which of the following languages is translated into pseudo code?

a). Assembly language
b). FORTRAN
c). PASCAL
d). BASIC

8. Which translator translates high-level language into machine language?

a). compiler and interpreter

b). compiler and assembler

c) assembler and interpreter

9. Which translator translates the whole source program before execution?

a) compiler
b) assembler
c) interpreter

10. Instruction format does not include

a) Includes opcode
b) Includes (implicit or explicit) operand(s)
c) Usually more than one instruction format in an instruction set
d) Includes only operators

Answers

1. a, 2. b, 3. b, 4. a, 5. b, 6. a, 7. a, 8. a, 9. a, 10. d


Chapter 4

CPU ORGANIZATION

4.1 General Register Organization

In a computational device, memory access is the most time-consuming operation, so it is more convenient and efficient to store intermediate values in processor registers.

The overall layout includes:

o Bus Systems

[Figure: bus system for the general register organization — registers R1–R7 and the external input feed two multiplexers (selected by SELA and SELB) onto the A bus and B bus, which are the inputs to the ALU (operation selected by OPR); a 3 x 8 decoder (SELD, with 7 load lines) selects the register that loads the ALU output, synchronized by the clock.]


MUX A & B: select one of the 7 registers or the external data input, by SELA and SELB.

BUS A and B: form the inputs to a common ALU.

3 x 8 Decoder: selects the register (by SELD) that receives the information from the ALU.

o Control Word

Format – 14 binary selection inputs, with 3 bits for SELA, 3 for SELB, 3 for SELD, and 5 for OPR.

Encoding of register selection fields

o ALU – An operation is selected by the ALU operation selector (OPR). The result of a microoperation is directed to a destination register selected by a decoder (SELD).


Example of CPU Microprograms

4.2 Stack Organization

o Stack - A storage device that stores information in such a manner that the item stored last is the first item retrieved. Hence, it is also called a last-in first-out (LIFO) list.

o Use of Stack

  Temporary storage during program runtime
  Not fixed in size: grows and shrinks using push and pop operations
  Not typed: elements of the list are not defined as one type
  Subroutines (functions, procedures)
    Linkage: saving a "link" back to the caller (the address of the calling routine)
    Data: return value, parameters, local variables
    Other saved information
  Managed at the assembly language level
  Rules for the compiler and the assembly-language programmer

o Stack Pointer (SP) - A register that holds the address of the top item in the stack. SP always points at the top item in the stack.

o Push - Operation to insert an item into the stack.

o Pop - Operation to retrieve an item from the stack.
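The SP, push and pop behaviour can be sketched in a few lines of Python; the memory size and the downward growth direction are illustrative assumptions, not taken from the text:

```python
# Minimal sketch of a LIFO stack with an explicit stack pointer (SP).
# The stack grows downward in a small fixed-size memory array; size is illustrative.
memory = [0] * 8
sp = len(memory)            # SP just past the end: stack is empty

def push(value):
    global sp
    sp -= 1                 # move SP down, then store the item at the new top
    memory[sp] = value

def pop():
    global sp
    value = memory[sp]      # read the top item, then move SP up
    sp += 1
    return value

push(10)
push(20)
assert pop() == 20          # last in, first out
assert pop() == 10
```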


4.3 Register OrganizationOne of the key difference among various computers is the difference in their register sets. Some computers have very large while some has smaller sets. But on the whole, from a user's point of view the register set can be classified under two basic categories

Programmer Visible Registers: These registers can be used by machine or assembly language programmers to minimize the references to main memory.

Status and Control Registers: These registers cannot be used by the programmers but are used to control the CPU or the execution of a program.

4.3.1 Programmer Visible Registers

These registers can be accessed using machine language. In general we encounter four types of programmer visible registers.

General Purpose Registers
Data Registers
Address Registers
Condition Code Registers

The general purpose registers are used for various functions desired by the processor. A true general purpose register can contain an operand, or can be used for calculating the address of an operand, for any operation code of an instruction. But trends in today's machines show a drift towards dedicated registers; for example, some registers may be dedicated to floating point operations. In some machines there is a distinct separation between data and address registers. The data registers are used only for storing intermediate results or data; they are not used for calculating operand addresses.

An address register may be a general-purpose register, but some dedicated address registers are also used in several machines. Examples of dedicated address registers can be:

Segment Pointer: Used to point to a segment of memory.

Index Register: Used for indexed addressing schemes.

Stack Pointer: Used when programmer-visible stack addressing is supported.

In the case of specialized registers, the number of bits needed to specify a register is reduced, since only a few registers out of the whole set need to be distinguished. However, this specialization does not allow much flexibility to the programmer.

Another issue related to register set design is the number of general-purpose registers, or data and address registers, to be provided in a microprocessor. The number of registers also affects the instruction design, as it determines the number of bits needed in an instruction to specify a register reference. In general, it has been found that the optimum number of registers in a CPU is in the range of 8 to 32. If the number of registers falls below this range, more memory references per instruction will be needed on average, as some of the intermediate results then have to be stored in memory.


On the other hand, if the number of registers goes above 32, then there is no appreciable reduction in memory references.

4.4 Status and Control Registers

Several registers are used for the control of various operations. These registers cannot be used in data manipulation; however, the content of some of them can be used by the programmer. Some of the control registers for a von Neumann machine are the Program Counter (PC), Memory Address Register (MAR) and Data Register. Almost all CPUs have a status register, a part of which may be programmer visible. A register formed by condition codes is called a condition code register. Some of the commonly used flags or condition codes in such a register are:

Sign flag : This indicates whether the sign of previous arithmetic operation was positive (0) or negative (1).

Zero flag : This flag bit will be set if the result of last arithmetic operation was zero.

Carry flag : This flag is set, if a carry results from the addition of the highest order bits or a borrow is taken on subtraction of highest order bit.

Equal flag : This bit flag will be set if a logic comparison operation finds out that both of its operands are equal.

Overflow flag : This flag is used to indicate the condition of arithmetic overflow.

Interrupt : This flag is used for enabling or disabling interrupts.

Supervisor flag: This flag is used in certain computers to determine whether the CPU is executing in supervisor or user mode. When the CPU is in supervisor mode, it is allowed to execute certain privileged instructions.

In most CPUs, on encountering a subroutine call or interrupt handling routine, it is desired that status information such as the condition codes and other register contents be stored and restored at the start and end of these routines respectively. The register that contains the condition codes plus other status information is often known as the Program Status Word (PSW). There can be several other status and control registers, such as an interrupt vector register in machines using vectored interrupts, or a stack pointer if a stack is used to implement subroutine calls.

Status and control register design is also dependent on Operating System (OS) support. A functional understanding of the OS helps in tailoring the register organization; in fact, some control information is of use only to the operating system.

One major decision to be taken in designing the status and control register organization is how to allocate control information between registers and memory. Generally the first few hundred or thousand words of memory are allocated for storing control information. It is the responsibility of the designer to determine how much control information should be in registers and how much in memory.


4.5 Register Transfer Language

• Rather than specifying a digital system in words, a specific notation is used: register transfer language.

• For any function of the computer, the register transfer language can be used to describe the (sequence of) microoperations

• Register transfer language: A symbolic language which is a convenient tool for describing the internal organization of digital computers, and which can also be used to facilitate the design process of digital systems.

• Registers are designated by capital letters, sometimes followed by numbers (e.g., A, R13, IR)
• Often the names indicate function:

– MAR - memory address register
– PC - program counter
– IR - instruction register

• Registers and their contents can be viewed and represented in various ways
  – A register can be viewed as a single entity
  – Registers may also be represented showing the bits of data they contain

4.6 Register Transfer

Computer registers are designated by capital letters that denote the function of the register. For example, the register that holds an address for the memory unit is usually called a memory address register and is designated by the name MAR. Other designations for registers are PC (for program counter), IR (for instruction register), and R1 (for processor register). The individual flip-flops in an n-bit register are numbered in sequence from 0 through n-1, starting with 0 in the rightmost position and increasing toward the left.

Block diagram of register

• Copying the contents of one register to another is a register transfer
• A register transfer is indicated as R2 ← R1
  – In this case the contents of register R1 are copied (loaded) into register R2
  – It is a simultaneous transfer of all bits from the source R1 to the destination register R2, during one clock pulse


– Note that this transfer is non-destructive; i.e., the contents of R1 are not altered by copying (loading) them to R2

4.7 Bus And Memory Transfer

A bus is a path (a group of wires) over which information is transferred from any of several sources to any of several destinations.

From a register to the bus: BUS ← R

Bus system for four registers

Transfer From Bus to a Destination Register


4.8 Arithmetic Micro operations

• The basic arithmetic microoperations are:
  – Addition
  – Subtraction
  – Increment
  – Decrement

• The additional arithmetic microoperations are:
  – Add with carry
  – Subtract with borrow
  – Transfer/Load


4.9 Logic Microoperations

Logic microoperations specify binary operations on the strings of bits in registers.

Logic microoperations are bit-wise operations; i.e., they work on the individual bits of data. They are useful for bit manipulation of binary data, and also for making logical decisions based on bit values.

There are, in principle, 16 different logic functions that can be defined over two binary input variables

• However, most systems implement only four of these: AND (∧), OR (∨), XOR (⊕), and Complement/NOT.

• The others can be created from combinations of these.
• List of Logic Microoperations

- 16 different logic operations with 2 binary variables
- n binary variables → 2^(2^n) functions


Truth tables for 16 functions of 2 variables and the corresponding 16 logic micro-operations

Application of logic Microoperation

• Logic microoperations can be used to manipulate individual bits or a portion of a word in a register

• Consider the data in a register A. In another register, B, is bit data that will be used to modify the contents of A

– Selective-set: A ← A + B
– Selective-complement: A ← A ⊕ B
– Selective-clear: A ← A • B′
– Mask (Delete): A ← A • B
– Clear: A ← A ⊕ B
– Insert: A ← (A • B) + C
– Compare: A ← A ⊕ B
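These microoperations correspond directly to bitwise operators (+ here means OR, • means AND); a sketch with 8-bit values, where the contents of A and B are illustrative:

```python
# Logic microoperations on 8-bit register values (A and B are illustrative).
MASK8 = 0xFF
A = 0b10101100
B = 0b01100101

selective_set        = (A | B) & MASK8    # A <- A + B (OR): set bits of A where B = 1
selective_complement = (A ^ B) & MASK8    # A <- A xor B: flip bits of A where B = 1
selective_clear      = A & (~B & MASK8)   # A <- A . B': clear bits of A where B = 1
mask_delete          = A & B              # A <- A . B: keep only bits of A where B = 1

print(f"{selective_set:08b}")             # 11101101
```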

4.10 Shift Micro operations

Logic microoperations specify binary operations on strings of bits stored in registers. These operations consider each bit of the register separately and treat them as binary variables. For example, the exclusive-OR microoperation on the contents of two registers R1 and R2 is symbolized by the statement

P: R1 ← R1 ⊕ R2

It specifies a logic microoperation to be executed on the individual bits of the registers, provided that the control variable P = 1.

a). There are three types of shifts:

  Logical shift
  Circular shift
  Arithmetic shift


b). What differentiates them is the information that goes into the serial input.

c).

A right shift operation

A left shift operation
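A sketch of the three right-shift variants on an 8-bit value, showing what enters the serial input in each case; the register width and contents are illustrative:

```python
# Three right-shift types on an 8-bit register (width and value are illustrative).
WIDTH, MASK = 8, 0xFF
r = 0b10010101

logical_right    = r >> 1                                  # 0 enters the serial input
circular_right   = ((r >> 1) | (r << (WIDTH - 1))) & MASK  # the LSB wraps around to the MSB
sign             = r & 0x80                                # arithmetic shift re-inserts the sign bit
arithmetic_right = (r >> 1) | sign

print(f"{logical_right:08b} {circular_right:08b} {arithmetic_right:08b}")
```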

4.11 Control Unit

The Control Unit can be thought of as the brain of the CPU itself. Based on the instructions it decodes, it controls how the other parts of the CPU, and in turn the rest of the computer system, should work so that the instruction is executed correctly. There are two types of control units. The first type is the hardwired control unit: hardwired control units are constructed using digital circuits and, once formed, cannot be changed. The other type is the microprogrammed control unit, which decodes and executes instructions by executing microprograms.

4.11.1 Structure of control unit

Control Unit Block Diagram

4.11.2 Hardwired Control Unit


Figure 1 is a block diagram showing the internal organization of a hard-wired control unit for our simple computer. Input to the controller consists of the 4-bit opcode of the instruction currently contained in the Instruction Register and the negative flag from the accumulator. The controller's output is a set of 16 control signals that go out to the various registers and to the memory of the computer, in addition to a HLT signal that is activated whenever the leading bit of the op-code is one. The controller is composed of the following functional units: A ring counter, an instruction decoder, and a control matrix.

The ring counter provides a sequence of six consecutive active signals that cycle continuously. Synchronized by the system clock, the ring counter first activates its T0 line, then its T1 line, and so forth. After T5 is active, the sequence begins again with T0. Figure 2 shows how the ring counter might be organized internally. The instruction decoder takes its four-bit input from the op-code field of the instruction register and activates one and only one of its 8 output lines. Each line corresponds to one of the instructions in the computer's instruction set. Figure 3 shows the internal organization of this decoder. The most important part of the hard-wired controller is the control matrix. It receives input from the ring counter and the instruction decoder and provides the proper sequence of control signals.

Figure 1.Block diagram of hard-wired control unit

Figure 2. The internal organization of Ring Counter


Figure 3.The internal organization of hard-wired Instruction Decoder

4.11.3 A Micro-programmed Control Unit

As we have seen, the controller causes instructions to be executed by issuing a specific set of control signals at each beat of the system clock. Each set of control signals issued causes one basic operation (micro-operation), such as a register transfer, to occur within the data path section of the computer. In the case of a hard-wired control unit the control matrix is responsible for sending out the required sequence of signals.

An alternative way of generating the control signals is that of micro-programmed control. In order to understand this method it is convenient to think of sets of control signals that cause specific micro-operations to occur as being "microinstructions" that could be stored in a memory. Each bit of a microinstruction might correspond to one control signal. If the bit is set it means that the control signal will be active; if cleared the signal will be inactive. Sequences of microinstructions could be stored in an internal "control" memory. Execution of a machine language instruction could then be caused by fetching the proper sequence of microinstructions from the control memory and sending them out to the data path section of the computer. A sequence of microinstructions that implements an instruction on the external computer is known as a micro-routine. The instruction set of the computer is thus determined by the set of micro-routines, the "microprogram," stored in the controller's memory. The control unit of a microprogram-controlled computer is essentially a computer within a computer.

Figure 4. A Microprogrammed Control Unit for the Simple Computer


Figure 4 is a block diagram of a micro-programmed control unit that may be used to implement the instruction set of the computer we described above. The heart of the controller is the 32 x 24 control ROM, in which up to 32 microinstructions, each 24 bits long, can be stored. Each microinstruction is composed of two main fields: a 16-bit wide control-signal field and an 8-bit wide next-address field. Each bit in the control-signal field corresponds to one of the control signals discussed above. The next-address field contains bits that determine the address of the next microinstruction to be fetched from the control ROM. We shall see the details of how these bits work shortly. Words selected from the control ROM feed the microinstruction register. This 24-bit wide register is analogous to the outer machine's instruction register. Specifically, the leading 16 bits (the control-signal field) of the microinstruction register are connected to the control-signal lines that go to the various components of the external machine's data path section.

Addresses provided to the control ROM come from a micro-counter register, which is analogous to the external machine's program counter. The micro-counter, in turn, receives its input from a multiplexer which selects from : (1) the output of an address ROM, (2) a current-address incrementer, or (3) the address stored in the next-address field of the current microinstruction.


4.12 Case Study: 8085 Microprocessor

• The 8085 is an 8-bit general purpose microprocessor that can address 64K bytes of memory. It has 40 pins and uses a +5V power supply. It can run at a maximum frequency of 3 MHz. The pins on the chip can be grouped into 6 groups:

• Address Bus.

• Data Bus.

• Control and Status Signals.

• Power supply and frequency.

• Externally Initiated Signals.

• Serial I/O ports.

The Address and Data Busses

• The address bus has 8 signal lines, A8 – A15, which are unidirectional. The other 8 address bits are multiplexed (time-shared) with the 8 data bits: the bits AD0 – AD7 are bi-directional and serve as both A0 – A7 and D0 – D7. During the execution of an instruction, these lines carry the address bits during the early part, then carry the 8 data bits during the later parts of the execution. In order to separate the address from the data, we can use a latch to save the address value before the function of the bits changes.

The Control and Status Signals

• There are 4 main control and status signals. These are:

• ALE: Address Latch Enable. This signal is a pulse that becomes 1 when the AD0 – AD7 lines have an address on them, and becomes 0 after that. This signal can be used to enable a latch to save the address bits from the AD lines.

• RD: Read. Active low.

• WR: Write. Active low.

• IO/M: This signal specifies whether the operation is a memory operation (IO/M=0) or an I/O operation (IO/M=1).

• S1 and S0 : Status signals to specify the kind of operation being performed .Usually un-used in small systems.

Frequency Control Signals

• There are 3 important pins in the frequency control group. X1 and X2 are the inputs from the crystal or clock-generating circuit. The frequency is internally divided by 2, so to run the microprocessor at 3 MHz, a clock running at 6 MHz should be connected to the X1 and X2 pins.


– CLK (OUT): An output clock pin to drive the clock of the rest of the system.

Steps For Fetching an Instruction

• Let's assume that we are trying to fetch the instruction at memory location 2005. That means the program counter is now set to that value. The following is the sequence of operations:

• The program counter places the address value on the address bus and the controller issues a RD signal.

• The memory’s address decoder gets the value and determines which memory location is being accessed.

• The value in the memory location is placed on the data bus.

• The value on the data bus is read into the instruction decoder inside the microprocessor.

• After decoding the instruction, the control unit issues the proper control signals to perform the operation.
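The fetch steps above can be mocked up as straight-line Python; the opcode value stored at address 2005h is invented for illustration:

```python
# Mock-up of the instruction fetch sequence (the opcode value is invented).
memory = {0x2005: 0x3E}          # pretend an opcode lives at address 2005h

pc = 0x2005                      # program counter holds the instruction's address
address_bus = pc                 # 1. PC places the address on the address bus (RD asserted)
data_bus = memory[address_bus]   # 2-3. memory decodes the address and drives the data bus
instruction_register = data_bus  # 4. the CPU latches the value for decoding
pc += 1                          # PC advances to the next location

print(hex(instruction_register))  # 0x3e
```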

MULTIPLE CHOICE QUESTIONS

1. Which register example holds the instruction itself?

A) Program Counter
B) Instruction Register
C) Control Unit
D) Arithmetic Logic Unit

2. To accomplish a task a computer has to process data in three stages. They are:


3. The CPU is also known as:

A) The Brain
B) The Processor
C) The Central Processing Unit
D) All of the above

4. Main Memory holds data and instructions being processed by the computer, and this memory is accessible by the CPU.

a) True
b) False

5. It is difficult to classify computer systems on the basis of their system performance, as newer, smaller computer systems outperform the larger models of yesteryear.

a) True
b) False

6.The two smaller units of the processor are CU and ALU

a). True

b). False

7.Which smaller unit of the CPU directs and coordinates all activities within it and determines the sequence in which instructions are executed, sending instruction sequence to the other units.

a) CU

b) ALU

c) PROCESSOR

d) All of the above


8. The CPU's primary responsibility is the movement of data and instructions from itself to main memory and the ALU, and back. Arrange the CU's execution of an instruction in the correct order by placing the letters in the boxes provided:

a) Send data to memory unit after processing

b) Fetches data required by the instruction from memory

c) Fetches the instruction from memory

d) Decode the instruction

e) Send data and instruction to the ALU for processing

9. Which smaller CPU unit contains registers-temporary storage locations that hold a single instruction or data item needed immediately and frequently

a) CU

b) ALU

c) PROCESSOR

d) All of the above

10. Program counter (PC) and instruction register (IR) are examples of registers:

a) True

b) False

Answers1 ,2 ,3 ,4 ,5 ,6 ,7 8, 9, 10

Chapter 5

Computer Arithmetic

5.1 INTRODUCTION

Because electronic logic deals with currents that are on or off, it has been found convenient to represent quantities in binary form to perform arithmetic on a computer. Thus, instead of having ten different digits, 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9, binary arithmetic has only two different digits, 0 and 1; and when moving to the next column, instead of the digit representing a quantity that is ten times as large, it only represents a quantity that is two times as large. Thus, the first few numbers are written in binary as follows:

         Decimal  Binary
Zero        0        0
One         1        1
Two         2       10
Three       3       11
Four        4      100
Five        5      101
Six         6      110
Seven       7      111
Eight       8     1000
Nine        9     1001
Ten        10     1010
Eleven     11     1011
Twelve     12     1100

The addition and multiplication tables for binary arithmetic are very small, and this makes it possible to use logic circuits to build binary adders.

+ | 0   1        * | 0   1
----------       ----------
0 | 0   1        0 | 0   0
1 | 1  10        1 | 0   1

Thus, from the table above, when two binary digits, A and B are added, the carry bit is simply (A AND B), while the last digit of the sum is more complicated; ((A AND NOT B) OR ((NOT A) AND B)) is one way to express it.
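The carry and sum expressions above form a half adder; a short sketch that checks them against ordinary integer addition:

```python
# Half adder built from the boolean expressions in the text.
def half_adder(a, b):
    carry = a & b                       # carry bit is (A AND B)
    s = (a & ~b & 1) | (~a & 1 & b)     # sum bit is (A AND NOT B) OR ((NOT A) AND B)
    return carry, s

# The pair (carry, sum) agrees with ordinary addition for every input combination.
for a in (0, 1):
    for b in (0, 1):
        carry, s = half_adder(a, b)
        assert 2 * carry + s == a + b
```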

5.2 Number Systems

The ALU does calculations with binary numbers.

• Decimal number system
  Uses 10 digits (0,1,2,3,4,5,6,7,8,9)
  In the decimal system, a number such as 83 means
    83 = (8x10) + 3
    4728 = (4x1000) + (7x100) + (2x10) + 8
  Base or radix of 10: each digit in the number is multiplied by 10 raised to a power corresponding to that digit's position, e.g.
    83 = (8x10^1) + (3x10^0)
    4728 = (4x10^3) + (7x10^2) + (2x10^1) + (8x10^0)

Decimal number system…

• Fractional values, e.g. 472.83 = (4x10^2) + (7x10^1) + (2x10^0) + (8x10^-1) + (3x10^-2)


In general, for the decimal representation of X = { … x2 x1 x0 . x-1 x-2 x-3 … },

X = Σi (xi x 10^i)

Binary Number System

• Uses only two digits, 0 and 1
• It is base or radix 2
• Each digit has a value depending on its position:
  10(2) = (1x2^1) + (0x2^0) = 2(10)
  11(2) = (1x2^1) + (1x2^0) = 3(10)
  100(2) = (1x2^2) + (0x2^1) + (0x2^0) = 4(10)
  1001.101(2) = (1x2^3) + (0x2^2) + (0x2^1) + (1x2^0) + (1x2^-1) + (0x2^-2) + (1x2^-3) = 9.625(10)

Decimal to Binary conversion

• Integer and fractional parts are handled separately:
  The integer part is handled by repeated division by 2
  The fractional part is handled by repeated multiplication by 2
• E.g. convert decimal 11.81 to binary:
  Integer part: 11
  Fractional part: .81

Decimal to Binary conversion, e.g.

• e.g. 11.81 → 1011.11001 (approx)
  11/2 = 5 remainder 1
  5/2 = 2 remainder 1
  2/2 = 1 remainder 0
  1/2 = 0 remainder 1
  → binary number 1011
  .81x2 = 1.62  integral part 1
  .62x2 = 1.24  integral part 1
  .24x2 = 0.48  integral part 0
  .48x2 = 0.96  integral part 0
  .96x2 = 1.92  integral part 1
  → binary number .11001 (approximate)
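The repeated-division and repeated-multiplication procedure can be written directly; `to_binary` and its `frac_bits` parameter are illustrative names, not from the text:

```python
# Decimal-to-binary conversion: repeated division by 2 for the integer part,
# repeated multiplication by 2 for the fractional part (as in the 11.81 example).
def to_binary(value, frac_bits=5):
    integer, frac = int(value), value - int(value)
    int_bits = ""
    while integer > 0:
        int_bits = str(integer % 2) + int_bits  # remainders; the last one is the MSB
        integer //= 2
    frac_str = ""
    for _ in range(frac_bits):
        frac *= 2
        frac_str += str(int(frac))              # the integral parts become the bits
        frac -= int(frac)
    return (int_bits or "0") + "." + frac_str

print(to_binary(11.81))   # 1011.11001 (approximate)
```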

Hexadecimal Notation:

Common ground between computer and human.
• Uses 16 digits (0,1,2,…,9,A,B,C,D,E,F)
• 1A(16) = (1 x 16^1) + (A x 16^0)
    = (1(10) x 16^1) + (10(10) x 16^0) = 26(10)
• Convert a group of four binary digits to/from one hexadecimal digit:

0000=0; 0001=1; 0010=2; 0011=3; 0100=4; 0101=5; 0110=6; 0111=7; 1000=8; 1001=9; 1010=A; 1011=B; 1100=C; 1101=D; 1110=E; 1111=F;

• e.g. 1101 1110 0001 . 1110 1101 = DE1.ED


Integer Representation (storage)

• Only have 0 & 1 to represent everything
• Positive numbers stored in binary, e.g. 41 = 00101001
• No minus sign
• No period
• How to represent negative numbers?

Sign-Magnitude Two’s compliment

Sign-Magnitude

• Leftmost bit is the sign bit: 0 means positive, 1 means negative
• +18 = 00010010
• -18 = 10010010
• Problems:
  - Need to consider both sign and magnitude in arithmetic
  - Two representations of zero (+0 and -0)

Two's Complement (representation)

• +3 = 00000011
• +2 = 00000010
• +1 = 00000001
• +0 = 00000000
• -1 = 11111111
• -2 = 11111110
• -3 = 11111101

Benefits

• One representation of zero
• Arithmetic works easily (see later)
• Negating is fairly easy (the two's complement operation):

  3 = 00000011
  Boolean complement gives 11111100
  Add 1 to LSB: 11111101 (= -3)
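The complement-and-add-1 negation shown above, as a small sketch for an 8-bit word (the function name is illustrative):

```python
def twos_complement(x, bits=8):
    """Negate by inverting all bits (Boolean complement) and adding 1;
    the mask keeps the result inside the word width."""
    return (~x + 1) & ((1 << bits) - 1)

print(format(twos_complement(3), "08b"))  # -> 11111101
```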

5.3 Addition and Subtraction

• Normal binary addition, three 4-bit examples:

    0011        0101        1100
  + 0100      + 0100      + 1111
  ------      ------      ------
    0111        1001       11011
              (overflow)  (carry out)


• Monitor the sign bit for overflow (overflow occurs when the sign bit changes while adding two positive numbers or two negative numbers).

• Subtraction: take the two's complement of the subtrahend, then add it to the minuend, i.e. a - b = a + (-b)

• So we only need addition and complement circuits
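A minimal sketch of subtraction built from the adder and complementer alone (an 8-bit word is assumed; names are illustrative):

```python
MASK = 0xFF  # 8-bit word

def neg(b):
    return (~b + 1) & MASK  # two's complement of the subtrahend

def sub(a, b):
    return (a + neg(b)) & MASK  # a - b = a + (-b), using only the adder

print(sub(0b00010010, 0b00000011))  # 18 - 3 -> 15
```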

5.4 Multiplication

• Complex
• Work out partial product for each digit
• Take care with place value (column)
• Add partial products

Multiplication Example

• Unsigned example:

      1011     Multiplicand (11 dec)
    x 1101     Multiplier (13 dec)
    ------
      1011     Partial products: if the multiplier bit is 1,
     0000      copy the multiplicand (shifted to its place value),
    1011       otherwise the partial product is zero
   1011
  --------
  10001111     Product (143 dec)

• Note: a double-length result is needed
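The partial-product scheme can be sketched as shift-and-add (unsigned operands; illustrative only):

```python
def multiply(multiplicand, multiplier, bits=4):
    """Shift-and-add: for each 1 bit of the multiplier, add the
    multiplicand shifted to that bit's place value."""
    product = 0
    for i in range(bits):
        if (multiplier >> i) & 1:
            product += multiplicand << i  # partial product
    return product  # double-length result

print(multiply(0b1011, 0b1101))  # 11 x 13 -> 143
```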

5.4.1 Booth’s Algorithm for multiplication


Example of Booth’s Algorithm
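Since the worked figure is not reproduced here, the following is a sketch of Booth's algorithm for signed operands (register names A, Q, Q-1 follow the usual textbook convention; the 4-bit width is an assumption):

```python
def booth_multiply(m, q, bits=4):
    """Booth's algorithm for signed multiplication of bits-wide operands.
    A, Q and Q-1 together form the double-length result register."""
    mask = (1 << bits) - 1
    M = m & mask
    A, Q, Q_1 = 0, q & mask, 0
    for _ in range(bits):
        pair = ((Q & 1) << 1) | Q_1
        if pair == 0b10:          # bit pair 1,0 -> A = A - M
            A = (A - M) & mask
        elif pair == 0b01:        # bit pair 0,1 -> A = A + M
            A = (A + M) & mask
        # arithmetic shift right of A,Q,Q-1 (replicate A's sign bit)
        sign = A >> (bits - 1)
        Q_1 = Q & 1
        Q = ((Q >> 1) | ((A & 1) << (bits - 1))) & mask
        A = ((A >> 1) | (sign << (bits - 1))) & mask
    result = (A << bits) | Q
    if result >> (2 * bits - 1):      # interpret as signed 2*bits value
        result -= 1 << (2 * bits)
    return result

print(booth_multiply(7, 3))    # 7 x 3 -> 21
print(booth_multiply(7, -3))   # 7 x -3 -> -21
```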

5.5 Division

• More complex than multiplication
• However, can utilize most of the same hardware
• Based on long division

Division of Unsigned Binary Integers

(Worked example of unsigned binary long division, showing the divisor, dividend, partial remainders, quotient, and remainder: figure not reproduced.)
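As a sketch of the long-division idea, restoring division can be written as follows (the example values 147 ÷ 11 are illustrative, not taken from the figure):

```python
def divide(dividend, divisor, bits=8):
    """Restoring division: shift the next dividend bit into the partial
    remainder, subtract the divisor, and restore if the result went
    negative; the quotient bit records whether the subtract succeeded."""
    remainder = 0
    quotient = 0
    for i in range(bits - 1, -1, -1):
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        remainder -= divisor
        if remainder < 0:
            remainder += divisor            # restore
            quotient = (quotient << 1) | 0
        else:
            quotient = (quotient << 1) | 1
    return quotient, remainder

print(divide(0b10010011, 0b1011))  # 147 / 11 -> (13, 4)
```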


Flowchart for Unsigned Binary division

Real Numbers

• Numbers with fractions
• Could be done in pure binary: 1001.101 = 2^3 + 2^0 + 2^-1 + 2^-3 = 9.625
• Where is the binary point?
• Fixed? Very limited
• Moving? How do you show where it is?

5.6 Floating Point

• +/- .significand x 2^exponent
• The point is actually fixed between the sign bit and the body of the mantissa
• The exponent indicates the place value (point position)


Signs for Floating Point

• Exponent is in excess or biased notation

e.g. Excess (bias) 127 means

8 bit exponent field

Pure value range 0-255

Subtract 127 to get correct value

Range -127 to +128

• The relative magnitudes (order) of the numbers do not change. Can be treated as integers for comparison.
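The excess-127 encoding and its order-preserving property can be sketched as (names are illustrative):

```python
BIAS = 127  # excess (bias) 127, as used for an 8-bit exponent field

def encode_exponent(e):
    """True exponent -> stored (biased) value, pure range 0..255."""
    return e + BIAS

def decode_exponent(stored):
    """Stored value -> true exponent, range -127..+128."""
    return stored - BIAS

print(encode_exponent(0))    # -> 127
print(encode_exponent(3))    # -> 130
print(decode_exponent(255))  # -> 128
```

Because the bias is a constant offset, biased exponents compare like plain unsigned integers, which is the property noted above.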

5.6.1 Normalization

Floating Point Examples


• FP numbers are usually normalized
• i.e. the exponent is adjusted so that the leading bit (MSB) of the mantissa is 1
• Since it is always 1, there is no need to store it
• (c.f. scientific notation, where numbers are normalized to give a single digit before the decimal point, e.g. 3.123 x 10^3)

FP Ranges

• For a 32-bit number with an 8-bit exponent: +/- 2^256 ≈ 1.5 x 10^77

• Accuracy: the effect of changing the LSB of the mantissa; for a 23-bit mantissa, 2^-23 ≈ 1.2 x 10^-7, about 6 decimal places

Expressible Numbers

IEEE 754

• Standard for floating-point storage
• 32- and 64-bit standards
• 8- and 11-bit exponents respectively
• Extended formats (both mantissa and exponent) for intermediate results
• Representation: sign, exponent, fraction

  0:              0, 0, 0
  -0:             1, 0, 0
  Plus infinity:  0, all 1s, 0
  Minus infinity: 1, all 1s, 0
  NaN:            0 or 1, all 1s, ≠ 0
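A small sketch that unpacks a 32-bit value into the sign, exponent, fraction fields listed above (using Python's struct module; illustrative only):

```python
import struct

def decode_float32(x):
    """Unpack a Python float into IEEE 754 single-precision fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # 8-bit biased exponent
    fraction = bits & 0x7FFFFF       # 23-bit fraction (hidden 1 not stored)
    return sign, exponent, fraction

print(decode_float32(1.0))           # -> (0, 127, 0)
print(decode_float32(-0.0))          # -> (1, 0, 0)
print(decode_float32(float("inf")))  # -> (0, 255, 0)
```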

FP Arithmetic +/-

• Check for zeros
• Align significands (adjusting exponents)
• Add or subtract significands


• Normalize result

FP Arithmetic x/÷

• Check for zero
• Add/subtract exponents
• Multiply/divide significands (watch the sign)
• Normalize
• Round
• All intermediate results should be in double-length storage

Floating Point Division

Floating Point Multiplication


End-of-chapter quiz

Q1. The Binary Number system uses the radix of


a) 2  b) 3  c) 1  d) 0

Q2. (41)₁₀ is equivalent to

a) (101001)₂  b) (.10111)₂  c) (101010)₂  d) (156)₂

Q3. (0.1011)₂ is equivalent to

a) (41)₁₀  b) (41)₈  c) (0.6785)₁₀  d) (0.6875)₁₀

Q4. What is the result of adding the following two positive binary bit strings?

101101.101

+101100.0010

a) 1000001.1110

b) 1000001.1010

c) 1000001.1000

d) 1000001.1100

Q5. What name is used to describe the bit of lowest magnitude within a byte or bit string?

a) radix point
b) low order bit
c) high order bit
d) op code

Q6. What is the best way to write the value '7654' and make it clear to the reader that the number should be interpreted as a hexadecimal value?

a) 0x7654
b) 7654B
c) 7654H
d) \7654

Q7. What is the numeric range of an eight-bit unsigned binary number?

a) 0..7
b) 1..8
c) 0..255
d) 1..256

Q8. What coding format encodes a real number as a mantissa multiplied by a power (exponent) of two?

a) binary
b) excess notation
c) floating point
d) two's complement

Q9. Which of the following does not result from floating point math operations?

a) underflow
b) overflow
c) truncation
d) two's complement

Q10. How many bits of information can each memory cell in a computer chip hold?

a). 0 bit

b). 1 bit

c). 8 bits

Answers

1.a, 2.a, 3.d, 4.d, 5.b, 6.a, 7.c, 8.c, 9.d, 10.b

Chapter 6

Input-Output Organization


INTRODUCTION

In computing, input/output, or I/O, refers to the communication between an information processing system (such as a computer) and the outside world – possibly a human, or another information processing system. Inputs are the signals or data received by the system, and outputs are the signals or data sent from it. The term can also be used as part of an action; to "perform I/O" is to perform an input or output operation. I/O devices are used by a person (or other system) to communicate with a computer. For instance, keyboards and mice are considered input devices of a computer, while monitors and printers are considered output devices of a computer. Devices for communication between computers, such as modems and network cards, typically serve for both input and output.

Note that the designation of a device as either input or output depends on the perspective. Mice and keyboards take as input physical movement that the human user outputs and convert it into signals that a computer can understand. The output from these devices is input for the computer. Similarly, printers and monitors take as input signals that a computer outputs. They then convert these signals into representations that human users can see or read. (For a human user the process of reading or seeing these representations is receiving input.)

In computer architecture, the combination of the CPU and main memory (i.e. memory that the CPU can read and write to directly, with individual instructions) is considered the brain of a computer, and from that point of view any transfer of information from or to that combination, for example to or from a disk drive, is considered I/O. The CPU and its supporting circuitry provide memory-mapped I/O that is used in low-level computer programming in the implementation of device drivers.

INPUT-OUTPUT ORGANIZATION

• Peripheral Devices

• Input-Output Interface

• Asynchronous Data Transfer

• Modes of Transfer

• Priority Interrupt

• Direct Memory Access

• Input-Output Processor

• Serial Communication

6.1 PERIPHERAL DEVICES

6.1.1 Keyboard

A keyboard is a human interface device which is represented as a layout of buttons. Each button, or key, can be used either to input a linguistic character to a computer or to call upon a particular function of the computer. Traditional keyboards use spring-based buttons, though newer variations employ virtual keys, or even projected keyboards.

Examples of types of keyboards include:

Computer keyboard

Keyer

Chorded keyboard

LPFK

6.1.2. Pointing devices

A pointing device is any human interface device that allows a user to input spatial data to a computer. In the case of mice and touch screens, this is usually achieved by detecting movement across a physical surface. Analog devices, such as 3D mice, joysticks, or pointing sticks, function by reporting their angle of deflection. Movements of the pointing device are echoed on the screen by movements of the cursor, creating a simple, intuitive way to navigate a computer's GUI.

6.1.3. High-degree of freedom input devices

Some devices allow many continuous degrees of freedom as input. These can be used as pointing devices, but are generally used in ways that don't involve pointing to a location in space, such as controlling a camera angle in 3D applications. These kinds of devices are typically used in CAVEs, where input that registers 6DOF is required.

6.1.4. Imaging and Video input devices

Video input devices are used to digitize images or video from the outside world into the computer. The information can be stored in a multitude of formats depending on the user's requirement.

Webcam

Image scanner

Fingerprint scanner

Barcode reader

3D scanner

Laser rangefinder

6.1.5. Medical Imaging

o Computed tomography
o Magnetic resonance imaging
o Positron emission tomography
o Medical ultrasonography

6.2 Output Devices

An output device is any piece of computer hardware equipment used to communicate the results of data processing carried out by an information processing system (such as a computer) to the outside world. In computing, input/output, or I/O, refers to the communication between an information processing system (such as a computer), and the outside world. Inputs are the signals or data sent to the system, and outputs are the signals or data sent by the system to the outside.

The most common input devices used with a computer are the keyboard and mouse. The keyboard allows the entry of textual information, while the mouse allows the selection of a point on the screen by moving a screen cursor to the point and pressing a mouse button. The most common output devices are monitors and speakers.

o Card Puncher, Paper Tape Puncher
o CRT
o Printer (Impact, Ink Jet, Laser, Dot Matrix)
o Plotter

o Analog

o Voice

6.3 INPUT/OUTPUT INTERFACE

• Provides a method for transferring information between internal storage (such as memory and CPU registers) and external I/O devices

• Resolves the differences between the computer and peripheral devices

– Peripherals - Electromechanical Devices

– CPU or Memory - Electronic Device

– Data Transfer Rate

» Peripherals - Usually slower

» CPU or Memory - Usually faster than peripherals

• Some kind of synchronization mechanism may be needed

– Unit of Information

» Peripherals – Byte, Block, …

» CPU or Memory – Word


– Data representations may differ

6.3.1 BUS AND INTERFACE MODULES

Each peripheral has an interface module associated with it. The interface:

- Decodes the device address (device code)

- Decodes the commands (operation)

- Provides signals for the peripheral controller

- Synchronizes the data flow and supervises the transfer rate between peripheral and CPU or Memory

Typical I/O instruction format: Op. code | Device address | Function code (commands).

6.3.2. CONNECTION OF I/O BUS

6.3.2.1 Connection of I/O Bus to CPU


6.3.2.2. Connection of I/O to One Interface

6.3.3. I/O BUS AND MEMORY BUS

Functions of Buses

* The MEMORY BUS is for information transfers between the CPU and main memory (MM)

* I/O BUS is for information transfers between CPU and I/O devices through their I/O interface

* Many computers use a common single bus system for both memory and I/O interface units

- Use one common bus but separate control lines for each function

- Use one common bus with common control lines for both functions

* Some computer systems use two separate buses, one to communicate with memory and the other with I/O interfaces

- Communication between CPU and all interface units is via a common I/O Bus

- An interface connected to a peripheral device may have a number of data registers , a control register, and a status register

- A command is passed to the peripheral by sending it to the appropriate interface register

- Function code and sense lines are not needed (Transfer of data, control, and status information is always via the common I/O Bus)

6.3.4. ISOLATED vs MEMORY MAPPED I/O

Isolated I/O


Separate I/O read/write control lines in addition to memory read/write control lines.

- Separate (isolated) memory and I/O address spaces.

- Distinct input and output instructions.

Memory-mapped I/O

A single set of read/write control lines(no distinction between memory and I/O transfer)

- Memory and I/O addresses share the common address space

-> reduces memory address range available

- No specific input or output instruction

-> The same memory reference instructions can be used for I/O transfers

- Considerable flexibility in handling I/O operations

6.3.5. I/O INTERFACE

6.3.6. Programmable Interface

- Information in each port can be assigned a meaning depending on the mode of operation of the I/O device. → Port A = Data; Port B = Command; Port C = Status

- The CPU initializes (loads) each port by transferring a byte to the Control Register.

→ Allows the CPU to define the mode of operation of each port.

→ Programmable Port: By changing the bits in the control register, it is possible to change the interface characteristics.


6.4 ASYNCHRONOUS DATA TRANSFER

6.4.1. Synchronous and Asynchronous Operations

Synchronous - All devices derive their timing information from a common clock line.

Asynchronous - No common clock.

6.4.2 Asynchronous Data Transfer

Asynchronous data transfer between two independent units requires that control signals be transmitted between the communicating units to indicate the time at which data is being transmitted.

Two Asynchronous Data Transfer Methods

Strobe pulse: A strobe pulse is supplied by one unit to indicate to the other unit when the transfer has to occur.

Handshaking: A control signal accompanies each item of data being transmitted to indicate the presence of data. The receiving unit responds with another control signal to acknowledge receipt of the data.

6.4.2.1 STROBE CONTROL

* Employs a single control line to time each transfer

* The strobe may be activated by either the source or the destination unit

Source-Initiated Strobe for Data Transfer

Destination-Initiated Strobe for Data Transfer

6.4.2.2. HANDSHAKING

Strobe Methods


1. Source-Initiated

The source unit that initiates the transfer has no way of knowing whether the destination unit has actually received data

2. Destination-Initiated

The destination unit that initiates the transfer has no way of knowing whether the source has actually placed the data on the bus. To solve this problem, the HANDSHAKE method introduces a second control signal that provides a reply to the unit that initiates the transfer.

SOURCE-INITIATED TRANSFER USING HANDSHAKE

* Allows arbitrary delays from one state to the next

* Permits each unit to respond at its own data transfer rate

* The rate of transfer is determined by the slower unit

DESTINATION-INITIATED TRANSFER USING HANDSHAKE

6.4.3. ASYNCHRONOUS SERIAL TRANSFER

Four Different Types of Transfer

Asynchronous serial transfer


Synchronous serial transfer

Asynchronous parallel transfer

Synchronous parallel transfer

Asynchronous Serial Transfer

- Employs special bits inserted at both ends of the character code.

- Each character consists of three parts: a Start bit, Data bits, and Stop bits.

A character can be detected by the receiver from knowledge of four rules:

- When data are not being sent, the line is kept in the 1-state (idle state).

- The initiation of a character transmission is detected by a Start Bit, which is always 0.

- The character bits always follow the Start Bit.

- After the last character bit, a Stop Bit is detected when the line returns to the 1-state for at least 1 bit time.

The receiver knows in advance the transfer rate of the bits and the number of information bits to expect.
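The start/data/stop framing rules can be sketched as follows (LSB-first data order is assumed, as is conventional; no parity bit in this sketch):

```python
def frame_character(byte, stop_bits=1):
    """Build an asynchronous serial frame: a start bit of 0, eight data
    bits sent LSB first, then stop bits at the idle 1-level."""
    data = [(byte >> i) & 1 for i in range(8)]
    return [0] + data + [1] * stop_bits

print(frame_character(0x41))  # 'A' -> [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
```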

UNIVERSAL ASYNCHRONOUS RECEIVER-TRANSMITTER (UART)


Transmitter Register

- Accepts a data byte (from the CPU) through the data bus

- Transferred to a shift register for serial transmission

Receiver

- Receives serial information into another shift register

- Complete data byte is sent to the receiver register

Status Register Bits

- Used for I/O flags and for recording errors

Control Register Bits.

Define baud rate, no. of bits in each character, whether to generate and check parity, and no. of stop bits.

FIRST-IN-FIRST-OUT(FIFO) BUFFER

* Input data and output data at two different rates

* Output data are always in the same order in which the data entered the buffer.

* Useful in some applications when data is transferred asynchronously

4 x 4 FIFO Buffer: four 4-bit registers Ri, and

four control registers (flip-flops Fi), one associated with each Ri.
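A software model of the FIFO's behaviour (a sketch, not the register/flip-flop implementation described above; names are illustrative):

```python
from collections import deque

class Fifo:
    """Model of a 4 x 4-bit FIFO buffer: output order always matches
    input order, while producer and consumer run at different rates."""
    def __init__(self, depth=4):
        self.depth = depth
        self.slots = deque()

    def write(self, nibble):
        if len(self.slots) == self.depth:
            raise BufferError("FIFO full")
        self.slots.append(nibble & 0xF)

    def read(self):
        if not self.slots:
            raise BufferError("FIFO empty")
        return self.slots.popleft()

f = Fifo()
for v in (0b0001, 0b0010, 0b0100):
    f.write(v)
print(f.read(), f.read())  # -> 1 2  (first in, first out)
```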

MODES OF TRANSFER - PROGRAM-CONTROLLED I/O -


Three different data transfer modes exist between the central computer (CPU or Memory) and peripherals:

1. Program-Controlled I/O

2. Interrupt-Initiated I/O

3. Direct Memory Access (DMA)

Program-Controlled I/O(Input Dev to CPU)

MODES OF TRANSFER - INTERRUPT INITIATED I/O & DMA

- Polling takes valuable CPU time.

- Open communication only when some data has to be passed -> Interrupt.

- I/O interface, instead of the CPU, monitors the I/O device.

- When the interface determines that the I/O device is ready for data transfer, it generates an Interrupt Request to the CPU.

- Upon detecting an interrupt, the CPU momentarily stops the task it is doing, branches to the service routine to process the data transfer, and then returns to the task it was performing.

DMA (Direct Memory Access)

- Large blocks of data transferred at a high speed to or from high speed devices, magnetic drums, disks, tapes, etc.


- DMA controller : Interface that provides I/O transfer of data directly to and from the memory and the I/O device.

- CPU initializes the DMA controller by sending a memory address and the number of words to be transferred.

- Actual transfer of data is done directly between the device and memory through DMA controller.

-> Freeing CPU for other tasks

PRIORITY INTERRUPT

Priority

- Determines which interrupt is to be served first when two or more requests are made simultaneously, and also which interrupts are permitted to interrupt the computer while another is being serviced.

- Higher priority interrupts can make requests while servicing a lower priority interrupt .

Priority Interrupt by Software(Polling)

- Priority is established by the order in which the devices (interrupt sources) are polled; flexible, since it is established by software.

- Low cost, since it needs very little hardware.

- Very slow.

Priority Interrupt by Hardware

Requires a priority interrupt manager which accepts all the interrupt requests and determines the highest-priority request. Fast, since the highest-priority interrupt request is identified by hardware, and since each interrupt source has its own interrupt vector to access its own service routine directly.

HARDWARE PRIORITY INTERRUPT - DAISY-CHAIN -

An interrupt request from any device (one or more)

-> The CPU responds with INTACK <- 1.

-> Any device that receives the signal INTACK = 1 at its PI input puts its VAD (vector address) on the bus.

Among the requesting devices, only the device physically closest to the CPU receives INTACK = 1; it blocks INTACK from propagating to the next device in the chain.


One stage of the daisy chain priority arrangement

PARALLEL PRIORITY INTERRUPT

IEN: Set or Clear by instructions ION or IOF.

IST: Indicates that an unmasked interrupt has occurred. INTACK enables the tristate bus buffer to load the VAD generated by the Priority Logic.

Interrupt Register:

- Each bit is associated with an Interrupt Request from different Interrupt Source - different priority level.

- Each bit can be cleared by a program instruction.

Mask Register:

- Mask Register is associated with Interrupt Register.

- Each bit can be set or cleared by an Instruction.


INTERRUPT PRIORITY ENCODER

Determines the highest-priority interrupt when more than one interrupt takes place.

Priority Encoder Truth table
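Since the truth table itself is not reproduced, the encoder's behaviour can be sketched as follows (level 0 is assumed to be the highest priority):

```python
def priority_encode(requests):
    """Return the level of the highest-priority pending request
    (index 0 = highest priority), or None if no request is active."""
    for level in range(len(requests)):
        if requests[level]:
            return level
    return None

print(priority_encode([0, 1, 1, 0]))  # -> 1 (level 1 outranks level 2)
print(priority_encode([0, 0, 0, 0]))  # -> None
```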

INTERRUPT CYCLE

At the end of each Instruction cycle :

- CPU checks IEN and IST.

- If IEN · IST = 1, the CPU enters the Interrupt Cycle:

SP <- SP - 1      Decrement stack pointer.

M[SP] <- PC       Push PC onto the stack.

INTACK <- 1       Enable interrupt acknowledge.

PC <- VAD         Transfer the vector address to PC.

IEN <- 0          Disable further interrupts.

Go to Fetch to execute the first instruction of the interrupt service routine.

INTERRUPT SERVICE ROUTINE


Initial and Final Operations

Each interrupt service routine must have an initial and final set of operations for controlling the registers in the hardware interrupt system.


6.5 DIRECT MEMORY ACCESS

* Block of data transfer from high speed devices, Drum, Disk, Tape.

* DMA controller - Interface which allows I/O transfer directly between Memory and Device, freeing CPU for other tasks.

* CPU initializes DMA Controller by sending memory address and the block size (number of words) .

CPU bus signals for DMA transfer

Block diagram of DMA controller


6.5.1. DMA I/O OPERATION

Starting an I/O

- CPU executes instruction to

a. Load Memory Address Register

b. Load Word Counter

c. Load Function (Read or Write) to be performed

d. Issue a GO command

Upon receiving a GO command, the DMA controller performs the I/O operation as follows, independently of the CPU.

[1] Input Device <- R (Read control signal).

[2] Buffer (DMA Controller) <- input byte; bytes are assembled into a word until the word is full.

[3] M <- memory address, W (Write control signal): the word is written to memory.

[4] Address Register <- Address Register + 1; WC (Word Counter) <- WC - 1.

[5] If WC = 0, interrupt the CPU to acknowledge done; else go to [1].

Output

[1] M <- memory address, R (Read control signal); M Address Register <- M Address Register + 1; WC <- WC - 1.

[2] Disassemble the word .

[3] Buffer <- One byte; Output Device <- W, for all disassembled bytes .

[4] If WC = 0, then Interrupt to acknowledge done, else go to [1].


6.5.2. CYCLE STEALING

While DMA I/O takes place, the CPU is also executing instructions, so the DMA controller and the CPU both access memory -> memory access conflict.

Memory Bus Controller

- Coordinating the activities of all devices requesting memory access

- Priority System

Memory accesses by CPU and DMA Controller are interwoven, with the top priority given to DMA Controller

-> Cycle Stealing

Cycle Steal

- The CPU is usually much faster than I/O (DMA), thus the CPU uses most of the memory cycles.

- The DMA controller steals memory cycles from the CPU.

- For those stolen cycles, the CPU remains idle.

- For a slow CPU, the DMA controller may steal most of the memory cycles, which may cause the CPU to remain idle for a long time.

DMA TRANSFER


6.6 INPUT/OUTPUT PROCESSOR

Channel

- Processor with direct memory access capability that communicates with I/O devices.

- Channel accesses memory by cycle stealing.

- Channel can execute a Channel Program.

i. Stored in the main memory.

ii. Consists of Channel Command Words (CCWs).

iii. Each CCW specifies the parameters needed by the channel to control the I/O devices and perform data transfer operations.

- The CPU initiates the channel by executing a channel I/O class instruction; once initiated, the channel operates independently of the CPU.

CHANNEL / CPU COMMUNICATION


Multiple choice Questions

1. In Programmable interface each port contains

a) Port A = Data; Port B = Command; Port C = Status

b) Port A = Data; Port B = Status; Port C = Command

c) Port A = Command; Port B = Data; Port C = Status

2. The types of asynchronous data transfer methods are

a) 3: strobe, stroke and handshaking

b) 2: strobe and handshaking

c) 1: handshaking

3. The four different types of transfer are

a) asynchronous serial and parallel transfer

b) synchronous serial and asynchronous parallel transfer

c) asynchronous serial, synchronous serial, asynchronous parallel and synchronous parallel

4. The status register is used for


a) I/O flags and to receive information

b) I/O flags and for recording errors

c) for recording errors and to receive information

5. Which one is incorrect?

a) The CPU is usually much faster than I/O (DMA).

b) The DMA controller steals memory cycles from the CPU.

c) For those stolen cycles, the CPU remains idle.

d) For a slow CPU, the DMA controller may steal most of the memory cycles, which may cause the CPU to remain idle for a few seconds.

6. The transmitter register

a) Accepts a data byte through the data bus.

b) Accepts a data byte through the memory bus.

c) Accepts a data byte through the address bus.

7. The larger the RAM of a computer, the faster its processing speed, since it eliminates

a) the need for external memory  b) the need for ROM

c) frequent disk I/Os  d) the need for a wider data path

8. A group of signal lines used to transmit data in parallel from one element of a computer to another is

a) control bus  b) address bus

c) data bus  d) network

9. The basic unit within a computer store capable of holding a single unit of Data is

a). register b).ALU

c).Control unit d). store location

10. A device used to bring information into a computer is

a) ALU  b) input device

c) control unit  d) output device

Answers

1.a, 2.b, 3.c, 4.b, 5.d, 6.a, 7.c, 8.c, 9.d, 10.b

Chapter 7

Memory Organization

7.1 Memory Hierarchy

The memory unit is an essential component in any digital computer, since it is needed for storing programs and data. Most general-purpose computers would run more efficiently if they were equipped with additional storage beyond the capacity of the main memory. The memory unit that communicates directly with the CPU is called the MAIN MEMORY. Devices that provide backup storage are called AUXILIARY MEMORY. The most common auxiliary devices are magnetic disks and tapes; they are used for storing system programs, large data files and other backup information. Only programs and data currently needed by the processor reside in main memory. All other information is stored in auxiliary memory and transferred to main memory when needed.

The memory hierarchy system consists of all storage devices employed in a computer system, from the slow but high-capacity auxiliary memory, to a relatively faster main memory, to an even smaller and faster cache memory accessible to the high-speed processing logic. The goal of the memory hierarchy is to obtain the highest possible access speed while minimizing the total cost of the memory system.

Memory Hierarchy in computer system


A very high-speed memory called cache memory is used to increase the speed of processing by making current programs and data available to the CPU at a rapid rate. The cache memory is employed in the system to compensate for the speed differential between main memory access time and processor logic.

7.2 Main Memory

The main memory is the central storage unit in a computer system. It is a relatively large and fast memory used to store programs and data during computer operations. The principal technology used for main memory is based on semiconductor integrated circuits. Integrated-circuit RAM chips are available in two possible operating modes, static and dynamic. Static RAM is easier to use and has shorter read and write cycles.

The dynamic RAM offers reduced power consumption and larger storage capacity in a single memory chip compared to static RAM.

7.2.1 RAM and ROM Chips


Most of the main memory in a general-purpose computer is made up of RAM integrated-circuit chips, but a portion of the memory may be constructed with ROM chips. Originally, RAM referred to random access memory, but now the term is used to designate read/write memory, to distinguish it from read-only memory, although ROM is also random access. RAM is used for storing the bulk of the programs and data that are subject to change. ROM is used for storing programs that are permanently resident in the computer, and for tables of constants that do not change in value once production of the computer is completed. Among other things, the ROM portion is used to store an initial program called a bootstrap loader. This is a program whose function is to start the computer's operating system software. Since RAM is volatile, its contents are destroyed when power is turned off; the contents of ROM, on the other hand, remain unchanged after power is turned off and on again.

7.2.2 Memory Address maps

The designer of a computer system must calculate the amount of memory required for the particular application and assign it to either RAM or ROM. The interconnection between memory and processor is then established from knowledge of the size of memory needed and the type of RAM and ROM chips available. The addressing of memory can be established by means of a table that specifies the memory address assigned to each chip. The table, called a memory address map, is a pictorial representation of the assigned address space for each chip in the system.

Example: 512 bytes RAM and 512 bytes ROM
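The decoding implied by such a map can be sketched as follows (the chip sizes and address ranges here follow the usual textbook layout for this example and are assumptions, since the map itself is not reproduced):

```python
def decode(address):
    """Decode a 10-bit address into (chip select, byte offset).
    Hypothetical map: four 128 x 8 RAM chips at 0x000-0x1FF,
    one 512 x 8 ROM at 0x200-0x3FF."""
    if (address >> 9) & 1:                  # top address bit set -> ROM space
        return ("ROM", address & 0x1FF)     # 9 low bits select the byte
    chip = (address >> 7) & 0b11            # bits 7-8 select one RAM chip
    return (f"RAM{chip + 1}", address & 0x7F)  # 7 low bits select the byte

print(decode(0x0000))  # -> ('RAM1', 0)
print(decode(0x0085))  # -> ('RAM2', 5)
print(decode(0x0200))  # -> ('ROM', 0)
```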


Memory Connection to CPU

1. RAM and ROM chips are connected to the CPU through the data and address buses.

2. The low-order lines in the address bus select the byte within the chip, and the other lines in the address bus select a particular chip through its chip select inputs.

7.3. Auxiliary Memory

The most common auxiliary memory devices used in computer systems are magnetic disks and tapes. Other components used, but not as frequently, are magnetic drums, magnetic bubble memory, and optical disks. To fully understand the physical mechanism of auxiliary memory devices, one must have knowledge of magnetics, electronics and electromechanical systems.

7.3.1 Magnetic Tapes

A magnetic tape transport consists of the electrical, mechanical and electronic components that provide the parts and control mechanism for a magnetic tape unit. The tape itself is a strip coated with a magnetic recording medium. Bits are recorded as magnetic spots on the tape along tracks. Usually, seven or nine bits are recorded simultaneously to form a character together with a parity bit. Read/write heads are mounted one on each track so that data can be recorded and read as a sequence of characters.


7.3.2 Magnetic Disks

A magnetic disk is a circular plate constructed of metal or plastic coated with a magnetizable material. Often both sides of the disk are used, and several disks may be stacked on one spindle with read/write heads available on each surface. All disks rotate together at high speed and are not stopped or started for access purposes. Bits are stored on the magnetized surface in spots along concentric circles called tracks. The tracks are commonly divided into sections called sectors. In most systems, the minimum quantity of information which can be transferred is a sector.

7.3.3 RAID

RAID is an acronym first defined by David A. Patterson, Garth A. Gibson, and Randy Katz at the University of California, Berkeley in 1987 to describe a Redundant Array of Inexpensive Disks: a technology that allowed computer users to achieve high levels of storage reliability from low-cost and less reliable PC-class disk-drive components, via the technique of arranging the devices into arrays for redundancy. More recently, marketers representing industry RAID manufacturers reinvented the term to describe a Redundant Array of Independent Disks, as a means of dissociating a "low cost" expectation from RAID technology.

"RAID" is now used as an umbrella term for computer data storage schemes that can divide and replicate data among multiple hard disk drives. The different Schemes/architectures are named by the word RAID followed by a number, as in RAID 0, RAID 1, etc. RAID's various designs all involve two key design

98

Page 99: introduction to IT and Com systems.doc

goals: increased data reliability or increased input/output performance. When multiple physical disks are set up to use RAID technology, they are said to be in a RAID array. This array distributes data across multiple disks, but the array is seen by the computer user and operating system as one single disk. RAID can be set up to serve several different purposes.

Purpose and basics: Redundancy is achieved either by writing the same data to multiple drives (known as mirroring), or by writing extra data (known as parity data) across the array, calculated such that the failure of one (or possibly more, depending on the type of RAID) disks in the array will not result in loss of data. A failed disk may be replaced by a new one, and the lost data reconstructed from the remaining data and the parity data. Organizing disks into a redundant array decreases the usable storage capacity. For instance, a 2-disk RAID 1 array loses half of the total capacity that would have otherwise been available using both disks independently, and a RAID 5 array with several disks loses the capacity of one disk. Other types of RAID arrays are arranged so that they are faster to write to and read from than a single disk. There are various combinations of these approaches giving different trade-offs of protection against data loss, capacity, and speed. RAID levels 0, 1, and 5 are the most commonly found, and cover most requirements.
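The capacity trade-offs just described can be sketched numerically. This is a simplification that assumes n identical disks and ignores metadata overhead; the function name is illustrative:

```python
# Rough usable-capacity figures for common RAID levels, per the trade-offs
# described above (n identical disks; a simplification for illustration).
def usable_capacity(level, n_disks, disk_size):
    if level == 0:                 # striping only: no redundancy
        return n_disks * disk_size
    if level == 1:                 # mirroring: every byte stored twice
        return (n_disks // 2) * disk_size
    if level == 5:                 # distributed parity: one disk's worth lost
        return (n_disks - 1) * disk_size
    raise ValueError("unsupported level")

print(usable_capacity(1, 2, 500))   # 500  (2-disk mirror loses half)
print(usable_capacity(5, 3, 500))   # 1000 (three 500 GB drives -> 1 TB usable)
```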

RAID can involve significant computation when reading and writing information. With traditional "real" RAID hardware, a separate controller does this computation. In other cases the operating system or simpler and less expensive controllers require the host computer's processor to do the computing, which reduces the computer's performance on processor-intensive tasks (see "Software RAID" and "Fake RAID" below). Simpler RAID controllers may provide only levels 0 and 1, which require less processing.

RAID systems with redundancy continue working without interruption when one (or possibly more, depending on the type of RAID) disks of the array fail, although they are then vulnerable to further failures. When the bad disk is replaced by a new one the array is rebuilt while the system continues to operate normally. Some systems have to be powered down when removing or adding a drive; others support hot swapping, allowing drives to be replaced without powering down. RAID with hot-swapping is often used in high availability systems, where it is important that the system remains running as much of the time as possible.

Principles: RAID combines two or more physical hard disks into a single logical unit by using either special hardware or software. Hardware solutions often are designed to present themselves to the attached system as a single hard drive, so that the operating system is unaware of the technical workings. For example, if you configure a 1 TB RAID 5 array using three 500 GB hard drives in hardware RAID, the operating system is simply presented with a "single" 1 TB disk. Software solutions are typically implemented in the operating system and present the RAID drive as a single drive to applications running upon the operating system.

There are three key concepts in RAID: mirroring, the copying of data to more than one disk; striping, the splitting of data across more than one disk; and error correction, where redundant data is stored to allow problems to be detected and possibly fixed (known as fault tolerance). Different RAID levels use one or more of these techniques, depending on the system requirements. RAID's main aim can be either to improve the reliability and availability of data, ensuring that important data is available more often than not (e.g. a database of customer orders), or merely to improve the access speed to files (e.g. for a system that delivers video-on-demand TV programs to many viewers).
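The error-correction concept can be illustrated with byte-wise XOR parity across equal-sized strips. This is a minimal sketch of the idea only; real RAID implementations stripe data and rotate parity across the disks in more elaborate ways:

```python
# Minimal illustration of parity-based reconstruction: XOR-ing the surviving
# strips with the parity strip recovers the data of a single failed disk.
from functools import reduce

def parity(strips):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*strips))

data = [b"\x12\x34", b"\xab\xcd", b"\x0f\xf0"]   # strips on three data disks
p = parity(data)                                  # stored on the parity disk

lost = data[1]                                    # pretend disk 1 fails
rebuilt = parity([data[0], data[2], p])           # XOR of survivors + parity
assert rebuilt == lost
```

The same XOR property is why losing any single strip, including the parity strip itself, leaves enough information to rebuild the array.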

7.4 Associative Memory

Many data-processing applications require the search of items in a table stored in memory. The established way to search a table is to store all items where they can be addressed in sequence. The search procedure is a strategy for choosing a sequence of addresses, reading the content of memory at each address, and comparing the information read with the item being searched for, until a match occurs.

The number of accesses to memory depends on the location of the item and the efficiency of the search algorithm.

The time required to find an item stored in memory can be reduced considerably if stored data can be identified for access by the content of the data itself rather than by an address. A memory unit accessed by content is called an associative memory or content-addressable memory (CAM).

The CAM compares each of its words in parallel with the content of A (the argument register):

- If CAM word[i] = A, then M(i) = 1.

- The CAM is then read by sequentially accessing each word i for which M(i) = 1.

- K (the key register) provides a mask for choosing a particular field or key in the argument A (only those bits of the argument that have 1's in the corresponding positions of K are compared).

Figure: Organization of a CAM
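The masked match described above can be sketched in a few lines. The loop stands in for the parallel comparison hardware, and the bit patterns are illustrative:

```python
# Sketch of the CAM match: every word is compared with the argument
# register A, using only the bit positions where the key register K
# holds 1's; the result is the match register M.
def cam_match(words, A, K):
    """M[i] = 1 if word i equals A in the positions selected by mask K."""
    return [1 if (w & K) == (A & K) else 0 for w in words]

words = [0b00111, 0b10100, 0b10101, 0b11010]
A, K = 0b10110, 0b11100        # compare only the high three bits
print(cam_match(words, A, K))  # [0, 1, 1, 0]
```

Words 2 and 3 match because they agree with A in the three unmasked high-order positions; the masked low-order bits are ignored.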


7.5 Cache memory

The cache is a small amount of high-speed memory, usually with a memory cycle time comparable to the time required by the CPU to fetch one instruction. The cache is usually filled from main memory when instructions or data are fetched into the CPU. Often the main memory will supply a wider data word to the cache than the CPU requires, to fill the cache more rapidly. The amount of information which is replaced at one time in the cache is called the line size of the cache. This is normally the width of the data bus between the cache memory and the main memory. A wide line size means that several instruction or data words are loaded into the cache at one time, providing a kind of prefetching for instructions or data. Since the cache is small, its effectiveness relies on the following properties of most programs:

Spatial locality -- most programs are highly sequential; the next instruction usually comes from the next memory location.

Data is usually structured, and data in these structures normally are stored in contiguous memory locations.

Short loops are a common program structure, especially for the innermost sets of nested loops. This means that the same small set of instructions is used over and over.

Generally, several operations are performed on the same data values, or variables.

When a cache is used, there must be some way in which the memory controller determines whether the value currently being addressed in memory is available from the cache. There are several ways that this can be accomplished. One possibility is to store both the address and the value from main memory in the cache, with the address stored in a type of memory called associative memory or, more descriptively, content addressable memory.


An associative memory, or content addressable memory, has the property that when a value is presented to the memory, the address of the value is returned if the value is stored in the memory, otherwise an indication that the value is not in the associative memory is returned. All of the comparisons are done simultaneously, so the search is performed very quickly. This type of memory is very expensive, because each memory location must have both a comparator and a storage element. A cache memory can be implemented with a block of associative memory, together with a block of ``ordinary'' memory. The associative memory would hold the address of the data stored in the cache, and the ordinary memory would contain the data at that address. Such a cache memory might be configured as shown in Figure.

   Figure: A cache implemented with associative memory

If the address is not found in the associative memory, then the value is obtained from main memory. Associative memory is very expensive, because a comparator is required for every word in the memory, to perform all the comparisons in parallel. A cheaper way to implement a cache memory, without using expensive associative memory, is to use direct mapping. Here, part of the memory address (usually the low-order bits of the address) is used to address a word in the cache. This part of the address is called the index. The remaining high-order bits in the address, called the tag, are stored in the cache memory along with the data. For example, if a processor has an 18-bit memory address, a cache of 1K words of 2 bytes (16 bits) each, and the ability to address single bytes or 2-byte words, we might have the memory address field and cache organized as in the figure.


   Figure: A direct mapped cache configuration
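The address split from the 18-bit example can be sketched as follows. The field layout (1 byte bit, 10 index bits, 7 tag bits) follows the example in the text; the sample address is illustrative:

```python
# The index/tag split for the example above: an 18-bit byte address and a
# direct-mapped cache of 1K two-byte words.
ADDR_BITS, INDEX_BITS, BYTE_BITS = 18, 10, 1   # 1K words -> 10 index bits
TAG_BITS = ADDR_BITS - INDEX_BITS - BYTE_BITS  # leaves 7 tag bits

def split(addr):
    byte  = addr & 1                                  # byte within the word
    index = (addr >> BYTE_BITS) & ((1 << INDEX_BITS) - 1)
    tag   = addr >> (BYTE_BITS + INDEX_BITS)
    return tag, index, byte

print(split(0b1010101_0000000011_1))   # (85, 3, 1)
```

On an access, the index selects one cache line, and the stored tag is compared with the tag field of the address to decide hit or miss.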

This was, in fact, the way the cache was organized in the PDP-11/60. In the 11/60, however, there are 4 other bits used to ensure that the data in the cache is valid. Three of these are parity bits: one for each byte and one for the tag. The parity bits are used to check that a single bit error has not occurred to the data while in the cache. A fourth bit, called the valid bit, is used to indicate whether or not a given location in cache is valid. In the PDP-11/60 and in many other processors, the cache is not updated if memory is altered by a device other than the CPU (for example, when a disk stores new data in memory). When such a memory operation occurs to a location which has its value stored in cache, the valid bit is reset to show that the data is ``stale'' and does not correspond to the data in main memory. As well, the valid bit is reset when power is first applied to the processor or when the processor recovers from a power failure, because the data found in the cache at that time will be invalid. In the PDP-11/60, the data path from memory to cache was the same size (16 bits) as from cache to the CPU. (In the PDP-11/70, a faster machine, the data path from the CPU to cache was 16 bits, while from memory to cache it was 32 bits, which means that the cache had effectively prefetched the next instruction approximately half of the time.) The amount of information (instructions or data) stored with each tag in the cache is called the line size of the cache. (It is usually the same size as the data path from main memory to the cache.) A large line size allows the prefetching of a number of instructions or data words. All items in a line of the cache are replaced simultaneously, however, resulting in a larger block of data being replaced for each cache miss.

The MIPS R2000/R3000 had a built-in cache controller which could control a cache of up to 64K bytes. For a similar 2K-word (or 8K-byte) cache, the MIPS processor would typically have a cache configuration as shown in the figure. Generally, the MIPS cache would be larger (64K bytes would be typical), with line sizes of 1, 2, or 4 words.


   Figure: One possible MIPS cache organization

A characteristic of the direct mapped cache is that a particular memory address can be mapped into only one cache location. Many memory addresses are mapped to the same cache location (in fact, all addresses with the same index field are mapped to the same cache location.) Whenever a ``cache miss'' occurs, the cache line will be replaced by a new line of information from main memory at an address with the same index but with a different tag.

Note that if the program ``jumps around'' in memory, this cache organization will likely not be effective because the index range is limited. Also, if both instructions and data are stored in cache, it may well happen that both map into the same area of cache, and may cause each other to be replaced very often. This could happen, for example, if the code for a matrix operation and the matrix data itself happened to have the same index values.

A more interesting configuration for a cache is the set associative cache, which uses a set associative mapping. In this cache organization, a given memory location can be mapped to more than one cache location. Here, each index corresponds to two or more data words, each with a corresponding tag. A set associative cache with n tag and data fields is called an ``n-way set associative cache''. Usually n = 2^k, for k = 1, 2, 3, is chosen for a set associative cache (k = 0 corresponds to direct mapping). Such n-way set associative caches allow interesting tradeoff possibilities; cache performance can be improved by increasing the number of ``ways'', or by increasing the line size, for a given total amount of memory. An example of a 2-way set associative cache is shown in the figure, which shows a cache containing a total of 2K lines, or 1K sets, each set being 2-way associative. (The sets correspond to the rows in the figure.)

   Figure: A set-associative cache organization

In a 2-way set associative cache, if one data word is empty for a read operation corresponding to a particular index, then it is filled. If both data words are filled, then one must be overwritten by the new data. Similarly, in an n-way set associative cache, if all n data and tag fields in a set are filled, then one value in the set must be overwritten, or replaced, in the cache by the new tag and data values. Note that an entire line must be replaced each time. The most common replacement algorithms are:

Random -- the location for the value to be replaced is chosen at random from all n of the cache locations at that index position. In a 2-way set associative cache, this can be accomplished with a single modulo 2 random variable obtained, say, from an internal clock.

First in, first out (FIFO) -- here the first value stored in the cache, at each index position, is the value to be replaced. For a 2-way set associative cache, this replacement strategy can be implemented by setting a pointer to the previously loaded word each time a new word is stored in the cache; this pointer need only be a single bit. (For set sizes > 2, this algorithm can be implemented with a counter value stored for each ``line'', or index in the cache, and the cache can be filled in a ``round robin'' fashion).

Least recently used (LRU) -- here the value which was actually used least recently is replaced. In general, it is more likely that the most recently used value will be the one required in the near future. For a 2-way set associative cache, this is readily implemented by adding a single ``USED'' bit to each cache location: when a value is accessed, the USED bit of the other word in the set is set, while the bit of the accessed word is reset. The value to be replaced is then the value with the USED bit set. For an n-way set associative cache, this strategy can be implemented by storing a modulo n counter with each data word. (It is an interesting exercise to determine exactly what must be done in this case. The required circuitry may become somewhat complex, for large n.)
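The single-USED-bit scheme for a 2-way set can be sketched as follows. This is a minimal model of one set only; the class and method names are illustrative:

```python
# Sketch of USED-bit LRU for one 2-way set: accessing one way sets the
# USED bit on the *other* way, so the victim is the way whose bit is set.
class TwoWaySet:
    def __init__(self):
        self.tags = [None, None]
        self.used = [0, 0]                  # one USED bit per way

    def _touch(self, way):
        self.used[1 - way] = 1              # mark the other way as LRU
        self.used[way] = 0

    def access(self, tag):
        if tag in self.tags:                # hit: update USED bits only
            self._touch(self.tags.index(tag))
            return "hit"
        if None in self.tags:               # fill an empty way first
            way = self.tags.index(None)
        else:                               # victim = way with USED bit set
            way = self.used.index(1)
        self.tags[way] = tag
        self._touch(way)
        return "miss"

s = TwoWaySet()
print([s.access(t) for t in ("A", "B", "A", "C", "A")])
# ['miss', 'miss', 'hit', 'miss', 'hit']
```

Note that "C" evicts "B", not the more recently used "A", which is exactly the LRU behaviour the USED bit encodes.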

Cache memories normally allow one of two things to happen when data is written into a memory location for which there is a value stored in cache:

Write through cache -- both the cache and main memory are updated at the same time. This may slow down the execution of instructions which write data to memory, because of the relatively longer write time to main memory. Buffering memory writes can help speed up memory writes if they are relatively infrequent, however.

Write back cache -- here only the cache is updated directly by the CPU; the cache memory controller marks the value so that it can be written back into memory when the word is removed from the cache. This method is used because a memory location may be altered several times while it is still in cache, without the value ever having to be written into main memory. The method is often implemented using an ``ALTERED'' bit in the cache. The ALTERED bit is set whenever a cache value is written into by the processor. Only if the ALTERED bit is set is it necessary to write the value back into main memory (i.e., only values which have been altered must be written back). The value is written back immediately before it is replaced in the cache.
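A minimal sketch of the ALTERED-bit bookkeeping (the names and the one-entry "main memory" are illustrative):

```python
# Write-back cache line: a write only sets the ALTERED bit; main memory is
# updated at eviction time, and only if the bit is set.
class Line:
    def __init__(self, tag, value):
        self.tag, self.value, self.altered = tag, value, False

memory = {0x10: 7}                  # stand-in for main memory
line = Line(0x10, memory[0x10])     # line brought in from main memory

def cpu_write(line, value):
    line.value = value
    line.altered = True             # defer the memory update

def evict(line):
    if line.altered:                # write back only if modified
        memory[line.tag] = line.value

cpu_write(line, 8)
cpu_write(line, 9)                  # altered twice, still no memory traffic
assert memory[0x10] == 7
evict(line)
assert memory[0x10] == 9
```

The two writes cost no main-memory traffic; only the final value reaches memory when the line is replaced.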

7.6 Virtual Memory Concept

In a memory hierarchy system, programs and data are first stored in auxiliary memory. Portions of a program or data are brought into main memory as they are needed by the CPU. Virtual memory is a concept used in some large computer systems that permits the user to construct programs as though a large memory space were available, equal to the totality of auxiliary memory. Each address that is referenced by the CPU goes through an address mapping from the so-called virtual address to a physical address in main memory. Virtual memory is used to give the programmer the illusion that the system has a very large memory, even though the computer actually has a relatively small main memory.

7.6.1 Address Mapping

A memory mapping table translates each virtual address into a physical address.

The address space and the memory space are each divided into fixed-size groups of words called blocks or pages (for example, groups of 1K words).


Figure: Organization of the memory mapping table in a paged system

7.6.2 Associative memory page table

Assume that the number of blocks in memory = m and the number of pages in the virtual address space = n.

Page Table

A straightforward design uses an n-entry table in memory, but storage space utilization is inefficient, since n - m entries of the table are empty.

A more efficient method is an m-entry page table: a page table made of an associative memory of m words, each holding a (page number : block number) pair.
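Such an m-entry table can be sketched with a dictionary standing in for the associative memory (the page size and table contents are illustrative):

```python
# Sketch of the m-entry page table: only resident pages appear, each entry
# pairing a page number with its block (frame) number.
page_table = {2: 0, 5: 1, 7: 3}     # page -> block, for m = 4 blocks

def translate(virtual_addr, page_size=1024):
    page, offset = divmod(virtual_addr, page_size)
    if page not in page_table:
        raise LookupError(f"page fault on page {page}")
    return page_table[page] * page_size + offset

print(translate(5 * 1024 + 40))    # 1064: page 5 maps to block 1
```

A lookup for a page that has no entry corresponds to a page fault, which is handled as described next.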


Page Fault

When a referenced page is not present in main memory, a page fault occurs and is handled as follows:

1. Trap to the OS

2. Save the user registers and program state

3. Determine that the interrupt was a page fault

4. Check that the page reference was legal and determine the location of the page on the backing store (disk)

5. Issue a read from the backing store to a free frame

a. Wait in a queue for this device until serviced

b. Wait for the device seek and/or latency time

c. Begin the transfer of the page to a free frame

6. While waiting, the CPU may be allocated to some other process

7. Interrupt from the backing store (I/O completed)

8. Save the registers and program state for the other user

9. Determine that the interrupt was from the backing store

10. Correct the page tables (the desired page is now in memory)

11. Wait for the CPU to be allocated to this process again

12. Restore the user registers, program state, and new page table, then resume the interrupted instruction.
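The steps above can be condensed into a sketch. This is a toy model with two frames and no replacement policy; the names are illustrative:

```python
# Condensed sketch of demand paging: trap on a missing page, fetch it from
# the backing store into a free frame, correct the page table, and resume.
backing_store = {0: "code0", 1: "code1", 2: "data2"}   # pages on disk
frames, page_table = [None, None], {}                  # 2 frames, empty table

def access(page):
    if page not in page_table:                 # step 1: trap (page fault)
        if None not in frames:
            raise MemoryError("no free frame (replacement not shown)")
        frame = frames.index(None)             # step 5: pick a free frame
        frames[frame] = backing_store[page]    # steps 5c/7: transfer the page
        page_table[page] = frame               # step 10: correct page table
    return frames[page_table[page]]            # step 12: resume the access

print(access(1))   # 'code1'  (fault: loaded into frame 0)
print(access(1))   # 'code1'  (hit: already resident)
```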

Figure: Associative-memory page table. The argument register holds the page number of the virtual address (together with the line number), the key register masks off the page-number field, and each word of the associative memory holds a (page number, block number) pair.


Processor architecture should provide the ability to restart any instruction after a page fault.


Multiple choice questions

1. The contents of which of these chips are lost when the computer is switched off?

a) ROM chips   b) RAM chips   c) DRAM chips

2. Which chips are responsible for storing permanent data and instructions?

a) ROM chips   b) RAM chips   c) DRAM chips

3. Which memory is used to speed up processing?

a) Associative memory   b) Main memory   c) Cache memory

4.How many bits of information can each memory cell in a computer chip hold?

a) 0 bits

b) 1 bit

c) 8 bits

5. What type of computer chips are said to be volatile?

a) ROM chips

b) RAM chips

c) DRAM chips

6. The interface between level 2 (operating system) and level 1 (microprogram) of a computer design is called:

a) Computer architecture

b) Virtual machine interface

c) User interface

d) All of the above

7. Which memory is the actual working memory?

a) SAM   b) ROM   c) RAM   d) TAB memory

8. Which of the following is a secondary memory device?

a) ALU   b) Keyboard   c) Disk   d) All of the above

9. A microcomputer has a primary memory of 640K. What is the exact number of bytes contained in this memory?

a) 640 x 1000   b) 640 x 100   c) 640 x 1024   d) Either b or c

10. How many bits can be stored in an 8K RAM?

a) 8000   b) 8192   c) 4000   d) 4096

Answers

1. b, 2. a, 3. c, 4. b, 5. b, 6. a, 7. c, 8. c, 9. c, 10. b


Chapter 8

Pipeline and Vector Processing

8.1 Parallel Processing

Parallel processing is a term used to denote a large class of techniques that provide simultaneous data-processing tasks for the purpose of increasing the computational speed of a computer. The purpose of parallel processing is to speed up the computer's processing capability and increase its throughput, that is, the amount of processing that can be accomplished during a given interval of time. The amount of hardware increases with parallel processing, and with it the cost of the system.

Figure 8.1: Processor with multiple functional units

Parallel processing at a higher level of complexity can be achieved by having a multiplicity of functional units that perform identical or different operations simultaneously. Parallel processing is established by distributing the data among the multiple functional units. For example, the arithmetic, logic, and shift operations can be separated into three units and the operands diverted to each unit under the supervision of a control unit. The normal operation of a computer is to fetch instructions from memory and execute them in the processor.

The sequence of instructions read from memory constitutes an instruction stream. The operations performed on the data in the processor constitute a data stream. Parallel processing may occur in the instruction stream, in the data stream, or in both. Flynn's classification divides computers into four major groups as follows:

1. Single instruction stream, single data stream (SISD)

2. Single instruction stream, multiple data stream (SIMD)

3. Multiple instruction stream, single data stream (MISD)

4. Multiple instruction stream, multiple data stream (MIMD)

SISD represents the organization of a single computer containing a control unit, a processor unit, and a memory unit. Instructions are executed sequentially, and the system may or may not have internal parallel processing capabilities. Parallel processing in this case may be achieved by means of multiple functional units or by pipeline processing.

SIMD represents an organization that includes many processing units under the supervision of a common control unit. All processors receive the same instruction from the control unit but operate on different items of data. The shared memory unit must contain multiple modules so that it can communicate with all the processors simultaneously. The MISD structure is only of theoretical interest, since no practical system has been constructed using this organization. The MIMD organization refers to a computer system capable of processing several programs at the same time. Most multiprocessor and multicomputer systems can be classified in this category.

Flynn's classification depends on the distinction between the performance of the control unit and the data-processing unit. It emphasizes the behavior characteristics of the computer system rather than its operational and structural interconnections. One type of parallel processing that does not fit Flynn's classification is pipelining.

8.2 Pipelining

Pipelining is a technique of decomposing a sequential process into suboperations, with each subprocess being executed in a special dedicated segment that operates concurrently with all other segments. A pipeline can be visualized as a collection of processing segments through which binary information flows.

Each segment performs partial processing dictated by the way the task is partitioned. The result obtained from the computation in each segment is transferred to the next segment in the pipeline. The final result is obtained after the data have passed through all segments. The name "pipeline" implies a flow of information analogous to an industrial assembly line. It is characteristic of pipelines that several computations can be in progress in distinct segments at the same time. The overlapping of computation is made possible by associating a register with each segment in the pipeline. The registers provide isolation between each segment so that each can operate on distinct data simultaneously.

Perhaps the simplest way of viewing the pipeline structure is to imagine that each segment consists of an input register followed by a combinational circuit. The register holds the data and the combinational circuit performs the suboperation in the particular segment. The output of the combinational circuit in a given segment is applied to the input register of the next segment. A clock is applied to all registers after enough time has elapsed to perform all segment activity. In this way the information flows through the pipeline one step at a time.

The pipeline organization will be demonstrated by means of a simple example. Suppose that we want to perform the combined multiply and add operations with a stream of numbers.

Ai * Bi + Ci,    for i = 1, 2, 3, . . . , 7

Each suboperation is to be implemented in a segment within a pipeline. Each segment has one or two registers and a combinational circuit, as shown in Figure 8.2. R1 through R5 are registers that receive new data with every clock pulse. The multiplier and adder are combinational circuits. The suboperations performed in each segment of the pipeline are as follows:

R1 <-- Ai, R2 <-- Bi        Input Ai and Bi

R3 <-- R1 * R2, R4 <-- Ci   Multiply and input Ci

R5 <-- R3 + R4              Add Ci to the product

Figure 8.2: Example of pipeline processing
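The three-segment pipeline above can be simulated clock by clock. This is a sketch: all registers are clocked together, and the operand values are illustrative:

```python
# Clock-by-clock simulation of the three-segment pipeline computing
# Ai * Bi + Ci (R1..R5 modelled as plain variables).
A = [1, 2, 3, 4, 5, 6, 7]
B = [7, 6, 5, 4, 3, 2, 1]
C = [1, 1, 1, 1, 1, 1, 1]

R1 = R2 = R3 = R4 = R5 = None
results = []
for clock in range(len(A) + 2):                # 7 items + 2 fill cycles
    # Evaluate every segment's output from the current register contents,
    # then clock all registers simultaneously.
    new_R5 = R3 + R4 if R3 is not None else None          # segment 3
    new_R3 = R1 * R2 if R1 is not None else None          # segment 2
    new_R4 = C[clock - 1] if 1 <= clock <= len(C) else None
    new_R1 = A[clock] if clock < len(A) else None         # segment 1
    new_R2 = B[clock] if clock < len(B) else None
    R1, R2, R3, R4, R5 = new_R1, new_R2, new_R3, new_R4, new_R5
    if R5 is not None:
        results.append(R5)

print(results)   # [8, 13, 16, 17, 16, 13, 8]
```

After the two cycles needed to fill the pipeline, one result emerges per clock; seven operand triples complete in 7 + 2 = 9 clocks instead of 7 x 3.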

8.3 Arithmetic Pipeline


Pipeline arithmetic units are usually found in very high speed computers. They are used to implement floating-point operations, multiplication of fixed-point numbers, and similar computations encountered in scientific problems. A pipeline multiplier is essentially an array multiplier, with special adders designed to minimize the carry propagation time through the partial products. Floating-point operations are easily decomposed into suboperations. We will now show an example of a pipeline unit for floating-point addition and subtraction.

Floating-point arithmetic pipeline:

a. Suitable for pipelining, as it consists of a series of steps.

b. The FP ADD algorithm can be implemented using such a pipeline.

Fixed-point arithmetic pipeline:

a. The ``ripple-carry'' adder can be converted into pipeline form.

b. Can be used for both ADD and SUB.

8.4 Instruction Pipeline

Pipeline processing can occur not only in the data stream but in the instruction stream as well. An instruction pipeline reads consecutive instructions from memory while previous instructions are being executed in other segments. This causes the instruction fetch and execute phases to overlap and perform simultaneous operations. One possible complication associated with such a scheme is that an instruction may cause a branch out of sequence. In that case the pipeline must be emptied and all the instructions that have been read from memory after the branch instruction must be discarded.


Consider a computer with an instruction fetch unit and an instruction execution unit designed to provide a two-segment pipeline. The instruction fetch segment can be implemented by means of a first-in, first-out (FIFO) buffer. This is a type of unit that forms a queue rather than a stack. Whenever the execution unit is not using memory, the control increments the program counter and uses its address value to read consecutive instructions from memory. The instructions are inserted into the FIFO buffer so that they can be executed on a first-in, first-out basis. Thus an instruction stream can be placed in a queue, waiting for decoding and processing by the execution segment. The instruction stream queuing mechanism provides an efficient way for reducing the average access time to memory for reading instructions. Whenever there is space in the FIFO buffer, the control unit initiates the next instruction fetch phase. The buffer acts as a queue from which control then extracts the instructions for the execution unit.

Computers with complex instructions require other phases in addition to the fetch and execute to process an instruction completely. In the most general case, the computer needs to process each instruction with the following sequence of steps.

1. Fetch the instruction from memory.

2. Decode the instruction.

3. Calculate the effective address.

4. Fetch the operands from memory.

5. Execute the instruction.

6. Store the result in the proper place.

There are certain difficulties that will prevent the instruction pipeline from operating at its maximum rate. Different segments may take different times to operate on the incoming information. Some segments are skipped for certain operations. For example, a register mode instruction does not need an effective address calculation. Two or more segments may require memory access at the same time, causing one segment to wait until another is finished with the memory. Memory access conflicts are sometimes resolved by using two memory buses for accessing instructions and data in separate modules. In this way, an instruction word and a data word can be read simultaneously from two different modules.

The design of an instruction pipeline will be most efficient if the instruction cycle is divided into segments of equal duration. The time that each step takes to fulfill its function depends on the instruction and the way it is executed.

8.5 RISC Pipeline


Among the characteristics attributed to RISC is its ability to use an efficient instruction pipeline. The simplicity of the instruction set can be utilized to implement an instruction pipeline using a small number of suboperations, with each being executed in one clock cycle. Because of the fixed-length instruction format, the decoding of the operation can occur at the same time as the register selection. All data manipulation instructions have register-to-register operations. Since all operands are in registers, there is no need for calculating an effective address or fetching operands from memory. Therefore, the instruction pipeline can be implemented with two or three segments. One segment fetches the instruction from program memory, and another executes the instruction in the ALU. A third segment may be used to store the result of the ALU operation in a destination register.

The data transfer instructions in RISC are limited to load and store instructions. These instructions use register indirect addressing. They usually need three or four stages in the pipeline. To prevent conflicts between a memory access to fetch an instruction and one to load or store an operand, most RISC machines use two separate buses with two memories: one for storing the instructions and the other for storing the data. The two memories can sometimes operate at the same speed as the CPU clock and are referred to as cache memories.

It is not possible to expect that every instruction be fetched from memory and executed in one clock cycle. What is done, in effect, is to start each instruction with each clock cycle and to pipeline the processor to achieve the goal of single-cycle instruction execution. The advantage of RISC over CISC (complex instruction set computer) is that RISC can achieve pipeline segments requiring just one clock cycle, while CISC uses many segments in its pipeline, with the longest segment requiring two or more clock cycles.

Another characteristic of RISC is the support given by the compiler that translates the high-level language program into a machine language program. Instead of designing hardware to handle the difficulties associated with data conflicts and branch penalties, RISC processors rely on the efficiency of the compiler to detect and minimize the delays caused by these problems.
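One such compiler technique is the delayed load: when an instruction reads a register that the immediately preceding load writes, the compiler inserts a no-op so the hardware never has to stall. A minimal sketch, assuming a made-up instruction representation of (opcode, destination, source tuple):

```python
def insert_delay_slots(program):
    """Insert a NOP after any LOAD whose result the next instruction reads."""
    out = []
    for idx, (op, dest, srcs) in enumerate(program):
        out.append((op, dest, srcs))
        if op == "LOAD" and idx + 1 < len(program):
            _, _, next_srcs = program[idx + 1]
            if dest in next_srcs:
                # The loaded value is not ready yet; fill the slot with a NOP.
                out.append(("NOP", None, ()))
    return out

prog = [("LOAD", "R1", ("A",)),
        ("ADD", "R3", ("R1", "R2"))]   # reads R1 right after the load
print(insert_delay_slots(prog))
```

A smarter compiler would try to move an independent instruction into the delay slot instead of wasting it on a NOP; the sketch shows only the safety transformation.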

8.6 Vector Processing

There is a class of computational problems that are beyond the capabilities of a conventional computer. These problems are characterized by the fact that they require a vast number of computations that would take a conventional computer days or even weeks to complete. In many science and engineering applications, the problems can be formulated in terms of vectors and matrices that lend themselves to vector processing. Computers with vector processing capabilities are in demand in specialized applications. The following are representative application areas where vector processing is of the utmost importance:

1. Long-range weather forecasting
2. Petroleum explorations
3. Seismic data analysis


4. Medical diagnosis
5. Aerodynamics and space flight simulations
6. Artificial intelligence and expert systems
7. Mapping the human genome
8. Image processing

Without sophisticated computers, many of the required computations cannot be completed within a reasonable amount of time. To achieve the required level of high performance it is necessary to utilize the fastest and most reliable hardware and apply innovative procedures from vector and parallel processing techniques.

Vector Operations

Many scientific problems require arithmetic operations on large arrays of numbers. These numbers are usually formulated as vectors and matrices of floating-point numbers. A vector is an ordered set of a one-dimensional array of data items. A vector V of length n is represented as a row vector by V = [V1 V2 V3 ... Vn]. It may be represented as a column vector if the data items are listed in a column. A conventional sequential computer is capable of processing operands one at a time. Consequently, operations on vectors must be broken down into single computations with subscripted variables. The element Vi of vector V is written as V(I), and the index I refers to a memory address or register where the number is stored. To examine the difference between a conventional scalar processor and a vector processor, consider the following Fortran DO loop:

DO 20 I = 1, 100

20 C(I) = B(I) + A(I)

This is a program for adding two vectors A and B of length 100 to produce a vector C. This is implemented in machine language by the following sequence of operations.

Initialize I = 0

20 Read A(I)
Read B(I)

Store C(I) = A(I) + B(I)

Increment I = I + 1

If I ≤ 100 go to 20

Continue

This constitutes a program loop that reads a pair of operands from arrays A and B and performs a floating-point addition. The loop control variable is then updated and the steps repeat 100 times.
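The same scalar loop can be rendered in Python, keeping the explicit loop-control overhead (initialize, increment, test-and-branch) that the machine-level sequence above spells out. The operand arrays are arbitrary sample data for the sketch.

```python
# Sample operands; any two length-100 arrays would do.
A = [float(i) for i in range(1, 101)]        # 1.0 .. 100.0
B = [float(i) for i in range(100, 0, -1)]    # 100.0 .. 1.0
C = [0.0] * 100

i = 0                      # Initialize I = 0
while i < 100:             # the "If I <= 100 go to 20" test
    C[i] = A[i] + B[i]     # Read A(I), Read B(I), Store C(I) = A(I) + B(I)
    i += 1                 # Increment I = I + 1

print(C[0], C[99])         # every element is 101.0 for this sample data
```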


A computer capable of vector processing eliminates the overhead associated with the time it takes to fetch and execute the instructions in the program loop. It allows operations to be specified with a single vector instruction of the form

C(1:100) = A(1:100) + B(1:100)
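The flavor of this vector form can be suggested in Python with a whole-array expression (a sketch only; real vector hardware executes this as one pipelined instruction, not as a hidden loop at machine level):

```python
A = list(range(1, 101))       # sample operand vectors
B = list(range(100, 0, -1))

# One whole-vector operation: the per-element loop control disappears
# from the program, just as it disappears from the instruction stream
# when a single vector instruction replaces 100 trips around a loop.
C = [a + b for a, b in zip(A, B)]

print(len(C), C[0])  # 100 elements, each equal to 101 for this data
```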

Vector Instructions

Vector Instruction format

Pipeline for Inner Product

8.7 Array Processors

An array processor is a processor that performs computations on large arrays of data. The term is used to refer to two different types of processors. An attached array processor is an auxiliary processor attached to a general-purpose computer; it is intended to improve the performance of the host computer in specific numerical computation tasks. An SIMD array processor is a processor with a single-instruction, multiple-data organization. It manipulates vector instructions by means of multiple functional units responding to a common instruction. Although both types of array processors manipulate vectors, their organization is different.

Attached Array Processor

An attached array processor is designed as a peripheral for a conventional host computer, and its purpose is to enhance the performance of the computer by providing vector processing for complex applications. It achieves high performance by means of parallel processing with multiple functional units. It includes an arithmetic unit containing one or more pipelined floating-point adders and multipliers. The array processor can be programmed by the user to accommodate a variety of complex arithmetic problems.

Attached array processor with host computer

SIMD Array Processor

An SIMD array processor is a computer with multiple processing units operating in parallel. The processing units are synchronized to perform the same operation under the control of a common control unit, thus providing a single-instruction, multiple-data-stream organization. A general block diagram of an array processor is shown below. It contains a set of identical processing elements (PEs), each having a local memory M. Each PE includes an ALU, a floating-point arithmetic unit, and working registers. The master control unit controls the operations in the processor elements. The main memory is used for storage of the program. The function of the master control unit is to decode the instructions and determine how the instructions are to be executed. Scalar and program control instructions are directly executed within the master control unit. Vector instructions are broadcast to all PEs simultaneously. Each PE uses operands stored in its local memory. Vector operands are distributed to the local memories prior to the parallel execution of the instruction.
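A toy model of that organization (hypothetical Python, illustrative only): a master control unit broadcasts one instruction, and every PE applies it to operands already distributed to its own local memory.

```python
class PE:
    """One processing element with its own local memory."""
    def __init__(self, local_memory):
        self.mem = dict(local_memory)

    def execute(self, op, dst, a, b):
        # Each PE performs the broadcast operation on its local operands.
        if op == "ADD":
            self.mem[dst] = self.mem[a] + self.mem[b]

def broadcast(pes, instruction):
    # The master control unit sends a single instruction; all PEs apply
    # it simultaneously to their own data (single instruction, multiple data).
    for pe in pes:
        pe.execute(*instruction)

# Vector operands A and B are distributed across four local memories
# before execution; the values here are arbitrary sample data.
pes = [PE({"A": i, "B": 10 * i}) for i in range(4)]
broadcast(pes, ("ADD", "C", "A", "B"))
print([pe.mem["C"] for pe in pes])
```

A real SIMD machine performs the PE operations in lockstep hardware; the Python loop in `broadcast` only stands in for that synchronized parallel step.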


8.8 Reduced Instruction set computer: RISC VS CISC

Many computers have instruction sets that include more than 100 and sometimes even more than 200 instructions. These computers also employ a variety of data types and a large number of addressing modes. The trend toward computer hardware complexity was influenced by various factors, such as upgrading existing models to provide more customer applications, adding instructions that facilitate the translation from high-level language into machine language programs, and striving to develop machines that move functions from software implementation into hardware implementation. A computer with a large number of instructions is classified as a complex instruction set computer, abbreviated CISC.

CISC Characteristics

1. A large number of instructions, typically from 100 to 250.

2. Some instructions that perform specialized tasks and are used infrequently.

3. A large variety of addressing modes, typically from 5 to 20 different modes.

4. Variable-length instruction formats.

5. Instructions that manipulate operands in memory.

RISC Characteristics

1. Relatively few instructions.

2. Relatively few addressing modes.

3. Memory access limited to load and store instructions.

4. All operations done within the registers of the CPU.

5. Fixed-length, easily decoded instruction format.

6. Single-cycle instruction execution.

7. Hardwired rather than microprogrammed control.
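Characteristics 3 and 4 can be illustrated by expanding a hypothetical memory-to-memory CISC add into its RISC load/store equivalent. The tuple encoding and the register names R1–R3 are arbitrary choices for this sketch.

```python
def to_risc(cisc_add):
    # A CISC-style ("ADD", dest_addr, src_addr1, src_addr2) instruction
    # touches memory directly. In RISC, memory access is limited to LOAD
    # and STORE, and arithmetic happens only between registers.
    op, dst, src1, src2 = cisc_add
    return [
        ("LOAD", "R1", src1),       # bring operands into registers
        ("LOAD", "R2", src2),
        (op, "R3", "R1", "R2"),     # register-to-register operation
        ("STORE", dst, "R3"),       # write the result back to memory
    ]

print(to_risc(("ADD", "X", "Y", "Z")))
```

One complex instruction becomes four simple ones, but each of the four fits a fixed-length format and a single-cycle pipeline segment, which is the trade-off the comparison above describes.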


End Chapter quizzes:

Q1. A technique that uses special hardware to detect a conflict and then avoid it by routing the data through special paths between pipeline segments is called

a) Operand forwarding b) Delayed load c) Hardware interlocks d) Delayed branch

Q2. Data transfers are limited to load and store instructions in

a) CISC b) RISC c) SIMD d) MISD

Q3. Which pipeline is used to implement floating-point operations, multiplication of fixed-point numbers, and similar computations encountered in scientific problems?

a) Instruction pipeline b) Arithmetic pipeline c) RISC pipeline d) CISC pipeline

Q4. SIMD does not include

a) ALU b) Processing element c) Local memory d) Control unit

Q5. Pipelining is a technique of decomposing a __________ into __________, with each subprocess being executed in a special dedicated segment.

a) sequential process and suboperations b) suboperations and sequential process c) instructions and sequential process d) sequential and segmentation

Q6. A continuous flow of information through a sequence of segments is characteristic of

a) Parallel processing b) Vector processing c) Array processor d) Pipelining

Q7. Parallel processing can be achieved with the help of

a) Multiple functional units b) Synchronization c) Streamlining the data d) Execution of instructions

Q8. To evaluate the expression A - B/C, the number of registers required is

a) 3 b) 4 c) 5 d) 2

Q9. Flynn’s classification divides computers into how many categories?

a) 5 b) 6 c) 3 d) 4

Q10. A measure used to evaluate supercomputers in terms of their ability to perform a given number of floating-point operations per second is referred to as

a) Flops b) Bytes c) Bits d) Hz

Answers

1. (a), 2. (b), 3. (b), 4. (d), 5. (a), 6. (d), 7. (a), 8. (c), 9. (d), 10. (a)
