EECC550 - Shaaban #1 Lec # 1 Winter 2011 11-29-2011 Computer Organization EECC 550 Introduction: Modern Computer Design Levels, Components, Technology

EECC550 - ShaabanEECC550 - Shaaban#1 Lec # 1 Winter 2011 11-29-2011

Computer Organization EECC 550• Introduction: Modern Computer Design Levels, Components, Technology Trends, Register Transfer

Notation (RTN). [Chapters 1, 2]

• Instruction Set Architecture (ISA) Characteristics and Classifications: CISC Vs. RISC. [Chapter 2]

• MIPS: An Example RISC ISA. Syntax, Instruction Formats, Addressing Modes, Encoding & Examples. [Chapter 2]

• Central Processor Unit (CPU) & Computer System Performance Measures. [Chapter 1]

• CPU Organization: Datapath & Control Unit Design. [Chapter 4]

– MIPS Single Cycle Datapath & Control Unit Design.

– MIPS Multicycle Datapath and Finite State Machine Control Unit Design.

• Microprogrammed Control Unit Design.

– Microprogramming Project

• Midterm Review and Midterm Exam

• CPU Pipelining. [Chapter 4]

• The Memory Hierarchy: Cache Design & Performance. [Chapter 5]

• The Memory Hierarchy: Main & Virtual Memory. [Chapter 5]

• Input/Output Organization & System Performance Evaluation. [Chapter 7]

• Computer Arithmetic & ALU Design. [Chapter 3] If time permits.

• Final Exam.

Week 1

Week 2

Week 3

Week 4

Week 5

Week 6

Week 7

Week 8

Week 9

Week 10

Week 11

3rd Edition Ch. 4

3rd Edition Ch. 5

3rd Edition Ch. 5 (not in 4th)

3rd Edition Ch. 5 (not in 4th Edition)

3rd Edition Ch. 6

3rd Edition Ch. 7

3rd Edition Ch. 8


Computing System History/Trends + Instruction Set Architecture (ISA) Fundamentals

• Computing Element Choices:– Computing Element Programmability– Spatial vs. Temporal Computing– Main Processor Types/Applications

• General Purpose Processor Generations• The Von Neumann Computer Model• CPU Organization (Design)• Recent Trends in Computer Design/performance• Hierarchy of Computer Architecture• Hardware Description: Register Transfer Notation (RTN)• Computer Architecture Vs. Computer Organization• Instruction Set Architecture (ISA):

– Definition and purpose– ISA Specification Requirements– Main General Types of Instructions– ISA Types and characteristics – Typical ISA Addressing Modes– Instruction Set Encoding– Instruction Set Architecture Tradeoffs– Complex Instruction Set Computer (CISC)– Reduced Instruction Set Computer (RISC)– Evolution of Instruction Set Architectures

Chapters 1, 2 (both editions)

ISA CPU Design


Computing Element Choices• General Purpose Processors (GPPs): Intended for general purpose computing

(desktops, servers, clusters..)• Application-Specific Processors (ASPs): Processors with ISAs and

architectural features tailored towards specific application domains– E.g Digital Signal Processors (DSPs), Network Processors (NPs), Media Processors,

Graphics Processing Units (GPUs), Vector Processors??? ...

• Co-Processors: A hardware (hardwired) implementation of specific algorithms with limited programming interface (augment GPPs or ASPs)

• Configurable Hardware:– Field Programmable Gate Arrays (FPGAs)– Configurable array of simple processing elements

• Application Specific Integrated Circuits (ASICs): A custom VLSI hardware solution for a specific computational task

• The choice of one or more depends on a number of factors including: - Type and complexity of computational algorithm

(general purpose vs. Specialized) - Desired level of flexibility/ - Performance requirements programmability - Development cost/time - System cost - Power requirements - Real-time constrains

The main goal of this course is the study of fundamental design techniques for General Purpose Processors


Computing Element Choices

Performance

Fle

xibi

lity

General Purpose Processors (GPPs):

Application-Specific Processors (ASPs)

Co-ProcessorsApplication Specific Integrated Circuits (ASICs)

Configurable Hardware

- Type and complexity of computational algorithms (general purpose vs. Specialized)- Desired level of flexibility - Performance - Development cost - System cost - Power requirements - Real-time constrains

Selection Factors:

Specialization , Development cost/time Performance/Chip Area/Watt(Computational Efficiency)

Pro

gram

mab

ility

/ The main goal of this course is the study of fundamental design techniquesfor General Purpose Processors

Processor : Programmable computing element that runs programs written using a pre-defined set of instructions

Software Hardware

ISA Requirements Processor Design


Computing Element ProgrammabilityComputing Element Programmability

• Computes one function (e.g. FP-multiply, divider, DCT)

• Function defined at fabrication time

• e.g hardware (ASICs)

• Computes “any” computable function (e.g. Processors)

• Function defined after fabrication

Fixed Function: Programmable:

Parameterizable Hardware:Performs limited “set” of functions

e.g. Co-Processors

Processor = Programmable computing element that runs programs written using pre-defined instructions

Computing Element Choices:

(Processor)(Hardware) Software


Spatial Temporal

ProcessorInstructions(Program)

(using hardware) (using software/program running on a processor)

Spatial vs. Temporal ComputingSpatial vs. Temporal ComputingComputing Element Choices:

Defined by Defined by fixedfixed functionalityfunctionality and and connectivityconnectivity of hardware elements of hardware elements

Processor = Programmable computing element that runs programs written using a pre-defined set of instructions

Hardware Block Diagram

Time


Main Processor Types/ApplicationsMain Processor Types/Applications

• General Purpose Computing & General Purpose Processors (GPPs) – High performance: In general, faster is always better.– RISC or CISC: Intel P4, IBM Power4, SPARC, PowerPC, MIPS ...– Used for general purpose software– End-user programmable– Real-time performance may not be fully predictable (due to dynamic arch. features)– Heavy weight, multi-tasking OS - Windows, UNIX– Normally, low cost and power not a requirement (changing)– Servers, Workstations, Desktops (PC’s), Notebooks, Clusters …

• Embedded Processing: Embedded processors and processor cores– Cost, power code-size and real-time requirements and constraints– Once real-time constraints are met, a faster processor may not be better– e.g: Intel XScale, ARM, 486SX, Hitachi SH7000, NEC V800...– Often require Digital signal processing (DSP) support or other application-specific support (e.g network, media processing)– Single or few specialized programs – known at system design time– Not end-user programmable– Real-time performance must be fully predictable (avoid dynamic arch. features)– Lightweight, often realtime OS or no OS– Examples: Cellular phones, consumer electronics .. …

• Microcontrollers – Extremely code size/cost/power sensitive– Single program– Small word size - 8 bit common– Usually no OS– Highest volume processors by far– Examples: Control systems, Automobiles, industrial control, thermostats, ...

Incr

easi

ngC

ost/

Com

plex

ity

Increasingvolum

e

Examples of Application-Specific Processors (ASPs)


Processor = Programmable computing element

that runs programs written using pre-defined instructions

64 bit

16-32 bit

8 bit


Processor Cost

Per

form

ance

Microprocessors

Performance iseverything& Software rules

Embeddedprocessors

Microcontrollers

Cost is everything

Application specific architecturesfor performance

GPPsReal-time constraintsSpecialized applicationsLow power/cost constraints

Chip Area, Powercomplexity

The Processor Design SpaceThe Processor Design Space

Processor = Programmable computing element that runs programs written using a pre-defined set of instructions



General Purpose Processor/Computer System GenerationsGeneral Purpose Processor/Computer System Generations

Classified according to implementation technology:

• The First Generation, 1946-59: Vacuum Tubes, Relays, Mercury Delay Lines:

– ENIAC (Electronic Numerical Integrator and Computer): First electronic computer, 18000 vacuum tubes, 1500 relays, 5000 additions/sec (1944).

– First stored program computer: EDSAC (Electronic Delay Storage Automatic Calculator), 1949.

• The Second Generation, 1959-64: Discrete Transistors.– e.g. IBM Main frames

• The Third Generation, 1964-75: Small and Medium-Scale Integrated (MSI) Circuits.

– e.g Main frames (IBM 360) , mini computers (DEC PDP-8, PDP-11).

• The Fourth Generation, 1975-Present: The Microcomputer. VLSI-based Microprocessors (single-chip processor)

– First microprocessor: Intel’s 4-bit 4004 (2300 transistors), 1970.– Personal Computer (PCs), laptops, PDAs, servers, clusters …– Reduced Instruction Set Computer (RISC) 1984

Common factor among all generations:All target the The Von Neumann Computer Model or paradigm


The Von Neumann Computer ModelThe Von Neumann Computer Model• Partitioning of the programmable computing engine into components:

– Central Processing Unit (CPU): Control Unit (instruction decode , sequencing of operations), Datapath (registers, arithmetic and logic unit, connections, buses …).

– Memory: Instruction (program) and operand (data) storage.– Input/Output (I/O) sub-system: I/O bus, interfaces, devices.– The stored program concept: Instructions from an instruction set are fetched from a common

memory and executed one at a time

-Memory

(instructions, data)

Control

DatapathregistersALU, buses

CPUComputer System

Input

Output

I/O Devices

Major CPU Performance Limitation: The Von Neumann computing model implies Neumann computing model implies sequential executionsequential execution one instruction at a time one instruction at a time Another Performance Limitation: Separation of CPU and memory (The Von Neumann memory bottleneck)Von Neumann memory bottleneck)

The Program Counter (PC) points to next instruction to be processed

AKA Program Counter(PC) Based Architecture


Instruction

Fetch

Instruction

Decode

Operand

Fetch

Execute

Result

Store

Next

Instruction

Obtain instruction from program storage

Determine required actions and instruction size

Locate and obtain operand data

Compute result value or status

Deposit results in storage for later use

Determine successor or next instruction

Major CPU Performance Limitation: The Von Neumann computing modelNeumann computing model

implies implies sequential executionsequential execution one instruction at a time one instruction at a time

Generic CPU Machine Instruction Processing StepsGeneric CPU Machine Instruction Processing Steps

(memory)

(Implied by The Von Neumann Computer Model)(Implied by The Von Neumann Computer Model)

The Program Counter (PC) points to next instruction to be processed

(i.e Update PC to fetch next instruction to be processed)


Hardware Components of Computer SystemsHardware Components of Computer Systems

Processor (active)

Computer

ControlUnit

Datapath

Memory(passive)

(where programs, data live whenrunning)

Devices

Input

Output

Keyboard, Mouse, etc.

Display, Printer, etc.

Disk

Five classic components of all computers:Five classic components of all computers:

1. Control Unit; 2. Datapath; 3. Memory; 4. Input; 5. Output1. Control Unit; 2. Datapath; 3. Memory; 4. Input; 5. Output}

Processor

}I/O

I/O

Central Processing Unit (CPU)


CPU Organization (Design)CPU Organization (Design)• Datapath Design:

– Capabilities & performance characteristics of principal Functional Units (FUs) needed by ISA instructions

– (e.g., Registers, ALU, Shifters, Logic Units, ...)– Ways in which these components are interconnected (buses

connections, multiplexors, etc.).– How information flows between components.

• Control Unit Design:– Logic and means by which such information flow is controlled.– Control and coordination of FUs operation to realize the targeted

Instruction Set Architecture to be implemented (can either be implemented using a finite state machine or a microprogram).

• Hardware description with a suitable language, possibly using Register Transfer Notation (RTN).

Components & their connections needed by ISA instructions

Control/sequencing of operations of datapath componentsto realize ISA instructions

Components

Connections

ISA = Instruction Set ArchitectureThe ISA forms an abstraction layer that sets the requirements for both complier and CPU designers

AKA CPU Microarchitecture


A Typical A Typical Microprocessor Microprocessor

Layout: Layout:

The Intel The Intel Pentium ClassicPentium Classic

ControlUnit

Datapath

First Level of Memory (Cache)

1993 - 199760MHz - 233 MHz


A Typical A Typical Microprocessor Microprocessor

Layout: Layout:

The Intel The Intel Pentium ClassicPentium Classic

ControlUnit

Datapath

First Level of Memory (Cache)

1993 - 199760MHz - 233 MHz


Computer System ComponentsComputer System Components

SDRAMPC100/PC133100-133MHZ64-128 bits wide2-way inteleaved~ 900 MBYTES/SEC )64bit)

Double DateRate (DDR) SDRAMPC3200200 MHZ DDR64-128 bits wide4-way interleaved~3.2 GBYTES/SEC (one 64bit channel)~6.4 GBYTES/SEC(two 64bit channels)

RAMbus DRAM (RDRAM)400MHZ DDR16 bits wide (32 banks)~ 1.6 GBYTES/SEC

CPU

Caches

Front Side Bus (FSB)

I/O Devices:

Memory

Controllers

adapters

DisksDisplaysKeyboards

Networks

NICs

I/O BusesMemoryController Example: PCI, 33-66MHz

32-64 bits wide 133-528 MBYTES/SEC PCI-X 133MHz 64 bit 1024 MBYTES/SEC

CPU Core1 GHz - 3.8 GHz4-way SuperscalerRISC or RISC-core (x86): Deep Instruction Pipelines Dynamic scheduling Multiple FP, integer FUs Dynamic branch prediction Hardware speculation

L1

L2 L3

Memory Bus

All Non-blocking cachesL1 16-128K 1-2 way set associative (on chip), separate or unifiedL2 256K- 2M 4-32 way set associative (on chip) unifiedL3 2-16M 8-32 way set associative (off or on chip) unified

Examples: Alpha, AMD K7: EV6, 200-400 MHz Intel PII, PIII: GTL+ 133 MHz Intel P4 800 MHz

NorthBridge

SouthBridge

Chipset

Off or On-chip

Current Standard

I/O Subsystem

Recently 1 or 2 or 4 processor cores per chip


Microprocessor Performance Increase 1984-2000Microprocessor Performance Increase 1984-2000

> 100x performance increase in one decade

SPEC CPU2000 PerformanceSPEC CPU2000 Performance


Microprocessor Transistor Count Growth RateMicroprocessor Transistor Count Growth Rate

Intel 4004 (2300 transistors)

Currently ~ 3 Billion

Still holds today

Moore’s Law:Moore’s Law:2X transistors/ChipEvery 1.5-2 years(circa 1970)

~ 1,300,000x transistor density increase in the last 40 years


A current Multi-core Microprocessor Example

AMD Barcelona (Opteron X4) : 4 processor cores on one chip

The increase in transistor chip density allows integrating more than one processor core per chip

A benefit of Moore’s Law


1.55X/yr, or doubling every 1.6 years

Increase of Capacity of VLSI Dynamic RAM Increase of Capacity of VLSI Dynamic RAM (DRAM) Memory Chips(DRAM) Memory Chips

(Also follows Moore’s Law)Moore’s Law)

~ 17,000x DRAM chip capacity increase in 20 years


Computer Technology Trends:Computer Technology Trends: Evolutionary but Rapid ChangeEvolutionary but Rapid Change

• Processor:– 1.5-1.6 performance improvement every year; Over 100X performance in last

decade.

• Memory:– DRAM capacity: > 2x every 1.5 years; 1000X size in last decade.– Cost per bit: Improves about 25% or more per year.– Only 15-25% performance improvement per year.

• Disk:– Capacity: > 2X in size every 1.5 years.– Cost per bit: Improves about 60% per year.– 200X size in last decade.– Only 10% performance improvement per year, due to mechanical limitations.

• Expected State-of-the-art PC Fourth Quarter 2011 :– Processor clock speed: ~ 3500 MegaHertz (3.5 Giga Hertz)– Memory capacity: ~ 8000 MegaByte (8 Giga Bytes)– Disk capacity: ~ 3000 GigaBytes (3 Tera Bytes)

Performance gap compared to CPU performance causes system performance bottlenecks

With 2-8 processor coreson a single chip


A Simplified View of The A Simplified View of The Software/Hardware Hierarchical LayersSoftware/Hardware Hierarchical Layers


Hierarchy of Computer ArchitectureHierarchy of Computer Architecture

I/O systemInstr. Set Proc.

Compiler

OperatingSystem

Application

Digital DesignCircuit Design

Instruction Set Architecture

Firmware

Datapath & Control

Layout

Software

Hardware

Software/Hardware Boundary

High-Level Language Programs

Assembly LanguagePrograms

Microprogram

Register TransferNotation (RTN)

Logic Diagrams

Circuit Diagrams

Machine Language Program e.g.

BIOS (Basic Input/Output System)e.g.BIOS (Basic Input/Output System)

VLSI placement & routing

(ISA)The ISA forms an abstraction layer that sets the requirements for both complier and CPU designers


Levels of Program RepresentationLevels of Program RepresentationHigh Level Language

Program

Assembly Language Program

Machine Language Program

Control Signal Specification

Compiler

Assembler

Machine Interpretation

temp = v[k];

v[k] = v[k+1];

v[k+1] = temp;

lw$15, 0($2)lw$16, 4($2)sw$16, 0($2)sw$15, 4($2)

0000 1001 1100 0110 1010 1111 0101 10001010 1111 0101 1000 0000 1001 1100 0110 1100 0110 1010 1111 0101 1000 0000 1001 0101 1000 0000 1001 1100 0110 1010 1111

°°

ALUOP[0:3] <= InstReg[9:11] & MASK

Register Transfer Notation (RTN)Microprogram

MIPSAssemblyCode

Sof

twar

eH

ard

war

e

ISA

ISA = Instruction Set Architecture. The ISA forms an abstraction layer that sets the requirements for both complier and CPU designers

ISA Requirements Processor Design


A Hierarchy of Computer DesignA Hierarchy of Computer DesignLevel Name Modules Primitives Descriptive Media

1 Electronics Gates, FF’s Transistors, Resistors, etc. Circuit Diagrams

2 Logic Registers, ALU’s ... Gates, FF’s …. Logic Diagrams

3 Organization Processors, Memories Registers, ALU’s … Register Transfer

Notation (RTN)

4 Microprogramming Assembly Language Microinstructions Microprogram

5 Assembly language OS Routines Assembly language Assembly Language

programming Instructions Programs

6 Procedural Applications OS Routines High-level Language

Programming Drivers .. High-level Languages Programs

7 Application Systems Procedural Constructs Problem-Oriented

Programs

Low Level - Hardware

Firmware

High Level - Software


Hardware DescriptionHardware Description• Hardware visualization:

– Block diagrams (spatial visualization): Two-dimensional representations of functional units and their

interconnections.– Timing charts (temporal visualization): Waveforms where events are displayed vs. time.

• Register Transfer Notation (RTN):– A way to describe microoperations capable of being performed

by the data flow (data registers, data buses, functional units) at the register transfer level of design (RT).

– Also describes conditional information in the system which cause operations to come about.

– A “shorthand” notation for microoperations.

• Hardware Description Languages:– Examples: VHDL: VHSIC (Very High Speed Integrated

Circuits) Hardware Description Language, Verilog.

AKA Register Transfer Language (RTL)


Register Transfer Notation (RTN)Register Transfer Notation (RTN)• Independent RTN:

– No predefined data flow is assumed (i.e No datapath design yet)– Describe actions on registers and memory locations without regard

to nonexistence of direct paths or intermediate registers. – Useful to describe functionality of instructions of a given ISA.

• Dependent RTN: – When RTN is used after the data flow (datapath design) is assumed

to be frozen. – No data transfer can take place over a path that does not exist.– No RTN statement implies a function the data flow hardware is

incapable of performing.• The general format of an RTN statement:

Conditional information : Action1; Action2; …• The conditional statement is often an AND of literals (status and

control signals) in the system (a p-term). The p-term is said to imply the action.

• Possible actions include transfer of data to/from registers/memory data shifting, functional unit operations etc.


RTN Statement ExamplesRTN Statement ExamplesA B or R[A] R[B]

where R[X] mean the content of register X– A copy of the data in entity B (typically a register) is

placed in Register A

– If the destination register has fewer bits than the source, the destination accepts only the lowest-order bits.

– If the destination has more bits than the source, the value of the source is sign extended to the left.

CTL T0: A = B– The contents of B are presented to the input of

combinational circuit A

– This action to the right of “:” takes place when control signal CTL is active and signal T0 is active.


RTN Statement ExamplesRTN Statement ExamplesMD M[MA] or MD Mem[MA]

– Means the memory data (MD) register receives the contents of the main memory (M or Mem) as addressed from the Memory Address (MA) register.

AC(0), AC(1), AC(2), AC(3) – Register fields are indicated by parenthesis.

– The concatenation operation is indicated by a comma.

– Bit AC(0) is bit 0 of the accumulator AC

– The above expression means AC bits 0, 1, 2, 3

– More commonly represented by AC(0-3)

E T3: CLRWRITE– The control signal CLRWRITE is activated when the

condition E T3 is active.


Computer Architecture Vs. Computer OrganizationComputer Architecture Vs. Computer Organization• The term Computer architecture is sometimes erroneously restricted

to computer instruction set design, with other aspects of computer design called implementation.

• More accurate definitions:

– Instruction Set Architecture (ISA): The actual programmer-visible instruction set and serves as the boundary or interface between the software and hardware.

– Implementation of a machine has two components:• Organization: includes the high-level aspects of a computer’s

design such as: The memory system, the bus structure, the internal CPU unit which includes implementations of arithmetic, logic, branching, and data transfer operations.

• Hardware: Refers to the specifics of the machine such as detailed logic design and packaging technology.

• In general, Computer Architecture refers to the above three aspects:

1- Instruction set architecture 2- Organization. 3- Hardware.

The ISA forms an abstraction layer that sets therequirements for both complier and CPU designers

Hardware design and implementation

CPU Micro-architecture (CPU design)


Instruction Set Architecture (ISA)Instruction Set Architecture (ISA)“... the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls the logic design, and the physical implementation.” – Amdahl, Blaaw, and Brooks, 1964.

The instruction set architecture is concerned with:

• Organization of programmable storage (memory & registers): Includes the amount of addressable memory and number of available registers.

• Data Types & Data Structures: Encodings & representations.

• Instruction Set: What operations are specified.

• Instruction formats and encoding.

• Modes of addressing and accessing data items and instructions

• Exceptional conditions.

The ISA forms an abstraction layer that sets therequirements for both complier and CPU designers

AssemblyProgrammerOrCompiler


Evolution of Instruction Set ArchitecturesEvolution of Instruction Set ArchitecturesSingle Accumulator (EDSAC 1949)

Accumulator + Index Registers

(Manchester Mark I, IBM 700 series 1953)

Separation of Programming Model from Implementation

High-level Language Based Concept of an ISA Family(B5000 1963) (IBM 360 1964)

General Purpose Register (GPR) Machines

Complex Instruction Sets (CISC)Load/Store Architecture

Reduced Instruction Set Computer ( (RISC)

(Vax, Motorola 68000, Intel x86 1977-80) (CDC 6600, Cray 1 1963-76)

(MIPS, SPARC, HP-PA, PowerPC, . . . 1984..)

No ISA

i.e. CPU Design


Computer Instruction SetsComputer Instruction Sets• Regardless of computer type, CPU structure, or

hardware organization, every machine instruction must specify the following:

– Opcode: Which operation to perform. Example: add, load, and branch.

– Where to find the operand or operands, if any: Operands may be contained in CPU registers, main memory, or I/O ports.

– Where to put the result, if there is a result: May be explicitly mentioned or implicit in the opcode.

– Where to find the next instruction: Without any explicit branches, the instruction to execute is the next instruction in the sequence or a specified address in case of jump or branch instructions.

Opcode = Operation Code

Operands location can be explicitly specified in the instruction or implied

Destination


Instruction Set Architecture (ISA) Instruction Set Architecture (ISA) Specification RequirementsSpecification RequirementsInstruction

Fetch

Instruction

Decode

Operand

Fetch

Execute

Result

Store

Next

Instruction

• Instruction Format or Encoding:– How is it decoded?

• Location of operands and result (addressing modes):– Where other than memory?– How many explicit operands? – How are memory operands located?– Which can or cannot be in memory?

• Data type and Size.• Operations

– What are supported• Successor instruction:

– Jumps, conditions, branches.• Fetch-decode-execute is implicit.


Main General Types of InstructionsMain General Types of Instructions1. Data Movement Instructions, possible variations:

– Memory-to-memory.– Memory-to-CPU register.– CPU-to-memory.– Constant-to-CPU register.– CPU-to-output.– etc.

2. Arithmetic Logic Unit (ALU) Instructions:– Logic instructions– Integer Arithmetic Instructions– Floating Point Arithmetic Instructions

3. Branch (Control) Instructions:

– Unconditional jumps. – Conditional branches.


Examples of Data Movement InstructionsExamples of Data Movement Instructions

Instruction Meaning Machine

MOV A,B Move 16-bit data from memory loc. A to loc. B VAX11

lwz R3,A Move 32-bit data from memory loc. A to register R3 PPC601

li $3,455 Load the 32-bit integer 455 into register $3 MIPS R3000

MOV AX,BX Move 16-bit data from register BX into register AX Intel X86

LEA.L (A0),A2 Load the address pointed to by A0 into A2 MC68000


Examples of ALU InstructionsExamples of ALU Instructions


MULF A,B,C Multiply the 32-bit floating point values at mem. VAX11locations A and B, and store result in loc. C

nabs r3,r1 Store the negative absolute value of register r1 in r2 PPC601

ori $2,$1,255 Store the logical OR of register $1 with 255 into $2 MIPS R3000

SHL AX,4 Shift the 16-bit value in register AX left by 4 bits Intel X86

ADD.L D0,D1 Add the 32-bit values in registers D0, D1 and store MC68000the result in register D0


Examples of Branch InstructionsExamples of Branch Instructions


BLBS A, Tgt Branch to address Tgt if the least significant bit VAX11at location A is set.

bun r2 Branch to location in r2 if the previous comparison PPC601signaled that one or more values was not a number.

Beq $2,$1,32 Branch to location PC+4+32 if contents of $1 and $2 MIPS R3000are equal.

JCXZ Addr Jump to Addr if contents of register CX = 0. Intel X86

BVS next Branch to next if overflow flag in CC is set. MC68000


Operation Types in The Instruction SetOperation Types in The Instruction Set Operator Type Examples

Arithmetic and logical Integer arithmetic and logical operations: add, or

Data transfer Loads-stores (move on machines with memory

addressing)

Control Branch, jump, procedure call, and return, traps.

System Operating system call/return, virtual memory management instructions ...

Floating point Floating point operations: add, multiply ....

Decimal Decimal add, decimal multiply, decimal to character conversion

String String move, string compare, string search

Media The same operation performed on multiple data (e.g Intel MMX, SSE)

1

2

3


Instruction Usage Example:Instruction Usage Example: Top 10 Intel X86 Top 10 Intel X86

InstructionsInstructionsRank Integer Average Percent total executed

1

2

3

4

5

6

7

8

9

10

instruction

load

conditional branch

compare

store

add

and

sub

move register-register

call

return

Total

Observation: Simple instructions dominate instruction usage frequency.

22%

20%

16%

12%

8%

6%

5%

4%

1%

1%

96%

CISC to RISC observation


Types of Instruction Set ArchitecturesTypes of Instruction Set ArchitecturesAccording To Operand Memory Addressing FieldsAccording To Operand Memory Addressing Fields

Memory-To-Memory Machines:– Operands obtained from memory and results stored back in memory by any instruction

that requires operands.– No local CPU registers are used in the CPU datapath.– Include:

• The 4 Address Machine.• The 3-address Machine.• The 2-address Machine.

The 1-address (Accumulator) Machine: – A single local CPU special-purpose register (accumulator) is used as the source of one

operand and as the result destination.The 0-address or Stack Machine:

– A push-down stack is used in the CPU.

General Purpose Register (GPR) Machines:– The CPU datapath contains several local general-purpose registers which can be

used as operand sources and as result destinations.– A large number of possible addressing modes.– Load-Store or Register-To-Register Machines: GPR machines where only data

movement instructions (loads, stores) can obtain operands from memory and store results to memory.

CISC to RISC observation (load-store simplifies CPU design)

Machine = ISA or CPU targeting a specific ISA type


Types of Instruction Set ArchitecturesTypes of Instruction Set Architectures Memory-To-Memory Machines: Memory-To-Memory Machines:

The 4-Address Machine/ISAThe 4-Address Machine/ISA• No program counter (PC) or other CPU registers are used.• Instruction encoding has four address fields to specify:

– Location of first operand. - Location of second operand.– Place to store the result. - Location of next instruction.

Memory

Op1

Op2

Res

Nexti

::

Op1Addr:

Op2Addr:

ResAddr:

NextiAddr:

+

CPUInstruction:

add Res, Op1, Op2, Nexti

Meaning:

Res Op1 + Op2

or more precise RTN:

M[ResAddr] M[Op1Addr] + M[Op2Addr]

NextiAddrResAddr Op1Addr Op2AddraddBits: 8 24 24 24 24

Instruction Format (encoding)

Opcode Whichoperation

Where toput result

Where to findnext instruction

Where to find operands

Can address 224 bytes = 16 MBytes

Instruction Size:13 bytes


• A program counter (PC) is included within the CPU which points to the next instruction.• No CPU storage (general-purpose registers).


The 3-Address Machine/ISAThe 3-Address Machine/ISA

Instruction:

add Res, Op1, Op2

Meaning:

Res Op1 + Op2or more precise RTN:

M[ResAddr] M[Op1Addr] + M[Op2Addr] PC PC + 10

ResAddr Op1Addr Op2AddraddBits: 8 24 24 24



Where toput result


Memory

Op1

Op2

Res

Nexti

::

Op1Addr:

Op2Addr:

ResAddr:

NextiAddr:

+

CPU

ProgramCounter (PC)


24


Increment PC



• The 2-address Machine: Result is stored in the memory address of one of the operands.


The 2-Address Machine/ISAThe 2-Address Machine/ISA

Instruction:

add Op2, Op1

Meaning:

Op2 Op1 + Op2or more precise RTN:

M[Op2Addr] M[Op1Addr] + M[Op2Addr]PC PC + 7

Where toput result

Op2Addr Op1AddraddBits: 8 24 24




Memory

Op1

Op2,Res

Nexti

::

Op1Addr:

Op2Addr:

NextiAddr:

+

CPU

ProgramCounter (PC)


24

Increment PC




• A single register (accumulator) in the CPU is used as the source of one operand and result destination.

Instruction:

add Op1

Meaning:

Acc Acc + Op1

or more precise RTN:

Acc Acc + M[Op1Addr]PC PC + 4

Types of Instruction Set ArchitecturesTypes of Instruction Set Architectures The 1-address (Accumulator) Machine/ISAThe 1-address (Accumulator) Machine/ISA

Op1AddraddBits: 8 24



Where to find operand1

Memory

Op1

Nexti

::

Op1Addr:

NextiAddr:

+

CPU

ProgramCounter (PC)


24

Accumulator

Where to findoperand2, andwhere to put result

Increment PC




• A push-down stack is used in the CPU.

Types of Instruction Set ArchitecturesTypes of Instruction Set Architectures The 0-address (Stack) Machine/ISAThe 0-address (Stack) Machine/ISA

Instruction: push Op1

Meaning: TOS M[Op1Addr]

Instruction: add

Meaning: TOS TOS + SOS

Instruction Format

addBits: 8

Opcode

Instruction: pop Res

Meaning: M[ResAddr] TOS

Op1AddrpushBits: 8 24

Instruction Format

Opcode Where to find operand

ResAddr popBits: 8 24

Instruction Format

Opcode Memory Destination

CPU

ProgramCounter (PC)

24

Memory

Op1

Nexti

::

Op1Addr:

NextiAddr:

TOS

SOS

etc.

Stackpush

Op2Op2Addr:

ResResAddr:

+

add

Op1

Op2, Respop

8

4 Bytes

1 Byte

4 Bytes

TOS = Top Entry in StackSOS = Second Entry in Stack



• CPU contains several general-purpose registers which can be used as operand sources and result destination.

Types of Instruction Set ArchitecturesTypes of Instruction Set Architectures General Purpose Register (GPR) MachinesGeneral Purpose Register (GPR) Machines

Instruction: load R8, Op1Meaning: R8 M[Op1Addr] PC PC + 5

+

CPU

ProgramCounter (PC)

24

Memory

Op1

Nexti

::

Op1Addr:

NextiAddr:

R8

R7

R6

R5

R4

R3

R2

R1

Registersload

add

store

Op1AddrloadBits: 8 3 24

Instruction Format

Opcode Where to find operand1

R8

Instruction: add R2, R4, R6Meaning: R2 R4 + R6 PC PC + 3

addBits: 8 3 3 3

Instruction Format

Opcode Des Operands

R2 R4 R6

Instruction: store R2, Op2Meaning: M[Op2Addr] R2 PC PC + 5

ResAddrstoreBits: 8 3 24

Instruction Format

Opcode Destination

R2

Here add instruction has three register specifier fieldsWhile load, store instructions have one register specifier fieldand one memory address specifier field

Size = 4.375 bytes rounded up to 5 bytes



Eight general purpose Registers (GPRs) assumed here: R1-R8


Expression Evaluation Example with 3-, 2-, Expression Evaluation Example with 3-, 2-, 1-, 0-Address, And GPR Machines1-, 0-Address, And GPR Machines

For the expression A = (B + C) * D - E where A-E are in memory

0-Address Stack

push B push C add push D mul push E sub pop A

8 instructionsCode size:23 bytes5 memoryaccesses fordata

3-Address

add A, B, Cmul A, A, Dsub A, A, E

3 instructionsCode size:30 bytes9 memory accesses fordata

2-Address

load A, Badd A, Cmul A, Dsub A, E

4 instructionsCode size:28 bytes11 memoryaccesses fordata

1-AddressAccumulator

load B add C mul D sub E store A

5 instructionsCode size:20 bytes5 memory accesses fordata

GPRRegister-Memory

load R1, B add R1, C mul R1, D sub R1, E store A, R1

5 instructions Code size: 25 bytes 5 memory accesses for data

Load-Store

load R1, Bload R2, Cadd R3, R1, R2load R1, Dmul R3, R3, R1load R1, Esub R3, R3, R1store A, R3

8 instructions Code size: 34 bytes 5 memory accesses for data


Instruction Set Architecture TradeoffsInstruction Set Architecture Tradeoffs• 3-address machine: shortest code sequence; a large number of bits

per instruction; large number of memory accesses.

• 0-address (stack) machine: Longest code sequence; shortest individual instructions; more complex to program.

• General purpose register machine (GPR): – Addressing modified by specifying among a small set of registers

with using a short register address (all new ISAs since 1975).

– Advantages of GPR:• Low number of memory accesses. Faster, since register access is

currently still much faster than memory access. • Registers are easier for compilers to use.• Shorter, simpler instructions.

• Load-Store Machines: GPR machines where memory addresses are only included in data movement instructions (loads/stores) between memory and registers (all new ISAs designed after 1980).

CISC to RISC observation (load-store simplifies CPU design)

Machine = CPU or ISA

Why GPR? 1

2

3 AKA Register-register GPR


Typical GPR ISA Memory Addressing ModesTypical GPR ISA Memory Addressing Modes

Register

Immediate

Displacement

Indirect

Indexed

Absolute

Memory indirect

Autoincrement

Autodecrement

Scaled

R4 R4 + R3

R4 R4 + 3

R4 R4 + Mem[10+ R1]

R4 R4 + Mem[R1]

R3 R3 +Mem[R1 + R2]

R1 R1 + Mem[1001]

R1 R1 + Mem[Mem[R3]]

R1 R1 + Mem[R2]

R2 R2 + d

R2 R2 - d

R1 R1 + Mem[R2]

R1 R1+ Mem[100+ R2 + R3*d]

Add R4, R3

Add R4, #3

Add R4, 10 (R1)

Add R4, (R1)

Add R3, (R1 + R2)

Add R1, (1001)

Add R1, @ (R3)

Add R1, (R2) +

Add R1, - (R2)

Add R1, 100 (R2) [R3]

Addressing Sample Mode Instruction Meaning

CISC to RISC observation (fewer addressing modes simplify CPU design)


Addressing Modes Usage ExampleAddressing Modes Usage Example

Displacement 42% avg, 32% to 55%

Immediate: 33% avg, 17% to 43%

Register deferred (indirect): 13% avg, 3% to 24%

Scaled: 7% avg, 0% to 16%

Memory indirect: 3% avg, 1% to 6% Misc: 2% avg, 0% to 3%

75% displacement & immediate88% displacement, immediate & register indirect.

Observation: In addition Register direct, Displacement, Immediate, Register Indirect addressing modes are important.

For 3 programs running on VAX ignoring direct register mode:

75%88%

CISC to RISC observation (fewer addressing modes simplify CPU design)


Displacement Address Size ExampleDisplacement Address Size ExampleAvg. of 5 SPECint92 programs v. avg. 5 SPECfp92 programs

1% of addresses > 16-bits

12 - 16 bits of displacement needed

0%

10%

20%

30%

0 1 2 3 4 5 6 7 8 9

10

11

12

13

14

15

Int. Avg. FP Avg.

Displacement Address Bits Needed


For displacement addressing mode


Instruction Set EncodingInstruction Set EncodingConsiderations affecting instruction set encoding:

– The number of registers and addressing modes supported by ISA.

– The impact of of the size of the register and addressing mode fields on the average instruction size and on the average program.

– To encode instructions into lengths that will be easy to handle in the implementation. On a minimum to be a multiple of bytes.

• Instruction Encoding Classification:

1. Fixed length encoding: Faster and easiest to implement in hardware.

2. Variable length encoding: Produces smaller instructions.

3. Hybrid encoding.

e.g. Simplifies design of pipelined CPUs


to reduce code size


Three Examples of Instruction Set EncodingThree Examples of Instruction Set Encoding

Variable Length Encoding: VAX (1-53 bytes)

Operations &no of operands

Addressspecifier 1

Addressfield 1

Address specifier n

Address field n

Operation Addressfield 1

Addressfield 2

Addressfield3

Fixed Length Encoding: MIPS, PowerPC, SPARC (all instructions are 4 bytes each)

Operation Address Specifier

Addressfield

Operation AddressSpecifier 1

AddressSpecifier 2

Address field

OperationAddress Specifier Address

field 1

Address field 2

Hybrid Encoding: IBM 360/370, Intel 80x86


ISA ExamplesISA Examples Machine Number of General Architecture year Purpose RegistersEDSACIBM 701CDC 6600IBM 360DEC PDP-8DEC PDP-11Intel 8008Motorola 6800DEC VAX

Intel 8086 Motorola 68000Intel 80386MIPSHP PA-RISCSPARCPowerPCDEC AlphaHP/Intel IA-64AMD64 (EMT64)

11816181116

1168323232323212816

accumulatoraccumulatorload-storeregister-memoryaccumulatorregister-memoryaccumulatoraccumulatorregister-memorymemory-memoryextended accumulatorregister-memoryregister-memoryload-storeload-storeload-storeload-storeload-storeload-storeregister-memory

194919531963196419651970197219741977

1978198019851985198619871992199220012003


Examples of GPR MachinesExamples of GPR Machines

Max. number ofnumber of Max. numberMax. number

memory addresses of operands allowedmemory addresses of operands allowed

SPARC, MIPSSPARC, MIPS

PowerPC, ALPHAPowerPC, ALPHA

Intel Intel 80386

Motorola 68000Motorola 68000

2 or 3 2 or 3 VAXVAX

0 3

1 2

For Arithmetic/Logic (ALU) Instructions(ISAs)

+ destination


Complex Instruction Set Computer (CISC)Complex Instruction Set Computer (CISC)• Emphasizes doing more with each instruction:

– Thus fewer instructions per program (more compact code).

• Motivated by the high cost of memory and hard disk capacity when original CISC architectures were proposed– When M6800 was introduced: 16K RAM = $500, 40M hard disk = $ 55, 000– When MC68000 was introduced: 64K RAM = $200, 10M HD = $5,000

• Original CISC architectures evolved with faster more complex CPU designs but backward instruction set compatibility had to be maintained (e.g X86).

• Wide variety of addressing modes:• 14 in MC68000, 25 in MC68020

• A number instruction modes for the location and number of operands:

• The VAX has 0- through 3-address instructions.

• Variable-length instruction encoding.

ISAs

Circa 1980

Why?

To reduce code size


Example CISC ISAs Example CISC ISAs Motorola 680X0Motorola 680X0

18 addressing modes:• Data register direct.• Address register direct.• Immediate.• Absolute short.• Absolute long.• Address register indirect.• Address register indirect with postincrement.• Address register indirect with predecrement.• Address register indirect with displacement.• Address register indirect with index (8-bit).• Address register indirect with index (base).• Memory inderect postindexed.• Memory indirect preindexed.• Program counter indirect with index (8-bit).• Program counter indirect with index (base).• Program counter indirect with displacement.• Program counter memory indirect postindexed.• Program counter memory indirect preindexed.

Operand size:• Range from 1 to 32 bits, 1, 2, 4, 8,

10, or 16 bytes.

Instruction Encoding:• Instructions are stored in 16-bit

words.

• the smallest instruction is 2- bytes (one word).

• The longest instruction is 5 words (10 bytes) in length.

GPR ISA (Register-Memory)

2 Bytes 10 Bytes


Example CISC ISA:Example CISC ISA:

Intel Intel 8038612 addressing modes:

• Register.

• Immediate.

• Direct.

• Base.

• Base + Displacement.

• Index + Displacement.

• Scaled Index + Displacement.

• Based Index.

• Based Scaled Index.

• Based Index + Displacement.

• Based Scaled Index + Displacement.

• Relative.

Operand sizes:• Can be 8, 16, 32, 48, 64, or 80 bits long.

• Also supports string operations.

Instruction Encoding:• The smallest instruction is one byte.

• The longest instruction is 12 bytes long.

• The first bytes generally contain the opcode, mode specifiers, and register fields.

• The remainder bytes are for address displacement and immediate data.

X86 or IA-32

GPR ISA (Register-Memory)

One Byte 12 Bytes


Reduced Instruction Set Computer (RISC)Reduced Instruction Set Computer (RISC)• Focuses on reducing the number and complexity of instructions of the

ISA.

– Motivated by simplifying the ISA and its requirements to:• Reduce CPU design complexity• Improve CPU performance.

– CPU Performance Goal: Reduced number of cycles needed per instruction. At least one instruction completed per clock cycle.

• Simplified addressing modes supported.– Usually limited to immediate, register indirect, register displacement, indexed.

• Load-Store GPR: Only load and store instructions access memory.– (Thus more instructions are usually executed than CISC)

• Fixed-length instruction encoding.– (Designed with CPU instruction pipelining in mind).

• Support of delayed branches.• Examples: MIPS, HP PA-RISC, SPARC, Alpha, POWER, PowerPC.

ISAs~1984

RISC: Simplify ISA Simplify CPU Design Better CPU Performance

RISC Goals


Example RISC ISA:Example RISC ISA:

PowerPCPowerPC8 addressing modes:

• Register direct.

• Immediate.

• Register indirect.

• Register indirect with immediate index (loads and stores).

• Register indirect with register index (loads and stores).

• Absolute (jumps).

• Link register indirect (calls).

• Count register indirect (branches).

Operand sizes:• Four operand sizes: 1, 2, 4 or 8 bytes.

Instruction Encoding:• Instruction set has 15 different formats

with many minor variations.•

• All are 32 bits (4 bytes) in length.

Load-Store GPR



HP Precision Architecture HP Precision Architecture HP PA-RISC HP PA-RISC

7 addressing modes:• Register

• Immediate

• Base with displacement

• Base with scaled index and displacement

• Predecrement

• Postincrement

• PC-relative

Operand sizes:• Five operand sizes ranging in powers of

two from 1 to 16 bytes.

Instruction Encoding:• Instruction set has 12 different formats.•


Load-Store GPR



SPARCSPARC

5 addressing modes:• Register indirect with immediate

displacement.

• Register inderect indexed by another register.

• Register direct.

• Immediate.

• PC relative.


Instruction Encoding:• Instruction set has 3 basic instruction

formats with 3 minor variations.


Load-Store GPR



DEC Alpha AXPDEC Alpha AXP

4 addressing modes:• Register direct.

• Immediate.

• Register indirect with displacement.

• PC-relative.


Instruction Encoding:• Instruction set has 7 different formats.•


Load-Store GPR


RISC ISA Example: RISC ISA Example:

MIPS R3000 (32-bit)MIPS R3000 (32-bit)Instruction Categories:

• Load/Store.• Computational.• Jump and Branch.• Floating Point (using coprocessor).• Memory Management.• Special.

OP

OP

OP

rs rt rd sa funct

rs rt immediate

jump target

Instruction Encoding: 3 Instruction Formats, all 32 bits (4 bytes) wide.

R0 - R31

PCHI

LO

Registers 5 Addressing Modes:• Register direct (arithmetic).• Immedate (arithmetic).• Base register + immediate offset

(loads and stores). • PC relative (branches).• Pseudodirect (jumps)

Operand Sizes:• Memory accesses in any

multiple between 1 and 4 bytes.

MIPS is the target ISA for CPU design in this course

R

I

J

Load-Store GPR

AKA MIPS-I

Documents

EECC550 - Shaaban #1 Lec # 1 Winter 2011 11-29-2011 Computer Organization EECC 550 Introduction: Modern Computer Design Levels, Components, Technology