EE37E 2005 1

Lesson 5: Processor Design

Topic 1 – Methods and Concepts

Introduction

References:
– Modern Processor Design (pp. 1–16)
– Computer Organization and Design (pp. 54–89)

• While introducing this topic we will focus on these points:
– Evolution of microprocessors
– Instruction set processor design
– Principles
• Microprocessors are instruction set processors (ISPs).
• An ISP executes instructions from a predefined instruction set.
• A microprocessor's functionality is fully characterized by the instruction set it is capable of executing.
• This predefined instruction set is also called the instruction set architecture (ISA).

• An ISA serves as an interface between software and hardware.

• In terms of processor design methodology, an ISA is the specification of the design while the microprocessor or ISP is the implementation of a design.

Computer System Components

[Figure: block diagram of a computer system. The CPU (with L1, L2 and L3 caches) connects over the system bus to the chipset (North Bridge and South Bridge). The memory controller connects to memory over the memory bus. I/O buses connect controllers, adapters and NICs to the I/O devices: disks, displays, keyboards and networks.]

CPU:
– 1000 MHz – 3 GHz (a multiple of system bus speed)
– Pipelined (7 – 21 stages)
– Superscalar (max ~4 instructions/cycle), single-threaded
– Dynamically scheduled or VLIW
– Dynamic and static branch prediction

System bus (support for one or more CPUs). Examples:
– Alpha, AMD K7: EV6, 400 MHz
– Intel PII, PIII: GTL+, 133 MHz
– Intel P4: 800 MHz

Memory:
– SDRAM PC100/PC133: 100–133 MHz, 64–128 bits wide, 2-way interleaved, ~900 MBytes/sec
– Double Data Rate (DDR) SDRAM PC3200: 400 MHz (effective, 200x2), 64–128 bits wide, 4-way interleaved, ~3.2 GBytes/sec (second half 2002)
– Rambus DRAM (RDRAM) PC800, PC1060: 400–533 MHz (DDR), 16–32 bits wide channel, ~1.6 – 3.2 GBytes/sec (per channel)

I/O buses. Example:
– PCI-X: 133 MHz
– PCI: 33–66 MHz, 32–64 bits wide, 133–1024 MBytes/sec

Networks: Fast Ethernet, Gigabit Ethernet, ATM, Token Ring ...

Computer System Components (cont.)

[Figure: the same block diagram (CPU with L1, L2, L3 caches; system bus; chipset with North Bridge and South Bridge; memory controller; memory bus; memory; I/O buses; controllers, adapters, NICs; I/O devices: disks (RAID), displays, keyboards, networks), annotated with the enhancements listed below. The CPU is labelled SMT and CMP.]

Memory Latency Reduction:
– Conventional & block-based trace cache.
– Integrate the memory controller & a portion of main memory with the CPU: Intelligent RAM.
– Integrated memory controller: AMD Opteron, IBM Power5.

Enhanced CPU Performance & Capabilities:
• Support for Simultaneous Multithreading (SMT): Alpha EV8.
• VLIW & intelligent compiler techniques: Intel/HP EPIC IA-64.
• More advanced branch prediction techniques.
• Chip Multiprocessors (CMPs): The Hydra Project, IBM Power 4, 5.
• Vector processing capability: Vector Intelligent RAM (VIRAM), or multimedia ISA extension.
• Digital Signal Processing (DSP) capability in system.
• Re-configurable computing hardware capability in system.

Recent Trends in Computer Design

• The cost/performance ratio of computing systems has seen a steady decline due to advances in:
– Integrated circuit technology: decreasing feature size.
  • Clock rate improves roughly in proportion to the improvement in feature size.
  • Number of transistors improves in proportion to the square of that improvement (or faster).
– Architectural improvements in CPU design.
• Microprocessor systems directly reflect IC improvement in terms of a yearly 35 to 55% improvement in performance.
• Assembly language has been mostly eliminated and replaced by alternatives such as C or C++.
• Standard operating systems (UNIX, NT) lowered the cost of introducing new architectures.
• Emergence of RISC architectures and RISC-core architectures.
• Adoption of quantitative approaches to computer design based on empirical performance observations.

Microprocessor Architecture Trends

• CISC Machines: instructions take variable times to complete
• RISC Machines (microcode): simple instructions, optimized for speed
• RISC Machines (pipelined): same individual instruction latency; greater throughput through instruction "overlap"
• Superscalar Processors: multiple instructions executing simultaneously
• Multithreaded Processors: additional HW resources (regs, PC, SP); each context gets the processor for x cycles
• VLIW: "super instructions" grouped together; decreased HW control complexity
• Single Chip Multiprocessors (CMPs): duplicate entire processors (tech soon due to Moore's Law)
• Simultaneous Multithreading (SMT): multiple HW contexts (regs, PC, SP); each cycle, any context may execute
• SMT/CMPs (e.g. IBM Power5 in 2004)

Evolution of microprocessors

[Figure 1: Evolution of microprocessors. Transistor count (log scale, 1,000 to 100,000,000) versus year (1970 to 2000), following Moore's Law. Plotted processors: i4004, i8080, i8086, i80286, i80386, i80486, Pentium.]

CMOS improvements:
• Die size: 2X every 3 yrs
• Line width: halve / 4-7 yrs

"Graduation Window":
• Alpha 21264: 15 million
• Pentium Pro: 5.5 million
• PowerPC 620: 6.9 million
• Alpha 21164: 9.3 million
• Sparc Ultra: 5.2 million

• Three decades of the history of microprocessors tell a truly remarkable story of advances in the computer industry (Table 1).

                      1970 – 1980    1980 – 1990    1990 – 2000       2000 – 2010
Transistor count      2K – 100K      100K – 1M      1M – 100M         100M – 2B
Clock frequency       0.1 – 3 MHz    3 – 30 MHz     30 MHz – 1 GHz    1 – 15 GHz
Instructions/cycle    0.1            0.1 – 0.9      0.9 – 1.9         1.9 – 2.9

Table 1. The amazing decades of the evolution of microprocessors

Hierarchy of Computer Architecture

[Figure: layered view of a computer system, with the Instruction Set Architecture at the software/hardware boundary.]

Software: Application, Operating System, Compiler, Firmware
Software/Hardware Boundary: Instruction Set Architecture
Hardware: Instr. Set Proc., I/O system, Datapath & Control, Digital Design, Circuit Design, Layout

Representations shown alongside the layers: High-Level Language Programs, Assembly Language Programs, Machine Language Program, Microprogram, Register Transfer Notation (RTN), Logic Diagrams, Circuit Diagrams

Instruction Set Processor Design

• Critical to an ISP is the instruction set architecture, which specifies the functionality that must be implemented by the instruction set processor (ISP).

The Design Process

• "To Design Is To Represent"
– Design activity yields description/representation of an object
  • Traditional craftsman does not distinguish between the conceptualization and the artifact
  • Separation comes about because of complexity
  • Concept is captured in one or more representation languages
– This process IS design
• Design Begins With Requirements
– Functional Capabilities: what it will do
– Performance Characteristics: Speed, Power, Area, Cost, . . .

Design Process (cont.)

• Design Finishes As Assembly
– Design understood in terms of components and how they have been assembled
– Top-down decomposition of complex functions (behaviors) into more primitive functions
– Bottom-up composition of primitive building blocks into more complex assemblies

[Figure: decomposition hierarchy. CPU at the top; Datapath and Control below it; ALU, Regs and Shifter below those; Nand Gate at the bottom.]

Design is a "creative process," not a simple method

Design as Search

Design involves educated guesses and verification

-- Given the goals, how should these be prioritized?

-- Given alternative design pieces, which should be selected?

-- Given design space of components & assemblies, which part will yield the best solution?

Feasible (good) choices vs. Optimal choices

Problem A

Strategy 1 Strategy 2

SubProb 1 SubProb2 SubProb3

BB1 BB2 BB3 BBn

Instruction Set Architecture (subset of Computer Architecture)

"... the attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation."

– Amdahl, Blaauw, and Brooks, 1964

• Organization of Programmable Storage

• Data Types & Data Structures: Encodings & Representations

• Instruction Set

• Instruction Formats

• Modes of Addressing and Accessing Data Items and Instructions

• Exceptional Conditions

The Instruction Set: a Critical Interface

[Figure 2: ISA. The instruction set sits as the interface between software and hardware.]

Dynamic Static Interface

• We have discussed two critical roles played by the ISA:
– Contract between software and hardware, which facilitates the development of programs and machines
– Specification for microprocessor design
• The third role is an associated definition of an interface that separates what is done statically at compile time from what is done dynamically at run time. This interface is called the "dynamic-static interface" (DSI).

[Figure 3: The dynamic-static interface. The architecture (DSI) separates the program (software) from the machine (hardware). On the software side is compiler complexity, exposed to software ("static"); on the hardware side is hardware complexity, hidden in hardware ("dynamic").]

Computer Architecture Topics

[Figure: map of computer architecture topics.]

Topic areas:
• Instruction Set Architecture
• Pipelining and Instruction Level Parallelism
• Memory Hierarchy
• Input/Output and Storage
• VLSI

Labels shown in the figure:
• Pipelining, Hazard Resolution, Superscalar, Reordering, Prediction, Speculation, Vector, DSP
• Addressing, Protection, Exception Handling
• L1 Cache, L2 Cache, DRAM
• Coherence, Bandwidth, Latency
• Emerging Technologies, Interleaving, Bus protocols
• Disks, WORM, Tape
• RAID

Principles of Processor Performance

Definitions

• Performance is in units of things per second
– bigger is better
• If we are primarily concerned with response time:

$$\text{performance}(x) = \frac{1}{\text{execution\_time}(x)}$$

"X is n times faster than Y" means

$$n = \frac{\text{Performance}(X)}{\text{Performance}(Y)} = \frac{\text{Execution\_time}(Y)}{\text{Execution\_time}(X)}$$
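The "n times faster" definition can be checked numerically. The sketch below is only an illustration; the function names and the timings of 10 s and 15 s are assumptions, not values from the lecture.

```python
def performance(execution_time_s: float) -> float:
    """Performance as the reciprocal of execution time (response-time view)."""
    return 1.0 / execution_time_s


def times_faster(exec_time_x: float, exec_time_y: float) -> float:
    """Return n such that X is n times faster than Y."""
    return performance(exec_time_x) / performance(exec_time_y)


if __name__ == "__main__":
    # Hypothetical timings: machine X needs 10 s, machine Y needs 15 s.
    n = times_faster(10.0, 15.0)
    print(f"X is {n:.2f} times faster than Y")
```

Note that the ratio of performances equals the inverse ratio of execution times, so the slower machine's time ends up in the numerator.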

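Cycles Per Instruction

IC = Instruction Count, CPI = Clock cycles Per Instruction

$$\text{CPU time} = \text{Number of clock cycles} \times \text{Clock cycle time} = \frac{\text{Number of clock cycles}}{\text{Clock frequency}}$$

$$CPI = \frac{\text{Number of clock cycles}}{IC}$$

$$\text{CPU time} = IC \times CPI \times \text{Clock cycle time} = \frac{IC \times CPI}{\text{Clock rate}}$$

$$\text{CPU time} = \text{Cycle time} \times \sum_{j=1}^{n} CPI_j \times IC_j$$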

Cycles Per Instruction

We may separate the contribution of each type of instruction to the execution time by defining:

$$\text{Number of clock cycles} = \sum_{j=1}^{n} CPI_j \times IC_j$$

where $IC_j$ is the number of times that instruction $j$ is executed, and $CPI_j$ is the average number of clocks required to execute instruction $j$. Dividing by the total instruction count gives the overall CPI:

$$CPI = \sum_{j=1}^{n} CPI_j \times \frac{IC_j}{IC}$$

Processor pipelining and memory interactions limit the accuracy of this approach, but it's a good first guess. For accuracy, it is necessary to simulate the instructions of an entire program with issue, pipeline and memory interactions.
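As a first-guess estimate of this kind, the per-class cycle counts can be summed and divided by the total instruction count. The sketch below assumes a hypothetical instruction mix (the class counts and per-class CPI values are invented for illustration).

```python
def total_cycles(mix):
    """mix: list of (IC_j, CPI_j) pairs; returns the sum of CPI_j * IC_j."""
    return sum(ic_j * cpi_j for ic_j, cpi_j in mix)


def average_cpi(mix):
    """Overall CPI: total clock cycles divided by total instruction count."""
    ic_total = sum(ic_j for ic_j, _ in mix)
    return total_cycles(mix) / ic_total


if __name__ == "__main__":
    # Hypothetical mix: 50 ALU ops at 1 cycle, 30 loads/stores at 2 cycles,
    # 20 branches at 3 cycles.
    mix = [(50, 1), (30, 2), (20, 3)]
    print("cycles:", total_cycles(mix))
    print("CPI:", average_cpi(mix))
```

This ignores pipeline and memory effects entirely, which is exactly the limitation noted above.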

Aspects of CPU Performance (CPU Law)

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
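The CPU law is a product of three factors, so it is easy to evaluate directly. A minimal sketch, assuming a hypothetical program of one billion instructions, a CPI of 1.7 and a 2 GHz clock (none of these numbers come from the lecture):

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds = instructions * cycles/instruction * seconds/cycle."""
    seconds_per_cycle = 1.0 / clock_rate_hz
    return instruction_count * cpi * seconds_per_cycle


if __name__ == "__main__":
    # Hypothetical workload: 1e9 instructions, CPI of 1.7, 2 GHz clock.
    print(f"{cpu_time(1e9, 1.7, 2e9):.3f} s")
```

Halving any one factor (fewer instructions, lower CPI, or a shorter cycle) halves the CPU time, provided the other two factors stay unchanged.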

Amdahl's Law

Speedup due to enhancement E:

$$\text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E}$$

Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected.

E.g. special instructions, memory, IO, parallel processing

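Amdahl's Law

$$ExTime_{new} = ExTime_{old} \times \left[ (1 - Fraction_{enhanced}) + \frac{Fraction_{enhanced}}{Speedup_{enhanced}} \right]$$

$$Speedup_{overall} = \frac{ExTime_{old}}{ExTime_{new}} = \frac{1}{(1 - Fraction_{enhanced}) + \dfrac{Fraction_{enhanced}}{Speedup_{enhanced}}}$$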

Amdahl's Law

• Example: Floating point instructions improved to run 2X; but only 10% of actual instructions are FP

$$ExTime_{new} = ExTime_{old} \times \left[ (1 - 0.1) + \frac{0.1}{2} \right] = 0.95 \times ExTime_{old}$$

$$Speedup_{overall} = \frac{ExTime_{old}}{ExTime_{new}} = \frac{ExTime_{old}}{0.95 \times ExTime_{old}} = \frac{1}{0.95} = 1.053$$
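The floating-point example can be reproduced with a few lines of code. This is a minimal sketch; the function name and argument names are illustrative choices, while the inputs (10% of instructions, 2X improvement) are the ones used in the example.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when a fraction of the task is accelerated by a given factor."""
    new_time = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / new_time


if __name__ == "__main__":
    # 10% of instructions are floating point and they run 2X faster.
    print(f"{amdahl_speedup(0.10, 2.0):.3f}")
    # Even an enormous FP speedup is capped near 1 / (1 - 0.10).
    print(f"{amdahl_speedup(0.10, 1e9):.3f}")
```

The second call illustrates the usual reading of the law: the unaffected 90% of the task bounds the overall gain at about 1.11, however fast the enhanced part becomes.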

Topic 2: Instruction Set Architecture Design

Adapted from Prof. Jerry Breecher’s Notes + my CS21Q Notes

(http://babbage.clarku.edu/~jbreecher/arch/arch.html)

Introduction

7.1 Introduction

7.2 Classifying Instruction Set Architectures

7.3 Memory Addressing

7.4 Operations in the Instruction Set

7.5 Type and Size of Operands

7.6 Encoding an Instruction Set

7.7 The Role of Compilers

7.8 The MIPS Architecture and Bonus

7.9 Endianness

Introduction

The Instruction Set Architecture is that portion of the machine visible to the assembly level programmer or to the compiler writer.

Questions:

- What are the advantages and disadvantages of various instruction set alternatives?

- How do languages and compilers affect ISA?

[Figure: the instruction set as the interface between software and hardware.]

Classifying Instruction Set Architectures

Classifications can be by:

1. Stack/accumulator/register
2. Number of memory operands.
3. Number of total operands.

Instruction Set Architectures

Basic ISA Classes

Accumulator:
  1 address      add A         acc <- acc + mem[A]
  1+x address    addx A        acc <- acc + mem[A + x]

Stack:
  0 address      add           tos <- tos + next

General Purpose Register:
  2 address      add A B       EA(A) <- EA(A) + EA(B)
  3 address      add A B C     EA(A) <- EA(B) + EA(C)

Load/Store:
  0 Memory       load R1, Mem1
                 load R2, Mem2
                 add R1, R2
  1 Memory       add R1, Mem2

ALU instructions can have two or three operands.

ALU instructions can have 0, 1, 2, 3 memory operands. Shown here are cases of 0 and 1.

Basic ISA Classes

Stack      Accumulator    Register (register-memory)    Register (load-store)
Push A     Load A         Load R1, A                    Load R1, A
Push B     Add B          Add R1, B                     Load R2, B
Add        Store C        Store C, R1                   Add R3, R1, R2
Pop C                                                   Store C, R3

The results of the different address classes are easiest to see with the examples here, all of which implement the sequence for C = A + B.

Registers are the class that won out. The more registers on the CPU, the better.
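To make the comparison concrete, the stack and load-store sequences for C = A + B can be run on two toy interpreters. This is only a sketch: the tuple-based instruction encoding, the function names, and the values A = 2 and B = 3 are assumptions made for illustration, not part of any real ISA.

```python
def run_stack(program, mem):
    """Toy stack machine: operands are implicit (top of stack)."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(mem[args[0]])
        elif op == "add":
            stack.append(stack.pop() + stack.pop())
        elif op == "pop":
            mem[args[0]] = stack.pop()
    return mem


def run_load_store(program, mem):
    """Toy load-store machine: only load/store touch memory, add uses registers."""
    regs = {}
    for op, *args in program:
        if op == "load":
            regs[args[0]] = mem[args[1]]
        elif op == "add":
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "store":
            mem[args[0]] = regs[args[1]]
    return mem


# The two instruction sequences from the table, written as tuples.
stack_prog = [("push", "A"), ("push", "B"), ("add",), ("pop", "C")]
load_store_prog = [
    ("load", "R1", "A"),
    ("load", "R2", "B"),
    ("add", "R3", "R1", "R2"),
    ("store", "C", "R3"),
]

if __name__ == "__main__":
    print(run_stack(stack_prog, {"A": 2, "B": 3})["C"])
    print(run_load_store(load_store_prog, {"A": 2, "B": 3})["C"])
```

Both programs take four instructions, but the stack version names no operands in its add, while the load-store version names three registers.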

Intel 80x86 Integer Registers

GPR0    EAX       Accumulator
GPR1    ECX       Count register, string, loop
GPR2    EDX       Data Register; multiply, divide
GPR3    EBX       Base Address Register
GPR4    ESP       Stack Pointer
GPR5    EBP       Base Pointer – for base of stack seg.
GPR6    ESI       Index Register
GPR7    EDI       Index Register
        CS        Code Segment Pointer
        SS        Stack Segment Pointer
        DS        Data Segment Pointer
        ES        Extra Data Segment Pointer
        FS        Data Seg. 2
        GS        Data Seg. 3
PC      EIP       Instruction Counter
        Eflags    Condition Codes

Memory Addressing

Sections Include:

Interpreting Memory Addresses

Addressing Modes

Displacement Address Mode

Immediate Address Mode

Memory Addressing

Interpreting Memory Addresses

What object is accessed as a function of the address and length?

Objects have byte addresses – an address refers to the number of bytes counted from the beginning of memory.

Little Endian – puts the byte whose address is xx00 at the least significant position in the word.

Big Endian – puts the byte whose address is xx00 at the most significant position in the word.

Alignment – data must be aligned on a boundary equal to its size. Misalignment typically results in an alignment fault that must be handled by the Operating System.

Memory Addressing

Addressing Modes

This table shows the most common modes. A more complete set is in Figure 2.6

Addressing Mode      Example Instruction    Meaning                             When Used
Register             Add R4, R3             R[R4] <- R[R4] + R[R3]              When a value is in a register.
Immediate            Add R4, #3             R[R4] <- R[R4] + 3                  For constants.
Displacement         Add R4, 100(R1)        R[R4] <- R[R4] + M[100 + R[R1]]     Accessing local variables.
Register Deferred    Add R4, (R1)           R[R4] <- R[R4] + M[R[R1]]           Using a pointer or a computed address.
Absolute             Add R4, (1001)         R[R4] <- R[R4] + M[1001]            Used for static data.
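The "Meaning" column can be read as a recipe for fetching the second operand of the Add. The sketch below follows the table's register-transfer notation; the function name, the mode strings, and the register and memory contents are hypothetical values chosen for illustration.

```python
def fetch_operand(mode, regs, mem, reg=None, const=None):
    """Return the source operand selected by one of the five common modes."""
    if mode == "register":            # Add R4, R3
        return regs[reg]
    if mode == "immediate":           # Add R4, #3
        return const
    if mode == "displacement":        # Add R4, 100(R1)
        return mem[const + regs[reg]]
    if mode == "register_deferred":   # Add R4, (R1)
        return mem[regs[reg]]
    if mode == "absolute":            # Add R4, (1001)
        return mem[const]
    raise ValueError(f"unknown addressing mode: {mode}")


# Hypothetical machine state.
regs = {"R1": 900, "R3": 7, "R4": 10}
mem = {900: 8, 1000: 5, 1001: 2}

if __name__ == "__main__":
    examples = [
        ("register", {"reg": "R3"}),
        ("immediate", {"const": 3}),
        ("displacement", {"reg": "R1", "const": 100}),
        ("register_deferred", {"reg": "R1"}),
        ("absolute", {"const": 1001}),
    ]
    for mode, kwargs in examples:
        result = regs["R4"] + fetch_operand(mode, regs, mem, **kwargs)
        print(f"{mode:18s} R4 <- {result}")
```

Only the last three modes read memory; register and immediate operands are available without a memory access.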

Memory Addressing

Displacement Addressing Mode

How big should the displacement be? The right size depends on the displacements that typical programs use.

For addresses that do fit in the displacement size:
Add R4, 10000 (R0)

For addresses that don't fit in the displacement size, the compiler must do the following:
Load R1, address
Add R4, 0 (R1)

On DLX, the space allocated is 16 bits. IA-32 encodes displacements of 8 or 32 bits (8 or 16 bits in 16-bit addressing mode).
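Whether an address can be used directly as a displacement comes down to a range check against the field width. The sketch below assumes a signed (two's complement) 16-bit field, as in DLX and MIPS; the function name and the sample values are illustrative.

```python
def fits_in_signed_field(value: int, bits: int = 16) -> bool:
    """True if value can be encoded in a two's complement field of the given width."""
    low = -(1 << (bits - 1))
    high = (1 << (bits - 1)) - 1
    return low <= value <= high


if __name__ == "__main__":
    for displacement in (10000, -32768, 32768, 100000):
        verdict = "fits" if fits_in_signed_field(displacement) else "needs a register"
        print(f"{displacement:>7d}: {verdict}")
```

When the check fails, the compiler falls back to loading the address into a register and using a zero displacement, as in the two-instruction sequence above.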

Memory Addressing

Immediate Address Mode

Used where we want to get to a numerical value in an instruction.

At high level:

a = b + 3;
if ( a > 17 )
    goto Addr

At Assembler level:

Load R2, 3
Add R0, R1, R2

Load R2, 17
CMPBGT R1, R2

Load R1, Address
Jump (R1)

So how would you get a 32 bit value into a register?
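One common answer to the 32-bit question, used by MIPS-style machines with 16-bit immediates, is to build the value in two steps: load the upper 16 bits, then OR in the lower 16 bits. The sketch below only models the arithmetic; the function names and the constant 0x12345678 are assumptions for illustration, not instructions from the lecture.

```python
def split_constant(value: int):
    """Split a 32-bit constant into (upper 16 bits, lower 16 bits)."""
    value &= 0xFFFFFFFF
    return value >> 16, value & 0xFFFF


def rebuild_constant(upper: int, lower: int) -> int:
    """Model 'load upper immediate' followed by 'OR immediate'."""
    register = upper << 16      # step 1: upper half, lower half cleared
    register |= lower           # step 2: fill in the lower half
    return register


if __name__ == "__main__":
    upper, lower = split_constant(0x12345678)
    print(hex(upper), hex(lower))
    print(hex(rebuild_constant(upper, lower)))
```

The cost is two instructions per full-width constant, which is acceptable because most immediates in real programs are small.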

Operations In The Instruction Set

Sections Include:

Detailed information about types of instructions.

Instructions for Control Flow (conditional branches, jumps)

Operator Types

Arithmetic and logical    and, add
Data transfer             move, load
Control                   branch, jump, call
System                    system call, traps
Floating point            add, mul, div, sqrt
Decimal                   add, convert
String                    move, compare
Multimedia                2D, 3D? e.g., Intel MMX and Sun VIS

Control Instructions

Conditional branches are 20% of all instructions!!

Control instruction issues:
– taken or not
– where is the target
– link return address
– save or restore

Instructions that change the PC:
– (conditional) branches, (unconditional) jumps
– function calls, function returns
– system calls, system returns

Type And Size of Operands

The type of the operand is usually encoded in the opcode – an LDW implies loading of a word.

Common sizes are:
– Character (1 byte)
– Half word (16 bits)
– Word (32 bits)
– Single Precision Floating Point (1 word)
– Double Precision Floating Point (2 words)

Integers are two's complement binary.
Floating point is IEEE 754.
Some languages (like COBOL) use packed decimal.

The MIPS Architecture

MIPS is very RISC oriented.

MIPS Characteristics

There's MIPS-32, which we learned in CS140:
• 32-bit byte addresses, aligned
• Load/store, only displacement addressing
• Standard datatypes
• 3 fixed-length formats
• 32 32-bit GPRs (r0 = 0)
• 16 64-bit (32 32-bit) FPRs
• FP status register
• No condition codes

There's MIPS-64, the current arch.:
• Standard datatypes
• 4 fixed-length formats (8, 16, 32, 64)
• 32 64-bit GPRs (r0 = 0)
• 64 64-bit FPRs

Addressing Modes
• Immediate
• Displacement
• (Register mode used only for ALU)

Data transfer
• load/store word, load/store byte/halfword signed?
• load/store FP single/double
• moves between GPRs and FPRs

ALU
• add/subtract signed? immediate?
• multiply/divide signed?
• and, or, xor immediate?, shifts: ll, rl, ra immediate?
• sets immediate?

MIPS Characteristics

Control
• branches == 0, <> 0
• conditional branch testing FP bit
• jump, jump register
• jump & link, jump & link register
• trap, return from exception

Floating Point
• add/sub/mul/div
• single/double
• fp converts, fp set

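The MIPS Encoding

All formats are 32 bits wide; bit 31 is the most significant bit.

Register-Register
  Op [31:26] | Rs1 [25:21] | Rs2 [20:16] | Rd [15:11] | Opx in the remaining low-order bits [10:0] (the figure also marks boundaries at bits 6 and 5)

Register-Immediate
  Op [31:26] | Rs1 [25:21] | Rd [20:16] | immediate [15:0]

Branch
  Op [31:26] | Rs1 [25:21] | Rs2/Opx [20:16] | immediate [15:0]

Jump / Call
  Op [31:26] | target [25:0]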
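Packing and unpacking the register-register format is a matter of shifts and masks. The sketch below uses the field names from the figure and treats bits 10 to 0 as a single Opx field, which is a simplifying assumption; the function names and the sample field values are illustrative.

```python
def encode_register_register(op: int, rs1: int, rs2: int, rd: int, opx: int) -> int:
    """Pack the fields into one 32-bit word: Op 31..26, Rs1 25..21, Rs2 20..16, Rd 15..11."""
    return (
        ((op & 0x3F) << 26)
        | ((rs1 & 0x1F) << 21)
        | ((rs2 & 0x1F) << 16)
        | ((rd & 0x1F) << 11)
        | (opx & 0x7FF)          # assumption: Opx occupies bits 10..0
    )


def decode_register_register(word: int) -> dict:
    """Recover the fields from a 32-bit instruction word."""
    return {
        "op": (word >> 26) & 0x3F,
        "rs1": (word >> 21) & 0x1F,
        "rs2": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "opx": word & 0x7FF,
    }


if __name__ == "__main__":
    word = encode_register_register(op=0, rs1=1, rs2=2, rd=3, opx=0x20)
    print(f"0x{word:08X}")
    print(decode_register_register(word))
```

Because every format keeps Op in bits 31 to 26, a decoder can inspect those six bits first and only then decide how to split the remaining 26.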

Byte Ordering

• How should bytes within a multi-byte word be ordered in memory?
• Conventions
– Suns, Macs are "Big Endian" machines
  • Least significant byte has highest address
– Alphas, PCs are "Little Endian" machines
  • Least significant byte has lowest address

Byte Ordering Example

• Big Endian
– Least significant byte has highest address
• Little Endian
– Least significant byte has lowest address
• Example
– Variable x has 4-byte representation 0x01234567
– Address given by &x is 0x100

Address          0x100    0x101    0x102    0x103
Big Endian       01       23       45       67
Little Endian    67       45       23       01
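The two byte orders for 0x01234567 can be reproduced with Python's standard `struct` module, whose `>` and `<` format prefixes select big-endian and little-endian packing. The helper name `layout` and the base address 0x100 (taken from the example) are the only assumptions in this sketch.

```python
import struct

X = 0x01234567


def layout(value: int, byte_order: str, base: int = 0x100):
    """Return (address, byte) pairs for a 4-byte unsigned value.

    byte_order is '>' for big endian or '<' for little endian.
    """
    data = struct.pack(byte_order + "I", value)
    return [(base + offset, byte) for offset, byte in enumerate(data)]


if __name__ == "__main__":
    for name, order in (("Big Endian", ">"), ("Little Endian", "<")):
        cells = " ".join(f"{addr:#x}:{byte:02x}" for addr, byte in layout(X, order))
        print(f"{name:14s} {cells}")
```

The same integer value yields different byte sequences in memory, which is why data exchanged between machines of different endianness must agree on an order.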

Machine-Level Code Representation

• Encode Program as Sequence of Instructions
– Each simple operation
  • Arithmetic operation
  • Read or write memory
  • Conditional branch
– Instructions encoded as bytes
  • Alphas, Suns, Macs use 4-byte instructions
    – Reduced Instruction Set Computer (RISC)
  • PCs use variable-length instructions
    – Complex Instruction Set Computer (CISC)
– Different instruction types and encodings for different machines
  • Most code not binary compatible
• Programs are Byte Sequences Too!

Classification of Processors

• We can classify processors according to the areas in which they are mostly used.
• We can identify four different groups of processors:
– General purpose processors, which are used in building computers
– Digital signal processors, which are processors designed specifically for signal processing
– Microcontrollers, which are small microcomputers that integrate on the same chip a core processor plus I/O elements and a small amount of memory
– Application specific processors, which are designed to perform a specific function (e.g. network processors)

General Purpose Processors

• These processors are used to build major computer platforms.
• We can name:
– Intel / AMD based computers, also called IBM compatible
– Macintosh computers built using PowerPC processors
– Sun machines that use UltraSPARC processors

Examples of General Purpose Processors

Type of Computer    Processors Used                                   Technology
Macintosh           PowerPC (IBM, Motorola)                           Superscalar
Sun                 UltraSPARC (Sun)                                  RISC
IBM Compatible      Intel processors; Athlon, Duron (AMD); Cyrix      Superscalar

DSP

• Digital Signal Processing (DSP) is used in a wide variety of applications, and it is hard to find a good definition that is general.
• We can start by dictionary definitions of the words:
– Digital: operating by the use of discrete signals to represent data in the form of numbers
– Signal: a variable parameter by which information is conveyed through an electronic circuit
– Processing: to perform operations on data according to programmed instructions
• Which leads us to a simple definition of digital signal processing: changing or analyzing information which is measured as discrete sequences of numbers

• Note two unique features of digital signal processing as opposed to plain old ordinary digital processing:
– signals come from the real world: this intimate connection with the real world leads to many unique needs, such as the need to react in real time and a need to measure signals and convert them to digital numbers
– signals are discrete: which means the information in between discrete samples is lost
• The advantages of DSP are common to many digital systems and include:
– Versatility:
  • digital systems can be reprogrammed for other applications (at least where programmable DSP chips are used)
  • digital systems can be ported to different hardware (for example a different DSP chip or board level product)
– Repeatability:
  • digital systems can be easily duplicated
  • digital systems do not depend on strict component tolerances
  • digital system responses do not drift with temperature
– Simplicity:
  • some things can be done more easily digitally than with analogue systems

• DSP is used in a very wide variety of applications.
• But most share some common features:
– they use a lot of math (multiplying and adding signals)
– they deal with signals that come from the real world
– they require a response in a certain time
• Where general purpose DSP processors are concerned, most applications deal with signal frequencies that are in the audio range.
