Course number: CS141 Who? Tarun Soni ( [email protected] )

CS141-L1-1 Tarun Soni, Summer ‘03

Course number: CS141

Who? Tarun Soni ( [email protected] )TA: Wenjing Rao (wrao@cs) and Eric Liu (xeliu@cs)

Where? CENTR: 119When? M,W @ 6-8:50pm Textbook: Patterson and Hennessy,

Computer Organization & DesignThe hardware software interface, 2nd edition.

Web-page: http://www-cse.ucsd.edu/users/tsoni/cse141(slides, homework questions, other pointers and information)

Office hours: Tarun: Mon. 4pm-6pm: AP&M 3151Yang Yu and Wenjing Rao: TBD, look on the webpage

Introduction to Computer Architecture

http://www-cse.ucsd.edu/users/tsoni/cse141


Administrivia Technology trends Computer organization: concept of abstraction Instruction Set Architectures: Definition, types, examples Instruction formats: operands, addressing modes Operations: load, store, arithmetic, logical Control instructions: branch, jump, procedures Stacks Examples: in-line code, procedure, nested-procedures Other architectures

Todays Agenda


Schedule-sort of

1 6/30 Intro., Technology, ISA

2 7/2 Performance, Cost, Arithmetic

3 7/7 Multiply, Divide?, FP numbers

4 7/9 Single cycle: Datapath, Control

5 7/14 Multiple Cycle CPU, Microprogramming

6 7/16 Mid-term quiz;

7 7/21 Pipelining: intro, control, exceptions

8 7/23 Memory systems, Cache, Virtual memory

9 7/28 I/O Devices

10 7/30 Superscalars, Parallel machines

11 ?? Overview, wrapup, catchup ..

** ?? Final, 7-10 pm, Friday


Grading

• Grade breakdown– Mid-term (1.5 hours) 30% – Final (3 hours) 40%– Pop-Quizzes (3, 45 min each, only 2 high scores cout) 30% – Class Participation: Extras??

• Can’t make exams: tell us early and we will work something out• Homeworks do not need to be turned in. However, pop-quizzes will be based on hw.• What is cheating?

– Studying together in groups is encouraged– Work must be your own– Common examples of cheating: copying an exam question from other material or

other person...– Better off to skip question (small fraction of grade.)

• Written/email request for changes to grades– average grade will be a B or B+; set expectations accordingly


Why?

• You may become a practitioner someday ?• Keeper of Moore’s law• Architecture concepts are core to other sub-systems

• Video-processors• Security engines• Routing/Networking etc.

• Even if you become a software geek?

• Architecture enables a way of thinking• Understanding leads to breadth and better

implementation of software


‘Computer” of the day

Jacquard loomlate 1700’sfor weaving silk

“Program” on punch cards

“Microcode”: each holelifts a set of threads

“Or gate”: thread lifted if any controlling hole punched


Trends: Moores law


Trends: $1000 will buy you…


Trends: Densities


Technology

Source: Intel Journal, May 2002


Other technology trends

• Processor

– logic capacity: about 30% per year

– clock rate: about 20% per year

• Memory

– DRAM capacity: about 60% per year (4x every 3 years)

– Memory speed: about 10% per year

– Cost per bit: about 25% per year

• Disk

– capacity: about 60% per year

1

10

100

1000

1 4 7

10

13

16

19

22

25

28

31

34

CPU Speed

DRAM Speed

1

10

100

1000

10000

100000

1000000

10000000

100000000

1 4 7

10

13

16

19

22

25

28

31

34

CPU logic capacity

DRAM capacity

Disk capacity

Speed Capacity

Physics-advancementArchitecture-advancement


SPEC Performance

0

50

100

150

200

250

300

350

1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995

Year

Perform

ance

RISC

Intel x86

35%/yr

RISCintroduction

performance now improves 50% per year (2x every 1.5 years)


Organization: A Basic Computer

Computer

Memory

Datapath

Control

Output

Input

Every computer has 5 basic components


Organization: A Basic Computer

Proc

CachesBusses

Memory

I/O Devices:

Controllers

adapters

DisksDisplaysKeyboards

Networks

• Not all “memory” are created equally– Cache: fast (expensive) memory are placed closer to the processor– Main memory: less expensive memory--we can have more

• Input and output (I/O) devices have the messiest organization– Wide range of speed: graphics vs. keyboard– Wide range of requirements: speed, standard, cost ... – Least amount of research (so far)


What is “Computer Architecture”

Computer Architecture =

Instruction Set Architecture + Machine Organization

Computer Architecture and Engineering

Instruction Set Design Computer Organization

Interfaces Hardware Components

Compiler/System View Logic Designer’s View

How you talk to the machine What the machine looks like


Architecture?

I/O systemInstr. Set Proc.

Compiler

OperatingSystem

Application

Digital DesignCircuit Design

Instruction Set Architecture

Firmware

• Coordination of many levels of abstraction

• Under a rapidly changing set of forces

• Design, Measurement, and Evaluation

Datapath & Control

Layout


Levels of abstraction?

High Level Language Program

Assembly Language Program

Machine Language Program

Control Signal Specification

Compiler

Assembler

Machine Interpretation

temp = v[k];

v[k] = v[k+1];

v[k+1] = temp;

lw $15, 0($2)lw $16, 4($2)sw $16, 0($2)sw $15, 4($2)

0000 1001 1100 0110 1010 1111 0101 10001010 1111 0101 1000 0000 1001 1100 0110 1100 0110 1010 1111 0101 1000 0000 1001 0101 1000 0000 1001 1100 0110 1010 1111

ALUOP[0:3] <= InstReg[9:11] & MASK



instruction set

software

hardware

ISA is the agreed-upon interface between all the software that runs on the machine and the hardware that executes it.


Example ISAs

• IBM360, VAX etc. • Digital Alpha (v1, v3) 1992-

97• HP PA-RISC (v1.1, v2.0) 1986-

96• Sun Sparc (v8, v9) 1987-

95• SGI MIPS (MIPS I, II, III, IV, V) 1986-

96• Intel (8086,80286,80386, 1978-

96 80486,Pentium, MMX, ...)

• ARM ARM7,8,StrongARM 1995-

Digital Signal Processors also have an ISA

TMS320, Motorola, OAK etc.


ISAs


“How to talk to computers if you aren’t in Star Trek”


ISAs

• Language of the Machine• More primitive than higher level languages

e.g., no sophisticated control flow• Very restrictive

e.g., MIPS Arithmetic Instructions

• We’ll be working with the MIPS instruction set architecture– similar to other architectures developed since the 1980's– used by NEC, Nintendo, Silicon Graphics, Sony

Design goals: maximize performance and minimize cost, reduce design time


ISAs

Ideally the only part of the machine visible to the programmer/compiler

• Available instructions (Opcodes)

• Formats

• Registers, number and type

• Addressing modes, access mechanisms

• Exception conditions etc.


Instruction Set Architecture: What Must be Specified?

Instruction

Fetch

Instruction

Decode

Operand

Fetch

Execute

Result

Store

Next

Instruction

° Instruction Format or Encoding

– how is it decoded?

° Location of operands and result

– where other than memory?

– how many explicit operands?

– how are memory operands located?

– which can or cannot be in memory?

° Data type and Size

° Operations

– what are supported

° Successor instruction

– jumps, conditions, branches

fetch-decode-execute is implicit!


Vocabulary

• superscalar processor -- can execute more than one instructions per cycle.

• cycle -- smallest unit of time in a processor.

• parallelism -- the ability to do more than one thing at once.

• pipelining -- overlapping parts of a large task to increase throughput without decreasing latency


ISA Decisions

• operations

– how many?

– which ones

• operands

– how many?

– location

– types

– how to specify?

• instruction format

– size

– how many formats?

y = x + b

operationdestination operand

how does the computer know what0001 0100 1101 1111means?

(add r1, r2, r5)


Crafting an ISA

• We’ll look at some of the decisions facing an instruction set architect, and

• how those decisions were made in the design of the MIPS instruction set.

• MIPS, like SPARC, PowerPC, and Alpha AXP, is a RISC (Reduced Instruction Set Computer) ISA.

– fixed instruction length

– few instruction formats

– load/store architecture

• RISC architectures worked because they enabled pipelining. They continue to thrive because they enable parallelism.


Basic types of ISAs

Accumulator (1 register):

1 address add A acc acc + mem[A]

1+x address addx A acc acc + mem[A + x]

Stack:

0 address add tos tos + next

General Purpose Register:

2 address add A B EA(A) EA(A) + EA(B)

3 address add A B C EA(A) EA(B) + EA(C)

Load/Store:

3 address add Ra Rb Rc Ra Rb + Rc

load Ra Rb Ra mem[Rb]

store Ra Rb mem[Rb] Ra

Comparison:

Bytes per instruction? Number of Instructions? Cycles per instruction?


Instruction Count

Accumulator (1 register):

Load A

Add B

Store C

Stack:

Push A

Push B

Add

Pop C

C = A+B

General Purpose Register: (Register-Memory)

Load R1,A

Add R1,B

Store C,R1

Load/Store:

Load R1,A

Load R2,B

Add R3,R1,R2

Store C,R3


Instruction Length

Variable:

Fixed:

Hybrid:

…

• All instructions have 3 operands

• Operand order is fixed (destination first)

C code: A = B + CMIPS code: add $s0, $s1, $s2

(associated with variables by compiler)

MIPS Instructions


Instruction Length

• Variable-length instructions (Intel 80x86, VAX) require multi-step fetch and decode, but allow for a much more flexible and compact instruction set.

• Fixed-length instructions allow easy fetch and decode, and simplify pipelining and parallelism.

All MIPS instructions are 32 bits long.

– this decision impacts every other ISA decision we make because it makes instruction bits scarce.

• If code size is most important, use variable length instructions

• If performance is most important, use fixed length

Recent embedded machines (ARM, MIPS) added optional mode to execute subset of 16-bit wide instructions (Thumb, MIPS16)

choose performance or density per procedure


MIPS Instruction Format

OP

OP

OP

rs rt rd sa funct

rs rt immediate

target

• the opcode tells the machine which format

• so add r1, r2, r3 has

– opcode=0, funct=32, rs=2, rt=3, rd=1, sa=0

– 000000 00010 00011 00001 00000 100000

6 bits 5 bits 5 bits 5 bits 5 bits 6 bits


Operands

• operands are generally in one of two places:

– registers (32 int, 32 fp)

– memory (232 locations)

• registers are

– easy to specify

– close to the processor (fast access)

• the idea that we want to access registers whenever possible led to load-store architectures.

– normal arithmetic instructions only access registers

– only access memory with explicit loads and stores


Load Store Architectures

Load-store architectures

can do:

add r1=r2+r3

and

load r3, M(address)

forces heavy dependence on registers, which is exactly what you want in today’s CPUs

can’t do

add r1 = r2 + M(address)

-more instructions

+fast implementation (e.g., easy pipelining)

Expect new instruction set architecture to use general purpose register

Pipelining => Expect it to use load store variant of GPR ISA


General Purpose Registers

° Advantages of registers

• registers are faster than memory

• registers are easier for a compiler to use

- e.g., (A*B) – (C*D) – (E*F) multiplies in any order vs. stack

• registers can hold variables

- memory traffic is reduced, so program is sped up - code density improves (since register named with fewer bits

than memory location)

• Programmable storage

– 2^32 x bytes of memory

– 31 x 32-bit GPRs (R0 = 0)

– 32 x 32-bit FP regs (paired DP)

– HI, LO, PC

0r0r1°°°r31PClohi

MIPS Registers


Memory Organization

• Viewed as a large, single-dimension array, with an address.

• A memory address is an index into the array

• "Byte addressing" means that the index points to a byte of memory.

0

1

2

3

4

5

6

8 bits of data

8 bits of data

8 bits of data

8 bits of data

8 bits of data

8 bits of data

8 bits of data


Memory Organization

• Bytes are nice, but most data items use larger "words"

• For MIPS, a word is 32 bits or 4 bytes.

• 232 bytes with byte addresses from 0 to 232-1

• 230 words with byte addresses 0, 4, 8, ... 232-4

• Words are alignedi.e., what are the least 2 significant bits of a word address?

0

4

8

12

...

32 bits of data

32 bits of data

32 bits of data

32 bits of data

Registers hold 32 bits of data


Data Types

Bit: 0, 1

Bit String: sequence of bits of a particular length 4 bits is a nibble 8 bits is a byte 16 bits is a half-word 32 bits is a word 64 bits is a double-word

Character: ASCII 7 bit code

Decimal: digits 0-9 encoded as 0000b thru 1001b two decimal digits packed per 8 bit byte

Integers: 2's Complement

Floating Point: Single Precision Double Precision Extended Precision

M x RE

How many +/- #'s?Where is decimal pt?How are +/- exponents represented?

exponent

basemantissa


Operand Usage

Frequency of reference by size

0% 20% 40% 60% 80%

Byte

Halfword

Word

Doubleword

0%

0%

31%

69%

7%

19%

74%

0%

Int Avg.

FP Avg.

Support data sizes and types: 8-bit, 16-bit, 32-bit integers and 32-bit and 64-bit IEEE 754 floating point numbers


Addressing: Endian-ness and alignment

• Big Endian: address of most significant byte = word address (xx00 = Big End of word)

– IBM 360/370, Motorola 68k, MIPS, Sparc, HP PA

• Little Endian: address of least significant byte = word address(xx00 = Little End of word)

– Intel 80x86, DEC Vax, DEC Alpha (Windows NT)

msb lsb

3 2 1 0

little endian byte 0

0 1 2 3big endian byte 0

Alignment: require that objects fall on address that is multiple of their size.

0 1 2 3

Aligned

NotAligned


Addressing Modes

– Register direct R3– Immediate (literal) #25– Direct (absolute) M[10000]

– Register indirect M[R3]– Base+Displacement M[R3 + 10000]– if register is the program counter, this is PC-

relative– Base+Index M[R3 + R4]– Scaled Index M[R3 + R4*d + 10000]– Autoincrement M[R3++]– Autodecrement M[R3 - -]

– Memory Indirect M[ M[R3] ]

how do we specify the operand we want?


Addressing Modes

Addressing mode Example Meaning

Register Add R4,R3 R4R4+R3

Immediate Add R4,#3 R4 R4+3

Displacement Add R4,100(R1) R4 R4+Mem[100+R1]

Register indirect Add R4,(R1) R4 R4+Mem[R1]

Indexed / Base Add R3,(R1+R2) R3 R3+Mem[R1+R2]

Direct or absolute Add R1,(1001) R1 R1+Mem[1001]

Memory indirect Add R1,@(R3) R1 R1+Mem[Mem[R3]]

Auto-increment Add R1,(R2)+ R1 R1+Mem[R2]; R2 R2+d

Auto-decrement Add R1,–(R2) R2 R2–d; R1 R1+Mem[R2]

Scaled Add R1,100(R2)[R3] R1 R1+Mem[100+R2+R3*d]


Addressing Modes: Usage

3 programs measured on machine with all address modes (VAX)

--- Displacement: 42% avg, 32% to 55% 75%

--- Immediate: 33% avg, 17% to 43% 85%

--- Register deferred (indirect): 13% avg, 3% to 24%

--- Scaled: 7% avg, 0% to 16%

--- Memory indirect: 3% avg, 1% to 6%

--- Misc: 2% avg, 0% to 3%

75% displacement & immediate88% displacement, immediate & register indirect

similar measurements:

- 16 bits is enough for the immediate address 75 to 80% of the time

- 16 bits is enough of a displacement 99% of the time.


Program Base + Dis- placement

Immediate Scaled Index

Memory Indirect

All Others

TEX 56% 43% 0 1 0

Spice 58% 17% 16% 6% 3%

GCC 51% 39% 6% 1% 3%

Addressing mode usage: Application Specific


MIPS Addressing Modes

register direct

add $1, $2, $3

immediate

add $1, $2, #35

base + displacement

lw $1, disp($2)

OP rs rt rd sa funct

OP rs rt immediate

rt

rs

immediate register indirect disp = 0absolute (rs) = 0


MIPS ISA-so far

• fixed 32-bit instructions

• 3 instruction formats

• 3-operand, load-store architecture

• 32 general-purpose registers (integer, floating point)

– R0 always equals 0.

• 2 special-purpose integer registers, HI and LO, because multiply and divide produce more than 32 bits.

• registers are 32-bits wide (word)

• register, immediate, and base+displacement addressing modes

But what about the actual instructions themselves ??


Typical Operations (little change since 1960)

Data Movement Load (from memory)Store (to memory)memory-to-memory moveregister-to-register moveinput (from I/O device)output (to I/O device)push, pop (to/from stack)

Arithmetic integer (binary + decimal) or FPAdd, Subtract, Multiply, Divide

Logical not, and, or, set, clear

Shift shift left/right, rotate left/right

Control (Jump/Branch) unconditional, conditional

Subroutine Linkage call, return

Interrupt trap, return

Synchronization test & set (atomic r-m-w)

String search, translate

Graphics (MMX) parallel subword ops (4 16bit add)


80x86 Instruction usage

° Rank instruction Integer Average Percent total executed

1 load 22%

2 conditional branch 20%

3 compare 16%

4 store 12%

5 add 8%

6 and 6%

7 sub 5%

8 move register-register 4%

9 call 1%

10 return 1%

Total 96%

° Simple instructions dominate instruction frequency


Instruction usage

• Support the simple instructions, since they will dominate the number of instructions executed:

load, store, add, subtract, move register-register, and, shift, compare equal, compare not equal, branch, jump, call, return;

Compiler Issues

orthogonality: no special registers, few special cases, all operand modes available with any data type or instruction type

completeness: support for a wide range of operations and target applications

regularity: no overloading for the meanings of instruction fields

streamlined: resource needs easily determined

Register Assignment is critical too

Easier if lots of registers


MIPS Instructions

• arithmetic

– add, subtract, multiply, divide

• logical

– and, or, shift left, shift right

• data transfer

– load word, store word

• conditional Branch

• unconditional Jump


MIPS Instructions

• arithmetic

– add, subtract, multiply, divide

Instruction Example Meaning Comments

add add $1,$2,$3 $1 = $2 + $3 3 operands; exception possible

subtract sub $1,$2,$3 $1 = $2 – $3 3 operands; exception possible

add immediate addi $1,$2,100 $1 = $2 + 100 + constant; exception possible

add unsigned addu $1,$2,$3 $1 = $2 + $3 3 operands; no exceptions

subtract unsigned subu $1,$2,$3 $1 = $2 – $3 3 operands; no exceptions

add imm. unsign. addiu $1,$2,100 $1 = $2 + 100 + constant; no exceptions

multiply mult $2,$3 Hi, Lo = $2 x $3 64-bit signed product

multiply unsigned multu$2,$3 Hi, Lo = $2 x $3 64-bit unsigned product

divide div $2,$3 Lo = $2 ÷ $3, Lo = quotient, Hi = remainder

Hi = $2 mod $3

divide unsigned divu $2,$3 Lo = $2 ÷ $3, Unsigned quotient & remainder

Hi = $2 mod $3

move from Hi mfhi $1 $1 = Hi Used to get copy of Hi

move from Lo mflo $1 $1 = Lo Used to get copy of Lo


MIPS Instructions

• logical

– and, or, shift left, shift right

Instruction Example Meaning Comment

and and $1,$2,$3 $1 = $2 & $3 3 reg. operands; Logical AND

or or $1,$2,$3 $1 = $2 | $3 3 reg. operands; Logical OR

xor xor $1,$2,$3 $1 = $2 Å $3 3 reg. operands; Logical XOR

nor nor $1,$2,$3 $1 = ~($2 |$3) 3 reg. operands; Logical NOR

and immediate andi $1,$2,10 $1 = $2 & 10 Logical AND reg, constant

or immediate ori $1,$2,10 $1 = $2 | 10 Logical OR reg, constant

xor immediate xori $1, $2,10 $1 = ~$2 &~10 Logical XOR reg, constant

shift left logical sll $1,$2,10 $1 = $2 << 10 Shift left by constant

shift right logical srl $1,$2,10 $1 = $2 >> 10 Shift right by constant

shift right arithm. sra $1,$2,10 $1 = $2 >> 10 Shift right (sign extend)

shift left logical sllv $1,$2,$3 $1 = $2 << $3 Shift left by variable

shift right logical srlv $1,$2, $3 $1 = $2 >> $3 Shift right by variable

shift right arithm. srav $1,$2, $3 $1 = $2 >> $3 Shift right arith. by variable


MIPS Instructions

• data transfer

– load word, store wordInstruction Comment

SW 500(R4), R3 Store word

SH 502(R2), R3 Store half

SB 41(R3), R2 Store byte

LW R1, 30(R2) Load word

LH R1, 40(R3) Load halfword

LHU R1, 40(R3) Load halfword unsigned

LB R1, 40(R3) Load byte

LBU R1, 40(R3) Load byte unsigned

LUI R1, 40 Load Upper Immediate (16 bits shifted left by 16)

Why need LUI?

0000 … 0000

LUI R5

R5


MIPS Control Instructions

° Condition Codes

Processor status bits are set as a side-effect of arithmetic instructions (possibly on Moves) or explicitly by compare or test instructions.

add r1, r2, r3

bz label

° Condition Register

cmp r1, r2, r3

bgt r1, label

° Compare and Branch

bgt r1, r2, label

• How do you specify the destination of a branch/jump?• studies show that almost all conditional branches go short distances from the

current program counter (loops, if-then-else).– we can specify a relative address in much fewer bits than an absolute

address– e.g., beq $1, $2, 100 => if ($1 == $2) PC = PC + 100 * 4

• How do we specify the condition of the branch?


Conditional Branch Distance

Bits of Branch Dispalcement

0%

10%

20%

30%

40%

0 1 2 3 4 5 6 7 8 9

10

11

12

13

14

15

Int. Avg. FP Avg.


Conditional Branching

• PC-relative since most branches are relatively close to the current PC address

• At least 8 bits suggested (± 128 instructions)

• Compare Equal/Not Equal most important for integer programs (86%)

Frequency of comparison types in branches

0% 50% 100%

EQ/NE

GT/LE

LT/GE

37%

23%

40%

86%

7%

7%

Int Avg.

FP Avg.


Conditional Branching

• Compare and Branch – BEQ rs, rt, offset if R[rs] == R[rt] then PC-relative branch – BNE rs, rt, offset <>

• Compare to zero and Branch – BLEZ rs, offset if R[rs] <= 0 then PC-relative branch – BGTZ rs, offset > – BLT < – BGEZ >= – BLTZAL rs, offset if R[rs] < 0 then branch and link (into R 31) – BGEZAL >=

• Remaining set of compare and branch take two instructions • Almost all comparisons are against zero!

• beq, bne beq r1, r2, addr => if (r1 == r2) goto addr• slt $1, $2, $3 => if ($2 < $3) $1 = 1; else $1 = 0• these, combined with $0, can implement all fundamental branch conditions

– Always, never, !=, ==, >, <=, >=, <, >(unsigned), <= (unsigned), ...

MIPS Branch Instructions


Jumps

• need to be able to jump to an absolute address sometime• need to be able to do procedure calls and returns

• jump -- j 10000 => PC = 10000• jump and link -- jal 100000 => $31 = PC + 4; PC = 10000

– used for procedure calls

• jump register -- jr $31 => PC = $31– used for returns, but can be useful for lots of other things.


Jumps

OP

OP

OP

rs rt rd sa funct

rs rt Immediate (16 bits)

target

6 bits 5 bits 5 bits 5 bits 5 bits 6 bits

R

I

J

MIPS Instruction Formats

• Branch (e.g., beq) uses PC-relative addressing mode (few bits if addr typically close) uses base+displacement mode, with the PC being the base.

• Jump uses pseudo-direct addressing mode. 26 bits of the address is in the instruction, the rest is taken from the PC.

instruction program counter

6 26 6 26

jump destination address

MIPS Addressing Formats: Branches and Jumps


MIPS Branch & Jump Instructions

Instruction Example Meaning

branch on equal beq $1,$2,100 if ($1 == $2) go to PC+4+100Equal test; PC relative branch

branch on not eq. bne $1,$2,100 if ($1!= $2) go to PC+4+100Not equal test; PC relative

set on less than slt $1,$2,$3 if ($2 < $3) $1=1; else $1=0Compare less than; 2’s comp.

set less than imm. slti $1,$2,100 if ($2 < 100) $1=1; else $1=0Compare < constant; 2’s comp.

set less than uns. sltu $1,$2,$3 if ($2 < $3) $1=1; else $1=0Compare less than; natural numbers

set l. t. imm. uns. sltiu $1,$2,100 if ($2 < 100) $1=1; else $1=0Compare < constant; natural numbers

jump j 10000 go to 10000Jump to target address

jump register jr $31 go to $31For switch, procedure return

jump and link jal 10000 $31 = PC + 4; go to 10000For procedure call


Stacks

Stacking of Subroutine Calls & Returns and Environments:

A: CALL B

CALL C

C: RET

RET

B:

A

A B

A B C

A B

A

Some machines provide a memory stack as part of the architecture (e.g., VAX)

Sometimes stacks are implemented via software convention (e.g., MIPS)


Stacks

Useful for stacked environments/subroutine call & return even if operand stack not part of architecture

Stacks that Grow Up vs. Stacks that Grow Down:

abc

0 Little

inf. Big 0 Little

inf. Big

MemoryAddresses

SP

NextEmpty?

LastFull?

Little --> Big/Last Full

POP: Read from Mem(SP) Decrement SP

PUSH: Increment SP Write to Mem(SP)

growsup

growsdown

Little --> Big/Next Empty

POP: Decrement SP Read from Mem(SP)

PUSH: Write to Mem(SP) Increment SP


Stack Frames

FP

ARGS

Callee SaveRegisters

Local Variables

SP

Reference args andlocal variables atfixed (positive) offsetfrom FP

Grows and shrinks duringexpression evaluation

(old FP, RA)

• Many variations on stacks possible (up/down, last pushed / next )

• Block structured languages contain link to lexically enclosing frame

• Compilers normally keep scalar variables in registers, not memory!

High Mem

Low Mem


MIPS Software Register Conventions

0 zero constant 0

1 at reserved for assembler

2 v0 expression evaluation &

3 v1 function results

4 a0 arguments

5 a1

6 a2

7 a3

8 t0 temporary: caller saves

. . . (callee can clobber)

15 t7

16 s0 callee saves

. . . (caller can clobber)

23 s7

24 t8 temporary (cont’d)

25 t9

26 k0 reserved for OS kernel

27 k1

28 gp Pointer to global area

29 sp Stack pointer

30 fp frame pointer

31 ra Return Address (HW)


MIPS Branch & Jump Instructions

MIPS operands

Name Example Comments$s0-$s7, $t0-$t9, $zero, Fast locations for data. In MIPS, data must be in registers to perform

32 registers $a0-$a3, $v0-$v1, $gp, arithmetic. MIPS register $zero always equals 0. Register $at is

$fp, $sp, $ra, $at reserved for the assembler to handle large constants.

Memory[0], Accessed only by data transfer instructions. MIPS uses byte addresses, so

230 memory Memory[4], ..., sequential words differ by 4. Memory holds data structures, such as arrays,words Memory[4294967292] and spilled registers, such as those saved on procedure calls.

MIPS assembly language

Category Instruction Example Meaning Commentsadd add $s1, $s2, $s3 $s1 = $s2 + $s3 Three operands; data in registers

Arithmetic subtract sub $s1, $s2, $s3 $s1 = $s2 - $s3 Three operands; data in registers

add immediate addi $s1, $s2, 100 $s1 = $s2 + 100 Used to add constants

load word lw $s1, 100($s2) $s1 = Memory[$s2 + 100] Word from memory to register

store word sw $s1, 100($s2) Memory[$s2 + 100] = $s1 Word from register to memory

Data transfer load byte lb $s1, 100($s2) $s1 = Memory[$s2 + 100] Byte from memory to register

store byte sb $s1, 100($s2) Memory[$s2 + 100] = $s1 Byte from register to memoryload upper immediate

lui $s1, 100 $s1 = 100 * 216 Loads constant in upper 16 bits

branch on equal beq $s1, $s2, 25 if ($s1 == $s2) go to PC + 4 + 100

Equal test; PC-relative branch

Conditional

branch on not equal bne $s1, $s2, 25 if ($s1 != $s2) go to PC + 4 + 100

Not equal test; PC-relative

branch set on less than slt $s1, $s2, $s3 if ($s2 < $s3) $s1 = 1; else $s1 = 0

Compare less than; for beq, bne

set less than immediate

slti $s1, $s2, 100 if ($s2 < 100) $s1 = 1; else $s1 = 0

Compare less than constant

jump j 2500 go to 10000 Jump to target address

Uncondi- jump register jr $ra go to $ra For switch, procedure return

tional jump jump and link jal 2500 $ra = PC + 4; go to 10000 For procedure call


Example: Swap()

• Can we figure out the code?

swap(int v[], int k);{ int temp;

temp = v[k]v[k] = v[k+1];v[k+1] = temp;

}

swap: // $4=v, $5=kmuli $2, $5, 4 // $2 = k*4add $2, $4, $2 // $2 = v+(4*k)lw $15, 0($2) // $15=temp= *($2+0)=*(v+k)lw $16, 4($2) // $16 = *($2+4) = *(v+k+1)sw $16, 0($2) // *(v+k) = $16 = *(v+k+1)sw $15, 4($2) // *(v+k+1) = $15 = tempjr $31 // return;


Example: Leaf_procedure()

• Procedures?int PairDiff(int a, int b, int c,int d);{ int temp;

temp = (a+b)-(c+d);return temp;

}

Assume caller puts $a0-$a3 = a,b,c,d and wants result in $v0PairDiff: //

sub $sp,$sp,12 // Make space for 3 temp locationssw $t1, 8($sp) // save $t1 (optional if MIPS convention)sw $t0, 4($sp) // save $t0 (optional if MIPS convention)sw $s0, 0($sp) // save $s0

add $t0,$a0,$a1 // (t0=a+b) add $t1,$a2,$a3 // (t1=c+d)sub $s0,$t0,$t1 // (s0=t0-t1)

add $v0,$s0,$zero // store return value in $v0lw $s0,0($sp) // restore registerslw $t0,4($sp) // (optional if MIPS convention)lw $t1,8($sp) // (optional if MIPS convention)add $sp,$sp,12 // ‘pop’ the stack

jr $ra // The actual return to calling routine


Example: Nested_procedure()

• What about nested procedures? $ra ??

• Recursive procedures?

int fact(int n);{

if(n<1) return(1);else return (n*fact(n-1));

}

Assume $a0 = n fact: //

sub $sp,$sp,8 // Make space for 2 temp locationssw $ra, 4($sp) // save return addresssw $a0, 4($sp) // save argument n

slt $t0,$a0,1 // test for n<1beq $t0,$zero, L1 // if (n>=1) goto L1

add $v0,$zero,1 // $v0=1add $sp,$sp,8 // ‘pop’ the stack

jr $ra // return

L1: sub $a0,$a0,1 // n--; jal fact; // call fact again.

lw $a0,0($sp) // fact() returns here. Restore nlw $ra,4($sp) // restore return addressadd $sp,$sp,8 // ‘pop’ stackmult $v0,$a0,$v0 // $v0 = n*fact(n-1)jr $ra // return to caller

(n<1) case

(n>=1) case


Other Architectures

• Design alternative:– provide more powerful operations (e.g., DSP, Encryption engines, Java

Processors)– goal is to reduce number of instructions executed– danger is a slower cycle time and/or a higher CPI

• Sometimes referred to as “RISC vs. CISC”– virtually all new instruction sets since 1982 have been RISC– VAX: minimize code size, make assembly language easy

instructions from 1 to 54 bytes long!• We’ll look at PowerPC and 80x86


Power PC

• Indexed addressing– example: lw $t1,$a0+$s3 // $t1=Memory[$a0+$s3]– What do we have to do in MIPS?

• Update addressing– update a register as part of load (for marching through arrays)– example: lwu $t0,4($s3) // $t0=Memory[$s3+4];$s3=$s3+4– What do we have to do in MIPS?

• Others:– load multiple/store multiple– a special counter register “bc Loop”

decrement counter, if not 0 goto loop


x86: Volume is beautiful

• 1978: The Intel 8086 is announced (16 bit architecture)• 1980: The 8087 floating point coprocessor is added• 1982: The 80286 increases address space to 24 bits, +instructions• 1985: The 80386 extends to 32 bits, new addressing modes• 1989-1995: The 80486, Pentium, Pentium Pro add a few instructions

(mostly designed for higher performance)• 1997: MMX is added

“This history illustrates the impact of the “golden handcuffs” of compatibility”

“adding new features as someone might add clothing to a packed bag”

“an architecture that is difficult to explain and impossible to love”

“what the 80x86 lacks in style is made up in quantity,

making it beautiful from the right perspective”


x86: Complex Instruction Set

• See text for a detailed description….• Complexity:

– Instructions from 1 to 17 bytes long– one operand must act as both a source and destination– one operand can come from memory– complex addressing modes

e.g., “base or scaled index with 8 or 32 bit displacement”• Saving grace:

– the most frequently used instructions are not too difficult to build– compilers avoid the portions of the architecture that are slow


Comparing Instruction Set Architectures

Design-time metrics:

° Can it be implemented, in how long, at what cost?

° Can it be programmed? Ease of compilation?

Static Metrics:

° How many bytes does the program occupy in memory?

Dynamic Metrics:

° How many instructions are executed?

° How many bytes does the processor fetch to execute the program?

° How many clocks are required per instruction?

° How "lean" a clock is practical?

Best Metric: Time to execute the program!

This depends on instruction set, processor organization, and compilation techniques.

CPI

Inst. Count Cycle Time


Instruction Set Architectures: What did we learn today?

• MIPS is a general-purpose register, load-store, fixed-instruction-length architecture.

• MIPS is optimized for fast pipelined performance, not for low instruction count• Four principles of IS architecture

– simplicity favors regularity– smaller is faster– good design demands compromise– make the common case fast


AdministriviaTechnology trendsComputer organization: concept of abstractionInstruction Set Architectures: Definition, types, examplesInstruction formats: operands, addressing modes Operations: load, store, arithmetic, logicalControl instructions: branch, jump, proceduresStacksExamples: in-line code, procedure, nested-proceduresOther architectures

Todays Agenda

Documents

Course number: CS141 Who? Tarun Soni ( [email protected] )