66
Chapter 2 Instruction Set Principles and Examples 吳吳吳 吳吳吳吳吳吳吳吳吳吳 October 2004 EEF011 Computer Architecture 吳吳吳吳吳

Chapter 2 Instruction Set Principles and Examples

  • Upload
    pillan

  • View
    66

  • Download
    12

Embed Size (px)

DESCRIPTION

EEF011 Computer Architecture 計算機結構. Chapter 2 Instruction Set Principles and Examples. 吳俊興 高雄大學資訊工程學系 October 2004. Chapter 2. Instruction Set Principles and Examples. 2.1 Introduction 2.2 Classifying Instruction Set Architectures - PowerPoint PPT Presentation

Citation preview

Page 1: Chapter 2 Instruction Set Principles and Examples

Chapter 2Instruction Set Principles and Examples

吳俊興高雄大學資訊工程學系

October 2004

EEF011 Computer Architecture計算機結構

Page 2: Chapter 2 Instruction Set Principles and Examples

2

Chapter 2. Instruction Set Principles and Examples

2.1 Introduction 2.2 Classifying Instruction Set Architectures 2.3 Memory Addressing 2.4 Addressing Modes for Signal Processing 2.5 Type and Size of Operands 2.6 Operands for Media and Signal Processing2.7 Operations in the Instruction Set2.8 Operations for Media and Signal Processing2.9 Instructions for Control Flow2.10 Encoding an Instruction Set

Page 3: Chapter 2 Instruction Set Principles and Examples

3

2.1 Introduction

Instruction Set Architecture – the portion of the machine visible to the assembly level programmer or to the compiler writer

– In order to use the hardware of a computer, we must speak its language

– The words of a computer language are called instructions, and its vocabulary is called an instruction set

instruction set

software

hardware

Instr. # Operation+Operands

i movl -4(%ebp), %eax

(i+1) addl %eax, (%edx)

(i+2) cmpl 8(%ebp), %eax

(i+3) jl L5

:

L5:

Page 4: Chapter 2 Instruction Set Principles and Examples

4

Topics

1. A taxonomy of instruction set alternatives and qualitative assessment

2. Instruction set quantitative measurements3. Specific instruction set architecture4. Issues and bearings of languages and compile

rs5. Examples: MIPS and Trimedia TM32 CPUAppendices C-F:

MIPS, PowerPC, Precision Architecture, SPARCARM, Hitachi SH, MIPS 16, Thumb80x86 (App. D),IBM 360/370 (App. E), VAX (App. F)

Page 5: Chapter 2 Instruction Set Principles and Examples

5

2.2 Classifying Instruction Set Architectures

Operand storage in CPU Where are they other than memory

# explicit operands named per instruction

How many? Min, Max, Average

Addressing mode How the effective address for an operand calculated? Can all use any mode?

Operations What are the options for the opcode?

Type & size of operands How is typing done? How is the size specified?

These choices critically affect number of instructions, CPI, and CPU cycle time

Page 6: Chapter 2 Instruction Set Principles and Examples

6

ISA Classification

• Most basic differentiation: internal storage in a processor

– Operands may be named explicitly or implicitly• Major choices:

1. In an accumulator architecture one operand is implicitly the accumulator => similar to calculator

2. The operands in a stack architecture are implicitly on the top of the stack

3. The general-purpose register architectures have only explicit operands – either registers or memory location

Page 7: Chapter 2 Instruction Set Principles and Examples

7

ISA Type ExamplesExplicit operands per

ALU inst.

Result

Destination

OperandaccessMethod

Stack B5500, B6500

HP 3000/70

0 Stack Push & Pop Stack

Accumulator Motorola 6809

+ ancient ones

1 Accumulator Acc = Acc + mem[A]

Register Set IBM 360

DEC VAX

+ all modern

micro’s

2 or 3 Registers

or

Memory

Rx = Ry + mem[A]

Rx = Rx + Ry (2)

Rx = Rx + Rz (3)

Basic ISA Classes

Register-register, register-memory, and memory-memory (gone) options

Page 8: Chapter 2 Instruction Set Principles and Examples

8

Stack:

0 address add tos tos + next

Accumulator:

1 address add A acc acc + mem[A]

General Purpose Register (register-memory):

1 address add R1 A R1 R1 + mem[A]

GPR (register-register or called load/store):

0 address load R1, A R1 mem[A]

load R2, B R2 mem[B]

add R3, R1, R2 R3 R1+R2

Example

ALU Instructions can have two operands.

ALU Instructions can have three operands.

Page 9: Chapter 2 Instruction Set Principles and Examples

9

Operand Locations and Code Sequence for C=A+B

Stack AccumulatorGPR

(register-memory)GPR

(load-store)Push A Load A Load R1, A Load R1, APush B Add B Add R1, B Load R2, BAdd Store C Store C, R1 Add R3, R1, R2Pop C Store C, R3

Page 10: Chapter 2 Instruction Set Principles and Examples

10

ISA Type Advantages Disadvantages

Stack • Simple effective address• Short instructions• Good code density

• Lack of random access• Efficient code is difficult to

generate• Stack is often a bottleneck

Accumulator • Minimal internal state• Fast context switch• Short instructions

• Very high memory traffic

Register • Registers are faster than memory• Registers can be used to hold

variables+reduce memory traffic+speed up programs

• Registers are more efficient for a compiler to use than other forms of internal storage

• Longer instructions• Possibly complex effective

address generation• Size and structure of register set

has many options

Register is the class that won out!

Pro’s and Con’s

Page 11: Chapter 2 Instruction Set Principles and Examples

11

Register Machines• How many registers are sufficient?• General-purpose registers vs. special-purpose registers

• compiler flexibility and hand-optimization• Two major concerns for arithmetic and logical instructions (ALU)

1. Two or three operandsX + Y XX + Y Z

2. How many of the operands may be memory addresses (0 – 3)

Number of memory addresses

Max number of

operands allowedExamples

0 3Alpha, ARM, MIPS, PowerPC, Sparc, SuperH, Trimedia TM5200

1 2 IBM 360/370, Intel 80x86, Motorola 68000, TI TMS320C54x

2 2 VAX, PDP-1, National 32x32, IBM 360SS

3 3 VAX

Hence, register classification (# mem, # operands)

Page 12: Chapter 2 Instruction Set Principles and Examples

12

ALU is Register to Register – also known as pure Reduced Instruction Set Computer (RISC)

o Advantages– simple fixed length instruction encoding– decode is simple since instruction types are small– simple code generation model– instruction CPI tends to be very uniform

• except for memory instructions of course• but there are only 2 of them - load and store

o Disadvantages– instruction count tends to be higher– some instructions are short - wasting instruction word bits

(0, 3): Register-Register

Page 13: Chapter 2 Instruction Set Principles and Examples

13

Evolved RISC and also old CISC • new RISC machines capable of doing speculative loads• predicated and/or deferred loads are also possible

o Advantages– data access to ALU immediate without loading first– instruction format is relatively simple to encode– code density is improved over Register (0, 3) model

o Disadvantages– operands are not equivalent - source operand may be

destroyed– need for memory address field may limit # of registers– CPI will vary

• if memory target is in L0 cache then not so bad• if not - life gets miserable

(1, 2): Register-Memory

Page 14: Chapter 2 Instruction Set Principles and Examples

14

True and most complex CISC model• currently extinct and likely to remain so• more complex memory actions are likely to appear but not directly linked to the ALU

o Advantages– most compact code– doesn’t waste registers for temporary values

• good idea for use once data - e.g. streaming media

o Disadvantages– large variation in instruction size - may need a shoe-horn– large variation in CPI - i.e. work per instruction– exacerbates the infamous memory bottleneck

• register file reduces memory accesses if reused

Not used today

(2, 2) or (3, 3): Memory-Memory

Page 15: Chapter 2 Instruction Set Principles and Examples

15

• In today’s machine, objects have byte addresses – an address refers to the number of bytes counted from the beginning of memory

• Object Length: Provides access for bytes (8 bits), half words (16 bits), words (32 bits), and double words (64 bits). The type is implied in opcode (e.g., LDB – load byte; LDW – load word; etc.)

• Byte Ordering– Little Endian: puts the byte whose address is xx00 at the least significant position in t

he word. (7,6,5,4,3,2,1,0)– Big Endian: puts the byte whose address is xx00 at the most significant position in th

e word. (0,1,2,3,4,5,6,7)• Problem occurs when exchanging data among machines with different orderings

Interpreting Memory Addresses

2.3 Memory Addressing

Page 16: Chapter 2 Instruction Set Principles and Examples

16

• Alignment Issues– Accesses to objects larger than a byte must be aligned. An access to an

object of size s bytes at byte address A is aligned if A mod s = 0. Misalignment causes hardware complications, since the memory is

typically aligned on a word or a double-word boundary Misalignment typically results in an alignment fault that must be

handled by the OS

– Hence

• byte address is anything - never misaligned

• half word - even addresses - low order address bit = 0 ( XXXXXXX0) else trap

• word - low order 2 address bits = 0 ( XXXXXX00) else trap

• double word - low order 3 address bits = 0 (XXXXX000) else trap

Interpreting Memory Addresses

Page 17: Chapter 2 Instruction Set Principles and Examples

17

Figure 2.5

Page 18: Chapter 2 Instruction Set Principles and Examples

18

How do architectures specify the addr. of an object they will access?

Effective address: the actual memory address specified by the addressing mode. “->” is for assignment. Mem[R[R1]] refers to the contents of the memory location whose

location is given the contents of register 1 (R1).

Addressing Modes

Page 19: Chapter 2 Instruction Set Principles and Examples

19

Summary of use of memory addressing modes

Based on a VAX which supported everything – from SPEC89

Figure 2.7

Page 20: Chapter 2 Instruction Set Principles and Examples

20

How big should the displacement be?

Figure 2.8 Displacement values are widely distributed

Displacement Addressing Mode

Page 21: Chapter 2 Instruction Set Principles and Examples

21

• Benchmarks show 12 bits of displacement would capture about 75% of the full 32-bit displacements and 16 bits should capture about 99%

• Remember: optimize for the common case. Hence, the choice is at least 12-16 bits

For addresses that do fit in displacement size:

Add R4, 10000 (R0)

For addresses that don’t fit in displacement size, the compiler must do the following:

Load R1, 1000000

Add R1, R0

Add R4, 0 (R1)

Displacement Addressing Mode (cont.)

Page 22: Chapter 2 Instruction Set Principles and Examples

22

• Used where we want to get to a numerical value in an instruction

• Around 20% of the operations have an immediate operand

At high level:

a = b + 3;

if ( a > 17 )

goto Addr

At Assembler level:

Load R2, 3Add R0, R1, R2

Load R2, 17CMPBGT R1, R2

Load R1, AddressJump (R1)

Immediate Addressing Mode

Page 23: Chapter 2 Instruction Set Principles and Examples

23

Immediate Addressing Mode

Figure 2.9 About one-quarter of data transfers and ALU operations have an immediate operand

How frequent for immediates?

Page 24: Chapter 2 Instruction Set Principles and Examples

24

Figure 2.10 Benchmarks show that 50%-70% of the immediates fit within 8 bits and 75%-80% fit within 16 bits

How big for immediates?

Immediate Addressing Mode

Page 25: Chapter 2 Instruction Set Principles and Examples

25

2.4 Addressing Modes for Signal Processing

Two addressing modes that distinguish DSPs

1.Modulo or circular addressing mode–autoincrement/autodecrement to support circular buffers

•As data are added, a pointer is checked to see if it is pointing to the end of the buffer

–If not, the pointer is incremented to the next address

–If it is, the pointer is set instead to the start of the buffer

2.Bit reverse addressing mode–the hardware reverses the lower bits of the address, with the

number of bits reversed depending on the step of the FFT algorithm

Page 26: Chapter 2 Instruction Set Principles and Examples

26

Addressing for Fast Fourier Transform (FFT)

• FFTs start or end their processing with data shuffled in a particular order

Without special support, such address transformations would take an extra memory access to get the new address, or involve a fair amount of logical instructions to transform the address

0 (0002) => 0 (0002)

1 (0012) => 4 (1002)

2 (0102) => 2 (0102)

3 (0112) => 6 (1102)

4 (1002) => 1 (0012)

5 (1012) => 5 (1012)

6 (1102) => 3 (0112)

7 (1112) => 7 (1112)

Page 27: Chapter 2 Instruction Set Principles and Examples

27

Figure 2.11 Static Frequency of Addressing Modes for TI TMS320C54x DSP

17 addressing modes, 6 modes also found in Figure 2.6 account for 95% of the DSP addressing

Page 28: Chapter 2 Instruction Set Principles and Examples

28

Summary: Memory Addressing

• A new architecture expected to support at least: displacement, immediate, and register indirect– represent 75% to 99% of the addressing modes (Figure 2.7)

• The size of the address for displacement mode to be at least 12-16 bits– capture 75% to 99% of the displacements (Figure 2.8)

• The size of the immediate field to be at least 8-16 bits– capture 50% to 80% of the immediates (Figure 2.10)

Desktop and server processors rely on compilers, but historically DSPs rely on hand-coded libraries

Page 29: Chapter 2 Instruction Set Principles and Examples

29

2.5 Type And Size of Operands

• The type of the operand is usually encoded in the opcode– e.g., LDB – load byte; LDW – load word

• Common operand types: (imply their sizes)Character (8 bits or 1 byte)Half word (16 bits or 2 bytes)Word (32 bits or 4 bytes)Double word (64 bits or 8 bytes)Single precision floating point (4 bytes or 1 word)Double precision floating point (8 bytes or 2 words) Characters are almost always in ASCII 16-bit Unicode (used in Java) is gaining popularity Integers are two’s complement binary Floating points follow the IEEE standard 754

• Some architectures support packed decimal: 4 bits are used to encode the values 0-9; 2 decimal digits are packed into each byte

How is the type of an operand designated?

Page 30: Chapter 2 Instruction Set Principles and Examples

30

SPEC2000 Operand Sizes

Figure 2.12 Distribution of data accesses by size for the benchmark programs

The double-word data type is used for double-precision floating point in floating-point programs and for addresses

Page 31: Chapter 2 Instruction Set Principles and Examples

31

2.6 Operands for Media and Signal Processing

• Vertex– A common 3D data type dealt in graphics applications– four components: (x, y, z) and w=color or hidden surfaces– vertex values are usually 32-bit floating-point values– Three vertices specify a graphics primitive such as a triangle

• Pixel– Typically 32 bits, consisting of four 8-bit channels

• R (red), G (green), B (blue), and A (attribute: eg. transparency)

• DSPs add fixed point– fractions between -1 and +1 (divide by 2n-1)

• Blocked floating point– a block of variables with common exponent– accumulators, registers that are wider to guard against round

-off error to aid accuracy in fixed-point arithmetic

Page 32: Chapter 2 Instruction Set Principles and Examples

32

Size of Data operands for DSP

Figure 2.13 Four generations of DSPs, their data width, and the width of the registers that reduces round-off error

Figure 2.14 Size of data operands for TMS320C540x DSP. This DSP has two 40-bit accumulators and no floating-point operations.

Page 33: Chapter 2 Instruction Set Principles and Examples

33

Brief Summary

• Review instruction set classes

– choose register-register class

• Review memory addressing

– select displacement, immediate, and register indirect addressing modes

• Select the operand sizes and types

Page 34: Chapter 2 Instruction Set Principles and Examples

34

2.7 Operations in the Instruction Set

Figure 2.15 Categories of instruction operators and examples of each.

•All computers generally provide a full set of operations for the first three categories•All computers must have some instruction support for basic system functions•Graphics instructions typically operate on many smaller data items in parallel

Page 35: Chapter 2 Instruction Set Principles and Examples

35

Figure 2.16 Top 10 instructions for the 80x86

• Simple instructions dominate this list and responsible for 96% of the instructions executed

• These percentages are the average of the five SPECint 92 programs

Page 36: Chapter 2 Instruction Set Principles and Examples

36

2.8 Operations for Media & Signal Processing

• Data for multimedia operations is often narrower than the 64-bit data word– normally in single precision, not double precision

• Single-instruction multiple-data (SIMD) or vector instructions– A partitioned add operation on 16-bit data with a 64-bit ALU

would perform four 16-bit adds in a single clock cycle• Hardware cost: prevent carries between the four 16-bit partitions of the

ALU

– Two 32-bit floating-point operations (paired single operations)• The two partitions must be insulated to prevent operations on one half

from affecting the other

Page 37: Chapter 2 Instruction Set Principles and Examples

37

Figure 2.17 Summary of multimedia support for desktop RISCs

•B: byte (8 bits), H: half word (16 bits), W: word (32 bits) 8B: operation on 8 bytes in a single instruction•All are fixed-width operations, performing multiple narrow operations on either a 64-bit or 128-bit ALU

Page 38: Chapter 2 Instruction Set Principles and Examples

38

Multimedia Operations for DSPs

• DSP architectures use saturating arithmetic– If the result is too large to be represented, it is set to the largest representabl

e number– There is not an option of causing an exception on arithmetic overflow

• Prevent missing an event in real-time applications• The result will be used no matter what the inputs

• There are several modes to round the wider accumulators into the narrower data words

• The targeted kernels for DSPs accumulate a series of produces, and hence have a multiply-accumulate (MAC) instruction– MACs are key to dot product operations for vector and matrix multiplies

Finite Impulse Response (FIR) Problemy[n] = c[k] * x[n-k]

In C:y[n] = 0for(k=0; k <N; k++)

y[n] = y[n] + c[k]*x[n-k]

General form:x = x + y * z

IBM PowerPC 440 MAC instruction:macchw RT, RA, RB

where RT = x, RA = y, RB = z

Page 39: Chapter 2 Instruction Set Principles and Examples

39

Figure 2.18 Mix of instructions for

TMS320C540x DSP

• 16-bit architecture use two 40-bit accumulators, 8 address registers, no floating-point operations (fixed points instead), plus a stack for passing parameters to library routines and for saving return addresses

• 15% to 20% of the multiplies and MACs round the final sum (not shown)

Page 40: Chapter 2 Instruction Set Principles and Examples

40

Control instructions change the flow of control: instead of executing the next instruction, the program branches to the address specified in the branching instructions

They are a big deal• Primarily because they are difficult to optimize out• AND they are frequent

Four types of control instructions• Conditional branches • Jumps – unconditional transfer• Procedure calls• Procedure returns

2.9 Instructions for Control Flow

Page 41: Chapter 2 Instruction Set Principles and Examples

41

Issues:

• Where is the target address? How to specify it?

• Where is return address kept? How are the arguments passed? (calls)

• Where is return address? How are the results passed? (returns)

Figure 2.19 Breakdown

Control Flow Instructions

Page 42: Chapter 2 Instruction Set Principles and Examples

42

Addressing Modes for Control Flow Instructions

• PC-relative (Program Counter)– supply a displacement added to the PC

• Known at compile time for jumps, branches, and calls (specified within the instruction)

– the target is often near the current instruction• requiring fewer bits• independently of where it is loaded (position independence)

• Register indirect addressing – dynamic addressing– The target address may not be known at compile time– Naming a register that contains the target address

• Case or switch statements• Virtual functions or methods• High-order functions or function pointers• Dynamically shared libraries

Page 43: Chapter 2 Instruction Set Principles and Examples

43

Figure 2.20 Branch distances

These measurements were taken on a load-store computer (Alpha architecture) with all instructions aligned on word boundaries

Page 44: Chapter 2 Instruction Set Principles and Examples

44

Conditional Branch Options

Figure 2.21 Major methods for evaluating branch conditions

Page 45: Chapter 2 Instruction Set Principles and Examples

45

•Most loops go from 0 to n.

•Most backward branches are loops – taken about 90%

Figure 2.22 Comparison Type vs. Frequency

Program % backward branches % all control instructions that modify PC

gcc 26% 63%

spice 31% 63%

TeX 17% 70%

Average 25% 65%

Page 46: Chapter 2 Instruction Set Principles and Examples

46

Repeat Instruction for DSP

• DSPs: add looping structure, called a repeat instruction, to avoid loop overhead

• It allows a single instruction or a block of instructions to be repeated up to, say, 256 times

• eg. TMS320C54 dedicates three special registers to hold the block starting address, ending address, and repeat counter

Page 47: Chapter 2 Instruction Set Principles and Examples

47

Procedure Invocation Options•Procedure calls and returns

– control transfer– state saving; the return address must be savedNewer architectures require the compiler to generate stores and lo

ads for each register saved and restored•Two basic conventions in use to save registers

– caller saving: the calling procedure must save the registers that it wants preserved for access after the call

• the called procedure need not worry about registers– callee saving: the called procedure must save the registers it w

ants to use• leaving the caller unrestrained

most real systems today use a combination of the two mechanisms

• specified in an application binary interface (ABI) that set down the basic rules as to which register be caller saved and which should be callee saved

Page 48: Chapter 2 Instruction Set Principles and Examples

48

2.10 Encoding an Instruction Set

Figure 2.23 Three basic variations in instruction encoding

•Opcode: specifying the operation•# of operand

– addressing mode– address specifier: tells what addressing m

ode is used– Load-store computer

• Only one memory operand• Only one or two addressing modes

•Encoding issues1. The desire to have as many registers and

addressing modes as possible2. The impact of the size of the register and

addressing mode fields on the average instruction size and hence on the average program size

3. A desire to have instructions encoded into lengths that will be easy to handle in a pipelined implementation

• The length of 80x86 instructions varies between 1 and 17 bytes

• Trade-off: size of programs vs. ease of decoding

Page 49: Chapter 2 Instruction Set Principles and Examples

49

Instruction formats for desktop/server

RISC architectures

Page 50: Chapter 2 Instruction Set Principles and Examples

50

Reduced Code Size in RISCs

• Hybrid encoding – support 16-bit and 32-bit instructions in RISC, eg. ARM Thumb and MIPS 16– narrow instructions support fewer operations, smaller address

and immediate fields, fewer registers, and two-address format rather than the classic three-address format

– claim a code size reduction of up to 40%• Compression in IBM’s CodePack

– Adds hardware to decompress instructions as they are fetched from memory on an instruction cache miss

– The instruction cache contains full 32-bit instructions, but compressed code is kept in main memory, ROMs, and the disk

• Hitachi’s SuperH: fixed 16-bit format– 16 rather than 32 registers– fewer instructions

Page 51: Chapter 2 Instruction Set Principles and Examples

51

Today almost all programming is done in high-level languages. As such, an ISA is essentially a complier target.

Because performance of a computer will be significantly affected by the compiler, understanding compiler technology today is critical to designing and efficiently implementing an IS.

Compiler goals:

All correct programs execute correctly

Most compiled programs execute fast (optimizations)

Fast compilation

Debugging support

2.11 The Role of Compilers

Page 52: Chapter 2 Instruction Set Principles and Examples

52

Typical Modern Compiler Structure

Object Code

C, Fortran, or Cobol, etc.

Page 53: Chapter 2 Instruction Set Principles and Examples

53

Optimization Types High level – done at or near source code level

• If procedure is called only once, put it in-line and save CALL• more general case: if call-count < some threshold, put them in-line

Local – done within straight-line code• common sub-expressions produce same value – either allocate a

register or replace with single copy• constant propagation – replace constant valued variable with the

constant• stack height reduction – re-arrange expression tree to minimize

temporary storage needs

Global – across a branch• copy propagation – replace all instances of a variable A that has been

assigned X (i.e., A=X) with X. • code motion – remove code from a loop that computes same value each

iteration of the loop and put it before the loop

• simplify or eliminate array addressing calculations in loops

Page 54: Chapter 2 Instruction Set Principles and Examples

54

Optimization Types Machine-dependent optimizations – based on machine knowledge

• strength reduction – replace multiply by a constant with shifts and adds• would make sense if there was no hardware support for MUL• a trickier version: 17 = arithmetic left shift 4 and add

• pipelining scheduling – reorder instructions to improve pipeline performance

• dependency analysis• branch offset optimization - reorder code to minimize branch offsets

Page 55: Chapter 2 Instruction Set Principles and Examples

55

Complier Optimizations – Change in IC

• L0 – unoptimized

• L1 – local opts, code scheduling, & local reg. allocation

• L2 – global opts and loop transformations, & global reg. Allocation

• L3 – procedure integration

Page 56: Chapter 2 Instruction Set Principles and Examples

56

The Impact of Compiler TechnologyHow are variables allocated and addressed?

How many registers are needed to allocate variables appropriately? Three separate areas for data allocation

• Stack– Used to allocate local variables

– Grown and shrunk on calls and returns

– Addressing is relative to the stack pointer

• Global data area – Used to allocate statically declared objects, such as global variables and

constants

– A large percentage of these objects are aggregate data structures such as arrays

• Heap– Used to allocate dynamic objects

– Access are usually by pointers

– Data is typically not scalars (single variables)

Page 57: Chapter 2 Instruction Set Principles and Examples

57

Reasonably simple for stack-allocated objects• Done with the graph coloring theory: variable – vertex; dependency

between variables – edge; # register will be equal to # colors Essentially impossible for heap-allocated objects because they are

accessed with pointers Hard for global variables and some static variables due to aliasing

opportunity There are multiple ways to refer to the address of a variable b

......

3

2

&

b

p

b

bp

Register Allocation Problem

Page 58: Chapter 2 Instruction Set Principles and Examples

58

How do you help the compiler writer? Key: make the frequent cases fast and the rare case correct

Guidelines that will make it easy to write a complier

– Regularity– Addressing modes, operations, and data types should be

independent of each other

– Provide primitives, not solution– What works in one language may be detrimental to others, so don’t

optimize for one particular language

– Simplify trade-offs among alternatives– Anything that makes code sequence performance obvious is a

definite win!

– Provide instructions that bind the quantities known at compile time as constants

Page 59: Chapter 2 Instruction Set Principles and Examples

59

A Typical General-Purpose Load-Store Architecture Registers

• 32 integer - 64-bit integer registers (R0:R31)– R0 is always equal to 0 and the rest are general purpose

• 32 FP registers (F0:F31)– Contains a single 32-bit or a single 64-bit float

Data Types• byte, half word, word, double word

Instructions• 32-bit long• Fixed length: 4 bytes, 3 instruction types ==> easy decode• Overall goal: simple ==> pipeline ease ==> performance

2.12 The MIPS64 Architecture

Page 60: Chapter 2 Instruction Set Principles and Examples

60

MIPS64 Operations

Key features• addressing modes for data

Displacement (size 12-16 bits)Immediate (size 8-16 bits)Register indirect: placing 0 in the 16-bit displacement fieldAbsolute addressing: using R0 as the base register

• conditional branchesTest the register source for zero or nonzero (compare equal); orCompare 2 registers (compare less)If satisfied, then jump PC+4+immeidate (size 8 bits long)jump, call, and return

• floats - usual set of operations.S for single precision (32-bit).D for double precision (64-bit)

Page 61: Chapter 2 Instruction Set Principles and Examples

61

Instruction FormatFixed length with 3 types

NOTE:• 16 bit immediate: Immediate data and PC-relative offset forshort jumps and conditional branches.

• 26 bit PC-relative offsetfor calls and returns (regular or traps) and long jumps.

Page 62: Chapter 2 Instruction Set Principles and Examples

62

MIPS64 Operations ( load and store)

Example instruction Instruction name Meaning

LD R1, 30(R2) Load double word Regs[R1]←64 Mem[30+Regs[R2]]

LD R1, 1000(R0) Load double word Regs[R1]←64 Mem[1000+0]

LW R1, 30(R2) Load word Regs[R1]←64 (Mem[30+Regs[R2]]0)32 ##

Mem[30+Regs[R2]]

LB R1, 30(R2) Load byte Regs[R1]←64 (Mem[30+Regs[R2]]0)56 ##

Mem[30+Regs[R2]]

L.S F0, 30(R2) Load FP single Regs[F0]←64 Mem[30+Regs[R2]]##032

L.D F0, 30(R2) Load FP double Regs[F0]←64 Mem[30+Regs[R2]]

SD R3, 500(R4) Store double word Mem[500+Regs[R4]] ← 64 Regs[R3]

SW R3, 500(R4) Store word Mem[500+Regs[R4]] ← 32 Regs[R3]

SH R3, 500(R4) Store half word Mem[500+Regs[R4]] ← 16 Regs[R3]48..63

SB R3, 500(R4) Store byte Mem[500+Regs[R4]] ← 8 Regs[R3]56..63

S.S F0, 500(R4) Store FP single Mem[500+Regs[R4]] ← 32 Regs[F0]0..31

S.D F0, 500(R4) Store FP double Mem[500+Regs[R4]] ← 16 Regs[F0]

Page 63: Chapter 2 Instruction Set Principles and Examples

63

MIPS64 Operations ( load and store)

Illustration legends:

A subscript is appended to the symbol ←whenever the length of the datum being transferred might not be clear. Thus, ←n means transfer an n-bit quantity.

A subscript is used to indicate selection of a bit from a filed. Bits are labeled from the most significant bit starting at 0.(Mem[30+Regs[R2]]0)32 : replicating the sign bit for 32 times

The variable Mem, used as an array that stands for main memory, is indexed a byte address and may transfer any number of bytes.

A superscript is used to replicate a filed.

The symbol ## is used to concatenate two fields and may appear on either side of a data transfer.

Page 64: Chapter 2 Instruction Set Principles and Examples

64

MIPS64 Operations ( A/L w/o immediate )

Example instruction Instruction name Meaning

DADDU R1, R2, R3 Add unsigned Regs[R1] ← Regs[R2]+Regs[R3]

DADDI R1, R1, #-3 Add immediate Regs[R1] ← Regs[R1]-3

LUI R1, #88 Load upper immediate Regs[R1] ← 032 ## 88 ## 016

DSLL R1,R2, #3 Shift left logic Regs[R1] ← Regs[R2]<<3

DSLT R1,R2,R3 Set less than If (Regs[R2]<Regs[R3]) Regs[R1] ← 1 else Regs[R1] ← 0

<<: logical shift left

Page 65: Chapter 2 Instruction Set Principles and Examples

65

MIPS64 Operations ( flow control)

Example instruction Instruction name Meaning

J name Jump PC ← name; ((PC+4)–225) ≤ name < ((PC+4)+225)

JAL name Jump and link Regs[R31] ← PC+4; PC ← name; ((PC+4)–225) ≤ name < ((PC+4)+225)

JALR R2 Jump and link register Regs[R31] ← PC+4; PC ← Regs[R2]

JR R3 Jump register PC ← Regs[R3]

BEQZ R4, name Branch equal zero If (Regs[R4] == 0) PC ← name; ((PC+4)–215) ≤ name < ((PC+4)+215)

BNE R3, R4, name Branch not equal zero If (Regs[R3] != Regs[R4]) PC ← name; ((PC+4)–215) ≤ name < ((PC+4)+215)

MOVZ R1, R2, R3 Conditional move if zero If (Regs[R3] == 0) Regs[R1] ← Regs[R2]

Page 66: Chapter 2 Instruction Set Principles and Examples

66

SummaryChapter 2 Instruction Set Principles and Examples2.1 Introduction 2.2 Classifying Instruction Set Architectures 2.3 Memory Addressing 2.4 Addressing Modes for Signal Processing 2.5 Type and Size of Operands 2.6 Operands for Media and Signal Processing2.7 Operations in the Instruction Set2.8 Operations for Media and Signal Processing2.9 Instructions for Control Flow2.10 Encoding an Instruction Set2.11 The Role of Compilers2.12 The MIPS Architecture