EECE476: Computer Architecture Lectures 1, 2: Instruction Set Architecture Chapters 1, 2 The University of British Columbia EECE 476 © 2005 Guy Lemieux


EECE476: Computer Architecture

Lectures 1, 2: Instruction Set Architecture

Chapters 1, 2

The University of British Columbia
EECE 476 © 2005 Guy Lemieux


REVIEW: What is Instruction Set Architecture?

• Important acronym: ISA
  – Instruction Set Architecture

• The low-level software interface to the machine
  – Language of the machine
  – Must translate any programming language into this language
  – Examples: IA-32 (Intel instruction set), MIPS, SPARC, Alpha, PA-RISC, PowerPC, …

• Visible to programmer (if desired)!


REVIEW: Instruction Set Architecture

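[Figure: layered view of a computer system, with Instruction Set Architecture as the boundary between Software and Hardware.
 Layer labels: Application, Operating System, Compiler, Firmware, Instr. Set Proc., I/O system, Datapath & Control, Digital Design, Circuit Design, Layout.]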


Which ISA?

[Figure: Millions of Processors (vertical axis) versus Year (horizontal axis)]


MIPS CPU

• MIPS CPU
  – What does MIPS mean?
    • Millions of Instructions Per Second
    • Meaningless Indicator of Processor Speed
    • Microprocessor without Interlocking Pipeline Stages

• Altera NIOS II CPU
  – What does NIOS mean?


Levels of Representation

High Level Language Program

    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;

        ↓ Compiler

Assembly Language Program

    lw $15, 0($2)
    lw $16, 4($2)
    sw $16, 0($2)
    sw $15, 4($2)

        ↓ Assembler

Machine Language Program

    1000 1100 0110 0010 0000 0000 0000 0000
    1000 1100 1111 0010 0000 0000 0000 0100
    1010 1100 1111 0010 0000 0000 0000 0000
    1010 1100 0110 0010 0000 0000 0000 0100

        ↓ Machine Interpretation

Control Signal Specification

    ALUOP[0:3] ← InstrReg[9:12] & MASK


Instruction Set Architectures

• Computer architect’s jargon: ISA
  – Machine’s native language (not assembly language!)
  – Interface specification between hardware and low-level software
  – Includes:
    • language mnemonics (syntax)
    • behaviour (semantics)
    • instruction format (bit encoding)
  – Note: assembler is a simple translator (assembly language -> machine language)

• Standardizes instructions, machine language bit patterns, etc.
  – Advantage: different CPU implementations of the same ISA
  – Disadvantage: sometimes prevents use of new innovations

• Many different ISAs
  – One for every CPU family
  – Most are very similar, easy to learn a new one!


Advantage of Standardized ISA

[Figure: Performance (vertical axis) versus Year (horizontal axis)]


Typical ISA Operations (little change since 1960s)

Data Movement            Load (from memory)
                         Store (to memory)
                         memory-to-memory move
                         register-to-register move
                         input (from I/O device)
                         output (to I/O device)
                         push, pop (to/from stack)
Arithmetic               integer (binary + decimal) or FP
                         Add, Subtract, Multiply, Divide
Logical                  not, and, or, set, clear
Shift                    shift left/right, rotate left/right
Control (Jump/Branch)    unconditional, conditional
Subroutine Linkage       call, return
Interrupt                trap, return
Synchronization          test & set (atomic read-modify-write)


Typical ISA Operations (biggest changes since 1960s)

• Instruction types eliminated
  – String search, translate
  – Looping

• Instruction types added
  – “Multimedia” SIMD instructions (e.g., MMX, 3DNow, SSE)
    • parallel subword ops (e.g., 4-way 16-bit add with 1 instruction)
  – Conditional Execution
    • eliminates branch instructions

        p = (b>c);     // comparison
        if (p) a=b;    // conditional move
        if (!p) a=c;   // conditional move
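The conditional-move idea on this slide can be sketched in C. This is only an illustration under stated assumptions: the function names `select_max_branchy` and `select_max_branch_free` are hypothetical, and the mask trick merely imitates in software what a pair of conditional-move instructions would do in hardware. A real compiler may or may not emit a branch for either version.

```c
#include <stdint.h>

/* Branching version: a compiler may translate the if into a conditional branch. */
int32_t select_max_branchy(int32_t b, int32_t c)
{
    if (b > c)
        return b;
    return c;
}

/* Branch-free version following the slide:
 *   p = (b>c);  if (p) a=b;  if (!p) a=c;
 * The predicate p becomes an all-ones or all-zeros mask, so exactly one
 * of b or c passes through, with no change in control flow. */
int32_t select_max_branch_free(int32_t b, int32_t c)
{
    uint32_t p    = (uint32_t)(b > c);     /* comparison: 1 or 0          */
    uint32_t mask = (uint32_t)0 - p;       /* 0xFFFFFFFF if p, 0 otherwise */
    uint32_t a    = ((uint32_t)b & mask) | ((uint32_t)c & ~mask);
    return (int32_t)a;
}
```

Both functions return the larger of `b` and `c`; the second one shows why a conditional move lets the hardware avoid a branch for short if/else bodies.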


Top 10 IA-32 Instructions

Rank   Instruction               Integer Average (Percent total executed)
 1     load                      22%
 2     conditional branch        20%
 3     compare                   16%
 4     store                     12%
 5     add                        8%
 6     and                        6%
 7     sub                        5%
 8     move register-register     4%
 9     call                       1%
10     return                     1%
       Total                     96%

• Simple instructions dominate instruction frequency


Top MIPS Instructions

                                            SPEC2000 Benchmarks
Instruction                     Mnemonic    Integer   Floating-Pt
add                             add           0%         0%
add immediate                   addi          0%         0%
add unsigned                    addu          7%        21%
add immediate unsigned          addiu        12%         2%
subtract unsigned               subu          3%         2%
and                             and           1%         0%
and immediate                   andi          3%         0%
or                              or            7%         2%
or immediate                    ori           2%         0%
nor                             nor           3%         1%
shift left logical              sll           1%         1%
shift right logical             srl           0%         0%
load upper immediate            lui           2%         5%
load word                       lw           24%        15%
store word                      sw            9%         2%
load byte unsigned              lbu           1%         0%
store byte                      sb            1%         0%
branch on equal (zero)          beq           6%         2%
branch on not equal (zero)      bne           5%         1%
jump and link                   jal           1%         0%
jump register                   jr            1%         0%
set less than                   slt           2%         0%
set less than immediate         slti          1%         0%
set less than unsigned          sltu          1%         0%
set less than imm. unsigned     sltiu         1%         0%
FP add double                   add.d         0%         8%
FP subtract double              sub.d         0%         3%
FP multiply double              mul.d         0%         8%
FP divide double                div.d         0%         0%
load word to FP double          l.d           0%        15%
store word to FP double         s.d           0%         7%
shift right arithmetic          sra           1%         0%
load half unsigned              lhu           1%         0%
branch less than zero           bltz          1%         0%
branch greater or equal zero    bgez          1%         0%
branch less or equal zero       blez          0%         1%
multiply                        mul           0%         1%
TOTAL                                        98%        97%


ISA Summary

• Support these simple instructions, since they will dominate the number of instructions executed:

    load, store, add, subtract, move register-register, and, shift, compare equal, compare not equal, branch, jump, call, return


Compilers and Instruction Set Architectures

Designing a new ISA? Design Choices…

• Ease of compilation:
  • orthogonality: no special registers, no special cases, all operations work with all registers, all operand modes available with any data type or instruction type
  • completeness: supports wide range of operations and target applications
  • regularity: no overloading or multiple meanings of instruction fields
  • streamlined: resource needs are easily determined
    • E.g., all instructions same length
    • E.g., no “fancy” instructions that make it difficult to know if the ALU will be used


Good (Simple) Compiler Considerations

• Lessons from history….
  – Provide at least 16 general purpose registers plus separate floating-point registers
    • Register Assignment (of variables to registers) is critical too
    • Easier if lots of registers
    • Too many registers slows down CPU clock speed
  – Be sure all addressing modes apply to all data transfer instructions
  – Aim for a minimalist instruction set


Overview of MIPS Instructions

• simple instructions all 32 bits wide
• very structured, no unnecessary baggage
• only three instruction formats:

        bit 31                                  bit 0
    R   op | rs | rt | rd | shamt | funct
    I   op | rs | rt | Imm16
    J   op | Imm26

• rely on compiler to achieve performance
  – what are the compiler's goals?
• help compiler where we can
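The three formats can be made concrete with a small C sketch that pulls the fields out of a 32-bit instruction word. It assumes the standard MIPS field widths (op 6 bits, rs/rt/rd/shamt 5 bits each, funct 6 bits, with op in the most-significant bits); the struct `mips_fields` and the function `mips_decode` are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* All three views of one instruction word. Which fields are meaningful
 * depends on the format selected by op. */
typedef struct {
    uint32_t op;      /* bits 31..26, common to R, I and J */
    uint32_t rs;      /* bits 25..21, R and I              */
    uint32_t rt;      /* bits 20..16, R and I              */
    uint32_t rd;      /* bits 15..11, R only               */
    uint32_t shamt;   /* bits 10..6,  R only               */
    uint32_t funct;   /* bits  5..0,  R only               */
    uint32_t imm16;   /* bits 15..0,  I only               */
    uint32_t imm26;   /* bits 25..0,  J only               */
} mips_fields;

/* Split a 32-bit instruction word into its fields with shifts and masks. */
mips_fields mips_decode(uint32_t word)
{
    mips_fields f;
    f.op    = (word >> 26) & 0x3F;
    f.rs    = (word >> 21) & 0x1F;
    f.rt    = (word >> 16) & 0x1F;
    f.rd    = (word >> 11) & 0x1F;
    f.shamt = (word >> 6)  & 0x1F;
    f.funct =  word        & 0x3F;
    f.imm16 =  word        & 0xFFFF;
    f.imm26 =  word        & 0x03FFFFFF;
    return f;
}
```

Because every instruction is the same width and op is always in the same place, the decoder needs no loops or variable-length parsing, which is part of what "very structured" buys.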


MIPS I Operation Overview

• Arithmetic/Logical/Comparisons
  – Add, AddU, Sub, SubU, And, Or, Xor, Nor
  – AddI, AddIU, AndI, OrI, XorI, LUI
  – SLT, SLTU, SLTI, SLTIU
  – SLL, SRL, SRA, SLLV, SRLV, SRAV

• Memory Access
  – LB, LBU, LH, LHU, LW, LWL, LWR
  – SB, SH, SW, SWL, SWR


MIPS Arithmetic

• All instructions have 3 operands
  – R-type instructions (think “R=register”)

• Operand order is fixed (destination first)

Example:

C code: A = B + C

MIPS code: add $s0, $s1, $s2

(registers associated with variables by compiler)


MIPS Arithmetic

• C to MIPS Assembly Example

C code:
    A = B + C + D;
    E = F - A;

MIPS code:
    add $t0, $s1, $s2
    add $s0, $t0, $s3
    sub $s4, $s5, $s0

– Operands must be registers, only 32 registers provided
– Notice our convention
  • $s0..$s7 registers hold C language variables
  • $t0..$t9 registers hold intermediate results


Registers vs. Memory

• Arithmetic instruction operands must be registers
  – Only 32 registers provided
  – Cannot “add value to memory location X”
  – Can only “add register B and register C, store in register A”

• Compiler associates variables with registers

• Q: What about programs with lots of variables?
  – A: Load/save registers from/to memory
  – Called register spilling


MIPS Registers: Software Conventions for Register Use

Name       Register Number   Usage
$zero      $0                the constant value 0
$v0-$v1    $2-$3             values for results and expression evaluation
$a0-$a3    $4-$7             arguments
$t0-$t7    $8-$15            temporaries
$s0-$s7    $16-$23           saved
$t8-$t9    $24-$25           more temporaries
$gp        $28               global pointer
$sp        $29               stack pointer
$fp        $30               frame pointer
$ra        $31               return address


MIPS: Software Conventions for Registers

 0     zero   constant 0
 1     at     reserved for assembler
 2     v0     expression evaluation &
 3     v1     function results
 4     a0     arguments
 5     a1
 6     a2
 7     a3
 8     t0     temporary: caller saves if they are important
 ...          (callee can clobber)
 15    t7
 16    s0     callee saves (if used)
 ...          (caller can clobber on return)
 23    s7
 24    t8     temporary (cont’d)
 25    t9
 26    k0     reserved for OS kernel
 27    k1
 28    gp     Pointer to global area
 29    sp     Stack pointer
 30    fp     Frame pointer
 31    ra     Return Address (HW)


Caller/Callee Relationship

• Caller function runs first
  – Calls the “callee” as sub-function

• Analogy
  – Caller == Employer
  – Callee == Employee

    int caller()
    {
        int a, b;
        b = callee(a);
        return b;
    }

    int callee(int t)
    {
        int x=2, y=3;
        return x+y+t;
    }


Caller/Callee Saved Registers

• No such thing as “local variables” and “global variables” in MIPS ISA
• All registers are “global”

• Conventions
  – $t0-$t9 registers
    • Callee temporary variables
    • Callee can use freely
    • Do not expect same value in these registers after function call
    • Almost never saved to memory, short-term use only
  – $s0-$s7 registers
    • Callee must save before use
    • Caller relies upon callee to restore these values before returning

    // globally visible registers
    int t0, t1, ..., t7, t8, t9;
    int s0, s1, ..., s7;

    int caller()
    {
        t0 = 1;
        s0 = t0 + 1;
        callee();
        // s0 == 2 here, always
        // t0 = ? unreliable value
    }

    int callee()
    {
        // callee saves $s0-$s7
        save_s_regs();
        t0 = calcB(s0,s1,t9,t3);
        s0 = 0;
        restore_s_regs();
    }


Memory Organization: Bytes

• Large one-dimensional array of bytes
• Each byte has unique address
• Memory address is index into the array
• “Byte addressing”
  – Each different address points to a unique byte of memory.

    Byte address    Contents
        ...
         6          8 bits of data
         5          8 bits of data
         4          8 bits of data
         3          8 bits of data
         2          8 bits of data
         1          8 bits of data
         0          8 bits of data


Memory Organization: Words

• Bytes are nice, but most data items use larger “words”
• For MIPS, a word is 32 bits (4 bytes)

• 2^32 bytes with byte addresses from 0 to 2^32 - 1
• 2^30 words starting at byte addresses 0, 4, 8, ... 2^32 - 4

• Words are aligned
  – The 2 least-significant bits of a word’s byte address are always zero

    Byte address    Contents
        ...
        12          32 bits of data
         8          32 bits of data
         4          32 bits of data
         0          32 bits of data

Each register holds 32 bits of data (1 word)
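The alignment rule can be checked with a couple of bit operations. The following C sketch is illustrative only; the helper names `is_word_aligned`, `word_to_byte_addr` and `containing_word_addr` are hypothetical and not part of the course material.

```c
#include <stdbool.h>
#include <stdint.h>

/* A byte address is word aligned when its 2 least-significant bits are zero. */
bool is_word_aligned(uint32_t byte_addr)
{
    return (byte_addr & 0x3u) == 0;
}

/* Byte address of word number n: 0, 4, 8, ... (n times 4, i.e. n shifted left by 2). */
uint32_t word_to_byte_addr(uint32_t word_index)
{
    return word_index << 2;
}

/* Byte address of the aligned word that contains a given byte:
 * clear the 2 least-significant bits. */
uint32_t containing_word_addr(uint32_t byte_addr)
{
    return byte_addr & ~(uint32_t)0x3;
}
```

For instance, byte 6 lives in the word at byte address 4, and the last of the 2^30 words (index 2^30 - 1) starts at byte address 2^32 - 4.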


Memory Endian-ness

• Big Endian, aka “Motorola”
  – MSB comes first (lower byte address is MSB)

• Little Endian, aka “Intel”
  – LSB comes first (lower byte address is LSB)

• MIPS is “Big Endian”
  – (actually, it is configurable, but we shall use Big Endian)

    Byte Address       Big Endian                 Little Endian
    Of Word            Byte Address Offset        Byte Address Offset
                       +0    +1    +2    +3       +3    +2    +1    +0
         0             MSB   ...   ...   LSB      MSB   ...   ...   LSB
         4             MSB   ...   ...   LSB      MSB   ...   ...   LSB
         8             MSB   ...   ...   LSB      MSB   ...   ...   LSB
        12             MSB   ...   ...   LSB      MSB   ...   ...   LSB
         …

    Example in the word at address 4:
      Big Endian:    byte address 4+2 = 6
      Little Endian: byte address 4+1 = 5
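The two byte orders can be compared with a short C sketch that assembles a 32-bit word from a byte-addressed memory. Everything here is an illustrative assumption: `demo_mem` is made-up sample data, and the function names are hypothetical. The last helper reproduces the slide's example by counting bytes from the MSB end of a word.

```c
#include <stdbool.h>
#include <stdint.h>

/* Made-up sample memory: two words' worth of bytes at addresses 0..7. */
static const uint8_t demo_mem[8] = {
    0x11, 0x22, 0x33, 0x44,   /* word at byte address 0 */
    0xAA, 0xBB, 0xCC, 0xDD    /* word at byte address 4 */
};

/* Big Endian: the byte at the lowest address is the MSB. */
uint32_t load_word_big_endian(const uint8_t *mem, uint32_t addr)
{
    return ((uint32_t)mem[addr]     << 24) |
           ((uint32_t)mem[addr + 1] << 16) |
           ((uint32_t)mem[addr + 2] << 8)  |
            (uint32_t)mem[addr + 3];
}

/* Little Endian: the byte at the lowest address is the LSB. */
uint32_t load_word_little_endian(const uint8_t *mem, uint32_t addr)
{
    return ((uint32_t)mem[addr + 3] << 24) |
           ((uint32_t)mem[addr + 2] << 16) |
           ((uint32_t)mem[addr + 1] << 8)  |
            (uint32_t)mem[addr];
}

/* Byte address of the k-th byte of a word, counting from the MSB (k = 0)
 * down to the LSB (k = 3). */
uint32_t byte_addr_from_msb(uint32_t word_addr, uint32_t k, bool big_endian)
{
    return big_endian ? word_addr + k : word_addr + (3 - k);
}
```

With the sample data, the same four bytes at address 4 read as 0xAABBCCDD under Big Endian and 0xDDCCBBAA under Little Endian; the third byte from the MSB end of that word sits at byte address 6 or 5 respectively, matching the slide.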


Instructions

• Load and store instructions
• Example with integers (words):

C code:
    A[8] = h + A[8];

MIPS code:
    lw  $t0, 32($s3)
    add $t0, $s2, $t0
    sw  $t0, 32($s3)

• Store word has destination last
• Again, arithmetic operands are registers, not memory!
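The offset 32 in the example comes from byte addressing: element 8 of a word array is 8 × 4 = 32 bytes past the base. A minimal C sketch of the same three-step sequence follows, assuming 4-byte array elements; the names `word_offset` and `add_h_to_element8` are hypothetical, and the local variable `t0` merely stands in for register $t0.

```c
#include <stdint.h>

enum { WORD_BYTES = 4 };   /* a MIPS word is 4 bytes */

/* Byte offset that lw/sw would use for element `index` of a word array. */
int32_t word_offset(int32_t index)
{
    return index * WORD_BYTES;
}

/* Mimics the slide's sequence for A[8] = h + A[8].
 * `base` plays the role of $s3 and `h` the role of $s2. */
void add_h_to_element8(int32_t *base, int32_t h)
{
    int32_t  offset = word_offset(8);                      /* 32 bytes           */
    int32_t *addr   = (int32_t *)((char *)base + offset);  /* base + 32          */
    int32_t  t0     = *addr;                               /* lw  $t0, 32($s3)   */
    t0 = h + t0;                                           /* add $t0, $s2, $t0  */
    *addr = t0;                                            /* sw  $t0, 32($s3)   */
}
```

The memory operand is touched only by the load and the store; the addition itself works purely on register-like local values.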


Subroutine Example

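• Can we figure out the code?

    void swap(int v[], int k)
    {
        int temp;
        temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
    }

    swap:
        muli $2, $5, 4      # $2 = k * 4 (byte offset of v[k])
        add  $2, $4, $2     # $2 = address of v[k]
        lw   $15, 0($2)     # $15 = v[k]
        lw   $16, 4($2)     # $16 = v[k+1]
        sw   $16, 0($2)     # v[k] = old v[k+1]
        sw   $15, 4($2)     # v[k+1] = old v[k]
        jr   $31            # return to caller

• No multiply? Use add instead:

    swap:
        add $2, $5, $5      # $2 = 2k
        add $2, $2, $2      # $2 = 4k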


So far we’ve learned:

• MIPS
  – Loads words, but addresses bytes
  – Arithmetic on registers only

• Instruction              Meaning

  add $s1, $s2, $s3        $s1 = $s2 + $s3
  sub $s1, $s2, $s3        $s1 = $s2 – $s3
  lw  $s1, 100($s2)        $s1 = Mem[$s2+100]
  sw  $s1, 100($s2)        Mem[$s2+100] = $s1


MIPS Machine Language

• Instructions, like registers and words of data, are also 32 bits long
  – Example: add $t0, $s1, $s2
  – Registers have numbers: $t0=8, $s1=17, $s2=18

• Instruction Format: R-type (aRithmetic, Register operands)

    bit 31                                            bit 0
    000000    10001    10010    01000    00000    100000
    op        rs       rt       rd       shamt    funct
              $s1      $s2      $t0               add

• What do the field names stand for?
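The R-type bit pattern above can be reproduced by packing the six fields with shifts and ORs. This C sketch is an illustration, not an assembler: `mips_encode_r` and the enum constants are hypothetical names, the register numbers are the ones given on the slide, and the funct value 32 is the binary 100000 shown for add.

```c
#include <stdint.h>

/* Register numbers and funct code taken from the slide's example. */
enum {
    REG_T0    = 8,
    REG_S1    = 17,
    REG_S2    = 18,
    FUNCT_ADD = 32    /* 100000 in binary */
};

/* Pack an R-type instruction: op (always 0 here) | rs | rt | rd | shamt | funct. */
uint32_t mips_encode_r(uint32_t rs, uint32_t rt, uint32_t rd,
                       uint32_t shamt, uint32_t funct)
{
    uint32_t op = 0;
    return (op              << 26) |
           ((rs    & 0x1F)  << 21) |
           ((rt    & 0x1F)  << 16) |
           ((rd    & 0x1F)  << 11) |
           ((shamt & 0x1F)  << 6)  |
            (funct & 0x3F);
}
```

Calling `mips_encode_r(REG_S1, REG_S2, REG_T0, 0, FUNCT_ADD)` yields 0x02324020, which is the slide's pattern 000000 10001 10010 01000 00000 100000 written in hexadecimal. Note that the destination $t0 comes first in assembly but sits in the rd field, third among the register fields.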


MIPS Machine Language

• Instruction Format: I-type (Immediate data)

• Example: lw $t0, 32($s2)

    35     18     8      32
    op     rs     rt     Imm16
    lw     $s2    $t0    32
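The I-type example can be packed the same way, with the 16-bit immediate occupying the low half of the word. In this C sketch `mips_encode_i` is a hypothetical helper; the opcode 35 and register numbers 18 and 8 are the values shown on the slide, and the negative-offset case is an added illustration of how the immediate is truncated to 16 bits.

```c
#include <stdint.h>

/* Pack an I-type instruction: op | rs | rt | Imm16.
 * The immediate is signed in assembly but stored as its low 16 bits. */
uint32_t mips_encode_i(uint32_t op, uint32_t rs, uint32_t rt, int32_t imm)
{
    return ((op & 0x3F) << 26) |
           ((rs & 0x1F) << 21) |
           ((rt & 0x1F) << 16) |
           ((uint32_t)imm & 0xFFFF);
}
```

For lw $t0, 32($s2) the call `mips_encode_i(35, 18, 8, 32)` gives 0x8E480020. Here the base register $s2 goes in rs and the loaded register $t0 goes in rt.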


Control: Branches

• Decision making instructions
  – alter the control flow,
  – i.e., change the "next" instruction to be executed

• MIPS conditional branch instructions:

    bne $t0, $t1, Label
    beq $t0, $t1, Label

• Example: if (i==j) h = i + j;

           bne $s0, $s1, Label
           add $s2, $s0, $s1
    Label: ....


Control: Jumps

• MIPS unconditional branch instructions:

    j label

• Example:

    if (i!=j)                 beq $s4, $s5, Lab1
        h=i+j;                add $s3, $s4, $s5
    else                      j   Lab2
        h=i-j;          Lab1: sub $s3, $s4, $s5
                        Lab2: ...


Exercise

• Write the assembly code for a simple loop

    for( i=0; i!=a; i=i+1)
        a=a-1;

• Assuming initial conditions
    register $s0 holds i (already =0)
    register $s1 holds the constant 1
    register $s2 holds a (already =10)

• Solution:

    Loop: beq $s0, $s2, Done
          sub $s2, $s2, $s1
          add $s0, $s0, $s1
          j   Loop
    Done: ...
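The solution can be traced in C by mirroring each assembly instruction with one statement. This is a sketch for checking the exercise by hand, not part of the original solution: `loop_result` and `run_exercise_loop` are hypothetical names, and the iteration counter is added purely for inspection.

```c
/* Final register values plus a count of how many times the body ran. */
typedef struct {
    int i;            /* final value of $s0 */
    int a;            /* final value of $s2 */
    int iterations;   /* added for inspection only */
} loop_result;

/* Mirrors the assembly loop one instruction per statement. */
loop_result run_exercise_loop(int i, int a)
{
    const int one = 1;        /* $s1 holds the constant 1          */
    loop_result r;
    int iterations = 0;

    while (i != a) {          /* Loop: beq $s0, $s2, Done          */
        a = a - one;          /*       sub $s2, $s2, $s1           */
        i = i + one;          /*       add $s0, $s0, $s1           */
        iterations++;         /*       j   Loop                    */
    }

    r.i = i;                  /* Done: ...                         */
    r.a = a;
    r.iterations = iterations;
    return r;
}
```

With the stated initial conditions (i = 0, a = 10) the two values approach each other by 2 per pass and meet at 5 after five passes. The sketch is only meant for those initial conditions: with an odd starting gap, i and a step past each other without ever being equal.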