EECE476: Computer Architecture
Lectures 1, 2: Instruction Set Architecture
Chapters 1, 2
The University of British Columbia EECE 476 © 2005 Guy Lemieux
REVIEW: What is Instruction Set Architecture?
• Important acronym: ISA
  – Instruction Set Architecture
• The low-level software interface to the machine
  – Language of the machine
  – Must translate any programming language into this language
  – Examples: IA-32 (Intel instruction set), MIPS, SPARC, Alpha, PA-RISC, PowerPC, …
• Visible to programmer (if desired)!
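REVIEW: Instruction Set Architecture
[Figure: layered system stack with the labels Application, Operating System, Compiler, Firmware, Instruction Set Architecture, Instr. Set Proc., I/O system, Datapath & Control, Digital Design, Circuit Design, Layout; the Instruction Set Architecture is the boundary between Software and Hardware.]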
Which ISA?
[Figure: plot of Millions of Processors versus Year]
MIPS CPU
• MIPS CPU
  – What does MIPS mean?
    • Millions of Instructions Per Second
    • Meaningless Indicator of Processor Speed
    • Microprocessor without Interlocked Pipeline Stages
• Altera NIOS II CPU
  – What does NIOS mean?
Levels of Representation
High Level Language Program
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
  ↓ Compiler
Assembly Language Program
    lw $15, 0($2)
    lw $16, 4($2)
    sw $16, 0($2)
    sw $15, 4($2)
  ↓ Assembler
Machine Language Program
    1000 1100 0100 1111 0000 0000 0000 0000
    1000 1100 0101 0000 0000 0000 0000 0100
    1010 1100 0101 0000 0000 0000 0000 0000
    1010 1100 0100 1111 0000 0000 0000 0100
  ↓ Machine Interpretation
Control Signal Specification
    ALUOP[0:3] ← InstrReg[9:12] & MASK
Instruction Set Architectures
• Computer architect’s jargon: ISA
  – Machine’s native language (not assembly language!)
  – Interface specification between hardware and low-level software
  – Includes:
    • language mnemonics (syntax)
    • behaviour (semantics)
    • instruction format (bit encoding)
  – Note: assembler is a simple translator (assembly language -> machine language)
• Standardizes instructions, machine language bit patterns, etc.
  – Advantage: different CPU implementations of the same ISA
  – Disadvantage: sometimes prevents use of new innovations
• Many different ISAs
  – One for every CPU family
  – Most are very similar, easy to learn a new one!
Advantage of Standardized ISA
[Figure: plot of Performance versus Year]
Typical ISA Operations (little change since 1960s)
Category                 Operations
Data Movement            Load (from memory)
                         Store (to memory)
                         memory-to-memory move
                         register-to-register move
                         input (from I/O device)
                         output (to I/O device)
                         push, pop (to/from stack)
Arithmetic               integer (binary + decimal) or FP
                         Add, Subtract, Multiply, Divide
Logical                  not, and, or, set, clear
Shift                    shift left/right, rotate left/right
Control (Jump/Branch)    unconditional, conditional
Subroutine Linkage       call, return
Interrupt                trap, return
Synchronization          test & set (atomic read-modify-write)
Typical ISA Operations (biggest changes since 1960s)
• Instruction types eliminated
  – String search, translate
  – Looping
• Instruction types added
  – “Multimedia” SIMD Instructions (eg, MMX, 3DNow, SSE)
    • parallel subword ops (eg, 4-way 16-bit add with 1 instruction)
  – Conditional Execution
    • eliminates branch instructions
        p = (b>c);      // comparison
        if (p) a=b;     // conditional move
        if (!p) a=c;    // conditional move
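The "4-way 16-bit add with 1 instruction" idea can be imitated in plain C to see what a subword-parallel add has to guarantee: carries must not cross lane boundaries. This is only a sketch of the concept, not how MMX or SSE are implemented; the function names `pack4x16`, `lane16`, and `add4x16` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four 16-bit lanes into one 64-bit word (lane 0 is the low 16 bits). */
uint64_t pack4x16(uint16_t l0, uint16_t l1, uint16_t l2, uint16_t l3)
{
    return (uint64_t)l0 | ((uint64_t)l1 << 16) |
           ((uint64_t)l2 << 32) | ((uint64_t)l3 << 48);
}

/* Extract lane i (0..3) from a packed word. */
uint16_t lane16(uint64_t packed, unsigned i)
{
    assert(i < 4);
    return (uint16_t)(packed >> (16 * i));
}

/* Four independent 16-bit adds; each lane wraps around on overflow. */
uint64_t add4x16(uint64_t a, uint64_t b)
{
    const uint64_t HI = 0x8000800080008000ULL; /* top bit of every lane */
    /* Add the low 15 bits of each lane; any carry lands in that lane's top bit. */
    uint64_t low = (a & ~HI) + (b & ~HI);
    /* Fold the original top bits back in with XOR, so nothing carries into the next lane. */
    return low ^ ((a ^ b) & HI);
}
```

A real SIMD instruction does the same job in hardware by cutting the carry chain of the adder at each lane boundary.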
Top 10 IA-32 Instructions
Rank  Instruction               Integer Average (Percent total executed)
1     load                      22%
2     conditional branch        20%
3     compare                   16%
4     store                     12%
5     add                       8%
6     and                       6%
7     sub                       5%
8     move register-register    4%
9     call                      1%
10    return                    1%
      Total                     96%
• Simple instructions dominate instruction frequency
Top MIPS Instructions (SPEC2000 Benchmarks)
Instruction                     Mnemonic   Integer   Floating-Pt
add                             add        0%        0%
add immediate                   addi       0%        0%
add unsigned                    addu       7%        21%
add immediate unsigned          addiu      12%       2%
subtract unsigned               subu       3%        2%
and                             and        1%        0%
and immediate                   andi       3%        0%
or                              or         7%        2%
or immediate                    ori        2%        0%
nor                             nor        3%        1%
shift left logical              sll        1%        1%
shift right logical             srl        0%        0%
load upper immediate            lui        2%        5%
load word                       lw         24%       15%
store word                      sw         9%        2%
load byte                       lbu        1%        0%
store byte                      sb         1%        0%
branch on equal (zero)          beq        6%        2%
branch on not equal (zero)      bne        5%        1%
jump and link                   jal        1%        0%
jump register                   jr         1%        0%
set less than                   slt        2%        0%
set less than immediate         slti       1%        0%
set less than unsigned          sltu       1%        0%
set less than imm. unsigned     sltiu      1%        0%
FP add double                   add.d      0%        8%
FP subtract double              sub.d      0%        3%
FP multiply double              mul.d      0%        8%
FP divide double                div.d      0%        0%
load word to FP double          l.d        0%        15%
store word to FP double         s.d        0%        7%
shift right arithmetic          sra        1%        0%
load half                       lhu        1%        0%
branch less than zero           bltz       1%        0%
branch greater or equal zero    bgez       1%        0%
branch less or equal zero       blez       0%        1%
multiply                        mul        0%        1%
TOTAL                                      98%       97%
ISA Summary
• Support these simple instructions, since they will dominate the number of instructions executed:
  load, store, add, subtract, move register-register, and, shift, compare equal, compare not equal, branch, jump, call, return
Compilers and Instruction Set Architectures
Designing a new ISA? Design Choices…
• Ease of compilation:
  – orthogonality: no special registers, no special cases, all operations work with all registers, all operand modes available with any data type or instruction type
  – completeness: supports wide range of operations and target applications
  – regularity: no overloading or multiple meanings of instruction fields
  – streamlined: resource needs are easily determined
    • Eg, all instructions same length
    • Eg, no “fancy” instructions that make it difficult to know if the ALU will be used
Good (Simple) Compiler Considerations
• Lessons from history…
  – Provide at least 16 general purpose registers plus separate floating-point registers
    • Register Assignment (of variables to registers) is critical too
    • Easier if lots of registers
    • Too many registers slow down CPU clock speed
  – Be sure all addressing modes apply to all data transfer instructions
  – Aim for a minimalist instruction set
Overview of MIPS Instructions
• simple instructions, all 32 bits wide
• very structured, no unnecessary baggage
• only three instruction formats:
      bit 31                                  bit 0
  R   op | rs | rt | rd | shamt | funct
  I   op | rs | rt | Imm16
  J   op | Imm26
• rely on compiler to achieve performance
  – what are the compiler's goals?
• help compiler where we can
MIPS I Operation Overview
• Arithmetic/Logical/Comparisons
  – Add, AddU, Sub, SubU, And, Or, Xor, Nor
  – AddI, AddIU, AndI, OrI, XorI, LUI
  – SLT, SLTU, SLTI, SLTIU
  – SLL, SRL, SRA, SLLV, SRLV, SRAV
• Memory Access
  – LB, LBU, LH, LHU, LW, LWL, LWR
  – SB, SH, SW, SWL, SWR
MIPS Arithmetic
• All instructions have 3 operands
  – R-type instructions (think “R=register”)
• Operand order is fixed (destination first)
Example:
C code: A = B + C
MIPS code: add $s0, $s1, $s2
(registers associated with variables by compiler)
MIPS Arithmetic
• C to MIPS Assembly Example
C code:    A = B + C + D;
           E = F - A;
MIPS code: add $t0, $s1, $s2
           add $s0, $t0, $s3
           sub $s4, $s5, $s0
– Operands must be registers, only 32 registers provided
– Notice our convention
  • $s0..$s7 registers hold C language variables
  • $t0..$t9 registers hold intermediate results
Registers vs. Memory
• Arithmetic instruction operands must be registers
  – Only 32 registers provided
  – Cannot “add value to memory location X”
  – Can only “add register B and register C, store in register A”
• Compiler associates variables with registers
• Q: What about programs with lots of variables?
  – A: Load/save registers from/to memory
  – Called register spilling
MIPS Registers: Software Conventions for Register Use
Name       Register Number   Usage
$zero      $0                the constant value 0
$v0-$v1    $2-$3             values for results and expression evaluation
$a0-$a3    $4-$7             arguments
$t0-$t7    $8-$15            temporaries
$s0-$s7    $16-$23           saved
$t8-$t9    $24-$25           more temporaries
$gp        $28               global pointer
$sp        $29               stack pointer
$fp        $30               frame pointer
$ra        $31               return address
MIPS: Software Conventions for Registers
Number   Name     Usage
0        zero     constant 0
1        at       reserved for assembler
2-3      v0-v1    expression evaluation & function results
4-7      a0-a3    arguments
8-15     t0-t7    temporary: caller saves if they are important (callee can clobber)
16-23    s0-s7    callee saves (if used) (caller can clobber on return)
24-25    t8-t9    temporary (cont’d)
26-27    k0-k1    reserved for OS kernel
28       gp       Pointer to global area
29       sp       Stack pointer
30       fp       Frame pointer
31       ra       Return Address (HW)
Caller/Callee Relationship
• Caller function runs first
  – Calls the “callee” as sub-function
• Analogy
  – Caller == Employer
  – Callee == Employee
int caller()
{
    int a = 0, b;   // a is given a value before it is passed
    b = callee(a);
    return b;
}
int callee(int t)
{
    int x=2, y=3;
    return x+y+t;
}
Caller/Callee Saved Registers
• No such thing as “local variables” and “global variables” in MIPS ISA
• All registers are “global”
• Conventions
  – $t0-$t9 registers
    • Callee temporary variables
    • Callee can use freely
    • Do not expect same value in these registers after function call
    • Almost never saved to memory, short-term use only
  – $s0-$s7 registers
    • Callee must save before use
    • Caller relies upon callee to restore these values before returning
// globally visible registers
int t0, t1, ..., t7, t8, t9;
int s0, s1, ..., s7;
int caller()
{
    t0 = 1;
    s0 = t0 + 1;
    callee();
    // s0 == 2 here, always
    // t0 = ? unreliable value
}
int callee()
{
    // callee saves $s0-$s7
    save_s_regs();
    t0 = calcB(s0,s1,t9,t3);
    s0 = 0;
    restore_s_regs();
}
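The convention above can be tried out with a small runnable model. This is a sketch under assumptions, not real MIPS behaviour: the registers are modelled as global arrays `t[]` and `s[]`, the array `stack_area` stands in for memory addressed by $sp, and `save_s_regs` / `restore_s_regs` are given hypothetical implementations that copy all eight $s registers.

```c
#include <assert.h>
#include <string.h>

int t[10];  /* $t0..$t9: temporaries, callee may clobber     */
int s[8];   /* $s0..$s7: callee must restore before returning */

/* Hypothetical stack area standing in for memory addressed by $sp. */
static int stack_area[64];
static int sp = 0;

void save_s_regs(void)
{
    assert(sp + 8 <= 64);                 /* room for 8 registers */
    memcpy(&stack_area[sp], s, sizeof s);
    sp += 8;
}

void restore_s_regs(void)
{
    assert(sp >= 8);                      /* something was saved */
    sp -= 8;
    memcpy(s, &stack_area[sp], sizeof s);
}

void callee(void)
{
    save_s_regs();
    t[0] = 99;   /* overwrite $t0: allowed, it is a temporary */
    s[0] = 0;    /* use $s0 as scratch: allowed after saving  */
    restore_s_regs();
}

int caller(void)
{
    t[0] = 1;
    s[0] = t[0] + 1;
    callee();
    return s[0]; /* $s0 survived the call; $t0 did not */
}
```

Calling `caller()` returns 2, while `t[0]` holds whatever the callee left there. A real compiler would save only the $s registers the callee actually uses.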
Memory Organization: Bytes
• Large one-dimensional array of bytes
• Each byte has unique address
• Memory address is index into the array
• “Byte addressing”
  – Each different address points to a unique byte of memory.
Byte address   Contents
0              8 bits of data
1              8 bits of data
2              8 bits of data
3              8 bits of data
4              8 bits of data
5              8 bits of data
6              8 bits of data
...
Memory Organization: Words
• Bytes are nice, but most data items use larger “words”
• For MIPS, a word is 32 bits (4 bytes)
• 2^32 bytes with byte addresses from 0 to 2^32-1
• 2^30 words starting at byte addresses 0, 4, 8, ... 2^32-4
• Words are aligned
  – The 2 least-significant bits of a word’s byte address are always zero
Byte address   Contents
0              32 bits of data
4              32 bits of data
8              32 bits of data
12             32 bits of data
...
Each register holds 32 bits of data (1 word)
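The alignment rule is easy to check in code. The helpers below are a minimal sketch with made-up names (`is_word_aligned`, `word_index`, `word_address`), assuming 32-bit byte addresses and 4-byte words as on the slide.

```c
#include <assert.h>
#include <stdint.h>

/* A word address is aligned when its 2 least-significant bits are zero. */
int is_word_aligned(uint32_t byte_addr)
{
    return (byte_addr & 0x3u) == 0;
}

/* Number of the word that contains a given byte address. */
uint32_t word_index(uint32_t byte_addr)
{
    return byte_addr >> 2;   /* divide by 4 */
}

/* Byte address at which word number `index` starts. */
uint32_t word_address(uint32_t index)
{
    assert(index < (1u << 30));  /* only 2^30 words exist */
    return index << 2;           /* multiply by 4 */
}
```

For example, byte address 6 is not word aligned, byte address 12 starts word 3, and the last word (index 2^30-1) starts at byte address 2^32-4.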
Memory Endian-ness
• Big Endian, aka “Motorola”
  – MSB comes first (lower byte address is MSB)
• Little Endian, aka “Intel”
  – LSB comes first (lower byte address is LSB)
• MIPS is “Big Endian”
  – (actually, it is configurable, but we shall use Big Endian)
Byte Address   Big Endian             Little Endian
Of Word        Byte Address Offset    Byte Address Offset
               +0   +1   +2   +3      +3   +2   +1   +0
0              MSB  ...  ...  LSB     MSB  ...  ...  LSB
4              MSB  ...  ...  LSB     MSB  ...  ...  LSB
8              MSB  ...  ...  LSB     MSB  ...  ...  LSB
12             MSB  ...  ...  LSB     MSB  ...  ...  LSB
…
Example (word at byte address 4): Big Endian offset +2 is byte address 4+2 = 6; Little Endian offset +1 is byte address 4+1 = 5
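To make the two byte orders concrete, the sketch below assembles a 32-bit word from a byte array in both ways. The function names and the `mem` array are illustrative assumptions; the code models memory in C rather than showing what the hardware does.

```c
#include <assert.h>
#include <stdint.h>

/* Big Endian: the byte at the lowest address is the MSB. */
uint32_t load_word_big_endian(const uint8_t *mem, uint32_t addr)
{
    assert((addr & 3u) == 0);  /* words are aligned */
    return ((uint32_t)mem[addr]     << 24) |
           ((uint32_t)mem[addr + 1] << 16) |
           ((uint32_t)mem[addr + 2] <<  8) |
            (uint32_t)mem[addr + 3];
}

/* Little Endian: the byte at the lowest address is the LSB. */
uint32_t load_word_little_endian(const uint8_t *mem, uint32_t addr)
{
    assert((addr & 3u) == 0);  /* words are aligned */
    return  (uint32_t)mem[addr]            |
           ((uint32_t)mem[addr + 1] <<  8) |
           ((uint32_t)mem[addr + 2] << 16) |
           ((uint32_t)mem[addr + 3] << 24);
}
```

With bytes 0x12, 0x34, 0x56, 0x78 stored at addresses 4 to 7, the Big Endian read of word 4 gives 0x12345678 and the Little Endian read gives 0x78563412.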
Instructions
• Load and store instructions
• Example with integers (words):
C code:    A[8] = h + A[8];
MIPS code: lw  $t0, 32($s3)
           add $t0, $s2, $t0
           sw  $t0, 32($s3)
• Store word has destination last
• Again, arithmetic operands are registers, not memory!
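The offset 32 in the example is the array index 8 times the word size of 4 bytes. The sketch below spells out the same three steps in C; `word_offset` and `add_to_a8` are hypothetical names, and `int32_t` is assumed to match a MIPS word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset of element `index` in an array of 32-bit words. */
size_t word_offset(size_t index)
{
    return index * sizeof(int32_t);
}

/* A[8] = h + A[8], written as explicit load, add, and store steps. */
void add_to_a8(int32_t *A, int32_t h)
{
    assert(A != NULL);
    unsigned char *base = (unsigned char *)A;            /* base address, like $s3 */
    int32_t *elem = (int32_t *)(base + word_offset(8));  /* address 32($s3)        */
    int32_t t0 = *elem;   /* lw  $t0, 32($s3)  */
    t0 = h + t0;          /* add $t0, $s2, $t0 */
    *elem = t0;           /* sw  $t0, 32($s3)  */
}
```

Note that the arithmetic happens on the local `t0`, never directly on the array element, which mirrors the register-only rule.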
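Subroutine Example
• Can we figure out the code?
void swap(int v[], int k)
{
    int temp;
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
}
swap:
    muli $2, $5, 4      # $2 = k * 4 (byte offset of v[k])
    add  $2, $4, $2     # $2 = address of v[k]
    lw   $15, 0($2)     # $15 = v[k]
    lw   $16, 4($2)     # $16 = v[k+1]
    sw   $16, 0($2)     # v[k] = old v[k+1]
    sw   $15, 4($2)     # v[k+1] = old v[k]
    jr   $31            # return to caller
• No multiply? Use add instead:
swap:
    add  $2, $5, $5     # $2 = k + k = 2k
    add  $2, $2, $2     # $2 = 2k + 2k = 4k
                        # (rest of swap unchanged)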
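The add-only trick works because doubling twice is the same as multiplying by 4. A quick C sketch of the address calculation, with illustrative names `times4_by_adds` and `element_address`:

```c
#include <assert.h>
#include <stdint.h>

/* 4*k computed with two additions instead of a multiply. */
uint32_t times4_by_adds(uint32_t k)
{
    assert(k <= UINT32_MAX / 4);  /* keep 4k within 32 bits */
    uint32_t r = k + k;           /* add $2, $5, $5 : r = 2k */
    r = r + r;                    /* add $2, $2, $2 : r = 4k */
    return r;
}

/* Byte address of v[k], given the byte address of v[0]. */
uint32_t element_address(uint32_t v_base, uint32_t k)
{
    return v_base + times4_by_adds(k);  /* add $2, $4, $2 */
}
```

For k = 5 the offset is 20 bytes, the same result as a left shift by 2.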
So far we’ve learned:
• MIPS
  – Loads words, but addresses bytes
  – Arithmetic on registers only
• Instruction            Meaning
  add $s1, $s2, $s3      $s1 = $s2 + $s3
  sub $s1, $s2, $s3      $s1 = $s2 - $s3
  lw  $s1, 100($s2)      $s1 = Mem[$s2+100]
  sw  $s1, 100($s2)      Mem[$s2+100] = $s1
MIPS Machine Language
• Instructions, like registers and words of data, are also 32 bits long
  – Example: add $t0, $s1, $s2
  – Registers have numbers: $t0=8, $s1=17, $s2=18
• Instruction Format: R-type (aRithmetic, Register operands)
  bit 31                                        bit 0
  000000   10001   10010   01000   00000   100000
  op       rs      rt      rd      shamt   funct
           $s1     $s2     $t0             add
• What do the field names stand for?
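The R-type layout can be checked by packing the fields with shifts. This is a minimal sketch; `encode_r` is a made-up helper, and the field widths (6, 5, 5, 5, 5, 6 bits) are taken from the format shown above.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the six R-type fields into one 32-bit instruction word. */
uint32_t encode_r(unsigned op, unsigned rs, unsigned rt,
                  unsigned rd, unsigned shamt, unsigned funct)
{
    assert(op < 64 && funct < 64);                       /* 6-bit fields */
    assert(rs < 32 && rt < 32 && rd < 32 && shamt < 32); /* 5-bit fields */
    return ((uint32_t)op    << 26) |   /* bits 31..26 */
           ((uint32_t)rs    << 21) |   /* bits 25..21 */
           ((uint32_t)rt    << 16) |   /* bits 20..16 */
           ((uint32_t)rd    << 11) |   /* bits 15..11 */
           ((uint32_t)shamt <<  6) |   /* bits 10..6  */
            (uint32_t)funct;           /* bits  5..0  */
}
```

For add $t0, $s1, $s2 the call is `encode_r(0, 17, 18, 8, 0, 32)`, which yields 0x02324020, the same bit pattern as the slide.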
MIPS Machine Language
• Instruction Format: I-type (Immediate data)
• Example: lw $t0, 32($s2)
  35     18     8      32
  op     rs     rt     Imm16
  lw     $s2    $t0    32
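The I-type fields pack the same way, with a 16-bit immediate in the low half of the word. A sketch with a hypothetical helper `encode_i`:

```c
#include <assert.h>
#include <stdint.h>

/* Pack the four I-type fields into one 32-bit instruction word. */
uint32_t encode_i(unsigned op, unsigned rs, unsigned rt, int16_t imm)
{
    assert(op < 64);             /* 6-bit field  */
    assert(rs < 32 && rt < 32);  /* 5-bit fields */
    return ((uint32_t)op << 26) |   /* bits 31..26 */
           ((uint32_t)rs << 21) |   /* bits 25..21 */
           ((uint32_t)rt << 16) |   /* bits 20..16 */
            (uint32_t)(uint16_t)imm; /* bits 15..0, two's complement */
}
```

For lw $t0, 32($s2) the call is `encode_i(35, 18, 8, 32)`, giving 0x8E480020. A negative offset such as -4 occupies the low 16 bits as 0xFFFC.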
Control: Branches
• Decision making instructions
  – alter the control flow,
  – i.e., change the "next" instruction to be executed
• MIPS conditional branch instructions:
    bne $t0, $t1, Label
    beq $t0, $t1, Label
• Example: if (i==j) h = i + j;
           bne $s0, $s1, Label
           add $s2, $s0, $s1
    Label: ....
Control: Jumps
• MIPS unconditional branch instructions:
    j label
• Example:
    if (i!=j)               beq $s4, $s5, Lab1
        h=i+j;              add $s3, $s4, $s5
    else                    j   Lab2
        h=i-j;        Lab1: sub $s3, $s4, $s5
                      Lab2: ...
Exercise
• Write the assembly code for a simple loop
    for( i=0; i!=a; i=i+1)
        a=a-1;
• Assuming initial conditions
    register $s0 holds i (already =0)
    register $s1 holds the constant 1
    register $s2 holds a (already =10)
• Solution:
    Loop: beq $s0, $s2, Done
          sub $s2, $s2, $s1
          add $s0, $s0, $s1
          j   Loop
    Done: ...
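One way to check the solution is to transcribe it into C one instruction at a time, using labels and goto in place of the branch and jump. The function `run_loop` and its iteration counter are additions for illustration and are not part of the exercise.

```c
#include <assert.h>

/* Runs the loop from the exercise; returns the final value of a
   and stores the number of loop-body executions in *iterations. */
int run_loop(int a, int *iterations)
{
    int i = 0;          /* $s0 holds i (already 0)  */
    const int one = 1;  /* $s1 holds the constant 1 */
    int count = 0;      /* bookkeeping only, no register in the original */

    assert(iterations != 0);
Loop:
    if (i == a) goto Done;  /* beq $s0, $s2, Done */
    a = a - one;            /* sub $s2, $s2, $s1  */
    i = i + one;            /* add $s0, $s0, $s1  */
    count++;
    goto Loop;              /* j   Loop           */
Done:
    *iterations = count;
    return a;
}
```

Starting from a = 10, i and a meet at 5 after five passes through the loop body. With an odd starting value the two would step past each other without ever being equal, so the `i != a` test depends on the initial conditions given in the exercise.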