Upload
anoki
View
29
Download
0
Tags:
Embed Size (px)
DESCRIPTION
Introduction to Computer Architecture. Course number: CS141 Who? Tarun Soni ( [email protected] ) TA: Wenjing Rao (wrao@cs) and Eric Liu (xeliu@cs) Where? CENTR: 119 When? M,W @ 6-8:50pm Textbook: Patterson and Hennessy, Computer Organization & Design - PowerPoint PPT Presentation
Citation preview
CS141-L1-1 Tarun Soni, Summer ‘03
Course number: CS141
Who? Tarun Soni ( [email protected] )TA: Wenjing Rao (wrao@cs) and Eric Liu (xeliu@cs)
Where? CENTR: 119When? M,W @ 6-8:50pm Textbook: Patterson and Hennessy,
Computer Organization & DesignThe hardware software interface, 2nd edition.
Web-page: http://www-cse.ucsd.edu/users/tsoni/cse141(slides, homework questions, other pointers and information)
Office hours: Tarun: Mon. 4pm-6pm: AP&M 3151Yang Yu and Wenjing Rao: TBD, look on the webpage
Introduction to Computer Architecture
CS141-L1-2 Tarun Soni, Summer ‘03
Administrivia Technology trends Computer organization: concept of abstraction Instruction Set Architectures: Definition, types, examples Instruction formats: operands, addressing modes Operations: load, store, arithmetic, logical Control instructions: branch, jump, procedures Stacks Examples: in-line code, procedure, nested-procedures Other architectures
Todays Agenda
CS141-L1-3 Tarun Soni, Summer ‘03
Schedule-sort of
1 6/30 Intro., Technology, ISA
2 7/2 Performance, Cost, Arithmetic
3 7/7 Multiply, Divide?, FP numbers
4 7/9 Single cycle: Datapath, Control
5 7/14 Multiple Cycle CPU, Microprogramming
6 7/16 Mid-term quiz;
7 7/21 Pipelining: intro, control, exceptions
8 7/23 Memory systems, Cache, Virtual memory
9 7/28 I/O Devices
10 7/30 Superscalars, Parallel machines
11 ?? Overview, wrapup, catchup ..
** ?? Final, 7-10 pm, Friday
CS141-L1-4 Tarun Soni, Summer ‘03
Grading
• Grade breakdown– Mid-term (1.5 hours) 30% – Final (3 hours) 40%– Pop-Quizzes (3, 45 min each, only 2 high scores cout) 30% – Class Participation: Extras??
• Can’t make exams: tell us early and we will work something out• Homeworks do not need to be turned in. However, pop-quizzes will be based on hw.• What is cheating?
– Studying together in groups is encouraged– Work must be your own– Common examples of cheating: copying an exam question from other material or
other person...– Better off to skip question (small fraction of grade.)
• Written/email request for changes to grades– average grade will be a B or B+; set expectations accordingly
CS141-L1-5 Tarun Soni, Summer ‘03
Why?
• You may become a practitioner someday ?• Keeper of Moore’s law• Architecture concepts are core to other sub-systems
• Video-processors• Security engines• Routing/Networking etc.
• Even if you become a software geek?
• Architecture enables a way of thinking• Understanding leads to breadth and better
implementation of software
CS141-L1-6 Tarun Soni, Summer ‘03
‘Computer” of the day
Jacquard loomlate 1700’sfor weaving silk
“Program” on punch cards
“Microcode”: each holelifts a set of threads
“Or gate”: thread lifted if any controlling hole punched
CS141-L1-7 Tarun Soni, Summer ‘03
Trends: Moores law
CS141-L1-8 Tarun Soni, Summer ‘03
Trends: $1000 will buy you…
CS141-L1-9 Tarun Soni, Summer ‘03
Trends: Densities
CS141-L1-10 Tarun Soni, Summer ‘03
Technology
Source: Intel Journal, May 2002
CS141-L1-11 Tarun Soni, Summer ‘03
Other technology trends
• Processor
– logic capacity: about 30% per year
– clock rate: about 20% per year
• Memory
– DRAM capacity: about 60% per year (4x every 3 years)
– Memory speed: about 10% per year
– Cost per bit: about 25% per year
• Disk
– capacity: about 60% per year
1
10
100
1000
1 4 7
10
13
16
19
22
25
28
31
34
CPU Speed
DRAM Speed
1
10
100
1000
10000
100000
1000000
10000000
100000000
1 4 7
10
13
16
19
22
25
28
31
34
CPU logic capacity
DRAM capacity
Disk capacity
Speed Capacity
Physics-advancementArchitecture-advancement
CS141-L1-12 Tarun Soni, Summer ‘03
SPEC Performance
0
50
100
150
200
250
300
350
1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995
Year
Perform
ance
RISC
Intel x86
35%/yr
RISCintroduction
performance now improves 50% per year (2x every 1.5 years)
CS141-L1-13 Tarun Soni, Summer ‘03
Organization: A Basic Computer
Computer
Memory
Datapath
Control
Output
Input
Every computer has 5 basic components
CS141-L1-14 Tarun Soni, Summer ‘03
Organization: A Basic Computer
Proc
CachesBusses
Memory
I/O Devices:
Controllers
adapters
DisksDisplaysKeyboards
Networks
• Not all “memory” are created equally– Cache: fast (expensive) memory are placed closer to the processor– Main memory: less expensive memory--we can have more
• Input and output (I/O) devices have the messiest organization– Wide range of speed: graphics vs. keyboard– Wide range of requirements: speed, standard, cost ... – Least amount of research (so far)
CS141-L1-15 Tarun Soni, Summer ‘03
What is “Computer Architecture”
Computer Architecture =
Instruction Set Architecture + Machine Organization
Computer Architecture and Engineering
Instruction Set Design Computer Organization
Interfaces Hardware Components
Compiler/System View Logic Designer’s View
How you talk to the machine What the machine looks like
CS141-L1-16 Tarun Soni, Summer ‘03
Architecture?
I/O systemInstr. Set Proc.
Compiler
OperatingSystem
Application
Digital DesignCircuit Design
Instruction Set Architecture
Firmware
• Coordination of many levels of abstraction
• Under a rapidly changing set of forces
• Design, Measurement, and Evaluation
Datapath & Control
Layout
CS141-L1-17 Tarun Soni, Summer ‘03
Levels of abstraction?
High Level Language Program
Assembly Language Program
Machine Language Program
Control Signal Specification
Compiler
Assembler
Machine Interpretation
temp = v[k];
v[k] = v[k+1];
v[k+1] = temp;
lw $15, 0($2)lw $16, 4($2)sw $16, 0($2)sw $15, 4($2)
0000 1001 1100 0110 1010 1111 0101 10001010 1111 0101 1000 0000 1001 1100 0110 1100 0110 1010 1111 0101 1000 0000 1001 0101 1000 0000 1001 1100 0110 1010 1111
ALUOP[0:3] <= InstReg[9:11] & MASK
CS141-L1-18 Tarun Soni, Summer ‘03
Instruction Set Architecture
instruction set
software
hardware
ISA is the agreed-upon interface between all the software that runs on the machine and the hardware that executes it.
CS141-L1-19 Tarun Soni, Summer ‘03
Example ISAs
• IBM360, VAX etc. • Digital Alpha (v1, v3) 1992-
97• HP PA-RISC (v1.1, v2.0) 1986-
96• Sun Sparc (v8, v9) 1987-
95• SGI MIPS (MIPS I, II, III, IV, V) 1986-
96• Intel (8086,80286,80386, 1978-
96 80486,Pentium, MMX, ...)
• ARM ARM7,8,StrongARM 1995-
Digital Signal Processors also have an ISA
TMS320, Motorola, OAK etc.
CS141-L1-20 Tarun Soni, Summer ‘03
ISAs
Instruction Set Architecture
“How to talk to computers if you aren’t in Star Trek”
CS141-L1-21 Tarun Soni, Summer ‘03
ISAs
• Language of the Machine• More primitive than higher level languages
e.g., no sophisticated control flow• Very restrictive
e.g., MIPS Arithmetic Instructions
• We’ll be working with the MIPS instruction set architecture– similar to other architectures developed since the 1980's– used by NEC, Nintendo, Silicon Graphics, Sony
Design goals: maximize performance and minimize cost, reduce design time
CS141-L1-22 Tarun Soni, Summer ‘03
ISAs
Ideally the only part of the machine visible to the programmer/compiler
• Available instructions (Opcodes)
• Formats
• Registers, number and type
• Addressing modes, access mechanisms
• Exception conditions etc.
CS141-L1-23 Tarun Soni, Summer ‘03
Instruction Set Architecture: What Must be Specified?
Instruction
Fetch
Instruction
Decode
Operand
Fetch
Execute
Result
Store
Next
Instruction
° Instruction Format or Encoding
– how is it decoded?
° Location of operands and result
– where other than memory?
– how many explicit operands?
– how are memory operands located?
– which can or cannot be in memory?
° Data type and Size
° Operations
– what are supported
° Successor instruction
– jumps, conditions, branches
fetch-decode-execute is implicit!
CS141-L1-24 Tarun Soni, Summer ‘03
Vocabulary
• superscalar processor -- can execute more than one instructions per cycle.
• cycle -- smallest unit of time in a processor.
• parallelism -- the ability to do more than one thing at once.
• pipelining -- overlapping parts of a large task to increase throughput without decreasing latency
CS141-L1-25 Tarun Soni, Summer ‘03
ISA Decisions
• operations
– how many?
– which ones
• operands
– how many?
– location
– types
– how to specify?
• instruction format
– size
– how many formats?
y = x + b
operationdestination operand
how does the computer know what0001 0100 1101 1111means?
(add r1, r2, r5)
CS141-L1-26 Tarun Soni, Summer ‘03
Crafting an ISA
• We’ll look at some of the decisions facing an instruction set architect, and
• how those decisions were made in the design of the MIPS instruction set.
• MIPS, like SPARC, PowerPC, and Alpha AXP, is a RISC (Reduced Instruction Set Computer) ISA.
– fixed instruction length
– few instruction formats
– load/store architecture
• RISC architectures worked because they enabled pipelining. They continue to thrive because they enable parallelism.
CS141-L1-27 Tarun Soni, Summer ‘03
Basic types of ISAs
Accumulator (1 register):
1 address add A acc acc + mem[A]
1+x address addx A acc acc + mem[A + x]
Stack:
0 address add tos tos + next
General Purpose Register:
2 address add A B EA(A) EA(A) + EA(B)
3 address add A B C EA(A) EA(B) + EA(C)
Load/Store:
3 address add Ra Rb Rc Ra Rb + Rc
load Ra Rb Ra mem[Rb]
store Ra Rb mem[Rb] Ra
Comparison:
Bytes per instruction? Number of Instructions? Cycles per instruction?
CS141-L1-28 Tarun Soni, Summer ‘03
Instruction Count
Accumulator (1 register):
Load A
Add B
Store C
Stack:
Push A
Push B
Add
Pop C
C = A+B
General Purpose Register: (Register-Memory)
Load R1,A
Add R1,B
Store C,R1
Load/Store:
Load R1,A
Load R2,B
Add R3,R1,R2
Store C,R3
CS141-L1-29 Tarun Soni, Summer ‘03
Instruction Length
Variable:
Fixed:
Hybrid:
…
• All instructions have 3 operands
• Operand order is fixed (destination first)
C code: A = B + CMIPS code: add $s0, $s1, $s2
(associated with variables by compiler)
MIPS Instructions
CS141-L1-30 Tarun Soni, Summer ‘03
Instruction Length
• Variable-length instructions (Intel 80x86, VAX) require multi-step fetch and decode, but allow for a much more flexible and compact instruction set.
• Fixed-length instructions allow easy fetch and decode, and simplify pipelining and parallelism.
All MIPS instructions are 32 bits long.
– this decision impacts every other ISA decision we make because it makes instruction bits scarce.
• If code size is most important, use variable length instructions
• If performance is most important, use fixed length
Recent embedded machines (ARM, MIPS) added optional mode to execute subset of 16-bit wide instructions (Thumb, MIPS16)
choose performance or density per procedure
CS141-L1-31 Tarun Soni, Summer ‘03
MIPS Instruction Format
OP
OP
OP
rs rt rd sa funct
rs rt immediate
target
• the opcode tells the machine which format
• so add r1, r2, r3 has
– opcode=0, funct=32, rs=2, rt=3, rd=1, sa=0
– 000000 00010 00011 00001 00000 100000
6 bits 5 bits 5 bits 5 bits 5 bits 6 bits
CS141-L1-32 Tarun Soni, Summer ‘03
Operands
• operands are generally in one of two places:
– registers (32 int, 32 fp)
– memory (232 locations)
• registers are
– easy to specify
– close to the processor (fast access)
• the idea that we want to access registers whenever possible led to load-store architectures.
– normal arithmetic instructions only access registers
– only access memory with explicit loads and stores
CS141-L1-33 Tarun Soni, Summer ‘03
Load Store Architectures
Load-store architectures
can do:
add r1=r2+r3
and
load r3, M(address)
forces heavy dependence on registers, which is exactly what you want in today’s CPUs
can’t do
add r1 = r2 + M(address)
-more instructions
+fast implementation (e.g., easy pipelining)
Expect new instruction set architecture to use general purpose register
Pipelining => Expect it to use load store variant of GPR ISA
CS141-L1-34 Tarun Soni, Summer ‘03
General Purpose Registers
° Advantages of registers
• registers are faster than memory
• registers are easier for a compiler to use
- e.g., (A*B) – (C*D) – (E*F) multiplies in any order vs. stack
• registers can hold variables
- memory traffic is reduced, so program is sped up - code density improves (since register named with fewer bits
than memory location)
• Programmable storage
– 2^32 x bytes of memory
– 31 x 32-bit GPRs (R0 = 0)
– 32 x 32-bit FP regs (paired DP)
– HI, LO, PC
0r0r1°°°r31PClohi
MIPS Registers
CS141-L1-35 Tarun Soni, Summer ‘03
Memory Organization
• Viewed as a large, single-dimension array, with an address.
• A memory address is an index into the array
• "Byte addressing" means that the index points to a byte of memory.
0
1
2
3
4
5
6
8 bits of data
8 bits of data
8 bits of data
8 bits of data
8 bits of data
8 bits of data
8 bits of data
CS141-L1-36 Tarun Soni, Summer ‘03
Memory Organization
• Bytes are nice, but most data items use larger "words"
• For MIPS, a word is 32 bits or 4 bytes.
• 232 bytes with byte addresses from 0 to 232-1
• 230 words with byte addresses 0, 4, 8, ... 232-4
• Words are alignedi.e., what are the least 2 significant bits of a word address?
0
4
8
12
...
32 bits of data
32 bits of data
32 bits of data
32 bits of data
Registers hold 32 bits of data
CS141-L1-37 Tarun Soni, Summer ‘03
Data Types
Bit: 0, 1
Bit String: sequence of bits of a particular length 4 bits is a nibble 8 bits is a byte 16 bits is a half-word 32 bits is a word 64 bits is a double-word
Character: ASCII 7 bit code
Decimal: digits 0-9 encoded as 0000b thru 1001b two decimal digits packed per 8 bit byte
Integers: 2's Complement
Floating Point: Single Precision Double Precision Extended Precision
M x RE
How many +/- #'s?Where is decimal pt?How are +/- exponents represented?
exponent
basemantissa
CS141-L1-38 Tarun Soni, Summer ‘03
Operand Usage
Frequency of reference by size
0% 20% 40% 60% 80%
Byte
Halfword
Word
Doubleword
0%
0%
31%
69%
7%
19%
74%
0%
Int Avg.
FP Avg.
Support data sizes and types: 8-bit, 16-bit, 32-bit integers and 32-bit and 64-bit IEEE 754 floating point numbers
CS141-L1-39 Tarun Soni, Summer ‘03
Addressing: Endian-ness and alignment
• Big Endian: address of most significant byte = word address (xx00 = Big End of word)
– IBM 360/370, Motorola 68k, MIPS, Sparc, HP PA
• Little Endian: address of least significant byte = word address(xx00 = Little End of word)
– Intel 80x86, DEC Vax, DEC Alpha (Windows NT)
msb lsb
3 2 1 0
little endian byte 0
0 1 2 3big endian byte 0
Alignment: require that objects fall on address that is multiple of their size.
0 1 2 3
Aligned
NotAligned
CS141-L1-40 Tarun Soni, Summer ‘03
Addressing Modes
– Register direct R3– Immediate (literal) #25– Direct (absolute) M[10000]
– Register indirect M[R3]– Base+Displacement M[R3 + 10000]– if register is the program counter, this is PC-
relative– Base+Index M[R3 + R4]– Scaled Index M[R3 + R4*d + 10000]– Autoincrement M[R3++]– Autodecrement M[R3 - -]
– Memory Indirect M[ M[R3] ]
how do we specify the operand we want?
CS141-L1-41 Tarun Soni, Summer ‘03
Addressing Modes
Addressing mode Example Meaning
Register Add R4,R3 R4R4+R3
Immediate Add R4,#3 R4 R4+3
Displacement Add R4,100(R1) R4 R4+Mem[100+R1]
Register indirect Add R4,(R1) R4 R4+Mem[R1]
Indexed / Base Add R3,(R1+R2) R3 R3+Mem[R1+R2]
Direct or absolute Add R1,(1001) R1 R1+Mem[1001]
Memory indirect Add R1,@(R3) R1 R1+Mem[Mem[R3]]
Auto-increment Add R1,(R2)+ R1 R1+Mem[R2]; R2 R2+d
Auto-decrement Add R1,–(R2) R2 R2–d; R1 R1+Mem[R2]
Scaled Add R1,100(R2)[R3] R1 R1+Mem[100+R2+R3*d]
CS141-L1-42 Tarun Soni, Summer ‘03
Addressing Modes: Usage
3 programs measured on machine with all address modes (VAX)
--- Displacement: 42% avg, 32% to 55% 75%
--- Immediate: 33% avg, 17% to 43% 85%
--- Register deferred (indirect): 13% avg, 3% to 24%
--- Scaled: 7% avg, 0% to 16%
--- Memory indirect: 3% avg, 1% to 6%
--- Misc: 2% avg, 0% to 3%
75% displacement & immediate88% displacement, immediate & register indirect
similar measurements:
- 16 bits is enough for the immediate address 75 to 80% of the time
- 16 bits is enough of a displacement 99% of the time.
CS141-L1-43 Tarun Soni, Summer ‘03
Program Base + Dis- placement
Immediate Scaled Index
Memory Indirect
All Others
TEX 56% 43% 0 1 0
Spice 58% 17% 16% 6% 3%
GCC 51% 39% 6% 1% 3%
Addressing mode usage: Application Specific
CS141-L1-44 Tarun Soni, Summer ‘03
MIPS Addressing Modes
register direct
add $1, $2, $3
immediate
add $1, $2, #35
base + displacement
lw $1, disp($2)
OP rs rt rd sa funct
OP rs rt immediate
rt
rs
immediate register indirect disp = 0absolute (rs) = 0
CS141-L1-45 Tarun Soni, Summer ‘03
MIPS ISA-so far
• fixed 32-bit instructions
• 3 instruction formats
• 3-operand, load-store architecture
• 32 general-purpose registers (integer, floating point)
– R0 always equals 0.
• 2 special-purpose integer registers, HI and LO, because multiply and divide produce more than 32 bits.
• registers are 32-bits wide (word)
• register, immediate, and base+displacement addressing modes
But what about the actual instructions themselves ??
CS141-L1-46 Tarun Soni, Summer ‘03
Typical Operations (little change since 1960)
Data Movement Load (from memory)Store (to memory)memory-to-memory moveregister-to-register moveinput (from I/O device)output (to I/O device)push, pop (to/from stack)
Arithmetic integer (binary + decimal) or FPAdd, Subtract, Multiply, Divide
Logical not, and, or, set, clear
Shift shift left/right, rotate left/right
Control (Jump/Branch) unconditional, conditional
Subroutine Linkage call, return
Interrupt trap, return
Synchronization test & set (atomic r-m-w)
String search, translate
Graphics (MMX) parallel subword ops (4 16bit add)
CS141-L1-47 Tarun Soni, Summer ‘03
80x86 Instruction usage
° Rank instruction Integer Average Percent total executed
1 load 22%
2 conditional branch 20%
3 compare 16%
4 store 12%
5 add 8%
6 and 6%
7 sub 5%
8 move register-register 4%
9 call 1%
10 return 1%
Total 96%
° Simple instructions dominate instruction frequency
CS141-L1-48 Tarun Soni, Summer ‘03
Instruction usage
• Support the simple instructions, since they will dominate the number of instructions executed:
load, store, add, subtract, move register-register, and, shift, compare equal, compare not equal, branch, jump, call, return;
Compiler Issues
orthogonality: no special registers, few special cases, all operand modes available with any data type or instruction type
completeness: support for a wide range of operations and target applications
regularity: no overloading for the meanings of instruction fields
streamlined: resource needs easily determined
Register Assignment is critical too
Easier if lots of registers
CS141-L1-49 Tarun Soni, Summer ‘03
MIPS Instructions
• arithmetic
– add, subtract, multiply, divide
• logical
– and, or, shift left, shift right
• data transfer
– load word, store word
• conditional Branch
• unconditional Jump
CS141-L1-50 Tarun Soni, Summer ‘03
MIPS Instructions
• arithmetic
– add, subtract, multiply, divide
Instruction Example Meaning Comments
add add $1,$2,$3 $1 = $2 + $3 3 operands; exception possible
subtract sub $1,$2,$3 $1 = $2 – $3 3 operands; exception possible
add immediate addi $1,$2,100 $1 = $2 + 100 + constant; exception possible
add unsigned addu $1,$2,$3 $1 = $2 + $3 3 operands; no exceptions
subtract unsigned subu $1,$2,$3 $1 = $2 – $3 3 operands; no exceptions
add imm. unsign. addiu $1,$2,100 $1 = $2 + 100 + constant; no exceptions
multiply mult $2,$3 Hi, Lo = $2 x $3 64-bit signed product
multiply unsigned multu$2,$3 Hi, Lo = $2 x $3 64-bit unsigned product
divide div $2,$3 Lo = $2 ÷ $3, Lo = quotient, Hi = remainder
Hi = $2 mod $3
divide unsigned divu $2,$3 Lo = $2 ÷ $3, Unsigned quotient & remainder
Hi = $2 mod $3
move from Hi mfhi $1 $1 = Hi Used to get copy of Hi
move from Lo mflo $1 $1 = Lo Used to get copy of Lo
CS141-L1-51 Tarun Soni, Summer ‘03
MIPS Instructions
• logical
– and, or, shift left, shift right
Instruction Example Meaning Comment
and and $1,$2,$3 $1 = $2 & $3 3 reg. operands; Logical AND
or or $1,$2,$3 $1 = $2 | $3 3 reg. operands; Logical OR
xor xor $1,$2,$3 $1 = $2 Å $3 3 reg. operands; Logical XOR
nor nor $1,$2,$3 $1 = ~($2 |$3) 3 reg. operands; Logical NOR
and immediate andi $1,$2,10 $1 = $2 & 10 Logical AND reg, constant
or immediate ori $1,$2,10 $1 = $2 | 10 Logical OR reg, constant
xor immediate xori $1, $2,10 $1 = ~$2 &~10 Logical XOR reg, constant
shift left logical sll $1,$2,10 $1 = $2 << 10 Shift left by constant
shift right logical srl $1,$2,10 $1 = $2 >> 10 Shift right by constant
shift right arithm. sra $1,$2,10 $1 = $2 >> 10 Shift right (sign extend)
shift left logical sllv $1,$2,$3 $1 = $2 << $3 Shift left by variable
shift right logical srlv $1,$2, $3 $1 = $2 >> $3 Shift right by variable
shift right arithm. srav $1,$2, $3 $1 = $2 >> $3 Shift right arith. by variable
CS141-L1-52 Tarun Soni, Summer ‘03
MIPS Instructions
• data transfer
– load word, store wordInstruction Comment
SW 500(R4), R3 Store word
SH 502(R2), R3 Store half
SB 41(R3), R2 Store byte
LW R1, 30(R2) Load word
LH R1, 40(R3) Load halfword
LHU R1, 40(R3) Load halfword unsigned
LB R1, 40(R3) Load byte
LBU R1, 40(R3) Load byte unsigned
LUI R1, 40 Load Upper Immediate (16 bits shifted left by 16)
Why need LUI?
0000 … 0000
LUI R5
R5
CS141-L1-53 Tarun Soni, Summer ‘03
MIPS Control Instructions
° Condition Codes
Processor status bits are set as a side-effect of arithmetic instructions (possibly on Moves) or explicitly by compare or test instructions.
add r1, r2, r3
bz label
° Condition Register
cmp r1, r2, r3
bgt r1, label
° Compare and Branch
bgt r1, r2, label
• How do you specify the destination of a branch/jump?• studies show that almost all conditional branches go short distances from the
current program counter (loops, if-then-else).– we can specify a relative address in much fewer bits than an absolute
address– e.g., beq $1, $2, 100 => if ($1 == $2) PC = PC + 100 * 4
• How do we specify the condition of the branch?
CS141-L1-54 Tarun Soni, Summer ‘03
Conditional Branch Distance
Bits of Branch Dispalcement
0%
10%
20%
30%
40%
0 1 2 3 4 5 6 7 8 9
10
11
12
13
14
15
Int. Avg. FP Avg.
CS141-L1-55 Tarun Soni, Summer ‘03
Conditional Branching
• PC-relative since most branches are relatively close to the current PC address
• At least 8 bits suggested (± 128 instructions)
• Compare Equal/Not Equal most important for integer programs (86%)
Frequency of comparison types in branches
0% 50% 100%
EQ/NE
GT/LE
LT/GE
37%
23%
40%
86%
7%
7%
Int Avg.
FP Avg.
CS141-L1-56 Tarun Soni, Summer ‘03
Conditional Branching
• Compare and Branch – BEQ rs, rt, offset if R[rs] == R[rt] then PC-relative branch – BNE rs, rt, offset <>
• Compare to zero and Branch – BLEZ rs, offset if R[rs] <= 0 then PC-relative branch – BGTZ rs, offset > – BLT < – BGEZ >= – BLTZAL rs, offset if R[rs] < 0 then branch and link (into R 31) – BGEZAL >=
• Remaining set of compare and branch take two instructions • Almost all comparisons are against zero!
• beq, bne beq r1, r2, addr => if (r1 == r2) goto addr• slt $1, $2, $3 => if ($2 < $3) $1 = 1; else $1 = 0• these, combined with $0, can implement all fundamental branch conditions
– Always, never, !=, ==, >, <=, >=, <, >(unsigned), <= (unsigned), ...
MIPS Branch Instructions
CS141-L1-57 Tarun Soni, Summer ‘03
Jumps
• need to be able to jump to an absolute address sometime• need to be able to do procedure calls and returns
• jump -- j 10000 => PC = 10000• jump and link -- jal 100000 => $31 = PC + 4; PC = 10000
– used for procedure calls
• jump register -- jr $31 => PC = $31– used for returns, but can be useful for lots of other things.
CS141-L1-58 Tarun Soni, Summer ‘03
Jumps
OP
OP
OP
rs rt rd sa funct
rs rt Immediate (16 bits)
target
6 bits 5 bits 5 bits 5 bits 5 bits 6 bits
R
I
J
MIPS Instruction Formats
• Branch (e.g., beq) uses PC-relative addressing mode (few bits if addr typically close) uses base+displacement mode, with the PC being the base.
• Jump uses pseudo-direct addressing mode. 26 bits of the address is in the instruction, the rest is taken from the PC.
instruction program counter
6 26 6 26
jump destination address
MIPS Addressing Formats: Branches and Jumps
CS141-L1-59 Tarun Soni, Summer ‘03
MIPS Branch & Jump Instructions
Instruction Example Meaning
branch on equal beq $1,$2,100 if ($1 == $2) go to PC+4+100Equal test; PC relative branch
branch on not eq. bne $1,$2,100 if ($1!= $2) go to PC+4+100Not equal test; PC relative
set on less than slt $1,$2,$3 if ($2 < $3) $1=1; else $1=0Compare less than; 2’s comp.
set less than imm. slti $1,$2,100 if ($2 < 100) $1=1; else $1=0Compare < constant; 2’s comp.
set less than uns. sltu $1,$2,$3 if ($2 < $3) $1=1; else $1=0Compare less than; natural numbers
set l. t. imm. uns. sltiu $1,$2,100 if ($2 < 100) $1=1; else $1=0Compare < constant; natural numbers
jump j 10000 go to 10000Jump to target address
jump register jr $31 go to $31For switch, procedure return
jump and link jal 10000 $31 = PC + 4; go to 10000For procedure call
CS141-L1-60 Tarun Soni, Summer ‘03
Stacks
Stacking of Subroutine Calls & Returns and Environments:
A: CALL B
CALL C
C: RET
RET
B:
A
A B
A B C
A B
A
Some machines provide a memory stack as part of the architecture (e.g., VAX)
Sometimes stacks are implemented via software convention (e.g., MIPS)
CS141-L1-61 Tarun Soni, Summer ‘03
Stacks
Useful for stacked environments/subroutine call & return even if operand stack not part of architecture
Stacks that Grow Up vs. Stacks that Grow Down:
abc
0 Little
inf. Big 0 Little
inf. Big
MemoryAddresses
SP
NextEmpty?
LastFull?
Little --> Big/Last Full
POP: Read from Mem(SP) Decrement SP
PUSH: Increment SP Write to Mem(SP)
growsup
growsdown
Little --> Big/Next Empty
POP: Decrement SP Read from Mem(SP)
PUSH: Write to Mem(SP) Increment SP
CS141-L1-62 Tarun Soni, Summer ‘03
Stack Frames
FP
ARGS
Callee SaveRegisters
Local Variables
SP
Reference args andlocal variables atfixed (positive) offsetfrom FP
Grows and shrinks duringexpression evaluation
(old FP, RA)
• Many variations on stacks possible (up/down, last pushed / next )
• Block structured languages contain link to lexically enclosing frame
• Compilers normally keep scalar variables in registers, not memory!
High Mem
Low Mem
CS141-L1-63 Tarun Soni, Summer ‘03
MIPS Software Register Conventions
0 zero constant 0
1 at reserved for assembler
2 v0 expression evaluation &
3 v1 function results
4 a0 arguments
5 a1
6 a2
7 a3
8 t0 temporary: caller saves
. . . (callee can clobber)
15 t7
16 s0 callee saves
. . . (caller can clobber)
23 s7
24 t8 temporary (cont’d)
25 t9
26 k0 reserved for OS kernel
27 k1
28 gp Pointer to global area
29 sp Stack pointer
30 fp frame pointer
31 ra Return Address (HW)
CS141-L1-64 Tarun Soni, Summer ‘03
MIPS Branch & Jump Instructions
MIPS operands
Name Example Comments$s0-$s7, $t0-$t9, $zero, Fast locations for data. In MIPS, data must be in registers to perform
32 registers $a0-$a3, $v0-$v1, $gp, arithmetic. MIPS register $zero always equals 0. Register $at is
$fp, $sp, $ra, $at reserved for the assembler to handle large constants.
Memory[0], Accessed only by data transfer instructions. MIPS uses byte addresses, so
230 memory Memory[4], ..., sequential words differ by 4. Memory holds data structures, such as arrays,words Memory[4294967292] and spilled registers, such as those saved on procedure calls.
MIPS assembly language
Category Instruction Example Meaning Commentsadd add $s1, $s2, $s3 $s1 = $s2 + $s3 Three operands; data in registers
Arithmetic subtract sub $s1, $s2, $s3 $s1 = $s2 - $s3 Three operands; data in registers
add immediate addi $s1, $s2, 100 $s1 = $s2 + 100 Used to add constants
load word lw $s1, 100($s2) $s1 = Memory[$s2 + 100] Word from memory to register
store word sw $s1, 100($s2) Memory[$s2 + 100] = $s1 Word from register to memory
Data transfer load byte lb $s1, 100($s2) $s1 = Memory[$s2 + 100] Byte from memory to register
store byte sb $s1, 100($s2) Memory[$s2 + 100] = $s1 Byte from register to memoryload upper immediate
lui $s1, 100 $s1 = 100 * 216 Loads constant in upper 16 bits
branch on equal beq $s1, $s2, 25 if ($s1 == $s2) go to PC + 4 + 100
Equal test; PC-relative branch
Conditional
branch on not equal bne $s1, $s2, 25 if ($s1 != $s2) go to PC + 4 + 100
Not equal test; PC-relative
branch set on less than slt $s1, $s2, $s3 if ($s2 < $s3) $s1 = 1; else $s1 = 0
Compare less than; for beq, bne
set less than immediate
slti $s1, $s2, 100 if ($s2 < 100) $s1 = 1; else $s1 = 0
Compare less than constant
jump j 2500 go to 10000 Jump to target address
Uncondi- jump register jr $ra go to $ra For switch, procedure return
tional jump jump and link jal 2500 $ra = PC + 4; go to 10000 For procedure call
CS141-L1-65 Tarun Soni, Summer ‘03
Example: Swap()
• Can we figure out the code?
swap(int v[], int k);{ int temp;
temp = v[k]v[k] = v[k+1];v[k+1] = temp;
}
swap: // $4=v, $5=kmuli $2, $5, 4 // $2 = k*4add $2, $4, $2 // $2 = v+(4*k)lw $15, 0($2) // $15=temp= *($2+0)=*(v+k)lw $16, 4($2) // $16 = *($2+4) = *(v+k+1)sw $16, 0($2) // *(v+k) = $16 = *(v+k+1)sw $15, 4($2) // *(v+k+1) = $15 = tempjr $31 // return;
CS141-L1-66 Tarun Soni, Summer ‘03
Example: Leaf_procedure()
• Procedures?int PairDiff(int a, int b, int c,int d);{ int temp;
temp = (a+b)-(c+d);return temp;
}
Assume caller puts $a0-$a3 = a,b,c,d and wants result in $v0PairDiff: //
sub $sp,$sp,12 // Make space for 3 temp locationssw $t1, 8($sp) // save $t1 (optional if MIPS convention)sw $t0, 4($sp) // save $t0 (optional if MIPS convention)sw $s0, 0($sp) // save $s0
add $t0,$a0,$a1 // (t0=a+b) add $t1,$a2,$a3 // (t1=c+d)sub $s0,$t0,$t1 // (s0=t0-t1)
add $v0,$s0,$zero // store return value in $v0lw $s0,0($sp) // restore registerslw $t0,4($sp) // (optional if MIPS convention)lw $t1,8($sp) // (optional if MIPS convention)add $sp,$sp,12 // ‘pop’ the stack
jr $ra // The actual return to calling routine
CS141-L1-67 Tarun Soni, Summer ‘03
Example: Nested_procedure()
• What about nested procedures? $ra ??
• Recursive procedures?
int fact(int n);{
if(n<1) return(1);else return (n*fact(n-1));
}
Assume $a0 = n fact: //
sub $sp,$sp,8 // Make space for 2 temp locationssw $ra, 4($sp) // save return addresssw $a0, 4($sp) // save argument n
slt $t0,$a0,1 // test for n<1beq $t0,$zero, L1 // if (n>=1) goto L1
add $v0,$zero,1 // $v0=1add $sp,$sp,8 // ‘pop’ the stack
jr $ra // return
L1: sub $a0,$a0,1 // n--; jal fact; // call fact again.
lw $a0,0($sp) // fact() returns here. Restore nlw $ra,4($sp) // restore return addressadd $sp,$sp,8 // ‘pop’ stackmult $v0,$a0,$v0 // $v0 = n*fact(n-1)jr $ra // return to caller
(n<1) case
(n>=1) case
CS141-L1-68 Tarun Soni, Summer ‘03
Other Architectures
• Design alternative:– provide more powerful operations (e.g., DSP, Encryption engines, Java
Processors)– goal is to reduce number of instructions executed– danger is a slower cycle time and/or a higher CPI
• Sometimes referred to as “RISC vs. CISC”– virtually all new instruction sets since 1982 have been RISC– VAX: minimize code size, make assembly language easy
instructions from 1 to 54 bytes long!• We’ll look at PowerPC and 80x86
CS141-L1-69 Tarun Soni, Summer ‘03
Power PC
• Indexed addressing– example: lw $t1,$a0+$s3 // $t1=Memory[$a0+$s3]– What do we have to do in MIPS?
• Update addressing– update a register as part of load (for marching through arrays)– example: lwu $t0,4($s3) // $t0=Memory[$s3+4];$s3=$s3+4– What do we have to do in MIPS?
• Others:– load multiple/store multiple– a special counter register “bc Loop”
decrement counter, if not 0 goto loop
CS141-L1-70 Tarun Soni, Summer ‘03
x86: Volume is beautiful
• 1978: The Intel 8086 is announced (16 bit architecture)• 1980: The 8087 floating point coprocessor is added• 1982: The 80286 increases address space to 24 bits, +instructions• 1985: The 80386 extends to 32 bits, new addressing modes• 1989-1995: The 80486, Pentium, Pentium Pro add a few instructions
(mostly designed for higher performance)• 1997: MMX is added
“This history illustrates the impact of the “golden handcuffs” of compatibility”
“adding new features as someone might add clothing to a packed bag”
“an architecture that is difficult to explain and impossible to love”
“what the 80x86 lacks in style is made up in quantity,
making it beautiful from the right perspective”
CS141-L1-71 Tarun Soni, Summer ‘03
x86: Complex Instruction Set
• See text for a detailed description….• Complexity:
– Instructions from 1 to 17 bytes long– one operand must act as both a source and destination– one operand can come from memory– complex addressing modes
e.g., “base or scaled index with 8 or 32 bit displacement”• Saving grace:
– the most frequently used instructions are not too difficult to build– compilers avoid the portions of the architecture that are slow
CS141-L1-72 Tarun Soni, Summer ‘03
Comparing Instruction Set Architectures
Design-time metrics:
° Can it be implemented, in how long, at what cost?
° Can it be programmed? Ease of compilation?
Static Metrics:
° How many bytes does the program occupy in memory?
Dynamic Metrics:
° How many instructions are executed?
° How many bytes does the processor fetch to execute the program?
° How many clocks are required per instruction?
° How "lean" a clock is practical?
Best Metric: Time to execute the program!
This depends on instruction set, processor organization, and compilation techniques.
CPI
Inst. Count Cycle Time
CS141-L1-73 Tarun Soni, Summer ‘03
Instruction Set Architectures: What did we learn today?
• MIPS is a general-purpose register, load-store, fixed-instruction-length architecture.
• MIPS is optimized for fast pipelined performance, not for low instruction count• Four principles of IS architecture
– simplicity favors regularity– smaller is faster– good design demands compromise– make the common case fast
CS141-L1-74 Tarun Soni, Summer ‘03
AdministriviaTechnology trendsComputer organization: concept of abstractionInstruction Set Architectures: Definition, types, examplesInstruction formats: operands, addressing modes Operations: load, store, arithmetic, logicalControl instructions: branch, jump, proceduresStacksExamples: in-line code, procedure, nested-proceduresOther architectures
Todays Agenda