New York Institute of Technology, Engineering and Computer Sciences. CSCI-641/EENG-641 Computer Architecture. Khurram Kazi

CSCI-641/EENG-641 Computer Architecture


Page 1: CSCI-641/EENG-641 Computer Architecture

New York Institute of Technology

Engineering and Computer Sciences

CSCI 641 – EENG 641 1

CSCI-641/EENG-641

Computer Architecture

Khurram Kazi

Page 2: CSCI-641/EENG-641 Computer Architecture

Major sources of the slides for this lecture

http://fourier.eng.hmc.edu/e85/lectures/instruction/node7.html

http://www.ece.ucdavis.edu/~vojin/CLASSES/EEC180B/Spring99/lab6.pdf

http://fourier.eng.hmc.edu/e85/lectures/r3000-isa.html

Digital Design and Computer Architecture, by David Money Harris and Sarah L. Harris, Chapter 6: Architecture

http://www.hirstbrook.com/cod/Chapter2B.pdf

K Kazi CSCI 641/EENG 641 2

Page 3: CSCI-641/EENG-641 Computer Architecture

Assembly language

Assembly language is the human-readable representation of the computer's native language.

Each instruction specifies both the operation to perform and the operands on which to operate.

High-level code    MIPS assembly code
a = b + c;         add a, b, c
a = b - c;         sub a, b, c

add is called the mnemonic and indicates what operation to perform.

The operation is performed on b and c, the source operands, and the result is written to a, the destination operand.

Design Principle 1: Simplicity favors regularity

Instructions with a consistent number of operands (in this case, two sources and one destination) are easier to encode and handle in hardware.

More complex high-level code translates into multiple MIPS instructions.

High-level code (complex)    MIPS assembly code
a = b + c - d;               sub t, c, d    # t = c - d
                             add a, b, t    # a = b + t

Page 4: CSCI-641/EENG-641 Computer Architecture

Assembly language

To execute a complex operation, multiple assembly language instructions are performed.

Design Principle 2: Make the common case fast

The MIPS instruction set makes the common case fast by including only simple, commonly used instructions. The number of instructions is kept small so that the hardware required to decode each instruction and its operands can be simple, small, and fast.

Less frequent, more elaborate operations are performed using a sequence of multiple simple instructions.


Page 5: CSCI-641/EENG-641 Computer Architecture

Assembly language: Operands: Registers, Memory, and Constants

An instruction operates on operands. The variables a, b, and c are all called operands.

The computer operates on 0's and 1's.

Operands are stored in registers or memory, or they may be constants stored in the instruction itself.

Computers use various locations to hold operands, to optimize for speed and data capacity.

Registers are accessed quickly compared to memory.

Registers can hold only a very limited amount of data, whereas memories hold large amounts of data.

The MIPS architecture uses 32 registers, called the register set or register file.

Design Principle 3: Smaller is faster

Page 6: CSCI-641/EENG-641 Computer Architecture

Translating high-level code to assembly

High-level code:

a = b - c;
f = (g + h) - (i + j);

Assembly language:

# $s0 = a, $s1 = b, $s2 = c, $s3 = f, $s4 = g, $s5 = h, $s6 = i, $s7 = j
sub $s0, $s1, $s2    # a = b - c
add $t0, $s4, $s5    # $t0 = g + h
add $t1, $s6, $s7    # $t1 = i + j
sub $s3, $t0, $t1    # f = (g + h) - (i + j)
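The four-instruction translation can be checked with a small Python sketch. The dictionary-based register model and the helper names `add`, `sub`, and `run_example` are assumptions made for illustration, not part of the lecture; results are wrapped to 32 bits and overflow exceptions are not modeled.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide


def add(regs, rd, rs, rt):
    """rd = rs + rt, wrapped to 32 bits (overflow exceptions are not modeled)."""
    regs[rd] = (regs[rs] + regs[rt]) & MASK32


def sub(regs, rd, rs, rt):
    """rd = rs - rt, wrapped to 32 bits (overflow exceptions are not modeled)."""
    regs[rd] = (regs[rs] - regs[rt]) & MASK32


def run_example(b, c, g, h, i, j):
    """Runs the four instructions and returns the final values of (a, f)."""
    regs = {"$s1": b, "$s2": c, "$s4": g, "$s5": h, "$s6": i, "$s7": j}
    sub(regs, "$s0", "$s1", "$s2")  # a = b - c
    add(regs, "$t0", "$s4", "$s5")  # $t0 = g + h
    add(regs, "$t1", "$s6", "$s7")  # $t1 = i + j
    sub(regs, "$s3", "$t0", "$t1")  # f = (g + h) - (i + j)
    return regs["$s0"], regs["$s3"]


a, f = run_example(b=9, c=4, g=10, h=20, i=3, j=5)
print(a, f)
```

The temporaries $t0 and $t1 hold the two intermediate sums, which is why one high-level statement needs three instructions.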

Page 7: CSCI-641/EENG-641 Computer Architecture

MIPS register set

Name       Number    Use
$0         0         The constant value 0
$at        1         Assembler temporary
$v0-$v1    2-3       Procedure return values
$a0-$a3    4-7       Procedure arguments
$t0-$t7    8-15      Temporary variables
$s0-$s7    16-23     Saved variables
$t8-$t9    24-25     Temporary variables
$k0-$k1    26-27     Operating system (OS) temporaries
$gp        28        Global pointer
$sp        29        Stack pointer
$fp        30        Frame pointer
$ra        31        Procedure return address

Page 8: CSCI-641/EENG-641 Computer Architecture

Registers within MIPS Processor

Register file (RF): 32 registers ($0 through $31), each holding one 32-bit word (4 bytes).

$0 always holds zero.

$sp (29) is the stack pointer (SP), which always points to the top item of a stack in memory.

$ra (31) holds the return address from a subroutine.

The table on the previous slide shows the conventional usage of all 32 registers.

Page 9: CSCI-641/EENG-641 Computer Architecture

Description of Register File

There are two read data buses, a_dout and b_dout; two read address buses, a_addr and b_addr; one write data bus, wr_dbus; and one write address bus, wr_addr.

Each of these address buses is used to specify one of the 32 registers for either reading or writing.

The write operation takes place on the rising edge of the clk signal when the wr_en signal is logic 1.

The read operation, however, is not clocked; it is combinational. Thus, the value on a_dout should always be the contents of the register specified by the a_addr bus.

Similarly, the value on b_dout should always be the contents of the register specified by the b_addr bus.

So, with this register file, you can write to one register and read two registers simultaneously. It is also possible to read a single register on both of the read buses simultaneously.

In essence, it is a 3-port memory element that allows two reads and one write simultaneously.
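A behavioral sketch of this register file in Python may make the port structure concrete. The signal names (a_addr, b_addr, a_dout, b_dout, wr_en, wr_addr, wr_dbus) come from the description; the class name `RegisterFile`, the method names, and the choice to keep register 0 hardwired to zero are assumptions made for illustration, not the lab's actual HDL.

```python
class RegisterFile:
    """Behavioral model: two combinational read ports, one clocked write port."""

    def __init__(self):
        self.regs = [0] * 32  # 32 registers, each holding one 32-bit word

    def read(self, a_addr, b_addr):
        """Combinational read: returns (a_dout, b_dout) for the given addresses."""
        return self.regs[a_addr], self.regs[b_addr]

    def rising_edge(self, wr_en, wr_addr, wr_dbus):
        """Models one rising edge of clk; the write happens only if wr_en is 1."""
        if wr_en == 1 and wr_addr != 0:  # assumption: register 0 stays zero
            self.regs[wr_addr] = wr_dbus & 0xFFFFFFFF


rf = RegisterFile()
rf.rising_edge(wr_en=1, wr_addr=8, wr_dbus=0x1234)
rf.rising_edge(wr_en=0, wr_addr=9, wr_dbus=0x5678)  # ignored: wr_en is 0
a_dout, b_dout = rf.read(8, 8)  # the same register on both read buses
print(hex(a_dout), hex(b_dout), rf.read(9, 0))
```

Reads are plain lookups with no clock argument, which mirrors the combinational read ports; only `rising_edge` changes state.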

Page 10: CSCI-641/EENG-641 Computer Architecture

Memory

Word-addressable memory:

Word address    Data        Word number
00000000        ABCDEF78    Word 0
00000001        F2EFAB01    Word 1
00000002        ECEBAB01    Word 2
00000003        12EBAB2A    Word 3

Byte-addressable memory (data bytes and byte addresses listed from MSB to LSB):

Word address    Data           Big-endian byte addresses    Little-endian byte addresses
00000000        AB CD EF 78    0 1 2 3                      3 2 1 0
00000004        F2 EF AB 01    4 5 6 7                      7 6 5 4
00000008        EC EB AB 01    8 9 A B                      B A 9 8
0000000C        12 EB AB 2A    C D E F                      F E D C
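The two byte orderings can be illustrated with a short Python sketch that lays out data word ABCDEF78 in both ways. It uses only the built-in `int.to_bytes` and `int.from_bytes`; the variable names are chosen here for illustration.

```python
word = 0xABCDEF78  # first data word from the memory figure

big = word.to_bytes(4, byteorder="big")        # byte address 0 holds the MSB
little = word.to_bytes(4, byteorder="little")  # byte address 0 holds the LSB

for addr in range(4):
    print(f"byte address {addr}: big-endian {big[addr]:02X}, "
          f"little-endian {little[addr]:02X}")

# Reading the four bytes back with the matching byte order recovers the word.
assert int.from_bytes(big, "big") == int.from_bytes(little, "little") == word
```

Either ordering stores the same word; what differs is which byte address holds the most significant byte.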

Page 11: CSCI-641/EENG-641 Computer Architecture

Memory

Compared to the register file, memory is large and slow.

MIPS uses byte-addressable memory with 32-bit memory addresses and 32-bit data words.

Each 32-bit data word has a unique 32-bit address; because every byte has its own address, consecutive words are 4 bytes apart.

MIPS uses the load word instruction, lw, to read data from memory into a register.

The lw instruction specifies the effective address in memory as the sum of a base address and an offset. The offset is written before the parentheses and the base address register inside them, e.g.

lw $s0, 0($0)      # read data word 0 into $s0
lw $s1, 4($0)      # read data word 1 into $s1
lw $s2, 0xC($0)    # read data word 3 into $s2

Page 12: CSCI-641/EENG-641 Computer Architecture

Memory

MIPS uses the store word instruction, sw, to write data from a register to memory.

The sw instruction specifies the effective address in memory as the sum of a base address and an offset. The offset is written before the parentheses and the base address register inside them, e.g.

sw $s0, 0($0)      # write $s0 to memory data word 0
sw $s1, 4($0)      # write $s1 to memory data word 1
sw $s2, 0xC($0)    # write $s2 to memory data word 3
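The effective-address rule for lw and sw can be sketched in Python over a byte-addressable memory. The class name `Memory`, its size, and the big-endian byte order are assumptions made for illustration; the 4-byte alignment check reflects the MIPS requirement that word accesses be word-aligned.

```python
class Memory:
    """Byte-addressable memory accessed one 32-bit word at a time."""

    def __init__(self, size_bytes=64):
        self.data = bytearray(size_bytes)

    @staticmethod
    def effective_address(base, offset):
        addr = base + offset  # effective address = base address + offset
        if addr % 4 != 0:
            raise ValueError("word accesses must be 4-byte aligned")
        return addr

    def lw(self, base, offset):
        """Load word: returns the 32-bit word at base + offset."""
        addr = self.effective_address(base, offset)
        return int.from_bytes(self.data[addr:addr + 4], "big")  # assumed byte order

    def sw(self, value, base, offset):
        """Store word: writes a 32-bit value at base + offset."""
        addr = self.effective_address(base, offset)
        self.data[addr:addr + 4] = (value & 0xFFFFFFFF).to_bytes(4, "big")


mem = Memory()
zero = 0                        # value held by register $0
mem.sw(0xABCDEF78, zero, 0)     # like sw $s0, 0($0): data word 0
mem.sw(0x12EBAB2A, zero, 0xC)   # like sw $s2, 0xC($0): data word 3
print(hex(mem.lw(zero, 0)), hex(mem.lw(zero, 0xC)), hex(mem.lw(zero, 4)))
```

Offsets 0, 4, and 0xC select words 0, 1, and 3 because consecutive words are 4 bytes apart.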

Page 13: CSCI-641/EENG-641 Computer Architecture

Instruction Set of MIPS Processor

Instruction set: each instruction in the instruction set describes one particular CPU operation. Each instruction is represented both in assembly language, by its mnemonic, and in machine language (binary), by a 32-bit word subdivided into several fields.

op: the opcode.

rs: short for “register source.”

rt: comes after rs alphabetically and usually indicates the second register source.

rd: short for “register destination.”

shamt: the shift amount, used by shift operations.

Page 14: CSCI-641/EENG-641 Computer Architecture

Instruction Set of MIPS Processor: R-type instruction

Arithmetic/Logical Instructions in MIPS

Logical operations are and, or, xor, and nor.

These R-type instructions operate bit-by-bit on two source registers, and the result is written to the destination register.

and is used for masking bits (i.e., forcing unwanted bits to 0).

or is useful for combining bits from two registers.

MIPS does not provide a NOT instruction; NOR can be used for the NOT operation, e.g., A NOR $0 = NOT A.
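The NOR-as-NOT identity, along with masking and combining, can be tried out in Python. This is only an illustrative sketch: the helper name `nor` and the sample values are chosen here, and Python integers are masked to 32 bits to mimic register width.

```python
MASK32 = 0xFFFFFFFF  # keep results to 32 bits, like a MIPS register


def nor(x, y):
    """Bitwise NOR of two 32-bit values."""
    return ~(x | y) & MASK32


a = 0x0000FFFF
zero = 0  # value held by register $0

not_a = nor(a, zero)                 # A NOR $0 = NOT A
low_byte = a & 0x000000FF            # and: force all but the low byte to 0
combined = 0x12340000 | 0x00005678   # or: combine bits from two values

print(hex(not_a), hex(low_byte), hex(combined))
```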

Page 15: CSCI-641/EENG-641 Computer Architecture

Instruction Set of MIPS Processor: Machine code for R-type instruction

Field values:

Assembly code        op        rs        rt        rd        shamt     funct
sub $t0, $t3, $t5    0         11        13        8         0         34
add $s0, $s1, $s2    0         17        18        16        0         32
                     6 bits    5 bits    5 bits    5 bits    5 bits    6 bits

Machine code:

Assembly code        op        rs       rt       rd       shamt    funct     Hex
sub $t0, $t3, $t5    000000    01011    01101    01000    00000    100010    0x016D4022
add $s0, $s1, $s2    000000    10001    10010    10000    00000    100000    0x02328020
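The packing of R-type field values into a 32-bit word can be reproduced with a few shifts. The function name `encode_r_type` is an assumption made for illustration; the field widths (6, 5, 5, 5, 5, 6 bits) and the field values are the ones shown on the slide, where rd = 8 corresponds to register $t0 in the register table.

```python
def encode_r_type(op, rs, rt, rd, shamt, funct):
    """Pack the six R-type fields (6, 5, 5, 5, 5, 6 bits) into one 32-bit word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


# Field values taken from the slide.
sub_word = encode_r_type(op=0, rs=11, rt=13, rd=8, shamt=0, funct=34)
add_word = encode_r_type(op=0, rs=17, rt=18, rd=16, shamt=0, funct=32)

print(f"0x{sub_word:08X} 0x{add_word:08X}")
```

Note that the assembly operand order (destination first) differs from the field order in the machine word, where rd follows rs and rt.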

Page 16: CSCI-641/EENG-641 Computer Architecture

Instruction Set of MIPS Processor: I-type instruction

Immediate-type, or I-type, instructions use two register operands and one immediate operand.

Similar to the R-type instruction.

The operation is solely defined by the opcode.

rs and imm are always used as source operands.

rt is used as a destination for some instructions (such as addi and lw) and as another source for others (such as sw).

Page 17: CSCI-641/EENG-641 Computer Architecture

Instruction Set of MIPS Processor: Machine code for I-type instructions

I-type instructions (immediate type)

Assembly code         op        rs        rt        imm        Machine code
addi $s0, $s1, 5      8         17        16        5          0x22300005
addi $t0, $s3, -12    8         19        8         -12        0x2268FFF4
lw   $t2, 32($0)      35        0         10        32         0x8C0A0020
sw   $s1, 4($t1)      43        9         17        4          0xAD310004
                      6 bits    5 bits    5 bits    16 bits
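The I-type machine codes can be reproduced the same way. The function name `encode_i_type` and the range check are assumptions made for illustration; the field values are the ones listed on the slide, and the immediate is stored as a 16-bit two's complement number (so -12 becomes FFF4).

```python
def encode_i_type(op, rs, rt, imm):
    """Pack op (6 bits), rs (5), rt (5), and a 16-bit two's complement immediate."""
    if not -0x8000 <= imm <= 0x7FFF:
        raise ValueError("immediate does not fit in 16 signed bits")
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


# Field values (op, rs, rt, imm) taken from the slide.
examples = {
    "addi rt=16, rs=17, imm=5": encode_i_type(8, 17, 16, 5),
    "addi rt=8, rs=19, imm=-12": encode_i_type(8, 19, 8, -12),
    "lw rt=10, imm=32, rs=0": encode_i_type(35, 0, 10, 32),
    "sw rt=17, imm=4, rs=9": encode_i_type(43, 9, 17, 4),
}

for text, word in examples.items():
    print(f"{text:28s} 0x{word:08X}")
```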

Page 18: CSCI-641/EENG-641 Computer Architecture

Load word (lw) instruction

MIPS uses the load word instruction, lw, to read a data word from memory into a register.

lw $s3, 1($0)    # read memory word 1 into $s3

The lw instruction specifies the effective address in memory as the sum of a base address and an offset.

The base address (written in parentheses in the instruction) is a register.

The offset is a constant (written before the parentheses).

The base address is $0 and the offset is 1, so the instruction reads from memory address 1. This example treats memory as word-addressable.

After the instruction, register $s3 = F2F1AC07.

Word address    Data        Word number
00000003        40F30788    Word 3
00000002        01EE2842    Word 2
00000001        F2F1AC07    Word 1
00000000        ABCDEF78    Word 0

Page 19: CSCI-641/EENG-641 Computer Architecture

lw instruction

rt is used as a destination in this instruction.

Page 20: CSCI-641/EENG-641 Computer Architecture

Store word (sw) instruction

MIPS uses the store word instruction, sw, to write a data word from a register into memory.

sw $s7, 5($0)    # write $s7 to memory word 5

The base address (written in parentheses in the instruction) is the value stored in the register.

The offset is a constant (written before the parentheses).

The base address is $0 and the offset is 5, so the instruction writes data from register $s7 to memory word 5.

Keep in mind that the MIPS memory model is byte-addressable, not word-addressable.

Page 21: CSCI-641/EENG-641 Computer Architecture

Classical five-stage RISC pipeline

Page 22: CSCI-641/EENG-641 Computer Architecture

MIPS R3000 Instruction Set Summary

Name: 32 registers
Example: $0, $1, $2, ..., $31
Comments: Fast locations for data. In MIPS, data must be in registers to perform arithmetic. MIPS register $0 always equals 0. Register $1 is reserved for the assembler to handle pseudoinstructions and large constants.

Name: 2^30 memory words
Example: Memory[0], Memory[4], ..., Memory[4294967292]
Comments: Accessed only by data transfer instructions. MIPS uses byte addresses, so sequential words differ by 4. Memory holds data structures, such as arrays, and spilled registers, such as those saved on procedure calls.

Page 23: CSCI-641/EENG-641 Computer Architecture

MIPS R3000 Instruction Set Summary

Category: Arithmetic

Instruction                       Example            Meaning          Comments
add                               add $1,$2,$3       $1 = $2 + $3     3 operands; exception possible
subtract                          sub $1,$2,$3       $1 = $2 - $3     3 operands; exception possible
add immediate                     addi $1,$2,100     $1 = $2 + 100    + constant; exception possible
add unsigned                      addu $1,$2,$3      $1 = $2 + $3     3 operands; no exceptions
subtract unsigned                 subu $1,$2,$3      $1 = $2 - $3     3 operands; no exceptions
add immediate unsigned            addiu $1,$2,100    $1 = $2 + 100    + constant; no exceptions
move from coprocessor register    mfc0 $1,$epc       $1 = $epc        Used to get a copy of the Exception PC

Page 24: CSCI-641/EENG-641 Computer Architecture

MIPS R3000 Instruction Set Summary

Category: Logical

Instruction            Example           Meaning           Comments
and                    and $1,$2,$3      $1 = $2 & $3      3 register operands; logical AND
or                     or $1,$2,$3       $1 = $2 | $3      3 register operands; logical OR
and immediate          andi $1,$2,100    $1 = $2 & 100     Logical AND register, constant
or immediate           ori $1,$2,100     $1 = $2 | 100     Logical OR register, constant
shift left logical     sll $1,$2,10      $1 = $2 << 10     Shift left by constant
shift right logical    srl $1,$2,10      $1 = $2 >> 10     Shift right by constant

Page 25: CSCI-641/EENG-641 Computer Architecture

MIPS R3000 Instruction Set Summary

Category: Data transfer

Instruction                         Example            Meaning                              Comments
load word                           lw $1,100($2)      $1 = Memory[$2+100]                  Data from memory to register
store word                          sw $1,100($2)      Memory[$2+100] = $1                  Data from register to memory
load upper immediate                lui $1,100         $1 = 100 * 2^16                      Loads constant in upper 16 bits

Category: Conditional branch

Instruction                         Example            Meaning                              Comments
branch on equal                     beq $1,$2,100      if ($1 == $2) go to PC+4+100         Equal test; PC-relative branch
branch on not equal                 bne $1,$2,100      if ($1 != $2) go to PC+4+100         Not equal test; PC-relative
set on less than                    slt $1,$2,$3       if ($2 < $3) $1 = 1; else $1 = 0     Compare less than; 2's complement
set less than immediate             slti $1,$2,100     if ($2 < 100) $1 = 1; else $1 = 0    Compare < constant; 2's complement
set less than unsigned              sltu $1,$2,$3      if ($2 < $3) $1 = 1; else $1 = 0     Compare less than; natural number
set less than immediate unsigned    sltiu $1,$2,100    if ($2 < 100) $1 = 1; else $1 = 0    Compare < constant; natural number

Category: Unconditional jump

Instruction                         Example            Meaning                              Comments
jump                                j 10000            go to 10000                          Jump to target address
jump register                       jr $31             go to $31                            For switch, procedure return
jump and link                       jal 10000          $31 = PC + 4; go to 10000            For procedure call
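The difference between the 2's complement and natural-number comparisons (slt versus sltu) shows up when the same bit pattern is read both ways. The following Python sketch is illustrative only; the helper names `to_signed`, `slt`, and `sltu` are chosen here and model just the comparison result, not the register write.

```python
def to_signed(x):
    """Interpret a 32-bit pattern as a two's complement integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def slt(rs, rt):
    """Set on less than: signed (two's complement) comparison."""
    return 1 if to_signed(rs) < to_signed(rt) else 0


def sltu(rs, rt):
    """Set on less than unsigned: operands compared as natural numbers."""
    return 1 if (rs & 0xFFFFFFFF) < (rt & 0xFFFFFFFF) else 0


pattern = 0xFFFFFFFF  # -1 in two's complement, 4294967295 as a natural number
print(slt(pattern, 1), sltu(pattern, 1))
```

The pattern FFFFFFFF is less than 1 when read as -1, but greater than 1 when read as a natural number, so the two instructions give different results.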

Page 26: CSCI-641/EENG-641 Computer Architecture

Pipeline stages: 5 stages

Fetch: Reads the instruction from instruction memory.

Decode: Reads the source operands from the register file and decodes the instruction to produce the control signals.

Execute: Performs a computation with the ALU.

Memory: The processor reads or writes data memory.

Writeback: The processor writes the result to the register file, when applicable.

Page 27: CSCI-641/EENG-641 Computer Architecture

Pipelined logic of MIPS

Page 28: CSCI-641/EENG-641 Computer Architecture

Snippet of MIPS simulation

[Figure labels: Instruction, Immediate, Dest. reg, Write enable, Write addr, Write data]

Page 29: CSCI-641/EENG-641 Computer Architecture

MIPS Microarchitecture

Single-cycle microarchitecture
Executes an entire instruction in one cycle.

Multi-cycle microarchitecture
Executes instructions in a series of shorter cycles.
Simpler instructions execute in fewer cycles than complicated ones.
Reuses complicated hardware blocks such as adders.
Executes only one instruction at a time, over multiple cycles.

Pipelined microarchitecture
Applies pipelining to the single-cycle microarchitecture.
Can execute several instructions simultaneously.
Improves throughput significantly.
All high-performance commercial processors use pipelining.
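A rough cycle count shows where the throughput gain of pipelining comes from. The sketch below assumes an ideal five-stage pipeline with no stalls or hazards, and compares it with a design in which every instruction occupies all five stage-length cycles before the next one starts; both assumptions and the function names are illustrative and are not taken from the slides.

```python
def pipelined_cycles(num_instructions, num_stages=5):
    """Ideal pipeline: fill the pipe once, then complete one instruction per cycle."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


def one_at_a_time_cycles(num_instructions, num_stages=5):
    """Each instruction passes through every stage before the next one starts."""
    return num_stages * num_instructions


for n in (1, 5, 100):
    print(n, one_at_a_time_cycles(n), pipelined_cycles(n))
```

For long instruction sequences the ratio of the two counts approaches the number of stages; real pipelines fall short of this because of hazards.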

Page 30: CSCI-641/EENG-641 Computer Architecture

MIPS Microarchitecture

There are many ways to measure the performance of a computer system.

Intel and Advanced Micro Devices (AMD) both sell compatible processors conforming to the IA-32 architecture.
Intel offered higher clock frequencies than its competitors.
AMD's Athlon, Intel's main competitor, executed programs faster than Intel's chips at the same clock frequency.

The only gimmick-free way to measure performance is by measuring the execution time of a program of interest to you.

The computer that executes your program fastest has the highest performance.

The next best thing is to measure the total execution time of a collection of programs.
Such collections of programs are called benchmarks, and the execution times of such programs are commonly published.

Execution time = (# instructions) × (cycles/instruction) × (seconds/cycle)

Cycles per instruction (CPI) is the number of clock cycles required to execute an average instruction.
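The execution-time formula can be evaluated directly. In the Python sketch below, the function name and all of the numbers are hypothetical, chosen only to show that a lower CPI can outweigh a higher clock frequency; they do not describe any real processor.

```python
def execution_time(num_instructions, cpi, clock_hz):
    """Execution time = (# instructions) x (cycles/instruction) x (seconds/cycle)."""
    seconds_per_cycle = 1.0 / clock_hz
    return num_instructions * cpi * seconds_per_cycle


# Hypothetical machines running the same one-billion-instruction program.
fast_clock = execution_time(num_instructions=1_000_000_000, cpi=2.0, clock_hz=3.0e9)
low_cpi = execution_time(num_instructions=1_000_000_000, cpi=1.2, clock_hz=2.0e9)

print(f"higher clock: {fast_clock:.3f} s, lower CPI: {low_cpi:.3f} s")
```

With these made-up numbers the machine with the slower clock finishes first, which is why clock frequency alone is not a reliable performance measure.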