SWE3005: Introduction to Computer Architectures, Fall 2019, Jinkyu Jeong ([email protected])

The Processor (1)

Jinkyu Jeong ([email protected])
Computer Systems Laboratory
Sungkyunkwan University
http://csl.skku.edu

Introduction

• CPU performance factors
  – Instruction count
    • Determined by ISA and compiler
  – CPI and cycle time
    • Determined by CPU hardware

• We will examine two MIPS implementations
  – A simplified version
  – A more realistic pipelined version

• A simple instruction subset that still shows most aspects of the design
  – Memory reference: lw, sw
  – Arithmetic/logical: add, sub, and, or, slt
  – Control transfer: beq, j
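The three factors multiply together in the standard performance equation, CPU time = instruction count × CPI × clock cycle time. The Python sketch below just evaluates that product; the program size, CPI, and cycle time in the usage line are made-up values for illustration.

```python
def cpu_time_seconds(instruction_count: int, cpi: float, cycle_time_ps: float) -> float:
    """CPU time = instruction count x CPI x clock cycle time (picoseconds -> seconds)."""
    return instruction_count * cpi * cycle_time_ps * 1e-12

# Hypothetical program: 1,000,000 instructions, CPI = 1 (single-cycle), 800 ps cycle
t = cpu_time_seconds(1_000_000, 1.0, 800.0)
print(f"{t * 1e6:.1f} us")  # 800.0 us
```

The ISA and compiler move the first factor; the hardware designs in this section trade off the other two.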

Outline

Textbook: P&H 4.1-4.4

• Logic Design Basics & Implementation Overview

• Building a Datapath

• Control Logic Design

Logic Design Basics & Implementation Overview

Logic Design Basics

• Information encoded in binary
  – Low voltage = 0, High voltage = 1
  – One wire per bit
  – Multi-bit data encoded on multi-wire buses

• Combinational elements
  – Operate on data
  – Output is a function of the current inputs only

• State (sequential) elements
  – Store information
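A small Python sketch of the bus idea: a list of 0/1 values stands in for the wires of a multi-wire bus, one per bit. Treating index 0 as the least significant wire is an arbitrary convention chosen here, not something the slides specify.

```python
def to_bus(value: int, width: int = 32) -> list[int]:
    """Split an unsigned value into one wire (0 or 1) per bit; index 0 is the LSB."""
    return [(value >> i) & 1 for i in range(width)]

def from_bus(wires: list[int]) -> int:
    """Reassemble the value carried on a multi-wire bus."""
    return sum(bit << i for i, bit in enumerate(wires))

bus = to_bus(0b1011, width=4)
print(bus)            # [1, 1, 0, 1]
print(from_bus(bus))  # 11
```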

Combinational Elements

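• AND-gate
  – Y = A & B

• Adder
  – Y = A + B

• Multiplexer
  – Y = S ? I1 : I0

• Arithmetic/Logic Unit
  – Y = F(A, B)

[Figure: symbols for the AND gate (inputs A, B; output Y), adder (inputs A, B; output Y), multiplexer (inputs I0, I1; select S; output Y), and ALU (inputs A, B; function select F; output Y)]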
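These elements are pure functions of their inputs: nothing is stored between calls. The sketch below models them in Python on 32-bit values; masking with `MASK32` imitates fixed-width wires, and the string function codes passed to the hypothetical `alu` helper are placeholders rather than real control encodings.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit wires by masking every result

def and_gate(a: int, b: int) -> int:
    return a & b & MASK32

def adder(a: int, b: int) -> int:
    # The carry out of bit 31 is discarded, as in a 32-bit adder
    return (a + b) & MASK32

def mux(s: int, i0: int, i1: int) -> int:
    # Y = S ? I1 : I0
    return i1 if s else i0

def alu(f: str, a: int, b: int) -> int:
    # Y = F(A, B); the string names are illustrative, not hardware encodings
    ops = {
        "and": lambda: a & b,
        "or": lambda: a | b,
        "add": lambda: a + b,
        "sub": lambda: a - b,
    }
    return ops[f]() & MASK32

print(adder(0xFFFFFFFF, 1), mux(1, 5, 7), hex(alu("sub", 3, 5)))  # 0 7 0xfffffffe
```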

Sequential Elements

• Register: stores data in a circuit
  – Uses a clock signal to determine when to update the stored value
  – Edge-triggered: update when Clk changes from 0 to 1

[Figure: register symbol (inputs D and Clk, output Q) and waveforms for Clk, D, and Q]
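An edge-triggered register can be simulated by remembering the previous clock level and sampling D only on a 0 → 1 transition. This is a behavioral Python sketch (the class name and the `tick` interface are hypothetical), not a gate-level model.

```python
class EdgeTriggeredRegister:
    """Register that copies D to Q only on a rising clock edge (Clk 0 -> 1)."""

    def __init__(self, initial: int = 0) -> None:
        self.q = initial
        self._prev_clk = 0

    def tick(self, clk: int, d: int) -> int:
        # Sample D only when the clock goes from 0 to 1; otherwise hold Q
        if self._prev_clk == 0 and clk == 1:
            self.q = d
        self._prev_clk = clk
        return self.q

reg = EdgeTriggeredRegister()
# (clk, d) samples: D changes while Clk is high, but Q follows only at rising edges
trace = [reg.tick(clk, d) for clk, d in [(0, 5), (1, 5), (1, 9), (0, 9), (1, 9)]]
print(trace)  # [0, 5, 5, 5, 9]
```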

Sequential Elements

• Register with write control
  – Only updates on a clock edge when the Write control input is 1
  – Used when the stored value is required later

[Figure: register symbol with inputs D, Write, and Clk and output Q, and waveforms for Clk, Write, D, and Q]
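Adding write control changes only the update condition: the register captures D at a rising edge only if Write is 1. A behavioral sketch in the same style, with a hypothetical class name:

```python
class WriteEnableRegister:
    """Edge-triggered register that updates only when Write == 1 at the rising edge."""

    def __init__(self, initial: int = 0) -> None:
        self.q = initial
        self._prev_clk = 0

    def tick(self, clk: int, write: int, d: int) -> int:
        rising_edge = self._prev_clk == 0 and clk == 1
        if rising_edge and write == 1:
            self.q = d  # the new value is captured only with Write enabled
        self._prev_clk = clk
        return self.q

reg = WriteEnableRegister(initial=7)
# Two rising edges: the first with Write = 0 (value held), the second with Write = 1
for clk, write, d in [(0, 0, 3), (1, 0, 3), (0, 1, 4), (1, 1, 4)]:
    reg.tick(clk, write, d)
print(reg.q)  # 4
```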

Clocking Methodology

• Combinational logic transforms data during clock cycles
  – Between clock edges
  – Input from state elements, output to state element
  – Longest delay determines clock period
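The methodology can be mimicked in software by computing the combinational result from the current state and only then committing it to the state element, once per cycle. In this sketch the PC is the state element and a +4 adder is the combinational logic; splitting each loop iteration into a compute step and a commit step is a modeling assumption.

```python
def next_pc_logic(pc: int) -> int:
    """Combinational logic: output depends only on the current state element value."""
    return (pc + 4) & 0xFFFFFFFF

def run_cycles(pc: int, cycles: int) -> list[int]:
    history = [pc]
    for _ in range(cycles):
        next_value = next_pc_logic(pc)  # between edges: the logic settles on a new value
        pc = next_value                 # at the clock edge: the state element is updated
        history.append(pc)
    return history

print([hex(v) for v in run_cycles(0x00400000, 3)])
# ['0x400000', '0x400004', '0x400008', '0x40000c']
```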

Instruction Execution

• PC → instruction memory, fetch instruction

• Register numbers → register file, read registers

• Depending on instruction class
  – Use ALU to calculate
    • Arithmetic result
    • Memory address for load/store
    • Branch target address
  – Access data memory for load/store
  – PC ← target address or PC + 4

CPU Overview

Multiplexers

• Can’t just join wires together

– Use multiplexers

Control

Building a Datapath

Building a Datapath

• Datapath
  – Elements that process data and addresses in the CPU
    • Registers, ALUs, muxes, memories, …

• We will build a MIPS datapath incrementally
  – Refining the overview design

Instruction Fetch

[Figure: instruction fetch datapath; callouts: the PC is a 32-bit register, and an adder increments it by 4 for the next instruction]
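A sketch of the fetch step under simplifying assumptions: instruction memory is a Python list of 32-bit words starting at address 0, and the helper names (`InstructionMemory`, `fetch`) are hypothetical. The two sample words encode `add $t0, $s1, $s2` and `lw $s0, 4($s1)`.

```python
class InstructionMemory:
    """Word-array ROM indexed by byte address (hypothetical helper class)."""

    def __init__(self, words: list[int]) -> None:
        self.words = words

    def read(self, address: int) -> int:
        # Instructions are 4-byte aligned, so the word index is address / 4
        assert address % 4 == 0, "instruction addresses must be word aligned"
        return self.words[address // 4]

def fetch(pc: int, imem: InstructionMemory) -> tuple[int, int]:
    """Return (instruction, PC + 4) for one fetch step."""
    instruction = imem.read(pc)
    return instruction, (pc + 4) & 0xFFFFFFFF

imem = InstructionMemory([0x02324020, 0x8E300004])  # add $t0,$s1,$s2 ; lw $s0,4($s1)
pc = 0
inst, pc = fetch(pc, imem)
print(hex(inst), pc)  # 0x2324020 4
```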

R-Format Instructions

• Read two register operands

• Perform arithmetic/logical operation

• Write register result

Register File Read

• Two register numbers select two register outputs

Register File Write

• A register number and a write signal enable a state element (D flip-flop) to update its value
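The read and write behavior can be sketched as a small Python class. The port names mirror the slide labels, register 0 is hardwired to zero as in MIPS, and calling `write` stands in for the clock edge at which a real register file would update; the class itself is illustrative.

```python
class RegisterFile:
    """32 x 32-bit registers with two read ports and one write port."""

    def __init__(self) -> None:
        self.regs = [0] * 32

    def read(self, read_reg1: int, read_reg2: int) -> tuple[int, int]:
        # Two register numbers select two outputs (combinational read)
        return self.regs[read_reg1], self.regs[read_reg2]

    def write(self, reg_write: int, write_reg: int, write_data: int) -> None:
        # Only the register selected by write_reg is enabled, and only if RegWrite == 1;
        # register 0 ($zero) ignores writes
        if reg_write and write_reg != 0:
            self.regs[write_reg] = write_data & 0xFFFFFFFF

rf = RegisterFile()
rf.write(1, 17, 10)   # $s1 = 10
rf.write(1, 0, 99)    # writes to $zero are ignored
rf.write(0, 18, 5)    # RegWrite = 0: no update
print(rf.read(17, 0), rf.read(18, 17))  # (10, 0) (0, 10)
```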

Arithmetic/Logic Unit

Function    Ainvert   Binvert   Operation
a AND b     0         0         00
a OR b      0         0         01
a NOR b     1         1         00
a + b       0         0         10
a - b       0         1         10
slt a, b    0         1         11
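The table can be turned into a behavioral Python model. One assumption is needed that the table does not state: Binvert also supplies the carry-in to the adder, so that a + ~b + 1 yields a - b. The slt case uses the sign of a - b corrected for overflow, which gives a signed comparison; the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF

def alu_by_control_bits(a: int, b: int, ainvert: int, binvert: int, operation: int) -> int:
    """32-bit ALU whose function is chosen by (Ainvert, Binvert, Operation)."""
    a &= MASK32
    b &= MASK32
    a_in = (~a & MASK32) if ainvert else a
    b_in = (~b & MASK32) if binvert else b
    carry_in = binvert  # assumption: Binvert doubles as the adder's carry-in
    total = (a_in + b_in + carry_in) & MASK32  # with Binvert = 1 this is a - b
    if operation == 0b00:
        return a_in & b_in  # AND (NOR when both inputs are inverted)
    if operation == 0b01:
        return a_in | b_in  # OR
    if operation == 0b10:
        return total        # add or subtract
    if operation == 0b11:
        # slt: sign bit of a - b, flipped if the subtraction overflowed
        sign = total >> 31
        a_sign, b_sign = a >> 31, b >> 31
        overflow = int(a_sign != b_sign and sign != a_sign)
        return sign ^ overflow
    raise ValueError(f"unsupported Operation {operation:02b}")

print(alu_by_control_bits(6, 3, 0, 1, 0b10))  # 3
print(alu_by_control_bits(3, 5, 0, 1, 0b11))  # 1
```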

R-Format Instructions

• R-format instructions (add, sub, slt, and, or)
  – Perform the operation (specified by op and funct) on the values in rs and rt
  – Store the result back into the Register File (into register rd)
  – Note that the Register File is not written every cycle (e.g., sw), so we need an explicit write control signal for the Register File

[Figure: R-type datapath: Register File (Read Addr 1, Read Addr 2, Write Addr, Write Data; Read Data 1, Read Data 2), ALU (ALU control; outputs zero and overflow), and the RegWrite control signal]

R-type format:

op      rs      rt      rd      shamt   funct
31:26   25:21   20:16   15:11   10:6    5:0
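Extracting the R-type fields is just shifting and masking, with the bit ranges noted in the comments. A Python sketch (the function name is hypothetical), using the encoding of `add $t0, $s1, $s2` ($t0 = register 8, $s1 = 17, $s2 = 18) as the example:

```python
def decode_r_type(word: int) -> dict[str, int]:
    """Split a 32-bit R-type instruction into its fields."""
    return {
        "op": (word >> 26) & 0x3F,     # bits 31:26
        "rs": (word >> 21) & 0x1F,     # bits 25:21
        "rt": (word >> 16) & 0x1F,     # bits 20:16
        "rd": (word >> 11) & 0x1F,     # bits 15:11
        "shamt": (word >> 6) & 0x1F,   # bits 10:6
        "funct": word & 0x3F,          # bits 5:0
    }

# add $t0, $s1, $s2
print(decode_r_type(0x02324020))
# {'op': 0, 'rs': 17, 'rt': 18, 'rd': 8, 'shamt': 0, 'funct': 32}
```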

Load/Store Instructions

• Read register operands
• Calculate address using 16-bit offset
  – Use ALU, but sign-extend offset
• Load: Read memory and update register
• Store: Write register value to memory

Load/Store Instructions

• Load and store instructions involve
  – computing the memory address by adding the base register (read from the Register File during decode) to the 16-bit sign-extended offset field in the instruction
  – for a store, writing the value (read from the Register File during decode) to the Data Memory
  – for a load, writing the value read from the Data Memory to the Register File

[Figure: load/store datapath: Register File (Read Addr 1/2, Write Addr, Write Data; Read Data 1/2), ALU (ALU control; outputs zero and overflow), RegWrite, Sign Extend (16 → 32 bits), and Data Memory (Address, Write Data, Read Data) controlled by MemWrite and MemRead]
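The address calculation and the two memory operations can be sketched as follows. The data memory here is a Python dict keyed by byte address, a stand-in chosen for brevity; `store_word` and `load_word` are hypothetical helpers that only model word accesses.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(imm: int) -> int:
    """Replicate bit 15 into bits 31:16, producing a 32-bit two's-complement pattern."""
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if imm & 0x8000 else imm

def effective_address(base: int, imm16: int) -> int:
    # The ALU adds the base register to the sign-extended offset
    return (base + sign_extend16(imm16)) & MASK32

data_memory: dict[int, int] = {}  # hypothetical data memory keyed by byte address

def store_word(base: int, imm16: int, value: int) -> None:
    data_memory[effective_address(base, imm16)] = value & MASK32

def load_word(base: int, imm16: int) -> int:
    return data_memory.get(effective_address(base, imm16), 0)

store_word(0x1000, 0xFFFC, 42)  # offset -4: address 0x0FFC
print(hex(effective_address(0x1000, 0xFFFC)), load_word(0x0FF8, 4))  # 0xffc 42
```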

Branch Instructions

• Read register operands

• Compare operands
  – Use ALU, subtract and check Zero output

• Calculate target address
  – Sign-extend displacement
  – Shift left 2 places (word displacement)
  – Add to PC + 4
    • Already calculated by instruction fetch
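Putting the three steps together, a sketch of how the next PC is chosen for `beq` (function and parameter names are hypothetical; values are treated as 32-bit unsigned patterns):

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(imm: int) -> int:
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if imm & 0x8000 else imm

def beq_next_pc(pc: int, rs_value: int, rt_value: int, imm16: int) -> int:
    pc_plus_4 = (pc + 4) & MASK32                   # already computed by instruction fetch
    zero = ((rs_value - rt_value) & MASK32) == 0    # ALU subtracts; check the Zero output
    offset = sign_extend16(imm16) << 2              # word displacement -> byte offset
    target = (pc_plus_4 + offset) & MASK32
    return target if zero else pc_plus_4

print(hex(beq_next_pc(0x00400010, 7, 7, 0xFFFD)))  # 0x400008 (taken, offset -3 words)
print(hex(beq_next_pc(0x00400010, 7, 8, 0xFFFD)))  # 0x400014 (not taken)
```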

Branch Instructions

[Figure: branch datapath; callouts: the shift-left-2 unit just re-routes wires, and sign extension replicates the sign-bit wire]

Branch Instructions

• Branch instructions involve
  – comparing the operands read from the Register File during decode for equality (ALU zero output)
  – computing the branch target address by adding the updated PC (PC + 4) to the 16-bit sign-extended offset field in the instruction, shifted left by 2

[Figure: branch datapath: Register File (Read Addr 1/2, Write Addr, Write Data; Read Data 1/2) feeding the ALU (ALU control; zero output to branch control logic); Sign Extend (16 → 32 bits) and Shift left 2 feed an adder that combines the offset with PC + 4 (from the PC and a +4 adder) to form the branch target address]

Composing the Elements

• First-cut datapath does an instruction in one clock cycle
  – Each datapath element can only do one function at a time
  – Hence, we need separate instruction and data memories

• Use multiplexers where alternate data sources are used for different instructions

R-Type/Load/Store Datapath

Full Datapath

Control Logic Design

Datapath With Control

ALU Control

• ALU used for
  – Load/Store: F = add
  – Branch: F = subtract
  – R-type: F depends on funct field

ALU control   Function
0000          AND
0001          OR
0010          add
0110          subtract
0111          set-on-less-than
1100          NOR

ALU Control

• Assume 2-bit ALUOp derived from opcode
  – Combinational logic derives ALU control

opcode   ALUOp   Operation          funct    ALU function       ALU control
lw       00      load word          XXXXXX   add                0010
sw       00      store word         XXXXXX   add                0010
beq      01      branch equal       XXXXXX   subtract           0110
R-type   10      add                100000   add                0010
                 subtract           100010   subtract           0110
                 AND                100100   AND                0000
                 OR                 100101   OR                 0001
                 set-on-less-than   101010   set-on-less-than   0111
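Because the table is a pure function of ALUOp and funct, ALU control is combinational logic. A Python sketch that encodes the table directly; raising an error for ALUOp = 11 is a choice made here, since that encoding does not appear in the table.

```python
# ALU control encodings (4-bit values)
ALU_AND, ALU_OR, ALU_ADD, ALU_SUB, ALU_SLT = 0b0000, 0b0001, 0b0010, 0b0110, 0b0111

# funct field -> ALU control, consulted only when ALUOp = 10 (R-type)
FUNCT_TO_ALU_CONTROL = {
    0b100000: ALU_ADD,  # add
    0b100010: ALU_SUB,  # subtract
    0b100100: ALU_AND,  # AND
    0b100101: ALU_OR,   # OR
    0b101010: ALU_SLT,  # set-on-less-than
}

def alu_control(alu_op: int, funct: int) -> int:
    """Combinational mapping from (2-bit ALUOp, 6-bit funct) to the 4-bit ALU control."""
    if alu_op == 0b00:  # lw/sw: address calculation, funct is don't-care
        return ALU_ADD
    if alu_op == 0b01:  # beq: compare by subtracting, funct is don't-care
        return ALU_SUB
    if alu_op == 0b10:  # R-type: the operation comes from funct
        return FUNCT_TO_ALU_CONTROL[funct]
    raise ValueError(f"unsupported ALUOp {alu_op:02b}")

print(f"{alu_control(0b10, 0b101010):04b}")  # 0111
```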

The Main Control Unit

• Control signals derived from instruction

R-type       0          rs      rt      rd      shamt   funct
             31:26      25:21   20:16   15:11   10:6    5:0

Load/Store   35 or 43   rs      rt      address
             31:26      25:21   20:16   15:0

Branch       4          rs      rt      address
             31:26      25:21   20:16   15:0

Field roles:
  – opcode (31:26)
  – rs (25:21): always read
  – rt (20:16): read, except for load
  – destination (rd for R-type, rt for load): written for R-type and load
  – address (15:0): sign-extend and add
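A sketch of the main control unit as a lookup from opcode to control signals. The signal names RegDst, ALUSrc, MemtoReg, and Branch are not named in the slide text; they follow the usual P&H single-cycle design and should be treated as an assumption, as should setting don't-care outputs to 0.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Control:
    reg_dst: int     # 1: write register is rd (R-type); 0: rt (load)
    alu_src: int     # 1: second ALU operand is the sign-extended offset
    mem_to_reg: int  # 1: register write data comes from data memory
    reg_write: int
    mem_read: int
    mem_write: int
    branch: int
    alu_op: int      # 2-bit ALUOp passed on to ALU control

# Don't-care outputs (X) are set to 0 here for simplicity
MAIN_CONTROL = {
    0:  Control(1, 0, 0, 1, 0, 0, 0, 0b10),  # R-type
    35: Control(0, 1, 1, 1, 1, 0, 0, 0b00),  # lw
    43: Control(0, 1, 0, 0, 0, 1, 0, 0b00),  # sw
    4:  Control(0, 0, 0, 0, 0, 0, 1, 0b01),  # beq
}

def main_control(opcode: int) -> Control:
    return MAIN_CONTROL[opcode]

print(main_control(35))
# Control(reg_dst=0, alu_src=1, mem_to_reg=1, reg_write=1, mem_read=1, mem_write=0, branch=0, alu_op=0)
```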

R-Type Instruction

Load Instruction

Branch-on-Equal Instruction

Implementing Jumps

• Jump uses word address

• Update PC with concatenation of
  – Top 4 bits of old PC (strictly, of PC + 4)
  – 26-bit jump address
  – 00

• Need an extra control signal decoded from opcode

Jump    2        address
        31:26    25:0
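A sketch of the jump target computation. It uses the upper 4 bits of PC + 4, which is what MIPS specifies; these match the old PC's top bits except when PC + 4 crosses into a new 256 MB region. The function name and example instruction are hypothetical.

```python
def jump_target(pc: int, instruction: int) -> int:
    """New PC = (PC + 4)[31:28] concatenated with address[25:0] and 00."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    address26 = instruction & 0x03FFFFFF  # bits 25:0 of the j instruction
    return (pc_plus_4 & 0xF0000000) | (address26 << 2)

# j with word address 0x0100004 (opcode 2 in bits 31:26)
j_instr = (2 << 26) | 0x0100004
print(hex(jump_target(0x00400000, j_instr)))  # 0x400010
```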

Datapath With Jumps Added

Performance Issues

• Longest delay determines clock period
  – Critical path: load instruction
  – Instruction memory → register file → ALU → data memory → register file

• Not feasible to vary period for different instructions

• Violates design principle– Making the common case fast

• We will improve performance by pipelining
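To see why the load sets the period, assume illustrative latencies: 200 ps for instruction memory, the ALU, and data memory, and 100 ps for a register-file read or write. These numbers are assumptions, not measurements, and mux, adder, and setup delays are ignored. Summing each instruction class's path gives its delay, and a single-cycle clock must fit the largest:

```python
# Hypothetical per-unit latencies in picoseconds (illustrative only)
DELAY_PS = {"imem": 200, "reg_read": 100, "alu": 200, "dmem": 200, "reg_write": 100}

# Units each instruction class passes through, in order
PATHS = {
    "lw":     ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "sw":     ["imem", "reg_read", "alu", "dmem"],
    "R-type": ["imem", "reg_read", "alu", "reg_write"],
    "beq":    ["imem", "reg_read", "alu"],
}

path_delay = {name: sum(DELAY_PS[u] for u in units) for name, units in PATHS.items()}
clock_period = max(path_delay.values())  # one fixed period for every instruction
critical = max(path_delay, key=path_delay.get)

print(path_delay)              # {'lw': 800, 'sw': 700, 'R-type': 600, 'beq': 500}
print(critical, clock_period)  # lw 800
```

Under these numbers, beq needs only 500 ps but still occupies a full 800 ps cycle, which is the inefficiency pipelining addresses.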