56
ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and Control Benjamin Lee Slides based on those from Andrew Hilton (Duke), Alvy Lebeck (Duke) Benjamin Lee (Duke), and Amir Roth (Penn)

ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

ECE 250 / CPS 250 Computer Architecture

Processor Design

Datapath and Control

Benjamin Lee Slides based on those from

Andrew Hilton (Duke), Alvy Lebeck (Duke) Benjamin Lee (Duke), and Amir Roth (Penn)

Page 2: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 2

Where We Are in This Course Right Now

•  So far: •  We know what a computer architecture is •  We know what kinds of instructions it might execute •  We know how to perform arithmetic and logic in an ALU

•  Now: •  We learn how to design a processor in which the ALU is just one

component •  Processor must be able to fetch instructions, decode them, and

execute them •  There are many ways to do this, even for a given ISA

•  Next: •  We learn how to design memory systems

Page 3: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 3

This Unit: Processor Design

•  Datapath components and timing •  Registers and register files •  Memories (RAMs)

•  Mapping an ISA to a datapath •  Control •  Exceptions

Application

OS

Firmware Compiler

CPU I/O

Memory

Digital Circuits

Gates & Transistors

Page 4: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 4

Readings

•  Patterson and Hennessy •  Chapter 4: Sections 4.1-4.4

•  Read this chapter carefully •  It has many more examples than I can cover in class

Page 5: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 5

So You Have an ALU…

•  Important reminder: a processor is just a big finite state machine (FSM) that interprets some ISA

•  Start with one instruction add $3,$2,$4

•  ALU performs just a small part of execution of instruction •  You have to read and write registers •  You have have to fetch the instruction to begin with

•  What about loads and stores? •  Need some sort of memory interface

•  What about branches? •  Need some hardware for that, too

Page 6: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 6

Datapath and Control

•  Fetch: get instruction, translate into control •  Control: which registers read/write, which ALU operation •  Datapath: registers, memories, ALUs (computation) •  Processor Cycle: Fetch → Decode → Execute

PC Insn

memory Register

File Data

Memory

control

datapath

fetch

Page 7: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 7

Building a Processor for an ISA

•  Fetch is pretty straightforward •  Just need a register (called the Program Counter or PC) to hold

the next address to fetch from instruction memory •  Provide address to instruction memory ! instruction memory

provides instruction at that address

•  Let�s start with the datapath 1.  Look at ISA 2.  Make sure datapath can implement every instruction

Page 8: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 8

Datapath for MIPS ISA

•  Consider only the following instructions add $1,$2,$3 addi $1,2,$3 lw $1,4($3) sw $1,4($3) beq $1,$2,PC_relative_target j Absolute_target

•  Why only these? •  Most other instructions are similar from datapath viewpoint •  I leave the ones that aren�t for you to figure out

Page 9: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 9

Review: A Register

•  Register: DFF array with shared clock, write-enable (WE) •  Notice: both a clock and a WE (DFFWE = clock & registerWE) •  Convention I: clock represented by wedge •  Convention II: if no WE, DFF is written on every clock

DFF

DFF

DFF

D0

DN-1

D1

CLK WE

Q0

Q1

QN-1

D Q N N

WE

Page 10: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 10

Uses of Registers: Program Counter

•  A single register is good for some things •  PC: program counter •  Other things which aren�t the ISA registers (more later in

semester)

PC Insn

memory Register

File Data

Memory

control

datapath

fetch

Page 11: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 11

Uses of Registers: Architected Registers

•  Register file: the ISA (�architectural�, �visible�) registers •  Two read �ports� + one write �port� •  Maximum number of reads/writes in single instruction (R-type)

•  Port: wires for accessing an array of data •  Data bus: width of data element (MIPS: 32 bits) •  Address bus: width of log2 number of elements (MIPS: 5 bits) •  Write enable: if it�s a write port •  M ports = M parallel and independent accesses

Register File

RS1VAL

RS2VAL

RDVAL

RD WE RS1 RS2 RD = dest reg

RS = source reg

Page 12: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 12

A Register File With Four Registers

Page 13: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 13

Add a Read Port for RS1

•  Output of each register into 4to1 mux (RS1VAL) •  RS1 is select input of RS1VAL mux

RS1

RS1VAL

Page 14: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 14

Add Another Read Port for RS2

•  Output of each register into another 4to1 mux (RS2VAL) •  RS2 is select input of RS2VAL mux

RS1

RS1VAL

RS2VAL

RS2

Page 15: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 15

Add a Write Port for RD

•  Input RDVAL into each register •  Enable only one register�s WE: (Decoded RD) & (WE)

•  What if we needed two write ports?

RS1

RS1VAL

RS2VAL

RS2 RD WE

RDVAL

2-to-1 decoder

Page 16: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 16

Another Read Port Implementation

•  A read port that uses muxes is fine for 4 registers •  Not so good for 32 registers (32-to-1 mux is very slow)

•  Alternative implementation uses tri-state buffers •  Truth table (E = enable, D = input, Q = output)

E D → Q 1 D → D 0 D → Z

•  Z: �high impedance� state, no current flowing

•  Mux: connect multiple tri-stated buses to one output bus •  Key: only one input �driving� at any time, all others must be in �Z�

•  Else, all hell breaks loose (electrically)

D Q

E

Page 17: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 17

Register File With Tri-State Read Ports

RS2 RS1 RD WE

RDVAL RS1VAL

RS2VAL

Page 18: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 18

Another Useful Component: Memory

•  Memory: where instructions and data reside •  One address bus •  One input data bus for writes •  One output data bus for reads

•  Actually, a more traditional definition of memory is •  One input/output data bus •  No clock ! asynchronous �strobe� instead

Memory

DATAOUT DATAIN

WE

ADDRESS

Page 19: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 19

Let�s Build A MIPS-like Datapath

Page 20: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 20

Start With Fetch

•  PC and instruction memory •  A +4 incrementer computes default next instruction PC

•  Why +4 (and not +1)?

P C

Insn Mem

+ 4

Page 21: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 21

First Instruction: add $rd, $rs, $rt

•  Add register file and ALU

P C

Insn Mem

Register File

Op(6) rs(5) rt(5) rd(5) Sh(5) Func(6) R-type

s1 s2 d

+ 4

rs

rt rs + rt

WE

Page 22: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 22

Second Instruction: addi $rt, $rs, imm

•  Destination register can now be either rd or rt •  Add sign extension unit and mux into second ALU input

P C

Insn Mem

Register File

S X

Op(6) rs(5) rt(5) I-type Immed(16)

s1 s2 d

+ 4

rs

Extended(imm)

sign extension (sx) unit

Page 23: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 23

Third Instruction: lw $rt, imm($rs)

•  Add data memory, address is ALU output (rs+imm) •  Add register write data mux to select memory output or ALU output

P C

Insn Mem

Register File

S X

Op(6) rs(5) rt(5) I-type Immed(16)

s1 s2 d

Data Mem

a

d

+ 4

?

?

?

?

?

?

Page 24: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 24

Fourth Instruction: sw $rt, imm($rs)

•  Add path from second input register to data memory data input •  Disable RegFile�s WE signal

P C

Insn Mem

Register File

S X

Op(6) rs(5) rt(5) I-type Immed(16)

s1 s2 d

Data Mem

a

d

+ 4

?

Page 25: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 25

Fifth Instruction: beq $1,$2,target

•  Add left shift unit (why?) and adder to compute PC-relative branch target •  Add mux to do what?

P C

Insn Mem

Register File

S X

Op(6) rs(5) rt(5) I-type Immed(16)

s1 s2 d

Data Mem

a

d

+ 4

<< 2

z

Page 26: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 26

Sixth Instruction: j

•  Add shifter to compute left shift of 26-bit immediate •  Add additional PC input mux for jump target

P C

Insn Mem

Register File

S X

Op(6) J-type Immed(26)

s1 s2 d

Data Mem

a

d

+ 4

<< 2

<< 2

Page 27: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 27

Seventh, Eight, Ninth Instructions

•  Are these the paths we would need for all instructions? sll $1,$2,4 // shift left logical

•  Like an arithmetic operation, but need a shifter too slt $1,$2,$3 // set less than (slt)

•  Like subtract, but need to write the condition bits, not the result •  Need zero extension unit for condition bits •  Need additional input to register write data mux

jal absolute_target // jump and link •  Like a jump, but also need to write PC+4 into $ra ($31)

•  Need path from PC+4 adder to register write data mux •  Need to be able to specify $31 as an implicit destination

jr $31 // jump register •  Like a jump, but need path from register read to PC write mux

Page 28: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 28

Clock Timing

•  Must deliver clock(s) to avoid races •  Can�t write and read same value at same clock edge

•  Particularly a problem for RegFile and Memory

•  May create multiple clock edges (from single input clock) by using buffers (to delay clock) and inverters

Page 29: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 29

This Unit: Processor Design

•  Datapath components and timing •  Registers and register files •  Memories (RAMs) •  Clocking strategies

•  Mapping an ISA to a datapath •  Control •  Exceptions

Application

OS

Firmware Compiler

CPU I/O

Memory

Digital Circuits

Gates & Transistors

Page 30: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 30

What Is Control?

•  9 signals control flow of data through this datapath •  MUX selectors, or register/memory write enable signals •  Datapath of current microprocessor has 100s of control signals

P C

Insn Mem

Register File

S X

s1 s2 d

Data Mem

a

d

+ 4

<< 2

<< 2

Rwe

ALUinB

DMwe

JP

ALUop

BR

Rwd

Rdst

Page 31: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 31

Example: Control for add

P C

Insn Mem

Register File

S X

s1 s2 d

Data Mem

a

d

+ 4

<< 2

<< 2

BR=0

JP=0

Rwd=0

DMwe=0 ALUop=0

ALUinB=0 Rdst=1

Rwe=1

Page 32: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 32

Example: Control for sw

•  Difference between a sw and an add is 5 signals •  3 if you don�t count the X (�don�t care�) signals

P C

Insn Mem

Register File

S X

s1 s2 d

Data Mem

a

d

+ 4

<< 2

<< 2

Rwe=0

ALUinB=1

DMwe=1

JP=0

ALUop=0

BR=0

Rwd=X

Rdst=X

Page 33: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 33

Example: Control for beq $1,$2,target

•  Difference between a store and a branch is only 4 signals

P C

Insn Mem

Register File

S X

s1 s2 d

Data Mem

a

d

+ 4

<< 2

<< 2

Rwe=0

ALUinB=0

DMwe=0

JP=0

ALUop=1

BR=1

Rwd=X

Rdst=X

Page 34: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 34

How Is Control Implemented?

P C

Insn Mem

Register File

S X

s1 s2 d

Data Mem

a

d

+ 4

<< 2

<< 2

Rwe

ALUinB

DMwe

JP

ALUop

BR

Rwd

Rdst

Control?

Page 35: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 35

Implementing Control

•  Each instruction has a unique set of control signals •  Most signals are function of opcode •  Some may be encoded in the instruction itself

•  E.g., the ALUop signal is some portion of the MIPS Func field +  Simplifies controller implementation –  Requires careful ISA design

•  Options for implementing control 1.  Use instruction type to look up control signals in a table 2.  Design FSM whose outputs are control signals •  Either way, goal is same: turn instruction into control signals

Page 36: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 36

Control Implementation: ROM

•  ROM (read only memory): like a RAM but unwritable •  Bits in data words are control signals •  Lines indexed by opcode

•  Example: ROM control for our simple datapath

BR JP ALUinB ALUop DMwe Rwe Rdst Rwd

add 0 0 0 0 0 1 1 0

addi 0 0 1 0 0 1 1 0

lw 0 0 1 0 0 1 0 1

sw 0 0 1 0 1 0 0 0

beq 1 0 0 1 0 0 0 0

j 0 1 0 0 0 0 0 0

opcode

Page 37: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 37

ROM vs. Combinational Logic

•  A control ROM is fine for 6 insns and 9 control signals •  A real machine has 100+ insns and 300+ control signals

•  Even �RISC�s have lots of instructions •  30,000+ control bits (~4KB) –  Not huge, but hard to make fast

•  Control must be faster than datapath

•  Alternative: combinational logic •  Exploits observation: many signals have few 1s or few 0s

Page 38: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 38

ALUinB

Control Implementation: Combinational Logic

•  Example: combinational logic control for our simple datapath

opcode add addi lw sw beq j

BR JP DMwe Rwd Rdst ALUop Rwe

Page 39: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 39

Datapath and Control Timing

P C

Insn Mem

Register File

S X

s1 s2 d

Data Mem

a

d

+ 4

Control (ROM or combinational logic)

Read IMem Read Registers (Read Control ROM)

Read DMEM Write DMEM Write Registers

Write PC

Page 40: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 40

�Single-Cycle� Performance

•  Useful metric: cycles per instruction (CPI) +  Easy to calculate for single-cycle processor: CPI = 1

•  Seconds/program = (insns/program) * 1 CPI * (N seconds/cycle) •  ICQ: How many cycles/second in 3.8 GHz processor?

–  Slow! •  Clock period must be elongated to accommodate longest operation

•  In our datapath: lw •  Goes through five structures in series: insn mem, register file

(read), ALU, data mem, register file again (write) •  No one will buy a machine with a slow clock

•  Later: faster processor cores

Page 41: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 41

This Unit: Processor Design

•  Datapath components and timing •  Registers and register files •  Memories (RAMs) •  Clocking strategies

•  Mapping an ISA to a datapath •  Control •  Exceptions

Application

OS

Firmware Compiler

CPU I/O

Memory

Digital Circuits

Gates & Transistors

Page 42: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 42

Program Execution

•  Threads of Control •  Multiple threads, programs run within a system •  Each thread, program has its own program counter

•  Program Execution •  Fetch instruction from memory with address in PC •  Execute instruction •  Increment PC

•  Begin PC at known location after system start-up •  Load the operating system kernel

Page 43: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 43

Execution Context

•  Program context defined by processor state •  General purpose registers (integer, floating-point) •  Status registers (e.g., condition codes) •  Program counter, stack pointer •  Memory hierarchy

Page 44: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 44

Context Switches

•  Context switches (Motivation) •  Allows programs to share machine, increases machine utilization •  OS schedules, switches between multiple programs •  Permits different execution modes (e.g., user versus OS kernel)

•  Context switches (Mechanism) •  Save current context, restore next context

•  Context switches (Triggers) •  User <-> User: Timeslice for multiple programs (e.g., 10-100ms) •  User <-> OS: Invoke OS to handle external events (e.g., I/O)

Page 45: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 45

Interrupts and Exceptions

•  Exception is external event requiring response •  Clock interrupts for time-slicing, context switches •  I/O operations for network, disk, keyboard, etc. •  “Exception,” “interrupt,” “trap” are used interchangeably

•  Exceptions are infrequent •  Input/Output, illegal instruction, divide-by-zero, page fault,

protection fault, ctrl-C, ctrl-Z, timer

•  Exception handling requires OS •  OS kernel includes instructions for exception handling •  Exception handling is transparent to application code •  Handlers fix & restart (e.g., I/O), or terminate (e.g., illegal insn)

Page 46: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 46

Detecting Exceptions

•  Undefined Instruction •  Detect unknown opcodes

•  Arithmetic Exceptions •  Add logic in ALU to detect overflow •  Add logic in divider to detect divide-by-zero

•  Unaligned Access •  Add circuit to check addresses •  Word-aligned addresses have {00} in the least significant bits

Page 47: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 47

Handling Exceptions

•  Identify interrupt’s cause

•  Invoke OS routine •  i = ID for interrupt’s cause •  PC = interrupt_table[i] •  Kernel initializes table @ boot

•  Clear interrupt signal

•  Return from interrupt handler •  Return to user context

Page 48: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 48

Handling System Calls

•  Special instruction invokes OS •  Read or write I/O devices •  Create new process

•  Invoke OS routine •  i = ID for syscall •  PC = interrupt_table[i] •  Kernel initializes table @ boot

•  Clear interrupt signal

•  Return from interrupt handler •  Return to user context

Page 49: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 49

Exceptions as “Procedure Calls”

1.  Processor saves address of user instruction •  Address of instruction stored in Exception Program Counter (EPC)

2.  Processor transfers control to OS •  Set PC to address of exception handler within OS code

3.  OS executes handler, which resolves exception

4.  OS returns to user program (EPC), or terminates program

Page 50: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 50

Exceptions and Register Support

1.  Exception Program Counter (EPC) •  32-bit register holds address of affected instruction

2.  Cause Register •  32-bit register encodes exception cause

3.  BadVAddr •  32-bit register holds address that triggers memory access exception •  See memory hierarchy, virtual memory for detail

4.  Status •  32-bit register tracks interrupt handling, multiple interrupts, etc.

Page 51: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 51

Exception Handling

•  What does exception handling look like to software? •  When exception happens…

•  Control transfers to OS at pre-specified exception handler address •  OS has privileged access to registers user processes do not see

•  These registers hold information about exception •  Cause of exception (e.g., page fault, arithmetic overflow) •  Other exception info (e.g., address that caused page fault) •  PC of application insn to return to after exception is fixed

•  OS uses privileged (and non-privileged) registers to do its �thing� •  OS returns control to user application

•  Same mechanism available programmatically via SYSCALL

Page 52: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 52

MIPS Exception Handling

•  MIPS uses registers for state during exception handling •  These registers live on �coprocessor 0� •  $14: EPC (holds PC of user program during exception handling) •  $13: exception type (SYSCALL, overflow, etc.) •  $8: virtual address (produced page/protection fault) •  $12: exception mask (which exceptions trigger OS)

•  Access registers with privileged instructions mfc0, mtc0 •  Privileged = user program cannot execute them •  mfc0: move (register) from coprocessor 0 (to user reg) •  mtc0: move (register) to coprocessor 0 (from user reg)

•  Restore user mode with privileged instruction rfe •  Kernel executes this instruction to restore user program

Page 53: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 53

Implementing Exceptions

•  Why do architects care about exceptions? •  Because we use datapath and control to implement them •  More precisely… to implement aspects of exception handling

•  Recognition of exceptions •  Transfer of control to OS •  Privileged OS mode

•  Later in semester, we�ll talk more about exceptions (b/c we need them for I/O)

Page 54: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 54

Datapath with Support for Exceptions

•  Co-processor register (CR) file needn�t be implemented as RF •  Independent registers connected directly to pertinent muxes

•  PSR (processor status register): in privileged mode?

P C

Insn Mem

Register File

S X

s1 s2 d

Data Mem

a

d

+ 4

<< 2

<< 2

I R

B

A

O D

Co-procesor Register File

P S R

ALUinAC

PCwC

CRwd CRwe

PSRs

PSRr

Page 55: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 55

This Unit: Processor Design

•  Datapath components and timing •  Registers and register files •  Memories (RAMs) •  Clocking strategies

•  Mapping an ISA to a datapath •  Control

Application

OS

Firmware Compiler

CPU I/O

Memory

Digital Circuits

Gates & Transistors

Next up: Memory Systems

Page 56: ECE 250 / CPS 250 Computer Architecture - Penn Engineeringleebcc/teachdir/ece250_fall... · 2020. 9. 10. · ECE 250 / CPS 250 Computer Architecture Processor Design Datapath and

© 2013 Daniel J. Sorin from Roth ECE250 56

Summary

•  We now know how to build a fully functional processor

•  But … •  We’re still treating memory as a black box •  Our fully functional processor is slow. Really, really slow.