ECE 4750 Computer Architecture Topic 2: From CISC to RISC Christopher Batten School of Electrical and Computer Engineering Cornell University http://www.csl.cornell.edu/courses/ece4750 slide revision: 2013-09-08-23-34


CPI for Microcoded Machine

Inst 1: 7 cycles
Inst 2: 5 cycles
Inst 3: 10 cycles

• Total clock cycles = 7 + 5 + 10 = 22
• Total instructions = 3
• Clocks per instruction (CPI) = 22 / 3 = 7.33
• CPI is always an average over a large number of instructions
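As a quick check of the arithmetic above, the sketch below computes CPI as total cycles divided by total instructions. The cycle counts are the three from the example; the function name `average_cpi` is our own, not something defined in the course material.

```python
def average_cpi(cycles_per_inst):
    """Return CPI: total clock cycles divided by the number of instructions executed."""
    if not cycles_per_inst:
        raise ValueError("need at least one instruction")
    return sum(cycles_per_inst) / len(cycles_per_inst)


# Cycle counts for Inst 1, Inst 2, Inst 3 on the microcoded machine
microcoded = [7, 5, 10]
cpi = average_cpi(microcoded)
print(f"total cycles = {sum(microcoded)}, CPI = {cpi:.2f}")  # total cycles = 22, CPI = 7.33
```

With only three instructions this is just an illustration; a meaningful CPI would average over a long dynamic instruction stream.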

ECE 4750 T02: From CISC to RISC 2 / 43


“Iron Law” of Processor Performance

$$\frac{\text{Time}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Time}}{\text{Cycle}}$$

• Instructions / program depends on source code, compiler, and ISA
• Cycles / instruction (CPI) depends on ISA and microarchitecture
• Time / cycle depends on microarchitecture and implementation

Microarchitecture                        CPI   Cycle Time
Microcoded (last topic)                  >1    short
Single-Cycle Unpipelined (this topic)    1     long
Multi-Cycle Unpipelined (this topic)     >1    short
Pipelined (next topic)                   ≈1    short
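A small sketch of the Iron Law as a function. The CPI and cycle-time values below are hypothetical, chosen only to show that a shorter clock does not automatically win when CPI rises; the instruction count is held fixed because both designs run the same ISA and binary.

```python
from dataclasses import dataclass


@dataclass
class Design:
    name: str
    cpi: float            # average cycles per instruction
    cycle_time_ns: float  # clock period in nanoseconds


def exec_time_ns(instructions: int, d: Design) -> float:
    """Iron Law: Time/Program = Instructions/Program x Cycles/Instruction x Time/Cycle."""
    return instructions * d.cpi * d.cycle_time_ns


# Hypothetical numbers, for illustration only
designs = [
    Design("single-cycle", cpi=1.0, cycle_time_ns=10.0),
    Design("multi-cycle", cpi=4.33, cycle_time_ns=2.5),
]

n_inst = 1_000_000
for d in designs:
    print(f"{d.name:>12}: {exec_time_ns(n_inst, d) / 1e6:.3f} ms")
```

With these assumed values the multi-cycle design has a 4× shorter clock but ends up slightly slower (about 10.8 ms vs 10.0 ms) because its CPI grew by more than 4×.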


Agenda

Technology Trends Motivating RISC

Memory Basics

Single-Cycle Unpipelined MIPS Processor

Multi-Cycle Unpipelined MIPS Processor


Minicomputers in the 1970’s

Extremely popular VAX-11/780 first available in 1977; often used as a baseline for benchmarking and assumed to have a speed of 1M instructions/second (1 MIPS): 5 MHz, TTL devices

• Implemented with racks of discrete components
• Used microcode to implement a CISC ISA
• Applications in business, scientific, and commercial computing


Microprocessors in the 1970’s

First microprocessor is the Intel 4004, fabricated in 1971 and designed for a desktop printing calculator: 750 kHz, 8–16 cycles/inst, 8 µm PMOS, 2.3K transistors, 12 mm², microcoded control to implement a CISC ISA

• Microprocessors were made possible by new integrated circuit technology
• Constrained by what could fit on a single chip, leading to few-bit datapaths with hardwired control
• Initial application was embedded control
• 8-bit microprocessors were used in hobbyist personal computers
  - Micral, Altair, TRS-80, Apple II
  - Usually had a 16-bit address space (64 KB directly addressable)
  - Simple BASIC interpreter in ROM or on cassette tape


DRAM in the 1970’s

• Dramatic progress in MOSFET memory technology
• 1970 → Intel introduces the first commercial DRAM (Model 1103 w/ 1 Kb)
• 1979 → Fujitsu introduces 64 Kb DRAM
• By the mid-1970s it became obvious that microprocessors would soon have >64 KB of physical memory


VisiCalc as “Killer” App and Eventually the IBM PC


Analyzing Microcoded Machines

• John Cocke and group at IBM
  - Working on a simple pipelined processor, 801, and advanced compilers
  - Ported experimental PL8 compiler to IBM 370, and only used simple register-register and load/store instructions similar to 801
  - Code ran faster than other existing compilers that used all 370 instructions! (up to 6 MIPS, whereas 2 MIPS was considered good before)
• Joel Emer and Douglas Clark at DEC
  - Measured VAX-11/780 using external hardware
  - Found it was actually a 0.5 MIPS machine, not a 1 MIPS machine
  - 20% of VAX instrs = 60% of µcode, but only 0.2% of the dynamic execution
• VAX 8800, high-end VAX in 1984
  - Control store: 16K×147b RAM, Unified Cache: 64K×8b RAM
  - 4.5× more microstore RAM than cache RAM!
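The VAX 8800 storage comparison can be checked with a few lines of arithmetic. This sketch assumes "16K" and "64K" mean 16×1024 and 64×1024 entries.

```python
K = 1024  # assumption: "16K" and "64K" mean 16*1024 and 64*1024 entries

control_store_bits = 16 * K * 147   # 16K x 147b microcode RAM
cache_bits = 64 * K * 8             # 64K x 8b unified cache RAM

ratio = control_store_bits / cache_bits
print(f"control store: {control_store_bits} bits, cache: {cache_bits} bits, ratio = {ratio:.2f}")
```

The ratio comes out to about 4.59, consistent with the roughly 4.5× figure quoted above.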


From CISC to RISC

• Key changes in technology constraints
  - Logic, RAM, ROM all implemented with MOS transistors
  - RAM ≈ same speed as ROM
• Use fast RAM to build a fast instruction cache of user-visible instructions, not fixed hardware microfragments
  - Change contents of fast instruction memory to fit what the application needs
• Use a simple ISA to enable a hardwired, pipelined implementation
  - Most compiled code used only a few of the CISC instructions
  - Simpler encoding allowed pipelined implementations
  - Load/store register-register ISA as opposed to memory-memory ISA
• Further benefit with integration
  - Early 1980s → fit a 32-bit datapath and small caches on a single chip
  - No chip crossing in the common case allows faster operation


From CISC to RISC

[Figure: vertical µcode controller (µPC indexing a ROM for µinstructions, with a small decoder) compared with a RISC controller (user PC indexing a RAM instruction cache, with a "larger" decoder)]


Berkeley RISC Chips

RISC-I, fabricated in 1982 under the direction of David Patterson, was probably the first VLSI RISC processor: 1 MHz, 5 µm NMOS, 44.5K transistors, 77 mm²

RISC-II was the 1983 follow-up with several improvements: 3 MHz, 3 µm NMOS, 40.7K transistors, 60 mm²


Stanford MIPS Chips

First MIPS prototype fabricated in 1984 under the direction of John Hennessy; MIPS-X was the 1986 follow-up: 5-stage, 20 MHz, 2 µm 2-layer CMOS

John Hennessy leaves Stanford to form MIPS Computer Systems, and their first chip is the MIPS R2000 in 1986: 8–15 MHz, 2 µm 2-layer CMOS, 110K transistors, 80 mm²


MIPS vs. VAX

[Figure: ratio of MIPS to VAX on the benchmarks spice, matrix, nasa7, fpppp, tomcatv, doduc, espresso, eqntott, and li, plotting the performance ratio, instructions executed ratio, and CPI ratio. Source: H&P, Appendix J, from Bhandarkar and Clark, 1991]

• MIPS executes about 2× more instructions
• MIPS has about 6× lower CPI
• Net result: 2-4× higher performance
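The three ratios in this comparison are tied together by the Iron Law. The sketch below uses the rounded figures (2× instructions, 6× lower CPI) and assumes equal cycle times for the two machines, which is a simplification made only for illustration.

```python
def speedup(inst_ratio: float, cpi_ratio: float, cycle_ratio: float = 1.0) -> float:
    """Performance of machine B relative to machine A via the Iron Law.

    inst_ratio  = instructions executed by B / instructions executed by A
    cpi_ratio   = CPI of A / CPI of B  (how many times lower B's CPI is)
    cycle_ratio = cycle time of A / cycle time of B
    """
    # time_A / time_B = (I_A * CPI_A * T_A) / (I_B * CPI_B * T_B)
    return cpi_ratio * cycle_ratio / inst_ratio


# A = VAX, B = MIPS; equal cycle times assumed
print(speedup(inst_ratio=2.0, cpi_ratio=6.0))  # 3.0
```

A 3× speedup falls inside the 2-4× range observed; per-benchmark variation in both ratios accounts for the spread.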


CISC/RISC Convergence

[Figure: excerpt from Microprocessor Report, Vol. 8, No. 14, October 24, 1994 (1994 Microprocessor Forum), © 1994 MicroDesign Resources: "MIPS R10000 Uses Decoupled Architecture: High-Performance Core Will Drive MIPS High-End for Years," by Linley Gwennap]

Not to be left out in the move to the next generation of RISC, MIPS Technologies (MTI) unveiled the design of the R10000, also known as T5. As the spiritual successor to the R4000, the new design will be the basis of high-end MIPS processors for some time, at least until 1997. By swapping superpipelining for an aggressively out-of-order superscalar design, the R10000 has the potential to deliver high performance throughout that period.

The new processor uses deep queues to decouple the instruction fetch logic from the execution units. Instructions that are ready to execute can jump ahead of those waiting for operands, increasing the utilization of the execution units. This technique, known as out-of-order execution, has been used in PowerPC processors for some time (see 081402.PDF), but the new MIPS design is the most aggressive implementation yet, allowing more instructions to be queued than any of its competitors.

Taking advantage of its experience with the 200-MHz R4400, MTI was able to streamline the design and expects it to run at a high clock rate. Speaking at the Microprocessor Forum, MTI’s Chris Rowen said that the first R10000 processors will reach a speed of 200 MHz, 50% faster than the PowerPC 620. At this speed, he expects performance in excess of 300 SPECint92 and 600 SPECfp92, challenging Digital’s 21164 for the performance lead. Due to schedule slips, however, the R10000 has not yet taped out; we do not expect volume shipments until 4Q95, by which time Digital may enhance the performance of its processor.

Speculative Execution Beyond Branches

The front end of the processor is responsible for maintaining a continuous flow of instructions into the queues, despite problems caused by branches and cache misses. As Figure 1 shows, the chip uses a two-way set-associative instruction cache of 32K. Like other highly superscalar designs, the R10000 predecodes instructions as they are loaded into this cache, which holds four extra bits per instruction. These bits reduce the time needed to determine the appropriate queue for each instruction.

The processor fetches four instructions per cycle from the cache and decodes them. If a branch is discovered, it is immediately predicted; if it is predicted taken, the target address is sent to the instruction cache, redirecting the fetch stream. Because of the one cycle needed to decode the branch, taken branches create a “bubble” in the fetch stream; the deep queues, however, generally prevent this bubble from delaying the execution pipeline.

The sequential instructions that are loaded during this extra cycle are not discarded but are saved in a “resume” cache. If the branch is later determined to have been mispredicted, the sequential instructions are reloaded from the resume cache, reducing the mispredicted branch penalty by one cycle. The resume cache has four entries of four instructions each, allowing speculative execution beyond four branches.

The R10000 design uses the standard two-bit Smith method to predict

[Figure 1 of the article: "The R10000 uses deep instruction queues to decouple the instruction fetch logic from the five function units." Block diagram: PC unit, predecode unit, 32K two-way associative instruction cache, 8-entry ITLB, 512 × 2 BHT, and resume cache feed decode/map/dispatch (with active list and map table), which sends 4 instructions per cycle into a 16-entry memory queue, 16-entry integer queue, and 16-entry FP queue; function units are ALU1, ALU2, address adder, FP adder, and FP multiplier (with divide and square-root units); 64 × 64-bit integer and FP register files; 64-entry main TLB; 32K two-way associative data cache; L2 cache interface (128-bit) to 512K-16M of tag and data SRAM; Avalanche bus system interface (64-bit address/data)]

• MIPS R10K uses a sophisticated out-of-order engine; the branch delay slot is no longer useful
  – Gwennap, MPR, 1994
• Intel Nehalem frontend breaks x86 CISC instructions into smaller RISC-like µops; a µcode engine handles rarely used complex instructions
  – Kanter, Real World Technologies, 2009


Register File with Combinational Read

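[Figure: 2R+1W register file with read selects ReadSel1 (rs1) and ReadSel2 (rs2) producing ReadData1 (rd1) and ReadData2 (rd2), plus write select WriteSel (ws), WriteData (wd), write enable WE (we), and Clock. A single register is a row of flip-flops (D0…Dn-1 → Q0…Qn-1) sharing a clock enable]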


Register File Implementation

[Figure: 32-entry register file (reg 0 … reg 31) with 5-bit selects rs1, rs2, and ws; 32-bit read data rd1, rd2 and write data wd; write enable we and clk]

• Register files with a large number of ports are difficult to implement
• Almost all MIPS instructions have at most two register source operands, so two read ports suffice
• Intel’s Itanium general-purpose register file has 128 registers with 8 read ports and 4 write ports!
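A behavioral sketch of the 32 × 32-bit, 2-read/1-write register file described above: reads return the current contents immediately (combinational), while the single write takes effect only when `clock_edge()` is called with write enable high. Treating R0 as hardwired to zero follows the MIPS ISA; the class and method names are our own.

```python
from typing import Tuple


class RegisterFile:
    """32 x 32-bit register file: 2 combinational read ports, 1 clocked write port."""

    def __init__(self, nregs: int = 32, width: int = 32):
        self.regs = [0] * nregs
        self.mask = (1 << width) - 1

    def read(self, rs1: int, rs2: int) -> Tuple[int, int]:
        # Two independent read ports (rd1, rd2), available in the same cycle
        return self.regs[rs1], self.regs[rs2]

    def clock_edge(self, we: bool, ws: int, wd: int) -> None:
        # Single write port: update on the rising edge only if we is high; R0 stays zero
        if we and ws != 0:
            self.regs[ws] = wd & self.mask


rf = RegisterFile()
rf.clock_edge(we=True, ws=5, wd=42)
rf.clock_edge(we=True, ws=0, wd=99)   # dropped: R0 is hardwired to zero
rf.clock_edge(we=False, ws=6, wd=7)   # dropped: write enable is low
print(rf.read(5, 0))  # (42, 0)
```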


“Magic” Memory Model

[Figure: "magic" RAM with Address, WriteData, WriteEnable, and Clock inputs and a ReadData output]

• Read is combinational
• Write is performed at the rising clock edge if enabled
• Write address must be stable at the clock edge
• Later we will consider using more realistic memory
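A minimal model of the "magic" memory, assuming word addressing for simplicity: `read()` returns data in the same cycle, and `clock_edge()` stands in for the rising edge at which an enabled write is committed.

```python
class MagicMemory:
    """Idealized memory: combinational read, write on the rising clock edge if enabled."""

    def __init__(self, size_words: int):
        self.words = [0] * size_words

    def read(self, addr: int) -> int:
        # Combinational: data for addr is available in the same cycle
        return self.words[addr]

    def clock_edge(self, write_enable: bool, addr: int, write_data: int) -> None:
        # Address and data are assumed stable when the edge arrives
        if write_enable:
            self.words[addr] = write_data


mem = MagicMemory(16)
mem.clock_edge(True, 3, 0xABCD)
print(hex(mem.read(3)))  # 0xabcd
```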


More Realistic Memory Model

[Figure: SRAM with Address, WriteData, WriteEnable, and Clock inputs and a ReadData output]

• Synchronous operation
• Read data ready next cycle
• Read/write data buses share single internal bit lines

[Figures: simplified SRAM read and simplified SRAM write]
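By contrast with the magic memory, a synchronous SRAM registers its read: data requested in one cycle appears only after the next clock edge. This sketch assumes word addressing and a single shared port (one read or one write per cycle), mirroring the shared bit lines noted above.

```python
class SyncSRAM:
    """Synchronous single-port SRAM: a read issued in one cycle is visible in the next."""

    def __init__(self, size_words: int):
        self.words = [0] * size_words
        self.read_data = None  # output register, valid one cycle after the request

    def clock_edge(self, addr: int, write_enable: bool = False, write_data: int = 0) -> None:
        # Single port: each cycle performs either a write or a read, never both
        if write_enable:
            self.words[addr] = write_data
        else:
            self.read_data = self.words[addr]


sram = SyncSRAM(16)
sram.clock_edge(addr=2, write_enable=True, write_data=7)  # cycle 0: write
sram.clock_edge(addr=2)                                    # cycle 1: read request
print(sram.read_data)                                      # 7, usable in cycle 2
```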


MIPS Instruction Formats

Format    31-26    25-21   20-16   15-11   10-6   5-0     Semantics
ALU       0        rs      rt      rd      0      func    R[rd] ← R[rs] func R[rt]
ALUI      opcode   rs      rt      immediate (16)         R[rt] ← R[rs] op immediate
LD/ST     opcode   rs      rt      offset (16)            LD: R[rt] ← M[ R[rs] + sext(offset) ]
                                                          ST: M[ R[rs] + sext(offset) ] ← R[rt]
BEQZ      opcode   rs      0       offset (16)            if ( R[rs] == 0 ) PC ← PC+4 + offset*4
JR/JALR   opcode   rs      0       0 (16)                 PC ← R[rs]; JALR also does R[31] ← PC+8
J/JAL     opcode   target (26)                            PC ← jtarg( PC, target ); JAL also does R[31] ← PC+8

Field widths: 6/5/5/5/5/6 bits for ALU; 6/5/5/16 bits for ALUI, LD/ST, BEQZ, and JR/JALR; 6/26 bits for J/JAL
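The field boundaries in the table above can be expressed as bit slices. In this sketch `jtarg` is assumed to follow the standard MIPS J-format rule (upper 4 bits of PC+4 concatenated with target << 2), and bits 10-6, shown as 0 in the table, are named `shamt` as in the full MIPS ISA. The helper names are our own.

```python
def bits(word: int, hi: int, lo: int) -> int:
    """Extract word<hi:lo> (inclusive), as in the field diagrams above."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sext16(value: int) -> int:
    """Sign-extend a 16-bit field to a Python int."""
    return value - (1 << 16) if value & 0x8000 else value


def decode(inst: int) -> dict:
    """Split a 32-bit MIPS instruction into every field used by the formats above."""
    return {
        "opcode": bits(inst, 31, 26),
        "rs": bits(inst, 25, 21),
        "rt": bits(inst, 20, 16),
        "rd": bits(inst, 15, 11),
        "shamt": bits(inst, 10, 6),   # shown as 0 in the ALU format above
        "func": bits(inst, 5, 0),
        "imm": bits(inst, 15, 0),     # immediate / offset field
        "target": bits(inst, 25, 0),  # J/JAL target field
    }


def jtarg(pc: int, target: int) -> int:
    """Assumed standard MIPS rule: upper 4 bits of PC+4, then target << 2."""
    return ((pc + 4) & 0xF000_0000) | (target << 2)


def branch_target(pc: int, offset16: int) -> int:
    """Taken BEQZ target: PC+4 + sext(offset)*4, wrapped to 32 bits."""
    return (pc + 4 + sext16(offset16) * 4) & 0xFFFF_FFFF


f = decode(0x00221821)  # addu $3, $1, $2
print(f["opcode"], f["rs"], f["rt"], f["rd"], hex(f["func"]))  # 0 1 2 3 0x21
```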


Instruction Execution Steps

1. Instruction fetch
2. Decode and register fetch
3. ALU operation
4. Memory operation if required
5. Register write-back if required
• Computation of the next instruction to fetch
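The execution steps above can be sketched as a single-step interpreter for a small subset (ADDU, ADDIU, LW, SW, BEQZ, J). The opcode and func values are the standard MIPS encodings; BEQZ is modeled as MIPS `BEQ rs, $0`, branch delay slots are ignored, and memories are Python dicts keyed by byte address. All of these are simplifying assumptions for illustration.

```python
MASK32 = 0xFFFF_FFFF
# Standard MIPS encodings for the modeled subset; BEQZ is modeled as BEQ rs, $0
OP_SPECIAL, OP_J, OP_BEQ, OP_ADDIU, OP_LW, OP_SW = 0x00, 0x02, 0x04, 0x09, 0x23, 0x2B
FUNC_ADDU = 0x21


def sext16(v: int) -> int:
    """Sign-extend a 16-bit immediate."""
    return v - 0x10000 if v & 0x8000 else v


def step(pc: int, regs: list, imem: dict, dmem: dict) -> int:
    """Execute one instruction and return the next PC (no delay slots)."""
    # 1. Instruction fetch
    inst = imem[pc]
    # 2. Decode and register fetch
    op, rs, rt, rd = inst >> 26, (inst >> 21) & 0x1F, (inst >> 16) & 0x1F, (inst >> 11) & 0x1F
    func, imm = inst & 0x3F, inst & 0xFFFF
    a, b = regs[rs], regs[rt]
    next_pc = (pc + 4) & MASK32
    wb_reg, wb_val = 0, 0  # destination 0 means "no write-back"
    # 3-5. ALU operation, memory operation, and write-back as the opcode requires
    if op == OP_SPECIAL and func == FUNC_ADDU:
        wb_reg, wb_val = rd, (a + b) & MASK32
    elif op == OP_ADDIU:
        wb_reg, wb_val = rt, (a + sext16(imm)) & MASK32
    elif op == OP_LW:
        wb_reg, wb_val = rt, dmem.get((a + sext16(imm)) & MASK32, 0)
    elif op == OP_SW:
        dmem[(a + sext16(imm)) & MASK32] = b
    elif op == OP_BEQ and rt == 0:  # BEQZ: branch if R[rs] == 0
        if a == 0:
            next_pc = (pc + 4 + sext16(imm) * 4) & MASK32
    elif op == OP_J:
        next_pc = ((pc + 4) & 0xF000_0000) | ((inst & 0x03FF_FFFF) << 2)
    else:
        raise NotImplementedError(f"instruction {inst:#010x} not modeled")
    if wb_reg != 0:  # R0 is hardwired to zero, so writes to it are dropped
        regs[wb_reg] = wb_val
    # Next-instruction computation
    return next_pc


imem = {
    0x00: 0x24010003,  # addiu $1, $0, 3
    0x04: 0x24020004,  # addiu $2, $0, 4
    0x08: 0x00221821,  # addu  $3, $1, $2
    0x0C: 0xAC030100,  # sw    $3, 0x100($0)
    0x10: 0x8C040100,  # lw    $4, 0x100($0)
}
regs, dmem, pc = [0] * 32, {}, 0
while pc in imem:  # stop when the PC runs off the end of the program
    pc = step(pc, regs, imem, dmem)
print(regs[3], regs[4], dmem[0x100])  # 7 7 7
```

In a single-cycle machine all five steps of one `step()` call happen within one clock period; the multi-cycle design spreads them over several.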


Datapath: ALU Reg-Reg Instructions (ADDU)

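[Figure: single-cycle datapath for ADDU. The PC addresses the instruction memory, and an adder computes PC + 0x4. inst<25:21> and inst<20:16> drive GPR read selects rs1 and rs2, and inst<15:11> drives the write select ws. The ALU combines rd1 and rd2, and its result drives wd. ALU control decodes the OpCode and inst<5:0>; RegWrite drives we]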


Datapath: ALUI Reg-Imm Instructions (ADDIU)

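[Figure: single-cycle datapath for ADDIU. inst<15:0> passes through the immediate extender (controlled by ExtSel) to the ALU's second input; inst<25:21> drives rs1, inst<20:16> drives the write select ws, and ALU control decodes inst<31:26> (OpCode). PC, PC + 0x4 adder, instruction memory, and RegWrite as before]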


Address Conflicts in Merged Datapath with Muxes

[Figure: the ADDU and ADDIU datapaths merged, showing the conflicts that require muxes: the GPR write select can come from inst<15:11> (rd) or inst<20:16> (rt), the ALU's second input can come from rd2 or the extended immediate inst<15:0>, and ALU control can decode inst<31:26> or inst<5:0>]


Datapath: ALU and ALUI Instructions

[Figure: merged ALU/ALUI datapath with muxes resolving the conflicts: BSrc selects Reg / Imm for the ALU's second input, RegDst selects rt (<20:16>) / rd (<15:11>) as the write select, and OpSel drives ALU control from <31:26> and <5:0>; ExtSel controls the immediate extender on <15:0>]


Approach for Program and Data Memory

• Harvard-style: separate program and data memories
  - Inspired by Howard Aiken and the Mark I
  - Read-only program memory
  - Read/write data memory
  - Need some way to load program memory
• Princeton-style: unified program and data memories
  - Inspired by von Neumann
  - Single read/write memory for both
  - Load/store instructions require accessing memory twice during execution

Most modern machines are mixed, with separate instruction and data caches but a unified main memory that holds both the program and data
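To make the Princeton-style cost concrete, this sketch counts a lower bound on cycles imposed purely by memory-port usage, assuming each memory port serves one access per cycle. Every instruction needs one fetch; loads and stores need one extra data access. The instruction mix is hypothetical.

```python
# Memory accesses per instruction class: one instruction fetch, plus one data access for loads/stores
ACCESSES = {"alu": 1, "branch": 1, "jump": 1, "load": 2, "store": 2}


def min_cycles_unified_single_port(mix: dict) -> int:
    """Princeton-style with one memory port: every access needs its own cycle."""
    return sum(count * ACCESSES[kind] for kind, count in mix.items())


def min_cycles_harvard(mix: dict) -> int:
    """Harvard-style: instruction and data memories can be accessed in the same cycle."""
    return sum(mix.values())


mix = {"alu": 50, "load": 25, "store": 10, "branch": 15}  # hypothetical instruction mix
print(min_cycles_harvard(mix), min_cycles_unified_single_port(mix))  # 100 135
```

Split instruction and data caches recover most of the Harvard-style benefit while keeping a single unified main memory.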


Datapath: Load Instructions (LW)

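[Figure: single-cycle datapath for LW. The ALU adds rs (rd1) and the sign-extended offset (ExtSel, BSrc = Imm, OpSel) to form the data memory address; the data memory's rdata returns through the WBSrc mux (ALU / Mem) to the GPR write data; RegDst selects the write register and RegWrite enables the write; MemWrite drives the data memory's we]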


Datapath: Store Instructions (SW)

[Figure: single-cycle datapath for SW, with the same components as the load datapath. The ALU forms R[rs] + sext(offset) as the data memory address, rd2 supplies wdata, and MemWrite enables the write]


Datapath: Conditional Branches (BEQZ)

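[Figure: datapath extended for BEQZ. A zero? signal (z) derived from R[rs] feeds the controller, a second adder computes the branch target (br) from pc+4 and the extended offset, and the PCSrc mux selects between pc+4 and br. WBSrc, MemWrite, RegDst, BSrc, ExtSel, OpSel, and RegWrite control the rest of the datapath as before]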


Datapath: Register-Indirect Jumps (JR)

[Figure: datapath extended for JR. The PCSrc mux gains a register-indirect input (rind) taken from rd1 (R[rs]), alongside pc+4 and br]


Datapath: Register-Indirect Jump-&-Link (JALR)

[Figure: datapath extended for JALR. The RegDst mux gains a constant 31 input and the write-back path can carry the return address from the PC, so the link register R31 is written while PCSrc selects rind]


Datapath: Absolute Jump-&-Link (J,JAL)

[Figure: datapath extended for J and JAL. The PCSrc mux gains an absolute jump target input (jabs), giving pc+4, br, rind, and jabs; JAL reuses the R31 write path added for JALR]


Final Harvard Style Datapath for MIPS

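[Figure: final single-cycle Harvard-style MIPS datapath: PC and instruction memory; PC + 0x4 adder and branch-target adder; GPRs with rd1/rd2, write select muxed by RegDst (rt, rd, 31) and write data muxed by WBSrc; immediate extender (ExtSel); ALU with BSrc mux and ALU control (OpSel), producing z for the zero? test; data memory with MemWrite; RegWrite; and the PCSrc mux selecting among br, rind, jabs, and pc+4]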
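The PCSrc mux at the heart of this datapath can be sketched as a function of its four inputs. The `jabs` path assumes the standard MIPS J-format rule {(PC+4)<31:28>, target, 00}; the string select values are our own stand-ins for the mux encoding.

```python
MASK32 = 0xFFFF_FFFF


def next_pc(pc_src: str, pc: int, imm16: int, rs_val: int, target26: int) -> int:
    """Model of the PCSrc mux: pc+4 / br / rind / jabs."""
    pc_plus4 = (pc + 4) & MASK32
    if pc_src == "pc+4":
        return pc_plus4
    if pc_src == "br":    # branch-target adder: PC+4 + sext(offset)*4
        offset = imm16 - 0x10000 if imm16 & 0x8000 else imm16
        return (pc_plus4 + offset * 4) & MASK32
    if pc_src == "rind":  # register-indirect: R[rs]
        return rs_val & MASK32
    if pc_src == "jabs":  # absolute jump: upper 4 bits of PC+4, then target << 2
        return (pc_plus4 & 0xF000_0000) | (target26 << 2)
    raise ValueError(f"unknown PCSrc {pc_src!r}")


print(hex(next_pc("br", 0x100, 0xFFFE, 0, 0)))  # 0xfc (branch back by two instructions)
```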


Hardwired Controller is Pure Combinational Logic

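[Figure: the hardwired controller is pure combinational logic taking the op code and zero? as inputs and producing ExtSel, BSrc, OpSel, MemWrite, WBSrc, RegDst, RegWrite, and PCSrc. ALU control uses a decode map on Inst<31:26> (Opcode) and Inst<5:0> (Func) to produce ALUop]

OpSel = { Func, Op, +, 0? }
ExtSel = { sExt16, uExt16, High16 }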


Hardwired Control Table

Opcode      ExtSel   BSrc   OpSel   MemW   RegW   WBSrc   RegDst   PCSrc
ALU         *        Reg    Func    no     yes    ALU     rd       pc+4
ALUi        sExt16   Imm    Op      no     yes    ALU     rt       pc+4
ALUiu       uExt16   Imm    Op      no     yes    ALU     rt       pc+4
LW          sExt16   Imm    +       no     yes    Mem     rt       pc+4
SW          sExt16   Imm    +       yes    no     *       *        pc+4
BEQZ z=0    sExt16   *      0?      no     no     *       *        pc+4
BEQZ z=1    sExt16   *      0?      no     no     *       *        br
J           *        *      *       no     no     *       *        jabs
JAL         *        *      *       no     yes    PC      R31      jabs
JR          *        *      *       no     no     *       *        rind
JALR        *        *      *       no     yes    PC      R31      rind

(* = don't care)
BSrc = { Reg, Imm }   RegDst = { rt, rd, R31 }
WBSrc = { ALU, Mem, PC }   PCSrc = { pc+4, br, rind, jabs }
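Because the controller is pure combinational logic, the table above can be modeled as a lookup keyed by opcode plus the zero? flag. In this sketch `None` encodes a don't-care (*), and strings stand in for mux select encodings, which the table does not specify.

```python
from typing import NamedTuple, Optional


class Ctrl(NamedTuple):
    ext_sel: Optional[str]  # None encodes a don't-care (*)
    b_src: Optional[str]
    op_sel: Optional[str]
    mem_write: bool
    reg_write: bool
    wb_src: Optional[str]
    reg_dst: Optional[str]
    pc_src: str


# Rows transcribed from the hardwired control table
CONTROL = {
    "ALU":   Ctrl(None, "Reg", "Func", False, True, "ALU", "rd", "pc+4"),
    "ALUi":  Ctrl("sExt16", "Imm", "Op", False, True, "ALU", "rt", "pc+4"),
    "ALUiu": Ctrl("uExt16", "Imm", "Op", False, True, "ALU", "rt", "pc+4"),
    "LW":    Ctrl("sExt16", "Imm", "+", False, True, "Mem", "rt", "pc+4"),
    "SW":    Ctrl("sExt16", "Imm", "+", True, False, None, None, "pc+4"),
    "J":     Ctrl(None, None, None, False, False, None, None, "jabs"),
    "JAL":   Ctrl(None, None, None, False, True, "PC", "R31", "jabs"),
    "JR":    Ctrl(None, None, None, False, False, None, None, "rind"),
    "JALR":  Ctrl(None, None, None, False, True, "PC", "R31", "rind"),
}


def control(opcode: str, zero: bool = False) -> Ctrl:
    """Combinational lookup: (opcode, zero?) -> control signals."""
    if opcode == "BEQZ":
        # Only PCSrc depends on the zero? flag
        return Ctrl("sExt16", None, "0?", False, False, None, None, "br" if zero else "pc+4")
    return CONTROL[opcode]


print(control("BEQZ", zero=True).pc_src, control("LW").wb_src)  # br Mem
```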


Single-Cycle Hardwired Control

Requires that the clock period be long enough for all of the following steps to complete:

1. Instruction fetch
2. Decode and register fetch
3. ALU operation
4. Data read or data store if required
5. Register write-back setup time if required

$$t_C > t_{\text{ifetch}} + t_{\text{rfrd}} + t_{\text{ALU}} + t_{\text{dmem}} + t_{\text{rfwr}}$$

At the rising edge of the clock, the PC, the register file, and the memory are updated
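A worked instance of the single-cycle constraint, using hypothetical component delays. It sums only the five listed terms and ignores mux, control, and wire delays, so it is a lower bound on the clock period set by the slowest instruction path (a load).

```python
# Hypothetical component delays in picoseconds (illustrative only)
delays_ps = {
    "ifetch": 250,  # instruction memory read
    "rfrd": 150,    # register file read
    "ALU": 200,     # ALU operation
    "dmem": 250,    # data memory read/write
    "rfwr": 100,    # register write-back setup
}

# Single-cycle: every step happens within one clock period, so the delays add up
t_c_single = sum(delays_ps.values())
print(f"t_C > {t_c_single} ps  (f < {1e6 / t_c_single:.0f} MHz)")
```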


Multi-Cycle Unpipelined Datapath

[Figure: multi-cycle unpipelined datapath divided into fetch, decode & register-fetch, execute, memory, and write-back phases, with an instruction register (IR) alongside the PC, instruction memory, PC + 0x4 adder, GPRs, immediate extender, ALU, and data memory]

The clock period is reduced by dividing the execution of an instruction into multiple cycles; this also allows for more realistic synchronous memory

$$t_C > \max(t_{\text{ifetch}}, t_{\text{rfrd}}, t_{\text{ALU}}, t_{\text{dmem}}, t_{\text{rfwr}})$$

CPI will of course be greater than one
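Reusing the same hypothetical delays, this sketch compares the single-cycle bound (sum of the steps) with the multi-cycle bound (slowest step), then applies the Iron Law with an assumed multi-cycle CPI of 13/3 ≈ 4.33. The numbers are illustrative, not measurements.

```python
delays_ps = {"ifetch": 250, "rfrd": 150, "ALU": 200, "dmem": 250, "rfwr": 100}  # hypothetical

t_c_single = sum(delays_ps.values())  # single-cycle: all steps in one period
t_c_multi = max(delays_ps.values())   # multi-cycle: one step per period

cpi_single, cpi_multi = 1.0, 13 / 3   # assumed CPIs
n = 1_000_000                         # instructions (same program for both)

time_single = n * cpi_single * t_c_single  # picoseconds
time_multi = n * cpi_multi * t_c_multi
print(t_c_single, t_c_multi)                                          # 950 250
print(f"{time_single / 1e9:.3f} ms vs {time_multi / 1e9:.3f} ms")  # 0.950 ms vs 1.083 ms
```

Under these assumptions the clock is 3.8× faster, but CPI rises by 4.33×, so the multi-cycle machine ends up slightly slower; the outcome depends on how evenly the step delays are balanced.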


Multi-Cycle Unpipelined Controller

[Figure: multicycle PARCv1 state diagram, from ECE 4750 Computer Architecture, Fall 2011, Lab 2: Multicycle PARCv2 Processor, Figure 2 (Appendix)]
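A multi-cycle controller is a finite-state machine that steps through one phase per cycle and returns to fetch when an instruction completes. The state paths below are hypothetical, built from the five phases of the multi-cycle datapath; they are not a transcription of the lab's PARCv1 state diagram.

```python
# Hypothetical per-class state sequences: F = fetch, D = decode/register fetch,
# X = execute, M = memory, W = write-back
STATE_PATHS = {
    "alu":    ["F", "D", "X", "W"],
    "load":   ["F", "D", "X", "M", "W"],
    "store":  ["F", "D", "X", "M"],
    "branch": ["F", "D", "X"],
    "jump":   ["F", "D"],
}


def next_state(state: str, kind: str) -> str:
    """Advance one phase along the path; return to F after the last phase."""
    path = STATE_PATHS[kind]
    i = path.index(state)
    return path[i + 1] if i + 1 < len(path) else "F"


def cycles_for(kind: str) -> int:
    """Step the FSM from F until it returns to F, counting one cycle per state."""
    state, cycles = "F", 0
    while True:
        cycles += 1
        state = next_state(state, kind)
        if state == "F":
            return cycles


def cpi(program: list) -> float:
    return sum(cycles_for(kind) for kind in program) / len(program)


print(f"{cpi(['load', 'branch', 'load']):.2f}")  # 4.33 (13 cycles / 3 instructions)
```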


Summary

• Microcoding became less attractive due to evolving technology constraints
• An unpipelined µarch is the first step towards the RISC design philosophy
• The “Iron Law” of processor performance helps explain the design space

Microcoded: Inst 1: 7 cycles, Inst 2: 5 cycles, Inst 3: 10 cycles → CPI = 7.33
Single-Cycle Unpipelined: Inst 1: 1 cycle, Inst 2: 1 cycle, Inst 3: 1 cycle → CPI = 1
Multi-Cycle Unpipelined: Inst 1: 5 cycles, Inst 2: 3 cycles, Inst 3: 5 cycles → CPI = 4.33

Microarchitecture                        CPI   Cycle Time
Microcoded (last topic)                  >1    short
Single-Cycle Unpipelined (this topic)    1     long
Multi-Cycle Unpipelined (this topic)     >1    short
Pipelined (next topic)                   ≈1    short


Acknowledgements

Some of these slides contain material developed and copyrighted by:

Arvind (MIT), Krste Asanovic (MIT/UCB), Joel Emer (Intel/MIT), James Hoe (CMU), John Kubiatowicz (UCB), David Patterson (UCB)

MIT material derived from course 6.823; UCB material derived from courses CS152 and CS252
