9
10/5/17 1 CS 61C: Great Ideas in Computer Architecture Lecture 12: Control & Operating Speed Krste Asanović & Randy Katz http://inst.eecs.berkeley.edu/~cs61c/fa17 Agenda Finish Single-Cycle RISC-V Datapath Controller Instruction Timing Performance Measures Introduction to Pipelining Pipelined RISC-V Datapath And in Conclusion, ... CS 61c Lecture 12: Control & Performance 2 Recap: Adding branches to datapath CS 61c 3 IMEM ALU Imm. Gen +4 DMEM Branch Comp. Reg[] AddrA AddrB DataA AddrD DataB DataD Addr DataW DataR 1 0 0 1 1 0 pc 0 1 inst[11:7] inst[19:15] inst[24:20] inst[31:7] alu mem wb alu pc+4 Reg[rs1] pc imm[31:0] Reg[rs2] inst[31:0] ImmSel RegWEn BrUn BrEq BrLT ASel BSel ALUSel MemRW WBSel PCSel wb Implementing JALR Instruction (I-Format) JALR rd, rs, immediate Writes PC+4 to Reg[rd] (return address) Sets PC = Reg[rs1] + immediate Uses same immediates as arithmetic and loads § no multiplication by 2 bytes 4 Adding jalr to datapath CS 61c 5 IMEM ALU Imm. Gen +4 DMEM Branch Comp. Reg[] AddrA AddrB DataA AddrD DataB DataD Addr DataW DataR 1 0 0 1 2 1 0 pc 0 1 inst[11:7] inst[19:15] inst[24:20] inst[31:7] pc+4 alu mem wb alu pc+4 Reg[rs1] pc imm[31:0] Reg[rs2] inst[31:0] ImmSel RegWEn BrUn BrEq BrLT ASel BSel ALUSel MemRW WBSel PCSel wb Adding jalr to datapath CS 61c 6 IMEM ALU Imm. Gen +4 DMEM Branch Comp. Reg[] AddrA AddrB DataA AddrD DataB DataD Addr DataW DataR 1 0 0 1 2 1 0 pc 0 1 inst[11:7] inst[19:15] inst[24:20] inst[31:7] pc+4 alu mem wb alu pc+4 Reg[rs1] pc imm[31:0] Reg[rs2] inst[31:0] ImmSel=I RegWEn=1 BrUn=* BrEq=* BrLT=* Asel=0 Bsel=1 ALUSel=Add MemRW=Read WBSel=2 PCSel wb

Agenda CS 61C: Great Ideas in Computer Architecture Finish ...cs61c/fa17/lec/12/L12 Datapath2 (6u… · e.g. howmany movies you can watch per battery charge or energy bill for datacenter

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Agenda CS 61C: Great Ideas in Computer Architecture Finish ...cs61c/fa17/lec/12/L12 Datapath2 (6u… · e.g. howmany movies you can watch per battery charge or energy bill for datacenter

10/5/17

1

CS61C:GreatIdeasinComputerArchitecture

Lecture12:Control&OperatingSpeed

KrsteAsanović &RandyKatz

http://inst.eecs.berkeley.edu/~cs61c/fa17

Agenda• FinishSingle-CycleRISC-VDatapath• Controller• InstructionTiming• PerformanceMeasures• IntroductiontoPipelining• PipelinedRISC-V Datapath• AndinConclusion,...CS61c Lecture12:Control&Performance 2

Recap:Addingbranchestodatapath

CS61c 3

IMEMALU

Imm.Gen

+4

DMEMBranchComp.

Reg[]

AddrAAddrB

DataAAddrD

DataB

DataD

Addr

DataWDataR

10

01

10

pc01

inst[11:7]

inst[19:15]inst[24:20]

inst[31:7]

alu

mem

wbalu

pc+4

Reg[rs1]

pc

imm[31:0]

Reg[rs2]

inst[31:0] ImmSel RegWEn BrUnBrEq BrLT ASelBSel ALUSel MemRW WBSelPCSel

wb

ImplementingJALR Instruction(I-Format)

• JALRrd,rs,immediate−WritesPC+4toReg[rd](returnaddress)− SetsPC=Reg[rs1]+immediate− Usessameimmediates asarithmeticandloads

§ nomultiplicationby2bytes

4

Addingjalr todatapath

CS61c 5

IMEMALU

Imm.Gen

+4

DMEMBranchComp.

Reg[]

AddrAAddrB

DataA

AddrD

DataB

DataD

Addr

DataWDataR

1

0

0121

0pc

0

1

inst[11:7]

inst[19:15]inst[24:20]

inst[31:7]

pc+4alu

mem

wbalu

pc+4

Reg[rs1]

pc

imm[31:0]

Reg[rs2]

inst[31:0] ImmSel RegWEn BrUnBrEq BrLT ASelBSel ALUSel MemRW WBSelPCSel

wb

Addingjalr todatapath

CS61c 6

IMEMALU

Imm.Gen

+4

DMEMBranchComp.

Reg[]

AddrAAddrB

DataA

AddrD

DataB

DataD

Addr

DataWDataR

1

0

0121

0pc

0

1

inst[11:7]

inst[19:15]inst[24:20]

inst[31:7]

pc+4alu

mem

wb

alu

pc+4

Reg[rs1]

pc

imm[31:0]

Reg[rs2]

inst[31:0] ImmSel=I RegWEn=1

BrUn=* BrEq=* BrLT=*

Asel=0Bsel=1ALUSel=Add

MemRW=Read WBSel=2PCSel

wb

Page 2: Agenda CS 61C: Great Ideas in Computer Architecture Finish ...cs61c/fa17/lec/12/L12 Datapath2 (6u… · e.g. howmany movies you can watch per battery charge or energy bill for datacenter

10/5/17

2

Implementingjal Instruction

• JALsavesPC+4inReg[rd](thereturnaddress)• SetPC=PC+offset(PC-relativejump)• Targetsomewherewithin±219 locations,2bytesapart

− ±218 32-bitinstructions• Immediateencodingoptimizedsimilarlytobranchinstructiontoreducehardwarecost

7

Addingjal todatapath

CS61c 8

IMEMALU

Imm.Gen

+4

DMEMBranchComp.

Reg[]

AddrAAddrB

DataAAddrD

DataB

DataD

Addr

DataWDataR

10

0121

0pc

01

inst[11:7]

inst[19:15]inst[24:20]

inst[31:7]

pc+4alu

mem

wbalu

pc+4

Reg[rs1]

pc

imm[31:0]

Reg[rs2]

inst[31:0] ImmSel RegWEn BrUnBrEq BrLT ASelBSel ALUSel MemRW WBSelPCSel

wb

Addingjal todatapath

CS61c 9

IMEMALU

Imm.Gen

+4

DMEMBranchComp.

Reg[]

AddrAAddrB

DataAAddrD

DataB

DataD

Addr

DataWDataR

10

0121

0pc

01

inst[11:7]

inst[19:15]inst[24:20]

inst[31:7]

pc+4alu

mem

wb

alu

pc+4

Reg[rs1]

pc

imm[31:0]

Reg[rs2]

inst[31:0] ImmSel=J RegWEn=1

BrUn=* BrEq=* BrLT=*

Asel=1Bsel=1ALUSel=Add

MemRW=Read WBSel=2PCSel

wb

“UpperImmediate”instructions

• Has20-bitimmediateinupper20bitsof32-bitinstructionword• Onedestinationregister,rd• Usedfortwoinstructions

− LUI– LoadUpperImmediate(addtozero)− AUIPC– AddUpperImmediatetoPC

10

Implementinglui

CS61c 11

IMEMALU

Imm.Gen

+4

DMEMBranchComp.

Reg[]

AddrAAddrB

DataA

AddrD

DataB

DataD

Addr

DataWDataR

1

0

0121

00

1

inst[11:7]

inst[19:15]inst[24:20]

inst[31:7]

pc+4alu

mem

wbalu

pc+4

Reg[rs1]

pc

imm[31:0]

Reg[rs2]

inst[31:0] ImmSel=U RegWEn=1

BrUn=* BrE=* BrLT=*

Asel=*Bsel=1 ALUSel=B MemRW=Read WBSel=1PCSel=pc+4

wb

pc

Implementingauipc

CS61c 12

IMEMALU

Imm.Gen

+4

DMEMBranchComp.

Reg[]

AddrAAddrB

DataA

AddrD

DataB

DataD

Addr

DataWDataR

1

0

0121

00

1

inst[11:7]

inst[19:15]inst[24:20]

inst[31:7]

pc+4alu

mem

wbalu

pc+4

Reg[rs1]

pc

imm[31:0]

Reg[rs2]

inst[31:0] ImmSel=U RegWEn=1

BrUn=* BrE=* BrLT=*

Asel=1Bsel=1 ALUSel=Add MemRW=0 WBSel=1PCSel=pc+4

wb

pc

Page 3: Agenda CS 61C: Great Ideas in Computer Architecture Finish ...cs61c/fa17/lec/12/L12 Datapath2 (6u… · e.g. howmany movies you can watch per battery charge or energy bill for datacenter

10/5/17

3

Recap:CompleteRV32IISA

13

NotinCS61C

RV32Ihas47instructionstotal37instructionscoveredinCS61C

Single-CycleRISC-VRV32IDatapath

CS61c 14

IMEMALU

Imm.Gen

+4

DMEMBranchComp.

Reg[]

AddrAAddrB

DataAAddrD

DataB

DataD

Addr

DataWDataR

10

0121

0pc

01

inst[11:7]

inst[19:15]inst[24:20]

inst[31:7]

pc+4alu

mem

wbalu

pc+4

Reg[rs1]

pc

imm[31:0]

Reg[rs2]

inst[31:0] ImmSel RegWEn BrUnBrEq BrLT ASelBSel ALUSel MemRW WBSelPCSel

wb

Agenda• FinishSingle-CycleRISC-VDatapath• Controller• InstructionTiming• PerformanceMeasures• IntroductiontoPipelining• PipelinedRISC-V Datapath• AndinConclusion,...CS61c Lecture12:Control&Performance 15

Processor

CS61c Lecture12:Control&Performance 16

Processor

Control

DatapathPC

RegistersArithmetic&LogicUnit

(ALU)

Memory

Bytes

Enable?Read/Write

Address

WriteDataReadData

Processor-MemoryInterface

Program

Data

Single-CycleRISC-VRV32IDatapath

CS61c 17

IMEMALU

Imm.Gen

+4

DMEMBranchComp.

Reg[]

AddrAAddrB

DataA

AddrD

DataB

DataD

Addr

DataWDataR

1

0

0121

0pc

0

1

inst[11:7]

inst[19:15]inst[24:20]

inst[31:7]

pc+4alu

mem

wbalu

pc+4

Reg[rs1]

pc

imm[31:0]

Reg[rs2]

ControlLogic

inst[31:0] ImmSel RegWEn BrUnBrEq BrLT ASelBSel ALUSel MemRW WBSelPCSel

wb

ControlLogicTruthTable(incomplete)

CS61c Lecture12:Control&Performance 18

Inst[31:0] BrEq BrLT PCSel ImmSel BrUn ASel BSel ALUSel MemRW RegWEn WBSel

add * * +4 * * Reg Reg Add Read 1 ALU

sub * * +4 * * Reg Reg Sub Read 1 ALU

(R-R Op) * * +4 * * Reg Reg (Op) Read 1 ALU

addi * * +4 I * Reg Imm Add Read 1 ALU

lw * * +4 I * Reg Imm Add Read 1 Mem

sw * * +4 S * Reg Imm Add Write 0 *

beq 0 * +4 B * PC Imm Add Read 0 *

beq 1 * ALU B * PC Imm Add Read 0 *

bne 0 * ALU B * PC Imm Add Read 0 *

bne 1 * +4 B * PC Imm Add Read 0 *

blt * 1 ALU B 0 PC Imm Add Read 0 *

bltu * 1 ALU B 1 PC Imm Add Read 0 *

jalr * * ALU I * Reg Imm Add Read 1 PC+4

jal * * ALU J * PC Imm Add Read 1 PC+4

auipc * * +4 U * PC Imm Add Read 1 ALU

Page 4: Agenda CS 61C: Great Ideas in Computer Architecture Finish ...cs61c/fa17/lec/12/L12 Datapath2 (6u… · e.g. howmany movies you can watch per battery charge or energy bill for datacenter

10/5/17

4

ControlRealizationOptions• ROM

−“Read-OnlyMemory”−Regularstructure−Canbeeasilyreprogrammed

§ fixerrors§ addinstructions

−Popularwhendesigningcontrollogicmanually• CombinatorialLogic

−Today,chipdesignersuselogicsynthesistoolstoconverttruthtablestonetworksofgates

CS61c Lecture12:Control&Performance 19

RV32I,anine-bitISA!

20

NotinCS61C

Instructiontypeencodedusingonly9bitsinst[30],inst[14:12],inst[6:2]

inst[30] inst[14:12] inst[6:2]

ROM-basedControl

CS61c Lecture12:Control&Performance 21

ROM

Inst[30,14:12,6:2] BrEq9

PCSel

ALUSel[3:0]4

11-bitaddress(inputs)

15databits(outputs)

BrLT

ImmSel[2:0]3BrUnASelBSel

MemRWRegWEnWBSel[1:0]2

ROMControllerImplementation

CS61c Lecture12:Control&Performance 22

Control Word foradd

Control Word forsub

Control Word foror

.

.

.

Address

Decode

r...

Inst[]BrEQBrLT

Controlleroutput(PCSel,ImmSel,…)

addsubor

jal

11

Administrivia• Homework2Duetomorrow11:59pm• Project1Part1DueMondayOct.9

− Part2dueMondayOct.16

• Midterm1RegradesduenextTuesday− TalktoaTAifyoudon’tunderstandamidtermquestionorareunsureofaregrade

CS61c Lecture12:Control&Performance 23

Break!

10/5/17 24

Page 5: Agenda CS 61C: Great Ideas in Computer Architecture Finish ...cs61c/fa17/lec/12/L12 Datapath2 (6u… · e.g. howmany movies you can watch per battery charge or energy bill for datacenter

10/5/17

5

Agenda• FinishSingle-CycleRISC-VDatapath• Controller• InstructionTiming• PerformanceMeasures• IntroductiontoPipelining• PipelinedRISC-V Datapath• AndinConclusion,...CS61c Lecture12:Control&Performance 25

InstructionTiming

IF ID EX MEM WB Total

I-MEM Reg Read ALU D-MEM Reg W

200 ps 100ps 200 ps 200ps 100ps 800psCS61c Lecture12:Control&Performance 26

InstructionTiming

• Maximumclockfrequency− fmax =1/800ps=1.25GHz

• Mostblocksidlemostofthetime− E.g.fmax,ALU =1/200ps=5GHz!− HowcanwekeepALUbusyallthetime?− 5billionadds/sec,ratherthanjust1.25billion?− Idea:Factories usethreeemployeeshifts- equipmentisalwaysbusy!

Instr IF=200ps ID=100ps ALU=200ps MEM=200ps WB =100ps Total

add X X X X 600ps

beq X X X 500ps

jal X X X 500ps

lw X X X X X 800ps

sw X X X X 700ps

Agenda• FinishSingle-CycleRISC-VDatapath• Controller• InstructionTiming• PerformanceMeasures• IntroductiontoPipelining• PipelinedRISC-V Datapath• AndinConclusion,...CS61c Lecture12:Control&Performance 28

PerformanceMeasures• “Our”RISC-Vexecutesinstructionsat1.25GHz

−1instructionevery800ps

• Canweimproveitsperformance?−Whatdowemeanwiththisstatement?−Notsoobvious:

§ Quickerresponsetime,soonejobfinishesfaster?§ Morejobsperunittime(e.g.webserverreturningpages)?§ Longerbatterylife?

CS61c Lecture12:Control&Performance 29

TransportationAnalogySportsCar Bus

Passenger Capacity 2 50Travel Speed 200mph 50mphGasMileage 5mpg 2mpg

CS61c Lecture12:Control&Performance 30

SportsCar BusTravelTime 15min 60minTimefor100 passengers 750min 120minGallonsperpassenger 5gallons 0.5 gallons

50Miletrip:

Page 6: Agenda CS 61C: Great Ideas in Computer Architecture Finish ...cs61c/fa17/lec/12/L12 Datapath2 (6u… · e.g. howmany movies you can watch per battery charge or energy bill for datacenter

10/5/17

6

ComputerAnalogyTransportation Computer

TripTime Program executiontime:e.g.timetoupdatedisplay

Timefor100passengers Throughput:e.g.number ofserverrequestshandledperhour

Gallonsper passenger Energypertask*:e.g.how manymoviesyoucanwatchperbatterychargeorenergybillfordatacenter

31

*Note: powerisnotagoodmeasure,sincelow-powerCPUmightrunforalongtimetocompleteonetaskconsumingmoreenergythanfastercomputerrunningathigherpowerforashortertime

“IronLaw”ofProcessorPerformance

32

Time =Instructions Cycles TimeProgramProgram*Instruction*Cycle

InstructionsperProgram

Determinedby• Task• Algorithm,e.g.O(N2) vsO(N)• Programminglanguage• Compiler• InstructionSetArchitecture(ISA)

CS61c Lecture12:Control&Performance 33

(Average)ClockcyclesperInstruction

Determinedby• ISA• Processorimplementation(ormicroarchitecture)• E.g.for“our”single-cycleRISC-Vdesign,CPI=1• Complexinstructions(e.g.strcpy),CPI>>1• Superscalarprocessors,CPI<1(nextlecture)

CS61c Lecture12:Control&Performance 34

TimeperCycle(1/Frequency)

Determinedby• Processormicroarchitecture(determinescriticalpath

throughlogicgates)• Technology(e.g.14nmversus28nm)• Powerbudget(lowervoltagesreducetransistorspeed)

CS61c Lecture12:Control&Performance 35

SpeedTradeoffExample• Forsometask(e.g.imagecompression)…

CS61c Lecture12:Control&Performance 36

ProcessorA Processor B#Instructions 1Million 1.5MillionAverageCPI 2.5 1Clockrate f 2.5 GHz 2GHzExecutiontime 1ms 0.75ms

ProcessorBisfasterforthistask,despiteexecutingmoreinstructionsandhavingalowerclockrate!

Page 7: Agenda CS 61C: Great Ideas in Computer Architecture Finish ...cs61c/fa17/lec/12/L12 Datapath2 (6u… · e.g. howmany movies you can watch per battery charge or energy bill for datacenter

10/5/17

7

EnergyperTask

37

Energy =Instructions EnergyProgramProgram*Instruction

Energy αInstructions *CV2

ProgramProgram

“Capacitance” dependsontechnology,processorfeaturese.g.#ofcores

Supplyvoltage,e.g.1V

Wanttoreducecapacitanceandvoltagetoreduceenergy/task

EnergyTradeoffExample• “Next-generation”processor

− C(Moore’sLaw): -15%− Supplyvoltage,Vsup: -15%− Energyconsumption: 1- (1-0.85)3 =-39%

• Significantlyimprovedenergyefficiencythanksto−Moore’sLawAND− Reducedsupplyvoltage

CS61c Lecture12:Control&Performance 38

Energy“IronLaw”

39

Performance=Power*EnergyEfficiency(Tasks/Second)(Joules/Second)(Tasks/Joule)

• Energyefficiency(e.g.,instructions/Joule)iskeymetricinallcomputingdevices• Forpower-constrainedsystems(e.g.,20MWdatacenter),needbetterenergyefficiencytogetmoreperformanceatsamepower• Forenergy-constrainedsystems(e.g.,1Wphone),needbetterenergyefficiencytoprolongbatterylife

EndofScaling• Inrecentyears,industryhasnotbeenabletoreducesupplyvoltagemuch,asreducingitfurtherwouldmeanincreasing“leakagepower”wheretransistorswitchesdon’tfullyturnoff(morelikedimmerswitchthanon-offswitch)• Also,sizeoftransistorsandhencecapacitance,notshrinkingasmuchasbeforebetweentransistorgenerations• Powerbecomesagrowingconcern– the“powerwall”• Cost-effectiveair-cooledchiplimitaround~150W

CS61c Lecture12:Control&Performance 40

0

1

10

100

1000

10000

100000

1000000

10000000

1970 1975 1980 1985 1990 1995 2000 2005 2010

Transistors (Thousands)Frequency (MHz)Power (W)Cores

ProcessorTrends

41[Olukotun, Hammond,Sutter,Smith,Batten]

Break!

10/5/17 42

Page 8: Agenda CS 61C: Great Ideas in Computer Architecture Finish ...cs61c/fa17/lec/12/L12 Datapath2 (6u… · e.g. howmany movies you can watch per battery charge or energy bill for datacenter

10/5/17

8

Agenda• FinishSingle-CycleRISC-VDatapath• Controller• InstructionTiming• PerformanceMeasures• IntroductiontoPipelining• PipelinedRISC-V Datapath• AndinConclusion,...CS61c Lecture12:Control&Performance 43

Pipelining• Afamiliarexample:

− Gettingauniversitydegree

• ShortageofComputerscientists(yourstartupisgrowing):− Howlongdoesittaketoeducate16,000?

CS61c Lecture12:Control&Performance 44

Year1 Year2 Year3 Year4

ComputerScientistEducation

CS61c Lecture12:Control&Performance 45

• Option1:serial

• Option2:pipelining

7years

16,000in16years,averagethroughputis1000/year

• 16,000in7years• Steadystatethroughputis4000/year• Resourcesusedefficiently• 4-foldimprovementoverserialeducation

4000graduate

4years

4000graduate

4years

4000graduate

4years

4000

4years

4000graduate4000graduate

4000graduate4000graduate

4000enter

1year

LatencyversusThroughput• Latency

− Timefromenteringcollegetograduation− Serial 4years− Pipelining 4years

• Throughput− Averagenumberofstudentsgraduatingeachyear− Serial 1000− Pipelining 4000

• Pipelining− Increasesthroughput(4xinthisexample)− Butdoesnothingtolatency

§ sometimesworse(additionaloverheade.g.forshifttransition)CS61c Lecture12:Control&Performance 46

SimultaneousversusSequential

CS61c Lecture12:Control&Performance 47

• Whathappenssequentially?• Whathappenssimultaneously?

4years

4000graduate4000graduate

4000graduate4000graduate

4000graduate4000graduate

4000graduate4000graduate

4000graduate4000graduate

Agenda• FinishSingle-CycleRISC-VDatapath• Controller• InstructionTiming• PerformanceMeasures• IntroductiontoPipelining• PipelinedRISC-V Datapath• AndinConclusion,...CS61c Lecture12:Control&Performance 48

Page 9: Agenda CS 61C: Great Ideas in Computer Architecture Finish ...cs61c/fa17/lec/12/L12 Datapath2 (6u… · e.g. howmany movies you can watch per battery charge or energy bill for datacenter

10/5/17

9

PipeliningwithRISC-V

CS61c 49

Phase Pictogram tstep Serial

Instruction Fetch 200 ps

Reg Read 100ps

ALU 200ps

Memory 200ps

RegisterWrite 100ps

tinstruction 800ps

addt0,t1,t2

ort3,t4,t5

sll t6,t0,t3tcycle

instructionsequence

tinstruction

tcycle Pipelined

200 ps

200ps

200ps

200ps

200ps

1000ps

PipeliningwithRISC-V

CS61c 50

addt0,t1,t2

ort3,t4,t5

sll t6,t0,t3tcycle

instructionsequence

tinstruction

Single Cycle Pipelining

Timing tstep =100…200ps tcycle =200ps

Register accessonly100ps Allcyclessamelength

Instruction time,tinstruction =tcycle =800ps 1000ps

Clockrate,fs 1/800ps =1.25GHz 1/200 ps =5GHz

Relativespeed 1x 4x

SequentialvsSimultaneous

addt0,t1,t2

ort3,t4,t5

sll t6,t0,t3

tcycle=200ps

instructionsequence

tinstruction =1000ps

sw t0,4(t3)

lw t0,8(t3)

addi t2,t2,1

Whathappenssequentially,whathappenssimultaneously?

Agenda• FinishSingle-CycleRISC-VDatapath• Controller• InstructionTiming• PerformanceMeasures• IntroductiontoPipelining• PipelinedRISC-V Datapath• AndinConclusion,...CS61c Lecture12:Control&Performance 52

AndinConclusion,…• Controller

− Tellsuniversaldatapathhowtoexecuteeachinstruction• Instructiontiming

− Setbyinstructioncomplexity,architecture,technology− Pipeliningincreasesclockfrequency,“instructionspersecond”

§ Butdoesnotreducetimetocompleteinstruction

• Performancemeasures− Differentmeasuresdependingonobjective

§ Responsetime§ Jobs/second§ Energypertask

CS61c Lecture12:Control&Performance 53