124
Out-of-Order Execution: Tomasulo’s Algorithm CS5513 Computer Architecture Wei Wang Computer Science University of Texas at San Antonio

Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Out-of-Order Execution: Tomasulo’sAlgorithm

CS5513 Computer Architecture

Wei Wang

Computer ScienceUniversity of Texas at San Antonio

Page 2: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Text Book ChaptersI “Computer Architecture” Hennessy and Patterson,

Chapter 3.4 and 3.5 about “Dynamic Scheduling”.

Fall 2019 CS5513 Computer Architecture 2

Page 3: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Road Map

Overview of Tomasulo’s Algorithm

An Example of Tomasulo’s Algorithm

Another Example of Tomasulo’s Algorithm

Tomasulo Wrapup

Acknowledgment

Fall 2019 CS5513 Computer Architecture 3

Page 4: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Scoreboarding is Limited ByI Number and type of functional units.I Number of instruction buffer entries (scoreboard size)I Amount of application ILP (RAW hazards)I Presence of anti-dependencies (WAR) and output

dependencies (WAW)– In-order issue for WAW/Structural Hazards limits

scheduler.– WAR stalls are critical for loops (WAR prevents hardware

unrolling in Scoreboarding).

Fall 2019 CS5513 Computer Architecture 4

Page 5: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

The WAR Limitation of ScoreboardingI WAR stalls are critical for loops (WAR prevents

hardware unrolling in Scoreboarding).– For example, for the following two iterations of a loop,

instruction 5 cannot be finished due to WAR hazardbetween instruction 3 and instruction 5, preventing theinstructions following it from execution,

1 mov R3, [R12+R14]2 mul R3, R3, 113 add R5, R3, 104 add R14, R14, 45 mov R3, [R12+R14]6 mul R3, R3, 117 add R5, R3, 108 add R14, R14, 4

I One way to address this WAR limitation is to renameregisters.

Fall 2019 CS5513 Computer Architecture 5

Page 6: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Register RenamingI Dynamically change register names to eliminate

“false dependencies” (WAR/WAW hazards)I Architectural registers are Names not Locations

– Many more locations (e.g., “reservation stations” or“physical registers”) than names (“logical or architecturalregisters”)

– Dynamically map names to locations (e.g., mapping EAXto a inter-stage buffer).

Fall 2019 CS5513 Computer Architecture 6

Page 7: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Register Renaming ExampleI For example, the previous code can be changed by

renaming R3 to R6 to allow two iterations of the loopto executed in parallel (assuming there are enoughfunctional units)

1 mov R3, [ R12+R14 ]2 mul R3, R3, 113 add R5, R3, 104 add R14 , R14 , 45 mov R3, [ R12+R14 ]6 mul R3, R3, 117 add R5, R3, 108 add R14 , R14 , 4

1 mov R3, [ R12+R14 ]2 mul R3, R3, 113 add R5, R3, 104 add R14 , R14 , 45 mov R6, [ R12+R14 ]6 mul R6, R6, 117 add R6, R6, 108 add R14 , R14 , 4

Fall 2019 CS5513 Computer Architecture 7

Page 8: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo’s ApproachI Used in IBM 360/91 Machines (Late 60s)I Similar to scoreboarding, but added register renamingI Key concept: Reservation StationsI Very important in Architecture, since most processors

use this algorithm.

Fall 2019 CS5513 Computer Architecture 8

Page 9: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Reservation Stations (RS)I Distributed (rather than centralized) control scheme.

– Bypassing is allowed via Common Data Bus (CDB) to RS– Register Renaming eliminates WAR/WAW hazards

I Scoreboard/Instruction Buffer => ReservationStations

– Fetch and buffer operands as soon as availableI Eliminates the need to always get values from

registers– Pending instructions designate reservation stations that

will provide their inputs– Successive writes to a register cause only the last one to

update the register

Fall 2019 CS5513 Computer Architecture 9

Page 10: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Register Renaming with TomasuloI At instruction issue:

– Registers for source operands are renamed to the namesof the registers in reservation stations

– Values can exist in the registers of the reservation stationsor the register file

I To eliminate WARs, register values are copied toreservation stations at issue

I Other methods example use pointer-based renaming(map-table)

Fall 2019 CS5513 Computer Architecture 10

Page 11: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Reservation Station ComponentsI Op: Operation to perform in the unitI Qj, Qk: Ids of the reservation station entries that

produce source values– No ready flags needed as in Scoreboard;– If index is 0, then the values (Vj or Vk) are ready– Store buffers only have Qi for RS producing result, since

stores only need one register.I Vj, Vk: Values of Source operandsI Busy: Indicates reservation station or FU are

occupiedI RegisterResultStatus: Indicates which functional unit

will write each register, if one exists. Blank when nopending instructions that will write that register.

Fall 2019 CS5513 Computer Architecture 11

Page 12: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Three Stages of Tomasulo Algorithm1. Issue (IS): get instruction from instruction queue

– Issue if there are slots in the reservation station (nostructural hazard)

– Control Unit issues instruction and send operands (renameregisters) to reservation stations.

2. Execution (EX): operate on operands3. Write back (WB): finish execution and write back to

registers– Send result value to Common Data Bus (CDB)– Registers in register file and reservation stations pick up

values from CDB based on needs.

Fall 2019 CS5513 Computer Architecture 12

Page 13: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Common Data BusI Normal data bus: data + destination (“go to” bus)I Common data bus: data + source (“come from” bus)

– 64 bits of data + 4 bits id of the source reservation stationentry that generates the result/data

– Registers in RS and register file monitors the bus and pickup the value based on the source id.

– A broadcasting bus

Fall 2019 CS5513 Computer Architecture 13

Page 14: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Organization

Load Buffers act as a RS forload instructions.

Fall 2019 CS5513 Computer Architecture 14

Page 15: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Organization

RS for the adder.

Fall 2019 CS5513 Computer Architecture 14

Page 16: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Organization

RS for the multiplier.

Fall 2019 CS5513 Computer Architecture 14

Page 17: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Organization

Store Buffers act as a RS forstore instructions.

Fall 2019 CS5513 Computer Architecture 14

Page 18: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Organization

Register file.

Fall 2019 CS5513 Computer Architecture 14

Page 19: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Organization

Load buffers, adder andmultiplier send valuesand sources to CDB. CDBbroadcasts the values andsources to RS, register fileand store buffers.

Fall 2019 CS5513 Computer Architecture 14

Page 20: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Road Map

Overview of Tomasulo’s Algorithm

An Example of Tomasulo’s Algorithm

Another Example of Tomasulo’s Algorithm

Tomasulo Wrapup

Acknowledgment

Fall 2019 CS5513 Computer Architecture 15

Page 21: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example

Clock: cycle 0

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 NoAdd3 NoMul1 NoMul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11LD R2 45+ R13MUL R0 R2 R4SUB R8 R6 R2DIV R10 R0 R6ADD R6 R8 R2

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

Fall 2019 CS5513 Computer Architecture 16

Page 22: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 1

Clock: cycle 1

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 NoAdd3 NoMul1 NoMul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1LD R2 45+ R13MUL R0 R2 R4SUB R8 R6 R2DIV R10 R0 R6ADD R6 R8 R2

Load Buffers:

Busy AddressLoad1 Yes 34+R11Load2 NoLoad3 No

Load1

Fall 2019 CS5513 Computer Architecture 17

Page 23: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 1

Clock: cycle 1

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 NoAdd3 NoMul1 NoMul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1LD R2 45+ R13MUL R0 R2 R4SUB R8 R6 R2DIV R10 R0 R6ADD R6 R8 R2

Load Buffers:

Busy AddressLoad1 Yes 34+R11Load2 NoLoad3 No

Issue the firstinstruction atcycle 1.

Occupiedthe first loadbuffer entrywith instruc-tion 1.

Register R6 will be filled with theresult from the Load1.

Load1

Fall 2019 CS5513 Computer Architecture 17

Page 24: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 2

Clock: cycle 2

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 NoAdd3 NoMul1 NoMul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2LD R2 45+ R13 2MUL R0 R2 R4SUB R8 R6 R2DIV R10 R0 R6ADD R6 R8 R2

Load Buffers:

Busy AddressLoad1 Yes 34+R11Load2 Yes 45+R13Load3 No

Load1Load2

Fall 2019 CS5513 Computer Architecture 18

Page 25: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 2

Clock: cycle 2

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 NoAdd3 NoMul1 NoMul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2LD R2 45+ R13 2MUL R0 R2 R4SUB R8 R6 R2DIV R10 R0 R6ADD R6 R8 R2

Load Buffers:

Busy AddressLoad1 Yes 34+R11Load2 Yes 45+R13Load3 No

Issue the 2nd instruction at cy-cle 2, since load buffers haveempty entries. Unlike Score-boarding, as long as there areentries in the load buffers, loadinstructions can be issued.

Occupiedthe 2nd loadbuffer entrywith instruc-tion 2.

Register R2 will be filled with theresult from Load2.

Load1Load2

Fall 2019 CS5513 Computer Architecture 18

Page 26: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 2

Clock: cycle 2

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 NoAdd3 NoMul1 NoMul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2LD R2 45+ R13 2MUL R0 R2 R4SUB R8 R6 R2DIV R10 R0 R6ADD R6 R8 R2

Load Buffers:

Busy AddressLoad1 Yes 34+R11Load2 Yes 45+R13Load3 No

Instruction 1starts execu-tion. Takes2 cycles tocomplete.

Load1Load2

Fall 2019 CS5513 Computer Architecture 18

Page 27: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 3

Clock: cycle 3

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 NoAdd3 NoMul1 Yes MUL Value(R4) Load2Mul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3LD R2 45+ R13 2 3MUL R0 R2 R4 3SUB R8 R6 R2DIV R10 R0 R6ADD R6 R8 R2

Load Buffers:

Busy AddressLoad1 Yes 34+R11Load2 Yes 45+R13Load3 No

Load1Load2Mul1

Fall 2019 CS5513 Computer Architecture 19

Page 28: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 3

Clock: cycle 3

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 NoAdd3 NoMul1 Yes MUL Value(R4) Load2Mul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3LD R2 45+ R13 2 3MUL R0 R2 R4 3SUB R8 R6 R2DIV R10 R0 R6ADD R6 R8 R2

Load Buffers:

Busy AddressLoad1 Yes 34+R11Load2 Yes 45+R13Load3 No

Issue the 3rdinstruction atcycle 3.

Register R0 will be filled with theresult from Mul1.

Occupy Mul1 with the 3rd instruc-tion. Copy the value of register R4in to Vk (src2 value). Mark Qj to“Load2”, indicating src1 comes from“Load2”.Load1Load2Mul1

Fall 2019 CS5513 Computer Architecture 19

Page 29: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 3

Clock: cycle 3

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 NoAdd3 NoMul1 Yes MUL Value(R4) Load2Mul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3LD R2 45+ R13 2 3MUL R0 R2 R4 3SUB R8 R6 R2DIV R10 R0 R6ADD R6 R8 R2

Load Buffers:

Busy AddressLoad1 Yes 34+R11Load2 Yes 45+R13Load3 No

Instruction1 finishesexecution atthe end ofcycle 3.

Load1Load2Mul1

Fall 2019 CS5513 Computer Architecture 19

Page 30: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 3

Clock: cycle 3

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 NoAdd3 NoMul1 Yes MUL Value(R4) Load2Mul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3LD R2 45+ R13 2 3MUL R0 R2 R4 3SUB R8 R6 R2DIV R10 R0 R6ADD R6 R8 R2

Load Buffers:

Busy AddressLoad1 Yes 34+R11Load2 Yes 45+R13Load3 No

Instruction 2 startsat cycle 3, assumingmultiple memory loadscan happened at thesame time. Takes 2cycles to execute.

Load1Load2Mul1

Fall 2019 CS5513 Computer Architecture 19

Page 31: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 4

Clock: cycle 4

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 Yes SUB M1 Load2Add2 NoAdd3 NoMul1 Yes MUL Value(R4) Load2Mul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4MUL R0 R2 R4 3SUB R8 R6 R2 4DIV R10 R0 R6ADD R6 R8 R2

Load Buffers:

Busy AddressLoad1 NoLoad2 Yes 45+R13Load3 No

M1Load2Mul1 Add1

Fall 2019 CS5513 Computer Architecture 20

Page 32: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 4

Clock: cycle 4

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 Yes SUB M1 Load2Add2 NoAdd3 NoMul1 Yes MUL Value(R4) Load2Mul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4MUL R0 R2 R4 3SUB R8 R6 R2 4DIV R10 R0 R6ADD R6 R8 R2

Load Buffers:

Busy AddressLoad1 NoLoad2 Yes 45+R13Load3 No

Issue the 4thinstruction atcycle 4.

Register R8 will be filled with theresult from Add1.

Occupy Add1 with the 4th instruc-tion. Copy the value (M1) of from“Load1” from CDB to Vj (src1value). Mark Qk to “Load2”, indi-cating src1 comes from “Load2”.

M1Load2Mul1 Add1

Fall 2019 CS5513 Computer Architecture 20

Page 33: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 4

Clock: cycle 4

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 Yes SUB M1 Load2Add2 NoAdd3 NoMul1 Yes MUL Value(R4) Load2Mul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4MUL R0 R2 R4 3SUB R8 R6 R2 4DIV R10 R0 R6ADD R6 R8 R2

Load Buffers:

Busy AddressLoad1 NoLoad2 Yes 45+R13Load3 No

Instruction 1writes back.Load bufferssends resultvalue M1 andid “Load1” onto CDB.

Load1 is re-leased.

Register R6 sees id “Load1” onCDB matches its source, picks upthe value M1.

M1Load2Mul1 Add1

Fall 2019 CS5513 Computer Architecture 20

Page 34: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 4

Clock: cycle 4

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 Yes SUB M1 Load2Add2 NoAdd3 NoMul1 Yes MUL Value(R4) Load2Mul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4MUL R0 R2 R4 3SUB R8 R6 R2 4DIV R10 R0 R6ADD R6 R8 R2

Load Buffers:

Busy AddressLoad1 NoLoad2 Yes 45+R13Load3 No

Instruction2 finishes atcycle 4.

M1Load2Mul1 Add1

Fall 2019 CS5513 Computer Architecture 20

Page 35: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 4

Clock: cycle 4

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 Yes SUB M1 Load2Add2 NoAdd3 NoMul1 Yes MUL Value(R4) Load2Mul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4MUL R0 R2 R4 3SUB R8 R6 R2 4DIV R10 R0 R6ADD R6 R8 R2

Load Buffers:

Busy AddressLoad1 NoLoad2 Yes 45+R13Load3 No

Instruction 3stalled due tomissing src1.

M1Load2Mul1 Add1

Fall 2019 CS5513 Computer Architecture 20

Page 36: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 5

Clock: cycle 5

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 Yes SUB M1 M2

Add2 NoAdd3 NoMul1 Yes MUL M2 Value(R4)Mul2 Yes DIV M1 Mul1

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4 5MUL R0 R2 R4 3SUB R8 R6 R2 4DIV R10 R0 R6 5ADD R6 R8 R2

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

M1M2Mul1 Add1 Mul2

Fall 2019 CS5513 Computer Architecture 21

Page 37: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 5

Clock: cycle 5

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 Yes SUB M1 M2

Add2 NoAdd3 NoMul1 Yes MUL M2 Value(R4)Mul2 Yes DIV M1 Mul1

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4 5MUL R0 R2 R4 3SUB R8 R6 R2 4DIV R10 R0 R6 5ADD R6 R8 R2

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

M1M2Mul1 Add1 Mul2

Instruction 2writes back.Load bufferssends resultvalue M2 andid “Load2” onto CDB.

Load2 is re-leased.

Register R2 sees id “Load2” onCDB matches its source, picks upthe value M2.

Fall 2019 CS5513 Computer Architecture 21

Page 38: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 5

Clock: cycle 5

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 Yes SUB M1 M2

Add2 NoAdd3 NoMul1 Yes MUL M2 Value(R4)Mul2 Yes DIV M1 Mul1

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4 5MUL R0 R2 R4 3SUB R8 R6 R2 4DIV R10 R0 R6 5ADD R6 R8 R2

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

M1M2Mul1 Add1 Mul2

Issue the 5thinstruction atcycle 5.

Register R10 will be filled with theresult from Mul2.

Occupy Mul2 with the 5th instruc-tion. Copy the value (M1) of fromR0 to Vk (src2 value). Mark Qj to“Mul1”, indicating src1 comes from“Mul1”.

Fall 2019 CS5513 Computer Architecture 21

Page 39: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 5

Clock: cycle 5

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 Yes SUB M1 M2

Add2 NoAdd3 NoMul1 Yes MUL M2 Value(R4)Mul2 Yes DIV M1 Mul1

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4 5MUL R0 R2 R4 3SUB R8 R6 R2 4DIV R10 R0 R6 5ADD R6 R8 R2

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

M1M2Mul1 Add1 Mul2

Mul1 sees id “Load2” on CDBmatches its source, picks up thevalue M2 and set it to Vj .

Fall 2019 CS5513 Computer Architecture 21

Page 40: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 5

Clock: cycle 5

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 Yes SUB M1 M2

Add2 NoAdd3 NoMul1 Yes MUL M2 Value(R4)Mul2 Yes DIV M1 Mul1

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4 5MUL R0 R2 R4 3SUB R8 R6 R2 4DIV R10 R0 R6 5ADD R6 R8 R2

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

M1M2Mul1 Add1 Mul2

Add1 sees id “Load2” on CDBmatches its source, picks up thevalue M2 and set it to Vk .

Fall 2019 CS5513 Computer Architecture 21

Page 41: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 6

Clock: cycle 6

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

1 Add1 Yes SUB M1 M2

Add2 Yes ADD M2 Add1Add3 No

9 Mul1 Yes MUL M2 Value(R4)Mul2 Yes DIV M1 Mul1

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4 5MUL R0 R2 R4 3 6SUB R8 R6 R2 4 6DIV R10 R0 R6 5ADD R6 R8 R2 6

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

M2Mul1 Add1 Mul2Add2

Fall 2019 CS5513 Computer Architecture 22

Page 42: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 6

Clock: cycle 6

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

1 Add1 Yes SUB M1 M2

Add2 Yes ADD M2 Add1Add3 No

9 Mul1 Yes MUL M2 Value(R4)Mul2 Yes DIV M1 Mul1

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4 5MUL R0 R2 R4 3 6SUB R8 R6 R2 4 6DIV R10 R0 R6 5ADD R6 R8 R2 6

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

M2Mul1 Add1 Mul2Add2

Issue the 6th instruction at cy-cle 6, since there are emptyslots in the RS for adders.

Register R6 will be filled with theresult from Add2.

Occupy Add2 with the 6th instruc-tion. Copy the value (M2) of fromR2 to Vk (src2 value). Mark Qj to“Add1”, indicating src1 comes from“Add1”.

Fall 2019 CS5513 Computer Architecture 22

Page 43: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 6

Clock: cycle 6

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

1 Add1 Yes SUB M1 M2

Add2 Yes ADD M2 Add1Add3 No

9 Mul1 Yes MUL M2 Value(R4)Mul2 Yes DIV M1 Mul1

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4 5MUL R0 R2 R4 3 6SUB R8 R6 R2 4 6DIV R10 R0 R6 5ADD R6 R8 R2 6

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

M2Mul1 Add1 Mul2Add2

All sourceoperandsare ready,instruction 3starts execu-tion.

Fall 2019 CS5513 Computer Architecture 22

Page 44: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 6

Clock: cycle 6

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

1 Add1 Yes SUB M1 M2

Add2 Yes ADD M2 Add1Add3 No

9 Mul1 Yes MUL M2 Value(R4)Mul2 Yes DIV M1 Mul1

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4 5MUL R0 R2 R4 3 6SUB R8 R6 R2 4 6DIV R10 R0 R6 5ADD R6 R8 R2 6

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

M2Mul1 Add1 Mul2Add2

All sourceoperandsare ready,instruction 4starts execu-tion.

Instruction 3 and 4 still needs 1 and9 cycles to finish.

Fall 2019 CS5513 Computer Architecture 22

Page 45: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 7

Clock: cycle 7

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

0 Add1 Yes SUB M1 M2

Add2 Yes ADD M2 Add1Add3 No

8 Mul1 Yes MUL M2 Value(R4)Mul2 Yes DIV M1 Mul1

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4 5MUL R0 R2 R4 3 6SUB R8 R6 R2 4 6-7DIV R10 R0 R6 5ADD R6 R8 R2 6

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

M2Mul1 Add1 Mul2Add2

Fall 2019 CS5513 Computer Architecture 23

Page 46: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 7

Clock: cycle 7

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

0 Add1 Yes SUB M1 M2

Add2 Yes ADD M2 Add1Add3 No

8 Mul1 Yes MUL M2 Value(R4)Mul2 Yes DIV M1 Mul1

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4 5MUL R0 R2 R4 3 6SUB R8 R6 R2 4 6-7DIV R10 R0 R6 5ADD R6 R8 R2 6

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

M2Mul1 Add1 Mul2Add2

Instruction 4finishes.

Fall 2019 CS5513 Computer Architecture 23

Page 47: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 8

Clock: cycle 8

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 Yes ADD Val1 M2

Add3 No7 Mul1 Yes MUL M2 Value(R4)

Mul2 Yes DIV M1 Mul1

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4 5MUL R0 R2 R4 3 6SUB R8 R6 R2 4 6-7 8DIV R10 R0 R6 5ADD R6 R8 R2 6

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

M2Mul1 Val1 Mul2Add2

Fall 2019 CS5513 Computer Architecture 24

Page 48: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 8

Clock: cycle 8

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 Yes ADD Val1 M2

Add3 No7 Mul1 Yes MUL M2 Value(R4)

Mul2 Yes DIV M1 Mul1

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4 5MUL R0 R2 R4 3 6SUB R8 R6 R2 4 6-7 8DIV R10 R0 R6 5ADD R6 R8 R2 6

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

M2Mul1 Val1 Mul2Add2

Instruction 4 writesback.

Add1 sends result value Val1 andid “Add1” on to CDB. Instruction 4releases Add1.

Fall 2019 CS5513 Computer Architecture 24

Page 49: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 8

Clock: cycle 8

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 Yes ADD Val1 M2

Add3 No7 Mul1 Yes MUL M2 Value(R4)

Mul2 Yes DIV M1 Mul1

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4 5MUL R0 R2 R4 3 6SUB R8 R6 R2 4 6-7 8DIV R10 R0 R6 5ADD R6 R8 R2 6

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

M2Mul1 Val1 Mul2Add2

Add2 reads id “Add1” from CDB,which matches its Vj . THerefore,Add2 reads value Val1 from CDB toValj

Register R8 sees id “Add1” onCDB, which matches its source,so it picks up the value Val1.

Fall 2019 CS5513 Computer Architecture 24

Page 50: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 9

Clock: cycle 9

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 No1 Add2 Yes ADD Val1 M2

Add3 No6 Mul1 Yes MUL M2 Value(R4)

Mul2 Yes DIV M1 Mul1

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4 5MUL R0 R2 R4 3 6SUB R8 R6 R2 4 6-7 8DIV R10 R0 R6 5ADD R6 R8 R2 6 9

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

M2Mul1 Val1 Mul2Add2

Fall 2019 CS5513 Computer Architecture 25

Page 51: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 9

Clock: cycle 9

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 No1 Add2 Yes ADD Val1 M2

Add3 No6 Mul1 Yes MUL M2 Value(R4)

Mul2 Yes DIV M1 Mul1

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4 5MUL R0 R2 R4 3 6SUB R8 R6 R2 4 6-7 8DIV R10 R0 R6 5ADD R6 R8 R2 6 9

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

M2Mul1 Val1 Mul2Add2

Instruction 6starts execu-tion.

Add takes 2 cycles. 1 cycle re-mains.

Fall 2019 CS5513 Computer Architecture 25

Page 52: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 10

Clock: cycle 10

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 No0 Add2 Yes ADD Val1 M2

Add3 No5 Mul1 Yes MUL M2 Value(R4)

Mul2 Yes DIV M1 Mul1

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4 5MUL R0 R2 R4 3 6SUB R8 R6 R2 4 6-7 8DIV R10 R0 R6 5ADD R6 R8 R2 6 9-10

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

M2Mul1 Val1 Mul2Add2

Fall 2019 CS5513 Computer Architecture 26

Page 53: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 10

Clock: cycle 10

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 No0 Add2 Yes ADD Val1 M2

Add3 No5 Mul1 Yes MUL M2 Value(R4)

Mul2 Yes DIV M1 Mul1

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4 5MUL R0 R2 R4 3 6SUB R8 R6 R2 4 6-7 8DIV R10 R0 R6 5ADD R6 R8 R2 6 9-10

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

M2Mul1 Val1 Mul2Add2

Instruction6 finishesexecution.

Fall 2019 CS5513 Computer Architecture 26

Page 54: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 11

Clock: cycle 11

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 NoAdd3 No

4 Mul1 Yes MUL M2 Value(R4)Mul2 Yes DIV M1 Mul1

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4 5MUL R0 R2 R4 3 6SUB R8 R6 R2 4 6-7 8DIV R10 R0 R6 5ADD R6 R8 R2 6 9-10 11

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

M2Mul1 Val1 Mul2Val2

Fall 2019 CS5513 Computer Architecture 27

Page 55: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 11

Clock: cycle 11

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 NoAdd3 No

4 Mul1 Yes MUL M2 Value(R4)Mul2 Yes DIV M1 Mul1

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4 5MUL R0 R2 R4 3 6SUB R8 R6 R2 4 6-7 8DIV R10 R0 R6 5ADD R6 R8 R2 6 9-10 11

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

M2Mul1 Val1 Mul2Val2

Instruction 6writes back.

Add2 sends result value Val2 andid “Add2” on to CDB. Instruction 6releases Add2.

Fall 2019 CS5513 Computer Architecture 27

Page 56: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 11

Clock: cycle 11

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 NoAdd3 No

4 Mul1 Yes MUL M2 Value(R4)Mul2 Yes DIV M1 Mul1

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4 5MUL R0 R2 R4 3 6SUB R8 R6 R2 4 6-7 8DIV R10 R0 R6 5ADD R6 R8 R2 6 9-10 11

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

M2Mul1 Val1 Mul2Val2

Register R6 sees id “Add2” on CDB, which matches its source, so it picks up thevalue Val2. Unlike Scoreboarding, WAW does not prevent write back as registersare already renamed: instruction 5 (DIV) waits on “Mul1” instead of R6 in the RS.

Fall 2019 CS5513 Computer Architecture 27

Page 57: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 15

Clock: cycle 15

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 NoAdd3 No

0 Mul1 Yes MUL M2 Value(R4)Mul2 Yes DIV M1 Mul1

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4 5MUL R0 R2 R4 3 6-15SUB R8 R6 R2 4 6-7 8DIV R10 R0 R6 5ADD R6 R8 R2 6 9-10 11

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

M2Mul1 Val1 Mul2Val2

Fall 2019 CS5513 Computer Architecture 28

Page 58: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 15

Clock: cycle 15

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 NoAdd3 No

0 Mul1 Yes MUL M2 Value(R4)Mul2 Yes DIV M1 Mul1

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4 5MUL R0 R2 R4 3 6-15SUB R8 R6 R2 4 6-7 8DIV R10 R0 R6 5ADD R6 R8 R2 6 9-10 11

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

M2Mul1 Val1 Mul2Val2

Instruction 3finishes.

Fall 2019 CS5513 Computer Architecture 28

Page 59: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 16

Clock: cycle 16

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 NoAdd3 NoMul1 NoMul2 Yes DIV Val3 M1

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2/3 4LD R2 45+ R13 2 3-4 5MUL R0 R2 R4 3 6-15 16SUB R8 R6 R2 4 6-7 8DIV R10 R0 R6 5ADD R6 R8 R2 6 9-10 11

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

M2Val3 Val1 Mul2Val2

Fall 2019 CS5513 Computer Architecture 29

Page 60: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 16

Clock: cycle 16

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 NoAdd3 NoMul1 NoMul2 Yes DIV Val3 M1

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2/3 4LD R2 45+ R13 2 3-4 5MUL R0 R2 R4 3 6-15 16SUB R8 R6 R2 4 6-7 8DIV R10 R0 R6 5ADD R6 R8 R2 6 9-10 11

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

M2Val3 Val1 Mul2Val2

Instruction 3writes back.

Mul1 sends result value Val3 andid “Mul1” on to CDB. Instruction 3releases Mul1.

Fall 2019 CS5513 Computer Architecture 29

Page 61: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 16

Clock: cycle 16

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 NoAdd3 NoMul1 NoMul2 Yes DIV Val3 M1

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2/3 4LD R2 45+ R13 2 3-4 5MUL R0 R2 R4 3 6-15 16SUB R8 R6 R2 4 6-7 8DIV R10 R0 R6 5ADD R6 R8 R2 6 9-10 11

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

M2Val3 Val1 Mul2Val2

Mul2 reads id “Mul1” from CDB,which matches its Vj . THerefore,Mul2 reads value Val2 from CDB toValj

Register R0 sees id “Mul1” on CDB,which matches its source, so itpicks up the value Val3.

Fall 2019 CS5513 Computer Architecture 29

Page 62: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 17

Clock: cycle 17

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 NoAdd3 NoMul1 No

39 Mul2 Yes DIV Val3 M1

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4 5MUL R0 R2 R4 3 6-15 16SUB R8 R6 R2 4 6-7 8DIV R10 R0 R6 5 17ADD R6 R8 R2 6 9-10 11

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

M2Val3 Val1 Mul2Val2

Fall 2019 CS5513 Computer Architecture 30

Page 63: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 17

Clock: cycle 17

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 NoAdd3 NoMul1 No

39 Mul2 Yes DIV Val3 M1

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4 5MUL R0 R2 R4 3 6-15 16SUB R8 R6 R2 4 6-7 8DIV R10 R0 R6 5 17ADD R6 R8 R2 6 9-10 11

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

M2Val3 Val1 Mul2Val2

Instruction 5starts execu-tion.

Fall 2019 CS5513 Computer Architecture 30

Page 64: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 56

Clock: cycle 56

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 NoAdd3 NoMul1 No

0 Mul2 Yes DIV Val3 M1

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4 5MUL R0 R2 R4 3 6-15 16SUB R8 R6 R2 4 6-7 8DIV R10 R0 R6 5 17-56ADD R6 R8 R2 6 9-10 11

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

M2Val3 Val1 Mul2Val2

Fall 2019 CS5513 Computer Architecture 31

Page 65: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 56

Clock: cycle 56

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 NoAdd3 NoMul1 No

0 Mul2 Yes DIV Val3 M1

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4 5MUL R0 R2 R4 3 6-15 16SUB R8 R6 R2 4 6-7 8DIV R10 R0 R6 5 17-56ADD R6 R8 R2 6 9-10 11

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

M2Val3 Val1 Mul2Val2

Instruction5 finishesexecution.

Fall 2019 CS5513 Computer Architecture 31

Page 66: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 57

Clock: cycle 57

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 NoAdd3 NoMul1 No

0 Mul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4 5MUL R0 R2 R4 3 6-15 16SUB R8 R6 R2 4 6-7 8DIV R10 R0 R6 5 17-56 57ADD R6 R8 R2 6 9-10 11

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

M2Val3 Val1 Val4Val2

Fall 2019 CS5513 Computer Architecture 32

Page 67: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 57

Clock: cycle 57

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 NoAdd3 NoMul1 No

0 Mul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4 5MUL R0 R2 R4 3 6-15 16SUB R8 R6 R2 4 6-7 8DIV R10 R0 R6 5 17-56 57ADD R6 R8 R2 6 9-10 11

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

M2Val3 Val1 Val4Val2

Instruction 5writes back.

Fall 2019 CS5513 Computer Architecture 32

Page 68: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example: Cycle 57

Clock: cycle 57

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 NoAdd3 NoMul1 No

0 Mul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R6 34+ R11 1 2-3 4LD R2 45+ R13 2 3-4 5MUL R0 R2 R4 3 6-15 16SUB R8 R6 R2 4 6-7 8DIV R10 R0 R6 5 17-56 57ADD R6 R8 R2 6 9-10 11

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

M2Val3 Val1 Val4Val2

Out-of-order com-pletion.

Out-of-order exe-cution.

In-order issue.

Fall 2019 CS5513 Computer Architecture 32

Page 69: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Comparing to ScoreboardI Comparing to Scoreboard’s 62 cycles, Tomasulo’s

Algorithm takes only 57 cycles to execute the samecode.

I Why Tomasulo’s Algorithm is faster?– Fewer structure hazards (Scoreboarding cannot issue if

functional units are busy).– CDB acts as a data forwarding mechanism: write back

also forwards the value to RS, eliminating a separateregister read.

Fall 2019 CS5513 Computer Architecture 33

Page 70: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Road Map

Overview of Tomasulo’s Algorithm

An Example of Tomasulo’s Algorithm

Another Example of Tomasulo’s Algorithm

Tomasulo Wrapup

Acknowledgment

Fall 2019 CS5513 Computer Architecture 34

Page 71: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2

Clock: cycle 0

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 NoAdd3 NoMul1 NoMul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11MUL R1 16 R1ADD R5 R5 R1LD R1 4 R1MUL R1 16 R1ADD R5 R5 R1

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

Fall 2019 CS5513 Computer Architecture 35

Page 72: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 1

Clock: cycle 1

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 NoAdd3 NoMul1 NoMul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1MUL R1 16 R1ADD R5 R5 R1LD R1 4 R11MUL R1 16 R1ADD R5 R5 R1

Load Buffers:

Busy AddressLoad1 Yes 0 + R11Load2 NoLoad3 No

Load1

Fall 2019 CS5513 Computer Architecture 36

Page 73: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 1

Clock: cycle 1

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 NoAdd3 NoMul1 NoMul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1MUL R1 16 R1ADD R5 R5 R1LD R1 4 R11MUL R1 16 R1ADD R5 R5 R1

Load Buffers:

Busy AddressLoad1 Yes 0 + R11Load2 NoLoad3 No

Load1

Issue the firstinstruction atcycle 1.

Occupiedthe first loadbuffer entrywith instruc-tion 1.

Register R1 will be filled with theresult from the Load1.

Fall 2019 CS5513 Computer Architecture 36

Page 74: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 2

Clock: cycle 2

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 NoAdd3 NoMul1 Yes MUL 16 Load1Mul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2MUL R1 16 R1 2ADD R5 R5 R1LD R1 4 R11MUL R1 16 R1ADD R5 R5 R1

Load Buffers:

Busy AddressLoad1 Yes 0 + R11Load2 NoLoad3 No

Mul1

Fall 2019 CS5513 Computer Architecture 37

Page 75: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 2

Clock: cycle 2

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 NoAdd3 NoMul1 Yes MUL 16 Load1Mul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2MUL R1 16 R1 2ADD R5 R5 R1LD R1 4 R11MUL R1 16 R1ADD R5 R5 R1

Load Buffers:

Busy AddressLoad1 Yes 0 + R11Load2 NoLoad3 No

Mul1

Issue the2nd instruc-tion at cycle2

Occupy Mul1 with the 3rd instruc-tion. Copy the immd value 16 into Vk (src1 value). Mark Qk to“Load1”, indicating src2 comes from“Load1”.

Register R1 will be filled with the result from Mul1. Note that,value from Load1 will not be copied into R1 anymore. Instead,Load1 will be forwarded to Mul1 directly.

Fall 2019 CS5513 Computer Architecture 37

Page 76: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 2

Clock: cycle 2

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 NoAdd3 NoMul1 Yes MUL 16 Load1Mul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2MUL R1 16 R1 2ADD R5 R5 R1LD R1 4 R11MUL R1 16 R1ADD R5 R5 R1

Load Buffers:

Busy AddressLoad1 Yes 0 + R11Load2 NoLoad3 No

Mul1

Instruction 1starts execut-ing. Takes2 cycles tocomplete.

Fall 2019 CS5513 Computer Architecture 37

Page 77: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 3

Clock: cycle 3

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 Yes ADD Val(R5) Mul1Add2 NoAdd3 NoMul1 Yes MUL 16 Load1Mul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3MUL R1 16 R1 2ADD R5 R5 R1 3LD R1 4 R11MUL R1 16 R1ADD R5 R5 R1

Load Buffers:

Busy AddressLoad1 Yes 0 + R11Load2 NoLoad3 No

Mul1 Add1

Fall 2019 CS5513 Computer Architecture 38

Page 78: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 3

Clock: cycle 3

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 Yes ADD Val(R5) Mul1Add2 NoAdd3 NoMul1 Yes MUL 16 Load1Mul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3MUL R1 16 R1 2ADD R5 R5 R1 3LD R1 4 R11MUL R1 16 R1ADD R5 R5 R1

Load Buffers:

Busy AddressLoad1 Yes 0 + R11Load2 NoLoad3 No

Mul1 Add1

Issue the 3rdinstruction atcycle 3.

Register R5 will be filled with theresult from Add1.

Occupy Add1 with the 3rd instruc-tion. Copy the value of register R5in to Vj (src1 value). Mark Qk to“Mul1”, indicating src2 comes from“Mul1”.

Fall 2019 CS5513 Computer Architecture 38

Page 79: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 3

Clock: cycle 3

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 Yes ADD Val(R5) Mul1Add2 NoAdd3 NoMul1 Yes MUL 16 Load1Mul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3MUL R1 16 R1 2ADD R5 R5 R1 3LD R1 4 R11MUL R1 16 R1ADD R5 R5 R1

Load Buffers:

Busy AddressLoad1 Yes 0 + R11Load2 NoLoad3 No

Mul1 Add1

Instruction1 finishesexecution atthe end ofcycle 3.

Fall 2019 CS5513 Computer Architecture 38

Page 80: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 4

Clock: cycle 4

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 Yes ADD Val(R5) Mul1Add2 NoAdd3 NoMul1 Yes MUL 16 M1

Mul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2ADD R5 R5 R1 3LD R1 4 R11 4MUL R1 16 R1ADD R5 R5 R1

Load Buffers:

Busy AddressLoad1 NoLoad2 Yes 4 + R11Load3 No

Load2 Add1

Fall 2019 CS5513 Computer Architecture 39

Page 81: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 4

Clock: cycle 4

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 Yes ADD Val(R5) Mul1Add2 NoAdd3 NoMul1 Yes MUL 16 M1

Mul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2ADD R5 R5 R1 3LD R1 4 R11 4MUL R1 16 R1ADD R5 R5 R1

Load Buffers:

Busy AddressLoad1 NoLoad2 Yes 4 + R11Load3 No

Load2 Add1

Issue the 4thinstruction atcycle 4.

Occupiedthe 2nd loadbuffer entrywith instruc-tion 2.

Register R1 will be filled with the result from Load2. Note that,value from Mul1 will not be copied into R1 anymore. Instead,Mul1 will be forwarded to Add1 directly.

Fall 2019 CS5513 Computer Architecture 39

Page 82: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 4

Clock: cycle 4

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 Yes ADD Val(R5) Mul1Add2 NoAdd3 NoMul1 Yes MUL 16 M1

Mul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2ADD R5 R5 R1 3LD R1 4 R11 4MUL R1 16 R1ADD R5 R5 R1

Load Buffers:

Busy AddressLoad1 NoLoad2 Yes 4 + R11Load3 No

Load2 Add1

Instruction 1writes back.Load bufferssends resultvalue M1 andid “Load1” onto CDB.

Load1 is re-leased.

Mul1 sees id “Load1” on CDBmatches its source, picks up thevalue M1 and set it to Vk .

Fall 2019 CS5513 Computer Architecture 39

Page 83: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 5

Clock: cycle 5

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 Yes ADD Val(R5) Mul1Add2 NoAdd3 No

9 Mul1 Yes MUL 16 M1

Mul2 Yes MUL 16 Load2

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5ADD R5 R5 R1 3LD R1 4 R11 4 5MUL R1 16 R1 5ADD R5 R5 R1

Load Buffers:

Busy AddressLoad1 NoLoad2 Yes 4 + R11Load3 No

Mul2 Add1

Fall 2019 CS5513 Computer Architecture 40

Page 84: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 5

Clock: cycle 5

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 Yes ADD Val(R5) Mul1Add2 NoAdd3 No

9 Mul1 Yes MUL 16 M1

Mul2 Yes MUL 16 Load2

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5ADD R5 R5 R1 3LD R1 4 R11 4 5MUL R1 16 R1 5ADD R5 R5 R1

Load Buffers:

Busy AddressLoad1 NoLoad2 Yes 4 + R11Load3 No

Mul2 Add1

Issue the 5thinstruction atcycle 5.

Register R1 will be filled with the result from Mul2. Note that,value from Load2 will not be copied into R1 anymore. Instead,Mul1 will be forwarded to Mul2 directly.

Occupy Mul2 with the 5th instruction. Copy the immdvalue 16 in to Vk (src1 value). Mark Qk to “Load2”,indicating src2 comes from “Load2”.

Fall 2019 CS5513 Computer Architecture 40

Page 85: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 5

Clock: cycle 5

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 Yes ADD Val(R5) Mul1Add2 NoAdd3 No

9 Mul1 Yes MUL 16 M1

Mul2 Yes MUL 16 Load2

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5ADD R5 R5 R1 3LD R1 4 R11 4 5MUL R1 16 R1 5ADD R5 R5 R1

Load Buffers:

Busy AddressLoad1 NoLoad2 Yes 4 + R11Load3 No

Mul2 Add1

Instruction 2starts execut-ing at Mul1

Instruction 2 still needs 9 cycles tofinish.

Fall 2019 CS5513 Computer Architecture 40

Page 86: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 5

Clock: cycle 5

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 Yes ADD Val(R5) Mul1Add2 NoAdd3 No

9 Mul1 Yes MUL 16 M1

Mul2 Yes MUL 16 Load2

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5ADD R5 R5 R1 3LD R1 4 R11 4 5MUL R1 16 R1 5ADD R5 R5 R1

Load Buffers:

Busy AddressLoad1 NoLoad2 Yes 4 + R11Load3 No

Mul2 Add1

Instruction 4starts execut-ing at Load2

Fall 2019 CS5513 Computer Architecture 40

Page 87: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 6

Clock: cycle 6

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 Yes ADD Val(R5) Mul1Add2 Yes ADD Add1 Mul2Add3 No

8 Mul1 Yes MUL 16 M1

Mul2 Yes MUL 16 Load2

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5ADD R5 R5 R1 3LD R1 4 R11 4 5-6MUL R1 16 R1 5ADD R5 R5 R1 6

Load Buffers:

Busy AddressLoad1 NoLoad2 Yes 4 + R11Load3 No

Mul2 Add2

Fall 2019 CS5513 Computer Architecture 41

Page 88: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 6

Clock: cycle 6

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 Yes ADD Val(R5) Mul1Add2 Yes ADD Add1 Mul2Add3 No

8 Mul1 Yes MUL 16 M1

Mul2 Yes MUL 16 Load2

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5ADD R5 R5 R1 3LD R1 4 R11 4 5-6MUL R1 16 R1 5ADD R5 R5 R1 6

Load Buffers:

Busy AddressLoad1 NoLoad2 Yes 4 + R11Load3 No

Mul2 Add2

Issue the 6thinstruction atcycle 6.

Register R5 will be filled with the result from Add2. Note that,value from Add1 will not be copied into R5 anymore. Instead,Add1 will be forwarded to Add2 directly.

Occupy Add2 with the 6th instruc-tion. Both operands are not ready,and are from “Add1” and “Mul2”each.

Fall 2019 CS5513 Computer Architecture 41

Page 89: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 6

Clock: cycle 6

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 Yes ADD Val(R5) Mul1Add2 Yes ADD Add1 Mul2Add3 No

8 Mul1 Yes MUL 16 M1

Mul2 Yes MUL 16 Load2

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5ADD R5 R5 R1 3LD R1 4 R11 4 5-6MUL R1 16 R1 5ADD R5 R5 R1 6

Load Buffers:

Busy AddressLoad1 NoLoad2 Yes 4 + R11Load3 No

Mul2 Add2

Instruction4 finishesexecution atthe end ofcycle 6.

Fall 2019 CS5513 Computer Architecture 41

Page 90: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 7

Clock: cycle 7

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 Yes ADD Val(R5) Mul1Add2 Yes ADD Add1 Mul2Add3 No

7 Mul1 Yes MUL 16 M1

Mul2 Yes MUL 16 M2

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5ADD R5 R5 R1 3LD R1 4 R11 4 5-6 7MUL R1 16 R1 5ADD R5 R5 R1 6

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

Mul2 Add2

Fall 2019 CS5513 Computer Architecture 42

Page 91: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 7

Clock: cycle 7

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 Yes ADD Val(R5) Mul1Add2 Yes ADD Add1 Mul2Add3 No

7 Mul1 Yes MUL 16 M1

Mul2 Yes MUL 16 M2

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5ADD R5 R5 R1 3LD R1 4 R11 4 5-6 7MUL R1 16 R1 5ADD R5 R5 R1 6

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

Mul2 Add2

Instruction 4writes back.Load bufferssends resultvalue M2 andid “Load2” onto CDB.

Load2 is re-leased.

Mul2 sees id “Load2” on CDBmatches its source, picks up thevalue M2 and set it to Vk .

Fall 2019 CS5513 Computer Architecture 42

Page 92: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 8

Clock: cycle 8

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 Yes ADD Val(R5) Mul1Add2 Yes ADD Add1 Mul2Add3 No

6 Mul1 Yes MUL 16 M1

9 Mul2 Yes MUL 16 M2

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5ADD R5 R5 R1 3LD R1 4 R11 4 5-6 7MUL R1 16 R1 5 8ADD R5 R5 R1 6

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

Mul2 Add2

Fall 2019 CS5513 Computer Architecture 43

Page 93: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 8

Clock: cycle 8

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 Yes ADD Val(R5) Mul1Add2 Yes ADD Add1 Mul2Add3 No

6 Mul1 Yes MUL 16 M1

9 Mul2 Yes MUL 16 M2

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5ADD R5 R5 R1 3LD R1 4 R11 4 5-6 7MUL R1 16 R1 5 8ADD R5 R5 R1 6

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

Mul2 Add2

Instruction 5 starts executingat Mul2, assuming we havetwo adders. We now have twoiterations of the loop running atthe same time.

Instruction 2 and 5 still need 6 and9 cycles to finish.

Fall 2019 CS5513 Computer Architecture 43

Page 94: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 14

Clock: cycle 14

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 Yes ADD Val(R5) Mul1Add2 Yes ADD Add1 Mul2Add3 No

0 Mul1 Yes MUL 16 M1

3 Mul2 Yes MUL 16 M2

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5-14ADD R5 R5 R1 3LD R1 4 R11 4 5-6 7MUL R1 16 R1 5 8ADD R5 R5 R1 6

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

Mul2 Add2

Fall 2019 CS5513 Computer Architecture 44

Page 95: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 14

Clock: cycle 14

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 Yes ADD Val(R5) Mul1Add2 Yes ADD Add1 Mul2Add3 No

0 Mul1 Yes MUL 16 M1

3 Mul2 Yes MUL 16 M2

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5-14ADD R5 R5 R1 3LD R1 4 R11 4 5-6 7MUL R1 16 R1 5 8ADD R5 R5 R1 6

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

Mul2 Add2

Instruction2 finishesexecuting.

Fall 2019 CS5513 Computer Architecture 44

Page 96: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 15

Clock: cycle 15

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 Yes ADD Val(R5) Val1Add2 Yes ADD Add1 Mul2Add3 NoMul1 No

2 Mul2 Yes MUL 16 M2

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5-14 15ADD R5 R5 R1 3LD R1 4 R11 4 5-6 7MUL R1 16 R1 5 8ADD R5 R5 R1 6

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

Mul2 Add2

Fall 2019 CS5513 Computer Architecture 45

Page 97: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 15

Clock: cycle 15

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 Yes ADD Val(R5) Val1Add2 Yes ADD Add1 Mul2Add3 NoMul1 No

2 Mul2 Yes MUL 16 M2

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5-14 15ADD R5 R5 R1 3LD R1 4 R11 4 5-6 7MUL R1 16 R1 5 8ADD R5 R5 R1 6

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

Mul2 Add2

Instruction 2writes back.

Mul1 sends result value Val1 andid “Mul1” on to CDB. Instruction 2releases Mul1.

Fall 2019 CS5513 Computer Architecture 45

Page 98: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 15

Clock: cycle 15

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 Yes ADD Val(R5) Val1Add2 Yes ADD Add1 Mul2Add3 NoMul1 No

2 Mul2 Yes MUL 16 M2

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5-14 15ADD R5 R5 R1 3LD R1 4 R11 4 5-6 7MUL R1 16 R1 5 8ADD R5 R5 R1 6

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

Mul2 Add2

Add1 sees id “Mul1” on CDBmatches its source, picks up thevalue Val1 and set it to Vk .

Fall 2019 CS5513 Computer Architecture 45

Page 99: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 16

Clock: cycle 16

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

1 Add1 Yes ADD Val(R5) Val1Add2 Yes ADD Add1 Mul2Add3 NoMul1 No

1 Mul2 Yes MUL 16 M2

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5-14 15ADD R5 R5 R1 3 16LD R1 4 R11 4 5-6 7MUL R1 16 R1 5 8ADD R5 R5 R1 6

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

Mul2 Add2

Fall 2019 CS5513 Computer Architecture 46

Page 100: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 16

Clock: cycle 16

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

1 Add1 Yes ADD Val(R5) Val1Add2 Yes ADD Add1 Mul2Add3 NoMul1 No

1 Mul2 Yes MUL 16 M2

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5-14 15ADD R5 R5 R1 3 16LD R1 4 R11 4 5-6 7MUL R1 16 R1 5 8ADD R5 R5 R1 6

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

Mul2 Add2

Instruction 3starts execut-ing at Add1.

Instruction 3 and 5 still each needs1 cycle to finish.

Fall 2019 CS5513 Computer Architecture 46

Page 101: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 17

Clock: cycle 17

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

0 Add1 Yes ADD Val(R5) Val1Add2 Yes ADD Add1 Mul2Add3 NoMul1 No

0 Mul2 Yes MUL 16 M2

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5-14 15ADD R5 R5 R1 3 16-17LD R1 4 R11 4 5-6 7MUL R1 16 R1 5 8-17ADD R5 R5 R1 6

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

Mul2 Add2

Fall 2019 CS5513 Computer Architecture 47

Page 102: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 17

Clock: cycle 17

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

0 Add1 Yes ADD Val(R5) Val1Add2 Yes ADD Add1 Mul2Add3 NoMul1 No

0 Mul2 Yes MUL 16 M2

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5-14 15ADD R5 R5 R1 3 16-17LD R1 4 R11 4 5-6 7MUL R1 16 R1 5 8-17ADD R5 R5 R1 6

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

Mul2 Add2

Instruction3 finishesexecuting.

Fall 2019 CS5513 Computer Architecture 47

Page 103: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 17

Clock: cycle 17

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

0 Add1 Yes ADD Val(R5) Val1Add2 Yes ADD Add1 Mul2Add3 NoMul1 No

0 Mul2 Yes MUL 16 M2

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5-14 15ADD R5 R5 R1 3 16-17LD R1 4 R11 4 5-6 7MUL R1 16 R1 5 8-17ADD R5 R5 R1 6

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

Mul2 Add2

Instruction5 finishesexecuting.

Fall 2019 CS5513 Computer Architecture 47

Page 104: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 18

Clock: cycle 18

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 Yes ADD Val2 Mul2Add3 NoMul1 No

0 Mul2 Yes MUL 16 M2

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5-14 15ADD R5 R5 R1 3 16-17 18LD R1 4 R11 4 5-6 7MUL R1 16 R1 5 8-17ADD R5 R5 R1 6

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

Val3 Add2

Fall 2019 CS5513 Computer Architecture 48

Page 105: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 18

Clock: cycle 18

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 Yes ADD Val2 Mul2Add3 NoMul1 No

0 Mul2 Yes MUL 16 M2

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5-14 15ADD R5 R5 R1 3 16-17 18LD R1 4 R11 4 5-6 7MUL R1 16 R1 5 8-17ADD R5 R5 R1 6

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

Val3 Add2

Instruction 3writes back.

Add1 sends result value Val2 andid “Add1” on to CDB. Instruction 3releases Add1.

Fall 2019 CS5513 Computer Architecture 48

Page 106: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 18

Clock: cycle 18

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 Yes ADD Val2 Mul2Add3 NoMul1 No

0 Mul2 Yes MUL 16 M2

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5-14 15ADD R5 R5 R1 3 16-17 18LD R1 4 R11 4 5-6 7MUL R1 16 R1 5 8-17ADD R5 R5 R1 6

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

Val3 Add2

Add2 sees id “Add1” on CDBmatches its source, picks up thevalue Val2 and set it to Vj .

Fall 2019 CS5513 Computer Architecture 48

Page 107: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 18

Clock: cycle 18

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 Yes ADD Val2 Mul2Add3 NoMul1 No

0 Mul2 Yes MUL 16 M2

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5-14 15ADD R5 R5 R1 3 16-17 18LD R1 4 R11 4 5-6 7MUL R1 16 R1 5 8-17ADD R5 R5 R1 6

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

Val3 Add2

Instruction 5cannot writeback due toCDB is busy(structuralhazard).

Fall 2019 CS5513 Computer Architecture 48

Page 108: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 19

Clock: cycle 19

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 Yes ADD Val2 Val3Add3 NoMul1 NoMul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5-14 15ADD R5 R5 R1 3 16-17 18LD R1 4 R11 4 5-6 7MUL R1 16 R1 5 8-17 19ADD R5 R5 R1 6

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

Val3 Add2

Fall 2019 CS5513 Computer Architecture 49

Page 109: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 19

Clock: cycle 19

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 Yes ADD Val2 Val3Add3 NoMul1 NoMul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5-14 15ADD R5 R5 R1 3 16-17 18LD R1 4 R11 4 5-6 7MUL R1 16 R1 5 8-17 19ADD R5 R5 R1 6

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

Val3 Add2

Instruction 5writes back.

Mul2 sends result value Val3 andid “Mul2” on to CDB. Instruction 5releases Mul2.

Fall 2019 CS5513 Computer Architecture 49

Page 110: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 19

Clock: cycle 19

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 Yes ADD Val2 Val3Add3 NoMul1 NoMul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5-14 15ADD R5 R5 R1 3 16-17 18LD R1 4 R11 4 5-6 7MUL R1 16 R1 5 8-17 19ADD R5 R5 R1 6

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

Val3 Add2

Add2 sees id “Mul2” on CDBmatches its source, picks up thevalue Val3 and set it to Vk .

Register R1 sees id “Mul2” on CDB,which matches its source, so itpicks up the value Val3.

Fall 2019 CS5513 Computer Architecture 49

Page 111: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 20

Clock: cycle 20

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 No1 Add2 Yes ADD Val2 Val3

Add3 NoMul1 NoMul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5-14 15ADD R5 R5 R1 3 16-17 18LD R1 4 R11 4 5-6 7MUL R1 16 R1 5 8-17 19ADD R5 R5 R1 6 20

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

Val3 Add2

Fall 2019 CS5513 Computer Architecture 50

Page 112: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 20

Clock: cycle 20

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 No1 Add2 Yes ADD Val2 Val3

Add3 NoMul1 NoMul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5-14 15ADD R5 R5 R1 3 16-17 18LD R1 4 R11 4 5-6 7MUL R1 16 R1 5 8-17 19ADD R5 R5 R1 6 20

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

Val3 Add2

Instruction 6starts execut-ing.

Fall 2019 CS5513 Computer Architecture 50

Page 113: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 21

Clock: cycle 21

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 No0 Add2 Yes ADD Val2 Val3

Add3 NoMul1 NoMul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5-14 15ADD R5 R5 R1 3 16-17 18LD R1 4 R11 4 5-6 7MUL R1 16 R1 5 8-17 19ADD R5 R5 R1 6 20-21

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

Val3 Add2

Fall 2019 CS5513 Computer Architecture 51

Page 114: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 21

Clock: cycle 21

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 No0 Add2 Yes ADD Val2 Val3

Add3 NoMul1 NoMul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5-14 15ADD R5 R5 R1 3 16-17 18LD R1 4 R11 4 5-6 7MUL R1 16 R1 5 8-17 19ADD R5 R5 R1 6 20-21

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

Val3 Add2

Instruction6 finishesexecuting.

Fall 2019 CS5513 Computer Architecture 51

Page 115: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 22

Clock: cycle 22

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 NOAdd3 NoMul1 NoMul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5-14 15ADD R5 R5 R1 3 16-17 18LD R1 4 R11 4 5-6 7MUL R1 16 R1 5 8-17 19ADD R5 R5 R1 6 20-21 22

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

Val3 Val4

Fall 2019 CS5513 Computer Architecture 52

Page 116: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 22

Clock: cycle 22

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 NOAdd3 NoMul1 NoMul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5-14 15ADD R5 R5 R1 3 16-17 18LD R1 4 R11 4 5-6 7MUL R1 16 R1 5 8-17 19ADD R5 R5 R1 6 20-21 22

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

Val3 Val4

Instruction 6writes back.

Add2 sends result value Val4 andid “Add2” on to CDB. Instruction 6releases Add2.

Fall 2019 CS5513 Computer Architecture 52

Page 117: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo Example 2: Cycle 22

Clock: cycle 22

Register Result Status:

FUR0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 ... ... ...

Reservation Stations for the Adder and Multiplier

Time Name Busy Op Src1 ValueVj

Src2 ValueVk

Src1 RS IdQj

Src2 RS IdQk

Add1 NoAdd2 NOAdd3 NoMul1 NoMul2 No

Instruction Status:

Instructions StagesOp Dest i Src1 j Src2 k IS EX WBLD R1 0+ R11 1 2-3 4MUL R1 16 R1 2 5-14 15ADD R5 R5 R1 3 16-17 18LD R1 4 R11 4 5-6 7MUL R1 16 R1 5 8-17 19ADD R5 R5 R1 6 20-21 22

Load Buffers:

Busy AddressLoad1 NoLoad2 NoLoad3 No

Val3 Val4

Register R5 sees id “Add2” onCDB, which matches its source,so it picks up the value Val4.

Fall 2019 CS5513 Computer Architecture 52

Page 118: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Road Map

Overview of Tomasulo’s Algorithm

An Example of Tomasulo’s Algorithm

Another Example of Tomasulo’s Algorithm

Tomasulo Wrapup

Acknowledgment

Fall 2019 CS5513 Computer Architecture 53

Page 119: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo ReviewI Reservation Stations

– RS provides many more registers than those name-able inthe assembly programming.

– Distribute RAW hazard detection– Renaming eliminates WAW hazards– Buffering values in Reservation Stations removes WARs– Tag match in CDB requires many associative compares

I Common Data Bus– Broadcast results to RS and register file.– Can be a bottle neck: Multiple writebacks (multiple CDBs)

expensive

Fall 2019 CS5513 Computer Architecture 54

Page 120: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Load/Store reorderingI In case a store and a load to the same memory

address are reordered, store buffer has to forward thevalue to the load buffer.

– That is, if store R1 and load R1 are reordered to load R1and store R1, load R1 cannot be issued but wait for storebuffer to forward the value.

I To support this forwarding, every load has to checkthe memory addresses in the store buffers beforeexecuting.

Fall 2019 CS5513 Computer Architecture 55

Page 121: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Tomasulo vs. Scoreboarding1. No explicit checking for WAW or WAR hazards2. CDB broadcasts results rather than waiting on

registers3. Loads/Store are treated like basic FUs4. Distributed vs. Centralized control

Fall 2019 CS5513 Computer Architecture 56

Page 122: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Register Renaming: Pointer-BasedI Registers are renamed (mapped) to an internal

register.I A map table records this renaming/mapping.I Mapper/Map Table: Hardware to hold these

mappings– Register Writes: Allocate new location, note mapping in

table– Register Reads: Look in map table, find location of most

recent writeI Deallocate mappings when doneI Used in MIPS R10K, Alpha 21264, Pentium 4 and

POWER4

Fall 2019 CS5513 Computer Architecture 57

Page 123: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

Road Map

Overview of Tomasulo’s Algorithm

An Example of Tomasulo’s Algorithm

Another Example of Tomasulo’s Algorithm

Tomasulo Wrapup

Acknowledgment

Fall 2019 CS5513 Computer Architecture 58

Page 124: Out-of-Order Execution: Tomasulo's Algorithm - CS5513 Computer …€¦ · Out-of-OrderExecution:Tomasulo’s Algorithm CS5513ComputerArchitecture WeiWang ComputerScience UniversityofTexasatSanAntonio

AcknowledgmentThis lecture is partially based on the slides from Dr. DavidBrooks. The first Tomasulo’s example was based ib Dr.Broderson’s example, for CS152 at UCB (Copyright (C)2001 UCB).

Fall 2019 CS5513 Computer Architecture 59