Pipelining

Pipelining - facultystaff.richmond.edu · Performance Measurements • Cycle Time: Time in between clock ticks • Latency: Time to finish a complete job, start to finish • Throughput:


Performance Measurements

•  Cycle Time: Time in between clock ticks

•  Latency: Time to finish a complete job, start to finish

•  Throughput: Average jobs completed per unit time

•  CyclesPerJob: Number of cycles between finishing jobs.


Goals

•  Faster clock rate

•  Use machine more efficiently

•  No longer execute only one instruction at a time

Laundry

•  Laundry-o-matic washes, dries & folds

•  Wash: 30 min

•  Dry: 40 min

•  Fold: 20 min

•  It switches them internally with no delay

•  How long to complete 1 load? 90 min

Laundry-o-Matic - SingleCycle

[Timing diagram: loads 1 to 3 against minutes 0 to 270. Each load runs W, D, F back to back in one 90-minute cycle; load 1 occupies 0-90, load 2 occupies 90-180, and load 3 occupies 180-270.]

Laundry-o-Matic

•  Cycle Time: Clothing is switched every 90 minutes

•  Latency: A single load takes a total of 90 minutes

•  Throughput: A load completes every 90 minutes

•  CyclesPerLoad: Every 1 cycle, a load completes


Pipelined Laundry

•  Split the laundry-o-matic into a washer, dryer, and folder (what a concept)

•  Moving the laundry from one to another takes 6 minutes

Pipelined Laundry

[Timing diagram, built up over several slides: loads 1 to 3 against minutes 0 to 270, each passing through W, D, and F in separate machines.]

•  We have to include time to switch stages

•  Two loads cannot be in the dryer at the same time

•  Switch all loads at the same time

Pipelined Laundry

•  Cycle Time: Clothing is switched every 46 minutes

•  Latency: A single load takes a total of 138 minutes

•  Throughput: A load completes every 46 minutes

•  CyclesPerLoad: Every 1 cycle, a load completes
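The four laundry measurements can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the names `Metrics`, `single_cycle`, and `pipelined` are made up for this example, and it assumes the convention used on these slides, where the pipelined cycle is the slowest stage plus the switching time and every stage, including the last, takes one full cycle.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    cycle_time: int        # minutes between switches
    latency: int           # minutes for one load, start to finish
    minutes_per_load: int  # steady-state gap between finished loads
    cycles_per_load: int   # cycles between finished loads


def single_cycle(stages):
    # One long cycle covers every stage; only one load is in the machine.
    cycle = sum(stages)
    return Metrics(cycle, cycle, cycle, 1)


def pipelined(stages, switch):
    # All loads switch together, so the slowest stage sets the cycle.
    cycle = max(stages) + switch
    return Metrics(cycle, cycle * len(stages), cycle, 1)


STAGES = [30, 40, 20]  # wash, dry, fold (minutes)

single = single_cycle(STAGES)
piped = pipelined(STAGES, switch=6)
print(single.cycle_time, single.latency)  # 90 90
print(piped.cycle_time, piped.latency)    # 46 138
```

Latency goes up (138 vs 90 minutes) while the time between finished loads goes down (46 vs 90 minutes), which is the trade-off the next slide asks about.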

Single-Cycle vs Pipelined

•  Single has the higher cycle time

•  Pipelined has the higher clock rate

•  Pipelined has the higher single-load latency

•  Pipelined has the higher throughput

•  Neither has the higher CPL (Cycles per Load)

•  More stages make for a higher clock rate

Obstacles to speedup in Pipelining

•  1. Uneven Stages

•  2. Pipeline Register Delay

•  Ideal cycle time w/out above limitations with n stage pipeline: OldCycleTime / n
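The two obstacles can be written as one inequality. The symbols here are assumptions introduced for this note, not notation from the slides: $t_i$ is the time of stage $i$, $T_{\text{old}} = \sum_i t_i$ is the single-cycle time, and $d$ is the pipeline register (switching) delay.

```latex
T_{\text{ideal}} = \frac{T_{\text{old}}}{n},
\qquad
T_{\text{pipe}} = \max_{1 \le i \le n} t_i + d \;\ge\; \frac{T_{\text{old}}}{n} + d
```

For the laundry example, $T_{\text{ideal}} = 90 / 3 = 30$ minutes, but the uneven 40-minute dryer and the 6-minute switch give $T_{\text{pipe}} = 40 + 6 = 46$ minutes.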


Example

•  Washing = 45

•  Drying = 120

•  Folding = 15

•  Switching = 5

•  What is the latency for one load of laundry?

•  What is the latency for three loads?
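The slide leaves both answers open. One way to work them out is sketched below, assuming the example refers to the pipelined machine and follows the same convention as the earlier laundry slides (cycle = slowest stage + switching time, every stage takes one full cycle, loads enter back to back). The function name `pipelined_latency` is hypothetical.

```python
def pipelined_latency(stages, switch, loads=1):
    """Time until the last of `loads` back-to-back loads leaves the pipeline."""
    cycle = max(stages) + switch
    # The first load needs len(stages) cycles; each later load adds one cycle.
    return cycle * (len(stages) + loads - 1)


STAGES = [45, 120, 15]  # washing, drying, folding

print(pipelined_latency(STAGES, switch=5, loads=1))  # 375
print(pipelined_latency(STAGES, switch=5, loads=3))  # 625
```

Under these assumptions the cycle is 125, so one load takes 3 cycles and three loads take 5 cycles.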


Creating Stages

•  Fetch – get instruction

•  Decode – read registers

•  Execute – use ALU

•  Memory – access memory

•  WriteBack – write registers

Stage abbreviations: IF (Fetch), ID (Decode), EX (Execute), MEM (Memory), WB (WriteBack)

Pipelined Machine

[Datapath diagram: PC, +4 adder, and Instruction Memory in Fetch; Register File and 16-to-32-bit sign extension in Decode; shift-left-2 units; Data Memory in Memory; Writeback into the Register File; a pipeline register between consecutive stages.]

Time->               1    2    3    4    5    6    7    8
add $s0, $0, $0      IF   ID   EX   MEM  WB
lw  $s1, 0($t0)           IF   ID   EX   MEM  WB
sw  $s2, 0($t1)                IF   ID   EX   MEM  WB
or  $s3, $s4, $t3                   IF   ID   EX   MEM  WB

In what cycle was $s1 written? 6
In what cycle was $s4 read? 5
In what cycle was the Add executed? 3
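The timing table follows a simple rule: instruction k (counting from 0) enters IF in cycle k + 1 and moves one stage per cycle. A minimal sketch of that rule, assuming no stalls; the `schedule` helper is a name invented here, not something from the slides.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

PROGRAM = [
    "add $s0, $0, $0",
    "lw $s1, 0($t0)",
    "sw $s2, 0($t1)",
    "or $s3, $s4, $t3",
]


def schedule(program):
    """Return one {stage: cycle} dict per instruction, assuming no stalls."""
    table = []
    for index, _ in enumerate(program):
        first_cycle = index + 1  # one new instruction is fetched per cycle
        table.append({stage: first_cycle + offset
                      for offset, stage in enumerate(STAGES)})
    return table


table = schedule(PROGRAM)
print(table[1]["WB"])  # lw writes $s1 in cycle 6
print(table[3]["ID"])  # or reads $s4 in cycle 5
print(table[0]["EX"])  # add executes in cycle 3
```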

Performance Analysis

•  Measurements related to our machine

•  Job = single instruction

•  Latency: Time to finish a complete instruction, start to finish.

•  Throughput: Average number of instructions completed per unit time.

•  Which is more important for reducing program execution time?


Pipeline Registers

•  Named for the two stages they separate

•  Store all data corresponding to lines that go through them

•  IF/ID
   -  32b instruction
   -  32b nPC

•  ID/EX
   -  32b register
   -  32b register
   -  32b immediate field
   -  32b nPC

•  EX/MEM
   -  Zero
   -  32b ALU result
   -  32b nPC
   -  32b register value

•  MEM/WB
   -  32b ALU result
   -  32b memory value


Register File

•  Only takes half of a cycle to read or write to register file

•  Convention:
   -  Read 2nd half of cycle
   -  Write 1st half of cycle

Machine Comparison

Fetch   Decode   Execute   Memory   WriteBack
2 ns    1 ns     2 ns      2 ns     1 ns
0.1 ns pipeline register delay

Single-Cycle Implementation
   Clock cycle time: 8 ns
   Latency of a single instruction: 8 ns
   Throughput for machine: 1/8 inst/ns

Pipelined Implementation
   Clock cycle time: 2.1 ns
   Latency of a single instruction: 2.1 * 5 = 10.5 ns
   Throughput for machine: 1/2.1 inst/ns
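The same numbers fall out of a short calculation. This is a sketch under the slide's own rules (single-cycle clock = sum of stage delays, pipelined clock = slowest stage + register delay); the function name `compare` and the dictionary keys are illustrative choices.

```python
def compare(stage_ns, register_delay_ns):
    """Cycle time, latency, and throughput for both implementations."""
    single_cycle = sum(stage_ns)
    pipe_cycle = max(stage_ns) + register_delay_ns
    return {
        "single": {
            "cycle_ns": single_cycle,
            "latency_ns": single_cycle,
            "throughput_inst_per_ns": 1 / single_cycle,
        },
        "pipelined": {
            "cycle_ns": pipe_cycle,
            "latency_ns": pipe_cycle * len(stage_ns),
            "throughput_inst_per_ns": 1 / pipe_cycle,
        },
    }


# Fetch, Decode, Execute, Memory, WriteBack delays from the slide.
result = compare([2, 1, 2, 2, 1], register_delay_ns=0.1)
for name, numbers in result.items():
    print(name, numbers)
```

The throughput ratio is 8 / 2.1, a bit under 4x, rather than the ideal 5x for five stages, because of the uneven stages and the register delay.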

Example 2 – How do we speed up the pipelined machine?

Fetch   Decode   Execute   Memory   Writeback
6 ns    4 ns     8 ns      10 ns    4 ns
0.1 ns pipeline register delay

Throughput
   Single cycle: 1/32 inst/ns
   Pipelined: 1/10.1 inst/ns

Example 2 – Split more stages

Fetch   Decode   Execute   Memory   Writeback
6 ns    4 ns     8 ns      10 ns    4 ns
0.1 ns pipeline register delay

Which stage(s) should we split? Memory and Execute

Example 2 – After Split

F      D      X1     X2     M1     M2     WB
6 ns   4 ns   4 ns   4 ns   5 ns   5 ns   4 ns
0.1 ns pipeline register delay

Throughput
   Single cycle: 1/32 inst/ns
   Pipelined: 1/6.1 inst/ns
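The effect of splitting can be checked numerically. The sketch assumes each split divides a stage into equal halves, as the slide's 4/4 and 5/5 values suggest; `split_stage` and `pipelined_cycle` are hypothetical helper names.

```python
def pipelined_cycle(stage_ns, register_delay_ns=0.1):
    # The slowest stage plus the pipeline register delay sets the clock.
    return max(stage_ns) + register_delay_ns


def split_stage(stage_ns, index, parts=2):
    """Return a new stage list with stage `index` cut into equal parts."""
    piece = stage_ns[index] / parts
    return stage_ns[:index] + [piece] * parts + stage_ns[index + 1:]


before = [6, 4, 8, 10, 4]               # F, D, X, M, WB
after = split_stage(before, 3)          # Memory  -> M1, M2
after = split_stage(after, 2)           # Execute -> X1, X2

print(after)
print(pipelined_cycle(before), pipelined_cycle(after))
```

After the split, Fetch (6 ns) is the slowest stage, so splitting Memory or Execute any further would not shorten the cycle.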

Page 78: Pipelining - facultystaff.richmond.edu · Performance Measurements • Cycle Time: Time in between clock ticks • Latency: Time to finish a complete job, start to finish • Throughput:

Easy Right? Not so fast.

In what cycle does the add write $s0? In what cycle does the or read $s0?

Time->

add $s0, $0, $0

or $s3, $s0, $t3

IF ID

IF ID

MEM

1 2 3 4 5 6 7 8

MEM

WB

WB

sw $s2, 0($t1)

and $s6, $s4, $t3

IF ID

IF ID WB

MEM

MEM

WB

Incorrect Execution

Page 79

Easy, right? Not so fast.

In what cycle does the add write $s0? 1st half of cycle 5
In what cycle does the or read $s0?

[Timing diagram, cycles 1-8: add $s0, $0, $0; or $s3, $s0, $t3; sw $s2, 0($t1); and $s6, $s4, $t3.]

Page 80

Easy, right? Not so fast.

In what cycle does the add write $s0? 1st half of cycle 5
In what cycle does the or read $s0? 2nd half of cycle 3

[Timing diagram, cycles 1-8: add $s0, $0, $0; or $s3, $s0, $t3; sw $s2, 0($t1); and $s6, $s4, $t3.]

Page 81

Easy, right? Not so fast.

In what cycle does the add write $s0? 1st half of cycle 5
In what cycle does the or read $s0? 2nd half of cycle 3

[Timing diagram, cycles 1-8: add $s0, $0, $0; or $s3, $s0, $t3; sw $s2, 0($t1); and $s6, $s4, $t3.]

Ahhhh! Values cannot pass backwards in time.

Page 82

Easy, right? Not so fast.

In what cycle does the add write $s0? 1st half of cycle 5
In what cycle does the or read $s0? 2nd half of cycle 5

[Timing diagram, cycles 1-8, with stalls: add $s0, $0, $0; or $s3, $s0, $t3; sw $s2, 0($t1); and $s6, $s4, $t3. Stalled instructions repeat IF.]

Stall - wasted cycles

Correct, Slow Execution

Page 83

Easy, right? Not so fast.

In what cycle does the add write $s0? 1st half of cycle 5
In what cycle does the or read $s0? 2nd half of cycle 5

[Timing diagram, cycles 1-8, with stalls: add $s0, $0, $0; or $s3, $s0, $t3; sw $s2, 0($t1); and $s6, $s4, $t3. Stalled instructions repeat IF.]

Stall - wasted cycles

Correct, Slow Execution

Only the Register File read and write take half a cycle. All other stages take a full cycle; this is because of shared hardware.
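The stall count in this example follows from two cycle numbers: when the producer writes the register and when the consumer wants to read it. The sketch below assumes a five-stage pipeline (IF, ID, EX, MEM, WB) with no forwarding, which is how the slides' numbers work out; the function name and parameters are illustrative, not taken from the slides.

```python
# Assumed model: five stages IF, ID, EX, MEM, WB, no forwarding.
# The register file is written in the 1st half of WB and read in the
# 2nd half of ID, so a write and a read may share one cycle.

def stall_cycles(distance: int, producer_if_cycle: int = 1) -> int:
    """Stall cycles for a consumer `distance` instructions after its producer."""
    producer_wb = producer_if_cycle + 4             # IF, ID, EX, MEM, WB
    consumer_id = producer_if_cycle + distance + 1  # consumer's IF, then ID
    # The consumer's ID must not come before the producer's WB cycle.
    return max(0, producer_wb - consumer_id)

for d in (1, 2, 3):
    print(f"consumer {d} instruction(s) later: {stall_cycles(d)} stall cycle(s)")
```

For the add/or pair above (distance 1), the read moves from cycle 3 to cycle 5, which is two wasted cycles.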

Page 84

Pipelined Machine

[Datapath diagram: PC (with the constant 4), Instruction Memory (Read Addr, Out Data), Register File (src1, src2, destreg, destdata; outputs src1data, src2data), Sign Ext (16 to 32 bits), two << 2 shifters, and Data Memory (Addr, In Data, Out Data). The instruction fields are op/fun, rs, rt, rd, and imm. Pipeline registers separate the Fetch, Decode, Execute, Memory, and (Writeback) stages.]

Page 85

Incorrect Execution caused by Data Hazard

In what cycle does the lw write $s0?
In what cycle does the or read $s0?

[Timing diagram, cycles 1-8: lw $s0, 0($t4); or $s3, $s0, $t3; sw $s2, 0($t1); and $s6, $s4, $t3.]

Page 86

Incorrect Execution caused by Data Hazard

In what cycle does the lw write $s0? 1st half of cycle 5
In what cycle does the or read $s0?

[Timing diagram, cycles 1-8: lw $s0, 0($t4); or $s3, $s0, $t3; sw $s2, 0($t1); and $s6, $s4, $t3.]

Page 87

Incorrect Execution caused by Data Hazard

In what cycle does the lw write $s0? 1st half of cycle 5
In what cycle does the or read $s0? 2nd half of cycle 3

[Timing diagram, cycles 1-8: lw $s0, 0($t4); or $s3, $s0, $t3; sw $s2, 0($t1); and $s6, $s4, $t3.]

Page 88

Incorrect Execution caused by Data Hazard

In what cycle does the lw write $s0? 1st half of cycle 5
In what cycle does the or read $s0? 2nd half of cycle 3

[Timing diagram, cycles 1-8: lw $s0, 0($t4); or $s3, $s0, $t3; sw $s2, 0($t1); and $s6, $s4, $t3.]

An arrow pointing to the left is information passed backwards in time.

Page 89

Data Hazard

In what cycle does the lw write $s0? 1st half of cycle 5
In what cycle does the or read $s0? 2nd half of cycle 3

[Timing diagram, cycles 1-8: lw $s0, 0($t4); or $s3, $s0, $t3; sw $s2, 0($t1); and $s6, $s4, $t3.]

Remember!!! Only WB and ID take ½ cycle. All other stages take a full cycle!

Page 90

Barriers to pipelined performance

•  Uneven stages
•  Pipeline register delays

Page 91

Barriers to pipelined performance

•  Uneven stages
•  Pipeline register delays
•  Data Hazards

Page 92

Barriers to pipelined performance

•  Uneven stages
•  Pipeline register delays
•  Data Hazards
   - An instruction depends on the result of a previous instruction still in the pipeline
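The definition of a data hazard above can be turned into a small check over an instruction list. The sketch below is only an illustration: the tuple encoding, the function name, and the `window` of two instructions are assumptions (the window matches a five-stage pipeline whose register file is written in the first half of a cycle and read in the second half). The program is the lw/or/sw/and sequence from the preceding slides.

```python
# Each instruction: (text, register written or None, registers read).
program = [
    ("lw  $s0, 0($t4)",   "$s0", ["$t4"]),
    ("or  $s3, $s0, $t3", "$s3", ["$s0", "$t3"]),
    ("sw  $s2, 0($t1)",   None,  ["$s2", "$t1"]),
    ("and $s6, $s4, $t3", "$s6", ["$s4", "$t3"]),
]

def find_data_hazards(program, window=2):
    """Return (producer index, consumer index, register) for each hazard.

    `window` is an assumption: how many following instructions can still
    read a register before the producer has written it back.
    """
    hazards = []
    for i, (_, dest, _) in enumerate(program):
        if dest is None or dest == "$0":  # no result, or the constant zero register
            continue
        for j in range(i + 1, min(i + 1 + window, len(program))):
            _, later_dest, later_reads = program[j]
            if dest in later_reads:
                hazards.append((i, j, dest))
            if later_dest == dest:  # value overwritten; later readers see the new one
                break
    return hazards

for producer, consumer, reg in find_data_hazards(program):
    print(f"{program[consumer][0]!r} needs {reg} from {program[producer][0]!r}")
```

Only the or depends on a result still in the pipeline; the sw and the and read registers that no earlier instruction in this sequence writes.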

Page 93

Default: Stall

In what cycle does the lw write $s0? 1st half of cycle 5
In what cycle does the or read $s0? 2nd half of cycle 5

[Timing diagram, cycles 1-8, with stalls: lw $s0, 0($t4); or $s3, $s0, $t3; sw $s2, 0($t1); and $s6, $s4, $t3. Stalled instructions repeat IF.]

Stall - wasted cycles

Remember!!! Only WB and ID take ½ cycle. All other stages take a full cycle!

Page 94

Solution 1: Data Forwarding

In what cycle is $s0 calculated in the machine?
In what cycle is $s0 used in the machine?

[Timing diagram, cycles 1-8: lw $s0, 0($t4); or $s3, $s0, $t3; sw $s2, 0($t1); and $s6, $s4, $t3.]

Page 95

Solution 1: Data Forwarding

In what cycle is $s0 calculated in the machine? End of cycle 4
In what cycle is $s0 used?

[Timing diagram, cycles 1-8: lw $s0, 0($t4); or $s3, $s0, $t3; sw $s2, 0($t1); and $s6, $s4, $t3.]

Page 96

Solution 1: Data Forwarding

In what cycle is $s0 calculated in the machine? End of cycle 4
In what cycle is $s0 used? Beginning of cycle 4

[Timing diagram, cycles 1-8: lw $s0, 0($t4); or $s3, $s0, $t3; sw $s2, 0($t1); and $s6, $s4, $t3.]

Page 97

Solution 1: Data Forwarding

In what cycle is $s0 calculated in the machine? End of cycle 4
In what cycle is $s0 used? Beginning of cycle 5

[Timing diagram, cycles 1-8, with one stall: lw $s0, 0($t4); or $s3, $s0, $t3; sw $s2, 0($t1); and $s6, $s4, $t3. Stalled instructions repeat a stage.]
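The answers on this slide (calculated at the end of cycle 4, used at the beginning of cycle 5) imply one remaining stall for a load followed immediately by its consumer. The sketch below generalizes that under assumptions not stated on the slides: a five-stage pipeline (IF, ID, EX, MEM, WB), ALU results forwarded from the end of EX, load results forwarded from the end of MEM, and consumers needing their operands at the start of EX. The function name and parameters are illustrative.

```python
def stalls_with_forwarding(producer_is_load: bool, distance: int) -> int:
    """Stall cycles with forwarding, for a producer fetched in cycle 1."""
    # Producer: IF=1, ID=2, EX=3, MEM=4. A load's value exists at the end
    # of MEM; an ALU result exists at the end of EX (assumed model).
    value_ready_end_of = 4 if producer_is_load else 3
    # Consumer fetched `distance` cycles later: IF, ID, then EX.
    consumer_ex = 1 + distance + 2
    # The forwarded value must exist before the consumer's EX cycle begins.
    return max(0, value_ready_end_of + 1 - consumer_ex)

print("lw then or:        ", stalls_with_forwarding(True, 1), "stall(s)")
print("lw, other, then or:", stalls_with_forwarding(True, 2), "stall(s)")
print("add then or:       ", stalls_with_forwarding(False, 1), "stall(s)")
```

Under these assumptions forwarding removes the stall entirely for ALU producers, while a load followed directly by its consumer still costs one cycle.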

Page 98

Data Forwarding: Where are those wires?

[Datapath diagram: PC (with the constant 4), Instruction Memory (Read Addr, Out Data), Register File (src1, src2, destreg, destdata; outputs src1data, src2data), Sign Ext (16 to 32 bits), two << 2 shifters, and Data Memory (Addr, In Data, Out Data). The instruction fields are op/fun, rs, rt, rd, and imm. Pipeline registers separate the Fetch, Decode, Execute, Memory, and (Writeback) stages.]

Page 100

Data Forwarding Example 2

lw $t0, 0($s0)
add $s2, $s2, $t0
sw $s2, 0($s0)
addi $t0, $t0, 1

Draw the timing diagram with data forwarding.
Draw arrows to indicate data passing through forwarding.

[Blank timing grid, cycles 1-12, with the first stages (F, D, M, W) partly filled in.]

Page 101

Solution 2: Instruction Reordering (Before Reordering)

[Timing diagram, cycles 1-8, with stalls: lw $s0, 0($t4); or $s3, $s0, $t3; sw $s2, 0($t1); and $s6, $s4, $t3. Stalled instructions repeat IF.]

Stall - wasted cycles

Page 102

Solution 2: Instruction Reordering (After Reordering)

[Timing diagram, cycles 1-8, for the reordered instructions lw $s0, 0($t4); or $s3, $s0, $t3; sw $s2, 0($t1); and $s6, $s4, $t3. The new instruction order is not recoverable from the extracted text.]

Page 103

Who reorders instructions?

•  Static scheduling
   - Compiler
   - Simpler, but does not know when caches miss or when loads/stores are to the same locations
•  Dynamic scheduling
   - Hardware
   - More complicated, but has all of that run-time knowledge

Page 104

Solution 2: Instruction Reordering

[Timing diagram, cycles 1-8: lw $s0, 0($t4); or $s3, $s0, $t3; sw $s3, 0($t1); and $s0, $s4, $t3.]

Page 105

Solution 2: Instruction Reordering

[Timing diagram, cycles 1-8, for the reordered instructions lw $s0, 0($t4); or $s3, $s0, $t3; sw $s3, 0($t1); and $s0, $s4, $t3.]

Is this the same execution?!?
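One way to answer the question is to check register dependences between the instruction being moved and every instruction it jumps over. The sketch below is a simplified illustration, not the slides' method: it assumes the original program order is lw, or, sw, and; it only tracks registers (memory dependences are ignored, which is acceptable here because no store is moved across the load); and the helper names are hypothetical.

```python
# Each instruction: (text, set of registers written, set of registers read).
independent = [
    ("lw  $s0, 0($t4)",   {"$s0"}, {"$t4"}),
    ("or  $s3, $s0, $t3", {"$s3"}, {"$s0", "$t3"}),
    ("sw  $s2, 0($t1)",   set(),   {"$s2", "$t1"}),
    ("and $s6, $s4, $t3", {"$s6"}, {"$s4", "$t3"}),
]
dependent = [
    ("lw  $s0, 0($t4)",   {"$s0"}, {"$t4"}),
    ("or  $s3, $s0, $t3", {"$s3"}, {"$s0", "$t3"}),
    ("sw  $s3, 0($t1)",   set(),   {"$s3", "$t1"}),
    ("and $s0, $s4, $t3", {"$s0"}, {"$s4", "$t3"}),
]

def conflicts(a, b):
    """True if swapping a and b could change a register value either one sees."""
    _, a_writes, a_reads = a
    _, b_writes, b_reads = b
    read_write = (a_writes & b_reads) or (b_writes & a_reads)
    write_write = a_writes & b_writes
    return bool(read_write or write_write)

def can_move_before(program, mover, target):
    """Can program[mover] be hoisted to just before program[target]?"""
    return not any(conflicts(program[k], program[mover])
                   for k in range(target, mover))

for name, prog in (("independent", independent), ("dependent", dependent)):
    for mover in (2, 3):
        ok = can_move_before(prog, mover, 1)
        print(f"{name}: move {prog[mover][0]!r} above the or? {ok}")
```

With `sw $s2` and `and $s6` the moves are safe, so the stall slots after the lw can be filled. With `sw $s3` and `and $s0` they are not: the sw would store $s3 before the or computes it, and the and would overwrite $s0 before the or reads the loaded value.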

Page 107

Pipelined Machine

[Datapath diagram: PC (with the constant 4), Instruction Memory (Read Addr, Out Data), Register File (src1, src2, destreg, destdata; outputs src1data, src2data), Sign Ext (16 to 32 bits), two << 2 shifters, and Data Memory (Addr, In Data, Out Data). The instruction fields are op/fun, rs, rt, rd, and imm. Pipeline registers separate the Fetch, Decode, Execute, Memory, and (Writeback) stages.]