81
L16 Pipelining 1 Comp 411 Pipelining Between Spring break and 411 problems sets, I haven’t had a minute to do laundry Now that’s what I call dirty laundry

Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

Embed Size (px)

Citation preview

Page 1: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 1Comp 411

Pipelining

Between Spring break and 411 problems sets, I haven’t had a minute to

do laundry

Now that’s what Icall dirty laundry

Page 2: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 2Comp 411

Forget 411… Let’s Solve a “Relevant Problem”

Device: Washer

Function: Fill, Agitate, Spin

WasherPD = 30 mins

Device: Dryer

Function: Heat, Spin

DryerPD = 60 mins

INPUT:dirty laundry

OUTPUT:4 more weeks

Page 3: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 3Comp 411

One Load at a Time

Everyone knows that the real reason that UNC students put off doing laundry so long is *not* because they procrastinate, are lazy, or even have better things to do.

The fact is, doing laundry one load at a time is not smart.

(Sorry Mom, but you were wrong about this one!)

Step 1:

Step 2:

Total = WasherPD + DryerPD

= _________ mins

Page 4: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 4Comp 411

One Load at a Time

Everyone knows that the real reason that UNC students put off doing laundry so long is *not* because they procrastinate, are lazy, or even have better things to do.

The fact is, doing laundry one load at a time is not smart.

(Sorry Mom, but you were wrong about this one!)

Step 1:

Step 2:

Total = WasherPD + DryerPD

= _________ mins90

Page 5: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 5Comp 411

Doing N Loads of Laundry

Here’s how they do laundry at Duke, the “combinational” way.

(Actually, this is just an urban legend. No one at Duke actually does laundry. The butler’s all arrive on Wednesday morning, pick up the dirty laundry and return it all pressed and starched by dinner)

Step 1:

Step 2:

Step 3:

Step 4:

Total = N*(WasherPD + DryerPD)

= ____________ mins

Page 6: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 6Comp 411

Doing N Loads of Laundry

Here’s how they do laundry at Duke, the “combinational” way.

(Actually, this is just an urban legend. No one at Duke actually does laundry. The butler’s all arrive on Wednesday morning, pick up the dirty laundry and return it all pressed and starched by dinner)

Step 1:

Step 2:

Step 3:

Step 4:

Total = N*(WasherPD + DryerPD)

= ____________ minsN*90

Page 7: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 7Comp 411

Doing N Loads… the UNC way

UNC students “pipeline” the laundry process.

That’s why we wait!

Step 1:

Step 2:

Step 3:

Total = N * Max(WasherPD, DryerPD)

= ____________ mins

Page 8: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 8Comp 411

Doing N Loads… the UNC way

UNC students “pipeline” the laundry process.

That’s why we wait!

Step 1:

Step 2:

Step 3:

Total = N * Max(WasherPD, DryerPD)

= ____________ minsN*60

Page 9: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 9Comp 411

Doing N Loads… the UNC way

UNC students “pipeline” the laundry process.

That’s why we wait!

Step 1:

Step 2:

Step 3:

Total = N * Max(WasherPD, DryerPD)

= ____________ minsN*60

…Actually, it’s more like N*60 + 30 if we account for the startup transient correctly. When doing pipeline analysis, we’re mostly interested in the “steady state” where we assume we have an infinite supply of inputs.

Page 10: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 10Comp 411

Recall Our Performance Measures

Latency:The delay from when an input is established until the output associated with that input becomes valid.

(Duke Laundry = _________ mins)( UNC Laundry = _________ mins)

Throughput:The rate of which inputs or outputs are processed.

(Duke Laundry = _________ outputs/min)( UNC Laundry = _________ outputs/min)

Page 11: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 11Comp 411

Recall Our Performance Measures

Latency:The delay from when an input is established until the output associated with that input becomes valid.

(Duke Laundry = _________ mins)( UNC Laundry = _________ mins)

Throughput:The rate of which inputs or outputs are processed.

(Duke Laundry = _________ outputs/min)( UNC Laundry = _________ outputs/min)

90

Page 12: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 12Comp 411

Recall Our Performance Measures

Latency:The delay from when an input is established until the output associated with that input becomes valid.

(Duke Laundry = _________ mins)( UNC Laundry = _________ mins)

Throughput:The rate of which inputs or outputs are processed.

(Duke Laundry = _________ outputs/min)( UNC Laundry = _________ outputs/min)

90

120

Page 13: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 13Comp 411

Recall Our Performance Measures

Latency:The delay from when an input is established until the output associated with that input becomes valid.

(Duke Laundry = _________ mins)( UNC Laundry = _________ mins)

Throughput:The rate of which inputs or outputs are processed.

(Duke Laundry = _________ outputs/min)( UNC Laundry = _________ outputs/min)

90

120

Assuming that the wash is started as soon as possible and waits (wet) in the washer until dryer is available.

Page 14: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 14Comp 411

Recall Our Performance Measures

Latency:The delay from when an input is established until the output associated with that input becomes valid.

(Duke Laundry = _________ mins)( UNC Laundry = _________ mins)

Throughput:The rate of which inputs or outputs are processed.

(Duke Laundry = _________ outputs/min)( UNC Laundry = _________ outputs/min)

90

120

1/90

Assuming that the wash is started as soon as possible and waits (wet) in the washer until dryer is available.

Page 15: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 15Comp 411

Recall Our Performance Measures

Latency:The delay from when an input is established until the output associated with that input becomes valid.

(Duke Laundry = _________ mins)( UNC Laundry = _________ mins)

Throughput:The rate of which inputs or outputs are processed.

(Duke Laundry = _________ outputs/min)( UNC Laundry = _________ outputs/min)

90

120

1/90

1/60

Assuming that the wash is started as soon as possible and waits (wet) in the washer until dryer is available.

Page 16: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 16Comp 411

Recall Our Performance Measures

Latency:The delay from when an input is established until the output associated with that input becomes valid.

(Duke Laundry = _________ mins)( UNC Laundry = _________ mins)

Throughput:The rate of which inputs or outputs are processed.

(Duke Laundry = _________ outputs/min)( UNC Laundry = _________ outputs/min)

90

120

1/90

1/60

Assuming that the wash is started as soon as possible and waits (wet) in the washer until dryer is available.

Even though we increaselatency, it takes less time per load

Page 17: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 17Comp 411

Okay, Back to Circuits…

F

G

HX P(X)

For combinational logic:latency = tPD, throughput = 1/tPD.

We can’t get the answer faster, but are we making effective use of our hardware at all times?

G(X)

F(X)

P(X)

X

F & G are “idle”, just holding their outputs stable while H performs its computation

Page 18: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 18Comp 411

Pipelined Circuits

use registers to hold H’s input stable!

F

G

HX P(X)

15

20

25

Now F & G can be working on input Xi+1

while H is performing its computation on Xi. We’ve created a 2-stage pipeline : if we have a valid input X during clock cycle j, P(X) is valid during clock j+2.

Suppose F, G, H have propagation delays of 15, 20, 25 ns and we are using ideal zero-delay registers (ts = 0, tpd = 0):

latency45

______

throughput1/45

______unpipelined

2-stage pipeline

Page 19: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 19Comp 411

Pipelined Circuits

use registers to hold H’s input stable!

F

G

HX P(X)

15

20

25

Now F & G can be working on input Xi+1

while H is performing its computation on Xi. We’ve created a 2-stage pipeline : if we have a valid input X during clock cycle j, P(X) is valid during clock j+2.

Suppose F, G, H have propagation delays of 15, 20, 25 ns and we are using ideal zero-delay registers (ts = 0, tpd = 0):

latency45

______

throughput1/45

______unpipelined

2-stage pipeline 50

worse

Page 20: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 20Comp 411

Pipelined Circuits

use registers to hold H’s input stable!

F

G

HX P(X)

15

20

25

Now F & G can be working on input Xi+1

while H is performing its computation on Xi. We’ve created a 2-stage pipeline : if we have a valid input X during clock cycle j, P(X) is valid during clock j+2.

Suppose F, G, H have propagation delays of 15, 20, 25 ns and we are using ideal zero-delay registers (ts = 0, tpd = 0):

latency45

______

throughput1/45

______unpipelined

2-stage pipeline 50

worse

1/25

better

Page 21: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 21Comp 411

Pipelined Circuits

use registers to hold H’s input stable!

F

G

HX P(X)

15

20

25

Now F & G can be working on input Xi+1

while H is performing its computation on Xi. We’ve created a 2-stage pipeline : if we have a valid input X during clock cycle j, P(X) is valid during clock j+2.

Suppose F, G, H have propagation delays of 15, 20, 25 ns and we are using ideal zero-delay registers (ts = 0, tpd = 0):

latency45

______

throughput1/45

______unpipelined

2-stage pipeline 50

worse

1/25

better

Pipelining uses registers to improve the

throughput of combinational

circuits

Page 22: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 22Comp 411

Pipeline Diagrams

Input

F Reg

G Reg

H Reg

i i+1 i+2 i+3

Xi

Clock cycleP

ipel

ine

stag

es

The results associated with a particular set of input data moves diagonally through the diagram, progressing through one pipeline stage each clock cycle.

F

G

HX P(X)

15

20

25

Page 23: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 23Comp 411

Pipeline Diagrams

Input

F Reg

G Reg

H Reg

i i+1 i+2 i+3

Xi Xi+1

F(Xi)

G(Xi)

Clock cycleP

ipel

ine

stag

es

The results associated with a particular set of input data moves diagonally through the diagram, progressing through one pipeline stage each clock cycle.

F

G

HX P(X)

15

20

25

Page 24: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 24Comp 411

Pipeline Diagrams

Input

F Reg

G Reg

H Reg

i i+1 i+2 i+3

Xi Xi+1

F(Xi)

G(Xi)

Xi+2

F(Xi+1)

G(Xi+1)

H(Xi)

Clock cycleP

ipel

ine

stag

es

The results associated with a particular set of input data moves diagonally through the diagram, progressing through one pipeline stage each clock cycle.

F

G

HX P(X)

15

20

25

Page 25: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 25Comp 411

Pipeline Diagrams

Input

F Reg

G Reg

H Reg

i i+1 i+2 i+3

Xi Xi+1

F(Xi)

G(Xi)

Xi+2

F(Xi+1)

G(Xi+1)

H(Xi)

Xi+3

F(Xi+2)

G(Xi+2)

H(Xi+1)

Clock cycleP

ipel

ine

stag

es

The results associated with a particular set of input data moves diagonally through the diagram, progressing through one pipeline stage each clock cycle.

F

G

HX P(X)

15

20

25

Page 26: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 26Comp 411

Pipeline Diagrams

Input

F Reg

G Reg

H Reg

i i+1 i+2 i+3

Xi Xi+1

F(Xi)

G(Xi)

Xi+2

F(Xi+1)

G(Xi+1)

H(Xi)

Xi+3

F(Xi+2)

G(Xi+2)

H(Xi+1)

Clock cycleP

ipel

ine

stag

es

The results associated with a particular set of input data moves diagonally through the diagram, progressing through one pipeline stage each clock cycle.

H(Xi+2)

F

G

HX P(X)

15

20

25

Page 27: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 27Comp 411

Pipeline Diagrams

Input

F Reg

G Reg

H Reg

i i+1 i+2 i+3

Xi Xi+1

F(Xi)

G(Xi)

Xi+2

F(Xi+1)

G(Xi+1)

H(Xi)

Xi+3

F(Xi+2)

G(Xi+2)

H(Xi+1)

Clock cycleP

ipel

ine

stag

es

The results associated with a particular set of input data moves diagonally through the diagram, progressing through one pipeline stage each clock cycle.

H(Xi+2)

F

G

HX P(X)

15

20

25

This is an exampleof parallelism. Atany instant we arecomputing 2 results.

Page 28: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 28Comp 411

Pipeline ConventionsDEFINITION:

a K-Stage Pipeline (“K-pipeline”) is an acyclic circuit having exactly K registers on every path from an input to an output.

a COMBINATIONAL CIRCUIT is thus a 0-stage pipeline.

CONVENTION:Every pipeline stage, hence every K-Stage pipeline, has a register on its OUTPUT (not on its input).

ALWAYS:The CLOCK common to all registers *must* have a period sufficient to allow for the propagation delays of all combinational paths PLUS (input) register tPD PLUS (output) register tSETUP.

The LATENCY of a K-pipeline is K times the period of the clock common to all registers.

The THROUGHPUT of a K-pipeline is the frequency of the clock.

Page 29: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 29Comp 411

Ill-Formed Pipelines

B

CX

Y

A

For what value of K is the following circuit a K-Pipeline? ANS: ____________

Consider a BAD job of pipelining:

Page 30: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 30Comp 411

Ill-Formed Pipelines

B

CX

Y

A

noneFor what value of K is the following circuit a K-Pipeline? ANS: ____________

Consider a BAD job of pipelining:

Page 31: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 31Comp 411

Ill-Formed Pipelines

B

CX

Y

A

Problem:

Successive inputs get mixed: e.g., B(A(Xi+1), Yi). This happened because some paths from inputs to outputs had 2 registers, and some had only 1!

Can this happen on a well-formed K pipeline?

noneFor what value of K is the following circuit a K-Pipeline? ANS: ____________

Consider a BAD job of pipelining:

Page 32: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 32Comp 411

A Pipelining Methodology

Step 1:

Draw a line that crosses every output in the circuit, and select one endpoint as an origin.

Step 2:

Continue to draw new lines from the origin across various circuit connections such that these new lines partition the inputs from the outputs.

Adding a pipeline register at every point where a separating line crosses a connection will always generate a valid pipeline.

STRATEGY:

Focus your attention on placing pipelining registers around the slowest circuit elements (BOTTLENECKS).

A4 nS

B3 nS

C8 nS

D4 nS

E2 nS

F5 nS

Page 33: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 33Comp 411

A Pipelining Methodology

Step 1:

Draw a line that crosses every output in the circuit, and select one endpoint as an origin.

Step 2:

Continue to draw new lines from the origin across various circuit connections such that these new lines partition the inputs from the outputs.

Adding a pipeline register at every point where a separating line crosses a connection will always generate a valid pipeline.

STRATEGY:

Focus your attention on placing pipelining registers around the slowest circuit elements (BOTTLENECKS).

A4 nS

B3 nS

C8 nS

D4 nS

E2 nS

F5 nS

Page 34: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 34Comp 411

A Pipelining Methodology

Step 1:

Draw a line that crosses every output in the circuit, and select one endpoint as an origin.

Step 2:

Continue to draw new lines from the origin across various circuit connections such that these new lines partition the inputs from the outputs.

Adding a pipeline register at every point where a separating line crosses a connection will always generate a valid pipeline.

STRATEGY:

Focus your attention on placing pipelining registers around the slowest circuit elements (BOTTLENECKS).

A4 nS

B3 nS

C8 nS

D4 nS

E2 nS

F5 nS

Page 35: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 35Comp 411

A Pipelining Methodology

Step 1:

Draw a line that crosses every output in the circuit, and select one endpoint as an origin.

Step 2:

Continue to draw new lines from the origin across various circuit connections such that these new lines partition the inputs from the outputs.

Adding a pipeline register at every point where a separating line crosses a connection will always generate a valid pipeline.

STRATEGY:

Focus your attention on placing pipelining registers around the slowest circuit elements (BOTTLENECKS).

A4 nS

B3 nS

C8 nS

D4 nS

E2 nS

F5 nS

Page 36: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 36Comp 411

A Pipelining Methodology

Step 1:

Draw a line that crosses every output in the circuit, and select one endpoint as an origin.

Step 2:

Continue to draw new lines from the origin across various circuit connections such that these new lines partition the inputs from the outputs.

Adding a pipeline register at every point where a separating line crosses a connection will always generate a valid pipeline.

STRATEGY:

Focus your attention on placing pipelining registers around the slowest circuit elements (BOTTLENECKS).

A4 nS

B3 nS

C8 nS

D4 nS

E2 nS

F5 nS

In these examples

we assume “idealized”

pipeline registers,

with tpd = 0

and ts = 0.

Page 37: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 37Comp 411

A Pipelining Methodology

Step 1:

Draw a line that crosses every output in the circuit, and select one endpoint as an origin.

Step 2:

Continue to draw new lines from the origin across various circuit connections such that these new lines partition the inputs from the outputs.

Adding a pipeline register at every point where a separating line crosses a connection will always generate a valid pipeline.

STRATEGY:

Focus your attention on placing pipelining registers around the slowest circuit elements (BOTTLENECKS).

A4 nS

B3 nS

C8 nS

D4 nS

E2 nS

F5 nS

T = 1/8nsL = 24ns

In these examples

we assume “idealized”

pipeline registers,

with tpd = 0

and ts = 0.

Page 38: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 38Comp 411

Extreme Pipelining

A

B

CX

Y

2

1

1

0-pipe:

1-pipe:

2-pipe:

3-pipe:

LATENCY THROUGHPUT

4 1/4

OBSERVATIONS:

• 1-pipeline improves neither L or T.

• T improved by breaking long combinational paths, allowing faster clock.

• Too many stages cost L, don’t improve T.

• Back-to-back registers are often required to keep pipeline well-formed.

Page 39: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 39Comp 411

Extreme Pipelining

A

B

CX

Y

2

1

1

0-pipe:

1-pipe:

2-pipe:

3-pipe:

LATENCY THROUGHPUT

4 1/4

OBSERVATIONS:

• 1-pipeline improves neither L or T.

• T improved by breaking long combinational paths, allowing faster clock.

• Too many stages cost L, don’t improve T.

• Back-to-back registers are often required to keep pipeline well-formed.

4 1/4

1

Page 40: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 40Comp 411

Extreme Pipelining

A

B

CX

Y

2

1

1

0-pipe:

1-pipe:

2-pipe:

3-pipe:

LATENCY THROUGHPUT

4 1/4

OBSERVATIONS:

• 1-pipeline improves neither L or T.

• T improved by breaking long combinational paths, allowing faster clock.

• Too many stages cost L, don’t improve T.

• Back-to-back registers are often required to keep pipeline well-formed.

4 1/4

1

4 1/2

2

Page 41: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 41Comp 411

Extreme Pipelining

A

B

CX

Y

2

1

1

0-pipe:

1-pipe:

2-pipe:

3-pipe:

LATENCY THROUGHPUT

4 1/4

OBSERVATIONS:

• 1-pipeline improves neither L or T.

• T improved by breaking long combinational paths, allowing faster clock.

• Too many stages cost L, don’t improve T.

• Back-to-back registers are often required to keep pipeline well-formed.

4 1/4

1

4 1/2

2

1/26

3

Page 42: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 42Comp 411

Pipelining Summary

Advantages:– Higher throughput than combinational system

– Different parts of the logic work on different parts of the problem…

Disadvantages:– Generally, increases latency

– Only as good as the *weakest* link(often called the pipeline’s BOTTLENECK)

Isn’t there a way around this “weak link” problem?

This bottleneckis the onlyproblem

Page 43: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 43Comp 411

How do UNC students REALLY do Laundry?

They work around the bottleneck. First, they find a place with twice as many dryers as washers.

Throughput = ______ loads/min

Latency = ______ mins/load

Step 1:

Page 44: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 44Comp 411

How do UNC students REALLY do Laundry?

They work around the bottleneck. First, they find a place with twice as many dryers as washers.

Throughput = ______ loads/min

Latency = ______ mins/load

Step 1:

Page 45: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 45Comp 411

How do UNC students REALLY do Laundry?

They work around the bottleneck. First, they find a place with twice as many dryers as washers.

Throughput = ______ loads/min

Latency = ______ mins/load

Step 1:

Step 2:

Page 46: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 46Comp 411

How do UNC students REALLY do Laundry?

They work around the bottleneck. First, they find a place with twice as many dryers as washers.

Throughput = ______ loads/min

Latency = ______ mins/load

Step 1:

Step 2:

Page 47: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 47Comp 411

How do UNC students REALLY do Laundry?

They work around the bottleneck. First, they find a place with twice as many dryers as washers.

Throughput = ______ loads/min

Latency = ______ mins/load

Step 1:

Step 2:

Step 3:

Page 48: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 48Comp 411

How do UNC students REALLY do Laundry?

They work around the bottleneck. First, they find a place with twice as many dryers as washers.

Throughput = ______ loads/min

Latency = ______ mins/load

Step 1:

Step 2:

Step 3:

Page 49: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 49Comp 411

How do UNC students REALLY do Laundry?

They work around the bottleneck. First, they find a place with twice as many dryers as washers.

Throughput = ______ loads/min

Latency = ______ mins/load

Step 1:

Step 2:

Step 3:

Step 4:

Page 50: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 50Comp 411

How do UNC students REALLY do Laundry?

They work around the bottleneck. First, they find a place with twice as many dryers as washers.

Throughput = ______ loads/min

Latency = ______ mins/load

Step 1:

Step 2:

Step 3:

Step 4:

Page 51: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 51Comp 411

How do UNC students REALLY do Laundry?

They work around the bottleneck. First, they find a place with twice as many dryers as washers.

Throughput = ______ loads/min

Latency = ______ mins/load

Step 1:

Step 2:

Step 3:

Step 4:

Page 52: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 52Comp 411

How do UNC students REALLY do Laundry?

They work around the bottleneck. First, they find a place with twice as many dryers as washers.

Throughput = ______ loads/min

Latency = ______ mins/load

Step 1:

Step 2:

Step 3:

Step 4:

1/30

Page 53: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 53Comp 411

How do UNC students REALLY do Laundry?

They work around the bottleneck. First, they find a place with twice as many dryers as washers.

Throughput = ______ loads/min

Latency = ______ mins/load

Step 1:

Step 2:

Step 3:

Step 4:

1/30

90

Page 54: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 54Comp 411

Circuit Interleaving

One way to overcome a pipeline bottleneck is to replicate the critical element as many times as needed and alternate inputs between the various copies.

N-way interleaving is equivalent to how many pipeline Stages? ____

C C

D Q

ENEN

C’

0 1

Latency = 2 clocks

N-wayinterleave

Page 55: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 55Comp 411

Circuit Interleaving

One way to overcome a pipeline bottleneck is to replicate the critical element as many times as needed and alternate inputs between the various copies.

N-way interleaving is equivalent to how many pipeline Stages? ____

C C

D Q

ENEN

C’

0 1

Latency = 2 clocks

N-wayinterleave

N

Page 56: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 56Comp 411

Circuit Interleaving

One way to overcome a pipeline bottleneck is to replicate the critical element as many times as needed and alternate inputs between the various copies.

N-way interleaving is equivalent to how many pipeline Stages? ____

C C

D Q

ENEN

C’

0 1

Latency = 2 clocks

N-wayinterleave

NBut with a register

constrained to be on the

inputs

Page 57: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 57Comp 411

Circuit Interleaving

One way to overcome a pipeline bottleneck is to replicate the critical element as many times as needed and alternate inputs between the various copies.

N-way interleaving is equivalent to how many pipeline Stages? ____

C C

D Q

ENEN

C’

0 1

Latency = 2 clocks

N-wayinterleave

N-1 registers

NBut with a register

constrained to be on the

inputs

Page 58: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 58Comp 411

Combining Techniques

We can combine interleaving and pipelining. Here, C’ interleaves two C elements with a propagation delay of 8 nS. The resulting C’ circuit has a throughput of 4 nS, and latency of 8 nS. This can be considered as an extra pipelining stage that passes through the middle of the C’ module. One of our separation lines must pass through this pipeline stage.

A4 nS

B3 nS

C’4-4nS

D4 nS

E2 nS

F5 nS

By combining interleaving with pipelining we move the bottleneck from the C element to the F element.

T = 1/5nsL = 25ns

Page 59: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 59Comp 411

Step 5:

Better Yet… Parallelism

Step 1:

Step 2:

Step 3:

Step 4:

We can combine interleavingand pipelining with parallelism.

Throughput =_______ load/min

Latency = _______ min

Page 60: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 60Comp 411

Step 5:

Better Yet… Parallelism

Step 1:

Step 2:

Step 3:

Step 4:

We can combine interleavingand pipelining with parallelism.

Throughput =_______ load/min

Latency = _______ min

Page 61: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 61Comp 411

Step 5:

Better Yet… Parallelism

Step 1:

Step 2:

Step 3:

Step 4:

We can combine interleavingand pipelining with parallelism.

Throughput =_______ load/min

Latency = _______ min

Page 62: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 62Comp 411

Step 5:

Better Yet… Parallelism

Step 1:

Step 2:

Step 3:

Step 4:

We can combine interleavingand pipelining with parallelism.

Throughput =_______ load/min

Latency = _______ min

Page 63: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 63Comp 411

Step 5:

Better Yet… Parallelism

Step 1:

Step 2:

Step 3:

Step 4:

We can combine interleavingand pipelining with parallelism.

Throughput =_______ load/min

Latency = _______ min

Page 64: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 64Comp 411

Step 5:

Better Yet… Parallelism

Step 1:

Step 2:

Step 3:

Step 4:

We can combine interleavingand pipelining with parallelism.

Throughput =_______ load/min

Latency = _______ min

Page 65: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 65Comp 411

Step 5:

Better Yet… Parallelism

Step 1:

Step 2:

Step 3:

Step 4:

We can combine interleavingand pipelining with parallelism.

Throughput =_______ load/min

Latency = _______ min

Page 66: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 66Comp 411

Step 5:

Better Yet… Parallelism

Step 1:

Step 2:

Step 3:

Step 4:

We can combine interleavingand pipelining with parallelism.

Throughput =_______ load/min

Latency = _______ min

Page 67: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 67Comp 411

Step 5:

Better Yet… Parallelism

Step 1:

Step 2:

Step 3:

Step 4:

We can combine interleavingand pipelining with parallelism.

Throughput =_______ load/min

Latency = _______ min

2/30 = 1/15

Page 68: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 68Comp 411

Step 5:

Better Yet… Parallelism

Step 1:

Step 2:

Step 3:

Step 4:

We can combine interleavingand pipelining with parallelism.

Throughput =_______ load/min

Latency = _______ min

2/30 = 1/15

90

Page 69: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 69Comp 411

Control Structure Approaches

RIGID

LaidBack

ALL computation “events” occur at active edges of a periodic clock: time is divided into fixed-size discrete intervals.

Synchronous

Events – e.g. the loading of a register -- can happen at at arbitrary times.

Asynchronous

Timing dictated by centralized FSM according to a fixed schedule.

Globally Timed

Each module takes a START signal, generates a FINISHED signal. Timing is dynamic, data dependent.

Locally Timed

Page 70: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 70Comp 411

Control Structure Approaches

RIGID

LaidBack

ALL computation “events” occur at active edges of a periodic clock: time is divided into fixed-size discrete intervals.

Synchronous

Events – e.g. the loading of a register -- can happen at at arbitrary times.

Asynchronous

Timing dictated by centralized FSM according to a fixed schedule.

Globally Timed

Each module takes a START signal, generates a FINISHED signal. Timing is dynamic, data dependent.

Locally Timed

Page 71: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 71Comp 411

Control Structure Approaches

RIGID

LaidBack

ALL computation “events” occur at active edges of a periodic clock: time is divided into fixed-size discrete intervals.

Synchronous

Events – e.g. the loading of a register -- can happen at at arbitrary times.

Asynchronous

Timing dictated by centralized FSM according to a fixed schedule.

Globally Timed

Each module takes a START signal, generates a FINISHED signal. Timing is dynamic, data dependent.

Locally Timed

Page 72: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 72Comp 411

Control Structure Approaches

RIGID

LaidBack

ALL computation “events” occur at active edges of a periodic clock: time is divided into fixed-size discrete intervals.

Synchronous

Events – e.g. the loading of a register -- can happen at at arbitrary times.

Asynchronous

Timing dictated by centralized FSM according to a fixed schedule.

Globally Timed

Each module takes a START signal, generates a FINISHED signal. Timing is dynamic, data dependent.

Locally Timed

Page 73: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 73Comp 411

Control Structure Approaches

RIGID

LaidBack

ALL computation “events” occur at active edges of a periodic clock: time is divided into fixed-size discrete intervals.

Synchronous

Events – e.g. the loading of a register -- can happen at at arbitrary times.

Asynchronous

Timing dictated by centralized FSM according to a fixed schedule.

Globally Timed

Each module takes a START signal, generates a FINISHED signal. Timing is dynamic, data dependent.

Locally Timed

Page 74: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 74Comp 411

“Classroom Computer”

Row 1 Row 2 Row 3 Row 4 Row 5 Row 6Psets in

There are lots of problem sets to grade, each with six problems. Students in Row 1 grade Problem 1 and then hand it back to Row 2 for grading Problem 2, and so on… Assuming we want to pipeline the grading, how do we time the passing of papers between rows?

Page 75: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 75Comp 411

Controls for “Classroom Computer”

Synchronous Asynchronous

GloballyTimed

LocallyTimed

Page 76: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 76Comp 411

Controls for “Classroom Computer”

Synchronous Asynchronous

GloballyTimed

LocallyTimed

Teacher picks time interval long enough for worst-case student to grade toughest problem. Everyone passes psets at end of interval.

Page 77: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 77Comp 411

Controls for “Classroom Computer”

Synchronous Asynchronous

GloballyTimed

LocallyTimed

Teacher picks time interval long enough for worst-case student to grade toughest problem. Everyone passes psets at end of interval.

Students raise hands when they finish grading current problem. Teacher checks every 10 secs, when all hands are raised, everyone passes psets to the row behind. Variant: students can pass when all students in a “column” have hands raised.

Page 78: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 78Comp 411

Controls for “Classroom Computer”

Synchronous Asynchronous

GloballyTimed

LocallyTimed

Teacher picks time interval long enough for worst-case student to grade toughest problem. Everyone passes psets at end of interval.

Teacher picks variable time interval long enough for current students to grade current set of problems. Everyone passes psets at end of interval.

Students raise hands when they finish grading current problem. Teacher checks every 10 secs, when all hands are raised, everyone passes psets to the row behind. Variant: students can pass when all students in a “column” have hands raised.

Page 79: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 79Comp 411

Controls for “Classroom Computer”

Synchronous Asynchronous

GloballyTimed

LocallyTimed

Teacher picks time interval long enough for worst-case student to grade toughest problem. Everyone passes psets at end of interval.

Teacher picks variable time interval long enough for current students to grade current set of problems. Everyone passes psets at end of interval.

Students raise hands when they finish grading current problem. Teacher checks every 10 secs, when all hands are raised, everyone passes psets to the row behind. Variant: students can pass when all students in a “column” have hands raised.

Students grade current problem, wait for student in next row to be free, and then pass the pset back.

Page 80: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 80Comp 411

Control Structure Taxonomy

Synchronous Asynchronous

GloballyTimed

LocallyTimed

Centralized clocked FSM generates all control signals.

Central control unit tailors current time slice to current tasks.

Start and Finish signals generated by each major subsystem, synchronously with global clock.

Each subsystem takes asynchronous Start, generates asynchronous Finish (perhaps using local clock).

Easy to design but fixed-sized interval can be wasteful (no data-dependencies in timing)

Large systems lead to very complicated timing generators… just say no!

The best way to build large systems that have independent components.

The “next big idea” for the last several decades: a lot of design work to do in general, but extra work is worth it in special cases

Page 81: Pipelining - Computer Sciencegb/Comp411Fall2011/media/L16-Pipelining.pdf · Comp 411 L16 –Pipelining 3 One Load at a Time Everyone knows that the real reason that UNC students put

L16 – Pipelining 81Comp 411

Summary• Latency (L) = time it takes for the results from a given

input to arrive at outputs

• Throughput (T) = rate at each new outputs appear

• For combinational circuits: L = tPD of circuit, T = 1/L

• For K-pipelines (K > 0):• always have register on output(s)• K registers on every path from input to output• T = 1/(tPD,REG + tPD of slowest pipeline stage + tSETUP)

– more throughput split slowest pipeline stage(s)– use replication/interleaving if no further splits possible

• L = K / T– pipelined latency combinational latency