Asynchronous Pipelines

Author: Peter Yeh; Advisor: Professor Beerel


Page 1: Asynchronous Pipelines

Asynchronous Pipelines

Author: Peter Yeh

Advisor: Professor Beerel

Page 2: Asynchronous Pipelines

USC Asynchronous Group 2

Motivation

• Can we reduce the communication overhead of asynchronous pipelines while hiding precharge time?

• Can asynchronous pipelines achieve a cycle time as fast as, if not faster than, the best synchronous counterparts?

Page 3: Asynchronous Pipelines

Motivation: System Performance

• Fixed-stage pipeline

– Low pipeline usage: low latency is critical

– High pipeline usage: cycle time is the limiting factor in generating new outputs as fast as possible

• Flexible-stage pipeline

– With zero forward overhead and a short cycle time, a given target throughput can be achieved with fewer stages

Page 4: Asynchronous Pipelines

Motivation: System Performance

• Pipelines with loop dependencies

– The optimal cycle time is the sum of the latencies around the loop

– Pipelining is required to keep precharge/reset off the critical path

– Our scheme requires fewer pipeline stages to achieve the same performance

Page 5: Asynchronous Pipelines

Introduction

• Asynchronous pipeline schemes using a Taken Detector (TD)

• Best suited to coarse-grained pipelines

• Two schemes targeting different requirements (and possibly a third, speed-independent (SI) scheme)

Page 6: Asynchronous Pipelines

Outline

• Background review

– Sutherland

– Ted Williams

– Renaudin

– Martin

• Taken pipeline

• Performance comparison

• Conclusion

Page 7: Asynchronous Pipelines

Definitions

• Stage: a collection of logic that is precharged or evaluated at the same time

• Cycle: the time from the start of a stage's current evaluation to the start of its next evaluation

• Forward latency: the time from the start of evaluation of the current stage to the start of evaluation of the next stage
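The two timing definitions can be made concrete by applying them to a table of evaluation start times. In this minimal Python sketch the trace format (a mapping from (stage, token) to the time that stage starts evaluating that token) and the function names are hypothetical, and the numbers are illustrative only.

```python
from typing import Dict, Tuple

# Hypothetical trace: (stage index, token index) -> start time of that
# stage's evaluation of that token, in arbitrary time units.
Trace = Dict[Tuple[int, int], float]


def cycle_time(trace: Trace, stage: int, token: int) -> float:
    """Cycle: from the start of a stage's current evaluation to the start of its next one."""
    return trace[(stage, token + 1)] - trace[(stage, token)]


def forward_latency(trace: Trace, stage: int, token: int) -> float:
    """Forward latency: from the start of evaluation in `stage` to the start
    of evaluation of the same token in `stage + 1`."""
    return trace[(stage + 1, token)] - trace[(stage, token)]


# Illustrative steady-state trace: each stage starts a token 15 units after
# its predecessor, and every stage restarts 60 units after its last start.
trace: Trace = {(s, k): 15 * s + 60 * k for s in range(3) for k in range(2)}
print(cycle_time(trace, stage=1, token=0))       # 60
print(forward_latency(trace, stage=0, token=0))  # 15
```

Both quantities are differences between event times, so they can be read off a simulation or a measured waveform in the same way.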

Page 8: Asynchronous Pipelines

Background Outline

• Sutherland’s micropipeline scheme

• Ted Williams’ PS0 and PC0 pipeline schemes

• Renaudin’s DCVSL pipeline scheme

• Martin’s deep pipeline scheme

Page 9: Asynchronous Pipelines

Sutherland’s Micropipeline

• Sutherland, the father of asynchronous pipelines, presented micropipelines in his Turing Award lecture

• Delay insensitive

[Figure: three-stage micropipeline; each stage has a C-element (c) and event-controlled registers (REG) with capture/pass controls C, Cd, P, Pd, with LOGIC between registers; interface signals R(in), A(in), D(in), R(out), A(out), D(out)]
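The micropipeline control is built from Muller C-elements, which reappear in the PC0, DCVSL and RE schemes below. A C-element drives its output high when all inputs are high, low when all inputs are low, and otherwise holds its previous value. The behavioural Python sketch below (class and method names are hypothetical) models that rule; it ignores gate delay.

```python
class CElement:
    """Muller C-element: the output copies the inputs when they all agree
    and otherwise holds its previous value."""

    def __init__(self, initial: bool = False) -> None:
        self.out = initial

    def update(self, *inputs: bool) -> bool:
        if all(inputs):
            self.out = True      # all inputs high -> output rises
        elif not any(inputs):
            self.out = False     # all inputs low -> output falls
        # mixed inputs: keep the stored value (state-holding behaviour)
        return self.out


# Two-input example.
c = CElement()
print(c.update(True, False))   # False: inputs disagree, output holds low
print(c.update(True, True))    # True
print(c.update(False, True))   # True: holds high until both inputs fall
print(c.update(False, False))  # False
```

The hold case is what lets a C-element act as a rendezvous between two handshakes: it only switches once both sides agree.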

Page 10: Asynchronous Pipelines

Williams’ PC0

• Speed independent

• Cycle time: $P = 3t_F + 1t_F + 4t_C + 4t_D$

• Forward latency: $L_f = 1t_F + 1t_D + 1t_C$

• Here $t_F$ is the delay of a precharged function block, $t_D$ the completion-detector delay and $t_C$ the C-element delay

[Figure: PC0 pipeline of precharged function blocks F1 to F3 with completion detectors D1 to D3 and C-elements C1 to C3; interface signals D(in), R(in), A(in), A(out), R(out), D(out)]

Page 11: Asynchronous Pipelines

PC0 Timing Diagram

[Timing diagram: F1, F2 and F3 alternate between evaluation and precharge over time; each phase is completed through the stage's completion detector (D1 to D3) and C-element (C1 to C3)]

• Red arrows mark the cycle time; blue arrows mark the precharge phase

Page 12: Asynchronous Pipelines

Dependency Graph

[Figure, two panels. Flat Dependency Graph: events C, F and D of consecutive stages (C1, F1, D1, C2, F2, D2, ...). Folded Dependency Graph: nodes C, F, D, with edges labeled by stage offsets 0, +1 and -1]
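One way to read the folded graph is to walk a single cycle of events and tally its delays. The walk below is our own reading of the PC0 critical cycle (an assumption, not taken from the slide): F_i evaluates, D_i completes, C_{i+1} fires, and so on through stage i+2, until stage i+1 precharges and C_i lets stage i evaluate again. Each step records its delay term and the stage offset of the edge into it. The offsets must sum to zero for the walk to return to stage i, and the delay terms should reproduce $P = 3t_F + 1t_F + 4t_C + 4t_D$, with the first three steps giving $L_f$. Names and delay values are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# One PC0 cycle (our assumed reading of the folded dependency graph):
# (event, delay term, stage offset of the edge into this event).
# Function-block evaluation and precharge both cost tF.
Step = Tuple[str, str, int]
PC0_CYCLE: List[Step] = [
    ("F_i evaluates",           "tF", 0),
    ("D_i detects data",        "tD", 0),
    ("C_i+1 fires",             "tC", +1),
    ("F_i+1 evaluates",         "tF", 0),
    ("D_i+1 detects data",      "tD", 0),
    ("C_i+2 fires",             "tC", +1),
    ("F_i+2 evaluates",         "tF", 0),
    ("D_i+2 detects data",      "tD", 0),
    ("C_i+1 fires (precharge)", "tC", -1),
    ("F_i+1 precharges",        "tF", 0),
    ("D_i+1 detects reset",     "tD", 0),
    ("C_i fires (evaluate)",    "tC", -1),
]


def tally(cycle: List[Step]) -> Tuple[Dict[str, int], int]:
    """Count each delay term and sum the stage offsets along the cycle."""
    terms = Counter(term for _, term, _ in cycle)
    net_offset = sum(offset for _, _, offset in cycle)
    return dict(terms), net_offset


def total_delay(cycle: List[Step], delays: Dict[str, float]) -> float:
    """Sum the delay of every step in the cycle."""
    return sum(delays[term] for _, term, _ in cycle)


delays = {"tF": 10, "tD": 3, "tC": 2}      # hypothetical values
terms, net = tally(PC0_CYCLE)
print(terms, net)                           # {'tF': 4, 'tD': 4, 'tC': 4} 0
print(total_delay(PC0_CYCLE, delays))       # 60  (cycle time P)
print(total_delay(PC0_CYCLE[:3], delays))   # 15  (forward latency Lf)
```

Four $t_F$ terms appear because three evaluations and one precharge lie on the cycle, which is why the formula is written as $3t_F + 1t_F$.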

Page 13: Asynchronous Pipelines

Williams’ PC1

• Cycle time: $P = 2t_F + 4t_C + 4t_D$

• Forward latency: $L_f = 1t_F + 2t_C + 1t_D$

[Figure: PC1 pipeline with precharged function blocks F1 and F2, completion detectors DA, DB and D2, C-elements C1 and C2, and a latch; interface signals D(in), R(in), A(in), A(out), R(out), D(out)]

Page 14: Asynchronous Pipelines

Williams’ PS0

• Not speed independent

• Cycle time: $P = 3t_F + 1t_F + 2t_D$

• Forward latency: $L_f = 1t_F$

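[Figure: PS0 pipeline of precharged function blocks F1 to F3 with completion detectors D1 to D3, each detector controlling the precharge and evaluation of the previous stage; interface signals D(in), A(in), A(out), D(out)]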

Page 15: Asynchronous Pipelines

PS0 Timing Diagram

[Timing diagram: F1, F2 and F3 alternate between evaluation and precharge; stage i precharges once D(i+1) reports that stage i+1 has completed evaluation, and evaluates again once D(i+1) reports that stage i+1 has precharged]

Page 16: Asynchronous Pipelines

PS0 Timing Assumption

• The pipeline has to meet the following timing assumption:

$t_{F_i} < t_{D_{i+1}} + t_{F_{i+1}} + t_{D_{i+2}} + t_{F_{i+2}}$

[Timing diagram: the PS0 timing diagram, annotated with the precharge time $t_{F_i}$ and the path $t_{D_{i+1}} + t_{F_{i+1}} + t_{D_{i+2}} + t_{F_{i+2}}$]

Page 17: Asynchronous Pipelines

Renaudin’s DCVSL Pipeline

• Compared with Ted Williams’ PC0 only

• Uses DCVSL exclusively

• Introduces latched DCVSL

• Improves cycle time but not forward latency

• Cycle time: $P = 1t_F + 1t_F + 4t_C + 2t_D$

• Forward latency: $L_f = 1t_F + 1t_C + 1t_D$

Page 18: Asynchronous Pipelines

DCVS Logic Family

[Figure, two panels. DCVS Logic: a DCVSL tree with inputs In, complementary outputs Out, and a Req input. Latched DCVS Logic: the same gate with an additional Ack input]

Page 19: Asynchronous Pipelines

More on DCVSL

• Advantages

– Fast: based on dynamic, domino-type logic

– Built-in four-phase handshaking

– Robust completion sensing

– Acts as a storage element

• Disadvantages

– Higher complexity: more transistors and more area

– Higher power dissipation
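The "robust completion sensing" advantage comes from the complementary outputs: during precharge both rails are low, and during evaluation exactly one rail fires once the result is determined, so an OR of the two rails signals completion. The behavioural Python sketch below models a single dual-rail AND gate in this style; the gate choice and names are hypothetical, and the storage behaviour and Ack input of latched DCVSL are not modelled.

```python
from dataclasses import dataclass


@dataclass
class DualRail:
    """Complementary (dual-rail) signal: both rails low means 'no data yet'."""
    t: bool = False  # true rail
    f: bool = False  # false rail


def dcvsl_and(req: bool, a: DualRail, b: DualRail) -> DualRail:
    """Behavioural sketch of a dual-rail DCVSL AND gate."""
    if not req:
        return DualRail()  # precharge: both outputs reset low
    # Evaluation: the true tree conducts only if both inputs are true,
    # the complementary tree as soon as either input is false.
    return DualRail(t=a.t and b.t, f=a.f or b.f)


def done(x: DualRail) -> bool:
    """Completion sensing: one rail has fired (the rails are mutually exclusive)."""
    return x.t or x.f


z = dcvsl_and(True, DualRail(t=True), DualRail())   # input b not yet valid
print(done(z))                                      # False
z = dcvsl_and(True, DualRail(t=True), DualRail(f=True))
print(z, done(z))                                   # DualRail(t=False, f=True) True
print(done(dcvsl_and(False, DualRail(t=True), DualRail(t=True))))  # False (precharge)
```

Because "no result yet" is encoded explicitly, completion does not depend on a matched delay line, which is what makes the sensing robust.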

Page 20: Asynchronous Pipelines

DCVS Pipeline

[Figure: DCVS pipeline of precharged function blocks F1 to F3 with completion detectors D1 to D3 and C-elements C1 to C3; interface signals D(in), R(in), A(in), A(out), R(out), D(out)]

• Cycle time: $P = 1t_F + 1t_F + 4t_C + 2t_D$ $(= 2t_F + 4t_C + 2t_D)$

• Forward latency: $L_f = 1t_F + 1t_C + 1t_D$

Page 21: Asynchronous Pipelines

DCVS Pipeline Timing Diagram

[Timing diagram: F1, F2 and F3 alternate between evaluation and precharge; each phase is completed through completion detectors D1 to D3 and C-elements C1 to C3]

Page 22: Asynchronous Pipelines

DCVS Dependency Graph

[Figure: folded dependency graph on nodes C, F, D, with edges labeled by stage offsets 0, +1 and -1]

• Cycle time: $P = 1t_F + 1t_F + 4t_C + 2t_D$

• Forward latency: $L_f = 1t_F + 1t_C + 1t_D$

Page 23: Asynchronous Pipelines

Martin’s Pipeline Schemes

• Deep pipelining

• Quasi-delay-insensitive (QDI): no timing assumptions

• Based on different reshufflings of the handshaking

• The best scheme has high concurrency, which reduces control overhead

• The control logic is more complex

Page 24: Asynchronous Pipelines

Basic Asynchronous Handshaking

[Handshaking expansion of the full buffer (FB), which uses an explicit state variable x; garbled in extraction]

[Figure: three dual-rail stages (1, 2, 3), each with data inputs L0/L1, data outputs R0/R1 and enables Le/Re]

• Reshuffling eliminates the explicit variable x

• Large control overhead

Page 25: Asynchronous Pipelines

Handshaking Reshuffling

[Handshaking expansion of the half buffer (HB); garbled in extraction]

• Still waits for the predecessor to reset before resetting itself, which means larger overhead for stages with more inputs

[Figure: three dual-rail stages (1, 2, 3) with data rails L0/L1 and R0/R1 and enables Le/Re]

Page 26: Asynchronous Pipelines

Precharge-Logic Half-Buffer (PCHB)

• Does not wait for the predecessor to reset before resetting its own outputs; the control logic waits for the predecessor's reset only after the current stage has reset

[Handshaking expansion of PCHB; garbled in extraction]

[Figure: three dual-rail stages (1, 2, 3) with data rails L0/L1 and R0/R1 and enables Le/Re]
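The PCHB handshaking expansion on this slide was lost in extraction. The simulation below assumes the standard PCHB expansion from Andrew Lines' QDI pipeline work, *[[Re ∧ L]; R↑; Le↓; [¬Re]; R↓; [¬L]; Le↑], which matches the behaviour described above: outputs reset as soon as the successor acknowledges, and only then does the stage wait for its input to go neutral. It is an untimed, interleaved model of three one-bit identity stages between a source and a sink. The Channel class, the process names and the bit pattern are hypothetical.

```python
from typing import Callable, Generator, List, Optional

Guard = Callable[[], bool]
Process = Generator[Guard, None, None]  # each yield is the guard it waits on


class Channel:
    """One-bit dual-rail channel: rails d0/d1 plus the receiver's enable e."""

    def __init__(self) -> None:
        self.d0 = False
        self.d1 = False
        self.e = True  # receiver ready after reset

    def valid(self) -> bool:
        return self.d0 or self.d1

    def send(self, bit: int) -> None:
        if bit:
            self.d1 = True
        else:
            self.d0 = True

    def reset(self) -> None:
        self.d0 = self.d1 = False


def source(out: Channel, bits: List[int]) -> Process:
    for bit in bits:
        yield lambda: out.e          # wait until the receiver is ready
        out.send(bit)
        yield lambda: not out.e      # wait for the acknowledge
        out.reset()


def pchb_buffer(L: Channel, R: Channel) -> Process:
    # *[[Re ∧ L]; R↑; Le↓; [¬Re]; R↓; [¬L]; Le↑]  (identity function block)
    while True:
        yield lambda: R.e and L.valid()
        R.send(1 if L.d1 else 0)     # R↑
        L.e = False                  # Le↓
        yield lambda: not R.e        # [¬Re]
        R.reset()                    # R↓
        yield lambda: not L.valid()  # [¬L]
        L.e = True                   # Le↑


def sink(inp: Channel, received: List[int]) -> Process:
    while True:
        yield lambda: inp.valid()
        received.append(1 if inp.d1 else 0)
        inp.e = False
        yield lambda: not inp.valid()
        inp.e = True


def run(procs: List[Process]) -> None:
    """Advance any process whose guard holds, until no process can move."""
    guards: List[Optional[Guard]] = [next(p, None) for p in procs]
    progressed = True
    while progressed:
        progressed = False
        for i, p in enumerate(procs):
            g = guards[i]
            if g is not None and g():
                guards[i] = next(p, None)
                progressed = True


bits = [1, 0, 1, 1, 0]
chans = [Channel() for _ in range(4)]
received: List[int] = []
run([source(chans[0], bits)]
    + [pchb_buffer(chans[k], chans[k + 1]) for k in range(3)]
    + [sink(chans[3], received)])
print(received)  # [1, 0, 1, 1, 0]
```

Because every wait is an explicit guard, the model checks ordering (liveness and FIFO behaviour) rather than timing; delays would have to be attached to the events to reason about cycle time.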

Page 27: Asynchronous Pipelines

Precharge-Logic Full-Buffer (PCFB)

• Allows the neutrality test of the output data to overlap with raising the left enable

• Complex control logic; requires an extra state variable (en)

[Handshaking expansion of PCFB using the state variable en; garbled in extraction]

[Figure: three dual-rail stages (1, 2, 3) with data rails L0/L1 and R0/R1, enables Le/Re and internal variable en]

Page 28: Asynchronous Pipelines

Martin’s PCHB Full Adder

[Figure: PCHB full-adder circuit with dual-rail inputs A0/A1, B0/B1, C0/C1, dual-rail outputs S0/S1 and D0/D1, enables Le, Se, De and en, validity signals Av, Bv, Cv, ABCv and Dv, and C-elements]

Page 29: Asynchronous Pipelines

Martin’s Pipeline in General

• The cycle time is limited by the properties of QDI

– The next stage has to finish precharging before the current stage can evaluate its next input

[Figure: precharged function blocks F1 to F3, each with its own control block exchanging Le/Re enables with its neighbours; signals D(in), D1 to D3, D(out)]

Page 30: Asynchronous Pipelines

Performance Analysis of PCFB

• The control logic can be viewed as completion detection (D) plus a C-element (C)

• Reshuffling the handshaking only changes the degree of concurrency; it does not affect the best-case performance analysis

• Cycle time: $P = 3t_F + 1t_F + 2t_C + 2t_D$

• Forward latency: $L_f = 1t_F$

Page 31: Asynchronous Pipelines

Outline

• Background review

– Sutherland

– Ted Williams

– Renaudin

– Martin

• Taken pipeline

• Performance comparison

• Conclusion

Page 32: Asynchronous Pipelines

Taken Pipeline

• Uses a Taken Detector

• Two schemes to satisfy different requirements

• Neither scheme is speed independent

Page 33: Asynchronous Pipelines

Initial Idea

• Precharge: only when the next stage has taken the current result

• Evaluation: only when the next stage has precharged

• Similar idea to Martin’s pipeline schemes

Page 34: Asynchronous Pipelines

Further Observation

• Precharge

– The current stage can precharge as soon as the first-level logic of the next stage has evaluated, i.e., the next stage has taken the result

• Evaluate

– Evaluation can start as soon as the guarded N-transistor in the first-level logic of the next stage has turned off

Page 35: Asynchronous Pipelines

Relaxed Precharge (RP) Constraint

• The current stage can precharge as soon as the first-level logic of the next stage has evaluated: the next stage has taken the result

• The current stage can evaluate as soon as the first-level logic of the next stage has precharged, which blocks the new result from passing through

• No extra control logic is needed except the TD, which is similar to a completion detector

Page 36: Asynchronous Pipelines

RP Pipeline Scheme

[Figure: RP pipeline of precharged function blocks F1 to F3, each with a taken detector TD1 to TD3; data D(in) to D(out)]

• Cycle time: $P = 2t_F + 1t_{F1} + 1t_{F1} + 2t_{TD}$

• Forward latency: $L_f = 1t_F$

Page 37: Asynchronous Pipelines

RP Timing Diagram

[Timing diagram: F1, F2 and F3 alternate between evaluation and precharge; stage i precharges once T(i+1) reports that stage i+1 has taken its result, and evaluates again once T(i+1) reports that stage i+1 has precharged]

Page 38: Asynchronous Pipelines

RP Timing Assumption

• The timing assumption is easy to meet

[Timing inequality garbled in extraction; it relates the taken-detector and function-block delays ($t_{TD}$, $t_F$, $t_{F1}$) of stages $i$, $i+1$ and $i+2$]

[Timing diagram: the RP timing diagram, annotated with the paths compared by the inequality]

Page 39: Asynchronous Pipelines

RP Timing Assumption (cont.)

• $t_{F1_i}$ is the delay of the first-level logic of stage $i$

• $t_{F2_i}$ is the delay of the logic after the first level of stage $i$

• The rising and falling delays of the TD are assumed to be equal

Page 40: Asynchronous Pipelines

Relaxed Evaluation (RE) Constraint

• The current stage can start evaluating at about the same time as the next stage turns off the guarded N-transistors in its first-level logic

• Requires a generalized C-element, but improves the cycle time

Page 41: Asynchronous Pipelines

RE Pipeline Scheme

• The TD can be skewed for fast evaluation detection

• Cycle time: $P = 2t_F + 1t_{F1} + 1t_{TD} + 1t_C$

• Forward latency: $L_f = 1t_F$

[Figure: RE pipeline of precharged function blocks F1 to F3 with taken detectors TD1 to TD3 and generalized C-elements (GC1), each with an input marked "+"; data D(in) to D(out)]
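The "+" marks on the generalized C-elements follow the usual notation for an asymmetric C-element input that takes part only in the rising transition of the output. Which signals are wired to the "+" inputs in the RE scheme is not recoverable from the slide, so the Python sketch below is a generic model with the input roles as assumptions: "common" inputs control both transitions, "plus" inputs only gate the rise.

```python
from typing import Sequence


class AsymmetricCElement:
    """Generalized (asymmetric) C-element sketch.

    Rises when all common inputs AND all plus inputs are high;
    falls when all common inputs are low (plus inputs are ignored);
    otherwise holds its previous value."""

    def __init__(self, initial: bool = False) -> None:
        self.out = initial

    def update(self, common: Sequence[bool], plus: Sequence[bool]) -> bool:
        if all(common) and all(plus):
            self.out = True
        elif not any(common):
            self.out = False
        return self.out


gc = AsymmetricCElement()
print(gc.update([True], [False]))  # False: a "+" input blocks the rise
print(gc.update([True], [True]))   # True
print(gc.update([True], [False]))  # True: a "+" input alone cannot reset it
print(gc.update([False], [True]))  # False: common inputs low reset it
```

Compared with the symmetric C-element, the falling transition no longer waits for every input, which is the kind of relaxation that can shorten a handshake cycle.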

Page 42: Asynchronous Pipelines

RE Timing Diagram

[Timing diagram: F1, F2 and F3 alternate between evaluation and precharge; taken detectors T1 to T3 and C-elements C1 to C3 sequence each stage's next evaluation]

Page 43: Asynchronous Pipelines

RE Timing Assumption 1

• Precharge constraint

[Timing inequality garbled in extraction; it relates the C-element, taken-detector and function-block delays ($t_C$, $t_{TD}$, $t_F$, $t_{F1}$, $t_{F2}$) of stages $i$, $i+1$ and $i+2$]

[Timing diagram: the RE timing diagram, annotated with the paths compared by the inequality]

Page 44: Asynchronous Pipelines

RE Timing Assumption 2

• Evaluation constraint (min delay)

[Timing inequality garbled in extraction; per the next slide, the result of stage $i$, arriving after $t_{C_i} + t_{F_i}$, must not reach stage $i+1$ before stage $i+1$ has turned off the guarded N-transistors in its first-level logic]

[Timing diagram: the RE timing diagram, annotated with the paths compared by the inequality]

Page 45: Asynchronous Pipelines

Issue in Fine-Grained Pipelines

• In a fine-grained pipeline, such as Martin’s single-gate pipeline, the RE scheme may require buffering because of process variation

– Buffering is necessary because of the second timing assumption: the next gate (stage) may not have turned off its N-stack before the result from the current stage reaches it

[Timing inequality of RE Timing Assumption 2; garbled in extraction]

Page 46: Asynchronous Pipelines

Taken Detector (TD)

• Similar to a completion detector

• Detects both evaluation and precharge

• Its inputs are the outputs of the first-level logic of each stage
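Since the TD has to report both "taken" (first level evaluated) and "precharged" (first level reset), it behaves like a completion detector with memory: high once every first-level output is valid, low once every first-level output is neutral, and unchanged in between. The Python sketch below assumes dual-rail first-level outputs, as in the DCVSL and QDI schemes reviewed earlier; the class name and encoding are hypothetical, and a hardware TD would be a tree of OR gates feeding C-elements rather than a loop.

```python
from typing import List, Tuple

DualRailBit = Tuple[bool, bool]  # (true rail, false rail) of one first-level output


class TakenDetector:
    """High once every first-level output is valid, low once all are neutral."""

    def __init__(self) -> None:
        self.taken = False  # reset state: first level precharged

    def update(self, first_level: List[DualRailBit]) -> bool:
        valid = [t or f for t, f in first_level]  # per-bit OR, as in a completion detector
        if all(valid):
            self.taken = True    # next stage has taken the result
        elif not any(valid):
            self.taken = False   # first-level logic fully precharged
        return self.taken        # partial states: hold (C-element behaviour)


td = TakenDetector()
print(td.update([(True, False), (False, False)]))   # False: only one bit valid
print(td.update([(True, False), (False, True)]))    # True: result taken
print(td.update([(False, False), (False, True)]))   # True: still precharging
print(td.update([(False, False), (False, False)]))  # False: precharged
```

Watching only the first-level outputs is what lets the previous stage precharge earlier than a completion detector on the full stage output would allow.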

Page 47: Asynchronous Pipelines

Datapath Merging & Splitting

• Datapath merging and splitting can be done similarly to Williams’ style

[Figure: F1 (with TD1) feeds two branches, F2a (with TD2a) and F2b (with TD2b), which merge into F3 (with TD3); a C-element (C) is included; data D(in) to D(out)]

Page 48: Asynchronous Pipelines

Outline

• Background review

– Sutherland

– Ted Williams

– Renaudin

– Martin

• Taken pipeline

• Performance comparison

• Conclusions

Page 49: Asynchronous Pipelines

Comparison of RE and Synchronous Skew-Tolerant Pipelines

• Assume a 4-stage pipeline (stages 1 to 4) and 4-phase clocking

• Synchronous:

– Stage 1 starts its next evaluation after stage 4 starts evaluating

• Asynchronous:

– Stage 1 starts its next evaluation after the completion of the first-level logic of stage 3 is detected

Page 50: Asynchronous Pipelines

Comparison Assumptions

• The pipeline is balanced: all stages have equal evaluation time

• Precharge time equals evaluation time
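Under these assumptions the comparison above reduces to simple arithmetic. The sketch assumes an ideal synchronous skew-tolerant pipeline with no clock overhead, so that with four balanced stages of delay $t_F$, stage 1 restarts one stage delay after stage 4 starts, giving a period of $4t_F$. The asynchronous side uses the RE cycle time $2t_F + t_{F1} + t_{TD} + t_C$, whose terms follow the path stage 1, stage 2, first-level logic of stage 3, TD, C-element. All delay values are hypothetical.

```python
def sync_cycle(t_f: float, stages: int = 4) -> float:
    """Ideal skew-tolerant synchronous pipeline (assumed: no clock overhead).
    Stage 1 restarts one stage delay after the last stage starts, so the
    period spans `stages` balanced stage delays."""
    return stages * t_f


def re_cycle(t_f: float, t_f1: float, t_td: float, t_c: float) -> float:
    """RE cycle time as given on the RE slide: P = 2tF + tF1 + tTD + tC."""
    return 2 * t_f + t_f1 + t_td + t_c


# Hypothetical delays (arbitrary units): full stage, first-level logic,
# taken detector, C-element.
t_f, t_f1, t_td, t_c = 10, 3, 3, 2
print(sync_cycle(t_f), re_cycle(t_f, t_f1, t_td, t_c))  # 40 28
# RE wins whenever tF1 + tTD + tC < 2 tF.
print(re_cycle(t_f, t_f1, t_td, t_c) < sync_cycle(t_f))  # True
```

The break-even condition $t_{F1} + t_{TD} + t_C < 2t_F$ shows why the scheme suits coarse-grained stages: the detection overhead is amortized against two full stage delays.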

Page 51: Asynchronous Pipelines

Graphical Comparison

[Figure: evaluation timing of stages 1 to 4]

Page 52: Asynchronous Pipelines

Optimum Number of Stages

• Optimum number of stages (ONS)

• Cycle time is not the only factor in system performance; forward latency is also a limiting factor

• A larger cycle time can be compensated for by increasing the number of stages

• However, a high $L_f$ means that system throughput cannot be increased by adding more stages

$\mathrm{ONS} = P / L_f$
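The formula $\mathrm{ONS} = P / L_f$ can be evaluated for every scheme in the deck by collecting the cycle-time and forward-latency expressions from the earlier slides. Read literally, ONS is the number of stages whose combined forward latency adds up to one cycle time. This is a minimal sketch: the delay values are hypothetical, evaluation and precharge are assumed equal (as in the comparison assumptions), RP's $1t_{F1} + 1t_{F1}$ is folded into $2t_{F1}$, and DCVSL's $1t_F + 1t_F$ into $2t_F$.

```python
from typing import Callable, Dict, Tuple

Delays = Dict[str, float]
Formula = Callable[[Delays], float]

# (cycle time P, forward latency Lf) per scheme, transcribed from the slides.
# Keys: tF function block (evaluation = precharge), tC C-element,
# tD completion detector, tTD taken detector, tF1 first-level logic.
SCHEMES: Dict[str, Tuple[Formula, Formula]] = {
    "PC0":   (lambda d: 3 * d["tF"] + 1 * d["tF"] + 4 * d["tC"] + 4 * d["tD"],
              lambda d: d["tF"] + d["tD"] + d["tC"]),
    "PC1":   (lambda d: 2 * d["tF"] + 4 * d["tC"] + 4 * d["tD"],
              lambda d: d["tF"] + 2 * d["tC"] + d["tD"]),
    "PS0":   (lambda d: 3 * d["tF"] + 1 * d["tF"] + 2 * d["tD"],
              lambda d: d["tF"]),
    "DCVSL": (lambda d: 2 * d["tF"] + 4 * d["tC"] + 2 * d["tD"],
              lambda d: d["tF"] + d["tC"] + d["tD"]),
    "PCFB":  (lambda d: 3 * d["tF"] + 1 * d["tF"] + 2 * d["tC"] + 2 * d["tD"],
              lambda d: d["tF"]),
    "RP":    (lambda d: 2 * d["tF"] + 2 * d["tF1"] + 2 * d["tTD"],
              lambda d: d["tF"]),
    "RE":    (lambda d: 2 * d["tF"] + d["tF1"] + d["tTD"] + d["tC"],
              lambda d: d["tF"]),
}


def compare(d: Delays) -> Dict[str, Tuple[float, float, float]]:
    """Return (P, Lf, ONS = P / Lf) for every scheme."""
    result = {}
    for name, (cycle, latency) in SCHEMES.items():
        p, lf = cycle(d), latency(d)
        result[name] = (p, lf, p / lf)
    return result


# Hypothetical delays in arbitrary units.
delays = {"tF": 10, "tC": 2, "tD": 3, "tTD": 3, "tF1": 3}
res = compare(delays)
for name, (p, lf, ons) in res.items():
    print(f"{name:6s} P={p:g} Lf={lf:g} ONS={ons:.2f}")
```

With these particular values RE has the shortest cycle time and, like PS0, PCFB and RP, a forward latency of a single $t_F$; other delay ratios can reorder the middle of the table.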

Page 53: Asynchronous Pipelines

Conclusion

• With taken logic and some easy-to-meet timing requirements, we can achieve the best cycle time and forward latency

• The performance comparison with existing pipeline schemes is favorable

• An implementation is still required to prove the theory