STG-based synthesis and Petrify
J. Cortadella (Univ. Politècnica Catalunya)
Mike Kishinevsky (Intel Corporation)
Alex Kondratyev (University of Aizu)
Luciano Lavagno (Università di Udine)
Enric Pastor (Univ. Politècnica Catalunya)
Alexander Taubin (University of Aizu)
Alex Yakovlev (Univ. Newcastle upon Tyne)
What is it about?
This tutorial is about the synthesis of asynchronous circuits from behavioral specifications.
STGs can specify I/O concurrency (based on Petri nets).
STGs specify behavior at a level in which logic synthesis techniques can be applied.
Speed-independent and timed circuits can be derived.
Design flow
Specification (STG)
  -> Reachability analysis -> State Graph
  -> State encoding -> SG with CSC
  -> Boolean minimization -> Next-state functions
  -> Logic decomposition -> Decomposed functions
  -> Technology mapping -> Gate netlist
Outline
Overview
Synthesis steps
– Specification (STGs)
– State encoding
– Logic synthesis, decomposition and mapping
Synthesis with relative timing
Conclusions
Signal Transition Graph (STG)
[Figure: circuit with signals x, y, z; timing diagram and STG over the transitions x+, x-, y+, y-, z+, z-]
State graph (state codes are xyz)
000 --x+--> 100
100 --y+--> 110
100 --z+--> 101
110 --z+--> 111
101 --y+--> 111
101 --x---> 001
111 --x---> 011
001 --y+--> 011
011 --z---> 010
010 --y---> 000
Next-state functions
$x = \bar{z}\,(x + \bar{y})$
$y = z + x$
$z = x + \bar{y}\,z$
[Figure: gate implementation of signals x, y and z]
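The step from state graph to next-state functions can be checked mechanically. The sketch below is not part of the tutorial: it assumes the state graph of the x, y, z example (arcs reconstructed from the state codes on the slide) and assumes the three equations read x = z'(x + y'), y = z + x, z = x + y'z. For every reachable state it compares the value each equation produces with the value implied by the state graph (a signal flips when one of its transitions is enabled, otherwise it holds).

```python
# State graph of the x, y, z example: code "xyz" -> {transition: next code}.
# The arcs are an assumption reconstructed from the state codes on the slide.
SG = {
    "000": {"x+": "100"},
    "100": {"y+": "110", "z+": "101"},
    "110": {"z+": "111"},
    "101": {"y+": "111", "x-": "001"},
    "111": {"x-": "011"},
    "001": {"y+": "011"},
    "011": {"z-": "010"},
    "010": {"y-": "000"},
}
SIGNALS = "xyz"


def implied_value(state, signal):
    """Next value of `signal` in `state`: flipped if one of its transitions is enabled."""
    current = int(state[SIGNALS.index(signal)])
    excited = any(t[0] == signal for t in SG[state])
    return 1 - current if excited else current


def equations(state):
    """Assumed next-state functions, evaluated on one state code."""
    x, y, z = (int(bit) for bit in state)
    return {
        "x": (1 - z) & (x | (1 - y)),
        "y": z | x,
        "z": x | ((1 - y) & z),
    }


mismatches = [
    (state, signal)
    for state in SG
    for signal in SIGNALS
    if implied_value(state, signal) != equations(state)[signal]
]
print(mismatches)  # [] : the equations agree with the state graph in every state
```

Only the eight reachable codes are checked; the equations are free to take any value elsewhere, which is what Boolean minimization exploits.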
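VME bus
[Figure: VME bus controller connected to the bus, the device and the data transceiver; signals DSr, DSw, DTACK, LDS, LDTACK, D]
Read Cycle
[Figure: read-cycle waveforms for DSr, LDS, LDTACK, D and DTACK]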
STG for the READ cycle
DSr+ -> LDS+ -> LDTACK+ -> D+ -> DTACK+ -> DSr- -> D-
D- -> DTACK- -> DSr+
D- -> LDS- -> LDTACK- -> LDS+
[Figure: VME bus controller with signals LDS, LDTACK, D, DSr, DTACK]
Choice: Read and Write cycles
Read cycle: DSr+, LDS+, LDTACK+, D+, DTACK+, DSr-, D-, LDS-, LDTACK-, DTACK-
Write cycle: DSw+, D+, LDS+, LDTACK+, D-, DTACK+, DSw-, LDS-, LDTACK-, DTACK-
[Figure: STG with a choice between the read branch (starting with DSr+) and the write branch (starting with DSw+), built up over four slides]
Circuit synthesis
Goal:
– Derive a hazard-free circuit under a given delay model and mode of operation
Modes of operation
[Figure: circuit model with current state and next state]
Fundamental mode
– Single-input changes
– Multiple-input changes
Input / Output mode
– Concurrency circuit / environment
Speed independence
Delay model
– Unbounded gate / environment delays
– Certain wire delays shorter than certain paths in the circuit
Conditions for implementability:
– Consistency
– Complete State Coding
– Output persistency
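Two of these conditions are easy to state as checks on a state graph. The sketch below is illustrative only and reuses the small x, y, z example; the arcs are reconstructed from its state codes, and treating all three signals as non-inputs is an assumption. Consistency means every arc s+ (s-) changes exactly bit s from 0 to 1 (1 to 0). Output persistency means an enabled output transition is never disabled by firing another transition.

```python
# State graph of the x, y, z example: code "xyz" -> {transition: next code}.
SG = {
    "000": {"x+": "100"},
    "100": {"y+": "110", "z+": "101"},
    "110": {"z+": "111"},
    "101": {"y+": "111", "x-": "001"},
    "111": {"x-": "011"},
    "001": {"y+": "011"},
    "011": {"z-": "010"},
    "010": {"y-": "000"},
}
SIGNALS = "xyz"
OUTPUTS = set("xyz")  # assumption: the example has no input signals


def consistent(sg):
    """True if every arc flips exactly its own signal in the right direction."""
    for state, arcs in sg.items():
        for t, nxt in arcs.items():
            i = SIGNALS.index(t[0])
            before, after = ("0", "1") if t[1] == "+" else ("1", "0")
            expected = state[:i] + after + state[i + 1:]
            if state[i] != before or nxt != expected:
                return False
    return True


def persistency_violations(sg):
    """Triples (state, t, u): output t is enabled in state but not after firing u."""
    bad = []
    for state, arcs in sg.items():
        for t in arcs:
            if t[0] not in OUTPUTS:
                continue
            for u, nxt in arcs.items():
                if u != t and t not in sg[nxt]:
                    bad.append((state, t, u))
    return bad


print(consistent(SG))              # True
print(persistency_violations(SG))  # []
```

Complete State Coding needs the excited outputs of states that share a code, so it is better illustrated on an example that actually contains such a pair.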
Other synthesis approaches
Burst-mode machines
– Mealy-like FSMs
– Fundamental mode (slow environment)
VLSI programming
– Syntax-directed translation from CSP (“Communicating Sequential Processes”)
– No logic synthesis
– Circuit size ~ Size of the specification
State Graph (Read cycle)
[Figure: state graph of the read cycle; DSr+, DTACK-, LDS- and LDTACK- each label three arcs, while D-, DSr-, DTACK+, D+, LDTACK+ and LDS+ each label one]
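Reachability analysis amounts to playing the token game on the STG. The following sketch is a minimal illustration, not Petrify's algorithm (which is BDD-based). It assumes the read-cycle arcs DSr+ -> LDS+ -> LDTACK+ -> D+ -> DTACK+ -> DSr- -> D-, D- -> DTACK- -> DSr+ and D- -> LDS- -> LDTACK- -> LDS+, treats every arc as an implicit place, and assumes an initial marking in which the controller is idle and waiting for DSr+.

```python
from collections import deque

# Causal arcs of the read-cycle STG; each arc acts as an implicit place.
ARCS = [
    ("DSr+", "LDS+"), ("LDS+", "LDTACK+"), ("LDTACK+", "D+"),
    ("D+", "DTACK+"), ("DTACK+", "DSr-"), ("DSr-", "D-"),
    ("D-", "DTACK-"), ("D-", "LDS-"), ("DTACK-", "DSr+"),
    ("LDS-", "LDTACK-"), ("LDTACK-", "LDS+"),
]
# Assumed initial marking: idle controller, waiting for DSr+.
INITIAL = frozenset({("DTACK-", "DSr+"), ("LDTACK-", "LDS+")})
TRANSITIONS = sorted({t for arc in ARCS for t in arc})


def enabled(marking):
    """Transitions whose input places all carry a token."""
    return [t for t in TRANSITIONS
            if all(arc in marking for arc in ARCS if arc[1] == t)]


def fire(marking, t):
    """Move tokens from the input places of t to its output places."""
    consumed = {arc for arc in ARCS if arc[1] == t}
    produced = {arc for arc in ARCS if arc[0] == t}
    return frozenset((marking - consumed) | produced)


def state_graph(initial):
    """Breadth-first exploration: marking -> {transition: next marking}."""
    graph, queue = {}, deque([initial])
    while queue:
        marking = queue.popleft()
        if marking not in graph:
            graph[marking] = {t: fire(marking, t) for t in enabled(marking)}
            queue.extend(graph[marking].values())
    return graph


sg = state_graph(INITIAL)
print(len(sg))                                 # 14 states
print(sum(len(arcs) for arcs in sg.values()))  # 18 arcs
```

The 18 arcs match the figure: the four transitions that are concurrent with each other after D- appear three times each, and the remaining six appear once.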
Binary encoding of signals
State vector: (DSr, DTACK, LDTACK, LDS, D)
[Figure: read-cycle state graph annotated with binary codes, including 10000, 10010, 10110, 01110, 01100, 00110 and a second state with code 10110]
Excitation / Quiescent Regions
[Figure: state graph partitioned into ER(LDS+), QR(LDS+), ER(LDS-) and QR(LDS-)]
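Next-state function
– ER(LDS+): LDS goes 0 -> 1
– QR(LDS+): LDS stays 1 -> 1
– ER(LDS-): LDS goes 1 -> 0
– QR(LDS-): LDS stays 0 -> 0
– Two different states share the code 10110, so the next value of LDS for that code is ambiguous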
Karnaugh map for LDS
[Figure: Karnaugh maps of the next-state function of LDS over DSr, DTACK, LDTACK and D, one for LDS = 0 and one for LDS = 1; unreachable codes are don't-cares (-) and the cell for code 10110 is marked 0/1?]
Concurrency reduction
[Figure: read-cycle state graph with the two states coded 10110 and the three DSr+ arcs marked]
[Figure: STG for the READ cycle with reduced concurrency]
State encoding conflicts
[Figure: fragment of the state graph around LDS+, LDTACK+, LDS- and LDTACK-; two different states carry the same code 10110]
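The conflict can be found by attaching a binary code to every reachable state and grouping states by code. The sketch below is a hedged illustration: the STG arcs, the idle initial marking with all signals at 0, and the split into inputs (DSr, LDTACK) and outputs (LDS, D, DTACK) are assumptions about the read-cycle example. Two states violate Complete State Coding when they share a code but excite different output transitions.

```python
from collections import deque

ORDER = ["DSr", "DTACK", "LDTACK", "LDS", "D"]  # bit order used on the slides
OUTPUTS = {"LDS", "D", "DTACK"}                 # assumed: DSr, LDTACK are inputs
ARCS = [
    ("DSr+", "LDS+"), ("LDS+", "LDTACK+"), ("LDTACK+", "D+"),
    ("D+", "DTACK+"), ("DTACK+", "DSr-"), ("DSr-", "D-"),
    ("D-", "DTACK-"), ("D-", "LDS-"), ("DTACK-", "DSr+"),
    ("LDS-", "LDTACK-"), ("LDTACK-", "LDS+"),
]
INITIAL = frozenset({("DTACK-", "DSr+"), ("LDTACK-", "LDS+")})  # assumed
TRANSITIONS = sorted({t for arc in ARCS for t in arc})


def successors(marking, code):
    """Fire every enabled transition; return {transition: (marking, code)}."""
    result = {}
    for t in TRANSITIONS:
        inputs = {arc for arc in ARCS if arc[1] == t}
        if inputs.issubset(marking):
            outputs = {arc for arc in ARCS if arc[0] == t}
            i = ORDER.index(t[:-1])
            bit = "1" if t[-1] == "+" else "0"
            result[t] = (frozenset((marking - inputs) | outputs),
                         code[:i] + bit + code[i + 1:])
    return result


def state_graph():
    start = (INITIAL, "0" * len(ORDER))
    graph, queue = {}, deque([start])
    while queue:
        state = queue.popleft()
        if state not in graph:
            graph[state] = successors(*state)
            queue.extend(graph[state].values())
    return graph


def csc_conflicts(graph):
    """Codes shared by states that excite different sets of output transitions."""
    by_code = {}
    for (_, code), arcs in graph.items():
        excited = frozenset(t for t in arcs if t[:-1] in OUTPUTS)
        by_code.setdefault(code, set()).add(excited)
    return {code: sets for code, sets in by_code.items() if len(sets) > 1}


conflicts = csc_conflicts(state_graph())
print(sorted(conflicts))  # ['10110']
```

For code 10110 one state excites D+ and the other excites LDS-, which is why no Boolean function of the five signals alone can implement LDS or D.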
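Signal Insertion
[Figure: state graph after inserting the internal signal csc (transitions CSC+ and CSC-); the two conflicting states now have the distinct codes 101101 and 101100]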
Complex-gate implementation
$LDS = D + csc$
$DTACK = D$
$D = LDTACK \cdot csc$
$csc = DSr \cdot (csc + \overline{LDTACK})$
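A complex-gate solution can be sanity-checked against the specification it was derived from. The sketch below is an assumption-laden illustration: it takes the equations to be LDS = D + csc, DTACK = D, D = LDTACK csc and csc = DSr (csc + LDTACK'), and it assumes that csc+ was inserted after DSr+ and LDTACK- and before LDS+, and csc- between DSr- and D-. The slides do not state these insertion points explicitly, so treat them as hypothetical. The check compares each gate output with the value implied by the state graph in every reachable state.

```python
from collections import deque

ORDER = ["DSr", "DTACK", "LDTACK", "LDS", "D", "csc"]
# Read-cycle STG with the state signal csc (insertion points are assumed).
ARCS = [
    ("DSr+", "csc+"), ("LDTACK-", "csc+"), ("csc+", "LDS+"),
    ("LDS+", "LDTACK+"), ("LDTACK+", "D+"), ("D+", "DTACK+"),
    ("DTACK+", "DSr-"), ("DSr-", "csc-"), ("csc-", "D-"),
    ("D-", "DTACK-"), ("D-", "LDS-"), ("DTACK-", "DSr+"),
    ("LDS-", "LDTACK-"),
]
INITIAL = frozenset({("DTACK-", "DSr+"), ("LDTACK-", "csc+")})  # assumed
TRANSITIONS = sorted({t for arc in ARCS for t in arc})


def successors(marking, code):
    result = {}
    for t in TRANSITIONS:
        inputs = {arc for arc in ARCS if arc[1] == t}
        if inputs.issubset(marking):
            outputs = {arc for arc in ARCS if arc[0] == t}
            i = ORDER.index(t[:-1])
            bit = "1" if t[-1] == "+" else "0"
            result[t] = (frozenset((marking - inputs) | outputs),
                         code[:i] + bit + code[i + 1:])
    return result


def gates(v):
    """Assumed complex-gate equations, evaluated on signal values v."""
    return {
        "LDS": v["D"] | v["csc"],
        "DTACK": v["D"],
        "D": v["LDTACK"] & v["csc"],
        "csc": v["DSr"] & (v["csc"] | (1 - v["LDTACK"])),
    }


graph, queue = {}, deque([(INITIAL, "0" * len(ORDER))])
while queue:
    state = queue.popleft()
    if state not in graph:
        graph[state] = successors(*state)
        queue.extend(graph[state].values())

mismatches = []
for (_, code), arcs in graph.items():
    v = {name: int(bit) for name, bit in zip(ORDER, code)}
    for signal, value in gates(v).items():
        excited = any(t[:-1] == signal for t in arcs)
        implied = 1 - v[signal] if excited else v[signal]
        if value != implied:
            mismatches.append((code, signal))

print(len(graph), mismatches)  # 16 []
```

Under these assumptions the state graph has 16 states, all with distinct codes, and each gate agrees with the specification in every one of them.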
Hazards
State sequence (abcx): 1000 --b+--> 1100 --a---> 0100 --c+--> 0110
[Figure: gate-level circuit with inputs a, b, c, internal signal z and output x, annotated with the signal values along this sequence]
Decomposition
Global acknowledgement
Generating candidates
Hazard-free signal insertion
– Event insertion
– Signal insertion
Global acknowledgement
[Figure: gates computing z from a, b, c and y from a, b, d, with an STG over the transitions d- b+ d+ y+ a- y- c+ d- and c- d+ z- b- z+ c+ a+ c-]
How about 2-input gates?
[Figure: alternative decompositions of the gates for z and y into 2-input gates, shown against the same STG over five slides]
Strategy for logic decomposition
Each decomposition defines a new internal signal
Method: Insert new internal signals such that
– After resynthesis, some large gates are decomposed
– The new specification is hazard-free
Generation of candidates for decomposition:
– Algebraic factorization
– Boolean factorization (boolean relations)
Decomposition example
[Figure: STG over the transitions y-, z-, w-, y+, x+, z+, x-, w+ and its state graph with codes 1001, 1011, 1000, 1010, 0001, 0000, 0101, 0010, 0100, 0110, 0111, 0011]
[Figure: the state graph partitioned into the regions yz = 1 and yz = 0]
[Figure: C-element implementation over signals x, y, z, w before and after decomposition; the new internal signal s splits the state graph into the regions s = 1 and s = 0, with inserted transitions s+ and s-]
[Figure: resulting STG with the transitions s+ and s- inserted]
Event insertion
[Figure: insertion of a new event x into a state graph with events a, b and c, showing the regions ER(x) and SR(x)]
Properties to preserve
[Figure: state diagrams over events a and b before and after inserting x]
– a is persistent
– a is disabled by b = hazards
Timing assumptions in design flow
Speed-independent: wire delays after a fork are shorter than fan-out gate delays
Burst-mode: the circuit stabilizes between two changes at the inputs
Timed circuits: absolute bounds on gate / environment delays are known a priori (before physical design)
Relative Timing Circuits
Assumptions: “a before b” for concurrent and ordered events
Used by the tool to derive a circuit and timing constraints that must be met in physical design flow
Applied to the design of the Rotating Asynchronous Pentium Processor(TM) Instruction Decoder (K. Stevens, S. Rotem et al., Intel Corporation)
Lazy Transition Systems
[Figure: state graph showing ER(LDS+), the enabling region EnR(LDS-) and the firing region FR(LDS-), with arcs LDS+, LDS- and DTACK-]
Event LDS- is lazy: firing = subset of enabling
Timing assumptions
(a before b) for concurrent events: concurrency reduction for firing and enabling
(a before b) for ordered events: early enabling
(a simultaneous to b wrt c) for triples of events: combination of the above
Netlist with SI timing constraints
[Figure: STG for the READ cycle mapped to a speed-independent netlist over DSr, LDTACK, LDS, D, DTACK and csc]
Adding timing assumptions (I)
LDTACK- before DSr+
[Figure: STG for the READ cycle and the netlist with csc, with paths marked FAST and SLOW]
State space domain
LDTACK- before DSr+
[Figure: state graph with the arcs LDTACK- and DSr+ highlighted]
Two more unreachable states
Boolean domain
[Figure: Karnaugh maps for LDS (LDS = 0 and LDS = 1) before and after the assumption; the cell previously marked 0/1? becomes 1 and one more cell becomes a don't-care]
One more DC vector for all signals
One state conflict is removed
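The effect of the assumption on the state space can be reproduced with a small experiment. This sketch is one possible reading, not the tool's algorithm: it models "LDTACK- before DSr+" as an extra causal arc LDTACK- -> DSr+ that carries a token initially. The read-cycle arcs, the idle initial marking and the input/output split (DSr and LDTACK as inputs) are assumptions as well. It compares the number of reachable states and the coding conflicts with and without the extra arc.

```python
from collections import deque

ORDER = ["DSr", "DTACK", "LDTACK", "LDS", "D"]  # bit order used on the slides
OUTPUTS = {"LDS", "D", "DTACK"}                 # assumed: DSr, LDTACK are inputs
STG = [
    ("DSr+", "LDS+"), ("LDS+", "LDTACK+"), ("LDTACK+", "D+"),
    ("D+", "DTACK+"), ("DTACK+", "DSr-"), ("DSr-", "D-"),
    ("D-", "DTACK-"), ("D-", "LDS-"), ("DTACK-", "DSr+"),
    ("LDS-", "LDTACK-"), ("LDTACK-", "LDS+"),
]
INITIAL = frozenset({("DTACK-", "DSr+"), ("LDTACK-", "LDS+")})  # assumed
ASSUMPTION = ("LDTACK-", "DSr+")  # timing assumption modelled as an extra arc


def state_graph(arcs, marking0):
    """Token game; a state is a pair (marking, binary code)."""
    transitions = sorted({t for arc in arcs for t in arc})

    def successors(marking, code):
        result = {}
        for t in transitions:
            inputs = {arc for arc in arcs if arc[1] == t}
            if inputs.issubset(marking):
                outputs = {arc for arc in arcs if arc[0] == t}
                i = ORDER.index(t[:-1])
                bit = "1" if t[-1] == "+" else "0"
                result[t] = (frozenset((marking - inputs) | outputs),
                             code[:i] + bit + code[i + 1:])
        return result

    graph, queue = {}, deque([(marking0, "0" * len(ORDER))])
    while queue:
        state = queue.popleft()
        if state not in graph:
            graph[state] = successors(*state)
            queue.extend(graph[state].values())
    return graph


def conflicts(graph):
    """Codes shared by states that excite different output transitions."""
    by_code = {}
    for (_, code), arcs in graph.items():
        excited = frozenset(t for t in arcs if t[:-1] in OUTPUTS)
        by_code.setdefault(code, set()).add(excited)
    return {code: sets for code, sets in by_code.items() if len(sets) > 1}


untimed = state_graph(STG, INITIAL)
timed = state_graph(STG + [ASSUMPTION], INITIAL | {ASSUMPTION})
print(len(untimed), sorted(conflicts(untimed)))  # 14 ['10110']
print(len(timed), sorted(conflicts(timed)))      # 12 []
```

Two states disappear, one of them the second state coded 10110, so in this model the conflict is resolved without inserting csc.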
Netlist with one constraint
[Figure: STG for the READ cycle and a netlist over DSr, LDTACK, LDS, D and DTACK, without csc]
TIMING CONSTRAINT: LDTACK- before DSr+
Ordered events: early enabling
[Figure: events a, b and c before and after early enabling; the function of gate c changes from F to G]
Logic for gate c may change
Adding timing assumptions (II)
D- before LDS-
[Figure: STG for the READ cycle and the netlist over DSr, LDTACK, LDS, D and DTACK]
State space domain
D- before LDS-
[Figure: state graph fragment with arcs DSr-, D- and LDS-, marking the potential enabling for LDS-]
Reachable space is unchanged
For LDS- enabling can be changed in one state
Boolean domain
[Figure: Karnaugh maps for LDS (LDS = 0 and LDS = 1) before and after the assumption; one entry changes from 1 to a don't-care]
One more DC vector for one signal: LDS
If used: LDS = DSr, otherwise: LDS = DSr + D
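The two candidate functions for LDS can be compared state by state. The sketch below is illustrative and rests on assumptions: the read-cycle arcs, the idle initial marking, and "LDTACK- before DSr+" modelled as an extra arc LDTACK- -> DSr+. It lists the reachable codes where each function disagrees with the value of LDS implied by the state graph.

```python
from collections import deque

ORDER = ["DSr", "DTACK", "LDTACK", "LDS", "D"]  # bit order used on the slides
# Read-cycle STG plus the first timing assumption as an extra arc (assumed).
ARCS = [
    ("DSr+", "LDS+"), ("LDS+", "LDTACK+"), ("LDTACK+", "D+"),
    ("D+", "DTACK+"), ("DTACK+", "DSr-"), ("DSr-", "D-"),
    ("D-", "DTACK-"), ("D-", "LDS-"), ("DTACK-", "DSr+"),
    ("LDS-", "LDTACK-"), ("LDTACK-", "LDS+"), ("LDTACK-", "DSr+"),
]
INITIAL = frozenset({("DTACK-", "DSr+"), ("LDTACK-", "LDS+"),
                     ("LDTACK-", "DSr+")})  # assumed idle state
TRANSITIONS = sorted({t for arc in ARCS for t in arc})


def successors(marking, code):
    result = {}
    for t in TRANSITIONS:
        inputs = {arc for arc in ARCS if arc[1] == t}
        if inputs.issubset(marking):
            outputs = {arc for arc in ARCS if arc[0] == t}
            i = ORDER.index(t[:-1])
            bit = "1" if t[-1] == "+" else "0"
            result[t] = (frozenset((marking - inputs) | outputs),
                         code[:i] + bit + code[i + 1:])
    return result


graph, queue = {}, deque([(INITIAL, "0" * len(ORDER))])
while queue:
    state = queue.popleft()
    if state not in graph:
        graph[state] = successors(*state)
        queue.extend(graph[state].values())

wrong_or, wrong_simple = [], []
for (_, code), arcs in graph.items():
    v = {name: int(bit) for name, bit in zip(ORDER, code)}
    excited = any(t[:-1] == "LDS" for t in arcs)
    implied = 1 - v["LDS"] if excited else v["LDS"]
    if (v["DSr"] | v["D"]) != implied:
        wrong_or.append(code)      # LDS = DSr + D disagrees here
    if v["DSr"] != implied:
        wrong_simple.append(code)  # LDS = DSr disagrees here

print(wrong_or)      # []
print(wrong_simple)  # ['01111']
```

LDS = DSr differs only in state 01111, where D- is enabled and LDS- is its successor. Assuming D- fires before LDS- lets LDS- be enabled early in that state, which turns the entry into a don't-care and makes the simpler function acceptable.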
Before early enabling
[Figure: STG for the READ cycle and the netlist over DSr, LDTACK, LDS, D and DTACK before early enabling]
Netlist with two constraints
[Figure: STG for the READ cycle and the netlist over DSr, LDTACK, LDS, D and DTACK]
TIMING CONSTRAINTS: LDTACK- before DSr+ and D- before LDS-
Both timing assumptions are used for optimization and become constraints
Backannotation
Timed circuits require post-verification
Can synthesis tools help?
– Report the least stringent set of timing assumptions required for the correctness of the circuit
– Not all initial timing assumptions may be required
Petrify reports a set of firing order constraints that guarantee the circuit correctness
Experiments
Assumption: delays are controllable in physical design
2-3x improvement in area/delay with respect to SI (K. Stevens, S. Rotem et al., Intel Corporation)
– Rotating Asynchronous Pentium Processor(TM) Instruction Decoder (Async’99)
Summary
Synthesis of asynchronous circuits can be automated at gate level (logic synthesis)
Timing assumptions/constraints are essential to compete with synchronous circuits
Relative timing seems to be a promising approach for specification and synthesis
High-level and logic synthesis can be combined (e.g. CSP -> Petri net -> circuit)
Petrify
The synthesis methodology presented in this tutorial is handled by petrify, but also ...
– Concurrency reduction
– Automatic handshake expansion (2-4 phase)
– Noise isolation
– Synthesis with gC elements and gate libraries
– Synthesis of Petri nets (crucial for backannotation)
– ...
Petrify: implementation details
50,000 lines of code
+ SIS (data structures & logic synthesis)
+ BDD package (symbolic manipulation)
+ dot (graph visualization package from AT&T)
BDD-based implementation:
– Reachability analysis
– Manipulation of sets of states
– Boolean minimization
Petrify
http://www.lsi.upc.es/~jordic/petrify
• References
• Tutorial for the designer
• Binaries (for several platforms)