AD-A124 064  AN INVESTIGATION OF ORDERING, TEARING, AND LATENCY ALGORITHMS FOR THE TIME-... (U)  ILLINOIS UNIV AT URBANA, COORDINATED SCIENCE LAB  P. YANG  AUG 80  R-891
UNCLASSIFIED  N00014-79-C-0424  F/G 9/5
REPORT R-891    AUGUST, 1980    UILU-ENG 80-2223
COORDINATED SCIENCE LABORATORY
AN INVESTIGATION OF ORDERING, TEARING, AND LATENCY ALGORITHMS
FOR THE TIME-DOMAIN SIMULATION OF LARGE CIRCUITS
PING YANG
APPROVED FOR PUBLIC RELEASE. DISTRIBUTION UNLIMITED.
UNIVERSITY OF ILLINOIS - URBANA, ILLINOIS
UNCLASSIFIED
SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)
REPORT DOCUMENTATION PAGE (READ INSTRUCTIONS BEFORE COMPLETING FORM)
1. REPORT NUMBER
2. GOVT ACCESSION NO.
3. RECIPIENT'S CATALOG NUMBER
4. TITLE (and Subtitle): AN INVESTIGATION OF ORDERING, TEARING, AND LATENCY ALGORITHMS FOR THE TIME-DOMAIN SIMULATION OF LARGE CIRCUITS
5. TYPE OF REPORT & PERIOD COVERED: Technical Report
6. PERFORMING ORG. REPORT NUMBER: R-880; UILU-ENG 80-2212
7. AUTHOR(s): Ping Yang
8. CONTRACT OR GRANT NUMBER(s): N00014-79-C-0424
9. PERFORMING ORGANIZATION NAME AND ADDRESS: Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801
10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS
11. CONTROLLING OFFICE NAME AND ADDRESS: Joint Services Electronics Program
12. REPORT DATE: August 1980
13. NUMBER OF PAGES: 176
14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office)
15. SECURITY CLASS. (of this report): UNCLASSIFIED
15a. DECLASSIFICATION/DOWNGRADING SCHEDULE
16. DISTRIBUTION STATEMENT (of this Report): Approved for public release; distribution unlimited
17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)
18. SUPPLEMENTARY NOTES
19. KEY WORDS (Continue on reverse side if necessary and identify by block number): Integrated Circuits; Computer-Aided Analysis Program; DC and Transient Analysis Including Tearing and Latency
20. ABSTRACT (Continue on reverse side if necessary and identify by block number)
Many circuit simulation programs have been available for the design of integrated circuits. However, these conventional circuit simulation programs calculate all of the node voltages or branch voltages and currents at each iteration and each timepoint. Even with sparse matrix techniques the simulation of modern large-scale integrated (LSI) circuits is not possible in many situations due to the excessive computation time and high storage requirements.
The goal of this research was to investigate new approaches to the
simulation of integrated circuits which can alleviate the problems of excessive computation time and high storage requirements. A new ordering scheme for the modified nodal approach was developed, and some new algorithms for the dc and transient analysis of logic circuits were studied. Different tearing methods and sparsity considerations for the node tearing method were theoretically and experimentally studied. Latency at the subcircuit and the network levels was investigated. Different latency criteria were proposed and studied. The result of this research is a new general purpose circuit simulation program SLATE.
AN INVESTIGATION OF ORDERING, TEARING, AND LATENCY ALGORITHMS FOR
THE TIME-DOMAIN SIMULATION OF LARGE CIRCUITS
by
Ping Yang
This work was supported by the Joint Services Electronics
Program under Contract N00014-79-C-0424.
Reproduction in whole or in part is permitted for any purpose of
the United States Government.
Approved for public release. Distribution unlimited.
AN INVESTIGATION OF ORDERING, TEARING, AND LATENCY ALGORITHMS FOR
THE TIME-DOMAIN SIMULATION OF LARGE CIRCUITS
BY
PING YANG
B.S., National Taiwan University, 1974
M.S., University of Illinois, 1978
THESIS
Submitted in partial fulfillment of the requirements
for the degree of Doctor of Philosophy in Electrical Engineering
in the Graduate College of the
University of Illinois at Urbana-Champaign, 1980
Thesis Adviser: Professor Timothy N. Trick
and
Professor Ibrahim N. Hajj
Urbana, Illinois
AN INVESTIGATION OF ORDERING, TEARING, AND LATENCY ALGORITHMS FOR
THE TIME-DOMAIN SIMULATION OF LARGE CIRCUITS
Ping Yang, Ph.D.
Coordinated Science Laboratory and
Department of Electrical Engineering
University of Illinois at Urbana-Champaign, 1979
Many circuit simulation programs have been available for the design of
integrated circuits. However, these conventional circuit simulation
programs calculate all of the node voltages or branch voltages and currents
at each iteration and each timepoint. Even with sparse matrix techniques
the simulation of modern large-scale integrated (LSI) circuits is not
possible in many situations due to the excessive computation time and high
storage requirements.
The goal of this research was to investigate new approaches to the
simulation of integrated circuits which can alleviate the problems of
excessive computation time and high storage requirements. A new ordering
scheme for the modified nodal approach was developed, and some new
algorithms for the dc and transient analysis of logic circuits were
studied. Different tearing methods and sparsity considerations for the
node tearing method were theoretically and experimentally studied. Latency
at the subcircuit and the network levels was investigated. Different
latency criteria were proposed and studied. The result of this research is
a new general purpose circuit simulation program SLATE.
ACKNOWLEDGEMENTS
First, I would like to thank Professor T. N. Trick, my dissertation
advisor. With his thorough knowledge of the field, he provided valuable
guidance and help throughout the course of this research. Also, with his
great personality, he provided support and encouragement in all aspects of
my life throughout my graduate study. I would also like to thank Professor
I. N. Hajj, my dissertation co-advisor, for the many helpful discussions
and suggestions. I am also grateful to Professor M. R. Lightner,
who invented the name 'SLATE' for my program and offered a lot of advice
and encouragement. I also wish to thank Professor W. R. Perkins for being
a member of my dissertation committee and for his support.
I am also indebted to my wife, Jau-Yuann, for her help in preparing
this dissertation and her understanding throughout my graduate study.
Finally, I would like to thank my parents, I-Lueh and Queh-Chu Yang,
for their love and support throughout the past years.
TABLE OF CONTENTS
Chapter                                                              Page
I.   INTRODUCTION .................................................... 1
II.  NEW REORDERING STRATEGY FOR THE MODIFIED NODAL APPROACH ......... 6
     2.1 Problems with Previous Methods .............................. 7
     2.2 New Partitioning and Ordering Strategy ...................... 16
     2.3 Theorems and Examples ....................................... 21
     2.4 Results ..................................................... 27
     2.5 Discussion .................................................. 33
III. MODIFIED NEWTON METHOD AND PIECEWISE-NONLINEAR APPROACH ......... 34
     3.1 The Newton-Raphson Algorithm ................................ 35
     3.2 Problems with the Newton-Raphson Algorithm .................. 35
     3.3 Piecewise Nonlinear Approach ................................ 37
     3.4 A New Modified Newton-Raphson Method for Bipolar Devices .... 42
     3.5 Piecewise Nonlinear Approach for Bipolar Devices
         Including Avalanche Effects ................................. 67
     3.6 Piecewise Nonlinear Approach for MOSFET ..................... 68
     3.7 Discussion .................................................. 70
IV.  NUMERICAL INTEGRATION ........................................... 77
     4.1 Problems with Previous Work ................................. 78
     4.2 Algorithm ................................................... 91
V.   TEARING METHODS AND SPARSITY CONSIDERATIONS FOR NODE TEARING
     METHOD .......................................................... 94
     5.1 Derivation of the Branch Tearing Method ..................... 100
     5.2 Derivation of the Node Tearing Method ....................... 105
     5.3 Comparison of the Branch Tearing Method with the Node
         Tearing Method .............................................. 109
     5.4 Constructing the Node Tearing Matrix from Subnetworks ....... 111
     5.5 Sparsity Considerations for the Node Tearing Method ......... 116
     5.6 Implementation of the Node Tearing Method ................... 129
     5.7 Circuit Interpretation of the Tearing Methods ............... 133
     5.8 Discussion and Conclusion ................................... 136
VI.  LATENCY EXPLOITATION ............................................ 137
     6.1 Latency Exploitation at the Subnetwork Level ................ 138
     6.2 Latency Exploitation at the Network Level ................... 160
     6.3 Discussion .................................................. 164
VII. CONCLUSIONS ..................................................... 165
REFERENCES ........................................................... 171
VITA ................................................................. 176
I. INTRODUCTION
The design of integrated circuits requires an accurate method of
predicting circuit performance. The traditional breadboard method
cannot satisfy this requirement because the parasitic components
present in a breadboard are entirely different from those present in
integrated circuits, so a circuit simulation program is a must.
Conventional circuit simulation programs [1-11] have two serious
limitations, a computer storage requirement and a computing time
requirement, which limit the size of the circuit that can be
simulated. With the advances in circuit simulation techniques, the size
of the circuit that can be simulated has increased, but the simulation
of large-scale integrated (LSI) circuits is still beyond the
capabilities of present circuit simulation programs.
The goal of this research was to study new approaches to the
simulation of integrated circuits which can alleviate the above
two limitations, namely approaches that exploit the repetitiveness and
latency properties of digital integrated circuits. Since a DEC-10
version of SPICE2 was available to us, it was decided that this program
would serve as a vehicle for testing our algorithms. However, in the
initial phases of our research, it was found that our version of SPICE2
had several deficiencies in the implementation of some of its algorithms
which occasionally caused numerical difficulties. In order to resolve
these difficulties, a new reordering scheme for the modified nodal
approach was developed, a new concept, a piecewise nonlinear approach
for the Newton-Raphson iteration, was proposed, and two problems with
the numerical integration algorithm were resolved. The new reordering
scheme for the modified nodal approach not only avoids zero diagonal
pivot elements, which increases numerical accuracy, but also
significantly reduces the number of fills in the matrix, which reduces
the computational cost. The piecewise nonlinear approach reduces the
number of iterations needed to find the solution of a nonlinear circuit
and improves the global convergence property of the Newton-Raphson
method. The resolution of the two problems with the numerical
integration algorithm provides more efficiency and accuracy. All of
these new developments resulted in a modified version of SPICE2
(YSPICE), which is 2 to 5 times faster than SPICE2.
Although YSPICE is more efficient and more accurate than SPICE2,
it is still not powerful enough to handle LSI circuit simulation
problems. Experience has shown that LSI circuits possess properties
which can be exploited to reduce the storage and computing time
requirements. The two properties are the repetitiveness of a limited
number of subcircuits and the latency that may exist within parts of
the circuits during an analysis. Conventional circuit simulation
programs do not exploit these two properties, so all of the node
voltages or branch voltages and currents are calculated at each
iteration and each timepoint. In order to increase the capabilities of
circuit simulation programs substantially, these two properties must
be fully exploited. When the first property is exploited, both computer
storage requirements and computing time can be reduced in several ways.
First, only one subcircuit description for each type of repetitive
subcircuit need be stored; secondly, only one set of sparse matrix
pointers for the small submatrix of each type of repetitive subcircuit
is needed, so that both storage and preprocessing time can be saved;
thirdly, if one type of subcircuit is linear, then the LU factorization
of that type of subcircuit need be found only once. When the second
property is exploited, we only need to solve for the active parts of
the circuit, and this reduces the computational effort considerably.
Tearing methods, first introduced by Kron [12], are well suited for the
exploitation of these two properties as well as the sparsity of the
network. Recently, the use of tearing methods and latency [13-24] has
been studied to exploit these two properties, but in order to fully
exploit them more research effort is needed.
In the second stage of our research, these two problems were
studied extensively and the result of our investigations is a new
general purpose circuit simulation program SLATE (a Simulator with
Latency and Tearing). SLATE evolved from YSPICE, so it has all the
good features of YSPICE; in addition, several new approaches are used.
First, the new reordering strategy for the modified nodal approach is
used at both the subcircuit and interconnection levels; secondly, ways
of exploiting the sparsity that exists at the subcircuit and
interconnection levels were theoretically and experimentally studied
and the most efficient way is used; thirdly, node tearing is used such
that the program is more efficient and the final equation formulation
is suitable for latency exploitation and parallel processing; fourthly,
latency in the Newton-Raphson iterations is exploited not only at the
device and subcircuit levels, but also at the interconnection level;
fifthly, latency in the time domain is exploited not only at the device
and subcircuit levels, but also at the interconnection level; sixthly,
three schemes for latency-in-time criteria were studied thoroughly in
relation to the spread of the time constants in the subcircuits and the
best scheme was determined; and lastly, the interconnection matrix
formulation method is general enough to accommodate the situation when
there are no subcircuits specified in the network or when the
interconnection circuits consist of more than tearing nodes.
Both YSPICE and SLATE are written in FORTRAN and have a SPICE-like
input language for user convenience. If no subcircuits are used, then
the method of analysis of SLATE is equivalent to that of YSPICE, that
is, YSPICE is a subset of the new program SLATE. Simulation results
indicate that SLATE is about an order of magnitude faster than SPICE2,
and the output results are either the same as or more accurate than
those of SPICE2.
The new reordering scheme for the modified nodal approach is
described in Chapter 2, and the comparison between this new scheme and
that used in SPICE2 is given. The piecewise nonlinear approach is
explained in Chapter 3 and simulation results are given. The two
problems with numerical integration are detailed in Chapter 4, and the
solution is given. Chapter 5 introduces the concept of tearing methods
and gives the sparsity considerations for the node tearing method.
Chapter 6 describes three latency criteria and gives the simulation
results of these three schemes. Finally, in Chapter 7 a summary of
SLATE performance is given, the conclusions are presented, and areas
for future work are described.
II. NEW REORDERING STRATEGY FOR THE MODIFIED NODAL APPROACH
The modified nodal approach (MNA) [25] has been widely used in many
computer-aided circuit analysis programs [1,11,26,27] for formulating
circuit equations. It is well known, however, that while the more
restrictive nodal approach in general produces nonzero diagonal elements
for pivoting, the modified nodal approach, although more general, may
produce zero diagonal entries in the network matrix. This occurs, for
example, when the circuit contains voltage sources, short-circuits,
inductors at zero frequency (dc solution) and some types of controlled
sources. When sparse matrix techniques with diagonal pivoting are used for
solving these types of circuit equations, extreme care should be taken so
as not to choose a zero-valued pivot. Two methods have been proposed for
avoiding pivoting on these zero diagonal entries. One method (method 1)
involves ordering the rows and columns with zero diagonal entries last, in
the hope that they will be filled before becoming candidates for pivoting
[1,11]. Another method (method 2) involves rearranging and/or combining
rows and columns in order to obtain nonzero diagonal elements [25].
However, as we show below, there are two problems with these methods.
First, even if all the zero diagonal elements which exist in the network
matrix at the formulation stage are avoided or filled during the
elimination stage, it is possible to generate zero diagonal elements during
the Gaussian elimination process regardless of the values of the circuit
elements. Second, these methods usually are not efficient. For example,
forcing the zero-diagonal entries to be last usually increases the number
of fills considerably.
In this chapter a new reordering scheme for the modified nodal
approach is described which avoids zero diagonal pivots in essentially all
practical cases and is very efficient. In Section 2.1, the problems with
previous methods are illustrated and explained. In Section 2.2, the
partitioning of the circuit variables is detailed and the ordering strategy
is introduced. In Section 2.3, theorems and examples are given. The
implementation of this new scheme resulted in YSPICE. The simulation
results from YSPICE are given in Section 2.4. In this Section examples are
given which caused computational problems in our DEC-10 version of SPICE2
due to pivoting on zero diagonal elements, but which were successfully
analyzed by YSPICE. Also the number of fills produced by YSPICE is much
less than that produced by SPICE2. In Section 2.5, a discussion of this
new ordering strategy is given.
2.1 Problems with Previous Methods
The MNA matrix equation can in general be expressed in the form [25]

\[
\begin{bmatrix} Y_R & B \\ C & D \end{bmatrix}
\begin{bmatrix} V \\ I \end{bmatrix}
=
\begin{bmatrix} J \\ E \end{bmatrix}
\tag{2.1}
\]

where $V$ is the set of node-to-datum voltages and $I$ is the set of branch
currents which are chosen as additional circuit variables. $Y_R$ is a reduced
form of the nodal matrix excluding the contributions due to voltage
sources, current controlling elements, etc. $B$ contains partial derivatives
of the Kirchhoff current equations with respect to the additional current
variables and thus contains ±1's for the elements whose branch relations
are introduced. The branch constitutive relations, differentiated with
respect to the unknown vector, are represented by the matrices $C$ and $D$.
$J$ and $E$ are the excitations.
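To make the block structure of Eq. (2.1) concrete, the following is a minimal Python sketch that assembles an MNA system for a circuit containing only resistors and independent voltage sources. The function name `build_mna`, the tuple layouts, and the sign convention for the source currents are assumptions made for this illustration and are not taken from SPICE2 or YSPICE. The point to observe is that every voltage-source row has a zero on the diagonal, because the corresponding D block is zero.

```python
def build_mna(num_nodes, resistors, voltage_sources):
    """Assemble [[Y_R, B], [C, D]] and [J; E] (hypothetical helper).

    Nodes are numbered 1..num_nodes and node 0 is the datum.
    resistors:        list of (node_a, node_b, conductance)
    voltage_sources:  list of (node_plus, node_minus, volts)
    """
    size = num_nodes + len(voltage_sources)
    matrix = [[0.0] * size for _ in range(size)]
    rhs = [0.0] * size

    # Y_R block: conductance stamps (the datum node has no row or column).
    for node_a, node_b, g in resistors:
        if node_a:
            matrix[node_a - 1][node_a - 1] += g
        if node_b:
            matrix[node_b - 1][node_b - 1] += g
        if node_a and node_b:
            matrix[node_a - 1][node_b - 1] -= g
            matrix[node_b - 1][node_a - 1] -= g

    # B and C blocks: +1/-1 incidence of each source current; D stays zero.
    for k, (plus, minus, volts) in enumerate(voltage_sources):
        row = num_nodes + k
        if plus:
            matrix[plus - 1][row] += 1.0   # B entry (KCL at the + node)
            matrix[row][plus - 1] += 1.0   # C entry (branch relation)
        if minus:
            matrix[minus - 1][row] -= 1.0
            matrix[row][minus - 1] -= 1.0
        rhs[row] = volts                   # E entry
    return matrix, rhs


# A 5 V source at node 1, 1 S between nodes 1 and 2, 2 S from node 2 to datum.
matrix, rhs = build_mna(2, [(1, 2, 1.0), (2, 0, 2.0)], [(1, 0, 5.0)])
```

In this small case the last diagonal entry `matrix[2][2]` is zero, which is exactly the kind of entry a diagonal-pivoting sparse solver must avoid.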
As mentioned above, when sparse matrix techniques with diagonal
pivoting are used for solving Eq. (2.1), zero diagonal elements may be
encountered. Previously, two methods have been proposed for avoiding
pivoting on these zero diagonal elements. However, there are still two
problems with these previous methods: (1) zero diagonal elements may be
generated during the Gaussian elimination process, and (2) the methods may
not be the most efficient. In this section we consider the zero diagonal
problem, and in Section 2.4 we discuss the efficiency problem.
Method 1 orders the rows and columns with zero diagonal entries last,
in the hope that they will be filled before becoming candidates for
pivoting. Even if all the zero diagonal elements which exist in the
network matrix at the formulation stage are filled during the elimination
stage, cutsets of branches whose currents are declared as network variables
in a modified nodal formulation will generate zero diagonal elements during
the Gaussian elimination process regardless of the values of the circuit
elements. This problem is proved and illustrated by Theorem 2.1, Example
2.1, and Example 2.2.
Theorem 2.1. For any network which has cutsets of branches whose currents
are declared as circuit variables in a modified nodal formulation, if these
current variables are ordered last, then zero diagonal elements will be
generated during the Gaussian elimination process, regardless of circuit
element values.
Proof: Since all the current variables are ordered last and they form
cutsets, floating subnetworks are created. The admittance matrices of
these floating subnetworks are singular, so $Y_R$ in Eq. (2.1) is
singular and zero diagonal elements will be generated during the
Gaussian elimination of Eq. (2.1).
The following Example 2.1 illustrates Theorem 2.1.
Example 2.1: A cutset of current variables (Fig. 2.1)
If the method which orders all the current variables last is used to
formulate the modified nodal equations of the circuit shown in Fig. 2.1,
the resulting equations will be as follows:
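\[
\begin{bmatrix}
G_1 & -G_1 & 0 & 1 & 0 \\
-G_1 & G_1 & 0 & 0 & -1 \\
0 & 0 & G_2 & 0 & 1 \\
1 & 0 & 0 & 0 & 0 \\
0 & -1 & 1 & 0 & 0
\end{bmatrix}
\begin{bmatrix} V_1 \\ V_2 \\ V_3 \\ I_E \\ I_L \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ 0 \\ E \\ 0 \end{bmatrix}
\]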
During the course of Gaussian elimination due to the resulting
floating subnetwork, a zero diagonal element will be produced at location
(2,2).
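The claim of Example 2.1 can be checked numerically. The sketch below assumes that the 5 by 5 matrix of the example has rows (G1, -G1, 0, 1, 0), (-G1, G1, 0, 0, -1), (0, 0, G2, 0, 1), (1, 0, 0, 0, 0) and (0, -1, 1, 0, 0), which is a reading of the scanned page rather than a verified transcription. It runs Gaussian elimination with diagonal pivots taken in natural order and reports the first position at which the pivot is zero. Exact rational arithmetic is used so that the result does not depend on rounding; the function names are hypothetical.

```python
from fractions import Fraction


def mna_example_2_1(g1, g2):
    """Matrix of Example 2.1 as read from the scan (assumed entries)."""
    g1, g2 = Fraction(g1), Fraction(g2)
    return [
        [g1, -g1, 0, 1, 0],
        [-g1, g1, 0, 0, -1],
        [0, 0, g2, 0, 1],
        [1, 0, 0, 0, 0],
        [0, -1, 1, 0, 0],
    ]


def first_zero_diagonal_pivot(matrix):
    """Return the 1-based index of the first zero diagonal pivot, or None."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    for k in range(n):
        if a[k][k] == 0:
            return k + 1
        for r in range(k + 1, n):
            m = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= m * a[k][c]
    return None


# The zero appears at (2,2) whatever conductance values are used.
positions = [first_zero_diagonal_pivot(mna_example_2_1(g1, g2))
             for g1, g2 in [(1, 1), (3, 7), (Fraction(1, 3), 10)]]
```

Pivoting on the (1,1) entry adds row 1 to row 2, which cancels the G1 term on the diagonal of row 2 for any value of G1, in agreement with Theorem 2.1.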
Remark: For any subnetwork which has cutsets of branches whose currents
are declared as circuit variables in a modified nodal formulation, if the
rows corresponding to current variables which have zero diagonal elements
are ordered last until a diagonal entry is filled, before it is considered
as a pivot, then zero diagonal elements may be generated during the
Gaussian elimination process, regardless of circuit element values.
The proof of this remark is the same as that of Theorem 2.1. In the
following, Example 2.2 illustrates this remark.
Example 2.2: A cutset of current variables (Fig. 2.2)
If the reordering strategy mentioned in the previous remark is used to
formulate the equations of the circuit shown in Fig. 2.2, the matrix
formulated is:
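\[
\begin{bmatrix}
G_2 & 0 & 0 & 0 \\
0 & G_1 & -G_1 & 1 \\
0 & -G_1 & G_1 & 0 \\
0 & 1 & 0 & 0
\end{bmatrix}
\begin{bmatrix} V_3 \\ V_1 \\ V_2 \\ I_E \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ 0 \\ E \end{bmatrix}
\]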
During the course of Gaussian elimination due to the resulting
floating subnetwork, a zero diagonal element will be generated at location
(3,3).
Method 2 interchanges rows in order to obtain nonzero diagonal
elements. Even if all the zero diagonal elements which exist in the
network matrix at the formulation stage are avoided before the elimination,
if there are loops of branches whose currents are declared as network
variables in the modified nodal formulation, then zero diagonal elements
may be generated during the Gaussian elimination process regardless of the
values of the circuit elements. This problem is proved and illustrated by
Theorem 2.2 and Example 2.3.
Let us define a branch whose current is declared as a current
variable in the modified nodal formulation as a current branch. Let us
define the 'positive' node as follows: Assuming that the datum node can
not be chosen as 'positive' and that the datum node is not contained in any
loop formed by current branches, then we can always choose one of the two
nodes of a current branch as 'positive' for that current branch and there
is a one-to-one correspondence between these 'positive' nodes and the
current branches. An algorithm for choosing 'positive' nodes is given in
Section 2.2.
Theorem 2.2. For any network with a loop of branches whose currents are
declared as network variables in a modified nodal formulation, and the
reference node is not contained in the loop and there is no coupling among
the voltages of the branches in the loop, then if all the rows
corresponding to the current variables are interchanged with the
corresponding 'positive' node voltage rows, zero diagonal elements will be
generated during the Gaussian elimination process, regardless of circuit
element values.
Proof: Let us assume that after the rows corresponding to the current
variables are interchanged with the corresponding 'positive' node voltage
rows, the rows corresponding to the current variables are ordered first.
The MNA matrix equation (2.1) is then transformed into

\[
\begin{bmatrix}
B_1 & Y_{11} & Y_{12} \\
D & C_1 & C_2 \\
B_2 & Y_{21} & Y_{22}
\end{bmatrix}
\begin{bmatrix} I \\ V_1 \\ V_2 \end{bmatrix}
=
\begin{bmatrix} J_1 \\ E \\ J_2 \end{bmatrix}
\tag{2.2}
\]

where the subscript 1 refers to the 'positive' nodes and the subscript 2
to the remaining nodes. The submatrix eliminated first is the
node-to-branch incidence matrix for the 'positive' nodes and the current
variable branches [28], that is, $B_1$ in Eq. (2.2). Since we assume that
the reference node is not contained in the loop and there is no coupling
among the voltages of the branches of the loop, there is a one-to-one
correspondence between each branch of the loop and the corresponding
'positive' node, and each column of $B_1$ that belongs to the loop
contains exactly one +1 and one -1. Summing the rows of the loop nodes
gives zero in every loop column, so these columns are linearly dependent.
Therefore $B_1$ is singular and zero diagonal elements will be generated
during the Gaussian elimination.
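As a small worked illustration of this argument (an assumed three-branch loop, not a circuit from the thesis), let branch $k$ have 'positive' node $k$ and 'negative' node $k+1$, with branch 3 closing the loop back to node 1. The incidence matrix of the 'positive' nodes and the loop branches is then

\[
B_1 =
\begin{bmatrix}
1 & 0 & -1 \\
-1 & 1 & 0 \\
0 & -1 & 1
\end{bmatrix},
\qquad
\begin{bmatrix} 1 & 1 & 1 \end{bmatrix} B_1 = \begin{bmatrix} 0 & 0 & 0 \end{bmatrix}
\;\Rightarrow\; \det B_1 = 0 .
\]

Gaussian elimination with diagonal pivots shows where the zero appears: adding row 1 to row 2 gives $(0,\ 1,\ -1)$, and adding the new row 2 to row 3 gives $(0,\ 0,\ 0)$, so the pivot at location (3,3) is zero without any circuit element value having entered the calculation.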
The following Example 2.3 illustrates Theorem 2.2.
Example 2.3: A loop of current variables (Fig. 2.3)
The circuit equations formulated by method 2 for the circuit shown in
Fig. 2.3 in a transient analysis using a backward Euler formula with
timestep h have the following form:
[Figure: circuit diagram, transient analysis]
Fig. 2.3 Circuit used in Example 2.3.
[Matrix equation for the circuit of Fig. 2.3: entries not legible in the scanned source]
The submatrix B is singular, therefore during the Gaussian elimination
a zero diagonal element will be generated at location (4,4).
2.2. New Partitioning and Ordering Strategy
From the previous section, we conclude that the topological reasons
for zero diagonal elements being generated in the modified nodal approach
are: (1) cutsets of current variables and (2) loops of current variables.
Here we present a new partitioning and ordering strategy which has the
following good features:
(1) zero diagonal elements are avoided before the Gaussian elimination and
during the Gaussian elimination in essentially all practical circuits;
(2) it is efficient and the number of fills is less than that of previous
methods;
(3) it is easy to implement and the partitioning and ordering are done in
the preprocessing phase, so it is well suited for the use of sparse matrix
techniques.
Consider a linear (or linearized) circuit which contains independent
current and voltage sources, two terminal resistors, capacitors, inductors
and all types of controlled sources. We assume that the circuit contains
neither loops of only (independent and dependent) voltage sources and
inductors nor cutsets of only (independent and dependent) current sources
and capacitors.
In the modified nodal approach, the circuit variables consist of the
node-to-datum voltages $V_n$ together with a subset of branch currents $I_b$.
(Henceforth those branches are referred to as current branches.) In the
proposed ordering strategy, the node voltages $V_n$ are partitioned into two
subsets, $V_1$ and $V_2$, and $I_b$ is partitioned into three subsets, $I_1$,
$I_2$ and $I_3$. The components of $I_1$ consist of the currents in the
(dependent and independent) voltage sources, and are in turn partitioned as
follows:
$I_V$ ≡ branch currents of the independent voltage sources.
$I_{VCV}$ ≡ branch currents of the voltage-controlled voltage sources.
$I_{CCV}$ ≡ branch currents of the current-controlled voltage sources.
The components of $I_2$ and $I_3$ consist of the remaining currents which are
circuit variables.
Let a graph $G_I$ (possibly disconnected) be first constructed to include
all the current branches, with all the other branches removed. If $G_I$
contains loops, then a tree (or forest) is chosen, with only finite-valued
resistors as links. This is always possible since by assumption no loops
of only voltage sources, inductors and zero-valued resistors exist in the
circuit. Let $I_3$ be the set of currents in the links of $G_I$; these
links cannot form cutsets [28]. The components of $I_2$ consist of the
currents in the inductors and the remaining currents of the current-branch
resistors.
The components of $V_1$ consist of the following:
$V_V$ ≡ set of 'positive' node voltages of the independent voltage
sources.
$V_{VCV}$ ≡ set of 'positive' node voltages of the voltage-controlled
voltage sources.
$V_{CCV}$ ≡ set of 'positive' node voltages of the current-controlled
voltage sources.
$V_{bC}$ ≡ set of 'positive' node voltages of the $I_2$ branches.
The components of $V_2$ consist of the remaining node voltages.
The following algorithm is followed in selecting the 'positive' node
voltages defined above:
Algorithm
(1) The ungrounded nodes of all grounded current branches belonging to
$I_1$ or $I_2$ are chosen first as 'positive'.
(2) Let $b_j$ be the number of branches whose currents belong to $I_1$ or
$I_2$ and which are incident at node $j$. Whenever a node of a current
branch is chosen as 'positive', the number $b_k$ at its 'negative' node $k$
is reduced by one.
(3) If the $b_k$ value of node $k$ of a current branch is one and that node
has not been previously selected as 'positive', then node $k$ is selected
'positive' for that particular current branch. If more than one node has
a $b_k$ value equal to one and if some of these nodes do not have a
conductance (i.e., a resistance whose current is not a circuit variable)
connected to them, then one of these nodes is chosen 'positive' first.
Otherwise, any one of the nodes that has its $b_k$ value equal to one is
chosen 'positive'.
Steps (2) and (3) are repeated until all the branches corresponding to
$I_1$ and $I_2$ have been processed. Note that up to this point there is
always at least one node whose $b_k$ value is one. This is because $I_1$
and $I_2$ do not form loops. Note also that the number of positive nodes is
equal to the number of elements in $I_1$ and $I_2$. The polarities of the
currents in the current branches are associated with the positive node
assignments.
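The three steps above amount to repeatedly peeling leaf nodes off the forest formed by the $I_1$ and $I_2$ branches. The following Python sketch shows one way this could be coded; the function name `choose_positive_nodes`, the dictionary layout, and the `has_conductance` argument are assumptions made for the illustration, not the data structures used in YSPICE.

```python
def choose_positive_nodes(branches, has_conductance=frozenset(), datum=0):
    """Pick one 'positive' node per current branch (illustrative sketch).

    branches: dict name -> (node_a, node_b) for the I_1 and I_2 branches,
              assumed to form a forest (no loops).
    has_conductance: nodes with an ordinary conductance attached; nodes
              outside this set are preferred when several qualify.
    """
    # b[j]: number of I_1/I_2 branches incident at ungrounded node j.
    b = {}
    for node_a, node_b in branches.values():
        for node in (node_a, node_b):
            if node != datum:
                b[node] = b.get(node, 0) + 1

    positive, taken, pending = {}, set(), dict(branches)

    def assign(name, node):
        if node in taken:
            raise ValueError("current branches form a loop")
        node_a, node_b = pending.pop(name)
        negative = node_b if node == node_a else node_a
        positive[name] = node
        taken.add(node)
        if negative != datum:
            b[negative] -= 1          # step (2): update the 'negative' node

    # Step (1): ungrounded node of every grounded current branch.
    for name, (node_a, node_b) in list(pending.items()):
        if datum in (node_a, node_b):
            assign(name, node_b if node_a == datum else node_a)

    # Steps (2) and (3): take a node with b == 1 that is not yet 'positive'.
    while pending:
        candidates = [(name, node)
                      for name, ends in pending.items() for node in ends
                      if node not in taken and b[node] == 1]
        if not candidates:
            raise ValueError("current branches form a loop")
        candidates.sort(key=lambda item: item[1] in has_conductance)
        assign(*candidates[0])
    return positive


chain = {"E": (1, 0), "L1": (1, 2), "L2": (2, 3)}
chain_result = choose_positive_nodes(chain)
```

For the chain of branches E (node 1 to datum), L1 (nodes 1 and 2) and L2 (nodes 2 and 3), node 1 is taken in step (1), then node 3 becomes 'positive' for L2, and finally node 2 for L1, so each branch receives a distinct node.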
Partitioning and ordering the circuit variables in the order of $I_1$,
$I_2$, $V_1$, $V_2$, and $I_3$, and writing the modified nodal equations in
the usual way [25], we get the following equation structure:
[Eq. (2.3): block-partitioned modified nodal matrix equation in the variables $I_1$, $I_2$, $V_1$, $V_2$, $I_3$; entries not legible in the scanned source]
where the $A_i$'s contain the partial derivatives of the Kirchhoff
current equations with respect to the circuit current variables $I_1$,
$I_2$, $I_3$, and thus contain 0, +1, -1 only.
By interchanging the rows corresponding to $V_1$ with the rows
corresponding to $I_1$ and $I_2$, Eq. (2.3) can be written in the following
form. (This interchange is equivalent to off-diagonal pivoting and is done
in practice by a simple change in the pointer system rather than a physical
interchange of data in the rows.)
[Eq. (2.4): Eq. (2.3) after the row interchange; entries not legible in the scanned source]
The circuit variables are partitioned into three subgroups: (1) $I_1$
and $I_2$, (2) $V_1$, and (3) the remaining variables. The Markowitz scheme
[29] is used to minimize the number of operations within each subgroup.
After reordering, Eq. (2.4) can now be solved by Gaussian elimination or LU
factorization.
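A minimal sketch of Markowitz ordering restricted to diagonal pivots and applied subgroup by subgroup is given below. It works only on the nonzero pattern and assumes a structurally nonzero diagonal; the function name `markowitz_order` and its arguments are hypothetical, and the tie-breaking rule (lowest index first) is an arbitrary choice for the illustration.

```python
def markowitz_order(pattern, groups):
    """Order diagonal pivots group by group with the Markowitz criterion.

    pattern: set of (row, col) positions of the structural nonzeros of a
             square matrix whose diagonal is assumed structurally nonzero.
    groups:  list of index lists; each group is fully eliminated before
             the next one is started.
    Returns (pivot_order, number_of_fills).
    """
    nonzero = set(pattern)
    active = {i for group in groups for i in group}
    order, fills = [], 0

    def cost(k):
        # (r - 1)(c - 1): off-diagonal counts in row k and column k.
        in_row = sum(1 for c in active if (k, c) in nonzero) - 1
        in_col = sum(1 for r in active if (r, k) in nonzero) - 1
        return in_row * in_col

    for group in groups:
        todo = set(group)
        while todo:
            k = min(sorted(todo), key=cost)
            rows = [r for r in active if r != k and (r, k) in nonzero]
            cols = [c for c in active if c != k and (k, c) in nonzero]
            for r in rows:                 # eliminating k fills (r, c)
                for c in cols:
                    if (r, c) not in nonzero:
                        nonzero.add((r, c))
                        fills += 1
            todo.remove(k)
            active.remove(k)
            order.append(k)
    return order, fills


# Arrow pattern: full first row and column plus the diagonal.
arrow = {(i, i) for i in range(4)} | {(0, i) for i in range(4)} \
        | {(i, 0) for i in range(4)}
free_order, free_fills = markowitz_order(arrow, [[0, 1, 2, 3]])
forced_order, forced_fills = markowitz_order(arrow, [[0], [1, 2, 3]])
```

On the arrow pattern, the unconstrained ordering defers the dense variable and creates no fills, while forcing variable 0 into a group of its own that is eliminated first creates six. This is the same kind of cost the text attributes to orderings that constrain where particular rows may be placed.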
2.3 Theorems and Examples
If there are no current-controlled current sources or if the
current-controlled current sources are not incident at the 'positive'
nodes, then within the first two subgroups all the diagonal elements remain
1's and all the nonzero off-diagonal elements are -1's during the Gaussian
elimination process, so the leading part of the elimination can be done
simply by addition. The proof is given below in Theorem 2.3. Let us
consider the first subgroup. The associated submatrix is the node-to-branch
incidence matrix $A_a$ for the 'positive' nodes and the currents belonging
to $I_1$ and $I_2$. Let us denote the directed graph of those nodes and
currents by $G_I$. Due to our partitioning and ordering strategy, there are
no loops in $G_I$, so $A_a$ has 1's on the diagonal and 0's or -1's off the
diagonal, and $A_a$ is square and nonsingular.
Theorem 2.3. For any diagonal pivoting the LU factors of $A_a$ have the
following special properties: all the diagonal elements remain 1's and
all the nonzero off-diagonal elements are -1's.
Proof: Let $A_a$ be formulated with current $I_k$ chosen as the first
pivot, where $I_k$ flows in branch $b_k$, which is connected between node
$i$ and node $j$. After row and column interchange the first row and column
of $A_a$ will have the following form:

$$
\begin{array}{c|cccc}
       & 1  & 2      & \cdots & n \\ \hline
1      & 1  & \cdots & \cdots & \cdots \\
\vdots & 0  &        &        & \\
j      & -1 &        &        & \\
\vdots & 0  &        &        & \\
n      & 0  &        &        &
\end{array}
$$

where node $j$ is assumed to be in $G_1$; otherwise column one would be all
zeros below the diagonal. Note that the entry $a_{1j} = 0$ because $G_1$
does not have any loops, and all $a_{i1}$, $i = 2, 3, \ldots, n$, are either
zero or -1. Pivoting on $a_{11}$ amounts simply to adding row 1 to row $j$.
Since adding any two rows in the incidence matrix of a directed graph
produces a row with 0, -1 or 1 entries, row $j$ will then contain 0's and
-1's with +1 on the diagonal because $a_{1j} = 0$.
Let the submatrix generated by pivoting on $a_{11}$ be denoted by
$\hat{A}_a$. $\hat{A}_a$ can be considered as the incidence matrix of a
directed graph $\hat{G}_1$, where $\hat{G}_1$ is derived from $G_1$ by
removing branch $b_k$ and merging node 1 with node $j$. Thus $\hat{A}_a$ has
the same properties as $A_a$, and pivoting on its first diagonal entry will
produce a submatrix with ones on the diagonal and 0's and -1's elsewhere.
This proves the theorem.
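The property stated in the theorem can be checked numerically on a small
loop-free graph. The sketch below is only an illustration under stated
assumptions: the helper names `incidence_submatrix` and
`eliminate_by_addition` are hypothetical, the graph is a random tree rooted
at ground, and each non-ground node is taken as the 'positive' node of the
branch joining it to its parent. It is not the YSPICE implementation.

```python
import random


def incidence_submatrix(parent):
    """Build the incidence submatrix for a tree rooted at ground (node 0).

    parent[v] is the node that branch v joins node v to (v = 1..n); node v
    is assumed to be the 'positive' node of branch v. Row v-1 is the KCL
    row of node v and column v-1 is the current of branch v: +1 where the
    current leaves its positive node, -1 where it enters another positive
    node. Ground has no row.
    """
    n = len(parent) - 1
    a = [[0] * n for _ in range(n)]
    for v in range(1, n + 1):
        a[v - 1][v - 1] = 1
        if parent[v] != 0:
            a[parent[v] - 1][v - 1] = -1
    return a


def eliminate_by_addition(a, order):
    """Gaussian elimination with diagonal pivots taken in 'order'.

    Returns True if every pivot equals 1, every eliminated entry equals -1
    (so each row operation is a plain row addition), and every reduced
    submatrix keeps 1's on the diagonal and 0 or -1 elsewhere.
    """
    a = [row[:] for row in a]
    n = len(a)
    remaining = list(range(n))
    for p in order:
        remaining.remove(p)
        if a[p][p] != 1:
            return False
        for r in remaining:
            if a[r][p] == 0:
                continue
            if a[r][p] != -1:
                return False
            for c in range(n):          # multiplier is 1: add the pivot row
                a[r][c] += a[p][c]
        for r in remaining:
            for c in remaining:
                ok = a[r][c] == 1 if r == c else a[r][c] in (0, -1)
                if not ok:
                    return False
    return True


random.seed(0)
n = 8
parent = [None] + [random.randrange(0, v) for v in range(1, n + 1)]
order = random.sample(range(n), n)      # an arbitrary diagonal pivot order
print(eliminate_by_addition(incidence_submatrix(parent), order))
```

Because the multipliers are all 1, no multiplications or divisions are
needed for this leading part of the elimination, which is the saving the
text refers to.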
The reasoning for the second subgroup is similar to Theorem 2.3.
Now we would like to present the main result.
Main Result. For any network which has a unique solution, if the
partitioning and ordering strategy proposed here is used to solve the
modified nodal equations, then no zero diagonal elements will be
encountered during the Gaussian elimination process, except for the case
when controlled sources or negative-valued elements with some specific set
of circuit element values result in perfect cancellation.
Proof: There are two kinds of zero diagonal elements which may be
encountered. One type is due to the formulation method [25] and occurs in
the network matrix before the elimination process starts. These zero
diagonal elements are avoided by interchanging the rows corresponding to
the 'positive' node voltages with the rows corresponding to $I_1$ and $I_2$.
During the elimination, topologically, the zero diagonal elements are
caused either by ordering loops of current variables first or by a floating
subnetwork which results from ordering a cutset of current variables last.
Both of these situations are prevented by partitioning $I_3$ away from
$I_1$ and $I_2$ and ordering $I_1$ and $I_2$ first, so that no loops can be
formed by $I_1$ and $I_2$. Since $I_3$ consists of currents in the links,
$I_3$ will not form cutsets, and thus no floating subnetworks will result.
Alternatively, this theorem can be proved as follows: Since the currents
in $I_1$ and $I_2$ are ordered first and eliminated first, from Theorem 2.3
we know that the elimination of these currents will not generate zero
diagonal elements. After all these currents are eliminated, if $I_3$ is
empty, we are left with nodal matrix equations, and no zero diagonal
elements will be generated; if $I_3$ is not empty, then since $I_3$ cannot
form loops or cutsets, no zero diagonal elements will be generated.
A more rigorous and general proof can be found in [30].
In the following we would like to use the new ordering scheme to solve
those examples used in Section 2.1.
Example 2.1:
If our approach is used, initially, the matrix formulated is:
[matrix not legible in the source]
After interchanging rows, the resulting matrix is:
[matrix not legible in the source]
No zero diagonal elements will be encountered during the course of
Gaussian elimination.
Example 2.2:
If our approach is used, initially, the matrix formulated is:
[matrix not legible in the source]
After interchanging rows, the resulting matrix is:
[matrix not legible in the source]
No zero diagonal elements will be encountered during the course of
Gaussian elimination.
Example 2.3:
If our approach is used, initially, the matrix formulated is:
[matrix not legible in the source]
After interchanging rows, the resulting matrix is:
[matrix not legible in the source]
No zero diagonal elements will be encountered during the course of
Gaussian elimination.
These examples show that our approach indeed can avoid zero diagonal
elements before the elimination and during the elimination. However, as
mentioned before, if there are controlled sources or negative-valued
elements with a specific set of element values, zero diagonal elements may
be produced due to perfect cancellation.
2.4. Results
The implementation of this new algorithm into the DEC-10 version of
SPICE2 has resulted in YSPICE. In YSPICE, the 'positive' nodes are first
determined by the algorithm presented in Section 2.2. The network matrix
is constructed using the element stamps as in [31]. The sparse matrix
reordering is carried out using the Markowitz criterion [29,32] with
diagonal pivoting. The row interchange is done by one extra set of
pointers.
Examples which caused computational problems in the original version
of SPICE2 due to pivoting on zero diagonal elements were successfully
analyzed using YSPICE. Furthermore, the results we obtained show that in
many cases the number of fills produced by our ordering strategy is far
lower than that produced by previous methods, resulting in less
computational cost, and at the same time, more accurate solutions.
Here a small selection of the examples analyzed by YSPICE is presented
and the results are compared with those obtained by SPICE2.
Example 2.4: The two circuits shown in Figs. 2.4(a) and (b) were analyzed
using SPICE2 and YSPICE. The CPU times required by the equation solving
subroutines in both programs for both circuits are given in Table 2.1.
Table 2.1 Simulation Data.

Circuit       Program   CPU time for the       Number of   Number of operations
                        equation solving       variables   per iteration
                        subroutine
Fig. 2.4(a)   YSPICE    0.9090 sec.            7           16
Fig. 2.4(a)   SPICE2    1.9740 sec.            7           71
Fig. 2.4(b)   YSPICE    0.031 sec.             10          30
Fig. 2.4(b)   SPICE2    0.108 sec.             10          101
Fig. 2.4 Example Circuits (a) and (b).
The difference in the number of operations between YSPICE and SPICE2
in Table 2.1 can be explained as follows: In SPICE2, the matrix formulated
by the modified nodal approach for the circuit in Fig. 2.4(a) is as shown
in Fig. 2.5(a). It can be seen that although the number of off-diagonal
elements of the rows and columns corresponding to $I_1$, $I_2$, and $I_4$ is small,
they are not chosen as pivots until their corresponding zero diagonal
entries are filled. The delay causes the number of fills to increase
greatly. In YSPICE, the matrix formulated for the circuit in Fig. 2.4(a)
is as shown in Fig. 2.5(b). It can be seen that the number of fills is
now zero due to the off-diagonal pivoting, and consequently, the number of
operations is reduced.
Example 2.5: The circuit shown in Fig. 2.6 was also analyzed using both
SPICE2 and YSPICE. The results of the dc analysis are shown in Table 2.2.
Table 2.2 Simulation Data.

Node                     2         3         4         5          6
YSPICE node voltages     2.000 V   4.000 V   4.000 V   0.000 V    0.000 V
SPICE2 node voltages     2.000 V   2.324 V   2.324 V   -1.676 V   1675.9999 V
Fig. 2.5(a) Structure of the Network Matrix for the Circuit in Fig. 2.4(a)
            formulated by SPICE2.
        (b) Structure of the Network Matrix for the Circuit in Fig. 2.4(a)
            formulated by YSPICE.
Our approach gave correct results for this circuit while SPICE2 gave
inaccurate results. These inaccuracies can be explained as follows: In
SPICE2, if a diagonal element becomes too small, then it is replaced by
$1.0 \times 10^{-12}$. In this circuit this approach is equivalent to
connecting a $1.0 \times 10^{12}\,\Omega$ resistor from node 4 to ground. In
this circuit the diode is reverse biased, and the equivalent resistance
used in SPICE2 for this diode is $0.721 \times 10^{12}\,\Omega$; as a result
the computed $I_1$ in SPICE2 is $2.324 \times 10^{-12}$ A, instead of the
correct value, which should be 0.0 A. This inaccuracy in computing $I_1$
makes $V_5 = -1.6760$ V and $V_6 = 1675.9999$ V instead of 0.0 V.
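The figures quoted above are mutually consistent, as the short arithmetic
check below suggests. It is a sketch only: the circuit of Fig. 2.6 is not
reproduced here, so treating the diode equivalent resistance and the
substituted shunt as a series divider fed from the correct 4.000 V level is
an assumption, and the variable names are illustrative.

```python
G_MIN = 1.0e-12          # value substituted for a too-small diagonal (siemens)
R_SHUNT = 1.0 / G_MIN    # equivalent resistor from node 4 to ground: 1.0e12 ohm
R_DIODE = 0.721e12       # equivalent reverse-biased diode resistance from the text

V_CORRECT = 4.000        # correct node voltage (YSPICE row of Table 2.2)

# Assumption: R_DIODE and R_SHUNT form a series divider fed from V_CORRECT.
v4_spice2 = V_CORRECT * R_SHUNT / (R_SHUNT + R_DIODE)
# Spurious current drawn through the substituted shunt.
i1_spice2 = v4_spice2 / R_SHUNT

print(f"V4 = {v4_spice2:.3f} V, I1 = {i1_spice2:.3e} A")
```

Under that assumption the divider gives the 2.324 V reported by SPICE2 for
node 4, and the current through the $10^{12}\,\Omega$ shunt is the spurious
$I_1$.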
2.5 Discussion
In this chapter we have presented an ordering strategy to be followed
when the modified nodal approach is used. When this new strategy is used,
the possibility of selecting zero diagonal pivots is reduced. The new
strategy eliminates the need for having to continuously check the pivot and
to replace it by a nonzero value in case a zero is generated, as is done in
some existing strategies, which is both time consuming and inaccurate.
In addition, if the currents through the voltage sources are not
needed, our ordering scheme provides a convenient way of reducing
computation by performing the backward substitution step only partially to
obtain the required variables.
Although off-diagonal pivoting causes the circuit matrix to lose its
symmetry and increases the complexity of the program, this is not a
serious drawback. In fact, in many of the examples which we have
analyzed, we have observed that by using off-diagonal pivoting, the number
of fills is much less than that produced by other methods.
III. MODIFIED NEWTON METHOD AND PIECEWISE-NONLINEAR APPROACH
In a computer-aided circuit simulation program, if a circuit contains
nonlinear elements, then a nonlinear solution method is required to solve
the nonlinear algebraic equations in both dc analysis and transient
analysis of the circuit. There are many nonlinear solution methods
available, but the one most widely used is the Newton-Raphson method. This
method has the desirable property that its rate of convergence is quadratic
in the neighborhood of the solution.
Although the Newton-Raphson method has excellent local convergence
properties, it has problems [1,33] when the initial guess is not close to
the solution, such as numerical overflow, slow convergence, or no
convergence. Several modified Newton-Raphson methods have been proposed to
try to resolve the above problems, and the performance of the basic
Newton-Raphson method has been improved to some extent. Here a new
method, the piecewise nonlinear approach, is presented, and examples are
given which show even further improvement. This method evolved from the
piecewise linear method and previous modified Newton-Raphson methods, so it
has the advantages of both methods. However, this new method is still at
the experimental stage, and no definite conclusion about it has been
obtained.
This chapter begins with the introduction of the Newton-Raphson
method. In Section 3.2, problems with the Newton-Raphson method are
illustrated. In Section 3.3 the piecewise nonlinear approach is presented.
In Section 3.4, a new modified Newton-Raphson method for bipolar devices is
detailed, and the piecewise nonlinear approach for bipolar devices including
the avalanche effect is given in Section 3.5. In Section 3.6, the
piecewise nonlinear approach for the MOSFET is described. In Section 3.7,
a discussion of the piecewise nonlinear approach is given.
3.1. The Newton-Raphson Algorithm
Let the set of nonlinear equations be
$$F(X) = 0 \qquad (3.1)$$
If $X_k$ is the solution at the kth iteration, from Taylor series expansion,
we have
$$F(X) = F(X_k) + J(X_k)(X - X_k) + \text{higher order terms} \qquad (3.2)$$
where $J(X_k)$ is the Jacobian of $F$ evaluated at $X_k$.
Eq. (3.2) is used to obtain a solution to Eq. (3.1) under the assumption
that the higher order terms are negligible. Thus, we write
$$F(X_k) + J(X_k)(X_{k+1} - X_k) = 0 \qquad (3.3)$$
Solving Eq. (3.3) for $X_{k+1}$ we obtain
$$X_{k+1} = X_k - [J(X_k)]^{-1} F(X_k) \qquad (3.4)$$
Eq. (3.4) is called the Newton-Raphson iteration algorithm.
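A minimal sketch of the iteration of Eq. (3.4) is given below. The function
names `solve_linear` and `newton_raphson`, the convergence test on the size
of the correction, and the two-equation test system are all assumptions
chosen for illustration; as is usual in practice, the linear system of
Eq. (3.3) is solved rather than forming the inverse Jacobian.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def newton_raphson(F, J, x0, tol=1e-12, max_iter=50):
    """Iterate X_{k+1} = X_k - J(X_k)^{-1} F(X_k) until the step is small."""
    x = list(x0)
    for k in range(1, max_iter + 1):
        # Eq. (3.3): J(X_k) dX = -F(X_k)
        dx = solve_linear(J(x), [-v for v in F(x)])
        x = [xi + di for xi, di in zip(x, dx)]
        if max(abs(d) for d in dx) < tol:
            return x, k
    raise RuntimeError("Newton-Raphson did not converge")


# Illustrative system (an assumption): x^2 + y^2 = 4 and x*y = 1.
def F(v):
    return [v[0] ** 2 + v[1] ** 2 - 4.0, v[0] * v[1] - 1.0]


def J(v):
    return [[2.0 * v[0], 2.0 * v[1]], [v[1], v[0]]]


solution, iterations = newton_raphson(F, J, [2.0, 0.5])
print(solution, iterations)
```

With an initial guess this close to the solution the correction shrinks
roughly quadratically, which is the local behaviour described above.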
3.2. Problems with the Newton-Raphson Algorithm
The problems of numerical overflow and slow convergence can be
illustrated by the simple diode circuit shown in Fig. 3.1. The branch
constraint for the typical semiconductor diode has the form
$i = I_s(e^{v/V_t} - 1)$. Given an initial estimate $V_0$ to the solution
for this circuit as shown in Fig. 3.1, it is not uncommon for the solution
$V_1$ of the next Newton-Raphson iterate to be in the neighborhood of
$V_{DD}$ as shown in Fig. 3.1. If the exponent in the diode equation is too
large, overflow may occur. Even if overflow does not occur, convergence
will be extremely slow because of the very large slope of the diode
characteristic in this region.

Fig. 3.1 Overflow and Slow Convergence Problem with the Simple Diode Circuit.
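The slow convergence can be reproduced with a few lines of code. The sketch
below applies the unmodified Newton-Raphson iteration to a resistor-diode
circuit; the element values ($V_{DD}$ = 5 V, R = 1 kilohm,
$I_s = 10^{-14}$ A, $V_t$ = 0.025 V), the starting point $V_0 = 0$ and the
function names are assumptions chosen for illustration.

```python
import math

VDD, R = 5.0, 1.0e3        # assumed source voltage and load resistance
IS, VT = 1.0e-14, 0.025    # assumed saturation current and thermal voltage


def F(v):
    """Nodal equation of the resistor-diode circuit."""
    return (v - VDD) / R + IS * (math.exp(v / VT) - 1.0)


def dF(v):
    """Derivative of F with respect to the diode voltage."""
    return 1.0 / R + (IS / VT) * math.exp(v / VT)


def plain_newton(v0, tol=1e-9, max_iter=1000):
    """Unmodified Newton-Raphson; returns the list of all iterates."""
    v, history = v0, [v0]
    for _ in range(max_iter):
        dv = -F(v) / dF(v)
        v += dv
        history.append(v)
        if abs(dv) < tol:
            break
    return history


history = plain_newton(0.0)
print(f"V1 = {history[1]:.4f} V")
print(f"iterations = {len(history) - 1}, V* = {history[-1]:.4f} V")
```

The first iterate lands next to $V_{DD}$, after which each step moves the
estimate back by less than $V_t$, so well over a hundred iterations are
needed. In double precision the exponential does not overflow for these
values; with a smaller floating-point range or a larger $V_{DD}$ it would.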
One modified Newton-Raphson algorithm which has proved successful in
avoiding the above problems was proposed by Colon [33]. In this algorithm
iteration on current is employed if $V_{k+1}$ exceeds a reference junction
voltage VREF; this is illustrated in Fig. 3.2. This algorithm is used in
the SPICE2 program.
Another problem with the Newton-Raphson algorithm is the lack of
convergence. This is illustrated in Fig. 3.3. The iterate solutions will
oscillate between $V_0$ and $V_1$ and never converge to the solution $V^*$.
3.3. Piecewise Nonlinear Approach
This is a new approach which has the advantages of the piecewise
linear approach and the modified Newton-Raphson methods. However, this
method is still at the experimental stage, and neither a proof of global
convergence nor conditions for global convergence have been obtained. We
restrict our discussion to two terminal elements. In this approach, first,
a set of breakpoints is chosen and the device characteristic is partitioned
into several nonlinear pieces. The partition must satisfy the following
constraints:
(1) each piece must be monotonic and the first derivative must be
monotonic too;
(2) the piece must be chosen to be suitable for the current/voltage
iteration to avoid numerical overflow and to hasten convergence;
(3) the number of pieces must be kept as small as possible to avoid
the possibility of slow convergence.
After the partitioning, the following algorithm is used to perform the
iteration:
(1) choose an initial guess $V_0$;
(2) linearize the circuit by the Newton-Raphson method and find the
iterate solution $V_{k+1}$ (k = 0);
(3) if $V_{k+1}$ is within the original piece, then use the modified
Newton-Raphson method to choose $V_{k+1}$, and continue the iteration;
otherwise, if the next breakpoint in the direction of change has not been
chosen before, choose $V_{k+1}$ equal to it; otherwise go into the adjacent
piece: if the derivative is not continuous at this breakpoint, then choose
this breakpoint as $V_{k+1}$ again but use the new derivative; otherwise,
choose the other breakpoint as $V_{k+1}$ and continue the iteration.
This approach is illustrated in the graphical solution that is given
in Fig. 3.4. Here the tunnel diode characteristic is partitioned into
four pieces. The initial guess is $V_0$, located in piece I. The solution
of the linearized circuit is $\hat{V}_1$, which is not in piece I, so $V_1$
is chosen to be equal to breakpoint 1. The solution of the new linearized
circuit is $\hat{V}_2$, which is still not in piece I. Enter piece II and
choose breakpoint 2 as $V_2$. The iterate solution $\hat{V}_3$ is not in
piece II, so go into piece III and choose breakpoint 3 as $V_3$; the
iterate solution $\hat{V}_4$ is not in piece III. Enter piece IV and choose
$\hat{V}_4$ as $V_4$; this time, the solution $\hat{V}_5$ is in piece IV.
Continue the iteration by the modified Newton-Raphson method until
convergence is obtained.

Fig. 3.4 Example of the Piecewise Nonlinear Approach.
3.4. A New Modified Newton-Raphson Method for Bipolar Devices
In the piecewise nonlinear approach, the diode characteristic is
partitioned into three pieces as shown in Fig. 3.5. In region III, in
order to avoid numerical overflow and to compensate for large higher order
terms, the modified Newton-Raphson method must be used [1,33,34].
Consider the simple diode circuit shown in Fig. 3.6. The nodal
equation is
$$F(V) = \frac{V - V_{DD}}{R} + I_s\left(e^{V/V_t} - 1\right) = 0 \qquad (3.5)$$
By a Taylor series expansion we obtain
$$F(V_{k+1}) = F(V_k) + F'(V_k)(V_{k+1} - V_k) + \frac{F''(V_k)}{2}(V_{k+1} - V_k)^2 + \text{higher order terms} \qquad (3.6)$$
where
$$F'(V_k)(V_{k+1} - V_k) = \left(\frac{1}{R} + \frac{I_s}{V_t} e^{V_k/V_t}\right)(V_{k+1} - V_k)$$
and
$$\frac{F''(V_k)}{2}(V_{k+1} - V_k)^2 = \frac{I_s}{2V_t^2}\, e^{V_k/V_t}(V_{k+1} - V_k)^2 .$$
If we assume that R is sufficiently large, then the ratio of the third term
to the second term in Eq. (3.6) is
$$\frac{V_{k+1} - V_k}{2V_t} \qquad (3.7)$$
Fig. 3.6(a) Simple Diode Circuit.
        (b) Newton-Raphson Iteration Solutions for (a).
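If $(V_{k+1} - V_k)$ is not small compared to $2V_t$, then the assumption
that the higher order terms in Eq. (3.2) are small and can be neglected is
not true, so the correction term $\Delta V_k = V_{k+1} - V_k$ obtained by
the Newton-Raphson method may not be good.
Let $\Delta V'_k$ be the modified correction term such that
$$F(V_k + \Delta V'_k) = \frac{V_k + \Delta V'_k - V_{DD}}{R} + I_s\left(e^{(V_k + \Delta V'_k)/V_t} - 1\right) = 0 \qquad (3.8)$$
Because $\Delta V_k$ satisfies
$$F(V_k) + F'(V_k)\,\Delta V_k = 0 \qquad (3.9)$$
we have
$$F(V_k) + F'(V_k)\,\Delta V_k = F(V_k + \Delta V'_k) \qquad (3.10)$$
From Eq. (3.10) we obtain
$$\frac{\Delta V'_k - \Delta V_k}{R} + I_s\, e^{(V_k + \Delta V'_k)/V_t} = I_s\, e^{V_k/V_t}\left(1 + \frac{\Delta V_k}{V_t}\right) \qquad (3.11)$$
From Eq. (3.11) we obtain
$$e^{\Delta V'_k/V_t} = 1 + \frac{\Delta V_k}{V_t} - \frac{\Delta V'_k - \Delta V_k}{R\, I_s\, e^{V_k/V_t}} \qquad (3.12)$$
If $(\Delta V'_k - \Delta V_k)/R$ is sufficiently small compared to the
exponential term $I_s e^{V_k/V_t}$, then the equation
$$\Delta V'_k = V_t \ln\left(1 + \frac{\Delta V_k}{V_t}\right) \qquad (3.13)$$
yields a good approximation to the true solution.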
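The effect of the modified correction of Eq. (3.13) can be sketched as
follows. The element values, the starting point and the function names are
assumptions for illustration, and applying the logarithmic transformation
only to positive corrections is a simplification adopted here to keep the
argument of the logarithm positive; it is not the complete scheme developed
in this section.

```python
import math

VDD, R = 5.0, 1.0e3        # assumed source voltage and load resistance
IS, VT = 1.0e-14, 0.025    # assumed saturation current and thermal voltage


def F(v):
    """Nodal equation, Eq. (3.5)."""
    return (v - VDD) / R + IS * (math.exp(v / VT) - 1.0)


def dF(v):
    """Derivative of Eq. (3.5) with respect to V."""
    return 1.0 / R + (IS / VT) * math.exp(v / VT)


def modified_newton(v0, tol=1e-9, max_iter=200):
    """Newton-Raphson with the correction of Eq. (3.13) for positive steps."""
    v = v0
    for k in range(1, max_iter + 1):
        dv = -F(v) / dF(v)                       # Eq. (3.9)
        if dv > 0.0:
            dv = VT * math.log(1.0 + dv / VT)    # Eq. (3.13)
        v += dv
        if abs(dv) < tol:
            return v, k
    raise RuntimeError("modified iteration did not converge")


v_star, n_iter = modified_newton(0.0)
print(f"V* = {v_star:.4f} V after {n_iter} iterations")
```

Starting from the same point, the unmodified iteration first jumps to the
neighborhood of $V_{DD}$ and then needs on the order of
$(V_{DD} - V^*)/V_t$ further steps, whereas the transformed correction
approaches the solution from below in a small number of steps.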
Now we would like to find out the relationship of Eq. (3.13) to
current iteration. For current iteration, after obtaining
$$\hat{V}_{k+1} = V_k + \Delta V_k \qquad (3.14)$$
we can obtain
$$\hat{I}_{k+1} = I_s\left(e^{V_k/V_t} - 1\right) + \frac{I_s}{V_t}\, e^{V_k/V_t}\, \Delta V_k \qquad (3.15)$$
If $\hat{I}_{k+1} > -I_s$, then there is a point
$V_{k+1} = V_k + \Delta V'_k$ on the diode characteristic whose current is
$\hat{I}_{k+1}$:
$$I_s\left(e^{(V_k + \Delta V'_k)/V_t} - 1\right) = I_s\left(e^{V_k/V_t} - 1\right) + \frac{I_s}{V_t}\, e^{V_k/V_t}\, \Delta V_k \qquad (3.16)$$
We obtain
$$\Delta V'_k = V_t \ln\left(1 + \frac{\Delta V_k}{V_t}\right) \qquad (3.17)$$
We can see that Eq. (3.17) is identical to Eq. (3.13); we can also
see that the condition for Eq. (3.16) to have a solution is
$$\frac{\Delta V_k}{V_t} > -1 \qquad (3.18)$$
Since Eq. (3.18) is satisfied whenever $\Delta V_k \ge 0$, we only need to
consider the situation when $\Delta V_k < 0$. This condition can be
explained graphically in Figs. 3.7(a), (b) and (c). From Eq. (3.15) we see
that $-I_s < \hat{I}_{k+1} < I_k$ for $-V_t < \Delta V_k < 0$, where
$I_k = I_s(e^{V_k/V_t} - 1)$. Thus, if the current intercept of the load
line with the linearized diode curve lies in this range, then
$|\Delta V_k|$ cannot exceed $V_t$ and convergence can be quite slow if
voltage iteration is used.
For example, in Fig. 3.7(a) we see that $\hat{I}_{k+1} > -I_s$ and so
$$-V_t < \Delta V_k \le 0 .$$
In Fig. 3.7(b) $\hat{I}_{k+1} = -I_s$ and so
$$\Delta V_k = -V_t .$$
In Fig. 3.7(c) $\hat{I}_{k+1} < -I_s$ and so
$$\Delta V_k < -V_t .$$
In cases (a) and (b) $|\Delta V_k| \le V_t$; therefore a large number of
iterations may be required if voltage iteration is used.

Fig. 3.7 Three Cases with the Simple Diode Circuit.
Let us consider the conditions for cases (a) and (b) to be true. From Eq.
(3.9), we obtain
$$-\frac{\Delta V_k}{V_t} = \frac{\dfrac{V_k - V_{DD}}{R} + I_s\left(e^{V_k/V_t} - 1\right)}{V_t\left(\dfrac{1}{R} + \dfrac{I_s}{V_t}\, e^{V_k/V_t}\right)} \qquad (3.19)$$
$$-\frac{\Delta V_k}{V_t} = 1 + \frac{\dfrac{V_k - V_{DD} - V_t - R\, I_s}{R\, V_t}}{\dfrac{1}{R} + \dfrac{I_s}{V_t}\, e^{V_k/V_t}} \qquad (3.20)$$
Since
$$\frac{V^* - V_{DD}}{R} + I_s\left(e^{V^*/V_t} - 1\right) = 0 \qquad (3.21)$$
from Eqs. (3.19), (3.20) and (3.21) we obtain the following
conclusions:
(1) If $V_k \le V^*$, then $\Delta V_k \ge 0$.
(2) If $V_k \ge V^*$ and $R \ge (V_k - V_{DD} - V_t)/I_s$, then
$-V_t \le \Delta V_k \le 0$.                                          (3.22)
(3) If R and $V_k$ are chosen such that $V_k \ge V^*$ and
$R \ge (V_k - V_{DD} - V_t)/I_s$, and voltage iteration is used in the
forward region, then the number of iterations is lower bounded by
$|V^* - V_k|/V_t$. For example, if $|V^* - V_k| = 1.0$ V and
$V_t = 0.025$ V, then $|V^* - V_k|/V_t = 40$.
The above conclusions show why current iteration must be used in the
forward region.
Now we would like to examine under what conditions current iteration
should be used and if there is a VREF (such as the VREF used in SPICE2) to
determine whether current iteration or voltage iteration should be used.
Let us consider the simple diode circuit in Fig. 3.8. $V_k$, $V^*$ and
$\hat{V}_k$ satisfy Eq. (3.23):
$$I_s\left(e^{V^*/V_t} - e^{V_k/V_t}\right) = \frac{\hat{V}_k - V^*}{R} \qquad (3.23)$$
Let us consider the limiting case when $V^* - V_k \ll V_t$; then Eq. (3.23)
can be rewritten as:
$$\frac{R\, I_s}{V_t}\, e^{V^*/V_t}\,(V^* - V_k) = \hat{V}_k - V^* \qquad (3.24)$$

Fig. 3.8 Comparison of Current Iteration to Voltage Iteration.

So if $(R I_s/V_t)\, e^{V^*/V_t} \gg 1$, then current iteration is
preferred; else if $(R I_s/V_t)\, e^{V^*/V_t} \ll 1$, then voltage iteration
is preferred. These two conditions are illustrated in Figs. 3.9(a) and (b).
From Eq. (3.24) we can conclude that VREF must satisfy
$$\frac{R\, I_s}{V_t}\, e^{V_{REF}/V_t} = 1 \qquad (3.25)$$
Since the value of R is not a constant, there is no universal VREF.
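The dependence of the crossover on R can be made concrete with a short
calculation. The sketch below evaluates the ratio
$(R I_s/V_t)\, e^{V^*/V_t}$ and the voltage at which it equals one; the
saturation current $I_s = 10^{-14}$ A, the value $V_t = 0.025$ V and the
function names are assumptions made for illustration.

```python
import math

IS, VT = 1.0e-14, 0.025    # assumed saturation current and thermal voltage


def iteration_ratio(r, v_star):
    """(R*Is/Vt)*exp(V*/Vt): >> 1 favours current iteration, << 1 voltage."""
    return r * IS / VT * math.exp(v_star / VT)


def crossover_voltage(r):
    """Voltage at which the ratio equals one for a given load resistance."""
    return VT * math.log(VT / (r * IS))


for r in (5.0e3, 1.0e6):
    print(f"R = {r:9.0f} ohm -> crossover at {crossover_voltage(r):.3f} V")
```

With these assumed parameters a larger load resistance lowers the crossover
voltage, so a single fixed reference voltage cannot be right for every
circuit.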
Experiments on the simple diode circuit with different values of R and
$V_{DD}$ were done to test the above conclusion, and the data are given in
Figs. 3.10(a), (b), (c), and (d). These data confirm Eq. (3.25). In Fig.
3.10(a), $(R I_s/V_t)\, e^{V^*/V_t}$ is always much larger than one, which
explains why current iteration is always better than voltage iteration; in
Fig. 3.10(b), when $V_{DD}$ is less than 0.7 V,
$(R I_s/V_t)\, e^{V^*/V_t}$ is less than one, and voltage iteration is
better than current iteration; in Fig. 3.10(c), when $V_{DD}$ is less than
0.5 V, $(R I_s/V_t)\, e^{V^*/V_t}$ is less than one, so voltage iteration is
better than current iteration; in Fig. 3.10(d), because $\Delta V/V_t$ may
be less than -1, strict current iteration in region III cannot be done.
Whenever $\Delta V/V_t$ is less than -1, the next guess is reset to zero,
and current iteration is resumed. For this approach, current iteration is
always better than voltage iteration.
In the conventional current/voltage iteration approach, such as the
one used in SPICE2, there is a universal VREF: if $V_{k+1}$ exceeds VREF,
then current iteration is used; otherwise, voltage iteration is used. In
SPICE2, this VREF is set to the point of minimum radius of curvature:
$$V_{REF} = V_t \ln\left(\frac{V_t}{\sqrt{2}\, I_s}\right) \qquad (3.26)$$

Fig. 3.9(a) Situation When Current Iteration Is Better Than Voltage Iteration.
        (b) Situation When Voltage Iteration Is Better Than Current Iteration.
Fig. 3.10 Comparison of Iteration Methods (Current Iteration, Voltage
          Iteration, Our Approach) for the Simple Diode Circuit:
          (a) $V_{DD}$ = 5 V, plotted against R; (b) plotted against $V_{DD}$;
          (c) R = 5K, plotted against $V_{DD}$; (d) R = 1000K, plotted
          against $V_{DD}$.
From the above analysis, we can see that this VREF does not provide
any guarantee of fast convergence. The simple diode circuit was used again
to test the approach used in SPICE2: with $V_{DD}$ = 5 V and R = 1000K, the
number of iterations used by SPICE2 is 12, while the number of iterations
for strict current iteration is only 3.
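The statement that this VREF is the point of minimum radius of curvature can
be checked numerically. The sketch below compares the closed form
$V_t \ln(V_t/(\sqrt{2} I_s))$ with a grid search for the maximum curvature of
the exponential characteristic; the parameter values, the grid and the
function name are assumptions, and the curvature is computed with current in
amperes and voltage in volts without any axis scaling.

```python
import math

IS, VT = 1.0e-14, 0.025    # assumed saturation current and thermal voltage


def curvature(v):
    """Curvature of i = Is*exp(v/Vt) at junction voltage v."""
    g = IS / VT * math.exp(v / VT)      # first derivative di/dv
    return (g / VT) / (1.0 + g * g) ** 1.5


# Closed form for the point of maximum curvature (minimum radius).
v_ref = VT * math.log(VT / (math.sqrt(2.0) * IS))

# Brute-force search on a 10-microvolt grid from 0.5 V to 0.9 V.
grid = [0.5 + 1.0e-5 * n for n in range(40001)]
v_peak = max(grid, key=curvature)

print(f"VREF = {v_ref:.4f} V, curvature peak found at {v_peak:.4f} V")
```

The two values agree to within the grid spacing, and neither depends on the
load resistance R, which is why this reference cannot track the R-dependent
crossover discussed above.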
There is another problem associated with the conventional
current/voltage iteration used in SPICE2. This problem is illustrated in
Fig. 3.11. Let us assume that the initial guess is $V_0$ and the first
iterate solution is $\hat{V}_1$. Now we do not know which load line we are
encountering, because both load lines will give us $\hat{V}_1$. If it is
load line 1, then voltage iteration should be used; if it is load line 2,
then voltage iteration is too slow. In SPICE2, because $\hat{V}_1$ is less
than VREF, voltage iteration is used for both cases. Experimental results
show that for $V_{DD}$ = -5V and R = 1000K the number of iterations used by
SPICE2 is 12.
This problem can be solved by using the piecewise nonlinear approach.
Whenever this situation occurs, the next guess is changed to zero. If
it is load line 1, the next iterate solution is in the first quadrant and
voltage iteration is used to obtain the solution. If it is load line 2,
the next iterate solution is in the third quadrant. The number of
iterations used by recognizing that load line 2 is being used and
changing the next guess to zero is 3.
Also let us examine Eq. (3.13) again. When $\Delta V_k/V_t$ is positive and
much smaller than 1, then $\Delta V'_k \approx \Delta V_k$. If the
difference between $\Delta V'_k$ and $\Delta V_k$ is small compared to the
iteration error tolerance, then there is no need to do the transformation.
Let us assume the error tolerance is $10^{-6}$ V; then when
$\Delta V_k/V_t$ is less than 0.01, there is no need to do the
transformation.
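As a quick check of this threshold, the difference between the two
corrections can be expanded for small $u = \Delta V_k/V_t$. The value
$V_t = 0.025$ V is the one used earlier in the text, and truncating the
logarithm after the second-order term is the only approximation made:

$$
\Delta V'_k - \Delta V_k = V_t\left[\ln(1 + u) - u\right] \approx -\tfrac{1}{2} V_t u^2,
\qquad
u = 0.01,\ V_t = 0.025\ \text{V}
\ \Rightarrow\
\left|\Delta V'_k - \Delta V_k\right| \approx 1.25 \times 10^{-6}\ \text{V}.
$$

The difference is thus of the same order as the assumed $10^{-6}$ V
tolerance at $u = 0.01$ and falls off quadratically for smaller $u$.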
The result of all the above analysis is a new iteration scheme. The
flowchart for this new approach is shown in Fig. 3.12(a), the experimental
data for the simple diode circuit are also given in Figs. 3.10(a), (b),
(c), and (d). These data show that this algorithm works well for resistor
load diode circuits. However, when transistor circuits are solved, the
load line generated by linearization changes during the iterations and the
algorithm goes into a limit cycle for some circuits. If the piecewise
nonlinear method presented in Section 3.3 (it corresponds to a
Katzenelson's type algorithm [40] for the piecewise linear approach) is
used, then probably the limit cycle problem will not occur. But the
piecewise nonlinear method only allows one diode to change regions at a
given iteration, so the convergence rate is slow; also the piecewise
nonlinear method requires a linear search to accomplish the task that only
one diode changes regions. So instead of using a strict piecewise
nonlinear approach, the algorithm in Fig. 3.12(a) was modified to
eliminate the limit cycle problem. The flow chart for the modified
algorithm is given in Fig. 3.12(b).
Three test circuits were used to test this new iteration scheme.
These three circuits are given in Fig. 3.13(a), (b) and (c), and the data
are given in Table 3.1. These test results show that the new iteration
scheme is superior to the Colon method used in SPICE2.
[Fig. 3.12 Flowcharts of the new iteration scheme: (a) original algorithm,
(b) modified algorithm.]
Fig. 3.13 Example Circuits. (a) One Transistor Amplifier.
                            (b) TTL NAND Gate.
                            (c) Differential Amplifier.
Table 3.1 Comparison of the Results between the New Approach and SPICE2.

Circuit         Number of iterations   Number of iterations
                (New approach)         (SPICE2)
Fig. 3.13(a)    3                      6
Fig. 3.13(b)    7                      17
Fig. 3.13(c)    6                      7
3.5. Piecewise Nonlinear Approach for Bipolar Devices Including Avalanche
Effects
Although the avalanche characteristic of a diode should consist of two
separate exponential functions [35], in order to simplify the analysis,
here the avalanche characteristic is chosen to consist of only one
exponential function. The diode I-V static characteristic used here is
shown in Fig. 3.5.
In the forward biased region the equation for the diode current is
$$I_d = I_s\left(e^{V_d/V_t} - 1\right) \qquad (3.27)$$
The reverse-biased current before breakdown is
$$I_d = -I_s \qquad (3.28)$$
The avalanche current is
$$I_d = e^{A(V_B - B V_d)} \qquad (3.29)$$
The constants A and B are determined from the I-V characteristic curve,
where $V_B$ is the breakdown voltage and $V_d$ is the junction voltage.
If, in order to hasten the convergence, strict current iteration is
used in pieces I and III, then divergence may be encountered as shown in
Fig. 3.5. Therefore, the piecewise nonlinear approach for bipolar devices
with avalanche modeling is as follows:
(1) choose $V_0$ equal to VREF and piece III;
(2) find the iterate solution $V_{k+1}$ by the new modified Newton method.
If $V_{k+1}$ is within piece III, repeat this step;
(3) otherwise go into piece II, choose $V_{k+1} = 0$ and use the new
derivative; if $V_{k+2}$ is within piece II, then the solution is found;
(4) otherwise, go into piece I, choose $V_{k+1} = V_B$, and use the new
modified Newton method.
Remark: Only one nonlinear device is allowed to change its region at one
iteration, otherwise, limit cycle problems may occur.
3.6. Piecewise Nonlinear Approach for MOSFET
Let us consider the simple resistor load MOS inverter which is shown
in Fig. 3.14(a). The nodal equation is
$$F(V) = \frac{V - V_{DD}}{R} + \beta\left((V_{IN} - V_T)V - \frac{V^2}{2}\right) = 0 \qquad (3.30)$$
By Taylor series expansion we obtain
$$F(V_{k+1}) = F(V_k) + F'(V_k)(V_{k+1} - V_k) + \frac{F''(V_k)}{2}(V_{k+1} - V_k)^2 + \text{higher order terms} \qquad (3.31)$$
where
$$F'(V_k)(V_{k+1} - V_k) = \left(\frac{1}{R} + \beta(V_{IN} - V_T - V_k)\right)(V_{k+1} - V_k)$$
and
$$\frac{F''(V_k)}{2}(V_{k+1} - V_k)^2 = -\frac{\beta}{2}(V_{k+1} - V_k)^2 .$$
If we assume R is sufficiently large, then
$$\frac{\text{the third term in Eq. (3.31)}}{\text{the second term in Eq. (3.31)}} = \frac{-(V_{k+1} - V_k)}{2(V_{IN} - V_T - V_k)} \qquad (3.32)$$
If $(V_{k+1} - V_k)$ is not small compared to $2(V_{IN} - V_T - V_k)$, then
the assumption that higher order terms in Eq. (3.2) are small and can be
neglected is not true, so the correction term
$\Delta V_k = V_{k+1} - V_k$ obtained by the Newton-Raphson method is not
good. This may result in very slow convergence. Fig. 3.15(a) illustrates
this problem. If the initial guess is $V_0$, then the first iterate
solution is $V_1$, which is far away from the exact solution $V^*$. Because
the derivative is large when $V_k$ is negative, it requires a large number
of iterations to converge to $V^*$. Fig. 3.15(b) and (c) illustrate the
slow convergence problem with a saturated load MOS inverter circuit and a
depletion load MOS inverter circuit as shown in Fig. 3.14(b) and (c)
respectively.

Fig. 3.14(a) Resistor Load MOS Inverter Circuit.
         (b) Saturated Load MOS Inverter Circuit.
         (c) Depletion Load MOS Inverter Circuit.
The above slow convergence problem can be resolved by using the
piecewise nonlinear approach. For example, for the resistor load MOS
inverter circuit, first, the MOSFET characteristic is partitioned into two
pieces as shown in Fig. 3.16: the first piece is from $-\infty$ to zero, and
the second piece is from zero to $+\infty$. The circuit can then be solved
as illustrated in Fig. 3.16. The initial guess $V_0$ is in piece II, the
first iterate solution $\hat{V}_1$ is in piece I, so $V_1$ is chosen to be
the breakpoint zero. Since $\hat{V}_2$ is within piece II, the
Newton-Raphson method is used to find the solution $V^*$.
Remark: In the iteration scheme for MOS circuits, the piecewise nonlinear
approach is used for VDS. The changes in VGS and VGD are limited to 1V at
each iteration.

Fig. 3.15 The Slow Convergence Problem with the Circuits in Fig. 3.14.

3.7. Discussion
Two large circuits were used to test the piecewise nonlinear approach:
one is a bipolar circuit as shown in Fig. 3.17, the other is a MOS circuit
as shown in Fig. 3.18. The data are given in Table 3.2. The results show
that this method improves the convergence property of the basic
Newton-Raphson method; however, the proof of the global convergence
property or the conditions for global convergence have not yet been
derived. More research work on this topic is needed.

Table 3.2 Comparison of the Results between the Piecewise Nonlinear
          Approach (PWNL) and SPICE2.

Circuit      Number of iterations   Number of iterations
             (PWNL)                 (SPICE2)
Fig. 3.17    34                     48
Fig. 3.18    10                     28
IV. NUMERICAL INTEGRATION
A numerical integration method is required to determine the transient
response of a circuit. In order to make the numerical integration more
accurate and efficient, some method of dynamically varying the timestep is
needed; this is usually accomplished by a local truncation error (LTE)
timestep control.
Let us denote the upperbound on the local truncation error by ET. In
previous work, ET was established as follows [36]. First, a maximum
I allowable global truncation error GE and the solution interval T aremax
specified. An assumption that this global error is distributed uniformly
within T is made, then the maximum allowable ET per timestep (h) is given
f by
ET - amaX * h (4.1)T
The LTE timestep control with trapezoidal integration is implemented
as follows. First, the timestep $h_n$ and $t_{n+1} = t_n + h_n$ are determined and the
solution at the timepoint $t_{n+1}$ is found; then the local truncation error
(LTE) is evaluated by Eq. (4.2).

$$ LTE = \frac{h_n^3}{12}\, x'''(\xi) \approx \frac{h_n^3}{2}\, DD3 \tag{4.2} $$

where DD3 is the 3rd divided difference [] and $t_n \le \xi \le t_{n+1}$. The kth
divided difference is defined by the recursive relation

$$ DD_k(t_{n+1}) = \frac{DD_{k-1}(t_{n+1}) - DD_{k-1}(t_n)}{\sum_{i=1}^{k} h_{n+1-i}}, \qquad DD_1(t_{n+1}) = \frac{x(t_{n+1}) - x(t_n)}{h_n} \tag{4.3} $$

If LTE > ET, then the timestep is considered too large, $h_n$ is rejected, a
new $h_n$ is computed using

$$ h_n^2 = \frac{2\, GE_{max}}{T \cdot DD3} \tag{4.4} $$

and then a new timepoint $t_{n+1}$ is determined. If, on the other hand,
LTE < ET, then the local truncation error at timepoint $t_{n+1}$ is considered
satisfactory, and the timestep $h_{n+1}$ is computed using

$$ h_{n+1}^2 = \frac{2\, GE_{max}}{T \cdot DD3} \tag{4.5} $$
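The divided-difference recursion and the step update described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SPICE2 implementation: the function names are hypothetical, the trapezoidal estimate is assumed to be LTE = (h^3/2)·|DD3|, and the absolute value of DD3 is taken so that the square root in the step update is defined.

```python
import math

def divided_difference(t, x, k, n=None):
    """k-th divided difference ending at timepoint index n (default: last point).

    Follows the recursion DD_k(t_n) = (DD_{k-1}(t_n) - DD_{k-1}(t_{n-1})) / (t_n - t_{n-k}),
    where t_n - t_{n-k} is the sum of the last k timesteps.
    """
    if n is None:
        n = len(t) - 1
    if k == 0:
        return x[n]
    upper = divided_difference(t, x, k - 1, n)
    lower = divided_difference(t, x, k - 1, n - 1)
    return (upper - lower) / (t[n] - t[n - k])

def lte_trapezoidal(t, x):
    """LTE estimate for the last step, assuming LTE = h^3 / 2 * |DD3|."""
    h = t[-1] - t[-2]
    return h ** 3 / 2.0 * abs(divided_difference(t, x, 3))

def next_timestep(t, x, ge_max, interval):
    """Step size from h^2 = 2 * GE_max / (T * |DD3|); ge_max and interval are user inputs."""
    dd3 = abs(divided_difference(t, x, 3))
    return math.sqrt(2.0 * ge_max / (interval * dd3))

# Example on a non-uniform grid with x(t) = t^3, whose third divided difference is 1.
TIMES = [0.0, 0.5, 1.5, 1.75]
VALUES = [tp ** 3 for tp in TIMES]
```

For a cubic the third divided difference is independent of the grid, which makes `TIMES` and `VALUES` a convenient sanity check of the recursion.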
4.1. Problems with Previous Work
When the above strategy is applied to determine the transient response
of a circuit, there are two problems:
(1) Since DD3 is only an approximation of the third derivative of x, our investigation shows
that whenever the timestep is changed or the input signal changes abruptly,
the LTE estimate based on DD3 becomes inaccurate. This inaccuracy results in
the following unwanted situations. One situation is that at the timepoint
$t_{n+1}$, if LTE < ET, the timestep $h_{n+1}$ is increased, but at the next
timepoint $t_{n+2}$, due to the inaccuracy, LTE is now found to be larger than
ET, so this timepoint is rejected and the timestep is reduced. The other
situation is even worse. If, at the timepoint $t_n$, the input changes
abruptly, then due to the inaccuracy of the DD3 approximation to the third derivative, LTE
may be greater than ET and the timestep is reduced. Sometimes this happens
repeatedly until the timestep becomes too small and the program terminates.
These two situations are explained in detail later.
(2) For digital circuits, the total solution time T may consist of
several switching intervals. If a stable numerical integration method is
used, then initially in an interval the local truncation error accumulates and
the global truncation error (GE) increases, but as the solution nears
steady state in a given switching interval the global error decreases and
eventually goes to zero, so
the upper bound ET given by Eq. (4.1) is too conservative. This is
illustrated in Fig. 4.1.
Now we will consider the above two problems in more detail. The first
situation of the first problem can be illustrated by a simple RC circuit as
shown in Fig. 4.2. In order to simplify the analysis, the backward Euler
method is used. The exact solution for this circuit is

$$ v(t) = 5e^{-t/\tau} \tag{4.6} $$

and the solution obtained by the backward Euler method is

$$ v_{n+1} = \frac{v_n}{1 + h_n/\tau} \tag{4.7} $$

where $v_0 = 5$ V and $\tau = RC$.

The local truncation error estimates at timepoints $t_{n+1}$ and $t_n$ are

$$ LTE_{n+1} = h_n^2 \cdot DD2_{n+1} \tag{4.8} $$

$$ LTE_n = h_{n-1}^2 \cdot DD2_n \tag{4.9} $$

where

$$ DD2_{n+1} = \frac{\dfrac{v_{n+1} - v_n}{h_n} - \dfrac{v_n - v_{n-1}}{h_{n-1}}}{h_n + h_{n-1}}, \qquad DD2_n = \frac{\dfrac{v_n - v_{n-1}}{h_{n-1}} - \dfrac{v_{n-1} - v_{n-2}}{h_{n-2}}}{h_{n-1} + h_{n-2}} $$
Fig. 4.2 (a) Simple RC Circuit, with v(0) = 5 V.
(b) Waveform of the Simple RC Circuit.
Let us consider the situation when $h_{n-2} = h_{n-1} = h$ and $h_n = ah$, where
a is a ratio constant. From Eqs. (4.7), (4.8) and (4.9), we obtain

$$ \frac{LTE_{n+1}}{LTE_n} = \frac{h_n^2\, DD2_{n+1}}{h_{n-1}^2\, DD2_n} = \frac{2a}{a+1} \cdot \frac{a^2}{1 + ah/\tau} \tag{4.10} $$

If both $LTE_{n+1}$ and $LTE_n$ are good approximations of the true local
truncation errors, then from Eqs. (4.6), (4.7) and the definition of local
truncation error [36] the ratio of $LTE_{n+1}$ over $LTE_n$ should be

$$ \frac{LTE_{n+1}}{LTE_n} = \frac{a^2}{1 + ah/\tau} \tag{4.11} $$

Comparison of Eq. (4.11) with Eq. (4.10) shows that the ratio computed by
Eq. (4.10) is wrong by a factor of $2a/(a+1)$. When a = 1, that is, when the timestep
is constant, the estimation by Eq. (4.8) is good. When a is
different from 1, the estimation by Eq. (4.8) is not good. Table 4.1
gives the simulation results of the simple RC circuit, which confirm the
above conclusion. The ET used is $10^{-3}$ V. At the first three timepoints,
the timesteps are kept constant, so the estimation by Eq. (4.8) is good.
At the fourth timepoint the timestep is increased by a factor of two. The
true local truncation error is 0.8930E-3, which is an acceptable error;
but the estimation by Eq. (4.8) is 0.1207E-2, which is larger than ET, so
the timepoint is rejected.
Now we would like to see if this inaccuracy can be explained by the
above conclusion. The ratio of the estimation at the fourth timepoint over
the estimation at the third timepoint is
$$ \frac{0.1207\mathrm{E}{-2}}{0.2309\mathrm{E}{-3}} = 5.227 \tag{4.12} $$

instead of 4 as predicted by Eq. (4.11). However, note that for a = 2

$$ \frac{2a}{a+1} = \frac{4}{3} = 1.33 \approx \frac{5.227}{4} \tag{4.13} $$

Eq. (4.13) shows that the estimation of local truncation error is really
wrong by a factor of $2a/(a+1)$, as shown in Eq. (4.10).

Table 4.1 Simulation Results of the Simple RC Circuit.

t (ns)   h (timestep)   LTE (estimate)   True local truncation error
1.50     0.25           0.2355E-3        0.2339E-3
1.75     0.25           0.2332E-3        0.2314E-3
2.00     0.25           0.2309E-3        0.2293E-3
2.50     0.50           0.1207E-2        0.8930E-3
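The rows of Table 4.1 can be reproduced with a short script. The sketch below makes two assumptions that are not stated next to the table: the time constant is taken as τ = RC = 25 ns (the value printed in Fig. 4.3), and the "true" local truncation error is taken as the difference between the exact one-step solution started from the previous computed value and the backward Euler result. The function and variable names are hypothetical.

```python
import math

TAU = 25.0  # ns; assumed value of RC, taken from the label in Fig. 4.3
V0 = 5.0    # initial capacitor voltage in volts

def simulate(steps):
    """Backward Euler on dv/dt = -v/tau.

    Returns one row (t, h, lte_estimate, lte_true) per step, starting with the
    second step, since the estimate needs two previous timepoints.
    """
    t = [0.0]
    v = [V0]
    rows = []
    for h in steps:
        v_new = v[-1] / (1.0 + h / TAU)                      # backward Euler step
        t_new = t[-1] + h
        lte_true = abs(v[-1] * math.exp(-h / TAU) - v_new)   # one-step error (assumed definition)
        if len(v) >= 2:
            h_prev = t[-1] - t[-2]
            dd1_new = (v_new - v[-1]) / h
            dd1_old = (v[-1] - v[-2]) / h_prev
            dd2 = (dd1_new - dd1_old) / (h + h_prev)         # second divided difference
            rows.append((t_new, h, h * h * abs(dd2), lte_true))
        t.append(t_new)
        v.append(v_new)
    return rows

# Eight constant steps of 0.25 ns, then the step is doubled (a = 2).
ROWS = simulate([0.25] * 8 + [0.5])

for t_k, h_k, est, true in ROWS[-4:]:
    print(f"{t_k:5.2f}  {h_k:4.2f}  {est:.4e}  {true:.4e}")
```

Under these assumptions the last four rows correspond to the timepoints 1.5, 1.75, 2.0 and 2.5 ns of the table, and the estimate at 2.5 ns exceeds the true error by roughly the factor 2a/(a+1) = 4/3.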
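Now let us consider the second situation of the first problem. This
situation can be illustrated by a simple RC circuit as shown in Fig. 4.3.
Again the backward Euler method is used here for the simplicity of the
analysis. The exact solution for this circuit is

$$ v(t) = \begin{cases} 5e^{-t/\tau}, & t \le t_n \\ v_n e^{-(t - t_n)/\tau} + 5\left(1 - e^{-(t - t_n)/\tau}\right), & t > t_n \end{cases} \tag{4.14} $$

and the solution obtained by the backward Euler method is

$$ v_{k+1} = \begin{cases} \dfrac{v_k}{1 + h_k/\tau}, & k < n \\ \dfrac{v_k}{1 + h_k/\tau} + \dfrac{5 h_k}{\tau + h_k}, & k \ge n \end{cases} \tag{4.15} $$

where $v_0 = 5$ V.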
In the early version of SPICE2, when $t_n$ exceeded a source breakpoint,
$h_{n-1}$ was reduced such that the value $t_n$ coincided with the breakpoint.
The timestep was reduced to a small value and then the iteration was
continued.

Fig. 4.3 (a) Simple RC Circuit, with v(0) = 5 V and $\tau$ = RC = 25 ns.
(b) Waveform of $v_{in}$. (c) Waveform of v.

Let us consider the situation when $h_{n-2} = h$, $h_{n-1} = ah$ and
$h_n = bh$, where a and b are ratio constants. From Eqs. (4.8) and (4.15),
we obtain the following estimate of the local truncation error:

$$ LTE_{n+1} = b^2 h^2 \left[ \frac{5}{\tau (a+b) h} + \frac{b\,(v_{n+1} - 5)}{\tau^2 (a+b)} \right] \tag{4.16} $$

Let us also assume that bh is not small enough, so that

$$ LTE_{n+1} > ET \tag{4.17} $$

Then the timestep was reduced and a new timestep ch was computed by

$$ c^2 h^2 \left[ \frac{5}{\tau (a+b) h} + \frac{b\,(v_{n+1} - 5)}{\tau^2 (a+b)} \right] = ET \tag{4.18} $$

where c < b.
The local truncation error for this new timestep ch is estimated by

$$ LTE'_{n+1} = c^2 h^2 \left[ \frac{5}{\tau (a+c) h} + \frac{c\,(v'_{n+1} - 5)}{\tau^2 (a+c)} \right] \tag{4.19} $$

where $v'_{n+1}$ is the solution obtained with the timestep ch. It follows that

$$ \frac{LTE'_{n+1}}{ET} = \frac{a+b}{a+c} \cdot \frac{1 + \dfrac{ch\,(v'_{n+1} - 5)}{5\tau}}{1 + \dfrac{bh\,(v_{n+1} - 5)}{5\tau}} \tag{4.20} $$

Since c < b, (a + c) < (a + b), and for a small enough a the ratio in
Eq. (4.20) could be greater than one. If this is the case, then the step
size will be reduced again. This could happen again and again. This was
the case in the early version of the SPICE2 program, and frequently after
abrupt clock or signal changes the program would not converge and the job
would be terminated prematurely.
Remark: In Eq. (4.16), the second term is an approximation to the true
local truncation error. The first term, although it is dominant, is a
parasitic term generated by the use of voltages at the timepoints
of the previous switching interval.
The second problem is detailed as follows. Let $GE_{max}$ denote the
maximum global truncation error, which occurs at $t_{n,max}$ (Fig. 4.1). Assume that the
trapezoidal method is used and that we are dealing with an exponentially
decaying waveform. The local truncation error at the timepoint $t_{n+1}$ is
given by

$$ LTE_{n+1} = \frac{h_n^3}{12}\, x'''(\xi), \qquad t_n \le \xi \le t_{n+1} \tag{4.21} $$

and for this example

$$ LTE_{n+1} = \frac{h_n^3}{12 \tau^3}\, V_{n,max} \tag{4.22} $$

From Eq. (4.22) and the definition of local truncation error, we obtain

$$ GE_{n+1} = GE_{max}\, e^{-h_n/\tau} + \frac{h_n^3}{12 \tau^3}\, V_{n,max} = GE_{max} \tag{4.23} $$

where $GE_{n+1}$ is the global truncation error at the timepoint $t_{n+1}$. Eq.
(4.23) can be reduced to

$$ \frac{h_n^3}{12 \tau^3}\, V_{n,max} = GE_{max} \left( \frac{h_n}{\tau} \right) \tag{4.24} $$

The local truncation error timestep control requires that

$$ LTE_{n+1} = \frac{h_n^3}{12 \tau^3}\, V_{n,max} \le ET \tag{4.25} $$

Assuming equality in Eq. (4.25) and eliminating $h_n/\tau$ in Eq. (4.24), we
obtain the following upper bound on the local truncation error.

$$ ET \le \sqrt{\frac{12}{V_{n,max}}}\, (GE_{max})^{3/2} \tag{4.26} $$
In order to check the above bound, which was derived for RC circuits, a
number of digital circuits were simulated and the following empirical bound
on the local truncation error was determined.
ET 10 (4.27)VDD
where $V_{DD}$ is the voltage swing, which is the supply voltage in this example.
Eq. (4.27) also holds for exponentially rising waveforms. Given $GE_{max}$ and
$V_{DD}$, ET can be determined by Eq. (4.27), and then Eq. (4.2) can be
used to control the timestep.
Eq. (4.26) shows that ET is proportional to $(GE_{max})^{3/2}$ for RC circuits
if the trapezoidal method is used. In general, for RC circuits, if a stable
numerical integration method of order n is used, a derivation similar to the
one above shows that

$$ ET \propto (GE_{max})^{(n+1)/n} \tag{4.28} $$

Eq. (4.28) was verified experimentally for the backward Euler method and the
trapezoidal method. The RC circuit shown in Fig. 4.2 was used. The
simulation results are given in Figs. 4.4 and 4.5.
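The scaling in Eq. (4.28) can be checked numerically on the RC decay. The sketch below is a simplified stand-in for the experiment behind Figs. 4.4 and 4.5, not a reproduction of it: it assumes a fixed timestep rather than LTE-controlled steps, τ = 25 ns, v(0) = 5 V, and a 200 ns solution interval, and it fits the exponent from just two step sizes. All names are hypothetical.

```python
import math

def max_errors(method, h, tau=25.0, v0=5.0, t_end=200.0):
    """Return (largest one-step error, largest global error) for a fixed step h."""
    x = h / tau
    if method == "backward_euler":
        growth = 1.0 / (1.0 + x)
    elif method == "trapezoidal":
        growth = (1.0 - x / 2.0) / (1.0 + x / 2.0)
    else:
        raise ValueError(method)
    v = v0
    max_lte = 0.0
    max_ge = 0.0
    for k in range(int(round(t_end / h))):
        exact_from_here = v * math.exp(-x)   # exact solution restarted from the computed value
        v_next = v * growth                  # one numerical step
        max_lte = max(max_lte, abs(v_next - exact_from_here))
        max_ge = max(max_ge, abs(v_next - v0 * math.exp(-(k + 1) * x)))
        v = v_next
    return max_lte, max_ge

def fitted_exponent(method, steps=(0.5, 0.0625)):
    """Slope of log(LTE) versus log(GE_max) between two step sizes."""
    (lte_a, ge_a), (lte_b, ge_b) = (max_errors(method, h) for h in steps)
    return math.log(lte_a / lte_b) / math.log(ge_a / ge_b)

print(fitted_exponent("backward_euler"))  # expected to be close to 2 (order 1)
print(fitted_exponent("trapezoidal"))     # expected to be close to 3/2 (order 2)
```

With a fixed step the one-step error scales as h^(n+1) and the peak global error as h^n, so the fitted slope approximates (n+1)/n.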
Fig. 4.4 Simulation Data of the Backward Euler Method Which Show that $ET \propto (GE_{max})^2$.

Fig. 4.5 Simulation Data of the Trapezoidal Method Which Show that $ET \propto (GE_{max})^{3/2}$.
4.2. Algorithm

The following algorithm for variable timestep control was developed
based on the discussion in the previous section. The trapezoidal method is
assumed. ET is computed by Eq. (4.27).

First let us derive an expression for the estimation of the local
truncation error which is used in the algorithm. Let us assume that
$h_{n-2} = h$, $h_{n-1} = ah$ and $h_n = bh$. The solution obtained by the trapezoidal
method for an exponentially decaying waveform is

$$ v_{n+1} = \frac{1 - h_n/2\tau}{1 + h_n/2\tau}\, v_n \tag{4.29} $$

The 3rd divided difference is given by

$$ DD3_{n+1} = \frac{DD2_{n+1} - DD2_n}{h_{n-2} + h_{n-1} + h_n} = -\frac{3(1+b)}{2(1+a+b)} \cdot \frac{v_{n-1}}{6 \tau^3 \left(1 - \dfrac{h}{2\tau}\right)\left(1 + \dfrac{ah}{2\tau}\right)\left(1 + \dfrac{bh}{2\tau}\right)} \tag{4.30} $$

where $\frac{3(1+b)}{2(1+a+b)}$ is the error factor in the estimate of the third
derivative caused by varying the timestep. By taking into account the effect
of different timesteps, the expression of
local truncation error is given by

$$ LTE_{n+1} = \frac{h_n^3}{2} \cdot DD3_{n+1} \cdot \frac{2(1+a+b)}{3(1+b)} \tag{4.31} $$

or we can define a new quantity DD3' which is given by

$$ DD3'_{n+1} = \frac{DD2_{n+1} - DD2_n}{h_{n-2} + h_n} \tag{4.32} $$

then Eq. (4.31) can be reduced to

$$ LTE_{n+1} = \frac{h_n^3}{3} \cdot DD3'_{n+1} \tag{4.33} $$
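The effect of the correction can be illustrated on the RC decay. The sketch below compares the uncorrected estimate (h_n^3/2)·|DD3| with a corrected estimate (h_n^3/3)·|DD3'|, where DD3' divides the difference of second divided differences by h_{n-2} + h_n. The coefficient 1/3 is an assumption obtained by combining the error factor 3(1+b)/(2(1+a+b)) with the uncorrected form; τ = 25 ns and all function names are likewise illustrative.

```python
import math

def dd2(v_new, v_mid, v_old, h_new, h_old):
    """Second divided difference over three consecutive timepoints."""
    return ((v_new - v_mid) / h_new - (v_mid - v_old) / h_old) / (h_new + h_old)

def lte_estimates(v, h):
    """v = [v_{n-2}, v_{n-1}, v_n, v_{n+1}], h = [h_{n-2}, h_{n-1}, h_n].

    Returns (uncorrected, corrected) LTE estimates for the last step.
    """
    diff = dd2(v[3], v[2], v[1], h[2], h[1]) - dd2(v[2], v[1], v[0], h[1], h[0])
    dd3 = diff / (h[0] + h[1] + h[2])        # ordinary third divided difference
    dd3_mod = diff / (h[0] + h[2])           # modified quantity DD3'
    return abs(h[2] ** 3 / 2.0 * dd3), abs(h[2] ** 3 / 3.0 * dd3_mod)

def trapezoidal_step(v, h, tau):
    """One trapezoidal step for dv/dt = -v/tau."""
    return v * (1.0 - h / (2.0 * tau)) / (1.0 + h / (2.0 * tau))

def demo(tau=25.0, h=0.25, a=1.0, b=2.0, v_start=5.0):
    """Three steps h, a*h, b*h; returns (uncorrected, corrected, true one-step error)."""
    steps = [h, a * h, b * h]
    v = [v_start]
    for s in steps:
        v.append(trapezoidal_step(v[-1], s, tau))
    uncorrected, corrected = lte_estimates(v, steps)
    true_error = abs(v[2] * math.exp(-steps[2] / tau) - v[3])
    return uncorrected, corrected, true_error

UNCORRECTED, CORRECTED, TRUE_ERROR = demo()
print(UNCORRECTED / TRUE_ERROR, CORRECTED / TRUE_ERROR)
```

With a = 1 and b = 2 the two estimates differ by exactly the factor 3(1+b)/(2(1+a+b)) = 9/8, and the corrected value stays within a few percent of the one-step error.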
Algorithm:
(1) Record the initial time $t_0$, final time $t_f$, minimum stepsize $h_{min}$,
maximum stepsize $h_{max}$, and source breakpoints.
(2) Set the initial timestep $h = h_{min}$.
(3) Compute $X_1$ at $t_1 = t_0 + h$, $X_2$ at $t_2 = t_0 + 2h$, and $X_3$ at
$t_3 = t_0 + 3h$.
(4) Set n = 3 and compute LTE by Eq. (4.33).
(5) Compute hn = h - . If hn < 0.6h, then h = hn and go to (3);
otherwise, continue.
(6) Compute $t_{n+1} = t_n + h_n$. If $t_{n+1}$ does not exceed a source
breakpoint, then go to (7). If $t_{n+1}$ exceeds a source breakpoint, then $h_n$
is reduced such that the value $t_{n+1}$ coincides with the breakpoint. Compute
$X_{n+1}$ for this breakpoint. Compute LTE by Eq. (4.33) and compute $h_{n+1}$; if
$h_{n+1} < 0.6h_n$, then set $h_n = h_{n+1}$ and go to (6); otherwise, set $h = h_{min}$ and
$t_0 = t_{n+1}$, then go to (3).
(7) Compute $X_{n+1}$. Compute LTE by Eq. (4.33) and compute $h_{n+1}$; if
$h_{n+1} < 0.6h_n$, then set $h_n = h_{n+1}$ and go to (6); otherwise, continue.
(8) If $t_{n+1} > t_f$, then stop; if not, then set n = n+1 and go to (6).
Remark: The above algorithm has been derived for a fixed-order, variable
stepsize method which uses the trapezoidal rule. Our simulation results
show that the problems mentioned earlier in this chapter are resolved by
this algorithm. If other fixed-order methods are to be used, then the
corresponding equations should be modified.
V. TEARING METHODS AND SPARSITY CONSIDERATIONS FOR NODE TEARING METHOD
There are two kinds of tearing methods - the branch tearing method and
the node tearing method. The idea of branch tearing was first introduced
by Kron [12]. Recently Chua and Chen [16] have shown that branch
tearing is just a special case of generalized hybrid analysis. The main
idea of branch tearing is to select a set of tearing branches first; then
the given network is torn apart into several subnetworks by removing these
tearing branches (Fig. 5.1), each subnetwork is analyzed separately, and
the solution of the entire network is obtained by combining the solutions of
the subnetworks via the tearing branches. Algebraically, this method is
equivalent to a particular ordering of the hybrid analysis equations such
that the resulting matrix has a bordered block-diagonal structure (Fig.
5.2). Each block corresponds to a subnetwork, and the border corresponds
to the interconnections of the subnetworks.
The idea of node tearing was first introduced by
Sangiovanni-Vincentelli, Chen and Chua [20]. The main idea is to select a
set of tearing nodes first, then the network is torn apart into several
subnetworks by removing these tearing nodes (Fig. 5.3), each subnetwork is
analyzed separately, and the solution of the entire network is obtained via
the tearing nodes. Algebraically, this method is equivalent to a
particular ordering of nodal analysis equations such that the resulting
matrix has a bordered block-diagonal structure (Fig. 5.4). Each block
corresponds to a subnetwork, and the border corresponds to the
interconnections of the subnetworks.
Fig. 5.2 Bordered Block-Diagonal Matrix Formulated by Branch Tearing.

Fig. 5.3 Example of a Network Partitioned into Three Subnetworks by the Node Tearing Method.

Fig. 5.4 Bordered Block-Diagonal Matrix Formulated by Node Tearing.
Recently, because tearing methods possess several advantages over
conventional circuit analysis methods, a lot of effort has been devoted to
the study of tearing methods for the analysis of large scale circuits. The
advantages of tearing methods are as follows: first, tearing methods are
suitable for the exploitation of the repetitiveness of a limited number of
subnetworks; secondly, tearing methods are suitable for the exploitation
of latency; thirdly, tearing methods are suitable for parallel processing.
In order to solve the network by tearing methods one must specify a
partitioning strategy and a technique for solving
the partitioned equations. In the literature mostly the branch tearing
method has been used to solve large networks [22-24]. The solution
strategy has been to estimate the current or voltage at each tearing port
and to excite the torn subnetworks with independent sources at these ports.
The remaining port responses are computed and substituted into the
interconnection equations. If these equations are not satisfied, then
another estimate is made of the variables chosen as port excitations. This
iterative procedure continues until convergence is achieved. If the
subnetworks are nonlinear, a multilevel iteration scheme is used, such as a
Gauss-Seidel [22], Newton-SOR [24] or a multilevel Newton iteration [42].
However, the first two iteration schemes do not have second-order
convergence, while the third scheme requires the computation of an
additional Jacobian matrix.
Because the above approach introduces new variables, such as tearing
branch currents, the complexity of the problem is increased. Also, a
multilevel iteration scheme is required. Another disadvantage of the
branch tearing method is that each subnetwork must contain the datum node
for the network, or else a local datum node must be chosen for each
subnetwork. In the program SLATE a different approach is used, and the
internal subnetwork variables are eliminated from the tearing node
equations. Then the tearing node voltages are computed. Only a one-level
Newton iteration is required, and the internal variables for each
subnetwork can be eliminated using parallel processing methods. The
elimination of the internal variables is equivalent to replacing the
subnetworks by Norton equivalent circuits at the tearing nodes.

In our study of tearing methods it was assumed that the user specifies
the subnetworks, and any part of the network not specified as a subnetwork
is automatically included in the subnetwork called rest of network in Figs.
5.1 and 5.3. The subnetworks are processed first in the solution
algorithm, and the tearing branches or nodes along with the rest of network
equations are processed last. Thus, the branch tearing method and the node
tearing method described in Section 5.1 and Section 5.2 are somewhat
different from those found in the literature [12-14]. In Section 5.3, a
comparison between branch tearing and node tearing is given. In Section
5.4, the derivation of the construction of the node tearing matrix from
subnetworks is detailed. The sparsity considerations for the node tearing
method are presented in Section 5.5. The implementation of node tearing is
described in Section 5.6. The circuit interpretation of the tearing
methods is given in Section 5.7. Some conclusions are given in Section
5.8.
5.1. Derivation of the Branch Tearing Method
Let N be a connected network having (n+1) nodes: the datum node $n_0$
and the nondatum nodes $\alpha = \{n_1, n_2, \ldots, n_n\}$, and b branches,
$\beta = \{b_1, b_2, \ldots, b_b\}$. Let the branch voltages, branch currents and the
node-to-datum voltages be denoted by $e = (e_1, e_2, \ldots, e_b)^T$,
$i = (i_1, i_2, \ldots, i_b)^T$ and $v = (v_1, v_2, \ldots, v_n)^T$ respectively. Let the
interconnection be defined as the remaining part of the network when all
the subnetworks are removed, that is, the set of tearing branches and the
rest of the network in Fig. 5.1. Let us assume that a proper set of
tearing branches has been chosen such that there is no mutual coupling
either among the torn subnetworks or between the torn subnetworks and the
interconnection, and that all subnetworks contain the common datum node.
This latter assumption is made in order to avoid floating subnetworks, which
result in singular submatrices. The more general case when all subnetworks
do not contain a common datum node is discussed in [17]. Subscripts s, t
and r are used to denote quantities pertaining to the subnetworks, the
tearing branches and the remaining branches, respectively, so the branch
set $\beta$ is partitioned into three subsets $\beta_s$, $\beta_t$ and $\beta_r$ (Fig. 5.1), and the
node set $\alpha$ is partitioned into two subsets $\alpha_s$ and $\alpha_r$. This yields the
following special structures for the reduced incidence matrix A and the
branch conductance matrix G of network N:

$$ A = \begin{array}{c|ccc} & \beta_s & \beta_t & \beta_r \\ \hline \alpha_s & A_s & A_t & 0 \\ \alpha_r & 0 & A_{rt} & A_{rr} \end{array} \tag{5.1} $$

$$ G = \begin{array}{c|ccc} & \beta_s & \beta_t & \beta_r \\ \hline \beta_s & G_s & 0 & 0 \\ \beta_t & 0 & G_t & 0 \\ \beta_r & 0 & 0 & G_r \end{array} \tag{5.2} $$
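Let the torn network have k subnetworks $N_1, N_2, \ldots, N_k$. Let $\alpha_s$
and $\beta_s$ be partitioned correspondingly into k subsets $\alpha_1, \alpha_2, \ldots, \alpha_k$ and
$\beta_1, \beta_2, \ldots, \beta_k$ respectively. With this partitioning, the node-to-branch
incidence matrix A, with rows ordered $\alpha_1, \ldots, \alpha_k, \alpha_r$ and columns ordered
$\beta_1, \ldots, \beta_k, \beta_t, \beta_r$, can be written as

$$ A = \begin{bmatrix} A_{s1} & & & & A_{t1} & 0 \\ & A_{s2} & & & A_{t2} & 0 \\ & & \ddots & & \vdots & \vdots \\ & & & A_{sk} & A_{tk} & 0 \\ 0 & 0 & \cdots & 0 & A_{rt} & A_{rr} \end{bmatrix} \tag{5.3} $$

and the branch conductance matrix G, with rows and columns ordered
$\beta_1, \ldots, \beta_k, \beta_t, \beta_r$, can be written as

$$ G = \begin{bmatrix} G_{s1} & & & & & \\ & G_{s2} & & & & \\ & & \ddots & & & \\ & & & G_{sk} & & \\ & & & & G_t & \\ & & & & & G_r \end{bmatrix} \tag{5.4} $$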
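The network variables are constrained by Kirchhoff's current law
(KCL), Kirchhoff's voltage law (KVL) and the branch constraint relations
(BC) [28].

$$ \text{(KCL)} \qquad A_s i_s + A_t i_t = 0 \tag{5.5} $$
$$ A_{rt} i_t + A_{rr} i_r = 0 \tag{5.6} $$
$$ \text{(KVL)} \qquad e_s = A_s^T v_s \tag{5.7} $$
$$ e_t = A_t^T v_s + A_{rt}^T v_r \tag{5.8} $$
$$ e_r = A_{rr}^T v_r \tag{5.9} $$
$$ \text{(BC)} \qquad i_s = I_{ss} + G_s e_s - G_s e_{ss} \tag{5.10} $$
$$ e_t = e_{ts} + Z_t i_t - Z_t i_{ts} \tag{5.11} $$
$$ i_r = I_{rs} + G_r e_r - G_r e_{rs} \tag{5.12} $$

Substituting Eqs. (5.7), (5.8) and (5.9) into Eqs. (5.10) and (5.12), and
then substituting the results into Eqs. (5.5) and (5.6), we obtain

$$ A_s G_s A_s^T v_s + A_t i_t = J_{ss} \tag{5.13} $$
$$ A_{rt} i_t + A_{rr} G_r A_{rr}^T v_r = J_{rs} \tag{5.14} $$

Substituting Eq. (5.8) into Eq. (5.11), we obtain

$$ A_t^T v_s - Z_t i_t + A_{rt}^T v_r = E_{ts} \tag{5.15} $$

where $J_{ss} = A_s G_s e_{ss} - A_s I_{ss}$, $J_{rs} = A_{rr} G_r e_{rs} - A_{rr} I_{rs}$
and $E_{ts} = e_{ts} - Z_t i_{ts}$.

Eqs. (5.13), (5.14) and (5.15) can be rewritten in the following form:

$$ \begin{bmatrix} A_s G_s A_s^T & A_t & 0 \\ A_t^T & -Z_t & A_{rt}^T \\ 0 & A_{rt} & A_{rr} G_r A_{rr}^T \end{bmatrix} \begin{bmatrix} v_s \\ i_t \\ v_r \end{bmatrix} = \begin{bmatrix} J_{ss} \\ E_{ts} \\ J_{rs} \end{bmatrix} \tag{5.16} $$

Let us now examine the term $A_s G_s A_s^T$ in Eq. (5.16). Substituting A of Eq.
(5.3) and G of Eq. (5.4) into $A_s G_s A_s^T$, we obtain

$$ A_s G_s A_s^T = \begin{bmatrix} Y_{s1} & & & \\ & Y_{s2} & & \\ & & \ddots & \\ & & & Y_{sk} \end{bmatrix} \tag{5.17} $$

where $Y_{sj} = A_{sj} G_{sj} A_{sj}^T$, j = 1, 2, ..., k,
so Eq. (5.16) can be rewritten as:

$$ \begin{bmatrix} Y_{s1} & & & & A_{t1} & 0 \\ & Y_{s2} & & & A_{t2} & 0 \\ & & \ddots & & \vdots & \vdots \\ & & & Y_{sk} & A_{tk} & 0 \\ A_{t1}^T & A_{t2}^T & \cdots & A_{tk}^T & -Z_t & A_{rt}^T \\ 0 & 0 & \cdots & 0 & A_{rt} & Y_{rr} \end{bmatrix} \begin{bmatrix} v_{s1} \\ v_{s2} \\ \vdots \\ v_{sk} \\ i_t \\ v_r \end{bmatrix} = \begin{bmatrix} J_{ss1} \\ J_{ss2} \\ \vdots \\ J_{ssk} \\ E_{ts} \\ J_{rs} \end{bmatrix} \tag{5.18} $$

where $Y_{rr} = A_{rr} G_r A_{rr}^T$. Eq. (5.18) is the matrix equation that results from the branch tearing
method; it has the desirable bordered block-diagonal structure and is
a particular ordering of the hybrid analysis equations.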
5.2. Derivation of the Node Tearing Method

Let the interconnection be defined as the remaining part of the
network when all the subnetworks are removed, that is, the set of tearing
nodes and the rest of network in Fig. 5.3. Let us assume that a proper
set of tearing nodes has been chosen such that no coupling exists either
among the torn subnetworks or between the torn subnetworks and the
interconnection. The node set $\alpha$ is partitioned into three subsets $\alpha_s$, $\alpha_t$
and $\alpha_r$, and the branch set $\beta$ is partitioned into two subsets $\beta_s$ and $\beta_r$
(Fig. 5.3). This yields the following special structures for the reduced
incidence matrix A and the branch conductance matrix G of the network N:

$$ A = \begin{array}{c|cc} & \beta_s & \beta_r \\ \hline \alpha_s & A_s & 0 \\ \alpha_t & A_{ts} & A_{tr} \\ \alpha_r & 0 & A_r \end{array} \tag{5.19} $$

$$ G = \begin{array}{c|cc} & \beta_s & \beta_r \\ \hline \beta_s & G_s & 0 \\ \beta_r & 0 & G_r \end{array} \tag{5.20} $$

Let the torn network have k subnetworks $N_1, N_2, \ldots, N_k$. Let $\alpha_s$
and $\beta_s$ be partitioned correspondingly into k subsets $\alpha_1, \alpha_2, \ldots, \alpha_k$
and $\beta_1, \beta_2, \ldots, \beta_k$ respectively. With this partitioning, the node-branch
incidence matrix A, with rows ordered $\alpha_1, \ldots, \alpha_k, \alpha_t, \alpha_r$ and columns ordered
$\beta_1, \ldots, \beta_k, \beta_r$, can be written as:

$$ A = \begin{bmatrix} A_{s1} & & & & 0 \\ & A_{s2} & & & 0 \\ & & \ddots & & \vdots \\ & & & A_{sk} & 0 \\ A_{ts1} & A_{ts2} & \cdots & A_{tsk} & A_{tr} \\ 0 & 0 & \cdots & 0 & A_r \end{bmatrix} \tag{5.21} $$

The branch conductance matrix G can be written as:

$$ G = \begin{bmatrix} G_{s1} & & & & \\ & G_{s2} & & & \\ & & \ddots & & \\ & & & G_{sk} & \\ & & & & G_r \end{bmatrix} \tag{5.22} $$
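The network variables are constrained by Kirchhoff's current law
(KCL), Kirchhoff's voltage law (KVL) and the branch constraint relations
(BC).

$$ \text{(KCL)} \qquad A_s i_s = 0 \tag{5.23} $$
$$ A_{ts} i_s + A_{tr} i_r = 0 \tag{5.24} $$
$$ A_r i_r = 0 \tag{5.25} $$
$$ \text{(KVL)} \qquad e_s = A_s^T v_s + A_{ts}^T v_t \tag{5.26} $$
$$ e_r = A_{tr}^T v_t + A_r^T v_r \tag{5.27} $$
$$ \text{(BC)} \qquad i_s = I_{ss} + G_s e_s - G_s e_{ss} \tag{5.28} $$
$$ i_r = I_{rs} + G_r e_r - G_r e_{rs} \tag{5.29} $$

Substituting Eqs. (5.26) and (5.27) into Eqs. (5.28) and (5.29), and then
substituting the results into Eqs. (5.23), (5.24) and (5.25), we obtain

$$ A_s G_s A_s^T v_s + A_s G_s A_{ts}^T v_t = J_{ss} \tag{5.30} $$
$$ A_{ts} G_s A_s^T v_s + \left( A_{ts} G_s A_{ts}^T + A_{tr} G_r A_{tr}^T \right) v_t + A_{tr} G_r A_r^T v_r = J_{ts} \tag{5.31} $$
$$ A_r G_r A_{tr}^T v_t + A_r G_r A_r^T v_r = J_{rs} \tag{5.32} $$

where $J_{ss} = A_s G_s e_{ss} - A_s I_{ss}$,
$J_{ts} = A_{ts} G_s e_{ss} - A_{ts} I_{ss} + A_{tr} G_r e_{rs} - A_{tr} I_{rs}$
and $J_{rs} = A_r G_r e_{rs} - A_r I_{rs}$.

Eqs. (5.30), (5.31) and (5.32) can be rewritten as:

$$ \begin{bmatrix} A_s G_s A_s^T & A_s G_s A_{ts}^T & 0 \\ A_{ts} G_s A_s^T & A_{ts} G_s A_{ts}^T + A_{tr} G_r A_{tr}^T & A_{tr} G_r A_r^T \\ 0 & A_r G_r A_{tr}^T & A_r G_r A_r^T \end{bmatrix} \begin{bmatrix} v_s \\ v_t \\ v_r \end{bmatrix} = \begin{bmatrix} J_{ss} \\ J_{ts} \\ J_{rs} \end{bmatrix} \tag{5.33} $$

Let us now examine the term $A_s G_s A_s^T$ in Eq. (5.33). Substituting the A of
Eq. (5.21) and G of Eq. (5.22) into $A_s G_s A_s^T$, we obtain

$$ A_s G_s A_s^T = \begin{bmatrix} Y_{s1} & & & \\ & Y_{s2} & & \\ & & \ddots & \\ & & & Y_{sk} \end{bmatrix} \tag{5.34} $$

where $Y_{sj} = A_{sj} G_{sj} A_{sj}^T$, j = 1, 2, ..., k,
so Eq. (5.33) can be rewritten as:

$$ \begin{bmatrix} Y_{s1} & & & & Y_{st1} & 0 \\ & Y_{s2} & & & Y_{st2} & 0 \\ & & \ddots & & \vdots & \vdots \\ & & & Y_{sk} & Y_{stk} & 0 \\ Y_{ts1} & Y_{ts2} & \cdots & Y_{tsk} & Y_{tt} & Y_{tr} \\ 0 & 0 & \cdots & 0 & Y_{rt} & Y_{rr} \end{bmatrix} \begin{bmatrix} v_{s1} \\ v_{s2} \\ \vdots \\ v_{sk} \\ v_t \\ v_r \end{bmatrix} = \begin{bmatrix} J_{ss1} \\ J_{ss2} \\ \vdots \\ J_{ssk} \\ J_{ts} \\ J_{rs} \end{bmatrix} \tag{5.35} $$

where
$Y_{stj} = A_{sj} G_{sj} A_{tsj}^T$, j = 1, 2, ..., k,
$Y_{tsj} = A_{tsj} G_{sj} A_{sj}^T$, j = 1, 2, ..., k,
$Y_{tt} = A_{ts} G_s A_{ts}^T + A_{tr} G_r A_{tr}^T$,
$Y_{tr} = A_{tr} G_r A_r^T$,
$Y_{rt} = A_r G_r A_{tr}^T$,
and $Y_{rr} = A_r G_r A_r^T$.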
Eq. (5.35) is the matrix equation that results from the node tearing method; it has
the desirable bordered block-diagonal structure and is a particular
ordering of the nodal analysis equations. In the above derivation the
modified nodal method could have been used. In this case the vectors $v_s$
and $v_r$ consist of both node voltages and currents of branches for which an
admittance description presents difficulties. This is actually the
formulation used in the program SLATE.
5.3. Comparison of the Branch Tearing Method with the Node Tearing Method

As described above, branch tearing is equivalent to a particular
ordering of the hybrid equations, and node tearing is equivalent to a
particular ordering of the nodal equations, so branch tearing requires the
use of tearing branch currents as extra variables. As a result of the
above property, node tearing possesses the following advantages over branch
tearing:
(1) the dimension of the matrix formulated by node tearing is smaller
than that formulated by branch tearing;
(2) the number of nonzero entries in the matrix formulated by node
tearing is smaller than that formulated by branch tearing [20];
(3) for passive networks, node tearing generates a diagonally-dominant
matrix, so any application of the Gaussian elimination method with diagonal
pivoting is stable, while this is not the case in the branch tearing
method;
(4) usually, in the analysis of large scale circuits, node tearing
preserves the identities of the resulting torn subnetworks, while branch
tearing sometimes destroys the identities of the torn subnetworks.
However, one can generate examples in which the opposite is true, but these
situations were not encountered in our examples.

The above comparison is not conclusive; although the dimension of
the matrix formulated by branch tearing is larger than that of node
tearing, the extra nonzero entries are either +1 or -1. If this property
is fully exploited, then node tearing may not be so advantageous. But full
exploitation of the property that extra entries are either +1 or -1
requires a much more complicated sparse matrix technique. So node tearing
is preferred and is used in the program SLATE.
5.4. Constructing the Node Tearing Matrix from Subnetworks

In Section 5.2, the derivation of node tearing is given; however, in
the real implementation we do not want to solve the whole matrix equation
at one time. We would like to process each subnetwork separately, and then
obtain the solution of the entire network by combining the results of
the subnetwork processing.

In the following the procedure of constructing the node tearing matrix
from subnetworks is detailed. Let us consider one subnetwork $N_i$. Let the
tearing nodes which are connected to $N_i$ be denoted by $\alpha_{ti}$, the node voltages
of $\alpha_{ti}$ be denoted by $v_{ti}$, the nodes of $N_i$ be denoted by $\alpha_i$, the node
voltages of $\alpha_i$ be denoted by $v_{si}$, and the currents which represent the
relationship of the rest of the network with this subnetwork be denoted by
$J_{ti}$ (Fig. 5.5). $v_{ti}$ and $J_{ti}$ satisfy the following relations:

$$ \bigcup_i v_{ti} = v_t \tag{5.36} $$

$$ \sum_i J_{ti} = 0 \quad \text{(by KCL)} \tag{5.37} $$
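The node equations for this subnetwork have the following form:

$$ \begin{bmatrix} Y_{si} & Y_{sti} \\ Y_{tsi} & Y_{tti} \end{bmatrix} \begin{bmatrix} v_{si} \\ v_{ti} \end{bmatrix} = \begin{bmatrix} J_{ssi} \\ J_{tsi} + J_{ti} \end{bmatrix} \tag{5.38} $$

By augmenting with appropriate zeros to match the dimension and summing Eq.
(5.38) for all the subnetworks and the interconnection, we obtain

$$ \begin{bmatrix} Y_{s1} & & & & Y_{st1} & 0 \\ & Y_{s2} & & & Y_{st2} & 0 \\ & & \ddots & & \vdots & \vdots \\ & & & Y_{sk} & Y_{stk} & 0 \\ Y_{ts1} & Y_{ts2} & \cdots & Y_{tsk} & Y_{tt} & Y_{tr} \\ 0 & 0 & \cdots & 0 & Y_{rt} & Y_{rr} \end{bmatrix} \begin{bmatrix} v_{s1} \\ v_{s2} \\ \vdots \\ v_{sk} \\ v_t \\ v_r \end{bmatrix} = \begin{bmatrix} J_{ss1} \\ J_{ss2} \\ \vdots \\ J_{ssk} \\ J_{ts} \\ J_{rs} \end{bmatrix} \tag{5.39} $$

We can see that Eq. (5.39) is identical to Eq. (5.35).
Eq. (5.39) is solved by first eliminating all the $Y_{tsi}$ to obtain the
interconnection matrix equations

$$ \begin{bmatrix} Y^*_{tt} & Y_{tr} \\ Y_{rt} & Y_{rr} \end{bmatrix} \begin{bmatrix} v_t \\ v_r \end{bmatrix} = \begin{bmatrix} J^*_{ts} \\ J_{rs} \end{bmatrix} \tag{5.40} $$

where $Y^*_{tt} = Y_{tt} - \sum_{i=1}^{k} Y_{tsi} Y_{si}^{-1} Y_{sti}$
and $J^*_{ts} = J_{ts} - \sum_{i=1}^{k} Y_{tsi} Y_{si}^{-1} J_{ssi}$.

Eq. (5.40) is solved to obtain solutions for $v_t$ and $v_r$, and then the
solution $v_{si}$ can be obtained by using backward substitution.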
The above solution procedure can be modified to enable us to process
each subnetwork separately to obtain the interconnection matrix equations.
Let us consider Eq. (5.38) again. After eliminating $Y_{tsi}$, we have

$$ \begin{bmatrix} U_{si} & L_{si}^{-1} Y_{sti} \\ 0 & Y^*_{tti} \end{bmatrix} \begin{bmatrix} v_{si} \\ v_{ti} \end{bmatrix} = \begin{bmatrix} L_{si}^{-1} J_{ssi} \\ J^*_{tsi} + J_{ti} \end{bmatrix} \tag{5.41} $$

where $L_{si} U_{si} = Y_{si}$,
$Y^*_{tti} = Y_{tti} - Y_{tsi} Y_{si}^{-1} Y_{sti}$
and $J^*_{tsi} = J_{tsi} - Y_{tsi} Y_{si}^{-1} J_{ssi}$.

The partial interconnection matrix equations obtained from Eq. (5.41) are

$$ Y^*_{tti} v_{ti} = J^*_{tsi} + J_{ti} \tag{5.42} $$

By augmenting the above arrays with appropriate zeros to match the
dimension of $Y_{tt}$ (this is only conceptual, because sparse matrix techniques
are used and every element is put in the appropriate location in a
one-dimensional array) and summing up Eq. (5.42) for all the subnetworks and
the interconnection, we obtain Eq. (5.40) again.

Since $\sum_i J_{ti} = 0$ by KCL, $J_{ti}$ does not appear in the final
matrix equations (5.39) and (5.40), so we can neglect $J_{ti}$ in both Eqs.
(5.38) and (5.42).
So the modified solution procedure is as follows:
(1) Formulate the simplified Eq. (5.38) for each subnetwork, i.e.

$$ \begin{bmatrix} Y_{si} & Y_{sti} \\ Y_{tsi} & Y_{tti} \end{bmatrix} \begin{bmatrix} v_{si} \\ v_{ti} \end{bmatrix} = \begin{bmatrix} J_{ssi} \\ J_{tsi} \end{bmatrix} \tag{5.43} $$

(2) Process Eq. (5.43) to obtain the simplified Eq. (5.42) for each
subnetwork, i.e.

$$ Y^*_{tti} v_{ti} = J^*_{tsi} \tag{5.44} $$

(3) Sum up Eqs. (5.44) for all subnetworks and the interconnection
with appropriate dimension match to obtain Eq. (5.40);
(4) Solve Eq. (5.40) to obtain $v_t$ and $v_r$;
(5) Solve the upper part of Eq. (5.41) by backward substitution to
obtain $v_{si}$ for each subnetwork $N_i$.
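The five steps above amount to forming a Schur complement per subnetwork, summing the reduced contributions at the tearing nodes, and back-substituting. The following sketch illustrates this on a deliberately small case and is not the SLATE implementation: it assumes a single tearing node, no rest-of-network nodes, dense exact arithmetic instead of sparse LU factors, and made-up conductance values. All function names and the example data are hypothetical.

```python
from fractions import Fraction

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(a)
    m = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if m[r][c] != 0)   # first usable pivot row
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [x / pivot for x in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [row[n] for row in m]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def reduce_subnetwork(y_s, y_st, y_ts, y_tt, j_ss, j_ts):
    """Eliminate the internal nodes of one subnetwork.

    Returns the reduced tearing-node admittance and source
    (Y*_tti = Y_tti - Y_tsi Y_si^-1 Y_sti, J*_tsi = J_tsi - Y_tsi Y_si^-1 J_ssi).
    """
    y_star = y_tt - dot(y_ts, solve(y_s, y_st))
    j_star = j_ts - dot(y_ts, solve(y_s, j_ss))
    return y_star, j_star

def solve_torn(subnets):
    """Sum the reduced equations, solve for the tearing node, then back-substitute."""
    reduced = [reduce_subnetwork(*s) for s in subnets]
    v_t = sum(j for _, j in reduced) / sum(y for y, _ in reduced)
    v_s = []
    for y_s, y_st, _, _, j_ss, _ in subnets:
        rhs = [j - c * v_t for j, c in zip(j_ss, y_st)]
        v_s.append(solve(y_s, rhs))
    return v_s, v_t

# Two resistive subnetworks sharing one tearing node (illustrative values only).
# Each tuple is (Y_s, Y_st, Y_ts, Y_tt contribution, J_ss, J_ts contribution).
SUBNETS = [
    ([[3, -2], [-2, 3]], [0, -1], [0, -1], 1, [1, 0], 0),
    ([[2]], [-1], [-1], 1, [2], 0),
]
V_S, V_T = solve_torn(SUBNETS)

# The same network assembled as one nodal matrix, for comparison.
FULL = [[3, -2, 0, 0], [-2, 3, 0, -1], [0, 0, 2, -1], [0, -1, -1, 2]]
DIRECT = solve(FULL, [1, 0, 2, 0])
```

Each call to `reduce_subnetwork` touches only one subnetwork's data, which is what makes the reduction step suitable for separate or parallel processing.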
5.5. Sparsity Considerations for the Node Tearing Method
Now let us compare different ways of sparsity exploitation for
processing Eq. (5.43). For the solution procedure discussed in the
previous section, both LU factorization and substitution procedures are
required. In SLATE the modified nodal equation formulation and the new
reordering strategy described in Chapter 2 are used at the subnetwork
level. After the current variables and the corresponding 'positive' node
voltage variables are eliminated, the final subnetwork matrix is
structurally symmetric. So here we assume that the subnetwork matrix is
structurally symmetric. Under this assumption, there are two possible LU
factorization procedures we would like to compare. There are other
procedures described in [38], but these procedures are either equivalent to
them or are less efficient.
The two LU factorization procedures are denoted by $F_1$ and $F_2$ [38], and
are given in Table 5.1.
In Table 5.1, $L_{sk} U_{sk} = Y_{sk}$, $V = L_{sk}^{-1} Y_{stk}$,
$W^T = Y_{tsk} U_{sk}^{-1}$,
and $L_{tk} U_{tk} = Y^*_{ttk} = Y_{ttk} - Y_{tsk} U_{sk}^{-1} L_{sk}^{-1} Y_{stk}$.

Table 5.1 Possible Factorization Procedures.
In $F_2$ only those rows of V which are required to compute $V_p$ are
computed; these rows form $V_{req}$. $V_p$ consists of those rows of V corresponding to the nonzero
columns of $Y_{tsk}$.

Let $|\cdot|$ denote the number of nonzero elements in a vector or a matrix,
and $M(\cdot j)$ and $M(j \cdot)$ the jth column and the jth row of matrix M,
respectively. Let $n_t$ denote the number of tearing nodes, and let B(k)
satisfy the following relation.

$$ B(k) = \begin{cases} 0, & k = 0 \\ 1, & k \ge 1,\ k \text{ integer} \end{cases} \tag{5.45} $$

The following lemmas are used to compare the number of operations between
$F_1$ and $F_2$.
Lemma 5.1. Suppose the subnetwork matrix is structurally symmetric; then
the number of rows of $V_{req}$ equals the number of nonzero rows of V.
Proof: From the definition of $V_p$, we know that any nonzero row of V which
is not a row of $V_p$ must consist of only fill-ins. Suppose it is row i;
then there must be a nonzero row j of V, j < i, and a nonzero $L_{sk,ij}$. Row
j together with $L_{sk,ij}$ creates the fill-ins of row i. Due to structural
symmetry, there also must be a nonzero $U_{sk,ji}$. So in order to evaluate
row j of $V_p$, it is necessary to evaluate row i of V.
Lemma 5.2. Suppose the subnetwork matrix is structurally symmetric and $Y_{sk}$
is m x m; then the difference in the number of operations between $F_1$ and $F_2$
(DNF) is:
DNF = LI v- ) t I I (l5 () 1I(j.)l :(j=1 j.1
U j=2 i-j+1 j
if V rqis full, thenr eq
DNF Lj-i) t~.(i-)I - E dL 8ki) -1)n- 1y(iiI)B(Iy(i.)I)j21. j.1
t- yLtskl (5.47)
5Proof: DNF E (J .~ - 1)IW(j.)I + z IwT(.j)I Iv(i)I
j=2 i~jl re j=1 req
E E~i1 -u6~iI~ e~-I-Z ys -~j e j (5.49)
From steutua 5.1, ety obtain ai
I IW~req - 2 !V-)I Bjy(*) (5.52)
Sustttn Eqs (5.9) (5.0) (5.51) an(55)5noE.1
(5.48)ma5., we obtainEq (54)
-req
Substituting Eq. (5.53) into Eq. (5.46), we obtain Eq. (5.47).
Associated with F1 and F2, there are five possible substitution
procedures denoted by Si , Sl, S1 , S2 , and S. [38] which are given in
Table 5.2, where $b_{req}$ consists of the rows of $b$ which are required to
compute $b_p$, and $b_p$ consists of those rows of $b$ corresponding to the
nonzero columns of $Y_{tsk}$.
Let C(.) denote the number of operations required in performing a
given procedure S. By comparing the entries of the five substitution
procedures in Table 5.2, the following equation is obtained.
C(S1 ) - C(S) C(s 2 ) - C(S 2 )(.54)
In large-scale integrated circuits, most of the devices are nonlinear.
Due to the current sources generated by Newton-Raphson iteration and
numerical integration, it is reasonable to assume that the source vectors
$J_{ssk}$ and $J_{tsk}$ are full. Also, since $v_{sk}$ and $v_{tk}$ are the required node
voltages, it is reasonable to assume that they are full. Under the above
assumptions, we obtain the following lemmas.
Lemma 5.. a, b, y, and ^ are full.
Lemma 5.4. Suppose that the subnetwork matrix is structurally symmetric,
then the number of rows of b req equals the number of nonzero rows of V.
Lemma 5.5. Suppose that the subnetwork matrix is structurally symmetric
and $Y_{sk}$ is $m \times m$, then the difference in the number of operations between S1
and S1 (DNS1) is:
[Table 5.2: the five possible substitution procedures; table body not legible in the scanned source.]
DNS1 = C(S1 ) - C(S1 )
m mm
='1 Vsk j=l'-tskm
=IV1-l Y s I -!: (JUsk ( j ) I - 1 ) B ( Qj ) I
m
=(number of fill-ins in V) - (IlUsk(j.) l-)B(V(j )I) (5.55)
Lemma 5.6. Suppose that the subnetwork matrix is structurally symmetric
and Ysk is mxm, then the difference of number of operations between S1 and
S I (DNS2) is:
DNS2 = C(S1 ) - C(SIn t n t m
" DNS1 j I'V('j)I-jIIY tk('J) II L sk(.'B(IV(j')m m
DNS1 lvl-lYtskl-j--l" (lusl,(J') -I)B((V(J')[)-j-B(fV(j.)I)m
2DNS1 - IVB(jV(j-)I) (5.56)
Remark. From Lemma 5.6 we can conclude that if C(SI) < C(SI) then
C(Si) < C(S1 ), that is, only when C(S1 ) > C(S1 ) do we need to compare Sl
with Sr*.
Lemma 5.7. Suppose that the subnetwork matrix is structurally symmetric
and $Y_{sk}$ is $m \times m$, then
C(S1*) < C(S 2 ) (5.58)
Proof: From Eq. (5.54), we obtain
C(S1) 1 C(S2) =C(S) C(S2 ) (5.59)
So we only need to prove C(SI ) - C(S2 ) < 0
+M
=j-l(IUsk(i) -1)(B(IV(J')I)-Z 2 (j ) I) (5.60)
SinceI I (j) I-B(IV(j
" I[) (5.61)
andIz2(j) IlIz4 (j) 1 (5.62)
so we have
[ IZ (j ) I B ( V ~j ") 1) (5.63)
Substituting Eq. (5.63) into Eq. (5.60), we obtain
C(S*) - C(S*) " 0 (5.64)
From Lemma 5.7, we know that S2 and S2 are not as efficient as the
other procedures, so they are eliminated from the list of possible
substitution procedures. Now we are left with S1 , S1 and S. The
* possible combinations of factorization methods and substitution methods are
F1 + Sl, F1 e+ SI, F1 + S1 , F2 + SI , and F2 + S1 . Theoretically we can
not eliminate any of these five combinations, because we can always come up
with a special subcircuit structure for which a particular combination
gives the best result. However, after conducting a large number of
studies, we found experimentally that F1 + S1 gives the best results for
all the practical circuits we used; moreover, F1 + S1 is well compatible
with the sparse matrix techniques and is the easiest one to implement, so
F1 + S1 was chosen to be used in the program SLATE.
In the following we would like to present a small selection of the
examples which we have analyzed by using Lemma 5.2, Lemma 5.5 and Lemma
5.6.
Example 5.1. The subcircuit used is a TTL two-input NAND gate with
parasitic resistors included (Fig. 5.6). The admittance matrix for this
subcircuit is shown in Fig. 5.7.
DNF = 20 - 23 - 39 = -42
DNS1 = 9 - 13 = -4
DNS2 = -8 - 10 = -18
so the best combination for this subcircuit is F1 + S1.
Example 5.2. The subcircuit used is an ECL two-input NOR gate with
parasitic resistors included (Fig. 5.8). The admittance matrix for this
subcircuit is shown in Fig. 5.9.
DNF = 17 - 15 - 27 = -25
DNS1 = 6 - 8 = -2
DNS2 = -4 - 6 = -10
so the best combination for this subcircuit is F1 + S1.
Example 5.3. The subcircuit used is an MOS two-input NAND gate with
Fig. 5.6 TTL Two-Input NAND Gate.
Fig. 5.7 Admittance Matrix for the Subcircuit in Fig. 5.6.
Fig. 5.8 ECL Two-Input NOR Gate.
23 22 5 6 3 8 7 9 10 11 12 4 14 15 16 17 IS 13 2 19 20 21(5) 23 1 0 X 0 0 0 0 0 X 0 0 0 0 0 0 0 0 0 0 0 0 0(6) 22 0 1 0 X 0 0 0 0 0 0 0 X 0 0 0 0 0 0 0 0 0 X
(23) 5 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0(22) 6 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
30 0 00 xO0 0 00 xO 00 00 00 00 00 008 00 0 00 X XXO0 00 00 00 00 00 00 07 00 0 00 X X X0 000 00 0 00 0 x00 090 00 00 XX X 000 00 0 00 00 F 0X
10 0 0 X 0 0 0 0 0 X X X0 0 0 0 0 0 0 0 0 0 011 0 0 0 0 X 0 0 0 X X X 0 0 0 0 0 0 0 0 0 0 012 0 0 0 0 0 0 0 0 X X X X 0 0 0 0 0 0 0 0 0 04 00 0 X 0 00 00 0 X O 0X0 0 XC 0 00 00
14 0 0 0 0 0 0 0 0 0 0 0 0 X X 0 0 0 XX 0 0 015 0 0 0 0 0 0 0 0 0 0 0 X X X 0 0 F X F0 0 016 0 0 0 0 0 0 0 0 0 0 0 0 0 0 X X X 0 0 X 0 017 0 0 0 0 0 0 0 0 0 0 0 0 0 0 X X X 0 X F 0 018 0 0 0 0 0 0 0 0 0 0 0 X 0 F X X X F FF 0 013 0 0 0 0 0 0 0 0 0 0 0 0 X X 0 0 F X FF X 02 0 0 0 00 0 XF0000X F 0X F F XFF F
19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 X F F F FX F F20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 X FF X F21 0 0 0 0 0 0 0 X 0 0 0 0 0 0 0 0 0 0 FF F X
Fig. 5.9 Admittance Matrix for the Subcircuit in Fig. 5.8.
parasitic resistors included (Fig. 5.10). The admittance matrix for this
subcircuit is shown in Fig. 5.11.
DNF = 18 - 9 - 30 = -21
DNS1 = 3 - 5 = -2
DNS2 = -4 - 7 = -11
so the best combination for this subcircuit is F1 + S1.
These three examples show that at the subnetwork level the best
combination is F1 + S1.
For the interconnection matrix, because this matrix has the same
property as that of the matrix formulated by the MNA for any circuit,
the new reordering strategy of the MNA and the Markowitz sparse matrix
scheme are used to exploit the sparsity at this level.

5.6. Implementation of the Node Tearing Method
As concluded in the previous section, the best combination at the
subcircuit level is in most cases F1 + S1. We now describe
the implementation of this approach. First, at the subcircuit level, the
source vector is appended to the matrix to form
$$ \begin{bmatrix} Y_{sk} & Y_{stk} & J_{ssk} \\ Y_{tsk} & Y_{ttk} & J_{tsk} \end{bmatrix} \tag{5.65} $$
Secondly, use the new reordering strategy of the MNA and the Markowitz
sparse matrix scheme to find the LU factorization of $Y_{sk}$. The LU
             13   2   5   4   8   9   3   7   6  10  11  12
 (2)   13     1   X   0   0   X   0   0   0   0   0   0   0
(13)    2     0   1   0   0   0   0   0   0   0   0   0   0
        5     0   0   X   X   0   0   0   0   0   0   X   0
        4     0   0   X   X   0   0   X   0   0   0   X   0
        8     0   X   0   0   X   X   0   0   0   0   0   X
        9     0   0   0   0   X   X   0   0   0   0   0   X
        3     0   0   0   X   0   0   X   X   0   0   F   0
        7     0   0   0   0   0   0   X   X   X   X   F   0
        6     0   0   0   0   0   0   0   X   X   X   F   X
       10     0   0   0   0   0   0   0   X   X   X   F   F
       11     0   0   X   X   0   0   F   F   F   F   X   F
       12     0   0   0   0   X   X   0   0   X   F   F   X

Fig. 5.11 Admittance Matrix for the Subcircuit in Fig. 5.10.
factorization operates on the whole matrix but terminates after the LU
factorization of Ysk is obtained. Now the original matrix is transformed
into I I
Lsk -sk -stk I-sk -ssk
(5.66)
Y*gZtsk Esk Lttk
Thirdly, formulate the interconnection matrix equation (5.40). Fourthly,
use the new reordering strategy of the MNA and the Markowitz sparse matrix
scheme to find the LU factorization of Eq. (5.40), and use forward and
backward substitution to find the solutions for $v_t$ and $v_r$. Fifthly, use
backward substitution to solve the upper part of Eq. (5.66) to obtain the
solutions $v_{sk}$.
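The five steps above can be sketched numerically. The following is a minimal sketch in plain Python, assuming a single subnetwork, small dense matrices stored as lists of lists, and no reordering or sparse-matrix handling. The function names (`solve`, `subnetwork_contribution`, `solve_by_node_tearing`) and the three-node resistive example are illustrative assumptions and are not taken from SLATE.

```python
def solve(A, B):
    """Solve A X = B (A is n x n, B is n x k) by Gaussian elimination
    with partial pivoting; returns X as a list of rows."""
    n, k = len(A), len(B[0])
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + k):
                M[r][j] -= f * M[c][j]
    X = [[0.0] * k for _ in range(n)]
    for r in range(n - 1, -1, -1):
        for j in range(k):
            s = M[r][n + j] - sum(M[r][c] * X[c][j] for c in range(r + 1, n))
            X[r][j] = s / M[r][r]
    return X


def subnetwork_contribution(Yss, Yst, Yts, Ytt, Js, Jt):
    """Eliminate the internal variables of one subnetwork.

    Returns the reduced tearing-node matrix and source vector, plus
    Z = Yss^-1 Yst and z = Yss^-1 Js, which are reused in back substitution.
    """
    ns, nt = len(Yss), len(Ytt)
    Zz = solve(Yss, [list(Yst[i]) + [Js[i]] for i in range(ns)])
    Z = [row[:nt] for row in Zz]
    z = [row[nt] for row in Zz]
    Y_red = [[Ytt[i][j] - sum(Yts[i][c] * Z[c][j] for c in range(ns))
              for j in range(nt)] for i in range(nt)]
    J_red = [Jt[i] - sum(Yts[i][c] * z[c] for c in range(ns))
             for i in range(nt)]
    return Y_red, J_red, Z, z


def solve_by_node_tearing(Yss, Yst, Yts, Ytt, Js, Jt):
    Y_red, J_red, Z, z = subnetwork_contribution(Yss, Yst, Yts, Ytt, Js, Jt)
    # Interconnection level: with one subnetwork this is Y_red v_t = J_red.
    v_t = [row[0] for row in solve(Y_red, [[j] for j in J_red])]
    # Back substitution for the internal node voltages: v_s = z - Z v_t.
    v_s = [z[i] - sum(Z[i][j] * v_t[j] for j in range(len(v_t)))
           for i in range(len(z))]
    return v_s, v_t


# Hypothetical example: two internal nodes, one tearing node.
v_s, v_t = solve_by_node_tearing(
    Yss=[[3.0, -1.0], [-1.0, 2.0]], Yst=[[0.0], [-1.0]],
    Yts=[[0.0, -1.0]], Ytt=[[2.0]], Js=[1.0, 0.0], Jt=[0.0])
```

With several subnetworks, the reduced matrices and source vectors of all subnetworks would be accumulated into one interconnection equation before the tearing node voltages are solved; the sketch omits that accumulation.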
Remark: The reordering of the subnetwork and network matrix equations is
done in the preprocessing phase. The subnetwork matrix equations are
reordered first. So when the network matrix equations are reordered, the
structures of all the $Y_{ttk}$'s are known. From the structures of all the $Y_{ttk}$'s
and the circuit description of the network, we can reorder the network matrix
equations by the new reordering strategy of the MNA and the Markowitz sparse
matrix scheme.

5.7. Circuit Interpretation of the Tearing Methods
Although the derivations of the tearing methods are quite
mathematical, there is a very simple circuit interpretation. The node
tearing method is just a generalized Norton equivalent circuit approach.
The branch tearing method is just a generalized Thevenin equivalent circuit
approach.
Let us consider node tearing first. Consider the special case when
there is only one tearing node. The partial interconnection matrix
equations (Eq. (5.42)) that we obtain for each subnetwork are just the
Norton equivalent circuit matrix equations for that subnetwork. This is
illustrated in Fig. 5.12(a). So for the case when there is more than one
tearing node, Eq. (5.42) is just the generalized Norton equivalent circuit
matrix equations for each subnetwork. See Fig. 5.12(b) for an illustration of
the case of two tearing nodes.
Now let us consider branch tearing for the special case when there is
only one tearing branch. The partial interconnection matrix equations that
we obtain for each subnetwork are just the Thevenin equivalent circuit
matrix equations for that subnetwork. This is illustrated in Fig.
5.13(a). So for the case when there is more than one tearing branch, the
partial interconnection matrix equations that we obtain for each subnetwork
are just the generalized Thevenin equivalent circuit matrix equations for
that subnetwork. Fig. 5.13(b) illustrates the case of two tearing branches.
5.8. Discussion and Conclusion
The reason why we considered many possible LU factorization and
substitution procedures at the subnetwork level is that at the subnetwork
level we only want to perform Gaussian elimination for the variables $v_{sk}$,
and the elimination procedure terminates after the LU factorization of $Y_{sk}$
is obtained. At the interconnection level, because we perform the Gaussian
elimination for all the variables, only one LU factorization and
substitution procedure is used.
As discussed in Section 5.5, it is possible to generate special
subcircuit structures for which some other combinations give better
results. However, our experimental results show that even for these
specially constructed circuits the difference in the number of operations
is only 1 or 2 most of the time, so this very small saving does not
justify the extra difficulties of implementation associated with these
other combinations. Moreover, for all the practical circuits we tested,
F1 + S1 showed considerable savings over all the other combinations.

VI. LATENCY EXPLOITATION

In conventional circuit simulation programs [1,2], all of the node
voltages or branch voltages and currents are calculated at each iteration
and each timepoint. Even with sparse matrix techniques the simulation of
modern large-scale integrated (LSI) circuits is not possible in many
situations due to the excessive computation time and high storage
requirements. The latency approach is a circuit analysis version of the
selective trace approach used in logic simulation. This approach takes
advantage of the fact that in some circuits only a small portion of the
circuit is active at any given time and at any iteration, and thus provides
savings in CPU time.
In the program SLATE this approach is applied at three levels: (1)
device level; (2) subnetwork level; (3) network level. At the device
level, it is also called the bypass scheme. This scheme is done by
monitoring the operating point of each nonlinear device. If the operating
point does not change significantly between timepoints or Newton-Raphson
iterations, then the device models are not reevaluated, and the matrix
entries computed at the previous timepoint or the previous iteration are
used again. This scheme is used in SPICE2 [2] and SPLICE [39]. Latency at
the subnetwork and network levels can be well exploited when tearing
methods are used to analyze the network. The tearing method used in
program SLATE is node tearing, so in the following discussion about latency
at the subnetwork and network levels, node tearing is assumed. Latency
exploitation at the subnetwork level is presented in Section 6.1. Latency
exploitation at the network level is presented in Section 6.2. In Section
6.3 a discussion is given.
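The device-level bypass scheme described above can be illustrated with a small sketch. It assumes a hypothetical `BypassedDiode` class built on the ideal diode equation, with made-up tolerance values; it is not the SPICE2, SPLICE or SLATE implementation, only an illustration of reusing the previous linearization when the operating point has barely moved.

```python
import math


class BypassedDiode:
    """Hypothetical diode model that caches its last linearization."""

    def __init__(self, i_sat=1e-14, v_thermal=0.02585, eps_a=1e-6, eps_r=1e-3):
        self.i_sat = i_sat
        self.v_thermal = v_thermal
        self.eps_a = eps_a          # absolute tolerance (assumed value)
        self.eps_r = eps_r          # relative tolerance (assumed value)
        self.last_v = None          # operating point of the cached entries
        self.cached = None          # (conductance, equivalent current source)
        self.evaluations = 0        # number of full model evaluations

    def stamp(self, v):
        """Return (g, i_eq) for the matrix, bypassing the model if possible."""
        if self.last_v is not None:
            tol = self.eps_a + self.eps_r * max(abs(v), abs(self.last_v))
            if abs(v - self.last_v) <= tol:
                return self.cached  # bypass: reuse previous matrix entries
        self.evaluations += 1
        e = math.exp(v / self.v_thermal)
        g = self.i_sat / self.v_thermal * e      # dI/dV at v
        i = self.i_sat * (e - 1.0)               # diode current at v
        self.last_v = v
        self.cached = (g, i - g * v)             # Newton companion model
        return self.cached


diode = BypassedDiode()
first = diode.stamp(0.6)
second = diode.stamp(0.6 + 1e-7)   # negligible change: model is bypassed
third = diode.stamp(0.65)          # significant change: model is reevaluated
```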
6.1. Latency Exploitation at the Subnetwork Level
As discussed in Chapter V, the node tearing method is just a
generalized Norton equivalent circuit approach. After all the internal
circuit variables of a subnetwork are eliminated, a generalized Norton
equivalent circuit of the subnetwork is obtained. Combining the equivalent
circuits of all the subnetworks with the rest of the network (Fig. 6.1),
we obtain the interconnection circuit. So after applying the node tearing
method to tear the network apart into several subnetworks, the
preprocessing of each subnetwork to obtain the contribution to the
interconnection matrix is equivalent to constructing an equivalent circuit
for each subnetwork. If the solutions of the circuit variables of a
subnetwork do not change significantly between timepoints or Newton-Raphson
iterations, then there is no need to reconstruct an equivalent circuit for
that subnetwork. The equivalent circuit constructed at the previous
timepoint or the previous iteration is used again and the subnetwork is
declared as latent. The subnetwork remains latent until the solutions of
the circuit variables of the subnetwork change significantly between the
time when it is declared latent and the present time or the present
iteration. This is the basic concept of latency at the subnetwork level.
There are two types of latency at the subnetwork level: one is
latency in the Newton-Raphson iteration, the other is latency in time.
Latency in the Newton-Raphson iteration is not natural. It is related
to the convergence property of each subnetwork and the initial guess of the
operating point for each subnetwork. Let us consider the example in Fig.
6.2. It is assumed that the input signal is constant and that a dc
analysis is required. Different subnetworks may require different numbers
Fig. 6.1 (a) A Network Partitioned into Three Subnetworks by
the Node Tearing Method.
(b) Equivalent Interconnection Circuit Obtained from
the Node Tearing Method.
of iterations to converge, for example, because the initial guess of the
operating points may be good for some subnetworks and bad for the others.
After these subnetworks converge, they are declared as latent. The
Newton-Raphson iterations continue until all the subnetworks converge.
This phenomenon is called latency in the Newton-Raphson iteration. Taking
advantage of this latency results in savings in the execution time. This
latency may not exist in some circuits, for example, in a linear circuit.
Latency in time is a natural phenomenon. All physical devices have
intrinsic delay time between excitation and response. For a large network,
when the input signal changes, it takes time for this change to propagate
to the rest of the network. Let us consider the example in Fig. 6.2
again. Let us assume that the input changes from 0 V to 5 V at time $t_0$.
Initially, probably only the first few subnetworks are not latent, and the
rest of the subnetworks are latent. As time passes, the change in the
response propagates to the intermediate subnetworks. At this time, only
the intermediate subnetworks are not latent, and the rest of the subnetworks
are latent. Finally, the change propagates to the last few subnetworks and
the rest of the subnetworks are latent. This phenomenon is called latency
in time, and it always exists in real circuits. Taking advantage of this
latency results in savings in the execution time.
In order to exploit these two types of latency, some sort of latency
criteria are required to determine if a subnetwork is latent. The latency
criteria proposed here are developed for the node tearing method. If the
branch tearing method is used, these criteria should be modified
accordingly.
Let us consider one subnetwork $N_k$. Let the tearing node voltages of
$N_k$ be denoted by $v_{tk}$, the node voltages of $N_k$ be denoted by $v_{sk}$, and
the node voltages of all the nonlinear devices be denoted by $v_{nlk}$.
First, let us consider the latency criterion in the Newton-Raphson
iteration. In principle, the solution of all the circuit variables of a
subnetwork should be checked to determine if the subnetwork is latent.
However, to check for latency in the Newton-Raphson iteration only the node
voltages of all the nonlinear devices need be checked. The latency
criterion in the Newton-Raphson iteration used in SLATE is as follows: A
subnetwork Nk is declared as latent at the ith iteration if the following
two conditions are satisfied.
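(1)
$$ |v_{nlk,m}(i-1) - v_{nlk,m}(i-2)| \le \epsilon_a + \epsilon_r \max(|v_{nlk,m}(i-1)|, |v_{nlk,m}(i-2)|), \quad m = 1, 2, \ldots \tag{6.1} $$
(2)
$$ |v_{tk,m}(i) - v_{tk,m}(i-1)| \le \epsilon_a + \epsilon_r \max(|v_{tk,m}(i)|, |v_{tk,m}(i-1)|), \quad m = 1, 2, \ldots \tag{6.2} $$
where $\epsilon_a$ and $\epsilon_r$ are absolute and relative error criteria.
The subnetwork $N_k$ will remain latent as long as
(3)
$$ |v_{tk,m}(i+j) - v_{tk,m}(i-1)| \le \epsilon_a + \epsilon_r \max(|v_{tk,m}(i+j)|, |v_{tk,m}(i-1)|), \quad m = 1, 2, \ldots, \quad j = 1, 2, \ldots \tag{6.3} $$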
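The three conditions above share one mixed absolute and relative tolerance test. A minimal sketch follows, assuming voltages are passed as plain Python lists; the function names and the default tolerance values are hypothetical and are not taken from SLATE.

```python
def within_tolerance(new, old, eps_a, eps_r):
    """Mixed absolute/relative test used by conditions (1) to (3)."""
    return abs(new - old) <= eps_a + eps_r * max(abs(new), abs(old))


def latent_in_iteration(v_nl_prev, v_nl_prev2, v_t_now, v_t_prev,
                        eps_a=1e-6, eps_r=1e-3):
    """Conditions (1) and (2): declare a subnetwork latent at iteration i.

    v_nl_prev, v_nl_prev2: nonlinear-device node voltages at i-1 and i-2.
    v_t_now, v_t_prev:     tearing node voltages at i and i-1.
    """
    cond1 = all(within_tolerance(a, b, eps_a, eps_r)
                for a, b in zip(v_nl_prev, v_nl_prev2))
    cond2 = all(within_tolerance(a, b, eps_a, eps_r)
                for a, b in zip(v_t_now, v_t_prev))
    return cond1 and cond2


def remains_latent(v_t_later, v_t_reference, eps_a=1e-6, eps_r=1e-3):
    """Condition (3): compare later tearing node voltages with the values
    stored at iteration i-1."""
    return all(within_tolerance(a, b, eps_a, eps_r)
               for a, b in zip(v_t_later, v_t_reference))
```

Only the tearing node voltages are needed once a subnetwork is latent, which is why `remains_latent` takes no internal or device voltages.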
Once a subnetwork is declared as latent in the Newton-Raphson
iteration, no linearization of the nonlinear devices of $N_k$ is required, no
preprocessing of the subnetwork to obtain the partial contribution to the
interconnection matrix is required, no backward substitution to obtain the
solutions of the internal circuit variables is required, and no convergence
tests are required. One only needs to monitor the tearing node voltages
and to bring the previous partial contribution to the interconnection
matrix.
The simulation data from SLATE show that considerable savings in
execution time are obtained and the output results are essentially the same
as those from YSPICE. Table 6.1 gives the simulation data from SLATE for
the circuit shown in Fig. 6.3. A dc analysis was performed. For this
circuit, a 42.42% latency exploitation was achieved and a 22.65% savings in
CPU time was obtained.
Remark: Because the CPU time shown in Table 6.1 is the total CPU time,
which includes the time spent in the I/O and other utility subroutines, the
savings in CPU time is not the same as the latency exploitation.
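The two percentages quoted for this circuit are consistent with the following definitions, which are inferred from the entries of Table 6.1 rather than stated explicitly in the text. The function names are illustrative.

```python
def latency_exploitation(total_solves, nonlatent_solves):
    """Percentage of (subnetwork, iteration) pairs that were skipped."""
    return 100.0 * (total_solves - nonlatent_solves) / total_solves


def cpu_savings(cpu_with_latency, cpu_reference):
    """Percentage reduction in total CPU time relative to the reference run."""
    return 100.0 * (cpu_reference - cpu_with_latency) / cpu_reference


# Entries of Table 6.1 (dc analysis of the MOS circuit in Fig. 6.3).
exploitation = latency_exploitation(231, 133)
savings = cpu_savings(3.927, 5.077)
```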
For the latency in time criteria, four schemes are proposed here. The
first three schemes, Scheme 0, Scheme 1 and Scheme 2, have been implemented
and tested in program SLATE. Scheme 3 is still under investigation.
Scheme 0 is the easiest and the crudest scheme that could be
implemented. A subnetwork $N_k$ is considered latent at time $t_n$ if
(1)
$$ |v_{tk,m}(t_n) - v_{tk,m}(t_{n-1})| \le \epsilon_a + \epsilon_r \max(|v_{tk,m}(t_n)|, |v_{tk,m}(t_{n-1})|), \quad m = 1, 2, \ldots \tag{6.4} $$
The subnetwork $N_k$ will remain latent as long as
(2)
$$ |v_{tk,m}(t_{n+j}) - v_{tk,m}(t_{n-1})| \le \epsilon_a + \epsilon_r \max(|v_{tk,m}(t_{n+j})|, |v_{tk,m}(t_{n-1})|), \quad m = 1, 2, \ldots, \quad j = 1, 2, \ldots \tag{6.5} $$
The advantages of Scheme 0 are:
[Fig. 6.3: MOS circuit]

Table 6.1 Simulation Data of a DC Analysis for the MOS Circuit in Fig. 6.3.

DC analysis                                         SLATE      SPICE2
# of subnetworks times # of iterations              231        231
# of nonlatent subnetworks times # of iterations    133
Latency exploitation                                42.42%
Total CPU time (sec.)                               3.927      5.077
Savings in CPU time                                 22.65%

(1) It is very easy to implement and there is no overhead.
(2) It is faster than Scheme 1.
The disadvantages of Scheme 0 are:
(1) It is not reliable and it is not accurate. If the network is a
stiff system and some of the node voltages are slowly varying, then it is
possible to declare a slowly varying subnetwork as latent and consequently
wrong answers are obtained.
(2) The tearing node voltages at time $t_{n-1}$ must be stored, that is,
more memory is required.
Scheme 1 is the most accurate scheme, and it only takes advantage of
latency in the Newton-Raphson iteration. It is based on the idea that if a
subnetwork is latent in time, then it is also latent in the Newton-Raphson
iteration. Latency in time is not exploited directly: all the
subnetworks are treated as nonlatent in time and are solved at least once
at every timepoint. Those subnetworks which are latent in time at any
timepoint will be declared as latent in the Newton-Raphson iteration after
one iteration at that timepoint. So at most one iteration for each latent
subnetwork is wasted at one timepoint. Scheme 1 is as follows:
(1) Solve the entire network including all the subnetworks at least
once at every timepoint.
(2) If any subnetwork is latent in time, then it is latent in all the
subsequent Newton-Raphson iterations at that timepoint. So only one
iteration for that subnetwork is performed.
(3) For the nonlatent subnetworks, the Newton-Raphson iterations are
continued until convergence is obtained at that timepoint.
The advantages of Scheme 1 are:
(1) It is very easy to implement and there is no overhead.
(2) The tearing node voltages at time $t_{n-1}$ need not be stored, that
is, less memory is required.
(3) It is accurate and reliable.
The disadvantage of Scheme 1 is that at every timepoint one iteration is
wasted for each subnetwork latent in time.
Scheme 0 is efficient but is not reliable. Scheme 1 is reliable but
is not efficient. If the network being solved is not stiff, then Scheme 0
is preferred. However, if the network is stiff and efficiency is
important, then neither Scheme 0 nor Scheme 1 is suitable. Scheme 2 was
developed to accommodate this situation. It is similar to Scheme 0 in
efficiency and it is similar to Scheme 1 in reliability. It differs from
Scheme 0 in that some extra checks are made to make sure that slowly
varying subnetworks will not be declared as latent. All the slowly varying
subnetworks are declared as nonlatent.
Let the charges of capacitors and the fluxes of inductors of
subnetwork $N_k$ be denoted by $Q_k = (Q_{k1}, Q_{k2}, \ldots, Q_{kb})$. Let the currents of
capacitors and the voltages of inductors of subnetwork $N_k$ be denoted by
$I_k = (I_{k1}, I_{k2}, \ldots, I_{kb})$. Scheme 2 is as follows: A subnetwork $N_k$ is
considered as latent if the following three conditions are satisfied.
(1)
$$ |v_{tk,m}(t_n) - v_{tk,m}(t_{n-1})| \le \epsilon_a + \epsilon_r \max(|v_{tk,m}(t_n)|, |v_{tk,m}(t_{n-1})|), \quad m = 1, 2, \ldots \tag{6.6} $$
This condition is the same as that of Scheme 0.
(2)
$$ |I_{km}(t_n) - I_{km}(t_{n-1})| \le \epsilon_c + \epsilon_r \max(|I_{km}(t_n)|, |I_{km}(t_{n-1})|), \quad m = 1, 2, \ldots, b \tag{6.7} $$
where $\epsilon_c$ is the absolute error criterion for current. This condition is
used to check if the changes of the energy-storage elements of subnetwork
$N_k$ are small.
(3)
$$ h_{n-1} \frac{|I_{km}(t_n) - I_{km}(t_{n-1})|}{|Q_{km}(t_n) - Q_{km}(t_{n-1})|} \ge 1, \quad m = 1, 2, \ldots, b \tag{6.8} $$
This condition is used to check if there are slowly varying nodes within
subnetwork Nk. In order to avoid division by zero, if
Ik (t n)-Ikm(tnl)l (e is a very small qv-ntity, it is 1T'2 in
SLATE), then condition (3) is skipped.
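The subnetwork $N_k$ will remain latent as long as
(4)
$$ |v_{tk,m}(t_{n+j}) - v_{tk,m}(t_{n-1})| \le \epsilon_a + \epsilon_r \max(|v_{tk,m}(t_{n+j})|, |v_{tk,m}(t_{n-1})|), \quad m = 1, 2, \ldots, \quad j = 1, 2, \ldots \tag{6.9} $$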
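The Scheme 2 test can be sketched as a single predicate. This is a minimal sketch, assuming the ratio in condition (3) must be at least 1 for latency (which matches the interpretation that a step smaller than the time constant means slowly varying); the function names, the default tolerances, and the zero-division guard `eps_q` are hypothetical values chosen for illustration, not SLATE's.

```python
import math


def within_tolerance(new, old, eps_abs, eps_rel):
    return abs(new - old) <= eps_abs + eps_rel * max(abs(new), abs(old))


def scheme2_is_latent(v_t_now, v_t_prev, i_now, i_prev, q_now, q_prev, h,
                      eps_a=1e-6, eps_c=1e-12, eps_r=1e-3, eps_q=1e-30):
    """Return True if a subnetwork passes conditions (1) to (3) of Scheme 2."""
    # (1) tearing node voltages have not changed significantly
    if not all(within_tolerance(a, b, eps_a, eps_r)
               for a, b in zip(v_t_now, v_t_prev)):
        return False
    # (2) currents of capacitors / voltages of inductors have not changed
    if not all(within_tolerance(a, b, eps_c, eps_r)
               for a, b in zip(i_now, i_prev)):
        return False
    # (3) reject slowly varying elements: h * |dI| / |dQ| must be >= 1
    for i1, i0, q1, q0 in zip(i_now, i_prev, q_now, q_prev):
        dq = abs(q1 - q0)
        if dq <= eps_q:
            continue  # skip the check to avoid division by zero
        if h * abs(i1 - i0) / dq < 1.0:
            return False
    return True


# Hypothetical capacitor charging with time constant TAU much larger than H.
C, VDD, TAU, H = 1e-12, 5.0, 1e-6, 1e-9
t_prev, t_now = 1e-9, 2e-9


def charge(t):
    return C * VDD * (1.0 - math.exp(-t / TAU))


def current(t):
    return C * VDD / TAU * math.exp(-t / TAU)


slowly_varying = scheme2_is_latent(
    [1.25], [1.25], [current(t_now)], [current(t_prev)],
    [charge(t_now)], [charge(t_prev)], H, eps_r=1e-2)
unchanged = scheme2_is_latent([1.25], [1.25], [0.0], [0.0], [1e-12], [1e-12], H)
```

In the first call the voltage and current changes are within tolerance, yet the subnetwork is rejected because the step is much smaller than the time constant; in the second call nothing changes, so condition (3) is skipped and the subnetwork is latent.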
Eq. (6.8) is derived based on the following reasoning. Let us assume
that we are dealing with a linear capacitor and an exponential waveform,
then
$$ Q = CV \tag{6.10} $$
$$ I = \frac{dQ}{dt} \tag{6.11} $$
$$ V = V_{DD}\left(1 - e^{-t/\tau}\right) \tag{6.12} $$
$$ Q(t_n) = C V_{DD}\left(1 - e^{-t_n/\tau}\right) \tag{6.13} $$
$$ Q(t_{n-1}) = C V_{DD}\left(1 - e^{-t_{n-1}/\tau}\right) \tag{6.14} $$
$$ I(t_n) = \frac{C V_{DD}}{\tau} e^{-t_n/\tau} \tag{6.15} $$
$$ I(t_{n-1}) = \frac{C V_{DD}}{\tau} e^{-t_{n-1}/\tau} \tag{6.16} $$
From Eqs. (6.13), (6.14), (6.15), and (6.16), we obtain
$$ h_{n-1} \frac{|I(t_n) - I(t_{n-1})|}{|Q(t_n) - Q(t_{n-1})|} = \frac{h_{n-1}}{\tau} \tag{6.17} $$
So Eq. (6.8) means that although the change in the response of a capacitor
is very small, if $h_{n-1}$ is smaller than $\tau$, then the capacitor is not latent,
it is just slowly varying.
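As a worked intermediate step that the text does not spell out, the differences formed from Eqs. (6.13) to (6.16) can be written explicitly, which shows why the ratio is independent of the capacitance, the supply voltage and the timepoint:

$$ I(t_n) - I(t_{n-1}) = \frac{C V_{DD}}{\tau}\left(e^{-t_n/\tau} - e^{-t_{n-1}/\tau}\right) $$
$$ Q(t_n) - Q(t_{n-1}) = C V_{DD}\left(e^{-t_{n-1}/\tau} - e^{-t_n/\tau}\right) $$
$$ \frac{I(t_n) - I(t_{n-1})}{Q(t_n) - Q(t_{n-1})} = -\frac{1}{\tau} $$

Taking magnitudes and multiplying by the timestep gives the ratio $h_{n-1}/\tau$ of Eq. (6.17).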
The advantages of Scheme 2 are:
(1) It is faster than Scheme 1.
(2) It is accurate and reliable. Slowly varying subnetworks are
detected and are treated as nonlatent subnetworks.
The disadvantages of Scheme 2 are:
(1) More checking is required.
(2) The tearing node voltages at time $t_{n-1}$ must be stored, that is,
more memory is required.
(3) Slowly varying subnetworks are detected and are treated as
nonlatent subnetworks, even though the changes in the response may be
negligible.
Remark: The detection of slowly varying subnetworks increases the accuracy
and reliability of the program, that is why it is an advantage. However,
since the changes in these slowly varying subnetworks are small, treating
them as nonlatent subnetworks is not efficient, that is why it is also a
disadvantage.
In order to overcome this problem, Scheme 3 was proposed. Scheme 3
takes full advantage of latency in time. Scheme 3 is as follows:
(1) The truncation error criteria are used to determine the timestep
for each subnetwork.
(2) Each subnetwork is analyzed with its own timestep.
The advantages of Scheme 3 are:
(1) It should be faster than all the other schemes.
(2) It is accurate and reliable. Since every subnetwork has its own
timestep, there will not be the problem of slowly varying subnetworks.
The disadvantages of Scheme 3 are:
(1) It is not compatible with the present version of SLATE. An extra
event scheduler is required and the data structures of SLATE have to be
revised.
(2) It is more complicated to implement.
Because Scheme 3 is not compatible with the present version of SLATE,
it has not been implemented or tested in SLATE.
In the following, a small selection of examples is presented to give a
comparison among the first three schemes of SLATE: Scheme 0, Scheme 1, and
Scheme 2, and YSPICE.
Example 6.1: The MOS circuit shown in Fig 6.3 was analyzed by SLATE and
YSPICE. The output results of the three schemes of SLATE and YSPICE are
essentially the same (within four significant figures). The simulation
data of a transient analysis are given in Table 6.2. The simulation data
show that both Scheme 0 and Scheme 2 are more efficient than Scheme 1. For
this circuit, Scheme 2 is the most efficient and is about 2.5 times faster
than YSPICE.
This example shows that the latency in time approach is useful for the
analysis of MOS circuits. The next example shows that the latency in time
approach is also useful for bipolar circuits.
Example 6.2: The TTL circuit shown in Fig. 6.4 was analyzed by SLATE and
YSPICE. The output results of the three schemes of SLATE and YSPICE are
essentially the same (within four significant figures). The simulation data
of a transient analysis are given in Table 6.3. For this example, the
Table 6.2 Simulation Data of a Transient Analysis for the MOS Circuit in Fig. 6.3.

Transient analysis                                  Scheme 0   Scheme 1   Scheme 2   YSPICE
# of subnetworks times # of iterations              2849       2475       2585
# of nonlatent subnetworks times # of iterations    790        1277       706
Latency exploitation                                72.27%     48.40%     72.69%
Total CPU time (sec.)                               18.358     22.073     17.022     42.877
Savings in CPU time                                 57.12%     48.52%     60.30%

[Fig. 6.4: TTL circuit]

Table 6.3 Simulation Data of a Transient Analysis for the TTL Circuit in Fig. 6.4.

Transient analysis                                  Scheme 0   Scheme 1   Scheme 2   YSPICE
# of subnetworks times # of iterations              2850       2945       2770
# of nonlatent subnetworks times # of iterations    1516       2128       1945
Latency exploitation                                46.81%     27.74%     29.78%
Total CPU time (sec.)                               68.996     97.164     86.033     132.338
Savings in CPU time                                 47.86%     26.58%     34.99%

simulation data show that Scheme 0 is the most efficient and Scheme 1 is still
the least efficient. The reason why Scheme 2 is not as efficient as Scheme
0 can be traced to the close coupling of bipolar circuits. So the
conclusion obtained from this example is that the error criteria should be
loosened for bipolar circuits. Scheme 0 is about 2 times faster than
YSPICE.
Although the above two examples show that Scheme 0 is very efficient,
as mentioned before, Scheme 0 has a reliability problem. The
following example shows that Scheme 0 may give inaccurate output results
for stiff systems.
Example 6.3: The RC circuit shown in Fig. 6.5 was analyzed by the three
schemes of SLATE. This circuit is a stiff system. The output results are
given in Table 6.4. For Scheme 0, because the changes of the tearing node
voltages of subnetwork 1 and subnetwork 2 are very small after t = 13 ns,
both subnetworks are declared as latent. Since the input is also constant
after t = 13 ns, all the calculated output voltages remain unchanged
afterwards, while the true output voltages should increase slowly. This
phenomenon can be observed from the unchanged output voltages after
t = 13 ns in Table 6.4(a). This example shows that Scheme 0 may not give
accurate results when the network is stiff and that both Scheme 1 and
Scheme 2 give accurate results even when the network is stiff.
Fig. 6.5 An Example of a Stiff System.

Table 6.4(a) Scheme 0.

TIME        V(2)        V(3)        V(4)
0.000D+00   0.000D+00   0.000D+00   0.000D+00
1.000D-09   2.936D+00   7.567D-01   4.445D-13
2.000D-09   3.604D+00   1.146D+00   3.726D-12
3.000D-09   3.733D+00   1.237D+00   1.144D-11
4.000D-09   3.749D+00   1.249D+00   2.403D-11
5.000D-09   3.751D+00   1.251D+00   4.161D-11
6.000D-09   3.749D+00   1.249D+00   6.187D-11
7.000D-09   3.750D+00   1.250D+00   8.020D-11
8.000D-09   3.750D+00   1.250D+00   9.850D-11
9.000D-09   3.749D+00   1.249D+00   1.161D-10
1.000D-08   3.749D+00   1.249D+00   1.260D-10
1.100D-08   3.748D+00   1.248D+00   1.307D-10
1.200D-08   3.748D+00   1.248D+00   1.302D-10
1.300D-08   3.749D+00   1.249D+00   1.246D-10
1.400D-08   3.749D+00   1.249D+00   1.145D-10
1.500D-08   3.749D+00   1.249D+00   1.145D-10
1.600D-08   3.749D+00   1.249D+00   1.145D-10
1.700D-08   3.749D+00   1.249D+00   1.145D-10
1.800D-08   3.749D+00   1.249D+00   1.145D-10
1.900D-08   3.749D+00   1.249D+00   1.145D-10
2.000D-08   3.749D+00   1.249D+00   1.145D-10
2.100D-08   3.749D+00   1.249D+00   1.145D-10
2.200D-08   3.749D+00   1.249D+00   1.145D-10
2.300D-08   3.749D+00   1.249D+00   1.145D-10
2.400D-08   3.749D+00   1.249D+00   1.145D-10
2.500D-08   3.749D+00   1.249D+00   1.145D-10

Table 6.4(b) Scheme 1.

TIME        V(2)        V(3)        V(4)
0.000D+00   0.000D+00   0.000D+00   0.000D+00
1.000D-09   2.936D+00   7.567D-01   4.444D-13
2.000D-09   3.604D+00   1.146D+00   3.726D-12
3.000D-09   3.733D+00   1.237D+00   1.144D-11
4.000D-09   3.749D+00   1.249D+00   2.403D-11
5.000D-09   3.751D+00   1.251D+00   4.161D-11
6.000D-09   3.749D+00   1.249D+00   6.187D-11
7.000D-09   3.750D+00   1.250D+00   8.020D-11
8.000D-09   3.750D+00   1.250D+00   9.850D-11
9.000D-09   3.749D+00   1.249D+00   1.172D-10
1.000D-08   3.749D+00   1.249D+00   1.404D-10
1.100D-08   3.749D+00   1.249D+00   1.666D-10
1.200D-08   3.749D+00   1.249D+00   1.958D-10
1.300D-08   3.750D+00   1.250D+00   2.281D-10
1.400D-08   3.751D+00   1.251D+00   2.629D-10
1.500D-08   3.751D+00   1.251D+00   2.919D-10
1.600D-08   3.751D+00   1.251D+00   3.208D-10
1.700D-08   3.751D+00   1.251D+00   3.498D-10
1.800D-08   3.751D+00   1.251D+00   3.788D-10
1.900D-08   3.751D+00   1.251D+00   4.077D-10
2.000D-08   3.751D+00   1.251D+00   4.367D-10
2.100D-08   3.751D+00   1.251D+00   4.656D-10
2.200D-08   3.751D+00   1.251D+00   4.946D-10
2.300D-08   3.750D+00   1.250D+00   5.236D-10
2.400D-08   3.750D+00   1.250D+00   5.525D-10
2.500D-08   3.750D+00   1.250D+00   5.815D-10

Table 6.4(c) Scheme 2.
I
TIME V(2) V(3) V(4)
O.OOOD+O0 O.OOOD+O0 O.OOOD+O0 O.OOOD+O01.OOOD-09 2.936D+00 7.567D-01 4. 444D-132.OOOD-09 3.604D+00 1.146D+00 3.726D-123.OOOD-09 3.733D+00 1.237D4.O0 1.144D-114.OOOD-09 3.749D400 1.249D+00 2.403D-115.OOOD-09 3.751D+00 1.251D+O0 4.161D-116.OOOD-09 3.749D+00 1.249D+Oo 6.187D-117.OOOD-09 3.750D+00 1.250DOo 8.020D-118.OOOD-09 3.750D+00 1.250D+00 9.850D-119.OOOD-09 3.749D+O0 1.249D+O0 1. 172D-101. OOOD-08 3.749D+O0 0 249D+00 1.404D-101 100D-08 3.749D+00 . 249D400 1.666D-101 .200D-08 3.749D+00 1. 249D+O0 1.958D-101.300D-08 3.750D+00 1.250D+O0 2.281D-101.400D-O8 3.751D+00 1.251D+00 2.629D-101500D-08 3.751D+00 1.251D+00 2.919D-101.600D-08 3.751D+00 1.251D+00 3.208D-101700D-08 3.751D+00 1.251D+00 3.498D-101.800D-08 3.751D+00 1.251D+O0 3.788D-101.900D-08 3.751D+00 1.251D+o0 4.077D-102.OOOD-08 3.751D+00 1.251D+00 4.367D-102..100D-08 3.751D+O0 1.251D+O0 4.656D-102.200D-08 3.751D+O0 1.251D+00 4,946D-102. 300D-08 3. 750D+00 1. 250D+O0 5. 236D-102.J400D-08 3.750D+00 1.250D+00 5.525D-102.50OD-08 3. 750D+00 1. 250D+00 5. 815D-10
6.2. Latency Exploitation at the Network Level
When the submatrices are large and the interconnection matrix is small
and sparse, then latency exploitation at the subnetwork level provides most
of the savings in CPU time. When the reverse is true, that is, the
submatrices are small and the interconnection matrix is large and
relatively dense, then the latency exploitation at network level becomes
important. Usually, the latter situation is true for MOS circuits.
Latency exploitation at the network level is equivalent to solving a
smaller interconnection matrix by using voltage source substitution. From
the substitution theorem we know that the same results will be obtained.
This approach can be explained by the example shown in Fig. 6.6(a). Let us
assume that at a particular time or a particular iteration, only
subnetworks 5 and 6 are nonlatent and all the other subnetworks are
latent. By using voltage source substitution, the network can be replaced
by the equivalent network shown in Fig. 6.6(b). The equivalent network
is solved to obtain the solutions of all the nonlatent nodes (nodes which
belong to nonlatent subnetworks). This equivalent network is obtained as
follows. First, all the nonlatent subnetworks and all the latent
subnetworks which are adjacent to the nonlatent subnetworks are included in
the equivalent network. Second, all the tearing nodes which belong only
to latent subnetworks are replaced by voltage sources. The resulting
network is the equivalent network. For this example, in the equivalent
network, the nonlatent subnetworks are subnetworks 5 and 6, the latent
subnetworks are subnetworks 4 and 7, and the tearing nodes which are
replaced by voltage sources are nodes 5 and 9. After the solution for all
the nonlatent nodes is obtained, subnetworks 4 and 7 are checked to see if
they remain latent. If the answer is yes, then the same equivalent circuit
is used again. If the answer is no, then a new equivalent network is
generated.
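The two construction rules above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the function name
build_equivalent_network, the dictionary input format, and the chain
topology in the usage lines are all hypothetical and do not reproduce
Fig. 6.6 or any data structure of SLATE. Two subnetworks are taken to be
adjacent when they share a tearing node.

```python
def build_equivalent_network(membership, nonlatent):
    """Return (included subnetworks, tearing nodes replaced by voltage sources).

    membership: dict mapping each tearing node to the set of subnetworks
                it belongs to (hypothetical input format).
    nonlatent:  set of subnetworks that are currently nonlatent.
    """
    nonlatent = set(nonlatent)
    # First: keep every nonlatent subnetwork plus every latent subnetwork
    # that shares a tearing node with a nonlatent one (i.e. is adjacent).
    included = set(nonlatent)
    for subnets in membership.values():
        if subnets & nonlatent:
            included |= subnets
    # Second: a tearing node of the equivalent network that belongs only to
    # latent subnetworks is replaced by a voltage source.
    sources = {
        node
        for node, subnets in membership.items()
        if subnets & included and not (subnets & nonlatent)
    }
    return included, sources


if __name__ == "__main__":
    # Hypothetical chain of 8 subnetworks; node "nAB" joins subnetworks A and B.
    chain = {f"n{k}{k + 1}": {k, k + 1} for k in range(1, 8)}
    included, sources = build_equivalent_network(chain, {5, 6})
    print(sorted(included), sorted(sources))
```

For the hypothetical chain, with subnetworks 5 and 6 nonlatent, the
equivalent network contains subnetworks 4 to 7, and the two tearing nodes
on its outer boundary are the ones held as voltage sources.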
The above is the conceptual idea. In the implementation, because
sparse matrix techniques are used, we do not want to actually generate the
equivalent network, nor do we want to reorder the interconnection matrix
and reconstruct the sparse matrix pointer systems every time a new
equivalent circuit is generated. So the following algorithm is implemented
in SLATE.
(1) The ordering of the interconnection matrix is determined in the
preprocessing phase assuming all the subnetworks are nonlatent, and this
ordering is used in the whole analysis.
(2) At every timepoint or iteration, all the subnetworks are checked
to determine their latent status. All the tearing nodes which only belong
to latent subnetworks are labelled as latent nodes.
(3) All the rows corresponding to the latent nodes are replaced by the
branch constraint relations of grounded voltage sources. This is done by
skipping those rows and columns during the LU factorization and forward and
backward substitutions.
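Step (3) can be illustrated with a small sketch. The assumptions are
significant: a dense list-of-lists matrix stands in for the sparse
interconnection matrix, the function name solve_with_latent_nodes and its
arguments are hypothetical, and no pivoting is done because the ordering is
taken as fixed in advance. Latent nodes are held at their known voltages,
and their rows and columns are skipped during elimination and back
substitution. It is not SLATE's implementation.

```python
def solve_with_latent_nodes(A, b, latent_values):
    """Solve A x = b while holding the latent nodes at known voltages.

    A is a dense list of lists (illustration only; a sparse structure
    would be used in practice). latent_values maps a node index to the
    voltage at which that node is held.
    """
    n = len(A)
    active = [i for i in range(n) if i not in latent_values]
    # Work on copies so the caller's matrix and ordering stay untouched.
    M = [row[:] for row in A]
    rhs = list(b)
    # Move the known latent voltages to the right-hand side.
    for i in active:
        for j, v in latent_values.items():
            rhs[i] -= M[i][j] * v
    # Elimination in the fixed order, skipping latent rows and columns.
    # No pivoting: nonzero diagonal pivots are assumed.
    for p_idx, p in enumerate(active):
        for i in active[p_idx + 1:]:
            factor = M[i][p] / M[p][p]
            if factor == 0.0:
                continue
            for j in active[p_idx:]:
                M[i][j] -= factor * M[p][j]
            rhs[i] -= factor * rhs[p]
    # Back substitution over the active (nonlatent) nodes only.
    x = [0.0] * n
    for j, v in latent_values.items():
        x[j] = v
    for p_idx in range(len(active) - 1, -1, -1):
        p = active[p_idx]
        s = rhs[p]
        for j in active[p_idx + 1:]:
            s -= M[p][j] * x[j]
        x[p] = s / M[p][p]
    return x


if __name__ == "__main__":
    A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
    b = [1.0, 2.0, 3.0]
    full = solve_with_latent_nodes(A, b, {})
    held = solve_with_latent_nodes(A, b, {2: full[2]})
    print(full, held)
```

In the usage lines, holding node 2 at the value it takes in the full
solution leaves the other node voltages unchanged up to rounding, which is
the behavior the substitution theorem predicts.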
Example 6.4: The MOS circuit shown in Fig. 3.18 was analyzed by SLATE
with and without the latency exploitation at network level. Scheme 2 was
used. The output results are essentially the same for both
approaches (within four significant figures). The simulation data for both
approaches are given in Table 6.5. This example shows that the latency
exploitation at network level also provides savings in CPU time.
Table 6.5 Simulation Data of a Transient Analysis
for the MOS Circuit in Fig. 3.18.

Transient analysis       Scheme 2 with            Scheme 2 without
                         latency exploitation     latency exploitation
                         at network level         at network level

Total CPU time (sec.)    80.102                   102.365

Savings in CPU time      21.75%
6.3. Discussion
Four schemes for latency exploitation at subnetwork level are proposed
in this chapter. Scheme 0, Scheme 1 and Scheme 2 were implemented and
tested in the program SLATE. Scheme 3 is not compatible with the present
version of SLATE, so it is still at the development stage. From the
simulation data obtained from the first three schemes, our conclusion is
that Scheme 2 is the best of these three schemes. However, for bipolar
circuits, Scheme 2 is not the most efficient one. Conceptually, Scheme 3
should be the optimal one, so more work will be devoted to studying this
scheme.
In order to illustrate the ideas and to estimate the inherent latency
easily, chains of inverters are used as example circuits in this chapter.
More complicated circuits are used in the next chapter to evaluate the
latency approaches used in SLATE.
VII. CONCLUSIONS
The examples used in Chapter 6 are chains of inverters. One example
has eleven levels of inverters, the other has five levels of inverters.
From the simulation data we can see that the latency exploitation increases
with the number of levels of logic gates. Since the number of levels of
logic gates for those circuits is large and those circuits have very simple
interconnection networks, significant latency exploitation was obtained.
In this chapter, the simulation data for some circuits, which have a
complicated interconnection network and for which the number of levels of
logic gates is small, are presented to see if the latency approach can
provide significant savings in CPU time for these circuits. The simulation
data are compared with those obtained from our DEC-10 version of SPICE2.
Example 7.1: The TTL circuit shown in Fig. 3.17 was analyzed by SLATE and
SPICE2. Scheme 2 was used in SLATE. The output results of SLATE and
SPICE2 are essentially the same (within four significant figures). The
simulation data of a transient analysis are given in Table 7.1. For this
bipolar circuit example, a 32.66% latency exploitation was achieved and a
40.15% savings in CPU time was obtained.
Example 7.2: The MOS circuit shown in Fig. 3.18 was analyzed by SLATE and
SPICE2. Scheme 2 was used in SLATE. The output results of SLATE and
SPICE2 are essentially the same (within four significant figures). The
simulation data of a transient analysis are given in Table 7.2. For this
MOS circuit example, a 22.53% latency exploitation was achieved and a
46.70% savings in CPU time was obtained.
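The percentages quoted in Examples 7.1 and 7.2 can be checked from the
tabulated counts and CPU times. The sketch below assumes that latency
exploitation is the fraction of (subnetwork, iteration) pairs skipped as
latent, and that the savings in CPU time are measured relative to the
SPICE2 run. These definitions are inferred from the tabulated numbers
rather than quoted from the text, and the function names are hypothetical.

```python
def latency_exploitation(total_subnet_iters, nonlatent_subnet_iters):
    """Percentage of (subnetwork, iteration) pairs skipped as latent."""
    skipped = total_subnet_iters - nonlatent_subnet_iters
    return 100.0 * skipped / total_subnet_iters


def cpu_savings(reference_sec, slate_sec):
    """Percentage of CPU time saved relative to the reference run."""
    return 100.0 * (reference_sec - slate_sec) / reference_sec


if __name__ == "__main__":
    # (label, total count, nonlatent count, SLATE seconds, SPICE2 seconds)
    cases = [
        ("Table 7.1 (TTL)", 6426, 4327, 189.56, 316.74),
        ("Table 7.2 (MOS)", 3600, 2789, 72.23, 135.52),
    ]
    for label, total, nonlatent, slate, spice in cases:
        print(label,
              f"latency exploitation {latency_exploitation(total, nonlatent):.2f}%",
              f"CPU savings {cpu_savings(spice, slate):.2f}%")
```

Under these assumed definitions the expressions give 32.66% and 40.15% for
the TTL circuit and 22.53% and 46.70% for the MOS circuit, matching the
figures in the two examples.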
Table 7.1 Simulation Data of a Transient Analysis
for the TTL Circuit in Fig. 3.17.

Transient analysis                       SLATE      SPICE2

# of subnetworks times                   6426
# of iterations

# of nonlatent subnetworks times         4327
# of iterations

Latency exploitation                     32.66%

Total CPU time (sec.)                    189.56     316.74

Savings in CPU time                      40.15%
Table 7.2 Simulation Data of a Transient Analysis
for the MOS Circuit in Fig. 3.18.

Transient analysis                       SLATE      SPICE2

# of subnetworks times                   3600
# of iterations

# of nonlatent subnetworks times         2789
# of iterations

Latency exploitation                     22.53%

Total CPU time (sec.)                    72.23      135.52

Savings in CPU time                      46.70%
Also these simulation data show that not only the latency approach but
also other new approaches implemented in SLATE provide savings in CPU time.
This observation is obtained by noting that the latency exploitation is
smaller than the savings in CPU time. These other new approaches which
also provide savings in CPU time are the new reordering scheme for the
modified nodal approach presented in Chapter 2 and the piecewise nonlinear
approach presented in Chapter 3. The new reordering scheme for the
modified nodal approach avoids the problem of pivoting on zero diagonal
elements and decreases the number of operations at the same time. However,
the efficiency provided by the new reordering scheme is problem-dependent.
For example, if the circuit does not have voltage sources or inductors,
then certainly no gain in efficiency can be obtained by using our approach. The
piecewise nonlinear approach is still at the experimental stage. All the
examples we have simulated show that the use of the piecewise nonlinear
approach hastens the convergence and improves the global convergence
property of the Newton-Raphson method for bipolar and MOS circuits.
However, the proof of global convergence or the conditions for global
convergence for the piecewise nonlinear approach has not been obtained.
Further research is needed to prove the global convergence, or to modify
the approach we proposed to ensure global convergence. More work is also
needed to determine whether the strict piecewise nonlinear approach is
efficient. If it is not, then the problem of how to use the ideas of the
piecewise nonlinear approach to hasten the convergence and to improve the
global convergence property of the Newton-Raphson method should be studied.
The solution of the two problems of numerical integration makes the program
more reliable and more accurate. This is described in Chapter 4. An
equation was presented to compute the upper bound on the local truncation
error (LTE) from the maximum global error E_max and the solution time T.
The inaccuracy in the estimation of the local truncation error caused by
different timesteps was resolved by introducing a new formula for the
estimation. The inaccuracy in the estimation of the local truncation error
caused by using the node voltages at timepoints of the previous switch
interval was resolved by recognizing this situation and restarting the
numerical integration from the breakpoint.
In Chapter 5, the ideas of tearing methods were detailed, the most
efficient way of implementing the node tearing method was determined
theoretically and experimentally, and a circuit interpretation of tearing
methods was given.
In Chapter 6, four latency criteria schemes were proposed. The first
three schemes: Scheme 0, Scheme 1 and Scheme 2, were implemented and
tested. From the simulation data we conclude that Scheme 2 is the best out
of these three. Scheme 3 is still under investigation and we think it
should be the best scheme to exploit latency. More work is needed to study
how to implement this scheme efficiently and reliably, and to find out if
it is really the best scheme.
The nested subnetwork approach [41,42,43] is the approach which allows
several levels of subnetworks and in which the latency approach is used at
every level of the subnetworks. This approach may provide savings in the
time spent in checking the latent status of subnetworks. Only latency
exploitation at the network level is implemented in program SLATE and we
believe that this checking time may be small, so the savings in CPU time
provided by the nested subnetwork approach are probably not significant.
However, further investigation is needed to yield conclusive results.
Device characteristic latency and function latency are two concepts which
may provide some more savings in CPU time. More investigations need to be
done to exploit these two latencies.
In the present version of SLATE, a lot of information which is not
needed is still stored because SLATE evolved from YSPICE. Due to this
reason, although tearing methods should provide savings in memory, no
comparison of memory usage was presented in this thesis.
REFERENCES
[1] L. W. Nagel, "SPICE2: A Computer Program to Simulate Semiconductor
Circuits," ERL Memo., No. ERL-M520, University of California,
Berkeley, May 1975.
[2] L. W. Nagel and R. A. Rohrer, "Computer Analysis of Nonlinear
Circuits, Excluding Radiation (CANCER)," IEEE J. Solid-State
Circuits, Vol. SC-6, August 1971, pp. 166-182.
[3] R. A. Rohrer, L. W. Nagel, R. G. Meyer, and L. Weber, "CANCER:
Computer Analysis of Nonlinear Circuits, Excluding Radiation,"
Proc. 1971 IEEE International Solid-State Circuits Conference,
Philadelphia, February 18, 1971, pp. 124-125.
[4] F. S. Jenkins and S. P. Fan, "TIME---A Nonlinear DC and Time-Domain
Circuit-Simulation Program," IEEE J. Solid-State Circuits, Vol. SC-6,
August 1971, pp. 188-192.
[5] "ASTAP General Information Manual (GH20-1271-0)," IBM Corp.,
Mechanicsburg, Pa.
[6] H. Qassemzadeh and T. R. Scott, "Algorithm for ASTAP---A Network
Analysis Program," IEEE Trans. on Circuit Theory, Vol. CT-20,
November 1973, pp. 628-634.
[7] "ECAP II Application Description Manual (GH20-0983)," IBM Corp.,
Mechanicsburg, Pa.
[8] F. H. Branin, G. R. Hogsett, R. L. Lunde, and L. E. Kugel, "ECAP
II---A New Electronic Circuit Analysis Program," IEEE J. Solid-State
Circuits, Vol. SC-6, August 1971, pp. 146-165.
[9] B. Dembart and L. Milliman, "Circus-2, A Digital Computer Program
for Transient Analysis of Electronic Circuits," Harry Diamond
Laboratories, Washington, D. C., July 1971.
[10] L. D. Milliman, W. A. Massena, and R. H. Dickhaut, "CIRCUS---A
Digital Computer Program for Transient Analysis of Electronic
Circuits," Harry Diamond Laboratory, Washington, D. C., Rep.
AD-346-1, January 1961.
[11] P. R. Bryant, I. N. Hajj, S. Skelboe and M. Vlach, "WATAND Primer,"
University of Waterloo, Report 73-4, June 1979.
[12] G. Kron, "A Set of Principles to Interconnect the Solution of
Physical Systems," Journal of Applied Physics, Vol. 24, No. 9,
August 1953, pp. 965-980.
[13] G. Kron, Diakoptics---Piecewise Solution of Large-Scale Systems,
Macdonald, London, 1963.
[14] H. H. Happ, Diakoptics and Networks, Academic Press, New York, 1971.
[15] F. H. Branin, "The relation between Kron's Method and the Classical
methods of Network Analysis," The Matrix and Tensor Quarterly,
Vol. 12, No. 3, March 1962, pp. 69-115.
[16] L. O. Chua and L. K. Chen, "Diakoptics and Generalized Hybrid
Analysis," IEEE Trans. on Circuits and Systems, Vol. CAS-23, No. 12,
December 1976, pp. 694-705.
[17] F. F. Wu, "Solution of Large Scale Networks by Tearing," IEEE Trans.
on Circuits and Systems, Vol. CAS-23, No. 12, December 1976,
pp. 706-713.
[18] L. O. Chua and L. K. Chen, "Nonlinear Diakoptics," Proc. IEEE Int.
Symp. on Circuits and Systems, April 1975, pp. 373-376.
[19] K. U. Wang and T. Chao, "Diakoptics for Large Scale Nonlinear Time-
Varying Networks," Proc. IEEE Int. Symp. on Circuits and Systems,
April 1975, pp. 277-278.
[20] A. Sangiovanni-Vincentelli, L. K. Chen, and L. O. Chua, "A New
Tearing Approach---Node Tearing Nodal Analysis," Proc. IEEE Int.
Symp. on Circuits and Systems, April 1977, pp. 143-147.
[21] P. Linardis, K. G. Nichols, and E. J. Zaluska, "Network Partitioning
and Latency Exploitation in Time Domain Analysis of Nonlinear
Electronic Circuits," Proc. IEEE Int. Symp. on Circuits and Systems,
May 1978, pp. 510-513.
[22] N. B. Rabbat and H. Y. Hsieh, "A Latent Macromodular Approach to
Large-Scale Sparse Networks," IEEE Trans. on Circuits and Systems,
Vol. CAS-23, No. 12, December 1976, pp. 745-752.
[23] H. Y. Hsieh and N. B. Rabbat, "Computer-Aided Design of Large
Networks by Macromodular and Latent Techniques," IEEE Int. Symp.
on Circuits and Systems, April 1977, pp. 688-692.
[24] N. B. Rabbat and H. Y. Hsieh, "Concepts of Latency in the Time-Domain
Solution of Nonlinear Differential Equations," Proc. IEEE Int. Symp.
on Circuits and Systems, May 1978, pp. 813-825.
[25] C. W. Ho, A. E. Ruehli and P. A. Brennan, "The Modified Nodal
Approach to Network Analysis," IEEE Trans. on Circuits and
Systems, Vol. CAS-22, pp. 504-509, June 1975.
[26] A. R. Newton and D. O. Pederson, "A Simulation Program with Large
Scale Integrated Circuit Emphasis," IEEE Int. Symp. on Circuits
and Systems, New York, May 1978.
[27] G. Arnout and H. J. Deman, "The use of Threshold Functions and
Boolean-Controlled Network Elements for Macromodelling of LSI
Circuits," IEEE J. Solid-State Circuits, Vol. SC-13, pp. 326-332,
June 1978.
[28] C. A. Desoer and E. S. Kuh, Basic Circuit Theory, McGraw-Hill Book
Co., New York, 1969.
[29] H. M. Markowitz, "The Elimination Form of the Inverse and its
Application to Linear Programming," Management Sci., Vol. 3,
pp. 255-269, 1957.
[30] I. N. Hajj, P. Yang and T. N. Trick, "Avoiding Zero Pivots in the
Modified Nodal Approach," IEEE Trans. on Circuits and Systems,
to appear.
[31] W. J. McCalla and D. O. Pederson, "Elements of Computer-Aided Circuit
Analysis," IEEE Trans. on Circuit Theory, Vol. CT-18, January
1971, pp. 14-26.
[32] R. D. Berry, "An Optimal Ordering of Electronic Circuit Equations
for a Sparse Matrix Solution," IEEE Trans. on Circuit Theory,
Vol. CT-18, January 1971, pp. 40-50.
[33] W. H. Kao, "Comparison of Quasi-Newton Methods for the DC analysis
of Electronic Circuits," M. S. Thesis, University of Illinois at
Urbana-Champaign, 1972.
[34] H. Gupta and J. Sharma, "An Algorithm for DC Solution of Electronic
Circuits," IEEE Int. Symp. on Circuits and Systems, July 1979.
[35] M. E. Daniel, "Development of Mathematical Models of Semiconductor
Devices for Computer-Aided Circuit Analysis," Proceedings of the
IEEE, Vol. 55, No. 11, Nov. 1967.
[36] L. O. Chua and P. M. Lin, Computer-Aided Analysis of Electronic
Circuits: Algorithms and Computational Techniques. Englewood Cliffs,
New Jersey, Prentice-Hall, 1975.
[37] F. H. Branin, "A Sparse Matrix Modification of Kron's Method of
Piecewise Analysis," Proc. IEEE Int. Symp. on Circuits and
Systems, April 1975, pp. 383-386.
[38] I. N. Hajj, "Sparsity Considerations in Network Solution by Tearing,"
IEEE Trans. on Circuits and Systems, May 1980, pp. 357-366.
[39] A. R. Newton and D. O. Pederson, "A Simulation Program with Large
Scale Integrated Circuit Emphasis," IEEE Int. Symp. on Circuits
and Systems, New York, May 1978.
[40] J. Katzenelson, "An Algorithm for Solving Nonlinear Resistive
Network," Bell Syst. Tech. J., Vol. 44, pp. 1605-1620,
Oct. 1965.
[41] A. E. Ruehli, A. L. Sangiovanni-Vincentelli and N. B. Rabbat,
"Time Analysis of Large-Scale Circuits Containing One-Way
Macromodels," Proc. IEEE Int. Symp. on Circuits and Systems,
April 1980, pp. 766-770.
[42] N. B. Rabbat, A. L. Sangiovanni-Vincentelli and H. Y. Hsieh, "A
Multilevel Newton Algorithm with Macromodeling and Latency for the
Analysis of Large-Scale Nonlinear Circuits in the Time Domain,"
IEEE Trans. on Circuits and Systems, Vol. CAS-26, pp. 733-741,
Sept. 1979.
[43] H. Y. Hsieh and N. B. Rabbat, "Multilevel Newton Algorithm for
Nested Macromodel Analysis of Bipolar Networks," Proc. IEEE Int.
Symp. on Circuits and Systems, April 1980, pp. 762-765.
VITA
Ping Yang was born in Ping-Tung, Taiwan, China on July 15, 1952. He
attended National Taiwan University in Taipei and graduated in June 1974
with a Bachelor of Science in Physics. He was an Ensign in the Chinese
Navy from 1974 to 1976. In August 1976 he entered the University of
Illinois. From August 1976 to August 1980 he held a teaching assistantship
in the Electrical Engineering department and a research assistantship with
the Coordinated Science Laboratory. His research interests are in the
areas of computer-aided design, circuits and systems, large-scale
integrated circuits, and solid-state electronics.