of 28 /28
UNIT – 4 VLSI Design (ECE – 401 E) DELAY MODELS Delay modeling of ICs can be done in several ways, resulting in trade- offs between computation times and analysis accuracy. Consider the following timing analysis methods: Analog Simulation uses a simulator such as SPICE. These simulators require models of the I-V characteristics of the devices. The analysis is done using numerical integration methods. Worse case delay analysis requires finding worse case input patterns. These methods are very time-consuming, and thus timing analysis using these methods is all but impractical, except for special problems. Timing Simulation uses models of circuit and interconnect delays. As an example, consider the Crystal timing simulator. In this case simple models of gates and interconnections are used, and thus timing analysis can be done efficiently on large circuits. However, these methods pay a price in terms of accuracy. In static logic circuits, delay is measured between the 50% point of the waveforms, as shown in Figure 110. This choice assumes that the switching points of all gates are approximately at V DD /2. The time from the 10% point of a waveform to the 90% point is called the rise time t r . The time from the 90% to the 10% point is referred to as the fall time t f . Figure 110. Definition of Rise Time and Fall Time.

# Vlsi Final Notes Unit4

Embed Size (px)

DESCRIPTION

ECE VLSI 4 UNIT

### Text of Vlsi Final Notes Unit4

UNIT 4VLSI Design (ECE 401 E)DELAY MODELS Delay modeling of ICs can be done in several ways, resulting in trade-offs between computation times and analysis accuracy. Consider the following timing analysis methods: Analog Simulation uses a simulator such as SPICE. These simulators require models of the I-V characteristics of the devices. The analysis is done using numerical integration methods. Worse case delay analysis requires finding worse case input patterns. These methods are very time-consuming, and thus timing analysis using these methods is all but impractical, except for special problems. Timing Simulation uses models of circuit and interconnect delays. As an example, consider the Crystal timing simulator. In this case simple models of gates and interconnections are used, and thus timing analysis can be done efficiently on large circuits. However, these methods pay a price in terms of accuracy.In static logic circuits, delay is measured between the 50% point of the waveforms, as shown in Figure 110. This choice assumes that the switching points of all gates are approximately at VDD/2. The time from the 10% point of a waveform to the 90% point is called the rise time tr. The time from the 90% to the 10% point is referred to as the fall time tf.

Figure 111. (a) Schematic for a simple CMOS inverter driving a distributed RC wiring load and a single gate load. (b) On-resistance buffer and interconnection delay model, the distributed RC wiring is modeled with lumped components.2. Models for Interconnect DelayFigure 112 shows a three-dimensional view of a wire and an electrical model including several parasitic effects . Interconnect delay is a result of these parasitic effects: Interconnect Capacitance Cw, which arises from three components: the area component, also referred to as parallel plate capacitance; the fringing field component that accounts for the finite dimensions of wires; and the wire-to-wire capacitance between neighboring wires. In modern IC technologies the latter two are becoming comparable to parallel plate capacitance. Given a wire of width w, length l, dielectric thickness tox, a dielectric constant ox , and the fringing field factor, the wire capacitance is

Interconnect Resistance Rw, which arises since conductors in modern ICs are not ideal. Given the resistivity of the wiring material , and the thickness of the wire h, the wire resistance is

Other effects such as interconnect inductance Lw and shunt conductance Gw . These effects have been negligible up to date in IC technologies; however, they may become more important with decreasing dimensions.

Figure 112. (a) Three-dimensional view of an IC wiring structure. The dimensions of the wire are its length l, width w, thickness h, and its distance from a ground plane tox. (b) Electrical model of the parasitic effects of the wire.This section considers various interconnect models that handle parasitic capacitance and resistance. The most accurate model of interconnect at high speed is the transmission line model, where wires have capacitance, resistance, and inductance per unit length. These models are difficult to use and are unnecessarily complicated for present-day IC technologies. Transmission line analysis is only called for when the time of flight is longer than the signal rise times. This effect only occurs in the fastest of modern IC technologies such as gallium arsenide ICs. For this reason, this section focuses on two practical interconnect models, the distributed RC network model and the lumped RC network models.3. Delay in RC Trees The general time-domain solution of any RC network is a well-known problem in numerical analysis. An RC network (as shown in Figure 113) produces a linear system of differential equations that can be solved for an exact response. In general, given a linear network with n storage elements the response of the Laplace transform (or frequency domain) of the network, H(s) = L(h(t)) can be expressed in the form

Which has the time-domain solution

where pi and ki are the poles and residues, respectively, of the frequency domain expression. Unfortunately, the problem of computing the exact poles and residues from the frequency domain solution is a computationally expensive problem. Furthermore, it has been observed in practice that the response of practical RC networks is dominated by a few poles and residues.

Figure 113. Example of RC Tree for a CMOS VLSI Circuit with a fanout of 2.TIMING DRIVEN PLACEMENT Timing-driven placement incorporates timing objective functions into the placement problem. Depending on how important the timing is, designers can first try to fit the timing requirements and then minimize the area, or vice versa. Nets that must satisfy timing requirements are called critical nets. The criticality of a net is dictated by input and output timing requirements. In timing-driven placement, one tries to make critical nets timing-efficient and other nets length- and area-efficient. Timing can be represented by clock period T,

where S, dmax, and T0 represent clock skew, maximum delay, and the adjustment factor, respectively. Placement is a very important step in designing timing-efficient circuits. If a placement is bad in terms of timing, even a very powerful router cannot help. There are basically three approaches to the timing-driven placement problem: the zero-slack algorithm, the weight-based placement, and the linear programming approach. 1. Zero Slack Algorithm Consider a circuit without any loops, represented by a directed acyclic graph (DAG). If a circuit has loops, it is transformed into one without loops by removing a small number of edges. These edges will later be placed back in the circuit and the timing is verified. Inputs to the circuit (i.e., vertices with in-degree zero) are called primary inputs (PIs) and outputs of the circuit (i.e., vertices with out-degree zero) are called primary outputs (POs). Assume the signal-arriving time at each PI and the time a signal is required at a PO are known. Gate delays are also given. By tracing a signal path starting from the PIs, the arrival time tA of the signal at each pin (i.e., at inputs and the output of each gate) can be obtained. Here, it is assumed that each gate has single output; however, the fanout of a gate could be more than one. This is equivalent to saying that all outputs of a gate carry the same signal. This model can be extended to include one in which each gate has k distinct output signals. This is accomplished by duplicating the gate k times. The input to each of the duplicates is the same as the input to the original gate. However, each duplicate will have only one of the outputs of the original gate.Similarly, by starting from the POs and tracing back the path of each signal, the required time tR at each pin is obtained. Defineslack = tR tA...(1)Note that the arrival time at the output of a gate is known only when the arrival time at every input of that gate has been computed. Thus, a topological sorting is performed on the gates (vertices) to find the arrival times. (A topological sort is the process of visiting the gates one by one. A gate can be visited if all its inputs are PIs or when all the gates providing signal to it have already been visited.)The relationship between the arrival time of an output pin, , and that of the output pins of the fanin cells, , , , .. , is expressed in Equation 1. Terms , , .., are net delays, and D is the delay of the cell containing pin j . The relationship is also shown in Figure 114. ..(2)Deciding the required times is done in a similar manner, except the signal direction (i.e., the edges of the DAG) is reversed when topological sorting is applied. The relationship between the required time of the output pin, , and that of the related output pins of the fanout cells, , , , .. , is shown in Figure 115 and expressed in Equation 2. The terms D1, D2, D3, . . ., Dm are cell delays, and dn is the delay of the net that connects the cell to its fanout cells. ..(3)The zero-slack algorithm associates a maximum allowable delay to each net in an iterative manner. In the first iteration, the delay of each net is assumed to be zero. The purpose of the zero-slack algorithm is to distribute slacks to each net. It transforms timing constraints to net length constraints. Consequently, primary inputs, primary outputs, and the outputs of each cell will have zero slacks.

Figure 114. Computation of arrival time at an output pin.

Figure 115. Computation of required time at an output pin.An example is demonstrated in Figures 116 122. A triplet tA/slack/tR is shown at each (input and output) pin. Figure 116 shows the input circuit with the delay of each net initially set to zero. The arrival times of primary inputs and the required times of primary outputs are given. Slacks are calculated according to Equations 1 3. The minimum positive slack is four and the corresponding path, then one is arbitrarily selected. In Figure 117, the four units of slack have been distributed along the path segment of Figure 116. It indicates that each of the segments on the path can have length proportional to the assigned length (in this case, 1) in the final layout. A unit-delay model is assumed; that is, the propagation delay along a wire of length l is O(l). After slack distribution, slacks on the path segment become zero. Then the new slacks are calculated. The new path segment is primary input I2 and the net connected to it. Figures 118 122 demonstrate subsequent steps. Figure 122 shows the final result, where the output pin slack of each gate is zero. The just described algorithm is summarized as follows:procedure zero-slackbegin-1repeat until no positive slack;compute all slacks;find the minimum positive slack;find a path with all the slacks equal to the minimum slack;distribute the slacks along the path segment;end-1.

Figure 116. The zero slack algorithm example, step 1.

Figure 117. The zero slack algorithm example, step 2.

Figure 118. The zero slack algorithm example, step 3.

Figure 119. The zero slack algorithm example, step 4.

Figure 120. The zero slack algorithm example, step 5.

Figure 121. The zero slack algorithm example, step 6.

Figure 123. Three different spanning tree with distinct values.2. Clock Skew ProblemIn present and future circuit design, speed is of fundamental importance. Delay on the paths is a major factor in determining the speed. Another important factor is the clock skew that is caused by asymmetric clock distribution. Clock skew increases the total clock cycle time. Consider a clock tree consisting of a source and a set of sinks. The delay of the path with longest delay between the source and a sink is denoted by dmax. Similarly, the delay of the path with smallest delay between the source and a sink is denoted by dmin. The value dmax dmin is the clock skew. In most cases, clock trees with skew equal to zero (and small total length) are most desirable. See Figure 124, where an example of a tree with a large skew (left) and one with a small skew (right) are shown. The operation shown in the figure is called H-flipping.

Figure 124. Transforming a clock tree with large skew to one with small skew.The minimization of clock skew has been studied in the past few years. First, researchers proposed to use the H-tree construction in systolic arrays. The H-tree construction can be used when all components have equal sizes and are regularly placed. When all blocks are organized hierarchically, a clock distribution technique was proposed in. It was assumed that the number of blocks at each level of hierarchy was small since an exhaustive search algorithm was employed to enumerate all the possible routes.More recently, two recursive algorithms were proposed. They are both based on the linear delay model; recall that the delay along a path of length l in a given routing tree is O(l). First, Jackson, Srinivasan, and Kuh proposed a top-down approach to the problem. Their algorithm (the JSK-algorithm) recursively partitions the problem into two equal-sized (i.e., with equal number of points) subproblems. At each step, the center of mass of the whole problem is directly connected to the center of mass of the two subproblems, as shown in Figure 125. The center of mass of a point set {P1,.....,Pn}, where Pi has coordinates xi and yi is defined as a point with x-coordinate x = (x1 + . . . +xn)/n and y-coordinate y = (y1 + . . . + yn)/n. It is shown that clock skew (dmax dmin) is bounded by O(1/) for a point set uniformly distributed in the unit square, where n is the total number of points.

Figure 125. Example of the top-down JSK-algorithm.The second recursive algorithm was proposed by Cong, Kahng, and Robins. Their algorithm is a bottom-up technique based on recursive matching. Let {P1,.....,Pn} be the set of sinks, each being a point in a two-dimensional Euclidean plane. The set of points are matched-a minimal (length) matching has been employed as shown in Figure 126(a). Then the centers of the just matched points, denoted by , are found (Figure 126(b)). Here, the center is defined as the point that minimizes the length from it to the furthest node minus the length from it to the closest point. That is, let c be the selected center, lmax the length of the longest path from c to a sink, and lmin the length of the shortest path from c to a sink. The center c is selected to minimize lmax - lmin. At this stage the generated center points are also matched as shown in Figure 126(b). This procedure is repeated until all the points are matched. The final tree is obtained as shown in Figure 126(c). It is shown that the total length of the final tree for a set uniformly distributed in the unit square is O().

Figure 126. Recursive steps in the bottom-up matching algorithm.3. Buffered Clock Trees Modern high-speed digital systems are designed with a target clock period (or clock frequency), which determines the rate of data processing. A clock network distributes the clock signal from the clock generator, or source, to the clock inputs of the synchronizing components, or sinks. This must be done while maintaining the integrity of the signal, and minimizing (or at least upper bounding) the following clock parameters: the clock skew, defined as the maximum difference of the delays from the clock source to the clock pins; the clock phase delay (or latency), defined as the maximum delay from the clock source to any clock pin; the clock rise time (or skew rate) of the signals at the clock pins, defined as the time it takes the waveform to change from a VLO to a VHI value; the sensitivity to the parametric variations of the clock skew, clock phase delay, and rise time. The most important sensitivity is to the clock skew.Additionally, since in modern microprocessors the clock networks can consume a large portion of the total microprocessor's power (up to 50%), the clock design objectives must be attained while minimizing the use of system resources such as power and area. In high performance systems, these constraints can be quite severe. Consequently, the layout of a good clock distribution network is difficult and time-consuming. A very basic concern in the construction of clock trees is the integrity of the clock signal. Since the chip interconnect is lossy, the clock signal degrades as it is transmitted through the network. The longer the path from the source to the sink, the longer the rise time of the clock signal. If the rise time of the clock signal exceeds the clock period, then the clock signal is unable to reach suitable logic high and low values. This problem can be remedied by several different clock network design strategies.Other work on buffered clock trees has focused on minimizing the phase delay of the clock tree on the assumption that reducing clock tree delays also reduces skew and skew sensitivity to process variation. In these works, the delay model usually consists of a fixed on-resistance and delay for each buffer. More recently, work in considers the minimization of skew and delay in the presence of process variations in buffered clock trees.Various wiring-oriented approaches exist to resolve these problems. First, it is possible to taper the wire widths of the clock tree such that the rise times at the terminals are decreased and perhaps satisfied (Figure 127(a)). Another alternative is to increase the capacitance and decrease the resistance on the source-to-sink paths by using wide bus techniques (Figure 127(b)). This approach can be taken to extremes by constructing a fat clock bus from which other clock pin connections are obtained (as shown in Figure 127(c)). This last type of a clock tree has to be driven by a very large buffer, as in Figure 128(a); thus large amounts of power are consumed by the clock driver and the clock interconnect. However, since the delays of the clock tree are dominated by the capacitance of the wide bus, the bus-to-clock pin interconnection can be done simply and efficiently.

Figure 127. (a) Clock tree with tapered wire widths. (b) Clock tree using a wide bus approach. (c) Fat bus clock tree constructionThere are also buffer-oriented approaches to clock network constructions. Buffer-oriented approaches partition the clock tree into sections using buffers. Consider a buffered clock tree or clock power-up tree, which contains buffers in its source-to-sink paths. Several examples of buffered clock trees are shown in Figure 128. In this case, to maintain the integrity of the clock signal, it is sufficient to restrict all the rise times in the network to values smaller than the clock period. The addition of buffers in source-to-sink paths reduces the resistance in these paths, and therefore reduces the signals' rise times. As a result, the number of buffers needed by the clock tree is bounded below by the clock's target clock period. Algorithms are proposed to compute the minimum number of buffers required to satisfy a given clock period. Buffer-oriented and wiring-oriented clock designs exhibit several trade-offs. Wiring-oriented solutions are not very sensitive to fabrication variations, while buffer-oriented solutions may be more so. The construction of wiring-oriented networks requires careful analysis and modeling of the interconnect networks, while buffer-oriented trees must also take into account detailed buffer delay modeling. Buffer-oriented solutions potentially consume less power than wiring oriented solutions. Embedding the buffers into the placement may not be as easy as constructing a wiring-oriented solution.Consider buffered clock tree constructions along with process variations and realistic buffer delay models. To obtain zero skews while reducing sensitivity to process variations, buffered clock trees usually have equal numbers of buffers in all source-to-sink paths. Furthermore, at each buffered level the clock tree buffers are required to have equal dimensions (Figure 128(c)) to avoid buffer parameter variation and buffer delay model accuracy problems. Using the buffer delay model described earlier, two additional steps are required to obtain zero-skew clock trees: the capacitive loads of the buffers in one level must be the same (this is called load matching), and the rise times at the inputs of the buffers at one level must also match. Note that wiring-oriented techniques may be applied at each buffer stage to produce load and rise time matching.

Figure 128. (a) A buffer chain with tapered sizes driving a clock tree. (b) Clock power-up tree, with buffers of a single size. (c) Clock power-up tree, with buffers of tapered sizes. (Note that all buffers on the same level have the same size.)The clock network construction problem presents trade-offs between wire length and skew, and between variation of parameters under manufacturing conditions (effectively affecting chip yield) and power consumption. These trade-offs present a challenge to the designer and to those seeking to automate the clock design process.VIA MINIMIZATION Vias (or contact holes) between different layers of interconnection on dense integrated circuits can reduce production yield, degrade circuit performance (by increasing propagation delay), and occupy a large amount of chip area (as the physical size of a via is larger than the wire's width). Also, vias on multilayer printed circuit boards raise the manufacturing cost and reduce product reliability. Thus, much research effort has been focused on minimizing the number of vias.There are basically two categories of via minimization problems, constrained via minimization (CVM), and unconstrained via minimization (UVM). In constrained via minimization, given a routing region, a set of modules and terminals, and the layout of the nets (the detailed routing), the question is how to assign the wire segments to the layers such that no two wires of different nets intersect in the same layer and the number of vias is minimum. In an unconstrained via minimization problem, the layout of the nets is not given, and the question is how to find a topological (or global) routing such that the minimum number of vias is needed. 1. Constrained Via Minimization This section focuses on constrained via minimization in two-layer environments on circuits involving two-terminal nets. A circuit involving multiterminal nets can be partitioned into one with two-terminal nets. Figure 129(a) shows a layout before via minimization. The objective is to find the position for vias such that no two wires of different nets intersect in the same layer and the number of vias is minimum. Figure 129(b) shows an optimal solution. An algorithm is presented below for finding a minimum via solution, along with several examples.

Figure 129. CVM. (a) Before via minimization. (b) After via minimization.Consider an instance of CVM shown in Figure 130(a). A crossing graph is one where the vertices represent the intersection of nets and an edge is present between two adjacent intersections. The crossing graph of the layout shown in Figure 130(a) is shown in Figure 130(b). The faces of the graph are labeled as a, b,, g, where face g is the infinite (or boundary) face. From the result of CVM, represented by a plane graph G, the dual graph G' is obtained. A face with an odd number of edges on its boundary in G is an odd-cycle face. An even-cycle face is similarly defined.

Figure 130. An example of CVM and its crossing graph.2. Unconstrained Via Minimization In contrast to the constrained via minimization, an unconstrained, or topological, via minimization (UVM) considers topological (or relative) routing of the wires and layer assignment simultaneously. Techniques for solving UVM, first decide the topological information among the signal nets such that the number of required vias is minimum. Then a geometrical mapping of the topology to the layers is performed. An example is shown in Figure 131. The topology for the nets interconnecting the given terminals is shown in Figure 131(a), and the layout after geometrical mapping is shown in Figure 131(b).

Figure 131. UVM. (a) Topology of the net with one via; (b) geometrical mapping.

Figure 132. Topological routing graphs. (a) A circle graph; (b) net intersection graph; (c) a max-cut.Consider a routing region and a set of terminals. The routing region is characterized as a circle and the nets are drawn as chords. The intersection graph of the chords is called a circle graph, as shown in Figure 132(a). The vertices of the net intersection graph (see Figure 132(b)) represent the nets, and the edges represent intersections between the corresponding nets. That is, an edge (i, j) exists if nets i and j intersect in the circle graph. The goal is to delete a minimum number of vertices from the intersection graph, such that the resulting graph is bipartite; see Figure 132(c). This creates two sets of vertices that can be placed into two layers without any vias. In the figure, L1 represents a set of nets assigned to layer 1, and L2 represents a set of nets assigned to layer 2. The sets L1 and L2 define the maximum sets of wires that can be routed on two layers without vias. The deleted vertices are referred to as the residual nets. The problem now reduces to one of routing the residual wires.3. Other Issues in Via Minimization As VLSI technology advances, particularly as more than two layers are being used in industry, via minimization will become increasingly important. Researchers provided a more general formulation of UVM, and considered variations that arise due to differences in multilayer technologies such as VLSI and high performance packaging. Their formulation allows pins to be assigned to a layer, more than two pins per net, and regions of unordered pins that occur in channel routing problems. Various restricted classes are discussed; problems in most classes are NP-hard. Another class of via minimization problem is allowing modification of the given layout. Two procedures are used to modify the layout. The first one is to search for an alternate route to eliminate a via. The second one is to improve the result by changing the layer of wire segments by shifting vias along wire segments.POWER MINIMIZATION Power consumption in CMOS circuits is due to the dynamic power consumption of charging and discharging capacitive loads during output transitions at gates, the short circuit current that flows during output transitions at gates, and the leakage current. The last two factors can be made sufficiently small with proper device and circuit design techniques; thus, research in design automation for low power use has focused on the minimization of the first factor, the dynamic power consumption.The average dynamic power consumed by a CMOS gate is given below, where, Cl is the load capacity at the output of the node, Vdd is the supply voltage, Tcycle is the global clock period, N is the number of transitions of the gate output per clock cycle, Cg is the load capacity due to input capacitance of fanout gates, and Cw is the load capacity due to the interconnection tree formed between the driver and its fanout gates. ##### VLSI DESIGN - vignanits.ac.invignanits.ac.in/new/course notes/ECE/III/VLSI DESIGN Objective Questions.pdf · analysis and design, Bi-CMOS Inverters. UNIT II: VLSI CIRCUIT DESIGN PROCESSES
Documents ##### ECE 410: VLSI Design Course Lecture Notes - … · ECE 410, Prof. F. Salem Lecture Notes Page 2.1 ECE 410: VLSI Design Course Lecture Notes (Uyemura textbook) Professor Fathi Salem
Documents ##### Unit4-2 - Queen's Umath121/Notes/Annotated_Online/... · 2012. 4. 16. · Unit4-15 TangentLine orLocal Linearization In the last few examples, we focused on the change in y (or f,
Documents Documents