
VLSI Physical Design Notes



These are class notes meant for M.Tech VLSI students of SKUCET and others.



Dr.Y.NARASIMHA MURTHY Ph.D [email protected]

VLSI –PHYSICAL DESIGN

INTRODUCTION:

The transformation of a circuit description into a geometric description is known as a layout.

A layout consists of a set of planar geometric shapes in several layers.

The process of converting the specifications of an electrical circuit into a layout is called physical design.

Due to the large number of components and the fine details required by the fabrication process,

physical design is not practically possible without the help of computers. As a result, almost

all phases of physical design extensively use computer-aided design (CAD) tools and many

phases are either partially or fully automated. This automation of the physical design process

has increased the level of integration, reduced the turnaround time, and enhanced chip

performance.

There are various CAD tools available in the market, each with its own strengths

and weaknesses. Electronic Design Automation (EDA) companies such as Cadence,

Synopsys, Magma, and Mentor Graphics provide these CAD tools.

VLSI physical design automation mainly deals with the study of algorithms related to the

physical design process. The objective is to study optimal arrangements of devices on a plane (or

in a three-dimensional space) and various interconnection schemes between these devices to

obtain the desired functionality. Because space on a wafer is very expensive, algorithms must

use the space very efficiently to decrease the costs and improve the yield. In addition, the

arrangement of devices (placement) plays a key role in determining the performance of a chip.

Algorithms for physical design must also ensure that all the rules required by the fabrication are

followed and that the layout is within the tolerance limits of the fabrication process. Finally,

algorithms must be efficient and should be able to handle very large designs. Efficient algorithms

not only lead to fast turnaround time, but also permit designers to iteratively improve the layouts.

VLSI DESIGN CYCLE:

The process of producing a packaged VLSI chip follows a sequence of steps

popularly known as the VLSI design cycle. This design cycle is normally represented by a flow

chart shown below. The various steps involved in the design cycle are elaborated below.


(i). System specification: The specifications of the system to be designed are precisely defined

in this step. It considers performance, functionality, and the physical dimensions of the design.

The choices of fabrication technology and design techniques are also considered. The end results

are specifications for the size, speed, power, and functionality of the VLSI system to be

designed.

(ii) Functional design: In this step, behavioral aspects of the system are considered. The

outcome is usually a timing diagram or other relationships between sub-units. This information

is used to improve the overall design process and to reduce the complexity of the subsequent

phases.

(iii). Logic design: In this step, the functional design is converted into a logical design, using the

Boolean expressions. These expressions are minimized to achieve the smallest logic design


which conforms to the functional design. This logic design of the system is simulated and tested

to verify its correctness.

(iv).Circuit design: This step involves conversion of Boolean expressions into a circuit

representation by taking into consideration the speed and power requirements of the original

design. The electrical behavior of the various components is also considered in this phase. The

circuit design is usually expressed in a detailed circuit diagram.

(v).Physical design: In this step, the circuit representation of each component is converted into

a geometric representation. This representation is a set of geometric patterns which perform the

intended logic function of the corresponding component. Connections between different

components are also expressed as geometric patterns. (This geometric representation of a circuit

is called a layout). The exact details of the layout also depend on design rules, which are

guidelines based on the limitations of the fabrication process and the electrical properties of the

fabrication materials. Physical design is a very complex process, therefore, it is usually broken

down into various sub-steps in order to handle the complexity of the problem.

(vi). Design verification: In this step, the layout is verified to ensure that the layout meets the

system specifications and the fabrication requirements. Design verification consists of design

rule checking (DRC) and circuit extraction. DRC is a process which verifies that all geometric

patterns meet the design rules imposed by the fabrication process. After checking the layout for

design rule violations and removing them, the functionality of the layout is verified by circuit

extraction. This is a reverse engineering process and generates the circuit representation from the

layout. This reverse engineered circuit representation can then be compared to the original

circuit representation to verify the correctness of the layout.
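A toy flavour of what DRC does can be sketched as a single spacing rule checked over rectangles on one layer. Real rule decks contain hundreds of such rules; the 3-unit spacing value and the rectangle coordinates below are invented for illustration.

```python
import math

def spacing(r1, r2):
    """Edge-to-edge distance between axis-aligned rectangles (x1, y1, x2, y2)."""
    dx = max(r1[0] - r2[2], r2[0] - r1[2], 0)   # horizontal gap (0 if overlapping)
    dy = max(r1[1] - r2[3], r2[1] - r1[3], 0)   # vertical gap
    return math.hypot(dx, dy)

def drc_spacing(rects, min_space):
    """Report every pair of rectangles closer than the minimum spacing rule."""
    violations = []
    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            if spacing(rects[i], rects[j]) < min_space:
                violations.append((i, j))
    return violations

# Three shapes on one layer; the first pair sits only 2 units apart.
rects = [(0, 0, 4, 4), (6, 0, 10, 4), (0, 8, 4, 12)]
print(drc_spacing(rects, 3))   # [(0, 1)]: rectangles 0 and 1 violate the rule
```

A real checker would additionally handle width, enclosure, and overlap rules per layer, but the pairwise-check structure is the same.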

(vii). Fabrication: This step is followed after the design verification. The fabrication process

consists of several steps, such as wafer preparation, deposition, and diffusion of various

materials on the wafer according to the layout description. A typical wafer is 10 cm in diameter

and can be used to produce between 12 and 30 chips. Before the chip is mass produced, a

prototype is made and tested.

(viii). Packaging, testing, and debugging : In this step, the chip is fabricated and diced in a

fabrication facility. Each chip is then packaged and tested to ensure that it meets all the design

specifications and that it functions properly. Chips used in printed circuit boards (PCBs) are


packaged in a dual in-line package (DIP) or pin grid array (PGA). Chips which are to be used in

a multichip module (MCM) are not packaged because MCMs use bare or naked chips.

PHYSICAL DESIGN CYCLE :

The Physical design cycle converts a circuit diagram into a layout. This complex task is

completed in several steps, such as partitioning, floor-planning, placement, routing, and layout

compaction. The details of these steps are given below.

(a).Partitioning : The chip layout is always a complex task and hence it is divided into

several smaller tasks. A chip may contain several million transistors. Layout of the entire circuit

cannot be handled due to the limitation of memory space as well as computation power available.

Therefore, it is normally partitioned by grouping the components into blocks. The actual

partitioning process considers many factors such as size of the blocks, number of blocks, and

number of interconnections between the blocks. The output of partitioning is a set of blocks

along with the interconnections required between blocks. The set of interconnections required is

referred to as a net list. In large circuits the partitioning process is hierarchical and at the

topmost level a chip may have between 5 and 25 blocks. Each block is then partitioned

recursively into smaller blocks.

A disadvantage of the partitioning process is that it may degrade the performance of the final

design. During partitioning, critical components should be assigned to the same partition. If such

an assignment is not possible, then appropriate timing constraints must be generated to keep the

two critical components close together. Usually, several components, forming a critical path,

determine the chip performance. If each component is assigned to a different partition, the

critical path may be too long. Minimizing the length of critical paths improves system

performance.

After a chip has been partitioned, each of the sub-circuits must be placed on a fixed plane and the

nets between all the partitions must be interconnected. The placement of the sub-circuits is done

by the placement algorithms and the nets are routed by using routing algorithms.
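The quality of a partition is often measured by its cut size, the number of nets that cross between blocks. A minimal sketch, using an invented netlist of gate names:

```python
# Cut-size of a two-way partition: the number of nets spanning both blocks.
# The netlist below is a made-up illustration, not from any real design.

def cut_size(netlist, block_a):
    """Count nets with components both inside and outside block_a."""
    cut = 0
    for net in netlist:                       # each net is a set of component names
        in_a = any(c in block_a for c in net)
        out_a = any(c not in block_a for c in net)
        if in_a and out_a:
            cut += 1
    return cut

netlist = [{"g1", "g2"}, {"g2", "g3", "g4"}, {"g1", "g4"}, {"g3", "g4"}]
print(cut_size(netlist, {"g1", "g2"}))   # nets {g2,g3,g4} and {g1,g4} are cut -> 2
```

Partitioning algorithms such as Kernighan-Lin iteratively move components between blocks to reduce exactly this quantity, subject to size constraints.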

(b) Placement: It is the process of arranging a set of modules on the layout surface. Each

module has a fixed shape and fixed terminal locations. A poor placement uses a larger area and

hence results in performance degradation.

The placement process determines the exact positions of the blocks on the chip, so as to find a

minimum area arrangement for the blocks that allows completion of interconnections between


the blocks. Placement is typically done in two phases. In the first phase an initial placement is

created. In the second phase the initial placement is evaluated and iterative improvements are

made until the layout has minimum area and conforms to design specifications.

It is important to note that some space between the blocks is intentionally left empty to allow

interconnections between blocks. Placement may lead to an unroutable design, i.e., routing may not

be possible in the space provided. Thus, another iteration of placement is necessary. To limit the

number of iterations of the placement algorithm, an estimate of the required routing space is used

during the placement phase. Good routing and circuit performance heavily depend on a good

placement algorithm. This is because once the position of each block is fixed, very

little can be done to improve the routing and the overall circuit performance.
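Placement quality is commonly evaluated with a wirelength estimate; one widely used estimate is the half-perimeter wirelength (HPWL) of each net's bounding box. A minimal sketch, with invented pin coordinates:

```python
# Half-perimeter wirelength (HPWL): a standard estimate of a net's routed
# length, equal to half the perimeter of the bounding box of its pins.

def hpwl(pins):
    """pins: list of (x, y) pin positions belonging to one net."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def total_wirelength(nets):
    """Sum the HPWL estimate over every net of the placement."""
    return sum(hpwl(pins) for pins in nets)

nets = [[(0, 0), (3, 4)], [(1, 1), (2, 5), (4, 2)]]
print(total_wirelength(nets))   # (3+4) + (3+4) = 14
```

HPWL is exact for two-pin nets routed with Manhattan wiring and a cheap lower bound for multi-pin nets, which is why placers use it as a cost function.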

Placement is carried out at various levels.

System-level placement: Place all the PCBs together such that the area occupied is minimum and heat dissipation is within limits.

Board-level placement: All the chips have to be placed on a PCB of fixed area, with all modules of rectangular shape. The objectives are to minimize the number of routing layers and to meet the system performance requirements.

Chip-level placement: Floor planning and placement are normally carried out along with pin assignment, with a limited number of routing layers (2 to 4). A bad placement may be unroutable, but this can be detected only later, during routing, causing costly delays in the design cycle; minimization of area is also an objective.

Floorplanning:

Floor-plan design is an important step in the physical design of VLSI circuits: it plans the positions of

a set of circuit modules on a chip in order to optimize the circuit performance.

In floor-planning, the information about a set of modules, including their areas and interconnections,

is considered and the goal is to plan their positions on a chip to minimize the total chip area and

interconnect cost.

In the floor planning phase, the macro cells are positioned on the layout surface in such a way

that no blocks overlap and that there is enough space left to complete the interconnections. The

input for the floor planning is a set of modules, a list of terminals (pins for interconnections) for

each module and a net list, which describes the terminals which have to be connected.


Different approaches have been followed for the floor-planning problem. Wimer et al. describe a branch

and bound approach for the floor plan sizing problem, i.e. finding an optimal combination of all

possible layout-alternatives for all modules after placement. While their algorithm is able to find

the best solution for this problem, it is very time consuming, especially for real problem

instances. Cohoon et al. implemented a genetic algorithm for the whole floor planning problem.

Their algorithm makes use of estimates for the required routing space to ensure completion of


the interconnections. Another frequently used heuristic method for placement is

Simulated Annealing.

(c) Routing: The main objective in this step is to complete the interconnections between blocks

according to the specified netlist. First, the space not occupied by the blocks (called the routing

space) is partitioned into rectangular regions called channels and switchboxes. The goal of a

router is to complete all circuit connections using the shortest possible wire length and using

only the channels and switchboxes. This is usually done in two phases, referred to as the global

routing and detailed routing phases.

In global routing, connections are completed between the proper blocks of the circuit

disregarding the exact geometric details of each wire and pin. For each wire, the global router

finds a list of channels which are to be used as a passage way for that wire. In other words,

global routing specifies the ‘‘loose route’’ of a wire through different regions in the routing

space.

Global routing is followed by detailed routing, which completes point-to-point connections

between pins on the blocks. Loose routing is converted into exact routing by specifying

geometric information such as width of wires and their layer assignments. Detailed routing

includes channel routing and switchbox routing.

As all problems in routing are computationally hard, researchers have focused on heuristic

algorithms. As a result, experimental evaluation has become an integral part of all algorithms and

several benchmarks have been standardized. Due to the nature of the routing algorithms,

complete routing of all the connections cannot be guaranteed in many cases.
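The classic grid-routing approach behind many routers is Lee's maze-routing algorithm: a breadth-first wave expansion from source to target that finds a shortest path around blocked cells, followed by a retrace phase. A minimal sketch on an invented grid (real routers work over channels and switchboxes with layer assignments):

```python
from collections import deque

def lee_route(grid, src, dst):
    """grid[r][c] == 1 means blocked; returns a shortest path as a list of cells."""
    rows, cols = len(grid), len(grid[0])
    prev = {src: None}                       # also serves as the visited set
    q = deque([src])
    while q:                                 # wave-expansion phase
        cell = q.popleft()
        if cell == dst:
            path = []
            while cell is not None:          # retrace phase: walk back to src
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
               and grid[nr][nc] == 0 and (nr, nc) not in prev:
                prev[(nr, nc)] = cell
                q.append((nr, nc))
    return None                              # the net is unroutable on this grid

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(lee_route(grid, (0, 0), (2, 0)))   # routes around the blocked row
```

Because the wave expands level by level, the first time the target is reached the path is guaranteed shortest, which is why Lee routing is complete on a given grid: if it returns None, no route exists there.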

(d).Compaction: The operation of layout area minimization without violating the design rules

and without altering the original functionality of the layout is called compaction. Both the input

and the output of compaction are layouts, but the output occupies a smaller area.

Compaction is done in three ways:

(i) By reducing the space between blocks without violating the design spacing rules.

(ii) By reducing the size of each block without violating the design size rules.

(iii) By reshaping blocks without altering their electrical characteristics.

Compaction is therefore a very complex process, because it requires knowledge of

all the design rules. Based on the strategy used, compaction algorithms are divided into one-

dimensional algorithms (either in the x-dimension or the y-dimension), two-dimensional algorithms


(in both the x- and y-dimensions), and topological algorithms (which move separate cells

according to routing constraints).

Types of compaction techniques:

(i) 1-Dimensional compaction:

In this technique compaction is done in only one dimension, either the x-direction or the y-direction,

until no further compaction is possible. Two types of constraints relate to this

compaction technique: (i) separation constraints and (ii) connectivity constraints.
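The separation constraints are commonly handled with a constraint graph: each constraint x_j - x_i >= d becomes an edge of weight d, and the leftmost legal position of each block is its longest-path distance from a virtual left wall. A minimal sketch, assuming the blocks are already given in left-to-right order and with invented spacing values:

```python
# 1-D (x-direction) compaction via a constraint graph: each separation
# constraint x[j] >= x[i] + d becomes an edge i -> j of weight d, and the
# minimum legal position of every block is the longest path from the wall.

def compact_1d(blocks, constraints):
    """blocks: names in left-to-right (topological) order.
    constraints: list of (i, j, d) meaning x[j] >= x[i] + d."""
    x = {b: 0 for b in blocks}               # everything starts at the left wall
    for i, j, d in constraints:              # constraints follow block order,
        x[j] = max(x[j], x[i] + d)           # so one relaxation pass suffices
    return x

blocks = ["A", "B", "C"]
constraints = [("A", "B", 4),    # B at least 4 units right of A
               ("A", "C", 3),
               ("B", "C", 5)]
print(compact_1d(blocks, constraints))   # {'A': 0, 'B': 4, 'C': 9}
```

For an arbitrary constraint graph a topological sort (or Bellman-Ford, if connectivity constraints add negative-weight edges) would precede the relaxation pass.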

(ii).2-Dimensional compaction:

In this method compaction is done in both the x-dimension and the y-dimension. 2-D

compaction is in general much better than 1-D compaction: 2-D compaction,

solved optimally, produces minimum-area layouts. The trade-off is that it is much more

time consuming. Thus 3/2-D compaction is used instead.

(iii) 3/2-D Compaction:

In this technique the blocks are moved in such a way that they not only compact the circuit but

also resolve interferences, although the geometry is not as free as in 2-D compaction.

In this method two lists are formed: the ceiling and the floor. The first is formed by the blocks

visible from the top, and the second by the blocks visible from the

bottom. The algorithm selects the lowest block in the ceiling list and moves it to the place on the floor which

maximizes the gap between floor and ceiling. The process continues until all blocks are moved

from the ceiling to the floor.

VLSI DESIGN STYLES :

Though partitioning decomposes the physical design into several

conceptually easier steps, each step is still computationally very hard. So, in order to reduce

the complexity of physical design and to get high yield, certain restricted models and design

styles have been proposed. They are (i) the full-custom design style, (ii) the standard cell design style,

(iii) the gate array design style, and (iv) the field programmable gate array (FPGA) design style.

(i).Full-custom design style :

It is the most general form of layout in which the circuit is partitioned into a collection of sub-

circuits according to some criteria such as functionality of each sub-circuit. In this design style,

each sub-circuit is called a functional block or simply a block. The full custom design style

allows functional blocks to be of any size. Blocks can be placed at any location on the chip


surface without restriction. In other words, this style is characterized by the absence of any

constraints on the physical design process. This design style allows for very compact designs.

But, the process of automating a full-custom design style has a much higher complexity than

other restricted models. For this reason, it is used only when the final design must have a minimum

area and design time is less of a factor. The full-custom structure of a design is shown below.

The automation process for a full-custom layout is still a topic of intensive research. Some

phases of physical design of a full-custom chip may be done manually to optimize the layout.

Layout compaction is a very important aspect in full-custom. The rectangular solid boxes around

the boundary of the circuit are called I/O pads.

Pads are used to complete interconnections between chips or interconnections between chip and

the board. The space not occupied by blocks is used for routing of interconnecting wires. Initially

all the blocks are placed within the chip area, with the objective of minimizing the total area.

However, enough space must be left between the blocks to complete the routing. Usually several

metal layers are used for routing interconnections. Currently, two metal layers are common for

routing and the three-metal layer process is gaining acceptance, as the fabrication costs become

more feasible. The routing area needed between the blocks becomes increasingly smaller as more

routing layers are used. This is because some routing is done on top of the transistors in the


additional metal layers. If all the routing can be done on top of the transistors, the total chip area

is determined by the area of the transistors.

In a hierarchical design of circuit each block in full-custom design may be very complex and

may consist of several sub blocks, which in turn may be designed using the full-custom design

style or other design styles. It is clear that as any block is allowed to be placed anywhere on the

chip, the problem of optimizing area and interconnection of wires becomes difficult. Full-custom

design is very time consuming thus, the method is inappropriate for very large circuits, unless

performance is of utmost importance. Full-custom is usually used for the layout of chips like

microprocessors etc.

(ii).Standard cell design style : This is a more restricted design style and the design process

is simpler than a full-custom design style. Standard cell methodology considers the layout to

consist of rectangular cells of the same height. Initially, a circuit is partitioned into several

smaller blocks, each of which is equivalent to some predefined sub-circuit or cell. The

functionality and electrical characteristics of each predefined cell are tested, analyzed, and

specified. A collection of these cells is called a cell library, usually consisting of 200–400 cells.

Terminals on cells may be located either on the boundary or in the center of the cells. Cells are

placed in rows and the space between two rows is called a channel. These channels are used to

perform interconnections between cells. If two cells to be interconnected lie in the same row or

in adjacent rows, then the channel between the rows is used for interconnection. However, if two

cells to be connected lie in two nonadjacent rows, then their interconnection wire passes through

the empty space between any two cells, called a feedthrough.

Standard cell design is well suited for moderate-size circuits and medium production volumes.

Physical design using standard cells is simpler compared to full-custom, and efficient with

modern design tools. The standard cell design style is also widely used to implement the

‘‘random logic’’ of the full-custom design . While standard cell designs are developed more

quickly, a substantial initial investment is needed in the development of the cell library, which

may consist of several hundred cells. Each cell in the cell library is ‘‘handcrafted’’ and requires a

highly skilled design engineer. Each type of cell must be created with several transistor sizes.

Each cell must then be tested by simulation and its performance must be characterized. A

standard cell design usually takes more area than a full-custom or a handcrafted design.


However, as more metal layers become available for routing, the difference in area between the

two design styles will gradually be reduced.

(iii). Gate array design style : This design style is a simplified version of the standard cell

design style. Unlike the cells in standard cell designs, all the cells in a gate array are identical. The

entire wafer is prefabricated with an array of identical gates or cells. These cells are separated by

both vertical and horizontal spaces called vertical and horizontal channels. The circuit design is

modified such that it can be partitioned into a number of identical blocks. Each block must be

logically equivalent to a cell on the gate array. The name ‘‘gate array’’ signifies the fact that each

cell may simply be a gate, such as a three-input NAND gate. Each block in the design is mapped

or placed onto a prefabricated cell on the wafer during the partitioning / placement phase, which

is reduced to a block-to-cell assignment problem.

The number of partitioned blocks must be less than or equal to the total number of cells

on the wafer. Once the circuit is partitioned into identical blocks, the task is to make the

interconnections between the prefabricated cells on the wafer using horizontal and vertical

channels to form the actual circuit. The uncommitted gate array is taken into the fabrication

facility and routing layers are fabricated on top of the wafer. The completed wafer is also called a

customized wafer.

This simplicity of gate array design is gained at the cost of rigidity imposed upon the circuit both

by the technology and the prefabricated wafers. The advantage of gate arrays is that the steps

involved for creating any prefabricated wafer are the same, and only the last few steps in the

fabrication process actually depend on the application for which the design will be used. Hence,

gate arrays are cheaper and easier to produce than full-custom or standard cell. Similar to

standard cell design, gate array is also a nonhierarchical structure. The gate array architecture is

the most restricted form of layout, which means that it is the simplest for algorithms to work with.

For example, the task of routing in gate array is to determine if a given placement is routable.

The routability problem is conceptually simpler as compared to the routing problem in standard

cell and full-custom design styles.

(iv).Field Programmable Gate Array Design (FPGA):

In this design, chips are prefabricated with logic blocks and interconnects. Logic and

interconnects can be programmed (erased and reprogrammed) by users. No fabrication is needed.

Interconnects are predefined wire segments of fixed lengths with switches in between.


Conclusion: The choice of design style for a particular circuit depends on many factors like

functionality of the chip, time-to-market, and the volume of chips to be manufactured. Full-

custom is typically reserved for high-performance, high-volume chips, while standard cells are

used for moderate performance, where the cost of full-custom cannot be justified. Gate arrays are

typically used for low-performance, low-cost applications. A design style may be applicable to

the entire chip or a block of the chip.

Placement Algorithms :

The placement algorithms can be classified on the basis of the input to the algorithms, the nature

of output generated by the algorithms, and the process used by the algorithms. Depending on the

input, the placement algorithms can be classified into two major groups: constructive placement

and iterative improvement methods. The input to the constructive placement algorithms consists

of a set of blocks along with the net list. The algorithm finds the locations of blocks. On the

other hand, iterative improvement algorithms start with an initial placement. These algorithms

modify the initial placement in search of a better placement. These algorithms are typically used

in an iterative manner until no improvement is possible.

Simulated Annealing Placement Algorithm :

This is one of the popular placement algorithms used in the VLSI design process. The simulated

annealing technique has been successfully used in many phases of VLSI physical design, e.g.,

circuit partitioning. Simulated annealing is used in placement as an iterative improvement

algorithm.

Given a placement configuration, a change to that configuration is made by moving a component

or interchanging locations of two components. In the case of the simple pairwise interchange

algorithm, it is possible that an achieved configuration has a cost higher than that of the

optimum, but no interchange can cause a further cost reduction. In such a situation, the algorithm

is trapped at a local optimum, and cannot proceed further. Actually, this happens quite often

when this algorithm is used in real-life examples. Simulated annealing avoids getting stuck at a

local optimum by occasionally accepting moves that result in a cost increase.

The simulated annealing algorithm is given below.

Begin
    temp = INIT_TEMP;
    place = INIT_PLACEMENT;
    while (temp > FINAL_TEMP) do
        while (inner_loop_criterion = FALSE) do
            new_place = PERTURB(place);
            ΔC = COST(new_place) - COST(place);
            if (ΔC < 0) then
                place = new_place;
            else if (RANDOM(0,1) < e^(-ΔC/temp)) then
                place = new_place;
        temp = SCHEDULE(temp);
End;

In simulated annealing, all moves that result in a decrease in cost are accepted. Moves that result

in an increase in cost are accepted with a probability that decreases over the iterations. The

analogy to the actual annealing process is heightened with the use of a parameter called

temperature T. This parameter controls the probability of accepting moves that result in an

increased cost. More such moves are accepted at higher values of temperature than at lower

values. The acceptance probability is given by e^(-ΔC/T), where ΔC is the increase in cost. The

algorithm starts with a very high value of temperature, which gradually decreases so that moves

that increase cost have a lower probability of being accepted.

Finally, the temperature reduces to a very low value which causes only moves that reduce cost to

be accepted. In this way the algorithm converges to an optimal or near-optimal configuration.

This random shuffling could be achieved by displacing a block to a random location, an

interchange of two blocks, or any other move that can change the wire length. After the shuffle,

the change in cost is evaluated. If a decrease in cost occurs, the configuration is accepted;

otherwise, the new configuration is accepted with a probability that depends on the temperature.

The temperature is then lowered using some function which, for example, could be exponential

in nature. The process is stopped when the temperature has dropped to a certain level. The

parameters and functions used in a simulated annealing algorithm determine the quality of the

placement produced.

These parameters and functions include the cooling schedule consisting of initial temperature

(init_temp), final temperature (final_temp), and the function used for changing the temperature

(SCHEDULE), inner_loop_criterion, which is the number of trials at each temperature, the


process used for shuffling a configuration (PERTURB), acceptance probability (F), and the cost

function (COST). A good choice of these parameters and functions can result in a good

placement in a relatively short time.
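The algorithm above can be made concrete with a small runnable sketch: four blocks on a 2x2 grid, the cost function is total Manhattan wirelength over a ring of four two-pin nets, and PERTURB swaps two block positions. The netlist, grid size, and cooling constants are all invented for illustration.

```python
import math
import random

BLOCKS = ["a", "b", "c", "d"]
NETS = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]   # a ring of 4 nets
SLOTS = [(x, y) for x in range(2) for y in range(2)]      # a 2x2 placement grid

def cost(place):
    """Total Manhattan wirelength over all two-pin nets."""
    return sum(abs(place[u][0] - place[v][0]) + abs(place[u][1] - place[v][1])
               for u, v in NETS)

def anneal(init_temp=10.0, final_temp=0.01, alpha=0.9, trials=20):
    random.seed(0)                                   # reproducible demo run
    place = dict(zip(BLOCKS, random.sample(SLOTS, len(BLOCKS))))
    c = cost(place)
    temp = init_temp
    while temp > final_temp:                         # cooling loop
        for _ in range(trials):                      # inner_loop_criterion
            u, v = random.sample(BLOCKS, 2)          # PERTURB: swap two blocks
            place[u], place[v] = place[v], place[u]
            delta = cost(place) - c
            if delta < 0 or random.random() < math.exp(-delta / temp):
                c += delta                           # accept the move
            else:
                place[u], place[v] = place[v], place[u]   # reject: undo the swap
        temp *= alpha                                # SCHEDULE: geometric cooling
    return place, c

place, c = anneal()
print(c)           # the optimum for this ring netlist is 4
```

Raising `trials` or slowing the cooling (`alpha` closer to 1) trades runtime for a better chance of reaching the optimum, exactly the parameter/quality trade-off described above.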

Backtrack Search Algorithm:

In finding a solution to a problem, a sequence of decision points is considered, in which

each choice leads further along some path. If the correct set of choices is made, the search ends

up at the solution. On the other hand, if a dead end is reached, or it is otherwise found that an

incorrect choice was made, the search backtracks to a previous decision point and tries a

different path. Algorithms that use this approach are called backtracking algorithms.

So, this algorithm uses recursive depth-first search to explore the search space. Depth-first search

keeps going down a path as long as it can. If it reaches a node with no children (a "dead end"), then

it backtracks to its parent and tries another child node that it hasn't already explored. If it has

explored all child nodes, then it backtracks up one more level and continues. If the average

branching factor is b and the depth of the search tree is k, then backtrack search requires O(b^k)

time, which is exponential. Many classic combinatorial problems require backtrack search.
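A classic problem solved by backtrack search is the N-queens puzzle: place one queen per row, and backtrack whenever a row has no safe column. A minimal sketch:

```python
def n_queens(n, cols=()):
    """Return a tuple giving each row's queen column, or None if unsolvable."""
    if len(cols) == n:                      # every row placed: a solution
        return cols
    row = len(cols)
    for col in range(n):                    # decision point: try each column
        safe = all(col != c and abs(col - c) != row - r
                   for r, c in enumerate(cols))    # no attack from earlier rows
        if safe:
            result = n_queens(n, cols + (col,))
            if result is not None:
                return result               # a correct set of choices was found
    return None                             # dead end: the caller backtracks

print(n_queens(6))
```

For n = 3 the search exhausts every branch and returns None, illustrating the exponential worst case noted above.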

Branch and Bound algorithm:

Branch and bound is a variant of backtracking search that takes advantage of information about the optimality of partial solutions to avoid considering solutions that cannot be optimal. We are still doing an exhaustive search, but we potentially avoid exploring large parts of the search space that cannot contain an optimal solution. Given an initial problem and some objective function f to be minimized, the branch and bound technique works as follows.


If the problem is small enough, then solve it directly.

Otherwise, the problem is decomposed into two or more subproblems. Each subproblem is characterized by the inclusion of one or more constraints.

For each subproblem, we compute a lower bounding function g. This lower bound represents the smallest possible cost of a solution to the subproblem, given the constraints on that subproblem. If the lower bound of a subproblem is no better than the best complete solution found so far, the subproblem is discarded without further exploration.
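A minimal sketch of this procedure on a small assignment problem (choose one column per row, all columns distinct, minimizing total cost); the cost matrix and the bounding function g here are illustrative assumptions:

```python
def assignment_bnb(cost):
    """Branch and bound for the assignment problem (minimize total cost).
    g(partial) = cost so far + the cheapest still-available cost in each
    unassigned row; subtrees with g >= best-so-far are pruned."""
    n = len(cost)
    best = {"value": float("inf"), "assign": None}

    def lower_bound(row, used, so_far):
        return so_far + sum(min(cost[r][c] for c in range(n) if c not in used)
                            for r in range(row, n))

    def branch(row, used, so_far, assign):
        if row == n:                           # small enough: solved directly
            if so_far < best["value"]:
                best["value"], best["assign"] = so_far, assign[:]
            return
        if lower_bound(row, used, so_far) >= best["value"]:
            return                             # bound: cannot beat the best
        for c in range(n):                     # branch: one subproblem/choice
            if c not in used:
                assign.append(c)
                branch(row + 1, used | {c}, so_far + cost[row][c], assign)
                assign.pop()

    branch(0, frozenset(), 0, [])
    return best["value"], best["assign"]

costs = [[9, 2, 7],
         [6, 4, 3],
         [5, 8, 1]]
print(assignment_bnb(costs))
```

Here g adds, for every unassigned row, the cheapest still-available cost in that row; any g that never exceeds the true completion cost is a valid lower bound.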

Simulation:

The objective behind any simulation tool is to create a computer-based model of the design, for verifying and analyzing the behavior of the circuit under construction at the current level of abstraction.

Types of Simulation :

Device level simulation. Circuit level simulation. Timing level and macro level simulation. Switch level simulation. Gate level simulation. RTL simulation. System level simulation.

Device level simulation : This model deals with a single semiconductor device, such as a MOS transistor, and is used to test the effect of fabrication parameters. Simulation techniques based on the finite-element method are used for this purpose.

Circuit level simulation : It deals with small groups of transistors modeled in the analog domain. The variables computed are currents and voltages, and the computations are based on numerical methods.

Switch level simulation : This simulation method models the MOS transistors as switches that pass signals. The values of signals are discrete, but the model also includes certain analog features to account for components like resistance and capacitance.

Gate level simulation : In this model a circuit is composed of several logic gates connected by unidirectional, memoryless wires. The logic gates themselves are collections of transistors and other circuit elements which perform a logic function. A logic gate may be a simple inverter, NAND gate, or NOR gate, or a more complex functional unit like a flip-flop or register.

Register-Transfer Level (RTL) simulation : This model is used for synchronous circuits where all registers are controlled by a system clock signal. The registers store the state of the system, while the combinational logic computes the next state and the output based on the current state and the


input. Here the important consideration is the state transitions; the precise timing of intermediate signals in the computation of the next state is not considered.

System level simulation : It deals with the hardware described in terms of primitives that need not correspond to hardware building blocks. VHDL is the most popular hardware description language used for system level simulation. When used in the initial stages of a design, it can describe the behavior of a circuit such as a processor as a set of communicating processes.

Gate Level Modeling and Simulation :

The gate level model forms the theoretical basis for logic design. In this model a circuit is composed of several logic gates connected by unidirectional, memoryless wires. The logic gates themselves are collections of transistors and other circuit elements which perform a logic function. A logic gate may be a simple inverter, NAND gate, or NOR gate, or a more complex functional unit like a flip-flop or register. The logic gates compute the Boolean functions corresponding to their input signals and transmit the values along wires to the inputs of the other gates to which they are connected. Each input of a gate has a unique signal source. Information is stored only in the feedback paths of sequential circuits.

Gate level modeling and simulation is classified into following four types.

Signal Modeling : Signal modeling deals with the signals applied to a logic gate. Normally Boolean signals are denoted by either 0 or 1. A signal which is neither 0 nor 1 is denoted by X; this indicates a transition from one state to the other, or an unknown value. The more values used for a signal, the more complex the modeling of a gate becomes. If a gate has n input signals, with each signal taking one of N values, the output must be specified for all N^n input combinations. The logic involved in dealing with a circuit modeled using multiple-valued discrete signals is called multiple-valued logic.

Gate Modeling : This is used to model the behavior of even a single gate. The model should be such that the signal values at a gate's outputs are efficiently computed as a function of the gate's


inputs. The outputs of a gate are represented by either a truth table representation or a subroutine representation.

Delay Modeling:

At the gate level, time is modeled in a discrete way and all delays in the circuit are expressed as an integer multiple of a time unit. The output of any physical gate takes some time to switch after the moment that the input switches. The delay incurred here can affect the correct functionality of the circuit, especially when the circuit is asynchronous. So, a correct modeling of the delays is needed. The important delay models are:

Propagation Delay model : It associates a fixed delay with the gate's output, so any effect of switching inputs is observed at the output only after a certain delay.

Rise/Fall Delay model : This model is related to the rise and fall of the output of a gate. The output always takes some time to rise or fall to its new state after an input change.

Inertial Delay model : The input pulse should have a minimum width in order to have any effect at the output. Inertial delays occur due to the capacitive elements in the gate. Inertial delays can be combined with the propagation and rise/fall delay models.

Connectivity Modeling : This model is related to the suitable connection of all gates in the network. For this, the simulator should have suitable data structures to represent the connectivity.

The unilateral nature of logic gates is the basis of the operation of gate level simulators. For each binary vector at the input nodes of a logic gate, the binary value (0 or 1) at the output is computed and propagated to the inputs of the other gates that are connected to it. During the propagation of the signal a certain time delay occurs, due to the inertial elements, like node capacitances, present in the circuit. Simulators which do not consider this delay can analyze only combinational circuits. So, simulators which handle sequential circuits must estimate the propagation delay through a logic gate, and they do so in several ways. Some simulators operate in the unit-delay mode, where all logic gates are assumed to have the same delay. Such unit-delay simulators can verify only the steady-state behavior, or logic functionality, of the digital circuit.

Differences in the propagation delays through different signal paths in a network of logic gates sometimes cause undesirable situations like static hazards and dynamic hazards. Hazards are situations where a spurious glitch or spike occurs in an otherwise smooth waveform at the output of a logic gate.


Compiler Driven Simulation:

There are two basic mechanisms to simulate a circuit at the gate level: (i) compiler-driven simulation and (ii) event-driven simulation.

Compiler-driven simulation is suited to synchronous circuits. The core of such circuits consists of registers that store the state of the system and combinational logic that computes the next state. The combinational logic is sorted into a fixed evaluation order once, at compile time, and is then evaluated in that order every clock cycle.
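Under the assumption that the combinational network has been sorted into a fixed evaluation order at "compile" time, one clock cycle of such a simulator can be sketched as follows; the toy toggle circuit is an illustrative example:

```python
def compile_cycle(comb, regs):
    """Compiler-driven simulation sketch: the combinational network is
    frozen in a pre-sorted evaluation order, evaluated in full once per
    clock cycle, after which the registers latch their next-state inputs."""
    def step(state, inputs):
        sig = dict(state)
        sig.update(inputs)
        for name, fn, ins in comb:            # fixed topological order
            sig[name] = fn(*(sig[s] for s in ins))
        return {r: sig[src] for r, src in regs.items()}   # clock edge
    return step

# toy synchronous circuit: a 1-bit toggle with enable, q' = q XOR en
comb = [("d", lambda q, en: q ^ en, ["q", "en"])]
regs = {"q": "d"}     # register q latches signal d on each clock
step = compile_cycle(comb, regs)

state = {"q": 0}
for en in (1, 1, 0, 1):
    state = step(state, {"en": en})
print(state["q"])
```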

Event-driven simulation is based on the observation that, under normal conditions, very few gates switch simultaneously, so computing signal propagation through all gates in the network over and over again at each time instant leads to unnecessary computation. It is therefore economical to compute only those signals that are actually changing their states. A change in signal state is called an event; hence this simulation is called event-driven simulation.
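A minimal sketch of such an event scheduler, assuming unit gate delays and a small hypothetical netlist format (each gate name mapped to a function and its input signals):

```python
import heapq

def simulate(netlist, primary_inputs, stimuli):
    """Event-driven simulation sketch: a gate is re-evaluated only when one
    of its inputs changes state (an event); unchanged signals schedule
    nothing. Every gate is assumed to have unit delay."""
    values = {s: 0 for s in primary_inputs}
    values.update({g: 0 for g in netlist})
    fanout = {}                                   # signal -> gates it drives
    for gate, (_, ins) in netlist.items():
        for s in ins:
            fanout.setdefault(s, []).append(gate)
    # events: (time, kind, name, value); kind 0 sets a primary input,
    # kind 1 re-evaluates a gate from the *current* values of its inputs
    events = [(0, 1, g, 0) for g in netlist]      # settle initial values
    events += [(t, 0, s, v) for (t, s, v) in stimuli]
    heapq.heapify(events)
    while events:
        t, kind, name, val = heapq.heappop(events)
        if kind == 1:
            fn, ins = netlist[name]
            val = fn(*(values[s] for s in ins))
        if values[name] == val:
            continue                              # no change: no event
        values[name] = val                        # the signal changed state
        for g in fanout.get(name, []):            # only the affected gates
            heapq.heappush(events, (t + 1, 1, g, 0))
    return values

# y = NAND(a, b), z = NOT(y)
netlist = {"y": (lambda a, b: 1 - (a & b), ["a", "b"]),
           "z": (lambda y: 1 - y, ["y"])}
final = simulate(netlist, ["a", "b"],
                 [(0, "a", 1), (0, "b", 1), (5, "a", 0)])
print(final)
```

Note that a gate is evaluated from the current values of its inputs at event time, and unchanged outputs schedule nothing; that is exactly the saving event-driven simulation aims for.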

In a sequential circuit, the occurrence of a glitch could cause the circuit to malfunction. Therefore, the detection of hazards and race conditions is very important, and as a result, most digital simulators generate an alert to the user when they occur. The detection of hazards is possible by introducing a third state, denoted by X, which denotes a signal in transition.

Many simulators use a third value to represent an unknown or undefined logic level, denoted by X. This X state indicates an uninitialized signal, a signal held between the two logic thresholds, or a signal in a 0 → 1 or 1 → 0 transition. The X state is handled algebraically by extending the binary Boolean algebra to a ternary (three-valued) De Morgan algebra which preserves most of the desired properties of the gate model.
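A sketch of this three-valued extension; the function names are our own. A controlling 0 (for AND) or 1 (for OR) decides the output even when the other input is X, and the De Morgan identity carries over:

```python
# three-valued (ternary) logic sketch: values are 0, 1, and X (unknown)
X = "X"

def t_not(a):
    return X if a == X else 1 - a

def t_and(a, b):
    if a == 0 or b == 0:
        return 0          # a controlling 0 decides the output even if the
                          # other input is unknown
    if a == X or b == X:
        return X
    return 1

def t_or(a, b):
    if a == 1 or b == 1:
        return 1          # dually, a controlling 1 decides the OR output
    if a == X or b == X:
        return X
    return 0

print(t_and(0, X), t_or(1, X), t_not(X))  # 0 1 X
```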

Some other simulators implement the X state by an enumeration technique, in which the simulation is repeated with the nodes in the X state set to all possible combinations of 0's and 1's. Nodes that remain in a unique binary state for all combinations are set to this state, while the others are set to X.

To simulate tri-state gates and logic buses, some simulators use a fourth state called the high-impedance state, normally denoted by H (or Z). This H state is also sometimes used to model dynamic memory, by allowing a node to retain its previous logic state if the outputs of all logic gates connected to the node are at the H level.


Gate level simulators are not completely suitable for the logic simulation of MOS circuits, because there is a mismatch between the Boolean gate model and the behavior of MOS logic circuits. Hence, a different approach to the digital modeling and simulation of MOS circuits is needed, which is switch level logic simulation.

Switch level modeling and simulation : Switch level simulators were developed for the simulation of MOS circuits. One of the first switch-level simulators to be implemented was MOSSIM.

In contrast to gate-level modeling and simulation, the switch level techniques operate directly on the transistor circuit structure and capture many circuit properties that cannot be handled in the gate level model, for example bidirectionality of signal flow, charge sharing effects, and transistor sizes. In contrast to circuit level simulation, node voltages are represented by discrete logic levels, and transistors by bidirectional resistive switches, in switch-level modeling.

So, a MOS circuit is modeled by a set of nodes connected by resistive switches. Node voltages are represented by a small number of discrete logic levels like 0, 1, X, or U, where U denotes undefined or unknown. In some cases the high-impedance state Z is also used.

The resistive switch model of a transistor is controlled by the voltage level at its gate terminal. An n-type transistor is conducting when its gate voltage is 1, and a p-type transistor is conducting when its gate voltage is 0. Transistors are allowed to have discrete strength values depending on the values of their conductances when fully on. This is done to model the behavior of ratioed logic.

As an example, a depletion load transistor used in n-channel MOS circuit design has its gate logic level set to 1, and its strength is weaker than that of an enhancement type transistor. Transistors in series are equivalent to a single transistor of strength equal to the weakest one (minimum conductance), while transistors in parallel are equivalent to a single transistor of strength equal to the strongest one (maximum conductance).
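This series/parallel rule amounts to taking the minimum or maximum over the discrete strength values; the numeric strengths below are illustrative:

```python
def series(*strengths):
    """Transistors in series behave like one switch of the weakest strength
    (the minimum conductance in the chain limits the current)."""
    return min(strengths)

def parallel(*strengths):
    """Transistors in parallel behave like one switch of the strongest
    strength (the maximum conductance dominates)."""
    return max(strengths)

# hypothetical discrete strengths: depletion load = 1, enhancement = 2
print(series(2, 2, 1))    # the weak device limits the chain
print(parallel(1, 2))     # the strong device dominates
```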

In most switch level simulators, the circuit is partitioned into channel-connected subcircuits. This partitioning can be done once at the outset, where every transistor is included, or dynamically at every iteration, where only conducting transistors are included. Dynamic partitioning adds some additional overhead cost to the simulation.

The simulation of the entire circuit follows an event scheduler similar in many ways to gate-level logic simulators, except that now the "gates" consist of channel-connected transistor subcircuits.

LOGIC SYNTHESIS –VERIFICATION

INTRODUCTION:

Logic synthesis is the process of converting a high-level description of a design into an optimized gate-level representation. Logic synthesis uses a standard cell library, which contains simple cells, such as basic logic gates like AND, OR, and NOR, and macro cells, such as adders, muxes, memories, and flip-flops. The standard cells put together are called a technology library. Normally the technology library is identified by its transistor feature size (0.18 µm, 90 nm).

Boolean Logic Function Representation : There are various methods in use to represent logic functions, for example (i) the truth table method, (ii) the equation method, (iii) the K-map method, and (iv) the circuit method. But these methods are not standard, unique, or canonical. To synthesize, optimize, verify, or manipulate large Boolean functions, they must be represented efficiently by using suitable methods. One such method for representing complex Boolean functions is the Binary Decision Diagram (BDD). The BDD, which in reduced, ordered form is canonical, is the most popular of these methods.

Binary Decision Diagram (BDD)

A binary decision diagram (BDD) is a graphical representation of a Boolean function, which is derivable from Shannon's expansion theorem. It is similar to a binary tree. So, a binary decision diagram (BDD) is a finite DAG (Directed Acyclic Graph) with the following features:

It has a unique initial (root) node;

all non-terminals are labelled with a Boolean variable;

all terminals are labeled with 0 or 1;

all edges are labelled with 0 (dashed edge) or 1 (solid edge);

each non-terminal has exactly one out-edge labeled 0 and one out-edge labeled 1.


Shannon’s Expansion Theorem :

Shannon's expansion theorem is used iteratively to build the BDD for a given Boolean function. Shannon's expansion theorem states that "any switching function of n variables can be expressed as a sum of products of n literals, one for each variable".

Let us assume that f(x1, x2, ..., xn) is a switching function of n variables. According to Shannon, one way of expressing this function is

f(x1, x2, ..., xn) = x1f(1, x2, ..., xn) + x1'f(0, x2, ..., xn)

On the right side, the function is the sum of two terms, one of them relevant when x1 is equal to 1 and the other when x1 is equal to 0. The first term is x1 times what remains of f when x1 is set to 1, and the second term is x1' times what remains of f when x1 is set to 0.

Shannon's expansion theorem in the general case is

f = a0 x1'x2'…xn' + a1 x1'x2'…x(n−1)' xn + a2 x1'x2'…x(n−1) xn' + … + a(2^n − 2) x1x2…xn' + a(2^n − 1) x1x2…xn

Each ai is a constant whose subscript is the decimal equivalent of the product term multiplying ai, viewed as a binary number. Thus, for three variables, a5 (binary 101) is the coefficient of x1x2'x3.

In a similar way it can be stated that "any switching function of n variables can be expressed as a product of sums of n literals, one for each variable".
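The two-term expansion is easy to verify exhaustively for a small example; the function f = x1·x2 + x3 below is an arbitrary illustration:

```python
from itertools import product

def f(x1, x2, x3):
    # an arbitrary example function: f = x1.x2 + x3
    return (x1 and x2) or x3

def shannon(x1, x2, x3):
    # f = x1.f(1, x2, x3) + x1'.f(0, x2, x3)
    return (x1 and f(1, x2, x3)) or ((not x1) and f(0, x2, x3))

# check the identity over every input combination
print(all(f(*v) == shannon(*v) for v in product([0, 1], repeat=3)))  # True
```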

Binary Decision Diagram (BDD)- Example :

Let us consider an example of constructing a BDD. The output of a Boolean function S is given by the truth table, and the function is realized by the MUX-based circuit shown in the diagram.

Figure (i) Truth table Figure (ii) Mux based circuit


This Boolean function is represented by the binary decision diagram shown in figure (c) below.

Figure (c) - Binary Decision Diagram

In the BDD, a line with a bubble on it denotes the value 0, and a line without a bubble denotes the value 1.

Let us consider S(0,0,0) in Figure (d) and S(1,1,1) in Figure (e).

There are several conventions used to denote the value 1 and the value 0, for example: bubble vs. non-bubble line; dashed line vs. solid line; T (then) vs. E (else) labels.


(i) Bubble vs. no bubble (ii) Dashed vs. solid line (iii) T (then) vs. E (else) labels

BDDs for Basic Logic Functions :

The following diagrams show the BDDs of some basic logic functions such as NOT, OR, and AND.


Ordered Binary Decision Diagram (OBDD)

A binary decision diagram is said to be an ordered binary decision diagram if the variables appear in the same order along all paths from the root to the leaves.

Let [x1, x2, ..., xn] be an ordered list of variables without duplicates. A BDD is said to have the ordering [x1, x2, ..., xn] if all variable labels occur in the order [x1, x2, ..., xn], i.e., if xj follows xi along any path, then j > i.

For variables a, b, c, d an ordering could be a ≤ b ≤ c ≤ d, as shown in the diagram below.

(a) (b)

Diagram (b) above shows an optimal ordering because there is exactly one node for each variable. The order is b ≤ c ≤ a ≤ d.

Reduction operations (ROBDD) :

1. Removal of duplicate terminals. If a BDD contains more than one terminal 0-node, then redirect all edges which point to such a 0-node to just one of them. Proceed in the same way with terminal nodes labelled with 1.

2. Removal of redundant tests. If both outgoing edges of a node n point to the same node m, then eliminate node n, redirecting all its incoming edges to m.

3. Removal of duplicate non-terminals. If two distinct nodes n and m in the BDD are the roots of structurally identical sub-BDDs, then eliminate one of them, say m, and redirect all its incoming edges to the other one.

A BDD is reduced if it has been simplified as much as possible using these reduction operations.

Examples: (i). remove duplicate terminals


(ii).Remove redundant tests.

(iii).Remove duplicate non-terminal

(iv).Remove duplicate non-terminal and redundant tests
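The three rules can also be enforced while a diagram is being built, by applying Shannon expansion recursively through a "unique table"; the following is a sketch, not a complete BDD package:

```python
# ROBDD construction sketch: rule 1 is enforced by sharing the terminal
# node ids 0 and 1, rule 2 by the redundant-test check in mk(), and
# rule 3 by the unique table that shares structurally identical nodes.
unique = {}                  # (var, low, high) -> node id
nodes = {0: None, 1: None}   # ids 0 and 1 are the shared terminals
next_id = [2]

def mk(var, low, high):
    if low == high:                      # rule 2: redundant test
        return low
    key = (var, low, high)
    if key not in unique:                # rule 3: duplicate non-terminal
        unique[key] = next_id[0]
        nodes[next_id[0]] = key
        next_id[0] += 1
    return unique[key]

def build(f, order, env=()):
    """Recursively apply Shannon expansion in the given variable order."""
    if not order:
        return f(dict(env))              # terminal value: 0 or 1
    var, rest = order[0], order[1:]
    low = build(f, rest, env + ((var, 0),))
    high = build(f, rest, env + ((var, 1),))
    return mk(var, low, high)

root = build(lambda e: (e["a"] & e["b"]) | e["c"], ("a", "b", "c"))
print(len(unique))   # internal node count after reduction
```

Building f = (a AND b) OR c in the order a, b, c yields exactly three internal nodes, and a function that does not depend on a variable produces no node for that variable.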



BDD Canonical Form : Binary decision diagrams are canonical (unique) for a given ordering if all internal nodes are reachable from the root, there are no isomorphic sub-graphs, and for every node fT ≠ fE (i.e., there are no redundant tests).

Advantages of canonical representation : Representing a Boolean function in canonical form has the advantage that redundant variables can be detected easily. A Boolean function f does not depend on an input variable x if no node for x occurs in the reduced ordered binary decision diagram (ROBDD).

The equivalence of two functions f and g can be checked easily by comparing the structures of their ROBDDs. The various manipulations of BDDs can be performed directly if the function is represented in canonical form.


HIGH-LEVEL SYNTHESIS

INTRODUCTION:

The basic objective of high-level synthesis is the mapping of a behavioral description of a digital

system into an RTL design consisting of a data path and a control unit.

So, high-level synthesis (HLS) is a sequence of tasks that transforms a behavioral representation into an RTL design. The design consists of functional units such as ALUs and multipliers, storage units such as memories and register files, and interconnection units such as multiplexers and buses.

One of the tasks of synthesis is to find the structure that best meets the constraints, such as

limitations on cycle time, area, or power, while minimizing other costs. For example, the goal

might be to minimize area while achieving a certain minimum processing rate.

From that input specification, the synthesis system produces a description of a register-transfer

structure that realizes the specified behavior. This structure includes a data path, that is, a

network of registers, functional units, multiplexers and buses, as well as hardware to control the

data transfers in that network. If the control is not integrated into the data path, the synthesis system must also produce the specification of a finite state machine that drives the data path so as to produce the required behavior. The control specification could be in terms of microcode, a PLA profile, or random logic.

High-level synthesis is different from logic synthesis in the sense that the input to logic synthesis is specified in terms of logic equations, which must be optimized and mapped into a given technology. Logic synthesis is applied to a design after high-level synthesis has been done.

So, we can say that high-level synthesis (HLS) is a newer step in the design flow of a digital electronic circuit, moving the design effort to higher abstraction levels.

The advantage of High-level synthesis is that it raises the design abstraction level and allows

rapid generation of optimized RTL hardware for performance, area, and power requirements.

In addition, several advantages arise from the use of HLS in the design flow. First of all, the amount of code to be written by designers is reduced dramatically, which saves time and reduces the risk of mistakes.


Verification time is also reduced considerably, because the HLS tool can, in addition to the design itself, generate test benches, thereby reusing the test data that was used to validate the source code. This is particularly relevant for the design of FPGA-based embedded systems.

HARDWARE MODELS :

All HLS systems need to restrict the target hardware. Most systems generate synchronous

hardware and build it with the following parts:

Functional units : They can perform one or more computations, e.g. addition, multiplication, or comparison; an ALU is an example.

Registers : They store inputs, intermediate results, and outputs; sometimes several registers are taken together to form a register file.

Multiplexers : From several inputs, one is passed to the output.


Buses : A connection shared between several hardware elements, such that only one element can write data at a specific time. Three-state (tri-state) drivers control the exclusive writing on the bus.

Hardware is normally partitioned into two parts:

The data path : A network of functional units, registers, multiplexers and buses. The actual

‘‘computation’’ takes place in the data path.

Control: The part of the hardware that takes care of having the data present at the right place at

a specific time, of presenting the right instructions to a programmable unit, etc.

Often high-level synthesis concentrates on data-path synthesis. The control part is then realized

as a finite state machine or in microcode.

Synthesis tasks

High-level synthesis maps a behavioral description into the FSMD model so that the data path executes variable assignments and the control unit implements the control constructs. Since the FSMD model (an FSM with a data path) determines the amount of computation in each state, one must first define the number and type of resources (storage units, functional units, and interconnection units) to be used in the data path. Allocation is the task of defining the necessary resources for a given design constraint.

The next task in mapping a behavioral description into an FSMD model is to partition the

behavioral description into states (or control steps) so that the allocated resources can compute

all the variable assignments in each state. This partitioning of behavior into time intervals is

called scheduling.

Although scheduling assigns each operation to a particular state, it does not assign it to a

particular component. To obtain the proper implementation, we assign each variable to a storage

unit, each operation to a functional unit, and each transfer from I/O ports to units and among

units to an interconnection unit. This task is called binding (or resource sharing).

Binding defines the structure of the data path but not the structure of the control unit. The final

task, control synthesis, consists of reducing and encoding states and deriving the logic network

for next-state and control signals in the control unit. Control synthesis employs well-known logic

synthesis.

Allocation. The allocation task determines the type and quantity of resources used in the chip

architecture. It also determines the clocking scheme, memory hierarchy, and pipelining style.


The goal of allocation is to make appropriate trade-offs between the design’s cost and

performance. If the original description contains inherent parallelism, allocating more hardware

resources increases area and cost, but it also creates more opportunities for parallel operations or

storage accesses, resulting in better performance. On the other hand, allocating fewer resources

decreases area and cost, but it also forces operations to execute sequentially , resulting in poorer

performance. To perform the required tradeoffs, allocation must determine the exact area and

performance values.

Scheduling :

The next step schedules operations and memory accesses into clock cycles. Scheduling

algorithms are of two types, based on the optimization goal and the specified constraints. If the

user has completely specified all the available resources and the clock cycle length during

allocation, the scheduling algorithm’s goal is to produce a design with the best possible

performance, or the fewest clock cycles. In other words, scheduling must maximize usage of the

allocated resources. We call this approach resource-constrained scheduling. If a list of resources

is not available prior to scheduling, but a desired overall performance is specified, the scheduling

algorithm’s goal is to produce a design with the lowest possible cost, or the fewest functional

units. This is the time-constrained scheduling approach. Resource-constrained scheduling usually

constructs the schedule one state at a time. It schedules operations so as not to exceed resource

constraints or violate data dependencies. It ensures that at the instant for which it schedules an

operation Oi into control step Sj, a resource capable of executing Oi is available and all the

predecessors of node Oi have been scheduled.

Binding : The binding task assigns the operations and memory accesses within each clock cycle

to available hardware units. A resource such as a functional, storage, or interconnection unit can

be shared by different operations, data accesses, or data transfers if they are mutually exclusive.

For example, two operations assigned to two different control steps are mutually exclusive since

they will never execute simultaneously; hence they can be bound to the same hardware unit.

Binding consists of three subtasks based on the unit type.


Scheduling Algorithms : Scheduling is very important because it determines the amount of concurrency in the resulting implementation. It also determines the amount of hardware resources required, and hence the area.

There are two important scheduling algorithms. They are ASAP (as soon as possible) and ALAP (as late as possible).

ASAP Scheduling : This is the unconstrained minimum-latency scheduling problem. In this situation we have infinite resources; all that is needed is the minimum time to perform the computation. Hence it is referred to as ASAP (as soon as possible) scheduling.

The code for the ASAP scheduling is given below.

ASAP( GS(V,E) ) {
    Schedule v0 by setting t0 = 1;
    repeat {
        Select a vertex vi whose predecessors are all scheduled;
        Schedule vi by setting ti = max (tj + dj) over all predecessors vj of vi;
    }
    until (vn is scheduled);
    return t;
}

Explanation: The various steps are explained below.

Step 1 : Schedule v0 at time 1.

Step 2 : Select a vertex vi whose predecessors are all scheduled.

Step 3 : Schedule vi at time = max over its predecessors of (predecessor's scheduled time + time required for the predecessor to execute).

Step 4 : Has vn been scheduled yet? If not, repeat the loop.

ALAP Scheduling :

It is a Latency constrained scheduling problem.Here Schedule must satisfy an upper bound on

latency .Hence it is referred to as ALAP (as late as possible) scheduling.

ALAP Scheduling goal is to schedule tasks/operations to perform as late as possibleThe code for the ALAP scheduling is given below.


ALAP( GS(V,E), λ ) {                  ; perform ALAP scheduling on the sequencing graph; λ is the upper bound on latency
    Schedule vn by setting tn = λ + 1;    ; schedule the sink node vn at the latency bound + 1
    repeat {
        Select a vertex vi whose successors are all scheduled;
        Schedule vi by setting ti = min (tj) - di over all successors vj of vi;
    }                                 ; a vertex may have multiple successors: take the minimum successor start time, minus the time vi needs to execute
    until (v0 is scheduled);          ; keep going until the source node v0 has been scheduled
    return t;
}

Mobility :

In ASAP scheduling we compute the earliest time at which an operation can be scheduled. One can also do the opposite and compute the latest time at which an operation can be scheduled. ALAP scheduling has the same disadvantage as ASAP scheduling: neither considers resource constraints. The combination of both schedules gives more powerful heuristics, and mobility is used as a basis for several scheduling heuristics.

So, mobility (or slack) is an important quantity used by some scheduling algorithms. Mobility is given by

Mobility = start time under ALAP scheduling - start time under ASAP scheduling

When mobility = 0, the task/operation can only be started at that given time in order to meet the overall latency constraint. Similarly, mobility > 0 indicates a span of possible start times, which helps with minimizing resources (adders, multipliers, etc.).

Hence , the concept of Mobility can be used to improve resource requirements.
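The combination of ASAP and ALAP start times can be made concrete on a toy sequencing graph; unit delays and a start time of 1 for source operations are assumed, as in the pseudocode above:

```python
def asap(succ, delay):
    """ASAP start times on a DAG: each operation starts as soon as all of
    its predecessors have finished (sources start at time 1)."""
    pred = {v: [] for v in succ}
    for u, vs in succ.items():
        for v in vs:
            pred[v].append(u)
    t = {}
    while len(t) < len(succ):           # schedule every ready vertex
        for v in succ:
            if v not in t and all(p in t for p in pred[v]):
                t[v] = max((t[p] + delay[p] for p in pred[v]), default=1)
    return t

def alap(succ, delay, latency):
    """ALAP start times: each operation starts as late as the latency
    bound allows, given its successors' start times."""
    t = {}
    while len(t) < len(succ):
        for v in succ:
            if v not in t and all(s in t for s in succ[v]):
                t[v] = min((t[s] - delay[v] for s in succ[v]),
                           default=latency - delay[v] + 1)
    return t

# small sequencing graph: a and b feed c; d is independent; unit delays
succ = {"a": ["c"], "b": ["c"], "c": [], "d": []}
delay = {v: 1 for v in succ}
s, l = asap(succ, delay), alap(succ, delay, latency=2)
mobility = {v: l[v] - s[v] for v in succ}
print(s, l, mobility)
```

Operations a, b, and c lie on the critical path (mobility 0), while the independent operation d can start in cycle 1 or 2 (mobility 1).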

Physical Design Automation of FPGAs

INTRODUCTION:

The FPGA is a comparatively new approach to ASIC design that can dramatically reduce manufacturing turnaround time and cost. In its simplest form, an FPGA consists of a regular array of programmable logic blocks interconnected by a programmable routing network. A programmable logic block is based on RAM and can be programmed by the user to act as a small logic module. Given a circuit, the user


can program the programmable logic module using an FPGA programming tool. The key advantage of FPGAs is reprogrammability. The RAM nature of FPGAs allows for in-circuit flexibility that is most useful when the specifications are likely to change in the final application. In some applications, such as remote sensors, it is necessary to make system updates via software. In an FPGA, a data channel is provided which allows easy transfer of the new logic function and reprogramming of the FPGA.

The physical design automation of FPGAs involves mainly three steps. They are partitioning,

placement and routing.

The partitioning problem in FPGAs is significantly different from the partitioning problems in other design styles. It depends mainly on the architecture in which the circuit has to be implemented.

The placement problem in FPGAs is very similar to the gate-array placement problem.

The routing problem in FPGAs is to find a connection path and program the appropriate

interconnection points.

FPGA Technologies:

An FPGA architecture mainly consists of two parts: the logic blocks and the routing network. A

logic block has a fixed number of inputs and one output. A wide range of functions can be

implemented using a logic block. Given a circuit to be implemented using FPGAs, it is first decomposed into smaller sub-circuits such that each sub-circuit can be implemented using a single logic block. There are two types of logic blocks: the first type is based on Look-Up Tables (LUTs), while the second is based on multiplexers.

Look-up table based logic blocks:

A LUT-based logic block is essentially a small segment of RAM. If a function needs to be implemented, its truth table is loaded into the logic block (for instance, at power-up). In this way, on receiving a certain set of inputs, the logic block simply ‘looks up’ the appropriate output and sets the output line accordingly. Because of their reconfigurable nature, LUT-based logic blocks are also called Configurable Logic Blocks (CLBs). It is clear that 2^k bits are required in a logic block to represent a k-bit input, 1-bit output combinational logic function.
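The look-up behaviour can be sketched as follows (the class and method names are illustrative only, not taken from any vendor tool):

```python
# Illustrative sketch of a k-input LUT: 2**k configuration bits, with the
# input vector used as a binary address into the stored truth table.
class Lut:
    def __init__(self, k, bits):
        assert len(bits) == 2 ** k, "a k-input LUT needs 2**k configuration bits"
        self.k = k
        self.bits = bits

    def eval(self, *inputs):
        # Pack the input bits into a table index, most significant bit first.
        index = 0
        for bit in inputs:
            index = (index << 1) | bit
        return self.bits[index]

# "Program" a 2-input AND: truth-table outputs for inputs 00, 01, 10, 11.
and_lut = Lut(2, [0, 0, 0, 1])
print(and_lut.eval(1, 1))  # 1
print(and_lut.eval(1, 0))  # 0
```

Changing the function means only reloading the bit vector; the surrounding hardware stays the same, which is exactly the source of the CLB's reconfigurability.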

Multiplexer based logic blocks: Typically, a multiplexer-based logic block consists of three 2-to-1 multiplexers and one two-input OR gate, as shown in the figure below.


The number of inputs to the block is eight. The circuit within the logic block can be used to implement a wide range of functions. One such function, shown in Figure (a), can be mapped to a logic block as shown in Figure (b). Thus, the programming of a multiplexer-based logic block is achieved by routing different inputs into the block.
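Assuming the classic Actel-style module that this description matches (two 2-to-1 multiplexers feeding a third, whose select line is the OR of two inputs), the behaviour can be sketched as:

```python
# Sketch of a multiplexer-based logic block with eight inputs: three
# 2-to-1 muxes and one two-input OR gate. The exact wiring is an
# assumption based on the Actel ACT-style module.
def mux2(d0, d1, s):
    """2-to-1 multiplexer: select d1 when s is 1, else d0."""
    return d1 if s else d0

def mux_block(a0, a1, sa, b0, b1, sb, s0, s1):
    upper = mux2(a0, a1, sa)
    lower = mux2(b0, b1, sb)
    return mux2(upper, lower, s0 | s1)  # final select comes from the OR gate

# "Programming" = routing constants/signals into the eight inputs.
# Example: a 2-input AND of x and y (this wiring is one of several choices).
x, y = 1, 1
print(mux_block(0, 0, 0, y, 0, 0, x, 0))  # 1
```

With the select of the output mux equal to x, the block returns the lower mux's value y when x = 1 and the constant 0 otherwise, i.e. x AND y.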

There are two models of routing network: the segmented and the non-segmented routing network.

Physical Design Cycle for FPGAs:

The physical design cycle for FPGAs consists of the following three important steps:

Partitioning: The circuit to be mapped onto the FPGA has to be partitioned into smaller sub-circuits, such that each sub-circuit can be mapped to a programmable logic block. Unlike partitioning in other design styles, there are no constraints on the size of a partition; however, there are constraints on the inputs and outputs of a partition. This is due to the unique architecture of FPGAs.
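The I/O constraint amounts to counting the nets that cross a partition's boundary, since each crossing net needs a pin on the logic block. A minimal sketch (representing a net as a set of cell names is an assumption made for illustration):

```python
# Hypothetical sketch: count the I/O pins a partition needs, i.e. the
# nets that connect cells inside the partition to cells outside it.
def partition_io(nets, block_cells):
    """nets: iterable of sets of cell names; block_cells: cells in one partition."""
    cells = set(block_cells)
    crossing = 0
    for net in nets:
        inside = cells & net
        # A net costs a pin only if it touches both sides of the boundary.
        if inside and (net - cells):
            crossing += 1
    return crossing

nets = [{"g1", "g2"}, {"g2", "g3"}, {"g3", "g4"}]
# Partition {g1, g2}: only the net {g2, g3} crosses its boundary.
print(partition_io(nets, {"g1", "g2"}))  # 1
```

A partitioner for FPGAs accepts a sub-circuit only if this count does not exceed the pin budget of a logic block, regardless of how many gates the sub-circuit contains.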

Placement: In this step of the design cycle, the sub-circuits formed in the partitioning phase are assigned physical locations on the FPGA, i.e., each logic block on the FPGA is programmed to behave like the sub-circuit mapped to it. Placement must be carried out in such a manner that the router can complete all the interconnections. This is very critical, as the routing resources of the FPGA are limited.


Routing: In this phase, all the sub-circuits which have been programmed onto the FPGA blocks are interconnected by blowing the fuses between the routing segments to make the required connections.

The figure above shows the complete physical design cycle for FPGAs. The system design is available as a directed graph, which is partitioned in the second step. Placement involves the mapping of sub-circuits onto CLBs; the shaded rectangles represent CLBs which have been programmed. The final step is the routing of the channels.

--------------xxxxxxxxxxx---------------

