VLSI Implementation Styles


  • 7/29/2019 VLSI Implementation Styles


    © 2006 by CRC Press LLC

    44 Full-Custom and Semi-Custom Design

    CONTENTS

    44.1 Introduction

    44.1.1 Semi-Custom Design

    44.1.2 Full-Custom Design

    44.1.3 Motivation for Semi-Custom Design

    44.2 Full-Custom Design Sequence of a Digital System

    References

    44.1 Introduction

    As integrated circuits become more inexpensive and compact, many new types of products, such as digital

    cameras, digital camcorders, and digital television [2], are being introduced, based on digital systems.

    Consequently, logic design must be done under many different motivations. Since each case is different,

    we have different design problems. For example, we have to choose an appropriate IC (integrated circuit)

    logic family, since these cases have different performance requirements (scientific computers require high

    speed, but wristwatches require very low power consumption), although in recent years, CMOS has been more widely used than other IC logic families, such as ECL, which has been used for fast computers.

    Logic functions that are frequently used by many designers, such as a full adder, are commercially

    available as off-the-shelf IC packages. (A package means an IC chip or a discrete component encased in

    a container.) Logic networks that realize such logic functions are often called standard (logic) networks.

    A single component, such as a resistor or a capacitor, is also commercially available as an off-the-shelf

    discrete component package. Logic networks can be assembled with these off-the-shelf packages. In many

    cases, not only performance requirements but also compactness and low cost are very important for

    products such as digital cameras. So, digital systems must accordingly be realized in IC packages that are

    designed, being tailored to specific objectives, rather than assembling many of these off-the-shelf packages

    on pc-boards, although assembling with these off-the-shelf packages has the advantage of ease of partial

    design changes.

    Here, however, let us consider two important cases of designing an IC chip inside such an IC package,

    which is not off-the-shelf. These cases lead to two sharply contrasting logic design approaches: quick design

    and high-performance design. Quick design of IC chips is called semi-custom design (recently called

    ASIC design, abbreviating Application Specific Integrated Circuit design), whereas deliberate design for

    high performance is called full-custom design because full-custom design is fully customized to high

    performance. Full-custom design is discussed in this chapter, and different approaches of semi-custom

    design will be discussed in the succeeding chapters.

    Saburo Muroga

    University of Illinois at Urbana-Champaign


    The second term on the right-hand side of Eq. 44.1, [Manufacturing cost per IC package], is fairly

    proportional to the size of each chip when the complexity of manufacturing is determined, being usually

    on the order of dollars, or tens of dollars in the case of commercial chips. In the case of full-custom

    design, chips are deliberately designed by many designers spending many months. So, [Design expenses],

    the first term on the right-hand side of Eq. 44.1 is very high and can easily be on the order of tens of

    millions of dollars. Thus, the first term is far greater than the second term, making [Total cost of an IC

    package] very expensive, unless [Production volume] is very large, being on the order of more than tens

    of millions. Many digital systems that use IC chips are produced in low volume and [Design expenses]

    must be very low. Semi-custom design is for this purpose and CAD programs need to be used extensively

    for shortening design time and manpower in order to reduce [Design expenses]. In this case, [Manufacturing

    cost per IC package] is higher than that in the case of full-custom design because the size of each

    chip is larger.

    Thus, we can see the following from the formula in Eq. 44.1: chips by semi-custom design are cheaper

    than those by full-custom design in small production volume, but more expensive in high production

    volume. Conversely, chips by full-custom design are cheaper in high-volume production but

    expensive in low-volume production.
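
Eq. 44.1 itself is not reproduced in this excerpt. The following sketch assumes it has the form [Total cost of an IC package] = [Design expenses] / [Production volume] + [Manufacturing cost per IC package], which is what the discussion above implies. All dollar figures are hypothetical and serve only to illustrate the crossover between the two design styles.

```python
def total_cost_per_package(design_expenses, production_volume, manufacturing_cost):
    """Assumed form of Eq. 44.1: design cost is amortized over the volume."""
    return design_expenses / production_volume + manufacturing_cost


def break_even_volume(design_a, mfg_a, design_b, mfg_b):
    """Volume at which style A and style B cost the same per package."""
    return (design_a - design_b) / (mfg_b - mfg_a)


# Hypothetical figures: full-custom has high design cost but a small, cheap chip;
# semi-custom has low design cost but a larger, more expensive chip.
FULL_CUSTOM = {"design": 20_000_000.0, "mfg": 5.0}
SEMI_CUSTOM = {"design": 200_000.0, "mfg": 15.0}

for volume in (10_000, 1_980_000, 100_000_000):
    full = total_cost_per_package(FULL_CUSTOM["design"], volume, FULL_CUSTOM["mfg"])
    semi = total_cost_per_package(SEMI_CUSTOM["design"], volume, SEMI_CUSTOM["mfg"])
    print(f"volume={volume:>11,}  full-custom=${full:,.2f}  semi-custom=${semi:,.2f}")
```

With these assumed numbers, semi-custom design is cheaper below about two million packages and full-custom design is cheaper above that volume.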

    44.2 Full-Custom Design Sequence of a Digital System

    Full-custom design flow of a digital system follows a long sequence of different design stages, as follows.

    First, the architecture of a digital system is designed by a few people. The performance or cost of the

    entire system is predominantly determined by architectural design, which must be done based on good

    knowledge of all other aspects of the system, including logic design and also software to be run. If an

    inappropriate architecture is chosen, the best performance or lowest cost of the system cannot be achieved,

    even if logic networks, or other aspects like software, are designed to yield the best results. For example,

    if microprogramming is chosen for the control logic of a microcomputer based on ROM, it occupies too

    much of the precious chip area, sacrificing performance and cost, although we have the advantages of

    short design time and design flexibility. Thus, if performance or manufacturing cost is important,

    realization of control logic by logic networks (i.e., hard-wired control logic) is preferred. Actually, every

    design stage is important for the performance of the entire system. Logic design is also one of the key factors

    for computer performance, along with architecture design, transistor circuit design, layout design, compilers,

    and application programs. Even if other factors are the same, computer speed can be significantly

    improved by deliberate logic design.

    Next, appropriate IC logic families and the corresponding transistor circuit technology are chosen for

    each segment of the system. Other aspects such as memories are simultaneously determined in greater

    detail. We do not use expensive, high-speed IC logic families where speed is not required.

    Architecture and transistor circuits are outside the scope of this handbook, so they are not discussed

    here further.

    The next stage in the design sequence is the design of logic networks, considering cost reduction and

    the highest performance, realizing functions for different segments of the digital system. Logic design

    requires many engineers for a fairly long time.

    Then, logic networks are converted into transistor circuits. This conversion is called technology mapping. It is difficult to realize the functions of the digital system with transistor circuits directly,

    skipping logic design, although experienced engineers can design logic networks and technology mapping

    at the same time, at least partly. Logic design with AND, OR, and NOT gates, using conventional switching

    theory, is convenient for human minds because AND, OR, and NOT gates in logic networks directly

    correspond, respectively, to basic logic operations, AND, OR, and NOT in logic expressions. Thus, logic

    design with AND, OR, and NOT gates is usually favored for manual design by designers and then followed

    by technology mapping. For example, the logic network with AND and OR gates shown in Figure 44.1(a)

    is technology-mapped into the MOS circuit shown in Figure 44.1(c). A variety of IC logic families, such


    45 Programmable Logic Devices

    CONTENTS

    45.1 Introduction

    45.2 PLAs and Variations

    45.3 Logic Design with PLAs

    45.4 Dynamic PLA

    45.5 Advantages and Disadvantages of PLAs

    45.5.1 Applications of PLAs

    45.6 Programmable Array Logic

    References

    45.1 Introduction

    Hardware realization of logic networks is generally very time-consuming and expensive. Also, once logic

    functions are realized in hardware, it is difficult to change them. In some cases, we need logic networks

    that are easily changeable. One such case is logic networks whose output functions need to be changed

    frequently, such as control logic in microprocessors, or logic networks whose outputs need to be flexible, such as additional functions in wrist watches and calculators. Another case is logic networks that need

    to be debugged before finalizing. Programmable logic devices (i.e., PLDs) are for this purpose. On these

    PLDs, all transistor circuits are laid out on IC chips prior to designers' use, considering all anticipated

    cases. With PLDs, designers can realize logic networks on an IC chip, by only deriving concise logic

    expressions such as minimal sums or minimal products, and then making connections among pre-laid logic

    gates on the chip. So, designers can realize their own logic networks quickly and inexpensively using these

    pre-laid chips, because they need not design logic networks, transistor circuits, and layout for each design

    problem. Thus, designers can skip substantial time of months for hardware design. CAD programs for

    deriving minimal sums or minimal products are well developed [1], so logic functions can be realized very

    easily and quickly as hardware, using these CAD programs. The ease in changing logic functions without

    changing hardware is just like programming in software, so the hardware in this case is regarded as

    programmable. Programmable logic arrays (i.e., PLAs) and FPGAs are typical programmable logic devices.

    PLDs consist of mask-programmable PLDs and field-programmable PLDs. Mask-programmable PLDs (i.e., MPLDs) can be made only by semiconductor manufacturers because connections are made

    by custom masks. Manufacturers need to make only a few custom masks for connections, out of a total of more than

    20 masks, according to customers' specifications of what logic functions are to be realized. Unlike

    mask-programmable PLDs, field-programmable PLDs (i.e., FPLDs) can be programmed by users and are

    economical only for small production volume, whereas MPLDs are economical for high production

    volume. Logic functions can be realized quicker on FPLDs than on MPLDs, saving payment of charges


    45-4 The VLSI Handbook


    by De Morgan's theorem. Thus, this is interpreted as a network of AND gates in the first level and

    OR gates in the second (output) level, as illustrated in Figure 45.1(d). This is the reason why the

    upper and lower matrices in Figure 45.1(a) are called AND and OR arrays, respectively. The vertical

    lines which run through the two arrays in Figure 45.1(a) are called the product lines, since they

    correspond to the product terms in disjunctive forms for the output functions f1, f2, and f3. Thus,

    any combinational network (or networks) of AND and OR gates in two levels can be realized by a

    PLA. The connections of MOSFET gates to horizontal or vertical lines are usually denoted by dots,

    as shown in Figure 45.2.
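
The AND-array and OR-array view described above can be sketched in a few lines of code. The functions realized in Figure 45.1 are not recoverable from this excerpt, so the product lines and output connections below are hypothetical; only the structure (product lines formed in the AND array, then summed in the OR array) follows the text.

```python
from itertools import product

# AND array: each product line lists the literals connected to it,
# as variable -> required value (1 = true literal, 0 = complemented literal).
# Hypothetical connections, not those of Figure 45.1.
PRODUCT_LINES = [
    {"x": 1, "y": 1},  # P1 = x y
    {"x": 0, "z": 1},  # P2 = x' z
    {"y": 0, "z": 0},  # P3 = y' z'
]

# OR array: each output lists the product lines connected to it.
OR_ARRAY = {"f1": [0, 1], "f2": [1], "f3": [0, 2]}


def evaluate_pla(inputs):
    """Evaluate the two-level AND-OR network for one input combination."""
    products = [
        all(inputs[var] == value for var, value in line.items())
        for line in PRODUCT_LINES
    ]
    return {
        name: int(any(products[i] for i in lines))
        for name, lines in OR_ARRAY.items()
    }


# Print the full truth table realized by these connections.
for x, y, z in product((0, 1), repeat=3):
    print(x, y, z, evaluate_pla({"x": x, "y": y, "z": z}))
```

Changing the realized functions only requires editing the two connection tables, which mirrors how a PLA is programmed by placing or removing dots in the two arrays.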

    Sequential networks can also be easily realized on a PLA, as shown in Figure 45.2. Some outputs of

    the OR array are connected to the inputs of master-slave flip-flops (usually J-K master-slave flip-flops),

    whose outputs are in turn connected to the AND array as its inputs. More than one sequential network

    can be realized on a single PLA, along with many combinational networks. Flip-flops can also be realized

    inside the AND and OR arrays without providing them outside the arrays.

    In many PLAs, the option of an output f1 or its complement is provided in order to give flexibility,

    as illustrated in the lower right-hand corner of Figure 45.2. By disconnecting one of the two marked connections at each

    output, we can have either f1 or its complement as output, as illustrated in Figure 45.3. When f1 has too many products in its disjunctive form and cannot be realized on a PLA, its complement may have a sufficiently small

    number of terms to be realizable on the PLA, or vice versa.

    If the number of product lines in a PLA is too large, each horizontal line gets too long with a significant

    increase in parasitic capacitance. Then, if the majority of the MOSFET gates provided are connected to

    this horizontal line, the input or its inverter has too many fan-out connections on this horizontal line.

    Similarly, the total number of horizontal lines cannot be too large. In other words, the array size of a

    PLA is limited because of speed considerations. In contrast, the size of a ROM can be much larger, since

    we can use more than one decoder, or use a complex decoding scheme.

    FIGURE 45.2 PLA with flip-flops and output-complementation choice.

    (Figure labels: inputs x, y, and z; AND array; OR array; J-K master-slave flip-flops with Clock and Reset lines; outputs f1, f2, and f3, with f1 available in true or complemented form.)


    The PLA shown in Figure 45.3, for example, is minimized for the given functions f1, f2, and f3, with 8

    product lines and array size (2 × 4 + 3) × 8 = 88.

    However, the minimization of the number of connections in a minimal two-level AND-OR network

    may not be as important as the minimization of the number of AND gates, although it tends to reduce

    the power consumption, because the chances of faulty PLAs can be greatly reduced by careful fabrication

    of chips. But the PLA size is determined by the number of AND gates and cannot be changed by any

    other factors. Also, instead of making connections (i.e., dots) as they become necessary on a PLA, a

    PLA is sometimes prepared by disconnecting unnecessary connections by laser beam or by blowing

    fuses after it has been manufactured with all MOSFET gates connected to the lines. In this case, the

    chances of faults can be reduced by increasing the number of connections (i.e., the number of dots) in

    the two-level AND-OR network.

    For comparison with a PLA, the MOS realization of a ROM is shown in Figure 45.4. The upper matrix

    is a decoder which has 2^n vertical lines if there are n input variables. The lower matrix stores information

    by connecting or not connecting MOSFET gates. Figure 45.4 actually realizes the same output functions

    (in negative logic) as those in Figure 45.1(a). The AND array in Figure 45.1(a) is essentially a counterpart

    of the decoder in Figure 45.4, or the decoder may be regarded as a fixed AND array with 2^n product

    lines, which is the maximum number of the product lines in a PLA. The AND array in Figure 45.1(a)

    has only three vertical lines, whereas the decoder in Figure 45.4 has eight fixed vertical lines. This indicates

    the compact information packing capability of PLAs. PLAs are smaller than ROMs, although the packing

    advantage of PLAs varies, depending on functions. For example, if we construct a ROM that realizes

    the functions of the PLA of Figure 45.3, in a manner similar to Figure 45.4, the decoder consists of 8

    horizontal lines and 16 vertical lines, and the lower matrix for information storage consists of 16 vertical

    lines and 3 horizontal lines. Thus, the ROM requires the array size of 16 × (8 + 3) = 176, compared with

    88 in Figure 45.3.
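
The array-size arithmetic above can be reproduced with a short calculation. The sketch assumes, as the numbers in the text imply, that each input contributes two horizontal lines (true and complemented), that the PLA of Figure 45.3 has 4 inputs and 3 outputs, and that the ROM decoder has one vertical line per input combination. The function names are illustrative only.

```python
def pla_array_size(n_inputs, n_outputs, n_product_lines):
    """Horizontal lines (2 per input plus 1 per output) times product lines."""
    return (2 * n_inputs + n_outputs) * n_product_lines


def rom_array_size(n_inputs, n_outputs):
    """A ROM decoder is a fixed AND array with 2**n vertical lines."""
    return (2 ** n_inputs) * (2 * n_inputs + n_outputs)


print(pla_array_size(4, 3, 8))  # PLA of Figure 45.3
print(rom_array_size(4, 3))     # equivalent ROM

# The gap widens quickly with the number of inputs when the
# number of product lines stays fixed (hypothetically at 8 here).
for n in (4, 8, 12, 16):
    print(n, pla_array_size(n, 3, 8), rom_array_size(n, 3))
```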

    FIGURE 45.4 ROM that corresponds to the PLA in Figure 45.1.

    (Figure labels: inputs x, y, and z; decoder with eight vertical lines, (111) through (000); supply Vdd; outputs f1, f2, and f3.)


    Generally, the size difference between PLAs and ROMs sharply increases as the number of input

    variables increases.

    A PLA, however, cannot store some functions, such as x1 ⊕ x2 ⊕ ... ⊕ xn if n is large, because 2^(n-1)

    product lines are required and the number of these lines is excessively large for a PLA. (The horizontal

    lines become too long with excessive fan-out and parasitic capacitance.) However, we can store these

    functions in a ROM with an appropriate decoding scheme.
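
The count of 2^(n-1) product lines for the parity function x1 ⊕ x2 ⊕ ... ⊕ xn can be checked by brute force for small n. The sketch below relies on a standard observation: two minterms can be merged into a shorter product term only if they differ in exactly one variable, and no two minterms of the parity function do, so every minterm needs its own product line. The function names are illustrative.

```python
from itertools import product


def parity_minterms(n):
    """Input combinations for which x1 XOR x2 XOR ... XOR xn equals 1."""
    return [bits for bits in product((0, 1), repeat=n) if sum(bits) % 2 == 1]


def can_merge(a, b):
    """Two minterms combine into one product term only if they differ in one variable."""
    return sum(p != q for p, q in zip(a, b)) == 1


def product_lines_needed(n):
    minterms = parity_minterms(n)
    # Flipping one variable always flips the parity, so no pair can merge
    # and every minterm must be realized as its own product line.
    mergeable = any(
        can_merge(a, b) for i, a in enumerate(minterms) for b in minterms[i + 1:]
    )
    assert not mergeable
    return len(minterms)


for n in range(1, 9):
    print(n, product_lines_needed(n), 2 ** (n - 1))
```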

    Of course, in the case of ROMs, storing a truth table without worrying about conversion of given logic

    functions into a minimal sum is convenient, although it makes the ROM size bigger than the PLA size.

    Minimal two-level networks of AND and OR gates for the absolute minimization of the PLA size

    can be derived by the minimization methods discussed in earlier chapters, if a function to be

    minimized has either at most several variables, or many more variables but with a simple relationship

    among its prime implicants [8]. But otherwise, we have to be content with near-minimal networks

    instead of minimal networks. In many cases, efforts to reduce the PLA size, even without reaching

    an absolute minimum, result in significant size reduction. Also, CAD programs have been developed

    with heuristic minimization methods [12,13], such as the one by Hong et al. [7], which was the

    first powerful heuristic procedure drastically different from conventional minimization procedures.

    MINI, the PLA minimization program of Hong et al., was later improved to ESPRESSO by Rudell,

    Brayton, et al. [1,10,11]. Recently, however, Coudert and Madre [26] developed a new method for

    absolute minimization by implicitly expressing prime implicants and minterms using BDDs

    described in Chapter 29. By this method, absolute minimization of functions with greater numbers

    of variables is more feasible than before, although it is still time-consuming.
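
As a small illustration of what minimization means for PLA size, the sketch below derives the prime implicants of a function by repeatedly merging terms that differ in one variable. This is the first step of the classical tabular (Quine-McCluskey) method, used here only because it is short; it is not MINI, ESPRESSO, or the BDD-based method mentioned above, and it omits the final selection of a minimal cover. The example function is hypothetical.

```python
from itertools import combinations


def combine(a, b):
    """Merge two terms over '0', '1', '-' that differ in exactly one fixed position."""
    diff = [i for i, (p, q) in enumerate(zip(a, b)) if p != q]
    if len(diff) == 1 and "-" not in (a[diff[0]], b[diff[0]]):
        i = diff[0]
        return a[:i] + "-" + a[i + 1:]
    return None


def prime_implicants(minterms, n_vars):
    """Return all prime implicants; '-' marks a variable absent from the term."""
    current = {format(m, f"0{n_vars}b") for m in minterms}
    primes = set()
    while current:
        merged, used = set(), set()
        for a, b in combinations(sorted(current), 2):
            c = combine(a, b)
            if c is not None:
                merged.add(c)
                used.update((a, b))
        primes |= current - used  # terms that merged with nothing are prime
        current = merged
    return primes


# Hypothetical f(x, y, z) with minterms 001, 011, 101, 110, 111.
print(sorted(prime_implicants([1, 3, 5, 6, 7], 3)))
```

For this example the five minterms reduce to two prime implicants, `--1` (z) and `11-` (xy), so the function needs two product lines instead of five.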

    45.4 Dynamic PLA

    If we want to realize a PLA in CMOS, instead of the static nMOS circuit that has been discussed in Chapter 33,

    Section 33.3, in order to save power consumption, then a PLA in CMOS requires a large area because

    we need pMOS and nMOS subcircuits. Thus, instead of static CMOS, the dynamic CMOS illustrated in

    Figure 45.5(a) is usually used. During the absence of a clock pulse of the first- and second-phase clocks,

    φ1 and φ2 (i.e., during φ1 = φ2 = 0 (low voltage, using positive logic)) shown in Figure 45.5(b), pMOSFETs

    T1, T2, and T3 become conductive and nMOSFETs T4, T5, and T6 become non-conductive, precharging vertical lines P1, P2, and P3. When a clock pulse of the first-phase clock, φ1, appears but a clock pulse of

    the second-phase clock, φ2, has not yet appeared, i.e., when φ1 = 1 (high voltage) and φ2 = 0, pMOSFETs

    T1, T2, and T3 become non-conductive and nMOSFETs T4, T5, and T6 become conductive. Then,

    depending on the values of x, y, and z, some of the vertical lines P1, P2, and P3 are discharged through some

    of the nMOSFETs in the AND array. (For example, if y = 0 (low voltage), P1 is discharged through

    nMOSFET A.) A clock pulse of the second-phase clock, φ2, is still absent (i.e., φ2 = 0), so pMOSFETs

    T7, T8, and T9 become conductive and nMOSFETs T10, T11, and T12 become non-conductive, precharging

    horizontal lines f1, f2, and f3. When a clock pulse of the first-phase clock, φ1, is still present, and a clock

    pulse of the second-phase clock, φ2, appears, i.e., when φ1 = φ2 = 1, pMOSFETs T7, T8, and T9 become

    non-conductive and nMOSFETs T10, T11, and T12 become conductive. Then, some of the horizontal lines f1,

    f2, and f3 are discharged through some of the nMOSFETs in the OR array, depending on which of the

    vertical lines P1, P2, and P3 are still charged.
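
The precharge and evaluate sequence described above can be followed with a behavioral sketch. Figure 45.5 is not available in this excerpt, so the connection tables are hypothetical; the only detail taken from the text is that a complemented input on P1 discharges that line when y = 0. Line voltages are modeled as 1 (charged) or 0 (discharged), and transistor-level timing is ignored.

```python
def with_complements(values):
    """Add the complemented signal for every input (e.g. 'y_bar')."""
    signals = dict(values)
    for name, value in values.items():
        signals[name + "_bar"] = 1 - value
    return signals


def dynamic_pla_cycle(values, and_array, or_array):
    signals = with_complements(values)

    # phi1 = phi2 = 0: every vertical (product) line is precharged.
    product_lines = [1] * len(and_array)

    # phi1 = 1, phi2 = 0: a vertical line is discharged if any nMOSFET
    # connected to it in the AND array has a high gate signal.
    for k, gate_signals in enumerate(and_array):
        if any(signals[s] for s in gate_signals):
            product_lines[k] = 0

    # While phi2 = 0 the horizontal (output) lines are precharged.
    outputs = {name: 1 for name in or_array}

    # phi1 = phi2 = 1: an output line is discharged if any vertical line
    # connected to it in the OR array is still charged.
    for name, lines in or_array.items():
        if any(product_lines[k] for k in lines):
            outputs[name] = 0

    return product_lines, outputs


# Hypothetical connections: P1 stays charged only when x = y = 1,
# P2 stays charged only when x = 0 and z = 1.
AND_ARRAY = [["x_bar", "y_bar"], ["x", "z_bar"]]
OR_ARRAY = {"f1": [0, 1], "f2": [1]}

print(dynamic_pla_cycle({"x": 1, "y": 1, "z": 0}, AND_ARRAY, OR_ARRAY))
print(dynamic_pla_cycle({"x": 1, "y": 0, "z": 0}, AND_ARRAY, OR_ARRAY))
```

In this model an output line ends up low exactly when one of its product lines remained charged, so the line voltages carry the sum of products in complemented form.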

    45.5 Advantages and Disadvantages of PLAs

    PLAs, like ROMs which are more general, have the following advantages over random-logic gate networks,

    where random-logic gate networks are those that are compactly laid out on an IC chip:

    1. There is no need for the time-consuming logic design of random-logic gate networks and even

    more time-consuming layout.

    2. Design checking is easy, and design change is also easy.


    PLAs have the following advantage and disadvantage, compared with ROMs:

    For storing the same functions or tasks, PLAs can be smaller than ROMs; generally, the size

    difference sharply increases as the number of input variables increases.

    The small size advantage of PLAs diminishes as the number of terms in a disjunctive form increases. Thus, PLAs cannot store complex functions, i.e., functions whose disjunctive forms

    consist of many product terms.

    45.5.1 Applications of PLAs

    Considering the above advantages and disadvantages, PLAs have numerous unique applications. A microprocessor

    chip uses many PLAs because of ease of design change and check. In particular, PLAs are used

    in its control logic, which is complex and requires many changes, even during its design. Also, PLAs are

    used for code conversions, microprogram address conversions, decision tables, bus priority resolvers, and

    memory overlay.

    When a new product is to be manufactured in small volume or test-marketed, PLAs are a good choice. When

    the new product is well received in the market and does not need further changes, PLAs can be replaced

    by random-logic gate networks for low cost for high volume production and high speed. Also, a full-custom design approach is very time-consuming, probably taking months or years, but if PLAs are used

    in the control logic, a number of different custom-design chips with high performance can be made

    quickly by changing only one connection mask for the PLAs, although these chips cannot have drastically

    different performance and functions.

    45.6 Programmable Array Logic

    A programmable array logic (PAL) is a special type of a PLA where the OR array is not programmable.

    In other words, in a PAL, the AND array is programmable but the OR array is fixed; whereas in a PLA,

    both arrays are programmable. The advantage of PALs is the elimination of fuses in the OR array in

    Figure 45.1(a) and special electronic circuits to blow these fuses. Since these special electronic circuits

    and programmable OR array occupy a very large area, the area is significantly reduced in PAL. Since single-output, two-level networks (i.e., many AND gates in the first level and one OR gate as the network

    output) are needed most often in design practice, many single-output two-level networks which are

    mutually unconnected are placed in some PAL packages.

    In digital systems, many non-standard networks are still used because designers want to differentiate

    their computers from competitors. But logic functions that designers want to have are too diverse to be

    standardized by semiconductor manufacturers. When off-the-shelf IC packages for standard networks,

    including microprocessors and their peripheral networks, are assembled on pc boards, many non-standard

    networks are usually required for interfacing them to other key networks or for minor modifications.

    So, they require many discrete components and IC packages, each of which has a smaller number

    of transistors, in addition to a microprocessor package with millions of gates, occupying a significant

    share of the areas on pc boards. Now, we can make connections inside PALs, instead of custom-making

    pc boards. Custom-made pc boards are expensive and time-consuming because connection patterns on

    pc boards need to be designed, these pc boards need to be manufactured and then the holes of pc boards

    have to be soldered to the pins of IC packages. The replacement by PAL packages can substantially reduce

    the area, time, and cost. If we consider related factors such as reductions of cabinet size, power consumption,

    and fans, the significance of this reduction is further appreciated.

    There are mask-programmable PALs and field-programmable PALs (i.e., FPALs). When logic design

    is not finalized and needs to be changed often, FPAL packages can reduce expense and time for repeatedly

    redesigning and remaking pc boards.


    46.2 CMOS Gate Arrays

    CMOS gate arrays are commercially available from many manufacturers in slightly different layout

    forms. As an example, Figure 46.2 shows a cell of a CMOS gate array, where a pair of pMOSFETs and a pair of nMOSFETs are placed on the left and right, respectively, without connections between them.

    The NAND gate shown in Figure 46.3(a) can be realized by connecting the components shown in

    Figure 46.2 by two metal layers as shown in Figure 46.3(b). These two metal layers are made by

    forming the first metal layer shown in Figure 46.3(c), then the insulation layer (not shown), and then the

    second metal layer shown in Figure 46.3(d). The inverter shown in Figure 46.4(a) can be realized by connections

    as shown in Figure 46.4(b).
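
A switch-level sketch shows why one cell with a pair of pMOSFETs and a pair of nMOSFETs is enough for a two-input NAND gate or an inverter. Figures 46.3 and 46.4 are not available in this excerpt, so the sketch assumes the standard CMOS topologies: for the NAND gate, two pMOSFETs in parallel to Vdd and two nMOSFETs in series to Vss.

```python
def cmos_nand(a, b):
    """Two-input CMOS NAND at switch level (assumed standard topology)."""
    # A pMOSFET conducts when its gate is low; the two are in parallel to Vdd.
    pull_up = (a == 0) or (b == 0)
    # An nMOSFET conducts when its gate is high; the two are in series to Vss.
    pull_down = (a == 1) and (b == 1)
    # Exactly one network conducts, so the output is never floating or shorted.
    assert pull_up != pull_down
    return 1 if pull_up else 0


def cmos_inverter(a):
    """CMOS inverter: one pMOSFET to Vdd and one nMOSFET to Vss."""
    pull_up = a == 0
    pull_down = a == 1
    assert pull_up != pull_down
    return 1 if pull_up else 0


for a in (0, 1):
    for b in (0, 1):
        print(a, b, cmos_nand(a, b))
```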

    Many different patterns other than that in Figure 46.2 are available for the components of a cell.

    FIGURE 46.1 Gate array. (a) Before making connections. (b) After connections made.

    (Figure labels: inputs x1 through x8; outputs f1, f2, and f3.)

    FIGURE 46.2 A cell of CMOS gate array. (Courtesy of Fujitsu Ltd. With permission.)

    (Figure labels: polysilicon gates for pMOS and nMOS; p and n regions for source/drain; n substrate; p tab; n for Vdd; p for Vss.)


    Gate Arrays 46-5


    logic networks. The cost difference would be greater (the cost is not necessarily linearly proportional

    to chip size) for the same production volume.

    3. It is difficult to keep gate delays uniform. As the number of fan-outs and the length of fan-out

    connections increase, delays increase dramatically. (If delay times of gates are not uniform, the network

    tends to generate spurious output signals.) In the case of full-custom design, the increase of gate delay

    by long or many-output connections of a gate can be reduced by redesigning the transistor circuit

    (e.g., increasing transistor size for delivering greater output power and accordingly reducing the delay).

    But such a precise adjustment is not possible in the case of gate arrays.

    Responding to a variety of different user needs in terms of speed, power consumption, cost, design

    time, ease of change, and possibly others, a large number of different gate arrays are commercially available

    from semiconductor manufacturers or are used in-house by computer manufacturers. Different numbers

    of gates are placed on a chip, with different configuration capabilities. Some gate arrays contain memories,

    for example.

    References

    1. Okabe, M. et al., A 400k-transistor CMOS sea-of-gate array with continuous track allocation, IEEE J. Solid-State Circuits, pp. 1280-1286, Oct. 1989.

    2. Muroga, S., VLSI System Design, John Wiley & Sons, 1982.

    3. Price, J.E., VLSI chip architecture for large computers, in Hardware and Software Concepts in VLSI,

    edited by G. Rabbat, Van Nostrand Reinhold Co., pp. 95-115, 1983.


    functions by software. Even application programs can be run on FPGAs and perform much faster than

    on a general-purpose computer in many cases.

    As the price of FPGAs goes down with higher speed, FPGAs are replacing other semi-custom design

    approaches in many applications.

    47.2 Basic Structures of FPGAs

    In the case of mask-programmable gate arrays, designers have to wait a few weeks for delivery of finished

    gate arrays from semiconductor manufacturers because the semiconductor manufacturers must prepare

    custom masks (although the number of custom masks for gate arrays is fewer than the case of the

    standard-cell library approach described in Chapter 48). With FPGAs, designers can realize their design

    on FPGA chips by themselves in minutes. Thus, FPGAs are becoming popular [1,2,8-10].

    Several different types of structures for FPGAs are available commercially. All of them have a basic

    structure that consists of many logic blocks or logic cells, accompanied by a large number of pre-laid

    lines for connecting these logic blocks. So, some manufacturers call FPGAs logic block arrays (LBAs).

    One has a structure similar to a gate array with routing channels where each logic cell in a gate array is

    replaced with a logic block, as shown in Figure 47.1. Another one is similar to a sea-of-gate array, as shown in Figure 47.2, illustrated with 16 logic blocks. Also, there is a structure similar to standard cells (to be

    discussed in the next chapter) where there are routing channels between a pair of rows of logic blocks,

    as shown in Figure 47.3. There is a structure where outputs of logic blocks are connected to the inputs

    of other logic blocks through bus lines, as shown in Figure 47.4.

    The internal structure of logic blocks or logic cells differs, depending on the manufacturer. A logic

    block consists of SRAMs (used as look-up tables), PALs, NAND gates, along with multiplexers, flip-flops,

    and others. Lines are pre-laid horizontally and vertically and are connected to the inputs and outputs of

    logic blocks by programmable switches. Various programmable switches, such as fuses, anti-fuses, RAMs,

    and non-volatile memories, are provided by different manufacturers. Each line actually consists of many

    short line segments and only necessary line segments are connected in order not to add unnecessary

    delay due to parasitic capacitance by using an excessive number of line segments. Line segments are also

    connected by programmable switches.
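    As an illustration of how an SRAM used as a look-up table can realize an arbitrary logic function, the
    following is a minimal sketch of one possible logic block. It assumes a hypothetical block with a single
    k-input look-up table, one flip-flop, and an output multiplexer; the class and function names are
    invented for this example and do not describe any particular manufacturer's block.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LogicBlock:
    """Hypothetical k-input logic block: an SRAM look-up table plus one flip-flop."""
    lut: List[int]               # 2**k SRAM bits; lut[i] is the output for input pattern i
    use_flip_flop: bool = False  # output multiplexer select: registered or combinational
    state: int = 0               # current flip-flop contents

    def _address(self, inputs: List[int]) -> int:
        # The input bits form the SRAM address (inputs[0] is the least significant bit).
        return sum(bit << i for i, bit in enumerate(inputs))

    def evaluate(self, inputs: List[int]) -> int:
        """Return the block output for the given input bits."""
        combinational = self.lut[self._address(inputs)]
        # The multiplexer picks either the stored flip-flop value or the LUT output.
        return self.state if self.use_flip_flop else combinational

    def clock(self, inputs: List[int]) -> None:
        """On a clock edge, the flip-flop captures the current LUT output."""
        self.state = self.lut[self._address(inputs)]


def program_lut(truth_function: Callable[..., int], k: int) -> List[int]:
    """Fill the SRAM bits by enumerating all 2**k input patterns."""
    table = []
    for address in range(2 ** k):
        bits = [(address >> i) & 1 for i in range(k)]
        table.append(int(truth_function(*bits)))
    return table


# Example: program a 3-input block as the sum output of a full adder (a XOR b XOR c).
full_adder_sum = LogicBlock(lut=program_lut(lambda a, b, c: a ^ b ^ c, 3))
```

    Reprogramming the block amounts to rewriting the contents of `lut`, which is why the same hardware can
    realize any function of its k inputs.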

    FIGURE 47.1 FPGA type of gate array with routing channels. (Figure labels: connection lines, logic
    blocks, and switch matrices; a marker in the figure denotes a connection to be made or to be disconnected.)


    Cell-Library Design Approach 48-3


    48.3 Hierarchical Design Approach

    The cell-library design approaches, which use cells of different shapes and sizes, can reduce the chip
    size more than the polycell design approach. In the polycell approach, keeping all cells at the same
    height wastes a large portion of the area of each cell, and keeping all connections among cells in
    routing channels means that the connection area may not be minimized. Moreover, by using a hierarchical
    approach based on cells of different shapes and sizes (in other words, by treating many cells as a
    building block at a higher level, many such building blocks as a building block at the next higher level,
    and so on), we can further reduce the chip area, as illustrated in Figure 48.2, because global area
    minimization can be treated better, even though this is done on the monitor. For example, cells A, B, C,
    and D are assembled into a block R (shown as a dotted-line rectangle) in Figure 48.2. Then blocks R, S,
    T, and U, also shown as dotted-line rectangles, are assembled into a bigger block W, which is at a higher
    level than blocks R, S, T, and U. But this is much more time-consuming than the polycell design approach,
    and the development of efficient CAD programs is harder. It appears to be difficult to bring the chip
    area to within about 20% of that of full-custom designed chips, although the areas of full-custom
    designed chips vary greatly with designers and, accordingly, the comparison is not simple.
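    The area wasted by keeping all cells at the same height can be estimated with a short calculation. The
    sketch below is only illustrative: the cell dimensions are hypothetical, each cell is assumed to keep
    its width when stretched to the common row height, and routing area is ignored.

```python
from typing import Dict, Tuple

# Hypothetical cell dimensions (width, height) in arbitrary layout units.
cells: Dict[str, Tuple[float, float]] = {
    "A": (4.0, 2.0),
    "B": (3.0, 5.0),
    "C": (6.0, 3.0),
    "D": (2.0, 4.0),
}


def intrinsic_area(cells: Dict[str, Tuple[float, float]]) -> float:
    """Total area when each cell keeps its own shape and size."""
    return sum(w * h for w, h in cells.values())


def polycell_area(cells: Dict[str, Tuple[float, float]]) -> float:
    """Total area when every cell is stretched to the height of the tallest cell."""
    row_height = max(h for _, h in cells.values())
    return sum(w * row_height for w, _ in cells.values())


def wasted_fraction(cells: Dict[str, Tuple[float, float]]) -> float:
    """Fraction of the uniform-height row area that holds no cell circuitry."""
    return 1.0 - intrinsic_area(cells) / polycell_area(cells)


print(f"intrinsic area: {intrinsic_area(cells)}")
print(f"polycell area:  {polycell_area(cells)}")
print(f"wasted:         {wasted_fraction(cells):.1%}")
```

    With these assumed dimensions, roughly a third of the row area is empty. This is the kind of loss that
    cells of different shapes and sizes are meant to avoid.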

    References

    1. Lauther, U., Cell based VLSI design system, in Hardware and Software Concepts in VLSI, Ed. by
    G. Rabbat, Van Nostrand Reinhold, pp. 480-494, 1983.

    2. Kick, B. et al., Standard-cell-based design methodology for high-performance support chips, IBM
    Jour. Res. Dev., pp. 505-514, July/Sept. 1997.

    3. Muroga, S., VLSI System Design, John Wiley & Sons, 1982.

    FIGURE 48.2 Hierarchical design approach. (Figure labels: cells A, B, C, and D inside block R; blocks
    R, S, T, and U inside block W.)


    49-4 The VLSI Handbook

    has variations and it makes a difference whether or not libraries of cells or macrocells are prepared from

    scratch. (Notice that in Figure 49.2, design approaches are shown in thin-line curves for the sake of

    simplicity, but actually they should be represented in very broad lines.) The cost per package for the

    off-the-shelf package design approach is fairly uniform over the entire range, but it increases for low

    production volumes because the development cost becomes significant as initial investment in the overall

    package cost. The relationship shown in this figure will change as the integration size of an IC chip

    increases, because the dependence on CAD will inevitably increase.

    49.4 Comparison of All Different Design Approaches

    As discussed so far, we have a very wide spectrum of different design approaches, from full-custom design

    approaches to the design approaches with off-the-shelf packages, as illustrated in Table 49.1. Digital

    systems can be designed by combining them. Depending upon different criteria imposed by different

    design motivations, such as speed, power consumption, size, design time, ease of changes, and reliability,

    designers can use the following approaches:

    1. Custom design: full- and semi-custom approaches

    2. Off-the-shelf discrete components and off-the-shelf IC packages, along with memory packages

    3. Off-the-shelf microcomputers along with off-the-shelf IC packages

    The full-custom design approaches give us the highest performance and reliability or the smallest
    chip size, although they are the most time-consuming. (Even in the case of microcomputers, full-custom
    designed microcomputers have better performance and smaller size than off-the-shelf microcomputers,
    because they are tailored to the user's specific needs.) This is one end of the wide spectrum of
    different design approaches. At the other end, the off-the-shelf microcomputers give us a design
    approach where the development time is the shortest, because the design is done by programming rather
    than by chip design (including logic design), and design changes are the easiest. The off-the-shelf
    discrete components and off-the-shelf IC packages give us logic networks tailored to specific needs
    with less programming than the off-the-shelf microcomputers.

    Custom design approaches, in particular the full-custom design approaches, are the most economical

    for very high production volumes (on the order of a few hundred thousand) but the least economical

    for low production volumes.

    When the production volume is low, the off-the-shelf discrete components and off-the-shelf IC

    packages give us the most economical approaches for simple tasks, but the off-the-shelf microcomputers

    are more economical for complex tasks, although performance is usually sacrificed.
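    The dependence of the most economical approach on production volume can be illustrated with a simple
    cost model, in which the initial investment is spread over the number of units produced. This is a
    sketch under stated assumptions, not data from the text: the cost figures and function names below are
    hypothetical and chosen only to show the crossover.

```python
def unit_cost(initial_investment: float, cost_per_unit: float, volume: int) -> float:
    """Cost per unit when the initial investment is spread over the production volume."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return initial_investment / volume + cost_per_unit


def break_even_volume(inv_a: float, unit_a: float, inv_b: float, unit_b: float) -> float:
    """Volume at which approach A (higher investment, cheaper unit) matches approach B."""
    if unit_b <= unit_a:
        raise ValueError("approach A must have the lower per-unit cost")
    return (inv_a - inv_b) / (unit_b - unit_a)


# Hypothetical figures: (initial investment, manufacturing cost per unit).
FULL_CUSTOM = (2_000_000.0, 5.0)
OFF_THE_SHELF = (50_000.0, 20.0)

for volume in (1_000, 100_000, 1_000_000):
    custom = unit_cost(*FULL_CUSTOM, volume)
    shelf = unit_cost(*OFF_THE_SHELF, volume)
    print(f"volume {volume:>9,}: full-custom {custom:10.2f}, off-the-shelf {shelf:8.2f}")

print("break-even volume:", break_even_volume(*FULL_CUSTOM, *OFF_THE_SHELF))
```

    Below the break-even volume the high initial investment dominates and the off-the-shelf approach is
    cheaper per unit; above it, the lower manufacturing cost of the custom chip wins.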

    TABLE 49.1 Comparison of Different Task-Realization Approaches

                          Full-Custom           Semi-Custom        Off-the-Shelf IC Package  Off-the-Shelf Microcomputer
    Speed                 Fastest               Fast               Medium                    Slowest
    Size                  Smallest (chip size)  Small (chip size)  Large (many chips)        Medium (many chips)
    Development time      Longest (layout)      Long (layout)      Medium (logic design)     Short (programming)
    Flexibility           Lowest                Low                Medium                    High
    Initial investment    Highest (layout)      High (layout)      Medium (logic design)     Low (programming)
    Unit cost
      High volume         Lowest                Low                Medium                    Highest
      Low volume          Highest               High               Medium                    Lowest
    Reliability           Highest               High               Low                       Medium