Testing and Analyzing Methods for Truncated Binary Multiplication

  • Upload
    a-villa

  • View
    220

  • Download
    0

Embed Size (px)

Citation preview

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    1/94

    Testing and analyzing methods for truncated binary

    multiplication.

    Andreas Thor Winther - S053010

    Finishing Bachelor Project

    Spring 2009

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    2/94

    Testing and analyzing methods for truncated binary

    multiplication.

    Report made by:

    Andreas Thor Winther - s053010

    Supervisor:

    Associate ProfessorAlberto Nannarelli

    DTU Informatics

    Institute of Informatics and Mathematical Modeling

    Technical University of Denmark

    Richard Petersens Plads

    Bygning 322

    2800 Kgs. Lyngby

    Danmark

    Released: June 27. - 2009

    Edition: 1.Edition

    Notes: This report is written as documentation of a B.Eng. project made at

    Informatics and Mathematical Modeling at the Technical

    University of Denmark.

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    3/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    i Abstract

    Multiplication operations are frequently required in digital signal processing. To increase

    the speed with which these are done, parallel multipliers can be used. These however

    require a large area on the chip and consume much power. An important goal would

    therefore be to reduce the area requirements. The purpose of this project is to analyze

    several methods of performing truncated multiplication and to determine which method

    would be most efficient for unsigned numbers of 8, 16 or 32 bits.

    Truncated multiplication is a technique where only the most significant columns of the

    multiplication matrix are used and therefore area requirements can be reduced by up to

    46.08 % for 32 bit inputs. This however leads to a rounding error and a reduction error

    which needs to be corrected by introducing a correction constant, and the goal of this

    project is to determine the point of balance for error vs. hardware savings. The scope of

    this project is to evaluate different well-known area reduction methods along with

    minimizing the error.

    The multiplier is implemented using VHDL (VHSIC Hardware Description Language,

    where VHSIC stands for Very High Speed Integrated Circuit) and simulated with

    ModelSim/Xilinx ISE in orderto observe the hardware areareduction. It is also

    implemented and simulated using Matlabs Simulink tool in order to evaluate the error.

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    4/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    ii Resum

    Ved digital signal behandling stilles der ofte krav om multiplikation. For at ge

    hastigheden af denne proces kan parallelle multiplikatorer finde anvendelse, men dette

    optager stor plads p chipen og krver desuden en betydelig effekt. Det er derfor af

    vigtighed at reducere kravet til plads (og dermed effekt) gennem brug af forskellige

    metoder til udfrelse af trunkeret multiplikation og til at bestemme hvilken metode, der er

    den mest effektive for ikke-fortegnsbestemte tal p 8, 16 eller 32 bits.

    Trunkeret multiplikation er en teknik, der kun bruger de mest betydningsfulde sjler i

    multipliktionsmatricen for bl.a. at spare plads. Teknikken frer imidlertid til

    afrundingsfejl og reduktionsfejl, som der derfor m korrigeres for gennem inddragelse af

    en korrektionskonstant.

    Formlet med dette projekt er at bestemme balancepunktet mellem prcisionsfejl og

    plads p hardwaren. Projektets ml er at evaluere forskellige allerede kendte metoder, dersigter mod pladsbesparelse samtidig med en minimering af fejlforgelsen.

    Multiplikatoren implementeres gennem brug af VHDL (Very High Speed Integrated

    Circuit Hardware Description Language), og simuleres med modelSim/Xilinx ISE med

    det forml at konstatere pladsbesparelsen i hardwaren. Ved at bruge Matlabs redskab

    Simulink implementeres og simuleres den endvidere med det forml at observere

    fejlforgelsen.

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    5/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    iii Preface

    This report is written as documentation of a B.Eng. project made at the Informatics and

    Mathematical Modeling department of the Technical University of Denmark.

    The team consisted of Andreas Thor Winther. The project supervisor at DTU was

    Associate Professor Alberto Nannarelli. Amanuensis Edward Alexandru Todirica helped

    with the technical part of the VHDL implementation.

    The project was about reducing area consumption on conventional parallel multipliers

    using truncation. The report consists of an analysis of the problem, a description of a

    possible solution, and a presentation of how it might be designed and implemented.

    Lyngby, June 2009

    Andreas Thor Winther

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    6/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    iv Acknowledgements

    The project was proposed by Associate Professor Alberto Nannarelli of IMM1 who also

    made sure I got a desk, along with the needed hardware and software, to carry out the

    task. I would like to thank Alberto for giving me the opportunity to work on this project

    and for being available during the project period.

    I would also like to thank Amanuensis Edward Alexandru Todirica for helping me with

    various problems concerning both VHDL problems and the writing of the report.

    Furtermore I would like to thank Emil N. Jakobsen, Markku Eerola, Bjarne Petersen and

    Troels Winther for reading through the report and for coming with constructive criticism.

    1 IMM: Informatics and Mathematical Modelling at the Technical University of Denmark

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    7/94

    Table of Contents

    i Abstract ........................................................................................................................

    ii Resum.........................................................................................................................

    iii Preface..........................................................................................................................

    iv Acknowledgements......................................................................................................

    1 Introduction................................................................................................................. 1

    1.1 The problem........................................................................................................ 1

    1.1.1 Problem description .................................................................................... 1

    1.1.2 Motivation................................................................................................... 2

    1.2 The project .......................................................................................................... 2

    1.2.1 Structure of the report ................................................................................. 2

    1.2.2 List of terminologies ................................................................................... 3

    1.2.3 Tools used ................................................................................................... 5

    1.2.3.1 Matlab r2008b Simulink ...................................................................... 5

    1.2.3.2 ModelSim 6.3a........................................................................................ 5

    1.2.3.3 Xilinx ISE 9.2i ........................................................................................ 5

    1.2.4 Hardware used ............................................................................................ 61.2.5 Method........................................................................................................ 6

    2 Theoretical background .............................................................................................. 8

    2.1 Binary multiplication .......................................................................................... 8

    2.1.1 Rounding..................................................................................................... 9

    2.1.2 Truncation ................................................................................................. 10

    2.2 Conventional multipliers................................................................................... 11

    2.3 Binary addition.................................................................................................. 11

    2.4 Carry-save adder (CSA) multiplier................................................................... 15

    2.4.1 Multiplier trees.......................................................................................... 16

    3 Analysis..................................................................................................................... 18

    3.1 Priorities ............................................................................................................ 18

    3.2 Problem solving ................................................................................................ 18

    4 Design ....................................................................................................................... 20

    4.1 Matlab Simulink model.................................................................................. 20

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    8/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    4.2 Modelsim VHDL model ................................................................................ 21

    5 Implementation ......................................................................................................... 23

    5.1 Matlab Simulink............................................................................................. 24

    5.2 VHDL ............................................................................................................... 26

    5.2.1 Components.vhd ....................................................................................... 27

    5.2.1.1 Fadd 1-bit Full adder.......................................................................... 27

    5.2.1.2 CLA_g n-bit Carry look-ahead adder ................................................ 28

    5.2.1.3 CSA_g Carry-save adder ................................................................... 29

    5.2.2 Nxn_mult.vhd ........................................................................................... 30

    5.2.3 Testbench.vhd ........................................................................................... 305.3 Testing............................................................................................................... 31

    5.3.1 Matlab Simulink test .............................................................................. 31

    5.3.2 Modelsim Xilinx ISE simulation ........................................................... 31

    5.3.3 Testing on an FPGA.................................................................................. 32

    6 Results....................................................................................................................... 33

    6.1.1 Precision error........................................................................................... 33

    6.1.2 Hardware savings...................................................................................... 36

    7 Discussion ................................................................................................................. 37

    7.1 Results evaluation............................................................................................. 37

    7.1.1 Precision Error .......................................................................................... 37

    7.1.2 Hardware savings...................................................................................... 39

    7.2 Simulink............................................................................................................ 40

    7.3 Perspectives....................................................................................................... 41

    7.4 Future work....................................................................................................... 41

    8 Conclusion ................................................................................................................ 42

    9 References................................................................................................................. 44

    10 Appendix A Matlab Simulink Design.................................................................... 46

    10.1 The inside of the Simulink sub-systems ........................................................... 46

    10.2 Matlab embedded function full source code..................................................... 47

    11 Appendix B - VHDL code ........................................................................................ 49

    11.1 Nxn_mult.vhd ................................................................................................... 49

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    9/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    11.2 Components.vhd ............................................................................................... 51

    11.3 Testbench.vhd ................................................................................................... 54

    12 Appendix C Test results......................................................................................... 59

    12.1 Simulink test bench........................................................................................... 59

    12.2 Simulation diagrams from modelSim ............................................................... 61

    13 Appendix D Synthesis reports ............................................................................... 67

    13.1 CLA_g (Carry Look-ahead) VS add_g (Ripple-carry)..................................... 67

    13.2 32-bit input (left = 31)....................................................................................... 67

    13.3 16-bit input (left = 15)....................................................................................... 70

    13.4 8-bit input (left = 7)........................................................................................... 7313.5 Example of a full syntesis report ...................................................................... 75

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    10/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Table of figures and tables

    Figure 1: 4x4 bit Binary Multiplication.............................................................................. 9

    Figure 2: 4x4 bit Binary Multiplication with truncation degree T=3 ............................... 10

    Figure 3: Left: A sequential multiplier, right: A parallel multiplier ................................. 11

    Figure 4: 1-bit full adder with truth table. [10]................................................................. 12

    Figure 5: 4-bit Ripple-Carry Adder .................................................................................. 12

    Figure 6: 4-bit Carry look-ahead adder. [10].................................................................... 13

    Figure 7: the Full adder and the CSA has same IO. [8] .................................................... 14

    Figure 8: CSA array.......................................................................................................... 14

    Figure 9: Schematic of the CSA. ...................................................................................... 15

    Figure 10: Carry-save adder (CSA) multiplier. ................................................................ 16

    Figure 11: Example of a carry-save adder Wallace tree multiplier. ................................. 17

    Figure 12: Bit shifting....................................................................................................... 20

    Figure 13: The designed Simulink model......................................................................... 21

    Figure 14: High-level design of the nxn bit parallel CSA multiplier implemented.......... 21

    Figure 15: The 2n-bit product........................................................................................... 23

    Figure 16: Flow diagram of the Embedded Matlab function, multiplier with truncation. 25Figure 17: Top-View of the VHDL implementation ........................................................ 27

    Figure 18: 2n bit final product of the multiplier ............................................................... 28

    Figure 19: The rounding sub-system ................................................................................ 46

    Figure 20: The Random Number Generator sub-system.................................................. 46

    Figure 21: The Average Calc sub-system......................................................................... 46

    Figure 23: left= 3, T = 3. Binary representation .............................................................. 61

    Figure 24: left= 3, T = 3. Unsigned representation.......................................................... 62

    Figure 25:Left= 31, T = 0 - 28. Binary representation.................................................... 63

    Figure 26: Left = 31, T = 0 - 28. Unsigned representation............................................... 64

    Figure 27: left= 3, T = 0 - 3. Binary representation......................................................... 65

    Figure 28: left = 3, T = 0 - 3. Unsigned representation .................................................... 66

    Table 1: Precision error for various truncated multipliers ................................................ 35

    Table 2: Hardware savings for various truncated multipliers........................................... 36

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    11/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Table 3: Maximum precision error for various multipliers .............................................. 38

    Table 4: Hardware savings compared with [17] ............................................................... 39

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    12/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 1 of 94

    1 Introduction

    This chapter gives an introduction to the problem, the tools used, the methods, and the

    structure of the report and the chapter consists of 2 sub-chapters:

    1. The problem an introduction to the problem and what to be done

    2. The project Tools and hardware, methods and the structure of the report.

    1.1

    The problem

    1.1.1 Problem description

    Truncation is a well-known method to reduce the hardware area of multipliers. By

    applying truncation significant hardware savings can be achieved at the cost of precision.

    The problem to be solved in this project is to analyze several methods and degrees of

    performing truncated multiplication and to determine which method/degree would be

    most efficient for 8, 16, and 32 bits.

    There are three main objectives:

    1. To determine the error for several methods and degrees of truncation for 8, 16 and

    32 bits. This is done in chapter 6

    2. To design and implement the parallel multiplier in order to observe the reduction

    in area and to confirm if the method works. This is done in chapter 5 and 6

    3. To compare the results and determine which area/error balance would be most

    efficient for 8, 16 and 32 bits.

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    13/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 2 of 94

    1.1.2 Motivation

    Truncation is a method used to reduce hardware requirements of multipliers, while

    remaining very simple in terms of implementation and options to correct for potential

    adverse effects. This implies that the results of the project could potentially lead to major

    improvements in the design and manufacture of multiplier circuitry. It has been possible

    to simulate the effects of truncation, given a wide variety of conditions, without imposing

    demands that are impossible to meet using readily available technology. The balance of

    challenge and tractability of the stated problem has been a determining factor in the

    decision to undertake the project, along with a sense of personal interest and enthusiasm.

    Examples of use: [1], [2], [3], [4] , [5].

    1.2 The projectIn this sub-chapter the structure of the report is presented. Furthermore tools and

    hardware used in the project are described. Finally the used methods are briefly

    explained.

    1.2.1 Structure of the report

    The thesis is structured as follows:

    Chapter 1 is an introduction to the problem at hand and the tools and methods used.

    Chapter 2 gives the theoretical background.

    Chapter 3 provides a closer analysis of the problem

    Chapter 4 gives a top-view design of the two model implementationsChapter 5 explains in closer detail the implementation of the two models.

    Chapter 6 provides the results of the work.

    Chapter 7 discusses the work

    Chapter 8 concludes the report

    Chapter 9 has a list of used references.

    Appendix A unrolls the Simulink model including the full Matlab function source code.

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    14/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 3 of 94

    Appendix B gives the full source code for the VHDL implementation.

    Appendix C provides test results.

    Appendix D holds the various synthesis reports used in this report.

    1.2.2 List of terminologies

    ASIC: (Application-SpecificIntegratedCircuit), is a chip designed for a

    particular application and is pre-programmed

    FPGA: (FieldProgrammable GateArray) can be reprogrammed and therefore

    makes it an excellent choice when testing models

    HDL: (HardwareDescriptionLanguage) General term for a language used

    to describe hardware.

    LED: (Light-EmittingDiode): a semiconductor that gives out light when an

    electric current is applied to it.

    LUT: (Look-Up Table): A data structure, usually an array or associative

    array, often used to replace a runtime computation with a simpler array

    indexing operation.

    MSB: (MostSignificantBit) The left-most bit in a binary represented:

    10011000.

    RTL: (RegisterTransferLevel) A way of describing the operation of a

    synchronous digital circuit. In RTL design, a circuit's behavior is

    defined in terms of the flow of signals (or transfer of data) between

    hardware registers, and the logical operations performed on those

    signals.

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    15/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 4 of 94

    Synthesize: SeeXST.

    SystemC: A hardware description language based on C++. [6]

    SystemVerilog: A hardware description language based on Verilog. [7]

    Verilog: A hardware description language similar to VHDL.

    VHDL: (Very High Speed Integrated CircuitHardwareDescriptionLanguage) is the VHSIC Hardware Description Language. VHSIC is

    an abbreviation for Very High Speed Integrated Circuit.

    XST: (Xilinx Synthesis Tool) A Xilinx tool that synthesizes HDL designs.

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    16/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 5 of 94

    1.2.3 Tools used

    Three primary tools were used and below a short introduction is given.

    1.2.3.1Matlab r2008b Simulink

    Matlab is a very powerful mathematical tool which serves many purposes including

    manipulation of equations, numbers and models. It is a programming language taking use

    of such things as scripts, functions and plug-ins etc. These features combined give an

    extremely wide span of mathematical possibilities.

    Simulink is a graphical block diagramming tool in Matlab, which is used to draw models

    by dragging and dropping building blocks into the model. A wide list of pre-made blocks

    is available but also custom user-made blocks can be designed and used. An example

    would be a block representing a constant source, a multiplier or a display that shows the

    value of the device/block under measurement.2

    1.2.3.2ModelSim 6.3a

    ModelSim provides a comprehensive simulation and debug environment for complex

    ASIC and FPGA designs. Hardware Description Languages (HDL) like VHDL and

    Verilog are typically used. SystemVerilog and SystemC are also commonly used.3

    1.2.3.3Xilinx ISE 9.2i

    Xilinx ISE has many uses but in this project it is used to synthesize models made in

    ModelSim. It simulates exactly how a model works on a specific device and gives very

    detailed analytical information, such as timings, RTL schematics and hardware utilization

    on the device4

    2 See more about Matlab and Simulink at: http://www.mathworks.com/products/(14-06/2009)3 See more about ModelSim at: http://www.model.com/(14-06/2009)4 See more about Xilinx at: http://www.xilinx.com/tools/designtools.htm (14-06/2009)

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    17/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 6 of 94

    1.2.4 Hardware used

    The board used in this project was an FPGA of the Spartan 2 family

    (more details in the box to the right).

    Most FPGAs meet the requirements needed for this project so the most

    readily available board was used.

    Furthermore an extension to the board has been used. This extension has

    a few buttons and a LED display, and this will be used in the end if the model synthesized

    in Xilinx ISE is uploaded to the FPGA for final testing.

    1.2.5 Method

    Following is a short explanation of how the before mentioned tools were used in this

    project.

    Matlab was used to simulate the error using Simulink. Two uniformly random numbers

    were generated. These numbers were then multiplied using both the built-in (no

    truncation, no error) multiplier block and a custom-made Embedded Matlab Function

    block with implemented truncation. The truncated product was rounded down to match

    the input size of the multiplier and the multiplicand and a correction constant was added

    to reduce the error. Lastly the truncated product was subtracted from the real product to

    see the deviation (the error resulting from the truncation). Finally the average deviation

    from a 1,000,000 samples were calculated and noted.

    ModelSim was used to compile the VHDL code and to simulate it.

    To see in more detail how the multiplier was implemented, see chapter 0

    Spartan 2 boardXC2S200PQ208AMS0229D1226889A5C

    3 VHDL files were written:

    nxn_mult.vhd The main file with the code needed for the multiplier.

    components.vhd Some components needed (full adder, n-bit adder etc).

    testbench.vhd Testbench used to stimulate inputs to the multiplier

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    18/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 7 of 94

    Xilinx ISE was used to synthesize the VHDL code implemented in ModelSim and a

    synthesis report which provides detailed information about the model was generated.

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    19/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 8 of 94

    2 Theoretical background

    2.1 Binary multiplicationAn introduction to thepen-and-paper5 method of performing binary multiplication is

    briefly given below.

    Figure 1 shows the process of multiplying two binary numbers, the multiplicandand the

    multiplier. Notice that the width of the product is twice that of the inputs. The green box

    outlines the partial product matrix and the red box outlines a single partial product.

    The first step in this method is to form the partial product matrix in which each element is

    obtained by ANDing6 the appropriate bits from the multiplicand and the multiplier.

    Another way to look at this is:

    If the multiplierbit is 0, the partial product is also 0

    If the multiplierbit is 1, the partial product is equal to the multiplicand

    Repeat for every multiplier bit

    Notice that this gives a number of partial products equal to the width of the multiplier.

    To obtain the final product the elements in the columns (from right to left) are added

    using binary logic7. Any carries are carried on to the next column. The result of this

    operation is stored in one bit of the product and the operation is repeated for each

    remaining column.

    5 Equivalent to multiplying by hand.6 By ANDing, means sending the two bits through a 2-bit and gate.7 For more information about binary adding, see [8].

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    20/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 9 of 94

    Figure 1: 4x4 bit Binary Multiplication.Notice: Anxn-bit multiplier renders a2n-bit product.

    Source: http://edu.cs.tut.fi/PD2009/figs/EPs_arith/multiply.jpg-11/06/09

    2.1.1 Rounding

    Conventionally an n-bit multiplicand and an n-bit multiplier would render a 2n-bit

    product.

    Sometimes an n-bit output is desired to reduce the number of stored bits. This is done intwo steps.

    When rounding to n-bits a 1 is added to the n-1th bit and a potential carry is

    propagated to the nth-bit and so on.

    The unwanted bits (the n-1th least significant bits) are discarded.

    In Figure 1 rounding down to 4 bits corresponds to adding a 1 top3 and then discarding

    the 4 least significant bits. Rounding in decimal numbers is done in a similar way, that is

    if you add a 5 instead of a 1 to the n-1th bit.

    This will not work if the product corresponds to a number unless the bits discarded are

    fractional bits. To give an example in decimal numbers:

    213,456 rounded to 3 decimals would be 213 which is correct, while 213456 rounded to

    213 would result in a very different number.

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    21/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 10 of 94

    2.1.2 Truncation

    Truncation is a method where the least significant columns in the partial product matrix

    are not formed. The amount of columns not formed in this way, T, defines the degree of

    truncation and the Tleast significant bits of the product always result in 0.

    The algorithm behind truncated multiplication is the same as when dealing with non-

    truncated multiplication regardless of the truncation degree.

    The effect is illustrated in Figure 2, where a truncation degree ofT = 3, is applied. Notice

    that the columns to the right of the maroon vertical line are missing.

    A loss of precision follows as a result of truncation since the least significant bits of the

    resulting product are always 0. Notices that the truncated product is always equal to or

    lower than the real product since 1s are converted to 0s and never the other way

    around. This error, however, can be compensated for by introducing a correction constant

    and thus leading to a truncation error that is acceptable in most cases when dealing with

    digital signal processing. In this report the correction constant is found by simulating the

    multiplier (without correction) and the average precision error over 1,000,000 million

    samples is then used in a second simulation run as the correction constant. A more

    detailed introduction to this will be given in the design chapter.

    Figure 2: 4x4 bit Binary Multiplication with truncation degree T=3

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    22/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 11 of 94

    2.2 Conventional multipliersEssentially there are two types of multipliers. In Figure 3 a sequential multiplier is shown

    on the left and a parallel multiplier is shown on the right. The general idea of a sequential

    multiplier is to use the same components repeatedly during many clock cycles whereas

    parallel multipliers use many components within just one clock cycle. Essentially,

    parallel multipliers provide speed, while sequential multipliers provide area efficiency.

    Since the scope of this project is to achieve hardware savings ofparallel multipliers,

    sequential multipliers will not be considered further.

    Figure 3: Left: A sequential multiplier, right: A parallel multiplier

    Sources:

    http://www.cise.ufl.edu/~mssz/CompOrg/Figure3.16-ALUmult2ckt.gif (21/06-2009)

    http://www.cs.umbc.edu/~squire/download/pmul4.jpg(21/06-2009)

    2.3 Binary additionBinary addition is much like decimal addition, except that it is performed in radix 2.

    Adding two 1-bit signals is quite straight-forward and can be done using a full-adder

    shown in Figure 4 below.

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    23/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 12 of 94

    Figure 4: 1-bit full adder with truth table. [10]

    A full adder is one of the simplest components used to perform binary addition as it deals

    with only 1-bit signals. It takes two inputsA,B and a carry in, Cin which results in a sum,

    Sand a carry out, Cout.The carry in is only used when a third input comes from somewhere, for example from

    another full adder.

    Adding n-bit numbers together can be done in various ways [9]. The simplest n-bit adder

    is called aRipple-carry adderand a 4-bit example is shown in Figure 5. It is simply an

    array of full-adders where the carry-out from the right-most adder, C1 propagates into the

    carry-in of the second adder and so on. This is called a ripple-carry adder because the

    carry ripples from full adder to full adder as shown in the diagram.

    Figure 5: 4-bit Ripple-Carry Adder

    The carryripples from right to left. [10]

    This adder is quite slow as every full adder has to wait for the carry-out of the previous

    adder and only one adder is in use at any given time. To reduce delay various techniques

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    24/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 13 of 94

    have been used and in this project two are especially prominent, the Carry-save adder

    and the Carry look-ahead adder.

    A schematic of a 4-bit Carry look-ahead adder is given in Figure 6.

    Figure 6: 4-bit Carry look-ahead adder. [10]

    A Carry look-ahead adder calculates the sum without regard for potential carries from

    previous adders. The Carry Look-ahead block shown on the diagram calculates all the

    carries in the system. In the end the sums and the carries are added and the final sum is

    given including the last carry from the most significant bit.

    The advantage of this is that the delay time is radically decreased, as the full adders dont

    have to wait for a ripple-carry from a previous state. The catch is of course that more

    components are needed to implement the Carry look-ahead block and thus takes up more

    area on the chip.

    Since this project is about reducing hardware needed this adder might not seem as a

    logical choice. However, as will be mentioned later (and in Appendix D Synthesis

    reports) the area used is actually the same as for a ripple-carry adder after the area

    optimization algorithm is performed by Xilinx ISE. More about the Carry look-ahead

    adder can be found in Appendix B.6 in [8] and [11].

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    25/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 14 of 94

    Lastly, the Carry-save adder (CSA) is shown in Figure 7. Looking at the 1-bit case it can

    be seen that the CSA block is basically a full adder and has the same number of inputs

    and outputs (IO), they are just renamed. Herex andy are used instead ofa and b to avoid

    confusion.

    Figure 7: the Full adder and the CSA has same IO. [8]

    When implementing multipliers it is needed to add three n-bit numbers and this requires

    one CSA block for every bit as shown in Figure 8.

    Figure 8: CSA array

    For every bit one CSA block is needed. This circuit adds threen = 8 bit numbers. [8].

    The important thing to notice here is the fact that the CSA blocks all operate

    simultaneously and they dont wait for a carry from a previous state. The three input n-bitnumbers are compressed into two n-bit numbers, a sum and a carry.

    Finally the sum and carry are added, typically using a fast adder like the before

    mentioned Carry look-ahead adder.

    For more information about Carry-save addition, see [8]. [8], [10]

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    26/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 15 of 94

    2.4 Carry-save adder (CSA) multiplierIn this sub-chapter it will be shown how a multiplier can be implemented using Carry-

    save adders. The multiplierand multiplicandintroduced in 2.1 are in this case labeled a

    and b respectively. The partial products are formed, also as mentioned in 2.1, either by

    ANDing the appropriate bits or by an alternative method. In this multiplier the

    alternative method is used. In Figure 9 an array of CSA-Blocks (please note that a CSA-

    block is just a full adder with renamed IO) is shown.Every CSA-array represents a

    partial product and therefore n CSA-arrays are needed to make a full multiplier where n is

    the width of the multiplicand. This way of ordering the CSA-arrays is called a chain.

    Notice the extra input b(i) compared to a normal CSA; This input serves the same

    purpose as the multiplier bit in the before mentioned alternative method and in this case

    as a gating parameter8 for a.

    Figure 9: Schematic of the CSA.

    a, sumin, cin, sumout, cout are all n-bit wide, whileb(i) is only one bit wide.

    Every CSA-array has three input numbers as illustrated in Figure 9. The outputs of the

    first CSA are used as inputs in the second CSA along with a and so on. Since the first

    CSA array has no previous CSA-arrays it has two 0s and of course a as input. This

    8 Ifb(i) is 0 the signal a is gated and treated as zero in the box, otherwise a is passed through unaltered.

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    27/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 16 of 94

    means that the first CSA-array can actually be replaced by very simple logic but more

    about that in 4.2.

    Figure 10: Carry-save adder (CSA) multiplier.

    As shown in Figure 10 it is possible to combine the CSA-array with a Carry lookahead

    adder. Each CSA-array outputs one bit giving the least significant half of the final

    product, while the final Carry look-ahead adder outputs the most significant half of the

    final product.

    2.4.1 Multiplier trees

    Instead of arranging the CSA blocks in a chain, a tree formation can be used. Amongst

    the most common multiplier trees are theDadda and Wallace trees [9][8]. In the above

    mentioned CSA chain they wouldnt have each CSA operate sequentially but would try to

    make as many of the CSAs function at the same time (in parallel). This reduces the delay

    time of the entire multiplier considerably but doesnt affect the area consumption and is

    therefore not within the scope of this project.

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    28/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 17 of 94

    Figure 11: Example of a carry-save adder Wallace tree multiplier.

    This type of multiplier will not be used as it is efficient for reducing delay (speed) and that is not

    within the scope of this project.

    Source: http://net.pku.edu.cn/~course/cs101/resource/Intro2Algorithm/book6/676_a.gif -11/06/09

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    29/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 18 of 94

    3 Analysis

    In this chapter a closer analysis of the problem is done. As mentioned earlier the main

    problem in this project is to reduce area used on the FPGA. This chapter will consist of

    two sub-chapters:

    Identification of the priorities of the report.

    A possible solution to the problem. Which choices have to be made and why?

    3.1 PrioritiesMany things can be optimized by using different methods aiming at solving different

    problems. However, the primary purpose of this project is to reduce the area used on the

    FPGA. Speed is also a concern but only a secondary priority as described in 3.2.

    3.2 Problem solvingIn the previous chapters several choices have been made and here follows a brief recap:

    A Carry look-ahead adder was used as the final adder instead of a Ripple-carry

    adder as it was faster without compromising area requirements.

    Carry-save adders were used to represent the partial product matrix as they are

    fast and area efficient when adding m different n-bit numbers.

    Correction using a constant was chosen.

    In Chapter 2 a multiplier consisting of several Carry-save adders and a final Carry look-

    ahead adder was introduced. A third adder, the Ripple-carry adder, was introduced but

    also rejected shortly after as it was too slow. The problem to be addressed now is if there

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    30/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 19 of 94

    are other adders than the three above mentioned and the answer is yes. There are an

    almost endless amount of different adders invented, some of which are quite obscure (a

    few examples: [9]). They all have their perks and drawbacks but we will not go more into

    details about adders. The Carry look-ahead adder and the Carry-save adder were chosen

    as they are both very readily implemented, while still providing superior performance.

    These adders are also very commonly used in parallel multipliers and are as such obvious

    choices.

    As stated in 2.1.2 a precision error occurs when truncation is applied and this error can be

    averted by introducing some sort of correction. Here one type of correction, correctionusing a constant, was introduced. There are however several ways of applying correction

    some of which are quite intricate (as with adders) [12], [13], [14], [15]. The method of

    constant correction was chosen as it is an easily implemented correction and compared to

    this, the error precision achieved is quite noteworthy

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    31/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 20 of 94

    4 Design

    In this chapter both the Simulink model and the ModelSim VHDL model will be

    presented at a high-level to give a general idea of how to apply the solution presented in

    the analysis chapter (chapter 3.2).

    4.1 Matlab Simulink modelTwo uniformly distributed random numbers are generated by using a built-in Simulink

    block and are then split into two branches, Branch A and Branch B. Branch A simply

    generates the correct product (p_real = P) using a built-in Simulink block. Branch B is

    where the truncation, the rounding and the correction constant are applied. The two

    results are then compared and averaged over 1,000,000 samples.

    TheMultiplier with Truncation block is a Matlab Embedded Function and the

    block is implemented using the Matlab

    Programming language. More details about this

    implementation are given in chapter 5.1.

    The rounding is implemented in two steps. First a

    1 is added to the appropriate bit (see 2.1.1).

    Second, shifting is applied n times as shown in

    Figure 12. In this way the truncated bits arent removed but simply changed to

    zero. If the bits were completely removed then

    conversion to base 10 (decimal) would be

    incorrect.

    The simulation is run twice with each set-up. First, the simulation is run without

    any correction applied and the average difference is noted. Second, the average

    difference from first run is used as correction and the simulation is run again.

    Finally the average difference, both with and without applied correction, is noted

    in a table. See chapter 6.1.1 for the table.

    Figure 12: Bit shifting

    When shifting right twice then left twice

    the 2 least significant bits become 0

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    32/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 21 of 94

    Figure 13: The designed Simulink model.

    The inside of the boxes can be seen in Appendix A Matlab Simulink Design.

    4.2

    Modelsim VHDL model

    Figure 14: High-level design of thenxn bit parallel CSA multiplier implemented.

    Everything right of the blue line is truncated (not formed).

    In Figure 14 is used the same CSA multiplier as introduced in chapter 2.4 except for two

    modifications.

    Recollect that the first CSA-array can be replaced by very simple logic. This logic

    is implemented in the process trunc.

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    33/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 22 of 94

    Truncation is applied. Remember that CSA-arrays are simply arrays of CSA-

    blocks (or full adders) and depending on the degree of truncation a number of the

    least significant adders are discarded. In Figure 14 this corresponds to discarding

    everything right of the blue line.

    The level of truncation and the width of the multiplicand is given by two parameters, T

    and leftrespectively. The implementation of the multiplier will be explained in more

    detail in chapter 0.

    The sum output after each stage is left shifted once before put into the next stage and the

    least significant bit of the sum (before left shifting) is used as product bit. The signal

    sum_1 coming from CSA_1 would give the product bitp(1) and so on.

    More about the actual implementation of the multiplier is given in chapter 0.

    Quick example, T = 4:The 3 rightmost adders are not formed in CSA_1 (the first CSA-array).The 2 rightmost adders are not formed in CSA_2.The rightmost adder is not formed in CSA_3.The product bitsp(3)-p(0) are set to 0.

    All other components operate as normal.

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    34/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 23 of 94

    5 Implementation

    In this chapter an in-depth explanation of how the two models (Simulink and vhdl) is

    made. Since the Matlab Simulink model was made mostly using pre-made blocks, the

    only interesting part left is the embedded Matlab function and as such 5.1 will mostly be

    about this. The top view design has already been given in the design chapter (4.1).

    The VHDL implementation, however, is much more interesting to have a closer look at.

    A top view design has been given in the design chapter (4.2) but in this chapter we will

    go into details as to how exactly the different components were implemented. This is

    what 0 is all about.

    Finally it is explained how the above mentioned models have been tested to confirm that

    they are in fact correct and/or reliable.

    The parameter used to define the levelof truncation is different in the

    Simulink model and the VHDL

    implementation. Simulink uses the

    parameter kwhere n + kis the number

    of columns used from the partial

    product matrix while truncating.

    Contrarily, the VHDL implementation

    uses the parameter Twhich is defined as

    the number of columns not used from the partial product matrix.

    In the beginning of the project kwas used but this was a poor choice and instead Twas

    used when the VHDL model was implemented as it seemed more logical.

    Figure 15: The 2n-bit product.

    Two n-bit inputs render a 2n-bit final product. The

    Tbits are truncated while then+k bits are kept.

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    35/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 24 of 94

    5.1 Matlab SimulinkThe top view figure of the model was given in the chapter 4.2 Figure 13.

    Random Number Generator, Rounding, Correction Constantand Calc Average are made

    solely using building blocks and function as explained in 4.2. In the program if you

    double click any of those above mentioned boxes a sub-system will appear where the

    blocks used can be seen. They all only have one layer (no subsystems in subsystems) and

    the sub-systems are shown in Appendix A Matlab Simulink Design.

    Now for the embedded Matlab function: It is implemented by using the Matlab

    Programming language and the full source code can be found in Appendix A Matlab

    Simulink Design as well. It has four inputs: a, b, n and k. The parameters n and khave to

    be set manually using the Matlab console. The correction constant Cis not used in this

    function but is also set manually using the Matlab console.

    a and b are the inputs the multiplicand and the multiplier respectively.

    n defines the width ofa and b.

    kdefines the degree of truncation used.

    Figure 16 helps illustrate how the function flows.

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    36/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 25 of 94

    Figure 16: Flow diagram of the Embedded Matlab function,multiplier with truncation.

    It has two inputs and 1 output, all 32-bit integers. It is followed by 5 steps:

    1. The 32-bit integer inputs are converted into binary vectors or arrays with n giving

    the width of those arrays.

    2. The partial product matrix is formed as explained in 2.1.

    As an example a = 2710 = 110112 and b = 2610 = 110102 would render the matrix

    in Figure 16

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    37/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 26 of 94

    3. This matrix is then manipulated by having the truncated columns zeroed.

    4. The calculation is done by adding the columns.

    5. The final binary array is converted back to an integer.

    It is important to note that the algorithms and the methods used here wont be efficient at

    all on a real chip but they do give us the correct result. Since this is the only thing we are

    interested in at this point it is not a problem.

    5.2 VHDLThis is a description of the VHDL code used to implement the n-bit multiplier introduced

    in 4.2. It uses Carry-save adders to represent the partial product matrix and it uses a Carry

    look-ahead adder as the final adder.

    The VHDL code is in principal quite simple with only a few components; there are,

    however, some parts that might need a closer look. The full source code can be found in

    Appendix B - VHDL code.

    The entire model consists of 3 vhdl files including a test bench. They are all made from

    scratch and their use will now be clarified.

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    38/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 27 of 94

    Figure 17: Top-View of the VHDL implementation

    The testbench stimulates inputs (a andb) to the multiplier and receives the resulting product (y).

    Inside the multiplier is one Carry look-ahead adder andn CSA-arrays, each withn full adders (CSA-

    blocks) inside.

    5.2.1 Components.vhd

    This file contains all the components needed to implement the multiplier and as such does

    nothing by itself. It was made to keep nxn_mult.vhdsimpler.

    It holds 3 components,fadd, cla_g and csa_g:

    5.2.1.1Fadd 1-bit Full adder

    Inputs: {a, b, cin}

    Outputs: {s, cout}

    All signals are 1-bit wide.

    This is a common full adder as described in 2.3. It takes a, b and cin(carry in) and gives s

    (sum) and cout(carry out) as outputs.

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    39/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 28 of 94

    5.2.1.2CLA_g n-bit Carry look-ahead adder

    Generics: {left}

    Inputs: {a, b, cin}

    Outputs: {sum, cout}

    cin and coutare 1-bit wide.

    a,b and sum are left+1 bits wide.

    In this component leftis a generic. This means that you have to specify a value for left,

    when this component is instantiated.Leftspecifies the position of the MSB (most

    significant bit) ofa,b and sum. With leftbeing 31; a,b and sum would be 32 bits wide.

    This is a carry look-ahead adder like the one introduced in 2.3. It is used as the final

    adder in the multiplier since it is faster than a ripple-carry adder and uses the same area

    when optimized properly9.

    In this case cin is 0 and coutis connected to a signal with no connection (nc1). The

    signals sum_n and carry_n, coming from the final CSA-array in the multiplier, areconnected to a and b respectively.

    The output of the adder, sum becomes the left+1 most significant bits in the final product.

    Figure 18: 2n bit final product of the multiplier

    The light green square represents the left+1 most significant bits.

    9 Xilinx ISE 9.2i is used to synthesize the code and the code is optimized automatically while doing so.

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    40/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 29 of 94

    5.2.1.3CSA_g Carry-save adder

    Generics: {left, T}

    Inputs: {b, a, sumin, cin}

    Outputs: {sumout, cout}

    b is 1-bit wide.

    a, sumin, cin, sumout, coutare left+1 bits wide.

    As explained in 4.2 this component uses two generics; leftdefines the width of the

    multiplicand (a), while Tsets the level of truncation.

    The Generate statementused in this component is a concurrent statement10 containing

    further concurrent statements to be replicated. Here it is used to create the multiple full

    adders that are contained inside the CSA-array.

    The signal b is not a part of a conventional carry-save adder and is only used in one line:

    As mentioned earlier b is a gating parameter. Ifb is 0 the signal a is gated and the

    internal signal aa is zero, otherwise a is passed through to aa. It is aa which is used

    inside the component.

    This component is generated lefttimes in the complete multiplier.

    10 Concurrent statement: So called because conceptually they can be activated and perform their actionstogether, that is, concurrently.

    aa

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    41/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 30 of 94

    5.2.2 Nxn_mult.vhd

    Generics: {left, T}

    Inputs: {a, b}

    Outputs: {y}

    a and b are leftbits wide.

    y is 2*left+1 bits wide.

    This is the main multiplier component itself. The inputs a and b are the multiplicand and

    the multiplier respectively andy is the product. Like in the previous components leftand

    Tare generics and here they set the width of the multiplicand and the degree of truncation

    respectively.

    Basically, the full role of this component is declaration and instantiation of the above

    mentioned components. It generates the wires needed to connect the components

    correctly using the before mentioned generate statements. One additional thing worth

    mentioning is the process trunc.

    As mentioned earlier the first CSA-array is replaced purely with combinatorial logic. This

    combinatorial logic is what trunc describes. The signal carry_0 is always zero (untouched

    in this process) and sum_0

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    42/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 31 of 94

    5.3 TestingTesting is a vital part of any work. Below is explained how the testing was carried out in

    this project.

    5.3.1 Matlab Simulink test

    To test the correctness of the design, a number of simulations were run, each time noting

    down the results in a table11. As a preliminary check I looked at the resulting table to see

    if a higher level of truncation indeed did lead to a higher precision error and that

    introducing a correction constant in fact did help. Furthermore the simulation was run

    step-by-step, checking by hand, that every result along the way was correct for a

    representative set of numbers.

    See Appendix C Test results for a sample of such a test bench.

    5.3.2 Modelsim Xilinx ISE simulation

    In 5.2.3 Testbench.vhd was introduced. For 4 and 8 bits a representative set of numbers

    were stimulated to the multiplier during a simulation. The results were checked manually

    using the pen and paper method. If both the 4 and the 8 bit simulations were correct it

    was assumed that the 16 and 32 bits were also correct as checking them manually would

    be too comprehensive. The results were also checked for all bits with no truncation. The

    simulation results can be found in Appendix C Test results.

    After designing the model and testing it, it was synthesized using Xilinx ISE. As

    mentioned in 3.2 I ended up using a Carry look-ahead adder instead of a ripple carry

    adder since the area it takes is the same. This was verified by synthesizing both a normal

    generic ripple carry adder and a generic carry look-ahead adder. See Appendix D

    Synthesis reports for further details.

    11 Simulink uses seeds to generate random numbers. This means that if the seeds arent changed it will stillbe the same X random numbers thus making them not so random. The seeds are therefore changedeverytime the correction constant was applied.

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    43/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 32 of 94

    5.3.3 Testing on an FPGA

    As a final check the model synthesized in Xilinx ISE can be uploaded to an FPGA. Using

    LEDs to display the results and buttons to stimulate inputs the FPGA can function as a

    multiplier. Looking at the LEDs it is seen if the model functions correctly.

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    44/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 33 of 94

    6 Results

    In this chapter the results from the two models will be presented.

    As explained in the implementation chapter 5 the Simulink model and the VHDL model

    use two different parameters of defining the level of truncation (kand Trespectively),

    hence both parameters are shown in both tables.

    6.1.1 Precision error

    In Table 1, below, the error increase is given for truncated multipliers with varying size

    of inputs and degrees of truncation (n and k/T). Please note that the signals used in the

    Simulink model are all 32-bit integers and this result in overflow when the resulting

    product exceeds 32-bits. In this table E[not corrected] is the average precision error over

    1,000,000 samples without any correction. The correction is then calculated and noted in

    the column C and finally the simulation is run again with C applied. This gives the values

    noted in the last column, E[corrected]. When multiplying two 16-bit numbers the result is

    a 32 bit integer which is not a problem but when adding the (positive) correction constant

    as well the product might potentially be a 33-bit number and the results are therefore not

    useable. Obviously two 32-bit inputs also results in overflow. This should have been

    caught during the testing and appropriate steps should have been taken but due to time

    limitations this was not done.

    Simulink does not support integers of more than 32 bits, which could have otherwise

    solved the problem. Notice that a 64 bit signed integer is potentially

    9,223,372,036,854,775,807, which is quite a high number.

    N = 15 (two 15-bit inputs) which gives the highest possible correct result, is included in

    Table 1.

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    45/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 34 of 94

    The equation used to calculate the correction constant [17]:

    kn

    knEround

    C +

    +

    =

    2

    )*2(

    E is the uncorrected error. This calculation is performed to avoid using the truncated bits

    (2n (n + k)) as they are always 0.

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    46/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 35 of 94

    Table 1: Precision error for various truncated multipliersEach row represents a multiplier defined by the parametersn, k and T.

    n k T E[not corrected] C E[corrected] Inputs

    8 1 7 -130.253771 130.2539 -5.965983 1,000,000

    8 2 6 -49.368171 49.36816 -6.347613 1,000,000

    8 3 5 -16.653515 16.65332 -4.954989 1,000,000

    8 4 4 -4.2326551 4.232666 -0.30413 1,000,000

    8 5 3 -0.053813 0.053833 0.391538 1,000,000

    8 8 0 1.599285 -1.599289 -0.69136 1,000,000

    15 1 14 -45064.90 45064.90 -9.476291 1,000,000

    15 2 13 -20507.77 20507.77 17.29518 1,000,000

    15 3 12 -9216.607 9216.607 5.324396 1,000,000

    15 4 11 -4114.335 4144.335 43.53750 1,000,00015 5 10 -1807.960 1807.960 -4.668613 1,000,000

    15 6 9 -795.4281 795.4281 8.099846 1,000,000

    15 7 8 -343.2297 343.2297 43.11340 1,000,000

    15 15 0 58.83136 -58.83136 -59.64291 1,000,000

    16 1 15 -59592.17 59592.17 OVERFLOW 1,000,000

    16 2 14 -45049.86 45049.86 OVERFLOW 1,000,000

    16 3 13 -37639.47 37639.48 OVERFLOW 1,000,000

    16 4 12 -286374.2 286374.2 OVERFLOW 1,000,000

    16 5 11 -25537.30 25537.30 OVERFLOW 1,000,000

    16 6 10 -31825.38 31825.38 OVERFLOW 1,000,000

    16 16 0 -30026.22 30026.22 OVERFLOW 1,000,000

    32 1 31 OVERFLOW OVERFLOW OVERFLOW -

    32 2 30 OVERFLOW OVERFLOW OVERFLOW -

    32 3 29 OVERFLOW OVERFLOW OVERFLOW -

    32 4 28 OVERFLOW OVERFLOW OVERFLOW -

    32 5 27 OVERFLOW OVERFLOW OVERFLOW -

    32 6 26 OVERFLOW OVERFLOW OVERFLOW -

    32 7 25 OVERFLOW OVERFLOW OVERFLOW -

    32 8 24 OVERFLOW OVERFLOW OVERFLOW -

    32 32 0 OVERFLOW OVERFLOW OVERFLOW -

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    47/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 36 of 94

    6.1.2 Hardware savings

    In Table 2 the hardware savings for various truncated multipliers (with n, kand Tas

    defining parameters)are presented. The number of 4 input LUTs are used to calculate the

    Area Savings column. The synthesis reports used for this table can be found in Appendix

    D Synthesis reports.

    Table 2: Hardware savings for various truncated multipliers

    Each row represents a multiplier defined by the parametersn, k and T.

    n k T Look Up Tables (LUTs) Slices used Hardware savings

    8 1 7 77 44 34.75 %8 2 6 88 51 25.42%

    8 3 5 99 57 16.10 %

    8 4 4 106 61 10.17 %

    8 5 3 111 64 5.932 %

    8 8 0 118 68 0.000 %

    16 1 15 286 164 42.11 %

    16 2 14 313 180 36.64 %

    16 3 13 339 195 31.38 %

    16 4 12 363 209 26.52 %

    16 5 11 384 221 22.27 %

    16 6 10 404 232 18.22 %16 16 0 494 284 0.000 %

    32 1 31 1086 624 46.08 %

    32 2 30 1145 658 43.15 %

    32 3 29 1203 692 40.27 %

    32 4 28 1259 724 27.49 %

    32 5 27 1312 754 24.86 %

    32 6 26 1364 784 32.27 %

    32 7 25 1414 813 29.79 %

    32 8 24 1461 840 27.46 %

    32 32 0 2014 1158 0.000 %

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    48/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 37 of 94

    7 Discussion

    In this chapter, the results presented in this report are discussed and compared with

    similar previously published results. It will also be discussed whether the tools, methods

    and components used were correct. Finally the perspectives of the work and what future

    work could be done are discussed.

    7.1

    Results evaluation

    7.1.1 Precision Error

    Theoretical calculations similar to the experimental results in this project are presented in

    [17]. In both cases, the average error follows no pattern with a precision error that is more

    or less independent of the degree of truncation (after correction is applied). Therefore,

    one is lead to believe that a high level of truncation should be applied as its downsides are

    the same as for a low degree of truncation. In [17] however another calculation is

    calculated, maximum error, which describes the highest possible deviation from the

    correct result for a single sample. The values are shown in Table 3. This value follows a

    pattern and it is obvious that a higher degree of truncation leads to a higher maximum

    error. Based on this information it is a matter of precision constraints that determines

    what degree of truncation would be best.

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    49/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 38 of 94

    Table 3: Maximum precision error for various multipliersFor more information as to how these numbers are calculated, see [17]

    n k T Maximum error([17])

    8 1 7 2.5039

    8 2 6 1.2539

    8 3 5 0.7539

    8 4 4 0.6289

    8 5 3 0.5352

    8 8 0 0.5000

    16 1 15 5.5000

    16 2 14 2.7500

    16 3 13 1.5000

    16 4 12 1.0000

    16 5 11 0.7188

    16 6 10 0.5928

    16 16 0 0.5000

    24 1 23 8.5000

    24 2 22 4.2500

    24 3 21 2.2500

    24 4 20 1.3750

    24 5 19 0.9063

    24 6 18 0.6875

    24 7 17 0.5859

    24 24 0 0.5000

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    50/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 39 of 94

    7.1.2 Hardware savings

    Table 4 shows hardware savings for various multipliers compared with [17].

    Table 4: Hardware savings compared with [17]

    n k T Hardware savings Hardware savings ([17])

    8 1 7 34.75 % 35.4 %

    8 2 6 25.42% 23.9 %

    8 3 5 16.10 % 15.2 %

    8 4 4 10.17 % 9.28 %8 5 3 5.932 % 4.36 %

    8 8 0 0.000 % 0,00 %

    16 1 15 42.11 % 42.6 %

    16 2 14 36.64 % 40.0 %

    16 3 13 31.38 % 34.2 %

    16 4 12 26.52 % 29.3 %

    16 5 11 22.27 % 24.2 %16 6 10 18.22 % 19.9 %

    16 16 0 0.000 % 0.00 %

    32 1 31 46.08 % -

    32 2 30 43.15 % -

    32 3 29 40.27 % -

    32 4 28 27.49 % -

    32 5 27 24.86 % -32 6 26 32.27 % -

    32 7 25 29.79 % -

    32 8 24 27.46 % -

    32 32 0 0.000 % -

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    51/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 40 of 94

    In [17] the results are based on theoretical calculations and in this project they are based

    on experimental results. It can be seen that the experimental results from this project are

    in agreement with [17]. In [17] it is stated about the values in Table 4:

    The same model is implemented in this project as in [17] except for three things:

    Relative sizes are used in [17].

    n half adders are used in [17] instead of full adders, where n is the size of the

    inputs. Since half adders and full adders have the same relative size in [17] this

    does not affect the results.

    A simpler full-array multiplier is used in [17] compared to the CSAs used in this

    project. The area used, however, is equivalent as it is only a matter of how the

    carries are passed on.

    These differences are judged as minor differences, which can explain the deviations in

    Table 4. They do not affect the overall conclusion.

    7.2 SimulinkAs mentioned previously overflow occurs when using Simulink to simulate the multiplier

    for inputs of more than 15 bits and this immediately raises the question if Simulink is

    even the right tool. It is an easy readily understood tool with a user-friendly interface but

    it is not as powerful as other tools (E.g. Matlab). The multiplier itself was implemented

    using an Embedded Matlab Function but these inherit the restrictions enforced on

    the values given correspond to the hardware savings of truncated multiplierscompared with conventional multipliers which implement round to nearest by adding

    a one to column n-1. For this table, the relative sizes of the AND gates, half adders

    and full adders are 1, 4 and 9, respectively. The relative size of each full adder in the

    CLA is 9 and a 4-bit CLA logic block has a relative size of 20 . [17]

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    52/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 41 of 94

    Simulink and are as such not as useful aspure Matlab functions. It is therefore suggested

    to implement the model using only the Matlab programming language.

    7.3 PerspectivesTruncation is a method widely used. There are great possibilities in using truncation as it

    has always been a concern problem to reduce area-on-chip along with power dissipation

    and complexity of the components.Some of the many examples of use can be found in

    [1], [2], [4], [5], [18].

    7.4 Future workThe field of truncating multipliers is quite big and there are many different ways of

    performing truncation and very different hardware can be used as mentioned in Chapter

    3. Immediate future work could be to extend to multiple multipliers and/or multipliers

    that form the partial product matrix through other techniques (e.g. modified booth

    encoding [1], [18]). It could also be to implement more intricate adders or correction

    methods.

    Specific for this project, correcting the Simulink model or rewriting it completely using

    the Matlab programming language is a pending problem. Also, simulating the maximum

    error would be helpful as such a table is very helpful when deciding the degree of

    truncation to use.

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    53/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 42 of 94

    8 Conclusion

    In this project n-1 Carry-save adders and one Carry look-ahead adder were used to

    implement a parallel multiplier with truncation, where n is the width of the inputs.

    To get the precision error a Simulink model was designed and implemented. Using inputs

    of more than 15-bits caused overflow. This made simulations unreliable and raised the

    question whether Simulink was the right tool. Using standard Matlab programming was

    suggested. The results though clearly demonstrated how truncation introduced large

    errors but also how those errors could be reduced to an acceptable level by introducing a

    simple correction constant. In [17] the maximum precision error of a single sample on a

    similar multiplier was predicted using theoretical calculations. These revealed that the

    maximum error increased as the degree of the truncation increased. Lastly, more complex

    correction methods could be used to reduce the precision error even further but this was

    not within the scope of this project [12], [13], [14], [15].

    The Multiplier was also implemented and simulated in ModelSim using the hardware

    description language VHDL. Furthermore the VHDL model was synthesized using Xilinx

    ISE to evaluate hardware savings. This showed that hardware savings of up to 46.08 %

    for 32-bit inputs could be achieved. The hardware savings from this project were also

    compared with [17]. It was found that the theoretical predictions made in [17] and the

    results presented in this project were in agreement.

    To conclude the project it can be said that even high levels of truncation lead to a

    relatively small precision error. This means that truncated multipliers offer significant

    hardware savings for applications, which do not require exact multiplication. Given

    specific hardware and precision constraints, the right number of columns to truncate for

    8, 16 and 32 bits can easily be determined. Future work could be to extend to multiple

    multipliers and/or multipliers that form the partial product matrix through other

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    54/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 43 of 94

    techniques (e.g. modified booth encoding [1], [18]). Some examples of use can be found

    in [1], [2], [4], [5], [18].

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    55/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 44 of 94

    9 References

    [1] Tso-Bin Juang, Shen-Fu-Hsiao, Low-Error Carry-Free Fixed-Width Multipliers

    With Low-Cost Composation Circuits, IEEE Transactions on circuits and

    systems-II: Express briefs, Vol. 52, No. 6, June 2005, (.PDF file).

    [2] E. George Walters III, Mark G. Arnold, Michael J. Schulte, Using Truncated

    Multipliers in DCT and IDCT hardware accelerators, (.PDF file).

    [3] Sunder S. Kidambi, Fayez El-Guibaly, Andreas Antoniou, Area-Efficient

    Multipliers for Digital Signal Processing Applications, IEEE Transactions on

    Circuits and SystemsII: Analog and Digital Signal Processing, Vol. 43, No. 2,

    February 1996, (.PDF file).

    [4] Florean Curticapean, Jarkko Niitylahti, A Hardware Efficient Direct Digital

    Frequency Synthesizer, Tampere (.PDF file).

    [5] E. George Walters III, Michael J. Schulte, Design Tradeoffs Using Truncated

    Multipliers In FIR Filter implementations, Lehigh University, (.PDF file)

    [6] http://www.systemc.org/home (last visited 24/6-2009).[7] http://www.systemverilog.org/(last visited 24/6-2009)

    [8] Prof. Loh - Carry-Save Addition, CS3220 Processor Design February 2,

    2005, (.PDF file).

    [9] http://www.aoki.ecei.tohoku.ac.jp/arith/mg/algorithm.html (last visited 24/6-

    2009)

    [10] http://www.statemaster.com/encyclopedia/Adder-(electronics), homepage, last

    visited 21/7-2009.

    [11] David A. Patterson, John L. Hennesey, Computer Organization And Design

    The Hardware/Software Interface, 3rd edition, revised printing, Morgan

    Kaufmann publishers, 2007, ISBN: 978-0-12-370606-5.

    [12] Lan-Da Van, Chih-Chyau Yang, Generalized Low-Error Area-Efficient Fixed-

    Width Multipliers, (.PDF file).

    [13] Eric J. King, Earl E. Swartzlander, Jr. Data-Dependent Truncation Scheme for

    Parallel Multipliers, Texas, (.PDF file).

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    56/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 45 of 94

    [14] E. George Walters III, Michael J. Schulte, Efficient Function Approximation

    Using Truncated Multipliers and Squarers, USA, (.PDF file).

    [15] James E. Stine, Oliver M. Duverne, Variations on Truncated Multiplication,

    Illinois, (.PDF file)

    [16] Peter J. Ashenden, The Designers Guide To VHDL, 2nd Edition, 1996, ISBN:

    1-55860-674-2.

    [17] Michael J. Schulte, Earl E. Swartzlander, Jr., Truncated Multiplication with

    Correction Constant, (.PDF file).

    [18] Alok A. Katkar, james E. Stine, Modified Booth Truncated Multipliers, Illinois

    (.PDF file)

    Pictures and homepages of tools used are referenced to directly in the report.

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    57/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 46 of 94

    10 Appendix A Matlab Simulink Design

    This appendix unrolls the Matlab Simulink model. Below is the inside of the block sub-systems along with the full source code for the embedded Matlab function.

    10.1 The inside of the Simulink sub-systemsAs mentioned before the inside of the sub-systems are simply built-in Matlab Simulinkbuilding blocks.

    Figure 19: The rounding sub-system

    The input is converted to a 32bit integer, a constant is added and the integer is first right shifted,then lefting shifted (here 8 bits each way). This is done to simulate the rounding error.

    Figure 20: The Random Number Generator sub-systemTwo random numbers are generated and converted to 32-bit integers.

    Figure 21: The Average Calc sub-systemThe first part of the subsystem takes the sum of all samples and the second part divides with the

    number of samples performed at any given time. The block Counter functions as a +1 counter.

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    58/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 47 of 94

    10.2Matlab embedded function full source codefunction sum = Multiplier( a,b,n,k)% nxn bit multiplier with truncation. n determines the width of the

    inputs,% and k determines the level of truncationabin=zeros(1,n);bbin=zeros(1,n);m=0;carry=0;sum=0; %Sum in base 10 (output=integer)sumbin=zeros(1,2*n); % sum in base 2 (output=vector)y=zeros(n,2*n); %initiation of a (n x n*2) matrix

    %%%Here the integer input is converted to a binary arrayfor j = 1:n

    if a-pow2(n-j) >= 0 %% first for a

    abin(j)=1;a=a-pow2(n-j);

    elseabin(j)=0;

    end

    if b-pow2(n-j) >= 0 %% then for bbbin(j)=1;b=b-pow2(n-j);

    elsebbin(j)=0;

    end

    end

    %%% Partiel product matrix is generated here

    for i = n:-1:1 %from n downto 1m=m+1;for j = 1:n %from 1 to ny(m,n+1+j-m) = abin(j)&bbin(i);end

    end

    %%% Here truncation is introduced to the matrixfor i = 1:n

    for j = n+k+1:n*2 %The part is truncated away

    y(i,j) = 0;end

    end

    %%% Here the calculation itself is made

    for j = n*2:-1:1

    for i = 1:nsumbin(j) = sumbin(j) + y(i,j);

    endsumbin(j) = sumbin(j) + carry; %% adding previous carrycarry = floor(sumbin(j)/2); %% finding new carry

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    59/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 48 of 94

    sumbin(j) = sumbin(j) - carry * 2; %% setting final partiel

    product produkt

    end

    %%% Here the binary sum vector is converted to back to an integer

    for j = 0:n*2-1

    sum = sum + sumbin(2*n-j)*pow2(j);

    endint32(sum);end

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    60/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 49 of 94

    11 Appendix B - VHDL code

    11.1Nxn_mult.vhdlibrary IEEE;use IEEE.std_logic_1164.all;use IEEE.numeric_std.all;use ieee.std_logic_unsigned.all;

    entity Mult isgeneric ( left : natural := 7; -- position of MSB

    T : integer := 7 -- truncation of 0 means no truncation);

    port ( a : in std_logic_vector(left downto 0);b : in std_logic_vector(left downto 0);-- y_round: out std_logic_vector(left downto 0); --only used for roundingy : out std_logic_vector(2*left+1 downto 0)

    );end Mult;

    architecture circuits of Mult is

    constant zero : std_logic_vector(left downto 0) := (others => '0');signal nc1 : std_logic;type arr is array(0 to left) of std_logic_vector(left downto 0);

    signal c : arr := (others => (others => '0')); -- carries equals amount of adderssignal s : arr := (others => (others => '0')); -- partial sumssignal ss : arr := (others => (others => '0')); -- shifted sums

    component CSA_g -- duplicates entity portgeneric ( left : integer;

    T : integer

    );port (

    b : in std_logic; -- a multiplier bita : in std_logic_vector(left downto 0); -- multiplicandsum_in : in std_logic_vector(left downto 0); -- sums from previous stagecin : in std_logic_vector(left downto 0); -- carrys from previous stagesum_out : out std_logic_vector(left downto 0); -- sums to next stagecout : out std_logic_vector(left downto 0) -- carrys to next stage);

    end component CSA_g;

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    61/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 50 of 94

    component CLA_g -- duplicates entity portgeneric(left: integer);port (a : in std_logic_vector (left downto 0);

    b : in std_logic_vector (left downto 0);cin : in std_logic;sum : out std_logic_vector (left downto 0);cout : out std_logic);

    end component CLA_g;

    begin -- circuits of Mult

    --for truncation. Only affects p(0)trunc: process(a,b,s) isbeginif (T > 0 and b(0) = '1' ) then

    s(0)

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    62/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 51 of 94

    -- add_round: CLA_g-- generic map(left)-- port map(ss(left), c(left), y(left), y_round(left downto 0), nc1);

    end architecture circuits; -- of Mult

    11.2 Components.vhdlibrary IEEE;use IEEE.std_logic_1164.all;

    entity fadd is

    -- a, b and the carry-in ci are inputs of the adder.-- s is the sum output, co is the carry-out.port ( a, b : in std_logic;

    cin : in std_logic;s : out std_logic;cout : out std_logic);

    end fadd;

    architecture circuits of fadd isbegin

    -- This full-adder architecture contains two concurrent assignment.-- Compute the sum.s

  • 7/28/2019 Testing and Analyzing Methods for Truncated Binary Multiplication

    63/94

    Andreas T. WintherJune 2009Testing and analyzing of several methods for truncated multiplication in binary

    Page 52 of 94

    end CLA_g;

    architecture circuits OF CLA_g IS

    signal h_sum : std_logic_vector(left downto 0);signal carry_generate : std_logic_vector(left downto 0);signal carry_propagate : std_logic_vector(left downto 0);signal carry_in_internal : std_logic_vector(left downto 1);

    beginh_sum

  • 7/28/2019 Testing and Analyzing Meth