ADSD Fall2011 04 Design Partitioning Micro Architecture 2011Oct21



  • 8/3/2019 ADSD Fall2011 04 Design Partitioning Micro Architecture 2011Oct21


    Dr. Rehan Hafiz Lecture # 04


    Course Website for ADSD Fall 2011

    http://lms.nust.edu.pk/


    Lectures: Tuesday @ 5:30-6:20 pm, Friday @ 6:30-7:20 pm

    Contact: By appointment/email. Office: VISpro Lab above SEECS Library

    Acknowledgement: Material from the following sources has been consulted/used in these slides:

    1. [CIL] Advanced Digital Design with the Verilog HDL, M. D. Ciletti
    2. [SHO] Digital Design of Signal Processing Systems, Dr. Shoab A. Khan
    3. [STV] Advanced FPGA Design, Steve Kilts

    Material from these slides CAN be used with the following citation:

    Dr. Rehan Hafiz: Advanced Digital System Design 2010

    Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

    http://creativecommons.org/licenses/by-nc-sa/3.0/
    This Lecture

    ASM: Algorithmic State Machines

    Understanding Design Partitioning

    Controllers

    FSM: Finite State Machines

    Mealy & Moore

    Micro-programmed

    Algorithmic State Machines

    ASMs: usually the first step in mapping an algorithm to hardware

    FSMs: more controller-oriented

    ASM (Algorithmic State Machine): Example

    Up/Down Counter

    [CIL]


    Implicit Coding

    Up/Down Counter


    Understanding Design Partitioning

    Systematically Porting an Algorithm to H/W


    Greatest Common Divisor


    Steps:

    Swap

    Check

    Process

    Slides from MIT Course 6.375 Complex Digital Systems http://csg.csail.mit.edu/6.375/

    GCD Algorithm

    Steps:

    Swap if required

    Check if B != 0

    Process

    Trace for A = 100, B = 60 (s = swap, c = check, p = process):

    A = 100, B = 60 (s)
    B != 0 (c)
    A = 40, B = 60 (p)
    A = 60, B = 40 (s)
    B != 0 (c)
    A = 20, B = 40 (p)
    A = 40, B = 20 (s)
    B != 0 (c)
    A = 20, B = 20 (p)
    A = 20, B = 20 (s)
    B != 0 (c)
    A = 0, B = 20 (p)
    A = 20, B = 0 (s)
    B == 0 (c): done, GCD = A = 20
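The swap/check/process loop can be checked in software before any hardware is designed. The following minimal Python reference model is a sketch, not part of the original slides; the function name `gcd_steps` is hypothetical. It records (A, B) after each swap and process step and reproduces the trace above for A = 100, B = 60.

```python
def gcd_steps(a: int, b: int):
    """Subtractive GCD using the swap / check / process steps.

    Returns the GCD and a list of (A, B, step) tuples, where step is
    's' after a (possibly no-op) swap and 'p' after a subtraction.
    """
    trace = []
    while True:
        # Swap (s): make sure A >= B
        if a < b:
            a, b = b, a
        trace.append((a, b, "s"))
        # Check (c): stop when B == 0; A then holds the GCD
        if b == 0:
            return a, trace
        # Process (p): subtract the smaller value from the larger one
        a = a - b
        trace.append((a, b, "p"))


result, steps = gcd_steps(100, 60)
print(result)  # 20
```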


    GCD Behavioral Example

    module gcd_behavioral #( parameter width = 16 )
    ( input      [width-1:0] A_in, B_in,
      output reg [width-1:0] Y );

      reg [width-1:0] A, B, swap;
      integer done;

      always @( A_in or B_in )
      begin
        done = 0;
        A = A_in; B = B_in;
        while ( !done )
        begin
          if ( A < B )
          begin
            // Swap so that A >= B
            swap = A;
            A = B;
            B = swap;
          end
          else if ( B != 0 )
            A = A - B;   // Process: subtract
          else
            done = 1;    // Check failed: B == 0, A holds the GCD
        end
        Y = A;
      end

    endmodule

    We start by identifying the DATA processing elements and the controlling signals!


    Reference Slides: Slides from MIT Course 6.375 Complex Digital Systems, http://csg.csail.mit.edu/6.375/

    Slides 11-46


    Summary

    Define higher level block diagram

    Define its interface

    Decompose into smaller blocks if required

    Decompose into Datapath & Controller

    Use different modules to implement the datapath & controller

    Define their interface

    Connect them in higher level block
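The partitioning steps above can be sketched as a cycle-level Python model of the GCD example, assuming a simple split: a datapath that owns registers A and B and reports status signals, and a controller that reads those status signals and issues register-select controls each clock. All names (`GcdDatapath`, `gcd_controller`, the `a_sel`/`b_sel` encodings) are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class GcdDatapath:
    """Datapath: registers A and B plus the subtractor and swap wiring."""
    width: int = 16
    a: int = 0
    b: int = 0

    def status(self) -> dict:
        # Status signals sent to the controller
        return {"a_lt_b": self.a < self.b, "b_zero": self.b == 0}

    def clock(self, a_sel: str, b_sel: str, a_in: int = 0, b_in: int = 0) -> None:
        # Control signals a_sel / b_sel choose what each register loads on this edge
        mask = (1 << self.width) - 1
        next_a = {"hold": self.a, "load": a_in & mask, "b": self.b,
                  "sub": (self.a - self.b) & mask}[a_sel]
        next_b = {"hold": self.b, "load": b_in & mask, "a": self.a}[b_sel]
        self.a, self.b = next_a, next_b


def gcd_controller(dp: GcdDatapath, a_in: int, b_in: int, max_cycles: int = 100_000) -> int:
    """Controller FSM (LOAD, CALC, DONE): reads status, drives control signals."""
    state = "LOAD"
    for _ in range(max_cycles):
        if state == "LOAD":
            dp.clock("load", "load", a_in, b_in)
            state = "CALC"
        elif state == "CALC":
            st = dp.status()
            if st["a_lt_b"]:
                dp.clock("b", "a")        # swap A and B
            elif not st["b_zero"]:
                dp.clock("sub", "hold")   # A = A - B
            else:
                state = "DONE"
        else:                             # DONE: result is in register A
            return dp.a
    raise RuntimeError("GCD did not finish within max_cycles")


print(gcd_controller(GcdDatapath(), 100, 60))  # 20
```

The controller never touches A or B directly; it only sees `a_lt_b` and `b_zero`. That narrow interface is what lets either side be changed or reused independently.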

    Controller vs. Datapath Partitioning

    Design Partitioning

    Datapath: the pipe that carries the data from the input of the design to the output and performs the necessary operations on the data. It consists of ALUs, storage registers, and logic for moving data.

    Controller: determines the sequence of operations and configures the datapath for the various operations.

    Datapath and control blocks should be partitioned into different modules:

    - This allows module reuse, and controller updates without requiring datapath updates.

    - The datapath carries the critical timing paths; keeping it separate allows dedicated floorplanning for the datapath logic.


    2002 Dr. James P. Davis


    Logic systems consist of two basic elements:

    Control logic consists of state machines (FSM)

    Datapath logic consists of functions like counters, arithmetic, multiplexers, decoders, and memory (wired, connected datapaths)

    Finite State Machines: Moore vs. Mealy Machines

    Moore Machine
    - Output is a function of the present state only
    - May have more states
    - Synchronous outputs
    - No glitching; one-cycle delay
    - Full cycle of stable output

    Mealy Machine
    - Output is a function of both the present state and the input
    - May have fewer states
    - Asynchronous outputs
    - If the input glitches, so does the output
    - Output immediately available
    - Output may not be stable long enough to be useful

    ASMs

    Moore Machine: no oval, no conditional output list

    Example: Output a ONE after detecting FOUR 1s in a binary sequence

    State Transition Graph

    What would its Moore equivalent look like?

    [SHO]

    ASM

    ASM: Mealy vs. Moore

    Architectures of Mealy & Moore Machines

    The choice between Mealy and Moore machine implementations is usually left to the designer. When some of the inputs are expected to glitch and the outputs are required to be stable for one complete cycle, a Moore machine is the best choice.

    [SHO]


    // This module implements an FSM for the detection of four ones
    // in a serial input stream of data (Mealy machine)
    module fsm_mealy(
      input      clk,            // system clock
      input      reset,          // synchronous system reset
      input      data_in,        // 1-bit input stream
      output reg four_ones_det   // 1-bit output: indicates whether 4 ones are detected
    );

      // Internal variables
      reg [1:0] current_state,   // 2-bit current state register
                next_state;      // 2-bit next state

      // State tags assigned using binary encoding
      parameter STATE_0 = 2'b00,
                STATE_1 = 2'b01,
                STATE_2 = 2'b10,
                STATE_3 = 2'b11;

      // Always block for state assignment (sequential part)
      always @(posedge clk)
      begin
        if (reset)
          current_state <= STATE_0;
        else
          current_state <= next_state;
      end

      // Next-state assignment block: implements the combinational
      // cloud of next-state and output logic
      always @(*)
      begin
        case (current_state)
          STATE_0:
          begin
            if (data_in) begin
              next_state    = STATE_1;  // transition to next state
              four_ones_det = 1'b0;
            end else begin
              next_state    = STATE_0;  // retain same state
              four_ones_det = 1'b0;
            end
          end
          STATE_1:
          begin
            if (data_in) begin
              next_state    = STATE_2;  // transition to next state
              four_ones_det = 1'b0;
            end else begin
              next_state    = STATE_1;  // retain same state
              four_ones_det = 1'b0;
            end
          end
          STATE_2:
          begin
            if (data_in) begin
              next_state    = STATE_3;  // transition to next state
              four_ones_det = 1'b0;
            end else begin
              next_state    = STATE_2;  // retain same state
              four_ones_det = 1'b0;
            end
          end
          STATE_3:
          begin
            if (data_in) begin
              next_state    = STATE_0;  // fourth one: wrap around and assert output
              four_ones_det = 1'b1;
            end else begin
              next_state    = STATE_3;  // retain same state
              four_ones_det = 1'b0;
            end
          end
        endcase
      end

    endmodule


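    To make this machine a Moore machine, the output must be a function of current_state only, not of the input data_in (which in the Mealy version also determines next_state). This requires an extra state whose output is 1.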
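To see the difference, here is a small Python behavioral model (a sketch, not from the original slides; the function names are hypothetical). `mealy_four_ones` mirrors the case statement of the Mealy module above; `moore_four_ones` is one possible Moore equivalent, assuming a fifth state S4 whose output is 1. The Moore output appears one clock later, which is the one-cycle delay listed in the Moore vs. Mealy comparison.

```python
def mealy_four_ones(bits):
    """Mealy model: the output depends on the current state AND the input."""
    state, out = 0, []
    for x in bits:
        if x:
            out.append(1 if state == 3 else 0)   # fourth one seen while in STATE_3
            state = (state + 1) % 4              # STATE_3 wraps back to STATE_0
        else:
            out.append(0)                        # retain same state
    return out


def moore_four_ones(bits):
    """Moore model: five states S0..S4; only S4 drives the output high."""
    # state -> (next state on input 0, next state on input 1)
    nxt = {0: (0, 1), 1: (1, 2), 2: (2, 3), 3: (3, 4), 4: (0, 1)}
    state, out = 0, []
    for x in bits:
        out.append(1 if state == 4 else 0)       # output from the current state only
        state = nxt[state][x]
    return out


print(mealy_four_ones([1, 1, 1, 1, 0, 1]))  # [0, 0, 0, 1, 0, 0]
print(moore_four_ones([1, 1, 1, 1, 0, 1]))  # [0, 0, 0, 0, 1, 0]
```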

    State Encoding Schemes

    One-hot: uses one flip-flop per state, but needs very little next-state and output decoding logic. In fact, a simple sequence of states can be generated with just a shift register.

    Binary-coded counter sequences often change multiple bits on one count transition, which can lead to decoding glitches. Gray codes minimize such glitches because only one bit changes per transition.
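A quick way to compare encodings is to count how many state-register bits toggle on each transition of a cyclic sequence. This Python sketch (helper names are hypothetical) uses the standard binary-reflected Gray code, g = b XOR (b >> 1), and an 8-state one-hot ring counter built by shifting a single 1.

```python
def to_gray(n: int) -> int:
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)


def bits_changed(a: int, b: int) -> int:
    """Number of state-register bits that toggle between codes a and b."""
    return bin(a ^ b).count("1")


def max_toggles(codes):
    # Worst-case number of bits flipping on any transition, including wrap-around
    return max(bits_changed(codes[i], codes[(i + 1) % len(codes)]) for i in range(len(codes)))


binary = list(range(8))
gray = [to_gray(i) for i in binary]
one_hot = [1 << i for i in range(8)]  # ring counter: a single 1 shifted along 8 flip-flops

print(max_toggles(binary), max_toggles(gray), max_toggles(one_hot))  # 3 1 2
```

Binary toggles up to 3 bits on one transition (e.g. 011 to 100), Gray code exactly 1, and the one-hot ring counter always 2 (one bit clears, its neighbour sets).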

    Take Care of Illegal States with One-Hot Encoding

    It is important to handle illegal states by checking that exactly one bit of the state register is 1; a state register with more than one bit set (or with no bit set) is illegal.
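The illegal-state check for a one-hot register reduces to "exactly one bit set". A minimal Python sketch of that test follows (the function name is hypothetical); in hardware the same condition would typically drive recovery logic that forces the FSM back to its reset state, a policy choice the slide does not specify.

```python
def is_legal_one_hot(state: int, width: int) -> bool:
    """Return True if exactly one of the `width` state-register bits is 1."""
    if not 0 <= state < (1 << width):
        return False                      # value does not fit in the register
    # state & (state - 1) clears the lowest set bit: the result is 0 only
    # if at most one bit was set, and state != 0 rules out the all-zero case
    return state != 0 and (state & (state - 1)) == 0


WIDTH = 4
legal = [s for s in range(1 << WIDTH) if is_legal_one_hot(s, WIDTH)]
print([bin(s) for s in legal])            # ['0b1', '0b10', '0b100', '0b1000']
print((1 << WIDTH) - len(legal))          # 12 illegal codes out of 16
```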

    Guidelines: Summary

    Design partitioning into datapath and controller

    Datapath and control parts have different design objectives, so keep them in different blocks!

    The datapath is usually synthesized for better timing; the controller is synthesized for minimum area.

    FSM Coding in Procedural Blocks

    Two always blocks are preferred, where one implements the sequential part that

    assigns the next state to the state register, and the second block implements the

    combinational logic that computes the next state

    The designer can include the output computations for a Mealy or Moore machine in the same combinational block. Alternatively, if the outputs are easy to compute, they can be computed separately with continuous assignments outside the combinational procedural block.

    State Encoding

    Use meaningful tags, defined with `define or parameter statements, for all possible states.

    Select the best encoding scheme.

    Detect a pair of 1's or 0's in a single-bit input stream. That is, the input is a series of ones and zeros; if two ones or two zeros come one after another, the output should go high. Otherwise, the output should be low.

    http://electrosofts.com/verilog/fsm.html
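One way to model this problem is a Mealy machine with three states: no previous bit yet, last bit was 0, last bit was 1. The Python sketch below assumes overlapping pairs count (so three 0s in a row give two high outputs) and that the first bit after reset never produces a high output; the linked page may make different choices, and the state names are hypothetical.

```python
def detect_pairs(bits):
    """Mealy FSM: output 1 when the current bit equals the previous bit."""
    IDLE, LAST0, LAST1 = "IDLE", "LAST0", "LAST1"
    state, out = IDLE, []
    for x in bits:
        # Output is computed from the current state and the input (Mealy)
        out.append(1 if (state == LAST0 and x == 0) or (state == LAST1 and x == 1) else 0)
        # Next state simply remembers the last bit seen
        state = LAST1 if x else LAST0
    return out


print(detect_pairs([1, 1, 0, 0, 0, 1, 0]))  # [0, 1, 0, 1, 1, 0, 0]
```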


    Micro-programmed State Machines


    In hardwired state-machine-based designs, the controller is implemented as a Mealy or Moore finite state machine (FSM).

    This makes the design rigid.

    What can we do if updates to the algorithm or sequencing are expected?

    Make the controller programmable.

    How?

    Idea

    We DO NOT implement the next-state logic; we simply store the outputs and next state for each (current state, input) combination in a memory, just like a lookup table.

    The combinational logic is replaced by a sequence of control signals that are stored in program memory (PM).

    The PM may be read-only (ROM) or random access (RAM).

    The memory address is formed from the current state and the inputs to the FSM.

    General Architecture

    The designer evaluates all possible state transitions based on the inputs and the current state, and tabulates the outputs and next states as microcode for the PM.

    These values are placed in the PM such that the inputs and the current state provide the index (address) into the PM.
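As an illustration of this lookup-table idea, here is a Python sketch that stores the four-ones detector from the earlier Mealy example in a PM. It assumes the address is {state, input} (state in the upper bits) and each word packs {next_state, output}; the helper names and bit layout are assumptions. The table is filled by a helper for brevity, whereas a designer would tabulate it from the ASM chart.

```python
STATE_BITS = 2  # four states, STATE_0..STATE_3


def build_pm():
    """Tabulate the microcode: address = {state, input}, word = {next_state, output}."""
    pm = [0] * (1 << (STATE_BITS + 1))
    for state in range(1 << STATE_BITS):
        for x in (0, 1):
            next_state = (state + 1) % 4 if x else state   # count ones, hold on zero
            out = 1 if (state == 3 and x == 1) else 0      # fourth one detected
            pm[(state << 1) | x] = (next_state << 1) | out
    return pm


def run_microprogrammed(pm, bits):
    """One PM read per clock replaces the next-state and output logic."""
    state, outputs = 0, []
    for x in bits:
        word = pm[(state << 1) | x]   # address formed from current state and input
        outputs.append(word & 1)      # output field
        state = word >> 1             # next-state field loads the state register
    return outputs


pm = build_pm()
print(len(pm), run_microprogrammed(pm, [1, 1, 1, 1, 0, 1]))  # 8 [0, 0, 0, 1, 0, 0]
```

Changing the machine's behaviour now means rewriting PM contents rather than redesigning logic.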

    Example (MEALY)

    Verilog Code

    Micro-Programmed MOORE

    The micro-program memory is split into two parts: combinational logic I and logic II are replaced by PM I and PM II.

    The input and the current state constitute the address for PM I. The contents of PM I are filled so as to generate the next state according to the ASM chart. The width of PM I equals the size of the current state register, whereas its depth is 2^k, where k is the number of bits in {input, current state}.

    Only the current state acts as the address for PM II. The contents of PM II generate the output signals for the datapath.
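A Python sketch of the split memory, using a five-state Moore version of the four-ones detector (S0 to S3 count ones, S4 asserts the output), which is an assumption for illustration. With a 3-bit state and a 1-bit input, PM I has 2^(1+3) = 16 words of 3 bits and PM II has 2^3 = 8 words of 1 bit; in this sketch the unused states 5 to 7 simply map back to S0.

```python
STATE_BITS = 3  # five states S0..S4 need 3 bits
# Moore four-ones detector: state -> (next state on input 0, next state on input 1)
NEXT = {0: (0, 1), 1: (1, 2), 2: (2, 3), 3: (3, 4), 4: (0, 1)}

# PM I: address = {input, current_state}, contents = next state
pm1 = [0] * (1 << (1 + STATE_BITS))
for s, (n0, n1) in NEXT.items():
    pm1[(0 << STATE_BITS) | s] = n0
    pm1[(1 << STATE_BITS) | s] = n1

# PM II: address = current_state only, contents = output for the datapath
pm2 = [0] * (1 << STATE_BITS)
pm2[4] = 1


def run_moore(bits):
    state, outputs = 0, []
    for x in bits:
        outputs.append(pm2[state])                  # output depends on state only
        state = pm1[(x << STATE_BITS) | state]      # next state from {input, state}
    return outputs


print(len(pm1), len(pm2))              # 16 8
print(run_moore([1, 1, 1, 1, 0, 1]))   # [0, 0, 0, 0, 1, 0]
```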

    Example

    Variations: Counter-based State Machines

    Many controller designs do not depend on external inputs; they may only require a fixed sequence of control signals.

    To read each control word, the design only needs to generate successive addresses to the PM. Simply use counters!

    Keep in mind the difference between a microprocessor and these micro-programmed state machines in the upcoming slides.

    Variations: Adding Jumps (Loadable Counter-based State Machines)

    State machines also have jumps, and these jumps may be explicit and decided at runtime!

    The controller should be capable of jumping, i.e., of starting to generate control signals from a new address in the PM.

    Make the branch address part of the microcode! (Unconditional branching)

    The load bit provides a programmable way of deciding whether a JUMP is associated with a particular state.

    Branch_addr provides the jump address.
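A loadable-counter sequencer can be sketched in Python as follows, assuming each microcode word has three fields: `control` (the datapath control signals), `load`, and `branch_addr`. The field names follow the slide, but the word layout and the 4-word example program are hypothetical. With every load bit cleared it degenerates to the plain counter-based sequencer.

```python
from typing import List, NamedTuple


class MicroWord(NamedTuple):
    control: int      # control signals driven to the datapath
    load: int         # 1: load the counter with branch_addr (unconditional jump)
    branch_addr: int  # jump target in the PM


def run_sequencer(pm: List[MicroWord], cycles: int) -> List[int]:
    """Loadable-counter sequencer: the micro PC counts up unless the load bit is set."""
    upc, trace = 0, []
    for _ in range(cycles):
        word = pm[upc]
        trace.append(word.control)
        upc = word.branch_addr if word.load else (upc + 1) % len(pm)
    return trace


# Hypothetical 4-word program: run addresses 0..3, then keep jumping back to address 1
pm = [MicroWord(0xA, 0, 0), MicroWord(0xB, 0, 0), MicroWord(0xC, 0, 0), MicroWord(0xD, 1, 1)]
print(run_sequencer(pm, 7))  # [10, 11, 12, 13, 11, 12, 13]
```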

    Loadable Counter-based State Machines with Conditional Branch Support

    Algorithms may require conditional jump support, for example as a result of some ALU operation.

    Some sort of status and control register (SCR) may be used; it is a good idea to have a centralized status register in your controller.

    Not all status signals are always useful, so we increase the number of load bits to have a programmable way of testing various options from the available status bits.
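To sketch conditional branching, assume the load field is widened into a hypothetical `cond_sel` code: 0 never jumps, 1 always jumps, and 2 or more selects one bit of a status register (here assumed to hold zero, negative and carry flags). The encoding is illustrative only, not taken from the slide.

```python
from typing import List, NamedTuple

NEVER, ALWAYS = 0, 1   # cond_sel codes 0 and 1; codes 2+ select a status bit


class MicroWord(NamedTuple):
    control: int      # control signals for the datapath
    cond_sel: int     # which condition the load logic tests
    branch_addr: int  # jump target


def next_upc(upc: int, word: MicroWord, status: List[int]) -> int:
    """Next micro PC: branch_addr if the selected condition holds, else upc + 1."""
    if word.cond_sel == NEVER:
        take = False
    elif word.cond_sel == ALWAYS:
        take = True                              # unconditional jump
    else:
        take = status[word.cond_sel - 2] == 1    # test one bit of the status register
    return word.branch_addr if take else upc + 1


status = [1, 0, 0]    # hypothetical status register: [zero, negative, carry]
print(next_upc(5, MicroWord(0, 2, 9), status))  # 9: zero flag set, branch taken
print(next_upc(5, MicroWord(0, 3, 9), status))  # 6: negative flag clear, fall through
```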

    Example Design Scenario

    43/103

    Variation :

    Register-based Controllers43

    Similar to PC (Program Counter Approach)

    Adding Subroutine Support

    A subroutine needs to return to the microcode instruction following the call, so we need to store the return address in a register.

    The state machine saves the contents of the micro PC in a special register called the subroutine return address (SRA) register.

    Parity bits are sometimes added to check false conditions. Again, this helps keep the datapath as independent as possible, and it allows branching on both true and false conditions in a programmable way.

    Next-address multiplexer (figure labels: PC ADDr, RET ADDr, JMP ADDr):

    00: PC address (the next PC is updated automatically)

    01: JMP address; on a CALL to a subroutine, the SRA is also loaded

    10: RET address (selects the SRA address)
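The next-address selection above can be modelled in Python as a small micro-sequencer with a single SRA register. The mux codes 00/01/10 come from the slide; the word layout, the `call` flag, and the example program are hypothetical. The sketch assumes the saved return address is micro PC + 1, i.e. the instruction following the CALL.

```python
from typing import List, NamedTuple

PC_SEL, JMP_SEL, RET_SEL = 0b00, 0b01, 0b10   # next-address mux select codes


class MicroWord(NamedTuple):
    control: str       # stands in for the datapath control signals
    sel: int           # next-address mux select
    call: int = 0      # with JMP_SEL: 1 means CALL, so load the SRA
    jmp_addr: int = 0  # jump / subroutine address


def run(pm: List[MicroWord], max_cycles: int = 50) -> List[str]:
    upc, sra, trace = 0, 0, []
    for _ in range(max_cycles):
        word = pm[upc]
        trace.append(word.control)
        if word.control == "halt":
            break
        pc_next = upc + 1                  # PC address: automatically updated next PC
        if word.sel == JMP_SEL:
            if word.call:
                sra = pc_next              # CALL: save the return address in the SRA
            upc = word.jmp_addr
        elif word.sel == RET_SEL:
            upc = sra                      # RET: select the SRA address
        else:
            upc = pc_next
    return trace


program = [
    MicroWord("a", PC_SEL),
    MicroWord("call", JMP_SEL, call=1, jmp_addr=4),
    MicroWord("b", PC_SEL),
    MicroWord("halt", PC_SEL),
    MicroWord("sub1", PC_SEL),
    MicroWord("sub2", RET_SEL),
]
print(run(program))  # ['a', 'call', 'sub1', 'sub2', 'b', 'halt']
```

A single SRA register holds only one return address, so a CALL from inside a subroutine would overwrite it.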

    Adding Nested Subroutine Support

    Add a stack! What level of nesting must be supported?

    Next-address sources (figure labels): PC ADDr, RET ADDr, JMP ADDr

    Logic for Subroutine Address Stack

    On a CALL, write is enabled to save the return (RET) address, and the correct LIFO address is selected based upon the MUX value (simple increment/decrement counters suffice for stack addressing).

    Read_lifo_addr points to the top of the stack.

    Write_lifo_addr points to top + 1 of the stack.

    No error handling (stack overflow/underflow) is assumed.
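A minimal Python model of the subroutine address stack, following the slide's convention that Read_lifo_addr points to the top and Write_lifo_addr to top + 1. The class name and the depth of 4 are assumptions, and, as on the slide, there is no overflow or underflow checking.

```python
class ReturnAddressStack:
    """LIFO of return addresses for nested CALLs (no error handling)."""

    def __init__(self, depth: int = 4):
        self.mem = [0] * depth
        self.write_lifo_addr = 0          # top + 1: where the next CALL writes

    @property
    def read_lifo_addr(self) -> int:
        return self.write_lifo_addr - 1   # top of stack

    def call(self, ret_addr: int) -> None:
        self.mem[self.write_lifo_addr] = ret_addr   # write enabled on CALL
        self.write_lifo_addr += 1                   # simple increment

    def ret(self) -> int:
        addr = self.mem[self.read_lifo_addr]        # read the top of stack on RET
        self.write_lifo_addr -= 1                   # simple decrement
        return addr


s = ReturnAddressStack()
s.call(10)      # outer subroutine returns to 10
s.call(20)      # nested subroutine returns to 20
print(s.ret(), s.ret())  # 20 10
```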

    Complete System!

    LOOPs in State Machines

    Example: Filtering!

    State 1: Reset

    State 2: Wait for data

    State 3: Wait for complete data packet

    State 4: Start processing: repeat States 5-6, 256 times

    State 5: Convolve filter with data at location (x, y)

    State 6: x++, y++

    State 7: End

    What if you want to apply a cascaded filter?

    LOOPs in State Machines: Cascaded Filters

    State 1: Reset

    State 2: Wait for data

    State 3: Wait for complete data packet

    State 3.5: Start filtering: repeat State 4, 2 times (for two filters)

    State 4: Start processing: repeat States 5-6, 256 times

    State 5: Convolve filter with data at location (x, y)

    State 6: x++, y++

    State 7: End

    Need nested LOOP support! Imagine doing this for a hardwired state machine!

    Adding LOOP Support

    Consider a LOOP instruction. We now need a counter: the loop counter loads the iteration count on a LOOP command.

    End address in a loop instance reached: why is this needed?

    LOOP ended: why is this needed?
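A single-level loop can be sketched in Python with three registers: loop start, loop end address, and loop counter. The encoding of a LOOP instruction as (count, end_addr) and the example program are hypothetical. The end-address comparison and the "loop ended" condition in the sketch are one reading of the two signals the slide asks about.

```python
from typing import List, Optional, Tuple

# Each micro-instruction: (control, loop) where loop = (count, end_addr) for a LOOP
Instr = Tuple[str, Optional[Tuple[int, int]]]


def run_with_loop(pm: List[Instr]) -> List[str]:
    upc, trace = 0, []
    loop_start = loop_end = loop_count = None
    while upc < len(pm):
        control, loop = pm[upc]
        trace.append(control)
        if loop is not None:                # LOOP command: load the loop counter
            loop_count, loop_end = loop
            loop_start = upc + 1            # the loop body starts after the LOOP word
            upc += 1
            continue
        if upc == loop_end:                 # end address of the loop body reached
            loop_count -= 1
            if loop_count > 0:
                upc = loop_start            # start another iteration
                continue
            loop_end = None                 # LOOP ended: fall through
        upc += 1
    return trace


program = [("init", None), ("loop", (3, 3)), ("conv", None), ("inc_xy", None), ("end", None)]
print(run_with_loop(program))
```

Only one loop can be active here, since a second LOOP would overwrite the three registers; that is why nested loops need stacks.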

    Adding NESTED LOOP Support

    Add stacks to your architecture!

    Good thing: all stacks can share the same global address-logic controller! Why?

    LOOP & Subroutine Address Stack: Complete System!

    Design Example I: Microcoded FIFO/LIFO Machine

    Example Design 1: LIFO/FIFO Architecture

    A traditional four-deep FIFO requires 4 states.

    Working:

    WRITE: gets the new value from IN_BUS into the next available space

    DEL: updates the read address for OUT_BUS

    ERROR: any error condition, for example DEL on an empty FIFO

    Micro-Code for FIFO

    Micro-Code for LIFO

    Design Example II: Design for Block-based Motion Estimation!

    Example 2. Image source: http://www-sipl.technion.ac.il/Info/News&Events_1_e.php?id=373

    Design for Block-based Motion Estimation

    Block-based exhaustive motion estimation searches for a block over the whole image and computes a similarity measure at every candidate position, e.g. the Sum of Absolute Differences (SAD).

    Design for Block-based Motion Estimation

    [SHO] Fig 10.22: Raster Scanning

    Sample Design

    [SHO]

    System Design for a Complex System!

    Where shall I start?

    1. Follow a top-down hierarchical model with iterative refinement.

    2. Define the interface with the external world (other components, memory, etc.).
       - The memory arrangement can be tricky, but again we identify it incrementally.

    3. Define the major functional blocks and repeat Steps 1-3 for each of them until you have constituted your complete datapath.

    Consider Block-based Motion Estimation

    [SHO] Fig 10.22

    Let's assume we wish to have a micro-coded design, with the flexibility to change the raster scan direction!

    The FUN part: let's start the design right now. Divide and conquer!

    Developing a RASTER Machine!

    Consider Block-based Motion Estimation

    Considerations:

    Describe what your block shall do: it shall read an image and a reference block, both from memory, raster-scan the target image completely, and report the (x, y) with the lowest computed SAD.

    Define its I/O and draw the block diagram! Any particular specs? The customer wants it programmable, and may change the raster style and starting position in the future!

    Study your algorithm to go deeper into the design:
    - It requires four nested loops, so you need a good controller with loop support!
    - It requires an ALU to do the real data crunching!
    - It requires a register file to store data read from memory.
    - RASTER MACHINE: it needs a lot of address logic to generate the right addresses depending upon the current state!
    - It needs to store tx, ty.

    Block diagram (figure labels):

    Tx & Ty Register

    Reference RAM (single-ported)

    Target RAM (single-ported)

    Target & REF Register File: addressing controlled by the address generator (above)

    Address Generator for Tx, Ty, ALU, RAMs, Register File. Inputs: current state, Tx, Ty

    Address Generator for ALU for the processing state. To: ALU; from: register file

    Address Generator for Extra Column/Row (EAG)

    RASTER Control: controls the address generation logic

    Block Address Generator (BAG): generates addresses for memory access during initial loading; needs to take care of row-major addressing. Input: from TAG; output: to register file & memory

    tX, tY Address Generator (TAG)

    ALU: performs SAD on each cycle and stores the corresponding tx & ty with the minimum SAD

    Controller (micro-coded, supporting nested loops)

    Row-Major Addressing for Matrices

    a b c d
    e f g h
    i j k l
    m n o p

    C = number of columns

    Suppose your loop is over i, j, where i is the loop index for the current row and j is the loop index for the current column.

    Row i  Col j  Linear address (i*C)+j  Binary  Data
    0      0      0                       0000    a
    0      1      1                       0001    b
    0      2      2                       0010    c
    0      3      3                       0011    d
    1      0      4                       0100    e
    1      1      5                       0101    f
    1      2      6                       0110    g
    1      3      7                       0111    h
    2      0      8                       1000    i
    2      1      9                       1001    j
    2      2      10                      1010    k
    2      3      11                      1011    l
    3      0      12                      1100    m
    3      1      13                      1101    n
    3      2      14                      1110    o
    3      3      15                      1111    p

    How can you implement, for a square image whose side C is a power of two:
    - a row-major to linear address mapper?
    - a linear to row-major mapper?

    Solution: concatenation and de-concatenation of the address bits!
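For C a power of two, (row * C) + col is just the bit concatenation {row, col}, so both mappers are pure wiring. A Python sketch for the 4 x 4 example above (names hypothetical):

```python
LOG2_C = 2          # C = 4 columns, as in the 4 x 4 example
C = 1 << LOG2_C


def rowmajor_to_linear(row: int, col: int) -> int:
    # (row * C) + col is just {row, col}: concatenate the bit fields
    return (row << LOG2_C) | col


def linear_to_rowmajor(addr: int):
    # De-concatenate: upper bits are the row, lower LOG2_C bits are the column
    return addr >> LOG2_C, addr & (C - 1)


data = "abcdefghijklmnop"
print(rowmajor_to_linear(2, 1), data[rowmajor_to_linear(2, 1)])   # 9 j
print(linear_to_rowmajor(14))                                     # (3, 2)
```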

    For i = 0 : 255-(N-1)
        For j = 0 : 255-(N-1)
            SAD(i,j) = 0
            For k = 0 : N-1
                For l = 0 : N-1
                    SAD(k,l) = |S(k,l) - R(k,l)|
                    SAD(i,j) = SAD(i,j) + SAD(k,l)
                End
            End
            If (SAD(i,j) < Min_SAD); Min_SAD = SAD(i,j)
        End
    End

    Need to get data from RAM, assuming row-major order; shall need a row-major to linear converter if required.

    Once the blocks are loaded, a simple one-to-one mapping (address generation) is required for the ALU (SAD block)!

    N = elements per row or column, assuming a square block.

    i = tx, j = ty
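A Python reference model of this exhaustive search can serve as a golden model for the hardware. It is a sketch with hypothetical names; it indexes the whole target image directly (target[i + k][j + l]), which is what S(k,l) denotes once the target block at (i, j) has been loaded, and it returns the minimum SAD together with its (tx, ty).

```python
from typing import List, Tuple

Image = List[List[int]]


def full_search_sad(target: Image, ref: Image) -> Tuple[int, int, int]:
    """Exhaustive block matching: return (min_sad, tx, ty) over all block positions."""
    H, W, N = len(target), len(target[0]), len(ref)
    best = None
    for i in range(H - N + 1):            # tx = i: 0 .. H-N, i.e. 255-(N-1) for H = 256
        for j in range(W - N + 1):        # ty = j
            sad = 0
            for k in range(N):
                for l in range(N):
                    sad += abs(target[i + k][j + l] - ref[k][l])
            if best is None or sad < best[0]:
                best = (sad, i, j)        # keep the first position with the lowest SAD
    return best


target = [[0] * 5 for _ in range(5)]
ref = [[1, 2], [3, 4]]
for k in range(2):
    for l in range(2):
        target[2 + k][3 + l] = ref[k][l]   # plant the reference block at (tx, ty) = (2, 3)
print(full_search_sad(target, ref))        # (0, 2, 3)
```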

    Raster Algo!

    For i = 0 : 2 : 255-(N-1)                 % two rows per outer iteration
        For j = 0 : 255-(N-1)                 % raster RIGHT
            For k = 0 : N-1
                For l = 0 : N-1
                    SAD(k,l) = |S(k,l) - R(k,l)|
                    SAD(i,j) = SAD(i,j) + SAD(k,l)
                End
            End
            If (SAD(i,j) < Min_SAD); Min_SAD = SAD(i,j)
        End
        tx = tx + 1                           % move down one row
        For j = 255-(N-1) : -1 : 0            % raster LEFT
            For k = 0 : N-1
                For l = 0 : N-1
                    SAD(k,l) = |S(k,l) - R(k,l)|
                    SAD(i,j) = SAD(i,j) + SAD(k,l)
                End
            End
            If (SAD(i,j) < Min_SAD); Min_SAD = SAD(i,j)
        End
    End
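The alternating right/left sweep can be sketched in Python as a generator of block positions (the function name is hypothetical). Its key property is that consecutive positions differ by exactly one row or one column, so only one new column (or row) of pixels must be loaded per move.

```python
from typing import Iterator, Tuple


def serpentine_positions(rows: int, cols: int) -> Iterator[Tuple[int, int]]:
    """Yield block positions (tx, ty): left-to-right on even rows, right-to-left on odd rows."""
    for tx in range(rows):
        col_order = range(cols) if tx % 2 == 0 else range(cols - 1, -1, -1)
        for ty in col_order:
            yield tx, ty


positions = list(serpentine_positions(3, 3))
print(positions)
# [(0, 0), (0, 1), (0, 2), (1, 2), (1, 1), (1, 0), (2, 0), (2, 1), (2, 2)]
```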

    Rastering Efficiently!

    1 5 3 7 1

    2 3 7 5 2

    3 7 4 3 3

    4 3 5 2 4

    5 1 6 1 5


    ALU: In Depth

    Instruction | State | Value | Loop Start | Loop End | Comments
    Reset | S0 | | | | Reset everything
    Set tx | | 0 | | | Initialize tx (starting x coordinate)
    Set ty | | 0 | | | Initialize ty (starting y coordinate)
    RASTER RIGHT | | | | | Tell the processor you are traversing right initially
    Lp InitBlk | S1 | Block size | Lp InitBlk | Lp InitBlk | Load initial blocks for REF & TARGET; takes clocks equal to the number of elements
    Lp R | S2 | (256)/2-8 | Lp C | LpR_Dne | Run till state LpR_Dne, i.e. half the number of rows
    Lp C | S3 | 256-8 | Lc+Pr | LpC_Dne | Run till state LpC_Dne, i.e. the number of columns for each row traversed in the RIGHT direction
    Lc+Pr | S4 | = c size | Lc+Pr | Lc+Pr | Process & load RIGHT/LEFT column, due to RASTER value
    Pr | S5 | b c size | Pr | Pr | Process only
    Pr_dne | | | | | Store result
    Update_ty | | | | | Update ty based upon previous RASTER direction
    SHIFT LEFT | | | | | Shift left
    LpC_Dne | S7 | | | | Done with one row (over all the columns)
    RASTER DOWN | | | | | Block needs to move down!
    Update_tx | | | | | As defined by previous RASTER!
    Load R/C | | = c size | Lc+Pr | Lc+Pr | Load row, due to RASTER value
    SHIFT UP | | | | | Update REG files!
    RASTER LEFT | | | | | This step can be avoided by XORing a predefined bit of the counter: useful for RASTER!
    Lp C | S3 | 256-8 | Lc+Pr | LpC_Dne |
    Lc+Pr | S4 | = c size | Lc+Pr | Lc+Pr | Process & load LEFT column, due to RASTER value
    Pr | S5 | b c size | | | Process only
    Pr_dne | | | | | Store result
    Update_ty | | | | | Update ty based upon previous RASTER direction
    SHIFT RIGHT | | | | | Shift right: take the extra column to the other end!
    LpC_Dne | S7 | | | |
    RASTER DOWN | | | | |
    Update_tx | | | | |
    Load R/C | | = c size | Lc+Pr | Lc+Pr | Load row, due to RASTER value
    SHIFT UP | | | | |
    RASTER RIGHT | | | | |
    Lp R_Dne | | | | |

    Designing Your Own Microprocessor

    Datapath vs. Control Logic Partitioning

    Micro Architecture Documenting Case Study: RISC-SPM

    (A mini RISC Stored Program Machine)

    Datapath Vs. Control Logic Partitioning

    Design Spec or Micro-Architecture

    A micro-architecture specifies the partitioning of functions into blocks, clock/reset requirements, pipelining of registers, memory buffers, state machines and interface details.

    Micro Architecture Documents


    Template

    Part 1 Block name, Owner, Version control

    Part 2 Overview

    Part 3 Functional/Requirement Specifications

    Operation details, interfacing signals, …

    Part 4 Detailed functional description of key circuitry, with drawings

    Part 5 Verification: list of assertions, formal verification rules, etc.

    Part 6 Comments

    Micro-Architecture Template


    Part 1 Block name, Owner, Version control

    Block Name

    A mini RISC Stored Program Machine, Dual Port RAM

    Version Control

    Version  Modification                            Author/s  Date          Remarks
    1.0      Initial Draft                           Ossama    10th Aug, 09
    2.0      Updated FSM for Bulk Transfer, Page No  Saad      13th Aug, 09  It was found that ..

    Micro-Architecture Template


    Part 2 Overview

    Describe what your block is supposed to do, e.g.:

    A mini RISC Stored Program Machine that performs basic arithmetic …

    Give enough information for people to recognize the functionality at a glance.

    Should list: abbreviations, references

    Micro Architecture Template


    Part 3 Functional/Requirement Specification

    What are the functional demands, requirements and constraints of your block?

    Examples:

    The mini RISC SPM should operate at 2.5 GHz

    Interface with the external world: interfacing signals, any specific interface

    Instruction Set

    Interface with the external world


    Figure: the RISC SPM as a single block with external signals Rst, Clk and Int.

    Micro Architecture Template


    Part 3 Functional/Requirement Specification

    Interface Signal List

    Every interface signal should be listed. Don't forget comments, for example, whether a system clock is gated low.

    Remember to fill in information that is helpful to the designers interfacing to your block.
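An interface signal list can also be kept in machine-readable form so that missing comments or duplicate entries are caught automatically. This is a minimal Python sketch assuming a hypothetical `Signal` record with direction, width and comment fields. The entries for the RISC SPM pins Clk, Rst and Int are illustrative assumptions (the lecture gives no directions, widths or descriptions), and Int's comment is left empty on purpose to show the check firing.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Signal:
    name: str
    direction: str  # "in", "out" or "inout"
    width: int      # in bits
    comment: str    # what an interfacing designer needs to know


def check_signal_list(signals: List[Signal]) -> List[str]:
    """Return a list of problems; an empty list means every entry is complete."""
    problems = []
    seen = set()
    for s in signals:
        if s.name in seen:
            problems.append(f"{s.name}: listed twice")
        seen.add(s.name)
        if s.direction not in ("in", "out", "inout"):
            problems.append(f"{s.name}: bad direction {s.direction!r}")
        if s.width < 1:
            problems.append(f"{s.name}: width must be >= 1")
        if not s.comment.strip():
            problems.append(f"{s.name}: missing comment")
    return problems


# Hypothetical entries: directions, widths and comments are assumptions.
RISC_SPM_PINS = [
    Signal("Clk", "in", 1, "system clock (edge and gating to be documented)"),
    Signal("Rst", "in", 1, "reset (polarity to be documented)"),
    Signal("Int", "in", 1, ""),  # deliberately blank to show the check
]

print(check_signal_list(RISC_SPM_PINS))
```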

    Micro Architecture Template


    Part 4 Detailed Functional description

    (a) Block level diagram (hierarchical)

    (b) Datapath

    (c) Controller

    Block Level Diagram

    Identify your major functional blocks

    Figure: top-level blocks Controller, Memory and Processor; Rst and Clk are also shown.

    Micro Architecture Template


    Part 4 Detailed Functional description

    (a) Block level diagram

    If your block is top level, you can go down gradually to the lower levels.

    Block diagram / macro-architecture, highlighting the flow of data and control signals

    Draw the control path and datapath for each ground-level block.

    For each and every block, specify: overview, interfacing signals, …


    Moving further down into design


    Add further functionality.

    Show how your block is structured.

    You don't have to draw every wire; a qualitative approach is enough. All interface signals should, however, be present on your drawing.

    Show all storage elements, registers and pipeline stages.

    Block Level Diagram

    Identify your major functional blocks

    Figure: Controller and Memory alongside the Processor, which is now broken down into Register File, ALU, Instruction Register and Program Counter; Rst and Clk are also shown.
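To make the datapath vs. control partitioning concrete, here is a minimal Python sketch of a datapath with a Program Counter, Instruction Register, 4-entry Register File and a two-operation ALU. The `Control` word, the 8-bit data width and the register count are assumptions for illustration, not the RISC-SPM specification. The point is that the datapath only executes the control word it is given; all sequencing decisions belong to the controller.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Control:
    """One control word: the register transfers to perform on the next clock edge."""
    load_ir: bool = False  # IR <- M[PC]
    inc_pc: bool = False   # PC <- PC + 1
    alu_op: str = ""       # "", "ADD" or "SUB"; result written to R[dest]
    src: int = 0           # register-file address of the second ALU operand
    dest: int = 0          # register-file address of the first operand and the result


@dataclass
class Datapath:
    memory: List[int]
    regs: List[int] = field(default_factory=lambda: [0] * 4)
    pc: int = 0
    ir: int = 0

    def clock(self, c: Control) -> None:
        """Apply one control word; the datapath makes no decisions of its own."""
        if c.load_ir:
            self.ir = self.memory[self.pc]  # uses the PC value from before this edge
        if c.inc_pc:
            self.pc += 1
        if c.alu_op == "ADD":
            self.regs[c.dest] = (self.regs[c.dest] + self.regs[c.src]) & 0xFF
        elif c.alu_op == "SUB":
            self.regs[c.dest] = (self.regs[c.dest] - self.regs[c.src]) & 0xFF


# Hypothetical usage: in hardware these control words would come from the controller.
dp = Datapath(memory=[0x12])
dp.regs = [5, 3, 0, 0]
dp.clock(Control(load_ir=True, inc_pc=True))    # fetch
dp.clock(Control(alu_op="ADD", src=1, dest=0))  # execute: R0 <- R0 + R1
print(dp.pc, hex(dp.ir), dp.regs)
```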

    (b) Datapath

    (c) Controller

    Control Signals Generation

    Finite State Machines

    ASM Charts
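Control signal generation with an FSM can be sketched as a Moore machine, where every control output depends only on the current state and the next state depends on the state plus inputs (here reset and the opcode). The states, the `HALT_OPCODE` value and the 4-bit opcode field (`ir >> 4`) are hypothetical choices for illustration, not the lecture's controller.

```python
from enum import Enum, auto
from typing import Dict, List


class State(Enum):
    IDLE = auto()
    FETCH = auto()
    DECODE = auto()
    EXECUTE = auto()
    HALT = auto()


# Moore outputs: each control signal is a function of the current state only.
OUTPUTS: Dict[State, Dict[str, int]] = {
    State.IDLE:    {"load_ir": 0, "inc_pc": 0, "alu_en": 0},
    State.FETCH:   {"load_ir": 1, "inc_pc": 1, "alu_en": 0},
    State.DECODE:  {"load_ir": 0, "inc_pc": 0, "alu_en": 0},
    State.EXECUTE: {"load_ir": 0, "inc_pc": 0, "alu_en": 1},
    State.HALT:    {"load_ir": 0, "inc_pc": 0, "alu_en": 0},
}

HALT_OPCODE = 0xF  # hypothetical opcode that stops the machine


def next_state(state: State, rst: bool, opcode: int) -> State:
    """Next-state logic; reset (sampled at the clock edge) overrides everything."""
    if rst:
        return State.IDLE
    if state is State.IDLE:
        return State.FETCH
    if state is State.FETCH:
        return State.DECODE
    if state is State.DECODE:
        return State.HALT if opcode == HALT_OPCODE else State.EXECUTE
    if state is State.EXECUTE:
        return State.FETCH
    return State.HALT  # HALT holds until reset


def run(program: List[int], cycles: int) -> List[State]:
    """Clock the controller for `cycles` edges and return the state in each cycle."""
    state, ir, pc, trace = State.IDLE, 0, 0, []
    for _ in range(cycles):
        trace.append(state)
        outs = OUTPUTS[state]
        nxt = next_state(state, rst=False, opcode=ir >> 4)  # uses IR from before this edge
        if outs["load_ir"]:
            ir = program[pc]
        if outs["inc_pc"]:
            pc += 1
        state = nxt
    return trace


print(run([0x12, 0xF0], cycles=8))
```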


    (d) Timing waveforms of interfacing signals, e.g., interfacing with an external RAM

    (e) Memory map: status registers, defined I/O ports, etc.
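A memory map can be documented as a table of address regions that is checked for overlaps and reused for address decoding. The lecture does not give an address map, so the 8-bit map below (program, data, I/O ports, status register) and the helper names are purely hypothetical.

```python
from itertools import combinations
from typing import List, NamedTuple, Optional, Tuple


class Region(NamedTuple):
    name: str
    base: int  # first address of the region
    size: int  # number of addresses in the region

    @property
    def last(self) -> int:
        return self.base + self.size - 1


# Hypothetical 8-bit memory map.
MEMORY_MAP: List[Region] = [
    Region("program", 0x00, 0x80),   # 0x00..0x7F
    Region("data", 0x80, 0x70),      # 0x80..0xEF
    Region("io_ports", 0xF0, 0x0F),  # 0xF0..0xFE
    Region("status", 0xFF, 0x01),    # 0xFF
]


def overlaps(regions: List[Region]) -> List[Tuple[str, str]]:
    """Return every pair of region names whose address ranges intersect."""
    return [(a.name, b.name) for a, b in combinations(regions, 2)
            if a.base <= b.last and b.base <= a.last]


def decode(addr: int, regions: List[Region]) -> Optional[str]:
    """Return the name of the region containing addr, or None if it is unmapped."""
    for r in regions:
        if r.base <= addr <= r.last:
            return r.name
    return None
```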

    Micro Architecture Template


    Part 5 Verification

    Describe the rules for the correct behaviour of your block.

    Take your time and describe the rules.

    Example: 2 cycles after signal A goes down, signal B should also go down.
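A rule such as this example can be checked mechanically against simulation traces. The sketch below assumes one sample per clock cycle and reads "B should go down" as "B is low two cycles after A falls"; in SystemVerilog Assertions the same rule would typically be written as `assert property (@(posedge clk) $fell(A) |-> ##2 !B);`. The function name is hypothetical.

```python
from typing import List, Sequence


def check_a_fall_then_b_low(a: Sequence[int], b: Sequence[int], delay: int = 2) -> List[int]:
    """Return the cycles t where A fell (1 at t-1, 0 at t) but B was not low at t + delay.

    Falls too close to the end of the trace to be evaluated are skipped.
    """
    if len(a) != len(b):
        raise ValueError("traces must have equal length")
    failures = []
    for t in range(1, len(a)):
        fell = a[t - 1] == 1 and a[t] == 0
        if fell and t + delay < len(b) and b[t + delay] != 0:
            failures.append(t)
    return failures


# Hypothetical traces: A falls at cycle 2, so B must be low at cycle 4.
a_trace = [1, 1, 0, 0, 0, 0]
print(check_a_fall_then_b_low(a_trace, [1, 1, 1, 1, 0, 0]))
print(check_a_fall_then_b_low(a_trace, [1, 1, 1, 1, 1, 0]))
```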

    Micro Architecture Document

    Summary

    Part 1 Block name, Owner, Version control

    Part 2 Overview

    Part 3 Functional/Requirement Specifications

    Operation details/requirements, Interfacing signals

    Part 4 Detailed Functional description

    State diagrams for Control & Data path & waveforms

    Part 5 Verification: list of assertions, formal verification rules, etc.