NIOS II Processor

  • 7/28/2019 NIOS II Processor

    1/28

    Outline

    What is a Soft Processor?

    What is the NIOS II?

    Architecture of the NIOS II: what are the implications?

    TigerSHARC vs. NIOS II

    Pipeline Issues

    Issues related to FIR

    Hardware acceleration using FPGA logic

    What is a Soft Processor?

    Processor implemented in VHDL, Verilog, etc., and downloaded onto FPGA hardware

    Can implement many parallel processors on one FPGA

    Can use additional FPGA resources on the same chip that are not part of the processor core

    NIOS II is a Soft Processor

    Why Soft Processor?

    Higher level of design reuse

    Reduced obsolescence risk

    Simplified design update or change

    Increased design implementation options

    Lower latency between processor and FPGA components

    What is NIOS II?

    Software-defined processor

    The processor core is loaded onto FPGA

    Programmed using normal programming tools (C, asm), not hardware description languages

    Can use the rest of the FPGA hardware for accelerating parts of the code

    How Is NIOS II Implemented?

    The custom FPGA logic that interacts with the processor is implemented in Altera Quartus II

    The Avalon Interface bus (common instruction/data bus) is implemented in Quartus II

    The architecture is generated in Quartus II and used for programming in Eclipse IDE

    NIOS II IDE

    Coding is done in Eclipse rather than VisualDSP.

    The Different NIOS II Cores

    There are 3 cores available from Altera:

    NIOS II/e: Economical Core

    NIOS II/s: Standard Core

    NIOS II/f: Fast Core

    What's the Difference between the Cores?

    An LE is equivalent to an 8-1 NAND gate + 1 D flip-flop

    An ALM is equivalent to 2 LEs

    Comparison of TigerSHARC and NIOS II architecture

    TigerSHARC Architecture

    NIOS II Architecture

    -thirty-two 32-bit general registers, six 32-bit control registers

    -variable cache size, based on how much FPGA space you have

    -ALU: 32-bit, two inputs to one output; does shifts, logic and arithmetic. The shifter is not a separate unit as in TigerSHARC

    Avalon Interface

    -separate address, data and control lines

    -data transfer width up to 1024 bits; can be set to any width (need not be a power of 2)

    -one transfer per clock cycle.

    NIOS II/f pipeline

    Six stages

    One instruction can be dispatched and/or retired per cycle

    Dynamic branch prediction: 2-bit branch history table (no BTB like in TigerSHARC)
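
A 2-bit history entry is commonly modeled as a saturating counter, and the sketch below shows how such a counter behaves on a loop-closing branch. It is an illustration under assumptions, not a description of the actual NIOS II/f logic: the single-entry `predictor2` type, the function names, and the initial "strongly not taken" state are all hypothetical.

```c
#include <assert.h>

/* 2-bit saturating counter: states 0,1 predict not taken; 2,3 predict taken. */
typedef struct {
    unsigned state; /* 0..3 */
} predictor2;

/* Returns 1 if the prediction was wrong, then trains the counter. */
int predict_and_update(predictor2 *p, int taken)
{
    assert(p != 0 && p->state <= 3);
    int predicted_taken = (p->state >= 2);
    if (taken && p->state < 3)
        p->state++;
    if (!taken && p->state > 0)
        p->state--;
    return predicted_taken != (taken != 0);
}

/* Counts mispredictions for a loop-closing branch that is taken
   (iterations - 1) times and then falls through once. */
int loop_mispredictions(int iterations)
{
    assert(iterations > 0);
    predictor2 p = { 0 }; /* assumption: counter starts at strongly not taken */
    int misses = 0;
    for (int i = 0; i < iterations; i++) {
        int taken = (i < iterations - 1);
        misses += predict_and_update(&p, taken);
    }
    return misses;
}
```

Under these assumptions a long loop costs three mispredictions regardless of its length: two while the counter warms up and one when the loop exits.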

    NIOS II/f pipeline

    The pipeline stalls for:

    Multi-cycle instructions

    Cache misses

    Data dependencies (2 cycles between calculating and using result)

    Mispredicted branch penalty: 3 cycles

    Hardware multiply

    Can use different options for the multiplier (at the processor design stage):

    No h/w multiply (saves FPGA gates): speed depends on algorithm

    Use embedded multipliers (if FPGA has those): 1-5 cycles (depends on FPGA)

    Implement multipliers on FPGA gates: 11 cycles

    Division: 4-66 cycles on hardware
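
When the core is built without a hardware multiplier, multiplication has to be done in software, which is why its speed depends on the algorithm. The sketch below shows one textbook option, shift-and-add. The name `soft_mul` is hypothetical, and this is not necessarily the routine the NIOS II toolchain actually emits.

```c
#include <stdint.h>

/* Shift-and-add multiply: for every set bit in b, add the matching
   shifted copy of a. The result is the product modulo 2^32. */
uint32_t soft_mul(uint32_t a, uint32_t b)
{
    uint32_t product = 0;
    while (b != 0) {
        if (b & 1u)
            product += a;
        a <<= 1;
        b >>= 1;
    }
    return product;
}
```

The loop runs once per significant bit of `b`, so the cost varies with the operand value, unlike the fixed cycle counts of the hardware options.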

    Compare to TigerSHARC

    No support for parallel instructions

    No support for SIMD operations

    Multicycle instructions stall the pipeline

    All the above limitations can be overcome by using FPGA space unoccupied by the processor itself

    Comparison of NIOS II and TigerSHARC on an FIR Algorithm

    Integer FIR algorithm

    int coeff[]={1, 2, 3, 4, 5, 6, 7, 8};

    int data1[] = {1, 0, 0, 0, 0 ,0 ,0 ,0};

    int output[8];

    int i=0, j=0, k=0;

    for(k=0; k
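
The loop on this slide is cut off in the source, so its body is not known. As a stand-in, the sketch below shows one common way to write an integer FIR as a causal convolution over arrays shaped like the ones declared above. The function name `fir_int` and the loop structure are assumptions, not the original code.

```c
#include <assert.h>

/* Direct-form FIR as a causal convolution:
   output[k] = sum of coeff[i] * data[k - i] for 0 <= i <= k. */
void fir_int(const int *coeff, const int *data, int *output, int n)
{
    assert(coeff && data && output && n > 0);
    for (int k = 0; k < n; k++) {
        int acc = 0;
        for (int i = 0; i <= k; i++)
            acc += coeff[i] * data[k - i];
        output[k] = acc;
    }
}
```

With an impulse as input, such as the `data1` array above, the output of this sketch reproduces the coefficient array, which makes it easy to check.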

    Speed analysis

    0        movi r4,8                i = 8
    1  Loop: ldw  r2,0(r6)            load data
    2        ldw  r3,0(r7)            load coefficient
    3        addi r4,r4,-1            i--
    4        addi r6,r6,4             coeffPt++
    5        mul  r2,r2,r3            data = data * coeff
    6        addi r7,r7,-4            dataPt--
    7        stall                    data stall, waiting for multiplication result
    8        add  r5,r5,r2            output += data
    9        bne  r4,zero,0x10002a0

    The branch on line 9 will mispredict 2 times at the beginning and 1 time at the end of the loop (wasting 3 cycles each time)

    Speed analysis

    9 cycles per iteration, except the first two (branch predicted not taken) and the last (branch predicted taken); those take 9+3=12 cycles

    The data stall can be removed by moving the instruction from line 4 to line 7

    Speed: 8 cycles * (N-3) + 11 cycles * 3 = 8*(N-3)+33 cycles

    For a 1024-tap FIR: 8201 cycles

    Clock cycle is 3 times longer (200 MHz vs. 600 MHz)
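
The arithmetic on this slide can be checked with a small sketch. The function names are hypothetical; the constants (8 cycles per iteration, 3 mispredicted iterations at 11 cycles each, 200 MHz and 600 MHz clocks) are the ones stated above.

```c
#include <assert.h>

/* Cycle estimate from the slide: 8 cycles per iteration once the stall
   is removed, and 8 + 3 = 11 cycles for each of the 3 mispredicted ones. */
long nios2_fir_cycles(long taps)
{
    assert(taps >= 3);
    return 8 * (taps - 3) + 11 * 3;
}

/* Express a cycle count measured on a slow clock as the number of
   cycles a faster clock would tick through in the same wall time. */
long equivalent_cycles(long cycles, long slow_mhz, long fast_mhz)
{
    assert(cycles >= 0 && slow_mhz > 0 && fast_mhz > 0);
    return cycles * fast_mhz / slow_mhz;
}
```

For example, `nios2_fir_cycles(1024)` gives 8201, and `equivalent_cycles(8201, 200, 600)` gives 24603.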

    Speed comparison

    8201 NIOS II cycles are equivalent to 24603 TigerSHARC cycles

    Lab3 timing:

    56000 cycles: Debug mode

    13000 cycles: unoptimized ASM

    4000 cycles: optimized ASM

    Worse than unoptimized assembly, but no hardware acceleration was used, so this is not that bad

    Hardware Acceleration

    Profiling tool in Eclipse can show how long each function takes

    If a function takes too long, it can be sped up by:

    Custom instructions

    Hardware acceleration

    Hardware acceleration takes the function and transforms it into FPGA circuitry
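
From C, a custom instruction is typically reached through the GCC built-ins for NIOS II, such as `__builtin_custom_inii` (int result, two int operands). The sketch below is only an illustration: the opcode index `CI_MUL_N`, the idea that it is wired to a multiply block, and the helper names are assumptions, and a plain C fallback is used when not compiling for NIOS II.

```c
#include <assert.h>

/* Hypothetical opcode index assigned to a custom multiply block;
   the real value depends on how the system was generated. */
#define CI_MUL_N 0

static inline int ci_mul(int a, int b)
{
#if defined(__nios2__)
    /* GCC built-in that issues a NIOS II custom instruction. */
    return __builtin_custom_inii(CI_MUL_N, a, b);
#else
    /* Host fallback so the sketch can be run without the hardware. */
    return a * b;
#endif
}

/* Inner loop of an FIR-style dot product routed through the custom op. */
int dot_product(const int *x, const int *y, int n)
{
    assert(x && y && n >= 0);
    int acc = 0;
    for (int i = 0; i < n; i++)
        acc += ci_mul(x[i], y[i]);
    return acc;
}
```

Keeping the built-in behind a small wrapper like `ci_mul` lets the same source be profiled on a host and on the target.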

    Hardware Acceleration

    Can be done using the C2H compiler from Altera

    Trades off logic size for speed-up

    Table 1. User Application Results Example

    Algorithm                      Speed Increase       System fMAX   System Resource
                                   (vs. Nios II CPU)    (MHz)         Increase (1)
    Autocorrelation                41.0x                115           124%
    Bit Allocation                 42.3x                110           152%
    Convolution Encoder            13.3x                95            133%
    Fast Fourier Transform (FFT)   15.0x                85            208%
    High Pass Filter               42.9x                110           181%
    Matrix Rotate                  73.6x                95            106%
    RGB to CMYK                    41.5x                120           84%
    RGB to YIQ                     39.9x                110           158%

    http://www.altera.com/products/ip/processors/nios2/tools/c2h/ni2-c2h.html

    Conclusion

    Soft processors such as the NIOS II offer another alternative in the embedded system scene.

    The NIOS II offers the advantage of added configurability and customization that blur the line between FPGAs and DSPs.
