Microblaze Performance Monitoring Engine
Alex Burns
ECE 631
April 25, 2005




Outline
Project description
Microblaze performance
Xilinx tools overview
Microblaze implementation
Performance monitoring
Stream processing engine
Implementation results


Description
Use a Virtex 2 Pro evaluation board
Connect a performance monitoring engine to relevant signals of the processor
Slide *
ISA supports branch delay slot
Decreases branch penalty from 2 clock cycles to 1
Instruction pre-fetch buffer continues to fetch during pipeline stalls
Slide *
1 cycle (if branch is not taken)
2 cycles (taken with delay slot)
3 cycles (taken without delay slot)
2 cycles (if register A = 0)
34 cycles
3 cycles
2 cycles
2 cycles
Because Microblaze performance is limited, software optimization offers great potential
Software algorithms need to be monitored for efficiency to get the most performance from a given logic area
Logic could be added to improve performance
The designer must decide whether this is necessary
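The branch latencies above can be turned into a rough cycle estimate. The sketch below is a minimal C model, assuming a loop whose only branch is a single conditional backward branch at the bottom (a hypothetical layout, not taken from the slides); `branch_cycles` and `loop_branch_cycles` are illustrative names.

```c
#include <stdbool.h>

/* Branch cost in clock cycles, from the latency list above:
 * 1 if not taken, 2 if taken with a delay slot, 3 if taken without one. */
unsigned branch_cycles(bool taken, bool delay_slot)
{
    if (!taken)
        return 1;
    return delay_slot ? 2 : 3;
}

/* Cycles spent on the backward branch of a loop that runs `iterations`
 * times (hypothetical layout: one branch at the bottom of the loop).
 * The branch is taken on every iteration except the last. */
unsigned long loop_branch_cycles(unsigned iterations, bool delay_slot)
{
    if (iterations == 0)
        return 0;
    return (unsigned long)(iterations - 1) * branch_cycles(true, delay_slot)
         + branch_cycles(false, delay_slot);
}
```

For 30 iterations, `loop_branch_cycles(30, true)` gives 29 * 2 + 1 = 59 cycles versus 29 * 3 + 1 = 88 without the delay slot: one cycle saved per taken branch.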
Slide *
Use the system description (*.mhs) to generate a netlist
Set up the project hierarchy with the Microblaze system as a submodule of a Xilinx Project Navigator entity
Create board support libraries for the given system description
Create HDL in Project Navigator to attach logic to the system description
Run synthesis, mapping, place and route, and bitstream generation in Project Navigator
Slide *
Return to Platform Studio to attach the compiled software to the bitstream
Load the bitstream into the FPGA
Connect the Xilinx Microprocessor Debugger (XMD) through the JTAG port and attach it to the configured Microblaze in the FPGA
Load the compiled software into instruction memory using XMD
Cross fingers
Run code
Project Goal 1
Implement a Microblaze
Synthesized all logic
Mapped to a Memec evaluation board containing a Virtex 2 Pro (XC2VP4)
Successfully ran a test program that exercises all I/O and memory and displays output through the UART
Slide *
Monitor the instruction-side memory bus for accesses
Count the accesses in a VHDL counter
Read the counter when the microbenchmark completes
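This counting scheme amounts to a counter that is cleared before the microbenchmark, incremented on every instruction-side access, and read at the end. A cycle-level C model follows; the `ibus_cycle` trace format and its `addr_strobe` field are hypothetical stand-ins for the real bus signals, and the model assumes the strobe is asserted for exactly one cycle per access.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-cycle snapshot of the instruction-side bus. Only an
 * address-strobe flag is modeled; real signal names depend on the bus. */
typedef struct {
    int addr_strobe;   /* nonzero on the cycle a new fetch starts */
} ibus_cycle;

/* Model of the VHDL access counter: starts at zero and adds one on every
 * clock cycle that sees the strobe (assumed one cycle per access). */
uint32_t count_ifetch_accesses(const ibus_cycle *trace, size_t ncycles)
{
    uint32_t counter = 0;            /* cleared before the microbenchmark */
    for (size_t i = 0; i < ncycles; i++)
        if (trace[i].addr_strobe)
            counter++;
    return counter;                  /* value read back on completion */
}
```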
Slide *
If the address is cacheable, look it up in the tag memory
If the tag matches and the valid bit is set, drive the ready signal (cache hit)
On a cache miss, the cache does not assert the ready signal and waits for the OPB to fetch the data from memory
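The hit/miss decision above can be modeled in software. This sketch assumes a direct-mapped cache with a hypothetical geometry (512 lines of 4 bytes); `tag_memory`, `cache_lookup`, and `cache_fill` are illustrative names, not Xilinx APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical geometry: direct-mapped, 512 lines of 4 bytes (2 KB). */
#define LINE_BYTES 4u
#define NUM_LINES  512u

typedef struct {
    bool     valid[NUM_LINES];
    uint32_t tag[NUM_LINES];
} tag_memory;

static uint32_t line_index(uint32_t addr) { return (addr / LINE_BYTES) % NUM_LINES; }
static uint32_t line_tag(uint32_t addr)   { return addr / (LINE_BYTES * NUM_LINES); }

/* Returns true when the cache would drive ready (hit). False means the
 * cache does not assert ready: either the address is not cacheable, or
 * it is a miss and the line must first be fetched over the OPB. */
bool cache_lookup(const tag_memory *tm, uint32_t addr, bool cacheable)
{
    if (!cacheable)
        return false;
    uint32_t i = line_index(addr);
    return tm->valid[i] && tm->tag[i] == line_tag(addr);
}

/* Called once the OPB fetch completes: mark the line valid with its tag. */
void cache_fill(tag_memory *tm, uint32_t addr)
{
    uint32_t i = line_index(addr);
    tm->valid[i] = true;
    tm->tag[i]   = line_tag(addr);
}
```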
Slide *
Trace Interface
The geniuses at Xilinx already thought of the need to monitor the performance of the configurable Microblaze
Slide *
New Project Goal 2
Use the provided performance monitoring tools to connect to a stream processing engine
Slide *
What is a Stream
A stream is a block of instructions stored consecutively in memory and executed without branches
for (i=0; i<30; i++)
a += c[i];
Runs at 115K baud
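Under one reading of this definition, a stream ends wherever the program counter stops advancing sequentially, that is, after a taken branch. The sketch below splits a trace of executed PCs into (start address, length) pairs, relying on Microblaze's 4-byte instructions; the trace itself and the names `stream` and `split_streams` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* A stream: start address and number of consecutively executed instructions. */
typedef struct {
    uint32_t start_pc;
    uint32_t length;
} stream;

/* Split a trace of executed PCs into streams. A new stream begins whenever
 * the next PC is not the sequential successor (pc + 4), i.e. after a taken
 * branch. Writes at most max_out streams and returns how many were written. */
size_t split_streams(const uint32_t *pcs, size_t n, stream *out, size_t max_out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        int starts_new = (i == 0) || (pcs[i] != pcs[i - 1] + 4);
        if (starts_new) {
            if (count == max_out)
                break;               /* output buffer full */
            out[count].start_pc = pcs[i];
            out[count].length = 0;
            count++;
        }
        out[count - 1].length++;     /* extend the current stream */
    }
    return count;
}
```

With a loop body at hypothetical addresses 0x100 to 0x10C executed twice and then falling through to 0x110, it reports two streams starting at 0x100: one of 4 instructions and one of 5.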
Slide *
The software can easily be changed to run any microbenchmark
Working Stream Processor
The Microblaze requires 26 block RAMs in the selected configuration
The stream processor requires 39 block RAMs because of its enormous hash table (4K x 49) and other FIFOs (8 x 48)
The available V2Pro FPGA contains only 28 block RAMs and no external RAM
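The block RAM shortfall can be checked with a little arithmetic. Each Virtex-II Pro block RAM holds 18 Kb and can be configured as 16K x 1, 8K x 2, 4K x 4, 2K x 9, 1K x 18, or 512 x 36. The sketch below finds the fewest blocks for a depth x width memory when all blocks share one aspect ratio; it is only an estimate that ignores mixed aspect ratios and how the tools actually pack memories, and `brams_needed` is an illustrative name.

```c
#include <stddef.h>

/* Virtex-II Pro block RAM aspect ratios (depth x width), 18 Kb per block;
 * parity bits are counted as data in the x9/x18/x36 modes. */
static const struct { unsigned depth, width; } bram_modes[] = {
    {16384, 1}, {8192, 2}, {4096, 4}, {2048, 9}, {1024, 18}, {512, 36},
};

static unsigned ceil_div(unsigned a, unsigned b) { return (a + b - 1) / b; }

/* Fewest block RAMs for a depth x width memory when every block uses the
 * same aspect ratio (blocks tiled in depth and in width). */
unsigned brams_needed(unsigned depth, unsigned width)
{
    unsigned best = 0;
    for (size_t i = 0; i < sizeof bram_modes / sizeof bram_modes[0]; i++) {
        unsigned n = ceil_div(depth, bram_modes[i].depth)
                   * ceil_div(width, bram_modes[i].width);
        if (best == 0 || n < best)
            best = n;
    }
    return best;
}
```

`brams_needed(4096, 49)` returns 12 and `brams_needed(8, 48)` returns 2, so a hash table of this shape is a large share of the device on its own, and the combined request of 26 + 39 = 65 block RAMs is more than twice the 28 available.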
Slide *
Shrink the hash table within the FPGA by not storing the entire PC
Shrink the hash table further by shortening the maximum stream length that can be detected
Removed the cache from the Microblaze
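One way to avoid storing the entire PC is a partial tag: the low PC bits select a table entry and only a few of the remaining upper bits are kept, so different streams can alias to the same entry. The C sketch below illustrates that trade-off; the 4K-entry size matches the slide, but the 8-bit tag, the 16-bit count field, and all names are hypothetical, not the author's layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: 4K entries indexed by PC bits [13:2]; only TAG_BITS
 * of the upper PC bits are stored (a full 32-bit PC would need 18). */
#define INDEX_BITS 12u
#define TAG_BITS   8u

typedef struct {
    bool     valid[1u << INDEX_BITS];
    uint8_t  tag[1u << INDEX_BITS];
    uint16_t count[1u << INDEX_BITS];   /* wraps on overflow; saturation omitted */
} stream_table;

uint32_t entry_index(uint32_t pc) { return (pc >> 2) & ((1u << INDEX_BITS) - 1); }
uint32_t partial_tag(uint32_t pc) { return (pc >> (2 + INDEX_BITS)) & ((1u << TAG_BITS) - 1); }

/* Record one occurrence of a stream starting at pc. Returns true if the
 * entry already matched (possibly a false match caused by the partial tag);
 * otherwise the entry is replaced and false is returned. */
bool record_stream(stream_table *t, uint32_t pc)
{
    uint32_t i = entry_index(pc);
    uint8_t tag = (uint8_t)partial_tag(pc);
    if (t->valid[i] && t->tag[i] == tag) {
        t->count[i]++;
        return true;
    }
    t->valid[i] = true;
    t->tag[i] = tag;
    t->count[i] = 1;
    return false;
}
```

Recording 0x100 and then 0x400100, which differs only in the dropped upper bits, increments the same counter: a false match that a full-PC tag would have caught.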
Slide *
The Microblaze runs at 100 MHz
The UART runs at 115K baud, and ASCII data requires 10 bits per character (including start and stop bits)
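The bandwidth limit follows directly from these numbers. Assuming the standard 115,200 baud rate, the sketch below computes how many characters per second the UART can carry and how many processor cycles pass per character; the function names are illustrative.

```c
/* Characters per second a UART can carry when each ASCII character costs
 * 10 bit times (start bit, 8 data bits, stop bit). */
double uart_chars_per_second(double baud)
{
    return baud / 10.0;
}

/* Processor clock cycles that elapse while one character is transmitted. */
double cycles_per_char(double cpu_hz, double baud)
{
    return cpu_hz / uart_chars_per_second(baud);
}
```

At 100 MHz this gives 11,520 characters per second, or roughly 8,680 processor cycles per character, so on average the stream processor can emit at most one ASCII character every ~8,680 cycles before its output backs up. Halving the processor clock doubles that budget.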
Slide *
Use the FPGA Digital Clock Manager (DCM) to decrease the clock speed
The processor clock slows while the UART stays at maximum speed, which decreases the bandwidth required out of the stream processor
Didn’t work
Slide *
Future Work
Use a much higher-bandwidth communication link to send data out of the stream processor
IDT FIFO, external memory, USB, Ethernet
Use an FPGA with more block RAMs to avoid the performance hit in the stream hash table
Slide *