Microblaze Performance Monitoring Engine
Use a Virtex 2 Pro evaluation board
Connect a performance monitoring engine to relevant signals of the processor
Slide *
ISA supports a branch delay slot
Decreases the taken-branch penalty from 2 clock cycles to 1
Instruction pre-fetch buffer continues to fetch during pipeline stalls
Slide *
1 cycle (if branch is not taken)
2 cycles (taken with delay slot)
3 cycles (taken without delay slot)
2 cycles (if register A = 0)
34 cycles
3 cycles
2 cycles
2 cycles
Because the processor's performance is limited, software optimization offers great potential
Software algorithms must be monitored for efficiency to get the most performance from a given logic area
Logic could be added to improve performance
The designer must decide whether this is necessary
Slide *
Use the system description (*.mhs) to generate a netlist
Set up the project hierarchy, since the Microblaze is a submodule of a Xilinx Project Navigator entity
Create board support libraries for the given system description
Create HDL in Project Navigator to attach logic to the system description
Perform synthesis, mapping, place and route, and bit file creation using Project Navigator
Slide *
Return to Platform Studio to attach the compiled software to the bitstream
Load the bitstream into the FPGA
Attach the Xilinx Microprocessor Debugger (XMD) through the JTAG port and connect it to the configured Microblaze within the FPGA
Load the compiled software into instruction memory using XMD
Cross fingers
Run code
Project Goal 1
Implement a Microblaze
Synthesized all logic
Mapped to a Memec evaluation board containing a Virtex 2 Pro (XC2VP4)
Successfully ran a test program that exercises all I/O and memory and displays output through the UART
Slide *
Monitor the instruction-side memory bus for accesses
Count the accesses in a VHDL counter
Read the counter upon completion of the micro-benchmark
Slide *
If the address is cacheable, look it up in the tag memory
If the tag matches and the valid bit is set, drive the ready signal (cache hit)
On a cache miss, the cache does not assert the ready signal and waits for the OPB to fetch the data from memory
Slide *
Trace Interface
The geniuses at Xilinx already thought of the need to monitor the performance of the configurable Microblaze
Slide *
New Project Goal 2
Use the provided performance monitoring tools to connect to the Stream Processing Engine
Slide *
What is a Stream?
A stream is a block of instructions stored consecutively in memory and executed sequentially, with no taken branch inside it
for (i = 0; i < 30; i++)
    a += c[i];
The UART link runs at 115K baud
Slide *
Status
Software can easily be changed to run any micro-benchmark
Working Stream Processor
Microblaze requires 26 block RAMs in the selected configuration
Stream Processor requires 39 block RAMs due to its enormous hash table (4K x 49) and other FIFOs (8 x 48)
The available V2Pro FPGA contains only 28 block RAMs and no external RAM
Slide *
Shrink the hash table within the FPGA by not storing the entire PC
Shrink the hash table further by shortening the maximum stream length to be detected
Removed the cache from the Microblaze
Slide *
Microblaze runs at 100 MHz
UART runs at 115K baud, and ASCII data requires 10 bits per character (8 data bits plus start and stop bits)
Slide *
Use the FPGA Digital Clock Manager (DCM) to decrease the processor clock speed
The processor clock slows while the UART stays at maximum speed, which decreases the bandwidth required out of the Stream Processor
Didn't work
Slide *
Future Work
Use a much higher-bandwidth communication link to send data out of the Stream Processor
IDT FIFO, external memory, USB, Ethernet
Use an FPGA with more available BRAMs to avoid the performance hit of shrinking the stream hash table
Slide *
Questions