High Level Programming for FPGA Based Image and Video Processing
using Hardware Skeletons
K Benkrid, D Crookes, J Smith and A Benkrid
School of Computer Science, The Queen's University of Belfast, Belfast BT7 1NN, UK
VisiCom division of Titan corp., 10052 Mesa Ridge Court, San Diego, CA 92121, USA
Abstract
In this paper, we present a new approach to developing a
general framework for efficient FPGA based Image
Processing algorithms. This approach is based on the
new concept of Hardware Skeletons. A hardware
skeleton is a parameterised description of a task-specific
architecture, to which the user can supply parameters
such as values, functions or even other skeletons. A
skeleton contains built-in rules that will apply
optimisations specific to the target hardware at the
implementation phase. The framework contains a library
of reusable skeletons for a range of common Image
Processing operations. The library also contains high
level skeletons for common combinations of basic image
operations. Given a complete algorithm description in
terms of skeletons, an efficient hardware configuration is
generated automatically. We have developed a library of
hardware skeletons for common image processing tasks,
with optimised implementations specifically for Xilinx
XC4000 FPGAs. This paper presents and illustrates our
hardware skeleton approach in the context of some
common image processing tasks, based on an
implementation on VISICOM's VigraVision™ FPGA
based video board.
1. Introduction
Many modern Image Processing (IP) applications (such
as processing video and very large images) are so
computationally demanding that special purpose
hardware solutions need to be considered.
Reconfigurable hardware in the form of FPGAs can offer
the performance advantages of a custom hardware
solution, while their inherent reprogrammability feature
makes them multi-purpose and reusable. However, a big
disadvantage is the low level, hardware-oriented
programming model needed to fully exploit the FPGA's
potential performance.
Despite the great amount of research done on
FPGAs, many FPGA-based applications have been
algorithm specific. An environment for developing
applications needs more than just a library of static
FPGA configurations, perhaps parameterisable (e.g. in
terms of input data wordlength), since it should allow
the user to experiment with alternative algorithms and
develop his/her own algorithms. There is a need for
bridging the gap between high level application-oriented
software and low level FPGA hardware. Many
behavioural synthesis tools have been developed to
satisfy this requirement [1][2][3]. These tools allow the
user to program FPGAs at a high level (e.g. in a C-like
syntax) without having to deal with low level hardware
details (e.g. scheduling, allocation, pipelining etc.).
However, although behavioural synthesis tools have
developed enormously, structural design techniques
often still result in circuits that are substantially smaller
and faster than those developed using only behavioural
synthesis tools [4][5].
This paper presents a framework for developing
efficient hardware solutions specifically for image
processing applications. This framework gives the
benefits of an application-oriented, high level
programming model, but does not significantly sacrifice
the performance of the solution. Our approach to this is
to use a concept which has proved relatively successful
in developing software for parallel machines, namely
skeletons [6][7][8]. Skeletons are reusable,
parameterised fragments or frameworks to which the
user can supply components (e.g. functions). It is
common for skeletons to include functions as
parameters which are applied by the skeleton to a data
set. The implementation of a skeleton is normally
optimised for a specific target machine.
In this paper we introduce the concept of hardware
skeletons. A hardware skeleton is a parameterised
description of a task-specific architecture, to which the
user can supply parameters such as values, functions
(parameterised functional blocks) or even other
skeletons. Certain combinations of basic skeletons can
form the basis of additional, higher level skeletons. To
present the concept, the paper first identifies a useful
high level model for describing image processing
operations. The common basic tasks, which we identify,
will form the basis of a Hardware Skeleton Library.
Next, we outline the strategy which the system employs
to generate efficient FPGA configurations from a given
operation description. A layered implementation of the
hardware skeleton library is then presented. Finally, the
implementation of our system on a commercial FPGA
video board is presented.
Proceedings of the 9th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM'01) 0-7695-2667-5 $20.00 2001 IEEE
2. A high level model for IP operations
Many image processing algorithms can be described in
terms of a Directed Acyclic Graph (DAG), where
vertices represent IP tasks, and the directed edges
represent the data flow.
Figure 1. A hypothetical image processing algorithm
modelled as a DAG (inputs, e.g. images, feed IP task nodes
that produce an output, e.g. an image or a histogram)
Nodes are typically simple tasks such as adding two
input images, or an image convolution. Common IP tasks
can be classified in terms of the locality of their data
access requirements into three categories:
Point operations: The same operation is applied to
each individual pixel of one or many source images to
produce a corresponding result pixel in the new image.
These include: relational operations (e.g. ≤, ≥, =),
arithmetic operations (e.g. +, -, *, ÷), logical
operations (e.g. AND, OR) and Look-Up tables.
The operation could either be between two images or
between an image and a scalar value.
Neighbourhood operations: In neighbourhood
operations, a new pixel value is calculated using only
the pixel values in the neighbourhood of the original
pixel and the weights in a window (e.g. convolution).
This is done for all image pixels, and results in a new
image. Neighbourhood operations are completely
defined by a local operation between corresponding
pixels and window values (e.g. multiplication), a
global operation (e.g. accumulation) which reduces the
window of intermediate results to a single result pixel,
and a window (with given shape and coefficients) [9].
Global operations: These operations operate globally
on the whole image. We can distinguish two common
types of simple global operations:
- Reduction to Scalar (RS): These operate on the
whole image to produce a scalar as a result.
Examples include count, global maximum, global
minimum and global accumulation (Σ).
- Reduction to Vector (RV): This operation operates
on the whole image to produce a vector as a result.
Examples include histogramming and cumulative
histogramming.
The properties of an item of data (represented by an
edge in the DAG) are of two kinds:
Data type
This is defined by two properties:
- Structure: could be an image, a vector or a scalar.
- Pixel type: which, for the purpose of this work,
could be either an integer or a boolean.
Data representation
A data representation is defined by three properties:
- The data could be in bit serial, or in bit parallel
with an associated word size, or in digit serial
representation with particular digit and word
sizes.
- If data is in bit serial (or digit serial), it can then be
processed either MSB (or MSD) First or LSB (or
LSD) First.
- Number System which, for the purpose of this work,
could be one of unsigned integer, 2's complement,
or Signed Digit (SD) number representation [10].
Note that Binary representation corresponds to bit
parallel with a word size of one (denoted as parallel(1)).
A node with a particular set of logical Inputs/Outputs
could be implemented by a range of different possible
implementations as illustrated, for example, for the
Absolute value operation in Figure 2. It is normal (but
not compulsory) for the input and output representations
to be the same.
Figure 2. A DAG node (a) with several possible
implementations (b), (c) and (d). The panels show a logical
Absolute value node with Int input and output, and
implementations using bit serial SD MSDF, bit serial 2's
complement MSBF, and bit parallel 2's complement data.
The Hardware Skeleton Library contains parameterised
descriptions of architectures for the full range
of basic operations (nodes), possibly with different
versions for different data representation combinations.
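The edge properties described in this section can be written down as a small data structure. The following Python sketch is only an illustration of that classification: every class, field and constant name is an assumption made here and is not part of HIDE4k or of the Hardware Skeleton Library.

```python
from dataclasses import dataclass

# All names below are hypothetical; the paper defines the properties,
# not this particular encoding.
STRUCTURES = ("image", "vector", "scalar")
PIXEL_TYPES = ("integer", "boolean")
NUMBER_SYSTEMS = ("unsigned", "twos_complement", "signed_digit")


@dataclass(frozen=True)
class DataType:
    structure: str   # one of STRUCTURES
    pixel_type: str  # one of PIXEL_TYPES

    def __post_init__(self):
        if self.structure not in STRUCTURES:
            raise ValueError(f"unknown structure: {self.structure}")
        if self.pixel_type not in PIXEL_TYPES:
            raise ValueError(f"unknown pixel type: {self.pixel_type}")


@dataclass(frozen=True)
class Representation:
    word_size: int                       # bits per value
    digit_size: int                      # bits transferred per clock cycle
    number_system: str                   # one of NUMBER_SYSTEMS
    most_significant_first: bool = True  # only relevant for serial forms

    def __post_init__(self):
        if not 1 <= self.digit_size <= self.word_size:
            raise ValueError("digit size must lie between 1 and the word size")
        if self.number_system not in NUMBER_SYSTEMS:
            raise ValueError(f"unknown number system: {self.number_system}")

    def kind(self):
        """Classify the representation as bit parallel, bit serial or digit serial."""
        if self.digit_size == self.word_size:
            return "bit_parallel"
        if self.digit_size == 1:
            return "bit_serial"
        return "digit_serial"


# A binary image is carried as parallel(1): bit parallel with a word size of one.
BINARY = Representation(word_size=1, digit_size=1, number_system="unsigned")
```

Treating bit serial and bit parallel as the two extremes of the digit size keeps parallel(1) a special case of bit parallel, as in the text.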
3. Implementation strategy
The user's first task will be to represent the algorithm in
terms of a DAG, without initially being concerned with
data type or data representation considerations. Once
this is done, an analysis of the properties of the input
and output data formats of the nodes will identify a
range of possible implementations of each node. For
instance, the result of an N-bit integer image comparison
operation could be either an N-bit integer image or a
(1-bit) binary image. The choice will depend on subsequent
processing of the result image, and on what skeletons are
available. As a first step, the set of all possible
implementations should be considered by the user.
The library of Hardware Skeletons (e.g. neighbourhood
operations, point operations, etc.), in which each
component has a set of different implementations (e.g.
bit serial, bit parallel), is the basis of this phase. The
implementations of the library components are optimised
for specific target architectures (e.g. bit parallel adder
units based on dedicated fast carry logic on Xilinx 4000).
The range of possible implementations generated for a
particular algorithm depends on the extent of this library.
To select the optimum skeleton from the set of
possible choices, the cost of each choice needs to be
found, in terms of speed and area. This involves
estimating the expected performance or effectively
generating the FPGA configuration for each option,
including the application of the optimisations for each
skeleton. This cost based analysis enables the user to
settle on a final DAG with all attributes (data type and
representation) defined. The corresponding FPGA
implementation is finally generated, in the form of an EDIF
netlist, for the chosen solution. This is performed by a
Prolog-based [11] Hardware Description Environment,
called HIDE4k, developed at Queen's University Belfast
[5, 12, 13]. This enables highly scaleable and parameterised
component descriptions to be written, and generates
pre-placed configurations in EDIF format for Xilinx
XC4000 series [14]. The resulting EDIF file is finally fed
to Xilinx Placement and Routing (PAR) tools to generate
the FPGA configuration bitstream (see Figure 3).
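The cost based analysis described above can be illustrated by a small Python sketch. It is not the procedure used by our system: the exhaustive enumeration, the area-time cost function, the node names and all the figures below are assumptions chosen purely for illustration, and the cost of any representation converters is ignored.

```python
from itertools import product


def select_implementations(candidates, cost):
    """Try every combination of per-node implementations and keep the cheapest."""
    nodes = list(candidates)
    best, best_cost = None, None
    for combo in product(*(candidates[node] for node in nodes)):
        assignment = dict(zip(nodes, combo))
        value = cost(assignment)
        if best_cost is None or value < best_cost:
            best, best_cost = assignment, value
    return best, best_cost


def area_time_product(assignment):
    # Assumed cost: total area times the cycles per pixel of the slowest node.
    area = sum(impl["area"] for impl in assignment.values())
    cycles = max(impl["cycles_per_pixel"] for impl in assignment.values())
    return area * cycles


# Made-up figures for illustration only; they are not measurements.
CANDIDATES = {
    "convolution": [
        {"name": "bit_parallel", "area": 200, "cycles_per_pixel": 1},
        {"name": "bit_serial", "area": 60, "cycles_per_pixel": 8},
    ],
    "absolute": [
        {"name": "bit_parallel", "area": 10, "cycles_per_pixel": 1},
        {"name": "bit_serial", "area": 4, "cycles_per_pixel": 8},
    ],
}

best, best_cost = select_implementations(CANDIDATES, area_time_product)
```

With these illustrative figures the all-parallel assignment wins. The number of combinations grows exponentially with the number of nodes, so a real tool would presumably prune the search.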
Note that during the process of implementing a
DAG, the following issues arise:
Data representation conversion
Since many data representations might be used within
the DAG, data representation converters may be
needed to convert between different representations
(e.g. from bit serial to bit parallel, or from Signed Digit
to two's complement etc.)
Data synchronisation
When there are two or more inputs to a DAG node
(vertex), any branch that arrives earlier than the others
should be forced to wait for the slowest branches by
adding appropriate delays to the fastest branches. This
is performed automatically by our system so that the
user does not have to deal with low level data
synchronisation issues.
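The data synchronisation rule can be sketched as a single pass over the DAG in topological order. The Python fragment below is a minimal illustration under stated assumptions, not the HIDE4k implementation: the function name, the node names and the latency figures (in clock cycles) are all hypothetical.

```python
def balance_delays(nodes, edges, latency):
    """Compute the delay to insert on each edge so that all inputs of a node align.

    nodes must be listed in topological order; edges is a list of (source, target)
    pairs; latency maps each node to its processing latency in clock cycles.
    """
    preds = {node: [] for node in nodes}
    for source, target in edges:
        preds[target].append(source)

    ready = {}   # cycle at which each node's output becomes available
    delays = {}  # extra delay (in cycles) to insert on each edge
    for node in nodes:
        start = max((ready[p] for p in preds[node]), default=0)
        for p in preds[node]:
            delays[(p, node)] = start - ready[p]  # early branches wait for the slowest
        ready[node] = start + latency[node]
    return ready, delays


# Hypothetical example: the input image feeds an adder both directly and
# through a convolution followed by an absolute value.
NODES = ["in", "conv", "abs", "add"]
EDGES = [("in", "conv"), ("conv", "abs"), ("in", "add"), ("abs", "add")]
LATENCY = {"in": 0, "conv": 5, "abs": 1, "add": 1}  # made-up cycle counts

ready, delays = balance_delays(NODES, EDGES, LATENCY)
```

Here the direct branch from the input to the adder must be delayed by six cycles, the combined latency of the convolution and absolute value nodes on the other branch.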
As a result, the user's programming model is essentially
the set of hardware skeletons provided by the Hardware
Skeleton Library. These skeletons can be accessed
either textually or, even more conveniently, via a GUI.
4. Implementing the Skeleton Library
We have implemented our Hardware Skeleton Library
as a hierarchy of three levels of hardware block
descriptions. At the bottom level is the arithmetic
library (see figure 4). This provides basic arithmetic
units (e.g. adders, multipliers) parameterised for
different number representations (e.g. bit serial, bit
parallel, 2's complement, unsigned etc.). Immediately
on top of this level, we find the basic image operations
library. The latter provides implementations for the
basic image operations presented in section 2 above
(e.g. basic neighbourhood operations). Finally, the top
level provides implementations for high level
(compound) skeletons.
Figure 4. Hierarchical implementation of the Hardware
Skeleton Library (from bottom to top: Arithmetic Cores
Library; Basic Image Operations Library, e.g. neighbourhood
operations; High Level (compound) Skeletons Library, which
is presented to the image processing application developer)
The following subsections consider each of these three
levels in more detail.
Figure 3. Implementation strategy (blocks shown: DAG with
logical data types; solution generation, drawing on the
Hardware Skeleton Library; a set of available implementations
of the DAG; optimisation; a set of optimised implementations
of the DAG; cost based analysis; DAG with specific data
representation choices; code generator (HIDE4k system);
EDIF; Xilinx PAR tools; bitstream; Xilinx XC4000 FPGA)
4.1 Arithmetic cores library
This library provides the basic building blocks
required for image processing operations (and
signal processing in general). It includes
adders, multipliers, dividers, shifts and delays.
The basic functions required for nearly any
signal processing operation include
addition/subtraction, shifts and delays. These
blocks can then be used to construct the more
complicated structures such as multipliers,
dividers and maximum/minimum selectors.
Versions of these cores are provided for
different number representations. At the time
of writing, the following number
representations are supported:
- Bit parallel (N bits) 2's complement
- Bit serial 2's complement MSBF
- Bit serial 2's complement LSBF
- Bit serial Signed Digit MSDF
The implementation of these cores is optimised for a
specific target architecture (Xilinx XC4000 FPGAs for
our particular case study). This should take advantage of
the particular features of the target architecture (e.g. 4
input LUTs, synchronous RAMs, dedicated fast carry
logic for XC4000). The core descriptions are held in
HIDE4k with rules for core-specific optimisations as part
of the core. For instance, a constant coefficient
multiplication will apply CSD coding of the multiplier
coefficient to reduce the consumed hardware [15][16].
Such optimisations are often not performed by
behavioural synthesis tools.
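To illustrate why CSD (Canonical Signed Digit) coding reduces hardware, the following Python sketch recodes a constant into digits drawn from {-1, 0, 1} such that no two adjacent digits are non-zero. In a shift-and-add constant multiplier, each non-zero digit costs one addition or subtraction of a shifted operand. The function name is an assumption made here, and the code is a software illustration rather than the optimisation rule held in HIDE4k.

```python
def csd_digits(n):
    """Return the CSD digits of a non-negative integer, least significant first."""
    if n < 0:
        raise ValueError("this sketch handles non-negative constants only")
    digits = []
    while n:
        if n & 1:
            d = 2 - (n & 3)  # +1 when n % 4 == 1, -1 when n % 4 == 3
            n -= d           # the remainder is now a multiple of 4
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def nonzero_count(digits):
    """Number of add/subtract terms implied by a digit string."""
    return sum(1 for d in digits if d != 0)


# 7 is 111 in binary (three terms) but 8 - 1 in CSD (two terms).
seven = csd_digits(7)  # [-1, 0, 0, 1], least significant digit first
```

For the coefficient 7, the plain binary form needs three shifted operands whereas the CSD form needs only two (8x - x).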
4.2 Basic image operations library
This library provides implementations of the basic image
operations presented in section 2.
Consider the case of basic neighbourhood
operations. As mentioned in section 2, a neighbourhood
operation is defined in terms of a local and global
operation. Local operations include multiplication and
addition. Global operations include accumulation,
maximum and minimum. These form the five basic
Image Algebra neighbourhood operations as shown in
Table 1 [9].
Neighbourhood Operation    Local Op.   Global Op.
Convolution                *           Σ
Multiplicative maximum     *           Max
Multiplicative minimum     *           Min
Additive maximum           +           Max
Additive minimum           +           Min
Table 1. Image Algebra core operation set
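The way a local and a global operation together define a neighbourhood operation can be shown with a short software model. The Python sketch below is an assumption-laden illustration, not a description of the hardware: the function and dictionary names are hypothetical, the window is applied without flipping, and only positions where the window fits entirely inside the image are computed.

```python
from functools import reduce
import operator


def neighbourhood_op(image, window, local_op, global_op):
    """Apply local_op between pixels and window values, then reduce with global_op."""
    q, p = len(window), len(window[0])  # window of Q rows and P columns
    rows, cols = len(image), len(image[0])
    result = []
    for r in range(rows - q + 1):
        out_row = []
        for c in range(cols - p + 1):
            partial = [local_op(image[r + i][c + j], window[i][j])
                       for i in range(q) for j in range(p)]
            out_row.append(reduce(global_op, partial))
        result.append(out_row)
    return result


# The five core operations as (local, global) pairs.
IMAGE_ALGEBRA_OPS = {
    "convolution": (operator.mul, operator.add),
    "multiplicative_maximum": (operator.mul, max),
    "multiplicative_minimum": (operator.mul, min),
    "additive_maximum": (operator.add, max),
    "additive_minimum": (operator.add, min),
}

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
weighted = neighbourhood_op(image, [[1, 0], [0, 1]], *IMAGE_ALGEBRA_OPS["convolution"])
dilated = neighbourhood_op(image, [[0, 0], [0, 0]], *IMAGE_ALGEBRA_OPS["additive_maximum"])
```

Swapping the pair of operators is all that is needed to move between the five operations, which is the property the parameterised hardware description exploits.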
The architecture of a generic PxQ neighbourhood
operation (with a local operation L and a global one G)
requires (Q-1) line buffers, PxQ replicated local
operation blocks, and a single PxQ-input global
operation block, implemented as a tree of two-input
reduction blocks when bit parallel arithmetic is used, as
shown in figure 5.
Figure 5. A general 2D PxQ neighbourhood operation (line
buffers 1 to Q-1 and pixel buffers supply the PxQ local
operation blocks L, whose outputs are reduced by a tree of
global operation blocks G)
This architecture is parameterisable or scaleable in
terms of [17]:
- The window size (PxQ)
- The window coefficients
- The image size (line buffer size LB)
- The pixel wordlength
- The local and global operations (L and G)
- The number representation (arithmetic type)
A generic description of a neighbourhood operation
would then be given by:
neighbourhood_op(Arithmetic_type, Local_op, Global_op, Window, Pixel_wordlength, Image_Size)
Our HIDE4k system is capable of generating pre-placed
FPGA architectures in EDIF format from such a generic
description. A ~30K line EDIF description is generated
in 1 to 2 seconds. The resulting architectures are tailored to the
particular neighbourhood operation in hand (e.g.
specific window coefficients). Their performance (speed
and area) rivals that obtained with a careful hand design
[5].
A common skeleton used at this level is the reduce
skeleton, which reduces a set of N inputs into one result
using a tree of two-operand operations (Op) as shown in
Figure 6.
Figure 6. Reduce skeleton (a tree of two-input Op blocks)
At the time of writing, the supported 2-input reduction
operations (Op) are addition (+), subtraction (-),
maximum (max) and minimum (min).
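The reduce skeleton can be modelled in software as a level-by-level pairwise reduction. The Python sketch below also reports the tree depth and the number of two-input blocks used, since these relate to latency and area. The function name is hypothetical, and the treatment of an unpaired input (passed unchanged to the next level, which in a pipelined tree would presumably require a delay element) is an assumption.

```python
import operator


def tree_reduce(op, values):
    """Reduce values with a tree of two-input op blocks.

    Returns (result, depth, blocks): the reduced value, the number of tree
    levels, and the number of two-input blocks used.
    """
    level = list(values)
    if not level:
        raise ValueError("at least one input is required")
    depth = blocks = 0
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        blocks += len(nxt)
        if len(level) % 2:
            nxt.append(level[-1])  # unpaired input passes to the next level
        level = nxt
        depth += 1
    return level[0], depth, blocks


# Nine inputs, as produced by the local operations of a 3x3 window.
total, depth, blocks = tree_reduce(operator.add, range(1, 10))
```

For the nine partial results of a 3x3 window, the tree uses eight two-input blocks arranged in four levels, compared with a chain of eight levels for a purely sequential reduction.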
4.3 High level (compound) skeletons library
This library contains efficient implementations of a set of
compound skeletons. These compound skeletons result
from the process of identifying, by experience, common
ways of assembling primitive operations and providing
optimised implementations of these. To demonstrate this
concept, we will present two examples of such
compound skeletons.
Pipeline skeleton
In this type of operation, two or more IP operations are
cascaded in a pipeline as shown in Figure 7. The input of
each pipeline stage is provided by the output of the
previous pipeline stage.
Figure 7. Pipeline skeleton (Operation 1, Operation 2, ..., Operation N in cascade)
This structure is described by:
Op_description = pipeline([Op1_desc, Op2_desc, ..., OpN_desc])    (1)
where OpI_desc {I = 1, 2, ..., N} is the high level
description of each operation in the pipeline.
For instance, an Open operation (see Figure 8)
applied to 256x256 images of 8-bits/pixel, using 2's
complement bit parallel arithmetic, would be described
by:
Open = pipeline([neighbourhood(tc_par, add, min, [[0,0,0],[0,0,0],[0,0,0]], 8, 256),
                 neighbourhood(tc_par, add, max, [[0,0,0],[0,0,0],[0,0,0]], 8, 256)])
Figure 8. Open operation (an additive minimum and an
additive maximum, each with a 3x3 window of zeros)
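The behaviour of the pipeline skeleton, and of the Open description in particular, can be mimicked in software by function composition. The Python sketch below is a behavioural model only: the names mirror the textual description but are otherwise hypothetical, the arithmetic type and wordlength parameters are omitted, and each stage only produces outputs where the window fits entirely inside its input, which is an assumption about border handling.

```python
from functools import reduce


def pipeline(stages):
    """Compose stages so that each one consumes the output of the previous one."""
    def run(image):
        return reduce(lambda data, stage: stage(data), stages, image)
    return run


def neighbourhood(local_op, global_op, window):
    """Build a stage applying the window wherever it fits inside the image."""
    q, p = len(window), len(window[0])

    def stage(image):
        return [
            [
                reduce(global_op, [local_op(image[r + i][c + j], window[i][j])
                                   for i in range(q) for j in range(p)])
                for c in range(len(image[0]) - p + 1)
            ]
            for r in range(len(image) - q + 1)
        ]
    return stage


def add(a, b):
    return a + b


ZEROS = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
# Open: additive minimum (erosion) followed by additive maximum (dilation).
open_op = pipeline([neighbourhood(add, min, ZEROS), neighbourhood(add, max, ZEROS)])

# A flat 5x5 image with one isolated bright pixel in the centre.
image = [[10] * 5 for _ in range(5)]
image[2][2] = 99
result = open_op(image)
```

The isolated bright pixel is removed by the additive minimum stage and is not restored by the additive maximum stage, which is the expected effect of an Open operation.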
Parallel skeleton
A number of common image processing algorithms
comprise several concurrent neighbourhood operations
(simple or compound) which share the same input
image, and whose templates have the same size and
shape. The results of these parallel operations are then used
in a reduce operation. Sobel, Prewitt, Roberts and
Kirsch edge detectors [18] are examples of such
operations.
Figure 9. Parallel skeleton (parallel operations
Parallel_op1 to Parallel_opN combined by a tree of Op
blocks, with sub-trees described by Par_desc1 and Par_desc2)
The high level description (Par_desc) of this operation
would be defined as follows:
Par_desc = Op(Par_desc1, Par_desc2)    (2)
Par_desc1 and Par_desc2 are defined either
recursively as compound operations of the form (2)
itself, or as (terminal) pipeline skeletons of the form
(1). Note that the prefix operators + and - can, for
readability, be written in infix form. For instance, a
Sobel operation (see Figure 10) will be described by:
pipeline([neighbourhood(tc_par, mult, accum, [[1,2,1],[0,0,0],[-1,-2,-1]], 8, Buf_size), abs])
+
pipeline([neighbourhood(tc_par, mult, accum, [[1,0,-1],[2,0,-2],[1,0,-1]], 8, Buf_size), abs])
Figure 10. Sobel operation (two 3x3 convolutions, each
followed by an absolute value operation, whose results
are combined by a + node)
In such operations, only one set of line and pixel buffers
is needed to synchronise the supply of pixels for all
parallel operations instead of allocating separate line
buffers for each neighbourhood operation. This is
because all neighbourhood operations are applied to the
same image.
Note that the skeletons (1) and (2) can be nested to
any depth and interchangeably (FPGA area permitting).
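A behavioural model of the parallel skeleton, using the Sobel description as the example, is sketched below in Python. Each window position is fetched once and handed to every branch, which mirrors the single shared set of line and pixel buffers. All function names are hypothetical, the windows are applied without flipping, and only positions where the window fits entirely inside the image are computed.

```python
import operator


def weighted_sum(window):
    """Branch computing the multiply-accumulate of a patch with a window."""
    def branch(patch):
        return sum(patch[i][j] * window[i][j]
                   for i in range(len(window)) for j in range(len(window[0])))
    return branch


def then_abs(branch):
    """Append an absolute value stage to a branch."""
    return lambda patch: abs(branch(patch))


def parallel(op, branches, q, p):
    """Apply every branch to the same q x p patch and combine the results with op."""
    def run(image):
        result = []
        for r in range(len(image) - q + 1):
            out_row = []
            for c in range(len(image[0]) - p + 1):
                # One fetch per window position, shared by all branches.
                patch = [image[r + i][c:c + p] for i in range(q)]
                values = [branch(patch) for branch in branches]
                combined = values[0]
                for value in values[1:]:
                    combined = op(combined, value)
                out_row.append(combined)
            result.append(out_row)
        return result
    return run


sobel = parallel(operator.add, [
    then_abs(weighted_sum([[1, 2, 1], [0, 0, 0], [-1, -2, -1]])),
    then_abs(weighted_sum([[1, 0, -1], [2, 0, -2], [1, 0, -1]])),
], 3, 3)

# A 4x4 image containing a vertical step edge.
edge_image = [[0, 0, 10, 10] for _ in range(4)]
edges = sobel(edge_image)
```

On the step edge image every output position straddles the edge, so the second window responds with magnitude 40 while the first window gives zero.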
5. Hardware implementation on a
commercial FPGA based video board
To assess the effectiveness of our skeleton-based
approach, we have implemented our system on a
commercial FPGA based video processing board. The
latter is a single slot PCI card which combines video
acquisition, FPGA based real-time processing, and
display [19]. A functional block diagram of the
VigraVision board is given by figure 11.
Figure 11. Block diagram of the VigraVision video
board (blocks shown: camera, image acquisition block,
FPGA based video processor, DRAM, image display block,
RGB monitor, host PCI bus)
Bit parallel arithmetic has been chosen to implement the
IP operations on the onboard FPGA (XC4013E-3). This
choice is motivated by the fact that bit parallel
architectures often lead to a better time-hardware product
than bit serial ones. This is mainly due to the existence of
dedicated fast carry logic on Xilinx FPGAs [5].
Moreover, in the context of processing real time video,
the VigraVision board influences the choice of the
arithmetic. If bit serial arithmetic is to be used, there is a
need to generate a bit clock from the pixel clock. The bit
clock frequency is N times the pixel clock frequency (for an
N-bit pixel). For practical real time video processing,
the luminance pixel sampling rate is 13.5 MHz. This
implies a bit clock frequency of 108 MHz for 8-bit
pixel processing, and 216 MHz for 16-bit
pixel processing. The XC4013E-3 cannot operate at
these frequencies. Thus the architectures used are
implemented from bit parallel-based skeletons. Note
that a trade-off in the form of digit serial arithmetic is
still possible. However, this implies additional hardware
for the digit clock frequency generation, and extra care
for data synchronisation. A parallel implementation is
simpler and can be implemented efficiently
using dedicated fast carry logic [16].
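The clock frequency argument above is simple arithmetic, reproduced in the Python sketch below. The function name is hypothetical, and the digit serial case rests on the assumption that a word of N bits is transferred as N/D digits of D bits each, so that the digit clock is N/D times the pixel clock.

```python
def serial_clock_mhz(pixel_clock_mhz, word_bits, digit_bits=1):
    """Clock needed to move one word per pixel period, digit_bits at a time."""
    if word_bits % digit_bits:
        raise ValueError("the word size is assumed to be a multiple of the digit size")
    return pixel_clock_mhz * (word_bits // digit_bits)


PIXEL_CLOCK_MHZ = 13.5  # luminance pixel sampling rate quoted in the text

bit_clock_8 = serial_clock_mhz(PIXEL_CLOCK_MHZ, 8)       # bit serial, 8-bit pixels
bit_clock_16 = serial_clock_mhz(PIXEL_CLOCK_MHZ, 16)     # bit serial, 16-bit pixels
digit_clock_8 = serial_clock_mhz(PIXEL_CLOCK_MHZ, 8, 4)  # 4-bit digits, 8-bit pixels
```

The first two values reproduce the 108 MHz and 216 MHz figures quoted above. Under the stated assumption, 4-bit digits would bring the 8-bit case down to 27 MHz, at the cost of the extra clock generation and synchronisation hardware already mentioned.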
Due to the limited memory resources on the FPGA
chip, the line buffers have been implemented using the
off-chip DRAM. Part of the FPGA is configured as an
interface to the onboard DRAM (FIFOs), while the
other part is configured to perform the required image
processing operation as shown below:
Figure 12. Block diagram of the FPGA chip configuration
(the input video stream enters the FPGA chip, which contains
the DRAM interface to the line buffers held in DRAM, and
leaves as the output video stream)
If the user wants to generate a complete configuration,
including all the low level hardware details, he or she
merely has to provide the required high level
description. The latter description must conform to the
format in (1) and (2) and can be input textually or even
graphically. Based on the skeleton library presented
above, our HIDE4k system is capable of generating the
corresponding efficient FPGA configuration in seconds
in the form of an EDIF netlist.
Due to the irregularity of the resulting architectures,
the generated EDIF netlist is only partially placed. Once
the EDIF description is generated, it is then fed to the
Xilinx PAR tools to generate the FPGA configuration
bitstream. This may take a long time (~1hr on a Pentium
233 running Windows 95 with 32 MB of RAM). This is
partly due to the fact that the EDIF netlist is only
partially placed. Another reason is the small area of the
target FPGA (24x24 CLBs only). Nonetheless, the
whole process is transparent to the user.
At the application level, the user interfaces to the
VigraVision board through a C-callable library
(VigraVision ToolBox- VTB DLL). The ToolBox
includes hardware initialisation and register control
functions, image acquisition functions and image
processing functions. For instance, the application
developer downloads a particular FPGA configuration by
invoking the following function:
u_loadXilinx(Xilinx_Chip_ID, Configuration_filename)
He or she can then copy the processed image from the
video buffer to the host processor, for further analysis,
by:
u_readRect(LPRECT lpImgRect, LPVOID imgData)
where lpImgRect specifies the rectangle in the frame
buffer where data is to be read from, and imgData is a
pointer to the image transferred to the host memory.
The Tool Box is supported in Microsoft Visual
C/C++ 5.0. Its functions can be accessed by application
software which has been linked with the VTB import
library and accessed via Windows as a DLL library. The
resulting video coprocessor (see Figure 13) is based on a
library of bitstream configurations ready to use
(download to the FPGA) from a high level language
(VC++ in our case). This library is extensible over time
using our HIDE4k system. Thanks to our skeleton
oriented approach, this task is relatively easy to perform
and requires little FPGA hardware knowledge. This has
been illustrated in this paper by two high level skeletons.
Other skeletons can be designed using a similar approach
and added to the library.
6. Conclusion
In this paper, we have presented a framework for FPGA
based Image Processing. Central to this framework is the
Hardware Skeleton Library which contains a set of high
level descriptions of task-specific architectures
specifically optimised for Xilinx XC4000 FPGAs.
Although extensible, the library is based on a core level
containing the operations of Image Algebra. The library
also contains high level skeletons for compound
operations, whose implementations include task-specific
optimisations. Skeletons are parameterisable, and
different skeletons for the same operation can be
provided, for instance for different arithmetic
representations. This gives the user a range of
implementation choices. This in turn supports
experimentation with different implementations and
choosing the most suitable one for the particular
constraints in hand (e.g. speed and area). We are
investigating the possibility of doing some of this
experimentation automatically, but for now we do it
manually. Given a complete algorithm description in
terms of skeletons, an efficient hardware configuration
is generated automatically by our system.
Our approach has been validated by a real
hardware implementation of a video coprocessor on a
commercial FPGA based video board, giving real time
processing of video data. This video coprocessor allows
for rapid generation of FPGA architectures from very
high level, algorithmic, descriptions and opens the way
to enabling image processing application developers to
exploit the high performance capability of a direct
hardware solution, while programming in an
application-oriented model.
Note that the skeleton oriented approach is not tied
to a particular FPGA chip. Moreover, it may have some
applicability for VLSI design. Furthermore, other
application domains where there is an established
algebra such as numerical processing can also benefit
from the skeleton approach.
Full system development will in practice inevitably
hit the problem that some particular task is not readily
expressed in terms of the skeletons currently in the
library. It will always be necessary to have an ongoing
process of skeleton development. This will of course
require a skilled architecture designer, although less
efficient solutions might be possible using existing
skeletons; but the advantage of our approach is that
system builders themselves do not require detailed
hardware description skills.
Future directions include upgrading the system to
handle other FPGA series (particularly Xilinx Virtex
chips). The extension of the hardware skeleton library,
both in supporting more arithmetic types and providing
other skeletons for more sophisticated image processing
operations (wavelet transform in particular), is being
investigated.
Figure 13. Overall view of the VigraVision based video coprocessor
(blocks shown: high level IP operations descriptions, entered as
text or otherwise; Hardware Skeleton Library; HIDE4k system; EDIF
netlist; Xilinx PAR tools; extensible bitstream configurations
library; VC++ program; VTB library; VigraVision PCI video board)
7. References
[1] Synopsys Inc., Behavioural Compiler, Software documentation, 1998. http://www.synopsys.com/products/beh_syn/
[2] C Level Design Inc, C/C++ Synthesis System Compiler, Product overview, 1998. http://www.cleveldesign.com/products/
[3] The Embedded Solutions Limited, Handel C information sheets, 1999. http://www.embeddedsol.com
[4] Hutchings B, Bellows P, Hawkins J, Hemmert S, Nelson B and Rytting M, A CAD suite for High-Performance FPGA design, FCCM'99, Preliminary Proceedings.
[5] Benkrid K, Design and Implementation of a High Level FPGA Based Coprocessor for Image and Video Processing, PhD Thesis, Department of Computer Science, The Queen's University of Belfast, 2000.
[6] Cole M, Algorithmic Skeletons: structured management of parallel computation, MIT Press, 1989.
[7] Darlington J, Ghanem M, and To H W, 'Structured Parallel Programming', In Programming Models for Massively Parallel Programming Computers, IEEE Computer Society Press, pp. 160-169, Sept 1993.
[8] Michaelson G J, Scaife N R, and Wallace A M, 'Prototyping parallel algorithms in Standard ML', Proceedings of British Vision Conference, Sep 1995. ftp://ftp.cee.hw.ac.uk/pub/funcprog/msw.bmvc95.ps.Z
[9] Ritter G X, Wilson J N and Davidson J L, Image Algebra: an overview, Computer Vision, Graphics and Image Processing, No 49, pp 297-331, 1990.
[10] Avizienis A, Signed Digit Number Representation for Fast Parallel Arithmetic, IRE Transactions on Electronic Computer, Vol. 10, pp 389-400, 1961.
[11] Clocksin W F and Mellish C S, Programming in Prolog, Springer-Verlag, 1994.
[12] Crookes D, Alotaibi K, Bouridane A, Donachy P and Benkrid A, An Environment for Generating FPGA Architectures for Image Algebra-based Algorithms, ICIP'98, Vol. 3, pp. 990-994, 1998.
[13] Benkrid K, Crookes D, Bouridane A, Corr P and Alotaibi K, A High Level Software Environment for FPGA Based Image Processing, Proc. IPA'99, IEE Seventh International Conference on Image Processing and its Applications, Manchester, July 1999, pp. 112-116.
[14] Xilinx Ltd, XC4000E and XC4000X Series Field Programmable Gate Arrays - Product Specification, 1999. http://www.xilinx.com/partinfo/4000.pdf
[15] Koren I, Computer arithmetic algorithms, Prentice-Hall, Inc, pp. 99-126, 1993.
[16] Benkrid K, Crookes D, Smith J, Benkrid A, 'High Level Programming for Real Time FPGA Based Video Programming', Proc. ICASSP'2000, IEEE International Conference on Acoustic, Speech and Signal Processing, Istanbul, June 2000, Volume VI, pp. 3227-3231.
[17] Crookes D, Benkrid K, Bouridane A, Alotaibi K and Benkrid A, Design and Implementation of a High Level Programming Environment for FPGA Based Image Processing, IEE Proceedings Vision, Image and Signal Processing, Vol. 147, No. 7, pp. 377-384.
[18] Russ J, The Image Processing Handbook, CRC Press, 1995.
[19] Visicom Laboratories, The VigraVision PCI video board: user's manual, 1998. http://www.visicom.com