9. High Level Programming for FPGA Based Image and Video Processing Using Hardware Skeletons

8/3/2019 9. High Level Programming for FPGA Based Image and Video Processing Using Hardware Skeletons

1/8

High Level Programming for FPGA Based Image and Video Processing

using Hardware Skeletons

K Benkrid, D Crookes

, J Smith

and A Benkrid

School of Computer Science, The Queens University of Belfast, Belfast BT7 1NN, UK

VisiCom division of Titan corp., 10052 Mesa Ridge Court San Diego, CA, 92121 USA

Abstract

In this paper, we present a new approach to developing a

general framework for efficient FPGA based Image

Processing algorithms. This approach is based on the

new concept of Hardware Skeletons. A hardware

skeleton is a parameterised description of a task-specific

architecture, to which the user can supply parameters

such as values, functions or even other skeletons. Askeleton contains built-in rules that will apply

optimisations specific to the target hardware at the

implementation phase. The framework contains a library

of reusable skeletons for a range of common Image

Processing operations. The library also contains high

level skeletons for common combinations of basic image

operations. Given a complete algorithm description in

terms of skeletons, an efficient hardware configuration is

generated automatically. We have developed a library of

hardware skeletons for common image processing tasks,

with optimised implementations specifically for Xilinx

XC4000 FPGAs. This paper presents and illustrates our

hardware skeleton approach in the context of some

common image processing tasks, based on an

implementation on VISICOMs VigraVisionTM FPGA

based video board.

1. Introduction

Many modern Image Processing (IP) applications (such

as processing video and very large images) are so

computationally demanding that special purpose

hardware solutions need to be considered.

Reconfigurable hardware in the form of FPGAs can offer

the performance advantages of a custom hardware

solution, while their inherent reprogrammability feature

makes them multi-purpose and reusable. However, a big

disadvantage is the low level, hardware-orientedprogramming model needed to fully exploit the FPGAs

potential performance.

Despite the great amount of research done on

FPGAs, many FPGA-based applications have been

algorithm specific. An environment for developing

applications needs more than just a library of static

FPGA configurations, perhaps parameterisable (e.g. in

terms of input data wordlength), since it should allow

the user to experiment with alternative algorithms and

develop his/her own algorithms. There is a need for

bridging the gap between high level application-oriented

software and low level FPGA hardware. Many

behavioural synthesis tools have been developed to

satisfy this requirement [1][2][3]. These tools allow the

user to program FPGAs at a high level (e.g. in a C-like

syntax) without having to deal with low level hardware

details (e.g. scheduling, allocation, pipelining etc.).

However, although behavioural synthesis tools have

developed enormously, structural design techniques

often still result in circuits that are substantially smaller

and faster than those developed using only behavioural

synthesis tools [4][5].

This paper presents a framework for developing

efficient hardware solutions specifically for image

processing applications. This framework gives the

benefits of an application-oriented, high level

programming model, but does not sacrifice significantly

the performance of the solution. Our approach to this is

to use a concept which has proved relatively successful

in developing software for parallel machines, namely

skeletons [6][7][8]. Skeletons are reusable,parameterised fragments or frameworks to which the

user can supply components (e.g. functions). It is

common for skeletons to include functions as

parameters which are applied by the skeleton to a data

set. The implementation of a skeleton is normally

optimised for a specific target machine.

In this paper we introduce the concept of hardware

skeletons. A hardware skeleton is a parameterised

description of a task-specific architecture, to which the

user can supply parameters such as values, functions

(parameterised functional blocks) or even other

skeletons. Certain combinations of basic skeletons can

form the basis of additional, higher level skeletons. To

present the concept, the paper first identifies a usefulhigh level model for describing image processing

operations. The common basic tasks, which we identify,

will form the basis of a Hardware Skeleton Library.

Next, we outline the strategy which the system employs

to generate efficient FPGA configurations from a given

operation description. A layered implementation of the

hardware skeleton library is then presented. Finally, the

Proceedings of the 9

th

Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM01)0-7695-2667-5 $20.00 2001 IEEE


2/8

implementation of our system on a commercial FPGA

video board is presented.

2. A high level model for IP operations

Many image processing algorithm can be described in

terms of a Directed Acyclic Graph (DAG), where

vertices represent IP tasks, and the directed edges

represent the data flow.

IP tasks

Input2 (e.g. image)Input1 (e.g. image)

Output (e.g. image, histogram etc.)

Figure 1. A hypothetical image processing algorithm

modelled as a DAG

Nodes are typically simple tasks such as adding two

input images, or an image convolution. Common IP tasks

can be classified in terms of the locality of their data

access requirements into three categories:

Point operations: The same operation is applied to

each individual pixel of one or many source images to

produce a corresponding result pixel in the new image.

These include: relational operations (e.g. , , =),

arithmetic operations (e.g. +, -, *, ), logicaloperations (e.g. AND, OR) and Look-Up tables.

The operation could either be between two images or

between an image and a scalar value.

Neighbourhood operations: In neighbourhood

operations, a new pixel value is calculated using only

the pixel values in the neighbourhood of the original

pixel and the weights in a window (e.g. convolution).

This is done for all image pixels, and results in a new

image. Neighbourhood operations are completely

defined by a local operation between corresponding

pixels and window values (e.g. multiplication), a

global operation (e.g. accumulation) which reduces the

window of intermediate results to a single result pixel,

and a window (with given shape and coefficients) [9].

Global operations: These operations operate globally

on the whole image. We can distinguish two common

types of simple global operations:

- Reduction to Scalar (RS): These operate on the

whole image to produce a scalar as a result.

Examples include count, global maximum, global

minimum and global accumulation ().

- Reduction to Vector (RV): This operation operates

on the whole image to produce a vector as a result.

Examples include histogramming and cumulative

histogramming.

The properties of an item of data (represented by an

edge in the DAG) are of two kinds:

Data typeThis is defined by two properties:

- Structure: could be an image, a vector or a scalar.- Pixel type: which, for the purpose of this work,

could be either an integer or a boolean.

Data representationA data representation is defined by three properties:

- The data could be in bit serial, or in bit parallel

with an associated word size or, in digit serial

representation, with a particular digit and word

sizes.

- If data is in bit serial (or digit serial), it can then be

processed either MSB (or MSD) First or LSB (or

LSD) First.

- Number System which, for the purpose of this work,

could be one ofunsigned integer, 2s complement,

or Signed Digit (SD) number representation [10].Note that Binary representation corresponds to bit

parallel with a word size one (denoted asparallel(1)).

A node with a particular set of logical Inputs/Outputs

could be implemented by a range of different possible

implementations as illustrated, for example, for the

Absolute value operation in Figure 2. It is normal (but

not compulsory) for the input and output representations

to be the same.

(d)

Bit Parallel

2s complement

Bit Parallel

2s complement

Absolute valueAbsolute value

Int

Int

(a) (b)

Bit Serial

SD, MSDF

Bit Serial

SD, MSDF

(c)

Bit Serial 2s

complement, MSBF

Bit Serial 2s

complement, MSBF

Absolute value Absolute value

Figure 2. A DAG node (a) with several possible

implementations (b), (c) and (d)

The Hardware Skeleton Library contains parameterised

descriptions of architectures not only for the full range

of basic operations (nodes), but possibly with different

versions for different data representation combinations.

3. Implementation strategy

The users first task will be to represent the algorithm in

terms of a DAG, without initially being concerned with

data type or data representation considerations. Once

this is done, an analysis of the properties of the input

and output data formats of the nodes will identify a

range of possible implementations of each node. For


th



3/8

instance, the result of an N-bit integer image comparison

operation could be either an N-bit integer image or a (1-

bit) binary image. The choice will depend on subsequent

processing of the result image, and on what skeletons are

available. As a first step, the set of all possible

implementations should first be considered by the user.

The library of Hardware Skeletons (e.g. neighbourhood

operations, point operations, etc.), in which eachcomponent has a set of different implementations (e.g.

bit serial, bit parallel), is the basis of this phase. The

implementations of the library components are optimised

for specific target architectures (e.g. bit parallel adder

units based on dedicated fast carry logic on Xilinx 4000).

The range of possible implementations generated for a

particular algorithm depends on the extent of this library.

To select the optimum skeleton from the set of

possible choices, the cost of each choice needs to be

found, in terms of speed and area. This involves

estimating the expected performance or effectively

generating the FPGA configuration for each option,

including the application of the optimisations for each

skeleton. This cost based analysis enables the user to

settle on a final DAG with all attributes (data type and

representation) defined. The corresponding FPGA

implementation is finally generated, in the form of EDIF

netlist, for the chosen solution. This is performed by a

Prolog-based [11] Hardware Description Environment,

called HIDE4k, developed at Queens University Belfast

[5, 12, 13]. This enables highly scaleable and parameter-

ised component descriptions to be written, and generates

pre-placed configurations in EDIF format for Xilinx

XC4000 series [14]. The resulting EDIF file is finally fed

to Xilinx Placement and Routing (PAR) tools to generate

the FPGA configuration bitstream (see Figure 3).

Note that during the process of implementing a

DAG, the following issues arise:

Data representation conversionSince many data representations might be used within

the DAG, data representation converters may be

needed to convert between different representations

(e.g. from bit serial to bit parallel, or from Signed Digit

to twos complement etc.)

Data synchronisationWhen there are two or more inputs to a DAG node

(vertex), any branch that arrives earlier than the others

should be forced to wait for the slowest branches by

adding appropriate delays to the fastest branches. This

is performed automatically by our system so that the

user does not have to deal with low level data

synchronisation issues.

As a result, the users programming model is essentially

the set of hardware skeletons provided by the HardwareSkeleton Library. These skeletons can be accessed

either textually or, even more conveniently, via a GUI.

4. Implementing the Skeleton Library

We have implemented our Hardware Skeleton Library

as a hierarchy of three levels of hardware block

descriptions. At the bottom level is the arithmetic

library (see figure 4). This provides basic arithmetic

units (e.g. adders, multipliers) parameterised for

different number representations (e.g. bit serial, bit

parallel, 2s complement, unsigned etc.). Immediately

on top of this level, we find the basic image operations

library. The latter provides implementations for thebasic image operations presented in section 2 above

(e.g. basic neighbourhood operations). Finally, the top

level provides implementations for high level

(compound) skeletons.

Basic Image Operations Library(e.g. neighbourhood operations)

High Level (compound)Skeletons library

To Image Processing Application Developer

Arithmetic Cores Library

Figure 4. Hierarchical implementation of the Hardware

Skeleton Library

The following section considers each of these three

levels in more detail.

DAG withlogical data

types

Solutiongeneration

A DAG set ofavailable

implementations

Cost BasedAnalysis

DAG with specificdata representation

choices

Hardware SkeletonLibrary

Optimisation

A DAG set ofoptimised

implementations

CodeGenerator EDIF

HIDE4kSystem

Xilinx PARtools

Xilinx XC4000FPGA

Bitstream

Figure 3. Implementation strategy


th



4/8

4.1 Arithmetic cores library

This library provides the basic building blocksrequired for image processing operations (andsignal processing in general). It includesadders, multipliers, dividers, shifts and delays. The basic functions required for nearly any

signal processing operation includeaddition/subtraction, shifts and delays. Theseblocks can then be used to construct the morecomplicated structures such as multipliers,dividers and maximum/minimum selectors.

Versions of these cores are provided fordifferent number representations. At the timeof writing, the following numberrepresentations are supported:

Bit parallel (N bits) 2s complement -

Bit serial 2s complement MSBF

Bit serial 2s complement LSBF

Bit serial Signed Digit MSDF

The implementation of these cores is optimised for a

specific target architecture (Xilinx XC4000 FPGAs for

our particular case study). This should take advantage of

the particular features of the target architecture (e.g. 4

input LUTs, synchronous RAMs, dedicated fast carry

logic for XC4000). The core descriptions are held in

HIDE4k with rules for core-specific optimisations as part

of the core. For instance, a constant coefficient

multiplication will apply CSD coding of the multiplier

coefficient to reduce the consumed hardware [15][16].

Such optimisations are often not performed by

behavioural synthesis tools.

4.2 Basic image operations library

This library provides implementations of the basic image

operations presented in section 2.

Consider the case of basic neighbourhood

operations. As mentioned in section 2, a neighbourhood

operation is defined in terms of a local and global

operation. Local operations include multiplication and

addition. Global operations include accumulation,

maximum and minimum. These form the five basic

Image Algebra neighbourhood operations as shown in

Table 1 [9].

Neighbourhood Operation Local Op. GlobalOp.

Convolution * Multiplicative maximum * Max

Multiplicative minimum * Min

Additive maximum + Max

Additive minimum + Min

Table 1. Image Algebra core operation set

The architecture of a generic PxQ neighbourhoodoperation (with a local operation L and a global one G)

requires (Q-1) line buffers, PxQ replicated localoperation blocks, and a single PxQ-input globaloperation block, implemented as a tree of two-input

reduction blocks when bit parallel arithmetic is used, as

shown in figure 5.

Line

Buffer2

Line

BufferQ-1

Line

Buffer1

L

G

G

G

G

G

G

L

P

G

L

L

L

L

L

L

L

L

L

L

Pixel

buffers

Figure 5. A general 2D PxQ neighbourhood operation

This architecture is parameterisable or scaleable in

terms of [17]:

- The window size (PxQ)- The window coefficients

- The image size (line buffer size LB)

- The pixel wordlength

- The local and global operations (L and G)

- The number representation (arithmetic type)

A generic description of a neighbourhood operation

would then be given by:

neighbourhood_op(Arithmetic_type, Local_op,Global_op, Window, Pixel_wordlength, Image_Size)

Our HIDE4k system is capable of generating pre-placed

FPGA architectures in EDIF format from such generic

description. A ~30K line EDIF description is generated

in 1~2 sec. The resulting architectures are tailored to the

particular neighbourhood operation in hand (e.g.

specific window coefficients). Their performance (speed


th



5/8

and area) rivals those obtained with a careful hand design

[5].

A common skeleton used at this level is the reduce

skeleton which reduces a set of N inputs into one result

using a tree of 2-operands operations (Op) as shown in

Figure 6.

Op

Op

Op

Op

Op

Op

Op

Op

Op

Op

Figure 6. Reduce skeleton

At the time of writing, the supported 2-input reduction

operations (Op) are addition (+), subtraction (-),

maximum (max) and minimum (min).

4.3 High level (compound) skeletons library

This library contains efficient implementations of a set of

compound skeletons. These compound skeletons result

from the process of identifying, by experience, common

ways of assembling primitive operations and providing

optimised implementations of these. To demonstrate this

concept, we will present two examples of such

compound skeletons.

Pipeline skeletonIn this type of operations, two or more IP operations are

cascaded in a pipeline as shown in Figure 7. The input of

each pipeline stage is provided by the output of the

previous pipeline stage.

Operation 1 Operation 2 Operation N

Figure 7. Pipeline skeleton

This structure is described by:

Op_description = pipeline([Op1_desc, Op2_desc,, OpN_desc]

)

where OpI_desc {I = 1,2,, N} is the high level

description of each operation in the pipeline.

For instance, an Open operation (see Figure 8)

applied to 256x256 images of 8-bits/pixel, using 2s

complement bit parallel arithmetic, would be describedby:

Open = pipeline([neighbourhood(tc_par, add, min,

[[0,0,0],[0,0,0],[0,0,0]], 8, 256),neighbourhood(tc_par, add, max,[[0,0,0],[0,0,0],[0,0,0]], 8, 256)]

)

0 0 0

0 0 0

0 0 0

Additive Maximum

0 0 0

0 0 0

0 0 0

Additive Minimum

Figure 8. Open operation

Parallel skeletonA number of common image processing algorithms

comprise several concurrent neighbourhood operations

(simple or compound) which share the same input

image, and whose templates have the same size and

shape. The results of these parallel operations then used

in a reduce operation. Sobel, Prewitt, Roberts and

Kirsch edge detectors [18], are examples of such

operations.

Op

Op

Op

Par_desc2

Par_desc1

Parallel_op1

Parallel_opN

Parallel_opN-1

Parallel_op2

Figure 9. Parallel skeleton

The high level description (Par_desc) of this operationwould be defined as follows:

Par_desc = Op(Par_desc1, Par_desc2) (2)Par_desc1 and Par_desc2 are defined eitherrecursively as compound operations of the form (2)

itself, or as (terminal) pipeline skeletons of the form

(1). Note that the prefixes + and - can, for

readability, be written in infix form. For instance, a

Sobel operation (see Figure 10) will be described by:


th



6/8

pipeline([neighbourhood(tc_par, mult, accum,[[1,2,1],[0,0,0],[-1,-2,-1]], 8, Buf_size), abs])

+pipeline([neighbourhood(tc_par, mult, accum,[[1,0,-1],[2,0,-2],[1,0,-1]], 8, Buf_size), abs])

Absoluteoperation

-1 ~ 1

-2 ~ 2

-1 ~ 1

convolution

+

Absoluteoperation

1 2 1

~ ~ ~

-1 -2 -1

convolution

Figure 10. Sobel operation

In such operations, only one set of line and pixel buffers

is needed to synchronise the supply of pixels for all

parallel operations instead of allocating separate line

buffers for each neighbourhood operation. This is

because all neighbourhood operations are applied to the

same image.

Note that the skeletons (1) and (2) can be nested toany depth and interchangeably (FPGA area permitting).

5. Hardware implementation on a

commercial FPGA based video board

To assess the effectiveness of our skeleton-based

approach, we have implemented our system on a

commercial FPGA based video processing board. The

latter is a single slot PCI card which combines video

acquisition, FPGA based real-time processing, and

display [19]. A functional block diagram of the

VigraVision board is given by figure 11.

Imageacquisition

block

FPGA based

Video Processor

Imagedisplayblock

Camera

VigraVision Board

RGB

Monitor

Host PCI bus

DRAM

Figure 11. Block diagram of the VigraVision video

board

Bit parallel arithmetic has been chosen to implement theIP operations on the onboard FPGA (XC4013E-3). This

choice is motivated by the fact that bit parallel

architectures often lead to a better time-hardware product

than bit serial ones. This is mainly due to the existence of

dedicated fast carry logic on Xilinx FPGAs [5].

However, in the context of processing real time video,

the VigraVision board influences the choice of the

arithmetic. If bit serial arithmetic is to be used, there is a

need to generate a bit clock from the pixel clock. The bit

clock frequency is N times the pixels clock (for an

N-bit pixel). For practical real time video processing,

the luminance pixel sampling rate is 13.5 MHz. This

implies a bit clock frequency of 108 MHz for 8-bit

length pixel processing, and 216 MHz for 16-bit length

pixel processing. The XC4013E-3 cannot operate at

these frequencies. Thus the architectures used will beimplemented from bit parallel-based skeletons. Note

that a trade-off in the form of digit serial arithmetic is

still possible. However, this implies additional hardware

for the digit clock frequency generation, and extra care

for data synchronisation. A parallel implementation is

easier to implement and can be efficiently implemented

using dedicated fast carry logic [16].

Due to the limited memory resources on the FPGA

chip, the line buffers have been implemented using the

off-chip DRAM. Part of the FPGA is configured as an

interface to the onboard DRAM (FIFOs), while the

other part is configured to perform the required image

processing operation as shown below:

Line

buffers

(DRAM)

FPGA chip

Input video stream

Outputvideo

stream

DRAM

interface

Figure 12. Block diagram of the FPGA chip configuration

If the user wants to generate a complete configuration,

including all the low level hardware details, he or she

merely has to provide the required high leveldescription. The latter description must conform to the

format in (1) and (2) and can be input textually or even

graphically. Based on the skeleton library presented

above, our HIDE4k system is capable of generating the

corresponding efficient FPGA configuration in seconds

in the form of an EDIF netlist.

Due to the irregularity of the resulting architectures,

the generated EDIF netlist is only partially placed. Once

the EDIF description is generated, it is then fed to the

Xilinx PAR tools to generate the FPGA configuration

bitstream. This may take a long time (~1hr on a Pentium

233 running Windows 95 with 32M of RAM). This is

partly due to the fact that the EDIF netlist is only

partially placed. Another reason is the small area of thetarget FPGA (24x24 CLBs only). Nonetheless, thewhole process is transparent to the user.

At the application level, the user interfaces to the

VigraVision board through a C-callable library

(VigraVision ToolBox- VTB DLL). The ToolBox

includes hardware initialisation and register control

functions, image acquisition functions and image

processing functions. For instance, the application


th



7/8

developer downloads a particular FPGA configuration by

invoking the following function:

u_loadXilinx(Xilinx_Chip_ID, Configuration_

filename)

He or she can then copy the processed image from the

video buffer to the host processor, for further analysis,

by:

u_readRect(LPRECT lpImgRect, LPVOID imgData)

where lpImgRect specifies the rectangle in the frame

buffer where data is to be read from, and imgData is a

pointer to the image transferred to the host memory.

The Tool Box is supported in Microsoft Visual

C/C++ 5.0. Its functions can be accessed by application

software which has been linked with the VTB import

library and accessed via Windows as a DLL library. The

resulting video coprocessor (see Figure 13) is based on a

library of bitstream configurations ready to use

(download to the FPGA) from a high level language

(VC++ in our case). This library is extensible over time

using our HIDE4k system. Thanks to our skeleton

oriented approach, this task is relatively easy to performand requires little FPGA hardware knowledge. This has

been illustrated in this paper by two high level skeletons.

Other skeletons can be designed using a similar approach

and added to the library.

6. Conclusion

In this paper, we have presented a framework for FPGA

based Image Processing. Central to this framework is the

Hardware Skeleton Library which contains a set of high

level descriptions of task-specific architectures

specifically optimised for Xilinx XC4000 FPGAs.

Although extensible, the library is based on a core level

containing the operations of Image Algebra. The libraryalso contains high level skeletons for compound

operations, whose implementations include task-specific

optimisations. Skeletons are parameterisable, and

different skeletons for the same operation can be

provided, for instance for different arithmetic

representations. This gives the user a range of

implementation choices. This in turn supports

experimentation with different implementations and

choosing the most suitable one for the particular

constraints in hand (e.g. speed and area). We are

investigating the possibility of doing some of this

experimentation automatically, but for now we do it

manually. Given a complete algorithm description in

terms of skeletons, an efficient hardware configurationis generated automatically by our system.

Our approach was assessed successful by a real

hardware implementation of a video coprocessor on a

commercial FPGA based video board giving real time

processing of video data. This video coprocessor allows

for rapid generation of FPGA architectures from very

high level, algorithmic, descriptions and opens the way

to enabling image processing application developers to

exploit the high performance capability of a direct

hardware solution, while programming in an

application-oriented model.

Note that the skeleton oriented approach is not tied

to a particular FPGA chip. Moreover, it may have some

applicability for VLSI design. Furthermore, other

application domains where there is an established

algebra such as numerical processing can also benefit

from the skeleton approach.

Full system development will in practice inevitably

hit the problem that some particular task is not readily

expressed in terms of the skeletons currently in the

library. It will always be necessary to have an ongoing

process of skeleton development. This will of course

require a skilled architecture designer, although less

efficient solutions might be possible using existing

skeletons; but the advantage of our approach is that

system builders themselves do not require detailed

hardware description skills.

Future directions include upgrading the system to

handle other FPGA series (particularly Xilinx Virtex

chips). The extension of the hardware skeleton library,

both in supporting more arithmetic types and providing

other skeletons for more sophisticated image processing

operations (wavelet transform in particular), is being

investigated.

OR

High Level IP operationsdescriptionsExtensible bitstream

configurations library

HIDE4ksystem

Xilinx PARtools

EDIFNetlist

VC++ program

VigraVision PCI video board

text

Hardware Skeleton Library

VTB library

Figure 13. Overall view of the VigraVision based video coprocessor


th



8/8

7. References

[1] Synopsys Inc., Behavioural Compiler, Software

documentation, 1998.http://www.synopsys.com/products/beh_syn/

[2] C Level Design Inc, C/C++ Synthesis SystemCompiler, Product overview, 1998.

http://www.cleveldesign.com/products/[3] The Embedded Solutions Limited, Handel C

information sheets, 1999 http://www.embeddedsol.com

[4] Hutchings B, Bellows P, Hawkins J, Hemmert S, Nelson

B and Rytting M, A CAD suite for High-PerformanceFPGA design, FCCM99, Preliminary Proceedings.

[5] Benkrid K, Design and Implementation of a High Level

FPGA Based Coprocessor for Image and Video

Processing, PhD Thesis, Department of ComputerScience, The Queen's University of Belfast, 2000.

[6] Cole M, Algorithmic Skeletons: structured management

of parallel computation, MIT Press, 1989.

[7] Darlington J, Ghanem M, and To H W, 'StructuredParallel Programming', In Programming Models for

Massively Parallel Programming Computers, IEEE

Computer Society Press, pp. 160-169, Sept 1993.

[8] Michaelson G J, Scaife N R, and Wallace A M,'Prototyping parallel algorithms in Standard ML',

Proceedings of British Vision Conference, Sep 1995.

ftp://ftp.cee.hw.ac.uk/pub/funcprog/msw.bmvc95.ps.Z

[9] Ritter G X, Wilson J N and Davidson J L, ImageAlgebra: an overview, Computer Vision, Graphics and

Image Processing, No 49, pp 297-331, 1990.

[10] Avizienis A, Signed Digit Number Representation for

Fast Parallel Arithmetic, IRE Transactions on

Electronic Computer, Vol. 10, pp 389-400, 1961.

[11] Clocksin W F and Melish C S, Programming inProlog, Springer-Verlag, 1994

[12] Crookes D, Alotaibi K, Bouridane A, Donachy P and

Benkrid A, An Environment for Generating FPGA

Architectures for Image Algebra-based Algorithms,

ICIP98, Vol.3, pp. 990-994, 1998.[13] Benkrid K, Crookes D, Bouridane A, Corr P and

Alotaibi K, A High Level Software Environment for

FPGA Based Image Processing, Proc. IPA'99, IEESeventh International Conference on Image Processing

and its Applications, Manchester, July 1999. pp. 112-

116.

[14] Xilinx Ltd, XC4000E and XC4000X Series FieldProgrammable Gate Arrays -Product Specification,

1999. http://www.xilinx.com/partinfo/4000.pdf

[15] Koren I, Computer arithmetic algorithms, Prentice-

Hall, Inc, pp. 99-126, 1993.[16] Benkrid K, Crookes D, Smith J, Benkrid A, 'High Level

Programming for Real Time FPGA Based Video

Programming', Proc. ICASSP'2000, IEEE International

Conference on Acoustic, Speech and Signal Processing,

Istanbul, June 2000. Volume VI, pp. 3227-3231.[17] Crookes D, Benkrid K, Bouridane A, Alotaibi K and

Benkrid A, Design and Implementation of a High Level

Programming Environment for FPGA Based ImageProcessing, IEE proceedings Vision, Image and SignalProcessing, Vol. 147, No. 7, pp. 377-384.

[18] Ross J, The Image Processing Handbook, CRC Press,

1995.[19] Visicom Laboratories, The VigraVision PCI video

board: users manual, 1998. http://www.visicom.com


th


Documents

9. High Level Programming for FPGA Based Image and Video Processing Using Hardware Skeletons