
Parallel Video Processing Techniques for Surveillance Applications

Leonidas Deligiannidis

Wentworth Institute of Technology Computer Science

550 Huntington Av. Boston, MA, 02115 USA

[email protected]

Hamid R. Arabnia

University of Georgia Computer Science

415 GSRC Athens, GA, 30602 USA
[email protected]

2014 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, March 10-13, 2014. 978-1-4799-3010-4/14 $31.00 © 2014 IEEE. DOI 10.1109/CSCI.2014.38

Abstract

In this paper we present several solutions for security surveillance and their applications, utilizing parallel processing for improved performance. The algorithms presented in this paper are explained in detail and their implementations are provided for free for educational purposes. These algorithms and their implementations are taught in our Parallel Processing course using NVIDIA's CUDA language and framework. We chose the topic of security surveillance for the course because the results are visual and applicable in many situations, such as surveillance of parking lots (monitoring one's car, for example), dormitory rooms, offices, etc. Image processing techniques and algorithms are discussed in this paper and extended to real-time high definition video processing, with a focus on surveillance.

1. Introduction

One of the most important goals of security surveillance is to collect and disseminate real-time information and provide situational awareness to operators and security analysts [1]. Only then can educated decisions and future reasoning be made to prevent undesirable incidents [2]. The need for surveillance spans many domains, including commercial buildings, law enforcement, the military, banks, parking lots, and city settings, as well as hallways and entrances to buildings. Most of today's surveillance is used primarily as a forensic tool, to investigate what has already happened, not to prevent an incident. Detection of objects, people, people's faces [3], and cars, and of their patterns of movement, is necessary to enhance automated decision making and alarming. This is a difficult problem to solve [4]. One of the most common approaches is background subtraction [4][5][6], where each pixel's intensity value is compared against a reference and, if the change is above a threshold, the pixel is marked as motion-detected. In [2] the authors developed an object detection system that is based on pixel and region analysis and gives better results under sudden pixel intensity variations. Often, motion detection alone is not enough: the identity of the object or person that triggers the motion detector may need to be established, or the trajectory of the moving entity may need to be tracked [7][8][9].

2. GPUs and CUDA

Originally, graphics processors were used primarily to render images. In recent years, however, these Graphical Processing Units (GPUs) have also been used to solve problems involving massive data parallelism. GPUs have thus been transformed into General Purpose Graphical Processing Units (GPGPUs) and can be viewed as external devices that perform parallel computations for problems beyond graphics. Lately, GPUs have experienced tremendous growth, driven mainly by the gaming industry. GPUs are now programmable architectures consisting of several many-core processors capable of running hundreds of thousands of threads concurrently. NVIDIA Corporation provides an API that uses C/C++ to program their graphics cards. This API is called CUDA (Compute Unified Device Architecture) [10] and enables us to implement data-parallel algorithms with relative ease [11][12][13]. CUDA has been used in simulations [14][15][16], genetic algorithms [17], DNA sequence alignment [18], encryption systems [19][20], image processing [21-29], digital forensics [30], and other fields. CUDA is available for most operating systems and is free; it is, however, restricted to NVIDIA graphics cards. CUDA is a highly parallel computing platform and programming model that provides access to an elaborate GPU memory architecture and to parallel thread execution management. Each thread has a unique identifier, and each thread can perform the same operation on a different set of data. A CUDA program consists of two main components: the program that runs on the host computer, and functions (called kernels) that run on the GPU. Each kernel is executed as a batch of threads organized as a grid of thread blocks. The size of the grid and of the blocks is user configurable to fit the problem's requirements.
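As a point of reference, here is a minimal CUDA sketch (ours, not from the paper) showing how a kernel, its launch configuration, and the per-thread identifier fit together; the scale() kernel and all sizes are hypothetical:

#include <cuda_runtime.h>

// Each thread derives its unique identifier from its block and thread
// indices and applies the same operation to a different element (SIMT).
__global__ void scale(float *data, float factor, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x; // unique thread id
    if (tid < n)           // guard: we may launch more threads than elements
        data[tid] *= factor;
}

int main()
{
    const int n = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));
    // launch configuration: a 1D grid of 1D blocks sized to cover n elements
    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    scale<<<blocks, threadsPerBlock>>>(d_data, 2.0f, n);
    cudaDeviceSynchronize();
    cudaFree(d_data);
    return 0;
}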



These blocks can be configured as 1D, 2D, or 3D matrices. The architecture is built around a scalable array of multi-threaded streaming multiprocessors (SMs). CUDA uses a Single Instruction Multiple Thread (SIMT) architecture that enables us to write thread-level parallel code. CUDA also features several high-bandwidth memory spaces to meet the performance requirements of a program. For example, Global memory is accessible both by the host computer and by the GPU. Other memory types are accessible only by the kernels, reside on the chip, and provide much lower latency: a read-only Constant memory, Shared memory (which is private to each block of threads), a texture cache and, finally, a two-level cache that speeds up accesses to global memory. Coordination between threads within a kernel is achieved through synchronization barriers. However, as thread blocks run independently of all others, the scope of these barriers is limited to the threads within a thread block. CPU-based techniques can be used to synchronize multiple kernels.

Generally, in a CUDA program, data is copied from host memory to GPU memory across the PCI bus. Once in GPU memory, the data is processed by kernels (functions that run on the GPU), and upon completion of a task the data is copied back to host memory. Newer GPUs support host page-locked memory, where host memory can be accessed directly by kernels; this eliminates the time needed to copy data back and forth between host and GPU memory, but it reduces the memory available to the other applications running on the host computer. Additionally, for image generation and manipulation applications, we can use the interoperability of OpenGL with CUDA to further improve performance, because we can render an image directly on the graphics card and avoid copying the image data from host to GPU, and back, for each frame.
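A minimal sketch of this host-to-GPU round trip, assuming a hypothetical process() kernel and a frame of n bytes:

#include <cuda_runtime.h>

__global__ void process(unsigned char *pixels, int n); // hypothetical kernel

void process_frame(unsigned char *host_frame, int n)
{
    unsigned char *d_frame;
    cudaMalloc(&d_frame, n);
    // 1. copy the frame from host memory to GPU memory across the PCI bus
    cudaMemcpy(d_frame, host_frame, n, cudaMemcpyHostToDevice);
    // 2. process it on the GPU
    process<<<(n + 255) / 256, 256>>>(d_frame, n);
    // 3. copy the result back to host memory
    cudaMemcpy(host_frame, d_frame, n, cudaMemcpyDeviceToHost);
    cudaFree(d_frame);
}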

3. Parallel Algorithms for Image Processing

Spatial domain filtering (image processing and manipulation in the spatial domain) can be implemented in CUDA, where each pixel can be processed independently and in parallel. The spatial domain is the plane in which a digital image is defined by the spatial coordinates of its pixels. Another domain considered in image processing is the frequency domain, where a digital image is defined by its decomposition into the spatial frequencies participating in its formation. Many image processing operations, particularly spatial domain filtering, reduce to local neighborhood processing [31]. Let S_xy be the set of coordinates of a neighborhood (normally a 3x3 or 5x5 matrix) centered on an arbitrary pixel (x,y) of an image f.

Figure 1. Local neighborhood processing with a 3x3 filter of an input pixel (x,y). New pixel value is stored in the output image at the same coordinates.


Processing a local neighborhood generates a pixel (x,y) in the output image g. The intensity of the generated pixel is determined by a specific operation involving the neighborhood of the pixel at the same coordinates in the input image [32]. Spatial domain processing can be described by the following expression:

$$g(x,y) = T[f(x,y)]$$

where $f(x,y)$ is the intensity value of the pixel (x,y) of the input image, $g(x,y)$ is the intensity value of the pixel (x,y) of the output image, and $T$ is an operator defined on a local neighborhood of the pixel with coordinates (x,y), as shown in figure 1. Because an operation on a local neighborhood in the spatial domain processes only a limited area (typically a 3x3 or 5x5 matrix, also called the filter kernel) around every pixel of the input image, these operations can be carried out separately and independently for each pixel, which provides the opportunity to perform them in parallel. The linear spatial filtering of an image of size NxM with a filter of size mxn is defined by the expression:

$$g(x,y) = \sum_{s=-a}^{a} \sum_{t=-b}^{b} w(s,t)\, f(x+s,\, y+t)$$

where $f(x,y)$ is the input image, $w(s,t)$ are the coefficients of the filter, $a = (m-1)/2$, and $b = (n-1)/2$. The pseudo code of the algorithm is shown below. An image f of size NxM is processed, a filter w of size mxn is applied, and the result is placed in the output image g (see Pseudo code 1).

Pseudo code 1.
for x = 1 to N do
  for y = 1 to M do
    Out = 0
    for s = -a to a do
      for t = -b to b do
        // calculate the weighted sum
        Out = Out + f(x+s, y+t) * w(s,t)
    g(x,y) = Out

Normally the filter is small, ranging from 3x3 to 5x5. The image, however, can be thousands of pixels in each dimension, especially when working with high definition images and video. The algorithm above contains four nested loops, but only the outer two iterate many times; the inner two loops iterate only 9 times in total for a 3x3 filter. Thus, to speed up performance we need to parallelize the two outer loops first, before even considering parallelizing the two inner loops. We can visualize the entire image as a long one-dimensional array of pixels; that is how it is stored in memory anyway. We can now re-write the algorithm as follows (see Pseudo code 2):

Pseudo code 2.
foreach pixel p in f do
  x = p.getXcoordinates()
  y = p.getYcoordinates()
  Out = 0
  for s = -a to a do
    for t = -b to b do
      // calculate the weighted sum
      Out = Out + f(x+s, y+t) * w(s,t)
  g(x,y) = Out

The above algorithm collapses the two outer loops. Now, using CUDA's built-in thread and block identifiers, we can launch NxM threads, where each thread processes a single pixel, thus eliminating the two expensive outer loops. Note that the two inner loops are not computationally expensive and can be left alone. The revised parallel algorithm is shown in Pseudo code 3:

Pseudo code 3.
x = getPixel(threadID).getXcoordinates()
y = getPixel(threadID).getYcoordinates()
Out = 0
for s = -a to a do
  for t = -b to b do
    // calculate the weighted sum
    Out = Out + f(x+s, y+t) * w(s,t)
g(x,y) = Out
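In CUDA, the getPixel(threadID) step amounts to recovering the 2D pixel coordinates from the flattened 1D thread index. A sketch, for an image of width N stored row by row:

int tid = blockIdx.x * blockDim.x + threadIdx.x; // unique thread id
int x = tid % N;  // column
int y = tid / N;  // row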

The same CUDA function can be called for different filtering effects by passing a reference to a filter. Since the filters do not change, they can be placed in Constant memory for decreased access time. The CUDA kernel is shown below in Code 1. The filter is placed in Constant memory for increased performance. *ptr is a pointer to the input image and *result is a pointer to the output image. Since this kernel is invoked on RGB images, the filter is applied to all three colors, and the result is constrained to values from 0 to 255 using our T() function.
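Before the kernel is launched, the host must copy the filter coefficients into constant memory. A minimal sketch, assuming the filterConst symbol used by Code 1 and a maximum filter size of 5x5 (our assumption):

// constant memory declaration visible to the kernel (see Code 1)
__constant__ float filterConst[5 * 5];  // assumes filters up to 5x5

// host side: copy the filter coefficients once; they never change
void upload_filter(const float *filter, int width, int height)
{
    cudaMemcpyToSymbol(filterConst, filter, width * height * sizeof(float));
}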

4. Applications for Surveillance Using Parallel Video Processing

Based on the topics described earlier, we can implement several algorithms for practical surveillance applications. The algorithms we describe below can run serially, but for improved performance we run them in parallel. To implement and execute these algorithms we used a Lenovo W530 running Windows 8 Pro 64-bit, equipped with an [email protected] CPU and 8GB of memory. The GPU is an NVIDIA Quadro K1000M with 192 CUDA cores.


Code 1.
__global__ void Parallel_kernel(uchar4 *ptr, uchar4 *result,
                                const int filterWidth, const int filterHeight)
{
    // map from threadIdx/blockIdx to pixel position
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    int offset = x + y * blockDim.x * gridDim.x;
    if (offset >= N * M)
        return; // in case we launched more threads than we need

    int w = N;
    int h = M;
    float red = 0.0f, green = 0.0f, blue = 0.0f;

    // Multiply every value of the filter with the corresponding image pixel.
    // Note: filter dimensions are very small compared to the image dimensions.
    for (int filterY = 0; filterY < filterHeight; filterY++) {
        for (int filterX = 0; filterX < filterWidth; filterX++) {
            int imageX = ((offset % w) - filterWidth / 2 + filterX + w) % w;
            int imageY = ((offset / w) - filterHeight / 2 + filterY + h) % h;
            red   += ptr[imageX + imageY * w].x * filterConst[filterX + filterY * filterWidth];
            green += ptr[imageX + imageY * w].y * filterConst[filterX + filterY * filterWidth];
            blue  += ptr[imageX + imageY * w].z * filterConst[filterX + filterY * filterWidth];
        }
    }

    // truncate values smaller than zero and larger than 255, and store the result
    result[offset].x = T(int(red));
    result[offset].y = T(int(green));
    result[offset].z = T(int(blue));
}
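A plausible launch configuration for Code 1 on an NxM image; the 16x16 block size is our choice, and d_input and d_output are hypothetical device pointers. Note that Code 1 derives the row pitch from blockDim.x * gridDim.x, so this launch assumes N and M are multiples of 16:

dim3 threads(16, 16);
dim3 blocks(N / 16, M / 16); // assumes N and M are multiples of 16
Parallel_kernel<<<blocks, threads>>>(d_input, d_output, 3, 3); // e.g., a 3x3 filter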

For a camera, we used an Axis P13 series network camera attached to a Power over Ethernet 100Gbit switch. We wrote software that communicates with the camera over the HTTP protocol to instruct the camera when to begin and end the video feed, as well as to change the resolution of the feed. The camera transmits the video in Motion JPEG format. Each frame is delimited with special tags, and the header of each image frame contains the length of the data frame. We used the stb_image package (http://nothings.org/stb_image.c) to decode each JPEG image before passing the image frame to our processing algorithms.

4.1 Motion Detector

The first algorithm detects motion in the field of view of the camera. To implement it we keep track of the previous frame, to see whether the current and previous frames differ and by how much. We first calculate the squared difference of the previous and the current frame; this operation is done for every pixel. If this difference is above a set threshold, we fire a "Motion Detected" event, which can activate a sound alarm, etc., to get the attention of the operator. We can adjust the threshold value to make the detector more or less sensitive to pixel value changes. We can also display the pixels that triggered the alarm visually. The parallel algorithm is shown in Pseudo code 4:

Pseudo code 4.
//one thread for each pixel
foreach pixel p do
  color_diff = prev_color(p) - curr_color(p)
  // square of difference
  color_diff *= color_diff
  // update previous color
  prev_color(p) = curr_color(p)
  // empirically chosen value; can be adjusted to make
  // the detector more or less sensitive to changes
  threshold = 5000
  if( color_diff > threshold ) then
    fire "Motion Detected" event
    current_color(p) = RED
  else
    // leave pixel unchanged, make pixel gray, or anything you want
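A minimal CUDA rendering of Pseudo code 4 (our sketch: the squared difference is summed over the three color channels, and a device-side flag stands in for the "Motion Detected" event):

__global__ void motion_kernel(uchar4 *curr, uchar4 *prev, int n,
                              int threshold, int *motionFlag)
{
    // one thread for each pixel
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= n) return;

    // squared color difference between the previous and current frame
    int dx = (int)curr[p].x - (int)prev[p].x;
    int dy = (int)curr[p].y - (int)prev[p].y;
    int dz = (int)curr[p].z - (int)prev[p].z;
    int diff = dx * dx + dy * dy + dz * dz;

    prev[p] = curr[p]; // update previous frame

    if (diff > threshold) {                    // empirically chosen, e.g., 5000
        atomicExch(motionFlag, 1);             // host polls this flag
        curr[p] = make_uchar4(255, 0, 0, 255); // paint the triggering pixel red
    }
}

The host copies motionFlag back after the kernel completes and fires the "Motion Detected" event if it is set.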

4.2 Over a Line Motion Detector

Even though the Motion Detector algorithm presented earlier is simple and useful, we are often interested in only a section of the live video feed. This algorithm is applicable when, for example, there is a busy road on the left and a restricted area on the right of the field of view of the camera. A single line can divide the live video feed into two areas, and since a line is defined by only two points, an operator only needs to define these two points. Alarm events are generated only if there is motion above the user defined line. To implement this algorithm we use the dot product of two vectors.


There are two ways of calculating the dot product of two vectors:

$$\vec{A} \cdot \vec{B} = |\vec{A}|\,|\vec{B}|\cos\theta$$

$$\vec{A} \cdot \vec{B} = A_x B_x + A_y B_y + \cdots$$

We will use the second formula to calculate the dot product. To determine whether a pixel C with coordinates (Cx,Cy) is above the user specified line, which is defined by points A and B with coordinates (Ax,Ay) and (Bx,By) respectively, we proceed as illustrated in figure 2.

Figure 2. Using the dot product to determine whether a point is over or below a user specified line.

First we find the vector that is perpendicular to the line defined by points A and B; we call it vector D. We then find the vector E that is equal to the difference of vectors C and A. If the dot product of D and E is greater than 0, point C is above the line defined by points A and B; if it is equal to 0, point C is right on the line; otherwise point C is below the line. The parallel algorithm is shown in Pseudo code 5.

Pseudo code 5.
//one thread for each pixel
foreach pixel p do
  color_diff = prev_color(p) - curr_color(p)
  // square of difference
  color_diff *= color_diff
  // update previous color
  prev_color(p) = curr_color(p)
  // empirically chosen value; can be adjusted to make
  // the detector more or less sensitive to changes
  threshold = 5000
  Cx = p.getXcoordinates()
  Cy = p.getYcoordinates()
  // calculate dot product, see figure 2
  r = (Cx-Ax)*(Ay-By) + (Cy-Ay)*(Bx-Ax)
  if( r > 0 ) then // OVER THE LINE
    if( color_diff > threshold ) then
      fire "Motion Detected" event
      current_color(p) = RED
    else
      // over the line but below threshold; could make pixel transparent
  else // BELOW THE LINE
    // do nothing, we are not interested in this area

4.3 Line Crossing Detector

The Line Crossing Detector is a similar algorithm to the Over a Line Motion Detector: it also uses a line defined by two points and the dot product. However, where the dot product computed in the Over a Line Motion Detector is equal to zero, the pixel in question is located exactly on the line, and this algorithm detects motion on the line itself (see Pseudo code 6). Because the line can be very thin, depending on its slope, only a few pixels may end up exactly on the line.

Pseudo code 6.
//one thread for each pixel
foreach pixel p do
  color_diff = prev_color(p) - curr_color(p)
  // square of difference
  color_diff *= color_diff
  // update previous color
  prev_color(p) = curr_color(p)
  // empirically chosen value; can be adjusted to make
  // the detector more or less sensitive to changes
  threshold = 5000
  Cx = p.getXcoordinates()
  Cy = p.getYcoordinates()
  // calculate dot product, see figure 2
  r = (Cx-Ax)*(Ay-By) + (Cy-Ay)*(Bx-Ax)
  if( r >= -200 && r <= 200 ) then // ON THE LINE
    if( color_diff > threshold ) then
      fire "On Line Motion Detected" event
      current_color(p) = RED
    else
      // pixel on line, but no motion detected; could make pixel green to show line
  else // OVER or BELOW THE LINE
    // do nothing, we are not interested in these areas
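A CUDA sketch of Pseudo code 5, reusing the motion test from the sketch in section 4.1; replacing the r > 0.0f comparison with the banded test of Pseudo code 6 turns it into the Line Crossing Detector:

__global__ void over_line_kernel(uchar4 *curr, uchar4 *prev, int width, int n,
                                 float Ax, float Ay, float Bx, float By,
                                 int threshold, int *motionFlag)
{
    // one thread for each pixel
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= n) return;

    int dx = (int)curr[p].x - (int)prev[p].x;
    int dy = (int)curr[p].y - (int)prev[p].y;
    int dz = (int)curr[p].z - (int)prev[p].z;
    int diff = dx * dx + dy * dy + dz * dz;
    prev[p] = curr[p];

    float Cx = (float)(p % width);
    float Cy = (float)(p / width);
    // dot product of the perpendicular D = (Ay-By, Bx-Ax) with E = C - A (figure 2)
    float r = (Cx - Ax) * (Ay - By) + (Cy - Ay) * (Bx - Ax);

    if (r > 0.0f && diff > threshold) { // over the line AND motion detected
        atomicExch(motionFlag, 1);
        curr[p] = make_uchar4(255, 0, 0, 255);
    }
    // below the line: do nothing, we are not interested in this area
}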


Because only a few pixels fall exactly on the line, we make the line thicker by adjusting the limits of the dot product: instead of checking whether the dot product is exactly zero, Pseudo code 6 checks whether the dot product is between positive and negative n, where n is a user specified value (200 in the listing). One problem with this algorithm is that the thickness of the line depends on the distance between the points A and B that specify the line. To fix this, we need to calculate the distance (length) between the two points as shown below:

// find length of line
distX = Ax - Bx
distY = Ay - By
LEN = sqrt(distX*distX + distY*distY)
if( LEN equals 0 ) then LEN = 0.001

and then, in Pseudo code 6, replace

if( r >= -200 && r <= 200 ) then

with

X = 4.0
if( r/LEN >= -X && r/LEN <= X ) then

Now the width of the line does not depend on the distance between points A and B. The value of X specifies the thickness of the line, and it does not depend on the length of the vector $\overrightarrow{AB}$. This is very useful when the user wants to dynamically adjust the position and orientation of the line by manipulating points A and B. Figure 3 is a snapshot of our application showing motion detection above a user specified line, and also motion on the line, in real time on a video feed with resolution 1920x1080.

4.4 Area Motion Detector

We can modify the Over a Line Motion Detector algorithm so that, instead of specifying one line, we specify multiple lines. We can specify several detached areas, or one area defined by a polygon. This algorithm can easily be used to define an area of interest, or to exclude the defined area; the latter can be done by simply reversing the order of the points that define the area. We need to compute N dot products, where N is the number of lines, as sketched below.
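A sketch of the per-pixel area test for a convex polygon whose vertices are ordered consistently (convexity and vertex ordering are our assumptions; the paper only states that N dot products are needed):

// Returns true if pixel (Cx,Cy) lies inside a convex polygon whose N
// vertices vx[], vy[] are listed in a consistent (e.g., counter-clockwise)
// order: the point must fall on the same side of all N edges.
__device__ bool inside_area(float Cx, float Cy,
                            const float *vx, const float *vy, int N)
{
    for (int i = 0; i < N; i++) {
        int j = (i + 1) % N; // edge from vertex i to vertex j
        // the same perpendicular-vector dot product as in figure 2
        float r = (Cx - vx[i]) * (vy[i] - vy[j]) + (Cy - vy[i]) * (vx[j] - vx[i]);
        if (r < 0.0f) return false; // wrong side of this edge
    }
    return true;
}

Reversing the vertex order flips the sign of every r, which excludes the area instead of selecting it.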

Figure 3. Snapshot showing motion detected above a user specified line and on the line.

5. Conclusion

In this paper we presented several algorithms that can easily be implemented even by undergraduate students. These algorithms have many applications in security, and more specifically in security surveillance. Using CUDA, these algorithms can be executed in parallel for increased performance: because we can launch one thread per pixel, a high resolution image or video feed can be processed in almost the same amount of time as a low resolution one. When our students implemented these algorithms as programming assignments, we found that the biggest delay in the application was the network delivering the camera's video stream to the application.

Acknowledgments

We would like to thank NVIDIA Corporation for providing equipment and financial support through their CUDA Teaching Center initiative. We would also like to thank Axis Communications for donating more than 20 network enabled surveillance cameras. Their support is greatly appreciated.

References

[1] Luis Olsina, Alexander Dieser, and Guillermo Covella, "Metrics and Indicators as Key Organizational Assets for ICT Security Assessment", Emerging Trends in Computer Science & Applied Computing: Emerging Trends in ICT Security, eds. Babak Akhgar & Hamid R. Arabnia, ISBN 978-0-12-411474-6, Elsevier Inc., November 2013.

[2] Robert T. Collins, Alan J. Lipton, Hironobu Fujiyoshi, and Takeo Kanade, "Algorithms for Cooperative Multisensor Surveillance", invited paper, Proceedings of the IEEE, Vol. 89, No. 10, Oct. 2001, pp. 1456-1477.

[3] Lucas D. Introna and David Wood, "Picturing Algorithmic Surveillance: The Politics of Facial Recognition Systems", Surveillance & Society CCTV Special (eds. Norris, McCahill and Wood) 2(2/3): 177-198, 2004

[4] K. Toyama, J. Krumm, B. Brumitt, and B. Meyers, "Wallflower: Principles and practice of background maintenance". In Proc. of the Int. Conf. on Computer Vision, Corfu, Greece, 1999, pp. 255-261.

[5] I. Haritaoglu, D. Harwood, and L. S. Davis, "W4: Real-time surveillance of people and their activities," IEEE Transactions Pattern Anal. Mach. Intell., vol. 22, pp. 809-830, Aug. 2000.

[6] C. Stauffer and W. E. L. Grimson, "Learning patterns of activity using real-time tracking". IEEE Transactions Pattern Anal. Mach. Intell., vol. 22, pp. 747-757, Aug. 2000.

[7] Jiang X, Motai Y, Zhu X. “Predictive fuzzy logic controller for trajectory tracking of a mobile robot”. In: Proceedings of IEEE Mid-Summer Workshop on Soft Computing in Industrial Applications; 2005.

[8] Klancar G, Skrjanc I. “Predictive trajectory tracking control for mobile robots”, In: Proc of Power Electronics and Motion Control Conference; 2006. p. 373-78.

[9] Jianbing Ma, "Using Event Reasoning for Trajectory Tracking", Emerging Trends in Computer Science & Applied Computing: Emerging Trends in ICT Security, eds. Babak Akhgar & Hamid R. Arabnia, ISBN 978-0-12-411474-6, Elsevier Inc., November 2013.

[10] Nvidia Corporation's CUDA download page "https://developer.nvidia.com/cuda-downloads" Retrieved Nov. 26 2013

[11] A. V. Husselmann and K. A. Hawick. "Spatial agent-based modeling and simulations - a review". Technical Report CSTN-153, Computer Science, Massey University, Albany, North Shore, 102-904, Auckland, New Zealand, October 2011. In Proc. IIMS Postgraduate Student Conference, October 2011.

[12] A.V. Husselmann and K.A. Hawick. “Simulating species interactions and complex emergence in multiple flocks of birds with gpus”. In T. Gonzalez, editor, Proc. IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS 2011), pages 100–107, Dallas, USA, 14-16 Dec 2011. IASTED.

[13] Alwyn V. Husselmann and K. A. Hawick. “Parallel parametric optimization with firefly algorithms on graphical processing units”. In Proc. Int. Conf. on Genetic and Evolutionary Methods (GEM’12), number CSTN-141, pages 77–83, Las Vegas, USA, 16-19 July 2012.

[14] Janche Sang, Che-Rung Lee, Vernon Rego, and Chung-Ta King “A Fast Implementation of Parallel Discrete-Event Simulation” In Proc. of the Int. Conf. on Parallel and Distributed Processing Techniques and Applications (PDPTA) 2013

[15] V. J. Rego and V. S. Sunderam, “Experiments in Concurrent Stochastic Simulation: The Eclipse Paradigm,” Journal of Parallel and Distributed Computing, vol. 14(1), pp. 66–84, January 1992.

[16] M.G.B. Johnson, D.P. Playne and K.A. Hawick, “Data-Parallelism and GPUs for Lattice Gas Fluid Simulations”, In Proc. of the Int. Conf. on Parallel and Distributed Processing Techniques and Applications (PDPTA) 2010

[17] Vincent W. A. Tadaiesky, Ádamo L. de Santana, Lilian de J. C. Dias, Ivan de I. de Oliveira, Antonio F. L. Jacob Junior, and Fábio M. F. Lobato, “Runtime Performance Evaluation of GPU and CPU using a Genetic Algorithm Based on Neighborhood Model”. In Proc. of the Int. Conf. on Parallel and Distributed Processing Techniques and Applications (PDPTA) 2013

[18] Jintai He, Mengxia Zhu, and Michael Wainer, “Parallel Sequence Alignments Using Graphics Processing Unit”. In Proc. of the Int. Conf. on Bioinformatics and Computational Biology (BIOCOMP) 2009

[19] Zhu Wang, Josh Graham, Noura Ajam, and Hai Jiang, “Design and Optimization of Hybrid MD5-Blowfish Encryption on GPUs”. In Proc. of the Int. Conf. on Parallel and Distributed Processing Techniques and Applications (PDPTA) 2011

[20] Maksim Bobrov, Roy Melton, S. Radziszowski and Marcin Lukowiak, "Effects of GPU and CPU Loads on Performance of CUDA Applications", in Proc. of the Int. Conf. on Parallel and Distributed Processing Techniques and Applications (PDPTA'11), July 2011, Las Vegas, NV, Vol. II, pp. 575-581.

[21] Colantoni, P., Boukala, N., Da Rugna, J. “Fast and accurate color image processing using 3d graphics cards”. 8th International Fall Workshop: Vision Modeling and Visualization, 2003

[22] M. Ahmadvand, and A. Ezhdehakosh “GPU-Based Implementation of JPEG2000 Encoder”. In Proc. of the Int. Conf. on Parallel and Distributed Processing Techniques and Applications (PDPTA) 2012

[23] M. G. Sánchez, V. Vidal, J. Bataller and G. Verdú, “Performance Analysis on Several GPU Architectures of an Algorithm for Noise Removal”. In Proc. of the Int. Conf. on Parallel and Distributed Processing Techniques and Applications (PDPTA) 2012.

[24] Sánchez, M.G., Vidal, V., Bataller, J., Arnal, J., "A Fuzzy Metric in GPUs: Fast and Efficient Method for the Impulsive Image Noise Removal", in Proc. of the International Symposium on Computer and Information Sciences (ISCIS) 2011.

[25] Stone, S. S., Haldar, J. P., Tsao, S. C., Hwu, W. W., Liang, Z-P., Sutton, B. P., "Accelerating Advanced MRI Reconstructions on GPUs," in Proceedings of the 5th International Conference on Computing Frontiers, May 5-7, 2008.

[26] Xu, W., Mueller D., “Learning Effective Parameter Settings for Iterative CT Reconstruction Algorithms,” in Fully 3D Image Reconstruction in Radiology and Nuclear Medicine Conference, 2009.

[27] Li, L., Li, X., Tan, G., Chen, M., Zhang, P., "Experience of Parallelizing cryo-EM 3D Reconstruction on a CPU-GPU Heterogeneous System," in Proceedings of the 20th International Symposium on High Performance Distributed Computing (HPDC '11), 2011.

[28] Anderson R.F., Kirtzic J.S., Daescu, O., “Applying Parallel Design Techniques to Template Matching with GPUs”, in Springer-Verlag New York Inc., Volume: 6449, 2011.

[29] Sánchez, M.G., Vidal, V., Bataller, J., Arnal, J., "Implementing a GPU fuzzy filter for Impulsive Image Noise Correction", in Proc. of the International Conference on Computational and Mathematical Methods in Science and Engineering (CMMSE) 2010.

[30] Chung-han Chen and Fan Wu, "An Efficient Acceleration of Digital Forensics Search Using GPGPU", in Proc. of the 2013 International Conference on Security and Management (SAM), 2013.

[31] Rafael C. Gonzalez and Richard E. Woods, “Digital Image Processing (3rd Edition)”. Prentice Hall, ISBN 9780131687288, 2008

[32] Daniel Shiffman, "Learning Processing: A Beginner's Guide to Programming Images, Animation, and Interaction", Morgan Kaufmann Series in Computer Graphics, ISBN 9780123736024, 2008
