Accelerating Multi-Sensor Image Fusion using Graphics Hardware

SeungHun Yoo
Imaging Media Research Center
Korea Institute of Science and Technology
Seoul, Korea
[email protected]

JaeIn Hwang
Imaging Media Research Center
Korea Institute of Science and Technology
Seoul, Korea
[email protected]

Abstract— This paper presents approaches to accelerating pixel-level image fusion using graphics hardware. As sensor variety and sensing technology improve and the amount of collected information grows, not only the development of new fusion algorithms but also the speed of the fusion process has become increasingly important. Specialized fusion boards for real-time fusion processing already exist, but they have disadvantages such as high price and lack of scalability. The GPU (Graphics Processing Unit), which offers a good price/performance ratio, hardware programmability, and enormous computing power and speed, can take their place. Fifteen fusion methods were used in tests that give numerical comparisons of GPGPU (general-purpose GPU) and CUDA (NVIDIA's latest GPU architecture) implementations against traditional CPU-based implementations. The evaluation results show that GPU acceleration is much faster than CPU-based multi-threading.

Keywords— GPU; GPGPU; CUDA; image fusion; high-speed fusion

I. INTRODUCTION

Multi-sensor image fusion is the process of combining relevant information from two or more images into a single image, so that the fused image is more informative to human or machine perception than any of the input images. Many image fusion techniques have been developed for a wide variety of applications such as concealed weapon detection, remote sensing, intelligent robots, digital camera applications, medical diagnosis, and surveillance systems [1][2][3].

According to the stage at which the fusion takes place, image fusion algorithms can be placed at one of four levels of abstraction: signal level, pixel level, feature level, and symbolic level [1]. Among these, most current fusion applications adopt pixel-level fusion, since it preserves the original information of the source images and is easy to implement. To improve the quality of fused images at each level, many fusion algorithms and quantitative evaluation metrics have been proposed, which has advanced fusion performance over the past few years.

Recently, fusion speed has emerged as an important factor in the image fusion literature, as high-performance sensors have driven up image resolution and quality [4]. As a consequence, a substantial amount of memory and computing power is required for high-speed fusion, especially when EO/IR (electro-optical/infrared) aerial images received from an unmanned aircraft are of extremely high resolution. To circumvent these obstacles in a real-time system, two approaches are typically possible: hardware and software solutions. As a hardware solution, specialized image/video fusion boards have been developed for real-time image fusion. Such fusion hardware, however, is restricted in cost and memory, and its fusion method is hard to modify because the programmable chips are difficult to access. Software solutions, on the contrary, relieve these disadvantages but greatly reduce fusion speed. Ultimately, a new mechanism that removes the shortcomings of both approaches is required.

The GPU (Graphics Processing Unit) on commodity video cards has evolved into a powerful and extremely flexible streaming processor with fully programmable floating-point pipelines and tremendous aggregate computational power and memory bandwidth [5]. The power and flexibility of GPUs make them an attractive platform for computationally demanding tasks, not only for specific graphics computations but also for general-purpose computation (GPGPU) [6][9]. In most cases GPU-based systems execute many times faster than comparable CPU implementations, although a performance boost is not always guaranteed. Recently, NVIDIA released CUDA (Compute Unified Device Architecture) [6][8], which provides an extended version of ANSI C for general-purpose applications on its latest GPUs.

In this paper, we propose a software approach to image fusion on the programmable graphics hardware platform as a solution to the drawbacks of both the hardware and software approaches mentioned above. Most pixel-level image fusion operations map well onto GPU resources, since these operations are largely per-pixel independent. This paper is organized as follows. Section II briefly presents the characteristics of GPGPU and CUDA. Section III reviews existing pixel-level image fusion methods, and Section IV describes how to accelerate them using graphics hardware. Section V compares performance through speed measurements of the image fusion methods implemented on the CPU and GPU. Finally, Section VI concludes our work.

II. GRAPHICS PROCESSING UNITS

A. GPGPU

A modern GPU is designed to follow the graphics pipeline structure, although the performance and flexibility of each programmable stage vary with the specific implementation. Programmers can supply their own code for both the vertex and fragment processors by writing programs called shaders, in C-style shading languages such as Cg, HLSL, and GLSL (OpenGL Shading Language) [6][8].

A typical GPGPU program uses the fragment processor, which has the highest arithmetic rates in the GPU [6]. In addition, for efficient memory access and data processing, the GPU provides a feedback structure that enables reuse by sending the results of the fragment processor to texture memory instead of the framebuffer. This process, called render-to-texture (RTT), enables multi-pass computation. RTT feeds GPU output directly back to the input without a round trip to the CPU, which makes the computation more efficient. In exchange for these benefits, the fragment processor cannot use indirect memory addressing for write operations and cannot use the same memory as both input and output.

B. CUDA

CUDA has become a standard platform for GPGPU computing on NVIDIA graphics cards with a G80 chipset or later. To assist in creating a variety of GPU-based applications, CUDA provides the following main advantages:

• NVIDIA has hidden the architectures of its GPUs beneath an application programming interface (API), so programmers need not know the complex details of the GPU hardware.

• The CUDA programming interface provides a relatively simple path for users familiar with the C programming language to write programs for execution on the device.

• From a programming perspective, the GPU can gather data from any location in DRAM and also scatter data to any location in DRAM, just as a CPU can (see the sketch below).
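
As a minimal illustration of this last point, the CUDA sketch below gathers through an index table and scatters to computed addresses; classic fragment programs allow neither on writes. All names are illustrative.

```cuda
#include <cuda_runtime.h>

// Gather: each thread reads from an arbitrary DRAM location given by idx[].
__global__ void gather(const float* in, const int* idx, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[idx[i]];
}

// Scatter: each thread writes to an arbitrary DRAM location given by idx[].
// idx is assumed to be a permutation, so no two threads write the same slot.
__global__ void scatter(const float* in, const int* idx, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[idx[i]] = in[i];
}
```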

III. IMAGE FUSION ALGORITHMS

Many researchers have developed pixel-level image fusion algorithms concerned with how to extract visual information from the input images [2][3]. We classify conventional pixel-level fusion methods into four groups: adaptive weighted averaging (AWA) image fusion, color-based image fusion, pyramid-based image fusion, and wavelet-based image fusion. In this section, we briefly describe the pixel-level fusion methods from each group that we experimented with.

A. Adaptive weighted averaging (AWA) image fusion

The most straightforward way to build a fused image is a weighted average of the input images. AWA fusion methods weight two corresponding pixels (non)linearly, use the principal component obtained from the Karhunen-Loeve transform (PCA), or use the weight of a square window in the image (Salient-Regional) [2][3].
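
Since a linearly weighted average is per-pixel independent, it maps naturally onto one GPU thread per pixel. The following CUDA kernel is a minimal sketch of that idea, assuming the weight w has been determined in advance; names are illustrative.

```cuda
// AWA fusion sketch: F(x, y) = w * A(x, y) + (1 - w) * B(x, y).
__global__ void awaFuse(const float* a, const float* b, float* fused,
                        int width, int height, float w)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    int i = y * width + x;
    fused[i] = w * a[i] + (1.0f - w) * b[i];  // weighted average of the pair
}
```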

B. Color-based image fusion

Color-based fusion can be divided into two groups according to whether or not the source images are color. True-color fusion methods produce natural color results; IHS (intensity-hue-saturation) image fusion, for example, is based on the RGB true-color space.

False-color fusion methods, on the other hand, use the channels of an RGB space to hold the two gray-scale source images and a derived result. The RGB color fusion algorithm maps the source images to the red and green planes and obtains the final RGB-fused image by mapping the average of the two source images to the blue plane. The TNO method is a false-color mapping in which the 'unique' and 'common' components of the two source images are assigned to the RGB bands, and the MIT method is based on a color perception model of the human visual system [1][2].
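
The RGB color fusion mapping above is also a single per-pixel operation. A CUDA sketch, assuming two registered gray-scale inputs and an interleaved RGB output buffer (the layout and names are assumptions):

```cuda
// False-color fusion sketch: R <- A, G <- B, B <- (A + B) / 2.
__global__ void falseColorFuse(const float* a, const float* b,
                               float* rgb, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    int i = y * width + x;
    rgb[3 * i + 0] = a[i];                  // red plane   <- sensor A
    rgb[3 * i + 1] = b[i];                  // green plane <- sensor B
    rgb[3 * i + 2] = 0.5f * (a[i] + b[i]);  // blue plane  <- average image
}
```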

C. Pyramid-based image fusion

Pyramid-based fusion is based on the multi-scale decomposition of an image. Gaussian convolution is broadly used for the filtering step, and morphological filters also work well. The choice of filters and arithmetic operators makes small differences between pyramid-based methods, which are generally built from four operations: REDUCE, EXPAND, DIFFERENCE, and COMBINATION. In this paper, the Ratio pyramid, Contrast pyramid, Laplacian pyramid, FSD (filter-subtract-decimate) pyramid, Gradient pyramid, and Morphological pyramid methods are used for our evaluation [1][2][3].
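
For concreteness, the Laplacian pyramid instantiates these four operations as follows; this is the standard Burt-Adelson formulation rather than a detail specific to any one implementation here:

```latex
\begin{align*}
G_{k+1} &= \mathrm{REDUCE}(G_k), \qquad G_0 = \text{source image},\\
L_k     &= G_k - \mathrm{EXPAND}(G_{k+1}) \qquad \text{(DIFFERENCE)},\\
L_k^F   &= \mathrm{COMBINE}(L_k^A,\, L_k^B), \qquad
           G_N^F = \tfrac{1}{2}\left(G_N^A + G_N^B\right),\\
G_k^F   &= L_k^F + \mathrm{EXPAND}(G_{k+1}^F)
           \qquad \text{(reconstruction, } k = N{-}1, \dots, 0\text{)}.
\end{align*}
```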

D. Wavelet-based image fusion

DWT (discrete wavelet transform) fusion methods decompose an image into wavelet coefficients at several scales. The fusion is carried out in the decomposed domain, between the forward wavelet transform and the inverse wavelet transform, to produce the fusion result [1][7]. The Daubechies-4 DWT and a Haar-based shift-invariant DWT are used in our evaluation.
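
As an example of the decomposition step, one level of a decimated 1D Haar DWT pairs adjacent samples into an approximation and a detail coefficient (the shift-invariant variant simply omits the decimation); applying the pass along rows and then columns yields the 2D transform. A minimal CUDA sketch with illustrative names:

```cuda
// One Haar DWT level along each row (width must be even):
// approx[i] = (x[2i] + x[2i+1]) / sqrt(2), detail[i] = (x[2i] - x[2i+1]) / sqrt(2).
__global__ void haarRows(const float* src, float* dst, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // output column in [0, width/2)
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // row index
    int half = width / 2;
    if (x >= half || y >= height) return;
    const float s = 0.70710678f;                    // 1 / sqrt(2)
    float a = src[y * width + 2 * x];
    float b = src[y * width + 2 * x + 1];
    dst[y * width + x]        = s * (a + b);        // approximation band
    dst[y * width + half + x] = s * (a - b);        // detail band
}
```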

IV. GPU ACCELERATION

A. GPGPU

GPU acceleration ultimately depends on how efficiently the fragment programs are handled over the parallel architecture of the GPU. For RTT, the OpenGL API loads the image data into texture memory, with the GPU's framebuffer objects taking charge of passing data from the fragment programs to the output texture memory.

AWA image fusion algorithms consist of a single fragment program, apart from the process that determines the weight value. The weight can be calculated in advance and passed to the fragment program, or calculated directly in the fragment program.

Color-based image fusion algorithms can also consist of a single fragment program that takes the source images as input. To implement each color-based fusion algorithm, operations such as color transformation or false-color mapping on the input textures are implemented as a fragment program. A color transformation to another color space can be implemented easily using the matrix-vector product functions provided by the Cg standard library.

Pyramid-based image fusion algorithms are composed of several fragment programs. They receive not only the input images but also fusion rules, because the fusion method differs at each resolution. Each operation can be matched to a fragment program. On the GPU, the REDUCE and EXPAND operations can be implemented effectively by using bilinear texture interpolation instead of convolution.
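
In CUDA terms, where fragment programs do not apply, the same REDUCE can be sketched as a 2x2 average per output pixel; a bilinear texture fetch at the center of a 2x2 texel block returns exactly this average, which is why interpolation can stand in for a small convolution. Names are illustrative.

```cuda
// REDUCE sketch: halve each dimension by averaging 2x2 source blocks.
__global__ void reduce2x2(const float* src, float* dst, int srcW, int srcH)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int dstW = srcW / 2, dstH = srcH / 2;
    if (x >= dstW || y >= dstH) return;
    int sx = 2 * x, sy = 2 * y;
    float sum = src[sy * srcW + sx]       + src[sy * srcW + sx + 1]
              + src[(sy + 1) * srcW + sx] + src[(sy + 1) * srcW + sx + 1];
    dst[y * dstW + x] = 0.25f * sum;      // equals a centered bilinear fetch
}
```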

To perform the wavelet-based image fusion algorithms, a convolution-based DWT is implemented on the GPU. At each level, the 2D DWT is computed by performing a 1D DWT first horizontally and then vertically. For the multiplications and additions needed in the DWT/IDWT rendering passes, the indirect address table suggested in [7] is built and used as an input texture of the fragment program.

B. CUDA-based GPGPU

CUDA requires analysis of the algorithms and data to find the optimal numbers of threads and blocks that keep the GPU fully utilized. The size of the global data and the numbers of thread processors and blocks on the GPU can have a significant impact on overall performance. To help optimize performance, NVIDIA provides an Excel spreadsheet called the Occupancy Calculator (OCC). For example, for Laplacian pyramid fusion at image size 512x512, we put 256 threads in each block, and a grid consists of 16x16 blocks.

The optimal number of threads per block and the register allocation for each image fusion algorithm are chosen through the OCC. In addition, most of the pixel-level fusion algorithms use only global memory, without shared memory, because many of their operations need no data sharing and are largely per-pixel independent.
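
A minimal sketch of such a launch configuration for a per-pixel fusion kernel follows; fuseKernel and the exact block shape are illustrative, not necessarily the settings used in the measurements above.

```cuda
dim3 block(16, 16);                       // 256 threads per block
dim3 grid((width  + block.x - 1) / block.x,
          (height + block.y - 1) / block.y);
fuseKernel<<<grid, block>>>(d_a, d_b, d_fused, width, height);
cudaDeviceSynchronize();                  // wait for completion before readback
```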

V. EXPERIMENTS

We tested the speed of each implementation with several aerial images taken by multiple sensors. The image sizes are 512x512, 1024x1024, and 2048x2048. The GPU is an NVIDIA GeForce 8600 GTS with 128 stream processors, and the CPU is an Intel Core2 Quad Q6600 processor (2.4 GHz).

For fair evaluation, we multithreaded the CPU implementation to resemble the GPU's parallel structure. Since our CPU provides up to four hardware threads, we used four-threaded implementations on the CPU. For the multithreaded CPU, the only overhead is the time required to partition an image and set up the threads, which takes less than 5 percent of the total elapsed time. On the GPU, framebuffer-object creation and setup overheads related to memory allocation, such as OpenGL and Cg parameter setting, were measured per image size. Unlike the CPU case, GPU overhead results from initializing the computation and differs significantly between GPU models: for 2048x2048 images, GPGPU takes 339 ms on average, while CUDA takes 26 ms.
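
One conventional way to measure such kernel and setup times in CUDA is with events, as sketched below; this is a generic harness with a placeholder kernel name, not necessarily how the figures above were obtained.

```cuda
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start);
fuseKernel<<<grid, block>>>(d_a, d_b, d_fused, width, height);
cudaEventRecord(stop);
cudaEventSynchronize(stop);

float ms = 0.0f;
cudaEventElapsedTime(&ms, start, stop);   // elapsed GPU time in milliseconds
```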

All multi-scale algorithms use a three-level pyramid analysis or wavelet decomposition. In the pyramid-based image fusion, the choose-max and averaging fusion rules are used as the high-pass and base-band fusion methods, respectively. Among the many DWTs, the Haar and Daubechies-4 wavelets were selected for our image fusion tests, and the averaging fusion scheme is applied after wavelet decomposition.
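
Both rules are per-coefficient independent and can be sketched as one CUDA kernel over a decomposition level stored as a flat array; the per-level wiring and names are illustrative.

```cuda
// Fusion-rule sketch: choose-max keeps the coefficient of larger magnitude
// (high-pass bands); averaging is used for the base band.
__global__ void combineLevel(const float* a, const float* b, float* out,
                             int n, bool highPass)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (highPass)
        out[i] = (fabsf(a[i]) >= fabsf(b[i])) ? a[i] : b[i];  // choose-max
    else
        out[i] = 0.5f * (a[i] + b[i]);                        // averaging
}
```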

Figure 1. Execution time ratio of pixel-level image fusion algorithms. ([CPU]/[GPGPU or CUDA])

Fig. 1 shows the execution time ratio of the fusion methods on the CPU and GPU. To simplify the graph, we plot the better of the GPGPU and CUDA execution times. Fusion performance is strongly influenced by image size: in this experiment, the speed gap between CPU and GPU grows as resolution and operation complexity increase. For the multi-resolution fusion methods, the GPU's processing time shrinks relative to the CPU's as the level count or resolution increases.

Figure 2. CPU/GPGPU execution time ratio as a function of the number of frames. Group-A, -B, -C, and -D denote AWA fusion, color-based fusion, pyramid-based fusion, and wavelet-based fusion, respectively.

Considering a real-time fusion system, we compared GPGPU with the CPU on the time required for 30 consecutive image fusions. Fig. 2 shows the CPU/GPGPU execution time ratio of the four groups of fusion algorithms, including setup overheads, when fusing 30 frames. The GPU's speedup grows as the number of frames increases, because the GPU initialization is needed only for the first frame.

VI. CONCLUSION

Recently, fusion speed has emerged as an important factor in the image fusion literature, as high-performance sensors have driven up image resolution and quality. The enormous computational power and flexible programmability of the GPU are already applied in many applications that require high speed. With the help of the GPU's multi-pass fragment processing, we were able to accelerate the fusion methods. Although the modern CPU also provides a multithreaded parallel architecture, the GPU-assisted system outperforms the CPU system on the image fusion methods. Through the experiments, we confirmed that GPU fusion speed greatly exceeds CPU fusion speed across various image sizes; depending on the fusion method, the GPU was 2 to 21 times faster than the CPU. Finally, the evaluation shows that the GPU is well suited even for a real-time system that handles large amounts of data simultaneously, which brightens the prospects of GPU-based fusion.

REFERENCES

[1] R. S. Blum and Z. Liu, Multi-Sensor Image Fusion and Its Applications, CRC Press, 2005.

[2] M. Smith and J. Heather, "Review of image fusion technology in 2005," Proceedings of SPIE, vol. 5782, pp. 29-45, 2005.

[3] C. Pohl and J. L. Van Genderen, "Review article: Multisensor image fusion in remote sensing: concepts, methods and applications," Int. J. Remote Sens., 19(5), pp. 823-854, 1998.

[4] A. A. Goshtasby and S. Nikolov, "Image fusion: Advances in the state of the art," Information Fusion, vol. 8, pp. 114-118, 2007.

[5] J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proceedings of the IEEE, 96(5), pp. 879-899, May 2008.

[6] J. Nickolls, I. Buck, M. Garland, and K. Skadron, "Scalable parallel programming with CUDA," ACM Queue, 6(2), pp. 40-53, 2008.

[7] T.-T. Wong, C.-S. Leung, P.-A. Heng, and J. Wang, "Discrete wavelet transform on consumer-level graphics hardware," IEEE Trans. on Multimedia, 9(3), pp. 668-673, 2007.

[8] T. R. Halfhill, "Parallel processing with CUDA," Microprocessor Report, 2008. [Online] Available: http://www.MPRonline.com

[9] O. Schenk, M. Christen, and H. Burkhart, "Algorithmic performance studies on graphics processing units," J. Parallel Distrib. Comput., 68(10), pp. 1360-1369, 2008.