
OPTO-ELECTRONICS REVIEW 21(4), 367–375, DOI: 10.2478/s11772-013-0107-5

Real time area-based stereo matching algorithm for multimedia video devices

T. HACHAJ *1 and M.R. OGIELA 2

1 Pedagogical University of Krakow, Institute of Computer Science and Computer Methods, 2 Podchorążych Ave, 30-084 Krakow, Poland

2 AGH University of Science and Technology, 30 Mickiewicza Ave, 30-059 Krakow, Poland

* e-mail: [email protected]

In this paper we investigate stereovision algorithms that are suitable for multimedia video devices. The main novel contribution of this article is a detailed analysis of a modern graphical processing unit (GPU) based dense local stereovision matching algorithm for real time multimedia applications. We considered two GPU-based implementations and one CPU implementation (as the baseline). The results (in frames per second, fps) were measured twenty times per algorithm configuration and then averaged (the standard deviation was below 5%). The disparity ranges were [0,20], [0,40], [0,60], [0,80], [0,100] and [0,120]. We also used three different matching window sizes (3×3, 5×5 and 7×7) and three stereo pair image resolutions: 320×240, 640×480 and 1024×768. We developed our algorithm under the assumption that it should process data at the same speed as it arrives from the capture devices. Because the most popular off-the-shelf video cameras (multimedia video devices) capture data with a frequency of 30 Hz, this frequency was the threshold for considering an implementation of our algorithm to be "real time". We have shown that our GPU algorithm that uses only global memory can be used successfully in such tasks. This is very important because that kind of implementation is more hardware-independent than algorithms that operate on shared memory, so we can avoid algorithm failure when moving the multimedia application between machines with different hardware. To our knowledge this type of research has not been reported yet.

Keywords: stereovision, GPU algorithm, local methods, dense methods, CUDA.

1. Introduction

Stereovision techniques are widely used in many fields of science. Using two images of a scene taken at the same time from two viewpoints – a "stereo pair" consisting of a left and a right image – it is possible to reconstruct three dimensional information. Stereo matching algorithms aim at defining pairs of conjugate pixels, one in each image, which correspond to the same point in the 3D scene [1]. Among the possible applications of this technology, the majority of contemporary research concentrates on engineering, biomedicine and navigation systems. Optical methods that give displacement or strain fields are now widely used in experimental mechanics [2]. In Ref. 3 the stereo-correlation technique is applied to accurately measure the 3D shape of a stamped sheet metal part and the surface strain field undergone by the part during the stamping process. In Ref. 4 the evaluation of the longitudinal modulus of elasticity (EL) of maritime pine is investigated by a combined temporal tracking and stereo-correlation technique. In the field of biomedicine, the authors of Ref. 5 report the development of a data fusion system which allows surgeons to visualize the inner structures of organs during liver surgery.

In this system they used stereo cameras to track intraoperative liver deformation. In Ref. 6 stereoscopic video segments of a patient undergoing robot-assisted laparoscopic partial nephrectomy, one for a tumour and another for a partial staghorn renal calculus, were processed to evaluate the performance of a 3D-to-3D registration algorithm. Navigation systems utilize stereovision mainly to perform obstacle detection tasks [7].

In order to generate a stereovision image, a proper image-matching algorithm has to be used. Those algorithms find the corresponding points in the left and right stereo images that belong to the same object. The well-known epipolar constraint is derived from the application of projective geometry techniques to stereovision. It states that, given a pixel in one of the images, potential conjugate pixels in the other image belong to a straight line called the epipolar line. This constraint shows that stereo matching is fundamentally a 1D problem [1]. The difference between the horizontal positions of conjugate pixels is called the disparity. Disparities associated with the pixels of one of the images are usually represented as another image, called the disparity map [1].

The matching algorithms can be divided by two criteria: by the features that are analysed during the matching procedure or by the optimization procedure. Within the first division there are two groups of disparity map generation methods:


sparse (feature-based) and dense methods. In feature-based matching algorithms, only a subset of pixels – corresponding to edges, corners or other salient features – is matched to provide a sparse disparity map [10]. In the dense (area-based) group the whole texture of the image is analysed [11].

Within the second division, global methods enable generation of accurate disparity maps, but those techniques are quite slow. Algorithms from this group utilize many different matching strategies, for example, the global relaxation technique [13].

Local methods may enable real time or nearly real time performance. In Ref. 1 the authors propose an approach to stereo matching using multiple 1D correlation windows, which yields a semi-dense disparity map and an associated confidence map. Ref. 15 presents an area-based stereo algorithm suitable for real time applications. The core of the algorithm relies on the uniqueness constraint and on a matching process that allows for rejecting previous matches as soon as more reliable ones are found. The method described in Ref. 16 applies adaptive window normalized cross correlation (NCC) matching and an interpolation method to get the sub-pixel image disparity value. In Ref. 17 a Support Vector Machine classifier is designed for solving the stereovision matching problem. In Ref. 18 a multi-scale algorithm dedicated to small baseline stereovision is described along with experiments on small angle stereo pairs. Stereo matching algorithms have been explored for many years and it is hardly possible to present a complete survey of all approaches to that task. In order to observe how the methods have changed and developed over the last decade, we refer to three papers (Refs. 19, 20, 21) that are completely devoted to the state of the art of various aspects of stereo matching techniques.

Before texture mapping hardware with programmable shaders became available, many computer graphics and image processing tasks were performed on parallel architectures [22]. Currently off-the-shelf PCs are capable of executing not only advanced rendering algorithms (for example direct volume rendering of large medical datasets [23,24]), but also all computational tasks that can be modelled as single instruction multiple data (SIMD) algorithms. The SIMD architecture of GPUs has found many applications in image processing tasks (for the current state of the art see [25–27]). Among them are also GPU parallel implementations of various stereovision algorithms [28,29].

The main novel contribution of this article is a detailed analysis of a modern GPU-based dense local stereovision matching algorithm for multimedia video devices. We compare three different implementations of the matching algorithm: a fast CPU implementation [15] (the "baseline" algorithm in our comparison), a GPU implementation with shared memory based on Ref. 29, and our novel GPU implementation that uses only global memory. We wanted to check whether the implementation of a stereo matching algorithm that works under real time constraints requires from a scientist detailed knowledge of the video card hardware architecture (especially of the per-block shared memory). This is very important because that kind of implementation is less hardware-independent than algorithms that operate only on global memory. Knowing this, we might avoid algorithm failure when moving the multimedia application between machines with different hardware. To our knowledge this type of research has not been reported yet.

2. Material and methods

The task of a dense local stereovision matching algorithm is to find the value of disparity (parameter d) that minimizes the value of an error function between the left and the right image. In our approach we use the sum of absolute differences (SAD) error function [15]

$$\mathrm{SAD}(x, y, d) = \sum_{j=-m}^{m} \sum_{i=-n}^{n} \left| L(x+i,\, y+j) - R(x+i-d,\, y+j) \right|, \qquad (1)$$

where L(x,y) and R(x,y) are the pixel colour values in the left and the right image, respectively, n and m define the size of the matching window and d is the disparity.
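For illustration, Eq. (1) can be evaluated directly for a single pixel as in the sketch below. This is our own minimal CUDA example, not code from Ref. 15; the single-channel, row-major image layout and the clamping of out-of-range coordinates are assumptions.

// Direct (naive) evaluation of Eq. (1) for one pixel (x, y) and one
// disparity d; the window size is (2n+1) x (2m+1), images are
// single-channel and row-major. Border coordinates are clamped
// (an assumption, since border handling is not specified in the text).
__device__ int sadScore(const unsigned char* L, const unsigned char* R,
                        int width, int height,
                        int x, int y, int d, int n, int m)
{
    int sad = 0;
    for (int j = -m; j <= m; ++j) {
        for (int i = -n; i <= n; ++i) {
            int yy = min(max(y + j, 0), height - 1);
            int xl = min(max(x + i, 0), width - 1);
            int xr = min(max(x + i - d, 0), width - 1);
            sad += abs((int)L[yy * width + xl] - (int)R[yy * width + xr]);
        }
    }
    return sad;
}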

The most expensive task performed by the stereo algorithm is the computation of the SAD scores, which are needed to carry out the direct matching phase. Many approaches that speed up that process have been proposed. The most basic solution (the "naive approach") requires redundant calculations of SAD values in the left and the right image for each window. The task might be simplified by a separate calculation of SAD for a given d on rows and columns. An even faster method was proposed in Ref. 15. For each considered d, the authors compute separately the vertical slices of the matching window (Fig. 1, blue area). The obtained partial SAD values are stored in memory. The SAD value in each window is computed as the sum of sequentially stored values. The SAD result for the next window is computed from the previously obtained one, simply by subtracting the left-most and adding the right-most stored value of the window range. When moving to the next row, the stored data is updated by subtracting the top-most value (Fig. 1, blue and green row) and adding the bottom-most value (yellow row) of the window range in each column. This keeps the complexity small and independent of the size of the matching window, since only four elementary operations are needed to obtain the SAD score at each new point. The pseudocode of this algorithm can be found in Ref. 15.
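To make the four-operation update concrete, the host-side sketch below paraphrases the incremental scheme for a single disparity d. It is our own reconstruction of the idea, not the pseudocode of Ref. 15 (which should be consulted for the exact formulation); the helper absDiff and the index conventions are assumptions.

#include <cstdlib>

// Incremental SAD for one disparity d (host code, single-channel images).
// colSAD[x] holds the sum of |L - R| over the vertical slice of the window
// centred on the current row; the window is (2n+1) x (2m+1).
void sadRowIncremental(const unsigned char* L, const unsigned char* R,
                       int width, int height, int d, int n, int m,
                       int* colSAD, int* sadOut /* width*height */)
{
    auto absDiff = [&](int x, int y) {
        return abs((int)L[y * width + x] - (int)R[y * width + x - d]);
    };
    // Initialize the column sums for the first valid row (y = m).
    for (int x = d; x < width; ++x) {
        colSAD[x] = 0;
        for (int j = 0; j <= 2 * m; ++j) colSAD[x] += absDiff(x, j);
    }
    for (int y = m; y < height - m; ++y) {
        // Full SAD for the first window in this row ...
        int sad = 0;
        for (int i = 0; i <= 2 * n; ++i) sad += colSAD[d + i];
        sadOut[y * width + d + n] = sad;
        // ... then slide right: subtract the left-most, add the right-most
        // column sum (two of the four elementary operations).
        for (int x = d + n + 1; x < width - n; ++x) {
            sad += colSAD[x + n] - colSAD[x - n - 1];
            sadOut[y * width + x] = sad;
        }
        // Move to the next row: each column sum drops the top-most pixel
        // and gains the pixel just below the window (the other two operations).
        if (y + 1 < height - m) {
            for (int x = d; x < width; ++x)
                colSAD[x] += absDiff(x, y + m + 1) - absDiff(x, y - m);
        }
    }
}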

Fig. 1. Fast area-based stereo matching algorithm [15].

The CPU implementation of that approach on a contemporary PC CPU is still not fast enough for real time computation. For applications with time restrictions, the stereo matching algorithms have to be implemented as parallel single instruction multiple data (SIMD) GPU algorithms. In order to create an optimal implementation of a GPU-executable algorithm, the scientist has to take into account the hardware architecture and optimize the data flow between threads. The constructed algorithm, however, might not work in the same (optimal) way (or might even stop working at all) between different GPU models that, for example, have a different amount of shared memory per processor block. We want to inspect what the difference in speed is between a highly optimized GPU-based stereo matching algorithm and a GPU implementation that utilizes only the basic features of the SIMD architecture. We developed our algorithm under the assumption that it should process data at the same speed as it arrives from the capture devices. Since the most popular off-the-shelf video cameras (multimedia video devices) capture data with a frequency of 30 Hz, this frequency was the threshold for considering an implementation of our algorithm to be "real time". That is required for multimedia applications with a natural user interface [30].

We implemented our algorithms utilizing the Compute Unified Device Architecture (CUDA) of Nvidia video cards [31]. CUDA assumes that the CUDA threads may execute on a physically separate device (GPU) that operates as a coprocessor to the host (CPU) running the C program. CUDA also assumes that both the host and the device maintain their own DRAM, referred to as host memory and device memory, respectively. Therefore, the program manages the global, constant, and texture memory spaces through calls to the CUDA runtime. This includes device memory allocation and de-allocation, as well as data transfer between host and device memory [31]. Since the transfer of data between host and device memory is time demanding, the algorithms should avoid frequent memory switching.
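The sketch below illustrates this host/device traffic for one stereo frame (our own illustration; the buffer names are hypothetical). In a real time loop the device buffers would be allocated once and reused, so that per frame only the stereo pair travels to the device and the disparity map travels back.

#include <cuda_runtime.h>

// One frame of processing: copy the stereo pair to device memory, run the
// matching kernels entirely on the device, copy only the disparity map back.
void processFrame(const unsigned char* hostL, const unsigned char* hostR,
                  unsigned char* hostDmap, size_t imgBytes)
{
    unsigned char *devL, *devR, *devDmap;
    cudaMalloc(&devL, imgBytes);                 // device memory allocation ...
    cudaMalloc(&devR, imgBytes);
    cudaMalloc(&devDmap, imgBytes);

    cudaMemcpy(devL, hostL, imgBytes, cudaMemcpyHostToDevice);
    cudaMemcpy(devR, hostR, imgBytes, cudaMemcpyHostToDevice);

    // ... kernel launches go here; intermediate tables stay on the device,
    // so no additional host/device transfers are needed ...

    cudaMemcpy(hostDmap, devDmap, imgBytes, cudaMemcpyDeviceToHost);
    cudaFree(devL); cudaFree(devR); cudaFree(devDmap);  // ... and de-allocation
}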

The second algorithm (later called "GPU global") operates using only the global memory of the device (Fig. 2). This is our novel proposition for solving the stereo matching problem. After calculating the difference between pixel values in the left and the right image, SAD is computed separately for the rows and columns of the stereo pair. The left and right images and the partial results are stored in global device memory. The main loop operates on the range of considered disparities. After obtaining the results, the disparity map is sent back to host memory.

Fig. 2. Schema of the "GPU global" algorithm.

The pseudocode of our method is presented below:

Function GenerateDisparityMap(windowSize,        // matching window size
                              ImR,               // right image
                              ImL,               // left image
                              maximal_disparity) // maximal considered disparity

Begin <<SIMD>> // begin of a SIMD instruction block, computed on the GPU
  – Initialize the array of disparities (dmap) using a SIMD function; the array is stored inside global GPU memory. Set the value "infinity" in each cell of dmap.
  – Each thread assigns a value to one matrix cell with coordinates computed inside each SIMD function based on the thread and block ID.
End <<SIMD>> // end of the SIMD block

// Loop over the range of all considered disparities
c := 0
Loop While c <= maximal_disparity
Begin
  Begin <<SIMD>>
    – Calculate the absolute difference between pixel values in ImR and ImL. The disparity between the images is c.
    – The results are stored in temporary table T1 inside global GPU memory.
    – Each thread assigns a value to one matrix cell with coordinates computed inside each SIMD function based on the thread and block ID.
  End <<SIMD>>
  Begin <<SIMD>>
    – Compute SAD for the rows of table T1; each thread computes a partial SAD for a fixed row, summing up the number of columns defined in the variable windowSize.
    – The results are stored in temporary table T2 inside global GPU memory.
    – Each thread assigns a value to one matrix cell with coordinates computed inside each SIMD function based on the thread and block ID.
  End <<SIMD>>
  Begin <<SIMD>>
    – Compute SAD for the columns of table T2; each thread computes a partial SAD for a fixed column, summing up the number of rows defined in the variable windowSize.
    – The results are stored in temporary table T1 inside global GPU memory.
    – Each thread assigns a value to one matrix cell with coordinates computed inside each SIMD function based on the thread and block ID.
  End <<SIMD>>
  Begin <<SIMD>>
    – Update the disparity map (dmap). Set the value c in a particular cell of dmap if the SAD for this cell is smaller than any previously computed SAD for this cell.
    – Each thread assigns a value to one matrix cell with coordinates computed inside each SIMD function based on the thread and block ID.
  End <<SIMD>>
  c := c + 1
End Loop While

Return dmap
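As an illustration of how the <<SIMD>> blocks above map onto CUDA, the two kernels below sketch the absolute difference step and the disparity map update step. This is our own sketch, not the original source; the kernel and buffer names are hypothetical, and we keep the running per-pixel SAD minimum in a separate bestSad table instead of encoding "infinity" in dmap.

// Absolute difference between the left image and the right image shifted
// by the currently tested disparity c; the result goes to table T1 in
// global memory. One thread per pixel, addressed from block and thread IDs.
__global__ void absDiffKernel(const unsigned char* imL, const unsigned char* imR,
                              int* T1, int width, int height, int c)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    int xr = max(x - c, 0);                       // clamp at the left border
    T1[y * width + x] = abs((int)imL[y * width + x] - (int)imR[y * width + xr]);
}

// Keep disparity c for a pixel if its SAD is smaller than any SAD computed
// for that pixel at previously tested disparities.
__global__ void updateDmapKernel(const int* sad, int* bestSad,
                                 unsigned char* dmap,
                                 int width, int height, int c)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    int idx = y * width + x;
    if (sad[idx] < bestSad[idx]) {
        bestSad[idx] = sad[idx];
        dmap[idx] = (unsigned char)c;
    }
}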

The third algorithm (later called "GPU shared") is based on Ref. 29 and uses both global and shared memory (Fig. 3). The image is split up into tiles, i.e., rectangular sections of the image. In order to process the image in parallel, each block is responsible for computing the disparity values for one tile.

Access to texture memory is much slower than access to a shared memory location. In order to get the maximum throughput, all reference pixels of a block are loaded in parallel. Each thread loads one pixel into shared memory. In CUDA development it is always important to keep the ratio of idle threads to active threads as low as possible. In the algorithm, the number of threads and, therefore, the tile size is maxed out to 512 threads per block [29]. Each thread reads one pixel from the left and the right image, computes its element of the SAD and stores it in shared memory. After synchronization of the threads in a block, the value of SAD for one pixel is calculated. During each iteration of the main loop, which runs over the range of considered disparities, the value of the disparity map might be updated and stored in global memory. After obtaining the results, the disparity map is sent back to host memory.

Fig. 3. Schema of the "GPU shared" algorithm.

The pseudocode of this algorithm is presented below. The CUDA source code of this method can be found in Ref. 29.

Function GenerateDisparityMap2(tile_width,    // width of the area processed by a single shader block
                               tile_height,   // height of the area processed by a single shader block
                               disparity_min, // minimal considered disparity
                               disparity_max, // maximal considered disparity
                               imL,           // left image
                               imR)           // right image

Begin <<SIMD>> // begin of a SIMD instruction block, computed on the GPU
  – Declare an array "differences" that is stored inside shared memory. It can be accessed from all threads inside a single block.
  – Initialize the array of disparities (dmap) using a SIMD function; the array is stored inside global GPU memory. Set the value "infinity" in each cell of dmap.
  – Initialize the base of the indexes based on the thread ID, block ID, tile_width and tile_height. With these indexes each thread in a block operates inside the same matching window.
  – Load a pixel from imL and store it in variable "d". Each thread loads a different value from imL with coordinates computed inside each SIMD function based on the thread ID, block ID, tile_width and tile_height.

  // Loop over the considered disparity range.
  // Each thread in each block finds the disparity of a single pixel in the disparity map.
  c := disparity_min
  Loop While c <= disparity_max
  Begin
    – Calculate the absolute difference between the value "d" and the pixel loaded from imR. The disparity between the images is c. Store that difference in shared memory (the differences array).
    // Wait until all threads end the previous instructions.
    <<SYNCHRONIZE SIMD THREADS>>
    – Compute SAD by summing up the partial results from the differences array.
    – Update the disparity map (dmap). Set the value c in a particular cell of dmap if the SAD for this cell is smaller than any previously computed SAD for this cell.
    // Wait until all threads end the previous instructions.
    <<SYNCHRONIZE SIMD THREADS>>
    c := c + 1
  End Loop While
End <<SIMD>> // end of the SIMD block

Return dmap
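The fragment below sketches the shared memory mechanism in CUDA in a deliberately simplified form: one block per pixel of the disparity map, with the block's threads covering the matching window and cooperating through __shared__ storage and __syncthreads(). It is our own illustration of the idea, not the code of Ref. 29 (which tiles the image and uses up to 512 threads per block); WIN and all names are assumptions.

#include <climits>

#define WIN 7   // matching window side; WIN*WIN threads per block

// Launch as: matchSharedPixel<<<dim3(width, height), dim3(WIN, WIN)>>>(...)
__global__ void matchSharedPixel(const unsigned char* imL,
                                 const unsigned char* imR,
                                 unsigned char* dmap, int width, int height,
                                 int dispMin, int dispMax)
{
    __shared__ int diff[WIN * WIN];            // one partial result per thread
    int x = blockIdx.x, y = blockIdx.y;        // pixel handled by this block
    int i = (int)threadIdx.x - WIN / 2;        // offset inside the window
    int j = (int)threadIdx.y - WIN / 2;
    int tid = threadIdx.y * WIN + threadIdx.x;

    int xl = min(max(x + i, 0), width - 1);    // clamped window coordinates
    int yy = min(max(y + j, 0), height - 1);
    int l = imL[yy * width + xl];              // left pixel: loaded only once

    int bestSad = INT_MAX;
    int bestC = dispMin;
    for (int c = dispMin; c <= dispMax; ++c) {
        int xr = min(max(x + i - c, 0), width - 1);
        diff[tid] = abs(l - (int)imR[yy * width + xr]);
        __syncthreads();                       // all partial results are ready
        if (tid == 0) {                        // thread 0 sums the window
            int sad = 0;
            for (int k = 0; k < WIN * WIN; ++k) sad += diff[k];
            if (sad < bestSad) { bestSad = sad; bestC = c; }
        }
        __syncthreads();                       // safe to overwrite diff[]
    }
    if (tid == 0) dmap[y * width + x] = (unsigned char)bestC;
}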

3. Calculation and results

The rendering speed of the three previously described algorithms was tested on a consumer-quality PC with an Intel Core 2 Duo CPU 3.00 GHz processor, 3.25 GB RAM, and an Nvidia GeForce 9600 GT graphics card, running 32-bit Windows XP Professional. We considered the two previously described GPU-based implementations and one CPU implementation (as the baseline). The results (in frames per second, fps) were measured twenty times per algorithm configuration and then averaged (the standard deviation was below 5%). The disparity ranges were [0,20], [0,40], [0,60], [0,80], [0,100] and [0,120]. We also used three different matching window sizes (3×3, 5×5 and 7×7). The results obtained for the three stereo pair image resolutions 320×240, 640×480 and 1024×768 are presented in Table 1, Table 2 and Table 3, respectively. The plot of the speed measurements (in frames per second, fps) as a function of the maximal considered disparity is presented in Fig. 4. An example disparity map obtained by the examined local area-based stereovision matching algorithm in an indoor environment can be seen in Fig. 5 (all implementations return the same results).

The results presented in Tables 1–3 and in Fig. 4 show a similar relationship between the matching window size, the stereo pair resolution and the computation speed of the algorithms. The computation speed decreases when more pixels are taken into the window and when the image stream resolution grows. In all plots in Fig. 4 we add a dotted line at the level of 30 fps to show the difference between the tests more clearly and to emphasize the aspect of the algorithms that is relevant to this research. As we mentioned before, the algorithms will be used by multimedia devices whose capture frequency is about 30 fps. What is more, we prefer an algorithm that is less dependent on the installed hardware. With this in mind, we will discuss the obtained results in the next section.

4. Discussion

It is obvious that the calculation speed (expressed in frames per second) of all algorithms decreases with the size of the window, the disparity range and the resolution of the stereo pair. The size of the window does not affect the speed of the CPU-based implementation much, because it only matters in the initial step of the algorithm, when the first column values are obtained.

The experiment showed that in all considered configurations the fastest implementation is the GPU shared algorithm. It might be about 2 times faster than the GPU global algorithm (in case of a small window size) and 10 times faster than the CPU implementation – see Table 1 and Fig. 4 (window size 3×3, resolution 320×240). In case of larger window sizes (5×5 and 7×7) the differences between using shared and only global memory become smaller (the shared memory algorithm is 2 times or 1.5 times faster). That is because more threads need to be used to compute a single SAD. In case of real time applications it is required that the disparity map is computed at 30 fps or more.


Table 1. Speed measurements (in frames per second, fps) for stereo pair resolution 320×240.

Algorithm / disparity   d = 20   d = 40   d = 60   d = 80   d = 100   d = 120
GPU GLOBAL (3×3)        120.42    65.00    44.97    36.36     28.85     24.63
GPU SHARED (3×3)        219.83   130.65    93.03    71.49     58.40     47.95
CPU (3×3)                19.36    10.24     7.11     5.42      4.66      4.04
GPU GLOBAL (5×5)         93.39    49.58    34.61    27.37     22.06     18.29
GPU SHARED (5×5)        142.29    79.87    54.27    40.79     33.97     27.17
CPU (5×5)                19.66    10.48     7.32     5.71      4.76      4.13
GPU GLOBAL (7×7)         73.12    41.08    28.37    21.47     17.50     14.74
GPU SHARED (7×7)         97.44    53.78    35.82    27.18     21.86     18.70
CPU (7×7)                19.89    10.47     7.31     5.70      4.75      4.06

Table 2. Speed measurements (in frames per second, fps) for stereo pair resolution 640×480.

Algorithm / disparity   d = 20   d = 40   d = 60   d = 80   d = 100   d = 120
GPU GLOBAL (3×3)         35.89    18.36    13.07     9.91      8.00      6.70
GPU SHARED (3×3)         66.58    36.68    24.62    19.05     15.91     13.30
CPU (3×3)                 4.58     2.46     1.70     1.28      1.06      0.90
GPU GLOBAL (5×5)         25.99    14.02     9.60     7.14      5.74      4.63
GPU SHARED (5×5)         42.94    21.39    14.65    11.42      9.31      7.88
CPU (5×5)                 4.76     2.18     1.69     1.28      1.02      0.86
GPU GLOBAL (7×7)         21.08    11.22     7.62     5.74      4.22      3.76
GPU SHARED (7×7)         26.77    13.58     9.45     7.14      5.81      4.98
CPU (7×7)                 4.79     2.47     1.71     1.26      1.05      0.89

Table 3. Speed measurements (in frames per second, fps) for stereo pair resolution 1024×768.

Algorithm / disparity   d = 20   d = 40   d = 60   d = 80   d = 100   d = 120
GPU GLOBAL (3×3)          7.86     4.05     2.72     2.06      1.63      1.36
GPU SHARED (3×3)         19.76    11.14     7.52     6.10      5.11      4.34
CPU (3×3)                 1.89     0.99     0.67     0.51      0.42      0.35
GPU GLOBAL (5×5)          6.27     3.13     2.18     1.66      1.34      1.12
GPU SHARED (5×5)         13.61     7.92     5.63     4.29      3.47      2.92
CPU (5×5)                 1.85     0.94     0.67     0.50      0.41      0.35
GPU GLOBAL (7×7)          5.31     2.72     1.84     1.40      1.12      0.93
GPU SHARED (7×7)          9.72     5.14     3.51     2.73      2.20      1.84
CPU (7×7)                 1.88     0.98     0.67     0.51      0.42      0.34

This condition is not satisfied by any of those algorithms when the stereo pair is in resolution 1024×768 (see Table 3 and the bottom row in Fig. 4). In case of 640×480 only GPU shared (for d = 20 and d = 40) and GPU global (for d = 20) satisfy it (see Table 2 and the middle row in Fig. 4). It has to be taken into account that d = 20 might be too small to successfully match objects that are close to the stereo camera. In case of 320×240 both GPU algorithms run at the required speed for most configurations (for the "GPU global" algorithm the fps is about 4% below the speed required for real time applications at d = 100 and about 22% below it at d = 120). The 320×240 stereo pair resolution is sufficient for multimedia applications that perform simple image processing tasks (like segmentation or object tracking). It can be seen that the GPU algorithm that uses only global memory can be used successfully in that kind of tasks. This is very important because that kind of implementation is more hardware-independent than algorithms that operate on shared memory. Knowing this, we might avoid algorithm failure when moving the multimedia application between machines with different hardware.


Fig. 4. Speed measurements (in frames per second, fps) as a function of the maximal considered disparity. Each row presents results for a different stereo pair resolution (320×240, 640×480, 1024×768) and each column for a different window size (3×3, 5×5 and 7×7, respectively).

Fig. 5. Example disparity map obtained by the examined stereovision matching algorithm in an indoor environment.

When dealing with computational tasks similar to those described in this section, a scientist has to consider whether the examined solution requires the full performance of a shared memory implementation. If that is not the main scope of the performed work, he or she might use only the global memory of the GPU and, instead of putting so much effort into studying the detailed hardware architecture of the video card, concentrate on the scientific goal of the research.

5. Conclusions

We have shown that it is possible to use a GPU algorithm based only on global memory for real time stereovision tasks at low resolutions. The calculation speed of all considered algorithms is a function of the window size, the disparity range and the resolution of the stereo pair. In our case the ratio of the performance speed (expressed in frames per second) of the "GPU shared" algorithm to the "GPU global" algorithm varies from 3 to 1.5. Both the pseudocode and the implementation of our algorithm can be very useful for researchers and practitioners, because the method was investigated in depth and can be directly deployed into their ongoing research or technical projects.

We are planning to apply the results presented in this article in our medical data visualization systems [23,32,33], supplying them with a natural interface based on stereovision [30]. Those methods aim at lifting support for physicians to a new level by presenting more informative and realistic 3D visualizations and enabling easy and reliable manipulation of the visualized objects.

Acknowledgements

We kindly acknowledge the support of this study by the Pedagogical University of Krakow Statutory Research Grant.

References

1. S. Lefebvre, S. Ambellouis, and F. Cabestaing, "A 1D approach to correlation-based stereo matching", Image Vision Comput. 29, 580–593 (2011).

2. J.-J. Orteu, "3-D computer vision in experimental mechanics", Opt. Laser. Eng. 47, 282–291 (2009).

3. D. Garcia, J.-J. Orteu, and L. Penazzi, "A combined temporal tracking and stereo-correlation technique for accurate measurement of 3D displacements: application to sheet metal forming", J. Mater. Process. Tech. 125–126, 736–742 (2002).

4. J. Xavier, A.M.P. de Jesus, J.J.L. Morais, and J.M.T. Pinto, "Stereovision measurements on evaluating the modulus of elasticity of wood by compression tests parallel to the grain", Constr. Build. Mater. 26, 207–215 (2012).

5. M. Uematsu, N. Suzuki, A. Hattori, Y. Otake, S. Suzuki, M. Hayashibe, S. Kobayashi, and A. Uchiyama, "A real-time data fusion system updating 3D organ shapes using colour information from multi-directional cameras", ICS 1268, Proc. Computer Assisted Radiology and Surgery, 741–746 (2004).

6. L.-M. Su, B.P. Vagvolgyi, R. Agarwal, C.E. Reiley, R.H. Taylor, and G.D. Hager, "Augmented reality during robot-assisted laparoscopic partial nephrectomy: toward real-time 3D-CT to stereoscopic video registration", Urology 73, 896–900 (2009).

7. Q. Yu, H. Araújo, and H. Wang, "A stereovision method for obstacle detection and tracking in non-flat urban environments", Auton. Robot. 19, 141–157 (2005).

8. R. Labayrade, C. Royere, D. Gruyer, and D. Aubert, "Cooperative fusion for multi-obstacles detection with use of stereovision and laser scanner", Auton. Robot. 19, 117–140 (2005).

9. K. Kohara, N. Suganuma, T. Negishi, and T. Nanri, "Obstacle detection based on occupancy grid maps using stereovision system", Int. J. Intell. Transport. Syst. Research 8, 85–95 (2009).

10. G. Pajares and J.M. de la Cruz, "A probabilistic neural network for attribute selection in stereovision matching", Neural Comput. Appl., 83–89 (2002).

11. H. Halawana, H. Hamdan, and M. Hamdan, "Dense stereovision using mono-CCD color cameras", Artif. Life Robot. 15, 508–511 (2010).

12. A.S. Ogale and Y. Aloimonos, "Shape and the stereo correspondence problem", Int. J. Comput. Vision 65, 147–162 (2005).

13. G. Pajares and J.M. de la Cruz, "Fuzzy cognitive maps for stereovision matching", Pattern Recogn. 39, 2101–2114 (2006).

14. G. Pajares, J.M. de la Cruz, and J.A. López-Orozco, "Relaxation labelling in stereo image matching", Pattern Recogn. 33, 53–68 (2000).

15. L. Di Stefano, M. Marchionni, and S. Mattocci, "A fast area-based stereo matching algorithm", Image Vision Comput. 22, 983–1005 (2004).

16. Q. Zhu, Y. Jiang, W. Deng, and L. Tang, "Crowdedness estimation approach based on stereovision for bus passengers", J. Shanghai University 14, 17–23 (2010).

17. G. Pajares and J.M. de la Cruz, "Stereovision matching through support vector machines", Pattern Recogn. Lett. 24, 2575–2583 (2003).

18. J. Delon and B. Rougé, "Small baseline stereovision", J. Math. Imaging Vision 28, 209–223 (2007).

19. M.J.P.M. Lemmens, "A survey on stereo matching techniques", Proc. 16th ISPRS Congress, ASPRS 27/B8, pp. 11–23, Kyoto, 1988.

20. D. Scharstein and R. Szeliski, "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms", Int. J. Comput. Vision 47, 7–42 (2002).

21. N. Lazaros, G.C. Sirakoulis, and A. Gasteratos, "Review of stereo vision algorithms: from software to hardware", Int. J. Optomechatronics 2, 435–462 (2008).

22. A.H.J. Koning, K.J. Zuiderveld, and M.A. Viergever, "Volume visualization on shared memory architectures", Parallel Comput. 23, 915–925 (1997).

23. T. Hachaj and M.R. Ogiela, "Framework for cognitive analysis of dynamic perfusion computed tomography with visualization of large volumetric data", J. Electron. Imaging 21 (2012), doi: 10.1117/1.JEI.21.4.043017.

24. T. Hachaj and M.R. Ogiela, "Visualization of perfusion abnormalities with GPU-based volume rendering", Comput. Graphics 36, 163–169 (2012).

25. Y. Allusse, P. Horain, A. Agarwal, and C. Saipriyadarshan, "GpuCV: A GPU-accelerated framework for image processing and computer vision", Lect. Notes Comput. Sci. 5359, 430–439 (2008).

26. D. Castaño-Díez, D. Moser, A. Schoenegger, S. Pruggnaller, and A.S. Frangakis, "Performance evaluation of image processing algorithms on the GPU", J. Struct. Biol. 164, 153–160 (2008).

27. R. Di Salvo and C. Pino, "Image and video processing on CUDA: state of the art and future directions", MACMESE'11 Proc. 13th WSEAS Int. Conf. on Mathematical and Computational Methods in Science and Engineering, pp. 60–66, 2011.

28. W.L.D. Lui and R. Jarvis, "Eye-Full Tower: a GPU-based variable multibaseline omnidirectional stereovision system with automatic baseline selection for outdoor mobile robot navigation", Robot. Auton. Syst. 58, 747–761 (2010).

29. S. Prehn, "GPU stereo vision", http://www.planetswebdesign.de/fileadmin/pdfs/Prehn%20Sebastian%20-%20GPU%20Stereo%20Vision%20-%202007-12-06.pdf

30. R. Fujiki, H. Yoshimoto, D. Arita, and R. Taniguchi, "Real-time model-based hand shape estimation with stereo vision", Proc. Korea-Japan Joint Workshop on Frontiers of Computer Vision, pp. 225–230, 2005.

31. NVIDIA CUDA Compute Unified Device Architecture, Programming Guide, Version 2.0, http://developer.download.nvidia.com/compute/cuda/2_0/docs/NVIDIA_CUDA_Programming_Guide_2.0.pdf (2008).

32. M.R. Ogiela and S. Bodzioch, "Computer analysis of gallbladder ultrasonic images towards recognition of pathological lesions", Opto-Electron. Rev. 19, 155–168 (2011).

33. S. Bodzioch and M.R. Ogiela, "New approach to gallbladder ultrasonic images analysis and lesions recognition", Computerized Medical Imaging and Graphics 33, 154–170 (2009).
