
[IEEE 2011 IEEE Virtual Reality (VR) - Singapore, Singapore (2011.03.19-2011.03.23)] 2011 IEEE Virtual Reality Conference - Accelerated polyhedral visual hulls using OpenCL


Accelerated Polyhedral Visual Hulls Using OpenCL

Tobias Duckworth, David J. Roberts

University of Salford, Manchester, UK

ABSTRACT We present a method for reconstructing the visual hull (VH) of an object in real time from multiple video streams. A state-of-the-art polyhedral reconstruction algorithm is accelerated by implementing it for parallel execution on a multi-core graphics processor (GPU). The time taken to reconstruct the VH is measured for both the accelerated and non-accelerated implementations of the algorithm, over a range of image resolutions and numbers of cameras. The results are of relevance to researchers in the field of 3D reconstruction at interactive frame rates (real time), for applications such as telepresence. KEYWORDS: Reconstruction algorithms, Parallel processing, Virtual reality. INDEX TERMS: H.4.3 [Information Technology and Systems]: Communications Applications — Computer conferencing, teleconferencing, and videoconferencing; I.3.5 [Computer Graphics]: Computational Geometry and Object Modeling — Modeling from video

1 INTRODUCTION Reconstruction of 3D forms from multiple images is a popular area of research in the field of computer vision. There are numerous applications for systems capable of determining the 3D shape of an object, the requirements of which depend upon the application. The intended application here is a 3D telepresence system in which the form of a human must be captured in real time. Such a system must capture the position and form of a dynamic, non-rigid body quickly enough to be used as a real-time communication device; interactive frame rates are around 20 frames per second and upward. Quality must be sufficient to convey non-verbal communication, such as facial expression and eye gaze. The real-time requirement of such a system rules out many 3D reconstruction techniques that are not sufficiently fast, and introduces a challenging balance between speed of execution and quality of the result. Recent advances in GPU architectures provide multiple processing cores, which can be used to process data simultaneously. Languages targeting these parallel architectures, such as OpenCL and CUDA, provide general-purpose approaches for harnessing this parallel processing capability.

2 RELATED WORK The technique known as shape from silhouette [1] creates a 3D form from a number of silhouettes derived from camera images. This is achieved by back-projecting the silhouettes from each camera's point of view. The intersection of all silhouettes in 3D space forms an entity known as the visual hull [2]. The VH is the maximal surface enclosing the form of the object, and is unable to represent surface concavities. This approximation is sufficient to model objects that are largely convex, and can be computed efficiently.

A number of approaches to modelling the VH from silhouette images have been proposed; these fall mainly into two categories: volumetric and surface-based approaches. Volumetric approaches [3][4] provide a general and robust solution to modelling the VH; however, they suffer from a lack of scalability as resolution increases, largely due to high memory consumption. Surface-based approaches [5][6] have a low memory footprint, even as image resolution increases, and are therefore more scalable. However, they require more procedural logic, making a robust implementation harder to achieve.

Exact Polyhedral Visual Hulls (EPVH) [5] provides a robust algorithm for modelling of the VH, which yields watertight manifold meshes, but the performance of the algorithm does not lend itself to real-time applications. The authors refine the performance of the algorithm [7], later proposing a network distributed parallel implementation capable of real-time performance [8].

3 ALGORITHM DETAILS We have re-implemented EPVH based on the published description [5], since a reference implementation is not available. Two variants are implemented: one executes entirely on the CPU, and one executes selected steps in parallel on the GPU. Performance over a range of camera counts and resolutions is then compared. The algorithm variants are tested using a simulator in which virtual cameras are placed around an object to be reconstructed; in this setting each algorithm can be run with identical input data, ensuring both are tested under the same conditions.

University of Salford, Salford, M5 4WT, United Kingdom email: [email protected]

Figure 1. High-level decomposition of the algorithm. Shading indicates the suitability of each process for parallel execution. White processes have been implemented on the GPU.


IEEE Virtual Reality 2011, 19-23 March, Singapore. 978-1-4577-0038-5/11/$26.00 ©2011 IEEE


3.1 Decomposition and implementation We have decomposed the EPVH algorithm as shown in Figure 1. Images are captured from each camera and processed to remove the background. Next, the boundary separating foreground from background is determined, resulting in a list of points defining the edge (contour) of the object. Simple objects yield a single contour; more complex objects, such as those containing holes, yield several.

Viewing lines are created from contour points by back-projecting each point from the camera centre through the image plane to infinity. Since contour points are independent of one another, this operation can be performed in parallel on the GPU. This is achieved by passing the camera's inverse projection matrix and list of contour points to an OpenCL kernel instance for each camera. A kernel thread is created for each viewing line; each thread calculates the 3D end point of its line, the 3D start point being simply the camera centre.
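As a sketch, the per-point work of one such kernel thread could look like the following (plain Python for illustration, not the OpenCL kernel itself; the names `inv_proj`, a 3×3 matrix mapping homogeneous pixel coordinates to a world-space ray direction, and the `far` clamp standing in for "infinity" are our assumptions):

```python
import math

def mat_vec3(m, v):
    # 3x3 matrix-vector product
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def viewing_line(cam_center, inv_proj, px, py, far=1000.0):
    """Back-project pixel (px, py) through the inverse projection to a
    viewing line: start at the camera centre, end far along the ray
    ('infinity' clamped to a large finite distance)."""
    d = mat_vec3(inv_proj, [px, py, 1.0])          # ray direction
    n = math.sqrt(sum(c * c for c in d)) or 1.0    # normalise
    end = [cam_center[i] + far * d[i] / n for i in range(3)]
    return cam_center, end
```

On the GPU, one thread would evaluate this for one contour point, with all points of a camera dispatched as a single kernel invocation.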

The image of each viewing line in every other camera view is determined by projecting the start and end points into each camera view. This can be performed in parallel by invoking an OpenCL kernel for each camera, and projecting the 3D start and end coordinates of the viewing line into every other camera image. Note that the dimensionality of the operation has increased compared to creation of the viewing lines; each kernel thread is responsible for projecting a single viewing line into a particular camera’s image plane.
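The projection each thread performs can be sketched as follows (an illustrative plain-Python version; `P`, a 3×4 projection matrix for the target camera, is an assumed name):

```python
def project(P, X):
    """Project a 3D world point X through a 3x4 camera matrix P,
    returning pixel coordinates after the perspective divide."""
    Xh = [X[0], X[1], X[2], 1.0]  # homogeneous world point
    u, v, w = (sum(P[i][k] * Xh[k] for k in range(4)) for i in range(3))
    return (u / w, v / w)
```

A viewing line's image in another camera is then simply the 2D segment between `project(P, start)` and `project(P, end)`.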

Each projected viewing line must now be tested to determine intersection with any other camera’s contour edges. A simple 2D line segment intersection test is used. The viewing line direction and contour edge normal are used to determine whether the viewing line is entering or leaving the silhouette image. The operation can be executed in parallel on the GPU by invoking an OpenCL kernel for each camera. Each kernel thread is responsible for testing the intersection of a single viewing line with a single contour edge.
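The 2D test each kernel thread runs might be sketched like this (plain Python; the entering/leaving classification via the edge normal follows the description above, and the function names are our own):

```python
def segment_intersection(p1, p2, q1, q2, eps=1e-12):
    """Parametric intersection of 2D segments p1->p2 and q1->q2.
    Returns the parameter t along p1->p2, or None if they miss."""
    r = (p2[0] - p1[0], p2[1] - p1[1])
    s = (q2[0] - q1[0], q2[1] - q1[1])
    denom = r[0] * s[1] - r[1] * s[0]
    if abs(denom) < eps:
        return None                                # parallel segments
    qp = (q1[0] - p1[0], q1[1] - p1[1])
    t = (qp[0] * s[1] - qp[1] * s[0]) / denom      # along viewing line
    u = (qp[0] * r[1] - qp[1] * r[0]) / denom      # along contour edge
    return t if 0.0 <= t <= 1.0 and 0.0 <= u <= 1.0 else None

def is_entering(line_dir, edge_normal):
    """The viewing line enters the silhouette when it runs against the
    outward contour-edge normal (negative dot product)."""
    return line_dir[0] * edge_normal[0] + line_dir[1] * edge_normal[1] < 0.0
```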

For each recorded intersection along a viewing line, the 3D coordinates must now be calculated so that they can be ordered along the length of the viewing line to determine spans that pass through the silhouettes of all camera images. These spans are used to form viewing edges, which correspond to where the viewing line passes along the surface of the VH of the object being reconstructed.
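The span computation can be sketched as a sort-and-sweep over the recorded intersections (a simplification under stated assumptions: each event carries +1 for entering and -1 for leaving a silhouette, and the line is taken to lie on its own camera's silhouette throughout):

```python
def hull_spans(events, n_cameras):
    """events: (t, delta) pairs along a viewing line, delta = +1 on
    entering a silhouette, -1 on leaving. Returns the (start, end)
    spans where the line is inside all n_cameras silhouettes; these
    spans become the viewing edges on the visual hull surface."""
    inside, start, spans = 1, None, []  # count 1: the line's own camera
    for t, delta in sorted(events):
        inside += delta
        if inside == n_cameras and start is None:
            start = t                              # span opens
        elif inside < n_cameras and start is not None:
            spans.append((start, t))               # span closes
            start = None
    return spans
```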

The start and end points of viewing edges form the majority of the vertices in the final model. The remainder are triple points, where three camera silhouette cones intersect. Each viewing edge vertex must connect to two further vertices, one to the left and one to the right. Using the information recorded during the intersection of viewing lines and contour edges, the particular vertex instance that a left or right edge leads to can be determined. If this edge, when projected into any other camera image, intersects a contour edge, a triple point has been found.

Once all triple points have been located, and every vertex is connected to three other vertices via three edges, reconstruction of the model is complete. The polygons comprising the surface of the VH can now be created from the defined edges and the determined connectivity. Each polygon derives its texture from the camera whose principal ray is closest to parallel with the polygon's surface normal while pointing in the opposite direction.
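That camera choice amounts to minimising the dot product between the polygon's unit normal and each camera's unit principal ray (a sketch; both vectors are assumed pre-normalised):

```python
def texture_camera(poly_normal, principal_rays):
    """Index of the camera whose principal ray points most nearly
    opposite the polygon normal, i.e. the most negative dot product."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return min(range(len(principal_rays)),
               key=lambda i: dot(poly_normal, principal_rays[i]))
```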

4 EVALUATION The implementation was tested on an Apple Mac Pro with 2 × 2.8 GHz quad-core CPUs, 18 GB RAM and an NVIDIA GTX 285 GPU. Figure 2 shows the overall reconstruction time of both algorithms for a range of camera image resolutions and camera counts. For low resolutions and low camera counts the CPU algorithm completes more quickly, yet as resolution increases and more cameras are added, the GPU-accelerated algorithm outperforms the CPU version; for high camera counts and resolutions an order-of-magnitude performance increase is achieved.

Figure 2. Overall reconstruction time

5 CONCLUSION AND FUTURE WORK The results show that as image resolution and camera count increase, so does the acceleration offered by the GPU. Interactive frame rates are achieved over a wider range of camera resolutions and camera counts than with the non-accelerated algorithm. The approach therefore offers a scalable and efficient method for capturing three-dimensional models in real time, suitable for virtual reality applications such as telepresence.

Only part of the EPVH algorithm has been implemented for parallel execution on the GPU; implementing the remaining steps could achieve further acceleration, and this forms the basis for future work.

ACKNOWLEDGEMENTS This research was supported by EPSRC and OMG Vicon.

REFERENCES [1] B. Baumgart. A polyhedron representation for computer vision. AFIPS '75: Proceedings of the National Computer Conference and Exposition, 1975

[2] A. Laurentini. The visual hull concept for silhouette-based image understanding. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(2), pp. 150-162, 1994

[3] D. Knoblauch and F. Kuester. Focused Volumetric Visual Hull with Color Extraction. Advances in Visual Computing, pp. 208-217, 2009

[4] S. Zhang. DreamWorld: CUDA-accelerated real-time 3D modeling system. VECIMS. IEEE International Conference on, pp.168-173, 2009

[5] J. Franco and E. Boyer. Exact polyhedral visual hulls. British Machine Vision Conference, 2005

[6] W. Matusik et al. Polyhedral visual hulls for real-time rendering. Eurographics Workshop on Rendering, pp. 115-125, 2001

[7] J. Franco and E. Boyer. Efficient polyhedral modeling from silhouettes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(3), pp. 414-427, 2009

[8] Petit et al. Grimage: 3D modeling for remote collaboration and telepresence. Proceedings of the ACM VRST, 2008
