1
[email protected] 2009 CUDA Accelerated Sparse Field Level Set Segmentation of Large Medical Data Sets Mike Roberts, Jeff Packer, J Ross Mitchell, Mario Costa Sousa University of Calgary, Calgary, Alberta, Canada 1 Introduction Image segmentation is an important task in diagnostic medicine Level set segmentation techniques have been shown to improve the accuracy of difficult segmentation tasks Previous level set segmentation techniques are computationally expensive even when running on the GPU Long computation times limit clinical utility 2 Sparse Field Algorithm 9x speedup compared to previous GPU algorithms 16x fewer computations compared to previous GPU algorithms No reduction in segmentation accuracy Performed entirely on the GPU without requiring costly GPU-to-CPU message passing Number of active voxels approaches 0 as the segmentation converges 1. Shattuck D W, Prasad G, Mirza M, Narr K L, Toga A W (2009) Online Resource for Validation of Brain Segmentation Methods. NeuroImage 45(2), 2009. 431-439 2. Whitaker R T (1994) Volumetric Deformable Models: Active Blobs. Visualization in Biomedical Computing 1994. 122–134 3. Lefohn A E, Cates J E, Whitaker R E (2003) Interactive, GPU-Based Level Sets for 3D Segmentation. MICCAI 2003. 564–572 4. Cates J E, Lefohn A E, Whitaker R T (2004) GIST: An Interactive, GPU-Based Level-Set Segmentation Tool for 3D Medical Images. Medical Image Analysis 8(3), 2004. 217-231 5. Lefohn A E, Kniss J E, Hansen, Whitaker R E (2003) Interactive Deformation and Visualization of Level Set Surfaces Using Graphics Hardware. IEEE Visualization 2003. 75-82 6. Sethian J A (1999) Level Set Methods and Fast Marching Methods: Evolving Interfaces in Computational Geometry, Fluid Mechanics, Computer Vision, and Materials Science. Cambridge University Press, New York 7. Adalsteinson D, Sethian J A (1995) A Fast Level Set Method for Propagating Interfaces. Journal of Computational Physics 8(2), 1995. 269– 277 8. Lefohn A E, Kniss J E, Hansen, Whitaker R E (2004) A Streaming Narrow-Band Algorithm: Interactive Computation and Visualization of Level Sets. IEEE Transactions on Visualization and Computer Graphics 10(4), 2004. 422-433 9. Park S J, Hong H, Shin Y G (2007) Interactive Segmentation and Visualization using Level-Set Method based on Graphics Hardware. IFMBE Proc. 14(4), 2007. 2603-2606 10. Klar O (2007) Interactive GPU based Segmentation of Large Medical Volume Data with Level-Sets. 11th Central European Seminar on Computer Graphics 2007. 11. Whitaker, R (1998) A Level-Set Approach to 3D Reconstruction from Range Data. International Journal of Computer Vision 29(3), 1998. 203–231 12. Peng, D, Merriman, B, Osher, S, Zhao, H, and Kang, M (1999). A PDE Based Fast Local Level Set Method. Journal of Computational Physics. 155(2), 1999. 410–438 13. CUDPP: CUDA Data Parallel Primitives Library, http://www.gpgpu.org/developer/cudpp/ 14. BrainWeb: Simulated Brain Database, http://www.bic.mni.mcgill.ca/brainweb/ 15. Cocosco C A, Kollokian V, Kwan R K-S, Evans A C (1997) BrainWeb: Online Interface to a 3D MRI Simulated Brain Database. NeuroImage 5(4), 1997. 425 3 CUDA Implementation Progression of our algorithm in 3D while segmenting the white and grey matter in a 256³ MRI with 9% noise and 40% intensity non-uniformity. Total computation time is 11 seconds. Progression of our algorithm for a single 2D slice while segmenting the white matter in the MRI above. The segmented region is shown in green. Active voxels are shown in red. Less than 3% of voxels are active at any time during the segmentation. Performance of our algorithm, the GPU narrow band algorithm and the CPU sparse field algorithm while segmenting the white and grey matter in a 256³ MRI with 9% noise and 40% intensity non- uniformity. Lower is better for all graphs. All algorithms labeled 98% of voxels correctly. Total computation time for the CPU sparse field algorithm is not shown to scale. All benchmarks were performed using an Intel 2.5 GHz Xeon CPU and an Nvidia GTX 280 GPU. Various 2D slices of the final white matter segmentation shown above. The final segmented region is shown in green. Active voxels are shown in red. The segmentation is deemed to have converged since there are very few active voxels. 6 References 4 Results 5 Conclusions We present a novel CUDA accelerated level set segmentation algorithm with significantly improved performance over previous algorithms. 360x speedup compared to previous CPU algorithms 16x reduction in the size of the computational domain compared to previous GPU algorithms 9x speedup compared to previous GPU algorithms No reduction in segmentation accuracy Processes the minimal set of voxels at each time step Number of active voxels approaches 0 as the segmentation converges Tracks the active computational domain at the granularity of individual voxels instead of in large tiles Iterate in parallel over every voxel in a level set field buffer. Initialize every voxel based on the user specified seed region. Performed entirely on the GPU without requiring costly GPU-to-CPU message passing No Initialize level set field based on the user specified seed region Identify active voxels based on the spatial gradient of the level set field For all active voxels, update level set field Segmentation converged Yes For all active voxels, update the active set based on the spatial gradient and the temporal derivative of the level set field previous level set field current level set field changing voxels neighbors of changing voxels current level set field voxels with non-zero spatial gradients new active set intersection data seed region seed region voxels with non-zero spatial gradients level set field before update level set field after update final level set field Is the active set empty? Iterate in parallel over every voxel in the level set field buffer to identify active voxels. If a voxel is deemed active, write its coordinates to an auxiliary buffer using the coordinates themselves as array indices. During each time step, iterate in parallel over the dense buffer of active voxel coordinates. Update the level set field at those coordinates. Compact the auxiliary buffer in parallel using Nvidia’s CUDPP library to get a dense buffer containing the coordinates of each active voxel. During each time step, iterate in parallel over the dense list of active voxel coordinates again to identify new active voxels. Write the coordinates of all the new active voxels to the auxiliary buffer. Compact the auxiliary buffer as above to get a dense buffer of active voxel coordinates for the next time step. 2106 4877 294 Total Processed Voxels (millions) Computation Time 0 0.5 1 1.5 2 2.5 3 0 500 1000 1500 2000 2500 Processed Voxels (millions) Iteration Number Processed Voxels Per Iteration 0 10 20 30 40 50 60 0 500 1000 1500 2000 2500 Computation Time (milliseconds) Iteration Number Computation Time Per Iteration 4865 101 11 Total Computation Time (seconds) CPU Sparse Field GPU Narrow Band (OpenGL) GPU Sparse Field (CUDA) 0 seconds 11 seconds 2 seconds 4 seconds 8 seconds Evolving Segmentation in 3D Evolving Segmentation in 2D Final Segmentation in 2D 0 seconds 4 seconds 9 seconds 11 seconds 12 seconds

CUDA Accelerated Sparse Field Level Set Segmentation of ...€¦ · 6. Sethian J A (1999) Level Set Methods and Fast Marching Methods: Evolving Interfaces in Computational Geometry,

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: CUDA Accelerated Sparse Field Level Set Segmentation of ...€¦ · 6. Sethian J A (1999) Level Set Methods and Fast Marching Methods: Evolving Interfaces in Computational Geometry,

[email protected] 2009

CUDA Accelerated Sparse Field Level Set Segmentation of Large Medical Data SetsMike Roberts, Jeff Packer, J Ross Mitchell, Mario Costa SousaUniversity of Calgary, Calgary, Alberta, Canada

(c)

1 Introduction• Image segmentation is an important task in diagnostic medicine• Level set segmentation techniques have been shown to improve the accuracy of difficult segmentation tasks• Previous level set segmentation techniques are computationally expensive even when running on the GPU• Long computation times limit clinical utility

2 Sparse Field Algorithm

• 9x speedup compared to previous GPU algorithms• 16x fewer computations compared to previous GPU algorithms• No reduction in segmentation accuracy• Performed entirely on the GPU without requiring costly GPU-to-CPU message passing• Number of active voxels approaches 0 as the segmentation converges

1. Shattuck D W, Prasad G, Mirza M, Narr K L, Toga A W (2009) Online Resource for Validation of Brain Segmentation Methods. NeuroImage 45(2), 2009. 431-4392. Whitaker R T (1994) Volumetric Deformable Models: Active Blobs. Visualization in Biomedical Computing 1994. 122–1343. Lefohn A E, Cates J E, Whitaker R E (2003) Interactive, GPU-Based Level Sets for 3D Segmentation. MICCAI 2003. 564–5724. Cates J E, Lefohn A E, Whitaker R T (2004) GIST: An Interactive, GPU-Based Level-Set Segmentation Tool for 3D Medical Images. Medical Image Analysis 8(3), 2004. 217-2315. Lefohn A E, Kniss J E, Hansen, Whitaker R E (2003) Interactive Deformation and Visualization of Level Set Surfaces Using Graphics Hardware. IEEE Visualization 2003. 75-826. Sethian J A (1999) Level Set Methods and Fast Marching Methods: Evolving Interfaces in Computational Geometry, Fluid Mechanics, Computer Vision, and Materials Science. Cambridge University Press, New York7. Adalsteinson D, Sethian J A (1995) A Fast Level Set Method for Propagating Interfaces. Journal of Computational Physics 8(2), 1995. 269–2778. Lefohn A E, Kniss J E, Hansen, Whitaker R E (2004) A Streaming Narrow-Band Algorithm: Interactive Computation and Visualization of Level Sets. IEEE Transactions on Visualization and Computer Graphics 10(4), 2004. 422-433

9. Park S J, Hong H, Shin Y G (2007) Interactive Segmentation and Visualization using Level-Set Method based on Graphics Hardware. IFMBE Proc. 14(4), 2007. 2603-260610. Klar O (2007) Interactive GPU based Segmentation of Large Medical Volume Data with Level-Sets. 11th Central European Seminar on Computer Graphics 2007.11. Whitaker, R (1998) A Level-Set Approach to 3D Reconstruction from Range Data. International Journal of Computer Vision 29(3), 1998. 203–23112. Peng, D, Merriman, B, Osher, S, Zhao, H, and Kang, M (1999). A PDE Based Fast Local Level Set Method. Journal of Computational Physics. 155(2), 1999. 410–43813. CUDPP: CUDA Data Parallel Primitives Library, http://www.gpgpu.org/developer/cudpp/14. BrainWeb: Simulated Brain Database, http://www.bic.mni.mcgill.ca/brainweb/15. Cocosco C A, Kollokian V, Kwan R K-S, Evans A C (1997) BrainWeb: Online Interface to a 3D MRI Simulated Brain Database. NeuroImage 5(4), 1997. 425

• Performed entirely on the GPU without requiring GPU-to-CPU message passing• Tracks the active computational domain at the granularity of individual voxels instead of in large tiles

3 CUDA Implementation

Progression of our algorithm in 3D while segmenting the white and grey matter in a 256³ MRI with 9% noise and 40% intensity non-uniformity. Total computation time is 11 seconds.

Progression of our algorithm for a single 2D slice while segmenting the white matter in the MRI above. The segmented region is shown in green. Active voxels are shown in red. Less than 3% of voxels are active at any time during the segmentation.

Performance of our algorithm, the GPU narrow band algorithm and the CPU sparse field algorithm while segmenting the white and grey matter in a 256³ MRI with 9% noise and 40% intensity non-uniformity. Lower is better for all graphs. All algorithms labeled 98% of voxels correctly. Total computation time for the CPU sparse field algorithm is not shown to scale. All benchmarks were

performed using an Intel 2.5 GHz Xeon CPU and an Nvidia GTX 280 GPU.

Various 2D slices of the final white matter segmentation shown above. The final segmented region is shown in green. Active voxels are shown in red. The segmentation is deemed to have converged since there are very few active voxels.

6 References

4 Results

5 Conclusions

We present a novel CUDA accelerated level set segmentation algorithm with significantly improved performance over previous algorithms.

• 360x speedup compared to previous CPU algorithms• 16x reduction in the size of the computational domain compared to previous GPU algorithms• 9x speedup compared to previous GPU algorithms• No reduction in segmentation accuracy

• Processes the minimal set of voxels at each time step• Number of active voxels approaches 0 as the segmentation converges• Tracks the active computational domain at the granularity of individual voxels instead of in large tiles

Iterate in parallel over every voxel in a level set field buffer. Initialize every voxel based on the user specified

seed region.

• Performed entirely on the GPU without requiring costly GPU-to-CPU message passing

No

Initialize level set field based on the user specified seed region

Identify active voxels based on the spatial gradient of the level set field

For all active voxels, update level set field

Segmentation converged

Yes

For all active voxels, update the active set based on the spatial gradient and the temporal derivative of the level set field

previous level set field

current level set field

changing voxels

neighbors of changing

voxels

current level set field

voxels with non-zero spatial

gradients

new active set

∩intersection

data seed region seed region voxels with non-zero spatial gradients

level set field before update

level set field after update

final level set field

Is the active set empty?

Iterate in parallel over every voxel in the level set field buffer to identify active voxels. If a voxel is

deemed active, write its coordinates to an auxiliary buffer using the coordinates themselves as array

indices.

During each time step, iterate in parallel over the dense buffer of active voxel coordinates. Update the level set field at

those coordinates.

Compact the auxiliary buffer in parallel using Nvidia’s CUDPP library to get a dense buffer containing the

coordinates of each active voxel.

During each time step, iterate in parallel over the dense list of active voxel coordinates again to identify new active voxels. Write the coordinates of all the new active voxels to the auxiliary buffer. Compact the auxiliary buffer as above

to get a dense buffer of active voxel coordinates for the next time step.

0102030405060

0 500 1000 1500 2000 2500

Compu

tatio

n Time 

(millisecon

ds)

Iteration Number

Computation Time Per Iteration

00.51

1.52

2.53

0 500 1000 1500 2000 2500Processed Vo

xels (m

illions)

Iteration Number

Processed Voxels Per Iteration

2106

4877

294

Total Processed Voxels (millions)

4865

101

11

Total Computation Time (seconds)

CPU Sparse Field GPU Narrow Band (OpenGL) GPU Sparse Field (CUDA)

0102030405060

0 500 1000 1500 2000 2500

Compu

tatio

n Time 

(millisecon

ds)

Iteration Number

Computation Time Per Iteration

00.51

1.52

2.53

0 500 1000 1500 2000 2500Processed Vo

xels (m

illions)

Iteration Number

Processed Voxels Per Iteration

2106

4877

294

Total Processed Voxels (millions)

4865

101

11

Total Computation Time (seconds)

CPU Sparse Field GPU Narrow Band (OpenGL) GPU Sparse Field (CUDA)

0102030405060

0 500 1000 1500 2000 2500

Compu

tatio

n Time 

(millisecon

ds)

Iteration Number

Computation Time Per Iteration

00.51

1.52

2.53

0 500 1000 1500 2000 2500Processed Vo

xels (m

illions)

Iteration Number

Processed Voxels Per Iteration

2106

4877

294

Total Processed Voxels (millions)

4865

101

11

Total Computation Time (seconds)

CPU Sparse Field GPU Narrow Band (OpenGL) GPU Sparse Field (CUDA)

0102030405060

0 500 1000 1500 2000 2500

Compu

tatio

n Time 

(millisecon

ds)

Iteration Number

Computation Time Per Iteration

00.51

1.52

2.53

0 500 1000 1500 2000 2500Processed Vo

xels (m

illions)

Iteration Number

Processed Voxels Per Iteration

2106

4877

294

Total Processed Voxels (millions)

4865

101

11

Total Computation Time (seconds)

CPU Sparse Field GPU Narrow Band (OpenGL) GPU Sparse Field (CUDA)

0102030405060

0 500 1000 1500 2000 2500

Compu

tatio

n Time 

(millisecon

ds)

Iteration Number

Computation Time Per Iteration

00.51

1.52

2.53

0 500 1000 1500 2000 2500Processed Vo

xels (m

illions)

Iteration Number

Processed Voxels Per Iteration

2106

4877

294

Total Processed Voxels (millions)

4865

101

11

Total Computation Time (seconds)

CPU Sparse Field GPU Narrow Band (OpenGL) GPU Sparse Field (CUDA)

0 seconds 11 seconds2 seconds 4 seconds 8 seconds

Evolving Segmentation in 3D

Evolving Segmentation in 2D

Final Segmentation in 2D

0 seconds 4 seconds 9 seconds 11 seconds 12 seconds