
Saoni Mukherjee, Nicholas Moore, James Brock, Miriam Leeser · on-demand.gputechconf.com/gtc/2013/poster/pdf/P...


Category: Medical Imaging & Visualization
Poster: MI02
Contact name: Saoni Mukherjee, saoni@coe.neu.edu

More information and software available: http://www.coe.neu.edu/Research/rcl//projects/CBCT.php

[Figure labels: X-ray source; filtered projections; reconstructed 3D volume (F)]

Saoni Mukherjee, Nicholas Moore, James Brock, Miriam Leeser

Department of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115, USA

Biomedical image reconstruction applications with large datasets can benefit from acceleration. Graphics Processing Units (GPUs) are particularly useful in this context because they can produce high-fidelity images rapidly. An algorithm that reconstructs cone-beam computed tomography (CT) volumes from two-dimensional projections is implemented on GPUs. The implementation takes slices of the target, weights the projection data, filters the weighted data, and then backprojects the filtered data to create the final three-dimensional reconstruction. It is implemented on two types of hardware: a CPU, and a heterogeneous system combining a CPU and a GPU. The CPU codes in C and MATLAB are compared with the heterogeneous versions written in CUDA-C and OpenCL. Relative performance is evaluated on a mathematical phantom as well as on mouse data. Speedups of over thirty times using the GPU are seen for phantom data and close to fifty times for the larger mouse datasets.

What is CT Scanning?

[Figure: CT scanning pipeline with labels: data; sinogram (a line for every angle); reconstruction routine; reconstructed cross-sectional slice; 3D reconstructed volume]

Advantages: i) reduced X-ray exposure; ii) image accuracy (more accurate than MRI).
Disadvantage: reconstructing the volume takes a long time, which interrupts treatment and diagnosis.

[Equation labels: coordinates; weight value]

[Figure: bar chart of backprojection time and total time, in seconds (axis 0 to 9000), with categories MATLAB and C. Data labels: 2hr 20m 40s; 2hr 20m 43s; 1hr 32m 36s; 1h 32m 39s]

Mathematical phantom. Input: 64 × 60 pixels with 72 projections. Final volume: 64 × 60 × 50 voxels.

Host | Device | Language
Intel Core i7 quad-core processor @ 3.4 GHz | - | MATLAB, MATLAB PCT
Intel Xeon W3580 quad-core processor @ 3.33 GHz | NVIDIA Tesla C2070 | C, C with OpenMP, CUDA
Intel Xeon CPUs E5520 @ 2.27 GHz | AMD Radeon HD5870 | OpenCL

Mouse scan. Input: 512 × 768 pixels with 361 projections. Final volume: 512 × 512 × 768 voxels.
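The two dataset sizes can be turned into rough work estimates. The sketch below counts voxels and voxel updates (one per voxel per projection) and estimates the volume's memory footprint. The 4-byte float voxel is an assumption, since the poster does not state the voxel type, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of voxels in an nx x ny x nz volume. */
uint64_t voxel_count(uint64_t nx, uint64_t ny, uint64_t nz)
{
    assert(nx > 0 && ny > 0 && nz > 0);
    return nx * ny * nz;
}

/* Voxel updates in backprojection: every voxel accumulates one sample
 * from every projection. */
uint64_t backprojection_updates(uint64_t voxels, uint64_t projections)
{
    return voxels * projections;
}

/* Bytes needed to hold the volume, assuming 4-byte float voxels
 * (an assumption of this sketch). */
uint64_t volume_bytes(uint64_t voxels)
{
    return voxels * (uint64_t)sizeof(float);
}
```

With the sizes above, the phantom has 192,000 voxels and 13,824,000 updates, while the mouse scan has 201,326,592 voxels and 72,678,899,712 updates, roughly 5,000 times more work. Under the float assumption the mouse volume alone occupies 768 MiB.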

Kernel timings (milliseconds)
Step | NVIDIA timings | AMD timings
Weighting | 2.25 | 14.7
Filtering | 89.62 | 123.23
Backprojection | 14.07 | 19.68

Time (sec) | MATLAB | C | C+OpenMP | OpenCL-NVIDIA | OpenCL-AMD | CUDA
Backprojection time | 17.02 | 1.36 | 0.32 | 0.01 | 0.1 | 0.01
Total time | 17.09 | 1.44 | 0.33 | 0.11 | 0.16 | 0.1

[Figure: timing chart data labels (backprojection time and total time), in extraction order: 32m 9s; 1h 14m 37s; 1h 14m 43s; 32m 12s; 42s; 2h 20m 43s; 1m 31s; 1m 7s; 2h 20m 40s; 55s]

Speedups, mathematical phantom
Programming Paradigm | Speedup over single threaded MATLAB | Speedup over single threaded C | Speedup over multi-threaded C
C with OpenMP | 50x | 4x | -
OpenCL (NVIDIA) | 1700x | 136x | 32x
OpenCL (AMD) | 170x | 13x | 3x
CUDA | 1700x | 136x | 32x

Speedups, mouse scan
Programming Paradigm | Speedup over single threaded MATLAB | Speedup over multi-threaded MATLAB | Speedup over single threaded C | Speedup over multi-threaded C
MATLAB PCT | 1.5x | - | - | -
C with OpenMP | 4x | - | 2x | -
OpenCL (NVIDIA) | 125x | 80x | 70x | 30x
CUDA | 200x | 130x | 100x | 45x
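The speedup figures are ratios of a baseline time to an accelerated time. A minimal sketch of that arithmetic follows. The helper names are hypothetical, and pairing a particular chart label with a particular implementation is an assumption wherever the chart did not survive extraction.

```c
#include <assert.h>

/* Speedup of an accelerated run over a baseline run (same time units). */
double speedup(double baseline_time, double accelerated_time)
{
    assert(baseline_time > 0.0 && accelerated_time > 0.0);
    return baseline_time / accelerated_time;
}

/* Convert an "h m s" reading from a timing chart to seconds. */
double hms_to_seconds(int hours, int minutes, int seconds)
{
    return 3600.0 * hours + 60.0 * minutes + seconds;
}
```

For example, assuming the 17.02 s label is the MATLAB backprojection time and 0.01 s the CUDA one, `speedup(17.02, 0.01)` is about 1702, in line with the reported 1700x. Likewise, `speedup(hms_to_seconds(2, 20, 40), 42.0)` is about 201, in line with the reported 200x for CUDA over single-threaded MATLAB.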

Future work
1) The next bottleneck is weighted filtering, which was not a bottleneck earlier.
2) Test more configurations with auto-tuning: number of kernels to launch and number of threads.
3) Streaming for bigger datasets.
4) Overlapping computation and communication.
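One possible way to approach streaming is to split the volume into slabs of consecutive z-slices that each fit in a device memory budget, then process the slabs one after another. The sketch below only computes such a partition. It is an illustration under stated assumptions (float voxels, slabs cut along z), not a description of the authors' plan, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of z-slices per slab so that one slab of float voxels fits in
 * budget_bytes. Returns 0 if even a single slice does not fit. */
size_t slices_per_slab(size_t nx, size_t ny, size_t nz, size_t budget_bytes)
{
    assert(nx > 0 && ny > 0 && nz > 0);
    size_t slice_bytes = nx * ny * sizeof(float);
    size_t n = budget_bytes / slice_bytes;
    return n < nz ? n : nz;
}

/* Number of slabs needed to cover nz slices (ceiling division). */
size_t slab_count(size_t nz, size_t per_slab)
{
    assert(per_slab > 0);
    return (nz + per_slab - 1) / per_slab;
}
```

For a 512 × 512 × 768 volume and a 256 MiB budget, each slice takes 1 MiB, so a slab holds 256 slices and the volume splits into 3 slabs. Because the slabs are independent, the transfer of one slab could in principle overlap with computation on another.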

References
[1] S. Mukherjee, N. Moore, J. Brock, M. Leeser, CUDA and OpenCL Implementations of 3D CT Reconstruction for Biomedical Imaging, Proc. of IEEE High Performance Extreme Computing, (2012).
[2] L. A. Feldkamp, L. C. Davis, J. W. Kress, Practical cone-beam algorithm, J. Opt. Soc. Am. A, Vol. 1, (1984).
[3] F. Xu, K. Mueller, Real-time 3D computed tomographic reconstruction using commodity graphics hardware, Physics in Medicine and Biology, 52(12), (2007).
[4] F. Ino, S. Yoshida, K. Hagihara, RGBA Packing for Fast Cone Beam Reconstruction on the GPU, Proc. of SPIE, Vol. 7258, (2009).
[5] NVIDIA Corporation, NVIDIA CUDA C Programming Guide, CUDA Toolkit 4.1.
[6] Fessler's image reconstruction toolbox, http://www.eecs.umich.edu/~fessler/irt/fessler.tgz.

Klaus Mueller, Introduction to Medical Imaging, Lecture 6: X-Ray Computed Tomography, Computer Science Department, Stony Brook University

• Backprojection takes most of the time, but it is highly parallelizable.
• Different voxels are independent.
• Fessler's image reconstruction toolbox [6] implements Feldkamp CBCT in MATLAB. It is widely used in academia.
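The independence of voxels can be illustrated with a CPU backprojection loop in C with OpenMP. This is a simplified sketch, not the authors' kernel: it assumes unit voxel and pixel pitch, a virtual detector through the volume origin, nearest-neighbour sampling, caller-supplied cosine and sine tables, and no final normalization. The `Geometry` struct and `backproject` function are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Geometry for this sketch (all names hypothetical). Volume and detector
 * are centred on the origin; voxel and pixel pitch are 1. */
typedef struct {
    int nx, ny, nz;   /* volume size in voxels */
    int nu, nv;       /* projection size in pixels */
    int nproj;        /* number of projections K */
    double d;         /* source-to-origin distance, larger than the volume */
} Geometry;

/* Backproject K filtered projections q (nproj * nv * nu floats) into the
 * volume f (nz * ny * nx floats). cos_t[n] and sin_t[n] hold the cosine
 * and sine of projection angle n. */
void backproject(float *f, const float *q, const double *cos_t,
                 const double *sin_t, const Geometry *g)
{
    assert(f && q && cos_t && sin_t && g && g->d > 0.0);
    /* Each voxel writes only its own output element, so slices can run in
     * parallel. Without OpenMP enabled the pragma is simply ignored. */
    #pragma omp parallel for
    for (int k = 0; k < g->nz; ++k) {
        double z = k - 0.5 * (g->nz - 1);
        for (int j = 0; j < g->ny; ++j) {
            double y = j - 0.5 * (g->ny - 1);
            for (int i = 0; i < g->nx; ++i) {
                double x = i - 0.5 * (g->nx - 1);
                double sum = 0.0;
                for (int n = 0; n < g->nproj; ++n) {
                    double ratio = g->d /
                        (g->d - x * cos_t[n] - y * sin_t[n]);
                    double u = (-x * sin_t[n] + y * cos_t[n]) * ratio;
                    double v = z * ratio;
                    /* Shift to pixel indices; +0.5 rounds to nearest. */
                    double fu = u + 0.5 * (g->nu - 1) + 0.5;
                    double fv = v + 0.5 * (g->nv - 1) + 0.5;
                    if (fu < 0.0 || fu >= g->nu || fv < 0.0 || fv >= g->nv)
                        continue;   /* ray misses the detector */
                    size_t idx = ((size_t)n * g->nv + (size_t)fv) * g->nu
                                 + (size_t)fu;
                    sum += ratio * ratio * q[idx];
                }
                f[((size_t)k * g->ny + j) * g->nx + i] = (float)sum;
            }
        }
    }
}
```

The same structure carries over to a GPU by assigning one thread per voxel (or per column of voxels) instead of one OpenMP thread per slice.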

Sample Projections

3D CT Reconstruction

Weighted projection: the raw data are weighted and ramp filtered to produce filtered projections Q_1, Q_2, ..., Q_K, where Q_n is collected at angle θ_n and 1 ≤ n ≤ K.
d_i = distance between the volume origin and the source.
F(x, y, z) = value of voxel (x, y, z) in volume F.
The volume F is in xyz space and the projections are in uv space.
Backprojection: the volume F is reconstructed from the filtered projections using the Feldkamp backprojection equations [2].
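For reference, the backprojection step is commonly written in the form below. This is the textbook Feldkamp expression in the notation defined above, not a transcription of the poster's own equations. The normalization constant c, the virtual detector placed at the volume origin, and the sign conventions are assumptions.

```latex
\begin{aligned}
F(x, y, z) &= c \sum_{n=1}^{K} W(x, y, n)\, Q_n\bigl(u(x, y, n),\, v(x, y, z, n)\bigr) \\
u(x, y, n) &= \frac{d_i\,(-x \sin\theta_n + y \cos\theta_n)}{d_i - x \cos\theta_n - y \sin\theta_n} \\
v(x, y, z, n) &= \frac{d_i\, z}{d_i - x \cos\theta_n - y \sin\theta_n} \\
W(x, y, n) &= \left(\frac{d_i}{d_i - x \cos\theta_n - y \sin\theta_n}\right)^{2}
\end{aligned}
```

Here u and v are the detector coordinates onto which voxel (x, y, z) projects at angle θ_n, and W is the distance weight applied to the sampled value.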

Poster sections: Abstract; Motivation; Feldkamp Algorithm; Our approach; Architectures and Languages used; Results (phantom, mouse scan)