GPU-accelerated method for determining animal bone density from computed tomography (CT) images

Michael C. Oliveira
Department of Bioengineering, University of California Riverside

EE 217 – GPU Architecture and Parallel Programming
Mar 22, 2013


Computed Tomography (CT)

Image from: Obenaus lecture 1/12/11, BPBE 510, ‘Computed Tomography’, Slide 5

Resulting CT Image

http://www.columbusimaging.com/Brilliance_CT_3.jpg

Typical CT Scanner

Colormap for CT Images:
Black: low X-ray attenuation, lower signal intensity
White: high X-ray attenuation, higher signal intensity

Signal Intensity ~ Hounsfield Units and Attenuation Coefficient

http://radiographics.rsna.org/content/22/4/949/F7.medium.gif

Source-Detector Geometry

Image Processing Algorithms (from Detector data)

Previous Work

[Figure: bar chart "Animal Bone Densities"; x-axis: Animals (Cow, Pig, Fish, Chicken); y-axis: Bone Density (g/cm3), 0 to 3]

Animal     Ref. Mean BD*   Calc. Mean BD*
¹Cow       2.1-2.2         2.547 +/- 0.4844
¹Pig       2.0-2.1         1.910 +/- 0.4729
Fish       ND              2.526 +/- 0.3747
¹Chicken   2.1-2.2         2.017 +/- 0.7940

*Density in g cm⁻³

1 Aerssens et al. “Interspecies Differences in Bone Composition, Density and Quality: Potential Implications for in Vivo Bone Research.” Endocrinology. 139(2): 663-670. (1998)

Question: Can we use GPUs to accelerate bone density calculations from CT images?

Hardware/Software Setup

Lenovo U410 Ultrabook

CPU                                                    GPU
Intel Core i5-3317U @ 1.70 GHz                         NVIDIA GeForce 610M
8.00 GB RAM                                            1 GB DRAM
Dual-core CPU                                          1 SM x 48 CUDA cores
Programmed with MATLAB and C++ in Visual Studio 2008   Programmed with CUDA 4.2 in Visual Studio

CT images are 512 x 512 pixels with 16-bit encoding

Hybrid Program Flowchart (columns: CPU, GPU)

Host and device memory allocations
Read image and image mask data
Copy image and image mask to GPU
Load image and image mask into shared memory
Compute density per pixel and apply mask
Parallel reduction sum per block
Write block sums to global memory
Copy block sums to CPU
Compute average density from block sums

ROIs scaled to Hounsfield Units
Hounsfield units converted to Atten. Coeff.
Atten. Coeff. converted to Mass Atten. Coeff.
Density solved for each pixel in ROI

• Scale the pixel intensities to the Hounsfield scale

$$\frac{CTimage(x,y,z) - \min(CTimage)}{\max(CTimage) - \min(CTimage)} = \frac{rscValHU - \min(rscValHU)}{\max(rscValHU) - \min(rscValHU)}$$

• Conversion from Hounsfield units to attenuation coefficient

$$HU = 1000 \times \frac{\mu_{pixel} - \mu_{water}}{\mu_{water}}$$

• Solve for density using the relationship between attenuation coefficient and mass attenuation coefficient

$$\rho_{pixel} = \frac{\mu_{pixel}}{\mu_{mass,\,cort.\,bone}}$$


²μ_water = 0.1893 cm²/g @ 75 keV

²μ_mass,cort.bone = 0.2526 cm²/g @ 75 keV

CT Scale: [0, 65535]
¹HU Scale: [-1000, 3000]

1 Bushberg, Jerrold T. "Computed Tomography." The Essential Physics of Medical Imaging. Philadelphia: Lippincott Williams & Wilkins, 2002.
2 NIST Physical Measurement Laboratory, http://physics.nist.gov/PhysRefData/XrayMassCoef/ComTab/bone.html

Results

           Ref.      C++                  CUDA
Animal     BD*       BD*      Time (ms)   BD*      Time (ms)   Speedup
¹Cow       2.1-2.2   2.6718   5.134       2.6716   2.087       2.460x

*Density in g cm⁻³

1 Aerssens et al. “Interspecies Differences in Bone Composition, Density and Quality: Potential Implications for in Vivo Bone Research.” Endocrinology. 139(2): 663-670. (1998)

Code Block         Time (ms)
CPU --> GPU copy   0.544
CUDA Kernel        0.036
GPU --> CPU copy   1.512
Total              2.097

98% of the time is spent on data transfer
2% of the time is spent on computation

Timing done using QueryPerformanceCounter()
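For readers without the Windows API, a timing pattern of the same kind can be sketched in portable C++. This swaps in `std::chrono::steady_clock` for `QueryPerformanceCounter()`, so it is a stand-in and not the measurement code behind the numbers above; the helper name `elapsedMs` is hypothetical.

```cpp
#include <chrono>

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
// steady_clock is monotonic, which makes it suitable for interval timing.
template <typename F>
double elapsedMs(F&& work) {
    const auto t0 = std::chrono::steady_clock::now();
    work();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

When timing GPU work this way, the host has to wait for the device to finish before reading the second timestamp, otherwise only the launch overhead is measured.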

Summary

• Successfully implemented the bone density computation in CUDA and C++

• GPU computation shows a ~2.5x speedup compared to CPU-based computation for a single frame

• The GPU advantage may become more apparent when a larger number of images is processed