
2008 IEEE Nuclear Science Symposium Conference Record

Development of a Database Driven Statistical Quality Control Framework for Medical Imaging Systems

Xinhong Ding, IEEE, A. Hans Vija, IEEE, Johannes Zeintl, Student Member, IEEE, Aarti Kriplani

M10-486

Abstract - Phantom studies are typically the first tests performed when the performance of an imaging system needs to be verified or when a problem needs troubleshooting. It is not always clear whether the resulting image indicates a systematic problem with the system. Often, perceived artifacts are due only to residual calibration errors or noise (quantum and system noise), while the system is at nominal performance.

The goal is to develop a tool that facilitates image quality assessment by comparing the performance of a test system to a set of reference systems, for a comprehensive diagnosis of the image formation chain beyond NEMA [3].

In this paper, we report the development of a database-oriented statistical quality control framework for medical imaging systems. Central to this framework are three components: (a) MiQ, a phantom analysis tool [2], which processes data from various common phantoms and extracts quantitative image quality features, (b) a database which stores images and associated image quality data, and (c) a statistical inference engine which performs various statistical tests using information from the database.

To demonstrate feasibility, we randomly selected five new normal Symbia® SPECT-CT systems from the assembly line and performed three Tomo acquisitions on each using a Data Spectrum cylindrical phantom with hot and cold inserts. We also simulated one "abnormal" system by introducing an axial shift in some of the projection data acquired on that system. The Tukey-Kramer multiple comparison test is used to rank the five reference systems and to test the difference between the "abnormal" system and the reference systems.

Manuscript received November 14, 2008.

Dr. Xinhong Ding is with Siemens Medical Solutions USA, Inc., Molecular Imaging, Hoffman Estates, IL 60192, USA (e-mail: [email protected]).

Dr. A. Hans Vija is with Siemens Medical Solutions USA, Inc., Molecular Imaging, Hoffman Estates, IL 60192, USA (e-mail: [email protected]).

Johannes Zeintl is a PhD student with the University of Erlangen-Nuremberg, Institute of Pattern Recognition, Erlangen, Germany (e-mail: [email protected]).

Dr. Aarti Kriplani is with Siemens Medical Solutions USA, Inc., Molecular Imaging, Hoffman Estates, IL 60192, USA (e-mail: [email protected]).

I. INTRODUCTION

Image quality assessment is performed either to evaluate the system's performance and/or quality state, or, with a given system at a given quality state, to evaluate the information content of the image for a specific task.

Typically, phantom studies are the first experiments conducted for quality control (QC), to verify the performance of the imaging system or to troubleshoot a problem. However, it is not always clear whether the resulting image indicates a systematic problem with the system. Often, perceived artifacts are due only to residual calibration errors or to quantum and system noise.

A similar difficulty exists in the assessment of image quality when noise is inherent in the image. It is not always clear whether a difference in the observed images is caused by a deviation of the system (including its algorithms) or simply by fluctuation of the noise in the data. A comparison to a database is therefore preferable, to assess whether an image sample is consistent with the ensemble. The ensemble is either a digital database or the experience of an expert reader.

Medical imaging assessment is traditionally performed by human observers based on individual experience. While this is still necessary, the individual human observer has limitations in objective image quality assessment. One major limitation is the inability to correlate the observed image quality with historical data from the same scanner or from different scanners of the same type. In addition, different human observers may in many cases draw different conclusions from the same studies.

It is thus highly desirable to develop a tool that facilitates image quality assessment for a comprehensive diagnosis of the image formation chain beyond NEMA [3], and which takes into consideration the statistical nature of image quality assessment as exemplified by a collection of empirical reference data.

Toward this goal, we have developed a statistical image quality assessment framework that explores the knowledge drawn from an image quality (IQ) database. Figure 1 below illustrates the basic concept of this framework.

Figure 1. Concept of the image quality assessment framework.

As shown in Figure 1, this framework is designed for image quality assessment based on various phantoms. It has three main components:

(a) MiQ, a phantom analysis tool [2], which processes data from various common phantoms and extracts image quality features,

(b) an image quality database which stores images and associated image quality data, and

(c) a statistical inference engine which performs various statistical tests using information from the database (see the illustrative sketch below).
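The paper gives no implementation details for how these three components interact. Purely as an illustrative sketch (all class and function names below are hypothetical and not part of MiQ or syngo®), the data flow could be organized as follows:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class IQRecord:
    """One phantom acquisition together with its extracted quality features."""
    system_id: str                # scanner that produced the data
    phantom: str                  # e.g. "NEMA3", "EDC", "cylinder"
    features: Dict[str, float]    # e.g. {"hot_rod_contrast": 0.84}

class IQDatabase:
    """Component (b): stores image references and associated IQ features."""
    def __init__(self) -> None:
        self._records: List[IQRecord] = []

    def add(self, record: IQRecord) -> None:
        self._records.append(record)

    def samples(self, feature: str, system_id: str) -> List[float]:
        """All stored values of one feature for one system."""
        return [r.features[feature] for r in self._records
                if r.system_id == system_id and feature in r.features]

def compare_to_references(db: IQDatabase, test: IQRecord,
                          reference_ids: List[str], feature: str) -> dict:
    """Component (c): gather the test value and the reference ensemble;
    the actual multiple-comparison test is described later in the paper."""
    return {"test_value": test.features[feature],
            "reference_samples": {sid: db.samples(feature, sid)
                                  for sid in reference_ids}}
```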

II. DATABASE DRIVEN STATISTICAL FRAMEWORK

A. Phantom Analysis Tool

This tool analyzes several commonly used phantoms and has the following main functionalities:

- image registration (manual or automatic)
- automatic segmentation
- quantitative characterization of image features
- comparison to the image quality database

We now briefly list the phantoms and the corresponding quantitative features the tool extracts.

The NEMA 3 cylinder phantom (Figure 2) (Data Spectrum, Hillsborough, NC) was originally designed for PET acceptance testing with the NEMA standard [3], but it can also be used to evaluate the count rate, uniformity, attenuation compensation, and scatter compensation for SPECT.

Figure 2. Data Spectrum NEMA 3 cylinder phantom and the regions of interest (ROIs) generated by the MiQ phantom analysis tool.

For the NEMA 3 cylinder phantom, the MiQ phantom analysis tool first automatically locates the three cylinders corresponding to air, bone, and water, respectively, based on the segmentation done on the CT images. The tool then defines nine background regions as shown in Figure 2. Based on these regions of interest (ROIs), the air-to-background count ratio, the bone-to-background count ratio, and the water-to-background count ratio are calculated. In addition, the following positive and negative uniformity measures are also calculated, where the positive uniformity is defined as:

U_p = +100 \cdot \frac{\max\{C_1, \ldots, C_9\} - \bar{C}}{\bar{C}} \; (\%),    (1)

and the negative uniformity as:

U_n = -100 \cdot \frac{\bar{C} - \min\{C_1, \ldots, C_9\}}{\bar{C}} \; (\%),    (2)

where C_i is the average count of the ith background ROI and \bar{C} is the average of all C_i's.
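As a small numerical illustration of Eqs. (1) and (2) (not part of the paper), the two uniformity measures can be computed from the nine background ROI means as:

```python
import numpy as np

def uniformity(roi_means):
    """Positive/negative uniformity per Eqs. (1) and (2): percent deviation of
    the hottest / coldest background ROI mean from the overall mean."""
    c = np.asarray(roi_means, dtype=float)       # C_1, ..., C_9
    c_bar = c.mean()                             # average of all C_i
    u_pos = +100.0 * (c.max() - c_bar) / c_bar   # Eq. (1)
    u_neg = -100.0 * (c_bar - c.min()) / c_bar   # Eq. (2)
    return u_pos, u_neg

# Hypothetical background ROI mean counts for one acquisition:
print(uniformity([980, 1010, 995, 1005, 990, 1002, 998, 1015, 985]))
```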

The RMI 467 Electron Density CT (EDC) phantom (Figure 3), manufactured by Gammex (Middleton, Wisconsin, USA), contains various rods of different materials. It is mainly used in conjunction with a CT scanner to establish the relationship between the electron density of various tissues and their corresponding CT numbers. This phantom can be used to evaluate the image quality of the CT as well as the accuracy of the corresponding mu-map values at different energy levels.

Figure 3. Gammex EDC phantom and the segmentation generated by the MiQ phantom analysis tool.

For the EDC phantom, the MiQ phantom analysis tool first automatically segments the rods based on the CT images. It then identifies the material of each rod and records the corresponding HU values of these rods. The measured HU values are then transformed into the proper mu-map values for a given energy level.
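The paper does not specify the HU-to-mu transformation. As one common, simplified possibility (an assumption, not the authors' method), the defining relation of the Hounsfield scale can be inverted using the attenuation coefficient of water at the target energy; bone-like materials are usually handled with a piecewise (bilinear) model instead:

```python
def hu_to_mu(hu: float, mu_water: float) -> float:
    """Invert HU = 1000 * (mu - mu_water) / mu_water to obtain mu in 1/cm.
    mu_water must be the water attenuation coefficient at the emission energy."""
    return mu_water * (1.0 + hu / 1000.0)

# Example: mu_water at 140 keV (Tc-99m) is roughly 0.15 1/cm.
print(hu_to_mu(0.0, 0.15))      # water       -> 0.15
print(hu_to_mu(-1000.0, 0.15))  # air         -> 0.0
print(hu_to_mu(60.0, 0.15))     # soft tissue -> ~0.159
```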

Figure 4 shows the cardiac torso phantom manufactured by Data Spectrum together with the 17-segment polar map generated by the MiQ phantom analysis tool. To generate this polar map, the tool must first segment the myocardium. This is achieved through a proper 3D registration of the phantom data with a reference synthetic heart model that has a known orientation and other known myocardial parameters. Based on the segmented myocardium, the MiQ phantom analysis tool can generate several quantitative image features, such as the uniformity of the polar map, the wall thickness of the myocardium, and the volume of the myocardium [2].



Figure 4. Data Spectrum cardiac torso phantom and the 17-segment polar map generated by the MiQ phantom analysis tool.

In addition to the static cardiac torso, the MiQ phantom analysis tool is also capable of processing the dynamic cardiac torso phantom (Figure 5) manufactured by Data Spectrum. This dynamic phantom is mainly used to assess image quality for gated cardiac studies. The MiQ phantom analysis tool uses the same technique described above for the static cardiac torso phantom to locate the myocardium in each gate. The tool then estimates the volume of the myocardium in each gate using a curve-fitting technique, to overcome the problem that the image under consideration may not be contiguous, and derives the ejection fraction from the estimated volumes (Figure 5).

Figure 5. Data Spectrum dynamic cardiac torso phantom and the ejection fraction curve generated by the MiQ phantom analysis tool.

Figure 6 below illustrates the idea of the curve-fitting method used to estimate the missing part of a circular short-axis heart slice.

Figure 6. Left: short-axis view of one slice of the cardiac phantom data. Right: curve fitting to estimate the radius of the circle.

Let (x_i, y_i), i = 0, ..., n, be the set of available data points (the left image in Figure 6). The objective of the curve fitting is to estimate the radius of the circle that best fits these data. The least-squares estimate of this radius is

\hat{r} = \frac{\sum_i (x_i - x_0)\cos\theta_i + \sum_i (y_i - y_0)\sin\theta_i}{n}, \qquad \theta_i = \tan^{-1}\!\left(\frac{y_i - y_0}{x_i - x_0}\right),    (3)

where (x_0, y_0) is the center of the circle.
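A minimal numerical sketch of this radius estimate (assuming the slice center (x_0, y_0) is already known from the segmentation; an illustration only, not the MiQ implementation) is:

```python
import numpy as np

def fit_radius(x, y, x0, y0):
    """Least-squares circle radius for a known center (x0, y0), following the
    estimate above: r = sum((xi - x0)*cos(ti) + (yi - y0)*sin(ti)) / n."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    theta = np.arctan2(y - y0, x - x0)   # angle of each available point
    return np.sum((x - x0) * np.cos(theta) + (y - y0) * np.sin(theta)) / len(x)

# Points covering only part of a circle of radius 3 (the "missing" wall case):
t = np.linspace(0.0, 4.0, 50)
print(fit_radius(3.0 * np.cos(t), 3.0 * np.sin(t), 0.0, 0.0))   # ~3.0
```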

Another commonly used phantom in quantitative image quality assessment is the cylindrical phantom with rod or sphere inserts made by Data Spectrum (Figure 7). The MiQ phantom analysis tool is also capable of processing this phantom. The tool first uses the CT image to find the center location of each rod and to define the associated background (Figure 7(c)). These rod and background positions are then mapped from the CT to the SPECT images by applying the proper coordinate transformation and registration between the two modalities. For each sector of rods, the MiQ phantom analysis tool then calculates, among other quantities, the contrast C (%) and the uniformity U (%). Figure 7(b) gives an example of the contrast curve plotted across the diameters of the different rods.

Figure 7. Data Spectrum cylindrical phantom and the quantitative features generated by the MiQ phantom analysis tool: (a) phantom, (b) contrast curve, (c) CT segmentation, (d) SPECT segmentation.

The contrast C and the uniformity U in a sector with n rods and m background ROIs are computed from the rod and background counts, where A_i and B_j denote the average counts inside the ith rod and the jth background ROI, respectively.

B. Database

The image quality database is based upon the existing syngo® DICOM database that comes with the Siemens Symbia® SPECT-CT system. Unlike a traditional image database, the quantitative image quality features generated by the MiQ phantom analysis tool will also be part of the database.
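The paper does not describe the schema used to attach features to the syngo® DICOM database. As a stand-alone, purely hypothetical sketch of what "quality features stored alongside images" can look like:

```python
import sqlite3

# Hypothetical feature table; the real framework extends the syngo(R) database.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE iq_feature (
        sop_instance_uid TEXT,   -- reference to the stored DICOM image
        system_id        TEXT,   -- scanner serial or label
        phantom          TEXT,   -- e.g. 'cylinder', 'NEMA3', 'EDC'
        feature_name     TEXT,   -- e.g. 'hot_rod_contrast', 'non_uniformity'
        feature_value    REAL
    )""")
con.execute("INSERT INTO iq_feature VALUES (?, ?, ?, ?, ?)",
            ("1.2.840.0.1", "S1", "cylinder", "hot_rod_contrast", 0.84))
print(con.execute("SELECT feature_name, feature_value FROM iq_feature").fetchall())
```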


Figure 8 below shows the three main components of this database.

Figure 8. Three main components of the database.

C. Statistical Inference Engine

Many problems in quantitative image assessment can be formulated as multiple comparison problems. For example, the quality of an unknown system can be assessed by comparing certain image quality features measured on this particular system to those of several known systems. As another example, one might be interested in comparing images acquired with different acquisition protocols, or images reconstructed using different algorithms or parameters. It is well known that, for multiple objects, the statistical comparison in general cannot simply be reduced to multiple pair-wise statistical comparisons [4].

Instead of using the traditional approach of hypothesis testing, which is built on distribution moments, we propose to use the multiple comparison approach, which is built on order statistics [4], because we believe it is more suitable for this problem.

III. MATERIALS AND METHODS

A. Data Acquisition and Processing

To demonstrate feasibility, we randomly selected five new Symbia® SPECT-CT systems from the assembly line, each with proper clinical QC, and acquired SPECT data on each using the Data Spectrum cylinder phantom with the Deluxe hot and cold rod inserts. The images were then reconstructed using Siemens Flash3D iterative reconstruction without attenuation correction or scatter correction. A CT scan of the cylinder phantom was also performed. The acquisition was repeated three times, with all acquisition and reconstruction parameters held constant and not optimized for best performance. We also randomly selected a test system and y-shifted (axially) some of the projection frames to simulate a poorly calibrated "abnormal" test system.

The images from the five reference systems (from the left) and the one test system are shown in Figure 9.

Figure 9. Images from the five reference systems and the one test system.

Clearly, a human observer generally cannot tell, based on these images alone, whether the differences among them are statistically significant.

B. Quantitative Image Quality Feature Extraction

The reconstructed images (NM) were then processed by the MiQ phantom analysis tool, which registered the NM data with the CT, segmented the rods based on the CT images, and generated, for instance, the contrast C and uniformity U measures described in the previous section.

After normalization to the maximum value, the raw measured data are shown in Tables 1 and 2 below.

TABLE 1. NORMALIZED HOT ROD CONTRAST DATA

Acquisition   System 1   System 2   System 3   System 4   System 5   Abnormal test system
1             0.84       0.5        0.8        0.78       1          0.83
2             0.94       0.87       0.76       0.76       0.99       0.61
3             0.91       0.86       0.78       0.72       0.94       0.85

TABLE 2. NORMALIZED NON-UNIFORMITY DATA

Acquisition   System 1   System 2   System 3   System 4   System 5   Abnormal test system
1             0.89       1          0.9        0.8        0.84       0.92
2             0.85       0.84       0.96       0.84       0.87       1.09
3             0.81       0.89       0.98       0.82       0.92       1.23

C. Multiple Comparison

Using the data from Tables 1 and 2, we first conduct the following hypothesis test:

H_0: \mu_1 = \mu_2 = \cdots = \mu_5
H_a: at least two of the \mu_i are different    (6)

to answer the question:
a) Are the five systems statistically identical in terms of the measured quantities?

We then compare the abnormal test system with the five reference systems by testing the hypothesis:

H_0: \mu_1 = \mu_2 = \cdots = \mu_5 = \mu_{test}
H_a: at least two of the \mu_i are different    (7)

to answer the question:
b) Is the unknown system different from the reference systems?


The Tukey-Kramer multiple comparison test is used to perform these tests and to rank the systems: the means of treatments i and j are declared different if

|\bar{y}_i - \bar{y}_j| > q_{\alpha,v,k} \sqrt{\frac{s^2}{2}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)},    (8)

where k is the number of treatments, \bar{y}_i and \bar{y}_j are the sample means of treatments i and j with sample sizes n_i and n_j, respectively, s^2 is an estimate of the experimental error, and q_{\alpha,v,k} is the tabulated value of the studentized-range statistic at level \alpha with v degrees of freedom [4].
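As an illustration of how such a comparison can be run (not the authors' code; it uses the statsmodels implementation of Tukey's HSD, which applies the Tukey-Kramer criterion for unequal group sizes), the Table 1 contrast values can be compared directly:

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Normalized hot-rod contrast from Table 1 (three acquisitions per system).
contrast = {
    "S1":   [0.84, 0.94, 0.91],
    "S2":   [0.50, 0.87, 0.86],
    "S3":   [0.80, 0.76, 0.78],
    "S4":   [0.78, 0.76, 0.72],
    "S5":   [1.00, 0.99, 0.94],
    "Test": [0.83, 0.61, 0.85],   # the simulated "abnormal" system
}
values = np.concatenate([np.asarray(v, dtype=float) for v in contrast.values()])
groups = np.repeat(list(contrast.keys()), 3)

# All pairwise comparisons at alpha = 0.05, following the criterion of Eq. (8).
print(pairwise_tukeyhsd(endog=values, groups=groups, alpha=0.05))
```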


IV. RESULTS

Ranking of the reference systems: at α = 0.05, for both the contrast data and the non-uniformity data, the test shows no significant difference among the five reference systems. The systems can be ranked as S2 ≤ S4 ≤ S3 ≤ S1 ≤ S5 based on Tukey's 95% confidence intervals, where Si denotes the ith system.

Testing the "abnormal" system: at α = 0.05, the contrast data comparison does not show a significant difference (sig = 0.10), but the non-uniformity data comparison does show a significant difference (sig = 0.02) between the "abnormal" system and the reference systems. In particular, Tukey's 95% confidence intervals reveal that the "abnormal" test system is different from systems 1 and 4. It is interesting to note that a simple paired t-test, which treats the five reference systems as a whole, does not detect a significant difference at this level.

V. FUTURE WORK

The next step is to test the sensitivity of the method by systematically introducing imperfections of increasing magnitude. We subsequently wish to catalogue artifact features and correlate them with system defects or suboptimal calibrations.

VI. ACKNOWLEDGMENT

We thank Dr. Ron Malmin, Dr. Manjit Ray, and Phillip Ritt for their help in collecting the data. We thank Dr. Eric Hawman and Dr. Amos Yahil for stimulating discussions. We also thank Dena Holzer for administrative support and editing.

REFERENCES

[1] A. H. Vija, E. G. Hawman, and J. C. Engdahl, "Analysis of a SPECT OSEM reconstruction method with 3D beam modeling and optional attenuation correction: phantom studies," in 2003 IEEE Nuclear Science Symposium Conference Record, vol. 4, Oct. 2003, pp. 2662-2666.

[2] J. Zeintl, A. H. Vija, J. T. Chapman, et al., "Quantifying the effects of acquisition parameters in cardiac SPECT imaging and comparison with visual observers," in 2006 IEEE Nuclear Science Symposium Conference Record, vol. 6, Oct.-Nov. 2006, pp. 3251-3257.

[3] National Electrical Manufacturers Association (NEMA), Performance Measurements of Scintillation Cameras, Standards Publication NU 1-2007, Washington, DC: NEMA, 2007.

[4] T. J. Santner and A. C. Tamhane, Eds., Design of Experiments: Ranking and Selection, Marcel Dekker, 1984.
