
Evaluation of a System for High-Accuracy 3D Image-Based Registration of Endoscopic Video to C-Arm Cone-Beam CT for Image-Guided Skull Base Surgery

Daniel J. Mirota, Member, IEEE, Ali Uneri, Sebastian Schafer, Sajendra Nithiananthan, Douglas D. Reh, Masaru Ishii, Gary L. Gallia, Russell H. Taylor, Fellow, IEEE, Gregory D. Hager, Fellow, IEEE, and Jeffrey H. Siewerdsen*

Abstract—The safety of endoscopic skull base surgery can be enhanced by accurate navigation in preoperative CT or, more recently, intraoperative cone-beam CT (CBCT). The ability to register real-time endoscopic video with CBCT offers an additional advantage by rendering information directly within the visual scene to account for intraoperative anatomical change. However, tracker localization error (∼1–2 mm) limits the accuracy with which video and tomographic images can be registered. This paper reports the first implementation of image-based video-CBCT registration, conducts a detailed quantitation of the dependence of registration accuracy on system parameters, and demonstrates the improvement in registration accuracy achieved by the image-based approach. Performance was evaluated as a function of parameters intrinsic to the image-based approach, including system geometry, CBCT image quality, and computational runtime. Overall system performance was evaluated in a cadaver study simulating transsphenoidal skull base tumor excision. Results demonstrated significant improvement (p < 0.001) in registration accuracy, with a mean reprojection distance error of 1.28 mm for the image-based approach versus 1.82 mm for the conventional tracker-based method. Image-based registration was highly robust against the CBCT image quality factors of noise and resolution, permitting integration with low-dose intraoperative CBCT.

Index Terms—Endoscopy, Registration, Image-guided treatment, Surgical guidance/navigation, Cone-beam CT

Copyright (c) 2013 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected].

Manuscript received April 9, 2012; revised November 6, 2012; accepted December 20, 2012. This work was supported principally by the National Institutes of Health under grant number R01-CA127444, with further support provided by NSF ERC Grant EEC9731748, the Link Fellowship in Simulation and Training, and the Medtronic Computer-Aided Surgery Research Award. Asterisk indicates corresponding author.

D. J. Mirota, A. Uneri, R. H. Taylor, and G. D. Hager are with the Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA (email: [email protected]; [email protected]; [email protected]; [email protected]).

S. Schafer, S. Nithiananthan, and *J. H. Siewerdsen are with the Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA (email: [email protected]; [email protected]; [email protected]).

D. D. Reh and M. Ishii are with the Department of Otolaryngology–Head and Neck Surgery, Johns Hopkins Medical Institutions, Baltimore, MD, USA (email: [email protected]; [email protected]).

G. L. Gallia is with the Department of Neurosurgery & Oncology, Johns Hopkins Medical Institutions, Baltimore, MD, USA (email: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

I. INTRODUCTION

MINIMALLY invasive surgical access to the skull base is becoming an increasingly popular approach over traditional open access (e.g., lateral rhinotomy). Endonasal skull base surgery (ESBS) offers a minimally invasive approach to remove lesions of the skull base transnasally and has been shown to provide reduced morbidity in comparison to open approaches [1]. The technique is also used to treat pituitary lesions, non-neoplastic skull base lesions, and other tumors of the skull base and nasal cavity. Such lesions are particularly challenging to access endonasally due to the proximity of critical neurovasculature, including the carotid arteries and cranial nerves [2]. Due to the complexity of these surgeries, navigation systems are often used to assist the surgeon [3], and studies have shown that navigation helps reduce the morbidity of the endonasal approach [4], [5]. Active research continues to improve the accuracy of navigation systems [6]–[8], with reports that current, less invasive registration methods (e.g., surface tracing) are not sufficient for all skull base procedures. Furthermore, current surgical technique requires the careful identification of key anatomic landmarks in both CT and endoscopic video because of the limited accuracy of today's registration methods [9], though there is no substitute for anatomic knowledge. Most of these systems rely on navigation in the context of preoperative data—e.g., preoperative CT—that neither conveys intraoperative anatomical change nor allows for intraoperative assessment of the resection.

Recently, systems capable of high-quality intraoperative 3D imaging have become available based on cone-beam CT (CBCT) [10]–[15]. For example, CBCT on a mobile C-arm has demonstrated high-quality images with sub-millimeter spatial resolution and soft-tissue visibility [11], [16]. Previous work used a conventional tracking system to track the endoscope and navigate within preoperative or intraoperative data [17]–[20], but did not use the video data directly in the registration process. Other related work registers endoscopic video using image-based methods, but registered to preoperative data [21], [22].

We build on initial studies [23] and extend such methods by directly using the endoscopic video data to register to intraoperative CBCT, which enables high-precision registration of video endoscopy with the most up-to-date tomographic data during surgery.


Figure 1. Benchtop setup for measurement of video-CBCT registration accuracy. The benchtop system is shown at left, with details of various components shown at right. Labels refer to: N (the tracker), CBCT (the CBCT data), R (the reference marker), X (the pointer), E (the endoscope), O (the optical center of the endoscope), and P (the phantom simulating a patient undergoing endoscopic skull base surgery).

Figure 2. Frames of reference and transformations corresponding to the system in Figure 1. Labels for each component (N, CBCT, R, X, E, O, and P) are as in Figure 1. The diagram includes the transformations ${}^{tracker}T_{endoscope}$, ${}^{tracker}T_{pointer}$, ${}^{tracker}T_{reference}$, ${}^{patient}T_{reference}$, ${}^{endoscope}T_{camera}$, ${}^{camera}T_{patient}$, and ${}^{CBCT}T_{patient}$, along with the pointer tip $p_{tip}$. The notation ${}^{A}T_{B}$ refers to the transformation from reference B to A.

The system detailed below reconstructs 3D point clouds from the endoscopic video using structure from motion (SfM) [24] and registers the resulting point cloud directly to intraoperative CBCT. In this way, the system extends and improves previous methods that only include registration to preoperative data [25] or only use a tracking system for estimating the endoscope position [26]. Furthermore, the system extends beyond [23] by investigating the effects of relevant system parameters and by further experimentation.

II. METHODS

A. System Setup

The proposed system was first tested and evaluated using the benchtop arrangement shown in Figure 1, which allowed for precise, reproducible control of the system geometry. The benchtop used the two phantoms illustrated in Figure 1: a polycarbonate "red skull" phantom [27] provided a rigid context approximating the anatomy of the sinus and oral cavity; the black anthropomorphic phantom included a natural human skeleton within tissue-equivalent plastic, modified to allow an endoscopic approach to the sphenoid, nasopharynx, and oropharynx. The red skull was used in studies of fundamental geometric registration accuracy and the anthropomorphic phantom in studies simulating the CBCT image quality properties of the head. The phantoms were rigidly secured to a rotary table (Velmex, Inc., Bloomfield, NY, USA) fastened to the optical table as in Figure 1. The benchtop also included two linear stages (Velmex, Inc., Bloomfield, NY, USA) for precise positioning of the endoscope (Karl Storz GmbH & Co. KG, Tuttlingen, Germany), which was held rigidly by a clamp attached to a passive articulated arm (NOGA Ltd., Shlomi, Israel).

Tracking of surgical tools (a rigid pointer and the endoscope) was performed using the Vicra infrared tracker (Northern Digital Inc., Waterloo, Ontario, Canada). A simple pointer tool and pivot calibration provided initial registration to the tracking system. The position of the phantom was tracked using a reference marker rigidly attached to the skull. A rigid body with four infrared markers was attached to the endoscope as shown in Figure 1, allowing the endoscope to be tracked in real time, with methods for registration and calibration detailed below.
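Pivot calibration of the pointer reduces to a linear least-squares problem: while the pointer pivots about a fixed point, each tracked pose $(R_i, t_i)$ of the tool rigid body satisfies $R_i\,p_{tip} + t_i = p_{pivot}$. The sketch below is a generic implementation of this standard procedure (not the authors' code), assuming poses are supplied by the tracker as rotation matrices and translation vectors:

```python
import numpy as np

def pivot_calibration(rotations, translations):
    """Solve R_i @ p_tip + t_i = p_pivot over all poses in a least-squares sense.

    rotations: list of 3x3 rotation matrices; translations: list of 3-vectors.
    Returns (p_tip, p_pivot): the tool-tip offset in the rigid-body frame and
    the fixed pivot point in the tracker frame.
    """
    n = len(rotations)
    A = np.zeros((3 * n, 6))
    b = np.zeros(3 * n)
    for i, (R, t) in enumerate(zip(rotations, translations)):
        A[3 * i:3 * i + 3, 0:3] = R           # coefficients of unknown p_tip
        A[3 * i:3 * i + 3, 3:6] = -np.eye(3)  # coefficients of unknown p_pivot
        b[3 * i:3 * i + 3] = -np.asarray(t)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x[:3], x[3:]
```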

B. Registration Methods

We compared two registration methods: (i) a conventional tracker-based method in which video and CBCT were registered based on the endoscope pose estimate provided by the tracker, and (ii) an image-based method in which video (i.e., 3D SfM point clouds) and 3D CBCT were directly registered. The tracker-based method provides the initialization for the image-based method, thereby yielding an integrated system that is potentially both robust (coarse pose initialization by the tracker) and precise (fine registration using video and CBCT images directly). It is worth noting that the accuracy of the tracker does affect the final registration accuracy, as shown in [25]. Figure 2 shows the relationships and transformations between the pertinent frames of reference.

1) Tracker-based Registration: In the tracker-based method, the position of the endoscope is determined solely by the tracking system.


$$ {}^{CBCT}T_{camera} = ({}^{CBCT}T_{patient})\,({}^{patient}T_{reference})\,({}^{tracker}T_{reference})^{-1}\,({}^{tracker}T_{endoscope})\,({}^{endoscope}T_{camera}) \qquad (1) $$

Figure 3. System data flow diagram, comprising a coarse/real-time tracker-based registration, a fine/asynchronous image-based registration, and the planning and display components. Parameters of each process are highlighted in gray: (b) the baseline distance between video images; (D, σ) the dose and smoothing associated with the CBCT image; ($N_f$) the number of SIFT features; ($d_c$) the amount of decimation in the CBCT surface segmentation; ($d_f$, c) the distance and correlation between features; (s) the number of samples in motion estimation; (d) the distance from the endoscope to the target in the ${}^{image}T_{world}$ transformation; ($N_i$) the number of iterations in the 3D-3D video-CBCT registration.

To compute the position of the endoscope, a pointer tool was first calibrated and used to record fiducial locations on the exterior surface of the rigid phantom. A rigid transformation between these fiducial points localized by the tracking system and the same points segmented in CBCT was computed, providing the transformations ${}^{patient}T_{reference}$ and ${}^{CBCT}T_{patient}$. A camera calibration was then performed using a checkerboard grid attached to the reference marker. The intrinsic camera parameters—including the 1st, 3rd, and 5th order radial distortion parameters—and the extrinsic transformation ${}^{endoscope}T_{camera}$ were computed using the German Aerospace Center (DLR) camera calibration toolbox [28]. The optical camera center of the endoscope (O in Figure 2) was then computed as shown in (1). As illustrated in Figure 2 and expressed algebraically in (1), the tracker-based method requires a long chain of transformations to compute the transformation from the camera to the CBCT frame, passing through the tracking system, the patient reference frame, and the endoscope rigid body. Each of these transformations may add error to the estimated location of the optical center of the endoscope.
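As a concrete illustration, the chain in (1) is simply a product of homogeneous transforms. A minimal sketch, assuming each transform is supplied as a 4×4 numpy array from the tracker or prior calibration (names mirror the ${}^{A}T_{B}$ notation of Figure 2):

```python
import numpy as np

def tracker_based_camera_pose(cbct_T_patient, patient_T_reference,
                              tracker_T_reference, tracker_T_endoscope,
                              endoscope_T_camera):
    """Compose equation (1): the camera pose expressed in the CBCT frame.

    Each argument is a 4x4 homogeneous transform; the tracker-to-reference
    transform is inverted to pass from the tracker frame to the patient side.
    """
    return (cbct_T_patient @ patient_T_reference
            @ np.linalg.inv(tracker_T_reference)
            @ tracker_T_endoscope
            @ endoscope_T_camera)
```

Any error in an individual transform propagates through this product, which is the motivation for the image-based refinement described next.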

2) Image-based Registration: The image-based method directly uses the video image data itself to register the endoscope images to the CBCT volume. To do so, both the CBCT and the endoscope images require processing. For the endoscopic video, processing involved the pipeline shown in Figure 3. A pair of images first underwent SIFT feature extraction [29]. The resulting features were matched using SVD-SIFT Match [30] to create candidate correspondences, from which the motion between pairs of images was estimated using Adaptive Scale Kernel Consensus (ASKC) [31]. Following triangulation, the feature point cloud was registered to CBCT using the trimmed least-squares method described in [25].
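The pair-wise SfM step can be sketched with standard OpenCV primitives. Note the substitutions: a ratio-test brute-force matcher stands in for SVD-SIFT Match, and RANSAC (which ASKC resembles) stands in for ASKC; K is the calibrated 3×3 intrinsic matrix. This is an illustrative sketch, not the authors' implementation:

```python
import cv2
import numpy as np

def sfm_point_cloud(img1, img2, K, n_features=200):
    """Reconstruct a sparse 3D point cloud (up to scale) from one image pair."""
    sift = cv2.SIFT_create(nfeatures=n_features)       # N_f features
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    pairs = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    good = [m for m, n in (p for p in pairs if len(p) == 2)
            if m.distance < 0.8 * n.distance]          # ratio test
    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])
    # Robust two-view motion estimation (RANSAC in place of ASKC).
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    # Triangulate matched features into a 3D point cloud.
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    X = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)  # 4xN homogeneous
    return (X[:3] / X[3]).T                            # Nx3 points
```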

The CBCT images—the volumetric data—are segmented at the air/tissue boundary using a simple intensity-based threshold. Marching cubes [32] is then applied to provide a 3D surface at the air/tissue boundary. The CBCT is optionally smoothed before surface extraction to reduce noise in the segmentation, treated in detail in Section II-E2 below. Image-based registration therefore involves a match of the 3D SfM point cloud derived from endoscopic video to the air/tissue surface derived from CBCT.
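The surface extraction can be sketched with standard scientific-Python tools. This is a hedged illustration assuming a CBCT volume calibrated in HU, using the threshold t ≈ −500 HU and the optional Gaussian smoothing σ (in voxels) discussed below; mesh decimation (parameterized by $d_c$) would follow as a separate step, e.g., in VTK:

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.measure import marching_cubes

def extract_air_tissue_surface(cbct_hu, threshold=-500.0, sigma=0.0,
                               voxel_mm=(0.3, 0.3, 0.3)):
    """Segment the air/tissue isosurface from a CBCT volume in HU.

    Optional Gaussian smoothing trades noise for spatial resolution before
    marching cubes extracts the isosurface at the given threshold.
    """
    vol = gaussian_filter(cbct_hu, sigma) if sigma > 0 else cbct_hu
    verts, faces, _, _ = marching_cubes(vol, level=threshold, spacing=voxel_mm)
    return verts, faces  # vertices in mm; triangle vertex indices
```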

The top row of Figure 3 illustrates the conventional tracker-based approach, which provides initialization for the image-based approach illustrated in the bottom row. The diagram highlights in gray the pertinent parameters of each step. The distance from the endoscope optical center to the target (on the CBCT surface) is denoted d. The parameter d was first measured using the tracking software to compute the distance between the registered camera location and a segmented target. Once an initial measurement was recorded (e.g., 10 mm), changes to the distance to target were made directly using the linear stage. The parameter b is the baseline distance between endoscopic video image pairs and was likewise measured and adjusted directly with the linear stage. The SIFT detector [29] is applied to the image pair, parameterized by the number of features, $N_f$. The features detected on the pair of images are matched with SVD-SIFT Match [30]. In this step, we add two additional constraints on correspondences—the distance between features, $d_f$, and the correlation of the features, c. After initial feature correspondences are established, the motion between the image pair is estimated using ASKC [31], which is similar to Random Sample Consensus (RANSAC) [33] and is parameterized by the number of samples, s, used to achieve consensus. Based on the motion between video image pairs, the features are triangulated to form a 3D point cloud that is registered to the segmented CBCT surface. The entire surface is used in the registration process because the correspondence between the point cloud and the surface is unknown. The registration is a re-weighted least-squares method characterized by the number of iterations, $N_i$.


Among the steps of the image-based pipeline, the parameters of SVD-SIFT matching, robust motion estimation, and registration were previously investigated in [25].
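For orientation, the point-to-surface registration can be sketched as a generic trimmed, iterated rigid fit in the spirit of (but not identical to) the method of [25]: each of the $N_i$ iterations finds nearest surface points via a kd-tree, trims the worst correspondences, and recomputes the rigid transform in closed form:

```python
import numpy as np
from scipy.spatial import cKDTree

def register_points_to_surface(points, surface_verts, R0, t0,
                               n_iters=20, keep_frac=0.8):
    """Trimmed rigid registration of an SfM point cloud to surface vertices.

    points, surface_verts: Nx3 arrays (mm); R0, t0: initial pose from the
    tracker-based registration. Returns the refined (R, t).
    """
    tree = cKDTree(surface_verts)             # built once over the surface
    R, t = np.asarray(R0, float), np.asarray(t0, float)
    for _ in range(n_iters):                  # N_i iterations
        moved = points @ R.T + t
        dists, idx = tree.query(moved)        # nearest surface vertex per point
        keep = dists <= np.quantile(dists, keep_frac)  # trim worst matches
        src, dst = points[keep], surface_verts[idx[keep]]
        # Closed-form rigid fit (Kabsch) on the trimmed correspondences.
        mu_s, mu_d = src.mean(0), dst.mean(0)
        U, _, Vt = np.linalg.svd((src - mu_s).T @ (dst - mu_d))
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T
        t = mu_d - R @ mu_s
    return R, t
```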

The image-based registration is potentially affected by parameters governing CBCT image quality—specifically, the radiation dose, D (which affects the level of quantum noise in the CBCT image, as described in [34]), and the optional smoothing, σ, representing the width of a uniform 3D Gaussian filter applied to the CBCT reconstruction to reduce quantum noise at the cost of spatial resolution. The segmentation of the air/tissue CBCT surface is parameterized by the threshold value (t ∼ −500 HU) and the percentage of decimation, $d_c$.

The dependence of image-based video-CBCT registration accuracy was evaluated as a function of each of the aforementioned parameters to elucidate the factors governing the performance of the system. Such measurements provide an understanding of the robustness of the proposed system to variation in any particular parameter and offer a guide for future development aimed at improving registration accuracy and computational speed.

C. Analysis of Registration Accuracy

The geometric accuracy of each registration method was assessed in terms related to the target registration error (TRE) [35], which describes the root mean squared (RMS) distance between target points (i.e., anatomical points not included in the registration process) transformed by the estimated registration and their corresponding fixed locations. TRE metrics suitable for projective geometry are illustrated in Figure 4 and described in [36].

We consider two rays emanating from the camera optical center—one containing the target point ($t_{im}$) in the 2D video image plane and the other containing the target point ($t_{3D}$) in the 3D CBCT image. From these can be defined the projection distance (PD, in pixels), the angular error (AE, in degrees), and the reprojection distance (RPD, in mm), similar to the discussion in [37] for x-ray projections. However, unlike x-ray projection imaging, the camera imaging geometry in Figure 4 describes the imaging plane as a virtual image. In a complete pinhole camera model, the image forms upside-down behind the optical center of the camera on the imaging sensor, where the pixel size is fixed. The virtual image is magnified in front of the camera, and the pixels are magnified to match the virtual image size. For this reason, there is no magnification of PD, unlike x-ray projection geometry, in which PD is magnified by the source-detector distance (SDD). In place of SDD, we have the focal length of the camera, measured as the distance from the optical center to the image sensor, which was fixed throughout all studies reported below. The registration error can alternatively be described in terms of the AE as shown in Figure 4, describing the angle between the two rays emanating from the optical center and containing the target points in 2D and 3D.

The metric primarily used below is the RPD, which is the perpendicular distance from the 3D target point ($t_{3D}$) to the ray r extending from the camera center (o) through the target point on the image plane ($t_{im}$):

$$ \mathrm{RPD} = \left\| t_{3D} - \left( o + r\,\frac{r \cdot (t_{3D} - o)}{r \cdot r} \right) \right\| \qquad (2) $$

This form of reprojection distance is zero when the registration is perfect and is similar to the RPD defined in [37]. However, unlike RPD under x-ray projection geometry, the RPD for camera geometry is magnified by the distance (d) between the optical center of the camera (o) and the target in 3D ($t_{3D}$).
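The TRE metrics are straightforward to compute; a minimal numpy sketch of (2) and of the AE, assuming o, r, and $t_{3D}$ are expressed in the same (camera) coordinate frame in mm:

```python
import numpy as np

def reprojection_distance(o, r, t3d):
    """RPD (mm): perpendicular distance from t3d to the ray o + s*r, per (2)."""
    v = t3d - o
    foot = o + r * (np.dot(r, v) / np.dot(r, r))   # closest point on the ray
    return np.linalg.norm(t3d - foot)

def angular_error(o, r, t3d):
    """AE (degrees): angle between the image ray r and the ray toward t3d."""
    v = t3d - o
    cos = np.dot(r, v) / (np.linalg.norm(r) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
```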

Figure 4. The relationship between different measurements of TRE. The variable o is the optical center of the camera, $t_{im}$ is the target in the image, and $t_{3D}$ is the target in the CBCT volume. AE, PD, and RPD are the measures of TRE in degrees, pixels, and mm, respectively; d is the distance to target.

D. Evaluation Methodology

Given the number of system parameters (nine investigated in detail below), a systematic approach was undertaken to evaluate each parameter first using a rigid phantom, to best isolate the parameter and minimize other sources of registration error. To this end, the camera parameters were evaluated first with a rigid phantom, followed by the CBCT image parameters and finally the registration parameters; the investigation was then translated to cadaver studies for validation in a more realistic anatomical context.

E. Performance Evaluation in Phantom

The geometric accuracy of image-based registration was evaluated as a function of the following factors of system geometry, image quality, and computational speed.

1) Dependence of Registration Accuracy on Geometric Pose: The effect of geometric pose on RPD was investigated as a function of both the distance to target (d) and the baseline between image pairs (b). The first concerns the magnification of RPD as mentioned above, and the second holds implications for how video frames should be sampled relative to the speed of endoscope motion. We hypothesized that a larger baseline improves the SfM reconstruction (as shown in the vision literature [38]) and thereby improves registration. However, we further hypothesized upper and lower limits to this improvement, due to lack of sufficient motion at the lower limit (small b) and lack of features at the upper limit (large b).


2) Dependence of Registration Accuracy on Image Quality: Two principal factors of CBCT image quality were investigated in terms of their effect on video-CBCT registration accuracy. The first involved the level of quantum noise in CBCT reconstructions, which is directly related (inverse square-root dependence) to the radiation dose used in forming the image: reducing dose by a factor of 4 increases noise by a factor of 2. Dose was varied by adjustment of tube current over the range (0.1–6.5 mA) allowed by the C-arm prototype [10]. Other technique factors were fixed—e.g., 100 kVp, 200 projections, and reconstruction by a modified FDK algorithm [39] with a reconstructed voxel size of 0.3 mm × 0.3 mm × 0.3 mm. The corresponding range in dose, D, was 0.5–38.4 mGy.

Another way to reduce the dose to the patient is to limit the number of projections acquired, although fewer projections would increase streaking associated with view sampling effects. In this work, the CBCT images were acquired with our standard scan protocol of 200 projection images; thus, the level of streak artifacts evident in this paper is typical of that achieved with the prototype C-arm. While the number of projections is potentially a parameter to be investigated (motivated primarily by reduced radiation dose and reduced reconstruction time), previous work [12] shows that for the current system and application, low-dose acquisition is better achieved by mA reduction rather than by reduction of the number of projections.

The second factor concerned spatial resolution, which can be freely adjusted in trade-off with quantum noise. As a simple investigation of post-reconstruction smoothing, we applied a 3D Gaussian smoothing filter to the CBCT images (kernel width, σ, ranging 0–5 voxels), thereby blurring the images in a manner that reduced image noise while sacrificing detail and fine image features. We hypothesized that video-CBCT registration accuracy would suffer at low dose levels (i.e., high noise levels) and would benefit from a certain level of image smoothing (perhaps around an optimal trade-off between noise and spatial resolution).

3) Dependence of Registration Accuracy on Input Data Size: We further investigated the dependence of registration accuracy on algorithmic parameters that have a direct bearing on computational speed. Two primary factors were considered: the number of input features ($N_f$) and the number of polygons ($N_{poly}$) used in registration, as these determine the amount of time needed to compute the registration. The number of polygons used equals the initial number of polygons reduced according to the decimation percentage $d_c$, i.e., $N_{poly} = N_{poly,0}(1 - d_c)$. We hypothesized that fewer features and/or polygons would improve runtime but decrease accuracy. Quantifying the steepness of this dependence in the measurements reported below provided a guide to selecting parameters that satisfy requirements in geometric accuracy within the runtime constraints of an implementation that would be practical for eventual clinical use.

F. Performance Evaluation in Cadaver

The system was deployed in pre-clinical studies in which a fellowship-trained neurosurgeon performed endoscopic skull base target resection in a cadaveric head specimen, as illustrated in Figure 5. The cadaver was fixed using a mixture of phenol and formalin and remains slightly moist after the fixation process. This preparation yields a cadaver that remains flexible enough to closely simulate actual tissue, and the reflectance of the (moist) endonasal tissues is fairly realistic. The setup included the same components labeled in the benchtop setup of Figure 1, along with the mobile C-arm prototype for intraoperative CBCT and the surgical navigation interface [40] combining real-time tracking and video-CBCT registration. The specimen was rigidly secured in a Mayfield skull clamp (Integra Corp., Plainsboro, NJ, USA) mounted to an x-ray-compatible carbon fiber operating table.

Figure 5. Cadaver study experimental setup. Labels refer to: N (the tracker), CBCT (the CBCT data), R (the reference marker), X (the pointer), E (the endoscope), O (the optical center of the endoscope), and P (the phantom—i.e., the cadaveric head).

The geometric accuracy of image-based and tracker-based video-CBCT registration was quantified by measurement of RPD in the cadaver.


Unambiguous target points were created by gently piercing a 27-gauge needle through the thin layer of bone at the tuberculum sellae and the floor of the sella turcica. The targets were large enough (∼1 mm diameter) to be identified clearly in both CBCT and endoscopic video. Each target was manually identified in CBCT to define its location (repeated localization providing sub-voxel accuracy in the mean target location). The targets were also manually identified in each frame of endoscopic video in which they appeared. We grouped a total of 31 independent views of the four targets (pinholes) created on the clivus in one cadaveric specimen. A passive articulated arm was used to support the endoscope, facilitate stable recording of video data, and avoid potential synchronization errors between the tracker and the endoscope.

III. RESULTS

The results below first summarize the performance of the image-based method measured as a function of factors of system geometry, image quality, and computational load, providing quantitation of hypothesized trends and a guide to parameter selection. The image-based method is then tested in a pre-clinical cadaver study in comparison to conventional tracker-based registration. The camera calibration used for both the tracker-based and image-based registration had 0.8 pixels mean reprojection error. The tracking system (NDI Vicra) used for both registration methods is reported to have a localization error of 0.25 mm [41]. To evaluate the (intra-observer) reproducibility in the definition of $t_{im}$ and $t_{3D}$, both points were segmented ten times in independent trials for each fiducial, showing an overall standard deviation of 1.48 pixels and 0.06 mm, respectively. Projecting both of these through the RPD equation at a distance to target of d = 15 mm yields an increase of 0.170 mm, which is well below the RPD reported below.

A. Performance Evaluation in Phantom

1) Dependence of Registration Accuracy on Geometric Pose: As detailed above, the rigid red skull phantom was used in the analysis of registration accuracy versus factors of system geometry. Figure 6 presents the RPD measured as a function of the baseline distance and camera-to-target distance. The effects of magnification are evident, as shown by the positive slope in RPD(d), consistent with the stated hypothesis. Another clear trend is that larger baseline distances between video image pairs improve the RPD over the range shown. The linear fit in each case follows the model $\sin(AE) \cdot d = \mathrm{RPD}$. Measurements at b = 2.0 mm and d = 10 mm were not possible because the images were too disparate. In each case, the image-based video-CBCT registration shows improved accuracy in comparison to the conventional tracker-based registration.

Figure 7 shows the measurements of RPD as a function of image pair baseline distance (at a fixed distance to target of d = 15 mm). Note that the baseline distance here refers to in-plane motion along the x-axis of the image plane. The data suggest an operating range of 0.4 to 4 mm within which image-based registration outperforms tracker-based registration.

Figure 6. Registration error (RPD) increases as a function of distance to target (d) as a result of geometric magnification in projective geometry. The behavior is linear with the sine of the AE. The measurements demonstrate improved registration accuracy for the image-based approach compared to the tracker-based approach at all values of target distance (d) and image pair baseline distance (b). Throughout this manuscript, boxplots show the median (horizontal line), first and third quartiles (lower and upper bounds of the rectangular box), and range (whiskers) of the data.

Below this range, there was insufficient motion between frames to reconstruct a reliable point cloud, and above this range, there was an insufficient number of features to reliably compute correspondence and 3D registration. The lower limit of endoscope movement is related to the epipolar constraint used to solve the structure-from-motion problem, whereas the upper limit is a consequence of the chosen matching procedure.

As shown in Figure 8, RPD measured as a function of baseline distance perpendicular to the imaging plane (i.e., along the z-axis) exhibits a larger operating range of 0.6 to 9 mm. The larger range is primarily attributed to the larger number of features remaining visible during endoscope motion. As in Figure 7, below the lower limit of the operating range there was insufficient motion to compute a reliable point cloud, and above the upper limit there was an insufficient number of features.

The results of Figure 7 and Figure 8 hold implications for the selection of video image pairs relative to the speed of endoscope motion. Using image pair SfM, the side-to-side motion should be no faster than 4 mm per 33 ms (the typical frame period of the video camera), corresponding to a velocity of 120 mm/s. In practice, the motion of the endoscope is much slower (∼15 mm/s) when the surgeon is observing the surgical site, although it can be quite fast (∼110 mm/s) when removing and reinserting the endoscope. At the other extreme (b ∼0.3 mm, corresponding to ∼9 mm/s), image pair SfM breaks down if the endoscope motion is too slow (i.e., nearly stationary). This may not be relevant in the context of freehand endoscopy but may become important for endoscopy with a tool-holder or robotic assistant.


Figure 7. Registration error (RPD) measured as a function of the baseline between video image pairs. Camera-to-target distance (d) was fixed at 15 mm. The data suggest an operating range of 0.4–4 mm in-plane motion (indicated with blue vertical lines). The performance of conventional tracker-based registration is superimposed as green horizontal median and range lines. Within the operating range 0.4 ≤ b ≤ 4 mm, the image-based approach outperforms the tracker-based approach.

Figure 8. Image-based registration with z-axis motion at a distance of 15 mm shows the larger operating baseline range of out-of-plane motion, from 0.6 mm to 9 mm (indicated with blue vertical lines).

These observations suggest an additional "loop" in Figure 3 by which the tracker informs the image-based registration system how much the endoscope has moved, so that image pairs are selected not necessarily from successive frames but from frames separated by a baseline distance within the operating range. However, this case would only occur if there were no motion during the initial registration, after which the camera location can be tracked with 2D-3D feature correspondences.
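A hedged sketch of such a tracker-informed pair-selection loop, assuming a hypothetical per-frame list of tracker-reported camera positions and the in-plane operating range measured above:

```python
import numpy as np

def select_pair(positions, b_min=0.4, b_max=4.0):
    """Pick a past frame forming a usable baseline with the newest frame.

    positions: per-frame camera positions (mm) reported by the tracker.
    Returns (i, j) frame indices, or None if the endoscope is nearly
    stationary and no frame within the operating range exists yet.
    """
    j = len(positions) - 1
    for i in range(j - 1, -1, -1):   # walk back from the newest frame
        b = np.linalg.norm(np.asarray(positions[j]) - np.asarray(positions[i]))
        if b_min <= b <= b_max:
            return i, j              # baseline inside the operating range
    return None                      # wait for more endoscope motion
```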

Figure 9 shows a representative comparison of tracker-based and image-based video-CBCT registration accuracy (using a fixed 1 mm baseline distance for the latter). In this example, the mean PD for the tracker-based method was 19.3 pixels (with an interquartile range of 16.4 pixels), and the image-based method had a mean PD of 14.9 pixels (with an interquartile range of 10.5 pixels). The columns in Figure 9 show: the air/tissue surface segmented from CBCT; an actual video image overlaid by targets identified in the video and estimated target point locations from CBCT; and the spatial distribution of the RPD interpolated from the measurement points over the entire field of view (interpolation by a multiquadric radial basis function). The fairly small difference between the two methods is expected in this case because of the completely rigid phantom, but the results are consistent with the motivation for improved registration accuracy using a tracker-based initialization and an image-based refinement.

2) Dependence of Registration Accuracy on Image Quality: As detailed above, the anthropomorphic head phantom was used in the analysis of registration accuracy versus factors of image quality, focusing on the effects of image noise (dose) and spatial resolution. Figure 10 shows the effect of dose, where we hypothesized that lower dose would degrade (i.e., increase) RPD due to increased noise in the CBCT air/tissue surface segmentation. While there is an appreciable increase in image noise at the lowest dose levels, we found that dose imparted little or no significant effect on RPD over the range considered. While the hypothesis may be correct, it does not apply over the range of dose deliverable by the CBCT C-arm. This is an advantageous finding in that image-based video-CBCT registration may be performed without loss of accuracy even at the lowest dose levels (e.g., 0.6 mGy) of intraoperative scanning. This corresponds to approximately 1/100th the dose of a typical diagnostic CT of the head (∼52.0 mGy) [42].


Figure 9. Example comparison of video-CBCT registration using the conventional tracker-based method (top) and the proposed image-based method (bottom). Each row corresponds to the rigid red skull phantom. The first column shows the air/tissue isosurface segmented from CBCT. The second column is an endoscope video image overlaid by the video (red) and estimated CBCT (blue) locations of target points. The third column shows the RPD interpolated over the endoscope field of view, illustrating the improvement in registration accuracy for the image-based method as in Figures 7 and 8.

Figure 10. Effect of dose (i.e., CBCT image noise) on video-CBCT registration accuracy, shown for surface renderings of the sinus and target region (zoom) at 0.6, 13.6, and 38.4 mGy. Over the range of dose considered, there is negligible effect on RPD. Although there is a visible increase in CBCT quantum noise at the lowest dose levels (fluctuations visible in both the axial slice images and the air/tissue surface segmentation), the surface registration appears to be robust down to the lowest dose deliverable by the C-arm. In each case, the image-based registration approach again outperformed tracker-based registration.

Figure 11 shows the effect of 3D image smoothing on video-CBCT registration accuracy. In these results, dose was fixed at a nominal value of 24.2 mGy. As hypothesized, a (fairly weak) trend is exhibited in which RPD increases (degrades) at high levels of image smoothing (σ > ∼2.5 voxels), for which features of the air/tissue surface segmentation are lost. Following from the results in Figure 10, where there was no observed dependence of RPD on dose over the range considered, there was similarly no optimum in smoothing—i.e., no value of σ below which RPD increased due to image noise. Under these circumstances, there appeared to be no benefit to image smoothing, although one might hypothesize conditions of increased image noise (e.g., still lower dose levels and/or larger body sites) for which smoothing in the range (0 < σ < 2.5) may be beneficial to video-CBCT registration accuracy. In the results of both Figure 10 and Figure 11, image-based registration outperformed tracker-based registration at the nominal settings.

Figure 12 presents an example endoscope view in the anthropomorphic phantom with tracker-based and image-based video-CBCT registration (nominal 2 mm baseline, 38.4 mGy, and no smoothing). In this case, the PD for the tracker-based method exhibited a mean of 27.2 pixels (interquartile range of 11.9 pixels), and the image-based method gave a mean PD of 20.2 pixels (interquartile range of 6.3 pixels). Analogous to the results of Figure 9, the results show a measurable improvement for the image-based registration, although the effect is fairly small (∼0.25–0.5 mm) due to the use of a highly rigid phantom.

3) Dependence of Registration Accuracy on Input Data Size: The effect of parameters associated with image-based registration runtime (viz., the number of features and the number of polygons) was investigated to determine the potential for increased computational speed without loss of registration accuracy.


Figure 11. Effect of spatial resolution (i.e., a 3D image smoothing kernel) on video-CBCT registration accuracy, shown for surface renderings of the sinus and target region (zoom) at σ = 0 (no smoothing), σ = 2.5, and σ = 5. Smoothing reduces CBCT image noise at the cost of spatial resolution, and a small increase in RPD is observed over the range considered. Only at the highest levels of smoothing (major loss of image features in the segmented air/tissue boundary) does the RPD for the image-based approach degrade to a level at or greater than that of the tracker-based approach.

Figure 12. Example frame comparing tracker-based (top) and image-based (bottom) methods of video-CBCT registration in the anthropomorphic phantom. Dose and spatial resolution are fixed (D = 38.4 mGy and σ = 0 voxels). The first column shows the air/tissue isosurface segmented from CBCT. The second column shows an endoscope video image overlaid with video (red) and estimated CBCT (blue) target point locations. The third column shows the RPD interpolated over the endoscope field of view.

Figure 13 summarizes the results. The number of features (Figure 13a) demonstrated a fairly weak effect on runtime, governed by the search time of the kd-tree, $O(N_{poly} \log N_{poly})$ [43], where $N_{poly}$ is the number of polygons in the tree. Since the search is called for every reconstructed feature point, the runtime performance is approximately $O(N_f N_{poly} \log N_{poly})$, where $N_f$ is the number of features. In Figure 13a, $N_{poly}$ was held constant (∼3.5×10^5 polygons) while $N_f$ was varied. The effect on runtime is fairly small above $N_f$ > ∼20. Figure 13b shows that the number of features had a significant effect on RPD—specifically, a sharp increase in RPD for fewer than ∼40 features, below which registration fails. The result is consistent with our hypothesis and suggests a nominal, stable operating condition of $N_f$ > ∼100.

Figure 13c shows the strong effect of $N_{poly}$ on runtime—approximately $O(N_f N_{poly} \log N_{poly})$, where $N_{poly}$ is much larger than $N_f$, with the construction time of the kd-tree being $O(N_{poly} \log N_{poly})$. The runtime increases roughly in linear proportion to $N_{poly}$ over the range examined. However, Figure 13d shows that the RPD performance degrades rapidly with reduction in the number of polygons, with registration accuracy becoming worse than that of the tracker-based method for $N_{poly}$ < ∼3.5×10^5.
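The kd-tree cost model is easy to probe empirically; a toy timing sketch of the $O(N_f N_{poly} \log N_{poly})$ query pattern that dominates runtime, with illustrative (not experimental) sizes:

```python
import time
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
for n_poly in (10_000, 100_000, 350_000):
    surface = rng.random((n_poly, 3))        # stand-in for surface vertices
    features = rng.random((200, 3))          # N_f = 200 reconstructed points
    t0 = time.perf_counter()
    tree = cKDTree(surface)                  # O(Npoly log Npoly) build
    tree.query(features)                     # N_f nearest-neighbor searches
    print(n_poly, time.perf_counter() - t0)
```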

B. Performance Evaluation in Cadaver

The results of Figures 6–13 guided the selection of nominal operating parameters—e.g., b ∼2 mm, d ∼15 mm, D ∼24.2 mGy, σ = 0 voxels, $N_f$ unchanged (i.e., no reduction in the number of features), and $d_c$ = 0.


Figure 13. Computational load, runtime, and registration accuracy. (a) Reducing the number of features exhibits a minimal effect on runtime performance above a minimum of ∼20 features, and (b) registration accuracy shows a fairly sharp threshold at ∼40 features, below which the image-based registration essentially fails. (c) Runtime exhibits a linear dependence on the number of polygons, and (d) arbitrary reduction in the number of polygons imparts a measurable increase in RPD (with performance inferior to the conventional tracker-based approach for fewer than ∼3.5×10^5 polygons).

Following such characterization of the underlying factors of registration and runtime performance, we evaluated the geometric accuracy of the system in a cadaver model. Figure 14 summarizes the overall magnitude and range of RPD measurements in the cadaver study. The mean RPD for the tracker-based registration method was 1.82 mm (with 1.09 mm first quartile and 1.25 mm range). By comparison, the mean RPD for image-based registration was 1.28 mm (with 0.66 mm first quartile and 1.06 mm range). The improvement in RPD over the tracker-based method was statistically significant (p < 0.001). With regard to the statistical analysis, we previously showed that the distribution of RPD is not normal, so we applied the transformation to normality described in [25]. We then applied a Student's t-test for statistical significance, and the data were trimmed of outliers based on the interquartile range of the transformed distribution. Two outliers were identified—one due to incorrect scale estimation and one due to an anomalous failure to converge.
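For illustration, the comparison can be sketched as below, with a Box-Cox transform standing in for the normality transformation of [25] (the exact transform used there may differ) and the usual 1.5×IQR rule for outlier trimming:

```python
import numpy as np
from scipy import stats

def compare_rpd(rpd_tracker, rpd_image):
    """Transform each RPD sample toward normality, trim outliers, and t-test."""
    def transform_and_trim(x):
        z, _ = stats.boxcox(np.asarray(x))    # RPD > 0, so Box-Cox applies
        q1, q3 = np.percentile(z, [25, 75])
        iqr = q3 - q1
        return z[(z >= q1 - 1.5 * iqr) & (z <= q3 + 1.5 * iqr)]
    return stats.ttest_ind(transform_and_trim(rpd_tracker),
                           transform_and_trim(rpd_image))
```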

Figure 14. Comparison of tracker-based and image-based registration accuracy in a cadaver study (N = 124, p < 0.001). A statistically significant improvement in RPD was measured for the image-based approach (mean RPD = 1.28 mm) in comparison to the tracker-based approach (mean RPD = 1.82 mm).

Figure 15 illustrates the results in images of the cadaver, showing an example tracker-based and image-based registration from the 31 cases used in the experiment. In this instance, the PD for the tracker-based method had a mean of 25.3 pixels (with an interquartile range of 1.3 pixels), and the image-based method had a mean of 12.9 pixels (interquartile range of 4.1 pixels). The overlay of video (red) and estimated CBCT (blue) target points in the video image shows a measurable improvement in localization accuracy, and the map of RPD(x, y) over the endoscope field of view suggests a significant improvement in video-CBCT registration accuracy.

IV. DISCUSSION AND CONCLUSION

The experiments detailed in this work quantify the measured dependence of image-based video-CBCT registration performance on pose, image quality, and input data size. We found, as supported by the computer vision literature, that an ample baseline distance between image pairs is required for accurate reconstruction and registration—specifically, 0.5 mm < b < 4 mm. Furthermore, the system was found to be robust to various factors of CBCT image quality, including dose (down to the lowest levels deliverable by the CBCT C-arm prototype, ∼1/100th the dose of a diagnostic CT) and spatial resolution (additional image smoothing did not improve registration). Registration runtime showed a weak dependence on the number of features (but a susceptibility to registration failure below $N_f$ ∼40) and a stronger dependence on the number of polygons ($N_{poly}$ > ∼3.5×10^5). With an understanding of the factors governing video-CBCT registration accuracy, we conducted a cadaver study that demonstrated statistically significant improvement in registration accuracy for the image-based approach in comparison to the conventional tracker-based approach (Figures 14–15).

The effect of system geometry on video-CBCT registration accuracy implies certain limitations on the speed at which the endoscope can be moved during acquisition of an image pair—specifically, less than 120 mm/s side-to-side (x-y) and 270 mm/s front-to-back (z), both of which are well above typical freehand motion of ∼5–50 mm/s. Additionally, motion in either the x-y or z direction is adequate for SfM point cloud generation and accurate registration, provided that the motion gives an ample baseline distance between image pairs—approximately 0.4–0.6 mm or greater (up to ∼4–5 mm).

The effect of CBCT image quality on registration accuracy demonstrated that dose (i.e., noise) and spatial resolution in CBCT reconstructions exhibited little effect over the range considered.


Surface Rendering Real Video Image RPD (x, y)Tr

acke

r-Bas

edIm

age-

base

d 2.25

1.50

0.75

0.00

RPD (mm)

2.25

1.50

0.75

0.00

RPD (mm) Figure 15. Example result of video-CBCTregistration in cadaver. The tracker-based(top) and image-based (bottom) methodsare overlaid by video (red) and estimatedCBCT (blue) locations of four target points(pinholes pricked in the tuberculum sellaeand floor of the sella turcica). The firstcolumn shows the air/tissue isosurface seg-mented from CBCT. The second columnshows an example endoscope image over-laid with video and estimated CBCT targetlocations. The third column shows the RPDinterpolated over the endoscope field of view,suggesting a modest but significant improve-ment in registration accuracy for the image-based approach.

dose settings available on the C-arm (0.6 mGy) while stillproviding accurate registration. Other image quality factors notconsidered in the current work could diminish performance—e.g., streak artifacts from metallic components in the image,which would potentially confound a simple air/tissue surfacesegmentation in CBCT.

The effect of input data size on runtime and registration accuracy showed that it is possible to achieve some improvement in computational speed by reducing the input data; however, further work is needed to accomplish this in a way that does not adversely affect registration accuracy.

While a fair, direct comparison between closed-source methods is difficult to achieve and is beyond the scope of the current work, we offer a comparison of results reported in the literature [21], [22]. Such a comparison would best be conducted via multi-institutional collaboration using shared data/phantoms or via open-source/open-data sets so that each method may be applied fairly. In Luo et al. [21], an analysis of the pose error between an EM tracker and the video registration was presented, with the EM tracker used as the ground truth. A mean translation error between 0.679 and 0.875 mm was reported, with orientation error of approximately 0.5 degrees. For context, error at this level is within the 95% confidence interval of the Aurora EM tracker (Northern Digital Inc., Waterloo, Ontario, Canada) [44]; however, the model of the EM tracker was not reported. Similarly, the results presented above also surpass the tracker-based initialization. In Higgins et al. [22], the error was reported as the perpendicular distance between the biopsy needle and a metal bead target (conceptually similar to RPD), with an overall mean error of 1.97 mm, similar to the RPD reported above.

The results reported above help guide parameter selection in the implementation of image-based video-CBCT registration suitable for clinical use, and they provide the first demonstration of improved geometric accuracy in video-CBCT registration over the conventional tracker-based approach. Of course, the work is not without limitations, areas of future improvement, and topics for further investigation. One area of improvement in future work is the avoidance of local minima in the optimization to allow further refinement of the solution, as evident in the lack of additional improvement over the tracker-based method in the first phantom study. ASKC may be adapted to address the robustness of registration in future work. The current method performs SfM point cloud reconstruction from a single pair of video images and performs the 3D point-surface registration from only that one pair. This approach can be improved through incorporation of a collection of images with bundle adjustment in the point cloud reconstruction. Clinical implementation would likely require tighter synchronization of the surgical tracker and the endoscopic video, mitigated in the experiments herein by using a stable tool holder for the endoscope and acquiring images in a static step-and-shoot manner. Furthermore, the phantoms and cadavers do not reflect the complexities and anatomical deformation of real surgical situations. Additionally, the image-based method presented in this work is best suited for applications in which rigid registration is sufficient to align the reconstructed anatomy to the intraoperative CBCT. For this reason, we focused on the registration of structures of the skull base, in which the motion, deformation, and resection during surgery can be captured with an up-to-date CBCT scan. Note that this is a potentially significant improvement over the conventional context of tracking in preoperative CT, which fails to capture such intraoperative deformations and tissue excisions. Alternative advanced methods such as nonrigid SfM [45] could be employed to reconstruct and register more dynamic deformable structures (i.e., deformations occurring in the short time interval between the last CBCT and the current video acquisitions). Other areas of improvement include a streamlined camera calibration process, such as that reported in [46].
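For concreteness, the single-pair reconstruction step that bundle adjustment would extend can be sketched with standard tools. The following is a minimal two-view structure-from-motion sketch using OpenCV (SIFT matching, essential-matrix pose recovery, triangulation); it is not the authors' implementation, and it omits the robust kernel-consensus estimation, scale recovery, and point-to-surface registration stages of the full pipeline.

    # Minimal two-view SfM sketch: a metric-up-to-scale point cloud from one
    # calibrated image pair (grayscale images img1, img2; intrinsics K).
    import cv2
    import numpy as np

    def two_view_point_cloud(img1, img2, K):
        sift = cv2.SIFT_create()
        k1, d1 = sift.detectAndCompute(img1, None)
        k2, d2 = sift.detectAndCompute(img2, None)
        matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(d1, d2)
        p1 = np.float32([k1[m.queryIdx].pt for m in matches])
        p2 = np.float32([k2[m.trainIdx].pt for m in matches])
        # Robust essential-matrix estimation and relative pose recovery.
        E, inl = cv2.findEssentialMat(p1, p2, K, method=cv2.RANSAC, threshold=1.0)
        _, R, t, inl = cv2.recoverPose(E, p1, p2, K, mask=inl)
        m = inl.ravel().astype(bool)          # keep pose-consistent inliers
        P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
        P2 = K @ np.hstack([R, t])
        X = cv2.triangulatePoints(P1, P2, p1[m].T, p2[m].T)  # 4xN homogeneous
        return (X[:3] / X[3]).T               # Nx3 points, scale ambiguous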

A final point of methodological note is that image-based registration in the context of up-to-date intraoperative CBCT properly reflects and utilizes the current state of the anatomy, including surgical excisions of the anterior sphenoid sinus. Such is possible only to a limited extent in the context of registration to preoperative CT, where the real promise of image-based registration is only partly realized; i.e., the image features do not necessarily match once the intraoperative scene is perturbed by tissue deformation and/or excision. Under conditions in which the anatomy is dramatically altered from the preoperative state, high-precision image-based registration would only be feasible in the context of intraoperative imaging, such as CBCT. A separate study comparing image-based registration accuracy in the context of preoperative versus intraoperative imaging is planned for future work.

The limitations of video-CT registration using the traditional approach of external tracking are fairly well recognized [17], [47]. This work reports a two-fold advance: first, via registration in the context of up-to-date CBCT (which properly reflects intraoperative change), and second, via image-based registration that directly utilizes the video and CBCT data to perform a precise registration. Registration in preoperative CT alone is difficult or impossible following significant resection or deformation in surgery, motivating intraoperative imaging to utilize the wealth of information in the video data. The method reported above enables a higher level of accuracy than the conventional tracker-based approach, offering a robust real-time initialization by the tracker followed by a precise image-based refinement to sub-mm accuracy. Furthermore, the work demonstrates that CBCT can provide the intraoperative imaging at a dose that is sufficiently low (∼0.6 mGy per scan, versus ∼52.0 mGy for a diagnostic CT) to permit repeat intraoperative scanning and allow accurate registration in an up-to-date anatomical context.

Looking forward, we envision the approach implemented in a tightly integrated hybrid configuration whereby the tracker provides robust initialization to the registration, which is in turn refined as video data arrive and are processed in near real-time. The system would need to detect failures of the image-based method and fall back to the tracker-based method without disruption of the video stream and data overlay. As the registration is refined and evolves, however, the display would provide the most accurate visualization available at that time. Such a system should also eventually incorporate image-based tracking and bundle adjustment of collections of images in registration to the CBCT.
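A minimal sketch of the control flow for such a hybrid configuration is given below; all function and object names are hypothetical placeholders, not an existing API.

    # Hybrid registration loop: tracker pose is always a valid fallback, and
    # the image-based refinement replaces it only when it converges and
    # passes a plausibility (failure-detection) gate. Names are illustrative.
    def hybrid_pose(tracker, refine_image_based, is_plausible, frame, cbct):
        pose = tracker.current_pose()           # robust real-time initialization
        try:
            refined = refine_image_based(frame, cbct, init=pose)
            if is_plausible(refined, pose):     # failure-detection gate
                return refined, "image-based"
        except RuntimeError:
            pass                                # e.g., too few features matched
        return pose, "tracker-based"            # seamless fallback for overlay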

ACKNOWLEDGMENTS

A preliminary version of this work was presented at SPIE Medical Imaging 2011 [23]. Academic-industry partnership in the development and application of the mobile C-arm prototype for CBCT is acknowledged, in particular collaborators at Siemens Healthcare (Erlangen, Germany): Dr. Rainer Graumann, Dr. Gerhard Kleinszig, and Dr. Christian Schmidgunst. Cadaver studies were performed at the Johns Hopkins Medical Institute, Minimally Invasive Surgical Training Center, with support and collaboration from Dr. Michael Marohn and Ms. Sue Eller (Department of Surgery, Johns Hopkins University) gratefully acknowledged.

REFERENCES

[1] A. B. Kassam, D. M. Prevedello, R. L. Carrau, C. H. Snyderman, A. Thomas, P. Gardner, A. Zanation, B. Duz, S. T. Stefko, K. Byers, and M. B. Horowitz, "Endoscopic endonasal skull base surgery: analysis of complications in the authors' initial 800 patients," Journal of Neurosurgery, vol. 114, no. 6, pp. 1544–1568, 2011. [Online]. Available: http://thejns.org/doi/abs/10.3171/2010.10.JNS09406

[2] C. H. Snyderman, H. Pant, R. L. Carrau, D. Prevedello, P. Gardner, and A. B. Kassam, "What are the limits of endoscopic sinus surgery?: The expanded endonasal approach to the skull base," The Keio Journal of Medicine, vol. 58, no. 3, pp. 152–160, 2009. [Online]. Available: http://www.jstage.jst.go.jp/article/kjm/58/3/58_152/_article

[3] R. Heermann, B. Schwab, P. Issing, C. Haupt, C. Hempel, and T. Lenarz, "Image-guided surgery of the anterior skull base," Acta Oto-Laryngologica, vol. 121, no. 8, pp. 973–978, 2001. [Online]. Available: http://search.ebscohost.com/login.aspx?direct=true&db=rzh&AN=2009441452&site=ehost-live

[4] J. Wiltfang, S. Rupprecht, O. Ganslandt, C. Nimsky, P. Keßler, S. Schultze-Mosgau, R. Fahlbusch, and F. Wilhelm Neukam, "Intraoperative image-guided surgery of the lateral and anterior skull base in patients with tumors or trauma," Skull Base, vol. 13, no. 1, pp. 21–29, 2003.

[5] M. Fried, S. Parikh, and B. Sadoughi, "Image-guidance for endoscopic sinus surgery," The Laryngoscope, vol. 118, no. 7, pp. 1287–1292, 2008. [Online]. Available: http://www3.interscience.wiley.com/cgi-bin/fulltext/121605887/HTMLSTART

[6] G. J. Ledderose, H. Hagedorn, K. Spiegl, A. Leunig, and K. Stelter, "Image guided surgery of the lateral skull base: Testing a new dental splint registration device," Computer Aided Surgery, vol. 17, no. 1, pp. 13–20, 2012. [Online]. Available: http://informahealthcare.com/doi/abs/10.3109/10929088.2011.632783

[7] T. D. Grauvogel, E. Soteriou, M. C. Metzger, A. Berlis, and W. Maier, "Influence of different registration modalities on navigation accuracy in ear, nose, and throat surgery depending on the surgical field," The Laryngoscope, vol. 120, no. 5, pp. 881–888, 2010. [Online]. Available: http://dx.doi.org/10.1002/lary.20867

[8] G. Strauß, K. Koulechov, S. Rottger, J. Bahner, C. Trantakis, M. Hofer, W. Korb, O. Burgert, J. Meixensberger, D. Manzey, A. Dietz, and T. Luth, "Evaluation of a navigation system for ENT with surgical efficiency criteria," The Laryngoscope, vol. 116, no. 4, pp. 564–572, 2006. [Online]. Available: http://dx.doi.org/10.1097/01.MLG.0000202091.34295.05

[9] N. Cohen and D. Kennedy, "Revision endoscopic sinus surgery," Otolaryngologic Clinics of North America, vol. 39, no. 3, pp. 417–435, 2006.

[10] J. H. Siewerdsen, M. J. Daly, H. Chan, S. Nithiananthan, N. Hamming, and J. Irish, "Computer assisted head and neck, and ENT surgery," International Journal of Computer Assisted Radiology and Surgery, vol. 4, pp. 71–80, 2009. [Online]. Available: http://dx.doi.org/10.1007/s11548-009-0325-y

[11] M. J. Daly, J. H. Siewerdsen, D. J. Moseley, D. A. Jaffray, and J. C. Irish, "Intraoperative cone-beam CT for guidance of head and neck surgery: Assessment of dose and image quality using a C-arm prototype," Medical Physics, vol. 33, no. 10, pp. 3767–3780, 2006.

[12] S. Schafer, S. Nithiananthan, D. J. Mirota, A. Uneri, J. W. Stayman, W. Zbijewski, C. Schmidgunst, G. Kleinszig, A. J. Khanna, and J. H. Siewerdsen, "Mobile C-arm cone-beam CT for guidance of spine surgery: Image quality, radiation dose, and integration with interventional guidance," Medical Physics, vol. 38, no. 8, pp. 4563–4574, 2011. [Online]. Available: http://link.aip.org/link/?MPH/38/4563/1

[13] F. Caire, C. Gantois, F. Torny, D. Ranoux, A. Maubon, and J. J. Moreau, "Intraoperative use of the Medtronic O-arm for deep brain stimulation procedures," Stereotactic and Functional Neurosurgery, vol. 88, no. 2, pp. 109–114, 2010. [Online]. Available: http://www.karger.com/DOI/10.1159/000280823

[14] A. H. Jackman, J. N. Palmer, A. G. Chiu, and D. W. Kennedy, "Use of intraoperative CT scanning in endoscopic sinus surgery: A preliminary report," American Journal of Rhinology, vol. 22, no. 2, pp. 170–174, 2008. [Online]. Available: http://www.ingentaconnect.com/content/ocean/ajr/2008/00000022/00000002/art00015

[15] B. Carelsen, W. Grolman, R. Tange, G. Streekstra, P. Van Kemenade, R. Jansen, N. Freling, M. White, B. Maat, and W. Fokkens, "Cochlear implant electrode array insertion monitoring with intra-operative 3D rotational x-ray," Clinical Otolaryngology, vol. 32, no. 1, pp. 46–50, 2007. [Online]. Available: http://dx.doi.org/10.1111/j.1365-2273.2007.01319.x

[16] J. H. Siewerdsen, D. J. Moseley, S. Burch, S. K. Bisland, A. Bogaards, B. C. Wilson, and D. A. Jaffray, "Volume CT with a flat-panel detector on a mobile, isocentric C-arm: Pre-clinical investigation in guidance of minimally invasive surgery," Medical Physics, vol. 32, no. 1, pp. 241–254, 2005.

[17] S. Liu, L. Gutierrez, and D. Stanton, "Quantitative evaluation for accumulative calibration error and video-CT registration errors in electromagnetic-tracked endoscopy," International Journal of Computer Assisted Radiology and Surgery, vol. 6, pp. 407–419, 2011. [Online]. Available: http://dx.doi.org/10.1007/s11548-010-0518-4


[18] M. J. Daly, H. Chan, E. Prisman, A. Vescan, S. Nithiananthan, J. Qiu, R. Weersink, J. C. Irish, and J. H. Siewerdsen, "Fusion of intraoperative cone-beam CT and endoscopic video for image-guided procedures," in Medical Imaging 2010: Visualization, Image-Guided Procedures, and Modeling, K. H. Wong and M. I. Miga, Eds., vol. 7625, no. 1. SPIE, 2010, p. 762503. [Online]. Available: http://link.aip.org/link/?PSI/7625/762503/1

[19] R. Lapeer, M. S. Chen, G. Gonzalez, A. Linney, and G. Alusi, "Image-enhanced surgical navigation for endoscopic sinus surgery: evaluating calibration, registration and tracking," The International Journal of Medical Robotics and Computer Assisted Surgery, vol. 4, no. 1, pp. 32–45, 2008. [Online]. Available: http://dx.doi.org/10.1002/rcs.175

[20] R. Shahidi, M. Bax, C. Maurer, Jr., J. Johnson, E. Wilkinson, B. Wang, J. West, M. Citardi, K. Manwaring, and R. Khadem, "Implementation, calibration and accuracy testing of an image-enhanced endoscopy system," Medical Imaging, IEEE Transactions on, vol. 21, no. 12, pp. 1524–1535, Dec. 2002.

[21] X. Luo, M. Feuerstein, D. Deguchi, T. Kitasaka, H. Takabatake, and K. Mori, "Development and comparison of new hybrid motion tracking for bronchoscopic navigation," Medical Image Analysis, vol. 16, no. 3, pp. 577–596, 2012. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1361841510001271

[22] W. E. Higgins, J. P. Helferty, K. Lu, S. A. Merritt, L. Rai, and K.-C. Yu, "3D CT-video fusion for image-guided bronchoscopy," Computerized Medical Imaging and Graphics, vol. 32, no. 3, pp. 159–173, 2008. [Online]. Available: http://www.sciencedirect.com/science/article/B6T5K-4RDB8WN-1/2/dbb491db374a9e45528e8a7269413a80

[23] D. J. Mirota, A. Uneri, S. Schafer, S. Nithiananthan, D. D. Reh, G. L. Gallia, R. H. Taylor, G. D. Hager, and J. H. Siewerdsen, "High-accuracy 3D image-based registration of endoscopic video to C-arm cone-beam CT for image-guided skull base surgery," in Medical Imaging 2011: Visualization, Image-Guided Procedures, and Modeling, K. H. Wong and D. R. Holmes III, Eds., vol. 7964, no. 1. SPIE, 2011, pp. 79640J-1–79640J-10. [Online]. Available: http://link.aip.org/link/?PSI/7964/79640J/1

[24] H. C. Longuet-Higgins, "Review lecture: The perception of music," Proceedings of the Royal Society of London. Series B, Biological Sciences, vol. 205, no. 1160, pp. 307–322, 1979. [Online]. Available: http://www.jstor.org/stable/77426

[25] D. J. Mirota, H. Wang, R. H. Taylor, M. Ishii, G. L. Gallia, and G. D. Hager, "A system for video-based navigation for endoscopic endonasal skull base surgery," Medical Imaging, IEEE Transactions on, vol. 31, no. 4, pp. 963–976, Apr. 2012. [Online]. Available: http://dx.doi.org/10.1109/TMI.2011.2176500

[26] E. Prisman, M. J. Daly, H. Chan, J. H. Siewerdsen, A. Vescan, and J. C. Irish, "Real-time tracking and virtual endoscopy in cone-beam CT-guided surgery of the sinuses and skull base in a cadaver model," International Forum of Allergy & Rhinology, vol. 1, no. 1, pp. 70–77, 2011. [Online]. Available: http://dx.doi.org/10.1002/alr.20007

[27] A. D. Vescan, H. Chan, M. J. Daly, I. Witterick, J. C. Irish, and J. H. Siewerdsen, "C-arm cone beam CT guidance of sinus and skull base surgery: quantitative surgical performance evaluation and development of a novel high-fidelity phantom," in Medical Imaging 2009: Visualization, Image-Guided Procedures, and Modeling, M. I. Miga and K. H. Wong, Eds., vol. 7261, no. 1. SPIE, 2009, p. 72610L. [Online]. Available: http://link.aip.org/link/?PSI/7261/72610L/1

[28] K. H. Strobl, W. Sepp, S. Fuchs, C. Paredes, and K. Arbter. (2010) DLR CalDe and DLR CalLab. Institute of Robotics and Mechatronics, German Aerospace Center (DLR), Oberpfaffenhofen, Germany. Last accessed Oct. 15, 2010. [Online]. Available: http://www.robotic.dlr.de/callab/

[29] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, pp. 91–110, 2004. [Online]. Available: http://citeseer.ist.psu.edu/lowe04distinctive.html

[30] E. Delponte, F. Isgro, F. Odone, and A. Verri, "SVD-matching using SIFT features," in Proceedings of the International Conference on Vision, Video and Graphics, July 2005, pp. 125–132.

[31] H. Wang, D. Mirota, and G. D. Hager, "A generalized kernel consensus based robust estimator," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 1, pp. 178–184, 2010. [Online]. Available: http://www.cs.jhu.edu/~hwang/papers/tpami09.pdf

[32] T. S. Newman and H. Yi, "A survey of the marching cubes algorithm," Computers & Graphics, vol. 30, no. 5, pp. 854–879, 2006. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0097849306001336

[33] M. A. Fischler and R. C. Bolles, "Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography," Commun. ACM, vol. 24, pp. 381–395, June 1981. [Online]. Available: http://doi.acm.org/10.1145/358669.358692

[34] C. Chou and H. H. Barrett, "Gamma-ray imaging in Fourier space," Opt. Lett., vol. 3, no. 5, pp. 187–189, Nov. 1978. [Online]. Available: http://ol.osa.org/abstract.cfm?URI=ol-3-5-187

[35] J. Fitzpatrick, J. West, and C. R. Maurer, Jr., "Predicting error in rigid-body point-based registration," Medical Imaging, IEEE Transactions on, vol. 17, no. 5, pp. 694–702, Oct. 1998.

[36] D. Mirota, R. H. Taylor, M. Ishii, and G. D. Hager, "Direct endoscopic video registration for sinus surgery," in Medical Imaging 2009: Visualization, Image-Guided Procedures, and Modeling, Proceedings of the SPIE, vol. 7261, February 2009, pp. 72612K-1–72612K-8. [Online]. Available: http://dx.doi.org/10.1117/12.812334

[37] E. B. van de Kraats, G. P. Penney, D. Tomazevic, T. van Walsum, and W. J. Niessen, "Standardized evaluation methodology for 2-D–3-D registration," Medical Imaging, IEEE Transactions on, vol. 24, no. 9, pp. 1177–1189, Sept. 2005.

[38] Y. Ma, S. Soatto, J. Kosecka, and S. Sastry, An Invitation to 3-D Vision. New York: Springer Verlag, 2004, pp. 107–227.

[39] L. A. Feldkamp, L. C. Davis, and J. W. Kress, "Practical cone-beam algorithm," J. Opt. Soc. Am. A, vol. 1, no. 6, pp. 612–619, 1984. [Online]. Available: http://josaa.osa.org/abstract.cfm?URI=josaa-1-6-612

[40] A. Uneri, S. Schafer, D. J. Mirota, S. Nithiananthan, Y. Otake, R. H. Taylor, and J. H. Siewerdsen, "TREK: an integrated system architecture for intraoperative cone-beam CT-guided surgery," International Journal of Computer Assisted Radiology and Surgery, vol. 7, pp. 159–173, 2012. [Online]. Available: http://dx.doi.org/10.1007/s11548-011-0636-7

[41] (2012) Polaris family of optical tracking systems — specifications. Northern Digital Inc. Last accessed Oct. 30, 2012. [Online]. Available: http://www.ndigital.com/medical/polarisfamily-techspecs.php

[42] I. Pantos, S. Thalassinou, S. Argentos, N. L. Kelekis, G. Panayiotakis, and E. P. Efstathopoulos, "Adult patient radiation doses from non-cardiac CT examinations: a review of published results," British Journal of Radiology, vol. 84, no. 1000, pp. 293–303, 2011. [Online]. Available: http://bjr.birjournals.org/content/84/1000/293.abstract

[43] B. C. Ooi, R. Sacks-Davis, and K. J. McDonell, "Spatial indexing in binary decomposition and spatial bounding," Information Systems, vol. 16, no. 2, pp. 211–237, 1991. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0306437991900163

[44] (2012) Aurora electromagnetic 3D tracking system accuracy and measurement volume. Northern Digital Inc. Last accessed Oct. 26, 2012. [Online]. Available: http://www.ndigital.com/medical/aurora-techspecs.php

[45] L. Torresani, A. Hertzmann, and C. Bregler, "Nonrigid structure-from-motion: Estimating shape and motion with hierarchical priors," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 30, no. 5, pp. 878–892, May 2008.

[46] W. P. Liu, D. J. Mirota, A. Uneri, Y. Otake, G. Hager, D. D. Reh, M. Ishii, G. L. Gallia, and J. H. Siewerdsen, "A clinical pilot study of a modular video-CT augmentation system for image-guided skull base surgery," in SPIE Medical Imaging 2012: Image-Guided Procedures, Robotic Interventions, and Modeling, D. R. Holmes III and K. H. Wong, Eds., vol. 8316, no. 1. SPIE, 2012, p. 831633. [Online]. Available: http://link.aip.org/link/?PSI/8316/831633/1

[47] F. Schulze, K. Buhler, A. Neubauer, A. Kanitsar, L. Holton, and S. Wolfsberger, "Intra-operative virtual endoscopy for image guided endonasal transsphenoidal pituitary surgery," International Journal of Computer Assisted Radiology and Surgery, vol. 5, pp. 143–154, 2010. [Online]. Available: http://dx.doi.org/10.1007/s11548-009-0397-8